Please use this identifier to cite or link to this item:
https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4927
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Rupesinghe, O V | - |
dc.date.accessioned | 2025-08-21T08:25:20Z | - |
dc.date.available | 2025-08-21T08:25:20Z | - |
dc.date.issued | 2025-06-29 | - |
dc.identifier.uri | https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4927 | - |
dc.description.abstract | This dissertation introduces an end-to-end, data-driven framework for automated grammar error detection (GED) in Sinhala, a low-resource, free-word-order language. We begin by constructing a 400-sentence Universal Dependencies (UD) treebank via a hybrid LLM-and-expert annotation pipeline. Using custom 300-dimensional FastText embeddings, we train a graph-based UUParser with targeted data augmentation and cross-lingual transfer from Hindi. Our final parser achieves an Unlabeled Attachment Score (UAS) of 71.37% and a Labeled Attachment Score (LAS) of 55.42%, and attains a sentence-level parse accuracy of 82% on a standard correct corpus, outperforming a leading CFG-based parser (60%) while retaining 64% accuracy on free-word-order variants. Building on this, we generate a synthetic GED corpus of 10,000 sentences covering five error types. We engineer multi-level token features (pretrained word embeddings, POS embeddings, morphological concatenations, dependency relation embeddings, and syntactic n-grams) and train a BiLSTM classifier. The combined model delivers 80% overall classification accuracy. On a 200-sentence standard evaluation set, it correctly classifies 82%, versus 60% for prior CFG-based methods, and it generalizes to free-word-order GED with 64% accuracy. Our contributions include a UD treebank for Sinhala, an optimized dependency parser pipeline, the first dependency-enhanced GED classifier for Sinhala, and a synthetic error corpus. These results confirm that combining hierarchical dependency features with surface-level features significantly boosts GED in challenging low-resource, morphologically rich, free-word-order settings. | en_US |
dc.language.iso | en | en_US |
dc.title | Dependency Based Grammar Error Detection For Low Resource Languages | en_US |
dc.type | Thesis | en_US |
Appears in Collections: | 2025 |
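
The abstract reports parser quality as UAS and LAS. As a reading aid, here is a minimal sketch of how these two scores are conventionally computed: UAS counts tokens whose predicted head matches the gold head, and LAS additionally requires the dependency label to match. The function name `attachment_scores`, the `(head, label)` tuple layout, and the toy sentences are illustrative assumptions, not the thesis's evaluation code or data.

```python
from typing import List, Tuple

# Assumed layout: each token is (head_index, dependency_label); head 0 is the root.
Token = Tuple[int, str]


def attachment_scores(gold: List[List[Token]],
                      pred: List[List[Token]]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions over all tokens in all sentences."""
    total = head_hits = labeled_hits = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must have equal length")
        for (g_head, g_rel), (p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                head_hits += 1          # correct head: counts toward UAS
                if g_rel == p_rel:
                    labeled_hits += 1   # correct head and label: counts toward LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return head_hits / total, labeled_hits / total


# Toy data invented for illustration (not drawn from the Sinhala treebank).
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")],
        [(2, "nsubj"), (0, "root")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "nsubj")],
        [(0, "root"), (1, "obj")]]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2%} LAS={las:.2%}")
```

Because LAS only credits tokens that already have the correct head, it can never exceed UAS, which is consistent with the reported 71.37% UAS and 55.42% LAS.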
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
20001525 - O V Rupesinghe - oshada rupasinghe.pdf | | 4.89 MB | Adobe PDF |
Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.