Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4945
Title: Applicability of Transfer Learning on Sinhala Named-Entity Recognition
Authors: Abeynayaka, A.G.K.C
Keywords: Sinhala, Named Entity Recognition, Transfer Learning, IndicBERT, CRF
Issue Date: 28-Apr-2025
Abstract: Named Entity Recognition (NER) is a preliminary task in Natural Language Processing. NER has evolved from relying on rule-based mechanisms to utilizing neural networks. For English, NER is largely a solved problem. Sinhala, however, suffers from data scarcity because extracting datasets for the language is difficult, and manually annotating a labeled Sinhala dataset is laborious. Entity recognition depends solely on a tagged dataset in the specific language, so these data limitations make it hard to experiment with NER models in Sinhala. Much low-resource NLP research nevertheless shows remarkable improvement with a knowledge-transfer mechanism known as transfer learning. This research proposes a Sinhala NER model based on transfer learning, considering monolingual and multilingual approaches. In both approaches, an Indic language model is fine-tuned for the target Sinhala NER model. The IndicBERT (Kakwani et al. 2020) model is chosen as the source model due to its similarity with Sinhala. The evaluations were done on monolingual and multilingual datasets. For the monolingual dataset, a separate dataset containing six different categories was created using a weakly supervised automatic method. The multilingual dataset was created with a Bengali dataset. The final transfer learning model was trained with hyperparameter tuning and then with an augmented dataset built from the monolingual data. It showed a moderate precision of 48.21%. The baseline CRF model showed a macro precision of 90% and a macro F1-score of 61%, showing that CRF is applicable in normal contexts.
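The CRF baseline's macro precision (90%) is far higher than its macro F1-score (61%), which can happen when a tagger is rarely wrong about the entities it predicts but misses many of them. The sketch below is a minimal illustration of macro-averaged metrics on invented toy tags. It is not the evaluation code used in the thesis: the function name `macro_scores`, the labels, and the token-level scoring are all assumptions made for illustration.

```python
from collections import Counter


def macro_scores(gold, pred, labels):
    """Return (macro precision, macro recall, macro F1) over `labels`.

    Scores are computed per label at the token level and then averaged
    with equal weight per label (macro averaging).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was not correct
            fn[g] += 1  # missed an occurrence of gold label g

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy data (invented): the tagger never mislabels a token as an entity,
# but it finds only one of the four tokens for each entity type.
gold = ["PER", "PER", "PER", "PER", "LOC", "LOC", "LOC", "LOC", "O", "O"]
pred = ["PER", "O", "O", "O", "LOC", "O", "O", "O", "O", "O"]

precision, recall, f1 = macro_scores(gold, pred, labels=["PER", "LOC"])
print(f"macro precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case every predicted entity tag is correct, so macro precision is perfect, while the low recall pulls the macro F1-score down. The same kind of gap could explain the baseline figures reported above, although the abstract does not report recall.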
URI: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4945
Appears in Collections: 2025

Files in This Item:
File: 20000049-AGKCAbeynayaka - Kavisha Abeynayaka.pdf
Size: 1.55 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.