Title: Bootstrapping Sinhala Named Entities for NLP Applications
Authors: Jayasinghe, K.L.
Keywords: Sinhala Named-entity recognition
Semi-supervised learning
Issue Date: 22-Jul-2021
Abstract: Widely used languages have large data pools available for linguistic data applications, but languages such as Sinhala lack such data. Researchers have therefore conducted studies to increase the amount of labeled data, such as part-of-speech tags, named entities, and other semantic categories. Most of these studies rely on supervised learning or statistical methods, which require considerable effort to label the training data. The proposed solution aims to design a method that requires less effort while increasing the amount of labeled Sinhala named-entity data with an average level of accuracy. It is a semi-supervised bootstrapping method that uses an iterative seeding mechanism to extract named entities in the person and location categories. The complete process was conducted in two main phases: the first was the bootstrapping process, and its outputs were used to train a supervised learning model in the second phase. Evaluation was accordingly also conducted in two phases. The intermediate bootstrapping results show 91% accuracy, and the second-phase results also reach the intended accuracy level.
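The abstract does not describe how the iterative seeding works, so the following is only a minimal sketch of one common form of seed-based bootstrapping (context-pattern induction), assuming whitespace-tokenized text and single-token entities. The function names `bootstrap` and `label_corpus`, the (left word, right word) pattern definition, the thresholds `min_support` and `top_k`, and the romanized toy corpus are all hypothetical illustrations, not the thesis's actual method or data. `label_corpus` hints at the hand-off to the second, supervised phase by turning the grown lexicon into token-level training labels.

```python
from collections import defaultdict

# Hypothetical sketch of seed-based bootstrapping; NOT the thesis's actual algorithm.


def context(sentence, i):
    """Return the (left word, right word) pattern around token i, padding sentence edges."""
    left = sentence[i - 1] if i > 0 else "<s>"
    right = sentence[i + 1] if i + 1 < len(sentence) else "</s>"
    return (left, right)


def bootstrap(corpus, seeds, max_iter=5, min_support=2, top_k=2):
    """Grow per-category entity sets from seeds by alternating pattern and entity induction."""
    lexicon = {cat: set(words) for cat, words in seeds.items()}  # copy; seeds are untouched
    for _ in range(max_iter):
        # 1. Record which known entities (per category) each context pattern surrounds.
        hits = defaultdict(lambda: defaultdict(set))
        for sent in corpus:
            for i, tok in enumerate(sent):
                for cat, words in lexicon.items():
                    if tok in words:
                        hits[context(sent, i)][cat].add(tok)
        # 2. Trust patterns seen with a single category and >= min_support distinct entities.
        trusted = {}
        for pat, by_cat in hits.items():
            if len(by_cat) == 1:
                cat, ents = next(iter(by_cat.items()))
                if len(ents) >= min_support:
                    trusted[pat] = cat
        # 3. Score unlabeled tokens by how many distinct trusted patterns surround them.
        known = set().union(*lexicon.values())
        scores = defaultdict(lambda: defaultdict(set))
        for sent in corpus:
            for i, tok in enumerate(sent):
                pat = context(sent, i)
                if tok not in known and pat in trusted:
                    scores[trusted[pat]][tok].add(pat)
        # 4. Promote the top_k candidates per category; stop when nothing new is learned.
        added = False
        for cat, cands in scores.items():
            ranked = sorted(cands, key=lambda t: (-len(cands[t]), t))
            for tok in ranked[:top_k]:
                if tok not in known:  # a token may score for two categories; keep the first
                    lexicon[cat].add(tok)
                    known.add(tok)
                    added = True
        if not added:
            break
    return lexicon


def label_corpus(corpus, lexicon):
    """Tag each token with its bootstrapped category or 'O' (illustrative phase-2 training data)."""
    tag = {w: cat for cat, words in lexicon.items() for w in words}
    return [[(tok, tag.get(tok, "O")) for tok in sent] for sent in corpus]


if __name__ == "__main__":
    corpus = [s.split() for s in [
        "Mr Perera visited Kandy today",
        "Mr Silva visited Galle today",
        "Mr Fernando visited Jaffna today",
        "Mr Bandara lives in Matara",
    ]]
    seeds = {"PERSON": {"Perera", "Silva"}, "LOCATION": {"Kandy", "Galle"}}
    lexicon = bootstrap(corpus, seeds)
    print(sorted(lexicon["PERSON"]))    # ['Fernando', 'Perera', 'Silva']
    print(sorted(lexicon["LOCATION"]))  # ['Galle', 'Jaffna', 'Kandy']
```

In this toy run "Bandara" and "Matara" are never learned because no trusted pattern surrounds them, which illustrates why bootstrapped output is typically used to train a supervised model that can generalize beyond the induced patterns.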
Appears in Collections: 2018

Files in This Item:
2014CS051.pdf (550.41 kB, Adobe PDF)

Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.