Please use this identifier to cite or link to this item:
|Title:||Bootstrapping Sinhala Named Entities for NLP Applications|
|Keywords:||Sinhala Named-entity recognition|
|Abstract:||Widely used languages have large data pools available for linguistic data applications, but languages such as Sinhala lack such data. Researchers have therefore conducted studies to increase the amount of labeled data, such as part-of-speech tags, named entities, and other semantic categories. Most of these studies rely on supervised learning or statistical methods, which require considerable effort to label the training data. The proposed solution aims to design a method that requires less effort and increases the amount of labeled Sinhala named-entity data at an average level of accuracy. It is a semi-supervised bootstrapping method that uses an iterative seeding mechanism to extract named entities in the person and location categories. The complete process was conducted in two main phases: the first was the bootstrapping process, and its outputs were used to train the supervised learning process in the second phase. Evaluation was accordingly conducted in two phases. The intermediate bootstrapping result from the first phase shows 91% accuracy, and the second-phase result also shows the intended accuracy level.|
|Appears in Collections:||2018|
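
The iterative seeding mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `bootstrap_entities`, the (previous token, next token) context patterns, the `min_pattern_count` threshold, and the toy romanized corpus are all assumptions made for illustration only.

```python
from collections import Counter


def bootstrap_entities(sentences, seeds, iterations=5, min_pattern_count=2):
    """Grow a set of entity names from a few seeds (hypothetical sketch).

    sentences: list of token lists; seeds: known entity tokens.
    A context pattern is the (previous token, next token) pair around a token.
    """
    entities = set(seeds)
    for _ in range(iterations):
        # Step 1: count the contexts in which known entities occur.
        patterns = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            for i in range(1, len(padded) - 1):
                if padded[i] in entities:
                    patterns[(padded[i - 1], padded[i + 1])] += 1
        # Keep only contexts seen often enough to be trusted.
        trusted = {p for p, count in patterns.items() if count >= min_pattern_count}

        # Step 2: unknown tokens found in a trusted context become new entities.
        new_entities = set()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            for i in range(1, len(padded) - 1):
                context = (padded[i - 1], padded[i + 1])
                if padded[i] not in entities and context in trusted:
                    new_entities.add(padded[i])

        if not new_entities:
            break  # nothing new was learned, so stop iterating
        entities |= new_entities  # new entities act as seeds in the next round
    return entities


# Toy corpus (assumed, romanized placeholders rather than real Sinhala data).
corpus = [
    ["mr", "Kamal", "said"],
    ["mr", "Nimal", "said"],
    ["mr", "Sunil", "said"],
    ["dr", "Kamal", "visited"],
    ["dr", "Sunil", "visited"],
    ["dr", "Amara", "visited"],
    ["the", "river", "flooded"],
]
found = bootstrap_entities(corpus, {"Kamal", "Nimal"})
print(sorted(found))  # ['Amara', 'Kamal', 'Nimal', 'Sunil']
```

In this toy run the first round learns "Sunil" from the seed context, and only then does the second context become frequent enough to yield "Amara", which is the point of iterating. The resulting labeled set could then serve as training data for a supervised tagger, as the second phase in the abstract describes.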
Files in This Item:
|2014CS051.pdf||550.41 kB||Adobe PDF||View/Open|
Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.