Bootstrapping Sinhala Named Entities for NLP Applications

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4195

Title:	Bootstrapping Sinhala Named Entities for NLP Applications
Authors:	Jayasinghe, K.L.
Keywords:	Sinhala; Named-entity recognition; Bootstrapping; Semi-supervised learning
Issue Date:	22-Jul-2021
Abstract:	Widely used languages have large data pools available for linguistic applications, but languages such as Sinhala lack such data. Researchers have therefore conducted studies to increase the amount of labeled data, such as part-of-speech tags, named entities, and other semantic categories. Most of these studies rely on supervised learning or statistical methods, which require considerable effort to label the training data. The proposed solution aims to design a method that requires less effort and expands the labeled Sinhala named-entity data at a reasonable level of accuracy. It is a semi-supervised bootstrapping method that uses an iterative seeding mechanism to extract named entities in the person and location categories. The complete process was conducted in two main phases: the first was the bootstrapping process, and its outputs were used to train a supervised learner in the second phase. Evaluation was accordingly conducted in two phases. The intermediate bootstrapping result shows 91% accuracy, and the second-phase result also reaches the intended accuracy level.
URI:	http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4195
Appears in Collections:	2018
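
The abstract describes an iterative seeding mechanism without giving its details, so the following is only a minimal sketch of the general bootstrapping idea, not the author's implementation. It assumes that the word immediately before an entity serves as its context pattern, and that a pattern is trusted once it co-occurs with at least `min_support` distinct known entities. The function name `bootstrap_entities`, its parameters, and the romanized placeholder tokens in `SENTENCES` are all hypothetical and stand in for real Sinhala data.

```python
def bootstrap_entities(sentences, seeds, min_support=2, max_iters=5):
    """Grow a seed set by alternating pattern induction and extraction.

    Illustrative assumption: a "pattern" is just the token that
    immediately precedes a known entity.
    """
    entities = set(seeds)
    for _ in range(max_iters):
        # Step 1: record which known entities follow each context token.
        support = {}
        for tokens in sentences:
            for prev, tok in zip(tokens, tokens[1:]):
                if tok in entities:
                    support.setdefault(prev, set()).add(tok)

        # Keep only contexts backed by enough distinct entities.
        patterns = {ctx for ctx, ents in support.items()
                    if len(ents) >= min_support}

        # Step 2: every token that follows a trusted context is a candidate.
        candidates = {tok
                      for tokens in sentences
                      for prev, tok in zip(tokens, tokens[1:])
                      if prev in patterns}

        new_entities = candidates - entities
        if not new_entities:
            break  # converged: no pattern yields anything unseen
        entities |= new_entities
    return entities


# Hypothetical toy corpus (romanized placeholders, not real thesis data).
SENTENCES = [
    ["mr", "perera", "visited", "kandy"],
    ["mr", "silva", "visited", "galle"],
    ["mr", "fernando", "arrived"],
    ["dr", "fernando", "spoke"],
    ["dr", "bandara", "spoke"],
]

person_names = bootstrap_entities(SENTENCES, {"perera", "silva"})
```

In this toy run the context "mr" is supported by both seeds, so "fernando" is added, while "dr" is supported by only one known entity and is not trusted. Lowering `min_support` to 1 would also admit "bandara", which illustrates the usual trade-off in bootstrapping between coverage and the risk of semantic drift.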

Files in This Item:

File	Description	Size	Format
2014CS051.pdf		550.41 kB	Adobe PDF