Bootstrapping Sinhala Named Entities for NLP Applications

Jayasinghe, K.L.

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4195

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jayasinghe, K.L.	-
dc.date.accessioned	2021-07-22T10:19:11Z	-
dc.date.available	2021-07-22T10:19:11Z	-
dc.date.issued	2021-07-22	-
dc.identifier.uri	http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4195	-
dc.description.abstract	Widely used languages have large pools of data available for linguistic applications, but languages such as Sinhala lack such data. Researchers have therefore conducted studies to increase the amount of labeled data, such as part-of-speech tags, named entities, and other semantic categories. Most of these studies rely on supervised learning or statistical methods, which require considerable effort to label the training data. The proposed solution aims to design a method that requires less effort and increases the amount of labeled Sinhala named-entity data with average accuracy. It is a semi-supervised bootstrapping method that uses an iterative seeding mechanism to extract named entities in the person and location categories. The complete process was conducted in two main phases: the first was the bootstrapping process, and its outputs were used to train the supervised learning process, which was the second phase. The evaluation was accordingly conducted in two phases. The first, intermediate bootstrapping result shows 91% accuracy, and the second-phase result also reached the intended accuracy level.	en_US
dc.language.iso	en	en_US
dc.subject	Sinhala Named-entity recognition	en_US
dc.subject	Bootstrapping	en_US
dc.subject	Semi-supervised learning	en_US
dc.title	Bootstrapping Sinhala Named Entities for NLP Applications	en_US
dc.type	Thesis	en_US
Appears in Collections:	2018

Files in This Item:

File	Description	Size	Format
2014CS051.pdf		550.41 kB	Adobe PDF
