UCSC Digital Library Collection:

https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/2380 2026-07-18T15:17:50Z

Title: Named Entity Recognition For Sinhala Language
https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/2469
Authors: Dahanayaka, J.K.
Abstract: With the vast growth of technology and information content, there is a need to retrieve required information more efficiently from large volumes of unstructured text in users' own native languages. To fulfill that need, Natural Language Processing research areas such as Information Extraction, Machine Translation, Information Retrieval and Automatic Summarization are essential. In all of those areas, Named Entity Recognition (NER) is one of the preliminary tasks that has to be performed. However, it is challenging to build a proper NER system, especially for Indic languages, because of their inherent features. Sinhala, a native language of Sri Lanka that belongs to the Indo-Aryan branch of the Indic language family, still lacks a proper NER system for use in Machine Translation and Information Extraction tasks. Although Latin-script languages such as English have far better NER solutions, those solutions cannot be applied to Sinhala directly, because they rely on capitalization as a crucial feature, which Indic languages lack. Since there has not been much previous work on NER for Sinhala, the concept and the required resources have to be built from scratch. It is believed that the algorithms used for Indian languages are likely to be applicable to Sinhala as well. This dissertation therefore investigates the effectiveness of data-driven techniques for detecting named entities (NEs) in Sinhala text. Two data-driven techniques, Conditional Random Fields and the Maximum Entropy model, were tried out. To improve performance, language-dependent as well as language-independent features of Sinhala text were added.
The Conditional Random Fields model outperforms the Maximum Entropy model, achieving high precision with reasonable recall and F-measure (91.64%, 69.34% and 78.95% respectively), while the Maximum Entropy model achieved 81.71%, 51.34% and 63.06%.
2014-05-20T00:00:00Z
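
The reported F-measure figures can be cross-checked against the reported precision and recall. The sketch below assumes the abstract's "f-measure" is the balanced F-measure (F1), the harmonic mean of precision and recall. The abstract does not state which variant was used, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is F1 (equal weight on
    precision and recall); the abstract does not say so explicitly.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall in percent, as reported in the abstract.
reported = {
    "Conditional Random Fields": (91.64, 69.34),
    "Maximum Entropy": (81.71, 51.34),
}

for model, (p, r) in reported.items():
    print(f"{model}: F-measure = {f_measure(p, r):.2f}%")
```

Under that assumption the computed values round to 78.95% and 63.06%, which agrees with the figures given for the two models.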