Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3121
Title: A Hybrid Approach for Named Entity Recognition in Sinhala Language
Authors: Udayangi, K.A.I.
Issue Date: 25-May-2015
Abstract: Named Entity Recognition (NER) is one of the preliminary and significant tasks to be performed for many Natural Language Processing (NLP) applications. Development of such mechanisms for Indic languages like Sinhala has been largely dormant. This research describes the development of a hybrid NER system, which uses Conditional Random Fields (CRF) as the data-driven technique, followed by a rule-based post-processor, for identifying Sinhala Named Entities (NEs). The system makes use of orthographic word-level features along with contextual information, which are helpful in predicting three different NE classes. The system is trained on a manually annotated corpus that randomly mixes 90% of the non-named entities selected from a hand-tagged corpus with 90% of the person names and 90% of the location names available at the LTRL of UCSC. System performance was tested on a test corpus prepared by mixing the remaining 10% of each. We performed several experiments to find the most suitable suffix and prefix lengths for NER in Sinhala and arrived at a combination of the current word with 5 prefixes and 5 suffixes. The research focuses on several strategies for training and testing the CRF++ model in search of an efficient and effective NER system for the Sinhala language. Each explored strategy showed both advantages and disadvantages. The research has made prominent contributions towards improving the effectiveness of NER in Sinhala and sheds light on improved and more effective approaches to NER in the language.
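The feature combination named in the abstract (current word with 5 prefixes and 5 suffixes) can be illustrated with a minimal sketch. It rests on assumptions, not on the thesis itself: "5 prefixes and 5 suffixes" is read here as the affixes of lengths 1 to 5, short words are padded with a placeholder, and the function names and the labels `PER`, `LOC` and `O` are hypothetical. The output follows the general CRF++ training layout of one token per line with whitespace-separated columns and the tag in the last column.

```python
def affix_features(token, max_len=5, pad="_"):
    """Return [token, prefix_1..prefix_5, suffix_1..suffix_5].

    Assumption: affixes are the first/last n characters for n = 1..max_len;
    tokens shorter than n get the placeholder `pad` for that column.
    """
    prefixes = [token[:n] if len(token) >= n else pad
                for n in range(1, max_len + 1)]
    suffixes = [token[-n:] if len(token) >= n else pad
                for n in range(1, max_len + 1)]
    return [token] + prefixes + suffixes


def to_crfpp_line(token, label):
    """Format one token as a tab-separated row with the tag as last column."""
    return "\t".join(affix_features(token) + [label])


if __name__ == "__main__":
    # Hypothetical labels; the actual tag set of the thesis is not given here.
    for word, tag in [("Colombo", "LOC"), ("is", "O")]:
        print(to_crfpp_line(word, tag))
```

Python slices strings by Unicode code point, so for Sinhala text an affix of length n may cut through a consonant and its vowel sign. Whether the thesis counts characters this way or by syllable-like units is not stated in the abstract.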
URI: http://hdl.handle.net/123456789/3121
Appears in Collections: SCS Individual Project - Final Thesis (2014)

Files in This Item:
File: A Hybrid Approach for Named Entity Recognition in Sinhala Language_2010CS077.pdf
Access: Restricted
Size: 1.14 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.