Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3680
Title: Hybrid Part of Speech Tagger for Sinhala Language
Authors: Gunasekara, N.A.K.B.D.
Keywords: Hybrid tagger; Part of Speech tagging; Sinhala Language; rule based tagging; stochastic tagging; Hidden Markov Model
Issue Date: 8-Sep-2016
Abstract: This research presents a hybrid Part of Speech (POS) tagging approach for the Sinhala language that combines rule-based and stochastic methods. In the first phase, a stochastic tagger based on a Hidden Markov Model with bigram probabilities is constructed. A stemmer is used in the tagging process to improve the tagger's accuracy. Three versions of a POS tag set are compared experimentally to identify the one that yields the most meaningful and precise tagging for Sinhala. Because Sinhala is a morphologically rich language, rules based on morphological features are used to predict tags for unknown words, that is, words that do not appear in the training set. A further experiment investigates whether the implemented hybrid tagger can be used to enlarge the data set. The hybrid tagger achieves an overall accuracy of 72% when unknown words make up 20% of the input on average.
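Illustrative sketch (not taken from the thesis): the abstract describes a bigram Hidden Markov Model tagger with a rule-based fallback for unknown words. The Python below shows one way such a combination could be wired together, under several assumptions: add-one smoothing, a Viterbi decoder, suffix rules standing in for the morphological rules, and a default tag for words no rule matches. All function names, the `suffix_rules` format, and the penalty value are hypothetical. The stemmer mentioned in the abstract is omitted.

```python
from collections import defaultdict
import math

START = "<s>"  # hypothetical sentence-start pseudo-tag

def train_bigram_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(lambda: defaultdict(int))
    emissions = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word] += 1
            previous = tag
    return transitions, emissions

def log_prob(counts, key, num_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + num_outcomes))

def guess_tag_by_suffix(word, suffix_rules, default_tag):
    """Rule-based fallback: the first matching suffix decides the tag."""
    for suffix, tag in suffix_rules:
        if word.endswith(suffix):
            return tag
    return default_tag

def viterbi_tag(words, transitions, emissions, suffix_rules, default_tag):
    """Most likely tag sequence under the bigram HMM, with rules for unknown words."""
    if not words:
        return []
    tags = sorted(emissions)
    vocabulary = {w for tag in emissions for w in emissions[tag]}
    penalty = math.log(1e-6)  # assumed soft penalty for tags the rules did not pick

    def emission_score(tag, word):
        if word in vocabulary:
            return log_prob(emissions[tag], word, len(vocabulary))
        guessed = guess_tag_by_suffix(word, suffix_rules, default_tag)
        return 0.0 if tag == guessed else penalty

    # best[i][tag] = (score of best path ending in tag at position i, previous tag)
    best = [{
        tag: (log_prob(transitions[START], tag, len(tags))
              + emission_score(tag, words[0]), None)
        for tag in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for tag in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_prob(transitions[p], tag, len(tags)), p)
                for p in tags
            )
            column[tag] = (score + emission_score(tag, words[i]), prev)
        best.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))
```

The toy data used to exercise this sketch would be placeholder tokens, not Sinhala text; real morphological rules and tag sets would come from the thesis itself.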
URI: http://hdl.handle.net/123456789/3680
Appears in Collections: SCS Individual Project - Final Thesis (2015)

Files in This Item:
File: 11001623_NAKBDGunasekara.pdf (Restricted Access)
Size: 1.74 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.