Please use this identifier to cite or link to this item:
https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3680
Title: | Hybrid Part of Speech Tagger for Sinhala Language |
Authors: | Gunasekara, N.A.K.B.D. |
Keywords: | Hybrid tagger; Part of Speech tagging; Sinhala Language; rule based tagging; stochastic tagging; Hidden Markov Model |
Issue Date: | 8-Sep-2016 |
Abstract: | This research presents a hybrid Part of Speech (POS) tagging approach for the Sinhala language that combines rule-based and stochastic methods. In the first phase, a stochastic tagger based on a Hidden Markov Model with bi-gram probabilities is constructed. A stemmer is used in the tagging process to improve the tagger's accuracy. An experiment on three versions of the POS tag set is carried out to identify the tag set that yields the most meaningful and precise tagging for Sinhala. Since Sinhala is a morphologically rich language, rules based on morphological features are used to predict the tag for unknown words, that is, words not present in the training set. A further experiment examines whether the implemented hybrid tagger can be used to enlarge the data set. The implemented hybrid tagger achieves an overall accuracy of 72% when the average unknown word percentage is 20%. |
URI: | http://hdl.handle.net/123456789/3680 |
Appears in Collections: | SCS Individual Project - Final Thesis (2015) |
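
The approach summarized in the abstract (a bi-gram Hidden Markov Model tagger, with morphological rules deciding the tag of unknown words) can be illustrated with a minimal sketch. This is not the thesis implementation, which is not available in this record. The toy corpus, the tag names, the suffix rules, the add-one smoothing and all function names below are assumptions chosen for illustration, and English stand-in words are used instead of Sinhala.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus (English stand-in, not data from the thesis).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
# Hypothetical suffix rules, consulted only for words unseen in training.
SUFFIX_RULES = [("ing", "VERB"), ("s", "VERB")]
DEFAULT_TAG = "NOUN"
START = "<s>"


def train(corpus):
    """Count tag-to-tag (bi-gram) transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    tags = sorted(emit)
    vocab = {word for tag in emit for word in emit[tag]}
    return trans, emit, tags, vocab


def guess_tag(word):
    """Rule-based fallback: the first matching suffix decides the tag."""
    for suffix, tag in SUFFIX_RULES:
        if word.endswith(suffix):
            return tag
    return DEFAULT_TAG


def log_trans(trans, tags, prev, tag):
    """Add-one smoothed log P(tag | prev)."""
    total = sum(trans[prev].values())
    return math.log((trans[prev][tag] + 1) / (total + len(tags)))


def log_emit(emit, vocab, tag, word):
    """Add-one smoothed log P(word | tag); unknown words defer to the rules."""
    if word not in vocab:
        return 0.0 if tag == guess_tag(word) else float("-inf")
    total = sum(emit[tag].values())
    return math.log((emit[tag][word] + 1) / (total + len(vocab)))


def tag_sentence(words, model):
    """Viterbi decoding over the bi-gram HMM."""
    trans, emit, tags, vocab = model
    if not words:
        return []
    # Each column maps tag -> (best log score, previous tag on that path).
    columns = [{
        t: (log_trans(trans, tags, START, t) + log_emit(emit, vocab, t, words[0]), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            e = log_emit(emit, vocab, t, word)
            column[t] = max(
                (columns[-1][p][0] + log_trans(trans, tags, p, t) + e, p)
                for p in tags
            )
        columns.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


MODEL = train(TRAIN)
print(tag_sentence(["the", "dog", "sleeps"], MODEL))
print(tag_sentence(["the", "fox", "jumps"], MODEL))
```

In the second call, "fox" and "jumps" do not occur in the toy corpus, so their tags come from the suffix rules while the transition probabilities still constrain the sequence. A real tagger for Sinhala would replace the placeholder rules with rules over Sinhala morphology and could apply a stemmer before lookup, as the abstract mentions.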
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
11001623_NAKBDGunasekara.pdf (Restricted Access) | | 1.74 MB | Adobe PDF | View/Open; Request a copy |
Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.