Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4233
Title: Contextual Suggestion Engine for UCSC Singlish Unicode Converter
Authors: Bandara, W.M.J
Issue Date: 27-Jul-2021
Abstract: A contextual suggestion engine for the Sinhala language offers end users word suggestions as they type sentences or phrases, by analysing the domain of the document they are writing and predicting the word they are most likely to type next. Although this is not a new idea, and languages such as English already have prediction engines of this kind, it is uncommon to see one for Sinhala. This thesis describes a methodology for analysing user input and predicting the next word, and evaluates the effectiveness of that methodology. As the prediction model, the thesis presents a hybrid of an N-gram model and a Markov model: the N-gram model is used to predict the next possible match for a given phrase, and the Markov model to predict the probability of occurrence. In addition to these models, the thesis includes experiments on how term frequency and inverse document frequency affect the suggestion probability. Analysing the Sinhala language is different from analysing languages written in ASCII. The thesis describes how to apply the above methodologies to Sinhala and how to overcome the difficulties that arise when analysing Sinhala words and phrases. In the tests conducted for this thesis, a model trained on about 170 thousand sentences gave roughly 50-60% accuracy in suggesting a relevant word when typing in a general context. When the domain of the user's context changed, accuracy dropped to 40-50%. The drop is most likely due to the limited comprehensiveness of the domain-specific dataset. Overall, however, the accuracy remains within an acceptable range. In conclusion, a hybrid of N-gram and Markov models can be used to build a suggestion engine, provided that the generic N-gram and Markov models are adapted to analyse the Sinhala language properly.
URI: http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4233
Appears in Collections: 2018
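
Illustrative sketch (not taken from the thesis): the abstract describes predicting the next word from N-gram counts and ranking candidates by their probability of occurrence. The Python below shows one minimal way such a scheme could look, assuming whitespace tokenisation, a bigram context by default, and a three-sentence toy corpus. The names train, suggest, corpus, and model are hypothetical, and the thesis's actual hybrid model, its TF-IDF weighting, and its Sinhala-specific handling are not reproduced here.

```python
from collections import Counter, defaultdict


def train(sentences, n=2):
    """Count which word follows each (n-1)-word context (n must be >= 2)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()  # assumption: words are separated by whitespace
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            counts[context][next_word] += 1
    return counts


def suggest(counts, typed, n=2, k=3):
    """Return up to k (word, probability) pairs for the word after `typed`."""
    context = tuple(typed.split()[-(n - 1):])
    followers = counts.get(context)
    if not followers:
        return []  # unseen context: this sketch makes no suggestion
    total = sum(followers.values())
    # Relative frequency of each follower, i.e. an estimate of
    # P(next word | context) under a first-order Markov assumption when n=2.
    return [(word, count / total) for word, count in followers.most_common(k)]


# Toy corpus, made up for illustration only.
corpus = [
    "මම ගෙදර යනවා",
    "මම ගෙදර ඉන්නවා",
    "මම පොත කියවනවා",
]
model = train(corpus)
print(suggest(model, "මම"))
```

Python 3 strings are Unicode, so Sinhala words can be used directly as dictionary keys. Splitting at a finer grain than whole words would need care, because a single visible Sinhala letter can span several code points.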

Files in This Item:
File            Description  Size     Format
2015MCS009.pdf               4.67 MB  Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.