IMPROVING POS TAGGING FOR TAMIL USING DEEP LEARNING

Alstan, A.

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3908

Issue Date:	2017
Abstract:	Part of Speech (POS) tagging is one of the basic and important applications of Natural Language Processing (NLP), and its accuracy influences the performance of many other NLP applications. This research presents a novel deep learning based POS tagger for the Tamil language. Tamil is an agglutinative, morphologically rich, free word order language. Recent POS tagging work for Tamil has not achieved the state-of-the-art accuracy reported for other languages, so this research aims to improve POS tagging for Tamil using deep learning approaches. In the first phase of the research, a few classification-based models (a Decision Tree classifier, a Naïve Bayes classifier and a Support Vector Machine (SVM) classifier) were used to build POS taggers for Tamil, trained on a small set of handcrafted features. Extracting useful features is difficult because of the complex structure of Tamil. To avoid handcrafted features and to improve POS tagging performance, a novel model based on a Long Short-Term Memory (LSTM) neural network was then built. The models were evaluated on the AUKBC Tamil POS corpus, which contains 50,876 sentences. Based on experiments on this corpus, the SVM model was selected as the baseline for this research. The SVM-based POS tagger obtained an accuracy of 95.697%, a precision of 96%, a recall of 96% and an F1-measure of 96%. The LSTM model was then trained on the AUKBC corpus with different numbers of training epochs and evaluated using precision, recall, F1-measure and accuracy. With five training epochs, the LSTM model obtained an accuracy of 96.74%, a precision of 97%, a recall of 97% and an F1-measure of 97%.
Keywords – Part of Speech Tagging, Tamil Language, Deep Learning
URI:	http://hdl.handle.net/123456789/3908
Appears in Collections:	SCS Individual/Group Project - Final Thesis (2017)
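The abstract reports token-level accuracy alongside precision, recall and F1-measure, but it does not state how the per-tag scores were averaged into a single figure. The following is a minimal sketch that assumes support-weighted averaging, where each tag is weighted by its number of gold tokens. This is a common convention for multi-class tagging, but it is an assumption here. The function name `tagging_metrics` and the tags `NN`, `VB` and `JJ` are hypothetical placeholders. They are not the AUKBC tagset or the thesis's own evaluation code.

```python
from collections import Counter


def tagging_metrics(gold, pred):
    """Token accuracy plus support-weighted precision, recall and F1.

    Hypothetical helper: gold and pred are equal-length lists of POS tags,
    one tag per token. Weighted averaging is an assumption, not a detail
    taken from the thesis.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    support = Counter(gold)      # number of gold tokens per tag
    predicted = Counter(pred)    # number of times each tag was predicted
    correct = Counter(g for g, p in zip(gold, pred) if g == p)  # true positives per tag

    accuracy = sum(correct.values()) / n
    w_prec = w_rec = w_f1 = 0.0
    # Iterate over gold tags only: a tag that is predicted but never occurs
    # in gold has zero support, so it would get zero weight anyway.
    for tag, sup in support.items():
        tp = correct[tag]
        prec = tp / predicted[tag] if predicted[tag] else 0.0
        rec = tp / sup
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = sup / n
        w_prec += weight * prec
        w_rec += weight * rec
        w_f1 += weight * f1
    return {"accuracy": accuracy, "precision": w_prec, "recall": w_rec, "f1": w_f1}


# Toy example with placeholder tags (not real corpus data).
gold = ["NN", "NN", "VB", "JJ", "NN"]
pred = ["NN", "VB", "VB", "JJ", "NN"]
print(tagging_metrics(gold, pred))
```

Under support weighting, recall reduces to the total number of true positives divided by the number of tokens, which is exactly token accuracy. The reported recall figures (96% for SVM, 97% for LSTM) match the reported accuracies (95.697% and 96.74%) after rounding. This is consistent with weighted averaging, but it does not prove that weighted averaging was used.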

Files in This Item:

File	Description	Size	Format
Thesis 13000063.pdf		1.2 MB	Adobe PDF
