Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4531
Title: Detecting Hate Speech in Social Media Articles in Romanized Sinhala
Authors: Hettiarachchi, N.W.
Issue Date: 11-Aug-2021
Abstract: The main aim of this research is to automatically identify hate content in social media comments and documents written in Romanized Sinhala. Most researchers have studied hate speech recognition in English or in their own languages; this study instead tries to identify hate speech in Romanized Sinhala. Hate words and other hateful text are a growing issue, and machine learning and computer science are increasingly used to combat them. This research compares several feature extraction methods and four machine learning algorithms for hate speech detection. It also compares different N-gram values (unigram, bigram and trigram), with min_df set to 3. The study investigates and compares different features for the different classifiers when classifying hate speech comments on Facebook. We obtained a data set of nearly 2500 comments, some containing hate speech, and trained and tested our classifiers with different features. The Multinomial Naive Bayes classifier performed better than the other classification models. We also compared the CountVectorizer and TfidfVectorizer feature extraction methods and found that the best-performing models used TfidfVectorizer. When evaluating the random forest classifier, we saw some overfitting in its results. We therefore applied parameter tuning to all classification algorithms; for the random forest classifier in particular, changing the n_estimators and random_state values gave better results. According to the final results for the TfidfVectorizer feature extraction method, the Multinomial Naive Bayes classifier with bigrams and a min_df value of 3 performed better than the other models.
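As a rough illustration of the N-gram and min_df settings mentioned in the abstract, the following standard-library Python sketch builds a word n-gram vocabulary and drops any n-gram that appears in fewer than min_df documents. It is not the thesis code: the function names, the whitespace tokenization, and the placeholder documents are all assumptions made for this example, and the thesis itself appears to rely on library vectorizers (CountVectorizer and TfidfVectorizer) rather than hand-written code.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(docs, n, min_df):
    """Keep the n-grams that occur in at least min_df distinct documents."""
    doc_freq = Counter()
    for doc in docs:
        # A set counts each n-gram at most once per document.
        doc_freq.update(set(word_ngrams(doc, n)))
    return sorted(g for g, df in doc_freq.items() if df >= min_df)


def count_vector(doc, vocab, n):
    """Map a document to raw n-gram counts over a fixed vocabulary."""
    counts = Counter(word_ngrams(doc, n))
    return [counts[g] for g in vocab]


# Placeholder documents (hypothetical; not taken from the thesis data set).
docs = [
    "alpha beta gamma",
    "alpha beta delta",
    "beta alpha beta",
    "alpha beta",
    "gamma delta",
]

unigram_vocab = build_vocabulary(docs, n=1, min_df=3)   # ['alpha', 'beta']
bigram_vocab = build_vocabulary(docs, n=2, min_df=3)    # ['alpha beta']
trigram_vocab = build_vocabulary(docs, n=3, min_df=3)   # []
```

With only five short documents, min_df=3 removes every trigram and all but one bigram, which shows why longer n-grams need a larger corpus to survive the filter. A count vector such as `count_vector("beta alpha beta", unigram_vocab, 1)`, which gives `[1, 2]`, is the kind of input a Multinomial Naive Bayes classifier works with; TF-IDF weighting would rescale these counts by how rare each n-gram is across documents.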
URI: http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4531
Appears in Collections: 2020

Files in This Item:
File: 2017 MCS 037.pdf
Size: 1.79 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.