Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4615
Title: Detecting Sinhala Language Based Racial and Religious Offensive Statements in Social Media
Authors: Wimalasena, K. A. T. L.
Keywords: Natural Language Processing; Machine Learning
Issue Date: 12-Jul-2022
Abstract: Offensive statements, and a few individuals who promoted violence through Facebook posts, were among the main causes of a few devastating incidents that took place in Sri Lanka. In March 2018, the Sri Lankan government was forced to impose a one-week social media ban to prevent the dissemination of false information and racial ideas that could have worsened the situation. However, once the ban was lifted, there was no mechanism to moderate comments and posts on Facebook. Relevant authorities have failed to stop the spread of hate via social media platforms because they lack capable Sinhala-language interpreters to detect racially and religiously offensive statements. In this study, a machine learning based model is presented to detect Sinhala-language racial and religious offensive statements. Pre-processed, TF-IDF weighted character n-grams were used as features, and three prominent machine learning classifiers, namely Logistic Regression, Naive Bayes, and Support Vector Machines (SVM), were trained and tested. The Naive Bayes classifier recorded an F1 score of 0.741, while SVM recorded 0.801. Logistic Regression obtained the highest accuracy and F1 score, 0.824 and 0.851 respectively. These results indicate that TF-IDF weighted character n-gram features combined with Logistic Regression form the most effective of the evaluated models for detecting Sinhala-language racial and religious offensive statements in social media.
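The TF-IDF weighted character n-gram representation named in the abstract can be illustrated with a minimal, dependency-free Python sketch. This is not the author's implementation: the n-gram range (1 to 3 by default), the lowercase and whitespace-collapsing pre-processing, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 followed by L2 normalisation are assumptions (the weighting mirrors scikit-learn's default `TfidfVectorizer` behaviour), and the helper names `preprocess`, `char_ngrams` and `tfidf_char_ngrams` are hypothetical. Note that Python slices strings by Unicode code point, so in Sinhala text a dependent vowel sign counts as its own character.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def preprocess(text: str) -> str:
    # Hypothetical minimal pre-processing: lowercase (affects Latin script only)
    # and collapse runs of whitespace into single spaces.
    return " ".join(text.lower().split())


def char_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> List[str]:
    """Return every character n-gram of length n_min..n_max, shortest first."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def tfidf_char_ngrams(
    docs: List[str], n_min: int = 1, n_max: int = 3
) -> Tuple[List[str], List[Dict[str, float]]]:
    """Return (sorted vocabulary, one sparse L2-normalised vector per document)."""
    tokenised = [char_ngrams(preprocess(d), n_min, n_max) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each n-gram appear?
    df: Counter = Counter()
    for grams in tokenised:
        df.update(set(grams))

    # Smoothed IDF (assumed): n-grams shared by many documents get lower weight.
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}

    vectors = []
    for grams in tokenised:
        tf = Counter(grams)  # raw term frequency within this document
        vec = {g: count * idf[g] for g, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {g: w / norm for g, w in vec.items()}
        vectors.append(vec)
    return sorted(idf), vectors


if __name__ == "__main__":
    # Two hypothetical toy documents; a real labelled corpus would replace these.
    vocab, vecs = tfidf_char_ngrams(["ab", "ac"], n_min=1, n_max=2)
    print(vocab)  # ['a', 'ab', 'ac', 'b', 'c']
```

The resulting sparse vectors would then be passed to classifiers such as Logistic Regression, Naive Bayes or an SVM. With scikit-learn, `TfidfVectorizer(analyzer="char", ngram_range=(1, 3))` yields a comparable representation without hand-written code. Character n-grams are often chosen for informal social media text because they do not require a language-specific word tokeniser and are less sensitive to spelling variation than whole-word features.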
URI: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4615
Appears in Collections: 2021

Files in This Item:
File               Description  Size       Format
2017 MCS 097.pdf                482.84 kB  Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.