Detecting Sinhala Language Based Racial and Religious Offensive Statements in Social Media

Wimalasena, K. A. T. L.

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4615

Title:	Detecting Sinhala Language Based Racial and Religious Offensive Statements in Social Media
Authors:	Wimalasena, K. A. T. L.
Keywords:	Natural Language Processing; Machine Learning
Issue Date:	12-Jul-2022
Abstract:	Offensive statements on Facebook, and a small number of people who used posts there to promote violence, were the main causes of several devastating incidents in Sri Lanka. In March 2018, the Sri Lankan government was forced to impose a one-week social media ban to prevent the dissemination of false information and racial ideas that could have complicated the situation. However, once the ban was lifted, there was no mechanism to moderate comments and posts on Facebook. The relevant authorities have failed to stop the spread of hate via social media platforms because they lack capable Sinhala language interpreters to detect racial and religious offensive statements. This study presents a machine learning based model to detect Sinhala language based racial and religious offensive statements. Pre-processed, TF-IDF weighted character n-grams were used as features, and three prominent machine learning classifiers, Logistic Regression, Naive Bayes and Support Vector Machines, were trained and tested. The Naive Bayes classifier recorded an F1 score of 0.741, while the SVM recorded 0.801. The highest accuracy and F1 score, 0.824 and 0.851 respectively, were obtained with Logistic Regression. According to these results, TF-IDF weighted character n-gram features with Logistic Regression form the best-performing of the three models tested for detecting Sinhala language based racial and religious offensive statements in social media.
URI:	https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4615
Appears in Collections:	2021
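
Illustrative sketch (not taken from the thesis): the abstract describes TF-IDF weighted character n-grams fed to a Logistic Regression classifier, but does not state the n-gram range, the pre-processing steps, or the hyperparameters. The Python below is a minimal, dependency-free approximation of that idea under assumed settings (2- to 3-character n-grams, smoothed IDF, L2 normalisation, plain stochastic gradient descent). All function names and the toy English strings are hypothetical placeholders; they are not the author's data or code. In practice a library implementation such as scikit-learn's TfidfVectorizer with analyzer="char" would likely be a more convenient starting point.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Return all character n-grams of length n_min..n_max (assumed range)."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every n-gram seen."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc)))
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}


def tfidf_vector(doc, idf):
    """Map one document to an L2-normalised sparse TF-IDF dictionary."""
    counts = Counter(g for g in char_ngrams(doc) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Fit binary logistic regression with stochastic gradient descent."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights[g] * v for g, v in vec.items())
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            bias -= lr * err
            for g, v in vec.items():
                weights[g] -= lr * err * v
    return weights, bias


def predict(vec, weights, bias):
    """Return 1 (flagged) if the linear score is positive, else 0."""
    z = bias + sum(weights.get(g, 0.0) * v for g, v in vec.items())
    return 1 if z > 0 else 0


# Toy placeholder data: 1 = flagged, 0 = neutral (not real thesis data).
docs = ["rude rude rude", "rude remark rude",
        "kind kind kind", "kind note kind"]
labels = [1, 1, 0, 0]

idf = fit_idf(docs)
vectors = [tfidf_vector(d, idf) for d in docs]
weights, bias = train_logreg(vectors, labels)

for text in ["rude", "kind"]:
    print(text, predict(tfidf_vector(text, idf), weights, bias))
```

Character n-grams are a common choice for morphologically rich languages and noisy social media text because they need no tokenizer and tolerate spelling variation; whether that was the author's motivation is not stated in the abstract.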

Files in This Item:

File	Description	Size	Format
2017 MCS 097.pdf		482.84 kB	Adobe PDF
