An Approach to Hate Speech Detection

De Silva, D.H.A

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4467

Title:	An Approach to Hate Speech Detection
Authors:	De Silva, D.H.A
Issue Date:	5-Aug-2021
Abstract:	Hate speech on social media is a serious problem that is growing rapidly. As the number of internet users grows, more people post violent content on social media. Sharing such content harms both individuals and groups, and it contributes to a rise in hate crimes. A reliable methodology for detecting online hate content is therefore essential. Although much research has been carried out in this area, existing detection methods are language specific. This research develops an efficient and accurate approach to detecting hate speech on social media in the Sinhala language. Compared with English, detecting hate speech in Sinhala is a difficult task because of its large alphabet and the many variations of its characters. Machine learning and deep learning techniques were used as the core approaches: four supervised learning algorithms (Linear Support Vector Machine, Logistic Regression, Naïve Bayes and Random Forest) and a Deep Neural Network were used to train models and evaluate their accuracy. Both Count Vectorizer and TF-IDF features were used to train models on Sinhala, Singlish (Sinhala written in Latin script) and Mix datasets. To improve performance, cross-validation was applied to the Count Vectorizer models, and the models produced by cross-validation were saved. An ensemble model was then developed by combining the most accurate models and testing different model combinations across the Sinhala, Singlish and Mix datasets to find the most accurate configuration. Finally, the combination of Sinhala and Singlish data was compared with the Mix data, and the results showed that combining Sinhala and Singlish content gives the highest accuracy for the hate speech detection model.
Since there is no established mechanism for using ensemble models in hate speech detection research, this approach can serve as a suitable methodology for detecting hate speech in the Sinhala language.
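The classifier comparison, cross-validation and ensemble steps described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn (`CountVectorizer`, `TfidfVectorizer`, `cross_val_score`, `VotingClassifier`), runs on a handful of hypothetical placeholder posts rather than the thesis data, and combines the top three Count Vectorizer models by hard (majority) voting. The Deep Neural Network and the separate Sinhala, Singlish and Mix datasets are omitted, and the helper names `compare` and `build_ensemble` and the choice `top_k=3` are illustrative assumptions, not the author's implementation.

```python
# Minimal sketch assuming scikit-learn. The toy posts are HYPOTHETICAL placeholders,
# not data from the thesis; the Deep Neural Network is omitted.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled posts: 1 = hate, 0 = not hate.
texts = [
    "toxic_a target_x leave", "toxic_b target_x", "target_y toxic_a toxic_b",
    "toxic_c target_y go", "toxic_a toxic_c", "target_x toxic_b now",
    "good morning friends", "cricket match today", "lovely weather in kandy",
    "new song released today", "happy new year friends", "exam results out today",
]
labels = [1] * 6 + [0] * 6

# The two feature types named in the abstract.
FEATURES = {"count": CountVectorizer, "tfidf": TfidfVectorizer}


def base_models():
    """Fresh instances of the four classical classifiers named in the abstract."""
    return {
        "linear_svm": LinearSVC(),
        "log_reg": LogisticRegression(max_iter=1000),
        "naive_bayes": MultinomialNB(),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }


def compare(texts, labels, cv=3):
    """Mean k-fold cross-validated accuracy for every (feature, model) pair."""
    scores = {}
    for feat_name, vectorizer in FEATURES.items():
        for model_name, model in base_models().items():
            # The vectorizer is refit inside each fold, so no vocabulary leaks
            # from the held-out fold into training.
            pipe = make_pipeline(vectorizer(), model)
            acc = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")
            scores[(feat_name, model_name)] = acc.mean()
    return scores


def build_ensemble(scores, feature="count", top_k=3):
    """Hard-voting ensemble of the top_k models (by CV accuracy) for one feature type."""
    ranked = sorted(
        ((s, m) for (f, m), s in scores.items() if f == feature), reverse=True
    )
    models = base_models()
    estimators = [(name, models[name]) for _, name in ranked[:top_k]]
    # Hard voting is a majority vote over predicted labels; soft voting would
    # require predict_proba, which LinearSVC does not provide.
    return make_pipeline(FEATURES[feature](), VotingClassifier(estimators, voting="hard"))


if __name__ == "__main__":
    scores = compare(texts, labels)
    for key, acc in sorted(scores.items()):
        print(key, round(acc, 3))
    ensemble = build_ensemble(scores).fit(texts, labels)
    print(ensemble.predict(["toxic_a target_x", "good morning"]))
```

Keeping an odd number of voters avoids ties in a binary hate/not-hate vote. To mirror the dataset comparison in the abstract, `compare` could be run separately on each dataset (Sinhala, Singlish, Mix, or a Sinhala-Singlish concatenation) and the resulting scores compared.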
URI:	http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4467
Appears in Collections:	2020

Files in This Item:

File	Description	Size	Format
2017 MCS 020.pdf		2.82 MB	Adobe PDF
