Please use this identifier to cite or link to this item:
https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4852
Title: | SINHALA CLICKBAIT YOUTUBE VIDEO DETECTION BASED ON THE THUMBNAIL TEXT USING MACHINE LEARNING |
Authors: | Dilhan, A.W.A.T |
Issue Date: | 20-Sep-2024 |
Abstract: | YouTube is one of the largest video-sharing platforms in the world. The platform lets creators earn money by displaying advertisements during video playback, and the revenue of the person who posted a video depends on its number of views. To increase views, and with them revenue, videos often carry intriguing thumbnails with captivating text. As a result, some people include clickbait statements on YouTube video thumbnails: statements purposely designed to attract the user's attention and make them curious enough to follow the link and read, view, or listen to the attached content. Clickbait typically employs exaggeration, sensationalism, or curiosity-driven language. This study employs three text feature extraction techniques (CountVectorizer, TF-IDF vectorizer, and Word2Vec word embeddings) to identify clickbait content from the thumbnail of a YouTube video, combined with four machine learning algorithms (Logistic Regression, Support Vector Machine, Multinomial Naive Bayes, and K-Nearest Neighbors) over different N-gram ranges. According to the observed results, Logistic Regression performed best, achieving an F1 score of 0.81 with the N-gram ranges (1,2) and (1,3) together with TF-IDF vectorization. |
URI: | https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4852 |
Appears in Collections: | 2024 |
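The abstract names word N-gram ranges such as (1,2) and TF-IDF weighting as the best-performing feature setup. The sketch below illustrates what those two steps compute, using only the Python standard library. It is not the thesis code: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenizer, the smoothed IDF formula with L2 normalization, and the English placeholder strings are all assumptions made for illustration (the study itself works on Sinhala thumbnail text).

```python
import math
from collections import Counter


def word_ngrams(text, ngram_range=(1, 2)):
    """Return all word n-grams of `text` for n in the inclusive range."""
    tokens = text.split()  # assumption: plain whitespace tokenization
    lo, hi = ngram_range
    grams = []
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, ngram_range=(1, 2)):
    """Map each document to a dict {ngram: L2-normalized TF-IDF weight}."""
    counts = [Counter(word_ngrams(d, ngram_range)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    # Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1 for g, df in doc_freq.items()}

    vectors = []
    for c in counts:
        weights = {g: tf * idf[g] for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({g: w / norm for g, w in weights.items()})
    return vectors


# Hypothetical placeholder thumbnail texts, not data from the study.
thumbnails = [
    "you will not believe this",
    "lecture on sorting algorithms",
    "you will be shocked",
]
vectors = tfidf_vectors(thumbnails, ngram_range=(1, 2))
print(word_ngrams("a b c", (1, 2)))  # ['a', 'b', 'c', 'a b', 'b c']
```

Each resulting sparse vector would then be passed to a classifier such as Logistic Regression. N-grams shared across many thumbnails (here "you" and "you will") receive a lower IDF than n-grams unique to one thumbnail, which is the effect TF-IDF is meant to provide over raw counts.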
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
2019MCS022.pdf | | 2.2 MB | Adobe PDF | View/Open |
Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.