Semi-Supervised Aspect Extraction for Generic Aspect-based Sentiment Analysis

Ekanayake, E.M.K.U

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4214

Title:	Semi-Supervised Aspect Extraction for Generic Aspect-based Sentiment Analysis
Authors:	Ekanayake, E.M.K.U
Issue Date:	26-Jul-2021
Abstract:	With the growing popularity of Web 2.0 and easier access to it, the amount of user-generated data is increasing rapidly. Relying on reviews of products and services has become commonplace. However, most of this content is too voluminous and unstructured to read for the necessary information. This research proposes an unsupervised approach to extracting sentiment for different aspects from user reviews of hotels. The focus is on sentiment analysis for a collection of reviews rather than for individual reviews. Frequency-based aspect word extraction for hotel reviews, aspect category detection and aspect sentiment classification are discussed and evaluated. The use of a general-purpose corpus for aspect category detection is explored, and aspect-based sentiment classification is tested using a sentiment analysis implementation in Python. The data set was extracted from the tripadvisor.com web site using a self-implemented Python tool and preprocessed with NLP techniques. The preprocessed data was used to find aspect words, and each review text was then parsed to determine which aspects it discusses. One of the major goals was to determine a sentiment value for the aspects in each review text. Positive or negative sentiment was identified using sentiment classifiers. The aspect-based sentiment analysis used in this research was evaluated on aspect word extraction, aspect category detection and aspect sentiment detection, using a manually annotated dataset. According to the evaluation results, aspect word extraction achieved 70% accuracy. Aspect words were identified using a frequency-based approach, and different threshold values for the frequency were evaluated. Specifying a high frequency threshold produced an aspect word list with fewer synonyms, which resulted in 36% of words being selected as aspect words.
When detecting the aspect category for a review sentence, 22% of reviews were assigned the correct aspect category. Both the aspect category and the sentiment value were identified correctly in 18% of reviews. At the end of the evaluation, an accuracy of 0.4808 was found for correctly classified aspect polarity occurrences. The evaluation results reveal areas for further improvement that could increase accuracy and reduce error levels. The thesis proposes an unsupervised approach to the aspect sentiment analysis problem and suggests possible future improvements for implementing an application based on the proposed process.
URI:	http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4214
Appears in Collections:	2018
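
The frequency-based aspect word extraction mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the stop-word list, the sample reviews, the function names and the `min_share` threshold parameter are all assumptions made for illustration, and a real pipeline would likely add part-of-speech filtering so that only nouns are kept.

```python
import re
from collections import Counter

# Hypothetical mini stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "was", "were", "is", "and", "but", "very"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def extract_aspect_words(reviews, min_share):
    """Return the words that occur in at least `min_share` of the reviews."""
    doc_freq = Counter()
    for review in reviews:
        # Count each word once per review (document frequency).
        doc_freq.update(set(tokenize(review)))
    cutoff = min_share * len(reviews)
    return sorted(word for word, count in doc_freq.items() if count >= cutoff)


# Invented sample reviews, for illustration only.
reviews = [
    "The room was clean and the staff were friendly.",
    "Friendly staff, but the room was small.",
    "Great location and a clean room.",
    "The breakfast was cold but the location is great.",
]

aspects = extract_aspect_words(reviews, min_share=0.5)
print(aspects)
```

Raising `min_share` shortens the candidate list, which mirrors the abstract's observation that a high frequency threshold yields a smaller aspect word list. Frequency alone also lets through adjectives such as "clean", which is one reason an additional filtering step would be needed in practice.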

Files in This Item:

File	Description	Size	Format
2013MCS018.pdf		1.64 MB	Adobe PDF
