|Topic Model for Sinhala News Sources available in the Web through Information Retrieval and Classification
|The development of information technology has led to the availability of large text repositories. News articles are one such example, now available at our fingertips through online news sources. Because there are multiple news sources and a large number of news incidents occur each day, it is impractical to read all of these articles to identify categories and find patterns among them. This is the problem of information overload, which has motivated automatic text categorization for information retrieval. Topic modelling algorithms can analyze the contents of a set of documents and extract the key themes occurring among them. Latent Dirichlet Allocation is one such algorithm. The main purpose of this research is to build a topic model from a collected set of Sinhala news articles and to classify new, unseen news articles using that model. We use Latent Dirichlet Allocation to build the topic model. We explore two approaches to building the model: a model based on a dynamic article set and a model based on a balanced article set. News articles for the models are manually collected from four major Sinhala news sources. Based on the results of each model, we see that the model built with the balanced data set performed better at classifying unseen news articles than the model built with the dynamic data set. We think this is because some topics become over-represented while others are overshadowed when the article set is collected dynamically. To demonstrate the use of the study, we collect Sinhala news articles through RSS feeds and classify them using the model. We show the result in the form of a word cloud in which each word represents a certain topic. Further, to provide a better user experience, we make this cloud clickable so that users can access news articles from different sources through a single interface.
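The general technique named in the abstract, fitting a Latent Dirichlet Allocation model and then assigning topic proportions to an unseen article, can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes collapsed Gibbs sampling as the inference method, and the function names (`train_lda`, `infer_topics`), the hyperparameters, and the English placeholder tokens are all hypothetical. Real input would be tokenized Sinhala text.

```python
import random
from collections import Counter


def train_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling. docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = {w for doc in docs for w in doc}
    V = len(vocab)
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            topic_word[k][w] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment, then resample it.
                k = assign[d][i]
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                topic_word[k][w] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1
    return {"topic_word": topic_word, "topic_total": topic_total, "vocab": vocab,
            "K": num_topics, "alpha": alpha, "beta": beta}


def infer_topics(model, doc, iters=50, seed=0):
    """Estimate topic proportions for an unseen doc, keeping the trained counts fixed."""
    rng = random.Random(seed)
    K, alpha, beta = model["K"], model["alpha"], model["beta"]
    V = len(model["vocab"])
    known = [w for w in doc if w in model["vocab"]]  # out-of-vocabulary words are skipped
    zs = [rng.randrange(K) for _ in known]
    counts = [zs.count(t) for t in range(K)]
    for _ in range(iters):
        for i, w in enumerate(known):
            counts[zs[i]] -= 1
            weights = [
                (counts[t] + alpha)
                * (model["topic_word"][t][w] + beta) / (model["topic_total"][t] + V * beta)
                for t in range(K)
            ]
            zs[i] = rng.choices(range(K), weights=weights)[0]
            counts[zs[i]] += 1
    total = len(known) + K * alpha
    return [(counts[t] + alpha) / total for t in range(K)]


# Hypothetical toy corpus standing in for tokenized Sinhala news articles.
docs = [
    ["cricket", "match", "team", "cricket", "score"],
    ["team", "score", "match", "player"],
    ["election", "vote", "party", "election", "minister"],
    ["minister", "party", "vote", "parliament"],
]
model = train_lda(docs, num_topics=2)
theta = infer_topics(model, ["cricket", "player", "score"])
print(theta)  # topic proportions for the unseen article
```

An unseen article would then be labelled with its highest-proportion topic. In practice a library implementation of LDA would usually be preferable to a hand-written sampler; the sketch is only meant to make the train-then-classify flow concrete.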
|Appears in Collections:
|SCS Individual Project - Final Thesis (2015)
Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.