Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3171
Title: “Twitsum”: Automatic generation of event summaries using microblog streams
Authors: Madhawa, P.K.K.
Issue Date: 29-Jun-2015
Abstract: Microblogging platforms such as Twitter have become a primary medium for people to share their experiences and opinions on a broad range of topics. Because posts on Twitter are publicly viewable by default, Twitter is used to obtain the latest information on events such as natural disasters, disease outbreaks and sports events. The abundance of tweets expressing users' opinions and sentiments about a topic creates a need to extract the newsworthy tweets from a large stream of tweets on that topic. The goal of this research is to build a system capable of generating a summary of a long-running event using only a stream of tweets related to that event. This thesis presents an in-depth analysis, across several tweet corpora, of the algorithms and features that are useful for detecting newsworthy tweets. To overcome the cost of manually annotating large corpora, we introduce a novel heuristic-based annotation scheme that generates a training dataset for the system. A classifier trained on the heuristically labeled tweets achieved an accuracy of approximately 80% on a manually annotated gold-standard tweet corpus. Finally, we investigate how these newsworthy tweets can be presented to the user as a summary. We introduce duplicate-removal algorithms and an entity-centric clustering algorithm that groups tweets with similar content.
URI: http://hdl.handle.net/123456789/3171
Appears in Collections: Master of Computer Science - 2015

Files in This Item:
File: 12440442.pdf (Restricted Access)
Size: 626.69 kB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.