Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/2468
Title: Automation of Constructing an eProfile from Web Contents
Authors: Silva, S.S.H.
Issue Date: 20-May-2014
Abstract: The WWW is becoming more powerful in the present information era. Extracting web content and summarizing it has become difficult for Internet users. This thesis investigates the existing techniques for web data extraction and summarization. The research focuses on creating an eProfile for scholars by extracting web content, then summarizing and classifying it. To that end, a prototype application was built with a pipeline architecture. In the application, the extraction module uses an HTML parser and XPath expressions. Summarization is done by calculating a Sentence Importance Score and a Semantic Similarity Score. In the classification module, an N-gram based text classification technique is used. The proposed solution is intended for constructing eProfiles in the academic domain, and it could also be applied to many other domains, such as analyzing news articles and profiling the past activities of persons.
URI: http://hdl.handle.net/123456789/2468
Appears in Collections: SCS Individual Project - Final Thesis (2013)
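
Note on the classification step: the abstract names an N-gram based text classification technique but does not say which variant the thesis uses. The sketch below assumes one common variant, rank-ordered character n-gram profiles compared with an "out-of-place" distance; it is an illustration under that assumption, not the author's implementation. The category names, sample texts, and function names (ngram_profile, out_of_place, classify) are all hypothetical.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..n_max), ranked."""
    # Normalise case and whitespace, then pad so word boundaries form n-grams.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Sort by descending frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from the category get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def classify(text, category_profiles):
    """Return the category whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(category_profiles, key=lambda c: out_of_place(doc, category_profiles[c]))


# Hypothetical training snippets for two eProfile sections (assumed labels).
training = {
    "publication": "conference paper published in journal proceedings",
    "teaching": "lecturer teaching an undergraduate course module",
}
profiles = {label: ngram_profile(sample) for label, sample in training.items()}

label = classify("conference paper published in journal proceedings", profiles)
```

In a pipeline like the one described, each summarized text fragment would be passed to classify and filed under the returned eProfile section; real use would need far larger training texts per category.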

Files in This Item:
File: 9001417.pdf (Restricted Access)
Size: 456.58 kB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.