Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/2509
Title: Extracting knowledge from semi structured data sources
Authors: Ratnasingham, A.J.N.
Issue Date: 26-May-2014
Abstract: Owing to the wide availability and sheer volume of information today, acquiring and managing knowledge from a given data source has become a priority in many areas of research. The World Wide Web is one such source, containing a significant amount of both structured and unstructured data. The first version of the web, which was based on unstructured data, focused mainly on presenting information in a form humans could understand. The web's later evolution shifted the focus towards making information understandable to both humans and machines, so that the available information could be used to the fullest. This has led to the development of a web from which semantic information can be extracted by humans and machines alike. This research aims to find an efficient way to mine the necessary semantic data from semi-structured data sources, such as human-readable web pages, and to represent it in a structured format so that the content is both machine-readable and human-understandable. Organizing semi-structured data into a machine-readable structure improves a system's ability to analyze information and to query or retrieve what many applications need, quickly and efficiently. Applications such as search engines and software agents can then reason and draw inferences over the knowledge extracted in this way. The semi-structured web source is first parsed using the Stanford Dependency Parser, whose output is then filtered so that the knowledge encoded in each sentence is extracted as an RDF triple. The extracted information is represented using RDF/XML. The prototype was tested on four different types of semi-structured HTML pages taken from the Simple English Wikipedia.
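The pipeline outlined in the abstract (dependency parse, filter, RDF triple, RDF/XML) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, whose full text is restricted. It assumes the parser's typed-dependency output is already available as text lines such as `nsubj(wrote-2, Orwell-1)`, and it uses a hypothetical filter that keeps only a verb's `nsubj` and `dobj` relations. The function names and the `http://example.org/kb#` namespace are likewise hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches one Stanford-style typed dependency line, e.g. "nsubj(wrote-2, Orwell-1)".
DEP_PATTERN = re.compile(r"([\w:]+)\((\S+?)-(\d+), (\S+?)-(\d+)\)")

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/kb#"  # hypothetical vocabulary namespace


def parse_dependencies(lines):
    """Turn typed-dependency lines into (relation, governor, dependent) tuples.

    Governor and dependent are (word, index) pairs so that repeated words
    in one sentence stay distinct. Lines that do not match are skipped.
    """
    deps = []
    for line in lines:
        m = DEP_PATTERN.fullmatch(line.strip())
        if m:
            rel, gov, gov_i, dep, dep_i = m.groups()
            deps.append((rel, (gov, int(gov_i)), (dep, int(dep_i))))
    return deps


def extract_triples(deps):
    """Hypothetical filter: pair each verb's nsubj and dobj into (s, p, o)."""
    subjects, objects = {}, {}
    for rel, gov, dep in deps:
        if rel == "nsubj":
            subjects[gov] = dep
        elif rel == "dobj":
            objects[gov] = dep
    # Keep only verbs that have both a subject and a direct object.
    return [
        (subj[0], verb[0], objects[verb][0])
        for verb, subj in subjects.items()
        if verb in objects
    ]


def to_rdf_xml(triples):
    """Serialize triples as RDF/XML; assumes each predicate is a valid XML name."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for s, p, o in triples:
        desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                             {f"{{{RDF_NS}}}about": EX_NS + s})
        ET.SubElement(desc, f"{{{EX_NS}}}{p}",
                      {f"{{{RDF_NS}}}resource": EX_NS + o})
    return ET.tostring(root, encoding="unicode")


parse_output = [
    "nsubj(wrote-2, Orwell-1)",
    "root(ROOT-0, wrote-2)",
    "dobj(wrote-2, novels-3)",
]
triples = extract_triples(parse_dependencies(parse_output))
print(triples)  # [('Orwell', 'wrote', 'novels')]
print(to_rdf_xml(triples))
```

Pairing a subject and an object on a shared verb yields one triple per simple transitive clause; under this rule, a sentence lacking either relation produces no triple at all.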
The statistical accuracy of the knowledge extraction system was 67%, evaluated using measures such as the number of meaningful outputs obtained as knowledge against the number of sentences processed from the selected web source.
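Read literally, the evaluation above reduces to a ratio. The symbols below are introduced only for illustration, and the thesis may define or weight "meaningful output" differently:

```latex
\text{Accuracy} = \frac{N_{\text{meaningful}}}{N_{\text{processed}}} \times 100\% \approx 67\%
```

Here N_meaningful is the number of meaningful outputs obtained as knowledge and N_processed is the number of sentences processed. A value of 67% corresponds to roughly two meaningful outputs for every three sentences processed.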
URI: http://hdl.handle.net/123456789/2509
Appears in Collections: Master of Computer Science - 2014

Files in This Item:
11440472.pdf (Adobe PDF, 1.2 MB), Restricted Access


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.