Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3743
Title: Smart search on printed materials
Authors: Ajith, Srikukan
Issue Date: 19-Sep-2016
Abstract: A huge amount of information is available in printed form, such as textbooks, compared with the information available online, yet most of us prefer to consult online documents. Searching for keywords or information in a printed document is time-consuming, and sometimes difficult when the document runs to hundreds or thousands of pages, whereas online documents are arranged so that information can be found quickly. The goal of this research project is therefore to build a framework that makes searching digital images of printed materials much more efficient. Smart search on printed materials provides a mechanism for easy and fast searching, and it is especially efficient on bulk documents. The efficiency is achieved by digitizing the documents into a computable data format: the system takes the image of a document as input and converts it into digital data that can be searched using computers or mobile devices. This thesis contains an in-depth analysis of the various techniques and algorithms used to find the intended information in the documents. At a high level, the system has three main components: text and coordinate extraction, indexing of the extracted data, and information search. The system uses an open-source OCR library and improves it to support multi-threading, to read a variety of documents with lower image quality, and to increase the accuracy of the coordinates of each character in the image. To store the data, the system uses a dictionary and a trie, data structures chosen to be tightly coupled with the system's search method in order to provide good performance. The techniques used to improve OCR performance, and to store data in and search data from these data structures, are discussed in this thesis. The results chapter describes the results and the testing criteria of the system.
The test data consists of a collection of digital image documents and their corresponding Word documents. Conclusions are drawn by comparing the system's results with the corresponding Word document. Analysis of the final outputs shows that searching time is improved by 50 to 70 percent compared with the time taken for a manual search. Finally, the thesis discusses future improvements that could be built on top of the system in order to search for information in a variety of data sources (video, Google Images) in different kinds of environments.
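The indexing and search components mentioned in the abstract can be illustrated with a minimal sketch. It assumes each recognized word arrives from the OCR step with a bounding box of the form (page, x, y, width, height). The names TrieNode, WordIndex, add, search and search_prefix are hypothetical and chosen for illustration only; the record does not describe the thesis's actual implementation, which may differ.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by the next character of the word.
    children: dict = field(default_factory=dict)
    # Bounding boxes (page, x, y, width, height) of words ending at this node.
    boxes: list = field(default_factory=list)


class WordIndex:
    """Hypothetical trie that maps OCR words to their positions on the page."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, word, box):
        """Store one occurrence of a word together with its bounding box."""
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.boxes.append(box)

    def _find(self, prefix):
        """Return the node reached by following prefix, or None if absent."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        """Return the boxes of exact (case-insensitive) matches of word."""
        node = self._find(word)
        return list(node.boxes) if node is not None else []

    def search_prefix(self, prefix):
        """Return the boxes of every indexed word that starts with prefix."""
        start = self._find(prefix)
        if start is None:
            return []
        found, stack = [], [start]
        while stack:
            node = stack.pop()
            found.extend(node.boxes)
            stack.extend(node.children.values())
        return found
```

A lookup walks one node per character of the query, so its cost depends on the length of the query rather than on the number of indexed words, which is one plausible reason for pairing a trie with keyword search over bulk documents.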
URI: http://hdl.handle.net/123456789/3743
Appears in Collections:Master of Computer Science - 2016

Files in This Item:
File: MCS Final Report v3_final.pdf
Access: Restricted Access
Size: 2.21 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.