Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4455
Title: Improving and Measuring OCR Accuracy for Sinhala with Tesseract OCR Engine
Authors: Balasooriya, B. P. K. M
Issue Date: 5-Aug-2021
Abstract: This research project proposes and implements a system to improve and measure the accuracy of Sinhala OCR using the Tesseract OCR engine. The system implements modules that rectify issues inherent to the Tesseract OCR engine when it performs OCR on Sinhala text. Throughout the project, word-level accuracy was used to measure the accuracy of the system's output. The OCR engine “පෙළ කැටෙත” served as the baseline against which the results of the proposed Tesseract-based system were compared. To improve accuracy, three features were implemented in the system: a syntactical rule engine, a module to detect and correct confusion character pairs, and a rudimentary dictionary lookup feature to detect and correct word-level errors. In the initial stage of the project, which implemented only the Tesseract OCR library functionality, the output was less accurate than that of the OCR engine “පෙළ කැටෙත”. As the features were built into the system, the word-level accuracy improved significantly, from the original 53.22% to 86.16%.
URI: http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4455
Appears in Collections: 2020
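
Illustrative note on the metric: the abstract reports word-level accuracy but this record does not state how it was computed. The sketch below shows one common way to compute such a figure, assuming accuracy is defined as 1 minus the word-level edit distance divided by the number of reference words. That definition, the whitespace tokenization, the NFC normalization step, and the function names `word_edit_distance` and `word_level_accuracy` are all assumptions made for illustration, not the author's method.

```python
import unicodedata


def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of words."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_level_accuracy(reference_text, ocr_text):
    """Assumed definition: 1 - (word edit distance / reference word count)."""
    # NFC normalization keeps equivalent Unicode sequences from counting as errors.
    reference = unicodedata.normalize("NFC", reference_text).split()
    hypothesis = unicodedata.normalize("NFC", ocr_text).split()
    if not reference:
        raise ValueError("reference text must contain at least one word")
    errors = word_edit_distance(reference, hypothesis)
    # Clamp at zero: many inserted words could otherwise push the score negative.
    return max(0.0, 1.0 - errors / len(reference))
```

With this definition, a four-word ground-truth line in which the OCR output gets one word wrong scores 0.75. Scores computed this way depend on the tokenization and on the ground-truth transcription, so they are not directly comparable to the figures in the abstract.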

Files in This Item:
File: 2016 MCS 013.pdf
Size: 1.28 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.