Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Darsha, W.B. | - |
| dc.description.abstract | Receiving invoices as scanned images remains one of the biggest problems business organizations face. Relying on human effort to convert scanned invoices into text documents is not sustainable, because manual conversion performs poorly even though people are inherently capable of the task. Recent advances in computer vision and machine learning open new ways of addressing this growing problem. Optical Character Recognition (OCR) is the current general-purpose way of extracting text from images, but on its own its output was not very helpful for identifying key parameters in invoices. Hence we first employed an object detection algorithm, You Only Look Once (YOLO), to capture text blobs at a granular level, then passed them to OCR, and finally processed the spatial information with pattern matching techniques. Using this improved approach we could successfully extract not only key parameters such as merchant information, invoice number, date and time, and total, but also the invoice items in the table body, with high performance. The methodology we developed can therefore be adapted, with proper adjustments, to any scanned invoice dataset and to other document types. | en_US |
| dc.subject | scanned invoices/receipts | en_US |
| dc.subject | machine learning | en_US |
| dc.subject | Tesseract, image processing | en_US |
| dc.subject | pattern matching | en_US |
| dc.subject | spatial information | en_US |
| dc.title | Information Extraction From Scanned Invoices using Machine Learning, OCR and Spatial Feature Mapping Techniques | en_US |
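
The last stage named in the abstract, pattern matching over spatial information, can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the detection and OCR stages have already produced text blobs with bounding boxes, and every name in it (`Blob`, `same_row`, `value_right_of`, `PATTERNS`, `extract_fields`) as well as the sample values are hypothetical. A label is found with a regular expression, and its value is taken to be the nearest blob to its right on the same row.

```python
import re
from dataclasses import dataclass


@dataclass
class Blob:
    """One OCR'd text blob with its bounding box (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def same_row(a, b, tol=0.5):
    """True when the vertical centres differ by at most tol * the smaller height."""
    centre_a = a.y + a.h / 2
    centre_b = b.y + b.h / 2
    return abs(centre_a - centre_b) <= tol * min(a.h, b.h)


def value_right_of(label, blobs):
    """Nearest blob on the label's row that starts at or after the label's right edge."""
    candidates = [
        b for b in blobs
        if b is not label and same_row(label, b) and b.x >= label.x + label.w
    ]
    return min(candidates, key=lambda b: b.x, default=None)


# Assumed label patterns; a real invoice set would need its own list.
PATTERNS = {
    "invoice_no": re.compile(r"invoice\s*(no|number|#)", re.I),
    "date": re.compile(r"^\s*date\b", re.I),
    "total": re.compile(r"^\s*total\b", re.I),
}


def extract_fields(blobs):
    """Map each recognised label to the text of the blob to its right."""
    fields = {}
    for blob in blobs:
        for name, pattern in PATTERNS.items():
            if name not in fields and pattern.search(blob.text):
                value = value_right_of(blob, blobs)
                if value is not None:
                    fields[name] = value.text
    return fields


# Made-up sample blobs standing in for detector + OCR output.
blobs = [
    Blob("Invoice No", 10, 10, 80, 12),
    Blob("INV-0042", 120, 11, 70, 12),
    Blob("Date", 10, 30, 40, 12),
    Blob("2021-03-05", 120, 30, 80, 12),
    Blob("Total", 10, 200, 40, 12),
    Blob("1,250.00", 120, 201, 60, 12),
]
print(extract_fields(blobs))
```

The row tolerance and the "value sits to the right of its label" rule are design assumptions; layouts that place values below their labels would need a different spatial rule.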
Appears in Collections: 2022

Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| 2018 MCS 010.pdf | | 4.37 MB | Adobe PDF |

Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.