Information Extraction From Scanned Invoices using Machine Learning, OCR and Spatial Feature Mapping Techniques

Darsha, W.B.

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4705

Full metadata record

DC Field	Value	Language
dc.contributor.author	Darsha, W.B.	-
dc.date.accessioned	2023-06-22T09:18:18Z	-
dc.date.available	2023-06-22T09:18:18Z	-
dc.date.issued	2023-06-22	-
dc.identifier.uri	https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4705	-
dc.description.abstract	Receiving invoices as scanned images is one of the biggest problems business organizations still face. Relying on human effort to convert scanned invoices into text documents is not sustainable because of its low performance, even though people are inherently capable of the task. With recent advances in computer vision and machine learning, new ways of addressing this growing problem have emerged. Optical Character Recognition (OCR) is the standard way of extracting text from images in a general context, but its output alone was not very helpful for identifying key parameters in invoices. Hence we first employed an object detection algorithm called You Only Look Once (YOLO) to capture text blobs at a granular level, then passed them to OCR, and finally processed their spatial information with pattern-matching techniques. Using this improved approach, we successfully extracted not only key parameters such as merchant information, invoice number, date/time and total, but also the invoice line items in the table body, with high performance. The methodology we developed can be adapted to any scanned invoice dataset with proper adjustments, and also to other document types.	en_US
dc.language.iso	en_US	en_US
dc.subject	scanned invoices/receipts	en_US
dc.subject	machine learning	en_US
dc.subject	YOLO	en_US
dc.subject	OCR	en_US
dc.subject	Tesseract, image processing	en_US
dc.subject	pattern matching	en_US
dc.subject	spatial information	en_US
dc.title	Information Extraction From Scanned Invoices using Machine Learning, OCR and Spatial Feature Mapping Techniques	en_US
dc.type	Thesis	en_US
Appears in Collections:	2022
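The abstract describes a pipeline that detects text blobs, passes them to OCR, and then locates key fields using spatial information and pattern matching. The sketch below illustrates only that last stage, and only under stated assumptions: it presumes OCR has already returned word tokens with pixel bounding boxes, and the `Token` format, the `group_into_lines` and `find_total` helpers, the row tolerance, and the sample receipt are all hypothetical. It is not the thesis's implementation; it simply shows one common way to group words into visual rows and pick out the total.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """One OCR word with its bounding box (hypothetical format)."""
    text: str
    x: int  # left edge, pixels
    y: int  # top edge, pixels
    w: int  # width, pixels
    h: int  # height, pixels

    @property
    def cy(self) -> float:
        """Vertical centre of the box."""
        return self.y + self.h / 2


def group_into_lines(tokens, tol=0.5):
    """Group tokens into visual rows: a token joins the current row when its
    vertical centre lies within tol * height of that row's first token."""
    lines = []
    for tok in sorted(tokens, key=lambda t: t.cy):
        if lines and abs(tok.cy - lines[-1][0].cy) <= tol * lines[-1][0].h:
            lines[-1].append(tok)
        else:
            lines.append([tok])
    # Order the words of each row left to right.
    return [sorted(line, key=lambda t: t.x) for line in lines]


AMOUNT = re.compile(r"^\d+[.,]\d{2}$")            # e.g. 10.00 or 10,00
TOTAL_KEY = re.compile(r"^total:?$", re.IGNORECASE)  # exact word, so SUBTOTAL is skipped


def find_total(tokens):
    """Return the rightmost amount on the first row containing the word TOTAL,
    or None if no such row (with an amount) exists."""
    for line in group_into_lines(tokens):
        words = [t.text for t in line]
        if any(TOTAL_KEY.match(w) for w in words):
            amounts = [w for w in words if AMOUNT.match(w)]
            if amounts:
                return amounts[-1]
    return None


# Hypothetical OCR output for the bottom of a receipt.
tokens = [
    Token("SUBTOTAL", 10, 200, 90, 20), Token("9.00", 300, 202, 40, 20),
    Token("TAX", 10, 230, 40, 20),      Token("1.00", 300, 231, 40, 20),
    Token("TOTAL", 10, 260, 60, 20),    Token("10.00", 295, 261, 45, 20),
]
print(find_total(tokens))  # 10.00
```

Anchoring on row geometry rather than on raw OCR text order is what lets the keyword and its value be paired even when OCR emits them far apart in its output stream.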

Files in This Item:

File	Description	Size	Format
2018 MCS 010.pdf		4.37 MB	Adobe PDF
