Please use this identifier to cite or link to this item:
https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4705
Title: | Information Extraction From Scanned Invoices using Machine Learning, OCR and Spatial Feature Mapping Techniques |
Authors: | Darsha, W.B. |
Keywords: | scanned invoices/receipts; machine learning; YOLO; OCR; Tesseract; image processing; pattern matching; spatial information |
Issue Date: | 22-Jun-2023 |
Abstract: | Receiving invoices as scanned images is one of the biggest problems business organizations still face. Relying on human effort to convert scanned invoices into text documents is not sustainable, because manual conversion is slow even though people are inherently capable of the task. Recent advances in Computer Vision and Machine Learning open new ways of addressing this pressing problem. Optical Character Recognition (OCR) extracts text from images in a general context, but on its own the output was not much help in identifying key parameters in invoices. Hence we first employed an object detection algorithm called You Only Look Once (YOLO) to capture text blocks at a granular level, then passed them to OCR, and finally processed their spatial information with pattern matching techniques. Using this improved approach we could successfully extract not only key parameters such as merchant information, invoice number, date and time, and total, but also the invoice items in the table body, and with high performance. The methodology we developed can therefore be adapted to any scanned invoice dataset with proper adjustments, and to other document types as well. |
URI: | https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4705 |
Appears in Collections: | 2022 |
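The last stage described in the abstract, combining pattern matching with spatial information, can be illustrated with a minimal sketch. This is not the thesis implementation: the `TextBox` structure, the `value_right_of` helper, the tolerance value, and the sample boxes are all hypothetical, and the sketch assumes the detection and OCR stages have already produced text snippets with pixel bounding boxes.

```python
import re
from dataclasses import dataclass


@dataclass
class TextBox:
    """One OCR'd text block with its bounding box (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def cy(self) -> float:
        """Vertical centre of the box."""
        return self.y + self.h / 2


def value_right_of(label_pattern, boxes, y_tol=0.5):
    """Find the first box whose text matches label_pattern, then return the
    text of the nearest box to its right on roughly the same line.

    y_tol is an assumed tolerance, expressed as a fraction of the label height.
    Returns None if no label or no neighbouring value is found.
    """
    label = next(
        (b for b in boxes if re.search(label_pattern, b.text, re.IGNORECASE)),
        None,
    )
    if label is None:
        return None
    candidates = [
        b for b in boxes
        if b is not label
        and b.x >= label.x + label.w                   # starts right of the label
        and abs(b.cy - label.cy) <= y_tol * label.h    # roughly the same line
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.x).text


# Made-up sample data standing in for detector + OCR output.
boxes = [
    TextBox("Invoice No", 10, 10, 80, 12),
    TextBox("INV-0042", 120, 11, 70, 12),
    TextBox("Date", 10, 30, 40, 12),
    TextBox("2023-06-22", 120, 30, 80, 12),
    TextBox("Total", 10, 200, 40, 12),
    TextBox("1250.00", 120, 201, 60, 12),
]

invoice_no = value_right_of(r"invoice\s*no", boxes)  # "INV-0042"
total = value_right_of(r"^total", boxes)             # "1250.00"
```

The regular expression locates a field label, and the geometric test then picks the value by position rather than by reading order, which is why plain OCR text without coordinates is harder to use for this purpose.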
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
2018 MCS 010.pdf | | 4.37 MB | Adobe PDF |
Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.