Holistic Approach in Recognizing Handwritten Tamil words

Thadchanamoorthy, S.

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3256

Title:	Holistic Approach in Recognizing Handwritten Tamil words
Authors:	Thadchanamoorthy, S.
Issue Date:	15-Nov-19
Abstract:	Optical Character Recognition (OCR) is the process of converting images of handwritten, typewritten, or printed text into machine-editable text, such as ASCII code. OCR draws on the essential concepts of pattern recognition, and handwriting recognition can be seen as a subtask of OCR that provides a simple interface between human and machine. Despite several advances in OCR technology, handwriting persists as a means of documenting data in day-to-day life. Handwriting recognition is traditionally divided into on-line and off-line recognition. Handwritten character recognition is very difficult because handwritten characters are cursive and unconstrained and vary with the styles of different writers. Further, handwritten characters sometimes overlap or touch adjacent characters, which is one of the major hurdles in segmenting characters from words. Therefore, a holistic approach, combined with other techniques, has become popular for recognizing handwritten words as a whole rather than recognizing individual characters. Holistic word recognition is mainly used in areas such as postal automation, bank check processing, and automatic data entry. In this research, motivated by the cursive style of handwritten Tamil script, two classification models using the holistic approach for handwritten Tamil words are proposed. The first model is based on simple geometric features with an SVM classifier, and the second model is based on directional features with an MQDF classifier. A key contribution of the first approach is the improvement of input images prior to feature extraction.
In addition to commonly used preprocessing techniques such as Otsu's binarization, standardization, thinning, and slant correction, additional corrective measures, namely removal of unwanted prolongations (pruning and clipping) and mid-alignment of the characters, are proposed to improve the word images. A dataset comprising 218 country names, 156 Sri Lankan city names, and 109 Tamil Nadu city names was created for this research. For each name, one hundred samples were collected from a group of 500 different writers. The first approach is based on geometric features extracted using a Gabor filter, and the country names (217) were considered for this work. The features are the number of vertical lines, the number of horizontal lines, the number of +45 degree slanted lines, the number of -45 degree slanted lines, and the number of dots appearing in the word image. Further, these features are counted at twelve different positions on a word divided into a 3x4 grid, in order to increase the intra-word variations. A simple technique to compensate for the loss in the number of vertical lines caused by touching (within and between characters) is also proposed. An accuracy of 86.36% is achieved. The second approach targets postal automation in the Tamil language within Sri Lanka and Tamil Nadu, India. For this purpose, the well-known split-and-merge algorithm is used. Rather than performing a proper character segmentation, a city name string is considered as a word and the recognition problem is treated as lexicon-driven word recognition. In this approach, binarized city names are pre-segmented into primitives (individual characters or their parts) using the water reservoir concept. The primitive components of each city name are then merged into possible characters to obtain the best city name using dynamic programming.
For merging, the total likelihood of the characters is used as the objective function, and the character likelihood is computed with the Modified Quadratic Discriminant Function (MQDF), applied to four directional features (horizontal, vertical, 45 degree slanted, and 135 degree slanted). In the experiment, an accuracy of 96.89% is obtained over 265 word classes.
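The merging step of the second approach can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the primitives are already available in left-to-right order, and it replaces the MQDF character likelihood with a hypothetical `score(start, end, label)` callback that returns a log-likelihood for primitives `start..end-1` forming the character `label`. The names `match_word`, `recognize`, and `max_merge` (the largest number of primitives merged into one character) are likewise assumptions made for illustration. Lexicon entries are given as sequences of character labels rather than raw strings, since a written Tamil character may span several Unicode code points.

```python
import math
from typing import Callable, Sequence

# Hypothetical scorer: log-likelihood that primitives[start:end] form `label`.
Score = Callable[[int, int, str], float]


def match_word(n_primitives: int, word: Sequence[str], score: Score,
               max_merge: int = 3) -> float:
    """Best total log-likelihood of grouping all primitives, in order,
    into exactly len(word) characters (each group has 1..max_merge primitives)."""
    m = len(word)
    neg_inf = -math.inf
    # best[j][i]: best score explaining the first i primitives
    # with the first j characters of the word.
    best = [[neg_inf] * (n_primitives + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):
        for i in range(1, n_primitives + 1):
            for k in range(1, min(max_merge, i) + 1):
                prev = best[j - 1][i - k]
                if prev == neg_inf:
                    continue  # no valid grouping reaches this state
                cand = prev + score(i - k, i, word[j - 1])
                if cand > best[j][i]:
                    best[j][i] = cand
    return best[m][n_primitives]


def recognize(n_primitives: int, lexicon: Sequence[Sequence[str]],
              score: Score, max_merge: int = 3) -> Sequence[str]:
    """Return the lexicon entry with the highest total log-likelihood."""
    return max(lexicon,
               key=lambda w: match_word(n_primitives, w, score, max_merge))


# Toy data standing in for real primitives and a real classifier.
primitives = ["a", "b1", "b2", "c"]               # "B" is split into two parts
templates = {"A": "a", "B": "b1b2", "C": "c"}


def toy_score(start: int, end: int, label: str) -> float:
    merged = "".join(primitives[start:end])
    return 0.0 if merged == templates[label] else -10.0


lexicon = [("A", "C"), ("A", "B", "C")]
best_word = recognize(len(primitives), lexicon, toy_score)
```

In this toy run the entry `("A", "B", "C")` is selected, because merging the two middle primitives gives a perfect match for `B`. A real system would probably also need to normalize scores across lexicon entries of different lengths, which the sketch leaves out.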
URI:	http://hdl.handle.net/123456789/3256
Appears in Collections:	2015

Files in This Item:

File	Description	Size	Format
2013-mphil-2010-019.pdf Restricted Access		2.24 MB	Adobe PDF
