Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4238
Title: Transcribing Number Sequences in Continuous Sinhala Speech
Authors: Dilshan, K. A. D. C.
Issue Date: 27-Jul-2021
Abstract: Human speech recognition still far outperforms state-of-the-art speech recognition systems. Rudimentary speech recognition tools are restricted by limited vocabularies of words and phrases, limited language support, and limited context support. More sophisticated tools can transcribe context-independent natural speech, but such research has covered very few languages. Telecommunication services such as Interactive Voice Response (IVR) navigation and call center workflows suffer many inefficiencies because no proper online number extractor exists for continuous Sinhala speech. This research attempts to generate number transcriptions from continuous speech in the under-resourced Sinhala language, using readily available tools, in order to optimize telecommunication services. It focuses on a comprehensive architectural design of the Sinhala speech decoder pipeline, including Gaussian Mixture Model (GMM) based acoustic modeling and feature-rich language modeling for improved performance. The data collection and annotation tool, however, was developed exclusively for this study. The modeling tools used in this study, such as Kaldi [27], have more open licenses and comprehensive documentation, and are backed by a large community of researchers. Because this study takes a data-analytic perspective and no suitable annotated speech corpus is readily available, a careful effort was made to build and evaluate a corpus with a moderate number vocabulary, with the voluntary participation of a friendly team. Further, the GMM-based acoustic model is evaluated with various input feature transformations, using standard and intrinsic scoring criteria found in the general Automatic Speech Recognition (ASR) literature. An algorithm is formulated to calculate the Word Error Rate (WER) produced by each model. The best model achieves an accuracy of 80.09%. In conclusion, the research objectives of the online number transcription tool are analyzed against the output of the study, in terms of the performance of the models investigated throughout the research.
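The abstract mentions an algorithm for calculating WER but does not reproduce it. As an illustration only, the conventional definition (word-level edit distance divided by the number of reference words) can be sketched as follows. The function name `word_error_rate` and the English digit words are hypothetical placeholders, not taken from the thesis, and the thesis's own algorithm may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length.

    Illustrative sketch only; not the algorithm formulated in the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling-row Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical number sequence: one substitution and one deletion over four words.
print(word_error_rate("one two three four", "one three three"))  # 0.5
```

Under this definition a WER of 0.5 means that half of the reference words required an edit. Whether the reported accuracy of 80.09% is derived as 1 minus WER is not stated in the abstract.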
URI: http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4238
Appears in Collections: 2018

Files in This Item:
File: 2015MCS023.pdf
Size: 1.4 MB
Format: Adobe PDF

