Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4793
Full metadata record
DC Field | Value | Language
dc.contributor.author | Pushpakumara, W.D.H | -
dc.date.accessioned | 2024-10-16T04:57:48Z | -
dc.date.available | 2024-10-16T04:57:48Z | -
dc.date.issued | 2024-05 | -
dc.identifier.uri | https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4793 | -
dc.description.abstract | Automatic Speech Recognition (ASR) is a rapidly evolving area within Natural Language Processing (NLP), addressing a range of linguistic challenges. While ASR technologies have made significant strides through various models, including Hidden Markov Models (HMMs), Gaussian Mixture Models (GMMs), and more recently, Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs), certain languages like Sinhala face specific limitations. One major challenge for Sinhala ASR development is the lack of sufficient labeled speech data, which makes it difficult and costly to build accurate models. This thesis explores a transfer learning-based approach to mitigate the data scarcity problem in Sinhala ASR. Specifically, the study leverages the XLS-R model developed by Babu et al. (2021) as the source model, using its pre-learned speech representations to fine-tune a Sinhala ASR model. Two distinct datasets, differing in their lexical composition, were used to evaluate the model’s performance. The proposed model achieved Word Error Rates (WER) of 33.78% and 38.31% on the two datasets, respectively. To further enhance transcription accuracy, post-processing steps, including spell correction and word boundary correction algorithms, were applied, resulting in improved WERs of 24.28% and 36.6%. While the baseline model performed better on the first dataset, a relative WER reduction of 10.07% was observed on the second dataset. An analysis of the generated transcriptions indicates that the proposed model produces results that are acceptable in practical applications, highlighting its potential to improve ASR performance for under-resourced languages like Sinhala. | en_US
dc.language.iso | en | en_US
dc.title | Applicability of Transfer Learning on End-to-End Sinhala Speech Recognition | en_US
dc.type | Thesis | en_US
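
The transfer-learning approach described in the abstract (adapting the multilingual XLS-R model to Sinhala with a CTC head and scoring the output with Word Error Rate) can be sketched roughly as follows. This is an illustrative outline only, assuming the publicly released facebook/wav2vec2-xls-r-300m checkpoint together with the Hugging Face Transformers and jiwer libraries; the vocabulary file, transcripts, and hyperparameters are placeholders rather than the thesis's actual configuration, and the training loop and the post-processing steps (spell correction, word boundary correction) are omitted.

    # Illustrative sketch only: fine-tuning an XLS-R checkpoint for Sinhala CTC-based ASR
    # and scoring it with Word Error Rate. Checkpoint name, file paths, and settings
    # are assumptions, not the thesis's actual configuration.
    import torch
    import jiwer
    from transformers import (
        Wav2Vec2CTCTokenizer,
        Wav2Vec2FeatureExtractor,
        Wav2Vec2Processor,
        Wav2Vec2ForCTC,
    )

    # A Sinhala character vocabulary would normally be built from the training transcripts;
    # "vocab.json" is a placeholder file mapping characters to integer ids.
    tokenizer = Wav2Vec2CTCTokenizer(
        "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
    )
    feature_extractor = Wav2Vec2FeatureExtractor(
        feature_size=1, sampling_rate=16_000, padding_value=0.0,
        do_normalize=True, return_attention_mask=True,
    )
    processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

    # Load the pre-trained multilingual XLS-R encoder and attach a fresh CTC output layer
    # sized to the Sinhala vocabulary: this reuse of pre-learned speech representations
    # is the transfer-learning step.
    model = Wav2Vec2ForCTC.from_pretrained(
        "facebook/wav2vec2-xls-r-300m",
        ctc_loss_reduction="mean",
        pad_token_id=processor.tokenizer.pad_token_id,
        vocab_size=len(processor.tokenizer),
    )
    model.freeze_feature_encoder()  # keep the low-level convolutional feature encoder fixed

    def transcribe(waveform) -> str:
        """Greedy CTC decoding of a single 16 kHz mono waveform (1-D float array)."""
        inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        predicted_ids = torch.argmax(logits, dim=-1)
        return processor.batch_decode(predicted_ids)[0]

    # Word Error Rate, the metric reported in the abstract, can be computed with jiwer.
    references = ["reference transcript one", "reference transcript two"]  # placeholders
    hypotheses = ["predicted transcript one", "predicted transcript two"]  # placeholders
    print(f"WER: {jiwer.wer(references, hypotheses):.2%}")
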
Appears in Collections: 2024

Files in This Item:
File | Description | Size | Format
2019 CS 125.pdf | - | 2.76 MB | Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.