Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4905
Full metadata record
DC Field	Value	Language
dc.contributor.author	Weerakoon, T.V.R.	-
dc.contributor.author	Nayanathara, K.K.S.	-
dc.contributor.author	Harischandra, L.I.L.	-
dc.date.accessioned	2025-08-15T10:09:04Z	-
dc.date.available	2025-08-15T10:09:04Z	-
dc.date.issued	2025-06-30	-
dc.identifier.uri	https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4905	-
dc.description.abstract	This research presents the development of an advanced Sinhala speech-to-speech chatbot designed to bridge the gap in digital accessibility for native Sinhala speakers. Despite rapid advances in conversational AI systems, low-resource languages like Sinhala remain underrepresented, limiting the ability of native speakers to interact with technology in their own language. Addressing this critical gap, this study proposes an end-to-end solution that seamlessly integrates Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Text-to-Speech (TTS) synthesis, enabling real-time, voice-based communication in Sinhala. The system leverages state-of-the-art deep learning techniques to achieve high accuracy and robustness. For ASR, transfer learning is employed to fine-tune the Wav2Vec2-BERT model on a 40-hour Sinhala speech dataset, achieving a Word Error Rate (WER) of 1.79% and a Character Error Rate (CER) of 0.33%, surpassing existing Sinhala ASR systems. The chatbot component utilizes a Retrieval-Augmented Generation (RAG) approach, combining the strengths of Large Language Models (LLMs) with dynamic knowledge retrieval to deliver context-aware and accurate responses in Sinhala. The TTS module, powered by the Variational Inference TTS (VITS) model, generates natural-sounding Sinhala speech, achieving a Mean Opinion Score (MOS) of 4.62 for intelligibility and 4.18 for naturalness in male voices, and 4.24 for intelligibility and 4.07 for naturalness in female voices. The proposed system addresses a significant gap in voice-based human-computer interaction for Sinhala speakers, with applications spanning education, accessibility, and digital services.
By combining cutting-edge ASR, RAG-powered chatbot intelligence, and high-quality TTS, this research not only advances the field of NLP for low-resource languages but also sets a benchmark for future developments in multilingual speech technologies. The modular architecture and methodologies developed in this study provide a foundation for extending similar solutions to other underrepresented languages, fostering greater inclusivity in the digital age.	en_US
dc.language.iso	en	en_US
dc.title	Sinhala Speech-to-Speech Chatbot Using Deep Learning Approaches	en_US
dc.type	Thesis	en_US
Appears in Collections: 2025

Files in This Item:
File	Description	Size	Format
20000715, 20001207, 20002009.pdf		7.62 MB	Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.