Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4905
Title: Sinhala Speech-to-Speech Chatbot Using Deep Learning Approaches
Authors: Weerakoon, T.V.R.
Nayanathara, K.K.S.
Harischandra, L.I.L.
Issue Date: 30-Jun-2025
Abstract: This research presents the development of an advanced Sinhala speech-to-speech chatbot designed to bridge the gap in digital accessibility for native Sinhala speakers. Despite the rapid advancements in conversational AI systems, low-resource languages like Sinhala remain underrepresented, limiting the ability of native speakers to interact with technology in their own language. Addressing this critical gap, this study proposes an end-to-end solution that seamlessly integrates Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Text-to-Speech (TTS) synthesis, enabling real-time, voice-based communication in Sinhala.

The system leverages state-of-the-art deep learning techniques to achieve high accuracy and robustness. For ASR, transfer learning is employed to fine-tune the Wav2Vec2-BERT model on a 40-hour Sinhala speech dataset, achieving remarkable improvements with a Word Error Rate (WER) of 1.79% and a Character Error Rate (CER) of 0.33%, surpassing existing Sinhala ASR systems. The chatbot component utilizes a Retrieval-Augmented Generation (RAG) approach, combining the strengths of Large Language Models (LLMs) with dynamic knowledge retrieval to deliver context-aware and accurate responses in Sinhala. The TTS module, powered by the Variational Inference TTS (VITS) model, generates natural-sounding Sinhala speech, achieving a Mean Opinion Score (MOS) of 4.62 for intelligibility and 4.18 for naturalness in male voices, and 4.24 for intelligibility and 4.07 for naturalness in female voices.

The proposed system addresses a significant gap in voice-based human-computer interaction for Sinhala speakers, with applications spanning education, accessibility, and digital services. By combining cutting-edge ASR, RAG-powered chatbot intelligence, and high-quality TTS, this research not only advances the field of NLP for low-resource languages but also sets a benchmark for future developments in multilingual speech technologies. The modular architecture and methodologies developed in this study provide a foundation for extending similar solutions to other underrepresented languages, fostering greater inclusivity in the digital age.
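The WER and CER figures quoted above are standard edit-distance metrics over words and characters respectively. As a minimal illustration of how such scores are computed, the sketch below implements them in pure Python via Levenshtein distance; this is a generic reference implementation, not the evaluation script used in the thesis, and the example strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, one substituted word in a four-word reference yields a WER of 0.25; the reported 1.79% WER thus corresponds to roughly one word error per 56 reference words.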
URI: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4905
Appears in Collections: 2025

Files in This Item:
File: 20000715, 20001207, 20002009.pdf
Size: 7.62 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.