Title: Lip Synchronization Model for Sinhala Language Using Machine Learning
Authors: Ranaweera, P.D.C.
Keywords: lip synchronization
Sinhala
static viseme
multi-class classification
Issue Date: 23-Jun-2023
Abstract: Many countries now produce animated characters for a variety of purposes, including the animation film industry, the gaming industry, and live broadcast television programs. These characters are created so that users can engage more closely with films, video games, or television shows. For such characters to appear lifelike while speaking a language, lip synchronization is crucial. Lip synchronization is the process of matching speech to the lip movements of a synthetic facial model; to produce a realistic animation, the voice and the lip motions must be accurately timed. Building a talking face has been the subject of numerous studies using various methods for languages such as English, Korean, and Portuguese. Compared with those languages, Sinhala has fewer resources because relatively little research has addressed it, so the interaction between a synthetic mouth and Sinhala sounds is particularly interesting to observe. The model developed here can be used to create cartoon characters that speak Sinhala smoothly instead of simply opening and closing their mouths. The main challenge is matching "phonemes", the fundamental sounds of a language, to "visemes", the visual representations of lip movement. There are three main approaches to lip synchronization: the static viseme approach, which maps the language's phonemes onto a viseme alphabet; the dynamic approach, which uses visual cues extracted from speech in real time; and the deep learning approach, which relies on large visual data sets. Because Sinhala letters represent the language's phonemes, the viseme classification in this study is based on groupings of letters, and 23 viseme classes were identified in total. A deep learning model was then built using a multi-class classification method. In the final system, the user first provides text input; the system generates the corresponding audio, and the deep learning model produces a sequence of visemes for the given text. The system interface offers three playback speeds for the visemes: rapid, normal, and slow. The user interface was created in Python, with the deep learning model integrated into it; the viseme classification model was developed using Google Colab. The model will also be useful should a new character be added to the Sinhala alphabet in the future, and the approach could additionally be used to help deaf people learn lip reading.
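The static viseme approach described in the abstract amounts to a lookup from Sinhala letters (which carry the phonemes) to viseme class IDs. The Python sketch below illustrates that idea; the letter groupings and class numbers are illustrative assumptions and do not reproduce the 23 classes defined in the thesis.

# Minimal sketch of the static-viseme lookup described in the abstract.
# The letter-to-viseme groupings are illustrative assumptions, not the
# thesis's actual 23 viseme classes.
LETTER_TO_VISEME = {
    "ප": 1, "බ": 1, "ම": 1,   # bilabial letters assumed to share one lip shape
    "ත": 2, "ද": 2, "න": 2,   # dental letters assumed to share another
    "ක": 3, "ග": 3,           # velar letters
    "අ": 4, "ආ": 4,           # open-vowel letters
    # ... the remaining Sinhala letters would fill out all 23 classes
}

def text_to_visemes(text: str) -> list[int]:
    """Convert Sinhala text to a sequence of viseme class IDs.

    Characters not in the mapping (spaces, combining marks, punctuation)
    fall back to class 0, treated here as a neutral / closed-mouth viseme.
    """
    return [LETTER_TO_VISEME.get(ch, 0) for ch in text]

print(text_to_visemes("අම්මා"))   # class IDs follow the illustrative mapping above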
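The abstract reports that the viseme classifier is a 23-way deep learning model built with a multi-class classification method in Google Colab, but it does not describe the architecture or the input features. The sketch below is a minimal Keras/TensorFlow version under the assumption of a fixed-size encoded character-context vector as input; it is not the author's actual model.

# Minimal sketch of a 23-way viseme classifier, assuming a Keras/TensorFlow
# setup of the kind that runs in Google Colab. FEATURE_DIM and the layer
# sizes are assumptions for illustration.
import tensorflow as tf

NUM_VISEME_CLASSES = 23     # number of viseme classes reported in the abstract
FEATURE_DIM = 64            # assumed size of the encoded character context

model = tf.keras.Sequential([
    tf.keras.Input(shape=(FEATURE_DIM,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_VISEME_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",   # integer viseme labels 0..22
    metrics=["accuracy"],
)

# model.fit(train_features, train_viseme_labels, epochs=10, validation_split=0.1)

Sparse categorical cross-entropy is chosen here only because it pairs naturally with integer class labels; the thesis may use a different loss or label encoding.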
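The interface then plays the generated viseme sequence at one of three speeds (rapid, normal, slow). A minimal sketch of that playback option follows; the per-frame durations are assumed values.

# Minimal sketch of the rapid/normal/slow playback option described in the
# abstract. The frame durations are assumptions, not values from the thesis.
import time

SPEED_TO_FRAME_SECONDS = {"rapid": 0.05, "normal": 0.10, "slow": 0.20}

def play_visemes(visemes: list[int], speed: str = "normal") -> None:
    """Show each viseme for a duration determined by the chosen speed."""
    frame_seconds = SPEED_TO_FRAME_SECONDS[speed]
    for viseme_id in visemes:
        # The real system would render the viseme image on the synthetic face;
        # this sketch just prints the class ID.
        print(f"viseme {viseme_id}")
        time.sleep(frame_seconds)

# play_visemes([4, 1, 0, 1, 0], speed="slow")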
URI: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4717
Appears in Collections: 2022

Files in This Item:
File: 2018 MCS 073.pdf
Size: 2.67 MB
Format: Adobe PDF

