Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4937
Title: Emotion-Based Melody Generation using Song Vocals and Psychophysiological Signals
Authors: Wijethunge, L.V.N.
Issue Date: 30-Jun-2025
Abstract: This research presents a novel framework that bridges EEG-based emotion recognition and AI-driven music generation, focusing on emotions evoked by song vocals. While earlier studies have explored music emotion recognition and EEG analysis separately, prior literature has not connected vocal-induced emotional states to generative music systems. This study addresses that gap by proposing a system that interprets arousal and valence values derived from EEG signals recorded while participants listen to vocal tracks and translates them into emotionally aligned melodies. The experiment involved 30 participants who listened to 104 carefully curated vocal songs. EEG signals were recorded and processed using advanced feature extraction techniques, including the Discrete Wavelet Transform (DWT) and Continuous Wavelet Transform (CWT). These features were input to machine learning models including Support Vector Regression (SVR), Long Short-Term Memory (LSTM) networks, and a hybrid CNN + LSTM architecture. Results showed that the LSTM model achieved strong predictive accuracy, with a Mean Absolute Error of 0.042 for arousal and 0.057 for valence, though it lacked spatial feature representation. The CNN + LSTM model demonstrated superior performance by capturing both spatial and temporal EEG features. A key novelty of this work lies in its end-to-end pipeline, which converts predicted emotional states into natural-language prompts that are then used to condition MusicGen, a transformer-based music generation model. This enabled the creation of emotionally congruent music with user-controllable parameters such as melody duration, instrumentation, genre, and tempo. Emotion-alignment analysis revealed high consistency in arousal-based generation, with over 80% alignment for most tracks. Valence alignment, while promising for several tracks (above 90%), exhibited greater variability, highlighting the challenge of capturing subjective emotional tones. The dataset curated during this study has been made publicly available to support future research in affective computing, emotion-aware music generation, and human-computer interaction.
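
For readers unfamiliar with the feature-extraction step the abstract describes, the following is a minimal sketch of DWT-based EEG features feeding an SVR regressor, using PyWavelets and scikit-learn. The wavelet ('db4'), decomposition level, channel count, and feature set are illustrative assumptions, not the thesis's actual configuration.

# Illustrative sketch only: DWT sub-band features from an EEG epoch, then
# SVR for arousal regression. Wavelet, level, and features are assumptions.
import numpy as np
import pywt
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def dwt_features(epoch, wavelet="db4", level=5):
    """Per-channel DWT sub-band energies and std devs for one EEG epoch.

    epoch: (n_channels, n_samples) array.
    """
    feats = []
    for channel in epoch:
        coeffs = pywt.wavedec(channel, wavelet, level=level)
        for band in coeffs:                          # approximation + detail bands
            feats.append(np.log1p(np.sum(band ** 2)))  # log band energy
            feats.append(np.std(band))                 # band variability
    return np.asarray(feats)

# Hypothetical shapes: 30 epochs, 14 channels, 4 s at 128 Hz = 512 samples.
rng = np.random.default_rng(0)
X = np.stack([dwt_features(rng.standard_normal((14, 512))) for _ in range(30)])
y_arousal = rng.uniform(0, 1, size=30)  # placeholder labels, not real data

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X, y_arousal)
print("train predictions:", model.predict(X[:3]))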
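
The hybrid CNN + LSTM idea (convolutions for spatial patterns across channels, recurrence for temporal dynamics) can be pictured with the PyTorch sketch below. Layer sizes, channel count, and the two-output head are assumptions for illustration, not the thesis's architecture.

# Illustrative sketch only: a hybrid CNN + LSTM regressor over EEG epochs.
import torch
import torch.nn as nn

class CnnLstmRegressor(nn.Module):
    def __init__(self, n_channels=14, hidden=64):
        super().__init__()
        # 1-D convolution over time, mixing all channels: spatial feature maps.
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # LSTM over the pooled time axis: temporal dependencies.
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # predicts (arousal, valence)

    def forward(self, x):                  # x: (batch, channels, samples)
        feats = self.conv(x)               # (batch, 32, samples // 2)
        seq = feats.permute(0, 2, 1)       # (batch, time, features) for the LSTM
        _, (h_n, _) = self.lstm(seq)
        return self.head(h_n[-1])          # (batch, 2)

x = torch.randn(8, 14, 512)               # hypothetical batch of EEG epochs
print(CnnLstmRegressor()(x).shape)         # torch.Size([8, 2])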
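
The prompt-conditioning stage can likewise be pictured with Meta's audiocraft API for MusicGen. The emotion_to_prompt mapping, its thresholds, and the chosen model size are hypothetical stand-ins; the thesis's actual prompt template and parameters may differ.

# Illustrative sketch only: map predicted (valence, arousal) to a text
# prompt and condition MusicGen with it. Wording and thresholds are assumed.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

def emotion_to_prompt(valence, arousal, genre="acoustic pop", tempo="medium tempo"):
    """Turn normalized [0, 1] valence/arousal into a natural-language prompt."""
    mood = "uplifting, joyful" if valence >= 0.5 else "melancholic, somber"
    energy = "energetic, driving" if arousal >= 0.5 else "calm, gentle"
    return f"A {mood}, {energy} {genre} melody at a {tempo}"

prompt = emotion_to_prompt(valence=0.72, arousal=0.35)

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)    # user-controllable length in seconds
wav = model.generate([prompt])             # (batch, channels, samples) tensor

# Write the first generated clip to disk with loudness normalization.
audio_write("generated_melody", wav[0].cpu(), model.sample_rate,
            strategy="loudness")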
URI: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4937
Appears in Collections: 2025

Files in This Item:
File: 20002191 - LVN Wijethunge - Mr. WIJETHUNGE L.V.N..pdf
Size: 12.83 MB
Format: Adobe PDF
