Explaining Perceived Emotion Predictions in Music: An Attentive Approach
Sanga Chaki, Pranjal Doshi, Sourangshu Bhattacharya, Priyadarshi Patnaik
Keywords: Musical features and properties; Musical affect, emotion, and mood; Applications; Music recommendation and playlist generation; Music retrieval systems; Domain knowledge; Machine learning/Artificial intelligence for music; MIR tasks; Automatic classification; Pattern matching and detection
Abstract:
Dynamic prediction of the perceived emotion of music is a challenging problem with interesting applications. Effective prediction requires exploiting the relevant context within the audio sequence. Existing methods have used LSTMs with modest success. In this work we describe three attentive LSTM-based approaches for dynamic emotion prediction from music clips. We validate our models through extensive experimentation on a standard dataset annotated with continuous-time arousal-valence values, and select the best performer. We find that the LSTM-based attention models outperform state-of-the-art Transformers on the dynamic emotion prediction task in terms of both the R² and Kendall-Tau metrics. We explore individual smaller feature sets, both in search of a more effective one and to understand how different features contribute to perceived emotion; the spectral features are found to perform on par with the generic ComParE feature set [1]. Through attention map analysis we visualize how attention is distributed over the frames of a music clip during emotion prediction. We observe that the models attend to frames that contribute to changes in the reported arousal-valence values and in chroma, producing better emotion predictions and effectively capturing long-term dependencies.
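The abstract does not detail the three attentive architectures, so the sketch below is only a minimal illustration of one plausible arrangement: an LSTM encoder whose hidden states are mixed by single-head dot-product attention before per-frame arousal-valence regression. The class name, hidden size, attention form, and toy feature dimensions are all assumptions made for illustration, not the paper's implementation.

```python
import math
import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    """LSTM encoder with single-head dot-product attention over its own
    hidden states, emitting per-frame arousal-valence predictions.
    (Hypothetical sketch; not the paper's exact model.)"""

    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.query = nn.Linear(hidden, hidden)
        self.key = nn.Linear(hidden, hidden)
        self.head = nn.Linear(2 * hidden, 2)  # -> (arousal, valence)
        self.scale = math.sqrt(hidden)

    def forward(self, x):
        # x: (batch, frames, features) frame-level audio features
        h, _ = self.lstm(x)                                   # (B, T, H)
        scores = self.query(h) @ self.key(h).transpose(1, 2)  # (B, T, T)
        attn = torch.softmax(scores / self.scale, dim=-1)     # attention map
        ctx = attn @ h                                        # (B, T, H)
        # Each frame's prediction combines its own state with attended
        # context, so distant frames can influence the current estimate.
        return self.head(torch.cat([h, ctx], dim=-1)), attn

# Example: a batch of 4 clips, 60 frames each, 20-dim feature vectors.
model = AttentiveLSTM(n_features=20)
av, attn = model(torch.randn(4, 60, 20))
print(av.shape, attn.shape)  # (4, 60, 2), (4, 60, 60)
```

Returning the attention matrix alongside the predictions supports the kind of attention map analysis described above, since each row shows which frames a given time step attends to. The reported metrics correspond to standard implementations such as sklearn.metrics.r2_score and scipy.stats.kendalltau.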