3-12 - Automatic Rank-ordering of Singing Vocals with Twin-neural Network
Chitralekha Gupta, Lin Huang, Haizhou Li
Keywords: Domain knowledge, Machine learning/Artificial intelligence for music, Applications, Music training and education, Evaluation, datasets, and reproducibility, Evaluation methodology, MIR fundamentals and methodology, Music signal processing, MIR tasks, Similarity metrics, Musical features and properties, Timbre, instrumentation, and voice
Abstract:
When making judgments, humans are known to be better at choosing a preferred option among a small number of options than at giving an absolute ranking of all the options. This preference-based rank-ordering method is called Best-Worst Scaling (BWS). Inspired by this concept, we propose a preference-based framework to generate a relative rank-ordering of singing vocals and, therefore, of singers. We adopt a twin-neural network (Siamese) that learns to choose the preferred candidate, in terms of singing quality, between two inputs. With a few such pairwise comparisons, this method generates a relative rank-order of a complete list of singers. Additionally, we incorporate a knowledge-based, musically relevant pitch histogram representation as a conditioning vector to provide explicit musical information to the network. The experiments show that this method is able to reliably evaluate singing quality and rank-order singing vocals, independent of the song or the singer. The results suggest that the twin-neural network learns the underlying discerning properties relevant to singing quality, instead of being specific to the content of a song or singer.
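As an illustration of how a handful of pairwise preferences can be turned into a full rank-order, the following minimal sketch aggregates comparison outcomes by win ratio. It is not the authors' implementation: the function name `rank_by_pairwise_wins`, the parameter `comparisons_per_item`, and the `prefer` callback (a stand-in for the trained twin-neural network) are all hypothetical, and win-ratio aggregation is only one simple way to combine pairwise decisions.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence


def rank_by_pairwise_wins(
    items: Sequence[Hashable],
    prefer: Callable[[Hashable, Hashable], bool],
    comparisons_per_item: int = 4,
    seed: int = 0,
) -> List[Hashable]:
    """Rank distinct items from a limited number of pairwise comparisons.

    `prefer(a, b)` is a hypothetical stand-in for the twin network: it
    returns True when `a` is judged the better singing rendition.
    """
    items = list(items)
    if len(items) < 2:
        return items

    rng = random.Random(seed)
    wins = defaultdict(int)    # comparisons won by each item
    counts = defaultdict(int)  # comparisons each item took part in

    for a in items:
        others = [x for x in items if x != a]
        k = min(comparisons_per_item, len(others))
        # Compare `a` against a random subset of the other items.
        for b in rng.sample(others, k):
            winner = a if prefer(a, b) else b
            wins[winner] += 1
            counts[a] += 1
            counts[b] += 1

    # Higher win ratio means a better relative rank.
    return sorted(items, key=lambda x: wins[x] / counts[x], reverse=True)


# Usage with made-up quality scores replacing the network's decision.
quality = {"s1": 0.2, "s2": 0.9, "s3": 0.5, "s4": 0.7}
ranking = rank_by_pairwise_wins(
    list(quality),
    prefer=lambda a, b: quality[a] > quality[b],
    comparisons_per_item=3,  # with 4 singers, this covers every pair
)
print(ranking)
```

When every pair is compared and the preferences are consistent, the win ratio recovers the underlying order exactly. With fewer comparisons per item, the result is an estimate whose reliability depends on how many pairs are sampled.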