Visualization of Deep Networks for Musical Instrument Recognition

Charis Cochran, Youngmoo Kim

Abstract: We present a visualization tool for convolutional neural networks (CNNs) focused on the task of instrument recognition. The tool renders the network's response to a specific input sample layer by layer, as an array of animated activation plots corresponding to the nodes, or filters, in the network. The recognition of instruments from audio, particularly in ensemble mixtures, remains a challenging and important problem fundamental to the field of music information retrieval (MIR). Early solutions focused heavily on designing task-specific input features. While these features were well defined, their performance falls far short of state-of-the-art deep learning approaches such as CNNs, multi-task learning, and transfer learning. The reported results of these black-box networks, however, generally focus on overall performance across a dataset and ignore disparities in performance across instrument classes, which may mask deeper issues with these approaches. Deep learning approaches of this kind have recently become the de facto standard for a wide variety of problems in MIR. Still, the feature representations these networks learn are not well understood in deep learning at large, and even less so for audio and spectrogram inputs. Our goal is to apply deep network and CNN analysis tools to the problem of predominant instrument recognition and to create an analysis tool that is widely applicable and useful for MIR-specific deep learning models.
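
To make the layer-by-layer visualization concrete, the sketch below shows one common way to extract and plot per-filter activations from a trained CNN given a spectrogram input. This is a minimal static illustration, not the authors' released tool (which animates the plots); the Keras model file, input file, and input shape are hypothetical placeholders.

```python
# A minimal sketch of layer-by-layer activation visualization for a
# CNN instrument-recognition model. Assumes a trained Keras model and
# a log-mel spectrogram excerpt; file names and shapes are hypothetical.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

model = tf.keras.models.load_model("instrument_cnn.h5")  # hypothetical path

# Build a probe model that returns the output of every conv layer.
conv_layers = [l for l in model.layers if "conv" in l.name]
probe = tf.keras.Model(inputs=model.input,
                       outputs=[l.output for l in conv_layers])

# x: one spectrogram excerpt, shape (1, n_mels, n_frames, 1) -- hypothetical.
x = np.load("excerpt_logmel.npy")[np.newaxis, ..., np.newaxis]
activations = probe.predict(x)

# Plot each layer as a grid of per-filter activation maps.
for layer, act in zip(conv_layers, activations):
    n_filters = act.shape[-1]
    cols = min(n_filters, 8)
    rows = int(np.ceil(n_filters / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for i, ax in enumerate(np.ravel(axes)):
        ax.axis("off")
        if i < n_filters:
            ax.imshow(act[0, :, :, i], aspect="auto", origin="lower")
    fig.suptitle(layer.name)
plt.show()
```

Animating these plots over successive time frames of the input, as the described tool does, amounts to re-rendering each activation map as the analysis window advances through the sample.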