Modeling Perception with Hierarchical Prediction: Auditory Segmentation with Deep Predictive Coding Locates Candidate Evoked Potentials in EEG

André Ofner, Sebastian Stober

Keywords: Domain knowledge, Machine learning/Artificial intelligence for music, Cognitive MIR, Representations of music, Human-centered MIR, Personalization, MIR fundamentals and methodology, Multimodality, Musical features and properties, Rhythm, beat, tempo

Abstract: The human response to music combines low-level expectations driven by the perceptual characteristics of the audio with high-level expectations arising from context and the listener's expertise. This paper discusses surprisal-based music representation learning with a hierarchical predictive neural network. To inspect the cognitive validity of the network's predictions across their time-scales, we use the network's prediction error to segment electroencephalograms (EEG) based on the audio signal. Using the NMED-T dataset of passive natural music listening, we explore the automatic segmentation of audio and EEG into events with the suggested model. By averaging the EEG signal only at the predicted locations, we were able to visualize auditory evoked potentials connected to local and global musical structures. This indicates the potential of unsupervised predictive learning with deep neural networks as a means to retrieve musical structure from audio and as a basis for uncovering the corresponding cognitive processes in the human brain.
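The pipeline implied by the abstract, detecting candidate events where the network's prediction error peaks and then averaging EEG epochs around those events in the manner of event-related potential (ERP) analysis, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the standard-deviation threshold, the refractory gap, and the epoch window are all hypothetical choices, and the prediction-error signal is assumed to have been resampled to the EEG sampling rate.

```python
import numpy as np

def detect_events(prediction_error, sfreq, threshold_sd=2.0, min_gap_s=0.1):
    """Mark candidate event onsets where the model's prediction error peaks.

    `prediction_error`: 1-D array of per-frame error, assumed resampled
    to the EEG sampling rate `sfreq` (Hz). A simple mean + k*SD threshold
    with a refractory gap stands in for whatever peak picking the paper uses.
    """
    threshold = prediction_error.mean() + threshold_sd * prediction_error.std()
    above = np.where(prediction_error > threshold)[0]
    events, last = [], -np.inf
    for idx in above:
        if idx - last >= min_gap_s * sfreq:  # enforce a minimum gap between events
            events.append(idx)
            last = idx
    return np.asarray(events, dtype=int)

def epoch_average(eeg, events, sfreq, tmin=-0.1, tmax=0.5):
    """Cut EEG epochs around each event and average them (ERP-style).

    `eeg`: (channels, samples) array. Returns the (channels, epoch_len)
    average over all epochs that fit fully inside the recording.
    """
    start, stop = int(tmin * sfreq), int(tmax * sfreq)
    epochs = [eeg[:, e + start:e + stop] for e in events
              if e + start >= 0 and e + stop <= eeg.shape[1]]
    return np.mean(epochs, axis=0)

# Hypothetical usage: `err` is the network's prediction error at one level
# of the hierarchy, `eeg` the co-registered recording at sfreq Hz.
# events = detect_events(err, sfreq=125.0)
# erp = epoch_average(eeg, events, sfreq=125.0)
```

The averaging step mirrors conventional ERP computation: if the predicted locations align with cognitively meaningful auditory events, the average should reveal evoked potentials rather than cancel out as noise, which is the validity check the abstract describes.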