3-14 - Data Quality Matters: Iterative Corrections on a Corpus of Mendelssohn String Quartets and Implications for MIR Analysis
Jacob deGroot-Maggetti, Timothy R de Reuse, Laurent Feisthauer, Samuel Howes, Yaolong Ju, Suzuka Kokubu, Sylvain Margot, Néstor Nápoles López, Finn Upham
Keywords: Evaluation, datasets, and reproducibility, Novel datasets and use cases, Applications, Music retrieval systems, Domain knowledge, Computational music theory and musicology, MIR fundamentals and methodology, Symbolic music processing, MIR tasks, Music transcription and annotation
Abstract:
In this paper, we describe a workflow of successive corrections to MusicXML files generated by Optical Music Recognition (OMR) and examine how those corrections affect the outputs of Music Information Retrieval (MIR) tasks. The original OMR-generated files of six Mendelssohn string quartets were first corrected by individual members of this interdisciplinary group, then reviewed by other members to standardize quality and align the files with the team's music-analysis priorities. Four MIR tasks are applied to each round of corrections on this collection: cadence detection, chord labeling, key finding, and monophonic pattern discovery. We measure changes in the outputs of these four tasks from one round of correction to the next in order to evaluate the impact of the corrections. Results show that expert revision is more beneficial to some MIR tasks than to others. The resulting corpus of curated MusicXML files is available as an open-source repository under a Creative Commons Attribution 4.0 International License for further MIR research.
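The round-to-round comparison can be illustrated with a minimal sketch. The abstract does not state which change measure is used, so the code below assumes a simple one: the fraction of aligned positions (for example, measures) whose label differs between two rounds. The function name `change_rate`, the round names, and the key labels are all hypothetical and are not taken from the corpus.

```python
from typing import Sequence


def change_rate(previous: Sequence[str], current: Sequence[str]) -> float:
    """Fraction of aligned positions whose label differs between two rounds.

    Assumption: both rounds are labeled at the same positions (e.g. one
    label per measure), so the sequences have equal length.
    """
    if len(previous) != len(current):
        raise ValueError("label sequences must cover the same positions")
    if not previous:
        return 0.0
    changed = sum(1 for a, b in zip(previous, current) if a != b)
    return changed / len(previous)


# Hypothetical per-measure key-finding output for three correction rounds.
rounds = {
    "omr": ["C", "C", "G", "G", "a", "C"],
    "corrected": ["C", "C", "G", "G", "C", "C"],
    "reviewed": ["C", "C", "G", "G", "C", "C"],
}

# Compare each round with the one that follows it.
names = list(rounds)
for prev, curr in zip(names, names[1:]):
    rate = change_rate(rounds[prev], rounds[curr])
    print(f"{prev} -> {curr}: {rate:.2f}")
```

A rate near zero between two rounds would suggest that the later round of correction had little effect on that task's output, which is the kind of task-by-task difference the abstract reports.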