© ACM, (2006). This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Communications of the ACM, Volume 49, Issue 8 (August 2006) http://doi.acm.org/10.1145/1145287.1145311
Abstract or Description
As with many other kinds of data, the past several decades have witnessed an explosion in the quantity and variety of music in computer-accessible form. There are primarily two kinds of “music data” one encounters today: sampled audio files, such as those found on compact discs or scattered over the web in various formats, and symbolic music representations, which essentially list the pitch, onset time, and duration of each note. To draw an analogy, music audio is to symbolic music as speech audio is to text. In both cases the audio representations capture the colorful expressive nuances of the performances, but are difficult to “understand” by anything other than a human listener. On the other hand, in both text and symbolic music the high-level “words” are parsimoniously stored and easily recognized.
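The contrast above can be made concrete. Below is a minimal sketch of a symbolic representation, assuming a hypothetical note-list format (the field names and units are illustrative, not a standard): each note carries a MIDI pitch number, an onset time in seconds, and a duration in seconds.

```python
# A toy symbolic score: explicit notes, unlike sampled audio.
# Fields (pitch as a MIDI number, times in seconds) are illustrative assumptions.
score = [
    {"pitch": 60, "onset": 0.0, "duration": 0.5},  # C4
    {"pitch": 64, "onset": 0.5, "duration": 0.5},  # E4
    {"pitch": 67, "onset": 1.0, "duration": 1.0},  # G4
]

# Because the "words" (notes) are explicit, queries are trivial,
# e.g. find every note sounding at t = 0.6 s:
sounding = [n for n in score
            if n["onset"] <= 0.6 < n["onset"] + n["duration"]]
```

The same query against a sampled waveform would require nontrivial signal analysis, which is precisely the gap machine listening tries to bridge.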
We focus here on a form of machine listening known as music score matching, score following, or score alignment. The goal is a correspondence between a symbolic music representation and an audio performance of the same music, identifying the onset times of all relevant musical “events” in the audio—usually notes. There are two different versions of the problem, usually called “off-line” and “on-line.”
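In the off-line setting, the whole performance is available before alignment begins, so a global method such as dynamic time warping (DTW) can be used. The sketch below is illustrative only, not the article's specific method: it aligns two feature sequences (here, bare pitch values standing in for real acoustic features) under an assumed frame-distance function.

```python
# A minimal off-line alignment sketch via dynamic time warping (DTW).
# The features and distance function are illustrative assumptions.

def dtw_path(score_feats, audio_feats, dist):
    """Return a monotone correspondence (list of (i, j) index pairs)."""
    n, m = len(score_feats), len(audio_feats)
    INF = float("inf")
    # cost[i][j]: cheapest alignment of the first i score frames
    # with the first j audio frames.
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(score_feats[i - 1], audio_feats[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # score frame repeats
                                 cost[i][j - 1],      # audio frame repeats
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the corner to recover the optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j],     i - 1, j),
                      (cost[i][j - 1],     i,     j - 1))
    path.reverse()
    return path

# Toy example: a 3-note "score" against a time-stretched "performance".
score_seq = [60, 64, 67]
performance = [60, 60, 64, 64, 64, 67]
path = dtw_path(score_seq, performance, dist=lambda a, b: abs(a - b))
# Each score note's onset is the first audio frame mapped to it.
onsets = {i: min(j for si, j in path if si == i) for i in range(len(score_seq))}
```

The on-line version is harder: the correspondence must be produced causally, as the audio arrives, without seeing the future frames that DTW's backtracking step relies on.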
Communications of the ACM, 49(8), 38-43.