Vol. 28 Issue 2 Reviews
International Conference on Music Information Retrieval 2003

Johns Hopkins University/Library of Congress, Baltimore, Maryland, USA, 26–30 October 2003

Reviewed by Alexandra L. Uitdenbogerd
Melbourne, Australia

ISMIR 2003, the fourth International Conference on Music Information Retrieval, was held October 26-30, 2003, in Baltimore, Maryland, USA. Since its inception in 2000, the conference has been a popular venue for technical papers on analyzing, storing, and retrieving music, whether it be in the form of MIDI files, audio recordings, or sheet music. The main applications of interest are content-based retrieval systems, music recommenders, classifiers, and transcribers. Other participants are interested in the use of metadata and the implementation of systems that allow on-line access to collections.

This year’s conference was sponsored by the Sheridan Libraries of Johns Hopkins University as well as the Library of Congress (LOC). Thus participants were able to experience an interesting tour of the LOC, and to hear works from the rare collection of 19th-century popular sheet music from the Lester S. Levy collection held by the Sheridan Libraries. In addition, a specially prepared concert at the Peabody Institute demonstrated a range of relatively accessible computer music, including the audio-visually entertaining 7 Cartoons by Maurice Wright and the beautiful Narcissus for solo flute by Thea Musgrave, sensitively performed by Peabody graduate Chia-Jui Lee. Other pieces stretched the capabilities of conventional instruments such as the piano, double bass, and trombone.

There were 23 peer-reviewed papers and 25 posters presented at this conference, with additional invited sessions and panels. Invited speaker Avery Wang amazed attendees with demonstrations of the Shazam audio search engine. Shazam’s index of local temporal features successfully identified recordings from very noisy environments. For the second year running, ISMIR included a successful tutorial program, including a return of the popular session on audio retrieval techniques by George Tzanetakis.

In this review article I discuss the results presented, grouped by type of application: first usability, then symbolic and audio-based retrieval work, followed by digital library issues and progress in the evaluation of Music Information Retrieval (MIR) systems.

User Issues
This year saw further research into user issues of content-based music retrieval, particularly the ability of users to construct queries, whether through singing or using a text-based representation. Through Roger B. Dannenberg et al.’s work we discovered that only half of users’ sung queries resembled the target piece of music, many being jumbled-up fragments of the original. Steffen Pauws showed that absolute pitch is unlikely to be preserved in sung queries: only recently heard songs repeated by trained singers were likely to be sung at the original pitch. Tempo and contour, however, remain the most reliable aspects of users’ singing. Eliciting a user’s query as a string of symbols was less successful (Alexandra L. Uitdenbogerd and Yaw-Wah Yap), with non-musicians having no success at all in constructing contour or numeric representations of a simple melody.
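As a concrete illustration of the kind of text-based query representation studied here, a melody can be reduced to a contour string in the style of Parsons code. This is a generic sketch of the representation, not code from the paper; the function name and encoding letters are illustrative conventions:

```python
def parsons_contour(pitches):
    """Encode a melody's contour as a string of symbols:
    U (up), D (down), R (repeat) for each successive pitch pair.
    Pitches are given as MIDI note numbers."""
    return "".join(
        "U" if b > a else "D" if b < a else "R"
        for a, b in zip(pitches, pitches[1:])
    )

# A short melody such as C4 D4 D4 B3 (MIDI 60, 62, 62, 59)
# yields the contour string "URD".
```

Constructing exactly this kind of symbol string by hand is the task that proved so difficult for the non-musicians in the study.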

In other work, researchers of the Musical Audio Mining (MAMI) project (Micheline Lesaffre et al.) showed that there were differences in user queries based on gender, musical experience, and age.

The main interest of many researchers at ISMIR is the technology required for building successful content-based music retrieval systems. The problem is approached in several ways, using different types of data. In the symbolic realm, researchers work with MIDI files or notation-based data formats and develop matching algorithms, indexes, or front ends for query-by-humming systems. In the audio domain there is now a broadening of goals: retrieval and classification are based on features representing genre, mood, or exact recording identity, or on whatever evidence can be extracted about notes or tonality. Progress in the difficult task of transcription would bring the symbolic and audio techniques together, but this appears to be a sufficiently hard problem that more years of work are required before that goal can be reached.

Another issue of concern is the ability to compare the techniques developed by different researchers so that it is possible to determine what works best. I discuss these applications and issues in the light of work presented at ISMIR 2003.

Symbolic Retrieval
Most symbolic retrieval work presented was based on MIDI data. Techniques ranged from dynamic programming and n-gram (substring of length “n”) string matching to the ever more prevalent Hidden Markov Model approach.
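As a sketch of the n-gram approach (generic, not any particular presented system), one can index n-grams of pitch intervals extracted from MIDI note sequences; using intervals rather than absolute pitches makes matching transposition-invariant. All names below are illustrative:

```python
from collections import Counter

def interval_ngrams(pitches, n=3):
    """Convert a MIDI pitch sequence to pitch intervals,
    then collect all interval n-grams with their counts."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return Counter(tuple(intervals[i:i + n])
                   for i in range(len(intervals) - n + 1))

def ngram_score(query, candidate, n=3):
    """Rank a candidate by how many interval n-grams it shares
    with the query (multiset intersection of n-gram counts)."""
    q, c = interval_ngrams(query, n), interval_ngrams(candidate, n)
    return sum(min(q[g], c[g]) for g in q)

theme = [60, 62, 64, 65, 67]          # C D E F G
transposed = [p + 5 for p in theme]   # the same melody a fourth higher
```

Because the n-grams are built from intervals, the transposed copy scores exactly as well against the query as the original does.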

Colin Meek and William P. Birmingham showed that comprehensive modeling of user errors, including both cumulative and local pitch errors, worked best for their query-by-singing system. In matching, they sensibly include the alignment constraint that the sung query must be found entirely within the potential answer, which should improve results slightly.
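The alignment constraint can be illustrated with a standard dynamic-programming recurrence in which the whole query must match some contiguous region of the candidate: the match may begin and end anywhere in the candidate at no cost, but no part of the query may be left out for free. This is a generic approximate-substring-matching sketch, not Meek and Birmingham's actual error model:

```python
def substring_edit_distance(query, target):
    """Minimal edit distance between the query and ANY substring
    of the target: the start and end positions in the target are
    free, but the entire query must be accounted for."""
    m, n = len(query), len(target)
    prev = [0] * (n + 1)              # row 0: free start anywhere in target
    for i in range(1, m + 1):
        curr = [i] + [0] * n          # skipping query symbols still costs
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == target[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match / substitute
                          prev[j] + 1,         # symbol missing from target
                          curr[j - 1] + 1)     # extra symbol in target
        prev = curr
    return min(prev)                  # free end anywhere in target
```

With this formulation a short sung fragment that occurs cleanly in the middle of a long piece still scores a distance of zero, which is exactly the behavior the constraint is meant to provide.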

Audio-Based Retrieval and Classification
For the first time this year there was work that specifically targeted classification according to mood. Mood-based classification is likely to be useful both for recommender systems (Alexandra L. Uitdenbogerd and Ron van Schyndel, ISMIR 2002) and as an attribute for music retrieval. Dan Liu, Lie Lu, and Hong-Jiang Zhang made use of Thayer’s simple two-dimensional model of mood in training their audio classifier. As mood often varies during a piece of music, the researchers also built a mood tracker. Other researchers classified music by lead singer in a collection of Mandarin pop songs (Wei-Ho Tsai et al.), and by dance genre in ballroom dance music (Simon Dixon et al.).
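Thayer's model places moods in a two-dimensional space, so a minimal classifier need only decide which quadrant a feature vector falls into, and a tracker can repeat that decision frame by frame. The sketch below assumes arousal and valence scores have already been extracted from the audio and centered on zero; the axis names and mood labels are illustrative rather than the authors' exact terminology:

```python
def thayer_quadrant(arousal, valence):
    """Map a point in a two-dimensional mood space
    (arousal vs. valence) to one of four quadrant labels."""
    if arousal >= 0:
        return "exuberant" if valence >= 0 else "anxious"
    return "content" if valence >= 0 else "depressed"

def track_mood(frames):
    """A toy mood tracker: label each analysis frame independently,
    giving a mood sequence over the course of a piece."""
    return [thayer_quadrant(a, v) for a, v in frames]
```

A real tracker would smooth the frame-by-frame labels to avoid spurious mood changes, but the quadrant decision itself is this simple.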

Martin F. McKinney and Jeroen Breebaart demonstrated some careful analysis of individual audio features in terms of their ability to predict genre. In particular they found that temporal modulation is a good feature, with the use of audio perception-based models being the most effective. This work should be of great value to those working in the audio classification field.

Transcription, Alignment, and Analysis
Other areas of work covered by the conference included alignment of scores (MIDI) with audio, harmonic labeling, and instrument identification. In a more novel application, Olivier K. Gillet and Gaël Richard labeled tabla audio recordings with the syllables representing the different ways of hitting the instrument. Also represented at the conference was work on musical pattern discovery (Olivier Lartillot).

Evaluation of MIR Systems
The problem of evaluating music retrieval systems for effectiveness has been an ongoing concern since 2000, with discussions and panels devoted to it each year. Owing to intellectual property issues, researchers are unable to share their collections, making it difficult to compare results. A model is being developed that allows people to test their systems remotely, without the musical data ever being transferred to their home systems. This still creates some difficulties for the development cycle, as developers often need to listen to tracks in order to debug or refine programs, but it appears to be a useful first step. Other stopgap ideas include the sharing of derived audio features such as mel-frequency cepstral coefficients. Funding has been secured and a collection (NAXOS) made available for this endeavor, and the organizers believe the initial system should be running by 2004.

Meanwhile, evaluation of systems is growing from the ground up. Teams at the University of Michigan and Carnegie Mellon University have collaborated in order to produce a test bed capable of allowing system comparison for effectiveness.

Another related issue in evaluation is the availability of user queries. Much thought has gone into the required metadata fields for these, as many types of data need to be captured, both for research into user issues and for testing the effectiveness of systems. The MAMI project has made its annotated database of vocal queries available via the Internet (www.ipem.ugent.be/MAMI), possibly the only legally sharable collection currently accessible for research.

Digital Music Libraries and Their Issues
Keynote speaker Anthony Seeger gave us a realistic view of the proportion of music collections that can legally and ethically be made available to the public, as well as the effort required in order for this to happen at all. Many cultural issues indicate that sensitivity is needed in the handling of some materials.

Various online collections were demonstrated or discussed during the conference, including the Library of Congress music collection, and a system developed for Danish music libraries. Several systems involve allocating payment to rights-holders for downloads.

The ISMIR series of conferences has gathered together a thriving group of interested researchers with a wide range of backgrounds, yet with the common goal of improving our abilities to find the music we’re interested in. While we haven’t yet, as a community, quite solved the evaluation issue, there was much of interest and value at this year’s conference. Apart from the sights and sounds in Washington and Baltimore, and the renewal of old friendships, I was pleased to see useful results that can be built on, such as that by Martin F. McKinney and Jeroen Breebaart, and much of the user query research. The presentations are available online at the conference Web site (ismir2003.ismir.net/presentations.html). In 2004, ISMIR returns to Europe, and will be held in Barcelona, Spain.