Johns Hopkins University/Library of Congress, Baltimore,
Maryland, USA, 26-30 October, 2003
Reviewed by Alexandra L. Uitdenbogerd
ISMIR 2003, the fourth International Conference on Music Information
Retrieval, was held October 26-30, 2003, in Baltimore, Maryland, USA.
Since its inception
in 2000, the conference has been a popular venue for technical papers
on analyzing, storing, and retrieving music, whether it be in the form
of MIDI files, audio recordings, or sheet music. The main applications of
interest are content-based retrieval systems, music recommenders, classifiers,
and transcribers. Other participants are interested in the use of metadata
and the implementation of systems that allow on-line access to collections.
This year’s conference was sponsored by the Sheridan
Libraries of Johns Hopkins University as well as the Library of Congress.
Participants were able to take an interesting tour of the LOC,
and to hear rare 19th-century popular sheet music
from the Lester S. Levy collection held by the Sheridan Libraries.
In addition, a specially prepared concert at the Peabody Institute demonstrated
a range of relatively accessible computer music, including the audio-visually
entertaining 7 Cartoons by Maurice Wright and the beautiful Narcissus
for solo flute by Thea Musgrave, sensitively performed by Peabody graduate Chia-Jui
Lee. Other pieces stretched the capabilities of conventional instruments
such as the piano, double bass, and trombone.
There were 23 peer-reviewed papers and 25 posters presented
at this conference, with additional invited sessions and panels. Invited
speaker Avery Wang
amazed attendees with demonstrations of the Shazam audio search engine.
Shazam’s index of local temporal features successfully identified
recordings from very noisy environments. For the second year running,
ISMIR ran a successful tutorial program, including a return
of the popular session on audio retrieval techniques by George Tzanetakis.
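Wang's landmark-fingerprinting idea, presented at the conference, pairs nearby spectrogram peaks into hashes and confirms a match by a consistent time offset between query and recording. A minimal sketch of that idea over toy (time, frequency) peak lists (peak picking from real audio is omitted, and the data and parameters below are illustrative, not Shazam's):

```python
from collections import defaultdict

def landmarks(peaks, fan_out=3):
    """Pair each spectral peak with the next few peaks to form
    (f1, f2, dt) hashes, each tagged with its anchor time."""
    peaks = sorted(peaks)                      # (time, freq) tuples
    out = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            out.append(((f1, f2, t2 - t1), t1))
    return out

def build_index(tracks):
    """Invert hashes into hash -> list of (track_id, anchor_time)."""
    index = defaultdict(list)
    for tid, peaks in tracks.items():
        for h, t in landmarks(peaks):
            index[h].append((tid, t))
    return index

def match(index, query_peaks):
    """Vote for (track, time_offset) pairs; the true track accumulates
    many votes at one consistent offset even from a noisy excerpt."""
    votes = defaultdict(int)
    for h, qt in landmarks(query_peaks):
        for tid, t in index.get(h, []):
            votes[(tid, t - qt)] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None

# toy database of two "recordings"; the query is an excerpt of track "a"
# shifted to start at time 0, as if captured partway through the song
tracks = {"a": [(0, 50), (1, 70), (2, 60), (3, 80), (4, 55)],
          "b": [(0, 10), (1, 20), (2, 30), (3, 40), (4, 50)]}
index = build_index(tracks)
query = [(t - 1, f) for t, f in tracks["a"][1:]]
best = match(index, query)
print(best)  # the winner is track "a" at offset 1
```

Because each hash is local in time, a short and noisy excerpt still casts many votes for the true recording at a single offset, which is what makes this style of index robust.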
In this review article I discuss the results presented,
grouped by type of application: first usability, then symbolic
retrieval work, followed by digital library issues, and finally progress
in evaluation of Music Information Retrieval (MIR) systems.
This year saw further research into user issues of content-based
music retrieval, particularly the ability of users to construct
queries, whether through singing or using a text-based representation.
In Dannenberg et al.’s work we discovered that only half of users’ sung
queries resembled the target piece of music, many being jumbled up fragments
of the original piece. Steffen Pauws showed us that absolute pitch is unlikely
to be used in sung queries, as only recently heard songs repeated by trained
singers were likely to be at the original pitch. Tempo and contour, however,
remain the most reliable aspects of user singing performance. Eliciting
a user’s query in the form of a string of symbols was less successful
(Alexandra L. Uitdenbogerd and Yaw-Wah Yap), with non-musicians having
no success at all in constructing contour or numeric representations of
a simple melody.
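A contour representation of the kind these participants were asked to produce reduces a melody to the direction of each pitch step. A minimal illustration (the three-symbol U/D/S alphabet is one common convention, not necessarily the exact one used in the study):

```python
def contour(pitches):
    """Reduce a MIDI pitch sequence to its melodic contour:
    U (up), D (down), or S (same) for each consecutive pair."""
    return "".join("U" if b > a else "D" if b < a else "S"
                   for a, b in zip(pitches, pitches[1:]))

# opening of "Twinkle, Twinkle" as MIDI note numbers: C4 C4 G4 G4 A4 A4 G4
print(contour([60, 60, 67, 67, 69, 69, 67]))  # -> SUSUSD
```

Even this simple encoding defeated the non-musicians in the study, which is why sung input remains attractive despite its own error patterns.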
In other work, researchers of the Musical Audio Mining
(MAMI) project (Micheline Lesaffre et al.) showed that there were
differences in user queries based
on gender, musical experience, and age.
The main interest of many researchers at ISMIR is the technology
required for building successful content-based music retrieval systems.
The problem is approached in several ways, using different types
of data. In the symbolic realm, researchers work with MIDI files
or notation-based data to develop matching algorithms, indexes,
or front-ends for query-by-humming systems. In the audio domain
there is now a broadening of goals: retrieval and classification
are based on features representing anything from an exact recording
match to whatever evidence can be gained about notes or tonality.
Progress in the difficult task of transcription would bring
the symbolic and audio techniques together, but this appears
to be so hard a problem that more years of work are required before
that goal can be reached.
Another issue of concern is the ability to compare the
techniques developed by different researchers so that it is possible
to determine what works
best. I discuss these applications and issues in the light
of work presented at ISMIR 2003.
Symbolic Retrieval
Most symbolic retrieval work presented was based on MIDI
data. Techniques ranged from dynamic programming and n-gram
(substrings of length n)
string matching to the ever more prevalent Hidden Markov Model approach.
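As a sketch of the n-gram style of symbolic matching, the snippet below indexes n-grams of pitch intervals rather than absolute pitches, so that transposed queries still match; the counting scheme is illustrative, not any particular paper's:

```python
from collections import Counter

def interval_ngrams(pitches, n=3):
    """n-grams over successive pitch intervals, which makes matching
    invariant to transposition."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return Counter(tuple(intervals[i:i + n])
                   for i in range(len(intervals) - n + 1))

def ngram_score(query, piece, n=3):
    """Count the n-gram occurrences shared between query and piece."""
    q, p = interval_ngrams(query, n), interval_ngrams(piece, n)
    return sum(min(count, p[gram]) for gram, count in q.items())

piece = [60, 62, 64, 65, 67, 65, 64, 62, 60]   # an up-and-down scale figure
query = [72, 74, 76, 77, 79]                   # its opening, an octave higher
print(ngram_score(query, piece))  # -> 2
```

In a real system the n-grams would be keys into an inverted index over the whole collection, so candidate pieces can be found without scanning every melody.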
Colin Meek and William P. Birmingham showed that a comprehensive
modeling of user errors, including both cumulative
and local pitch errors, worked
best for their query-by-singing system. In matching,
they sensibly include an alignment constraint that the
query must be found entirely within
the potential answer, which should help to improve retrieval effectiveness.
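Meek and Birmingham's model itself is probabilistic, but the containment constraint can be illustrated with a simple dynamic-programming "fitting" alignment, in which the candidate's leading and trailing symbols are free while the entire query must be consumed (the scoring values below are illustrative):

```python
def fit_align(query, target, match=1, mismatch=-1, gap=-1):
    """Best score for aligning ALL of `query` inside `target`:
    the target's start and end are free, internal gaps are penalised."""
    m, n = len(query), len(target)
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        best[i][0] = best[i - 1][0] + gap   # skipping query symbols costs
    # row 0 stays 0: the query may start anywhere in the target
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            best[i][j] = max(best[i - 1][j - 1] + s,  # align the symbols
                             best[i - 1][j] + gap,    # gap in the target
                             best[i][j - 1] + gap)    # gap in the query
    return max(best[m])  # the query may end anywhere in the target

# a short contour query found intact inside a longer candidate melody
print(fit_align("UUD", "SDUUDSS"))  # -> 3
```

Without the free ends, a short query aligned against a whole song would be swamped by gap penalties; without the containment requirement, trivial partial matches would score too well.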
Audio-Based Retrieval and Classification
For the first time this year there was work that specifically
targeted the problem of classification according
to mood. Mood-based classification
is likely to be useful both for recommender systems
(Alexandra L. Uitdenbogerd and Ron van Schyndel,
ISMIR 2002) and
as an attribute for
retrieval. Dan Liu, Lie Lu, and Hong-Jiang Zhang
made use of Thayer’s simple
two-dimensional model for mood in training their audio classifier. As mood
often varies during a piece of music, the researchers built a mood tracker.
Other researchers classified music based on the lead singer in a collection
of Mandarin pop songs (Wei-Ho Tsai et al.), and on dance genres of ballroom
dance music (Simon Dixon et al.).
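Thayer's model places mood on two continuous axes, commonly read as arousal (energy) and valence (how pleasant or stressful the music feels). A toy sketch of quadrant labelling with frame-by-frame tracking (the labels, thresholds, and feature values are illustrative, not Liu, Lu, and Zhang's):

```python
def thayer_quadrant(arousal, valence):
    """Map a point in Thayer's two-dimensional space to a mood label.
    Axes are centred at 0; the quadrant names are one common reading."""
    if arousal >= 0:
        return "exuberance" if valence >= 0 else "anxious"
    return "contentment" if valence >= 0 else "depression"

def track_mood(frames):
    """A crude mood tracker: label each analysis frame, then merge
    consecutive identical labels into (label, length) segments."""
    segments = []
    for arousal, valence in frames:
        label = thayer_quadrant(arousal, valence)
        if segments and segments[-1][0] == label:
            segments[-1][1] += 1
        else:
            segments.append([label, 1])
    return [tuple(s) for s in segments]

# pretend per-frame (arousal, valence) estimates from an audio classifier
frames = [(0.8, 0.5), (0.7, 0.4), (-0.6, 0.3), (-0.5, -0.2)]
print(track_mood(frames))
```

Segmenting the label stream, rather than reporting one label per piece, is what lets a tracker follow mood changes within a single piece of music.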
Martin F. McKinney and Jeroen Breebaart demonstrated
some careful analysis of individual audio features
in terms of their ability
to predict genre.
In particular they found that temporal modulation
is a good feature, and that audio perception-based
features show promise. Their work should be of great value to those working
in the audio classification field.
Transcription, Alignment, and Analysis
Other areas of work covered by the conference included
alignment of scores (MIDI) with audio, harmonic
labeling, and instrument
identification. In a more novel application,
Olivier K. Gillet and Gaël Richard labeled
tabla audio recordings with the syllables representing the different ways
of hitting the instrument. Also represented at the conference was work
on musical pattern discovery (Olivier Lartillot).
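The score-to-audio alignment work mentioned above is commonly cast as dynamic time warping over per-frame features such as chroma vectors. A minimal sketch over one-dimensional toy features (real systems use richer features, but the recurrence is the same):

```python
def dtw(score_feats, audio_feats):
    """Dynamic time warping: the monotonic alignment of two feature
    sequences that minimises total frame-to-frame distance."""
    m, n = len(score_feats), len(audio_feats)
    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(score_feats[i - 1] - audio_feats[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # both advance
                                 cost[i - 1][j],      # audio frame reused
                                 cost[i][j - 1])      # score frame held
    # backtrack to recover which score frame maps to which audio frame
    path, i, j = [], m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda ij: cost[ij[0]][ij[1]])
    return cost[m][n], path[::-1]

# toy example: the performance holds the second note twice as long
score = [1.0, 2.0, 3.0]
audio = [1.0, 2.0, 2.0, 3.0]
total, path = dtw(score, audio)
print(total, path)  # -> 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```

The recovered path maps score note 1 to two audio frames, which is exactly the timing elasticity that makes alignment useful for following a live performance against its MIDI score.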
The problem of evaluating music retrieval systems
for effectiveness has been an ongoing concern since the
year 2000, with discussions
devoted to the problem each year. Due to
intellectual property issues, researchers are
unable to share
their collections with one another,
making it difficult to compare results. A model
is being developed that allows people to remotely
run experiments on centrally held data from their home systems. This still creates
some difficulties for the development cycle,
as researchers often need to
listen to tracks
to debug or refine programs, but it appears to
be a useful first step. Stopgap ideas were also discussed,
such as sharing derived features like mel-frequency
cepstral coefficients rather than the audio itself. Funding has been secured,
a collection (NAXOS) has been made available,
and the organizers
believe the initial
system should be running by 2004.
Meanwhile, evaluation of systems is growing
from the ground up. University teams have
collaborated to produce a test bed
capable of allowing system comparison.
Another related issue in evaluation is the
availability of user queries. Much thought
has gone into the design of query collection,
as there are many types of data that need
to be captured, both for research into user
behaviour and for measuring the effectiveness of systems.
The MAMI project has made its annotated database
of vocal queries available on
the Internet (www.ipem.ugent.be/MAMI), possibly
the only legally sharable collection currently
accessible for research.
Digital Music Libraries and their Issues
Keynote speaker Anthony Seeger gave us
a realistic view of the proportion of
recorded material that can legally
and ethically be made available
to the public, as well as the effort
required for this
to happen at all. Many cultural issues
mean that sensitivity is
needed in the
handling of some materials.
Various online collections were demonstrated
or discussed during the conference,
including those of the Library of Congress
and a system developed
for Danish music libraries. Several
of these systems involve allocating payment to rights-holders.
The ISMIR series of conferences has
gathered together a thriving group
of researchers from a wide range
of disciplines with the common goal of improving our
ability to find the music we’re interested
in. While we haven’t yet, as a community, quite solved the evaluation
issue, there was much of interest and value at this year’s conference.
Apart from the sights and sounds in Washington and Baltimore, and the renewal
of old friendships, I was pleased to see useful results that can be built
on, such as that by Martin F. McKinney and Jeroen Breebaart, and much of
the user query research. The presentations are available online at the
conference Web site (ismir2003.ismir.net/presentations.html). In 2004,
ISMIR returns to Europe, and will be held in Barcelona, Spain.