Johns Hopkins University/Library of Congress, Baltimore,
Maryland, USA, 26-30 October, 2003
Reviewed by Alexandra L. Uitdenbogerd
ISMIR 2003, the fourth International Conference on Music Information
Retrieval, was held October 26-30, 2003, in Baltimore, Maryland, USA.
Since its inception
in 2000, the conference has been a popular venue for technical papers
on analyzing, storing, and retrieving music, whether it be in the form
of MIDI files, audio recordings, or sheet music. The main applications of
interest are content-based retrieval systems, music recommenders, classifiers,
and transcribers. Other participants are interested in the use of metadata
and the implementation of systems that allow on-line access to collections.
This year’s conference was sponsored by the Sheridan
Libraries of Johns Hopkins University as well as the Library of Congress.
Participants were able to take an interesting tour of the LOC,
and to hear rare 19th-century popular sheet music
from the Lester S. Levy collection held by the Sheridan Libraries.
In addition, a specially prepared concert at the Peabody Institute demonstrated
a range of relatively accessible computer music, including the audio-visually
entertaining 7 Cartoons by Maurice Wright and the beautiful Narcissus
for solo flute by Thea Musgrave, sensitively performed by Peabody graduate Chia-Jui
Lee. Other pieces stretched the capabilities of conventional instruments
such as the piano, double bass, and trombone.
There were 23 peer-reviewed papers and 25 posters presented
at this conference, with additional invited sessions and panels. Invited
speaker Avery Wang
amazed attendees with demonstrations of the Shazam audio search engine.
Shazam’s index of local temporal features successfully identified
recordings from very noisy environments. For the second year running,
ISMIR ran a successful tutorial program, including a return
of the popular session on audio retrieval techniques by George Tzanetakis.
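Wang's landmark-fingerprinting idea, presented at the conference, pairs nearby spectrogram peaks into hashes and confirms a match by a consistent time offset between query and recording. A minimal sketch of that idea over toy (time, frequency) peak lists (peak picking from real audio is omitted, and the data and parameters below are illustrative, not Shazam's):

```python
from collections import defaultdict

def landmarks(peaks, fan_out=3):
    """Pair each spectral peak with the next few peaks to form
    (f1, f2, dt) hashes, each tagged with its anchor time."""
    peaks = sorted(peaks)                      # (time, freq) tuples
    out = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            out.append(((f1, f2, t2 - t1), t1))
    return out

def build_index(tracks):
    """Invert hashes into hash -> list of (track_id, anchor_time)."""
    index = defaultdict(list)
    for tid, peaks in tracks.items():
        for h, t in landmarks(peaks):
            index[h].append((tid, t))
    return index

def match(index, query_peaks):
    """Vote for (track, time_offset) pairs; the true track accumulates
    many votes at one consistent offset even from a noisy excerpt."""
    votes = defaultdict(int)
    for h, qt in landmarks(query_peaks):
        for tid, t in index.get(h, []):
            votes[(tid, t - qt)] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None

# toy database of two "recordings"; the query is an excerpt of track "a"
# shifted to start at time 0, as if captured partway through the song
tracks = {"a": [(0, 50), (1, 70), (2, 60), (3, 80), (4, 55)],
          "b": [(0, 10), (1, 20), (2, 30), (3, 40), (4, 50)]}
index = build_index(tracks)
query = [(t - 1, f) for t, f in tracks["a"][1:]]
best = match(index, query)
print(best)  # the winner is track "a" at offset 1
```

Because each hash is local in time, a short and noisy excerpt still casts many votes for the true recording at a single offset, which is what makes this style of index robust.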
In this review article I discuss the results presented,
grouped by type of application: first usability, then symbolic
retrieval work, followed by digital library issues, and finally progress
in evaluation of Music Information Retrieval (MIR) systems.
This year saw further research into user issues of content-based
music retrieval, particularly the ability of users to construct
queries, whether through singing or using a text-based representation.
In Dannenberg et al.’s work we discovered that only half of users’ sung
queries resembled the target piece of music, many being jumbled up fragments
of the original piece. Steffen Pauws showed us that absolute pitch is unlikely
to be used in sung queries, as only recently heard songs repeated by trained
singers were likely to be at the original pitch. Tempo and contour, however,
remain the most reliable aspects of user singing performance. Eliciting
a user’s query in the form of a string of symbols was less successful
(Alexandra L. Uitdenbogerd and Yaw-Wah Yap), with non-musicians having
no success at all in constructing contour or numeric representations of
a simple melody.
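A contour representation of the kind these participants were asked to produce reduces a melody to the direction of each pitch step. A minimal illustration (the three-symbol U/D/S alphabet is one common convention, not necessarily the exact one used in the study):

```python
def contour(pitches):
    """Reduce a MIDI pitch sequence to its melodic contour:
    U (up), D (down), or S (same) for each consecutive pair."""
    return "".join("U" if b > a else "D" if b < a else "S"
                   for a, b in zip(pitches, pitches[1:]))

# opening of "Twinkle, Twinkle" as MIDI note numbers: C4 C4 G4 G4 A4 A4 G4
print(contour([60, 60, 67, 67, 69, 69, 67]))  # -> SUSUSD
```

Even this simple encoding defeated the non-musicians in the study, which is why sung input remains attractive despite its own error patterns.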
In other work, researchers of the Musical Audio Mining
(MAMI) project (Micheline Lesaffre et al.) showed that there were
differences in user queries based
on gender, musical experience, and age.
The main interest of many researchers at ISMIR is the technology
required for building successful content-based music retrieval systems.
The problem is approached in several ways, using different types
of data. In the symbolic realm, researchers work with MIDI files
or notation-based data to develop matching algorithms, indexes,
or front-ends for query-by-humming systems. In the audio domain
there is now a broadening of goals: retrieval and classification
are based on features representing anything from an exact recording
match to whatever evidence can be gained about notes or tonality.
Progress in the difficult task of transcription would bring
the symbolic and audio techniques together, but this appears
to be so hard a problem that more years of work are required before
that goal can be reached.
Another issue of concern is the ability to compare the
techniques developed by different researchers so that it is possible
to determine what works
best. I discuss these applications and issues in the light
of work presented at ISMIR 2003.
Symbolic Retrieval
Most symbolic retrieval work presented was based on MIDI
data. Techniques ranged from dynamic programming and n-gram
(substrings of length n)
string matching to the ever more prevalent Hidden Markov Model approach.
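As a sketch of the n-gram style of symbolic matching, the snippet below indexes n-grams of pitch intervals rather than absolute pitches, so that transposed queries still match; the counting scheme is illustrative, not any particular paper's:

```python
from collections import Counter

def interval_ngrams(pitches, n=3):
    """n-grams over successive pitch intervals, which makes matching
    invariant to transposition."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return Counter(tuple(intervals[i:i + n])
                   for i in range(len(intervals) - n + 1))

def ngram_score(query, piece, n=3):
    """Count the n-gram occurrences shared between query and piece."""
    q, p = interval_ngrams(query, n), interval_ngrams(piece, n)
    return sum(min(count, p[gram]) for gram, count in q.items())

piece = [60, 62, 64, 65, 67, 65, 64, 62, 60]   # an up-and-down scale figure
query = [72, 74, 76, 77, 79]                   # its opening, an octave higher
print(ngram_score(query, piece))  # -> 2
```

In a real system the n-grams would be keys into an inverted index over the whole collection, so candidate pieces can be found without scanning every melody.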
Colin Meek and William P. Birmingham showed that a comprehensive
modeling of user errors, including both cumulative
and local pitch errors, worked
best for their query-by-singing system. In matching,
they sensibly include an alignment constraint that the
query must be found entirely within
the potential answer, which should help to improve retrieval effectiveness.
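Meek and Birmingham's model itself is probabilistic, but the containment constraint can be illustrated with a simple dynamic-programming "fitting" alignment, in which the candidate's leading and trailing symbols are free while the entire query must be consumed (the scoring values below are illustrative):

```python
def fit_align(query, target, match=1, mismatch=-1, gap=-1):
    """Best score for aligning ALL of `query` inside `target`:
    the target's start and end are free, internal gaps are penalised."""
    m, n = len(query), len(target)
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        best[i][0] = best[i - 1][0] + gap   # skipping query symbols costs
    # row 0 stays 0: the query may start anywhere in the target
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            best[i][j] = max(best[i - 1][j - 1] + s,  # align the symbols
                             best[i - 1][j] + gap,    # gap in the target
                             best[i][j - 1] + gap)    # gap in the query
    return max(best[m])  # the query may end anywhere in the target

# a short contour query found intact inside a longer candidate melody
print(fit_align("UUD", "SDUUDSS"))  # -> 3
```

Without the free ends, a short query aligned against a whole song would be swamped by gap penalties; without the containment requirement, trivial partial matches would score too well.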
Audio-Based Retrieval and Classification
For the first time this year there was work that specifically
targeted the problem of classification according
to mood. Mood-based classification
is likely to be useful both for recommender systems
(Alexandra L. Uitdenbogerd and Ron van Schyndel,
ISMIR 2002) and
as an attribute for
retrieval. Dan Liu, Lie Lu, and Hong-Jiang Zhang
made use of Thayer’s simple
two-dimensional model for mood in training their audio classifier. As mood
often varies during a piece of music, the researchers built a mood tracker.
Other researchers classified music based on the lead singer in a collection
of Mandarin pop songs (Wei-Ho Tsai et al.), and on dance genres of ballroom
dance music (Simon Dixon et al.).
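Thayer's model places mood on two continuous axes, commonly read as arousal (energy) and valence (how pleasant or stressful the music feels). A toy sketch of quadrant labelling with frame-by-frame tracking (the labels, thresholds, and feature values are illustrative, not Liu, Lu, and Zhang's):

```python
def thayer_quadrant(arousal, valence):
    """Map a point in Thayer's two-dimensional space to a mood label.
    Axes are centred at 0; the quadrant names are one common reading."""
    if arousal >= 0:
        return "exuberance" if valence >= 0 else "anxious"
    return "contentment" if valence >= 0 else "depression"

def track_mood(frames):
    """A crude mood tracker: label each analysis frame, then merge
    consecutive identical labels into (label, length) segments."""
    segments = []
    for arousal, valence in frames:
        label = thayer_quadrant(arousal, valence)
        if segments and segments[-1][0] == label:
            segments[-1][1] += 1
        else:
            segments.append([label, 1])
    return [tuple(s) for s in segments]

# pretend per-frame (arousal, valence) estimates from an audio classifier
frames = [(0.8, 0.5), (0.7, 0.4), (-0.6, 0.3), (-0.5, -0.2)]
print(track_mood(frames))
```

Segmenting the label stream, rather than reporting one label per piece, is what lets a tracker follow mood changes within a single piece of music.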
Martin F. McKinney and Jeroen Breebaart demonstrated
some careful analysis of individual audio features
in terms of their ability
to predict genre.
In particular they found that temporal modulation
is a good feature, and that audio perception-based
features show promise. Their work should be of great value to those working
in the audio classification field.
Transcription, Alignment, and Analysis
Other areas of work covered by the conference included
alignment of scores (MIDI) with audio, harmonic
labeling, and instrument
identification. In a more novel application,
Olivier K. Gillet and Gaël Richard labeled
tabla audio recordings with the syllables representing the different ways
of hitting the instrument. Also represented at the conference was work
on musical pattern discovery (Olivier Lartillot).
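The score-to-audio alignment work mentioned above is commonly cast as dynamic time warping over per-frame features such as chroma vectors. A minimal sketch over one-dimensional toy features (real systems use richer features, but the recurrence is the same):

```python
def dtw(score_feats, audio_feats):
    """Dynamic time warping: the monotonic alignment of two feature
    sequences that minimises total frame-to-frame distance."""
    m, n = len(score_feats), len(audio_feats)
    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(score_feats[i - 1] - audio_feats[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # both advance
                                 cost[i - 1][j],      # audio frame reused
                                 cost[i][j - 1])      # score frame held
    # backtrack to recover which score frame maps to which audio frame
    path, i, j = [], m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda ij: cost[ij[0]][ij[1]])
    return cost[m][n], path[::-1]

# toy example: the performance holds the second note twice as long
score = [1.0, 2.0, 3.0]
audio = [1.0, 2.0, 2.0, 3.0]
total, path = dtw(score, audio)
print(total, path)  # -> 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```

The recovered path maps score note 1 to two audio frames, which is exactly the timing elasticity that makes alignment useful for following a live performance against its MIDI score.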
The problem of evaluating music retrieval systems
for effectiveness has been an ongoing concern since the
year 2000, with discussions
devoted to the problem each year. Due to
intellectual property issues, researchers are
unable to share
their collections with one another,
making it difficult to compare results. A model
is being developed that allows people to remotely
run experiments on centrally held data from their home systems. This still creates
some difficulties for the development cycle,
as researchers often need to
listen to tracks
to debug or refine programs, but it appears to
be a useful first step. Stopgap ideas were also discussed,
such as sharing derived features like mel-frequency
cepstral coefficients rather than the audio itself. Funding has been secured,
a collection (NAXOS) has been made available,
and the organizers
believe the initial
system should be running by 2004.
Meanwhile, evaluation of systems is growing
from the ground up. University teams have
collaborated to produce a test bed
capable of allowing system comparison.
Another related issue in evaluation is the
availability of user queries. Much thought
has gone into the design of query collection,
as there are many types of data that need
to be captured, both for research into user
behaviour and for measuring the effectiveness of systems.
The MAMI project has made its annotated database
of vocal queries available on
the Internet (www.ipem.ugent.be/MAMI), possibly
the only legally sharable collection currently
accessible for research.
Digital Music Libraries and their Issues
Keynote speaker Anthony Seeger gave us
a realistic view of the proportion of
recorded material that can legally
and ethically be made available
to the public, as well as the effort
required for this
to happen at all. Many cultural issues
mean that sensitivity is
needed in the
handling of some materials.
Various online collections were demonstrated
or discussed during the conference,
including those of the Library of Congress
and a system developed
for Danish music libraries. Several
of these systems involve allocating payment to rights-holders.
The ISMIR series of conferences has
gathered together a thriving group
of researchers from a wide range
of disciplines with the common goal of improving our
ability to find the music we’re interested
in. While we haven’t yet, as a community, quite solved the evaluation
issue, there was much of interest and value at this year’s conference.
Apart from the sights and sounds in Washington and Baltimore, and the renewal
of old friendships, I was pleased to see useful results that can be built
on, such as that by Martin F. McKinney and Jeroen Breebaart, and much of
the user query research. The presentations are available online at the
conference Web site (ismir2003.ismir.net/presentations.html). In 2004,
ISMIR returns to Europe, and will be held in Barcelona, Spain.