Vol. 30 Issue 2 Reviews
The 6th International Conference on Music Information Retrieval (ISMIR 2005)

Centre for Digital Music, Queen Mary, University of London; Centre for Cognition, Computation and Culture, Goldsmiths’ College, University of London, London, UK, 11-15 September 2005.

Reviewed by David Gerhard
Regina, Saskatchewan, Canada

ISMIR 2005, the Sixth International Conference on Music Information Retrieval, took place at Queen Mary, University of London, and Goldsmiths College, University of London, from the 11th to the 15th of September, 2005. The paper sessions and poster sessions took place at Queen Mary, while introductory tutorials took place at Goldsmiths on Sunday the 11th. The event was sponsored this year by Microsoft Research, Sun Microsystems, the British Library, Hewlett-Packard, Philips Research, the British Computer Society, the Digital Music Research Network, and the Multimedia Knowledge Management Network. Apple Computer provided an Internet Cafe consisting of six iMacs set up in the lobby of the Centre for Digital Music building where the paper and poster sessions took place.

Although the conference did not have an official theme, several topics of prominence did emerge: MIREX, the Music Information Retrieval Evaluation eXchange; annotation of ground truth for training data; the maturing of the ISMIR research community; the role of ISMIR in both the wider Computer Music and Information Retrieval communities; and the utility of Music Information Retrieval (MIR) research to musicologists and information analysts. This was in addition to the spectrum of excellent technical papers and posters. One of the main topics of conversation outside of the sessions was the interpretation and classification of genre.

Compared to other conferences this reviewer has attended, ISMIR is still relatively new, and remains small. Rather than being a detriment, this offers many advantages. There were only two parallel sessions, so it was possible to attend almost every paper in the conference. The subject area of the conference continues to be tight and specific, while at the same time being neither stifling nor predictable. One hundred and thirty-eight submissions were received, a record for ISMIR, and the reviewers maintained a high standard of quality in the accepted papers. Of the 103 oral presentation submissions, 56 were accepted for oral presentation and given eight pages in the proceedings, and 29 were accepted for poster presentation with six pages in the proceedings. Of the 35 poster and demonstration submissions, 28 were accepted and given four pages in the proceedings. In addition, there were three excellent invited speakers: Nicholas Cook, Stephen Robertson, and Thomas Dolby.

Nicholas Cook, a Research Professor of Music from Royal Holloway, University of London, began the conference with an invited talk entitled "Towards the Complete Musicologist?" in which he detailed some of the typical work musicologists do, and how music information retrieval techniques may be employed to benefit musicology. Dr. Cook noted that "What will be critical from the jobbing musicologist's point of view is the trickling down of research—the translation of cutting edge research into practical usable everyday tools." He noted that professional musicologists are often reluctant to adopt new techniques and technologies, and that even if a musicologist does use, for example, the Humdrum Toolkit, it is unlikely that they will use it frequently enough to become proficient with the interface. This is an issue for many technical fields, especially those building tools for non-technicians. The people who build the tools are, by necessity, familiar with what goes on "under the bonnet," as Dr. Cook puts it, and therefore often produce an interface that, while usable to the technician, is not as usable for the non-technical user. Many of the issues brought up by Dr. Cook's talk were examined in greater detail over coffee and after hours off-site. The MIR community must look beyond the technology toward utility and usability.

Stephen Robertson is a Professor of Information Systems at the Department of Information Science at City University, London. Stephen also works at Microsoft Research in Cambridge. As a long-standing expert in text information retrieval (TIR), Dr. Robertson was able to provide a grounding and context for MIR. The TIR community is significantly more mature than the MIR community, by a matter of decades, and still there is much work to be done in resolving ambiguity, differentiating between multiple meanings, identifying synonyms, recognizing context, and indexing documents. It is a humbling prospect to consider the relative maturity of TIR in comparison to MIR, and to see how far we really have to go. There is no doubt that some of the big questions of MIR, destined to plague us for years to come, have not even been identified yet. One question that was asked of Dr. Robertson at the end of his talk was whether TIR techniques have been applied to so-called musical text such as poetry or drama. Dr. Robertson could think of no such research, uncovering a possible area of crossover between his field and ours.

Thomas Dolby, famous for "She Blinded Me With Science," presented a talk on his most recent field of interest, data sonification. Mr. Dolby is Dr. Robertson's brother, and he said that his and his brother's careers had never really intersected before, so he was grateful for the opportunity to speak at this academic conference. We were grateful for his insight and entertaining discussion. Mr. Dolby considers his current sonification work almost the inverse of the MIR problem: he takes data and transforms it into music in order to better understand and interact with the data. His talk offered an interesting perspective on music in general, his music career up to now, and his recent sonification work.

In addition to the invited speakers, and the papers and posters to be described below, there were three panel sessions in which key topics of interest to the MIR community were discussed. On Monday, the panel topic was professional users of MIR, covering many of the topics brought forward by Dr. Cook's invited talk earlier in the day. Tuesday's panel explored creative applications of MIR, where ideas of live performance and composition techniques were discussed. A wishlist of applications gathered from this panel might include "compose in the style of me" systems, stylistic improvisation, automatic play-list generation, and live DJ assistance including beat matching and cross-fading. Audio Mosaicing, or Musaicing, was a topic of particular interest, wherein collections of music are segmented and recombined based on acoustic characteristics. Wednesday's panel topic was MIREX, the main issues of which will be discussed later in this review.

The paper sessions began on Monday with two general papers on music information retrieval. Jin Ha Lee presented challenges in cross-cultural/multilingual music information seeking, observing that cross-language and cross-cultural queries may differ from queries made in the language and cultural context of the information being sought. The work exposed the emphasis of much MIR research on Western music, a subject still of some concern in other more mature computer music fields. Cynthia Grund then presented some general philosophical remarks on music information retrieval, memory, and culture, and mentioned a new publication forum: the Journal of Music and Meaning.

The second session, on music classification, initiated some of the heated discussions on genre. This session began with Shyamala Doraisamy presenting a discussion of genre classification incorporating non-Western musical forms; Markus Schedl presented an artist classification technique based on co-occurrence of textual terms on Web sites; and Thomas Lidy brought psychoacoustic transformations to the discussion of genre classification.

The papers on music similarity and genre often raised deeper issues, such as the nature of genre, personalized interpretation, cover performances of music, and membership in multiple genres. Some participants believe that genre itself is such a nebulous concept, so difficult to define, that it is hopeless to try to write software to classify music in this way. Others believe that the idea of musical similarity and genre classification is so fundamental to the way we humans experience music that studies of music information retrieval must start with genre. Discussions of music similarity and the rigor of genre classifications, or lack thereof, were frequent throughout the conference, and paper presenters whose work touched on genre were sometimes seen to apologize or defend, depending on where they stood on the issue.

The third paper session consisted of two parallel sessions, the first of which picked up on yet another important topic, that of the collection of ground truth, which normally means annotating large datasets by hand. Several systems were presented that facilitate the annotation. The second parallel stream considered extracting features from audio. Particularly impressive was Parag Chordia's presentation on the segmentation and recognition of tabla strokes. As an example of appropriate annotation, Mr. Chordia hand-transcribed many thousands of individual strokes, and built a system that successfully segments and classifies these strokes.

The evening entertainment on the first day was a concert by the Soweto Kinch Quintet. After a day spent investigating monophonic segments and relatively simplistic rhythm patterns, we found the concert both invigorating and humbling. The quintet fused jazz, rap, and many other musical styles to produce an exciting and cohesive whole. From a listener's perspective, it was a great night out. From the perspective of MIR, it showed just how far our systems need to go before they can handle novel, complex, genre-bending music.

Tuesday’s sessions began with a set of papers on MIR systems, including applications for interactive retrieval, wireless networks, and music visualization. The first session concluded with an informative survey of current music information retrieval systems. The second session on Tuesday, "Motivic and melodic analysis and retrieval," dealt with analyzing the structural content of music for transcription and analysis. Of particular interest was recent work on polyphonic audio, and a comparison of motivic patterns in different versions of popular songs. The parallel sessions on Tuesday afternoon included papers on alignment and retrieval, some of which also touched on ground truth annotation and segmentation, as well as papers on optimized and efficient methods, which included applications of some standard fast classification systems to retrieval, query by humming, and genre classification.

Wednesday's sessions included papers on music similarity, including work on one of the classic MIR problems: that of finding new music based on an existing collection. The session on harmonic-based analysis and retrieval included a comparison of pitch-spelling algorithms and several papers on analysis by chromaticity, where the overall "flavor" of a chord is used to determine its place in a melodic structure. The afternoon session on Wednesday detailed voice/instrument analysis, classification and segregation, including an interesting paper on separation of vocals from polyphonic audio recordings.

Thursday's first session included work on rhythmic and structural analysis, including classification of musical meter, generative analysis of rhythms, and an interesting paper by Roger Dannenberg on what he calls “holistic” music analysis. The last session of the conference detailed work on user interfaces, visualization, and interaction. Of particular interest here was work by Masataka Goto and Takayuki Goto, who presented a new music playback interface which was clean, functional, usable, and fun. This work was very popular with those in attendance.

The poster presentations contained much interesting and novel work, and the reader is referred to the proceedings of the conference for those papers. The Wednesday poster session included posters detailing work by MIREX competitors.

On a final note, this reviewer was most impressed with the MIREX competition. The Music Information Retrieval Evaluation eXchange was an opportunity for researchers in MIR to pit their best efforts against the rest of the community on fair and balanced tasks. Several tasks were proposed by the community, and task participation was open to all. This year, tasks included the ubiquitous audio genre classification, as well as onset detection, melody extraction, artist identification, key finding, drum detection, and others. Each task had a public data set, used for training algorithms, and a private data set, used to evaluate them. It should be noted here that an enormous amount of work was required to evaluate all incoming algorithms on a single consistent data set for each task, and this was only the second time this evaluation has been performed.

The MIR community has a substantial online presence, where discussions (and arguments) initiated at the conference continue to be explored (www.ismir.net). There is a mailing list dedicated to MIR, and one specifically concerned with the MIREX competition. Readers interested in involving themselves in the MIR community may find it useful to participate in these forums in anticipation of attending ISMIR 2006 in Victoria, British Columbia, Canada, organized by George Tzanetakis and Holger Hoos.