Vol. 27 Issue 2 Reviews
Third International Conference on Music Information Retrieval

Institut de Recherche et Coordination Acoustique/Musique (IRCAM), Paris, France, 13-17 October 2002.

Reviewed by Maxwell Wells
Seattle, Washington, USA

Last Supper, or Supper at Last?

French cuisine has a well-deserved reputation for excellence. It requires selecting the very best ingredients, meticulous attention to detail, and exquisite timing. The 2002 International Conference on Music Information Retrieval (ISMIR) organizing committee, under “Maître d’” Michael Fingerhut, served up an intellectual feast with typical Parisian éclat. Close to 200 participants attended a total of 58 short and long papers, five invited papers, three tutorials, three panel discussions and a half-day professional visit, all over five days. The opening address by Pulitzer Prize-winning author and academic Douglas Hofstadter on the topic of “Gist-finding, Analogy-making, Variation-spinning” set the tone for the quality and exuberance of intellectual enquiry.

The conference papers were organized into seven subsections: 1) Preprocessing; 2) Indexing, Classification and Analysis; 3) Summarization; 4) Query by Example; 5) Similarity and Recognition; 6) Systems; 7) Usability. This review will not focus on the specifics of the papers, but on some of the strong undercurrents that tugged at the conference. This reflects my personal bias and experience as the Chief Technology Officer and co-founder of Cantametrix, a music technology company acquired by Gracenote. Suffice it to say that all of the papers, posters, tutorials, and panels were of a consistently high quality. Readers are advised to visit the ISMIR2002 Web site for specifics (ismir2002.ircam.fr/papers.html). The professional visit to the Inathèque de France (equivalent to the US Library of Congress) was an education in the strength and quality of both Gallic architecture and pride. The conference supper aboard the vessel Bel Ami was an appropriate finale to the event. The cruise along the Seine provided spectacular cuisine, scenery, and camaraderie, as well as some of the motifs for this review.

As digitization and electronic music distribution (EMD) make music more available, the need for tools and technologies to manage music, and information about music, is becoming increasingly evident. The experience of music listening, like fine dining, is enhanced by the artful combination of textures, colors, and flavors. A hard drive with 10,000 tracks is an indigestible collection of bits until the music can be organized and presented in appetizing bites. The presentations were about the tools and recipes required to transform the raw ingredients into a sumptuous repast. The conference addressed very real problems being faced by music fans today. This was not a solution in search of a problem.

Yet, despite the appealing location, the comprehensive menu, and the very reasonable price (subsidized by the Centre National de la Recherche Scientifique, Indiana University, the Mairie de Paris, IRCAM, and the National Science Foundation), there was very little consumption by music service providers, retailers, radio/internet radio companies, and of course, record companies (the “labels”). Was it because this section of the industry has already eaten its fill and has no appetite for the courses on offer?

Over the past three years a number of companies with an assortment of content-management tools have attempted to cater to the music industry. These include eTantrum, GigaBeat, HiFind, Music Buddha, Sonic Print, and Uplister (all now defunct), Cantametrix (acquired by Gracenote), Mongo Music (acquired by Microsoft), AgentArts, Audible Magic, BayTSP, Friskit, Media Unbound, MoodLogic, Relatable, Savage Beast, Sonicprint and Tune Print. For the most part, these companies have created products to help introduce people to more music, with the expectation of increasing music sales.

Some of the research presented at the conference replicated the work done by these companies. This is neither surprising nor a waste of effort, because few of the companies have published their research, making it difficult to know what they have done or how they did it. For example, GigaBeat crawled the Web for cultural information related to artists and CDs. Uplister harvested and shared playlists created by individuals. Human-generated playlists are an efficient means to incorporate both intrinsic attributes and extrinsic (cultural) attributes of the music, and to take people from music they know to music they don’t while maintaining some coherent theme. Music identification by means of intrinsic features (fingerprinting) has been tackled by Audible Magic, BayTSP, Cantametrix, eTantrum, MoodLogic, Relatable, Sonicprint, and Tune Print. Accurate music identification provides the foundation for a range of value-added services.

AgentArts, Friskit, HiFind, MoodLogic, Mongo Music, Music Buddha, and Savage Beast have all used humans to classify and describe music, and have created relational databases with that information. Both Cantametrix and Mongo Music use machine-extracted attributes to classify music. The Cantametrix technology was inspired by the work of Patrik Juslin (“Emotional Communication in Music Performance: A Functionalist Perspective and Some Data,” Music Perception 14/4, 1997: 383-418) and David Smith (“The Place of Musical Novices in Music Science,” Music Perception 14/3, 1997: 227-262). Similarity and query-by-example are means to access this type of classification, and most of the companies developed some kind of offering. Many of the companies, and their offerings, no longer exist. Working with similarity is non-trivial, and several authors at the conference described approaches and prospective solutions. The difficulties have been reviewed by Amos Tversky (“Features of Similarity,” Psychological Review 84/4, 1977: 327-352), and include context-dependence, judgment by both common and distinctive features, the presence of asymmetric similarities, and the non-complementarity of similarity and dissimilarity. The problem of generating recommendations that individuals will like, in a scalable and flexible manner, is far from being solved.

Research around collaborative filtering, as practiced by Amazon, Media Unbound, and others, was not addressed at the conference. This is an area that generates huge quantities of mineable data that are ripe for research, on the role of music as a psychographic metric, for example. This is another interesting and difficult problem.

Problems that have not been tackled by any of the companies, and for which research was presented at the conference, include: melody segmentation, voice separation (including instrument voice), summarization, transcription, query-by-humming, and singer identification. It is likely that many of the companies have performed some rudimentary usability studies, but it is also likely that most have had some bias or preconceptions inspired by their own product offering. None of the commercial companies have created an accessible corpus of music data for subsequent research, nor created extensible evaluation frameworks. Each represents a difficult problem for one or more of the following reasons: they are technically challenging, it is not clear how they would generate revenue, and there are difficult rights issues. However, they are all worth tackling because the fruits of the research are likely to result in technologies with a variety of applications.

Conferences satiate our need to “show and tell” even if we have seen it and heard it before. Therefore, the absence of particular industry participants is not adequately explained by the “been there, done that” hypothesis. A more powerful explanation is that there isn’t enough money being made to generate a feeding frenzy. This raises the question: if consumers want it, why is there so little revenue? The cynical answer is that the labels are delaying the start of supper by refusing to give their blessing, in the hopes that some guests will starve to death and leave them a bigger share. The truth is more complex and very relevant to the future role of ISMIR.

Any change within the record companies requires an economic incentive. This could take the form of a drastic and sustained reduction in CD sales, or a clear opportunity to make more money. Despite claims that peer-to-peer (P2P) file-sharing is eating the labels’ lunch, the jury is still out as to whether this is the cause of the approximately 10% decrease in CD sales. The majority of consumers will pay for ease of use and quality of service. Illegal P2P file-sharing can offer neither because of a combination of legal challenges, blocking, and interference tactics orchestrated by the likes of the Recording Industry Association of America (RIAA) and the International Federation of the Phonographic Industry (IFPI). In contrast, CDs are easy to use, CD players are ubiquitous, and CDs provide reassuring physicality in the face of hard-disk crashes and computer viruses.

The other economic incentive is the promise of making more money. This means finding more buyers, or persuading existing buyers to spend more. Despite industry recognition that EMD could do both, and grow what is currently a 40 billion dollar per year industry to 100 billion dollars per year, change has been slow. The lethargic response of the labels to adopting EMD betrays some fundamental impediments which, like congealed fat in a drain, have blocked the flow of commerce. The essence of the blockage is that until someone makes money with EMD there is no incentive to adopt it, and no one will make money until it is adopted. These impediments include:

Agreement on rights. Many sound recordings have more than one rights owner. There are always problems in getting multiple people to agree on anything. This is even more difficult when it is not clear how money is to be made.
Antiquated infrastructure. To the embarrassment of the record industry, the fight with Napster highlighted the fact that for many sound recordings no one knows who owns the rights. The information is either missing or the rights have changed hands so many times no one knows who has the correct information. (This situation is better in Europe, which has a more centralized rights management process.) Improving the infrastructure will take money.
Leadership. The people at the top are empowered to minimize risk and maximize reward. It is easy for them to see the risk in EMD, in the form of P2P file sharing, but the reward is less clear.
Revenue models. It is not clear what people will pay for. Finding out requires some attempts and some failures. We are only beginning to see the attempts, and it will take some time to dig through the wreckage of the inevitable failures to understand how to go forward.

The challenges inherent in navigating around these impediments are huge. It will require bold leadership in an industry noted more for formulaic repetition of whatever worked last. In addition to the impediments, there are some cultural shifts that need to occur, including the realizations that:

Copy protection is both unworkable and unpopular. If music can be played it can be copied. In the competition for disposable income, music has many hungry competitors. Any sales gained by coercing people to purchase will be offset by the sales lost because copy protection creates barriers to use.
The record labels must find new ways to monetize legacy music. This is hard to swallow for an industry that makes a handsome profit from reissuing old music as “best of” CDs, and “Christmas compilations.” However, some farsighted executives will realize that the record labels can still make money by windowing the release of content, as has been done for movies, by selling legacy music “by the pound” for use in MP3 players, by offering services to manage content, and by continuing to manage the artist “brand” and the commercial activities that occur around that brand.
The objective is to increase your own profits, not to defeat the other guy. Again, this is hard to swallow in an industry where the exercise of power is an end in itself. Some of this style of interaction has leaked into the public arena, where music fans (i.e., customers) are feeling persecuted by the labels and their watchdogs (i.e., purveyors). It is difficult to see how this is good for business. The size of the pie can be increased by strategic alliances and partnerships so that there is more for everyone.

The ISMIR conference exposed two problems: the impediments to EMD, and the need for tools once EMD becomes mainstream. The impediments to EMD will be removed; the benefits are too great, the technology too agile, and the consumer demand too incessant. The labels are beginning to smell the cooking, but before they can go out to eat they need to put their own houses in order. The bad news is that this will not happen immediately. The good news is that this will give ISMIR participants time to create the science behind the products that will benefit music fans and the industry that serves them. ISMIR’s selection of ingredients and attention to detail were evident at this conference. We will see if their timing is perfect.