International Conference on Auditory Display 2004: Listening to the Mind Listening

Sydney, Australia, 6-9 July 2004.

Reviewed by Edward Childs
Hanover, New Hampshire, USA

Gregory Kramer, in the 1994 book Auditory Display (Santa Fe Institute Studies in the Sciences of Complexity; Boulder, Colorado: Westview Press), pioneered the organization of the field into auditory icons and earcons, audification, and sonification. Most computer users routinely experience such sounds while messaging online or emptying their (digital) trash; occasionally there are papers on the subject at the International Conference on Auditory Display (ICAD), for example, on the development of next-generation auditory interfaces for the navigation of cellphone menus (“Earcons in Motion,” Sami Ronkainen, ICAD 2001). The last paper on audification, in which digital data whose characteristic frequencies fall outside the range of human hearing are “played back” so as to be audible, was presented in 2002 (“Auditory Seismology,” Florian Dombois, ICAD 2002). In this review I have chosen to focus on sonification, and in particular on an innovative concert derived from ten different sonifications of the same data set.

Sonification, in which data is used to control some aspect of sound for the purpose of communicating information about that data, is a divergent field in which more questions are being asked than answered. The central question of sonification has been called “the mapping problem.” If a sinusoidal oscillator tone is played at 440 Hz, what temperature is that? If one wishes to design a sonification scheme to monitor the temperature of a mixing tank in a processing plant, should the pitch be higher if the temperature increases? Would anyone want to listen to sine oscillator tones all day? Would rain forest sounds or crickets be more appropriate? If musical sounds are used, would that be too distracting, or would the listener be misled by unintended connotations?
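To make the mapping problem concrete, here is a minimal Python sketch of one scheme the questions above imply: tank temperature mapped linearly onto a logarithmic (pitch-like) frequency scale and rendered as a sine tone. The temperature and frequency ranges are my illustrative assumptions, not recommendations.

```python
import numpy as np

# Minimal sketch of a parameter mapping: tank temperature -> oscillator pitch.
# The ranges (20-100 degrees C onto 220-880 Hz, i.e., two octaves) are
# illustrative assumptions.

def temperature_to_frequency(temp_c, t_min=20.0, t_max=100.0,
                             f_min=220.0, f_max=880.0):
    """Map temperature linearly onto a logarithmic (pitch-like) frequency scale."""
    x = np.clip((temp_c - t_min) / (t_max - t_min), 0.0, 1.0)
    # Interpolate in log-frequency so equal temperature steps give equal pitch steps.
    return f_min * (f_max / f_min) ** x

def sine_tone(freq_hz, duration_s=0.5, sr=44100):
    """Render the mapped frequency as a plain sine tone."""
    t = np.arange(int(sr * duration_s)) / sr
    return 0.5 * np.sin(2 * np.pi * freq_hz * t)

tone = sine_tone(temperature_to_frequency(65.0))  # 65 degrees C -> ~480 Hz
```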

The topic of sonification generates a lot of papers and discussion at ICAD conferences (2004 was no exception), and there is always an interesting mix of approaches to the problem. Typical examples include toolkits for designing sonifications (“A Flexible Framework for Real-Time Sonification with SONART,” Woon Seung Yeo, Jonathan Berger, and R. Scott Wilson, ICAD 2004), specific sonifications (“sMax, A Multimodal Toolkit for Stock Market Data Sonification,” Fabio Ciardi, ICAD 2004), novel synthesis techniques with intriguing mapping possibilities (“Physically-Based Models for Liquid Sounds,” Kees van den Doel, ICAD 2004), sonification perception studies (“Individual Differences, Cognitive Abilities and the Interpretation of Auditory Graphs,” Bruce Walker and Lisa Mauney, ICAD 2004), and sound spatialization and spatial auditory perception (“An Orientation Experiment Using Auditory Artificial Horizon,” Matti Gröhn, Tapio Lokki, and Tapio Takala, ICAD 2004).

One tension that typically exists in sonification research, at least as evidenced in ICAD conferences, is between the artistic, the scientific, and the practical. Sonifications designed by composers or sound artists often follow in the practice of twentieth-century composition (e.g., Iannis Xenakis’s Achorripsis), in which data or numerical calculations play a significant role but are shaped by many aesthetic decisions and adjustments, and are realized using interesting orchestrations or timbres. The richness and complexity of the end result may, however, frustrate the experimental psychologist, who wishes to isolate a limited number of sonic attributes and measure the subject responses to them in a controlled environment. In a practical situation, it may also frustrate the listener who wishes to hear what the data is doing—the complex sounds may be too difficult to decode.

Two installations in the lobby of The Studio in the Sydney Opera House, “ASX Voices” by Fabio Ciardi and “PlantA” by Garth Paine, sought to sonify, respectively, “the Australian voices of the global economy” and “meteorological processes vital to plants.” Both works were presented over an eight-channel sound system. The former used recorded data from the Australian stock market; the latter was driven in real time by weather instruments at the Opera House collecting wind speed, wind direction, temperature, and solar radiation data. The composers provided no written explanations of the specific mappings, and I found it difficult to discern even qualitative aspects of the data trends by sitting, listening, and guessing.

A sonification designed for practical use by financial traders (“Creating Functional and Livable Soundscapes for Peripheral Monitoring of Dynamic Data,” Brad Mauney and Bruce Walker, ICAD 2004) provides a background rainforest sound environment displaying one dimension of data: the percent change in price of a single stock or index above or below its “simple moving average.” Bullfrogs, roadrunners, and crickets herald percent changes in positive territory; rain and thunder depict downside movements. In this case, qualitative trends in the single data dimension are easy to discern, and the result is very attractive and listenable; however, the complexity of the sound world needed to portray a single dimension precludes the sonification of other indices or dimensions side by side. Furthermore, the interpretation (downside = rain and thunder = bad) might not suit the mindset of a trader who takes short positions in stocks and makes money when their prices fall. Another sonification of financial data (“Sonification of Real-Time Financial Data,” Petr Janata and Edward Childs, ICAD 2004) uses sparse melodic fragments in familiar orchestral timbres to depict the movements of multiple indices or stocks. The system was tested on psychology students and former financial traders at Dartmouth College.
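The displayed statistic itself is simple to state. Below is a minimal sketch of the percent-change-above-moving-average computation and the category switch it drives; the window length and the zero threshold are my illustrative assumptions, not the authors’ published design.

```python
import numpy as np

# Sketch of the single displayed dimension: percent change of a price
# relative to its simple moving average (SMA). Window length and the
# category threshold are illustrative assumptions.

def percent_above_sma(prices, window=20):
    """Percent change of the latest price above/below its simple moving average."""
    sma = np.mean(prices[-window:])
    return 100.0 * (prices[-1] - sma) / sma

def soundscape_category(pct):
    """Map the statistic to a nature-sound category, per the Mauney/Walker scheme."""
    if pct > 0:
        return "crickets/bullfrogs"   # positive territory
    return "rain/thunder"             # downside movement

prices = np.array([100, 101, 99, 102, 103, 104])
print(soundscape_category(percent_above_sma(prices, window=5)))  # crickets/bullfrogs
```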

Stephen Barrass, the organizer of ICAD 2004, is to be applauded for breaking significant new ground for the organization’s tenth conference. The third day of the conference, open to the public, was billed as the “sonif(y) festival” and was held in The Studio, a venue for experimental presentations at the Sydney Opera House. The festival was followed in the evening by a concert entitled “Listening to the MIND Listening.” The concert afforded a unique and unprecedented opportunity to hear 10 compositions, chosen from 30 submissions, all of which were sonifications of the same data sets. The pieces were intended to be interesting to scientists, relevant to sonification designers, and musically satisfying to a general audience.

The data was obtained from a cap of 26 electrodes, arranged according to the 10-20 standard for EEG (electroencephalogram) electrode placement, on the head of Evian Gordon, neuroscientist and CEO of the Brain Resource Company, while he was listening over headphones to “Dry Mud” by the indigenous Australian composer David Page, taken from the score of “Fish,” a dance work choreographed by Stephen Page and performed by the Bangarra Dance Theatre. Besides the neural activity obtained from the electrodes, ten additional channels provided heart rate, breathing, skin conductance, eye movements, and other data. The data sets were made available for download in the “Call for Sonifications.” The sonifications were constrained to be data-driven (so as to communicate understanding of the data relations), time-bound to the five-minute measurement period (which was also to be the length of the piece), and reproducible (with mappings described in sufficient detail). The identity of the listened-to piece was revealed at the conclusion of the concert.

The Studio afforded a 15-loudspeaker playback system: ten at ear level, four located 5.8 meters above the floor, and one zenith loudspeaker 6.8 meters above the center of the room. Composers were invited to submit up to 36 mono WAV files along with a specification of azimuth, elevation, and normalized radius from center for each file. These files could then be spatialized using a Lake Huron system. Alternatively, submissions could be provided for direct playback over the 15-channel system, or presented in Ambisonic B-format. The Lake Huron system was also used to process the files to binaural format, so that the selection panel could review the submissions over headphones. These audio files are available at the ICAD 2004 Web site (www.icad.org).

The concert (which was sold out) elicited an intriguing variety of fashion statements from the Sydney area. All chairs were removed from the lower level of The Studio space, allowing listeners to move around while the pieces were diffused. Seating was available in the gallery on three sides; eventually, however, most of the audience chose to walk down to the lower level and move about. Colored spotlights provided a dance-club ambience and inspired a wide spectrum of audience behavior.

A comparison of the ten pieces provides many useful examples of the artistic, practical, and scientific issues involved in designing a sonification. In the first piece, Neural Dialogues by Guillaume Potard and Greg Schiemer, domain expertise in EEGs and brain science was a decisive factor in the data preprocessing and mapping decisions. The authors chose to focus on the correlation (computed using a MATLAB EEG toolkit) between pairs of all 26 electrodes. A given pair whose correlation coefficient exceeded a threshold triggered two drum-like sounds 0.4 seconds apart (using a Csound “pluck” opcode), spatialized to the electrode positions using the Max/MSP implementation of Vector Base Amplitude Panning (VBAP). The global RMS power in each of the delta, theta, alpha, and beta waves was displayed as amplitude variations of the harmonics in four oscillator instruments based on Jean-Claude Risset’s glissando instrument. The resulting texture of spatialized pluck sounds over continuous oscillators was attractive and fairly static. The RMS power appeared to stabilize in the middle of the piece but fluctuated substantially at the beginning and end.
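The trigger logic is easy to imagine in code. Here is a minimal sketch assuming windowed Pearson correlation; the window length and the 0.8 threshold are my assumptions (the piece itself computed correlations with a MATLAB EEG toolkit).

```python
import numpy as np
from itertools import combinations

# Sketch of the pair-correlation trigger in Neural Dialogues: within each
# window, any electrode pair whose correlation exceeds a threshold fires an
# event (two drum-like sounds 0.4 s apart in the piece). Window length and
# threshold are illustrative assumptions.

def correlated_pairs(eeg, threshold=0.8):
    """eeg: array of shape (26, n_samples). Returns index pairs above threshold."""
    r = np.corrcoef(eeg)                      # 26 x 26 correlation matrix
    return [(i, j) for i, j in combinations(range(eeg.shape[0]), 2)
            if r[i, j] > threshold]

rng = np.random.default_rng(0)
window = rng.standard_normal((26, 500))       # one 1-second window at 500 Hz
for i, j in correlated_pairs(window):
    print(f"trigger drum pair at electrodes {i} and {j}")
```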

Faithfulness to the data appeared to motivate Gordon Monro, in his What Are You Really Thinking, who displayed the output of all 26 electrodes as a stream of notes whose pitches corresponded (by a factor of 30) to the frequencies present in each scalp signal. Effects on the notes (360,000 in all) were controlled by the other measured parameters (for example, the respiration signal controlled reverberation). Each channel was spatialized to correspond to the electrode position using VBAP. All available data was used, with the heartbeat treated as a separate sound source. The result is astounding. The sounds have an ebb and flow, like breathing, and one hears notes of similar timbre and pitch constantly recurring in swirls of sound.
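One way to read “corresponded (by a factor of 30)” is that the prominent spectral frequencies of each windowed scalp signal were multiplied by 30 into the audible range. A sketch under that assumption (the window length and peak-picking are also mine):

```python
import numpy as np

# Sketch of the factor-of-30 pitch mapping: find prominent frequencies in a
# windowed EEG channel (sampled at 500 Hz) and scale them into the audible
# range. Windowing and peak picking are illustrative assumptions.

def window_pitches(signal, sr=500, factor=30, n_peaks=3):
    """Return note frequencies: strongest spectral peaks scaled by `factor`."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    peaks = freqs[np.argsort(spectrum)[-n_peaks:]]
    return peaks * factor   # e.g., a 10 Hz alpha rhythm becomes a 300 Hz note

rng = np.random.default_rng(1)
t = np.arange(500) / 500.0
channel = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(500)
print(window_pitches(channel))   # dominated by ~300 Hz (10 Hz alpha x 30)
```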

Real-time practicality and musicality dominated the choices in Listening to the Mind Listening, by Hans van Raijj. In order to set up an ArtWonk patch able to process data in real time on a personal computer, outputting MIDI to sampler instruments, Mr. van Raijj used data from only three channels to produce notes in only two instruments (plucked bass and piano), with stereo panning controlled by the difference between two other channels. Rhythm was controlled by the rate of change of the calculated pitches, which were obtained by reducing the number of samples per channel by a factor of ten and performing additional averaging over windows of 50 samples (to control pitch) and 1,000 samples (to control loudness). The resultant rhythm was very jazz-like, an excellent match for the orchestration. One wonders, however, whether such limited use of the data conveyed any scientifically significant relationships.
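The data reduction itself is straightforward. A minimal sketch of the decimate-then-smooth chain, using the factors the author reports (the note quantization and MIDI output are omitted):

```python
import numpy as np

# Sketch of the data reduction in van Raijj's real-time patch: decimate each
# channel by a factor of ten, then smooth with two moving averages, a short
# window for pitch and a long window for loudness.

def moving_average(x, window):
    return np.convolve(x, np.ones(window) / window, mode="valid")

def reduce_channel(samples):
    decimated = samples[::10]                    # factor-of-ten reduction
    pitch_ctrl = moving_average(decimated, 50)   # fast-moving pitch control
    loud_ctrl = moving_average(decimated, 1000)  # slow-moving loudness control
    return pitch_ctrl, loud_ctrl

rng = np.random.default_rng(2)
# 150,000 samples = five minutes of one channel at 500 Hz
pitch_ctrl, loud_ctrl = reduce_channel(rng.standard_normal(150_000))
```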

Tim Barrass, in Untidy Mind, pitch-shifted the raw data in the 26 EEG channels by a factor of 34 using a rotating-buffer technique, and spatialized each channel according to the electrode positions (relying on the Lake Huron system to generate the 15.1 output). The data in the remaining 10 channels was mapped in a redundant, ad hoc manner to Pd external instruments (producing an additional eight channels of audio, which were spatialized manually). His approach is directly comparable to What Are You Really Thinking in that all data was used, and the frequency information in the EEG channels was shifted into the audible range by approximately the same factor. Mr. Barrass emphasized that much of his sonification design was achieved by “tweaking by ear” and real-time interaction with the mapping parameters, rather than by adhering faithfully to an overarching metaphor. The resultant texture is transparent and very subtle, so much so that repeated listening uncovers increasing layers of interesting detail.
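The rotating-buffer method is not spelled out, but one plausible reading is a circular buffer written at the input rate and read 34 times faster, wrapping around: duration is preserved while content is transposed upward, at the cost of granular artifacts at the wrap points. A sketch under that assumption:

```python
import numpy as np

# Sketch of one plausible rotating-buffer pitch shift: samples are written
# into a circular buffer at the input rate while a read pointer sweeps the
# buffer 34 times faster, wrapping around. Buffer length and the absence of
# cross-fading at wrap points are simplifying assumptions.

def rotating_buffer_shift(x, factor=34, buf_len=4096):
    buf = np.zeros(buf_len)
    out = np.empty_like(x)
    read = 0.0
    for n, sample in enumerate(x):
        buf[n % buf_len] = sample            # write at the input rate
        out[n] = buf[int(read) % buf_len]    # read 34x faster, wrapping
        read += factor
    return out
```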

John Dribus searched for musically compatible data in creating The Other Ear, choosing, for example, the more irregular data files to control panning and the more stable ones to control pitch and amplitude. Using a Max/MSP additive-synthesis approach, he routed the output to stereo pairs of loudspeakers. His mapping controlled musical parameters on different time scales, in a sectional arrangement, and yielded control of each section to different data sets. The Other Ear is thus the only teleological piece, with a beginning, middle, and end.

David Payling’s interesting work Listen (Awakening) seeks to “tell a story” of a mind awakening and becoming aware of a piece of music. His technique begins with a straightforward mapping of the data to the parameters of five synthesis techniques (granular, additive, FM, formant wave, and simple audio-gate triggering), realized in Csound, with additional fading and spatialization done in Nuendo. The output of each technique is directed to a specific set of loudspeakers, allowing the listener to focus on a particular effect by moving closer to its source. Mr. Payling also fades the different channels in and out over the duration of the piece, so that at the beginning we hear only the granular manipulations of a sound file speaking “ICAD.” He does not explain the criteria he used to decide when to fade in and out, but the end result, especially with the speech manipulation, has a definite narrative quality.

Roger Dean, Greg White, and David Worrall designed Mind Your Body to emphasize the relations between the “mind” data, in the first 26 channels, and the “body” data (heart rate, respiration, eye movement, sweating, etc.) in the remaining 10 channels. The primary organization was through spatialization: the “mind” data occupied a frontal, more present space and the “body” data a more distant background space. Yet another pitch-shifting method was used to generate five of the audio files: the 500-Hz-sample-rate EEG files were given a 44.1-kHz header, resulting in audio files of approximately three seconds, which were then stretched back out to five minutes using the SoundHack phase vocoder. (I wondered, however, whether I was mainly hearing artifacts of the vocoding rather than neural activity.) The remaining channels were used to control a variety of Max/MSP patches (FM, random noise, piano samples, etc.).
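The arithmetic behind that “approximately three seconds,” and the implied stretch factor, is worth writing out:

```python
# Header trick: five minutes of 500 Hz EEG data reinterpreted as 44.1 kHz
# audio plays back ~88 times faster, shifting the spectrum up by the same
# factor; the phase vocoder then stretches it back to the original duration.

n_samples = 5 * 60 * 500              # 150,000 samples of EEG
playback_s = n_samples / 44_100       # ~3.4 s when read as 44.1 kHz audio
stretch = 44_100 / 500                # 88.2x phase-vocoder stretch to 5 minutes
print(playback_s, stretch)            # 3.401... 88.2
```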

John Sanderson and Tom Heuzenroeder, in Perceptions in C, subjected the data to many layers of processing in order to achieve musicality and listenability. They used data from only 7 of the 36 channels, sub-sampled it to 1/20th of its original rate, normalized it, rescaled each channel to a different register, applied moving averages of varying lengths, quantized to equal-tempered pitches, and generated Excel log-linear plots to be processed by the Coagula color-note organ. The pitches generated by Coagula were sent to a variety of MIDI instruments, with final processing done in Cool Edit Pro. The end result is unquestionably musical and listenable, but I wondered whether the data relations had been lost in the processing.
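Most of that chain, up to the Coagula stage, can be written down compactly. A sketch of the per-channel conditioning, in which the register bounds and moving-average window are my illustrative assumptions:

```python
import numpy as np

# Sketch of the data conditioning in Perceptions in C: sub-sample to 1/20th
# of the original rate, normalize, rescale to a per-channel register, smooth
# with a moving average, and quantize to equal-tempered pitches.

def condition_channel(x, lo_midi=48, hi_midi=72, ma_window=8):
    x = x[::20]                                        # sub-sample to 1/20th rate
    x = (x - x.min()) / (x.max() - x.min())            # normalize to 0..1
    x = np.convolve(x, np.ones(ma_window) / ma_window, mode="valid")
    midi = np.round(lo_midi + x * (hi_midi - lo_midi)) # quantize to semitones
    return 440.0 * 2 ** ((midi - 69) / 12)             # equal-tempered Hz

rng = np.random.default_rng(3)
pitches = condition_channel(rng.standard_normal(150_000))
```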

Polyrhythms in the Human Brain, by Thomas Hermann, Gerold Baier, and Markus Müller, was designed to exploit the auditory pattern-recognition abilities of the brain as a tool for data exploration. According to the authors, cognitive processes are thought to be in evidence when there is a high degree of firing synchronization in groups of collaborating neurons; the greater the synchronization, the higher the intensity of the associated cortical rhythm. During semantic memory recall, significant power changes are observed in the theta and beta bands. Four channels were selected in order to explore the rhythmic interaction of the two bands. The careful orchestration and sparse texture elucidated the rhythms beautifully in this enormously successful piece. The authors were judicious in their choice to focus on a subset of the data for specific scientific purposes.
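Extracting the two rhythm streams is, at its simplest, a matter of windowed band power. A sketch assuming the conventional band edges (theta 4-8 Hz, beta 13-30 Hz) and a one-second window; the authors’ actual feature extraction may well differ:

```python
import numpy as np

# Sketch of extracting the two cortical rhythms: windowed spectral power in
# the theta (4-8 Hz) and beta (13-30 Hz) bands of a 500 Hz EEG channel.
# The one-second window is an illustrative assumption.

def band_power(signal, sr, lo, hi):
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return spectrum[(freqs >= lo) & (freqs < hi)].sum()

def band_rhythms(channel, sr=500, window=500):
    theta, beta = [], []
    for start in range(0, len(channel) - window, window):
        w = channel[start:start + window]
        theta.append(band_power(w, sr, 4, 8))
        beta.append(band_power(w, sr, 13, 30))
    return np.array(theta), np.array(beta)   # two interacting rhythm streams
```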

Greg Hooper’s EEG Sonification, like Neural Dialogues and Polyrhythms, seeks to isolate a particular feature of scientific interest in the large data set and sonify that. Mr. Hooper uses a generalized mutual information algorithm to calculate a nonlinear correlation between all possible pairs of the 26 EEG signals (325 pairs), sonifying only those pairs that exceed a threshold correlation. Substantial domain-specific data reduction by external algorithms before sonification is yet another consideration in the design process. Mr. Hooper’s piece is effective because there are periods of sparse correlation (hence very quiet passages) that contrast well, from a musical standpoint, with the busier sections.
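In the same spirit, though explicitly not Mr. Hooper’s algorithm, a histogram estimate of mutual information gives a simple nonlinear pairwise statistic; the bin count and threshold here are my assumptions:

```python
import numpy as np
from itertools import combinations

# Sketch of a nonlinear pairwise statistic in the spirit of Hooper's piece:
# a histogram estimate of mutual information (in nats) for each of the 325
# electrode pairs, keeping only pairs above a threshold. Bin count and
# threshold are illustrative assumptions, not Hooper's algorithm.

def mutual_information(x, y, bins=8):
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                       # joint distribution estimate
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)   # marginals
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def informative_pairs(eeg, threshold=0.1):
    """eeg: array of shape (26, n_samples). Returns pairs above threshold."""
    return [(i, j) for i, j in combinations(range(eeg.shape[0]), 2)
            if mutual_information(eeg[i], eeg[j]) > threshold]
```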

The responses to Stephen Barrass’s call for sonifications unearthed many different considerations and approaches to sonification design. An interesting follow-on concert might take a familiar piece from the classical repertoire and rerun the experiment, collecting less data and placing more constraints on the way the data is used (for example, that every channel must be sonified). In that case, listening to the mind listening could be accompanied by listening to the original piece in auditory memory.