Vol. 27 Issue 3 Reviews
Perry R. Cook: Real Sound Synthesis for Interactive Applications

A. K. Peters, 2002, ISBN 1-56881-168-3, softcover/CD-ROM, 263 pages; available from A. K. Peters, Ltd., 63 South Avenue, Natick, Massachusetts 01760, USA; electronic mail alice@akpeters.com; World Wide Web www.akpeters.com

Reviewed by Tae Hong Park
Princeton, New Jersey, USA

“Since sounds in the real world are made by physical processes, this book will take a physics-based perspective wherever possible. The book is intended for technical, semi-technical and aspiring technical people who know some physics, some programming, and some math. None of these technical skills are specifically required to understand much of what we’ll talk about here,” writes Perry Cook in his new book, Real Sound Synthesis for Interactive Applications. The author’s approach to introducing, analyzing, and investigating sound-centric signal processing concepts and synthesis algorithms, with a particular focus on physical modeling algorithms, is well organized, well crafted, and enjoyable.

“Enjoyment” may perhaps be one of the distinguishing features that differentiate this book from conventional signal processing and technically oriented multimedia books on the market today. Without a doubt, a publication that consists of 16 chapters, a math-heavy appendix, and a CD-ROM, written with Mr. Cook’s humorous and fun-embracing teaching style, makes the page turning and grasping of concepts a little less intimidating, and even a fun experience. Through a plethora of sound and code examples, diagrams, graphs, and flow charts, typically hairy signal processing theories are explained clearly, straightforwardly, and as concisely as possible. It seems evident that one of the primary goals of this book is not to dwell unnecessarily on the mathematics supporting these theories just for the sake of it, but rather to approach a particular idea through the physics involved, get to the crux of how it works, help the reader understand concepts and algorithms with sufficient and adequate inclusion of mathematics, and, ultimately, create sounds.

Mr. Cook has striven to make this book “useful to musical and non-musical sound creators of all types.” The C++ code examples given in numerous sections of the book and on the CD-ROM are included to help show how compactly and easily an algorithm can be represented, and to provide good working code for compilation and customization, while at the same time clarifying the theories behind the algorithms. As one may expect, a great number of the sound examples on the CD-ROM were actually produced with these code examples. This triangular architecture of hearing the sound, seeing the code, and reading the explanations in the book pretty much exemplifies the design of Real Sound Synthesis for Interactive Applications. It may very well be that the author’s method of elucidating ideas and structuring his book closely reflects his teaching style. It is no surprise, then, that he has based some of the structure of this book on his teachings at Princeton University, testing it in a classroom setting, adding, deleting, and editing draft versions on his laptop in real time, while teaching some of the very concepts in the book to a class of graduate and undergraduate students from computer science, electrical engineering, music, economics, and mathematics degree programs.

The book includes 16 chapters and a five-part appendix: 1. Digital Audio Signals; 2. Sampling (Wavetable) Synthesis; 3. Digital Filters; 4. Modal Synthesis; 5. The Fourier Transform; 6. Spectral Modeling and Additive Synthesis; 7. Sub-band Vocoders and Filterbanks; 8. Subtractive Synthesis and LPC; 9. Strings and Bars; 10. Nonlinearity, Waveshaping, FM; 11. Tubes and Air Cavities; 12. Two and Three Dimensions; 13. FOFs, Wavelets, and Particles; 14. Exciting and Controlling Sound Models; 15. Walking Synthesis: A Complete System; and 16. Examples, Systems, and Applications. Interestingly, Mr. Cook “really” recommends the reader start with the last chapter, which introduces various examples and application areas and shows some of the possibilities of the book’s underlying topics in a number of sound creation situations. These include a possible digital Foley stage for the film industry; a fully articulated sound model of a dinosaur and a slinky-dog for animation and gaming developers; and a sonification of stock prices for the Wall Street community. In a way, this chapter sets the stage for the whole book and encourages the reader to consider how a particular technology could work for a particular application.

The first chapter is one of the shortest in the book, briefly establishing the fundamentals of digital audio, and including an introduction to the basics of quantization, compression, and Pulse-Code Modulation (PCM) sampling. Chapter 2 dives right into the sound synthesis world with wavetable synthesis, and addresses issues concerning, among others, the pros and cons of overlap-add methods. Prior to chapter 3, even though a lot of major topics are covered, no unnecessary math is included within the text. Certainly this absence could be regarded as a negative, especially for those people who like to see proofs of everything from Euler identities to the Fermat riddle in any technical book. However, this conflicts with Mr. Cook’s more application-based approach. Furthermore, any good introductory textbook on signal processing will probably be more suitable for a person seeking such standard necessities.

In chapter 3, digital filters are introduced. It includes a concise but clear introduction to Linear Time Invariant (LTI) systems, convolution, Finite Impulse Response (FIR) filters, Infinite Impulse Response (IIR) filters, and “magical” Z transforms. The chapter culminates in the ever-so-popular BiQuad filter in section 3.10. For obvious reasons, the number of equations and symbols to explain concepts becomes a non-rarity here, but the inclusion of approximately 1.2 pictures and diagrams per page helps tremendously with grasping the concepts discussed therein. Up until chapter 3, everything is pretty much explained in the time domain. Chapter 4, which deals with modal synthesis, acts as a stepping-stone to the frequency domain, leading to chapter 5’s discussion of the Fourier Transform. This section examines the Discrete Fourier Transform (DFT), “fast convolution,” and the Short Time Fourier Transform (STFT), and ends with examples of applications, keeping a close check between theory, application, and reality.
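
For readers who have not met the BiQuad, it is simply a two-pole, two-zero recursive filter. The following is a minimal sketch in C++ under stated assumptions, not code from the book or its CD-ROM: the struct name BiQuad, the methods setResonance and tick, and the helper impulseResponse are all hypothetical, and configuring the filter as a resonator (a complex pole pair at radius r) is just one common textbook choice.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative two-pole, two-zero filter; all names are hypothetical.
struct BiQuad {
    // Feedforward (b) and feedback (a) coefficients, with a0 normalized to 1.
    double b0 = 1.0, b1 = 0.0, b2 = 0.0;
    double a1 = 0.0, a2 = 0.0;
    // Two samples each of input and output history.
    double x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0;

    // Place a complex pole pair at radius r and angle 2*pi*freqHz/sampleRate.
    void setResonance(double freqHz, double r, double sampleRate) {
        assert(r >= 0.0 && r < 1.0);  // poles must stay inside the unit circle
        const double pi = 3.14159265358979323846;
        a1 = -2.0 * r * std::cos(2.0 * pi * freqHz / sampleRate);
        a2 = r * r;
    }

    // Compute one output sample from one input sample.
    double tick(double x) {
        const double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
};

// Feed a unit impulse through a copy of the filter and collect n samples.
std::vector<double> impulseResponse(BiQuad f, int n) {
    std::vector<double> out;
    for (int i = 0; i < n; ++i) out.push_back(f.tick(i == 0 ? 1.0 : 0.0));
    return out;
}
```

With setResonance(1000.0, 0.99, 44100.0), the impulse response is a decaying sinusoid at 1 kHz; moving r closer to 1 lengthens the ring.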

With frequency-domain knowledge in hand, the reader is taken deeper in chapters 6, 7, and 8 into synthesis/analysis concepts such as Linear Predictive Coding (LPC), spectral modeling, additive/subtractive synthesis, noise signals, and inharmonicity. An integrating example deconvolves a sound into features previously discussed. Again, one can hardly turn a page without an accompanying picture or block diagram, a particularly valuable feature of this book.

Chapter 9 explores the physical modeling concepts of string vibrations and stiff bars. Modeling algorithms are introduced using basic physics-based perspectives, centered around the familiar spring, mass, and damper paradigms first introduced in chapter 4. Here again, rather than bombarding the reader with 11,025 equations, Mr. Cook opts to explain ideas mainly through diagrams, sound examples, and block diagrams, which is very helpful for practically implementing something in software and understanding an algorithm. The ready-to-compile C++ code for this section included on the CD-ROM provides models of a plucked string (Plucked.cpp), a mandolin (Mandolin.cpp), and a bowed string (Bowed.cpp). In chapter 11, Tubes and Air Cavities, the author introduces more models while leaving detailed mathematical derivations of equations for the appendix. He concludes chapter 11 with section 11.3.1, Building and Blowing a Bottle Model, with code and sound examples, as usual. Going into chapter 12, more complex, higher dimensional models are introduced, with the traditional mass-spring model in the context of a meshed membrane starting off the chapter.
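
To give a flavor of how compact a plucked-string model can be, here is a sketch of the classic Karplus-Strong algorithm: a delay line is filled with noise and then recirculated through a two-point average. This is an illustration under stated assumptions and not the contents of Plucked.cpp; the function name pluck, its parameters, and the loss factor 0.996 are all invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Illustrative Karplus-Strong pluck; names and constants are hypothetical.
std::vector<double> pluck(double freqHz, double sampleRate,
                          std::size_t numSamples, unsigned seed = 1) {
    assert(freqHz > 0.0 && sampleRate > 2.0 * freqHz);
    // The delay-line length sets the pitch: about one period of the fundamental.
    const std::size_t period = static_cast<std::size_t>(sampleRate / freqHz);

    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> noise(-1.0, 1.0);

    // Filling the delay line with noise plays the role of the pluck.
    std::vector<double> delay(period);
    for (double& s : delay) s = noise(rng);

    std::vector<double> out(numSamples);
    std::size_t pos = 0;
    for (std::size_t n = 0; n < numSamples; ++n) {
        const std::size_t next = (pos + 1) % period;
        out[n] = delay[pos];
        // Averaging neighbors is a gentle lowpass, so high partials die first;
        // the assumed factor 0.996 adds a little extra loss on every pass.
        delay[pos] = 0.996 * 0.5 * (delay[pos] + delay[next]);
        pos = next;
    }
    return out;
}
```

Calling pluck(441.0, 44100.0, 44100) produces one second of a decaying tone whose period is 100 samples. Because the period is truncated to a whole number of samples, tuning is only approximate at higher pitches.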

Chapter 13 introduces algorithms that focus on modeling and synthesizing interacting particles. The menu in chapter 13 consists of, among other subjects, Formant Wave Functions (FOFs) for voice synthesis, single particle models, multi-particle systems, and statistical multi-particle systems such as a simple maraca model, implemented in fewer than 30 lines of C code with an accompanying block diagram. Chapter 14 deals with the subtleties of exciting and controlling sound models. For example, Mr. Cook discusses the differences between exciting a string with a plectrum as opposed to using the fleshy part of the thumb. He also shows some fascinating effects of the striking conditions of the Tibetan prayer bowl, which exhibits very different spectra as a function of strike direction while keeping the strike point constant. Other topics discussed include bowing, scraping, and frictional issues in synthesis. MIDI, OSC (Open Sound Control), and other standards for sound and multimedia control are also briefly examined.
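
The statistical idea behind a maraca-like model can be sketched briefly: instead of simulating every bean, each sample decides at random whether a collision occurred, and each collision adds a short burst of noise that is passed through a resonant filter standing in for the gourd. The code below is written in that spirit and is not the book's listing; the function name shake and every constant in it (collision probability, decay rates, gain, resonance frequency, pole radius) are assumptions chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Illustrative statistical shaker; every name and constant is an assumption.
std::vector<double> shake(std::size_t numSamples, double sampleRate = 22050.0,
                          unsigned seed = 1) {
    const double collisionProb = 64.0 / 1024.0;  // chance of a collision per sample
    const double systemDecay = 0.999;            // shake energy kept per sample
    const double soundDecay = 0.95;              // each noise burst fades quickly
    const double gain = 0.1;                     // energy added per collision
    const double resonanceHz = 3200.0;           // assumed gourd resonance
    const double poleRadius = 0.96;
    assert(sampleRate > 2.0 * resonanceHz);

    // Two-pole resonator coefficients.
    const double pi = 3.14159265358979323846;
    const double a1 = -2.0 * poleRadius * std::cos(2.0 * pi * resonanceHz / sampleRate);
    const double a2 = poleRadius * poleRadius;

    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_real_distribution<double> noise(-1.0, 1.0);

    double shakeEnergy = 1.0, soundLevel = 0.0, y1 = 0.0, y2 = 0.0;
    std::vector<double> out(numSamples);
    for (std::size_t n = 0; n < numSamples; ++n) {
        shakeEnergy *= systemDecay;
        // A random draw replaces explicit tracking of individual beans.
        if (unit(rng) < collisionProb) soundLevel += gain * shakeEnergy;
        const double x = soundLevel * noise(rng);  // noise scaled by current level
        soundLevel *= soundDecay;
        const double y = x - a1 * y1 - a2 * y2;    // resonant filtering
        y2 = y1; y1 = y;
        out[n] = y;
    }
    return out;
}
```

Calling shake(22050) yields one second of a single shake that dies away as the shake energy decays. Raising the collision probability suggests more beans, while changing the resonance frequency suggests a different gourd.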

The penultimate chapter walks the reader through a complete system called PhOLISE (Physically Oriented Library of Interactive Sound Effects) that could possibly be applied to areas such as gaming, animation, and sound effects (Foley) in film production. The five sections of the appendix go into greater detail regarding proofs, derivations, and properties of topics such as DFT properties, zero-padding, proof of “fast convolution,” and ideal string behaviors.

This publication, which originates from Mr. Cook’s years of research and teaching in the domains of signal processing and physical modeling at places like Stanford University’s Center for Computer Research in Music and Acoustics (CCRMA) Summer DSP Workshop and Princeton University, is an excellent source for learning and teaching about past, recent, and state-of-the-art physical modeling techniques as well as general signal processing concepts for sound processing. The decision to introduce complex systems through a physics-first view was a direct result of his teaching in classrooms and workshops filled with a range of students of different disciplines. Perhaps that is one of the reasons why this book is already being used as a textbook for undergraduate and graduate students at colleges and universities such as the University of Virginia, the University of Florida, and Princeton University.

A lot of ground is covered in the book, however, and internalizing and digesting everything will take much time and effort. This may not be apparent at first, as the attractive design and architecture of the book are so non-intimidating. I would recommend the reader take in a line and a sentence at a time when important concepts come up. Also, those people who are expecting a conventional signal-processing book may perhaps be a little bit dissatisfied with the depth and details regarding the mathematical side of things. Probably in trying to avoid making this book into another bible for computer music with 44,100 pages, the author has not fleshed out some basic topics, such as windowing theories, the uncertainty principle pertaining to frequency and time domain issues, or filter design issues, to their fullest extent. However, an abundance of references accompanies each chapter for those who feel the need for further details on a particular subject. Furthermore, there are too many books that already step and hop through the sinc function, Radix-N algorithms, and Dolph windowing, elaborate extensively on the Gibbs phenomenon, or compute the group delay of a system. These are, of course, all important concepts in signal processing, but they do not necessarily keep you in check with real world applications, or the real world, period. Granted you may be able to solve problem sets 1.a, 3.c, and 4.d, but what else can you do with it?

The bottom line is that Real Sound Synthesis for Interactive Applications is not a book that introduces and uses signal processing concepts for the sake of signal processing; rather, it is a goal-oriented book. The goal is to create sounds, learn the tools and theory behind creating such sounds, and grasp the essentials of many sophisticated physical modeling concepts in particular.

One negative aspect of this book, especially for academic institutions, is the absence of problem sets and drill problems at the end of each chapter or subsection, as one is used to finding in textbooks. However, according to the author, problem sets for the book are currently being worked on and will be available in the near future on Mr. Cook’s Web site (www.cs.princeton.edu/~prc/AKPetersBook.htm). As a matter of fact, updates, errata, and other information pertinent to this book can also be accessed there. In conclusion, for those people who want to create sounds, understand the physics of acoustic instruments, run/analyze/hack some audio-related software, do some math, and get a solid understanding of how signal processing works in sound processing, this book is definitely the one to get.