I can still remember the first time I got to play a MiniMoog synthesizer. I had heard it being used by rock groups like Yes and Emerson, Lake & Palmer and was fascinated by the range of sounds it was capable of making. The MiniMoog was a breakthrough device in many ways. Unlike the other synths of the era, it was portable and it didn't require the use of patch cords to route the audio signals to processing modules, making it playable in real time. Amazing sounds, portability, and ease of use were key factors in the MiniMoog's popularity and subsequent ubiquity.
Today, synthesizers remain at the forefront of music making and I'm still enamored of their potential. Whereas synths were once single-voice analog devices with inherent stability problems, they are now rock solid (pun intended) digital devices with even more amazing capabilities, such as multi-timbral operation, multi-voice polyphony, and sampled sound acquisition and playback.
As a result of my continuing interest in synthesizer technology, I recently wrote an iPhone/iPod Touch app called PSynth (Figure 1), which is a complete electronic music synthesizer and recorder that fits in the palm of your hand. I've learned a lot about electronic music and how synthesizers work over the years and so I decided to write a series of articles (starting with this one) detailing some of what I have learned. Being a programmer and musician, I have long been interested in using computers for music generation and PSynth is the direct result.
In the remainder of this article, I discuss the background necessary to understand how computer-based music synthesis works and provide a basic platform for experimentation. Note: Although PSynth was written in Objective-C as required for iPhone apps, Java will be used in these articles to make the code easier to understand.
In the second article of this series, I will present code for a multi-waveform oscillator (a staple of electronic music) and for an envelope generator, which is also an important electronic music component. In the third article, I will describe the digital equivalent of a voltage controlled amplifier (VCA) and a voltage controlled filter (VCF) and discuss the part these components play in our synth discussion. And, in the final article, I will show how effects such as delay and phaser can be deployed in our synthesizer architecture. By the time you finish reading these articles, you should be well versed in the basics of electronic music and software synthesis and have the code you need to experiment on your own.
Be advised, I will be talking about only the real-time code used for electronic music production. If you are interested in the UI end of things, check out item four in the Resource section at the end of this article.
Introduction to Samples
Before delving into how to construct electronic music components/modules, it's necessary to ascertain information about the environment in which these components will run. Five important questions must be answered, all related to samples. Samples are to digital audio what pixels are to digital images.
- What is a sample?
- What sample rate will our components/modules support?
- What is the data type of the samples?
- What is the sample data type resolution/length?
- Will our components/modules support mono and/or stereo?
Samples can be visualized as values extracted at periodic intervals from a continuously variable analog waveform. Sampling converts the analog waveform into a discrete series of samples (numbers) representing the value of the waveform at precise times. In other words, it moves the waveform from the analog domain into the digital domain, which is better suited for processing by computers. The analog waveform can be reconstituted from the stream of digital samples if certain criteria (described shortly) are met.
While we have discussed samples as being extracted from analog waveforms, this is not always their origin. Samples can also be algorithmically generated in software. For example, samples representing a sine wave at a frequency of 1000 Hz can be generated so as to be indistinguishable from their analog counterparts. Waveform generation will be described in the second article of this series.
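As a minimal sketch of algorithmic sample generation (not the oscillator code from the second article), the following Java class fills a buffer with sine wave samples. The class name `SineSamples`, the method `generate`, the float range of -1.0 to 1.0, and the 44,100 samples/second rate used in `main` are all assumptions made for illustration.

```java
// Hypothetical helper for illustration only; not taken from PSynth.
final class SineSamples {

    // Returns `count` samples of a sine wave, each in the range [-1.0, 1.0].
    static float[] generate(double frequencyHz, double sampleRate, int count) {
        float[] samples = new float[count];
        // Phase advance per sample, in radians.
        double phaseStep = 2.0 * Math.PI * frequencyHz / sampleRate;
        for (int n = 0; n < count; n++) {
            samples[n] = (float) Math.sin(phaseStep * n);
        }
        return samples;
    }

    public static void main(String[] args) {
        // A 1000 Hz tone at an assumed rate of 44,100 samples/second.
        // 441 samples cover 10 ms, which is 10 complete cycles.
        float[] tone = generate(1000.0, 44100.0, 441);
        for (int n = 0; n < 5; n++) {
            System.out.println(n + ": " + tone[n]);
        }
    }
}
```

Each sample is simply the value of the sine function at the instant the sample would have been taken, which is why the result can stand in for a sampled analog tone.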
Sample rate (the rate at which samples are acquired) is important because it determines the bandwidth of signals that can be expressed digitally without adverse side effects. The Nyquist-Shannon sampling theorem states that "an analog signal that has been sampled can be perfectly reconstructed from an infinite sequence of samples if the sampling rate exceeds 2B samples per second, where B is the highest frequency in the original signal." In practice, this means that the highest frequency that can be expressed as a series of samples must be less than ½ the sample rate. Any frequencies higher than this Nyquist limit will cause aliasing of the signal, creating frequency components not present in the original signal (a bad thing).
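To make the Nyquist limit and aliasing concrete, here is a small Java sketch. It assumes a pure sine tone and uses the standard folding relationship, in which a tone appears at its distance from the nearest whole multiple of the sample rate. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration only.
final class NyquistSketch {

    // Highest frequency (exclusive) that a given sample rate can represent.
    static double nyquistLimit(double sampleRate) {
        return sampleRate / 2.0;
    }

    // Frequency at which a pure tone appears after sampling.
    // Tones below the Nyquist limit are returned unchanged;
    // tones above it fold back into the range 0..sampleRate/2.
    static double apparentFrequency(double frequencyHz, double sampleRate) {
        double nearestMultiple = sampleRate * Math.round(frequencyHz / sampleRate);
        return Math.abs(frequencyHz - nearestMultiple);
    }

    public static void main(String[] args) {
        double rate = 44100.0;
        System.out.println(nyquistLimit(rate));               // 22050.0
        System.out.println(apparentFrequency(1000.0, rate));  // 1000.0
        System.out.println(apparentFrequency(30000.0, rate)); // 14100.0
    }
}
```

In this example a 30,000 Hz tone sampled at 44,100 samples/second shows up as a 14,100 Hz tone that was never in the original signal, which is exactly the kind of artifact aliasing produces.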
Audio subsystems in modern computers are capable of supporting a wide variety of sample rates. At the low end of the spectrum, a sample rate of 8000 samples/second is typically used for voice applications, whereas sample rates of 96,000 samples/second and higher are reserved for high-end audio acquisition and processing applications. A sample rate of 44,100 samples/second is used for music stored on CDs.
It is important to understand that the faster the sampling rate, the more load is placed on the systems processing the samples. It is therefore important to choose a sample rate carefully, trading off the bandwidth required for signal fidelity against the resultant system loading. In the EMU 1820m sound card I use for home recording, for example, the faster I set the sample rate, the fewer inputs and effects I have available.
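One rough way to see how loading grows with sample rate is to compute the raw data rate of an uncompressed stream. The Java sketch below does only that arithmetic. The class name is hypothetical, and bytes per second is used here as a simple proxy for load rather than a measure of actual CPU cost.

```java
// Hypothetical helper for illustration only.
final class SampleLoad {

    // Bytes that must be moved each second for an uncompressed sample stream.
    static long bytesPerSecond(int sampleRate, int channels, int bytesPerSample) {
        return (long) sampleRate * channels * bytesPerSample;
    }

    public static void main(String[] args) {
        // Voice: 8000 samples/second, mono, 8-bit samples.
        System.out.println(bytesPerSecond(8000, 1, 1));   // 8000
        // CD audio: 44,100 samples/second, stereo, 16-bit samples.
        System.out.println(bytesPerSecond(44100, 2, 2));  // 176400
        // High-end example: 96,000 samples/second, stereo, 32-bit samples.
        System.out.println(bytesPerSecond(96000, 2, 4));  // 768000
    }
}
```

Every processing module in the signal chain has to keep up with this rate, so doubling the sample rate roughly doubles the work each module must do per second.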
A sample can either be a whole number (integer value) or a real number (floating-point value). Integer samples have a word length. An integer sample can be 8 bits, 16 bits or 32 bits in length and can be an unsigned or signed quantity. 32-bit signed values are used most often these days, but there are exceptions. Real valued samples can either be single (float) or double-precision (double) floating-point values.
Sample word length determines audio resolution and noise in a system. A byte sample has 256 possible values, whereas a 32-bit integer has 2^32 possible values, resulting in better resolution (hence, less signal distortion due to quantization). The increased resolution available to longer word length samples also means that the noise in the system will fall well below typical signal levels and therefore be less noticeable.
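The relationship between word length and resolution can be sketched in a few lines of Java. The dynamic range figure uses the common approximation 20 × log10(number of levels), which works out to about 6 dB per bit. It is a theoretical ceiling, not a measurement of any particular hardware, and the class name is hypothetical.

```java
// Hypothetical helper for illustration only.
final class WordLengthSketch {

    // Number of distinct values an integer sample of the given width can hold.
    // Valid for widths from 1 to 62 bits.
    static long levels(int bits) {
        return 1L << bits;
    }

    // Approximate theoretical dynamic range in decibels (about 6.02 dB per bit).
    static double dynamicRangeDb(int bits) {
        return 20.0 * Math.log10((double) levels(bits));
    }

    public static void main(String[] args) {
        for (int bits : new int[] {8, 16, 32}) {
            System.out.printf("%2d bits: %d levels, about %.1f dB%n",
                    bits, levels(bits), dynamicRangeDb(bits));
        }
    }
}
```

Each additional bit doubles the number of available levels, so quantization error (and the noise it introduces) shrinks quickly as word length grows.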
Most electronic music modules utilize mono or single channel sample streams. Some, however, accept a single channel stream as input but produce a multi-channel output stream. Multi-channel streams (stereo being the best known) require the processing of multiple samples per sample time — again increasing system loading.
Programming in the iPhone/iPod Touch environment, as I did for PSynth, required that I follow the rules for that environment. The API documentation said it was best to use floating-point values (floats) for samples because the hardware was optimized for them. So I did. I developed all of the PSynth code using the iPhone emulator provided with the development tools (Xcode) for testing and got everything to work. I then moved the code onto an iPod Touch and it didn't make a sound. After some serious head scratching, I found out that the real devices don't support float samples like the emulator did. I then had to write more code that accepted streams of float samples and converted them to integers so the hardware could play them. So, to summarize the PSynth environment: float samples are used for sample generation and processing and are converted to integers for the hardware, the sample rate is 22050 samples/second, and all sample streams are mono.
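A float-to-integer conversion of this kind might look like the following Java sketch. It is not the PSynth code (which was written in Objective-C). It assumes float samples nominally in the range -1.0 to 1.0 and 16-bit signed integer output, neither of which is specified above, and the class and method names are hypothetical.

```java
// Hypothetical helper for illustration only; not the PSynth conversion code.
final class SampleConverter {

    // Converts float samples (assumed range -1.0..1.0) to 16-bit signed integers.
    // Out-of-range values are clamped so they cannot wrap around and cause clicks.
    static short[] toInt16(float[] input) {
        short[] output = new short[input.length];
        for (int i = 0; i < input.length; i++) {
            float clamped = Math.max(-1.0f, Math.min(1.0f, input[i]));
            // Scale to the symmetric range -32767..32767 and round to nearest.
            output[i] = (short) Math.round(clamped * 32767.0f);
        }
        return output;
    }

    public static void main(String[] args) {
        float[] floats = {0.0f, 1.0f, -1.0f, 2.0f};
        for (short s : toInt16(floats)) {
            System.out.println(s); // 0, 32767, -32767, 32767
        }
    }
}
```

Keeping all generation and processing in floats and converting only at the last step means the integer format can change without touching the rest of the signal chain.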