
Java Q&A



Rick is a System Engineer for Intel Corp. He manages a central support group in a department that is responsible for NT and UNIX infrastructure design and integration with enterprise applications. He can be contacted at [email protected].

Streaming audio refers to audio that can be downloaded at the same speed it is played. If the download speed matches or exceeds the rate at which the audio data is played, the audio can be played indefinitely without any interruptions. There are many practical applications for this technology on the Internet, and numerous commercial applications in the streaming audio client/server market have emerged. In this article, I'll present idtAudio, a streaming audio applet written in Java. (The source code for idtAudio and the companion program, idtAudio.cpp, is available electronically; see "Resource Center," page 3.)

As a Java applet, idtAudio inherits several advantages over most other streaming audio tools. Its most distinct advantage is that users never need to install any software. As an applet, the code is downloaded automatically into the browser as needed without user intervention. Furthermore, since the code is downloaded for each audio instance, there is no possibility of version mismatches between the audio data and the player. Also, applets are portable. The idtAudio applet seems to run equally well on Netscape Navigator 3.01 and Microsoft Internet Explorer 3.01 on Windows 95/NT.

The design of idtAudio also sports several distinctive features. The streaming protocol is built on standard HTTP, which is the same protocol used by all web browsers. As such, the applet can download its audio data through any firewall that allows web activity. This also means that the server has no requirements for special drivers or extensions.

Compression

Compression is the key to transmitting streaming audio data at low bandwidths. Audio is considered one of the most difficult media to compress, and because the decoder runs in Java, the algorithm must also be very fast.

I researched several existing audio compression technologies, starting with MPEG. While its audio quality is excellent and several code samples are readily available, MPEG's psychoacoustic algorithm places a heavy toll on the CPU and cannot be implemented effectively in Java.

ADPCM is a simpler algorithm that would seem more likely to work in the Java environment. Unfortunately, ADPCM does not easily compress to the levels required for streaming modem transmission. The quality becomes poor, and the specific algorithms used become more secretive.

Other algorithms had similar problems, so I was forced to develop my own. Because compression relies on predictable patterns to be effective, I began by studying graphs of wave forms (see Figure 1). The sinusoidal qualities of the wave suggested a first algorithm: Compress by capturing the peaks and valleys only, and decompress by programmatically filling in the remaining points using a sinusoidal function. With this algorithm, the decompressed audio had an annoying echo effect. More importantly, I also realized that the algorithm was uncompromising in that it could never predictably compress to any particular ratio. For example, a wave composed entirely of peaks and valleys would double in size after being "compressed."

A common technique used in modern physics is "curve fitting," where empirical data is approximated to a mathematical function such as a line or a parabola. My next attempt at audio compression broke up the wave form into a series of straight lines that approximate the original. The audio quality was much improved, and the compression ratio could be controlled by how closely the original wave form was followed. This tolerance level could be iteratively increased until the desired compression was achieved. The only problem was that, to achieve the required compression ratios, the tolerances had to be set very high. This degraded the audio quality too much. I still needed a way to increase the compression without sacrificing sound quality.
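The straight-line approach can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in encode.exe: the class name LineFitSketch, the greedy segment-growing strategy, and the "maximum number of kept points" budget are all hypothetical stand-ins for whatever the real encoder does.

```java
// Hypothetical sketch of piecewise-linear compression with a tolerance.
// Names and the greedy strategy are assumptions, not idtAudio's actual code.
final class LineFitSketch {

    /** Returns the indices of the samples kept as line-segment endpoints. */
    static int[] fit(byte[] samples, int tolerance) {
        int n = samples.length;
        if (n == 0) return new int[0];
        int[] kept = new int[n];
        int count = 0;
        int anchor = 0;
        kept[count++] = 0;
        while (anchor < n - 1) {
            int end = anchor + 1;
            // Grow the segment while every skipped sample stays within tolerance.
            while (end + 1 < n && withinTolerance(samples, anchor, end + 1, tolerance)) {
                end++;
            }
            kept[count++] = end;
            anchor = end;
        }
        int[] result = new int[count];
        System.arraycopy(kept, 0, result, 0, count);
        return result;
    }

    /** True if the line from sample a to sample b misses no sample by more than tolerance. */
    static boolean withinTolerance(byte[] s, int a, int b, int tolerance) {
        for (int i = a + 1; i < b; i++) {
            int predicted = s[a] + (s[b] - s[a]) * (i - a) / (b - a);
            if (Math.abs(predicted - s[i]) > tolerance) return false;
        }
        return true;
    }

    /** Raises the tolerance step by step until at most maxPoints endpoints remain. */
    static int[] fitToBudget(byte[] samples, int maxPoints) {
        int tolerance = 0;
        int[] kept = fit(samples, tolerance);
        while (kept.length > maxPoints && tolerance < 255) {
            tolerance++;
            kept = fit(samples, tolerance);
        }
        return kept;
    }

    /**
     * Rebuilds a full-length wave by interpolating between kept points.
     * Only the kept samples of 'original' are read; a real decoder would
     * receive just those values.
     */
    static byte[] rebuild(byte[] original, int[] kept) {
        byte[] out = new byte[original.length];
        if (kept.length == 1) out[0] = original[0];
        for (int k = 0; k + 1 < kept.length; k++) {
            int a = kept[k], b = kept[k + 1];
            for (int i = a; i <= b; i++) {
                out[i] = (byte) (original[a] + (original[b] - original[a]) * (i - a) / (b - a));
            }
        }
        return out;
    }
}
```

With a tolerance of zero, a wave that is already made of straight runs is reproduced exactly from its corner points. Raising the tolerance trades fidelity for fewer points.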

A frequency analysis of the audio data showed that smaller values occur much more regularly than larger ones, so it seems natural to sacrifice precision at higher amplitudes and preserve more detail near the smaller amplitudes. Originally, each audio sample is a signed eight-bit byte. idtAudio further compresses the audio data by storing a four-bit value for the amplitude instead of the full eight bits. The four-bit signed value is the square root of the eight-bit value (with negative amplitudes represented by negative square roots). Right or wrong, this extra 33 percent of compression was enough to make idtAudio work.
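As a rough illustration of the square-root idea, here is a minimal sketch. It is not the article's code and makes two loud assumptions: the four-bit value is treated as a sign plus a magnitude of 0 to 7, and the square root is scaled so that magnitude 7 corresponds to amplitude 127 (an unscaled square root of 127 is about 11.3, which would not fit). The class and method names are hypothetical.

```java
// Hypothetical sketch of square-root amplitude quantization.
// Assumption: codes run from -7 to 7, scaled so that 7 maps to amplitude 127.
final class SqrtQuantSketch {

    /** Maps a signed eight-bit sample to a signed code in -7..7. */
    static int encode(byte sample) {
        int magnitude = Math.min(Math.abs((int) sample), 127); // fold -128 onto 127
        int code = (int) Math.round(7.0 * Math.sqrt(magnitude / 127.0));
        return sample < 0 ? -code : code;
    }

    /** Squares the code to recover an approximate eight-bit amplitude. */
    static byte decode(int code) {
        int magnitude = (int) Math.round(127.0 * code * code / 49.0);
        return (byte) (code < 0 ? -magnitude : magnitude);
    }
}
```

Under this scaling, the seven positive codes decode to 3, 10, 23, 41, 65, 93, and 127, so the steps are fine near zero and coarse near full scale, which matches the trade-off described above.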

Although the audio compression program, encode.exe, was originally written in Java, performance was poor on large files, so I rewrote it in C. The C version seems quick enough for streaming encoding. Thus, we have laid the foundations for an Internet telephone, but that would be a different article entirely...

Decoding and Playing the Compressed Audio

Due to Java's poor audio support, the audio player applet requires the use of Sun's unsupported AudioPlayer class in the sun.audio class library. (It is this dependency that prevents idtAudio from working with every Java-capable browser.) The AudioPlayer class requires an InputStream object from which it reads and plays ulaw-encoded audio data until it reaches the end of file. The idtAudio applet derives its own class from InputStream called FifoInputStream (which will be referred to as FIS for reasons I'll explain shortly), and submits it to the AudioPlayer class. FIS allows one write thread and one read thread to communicate with fairly good performance.

A simple architectural solution, then, would have one thread downloading, decoding, and depositing audio data into the FIS queue, while another thread (implemented via AudioPlayer) would read and play the audio data; see Figure 2. This design worked well when the compressed audio data was read from a local hard disk. Streaming from a true Internet URL was a different story. The audio was broken up and my modem lights indicated a lack of activity. Some quick tests of URL download times with C and Java programs suggested that the server's performance was not an issue, and that the Java language, itself, was not an issue. Clearly, the simple design described earlier was not an optimal solution.

Because a serial download leaves the CPU mostly idle, it made sense to break up the download and decode tasks into separate threads; see Figure 3. The FIS class was the perfect way for the two threads to communicate. A DoT (short for "download thread") class, derived from Thread, would be responsible for downloading the audio data and depositing it into the first FIS object. The DeT (short for "decode thread") class, also derived from Thread, would be responsible for reading the first FIS object, decoding the data, and depositing the results into the second FIS. The AudioPlayer class would then read and play the second FIS at its leisure. This solution proved to be both successful and elegant.
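A queue of this kind might look something like the sketch below. It is an assumption-laden stand-in, not the FIS class itself: the name FifoStreamSketch, the finish() method, and the pump() demo are hypothetical, and for simplicity both the read and write paths are fully synchronized with wait/notifyAll. Each stage of the pipeline would own one such queue between itself and the next stage.

```java
// Hypothetical single-writer, single-reader FIFO exposed as an InputStream.
// Not the article's FIS class; fully synchronized for clarity.
final class FifoStreamSketch extends java.io.InputStream {
    private final byte[] buffer;
    private int head;        // next write position
    private int tail;        // next read position
    private int full;        // number of bytes currently queued
    private boolean closed;  // set once the writer has no more data

    FifoStreamSketch(int capacity) { buffer = new byte[capacity]; }

    /** Called by the producer thread; blocks while the queue is full. */
    synchronized void write(byte[] src, int off, int len) throws InterruptedException {
        while (len > 0) {
            while (full == buffer.length) wait();
            int chunk = Math.min(len, Math.min(buffer.length - full, buffer.length - head));
            System.arraycopy(src, off, buffer, head, chunk);
            head = (head + chunk) % buffer.length;
            full += chunk;
            off += chunk;
            len -= chunk;
            notifyAll();
        }
    }

    /** Marks end of stream so the reader eventually sees -1. */
    synchronized void finish() { closed = true; notifyAll(); }

    @Override
    public synchronized int read() throws java.io.IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) < 0 ? -1 : one[0] & 0xFF;
    }

    @Override
    public synchronized int read(byte[] dst, int off, int len) throws java.io.IOException {
        if (len == 0) return 0;
        try {
            while (full == 0) {
                if (closed) return -1;
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new java.io.InterruptedIOException();
        }
        int chunk = Math.min(len, Math.min(full, buffer.length - tail));
        System.arraycopy(buffer, tail, dst, off, chunk);
        tail = (tail + chunk) % buffer.length;
        full -= chunk;
        notifyAll();
        return chunk;
    }

    /** Demo: one thread writes while the caller reads everything back. */
    static byte[] pump(byte[] data, int capacity) {
        FifoStreamSketch fifo = new FifoStreamSketch(capacity);
        Thread writer = new Thread(() -> {
            try {
                fifo.write(data, 0, data.length);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                fifo.finish();
            }
        });
        writer.start();
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        byte[] chunk = new byte[3];
        try {
            int n;
            while ((n = fifo.read(chunk, 0, chunk.length)) != -1) out.write(chunk, 0, n);
            writer.join();
        } catch (java.io.IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }
}
```

In the three-stage design, the download thread would call write() on the first queue, the decode thread would read from it and write to a second queue, and the player would be handed the second queue as its InputStream.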

Optimizing Java Code

Some serious optimization was still required to make the player perform on a 486 running Netscape's slightly slower implementation of Java.

First, the synchronized sections of code were minimized. Some basic common sense pays off here. Consider the nature of our FIFO queue: One thread reads at the tail and another writes to the head. The only point of contention is the full member that indicates how full the queue is. Write operations to this member are synchronized. Read operations are not synchronized. This does carry a slight risk, but the risk only occurs under an unusual set of circumstances and is considered acceptable for the performance gain.

Every time an array is accessed in Java, the language performs some internal bounds checking. This checking can be costly. Not only was array access minimized, but looping itself was also minimized. Buffered access to the FIS objects was preferred over character-at-a-time access wherever possible.

Probably the most significant performance boost came from the System.arraycopy() function, which copies one array's contents to another, given offsets and lengths into each. System.arraycopy() is analogous to memcpy() in C and similar in performance. Sadly, the only alternative in Java is to use a standard loop and iteratively copy one member at a time.
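To make the comparison concrete, here is a small sketch of copying a block out of a circular buffer both ways. The class and method names are hypothetical and the code is not taken from idtAudio. It only illustrates why a wrapped read from a ring needs at most two System.arraycopy() calls where a loop touches every element.

```java
// Hypothetical illustration: bulk copy versus element-by-element copy
// when reading from a circular buffer.
final class RingCopySketch {

    /** Copies len bytes starting at ring[start], wrapping past the end of the array. */
    static void copyOut(byte[] ring, int start, byte[] dst, int len) {
        int first = Math.min(len, ring.length - start);
        System.arraycopy(ring, start, dst, 0, first);        // up to the end of the ring
        System.arraycopy(ring, 0, dst, first, len - first);  // wrapped remainder, possibly empty
    }

    /** The equivalent loop: one bounds-checked access per element on each side. */
    static void copyOutLoop(byte[] ring, int start, byte[] dst, int len) {
        for (int i = 0; i < len; i++) {
            dst[i] = ring[(start + i) % ring.length];
        }
    }
}
```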

Instead of a circular FIFO, I considered using a linked list of dynamically allocated buffers. When data was entered into a FIFO, a new buffer would be allocated and linked to the head of the list. Reading would retrieve and unlink tail-end buffers. Such a scheme could avoid one of two calls to System.arraycopy(), but would add the expense of garbage collection and object creation. A couple of test programs showed that System.arraycopy() was still slightly faster than the linked list scheme, so I never implemented the change.

Information on Java optimization is sketchy at best. Most texts suggest the generic approach of optimizing the innermost loops. The tricks I've just described showed remarkable performance gains, however. These techniques can be applied to almost any program.

Protecting Java Code

You may wonder why I chose such cryptic naming conventions for my classes. Originally, they were called FifoInputStream, DecodeThread, and DownloadThread, but the destiny of my applet was uncertain at that time. Because Java classes are downloaded by name, anybody using the audio player could quickly discover some key design secrets! The names were too informative, so I renamed them to FIS, DeT, and DoT, names that would be meaningless to anyone who didn't already understand the applet's internal architecture.

Another protection mechanism demonstrated by idtAudio is the "time-bomb" feature often seen in shareware programs. (The code is commented out to effectively disable the time-bomb feature, but remains for your reference.) Typically, an expiring license scheme is not very effective in Java applets because a web developer could easily download a licensed copy of the applet from another web site. The trick to making this work is to place the "bomb" in the audio data instead of the applet itself. By doing so, the key to the license is embedded in data that is useless to everyone but the web site owner who owns the license. This is accomplished rather simply by placing a time stamp within the header information. The applet checks the time stamp and refuses to play encoded files that are too old. When registered, however, the encoder inserts null data into the time stamp slot, which indicates that the encoded audio file never expires.
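The check itself could be as simple as the sketch below. Everything about the layout here is an assumption made for illustration: the article does not say where the time stamp sits in the header, how wide it is, or how long a file stays valid, so the eight-byte big-endian millisecond stamp at offset 0 and the 30-day window are hypothetical.

```java
// Hypothetical expiry check on an encoded file's header.
// Assumed layout: bytes 0..7 hold a big-endian time stamp in milliseconds;
// all zeros means the file was produced by a registered encoder.
final class ExpirySketch {
    static final long MAX_AGE_MILLIS = 30L * 24 * 60 * 60 * 1000; // assumed 30-day window

    /** Reads the assumed eight-byte big-endian time stamp from the header. */
    static long readStamp(byte[] header) {
        long stamp = 0;
        for (int i = 0; i < 8; i++) {
            stamp = (stamp << 8) | (header[i] & 0xFF);
        }
        return stamp;
    }

    /** Returns true if the file may be played at the given time. */
    static boolean mayPlay(byte[] header, long nowMillis) {
        long stamp = readStamp(header);
        if (stamp == 0) return true;                 // null stamp: never expires
        return nowMillis - stamp <= MAX_AGE_MILLIS;  // otherwise refuse files that are too old
    }
}
```

An applet would call something like mayPlay(header, System.currentTimeMillis()) before starting playback.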

Using idtAudio

The first step to using the idtAudio applet is to create your audio files. I use Syntrillium's Cool Edit 96 to create mine; even the disabled shareware version has enough features to do the job. Record an audio session and save it as an 8000 Hz eight-bit sam file.

Next, the audio file needs to be compressed. Use the encode.exe program for this. It requires an input file and an output file for command-line parameters. An optional third parameter is either a tolerance level or a target bandwidth (in characters per second), depending upon the value given. If a tolerance is given, the audio file is compressed once at the specified tolerance. If a target bandwidth is specified, the audio file is compressed iteratively at increasing tolerances until the target bandwidth is reached.

My experiments suggest that a 28.8 modem connected at 24000 bps can sustain transmission speeds of approximately 2300 cps, so this is the default target bandwidth when none is specified. A 14.4 modem can only sustain bandwidths of about 1400 cps. Some experimentation has uncovered a rule of thumb: Under good conditions with PPP, a modem's bandwidth in characters per second is approximately the modem's connection speed in bits per second divided by 10. To be safe, I usually subtract another 100 cps.
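The rule of thumb reduces to a line of arithmetic, sketched here with hypothetical names. The second method is an added illustration rather than something from the article: assuming the raw recording is 8000 eight-bit samples per second (8000 bytes per second), it estimates how much compression a given target bandwidth demands.

```java
// Hypothetical helper for the modem rule of thumb described above.
final class BandwidthSketch {

    /** Connection speed in bits per second, divided by 10, minus a 100 cps safety margin. */
    static int targetCps(int connectBps) {
        return connectBps / 10 - 100;
    }

    /** Compression ratio needed, assuming 8000 bytes of raw audio per second. */
    static double requiredRatio(int cps) {
        return 8000.0 / cps;
    }
}
```

For a 24000 bps connection, targetCps gives 2300, which agrees with the default target bandwidth mentioned above.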

After the encoded audio file is created, it must be stored on your web server. You may wish to use a familiar MIME type if your server is picky about such things. The four applet classes for the player (DeT, DoT, FIS, and idtAudio) must also be stored on the server. Add Example 1 to your HTML code to add the idtAudio applet.

In this example, "fish.idt" would be the name of the encoded audio file. The Debug parameter sets the style of the applet's appearance; true provides more detailed information. The AutoPlay parameter indicates whether the applet should begin playing immediately or wait for the user to click on it.

Conclusion

In retrospect, there are a number of variations to the compression algorithm that could increase compression and quality. You could even implement an "algorithm shifting" scheme that would change compression algorithms on the fly based on an analysis of the audio data.

Simpler improvements would be to take the amplitude compression into account when calculating tolerances. The amplitude compression introduces substantial error that is not considered by the tolerance level. Also, the amplitudes might be better stored as offsets instead of absolutes, similar to the ADPCM method.

Nevertheless, the idtAudio applet exemplifies a number of concepts. First, it is a good example of object-oriented programming. In fact, I believe every object-oriented programming principle is demonstrated in some way within the applet's code. Next, it demonstrates the power of Java. Consider that the total code size is only 17 KB yet it implements a multithreaded graphical applet with performance good enough to perform downloading, number crunching, and real-time audio on a 486 machine. Lastly, it demonstrates a basic security and licensing scheme that developers may want to apply to commercial applets.

DDJ


Copyright © 1998, Dr. Dobb's Journal

