Dr. Dobb's | Java NIO & the iTunes Database

Java NIO & the iTunes Database

Dmitriy uses the Java NIO package—part of J2SE 1.4—to unravel the iTunesDB format, which is at the heart of the database used in Apple's iPod MP3 player.

December 01, 2003
URL:http://www.drdobbs.com/database/java-nio-the-itunes-database/184405502

Dec03: Java NIO & the iTunes Database

Dmitriy is chief architect at MetricStream. He can be contacted at [email protected].

Free-form data formats such as XML are popular in part because they can be read/edited by humans and understood by different computer systems. Fixed-form binary formats, on the other hand, are cryptic and platform dependent, but nonetheless important where memory and CPU-speed constraints are an issue.

Assembly language and C have traditionally been used when working with binary formats—but not Java. However, the Java NIO package changes this. Delivered as part of J2SE 1.4, Java NIO (short for "New Input/Output") extends the standard I/O package by adding classes for scalable network and file I/O, buffer management, and character-set support. In addition, NIO includes facilities for improving performance and simplifying the parsing of binary data formats, synchronization of data processing, threadless access to large numbers of data requesters, and so on. Consequently, Java has become a viable platform for mobile devices, messaging services with huge numbers of requesters, video/audio compression, legacy binary formats, and the like.

In this article, I examine Java NIO features that are useful for fast parsing of binary data. For the purposes of illustration, I apply the techniques described here to the iTunesDB format, which is at the heart of the database used in Apple's iPod MP3 player. In the process, I present code (available electronically; see "Resource Center," page 5) that lets you export iTunesDB-formatted data to CSV and XML format.

Java NIO

Java NIO doesn't directly deal with general I/O operations. Instead, it relies on the I/O package for traditional read/write operations. However, since buffers are what NIO is all about, the standard I/O stream or direct access read/write operations are extended by buffer operations in NIO. Buffer, NIO's abstract class, has a number of derived classes—the most important for buffer manipulation being ByteBuffer. This class enables more flexible heterogeneous access to buffer content while, at the same time, letting you reserve specific data-type buffers for bulk operations. Most buffer methods return a reference to ByteBuffer, making them convenient for chain operations.

Since all Buffer-based classes are abstract, they can't be instantiated directly. Any byte array can be converted to a buffer using wrap operations. Capacity is an important buffer characteristic. If programs try to retrieve data beyond buffer capacity, then underflow exceptions occur. If programs try to send data beyond buffer capacity, overflow exceptions result. Buffers can be read-only, thereby preventing their content from being changed. Buffers can also be direct or indirect. Direct buffers lead to better I/O performance operations because native methods use the buffer for actual native I/O operations. However, freeing buffers can be problematic for garbage collectors because they may reside in native code. Consequently, it isn't a good idea to frequently allocate/deallocate direct buffers.

ByteBuffer also supports:

Compacting, which lets you prepare buffer content for resuming writing in case some operations failed.
Slicing, which allows part of a buffer to be shared by another buffer.
Duplicating, similar to slicing, but gives you access to an entire buffer.

In addition to I/O, buffers can be used for primitive data-type conversion, or for storing different data types in one place without creating a specific class for every possible data-type combination. This is similar to unions in C, where the same memory can be interpreted as a set of different data types.

NIO classes involved in actual I/O operations are defined in the nio.channels subpackage. Channel provides access to externally stored data from multiple threads. Traditional read/write operations are extended by buffer manipulation when channels are used. To simplify access to file content, you use MappedByteBuffer, which represents part of a file usually stored in memory. The Channel interface defines a basic channel operation as closed, because a channel should be open when obtained. To check the status of a channel, use the isOpen method. Other interfaces define specific behaviors for channels. Along with interfaces, the NIO package contains basic abstract channel classes used for different I/O operations. Particular implementations give rich sets of methods simplifying concurrent access to channel content. Entire files (or parts of them) can be locked for particular types of operations. There are specific types of channels for pipes and selectors. Most base classes of package I/O let you obtain specific channels. Consequently, the nio.channels package doesn't provide ways for obtaining channels except via the helper class Channels.

Classes defined in nio.charset (another NIO base package) provide methods to convert data to/from different char sets. The classes can work with buffers, which provide a powerful mechanism for interpreting file content in different charsets. You can't do this easily with the basic I/O package. Instead, you have to provide your own implementation if the file content is in different charsets. In addition to conversion, the classes let you provide different types of behavior when decoding/encoding errors occur. There is a mechanism to obtain information about errors without catching exceptions, and Channels and charset are accompanied by corresponding service-provider packages.

Apple's iPod

Apple Computer's iPod (http://www.apple .com/ipod/) is an MP3 player that uses a proprietary data format to store information about music files and playlists on the iPod's hard drive. Although the iPod can appear as a removable drive, you can't just add music files and play them. Instead, you must use special programs to update the iPod's database.

It appears that the iPod's firmware is capable of understanding different types of filesystems, including HFS+ and Windows FAT32. In this article, I assume the iPod supports FAT32. Figure 1 illustrates the iPod's typical directory structure. The iTunes directory contains several files, including iTunesDB (which I focus on here). The Music directory contains 20 subdirectories where actual audio files are stored. Actually storing music files in these directories isn't mandatory.

The iTunesDB file has a tree-like structure, in which each element of the tree is a header. The format of the file looks similar to the hierarchy of atoms used in the QuickTime format. Every header has a similar structure—a 4-byte signature followed by a sequence of 32-bit integers. Little-endian notation is used throughout. A number after the signature indicates the size of the header. If a header has child headers, then the next number indicates the total size of the header—including the size of all children. The headers include the following items.

mhbd is the root header of the entire database tree. The header name is a header signature. Since I don't know how to decipher the abbreviated signature of every header, I settled on the approach whereby "mh" means "Mac Header" and "bd" means "Base Directory." The total size of this header matches the size of the iTunesDB file. This header includes the version number of the database, along with additional information I couldn't discern. It has two mhsd child headers, but they are of different types.
mhsd, a header that starts a section of audio-file or playlist directories; I call it the "song directory." There are two types of this header—a song directory that is the next header after mhbd, and a playlist directory that follows the song directory.
mhlt is a list of audio-file descriptors; I call it "list tracker." mhlt contains a number of children, so if you only need to know the number of audio files on the iPod, then you can stop reading the database after this header.
mhit presents numeric information for every audio file visible to the iPod firmware. This header, which I call "item tracker," provides information such as the unique ID of a song (via the audio-file playlist headers). Besides the ID, the header contains the year of music composition, track number on the CD, bit rate, length, and numeric information (dates, start/stop margins, volume, and the like). Only the ID and length are relevant to this article. Additional information for this header is encapsulated in its child mhod headers. Figure 2 is a partial hex dump that shows a header format.
mhod heads a portion of data, usually Unicode strings, but can also refer to a number. I call this "ordinary data." Data follow this header, but I put them in a separate header because they also include the sort of header that contains size information. The data type is specified in this header as one of the value/meaning combinations in Figure 3. There is a gap between commentary and composer values, which makes me think Apple may add new types (such as a conductor, recorder, or whatever).
Data headers for strings start with a 16-byte header. The length of a string in bytes is contained in the first integer, while the rest of the header is filled with 0s. An actual Unicode string starts just after the 16th byte. If a header is of type integer, then it is 4 bytes long and presents the integer value.
mhsd of playlist type deals with playlist information. This header looks like a divider, but doesn't contain much useful information. Subsequent headers follow it, providing information about lists.
mhlp starts a list of playlists and follows after mhsd of type playlist. This header contains only the number of playlists. I call it "list prefix." It appears that iTunesDB always has a mandatory list of all stored audio files.
mhyp starts every playlist and provides the number of entries and playlist type. This header, which I call "yard prefix," can have a qualifier such as an mhod header providing the name of the list or its number. A playlist such as this has a specific attribute and is defined first. All other playlists follow it.
mhip defines an element of a playlist and can contain a qualifier specifying the ID of a referred audio file. An additional mhod header is used to specify the order number in the list for certain versions of iTunesDB where the requested order is different than natural flow headers.

Figure 4 illustrates the actual header structure flow. For more information on iTunesDB format (Version 0.1b), see http://homepage.ntlworld.com/simon .mason20/ipod_tunes_spec.htm. For additional information on iTunesDB headers, see http://sourceforge.net/docman/?group_id=52976.

NIO and Binary Format Parsing

There are several ways to parse fixed-file binary formats such as iTunesDB. I use NIO to implement three techniques that let you compare coding effort, readability, and performance. Each technique is based on a separate class for reading a header of each type. Since all headers have a common part, it's reasonable to have all header classes inherit from a base class implementing a common task. And since parsing should use a specific knowledge of every particular class, this knowledge is represented by abstract methods that must be executed in particular implementations of headers.

The two knowledge attributes are the basic header size and a header signature. Although size isn't supposed to be a constant, you can assume at times that it is, since the same program can write/read headers. Of course, such assumptions should not be considered when writing software for the iPod because it deals with different formats and header sizes. Access to all other header attributes is via a generic setter/getter method that lets you avoid explicit casting to a header of a particular type when some attributes should be set or retrieved; see Listing One.

Next, you must determine how child headers are read. You can encapsulate this logic in the headers themselves to read all subheaders, or use an external reading procedure. The three approaches I present here use different methods to obtain the content of a buffer, and use header-based or external procedures for parsing. The first approach is based on obtaining the buffer content via Channel's method map; see Listing Two. The second approach is based on sequential reading of a channel to a buffer (Listing Three). These two approaches are used in conjunction with an external parser. A third approach uses buffer-slicing and header-driven parsing; see Listing Four.

In the first approach (Listing Two), the buffer is specified as read-only to potentially increase speed. The buffer-mapping position in iTunesDB is specified as a parameter of the method. Sequential calling of header readings is simplified by using an autoincremental value returned by the method as a position of the beginning of the next header buffer. However, this approach can be problematic when the requested mapped area size is larger than the physical file size. To avoid this, you first need to read a portion of a header into a buffer to determine signature and size. When you get the real size, you can go ahead and read in the rest of header. Listing Three implements the sequential read-buffer technique; in this case, there is no need to explicitly position the reading. In addition, Listing Three illustrates two-step reading that first determines size, then reads the entire header. This approach is similar to standard byte-array reading operations. The buffer-slicing approach (Listing Four) only manipulates buffers because the entire database is read at the start. The method readChildren gets a buffer slice of the first child of the current header. Reading a data header is tricky, since it doesn't contain information about its own size. This information must be extracted from the mhod header. For this purpose, Listing Five adds an extra argument when calling the read function.

Setting a limit to the char buffer is necessary only for the slicing approach because an actual buffer can be larger than you expect. charset manipulations are not required because the Unicode string is already present in the header. Experimentation with charset provided a method for converting a 4-byte signature to integer value when the signature is presented as a four-character string. The parsing process is predetermined, so it's always known what type of a header comes next.

Header signatures are used mainly to ensure the consistency of the database file. Listing Six shows a way to obtain a channel to access iTunesDB. The complete code, including code to export the iPod database to CSV and XML format, is available electronically. This example can also be used to write to iTunesDB. Writing is usually simpler, but can be tricky because the first headers must have size values set, and can be obtained only after the child headers have been formed.

Conclusion

To measure the performance benefits of using the NIO classes, I compared their use to the traditional reading procedure used in MediaChest (http://mediachest.sourceforge.net/), an open-source project of mine. To get both code bases working under similar conditions, I added a similar MediaChest overhead to fill data structures in the NIO implementation. Table 1 shows the results for parsing information for about 5713 songs on a 1-GHz Celeron PC with a USB 2.0 connection.

The fastest implementation was based on reading the entire database in memory using the slicing approach. The function memory-mapped region version performed well, too. The poorest performer was a read-buffer-based implementation, which still worked more than twice as fast as the traditional MediaChest implementation.

DDJ

Listing One

public Object get(int index) {
   if (index >= START_OBJ_INDEX && index <= END_OBJ_IDX)
      return objValues[index - START_OBJ_INDEX];
   if (index >= START_INT_INDEX && index <= END_INT_IDX)
      return new Integer(intValues[index - START_INT_INDEX]);
   throw new IllegalArgumentException("Index " + index + " is out of range.");
}
public int getInt(int index) {
   if (index >= START_INT_INDEX && index <= END_INT_IDX)
     return intValues[index - START_INT_INDEX];
   throw new IllegalArgumentException("Index " + index + " is out of range.");
}

Back to Article

Listing Two

public int read(FileChannel fileChannel, int readPosition) throws IOException {
   ByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 
                     readPosition, getSize()).order(ByteOrder.LITTLE_ENDIAN);
   intValues[SIGNATURE] = buffer.getInt();
   if (getSignature() != intValues[SIGNATURE])
   throw new IOException(
      "Unexpected signature 0x"
      + Integer.toHexString(intValues[SIGNATURE])
      + "("
      + asString(intValues[SIGNATURE])
      + ") where 0x"
      + Integer.toHexString(getSignature())
      + " was expected.");
   intValues[SIZE] = buffer.getInt();
   if (buffer.capacity() < intValues[SIZE])
      throw new IOException("Real header size " + intValues[SIZE] + " 
                                            exceeds the buffer capacity.");
   readSpecific(buffer);
      return readPosition + intValues[SIZE];
}

Back to Article

Listing Three

public void read(FileChannel fileChannel) throws IOException {
   ByteBuffer buffer = 
   ByteBuffer.allocateDirect(8).order(ByteOrder.LITTLE_ENDIAN);
   fileChannel.read(buffer);
   buffer.rewind();
   intValues[SIGNATURE] = buffer.getInt();
   if (getSignature() != intValues[SIGNATURE])
      throw new IOException(
         "Unexpected signature 0x"
        + Integer.toHexString(intValues[SIGNATURE])
        + "("
        + asString(intValues[SIGNATURE])
        + ") where 0x"
        + Integer.toHexString(getSignature())
        + " was expected.");
   intValues[SIZE] = buffer.getInt();
   buffer = 
       ((ByteBuffer)ByteBuffer.allocateDirect(intValues[SIZE]).position(8)).
                                               order(ByteOrder.LITTLE_ENDIAN);
   fileChannel.read(buffer);
   buffer.position(8);
   if (buffer.capacity() < intValues[SIZE])
      throw new IOException("Real header size " + intValues[SIZE] + " 
                                              exceeds the buffer capacity.");
   readSpecific(buffer);
}

Back to Article

Listing Four

public void read(ByteBuffer buffer) throws IOException {
   intValues[SIGNATURE] = buffer.getInt();
   if (getSignature() != intValues[SIGNATURE])
      throw new IOException(
         "Unexpected signature 0x"
         + Integer.toHexString(intValues[SIGNATURE])
         + "("
         + asString(intValues[SIGNATURE])
         + ") where 0x"
         + Integer.toHexString(getSignature())
         + " was expected.");
   intValues[SIZE] = buffer.getInt();
   readSpecific(buffer);
   buffer = ((ByteBuffer) 
   buffer.position(intValues[SIZE])).slice().order(ByteOrder.LITTLE_ENDIAN);
   readChildren(buffer);
}

Back to Article

Listing Five

public void readSpecific(ByteBuffer buffer) {
   intValues[SIZE] = getSize(); // 16
   intValues[NUM_ENTRIES] = buffer.getInt();
   intValues[LENGTH] = buffer.getInt();
   intValues[TOTAL_SIZE] = intValues[LENGTH] + intValues[SIZE];
   objValues[DATA - START_OBJ_INDEX] = ((ByteBuffer) 
    buffer.position(intValues[SIZE]).
limit(intValues[SIZE]+intValues[LENGTH])).asCharBuffer().toString();
}

Back to Article