
Eric Bruno

Dr. Dobb's Bloggers

File IO in Java

September 13, 2009

I've recently been working with low-level file I/O in Java. I started with the java.io classes since I knew them inside and out, but moved to java.nio to see if I could improve performance. The results were shocking. First, the details:

 

  1. For the java.io code, I used a RandomAccessFile object and wrote directly to the file, seeking to the proper location as records were written, read back, and deleted. 
  2. For the java.nio code, I first used a FileChannel object. NIO can be much more efficient than java.io because it lets you read and write a file (or a network connection) in whole chunks of data through a buffer. The java.io code, as written here, issues a separate small write for each field, which is not as efficient.
  3. Further, to make things even more efficient, I updated the NIO code to use a MappedByteBuffer, which itself is built on top of the host OS platform's virtual memory system. According to the documentation, and the excellent O'Reilly book, Java NIO, this should result in the most efficient code in terms of performance and storage resources.
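To make the difference between the first two approaches concrete, here is a minimal sketch that writes the same small record both ways. It is not taken from the test application: the class name, method names, and the three-field record are hypothetical, chosen only to show per-field writes next to a single buffered chunk.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class ChunkedWriteSketch {

    // java.io style: every call below is a separate write against the file.
    static void writeFieldByField(Path path, int id, int zip, byte[] name) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.writeInt(id);
            raf.writeInt(zip);
            raf.writeInt(name.length);
            raf.write(name);
        }
    }

    // java.nio style: assemble the record in memory, then hand the channel one chunk.
    static void writeAsOneChunk(Path path, int id, int zip, byte[] name) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(12 + name.length);
        buf.putInt(id).putInt(zip).putInt(name.length).put(name);
        buf.flip(); // switch the buffer from filling to draining
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("io", ".bin");
        Path b = Files.createTempFile("nio", ".bin");
        byte[] name = "Bruno".getBytes(StandardCharsets.UTF_8);
        writeFieldByField(a, 7, 12345, name);
        writeAsOneChunk(b, 7, 12345, name);
        System.out.println(Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b)));
    }
}
```

Both methods produce the same bytes on disk, since RandomAccessFile and ByteBuffer both default to big-endian integers. Only the number of write calls differs.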

 

To start, I wrote a test application that simulated an employee database. The Employee data structure is as such:

 

    class Employee {

        String last; // the key

        String first;

        int id;

        int zip;

        boolean employed;

        String comments;

    }

 

Employee data is written to the file, and an index by key (last name) is maintained. Employee data is later loaded from the file by this key. A java.io.RandomAccessFile is required in all three cases: the IO, NIO, and MappedByteBuffer code. The following code creates the file "employees.ejb" in the current user's home directory and stores it in the variable journal:

 

    String userHome = System.getProperty("user.home");

    StringBuffer pathname = new StringBuffer(userHome);

    pathname.append(File.separator);

    pathname.append("employees.ejb");

    java.io.RandomAccessFile journal = 

        new RandomAccessFile(pathname.toString(), "rw");

 

To use this file with java.nio code, one more line is added:

 

    java.nio.channels.FileChannel channel = journal.getChannel();

 

To further use this file with a MappedByteBuffer (also part of java.nio), add the following lines:

  

    journal.setLength(PAGE_SIZE);

    MappedByteBuffer mbb = 

        channel.map(FileChannel.MapMode.READ_WRITE, 0, journal.length() );

 

When you map a file, the mapping covers a fixed region. If the file later grows, the map will not see the new parts of the file. Since we want to both read and write, we need to re-map the file each time data is written past the end of the mapped region. To make this more efficient, the file is initially sized and subsequently grown by a predetermined amount (a page, if you will), so a re-map is only needed once per page rather than once per record. This also makes the code a little more complex, but we can deal with it.
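The grow-by-a-page idea can be sketched in isolation as follows. This is only an illustration under stated assumptions: the helper name ensureCapacity, the class name, and the PAGE_SIZE value of 4096 are hypothetical, and the test application's own version appears inside its add-record method.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class PagedMapSketch {

    static final int PAGE_SIZE = 4096; // hypothetical growth increment

    // Returns a mapping that covers at least `needed` bytes. If the current
    // mapping is too small, the file is grown to a whole number of pages and
    // mapped again; the write position of the old mapping is carried over.
    static MappedByteBuffer ensureCapacity(RandomAccessFile journal,
                                           MappedByteBuffer current,
                                           long needed) throws IOException {
        if (current != null && needed <= current.capacity()) {
            return current; // still fits, no re-map required
        }
        int position = 0;
        if (current != null) {
            position = current.position();
            current.force(); // push pending changes out before re-mapping
        }
        long pages = (needed + PAGE_SIZE - 1) / PAGE_SIZE; // round up
        long newLength = Math.max(journal.length(), pages * PAGE_SIZE);
        journal.setLength(newLength);
        MappedByteBuffer remapped =
            journal.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, newLength);
        remapped.position(position);
        return remapped;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("journal", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile journal = new RandomAccessFile(f, "rw")) {
            MappedByteBuffer mbb = ensureCapacity(journal, null, 1);
            mbb.putInt(42);
            mbb = ensureCapacity(journal, mbb, 5000); // forces a second page
            System.out.println(mbb.capacity() + " bytes mapped, position " + mbb.position());
        }
    }
}
```

Because the file is mapped from offset 0 each time, data written through the old mapping is visible through the new one.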

 

Writing Employee Records

 

With java.io, writing an employee record involves the following code:

 

    public boolean addRecord_IO(Employee emp) {

        try {

            byte[] last = emp.last.getBytes();

            byte[] first = emp.first.getBytes();

            byte[] comments = emp.comments.getBytes();


            // Just hard-code the sizes for performance. String sizes use the

            // encoded byte counts, since that is what gets written below.

            int size = 0;

            size += last.length;

            size += 4; // strlen - Integer

            size += first.length;

            size += 4; // strlen - Integer

            size += 4; // emp.id - Integer

            size += 4; // emp.zip - Integer

            size += 1; // emp.employed - byte

            size += comments.length;

            size += 4; // strlen - Integer


            long offset = getStorageLocation(size);


            //

            // Store the record by key and save the offset

            //

            if ( offset == -1 ) {

                // We need to add to the end of the journal. Seek there

                // now only if we're not already there

                long currentPos = journal.getFilePointer();

                long journalLen = journal.length();

                if ( journalLen != currentPos )

                    journal.seek(journalLen);


                offset = journalLen;

            }

            else {

                // Seek to the returned insertion point

                journal.seek(offset);

            }


            // First write the header

            journal.writeByte(1);

            journal.writeInt(size);


            // Next write the data

            journal.writeInt(last.length);

            journal.write(last);

            journal.writeInt(first.length);

            journal.write(first);

            journal.writeInt(emp.id);

            journal.writeInt(emp.zip);

            if ( emp.employed )

                journal.writeByte(1);

            else

                journal.writeByte(0);

            journal.writeInt(comments.length);

            journal.write(comments);


            // Next, see if we need to append an empty record if we inserted

            // this new record at an empty location

            if ( newEmptyRecordSize != -1 ) {

                // Simply write a header

                journal.writeByte(0); //inactive record

                journal.writeInt(newEmptyRecordSize); // header size field is an int, as above

            }


            employeeIdx.put(emp.last, offset);

            return true;

        }

        catch ( Exception e ) {

            e.printStackTrace();

        }


        return false;

    }

 

In contrast, with java.nio, the code to add an employee record is as follows:

 

    public boolean addRecord_NIO(Employee emp) {

        try {

            data.clear();

            byte[] last = emp.last.getBytes();

            byte[] first = emp.first.getBytes();

            byte[] comments = emp.comments.getBytes();

            data.putInt(last.length);

            data.put(last);

            data.putInt(first.length);

            data.put(first);

            data.putInt(emp.id);

            data.putInt(emp.zip);

            byte employed = 0;

            if ( emp.employed )

                employed = 1;

            data.put(employed);

            data.putInt(comments.length);

            data.put(comments);

            data.flip();

            int dataLen = data.limit();


            header.clear();

            header.put((byte)1); // 1=active record

            header.putInt(dataLen);

            header.flip();

            long headerLen = header.limit();


            int length = (int)(headerLen + dataLen);

            long offset = getStorageLocation((int)dataLen);


            //

            // Store the record by key and save the offset

            //

            if ( offset == -1 ) {

                // We need to add to the end of the journal. Seek there

                // now only if we're not already there

                long currentPos = channel.position();

                long journalLen = channel.size();

                if ( journalLen != currentPos )

                    channel.position(journalLen);


                offset = journalLen;

            }

            else {

                // Seek to the returned insertion point

                channel.position(offset);

            }


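            // Write the header and the data together in one gathering write

            // (srcs is assumed to be the ByteBuffer[] { header, data })

            long written = channel.write(srcs);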


            // Next, see if we need to append an empty record if we inserted

            // this new record at an empty location

            if ( newEmptyRecordSize != -1 ) {

                // Simply write a header

                data.clear();

                data.put((byte)0);

                data.putInt(newEmptyRecordSize);

                data.flip();

                channel.write(data);

            }


            employeeIdx.put(emp.last, offset);

            return true;

        }

        catch ( Exception e ) {

            e.printStackTrace();

        }


        return false;

    }
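The single channel.write(srcs) call above relies on FileChannel's gathering write, which drains an array of buffers in order. Here is a minimal standalone sketch of that technique, assuming a 5-byte header (one status byte plus an int length) in front of an arbitrary payload. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class GatheringWriteSketch {

    // Writes [1-byte active flag][4-byte payload length][payload] at the
    // channel's current position and returns the number of bytes written.
    static long writeRecord(FileChannel channel, byte[] payload) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(5);
        header.put((byte) 1);          // 1 = active record
        header.putInt(payload.length);
        header.flip();

        ByteBuffer data = ByteBuffer.wrap(payload);

        // The channel drains the buffers in array order: header, then data.
        ByteBuffer[] srcs = { header, data };
        long written = 0;
        while (header.hasRemaining() || data.hasRemaining()) {
            written += channel.write(srcs);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("gather", ".bin");
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE)) {
            long written = writeRecord(channel, new byte[] { 10, 20, 30 });
            System.out.println("bytes written: " + written);
        }
    }
}
```

The loop guards against a partial write, which a single call to write is allowed to return.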

 

Using a MappedByteBuffer, the code to add an employee record is as follows:

 

    public boolean addRecord_MBB(Employee emp) {

        try {

            byte[] last = emp.last.getBytes();

            byte[] first = emp.first.getBytes();

            byte[] comments = emp.comments.getBytes();

            // 12 = three 4-byte string lengths; 9 = id (4) + zip (4) + employed (1)

            int datalen = last.length + first.length + comments.length + 12 + 9;

            int headerlen = 5;

            int length = headerlen + datalen;


            //

            // Store the record by key and save the offset

            //

            long offset = getStorageLocation(datalen);

            if ( offset == -1 ) {

                // We need to add to the end of the journal. Seek there

                // now only if we're not already there

                long currentPos = mbb.position();

                long journalLen = channel.size();

                if ( (currentPos+length) >= journalLen ) {

                    //log("GROWING FILE BY ANOTHER PAGE");

                    mbb.force();

                    journal.setLength(journalLen + PAGE_SIZE);

                    channel = journal.getChannel();

                    journalLen = channel.size();

                    mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, journalLen);

                    currentPos = mbb.position();

                }


                if ( currentEnd != currentPos )

                    mbb.position(currentEnd);


                offset = currentEnd;//journalLen;

            }

            else {

                // Seek to the returned insertion point

                mbb.position((int)offset);

            }


            // write header

            mbb.put((byte)1); // 1=active record

            mbb.putInt(datalen);


            // write data

            mbb.putInt(last.length);

            mbb.put(last);

            mbb.putInt(first.length);

            mbb.put(first);

            mbb.putInt(emp.id);

            mbb.putInt(emp.zip);

            byte employed = 0;

            if ( emp.employed )

                employed = 1;

            mbb.put(employed);

            mbb.putInt(comments.length);

            mbb.put(comments);


            currentEnd += length;


            // Next, see if we need to append an empty record if we inserted

            // this new record at an empty location

            if ( newEmptyRecordSize != -1 ) {

                // Simply write a header

                mbb.put((byte)0);

                mbb.putInt(newEmptyRecordSize);

                currentEnd += 5;

            }


            employeeIdx.put(emp.last, offset);


            return true;

        }

        catch ( Exception e ) {

            e.printStackTrace();

        }


        return false;

    }

 

Next, I tested each method in a loop to add 100,000 employee records, and recorded the elapsed time. Here are the results:

 

  • With java.io: ~10,000 milliseconds
  • With java.nio: ~2,000 milliseconds
  • With MappedByteBuffer: ~970 milliseconds
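For reference, a timing loop of this kind can be as simple as the sketch below. It is not the harness used for the numbers above. The class and method names are hypothetical, and a simple loop like this does not account for JIT warm-up or OS file caching, so treat its output as a rough comparison only.

```java
class TimingSketch {

    // Runs `op` the given number of times and returns the elapsed
    // wall-clock time in milliseconds.
    static long timeMillis(int iterations, Runnable op) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            op.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        // Stand-in workload; in the real test this would be a call such as
        // addRecord_IO(emp), addRecord_NIO(emp), or addRecord_MBB(emp).
        long elapsed = timeMillis(100_000, () -> sink.append('x'));
        System.out.println("elapsed: " + elapsed + " ms for " + sink.length() + " operations");
    }
}
```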

 

It's quite an improvement just going from java.io to java.nio. Going further to use a MappedByteBuffer (and the OS' virtual memory system) cut the elapsed time even further. Impressive!

 

Reading the records using the same three forms of file IO yielded the following results:

 

  • With java.io: ~6,900 milliseconds
  • With java.nio: ~1,400 milliseconds
  • With MappedByteBuffer: ~355 milliseconds
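The reading code is not listed in this post, so here is a minimal sketch of reading one record back from a buffer, assuming the same layout the add-record methods write: a 1-byte active flag, a 4-byte data length, then the length-prefixed strings and fixed-size fields. The class name, the helper methods, and the use of UTF-8 (the listings above use the platform default charset) are assumptions. The same read method works on a MappedByteBuffer, since that is a ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RecordReadSketch {

    static class Employee {
        String last; String first; int id; int zip; boolean employed; String comments;
    }

    static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    static String getString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Writes header and data at the buffer's current position and returns
    // the offset at which the record starts.
    static int write(ByteBuffer buf, Employee emp) {
        int offset = buf.position();
        buf.put((byte) 1);               // 1 = active record
        int lengthPos = buf.position();
        buf.putInt(0);                   // placeholder for the data length
        int dataStart = buf.position();
        putString(buf, emp.last);
        putString(buf, emp.first);
        buf.putInt(emp.id);
        buf.putInt(emp.zip);
        buf.put((byte) (emp.employed ? 1 : 0));
        putString(buf, emp.comments);
        buf.putInt(lengthPos, buf.position() - dataStart); // patch the length
        return offset;
    }

    // Reads the record stored at `offset`, or returns null if it is inactive.
    static Employee read(ByteBuffer buf, int offset) {
        buf.position(offset);
        if (buf.get() != 1) {
            return null;
        }
        buf.getInt(); // data length; not needed when reading field by field
        Employee emp = new Employee();
        emp.last = getString(buf);
        emp.first = getString(buf);
        emp.id = buf.getInt();
        emp.zip = buf.getInt();
        emp.employed = buf.get() == 1;
        emp.comments = getString(buf);
        return emp;
    }

    public static void main(String[] args) {
        Employee emp = new Employee();
        emp.last = "Bruno"; emp.first = "Eric"; emp.id = 1; emp.zip = 12345;
        emp.employed = true; emp.comments = "test record";
        ByteBuffer buf = ByteBuffer.allocate(256);
        Employee copy = read(buf, write(buf, emp));
        System.out.println(copy.last + ", " + copy.first + " (" + copy.id + ")");
    }
}
```

In the test application, the offset passed to read would come from the index that maps each last name to its record's location.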

 

Again, java.nio resulted in the most efficient code, with the use of MappedByteBuffer yielding dramatic improvements. Of course, the timings are specific to my computer; the relative improvement is what you should focus on. Moving from java.io to NIO with a memory-mapped file gave roughly a 10-to-1 improvement for writing records and nearly a 20-to-1 improvement for reading them. Therefore, if you cannot or do not want to use a real database to persist data in your application, java.nio with a MappedByteBuffer should yield the most efficient code. Of course, there are details you need to be aware of, such as when the data is actually persisted to the file system. You can learn all about that in the JavaDoc documentation, found here: http://java.sun.com/javase/6/docs/api/ 
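On the question of when data is persisted: changes made through a MappedByteBuffer live in the OS's memory until the OS writes them out, and MappedByteBuffer.force() asks for that to happen immediately (FileChannel.force(boolean) plays the same role for writes made through the channel). Here is a minimal sketch, with a hypothetical class and method name:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ForceSketch {

    // Maps the start of the file, copies the payload in, and forces the
    // change out to the storage device before returning.
    // The payload must not be empty.
    static void writeAndForce(Path path, byte[] payload) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE past the current end of the file extends it.
            MappedByteBuffer mbb =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
            mbb.put(payload);
            mbb.force();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("force", ".bin");
        writeAndForce(path, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
    }
}
```

Calling force() on every record would give back much of the speed advantage, so it is usually reserved for points where durability matters, such as before re-mapping or at shutdown.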

 

For the complete code for this test, visit my site here:

http://www.ericbruno.com/nio.html

 

Happy coding!

-EJB 

 
