File IO in Java
I've recently been working with low-level file I/O in Java. I started with the java.io classes since I knew them inside and out, but moved to java.nio to see if I could improve performance. The results were shocking. First, the details:
- For the java.io code, I used a RandomAccessFile object and wrote directly to the file, seeking to the proper location as records were written, read back, and deleted.
- For the java.nio code, I first used a FileChannel object. NIO can be much more efficient than java.io because it lets you read and write to a file (or a network) in whole chunks of data held in buffers. java.io is stream-oriented, and an unbuffered class such as RandomAccessFile passes each small read or write straight to the operating system, which is not as efficient.
- Further, to make things even more efficient, I updated the NIO code to use a MappedByteBuffer, which itself is built on top of the host OS platform's virtual memory system. According to the documentation, and the excellent O'Reilly book, Java NIO, this should result in the most efficient code in terms of performance and storage resources.
To start, I wrote a test application that simulated an employee database. The Employee data structure is as follows:
class Employee {
    String last; // the key
    String first;
    int id;
    int zip;
    boolean employed;
    String comments;
}
Employee data is written to the file, and an index by key (last name) is maintained. Employee data is later loaded from the file by this key. A java.io.RandomAccessFile is used in all three cases: the IO, NIO, and MappedByteBuffer code. The following code creates the file "employees.ejb" in the current user's home directory and holds it in the variable journal:
String userHome = System.getProperty("user.home");
StringBuffer pathname = new StringBuffer(userHome);
pathname.append(File.separator);
pathname.append("employees.ejb");
java.io.RandomAccessFile journal =
    new RandomAccessFile(pathname.toString(), "rw");
To use this file with java.nio code, one more line is added:
java.nio.channels.FileChannel channel = journal.getChannel();
To further use this file with a MappedByteBuffer (also part of java.nio), add the following lines:
journal.setLength(PAGE_SIZE);
MappedByteBuffer mbb =
    channel.map(FileChannel.MapMode.READ_WRITE, 0, journal.length());
When you map a file, if the file size grows, the map will not see the new parts of the file. Since we want to read and write, we need to re-map it each time data is written past the end of the mapped region. To make this more efficient, the file is initially sized and subsequently grown by a predetermined amount (a page, if you will). This also makes the code a little more complex, but we can deal with it.
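The grow-and-remap step described above can be sketched on its own. This is only an illustration, not the article's actual code: the class name GrowableMap, the ensureRoom helper, and the 64 KB PAGE_SIZE value are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

final class GrowableMap {
    static final int PAGE_SIZE = 64 * 1024; // assumed growth increment

    private final RandomAccessFile journal;
    private final FileChannel channel;
    private MappedByteBuffer mbb;

    GrowableMap(Path path) throws IOException {
        journal = new RandomAccessFile(path.toFile(), "rw");
        channel = journal.getChannel();
        journal.setLength(PAGE_SIZE); // size the file up front
        mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, PAGE_SIZE);
    }

    // Makes sure `needed` more bytes fit at the current position,
    // growing the file by whole pages and re-mapping it if they do not.
    void ensureRoom(int needed) throws IOException {
        if (mbb.remaining() >= needed) return;
        int pos = mbb.position();
        long newLen = journal.length();
        while (newLen - pos < needed) newLen += PAGE_SIZE;
        mbb.force();                 // flush the old mapping first
        journal.setLength(newLen);
        mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, newLen);
        mbb.position(pos);           // a fresh mapping starts at position 0
    }

    void put(byte[] bytes) throws IOException {
        ensureRoom(bytes.length);
        mbb.put(bytes);
    }

    long mappedSize() { return mbb.capacity(); }
    int position() { return mbb.position(); }

    void close() throws IOException {
        mbb.force();
        journal.close();
    }

    public static void main(String[] args) throws IOException {
        GrowableMap map = new GrowableMap(Files.createTempFile("journal", ".ejb"));
        map.put(new byte[PAGE_SIZE]); // fills the first page exactly
        map.put(new byte[16]);        // forces one more page to be mapped
        System.out.println("mapped bytes: " + map.mappedSize());
        map.close();
    }
}
```

Growing by a fixed page keeps the number of re-map calls low, at the cost of some unused space at the end of the file.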
Writing Employee Records
With java.io, writing an employee record involves the following code:
public boolean addRecord_IO(Employee emp) {
    try {
        byte[] last = emp.last.getBytes();
        byte[] first = emp.first.getBytes();
        byte[] comments = emp.comments.getBytes();

        // Just hard-code the sizes for performance. Use the encoded byte
        // counts, which can differ from String.length() for non-ASCII text.
        int size = 0;
        size += last.length;
        size += 4; // strlen - Integer
        size += first.length;
        size += 4; // strlen - Integer
        size += 4; // emp.id - Integer
        size += 4; // emp.zip - Integer
        size += 1; // emp.employed - byte
        size += comments.length;
        size += 4; // strlen - Integer

        long offset = getStorageLocation(size);

        //
        // Store the record by key and save the offset
        //
        if ( offset == -1 ) {
            // We need to add to the end of the journal. Seek there
            // now only if we're not already there
            long currentPos = journal.getFilePointer();
            long journalLen = journal.length();
            if ( journalLen != currentPos )
                journal.seek(journalLen);
            offset = journalLen;
        }
        else {
            // Seek to the returned insertion point
            journal.seek(offset);
        }

        // First write the header
        journal.writeByte(1);
        journal.writeInt(size);

        // Next write the data
        journal.writeInt(last.length);
        journal.write(last);
        journal.writeInt(first.length);
        journal.write(first);
        journal.writeInt(emp.id);
        journal.writeInt(emp.zip);
        if ( emp.employed )
            journal.writeByte(1);
        else
            journal.writeByte(0);
        journal.writeInt(comments.length);
        journal.write(comments);

        // Next, see if we need to append an empty record if we inserted
        // this new record at an empty location
        if ( newEmptyRecordSize != -1 ) {
            // Simply write a header
            journal.writeByte(0); //inactive record
            // An int, to match the 5-byte header (byte + int) used above
            journal.writeInt(newEmptyRecordSize);
        }

        employeeIdx.put(emp.last, offset);
        return true;
    }
    catch ( Exception e ) {
        e.printStackTrace();
    }
    return false;
}
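Reading a record back with java.io is the mirror image of the write. The sketch below assumes the record layout written by addRecord_IO (a 1-byte active flag, a 4-byte size, then the length-prefixed fields). The names ReadRecordIO, readRecord, and readString are made up for the example, UTF-8 is an assumed charset (the code above uses the platform default), and writeRecord exists only so the demo has something to read.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class ReadRecordIO {
    // Stand-in for the article's Employee class.
    static final class Employee {
        String last, first, comments;
        int id, zip;
        boolean employed;
    }

    // Reads a length-prefixed string: a 4-byte length, then that many bytes.
    static String readString(RandomAccessFile journal) throws IOException {
        byte[] bytes = new byte[journal.readInt()];
        journal.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8); // assumed charset
    }

    // Returns the record at offset, or null if its header marks it inactive.
    static Employee readRecord(RandomAccessFile journal, long offset) throws IOException {
        journal.seek(offset);
        byte active = journal.readByte();
        journal.readInt(); // data size; not needed to decode an active record
        if (active != 1) return null;
        Employee emp = new Employee();
        emp.last = readString(journal);
        emp.first = readString(journal);
        emp.id = journal.readInt();
        emp.zip = journal.readInt();
        emp.employed = journal.readByte() == 1;
        emp.comments = readString(journal);
        return emp;
    }

    // Demo helper: writes one record in the same field order as addRecord_IO.
    static void writeRecord(RandomAccessFile journal, Employee emp) throws IOException {
        byte[] last = emp.last.getBytes(StandardCharsets.UTF_8);
        byte[] first = emp.first.getBytes(StandardCharsets.UTF_8);
        byte[] comments = emp.comments.getBytes(StandardCharsets.UTF_8);
        journal.writeByte(1);
        journal.writeInt(last.length + first.length + comments.length + 21);
        journal.writeInt(last.length);
        journal.write(last);
        journal.writeInt(first.length);
        journal.write(first);
        journal.writeInt(emp.id);
        journal.writeInt(emp.zip);
        journal.writeByte(emp.employed ? 1 : 0);
        journal.writeInt(comments.length);
        journal.write(comments);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("employees", ".ejb");
        file.deleteOnExit();
        try (RandomAccessFile journal = new RandomAccessFile(file, "rw")) {
            Employee emp = new Employee();
            emp.last = "Smith";
            emp.first = "Ana";
            emp.id = 7;
            emp.zip = 10001;
            emp.employed = true;
            emp.comments = "none";
            writeRecord(journal, emp);
            Employee back = readRecord(journal, 0);
            System.out.println(back.last + ", " + back.first + " #" + back.id);
        }
    }
}
```

In the real application the offset would come from the index lookup by last name rather than being hard-coded.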
With java.nio, in contrast, the code to add an employee record is as follows:
public boolean addRecord_NIO(Employee emp) {
    try {
        data.clear();
        byte[] last = emp.last.getBytes();
        byte[] first = emp.first.getBytes();
        byte[] comments = emp.comments.getBytes();
        data.putInt(last.length);
        data.put(last);
        data.putInt(first.length);
        data.put(first);
        data.putInt(emp.id);
        data.putInt(emp.zip);
        byte employed = 0;
        if ( emp.employed )
            employed = 1;
        data.put(employed);
        data.putInt(comments.length);
        data.put(comments);
        data.flip();
        int dataLen = data.limit();

        header.clear();
        header.put((byte)1); // 1=active record
        header.putInt(dataLen);
        header.flip();
        long headerLen = header.limit();
        int length = (int)(headerLen + dataLen);

        long offset = getStorageLocation(dataLen);

        //
        // Store the record by key and save the offset
        //
        if ( offset == -1 ) {
            // We need to add to the end of the journal. Seek there
            // now only if we're not already there
            long currentPos = channel.position();
            long journalLen = channel.size();
            if ( journalLen != currentPos )
                channel.position(journalLen);
            offset = journalLen;
        }
        else {
            // Seek to the returned insertion point
            channel.position(offset);
        }

        // Write the header and the data in one gathering write
        // (srcs is presumably a ByteBuffer[] holding header and data;
        // its declaration is not shown here)
        long written = channel.write(srcs);

        // Next, see if we need to append an empty record if we inserted
        // this new record at an empty location
        if ( newEmptyRecordSize != -1 ) {
            // Simply write a header
            data.clear();
            data.put((byte)0);
            data.putInt(newEmptyRecordSize);
            data.flip();
            channel.write(data);
        }

        employeeIdx.put(emp.last, offset);
        return true;
    }
    catch ( Exception e ) {
        e.printStackTrace();
    }
    return false;
}
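The declarations of header, data, and srcs are not shown above. One plausible setup, offered only as a guess, is a 5-byte header buffer, a larger data buffer, and an array holding both so that FileChannel.write(ByteBuffer[]) can send them in a single gathering write. The class name, buffer sizes, and writeRecord helper below are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class GatheringWriteSketch {
    // Assumed sizes: 1 flag byte + 4 length bytes, and room for one record.
    static final ByteBuffer header = ByteBuffer.allocate(5);
    static final ByteBuffer data = ByteBuffer.allocate(4096);
    static final ByteBuffer[] srcs = { header, data };

    // Writes header + payload at the channel's current position and
    // returns the total number of bytes written.
    static long writeRecord(FileChannel channel, byte[] payload) throws IOException {
        data.clear();
        data.put(payload);
        data.flip();                 // switch from filling to draining

        header.clear();
        header.put((byte) 1);        // 1 = active record
        header.putInt(data.limit()); // payload length
        header.flip();

        long written = 0;
        // A gathering write drains the buffers in array order.
        while (header.hasRemaining() || data.hasRemaining()) {
            written += channel.write(srcs);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("journal", ".ejb");
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long written = writeRecord(channel, new byte[] {10, 20, 30});
            System.out.println("bytes written: " + written);
        }
    }
}
```

Sharing static buffers like this avoids an allocation per record, but it is not thread-safe.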
Using a MappedByteBuffer, the code to add an employee record is as follows:
public boolean addRecord_MBB(Employee emp) {
    try {
        byte[] last = emp.last.getBytes();
        byte[] first = emp.first.getBytes();
        byte[] comments = emp.comments.getBytes();

        // 12 = three int length prefixes;
        // 9 = emp.id (4) + emp.zip (4) + emp.employed (1)
        int datalen = last.length + first.length + comments.length + 12 + 9;
        int headerlen = 5;
        int length = headerlen + datalen;

        //
        // Store the record by key and save the offset
        //
        long offset = getStorageLocation(datalen);
        if ( offset == -1 ) {
            // We need to add to the end of the journal. Seek there
            // now only if we're not already there
            long currentPos = mbb.position();
            long journalLen = channel.size();
            if ( (currentPos+length) >= journalLen ) {
                //log("GROWING FILE BY ANOTHER PAGE");
                mbb.force();
                journal.setLength(journalLen + PAGE_SIZE);
                channel = journal.getChannel();
                journalLen = channel.size();
                mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, journalLen);
                currentPos = mbb.position();
            }
            if ( currentEnd != currentPos )
                mbb.position(currentEnd);
            offset = currentEnd;//journalLen;
        }
        else {
            // Seek to the returned insertion point
            mbb.position((int)offset);
        }

        // write header
        mbb.put((byte)1); // 1=active record
        mbb.putInt(datalen);

        // write data
        mbb.putInt(last.length);
        mbb.put(last);
        mbb.putInt(first.length);
        mbb.put(first);
        mbb.putInt(emp.id);
        mbb.putInt(emp.zip);
        byte employed = 0;
        if ( emp.employed )
            employed = 1;
        mbb.put(employed);
        mbb.putInt(comments.length);
        mbb.put(comments);
        currentEnd += length;

        // Next, see if we need to append an empty record if we inserted
        // this new record at an empty location
        if ( newEmptyRecordSize != -1 ) {
            // Simply write a header
            mbb.put((byte)0);
            mbb.putInt(newEmptyRecordSize);
            currentEnd += 5;
        }

        employeeIdx.put(emp.last, offset);
        return true;
    }
    catch ( Exception e ) {
        e.printStackTrace();
    }
    return false;
}
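Reading from the mapped buffer follows the same layout in reverse. The sketch below is an assumption about how that could look, not the article's read code: ReadRecordMBB, readRecord, getString, and putRecord are hypothetical names, and UTF-8 is an assumed charset. It takes a plain ByteBuffer, which is the superclass of MappedByteBuffer, so the demo can run against a heap buffer instead of a mapped file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ReadRecordMBB {
    // Stand-in for the article's Employee class.
    static final class Employee {
        String last, first, comments;
        int id, zip;
        boolean employed;
    }

    // Reads a length-prefixed string from the buffer's current position.
    static String getString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8); // assumed charset
    }

    // Returns the record at offset, or null if its header marks it inactive.
    static Employee readRecord(ByteBuffer mapped, int offset) {
        // duplicate() shares the content but has its own position, so
        // reading does not disturb the position used for writing.
        ByteBuffer view = mapped.duplicate();
        view.position(offset);
        byte active = view.get();
        view.getInt(); // datalen; not needed to decode an active record
        if (active != 1) return null;
        Employee emp = new Employee();
        emp.last = getString(view);
        emp.first = getString(view);
        emp.id = view.getInt();
        emp.zip = view.getInt();
        emp.employed = view.get() == 1;
        emp.comments = getString(view);
        return emp;
    }

    // Demo helper: writes one record in the same field order as addRecord_MBB.
    static void putRecord(ByteBuffer buf, Employee emp) {
        byte[] last = emp.last.getBytes(StandardCharsets.UTF_8);
        byte[] first = emp.first.getBytes(StandardCharsets.UTF_8);
        byte[] comments = emp.comments.getBytes(StandardCharsets.UTF_8);
        buf.put((byte) 1);
        buf.putInt(last.length + first.length + comments.length + 12 + 9);
        buf.putInt(last.length);
        buf.put(last);
        buf.putInt(first.length);
        buf.put(first);
        buf.putInt(emp.id);
        buf.putInt(emp.zip);
        buf.put((byte) (emp.employed ? 1 : 0));
        buf.putInt(comments.length);
        buf.put(comments);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(128); // stands in for the mapped file
        Employee emp = new Employee();
        emp.last = "Smith";
        emp.first = "Ana";
        emp.id = 7;
        emp.zip = 10001;
        emp.employed = true;
        emp.comments = "none";
        putRecord(buf, emp);
        Employee back = readRecord(buf, 0);
        System.out.println(back.last + ", " + back.first + " #" + back.id);
    }
}
```

Because the bytes are already in memory, no explicit read call against the file is needed; the operating system pages the data in on demand.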
Next, I tested each method in a loop to add 100,000 employee records, and recorded the elapsed time. Here are the results:
- With java.io: ~10,000 milliseconds
- With java.nio: ~2,000 milliseconds
- With MappedByteBuffer: ~970 milliseconds
It's quite an improvement just going from java.io to java.nio. Going further to use a MappedByteBuffer (and the OS' virtual memory system) cut the elapsed time even further. Impressive!
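How the elapsed time was measured is not shown here. One simple way to time a loop like this, sketched under that caveat, is to wrap it with System.nanoTime(). The TimingSketch class and timeMillis helper are hypothetical, and the loop body in main is a placeholder for a call such as addRecord_IO.

```java
import java.util.function.IntConsumer;

final class TimingSketch {
    // Runs body for each i in [0, iterations) and returns elapsed milliseconds.
    static long timeMillis(int iterations, IntConsumer body) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            body.accept(i);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        // Placeholder work; a real test would add one employee record per pass.
        long ms = timeMillis(100_000, i -> sink.append(i % 10));
        System.out.println("elapsed: ~" + ms + " ms for " + sink.length() + " iterations");
    }
}
```

A single pass like this includes JIT warm-up time, so repeating the run a few times and discarding the first result usually gives steadier numbers.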
Reading the records using the same three forms of file IO yielded the following results:
- With java.io: ~6,900 milliseconds
- With java.nio: ~1,400 milliseconds
- With MappedByteBuffer: ~355 milliseconds
Again, java.nio resulted in the most efficient code, with the use of MappedByteBuffer yielding dramatic improvements. Of course, the timings are specific to my computer, but the relative improvement is what you should focus on. Moving from java.io to NIO with a memory-mapped file gave roughly a 10 to 1 improvement for writing records (~10,000 ms down to ~970 ms) and roughly a 19 to 1 improvement for reading them (~6,900 ms down to ~355 ms). Therefore, if you cannot or do not want to use a real database to persist data in your application, java.nio with a MappedByteBuffer should yield the most efficient code. Of course, there are details you need to be aware of, such as when the data is actually persisted to the file system. You can learn all about that in the JavaDoc documentation, found here: http://java.sun.com/javase/6/docs/api/
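On the persistence point: writes to a channel or a mapped buffer may sit in operating system caches for a while. Both FileChannel and MappedByteBuffer have a force() method that asks for changes to be written to the storage device. The sketch below shows both calls; the ForceSketch class and its two helper methods are made up for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

final class ForceSketch {
    // Writes through a FileChannel, then forces the file's content and
    // metadata out to the storage device.
    static void writeWithChannel(Path path, byte[] bytes) throws IOException {
        try (RandomAccessFile journal = new RandomAccessFile(path.toFile(), "rw")) {
            FileChannel channel = journal.getChannel();
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            channel.force(true); // true = also flush file metadata
        }
    }

    // Writes through a mapped buffer, then forces the mapped changes out.
    static void writeWithMapping(Path path, byte[] bytes) throws IOException {
        try (RandomAccessFile journal = new RandomAccessFile(path.toFile(), "rw")) {
            journal.setLength(bytes.length); // size the file before mapping it
            MappedByteBuffer mbb = journal.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
            mbb.put(bytes);
            mbb.force();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("journal", ".ejb");
        writeWithMapping(path, new byte[] {1, 2, 3});
        System.out.println("file size: " + Files.size(path));
    }
}
```

Forcing after every record would give back much of the speed gained from mapping, so it is usually done at checkpoints, as addRecord_MBB does just before it re-maps the file.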
For the complete code for this test, visit my site here:
http://www.ericbruno.com/nio.html
Happy coding!
-EJB

