
import java.*: File Processing


February 2001 Java Solutions/import java.*


In the previous installment of this column (C/C++ Users Journal, November 2000) I illustrated the classes in java.io that provide basic byte and character stream I/O. A distinguishing feature of many of these classes is their tiered relationship, implemented via the Decorator pattern. For example, a low-level class such as FileReader opens a file. To add line-oriented I/O capability, you can wrap the FileReader object in a BufferedReader, like this:

FileReader f =
   new FileReader("file.txt");
BufferedReader b =
   new BufferedReader(f);

The basic classes discussed last time all extend an input or output superclass: InputStream or OutputStream for byte streams, and Reader or Writer for character streams. One thing I didn’t mention last time was that the file output classes have an overloaded constructor with a second boolean argument for appending data to files. The program in Listing 1 uses such a FileWriter to implement file logging. Often when writing to a log file it is necessary to open and close the file each time you access it. Although this is certainly more costly than keeping the file open continuously, it is often the only way to guarantee that all the log data gets written. For this reason the LogFile constructor just stores the name of the log file.

The log method opens a FileWriter in append mode by using the two-arg constructor with a second argument of true. If the file doesn’t exist already, it is created. I decorate that FileWriter with a BufferedWriter, not for buffering (I really don’t want any!), but for its newLine method. You might be tempted to just use a FileWriter and write a ’\n’ to it to terminate a line, but not all platforms use that character as a line terminator. The technique in Listing 1 ports nicely because newLine queries the system property line.separator for the correct character(s) to push onto the output stream. The test program in Listing 2 uses both LogFile methods to write to a log file.
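As a minimal sketch of the open-append-close technique just described (this is not the code of Listing 1; the class name LogFileSketch and the file name sketch.log are assumptions made for illustration), something like the following would do. It uses try-with-resources, which postdates this article, to guarantee that the file is closed after every entry.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical stand-in for a log-file class; names are illustrative only.
class LogFileSketch {
    private final String fileName;

    LogFileSketch(String fileName) {
        this.fileName = fileName;   // store the name only; nothing is opened yet
    }

    // Opens the file in append mode, writes one line, and closes it again.
    void log(String message) throws IOException {
        // The second FileWriter argument (true) selects append mode;
        // the file is created if it does not exist.
        try (BufferedWriter out = new BufferedWriter(new FileWriter(fileName, true))) {
            out.write(message);
            out.newLine();          // writes the platform's line separator
        }                           // closing flushes the data to the file
    }

    public static void main(String[] args) throws IOException {
        LogFileSketch log = new LogFileSketch("sketch.log");
        log.log("first entry");
        log.log("second entry");
    }
}
```

Running main twice should leave four lines in sketch.log, since each call to log appends rather than overwrites.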

So much for the basics. I’ll now cover the “rest of the story” for file I/O in Java 2.

Random Access Files

You’ve probably noticed by now that there is no basic stream class that provides both input and output capability simultaneously, like iostream does in C++. C++ can do this because it has multiple inheritance (although iostream also requires virtual inheritance, one of the most confusing features in the C++ language — count your blessings, Java people!). What Java does offer is RandomAccessFile, a class that supports both input and output as well as file positioning. A RandomAccessFile traffics in bytes, not characters, so it provides methods for reading and writing single bytes and byte arrays, although it also can read and write strings (converting to and from bytes, of course). RandomAccessFile also implements the DataInput and DataOutput interfaces, so you can also work with primitive types.

A traditional application for random access files processes fixed-length data records, so you can access particular records directly with file positioning. (Database systems used this technique in days of yore). The program in Listing 3 defines a fixed-size Employee class with the following layout:

Employee no.  1 int (4 bytes)
Last name    15 characters (30 bytes)
First name   15 characters (30 bytes)

For the convenience of users of the Employee class, it stores the name fields as String objects, but when it comes time to read or write Employee objects, these fields need to be treated as byte arrays. Furthermore, strings over 15 characters must be truncated and shorter ones need to be filled out. (I chose a fill byte of 0xFF, something that wouldn’t occur in user data.) You can see this technique illustrated in the stringToBytes method. To write an Employee record to a RandomAccessFile, function Employee.write calls stringsToBytes, which builds a buffer large enough for both name fields and calls stringToBytes to fill them, after which it writes the employee number. To read a record back in, Employee.read calls RandomAccessFile.readFully, which fills the fixed-length byte array with the name data. To correctly build each name field string I have to search for the first occurrence of fillByte to determine its length.
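The truncate-and-fill idea can be sketched as follows. This is an illustration under stated assumptions, not the code of Listing 3: the class name FixedFieldSketch is invented, the encoding is assumed to be big-endian UTF-16 (which matches the two-bytes-per-character layout above), and the decoder looks for a pair of fill bytes rather than a single one so that a character such as U+00FF is not mistaken for padding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; the encoding choice is an assumption.
class FixedFieldSketch {
    static final int NAME_CHARS = 15;
    static final int NAME_BYTES = 2 * NAME_CHARS;   // 30 bytes per name field
    static final byte FILL_BYTE = (byte) 0xFF;

    // Encodes a string into a fixed 30-byte field, truncating or padding.
    static byte[] stringToBytes(String s) {
        byte[] field = new byte[NAME_BYTES];
        Arrays.fill(field, FILL_BYTE);              // pre-fill with padding
        String kept = s.length() > NAME_CHARS ? s.substring(0, NAME_CHARS) : s;
        byte[] data = kept.getBytes(StandardCharsets.UTF_16BE);
        System.arraycopy(data, 0, field, 0, Math.min(data.length, NAME_BYTES));
        return field;
    }

    // Decodes a field, stopping at the first two-byte unit made of fill bytes.
    static String bytesToString(byte[] field) {
        int len = 0;
        while (len + 1 < field.length
                && !(field[len] == FILL_BYTE && field[len + 1] == FILL_BYTE)) {
            len += 2;
        }
        return new String(field, 0, len, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] field = stringToBytes("Allison");
        System.out.println(field.length + " bytes, name = " + bytesToString(field));
    }
}
```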

As you can see in the test program in Listing 4, to open a RandomAccessFile for both reading and writing, you need to specify a second argument of "rw" in the constructor. After writing a couple of Employee records to the file I move the file pointer between record boundaries by calling RandomAccessFile.seek with the size of the record as an argument. (Seek positions are always relative to the beginning of the file.) This particular example writes two employee records and then swaps them in memory by reading them backwards.
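Here is a small, self-contained sketch of positioning by record number with seek. It is not Listing 4: the class SeekSketch, its helper methods, and the sample names are hypothetical, and for brevity it pads names with '\u0000' characters via writeChars instead of using the 0xFF fill byte described above. The record layout is the same 64 bytes (4 + 30 + 30).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch of fixed-length records in a RandomAccessFile.
class SeekSketch {
    static final int NAME_CHARS = 15;
    static final int RECORD_SIZE = 4 + 2 * (2 * NAME_CHARS);   // 64 bytes

    // Writes exactly NAME_CHARS chars (2 bytes each), truncating or padding.
    static void writeName(RandomAccessFile f, String s) throws IOException {
        StringBuilder sb = new StringBuilder(s);
        sb.setLength(NAME_CHARS);       // truncates, or pads with '\u0000'
        f.writeChars(sb.toString());
    }

    // Reads NAME_CHARS chars and drops the '\u0000' padding characters.
    static String readName(RandomAccessFile f) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < NAME_CHARS; i++) {
            char c = f.readChar();
            if (c != '\u0000') sb.append(c);
        }
        return sb.toString();
    }

    // Appends one record at the current file position.
    static void writeRecord(RandomAccessFile f, int no, String last, String first)
            throws IOException {
        f.writeInt(no);
        writeName(f, last);
        writeName(f, first);
    }

    // Seeks to record number 'index' (0-based) and reads it.
    static String readRecord(RandomAccessFile f, int index) throws IOException {
        f.seek((long) index * RECORD_SIZE);   // offset from the start of the file
        int no = f.readInt();
        String last = readName(f);
        String first = readName(f);
        return no + " " + last + ", " + first;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("employees", ".dat");
        tmp.deleteOnExit();
        try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
            writeRecord(f, 1, "Doe", "Jane");
            writeRecord(f, 2, "Roe", "Richard");
            System.out.println(readRecord(f, 1));   // second record first
            System.out.println(readRecord(f, 0));
        }
    }
}
```

Because every record has the same size, the offset of record n is simply n times RECORD_SIZE, which is what makes direct access possible.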

Although this is the first month that this column appears in this Java Solutions supplement, and therefore I am not obliged to mention C++ at all, I still can’t resist showing how to do the above in C++ for comparison. The program in Listing 5 accomplishes the same thing as Listings 3 and 4, but in 50 lines instead of 152! In fairness to Java, however, I must admit a lot of safety is inherent in the Java version. For example, there is no danger of overflowing a String or even an array in Java, but if I make an error in my array access in C++, I’m dog meat! The C++ version also lacks the advantages of object-orientation, and if I had implemented a C++ Employee class, then more lines would have resulted as well. Nonetheless, if you’re coming from the C world, one of the first things you notice about Java is its verbosity. Like it or lump it.

The complement to the seek method is RandomAccessFile.getFilePointer, which returns the offset of the current file position as a long [1]. As a final example of the file positioning methods, the program in Listings 6, 7, and 8 illustrates a file viewer: an application that scrolls through a file a screen at a time, both forward and backward [2]. The FileViewer class in Listing 6 uses a read-only RandomAccessFile so it can move around, and a stack to keep track of where it’s been so it can scroll backwards. The constructor opens the file and displays the first screen. The topPos field keeps track of the file position of the first line currently in the display. To scroll down, the next method pushes topPos on the stack and then displays the next screen, while previous undoes that operation.

You might think it strange that I bother to separate the read and display operations, storing the current screen’s lines in an ArrayList (which is like a Vector), instead of just displaying the lines immediately. The reason I do so is to support the last method, which scrolls immediately to the end of the file. I need to read sequentially, stacking each screen as I go, so I can scroll backwards once I reach the end, but I certainly don’t need to display as I go.
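The paging scheme described above can be sketched roughly as follows. This is a simplified illustration, not Listing 6: the class PagerSketch and its method signatures are assumptions, it uses an ArrayDeque as the stack, and it returns each screen as a list of lines instead of displaying it. Only topPos, next, and previous follow the names used in the text.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical pager: remembers where each earlier screen started.
class PagerSketch {
    private final RandomAccessFile file;
    private final int pageSize;
    private final Deque<Long> previousTops = new ArrayDeque<>();  // stack of offsets
    private long topPos = 0;            // offset of the first line on screen

    PagerSketch(RandomAccessFile file, int pageSize) {
        this.file = file;
        this.pageSize = pageSize;
    }

    // Reads (but does not display) up to pageSize lines starting at topPos.
    List<String> current() throws IOException {
        file.seek(topPos);
        List<String> lines = new ArrayList<>();
        String line;
        while (lines.size() < pageSize && (line = file.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Scrolls down one screen, unless the current screen already ends the file.
    List<String> next() throws IOException {
        current();                                  // leaves the pointer past this screen
        long nextTop = file.getFilePointer();
        if (nextTop < file.length()) {
            previousTops.push(topPos);              // remember where we were
            topPos = nextTop;
        }
        return current();
    }

    // Scrolls up one screen by popping the most recent saved offset.
    List<String> previous() throws IOException {
        if (!previousTops.isEmpty()) {
            topPos = previousTops.pop();
        }
        return current();
    }
}
```

Note that RandomAccessFile.readLine is unbuffered and converts each byte directly to a char, so this sketch is only reasonable for small, single-byte-encoded text files.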

The program in Listing 7 provides a simple command-line interface for viewing a file with FileViewer. Just to be useful it allows redundant commands for each operation (such as 'n' and 'd' (down) for viewing the next screen). I must admit that I like the way Java forces me to design in a higher-level, object-oriented fashion. The C version of this program I wrote years ago, while shorter, doesn’t separate the file positioning from the viewing, like the FileViewer and ViewFile classes do. It just came automatically now that I’ve been using Java for a number of years.

In Listing 8 you can see that I implemented a stack with Java 2’s LinkedList class. For more on LinkedList, ArrayList, and other collections, see the September 2000 issue of this column in CUJ.

Exploring the File System

Working with files is often more than just doing input and output. Sometimes you need to know what files are in a directory, or whether a certain file exists at all, or you may need to delete a file. All this and more is possible with the methods of the File class. A File object represents a path, not a file stream. In fact, the corresponding file doesn’t even have to exist, although subsequent operations may fail if that is the case. File objects are based on hierarchical directory structures such as are found in Unix and DOS/Windows [3]. Since Unix uses a forward slash to separate components of a path, and Windows uses a backslash, you can determine these characters at run time via the file.separator system property. The program in Listing 9 shows the properties of interest for file processing; the output is for a Windows 2000 system.
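For illustration, a few of the relevant system properties can be queried like this. The sketch is not Listing 9; the class name SeparatorSketch, the helper join, and the particular set of property keys are choices made here. File.separator is the File class's own copy of the file.separator value.

```java
import java.io.File;

// Hypothetical sketch: printing path-related system properties.
class SeparatorSketch {
    // Joins a directory and a name with the platform's separator.
    static String join(String dir, String name) {
        return dir + File.separator + name;
    }

    public static void main(String[] args) {
        String[] keys = {"file.separator", "path.separator", "user.dir", "user.home"};
        for (String key : keys) {
            System.out.println(key + " = " + System.getProperty(key));
        }
        System.out.println(join("temp", "PropTest.java"));
    }
}
```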

A File object can represent either a directory name or a file name, since both are valid path names. You can query which is the case with the isDirectory and isFile methods respectively. You can retrieve the name of the path in two basic forms: absolute and relative. The absolute name of a path is the full path name from its root (e.g., C:\), and the relative name is the last component of the absolute name (such as PropTest.java). An alternate form of absolute name, called the canonical path, is a system-dependent rendition of an absolute path name. Most of the time it is just the same as the absolute path, but on Unix systems, if the absolute path has file system links, then the canonical path will resolve those links to give the true physical path. In other words, a canonical path is more "real" than an absolute path.
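The difference between the name forms is easiest to see with a path that contains a ".." component. The following sketch (the class PathSketch and the method describe are hypothetical names) returns the last component, the absolute path, and the canonical path of a File. The absolute form keeps the ".." as written, while getCanonicalPath resolves it.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch comparing the name forms of a File.
class PathSketch {
    // Returns { last component, absolute path, canonical path }.
    static String[] describe(File f) throws IOException {
        return new String[] {
            f.getName(),            // e.g. PropTest.java
            f.getAbsolutePath(),    // full path, ".." components left in place
            f.getCanonicalPath()    // full path with "." and ".." (and links) resolved
        };
    }

    public static void main(String[] args) throws IOException {
        // The file need not exist for these calls to work.
        File f = new File("temp" + File.separator + ".." + File.separator + "PropTest.java");
        for (String s : describe(f)) {
            System.out.println(s);
        }
    }
}
```

Unlike the other two, getCanonicalPath may consult the file system, which is why it can throw an IOException.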

The File class has methods for listing the contents of a directory, deleting and renaming files, requesting file attributes such as size, time last modified, and a user’s read and write permissions, and for navigating directories. The program in Listing 10 lists the names of the entries in an entire subdirectory tree. If you don’t specify a starting directory, it uses the current user directory. The File.listFiles method returns an array of File objects representing the contents of the given directory; getName returns the relative pathname of an entry. If the entry is a directory, I call the list method recursively. This particular example shows the files that form this article, and a subdirectory named "temp".
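A recursive walk of this kind might look like the sketch below. It is not Listing 10: the class TreeSketch is a hypothetical name, and collecting the names into a list, sorting the entries, and indenting by depth are choices made here. Note that listFiles returns null, not an empty array, when the path is not a directory or cannot be read.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: listing every entry under a directory.
class TreeSketch {
    // Adds the name of each entry under dir to out, indented by depth.
    static void list(File dir, String indent, List<String> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return;                         // not a directory, or an I/O error
        }
        Arrays.sort(entries);               // File is Comparable; gives a stable order
        for (File entry : entries) {
            out.add(indent + entry.getName());
            if (entry.isDirectory()) {
                list(entry, indent + "  ", out);    // recurse into subdirectories
            }
        }
    }

    public static void main(String[] args) {
        // Default to the current user directory when no argument is given.
        File start = new File(args.length > 0 ? args[0] : System.getProperty("user.dir"));
        List<String> names = new ArrayList<>();
        list(start, "", names);
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```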

Listing 11 shows how you can control which files come back from a call to listFiles. The nested class SuffixFilter implements the FilenameFilter interface, which has a single method: accept(File dir, String name). When you call the overloaded version of listFiles that takes a FilenameFilter, it calls accept for each entry and returns only those for which your accept method returns true. This example reads a suffix from the command line, stores it in the static field ListSomeFiles.suffix, and displays only the matching files from the current directory.
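A suffix filter along these lines could be sketched as follows. It differs from the description of Listing 11 in one respect that should be stated plainly: the suffix is held in an instance field of the filter, not in a static field of the enclosing class. The names SuffixSketch and withSuffix are hypothetical.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical sketch: filtering a directory listing by file-name suffix.
class SuffixSketch {
    // Nested filter class; accept is called once per directory entry.
    static class SuffixFilter implements FilenameFilter {
        private final String suffix;

        SuffixFilter(String suffix) {
            this.suffix = suffix;
        }

        public boolean accept(File dir, String name) {
            return name.endsWith(suffix);
        }
    }

    // Returns the entries of dir whose names end with suffix (null if dir is not a directory).
    static File[] withSuffix(File dir, String suffix) {
        return dir.listFiles(new SuffixFilter(suffix));
    }

    public static void main(String[] args) {
        String suffix = args.length > 0 ? args[0] : ".java";
        File[] matches = withSuffix(new File(System.getProperty("user.dir")), suffix);
        if (matches != null) {
            for (File f : matches) {
                System.out.println(f.getName());
            }
        }
    }
}
```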

The ListFiles class in Listing 12 illustrates the informational methods in the File class. It is basically a traditional directory lister that displays directory information in fixed-length columns. If you’re a little rusty on the format classes in java.text, see my article, "Formatted Text and Locales," in the July 2000 issue of this column (in CUJ). The program in Listing 13 shows how easy it is to find a file in a subdirectory tree by applying listFiles recursively. It uses File.getCanonicalPath to print the full pathname of the file.
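The recursive search can be sketched like this. It is an illustration only, not Listing 13: the class FindSketch and the method find are hypothetical, and collecting the hits in a list is a choice made here. The sketch does not guard against directory links that form a cycle.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: finding a file by name in a subdirectory tree.
class FindSketch {
    // Adds the canonical path of every file called name under dir to hits.
    static void find(File dir, String name, List<String> hits) throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return;                         // not a directory, or unreadable
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                find(entry, name, hits);    // search the subdirectory
            } else if (entry.getName().equals(name)) {
                hits.add(entry.getCanonicalPath());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: FindSketch <file name> [start directory]");
            return;
        }
        File start = new File(args.length > 1 ? args[1] : System.getProperty("user.dir"));
        List<String> hits = new ArrayList<>();
        find(start, args[0], hits);
        for (String hit : hits) {
            System.out.println(hit);
        }
    }
}
```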

Summary

Java gives you as much control over files and the file system as a “write once run anywhere” language can claim. Although not necessarily fit for implementing a DBMS, the RandomAccessFile class gives you simultaneous input and output on a file of bytes (more or less the equivalent of an expandable byte array on disk), which can be useful. The File class gives you almost everything you need for navigating and tweaking your file system. It’s not POSIX, but it’s close. Magazine real estate won’t allow me to explore it in this issue, but Java does supply classes that support zip and jar [4] files. Just as a teaser, the program in Listing 14 displays information for each entry in a zip file.
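As a rough idea of what such a program involves (this is a sketch, not Listing 14; the class ZipSketch, the method describe, and the output format are assumptions), the java.util.zip.ZipFile class lets you enumerate the entries of an archive and ask each one for its name and sizes:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch: listing the entries of a zip (or jar) file.
class ZipSketch {
    // Returns one line per entry: name, uncompressed size, compressed size.
    static List<String> describe(File zip) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                lines.add(e.getName() + " " + e.getSize() + " " + e.getCompressedSize());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: ZipSketch <zip file>");
            return;
        }
        for (String line : describe(new File(args[0]))) {
            System.out.println(line);
        }
    }
}
```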

Notes

[1] C/C++ programmers: remember that a long in Java is always 64 bits, which is much larger than in most C/C++ environments, so there is no practical need for a special type like fpos_t as provided in C.

[2] Yes, I know it’s an antiquated command-line style example, but it’s fun, so bear with me.

[3] Much of File’s functionality is a no-op on the Macintosh.

[4] Jar files are zip files that also contain manifest information. See the September 1999 installment of this column, "Packaging Your Objects," in CUJ for more information on JAR files.

Chuck Allison is a long-time columnist with CUJ. During the day he does Internet-related development in Java and C++ as a Software Engineering Senior in the Custom Development Department at Novell, Inc. in Provo, Utah. He was a contributing member of the C++ standards committee for most of the 1990s and authored the book C & C++ Code Capsules: A Guide for Practitioners (Prentice-Hall, 1998). He has taught mathematics and computer science at six western colleges and universities and at many corporations throughout the U.S. You can email Chuck at [email protected].

