Web Development

Get More Out of Open

By brian d foy, October 01, 2003

open() is a rich Perl feature that has a lot to offer.

October, 2005: Get More Out of Open

brian has been a Perl user since 1994. He is founder of the first Perl Users Group, NY.pm, and Perl Mongers, the Perl advocacy organization. He has been teaching Perl through Stonehenge Consulting for the past five years, and has been a featured speaker at The Perl Conference, Perl University, YAPC, COMDEX, and Builder.com. Contact brian at [email protected].

Back in the day, when I was a kid and had to walk to school uphill (both ways!) in the snow, open() took only two arguments, and that was enough for us. Since then, open() has become much richer and more full-featured. Indeed, it is so rich that it gets its own tutorial, perlopentut, and its perlfunc entry is almost 400 lines long (a lot longer than this article).

Before starting, I have a bit of a challenge for you, which lets me skip one part of open() and leave it up to you. There is a one-argument form of open(). Do you know what it is, and can you give an example of how you would use it? I'll answer this next month. It does have a useful application, but you have to keep Perl's origins in mind to be in the right mindset.

When I started in Perl, open() took a filehandle identifier and a filename. Filehandle identifiers don't get a special sigil, so they appear as barewords. Because of that, Perl's convention was to write them in all capitals, so they certainly stood out in the program text.

open( FILE, "README" ) || die "Could not open README! $!";

We ran into problems when we wanted to pass filehandles around. If I wanted to open a filehandle in one place but use it in another, I had to do a lot of ugly typing. For instance, to pass a filehandle to a subroutine, I passed its typeglob, or even a reference to it. In the subroutine, I saved that in a scalar (or assigned it to a localized typeglob) and let print() figure it out (even if I or the maintenance programmers had to scratch our heads).

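open( LOG, ">> logfile" ) || die "Could not write to logfile: $!";

here_i_am( *LOG );                       # pass the typeglob itself
log_this( \*LOG, "Hey, I'm done!\n" );   # or pass a reference to it

sub here_i_am {
    my $fh = shift;          # the typeglob, stored in a scalar
    print $fh "Here I am!\n";
}

sub log_this {
    local *FILE = shift;     # alias *FILE to the glob that was passed in
    print FILE @_;
}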

In 5.6, Perl got smart enough to skip the typeglob step. I could create a filehandle in a scalar variable directly as an "indirect filehandle". This looks a lot better. I don't have to explain typeglobs or references to typeglobs, or why some people use one or the other. This works when the variable, in this case $fh, is undefined.

open( $fh, ">> logfile" ) || die "Could not write to logfile: $!";
print "$fh\n";   # something like GLOB(...)

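That variable has to be undefined. The following example doesn't do what I want: because $fh is already defined, it never ends up holding a filehandle reference. Instead, Perl treats the string as the name of a filehandle (a symbolic reference, which is a fatal error under use strict), and $fh keeps its string.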

my $fh = "I'm not a filehandle!";
open( $fh, ">> logfile" ) || die "Could not write to logfile: $!";
print "$fh\n";   # prints "I'm not a filehandle!"
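Here's a small sketch of both cases you can actually run, assuming a Unix-like system and the core File::Temp module for a scratch file that stands in for logfile. It turns on use strict, which changes the second case: rather than quietly opening a filehandle named after the string, Perl refuses the symbolic reference and dies, which the sketch catches with eval.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( undef, $path ) = tempfile( UNLINK => 1 );   # scratch file standing in for logfile

# Case 1: an undefined scalar gets a brand-new anonymous filehandle
my $fh;
open( $fh, '>>', $path ) or die "Could not write to $path: $!";
print ref($fh), "\n";    # GLOB
close($fh);

# Case 2: a defined string is treated as the *name* of a filehandle,
# a symbolic reference that strict refs refuses at run time
my $not_fh = "I'm not a filehandle!";
my $ok = eval { open( $not_fh, '>>', $path ); 1 };
print $ok ? "opened a handle named '$not_fh'\n" : "died: $@";
```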

Luckily, the usual Perl idiom gets around this by declaring the variable right in the open(). All of your other variables are lexicals, right? So why not this one too?

open( my $fh, ">> logfile" )  || die "Could not write to logfile: $!";

That's much nicer. I can now simply pass around scalar variables, and most people already know how to do that. This is one of the telling marks of the intermediate Perl programmer. We don't talk about this in Learning Perl because we want to get people opening files as quickly as possible (in student time), then introduce better programming idioms once they understand the quick-and-dirty way.
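To compare with the typeglob version earlier, here's a sketch of the same here_i_am() and log_this() subroutines taking a lexical filehandle instead. It assumes a scratch file from the core File::Temp module in place of logfile; the braces in print {$fh} are just a readability habit, not a requirement.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( undef, $logfile ) = tempfile( UNLINK => 1 );   # scratch file standing in for logfile

open( my $log, '>>', $logfile ) or die "Could not write to $logfile: $!";

here_i_am($log);                        # no typeglob, no \*LOG
log_this( $log, "Hey, I'm done!\n" );
close($log) or die "Could not close $logfile: $!";

sub here_i_am {
    my $fh = shift;                     # an ordinary lexical scalar
    print {$fh} "Here I am!\n";
}

sub log_this {
    my $fh = shift;
    print {$fh} @_;                     # whatever is left in @_ is the message
}
```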

That didn't fix all of the problems, though. There was this thing known as "magic open" that was obscure enough to be a Final Jeopardy question. Perl does some guessing about what we want it to do. For instance, in my previous example, Perl looks at the second argument and pulls it apart to figure out what to do. It sees the >> and guesses that we mean to open something in append mode, and that we don't mean to read from a file named ">> logfile" (and yes, I can create a file with that name). After the >> it keeps guessing: it figures that the leading whitespace is not part of the filename (and, yes, I can create a filename with leading whitespace). Moving on, it gets the name of the file, "logfile":

open( my $fh, ">> logfile" )  || die "Could not write to logfile: $!";

This magic discards trailing whitespace too, so all of the following open the same file. Perl does what is common, and file names such as " logfile", "logfile ", and " logfile " aren't common, or at least they shouldn't be. If you really want to annoy someone, put a file whose name ends in a space in their directory and watch how long it takes them to delete it (especially if you arrange things so they can't use a glob; don't tell anyone I told you about this):

open( my $fh1, ">>logfile" );
open( my $fh2, ">> logfile" );
open( my $fh3, ">>logfile " );
open( my $fh4, ">> logfile " );

Sometimes, however, we don't want this magic. We can use Perl's three-argument form instead. Rather than lumping everything together in the second argument, I split the open() mode and the filename into separate arguments. When I do that, the filename is exactly what I specify, whitespace and all:

open( my $fh, ">>", "logfile " );
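To see the difference for yourself, here's a sketch that makes both kinds of call in a throwaway directory (from the core File::Temp module, an assumption of this example) and then lists what actually got created. The two-argument open() trims the trailing space and appends to logfile, while the three-argument open() creates a second file whose name really does end in a space.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );   # throwaway directory for the demo
my $base = "$dir/logfile";

# Two-argument open: leading and trailing whitespace is stripped,
# so this appends to "logfile"
open( my $two, ">> $base " ) or die "Could not open: $!";
close $two;

# Three-argument open: the name is used exactly, trailing space included
open( my $three, '>>', "$base " ) or die "Could not open: $!";
close $three;

opendir( my $dh, $dir ) or die "Could not read $dir: $!";
my @names = sort grep { !/^\.\.?\z/ } readdir $dh;
closedir $dh;

print map { "[$_]\n" } @names;   # [logfile] then [logfile ]
```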

Okay, three arguments should be enough for anyone, right? Not really. Magic open() often causes problems when we open a pipe, because the command string goes through the shell. Just as system() and exec() have list forms that don't let the shell treat special characters as special, open() has a list form for pipes. The example in perlfunc has five arguments:

open(FOO, '-|', "cat", '-n', $file);
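Here's a sketch of why the list form matters, assuming a Unix-like system (older Windows perls don't implement the list form of a pipe open). Rather than cat, it runs the current perl ($^X) as the child program so it works wherever Perl does, and the argument is a made-up string full of shell metacharacters. Since no shell is involved, the child sees it as exactly one literal argument.

```perl
use strict;
use warnings;

# Made-up argument full of shell metacharacters
my $arg = 'my notes; echo $HOME';

# List-form pipe open: no shell, so $arg arrives as one literal argument.
# $^X is the running perl, standing in for cat as a portable child program.
open( my $fh, '-|', $^X, '-e', 'print scalar(@ARGV), ":", $ARGV[0]', $arg )
    or die "Could not start child: $!";
my $out = <$fh>;
close($fh) or die "Child failed: $?";

print "$out\n";   # 1:my notes; echo $HOME
```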

Now that you know that open() is a lot more special than the two-argument form you may be used to, get ready to take it to a whole other level with the new "plumbing" (in the words of perlopentut) for the IO framework. With PerlIO, I can think about IO in layers. The first layer is the sequence of bytes in the file, the second layer is the stuff that perl reads, and another layer is the stuff that ends up in my program. With PerlIO, I can do different things at each layer. For instance, I can read a gzipped file but get the uncompressed text when I read from it. No fuss, no muss! Well, I do have to install the PerlIO::gzip module, but that's not a big deal.
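You can peek at the layers on a handle with PerlIO::get_layers(), which is built into Perl. This sketch, using an empty scratch file from File::Temp, compares a plain open() with one that asks for a decoding layer. The exact layer names depend on your platform and build, so treat the comments as typical output rather than guaranteed output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( undef, $path ) = tempfile( UNLINK => 1 );   # empty scratch file

# A plain open gets the platform's default layers
open( my $plain, '<', $path ) or die "Could not read $path: $!";
my @plain = PerlIO::get_layers($plain);

# Asking for a decoding layer pushes another one on top
open( my $decoded, '<:encoding(UTF-8)', $path ) or die "Could not read $path: $!";
my @decoded = PerlIO::get_layers($decoded);

print "plain:   @plain\n";     # typically: unix perlio
print "decoded: @decoded\n";   # typically adds encoding(utf-8-strict) utf8
```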

For instance, CPAN.pm uses a couple of files (02packages.details.txt.gz, 03modlist.data.gz) to figure out where distributions are and how to install them. CPAN distributes these as gzipped files to save space, since an installer utility needs to download them before it can start its work.

Without PerlIO, I'd have to unzip them myself, or provide some way to read them as a stream and uncompress them on the fly. That's a huge pain. If I simply open the file, it doesn't work like I want it to: Perl thinks it's opening a text file and reads until the first newline (or whatever is in $/). It reads quite a bit of data and prints a bunch of binary garbage to my screen:

open( my $fh, "/MINICPAN/modules/03modlist.data.gz" );
print scalar <$fh>;

With PerlIO, I just need to stick in another layer. The PerlIO::gzip module can unzip the data on the fly and give it back to me as the uncompressed text with almost no work on my part. In the second argument, when Perl sees the ":gzip", it automatically looks for and loads PerlIO::gzip:

open( my $fh, "<:gzip", "/MINICPAN/modules/03modlist.data.gz" );
print scalar <$fh>;

Instead of gobbledygook, I get the first line of uncompressed text:

File:        03modlist.data
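Not everyone has PerlIO::gzip installed, so here's a sketch that uses the :gzip layer when the module is available and otherwise falls back to IO::Uncompress::Gunzip, which has shipped with Perl since 5.10 and hands back a handle you can read with the same <$fh>. That fallback is a different technique from a PerlIO layer, just with a similar interface. The .gz file is a tiny stand-in for 03modlist.data.gz, built with the core IO::Compress::Gzip module, and its contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Build a small, hypothetical stand-in for 03modlist.data.gz
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/03modlist.data.gz";
my $text = "File:        03modlist.data\nDescription: sample\n";
gzip( \$text => $file ) or die "gzip failed: $GzipError";

my $fh;
if ( eval { require PerlIO::gzip; 1 } ) {
    # PerlIO::gzip is installed: decompress through the :gzip layer
    open( $fh, '<:gzip', $file ) or die "Could not read $file: $!";
}
else {
    # Fallback: the core IO::Uncompress::Gunzip module returns a readable handle
    $fh = IO::Uncompress::Gunzip->new($file)
        or die "Could not gunzip $file: $GunzipError";
}

my $first = <$fh>;
print $first;    # File:        03modlist.data
```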

There are many other pre-existing layers for PerlIO. Don't like all those DOS CRLF pairs? No problem. You don't have to run dos2unix on the file. Just use the built-in :crlf layer, and PerlIO will convert the line endings automatically:

open( my $fh, "<:crlf", "dosfile1.txt" );
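Here's a sketch that writes a couple of CRLF-terminated lines to a scratch file (standing in for dosfile1.txt) and reads them back through :crlf. On a Unix-like system the lines come back ending in a plain "\n"; the scratch file and its contents are assumptions of this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( $out, $path ) = tempfile( UNLINK => 1 );   # stand-in for dosfile1.txt
binmode($out);                                   # write the bytes exactly as given
print {$out} "line one\r\nline two\r\n";
close($out) or die "Could not close $path: $!";

open( my $fh, '<:crlf', $path ) or die "Could not read $path: $!";
my @lines = <$fh>;
close $fh;

# Each line now ends in a bare "\n"; the "\r" is gone
print map { length($_) . "\n" } @lines;   # 9 and 9
```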

Do you want the raw data rather than going through some Unicode-aware layer that knows about wide characters? Use the :bytes pseudo-layer:

open( my $fh, "<:bytes", "dosfile1.txt" );
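To see what :bytes buys you, this sketch writes the UTF-8 encoding of "café" (4 characters, 5 bytes) to a scratch file and reads it back twice: once through :encoding(UTF-8), which decodes into characters, and once through :bytes, which leaves the octets alone. The :encoding(UTF-8) handle is my addition for contrast.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode($out);
print {$out} "caf\xc3\xa9\n";     # "café" encoded as UTF-8: 5 octets plus a newline
close($out) or die "Could not close $path: $!";

# Decoding layer: the two octets \xc3\xa9 become one character
open( my $chars, '<:encoding(UTF-8)', $path ) or die "Could not read $path: $!";
chomp( my $as_chars = <$chars> );

# :bytes: no decoding, one element per octet
open( my $bytes, '<:bytes', $path ) or die "Could not read $path: $!";
chomp( my $as_bytes = <$bytes> );

print length($as_chars), " characters vs ", length($as_bytes), " bytes\n";  # 4 characters vs 5 bytes
```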

Want to turn on binmode directly in the open()? Use the :raw layer:

open( my $fh, "<:raw", "brian.jpg" );
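A quick sketch, with a scratch file standing in for brian.jpg: write all 256 byte values through >:raw and read them back through <:raw. It uses only core modules, and the point is that nothing gets translated on the way out or back in, which is roughly what a plain open() followed by binmode() would give you.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( undef, $path ) = tempfile( UNLINK => 1 );   # stand-in for brian.jpg

# Every possible byte value, including "\r", "\n", and "\x1a"
my $data = join '', map { chr($_) } 0 .. 255;

open( my $out, '>:raw', $path ) or die "Could not write $path: $!";
print {$out} $data;
close($out) or die "Could not close $path: $!";

open( my $in, '<:raw', $path ) or die "Could not read $path: $!";
my $read = do { local $/; <$in> };   # slurp the whole file
close $in;

print length($read), " bytes, ", ( $read eq $data ? "identical" : "different" ), "\n";
```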

Although I've shown examples for reading data, these layers work for writing too. Besides the layers documented in the PerlIO man page, there are many more on CPAN in the PerlIO and PerlIO::via namespaces. If you don't find what you need, you can even crib from an existing one to create your own.
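As a taste of rolling your own, here's a sketch using the core PerlIO::via module, which lets you write a layer in plain Perl. The Uppercase layer is hypothetical, made up for this example: its FILL method upper-cases data on the way in and its WRITE method upper-cases data on the way out, so it also shows one layer working in both directions. PUSHED, FILL, and WRITE are the method names PerlIO::via calls; everything else is my own.

```perl
use strict;
use warnings;
use PerlIO::via;
use File::Temp qw(tempfile);

# Hypothetical layer: upper-cases text in both directions.
# :via(Uppercase) finds this package as PerlIO::via::Uppercase.
package PerlIO::via::Uppercase;

sub PUSHED {
    my ($class) = @_;
    return bless {}, $class;          # no per-handle state needed
}

sub FILL {                            # reading: next chunk, or undef at EOF
    my ( $self, $below ) = @_;
    my $line = readline($below);      # read from the layer underneath
    return defined $line ? uc $line : undef;
}

sub WRITE {                           # writing: return octets consumed, -1 on error
    my ( $self, $buf, $below ) = @_;
    print {$below} uc $buf or return -1;
    return length $buf;
}

package main;

my ( undef, $path ) = tempfile( UNLINK => 1 );

# Write through the layer, then read the file back without it
open( my $out, '>:via(Uppercase)', $path ) or die "Could not write $path: $!";
print {$out} "hello, world\n";
close($out) or die "Could not close $path: $!";

open( my $plain, '<', $path ) or die "Could not read $path: $!";
my $on_disk = <$plain>;
close $plain;

# Write without the layer, then read the file back through it
open( my $plain_out, '>', $path ) or die "Could not write $path: $!";
print {$plain_out} "mixed Case\n";
close($plain_out) or die "Could not close $path: $!";

open( my $in, '<:via(Uppercase)', $path ) or die "Could not read $path: $!";
my $read_back = <$in>;
close $in;

print $on_disk, $read_back;   # HELLO, WORLD and MIXED CASE
```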

So open() has come a long way since I started using Perl, and the kids today have it much easier: not only do you get more arguments, but the arguments can do more. You don't even have to know you are doing complex IO transformations when PerlIO does them for you. Maybe in 10 years you'll look back on these fancy features and complain about how hard it was in your day and how easy kids have it.

TPJ
