
Get More Out of Open


October, 2005: Get More Out of Open

brian has been a Perl user since 1994. He is founder of the first Perl Users Group, NY.pm, and Perl Mongers, the Perl advocacy organization. He has been teaching Perl through Stonehenge Consulting for the past five years, and has been a featured speaker at The Perl Conference, Perl University, YAPC, COMDEX, and Builder.com. Contact brian at [email protected].


Back in the day, when I was a kid and had to walk to school uphill (both ways!) in the snow, open() only had two arguments, and it was enough for us. Since then, open() has become much richer and more featureful. Indeed, it is so rich that it gets its own tutorial, perlopentut, and its perlfunc entry is almost 400 lines long (which is a lot longer than this article).

Before starting, I have a bit of a challenge for you which lets me skip talking about a part of open() and leave that up to you. There is a one argument form of open. Do you know what it is, and can you give an example of how you would use it? I'll answer this next month. It does have a useful application, but you have to keep in mind Perl's origins to be in the right mindset.

When I started in Perl, open() took a filehandle identifier and file. Filehandle identifiers don't get a special sigil, so they appear as barewords. Since they appear as barewords, Perl's convention was to completely capitalize them. They certainly stood out in the program text.

open( FILE, "README" ) || die "Could not open README! $!";

We ran into problems when we wanted to pass filehandles around. If I wanted to open a filehandle in one place but use it in another, I had to do a lot of ugly typing. For instance, to pass a filehandle to a subroutine, I passed its typeglob, or even a reference to it. In the subroutine, I saved that in a scalar and let print() figure it out (even if I or the maintenance programmers had to scratch our heads).

open( LOG, ">> logfile" ) || die "Could not write to logfile: $!";

here_i_am( *LOG );
log_this( \*LOG, "Hey, I'm done!" );

sub here_i_am {
    my $fh = shift;
    print $fh "Here I am!\n";
}

sub log_this {
    local *FILE = shift;
    print FILE @_;
}

In 5.6, Perl got smart enough to skip the typeglob step. I could create a filehandle in a scalar variable directly as an "indirect filehandle". This looks a lot better. I don't have to explain typeglobs or references to typeglobs, or why some people use one or the other. This works when the variable, in this case $fh, is undefined.

open( $fh, ">> logfile" ) || die "Could not write to logfile: $!";
print "$fh\n";   # something like GLOB(...)

That variable has to be uninitialized. The following example doesn't do what I want because $fh doesn't end up with a filehandle reference since it is already defined.

my $fh = "I'm not a filehandle!";
open( $fh, ">> logfile" ) || die "Could not write to logfile: $!";
print "$fh\n";   # prints "I'm not a filehandle!"

Luckily, Perl convention gets around this by declaring the variable directly in the open(). All of your other variables are lexicals, right? So why not this one too?

open( my $fh, ">> logfile" )  || die "Could not write to logfile: $!";

That's much nicer. I can now simply pass around scalar variables, and most people already know how to do that. This is one of the telling marks of the intermediate Perl programmer. We don't talk about this in Learning Perl because we want to get people to open files the fastest (in student time) way possible, then introduce better programming idioms once they understand the quick-and-dirty way.
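Here is a minimal sketch of what "simply pass around scalar variables" looks like in practice. The log_this() subroutine and the temporary directory are stand-ins I made up for the illustration, and the path is assumed to contain nothing that the two-argument open() would treat specially.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: prints a message to whatever filehandle it is given
sub log_this {
    my( $fh, @message ) = @_;
    print {$fh} @message, "\n";
}

my $dir     = tempdir( CLEANUP => 1 );
my $logfile = "$dir/logfile";          # illustrative location only

open( my $fh, ">> $logfile" ) || die "Could not write to $logfile: $!";
log_this( $fh, "Hey, I'm done!" );     # the filehandle is just another scalar
close( $fh ) || die "Could not close $logfile: $!";

# Read the file back to check what the subroutine wrote
open( my $in, "< $logfile" ) || die "Could not read $logfile: $!";
my @lines = <$in>;
close( $in );

print @lines;
```

The subroutine never sees a typeglob or a bareword. It gets a scalar and prints to it.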

That didn't fix all of the problems though. There was this thing known as "magic open" that was obscure enough to be a Final Jeopardy question. Perl does some guessing on what we want it to do. For instance, in my previous example, Perl looks at the second argument and pulls it apart to figure out what to do. It sees the >> and guesses that we mean to open something in append mode, and that we don't mean to read from the file named ">> logfile" (and yes, I can create a file with that name). After the >> it keeps guessing, and it figures that the leading whitespace is not part of the filename (and, yes, I can create a file whose name has leading whitespace). Moving on, it gets the name of the file, "logfile":

open( my $fh, ">> logfile" )  || die "Could not write to logfile: $!";

This magic discards trailing whitespace too. Perl does what is common, and file names such as " logfile", "logfile ", and " logfile " aren't common, or at least they shouldn't be. If you really want to annoy someone, put a file with a trailing space in its name in their directory and watch how long it takes them to delete it, especially if you make it so they can't use a glob. (Don't tell anyone I told you about this.) All of these open the same file:

open( my $fh, ">>logfile" );
open( my $fh, ">> logfile" );
open( my $fh, ">>logfile " );
open( my $fh, ">> logfile " );

Sometimes, however, we don't want this magic open. We can use Perl's three argument form. Instead of lumping a bunch of things together in the second argument, I break apart the open() mode and the filename. When I do that, the filename is exactly what I specify, whitespace and all:

open( my $fh, ">>", "logfile " );
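To see the difference for yourself, here is a small sketch that makes one file each way and then lists the directory. It assumes a Unix-like filesystem that allows a trailing space in a filename, and the temporary directory is only there so the experiment cleans up after itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );

# Two-argument form: magic open trims the whitespace around the filename
open( my $magic, ">> $dir/logfile " )
    || die "Could not append to logfile: $!";
close( $magic );

# Three-argument form: the filename is literal, trailing space and all
open( my $literal, ">>", "$dir/logfile " )
    || die "Could not append to 'logfile ': $!";
close( $literal );

# List what actually ended up on disk, skipping . and ..
opendir( my $dh, $dir ) || die "Could not read $dir: $!";
my @names = sort grep { !/^\.\.?$/ } readdir $dh;
closedir( $dh );

# The brackets make the trailing space visible
print "[$_]\n" for @names;
```

On such a system the directory ends up with two files, "logfile" and "logfile ", one from each form.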

Okay, three arguments should be enough for anyone, right? Not really. Magic open() often causes problems when we're opening a pipe, because the shell gets a chance to interpret special characters in the command. Just as system() and exec() have list forms that don't let the shell handle special characters as special characters, open() has a list form for pipes. The example in perlfunc has five arguments:

open(FOO, '-|', "cat", '-n', $file);
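The list form can be sketched with a lexical filehandle too. In this example the child process is just another copy of perl (found through $^X) that prints its first argument, and the awkward argument string is something I invented to show that no shell gets to interpret it. It assumes a platform where the list form of pipe open is available.

```perl
use strict;
use warnings;

# A shell would treat the semicolon and the star specially
my $arg = "two  spaces; echo *";

# Everything after the mode is passed straight to the child, no shell involved
open( my $pipe, "-|", $^X, "-e", 'print $ARGV[0]', $arg )
    || die "Could not start child process: $!";

# Slurp everything the child prints
my $output = do { local $/; <$pipe> };
close( $pipe ) || die "Child process failed: $! (status $?)";

print "[$output]\n";
```

The child receives the argument exactly as written, double space, semicolon, and star included.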

Now that you know that open() is a lot more special than the two argument form that you may be used to, get ready to take it to a whole other level with new "plumbing" (in the words of perlopentut) for the IO framework. With PerlIO, I can think about IO in layers. The first layer is the sequence of bytes in the file, the second layer is the stuff that perl reads, and another layer is the stuff that ends up in my program. With PerlIO, I can do different things to the layers. For instance, I can read a gzipped file but end up with the uncompressed output when I read from it. No fuss no muss! Well, I do have to install the PerlIO::gzip module, but that's not a big deal.

For instance, CPAN.pm uses a couple of files (02packages.details.txt.gz, 03modlist.data.gz) to figure out where distributions are and how to install them. CPAN distributes these as gzipped files to cut down on space, since an installer utility needs to download these files to start its work.

Without PerlIO, I'd have to unzip them myself, or provide some way to read them as a stream and uncompress them on the fly. That's a huge pain. A plain open() doesn't work like I want it to. Perl thinks it's opening a text file and reads until the first newline (or whatever is in $/). It reads quite a bit of data and prints a bunch of gobbledygook to my screen:

open( my $fh, "/MINICPAN/modules/03modlist.data.gz" );
print scalar <$fh>;

With PerlIO, I just need to stick in another layer. The PerlIO::gzip module can unzip the data on the fly and give it back to me as the uncompressed text with almost no work on my part. In the second argument, when Perl sees the ":gzip", it automatically looks for and loads PerlIO::gzip:

open( my $fh, "<:gzip", "/MINICPAN/modules/03modlist.data.gz" );
print scalar <$fh>;

Instead of gobbledygook, I get the first line of uncompressed text:

File:        03modlist.data
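If you don't have a CPAN mirror handy, here is a self-contained sketch of the same idea. It builds a small gzipped file with IO::Compress::Gzip, then reads the first line back through the :gzip layer. The sample file name and contents are made up, and since PerlIO::gzip may not be installed, the sketch falls back to IO::Uncompress::Gunzip, which is a different mechanism (an object that acts like a filehandle) rather than a PerlIO layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/sample.txt.gz";     # stand-in for 03modlist.data.gz

# Make a gzipped file to read; the contents are invented for the example
my $text = "File:        sample.txt\nsecond line\n";
gzip( \$text => $file ) || die "Could not gzip to $file: $GzipError";

my $fh;
if( eval { require PerlIO::gzip; 1 } ) {
    # The layer from the article: uncompress on the fly while reading
    open( $fh, "<:gzip", $file ) || die "Could not open $file: $!";
}
else {
    # Fallback when PerlIO::gzip is missing: not a layer, but reads the same
    $fh = IO::Uncompress::Gunzip->new( $file )
        || die "Could not gunzip $file: $GunzipError";
}

my $first = <$fh>;
close( $fh );

print $first;
```

Either way, the code that reads from $fh doesn't have to know the data was ever compressed.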

There are many other pre-existing filters for PerlIO. Don't like all those DOS CRLF pairs? No problem. You don't have to run dos2unix on the file. Just use the built-in :crlf layer, and PerlIO will automatically convert the line endings:

open( my $fh, "<:crlf", "dosfile1.txt" );

Do you want to get the raw data, rather than some Unicode-aware layer that knows about wide characters? Use the :bytes layer:

open( my $fh, "<:bytes", "dosfile1.txt" );

Want to turn on binmode directly in the open()? Use the :raw layer:

open( my $fh, "<:raw", "brian.jpg" );
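A quick way to watch the layers at work is to write a file with DOS line endings and read it back twice. This is only a sketch: the file lives in a temporary directory, and the two-line content is made up for the comparison.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/dosfile1.txt";

# Write DOS line endings byte for byte; :raw keeps Perl from translating them
open( my $out, ">:raw", $file ) || die "Could not write $file: $!";
print {$out} "first\r\nsecond\r\n";
close( $out ) || die "Could not close $file: $!";

# :raw hands the bytes back untouched, carriage return included
open( my $raw, "<:raw", $file ) || die "Could not read $file: $!";
my $raw_line = <$raw>;
close( $raw );

# :crlf turns each CRLF pair into a single newline on the way in
open( my $crlf, "<:crlf", $file ) || die "Could not read $file: $!";
my $crlf_line = <$crlf>;
close( $crlf );

printf "raw:  %d characters\n", length $raw_line;
printf "crlf: %d characters\n", length $crlf_line;
```

The first line comes back as 7 characters through :raw ("first" plus the CRLF pair) and 6 through :crlf ("first" plus a newline).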

Although I've shown examples for reading data, these work the other way too. Besides the layers that you see in the PerlIO man page, there are many extra ones on CPAN in either the PerlIO or the PerlIO::via namespaces. If you don't find what you need, you can even crib off one that is already there to create your own.

So open() has come a long way since I started using Perl, and the kids today have it much easier--not only do you get more arguments, but the arguments can do more. You don't even have to know you are doing complex IO transformations when PerlIO does them for you. Maybe in 10 years you'll look back on these fancy features and complain about how hard it was in your day and how easy the kids have it.

TPJ

