PDL: The Perl Data Language

DDJ, Software Careers Fall 97: PDL: The Perl Data Language

Karl is an astronomer at the Anglo-Australian Observatory and can be reached at [email protected] Frossie is a software engineer and astronomer at the UK Infrared Telescope in Hawaii and can be reached at [email protected] Reprinted courtesy of The Perl Journal (http://www.tpj.com/).
Tables and Examples
Extolling the virtues of Perl and its many uses to the many Perl users out there is preaching to the choir. Nevertheless, there is one fundamental area in computing where Perl has been conspicuously absent-number crunching.

Tarred by the same brush as other scripting languages, Perl (which in fact is semicompiled), is perceived as too slow and memory hungry for heavy numerical computations because it doesn't lend itself to storing and retrieving zillions of numbers quickly. This has been a source of great frustration to us, enthusiastic Perl users who resent being forced to use other environments for our astronomical data analysis. Perl's potential for manipulating numerical data sets speedily and elegantly via an extension was obvious. Hence PDL, the Perl Data Language, was conceived. PDL is a Perl extension for numerical manipulation that provides the convenience of Perl with the speed of compiled C.

PDL introduces a new data structure: the "pdl numerical array," often referred to as a "piddle." (This unfortunate nickname has led to some rather dubious puns in the source code.) A piddle is a special object that can contain a large block of efficiently stored numbers for manipulation with normal mathematical expressions. For example, if \$a is a piddle containing a 346 chunk of data, then the Perl statement \$b = sin(\$a) will set \$b equal to \$a; but with every value replaced, with its sine. Because each operation is implemented via compiled C code, it's nearly as fast as a hand-crafted C program.

PDL can be used "normally" from a script, but it also has a shell interface for interactive data analysis and prototyping. In this article, we demonstrate PDL using the PDL shell, called "perldl." We assume you have PDL-1.11 (see Table 1 for a list of PDL 1.11 functions), PGPLOT-2.0 (which itself requires the pgplot graphics library), and Perl 5.003. If you also have the right versions of the Term::ReadLine and Term::ReadKey modules, the perldl shell allows interactive command-line editing.

The perldl shell, invoked at the command line by perldl, behaves like Perl's debugger. For instance, we can assign values to variables and print them with p, as in Example 1. PDL is really about matrices. In Example 2, we create a 23 matrix and multiply it by itself.

To have true fun with PDL, we need some data. Luckily, the PDL distribution comes with a picture of the sky stored in FITS, the standard format for astronomical data. PDL also supplies rfits(), a function that reads FITS files and returns a piddle containing the data. Example 3(a) shows how we read in our image and plot it.

Now we have data (and we didn't have to spend three nights freezing on top of a mountain to get it). What do we know about it? We know that it is 16-bit with 65,536 elements. But is it 65536*1 or 256*256 or even 16*16*16*16? We can determine the dimensions of our data by using the dims() function; see Example 3(b). dims() is a PDL function that returns the dimensions of a piddle. Not surprisingly (after all, it's a picture of the sky), we have a two-dimensional image: 256*256.

To determine the mean and standard deviation of a piddle, we use the stats() function; Example 3(c). To print a portion of the piddle, we use the sec() function. Since we don't want to display all 65,536 numbers in this particular example, we use sec() to return the bottom-left corner instead. Example 3(d) displays the rectangle between elements (0,252) and (3,255). Additional dimensions are handled seamlessly by passing the extra coordinate values as arguments.

Perhaps you're getting restless at this point. Let's abandon the function calls and jump to the cool stuff. The PDL function imag() uses the PGPLOT library to display images. In Example 4(a), imag() displays the piddle \$a; see Figure 1(a). It's full of stars! This is, in fact, an image of Messier 51, a spiral galaxy similar to our own but at a distance of 200,000,000,000,000,000,000 miles away, give or take a few billion. That's a bit too far for us to invade, but we can at least humiliate it, using the command in Example 4(b). This results in Figure 1(b).

Since we're exploring cosmology, we can create something out of nothing; see Example 5(a). As you can see, PDL functions can be chained together just like Perl functions. Two PDL functions are cascaded here: rvals() and zeroes(). First, zeroes() creates a piddle full of zeros-in this case, a 2020 matrix with every element zero. (There's also a ones() function that creates a piddle full of ones.) Then, rvals() fills that piddle with values representing the distance of each element from the center. We now use the exp() function to generate a two-dimensional Gaussian, as shown in Example 5(b) and Figure 2(a).

 Figure 1: (a) Image of Messier 51, a spiral galaxy similar to ours; (b) humiliated galaxy.

The less mathematically inclined will say it looks like a blob. We can inflict a bit more punishment on Messier 51 by convolving it with our newly created Gaussian filter; Example 5(c) and Figure 2(b). This enables us to simulate what we would see if we were observing through very bad viewing conditions, such as a (possibly drunken) haze. This operation takes 20-25 seconds on a Pentium 120 or Sparc 20 with PDL. Doing this with a 2D array in normal (non-PDL) Perl takes 13 minutes and uses 11 times as much memory.

Example 5(d) creates an unsharp masked image, as in Figure 2(c). This is often used in astronomy to emphasize sharp features against a bright background, such as stars in a galaxy, the giant luminous gas clouds we call HII regions, or foreground objects such as UFO's (err, weather balloons).

Where are We Now?

PDL was prototyped early in 1996 and is currently in beta release. (One author's habit of prototyping Perl modules while on astronomical observing trips leaves the other author wondering whether this is a previously unknown symptom of altitude sickness.) It is functional but not yet fully featured. It is under vigorous development, thanks to the work and enthusiasm of the perldl mailing list participants. The current stable version as we write this article is 1.11.

 Figure 2: (a) Two-dimensional Gaussian filter; (b) Messier 51 convolved with Gaussian filter; (c) unsharpened mask.
Several auxiliary modules for PDL are nearing completion: 3D graphics manipulation using OpenGL (Tuomas J. Lukka), an interface to the Meschach matrix library (Etienne Grossmann), an interface to the XMGR plotting tool (Tim Jenness), access to the TIFF, GIF, PostScript, and other formats supported by PBM+ (Christian Soeller), and an interface to the SLATEC numerical library (Tuomas J. Lukka). Other formats are under development for PDL 2.0, which will also feature virtual slicing, easier C extensions, and fast implicit looping.

Anyone wishing to reflect or opine on the rather technical issues surrounding PDL development is welcome to join the perldl mailing list at [email protected] or the porters list at [email protected] Send your questions to the former and your complaints to the latter. Finally, more information (including Christian Soeller's PDL FAQ) is available at http://www.aao.gov.au/local/www/kgb/perldl/.

Acknowledgment

We thank Pat Seitzer and the IRAF group of the National Optical Astronomy Observations for permission to use the image of M51.

DDJ

More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.