Robert is the former maintainer of the Linux Frequently Asked Questions with Answers Usenet FAQ. He can be reached at rkiesling@mainmatter.com.
Perl has always had a compiler, as the perlcompile manual page provided with Version 5.8.0 points out, but its normal use is only to generate the internal bytecode that is run by the interpreter.
All recent versions of Perl, however, have had the ability to generate C source code or run a compiler to produce a standalone executable program. The library modules B.pm, C.pm, and CC.pm provide an interface to Perl's internal bytecode compiler. The perlcc script processes command-line options and runs all of the components of the compilation process. A listing of command-line options that perlcc accepts is given in Figure 1.
In earlier versions of Perl, compiler support for extension modules was inconsistent, reducing the usefulness of the compiler. There are still some language features that don't compile correctly, like unsigned math operations. But scripts that use complex extensions like Perl/Tk are much easier to turn into standalone programs using Perl 5.8.0.
There has been much reworking of the Perl internals: improvements in the IO abstraction layer, new interpreter threads, and code for a new version of MacPerl. This redesign effort has also made Perl scripts easier to compile.
The compiler is still experimental, though, and some of its modules, especially those that optimize code, are not yet fully integrated with the rest of the Perl libraries. It's likely that you'll need to experiment to find the best way to turn scripts into standalone programs.
Why Not to Compile Perl
Perl remains a primarily interpreted language. Some of the language's greatest advantages, such as its ability to bind variables very late in the interpreter process and its lack of strong data typing, make the language difficult to compile and add a lot of extra code to executables. Standalone programs don't run noticeably faster, and Perl gurus say that the executables compiled from scripts aren't any more secure than the interpreted versions.
However, standalone programs compiled from Perl do not need the interpreter or libraries to run. Also, compiling Perl scripts with statically linked extensions built into the interpreter might be a better option for producing specific applications. This could be the best reason to provide compiled programs, especially those that rely on libraries that users must install themselves.
The compiler prefers scripts to have the extension .p, which can be confusing if you also write Pascal programs. In fact, the compiler's design and operation are reminiscent of a Pascal p-code compiler, without the necessity of an extra run-time module.
Executables compiled from intermediate C code are huge as well, when statically linked against the Perl libraries. A minimal "Hello, world!" script like the hello.p script shown in Listing 1 results in a binary that is over 1 MB in size. Scripts that use the Perl/Tk GUI result in binaries of at least 5 MB. When generating C code, the compiler does not translate a Perl script into corresponding C source code. Instead, it generates a C representation of the internal bytecode, including the code for all the library modules that the script uses.
However, generating a C language source file and then using the compiler's optimization can reduce executable size by about 10 percent. Producing stripped binaries, which do not have symbol table information, can reduce the size of an executable by at least another 200 KB.
Building Perl with a shared version of its libraries can also reduce the size of executables, but then you need to include the shared libraries with the program for systems that don't have them already. You must also take care that the version of the library on the target system is the same as the version on the system the script was compiled on.
Table 1 shows the file sizes for three Perl scripts: the hello.p script mentioned earlier, touch.p, which mimics the UNIX touch utility (Listing 2), and textedit.p, a Perl/Tk script for a simple text editor (Listing 3).
Table 1 also lists the sizes of statically linked executables compiled with perlcc, the size of the generated C source files, and the sizes of executables generated by GCC with and without optimization.
The smallest executables are those generated using "gcc -O". Higher levels of optimization actually increase executable size slightly and result in much longer compilation times.
All of the scripts were compiled on a SPARCstation 20 with Solaris 8 and GCC 3.0.3. Other hardware platforms and operating systems produce similar results.
Compiling with perlcc and GCC
The perlcc command-line arguments are similar to those of a C compiler. The following command, for example, generates the executable textedit from the script textedit.p.
# perlcc textedit.p -o textedit
To generate the C source file textedit.p.c from the textedit.p script, use the -S command-line argument.
# perlcc -S textedit.p
The libperl.a static library and the Perl include files are located in directories where Perl, not GCC, can find them. In addition to these directories, you must also specify which system libraries to link with the program. Example 1 shows the options that you must provide to GCC to compile and link the textedit.p script.
When Static Linking is Better
All compiled scripts also require linking to the DynaLoader.a library, whether or not the script loads additional Perl libraries, because the dynamic module loader is so completely integrated into Perl that the interpreter cannot function without it.
If the size of compiled programs is critical or you want to use Perl in embedded environments (although the language designers recommend against it), you should consider building the Perl interpreter with statically linked versions of the libraries you want to include.
Example 2 shows how to build Perl with the Tk library modules statically linked. You must first unpack the Perl/Tk source code in the Perl source tree. When you run the configure script, answer "n" to the dynamic loading option and specify which extensions you want to include.
In practice, building a statically linked interpreter requires that you name every extension you'll need when you configure the Perl interpreter.
Configuring Perl in this way allows you to build application-specific executables, but you'll need to experiment to determine the best procedure for building the executable programs and which extensions to include in the interpreter.
Conclusion
Although Perl is primarily an interpreted language, it provides the tools and configuration options to produce compiled standalone programs that do not need the Perl interpreter or libraries to be present.
TPJ
Listing 1
#!/usr/local/bin/perl
print "Hello, world!\n";
Listing 2
#!/usr/local/bin/perl
use IO::File;

my $filename = $ARGV[0];
if (! length ($filename)) {
    print "Usage: touch\n";
    exit 1;
}
if (-f $filename) {
    # Update access and modtimes on $filename if
    # it already exists.
    my $timenow = time;
    utime $timenow, $timenow, $filename;
} else {
    # create the file
    my $fh = new IO::File $filename , O_CREAT;
    undef $fh;
}
Listing 3
#!/usr/local/bin/perl
use Tk;

my $mw = new MainWindow;
my $textwidget = $mw -> Text -> pack (-expand => 1, -fill => 'both');
MainLoop;