
Controlling Internet Explorer Using Win32::OLE


June, 2004

Marc is a software developer at VersusLaw.com, and can be reached at Perl@Doorways.org.


For programmers on Microsoft Windows operating systems, OLE/COM/ActiveX can be both a blessing and a curse. For the intrepid Perl programmer, it represents a flexible method of harnessing a variety of applications into new and useful configurations. In this article, I'll use the Win32::OLE module to demonstrate a very basic task: starting an Internet Explorer session from a Perl script, and pointing to The Perl Journal's web site. The code that accomplishes that task is deceptively simple:

use     Win32::OLE;

my  $explorer = new Win32::OLE('InternetExplorer.Application')
                or die "Unable to create OLE object:\n  ",
                       Win32::OLE->LastError, "\n";

$explorer->Navigate("http://www.tpj.com/", 3);

sleep 3;

I'll try to provide some basic background on how this code does what it does, and to give you what you need to get started writing your own OLE code.

What is This OLE Stuff, Anyway?

OLE stands for Object Linking and Embedding and COM stands for Component Object Model. The history of these technologies is rich and varied. I won't go into it in depth here, as much has already been written on the subject. For our purposes, I'll oversimplify just a bit and say that OLE and COM provide the object model and the facilities for controlling Windows applications through a scripting interface. In our case, that interface is controlled from Perl.

Here's what you need to know about OLE on Windows:

  • It's object oriented.
  • Objects have properties (fields or member variables).
  • Objects have methods (member functions).

Here are some other useful but noncritical facts:

  • Objects can generate events to registered handlers.
  • Objects can be written in different languages.
  • Objects load either separately or in related groups.
  • Objects are located and loaded via information stored in the Registry.
  • Many Windows applications and much of the Windows operating system are built using OLE objects.

All of which boils down to the fact that Perl programmers can do a lot of nifty things via OLE.
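To make the first list concrete, here is a minimal sketch of how properties and methods look from Perl. It assumes a Windows machine with Internet Explorer installed; Visible, Name, and LocationURL are properties of the Internet Explorer automation object, and they are my choice for illustration rather than anything the rest of this article depends on.

```perl
use strict;
use warnings;
use Win32::OLE;

my $explorer = Win32::OLE->new('InternetExplorer.Application')
    or die "Unable to create OLE object:\n  ", Win32::OLE->LastError, "\n";

# Properties are read and written with hash syntax on the object.
$explorer->{Visible} = 1;
print "Application name: $explorer->{Name}\n";

# Methods are called like ordinary Perl methods.
$explorer->Navigate("http://www.tpj.com/");

# Crude wait so the page has a chance to load before we read the URL.
sleep 3;
print "Now showing: $explorer->{LocationURL}\n";
```

The hash syntax for properties and the arrow syntax for methods are the two idioms used in most Win32::OLE code.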

Note: Some of the statements I'll make in this article are about COM in general. Because the Perl package is referred to as Win32::OLE, I'll refer to the technology in the general sense as OLE.

Back to the Example

Let's reexamine that previously shown snippet of code. First, import the OLE package:

use     Win32::OLE;

Then instantiate an object of that package with the registered name of the Internet Explorer application:

my  $explorer = new Win32::OLE('InternetExplorer.Application')

It's always a good idea to provide some sort of error checking:

    or die "Unable to create OLE object:\n  ",
           Win32::OLE->LastError, "\n";

Actually opening the browser window to The Perl Journal's web site is almost ridiculously simple:

$explorer->Navigate("http://www.tpj.com/", 3);

But let's stop for a moment and talk about this last statement. The variable $explorer is of the package Win32::OLE. It is a proxy object, a stand-in for the actual OLE object that has been created through the arcane rules that govern such things (you do not want to see the C/C++ code for this, though the Visual Basic code is pretty simple).

Invoking the method Navigate() on the $explorer proxy causes an OLE method invocation to be made on the underlying Internet Explorer application object. The Internet Explorer OLE class publishes its API, enabling the Win32::OLE object $explorer to pick up the Navigate() call and pass it along to the real OLE object.

One of the key things to realize is that the basic Win32::OLE class does not have a Navigate() method. Win32::OLE is a generic class and Navigate() is a very specific function that only a browser would know how to perform. What Win32::OLE does know how to do is to query an OLE object for its API and provide "fake" methods (through the magic of Perl) that pass through to the actual OLE object.
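The "fake method" idea can be sketched in plain Perl with AUTOLOAD. This is a toy illustration only, not Win32::OLE's actual implementation: ToyBrowser and ToyProxy are hypothetical classes invented for the example, and the forwarding here goes to another Perl object rather than to a real OLE object.

```perl
use strict;
use warnings;

# Stand-in for the "real" object on the far side of the proxy.
package ToyBrowser;

sub new      { return bless {}, shift }
sub Navigate { my ($self, $url) = @_; return "navigating to $url" }

# Generic proxy class: note that it defines no Navigate() of its own.
package ToyProxy;

our $AUTOLOAD;

sub new {
    my ($class, $target) = @_;
    return bless { target => $target }, $class;
}

# Perl calls AUTOLOAD for any method the class does not define.
sub AUTOLOAD {
    my ($self, @args) = @_;
    (my $method = $AUTOLOAD) =~ s/.*:://;   # strip the package name
    return if $method eq 'DESTROY';         # never forward object cleanup
    die "No such method: $method\n"
        unless $self->{target}->can($method);
    return $self->{target}->$method(@args);
}

package main;

my $proxy = ToyProxy->new(ToyBrowser->new);
print $proxy->Navigate("http://www.tpj.com/"), "\n";
```

The proxy class stays generic because the decision about which methods exist is made at call time by asking the target.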

The bottom line is that Win32::OLE allows you to pull OLE objects into your scripts and manipulate them as if they were regular Perl objects—just like you can do in Visual Basic, but without having to buy and learn VB.

The last statement:

sleep 3;

allows enough time for the browser to complete navigating to TPJ's site. Without this, you'll get a popup dialog with an unhelpful message. If your program runs long enough after the Navigate() call for the browser to display completely, you won't need this statement.
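A fixed sleep is a guess at how long the page will take. One alternative, sketched here under a few assumptions, is to poll the browser's Busy property until it reports that it has finished. This variation navigates in the window the script created (no flags argument) and makes that window visible, which differs from the example above; the 30-second limit is arbitrary.

```perl
use strict;
use warnings;
use Win32::OLE;

my $explorer = Win32::OLE->new('InternetExplorer.Application')
    or die "Unable to create OLE object:\n  ", Win32::OLE->LastError, "\n";

$explorer->{Visible} = 1;                      # show the window we created
$explorer->Navigate("http://www.tpj.com/");    # load the page in that window

# Poll the Busy property once a second, giving up after about 30 seconds.
my $tries = 0;
while ($explorer->{Busy} and $tries++ < 30) {
    sleep 1;
}

print "Finished loading: $explorer->{LocationURL}\n";
```

Polling trades a little extra code for not having to guess at a delay that suits every connection speed.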

The Magic Cookie

Perl classes normally provide a new() class method to instantiate new objects. With OLE, call Win32::OLE->new() (written as new Win32::OLE(...) in the snippets here), which instantiates a Perl object as a proxy for the actual OLE object. Pass a string to specify which OLE class to instantiate.
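Win32::OLE's new() also accepts an optional second argument, a destructor, given either as an OLE method name or as a code reference. It is invoked when the Perl proxy is destroyed, which is a convenient way to close the application if the script exits or dies. A brief sketch, assuming Internet Explorer is installed:

```perl
use strict;
use warnings;
use Win32::OLE;

# A method name as the second argument: Quit is invoked on cleanup.
my $first = Win32::OLE->new('InternetExplorer.Application', 'Quit')
    or die "Unable to create OLE object:\n  ", Win32::OLE->LastError, "\n";

# A code reference also works and leaves room for extra cleanup steps.
my $second = Win32::OLE->new('InternetExplorer.Application',
                             sub { my $ie = shift; $ie->Quit })
    or die "Unable to create OLE object:\n  ", Win32::OLE->LastError, "\n";

# Both browser instances are closed when the proxies go out of scope,
# which here means at the end of the script.
```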

The string that determines the OLE object to be created is something of a magic cookie. If you are blessed with perfect documentation (and the patience to read it), you may have the solution at your fingertips. If your local library doesn't have the requisite grimoire, however, you have to dig for it yourself.

To illustrate, consider the string used in the example above:

InternetExplorer.Application

What can an object of that class do? Where did the string itself come from? What tools can we use to dig for this information?

The OLE Browser

One of the most useful digging tools is the OLE Browser. It comes with the documentation in the ActiveState release of Perl. Look in the "Table of Contents" pane of the ActiveState HTML documentation for:

ActivePerl Components
    Windows Specific
        OLE Browser

Select the "OLE Browser" link and a new browser window should appear with a number of panes, as shown in Figure 1. In the largest center-ish pane, scroll down a bit and select the entry:

Microsoft Internet Controls

In the next pane down on the left, a list of names and icons will appear. If you scroll down you should find Internet Explorer. Select this link and the pane to the right should fill with more names and icons.

The pane on the left shows classes. The pane on the right shows the methods, properties, and events that are published by the selected class on the left. As you select methods, properties, and events the final pane on the bottom shows details about these entities.

Notice that the InternetExplorer class has an Application method. Using a dotted notation to express this, we have "InternetExplorer.Application," our magic cookie, which is in this case a class name and method name of the class. In other cases the magic cookie will be the library name dotted with the class name. In the case of Microsoft Word, this would be "Word.Application" where "Application" is a class name instead of a method name. Why the difference? Well, for one thing, the class structure exported by an OLE application or component is up to the designer of the application. The interface will reflect the needs and predilections of the designer.

In this case, there is also the lack of a document object—this would be different with a big MDI application such as Word or Excel. A web browser is stateless, so some of the normal API structure collapses, resulting in a nonstandard magic cookie.

Frankly, without detailed documentation, finding the magic cookie is often a matter of trial and error. Using the OLE browser is one way to find what you need, but you may also need to know a bit about the Registry. It's time to break out your Registry editor to look for available OLE class names.

The Registry

The Registry editor is available by bringing up the Run... dialog from the Start menu. Enter "regedt32.exe" and press OK (on some versions of Windows, the executable has a different name, such as "regedit.exe"). We're only going to use the Registry editor for browsing. For gosh sakes, don't change anything unless you know what you're doing. Because all of Windows is controlled from the Registry, it is possible to do some real damage (e.g., you could make Windows forget how to find drivers and libraries it needs to boot).

The Registry editor comes up with several subwindows that describe different Registry trees. The HKEY_CLASSES_ROOT tree has a lot of entries. These include a lot of file suffixes and some file-type descriptors, and also a larger number of what you can think of as OLE class names (I'm glossing over a lot of details here that aren't particularly relevant to our task).

For example, you can find the following:

InternetExplorer.Application
Shell.Application
Word.Application

I'm using the first one for the example in this article. The second one (Shell.Application) can be used to do things like create shortcuts. The last one allows scripting of Microsoft Word.

Note that not all of the OLE classes can be created directly. Many of them are classes that can only be instantiated by an Application object. The entries are also mixed in with a lot of other stuff that you don't care about, like the suffixes and suffix classes I mentioned earlier.

Nevertheless, you can often find the magic cookie you need by searching this part of the registry. But then what? How do you get from there back to the OLE Browser? Let's focus on InternetExplorer.Application. Find that in the Registry Editor under HKEY_CLASSES_ROOT. Now double-click on that entry to expand it. Select the CLSID key under it and look at the value in the right-hand pane as shown in Figure 2:

{0002DF01-0000-0000-C000-000000000046}

This is a magic number known as a "GUID" (globally unique identifier). Each OLE class will have one of these, and this is the true name of the class. Never mind where it came from; Microsoft says it's unique across all time and space. So this arcane incantation is what really connects everything together. One of the side effects of this is that you can pass the CLSID to Win32::OLE->new() in place of the class name to instantiate the object. This is slightly faster during execution, but much less readable in the code.
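As a sketch of that side effect, the opening example can be rewritten to use the CLSID string found in the Registry. The value is the one shown above, written with its braces; check the Registry on your own machine before relying on it.

```perl
use strict;
use warnings;
use Win32::OLE;

# CLSID for InternetExplorer.Application, copied from the Registry.
my $clsid = '{0002DF01-0000-0000-C000-000000000046}';

my $explorer = Win32::OLE->new($clsid)
    or die "Unable to create OLE object:\n  ", Win32::OLE->LastError, "\n";

$explorer->Navigate("http://www.tpj.com/", 3);

sleep 3;
```

Keeping the CLSID in a named variable with a comment recovers some of the readability that the raw GUID gives up.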

Now scroll HKEY_CLASSES_ROOT back up to its CLSID entry and double-click to expand that into a really big list of GUIDs. Look up the one GUID for InternetExplorer.Application and double-click it. Select the key TypeLib and in the right-hand pane find:

{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}

(See Figure 3.) Okay, you're almost there now. Scroll HKEY_CLASSES_ROOT down to its TypeLib entry and double-click to expand that into yet another really big list of GUIDs. Look up the type library GUID you just found and double-click it. Under that find a key "1.1" that, when selected, says "Microsoft Internet Controls" in the right-hand pane (your library may be a different version number); see Figure 4.

The name you just found is the name you looked up in the OLE Browser. If you look in the top pane of the OLE Browser, it says that, after all, it's the "Win32::OLE - Type Library Browser." What you have done is tracked through the Registry from the name of the class you want to instantiate to the name of the type library (TypeLib, remember?) in which that class resides. Now you can look it up in the OLE Browser and find out what methods, properties, and events it has.
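The type library name is useful from code as well. The Win32::OLE::Const module can load a library's named constants by that name. The sketch below assumes the "Microsoft Internet Controls" library on your machine exports the navigation flag constants (navOpenInNewWindow and navNoHistory, which Microsoft documents as 1 and 2); if it does, their combination is the 3 passed to Navigate() earlier.

```perl
use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Const;

# Load every constant defined by the type library into a hash reference.
my $const = Win32::OLE::Const->Load('Microsoft Internet Controls')
    or die "Unable to load type library constants\n";

# List the navigation flags (assumed to start with "nav").
for my $name (sort grep { /^nav/ } keys %$const) {
    printf "%-24s = %s\n", $name, $const->{$name};
}

# Assumption: both names exist in the library on this machine.
my $flags = $const->{navOpenInNewWindow} | $const->{navNoHistory};
print "Flags for a new window with no history entry: $flags\n";
```

Named constants make the intent of a call like Navigate($url, $flags) easier to read than a bare number.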

Of course, that's way too much work, so you can use the "typeLib.pl" script in Listing 1. Just feed it a class name from the registry and it will look up the rest of the data. It will actually search for data using the argument string as a pattern, so you can search on things like "InternetExplorer" and get:

InternetExplorer:
  InternetExplorer.Application:
    CLSSID:   {0002DF01-0000-0000-C000-000000000046}
    TypeLib:  {EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}
    Library:  Win32::TieRegistry=HASH(0x1b4cb5c)
      1.1\ => Microsoft Internet Controls
  InternetExplorer.Application.1:
    CLSSID:   {0002DF01-0000-0000-C000-000000000046}
    TypeLib:  {EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}
    Library:  Win32::TieRegistry=HASH(0x1d10134)
      1.1\ => Microsoft Internet Controls

Note that there is a "generic" entry and a version-numbered entry. This is because the underlying model behind OLE provides a way to support multiple versions of OLE libraries on the same machine.

The bad news is that all of this still won't guarantee that you find the type library name. It is something of a black art. Sometimes you just have to break down and read the relevant documentation or go searching the Internet for an appropriate incantation.

A Contrived Example

Now for a contrived example using OLE to start Internet Explorer. For this, I'm going to introduce the HTTP::Daemon package, which is part of the libwww-perl distribution bundled with ActiveState Perl.

Actually, this isn't such a contrived example, at least for me. I'm really lazy and GUI code takes a while to write. So whenever I can, I use a command line interface, but sometimes I need something slightly more interactive. Web pages are a kind of GUI, and HTML is much simpler than most GUI toolkits—which is where HTTP::Daemon comes in.

The example program, asparagus.pl (see Listing 2), is an extremely simple server that shows the time as constructed by the Acme::Time::Asparagus module available via CPAN or the ActiveState PPM repository. It's just a bit of fluff, but it shows the basic mechanism I've used on a number of occasions. It's a relatively simple way to handle forms and/or different pages, providing a quick-and-dirty GUI without a lot of hassle.

The use of Win32::OLE to start Internet Explorer when the server is executed is just a small tweak to save some mouse clicks. Since the server stays running, the sleep statement is not needed. Usually, I add a form or link that can be used to shut down the server, making the whole thing kind of self-contained.
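A shutdown link of that kind might look like the following sketch. The /quit path, the page() helper, and the page text are hypothetical choices for illustration, and the Internet Explorer startup is left out so that only the server loop is shown.

```perl
use strict;
use warnings;
use HTTP::Daemon;
use HTTP::Headers;
use HTTP::Response;

my $port   = 9183;   # arbitrary port, as in Listing 2
my $daemon = HTTP::Daemon->new(LocalAddr => 'localhost', LocalPort => $port)
    or die "Unable to create daemon\n";

# Wrap an HTML fragment in a complete 200 response.
sub page {
    my ($body) = @_;
    return HTTP::Response->new(
        200, 'OK',
        HTTP::Headers->new(Content_Type => 'text/html'),
        "<html><body>$body</body></html>\n");
}

my $running = 1;
while ($running and my $conn = $daemon->accept) {
    while (my $rqst = $conn->get_request) {
        if ($rqst->method ne 'GET') {
            $conn->send_error(405);             # only GET is supported
        }
        elsif ($rqst->uri->path eq '/quit') {
            $conn->send_response(page('<p>Server stopped.</p>'));
            $running = 0;                       # leave the accept loop
            last;
        }
        else {
            $conn->send_response(
                page('<p><a href="/quit">Shut down the server</a></p>'));
        }
    }
    $conn->close;
}
```

Binding to localhost keeps the quick-and-dirty GUI, and its shutdown link, off the network.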

Other Things You Can Do With OLE

Starting Internet Explorer, while simple and easy to illustrate, is not exactly the best demonstration of the power of OLE, in the sense that the object model is shallow and not really standard. Other real-world examples are much more impressive. It is possible to drive Word, Excel, PowerPoint, and virtually any OLE-enabled application. I've created Excel documents, used Word to convert Word documents to HTML, and completely disassembled PowerPoint documents using this mechanism.

The object model for fully OLE-enabled functional applications is more robust. An application has multiple documents, which are objects in their own right. Each document then contains many hierarchical levels of objects, each of which may have its own properties, methods, and events. The internal structure of the document becomes easily accessible.
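The hierarchy described here can be sketched with Excel: an application object holds workbooks, a workbook holds worksheets, and a worksheet holds ranges and cells. The example assumes Excel is installed; the sheet name and the output path are hypothetical.

```perl
use strict;
use warnings;
use Win32::OLE;

# 'Quit' closes Excel when the proxy object is destroyed.
my $excel = Win32::OLE->new('Excel.Application', 'Quit')
    or die "Unable to create OLE object:\n  ", Win32::OLE->LastError, "\n";

# Walk down the hierarchy: application -> workbook -> worksheet -> cells.
my $book  = $excel->Workbooks->Add;
my $sheet = $book->Worksheets(1);

$sheet->{Name}                  = 'Demo';
$sheet->Range('A1')->{Value}    = 'Generated from Perl';
$sheet->Cells(2, 1)->{Value}    = scalar localtime;

# Hypothetical output path; change it to a directory that exists.
$book->SaveAs('C:\\temp\\demo.xls');
$book->Close;
```

Each level is just another Win32::OLE proxy, so the same property and method syntax applies all the way down.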

The result is that Perl, the perennial glue program, becomes that much more useful. Using OLE, you can harness an even greater variety of applications to ever more peculiar purposes.

TPJ



Listing 1

use     strict;
use     warnings;

use     Win32::TieRegistry;

die "usage:  typeLib <classname>\n"
    unless @ARGV;

# "Classes\" is Win32::TieRegistry's name for HKEY_CLASSES_ROOT.
my  $classes = $Registry->{"Classes\\"};

die "* Unable to get HKEY_CLASSES_ROOT from Registry\n"
    unless UNIVERSAL::isa($classes, 'Win32::TieRegistry');

for my $class (@ARGV) {
    print "$class:\n";

    # Each argument is treated as a pattern and tried against every key.
    for my $key (keys %$classes) {
        next unless $key =~ /$class/;

        # Default value of the CLSID subkey: the GUID of the class.
        my  $clsid = $classes->{$key}->{"CLSID\\\\"};

        $key =~ s/\\+$//;
        print "  $key:\n";

        unless ($clsid) {
            print "    * No class ID\n";
            next;
        }

        print "    CLSSID:   $clsid\n";

        # Default value of the class's TypeLib subkey: the library GUID.
        my  $typid = $Registry->{"Classes\\CLSID\\$clsid\\TypeLib\\\\"};

        unless ($typid) {
            print "    * No type library ID\n";
            next;
        }

        print "    TypeLib:  $typid\n";

        # The type library key itself; its subkeys are version numbers.
        my  $typlb = $Registry->{"Classes\\TypeLib\\$typid\\"};

        unless ($typlb) {
            print "    * No type library key\n";
            next;
        }

        print "    Library:  $typlb\n";

        # The default value of each version subkey is the library name.
        print "      $_ => $typlb->{$_}->{'\\'}\n" for keys %$typlb;
    }
}


Listing 2
use     strict;
use     warnings;

use     Acme::Time::Asparagus;
use     HTTP::Daemon;
use     HTTP::Headers;
use     HTTP::Response;
use     Win32::OLE;

my  $port = 9183;   # just pick one

# Generate the HTML page with the vegetable time:
sub timePage
{
    my  $conn = shift;
    my  $html = <<VEGETABLE;
<html>
<body>
  <h2>Vegetable Clock</h2>
  <p><em>At the tone the time will be:</em>&nbsp;
     <strong>@{[veggietime]}</strong></p>
</body>
</html>
VEGETABLE

    # Send a complete response: status line, headers, then the page.
    $conn->send_response(HTTP::Response->new(
        200, 'OK', HTTP::Headers->new(Content_Type => 'text/html'), $html));
}

# Main program. Start up a tiny, single-function HTTP server first, so
# the port is already listening when the browser asks for the page:
my  $daemon = new HTTP::Daemon(LocalPort => $port)
              or die "Unable to create daemon\n";

# Now start Internet Explorer ('Quit' closes it when the script ends):
my  $explorer = new Win32::OLE('InternetExplorer.Application', 'Quit')
                or die "Unable to create OLE object:\n  ",
                       Win32::OLE->LastError, "\n";

$explorer->Navigate("http://localhost:$port/", 3);

# Serve the time page for every GET request, one connection at a time:
while (my $conn = $daemon->accept) {
    while (my $rqst = $conn->get_request) {
        timePage($conn) if $rqst->method eq 'GET';
    }
    $conn->close;
    undef $conn;
}