

Script Junkie | REBOL Bots (Web Techniques, Sep 1999)

REBOL Bots

By Carl Sassenrath

Over the past two decades, my search for the perfect scripting language led me to work with companies such as HP, Amiga, and Apple. In that time, I investigated more than 50 different languages, from Ada to C, from Pascal to Lisp. I wanted a language that was very simple and readable with almost no syntax, yet very flexible with a wide degree of expressive freedom. It needed to allow a script to run on a great number of platforms without modification, have an extensive set of built-in data types that felt natural to humans, and smoothly support all of the standard network protocols, such as HTTP, FTP, POP, SMTP, NNTP, time, finger, whois, and more. And finally, so that I could use it everywhere, I wanted the entire package to be small (less than 200KB) with no installation hassle -- just copy it and run. More than anything else, I wanted a language that was friendly, usable, and highly productive.

In the mid 1980s, I stumbled across a branch of mathematics called denotational semantics, which provided great insights into the structure and meaning of language. From there I developed the basic principles of my design and merged in many of the best concepts from the languages I had encountered along the way.

The result of this quest became the Relative Expression-Based Object Language (REBOL, pronounced REB-el), which was released free of charge late last year. I call REBOL a messaging language, because it's intended to be used in the same way as English: for communications, not just algorithms. It works equally well for expressing data and code. This is its greatest strength: it's meant to offer a better approach to the exchange and interpretation of information among people, computers, and application software.

In this article, I will introduce you to the language with a few bots and agent scripts. REBOL is an ideal language for creating such scripts because of its built-in support for so many Internet protocols.

Diving In

One of the best ways to get a quick sense of REBOL is to dive into a few examples and get a feel for the syntax, structure, and benefits that it provides. Listing One is such an introductory example: a simple script that scans Web pages for specific words and emails a page to you if it finds any of them. It's a working example, although it's simplified for clarity.
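Listing One itself isn't reproduced here. As a point of comparison only, the same idea might look roughly like the following Python sketch. It is not REBOL and not the article's listing. The function names, the `smtp_host` default, and any addresses you pass in are hypothetical, and a real bot would also need error handling for pages that can't be fetched.

```python
import smtplib
import urllib.request
from email.message import EmailMessage
from typing import Optional


def find_words(text: str, words: list[str]) -> list[str]:
    """Return the watch words that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [w for w in words if w.lower() in lowered]


def build_alert(url: str, page: str, words: list[str],
                to_addr: str, from_addr: str) -> Optional[EmailMessage]:
    """Compose an email containing the page if any watch word appears, else None."""
    found = find_words(page, words)
    if not found:
        return None
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = f"{url} mentions: {', '.join(found)}"
    msg.set_content(page)
    return msg


def fetch(url: str) -> str:
    """Download a page as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scan(urls, words, to_addr, from_addr, smtp_host="localhost"):
    """Check each page and mail any that match through the given SMTP server."""
    with smtplib.SMTP(smtp_host) as smtp:
        for url in urls:
            alert = build_alert(url, fetch(url), words, to_addr, from_addr)
            if alert is not None:
                smtp.send_message(alert)
```

Keeping `build_alert` separate from the network calls makes the matching logic easy to check without touching a Web or mail server.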

On the surface, it's evident that REBOL has very little syntax. The language is designed to flow in "sentences" rather than being chopped up with parentheses or statement separators. Values are grouped in blocks enclosed in square brackets, regardless of whether they're considered code or data. Each value is of a particular data type; this example includes strings, a date, an email address, and a URL fragment (http://). REBOL also supports a couple dozen other built-in data types, including times, numbers, money, tags, filenames, tuples (such as IP addresses, RGB colors, or version numbers), and more. Notice, too, that most data is written in a style that's naturally readable by humans: dates are written as you'd expect, URLs are written as URLs, and money, time, tags, and other types follow the same approach.

The Code-Data Duality

REBOL is more sophisticated than it first appears. Although the script looks just like code, in fact, everything within the script is also data. Strings, dates, and email addresses are obviously data, but so are the words and the blocks of the script. This is an important aspect of REBOL, and it makes the language highly reflective. Reflectivity is a language's ability to deal with itself. REBOL is, in fact, its own metalanguage. Here's an example that illustrates this further. It's typed directly into the interpreter command line:

phrase: [print "hello"]
print first phrase

This defines the word phrase and prints the first value within it (the word print). You could also evaluate (execute) the block held by the phrase word:

do phrase

This displays the string "hello". You could even pull out the individual words of the block and evaluate them separately:

do first phrase "hello world!"

In this case, do evaluates the print word, which prints the string argument:

hello world!

To help understand the full potential of this approach, consider the line:

if now/time > 10:30 phrase

Here the phrase block is evaluated, and "hello" printed, only if the current time is later than 10:30. The key is that if is a function that takes two arguments: the condition and the block to evaluate when the condition is true.
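REBOL's evaluator isn't reproduced here. A toy Python model can still make the idea concrete. Blocks are plain lists, words are looked up only when evaluated, and each function consumes a fixed number of the values that follow it. Everything in this sketch is an illustrative assumption and not how REBOL is implemented. That includes `Word`, `Native`, `make_env`, and the prefix `greater?` used in place of REBOL's infix `>`.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Word:
    """A symbol inside a block, like REBOL's print or phrase."""
    name: str


@dataclass(frozen=True)
class Native:
    """A built-in function that takes a fixed number of arguments."""
    arity: int
    fn: Callable[..., Any]


def eval_next(block: list, pos: int, env: dict) -> tuple[Any, int]:
    """Evaluate one expression starting at block[pos]; return (value, next position)."""
    value = block[pos]
    pos += 1
    if isinstance(value, Word):
        value = env[value.name]
        if isinstance(value, Native):
            # Prefix call: gather exactly `arity` argument expressions.
            args = []
            for _ in range(value.arity):
                arg, pos = eval_next(block, pos, env)
                args.append(arg)
            return value.fn(*args), pos
    # Strings, numbers, and nested blocks (lists) evaluate to themselves.
    return value, pos


def do(block: list, env: dict) -> Any:
    """Evaluate every expression in a block; return the last result."""
    result, pos = None, 0
    while pos < len(block):
        result, pos = eval_next(block, pos, env)
    return result


def make_env(output: list) -> dict:
    """A tiny word table; print appends to `output` so results are easy to inspect."""
    env: dict = {}
    env["print"] = Native(1, output.append)
    env["first"] = Native(1, lambda blk: blk[0])
    env["do"] = Native(1, lambda blk: do(blk, env))
    env["greater?"] = Native(2, lambda a, b: a > b)
    env["if"] = Native(2, lambda cond, blk: do(blk, env) if cond else None)
    return env
```

In this model `env["phrase"] = [Word("print"), "hello"]` is ordinary data. `do([Word("print"), Word("first"), Word("phrase")], env)` records the word `print` itself. `do([Word("do"), Word("phrase")], env)` runs the same list as code. Because `if` has an arity of two, it simply takes the next two values: a condition and a block.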

In Listing One, you've probably also noticed another reflective quality of REBOL. At the beginning of the script there's a descriptive header that makes scripts self-identifying. This isn't just a string, but an actual block of REBOL code/data that provides a standard method for identifying the purpose and attributes of a script. Because it's written in REBOL, other scripts can easily access this header, which means you can treat a group of scripts as a collective database to build cross-references or a script index. This is how we build the script libraries at www.rebol.com/library.html, and the script that does it can be found there. In addition, the header allows scripts to be embedded in other kinds of environments, such as email messages, newsgroup postings, or even Web pages. The header of a typical script looks like Example 1.

All of the fields are optional (only REBOL [] itself is required), and there are many other standard fields that you can provide as well. You can also extend the header with your own fields to suit your requirements, such as a project identifier, department number, or site location. Notice the % that denotes the filename; it's required to distinguish filenames from ordinary words.

A header is followed by the REBOL script contents, which can be either data or code, depending on the purpose of the script. Data can include code, and code can include data. This fact, coupled with the tag datatype, lets REBOL efficiently generate HTML output, because the HTML can be written directly in REBOL.

Mail Displayer

Listing Two details a simple bot that scans a POP mailbox and builds an HTML page summary containing the email sender, date, and subject information. Notice that HTML tags are written within blocks directly as tags.

Something new that you may have noticed is the braced multiline string in the header. This format is used for strings that span more than one line. All of the text, including line terminators, becomes part of the string.

The emit function appends output to a string that holds the HTML result. The reduce used within emit is something special: given a block of code and data, it evaluates each expression in the block and returns the results as a new block of data.
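For readers who want to compare, here is a rough Python analogue of the summary page. It is not the REBOL code of Listing Two. A nested `emit` appends pieces to a growing result, and Python's f-strings and argument evaluation do the job that `reduce` does in REBOL. The function name and table layout are assumptions. Header text is HTML-escaped here, which a quick script might skip.

```python
import html
from email import message_from_string


def mail_summary_html(raw_messages: list[str]) -> str:
    """Build an HTML table of sender, date, and subject for each raw message."""
    out: list[str] = []

    def emit(*parts: str) -> None:
        # Plays the role of the article's emit: append pieces to the result.
        out.extend(parts)

    emit("<html><body><table border=1>")
    emit("<tr><th>From</th><th>Date</th><th>Subject</th></tr>")
    for raw in raw_messages:
        msg = message_from_string(raw)
        emit("<tr>")
        for field in ("From", "Date", "Subject"):
            # Escape header text so characters like < and & can't break the page.
            emit("<td>", html.escape(str(msg.get(field, ""))), "</td>")
        emit("</tr>")
    emit("</table></body></html>")
    return "".join(out)
```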

Listing Two reads the entire POP mailbox at one time. The messages are read from your server but are not deleted. This works well for small mailboxes, but for larger ones a loop that goes through the messages one at a time may be better:

mail: open pop://user:pass@mail.server.com
forall mail [print first mail]
close mail

The forall function is used to move through the mailbox one item at a time.
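A comparable one-message-at-a-time loop in Python might use the standard `poplib` module, as sketched below. This is an analogue and not REBOL. The host, user, and password are placeholders you would supply. As in the REBOL version, nothing is deleted: `retr` only downloads a message, and `quit` closes the session without issuing any `dele` commands.

```python
import poplib
from email import message_from_bytes


def iter_messages(pop):
    """Yield each message one at a time; nothing is deleted from the server."""
    count, _size = pop.stat()
    for number in range(1, count + 1):          # POP3 numbers messages from 1
        _resp, lines, _octets = pop.retr(number)
        yield message_from_bytes(b"\r\n".join(lines))


def print_subjects(host: str, user: str, password: str) -> None:
    """Connect, print every subject line, and disconnect."""
    pop = poplib.POP3(host)
    try:
        pop.user(user)
        pop.pass_(password)
        for msg in iter_messages(pop):
            print(msg.get("Subject", ""))
    finally:
        pop.quit()
```

Because `iter_messages` only needs `stat` and `retr`, it can be exercised with a stand-in object instead of a live server.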

Web-Sniffing Bot

The Web-sniffing bot downloads all files of a particular type from a Web page. Listing Three shows the script grabbing all the linked HTML from a single page of The Onion's archives. However, it could be modified quickly to download armloads of images, MP3s, software, and so on.

If you want to point it to another site, just change the first few setup lines. For instance, if you write:

site: http://astro.pas.rochester.edu/Jupiter-Comet/
start: "jupiter-comet.html"
suffix: ".gif"

you'll end up with a bunch of images of comets smashing into Jupiter. This code also serves as the basis of what could be a much smarter bot. For instance, Web galleries often put their graphics on separate pages, so you might want to first visit all the linked Web pages on a site, then collect the images from each page separately.

Note that the script downloads only files that are on the site. It ignores HREF links that start with http:// to avoid grabbing content from remote locations.
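The link-filtering step is the heart of this bot. A Python sketch of the same idea is shown below. It is not the REBOL of Listing Three. The names `LinkCollector`, `local_links`, and `download_all` are hypothetical. Links are resolved relative to the start page, and the sketch skips `https://` as well as `http://` links, which is a small assumption beyond the article's rule.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":                      # HTMLParser lower-cases tag names
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def local_links(page_html: str, suffix: str) -> list[str]:
    """Links ending in suffix, skipping absolute links to remote locations."""
    collector = LinkCollector()
    collector.feed(page_html)
    return [link for link in collector.links
            if link.lower().endswith(suffix.lower())
            and not link.lower().startswith(("http://", "https://"))]


def download_all(site: str, start: str, suffix: str, dest: str = ".") -> None:
    """Fetch the start page, then save every matching local link into dest."""
    base = urljoin(site, start)
    with urllib.request.urlopen(base) as resp:
        page = resp.read().decode("utf-8", errors="replace")
    for link in local_links(page, suffix):
        target = os.path.join(dest, os.path.basename(link))
        with urllib.request.urlopen(urljoin(base, link)) as resp, open(target, "wb") as out:
            out.write(resp.read())
```

With the setup values from the Jupiter example, the call would be `download_all("http://astro.pas.rochester.edu/Jupiter-Comet/", "jupiter-comet.html", ".gif")`. Whether that site still serves those files is not something this sketch can promise.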

You can see that REBOL has great expressive power that lets you build this basic bot in less than half a page of code. You can then spend the rest of your development time making the bot as smart as it can be. It's fun to write a bot in REBOL because you don't have to fight the language to make the bot understand how to do basic Internet navigation. REBOL gives your bots the built-in ability to talk to many TCP/IP protocols.

Web-Server Agent

Here's an interesting application: your own private Web server, but with secret URL commands. With it you can add commands to the URL that are executed directly by the server code, not by CGI.

To show how to build it, I begin with a simple Web-server script. Nothing special is required of your system: the script will work on most machines, even clients (that is, it doesn't need to be on a server). Listing Four contains the server code.

The script sets up a TCP/IP listen socket on port 80, waits there for a connection carrying an HTTP GET request, and then returns the requested file. If another Web server is already running on your machine, use a different port number, such as 8080. To test the server, use your Web browser; just remember to specify the port number after the server name (for example, http://your.server:8080/file.html).
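The same listen, read-request, return-file cycle can be sketched in Python with a plain socket, as shown below. This is an analogue of Listing Four and not a copy of it. It defaults to port 8080, maps `/` to a hypothetical `index.html`, and refuses paths that escape the document root. Listing Four may not include that check, but any file server reachable from a network should have it.

```python
import mimetypes
import os
import socket


def handle_request(request: bytes, root: str) -> bytes:
    """Turn one raw HTTP request into a complete HTTP/1.0 response."""
    request_line = request.decode("latin-1").split("\r\n", 1)[0]
    parts = request_line.split()
    if len(parts) < 2 or parts[0] != "GET":
        return b"HTTP/1.0 400 Bad Request\r\n\r\n"
    name = parts[1].split("?", 1)[0].lstrip("/") or "index.html"
    root = os.path.abspath(root)
    path = os.path.abspath(os.path.join(root, name))
    # Refuse anything that resolves outside the document root, e.g. "/../secret".
    if os.path.commonpath([root, path]) != root:
        return b"HTTP/1.0 403 Forbidden\r\n\r\n"
    if not os.path.isfile(path):
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    with open(path, "rb") as f:
        body = f.read()
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    header = (f"HTTP/1.0 200 OK\r\nContent-Type: {mime}\r\n"
              f"Content-Length: {len(body)}\r\n\r\n")
    return header.encode("latin-1") + body


def serve(port: int = 8080, root: str = ".") -> None:
    """Answer one request per connection, forever."""
    with socket.create_server(("", port)) as server:
        while True:
            conn, _addr = server.accept()
            with conn:
                conn.sendall(handle_request(conn.recv(4096), root))
```

A single `recv` is enough for a short GET request from a browser. A sturdier server would keep reading until it sees the blank line that ends the headers.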

To make the server handle special commands, you'll need to modify the parsing of the filename. This is where you add your own special commands. Example 2 uses words separated by + for commands, but any reasonable URL syntax will work.

Of course, keep in mind that you're creating a nonstandard gateway interface that only your server will respond to. Listing Five shows the change in the Web-server code to make this enhancement.
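Example 2 isn't shown here, so the exact URL convention it uses is unknown. The Python sketch below assumes a hypothetical one: a path beginning with `cmd+` is treated as a list of command words, such as `/cmd+time+files`. The command table is illustrative only. A server would call `run_url_commands` first and fall back to ordinary file lookup when it returns `None`.

```python
import os
from datetime import datetime
from typing import Callable, Optional

# Hypothetical command table: the names and behaviors are illustrations only.
COMMANDS: dict[str, Callable[[], str]] = {
    "time": lambda: datetime.now().strftime("%H:%M:%S"),
    "files": lambda: " ".join(sorted(os.listdir("."))),
}


def run_url_commands(path: str, commands=COMMANDS) -> Optional[str]:
    """Run commands for a path such as '/cmd+time+files'; return None for ordinary paths."""
    name = path.split("?", 1)[0].lstrip("/")
    if not name.startswith("cmd+"):
        return None
    results = []
    for word in name.split("+")[1:]:
        if not word:
            continue                      # tolerate a trailing or doubled +
        action = commands.get(word)
        results.append(action() if action else f"unknown command: {word}")
    return "\n".join(results)
```

Only words in the table can run, so a mistyped or hostile URL produces an error line rather than executing anything.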

Remote Email Agent

Have you ever arrived home after work only to find you forgot to do something on your computer at the office? Or, perhaps it's the other way around. You get to work and forgot your files at home. Perhaps you need to fetch a proposal you've been working on, or need to check your email for a critical letter from your editor, or maybe you forgot to remove a confidential file from a directory. Here's a script that will help: a simple email agent that can invoke a number of commands via email on a remote computer system that has been set up with the REBOL script in advance.

Here's how it works. First, start the REBOL email monitoring script for the Remote Email Agent (Listing Six) to check your mailbox. When you need to access the system remotely, you send it an email message with the actions as part of the message. In the email subject line of the message you place a unique word to indicate that this is a special message that is to be interpreted. (Note: This should not be your POP or login password, but can be any other string, even a phrase of words that looks like a valid subject line to help disguise it.) This word, in conjunction with your originating email address, indicates that the message contains commands. You can specify a list of originating email addresses from which remote access is allowed.

The content of the email contains a simple REBOL dialect that determines the actions to process. A dialect is a sublanguage written in REBOL. It uses the same lexical form as REBOL (allowing strings, dates, times, and so on) but has a different grammar and vocabulary. This agent's dialect lets you specify as many actions as you need within a single message.

The first line in Example 3 will search all your new email and forward messages that came from the specified address. The second will forward all email that contains the word "urgent." The next three will send you the file listing for the docs directory, send a file from that directory, and delete a file in that directory. Notice that all file names are written in REBOL format (file and directory names begin with a percent sign). The last line will force the script to quit. You need to include it only if you want to prevent any further actions.

The Remote Email Agent script checks your email every 10 minutes over a period of 12 hours. It doesn't delete messages from the server, but it does keep track of where it left off so that it never executes the same remote-access commands twice.
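The schedule and the "where it left off" bookkeeping can be sketched in Python as follows. This is an analogue and not Listing Six. `check` is a hypothetical callback that handles any new messages and returns the updated count. The defaults give 12 × 60 / 10 = 72 checks. `sleep` is a parameter so the loop can be tested without waiting.

```python
import time
from typing import Callable, Sequence


def new_messages(mailbox: Sequence, seen: int) -> list:
    """Messages that arrived since the last check (the server keeps them all)."""
    return list(mailbox[seen:])


def poll_mailbox(check: Callable[[int], int], interval: int = 10 * 60,
                 duration: int = 12 * 60 * 60, sleep=time.sleep) -> int:
    """Call check(seen) every `interval` seconds for `duration` seconds.

    `check` gets the number of messages already handled and returns the new
    number, so a message left on the server is never acted on twice.
    """
    seen = 0
    for _ in range(duration // interval):   # 72 checks with the defaults
        seen = check(seen)
        sleep(interval)
    return seen
```

Counting handled messages works because the agent never deletes mail. If something else removed messages from the server, a record of message IDs would be a safer marker.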

If the email originates from a known address, and the subject line contains your unique "password," then the contents of the email are loaded. This translates the entire email into a REBOL block. In case there's an error in the message and it can't be translated, an error email is returned; otherwise, the operations are passed to the do-ops function.
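The two-part check described above (a known sender plus the secret word in the subject) might look like this in Python. Both the addresses and the secret word below are hypothetical placeholders. This sketch is not the REBOL test used in Listing Six.

```python
from email.message import Message
from email.utils import parseaddr

# Hypothetical settings: use your own addresses and a secret subject word.
ALLOWED_SENDERS = {"me@home.example.com", "me@work.example.com"}
SECRET_WORD = "blue-giraffe"


def is_command_message(msg: Message, allowed=ALLOWED_SENDERS, secret=SECRET_WORD) -> bool:
    """True only if the sender is on the allowed list AND the subject holds the secret."""
    _name, sender = parseaddr(msg.get("From", ""))
    allowed_lower = {addr.lower() for addr in allowed}
    return sender.lower() in allowed_lower and secret in msg.get("Subject", "")
```

A From address is easy to forge, so the secret word is what actually guards the agent. That is why it shouldn't be reused from any real password.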

The do-ops function steps through pairs of operations and arguments. This works because every operation except quit takes exactly one argument. If you add operations that take varying numbers of arguments, you can replace the foreach loop with a forall loop.

The operation is then found in the action block using the select function. If found, its associated block is evaluated. If it's not found, an error message is sent back to the sender. The content of the action block is pretty obvious. In the first case (the find) the choice is made between an argument that's a string and one that's an email address. Note that if it's neither, the find action is simply ignored.
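Python has no built-in equivalent of REBOL's `load`. The sketch below therefore includes a tiny tokenizer that recognizes quoted strings, `%file` values, email addresses, and plain words, and raises `ValueError` on an unterminated string so the caller can mail an error back. A dictionary lookup then stands in for `select`. Example 3 isn't shown here, so only `find` and `quit` come from the text. The operation names `dir`, `send`, and `delete` are hypothetical, and their handlers are stubs that describe an action rather than performing it.

```python
import re

# A string in quotes, or any other run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def load_ops(text: str) -> list[tuple[str, str]]:
    """Turn a message body into (kind, value) pairs; a rough stand-in for load."""
    values = []
    for tok in TOKEN.findall(text):
        if tok.startswith('"'):
            if len(tok) < 2 or not tok.endswith('"'):
                raise ValueError(f"unterminated string: {tok}")
            values.append(("string", tok[1:-1]))
        elif tok.startswith("%"):
            values.append(("file", tok[1:]))    # REBOL-style %file
        elif "@" in tok:
            values.append(("email", tok))
        else:
            values.append(("word", tok))
    return values


def find(arg: tuple[str, str]):
    """Stub: describe which mail a real agent would forward."""
    kind, value = arg
    if kind == "email":
        return f"forward mail from {value}"
    if kind == "string":
        return f"forward mail containing {value!r}"
    return None  # neither a string nor an address: ignored


# Hypothetical operation names; only find and quit are named in the text.
ACTIONS = {
    "find": find,
    "dir": lambda arg: f"list {arg[1]}",
    "send": lambda arg: f"send {arg[1]}",
    "delete": lambda arg: f"delete {arg[1]}",
}


def do_ops(values: list[tuple[str, str]], actions=ACTIONS) -> list[str]:
    """Step through operation/argument pairs; dict lookup plays the role of select."""
    replies = []
    i = 0
    while i < len(values):
        kind, op = values[i]
        if kind == "word" and op == "quit":
            break                           # the one operation with no argument
        if i + 1 >= len(values):
            replies.append(f"missing argument for {op}")
            break
        handler = actions.get(op) if kind == "word" else None
        if handler is None:
            replies.append(f"unknown operation: {op}")
        else:
            reply = handler(values[i + 1])
            if reply is not None:
                replies.append(reply)
        i += 2
    return replies
```

The collected replies are what the agent would mail back to the sender. An unknown operation produces an error line, matching the behavior the article describes.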

That's all there is to it. The operations sent to the script via email represent a dialect of REBOL. They're written in a consistent source format, which is not evaluated directly by the interpreter, but rather by the script itself. Because this dialect is so simple, it is parsed using a select function. If the dialect were more complicated, the parse function could be used on the block, with a great deal of flexibility.

You can also use REBOL to issue the email commands, as in Example 4.
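Example 4 does this in REBOL. For comparison, composing the same kind of command message in Python might look like the following. The addresses, secret word, and SMTP host are hypothetical. The command lines follow whatever dialect your agent understands.

```python
import smtplib
from email.message import EmailMessage


def command_email(to_addr: str, from_addr: str, secret: str,
                  commands: list[str]) -> EmailMessage:
    """Build a message whose subject carries the secret and whose body holds the commands."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = secret
    msg.set_content("\n".join(commands))
    return msg


def send(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Hand the message to an SMTP server for delivery."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```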

Final Note

You've seen that the bots and agents in the above examples are small and readable. They can all be extended without much effort, and I hope that the language seems intuitive enough for you to dive in and do so.

Of course, the design of REBOL extends beyond the types of examples shown here. It strives to offer a new and more natural approach to communications, not only between you and a computer, but among computer applications. It takes a big step toward unifying and simplifying expressions by using language techniques that are more like our natural human languages. In the end, it is meant to offer you greater productivity through greater expressive economy. And this, I hope, will give you an edge against the rising tide of complexity that has swamped our modern computer systems.

(Get the source code for this article here.)


Carl is the creator of the REBOL language. With more than 20 years of experience, his work includes the design and implementation of the Amiga OS, and management and engineering positions at Apple, Commodore Amiga, and HP. He can be reached at carl@rebol.com.

