Structured Programming

SEP88: STRUCTURED PROGRAMMING

Introduced in 1982, Modula-2 is a newcomer that has only lately begun to gather a following. The evidence of its rising importance is the recent introduction of several new Modula-2 compilers for the PC; there's a comparative review elsewhere in this issue. This review discusses specific implementations. Here we'll discuss the language as a whole.

At first glance, Modula-2 looks a lot like Pascal. Don't let appearances deceive you. Both languages were sired by the venerable Niklaus Wirth, so they share genetic similarities. While Modula-2 is a new language, every bit as powerful as C - perhaps it's even more so in some ways.

To describe Modula-2 as the successor to C would be tantamount to picking a fistfight with most readers of this magazine. Certainly it's like cussing in church. So let me hasten to state that I'm not advocating Modula-2 or predicting the demise of C, which still has plenty of good lines of code left in it. Personally, I like C and I've written a lot of stuff in it. But I do believe that Modula-2 is an alternative to C and that it is a language well worth learning and using.

You might say that Modula-2 combines Pascal, without its defects, and C in a more readable format, plus extensions.Wirth developed Modula-2 as a language for writing operating systems, utilities, and the applications that run under them. Thus, unlike Pascal, which was designed as a teaching language, Modula-2 rests solidly on a foundation of real-world software requirements.

Because Modula-2 is an evolutionary outgrowth of Pascal and an alternative to C, it's instructive to compare it with both, and that's what we'll do. This isn't a comprehensive treatment, but it covers most of the main differences.

First, Modula-2's similarities to Pascal: The definition of data structures, user-defined types, constants, and variables in Modula-2 is virtually the same as in Pascal, with some extensions mentioned later in this article. The assignment statement (a := b) is the same, as are the use of semicolons and the ability to pass parameters either by value or by reference. The latter two are very close to C conventions as well.

Semantic Differences

On a superficial level, one of the first things you notice about Modula-2 is that it's case-sensitive like C and unlike Pascal. Whether or not that's an Improvement is debatable; doesn't make much sense to me, but that's how it is. All Modula-2 keywords are uppercase. By stylistic convention, user-assigned identifiers are lowercase, but can include optional capitals. Case sensitivity means that, for example, readkey, readKey, and ReadKey are three different identifiers.

Modula-2 offers the same data types as Pascal, plus two new ones: CiRDINAL and BITSET. A CARDINAL is an unsigned integer and a BITSET is a machine word whose bits are individually manipulated with unique operators. These are C-type additions that have no counterparts in standard Pascal.

Commercial Pascal compilers implement the nonstandard string type STRING [n], which is usually an ARRAY [1.. n) OF CH4R with a length byte in the 0th element. Instead Modula-2 implements this string type as an A1tRAY [0. .n] OF CHAR with a null terminator, which is exactly the same string type that C has.

Pointer declarations are more intuitive in Modula-2 than in either Pascal or C. Here is an example:

VAR intPtr : POINTER TO INTEGER;

Modula-2 lets you declare the absolute address of a variable using the format (in the PC)

VAR absVar [segment:offset] : <type>

This format is similar to Turbo Pascal, but is generally not available in other Pascals or in C. This is extremely useful in system programming - for example, when you need to refer to fixed-location system variables such as the ROM BIOS data area.

Both C and Pascal are inconsistent with regard to the application of block statement delimiters (BEGIN/END in Pascal and paired curly braces in C). Constructs such as IF, WHILE, and FOR can control both single statements and statement blocks; for a single statement, delimiters are not used; otherwise, they are. Thus in Pascal, we have

If a = bthen
    c := d;

but

If a > b then
Begin
     c := a;
     a := b;
     b := c
End;

C syntax exactly parallels these. Not so in Modula-2: all such constructs are blocks by definition. Thus, you do not need to use BEGIN since it's implicit in IF, WHILE, or FOR. The construct always concludes with END, even if it only controls a single statement. Thus, in Modula=2 the cases that are analogous to the ones above are as follows:

IF a = b THEN
     c := d
END;

and

IF a > b THEN
     c := a;
     a := b;
     b := c
END;

This consistency makes it easier for beginners, and saves everyone the trouble of having to run down unbalanced block delimiters.

The Pascal loop constructs carry into Modula-2: REPEAT...UNTIL, FOR, and WHILE. The latter two have direct counterparts in C, and REPEAT...UNTIL is similar to (but the opposite terminating condition of) C's do... while. Additionally, Modula-2 offers an open-loop structure using the LOOP and EXIT keywords. LOOP begins an iterative block terminated by END; EXIT transfers control beyond END. This example shifts a string to uppercase:

i := 0;
LOOP
      IF strng [i] = CHAR (0) THEN
           EXIT
      ELSE
             strng [i] : = toupper (strng [i])
      END; (* of IF *)
      INC (i)
END; (* of LOOP *)

Note that EXIT skips over intervening statements, such as the incrementing of i.

Modula-2's CASE statement is better than Pascal's because BEGIN/END are not needed to delimit a multi-statement selection, and because there is an ELSE condition of last resort. The ELSE condition is usually provided by commercial Pascal compilers, but this situation is outside the standard one. To eliminate BEGIN/ END, Modula-2 employs an OR-bar in lieu of a semicolon (or a break statement in C) to indicate the end of a selection. Here is an example:

     CASE x OF
               1:        A;
         B  |
   2:           B  |
   3.. 7:      C
      ELSE
        IllegalCase (x)
      END; (* of CASE *)

The Modula-2 CASE statement is still not as flexible as C's switch(), though. CASE doesn't provide for flow from one case selection to another. In C, you could write this as

     switch (x) {
           case 1:   A();
          case2:   B();
         break;
          ...
} /* end of switch() */

Here, when x is 2, the program does B only; when x is 1, it does A and then B, flowing from case 1 to case 2. In effect, the label for case 2 is in the middle of the sequence for case 1, and that's legal in C. It's not in Modula-2.

In Pascal, subprogranis can be functions or procedures. A procedure has some effect external to itself, whereas a function returns a value assignable to a variable or that is eligible for output. C has only functions, although they might be procedural, value-returning, or both; a Pascal function can also do both. Modula-2 brings symmetry to the party by providing only procedures. The nature of a Modula-2 procedure is identified by its heading, just as it is in C, but the declaration is more Pascal-like. The heading

     PROCEDURE hypSin (theta : REAL): REAL;

defines a function that returns the hyperbolic sine of theta as a REAL. Conversely, the heading

     PROCEDURE upShift (VAR strng: ARRAY OF CHAR);

declares a procedure that presumably shifts its parametric string to upper-case. The Modula-2 calls are just as they would be in the corresponding C or Pascal incarnations:

hsim := hypSin (angle);
upShift (aString);

Like a Pascal or C subprogram, a functional procedure in Modula-2 can both return a value and have an effect external to itself.

But Modula-2 adds a couple of wrinkles that will take programmers in both camps by surprise. The first is a RETURN statement. Pascal doesn't have one. C does, and Modula-2's is identical to it. An unadorned RETURN anywhere within the procedure body forces a return without value, which is legal inside any purely procedural subprogram. You can use the statement

RETURN x;

in a functional procedure to send a value back to the caller; x can be a local variable or an expression.

The second surprise in Modula-2 is that the END statement that closes a subprogram must include the procedure's name. Thus, the outiine of hypSin in Pascal might be

FUNCTION hypSin (theta : REAL) :
REAL;
VAR result : REAL;
BEGIN
      (* Compute result, then... *)
      hypSin := result
END;

In Modula-2, the same outline is written as

PROCEDURE hypSin (theta : REAL) :REAL;
VAR result : REAL;
BEGIN
   (* Compute result, then...
   RETURN result;
END hypSin;

If you check back to the declaration of upShift and you'll discover that one of the long-discussed defects of Pascal has been removed from Modula-2; the parametric string has no stated subscript range. Because of its strict type-checking, Pascal has never been very smart about letting subroutines accept arrays of varying sizes. Microsoft Pascal has a thing called a superarray and Digital Research's Pascal MT+ honors conformant arrays, but these arrays are implementation-specific stabs at getting around standard Pascal's hidebound stupidity when it comes to array-handling in subprograms. In general, if you want to process a three-element array of some type in Pascal, you need a subroutine that is different from one that you need to process a four-element array of the same type in an identical manner. C is a little smarter: you can pass a second argument indicating the number of elements.

Modula-2 is a lot brighter in this regard than both of the other languages. Its procedures accept a parametric array of any size. The only restrictions are that the array must be one-dimensional and of the declared data type. Within the body of the program, the standard function HIGH (at-rayname) yields the upper-bounding subscript of the passed array. Thus, you might write upShift as

PROCEDURE upShift
   (VAR strng : ARRAY OF CHAR);
VAR i : CARDINAL;
BEGIN
  FOR i := 0 TO HIGH (strng) DO
    strng [i] := toupper (strng [i]);
  END (* of FOR *)
END upShift;

You can do similar operations involving HIGH on arrays of any other data type, including user-defined types such as arrays of structures. Modula-2 makes it easier to deal with arrays in subroutines.

I/O Handling

Despite the syntactic similarities, Modula-2 is more like C than Pascal. The 40-word core language of Modula-2 deals chiefly with declarations, definitions, expressions, and import/export lists (discussed later). In other words, the defined language is primarily concerned with internal matters and does not address I/O. For I/O, it relies on external libraries. This reliance is C-like and very different from Pascal.

Pascal comes with built-in I/O procedures. The main ones are WRITE (IN) and READ(LN). These procedures are remarkably supple routines, capable of communicating with the console, disk files, and auxiliary devices such as the printer and those attached to the serial port. The C standard library contains counterparts, which are the printf() and scanf() families.

Architecturally, there are two fundamental differences between Modula-2's WRITE and C's printf(). First, WRITE is built into Pascal, whereas printf() comes from an external C library. Second, WRITE encompasses virtually all output, whereas the version of printf() (fprintf() sprintf(), and so on) that the C programmer selects depends on the output destination.

Yet, they share one similarity: in both cases, they must evaluate the data types of a variable number of parameters, and must act on those types as appropriate. In one instance, you might tell the routines to output a string, then an integer, then another string, then a REAL (double to you C guys) to a file. In the next instance, the types and order might be entirely different, as well as the destination.

This flexibility is great from the programmer's viewpoint because it limits the number of output alternatives. Just pick the call you need, set up the parameters, and away you go.

But it has penalties in terms of .EXE program size and efficiency. In both Pascal and C, the output functions must be able to pick apart the parameters and dispatch subtasks to handle them. This additional work all adds up to overhead.

Modula-2 takes a radically different approach. Each I/O routine is bound to a specific data type. Thus, there are different library I/O routines for integers, reals, cardinals, strings, and so on. There's even a separate routine (WriteLn) to advance to the next output line.

          This program executed 10 times
            <CR>

In C, you might write the statement

printf
  ('This program executed %d times\n', ntimes);

The corresponding statement in Pascal is

     WRITELN ('This program executed', ntimes, 'times');

Modula-2 requires the following:

WriteString ('This program executed');
WriteInt (ntimes);
WriteString ('times');
WriteLn;

This hardly seems an improvement at first glance. One source line in either C or Pascal becomes four in Modula-2. But consider the implications. No Modula-2 I/O routine needs to interpret its parameter list; it merely acts on a specific, predictable data type.

The net result is a reduction in executable complexity and, thus, overhead. Modula-2 trades more verbose source language for more efficient programs. The distinction is subtle and hard to accept at first, but it pays off in the long run.

Modula-2 also eliminates one of Pascal's bad ideas: the typed file. According to Pascal doctrine, any given disk file consists entirely of one data type: byte, real, user-defined, or whatever. It doesn't work this way in practice, though. Many software systems require a variable-length file header consisting of several different control structures, followed by a series of fixed-length data records. You can convince Pascal to work with such files (see DDJ, May 1988, page 92), but the solution is ugly. That's one reason why C is preferred over Pascal for many applications: C doesn't require typed files.

Neither does Modula-2. Like C, it provides the generic typed object File, and it doesn't care what you read or write once you open the file. Modula-2 implementations generally furnish separate routines for text and binary file I/O, including data structures, random access, and file-pointer positioning. Consequenfly, Modula-2 is as capable as C at manipulating files.

Modularity

The name Modula -2 comes from the language's emphasis on program modules at both the source and compiled levels. Pascal borrows this concept when it furnishes units consisting of interface files and implementation files. A rough analogy also exists in C, when an .H file contains function prototypes and other declarations involving a library. But neither Pascal nor C carries the modular idea as far as Modula-2 does.

Modula has three kinds of modules:

Definition - which is a list of constants, types, variables, and procedures exported by an associated library. This module is the library's interface, and describes what is available from the library.
Implementation - which is code that implements the elements shown in the definition module.
Program - which is the highest-level portion of a program, containing the main body and supporting elements.

Here is an example that illustrates how these pieces fit together. Let's say you've developed a set of general-purpose routines for managing the user interface. You can call these routines from any program, so you put them into a module named USERINTMOD. The compiler, finding that USERINT.MOD begins with the keyword phrase IMPLEMENTATION MODULE, arranges the result into library format.

Because you must describe what the USERINT library contains, you write a companion file called USERINTDEF. This file opens with the identifier DEFINITION MODULE and consists of three main parts. The first part is an optional EXPORT QUALIFIED list, which gives the names of all identifiers from the USERINT library that are externally visible. This list is the same as a list of PUBLIC statements in assembly language. Because the export list just shows names, it's necessary to explain what they are, which leads to the remaining two parts of the definition module. One part declares the public constants, types, and variables in the usual fashion. The other part lists the headings of exported procedures.

In most Modula-2 implementations, you must also compile the definition module. The output is a symbol file -- usually with a name such as USERINTSYM -- that contains the identifier names and associated binary-encoded control information. The source listing serves as a reference for programmers who want to use the library.

Now, the USERINT library is available to other programs. Suppose you write an accounting package called BOOKKEEP, which contains the main body of the application and some local procedures and variables. The source program is called BOOKKEEP.MOD, and begins with the keyword MODULE.

Modula-2 requires that you notify it of the libraries and their associated identifiers that you intend to use. Consequently, the heading of BOOKKEEP.MOD begins with one or more import lists. In the case of USERINT, you write

FROM USERINT IMPORT <comma-delimited list of identifiers>;

if you want to use some of the symbols. Alternatively if you want access to everything from USERINT, you can alternatively write

IMPORT USERINT;

When the compiler encounters an import list, it opens the named file, which is the compiled definition module in this case, and looks up the imported identifiers, incorporating them into its symbol tables. In some implementations that have proprietary linkers, the compiler also builds a list of libraries that it will search to resolve external references at link time.

Although this profusion of modules seems complex, it offers a number of advantages. One is that a program can use imported identifiers as though they were its own. As an example, if USERINT exports a type called PopWindow, the importing program can declare variables of type PopWindow. Similarly, the user program can simply refer to variables and call routines exported from USERINT without further qualification.

Another advantage of this approach is implementation hiding. The definition module shows what is externally visible in a library, without divulging how the library actually does its thing. This is important when distributing commercial libraries for sale, and also in group projects. If you decide to market your USERINT library, I can buy the object form of the implementation, plus the source for the definition. After listing the definition file as a reference, I compile it into a form understandable to my compiler. Then I simply use the library, without having detailed knowledge of its inner workings.

An offshoot of implementation hiding is a unique feature of Modula-2 called an opaque type. This type is a data type defined in the implementation but not in the definition module, which simply exports it by name without elaboration. An opaque type is usually a record type whose format and fields are unknown to the user program. The program can declare variables of the opaque type and pass them around, but the program can't extract from or write information into the fields. For that, the program must rely on library routines. An opaque type allows library implementors to alter the data structure without affecting programs that use it, and also to prevent users from tinkering with field contents that might cause the library routines to go insane.

Standard Modules

Like C with its .H files describing the standard library, Modula-2 is surrounded by a number of standard modules. Most of these modules are concerned with I/O, but other modules deal with dynamic allocation, interprocess communication, transcendental functions, and so on. In the de facto standard work Programming In Modula-2, Niklaus Wirth specifies a core set of standard modules, including their names as well as the names and purposes of the procedures they incorporate.

Vendors of Modula-2 compilers usually ship additional libraries to enhance the functionality of their products. These might do such things as embedding hard code for an 80x87 coprocessor, providing timing routines, windowing, graphics support, mouse interface, and the like. The more libraries and procedures the product provides, the more immediately usable the package is.

It's not uncommon for two libraries to contain duplicate procedure names that do different things. For example, the standard modules InOut and FileSystem both furnish a procedure called WriteChar. Of course, if you use only one of these modules, it doesn't matter; the conflict only exists when you import WriteChar from both modules. In that case, you have to tell the compiler which WriteChar you mean. You do this by prefixing the source module name, using a period as the separator. Here are the code lines that refer to both versions of WriteChar:

InOut.WriteChar (ch); (* to display *)
FileSystem.WriteChar (fyle, ch);
         (* to file *)

Coroutines

Neither Pascal nor C supports concurrency direcfly through elements of the language, but Modula-2 does. In Modula-2 terminology, concurrent processes are called coroutines.

Normal routines observe a superior/subordinate relationship: B is called by A, which is called by the main program. Therefore, main is superior to A, and A is superior to B. B can never call A because B is subordinate toA.

Coroutines, on the other hand, are equals. If C and D are coroutines, each can transfer control to the other. C might run for a while and then give control to D, which can later switch back to C, and so on. The coroutines thus achieve concurrency by sharing the machine on a time-multiplexed basis.

Extending this idea a little further, you can see that C might be a scheduler task that dispatches a number of other coroutines (D, E, F, and so forth). When D gains control, it might run through one iteration of a loop and return the machine to C, which then dispatches E. This process continues until it comes full circle and D again gets control.

Each coroutine owns its own workspace where it maintains local variables, a process stack, and a state table. When a coroutine relinquishes control, it saves its context in the state table, and then sets a global pointer to indicate where it left off. The local stack and variables remain intact. When the coroutine is again activated, it reloads its registers from the state table and resumes execution. Coroutines can communicate among themselves using global variables.

The program registers a coroutine by calling the Modula-2 procedure NEWPROCESS. This call initializes the workspace and the global pointer associated with the coroutine. The transfer of control from C to D is simple from within source code:

TRANSFER (taskC, taskD);

In other words, once the coroutines are registered via NEWPROCESS, everything is done with pointers. TRANSFER saves C's current IP in the pointer called taskC, triggers the context switch, and then sets the IP register to the point of resumtion for taskD. D then runs until the next TRANSFER is encountered.

Coroutines are especially important in embedded controllers and operating systems, but they also have uses in end-user applications. An example is a heavily CPU-bound computation program. You can place the long-running calculation in the background by making it concurrent with a keyboard processor. Periodically, the computation gives control to its coroutine so that a keystroke, if waiting, can be processed; the handler then transfers control back to the calculation routine.

The related procedure, IOTRANSFER, also part of standard Modula-2, establishes a coroutine relationship between a process and an interrupt-service routine. IOTRANSFER says, in effect, "when this interrupt occurs, simulate a TRANSFER to interrupt the running process and switch to the ISR." Thus, you can rather easily achieve some level of multitasking with Modula-2 on a DOS machine by hooking a scheduler to timer interrupt 1Ch.

Summing Up

This article is by no means a comprehensive description of Modula-2, but it covers the main points of the language. If you want to find out more, the definitive source is Wirth's Programming In Modula-2 (Springer-Verlag), which is now in its third edition. For learning the language and also as a reference, an excellent text is K.N. King's Modula-2: A Complete Guide (Heath, 1988).

So that's what Modula-2 is, and why it's a worthy alternative to both Pascal and C.

Design

Structured Programming

Semantic Differences

I/O Handling

Modularity

Standard Modules

Coroutines

Summing Up

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Design Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Design

Structured Programming

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Design Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content