Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Parallel

Porting from Workstations to PCs


SEP93: Porting from Workstations to PCs

Barr is a research scientist at Arris Pharmaceutical, 385 Oyster Point Blvd., South San Francisco, CA 94080. He can be reached at [email protected].


Microsoft's Fortran Powerstation 1.0 is a 32-bit Fortran compiler with a Windows development environment that aims at moving compute- and data-intensive Fortran applications from high-performance workstations to low-cost PCs. To this end, Powerstation employs a single memory model and comes with an integrated DOS extender that lets you develop and run number-crunching programs once confined to UNIX, VAX, or supercomputing platforms. In this article, I'll describe my experiences in porting a data-intensive UNIX-based simulation program to the PC platform. But first, a quick overview of the Powerstation environment is in order.

Powerstation Overview

Powerstation is a development environment in transition. On one hand, it generates 32-bit code targeted at the 80386/486, but on the other, it uses 16-bit Windows as the development environment. Nonetheless, memory management is provided by the DOS extender rather than by Windows. Programs are developed in the Windows-hosted development environment, but when executed are restricted to a DOS window within Windows or DOS itself.

Powerstation adheres to the Fortran-77 language standard, but implements some Fortran 90 elements as extensions. Nonportable extensions that address deficiencies in Fortran I/O have also been added. To facilitate porting from other platforms, VAX and IBM extensions are also accepted. Also included is a complete 32-bit DOS-based graphics library.

The Windows development environment uses a toolbar that iconizes all the common file handling, searching, compiling, and debugging commands. Multimodule code projects are managed as projects and a pull-down menu item puts all building, execution, and project maintenance operations in one place. Project modules and dependencies are handled within the Powerstation environment, rather than through nmake (which is still supplied). Single complete programs are recognized as such and can be compiled and executed without the need to define a project.

Powerstation also supports color syntax highlighting. Statement lines are colored by column position and comments are uniformly colored in a contrasting color. Column 6 (the continuation line) is colored green to provide a partition between the statement number (columns 1--5) and the statement (columns 7--72). Statement text beyond column 72 is colored differently as a warning.

For 32-bit support, Powerstation includes a dedicated version of the Phar Lap DOS extender. The compilation process adds a binding step, and the resulting disk image sizes are stunningly small. Previous versions of Microsoft Fortran treated data as static and included large, empty arrays in the disk image. Powerstation incorporates transparent dynamic memory allocation into the program, which saves disk space and load time. At the beginning of program execution, the DOS extender is loaded before the program proper begins execution. It is placed in the path when Powerstation is installed, but if you want to execute on another machine, the extender must be moved with the program and located either in the same directory as the program, or in the path. (In my package, there was no documentation on the DOS extender, information programmers need. Nor was there adequate information and examples on optimization; any information on how to get the best performance really should have been included, especially since Fortran has traps that will steal performance.)

Large Data Objects

The first Powerstation feature I examined was its ability to compile and execute programs with huge, realistically sized data objects that can be either arrays or single-data objects exceeding 65K in size. Since memory access is managed by the non-documented DOS extender, I had to determine its practical and functional limits. Because performance is always an issue, I also wanted to be able to determine the measure of performance loss when nested-loops access array elements inefficiently. Memory access can be probed by increasing the array size in a simple test program until performance degrades. Example 1 fills a square array of real*4, then writes selected array element values. Overall performance is measured by the total execution time, overhead is measured from start to execution of the first statement (a write statement). The final write statement is included because many optimizers can delete statements (including do loops) if the assignment variables are not subsequently used by the program. The parameter statement allows the program to be easily scaled to new array sizes. Example 1 accesses the two-dimensional array in column-major order; that is, the first index represents contiguous memory addresses and is controlled by the inner do loop. Although backwards to C and other languages that explicitly handle multidimensional arrays, this is standard Fortran and an important component of optimization. Separate compiled versions of Example 1 for each array size were created with and without time optimization in release mode. Test programs made in debug mode performed about the same as their corresponding release-mode versions. I timed and executed all programs on a 16-MHz 386SX with 6 Mbytes of RAM, the slowest possible platform for this compiler.

The performance of Example 1 is excellent, at least up to extremely large array sizes; see Table 1. The overhead is composed of the time required to load the DOS extender and the time to establish the extended memory management for the data in the program. The sharp performance degradation in going to the 1024x1024 array is likely due to the DOS extender activity, which is probably memory-page swapping. I was unable to locate the point where the DOS extender's execution behavior changed, but in any case, three Mbytes is still an awful lot of data.

The maximum overhead incurred by the memory manager can be obtained simply by inverting the array indices of the two-dimensional array in Example 1, forcing each cycle of the nested do loop to access a noncontiguous array element; see Example 2. If the array is large enough and memory pages are small enough, the memory manager can be forced to fetch an entire page of array elements just to access one element. This maximizes the overhead and kills performance, although the program still executes correctly. All the array sizes of Example 2 are shown in Table 1 and run identically to the contiguous access cases except for the 1024 case, which required 16 hours to complete due to the accumulated overhead. The good news is that execution performance is recovered by properly indexing the array while being mindful of how the array is stored, and that the Phar Lap DOS extender handles arrays up to 3 Mbytes without demands on performance.

Memory management is new to many PC users, but old hat for UNIX users and covers both extended/virtual memory and cache memory. Under UNIX, virtual memory is handled by moving memory pages to and from disk, which dramatically steals performance even when the array's elements are addressed contiguously. UNIX has never handled virtual memory well, so the performance solution to running very large programs is to buy more memory, rather than rely on the virtual-memory manager. You can get a flavor of this problem by running the largest array size versions of Example 1 in a DOS window under Windows 3.1 which has a primitive virtual-memory manager: Execution is very slow and the disk holding the swap space is very active. (I advise you buy more memory instead of using virtual memory.)

Cache memory is used to speed up execution and is really a miniature version of the virtual-memory paging problem, but with smaller page sizes. The processor's ability to find a data item in cache, versus having to fetch it from main memory, is referred to as the "hit" rate; when it's low, so is performance. Here, the solution is to be mindful of addressing. As the distinction between UNIX workstations and PCs disappears, performance issues familiar to UNIX programmers, will have to be addressed by PC developers.

The Simulation Program

The program I ported into the PC environment is a molecular simulation program called "Ann" which determines the optimum conformational state and most stable energy for a floppy molecule using simulated annealing. Ann is a compact, efficient 1800-line Fortran program that uses approximately 1.7 Mbytes of data. The program, written by Stephen Wilson of NYU, is available from NYU in DOS, MAC, and other formats (call 212-998-3519 or contact [email protected] for details). I'd previously ported Ann from its original VAX platform to the Silicon Graphics Indigo workstation. Now was the time for a PC version.

Simulated annealing is an ideal approach to optimization problems that have a large number of configurational states. The algorithm requires (1) a method for randomizing states, (2) an objective function for measuring the "energy" of states, and (3) a control-parameter analog of "temperature." The algorithm works by moving from state to state under the control of a probability: State changes that lead to lower "energy" are always allowed, while state changes that lead to higher "energy" are sometimes allowed. At high "temperatures," all state changes are allowed. A slow cooling of the system gradually renders the higher energy states inaccessible until ultimately only the optimum state remains. The analogy to annealing is actually very good and, in fact, the process works best when the "cooling" is done slowly. Detailed descriptions of simulated annealing are described in Numerical Recipes: The Art of Scientific Computing by W.H. Press, et al. (Cambridge University Press, 1986) and "Simulated Annealing" by Michael P. McLaughlin (DDJ, September 1989).

The problem with molecules is that the number of states grows geometrically with the number of bonds. Medicinally-interesting molecules seem to have lots of bonds, so finding the optimum, or global-minimum energy state in a sea of local minima, is a challenge comparable to finding a needle in a haystack. Fortunately, simulated annealing is a good model for this problem: Conformational states can be randomized by selecting and rotating the bonds of the molecule at random using what are known as "Monte Carlo methods," a real energy can be calculated, and the real temperature can be employed.

I started with the VAX version as originally supplied. Compilation times on my 386SX were less than two minutes in time-optimized release mode. The compiler spotted a multiple declaration error tolerated by the VAX Fortran compiler in which a COMMON BLOCK element was initialized by an identical DATA statement in two different modules. Another problem was the use of VAX "slang" in the OPEN statement directives. The behavior of Fortran I/O statements is modified by directives in the statement of the form directive="OPCODE". For the most part, the opcodes are unique, so the VAX compiler tolerates a "slang" in which only the OPCODE without quotes need be listed in the I/O statement. Complicating this is a lack of standards on file-sharing directives and opcodes. The PC's simple, single-user environment let me delete the file-sharing directives without compromising the program. From start to finish, porting only took an evening of time.

Performance of Ann on my 386SX under DOS was 900 seconds (15 minutes) for the original sample problem that's supplied with the code. I also ported the VAX code to my SGI R4000 Indigo Elan (an 85 MIPS/16 Mflop workstation) and it compiled without change or error in --O3 (maximum optimization) mode and executed in 15 seconds. In other words, my lowly 16-MHz 386SX was roughly 2 percent of the performance of a top-performing workstation. Note, however, that at approximately 1.5 MIPS/0.2 Mflops, the 386SX performance was better than the microVAX-II I used for similar simulations in 1986--and the 386 was a fraction of microVAX's price!

I also compiled several other off-the-shelf compute- and data-intensive simulation programs, again with minimal or no change and performance was comparable to that with Ann. I then ran Linpack, which executed in 735 seconds on my PC and 8 seconds on the R4000 Indigo. Considering the differences in the code, the Linpack results are in the same performance ballpark as Ann.

Code Transportability

For sometime, I've been trying to use a PC as a development platform for programs that would ultimately earn their keep on a UNIX computer because application development environment is better on the PC than on, say, an SGI workstation. My hope was that by adhering to portability standards, code would be portable so that I could develop and run on any platform, including the PC. For the most part, Powerstation makes this a reality, but there are still issues that make code porting between different computers difficult. The process of porting Ann isolated a recurring problem with nonstandard I/O statements. Table 2 shows a comparison between directive modifiers of the open statement between Powerstation and the SGI F77 compiler. Fortunately, the most useful directives and opcodes are standard. In some cases the opcode becomes the directive, but generally the difference between the two involves differences in how the opened file is accessed. Variation in opcodes can be handled by substituting directive=string_var, which is allowed, and selecting the appropriate opcode at run time. Although this works, it adds a layer of complexity. Variation in directives cannot be handled or aliased and must be dealt with through code simplification, as I did in Ann.

Table 1: Timing results (time optimized).

===========================================================================
Array      Array        Time            Time
Dimension  Size         Contiguous      Noncontiguous   Overhead
                                        (sec)           (sec)
===========================================================================
128        64.0 Kbytes    4.3             4.5             4.0
512        1.0 Mbyte      6.6             6.8             4.8
900        3.1 Mbytes    12.2            12.5             5.9
1024       4.0 Mbytes    74.0         57344.0            16.0
===========================================================================

Table 2:

Comparison of OPEN statement directives and opcodes.
===========================================================================
 Fortran Powerstation                         Silicon Graphics F77
 directive           opcode                   directive         opcode
===========================================================================
 unit =              integer expression       unit =            integer

expression
 access =            APPEND                   access =          APPEND
                     DIRECT                                     DIRECT
                     SEQUENTIAL                                 SEQUENTIAL
                                                                KEYED
 blank =             NULL                     blank =           NULL
                     ZERO                                       ZERO
 blocksize =         integer expression       (no equivalent)
 err =               label number             err =             label

number
 file =              character expression     file =            character

expression
 form =              FORMATTED                form =            FORMATTED
                     UNFORMATTED                                UNFORMATTED
                     BINARY                                     BINARY
                                                 SYSTEM
 iostat =            integer expression       iostat =          integer

expression
 mode =              READ                     readonly          (no

variations)
                     WRITE
                     READWRITE
 recl =              integer expression       recl =            integer

expression
 share =             DENYRW                   shared            (no

variations)
                     DENYWR
                     DENYRD
                     DENYNONE
 status =            OLD                      status =          OLD
                     NEW                                        NEW
                     UNKNOWN                                    UNKNOWN
                     SCRATCH                                    SCRATCH
                                         carriagecontrol =      FORTRAN
                                                                LIST
                                                                NONE
                                         associatevariable =    integer

expression
                                         defaultfile =          character

expression
                                         disp =                 KEEP
                                                                SAVE
                                                                PRINT
                                                               PRINT/DELETE
                                                               SUBMIT
                                                               SUBMIT/DELETE
                                         key =                  address

expression
                                         maxrec =               integer

expression
                                         recordtype =           FIXED
                                                                STREAM_LF
                                                                VARIABLE
===========================================================================

[EXAMPLE 1]

c measure consecutive memory addressing performance
      program arraytest1
      parameter (DIM=1024)
      real r(DIM,DIM)
      write (*,*) 'start loop'
      do outer = 1,DIM
          do inner = 1,DIM
              r(inner,outer) = float(inner*outer)
          enddo
      enddo
      write(*,*) 'end loop'
      write(*,*) r(DIM/3,DIM/3), r(DIM/2,DIM/2), r(DIM*2/3,DIM*2/3),
     1    r(DIM,DIM)
      stop 'done'
      end
[EXAMPLE 2]<a name="0291_000b">
c measure nonconsecutive memory addressing performance
      program arraytest1
      parameter (DIM=1024)
      real r(DIM,DIM)
      write (*,*) 'start loop'
      do outer = 1,DIM
          do inner = 1,DIM
              r(outer,inner) = float(inner*outer)  ! indexing inverted
          enddo
      enddo
      write(*,*) 'end loop'
      write(*,*) r(DIM/3,DIM/3), r(DIM/2,DIM/2), r(DIM*2/3,DIM*2/3),
     1    r(DIM,DIM)
      stop 'done'
      end


Copyright © 1993, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.