Porting Unix Applications to DOS


NOV91: PORTING UNIX APPLICATIONS TO DOS

This article contains the following executables: PORTUNIX.ARC

David is vice president of Performance Computing Inc., a custom software services company specializing in development tools, windows, and applications support for high-performance architectures. He can be reached at P.O. Box 230995, Portland, OR 97223, or 503-624-8245.


Like many UNIX workstation software engineers, I've watched with surprise (and horror) as DOS and the PC have spread through the engineering community. That an operating system with so few safeguards against inadvertent crashes and a processor that forces the programmer to think like a car renter ("Will that be the compact or the small model, sir?") could become so popular continues to amaze me.

Consequently, when our biggest client asked us to port the Free Software Foundation's GNU/960 Development Tool Suite -- consisting of approximately 240,000 lines of C source code -- to DOS, we took a deep breath and dove in. Hopefully, what we learned with our port, and what we're sharing with you in this article, can reduce headaches when you undertake similar tasks.

Facing the Challenge

A number of issues are involved with porting a 32-bit UNIX application to the DOS world, the most obvious being that DOS is a 16-bit operating system. At the system-services level, all data reads, writes, and transfers are limited to 16-bit addressability (a 64K segment). While DOS native applications have learned to live with this limitation by making multiple data manipulations in 64K chunks, UNIX applications have been written to access as much as 4 gigabytes in one data transfer. Splitting each data manipulation into multiple 64K chunks would be both inefficient and error prone. It's better to use tools that will handle this for you invisibly. Of course, 16 bits has many more implications, including the segmented memory model and how it affects addressing capabilities and performance.

Another difference is the size of the int data type between 16-bit DOS and 32-bit UNIX. At first glance, this seems a minor point, but it can actually cause all kinds of misery during the port. Not only do you have to find and replace all the ints but, if you miss one and pass it as a parameter, the stack can be corrupted and cause the application to crash. Fortunately, some DOS compilers (such as those of Intel, Watcom, and Metaware) use an int size of 32 bits that eliminates this worry.
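A cheap safeguard is to make the build fail if int turns out to be narrower than the code assumes. The following is a minimal sketch, not something taken from the GNU/960 port; the function name int_has_32_bits is hypothetical.

```c
#include <limits.h>

/* Refuse to build if int is narrower than the 32 bits the UNIX code assumes. */
#if INT_MAX < 2147483647
#error "int is narrower than 32 bits; this code assumes a 32-bit int"
#endif

/* Runtime counterpart for a start-up sanity check (hypothetical name).
   Returns 1 when int holds at least 32 bits, 0 otherwise. */
int int_has_32_bits(void)
{
    return sizeof(int) * CHAR_BIT >= 32;
}
```

Placing the guard in the system-dependent include file means a 16-bit compiler stops at the first source file instead of producing a binary that corrupts its stack at runtime.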

DOS inflicts many memory restrictions on its applications. Without some type of extended memory support, system and applications must fit into a maximum of 640 Kbytes of code space. Even with extended memory support, memory availability is limited to the amount of physical memory in the system, minus memory resident system utilities. This can place a strangle-hold on UNIX applications that have been written for virtual memory and the attitude that "memory is cheap." The most common solution is to stuff intermediate data that would normally be held in memory out to temporary files. Unfortunately, this can be a major rework, depending on the application. Furthermore, it can slow the execution speed tremendously, because the data is accessed based on disk transfer rates instead of physical memory access times.
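To make the temporary-file approach concrete, here is a minimal sketch of spilling fixed-size records to disk and fetching them back by index. The record layout and the names spill_put and spill_get are assumptions for illustration only; none of this comes from the GNU/960 sources.

```c
#include <stdio.h>

/* A fixed-size record that would otherwise live in a large in-memory array. */
struct record {
    long key;
    long value;
};

/* Write record number `index` to the spill file; returns 0 on success. */
int spill_put(FILE *fp, long index, const struct record *rec)
{
    if (fseek(fp, index * (long)sizeof *rec, SEEK_SET) != 0)
        return -1;
    return fwrite(rec, sizeof *rec, 1, fp) == 1 ? 0 : -1;
}

/* Read record number `index` back from the spill file; returns 0 on success. */
int spill_get(FILE *fp, long index, struct record *rec)
{
    if (fseek(fp, index * (long)sizeof *rec, SEEK_SET) != 0)
        return -1;
    return fread(rec, sizeof *rec, 1, fp) == 1 ? 0 : -1;
}
```

A caller would typically obtain the stream from tmpfile(), which opens a binary update stream that is removed when closed. Every access now costs a seek and a disk transfer, which is exactly the slowdown described above.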

The UNIX runtime library is a cornucopia of utilities that range from data manipulation to basic I/O. Corresponding DOS C runtime libraries offer most of the functions provided on UNIX. However, some UNIX capabilities simply do not exist under DOS. For example, because DOS supports only a single thread of execution (no preemptive multitasking), UNIX functions such as fork( ) have no equivalent implementation on DOS. Applications that depend on cooperating child processes could require major reworking.

Finally, like many applications, ours was constantly under development. A major requirement was to minimize specific-for-DOS changes in the source code. Therefore, solving these problems by placing #ifdefs throughout the code was not acceptable because future upgrades to the application (which continue on the UNIX host) could result in as much effort to port as the original program. It was important to plan ahead and devise these sorts of one-time changes that could be separated into a system-dependent DOS include file. This well-documented file could be used when planning future upgrades and enhancements, to make sure all the coding standards are still followed.

GNU/960 Development Tools

The GNU/960 tool suite, targeted for the Intel 80960 32-bit RISC processor used in commercial applications such as laser printers, network controllers, terminals, avionics, and radar processing, is a cross-development system based on the Free Software Foundation's (FSF) generic tool suite. GNU/960 consists of an optimizing compiler, assembler, linker, archiver, debugger, and communications package, as well as numerous minor (yet useful) utilities, including a dump utility, a tool to migrate between the two object file formats the linker can produce, and a symbol table extractor. All in all, there are 17 separate development tools, 364 different source files, and over 240,000 lines of code.

The GNU/960 tool suite supports the entire 960 family, including the superscalar 960CA, which can execute multiple instructions in one clock cycle when the compiler has scheduled the instructions in the proper order.

The 960 processor generally communicates with the host development system over an RS-232 connection. This connection is handled on the host side by the GNU comm utility, and on the 960 side by a bootstrap kernel called "Nindy." Downloading an application is done via a packet transfer protocol that detects data errors and requests that a packet be re-sent, if necessary. These communications are full-featured, allowing programmers to specify options such as data size, stop bits, parity, and baud rates from 300 to 38,400 bps.

Project Requirements

Based on our initial evaluation, we came up with a set of criteria to help determine the DOS-based development tool suite best suited for the porting task. Obviously, the compiler had to generate 32-bit code capable of executing in the 386/486 protected mode, but it also had to support virtual memory to alleviate the 640K limitation. In my experience, UNIX engineers have been spoiled by demand paging, and would rather avoid overlaying data and code segments. We also didn't want to deal with extended/expanded memory hooey. We wanted a real, "use all the memory you've got, then page from disk" virtual memory. Not all DOS extenders include virtual memory managers.

Another requirement was that the toolset be a complete integrated package. We wanted tools that work together seamlessly. We didn't want to get a virtual memory manager from one vendor, a compiler from another, and a debugger from a third. In addition, the package had to be well-supported, stable, and work as promised.

Also high on the list of requirements was that the toolset not have any hidden costs attached to it. In particular, a royalty-free DOS extender was considered mandatory. It's difficult to justify charging a fee when distributing "free" software like the GNU tools. Finally, the environment had to provide a clear path to Windows 3.0. After all, one of the biggest reasons for porting the application to the PC is to make it available to the greatest number of users possible.

C Code Builder

We evaluated several options, including those from Metaware, Watcom, and Intel. We did not consider the Microsoft and Borland products because they produce only 16-bit code. The environment that best fit our criteria was Intel's 386/486 C Code Builder Kit. Code Builder includes a 32-bit compiler, a full-screen source-level debugger, virtual memory manager, 0.9 DPMI-compatible DOS extender, linker, librarian, and make utility. Our greatest concern was with the newness of the product. We later learned that the compiler is an adaptation of Intel's well-established x86 embedded cross-development compilers; versions of this compiler have been used to write real-time embedded applications for many years. (This perhaps explains why the compiler performs so well for a newly released product.) In short, we had no code generator related problems.

The compiler will accept K&R C syntax as well as ANSI standard C. This flexibility allows us to port the "dusty deck" C, which most of GNU is written in, while still employing the improvements of ANSI C on any new code we wrote. The Runtime Library (RTL) complies with the ANSI specification. It has included Microsoft, POSIX, and System V UNIX extensions, in the order of priority. Thus, there is a good chance that most UNIX routines will be available for use under Code Builder, especially if the code was written under System V UNIX.

Code Builder also contains a make utility very similar to UNIX make. In fact, it even contains some rudimentary UNIX shell-like commands (for, cp, and rm, for example) that do not exist under a standard DOS command-line interpreter. This makes supporting UNIX make files much easier, and getting builds going much quicker.

Limitations, Expected and Otherwise

If your application was written using the Berkeley BSD version of UNIX, you may have more trouble with the Runtime Library. BSD support was apparently never a design criterion for Code Builder, and the less-common or BSD-specific system-level functions will probably not exist. Furthermore, even if the routine you are using has a corresponding routine in the Code Builder RTL (no matter which UNIX RTL you have been using), you had better check the documentation. It is always possible that the routine was coded to some other standard than you expect, and functions a little bit differently than you were counting on. Making assumptions like these can cause premature graying when trying to debug some weird porting bug!

We started off porting the compiler and the communications tools, figuring that the sheer size and complexity of the compiler (over 100,000 lines of code) and the low-level RS-232 bit twiddling would flush out many of the problems we would encounter over the life of the project. So, we purchased Ethernet cards, bought PC-NFS for our DOS boxes, mounted the UNIX source code disk to be accessed over the net, and prepared to compile our modules.

The problems we encountered fell into five general categories: system mismatches; sloppy programming practices; C Code Builder limitations; "DOSisms;" and "library misses." The first three tended to be compilation failures, while the rest didn't show up until the link stage or at runtime. In general, the later in the compile/link/run cycle a problem showed up, the harder it was to track down.

System Mismatches

I call a problem a "system mismatch" when the tool or utility is designed with some other set of criteria in mind. These problems show up either before compilation begins or as compilation errors. For example, the make utility that comes with Code Builder can handle about 80 percent of what you might expect a UNIX make file to handle. One big difference, however, is that UNIX make files contain shell instructions that execute when a target has been recognized. Unfortunately, UNIX instructions do not exist on DOS.

In a few cases, such as echo and for, the make utility seems to add the functionality of their UNIX shell counterparts. This can be deceiving (and frustrating) because they're really provided to be Microsoft make-like, which uses a similar-but-different syntax. Once it's clear that echo is limited with respect to its UNIX cousin, and that rerouting using >, >>, and >& works, but only in fairly simple forms, the make files are not too difficult to adjust to work under both environments.

Another example of a system mismatch is the definition of certain external global names under C Code Builder. For example, the global value errno, which is used to return specific error values from certain I/O routines and is normally implemented as an int, is instead implemented as a macro in Code Builder. This was done to make the Code Builder runtime library reentrant. A noble goal, but compilation errors abound whenever the application explicitly defines errno as extern int errno. This is fairly common in certain applications.
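The portable fix is to let the header declare errno rather than declaring it by hand. The sketch below shows the pattern with a small helper; parse_long is a hypothetical name used only for illustration.

```c
#include <errno.h>   /* declares errno, whether it is a variable or a macro */
#include <stdlib.h>

/* Do NOT add "extern int errno;" here: it fails to compile when the
   runtime library implements errno as a macro. */

/* Parse a decimal long; returns 0 on success, -1 on overflow or underflow. */
int parse_long(const char *text, long *out)
{
    errno = 0;                       /* clear any stale error value */
    *out = strtol(text, NULL, 10);   /* sets errno to ERANGE when out of range */
    return errno == ERANGE ? -1 : 0;
}
```

Because the code only ever reads and assigns errno through the name supplied by errno.h, it compiles unchanged whether the library uses a plain int or a reentrant macro.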

Missing include files present yet another system mismatch related problem. Under DOS, there is no need for the data definitions and routines defined within such include files as ioctl.h, termio.h, curses.h, and sys/file.h. Many of these relate to low-level I/O functions, which on DOS are handled by the BIOS. Others are definitions of terminal types and capabilities--something foreign to DOS, which expects only the standard PC monitor. Any data or routines normally defined in these include files and used in your application will need to be simulated, stubbed out, or references removed before compilation can continue.

Unlike UNIX, DOS differentiates between text and binary files. With text files, data such as control characters and character sequences are interpreted directly by the I/O routines. Data in binary files are passed through without interpretation. We dealt with this by defining the macros shown in Figure 1 and modifying the opens to be fp = fopen (filename, READ_BIN) or fp = fopen (filename, READ_TXT). This works equally well on DOS and UNIX. This approach centralizes the DOS-specific code into a single location within an include file.

Figure 1: Macros to handle text and binary files in DOS

  #ifdef DOS
  #       define READ_BIN "rb"
  #       define READ_TXT "r"
  #       define WRITE_BIN "wb"
  #       define WRITE_TXT "w"
  #else   /* the UNIX way */
  #       define READ_BIN "r"
  #       define READ_TXT "r"
  #       define WRITE_BIN "w"
  #       define WRITE_TXT "w"
  #endif  /* DOS */
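As a usage sketch for the binary macros in Figure 1, the following helpers write raw bytes and count them on the way back in. The helper names save_bytes and count_bytes are hypothetical. The comment about text-mode behavior describes what DOS C runtimes typically do and is an assumption as far as Code Builder is concerned.

```c
#include <stdio.h>

#ifdef DOS
#  define READ_BIN  "rb"
#  define WRITE_BIN "wb"
#else   /* the UNIX way */
#  define READ_BIN  "r"
#  define WRITE_BIN "w"
#endif  /* DOS */

/* Write `len` raw bytes to `path`; returns 0 on success, -1 on failure. */
int save_bytes(const char *path, const unsigned char *data, size_t len)
{
    FILE *fp = fopen(path, WRITE_BIN);
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0)
        return -1;
    return written == len ? 0 : -1;
}

/* Count the bytes read back from `path`; returns -1 if it cannot be opened.
   In text mode a DOS runtime would typically collapse CR/LF pairs and stop
   at Ctrl-Z (0x1A), so object files must be read with READ_BIN. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, READ_BIN);
    long n = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```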

Sloppy Programming Practices

These problems occur at compile time and are the easiest to solve because, in most cases, they are ultimately a result of bad or lazy programming. For example, we found enumerated types being defined with trailing commas. (One can only guess it made adding the next new enumeration value quicker.) While UNIX C compilers are rather lenient in this regard, Code Builder choked on the trailing comma.

Another example of sloppy code we found broke the preprocessor. In this case, note the macro definition #define abort( ) fancy_abort( ). The expansion fancy_abort( ) also contained the macro definition abort( ), so the preprocessor went into an infinite loop trying to resolve the circular recursion. Some compilers catch circular definitions; Code Builder does not.

Code Builder Limitations

Limitations inherent to Code Builder tend to be designed-in, artificial restrictions that no one on the design team ever thought would be questioned. For example, who would have thought that a macro definition string would exceed 1K? Unfortunately, the GNU compiler has some incredibly long macro definitions that are used to define special tables and output formats. Fortunately, the limitation on macro expansions is much greater (6K). If the problem is only in the length of the string that follows the macro name in the definition, you can work around it by splitting the macro into multiple parts.
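The splitting workaround can be sketched as follows. The table contents and the OPCODE_TABLE names are invented for illustration; the real GNU macros are far longer.

```c
/* Each piece stays well under the per-definition length limit. */
#define OPCODE_TABLE_PART1  10, 20, 30,
#define OPCODE_TABLE_PART2  40, 50, 60

/* The combined macro's own definition is short; only its expansion is long. */
#define OPCODE_TABLE  OPCODE_TABLE_PART1 OPCODE_TABLE_PART2

static const int opcode_table[] = { OPCODE_TABLE };

/* Number of entries produced by the expanded macro. */
int opcode_table_length(void)
{
    return (int)(sizeof opcode_table / sizeof opcode_table[0]);
}
```

Code that uses OPCODE_TABLE does not change, so the split can live entirely in the header that defines the table.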

The other two size problems we ran into show up at runtime. With Code Builder, the programmer has control over the maximum size to which the stack can grow, and the maximum size of real memory used before going to disk for virtual memory. Both these problems usually manifest themselves as a runtime abort, often changing slightly when the debugger is run, or when new routines are written and linked into the application. The default stack size is determined by the linker, and can be adjusted using the -s [+-] <size> linker command line option. Some of the GNU tools use alloca() to allocate dynamic memory on the stack for entire temporary data files, so we needed to allow the stack to grow as much as 1 Mbyte.

Something to watch out for when debugging your application is the amount of virtual memory needed to run the Code Builder application. It is necessary to anticipate the maximum amount of memory the application will need during execution, then set the "region size" accordingly. Code Builder defaults to a region size equal to that of all your system's extended memory. If your application needs more, malloc() will eventually fail and your application will take whatever error precautions have been programmed into it, if any. The region size can be adjusted at compile time by using the -xregion switch on the compiler's command line.

Another limitation relates to the library routine alloca(), which allocates dynamic memory directly on the stack so that it is automatically "freed" upon returning from the current routine. Even though this routine is considered obsolete by the ANSI C committee, Intel saw fit to include it in its RTL for Microsoft and K&R C compatibility. This turned out to be good news because GNU tends to use alloca() with gusto. However, there are some limitations on it which can cause problems not immediately evident when trying to debug a failure at runtime. The most damaging is that at least one local variable needs to be defined in any routine that uses alloca(). Otherwise, the stack pointer may not be properly restored upon executing a return statement and the application may branch off to Mars. Of course, as is true with UNIX, nothing allocated with this routine should be passed to free(), because it will cause the dynamic memory heap to become corrupted.

The Code Builder debugger is useful and flexible, once you get used to the commands and the rules for moving around in its "windowed" environment. The only trouble I found with the debugger is really not its fault. Apparently the compiler does not place the proper debug information into included files that have executable code in them. When this happens, the debugger points to the wrong location in the source. It doesn't resynch until the application executes code from some other source file. If possible, the best way around this problem is to remove all executable instructions from include files. If this is not possible, you may have to create a temporary C file in which you have preincluded all files with executable code in them until that portion of the application is ported and tested.

DOSisms

A "DOSism" is a problem that arises because DOS simply won't do what you need it to. Most of these issues relate to limitations of the BIOS routines. We ran into both speed and accuracy problems when dealing with the RS-232 port via the usual BIOS calls. Our requirements were that the downloading be able to run at up to 38.4K bps and not lose any bits. This seems like a reasonable request, but turned into a nightmare when we looked into BIOS further.

As described in the accompanying text box, "Communicating Around the BIOS," the BIOS could not guarantee that it would return to the host program every character written to the port from Nindy at rates of 9600 baud and above. Ultimately, we had to write our own RS-232 driver to bypass the BIOS, then mop up all the ramifications of doing so.

The last limitation we ran into is shared by Code Builder and DOS. It is a restriction on the number of files that can be opened at any given time. If the application is failing because it can't open enough files, the DOS limitation can be removed by modifying the FILES command in the config.sys file. We found that 45 files were sufficient for our application. We ran into a bug, however, in DOS 4.01, in which these values were ignored. We were not able to run our application successfully on DOS 4.01 when it needed to open more than 20 files. We had no such problems under DOS, Versions 3.x or 5.0.

The Code Builder RTL also has a maximum number of files that can be open at one time. Unfortunately, this number is not affected by the FILES value set in config.sys. Instead, the application's main routine must be modified to include a call to _init_handle_count(num_files), where num_files is the maximum number of files you need open at any given time. This number should be less than or equal to the value in config.sys.

Library Misses

"Library misses" are problems that relate to library routines that either don't exist or aren't the same under DOS and UNIX. A simple example is that there is no way to turn off local echo on characters typed in from the keyboard without calling a completely different read() function. The GNU interactive communications tool that talks over RS-232 to Nindy running on the 80960 processor expects to have Nindy echo the characters it receives. So on UNIX, it utilizes an ioctl call to turn off echo and calls the standard read keyboard routine. We wanted to maintain our goal of not changing the read( ) functions, so we had to turn the Nindy echo off, otherwise we would see double for everything the user typed.

Another problem was that DOS does not have available all the interrupt signals that can be used under UNIX. The list of available signals is shown in Table 1. Thus, if your UNIX application uses one of the nonmapped signals, an alternative must be used for running under DOS. For instance, DOS has no SIGALRM alarm clock capability. To port code that uses it, one must map the UNIX SIGALRM onto one of the user-definable signals (see Figure 2). Then all statements that call raise(SIGALRM) will really be raising the DOS user-defined signal. Of course, in this case you also will need to write a version of alarm( ) which uses the BIOS clock and explicitly raises SIGALRM when the correct amount of time has elapsed.

Table 1: Mapping of interrupt signals under UNIX and MS-DOS

  MS-DOS Signal   UNIX Signal   Meaning
  ---------------------------------------------------------

  SIGABRT                       Abnormal termination
  SIGBREAK                      Control+Break signal
  SIGFPE          SIGFPE        Floating point exception
  SIGILL          SIGILL        Illegal instruction
  SIGINT          SIGINT        Control C interrupt
  SIGSEGV         SIGSEGV       Segmentation violation
  SIGTERM         SIGTERM       SW termination signal
  SIGUSR1         SIGUSR1       User-defined signal
  SIGUSR2         SIGUSR2       User-defined signal
  SIGUSR3                       User-defined signal
                  SIGHUP        Hangup
                  SIGQUIT       Quit
                  SIGTRAP       Trace trap
                  SIGIOT        IOT instruction
                  SIGEMT        EMT instruction
                  SIGKILL       Process kill
                  SIGBUS        Bus error
                  SIGSYS        Bad arg to system call
                  SIGPIPE       Pipe write with no reader
                  SIGALRM       Alarm clock
                  SIGCLD        Death of child process
                  SIGPWR        Power fail
                  SIGPOLL       Selectable event pending

Figure 2: Mapping the UNIX SIGALRM to a DOS user-defined signal

  #ifdef DOS
  # define SIGALRM SIGUSR1
  #endif  /*DOS*/
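With the mapping in Figure 2 in place, a replacement alarm facility could look roughly like the sketch below. It is a polled approximation under stated assumptions, not the routine from the port: the names port_alarm, port_alarm_poll, and on_alarm are hypothetical, and the caller supplies the current time in seconds (from time( ) or a BIOS tick count, for example) instead of the code reading a clock itself.

```c
#include <signal.h>

#ifdef DOS
#  define SIGALRM SIGUSR1   /* as in Figure 2 */
#endif  /* DOS */

static long alarm_deadline = 0;   /* time at which the alarm should fire */
static int  alarm_pending  = 0;   /* nonzero while an alarm is armed */

/* Arm the alarm to fire `seconds` after `now`; 0 seconds cancels it. */
void port_alarm(long now, unsigned seconds)
{
    alarm_pending  = (seconds != 0);
    alarm_deadline = now + (long)seconds;
}

/* Call regularly from the program's main loop. Raises SIGALRM once when
   the deadline has passed. Returns 1 if the signal was raised, else 0. */
int port_alarm_poll(long now)
{
    if (!alarm_pending || now < alarm_deadline)
        return 0;
    alarm_pending = 0;
    raise(SIGALRM);
    return 1;
}

/* Example handler that just counts deliveries. */
volatile sig_atomic_t alarm_count = 0;

void on_alarm(int sig)
{
    (void)sig;
    alarm_count++;
}
```

Unlike the UNIX alarm( ), this version fires only when the program polls it, so long-running loops need a call to port_alarm_poll( ) inside them. A handler must be installed with signal( ) before the alarm is armed.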

Furthermore, the workarounds you devise to sidestep restrictions in DOS may cause routines that exist in Code Builder's RTL to be insufficient. As I mentioned earlier, we had to bypass the DOS BIOS to guarantee accurate high-speed RS-232 communications. In doing this, we rendered useless all calls to standard I/O routines dealing with the RS-232 port through the BIOS. This included read(), write(), and dup2(), to name a few. We had to go back and recode these routines to go through our own data structures and RS-232 driver, then figure out a way to execute our routines when accessing the RS-232 port, and the normal read(), write(), and dup2() when accessing local files on the DOS disk.

Conclusion

Obviously, we had some work to do to complete the port, but Code Builder held up its end of the bargain. Even though I've focused on things to watch for, there were workarounds. In fact, I largely credit the smoothness of the port to the development tools we used. In this respect, the Code Builder tool suite gave the DOS machine the same feeling as a UNIX workstation.

Communicating Around the BIOS

Devices and files are handled differently in DOS than in UNIX. In DOS it is not possible, for instance, to simply open a stream and begin reading from and writing to it. Instead, devices and files must be opened, the controller initialized, and the BIOS tables set up. The BIOS controls all I/O, including that which is bound for the RS-232 port. The BIOS handles all interrupts that are raised when a character comes in over the port, and supplies interface calls to access the data. The problem is, when DOS needs to service certain other high-priority requests such as disk accesses, it turns off other interrupt servicing. DOS still receives the interrupt, but does nothing until the disk access is completed.

In cases where the data doesn't need to travel terribly fast, say less than 9600 baud, chances are that the disk access will be completed before another character arrives on the port. However, as speeds exceed 9600 baud, there is an increasing chance that multiple interrupts will be received during a disk access, with only the most recent character being picked up after the disk access is over.

This isn't fatal for the port, but it meant that we had to write a driver to bypass the BIOS routines and talk directly to the UART controlling the RS-232 connection, buffering all characters as they are received. Then, when the disk access is completed and the interrupt is serviced, we take all characters that have been placed in the buffer. The driver source code (as well as the other routines discussed in this section) is available electronically; see "Availability" on page 3.
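The buffering idea can be sketched as a small ring buffer shared between a receive interrupt handler and the main program. This is an illustration only and not the driver distributed with the article; the names rx_put and rx_get, the buffer size, and the assumption that rx_put is called from a UART interrupt handler are all hypothetical.

```c
#define RX_SIZE 256   /* must be a power of two so the index mask works */

static volatile unsigned char rx_buf[RX_SIZE];
static volatile unsigned rx_head = 0;   /* next slot the handler writes */
static volatile unsigned rx_tail = 0;   /* next slot the program reads */

/* Called from the (hypothetical) UART receive interrupt handler.
   Returns 0 if the character was stored, -1 if the buffer was full. */
int rx_put(unsigned char c)
{
    unsigned next = (rx_head + 1) & (RX_SIZE - 1);

    if (next == rx_tail)
        return -1;            /* full: the character is dropped */
    rx_buf[rx_head] = c;
    rx_head = next;
    return 0;
}

/* Called from the main program. Returns 1 and stores the oldest
   character in *c if one is waiting, or 0 if the buffer is empty. */
int rx_get(unsigned char *c)
{
    if (rx_tail == rx_head)
        return 0;
    *c = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1) & (RX_SIZE - 1);
    return 1;
}
```

Because only the handler advances rx_head and only the main program advances rx_tail, characters that arrive during a disk access simply queue up in order until the program drains them.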

Unfortunately, there are ramifications of this solution. Not only do we need the driver, but we need to modify every I/O routine invocation which could be going through the RS-232 port. This means we need to create our own versions of routines such as open(), close(), read(), and write(). Furthermore, some I/O routines may have data go over the RS-232 port at one invocation, and data for the disk at another. For example, the subroutine write_files() in Figure 3 takes as input a file handle. This subroutine has no way of knowing whether the file handle relates to a disk file or an RS-232 connection. Thus, it must be able to handle both kinds.

Figure 3: Writing files using a specified file handle

  write_files (fp, buffer, bytes)
  int fp;
  char *buffer;
  int bytes;
  {
    if (bytes > 0)
      write (fp, buffer, bytes);
  }

So the first step was to change all I/O function calls to routines of our own making. We did this using #ifdefs, as shown in Figure 4, then changing the appropriate calls to WRITE_TTY. Then we needed to write the open_port(), read_port(), write_port(), and close_port() routines that would keep track of which file handles were open to RS-232 files. A simple test could allow those routines to use the built-in RTL disk-file routines or our own RS-232 driver routines.

Figure 4: Mapping WRITE_TTY to either DOS or UNIX I/O calls

  #ifdef DOS
  # define WRITE_TTY write_port
  #else /* unix */
  # define WRITE_TTY write
  #endif /* DOS */
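A write_port() along these lines might look like the sketch below. It is not the routine from the port: the handle table, mark_port_handle, and the serial_send stand-in (which only copies into a memory buffer in place of a real RS-232 driver) are hypothetical, and the header that declares write() under the DOS RTL may differ from the POSIX one used here.

```c
#include <string.h>
#include <unistd.h>   /* write() on POSIX; the DOS RTL header may differ */

#define MAX_HANDLES 64

/* Nonzero for every handle that was opened on the RS-232 port. */
static char handle_is_port[MAX_HANDLES];

/* Stand-in for the real RS-232 driver: just records what was sent. */
static char port_log[256];
static int  port_log_len = 0;

static int serial_send(const char *buf, int n)
{
    int room = (int)sizeof port_log - port_log_len;

    if (n > room)
        n = room;             /* send only what fits in the stand-in log */
    memcpy(port_log + port_log_len, buf, (size_t)n);
    port_log_len += n;
    return n;
}

/* Would be called by open_port() when a handle refers to the RS-232 port. */
void mark_port_handle(int fd)
{
    if (fd >= 0 && fd < MAX_HANDLES)
        handle_is_port[fd] = 1;
}

/* The simple test: RS-232 handles go to the driver, all others to the RTL. */
int write_port(int fd, const char *buf, int n)
{
    if (fd >= 0 && fd < MAX_HANDLES && handle_is_port[fd])
        return serial_send(buf, n);
    return (int)write(fd, buf, (size_t)n);
}
```

Callers keep using WRITE_TTY exactly as before; the decision about where the bytes go is made once, inside write_port().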

There was one last hoop we had to jump through before this scheme was complete. The application we were porting used the runtime library routine dup2(), which reassigns an open file handle to another. So we needed to write our own version, dup2_port(). However, the UNIX version of this routine maintains the reassignment even when child processes have been spawned. To do the same, we needed to devise a global data structure that maintained the list of file handles that had been reassigned, and keep track of those that ultimately were linked to the RS-232 port.
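One possible shape for that bookkeeping is sketched here. It is an assumption-laden illustration, not the code from the port: the table layout and the names set_port_handle and is_port_handle are hypothetical, and the sketch does not close an existing disk file on the new handle when it is redirected to the port.

```c
#include <unistd.h>   /* dup2() on POSIX; the DOS RTL header may differ */

#define MAX_HANDLES 64

/* Global table: nonzero if the handle ultimately leads to the RS-232 port. */
static char port_handle[MAX_HANDLES];

static int in_range(int fd)
{
    return fd >= 0 && fd < MAX_HANDLES;
}

/* Record whether `fd` refers to the RS-232 port. */
void set_port_handle(int fd, int is_port)
{
    if (in_range(fd))
        port_handle[fd] = (char)(is_port != 0);
}

/* Returns nonzero if `fd` is currently linked to the RS-232 port. */
int is_port_handle(int fd)
{
    return in_range(fd) && port_handle[fd];
}

/* Reassign `newfd` to refer to whatever `oldfd` refers to. */
int dup2_port(int oldfd, int newfd)
{
    int result;

    if (!in_range(newfd))
        return -1;
    if (is_port_handle(oldfd)) {
        port_handle[newfd] = 1;   /* newfd now leads to the RS-232 port */
        return newfd;
    }
    result = dup2(oldfd, newfd);  /* ordinary disk file: let the RTL do it */
    if (result >= 0)
        port_handle[newfd] = 0;   /* newfd no longer leads to the port */
    return result;
}
```

Because the table is global, any routine that later receives the reassigned handle can consult is_port_handle() to choose between the RTL call and the RS-232 driver.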

-- D.G.




Copyright © 1991, Dr. Dobb's Journal

