Porting Unix to the 386: the Basic Kernel


SEP91: PORTING UNIX TO THE 386: THE BASIC KERNEL

Bill was the principal developer of 2.8 and 2.9BSD and was the chief architect of National Semiconductor's GENIX project, the first virtual memory micro-processor-based UNIX system. Prior to establishing TeleMuse, a market research firm, Lynne was vice president of marketing at Symmetric Computer Systems. They conduct seminars on BSD, ISDN, and TCP/IP. Send e-mail questions or comments to [email protected]. (c) 1991 TeleMuse.


Last month, we began our preliminary discussion of the main() procedure in the 386BSD kernel. This procedure is critical to UNIX, and we will be referring to it again and again as we introduce more functionality in our 386BSD kernel and incrementally turn on all of the kernel's internal services. We examined the validation and debugging of kernel functions up to executing the main() procedure, processor initialization, turning on paging, and sizing memory, among other things. We then went on to create our initial page fault and context generation for the scheduler, paging, and first user processes. In other words, by building much of the main() procedure, we also assembled the framework for the next outer layer of our UNIX operating system -- which, in turn, will initialize higher-level portions of the kernel.

This month and next, we turn to a seemingly unrelated area: multiprogramming and multitasking. Their relevance lies in that beginning with this stage of our port, the code grows more complex as more items interact with each other; as such, we are often working simultaneously on many key portions. This is one reason why ports are often begun with great hopes and aspirations, but abandoned as the complexity increases.

Multiprogramming and multitasking, two important elements which help make UNIX "UNIX," are also areas of the basic design that the creators did particularly well; hence, they are instructive to anyone interested in operating systems design. Many other popular operating systems have only recently allowed for multiprogramming -- at great cost and incompletely -- even though it was "planned for" in their early designs.

This month, we will reexamine our understanding of the conventions--the style of programming portions of the operating system--inherent in the design of UNIX. These conventions are the conceptual framework upon which a multiprogramming operating system is built. Once these conventions are understood, the ease and simplicity of the UNIX multiprogramming environment contrasts markedly with the more convoluted attempts at multiprogramming in later operating systems.

Next month we'll examine some actual code; in particular, tsleep(), wakeup(), and swtch(), and how they create the illusion of multiple, simultaneous process execution on a sole processor. We'll discuss some of the requirements for the extensions needed to support multiprocessor and multithreaded operation in the monolithic 386BSD kernel and reexamine the multiprogramming attempts of some other operating systems in light of what we have learned.

What is Multiprogramming?

Occasionally, an area is so misunderstood or shrouded in mysticism that the simple and elegant explanation is ignored in favor of an obtuse or complicated one. Multiprogramming is so elemental to the design of any operating system that ignorance of its development and structure precludes any real understanding of UNIX. It is a concept which appears obvious to the typical user (and for good reason): Through multiprogramming we can work on several tasks at once. Obviously, concurrently running several editors, formatters, and output devices would be as important to a writer as simultaneously running a compiler, debugger, and editor is to a programmer.

The term "multiprogramming" implies multiple applications active at any time. In other words, it is the effect we can see of having access to and working with these programs. A UNIX system generally allows a fair number of simultaneous applications programs to be present at a given time, a convenience to which the average UNIX user quickly becomes accustomed. (While we wrote this article, there were four editor processes, two command processors or shells -- one of which was on a different host on the network -- and a contact management program present. This is a typical level of activity for one or two people doing a little writing.) While the computer can cause them all to appear active in the wink of an eye, we can't use them nearly as fast, so we just hop between programs, calling more as needed.


Attempts at Multiprogramming: MS-DOS and Finder

UNIX users become so spoiled with the ability to multiprogram that they don't fare well when migrated back to a non-multiprogramming environment such as MS-DOS or Finder. Not surprisingly, later versions of these have attempted to compensate for this lack with limited extensions such as line printer spooling (via a TSR in MS-DOS, for example). These extensions are by no means perfect, however. For example, certain applications programs invoked from within nested command processors invariably fail, sometimes with catastrophic results.

Windows 3.0 for the 386 attempts to increase multiprogramming capabilities beyond earlier MS-DOS extensions by running a protected-mode, preemptive-scheduling operating system with multiple MS-DOS partitions. It gets around the problem by simulating a separate complete MS-DOS environment for each task using the virtual 8086 mode on the processor itself. In other words, it effectively places a hard shell around each MS-DOS "task," as if it were running on a separate processor.

Internally, the software looks somewhat like UNIX. Through the 8086 mode feature of the 386 processor, Windows/386 can leverage some UNIX approaches to multiprogramming. The MS-DOS sessions end up running in "almost" real mode while the rest of the operating system, like 386BSD, runs in protected mode in a completely different part of the system. Even some of the bugs encountered (and fixed) in porting 386BSD to the PC are similar to those in extant protected-mode systems like Windows/386.

On the Macintosh, applications can be launched and switched, but again the focus is more intraapplication than interapplication. System 7.0 with MultiFinder is supposed to deal with this area more comprehensively. (We shall see.)

The question as to why multiprogramming was more difficult to achieve in MS-DOS and Finder on the Macintosh, for example, can be turned around and phrased in a more direct manner: "Where in UNIX is the concept of multiprogramming implemented (if there is such a place), and what elements of multiprogramming are missing from other operating systems that made extensions for multiprogramming more difficult later on?"

In UNIX, the concept of multiprogramming was extant from the beginning, so let's examine what the designers of UNIX did right.

Conventions and Definitions

To the user, multiprogramming is the ability to use and access many programs at once. Organick, however, takes a more mechanical view: "Systems for 'passing the processor around' among several processes, so as to prevent the idling of a CPU during I/O waits or other delays, are known as multiprogramming systems." While utilitarian and concise, this definition yields little insight to the layman.

To make the concept of multiprogramming more obvious, we must review some fundamental terms: processes and tasks, context switching, preemption and multitasking, and time slicing. We'll also contrast our understanding of these terms with those of other designers who helped develop the functional definitions.

Processes and Tasks

Programming on UNIX is predicated upon the existence of processes. Dennis and Van Horn's is a good basic definition: "A process is a locus of control within an instruction sequence. That is, a process is that abstract entity which moves through the instructions of a procedure as the procedure is executed by a processor." This definition describes a dynamic entity practically "swimming" through the code but does not tie us down by saying exactly what a process consists of or how it works through the code.

Organick gives a more functional definition, inherited from the precursor to UNIX, MULTICS: "Process (Lay Definition): A set of related procedures and data undergoing execution and manipulation, respectively, by one of possibly several processors of a computer." Again, our definition is not "absolute." We run into this problem with a few other terms, in particular, lightweight processes and threads. (See the sidebar entitled "Brief Notes: Lightweight Processes and Threads.") Modern UNIX systems utilize lightweight processes, as opposed to the heavyweight, let's-do-everything MULTICS processes.

Processes, by their nature, are isolated entities. Tasks, on the other hand, share resources such as variables and memory. A task can be a subset of a process (such as a thread) or live outside of a process (in a portion of memory on the system, for example). A task is really just some register state and a bag of memory that can be accessed by other tasks. Keep in mind that this concept is more primitive than that of a process.

Tasks are "well-behaved" when they process the event that activated them without locking out other cooperating tasks (that is, not doing time-consuming operations) and when they relinquish execution to give other lower-priority tasks a chance to run. Tasks must be careful to arbitrate for shared resources before they use them.
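Arbitrating for a shared resource can be made concrete with a minimal sketch. This is a modern illustration, not code from the 1991 kernel: the C11 &lt;stdatomic.h&gt; test-and-set flag stands in for whatever lock primitive the system actually provides, and the names are ours.

```c
#include <stdatomic.h>

/* A well-behaved task claims the lock before touching the shared
 * resource, keeps the critical section short, and relinquishes
 * promptly so cooperating tasks are not locked out. */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static int shared_counter;

void task_increment(void) {
    while (atomic_flag_test_and_set(&lock))
        ;                       /* spin: another task holds the resource */
    shared_counter++;           /* critical section: keep it brief */
    atomic_flag_clear(&lock);   /* relinquish for the next task */
}
```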

Real-time tasks perform many operations usually done within the unseen internals of a time-sharing system. Thus, the transparency so touted in a timesharing system hinders the external visibility needed for real-time operation. (See the sidebar entitled "Brief Notes: Is UNIX Real-Time Enough?")

Context Switching

The actual operation of going from one process to another is called a context switch. When this occurs, the state information of the computer's processor (registers, mode, memory mapping information, coprocessor registers, and so on) or "context," is saved away in a location where it can be later restored. Then, a new process which must be run is found. Finally, the state information of the new process must be loaded and run.
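The three steps above -- save the old context, find the next process, load the new context -- can be modeled in plain C. This is a conceptual sketch with invented structure names; the real operation manipulates actual processor registers in assembly (in 386BSD, the swtch() routine).

```c
#include <string.h>

/* Toy model of a process context switch: the "context" here is just a
 * struct of saved register values. Field and type names are
 * illustrative, not the actual 386BSD structures. */
struct context {
    unsigned long pc;      /* saved program counter */
    unsigned long sp;      /* saved stack pointer */
    unsigned long regs[6]; /* callee-saved registers, say ebx..edi */
};

struct proc {
    int pid;
    int runnable;
    struct context ctx;
};

/* Step 1: save the outgoing process's context from the "processor". */
void save_context(struct proc *p, const struct context *cpu) {
    memcpy(&p->ctx, cpu, sizeof *cpu);
}

/* Step 2: pick the next runnable process (trivial linear scan here;
 * the real scheduler consults run queues and priorities). */
struct proc *find_next(struct proc *tab, int n, struct proc *cur) {
    for (int i = 0; i < n; i++)
        if (&tab[i] != cur && tab[i].runnable)
            return &tab[i];
    return cur; /* nothing else to run: keep the current process */
}

/* Step 3: load the incoming process's context onto the "processor". */
void load_context(const struct proc *p, struct context *cpu) {
    memcpy(cpu, &p->ctx, sizeof *cpu);
}
```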

The word "context" is used to refer to different, but related things. When a processor gets an interrupt, or a procedure is called, the state information of "what the processor was doing before" is always recorded (somewhere -- it depends on the computer and system). This might be termed an "interrupt context switch" or a "procedure context switch." These are different from a process (or task) context switch.

A process context switch can be a costly operation, especially if a processor has a large number of registers. The additional overhead of this operation is just another unwelcome burden to a now-multitasking operating system -- the price for doing n things at once. This overhead grows in proportion to how much multitasking is done at a time. Remember, many active processes result in many context switches.

Preemption and Multitasking

Simple multitasking systems can be written that run one task at a time and switch to another task when idling. These systems are nonpreemptible, because a process is not allowed to preempt or run ahead of the currently active process. This mechanism does not allow for much flexibility. Early examples of nonpreemptible multitasking systems included MP/M and UCSD Pascal.

MS-DOS, in contrast, is a single-tasking system. To illustrate the difference, let's assume we are running a program, such as a number cruncher, on both our nonpreemptible multitasking system and our single-tasking MS-DOS system. If the program allows for no interruptions, it will run to completion on both systems. Let's assume, however, that the programmer who wrote this program installed a request for input midway through the program. At the point the request for input appears and the computation is stopped, we can "put aside" the nonpreemptible multitasking system's program, run another task, and then return to input the data and continue the program. On our single-tasking MS-DOS system, however, we cannot put aside the program midway through; we must either input the requested data and complete the computation or abort it.

A preemptible multitasking system is far more interesting for our purposes. Unlike our nonpreemptible multitasking system, we can not only run one task at a time and then switch to another task when idling, but one task can also preempt another automatically, without manual intervention. This concept is quite powerful in practice, but adds its own complications, as we shall see.

Obviously, preemptive systems must have a way of grabbing control and preempting the current process. In practice, preemption mechanisms are either "immediate" or "casual" in action, depending on the amount of time available to activate a process to run. In a real-time system, an immediate guaranteed response time is crucial; with a time-sharing system, even a ponderous one fourth of a second (approximately 500,000 386 instructions) response time is unnoticeable.

Preemption adds two major costs: First, it implies more context switches (hence, increased overhead); and second, nonpreemptible sections must be carefully coded to remain nonpreemptible. Additional code is usually required to surround nonpreemptible sections and when contending for shared resources that may become active. (These nonpreemptible sections of code are called "critical" because they must execute without preemptions, else the integrity of the system is impacted.) These costs can increase, depending on the degree of preemption allowed.


Unlike a real-time system, UNIX has no way for a higher-priority process to preempt the kernel when it is already processing something that cannot be "blocked." "Wakeup" calls can schedule a higher-priority process that wants to be run, but up to an entire rescheduling clock tick can pass before this shuffling of the schedule is noticed.

Time Slicing

An important consideration with timesharing systems in general is to ensure that each process receives a period of time to run on the processor when it is required, and that, when many processes are ready, the system's scheduler "round-robins" among them, tendering the appropriate time to each. With our preemptible multitasking system, we can afford to give processes a period of activity, called a timeslice, to switch them back and forth. (A UNIX timeslice has a lifetime of one-tenth of a second.) A process can run up to the lifetime of a timeslice unless some other process intrudes (preempts). In contrast, a real-time system will only activate a well-behaved task per a given event.
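Timeslice accounting itself is simple enough to sketch. The numbers below follow the figures in the text (a 100-per-second clock and a one-tenth-second timeslice), but the names and structure are ours, not the actual kernel's:

```c
/* Illustrative timeslice accounting: with a 100 Hz clock and a
 * one-tenth second timeslice, each process gets 10 ticks before the
 * scheduler round-robins to the next ready process. */
#define HZ    100        /* clock interrupts per second */
#define SLICE (HZ / 10)  /* one-tenth of a second, in ticks */

struct rrproc {
    int pid;
    int ticks_left;      /* ticks remaining in the current slice */
};

/* Called from the clock interrupt: charge one tick to the running
 * process; return 1 if its slice is exhausted and a reschedule is
 * due, 0 if it may keep running. */
int clock_tick(struct rrproc *cur) {
    if (--cur->ticks_left <= 0) {
        cur->ticks_left = SLICE;  /* refresh for its next turn */
        return 1;                 /* time to switch processes */
    }
    return 0;
}
```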

The system's scheduler determines the rules detailing which process is allowed to run. Usually, the policies the scheduler uses to manage resources (processor time, RAM, I/O bandwidth, preferential use) have a root goal in mind. With UNIX, the concept of "fairness" is invoked, in that processes from multiple users are allowed to compete for resources on an even basis. While this is appropriate for a time-sharing system, a real-time system would require us to have an intentionally "unfair" scheduling policy in place -- one that would award resources to a task solely on the basis of its runtime priority and event occurrence.

UNIX Organization for Multiprogramming

From the very beginning, UNIX was conceived of as a "preemptible multitasking system," on top of which lightweight processes are built. Preemptible multitasking occurs at the internal kernel level of the system and its mechanics are transparent to the user. Multiprogramming is built upon these mechanisms and is what is observed at the user level. It is more concerned with the effects of our preemptible multitasking system on the user (in other words, how we interact with the system). By the way, if we have more than one processor, we are also doing multiprocessing. Got all that? Good, because these terms are thrown about all the time with little distinction between them, and they are distinct.

UNIX utilizes a limited-preemption mechanism that provides each process a timeslice which it can consume. Because its goals are oriented towards timesharing, such timeslices are made perceptibly short to maintain the illusion of timesharing. The mechanism for providing this is simple and elegant, avoiding "ultimate" mechanisms that impose complexity throughout the kernel. The trade-off for this simple approach is a minimal, submillisecond response time delay, event-to-process -- not a great loss for a time-sharing system.

Having obtained an overview of multiprogramming and the related terminology, we can now delve inward to the actual mechanisms by which processes are created, switched around, and terminated.

A UNIX Process's Double Life

A UNIX process possesses both a user-mode program and a supervisor-mode "alter ego." The user-mode applications program runs code and obtains service from the operating system via system calls. This in turn causes the computer to enter supervisor mode and run the operating system's code, which processes the exception (system call) for this process. During the processing of a system call, or other exception, multitasking code comes into play. Because the computer system's periodic clock interrupt forces entry into the operating system (usually 100 times a second), even a process that does not call the system by itself (a heavy calculation) mandatorily calls the system in this manner.

Blocking and Unblocking Processes

A process waiting for an event can give up the processor by calling the tsleep() routine to "block" itself out from the processor until the expected event occurs. It then frees up the processor to run another process. tsleep() records the priority and other details of its slumber. When the process is later awakened, the code checks to see if the event for which it was waiting has actually occurred. If not, another tsleep() is issued.

When the event occurs, processes waiting for it are "unblocked" by a wakeup() call. The wakeup() call reschedules the previously blocked process, allowing it to run when it next has priority. wakeup() and tsleep() are not necessarily discretely paired; one wakeup() call can awaken many tsleep()ing processes waiting for the same event. This explains why a process is not guaranteed that the condition waited for is true; if one buffer becomes free, a dozen processes may wake up for it. Only one is satisfied, so the others return to their slumbers.
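This is why the canonical BSD sleep idiom always rechecks its condition in a loop. The sketch below simulates the primitives in ordinary C so it can stand alone -- tsleep_sim() is a stand-in (here it simply pretends the owner released the buffer), and the field names merely echo the buffer-cache style rather than quote the actual 386BSD code:

```c
/* The canonical sleep idiom: recheck the condition after every
 * wakeup, because one wakeup() may rouse many sleepers and only the
 * first to run wins the resource. */
struct buf { int busy; int wanted; };

/* Simulated primitive: "sleeping" here just pretends the current
 * owner released the buffer before we were awakened. */
static void tsleep_sim(struct buf *bp) { bp->busy = 0; }

/* Returns how many times we had to sleep before claiming bp. */
int acquire_buf(struct buf *bp) {
    int slept = 0;
    while (bp->busy) {       /* always a while, never an if */
        bp->wanted = 1;      /* ask the owner to wakeup() us */
        tsleep_sim(bp);      /* real code: tsleep(bp, pri, "getblk", 0) */
        slept++;
    }
    bp->busy = 1;            /* condition now true: claim the buffer */
    return slept;
}
```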

Process Context-Switching Mechanism

In order to provide multiprogramming capability, we must be able to exchange the currently running program with the next program to be run whenever the current process blocks for an event or the currently allocated timeslice of the process is consumed. This switch from one process to another, or process context switch, is a critical piece of code, and is the pivotal mechanism for multiple execution.

Interestingly enough, a process context switch is similar to a subroutine call mechanism called a "coroutine." Coroutines are not nested within a hierarchy like subroutines, but instead reside at the same level. When a coroutine pauses, it returns execution to its caller, knowing that it will be reentered at some future date when its caller suspends.
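The coroutine control transfer can be demonstrated with the POSIX ucontext facility (getcontext/makecontext/swapcontext) -- an interface that postdates this article, used here purely as an illustration. Neither routine is nested in the other; each suspends itself and resumes its peer exactly where it left off:

```c
#include <ucontext.h>

/* Two coroutines at the same level: run_demo() and coroutine() hand
 * control back and forth, each resuming mid-function. The trace array
 * records the order in which the two sides execute. */
static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];   /* illustrative stack size */
static int trace[4], ntrace;

static void coroutine(void) {
    trace[ntrace++] = 2;             /* first entry */
    swapcontext(&co_ctx, &main_ctx); /* suspend: resume our peer */
    trace[ntrace++] = 4;             /* reentered where we left off */
    /* falling off the end transfers to uc_link (main_ctx) */
}

int run_demo(void) {
    ntrace = 0;
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;      /* where to go when it finishes */
    makecontext(&co_ctx, coroutine, 0);

    trace[ntrace++] = 1;
    swapcontext(&main_ctx, &co_ctx); /* enter the coroutine */
    trace[ntrace++] = 3;
    swapcontext(&main_ctx, &co_ctx); /* reenter it mid-function */
    return ntrace;                   /* 4: both sides ran twice */
}
```

A process context switch in the kernel does essentially this, with the processor's real registers and stacks in place of ucontext structures.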

Next Month

Our next step is to discuss in depth the 386BSD swtch() routine, and how it brings multiprocessing capabilities to 386BSD. We'll embark upon those subjects next month.

References

Organick, Elliot I. The Multics System. Cambridge, Mass.: MIT Press, 1972.

Dennis, J.B. and Earl C. Van Horn. "Programming Semantics for Multiprogrammed Computations." CACM, vol. 9, no. 3 (1966).

Brief Notes: Lightweight Processes and Threads

On MULTICS, processes were so "heavy-weight" that a fair amount of work was done to reuse them, rather than create and destroy them constantly. In a similar vein, VMS processes are "precreated" before use to minimize activation time, "cleaned and pressed" after use, in preparation for reuse, and then terminated. UNIX, on the other hand, relies on lightweight processes because it uses one (or many) for each command executed through the shell. In other words, processes are so convenient to work with (each has a completely independent and isolated program within) that we want to use them commonly. This is why we make them lightweight in UNIX -- so we can cheaply create and destroy them as needed.

During the mid-1980s, research versions of UNIX found that a limiting factor in the speed and efficiency of UNIX might be, in part, that lightweight processes were not lightweight enough. For example, to perform a UNIX fork() operation to clone a process, the early versions of UNIX were required to copy an entire process, frequently unnecessarily (such as copy a 200-Kbyte text editor only to execute a 10-Kbyte command interpreter). In order to preserve the lightweight nature of UNIX processes, research systems implemented "copy-on-write" fork(), where only the pages actually modified would be copied. (All pages in the copy are marked "read-only;" as write operations are detected and faulted, the affected pages are copied and marked "read-write.") Copy-on-write is now a de facto standard in UNIX systems. Alas, copy-on-write required duplication of page table and other bookkeeping information, so it was not considered "lightweight enough" for some.
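The visible semantics of a copy-on-write fork() are those of a full copy: a write in the child faults and duplicates only the affected page, and the parent's data is untouched. A small sketch (our own illustration, using the ordinary POSIX fork()/waitpid() interfaces):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork(), parent and child logically have separate copies; with
 * copy-on-write, only the page the child actually writes gets
 * physically duplicated. */
int cow_demo(void) {
    int value = 1;
    pid_t pid = fork();
    if (pid == 0) {
        value = 99;   /* write faults; the page is copied for the child */
        _exit(value == 99 ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    /* The parent's page was never written by the child; value is
     * untouched here. */
    return value;
}
```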

One extreme solution was a mechanism called vfork() -- it literally stole the parent process's address space en masse and used it temporarily until it executed another program, and thus avoided copying page tables and bookkeeping information. This sounds wonderful until you realize that the program using vfork() has to be carefully written to avoid inadvertently modifying the parent process (because both child and parent process run using exactly the same memory). Also, due to the weird semantics of vfork() (you must clean up after yourself, the child process always runs first, and so on), it's not a general-purpose replacement for fork(). (In fact, in many cases you can't employ it.) Finally, it's not easy to debug, because a single program is "running" in two places. However, it still remains the cheapest way to spawn processes, even if it is a bit ugly and cumbersome.
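The vfork() idiom has to be written with exactly this care: because the child borrows the parent's address space, it may do essentially nothing but exec or _exit. A hedged sketch -- /bin/echo is just an illustrative target program:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The classic vfork()/exec pattern. The parent is suspended until the
 * child execs or exits; until then the child runs in the parent's own
 * memory, so it must modify nothing and must use _exit(), not exit(). */
int spawn_echo(void) {
    pid_t pid = vfork();
    if (pid == 0) {
        /* Child: touch nothing, just replace ourselves. */
        execl("/bin/echo", "echo", "hello", (char *)NULL);
        _exit(127);   /* exec failed; _exit() only, after vfork() */
    }
    if (pid < 0)
        return -1;
    int status;
    waitpid(pid, &status, 0);  /* parent resumed at the exec; now reap */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```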

In the never-ending search for lighter-weight mechanisms, "threads" next come into view. Exactly what a thread is, however, varies from one system to another. Some view threads simply as tasks within a process. Others view them as Lightweight Processes (LWPs) that may share address space. Minimally, a thread must have a separately executing PC (program counter) or instruction pointer on the 386, although to be practical, we also suggest a stack. Hopefully, the cost of creation and context switching would then be so low that we could program in terms of thousands of threads. Typically, they would be used as ways for normally dormant functions to become active (by being scheduled to run), such as when an exception needs to be processed. Threads are also blockable.

Threads seem at first to provide a natural way to explore parallel processing or multiprocessing within a process, because you can now have a thread per processor. On second glance, however, threads aren't an answer to the hard questions of parallel processing. In particular, for the past 30-40 years we have been working with sequential mechanisms. Now how do we delegate work to parallel instruction streams? In other words, how do we "think parallel"? Threads may be part of the answer, but, contrary to the popular literature, they are far from all of it.

Leveraging the early idea that each command could run as a program in its own process, conventional UNIX multiprogramming permitted you to build complicated commands easily and tie them together in scripts and pipelines; this was valuable.

However, at the risk of sounding skeptical, it is not clear that threads offer any immediate advantage on either a uniprocessor or multiprocessor system over the conventional UNIX multiprogramming with processes. To truly make use of threads, our program development tools (compilers, linkers, debuggers) must provide even more functionality than before. This is especially true with the debugger, as you need to know just what thread modified which global variable that caused another to generate an exception. Otherwise, multithread programming might resemble an impossible can of worms, subject more to the rules of black magic than professional practice.

Watching this "slimming down" of processes, we can only speculate about what will occur next if this trend continues. Can we next expect subinstruction parallelization, or multi-threaded microcode? If so, it might fit in with the trend toward superscalar processors, where multiple operations are performed per each clock tick (that is, a 50-MHz processor that does hundreds of millions of operations per second). Perhaps programs will then be written in a two-dimensional hierarchical arrangement, and separately address parallel iteration in hardware and sequential iteration in time. We might end up with the neural net arrangement, itself an example of both time and space iteration. Whatever else occurs, the trend toward parallel execution will definitely shape the programming environments of the future.

-- B.J. and L.J.

Brief Notes: Is UNIX Real-Time Enough?

Our UNIX system model, with its 1 to 100 millisecond response time, seems more than adequate for its time-sharing role. However, we seem to be moving into a world where the elements of what is loosely described as "multimedia," namely voice, imagery, and other sensory information, are becoming more commonplace. These sources of information not only require a lot of I/O bandwidth to make it onto and off of our system, but they also may require microsecond or perhaps even nanosecond response time (as for video). Does UNIX work well enough for the multimedia and networking of the future, or is it showing its age?

On the surface, there are a number of problems. By the time a process is activated, a voice response event of brief duration (such as saying the words "yes" or "no") may have come and gone unnoticed. Software delays can amount to substantial portions of time, so we lean towards customized hardware solutions as the only predictable way of guaranteeing response.

This is not the end of the story, or the end of UNIX, however, as many elements of multimedia have already been demonstrated on extant UNIX systems sans customized hardware. Clever software can "buffer ahead" in order to make up for the occasional delay that mars the synchronous "playback" of information. The 1991 Summer USENIX presented several examples in the areas of music and video which were really quite fun to watch and work with.

However, other problems remain, especially that of bandwidth. One (relatively) quick way around this is to develop new protocols that synchronize and reserve bandwidth on a network. For casual near-term needs, UNIX can function adequately in this way.

Video, however, presents an even greater problem. Even when we utilize modern data compression techniques, current data rates push hardware and software to their limit. Video grows even more intractable when we start to consider future video (HDTV) bandwidths. Finally, if we want our software to transform the video in real time, we can't work with a compressed signal. For such applications, real-time systems may be required. The question now becomes, can our workstations serve all needs well?

Real-time systems differ from time-sharing systems in that they tend to have their controls "exposed" instead of hidden. Scheduling is often controlled by a group of tasks that cooperate, and arbitrary preemption is the rule, not the exception. UNIX primitives don't always fit nicely into this world, and most existing real-time systems have different proprietary interfaces, complicating matters even further. POSIX has people working on real-time standards, so there may be hope for a standard interface yet, but not soon. As a way around the problem, a system could conceivably work both interfaces at once, by building mechanisms on top of UNIX mechanisms (this has been done by some manufacturers), or we could rewrite UNIX to provide guaranteed real-time response (as has been done by other manufacturers).

In the long run, the conflict over real-time operation versus transparent time-sharing may be an intractable problem which only a successor to UNIX may correct. For the short-term, systems which appear to "mix" real-time and time-sharing UNIX (like those systems with separate dedicated processors) will probably suffice, but the economics of the marketplace will eventually demand a less costly or less complicated solution.

-- B.J. and L.J.


Copyright © 1991, Dr. Dobb's Journal

