Porting Unix to the 386: a Stripped-Down Kernel

386BSD's basic kernel incorporates a unique "recursive" paging feature that leverages resources and reduces complexity.

July 01, 1991
URL:http://www.drdobbs.com/parallel/porting-unix-to-the-386-a-stripped-down/184408583

Figure 2

Figure 3

Figure 4(a)

Figure 4(b)


Onto the initial utilities

This article contains the following executables: 386BSD.791

William Frederick Jolitz and Lynne Greer Jolitz

Bill was the principal developer of 2.8 and 2.9BSD and was the chief architect of National Semiconductor's GENIX project, the first virtual memory microprocessor-based UNIX system. Prior to establishing TeleMuse, a market research firm, Lynne was vice president of marketing at Symmetric Computer Systems. They conduct seminars on BSD, ISDN, and TCP/IP. Send e-mail questions or comments to [email protected]. Copyright (c) 1991 TeleMuse.

Much has been made of the preparations we have required before we could embark on our present project. While that's all well and good, at some point we really would like to get on with our adventure and start the main assault -- the kernel itself. Our roundabout development of tools and equipment allowed us to scope out the weak points in the 386BSD specification, with the added bonus of enhancing our experience and confidence. By following a disciplined set of guidelines and procedures, we minimized one of the most demoralizing activities of all -- trying to build our system without any idea as to where the bugs (or failure modes) lie, especially those enormously irritating compiler bugs induced by driver implementation bugs.

Now we arrive at the point at which we would like to create a "stripped-down" kernel. At this stage of our work, our primary concern is with the machine-dependent portions of the kernel that install it into the position to execute processes (via the bootstrap procedure) and prepare the system for initialization of the minimum machine-independent portions of the kernel (processes, files, and pertinent tables).

Our 386BSD kernel is a kind of "virtual machine" (not to be confused with the "virtual" in "virtual memory"), where functions underlie other functions transparently. When the system is initialized, it can use portions that require little direction to initialize even larger portions. Thus, this virtual machine assembles itself tool by tool, much like a set of Russian dolls. The machine-dependent kernel initialization is the innermost of the dolls -- the kernel of the kernel around which all is built. The next outer layer will then be built by the kernel's main( ) procedure (to be discussed later), which in turn initializes higher-level portions of the kernel.

While our basic approach toward "wiring" the 386 for operation with the machine-independent BSD kernel is similar to that of our standalone system (see DDJ March 1991), the details are now very important. In fact, we've changed so much since our discussion of the 386BSD specification (DDJ January 1991) that even the specification needs to be revised in several key areas such as the virtual memory system and per-process data structure. In addition, the most recent versions of 386BSD (less than a month old) incorporate the unique feature of the 386 architecture in a form of "recursive" paging which not only leverages resources to the hilt, but also reduces complexity enormously. (See text box "Brief Notes: 386BSD Recursive Paging.")

The Basic Structure of the UNIX Kernel

The structure of the BSD UNIX system is akin to that of an onion. Consisting of layer upon layer, the outside layers of the BSD onion are those processes visible to the computer "user," while the inner layers hide processes the user needn't see, such as those relating to the hardware. (This can also be called the "Almond Roca" kernel, if you prefer sweets.)

The operating system kernel lies in the innermost layer. Its primary responsibility is to provide the appropriate level of utility services upon which other programs and facilities are built. The kernel itself consists of an inner "machine-dependent" portion and an outer "machine-independent" portion. The center of the onion could be considered the raw hardware itself.

In UNIX parlance, the kernel is typically divided into the "high kernel" and the "low kernel." The high kernel is concerned with UNIX abstractions, such as files, processes, and other related objects. The low kernel, in contrast, is concerned with the functionality of the kernel -- how to implement the abstractions with machine-dependent mechanisms for operation.


To some degree, all operating systems are designed with this basic "onion" model in mind. However, the designers of competing systems spend a great deal of time determining what items belong in a given layer. Unlike computer systems networking, which has the ISO OSI layer model, operating systems design has yet to see agreement on an ideal model.

Many operating systems prior to UNIX did not precisely delineate the operating system and the user programs, which resulted in quite a wide variation in layering. Some operating systems (such as VMS, RSX, and OS/370) have thousands of different entry points and functions -- many chosen on an ad hoc basis. For example, some user programs would call directly into the operating system at a point known to be past a register-save sequence, because the writer of the program would assume that it didn't cause a problem and might even speed up the program slightly. Even nonuniformity within operating systems can occur, such as when a devotee of one particular system adds a facility which relies on a system call differing radically from the rest of the system. In these cases, the layering is blurred between the user application program and the given operating system -- not surprising considering the various ways that the same effect can be achieved.

UNIX, a fundamentalist "return to the basics" approach, was a philosophical as well as design issue. Unlike these other systems mentioned, UNIX has a very small number of system calls (typically fewer than a couple hundred), and, as such, must leverage them for maximal operation. This "simplicity" of design can be found throughout its structure. In fact, a suspect subsystem within UNIX itself is often branded as "unlike UNIX" due to nonmodular or clumsy design. Ironically, this has been the case with software that has been part of UNIX for years and widely used.

Part of the reason UNIX adherents (and its designers) appear to be "zealots" of the minimalist view is that the pressure to add "just one more" system call is quite great, and this one area alone has become a point of highly charged and subjective debate as to where to draw the line. This is one reason why a single UNIX "standard" has yet to emerge -- the lack of consensus on this and other crucial issues.

Incremental Strategy

Despite the "purity of essence" debates, UNIX has grown like a weed. (Any undesired plant is a weed, and one could say the same about UNIX, at least initially -- just ask DEC or IBM or Apple.) It has grown because the ever-increasing hunger for applications, and the functional infrastructure needed to support them, to simplify or enhance work is insatiable. Doubtlessly, UNIX will continue to grow in size and popularity (although some of us would prefer it grow in a graceful and planned manner). However, there are times when the "essence" of UNIX must be examined and understood, such as when a native port is conducted. By restricting UNIX functions via conditional compilation, we can work on making the core of the kernel functional. Once the core is functional, remaining portions can be added incrementally. This incremental methodology allows us to backtrack when errors or malfunctions occur. In addition, we always have recourse to the previous version if necessary.

Composing the Basic Minimal UNIX Kernel

What constitutes a minimal UNIX kernel? This varies according to the kind of port desired and resources available. For example, one alternative plan we almost selected involved using the Network FileSystem (NFS) instead of working with the hard disk. If we had chosen that approach, code for implementing an NFS client, along with the networking code, would have been a mandatory component of our minimal port, while the disk driver and related support would have been relegated to a less-important role.

Since we are concentrating on the machine-dependent portions of the minimal kernel, we must pare down considerably what is required. For 386BSD, we opted for a traditional port (see DDJ March 1991) that relied on a hard disk, a console interface (via the keyboard and display) and the process reschedule clock (via the interval timer). All network protocol and related system services (interprocess communications) were removed through conditional compilation. Any extended functionality in the main body of the kernel meant to accelerate operations (for example, macros, hash lookups, short circuit evaluation) was also avoided -- after all, it makes no sense to improve the speed of something that does not even run to begin with. Also, algorithm improvement is not always a machine-independent phenomenon.

The point in generating the "tiniest" kernel imaginable is to simplify the port. At this stage, we never expected to run something this small as a complete "production" system. As we incrementally added subsystems to our minimal kernel, we got a clearer understanding of the impact of each on the kernel. Even within this small system, redundant code and interfaces occurred. As such, a small amount of patience in this area always pays a handsome dividend later.

Our minimal kernel was created by adding conditional compilation statements (#ifdefs) to the BSD kernel source code to defeat the subsystems for networking, TCP/IP protocols, routing, NFS, interprocess communications (other than pipe), user process debugging, and the related services on which they depend. In addition, since we only needed drivers for disk, display, keyboard, and process scheduling clock, we could scale down the drivers and omit autoconfiguration code. Once this was operational, fleshing out the drivers and adding back support for running debuggers caused the kernel to grow considerably.
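As a rough illustration of this kind of paring-down, the sketch below shows how #ifdef blocks can drop whole subsystems from a build. The option symbols (INET, NFS, KERNEL_DEBUG) and the function compiled_subsystems( ) are hypothetical placeholders chosen for this example; they are not taken from the 386BSD sources.

```c
#include <assert.h>

/* Hypothetical option symbols (INET, NFS, KERNEL_DEBUG): a minimal build
 * leaves all of them undefined; defining one, for example with -DNFS,
 * compiles that subsystem back in. */

/* Records the name of every subsystem compiled into this build in out[]
 * and returns the count.  out[] needs room for at least six entries. */
int compiled_subsystems(const char *out[])
{
    int n = 0;

    assert(out != 0);

    /* Always present: the drivers the minimal port depends on. */
    out[n++] = "disk";
    out[n++] = "console";
    out[n++] = "clock";

#ifdef INET
    out[n++] = "tcpip";     /* networking and TCP/IP protocols */
#endif
#ifdef NFS
    out[n++] = "nfs";       /* NFS client */
#endif
#ifdef KERNEL_DEBUG
    out[n++] = "debug";     /* user process debugging support */
#endif
    return n;
}
```

Compiled with no options defined, the function reports only the three drivers. Adding a subsystem back is then a matter of defining one symbol and rebuilding, which fits the incremental approach described above.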

With all the concern these days over "bloated" kernels, with the consequent support, extensibility, and other problems, it is instructional to examine a sample listing of what can be considered a "stripped-down" 150-Kbyte kernel; see Figure 1(a), page 85. (By the way, by abiding by the rules outlined earlier and by using only the drivers necessary for basic functionality, our early initial 386BSD kernel was less than 100 Kbytes in size -- and was both debuggable and extensible.) As an example of how this differs from a production system, Figure 1(b), page 85, contains the same breakdown for a more recent system (using a derived MACH virtual memory system, NFS, TCP/IP, multiple disk and Ethernet controllers, and other features added).

How Can You Be in Two Places at Once...?

By design (DDJ January 1991), we want our operating system kernel to run at the top of the virtual address space (currently, location 0xfe000000) as in Figure 2. However, our PC memory is mapped into the lower portion of the address space before memory management is turned on. Thus, our bootstrap program must load the kernel program into low physical memory to run, even though the kernel has all of its absolute addresses directed to the top, where no physical memory is present!

For the short run, the kernel executes code which manually compensates for this problem, especially in the case of the data operands. Code operands are stored as relative offsets that work regardless of location (so-called "PIC" or Position Independent Code). PIC coding can be quite cumbersome (see Listing One, page 85, from "start" to "begin"). Fortunately, the actual amount of code required to operate in this fashion is small -- just enough to enable our memory relocation hardware (the MMU).

As we recall (see DDJ January 1991), the 386 MMU utilizes a "two-level" paging scheme in order to determine the physical page frame number -- the actual address of physical memory underneath the virtual address. This mechanism works by splitting the incoming virtual address into three parts: 10 bits of page table directory index, 10 bits of page table index, and 12 bits of offset within a page. The page table directory is a single page of physical memory that facilitates allocation of page table space by breaking it up into 4-Mbyte chunks of linear address space per each of its 1024 PDEs (Page Directory Entries), which determine the location of underlying page tables in physical memory. Each PDE-addressed page of a page table contains 1024 PTEs (Page Table Entries). A PTE is similar in form and function to a PDE. The major difference between a PDE and a PTE is one of hierarchy: A PDE selects the physical page frame of PTEs while a PTE selects the physical page frame for the desired reference. Once the frame offset least-significant address bits are obtained, the final address is determined. This two-level mechanism is quite elaborate, but it elegantly allows for the sparse allocation of address space, so that the whole address space or even all of the address space mapping information need not be present. In contrast, a one-level mapping scheme would require 4 Mbytes of real memory per task for mapping alone -- too much even for many modern systems.
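The three-way split of a virtual address can be sketched in a few lines of C. The macro and function names here are illustrative assumptions, not identifiers from the 386BSD sources; the last function simply redoes the arithmetic behind the 4-Mbyte figure for a one-level scheme.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12u      /* 12 bits of offset: 4-Kbyte pages */
#define TABLE_SHIFT  22u      /* each PDE spans 4 Mbytes of address space */
#define INDEX_MASK   0x3ffu   /* 10-bit index: 1024 entries per table */
#define OFFSET_MASK  0xfffu

/* Top 10 bits: which page directory entry covers this address. */
uint32_t pd_index(uint32_t va) { return (va >> TABLE_SHIFT) & INDEX_MASK; }

/* Middle 10 bits: which entry within the selected page table. */
uint32_t pt_index(uint32_t va) { return (va >> PAGE_SHIFT) & INDEX_MASK; }

/* Low 12 bits: byte offset within the final page frame. */
uint32_t page_offset(uint32_t va) { return va & OFFSET_MASK; }

/* Reassembles an address from its three fields. */
uint32_t make_va(uint32_t pdi, uint32_t pti, uint32_t off)
{
    assert(pdi <= INDEX_MASK && pti <= INDEX_MASK && off <= OFFSET_MASK);
    return (pdi << TABLE_SHIFT) | (pti << PAGE_SHIFT) | off;
}

/* Bytes of 4-byte entries a flat, one-level table would need in order to
 * map the whole 4-Gbyte space: 2^20 pages times 4 bytes each. */
uint32_t one_level_table_bytes(void)
{
    uint32_t pages = 1u << (32u - PAGE_SHIFT);
    return pages * (uint32_t)sizeof(uint32_t);
}
```

For the kernel base address 0xfe000000 mentioned earlier, pd_index( ) yields directory slot 1016, with a page table index and offset of zero.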

To run our kernel program with the MMU enabled, we must build page tables that describe the physical location of memory storing the program, as well as the mode of access allowed to each "page" (otherwise known as the "allocation granularity" of the MMU) as in Figure 3. In addition, the MMU must have a page directory table describing where it can find all of the possible 1024 page table pages which allow it to access any part of its 4-gigabyte address space. In a way, the 386 MMU acts almost as a "coprocessor" to the 386 CPU, interpreting two data structures (page directories and page tables) on behalf of the CPU and translating virtual addresses into physical ones. (The MIPS RISC MMU is actually referred to as a coprocessor.)

While our code dutifully builds our page tables and page directory table to make the above mapping work (see Listing One, near comment "build page tables"), we are still left with a dilemma: How do we turn on mapping running at a "high" virtual address when we are still running at a "low" address? In other words, how do we make the CPU switch from one address to another? Well, the answer could depend on an understanding of many hardware-related issues (such as the size of instruction prefetch queue, instruction pipelining, address translation overlap, multiprocessor arbitration, and so forth), while avoiding irregularities or non-standard approaches. For example, some systems programmers have gotten away with murder over the years by assuming in the software that the processor already has the instruction after the MMU is enabled (not always true, mind you), instead of verifying it as they should in all cases. This situation is analogous to people who dive over three lanes of traffic at the last second just to make a freeway exit. Most of the time it works, but occasionally it doesn't. In this case, Superposition of Matter (unlike radio waves) doesn't hold (although Total Conservation of Mass does hold). A disaster, possibly a crash, occurs -- so it goes with systems as well.

Not that this area will get any easier, either, what with the even more esoteric versions of the "N"86 on the drawing board. (By the way, has anyone trademarked "N"86 yet?) One must anticipate where the technology will be taken. For example, one might need to assume that the instruction queue always consists of at least one instruction. As technology shifts, features which are relied upon by even the most careful of programmers can be abandoned for better ones (for example, a fully pipelined instruction execution with pipelined MMUs that update address space state for branch prediction use).

In this case, the appropriate path around all of these hazards is simply to map the bottom of address space to the same location -- or "double-map" the same program. This way, it will work regardless of what the hardware designers do. We could also have replicated the page tables to map the bottom of address space where the kernel program begins, but we would end up duplicating the same page tables used at the top of the address space, and that would be very wasteful. Instead, we just double-map the bottom page directory entry (the one that maps the bottom 4 Mbytes of address space) to the page directory entry that maps the kernel. Once accomplished, the MMU can now be enabled.
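The double-mapping step amounts to copying one page directory entry into another. The following is a minimal sketch in C rather than the assembly of Listing One; the names double_map, NPDE, PG_V, and PG_RW are assumptions made for this example, and the page directory is modeled as a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define NPDE       1024u         /* entries in the page directory */
#define PDE_SHIFT  22u           /* each entry maps 4 Mbytes */
#define KERNBASE   0xfe000000u   /* kernel's linked virtual address */
#define PG_V       0x001u        /* 386 "present" bit */
#define PG_RW      0x002u        /* 386 "writable" bit */

/* Points the kernel's directory slot at the kernel page table, then
 * aliases slot 0 to the very same page table.  The bottom 4 Mbytes and
 * the 4 Mbytes at KERNBASE now translate to the same physical pages, so
 * the instruction stream stays valid at the moment paging is enabled. */
void double_map(uint32_t pd[NPDE], uint32_t kernel_pt_phys)
{
    uint32_t kslot = KERNBASE >> PDE_SHIFT;    /* slot 1016 */

    assert((kernel_pt_phys & 0xfffu) == 0);    /* page tables are page aligned */
    pd[kslot] = kernel_pt_phys | PG_V | PG_RW;
    pd[0] = pd[kslot];                         /* no second page table needed */
}
```

Because both slots hold the same entry, no page table is duplicated; the cost of the alias is a single 4-byte directory entry.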

Now that we are "running virtual," we need to leave the "bottom" of address space by jmping from low to high (see DDJ March 1991). However, we must avoid PIC from the jmp instruction, or else "jmp high" will keep us low. Because "clever" assemblers and loaders transparently assume that PIC code is desired, a quick solution is to push a constant on the stack and execute a return (ret) instruction.

UNIX as a Subroutine Call

Once you are running "high," you need to install a stack. The stack should be placed in the process's portion of address space. This way, it can be easily changed when we move from a process or task to another, because each process must have its own kernel stack. In a way, 386BSD functions like a subroutine call for a user process, with its own internal calls stacking on this separate stack, unlike the "jump to system" program approach used on systems such as TOPS-10.

Keeping each process's kernel stack at the same virtual location works well when using a single thread of execution processes, but is not advisable for multithreaded execution. For the multithreaded version of 386BSD, "lightweight" processes will require multiple kernel stacks and will be allocated out of kernel global virtual memory as needed.

Configuring the 386 for UNIX Operation

The kernel program's address space established, we must "wire" the 386 processor hardware to the kernel interfaces and set initial conditions for the system, including interrupt and exception processing, user process address space definition, and preparation for context switching. All of the facilities must be set from the earliest point possible, because before we leave the kernel to execute a single user process instruction, we are already running multitasking. In fact, we will even use multitasking and exception processing as we initialize the system! This really should come as no surprise, as software aficionados can never resist the temptation to use double-duty or recursive code -- or even inscrutable self-referential code. As a result, we page-fault the page tables to allocate them to be used in paging the first process.

Segments Revisited

There is an old expression that says: "If one is used to using a hammer, everything else becomes a nail." The 386 hammer of choice, segments, must be used to pound together the rest of the architecture no matter what. In other words, even if segments are not desired, they must be allocated and initialized. Because we have chosen to achieve most of our functionality via the paging mechanism (see DDJ January 1991), we try to minimize the need for the segmentation mechanism, but allow for future extensibility in such areas as dynamically growable tables (for example, ldt, gdt, ...). Currently, init386( ) relies only on a constant table (see Listing Two, page 86). In addition, separate descriptors for data and code segments are required as they use different attribute sets, even though they exactly alias each other. This allows for some interesting effects, such as allowing code to be executed out of the stack!
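To see how a code descriptor and a data descriptor can "exactly alias each other" while differing only in attributes, consider the sketch below. It packs the 386's 8-byte segment descriptor layout with shifts and masks rather than the C bitfields the kernel uses, and the function and macro names are hypothetical. The attribute values shown are the conventional ones for flat, kernel-privilege segments and are not copied from Listing Two.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_CODE_ER  0x9au   /* present, DPL 0, code, execute/read */
#define ACC_DATA_RW  0x92u   /* present, DPL 0, data, read/write */
#define FLAGS_4K_32  0xcu    /* 4-Kbyte granularity, 32-bit default size */

/* Packs base, 20-bit limit, access byte, and flag nibble into the
 * scattered fields of a 386 segment descriptor. */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint32_t access, uint32_t flags)
{
    uint64_t d;

    assert(limit <= 0xfffffu && access <= 0xffu && flags <= 0xfu);
    d  = (uint64_t)(limit & 0xffffu);             /* limit bits 15..0  */
    d |= (uint64_t)(base & 0xffffffu) << 16;      /* base bits 23..0   */
    d |= (uint64_t)access << 40;                  /* type, DPL, present */
    d |= (uint64_t)((limit >> 16) & 0xfu) << 48;  /* limit bits 19..16 */
    d |= (uint64_t)flags << 52;                   /* granularity, size */
    d |= (uint64_t)(base >> 24) << 56;            /* base bits 31..24  */
    return d;
}

/* Recovers the segment base from a packed descriptor. */
uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xffffffu) | ((uint32_t)(d >> 56) << 24);
}

/* Recovers the 20-bit segment limit from a packed descriptor. */
uint32_t descriptor_limit(uint64_t d)
{
    return (uint32_t)(d & 0xffffu) | ((uint32_t)((d >> 48) & 0xfu) << 16);
}
```

A code descriptor and a data descriptor built with base 0 and limit 0xfffff cover the same 4 Gbytes and differ only in the access byte, which is why the same bytes can be reached as either instructions or data.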

The approach outlined thus far has permitted coding to proceed in C. This is actually quite important, as the descriptor code and bitfield is obscure enough without any additional complications, such as additional coding in assembler. We actually could have worked in the reverse manner (invoking segments and then paging) by using the descriptors to relocate user space and run the kernel "low," but this would have significantly increased bookkeeping overhead when going between user and kernel without offering any clear advantage. Also, the construction of segment descriptors in assembly language can be quite tedious if done in this manner.

Interrupts and Exceptions

In the standalone system (see DDJ March 1991) we built a Global Descriptor Table to reinitialize segmentation. Now, we must follow the same techniques developed for the standalone system to build a Global Descriptor Table for descriptors used primarily by the kernel, and a Local Descriptor Table used primarily by the user tasks. (The local table can later be made "relative" per task if desired.) An Interrupt Descriptor Table must also be built that instructs the processor to execute special assembly code stubs (located at IDTVEC(XXX) entry points) within the kernel (SEL_KPL) when any exceptions are triggered. Low-level code in each bus adapter's support code, used to wire-down all possible interrupts, is called to catch unintended interrupts prior to the configuring of devices by the kernel. And finally, through the use of special assembly language entry points (see DDJ March 1991), the descriptor tables are loaded, with any user or kernel exceptions caught on the fly.

Up to this point, we've just assumed that sufficient memory would be present to satisfy our needs, but this should not continue. Instead, we must probe and check the amount of memory present against recorded values in the system's configuration memory. If a value appears unusually large, we choose the lesser of the two. If both seem questionable, however, we revert to our minimum assumption -- 640 Kbytes of base memory only.
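One way to read this policy is sketched below. The text does not say what makes a value "questionable," so the plausibility test here, including the 16-Mbyte upper bound, is purely an assumption for illustration, as are the names choose_memory_kb and plausible.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_MEM_KB        640u             /* base memory fallback from the text */
#define MAX_PLAUSIBLE_KB  (16u * 1024u)    /* hypothetical sanity bound */

/* Assumed test: a size is believable if it lies between the base-memory
 * minimum and the sanity bound. */
static int plausible(uint32_t kb)
{
    return kb >= MIN_MEM_KB && kb <= MAX_PLAUSIBLE_KB;
}

/* Reconciles the probed memory size with the value recorded in the
 * system's configuration memory.  Both sizes are in Kbytes. */
uint32_t choose_memory_kb(uint32_t probed_kb, uint32_t recorded_kb)
{
    int p = plausible(probed_kb);
    int r = plausible(recorded_kb);
    uint32_t result;

    if (!p && !r)
        result = MIN_MEM_KB;               /* both questionable: fall back */
    else if (p && r)
        result = probed_kb < recorded_kb ? probed_kb : recorded_kb;
    else
        result = p ? probed_kb : recorded_kb;  /* trust the believable one */

    assert(result >= MIN_MEM_KB);
    return result;
}
```

Taking the lesser of two believable values errs on the side of never touching memory that is not there, at the cost of occasionally leaving some real memory unused.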

Memory in hand, we next initialize the virtual memory system that will manage both physical memory and virtual address space. The routine pmap_bootstrap( ) scales resources and assumptions based on available physical memory, and synchronizes the arrangement of the early "pmap" or physical map of the system to its internal data structures. The Mach virtual memory system, portions of which are incorporated into BSD, is split into machine-dependent (pmap) and machine-independent (vm_map) parts.

The remaining portion of init386( ) creates a way for a user process to enter the kernel and an initial process state through which a user process can be run for the first time. Because processes inherit these characteristics, this "zeroth" process state in effect initializes all subsequent sibling processes!

Upon executing init386( ) and main( ) (which initializes the kernel), the system is prepared for running the user process. Listing One (near the end) contains code which moves us into user space to execute the very first process. Little work is done to the user process itself -- instead, the exception mechanisms are relied upon to supply memory and instructions to it. This occurs from the point of initialization, because the init process that starts the system itself is faulted in incrementally.

Summary: What Do We Have Now?

As you may have noted, over the course of this series we have been building upon our previous work as we head toward our goal, and increasingly we are relying on an understanding of our growing set of tools. And, at the same time, we have recently changed some of our code to accommodate some of the exciting new developments at Berkeley. With all the changes occurring, even those very familiar with this software can become somewhat "lost."

At this stage, it is important to go back and recall the perspective we tried to establish on this project. We compared it to that of climbing a mountain, and we carefully outlined and prepared for all the problems we thought we would encounter. However, even with all the preparation we could muster, we've still had to be fast on our feet. Paths which we had carefully mapped out just six months ago are wiped away by an avalanche -- removed forever by the force of innovation. Work and time and effort have been tossed aside as we've been forced to adopt new approaches, not only to keep up with the group, but occasionally to set the pace (as in recursive paging). And finally, as our system grows, the complexity grows as well, and with it the blizzards of bugs and incompatibilities that occasionally blind and dispirit.

And now, after months of effort, we have developed the barest of kernels. We will continue on with our kernel development, but we now have the makings of the "Basic Kernel." Key elements of our Basic Kernel (multitasking, processes, device drivers, executing the first process, games tests, paging, and swapping) are crucial to establish a working understanding of 386BSD and Berkeley UNIX. We look forward to seeing you on the trail with us.

Brief Notes: 386BSD Recursive Paging

When we began this project, many of our notions were based on prior experience in that we emphasized the similarities of the 386 to other machines while discounting its idiosyncrasies. Like a new car owner fumbling in the wrong place for the headlight switch and cursing it for having moved from the dashboard to the steering column, we mainly tried to just get 386BSD running. However, once we felt "settled in," we decided to see if we could take it to the limit. Consequently, the last few months have been like motoring with Mr. Toad -- with the onslaught of software changes, "wild" seems too weak a word.

In keeping with the CSRG goals for the upcoming 4.4BSD release, one major task was to migrate 386BSD to a virtual memory system derived from CMU's MACH operating system. While this decision was appropriate, a major problem relating to the 386 arose almost immediately; when implemented as designed by CMU on the 386, the virtual memory system swallowed copious quantities of virtual address space for the operating system -- space which is needed for user processes. Most of this space was gobbled up maintaining address maps of all in-memory processes' page tables, so that the system could maintain access to them at all times should they become active.

At the same time, we had been getting somewhat tired of how process page table mapping was handled in the traditional BSD virtual memory system; since the page tables themselves were in physical memory (for use by the 386's MMU), we needed pages of page tables to map the page tables themselves before we could modify them. As you can guess, this increased the amount of "bookkeeping" overhead considerably, especially when interacting features are added (such as shared libraries and shared memory). We hoped there would be a better way.

An ideal virtual memory system design gives access to information on the virtual-to-physical translation process (and the converse) very quickly. However, while the information is there, right on the same piece of silicon and working at warp speed doing just this, there is no way via software to invoke the mechanism other than through transparent processes -- creating the "virtual memory" effect. (Don't expect any change in this area any time soon, either, because for many hardware design reasons this is a nontrivial addition.) As a consequence, the systems programmer must encode a tedious subroutine with a sole purpose to emulate the same translation process in software that is performed in a fraction of the time of a single instruction by part of the hardware.

On the 386, page tables and page directories appear very similar -- in fact, they're identical in contents (see top of Listing Four, page 90). Turning the usual paging paradigms upside down, we examined what would occur if the page tables and page directories were viewed as if they were software data structures that could be connected in different ways. For example, frequently we want to find the page table entry associated with a given page. Obviously, the MMU does just this as it processes an ordinary reference to a page and continues on to "indirect" through the PTE to get to a page. Upon reflection, we noticed that if we arranged it so that the MMU goes through the same entry twice, we could get it to "use up" an indirection. This would allow us to reference the PTE itself instead of the underlying page. This approach, while unorthodox and confusing to the uninitiated, turned out to be quite feasible.

Thus was born the "recursive" page map technique -- one guaranteed to annoy the zealot and amaze the skeptic. Based on the "self-referential" model, the 386BSD recursive page table mechanism undergoes two iterations in the process of obtaining the PTE itself. In the first iteration, see Figure 4(a), a reference is made to the PTE of the page table directory. In the second iteration, see Figure 4(b), a reference is made to the PDE that maps the page directory itself. In other words, by "pointing" a page directory entry at the page directory itself, we have created a window in our virtual address map that consecutively maps all of the address space's page tables (in corresponding order as well) without the need for another page of memory. In addition, this technique also maps the page directory itself, as a consequence of the second indirection, through the "recursive" page directory element.

To return to the previously mentioned example, we can find a PTE for a page with the macro vtopte( ) as seen in Listing Four, which consists of just a shift and an add. Additional macros here demonstrate the simplicity this method gives the virtual memory system.
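Since Listing Four is not reproduced here, the following is a hedged sketch of the idea rather than the actual 386BSD macros. It models the MMU's two-level walk in software over a tiny simulated physical memory, then checks that a shift-and-add address computed through a self-referencing directory slot lands on the PTE itself. The names mmu_walk, build_demo, pd_window, the slot number passed as rslot, and this function form of vtopte are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NENT     1024u         /* entries per page directory or page table */
#define PG_V     0x001u        /* present bit */
#define PG_FRAME 0xfffff000u   /* frame-address bits of an entry */
#define NFRAMES  4u            /* tiny simulated physical memory */

static uint32_t phys[NFRAMES][NENT];   /* frame n sits at physical n << 12 */

/* Software model of the two-level walk the MMU performs in hardware. */
uint32_t mmu_walk(uint32_t pd_phys, uint32_t va)
{
    uint32_t pde, pte;

    pde = phys[pd_phys >> 12][va >> 22];
    assert((pde & PG_V) && (pde >> 12) < NFRAMES);
    pte = phys[pde >> 12][(va >> 12) & 0x3ffu];
    assert(pte & PG_V);
    return (pte & PG_FRAME) | (va & 0xfffu);
}

/* With directory slot rslot pointing back at the directory, every PTE
 * appears in one linear array starting at rslot << 22: a shift and an add. */
uint32_t vtopte(uint32_t rslot, uint32_t va)
{
    return (rslot << 22) + ((va >> 12) << 2);
}

/* Going through the recursive slot twice exposes the directory itself. */
uint32_t pd_window(uint32_t rslot)
{
    return (rslot << 22) + (rslot << 12);
}

/* Frame 0: page directory.  Frame 1: one page table.  Frame 2: data page.
 * Maps va to frame 2 and returns the directory's physical address. */
uint32_t build_demo(uint32_t rslot, uint32_t va)
{
    assert(rslot < NENT && (va >> 22) != rslot);
    phys[0][rslot] = (0u << 12) | PG_V;               /* recursive entry */
    phys[0][va >> 22] = (1u << 12) | PG_V;
    phys[1][(va >> 12) & 0x3ffu] = (2u << 12) | PG_V;
    return 0u;
}
```

Walking an ordinary address ends at the data page, while walking the address returned by vtopte( ) ends inside the page table at the entry for that page: the extra trip through the self-referencing slot has "used up" one level of indirection.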

The benefits of this technique are compelling:

1. We were able to reuse an existing data structure -- the page directories and page tables (contrary to the intentions of the hardware designers, by the way) -- thus reducing the memory cost of a process.
2. We were able to reduce the number of items we need to track per process, thus reducing bookkeeping overhead.
3. This method allowed us to conveniently mediate the cost of process page tables. (The process page tables belong to and don't clutter up the operating system kernel space.)
4. We were able to increase the locality of reference, such that the processor cache performance is enhanced.
5. We were able to provide a more convenient model of memory for the operating system to exploit.

Particularly relevant to items 2 and 5, by writing the 386 machine-dependent support routines in a recursive manner, we were able to make the code perform double-duty in a module a fraction of the size of previous 386 versions. In addition, the multiprocessor version of 386BSD may derive some benefit from this technique when used to hierarchically share page directory regions. It is rare when you find a method that conceptually fits so well and as a side-effect improves performance.

This technique is not limited to the 386 by any means; other two-level paging MMU microprocessors (68030, Clipper, 32532, 88000, ...) theoretically can leverage this technique, though probably with less benefit. Because most of these processors have separate address spaces for kernel and user, waste in the kernel does not rob memory from the user process as it does on the 386.

-- B.J. and L.J.

Figure 1(a):

    Minimal Kernel Breakdown (by module)
vmunix: text    data    bss    module name

        1152    32      0      clock.o
        0       500     0      conf.o
        4548    740     32     cons.o
        1508    24      0      init_main.o
        0       1212    0      init_sysent.o
        1588    28      0      kern_clock.o
        2044    12      0      kern_descrip.o
        3296    80      0      kern_exec.o
        1840    48      0      kern_exit.o
        1600    36      0      kern_fork.o
        956     0       0      kern_mman.o
        312     0       0      kern_proc.o
        1280    0       0      kern_prot.o
        1216    0       0      kern_resource.o
        3564    32      0      kern_sig.o
        684     16      0      kern_subr.o
        1808    24      0      kern_synch.o
        1864    4       0      kern_time.o
        248     0       0      kern_xxx.o
        6176    20508   0      locore.o
        5596    596     0      machdep.o
        0       148     0      param.o
        2184    84      8      subr_prf.o
        1092    72      0      subr_rmap.o
        244     0       0      subr_xxx.o
        184     72      0      swapgeneric.o
        3340    0       0      sys_generic.o
        4156    68      0      sys_inode.o
        1096    56      0      sys_process.o
        784     16      0      sys_socket.o
        2260    224     0      trap.o
        9480    516     0      tty.o
        12      204     0      tty_conf.o
        3928    4       0      tty_pty.o
        1924    0       0      tty_subr.o
        8680    1220    0      ufs_alloc.o
        3312    116     0      ufs_bio.o
        1668    0       0      ufs_bmap.o
        1248    48      0      ufs_disksubr.o
        416     0       0      ufs_fio.o
        3968    68      0      ufs_inode.o
        436     0       0      ufs_machdep.o
        2048    0       0      ufs_mount.o
        6020    220     0      ufs_namei.o
        2288    208     0      ufs_subr.o
        7100    112     0      ufs_syscalls.o
        0       620     0      ufs_tables.o
        0       152     0      vers.o
        2280    48      0      vm_drum.o
        2964    52      0      vm_machdep.o
        4364    180     0      vm_mem.o
        8280    188     0      vm_page.o
        2056    20      0      vm_proc.o
        3060    24      0      vm_pt.o
        2788    72      0      vm_sched.o
        528     0       0      vm_subr.o
        1052    32      0      vm_sw.o
        1836    44      0      vm_swap.o
        1536    152     0      vm_swp.o
        2048    68      0      vm_text.o
        3768    1492    1024   wd.o
totals: 145708  30492   1064

Figure 1(b):

    Fully Loaded Kernel Breakdown (by module)
vmunix: text    data    bss     module
        0       4       0       af.o
        592     16      0       autoconf.o
        844     0       0       clock.o
        2584    168     0       com.o
        0       640     0       conf.o
        4096    676     40      cons.o
        540     132     0       dead_vnops.o
        1440    28      0       device_pager.o
        3180    152     48      fd.o
        1264    140     0       fifo_vnops.o
        2812    12      0       if.o
        2600    12      0       if_ether.o
        1056    24      18      if_ethersubr.o
        464     0       0       if_loop.o
        5044    12      12      if_ne.o
        3184    16      0       if_sl.o
        3852    12      4       if_we.o
        2844    4       0       in.o
        356     0       0       in_cksum.o
        1684    0       0       in_pcb.o
        12      320     0       in_proto.o
        1496    12      0       init_main.o
        0       1532    0       init_sysent.o
        0       468     0       ioconf.o
        2056    68      0       ip_icmp.o
        4564    60      48      ip_input.o
        2616    0       0       ip_output.o
        1372    4       0       isa.o
        1204    16      0       kern_acct.o
        1280    4       0       kern_clock.o
        3184    0       0       kern_descrip.o
        3176    0       0       kern_exec.o
        1424    0       0       kern_exit.o
        996     8       4       kern_fork.o
        1204    0       0       kern_kinfo.o
        1772    0       0       kern_ktrace.o
        1028    4       0       kern_lock.o
        1892    268     0       kern_malloc.o
        796     0       0       kern_physio.o
        1180    0       0       kern_proc.o
        1844    0       0       kern_prot.o
        1140    0       0       kern_resource.o
        4172    132     0       kern_sig.o
        684     0       0       kern_subr.o
        1988    4       0       kern_synch.o
        1408    4       0       kern_time.o
        264     0       0       kern_xxx.o
        7076    684     0       locore.o
        4684    192     0       machdep.o
        552     0       0       mem.o
        708     44      4       mfs_vfsops.o
        656     132     0       mfs_vnops.o
        1600    0       0       nfs_bio.o
        1020    0       0       nfs_node.o
        21700   36      0       nfs_serv.o
        7748    152     0       nfs_socket.o
        1040    144     21672   nfs_srvcache.o
        10284   40      4       nfs_subs.o
        1956    72      80      nfs_syscalls.o
        2996    40      1       nfs_vfsops.o
        21304   424     0       nfs_vnops.o
        348     12      16      npx.o
        0       152     0       param.o
        6308    16      0       pmap.o
        2908    4       36      radix.o
        164     8       0       raw_cb.o
        1072    36      0       raw_ip.o
        812     0       0       raw_usrreq.o
        2304    8       0       route.o
        4552    116     0       rtsock.o
        2584    60      0       slcompress.o
        2296    180     0       spec_vnops.o
        716     0       0       subr_log.o
        1764    8       0       subr_prf.o
        888     0       0       subr_rmap.o
        340     0       0       subr_xxx.o
        5456    28      0       swap_pager.o
        0       40      0       swapvmunix.o
        3344    0       0       sys_generic.o
        0       0       0       sys_machdep.o
        904     56      0       sys_process.o
        604     20      0       sys_socket.o
        228     0       0       tcp_debug.o
        5820    8       0       tcp_input.o
        1896    16      0       tcp_output.o
        1504    12      0       tcp_subr.o
        832     60      0       tcp_timer.o
        1620    8       0       tcp_usrreq.o
        2912    0       0       trap.o
        9488    316     0       tty.o
        1864    204     0       tty_compat.o
        12      204     0       tty_conf.o
        3452    4       0       tty_pty.o
        1988    0       0       tty_subr.o
        504     0       0       tty_tty.o
        1980    36      0       udp_usrreq.o
        9644    0       0       ufs_alloc.o
        2012    0       0       ufs_bmap.o
        1424    0       0       ufs_disksubr.o
        3756    0       0       ufs_inode.o
        1668    12      0       ufs_lockf.o
        3832    4       0       ufs_lookup.o
        4572    20      0       ufs_quota.o
        732     0       0       ufs_subr.o
        0       620     0       ufs_tables.o
        3948    64      0       ufs_vfsops.o
        8264    524     0       ufs_vnops.o
        620     0       0       uipc_domain.o
        2672    64      4       uipc_mbuf.o
        8       176     0       uipc_proto.o
        6164    0       0       uipc_socket.o
        3184    24      0       uipc_socket2.o
        5520    0       0       uipc_syscalls.o
        3320    32      0       uipc_usrreq.o
        0       232     0       vers.o
        3644    0       0       vfs_bio.o
        1108    4       0       vfs_cache.o
        0       24      0       vfs_conf.o
        1776    0       0       vfs_lookup.o
        3940    44      0       vfs_subr.o
        7544    0       0       vfs_syscalls.o
        1684    20      0       vfs_vnops.o
        3524    0       0       vm_fault.o
        1964    20      0       vm_glue.o
        84      0       0       vm_init.o
        1848    0       0       vm_kern.o
        944     0       308     vm_machdep.o
        7624    16      0       vm_map.o
        384     20      0       vm_meter.o
        3196    4       0       vm_mmap.o
        3588    16      0       vm_object.o
        2500    32      0       vm_page.o
        824     8       0       vm_pageout.o
        636     20      0       vm_pager.o
        1160    0       0       vm_swap.o
        416     0       0       vm_unix.o
        304     0       0       vm_user.o
        2200    28      0       vnode_pager.o
        6176    1648    524     wd.o
        5252    48      9       wt.o
totals: 359636  12248   22832



[LISTING ONE]



/* locore.s: Copyright (c) 1990,1991 William Jolitz. All rights reserved.
 * Written by William Jolitz 1/90
 * Redistribution and use in source and binary forms are freely permitted
 * provided that the above copyright notice and attribution and date of work
 * and this paragraph are duplicated in all such forms.
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
 * WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
 */

/*  [Excerpted from i386/locore.s] */
#define R(s) s - KERNEL_BASE    /* relocate references until mapping enabled */

/* Per-process region virtual address space is located at the top of user
 * space, growing down to the top of the user stack [set in the "high" kernel].
 * At kernel startup time, the only per-process data we need is a kernel stack,
 * so we allocate SPAGES of stack pages for the purpose before calling the
 * kernel initialization code. */
    .data
    .globl  _boothowto, _bootdev, _cyloffset

    /* Temporary stack */
    .space 128
tmpstk:
_boothowto: .long 0     /* bootstrap options */
_bootdev:   .long 0     /* bootstrap device */
_cyloffset: .long 0     /* cylinder offset of bootstrap partition */
    .text
    .globl  start
start:
    /* arrange for a warm boot from the BIOS at some point in the future */
    movw    $0x1234, 0x472
    jmp 1f
    .space  0x500       # skip over BIOS data areas

    /* pass parameters on stack (howto, bootdev, cyloffset)
     * note: 0(%esp) is return address of bootstrap that loaded this kernel. */
1:  movl    4(%esp), %eax
    movl    %eax, R(_boothowto)
    movl    8(%esp), %eax
    movl    %eax, R(_bootdev)
    movl    12(%esp), %eax
    movl    %eax, R(_cyloffset)

    /* use temporary stack till mapping enabled to ensure it falls within map */
    movl    $R(tmpstk), %esp

    /* find end of kernel image */
    movl    $R(_end), %ecx
    addl    $NBPG-1, %ecx
    andl    $~(NBPG-1), %ecx
    movl    %ecx, %esi

    /* clear bss and memory for bootstrap page tables. */
    movl    $R(_edata), %edi
    subl    %edi, %ecx
    addl    $(SPAGES+1+1+1)*NBPG, %ecx
    #   stack + page directory + kernel page table + stack page table
    xorl    %eax, %eax  # pattern
    cld
    rep
    stosb

    /* Map Kernel--N.B. don't bother with making kernel text RO, as 386
     * ignores R/W AND U/S bits on kernel access (only valid bit works) !
     * First step - build page tables */
    movl    %esi, %ecx      # this much memory,
    shrl    $PGSHIFT, %ecx      # for this many ptes
    movl    $PG_V, %eax     #  having these bits set,
    leal    (2+SPAGES)*NBPG(%esi), %ebx #   physical address of Sysmap
    movl    %ebx, R(_KPTphys)   #    in the kernel page table,
    call    fillpt

    /* map proc 0's kernel stack into user page table page */
    movl    $SPAGES, %ecx       # for this many ptes,
    leal    1*NBPG(%esi), %eax  # physical address of stack in proc 0
    orl $PG_V|PG_URKW, %eax #  having these bits set,
    leal    (1+SPAGES)*NBPG(%esi), %ebx # physical address of stack pt
    addl    $(ptei(_PTmap)-1)*4, %ebx
    call    fillpt

    /* Construct an initial page table directory */
    /* install a pde for temporary double map of bottom of VA */
    leal    (SPAGES+2)*NBPG(%esi), %eax # physical address of kernel pt
    orl $PG_V, %eax
    movl    %eax, (%esi)

    /* kernel pde - same contents */
    leal    pdei(KERNEL_BASE)*4(%esi), %ebx # offset of pde for kernel
    movl    %eax, (%ebx)

    /* install a pde recursively mapping page directory as a page table! */
    movl    %esi, %eax      # phys address of ptd in proc 0
    orl $PG_V, %eax
    movl    %eax, pdei(_PTD)*4(%esi)

    /* install a pde to map stack for proc 0 */
    leal    (SPAGES+1)*NBPG(%esi), %eax # physical address of pt in proc 0
    orl $PG_V, %eax
    movl    %eax, (pdei(_PTD)-1)*4(%esi) # which is where per-process maps!

    /* load base of page directory, and enable mapping */
    movl    %esi, %eax      # phys address of ptd in proc 0
    orl $I386_CR3PAT, %eax
    movl    %eax, %cr3      # load ptd addr into mmu
    movl    %cr0, %eax      # get control word
    orl $0x80000001, %eax   # and let's page!
    movl    %eax, %cr0      # NOW!

    /* now running mapped */
    pushl   $begin          # jump to high mem!
    ret

    /* now running relocated at SYSTEM where the system is linked to run */
begin:
    /* set up bootstrap stack */
    movl    $_PTD-SPAGES*NBPG, %esp # kernel stack virtual address top
    xorl    %eax, %eax      # mark end of frames with a sentinel
    movl    %eax, %ebp
    movl    %eax, _PTD      # clear lower address space mapping
    leal    (SPAGES+3)*NBPG(%esi), %esi # skip past stack + page tables.
    pushl   %esi

    /* init386(startphys) main(startphys) */
    call    _init386        # wire 386 chip for unix operation
    call    _main
    popl    %eax

    /* find process (proc 0) to be run */
    movl    _curproc, %eax
    movl    P_PCB(%eax), %eax

    /* build outer stack frame */
    pushl   PCB_SS(%eax)    # user ss
    pushl   PCB_ESP(%eax)   # user esp
    pushl   PCB_CS(%eax)    # user cs
    pushl   PCB_EIP(%eax)   # user pc
    movw    PCB_DS(%eax), %ds
    movw    PCB_ES(%eax), %es
    lret            # goto user!

/* fill in pte/pde tables */
fillpt:
    movl    %eax, (%ebx)    /* stuff pte */
    addl    $NBPG, %eax /* increment physical address */
    addl    $4, %ebx    /* next pte */
    loop    fillpt
    ret
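
The page bookkeeping in Listing One is easier to follow when modeled in C. The sketch below is a hedged reading of the assembly, not part of the kernel: the OFF_ names and round_page are hypothetical, the offsets are taken from the leal operands relative to %esi, and fillpt here works on an ordinary array rather than on physical memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBPG    4096u   /* bytes per page (Listing Five) */
#define SPAGES  2u      /* kernel stack pages (Listing Five) */
#define PG_V    0x1u    /* valid bit (Listing Four) */

/* Round the end of the kernel image up to a page boundary, as the
 * addl/andl pair after "find end of kernel image" does. */
static uint32_t round_page(uint32_t addr)
{
    return (addr + NBPG - 1u) & ~(NBPG - 1u);
}

/* Offsets of the bootstrap pages from that rounded address (%esi). */
enum {
    OFF_PTD      = 0,                    /* page directory */
    OFF_STACK    = 1 * NBPG,             /* proc 0 kernel stack, SPAGES pages */
    OFF_STACK_PT = (1 + SPAGES) * NBPG,  /* page table mapping the stack */
    OFF_KERN_PT  = (2 + SPAGES) * NBPG,  /* kernel page table */
    OFF_FREE     = (3 + SPAGES) * NBPG   /* first address past these pages */
};

/* C model of fillpt: write `count` consecutive entries, each mapping the
 * next physical page with the same flag bits. */
static void fillpt(uint32_t *pt, uint32_t entry, size_t count)
{
    assert(count > 0);      /* a zero count would wrap in the "loop" instruction */
    while (count-- > 0) {
        *pt++ = entry;      /* stuff pte */
        entry += NBPG;      /* increment physical address */
    }
}
```

With SPAGES set to 2, the kernel page table sits four pages past the rounded end of the image, and the value pushed for init386 is five pages past it, matching the (SPAGES+3)*NBPG adjustment at the begin label.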

[LISTING TWO]



/* machdep.c: Copyright (c) 1989,1991 William Jolitz. All rights reserved.
 * Written by William Jolitz 7/89
 * Redistribution and use in source and binary forms are freely permitted
 * provided that the above copyright notice and attribution and date of work
 * and this paragraph are duplicated in all such forms.
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
 * WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
 */
/* [excerpted from i386/i386/machdep.c] */
/* Initialize segments & interrupt table */

#define GNULL_SEL   0   /* Null Descriptor */
#define GCODE_SEL   1   /* Kernel Code Descriptor */
#define GDATA_SEL   2   /* Kernel Data Descriptor */
#define GLDT_SEL    3   /* LDT - eventually one per process */
#define GTGATE_SEL  4   /* Process task switch gate */
#define GPANIC_SEL  5   /* Task state to consider panic from */
#define GPROC0_SEL  6   /* Task state process slot zero and up */
#define NGDT    GPROC0_SEL+1

union descriptor gdt[GPROC0_SEL+1];

/* interrupt descriptor table */
struct gate_descriptor idt[32+16];

/* local descriptor table */
union descriptor ldt[5];
#define LSYS5CALLS_SEL  0   /* forced by intel BCS */
#define LSYS5SIGR_SEL   1
#define L43BSDCALLS_SEL 2   /* notyet */
#define LUCODE_SEL  3
#define LUDATA_SEL  4

/* #define  LPOSIXCALLS_SEL 5      notyet */
struct  i386tss tss, panic_tss;

/* software prototypes -- in more palatable form */
struct soft_segment_descriptor gdt_segs[] = {
    /* Null Descriptor */
{   0x0,            /* segment base address  */
    0x0,            /* length - all address space */
    0,          /* segment type */
    0,          /* segment descriptor priority level */
    0,          /* segment descriptor present */
    0,0,
    0,          /* default 32 vs 16 bit size */
    0           /* limit granularity (byte/page units)*/ },
    /* Code Descriptor for kernel */
{   0x0,            /* segment base address  */
    0xfffff,        /* length - all address space */
    SDT_MEMERA,     /* segment type */
    0,          /* segment descriptor priority level */
    1,          /* segment descriptor present */
    0,0,
    1,          /* default 32 vs 16 bit size */
    1           /* limit granularity (byte/page units)*/ },
    /* Data Descriptor for kernel */
{   0x0,            /* segment base address  */
    0xfffff,        /* length - all address space */
    SDT_MEMRWA,     /* segment type */
    0,          /* segment descriptor priority level */
    1,          /* segment descriptor present */
    0,0,
    1,          /* default 32 vs 16 bit size */
    1           /* limit granularity (byte/page units)*/ },
    /* LDT Descriptor */
{   (int) ldt,          /* segment base address  */
    sizeof(ldt)-1,      /* length - all address space */
    SDT_SYSLDT,     /* segment type */
    0,          /* segment descriptor priority level */
    1,          /* segment descriptor present */
    0,0,
    0,          /* unused - default 32 vs 16 bit size */
    0           /* limit granularity (byte/page units)*/ },
    /* Null Descriptor - Placeholder */
{   0x0,            /* segment base address  */
    0x0,            /* length - all address space */
    0,          /* segment type */
    0,          /* segment descriptor priority level */
    0,          /* segment descriptor present */
    0,0,
    0,          /* default 32 vs 16 bit size */
    0           /* limit granularity (byte/page units)*/ },
    /* Panic Tss Descriptor */
{   (int) &panic_tss,       /* segment base address  */
    sizeof(tss)-1,      /* length - all address space */
    SDT_SYS386TSS,      /* segment type */
    0,          /* segment descriptor priority level */
    1,          /* segment descriptor present */
    0,0,
    0,          /* unused - default 32 vs 16 bit size */
    0           /* limit granularity (byte/page units)*/ },
    /* Proc 0 Tss Descriptor */
{   0,          /* segment base address  */
    sizeof(tss)-1,      /* length - all address space */
    SDT_SYS386TSS,      /* segment type */
    0,          /* segment descriptor priority level */
    1,          /* segment descriptor present */
    0,0,
    0,          /* unused - default 32 vs 16 bit size */
    0           /* limit granularity (byte/page units)*/ }};
struct soft_segment_descriptor ldt_segs[] = {
    /* Null Descriptor - overwritten by call gate */
{   0x0,            /* segment base address  */
    0x0,            /* length - all address space */
    0,          /* segment type */
    0,          /* segment descriptor priority level */
    0,          /* segment descriptor present */
    0,0,
    0,          /* default 32 vs 16 bit size */
    0           /* limit granularity (byte/page units)*/ },
    /* Null Descriptor - overwritten by call gate */
{   0x0,            /* segment base address  */
    0x0,            /* length - all address space */
    0,          /* segment type */
    0,          /* segment descriptor priority level */
    0,          /* segment descriptor present */
    0,0,
    0,          /* default 32 vs 16 bit size */
    0           /* limit granularity (byte/page units)*/ },
    /* Null Descriptor - overwritten by call gate */
{   0x0,            /* segment base address  */
    0x0,            /* length - all address space */
    0,          /* segment type */
    0,          /* segment descriptor priority level */
    0,          /* segment descriptor present */
    0,0,
    0,          /* default 32 vs 16 bit size */
    0           /* limit granularity (byte/page units)*/ },
    /* Code Descriptor for user */
{   0x0,            /* segment base address  */
    0xfffff,        /* length - all address space */
    SDT_MEMERA,     /* segment type */
    SEL_UPL,        /* segment descriptor priority level */
    1,          /* segment descriptor present */
    0,0,
    1,          /* default 32 vs 16 bit size */
    1           /* limit granularity (byte/page units)*/ },
    /* Data Descriptor for user */
{   0x0,            /* segment base address  */
    0xfffff,        /* length - all address space */
    SDT_MEMRWA,     /* segment type */
    SEL_UPL,        /* segment descriptor priority level */
    1,          /* segment descriptor present */
    0,0,
    1,          /* default 32 vs 16 bit size */
    1           /* limit granularity (byte/page units)*/ } };
/* table descriptors - used to load tables by microp */
struct region_descriptor r_gdt = {
    sizeof(gdt)-1,(char *)gdt
};
struct region_descriptor r_idt = {
    sizeof(idt)-1,(char *)idt
};
setidt(idx, func, typ, dpl) char *func; {
    struct gate_descriptor *ip = idt + idx;
    ip->gd_looffset = (int)func;
    ip->gd_selector = GSEL(GCODE_SEL,SEL_KPL);
    ip->gd_stkcpy = 0;
    ip->gd_xx = 0;
    ip->gd_type = typ;
    ip->gd_dpl = dpl;
    ip->gd_p = 1;
    ip->gd_hioffset = ((int)func)>>16 ;
}
#define IDTVEC(name)    X/**/name
extern  IDTVEC(div), IDTVEC(dbg), IDTVEC(nmi), IDTVEC(bpt), IDTVEC(ofl),
    IDTVEC(bnd), IDTVEC(ill), IDTVEC(dna), IDTVEC(dble), IDTVEC(fpusegm),
    IDTVEC(tss), IDTVEC(missing), IDTVEC(stk), IDTVEC(prot),
    IDTVEC(page), IDTVEC(rsvd), IDTVEC(fpu), IDTVEC(rsvd0),
    IDTVEC(rsvd1), IDTVEC(rsvd2), IDTVEC(rsvd3), IDTVEC(rsvd4),
    IDTVEC(rsvd5), IDTVEC(rsvd6), IDTVEC(rsvd7), IDTVEC(rsvd8),
    IDTVEC(rsvd9), IDTVEC(rsvd10), IDTVEC(rsvd11), IDTVEC(rsvd12),
    IDTVEC(rsvd13), IDTVEC(rsvd14), IDTVEC(syscall);
int lcr0(), lcr3(), rcr0(), rcr2();
int _udatasel, _ucodesel, _gsel_tss;
init386() { extern ssdtosd(), lgdt(), lidt(), lldt(), etext;
    int x;
    unsigned biosbasemem, biosextmem;
    struct gate_descriptor *gdp;
    extern int sigcode,szsigcode;
    struct pcb *pb = proc0.p_addr;
    /* initialize console */
    cninit ();
    /* make gdt memory segments */
    gdt_segs[GCODE_SEL].ssd_limit = btoc((int) &etext + NBPG);
    gdt_segs[GPROC0_SEL].ssd_base = pb;
    for (x=0; x < NGDT; x++) ssdtosd(gdt_segs+x, gdt+x);
    /* make ldt memory segments */
    ldt_segs[LUCODE_SEL].ssd_limit = btoc(UPT_MIN_ADDRESS);
    ldt_segs[LUDATA_SEL].ssd_limit = btoc(UPT_MIN_ADDRESS);
    /* Note. eventually want private ldts per process */
    for (x=0; x < 5; x++) ssdtosd(ldt_segs+x, ldt+x);
    /* exceptions */
    setidt(0, &IDTVEC(div),  SDT_SYS386TGT, SEL_KPL);
    setidt(1, &IDTVEC(dbg),  SDT_SYS386TGT, SEL_KPL);
    setidt(2, &IDTVEC(nmi),  SDT_SYS386TGT, SEL_KPL);
    setidt(3, &IDTVEC(bpt),  SDT_SYS386TGT, SEL_UPL);
    setidt(4, &IDTVEC(ofl),  SDT_SYS386TGT, SEL_KPL);
    setidt(5, &IDTVEC(bnd),  SDT_SYS386TGT, SEL_KPL);
    setidt(6, &IDTVEC(ill),  SDT_SYS386TGT, SEL_KPL);
    setidt(7, &IDTVEC(dna),  SDT_SYS386TGT, SEL_KPL);
    setidt(8, &IDTVEC(dble),  SDT_SYS386TGT, SEL_KPL);
    setidt(9, &IDTVEC(fpusegm),  SDT_SYS386TGT, SEL_KPL);
    setidt(10, &IDTVEC(tss),  SDT_SYS386TGT, SEL_KPL);
    setidt(11, &IDTVEC(missing),  SDT_SYS386TGT, SEL_KPL);
    setidt(12, &IDTVEC(stk),  SDT_SYS386TGT, SEL_KPL);
    setidt(13, &IDTVEC(prot),  SDT_SYS386TGT, SEL_KPL);
    setidt(14, &IDTVEC(page),  SDT_SYS386TGT, SEL_KPL);
    setidt(15, &IDTVEC(rsvd),  SDT_SYS386TGT, SEL_KPL);
    setidt(16, &IDTVEC(fpu),  SDT_SYS386TGT, SEL_KPL);
    setidt(17, &IDTVEC(rsvd0),  SDT_SYS386TGT, SEL_KPL);
    setidt(18, &IDTVEC(rsvd1),  SDT_SYS386TGT, SEL_KPL);
    setidt(19, &IDTVEC(rsvd2),  SDT_SYS386TGT, SEL_KPL);
    setidt(20, &IDTVEC(rsvd3),  SDT_SYS386TGT, SEL_KPL);
    setidt(21, &IDTVEC(rsvd4),  SDT_SYS386TGT, SEL_KPL);
    setidt(22, &IDTVEC(rsvd5),  SDT_SYS386TGT, SEL_KPL);
    setidt(23, &IDTVEC(rsvd6),  SDT_SYS386TGT, SEL_KPL);
    setidt(24, &IDTVEC(rsvd7),  SDT_SYS386TGT, SEL_KPL);
    setidt(25, &IDTVEC(rsvd8),  SDT_SYS386TGT, SEL_KPL);
    setidt(26, &IDTVEC(rsvd9),  SDT_SYS386TGT, SEL_KPL);
    setidt(27, &IDTVEC(rsvd10),  SDT_SYS386TGT, SEL_KPL);
    setidt(28, &IDTVEC(rsvd11),  SDT_SYS386TGT, SEL_KPL);
    setidt(29, &IDTVEC(rsvd12),  SDT_SYS386TGT, SEL_KPL);
    setidt(30, &IDTVEC(rsvd13),  SDT_SYS386TGT, SEL_KPL);
    setidt(31, &IDTVEC(rsvd14),  SDT_SYS386TGT, SEL_KPL);
#include    "isa.h"
#if NISA >0
    isa_defaultirq();
#endif
    /* load descriptor tables into 386 */
    lgdt(gdt, sizeof(gdt)-1);
    lidt(idt, sizeof(idt)-1);
    lldt(GSEL(GLDT_SEL, SEL_KPL));
    /* resolve amount of memory present so we can scale kernel PT */
    maxmem = probemem();
    biosbasemem = rtcin(RTC_BASELO)+ (rtcin(RTC_BASEHI)<<8);
    biosextmem = rtcin(RTC_EXTLO)+ (rtcin(RTC_EXTHI)<<8);
    if (biosbasemem == 0xffff || biosextmem == 0xffff) {
        if (biosbasemem == 0xffff && maxmem > RAM_END)
            maxmem = IOM_BEGIN;
        if (biosextmem == 0xffff && maxmem > RAM_END)
            maxmem = IOM_BEGIN;
    } else if (biosextmem > 0 && biosbasemem == IOM_BEGIN/1024) {
        int totbios = (biosbasemem + 0x60000 + biosextmem);
        if (totbios < maxmem) maxmem = totbios;
    } else  maxmem = IOM_BEGIN;
    /* call pmap initialization to make new kernel address space */
    pmap_bootstrap ();
    /* now running on new page tables, configured, and u/iom is accessible */
    /* make an initial tss so microp can get interrupt stack on syscall! */
    pb->pcbtss.tss_esp0 = UPT_MIN_ADDRESS;
    pb->pcbtss.tss_ss0 = GSEL(GDATA_SEL, SEL_KPL) ;
    _gsel_tss = GSEL(GPROC0_SEL, SEL_KPL);
    ltr(_gsel_tss);
    /* make a call gate to reenter kernel with */
    gdp = &ldt[LSYS5CALLS_SEL].gd;
    gdp->gd_looffset = (int) &IDTVEC(syscall);
    gdp->gd_selector = GSEL(GCODE_SEL,SEL_KPL);
    gdp->gd_stkcpy = 0;
    gdp->gd_type = SDT_SYS386CGT;
    gdp->gd_dpl = SEL_UPL;
    gdp->gd_p = 1;
    gdp->gd_hioffset = ((int) &IDTVEC(syscall)) >>16;
    /* transfer to user mode */
    _ucodesel = LSEL(LUCODE_SEL, SEL_UPL);
    _udatasel = LSEL(LUDATA_SEL, SEL_UPL);
    /* setup per-process */
    bcopy(&sigcode, pb->pcb_sigc, szsigcode);
    pb->pcb_flags = 0;
    pb->pcb_ptd = IdlePTD;
}
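
Listing Two relies on ssdtosd(), GSEL(), and LSEL(), none of which appear in the excerpt. The sketch below shows what such helpers plausibly compute, based on the 386 descriptor and selector formats. Everything named here (struct soft_desc, pack_descriptor, selector, the TYPE_ constants) is hypothetical; the struct omits the two unused fields of the original initializers, and the type values assume the hardware encoding for accessed execute/read code and accessed read/write data with the S bit set.

```c
#include <assert.h>
#include <stdint.h>

#define TYPE_CODE_ERA  27u  /* assumed: S bit + execute/read, accessed */
#define TYPE_DATA_RWA  19u  /* assumed: S bit + read/write, accessed */

/* Hypothetical stand-in for struct soft_segment_descriptor. */
struct soft_desc {
    uint32_t base;     /* segment base address */
    uint32_t limit;    /* 20-bit limit */
    unsigned type;     /* 5 bits: S bit plus 4-bit type */
    unsigned dpl;      /* privilege level, 0..3 */
    unsigned present;  /* 1 if present */
    unsigned def32;    /* 1 = 32-bit default size */
    unsigned gran;     /* 1 = limit counted in pages */
};

/* Scatter the fields into the 8-byte hardware descriptor layout. */
static uint64_t pack_descriptor(const struct soft_desc *s)
{
    uint64_t d = 0;
    assert(s->limit <= 0xFFFFFu && s->type <= 0x1Fu && s->dpl <= 3u);
    d |= (uint64_t)(s->limit & 0xFFFFu);             /* limit 15..0  */
    d |= (uint64_t)(s->base & 0xFFFFFFu) << 16;      /* base 23..0   */
    d |= (uint64_t)(s->type & 0x1Fu) << 40;          /* S + type     */
    d |= (uint64_t)(s->dpl & 3u) << 45;              /* DPL          */
    d |= (uint64_t)(s->present & 1u) << 47;          /* present      */
    d |= (uint64_t)((s->limit >> 16) & 0xFu) << 48;  /* limit 19..16 */
    d |= (uint64_t)(s->def32 & 1u) << 54;            /* D bit        */
    d |= (uint64_t)(s->gran & 1u) << 55;             /* G bit        */
    d |= (uint64_t)(s->base >> 24) << 56;            /* base 31..24  */
    return d;
}

/* Selector = index << 3 | table indicator (4 = LDT) | requested privilege. */
static unsigned selector(unsigned index, int in_ldt, unsigned rpl)
{
    return (index << 3) | (in_ldt ? 4u : 0u) | (rpl & 3u);
}
```

Assuming SEL_KPL is ring 0 and SEL_UPL is ring 3, selector(1, 0, 0) gives 0x08 for the kernel code segment, and selector(3, 1, 3) and selector(4, 1, 3) give 0x1F and 0x27 for the user code and data segments in the LDT. The split base and limit fields are the reason the listing keeps the "software prototypes" in a flat form and converts them in a loop.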

[LISTING THREE]



/* Machine dependent constants for 386.  */

/* user map constants */
#define VM_MIN_ADDRESS      ((vm_offset_t)0)
#define UPT_MIN_ADDRESS     ((vm_offset_t)0xFDC00000)
#define UPT_MAX_ADDRESS     ((vm_offset_t)0xFDFF7000)
#define VM_MAX_ADDRESS      UPT_MAX_ADDRESS

/* kernel map constants */
#define VM_MIN_KERNEL_ADDRESS   ((vm_offset_t)0xFDFF7000)
#define KPT_MIN_ADDRESS     ((vm_offset_t)0xFDFF8000)
#define KPT_MAX_ADDRESS     ((vm_offset_t)0xFDFFF000)
#define KERNEL_BASE     0xFE000000
#define VM_MAX_KERNEL_ADDRESS   ((vm_offset_t)0xFF7FF000)

/* # of kernel PT pages (initial only, can grow dynamically) */
#define VM_KERNEL_PT_PAGES  ((vm_size_t)1)

[LISTING FOUR]



/*
 * pmap.h: Copyright (c) 1990,1991 William Jolitz. All rights reserved.
 * Written by William Jolitz 12/90
 *
 * Redistribution and use in source and binary forms are freely permitted
 * provided that the above copyright notice and attribution and date of work
 * and this paragraph are duplicated in all such forms.
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
 * WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
 *
 */
/*
 * [excerpted from i386/pmap.h]
 * Recursive map version by W. Jolitz
 */

/* page directory element */
struct pde
{
unsigned int
      pd_v:1,         /* valid bit */
      pd_prot:2,      /* access control */
      pd_mbz1:2,      /* reserved, must be zero */
      pd_u:1,         /* hardware maintained 'used' bit */
      :1,         /* not used */
      pd_mbz2:2,      /* reserved, must be zero */
      :3,         /* reserved for software */
      pd_pfnum:20;      /* physical page frame number of pte's*/
};

#define   PD_MASK      0xffc00000   /* page directory address bits */
#define   PT_MASK      0x003ff000   /* page table address bits */
#define   PD_SHIFT   22      /* page directory address shift */
#define   PG_SHIFT   12      /* page table address shift */

/* page table element */
struct pte
{
unsigned int
      pg_v:1,         /* valid bit */
      pg_prot:2,      /* access control */
      pg_mbz1:2,      /* reserved, must be zero */
      pg_u:1,         /* hardware maintained 'used' bit */
      pg_m:1,         /* hardware maintained modified bit */
      pg_mbz2:2,      /* reserved, must be zero */
      pg_w:1,         /* software, wired down page */
      :1,         /* software (unused) */
      pg_nc:1,      /* 'uncacheable page' bit */
      pg_pfnum:20;      /* physical page frame number */
};

#define   PG_V      0x00000001
#define   PG_RO      0x00000000
#define   PG_RW      0x00000002
#define   PG_u      0x00000004
#define   PG_PROT      0x00000006 /* all protection bits . */
#define   PG_W      0x00000200
#define PG_N      0x00000800 /* Non-cacheable */
#define   PG_M      0x00000040
#define PG_U      0x00000020
#define   PG_FRAME   0xfffff000

#define   PG_NOACC   0
#define   PG_KR      0x00000000
#define   PG_KW      0x00000002
#define   PG_URKR      0x00000004
#define   PG_URKW      0x00000004
#define   PG_UW      0x00000006

/*
 * Page Protection Exception bits
 */

#define PGEX_P      0x01   /* Protection violation vs. not present */
#define PGEX_W      0x02   /* during a Write cycle */
#define PGEX_U      0x04   /* access from User mode (UPL) */

/*
 * Address of current address space page table maps
 * and directories.
 */
extern struct pte PTmap[], Sysmap[];
extern struct pde PTD[], PTDpde;

/*
 * virtual address to page table entry and to physical address.
 * Note: these work recursively, thus vtopte of a pte will give
 * the corresponding pde that it in turn maps into.
 */
#define   vtopte(va)   (PTmap + i386_btop(va))
#define   ptetov(pt)   (i386_ptob(pt - PTmap))
#define   vtophys(va)  (i386_ptob(vtopte(va)->pg_pfnum) | ((int)(va) & PGOFSET))
#define ispt(va)   ((va) >= UPT_MIN_ADDRESS && (va) <= KPT_MAX_ADDRESS)

/*
 * macros to generate page directory/table indices
 */

#define   pdei(va)   (((va)&PD_MASK)>>PD_SHIFT)
#define   ptei(va)   (((va)&PT_MASK)>>PG_SHIFT)
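
As a rough illustration of how the masks in Listing Four are used, the sketch below splits a virtual address into its directory and table indices, composes a page table entry from flag bits, and tests the page-fault exception bits. The function forms of pdei and ptei, and the helpers make_pte, fault_not_present, fault_on_write, and fault_from_user, are hypothetical; the table index is shifted by PG_SHIFT, the page-table shift defined in the listing.

```c
#include <assert.h>
#include <stdint.h>

#define PD_MASK   0xffc00000u  /* page directory address bits */
#define PT_MASK   0x003ff000u  /* page table address bits */
#define PD_SHIFT  22
#define PG_SHIFT  12
#define PG_V      0x00000001u
#define PG_KW     0x00000002u
#define PG_FRAME  0xfffff000u
#define PGEX_P    0x01u        /* protection violation vs. not present */
#define PGEX_W    0x02u        /* during a write cycle */
#define PGEX_U    0x04u        /* access from user mode */

/* Index of the page directory entry covering va. */
static unsigned pdei(uint32_t va) { return (va & PD_MASK) >> PD_SHIFT; }

/* Index of the page table entry covering va within its page table. */
static unsigned ptei(uint32_t va) { return (va & PT_MASK) >> PG_SHIFT; }

/* Compose a pte from a page-aligned physical address and flag bits. */
static uint32_t make_pte(uint32_t pa, uint32_t flags)
{
    assert((pa & ~PG_FRAME) == 0);  /* pa must be page aligned */
    return pa | flags;
}

/* Predicates over the page-fault error code. */
static int fault_not_present(unsigned code) { return (code & PGEX_P) == 0; }
static int fault_on_write(unsigned code)    { return (code & PGEX_W) != 0; }
static int fault_from_user(unsigned code)   { return (code & PGEX_U) != 0; }
```

For example, pdei(0xFE000000) is 1016 and ptei(0xFE003000) is 3, so the fourth page of the kernel is described by entry 3 of the page table named in directory slot 1016.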

[LISTING FIVE]



/* param.h: Copyright (c) 1989,1990,1991 William Jolitz. All rights reserved.
 * Written by William Jolitz 6/89
 * Redistribution and use in source and binary forms are freely permitted
 * provided that the above copyright notice and attribution and date of work
 * and this paragraph are duplicated in all such forms.
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
 * WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
 */
/* Machine dependent constants for Intel 386. */

#define MACHINE "i386"
#define NBPG        4096        /* bytes/page */
#define PGOFSET     (NBPG-1)    /* byte offset into page */
#define PGSHIFT     12      /* LOG2(NBPG) */
#define NPTEPG      (NBPG/(sizeof (struct pte)))
#define NBPDR       (1024*NBPG) /* bytes/page dir */
#define PDROFSET    (NBPDR-1)   /* byte offset into page dir */
#define PDRSHIFT    22      /* LOG2(NBPDR) */
#define KERNBASE    0xFE000000  /* start of kernel virtual */
#define DEV_BSIZE   512
#define DEV_BSHIFT  9       /* log2(DEV_BSIZE) */
#define CLSIZE      1
#define CLSIZELOG2  0
#define SSIZE   1       /* initial stack size/NBPG */
#define SINCR   1       /* increment of stack/NBPG */
#define SPAGES  2       /* pages of kernel stack area */

/* clicks to bytes */
#define ctob(x) ((x)<<PGSHIFT)

/* bytes to clicks */
#define btoc(x) (((unsigned)(x)+(NBPG-1))>>PGSHIFT)
#define btodb(bytes)            /* calculates (bytes / DEV_BSIZE) */ \
    ((unsigned)(bytes) >> DEV_BSHIFT)
#define dbtob(db)           /* calculates (db * DEV_BSIZE) */ \
    ((unsigned)(db) << DEV_BSHIFT)

/* Map a ``block device block'' to a file system block. This should be device
 * dependent, and will be if we add an entry to cdevsw/bdevsw for that purpose.
 * For now though just use DEV_BSIZE. */
#define bdbtofsb(bn)    ((bn) / (BLKDEV_IOSIZE/DEV_BSIZE))

/* Mach derived conversion macros */
#define i386_round_pdr(x)   ((((unsigned)(x)) + NBPDR - 1) & ~(NBPDR-1))
#define i386_trunc_pdr(x)   ((unsigned)(x) & ~(NBPDR-1))
#define i386_round_page(x)  ((((unsigned)(x)) + NBPG - 1) & ~(NBPG-1))
#define i386_trunc_page(x)  ((unsigned)(x) & ~(NBPG-1))
#define i386_btod(x)        ((unsigned)(x) >> PDRSHIFT)
#define i386_dtob(x)        ((unsigned)(x) << PDRSHIFT)
#define i386_btop(x)        ((unsigned)(x) >> PGSHIFT)
#define i386_ptob(x)        ((unsigned)(x) << PGSHIFT)
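
The conversion macros in Listing Five can be exercised on their own. The sketch below copies a few of them and wraps them in two hypothetical helpers, pages_for and slack_in_last_page; the sample size 145708 is the text total from Figure 1(a), used only as a convenient input.

```c
#include <assert.h>

#define NBPG      4096  /* bytes/page */
#define PGSHIFT   12    /* LOG2(NBPG) */
#define PDRSHIFT  22    /* LOG2(NBPDR) */

#define ctob(x)             ((x)<<PGSHIFT)
#define btoc(x)             (((unsigned)(x)+(NBPG-1))>>PGSHIFT)
#define i386_round_page(x)  ((((unsigned)(x)) + NBPG - 1) & ~(NBPG-1))
#define i386_trunc_page(x)  ((unsigned)(x) & ~(NBPG-1))
#define i386_btod(x)        ((unsigned)(x) >> PDRSHIFT)

/* Pages ("clicks") needed to hold `bytes` bytes. */
static unsigned pages_for(unsigned bytes)
{
    assert(bytes <= 0xFFFFF000u);  /* rounding up must not overflow 32 bits */
    return btoc(bytes);
}

/* Bytes left unused in the last page of a region of `bytes` bytes. */
static unsigned slack_in_last_page(unsigned bytes)
{
    return ctob(pages_for(bytes)) - bytes;
}
```

A text segment of 145708 bytes needs 36 pages and leaves 1748 bytes unused in the last one. Note that btoc rounds up while i386_btop in the listing truncates, so the two are not interchangeable when a size is not page aligned.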