Porting Unix to the 386: Device Drivers


FEB92: PORTING UNIX TO THE 386: DEVICE DRIVERS

This article contains the following executables: UXDRIVER.ARC

Bill was the principal developer of 2.8 and 2.9BSD and was the chief architect of National Semiconductor's GENIX project, the first virtual memory microprocessor-based UNIX system. Prior to establishing TeleMuse, a market research firm, Lynne was vice president of marketing at Symmetric Computer Systems. They conduct seminars on BSD, ISDN, and TCP/IP. Send e-mail questions or comments to [email protected]. (c) 1991 TeleMuse.


In our November 1991 installment, we discussed I/O device initialization via automatic configuration. Unlike predetermined or static configuration, automatic configuration is a powerful mechanism that reduces the complexity of different configurations, adjusts the operating system to best make use of resources (such as mass storage), and discovers and configures dynamically changing services (such as networks). Device initialization is an important element of any operating system -- how else could you use even the simplest disk drive or floppy or communicate on a network? By incorporating an active mechanism into our design, our drivers become a stronger base upon which to build a system. Automatic configuration, or autoconfiguration, is a key item which differentiates Berkeley UNIX from other systems, and is the hallmark of any Berkeley-derived system. By thinking ahead from the very beginning (for example, during initialization), we can anticipate the scope of a porting project.

Unfortunately, programmers often become so involved in the minutiae of a particular device type that they minimize the driver interface itself, leading to inefficient or faulty design. In the case of initialization, the driver is usually written only for the singular device desired, rather than in anticipation of future requirements. Later, it is altered to suit the dozen or so variations of device eventually used. This approach almost always results in more tinkering than actual coding. However, under the gun, many programmers just glue the driver onto the side of the operating system and hope it works. In 386BSD, we considered the UNIX kernel as merely a pervasive "interface," and the driver as the extension of this interface -- the last step to contact with the device itself -- instead of viewing the driver and UNIX as separate entities. Thus, we incorporated the "interface" into our driver design from the beginning and ignored the minutiae involved in driver operations which might confuse the issue.

In November we also introduced some of the vagaries of ISA devices that may be encountered in our device driver, and attempted to delineate possible conflicts that autoconfiguration might need to resolve. Here we will describe in more detail the support macros and functions of the 386BSD kernel that drivers use to work devices, such as splx() (the interrupt priority-level management) and the "interrupt vector code." We will also continue to build upon our "UNIX as interface" paradigm -- the kernel's interface to drivers -- by studying the mechanisms by which we can extend the kernel's grasp through the driver.

Finally, we examine some important "points of light" (sorry George -- we don't have a thousand) in our sample drivers, including the console, disk, and clock interrupt drivers, especially with respect to BSD autoconfiguration and interfaces. We'll also discuss the basic structure of these drivers, minimal requirements, and extending the functionality through procedures such as disk labels.

Device Drivers Needed for the Basic Kernel

As a reminder, our goal was to create a "basic" system that could ultimately compile itself, so we could use it to accelerate the progress of the port. (The alternative, cross-support, is considerably more tedious and error prone because of the communications burden.)

The drivers in the basic kernel must provide a root file system on a hard disk drive, some kind of terminal function, and a rescheduling clock (because UNIX is a multiprogramming system). Though a fully functional system will require more than this, we can build a self-supporting system and refine it later.

Because we desire 386BSD to be available to a broad audience, our choice of devices is relatively fixed. Our standard 386 ISA PC contains an AT-styled disk controller and a garden-variety 40-Mbyte disk, the same as hundreds of thousands of PCs everywhere. A standard PC keyboard and display adapter are our terminal devices, and our rescheduling clock is implemented from the onboard programmable timer on the PC's motherboard.

Contents of the Device Driver: A Dumping Ground?

Device drivers frequently suffer from middle-age spread. They accumulate features from the past, both bad and good, and grow to gigantic proportions. After a while, they become accumulations of baggage -- "dumping grounds" for unchallenged code.

For example, one megafirm's research lab was stymied in trying to cudgel a driver for a special disk and controller. Unfortunately, the lab's entire budget was shot on servers built with these drivers, and the researchers were therefore committed to seeing the project through to completion. Over the years, the driver had grown to be an obscure 3000-plus line monument that never worked. Yet the firm clung to it in the vague hope that with just one more line, it might all work out.

Finally, the company hired an outside expert in drivers. In a week, a completely new driver was written, tested, and completed, using (tiny) fragments of the old driver that were isolated and incrementally proven. The new, clearly written driver specifically implemented only the features needed by the server, and was a fraction of the size. Because it used a "minimalist" approach, the critical portions of the driver stood out in detail. The "crisis" came to a deterministic conclusion. (The servers went online.)

The moral here is to distrust anything overbuilt and underjustified. And when in doubt, simplify, simplify, simplify. With our early drivers, we must enforce Spartan discipline with respect to "featurism." We want simple and direct code that provides basic functionality. This is not to say we will be devoid of any features -- some flexibility is needed because our equipment is not uniform. For example, though the disk controllers on ISA PCs are almost identical, the disk drives (for example, their capacities and geometries) are quite different. We would also like to support common portions of different display adapters, so that if we need to run on an MDA in a pinch, we can. Finally, we would like to use display editors and buffer kernel error printouts on separate screens. All in all, though not elaborate, our needs are something more than stone knives and bearskins.

It's not our intention to expand the content of these drivers much. They represent the "default" case of bare minimum support. Drivers targeted specifically at a device (say, the VGA) will also be the appropriate place for more elaborate functionality (such as bitmap and color palette support), and they will autoconfigure ahead of the default display driver.

Haven't We Met Somewhere Before?

Many drivers are essentially "copies" of other drivers, because one controller generally looks pretty much like another. However, when the framework of the driver is incompletely or incorrectly replicated, new bugs are introduced.

Terminal drivers are alike. In fact, the early UNIX terminal drivers were so similar that the "pseudo" or "super-driver" tty.c was created just to share the common code. Likewise, the 386BSD kernel contains support routines for disk and network devices to minimize driver code replication.

Display and Keyboard Driver

The Display/Keyboard driver (cons.c, available electronically; see page 3) provides two kinds of output. For user processes, it filters character I/O from the keyboard and to the display screen through the tty terminal driver. For the kernel, it provides a character output routine used by the kernel to print disaster or panic messages. Multiple virtual screen support is implemented to allow separate screens for the kernel, user session, and editors. A tiny terminal emulator is present to allow vi, emacs, or jove editors to run on the basic kernel.

Hard Disk Driver

The hard disk driver (wd.c and wdreg.h, also available electronically; see page 3) supports the AT-style, programmed I/O disk controllers (WD100[2347]). The driver reads a data structure called a "disk label" from a known sector (in this case, the second sector of the disk). This allows the driver to configure itself for arbitrary drives, because it consults the drive itself to learn how the drive should be used before any other transfers are attempted. By doing this, one driver covers all possible disk drives (and there are many) -- in other words, one size fits all! Someone, however, must craft such a disk label and use a program to tack it on before the disk can otherwise be used. As you might guess, we get into a "chicken and egg" situation here -- we need the disk label to be on the drive before we can write it on the drive! This is not a problem in practice, because all drives have at least the first 17 sectors in common (that is, the first track of the smallest drive). So we use a default disk label that corresponds to the smallest drive, and overwrite that "logically" when we label the disk to give it its own identity.
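The default-label fallback can be sketched in a few lines of C. This is only an illustration under stated assumptions: struct mini_label and label_from_sector() are hypothetical names, the layout is far smaller than the real BSD disk label, and only the magic value is borrowed from BSD's DISKMAGIC.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512
#define LABEL_MAGIC  0x82564557u   /* BSD DISKMAGIC; the layout below is simplified */

/* Hypothetical, much-reduced label (the real one carries many more fields). */
struct mini_label {
    uint32_t magic;
    uint32_t ncylinders;
    uint32_t ntracks;      /* heads */
    uint32_t nsectors;     /* sectors per track */
};

/* Conservative default: just the first 17-sector track that every drive has. */
static const struct mini_label default_label = { LABEL_MAGIC, 1, 1, 17 };

/*
 * Interpret the contents of the label sector.  On success the on-disk label
 * is copied to *out and 0 is returned; otherwise *out keeps the default
 * label and -1 is returned, so the drive is still usable for labeling.
 */
int label_from_sector(const uint8_t sector[SECTOR_SIZE], struct mini_label *out)
{
    struct mini_label tmp;

    *out = default_label;
    memcpy(&tmp, sector, sizeof tmp);
    if (tmp.magic != LABEL_MAGIC || tmp.ntracks == 0 || tmp.nsectors == 0)
        return -1;         /* unlabeled or damaged: stay with the default */
    *out = tmp;
    return 0;
}
```

A driver built this way can always reach the first track of an unlabeled drive, which is all the labeling program needs in order to write the real label.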

Clock Driver

The clock driver (Listing One, page 93) is the simplest driver in some ways. We merely need to tickle the 386 ISA motherboard's timer device to generate clock pulses and gate to the interrupt control unit every cycle. The interrupt itself will enter the kernel's machine-independent clock processing routine, hardclock(), for everything else.

We will discuss this in detail when we describe process scheduling. We'll also see how hardclock() postpones work to a software interrupt clock processing routine called softclock(), where the work does not degrade real-time response.

Driver Operations

To get the feel of how the system uses drivers, we need to look at the functional interfaces between 386BSD and its drivers from the perspective of the system. Note that devices may fit into one of many different arrangements when interacting with the system:

Configuration.

During system device autoconfiguration, the system probe()s for the existence of a device and attempts to attach() it to the system for possible use via the driver. If the device has subdevices, it attaches each of these slave()s successively. Because each device/driver combination takes up resources that may interfere or interact with other resources, it's the responsibility of the driver(s) to resolve conflicts.
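A minimal sketch of the probe-then-attach walk follows. The table layout, the field names, and the simple "first claimant wins" port check are assumptions made for illustration; they are not the 386BSD structures, and real conflict resolution covers interrupts and memory as well as I/O ports.

```c
#include <stddef.h>

/* Hypothetical autoconfiguration record for one candidate device. */
struct drv {
    const char *name;
    int  (*probe)(int ioport);    /* nonzero if the device answers */
    void (*attach)(int ioport);   /* bind the device to its driver; may be NULL */
    int  ioport;
    int  alive;                   /* set once the device is attached */
};

/* Has an earlier, already attached entry claimed this I/O port? */
static int port_in_use(const struct drv *tab, size_t upto, int ioport)
{
    for (size_t i = 0; i < upto; i++)
        if (tab[i].alive && tab[i].ioport == ioport)
            return 1;
    return 0;
}

/* Probe each table entry in order and attach those that respond.
 * Returns the number of devices attached. */
int autoconf(struct drv *tab, size_t n)
{
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        tab[i].alive = 0;
        if (port_in_use(tab, i, tab[i].ioport))
            continue;             /* resource conflict: skip this entry */
        if (tab[i].probe == NULL || !tab[i].probe(tab[i].ioport))
            continue;             /* nothing answered at this address */
        if (tab[i].attach != NULL)
            tab[i].attach(tab[i].ioport);
        tab[i].alive = 1;
        found++;
    }
    return found;
}

/* Stand-in probe routines so the walk can be exercised without hardware. */
int probe_present(int ioport) { (void)ioport; return 1; }
int probe_absent(int ioport)  { (void)ioport; return 0; }
```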

Normal Devices.

Most devices are accessed as files so the system can interact with them. During normal use, one needs to open() the device, read() or write() to exchange data with it, select operating modes via ioctl(), and ultimately close() the device. From the perspective of the system, these events satisfy a given need and do not necessarily completely resemble the semantics of ordinary UNIX files. They are similar, but far from exact; that's why they are called "special files."

For example, driver writers are often surprised when they try to use device open and close routines to increment and decrement a reference count, respectively. Many UNIX implementations don't preserve a one-to-one relationship here; the "closes," for instance, can outnumber the "opens" by a fair amount (as with a disk driver). The reason is that the kernel may view the device through many aliases. The point of the open/close routines is to present the semantics of what should be done to the device to put it in a consistent state for the requested action. Another surprise is that some drivers only read/write in units of integral record size, because they operate with the restrictions of the device underlying the driver. (For example, the hard disk controller only operates with integral sector size records.)

Just as much of the file paradigm matches a given device, the open/close/read/write mechanisms should fit nicely with a record-oriented device, even if it's just a unit record device. Shifts in operating mode should be selected by a different device filename, by raw or block device partitions, or by ioctl modes, so that each corresponds to a different view or organization of the same information.
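The two surprises described above can be made concrete with a small, hypothetical device switch. The names (struct devsw, the xx_ routines) and the in-memory "device" are assumptions for illustration only. Note that open and close set a state rather than count references, and that read refuses transfers that are not a whole number of sectors.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SECSIZE 512
#define NUNITS  2

/* Hypothetical device switch entry: one function pointer per operation. */
struct devsw {
    int (*d_open)(int minor);
    int (*d_close)(int minor);
    int (*d_read)(int minor, size_t off, char *buf, size_t len);
};

static struct { int opened; char data[4 * SECSIZE]; } units[NUNITS];

static int xx_open(int minor)
{
    if (minor < 0 || minor >= NUNITS)
        return ENXIO;
    units[minor].opened = 1;      /* a state, not a count: extra opens are harmless */
    return 0;
}

static int xx_close(int minor)
{
    if (minor < 0 || minor >= NUNITS)
        return ENXIO;
    units[minor].opened = 0;      /* extra closes are harmless too */
    return 0;
}

static int xx_read(int minor, size_t off, char *buf, size_t len)
{
    if (minor < 0 || minor >= NUNITS || !units[minor].opened)
        return ENXIO;
    if (off % SECSIZE != 0 || len % SECSIZE != 0)
        return EINVAL;            /* the device only moves whole sectors */
    if (off > sizeof units[minor].data || len > sizeof units[minor].data - off)
        return EINVAL;            /* transfer runs past the end of the unit */
    memcpy(buf, units[minor].data + off, len);
    return 0;
}

const struct devsw xx_devsw = { xx_open, xx_close, xx_read };
```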

XXintr().

Devices with interrupts need a means of asynchronously informing the device driver of the event. Typically, this involves using as small a routine as possible to minimize time spent with interrupts masked out. The interrupt-driven portion of the driver is the "bottom," or lower part; portions that run on the kernel stack of the process that has the driver open are the "top," or upper part. In common use, the top part initiates a device operation which causes one to n interrupts to ultimately occur. The top portion then sleep()s, and eventually the interrupt routine supplies a wakeup() to allow the top half to finish processing the request.
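The division of labor between the two halves can be modeled as a tiny state machine. This is a user-space sketch under assumptions: the xx_ names and the softc layout are hypothetical, and the places where a kernel driver would call sleep() and wakeup() are marked only by comments and a counter.

```c
enum xx_state { XX_IDLE, XX_BUSY, XX_DONE };

/* Hypothetical per-device state shared by the top and bottom halves. */
struct xx_softc {
    enum xx_state state;
    int status;     /* completion status latched by the interrupt routine */
    int wakeups;    /* counts the points where wakeup() would be called */
};

/* Top half: begin an operation.  Returns -1 if one is already in flight. */
int xx_top_start(struct xx_softc *sc)
{
    if (sc->state != XX_IDLE)
        return -1;
    sc->state = XX_BUSY;
    /* ...program the device here; the kernel top half would then sleep()... */
    return 0;
}

/* Bottom half: kept as short as possible, it only records the outcome. */
void xx_intr(struct xx_softc *sc, int status)
{
    if (sc->state != XX_BUSY)
        return;                   /* stray interrupt: nothing is waiting */
    sc->status = status;
    sc->state = XX_DONE;
    sc->wakeups++;                /* the kernel would call wakeup() here */
}

/* Top half, resumed after the wakeup: collect the result.
 * Returns -1 if the operation has not completed (keep sleeping). */
int xx_top_finish(struct xx_softc *sc, int *status)
{
    if (sc->state != XX_DONE)
        return -1;
    *status = sc->status;
    sc->state = XX_IDLE;
    return 0;
}
```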

Special Use.

Beyond the more obvious device driver entry points, other operations may be less clear.

To provide a means for loading/unloading the block buffer cache used in implementing the complex UNIX file system, the strategy() routine of mass storage device drivers encapsulates the methodology for bounds-checking the requested transfer. strategy() first enqueues the transfer, then starts the I/O request for the lead item in the queue. Thus, it is possible to sort the queue so the resulting transactions are conducted in an order that minimizes a disk drive's head movement (thus reducing the time spent seeking on the disk).
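A sketch of that sequence (bounds check, sorted enqueue, start the lead request) is given below. The structures and xx_ names are hypothetical and much reduced. The sort is a simple one-way elevator written for this example: requests at or beyond the block now in service join the current sweep in ascending order, and requests behind it wait for the next sweep. It is not the kernel's own sorting routine.

```c
#include <stddef.h>

struct buf {                 /* hypothetical, reduced I/O request */
    long blkno;              /* starting block */
    long nblks;              /* length in blocks */
    int  error;
    struct buf *next;
};

struct diskq {
    struct buf *head;        /* lead request: the one in service */
    long nblocks;            /* partition size in blocks */
    int  active;             /* controller busy */
};

/* Insert bp without disturbing the lead request (one-way elevator order). */
static void q_insert(struct diskq *q, struct buf *bp)
{
    struct buf *p = q->head;
    long cur;

    if (p == NULL) { bp->next = NULL; q->head = bp; return; }
    cur = p->blkno;
    if (bp->blkno < cur)     /* behind the head: skip the current sweep */
        while (p->next && p->next->blkno >= cur)
            p = p->next;
    /* keep ascending order within the chosen sweep */
    while (p->next && p->next->blkno <= bp->blkno &&
           (bp->blkno < cur || p->next->blkno >= cur))
        p = p->next;
    bp->next = p->next;
    p->next = bp;
}

static void xx_start(struct diskq *q)
{
    if (q->active || q->head == NULL)
        return;
    q->active = 1;           /* a real driver programs the controller here */
}

int xx_strategy(struct diskq *q, struct buf *bp)
{
    if (bp->blkno < 0 || bp->nblks <= 0 || bp->nblks > q->nblocks - bp->blkno) {
        bp->error = 1;       /* transfer falls outside the partition */
        return -1;
    }
    bp->error = 0;
    q_insert(q, bp);
    xx_start(q);
    return 0;
}

/* Called when the lead request completes; returns it and starts the next. */
struct buf *xx_done(struct diskq *q)
{
    struct buf *bp = q->head;

    if (bp == NULL)
        return NULL;
    q->head = bp->next;
    bp->next = NULL;
    q->active = 0;
    xx_start(q);
    return bp;
}
```

With block 500 in service, later requests for blocks 900, 100, and 300 are served in the order 900, 100, 300, so the head sweeps in one direction rather than bouncing back and forth.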

Two other driver entry points are of note to disk device drivers: psize(), which returns the size (in blocks) of a partition that the system uses to dynamically determine the size of swap space; and dump(), which saves a snapshot of physical memory on a partition of the disk (usually the swap space) if the system crashes, so we can find the cause of the crash.

With all special devices, the select() routine provides the inner primitives of the 386BSD select() system call, which scans a file for activity. The mmap() routine maps, on behalf of the mmap() system call, the device's I/O memory address space into a portion of the calling process's address space. Although this is the common way a user process (such as an X server) maps in a bitmap display's frame buffer, it can also be used to map in other kinds of device memory for direct manipulation by a user process (such as the shared memory of a DSP chip).

Network Devices.

Network devices interface to the system differently from other devices. They either send or receive a packet of information from a computer network. The packets don't go directly to a process, but instead interact with the protocol modules that implement the necessary processing of a packet. In a sense, the model here is more akin to "stimulus-response" than the file abstraction of special devices. It may be that protocol processing bounces the incoming message out again without it ever reaching a user process (as is the case with a redirected packet), or coalesces it with another (as is the case with a transport layer segment), or drops it (in the case of a redundant or corrupted packet).

These protocol modules are internal to the kernel, and they process the link-level packets that the packet drivers send/receive. The network packet driver is concerned with recognizing which kind of link or network-level protocol the packet should be sent to, as well as how to address packets from these different protocols to the link address of the destination host. As with Ethernet, thousands of network protocols can simultaneously use the same wire without interfering with each other.

These devices have no filename associated with them; instead, they have a name built into the driver itself. A network driver is configured for operation by passing it parameters (such as its address) via an init() entry point. (It does this by attaching itself into the protocol.) From that point on, protocols may choose to direct outbound packets to the device, solely on the basis of address. (That is, interfaces are selected not by name but by their ability to reach other machines.) Such packets are passed via the output() routine, which wraps a packet from a given protocol into a form suited to the device's requirements, tacks on the appropriate link-layer address, and enqueues the transfer. Because many device classes do this in an identical fashion, a support function is available in the kernel (in the case of Ethernet devices, ether_output()) that implements the common code. The lead packet on the queue is then passed to the driver's start() routine, which passes the packet to the device and reclaims the temporary storage assigned to the packet.
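The output path can be sketched as two small routines: one that wraps a payload in an Ethernet header and enqueues it, and one that hands the lead frame to the device and frees its slot. The names demo_ether_output() and demo_start(), the fixed-size queue, and the flat frame buffer are assumptions for illustration; the kernel's ether_output() works on its own packet storage and also resolves the destination link address, which is simply passed in here. Padding of short frames is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6
#define ETHER_HDR_LEN  14        /* destination + source + type */
#define ETHER_MAX_DATA 1500
#define ETHERTYPE_IP   0x0800
#define SENDQ_MAX      8

struct frame { size_t len; uint8_t bytes[ETHER_HDR_LEN + ETHER_MAX_DATA]; };

/* Hypothetical fixed-size send queue (a ring of frame slots). */
struct sendq { struct frame slot[SENDQ_MAX]; int head, count; };

/* Wrap the payload in a link-level header and enqueue it.  -1 means dropped. */
int demo_ether_output(struct sendq *q,
                      const uint8_t dst[ETHER_ADDR_LEN],
                      const uint8_t src[ETHER_ADDR_LEN],
                      uint16_t type, const void *payload, size_t len)
{
    struct frame *f;

    if (len > ETHER_MAX_DATA || q->count == SENDQ_MAX)
        return -1;
    f = &q->slot[(q->head + q->count) % SENDQ_MAX];
    memcpy(f->bytes, dst, ETHER_ADDR_LEN);
    memcpy(f->bytes + ETHER_ADDR_LEN, src, ETHER_ADDR_LEN);
    f->bytes[12] = (uint8_t)(type >> 8);     /* type field is big-endian */
    f->bytes[13] = (uint8_t)(type & 0xff);
    memcpy(f->bytes + ETHER_HDR_LEN, payload, len);
    f->len = ETHER_HDR_LEN + len;
    q->count++;
    return 0;
}

/* Pass the lead frame to the "device" buffer and reclaim its queue slot.
 * Returns the number of bytes handed over, or 0 if nothing was sent. */
size_t demo_start(struct sendq *q, uint8_t *devbuf, size_t devlen)
{
    struct frame *f;
    size_t n;

    if (q->count == 0)
        return 0;
    f = &q->slot[q->head];
    if (f->len > devlen)
        return 0;
    n = f->len;
    memcpy(devbuf, f->bytes, n);
    q->head = (q->head + 1) % SENDQ_MAX;
    q->count--;
    return n;
}
```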

Packet reception is a simpler process. Upon interrupt, a packet is extracted from the device in the interrupt routine, placed in a freshly allocated portion of temporary storage, and matched to a given protocol by means of its incoming address and form. It is then enqueued on the input queue of the appropriate protocol, where it will be processed after the conclusion of all remaining interrupt-level code. Because this processing is common to many drivers, we also have an ether_input() routine that Ethernet drivers share, just as they share the common output routine.
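The matching step can be sketched as a lookup on the Ethernet type field. Everything named here (demo_ether_input(), struct inq, the queue limit) is hypothetical, and the queues only count packets rather than hold them; the two type codes are the standard ones for IP and ARP.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHER_HDR_LEN 14
#define ETHERTYPE_IP  0x0800
#define ETHERTYPE_ARP 0x0806
#define INQ_MAX       4          /* assumed per-protocol queue limit */

enum proto { PROTO_NONE, PROTO_IP, PROTO_ARP, PROTO_COUNT };

/* Hypothetical protocol input queue: tracks depth and drops only. */
struct inq { int len; int drops; };

/* Classify a received frame and enqueue it for its protocol.
 * Returns the protocol chosen, or PROTO_NONE if the frame was discarded. */
enum proto demo_ether_input(struct inq qs[PROTO_COUNT],
                            const uint8_t *frame, size_t len)
{
    enum proto p;
    uint16_t type;

    if (len < ETHER_HDR_LEN)
        return PROTO_NONE;               /* runt frame: no header to read */
    type = (uint16_t)((frame[12] << 8) | frame[13]);
    switch (type) {
    case ETHERTYPE_IP:  p = PROTO_IP;  break;
    case ETHERTYPE_ARP: p = PROTO_ARP; break;
    default:            return PROTO_NONE;   /* no protocol wants it */
    }
    if (qs[p].len >= INQ_MAX) {
        qs[p].drops++;                   /* queue full: drop the packet */
        return PROTO_NONE;
    }
    qs[p].len++;
    /* The protocol itself runs later, after interrupt-level code finishes. */
    return p;
}
```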

To gain access to the device while it is operating or to change operating modes, fetch statistics, and so forth, an ioctl() entry point allows utility programs to manipulate the device (by name).

Next Month

Many devices work on an interrupt-driven basis -- they signal an asynchronous event by generating an exception, which tells the processor to come and service them. To support this need, we must have the ability to enter, exit, and mask various processor interrupts. This topic is fairly complex, deserving detailed discussion, and will be taken up next month.



[LISTING ONE]


/* [Excerpted from /sys/i386/include/param.h] */
   ...
#ifndef __ORPL__
/* Interrupt Group Masks  */
extern  u_short __highmask__;   /* interrupts masked with splhigh() */
extern  u_short __ttymask__;    /* interrupts masked with spltty() */
extern  u_short __biomask__;    /* interrupts masked with splbio() */
extern  u_short __netmask__;    /* interrupts masked with splimp() */
extern  u_short __protomask__;  /* interrupts masked with splnet() */
extern  u_short __nonemask__;   /* interrupts masked with splnone() */

asm("   .set IO_ICU1, 0x20 ; .set IO_ICU2, 0xa0 ");

/* adjust priority level to disable a group of interrupts */
#define __ORPL__(m) ({ u_short oldpl, msk; \
    msk = (m);              /* evaluate the argument exactly once */ \
    asm volatile (" \
    cli ;                   /* modify interrupts atomically */ \
    movw    %1, %%dx ;      /* get mask to OR in */ \
    inb $ IO_ICU1+1, %%al ; /* get low order mask */ \
    xchgb   %%dl, %%al ;    /* switch the old with the new */ \
    orb %%dl, %%al ;        /* finally, OR it in! */ \
    outb    %%al, $ IO_ICU1+1 ;     /* and stuff it back where it came */ \
    inb $ 0x84, %%al ;      /* post it & handle write recovery */ \
    inb $ IO_ICU2+1, %%al ; /* next, get high order mask */ \
    xchgb   %%dh, %%al ;    /* switch the old with the new */ \
    orb %%dh, %%al ;        /* finally, OR it in! */ \
    outb    %%al, $ IO_ICU2+1 ;     /* and stuff it back where it came */ \
    inb $ 0x84, %%al ;      /* post it & handle write recovery */ \
    movw    %%dx, %0 ;      /* return old mask */ \
    sti                     /* allow interrupts again */ " \
    : "=&g" (oldpl)         /* return values */ \
    : "g" (msk)             /* arguments */ \
    : "ax", "dx"            /* registers used */ \
    ); \
    oldpl;                  /* return the "old" value */ \
})
/* force priority mask to a set value */
#define __SETPL__(m)    ({ u_short oldpl, msk; \
    msk = (m);              /* evaluate the argument exactly once */ \
    asm volatile (" \
    cli ;                   /* modify interrupts atomically */ \
    movw    %1, %%dx ;      /* get mask to set */ \
    inb $ IO_ICU1+1, %%al ; /* get low order mask */ \
    xchgb   %%dl, %%al ;    /* switch the old with the new */ \
    outb    %%al, $ IO_ICU1+1 ;     /* and stuff it back where it came */ \
    inb $ 0x84, %%al ;      /* post it & handle write recovery */ \
    inb $ IO_ICU2+1, %%al ; /* next, get high order mask */ \
    xchgb   %%dh, %%al ;    /* switch the old with the new */ \
    outb    %%al, $ IO_ICU2+1 ;     /* and stuff it back where it came */ \
    inb $ 0x84, %%al ;      /* post it & handle write recovery */ \
    movw    %%dx, %0 ;      /* return old mask */ \
    sti                     /* allow interrupts again */ " \
    : "=&g" (oldpl)         /* return values */ \
    : "g" (msk)             /* arguments */ \
    : "ax", "dx"            /* registers used */ \
    ); \
    oldpl;                  /* return the "old" value */ \
})
#define splhigh()   __ORPL__(__highmask__)
#define spltty()    __ORPL__(__ttymask__)
#define splbio()    __ORPL__(__biomask__)
#define splimp()    __ORPL__(__netmask__)
#define splnet()    __ORPL__(__protomask__)
#define splsoftclock()  __ORPL__(__protomask__)
#define splx(v) ({ u_short val; \
    val = (v); \
    if (val == __nonemask__) (void) spl0(); /* zero is special */ \
    else (void) __SETPL__(val); \
})
#endif /* __ORPL__ */
   ...
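The priority-level macros are meant to be used in matched pairs: raise the level, touch the shared data, then restore whatever level was in force before. The following user-space sketch imitates that pattern with an ordinary variable standing in for the interrupt controller's mask registers. All sim_ names are hypothetical, and the two mask values are assumptions chosen for illustration (bit 14 for the AT disk interrupt, bit 1 for the keyboard); a real kernel fills in its group masks as devices are configured.

```c
#include <stdint.h>

uint16_t sim_icu_mask;                    /* stands in for both ICU mask registers */

static const uint16_t sim_biomask = 0x4000;   /* assumed: disk on IRQ 14 */
static const uint16_t sim_ttymask = 0x0002;   /* assumed: keyboard on IRQ 1 */

/* OR a group of interrupts into the mask; return the previous mask. */
uint16_t sim_orpl(uint16_t m)
{
    uint16_t old = sim_icu_mask;
    sim_icu_mask = (uint16_t)(old | m);
    return old;
}

/* Force the mask to a saved value; return the previous mask. */
uint16_t sim_setpl(uint16_t m)
{
    uint16_t old = sim_icu_mask;
    sim_icu_mask = m;
    return old;
}

#define sim_splbio()  sim_orpl(sim_biomask)
#define sim_spltty()  sim_orpl(sim_ttymask)
#define sim_splx(v)   ((void)sim_setpl(v))

int queue_len;        /* data shared between a top half and an interrupt routine */

/* Top-half routine: update the shared data with disk interrupts masked. */
void enqueue_request(void)
{
    uint16_t s = sim_splbio();   /* raise: disk interrupts now held off */
    queue_len++;                 /* critical section */
    sim_splx(s);                 /* restore the caller's level, not zero */
}
```

Because the restore uses the saved value rather than clearing the mask, the routine can be called safely from code that has already raised the level for another group.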

[LISTING TWO]

LISTING TWO IS CURRENTLY UNAVAILABLE


Copyright © 1992, Dr. Dobb's Journal

