DMA Controller Programming in C
Robert Watson
Robert Watson has a B.S. in Electrical and Computer Engineering from the University of Texas at Austin. He is the owner of Intelligent Tools Company, which offers a library of functions for implementing DMA and interrupt driven software on IBM PC compatible systems. He can be reached at Intelligent Tools Company, PO Box 6334, Abilene, TX 79608, phone 817-725-7455, CompuServe [72762,1735].
Direct Memory Access (DMA), the transfer of data to or from memory without direct CPU intervention, is a relatively obscure process in modern computer systems. This obscurity is probably due to the relatively few devices capable of using DMA. In spite of its obscurity, DMA offers a simple and convenient means of transferring data between I/O devices and memory. In fact, DMA can usually transfer data at much higher speeds than the more commonly used interrupt-driven transfers.
In this article I will describe some theory of DMA operation, the DMA hardware in the IBM PC and PC/AT platforms, and Virtual DMA Services. Since each platform has a unique DMA implementation, it is necessary to select one for illustration and for implementation of an example program. I have selected the IBM PC/AT platform for convenience.
The Concept of DMA
Direct memory access is a technique for the transfer of data within a computer system. The transfer can take place either between memory and an I/O port or between memory and memory. The most significant aspect of the transfer is that it occurs without the intervention of the host processor. Instead, dedicated hardware manages the data transfer, generating the necessary address and control signals on the system bus.
For the purposes of this article, I assume DMA operations take place only between memory and an I/O port. I make this assumption in the context of the IBM PC, PC/XT, and PC/AT, which are not designed for memory-to-memory transfers.
Advantages of DMA
DMA transfers offer speed advantages to a system by decreasing CPU workload. They do so by eliminating two kinds of CPU activities:
- "overhead" activities such as device status checks (polling loops to wait until a device is ready) and context switches (transfer of control to and from interrupt service routines).
- the copying of data from I/O to memory or memory to I/O.
Therefore, as an overall result of DMA transfers, other system components gain increased access to the system bus, and the CPU concentrates on tasks that it can perform more efficiently.
A Typical DMA Transfer
A typical DMA transfer is conducted as follows:
1. The host processor configures the hardware for a DMA transfer. The processor tells the DMA hardware whether the transfer will be from memory to I/O or vice versa, the length of the transfer (e.g., number of bytes or number of words to transfer), and which block of memory the transfer will involve.
2. The host system configures the I/O device (installed in the expansion bus) that will be the source or destination of the data. Like the DMA hardware, the I/O device may need to know the direction of the upcoming transfer (to or from memory) and the length of the transfer. What the I/O device needs to know depends upon the device. Often, the I/O device is user-supplied hardware that is not provided with the original equipment. The most common DMA devices are disk and network controllers.
3. When the I/O device is ready to perform a transfer operation, it signals the host DMA controller, requesting that a transfer begin. The host controller acknowledges the request and takes control of the address and control lines from the host system.
4. The transfer takes place. The DMA controller manipulates the bus control lines, causing memory to read from (or write to) the bus, and causing the I/O device to write to (or read from) the bus.
5. After the data transfer is complete, the DMA hardware returns control of the bus to the host CPU. Some computer systems (but not those in the IBM PC family) allow multiple data transfers to take place before control is returned to the host CPU.
6. Steps 3 through 5 are repeated until the number of operations programmed by the host in step 1 have been performed.
7. During the last programmed operation, the DMA controller signals the I/O device that the last transfer is complete. The DMA controller disables itself and allows no further DMA operations to take place.
8. Typically, the I/O device responds to the last transfer by issuing an interrupt to the host. If the transfer was from I/O to memory, the interrupt informs the host that the data is now in the memory buffer, ready to be used by the application program. If the transfer was from memory to an I/O device, the interrupt signifies that the buffer is now available for reuse in another DMA transfer.
Of course, what I have presented is a fairly general description of a DMA transfer. The specifics vary from one platform to another. The DMA controller in IBM PCs and ATs supports three transfer types, and I/O hardware has several ways of interacting with the DMA controller. However, constraints imposed by the IBM PC design severely limit these variations, so most well-behaved hardware/driver combinations will operate in a manner similar to that described in steps 1 through 8.
DMA on the IBM PC and PC/XT
The IBM PC and XT support three DMA channels, allowing three different devices to conduct DMA transfers on the system bus. These channels are prioritized to resolve conflicts when more than one channel attempts to access the expansion bus at the same time. This section describes the DMA hardware on the IBM PC and XT motherboards. This architecture is the basis for all systems descended from the IBM PC, such as the IBM PC/AT (ISA) and EISA.
Two sections of hardware are relevant to DMA programming on the IBM PC and XT: the DMA controller and the page registers.
The DMA Controller
The DMA controller is an integrated circuit designed specifically for controlling DMA transfers in computer systems. The DMA controller in the IBM PC and XT is an Intel 8237 Programmable DMA Controller. The 8237 occupies I/O ports 0x00 through 0x0F and provides the logic to control four prioritized channels of DMA. The highest priority DMA channel (channel 0) generates dynamic RAM (DRAM) refresh cycles on the bus. The other three channels (1 through 3) are available for I/O devices. On most systems, channel 2 is the floppy disk controller.
DMA Controller Registers
DMA I/O routines read and write several 8237 registers to initialize the 8237, control DMA transfers, and monitor device status. The following is a description of the 8237 registers:
- Command. BIOS programs this register to initialize the chip. Changing the value in this register after the BIOS has programmed it would cause compatibility problems with other applications and devices. Since DMA I/O routines seldom modify this register, I will not discuss it further.
- Request. Only applications that perform block transfers need to access the request register. Since the PC and XT do not support block transfers, I will not discuss this register either.
- Word Count. Each DMA channel includes a 16-bit Word Count register, which the I/O routine programs with the number of bytes to be transferred. The routine writes into the register a count value one less than the number of bytes to transfer. When the routine reads this register, the register returns one less than the number of bytes remaining to be transferred.
- Current Address. Each DMA channel includes a 16-bit Current Address register, which supplies the low 16 bits of the memory address during DMA transfers. This address points to the next byte to be read or written. After each DMA transfer, the DMA controller increments or decrements the Current Address register, depending on the Mode register's contents. If the auto-initialize feature is enabled, the Current Address register will be reset at TC to its last programmed value. Otherwise, after TC, this register will contain an address that points one byte beyond the end of the DMA buffer.
- Mode. Each DMA channel includes a Mode register that controls several aspects of the DMA transfers:
1. Direction of transfer. Transfers may either write to or read from memory, and correspondingly read from or write to an I/O port.
2. Auto-initialization. When an I/O routine enables auto-initialization, the Word Count and Current Address registers automatically reset after TC (convenient if the next transfer will be to the same DMA buffer).
3. Buffer addressing direction. This setting controls whether the DMA controller accesses the memory buffer from successively increasing or decreasing addresses. DMA transfers can proceed from the beginning of the DMA buffer to the end, or vice versa.
4. Consecutive transfer configurations. The Mode register can also select several modes that affect the way consecutive DMA transfers are initiated by the I/O device. However, the IBM PC and XT platforms allow only one mode, Single Transfer.
- Mask. The chip includes one Mask register that reserves one bit for each DMA channel. If a channel's bit is set, no DMA transfers can occur on that channel; the DMA controller ignores DMA requests on the system bus for DMA transfers involving this channel. Software can set or clear the bit; the DMA controller also sets the bit when the Word Count register underflows from 0 to 0xFFFF. However, if the auto-initialize feature is enabled in the channel's Mode register, the controller will not set the mask bit when the Word Count register underflows.
- Status. The chip includes one Status register, which contains two bits dedicated to each DMA channel. The two Status bits are:
1. The TC bit, which the DMA controller sets when a programmed transfer has been completed. (This bit is called the TC bit because it indicates that the Word Count register has reached terminal count.)
2. The request bit, which indicates that a DMA request is pending.
Page Registers
Page registers are four-bit registers separate from the 8237 chip. The page registers provide additional address bits to those provided by the DMA controller. Since the 8237 provides only 16-bit addresses, and the IBM PC uses 20-bit addresses, the system needs page registers to fill in the high-order four bits.
During a DMA transfer, system hardware appends the page register's contents to the contents of a channel's Current Address register. Since the 8237 Current Address registers give no indication of underflow or overflow, a program can't adjust the page register for these conditions. As a result, a DMA transfer cannot cross a 64KB boundary in memory on an IBM PC or XT. This restriction also limits DMA transfers to lengths no greater than 64KB.
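Because the system simply concatenates the page register with the 16-bit Current Address, a program can detect the boundary problem before starting a transfer. The following check is a sketch (the function name is my own invention):

```c
/* Return nonzero if a buffer of 'len' bytes beginning at physical
   address 'phys' crosses a 64KB boundary (len must be at least 1). */
int crosses_64k(unsigned long phys, unsigned long len)
{
    /* The page register supplies bits 16-19 of the address, so the
       first and last bytes must agree in those bits. */
    return (phys >> 16) != ((phys + len - 1) >> 16);
}
```

A program that finds its buffer crossing a boundary can split the transfer in two or allocate a different buffer.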
The page registers occupy the lower 4 bits of the following I/O ports: Channel 1, port 0x83; Channel 2, port 0x81; Channel 3, port 0x82.
Programming DMA Transfers on the IBM PC and XT
Seven steps are required to perform a DMA transfer on the IBM PC and XT:
1. The program must set up the DMA channel's Word Count register for the length of the DMA transfer. The program should set the register to one less than the number of bytes to transfer.
2. The program sets the DMA channel Current Address register with the lower 16 bits of the memory buffer address. The program must set the complete address (page register bits + Current Address) to the beginning (or end) of the DMA buffer. Note: programs should not put the offset (from the segment:offset pair of a far pointer) into Current Address. This far pointer offset won't work unless the lower 12 bits of the segment are zero. The general formula for calculating the value of Current Address is:

CURRENT ADDRESS = (int)(((((long)bp)&0xFFFF0000L)>>12) + (((long)bp)&0xFFFF))

where bp is a far pointer to the DMA buffer and CURRENT ADDRESS is the value written to the Current Address register.
3. The program sets the DMA channel's page register to the most significant 4 bits of the DMA memory buffer address. The formula for calculating the page register value is:
page = (int)(((((long)bp)&0xFFFF0000L) + ((((long)bp)&0xFFFF)<<12))>>28)

where bp is a far pointer to the DMA buffer. (The parenthesized sum is the 20-bit physical address shifted left 12 bits, so shifting it right 28 bits leaves the high-order 4 bits of the physical address.)
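The two formulas amount to computing the 20-bit physical address and splitting it. A sketch of the computation, written with the segment and offset as separate parameters (the helper names are my own):

```c
/* Compute the 20-bit physical address of a real-mode far pointer
   from its segment:offset pair. */
unsigned long phys_address(unsigned int seg, unsigned int off)
{
    return ((unsigned long)seg << 4) + off;
}

/* Low 16 bits of the physical address: the Current Address value. */
unsigned int dma_current_address(unsigned long phys)
{
    return (unsigned int)(phys & 0xFFFFUL);
}

/* High 4 bits of the physical address: the page register value. */
unsigned int dma_page(unsigned long phys)
{
    return (unsigned int)((phys >> 16) & 0x0F);
}
```

For example, segment 0x1234 with offset 0x5678 yields physical address 0x179B8, a Current Address value of 0x79B8, and a page value of 1.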
4. The program sets up the DMA channel Mode register, specifying the Address Increment/Decrement parameter (since usually the program selects Increment, it will ordinarily receive data in the desired order) and the data transfer direction (memory read, memory write, or Verify). Programs can use Verify to avoid trashing memory during debugging sessions (however, some I/O devices do not respond correctly when Verify is selected).
Programs performing multiple transfers to the same buffer should select Auto-Initialize to avoid having to reinitialize the controller registers at the end of each transfer. Note that when this feature is selected, the controller will continue to accept DMA requests from the I/O device after TC is reached, instead of disabling DMA requests in the MASK register.
5. The program clears the DMA channel mask bit to enable DMA transfers. The program can clear the channel mask in one of two ways. It can write to the Mask Write register, which will write to all channel mask bits in one operation. This function is useful to the BIOS since it affects all DMA channels at once. The other (more practical) method available to the program is to clear the Set/Reset Mask register, which accepts a bit for setting or resetting the mask bit in a particular channel.
6. The program sets up the I/O device to begin I/O processing. (Instructions for preparing the I/O device are beyond the scope of this article, since each I/O device is unique.)
7. The program monitors the DMA controller Status register to determine when the transfer is complete. Alternatively, if the program doesn't enable the auto-initialize feature, it can monitor the Word Count register for the value 0xFFFF, which indicates the end of the transfer (except when 0xFFFF is also the initial value). A third alternative for the program is to wait for an interrupt to indicate transfer complete, if the I/O device can generate such an interrupt.
Additional Hardware on the IBM PC/AT
The IBM PC/AT extends the PC architecture in three ways that affect DMA hardware and software.
- The AT supports 24 bits of address space vs. 20 bits on the PC and XT.
- There are 7 available DMA channels vs. 3 on the PC and XT.
- The AT bus supports 16-bit data transfers vs. 8-bit only on the PC and XT.
The first change is that DMA channel 0 no longer generates the DRAM refresh cycles; on the AT, dedicated refresh logic on the motherboard performs this task. Instead, channel 0 of the second DMA controller (system channel 4) cascades the four original DMA channels onto the bus. As a result, channel 4 is unavailable for general use and is of very little consequence to anyone except designers of AT motherboards and ROM BIOS programmers.
Because the original channels are cascaded through channel 4, the highest-priority channel of the second controller, they retain priority over the new channels 5 through 7. Programs can change channel priority by software, but if they do, they will probably cause compatibility problems with other software and hardware.
The second change is expansion of the page registers from four to eight bits, to accommodate the ability of the AT to access 16MB of memory. Programs can calculate the required value of these 8-bit registers with the same formula presented for 4-bit page registers.
The third change is the addition of another 8237 DMA controller (DMA #2) and four page registers. This new hardware accommodates the four DMA channels which perform only 16-bit read and write operations.
Note that the three DMA channels inherited from the PC are still 8-bit channels. Therefore, programs can only perform 8-bit read and write operations through these channels. This limitation maintains compatibility with software written for the PC.
The second 8237 DMA controller is located at I/O port base address 0xC0. Its channels are assigned channel numbers 4 through 7, with channel 4 having the highest priority. Since channel 4 is dedicated to cascading the original controller, channels 5 through 7 are the ones available to I/O devices on the expansion bus.
DMA #2 is byte-addressable at even port addresses. While the first 8237 occupies 16 consecutive I/O port addresses, the second controller on the AT occupies 32 addresses and can only be accessed at even-numbered ports. To calculate the I/O port address of a register n within this chip, multiply n by 2 and add the result to the base port address. Thus, while a program would normally find the Status and Command registers at address base+8, where base is the I/O port base address (0xC0 in this case), on the AT these second-8237 registers sit at address base+16.
Since DMA #2 always performs two-byte (word) read and write operations, the DMA controller updates Word Count and Current Address registers (incrementing or decrementing) once every two bytes instead of once every byte as in DMA #1. The AT takes advantage of this situation to allow DMA transfers up to 128KB in length, as opposed to the 64KB allowed on DMA #1.
In the AT implementation of 16-bit DMA transfers, bus address bit 0 is always zero. DMA #2 cannot transfer data to or from an odd address. Therefore, the Current Address register generates bits 1 through 16 of the memory address and the page register provides bits 17 through 23. Programs should set the Word Count register with the number of words, not bytes, that are to be transferred by the DMA channel (minus 1, as previously described).
For 16-bit transfers, programs should initialize the Current Address register in DMA #2 with:
current_address = (int)(((((long)bp)&0xFFFF0000L)>>13) + ((((long)bp)&0xFFFF)>>1))

where bp is a far pointer to the DMA buffer.
The page register for the 16-bit DMA channels needs to provide only seven address bits. The AT hardware uses the seven most-significant bits of the page registers, ignoring bit 0. Calculate the initial page register value as:
page = (int)((((((long)bp)&0xFFFF0000L) + ((((long)bp)&0xFFFF)<<12))>>28)&0xFE)

where bp is a far pointer to the DMA buffer. This formula sets the low-order bit of the page value to zero just to be safe.
The page registers are located at the following I/O ports: Channel 5, 0x8B; Channel 6, 0x89; Channel 7, 0x8A. (Channel 4, the cascade channel, performs no transfers of its own and so needs no page register.)
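Starting from a known physical address, the 16-bit channel register values can be computed directly, which is often clearer than manipulating the far pointer itself. A sketch (the function names are my own; the buffer address and length are assumed to be even):

```c
/* Current Address value for a 16-bit channel: address bits 1-16. */
unsigned int dma16_current_address(unsigned long phys)
{
    return (unsigned int)((phys >> 1) & 0xFFFFUL);
}

/* Page value for a 16-bit channel: address bits 17-23, with the
   register's bit 0 (which the hardware ignores) cleared. */
unsigned int dma16_page(unsigned long phys)
{
    return (unsigned int)((phys >> 16) & 0xFE);
}

/* Word Count value: the number of words to transfer, minus one. */
unsigned int dma16_word_count(unsigned long bytes)
{
    return (unsigned int)(bytes / 2 - 1);
}
```

A full 128KB transfer (0x20000 bytes) thus programs a Word Count of 0xFFFF, the maximum the 16-bit register can hold.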
Protected-Mode DMA Programming
The preceding sections on hardware programming apply when the processor is running in real mode. For the IBM PC and XT this is the only mode available. However, with more advanced processors and operating systems, protected mode is likely to be enabled when DMA transfers are being performed.
Protected mode affects two DMA-related activities: producing a buffer suitable for DMA transfers and programming the Current Address and page registers with correct address values. Protected mode complicates these activities, due to several problems which I describe in this section.
The Segmentation Problem
The first problem introduced by protected mode is that the segment register now holds a "selector," not a segment address. Unlike a segment address, a selector value cannot be combined with an offset to produce a physical address. Instead, programs must call functions that will return the base address of the selector and add the offset to calculate the "linear" address. In the 286 processor, the linear address equals the physical address and can be used to program the DMA controller. In 386 and higher processors, linear addresses may or may not equal physical addresses, and programs controlling DMA transfers will require help from the operating system to determine a physical address. A linear address for a pointer can be calculated as follows:
Linear = GetBase(((long)bp)>>16) + ((int)bp)

where bp is a far pointer to the DMA buffer in protected mode, GetBase is some function that returns the base address of a selector, and Linear is the linear address referenced by the pointer.
The Virtual Memory Problem
Another problem encountered in protected mode is that while a program may have acquired a pointer to a buffer, some or all of the buffer may have been paged to disk. This paging can only occur when a virtual memory manager is running. At any time the virtual memory manager may write a page of memory to disk (in this case, "page" is a different entity than previously described) and reassign the physical page to another program. If a DMA controller is accessing memory when the page is reassigned, the transfer will be unsuccessful, and all kinds of unexpected and undesired events may occur. Programs can circumvent this problem by locking the buffer in memory before beginning a DMA transfer. Locking a buffer requires the cooperation of the operating system. Typically, virtual memory managers provide memory locking in conjunction with another operation involved in generating a DMA buffer, to be described later.
The Noncontiguous Memory Problem
All locations in a DMA buffer must be contiguous in physical memory, because the DMA controller generates only consecutive physical addresses while transferring data. It seems that it should be simple to obtain and keep a block of contiguous physical memory, but in protected mode it can be difficult.
A one-time memory allocation in protected mode will return memory that is contiguous in the linear address space. (The buffer memory exists at consecutive linear addresses beginning with the first address of the buffer and continuing for the number of bytes allocated to the buffer.)
The problem is that for 386 and higher processors, page mapping allows contiguous pages of linear memory to be mapped to noncontiguous pages of physical memory; and physical memory is the only kind of memory the DMA controller understands. Consecutive pages of linear memory do not have to (and usually don't) correspond to consecutive physical memory pages. Therefore, programs must ensure that their DMA buffers exist in contiguous physical memory, and do not cross any relevant boundaries (64KB or 128KB). While a program can get by with linear address checks on the 286 (since linear addresses equal physical addresses on the 286), it must check physical addresses on the 386, and determining the physical memory address of a buffer may not always be possible.
The Memory Caching Problem
Many advanced CPUs, such as the 80486, provide on-chip memory caching. This memory cache holds a copy of the most recently accessed memory. When the processor reads from memory, the CPU will first check if the memory to be read is stored in the cache. If the memory is stored in the cache, the processor will not read from memory but will read from the cache. The cache management hardware will keep the cache contents consistent with the contents of memory.
The cache controller has no way of knowing if a DMA transfer has modified the contents of memory. If the CPU reads from a portion of a DMA buffer that is maintained in the cache, and the DMA buffer has been overwritten by a DMA transfer, the CPU will be reading old data.
To avoid this problem, programs must disable the cache for that region of memory containing a DMA buffer. The caching hardware is processor-specific and applications cannot access it. Therefore, applications must depend on the operating system to provide cache management.
Obtaining a DMA Buffer in Protected Mode
In real mode, generating a DMA buffer is relatively easy. In protected mode, it can become one of the most challenging aspects of DMA programming, especially for high-performance transfers.
Here I will focus on creating DMA buffers in protected mode with a virtual memory manager on the 80386. This is where the most difficulty lies, and it is what much DMA programming entails today and will entail in the future. Two protected-mode interface specifications provide a way to obtain a DMA buffer in protected mode: the Virtual DMA Services (VDS) specification and the DOS Protected-Mode Interface (DPMI) specification. Of these, VDS is the more useful.
Virtual DMA Services (VDS)
Operating systems that protect shared resources must restrict access to the DMA hardware. Under these operating systems, application-level processes running at a low privilege level cannot access the DMA hardware, and therefore cannot operate any special interface devices, unless the operating system provides some service enabling them to utilize the DMA hardware.
For MS-DOS and compatible systems, the standard interface providing these services is the Virtual DMA Services specification. This specification controls the virtualization of the DMA hardware and provides functions for manipulation of DMA buffers.
DOS Protected-Mode Interface (DPMI)
DPMI provides functions for manipulating selectors and memory pages and for allocating memory. Applications often use these functions to complement VDS services and achieve complete buffer management.
Note that some DPMI functions appear to offer page locking. Unfortunately, these functions are designed to lock memory used by interrupt handlers. If the virtual memory manager is capable of servicing a page fault during an interrupt, the DPMI specification allows a DPMI host to ignore these page lock requests, which makes DPMI functions useless for locking a DMA buffer. Therefore, programs should rely only upon VDS services to lock a DMA buffer properly and to disable any caching hardware present.
Allocating a DMA Buffer
In protected mode, there are three approaches to acquiring a DMA buffer:
1. Allocate a dedicated buffer maintained by VDS.
2. Allocate memory with standard functions and lock the buffer.
3. Allocate memory with standard functions and lock each physical memory region that makes up the linear memory buffer.
VDS maintains a DMA buffer that programs can allocate with a VDS call; this is approach #1, the easiest and most common method of obtaining a buffer. However, a program that allocates this way does not receive a data pointer for accessing the contents of the buffer. Instead, VDS provides services that copy data between the DMA buffer and a normal buffer allocated by the application. In many applications this access procedure is sufficient, though cumbersome. If it is too slow, the program can generate a data pointer from the physical address of the allocated buffer, using DPMI functions to create a protected-mode pointer to the DMA buffer. This modification to approach #1 allows a program to read and write the contents of the buffer directly.
The VDS-allocated DMA buffer scheme suffers from the limited availability of VDS DMA buffers. Typically, VDS provides only one DMA buffer which must be shared by all programs.
Approach #2 may be appropriate in some cases. In this approach, programs allocate a buffer with standard allocation functions, then lock it via VDS-applied functions. VDS can even remap pages of physical memory to create a contiguous buffer that does not cross the memory boundaries so crucial to the operation of DMA hardware. Unfortunately, the VDS specification does not require these options to be implemented, and such requests will fail on many platforms.
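The VDS buffer and locking functions communicate through a DMA Descriptor Structure (DDS) passed in ES:DI, with a function code such as 8103h (Lock DMA Region) in AX on INT 4Bh. The layout below follows my reading of the VDS 1.0 specification; fixed-width types are used so the structure is exactly 16 bytes with no padding:

```c
#include <stdint.h>

/* DMA Descriptor Structure (DDS) for the VDS calls, as I read the
   VDS 1.0 specification. The caller fills in the region size,
   offset, and segment (or selector); VDS returns the buffer ID and
   the physical address. A region lock request (INT 4Bh, AX=8103h)
   reports failure through the carry flag. */
struct dds {
    uint32_t region_size;  /* length in bytes; VDS may shorten it  */
    uint32_t offset;       /* offset of the region                 */
    uint16_t seg_or_sel;   /* segment (real mode) or selector      */
    uint16_t buffer_id;    /* filled in by VDS buffer functions    */
    uint32_t physical;     /* physical address, returned by VDS    */
};
```

After a successful lock, the physical field is the address a program feeds into the Current Address and page register formulas given earlier.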
Approach #3 is also called a scatter/gather DMA transfer. To perform a scatter/gather DMA transfer, a program must first allocate a buffer by normal memory allocation methods. The program then locks this buffer with a VDS function that will lock each discontinuous region of physical memory. The function returns an array of memory addresses or page table entries that define each locked region of memory. The advantage of this method is that it does not fail when the buffer is discontinuous in physical memory. However, this method assumes that the program can make use of a DMA buffer that is broken into pieces located in various places in the physical memory space.
Example DMA Application
A class of peripherals that commonly perform DMA transfers is analog-to-digital (A/D) converters. These devices often must transfer data to system memory at high speeds and with low transfer latency. The example program records analog data to a file at high speed without interruption, through a Quatech DAQ-16 analog I/O board.
In addition to a Quatech DAQ-16, the application requires an active VDS host. Quarterdeck's QEMM-386 is one of many programs that will provide VDS services. You will need Borland or Turbo C++ to compile the program. I found it difficult, if not impossible, to make this application portable, due to the extensive use of port input and output calls, software interrupt calls, and the interrupt handler.
The example applies one technique that I have not previously discussed. When the DMA channel reaches TC, the interrupt handler must reconfigure the channel before the next transfer can begin. At high transfer rates, the interrupt handler can't complete the setup before the next data sample is due, and data is lost. To avoid this, I use two DMA channels. The DAQ-16 supports two DMA channels, primary and secondary. When one channel completes a transfer, the channel hardware issues an interrupt request, and the DAQ-16 switches to the other DMA channel. This scheme allows plenty of time for the interrupt handler to reconfigure the first DMA channel before the second channel is finished. The example program implements this two-channel technique to provide continuous transfer to disk of up to 200KB/sec, which is the limit of the DAQ-16.
Listing 1 is the main application code. Listing 2 is a header file for external functions that provide access to VDS services. Listing 3 contains the source code for the VDS service access functions. Listing 4 is a make file for building the application.
Function main processes the command line, opens the output file, calls Record, closes the output file, and exits.
Record initializes the buffer queues, interrupt handler, DMA hardware, and the DAQ-16. After initialization, Record enters a loop that performs the following functions:
1. continuously prints the recording status
2. checks the keyboard for a keypress
3. checks for full data buffers arranged in a queue by the interrupt handler
4. writes all data buffers from the queue to the output file.
When the user presses a key, the program does the following:
1. the main loop terminates
2. the DMA hardware and DAQ-16 are set to an inactive state
3. all remaining buffers are written to the output file
4. the interrupt handler is removed
5. Record returns
The DMA interrupt handler cannot call the file access functions to write data to the output file since DOS is not reentrant. Therefore, I transfer data from the DMA interrupt handler to the application main loop through buffer queues. After data is transferred to the main loop, I write the queues to the output file. I return the empty buffers to the interrupt handler through another queue.
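The listings are not reproduced here, but the buffer queues can be sketched as a simple ring of pointers. Because each index is written by only one side (the interrupt handler fills, the main loop drains), no interrupt masking is needed around the queue operations themselves. The names are my own, not those used in the listings:

```c
#define QUEUE_SLOTS 8   /* must exceed the number of buffers in use */

struct buffer_queue {
    void *slot[QUEUE_SLOTS];
    volatile int head;   /* written only by the producer */
    volatile int tail;   /* written only by the consumer */
};

/* Add a buffer to the queue; returns 0 if the queue is full. */
int queue_put(struct buffer_queue *q, void *buf)
{
    int next = (q->head + 1) % QUEUE_SLOTS;
    if (next == q->tail)
        return 0;                    /* full: one slot kept empty */
    q->slot[q->head] = buf;
    q->head = next;
    return 1;
}

/* Remove the oldest buffer from the queue; returns NULL if empty. */
void *queue_get(struct buffer_queue *q)
{
    void *buf;
    if (q->tail == q->head)
        return 0;                    /* empty */
    buf = q->slot[q->tail];
    q->tail = (q->tail + 1) % QUEUE_SLOTS;
    return buf;
}
```

The interrupt handler calls queue_put on the full-buffer queue, the main loop calls queue_get, and a second queue carries emptied buffers back the other way.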
Conclusion
Despite the sparse information available on DMA, the technique is actually very easy to use. DMA is often essential if you intend to interface to any device that requires a substantial transfer rate. DMA does not always perform the fastest I/O transfers. For example, on IBM PC/AT compatible platforms, DMA transfers are limited to a transfer rate of about 2MB/sec. This limit is a consequence of the DMA hardware design and the clock speed limitations of the ISA bus. While this transfer rate was considered very fast at one time, it has become slow by modern standards. For 80386 and higher systems, a data (memory-to-memory) transfer can be conducted by the CPU at many times this speed. For an I/O device that buffers its data in memory-mapped onboard RAM, the CPU could perform the data transfer much faster than the DMA hardware. However, even for low-performance devices, DMA is convenient and often maximizes system performance.