How does your program know which Intel processor is the current system CPU? Robert looks at the options, including Intel's PUSHF/POPF technique.
September 01, 1996
URL:http://www.drdobbs.com/parallel/undocumented-corner/184409966
Robert is a design verification manager at Texas Instruments' Microprocessor Design Center. Robert can be reached via e-mail at [email protected].
The debate about the correct way to detect different generations of Intel microprocessors has raged for years. In one corner are programmers who traditionally used a series of PUSHF/POPF instructions to detect the FLAGs differences between processors. In the other corner, it always seemed I stood alone, arguing that this technique is flawed. The debate subsided somewhat in 1989, when Intel published an algorithm that relied upon PUSHF/POPF for microprocessor identification. But even while the naysayers said, "See, even Intel does it our way," I stood in my little corner saying "Sure, but it's wrong."
The truth is, neither algorithm is fail-safe. Intel's PUSHF/POPF method can misdiagnose which processor family is running and does not guarantee to operate outside of real mode. My technique should always run in v86 mode, but sometimes doesn't because of shortcomings in the design of many v86-memory managers-like EMM386 from Microsoft.
All current-generation Intel x86 processors have an instruction called CPUID that reads CPU identification information. This information can be used by software to dynamically take advantage of processor-specific programming techniques. Before CPUID, you needed to write an algorithm to detect differences between different generations of processors. This algorithm would serve much of the same purpose as executing the CPUID instruction. Intel didn't invent the algorithm; the company borrowed one that was in wide distribution on the Internet, and published it in the i486 Microprocessor Programmer's Reference Manual (Intel Corp. 1990), claiming "Copyright Intel Corporation." Oddly, the original algorithm was published in two halves, in opposite ends of the manual. Section 22.10 contained the algorithm to detect the differences between 8086 through 80386. Figure 3-23 contained the algorithm to detect the difference between the 80386 and 80486. The latest edition of this manual removes the code fragments, referring you to "AP-485, Intel Processor Identification With the CPUID Instruction," Order Number 241618 (ftp://ftp.intel.com/ pub/IAL/software_specs/ap48504f.pdf).
AP-485 includes the following comment:
Please understand that the code sequences have been validated by Intel to detect CPU_ID, math coprocessor function, and initialize accordingly. Any other approach may produce unpredictable results in future processors.
It's ironic that Intel claims that "any other approach may produce unpredictable results," since its algorithm is prone to failures that yield unpredictable results (as I'll demonstrate in this article). For more information on CPUID, see the text box "Pentium Detection," by Robert Moote (which accompanied the article "Processor-Detection Schemes," by Richard C. Leinecker, DDJ, June 1993).
The Intel algorithm relies on a series of PUSHF/POPF instructions to set and clear various FLAGs bits. Each generation of processor has a slightly different behavior which may be detected by this approach. This algorithm makes no attempt to detect the 80186/88 series of processors. In this regard, the algorithm is incomplete.
The 8086/88 is distinguished from the 80286 by attempting to clear bits 12-15 of the FLAGs register. The 8086/88 will always set these bits, regardless of what values are popped into them (see Listing One). The 286 treats these bits differently. In real mode, these bits are always cleared by the 286; in protected mode, they are used for IOPL (I/O Privilege Level) and NT (Nested Task). To continue the detection code, you need to set bits 12-15 in the FLAGs register, and see if they are cleared by the processor. If they are, then a 286 has been detected (see Listing Two).
If you get beyond this point in the algorithm, you know you have at least a 386. Therefore, it is safe to use 32-bit instructions, like PUSHFD/POPFD. This will be necessary in detecting the difference between a 386 and 486. These processors are distinguished from each other by attempting to set the AC flag in the EFLAGs register. This flag was introduced in the 486. The 386 never sets this bit, and always clears it when it is set by POPFD. Therefore, to detect the difference between these processor generations, the algorithm attempts to set this bit, to see if it is latched or cleared by the processor (see Listing Three).
At this point in the algorithm, you're almost home. To detect the difference between the 486 and the Pentium, you attempt to set another new EFLAG bit (bit-21) called the "ID flag." This flag has only one purpose-to indicate the presence of the CPUID instruction. This bit was first introduced on the Pentium, but later retrofitted into the 486. If the CPUID instruction exists on either processor, it may be executed to return the processor-identification information. 486s without the CPUID instruction will not be able to toggle this bit. Therefore, it is safe to execute a sequence of instructions on either processor that detects the processor's ability to toggle this bit (see Listing Four).
Once the algorithm gets to this point, you can execute the CPUID instruction to obtain the processor identification. This instruction can be run in any processor mode, at any privilege level. On the Pentium and 486, the CPUID instruction has two levels:
In spite of Intel's claim, this algorithm is far from perfect. For one thing, it fails to detect the 80186/88 series of processors. Even though this processor wasn't adopted by many PC manufacturers, it was used in some computers, primarily notebook computers. The 80186/88 processor contains most of the new instructions and CPU-generated exceptions contained in the 80286. These instructions include PUSHA/POPA, PUSH immed, SHL reg,
immed, and the invalid opcode exception. The only 80286 instructions and exceptions not implemented in the 80186/88 are those specifically used for protected mode. Failure to detect this processor could prohibit the use of some software that can take advantage of these new instructions and exceptions.
This algorithm is only designed to run in real mode, not in a virtual-8086 DOS box running under Windows. This limitation is even mentioned in the 486 manual. This results from the fact that PUSHF and POPF are privileged instructions that are sensitive to the I/O Privilege Level while running in protected mode. (DOS boxes, running under Windows, run in virtual-8086 mode-a special form of protected mode.) If IOPL is not equal to three, then a general-protection fault occurs while attempting to execute these instructions. The operating system then intervenes to emulate the instruction as it sees fit. Therefore, there is no guarantee that the operating system will mimic the real-mode behavior of the specific processor under test. In reality, this may not be as big a problem as it sounds. Windows sets IOPL equal to three for DOS boxes. This renders these instructions transparent to the operating system, and they execute without generating a fault.
Not all operating systems with a DOS-compatibility box follow the example set by Windows. OS/2 Warp uses a special form of virtual-8086 mode, called Virtual Mode Extensions (VME). Running in VME affords the protection advantages of running at IOPL=2 without incurring the faults generated by PUSHF/POPF used in this algorithm. (See http://www.x86.org/vme1 for a discussion on VME.) To accommodate this behavior, Intel modified the algorithms of PUSHF/POPF to allow them to run in VME without faulting to the host operating system. When IOPL<3, PUSHF always pushes an IOPL value of three onto the stack. This doesn't cause any problems for the Intel algorithm, as none of the detection code depends upon setting or clearing these two bits alone.
Should the CPUID instruction ever return a signed number (for example, 80000001h), the Intel algorithm will fail. In Listing Five, the instruction above the designated <- symbol is a conditional jump based on a signed comparison. This is a common programming error which can easily be fixed in the Intel algorithm.
This algorithm relies on undocumented processor behavior to detect the differences between early generations of Intel processors. The use of such programming tricks violates Intel's own recommendations. Consider the following guidelines set forth in various Intel manuals:
Software should not try to identify features by exploiting programming tricks, undocumented features, or otherwise deviating from the guidelines presented in this application note.
When bits are marked as reserved, it is essential for compatibility with future processors that software treat these bits as having a future, though unknown, effect. The behavior of reserved bits should be regarded as not only undefined, but unpredictable. Software should follow these guidelines in dealing with reserved bits:
These are strong guidelines set forth in Intel's documentation, and the irony of Intel's algorithm is that it violates each and every one of them. Detecting the difference between 8086/88 and 80286/88, and between 80286/88 and 80386, completely depends upon setting and clearing reserved bits in the FLAGs register, and then depends on the state of those bits when they are stored to a resultant register. Detecting the difference between 386 and 486, and between 486 and Pentium, depends upon setting an EFLAGs bit that is undefined on the previous-generation processor, then depends on that processor to clear the undefined bit. To abide by Intel's guidelines, the behavior of these undocumented FLAGs bits must be documented in their respective manuals-but they aren't. None of these differences are documented in any of the processors' respective data sheets. Processor behavior often isn't documented until many years after release. The 8086 FLAGs behavior was first described in the 386 programmer's reference manual in 1988 (nearly ten years after the 8086's introduction). The 80286 FLAGs behavior wasn't described until the Pentium manuals were introduced in 1993 (ten years after the 80286 introduction, and four years after Intel introduced this algorithm in the 486 manuals).
Even though Intel's algorithm violates all of its own guidelines, the company is partially exonerated by the Pentium programmer's reference manual, where Intel says that it's acceptable to use this algorithm to detect the differences in these processors. However, the Pentium manual doesn't change the prohibitions set forth in the 386 or 486 manuals; those prohibitions still exist. The following excerpt was taking from the Pentium Programmer's Reference Manual, chapter 5:
The setting of the flags stored by the PUSHF instruction, by interrupts, and by exceptions is different on the 32-bit processors than that stored by the 8086, and Intel 286 processors in bits 12 and 13 (IOPL), 14 (NT), and 15 (reserved). These differences can be used to distinguish what type of processor is present in a system while an application is running.
My biggest objection to this algorithm is that it's prone to failure on all processors newer than a 386. When it fails, the algorithm incorrectly determines that a 386 processor is installed in the system. The failure is caused when an interrupt occurs precisely where the <- appears in Listing Three. When this occurs, the AC flag is cleared (in real mode), and the algorithm fails to detect the correct processor type. The AC flag has always behaved in this manner, but the behavior wasn't documented until the 1994 edition of the Pentium Programmer's Reference Manual (chapter 25, description of INT instruction). There are a few ways to demonstrate this failure (assuming you're running on a 486 or later processor). You can put an HLT instruction or an INT instruction at the point designated by the "(", or run the algorithm in a loop. Eventually, a timer-tick interrupt will occur at this point. Inserting an HLT instruction will force the processor to wait for an interrupt before continuation. When the interrupt occurs, the AC flag will be cleared during its invocation. Listing Six presents source code to demonstrate this behavior.
The Intel algorithm isn't nearly as bad as it sounds. It has a few bugs that can easily be fixed. Intel's intentions were noble, but their implementation was flawed (see http://www.x86.org for an updated version of this algorithm). In spite of its drawbacks, the reasons this algorithm is in such widespread use are simple:
Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.
Listing One
pushf ; push original FLAGS
pop ax ; get original FLAGS
mov cx, ax ; save original FLAGS
and ax, 0fffh ; clear bits 12-15 in FLAGS
push ax ; save new FLAGS value on stack
popf ; replace current FLAGS value
pushf ; get new FLAGS
pop ax ; store new FLAGS in AX
and ax, 0f000h ; if bits 12-15 are set, then
cmp ax, 0f000h ; processor is an 8086/8088
mov _cpu_type, 0 ; turn on 8086/8088 flag
je end_cpu_type ; jump if processor is 8086/8088
Listing Two
or cx, 0f000h ; try to set bits 12-15
push cx ; save new FLAGS value on stack
popf ; replace current FLAGS value
pushf ; get new FLAGS
pop ax ; store new FLAGS in AX
and ax, 0f000h ; if bits 12-15 are clear
mov _cpu_type, 2 ; processor=80286, turn on 80286 flag
jz end_cpu_type ; if no bits set, processor is 80286
Listing Three
pushfd ; push original EFLAGS
pop eax ; get original EFLAGS
mov ecx, eax ; save original EFLAGS
xor eax, 40000h ; flip AC bit in EFLAGS
push eax ; save new EFLAGS value on stack
popfd ; replace current EFLAGS value
<-
pushfd ; get new EFLAGS
pop eax ; store new EFLAGS in EAX
xor eax, ecx ; can't toggle AC bit, processor=80386
mov _cpu_type, 3 ; turn on 80386 processor flag
jz end_cpu_type ; jump if 80386 processor
push ecx
popfd ; restore AC bit in EFLAGS first
Listing Four
mov _cpu_type, 4 ; turn on 80486 processor flag
mov eax, ecx ; get original EFLAGS
xor eax, 200000h ; flip ID bit in EFLAGS
push eax ; save new EFLAGS value on stack
popfd ; replace current EFLAGS value
pushfd ; get new EFLAGS
pop eax ; store new EFLAGS in EAX
xor eax, ecx ; can't toggle ID bit,
je end_cpu_type ; processor=80486
Listing Five
mov _cpuid_flag, 1 ; flag indicating use of CPUID inst.
push ebx ; save registers
push esi push edi
mov eax, 0 ; set up for CPUID instruction
CPU_ID ; get and save vendor ID
mov dword ptr _vendor_id, ebx
mov dword ptr _vendor_id[+4], edx
mov dword ptr _vendor_id[+8], ecx
mov si, ds
mov es, si
mov si, offset _vendor_id
mov di, offset intel_id
mov cx, 12 ; should be length intel_id
cld ; set direction flag
repe cmpsb ; compare vendor ID to "GenuineIntel"
jne end_cpuid_type ; if not equal, not an Intel processor
mov _intel_CPU, 1 ; indicate an Intel processor
cmp eax, 1 ; make sure 1 is valid input for CPUID
jl end_cpuid_type ; if not, jump to end
<-
mov eax, 1
CPU_ID ; get family/model/stepping/features
mov _cpu_signature, eax
mov _features_ebx, ebx
mov _features_edx, edx
mov _features_ecx, ecx
shr eax, 8 ; isolate family
and eax, 0fh
mov _cpu_type, al ; set _cpu_type with family
Listing Six
TITLE intel
DOSSEG
.model small
.stack 100h
;----- Include file section -----
includelib \masm\lib\miscutil.lib
includelib \masm\lib\videofns.lib
; ----- External declarations -----
extrn _get_fpu_type: proc
extrn _get_cpu_type: proc
extrn Set_cursor: proc
extrn Get_cursor: proc
extrn HEX32OUT: proc
extrn CLS: proc
extrn _cpu_type: byte
extrn _fpu_type: byte
extrn _cpuid_flag: byte
extrn _intel_CPU: byte
extrn _vendor_id: byte extrn _cpu_signature: dword
extrn _features_ecx: dword
extrn _features_edx: dword
extrn _features_ebx: dword
;------ Local variables & Equates ------
KBD_ReadFn equ 0 ; function to read keyboard
KBD_StatusFn equ 1 ; function to read keyboard status
; ------ Misc data variables ------
.data
PSeriesMsg label byte
db "P6: "
P6Buffer db " ",0dh,0ah
db "P5: "
P5Buffer db " ",0dh,0ah
db "P4: "
P4Buffer db " ",0dh,0ah
db "P3: "
P3Buffer db " ",0dh,0ah
db "P2: "
P2Buffer db " ",0dh,0ah
db "P2: "
P1Buffer db " ",0dh,0ah
db "P0: "
P0Buffer db " ",0dh,0ah,24h
P6Count dd 0
P5Count dd 0
P4Count dd 0
P3Count dd 0
P2Count dd 0
P1Count dd 0
P0Count dd 0
CPUTbl1 dw offset P6Count
dw offset P5Count
dw offset P4Count
dw offset P3Count
dw offset P2Count
dw offset P1Count
dw offset P0Count
CPUTbl2 dw offset P6Buffer
dw offset P5Buffer
dw offset P4Buffer
dw offset P3Buffer
dw offset P2Buffer
dw offset P1Buffer
dw offset P0Buffer
CPUID_Buffer db " $"
;------------------------------------------------------------------------
.code
.8086
start: mov ax, @data
mov ds, ax ; set segment register mov es, ax ; set segment register
and sp, not 3 ; align stack to avoid AC fault
call CLS ; clear screen
call Get_cursor
mov ah,9
mov dx,offset PSeriesMsg ; get message buffer address
int 21h
mov P6Buffer[8],'$' ; make ASCII$ string
mov P5Buffer[8],'$' ; make ASCII$ string
mov P4Buffer[8],'$' ; make ASCII$ string
mov P3Buffer[8],'$' ; make ASCII$ string
mov P2Buffer[8],'$' ; make ASCII$ string
mov P1Buffer[8],'$' ; make ASCII$ string
mov P0Buffer[8],'$' ; make ASCII$ string
@GetCPUID:
call _get_cpu_type ; determine processor type
call print
mov _cpu_type,0 ; clear it...for later
mov ah,KBD_StatusFn ; get keyboard status
int 16h ; read keyboard status
jz @GetCPUID
mov ah,KBD_ReadFn ; read keyboard function
int 16h ; get get key
mov ax, 4c00h ; terminate program
int 21h
;----- print proc near -----
xor bx,bx
mov bl,_cpu_type ; get CPUID
shl bx,1 ; *2
mov si,CPUTbl1[bx] ; get pointer to variable
add word ptr [si],1 ; adjust CPUID counter
adc word ptr [si][2],0
mov dx,608h ; get initial row/col pointer
sub dh,byte ptr _cpu_type
call Set_cursor ; set cursor position
mov si,CPUTbl1[bx]
mov di,CPUTbl2[bx] ; get buffer pointer
call HEX32OUT ; do buffer
mov ah,9 ; print it
mov dx,CPUTbl2[bx] ; get buffer address
int 21h
ret
print endp
end start