CPUID Algorithm Wars

By Robert R. Collins, November 01, 1996

Robert presents a processor-detection algorithm that doesn't use Intel's CPUID instruction.

November 1996: Undocumented Corner

Robert is a design verification manager at Texas Instruments' Microprocessor Design Center. Robert can be reached via e-mail at [email protected].

In September, I presented the basics for determining the identification of specific families of x86 processors--the 8086/88, 80186/188, 80286, 80386, 80486, Pentium, Pentium Pro, and the like. In doing so, I analyzed the processor-detection algorithm distributed by Intel, noting that it failed to detect some processor families, contained a bug that caused it to misdiagnose some processors, depended on undocumented processor behavior, and wasn't guaranteed to work outside of real mode.

In spite of these shortcomings, the Intel algorithm remains in widespread use primarily because it's supported by Intel, and it works most of the time. I concluded by mentioning the possibility of writing an algorithm that was more reliable, detected more processor types, didn't rely on undocumented processor behavior, and returned the actual processor-stepping information--even on processors that don't support the CPUID instruction.

Much like Intel's, my technique detects the processor families that predate the CPUID instruction. It does so by attempting to execute instructions that do not exist on all processor families: Intel implemented at least one new instruction with each successive family, so whether a processor supports a particular instruction reveals its family. If the instruction doesn't exist, the processor generates an invalid-opcode exception, which my algorithm intercepts and records. Once a newly implemented opcode executes successfully, the processor family has been determined. This does not work on the 8086/88, however, because that microprocessor has no invalid-opcode exception.
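The probing loop can be sketched in C against a simulated processor. This is only a model under stated assumptions: probe_t, probe_executes, and detect_family are hypothetical names, the family numbers stand in for real instructions, and the newest-first probe order is an assumption inferred from the description above. The real algorithm executes actual opcodes and relies on an installed invalid-opcode handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe: one instruction, tagged with the first family
   that implements it (2 = 80286, 3 = 80386, 4 = 80486). */
typedef struct {
    int first_family;
} probe_t;

/* Simulated execution. Returns 1 if the instruction runs, or 0 if the
   modeled CPU would raise an invalid-opcode exception instead. */
static int probe_executes(int cpu_family, probe_t probe) {
    return cpu_family >= probe.first_family;
}

/* Probes must be ordered newest family first. The first probe that
   executes names the family; 0 means every probe faulted. */
int detect_family(int cpu_family, const probe_t *probes, size_t count) {
    assert(probes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (probe_executes(cpu_family, probes[i]))
            return probes[i].first_family;
    }
    return 0;
}
```

In this model a Pentium-class part stops at the 80486 probe, which matches the article's flow: anything at or above the 386/486 level is then refined with CPUID.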

The 8086-family and 80186-family processors must be detected in a different manner. The iAPX 286 Programmer's Reference Manual, Appendix D (Intel, part #210498) provides critical information that benefits this algorithm:

"The iAPX 286 will push a different value on the stack for PUSH SP than the iAPX 86/88."

The 8086/88 and 80186/188 had a bug in the PUSH SP microcode. These processors decremented SP first and then stored the decremented value on the stack, instead of storing the value SP held before the decrement. Any program that executed PUSH SP followed by POP SP would therefore not get back the value of SP it started with. This well-documented difference is exploited to distinguish these families of processors from the 80286 and later. Examples 1(a) and 1(b) illustrate the two PUSH SP microcode algorithms.
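A small C simulation makes the test concrete. It models the two PUSH SP behaviors from Example 1 on a 64-KB stack segment and then performs the check: push SP, pop the word back, and compare it with the value SP held beforehand. The cpu_t type and all function names are hypothetical; the real test is a few instructions of assembly on the target processor.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t sp;
    uint8_t  stack[65536];   /* simulated 64-KB stack segment */
} cpu_t;

static void store16(cpu_t *c, uint16_t off, uint16_t v) {
    c->stack[off] = (uint8_t)(v & 0xFF);
    c->stack[(uint16_t)(off + 1)] = (uint8_t)(v >> 8);
}

static uint16_t load16(const cpu_t *c, uint16_t off) {
    return (uint16_t)(c->stack[off] | (c->stack[(uint16_t)(off + 1)] << 8));
}

/* Example 1(a): 8086/80186 store the already-decremented SP. */
void push_sp_8086(cpu_t *c) {
    c->sp = (uint16_t)(c->sp - 2);
    store16(c, c->sp, c->sp);
}

/* Example 1(b): 80286 and later store the original SP. */
void push_sp_286(cpu_t *c) {
    uint16_t temp = c->sp;
    c->sp = (uint16_t)(c->sp - 2);
    store16(c, c->sp, temp);
}

static uint16_t pop16(cpu_t *c) {
    uint16_t v = load16(c, c->sp);
    c->sp = (uint16_t)(c->sp + 2);
    return v;
}

/* Returns 1 if the pushed word differs from SP before the push,
   which is the 8086/80186 signature. SP is left unchanged. */
int pushes_decremented_sp(cpu_t *c, void (*push_sp)(cpu_t *)) {
    assert(c && push_sp);
    uint16_t before = c->sp;
    push_sp(c);
    return pop16(c) != before;
}
```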

While the 80186 demonstrates the same PUSH SP behavior, it also implements the invalid-opcode exception. It is therefore possible to install an invalid-opcode exception handler and execute an instruction that the 80186 family implements but the 8086 does not, which would determine whether an 80186-family processor was installed. However, I couldn't use this approach: I needed to differentiate the 8086 from the 80186, and the invalid-opcode exception does not exist on the 8086. Instead, I relied upon another documented difference between the 8086 and 80186. This difference is mentioned in the 80186/188, 80C186/C188 Hardware Reference Manual (#270788), page A-2:

"When a word write is performed at offset FFFFh in a segment, the 8086 will write one byte at offset FFFFh, and the other at offset 0, while an 80186 family processor will write one byte at offset FFFFh, and the other at offset 10000h (one byte beyond the end of the segment)."
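The quoted difference can be modeled in C with a buffer one byte longer than a segment. This is a sketch, not the article's code: the function names and the 0xAA55 test pattern are illustrative assumptions, and on real hardware offset 10000h is simply the next physical byte after the segment rather than a spare array element.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SEG_BYTES 0x10000u   /* one 64-KB segment */

/* 8086: the offset of the second byte wraps to 0 within the segment. */
void write_word_8086(uint8_t *seg, uint16_t off, uint16_t v) {
    seg[off] = (uint8_t)(v & 0xFF);
    seg[(uint16_t)(off + 1)] = (uint8_t)(v >> 8);
}

/* 80186: the second byte lands at offset 10000h, one byte past the
   end of the segment. The buffer needs SEG_BYTES + 1 bytes. */
void write_word_80186(uint8_t *seg, uint16_t off, uint16_t v) {
    seg[off] = (uint8_t)(v & 0xFF);
    seg[(uint32_t)off + 1u] = (uint8_t)(v >> 8);
}

/* Writes a word at offset FFFFh and reports whether the high byte
   wrapped around to offset 0 (1 = 8086 behavior, 0 = 80186). */
int wraps_at_segment_end(void (*write_word)(uint8_t *, uint16_t, uint16_t)) {
    static uint8_t mem[SEG_BYTES + 1u];
    assert(write_word);
    memset(mem, 0, sizeof mem);
    write_word(mem, 0xFFFF, 0xAA55);
    return mem[0] == 0xAA;
}
```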

All processors from the 80286 to 80486 are detected by installing an invalid-opcode exception handler and executing instructions unique to each processor. When an instruction executes (doesn't generate an exception), the processor family has been determined.

Any processor family that implements the CPUID instruction does not need to use this technique. This would include all Pentium-class processors, late-model 486s, and some non-Intel x86-compatible processors. Once a 386- or 486-class processor has been detected, my algorithm blindly executes a CPUID instruction. I don't make any attempt to verify the existence of the CPUID instruction using the Intel-recommended approach (attempt to toggle EFLAGS.ID). I do this because

I already have an invalid-opcode handler installed, therefore no harm can occur from executing the CPUID instruction.
The Nexgen Nx586 implements the CPUID instruction, but not the EFLAGS.ID bit. In this case, the Intel algorithm used to detect the EFLAGS.ID bit would fail.
The Cyrix 6x86 implements both EFLAGS.ID and CPUID, but they must be enabled by programming a processor-configuration register. It would be inappropriate for a CPU-identification algorithm to make any assumptions about the configurability of an unknown processor type and hence, program some model-specific registers.

The remainder of the algorithm depends on the success or failure of the CPUID instruction. If CPUID exists, it is used to determine whether or not the processor is a genuine Intel processor. If it is, the processor-detection algorithm is completed; if not, the results are modified with a flag to indicate a non-Intel processor (see the comments in CPUID.ASM, available electronically). If the CPUID instruction doesn't exist, an invalid-opcode exception occurs, and the processor model and stepping are determined by other means.
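The article does not show how the genuine-Intel check is made. One common way, sketched here, is to compare the vendor string that CPUID function 0 returns in EBX, EDX, and ECX (in that order) against "GenuineIntel". The register values are passed in as plain integers so the sketch runs anywhere; cpuid_vendor and is_genuine_intel are hypothetical names, not routines from CPUID.ASM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assembles the 12-character vendor string from the registers that
   CPUID function 0 returns. Each register holds four characters,
   lowest byte first. */
void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13]) {
    const uint32_t regs[3] = { ebx, edx, ecx };
    assert(out);
    for (int i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++)
            out[i * 4 + b] = (char)((regs[i] >> (8 * b)) & 0xFF);
    }
    out[12] = '\0';
}

/* Returns 1 if the vendor string is "GenuineIntel", otherwise 0. */
int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx) {
    char vendor[13];
    cpuid_vendor(ebx, edx, ecx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}
```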

Detecting the Processor Model

Detecting the processor model on processors without the CPUID instruction is considerably more difficult. For example, the 80386, 80486, and Pentium have many model types. The 80386 family consists of models DX, SX, CX, EX, and SL. The 80486 family consists of models DX, DX2, DX4, SX, SX2, and SL, along with the 80487 DX and SX. The Pentium family is still growing, but currently has the P5, P54C, OverDrive, and OverDrive for DX4 processors. Sometimes it is possible to detect these different models, and other times it isn't. (The Intel algorithm makes no attempt at determining any processor models.)

The 80386 DX and SX may be differentiated from each other by detecting differences in the CR0 register. Bit 4 of this register (CR0.ET) may be toggled on the 80386 DX, while it is hardwired to 1 on the 80386 SX, CX, EX, and SL. This difference is found by comparing Figure 2.2 of the 386 SX microprocessor data sheet (#240187) to the other 386-class processor data sheets. A further distinction may be made to detect the 80386 SL, which has the ability to return its processor identification by executing a series of I/O operations (IN/OUT instructions). Accessing this internal register is documented in the 80386 SL data sheet (#240815), section 8.2. In this case, the actual family, model, and stepping information is returned.
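The CR0.ET test amounts to flipping bit 4, reading the register back, and restoring it. The following C sketch runs that sequence against a simulated register, since CR0 itself is only reachable from privileged code. The cr0_model_t type and its hardwired_ones field are modeling assumptions, not anything taken from the data sheets.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_ET (1u << 4)   /* bit 4 of CR0 */

typedef struct {
    uint32_t value;            /* last value written */
    uint32_t hardwired_ones;   /* bits that always read back as 1 */
} cr0_model_t;

static uint32_t cr0_read(const cr0_model_t *r) {
    return r->value | r->hardwired_ones;
}

static void cr0_write(cr0_model_t *r, uint32_t v) {
    r->value = v;
}

/* Flips ET, checks whether the change stuck, then restores CR0.
   Returns 1 if ET can be toggled (80386 DX in the text above),
   0 if it is hardwired. */
int cr0_et_is_writable(cr0_model_t *r) {
    assert(r);
    uint32_t saved = cr0_read(r);
    cr0_write(r, saved ^ CR0_ET);
    int toggled = ((cr0_read(r) ^ saved) & CR0_ET) != 0;
    cr0_write(r, saved);
    return toggled;
}
```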

In theory, the 80486 DX and SX may be detected in the same manner as the 80386 DX and SX. This difference is documented in Figure 2.5 of the 486 Microprocessor, 487 Math Coprocessor data sheet (#240950). In reality, however, the documentation is wrong: CR0.ET is hardwired to 1, regardless of which 80486 is in use. Therefore, no distinction is made between the 80486 DX and the 80487 SX. However, the 80486 DX and 80486 SX may be distinguished from each other by detecting the presence of the floating-point unit (FPU). This algorithm is documented in the 80286 and 80287 Programmer's Reference Manual (80287 section of the manual), Figure 3-1 (#210498). The 80486 SL processor may be detected in the exact same manner as the 80386 SL. I did not attempt to write an algorithm to detect the differences between the 80486 DX2 and DX4.

Detecting the Processor Stepping Information

A deeper level of processor identification includes obtaining the actual stepping information of the processor. The stepping information is useful in determining which bugs are fixed in a specific model. Even with this information, you must cross reference the stepping ID with the errata documents to determine which bugs are fixed. This information is publicly available for the Pentium and Pentium Pro. For all other processors, it is still kept confidential and released only by nondisclosure agreement.

The stepping information is contained in an internal processor register called the "ID register." On all 80386s and later processors, this information is returned in EDX after the processor completes a RESET. This information is not provided on any processor family before the 80386. Even though the format of this register has changed slightly over time, it contains the same essential information. To obtain this ID register on processors that do not support the CPUID instruction, you must RESET the processor. All PC compatibles have a means to recover from a RESET and return control to a user program. To do this, you must store a special signature in CMOS RAM, and a far-jump pointer in memory before the RESET occurs. After RESET, the BIOS reads this signature and returns control to your program at the specified far-jump pointer. This doesn't always work to retrieve the processor ID, as there is no guarantee that EDX hasn't been destroyed by the time the user program regains control. There is another technique, but it is much more complicated.

To capture the processor ID, you must take control of code execution immediately after a RESET occurs. Whenever the processor is RESET, the CPU starts execution from physical address 0xFFFFFFF0. The chipset maps the BIOS from 0xF0000 to the top of memory at 0xFFFF0000. This allows the CPU to start executing BIOS code immediately after RESET. We want to prevent the BIOS from gaining control of execution after RESET and, instead, direct control to our own program. In theory, this is impossible, but due to the design of the PC, there is a way to gain control of execution before the BIOS. Imagine what would happen if the first instructions the processor fetched from the RESET vector were invalid opcodes. The processor would generate an invalid-opcode exception and dispatch the invalid-opcode exception handler. The handler is dispatched by reading a far pointer from address 0:0x18 and branching to that location whenever such an exception occurs. If you set up an invalid-opcode exception handler before a RESET occurs, and somehow prevent the BIOS from taking control of execution upon RESET, the exception handler could save the processor ID (contained in EDX) before returning control to your CPUID-detection program. The difficulty in this technique is forcing an invalid-opcode exception to occur after RESET, upon the first instructions fetched. Normally, this would not occur, as the first instructions fetched would always be valid BIOS instructions.

Forcing the CPU to fetch invalid opcodes upon RESET is easier than it might sound. This technique doesn't employ anything as esoteric as reprogramming the chipset to unmask the BIOS Shadow RAM, then modifying the executable RAM image of the BIOS to contain invalid opcodes, then resetting the processor. Surely this would work to generate an invalid-opcode exception, but this technique would be specific to each chipset implementation and is a ludicrous suggestion. There is a much simpler solution. Remember that the CPU starts executing at physical address 0xFFFFFFF0 upon RESET. If you could force the CPU to execute from a different physical-memory address or somehow make a different physical-memory address appear at 0xFFFFFFF0, this would likely suffice to cause the CPU to fetch invalid opcodes and generate the requisite exception. Unfortunately, there is no way to force the CPU to fetch from any address other than 0xFFFFFFF0, so you must think of a technique to force some other memory location to appear at that address. In this case, you can thank the CPU A20 gate, which is required by every PC-compatible computer. (I will not explain the CPU A20 gate, or the significance of the CPU address line A20 for PC-compatibility. For further information on this subject, see http://www.x86.org/articles/A20.) When the A20 gate is programmed for 8086 compatibility, the CPU address line A20 is always grounded. This has the side-effect of making the memory at physical address 0xFFEFFFF0 appear at 0xFFFFFFF0. If a reset occurs when the A20 gate is programmed in this mode, the CPU fetches opcodes from address 0xFFEFFFF0 instead of the normal location (though the CPU doesn't know the difference). Most likely, there aren't any valid instructions at this address, and the processor fetches invalid opcodes. Subsequently, the requisite invalid-opcode exception occurs. Thus, our invalid-opcode exception handler seizes control of execution and stores the processor ID from EDX for later use by your program. The program SHUTDOWN.ASM (available electronically) demonstrates this technique.
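The address arithmetic behind this trick is easy to check. The C sketch below models a grounded A20 line as clearing bit 20 of every address the CPU drives, and also computes where the invalid-opcode vector (interrupt 6) sits in the real-mode interrupt table. bus_address and ivt_offset are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define A20_MASK     (UINT32_C(1) << 20)
#define RESET_VECTOR UINT32_C(0xFFFFFFF0)

/* Address seen by the memory system. With the A20 gate set for 8086
   compatibility (a20_enabled == 0), address line A20 is forced low. */
uint32_t bus_address(uint32_t cpu_address, int a20_enabled) {
    assert(a20_enabled == 0 || a20_enabled == 1);
    return a20_enabled ? cpu_address : (cpu_address & ~A20_MASK);
}

/* Offset of an interrupt vector in the real-mode table at segment 0.
   Each entry is a 4-byte far pointer. */
uint16_t ivt_offset(uint8_t vector) {
    return (uint16_t)(vector * 4u);
}
```

With A20 grounded, bus_address(RESET_VECTOR, 0) yields 0xFFEFFFF0, and ivt_offset(6) yields 0x18, the two addresses named in the text.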

There are times when this technique doesn't work. Some chipsets think of this anomaly as a bug that could lead to a system hang. Such chipsets map the BIOS to both locations (0xFFEF0000 and 0xFFFF0000). Therefore, regardless of the state of the A20 gate, a system RESET will gracefully invoke the BIOS reset code instead of allowing the system to hang. When a chipset decodes the BIOS in both regions, this technique will not work. Therefore, my source code detects this condition and doesn't attempt to use this technique if it's not going to work.

The Caveats

This algorithm is written for real and v86 mode. However, the algorithm (not the actual implementation) can easily be adapted to CPL-0 protected mode.

Executing this algorithm while using some memory managers may not work. Older memory managers, like Microsoft's EMM386.EXE, will not allow an application program to execute its own invalid-opcode exception handler. Such memory managers halt the system when such an exception occurs. Such behavior by the memory manager should be considered a design flaw on their part, and not consistent with Intel's original intent of providing the invalid-opcode exception type. The invalid-opcode exception was intended to trap invalid opcodes and allow extensions to the x86 architecture. It was envisioned that a software developer could define his own opcodes and rely on his own invalid-opcode exception handler to implement instruction extensions. This intention was documented by Intel in the iAPX 286 Programmer's Reference Manual, Appendix B:

"Since the offending opcode will always be invalid, it cannot be restarted. However, the #UD handler might be coded to implement an extension of the iAPX 286 instruction set. In that case, the handler could advance the return pointer beyond the extended instruction and return control to the program after the extended instruction is emulated. Any such extensions may be incompatible with the iAPX 386."

Further supporting my point is the fact that Intel has reserved two opcodes specifically for this purpose. These opcodes will not be used by Intel and are guaranteed to be reserved on future generations of processors. These opcodes are documented in the 1995 edition of the Pentium Processor Family Developer's Manual, Volume 3 (#241430), Appendix A: "0F0B and 0FB9 should be used when deliberately generating an invalid-opcode exception."

All of the current memory managers I tested pass the invalid-opcode exception to the faulting v86 task. This development renders the objections lodged against my algorithm obsolete. I tested EMM386-95, 386Max v8.0, and QEMM v8.0, and found that my algorithm works without any problems.

I only make a half-hearted attempt to identify non-Intel processors. Some non-Intel processors are identified as a byproduct of my algorithm. To accommodate non-Intel processors, I have used two bits of the result code to indicate their detection (bits 15 and 14). The Intel CPUID processor specification (#241618) lists these two bits as reserved. Should Intel change their specification, my algorithm would need modification.
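To read the result codes in Table 1, it helps to split them into fields. The C sketch below assumes the low 12 bits follow the CPUID signature layout (family in bits 11..8, model in bits 7..4, stepping in bits 3..0), which is an inference from values such as 52B and C504 rather than something the article states, and it treats bits 15 and 14 as an opaque two-bit non-Intel marker. The names cpu_result_t, decode_result, and encode_result are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned vendor_bits;   /* bits 15..14; nonzero marks a non-Intel CPU */
    unsigned family;        /* bits 11..8 (assumed layout) */
    unsigned model;         /* bits 7..4  (assumed layout) */
    unsigned stepping;      /* bits 3..0  (assumed layout) */
} cpu_result_t;

/* Splits a 16-bit result code into its fields. */
cpu_result_t decode_result(uint16_t code) {
    cpu_result_t r;
    r.vendor_bits = (code >> 14) & 0x3u;
    r.family      = (code >> 8) & 0xFu;
    r.model       = (code >> 4) & 0xFu;
    r.stepping    = code & 0xFu;
    return r;
}

/* Packs the fields back into a result code. Bits 13..12 stay zero. */
uint16_t encode_result(cpu_result_t r) {
    assert(r.vendor_bits <= 0x3u && r.family <= 0xFu);
    assert(r.model <= 0xFu && r.stepping <= 0xFu);
    return (uint16_t)((r.vendor_bits << 14) | (r.family << 8) |
                      (r.model << 4) | r.stepping);
}
```

Under these assumptions, the Pentium entry 52B decodes to family 5, model 2, stepping B, and the Nexgen entry C504 decodes to family 5 with both non-Intel bits set.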

I make no attempt at determining some processor model types. In some cases, it may be possible to take advantage of processor differences (such as the number of CPU address lines) to detect model differences. However, any such technique would be at the mercy of the chipset implementation (which can map unused memory to the system bus), rendering the technique ineffective.

The Results

I tested each processor in as many environments as I had available. In all cases, I booted from a Windows 95 boot floppy (except the 80286, which isn't supported by Windows 95). I tested the algorithm without any device drivers loaded (Clean), and using the latest versions of EMM386, 386Max, and QEMM. I also downloaded V86MON from Jim Brooks' Web site (http://webusers.anet-dfw.com/~jimb/x86.htm). V86MON is a virtual-mode manager that is ideal for testing how a real-mode application will operate in v86 mode. The author claims to have run hundreds of real-mode applications without a single error by providing near-perfect real-mode emulation. V86MON has four flavors: v86 mode with and without interrupt virtualization, and enhanced v86 mode with and without interrupt virtualization. Table 1 summarizes the results.

Occasionally, the Intel algorithm fails to properly detect any processor higher than an 80386, instead identifying an 80486, Pentium, and Pentium Pro as an 80386. (I described this bug in my September column.) This error manifests itself when I boot without any device drivers (Clean) and while using all of the V86MON variants. As I found out, the near-perfect real-mode emulation of V86MON was a little too perfect, as Intel's algorithm demonstrated the same errant behavior when run under V86MON. The bug didn't manifest itself when run under EMM386, 386Max, or QEMM. In these cases, the bug was obscured because these memory managers didn't perfectly emulate real-mode behavior. Regardless, my algorithm isn't prone to this failure.

Intel's algorithm locked up the computer when run on V86MON (normal v86 mode) with interrupt virtualization enabled. My algorithm ran perfectly in this environment.

Intel's algorithm detects the AMD 5k86 as an 80486-class processor, even though it has all of the features and instructions of the Pentium, including the CPUID instruction. My algorithm correctly detects this processor as a non-Intel, Pentium-class processor.

Intel's algorithm detected the Cyrix 6x86 as an 80486-class processor, even though it has all of the performance and instructions of a Pentium-class processor. In real mode, my algorithm used the processor-shutdown technique to correctly detect the 6x86 as a non-Intel, Pentium-class processor. Outside of real mode, my algorithm, like Intel's, detected the 6x86 as an 80486-class processor.

Intel's algorithm detected the Nexgen Nx586 as an 80386-class processor. My algorithm detected the Nx586 using CPUID, and correctly reported it as a non-Intel, Pentium-class processor.

Overall, both algorithms produce similar results and operate properly in a wide range of environments. Back when the CPUID algorithm wars were raging (circa 1988, 1989), the pundits lauded the Intel algorithm over my approach, claiming that their algorithm worked in a wider range of operating environments. I argued the technical correctness of my algorithm and claimed the memory managers, which kept my algorithm from running, were designed incorrectly. Therefore, I was proud to discover that all of the memory managers I tested for this column have fixed their designs. This allows my algorithm to run without failure. As my results demonstrate, my algorithm works in a wider range of operating environments, isn't prone to misdiagnosing the processor family, isn't prone to misdiagnosing non-Intel processors, provides the processor model type where applicable, and provides the model and stepping information on processors that don't support the CPUID instruction. In addition to these benefits, my algorithm is adaptable to protected mode, whereas Intel's isn't. This should settle the CPUID algorithm wars once and for all, or at least for the next delta of time.

Example 1: (a) PUSH SP algorithm for 8086, 80186; (b) PUSH SP algorithm for 80286+.

(a)

PUSH(SP) {
  SP = SP - 2;      // decrement first
  [SS:SP] = SP;     // store the already-decremented SP
}


(b)

PUSH(SP) {
  TEMP = SP;        // latch the original SP
  SP = SP - 2;
  [SS:SP] = TEMP;   // store the value SP held before the decrement
}

Table 1: Comparison of CPUID detection algorithms.

                  Clean   EMM386  386Max  QEMM    V86Vi   V86NoVi Ev86Vi  Ev86NoVi

80286
 Intel Algorithm  2       N/A     N/A     N/A     N/A     N/A     N/A     N/A
 My Algorithm     20F     N/A     N/A     N/A     N/A     N/A     N/A     N/A

80386 DX
 Intel Algorithm  3       3       3       3       Hang    3       N/A     N/A
 My Algorithm     30F     30F     32F     30F     30F     30F     N/A     N/A

80386 SX
 Intel Algorithm  3       3       3       3       Hang    3       N/A     N/A
 My Algorithm     320     32F     32F     32F     32F     32F     N/A     N/A

80486 DX
 Intel Algorithm  4       4       4       4       Hang    4       N/A     N/A
 My Algorithm     402     40F     40F     40F     40F     40F     N/A     N/A

Intel 486 DX-50      
 Intel Algorithm  4       4       4       4       Hang    4       N/A     N/A
 My Algorithm     411     40F     40F     40F     40F     40F     N/A     N/A

Intel 486 DX4
 Intel Algorithm  4       4       4       4       Hang    4       N/A     N/A
 My Algorithm     480     480     480     480     480     480     N/A     N/A

Pentium
 Intel Algorithm  5       5       5       5       Hang    5       5       5
 My Algorithm     52B     52B     52B     52B     52B     52B     52B     52B

Pentium Pro
 Intel Algorithm  6       6       6       6       Hang    6       6       6
 My Algorithm     611     611     611     611     611     611     611     611

Texas Instruments 486
 Intel Algorithm  4       4       4       4       Hang    4       N/A     N/A
 My Algorithm     40F     40F     40F     40F     40F     40F     N/A     N/A

AMD 5k86
 Intel Algorithm  4       4       4       4       Hang    4       4       4
 My Algorithm     C500    C500    C500    C500    C500    C500    C500    C500

Cyrix 6x86
 Intel Algorithm  4       4       4       4       Hang    4       4       4
 My Algorithm     4520    40F     40F     40F     40F     40F     N/A     N/A

Nexgen Nx586
 Intel Algorithm  3       3       3       3       Hang    3       N/A     N/A
 My Algorithm     C504    C504    C504    C504    C504    C504    N/A     N/A
