Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Embedded Systems

Undocumented Corner


July 1996: Undocumented Corner

Paging Extensions for the Pentium Pro Processor

Robert R. Collins

Robert is a design verification manager at Texas Instruments' Microprocessor Design Center. He can be reached at [email protected].


As early as June 1991, Intel circulated confidential documents describing the new features of their P5 processor, eventually known as the "Pentium Processor." Most of those new features would be shrouded in controversy, their details kept secret by Intel. But one of the most advanced features-Physical Address Extensions (PAE)-was entirely removed. PAE gave the processor the ability to address up to 64 GB of physical memory (36-bit address bus), and access page sizes of 2 MB. The larger physical-address space and the new 2-MB paging features were interrelated, as both were enabled by the same control bit. In all other operating modes, the normal 32-bit address space was in operation.

PAE would have been enabled with CR4, bit 5. When CR4.PAE=1 (CR4[5]=1), PAE (36-bit addressing) and large 2-MB pages would be accessible. When CR4-.PAE=0, A[35..32] would be forced to 0, regardless of what addresses could be generated in protected mode (when a descriptor pointing near 4 GB is combined with an offset that results in an address above 4 GB). Even when CR4.PAE=1, addresses above 4 GB would not be generated unless they were the result of a paging translation. The only means to access memory above 4 GB was through these extensions to page mode.

Before the Pentium went into production, PAE was removed. Even so, documented references to this feature still appear in the various Pentium manuals, and from other sources. In the Pentium Processor User's Manual Volume 1 (1993 edition), "Chapter 2: Overview," paragraph 8 mentions "extensions to the architecture which allow 2 Mbyte and 4 Mbyte page sizes." This reference was removed in the next edition of this manual. In the Pentium Processor User's Manual Volume 1 (all editions), Figures 3 and 4 show that the linear address is composed of four fields, one of which is called the "directory pointers." The directory pointers are unique to PAE. The Pentium Processor at iCOMP Index 735\90 MHz (Intel part number 241997 all revisions), Section 1.1, paragraph 5, also mentions 2-MB paging extensions. This same reference is present in other versions of this data sheet, using different part numbers and speed ratings (242323 all revisions). Ironically, this reference is absent in at least one analogous data sheet (241595). It is also worth mentioning that the Intel-designed in-circuit test probe also included software support for PAE.

The most significant and interesting details on PAE come from the Pentium Processor Family Developer's Manual, volume 3, and from an Intel Architecture Labs (IAL) CD-ROM (Special Edition: P6 Processor Software Developer CD April 1995). In my previous column, I discussed page-size extensions (PSE) on the Pentium and mentioned that setting some reserved bits in various page tables would cause a page fault when an access was made to that structure (see "Understanding Pentium's 4-MB Page Size Extensions," DDJ, May 1996). This reference was quoted from the Pentium Pro Processor Reference Manual, section 23.2.14.1:

A Page Fault exception occurs when a 1 is detected in any of the reserved bit positions of a page table entry, page directory entry, or page directory pointer during address translation by the Pentium processor.

I wrote that it might be tempting to associate the page-directory pointer as CR3 and that any such association would be incorrect. In reality, this is another reference to PAE that hasn't been removed from the Pentium documentation. As amusing as these references might be, the most expository ones are in a place that you might not have thought to look-the glossary. These glossary references virtually give away PAE detail:

  • Page Directory Pointer: An 8-byte entry in the Page Directory Pointer Table specifying the 36-bit physical address of the page directory, whether it's present in memory, and cache management (PCD and PWT) bits for the page directory.
  • Page Directory Pointer Table (PDPT): Contains four Page Directory Pointers used in page translation by the Pentium processor. The PDPT's physical address is specified by the CR3 register.
Finding the glossary, however, may prove to be more difficult. The 1993 edition of Volume 3 (241430-001) is the only hard-copy edition that contains the glossary. The early-1994 edition (241430-002) mentions the glossary in the "Table of Contents," but it is missing from the manual. Ironically, the late-1994 electronic edition (241430-003, available on CD-ROM from Intel) contains the glossary, but its hard-copy counterpart does not. Finally, neither the latest electronic version nor hard-copy of Volume 3 (241430-004) contains the glossary.

Searching the IAL P6 CD-ROM turned up a near-perfect description of the PAE paging-translation mechanism, the only error being the association with 32-bit addressing and 2-MB page sizes. Embedded in a small, obscure file on this CD is the following information:

What is 36-Bit Addressing?

This is achieved by enabling CR4.PAE=1 (Page Addr Ext.). Page Translation occurs by Bits 30&31 pointing to an entry in a page dir ptr table (a new structure pointed to by CR3). Page directory pointer table points to page directory and bits 21-29 point to entry in page directory. Page directory entry points to beginning of page table with bits 12-20 pointing to entry in that table. That entry points to page frame with bits 0-11 pointing to offset in page. Page sizes are 2M when using 32-bit addresses. (Request OS Writer's Guide if more information is required.)

We can deduce from this reference that PAE, like PSE, supports two page sizes-4-KB pages and 2-MB pages. Bits 0-11 of the linear address point to an offset in the page frame. This indicates 12 bits of addressability in this mode (212), or a 4-KB page size. When the PS bit in the page directory equals 1, bits 0-11 of the linear address are combined with bits 12-20 to allow 21 bits of addressability (221), or a 2-MB page size.

Other evidence of PAE may be found by examining the Pentium architecture itself. CR4[5] is marked reserved, and was to enable PAE; CPUID.flags[6] is marked reserved and was to indicate the existence of PAE; the model-specific register (MSR) TR8 is marked reserved and was to contain the upper-four address bits used for TLB testability.

PAE and 36-bit addressing are now implemented in the Pentium Pro Processor (P6). I managed to get a copy of the official Intel documentation just as the article was being finished. As a result, the description of PAE was deduced from these previously documented sources and from reverse engineering on a P6-based computer, not the P6 manuals.

How PAE Works

To support 36-bit addressing, it's necessary to make substantial changes to the paging mechanism. Thirty-two-bit linear addresses are still used, but they are translated to 36-bit physical addresses. Intel chose to use a three-tier paging mechanism to support PAE for 4-KB pages, and a two-tier mechanism for 2-MB pages. With PAE enabled, CR3 points to a small Page Directory Pointers Table (PDPT). Each PDPT entry references a separate page directory. Each page directory points to a page table (for 4-KB pages), or directly to the page frame (for 2-MB pages). Table 1 provides a detailed description of all of the CPU structures associated with page translations while PAE is enabled. For comparative purposes, Table 2 gives a detailed description of all of the CPU structures associated with page translations while PSE is enabled (4-MB pages). The description of these fields may be found in the appropriate Pentium documentation. Table 3 describes the fields associated with Table 1 and Table 2, which may need further clarification.

For the most part, the paging-translation process works the same as it always has. Linear addresses are converted to physical addresses through a series of table lookups. The most significant changes in the PAE implementation are the extra table lookup (extra level of indirection) and the changes in the paging structures themselves. When PAE is enabled, an extra level of indirection is added (the PDPT lookup). CR3 points to a 4-entry PDPT, with entries that point to the base of separate page directories. This behavior is different from all previous implementations, where CR3 pointed to the base of a single page directory. The size of the paging structures have doubled to accommodate the extra 4 bits in the base address field, but otherwise, they are virtually identical to their predecessors. The significance of these changes is twofold:

  • The size doubling of each structure entry has forced the number of entries in each table to be reduced by half (the physical size of each table remains the same).
  • Most of the software written to support the previous paging implementation will not be compatible with PAE.
The paging translation process is virtually identical to the prior implementation. The translation process is identical for 2-MB and 4-KB pages, except the page table reference is omitted: When a 32-bit linear address is presented to the paging unit, it is broken down into various fields:

  • CR3 points to the base address of the PDPT.
  • Address bits A[31,30] of the linear address form a 2-bit index into the PDPT. The PDPT entry contains the base address (and some control information) of a page directory.
  • Address bits A[29..21] form an index into the page directory (from its base). This page-directory entry (PDE) points to the base address of a page table (PDE.PS=0, 4-KB pages), or directly to a 2-MB page frame (PDE.PS=1).
  • Address bits A[20..12] form an index into the page table from its base (4-KB pages only). This page-table entry (PTE) points to the base address of a 4-KB portion of memory.
  • The remaining address bits, A[20..00] (2-MB pages), or A[11..00] (4-KB pages), form an index (an offset) into the actual page of memory.

In all cases, the base addresses contained in these tables (PDPT, PDE, and PTE) are physical addresses and are not subject to any paging translation. Figure 1 shows the page-address translation for 2-MB and 4-KB pages when PAE is enabled.

Caveats of PAE

New features are subject to caveats and bugs, and PAE is no exception. In this case, however, the caveats are understandable, and in some cases desirable:

  • The extra level of paging translation caused by accessing the PDPT does not degrade performance. The Pentium Pro caches all four entries of the PDPT, thereby eliminating any performance degradation.
  • Writing to reserved bits in the PDPT generates a general protection fault (#GP), not a page fault (#PG). This behavior contradicts the behavior described in the Pentium manual (volume 3, section 23.2.14.1), and Pentium Pro manual (volume 3, section 15.7.2). But other sections of the Pentium Pro manuals describe this behavior correctly.
  • Setting some reserved bits in the PDPT causes a GPF, while others don't. Bits [63..36], [8..5], and [2..0] are marked reserved, and should generate a GPF when set. A GPF does not occur when setting bits [63..49], [47..46], [44..36], and [0]. (Conspicuously, setting bits [48, 45] cause a GPF.)
  • Bit 0 in the PDPT is marked reserved but behaves as a present bit. Most likely, this is a documentation error. (See http:// www.x86.org/errata for my collection of Intel errata.)
  • CR3 contains a 27-bit address pointer to the base of the PDPT. In all other implementations (386, 486, Pentium, and PSE enabled), CR3 contains a 20-bit address pointer.
  • Even when PAE is enabled, only 4 GB of physical-address space can be addressed at any given time. Regardless of how big the page sizes are, only 32 bits of linear-address space are available. To address more than 4 GB of physical memory, CR3 must be changed (thus pointing to a different set of paging structures), or the existing paging structures must be modified (on-the-fly) by the operating system.
  • If any of the PDPT entries are modified by the operating system, these modifications will not take effect until CR3 is reloaded (thereby flushing the PDPT). Even if a certain PDPT entry has not been accessed by the microprocessor prior to its modification, any changes by the OS go unnoticed until the PDPT is flushed by reloading CR3. This behavior is a side effect of caching the entire PDPT.
  • It is impossible to share paging structures with the standard 386/486 paging structures or with the structures used for PSE. The PAE structures use 8-byte entries, where the old-style structures use 4-byte entries. This change actually forces better programming techniques.
  • As with PSE, rigorous type checking on reserved bits in the various paging structures is enabled when CR4.PAE=1. This behavior was explained in my previous column. (See http://www.x86.org for source-code demonstration of this behavior.)
  • Some ambiguity may exist when PAE and PSE are simultaneously enabled. The two features should be considered mutually exclusive, with PAE having higher priority. Table 4 shows the resolution to these ambiguities.

Conclusions

PAE may have been a feature that had come before its time. In 1991, when it was secretly introduced to developers, very few computers needed anything even approaching 4 GB of physical memory. For whatever reason, Intel decided to remove this feature before the Pentium went into production. As with the Appendix H features from the Pentium Processor, not all documentation references to this feature were removed. Tracking down all of these references allowed me to write source code to test this feature months before Intel released Pentium Pro documentation; see the file PAE.ASM, available electronically, and at http:// www.x86.org.

In many ways, this new paging mode is more desirable than 4-MB pages. By allowing page sizes of 4 KB and 2 MB, the operating system has better control over paging large data structures. Page sizes of 4 KB are obviously too small for these large structures, and 4-MB pages are often times too large. Therefore 2-MB pages are a good compromise for efficiently paging large data structures. Unfortunately, since PAE wasn't introduced five years ago, its data structures aren't backward compatible. It will be much harder for operating-system developers to justify using this new feature, since it is one generation removed from other new paging features on the Pentium.

Table 3: Description of paging structure fields.

      RSV      Reserved. If set (RSV=1), may cause a page fault. Setting this bit only 
               causes a page fault during page translation. If the referenced page entry 
               is in the TLB, then setting this bit and referencing the page will not cause a
               page fault.  If the entry is not in the TLB, or gets flushed from the TLB, 
               then the next reference to this structure will cause a page fault. The page 
               fault error code on the stack will have the RSV bit set (bit-3). If a reserved 
               bit in the PDPT is set, a GPF is generated when this struct is accessed. 
               This only occurs when CR3 is loaded, thereby implicitly loading the PDPT.

      0        Reserved, does not cause a page fault. When this bit is set, a page fault 
               does not occur, even though this bit is reserved.

      PS*      This bit is always set=1. When set=1, this page directory entry points to a 
               2-MB or 4-MB page.

      PS**     This bit is always clear=0. When clear, this page directory entry points to 
               a page table.

Table 4: PAE/PSE page-size precedence.

      CR4.PAE      CR4.PSE      PDE.PS      Page Size      # Address Bits

      0            0            0            4 KB            32

      0            0            1            4 KB            32

      0            1            0            4 KB            32
     
      0            1            1            4 MB            32

      1            0            0            4 KB            36

      1            0            1            2 MB            36

      1            1            0            4 KB            36

      1            1            1            2 MB            36




Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.