Hal is a hardware engineer who sometimes programs. He is the former editor of DTACK Grounded, and can be contacted at firstname.lastname@example.org.
In the early 1970s, the computer industry was a lot smaller than it is now. Data General made the popular Nova line of minicomputers. Keronix made Nova-compatible 32-KB core memory boards, which it sold for $3300, exactly half the price of the equivalent DG board. The Keronix plant -- headquarters, manufacturing, everything -- occupied about 5000 square feet of a multitenant single-story office building in Southern California, just across the street from the freeway offramp at 26th Street in Santa Monica. I'd see this plant every time I drove up from Santa Ana to visit my family.
One Sunday evening, Keronix had a fire. When the firefighters arrived, they found a man in the production area who'd been overcome by smoke. An empty gasoline can was nearby. The police were at his bedside when he was revived in the hospital, but he refused to reveal his identity.
He turned out to be a private detective from Philadelphia, who did a lot of work for a certain large legal firm in that city. Guess who was a major client of that legal firm? Aw, you guessed. The major client was Data General, headquartered in Massachusetts. There were rumors back then that business in that state was conducted in a very, um, competitive fashion.
This is a true story. Tracy Kidder told it in his Pulitzer-prize-winning book The Soul of a New Machine, but he got part of the story wrong: He thought Keronix made competing computers. Nope. Just ferrite core-based memory boards.
All this stuff was circumstantial, and no criminal charges were ever filed. However, exercising my rights of free speech, every time I ran across DG's local sales rep over the next decade, I'd ask him if Data General had firebombed any more competitors. Pop quiz: Keronix never sued Data General in civil court. Why? The answer will be given at the end of this column.
The Broken Paradigm
Historically, whenever Intel introduced a new CPU, it encountered the Osborne problem: It didn't want people to stop buying the old CPU and wait for the new one while production slowly ramped. So the Intel flack would blow the dust off this old PR release:
Our magnificent new [fill in blank] CPU has such stunning performance that it will surely always be reserved for service in very high-end minicomputers and Unix servers. Don't ever expect to see this new chip in your home PCs.
IBM actually believed that bovine fertilizer and introduced the original 286-based AT as a multiuser box with additional RS-232 ports for connection of the dumb terminals for the other users of all that magnificent power -- at 6 MHz, with two wait states. It confused the IBM folk terribly when people insisted on buying the AT strictly for their own personal use.
When the Pentium Pro was introduced, a strange thing happened: This new chip was found to run Windows 3.1 more slowly than a Pentium of the same clock speed (see "P6 Underperforms on 16-bit Software," by Linley Gwennap, Microprocessor Report [MPR], July 31, 1995). "We really meant it this time!" the Intel flacks assured us. "The Pentium Pro really is intended only for use in high-end workstations and servers!"
The single biggest problem the Pentium Pro has in running 16-bit software is the frequent segment-register writes in legacy code. The Pentium's segment-descriptor cache was removed in the Pentium Pro. This has grave consequences. According to the aforementioned MPR article:
"...a write to a segment register cripples the P6's speculative execution. Segment writes cannot be executed speculatively in the P6, all previous instructions must be drained from the pipeline before a segment write can occur. Furthermore, because a segment change can affect the execution of all subsequent operations, instructions following the segment write cannot be executed out of order but must wait for the segment register to be updated."
The MPR asserts that a segment write in the P6 has a total "cost" of 20 to 30 clocks. That's a heavy burden in a CPU that's supposed to execute multiple instructions in one clock, not one instruction in 30 clocks!
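A back-of-the-envelope sketch shows how quickly that burden adds up. The 20-to-30-clock cost is MPR's figure; the sustained rate of two instructions per clock and the frequency of one segment write per 50 instructions are numbers I've made up purely for illustration, not measurements of any real code.

```python
def effective_ipc(ideal_ipc, seg_write_every, seg_write_cost):
    """Average instructions per clock when one instruction in every
    `seg_write_every` is a segment-register write that costs
    `seg_write_cost` clocks, and the rest run at `ideal_ipc`."""
    # Clocks spent on the ordinary instructions in one window.
    normal_clocks = (seg_write_every - 1) / ideal_ipc
    # Add the full cost of the single segment write in that window.
    total_clocks = normal_clocks + seg_write_cost
    return seg_write_every / total_clocks


# Assumed values (hypothetical): 2.0 instructions/clock between hurdles,
# one segment write per 50 instructions.
for cost in (20, 30):  # MPR's stated range of clocks per segment write
    print(cost, round(effective_ipc(2.0, 50, cost), 2))
```

Under these assumed numbers, a single segment write per 50 instructions is enough to drag the average down to roughly one instruction per clock, about half the assumed peak.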
"He is Very Fast Between the Hurdles"
The above is the caption of an old cartoon in which a runner in a 110-meter high-hurdles race is shown laboriously and clumsily climbing over a hurdle.
The P6 appears to be very fast between segment-register writes, which are, according to the MPR, "common in existing code because the 16-bit addressing model limits the amount of memory per segment to 64 KB." To be fair, the P6 achieved its stated objective: As of January 1, 1996, the 200-MHz Pentium Pro was the fastest shipping CPU for desktop (or deskside) computers as measured by the UNIX-based SPECint95 32-bit benchmark. It's faster than the fastest DEC Alpha, or SuperSPARC, or MIPS, or any other CPU, whether RISC or CISC.
One of the changes made in the Pentium II version of the P6 core was to improve the handling of segment-register writes in 16-bit mode. Just this one change improves the speed of the Pentium II in Windows 95 by 8 to 10 percent, according to Intel.
Meanwhile, Back at the Ranch
One of the changes Intel made in revising the Pentium to add MMX was to double the L1 cache size. That one change, aside from MMX, makes the Pentium MMX (P55) run 10 to 20 percent faster than the Pentium at the same 200-MHz clock rate. (It also greatly increased the die size; more on this later.)
A very confusing situation has ensued: The P55 runs Windows 95 at about the same speed as the (far more expensive) Pentium II. This fact is terribly confusing to the PC consumer, who has grown accustomed to seeing new-generation Intel computers run twice as fast as the prior generation.
The Onslaught of the Visigoths
If you see one Mongol, you know there's a hundred hiding in the treeline. Intel made a billion-dollar profit in the first calendar quarter of 1997. That's billion, not million -- and profit, not sales. Gee, I wonder why Intel has attracted Visigoths, er, competitors for the x86 CPU market?
First, Cyrix and NexGen introduced x86 CPUs not designed by Intel. Then, AMD finally shipped its K5, to underwhelming reviews. Cyrix released a second-generation product, the M1. Recently, AMD improved the K5 to PR166 integer performance, and Cyrix got some M1s produced from .35 micron fabs and started shipping PR200+ CPUs. Most recently, AMD has shipped its NexGen-designed K6; Cyrix is about to introduce its M2; and while I was writing this column, Electronic Engineering Times revealed that IDT subsidiary Centaur has announced a 200-MHz Pentium MMX-class CPU for shipment this fall.
The Microprocessor Report asserts that yet another x86 intro is imminent and that a dozen companies have x86 developments in their R&D labs. Yep, a $1-billion quarterly profit is a mighty attractive treasure hoard for all them Visigoths.
They All Run at the Same Speed!
In the January 1994 issue of Dr. Dobb's Journal, an article entitled "CPU Performance: Where Are We Headed?" appeared, which pointed out that previous generations of CPUs had improved both clock speed and parallelism. The article also predicted that, "around 1996," the limits of parallelism (in typical apps) would be reached and that future CPU speed improvements would depend exclusively on smaller design rules and their concomitant faster clocks.
Guess what? When running Windows 95, there's no significant speed difference among the P55, K6, and Pentium II at approximately 200 MHz. The upcoming M2 is expected to fall into line. Intel has announced a faster version of the PII by upping the clock rate to 300 MHz (and the price to $2000). Gee, it would seem the author (me, of course) of that 3.5-year-old article was correct!
Competition: The Prerequisite
The desktop x86 market will consume about 80 million CPUs this year. Anybody who expects to grab a significant portion of that market can't just depend on a good chip design; you can't sell millions of chips if you can't make millions of chips.
AMD is trying to bring its Texas-based Fab 25 up to full production. Cyrix owns no fabs and so does not control its own destiny; it is currently at the mercy of IBM and SGS Thomson's Canadian fabs. Upstart Centaur is partly owned by the Japanese firm NKK, which owns CMOS fabs.
Intel is the company that wrote the book on continually investing in fabs, realizing that it could not sell what it could not make. (Imagine what would happen if Intel's fabs all went off-line today, right now. The entire PC industry would be shut down!) Today, a new CPU fab costs two billion in dollars and two years in time. Having money is not enough; you need to have started spending that money two years ago to produce chips today.
Rumors Mongered Here
This means the new rumor going around is, well, unexpected. The rumor was first expounded by Linley Gwennap in his editorial "Intel Fab Crunch Slows MMX Rollout" (Microprocessor Report, April 21, 1997). Several of EET's staff writers seem to be saluting this rumor in its April 29th issue. It seems, the rumor goes, that Intel got caught sitting on too much cash and too few fabs.
Here's the deal. Intel is selling three CPUs into the mass PC marketplace: the Pentium (P54), Pentium MMX (P55), and Pentium II. (Intel is not aiming the Pentium Pro at the mass PC marketplace.) Because of the differing die sizes, and because the yield of large chips is strongly dependent on the die size, Gwennap asserts that, for a given fab capacity, Intel has the choice of making four P54s, two P55s, or one Pentium II.
The rumor continues that Intel would like to "move" the mass market upward to the P55 and even the Pentium II. But it can't do that without shutting down the PC industry because it can't make enough P55s and Pentium IIs to satisfy that 80-million-unit market. This situation will change as Intel switches production from .35-micron to .25-micron design rules, but right now, Intel is stuck with the P54.
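The arithmetic behind the rumor can be sketched in a few lines. The 4:2:1 ratio is Gwennap's; the "slot" unit of capacity and the assumption that Intel's fabs could build exactly 80 million P54s and nothing else are hypothetical, chosen only to make the trade-off visible.

```python
# Chips produced per unit of fab capacity, per Gwennap's 4:2:1 claim.
CHIPS_PER_SLOT = {"P54": 4, "P55": 2, "Pentium II": 1}


def units_built(slots, mix):
    """Chips produced when `slots` units of capacity are split by `mix`,
    a dict of chip name -> fraction of capacity (fractions must sum to 1)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("capacity fractions must sum to 1")
    return {chip: slots * share * CHIPS_PER_SLOT[chip]
            for chip, share in mix.items()}


# Hypothetical capacity: just enough for 80 million P54s and nothing else.
SLOTS = 80_000_000 / CHIPS_PER_SLOT["P54"]

all_p54 = units_built(SLOTS, {"P54": 1.0})
half_and_half = units_built(SLOTS, {"P54": 0.5, "P55": 0.5})

print(sum(all_p54.values()))        # total chips, all capacity on P54
print(sum(half_and_half.values()))  # total chips, capacity split P54/P55
```

With these made-up numbers, shifting half the capacity to the P55 cuts total output from 80 million chips to 60 million. Every step up the product line costs unit volume until smaller design rules shrink the dies.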
Intel has just announced new pricing. It dropped prices by about 50 percent on the 200-MHz P54, by much less on the P55, and by almost nothing on the Pentium II. The message from Intel to the marketplace: Buy P54s!
This is awkward because the newly introduced AMD K6 seems to be significantly faster than Intel's P54 (and, in fact, seems to be on a level with the P55 and Pentium II when running Windows 95). Many observers expect the upcoming Cyrix M2 to run at K6-equivalent speeds.
It's the Design Rules, Stupid!
A motto much like the above propelled our incumbent President into office. It is not really the clock rate that determines how much work a CPU can accomplish per unit time. For example, the AMD K5/PR166, clocking at 116 MHz, does about the same work per unit time as the newer AMD K6/PR166 with a 166-MHz clock! The K5/PR166 has a slower clock but the same design rules and the same CPU throughput.
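To put a number on that example: if two chips deliver the same throughput at different clock rates, the slower-clocked one must be doing proportionally more work per clock. A minimal sketch, taking the "same work per unit time" claim above at face value (the function name is just for illustration):

```python
def relative_work_per_clock(clock_a_mhz, clock_b_mhz):
    """How much more work chip A does per clock than chip B,
    assuming both deliver the same overall throughput."""
    return clock_b_mhz / clock_a_mhz


# K5/PR166 at 116 MHz versus K6 at 166 MHz, assumed equal throughput.
ratio = relative_work_per_clock(116, 166)
print(f"{ratio:.2f}")  # prints 1.43
```

So the K5 gets about 43 percent more done per tick, and the K6 gets the same job done with more, shorter ticks. Either way, the throughput comes out the same.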
What makes the difference is the design rules. Right now, Intel, AMD, and Cyrix all make their latest CPU designs using .35-micron fabs, and all of these CPUs run at roughly the same speeds.
Right now, the fight is over small differences in performance; 6 percent here and maybe as much as 20 percent there. Fact: Without a stopwatch or its high-tech equivalent, a benchmark program, you can't tell the difference between any of these CPUs when running popular productivity apps under Windows 95!
Vive la Différence?
Well, no, not where CPUs are concerned. There are significant differences in microarchitecture between modern competing CPUs, even from the same manufacturer. The Pentium II internals are quite different from the P54 and P55 internals. In fact, the AMD K5 and K6, produced by different design teams from (originally) different companies, are more similar internally than the Pentiums are to the Pentium II (in my personal opinion).
But if you can't tell the difference sitting at the keyboard, who cares?
The most significant difference is this: The Pentium II, with its backside L2 cache, has an advantage over all the socket-7 contenders (P54, P55, K5, K6, M2). AMD and Cyrix counter this advantage in the K6 and M2 with one of their own: a 64-KB L1 cache. The backside cache requires a proprietary connector and different support chips. A 64-KB L1 cache requires a larger die, impacting production capacity. But AMD and Cyrix are not burdened with having to support the PC industry. Intel is. All AMD and Cyrix have to do is try to make a profit.
Most PCs sold to John Q. Public in the near future will use P54 CPUs because that's the CPU that's going to be made in large quantities.
Answer to Pop Quiz
In the entire history of American jurisprudence, no wholly owned subsidiary has ever sued its corporate parent. Data General bought Keronix not long after that fire.