Sculpting On Silicon: an Interview With Chuck Moore

JUN92: SCULPTING ON SILICON: AN INTERVIEW WITH CHUCK MOORE

Jack is a senior project manager at Vesta Technology in Wheat Ridge, Colo. He is a member of the ANS/ASC X3J14 Technical Committee for ANS Forth and is currently chapter coordinator for the Forth Interest Group. Jack can be reached as jax@well.UUCP, as JAX on GEnie, or as the Sysop of the RCFB BBS, 303-278-0364.

Chuck Moore, inventor of the Forth programming language, got his BS in Physics from MIT in 1960 while working at the Smithsonian Astronomical Observatory in satellite tracking and orbit determination. He then proceeded to Stanford, where he worked on stack-beam transport magnets. From 1964 to 1970 he worked a variety of programming jobs in the state of New York, working in Fortran, Algol, and eventually in his own language, Forth. Moving in 1970 to the National Radio Astronomy Observatory led to the implementation of Kitt Peak Forth, and Moore's reputation was assured.

Moore cofounded Forth Inc. in 1973 and worked there for ten years before moving on to the field of microprocessor design. The Novix project ran approximately from 1980 to 1986 and resulted in the NC4000 (and later in the Harris RTX2000), a chip optimized for the dual-stack architecture and well-factored instruction set implied by the Forth virtual machine. After that, Moore says, he "had built the ultimate computer," and he kicked back to plan his next move. In 1989, using a computer based on the Novix as one design tool, Chuck Moore designed ShBoom, a 32-bit processor with a dedicated I/O coprocessor which controls 1 Mbyte of DRAM. Silicon was reached in 1990, and using his ShBoom, Moore built himself a graphic CAD workstation to design his next idea: P20.

Moore says, "I have a half a dozen more designs already. We're on the threshold of an explosion: silicon being made available to ordinary people. A lot of interesting things are going to be happening in the next decade."

Moore recently joined several of his friends at a quintessential California lunch of pasta marinara. Present were Chuck Moore (CM), proprietor of Computer Cowboys; Elizabeth Rather (ER), CEO of Forth Inc., "the second Forth programmer," taught the language by Chuck Moore in the early 1970s, now Chair of ANS X3J14 Technical Committee for Forth; George Shaw (GS), Shaw Laboratories, close associate of Chuck Moore on the ShBoom project, member ANS X3J14 Technical Committee for Forth; John Stevenson (JS), independent Forth consultant, member ANS X3J14 Technical Committee for Forth; Jack Woehr (JW), professional Forth project manager, member ANS X3J14 Technical Committee for Forth, and frequent contributor to Dr. Dobb's Journal; Mitch Bradley (MB), Sun Microsystems, creator of the Open Boot PROM, a Forth implementation present in the ROM of every Sun SPARCStation, member ANS X3J14 Technical Committee for Forth; and Ray Valdes (RV), Senior Technical Editor, Dr. Dobb's Journal.

JW: Can you go back over the genealogy and inheritance of your boot-strapping process to reach the ideal CAD machine, starting with the Novix? I understand one processor has designed the next one; how far back does it go?

CM: Let's start with the PDP-11. I did a lot of graphic work on the PDP-11. Some of that led to the Novix [an NC4016 dual-stack machine; see "Forth Machines," Embedded Systems Programming, November, 1990]. More of it led to the circuit boards on which the Novix resided. And then the Novix took over the design activity, certainly for its own circuit boards.

RV: Now when you say design, you're talking design in the very specific sense of circuit design, or do you mean it in the broad sense?

CM: Anything that deals with layers of traces and things on boards, or two-dimensional...two-and-a-half-dimensions. If I look back on it, I see some very strong traditions--that the things I'm doing now, I did a very long time ago: I'm just doing them on a different scale.

For instance, the circuit boards [in the PDP-11 days] were 128x80 elements, or something like that. The chips [I'm now designing] are now 600x600 elements. So there has been an increase in complexity of two orders of magnitude, but very little else has changed.

Then from Novix, [the Harris Semiconductor] RTX sprung off, but I wasn't involved in that.

RV: How many elements were in the Novix chip?

CM: It was a 4000-gate array, but I didn't design that graphically. That was one of the problems. It had to be done in high level--HDL. All of the Novix's unsatisfactory characteristics derived from that.

ShBoom was designed on Sun workstations and Valid software. Again, it was not geometric, it was schematic-based, and all of its unsatisfactorinesses derived from that.

RV: Can you say what those unsatisfactory qualities were?

CM: Auto-place and -route.

RV: So, it just took up a lot of real estate?

CM: Well, my designs are like the design of a crystal: very regular, very orthogonal. The contemporary software does not permit you to do that...does not encourage you to do that. In fact, it tends to randomize any kind of design. I have a drawing of Novix in which there is no structure whatsoever in the layout.

In the case of ShBoom, it could not be routed until I had placed about 80 percent of the elements. I tried placing about 50 percent and it came out with 300 no-connects. The advice from the Oki engineers was, "You placed too much, you over-constrained it." I said, "No, I didn't place enough."

I had left little holes, and I said, "Any reasonable person would understand that the only thing that can go in this hole is a row of inverters." But of course, the auto-place didn't understand that. It ended up that I had to do all the placing anyway.

JW: I'm surprised that the software let you do so. Most of them won't even let you get that far with it.

CM: I know. It was real painful, because I had to specify all these gates with a three-letter identifier based on the netlist and position on an IBM mainframe screen. It took a week just to place a few thousand gates. There's no assistance at all in that process.

All of the auto-routing people I talked to say, "Oh yes, that was dreadful, dreadful. The situation is much better now." And they sound just like the Fortran compiler people, who say, "It can generate much better code than you are familiar with."

RV: So what is the negative impact, aside from the aesthetic one, in terms of the functionality of the resulting product? If they are placed differently, does that impact the speed?

CM: As geometries get smaller, the interconnect capacitance starts dominating. Unless you know where things are placed, you don't know where the interconnect is going to be, and you have to wait for a second pass through the place-and-route. And that second pass never happens.

JW: Are you suggesting that the "famous" Novix multiplication bug was the result of the auto-routing software?

CM: I'm exaggerating; in that case, we didn't adequately run test vectors. There was a workaround for Novix multiplication, so that wasn't a "bug": It was a "design feature." [Laughter at the table.]

The interrupt problem [with the Novix] was a bug. It was very difficult to test the effect of an interrupt coming in at an unpredictable time...

JW: ...in the middle of a two-cycle instruction.

CM: Yeah, so we underestimated the difficulty of testing asynchronous interrupts. The only interrupts we tested happened at benign times. So that was our fault.

MB: I'd like to hear the justification for why multiplication not generating the right answer is not a bug.

CM: It never affected any of my applications. [General laughter.]

MB: In other words, this chip was designed just for your applications, and that is why they were never made to sell to anybody.

CM: In fact, if it hadn't been for Greg Bailey [of Athena Programming in Oregon, Chair of the ANS X3J14 Technical Standing Committee] nobody might have known about the bug, because he discovered it by exhaustively testing all the boundary conditions, which we should have done, but weren't that clever. Or weren't that methodical.

I've got the same [design constraint] problem now with P20. Now ShBoom is the first microprocessor I've had access to with enough memory to actually do the layout on a 600x600 grid.

JW: Addressing a 1-meg DRAM array.

CM: It actually uses about one-half million words to represent my new design. I could have done it on a 386, but I couldn't have done it on an Apple, or on a PDP-11.

So now I have a tool where I could do this graphic-level design on a microprocessor. And indeed I do have all of these graphic elements laid out in four planes. I have 15 elements: horizontal, vertical, corners, tees, contacts...I just stack those up until I've got transistors and interconnects.

I run my simulator directly against this layout, with the honest capacitance, with the real inverting logic, exactly the way it is.

JW: Did you write this simulator?

CM: It's an adaptation of code which again dates all the way back to the PDP-11, through the Novix, to the ShBoom.

MB: The technology is very similar to spreadsheets: essentially, a spreadsheet-like rectangular grid, and you evaluate each cell in a grid.

CM: That's right, and also it's a little bit like cellular automata. Exactly what it does depends on its neighbors.

JW: A pachinko machine! The ball is launched and goes down...

CM: Yeah, I do...I stick in five volts and watch it propagate.
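[The spreadsheet and cellular-automaton comparison can be illustrated with a toy model. The sketch below is not Moore's simulator: the layout, the "#" and "." symbols, and the one-cell-per-step rule are all assumptions chosen for illustration. It drives one cell and records the step at which each connected conductor cell is first reached.]

```python
# Toy grid propagation, loosely in the spirit of "stick in five volts and
# watch it propagate". Hypothetical model: '#' is conductor, '.' is empty,
# and a signal advances one neighboring cell per step.
LAYOUT = [
    "#####",
    "....#",
    "#####",
]

def propagate(layout, source):
    """Return {(row, col): step at which the signal first arrives}."""
    rows, cols = len(layout), len(layout[0])
    arrival = {source: 0}
    frontier = [source]
    step = 0
    while frontier:
        step += 1
        next_frontier = []
        for r, c in frontier:
            # Each cell's state depends only on its four neighbors.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and layout[nr][nc] == "#" and (nr, nc) not in arrival:
                    arrival[(nr, nc)] = step
                    next_frontier.append((nr, nc))
        frontier = next_frontier
    return arrival

if __name__ == "__main__":
    arrival = propagate(LAYOUT, (0, 0))
    print(arrival[(2, 0)])  # the far end of the serpentine trace
```

[A real simulator of the kind described would evaluate capacitance and transistor behavior per cell; this sketch only shows the neighbor-driven evaluation order.]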

GS: This simulates at an analog level, doesn't it?

CM: I have two levels, one strictly digital with gate delay timing, and an analog level. One is faster, and one is more precise. I'm going to combine them into one for the next design beyond P20. What I find is that I need the precision everywhere, to have any confidence in the design. Neither of those capabilities are available with conventional design techniques. You do netlist simulations and various degrees of approximations. They are really crude and hokey and bear very little resemblance to what you get out at the end. But if you don't know the layout, that's about the best you can do. You literally have a factor of two uncertainty that the layout introduces in your simulation.

JW: Because of propagation delays?

CM: Yes, call them that. The match between the transistor and the loaded drive is critical. And the load is now mostly...at least 50 percent and increasing...the interconnect.

Very often when I am laying this thing out, I have this little NAND gate, and it's going to drive another NAND gate, if those gates aren't real close together, both for reasons of remembering where they are and for reasons of keeping interconnects small and routing easy...

JW: ...the capacitance of the interconnect can overload the circuit.

MB: Or dominate the propagation, or switching time.

CM: And it's very hard to know with accuracy what those transistors are going to do until you build one and measure it.

This design is 1.2 micron CMOS. That's what everybody I know is working in. 0.8 is real nice...that I would prefer. 0.5 is state-of-the-art. So I'm a factor of four in speed away from cutting-edge technology. This is very conservative...and "very conservative" means 200 MHz.

JW: The P20 is going to have a 200-MHz clock?

CM: On-chip clock. You don't want that kind of frequency off-chip.

The P20 has four 5-bit instructions per 20-bit word. One reason I picked 20-bit words is that I had never heard of a 20-bit computer: It's an unoccupied niche in the world of computers, so let's stick something in it and see if it flies.
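[Four 5-bit instructions per 20-bit word is simple bit packing. The sketch below assumes the first instruction sits in the most significant bits; the interview does not state the actual slot order, and the opcode values are placeholders, not P20 opcodes.]

```python
SLOT_BITS = 5   # bits per instruction, from the interview
SLOTS = 4       # instructions per word, from the interview
SLOT_MASK = (1 << SLOT_BITS) - 1

def pack(ops):
    """Pack four 5-bit values into one 20-bit word (first op in high bits: an assumption)."""
    if len(ops) != SLOTS:
        raise ValueError("expected exactly four instructions")
    word = 0
    for op in ops:
        if not 0 <= op <= SLOT_MASK:
            raise ValueError("instruction does not fit in 5 bits")
        word = (word << SLOT_BITS) | op
    return word

def unpack(word):
    """Split a 20-bit word back into its four 5-bit slots."""
    return [(word >> (SLOT_BITS * (SLOTS - 1 - i))) & SLOT_MASK for i in range(SLOTS)]

if __name__ == "__main__":
    word = pack([1, 2, 3, 4])   # placeholder opcode values
    print(hex(word), unpack(word))
```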

JW: And data memory is...?

CM: Twenty bits. Five 1Mx4 DRAM chips populate a single-board computer based on P20. And that's the advantage of 20 bits instead of 32: It only takes five chips instead of eight.
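[The chip count follows from dividing the word width by the 4-bit width of a 1Mx4 DRAM. A minimal sketch of that arithmetic, with a hypothetical helper name:]

```python
def chips_needed(word_bits, chip_width_bits=4):
    """Number of x4 DRAM chips required to supply one full word (ceiling division)."""
    return -(-word_bits // chip_width_bits)

if __name__ == "__main__":
    print(chips_needed(20))  # 20-bit word
    print(chips_needed(32))  # 32-bit word
```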

JW: What kind of DRAM is going to run at this speed?

CM: The new DRAMs have 30-nanosecond page-mode access. It's designed to use those memories, specifically, and it has on-chip DRAM timing. But it also is designed to use the memory cards, so you don't have an onboard ROM--you plug in a boot memory card, power up from that, either unplug it, leave the card plugged in, whatever you prefer.

The two new technologies are high-density DRAM and the new form-factor memory cards.

JW: You have wait-state circuitry so that you can boot from your ROM?

CM: Yes. ROM is very slow, it's 250 nanoseconds, and that is...I have a five-nanosecond clock, so it takes 50 clock cycles to read ROM.
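[The 50-cycle figure is the access time divided by the clock period. The sketch below reproduces that arithmetic for the 250-nanosecond ROM and, by the same formula, for the 30-nanosecond page-mode DRAM mentioned earlier. It is only arithmetic; the function names are hypothetical, and it says nothing about the P20's actual on-chip timing circuitry.]

```python
import math

CLOCK_MHZ = 200  # on-chip clock quoted in the interview

def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

def cycles_for_access(access_ns, freq_mhz=CLOCK_MHZ):
    """Whole clock cycles that must elapse to cover a memory access time."""
    return math.ceil(access_ns / clock_period_ns(freq_mhz))

if __name__ == "__main__":
    print(clock_period_ns(CLOCK_MHZ))   # 5.0 ns
    print(cycles_for_access(250))       # ROM: 50 cycles
    print(cycles_for_access(30))        # page-mode DRAM: 6 cycles
```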

MB: You can get 35-nanosecond ROM from Cypress if you are willing to pay for it.

CM: Yeah, but I'm really thinking of these cards, and I have on-chip timing circuitry.

RV: How long have you been working on P20?

CM: Intensely, since last winter, and it will take a few more months.

JW: Who is financing P20?

CM: Dr. C.H. Ting, of Forth note. I had gotten really discouraged, in that I had this capability and nobody was interested. He filled that gap, as one level of interest.

Nevertheless, the problem remains to find a customer in non-one quantities.

ER: If you were going to go out and look for a customer, what would you be offering? What is the wonderful thing that this chip does?

CM: It generates NTSC video. So take this chip which costs me, in quantity, a dollar, and plug it in the back of your television set and go.

MB: What about HDTV?

CM: Fine. I'll modify it, or we'll fill the gap somehow. But this is the first of a family of computers that have different capabilities, different word lengths, different memory interfaces, different instruction sets, all of them sharing a number of features.

RV: You're implying more the commonality of the design approach as opposed to chunks that get put together, like libraries or bitslices.

CM: Well, the chunks I have are registers, are memory interface, ALU. If I have to change the instruction set, it will slow things down a whole lot. If I can just perturb the instruction set a little bit, it'll be a quick design cycle.

RV: How far up do you think this can be scaled? To workstations?

CM: It can be, but that's a tough market, it's so thoroughly occupied. But I would like an excuse to do a 64-bit chip one time. I don't know of any applications which need 64 bits, but if you can stretch the design like they stretch the 747...

JW: Are you endorsing the philosophy of the application-specific microprocessor?

CM: Absolutely.

JW: Do you believe it is going to come down to a cottage-industry level?

CM: The fact that I can do it means that a whole lot of other people can do it, too. I think there is going to be a great radiation of microprocessors. It's not going to be dominated by Intel, Motorola any longer.

MB: I disagree, because the cost of putting a microprocessor into systems is dominated by what it takes to create and maintain the software. There is so much inertia in that system that creating a custom microprocessor, even if it's a factor of two better than what you can buy off the shelf, is not going to be compelling. There may be a small number of applications where you can justify creating a microprocessor architecture and the software to support it.

CM: Mitch, Mitch, Mitch...it takes a week to put Forth on any of these processors!

MB: I know, but people don't want to write Forth code. I've been fighting that for years.

CM: Yes, I would say that the workstation is not the market, because workstations are operating-system intensive and low volume. It's going to be the new widgets of the future that benefit from this.

JW: To what extent will this device be able to be viewed as a general-purpose microprocessor?

CM: It's really very fast, so if people need a processor, they'll probably make some trade-offs and use it in a high-performance application, rather than design their own. So in that sense, it's general purpose, but not in the sense of the 68000 which goes into a lot of products and has a massive support base.

JW: What about seeing it as a general-purpose embedded control processor? Could you build a single-board computer around the P20 that different people could use in different applications, provided that it came with a ROM Forth?

CM: Yes, but a 20-bit computer is not going to be ideal for word processing, unless you like 10-bit bytes.

RV: Comparing the scale of the design effort here with the design of the conventional microprocessor, in terms of the number of engineers, number of elements, what would be a comparable chip, and what would be the scale of effort for the nonminimalist approach?

MB: The new Sparc chip that Sun and Texas Instruments are developing has three million transistors, unbelievably complicated, scads of people working on it. It's not the same class of thing, so it's really hard to even talk about them in the same breath.

It's clear that the simple approach can give you tremendous bang per buck of engineering. But it seems like this very rarely is factored into decisions. Decisions are based upon market momentum. Market momentum is generated by big companies spending humongous amounts of money in advertising and marketing and exaggerating and fighting, whatever it takes to generate lots of people knowing about your product. Most people buy stuff just because they have heard of it.

CM: Don't underestimate the small market. I don't advertise, and I have all that I can handle. P20 has three or four successors committed already, variants, expansions, speed improvements...I can see a one-man design house. I'm busy. I can make a business of this.

MB: But can you hire a secretary?

CM: Double my business and I'll hire a secretary. I don't know to what degree it will scale up. But there are a lot of people out there who would like to have their own microprocessor and see now a chance to do it. Whether their reasoning is correct or not, whether the economics work for them, is their problem.

RV: Do you have plans to make your development environment available, to go into the tool business?

CM: Yes, that's the next level of generalization. People want it, sight unseen; if I can do it, they can do it too. It is a very peculiar development package, and it certainly is not marketable at the moment. I have a three-key keyboard. Currently, these keys are taped to the ends of my fingers, so I can touch my fingers to my thumb and select one of seven menu entries. Now, the neat thing about this...a lot of people do menus...but my menus are invisible! Because it's perfectly obvious to me what these fingers do, and I don't need something on the screen to tell me.

So it's a bit of black magic. I have to make this a whole lot more comfortable to the unskilled user.
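[Three keys give seven menu entries because every non-empty combination of keys is a distinct chord: 2^3 - 1 = 7. The sketch below enumerates the chords and maps each to an entry number. The finger names and the numbering scheme are assumptions, not a description of Moore's actual key assignments.]

```python
from itertools import combinations

KEYS = ("index", "middle", "ring")  # hypothetical finger names

def chords(keys=KEYS):
    """All non-empty key combinations: 2**len(keys) - 1 of them."""
    return [frozenset(c) for n in range(1, len(keys) + 1)
            for c in combinations(keys, n)]

def chord_to_entry(pressed, keys=KEYS):
    """Map a set of pressed keys to a menu entry 1..7 via a bitmask."""
    unknown = set(pressed) - set(keys)
    if unknown:
        raise ValueError("unknown keys: %s" % sorted(unknown))
    mask = sum(1 << i for i, key in enumerate(keys) if key in pressed)
    if mask == 0:
        raise ValueError("no key pressed")
    return mask

if __name__ == "__main__":
    for chord in chords():
        print(sorted(chord), "->", chord_to_entry(chord))
```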

MB: You and your computer are symbiotic...

CM: I hope so; we've lived together long enough!

JW: How deeply nested are the menus?

CM: Four, maybe. Not very.

JW: And at one menu level you can pick a character from the alphabet.

CM: Better put, I can scroll through the characters so I can pick the one I want.

JW: Then select, so you can insert text labels by searching, selecting, moving one position to the right...

CM: Just like on the video arcade games...and it's equally clumsy, but I don't do that much [text]. Mostly, I'm scrolling through graphics characters, or words, if you like. If you really want to write, you want to use words, not characters.

JW: So you can save entities composed of these basic elements...

CM: Right.

JW: ...and recall that entity to insert it into the diagram, such as a complex selection of gates?

CM: I can pick a region of interest and move it or flip it or whatever...or replicate it twenty times to get a register.
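[Flipping a region or replicating it to build a register amounts to simple operations on a rectangular block of cells. The sketch below represents a region as a list of equal-length strings; the characters in the sample cell are placeholders, not Moore's 15 layout elements.]

```python
def flip_horizontal(region):
    """Mirror a rectangular region left to right."""
    return [row[::-1] for row in region]

def replicate(region, count):
    """Tile a rectangular region side by side `count` times."""
    return [row * count for row in region]

if __name__ == "__main__":
    bit_cell = ["+-", "|T"]             # placeholder one-bit cell
    register = replicate(bit_cell, 20)  # twenty copies, as in the interview
    for row in register:
        print(row)
```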

RV: Are these stored by name, or just visually accessed?

CM: Visually.

RV: So you don't do much specification of alphanumeric characters, then.

CM: Very, very little. The most characters I see are in memory dumps. And as sophistication increases, the memory dumps get interpreted not as hex, but as a decompiler generating Forth. So you can have quite readable object code.

On the other hand, I don't typically take it to that nice level. I don't need to. So again, a marketable tool would have to have a larger level of refinement.

RV: Is there any pointing device?

CM: Just the cursor moving on the screen: up, down, left, right.

JW: Can you move, say, southwest by holding two fingers down?

CM: No, I don't have enough fingers. Typically, [gesturing with fingers] this one is up, this one is down, these two are left, these two are right, and that leaves me three to actually change menus or something else. Four of my keys are immediately gone, off the top, for motion.

JW: How much Forth source code went into writing this system?

CM: No source. But about 4 Kbytes of object, and it was all constructed on ShBoom. ShBoom doesn't have any compilers...[turning to George Shaw] Does it have a Forth compiler?

GS: Not yet.

CM: Or a C compiler. So ShBoom is all programmed in machine code.

JW: You sat there and entered the hex bytes.

CM: That's an easy instruction set, compared to the 386.

MB: When I read the 486 manual, I was amazed that they had got this all working. Then I got to the bug list in the back, and said, "Oh, they didn't get it all working."

RV: I just wonder if there is a point where it [the Intel architecture] will collapse under its own weight. I'm surprised it's gotten this far, it's a tremendous engineering feat.

GS: It's amazing, if any of the technology applied to increasing the hardware had been applied to increasing the software, it would have grown by leaps and bounds.

MB: I don't think so. Managing this complexity, you can just throw amazing amounts of people and computing power at it. You can't really do that for making leaps and bounds of improvement in design. Those kind of things only happen when there is one person who has a better idea and the skill to do something about it. Managing these humongously complicated projects, you can throw endless amounts of money at them, and eventually succeed in some fashion.

CM: But that's the thing that's bankrupt, the humongously complicated project. If I can come up with a processor that is even vaguely comparable to the 386, and I'm sure I can, this undermines their charter. You're going to have a hundred people out there producing microprocessors at one percent of the overhead that Intel has.

MB: By that token, IBM would be out of business, yet we know that they are not.

CM: No, Intel won't be out of business, but they will have forfeited the future, just like IBM and DEC and all the others have. Apple and IBM, by their combination, have mutual suicide. They have thrown away a marketplace that is now accessible to people like me, because they decided to build bigger operating systems, hardware-independent everythings, slower, more ponderous...it's a boon! The Forth community profits tremendously from that conjunction.

JS: Except that with them goes the common wisdom that bigger is better.

JW: But is it true that that is the common wisdom nowadays?

ER: Is there any indication at all that there is a groundswell of people that are appalled by that trend? I haven't seen it.

MB: It's marketing. There is so much positive feedback in the economy in terms of the fact that you can't be successful until people perceive that you are already successful. It's a chicken/egg situation, you can't sell product unless you have already sold product, a lot of people have heard about it, have a good, warm feeling about it, feel safe.

JW: There's a self-limiting factor in that these megacorporations very quickly drive out the people who created the product. [Several noted organizations] for example are slowly self-destructing internally by driving out people who will not work in those environments where there are ten levels of managers above one.

CM: I recently encountered the magazine, Midnight Engineering.

RV: A great magazine.

CM: And that is eye-opening. Here is a whole support industry for the very, very low-scale industry. So that indicates to me that the growth is all going to come from below. These midnight engineers are the resource that is going to dominate the next decade.

JW: They may dominate the creativity, but usually the reward for invention is preempted by the marketeers.

CM: Entry costs are going down, sophistication is going up. These people are becoming an increasingly potent force.

MB: The engineering entry costs are going down, but the business entry costs are going way up.

CM: You learn to be an effective entrepreneur, instead of a victim.

GS: One of the reasons that the United States has been innovative in technology, is that you have the ability as an individual to start your own company and make a bunch of bucks. In other countries, it's not nearly so easy.

RV: Take the Japanese model, where it's not one grand flash of insight leading to one medium-sized innovation, it's building a team of people, each one contributes an incremental insight over a long period of time. You get something like the Sony HandyCam, which I don't believe could have been built in this country, because one person, no matter how brilliant, could not have built that by themselves. It would require a 20-year incremental process to get to that point.

GS: In an article in Midnight Engineering, the guy was marvelling at some of these very interesting circuits that are in some of the very small Japanese electronic items. He ran into the fellow who designed the stuff that went into these Japanese products: an American.

ER: I think that there are some developments that are amenable to the anthill approach, and some that are not. There are achievements that come about as a result of a breathtaking insight that only comes from an individual. There is nothing in, say, the HandyCam that isn't intrinsically technologically evolutionary. But there are a number of breakthroughs in the history of television technology per se that were single, unique breakthroughs, and those are the ones that come from the creative entrepreneur.

RV: History does prove that this happens again and again. Here we have an example where someone could have said, "This can't be done with 4K of object code and a three-key keyboard."

CM: I see [chip design] as an art form. I think that sculpting on silicon is going to be an increasingly important process for an increasing number of people in the next decades. Maybe I'm one of the first to approach it this way, that here I have my piece of silicon and I'm doing what I want on it. I want to make a buck, maybe, I want to do some things that are neat, nice, or pretty.

So there's this whole aesthetic involved, as well as pragmatism, and it doesn't need to be as difficult as people make it out to be. Mature technologies never are. They may be black boxes when they start, but they eventually become part of common knowledge and trivial, like circuit board design is now.

JW: Like farming, or breaking a horse to be ridden, something that is shared by a society.

CM: The first time is black magic, then it becomes routine.

JS: Some people, like Henry Ford for example, start off as innovators and end up at the end of their life punishing those who innovate and saying that innovation is not important.

ER: Chuck seems not to have fallen into that particular pitfall.

CM: Not yet!

An Update on ANS Forth

The spring equinox found the ANS/ASC X3/X3J14 Technical Committee in Hillsboro, Oregon, mulling over feedback from the first public-review period of dpANS (draft-proposed ANS Standard) Forth. This step continued the process described in "Forth: A Status Report" (DDJ, October 1991).

Although the few substantive alterations in the document must now result in another public-review period, it is pleasingly obvious that the matter is down to the proverbial jots and tittles. Secure is the overarching concept: It is possible to propound an architecturally independent semantic description of Forth, traditionally the most hardware-wedded of all high-level languages.

While my wife Eleonore cooked breakfast waffles each day for committee members, the committee was not above serving up a few waffles of its own. The definition of DOES> was altered. The implementation now defines whether DOES> reveals or leaves hidden the name header of the definition in which it occurs. The concise definitions of ALLOT and related constructs were expanded in the interest of clarity, rendering them less concise and arguably less clear. In a rush to cover a previous omission, eight words of critical functionality were entered into the floating-point word set, somewhat loosely specified.

Quibbles like these notwithstanding, even the most rancorously debated issues involved extremely minor points. No one who has authored Forth systems based on previous drafts of the document will have to spend more than a few hours implementing these new changes to the standard.

The smallish membership of X3J14 has spent person-decades and well over $200,000 on this effort. We're tired but content that we have nearly fulfilled our five-year mission to seek out new worlds for Forth to conquer. The next meeting of X3J14 may promulgate true ANS Forth.

-- J.W.
