The Heart of a CPU
Last time I talked about using CARDIAC as a teaching tool. A paper computer is certainly novel, and the spreadsheet version lets students get familiar with the architecture without having to write on cardboard. I wanted to go further, though. My eventual goal is to find a CPU that would be simple enough for a class to produce using Verilog and possibly even target a real FPGA.
I've looked at simple designs like mcpu, but they usually rely on tricks to keep the design small. I wanted something small and simple, but also something that was clear. My own Spartan Blue CPU is another candidate, but it is a little complex, especially considering it isn't a modern architecture. I don't want to spend too much time revisiting circa 1970 architectures when a serious student would want to progress to pipelined RISC architectures, for example. On the other hand, learning about pipelines and other exotic CPU features is a bit much for a first step, which is why I have been looking at things like CARDIAC.
If you looked at the spreadsheet version last time, you may have noticed the list of instructions built into the spreadsheet. For the Verilog version, I plan to only do the original instructions and not build the extended ones that I added (although these would make good student projects).
The architecture is simple, which is a good thing. There's a "bug," which serves as the program counter, and an accumulator. There are also 100 memory cells (00 to 99). The machine uses decimal, which is a mixed blessing. On the one hand, you don't have to spend time teaching binary and hex to students. On the other hand, no modern practical machines use decimal, and there are several reasons why that's true.
I decided, however, to build my version of CARDIAC faithful to the original, for the most part, so it uses binary coded decimal (that is, each decimal digit has its own 4 bits in storage). Like the original, numbers stored in memory also have a sign, so they can be plus or minus.
I named the project VTACH (Verilog Teachable Architecture for CPU HelloWorld) — ok, so I'm reaching a bit for the H — and you can download the current snapshot (which isn't complete, but does work) in the online listings.
The basic instructions are pretty simple. All the instructions use three decimal digits. In all cases, the first digit specifies the operation the CPU should perform. In most cases, the last two digits are a memory address, although there are a few exceptions. Here is the full set:
- 0XX — Read from the input "tape" and store the result in memory cell XX; Halt on a blank input
- 1XX — Load the contents of memory cell XX into the accumulator
- 2XX — Add the contents of memory cell XX to the accumulator
- 3XX — Jump to memory location XX if the accumulator is negative
- 4XY — Shift the accumulator X places left (times 10) and Y places to the right (divide by 10)
- 5XX — Output the contents of memory cell XX
- 6XX — Store the accumulator into memory cell XX
- 7XX — Subtract the contents of memory cell XX from the accumulator
- 8XX — Jump to memory location XX; also store the return address to location 99
- 9XX — Halt and reset; XX doesn't matter
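To give a feel for how little the instruction set demands, here is a minimal C sketch of one fetch-and-execute step. It is not VTACH's code. The names `Cardiac`, `cardiac_step`, `cardiac_run`, and `TAPE_MAX` are hypothetical, plain `int` values stand in for the BCD words, results are truncated to three digits as an assumption, and cell 99 simply receives the return address, as the list above says.

```c
#include <assert.h>
#include <stdbool.h>

#define TAPE_MAX 16  /* hypothetical tape capacity */

/* Hypothetical machine model: ints stand in for signed 3-digit BCD words. */
typedef struct {
    int mem[100];          /* cells 00..99 */
    int acc;               /* accumulator */
    int pc;                /* the "bug" (program counter) */
    bool halted;
    int in[TAPE_MAX];      /* input "tape" */
    int in_len, in_pos;
    int out[TAPE_MAX];     /* output "tape" */
    int out_len;
} Cardiac;

/* Execute one instruction. Overflow handling is an assumption:
   results are truncated to three decimal digits. */
void cardiac_step(Cardiac *m)
{
    assert(m->pc >= 0 && m->pc < 100);
    if (m->halted)
        return;

    int insn = m->mem[m->pc];
    if (insn < 0) {                 /* a negative word is not an instruction */
        m->halted = true;
        return;
    }
    int op = insn / 100;            /* first digit: operation */
    int xx = insn % 100;            /* last two digits: usually an address */
    m->pc = (m->pc + 1) % 100;

    switch (op) {
    case 0:                         /* read tape into cell XX, halt on blank */
        if (m->in_pos >= m->in_len) { m->halted = true; break; }
        m->mem[xx] = m->in[m->in_pos++];
        break;
    case 1: m->acc = m->mem[xx]; break;                     /* load */
    case 2: m->acc = (m->acc + m->mem[xx]) % 1000; break;   /* add */
    case 3: if (m->acc < 0) m->pc = xx; break;              /* jump if negative */
    case 4:                         /* shift X places left, then Y places right */
        for (int i = 0; i < xx / 10; i++) m->acc = (m->acc * 10) % 1000;
        for (int i = 0; i < xx % 10; i++) m->acc /= 10;
        break;
    case 5:                         /* output cell XX */
        if (m->out_len < TAPE_MAX) m->out[m->out_len++] = m->mem[xx];
        break;
    case 6: m->mem[xx] = m->acc; break;                     /* store */
    case 7: m->acc = (m->acc - m->mem[xx]) % 1000; break;   /* subtract */
    case 8: m->mem[99] = m->pc; m->pc = xx; break;          /* jump, save return */
    default:                        /* 9XX (or anything else): halt and reset */
        m->halted = true;
        m->pc = 0;
        break;
    }
}

/* Run until the machine halts or the step budget runs out. */
void cardiac_run(Cardiac *m, int max_steps)
{
    while (!m->halted && max_steps-- > 0)
        cardiac_step(m);
}
```

Loading cells 00 to 05 with 10, 110, 210, 611, 511, and 900 and putting 21 on the input tape reads the number, doubles it, and outputs 42. Note that a C literal such as `010` is octal, so the first instruction has to be written as `10`.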
How hard can this be in Verilog? You'll see that most of the work is jumping through the binary coded decimal hoops. A redesign to use hex numbers and true binary would probably greatly simplify things.
I used 9's complement arithmetic to handle negative numbers and subtraction. This is similar to the 2's complement that you often use with binary, but it is suited for decimal numbers. However, numbers in the accumulator and memory are always stored as a positive magnitude with a separate sign bit. So 0001 is a positive one, and 1001 is a negative one (the top digit can only be a 1 or a 0).
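The sign-plus-digits layout described above can be sketched in C as a 16-bit word: one nibble for the sign and one nibble for each of the three decimal digits. The function names `bcd_pack` and `bcd_unpack` are hypothetical, and the exact bit positions are an assumption chosen so that the hex form of the word reads the same as the decimal examples in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a value in -999..999 into sign nibble + three BCD digits.
   Assumed layout: bits 15..12 sign (0 or 1), then hundreds, tens, ones. */
uint16_t bcd_pack(int value)
{
    assert(value >= -999 && value <= 999);
    unsigned sign = value < 0;
    unsigned mag = (unsigned)(value < 0 ? -value : value);
    return (uint16_t)((sign << 12)
                    | ((mag / 100) << 8)
                    | ((mag / 10 % 10) << 4)
                    | (mag % 10));
}

/* Recover the signed value from a packed word. */
int bcd_unpack(uint16_t word)
{
    int mag = ((word >> 8) & 0xF) * 100
            + ((word >> 4) & 0xF) * 10
            + (word & 0xF);
    return ((word >> 12) & 1) ? -mag : mag;
}
```

With this layout, `bcd_pack(1)` is `0x0001` and `bcd_pack(-1)` is `0x1001`, matching the two examples above.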
Internally, however, negative numbers and subtractions are converted to 9's complement by subtracting each digit from 9. You then have to add 1 to the result, just like with 2's complement. For example, suppose you have 500 and you want to subtract 155. The CPU complements 155 by subtracting it from 999 with the sign bit set to one, so it computes 1999-0155, which is 1844. If you compute 1844+0500 you get 0344 (remember the sign bit can only be 1 or 0, so the 1 plus the carry from 8+5 makes the sign bit turn to 0, not 2). Adding 1 results in 0345, which is the correct answer.
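The worked example above can be reproduced with a short C sketch. This is an illustration of the general 9's complement technique, not a transcription of VTACH's Verilog. The names `SignedDec` and `nines_sub` are hypothetical, and the branch that handles a negative result (no carry into the sign) is my assumption about how the answer would be turned back into sign-plus-magnitude form, since the text only walks through the positive case.

```c
#include <assert.h>

/* Hypothetical sign-plus-magnitude result. */
typedef struct {
    int sign;       /* 0 = plus, 1 = minus */
    int magnitude;  /* 0..999 */
} SignedDec;

/* Compute a - b for 0 <= a, b <= 999 using 9's complement addition. */
SignedDec nines_sub(int a, int b)
{
    assert(a >= 0 && a <= 999 && b >= 0 && b <= 999);

    int comp = 999 - b;   /* same as subtracting each digit of b from 9 */
    int sum = a + comp;   /* e.g. 500 + 844 = 1344 */
    SignedDec r;

    if (sum > 999) {
        /* Carry out of the hundreds digit flips the sign to plus;
           drop the carry and add 1 (the end-around carry). */
        r.sign = 0;
        r.magnitude = sum - 1000 + 1;
    } else {
        /* No carry: the sum is a negative number still in 9's
           complement form, so complement it again for the magnitude. */
        r.sign = 1;
        r.magnitude = 999 - sum;
    }
    if (r.magnitude == 0)
        r.sign = 0;       /* avoid a "negative zero" */
    return r;
}
```

Calling `nines_sub(500, 155)` yields a plus sign and a magnitude of 345, the same answer as the hand calculation. Swapping the operands yields a minus sign with the same magnitude.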
For input and output, I made a Verilog interface and then wrote modules that used $fscanf. There is also an alternate module that reads a "tape" set up in the top-level test bench.
You might wonder why I chose Verilog when I could just as easily have described the architecture in C or another language. Eventually, I want to be able to target a real FPGA (that would require rewriting the input and output modules, of course). While it is tempting to see Verilog and VHDL as programming languages, they really aren't.
Verilog and other hardware description languages are more like formal languages for writing requirements. For example, in C you might write:
out = selector?a:b;
That tells me that every time the CPU comes to this point in the program, the code will evaluate selector and set out to a or b, depending on the value at that time. If that code runs once per second, then out might change as fast as once per second. If the code never runs, out is never set. Changes that occur when the program is not "looking" don't matter.
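A small C sketch makes the point concrete. The `update` function is a hypothetical wrapper around the statement above, so that "coming to this point in the program" becomes an explicit call.

```c
#include <assert.h>

/* Hypothetical wrapper: out only changes when this function runs. */
void update(int selector, int a, int b, int *out)
{
    assert(out);
    *out = selector ? a : b;
}
```

If you call `update(1, a, b, &out)` while a is 1, out becomes 1. Changing a to 5 afterward leaves out at 1, and it stays that way until the program calls `update` again.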