Instruction Set Basics
As you might expect, the instruction set is very simple. The way it is laid out it is very easy to read and write the hex op codes directly (although I have a cross assembler I'll tell you about shortly). Figure 1 shows the basic format. Note that the functional units have an 8-bit address and a 4-bit subfunction code.
So an instruction that doesn't use registers and has no conditions looks like:
00 sSS dDD
Where SS is the source unit, s is the source sub unit, and the D's represent the destination unit and sub unit. Constants are easy to identify also. The short form takes advantage of the fact that if the condition mask is zero, there's no meaningful reason to have the condition match bits set. So a short constant will have a 1 in the first position and you simply have to mask the top bit off. So 1000003F is the constant 3F hex (note that the constants are sign extended to 32-bits so 18000000 is the constant F8000000). Full 32-bit constants start with 00000F01 followed by the full 32-bit constant (this requires two cycles to execute and does not respect condition codes).
Of course, the real meat to this processor isn't the format of the instructions, it is the possible values for the source and destination. Table 1 shows the standard functional units and their addresses. Note the constant functional unit is unit zero which allows for some mnemonic subfunctions. So 00000001 (source unit 0, subunit 0) loads a zero into the program counter while 00A00001 loads a constant A. Of course, that only goes so far and some of the subfunctions are just arbitrarily assigned.
Although every instruction is technically a move, you might prefer to give some mnemonic aliases to special moves (my cross assembler does this). For example, moving something into the program counter is a jump. A subroutine return involves moving the top of the stack to the program counter.
You might be wondering: This all sounds interesting, but how can I implement a custom CPU? The answer is to use a Field Programmable Gate Array (FPGA) device. (For more on programmable logic devices like FPGAs, see my article Programmable Logic and Hardware).
In a nutshell, an FPGA implements a large number of logic cells and a programmable way to interconnect them. You can create your design by drawing schematics (only feasible for small designs) or by using a hardware description language (like Verilog, the language used to implement One-Der). A program running on a PC takes your description and converts it into a configuration of the FPGA's logic cells. The result is downloaded to the FPGA by the PC's parallel port. You can also download the result to a EEPROM which can automatically configure the FPGA on startup once you are happy with your design.
Working with FPGAs used to be very expensive. Today, vendors offer development boards that work with free software for well under US$100. My prototype of the One-Der architecture works on a development board available from Digilent (see Resources) that has the equivalent of about 1,000,000 logic gates (it costs a bit more than $100, but was well under $200). In addition to the Xilinx Spartan 3 FPGA chip and assorted support functions, the board also contains some memory devices along with a handful of I/O devices such as switches, LEDs, and a serial port. Of course, none of these things do anything unless your logic makes them do something. The One-Der prototype can read the switches, light the LEDs, and implements a serial port you can use in your programs. It also provides 16 bits of general-purpose outputs and another 16 bits of input available on the board's edge connector.
If you aren't familiar with Verilog, you'll notice it strongly resembles C. However, the way it is handled is very different since the result isn't object code, its a hardware configuration. Consider the following C snippet:
This actually generates code that sets x at some discrete time, then sets y a short time later. After this code executes, x and y won't change unless the program reexecutes this code, or executes some other code that modifies x and y.
In Verilog you might write:
assign x=a|b|~c; assign y=a&b;
This will create several logic gates; one that drives the x output and one that drives the y output. These gates will constantly compute the equations in hardware and the value of x (or y) at any instant will reflect the current values of the inputs. There's no sequence of computations implied by the line order. The resulting logic is all in parallel.
You can specify gates directly or use other methods, but I find them all clunky compared to using assign. For completeness though, here's two other ways you could drive the x output equivalently:
or(x,a,b,~c); always @(a,b,c) x=a|b|~c;
All of these build asynchronous logic in Verilog. You can also create synchronous logic which make up the bulk of the CPU. With synchronous logic, everything is referenced to a clock pulse. This way outputs have until the next clock pulse to "settle" and you can build very complex logic without worrying too much about the different delays encountered by the various signals.
For example, consider generating parity on a serial bit stream. The idea is to use a flip flop to remember the current parity value. When the serial data arrives (at the rising edge of the clock) the new parity will be the old parity exclusive-ored with the data bit. Here's a Verilog module to implement this logic:
module serialparity(input clk, input reset, input data, output reg even, output odd) assign odd=~even; always @(posedge clk) if (reset) even=1'b0; else even=even^data; endmodule
This defines a module (like a parallel subroutine) that "executes" on each positive clock edge. If you read C, you can probably puzzle out most of the syntax. The 1'b0 means a literal zero that is 1-bit long (the b is for binary). In English, the module takes each data bit and exclusive or's it with the previous parity value to form the next parity value.
Of course, a complete Verilog tutorial is beyond the scope of this article, but there's plenty of resources on the Internet. The CPU is relatively simple, but it probably isn't the ideal first Verilog project. On the other hand, adding functional units and manipulating existing ones is well within the reach of the Verilog neophyte