More About Units
There are many ways to solve any problem using the One-Der architecture. For example, conditional bus transfers allow you to make decisions, but functional units might use other methods. The loop unit, for instance, requires you to move one of its subunits into the program counter; the value of that subunit depends on whether the loop is complete.
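To make the loop unit idea concrete, here is a minimal Python sketch of a loop unit whose address subunit is read into the program counter. The class `LoopUnit`, its field names, the toy addresses, and the exact update rule (increment on read, fall through to the next address when done) are assumptions made for illustration, not the actual One-Der implementation.

```python
class LoopUnit:
    """Toy model of a loop unit; the semantics are assumed, not One-Der's."""

    def __init__(self, start, end, step, loop_addr):
        self.i = start              # current loop counter
        self.end = end              # loop is complete once the counter reaches this
        self.step = step            # amount added on each pass
        self.loop_addr = loop_addr  # address of the top of the loop body

    def read_addr(self, next_addr):
        """Value seen when the address subunit is moved into the PC."""
        self.i += self.step
        if self.i < self.end:
            return self.loop_addr   # loop not complete: jump back to the top
        return next_addr            # loop complete: fall through


def run(count):
    """Run a two-instruction loop and return how many times the body ran."""
    unit = LoopUnit(start=0, end=count, step=1, loop_addr=10)
    pc, iterations = 10, 0
    while pc != 12:
        if pc == 10:
            iterations += 1                    # the loop body
            pc = 11
        elif pc == 11:
            pc = unit.read_addr(next_addr=12)  # "move loop subunit to PC"
    return iterations
```

In this model the program never tests a flag itself: the decision is made inside the unit, and the program simply performs the same transfer on every pass.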
Each unit receives a clock signal and the inverted clock (useful for the Xilinx stock memory, which always looks for a rising edge). There is also an alternating phase signal that provides two distinct clock pulses (four clock edges, since the CPU works on both edges of the clock). The four edges have the following purposes:
- Clock A: rising edge, phase=0 -- Latch instruction from program memory, early decode
- Clock B: falling edge, phase=0 -- Set up memory addresses for reading, late decoding
- Clock C: rising edge, phase=1 -- Place source data on bus
- Clock D: falling edge, phase=1 -- Place bus into destination, update program counter, etc.
The CPU uses program information in clock B for some units (mostly those requiring time to set up memory transfers) and in clock C for others. This isn't strictly necessary, but it helps the Xilinx tools perform better since there's more time for signals to propagate for most functional units.
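The four-edge sequence in the list above can be sketched in a few lines of Python. This is only a behavioral illustration: the names `STEPS`, `clock_edges`, and `trace` are hypothetical, and the assumption that the phase signal toggles after each falling edge is inferred from the A-to-D ordering rather than taken from the One-Der source.

```python
from itertools import islice

# (edge, phase) -> (step name, what happens), copied from the list above
STEPS = {
    ("rising", 0): ("A", "latch instruction, early decode"),
    ("falling", 0): ("B", "set up memory addresses, late decode"),
    ("rising", 1): ("C", "place source data on bus"),
    ("falling", 1): ("D", "place bus into destination, update PC"),
}


def clock_edges():
    """Yield (edge, phase) pairs forever; phase is assumed to toggle
    after each falling edge."""
    phase = 0
    while True:
        yield ("rising", phase)
        yield ("falling", phase)
        phase ^= 1


def trace(n_edges):
    """Return the step names for the first n_edges clock edges."""
    return [STEPS[edge][0] for edge in islice(clock_edges(), n_edges)]
```

Calling `trace(8)` walks through two complete bus transfers, which shows that one transfer spans two full clock cycles.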
Functional units can be simple or complex. If a functional unit needs more than one bus cycle, you can easily stall the bus by asserting the dta line in the functional unit. You can implement bus stalls in several ways. For example, suppose you develop a pipelined multiplier functional unit that requires 64 clock cycles to operate. One approach would be to stall the bus when the final operand is sent to the unit. Alternatively, you might let the unit calculate in the background and stall the bus only if the program attempts to read the result before the pipeline completes.
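The trade-off between the two stall strategies can be estimated with a small Python model. It is a sketch under simplifying assumptions: every transfer costs one time unit, the multiplier latency is counted in the same unit, and `independent_ops` is a hypothetical count of transfers the program performs between writing the final operand and reading the result.

```python
def stall_on_write(latency, independent_ops):
    """Bus is held for the whole multiply as soon as the final operand arrives."""
    write = 1  # transfer of the final operand
    read = 1   # transfer that fetches the result
    return write + latency + independent_ops + read


def stall_on_read(latency, independent_ops):
    """Bus keeps running; it is held only if the result is read too early."""
    write = 1
    read = 1
    # Part of the latency not already hidden behind the independent transfers
    remaining = max(0, latency - independent_ops)
    return write + independent_ops + remaining + read
```

With a latency of 64 and 10 independent transfers, the first strategy costs 76 units and the second 66. When the program has no other work to do, both cost the same, so the second approach only pays off if useful transfers can be scheduled during the multiply.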
The Future
Listing 2 shows a simple monitor program that allows direct exploration of the One-Der architecture. Of course, you have to share resources with the monitor software. The monitor lets you manually key in small programs or manually set and read functional units.
    ;; Simple "monitor" for One-Der
    ;; Williams 1 March 2009
    ;; assume default stack is ok
        ORG 0
    ;; Registers used for fast subroutine calls
    ##define CRLF 60
    ##define HEXO8 61
    ##define USEND 62
    ##define READHEX 63
    ;; data registers
    ##define HEXNUM 59
    ;; Put something on the display so we know we are running
        LDRIQ 0x1234,FIO_DISP
    ;; init common subroutine calls
        LDRIQ crlf,R(CRLF)
        LDRIQ hexout8,R(HEXO8)
        LDRIQ uartsend,R(USEND)
        LDRIQ readhex,R(READHEX)
    ;; Print a string to the terminal (kill this for more room)
        CALLQ banner
    top:
        LDRIQ '?', FACC ; prompt
        CALLR USEND
        LDRIQ ' ', FACC ; note that call clobbers FIMM
        CALLR USEND
        CALLQ uartrx ; get a character
        CALLR USEND ; echo
    ;; we know about several commands
    ;; fu:
    ;; p x - print x
    ;; s xy - set y=x (note no spaces so s 0fe1fe for example)
    ;; c n y - set y=n (constant)
    ;; reg:
    ;; d r - print r
    ;; r n r - set r=n (constant)
    ;; f add data...<esc> - Write to flash (working)
    ;; g add - Call address
    ;; v add count - view flash add
    ;; you can backspace in a number as long as you haven't
    ;; entered a character <'0' yet
    ;; Vector for each command
    ;; check for p
        MOV FACC,FACC2
        LDRIQ 'p',FACC_SUB
        LDIQ print
        MOVZ FIMMV,FPC
    ;; check for v
        MOV FACC2,FACC
        LDRIQ 'v',FACC_SUB
        LDIQ view
        MOVZ FIMMV,FPC
    ;; check for f
        MOV FACC2, FACC
        LDRIQ 'f',FACC_SUB
        LDIQ flash
        MOVZ FIMMV,FPC
    ;; check for s
        MOV FACC2, FACC
        LDRIQ 's',FACC_SUB
        LDIQ setfu
        MOVZ FIMMV,FPC
    ;; check for c
        MOV FACC2, FACC
        LDRIQ 'c', FACC_SUB
        LDIQ setfucon
        MOVZ FIMMV,FPC
    ;; check for d
        MOV FACC2, FACC
        LDRIQ 'd', FACC_SUB
        LDIQ dispreg
        MOVZ FIMMV,FPC
    ;; check for r
        MOV FACC2, FACC
        LDRIQ 'r', FACC_SUB
        LDIQ setreg
        MOVZ FIMMV,FPC
    ;; check for g
        MOV FACC2, FACC
        LDRIQ 'g', FACC_SUB
        LDIQ go
        MOVZ FIMMV,FPC
    ;; not found
        JMPQ top
    ;; View flash
    view:
        CALLR READHEX ; read address
        MOV FACC2, FPC_PGMRDADD
        CALLR READHEX ; read count
    ;; Set up loop
        MOV FACC2, FLOOP_IEND
        MOV FZERO, FLOOP_I
        MOV FONE, FLOOP_IINC
        LDRIQ viewloop, FLOOP_IADD
    viewloop:
        MOV FZERO, FPC_PGMRD ; read flash
        MOV FIMMV, FACC ; print
        CALLR HEXO8
        LDRIQ ' ', FACC ; space between words
        CALLR USEND
        MOV FLOOP_IADD,FPC ; loop
        CALLR CRLF ; done
        JMPQ top
    ;; Call a user subroutine
    go:
        CALLR READHEX ; get address
        MOV FACC2,FPC_CALL ; do it
        JMPQ top ; done
    ;; Set a register to a constant
    setreg:
        CALLR READHEX ; get constant
        PUSH FACC2
        CALLR READHEX ; get register #
        LDRIQ 0xFFF,FACC2_AND
        LDRIQ 0x1008000,FACC2_OR ; build instruction to exec
        JMPQ setfu0 ; from here out, same as setting a FU
    ;; print a register
    dispreg:
        CALLR READHEX ; which register?
        LDRIQ 12,FACC2_SHL
        LDRIQ 0xFFF000,FACC2_AND ; get legit part
        LDRIQ 0x2000002, FACC2_OR ; build instruction
        JMPQ dispexec ; from here out, same as printing FU
    ;; Set a Functional unit
    ;; Note this just sort of execs one instruction
    setfu:
        CALLR READHEX ; get src and dest (e.g., 0fe002 is switch->acc)
        JMPQ setfu0 ; from here out, same as set FU constant
    setfucon:
        CALLR READHEX ; get constant
        PUSH FACC2
        CALLR READHEX ; get destination
        LDRIQ 0xFFF,FACC2_AND ; build instruction
        LDRIQ 0x008000,FACC2_OR
    setfu0: ; execute instruction
        MOV FACC2,FPC_WDATA
        LDRIQ execute,FPC_WADD
        MOV FZERO,FPC_WRITE
        POP FACC2
        CALLQ execute
        JMPQ top
    ;; Print an FU
    print:
        CALLR READHEX ; which FU?
        MOV FACC2,FACC ; print fu address:value
        CALLR HEXO8
        LDRIQ ':',FACC
        CALLR USEND
        LDRIQ 12,FACC2_SHL
        LDRIQ 0xFFF000,FACC2_AND ; get legit part
        LDRIQ 0x002, FACC2_OR ; build instruction
    dispexec:
        MOV FACC2,FPC_WDATA ; execute instruction and print
        LDRIQ execute,FPC_WADD
        MOV FZERO,FPC_WRITE
        CALLQ execute
        CALLR HEXO8
        CALLR CRLF
        JMPQ top
    ;; write to flash
    flash:
        CALLR READHEX ; Get address
        MOV FACC2,FPC_WADD
    flashloop:
        MOV FPC_WADD,FIO_DISP ; echo address to LED display
        CALLR READHEX ; get data value (FACC2 is value FACC is terminator)
        LDRIQ 0x1B,FACC_SUB ; test for Esc
        JMPZQ top
        MOV FACC2,FPC_WDATA ; write data
        MOV FZERO,FPC_WRITE
        JMPQ flashloop ; and loop
    ;; Several commands have to execute a dynamic instruction
    ;; So the monitor builds a command and places it here
    ;; and then calls execute
    execute: ; this is how we overwrite flash
        MOV FZERO,FZERO
        RET
    ;; Canned routines for UART, etc.
    ##include library.inc
    ;; print a stored banner
    banner:
        LDRIQ message,FPC_PGMRDADD
    bann1:
        MOV FZERO,FPC_PGMRD
        MOV FIMMV,FACC
        RETZ
        CALLR USEND
        JMPQ bann1
    message: ; note stringpack crams text into words
    ;; while string outputs individual bytes
        STRING "1Mon"
        DATA 13
        DATA 10
        DATA 0
        END
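Several monitor commands in the listing build a single instruction word with shift, AND, and OR operations, write it to the `execute` label, and then call it. The Python sketch below mirrors that arithmetic so the resulting words can be inspected. The masks and constants are copied from the listing; the function names are hypothetical, and the reading of bits 12-23 as the source field and bits 0-11 as the destination field is an inference from the listing's `0fe002 is switch->acc` comment, not a documented encoding.

```python
SRC_MASK = 0xFFF000  # bits 12-23: appears to hold the source unit
DST_MASK = 0x000FFF  # bits 0-11: appears to hold the destination unit
ACC = 0x002          # destination used by the print commands in the listing


def print_fu_word(fu):
    """Word built at 'print': shift the FU number up, mask, OR in 0x002."""
    return ((fu << 12) & SRC_MASK) | ACC


def set_fu_const_word(dest):
    """Word built at 'setfucon': mask the destination, OR in 0x008000."""
    return (dest & DST_MASK) | 0x008000


def set_reg_word(reg):
    """Word built at 'setreg': same as above with the extra 0x1000000 bit."""
    return (reg & DST_MASK) | 0x1008000


def disp_reg_word(reg):
    """Word built at 'dispreg': register number in the upper field, OR 0x2000002."""
    return ((reg << 12) & SRC_MASK) | 0x2000002
```

For example, `print_fu_word(0x0FE)` yields `0x0FE002`, the same word the listing's comment gives as the switch-to-accumulator transfer.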
What's next for One-Der? There's an endless number of functional units that could be written, of course. In addition, I have plans to add time-based, external, and debugging interrupts. An external memory interface is also in the works. However, I'm close to shifting gears to think more about tools. A simple example would be a "wizard" style interface to select functional units and generate a custom instance of One-Der. I'm even more interested in working on compiler support to automatically select the optimum number and types of units. Perhaps one day, a One-Der savvy compiler would even take advantage of the ability to partially reconfigure the FPGA to change the functional unit distribution on the fly!
Even so, One-Der is eminently usable as it is. Unlike many other FPGA CPU cores, this one is very simple to customize even if you aren't an expert on its internals. Applications that can benefit from custom instructions in hardware -- things like digital signal processing, for example -- are ideal for One-Der since you can implement parts of your algorithm in hardware and then easily integrate those parts with the CPU.
Resources
- Xilinx – http://www.xilinx.com
- Digilent – http://www.digilentinc.com (Note: One-Der was developed on a "Spartan 3 Starter Board" with an XC3S1000 device. The default configuration uses an XC3S200.)


