Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

# The One Instruction Wonder

There are many ways to solve a given problem with the One-Der architecture. Conditional bus transfers, for example, let you make decisions directly, but functional units can provide other mechanisms. The loop unit, for instance, has you move one of its subunits into the program counter; the value that subunit supplies depends on whether the loop is complete.
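Both decision mechanisms can be modeled in a few lines of Python. This is a toy sketch under assumed semantics: `movz` stands in for a conditional transfer like the `MOVZ` used in Listing 2, and `LoopUnit` borrows the `FLOOP_I`, `FLOOP_IEND`, and `FLOOP_IINC` names from the listing. Its `branch` subunit and the top/exit addresses are hypothetical, not One-Der's actual register map.

```python
from dataclasses import dataclass


def movz(zero_flag: bool, target: int, next_pc: int) -> int:
    """Conditional transfer: move `target` into the PC only when the zero flag is set."""
    return target if zero_flag else next_pc


@dataclass
class LoopUnit:
    """Toy loop unit with assumed semantics (not One-Der's actual design).

    i, iend and iinc mirror the FLOOP_I, FLOOP_IEND and FLOOP_IINC subunits
    in Listing 2; top_addr, exit_addr and branch() are hypothetical names.
    """
    i: int
    iend: int
    iinc: int
    top_addr: int   # first instruction of the loop body
    exit_addr: int  # first instruction after the loop

    def branch(self) -> int:
        """Reading this subunit advances the counter and returns the next PC."""
        self.i += self.iinc
        return self.top_addr if self.i < self.iend else self.exit_addr


def trace_loop(count: int) -> list[int]:
    """Return the PC values moved into the PC by a loop of `count` iterations."""
    loop = LoopUnit(i=0, iend=count, iinc=1, top_addr=0x10, exit_addr=0x20)
    pcs: list[int] = []
    while not pcs or pcs[-1] != loop.exit_addr:
        # ...the loop body would execute here...
        pcs.append(loop.branch())  # i.e. "MOV <branch subunit>, FPC"
    return pcs


print(hex(movz(True, 0x40, 0x05)), hex(movz(False, 0x40, 0x05)))  # 0x40 0x5
print([hex(pc) for pc in trace_loop(3)])  # ['0x10', '0x10', '0x20']
```

The point of the model is that the loop decision lives inside the functional unit: the program always performs the same move, and only the value it moves changes.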

Each unit receives a clock signal and the inverted clock (useful for the Xilinx stock memory, which always looks for a rising edge). There is also an alternating phase signal that splits each bus cycle into two distinct clock pulses (four clock edges, since the CPU works on both edges of the clock). The four edges serve the following purposes:

• Clock A: rising edge, phase=0 -- Latch instruction from program memory, early decode
• Clock B: falling edge, phase=0 -- Set up memory addresses for reading, late decoding
• Clock C: rising edge, phase=1 -- Place source data on bus
• Clock D: falling edge, phase=1 -- Place bus into destination, update program counter, etc.
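The four-edge schedule above can be sketched as a small software model. It assumes every move completes in a single bus cycle (no stalls) and that no unit needs the memory setup done at edge B; the unit names `switch`, `acc`, and `disp` are illustrative only.

```python
from typing import Iterator, NamedTuple


class Edge(NamedTuple):
    name: str     # "A".."D", as in the list above
    rising: bool  # True for a rising clock edge
    phase: int    # value of the alternating phase signal


# One bus cycle = two clock periods = four edges.
BUS_CYCLE = (
    Edge("A", True, 0),   # latch instruction, early decode
    Edge("B", False, 0),  # set up memory read addresses, late decode
    Edge("C", True, 1),   # source unit drives the bus
    Edge("D", False, 1),  # destination latches the bus, PC advances
)


def edges(bus_cycles: int) -> Iterator[Edge]:
    """Yield the clock edges for the given number of bus cycles."""
    for _ in range(bus_cycles):
        yield from BUS_CYCLE


def run_bus(program: list[tuple[str, str]], units: dict[str, int]) -> dict[str, int]:
    """Execute (source, destination) moves, one per bus cycle, with no stalls."""
    pc, instr, bus = 0, ("", ""), 0
    for edge in edges(len(program)):
        if edge.name == "A":
            instr = program[pc]       # fetch the move
        elif edge.name == "B":
            pass                      # no memory-backed units in this model
        elif edge.name == "C":
            bus = units[instr[0]]     # source places its data on the bus
        else:
            units[instr[1]] = bus     # destination takes the bus value
            pc += 1                   # and the program counter advances
    return units


state = run_bus([("switch", "acc"), ("acc", "disp")], {"switch": 0x5A, "acc": 0, "disp": 0})
print(state)  # {'switch': 90, 'acc': 90, 'disp': 90}
```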

The CPU uses program information at clock B for some units (mostly those that need time to set up memory transfers) and at clock C for others. This isn't strictly necessary, but it helps the Xilinx tools produce better results, since most functional units get more time for their signals to propagate.

Functional units can be simple or complex. If a functional unit needs more than one bus cycle, it can stall the bus by asserting its dta line. You can implement bus stalls in several ways. Suppose, for example, you develop a pipelined multiplier functional unit that requires 64 clock cycles to operate. One approach would be to stall the bus as soon as the final operand is sent to the unit. Alternatively, you might let the unit calculate in the background and stall the bus only if the program attempts to read the result before the pipeline completes.
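The trade-off between the two stall strategies can be seen in a toy timing model. Everything here is hypothetical: the `PipelinedMultiplier` class, its method names, and the rule of counting one cycle per move are simplifications for illustration, not One-Der's actual timing.

```python
class PipelinedMultiplier:
    """Toy model of a multi-cycle functional unit (hypothetical, not from One-Der).

    Writing operand B starts the multiply. Each access returns how many
    cycles the unit holds the bus (i.e. keeps its dta line asserted).
    """

    def __init__(self, latency: int = 64, stall_on_write: bool = False):
        self.latency = latency
        self.stall_on_write = stall_on_write
        self.a = 0
        self.result = 0
        self.ready_at = 0  # cycle at which the product becomes valid

    def write_a(self, value: int, now: int) -> int:
        self.a = value
        return 0

    def write_b(self, value: int, now: int) -> int:
        self.result = self.a * value
        self.ready_at = now + self.latency
        # Strategy 1: freeze the bus until the product is ready.
        return self.latency if self.stall_on_write else 0

    def read(self, now: int) -> tuple[int, int]:
        # Strategy 2: stall only if the result is requested too early.
        return self.result, max(0, self.ready_at - now)


def time_multiply(stall_on_write: bool, independent_moves: int) -> tuple[int, int]:
    """Multiply 6 * 7, do unrelated moves, read the product; return (product, cycles)."""
    mul = PipelinedMultiplier(stall_on_write=stall_on_write)
    now = 0
    now += 1 + mul.write_a(6, now)
    now += 1 + mul.write_b(7, now)
    now += independent_moves          # work that does not need the product
    product, stall = mul.read(now)
    now += 1 + stall
    return product, now


for work in (10, 100):
    print(work, time_multiply(True, work), time_multiply(False, work))
# 10 (42, 77) (42, 66)
# 100 (42, 167) (42, 103)
```

Stalling on read lets independent moves overlap the multiply, so it never takes longer than stalling on write in this model; the cost is a little extra logic in the unit to track when the result becomes valid.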

### The Future

Listing 2 shows a simple monitor program for exploring the One-Der architecture directly. Of course, your own programs have to share resources with the monitor. The monitor lets you key in small programs by hand, or set and read functional units manually.
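The monitor's command dispatch in Listing 2 is a chain of compare-and-branch moves: copy the typed character, subtract a command letter, and move the target address into the program counter only if the result is zero. A Python mirror of that chain might look like this; the command order and labels come from the listing, while the function itself is only an illustration.

```python
# (letter, label) pairs in the same order as the checks in Listing 2.
COMMANDS = [
    ("p", "print"), ("v", "view"), ("f", "flash"), ("s", "setfu"),
    ("c", "setfucon"), ("d", "dispreg"), ("r", "setreg"), ("g", "go"),
]


def dispatch(ch: str) -> str:
    """Return the label the monitor would jump to for command character `ch`."""
    saved = ord(ch)                   # MOV FACC,FACC2: keep a copy
    for letter, label in COMMANDS:
        acc = saved                   # MOV FACC2,FACC: reload the accumulator
        acc -= ord(letter)            # LDRIQ 'x',FACC_SUB: subtract the letter
        if acc == 0:                  # MOVZ FIMMV,FPC: jump only if the result is zero
            return label
    return "top"                      # JMPQ top: unknown command, prompt again


print(dispatch("d"), dispatch("z"))  # dispreg top
```

A linear chain costs a few moves per command, which is fine for a handful of commands; a jump table would scale better but needs more setup.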

```
;; Simple "monitor" for One-Der
;; Williams 1 March 2009

;;  assume default stack is ok
ORG 0

;; Registers used for fast subroutine calls
##define CRLF 60
##define HEXO8 61
##define USEND 62
;; data registers
##define HEXNUM 59

;; Put something on the display so we know we are running
LDRIQ   0x1234,FIO_DISP
;;  init common subroutine calls
LDRIQ	crlf,R(CRLF)
LDRIQ	hexout8,R(HEXO8)
LDRIQ	uartsend,R(USEND)
;; Print a string to the terminal (kill this for more room)
CALLQ   banner

top:
LDRIQ '?', FACC		; prompt
CALLR USEND
LDRIQ ' ', FACC		; note that call clobbers FIMM
CALLR USEND
CALLQ uartrx   ; get a character
CALLR USEND	 ; echo
;; we know about several commands
;; fu:
;; p x - print x
;; s xy - set y=x (note no spaces so s 0fe1fe for example)
;; c n y - set y=n (constant)
;; reg:
;; d r - print r
;; r n r  - set r=n (constant)
;; f add data...<esc> - Write to flash  (working)
;; you can backspace in a number as long as you haven't
;; entered a character <'0' yet

;; Vector for each command
;; check for p
MOV FACC,FACC2
LDRIQ 'p',FACC_SUB
LDIQ print
MOVZ FIMMV,FPC
;; check for v
MOV FACC2,FACC
LDRIQ 'v',FACC_SUB
LDIQ view
MOVZ FIMMV,FPC
;; check for f
MOV FACC2, FACC
LDRIQ 'f',FACC_SUB
LDIQ flash
MOVZ FIMMV,FPC
;; check for s
MOV FACC2, FACC
LDRIQ 's',FACC_SUB
LDIQ setfu
MOVZ FIMMV,FPC
;; check for c
MOV FACC2, FACC
LDRIQ 'c', FACC_SUB
LDIQ setfucon
MOVZ FIMMV,FPC
;; check for d
MOV FACC2, FACC
LDRIQ 'd', FACC_SUB
LDIQ dispreg
MOVZ FIMMV,FPC
;; check for r
MOV FACC2, FACC
LDRIQ 'r', FACC_SUB
LDIQ setreg
MOVZ FIMMV,FPC
;; check for g
MOV FACC2, FACC
LDRIQ 'g', FACC_SUB
LDIQ go
MOVZ FIMMV,FPC
JMPQ top

;; View flash
view:
;; Set up loop
MOV FACC2, FLOOP_IEND
MOV FZERO, FLOOP_I
MOV FONE, FLOOP_IINC
viewloop:
MOV FZERO, FPC_PGMRD	; read flash
MOV FIMMV, FACC		; print
CALLR HEXO8
LDRIQ ' ', FACC		; space between words
CALLR USEND
CALLR CRLF		; done
JMPQ top

;; Call a user subroutine
go:
MOV FACC2,FPC_CALL	; do it
JMPQ top		; done

;; Set a register to a constant
setreg:
PUSH FACC2
CALLR READHEX		; get register #
LDRIQ 0xFFF,FACC2_AND
LDRIQ 0x1008000,FACC2_OR ; build instruction to exec
JMPQ setfu0		 ; from here out, same as setting a FU

;; print a register
dispreg:
LDRIQ 12,FACC2_SHL
LDRIQ 0xFFF000,FACC2_AND  	; get legit part
LDRIQ 0x2000002, FACC2_OR	; build instruction
JMPQ	dispexec		; from here out, same as printing FU

;; Set a Functional unit
;; Note this just sort of execs one instruction
setfu:	CALLR READHEX 		; get src and dest (e.g., 0fe002 is switch->acc)
JMPQ setfu0		; from here out, same as set FU constant

setfucon:
PUSH FACC2
LDRIQ 0xFFF,FACC2_AND	; build instruction
LDRIQ 0x008000,FACC2_OR
setfu0:				; execute instruction
MOV  FACC2,FPC_WDATA
MOV FZERO,FPC_WRITE
POP FACC2
CALLQ execute
JMPQ top

;; Print an FU
print:
MOV   FACC2,FACC	; print fu address:value
CALLR  HEXO8
LDRIQ ':',FACC
CALLR USEND
LDRIQ 12,FACC2_SHL
LDRIQ 0xFFF000,FACC2_AND  	; get legit part
LDRIQ 0x002, FACC2_OR		; build instruction
dispexec:
MOV   FACC2,FPC_WDATA	; execute instruction and print
MOV   FZERO,FPC_WRITE
CALLQ  execute
CALLR HEXO8
CALLR CRLF
JMPQ top

;; write to flash
flash:
flashloop:
CALLR READHEX		; get data value (FACC2 is value FACC is terminator)
LDRIQ 0x1B,FACC_SUB ; test for Esc
JMPZQ top
MOV FACC2,FPC_WDATA	; write data
MOV FZERO,FPC_WRITE
JMPQ flashloop		; and loop

;; Several commands have to execute a dynamic instruction
;; So the monitor builds a command and places it here
;; and then calls execute
execute:			; this is how we overwrite flash
MOV	FZERO,FZERO
RET

;; Canned routines for UART, etc.
##include library.inc

;; print a stored banner
banner:
bann1:
MOV FZERO,FPC_PGMRD
MOV FIMMV,FACC
RETZ
CALLR USEND
JMPQ bann1
message:
; note stringpack crams text into words
;; while string outputs individual bytes
STRING "1Mon"
DATA 13
DATA 10
DATA 0

END

```
Listing 2
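The monitor builds instructions at run time by shifting and masking FACC2. Reading those operations alongside the comment `0fe002 is switch->acc` suggests a layout: source unit in bits 12-23, destination in bits 0-11, and what look like flag bits above bit 23 (0x2000000 appears when the source is a register, 0x1000000 when the destination is one). That layout is an inference from the listing, not a documented format, and the meaning of source field 0x008 in the constant-setting forms isn't spelled out. The Python sketch below simply reproduces the monitor's arithmetic so you can check it.

```python
FIELD = 0xFFF  # 12-bit unit/register address (inferred)


def encode(src: int, dst: int, flags: int = 0) -> int:
    """Pack a move word: flags in bits 24+, source in 12-23, destination in 0-11."""
    return (flags << 24) | ((src & FIELD) << 12) | (dst & FIELD)


def decode(word: int) -> tuple[int, int, int]:
    """Split a move word into (source, destination, flags)."""
    return (word >> 12) & FIELD, word & FIELD, word >> 24


# The same bit manipulations the monitor performs on FACC2 in Listing 2.
def print_fu(fu: int) -> int:        # print:    SHL 12, AND 0xFFF000, OR 0x002
    return ((fu << 12) & 0xFFF000) | 0x002


def print_reg(reg: int) -> int:      # dispreg:  SHL 12, AND 0xFFF000, OR 0x2000002
    return ((reg << 12) & 0xFFF000) | 0x2000002


def set_fu_const(dst: int) -> int:   # setfucon: AND 0xFFF, OR 0x008000
    return (dst & FIELD) | 0x008000


def set_reg(reg: int) -> int:        # setreg:   AND 0xFFF, OR 0x1008000
    return (reg & FIELD) | 0x1008000


print(hex(print_fu(0x0FE)), decode(print_reg(5)), decode(set_reg(0x3B)))
# 0xfe002 (5, 2, 2) (8, 59, 1)
```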

What's next for One-Der? There's an endless number of functional units that could be written, of course. In addition, I plan to add time-based, external, and debugging interrupts, and an external memory interface is also in the works. However, I'm close to shifting gears to think more about tools. A simple example would be a "wizard"-style interface for selecting functional units and generating a custom instance of One-Der. I'm even more interested in compiler support that automatically selects the optimum number and types of units. Perhaps one day a One-Der-savvy compiler could even use the FPGA's partial reconfiguration capability to change the functional unit distribution on the fly!

Even so, One-Der is eminently usable as it is. Unlike many other FPGA CPU cores, it is very simple to customize even if you aren't an expert on its internals. Applications that benefit from custom instructions in hardware, such as digital signal processing, are ideal for One-Der, since you can implement parts of your algorithm in hardware and then easily integrate those parts with the CPU.

### Resources

• Xilinx – http://www.xilinx.com
• Digilent – http://www.digilentinc.com (Note: One-Der was developed on a "Spartan 3 Starter Board" with an XC3S1000 device. The default configuration uses an XC3S200.)
