
Al Williams

Dr. Dobb's Bloggers

Paper to FPGA

June 03, 2013

A few weeks ago, I started playing with CARDIAC, the old paper demonstration computer from Bell. I'm wrapping up CARDIAC by moving the Verilog implementation, vtach, over to a real FPGA board. As I mentioned last time, I'm using a Spartan 3 board from Digilent and Xilinx's ISE tools. You can download the entire source tree in the online listings.

I did make a few changes to how vtach works. The Xilinx tools pick up a lot of warnings that Icarus Verilog did not. In addition, in an attempt to simplify the BCD adder, I messed up a corner case that I should have picked up in simulation, but didn't (demonstrating the value of a nice test plan).

The BCD math has turned out to be the weakest link in this entire project. There is enough logic there that I had to clock the CPU at 16MHz to make timing closure. That still allows around 4 million instructions per second, but with a simpler ALU, you could push the timing to be much faster.

Timing closure is one of those seemingly mysterious FPGA terms that you often hear, and one I like to talk about when conducting a job interview. Your FPGA development tool will tell you that you didn't make timing closure or that a timing constraint was not met. But what does that really mean? More importantly, what do you do about it?

The first answer is simple. Consider a D flip flop. Ideally, when the clock transitions, the flip flop matches the Q output to the D input. What if the D input is changing too? That's why flip flops will have a setup time and a hold time.

The setup time is how long the D input should be stable before a clock edge to ensure proper operation. The hold time, not surprisingly, is how long after a clock edge the D input has to remain stable. In other words, the clock edge isn't really a sharp edge; it is a window, and the D input needs to stay the same throughout the window for the flip flop to work. If you violate the window, the flip flop can go into a metastable state where its output is unpredictable (it may hover between 1 and 0 or even oscillate) for some time before it settles.

Flip flops can go metastable if the clock rise and fall times are too slow, too. Naturally, if you are reading an asynchronous input (like a pushbutton), you might violate the flip flop's setup or hold time (the usual solution is to use several flip flops to synchronize the input to the main clock). However, for timing closure what you are worried about mostly is the delay caused by combinatorial logic.
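The several-flip-flop synchronizer mentioned above might look something like the following minimal sketch. The module and signal names are hypothetical and this is not the debounce.v from the listings; it only shows the general idea that the first flip flop is allowed to go metastable and gets a full clock period to settle before the second one samples it.

```verilog
// Two-stage synchronizer for an asynchronous input (illustrative sketch only).
module sync2 (
    input  wire clk,        // clock of the domain that will use the signal
    input  wire async_in,   // asynchronous input, e.g. a raw pushbutton
    output wire sync_out    // async_in re-timed to clk, two edges later
);
    // Register initializer; FPGA synthesis tools typically honor this.
    reg [1:0] stage = 2'b00;

    always @(posedge clk)
        stage <= {stage[0], async_in};  // shift the input through two flip flops

    assign sync_out = stage[1];         // only the second stage is used by other logic
endmodule
```

A mechanical pushbutton would still need debouncing on top of this; the synchronizer only deals with the setup and hold problem.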

Imagine you have two D flip flops connected so that the output of the first one feeds the input of the second one through an inverter. At a low clock frequency, the second flip flop will always have time for the signal from the first flip flop to reach its input. But as the clock frequency increases, the delay through the inverter (and the interconnection) can become significant.
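In Verilog, that two-flip-flop circuit could be described roughly like this (a sketch with made-up names, just to make the path concrete):

```verilog
// Two D flip flops with an inverter between them (illustrative sketch only).
module ff_pair (
    input  wire clk,
    input  wire d,
    output reg  q2
);
    reg q1;

    always @(posedge clk) begin
        q1 <= d;     // first flip flop
        q2 <= ~q1;   // second flip flop samples q1 through the inverter
    end
endmodule
```

The path the timing analysis cares about runs from the output of q1, through the inverter and the interconnect, to the D input of q2. That whole trip has to finish, with the setup time to spare, before the next clock edge.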

As an impractical example, suppose the flip flops had a setup time and a hold time of 1 µs each (ridiculously large, by the way). The first implication is that the clock edges can't arrive less than 2 µs apart (500 kHz), or else the windows will overlap and there will be no safe time for the input to change.

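Now imagine the inverter had an equally improbable delay of 2 µs. At 500 kHz, a change at the output of the first flip flop would reach the second one just as the next clock edge arrived, which is well inside the setup window. The signal needs 2 µs to get through the inverter and then has to sit stable for the 1 µs setup time, so the clock period must be at least 3 µs (about 333 kHz), and that ignores the first flip flop's own clock-to-output delay. You'd have to reduce the clock even further.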

If your clock period is shorter than the delays in your circuit allow, you didn't meet timing closure. The second question (what to do about it) is harder to answer. The obvious answer is to reduce your clock speed. The other thing to do is find the long delay paths in your combinatorial logic and try to optimize them in some way. The tool's timing reports will identify the longest paths, but exactly how you optimize them is the trick. Sometimes you can do things in parallel that you were doing in series (the hardware equivalent of picking a smarter software algorithm). Another trick is to break the operation up into smaller pieces separated by registers, so that each piece fits within the period of a faster clock.

Since most people use the tools to place and route, you can also try to tweak the tool's settings. Most of them can attempt to optimize for speed or for gate density, and optimizing for speed can make a difference. If you are really hardcore, you can use floor-planning tools or even physical editors to lay out critical paths and gates yourself, if you really think you can do a better job than the automated tools.
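To make the "parallel instead of series" and "smaller pieces" ideas concrete, here is a hedged sketch that adds four numbers. It is not taken from vtach, and the names and widths are arbitrary. Written as a single expression between registers, the worst path could pass through a chain of adders; here two adders work side by side in the first stage and a register splits the work, so no path between flip flops contains more than one adder.

```verilog
// Four-input sum split into two register stages (illustrative sketch only).
module sum4_pipelined (
    input  wire       clk,
    input  wire [7:0] a, b, c, d,
    output reg  [9:0] sum       // valid two clock edges after the inputs are applied
);
    reg [8:0] ab, cd;           // stage 1 results, one extra bit for the carry

    always @(posedge clk) begin
        ab  <= a + b;           // stage 1: two adders in parallel
        cd  <= c + d;
        sum <= ab + cd;         // stage 2: one adder between registers
    end
endmodule
```

The price is latency: the answer shows up a clock later than in the unpipelined version, which matters for something like a CPU that wants the result of one instruction before it starts the next.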

That's timing closure in a nutshell. With vtach, I simply dropped the clock speed to make closure. The board has a 50MHz clock but the Spartan chip has several delay locked loops (DLLs) that let you modify the clock up or down in frequency. Unfortunately, the lowest it can take the 50MHz clock is 18MHz. I really needed to run at 16MHz, so the top.v module uses a DLL set to 32MHz and then divides it in half:

	always @(posedge clkls) clkdiv<=~clkdiv;  // Divide clock by 2

It is usually a bad idea to do anything with the clock directly like this. The FPGA has special clock resources and if you put too much logic on the clock, the tool may be unable to allocate those resources for the generated clock. This simple divider and relatively low speed present no problems for this approach, however.
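For comparison, a commonly used alternative to a divided clock is a clock enable: every flip flop stays on the original clock, and a toggling enable lets the logic advance only on every other edge. The sketch below is not what top.v does, and the counter is only a hypothetical stand-in for the real logic.

```verilog
// Half-rate operation using a clock enable instead of a divided clock
// (illustrative sketch only; not how vtach's top.v is written).
module half_rate (
    input  wire       clk,      // undivided clock, e.g. the 32 MHz DLL output
    output reg  [7:0] count     // stand-in payload that advances at half the clk rate
);
    reg en = 1'b0;

    initial count = 8'd0;

    always @(posedge clk) begin
        en <= ~en;              // high on every other clock cycle
        if (en)
            count <= count + 8'd1;
    end
endmodule
```

One caveat: the logic is still clocked at the full rate, so the tools would normally need a multicycle constraint before they would treat these paths as having two clock periods to settle. Without one, this approach would not relax the timing by itself.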

The structure of the Verilog is the same as before, so if you want to sift through the online listings, you should recognize most of the code (except where I made bug fixes). The new or changed parts include:

  • display.v — I borrowed the LED display driver from one of my other CPUs, Blue, and modified it to handle vtach's negative numbers.
  • debounce.v — Another modified import from Blue, this module debounces and synchronizes the push buttons.
  • memory.v — Instead of an array of Verilog registers, I used the Xilinx block RAM feature for memory in this version of vtach.
  • io_input.v, io_output.v — Obviously the simulated input and output had to change to accommodate the real hardware.
  • alu.v — In addition to adding I/O instructions, the main execution unit had to wire in the I/O devices.
  • top.v — The major change to the main module was to manage the clock.

The memory deserves a closer look. Languages like Verilog and VHDL look like programming languages, but they aren't. They are really requirements languages. When you write something like:

	always @(posedge clkls) clkdiv<=~clkdiv;  // Divide clock by 2

The tool infers that you need a D flip flop that has its input driven by its own output inverted (oddly enough, almost the same circuit I used when talking about timing closure). It might also realize that this is really a T flip flop and, if it has those in its toolbox, it might use one or it might build one out of D flip flops or whatever other building blocks it has available.

The point is that the tool tries to develop an understanding of what you want to do and then maps it to circuitry. That's why you can write this line of code in digitadd.v:

   assign temp={1'b0,a}+{1'b0,b}+{4'b0, cyin};

The plus sign here causes the tool to infer an adder (which is a whole slew of AND, OR, and XOR gates). In fact, the FPGA probably doesn't have any of those available, so it might build a look up table to implement the logic (basically a ROM that maps address inputs to data outputs). Some FPGAs have special adding hardware and the tool might choose to map to that instead.
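To put that line in context, here is a sketch of a complete one-digit BCD adder built around the same expression. Only the `temp` assignment comes from the article; the port names and the add-6 correction step are assumptions about how such a module is typically written, not the actual contents of digitadd.v.

```verilog
// One-digit BCD adder (illustrative sketch only).
module bcd_digit_add (
    input  wire [3:0] a,        // BCD digit, 0 to 9
    input  wire [3:0] b,        // BCD digit, 0 to 9
    input  wire       cyin,     // carry in from the lower digit
    output wire [3:0] sum,      // BCD result digit
    output wire       cyout     // carry out to the next digit
);
    wire [4:0] temp;            // plain binary sum, 0 to 19
    assign temp  = {1'b0, a} + {1'b0, b} + {4'b0, cyin};

    assign cyout = (temp > 5'd9);       // anything above 9 overflows the digit
    wire [4:0] adj = temp + 5'd6;       // adding 6 skips the unused codes 10 to 15
    assign sum   = cyout ? adj[3:0] : temp[3:0];
endmodule
```

For example, 7 + 5 gives a binary `temp` of 12; that is above 9, so the carry is set, and 12 + 6 = 18 leaves 2 in the low four bits, which is the BCD answer 1 2. The comparison and the second adder sit in series with the first adder, which hints at why BCD math takes more logic delay than a plain binary add.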

One of the many things the tools look for is patterns that appear to define memory. However, when you are working with a specific chip like a Spartan, you may want to take control of how the memory is set up instead of trying to describe it with Verilog. Xilinx gives you a core generator that lets you build resources for memory, clock generators, and many other on-chip features. You can also just load in "libraries" (what FPGA people call IP or Intellectual Property) that provide features you licensed from a vendor.
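For reference, the kind of pattern the tools look for is something like the following sketch: an array of registers written and read on a clock edge. The module name, word width, and depth are hypothetical. Synthesis tools will commonly map a description like this onto block RAM, but whether a particular tool does so depends on the tool and its settings.

```verilog
// Generic single-port memory with a registered read (illustrative sketch only).
module simple_ram #(
    parameter WIDTH     = 16,   // hypothetical word size
    parameter ADDR_BITS = 7     // 128 words, enough for CARDIAC's 100 cells
) (
    input  wire                 clk,
    input  wire                 we,     // write enable
    input  wire [ADDR_BITS-1:0] addr,
    input  wire [WIDTH-1:0]     din,
    output reg  [WIDTH-1:0]     dout
);
    reg [WIDTH-1:0] mem [0:(1<<ADDR_BITS)-1];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        dout <= mem[addr];      // registered read; returns the old word during a write
    end
endmodule
```

The registered read matters here. A memory that is read combinatorially, with no clock on the read side, generally cannot use the block RAM and ends up built from other resources instead.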
