☮ Chad D. Kersey ♡

A Weblog

Tag: hardware

CHDL Tutorial Slides

Approximately annually, I have been giving brief but in-depth introductions to CHDL. The slide carousel I use for this is now self-contained enough that I am comfortable posting it to the Web for general consumption:

Spring 2015 CHDL Tutorial Slides

CHDL Internals 0: node, nodeimpl, and tickable

As a companion to the CHDL tutorial series, I would like to expose some of the details of the CHDL design in a series of in-depth technical articles. The intended audience of the tutorials consists of those wanting to use CHDL for their own hardware designs. The internals articles are for those wishing to extend, alter, or simply understand CHDL at a lower level.

There are three basic datatypes upon which all of the rest of CHDL is built:

  1. The nodeimpl, handling the simulation and output behavior of
  2. the node, the user-facing interface for manipulating nodes, and
  3. the tickable, representing objects which react to clock signals.

In this article, we will focus on these types and some of the more obvious types built around them, leaving the higher-level details for next week. Because the codebase we are discussing is a constantly-changing work in progress, I will not cite data structures and functions by line number, instead referring to them by name and file.

node and nodeimpl

In a CHDL design, you type:

node x(!y);

What actually happens? Let’s trace the function calls. First, we get to operator!(node y) in gateops.h. This is merely a wrapper for Inv(y). Inv(y) is declared in gates.h and defined in gates.cpp. All of the combinational logic functions in CHDL are implemented as combinations of nand gates and inverters (for now; there’s some desire to move to the industry-standard And/Inverter Graph format). The Inv() function, therefore, represents one of these fundamental gate types, and its implementation should shed some light on the implementation of CHDL nodes. The meat of the implementation of Inv() in gates.cpp is:

node r((new invimpl(in))->id);

This seems odd. We’re creating this thing called an “invimpl” by allocating it on the heap, but despite the fact that we’re allocating it here, we’re not keeping the pointer here. Instead, we’re simply taking this “id” value and passing it to the node constructor. This reveals something about the true nature of node objects: they don’t so much represent nodes themselves as much as identify a nodeimpl. Why call them “node” and not “noderef” or similar? This is entirely for the benefit of users of the CHDL API. To the hardware implementer they certainly do represent nodes.
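To make the "node is only an identifier" idea concrete, here is a minimal sketch. It is a toy model and not the CHDL source: the type names follow the article, but litimpl, Lit(), the free function eval(), and every function body are assumptions made for illustration. The global table it uses is the subject of the next section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::size_t nodeid_t;

struct nodeimpl;
static std::vector<nodeimpl*> nodes;  // table of every implementation object

struct nodeimpl {
  nodeid_t id;
  // Each implementation registers itself and remembers its own index.
  nodeimpl() : id(nodes.size()) { nodes.push_back(this); }
  virtual ~nodeimpl() {}
  virtual bool eval() = 0;
};

// A node is only a handle: it stores an index, never a pointer.
struct node {
  nodeid_t idx;
  explicit node(nodeid_t i) : idx(i) {}
};

// Hypothetical constant node, present only so the example has an input.
struct litimpl : nodeimpl {
  bool value;
  explicit litimpl(bool v) : value(v) {}
  bool eval() override { return value; }
};

struct invimpl : nodeimpl {
  node in;
  explicit invimpl(node i) : in(i) {}
  // The source is looked up through the table each time it is needed.
  bool eval() override { return !nodes[in.idx]->eval(); }
};

// The implementations are deliberately never freed here; the table owns
// them for the life of the program in this toy model.
node Lit(bool v) { return node((new litimpl(v))->id); }
node Inv(node in) { return node((new invimpl(in))->id); }  // same shape as the line above

bool eval(node n) { return nodes[n.idx]->eval(); }
```

Calling Inv(Lit(true)) leaves two entries in the table and returns a node whose only content is the index of the second one.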

The nodes[] Vector

So, why aren’t we keeping a pointer to our node and what’s this “id” field about? The astute reader will have read the title of this section and understood. Pointers to all nodeimpl objects are stored in a global C++ vector<nodeimpl*> called nodes, declared in nodeimpl.h. Indices into this vector have their own type, too: nodeid_t, an integer typedef found in node.h. What’s the difference between nodeid_t and node, then? Consider the following:

node x = Foo(); // x has node id 100
node y = Reg(x); // y has node id 101
node z = Bar(); // z has node id 200
x = z; // now x and z both have node id 200

So, will y be the time-shifted-by-one-cycle output of Foo() or Bar()? Remember from the tutorial and the documentation that node assignments are transitive. All of the previous things given the value of x will now have its new value. This is useful because it means that CHDL designs can rewrite themselves. This is how optimization is implemented. It is also useful simply because hardware designs are not acyclic. Combinational logic may be, but as soon as registers are involved it will sometimes be necessary to use values before they are defined.

This behavior is implemented with a data structure called the node directory or node_dir, local to nodeimpl.cpp. The node directory keeps track of the node objects associated with each node ID. The node constructor adds a node to this directory and the node destructor removes it. The node assignment operator consults the node directory and updates all of the nodes to point to the new node ID. This means that node assignments necessarily, by design, completely remove all users of a given node ID. This is why the dead node elimination optimization is vital.

Finally, we have the issue of node sources. If I create an Inv(x), once this function has returned, a new invimpl object will have been created with x as the source. How do we store the source in such a way that, if x is overwritten later, the inverter’s source will be updated to match? We simply make the src field of nodeimpl a std::vector<node>. Because the sources are themselves node objects, they are registered in the node directory and are updated by assignments like any other node. This leads to the expected result without introducing a cyclic type dependency; remember, nodes don’t directly point to nodeimpls. They only contain indices.
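The directory and the src field can be modelled together in a few dozen lines. This is a sketch under stated assumptions rather than the code in nodeimpl.cpp: the directory is modelled as a std::map from ID to a std::set of node pointers, and litimpl, Lit(), and the free function eval() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

typedef std::size_t nodeid_t;

class node;
// Assumed layout: for each ID, every live node object that holds it.
static std::map<nodeid_t, std::set<node*> > node_dir;

class node {
 public:
  explicit node(nodeid_t i) : idx(i) { node_dir[idx].insert(this); }
  node(const node &r) : idx(r.idx) { node_dir[idx].insert(this); }
  ~node() { node_dir[idx].erase(this); }

  // Assignment retargets every node sharing the old ID, not just *this.
  node &operator=(const node &r) {
    const nodeid_t from = idx, to = r.idx;
    if (from == to) return *this;
    const std::set<node*> movers = node_dir[from];  // copy; the map is edited below
    for (node *n : movers) { n->idx = to; node_dir[to].insert(n); }
    node_dir[from].clear();
    return *this;
  }
  nodeid_t id() const { return idx; }

 private:
  nodeid_t idx;
};

struct nodeimpl;
static std::vector<nodeimpl*> nodes;

struct nodeimpl {
  std::vector<node> src;  // sources are nodes, so the directory tracks them too
  nodeid_t id;
  nodeimpl() : id(nodes.size()) { nodes.push_back(this); }
  virtual ~nodeimpl() {}
  virtual bool eval() = 0;
};

struct litimpl : nodeimpl {  // hypothetical constant node
  bool v;
  explicit litimpl(bool v_) : v(v_) {}
  bool eval() override { return v; }
};

struct invimpl : nodeimpl {
  explicit invimpl(const node &in) { src.push_back(in); }
  bool eval() override { return !nodes[src[0].id()]->eval(); }
};

node Lit(bool v) { return node((new litimpl(v))->id); }
node Inv(const node &in) { return node((new invimpl(in))->id); }
bool eval(const node &n) { return nodes[n.id()]->eval(); }
```

With node x = Lit(false) and node y = Inv(x), assigning a Lit(true) node to x also moves the copy of x held in the inverter's src, so y evaluates to false afterwards and the old ID is left with no users.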

tickable

We’ve dealt with the problem of how nodes are created, linked together, and evaluated, but this only allows us to create structures which evaluate Boolean functions, not systems which evolve over time. CHDL was not designed to evaluate low-level designs that implement sequential logic at the gate level, stateful asynchronous logic included. Cycles in the node graph lead to undefined behavior in both simulation and optimization, and can be detected with the cycdet() function in chdl/analysis.h.

The tickable type is completely unrelated to node and provides an orthogonal feature. It is only later, when we get to the register, the most useful instance of a tickable, that these two concepts are merged. In general, the node interface provides a mechanism for implementing arbitrary Boolean functions that may depend on other nodes or other parts of the program, and the tickable interface provides a mechanism for implementing time-stepped simulations. The combination of these two behaviors leads to the CHDL simulation model.

The class tickable itself is a virtual class that defines four member functions:

tickable::pre_tick()
tickable::tick()
tickable::tock()
tickable::post_tock()

During simulation, each of these is executed once per clock cycle. During each clock cycle, first all of the pre_tick() functions are called, then all of the tick() functions, etc. This somewhat ungainly state of affairs arose because the original simulator demanded both a tick() and tock() function, so that register state would not be updated until all register inputs had been read. Subsequent additions to CHDL required yet more priority levels to handle the updating of values both prior to and following the execution of user code when the simulation was being advanced one step at a time.

tickable is not a pure virtual class. It can itself be instantiated, although there is no practical reason to do so. Any combination of the four tick functions can be overridden or omitted from a tickable subclass. Most override only one or two.
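The phase ordering can be sketched as follows. The four member function names come from the list above; the registry vector, the advance() function, and the logger subclass are assumptions for illustration and are not taken from the CHDL source.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

class tickable;
static std::vector<tickable*> tickables;  // hypothetical registry of live tickables

class tickable {
 public:
  tickable() { tickables.push_back(this); }
  virtual ~tickable() {
    tickables.erase(std::remove(tickables.begin(), tickables.end(), this),
                    tickables.end());
  }
  // Empty defaults, so a subclass overrides only the phases it needs.
  virtual void pre_tick() {}
  virtual void tick() {}
  virtual void tock() {}
  virtual void post_tock() {}
};

// Advance one clock cycle: each phase finishes for every object before
// the next phase begins.
void advance() {
  for (tickable *t : tickables) t->pre_tick();
  for (tickable *t : tickables) t->tick();
  for (tickable *t : tickables) t->tock();
  for (tickable *t : tickables) t->post_tock();
}

// Example subclass overriding two of the four phases and logging the order.
static std::string trace;
struct logger : tickable {
  char name;
  explicit logger(char n) : name(n) {}
  void tick() override { trace += std::string("tick:") + name + ' '; }
  void tock() override { trace += std::string("tock:") + name + ' '; }
};
```

With two loggers a and b alive, one call to advance() records both tick() calls before either tock() call, which is the property registers rely on.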

regimpl

The example from Tutorial 0:

node x;
x = Reg(!x);
TAP(x);

makes use of two types of node: the register node and the inverter node. The implementation of the inverter node type, invimpl, was explained in the previous section. So what about x itself, a register node? Its node type is regimpl, a type which inherits from both nodeimpl and tickable. Its tick() function reads the state of its input (a variable unique to regimpl, q, not src[0]; remember, the node graph made using src should not have any cycles). The tock() function writes the value read from q to the output, another bool variable that is returned by the eval() function.
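A toy version of this circuit shows the two-phase update at work. Everything here is a simplified assumption: the member names d, sampled, and out are my own and do not match the fields of the real regimpl, nodes are wired by raw index instead of through node objects, and simulate_toggle() is a hypothetical driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct nodeimpl {
  virtual ~nodeimpl() {}
  virtual bool eval() = 0;
};

struct tickable {
  virtual ~tickable() {}
  virtual void tick() {}
  virtual void tock() {}
};

static std::vector<nodeimpl*> nodes;

struct invimpl : nodeimpl {
  std::size_t src;  // index of the node being inverted
  explicit invimpl(std::size_t s) : src(s) {}
  bool eval() override { return !nodes[src]->eval(); }
};

struct regimpl : nodeimpl, tickable {
  std::size_t d;  // index of the input; kept apart from the combinational sources
  bool sampled;   // value captured by tick()
  bool out;       // value visible to the rest of the design
  regimpl() : d(0), sampled(false), out(false) {}
  void tick() override { sampled = nodes[d]->eval(); }  // read the input only
  void tock() override { out = sampled; }               // then update the output
  bool eval() override { return out; }                  // never follows d: no cycle
};

// Builds the equivalent of x = Reg(!x) by hand and records x each cycle.
std::vector<bool> simulate_toggle(unsigned cycles) {
  nodes.clear();
  regimpl reg;            // x
  invimpl inv(0);         // !x, reading node 0
  nodes.push_back(&reg);  // node 0
  nodes.push_back(&inv);  // node 1
  reg.d = 1;              // close the loop through the register

  std::vector<bool> seen;
  for (unsigned c = 0; c < cycles; ++c) {
    seen.push_back(reg.eval());
    reg.tick();
    reg.tock();
  }
  nodes.clear();          // the objects above are about to go out of scope
  return seen;
}
```

Because eval() on the register returns the stored output and never evaluates its input, the loop through the inverter does not create a cycle in the combinational graph, and the recorded waveform alternates between 0 and 1.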

What’s Next?

With only nandimpl, invimpl, and regimpl, a world of practical and useful digital logic circuits could be constructed and simulated. A ton of software would have to be written to make this practical. This software, of course, has already been written and represents the rest of CHDL and its companion library the CHDL STL. There has been some serious thought about moving the non-fundamental software out of CHDL and into a separate library, but for the sake of the user, this would only make sense once CHDL has stabilized and packages are provided for major distributions.

CHDL on FPGA

FPGAs are fun. They provide an execution environment for RTL models quick enough to rival custom hardware, at a tiny fraction of the initial investment. This makes them highly appealing to builders of prototypes, those needing to use high-frequency interfaces, and hobbyists, who build copies of historically significant computers from the mid-1980s that run slower than emulators on modern hardware because why not (http://www.bigmessowires.com/plus-too/).

So of course CHDL code can be run on FPGAs. Just not directly. The configuration formats for FPGAs are proprietary and technology mapping to FPGA lookup tables has not yet been implemented. But why bother to generate an FPGA configuration bitstream for one product by one vendor when you can generate synthesizable Verilog and let the vendor’s proprietary tools do the translation and optimization?

This is what I’ve done. The result is satisfying:

[Image: shot0001]

So, of course, if you do FPGA development at all, this is something you should do as well. Here’s how:

  • Get CHDL. (https://github.com/cdkersey/chdl)
  • Create a design…
  • Compile the design with an ordinary C++ compiler.
  • Write a program that:
    • Instantiates the design.
    • Calls optimize().
    • Calls print_verilog().
  • Place this Verilog code in a file called “chdl_design.v” and import it into the FPGA toolchain of your choice.
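To give a feel for what "generate synthesizable Verilog" involves, here is a self-contained toy that turns a tiny nand/inverter netlist into structural Verilog. It is not CHDL's print_verilog() and makes no claim about its output format; the gate record, the to_verilog() function, and the n<i> naming scheme are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical netlist record; CHDL's own data structures differ.
struct gate {
  enum kind_t { INPUT, INV, NAND } kind;
  std::size_t a, b;  // source indices (unused for INPUT; b unused for INV)
};

// Emit one module whose ports are the netlist inputs plus a single output.
std::string to_verilog(const std::string &module,
                       const std::vector<gate> &g, std::size_t out) {
  std::ostringstream v;

  // Port list: every input node, then the output.
  v << "module " << module << "(";
  for (std::size_t i = 0; i < g.size(); ++i)
    if (g[i].kind == gate::INPUT) v << "n" << i << ", ";
  v << "out);\n";

  // Declarations: inputs are ports, every other node is an internal wire.
  for (std::size_t i = 0; i < g.size(); ++i)
    v << (g[i].kind == gate::INPUT ? "  input n" : "  wire n") << i << ";\n";
  v << "  output out;\n";

  // One continuous assignment per gate.
  for (std::size_t i = 0; i < g.size(); ++i) {
    if (g[i].kind == gate::INV)
      v << "  assign n" << i << " = ~n" << g[i].a << ";\n";
    else if (g[i].kind == gate::NAND)
      v << "  assign n" << i << " = ~(n" << g[i].a << " & n" << g[i].b << ");\n";
  }

  v << "  assign out = n" << out << ";\nendmodule\n";
  return v.str();
}
```

Feeding it two inputs, a nand of the two, and an inverter on the nand yields a module that computes a plain AND using only continuous assignments, which is the kind of simple subset any vendor toolchain should accept.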

The rest is your standard FPGA workflow, for which there are plenty of tutorials on the Internet. Among the advantages of the CHDL workflow is that it pushes the proprietary tools out to the margins of the design flow. They are still responsible for much of the optimization, technology mapping, pin assignment, and the like, but as long as they speak a simple synthesizable subset of Verilog, they become completely interchangeable.

A Bit About the Demo

The demo code is a simple VGA terminal meant to be clocked at 50 MHz, with a simple parallel interface. The character ROM is stored in human-readable format in the file FONT, from which it is converted to hex by the simple program in font2hex.cpp. Attached to this VGA controller is a text generator, which outputs characters at a human-readable rate from a ROM, whose contents are initialized based on the file TEXT, which is converted into hex in the makefile using a simple hexdump command.

The entire design uses 1691 LUTs and 29k bits of block RAM on an Altera Cyclone II. Somewhat surprising is that this is more than the total number of nodes in the design after optimization by the CHDL toolchain, but this is easily explained away by duplication for the purpose of performance, since area is plentiful (on the demo board, this is ~10% of available resources).

Running the CHDL CPU Example

CHDL example 6, a simulated MIPS-esque CPU with a 5-stage pipeline running a Sieve of Eratosthenes program, finally works. This example can be loaded and simulated and will surely provide an important test case for future modifications to CHDL. For those of you interested in seeing CHDL run for yourself, I will provide a (very) brief tutorial on building and running the examples.

Step 1: Getting CHDL

CHDL is hosted on GitHub, which (among other things) means that it must be accessed using a git client, like so:

  $ git clone https://github.com/cdkersey/chdl.git

Step 2: Building CHDL

Once you have downloaded CHDL, building it should just be a matter of running make in the root source directory. There are a lot of compilation units, so if you’re on a multicore machine don’t forget to use the -j option to speed up the build.

  $ cd chdl
  $ make -j 8

Step 3: Building and Running the Examples

Once you have built the core CHDL library, you can build the examples in the examples/ directory.

  $ cd examples
  $ make -j 8

Once this finishes, you will have a set of files called example[i].vcd, for i in 1 through 7. These are waveform files, containing the state of every node (or bit vector) tapped with the “TAP()” macro, and viewable in waveform viewers like the free gtkwave. If you do not have a waveform viewer installed, go ahead and obtain gtkwave.

  $ sudo aptitude install gtkwave

If you’re not on Debian, and/or do not have aptitude installed, you’re beyond help.

Step 4: Viewing the Waveforms

The waveform files can then be viewed with gtkwave:

  $ gtkwave example6.vcd

The TAP()s from the source code are listed along the left side of the window and can be dragged to the viewing area.

[Image: chdl_waveform]

In the above figure, the fetch program counter is being viewed in “analog” mode, showing the path taken through the program memory over time. Note that after the program is finished executing, the processor enters a tight loop. The output of the program is the series of prime numbers in register 12 (a good test case because it’s a simple program whose results are hard to produce accidentally). Happy hacking!

Announcing CHDL

Strange things happen as a consequence of the lack of freedom in the hardware world. Consider the case of the man who made a CPU out of discrete transistors because he was uncomfortable with FPGA vendor lock-in. (http://www.6502.org/users/dieter/mt15/mt15.htm) I do not pretend to understand the EDA community enough to make any claims about the tools they do or do not have, open source or otherwise. I have experienced a lack, but it may just be the rarified air of the field when compared to software-oriented disciplines. Other tools exist. I have not found the ones I have encountered particularly well-suited to my personal needs, so I have built another.

CHDL (call it “the” CHDL at your peril) is two things: a C++-based hardware description language and a C++ hardware design library. The former fills a perceived need for radical generality and simplicity in gate-level design specification and the latter fills a need for a free software toolchain for realizing these designs.

CHDL (the language) can be used to specify abstract digital logic designs with uncommonly terse syntax. Designs specified in this way can then be subject to optimizations and simulated directly or written out to netlists. It is the processing of these netlists that is the domain of CHDL (the library) and the related utilities. The netlist files may be translated to other HDLs (like ones supported by FPGA vendor toolchains), translated to C and simulated more quickly, or technology mapped and physically implemented.

What does CHDL code look like? Here is a nontrivial design from the standard library: a Kogge-Stone adder:

 template <unsigned N> bvec<N> Adder(bvec<N> a, bvec<N> b, node cin = Lit(0)) {
    // One generate (g), propagate (p), and intermediate (i) vector per level.
    vector<bvec<N+1>> g(log2(N)+3), p(log2(N)+3), i(log2(N)+3);
    bvec<N> s;

    // Bit 0 of level 0 carries the carry-in.
    g[0][0] = cin;
    p[0][0] = Lit(0);

    for (unsigned j = 0; j < N; ++j) p[0][j+1] = Xor(a[j], b[j]);
    for (unsigned j = 0; j < N; ++j) g[0][j+1] = a[j] && b[j];

    // Each level combines signals that are 2^k positions apart.
    unsigned k;
    for (k = 0; (1l<<k) < 2*N; ++k) {
      for (unsigned j = 0; j < N+1; ++j) {
        if (j < (1l<<k)) {
          // Nothing that far to the right; pass the signals through.
          g[k+1][j] = g[k][j];
          p[k+1][j] = p[k][j];
        } else {
          i[k+1][j] = p[k][j] && g[k][j - (1l<<k)];
          g[k+1][j] = i[k+1][j] || g[k][j];
          p[k+1][j] = p[k][j] && p[k][j - (1l<<k)];
        }
      }
    }

    // Sum bit j is the level-0 propagate XOR the carry into bit j.
    for (unsigned j = 0; j < N; ++j) s[j] = Xor(p[0][j+1], g[k][j]);
    return s;
  }

This template function, when called, instantiates an adder of the given size. It will instantiate one of these adders each time it is called. Note that all of the loops are run at design instantiation time. The function’s goal is to create some gates. Once it has returned, those gates occupy some global state, where they can then be simulated, optimized, or written out to a file.
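The idea of loops that run at design instantiation time can be shown without CHDL at all. The sketch below is a toy: the gate record, the global netlist, the helper functions, and add4() are all hypothetical, and it builds a ripple-carry adder rather than the Kogge-Stone adder above to keep the generator short.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::size_t wire;  // index into the netlist

struct gate { char op; wire a, b; bool value; };  // op: 'i' input, '&', '^' or '|'
static std::vector<gate> netlist;                 // global state the generators fill

wire Input(bool v)       { netlist.push_back({'i', 0, 0, v});     return netlist.size() - 1; }
wire And(wire a, wire b) { netlist.push_back({'&', a, b, false}); return netlist.size() - 1; }
wire Xor(wire a, wire b) { netlist.push_back({'^', a, b, false}); return netlist.size() - 1; }
wire Or(wire a, wire b)  { netlist.push_back({'|', a, b, false}); return netlist.size() - 1; }

// The loop runs while the design is being built; every iteration leaves
// five more gates in the netlist. Nothing is simulated here.
template <unsigned N>
std::array<wire, N> RippleAdder(const std::array<wire, N> &a,
                                const std::array<wire, N> &b, wire cin) {
  std::array<wire, N> s;
  wire c = cin;
  for (unsigned j = 0; j < N; ++j) {
    wire p = Xor(a[j], b[j]);
    s[j] = Xor(p, c);
    c = Or(And(a[j], b[j]), And(p, c));
  }
  return s;
}

// Simulation is a separate step that walks the gates created earlier.
bool eval(wire w) {
  const gate &g = netlist[w];
  switch (g.op) {
    case 'i': return g.value;
    case '&': return eval(g.a) && eval(g.b);
    case '^': return eval(g.a) != eval(g.b);
    default:  return eval(g.a) || eval(g.b);
  }
}

// Build a fresh 4-bit adder for the given operands and read back the sum.
unsigned add4(unsigned x, unsigned y) {
  netlist.clear();
  std::array<wire, 4> a, b;
  for (unsigned j = 0; j < 4; ++j) {
    a[j] = Input((x >> j) & 1);
    b[j] = Input((y >> j) & 1);
  }
  std::array<wire, 4> s = RippleAdder<4>(a, b, Input(false));
  unsigned r = 0;
  for (unsigned j = 0; j < 4; ++j)
    if (eval(s[j])) r |= 1u << j;
  return r;
}
```

After add4() returns, the netlist holds the 9 inputs plus 20 gates produced by the four loop iterations, and the sum wraps at 4 bits because the final carry is discarded.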

I have placed the very-much-in-development CHDL on GitHub (https://github.com/cdkersey/chdl) along with the hope that I am not alone in my ambition, and that likeminded individuals will find value in these manic machinations.