☮ Chad D. Kersey ♡

A Weblog

Tag: fpga

CHDL Tutorial 1: Conway’s Game of Life

This is the second in a series of articles introducing the CHDL C++ Hardware Desgin Libary. This time around, we’ll be looking at some of the basic features of CHDL, such as vectors of bits, and some of the features that make CHDL unique, including the use of C++ template metaprogramming. To illustrate these, we will implement a popular cellular automaton and software toy, Conway’s Game of Life.

bvecs: Collections of Nodes

Before we dive into the example, let’s build a very basic circuit, a 4-bit binary counter, using CHDL bit vectors:

bvec<4> ctr;
ctr = Reg(ctr + Lit(1));

We can wrap this in a function, yielding:

bvec<4> Ctr() {
  bvec<4> c;
  c = Reg(c + Lit<4>(1));
  return c;

This demonstrates several features of the bvec type. Just like Lit(0) and Lit(1) provide literal 0 and 1 values for nodes, Lit<N>(x) provides literal 2’s-complement integer values for bvec<N>s. A register function is provided and operates in exactly the same way, creating a register for each node in the bvec. Assignment, just like with nodes, is retroactive. The initial value of c is overwritten when it is assigned with the output of a register, and it is a function of the output of that register (and not the original contents of c) that serves as the input to the same register. The addition operator (as well as subtraction, multiplication, and division) are overloaded.

Exercise: Create a program that instantiates the 4-bit counter shown here and simulates it for 16 cycles, dumping the simulation waveform to a .vcd file. View this result in GTKWave to verify the correct behavior.

Indexing bvecs

CHDL bvecs can be indexed in two ways, one safer than the other. Arbitrary integers can be used to index CHDL arrays, allowing their assignment from within loops. For example, the following loop initializes a 256×8 array with literal binary numbers from 0 through 255:

vec<256, bvec<8> > x;
for (unsigned i = 0; i < 256; ++i)
  x[i] = Lit<8>(i);

Because of the nature of integral types in C++ it is impossible to catch the following error at compile time:

bvec<2> v;
v[2] = Lit(0);

Instead, this leads to a run-time error that causes the program to abort. Such errors are easy to diagnose in a debugger like GDB.

The other way to index arrays in CHDL leads to errors that can be found at compile time and allows for ranges of values to be selected:

bvec<32> addr;
bvec<20> tag = addr[range<12,31>()];
bvec<8> idx = addr[range<4,11>()];
bvec<4> offset = addr[range<0,3>()];

In this example, a 32-bit address is divided into a 20-bit tag, 8-bit index, and a 4-bit offset for use by a cache with 16-byte blocks and 4kB sets. Because they are actually part of the type’s identity, the template arguments of range must be compile time constant. This means that the following mistake:

bvec offset = addr[range<1,3>()];

leads to a compile time error.

Multi-Dimensional Arrays

The bvec type itself is really just a convenient shorthand. An equivalent for bvec<N&gt is vec<N, node>. The actual basic template here is not bvec, but an even more general vec template that acts as an extension to the usual fixed-length C array. There is no requirement that vec types contain only CHDL types, but they are intended to be used as such. In order to create multi-dimensional arrays, vecs can contain other vecs:

vec<8, bvec > matrix;

This can then be addressed by a multiplexer:

bvec sel;
bvec byte = Mux(sel, matrix);

Conway’s Game of Life

Before we implement Conway’s Game of Life, we should probably spend some time talking about what it is. Conway’s Game of Life is best described as a cellular automaton. It is not a game by any lay usage of the word. It is instead a system; a way for state to evolve over time based on certain rules. It just happens that despite the simplicity of the rules, some very fascinating and complex patterns emerge in Life when it is initialized with random input. When it is initialized with carefully-designed input, wonderfully complex creations have been constructed, as shown here, here, and here.

The game board is a 2D grid of cells with one of two states: set or cleared. All of the cells advance to their next state at the same time and the next state of any cell depends only on the current state of the cell itself and its eight neighbors (four in the cardinal directions and four to the corners). The rules are simple: any cell with exactly three set neighbors becomes set. Any cell with less than two neighbors set is cleared. Any cell with more than three neighbors set is cleared.

Implementing Conway’s Game of Life

A Quick Review of C++ Template Functions

Most of the features of CHDL are designed to interoperate with C++ template metaprogramming. Because of the reliance on C++ features, most errors in the type semantics of CHDL designs can be caught at compile time, and the resulting object code is optimized for relatively quick design elaboration. Since our design for a universal population count module relies on C++ template functions, we will now briefly review the concepts used. A detailed introduction C++ templates can be found in the C++ FAQ here.

C++ templates give us a way to define, in header files, functions and structs that have certain parameters. These parameters can be types, integers, functions, and other templates. The vec<int N, typename T> template we’ve already seen in this tutorial is a templated type that takes an integer and a type argument and uses these to describe a vector of length N containing objects of type T. In addition to templated types like vec, there are also template functions. An example of a template function we’ve already seen is the bvec version of Reg, as used above. Its signature is:

template<unsigned N> bvec<N> Reg(const bvec<N> &d);

Another example if Lit<N>, whose signature is:

template<unsigned N> bvec<N> Lit(unsigned long val);

Because it is easy to infer the length of the register from its parameter, it is not necessary to specify a length for Reg(). For Lit(), the length of the return type cannot be inferred from the arguments, so a template parameter is needed. This is why, in the counter example, the expression for the value of the counter is Reg(c + Lit<4>(1)).

A useful feature of templates is template specialization. If, say, I had a really well-optimzied implementation of a function like Reg that only worked for bvecs of length 8, I could declare it as:

template<> bvec Reg(const bvec &d);

This could live alongside my default implementation, only being used when N is 8.

constexpr Functions and CLOG2

Since C++11, C++ has offered a feature that allows certain simple functions to be evaluated at compile time with the use of the constexpr keyword. The results are constant as long as the inpts remain constant. This ensures type safety and succinct error messages.

In CHDL, constexpr functions are provided as utility functions for common arithmetic performed when evaluating the dimensions of hardware designs: especially the base-2 logarithm, with both the integer floor and integer ceiling operator provided.

The CHDL CLOG2() constexpr function provides a convenient way to evaluate the number of bits needed to uniquely identify N elements. The signature of the Mux function seen above, for example, is:

  T Mux(const bvec<CLOG2(N)> &sel, const vec<N, T> &in);

In previous versions of CHDL, CLOG2 was implemented as a preprocessor macro. In addition to providing no type checking, this lengthy bit of combinational logic was expanded in error messages by G++.

Population Count

An operation that comes up occasionally in computing; frequently enough, in fact that it should probably be in the CHDL standard library but currently is not, is the following: How many bits in a given word are set? This is known as population count, and is popular enough in real software that it is often featured as a processor instruction. (It is POPCNT on x86 and VCNT on ARM.)

The population count operation needed to implement Conway’s Game of Life could be implemented in a very rigid, fixed way, supporting exactly the 8 requisite neighboring bits, no more, no less. In the following code, Zext provides a zero extension operation. It pads the upper bits of a word with zeroes until it is the length of Zext‘s template argument:

bvec<4> PopCount(bvec<8> in) {
  vec<4, bvec<2> > count2;
  for (unsigned i = 0; i < 4; ++i)
    count2[i] = Zext<2>(in[2*i]) + Zext<2>(in[2*i + 1]);
  vec<2, bvec<3> > count4;
  for (unsigned i = 0; i < 2; ++i)
    count4[i] = Zext<3>(count2[2*i]) + Zext<3>(count2[2*i + 1]);

  return Zext<4>(count4[0]) + Zext<4>(count4[1]);

This code is ugly: It’s verbose, and it solves a problem so specific it will almost never happen. This is code that yearns for a recursive solution. The same basic pattern repeats itself three times, in a borderline violation of the DRY principle. If we tried to simply implement PopCount as a recursive function, we would run into a problem: the size of the input and output bvecs would remain the same, leading to quite a bit of wasted space. It is still possible to do:

bvec<4> PopCount(bvec<8> in, unsigned level = 0) {
  if (level < 3) {
    bvec<4> a, b;
    Cat(a, b) = in;
    return PopCount(Zext<8>(a), level+1) + PopCount(Zext<8>(b), level+1);
  } else {
    return in;

This is perhaps a little better, and not as bad as it seems; all of those additional zeroes will ultimately be reclaimed by the optimizer. Still, unnecessary zero extensions abound, and the level parameter is shoehorned in to provide a base case. If we didn’t care about type safety and wanted to use the C++ STL vector instead of CHDL vec, we could write (assuming we had proper operator overloads for arithmetic available):

vector<node> PopCount(vector<node> in) {
  if (in.size() == 1) return in;
  else {
    // Split input into two vectors
    vector<node> a(in.size()/2), b(in.size() - a.size());
    for (unsigned i = 0; i < in.size(); ++i)
      if (i < a.size()) a[i] = in[i];
      else b[i - a.size()] = in[i];

    // Get population count of each vector.
    return PopCount(a) + PopCount(b);

This is also ungainly in its own way; C++ vectors were not designed to be easily divisible into subvectors so we use loops to manually copy the input bit-by-bit.

Exercise: Why might we avoid using the insert() operator on a C++ vector of CHDL nodes? Hint: insert() will use the assignment operator to move the contents of the vector over by one position to make room for the new element.

None of the preceding examples represent the preferred way of implementing population count in CHDL. They represent designs with their own trade-offs that may be used as called for by this situation or that, but which are wholly unnecessary for an operation as basic as population count.

Recursively-defined functions in CHDL are typically implemented using a template with the base case represented by a specialization:

template<unsigned N> bvec<CLOG2(N+1)> PopCount(bvec<N> x) {
  return Zext<CLOG2(N+1)>(PopCount(x[range<0,N/2-1>()])) +

template<> bvec<1> PopCount<1>(bvec<1> x) { return x; }

This is the CHDL way. This implementation uses the same algorithm as the previous three examples and ultimately reduces to the same hardware, but it is general, succinct, and type safe.

Next State

For computing the next state of a cell, we will use a feature of CHDL we have not discussed before, the conditional assignment or Cassign() function. Cassign() provides an unusual syntax for solving the problem of computing future state, as shown in the following example:

template<unsigned long X> node Pulse() {
  const unsigned N(CLOG2(X));
  bvec<N> next_ctr, ctr(Reg(next_ctr));
  node p(ctr == Lit<N>(X-1));

    IF(p, Lit<N>(0)).
    ELSE(ctr + Lit<N>(1));

  return p;

This function emits a 1-cycle-long pulse once every X cycles. The Cassign() function is used to determine the next value of the counter, with the output p acting like a reset signal. Assuming that a population count, count, for all neighboring cells is available, finding the next state next_alive for a given cell with current value alive is trivial:

  IF(count < Lit<4>(2), Lit(0)).
  IF(count > Lit<4>(3), Lit(0)).
  IF(count == Lit<4>(3), Lit(1)).

Using this in combination with our previously-defined PopCount() function, we can create our function for a single cell:

node LifeCell(bvec<8> neighbors, bool init = false) {
  bvec<4> count(PopCount(neighbors));
  node next_alive, alive(Reg(next_alive, init));

    IF(count < Lit<4>(2), Lit(0)).
    IF(count > Lit<4>(3), Lit(0)).
    IF(count == Lit<4>(3), Lit(1)).

  return alive;

The init argument to this function represents the initial state of the cell. The Reg function has a second argument with a default value of 0 representing the initial value of the registers. This is true for both Reg(bvec<N>) and, as shown here, Reg(node).

Getting to Know Your Neighbors

Conway’s Game of Life can be thought of as a simple stencil operation. The same function is applied to every element of an array and used to determine that element’s next value. The only inputs used to determine each element’s value come from the array and have the same shape at each point, translated by that point’s position. For Conway’s game of life, the shape of the input stencil is that of a node and its neighbors. In the following image, the red element’s next value depends on its own value and those of all of the blue elements. The highlighted region can be shifted throughout the region and the operation can be repeated:


This is a very straightforward computation but with one major caveat: what do we do at the edges and corners? The sensible options are to assume that every cell beyond the edge has the value 0 or to assume that the edges wrap around, giving the board a toroidal topology. To accomplish this, we create a function called Get which performs the indexing operation:

template <bool T, unsigned X, unsigned Y>
  node Get(vec<Y, bvec<X> > &g, int i, int j)
  if (T) {
    while (i < 0) i += X;
    while (j < 0) j += Y;
    while (i >= X) i -= X;
    while (j >= Y) j -= Y;

  if (i < 0 || i >= X || j < 0 || j >= Y) return Lit(0);
  else return g[i][j];

By making whether the space is toroidal or surrounded by zeroes a template parameter T, we make it possible to refer to one version or the other of Get through a function pointer in our Neighbors function. We define our function pointer G to avoid having to repeatedly call Get<T> for each of the eight neighboring points.

template <bool T, unsigned X, unsigned Y>
  bvec<8> Neighbors(vec<Y, bvec<X> > &g, unsigned i, unsigned j)
  auto G(Get<T, X, Y>);

  return bvec{
    G(g, i-1, j-1), G(g, i-1,   j), G(g, i-1, j+1), G(g,   i, j-1),
    G(g,   i, j+1), G(g, i+1, j-1), G(g, i+1,   j), G(g, i+1, j+1)

This introduces another not-previously-mentioned-in-these-tutorials feature, the list constructor for CHDL vecs. It allows a comma-separated list of nodes to be used to initialize a bvec, or for that matter any comma-separated list of objects of type T to be used to initialize a vec<T, N>.


Once we have our Neighbors function and our LifeCell function, we can implement the entire grid of cells. Once again, we punt on the decision of how the space should be layed out, by making it a bool template parameter T:

template <bool T, unsigned X, unsigned Y>
  vec<Y, bvec<X> > LifeGrid(bool *init)
  vec<Y, bvec<X> > g;

  for (unsigned i = 0; i < Y; ++i)
    for (unsigned j = 0; j < X; ++j)
      g[i][j] = LifeCell(Neighbors<T>(g, i, j), init[j * X + i]);

  return g;

In this function, init is a pointer to an X*Y-element array of bools representing the initial state of the grid. The 2D array of nodes g is declared and then assigned point-by-point with LifeCell functions. The return value of this function is this same array.

A main function using this is simple:

int main() {
  const unsigned X(16), Y(16);
  bool init[X*Y];

  for (unsigned i = 0; i < X*Y; ++i) init[i] = (rand()%4 == 0);

  vec<Y, bvec<X> > g = LifeGrid<1, X, Y>(init);



  ofstream vcd("life.vcd");
  run(vcd, 1000);

  return 0;

This function initializes our initial state array to random values and calls LifeGrid(). In this case, the toroidal cell space is used, but with a 0 instead of a 1 as the first template argument to LifeGrid(), we could use a board with no active cells beyond the boundaries instead.

Improved Output Using Egress

It would be nice if we could, instead of just calling run() and looking at the output file, see the results of this simulation spatially, in the terminal. In general, it would be nice if simulated hardware had flexible I/O so that it could interface with our C++ code. CHDL provides several ways to do this, the most common of which are the Ingress() and Egress() interfaces for getting data into and out of the CHDL simulator. Let’s add #include <chdl/egress.h> to the top of our source file and replace our call to run() with the folliowing:

bool val[Y][X];
for (unsigned i = 0; i < Y; ++i)
  for (unsigned j = 0; j < X; ++j)
    Egress(val[i][j], g[i][j]);

for (unsigned cyc = 0; cyc < 10000; ++cyc) {
  for (unsigned i = 0; i < Y; ++i) {
    for (unsigned j = 0; j < X; ++j) {
      cout << (val[i][j]?"[]":"  ");
    cout << endl;

The val array is now set, at the beginning of each cycle, with the values of each of the nodes on our board.
The node states can now be viewed directly. Try using the less command in a 17-line-tall terminal and holding spacebar to view the output of this program. The patterns seem far more interesting when a toroidal space is used, as in the following screenshot:


CHDL Tutorial 0: Das Blinkenlights

CHDL and its associated standard library has grown by leaps and bounds over the past year, thanks in part to a decision by my employers to adopt it as a research vehicle. CHDL code now looks quite different than it did a year ago. There is a conditional assignment system that simplifies the creation of state machines. The chdl/statemachine.h API has been deprecated, there is now basic support for buses and tri-state drivers, and simulation speed is soon to receive a huge boost.

Because CHDL development has arrived at a point where I want to actively encourage others to try it, use it, and hopefully contribute to it, I have decided to immediately begin releasing tutorial and technical information related to CHDL in particular and digital electronics deisn in general through this web site. By doing this I hope to educate visitors interested in digital electronics, promote CHDL as a platform for hardware development, and thoroughly document the CHDL interfaces and internals.

This first set of articles, which I hope to make available in rotation with at least two other sets, is devoted entirely to the use of CHDL; written to an audience expected to consist primarily of amateur electical engineers, but with nothing that an experienced professional or engineering student would not be expected to understand. Throughout this tutorial I have included exercises for the reader. Most of these will require the consultation of other references to complete. The truly interested reader is likely to ditch these and do something they find fun instead, but I think of them as milestones. If you think you would be unable to complete an exercise, reading past it may be unwise.

Getting CHDL

It is important to note that CHDL is released under the GNU LGPL. This means that any derivative of CHDL must remain under this license, but does not extend to software linking (statically or dynamically) against the library. This means that, if you want to use CHDL on a commercial hardware project, binary simulation applications and libraries can be released under whatever license you feel like. This will satisfy all of the electrical engineers in your company with a list of patents on their CVs.

Previous articles included some instructions for getting and installing CHDL, but for the sake of completeness, slightly more detailed instructions will be included here. It is expected that anybody reading this is running a relatively recent Linux or Mac OS and has some level of comfort obtaining and using software. The only requirements are a recent G++ or Clang compiler that supports C++ 11 and GNU Make, but a waveform viewer program like GTKWave would be useful too, for viewing output. A large number of engineers primarily use Windows. CHDL will likely compile with a little coaxing in CygWin, which also provides a reasonable development environment.

CHDL is a library, and using it well requires some knowledge of how libraries work on your platform of choice. For Linux, there are many places to start, but see this guide for a tutorial.

Exercise: Write a library in C++ that contains a single function, “say_hi” that prints “Hello, world!” to the screen. Compile and link it as a shared object and write a program, in a separate directory, that calls this function once and exits. If you feel you can do this, you are certainly well-prepared to install and use CHDL!

Downloading the Source

The most recent CHDL source can be obtained from GitHub, either by using the Git client:

$ git clone https://github.com/cdkersey/chdl.git

or as a .zip archive.

Building CHDL

In the CHDL source directory, type make. This will invoke GNU Make, which will read the Makefile and automatically build the source files and then link them into libchdl.so. This is the CHDL library, ready to use. If make errors out, please send a full bug report to chad@cdkersey.com.

Installing CHDL (optional)

To make CHDL available system-wide, run sudo make install. This is not strictly necessary, but if /usr/local/lib is in your library search path, linking against an installed CHDL becomes somewhat more straightforward for any user on the system.

Building and Running the Tests

CHDL includes both a set of tests and a set of example designs. As a first step, just to make sure everything has built correctly, run some tests. They provide an automated way to verify that CHDL is running correctly.

chdl$ cd test
test$ make
g++ -std=c++11 -O2 test_adder.cpp test_common.h -lchdl -o test_adder
g++ -std=c++11 -O2 test_mult.cpp test_common.h -lchdl -o test_mult
g++ -std=c++11 -O2 test_logic.cpp test_common.h -lchdl -o test_logic
g++ -std=c++11 -O2 test_shift.cpp test_common.h -lchdl -o test_shift
test$ LD_LIBRARY_PATH=../ make run
./test_adder > test_adder.run 2> test_adder.opt
./test_mult > test_mult.run 2> test_mult.opt
./test_logic > test_logic.run 2> test_logic.opt
./test_shift > test_shift.run 2> test_shift.opt

This last command will take a few seconds but should complete with no errors.

Exercise: Build and install CHDL on your machine, and run all of the tests.

Creating a CHDL Project

Now that we have verified that CHDL is working, it’s time to build something! We will start with the simplest sequential circuit imaginable: a 1-bit toggle that alternates values on each clock cycle. This will allow us to focus our attention on the fundamentals of CHDL without being distracted by the complexity of the design.

Starting a Project

CHDL designs are just C++ programs, so feel free to use whatever environment you normally use for editing and compiling C++ and take the following as a detailed step-by-step suggestion.

We start by creating a new directory for our project. Somewhere in the filesystem:

$ mkdir 0_blinkenlights
$ cd 0_blinkenlights

We then create our code file and a Makefile:

0_blinkenlights$ touch Makefile blinkenlights.cpp

Makefile should be initialized with the following text:

CXXFLAGS ?= -std=c++11
LDLIBS ?= -lchdl
blinkenlights: blinkenlights.cpp
        rm -f blinkenlights

Remember that in makefiles, tab characters have syntactic meaning, so be sure to indent using tabs instead of spaces wherever indentation is needed.

Now initialize blinkenlights.cpp with:

#include <iostream>
#include <fstream>
#include <chdl/chdl.h>
using namespace std;
using namespace chdl;
int main() {
  return 0;

This is the most basic, empty C++ program using CHDL. It includes the proper header files, brings the CHDL namespace into file scope and then does nothing. Before you build or run the test program, make sure the CHDL library is in both your library and header search path.

A test build/run should complete successfully but do nothing interesting:

0_blinkenlights$ make
g++ -std=c++11 blinkenlights.cpp -lchdl -o blinkenlights
0_blinkenlights$ ./blinkenlights

Building and Simulating a Simple Design

Now that we have our boilerplate code in place, it’s time to build something! To the top of the main() function, add the following code:

node x;
x = Reg(!x);
ofstream vcd("blinkenlights.vcd");
run(vcd, 100);

The first three statements create a signal called x and connect it to the output of a D flip-flop. The input of this D flip-flop is the complement of its output. This is our strobe. This signal is then tapped, making it visible as simulation output. The next pair of statements create an output file called “blinkenlights.vcd” and then run a simulation. This file can be viewed in GTKWave or another waveform viewer.

Compile and run the program again. There will again be no output to the console, but instead of doing nothing, the program will generate an output file called “blinkenlights.vcd”.

0_blinkenlights$ make
g++ -std=c++11 blinkenlights.cpp -lchdl -o blinkenlights
0_blinkenlights$ ./blinkenlights
0_blinkenlights$ ls
blinkenlights blinkenlights.cpp blinkenlights.vcd Makefile
0_blinkenlights$ gtkwave blinkenlights.vcd

Unless otherwise specified, the register is initialized to 0, leading to the following waveform:

This circuit can be thought of as a frequency divider dividing the (implicit, global) clock signal’s frequency by two. It is easy to start imagining a multitude of circuits that could be built and simulated in a similar fashion, but the reader is cautioned to read on first to get a better understanding of what this code does.

An Analysis of the Example

Only three lines of the example program can be thought of as an actual “design”:

node x;
x = Reg(!x);

Let’s examine these lines in depth. (If you would like even more depth, consider reading the companion CHDL Internals articles.) The first line declares a variable named x of type node. This is similar to a node in the circuit sense. It is a Boolean logic signal that can take on only the values ‘0’ or ‘1’. It has a default value, and the value in a node can be overwritten.

The second statement overwrites the initial value of the node x with the output of a register. It also highlights a few important things about CHDL syntax and style. Remember that, in C++, arguments are evaluated before the function is called, and the function is called before its result is assigned. So how can !x mean “not this register’s output” when this register’s output has not been assigned yet. The reason for this is simple. Assignments to nodes are effective globally. If two objects of type node refer to the same signal, when the source of that signal changes, it changes for all node objects that pointed to the same signal. After all, they represent the same node in the circuit!

The second statement also demonstrates a few other things. All of the usual C++ logical and equality operators (||, &&, ==, !=, !) have been overloaded to work with CHDL nodes, creating a notation for implementing combinational logic. Also, it points out a characteristic of function names in CHDL. Uppercase initial characters (usually followed by camelCase) are reserved for functions that implement hardware. In this example, Reg() is a function that takes a node argument and returns a node result. Finally, it demonstrates an important fact about sequential logic in CHDL. Clock signals are implicit. All registers have an implied clock input that is connected to a global clock signal.

The third line of the design simply illustrates the easiest way to make signals visible in simulation. Signals are non-visible by default and must be explicitly exposed as taps. The TAP(x) preprocessor macro creates a tap with the same name as the signal passed to it. Because this is used to name the signal, the argument should only be a C++ identifier. If you would like to create taps on expressions, either give the expression a name first or rename it by using the tap() function instead:

tap("new_name_for_x", x);

This also offers an example of the capitalization scheme. Because tap(), like run() represents a utility function and not a piece of hardware, it is given a lowercase name. Compare to the function version of the ! operator, Inv() in the following alternate way to write the second line:

x = Reg(Inv(x));

Because a introduction to CHDL would be criminally incomplete without mentioning literals, let’s examine a third way to implement this:

x = Reg(Xor(x, Lit(1)));

Lit(1) creates a node that has the fixed value of 1, and Lit(0) does the same thing for 0. (The Xor() function here was chosen instead of its equivalent != operator for the sake of clarity.)

Those are the basics. There’s quite a bit you can do with only this information, and in fact this can be used as the basis for a quite complex simulation system. The only limitation being that taps are single bit signals. All of the extra functionality provided to the designer in CHDL could be implemented using only these basic elements, and indeed, moving it to a library separate from the CHDL core has been considered.

Exercise: Create a second node, y, that blinks half as fast as x.

Netlists and Optimization

Simulation is useful, but it would be nice to be able to fabricate these designs or at least load them into an FPGA. In order to do this there has to be a way to dump the design out of CHDL. Indeed there is. Add the following lines somewhere after the design and before the return statement in your main function:

ofstream netl(“blinkenlights.nand”);

When you build and run your program again, you will receive the following output in a file called “blinkenlights.nand”:

  x 2
  litX 0
  inv 2 1
  reg 1 2

Because it was tapped, x is listed as a 1-bit output. Each line in the design section corresponds to a single gate. Its inputs are listed first followed by its output, as node indices. This one should make sense for the most part. The register’s output is inverted and fed back into the register’s input. The output is also tapped as x. But what’s this litX here on node 0. Its value is unused. If you guess that it must be the leftover initial value of x before it was reassigned, you would be correct. But it’s not used. Shouldn’t we be able to get rid of it?

Of course we should, and we can. A set of simple optimizations is built into CHDL to handle cases of unused (dead) nodes, duplicated logic, and simple constant folding. To invoke it, call the optimize() function with no arguments. If we add the following line:


just after the last line of our design, we get the following output (on stderr) when we run the program:

 Before optimization: 3
 After dead node elimination: 2
 After contraction: 2
 After combining literals: 2
 After redundant expression elimination: 2
 After tri-state merge: 2

The numbers represent the number of nodes in the circuit after each optimization step. Because of the simplicity of our design, only the single dead node elimination was necessary. Our netlist is now a little more compact:

  x 1
  inv 1 0
  reg 0 1

Verilog Output

This ad-hoc netlist format, while easy enough to parse by software, is of no use to us if we would like to actually, say, run this design on an FPGA. For that we are going to need to use CHDL’s Verilog output feature. Somewhere after the call to optimize() and before return, add the following lines to your file:

ofstream verilog("blinkenlights.v");
print_verilog("blink", verilog);

The first argument to print_verilog() is the name of the Verilog module being output, as a string. Compile, run, and inspect the output. Nothing too terribly interesting, all of the nodes are wires or registers whose names start with __x. Something is missing, though. If you look at the top of the Verilog output file, you will notice that the module has a single input called phi for the clock signal and no output. Instead, x is made an internal signal so that it can be easily viewed in simulation. How do we create an output, then? Luckily, there’s a macro for that. Simply replace TAP(x) with OUTPUT(x). The variable x is now a module output. Success!

Moving Forward

This tutorial has hopefully provided a starting point for anybody interested in using CHDL to develop hardware. We’re still a long way from building floating point units and microprocessors, but perhaps not as far away as the reader expects. Next time, we will use a more complex design to explore CHDL’s other hardware design features. Until then, there is plenty to try!

Exercise: Get CHDL-generated Verilog to compile as a top-level design in your FPGA toolchain of choice.

Exercise: Build a simple ripple-carry adder that takes two arrays of nodes as input and returns an array of nodes as output.

Exercise: Use the adder from the previous exercise to build a binary counter.


FPGAs are fun. They provide an execution environment for RTL models quick enough to rival custom hardware, at a tiny fraction of the initial investment. This makes them highly appealing to builders of prototypes, those needing to use high-frequency interfaces, and hobbyists, who build copies of historically significant computers from the mid 1980’s that run slower than emulators on modern hardware because why not (http://www.bigmessowires.com/plus-too/).

So of course CHDL code can be run on FPGAs. Just not directly. The configuration formats for FPGAs are proprietary and technology mapping to FPGA lookup tables has not yet been implemented. But why bother to generate an FPGA configuration bitstream for one product by one vendor when you can generate synthesizable Verilog and let the vendor’s proprietary tools do the translation and optimization?

This is what I’ve done. The result is satisfying:


So, of course, if you do FPGA development at all, this is something you should do as well. Here’s how:

  • Get CHDL. (https://github.com/cdkersey/chdl)
  • Create a design…
  • Compile the design with an ordinary C++ compiler.
  • Write a program that:
    • Instantiates the design.
    • Calls optimize().
    • Calls print_verilog().
  • Place this Verilog code in a file called “chdl_design.v” and import it into the FPGA toolchain of your choice.

The rest is your standard FPGA workflow, for which there are plenty of tutorials on the Internet. Among the advantages of the CHDL workflow is that it pushes the proprietary tools out to the margins of the design flow. They are still responsible for much of the optimization, technology mapping, pin assignment, and the like, but as long as they speak a simple synthesizable subset of Verilog, they become completely interchangeable.

A Bit About the Demo

The demo code is a simple VGA terminal meant to be clocked at 50 MHz, with a simple parallel interface. The character ROM is stored in human-readable format in the file FONT, from which it is converted to hex by the simple program in font2hex.cpp. Attached to this VGA controller is a text generator, which outputs characters at a human-readable rate from a ROM, whose contents are initialized based on the file TEXT, which is converted into hex in the makefile using a simple hexdump command.

The entire design uses 1691 LUTs and 29k bits of block RAM on an Altera Cyclone II. Somewhat surprising is that this is more than the total number of nodes in the design after optimization by the CHDL toolchain, but this is easily explained away by duplication for the purpose of performance, since area is plentiful (on the demo board, this is ~10% of available resources).