CHDL Tutorial 1: Conway’s Game of Life

by chad

This is the second in a series of articles introducing CHDL, the C++ Hardware Design Library. This time around, we’ll be looking at some of the basic features of CHDL, such as vectors of bits, and some of the features that make CHDL unique, including the use of C++ template metaprogramming. To illustrate these, we will implement a popular cellular automaton and software toy, Conway’s Game of Life.

bvecs: Collections of Nodes

Before we dive into the example, let’s build a very basic circuit, a 4-bit binary counter, using CHDL bit vectors:

bvec<4> ctr;
ctr = Reg(ctr + Lit<4>(1));

We can wrap this in a function, yielding:

bvec<4> Ctr() {
  bvec<4> c;
  c = Reg(c + Lit<4>(1));
  return c;
}

This demonstrates several features of the bvec type. Just like Lit(0) and Lit(1) provide literal 0 and 1 values for nodes, Lit<N>(x) provides literal 2’s-complement integer values for bvec<N>s. A register function is provided and operates in exactly the same way, creating a register for each node in the bvec. Assignment, just like with nodes, is retroactive. The initial value of c is overwritten when it is assigned with the output of a register, and it is a function of the output of that register (and not the original contents of c) that serves as the input to the same register. The addition operator (as well as subtraction, multiplication, and division) is overloaded.

Exercise: Create a program that instantiates the 4-bit counter shown here and simulates it for 16 cycles, dumping the simulation waveform to a .vcd file. View this result in GTKWave to verify the correct behavior.
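As a reference for checking the waveform, the expected behavior can be modeled in ordinary C++ without CHDL. This is only a software stand-in under the assumption that the register starts at 0; Counter4 and tick are hypothetical names and not part of the library.

```cpp
#include <cstdint>

// Software stand-in for the 4-bit counter; this is not CHDL code.
struct Counter4 {
  std::uint8_t value = 0;  // register contents, assumed to start at 0

  // Model one clock edge: the register loads (value + 1), truncated to 4 bits.
  void tick() { value = static_cast<std::uint8_t>((value + 1) & 0xF); }
};
```

After 16 calls to tick() the model is back at its starting value, which is the wrap-around the waveform should also show.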

Indexing bvecs

CHDL bvecs can be indexed in two ways, one safer than the other. Arbitrary integers can be used to index CHDL arrays, allowing their assignment from within loops. For example, the following loop initializes a 256×8 array with literal binary numbers from 0 through 255:

vec<256, bvec<8> > x;
for (unsigned i = 0; i < 256; ++i)
  x[i] = Lit<8>(i);

Because of the nature of integral types in C++ it is impossible to catch the following error at compile time:

bvec<2> v;
v[2] = Lit(0);

Instead, this leads to a run-time error that causes the program to abort. Such errors are easy to diagnose in a debugger like GDB.

The other way to index arrays in CHDL leads to errors that can be found at compile time and allows for ranges of values to be selected:

bvec<32> addr;
bvec<20> tag = addr[range<12,31>()];
bvec<8> idx = addr[range<4,11>()];
bvec<4> offset = addr[range<0,3>()];

In this example, a 32-bit address is divided into a 20-bit tag, 8-bit index, and a 4-bit offset for use by a cache with 16-byte blocks and 4kB sets. Because they are actually part of the type’s identity, the template arguments of range must be compile-time constants. This means that the following mistake:

bvec<4> offset = addr[range<1,3>()];

leads to a compile time error.
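For comparison, here is roughly how the same tag/index/offset split looks in plain C++ with shifts and masks. This is a sketch rather than CHDL code, and CacheAddr, bits, and split are hypothetical names. Because the field widths are not part of any type here, a wrong bit range would compile silently, which is exactly the kind of mistake range<A,B> catches.

```cpp
#include <cstdint>

// Hypothetical plain-C++ counterpart of the tag/index/offset split.
struct CacheAddr {
  std::uint32_t tag;     // bits 31..12 (20 bits)
  std::uint32_t idx;     // bits 11..4  (8 bits)
  std::uint32_t offset;  // bits 3..0   (4 bits)
};

// Extract bits hi..lo (inclusive) of x; assumes 0 <= lo <= hi <= 31.
constexpr std::uint32_t bits(std::uint32_t x, unsigned lo, unsigned hi) {
  return (x >> lo) &
         (hi - lo == 31 ? 0xFFFFFFFFu : ((1u << (hi - lo + 1)) - 1u));
}

// Split a 32-bit address using the same bit ranges as the CHDL example.
inline CacheAddr split(std::uint32_t addr) {
  return CacheAddr{bits(addr, 12, 31), bits(addr, 4, 11), bits(addr, 0, 3)};
}
```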

Multi-Dimensional Arrays

The bvec type itself is really just a convenient shorthand. An equivalent for bvec<N> is vec<N, node>. The actual basic template here is not bvec, but an even more general vec template that acts as an extension to the usual fixed-length C array. There is no requirement that vec types contain only CHDL types, but they are intended to be used as such. In order to create multi-dimensional arrays, vecs can contain other vecs:

vec<8, bvec<8> > matrix;

This can then be addressed by a multiplexer:

bvec<3> sel;
bvec<8> byte = Mux(sel, matrix);

Conway’s Game of Life

Before we implement Conway’s Game of Life, we should probably spend some time talking about what it is. Conway’s Game of Life is best described as a cellular automaton. It is not a game by any lay usage of the word. It is instead a system: a way for state to evolve over time based on certain rules. It just happens that despite the simplicity of the rules, some very fascinating and complex patterns emerge in Life when it is initialized with random input. When it is initialized with carefully designed input, wonderfully complex creations can be constructed, as shown here, here, and here.

The game board is a 2D grid of cells with one of two states: set or cleared. All of the cells advance to their next state at the same time and the next state of any cell depends only on the current state of the cell itself and its eight neighbors (four in the cardinal directions and four to the corners). The rules are simple: any cell with exactly three set neighbors becomes set. Any cell with fewer than two neighbors set is cleared. Any cell with more than three neighbors set is cleared. A cell with exactly two set neighbors is covered by none of these rules and keeps its current state.
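Written as an ordinary C++ function, the rules might look like the following. This is a software sketch and not the CHDL implementation; next_state is a hypothetical name. It makes explicit the one case the three rules leave open: a cell with exactly two set neighbors keeps its current state.

```cpp
// Next state of one cell, given its current state and how many of its
// eight neighbors are set (0..8).
inline bool next_state(bool alive, unsigned set_neighbors) {
  if (set_neighbors == 3) return true;   // exactly three: becomes (or stays) set
  if (set_neighbors == 2) return alive;  // exactly two: keeps its current state
  return false;                          // fewer than two or more than three: cleared
}
```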

Implementing Conway’s Game of Life

A Quick Review of C++ Template Functions

Most of the features of CHDL are designed to interoperate with C++ template metaprogramming. Because of the reliance on C++ features, most errors in the type semantics of CHDL designs can be caught at compile time, and the resulting object code is optimized for relatively quick design elaboration. Since our design for a universal population count module relies on C++ template functions, we will now briefly review the concepts used. A detailed introduction to C++ templates can be found in the C++ FAQ here.

C++ templates give us a way to define, in header files, functions and structs that have certain parameters. These parameters can be types, integers, functions, and other templates. The vec<int N, typename T> template we’ve already seen in this tutorial is a templated type that takes an integer and a type argument and uses these to describe a vector of length N containing objects of type T. In addition to templated types like vec, there are also template functions. An example of a template function we’ve already seen is the bvec version of Reg, as used above. Its signature is:

template<unsigned N> bvec<N> Reg(const bvec<N> &d);

Another example is Lit<N>, whose signature is:

template<unsigned N> bvec<N> Lit(unsigned long val);

Because it is easy to infer the length of the register from its parameter, it is not necessary to specify a length for Reg(). For Lit(), the length of the return type cannot be inferred from the arguments, so a template parameter is needed. This is why, in the counter example, the expression for the value of the counter is Reg(c + Lit<4>(1)).

A useful feature of templates is template specialization. If, say, I had a really well-optimized implementation of a function like Reg that only worked for bvecs of length 8, I could declare it as:

template<> bvec<8> Reg(const bvec<8> &d);

This could live alongside my default implementation, only being used when N is 8.
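The mechanism itself is plain C++ and can be tried without CHDL. In the sketch below, describe is a hypothetical function invented for illustration. The compiler deduces N from the argument and picks the specialization only when N is 8.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Generic version: used for every N without a specialization of its own.
template <std::size_t N>
std::string describe(const std::array<int, N> &) {
  return "generic";
}

// Explicit specialization: selected only when N == 8.
template <>
inline std::string describe<8>(const std::array<int, 8> &) {
  return "specialized for N = 8";
}
```

The caller writes describe(x) in both cases; which body runs depends only on the type of x.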

constexpr Functions and CLOG2

Since C++11, C++ has offered a feature that allows certain simple functions to be evaluated at compile time with the use of the constexpr keyword. The results are constant as long as the inputs remain constant. This ensures type safety and succinct error messages.

In CHDL, constexpr functions are provided as utility functions for common arithmetic performed when evaluating the dimensions of hardware designs: especially the base-2 logarithm, with both the integer floor and integer ceiling operator provided.

The CHDL CLOG2() constexpr function provides a convenient way to evaluate the number of bits needed to uniquely identify N elements. The signature of the Mux function seen above, for example, is:

template <unsigned N, typename T>
  T Mux(const bvec<CLOG2(N)> &sel, const vec<N, T> &in);

In previous versions of CHDL, CLOG2 was implemented as a preprocessor macro. In addition to providing no type checking, this lengthy bit of combinational logic was expanded in error messages by G++.
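To make the idea concrete, here is one way a ceiling-log2 function could be written as a C++11 constexpr function. It is a sketch and not CHDL’s actual implementation: clog2_sketch is a hypothetical name, and returning 0 for a single element is an assumption of this sketch.

```cpp
// Hypothetical stand-in for CLOG2: the ceiling of log2(n), i.e. the number
// of bits needed to give each of n elements a distinct index.
constexpr unsigned clog2_sketch(unsigned long n) {
  return n <= 1 ? 0 : 1 + clog2_sketch((n + 1) / 2);
}

// Because the function is constexpr, it can be checked at compile time and
// used wherever a constant is required, such as a template argument.
static_assert(clog2_sketch(1) == 0, "one element needs no select bits");
static_assert(clog2_sketch(8) == 3, "eight elements need three bits");
static_assert(clog2_sketch(9) == 4, "nine elements need four bits");
```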

Population Count

An operation that comes up occasionally in computing (frequently enough, in fact, that it should probably be in the CHDL standard library, though it currently is not) is the following: how many bits in a given word are set? This is known as population count, and it is popular enough in real software that it is often provided as a processor instruction. (It is POPCNT on x86 and VCNT on ARM.)

The population count operation needed to implement Conway’s Game of Life could be implemented in a very rigid, fixed way, supporting exactly the 8 requisite neighboring bits, no more, no less. In the following code, Zext provides a zero extension operation. It pads the upper bits of a word with zeroes until it is the length of Zext’s template argument:

bvec<4> PopCount(bvec<8> in) {
  vec<4, bvec<2> > count2;
  for (unsigned i = 0; i < 4; ++i)
    count2[i] = Zext<2>(in[2*i]) + Zext<2>(in[2*i + 1]);
    
  vec<2, bvec<3> > count4;
  for (unsigned i = 0; i < 2; ++i)
    count4[i] = Zext<3>(count2[2*i]) + Zext<3>(count2[2*i + 1]);

  return Zext<4>(count4[0]) + Zext<4>(count4[1]);
}

This code is ugly: It’s verbose, and it solves a problem so specific it will almost never happen. This is code that yearns for a recursive solution. The same basic pattern repeats itself three times, in a borderline violation of the DRY principle. If we tried to simply implement PopCount as a recursive function, we would run into a problem: the size of the input and output bvecs would remain the same, leading to quite a bit of wasted space. It is still possible to do:

bvec<4> PopCount(bvec<8> in, unsigned level = 0) {
  if (level < 3) {
    bvec<4> a, b;
    Cat(a, b) = in;
    return PopCount(Zext<8>(a), level+1) + PopCount(Zext<8>(b), level+1);
  } else {
    return in;
  }
}

This is perhaps a little better, and not as bad as it seems; all of those additional zeroes will ultimately be reclaimed by the optimizer. Still, unnecessary zero extensions abound, and the level parameter is shoehorned in to provide a base case. If we didn’t care about type safety and wanted to use the C++ STL vector instead of CHDL vec, we could write (assuming we had proper operator overloads for arithmetic available):

vector<node> PopCount(vector<node> in) {
  if (in.size() == 1) return in;
  else {
    // Split input into two vectors
    vector<node> a(in.size()/2), b(in.size() - a.size());
    for (unsigned i = 0; i < in.size(); ++i)
      if (i < a.size()) a[i] = in[i];
      else b[i - a.size()] = in[i];

    // Get population count of each vector.
    return PopCount(a) + PopCount(b);
  }
}

This is also ungainly in its own way; C++ vectors were not designed to be easily divisible into subvectors so we use loops to manually copy the input bit-by-bit.

Exercise: Why might we avoid using the insert() member function on a C++ vector of CHDL nodes? Hint: insert() will use the assignment operator to move the contents of the vector over by one position to make room for the new element.

None of the preceding examples represents the preferred way of implementing population count in CHDL. They are designs with their own trade-offs that may be used as called for by this situation or that, but which are wholly unnecessary for an operation as basic as population count.

Recursively-defined functions in CHDL are typically implemented using a template with the base case represented by a specialization:

template<unsigned N> bvec<CLOG2(N+1)> PopCount(bvec<N> x) {
  return Zext<CLOG2(N+1)>(PopCount(x[range<0,N/2-1>()])) +
         Zext<CLOG2(N+1)>(PopCount(x[range<N/2,N-1>()]));
}

template<> bvec<1> PopCount<1>(bvec<1> x) { return x; }

This is the CHDL way. This implementation uses the same algorithm as the previous three examples and ultimately reduces to the same hardware, but it is general, succinct, and type safe.
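The same recursion pattern, a general template plus a specialization as the base case, can be tried in plain C++ on an array of bools. The sketch below is a software analogue only; pop_count is a hypothetical name, and a pointer stands in for the compile-time sub-ranges that range<A,B> provides in CHDL.

```cpp
#include <cstddef>

// Primary template, declared first so the base case below can specialize it.
template <std::size_t N> unsigned pop_count(const bool *bits);

// Base case: a single bit is its own population count.
template <> inline unsigned pop_count<1>(const bool *bits) {
  return bits[0] ? 1u : 0u;
}

// General case: count the lower N/2 bits and the remaining upper bits
// separately, then add the two results.
template <std::size_t N> unsigned pop_count(const bool *bits) {
  static_assert(N >= 2, "pop_count needs at least one bit");
  return pop_count<N / 2>(bits) + pop_count<N - N / 2>(bits + N / 2);
}
```

As in the CHDL version, odd lengths work because the two halves are allowed to differ in size by one bit.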

Next State

For computing the next state of a cell, we will use a feature of CHDL we have not discussed before, the conditional assignment or Cassign() function. Cassign() provides an unusual syntax for solving the problem of computing future state, as shown in the following example:

template<unsigned long X> node Pulse() {
  const unsigned N(CLOG2(X));
  bvec<N> next_ctr, ctr(Reg(next_ctr));
  node p(ctr == Lit<N>(X-1));

  Cassign(next_ctr).
    IF(p, Lit<N>(0)).
    ELSE(ctr + Lit<N>(1));

  return p;
}

This function emits a 1-cycle-long pulse once every X cycles. The Cassign() function is used to determine the next value of the counter, with the output p acting like a reset signal. Assuming that a population count, count, for all neighboring cells is available, finding the next state next_alive for a given cell with current value alive is trivial:

Cassign(next_alive).
  IF(count < Lit<4>(2), Lit(0)).
  IF(count > Lit<4>(3), Lit(0)).
  IF(count == Lit<4>(3), Lit(1)).
  ELSE(alive);

Using this in combination with our previously-defined PopCount() function, we can create our function for a single cell:

node LifeCell(bvec<8> neighbors, bool init = false) {
  bvec<4> count(PopCount(neighbors));
  node next_alive, alive(Reg(next_alive, init));

  Cassign(next_alive).
    IF(count < Lit<4>(2), Lit(0)).
    IF(count > Lit<4>(3), Lit(0)).
    IF(count == Lit<4>(3), Lit(1)).
    ELSE(alive);

  return alive;
}

The init argument to this function represents the initial state of the cell. The Reg function has a second argument with a default value of 0 representing the initial value of the registers. This is true for both Reg(bvec<N>) and, as shown here, Reg(node).

Getting to Know Your Neighbors

Conway’s Game of Life can be thought of as a simple stencil operation. The same function is applied to every element of an array and used to determine that element’s next value. The only inputs used to determine each element’s value come from the array and have the same shape at each point, translated by that point’s position. For Conway’s Game of Life, the shape of the input stencil is that of a node and its neighbors. In the following image, the red element’s next value depends on its own value and those of all of the blue elements. The highlighted region can be shifted throughout the array and the operation can be repeated:

[Figure 1_stencil: a 3×3 stencil, with the red center element surrounded by its eight blue neighbors]

This is a very straightforward computation but with one major caveat: what do we do at the edges and corners? The sensible options are to assume that every cell beyond the edge has the value 0 or to assume that the edges wrap around, giving the board a toroidal topology. To support both options, we create a function called Get, which performs the indexing operation:

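template <bool T, unsigned X, unsigned Y>
  node Get(vec<Y, bvec<X> > &g, int i, int j)
{
  // i selects one of the Y rows and j one of the X columns, matching g[i][j].
  if (T) {
    // Toroidal board: wrap out-of-range indices around to the opposite edge.
    while (i < 0) i += Y;
    while (j < 0) j += X;
    while (i >= int(Y)) i -= Y;
    while (j >= int(X)) j -= X;
  }

  // Bounded board: every cell beyond the edge reads as 0.
  if (i < 0 || i >= int(Y) || j < 0 || j >= int(X)) return Lit(0);
  else return g[i][j];
}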

By making whether the space is toroidal or surrounded by zeroes a template parameter T, we make it possible to refer to one version or the other of Get through a function pointer in our Neighbors function. We define our function pointer G to avoid having to repeatedly spell out Get<T, X, Y> for each of the eight neighboring points.

template <bool T, unsigned X, unsigned Y>
  bvec<8> Neighbors(vec<Y, bvec<X> > &g, unsigned i, unsigned j)
{
  auto G(Get<T, X, Y>);

  return bvec<8>{
    G(g, i-1, j-1), G(g, i-1,   j), G(g, i-1, j+1), G(g,   i, j-1),
    G(g,   i, j+1), G(g, i+1, j-1), G(g, i+1,   j), G(g, i+1, j+1)
  };
}

This introduces another feature not previously mentioned in these tutorials: the list constructor for CHDL vecs. It allows a comma-separated list of nodes to be used to initialize a bvec, or for that matter any comma-separated list of objects of type T to be used to initialize a vec<N, T>.
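The two edge policies can also be tried out in software. The sketch below mirrors the structure of Get on a std::array of bools. It is not CHDL code: wrap and get are hypothetical names, and it uses the modulo idiom instead of loops to fold an index back into range.

```cpp
#include <array>
#include <cstddef>

// Map any integer index onto 0..n-1, wrapping around; assumes n > 0.
constexpr int wrap(int i, int n) { return ((i % n) + n) % n; }

// Read cell (row i, column j) of a Y-by-X grid. With Toroidal set, indices
// wrap around the edges; otherwise anything off the board reads as false.
template <bool Toroidal, std::size_t X, std::size_t Y>
bool get(const std::array<std::array<bool, X>, Y> &g, int i, int j) {
  if (Toroidal) {
    i = wrap(i, static_cast<int>(Y));
    j = wrap(j, static_cast<int>(X));
  }
  if (i < 0 || i >= static_cast<int>(Y) || j < 0 || j >= static_cast<int>(X))
    return false;
  return g[i][j];
}
```

A call such as get<true>(g, -1, -1) reads the bottom-right cell, while get<false>(g, -1, -1) reads as an empty cell.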

LifeGrid

Once we have our Neighbors function and our LifeCell function, we can implement the entire grid of cells. Once again, we punt on the decision of how the space should be laid out, by making it a bool template parameter T:

template <bool T, unsigned X, unsigned Y>
  vec<Y, bvec<X> > LifeGrid(bool *init)
{
  vec<Y, bvec<X> > g;

  for (unsigned i = 0; i < Y; ++i)
    for (unsigned j = 0; j < X; ++j)
      g[i][j] = LifeCell(Neighbors<T>(g, i, j), init[i * X + j]);

  return g;
}

In this function, init is a pointer to an X*Y-element array of bools representing the initial state of the grid. The 2D array of nodes g is declared and then assigned point-by-point with LifeCell functions. The return value of this function is this same array.
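When debugging the hardware version, it can help to have a software reference to compare against. The following is a minimal sketch of one generation of Life in plain C++, with no CHDL involved. Board and step are hypothetical names, and the board is assumed to be rectangular and non-empty.

```cpp
#include <cstddef>
#include <vector>

using Board = std::vector<std::vector<bool>>;  // board[row][column]

// Compute one generation of Life. If toroidal is true the edges wrap around;
// otherwise cells beyond the edge count as cleared.
Board step(const Board &cur, bool toroidal) {
  const int rows = static_cast<int>(cur.size());
  const int cols = static_cast<int>(cur[0].size());
  Board next(rows, std::vector<bool>(cols, false));

  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      // Count the set cells among the eight neighbors of (i, j).
      unsigned count = 0;
      for (int di = -1; di <= 1; ++di) {
        for (int dj = -1; dj <= 1; ++dj) {
          if (di == 0 && dj == 0) continue;
          int r = i + di, c = j + dj;
          if (toroidal) { r = (r + rows) % rows; c = (c + cols) % cols; }
          if (r < 0 || r >= rows || c < 0 || c >= cols) continue;
          if (cur[r][c]) ++count;
        }
      }
      // Three neighbors: set. Two neighbors: unchanged. Otherwise: cleared.
      next[i][j] = (count == 3) || (count == 2 && cur[i][j]);
    }
  }
  return next;
}
```

Starting both this model and the CHDL design from the same initial array and comparing them cycle by cycle is one way to validate the circuit.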

A main function using this is simple:

int main() {
  const unsigned X(16), Y(16);
  bool init[X*Y];

  srand(100);
  for (unsigned i = 0; i < X*Y; ++i) init[i] = (rand()%4 == 0);

  vec<Y, bvec<X> > g = LifeGrid<1, X, Y>(init);

  TAP(g);

  optimize();

  ofstream vcd("life.vcd");
  run(vcd, 1000);

  return 0;
}

This function initializes our initial state array to random values and calls LifeGrid(). In this case, the toroidal cell space is used, but with a 0 instead of a 1 as the first template argument to LifeGrid(), we could use a board with no active cells beyond the boundaries instead.

Improved Output Using Egress

It would be nice if we could, instead of just calling run() and looking at the output file, see the results of this simulation spatially, in the terminal. In general, it would be nice if simulated hardware had flexible I/O so that it could interface with our C++ code. CHDL provides several ways to do this, the most common of which are the Ingress() and Egress() interfaces for getting data into and out of the CHDL simulator. Let’s add #include <chdl/egress.h> to the top of our source file and replace our call to run() with the following:

bool val[Y][X];
for (unsigned i = 0; i < Y; ++i)
  for (unsigned j = 0; j < X; ++j)
    Egress(val[i][j], g[i][j]);

for (unsigned cyc = 0; cyc < 10000; ++cyc) {
  advance();
  for (unsigned i = 0; i < Y; ++i) {
    for (unsigned j = 0; j < X; ++j) {
      cout << (val[i][j]?"[]":"  ");
    }
    cout << endl;
  }
}

The val array is now set, at the beginning of each cycle, with the values of each of the nodes on our board.
The node states can now be viewed directly. Try using the less command in a 17-line-tall terminal and holding spacebar to view the output of this program. The patterns seem far more interesting when a toroidal space is used, as in the following screenshot:

[Screenshot 1_running: the simulation output in a terminal, using a toroidal space]