☮ Chad D. Kersey ♡

A Weblog

Tag: C++

CHDL Tutorial Slides

Approximately annually, I have been giving brief but in-depth introductions to CHDL. The slide carousel I use for this is now self-contained enough that I am comfortable posting it to the Web for general consumption:

Spring 2015 CHDL Tutorial Slides

CHDL Tutorial 1: Conway’s Game of Life

This is the second in a series of articles introducing the CHDL C++ Hardware Desgin Libary. This time around, we’ll be looking at some of the basic features of CHDL, such as vectors of bits, and some of the features that make CHDL unique, including the use of C++ template metaprogramming. To illustrate these, we will implement a popular cellular automaton and software toy, Conway’s Game of Life.

bvecs: Collections of Nodes

Before we dive into the example, let’s build a very basic circuit, a 4-bit binary counter, using CHDL bit vectors:

bvec<4> ctr;
ctr = Reg(ctr + Lit(1));

We can wrap this in a function, yielding:

bvec<4> Ctr() {
  bvec<4> c;
  c = Reg(c + Lit<4>(1));
  return c;
}

This demonstrates several features of the bvec type. Just like Lit(0) and Lit(1) provide literal 0 and 1 values for nodes, Lit<N>(x) provides literal 2’s-complement integer values for bvec<N>s. A register function is provided and operates in exactly the same way, creating a register for each node in the bvec. Assignment, just like with nodes, is retroactive. The initial value of c is overwritten when it is assigned with the output of a register, and it is a function of the output of that register (and not the original contents of c) that serves as the input to the same register. The addition operator (as well as subtraction, multiplication, and division) are overloaded.

Exercise: Create a program that instantiates the 4-bit counter shown here and simulates it for 16 cycles, dumping the simulation waveform to a .vcd file. View this result in GTKWave to verify the correct behavior.

Indexing bvecs

CHDL bvecs can be indexed in two ways, one safer than the other. Arbitrary integers can be used to index CHDL arrays, allowing their assignment from within loops. For example, the following loop initializes a 256×8 array with literal binary numbers from 0 through 255:

vec<256, bvec<8> > x;
for (unsigned i = 0; i < 256; ++i)
  x[i] = Lit<8>(i);

Because of the nature of integral types in C++ it is impossible to catch the following error at compile time:

bvec<2> v;
v[2] = Lit(0);

Instead, this leads to a run-time error that causes the program to abort. Such errors are easy to diagnose in a debugger like GDB.

The other way to index arrays in CHDL leads to errors that can be found at compile time and allows for ranges of values to be selected:

bvec<32> addr;
bvec<20> tag = addr[range<12,31>()];
bvec<8> idx = addr[range<4,11>()];
bvec<4> offset = addr[range<0,3>()];

In this example, a 32-bit address is divided into a 20-bit tag, 8-bit index, and a 4-bit offset for use by a cache with 16-byte blocks and 4kB sets. Because they are actually part of the type’s identity, the template arguments of range must be compile time constant. This means that the following mistake:

bvec offset = addr[range<1,3>()];

leads to a compile time error.

Multi-Dimensional Arrays

The bvec type itself is really just a convenient shorthand. An equivalent for bvec<N&gt is vec<N, node>. The actual basic template here is not bvec, but an even more general vec template that acts as an extension to the usual fixed-length C array. There is no requirement that vec types contain only CHDL types, but they are intended to be used as such. In order to create multi-dimensional arrays, vecs can contain other vecs:

vec<8, bvec > matrix;

This can then be addressed by a multiplexer:

bvec sel;
bvec byte = Mux(sel, matrix);

Conway’s Game of Life

Before we implement Conway’s Game of Life, we should probably spend some time talking about what it is. Conway’s Game of Life is best described as a cellular automaton. It is not a game by any lay usage of the word. It is instead a system; a way for state to evolve over time based on certain rules. It just happens that despite the simplicity of the rules, some very fascinating and complex patterns emerge in Life when it is initialized with random input. When it is initialized with carefully-designed input, wonderfully complex creations have been constructed, as shown here, here, and here.

The game board is a 2D grid of cells with one of two states: set or cleared. All of the cells advance to their next state at the same time and the next state of any cell depends only on the current state of the cell itself and its eight neighbors (four in the cardinal directions and four to the corners). The rules are simple: any cell with exactly three set neighbors becomes set. Any cell with less than two neighbors set is cleared. Any cell with more than three neighbors set is cleared.

Implementing Conway’s Game of Life

A Quick Review of C++ Template Functions

Most of the features of CHDL are designed to interoperate with C++ template metaprogramming. Because of the reliance on C++ features, most errors in the type semantics of CHDL designs can be caught at compile time, and the resulting object code is optimized for relatively quick design elaboration. Since our design for a universal population count module relies on C++ template functions, we will now briefly review the concepts used. A detailed introduction C++ templates can be found in the C++ FAQ here.

C++ templates give us a way to define, in header files, functions and structs that have certain parameters. These parameters can be types, integers, functions, and other templates. The vec<int N, typename T> template we’ve already seen in this tutorial is a templated type that takes an integer and a type argument and uses these to describe a vector of length N containing objects of type T. In addition to templated types like vec, there are also template functions. An example of a template function we’ve already seen is the bvec version of Reg, as used above. Its signature is:

template<unsigned N> bvec<N> Reg(const bvec<N> &d);

Another example if Lit<N>, whose signature is:

template<unsigned N> bvec<N> Lit(unsigned long val);

Because it is easy to infer the length of the register from its parameter, it is not necessary to specify a length for Reg(). For Lit(), the length of the return type cannot be inferred from the arguments, so a template parameter is needed. This is why, in the counter example, the expression for the value of the counter is Reg(c + Lit<4>(1)).

A useful feature of templates is template specialization. If, say, I had a really well-optimzied implementation of a function like Reg that only worked for bvecs of length 8, I could declare it as:

template<> bvec Reg(const bvec &d);

This could live alongside my default implementation, only being used when N is 8.

constexpr Functions and CLOG2

Since C++11, C++ has offered a feature that allows certain simple functions to be evaluated at compile time with the use of the constexpr keyword. The results are constant as long as the inpts remain constant. This ensures type safety and succinct error messages.

In CHDL, constexpr functions are provided as utility functions for common arithmetic performed when evaluating the dimensions of hardware designs: especially the base-2 logarithm, with both the integer floor and integer ceiling operator provided.

The CHDL CLOG2() constexpr function provides a convenient way to evaluate the number of bits needed to uniquely identify N elements. The signature of the Mux function seen above, for example, is:

template
  T Mux(const bvec<CLOG2(N)> &sel, const vec<N, T> &in);

In previous versions of CHDL, CLOG2 was implemented as a preprocessor macro. In addition to providing no type checking, this lengthy bit of combinational logic was expanded in error messages by G++.

Population Count

An operation that comes up occasionally in computing; frequently enough, in fact that it should probably be in the CHDL standard library but currently is not, is the following: How many bits in a given word are set? This is known as population count, and is popular enough in real software that it is often featured as a processor instruction. (It is POPCNT on x86 and VCNT on ARM.)

The population count operation needed to implement Conway’s Game of Life could be implemented in a very rigid, fixed way, supporting exactly the 8 requisite neighboring bits, no more, no less. In the following code, Zext provides a zero extension operation. It pads the upper bits of a word with zeroes until it is the length of Zext‘s template argument:

bvec<4> PopCount(bvec<8> in) {
  vec<4, bvec<2> > count2;
  for (unsigned i = 0; i < 4; ++i)
    count2[i] = Zext<2>(in[2*i]) + Zext<2>(in[2*i + 1]);
    
  vec<2, bvec<3> > count4;
  for (unsigned i = 0; i < 2; ++i)
    count4[i] = Zext<3>(count2[2*i]) + Zext<3>(count2[2*i + 1]);

  return Zext<4>(count4[0]) + Zext<4>(count4[1]);
}

This code is ugly: It’s verbose, and it solves a problem so specific it will almost never happen. This is code that yearns for a recursive solution. The same basic pattern repeats itself three times, in a borderline violation of the DRY principle. If we tried to simply implement PopCount as a recursive function, we would run into a problem: the size of the input and output bvecs would remain the same, leading to quite a bit of wasted space. It is still possible to do:

bvec<4> PopCount(bvec<8> in, unsigned level = 0) {
  if (level < 3) {
    bvec<4> a, b;
    Cat(a, b) = in;
    return PopCount(Zext<8>(a), level+1) + PopCount(Zext<8>(b), level+1);
  } else {
    return in;
  }
}

This is perhaps a little better, and not as bad as it seems; all of those additional zeroes will ultimately be reclaimed by the optimizer. Still, unnecessary zero extensions abound, and the level parameter is shoehorned in to provide a base case. If we didn’t care about type safety and wanted to use the C++ STL vector instead of CHDL vec, we could write (assuming we had proper operator overloads for arithmetic available):

vector<node> PopCount(vector<node> in) {
  if (in.size() == 1) return in;
  else {
    // Split input into two vectors
    vector<node> a(in.size()/2), b(in.size() - a.size());
    for (unsigned i = 0; i < in.size(); ++i)
      if (i < a.size()) a[i] = in[i];
      else b[i - a.size()] = in[i];

    // Get population count of each vector.
    return PopCount(a) + PopCount(b);
  }
}

This is also ungainly in its own way; C++ vectors were not designed to be easily divisible into subvectors so we use loops to manually copy the input bit-by-bit.

Exercise: Why might we avoid using the insert() operator on a C++ vector of CHDL nodes? Hint: insert() will use the assignment operator to move the contents of the vector over by one position to make room for the new element.

None of the preceding examples represent the preferred way of implementing population count in CHDL. They represent designs with their own trade-offs that may be used as called for by this situation or that, but which are wholly unnecessary for an operation as basic as population count.

Recursively-defined functions in CHDL are typically implemented using a template with the base case represented by a specialization:

template<unsigned N> bvec<CLOG2(N+1)> PopCount(bvec<N> x) {
  return Zext<CLOG2(N+1)>(PopCount(x[range<0,N/2-1>()])) +
         Zext<CLOG2(N+1)>(PopCount(x[range<N/2,N-1>()]));
}

template<> bvec<1> PopCount<1>(bvec<1> x) { return x; }

This is the CHDL way. This implementation uses the same algorithm as the previous three examples and ultimately reduces to the same hardware, but it is general, succinct, and type safe.

Next State

For computing the next state of a cell, we will use a feature of CHDL we have not discussed before, the conditional assignment or Cassign() function. Cassign() provides an unusual syntax for solving the problem of computing future state, as shown in the following example:

template<unsigned long X> node Pulse() {
  const unsigned N(CLOG2(X));
  bvec<N> next_ctr, ctr(Reg(next_ctr));
  node p(ctr == Lit<N>(X-1));

  Cassign(next_ctr).
    IF(p, Lit<N>(0)).
    ELSE(ctr + Lit<N>(1));

  return p;
}

This function emits a 1-cycle-long pulse once every X cycles. The Cassign() function is used to determine the next value of the counter, with the output p acting like a reset signal. Assuming that a population count, count, for all neighboring cells is available, finding the next state next_alive for a given cell with current value alive is trivial:

Cassign(next_alive).
  IF(count < Lit<4>(2), Lit(0)).
  IF(count > Lit<4>(3), Lit(0)).
  IF(count == Lit<4>(3), Lit(1)).
  ELSE(alive);

Using this in combination with our previously-defined PopCount() function, we can create our function for a single cell:

node LifeCell(bvec<8> neighbors, bool init = false) {
  bvec<4> count(PopCount(neighbors));
  node next_alive, alive(Reg(next_alive, init));

  Cassign(next_alive).
    IF(count < Lit<4>(2), Lit(0)).
    IF(count > Lit<4>(3), Lit(0)).
    IF(count == Lit<4>(3), Lit(1)).
    ELSE(alive);

  return alive;
}

The init argument to this function represents the initial state of the cell. The Reg function has a second argument with a default value of 0 representing the initial value of the registers. This is true for both Reg(bvec<N>) and, as shown here, Reg(node).

Getting to Know Your Neighbors

Conway’s Game of Life can be thought of as a simple stencil operation. The same function is applied to every element of an array and used to determine that element’s next value. The only inputs used to determine each element’s value come from the array and have the same shape at each point, translated by that point’s position. For Conway’s game of life, the shape of the input stencil is that of a node and its neighbors. In the following image, the red element’s next value depends on its own value and those of all of the blue elements. The highlighted region can be shifted throughout the region and the operation can be repeated:

1_stencil

This is a very straightforward computation but with one major caveat: what do we do at the edges and corners? The sensible options are to assume that every cell beyond the edge has the value 0 or to assume that the edges wrap around, giving the board a toroidal topology. To accomplish this, we create a function called Get which performs the indexing operation:

template <bool T, unsigned X, unsigned Y>
  node Get(vec<Y, bvec<X> > &g, int i, int j)
{
  if (T) {
    while (i < 0) i += X;
    while (j < 0) j += Y;
    while (i >= X) i -= X;
    while (j >= Y) j -= Y;
  }

  if (i < 0 || i >= X || j < 0 || j >= Y) return Lit(0);
  else return g[i][j];
}

By making whether the space is toroidal or surrounded by zeroes a template parameter T, we make it possible to refer to one version or the other of Get through a function pointer in our Neighbors function. We define our function pointer G to avoid having to repeatedly call Get<T> for each of the eight neighboring points.

template <bool T, unsigned X, unsigned Y>
  bvec<8> Neighbors(vec<Y, bvec<X> > &g, unsigned i, unsigned j)
{
  auto G(Get<T, X, Y>);

  return bvec{
    G(g, i-1, j-1), G(g, i-1,   j), G(g, i-1, j+1), G(g,   i, j-1),
    G(g,   i, j+1), G(g, i+1, j-1), G(g, i+1,   j), G(g, i+1, j+1)
  };
}

This introduces another not-previously-mentioned-in-these-tutorials feature, the list constructor for CHDL vecs. It allows a comma-separated list of nodes to be used to initialize a bvec, or for that matter any comma-separated list of objects of type T to be used to initialize a vec<T, N>.

LifeGrid

Once we have our Neighbors function and our LifeCell function, we can implement the entire grid of cells. Once again, we punt on the decision of how the space should be layed out, by making it a bool template parameter T:

template <bool T, unsigned X, unsigned Y>
  vec<Y, bvec<X> > LifeGrid(bool *init)
{
  vec<Y, bvec<X> > g;

  for (unsigned i = 0; i < Y; ++i)
    for (unsigned j = 0; j < X; ++j)
      g[i][j] = LifeCell(Neighbors<T>(g, i, j), init[j * X + i]);

  return g;
}

In this function, init is a pointer to an X*Y-element array of bools representing the initial state of the grid. The 2D array of nodes g is declared and then assigned point-by-point with LifeCell functions. The return value of this function is this same array.

A main function using this is simple:

int main() {
  const unsigned X(16), Y(16);
  bool init[X*Y];

  srand(100);
  for (unsigned i = 0; i < X*Y; ++i) init[i] = (rand()%4 == 0);

  vec<Y, bvec<X> > g = LifeGrid<1, X, Y>(init);

  TAP(g);

  optimize();

  ofstream vcd("life.vcd");
  run(vcd, 1000);

  return 0;
}

This function initializes our initial state array to random values and calls LifeGrid(). In this case, the toroidal cell space is used, but with a 0 instead of a 1 as the first template argument to LifeGrid(), we could use a board with no active cells beyond the boundaries instead.

Improved Output Using Egress

It would be nice if we could, instead of just calling run() and looking at the output file, see the results of this simulation spatially, in the terminal. In general, it would be nice if simulated hardware had flexible I/O so that it could interface with our C++ code. CHDL provides several ways to do this, the most common of which are the Ingress() and Egress() interfaces for getting data into and out of the CHDL simulator. Let’s add #include <chdl/egress.h> to the top of our source file and replace our call to run() with the folliowing:

bool val[Y][X];
for (unsigned i = 0; i < Y; ++i)
  for (unsigned j = 0; j < X; ++j)
    Egress(val[i][j], g[i][j]);

for (unsigned cyc = 0; cyc < 10000; ++cyc) {
  advance();
  for (unsigned i = 0; i < Y; ++i) {
    for (unsigned j = 0; j < X; ++j) {
      cout << (val[i][j]?"[]":"  ");
    }
    cout << endl;
  }
}

The val array is now set, at the beginning of each cycle, with the values of each of the nodes on our board.
The node states can now be viewed directly. Try using the less command in a 17-line-tall terminal and holding spacebar to view the output of this program. The patterns seem far more interesting when a toroidal space is used, as in the following screenshot:

1_running

CHDL Internals 0: node, nodeimpl, and tickable

As a companion to the CHDL tutorial series, I would like to expose some of the details of the CHDL design in a series of in-depth technical articles. The intended audience of the tutorials consists of those wanting to use CHDL for their own hardware designs. The internals articles is for those wishing to extend, alter, or simply understand CHDL at a lower level.

There are three basic datatypes upon which all of the rest of CHDL is built:

  1. The nodeimpl, handling the simulation and output behavior of
  2. the node, the user-facing interface for manipulating nodes, and
  3. the tickable, representing objects which react to clock signals.

In this article, we will focus on these types and some of the more obvious types built around them, leaving the higher-level details for next week. Because the codebase we are discussing is a constantly-changing work in progress, I will not cite data structures and functions by line number, instead referring to them by name and file.

node and nodeimpl

In a CHDL design, you type:

node x(!y);

What actually happens? Let’s trace the function calls. First, we get to operator!(node y) in gateops.h. This is merely a wrapper for Inv(y). Inv(y) is declared in gates.h and defined in gates.cpp. All of the combinational logic functions in CHDL are implemented as combinations of (for now; there’s some desire to move to the industry standard And/Inverter Graph format) nand gates and inverters. The Inv() function, therefore, represents one of these fundamental gate types. Its implementation, then, will hopefully shed some light on the implementation of CHDL nodes. The meat of the implementation of Inv() in gates.cpp is:

node r((new invimpl(in))->id);

This seems odd. We’re creating this thing called an “invimpl” by allocating it on the heap, but despite the fact that we’re allocating it here, we’re not keeping the pointer here. Instead, we’re simply taking this “id” value and passing it to the node constructor. This reveals something about the true nature of node objects: they don’t so much represent nodes themselves as much as identify a nodeimpl. Why call them “node” and not “noderef” or similar? This is entirely for the benefit of users of the CHDL API. To the hardware implementer they certainly do represent nodes.

The nodes[] Vector

So, why aren’t we keeping a pointer to our node and what’s this “id” field about? The astute reader will have read the title of this section and understood. Pointers to all nodeimpl objects are stored in a global C++ vector<nodeimpl> called nodes, declared in nodeimpl.h. Indices into this vector have their own type, too, nodeid_t, an integer type typedef found in node.h. What’s the difference between nodeid_t and node, then? Consider the following:

node x = Foo(); // x has node id 100
node y = Reg(x); // y has node id 101
node z = Bar(); // z has node id 200
x = z; // now x and z both have node id 200

So, will y be the time-shifted-by-one-cycle output of Foo() or Bar()? Remember from the tutorial and the documentation that node assignments are transitive. All of the previous things given the value of x will now have its new value. This is useful because it means that CHDL designs can rewrite themselves. This is how optimization is implemented. It is also useful simply because hardware designs are not acyclic. Combinational logic may be, but as soon as registers are involved it will sometimes be necessary to use values before they are defined.

This behavior is implemented with a data structure called the node directory or node_dir, local to nodeimpl.cpp. The node directory keeps track of the node objects associated with each node ID. The node constructor adds a node to this directory and the node destructor removes it. The node assignment operator consults the node directory and updates all of the nodes to point to the new node ID. This means that node assignments necessarily, by design, completely remove all users of a given node ID. This is why the dead node elimination optimization is vital.

Finally, we have the issue of node sources. If I create an Inv(x), once this function has returned, a new invimpl object will have been created with x as the source. How do we store the source in such a way that, if x is overwritten later, the inverter’s source will be updated to match. We simply make the src field of nodeimpl a std::vector. This leads to the expected result without introducing a cyclic type dependency; remember, nodes don’t directly point to nodeimpls. They only contain indices.

tickable

We’ve dealt with the problem of how nodes are created, linked together, and evaluated, but this only allows us to create structures which evaluate Boolean functions, not systems which evolve over time. CHDL was not designed to evaluate low-level designs including implementations of sequential logic at the gate level, including stateful asynchronous logic. Cycles in the node graph lead to undefined behavior in both simulation and optimization, and can be detected with the cycdet() function in chdl/analysis.h.

The tickable type is completely unrelated to node and provides an orthogonal feature to node. It is only later, when we get to the register; the most useful instance of a tickable, that these two concepts are merged. In general, the node interface provides a mechanism for implementing arbitrary Boolean functions that may depend on other nodes or other parts of the program, and the tickable interface provides a mechanism for implementing time stepped simulations. The combination of these two behaviors leads to the CHDL simulation model.

The class tickable itself is a virtual class that defines four member functions:

tickable::pre_tick()
tickable::tick()
tickable::tock()
tickable::post_tock()

During simulation, each of these is executed once per clock cycle. During each clock cycle, first all of the pre_tick() functions are called, then all of the tick() functions, etc. This somewhat ungainly state of affairs arose because the original simulator demanded both a tick() and tock() function, so that register state would not be updated until all register inputs had been read. Subsequent additions to CHDL required yet more priority levels to handle the updating of values both prior to and following the execution of user code when the simulation was being advanced one step at a time.

tickable is not a pure virtual class. It can itself be instantiated, although there is no practical reason to do so. Any combination of the four tick functions can be overridden or omitted from a tickable subclass. Most override only one or two.

regimpl

The example from Tutorial 0:

node x;
x = Reg(!x);
TAP(x);

makes use of two types of node: the register node and the inverter node. The implementation of the inverter node type, invimpl, was explained in the previous section. So what about x itself, a register node. Its node type is regimpl, a type which inherits from both nodeimpl and tickable. Its tick() function reads the state of its input (a variable unique to regimpl, q, not src[0]; remember, the node graph made using src should not have any cycles). The tock() function writes the value read from q to the output, another bool variable that is returned by the eval() function.

What’s Next?

With only nandimpl, invimpl, and regimpl, a world of practical and useful digital logic circuits could be constructed and simulated. A ton of software would have to be written to make this practical. This software, of course, has already been written and represents the rest of CHDL and its companion library the CHDL STL. There has been some serious thought about moving the non-fundamental software out of CHDL and into a separate library, but for the sake of the user, this would only make sense once CHDL has stabilized and packages are provided for major distributions.

Running the CHDL CPU Example

CHDL example 6, a simulated MIPS-esque CPU with a 5-stage pipeline running a Sieve of Eratosthenes program finally works. This example can be loaded and simulated and will surely provide an important test case for future modifications to CHDL. For those of you interested in seeing CHDL run for yourself, I will provide a (very) brief tutorial on building and running the examples.

Step 1: Getting CHDL

CHDL is hosted on github, which (among other things) means that it must be accessed using a git client, like so:

  $ git clone git://github.com/cdkersey/chdl.git

Step 2: Building CHDL

Once you have downloaded CHDL, building it should just be a matter of running make in the root source directory. There are a lot of compilation units, so if you’re on a multicore machine don’t forget to use the -j option to speed up the build.

  $ cd chdl
  $ make -j 8

Step 3: Building and Running the Examples

Once you have built the core CHDL library, you can build the examples in the examples/ directory.

  $ cd examples
  $ make -j 8

Once this finishes, you will have a set of files called example[i].vcd, for i in 1 through 7. These are waveform files, containing the state of every node (or bit vector) tapped with the “TAP()” macro, and viewable in waveform viewers like the free gtkwave. If you do not have a waveform viewer installed, go ahead and obtain gtkwave.

  $ sudo aptitude install gtkwave

If you’re not on Debian, and/or do not have aptitude installed, you’re beyond help.

Step 4: Viewing the Waveforms

The waveform files can then be viewed with gtkwave:

  $ gtkwave example6.vcd

The TAP()s from the source code are listed along the left side of the window and can be dragged to the viewing area.

chdl_waveform

In the above figure, the fetch program counter is being viewed in “analog” mode, showing the path taken through the program memory over time. Note that after the program is finished executing, the processor enters a tight loop. The output of the program is the series of prime numbers in register 12 (a good test case because it’s a simple program whose results are hard to produce accidentally). Happy hacking!

Announcing CHDL

Strange things happen as a consequence of the lack of freedom in the hardware world. Consider the case of the man who made a CPU out of discrete transistors because he was uncomfortable with FPGA vendor lock-in. (http://www.6502.org/users/dieter/mt15/mt15.htm) I do not pretend to understand the EDA community enough to make any claims about the tools they do or do not have, open source or otherwise. I have experienced a lack, but it may just be the rarified air of the field when compared to software-oriented disciplines. Other tools exist. I have not found the ones I have encountered particularly well-suited to my personal needs, so I have built another.

CHDL (call it “the” CHDL at your peril) is two things: a C++-based hardware description language and a C++ hardware design library. The former fills a perceived need for radical generality and simplicity in gate-level design specification and the latter fills a need for a free software toolchain for realizing these designs.

CHDL (the language) can be used to specify abstract digital logic designs with uncommonly terse syntax. Designs specified in this way can then be subject to optimizations and simulated directly or written out to netlists. It is the processing of these netlists that is the domain of CHDL (the library) and the related utilities. The netlist files may be translated to other HDLs (like ones supported by FPGA vendor toolchains), translated to C and simulated more quickly, or technology mapped and physically implemented.

What does CHDL code look like? Here is a a nontrivial design from the standard library: a Kogge-Stone adder:

 template  bvec Adder(bvec a, bvec b, node cin = Lit(0)) {
    vector<bvec<N+1>> g(log2(N)+3), p(log2(N)+3), i(log2(N)+3);
    bvec s;

    g[0][0] = cin;
    p[0][0] = Lit(0);

    for (unsigned j = 0; j < N; ++j) p[0][j+1] = Xor(a[j], b[j]);
    for (unsigned j = 0; j < N; ++j) g[0][j+1] = a[j] && b[j];

    unsigned k;
    for (k = 0; (1l<<k) < 2*N; ++k) {
      for (unsigned j = 0; j < N+1; ++j) {
        if (j < (1l<<l)) {
          g[k+1][j] = g[k][j];
          p[k+1][j] = p[k][j];
        } else {
          i[k+1][j] = p[k][j] && g[k][j - (1l<<k)];
          g[k+1][j] = i[k+1][j] || g[k][j];
          p[k+1][j] = p[k][j] && p[k][j - (1l<<k)];
        }
      }
    }

    for (unsigned j = 0; j < N; ++j) s[j] = Xor(p[0][j+1], g[k][j]);
    return s;
  }

This template function, when called, instantiates an adder of the given size. It will instantiate one of these adders each time it is called. Note that all of the loops are run at design instantiation time. The function’s goal is to create some gates. Once it has returned, those gates occupy some global state, where they can then be simulated, optimized, or written out to a file.

I have placed the very-much-in-development CHDL on GitHub (https://github.com/cdkersey/chdl) along with the hope that I am not alone in my ambition, and that likeminded individuals will find value in these manic machinations.