Programming FPGAs with Haskell

original posting: 29 March 2009
most recent update: 5 April 2009


In this document I describe my compiler for programming FPGAs using Haskell.

Design Requirements

The compiler needed to describe math applications, initially FFTs. We wanted the results to be pipelined. The library of arithmetic operators had various delays that needed to go into a table, and multiplication had a default delay, plus optional additional delays that could be used by the FPGA vendor's compiler for further optimization. Initially the words were fixed-point with programmable width.

The output was verilog optimized for the FPGA vendor's compiler. The initial vendor was Xilinx. This compiler could handle simple behavioral code. I added some optimizations to remove vendor compiler warnings, and some to remove simulator (cver and iverilog) warnings.

Previous Compilers

There were two previous versions of this compiler. For historical reasons the first read a verilog variation, which was almost verilog so it was somewhat familiar (good), but wasn't legal so it couldn't simulate in a normal verilog simulator (bad). The second was a subset of scilab, which is similar to matlab, a popular DSP language. It was legal scilab, so you could test algorithms with the scilab interpreter.

These both worked for a 9-pt and a 16-pt FFT that we developed, consisting solely of equations using add, subtract, and negate. FFTs that combined one or both of these with transpose, input map, output map, and phase multiply (e.g., 81-pt and 1296-pt), used hand-developed verilog (reordering modules) or a combination (phase multiply consisted of a multiply module generated by the compiler, and hand-developed verilog for the state machine for sharing the single multiply).


I chose Haskell for all parts of the new compiler (input language and coding of the compiler). But I did not develop a Haskell parser. I decided that parsing full Haskell would take more effort than it was worth, and defining a subset of Haskell would lose too much of the benefit of Haskell. Instead, I extracted the essence of hardware description from verilog, vhdl, system verilog, etc, and turned it into an API.

The idea was that Haskell would supply the algorithmic power, while the API would supply the hardware description. I claim that a significant part of the difference between the various hardware description languages (HDLs) is in features that are there for algorithmic power. What a waste! And many hardware designers who find the algorithmic power of their HDL insufficient will develop home-brew preprocessors, not infrequently in Perl, to bolster the algorithmic power. More waste!

Initially, haskell was overkill for my algorithmic development, but it left an easy upgrade path for development of higher-level design of DSP modules (algorithmic definition instead of just listing equations), fancy state-machine definition, architectural exploration, and parameterizable and reusable module descriptions.

Also note that this compiler was just for in-house use. If I were developing something for sale, I'd be tempted to use matlab for dsp designers, and perhaps C for others. Any software language at all would work, simply by adding the API to it.

The Api

Here's what the API looks like in Haskell:

	type CD = Complex Double

	data Top =
		Comb [String] [Ngn] [String]
		| Seq StateMachine

	data NgnE =
		Con CD
		| Var String
		| Add NgnE NgnE
		| Sub NgnE NgnE
		| Mul NgnE NgnE
		| Cplx NgnE NgnE
		| Neg NgnE
		| Real NgnE
		| Imag NgnE
			deriving (Show)

	data NgnB =
		And NgnB NgnB
		| Eq NgnE NgnE
		| Not NgnB
		| Bully NgnE
			deriving (Show)

	data Ngn =
		Set String NgnE
		| IfThen (NgnB, Ngn)
		| IfThenElse (NgnB, Ngn, Ngn)
		| Bend [Ngn]
			deriving (Show)

	data Mem = Mem {
		addrIn :: String,
		writeEnable :: NgnB,
		dataIn :: String,
		addrOut :: String,
		dataOut :: String,
		readEnable :: NgnB,
		baseName :: String,
		-- lth 1 => length of ram, else contents of rom
		contents :: [CD]}

	data StateMachine = StateMachine {
		input :: String,
		output :: String,
		validIn :: String,
		validOut :: String,
		widths :: [(String, Int)],
		resets :: [(String, Int)],
		mems :: [Mem],
		stop :: NgnB,
		machine :: [Ngn]}

Top is the top-level of a module. It can be combinational or sequential. The combinational module has three arguments, a list of input names, a list of equations, and a list of output names. The sequential module has one argument, the state machine (defined later).

NgnE is an expression. It can be a constant, with a complex double argument, a variable, with a name argument, or various operators with one or two expressions for arguments.

NgnB is a boolean expression, with boolean operators taking boolean arguments, and relational operators taking expressions for arguments.

Ngn is a statement. It can be an assignment statement, conditional statements, or a verilog-style begin-end, bracketing multiple statements.

Now is a good time to discuss some design decisions regarding the API. The API was not designed with humans typing them in in mind. I figured there would need to be some of that at the beginning, but I didn't mind if it looked a little ugly at that point. The goal was for the API data structure to be generated algorithmically by higher-level code. That's what I had in mind all along, and the goal for which I was optimizing.

Another decision was to use the type system as much as I could. For example, why not use a C-style expression, where assignment statements are just expressions that return a value like any other, and boolean statements are just expressions that return 0 or 1, and so on? This makes some of the coding simpler, but it loses some of the power of Haskell's type system. As I defined it, the type system won't let you get sloppy with your definitions. Sometimes that means you can't take a shortcut, but usually it means it catches you making an error.

Then I immediately contradicted that philosophy with a single memory type, rather than separate ram and rom. The contents of the rom is, quite reasonably, the values of the list of complex doubles in contents. But if the length of contents is one, then its value is the length of the ram. (Whose contents can't be stored here because this structure is immutable. The simulator has to allocate a mutable (actually efficient immutable) structure for the contents of rams.)

Anyway, Mem has input address, data input, and write enable (all ignored in rom), and output address, data output, and read enable. baseName is used to identify the memory when it is allocated in the simulator. Each memory location is stored in a hash table, keyed by baseName and the address. Amusingly, I first coded the simulator with mutable data structures in a monad, but was unhappy with how much of the code ended up in monads, and recoded with efficient immutable hash tables and no monads. This didn't even noticeably slow down a 1296-point FFT. Interesting.

Finally, StateMachine describes a state machine module. It is designed for being in a pipeline. validIn determines when it should process inputs. validOut signals the same thing to the next stage. resets lists state variables that need to be initialized, and to what state. widths is ignored during simulation, but defines signal widths for verilog generation. stop is a boolean expression that tells when to stop the state machine. Once it gets a valid input, it can keep chewing on that until it's told to stop. mems refers to any number of memories, defined earlier. machine refers to any number statements, also defined earlier.

A complete circuit is any number of sequential or combinational modules (Tops) in a list. I should note here that the combinational modules aren't really combinational, because they are pipelined datapaths. But they act kinda like they're combinational, and you could take away the registers and they would work as combinational circuits, so what the heck. I called them combinational.

So the combinational modules don't have valid in and out. Or they just pass valid in to valid out, with the latency of the rest of the module. These modules run all the time, FFTing (or whatever) their inputs, whether valid or not. The sequential modules do pay attention to valid in, and generate valid out.

Comments to doug[at]DougAndJean[dot]com

Back to