Monthly Archives: June 2014

Writing an Interpreter with BNFC and Haskell

Note: this article, and others following up on it, are primarily intended as notes for myself. Of course it’s great if you’re interested in compilers and interpreters, and find those articles useful. I’m glossing over a few steps, so if something isn’t quite clear, then please refer to Aarne Ranta’s book Implementing Programming Languages.

If you want to actively follow along, make sure the following components are installed on your system: bnfc, tex-live, haskell-platform.

Introduction

We live in exciting times. While writing a compiler used to be very time-consuming, modern tools have reached such a high level of sophistication that implementing a useful language is now within reach for anybody who is interested, because a lot of the possibly tedious work can be automated.

Compilation consists of the phases lexing, parsing, type checking, and ‘executing’. The last step entails either interpreting an abstract syntax tree, or generating code for a target language. This could be assembly, byte code, or another high-level language.

I’m tempted to say that the interesting work when writing a compiler consists of writing the type checker as well as the interpreter or code generator. The front-end part, writing a lexer and a parser, can largely be automated. With BNFC you only have to write a grammar for your language, which BNFC will then use to generate the components of the compiler front-end.

If you want to write a ‘dynamic’ language, you can even omit writing a type checker. This major defect doesn’t seem to have harmed the adoption of Python. Types are pretty much a must-have, though, and I might briefly cover this issue in further articles. If you generate code that is targeting a virtual machine, such as the JVM or LLVM, you won’t be restricted to just a single processor architecture. This will also make your task a lot easier.

In this article, I intend to give an overview of working with BNFC to write an interpreter. While the example is simple, it is nonetheless realistic, and can be extended to implement ‘real’ programming languages as well.

Implementing an Interpreter

The following presentation is based on the tutorial on the BNFC website. BNFC supports C, C++, C#, Haskell, Java, and OCaml. I’m focussing on Haskell, due to its expressive power, and because I’m more familiar with it than OCaml. I do not recommend any of the languages in the C family for this task.

The running example in this article is writing a simple calculator. While my example is based on the BNFC tutorial, I made one addition that shows how to handle a slightly more complex case. Therefore, the calculator will handle the following operations on integers: addition, subtraction, multiplication, division, and computing the factorial.

The Grammar

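The grammar is straightforward to write, with the most important aspect being the order of precedence of the operations. It is encoded in the numbered categories Exp, Exp1, Exp2, and Exp3, where a higher number binds more tightly: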

EAdd.  Exp  ::= Exp  "+" Exp1 ;
ESub.  Exp  ::= Exp  "-" Exp1 ;
EMul.  Exp1 ::= Exp1 "*" Exp2 ;
EDiv.  Exp1 ::= Exp1 "/" Exp2 ;
EFact. Exp2 ::= Exp3 "!" ;

EInt.  Exp3 ::= Integer ;
  
coercions Exp 3 ;

Put it in a file named Calculator.cf.
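For orientation, the abstract syntax that BNFC derives from these rules should look roughly like the data type below. This is a hand-written approximation and not a copy of the generated AbsCalculator.hs, so details such as the deriving clause may differ between BNFC versions. The value `sample` is a hypothetical example added for illustration.

```haskell
-- Hand-written approximation of the data type BNFC generates in
-- AbsCalculator.hs. Each rule label becomes a constructor, and the
-- precedence levels Exp1 to Exp3 all collapse into the single type Exp.
data Exp
  = EAdd  Exp Exp
  | ESub  Exp Exp
  | EMul  Exp Exp
  | EDiv  Exp Exp
  | EFact Exp
  | EInt  Integer
  deriving (Eq, Ord, Show)

-- Hypothetical example value: the tree for "2 + 3 * 4".
-- Multiplication binds more tightly, so it ends up below the addition.
sample :: Exp
sample = EAdd (EInt 2) (EMul (EInt 3) (EInt 4))
```

The precedence levels only matter while parsing. Once the tree is built, the nesting of constructors carries all the information about grouping.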

The syntax is explained on the BNFC website, but it should look very familiar to you if you’ve been exposed to Backus-Naur Form in one of your introductory computer science classes. On a related note, this is no longer a given. In a university course on functional programming that was mostly taken by MSc students, our professor, when discussing an example based on a grammar, asked the audience to raise their hands if they had heard of Backus-Naur Form, and fewer than 10% had. He followed up by asking how many were familiar with UML, which everybody was, and concluded that computer science education was in a sad state.

Anyway, this is the grammar for our calculator. Now it has to be processed by BNFC, with the following commands:

>bnfc -m -haskell Calculator.cf
>make

Testing the Front-end

At this point, it makes sense to test the front end on some sample files, and manually check the generated abstract syntax tree. It is straightforward to do this. All you have to do is run ‘TestCalculator’ on any input file. Let’s say your test file is called ‘ex1.calc’, and contains this line:

2 + 3 + 4

Execute this command on the command line:

./TestCalculator ex1.calc

You should get this output:

ex1.calc

Parse Successful!

[Abstract Syntax]

EAdd (EAdd (EInt 2) (EInt 3)) (EInt 4)

[Linearized tree]

2 + 3 + 4

It’s a good idea to test your grammar on some more elaborate expressions, though, such as ‘(((3 * 6!) / 216) - 1)!’:

ex6.calc

Parse Successful!

[Abstract Syntax]

EFact (ESub (EDiv (EMul (EInt 3) (EFact (EInt 6))) (EInt 216)) (EInt 1))

[Linearized tree]

(3 * 6 ! / 216 - 1)!

I find it helpful to draw a few of those ASTs on paper, to convince myself that they really conform to what I intended the grammar to express.
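If drawing by hand gets tedious, a few lines of Haskell can render the tree as ASCII art. The following is only a sketch: it assumes the `containers` package (which ships with GHC) for `Data.Tree`, and it declares its own `Exp` type, which is assumed to mirror the one in the generated AbsCalculator module. The names `toTree` and `render` are made up for this example.

```haskell
import Data.Tree (Tree (..), drawTree)

-- Assumed to mirror the Exp type in the generated AbsCalculator module.
data Exp
  = EAdd Exp Exp | ESub Exp Exp | EMul Exp Exp | EDiv Exp Exp
  | EFact Exp | EInt Integer

-- Convert an expression into a generic rose tree of string labels.
toTree :: Exp -> Tree String
toTree e = case e of
  EAdd a b -> Node "+" [toTree a, toTree b]
  ESub a b -> Node "-" [toTree a, toTree b]
  EMul a b -> Node "*" [toTree a, toTree b]
  EDiv a b -> Node "/" [toTree a, toTree b]
  EFact a  -> Node "!" [toTree a]
  EInt n   -> Node (show n) []

-- Render the AST as indented ASCII art, one node per line.
render :: Exp -> String
render = drawTree . toTree
```

In GHCi, `putStr (render (EAdd (EAdd (EInt 2) (EInt 3)) (EInt 4)))` prints the tree for the first test file, with the outer addition at the root.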

The Interpreter

Now it’s time for the meat of the operation: writing the interpreter. BNFC helps you out here by providing a skeleton. Look for the file ‘SkelCalculator.hs’ in your current working directory. This is its content:

module SkelCalculator where

-- Haskell module generated by the BNF converter

import AbsCalculator
import ErrM
type Result = Err String

failure :: Show a => a -> Result
failure x = Bad $ "Undefined case: " ++ show x

transExp :: Exp -> Result
transExp x = case x of
  EAdd exp1 exp2  -> failure x
  ESub exp1 exp2  -> failure x
  EMul exp1 exp2  -> failure x
  EDiv exp1 exp2  -> failure x
  EFact exp  -> failure x
  EInt n  -> failure x

Now rename this file to Interpreter.hs, and add the required rules for processing the AST:

module Interpreter where

import AbsCalculator

interpret :: Exp -> Integer
interpret x = case x of
  EAdd  exp1 exp2  -> interpret exp1 + interpret exp2
  ESub  exp1 exp2  -> interpret exp1 - interpret exp2
  EMul  exp1 exp2  -> interpret exp1 * interpret exp2
  EDiv  exp1 exp2  -> interpret exp1 `div` interpret exp2
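  EFact exp        -> factorial (interpret exp)
  EInt  n          -> n

-- Factorial of an evaluated operand, with 0! = 1. Evaluating the operand
-- first also covers cases like (1 - 1)!, which a literal check against
-- EInt 0 would miss and then recurse on forever.
factorial :: Integer -> Integer
factorial n
  | n < 0     = error "factorial of a negative number"
  | otherwise = product [1 .. n]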
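Division by zero makes the interpreter above abort with a runtime exception. One possible refinement, sketched here under the assumption that a plain `Either String Integer` is good enough for error reporting, is to return errors as values. The function name `interpretSafe` is hypothetical, and the `Exp` type is declared locally so that the block stands on its own; in the real project it would come from AbsCalculator.

```haskell
-- Assumed to mirror the Exp type in the generated AbsCalculator module.
data Exp
  = EAdd Exp Exp | ESub Exp Exp | EMul Exp Exp | EDiv Exp Exp
  | EFact Exp | EInt Integer

-- Hypothetical variant of 'interpret' that reports errors as Left values
-- instead of throwing exceptions.
interpretSafe :: Exp -> Either String Integer
interpretSafe x = case x of
  EAdd a b -> (+) <$> interpretSafe a <*> interpretSafe b
  ESub a b -> (-) <$> interpretSafe a <*> interpretSafe b
  EMul a b -> (*) <$> interpretSafe a <*> interpretSafe b
  EDiv a b -> do
    n <- interpretSafe a
    d <- interpretSafe b
    if d == 0 then Left "division by zero" else Right (n `div` d)
  EFact a  -> do
    n <- interpretSafe a
    if n < 0
      then Left "factorial of a negative number"
      else Right (product [1 .. n])
  EInt n   -> Right n
```

The first error encountered short-circuits the rest of the evaluation, which is the usual behaviour of the `Either` monad.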

As a last step, create the file Calculator.hs:

module Main where

import LexCalculator
import ParCalculator
import AbsCalculator
import Interpreter

import ErrM

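main :: IO ()
main = do
  interact calc
  putStrLn ""

-- Parse the input and evaluate it; report a parse error instead of crashing
-- on a failed pattern match.
calc :: String -> String
calc s = case pExp (myLexer s) of
  Ok e    -> show (interpret e)
  Bad err -> "Parse error: " ++ err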

For this toy example, separating this program into two files might seem superfluous, but with a more complex interpreter, and some additional operations, like passing flags to the interpreter, it leads to a much cleaner design.

Of course, make sure that those files type-check in GHCi before attempting to compile them. Also, it makes sense to have some linearized trees ready to test the interpreter with while developing it.

Once you are convinced that everything is alright, proceed to the final step, compiling the interpreter:

ghc --make Calculator.hs

You can then execute the interpreter on the command line, and run it again on your test files.

>./Calculator < ex1.calc
9
>./Calculator < ex6.calc
362880

This concludes the work on an interpreter that can handle simple operations on integers.

Implementing the game 2048 in less than 90 lines of Haskell

Last week Rice University’s MOOC Principles of Computing started on Coursera. Judging from the first week’s material, it seems to have all the great qualities of their previous course An Introduction to Interactive Programming in Python: The presentation is well done, there is plenty of support available, and the assignments are fun. The very first assignment was writing the game logic of 2048.

I don’t consider 2048 to be particularly interesting due to fundamental flaws in its design. First, a win cannot be forced from any starting position. Second, the most promising strategy makes it rather tedious, and further exhibits that it’s about having a lucky streak with the random number generator instead of skill. Personally, I prefer games that exhibit what is sometimes referred to as “theoretical perfection”, i.e. the attribute of a game that playing it perfectly makes victory a certainty. While 2048 is, as a consequence, rather unappealing to me, I can see why some people would enjoy sliding tiles around.

Writing the code for the game logic was rather straightforward. Since Principles of Computing uses Python as a teaching language, it was no surprise that the one mistake in my initial solution was due to mutation. Thinking that this would have been much more fun in Haskell, I then proceeded to write a complete implementation of 2048 in that language, including I/O handling. The entire source code is available on my GitHub account. As it turned out, the more complete Haskell solution required fewer lines of code than merely the game logic in Python.

As a side note, if you came to this page because you were looking for a solution to the Python assignment of Principles of Computing, you’re wasting your time. The Haskell implementation is fundamentally different from an implementation in Python, and uses programming language constructs that are not even available in that language. In other words, looking at the Haskell source code will not help you if you are struggling with this assignment.

In this article I only want to highlight the core part of the game logic, since it nicely demonstrates the power of functional programming. First, I defined a datatype for the direction the numbers in the grid can be moved in, and a type synonym for a list of lists of integers, to increase the readability of type signatures. This should be evident from the signature of the function ‘move’ further down, which takes as an input a grid of numbers and a move, and produces a new grid.

import Prelude hiding (Left, Right)  -- avoid a clash with Either's constructors

data Move = Up | Down | Left | Right
type Grid = [[Int]]

The game 2048 is played on a 4 x 4 board. The starting position in my implementation is fixed:

start :: Grid
start = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 2],
         [0, 0, 0, 2]]

The board can be moved in four directions, meaning that all numbers move in that particular direction, and if two equal numbers, when moved in the same direction, end up next to each other, they merge. For instance, in the starting position shown above, moving the board in the direction ‘Up’ turns the board into:

[[0, 0, 0, 4],
 [0, 0, 0, 0],
 [0, 0, 0, 0],
 [0, 0, 0, 0]]

If the grid in the starting position was moved ‘Right’, there would be no change. If the grid changes, then a new number spawns on any empty tile, and this number can be either 2 or 4.
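The spawning rule can be sketched as a pure function if the random choices are supplied by the caller. This is a minimal sketch and not taken from my actual source code: the names `emptyCells`, `place`, and `spawn` are made up for this example, and the caller is assumed to obtain `pick` and `val` from whatever random number source is at hand.

```haskell
type Grid = [[Int]]

-- Coordinates (row, column) of all empty tiles, in row-major order.
emptyCells :: Grid -> [(Int, Int)]
emptyCells grid =
  [ (r, c)
  | (r, row) <- zip [0 ..] grid
  , (c, v)   <- zip [0 ..] row
  , v == 0
  ]

-- Write a value at the given coordinate, leaving all other tiles untouched.
place :: (Int, Int) -> Int -> Grid -> Grid
place (r, c) val grid =
  [ [ if (i, j) == (r, c) then val else v | (j, v) <- zip [0 ..] row ]
  | (i, row) <- zip [0 ..] grid
  ]

-- Spawn a tile: 'pick' selects one of the empty cells (taken modulo their
-- number) and 'val' is the new tile, 2 or 4. A full grid is returned as is.
spawn :: Int -> Int -> Grid -> Grid
spawn pick val grid = case emptyCells grid of
  []    -> grid
  cells -> place (cells !! (pick `mod` length cells)) val grid
```

Keeping the randomness outside of `spawn` makes the function easy to test, since the same arguments always produce the same grid.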

Looking at this mechanic, the question is how it could be modelled effectively. Any row or column on the grid can be understood as a list. The relation between rows and lists is straightforward. The columns will have to be extracted, modified, and inserted again, though. Or maybe they don’t?

I wrote a function to merge a row or a column, represented as a list. First, all zeroes are removed. Then, the list is processed, merging adjacent elements if they contain identical numbers, and the result of that operation is padded with zeroes, if necessary.

merge :: [Int] -> [Int]
merge xs = merged ++ padding
    where padding          = replicate (length xs - length merged) 0
          merged           = combine $ filter (/= 0) xs
          combine (x:y:xs) | x == y    = x * 2 : combine xs
                           | otherwise = x     : combine (y:xs) 
          combine x        = x

The merge function can be directly applied when the board is moved to the left. The other directions require a little bit of thought, if the code is supposed to remain clean. Moving the grid to the right is done by taking each row, reversing it before handing it off to the function ‘merge’, and then reversing the result again:

                    
import Data.List (transpose)

move :: Grid -> Move -> Grid
move grid Left  = map merge grid
move grid Right = map (reverse . merge . reverse) grid
move grid Up    = transpose $ move (transpose grid) Left
move grid Down  = transpose $ move (transpose grid) Right

Moving the grid up or down would be painful if you wanted to extract a column, apply the merge function to it, and then create a new grid with that column inserted. Instead, though, a tiny bit of linear algebra knowledge leads to a much more elegant solution. If it’s not immediately clear how transposing leads to the desired outcome, then please have a look at the following illustration.

        input       transpose   move        transpose

Up:     0 0         0 2         2 0         2 2
        2 2         0 2         2 0         0 0


Down:   2 2         2 0         0 2         0 0 
        0 0         2 0         0 2         2 2
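The illustration can also be checked mechanically. The block below bundles the definitions from this article into one loadable file, under the assumption that the Prelude constructors `Left` and `Right` are hidden to avoid a name clash, and adds the two 2 x 2 grids from the illustration as test data. The names `upExample` and `downExample` are hypothetical.

```haskell
import Prelude hiding (Left, Right)
import Data.List (transpose)

data Move = Up | Down | Left | Right
type Grid = [[Int]]

-- Slide one row to the left, merging equal neighbours once.
merge :: [Int] -> [Int]
merge xs = merged ++ padding
  where
    padding = replicate (length xs - length merged) 0
    merged  = combine (filter (/= 0) xs)
    combine (x : y : rest)
      | x == y    = x * 2 : combine rest
      | otherwise = x : combine (y : rest)
    combine rest = rest

-- Vertical moves are horizontal moves on the transposed grid.
move :: Grid -> Move -> Grid
move grid Left  = map merge grid
move grid Right = map (reverse . merge . reverse) grid
move grid Up    = transpose (move (transpose grid) Left)
move grid Down  = transpose (move (transpose grid) Right)

-- The two input grids from the illustration, after the respective move.
upExample, downExample :: Grid
upExample   = move [[0, 0], [2, 2]] Up    -- [[2,2],[0,0]]
downExample = move [[2, 2], [0, 0]] Down  -- [[0,0],[2,2]]
```

Transposing twice gives back the original orientation, which is why the final `transpose` restores rows and columns after the horizontal move.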

My Haskell implementation uses the terminal as output. It’s not as impressive as the JavaScript frontend of Gabriele Cirulli’s version, but it’s serviceable, as the following two screenshots show:

[Screenshot: Starting position]

[Screenshot: Game Over]

Overall, I’m quite satisfied with this prototype. There are of course several possible improvements. A score tracker would be trivial to add, while a GUI would be a more time-consuming endeavor. I would find it interesting to have the program immediately react to keyboard input. Currently, every input via WASD requires hitting the enter key for confirmation. Gameplay would speed up a lot if merely pressing a key triggered the next step in the program execution. I didn’t find any quick solution when researching this problem. The Haskell library NCurses contains constructors for keyboard events, though. I might look into it in case I get an itch to program an “indie” game with ASCII graphics.
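One approach that might be worth trying before reaching for NCurses is to switch off input buffering with the functions from `System.IO` in the base library. The following is an untested sketch with a Unix terminal in mind; how it behaves in other consoles is an open question. The names `keyToMove` and `readMove` are hypothetical and not part of the published source code.

```haskell
import Prelude hiding (Left, Right)
import System.IO (BufferMode (NoBuffering), hSetBuffering, hSetEcho, stdin)

data Move = Up | Down | Left | Right
  deriving (Eq, Show)

-- Map a WASD key to a move; any other key yields Nothing.
keyToMove :: Char -> Maybe Move
keyToMove c = case c of
  'w' -> Just Up
  'a' -> Just Left
  's' -> Just Down
  'd' -> Just Right
  _   -> Nothing

-- Read keys one at a time, without waiting for Enter, until one of them
-- is a valid move.
readMove :: IO Move
readMove = do
  hSetBuffering stdin NoBuffering  -- deliver each key press immediately
  hSetEcho stdin False             -- do not print the typed character
  loop
  where
    loop = do
      c <- getChar
      maybe loop pure (keyToMove c)
```

Separating the pure key mapping from the I/O loop keeps the part that can be tested automatically free of terminal settings.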

If you found this article interesting, then feel free to have a look at the source code of my Haskell implementation of 2048.