Writing an Interpreter with BNFC and Haskell

Note: this article, and others following up on it, are primarily intended as notes for myself. Of course it’s great if you’re interested in compilers and interpreters, and find those articles useful. I’m glossing over a few steps, so if something isn’t quite clear, then please refer to Aarne Ranta’s book Implementing Programming Languages.

If you want to actively follow along, make sure the following components are installed on your system: bnfc, tex-live, haskell-platform.

Introduction

We live in exciting times. While writing a compiler used to be very time-consuming, modern tools have reached such a high level of sophistication that implementing a useful language is now within reach for anybody who is interested, because a lot of the possibly tedious work can be automated.

Compilation consists of the phases parsing, lexing, type checking, and ‘executing’. The last step entails either interpreting an abstract syntax tree, or generating code for a target language. This could be assembly, byte code, or another high-level language.

I’m tempted to say that the interesting work when writing a compiler consists of writing the type checker as well as the interpreter or code generator. The front-end part, writing a parser and a lexer, can be greatly automated. With BNFC you only have to write a grammar for your language, which BNFC will then use to generate the components of the compiler front-end.

If you want to write a ‘dynamic’ language, you can even omit writing a type checker. This major defect doesn’t seem to have harmed the adoption of Python. Types are pretty much a must-have, though, and I might briefly cover this issue in further articles. If you generate code that is targeting a virtual machine, such as JVM or LLVM, your won’t be restricted to just a single processor architecture. This will also make your task a lot easier.

In this article, I intend to give an overview of working with BNFC to write an interpreter. While the example is simple, it is nonetheless realistic, and can be extended to implement ‘real’ programming languages as well.

Implementing an Interpreter

The following presentation is based on the tutorial on the BNFC website. BNFC supports C, C++, C#, Haskell, Java, and OCaml. I’m focussing on Haskell, due to its expressive power, and because I’m more familiar with it than OCaml. I do not recommend any of the languages in the C family for this task.

The running example in this article is writing a simple calculator. While my example is based on the BNFC tutorial, I made one addition that shows how to handle a slightly more complex case. Therefore, the calculator will handle the following operations on integers: addition, subtraction, multiplication, division, and computing the factorial.

The Grammar

The grammar is straightforward to write, with the most important aspect being the order of preference of the operations:

EAdd.  Exp  ::= Exp  "+" Exp1 ;
ESub.  Exp  ::= Exp  "-" Exp1 ;
EMul.  Exp1 ::= Exp1 "*" Exp2 ;
EDiv.  Exp1 ::= Exp1 "/" Exp2 ;
EFact. Exp2 ::= Exp3 "!" ;

EInt.  Exp3 ::= Integer ;
  
coercions Exp 3 ;

Put it in a file named Calculator.cf.

The syntax is explained on the BNFC website, but it should look very familiar to you if you’ve been exposed to Backus-Naur Form in one of your introductory computer science classes. On a related note, this is no longer a given. In a university course on functional programming that was mostly taken by MSc students, when discussing an example based on a grammar, our professor asked the audience to raise their hand if they’ve heard of Backus-Naur Form, and less than 10% had. The follow up question how many were familiar with UML, which everybody was, led him to conclude that computer science education was in a sad state.

Anyway, this is the grammar for our calculator. Now it has to be processed by BNFC, with the following commands:

>bnfc -m -haskell Calculator.cf
>make

Testing the Front-end

At this point, it makes sense to test the front end on some sample files, and manually check the generated abstract syntax tree. It is straightforward to do this. All you have to do is run ‘TestCalculator’ on any input file. Let’s say your testfile is called ‘ex1.calc’, and contains this line:

2 + 3 + 4

Executing this command on the command line:

./TestCalculator ex1.calc

You should get this output:

ex1.calc

Parse Successful!

[Abstract Syntax]

EAdd (EAdd (EInt 2) (EInt 3)) (EInt 4)

[Linearized tree]

2 + 3 + 4

It’s a good idea to test your grammar on some more elaborate expressions, though, such as ‘(((3 * 6!) / 216) – 1)!’:

ex6.calc

Parse Successful!

[Abstract Syntax]

EFact (ESub (EDiv (EMul (EInt 3) (EFact (EInt 6))) (EInt 216)) (EInt 1))

[Linearized tree]

(3 * 6 ! / 216 - 1)!

I find it helpful to draw a few of those ASTs on paper, to convince myself that they really conform to what I intended the grammar to express.

The Interpreter

Now it’s time for the meat of the operation: writing the interpreter. BNFC helps you out here by providing a skeleton. Look for the file ‘SkelCalculator.hs’ in your current working directory. This is its content:

module SkelCalculator where

-- Haskell module generated by the BNF converter

import AbsCalculator
import ErrM
type Result = Err String

failure :: Show a => a -> Result
failure x = Bad $ "Undefined case: " ++ show x

transExp :: Exp -> Result
transExp x = case x of
  EAdd exp1 exp2  -> failure x
  ESub exp1 exp2  -> failure x
  EMul exp1 exp2  -> failure x
  EDiv exp1 exp2  -> failure x
  EFact exp  -> failure x
  EInt n  -> failure x

Now rename this file to Interpreter.hs, and add the required rules for processing the AST:

module Interpreter where

import AbsCalculator

interpret :: Exp -> Integer
interpret x = case x of
  EAdd  exp1 exp2  -> interpret exp1 + interpret exp2
  ESub  exp1 exp2  -> interpret exp1 - interpret exp2
  EMul  exp1 exp2  -> interpret exp1 * interpret exp2
  EDiv  exp1 exp2  -> interpret exp1 `div` interpret exp2
  EFact exp        -> if exp == EInt 0
                      then 1
                      else let eval = interpret exp
                           in  eval * interpret (EFact (EInt (eval - 1)))
  EInt  n          -> n

As a last step, create the file Calculator.hs:

module Main where

import LexCalculator
import ParCalculator
import AbsCalculator
import Interpreter

import ErrM

main = do
  interact calc
  putStrLn ""

calc s = 
  let Ok e = pExp (myLexer s) 
  in show (interpret e)

For this toy example, separating this program into two files might seem superfluous, but with a more complex interpreter, and some additional operations, like passing flags to the interpreter, it leads to a much cleaner design.

Of course, make sure that those filed type-check in GHCi, before attempting to compile them. Also, it makes sense to have some linearized trees ready to test the Interpreter with, while developing it.

Once you are convinced that everything is alright, proceed to the final step, compiling the interpreter:

ghc --make Calculator.hs

You can then execute the interpreter on the command line, and run it again on your test files.

>./Calculator < ex1.calc
9
>./Calculator < ex6.calc
362880

This concludes the work on an interpreter that can handle simple operations on integers.

9 thoughts on “Writing an Interpreter with BNFC and Haskell

  1. Gavin

    Dear,
    I have a problem.
    When I tried to use: ./TestCalculator ex1.calc, GHC writes: :26:1: parse error on input ‘./’

    I can’t understand what I am doing wrong. I have already installed GHC, Alex and Happy and I do everything using the terminal and WinGHCi with admin privileges.

    This is what I do:
    1) with terminal I reach the directory (using cd) and I write: bnfc -m -haskell Calculator.cf
    2) on the terminal I use:
    happy ParCalculator.y
    alex LexCalculator.x
    to create ParCalculator.hs and LexCalculator.hs
    3) At the end I write: ghc –make TestCalculator.hs
    (I can’t do only “make” because it doesn’t work)

    After these 4 steps I tried with ./TestCalculator ex1.calc but it doesn’t work.
    Please, can you help me??
    Thank you

    Reply
        1. Gregor Ulm Post author

          It could be a Windows issue, which would not surprise me at all. Just set up a virtual machine with Linux and try again.

          Reply
  2. old wise man :)

    $ echo ‘./TestCalculator ex1.calc’ | wc -c
    26
    you are typing 26 chars

    There is an important omission in your command, the redirection symbol ‘<'
    ./TestCalculator < ex1.calc
    writing in this way in any unix shell, you run the program TestCalculator in the actual directory. That program reads stdin (standard input) from ex1.calc file.

    if you are in windows on a cmd window, just type
    TestCalculator < ex1.calc
    which is the same to type:
    .\TestCalculator.exe < ex1.calc
    or even:
    .\TESTCALCULATOR.EXE < EX1.CALC
    because MSDOS or Windows CMD is not case sensitive.

    the other possible source of error may be the change in the name.
    You used TestCalculator.hs instead of Calculator.hs
    You could have done that change of name in an inconsistent way.
    Just see what is in your directory either with
    ls -lh in any unix or dir /s in MSDos or Windows CMD

    There are lots of tutorials on how to work with a shell.
    You can also read The Unix Programming Environment by Kernighan and Pike? , it is slightly outdated but you can learn a lot with it, using the newer versions of the programs. For example you can use bison instead of yacc.
    The important thing, is that you can connect small programs to compose them to do more things.

    Reply
  3. old wise man :)

    Gregor,
    Thanks for your guide to bnfc.
    Please reject my previous comment for , because it is wrong.
    I tried to use bnfc with hugs, but only ghc code is generated.
    I tried to generate a C target, and succeed!
    Maybe the problem that Gavin reported, is due because he is writing
    $ ./TestCalculator ex1.calc
    instestead of
    TestCalculator ex1.calc
    his error report is:
    :26:1: parse error on input ‘./’
    the 26 seems related with the size of what was typed:
    $ echo ‘./TestCalculator ex1.calc’ | wc -c
    26
    After all this time, he should had solved the problem.
    I could not reproduce the output in windows because I don’t have it.
    $ ./TestCalculator TestCalculator TestCalculator ex1.calc

    Reply
  4. old wise man :)

    Gregor,
    Thanks for your guide to bnfc.
    Please reject my previous comment for , because it is wrong.
    I tried to use bnfc with hugs, but only ghc code is generated.
    I tried to generate a C target, and succeed!
    Maybe the problem that Gavin reported, is due because he is writing
    $ ./TestCalculator ex1.calc
    instestead of
    TestCalculator ex1.calc
    his error report is:
    :26:1: parse error on input ‘./’
    the 26 seems related with the size of what was typed:
    $ echo ‘./TestCalculator ex1.calc’ | wc -c
    26
    After all this time, he should had solved the problem.
    I could not reproduce the output in windows because I don’t have it.
    $ ./TestCalculator < ex1.calc
    also works in Linux, it could be tested in the cmd shell of Windows/MSDOS
    maybe he should just type:
    TestCalculator < ex1.calc

    some test was lost, because this site seems to use html marking, this should display fine.

    Reply
  5. Peter

    Cant figure out where does pExp come from. have trouble myself getting something useful since lexer returns type a

    Reply
  6. MJ

    Thank you for informative article!

    Same problem occurred while testing on Windows. Always got the “lexer error”. Spent couple of days trying to solve it. Finally, just tried on Linux VM. Parsing was successful from the first attempt.

    Reply

Leave a Reply to Gregor Ulm Cancel reply

Your email address will not be published. Required fields are marked *

Spammer prevention; the answer is an integer: * Time limit is exhausted. Please reload CAPTCHA.

This site uses Akismet to reduce spam. Learn how your comment data is processed.