Note: this article, and others following up on it, are primarily intended as notes for myself. Of course it’s great if you’re interested in compilers and interpreters, and find those articles useful. I’m glossing over a few steps, so if something isn’t quite clear, then please refer to Aarne Ranta’s book Implementing Programming Languages.
If you want to actively follow along, make sure the following components are installed on your system: bnfc, tex-live, haskell-platform.
Introduction
We live in exciting times. While writing a compiler used to be very time-consuming, modern tools have reached such a high level of sophistication that implementing a useful language is now within reach for anybody who is interested, because a lot of the possibly tedious work can be automated.
Compilation consists of the phases parsing, lexing, type checking, and ‘executing’. The last step entails either interpreting an abstract syntax tree, or generating code for a target language. This could be assembly, byte code, or another high-level language.
I’m tempted to say that the interesting work when writing a compiler consists of writing the type checker as well as the interpreter or code generator. The front-end part, writing a parser and a lexer, can be greatly automated. With BNFC you only have to write a grammar for your language, which BNFC will then use to generate the components of the compiler front-end.
If you want to write a ‘dynamic’ language, you can even omit writing a type checker. This major defect doesn’t seem to have harmed the adoption of Python. Types are pretty much a must-have, though, and I might briefly cover this issue in further articles. If you generate code that is targeting a virtual machine, such as JVM or LLVM, your won’t be restricted to just a single processor architecture. This will also make your task a lot easier.
In this article, I intend to give an overview of working with BNFC to write an interpreter. While the example is simple, it is nonetheless realistic, and can be extended to implement ‘real’ programming languages as well.
Implementing an Interpreter
The following presentation is based on the tutorial on the BNFC website. BNFC supports C, C++, C#, Haskell, Java, and OCaml. I’m focussing on Haskell, due to its expressive power, and because I’m more familiar with it than OCaml. I do not recommend any of the languages in the C family for this task.
The running example in this article is writing a simple calculator. While my example is based on the BNFC tutorial, I made one addition that shows how to handle a slightly more complex case. Therefore, the calculator will handle the following operations on integers: addition, subtraction, multiplication, division, and computing the factorial.
The Grammar
The grammar is straightforward to write, with the most important aspect being the order of preference of the operations:
EAdd. Exp ::= Exp "+" Exp1 ; ESub. Exp ::= Exp "-" Exp1 ; EMul. Exp1 ::= Exp1 "*" Exp2 ; EDiv. Exp1 ::= Exp1 "/" Exp2 ; EFact. Exp2 ::= Exp3 "!" ; EInt. Exp3 ::= Integer ; coercions Exp 3 ;
Put it in a file named Calculator.cf.
The syntax is explained on the BNFC website, but it should look very familiar to you if you’ve been exposed to Backus-Naur Form in one of your introductory computer science classes. On a related note, this is no longer a given. In a university course on functional programming that was mostly taken by MSc students, when discussing an example based on a grammar, our professor asked the audience to raise their hand if they’ve heard of Backus-Naur Form, and less than 10% had. The follow up question how many were familiar with UML, which everybody was, led him to conclude that computer science education was in a sad state.
Anyway, this is the grammar for our calculator. Now it has to be processed by BNFC, with the following commands:
>bnfc -m -haskell Calculator.cf >make
Testing the Front-end
At this point, it makes sense to test the front end on some sample files, and manually check the generated abstract syntax tree. It is straightforward to do this. All you have to do is run ‘TestCalculator’ on any input file. Let’s say your testfile is called ‘ex1.calc’, and contains this line:
2 + 3 + 4
Executing this command on the command line:
./TestCalculator ex1.calc
You should get this output:
ex1.calc Parse Successful! [Abstract Syntax] EAdd (EAdd (EInt 2) (EInt 3)) (EInt 4) [Linearized tree] 2 + 3 + 4
It’s a good idea to test your grammar on some more elaborate expressions, though, such as ‘(((3 * 6!) / 216) – 1)!’:
ex6.calc Parse Successful! [Abstract Syntax] EFact (ESub (EDiv (EMul (EInt 3) (EFact (EInt 6))) (EInt 216)) (EInt 1)) [Linearized tree] (3 * 6 ! / 216 - 1)!
I find it helpful to draw a few of those ASTs on paper, to convince myself that they really conform to what I intended the grammar to express.
The Interpreter
Now it’s time for the meat of the operation: writing the interpreter. BNFC helps you out here by providing a skeleton. Look for the file ‘SkelCalculator.hs’ in your current working directory. This is its content:
module SkelCalculator where -- Haskell module generated by the BNF converter import AbsCalculator import ErrM type Result = Err String failure :: Show a => a -> Result failure x = Bad $ "Undefined case: " ++ show x transExp :: Exp -> Result transExp x = case x of EAdd exp1 exp2 -> failure x ESub exp1 exp2 -> failure x EMul exp1 exp2 -> failure x EDiv exp1 exp2 -> failure x EFact exp -> failure x EInt n -> failure x
Now rename this file to Interpreter.hs, and add the required rules for processing the AST:
module Interpreter where import AbsCalculator interpret :: Exp -> Integer interpret x = case x of EAdd exp1 exp2 -> interpret exp1 + interpret exp2 ESub exp1 exp2 -> interpret exp1 - interpret exp2 EMul exp1 exp2 -> interpret exp1 * interpret exp2 EDiv exp1 exp2 -> interpret exp1 `div` interpret exp2 EFact exp -> if exp == EInt 0 then 1 else let eval = interpret exp in eval * interpret (EFact (EInt (eval - 1))) EInt n -> n
As a last step, create the file Calculator.hs:
module Main where import LexCalculator import ParCalculator import AbsCalculator import Interpreter import ErrM main = do interact calc putStrLn "" calc s = let Ok e = pExp (myLexer s) in show (interpret e)
For this toy example, separating this program into two files might seem superfluous, but with a more complex interpreter, and some additional operations, like passing flags to the interpreter, it leads to a much cleaner design.
Of course, make sure that those filed type-check in GHCi, before attempting to compile them. Also, it makes sense to have some linearized trees ready to test the Interpreter with, while developing it.
Once you are convinced that everything is alright, proceed to the final step, compiling the interpreter:
ghc --make Calculator.hs
You can then execute the interpreter on the command line, and run it again on your test files.
>./Calculator < ex1.calc 9 >./Calculator < ex6.calc 362880
This concludes the work on an interpreter that can handle simple operations on integers.