Category Archives: Programming

An Example of the Beauty of Haskell

In the last six months I’ve been programming mostly in Haskell, which is a language I thoroughly enjoy working with. The other day, though, a visitor to my blog asked a question about one of my Java solutions to the CodingBat exercises. Here it is:

seriesUp:

public int[] seriesUp(int n) {
	int[] result = new int[n * (n + 1) / 2];
	int pos = 0;
	int i = 1;
	while (i <= n + 1) {
		for (int j = 1; j < i; j++) result[pos++] = j;
		i++;
	}
	return result;
}

When looking at this code, my first thought was that this would be so much nicer to write in Haskell, and that it would probably require just one line of code. We’ll be getting there in this article.

It’s probably not immediately clear what this Java function does. Indeed, even after reading the problem description on the CodingBat website, and knowing that the result is supposed to be an array with a pattern like {1, 1,2, 1,2,3, …, 1,2,3,..,n} for n >= 0, you might have to read the Java code line by line to check whether this really is what the function is doing. To further complicate matters, there is some “magic” thrown in with regard to the length of the resulting array: the sub-sequences have lengths 1 through n, and 1 + 2 + … + n = n * (n + 1) / 2. This could be figured out with some knowledge of series, commonly taught in Discrete Mathematics or Calculus II, but the formula was given in the original problem description. Without it, surely some Java programmers would have felt tempted to initialize either a very large array, or an array list.

In a functional programming language the code would be infinitely more readable. Even as a relative Haskell novice you could write more maintainable code. As you gradually gain more experience, you will write even cleaner code. I will illustrate this idea by repeatedly transforming a piece of Haskell code until the result is an aesthetically pleasing and easily understandable one-liner.

Let’s say you’re just starting out, but you’ve grasped the idea of recursion already. Then, you might come up with something like the following.

seriesUp :: Int -> [Int]
seriesUp n = seriesUp' 0 n

seriesUp' :: Int -> Int -> [Int]
seriesUp' i n | i > n     = []
              | otherwise = [1..i] ++ seriesUp' (i+1) n 

(Note: the syntax highlighting plugin of WordPress does not support Haskell. In a proper editor, this would look quite a bit nicer.)

Some remarks about the syntax are in order. The “pipe” (|) represents so-called guards, which are conditions that are tested sequentially. The second guard, “otherwise”, is merely syntactic sugar. It is internally defined as the boolean value “True”. The “++” stands for list concatenation. Lastly, an expression like [1..n] is a shorthand for a list. For instance, [1..5] is equivalent to [1,2,3,4,5]. If you are wondering about the first line of the function definition: this is an optional type signature. Those are quite handy, and will allow the compiler to catch many errors for you.

The big difference between this first attempt in Haskell and the original Java solution is that, once you’ve gotten used to the syntax, it is immediately obvious what the code does. It is easier to understand, faster to write, and also much more fun to write.

The fun doesn’t end here, though, as the following simplifications show:

seriesUp :: Int -> [Int]
seriesUp = seriesUp' 0
    where seriesUp' i n | i <= n       = [1..i] ++ seriesUp' (i+1) n 
                        | otherwise    = []

This looks quite a bit nicer already! Three changes were made. First, I removed the argument “n” from the initial function call. The concept is called “eta reduction”, and it is one example of the removal of superfluous variables in the lambda calculus. It’s probably clear from looking at the code how it works.

Second, the helper function seriesUp’ was put as a local definition into a where-clause. It’s similar to a let-expression, but a bit more concise. Keeping seriesUp’ as a locally defined function makes sense since this function isn’t used anywhere else in the program. Lastly, the order of the guards was switched. Putting the base case of the recursion at the bottom leads to a performance improvement — not that it would matter in this toy example — since the guards are checked in order from top to bottom.

This isn’t all that can be done. Assume you make it a bit further in your functional programming textbook and discover higher-order functions. The relevant higher-order function in this example is map, which applies a function to every element of a list. For instance, type “map sqrt [1..5]” into the Haskell REPL, and you’ll end up with a list of the square roots of the integers 1 to 5.

Instead of a built-in function like sqrt, you can also pass any other function to map. This turns the previous piece of code into:

seriesUp :: Int -> [Int]
seriesUp n = map oneToN [1..n]
    where oneToN n = [1..n]

This is a significant improvement, but there is an error in it. If you run this code in the REPL, the compiler will notify you that the type signature doesn’t match the function body. To see what is wrong, just comment out the signature and run the example again. It turns out that the result isn’t a list but a list of lists:

*Main> seriesUp 3
[[1],[1,2],[1,2,3]]

This isn’t quite what we want. However, this error can be corrected easily by calling the built-in function concat, which flattens a list of lists.

seriesUp :: Int -> [Int]
seriesUp n = concat (map oneToN [1..n])
    where oneToN n = [1..n]

This leads to the expected result again:

*Main> seriesUp 3
[1,1,2,1,2,3]

At this point the Haskell code is already very concise, clean and highly readable. I’d say it is much superior to the initial version that uses explicit recursion.

The position of the function in map can also be taken by a so-called anonymous function, also referred to as a lambda function. On a side note, James Iry made the humorous remark that Java made lambda expressions popular by not having them. Lambdas are tremendously powerful, and once you’ve gotten used to them, you may find programming in a language that does not have them painful. This might remind you of Paul Graham’s essay Beating the Averages:

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he’s looking down. Languages less powerful than Blub are obviously less powerful, because they’re missing some feature he’s used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn’t realize he’s looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

When we switch to the point of view of a programmer using any of the languages higher up the power continuum, however, we find that he in turn looks down upon Blub. How can you get anything done in Blub? It doesn’t even have y.

By induction, the only programmers in a position to see all the differences in power between the various languages are those who understand the most powerful one. (This is probably what Eric Raymond meant about Lisp making you a better programmer.) You can’t trust the opinions of the others, because of the Blub paradox: they’re satisfied with whatever language they happen to use, because it dictates the way they think about programs.

Quite recently, though, lambda expressions have finally gained a foothold in the mainstream, with Visual Basic 9.0 (2007), C# 3.0 (2007), C++11 (2011) and Java 8 (2014) supporting them.

After this diversion, let’s see how lambdas can be used in our code. Here’s a first attempt:

seriesUp :: Int -> [Int]
seriesUp n = concat (map (\x -> [1..x]) [1..n])

This is not bad at all! But if this piece of code doesn’t look nice enough yet, you can trade the parentheses for some syntactic sugar.

seriesUp :: Int -> [Int]
seriesUp n = concat $ map (\x -> [1..x]) [1..n]

The dollar sign might look strange at first, but it is common in Haskell programs. The result is one less non-whitespace character in your code, which tends to be seen as more aesthetically pleasing. In many cases it also makes the code easier to read, since you perceive everything to the right of the dollar sign as one block. On the other hand, too many parentheses can easily look messy.

We’re not done yet, though. A common theme in functional languages is abstraction. Many common operations don’t have to be programmed explicitly; instead, there is often a standard function that takes care of them. This is true for concatenating the lists that result from our call to ‘map’ as well. That function is called ‘concatMap’.

seriesUp :: Int -> [Int]
seriesUp n = concatMap (\x -> [1..x]) [1..n]

If you’re a seasoned Haskell programmer, then this would probably be the first and last version you write, thus turning the somewhat tedious Java exercise into something rather trivial. Just look at it! Isn’t it beautiful?

This cute one-liner concludes our series of transformations of the original Haskell solution. The purpose of starting with explicit recursion and then successively moving on to ever-higher levels of abstraction was also to illustrate how the code you write will change as you gain more familiarity with functional programming. When starting out, recursion is difficult enough to understand for many people, but it is merely the beginning.

Let’s now have another look at the piece of Java code we started out with. Of course, it is missing the declaration of the main method and its class, which would add a few more lines of boilerplate to the code fragment.

public int[] seriesUp(int n) {
    int[] result = new int[n * (n + 1) / 2];
    int pos = 0;
    int i = 1;
    while (i <= n + 1) {
        for (int j = 1; j < i; j++) result[pos++] = j;
        i++;
    }
    return result;
}

Putting the Haskell and Java versions right next to each other, it is probably obvious why, given the choice, one might prefer to program in a language that allows for higher levels of abstraction. Indeed, I find it peculiar when people call functional programming languages “scary” or complain about their allegedly “weird” syntax. Curly-bracket programmers used to ridicule Lisp for its parentheses, which is an objection I can somewhat understand, even though Lisp code still looks a lot nicer than Java or C. Yet I’m utterly baffled when Haskell syntax is called “weird”, since it is so incredibly clean. Compared to the Haskell one-liner we ended up with, I’m tempted to say that the original Java version is the scary one.

Poor Treatment of Recursion in Introductory Textbooks, and a Counterexample

In some computer science or software engineering degree programs functional programming plays a very minor role. There may be an elective course, or there may just be a short segment on recursion in CS101. I went through an introductory sequence that was taught in Java, which doesn’t lend itself to recursion. Probably as a direct consequence of the limitations of Java, recursion was only briefly covered as some kind of novelty concept in programming, and some students were quick to dismiss recursion as “useless” or “academic”.

If recursion is only treated at a surface level, it does look like little more than a novelty concept of little use, with the added disadvantage that it “makes code more difficult to understand.” Sure, the unfamiliar is often difficult to grasp, but the elegance of recursive functions should be self-evident. If you think about it, loops are much less natural. Instead of taking one step after another, you first measure how many steps you are going to take and then process the data, hoping that the number of steps you determined up front matches the number of steps the data actually requires. This is why off-by-one errors plague iterative programs but are rare in functional ones.
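As a toy illustration of the difference, here is the same summation written both ways in Python (my own sketch, not taken from any of the textbooks discussed below):

def total_iterative(xs):
    # You manage the index yourself and have to get the bounds exactly right.
    result = 0
    for i in range(0, len(xs)):
        result = result + xs[i]
    return result

def total_recursive(xs):
    # One step at a time: handle the first element, then recurse on the rest.
    if xs == []:
        return 0
    return xs[0] + total_recursive(xs[1:])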

Many freshmen have a hard enough time understanding loops. Eventually they will view them as reliable, possibly because they don’t notice half their off-by-one errors, i.e. those that don’t throw an exception due to an out-of-bounds error. I think if you taught people functional programming first and only afterwards introduced them to Java or C++, they might view it as a step back. On the other hand, if you let them experiment with recursion in a language that is not suited for it, they learn that it’s easy to blow up the stack, which leads to the formation of beliefs like the two below, made by top posters on Stack Overflow. The context was that someone posted about an issue he was having with Ruby, a multi-paradigm programming language that is hardly ideal if you want to program in a functional style because it can’t handle deep recursive calls well.

If you’re intentionally recursing 10,000 times, you’re doing it horribly wrong and abusing recursion.

In general, unbounded recursion is bad. Most recursion can be rewritten to be iterative.

The cause of the problem, though, was that Ruby lacks tail call elimination; recursion itself is not the problem. A programming language with support for tail call elimination should handle deep recursion just fine, so both statements are misguided. I’ve found that it’s not uncommon for supposedly good programmers, and even university lecturers, to hold similar views. Sometimes you really wonder what a CS or software engineering degree is good for. By the way, recursion and iteration are equivalent in power, which means that you can, in principle, rewrite any recursive function as an iterative one, and vice versa.
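To see concretely what those posters were running into, here is a quick sketch in Python, which, like Ruby, does not eliminate tail calls (the function names are mine):

import sys

def count_down(n):
    # A tail-recursive function: the recursive call is the very last step.
    if n == 0:
        return 0
    return count_down(n - 1)

print sys.getrecursionlimit()    # typically 1000
# count_down(10000) fails with "maximum recursion depth exceeded": every call
# still gets its own stack frame. With tail call elimination the frame would
# be reused and the depth would not matter; without it, a loop is the
# pragmatic fallback.

def count_down_loop(n):
    while n > 0:
        n = n - 1
    return 0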

I’ve met quite a few people who view recursion as an obscure alternative to simple loops and “something you don’t do in production code”. I was therefore wondering where this view might come from, and skimmed some introductory textbooks. The index of Bjarne Stroustrup’s Programming: Principles and Practice Using C++ doesn’t mention recursion at all. Y. Daniel Liang’s Introduction to Java Programming has a short chapter on recursion, but you’ll only find very basic examples there. Python is getting more popular as an introductory language, so I had a look at John M. Zelle’s Python Programming: An Introduction to Computer Science. In that book, recursion is covered in the very last chapter, which doubles as an introduction to the analysis of algorithms.

Zelle uses well-worn examples like computing factorials and Fibonacci numbers or checking whether a string is an anagram. Towers of Hanoi is in there, too. Recursion is also mentioned when describing sorting algorithms like merge sort. The problem I see with this is that the simpler examples don’t necessarily show any advantage of recursion. Towers of Hanoi is too specialized, and it’s not quite clear how that knowledge will transfer to any program you might write yourself. Lastly, sorting algorithms are normally presented as pseudo-code, and recursion is just part of some. In other words, this approach doesn’t necessarily invite exploration.

What’s missing are more elaborate and somewhat realistic problems that show how recursion can lead to cleaner code, and code that is much easier to reason about. Typical textbook examples normally don’t provide this. I’m not sure that fractals make a good example either. Sure, if you’re going to write a fractal generator, you might revisit your textbook and check how recursion is used there. However, there isn’t much of an application for it.

In short, none of the introductory textbooks mentioned give the impression that recursion may be a viable programming concept, and that it’s worth thinking about whether a certain problem might better be solved recursively, instead of making iteration the default choice. I was pleasantly surprised, though, to find a few neat problems in Allen B. Downey’s Think Python: How to Think Like a Computer Scientist. This book is quite astounding. It’s free, short, and content-wise it runs circles around your typical 1000+ page tome that is full of filler material.

Here is exercise 12.6, taken from version 2.0.10 (May 2013) of Think Python. Prof. Downey frequently updates that book, so the exercise may be modified, moved, or even removed from the book in the future. Therefore I’ll describe it fully.

The input is a list containing a good 100,000 words of the English language, and the desired output is the longest word that can be completely reduced. In this context, “reducing a word” means that a given word remains a valid word, i.e. a word contained in the word list, after removing a letter. You can remove a letter from anywhere in the word, but you cannot rearrange letters. The goal is to eventually end up with a one-letter word.

The example in the book is sprite, which leads to the following reduction:

sprite -> spite -> spit -> pit -> it -> i

So, what is the longest word in the English language that can be reduced in this manner? Feel free to grab the text file and try to figure it out yourself.

Prof. Downey gives the following hints:

Write a program to find all words that can be reduced in this way, and then find the longest one. This exercise is a little more challenging than most, so here are some suggestions:

1. You might want to write a function that takes a word and computes a list of all the words that can be formed by removing one letter. These are the “children” of the word.
2. Recursively, a word is reducible if any of its children are reducible. As a base case, you can consider the empty string reducible.
3. The wordlist I provided, words.txt, doesn’t contain single letter words. So you might want to add “I”, “a”, and the empty string.
4. To improve the performance of your program, you might want to memoize the words that are known to be reducible.

So, how would you do it? Give it a try, and you might find that recursion allows you to write the most important function in a very succinct and elegant way, even though a large part of the program is iterative. This shows that recursion can be a useful addition to your toolbox, which is an impression you probably won’t get from computing factorials.
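For a taste of why recursion shines here, hints 2 and 4 translate almost directly into a short memoized check (a sketch of my own, assuming the dictionary of children from hint 1; it is not part of the solution below):

def is_reducible(word, children, known):
    # 'children' maps a word to the valid words obtained by deleting one
    # letter (hint 1); 'known' memoizes results already computed (hint 4) and
    # is seeded with {"": True}, since the empty string counts as reducible
    # (hint 2).
    if word in known:
        return known[word]
    result = any(is_reducible(child, children, known)
                 for child in children.get(word, []))
    known[word] = result
    return result

With known seeded as {"": True}, finding the longest reducible word is then just a matter of filtering the word list.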

In my solution, which is reproduced below, I print out a list of the 20 longest words, but of course you could just as well return only the first entry of that list.

def reduce_words():

    def create_dict():
        fin = open('words.txt')
        res = {}
        
        for line in fin:
            word = line.strip()
            res[word] = []
            
        for element in ["a", "i", ""]:
            res[element] = []

        return res

    def add_children(d):
        for key in d.keys():
            children = []
            for i in range(len(key)):
                candidate = key[:i] + key[i+1:]
                if candidate in d and candidate not in children:
                    children.append(candidate)
            d[key] = children
            
        return d

    def recursive_trails(d):
        res = []

        # Follow one reduction trail per word: always step to the first child
        # (note the early return in the loop below) and record the trail once
        # it bottoms out at "a" or "i"; words whose trail dead-ends are skipped.
        def helper(key, result):
            if d[key] == []:
                return
            if key in ["a", "i", ""]:
                res.append((len(result[0]), result))
            else:
                for entry in d[key]:
                    return helper(entry, result + [entry])
                
        for key,value in d.items():
            helper(key, [key])

        return res

    def top_n(results, n):
        results.sort(reverse = True)
        for i in range(n):
            print results[i]
                    
    d = create_dict()
    add_children(d)
    trails = recursive_trails(d)
    top_n(trails, 20)

reduce_words()

I thought that this was a really neat exercise that shows you can use recursion in a fairly realistic problem and end up with quite elegant code. If only examples like this were included in standard introductory textbooks, or if instructors used Prof. Downey’s book. But the textbook industry is shady (here is just one example), and the higher education industry is complicit, so it may take another two or three financial crises until we see substantial adoption of free, high-quality textbooks.

Backtracking Search in Python with Four Queens

The final project of Coursera’s Introduction to Systematic Program Design – Part 1 was to write a solver for the four queens problem. The goal is to place four queens on a 4 x 4 chess board so that the queens do not obstruct each other. This problem is a simplification of the eight queens problem, and it’s a good exercise for backtracking search. Please note that the Coursera course was using a Lisp dialect as its teaching language. Due to the Coursera honor code I cannot share that solution. However, since some students attempted to rewrite their solution in Python and discussed their approaches on the forum, I decided to port my solution to Python as well and present it here.

Given the limited problem space, there are other approaches to the four queens problem that may be more effective. Those may not teach you about constraint programming or backtracking search, though, and they probably don’t scale that well either. Just have a look at a 4 x 4 chess board: If you have the insight to put the first queen on the second square, then the problem basically solves itself!

To make it a bit easier to transition between the problem description and the eventual Python code, I’ll describe the positions on the board as integers, and not as a letter-integer pair as you’d see it on the chess board. The board therefore turns into:


 0  1  2  3
 4  5  6  7
 8  9  10 11
 12 13 14 15 

Let’s assume you want to implement a proper backtracking algorithm. Say you start with an empty board. What would you then do? The key idea is to try every square and, if a placement turns out to be infeasible further down the line, discard that branch of the emerging solution tree and continue from a point that has not yet been explored. Concretely, this means that after the very first iteration of the program, when given an empty board, you’ve created a list of 16 boards, each containing a queen on a different square. The first board in the list has the queen on position 0, the second on position 1, and so on.

Let’s stick with the very first board for the time being. In the second iteration you’d create another list of boards that all contain one queen on square 0, and the second queen on one of the squares 6, 7, 9, 11, 13, 14. Pick any of those boards, put a third queen on it, and you’ll find that it is impossible to place a fourth queen. Seeing that it’s not possible to solve the 4-queens problem if you start with a queen on square 0, you then move on to the next square, i.e. a board with its first queen on square 1, and try to solve the problem from there. This does indeed lead to one of the solutions, so the algorithm should present a valid board at the end.

Having illustrated the problem, let’s now talk about one way to code up the solution in Python. I tend to avoid overly terse expressions, apart from a list comprehension here or there, so it should be readable with just a bit of commentary.

To make life a bit easier and the resulting code more readable, I made the following definitions:

B = False
Q = True
all_positions = range(16)

The variable “all_positions” is nothing but a veneer to make the code more legible. It represents the idea that the board is a list with 16 entries.

You might think that it’s silly to define B (for blank) as False and Q (for Queen) as True. However, you can then define a board more succinctly in list form. Here’s what the empty board looks like:

BD0 = [B, B, B, B,
       B, B, B, B,
       B, B, B, B,
       B, B, B, B,]

This list gets evaluated to a list with sixteen “False” entries, but the shorthand introduced above saves some typing and allows for easier illustration of the problem.

A queen attacks in three directions over the whole extent of the board, i.e. horizontally, vertically, and diagonally. Since the board is represented as a list of boolean values, and Python allows for index-based list access, the following definitions can be used to traverse the board easily and check whether there are any collisions:

rows = [ [  0,  1,  2,  3 ],
         [  4,  5,  6,  7 ],
         [  8,  9, 10, 11 ],
         [ 12, 13, 14, 15 ] ]

columns = [ [ 0,  4,  8, 12 ],
            [ 1,  5,  9, 13 ],
            [ 2,  6, 10, 14 ],
            [ 3,  7, 11, 15 ] ]

diagonals = [ [  1,  4 ],
              [  2,  5,  8 ],
              [  3,  6,  9, 12 ],
              [  7, 10, 13 ],
              [ 11, 14 ],
              [  2,  7 ],
              [  1,  6, 11 ],
              [  0,  5, 10, 15 ],
              [  4,  9, 14 ],
              [  8, 13] ]

Please note that this solution does not scale well. It works fine on a 4 x 4 board, but to extend it to an 8 x 8 board, you’d either have to expand the lists with fitting numerical values, or find a way to represent rows, columns, and diagonals as sequence definitions that can be generated automatically, but that’s a problem to look into later. Currently, I’m primarily concerned with an illustration of that particular search problem.
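If you are curious, here is one way those index lists could be generated for an arbitrary n x n board (a sketch of my own; the solution below sticks to the hand-written lists):

def make_lines(n):
    rows      = [ [ r * n + c for c in range(n) ] for r in range(n) ]
    columns   = [ [ r * n + c for r in range(n) ] for c in range(n) ]
    diagonals = []
    # "\" diagonals: the difference between column and row is constant.
    for k in range(-(n - 2), n - 1):
        diagonals.append([ r * n + (r + k) for r in range(n) if 0 <= r + k < n ])
    # "/" diagonals: the sum of column and row is constant.
    for s in range(1, 2 * n - 2):
        diagonals.append([ r * n + (s - r) for r in range(n) if 0 <= s - r < n ])
    return rows, columns, diagonals

For n = 4 this yields the same rows, columns, and diagonals as the hand-written lists above, modulo ordering.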

With the preliminaries behind us, I’ll now focus on a general strategy for solving the problem, before filling in the blanks and showing you the source code. The following also serves as a blueprint if you want to go ahead and write the program yourself. I prefer to decompose functions so that they, ideally, achieve one task only. This leads to a greater number of functions, but it makes it easier to reason over code, to debug, and to extend the program later on.

Following the description above, the main steps are something like this:

1. enter a starting position (board)
2. if that board is solved, return the board
3. else: generate a list of boards with all valid subsequent positions, and continue with 1)

This raises the question of when a board is valid. To make the code a bit cleaner, I’ll assume that the set of inputs only consists of well-formed lists, according to the previous specification. A board is valid if there is at most one queen in every column, row, and diagonal, which can be expressed like this:

def board_valid(board):

    def check_entries(entries):
        for entry in entries:
            if number_of_queens(board, entry) > 1:
                return False
        return True
    
    return all([ check_entries(x) for x in [ rows, columns, diagonals ] ])

The function number_of_queens() is defined separately:

def number_of_queens(board, positions):
    return sum([ 1 for pos in positions if board[pos] ])

A board is solved if it is a valid board and if there are exactly four queens on the board:

def board_solved(board):
    return isinstance(board, list) and board_valid(board) \
            and number_of_queens(board, all_positions) == 4 

With those functions in place, everything is set up to solve the problem. I tend to prefer a bottom-up approach when programming, and I visually represent this by putting the most general function last.

import copy

# find all empty squares
def get_positions(board):
    return [ x for x in range(len(board)) if not board[x] ]

def get_next_boards(board, positions):
    result = []
    for pos in positions:
        temp = copy.deepcopy(board)
        temp[pos] = True
        result.append(temp)
    return [ board for board in result if board_valid(board) ]

def solve_board(board):
    if board_solved(board):
        return board
    else:
        return solve_board_list(get_next_boards(board, get_positions(board)))
        
def solve_board_list(board_list):
    if board_list == []:
        return False
    else:
        check = solve_board(board_list[0])
        if board_solved(check) != False:
            return check
        else:
            return solve_board_list(board_list[1:])

I think this is quite readable. If the flow of execution is not quite clear, please refer to the description of the problem at the start of the article. Congratulations! You’ve now solved the four queens problem using backtracking search.
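To actually run the search, something along these lines should do (a small usage sketch of mine):

solution = solve_board(BD0)                  # start from the empty board
print solution                               # a list of 16 booleans, four of them True
print [ pos for pos in all_positions if solution[pos] ]   # the squares holding queens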

The entire code, with test cases and some further helper functions, like a visualizer, a generator for all valid starting positions, and a function that outputs all possible solutions, is available on my github page. Look for the file “4_queens.py”.

Allen Downey’s Think Python: How to Think Like a Computer Scientist

Many textbooks are bloated, poorly structured, and badly written. Most seem to be quite useless without an accompanying college course, but if the course is well-taught, you can often skip the textbook altogether. Programming is no exception to this rule. As Python got more popular, the publishing industry started churning out one tome after another, and from what I’ve seen, they are often dreadful.

For a particularly bad example, look at Mark Lutz’s Learning Python, now in its 5th edition. It’s a staggering 1600 pages thick, and full of downright absurd examples that seem to consist of little more than the manipulation of strings like “spam” and “egg”, with the integer “42” thrown in if you’re lucky. Mark Lutz’s book on Python is quite possibly the worst technical book I have ever encountered, but the other books I’ve sampled were not much better.

One thing they all seem to have in common is their inflated page count. I think this is simply a tactic publishers use to justify selling their books at a higher price. Adding another 500 pages costs very little with a decent print run, yet all that dead weight allows them to increase the retail price by 100%. Apparently consumers have been misled into believing that a higher page count means more bang for the buck, but the opposite is true.

On the other hand, Think Python: How to Think Like a Computer Scientist, in version 2.0.10, hardly exceeds 200 pages. Yet Allen Downey manages to cover all basic programming constructs, recursion, data structures, and much more. He even includes helpful debugging tips and a useful glossary for quickly looking up unfamiliar terms. File I/O gets its own chapter. That’s not all: towards the end of the book, he invites you to explore a GUI toolkit (Tkinter) and object-oriented programming, and explains the basics of the analysis of algorithms. The amount of content in this book is quite staggering, especially when compared to its peers. Downey has organized his material very well, which results in a book that is slim, yet still feels complete.

What I particularly liked about Think Python is that the material is presented in a clear, logical order. Consequently object-orientation shows up very late. In fact, it is introduced as an optional feature. This is how it should be done, if you want to include OOP. For a beginner, “modern” OOP adds unnecessary complexity and makes it harder to form a clear mental model of the problem you are going to solve. On the other hand, competent experienced programmers tend to be highly critical of OOP. You’re probably better off if you never encounter it. That’s not (yet?) a mainstream opinion, though, so you may have to learn OOP eventually.

In a book like Liang’s Introduction to Programming Using Python, you get dozens of exercises at the end of each chapter. Most are a drudgery, focusing on minute details of the language, and often they are mere variations of one another, not unlike the mechanical “drills” that are popular in high school mathematics education. But this isn’t even the worst part. Nowadays you can’t just pick up an old edition of a textbook and get basically the same product plus a few typos here and there. Instead, there are changes all over the place, not all of them particularly well thought out, and the exercises get modified as well. Of course, compulsive updating and rushing a book to release make it easy for new errors to creep in, as evinced by the long list of errata for every new edition. On a side note, Liang’s books are particularly aggravating since you don’t even get the full book anymore. Instead, a good chunk of the content is distributed as “bonus chapters” that have to be downloaded using a code printed in the book, presumably as an attempt to make buying used copies less attractive.

Compared to that despicable strategy, you can get Downey’s book not only free of charge, but complete, too. His exercises are often surprisingly engaging for a beginner’s text. Writing a simple recursive function that checks whether a word is an anagram is not very exciting. On the other hand, processing a text file that contains over 100,000 words and finding the five longest anagrams in the English language is more involved, and to solve that exercise successfully you have to draw from previous chapters. This makes Think Python particularly interesting for autodidacts, since you can effectively use the exercises to check whether you have gained a firm grasp of the material. There is a reasonable number of exercises in the book, and they are well-chosen and often systematically build upon each other. This should be normal for textbooks, but it’s an exception.

Think Python: How to Think Like a Computer Scientist can be freely downloaded from the author’s homepage. The book has been released under a free license, and as a consequence there are editions for different programming languages; Think Python itself is based on an earlier book that covered Java. Being freely available has also given the book a large audience, and Downey acknowledges dozens of people who have made suggestions or pointed out errors in the text. After about a decade, this has resulted in a very polished book. Of all the introductory Python textbooks I’ve had a look at, it is the only one I feel comfortable recommending. It’s great for complete beginners and also for people who have experience in another language and quickly want to familiarize themselves with Python.

No, Java is not a good first programming language

Recently someone reached this site through the search phrase, “is java a good first language 2013”, so I felt tempted to answer this question from my own highly subjective point of view. The short answer is no, it’s not. In fact, Java is one of the worst first languages you could pick. In the following I’ll highlight two main issues. One issue, the complexity of Java, will make life more difficult for you immediately, and the other, the danger of developing myopia regarding programming languages and their capabilities, has the potential to hurt you for many years to come and possibly your entire career.

For a beginner it can be exhilarating to make a computer bend to your will. However, the path to reaching even a low level of proficiency is paved with copious amounts of frustration. Writing code demands a level of attention to detail and a degree of logical thinking that is above what the average person can muster. Gaping holes in your logic are par for the course in, say, mainstream media or everyday conversation, but a computer is not so forgiving. Just learning to proceed in a clear, logical way is difficult enough for a novice. But that is not all you have to learn: on top of it, Java and other “industry-strength” languages burden you with their complexity and will therefore make it more difficult for you to learn programming.

One concept you’ll learn in CS101 is what computer scientists call “types” or “data types”. The declared type indicates how a piece of data has to be interpreted, and determines how it can be manipulated. For instance, an integer with the value 42 is not the same as a string with the value “42”, even though they look the same when printed to the screen. You can multiply an integer by 2, but not a string. On the other hand, you can append another string to a string. All of this is probably not overly exciting. In Java, you have to declare variables with their type. In a more modern language like Python, which has been around for over two decades, you simply initialize the variable, since it is dynamically typed. Here is an example if this sounds confusing. [Note: Static typing isn’t necessarily bad. However, the type system of Java doesn’t buy you much, unlike the type inference in Standard ML and other functional languages, but if this is an objection you wanted to raise, then you aren’t a beginner and shouldn’t consider yourself part of the target audience of this article.]
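In Python, for instance, the difference looks roughly like this (a quick sketch of my own, not part of the examples that follow):

x = 42           # an integer
s = "42"         # a string that merely looks the same when printed
print x + 1      # 43
print s + "!"    # "42!": strings can be appended to, but not used in arithmetic
# print s + 1    # would raise a TypeError, because the types don't match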

Let’s say you’ve got a list of integers and want to double each entry of the list. If you want to use Java for that, you’ll end up writing a nice little piece of code like this:

package demo;

public class RangeDemo {

	public static void main(String[] args) {
		
		int[] myArray = { 1, 2, 3, 4, 5 };
		
		for (int i = 0; i < myArray.length; i++  ) {
			myArray[i] *= 2;
		}
		
		System.out.println(myArray);
	}
}

If you’ve never seen code before in your life, this might look quite intimidating. You’ve got all this scaffolding, and plenty of keywords which represent concepts you have not encountered yet. You know nothing about packages, or classes. The syntax is baroque. What are all those curly braces about? What are the semicolons good for?

In fact, for a beginner writing something as simple as those few lines of code can pose a significant challenge in Java. This is not because the operations that are involved are so difficult to grasp. It’s basic arithmetic, after all. Instead, the reason is that Java throws so much unnecessary complexity at you. By the way, I’ve hidden an easter egg in that code. Try to run the example for yourself, and you’ll see what I mean. Come on, try it! I’ll wait for you.

Strange, isn’t it? The console gave you an output like “[I@4b71bbc9”. This is Java’s default string representation of the array (its type followed by a hash code), not its contents. To actually print the contents of the array, you have to import the Arrays class and then convert the array to a string representation:

package demo;

import java.util.Arrays;

public class RangeDemo {

	public static void main(String[] args) {
		
		int[] myArray = { 1, 2, 3, 4, 5 };
		
		for (int i = 0; i < myArray.length; i++  ) {
			myArray[i] *= 2;
		}
		
		System.out.println(Arrays.toString(myArray));
	}
}

As you can see, this is a lot of code to do something very simple. Sadly, this approach scales. While your first programs will be a few dozen lines that should be just one or two, your larger programs will be several hundred lines instead of a few dozen, and in even more unfortunate cases several thousand instead of a few hundred. At this point, you’ll have heard of “patterns”, which were developed as a consequence of the deficiencies of the language and are largely superfluous in more advanced languages. [Peter Norvig discusses this problem in greater detail in Design Patterns in Dynamic Languages, but it won’t make much sense for a beginner. Maybe bookmark the link and come back after some months or a year.]

There is also the notion of “boilerplate code”, which is a problem in Java and other equally verbose programming languages. You might think that printing “hello world” should be one or two lines. Not in Java. Or you might think that a common operation like opening a file should be a line or two and consist of specifying the file name and the mode, i.e. whether the file is supposed to be only read or read and written to. In Java this isn’t quite so simple. Here is an example I’ve taken from Hackerrank:

import java.io.*;
import java.lang.*;

public class Solution
{
    public static void main( String[] args )
    {
        File fileName = new File( "myfile.txt" );
        if( !fileName.exists() )
        {
            System.out.println( "this file doesn't exist " );
            try
            {
                fileName.createNewFile();
                FileWriter fileWrite = new FileWriter( fileName );
                BufferedWriter bufferedWriter = new BufferedWriter( fileWrite );
                //bufferedWriter.write( "write something here " );
                bufferedWriter.close();
            } catch ( IOException e )
            {
                //catch exception
            }
        }
        else
        {
            //System.out.println( "this file exists " );
            try
            {
                byte[] buffer = new byte[ 100 ];
                FileInputStream inputStream  = new FileInputStream( fileName );
                int readLines = -1;
                while( ( readLines = inputStream.read( buffer ) ) != -1 )
                {
                    //System.out.println( new String( buffer ) );
                }
                inputStream.close();
            } catch ( IOException e )
            {
                //catch exception
            }
        }
    }
}

Yes, all this just to open and read a file! Also note that the use of the “BufferedWriter” in the first try block constitutes what I was once told was a “pattern”. Now if you look at the try block in the else branch, you may wonder why there is no analogue like a “BufferedReader”. In this example, the file is read as an input stream. Last time I checked, the recommendation was to use “BufferedReader” since it provides better performance when reading large files, but this is nothing you have to worry about in CS101. Let’s not get lost in details, though. The point I’m trying to make is that code like the one above is, well, utter madness.

By the way, this is the equivalent in Python:

#!/usr/bin/python

filename = "myfile.txt"
with open( filename ) as f:
    # file read can happen here
    # print "file exists"
    print f.readlines()

with open( filename, "w") as f:
    # print "file write happening here"
    f.write("write something here ")

Again, the example has been taken from Hackerrank. The pound sign (#) indicates a comment, which is ignored when the program runs. Thus, you’re looking at a total of five relevant lines of source code. Yes, five.

Those were just two random examples, but they hopefully illustrate why choosing Java as a first language is a rather bad idea. I’m proficient to various degrees in about half a dozen languages. However, I don’t think I’ve ever found myself in a situation where I was writing code in, say, Python or Erlang and said to myself, “I wish I could do this in Java instead.” When I have to work in Java, though, I tend to feel reminded of the increased expressiveness of other programming languages.

Instead of Java, I’d recommend you learn Python as a first language. Python code can be very clean and readable, and dynamic typing makes it more fun, too. I haven’t given you the equivalent of the very first Java example yet, which was doubling the numbers in an array of integers. The Java code at the beginning of this article can be expressed in Python like this:

my_array = [1, 2, 3, 4, 5]
my_array = [ number * 2 for number in my_array ]
print my_array

If this is too verbose, you can shorten it a bit:

print [ number * 2 for number in [1, 2, 3, 4, 5] ]

This is what people mean when they say that one line of Python can replace ten lines of Java. As a beginner, you’ll probably find this code easy to understand, even if you’ve never seen code in your life.

Apart from all the unnecessary complexity Java throws at you, there is another aspect I’d like to highlight: programming language myopia. Many people find it very difficult to liberate themselves from the mental models and habits they have acquired in their first programming course. Instead of adopting a beginner mindset again when picking up a different language and learning what it offers, they view everything from the one perspective they know, and happily write Java in Scala or Java in Python or Java in any other language that lets them get away with it. They pronounce that differences between programming languages are “just syntax” and never realize that they don’t make use of more advanced features of the languages they use.

It’s too easy to be complacent when your code seemingly works. Your overly verbose Java in Scala does the job, so why would you bother stretching yourself a little bit? This attitude is common, and you’ll have no problem bumping into people who reject using more powerful languages because of their “weird syntax”. For more on this, read what Paul Graham has to say about Blub programmers. It may come in handy at a nerdy cocktail party. Variations of what Paul Graham calls the Blub Paradox can happen at all levels of competency. People think what they are doing is good enough and don’t make an additional effort. However, it is a missed opportunity if you view a more expressive language from the view of a less expressive one and never familiarize yourself with the possibilities more powerful languages offer.

For a possibly surprising example, let me tell you about one of the MOOCs I sampled earlier this year, MIT’s Introduction to Computer Science and Programming. There are many conveniences in Python, for instance an immediate variable swap. Swapping the contents of the variables a and b requires nothing more than writing “a, b = b, a”. On the other hand, in Java and many other programming languages you would have to use a temporary variable and express the same in three lines. Of course, you can write the same three lines in Python and use the temporary variable you’re used to. It works, but it’s considerably less elegant. This is a relatively harmless example. More severe was that the introduction to object-oriented programming in that course was taught from a clearly Java-inspired perspective. In Java it is necessary to write so-called getter and setter methods, sometimes called accessor and mutator methods. Yet this is superfluous in Python, where you have direct access to the attributes you want to read or modify. Now, if this happened in a CS course taught by MIT, then it’s probably safe to assume that a college freshman who is starting out with Java isn’t immune to that kind of programming language myopia either, so do yourself a favor and learn one or two other programming languages first.
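To make those two points concrete, here is roughly what they look like in Python (a small sketch of my own; the class is purely illustrative):

# Swapping two variables: one line, no temporary needed.
a, b = 1, 2
a, b = b, a

# Direct attribute access instead of Java-style getters and setters.
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
p.x = 7             # no setX()/getX() boilerplate required
print p.x, p.y      # 7 4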