Category Archives: Programming

Review: An Introduction to Interactive Programming in Python — Coursera

I just finished An Introduction to Interactive Programming in Python on Coursera, which is taught by Rice University professors Joe Warren, John Greiner, Stephen Wong, and Scott Rixner. This course has the rather unique approach of teaching programming by building games of increasing complexity, instead of teaching programming language syntax with a plethora of dull exercises. Overall, it was an excellent experience, and thoroughly enjoyable. Most of all, it was great fun.

I originally signed up because I was interested in the structure of the course. I liked the whimsical style of the instructors, and before I knew it, I was sinking a couple hours a week into programming games. Games were only used as a vehicle to teach programming, though. I think this is a much more attractive approach than coding a dreary CRUD application. Presumably, more students would stick with computer science or mere programming if they were presented with more interesting application domains. I certainly have never heard of teenagers who learned programming on their own because they wanted to write a CRUD app in Java. Games, on the other hand…

Pedagogically, the class was excellent, since it focusses on practical examples first. For a beginner, this is surely much more helpful than the misguided attempt of introducing rigorous definitions for their own sake. The content was similar to a typical introductory programming class that is taught using Python, such as MITx’s 6.00 or Udacity’s CS101. Yes, object orientation is  covered, and so are common data structures like lists, sets, dictionaries and tuples. Even more advanced concepts like list comprehension were taught.

As teaching platform an online IDE named Code Skulptor was used, which implements a subset of Python 2.6 in JavaScript. Two bundled multimedia libraries make it much easier to write games. Code Skulptor provides syntax highlighting and code folding, i.e. the ability to hide the definition of a function. This environment worked very well for most of the course. The last few assignments led to code bases in excess of 200 lines, which made programming in the browser a bit awkward, though.

Now, let’s talk about the games you’ll write in the course: the first project is an implementation of rock, paper, scissors, lizard, Spock, which uses console output. The week 2 project is not much more complicated: it’s the guessing game. You know, guess a number between 1 and 1000. This project also served as a vehicle to introduce binary search, and basics of GUI interactivity. Week 3 was only a minor step up, covering modulo arithmetic by programming a stop watch, and implementing a “game” on top of that: try to stop the counter at the start of a second. Since the counter keeps track of 100 ms segments, this is rather difficult.

The fun started in week 4, though, which focussed on Pong, using simple vector graphics. Pong may not be very exciting nowadays, but it’s quite satisfying to play a round or two (against yourself) once you’ve implemented all game mechanics. It felt satisfying just seeing the ball physics in action. Quite a few students seemed to have struggled in week 5 with the implementation of Memory, as it involved more complex game state and manipulations. In Pong everything is visible all the time, while in Memory cards aren’t always exposed. As you can see from those examples, the assignments got progressively more complex, and they required increasing sophistication.

The most time-consuming project was probably Blackjack, the week 6 project, which implemented a simplified rule set of that game. The addition was to work with tile maps. The last two weeks tied everything together with an implementation of Asteroids: game physics, sound, tile maps, collisions, game state, it’s all in there. Despite being the most elaborate project, it was very straight-forward. As such, it was a rewarding conclusion to this course. Here’s a screenshot:

RiceRocks

RiceRocks: An implementation of Asteroids

The support structure of this course was excellent, too. You could even email a help desk at Rice University if needed, and each week, a support thread was started in the forum that discussed the most common stumbling blocks. Examples of programming concepts were plentiful, and supplementary exercises helped reinforce concepts, if you needed it. I didn’t have to resort to any of those resources, but knowing that they were there was reassuring. I also appreciated that materials were posted early, on Saturdays, which gave you a head start, and made it much easier for me to find the time for this course.

There were only a few blemishes. One quibble I had was that in one of the later projects some “Java in Python” was taught as students were asked to write a “getter” method. However, in Python you can just access the attributes directly. Peer review of the weekly assignments worked well. However, the grading rubrics focussed merely on the exhibited functionality of the programs, not on the actual code. I found this questionable since it may reinforce bad habits. A grading rubric for “style” may be a reasonable addition to future iterations of this course.

To summarize, An Introduction to Interactive Programming in Python is a great way for a beginner to get started with programming. This course is also interesting for more experienced programmers, since you’ll be implementing simple games. I certainly found this enjoyable. Granted, it probably won’t provide much of a challenge if you’ve made it through an introductory programming course already, but it should be quite entertaining.

Finding an Eulerian Path

In Udacity’s Introduction to Algorithms course, one of the challenge problems asks you to define a function that, given a graph that contains an Eulerian path, returns one such path. I’ve reproduced my solution below. The solutions posted on the Udacity forum where often in two camps: somewhat verbose imperative, and very concise functional ones that implemented Hierholzer’s algorithm for finding cycles in Eulerian paths. My solution is somewhat in the middle. It’s relatively concise, but it makes use of an additional data structure I found helpful as it allows to easily inspect the state during program execution: a list containing the number of unused edges per node. I think it’s quite easy to follow the flow of execution, but I’ll walk you through step by step afterwards.

Here is the program listing:

def find_eulerian_tour(graph):
            
    def freqencies():
        my_list = [x for (x, y) in graph]
        result = [0 for i in range(max(my_list) + 1)]
        for i in my_list:
            result[i] += 1
        return result
        
    def find_node(tour):
        for i in tour:
            if freq[i] != 0:
                return i
        return -1
    
    def helper(tour, next):
        find_path(tour, next)
        u = find_node(tour)
        while sum(freq) != 0:     
            sub = find_path([], u)
            tour = tour[:tour.index(u)] + sub + tour[tour.index(u) + 1:]  
            u = find_node(tour)
        return tour
                 
    def find_path(tour, next):
        for (x, y) in graph:
            if x == next:
                current = graph.pop(graph.index((x,y)))
                graph.pop(graph.index((current[1], current[0])))
                tour.append(current[0])
                freq[current[0]] -= 1
                freq[current[1]] -= 1
                return find_path(tour, current[1])
        tour.append(next)
        return tour             
             
    graph += [(y, x) for (x, y) in graph]
    freq = freqencies()   
    return helper([], graph[0][0])

The graph is given as a list of tuples. Tuple (x, y) indicates that node x has an (undirected) edge to node y. In line 37 the list with nodes is enlarged by adding tuples (y, x) to it, which reflects the fact that the edges are undirected.

In addition, a list of the number of edges per node (“freq”) is created and updated as the list with the eventual Eulerian tour is built. This means that whenever a node that is part of the Eulerian tour is found, the two affected nodes have their number of edges reduced by one.

If the graph has cycles, it is possible that the first pass of find_path() doesn’t process part of the graph. In this case, the program checks whether the sum of the list “freq” is equal to zero. If it is, then the Eulerian tour has been computed since there are no unused edges left. On the other hand, if “sum(freq)” is greater than zero, then unused edges are left, so the program proceeds finding a node with unused edges, finds another cycle, and inserts it into the resulting Eulerian tour. This step will be repeated for as long as there are unused edges in the graph.

Please note that any node can be used as a staring point. This follows from the existence of an Eulerian path, which uses all edges. Since it contains all edges, you can pick any node and from there pick any edge to start the Eulerian tour with.

On the Udacity forums I found a very helpful test case, which is a graph with three cycles. When working on the program, I used this and two derivative cases to test and guide development. Here they are:

graph1 = [(0, 1), (1, 2), (2, 0)]
# returns [0, 1, 2, 0]

graph2 = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
# returns [0, 1, 2, 3, 4, 2, 0]

graph3 = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2), (4, 5), (5, 6), (6, 4)]
# returns [0, 1, 2, 3, 4, 5, 6, 4, 2, 0]

How to Get Started With Java (as a Beginner)

For better or worse, Java is one of the dominant languages in the software engineering industry. As a software engineering professional, you’ll quite possibly have to work with a Java code base at one point in your career. I am not a big fan of Java, and I don’t think it’s a good first language in the computer science curriculum. Sadly, this is often the case. In fact, Java is a horrible first language. The first two “real” languages I learnt were Python and Scheme (Lisp), and thus I was used to elegance and expressiveness. Java felt like a big step back. For instance, here are a few lines of Python code that print the numbers from 1 to 5:

for i in range(1, 5):
    print i

Here is the equivalent in Java:

package demo;

public class RangeDemo {

	public static void main(String[] args) {
		
		for (int i = 1; i < 5; i++) {
			System.out.println(i);
		}
	}
}

Following Java style guides, you end up with a staggering 11 lines of code instead of 2. Just look at all the wasted horizontal space and the baroque syntax! Proponents of Java like to point out that their language was never designed to write simple “scripts”, but Java bloat is a factor no matter how large your program is. You’ll end up writing reams of code that do very little, and you quickly end up wondering why the code base is metastasizing in front of your eyes. What Python could do in, say, 100 lines could easily require about 1,000 lines in Java. This is not an exaggeration.

I could write a couple of blog posts on the downsides of Java, but my point is merely that there are better languages to learn as a beginner. You are probably better off checking out Python first if you are new to programming. But let’s say you’ve got no choice because you have enrolled in a computer science program, and now you find yourself sitting in an intro class, confused that CS isn’t about playing video games, web design, or using MS Office. Your instructor wields a 1,500 page tome on Java and pontificates about its “industrial strength” and the marvels of object-orientation.

Let’s start with textbooks first. Textbooks seem to get bigger and bigger so that they can be sold for more and more money. Yet, didactically, they only get worse. In case of Java, though, I do think that common textbooks mirror the verbosity of the language they describe in a wondrous way. The OOP course at university I attended used Liang’s Introduction to Java Programming. Just like other similar books, it lacks proper structure, and has absolutely no qualms about wasting the student’s time. The exercises were dull, repetitive and uninspiring. Fortunately, there is a free alternative out there that covers all bases, at least as far as introductory courses are concerned. It is Allen B. Downey’s Think Java: How to Think Like a Computer Scientist. In less than 300 (small) pages, you’ll get acquainted with the basics of the language. If you go through this book in a week or two, then a semester-long course on OOP with Java will teach you barely anything new. It will just take five times longer.

Some teachers tell beginning students to start programming in a text editor first, and to compile and execute their code via the command line. In the case of Java, this is seriously misguided advice. You do not want to spend hours looking for a missing curly bracket because you thought you were “hardcore” and turned off syntax highlighting in your editor. Likewise, tasks like compiling and executing your program are simply repetitive time wasters that should be automated. There is no learning experience in saddling yourself with that kind of manual labor. So, instead of an editor use an IDE. Eclipse is as just bloated as Java, and overkill for what you do in your typical first- or second-year CS course. A more lightweight (and free) alternative is DrJava.

Lastly, you’ll want to do plenty of exercises to get proficient at, well, writing code in Java. Unlike some textbook authors I see little point in going through exercises that have you write classes that don’t do anything but make you resent “object orientation” à la Java. After you’ve written a handful of those, you know how this works. Instead, you should practice programmatic thinking. For this purpose, the Java section on CodingBat is an excellent resource, providing a few hundred exercises. They won’t teach you everything you need to learn, but they will put you well ahead of the pack. If you get stuck, you are welcome to consult the solutions on my website.

Lastly, you might want to check out Stanford’s CS106A: Programming Methodology, which is a free online course that offers an introduction to programming with Java. This course and its associated textbook, The Art and Science of Java by Eric Roberts, start out with a special set of Java libraries, the ACM Java Libraries. Those hide some of the complexities of the language. They allow you to write more concise code, and they also lead to a gentle introduction to object-orientation with Java. The course itself is your standard-fare remedial programming course.

The lecturer, Mehran Sahami, is entertaining and competent, and the course focusses on pretty interesting programming tasks. One of the first bigger projects consists of writing a Breakout clone, for instance. Also, I really liked the introductory sequence, which covers “Karel the Robot”. If you’ve never programmed before and have to get started with Java, this would indeed be a very good starting point. Using those resources, you’ll be well on your way to becoming competent in Java, but I hope you’ll explore some other programming languages too.

Relational Division in SQL The Easy Way

I recently studied SQL as part of an introductory course on databases. SQL itself is not particularly difficult to grasp, yet compared to relational algebra, the division operation is much more complex. In relational algebra, there is a division operator, which has no direct equivalent in SQL. This means that you’ll have to find a workaround. There are a number of ways to express division in SQL, and with the exception of one, they are all quite complex.

Division identifies attribute values from a relation that are paired with all of the values from another relation. Popular textbook examples are the identification of suppliers who deliver all parts of a particular color. An intuitive solution would be to count the number of distinct red parts, and then look at every distributor to find out which of those deliver all those parts. To express this in SQL, you have to use the set theoretic operators “having” and “group by”, and then you simply count the tuples meeting certain criteria.

Let’s say you have table T1 in front of you and want to find out which A’s have both b2 and b3. You can assume that b2 and b3 are the red parts. If you’ve only been exposed to standard textbook treatments of division in SQL, you may be surprised that the problem can be solved as simply as this:

SELECT A
FROM T1
WHERE B IN ( 
             SELECT B
             FROM T2 
           )
GROUP BY A
HAVING COUNT(*) = ( 
                    SELECT COUNT (*)
                    FROM T2
                  );

Of course you can add a Where clause to the last expression.

The previous example is quite easy to grasp. The same can’t be said about how SQL division is commonly taught. Standard database theory textbooks expose you to a statement that is doubly nested and peppered with two negations. No matter how smart you are, it takes longer to parse than the previous example. I think you can safely call the following a monstrosity:

SELECT DISTINCT x.A
FROM T1 AS x
WHERE NOT EXISTS (
                  SELECT *
                  FROM	T2 AS y
                  WHERE NOT EXISTS (
                                     SELECT *
                                     FROM T1 AS z
                                     WHERE (z.A=x.A) AND (z.B=y.B)
                                   )
                 );

Which one would you prefer?

The examples above are taken from the paper “A Simpler (and Better) SQL Approach to Relational Division” by Matos and Grasser, published in Journal of Information Systems Education, Vol. 13(2). I was quite happy to have come across that paper. Unfortunately, theirs is not a very well-known approach to SQL division. This is unfortunate, since it’s not only easier to grasp, but, as Matos and Grasser write, it also exhibits better computational performance.

The tedium of Scratch can’t be good for kids

I recently had the pleasure of playing around with MIT’s Scratch as part of a team building exercise. In case you are not familiar with Scratch: it is a simple graphical and mouse-driven environment designed for beginners of whatever age. The aim is presumably to gently introduce novices to programming, without exposing them to oh-so-frightening programming language syntax. If you were a six-year old, you’d probably much rather look at Scratch than at Eclipse, and you’d certainly prefer looking at colored code blocks, as you can see below, instead of, say, snippets of Perl. The latter might not just be true for kids, though.

The Scratch user interface

The Scratch user interface

My team had the task of building a simple video game in Scratch, and it quickly dawned on me why this was a rather ingenious idea. It was certainly not because Scratch provides such a great interface, but because the limitations are so severe that you are forced to communicate with your team. Sharing code literally meant communicating by either turning your screen to the other guys or reading the “code blocks” to them.

Being forced to engage in teamwork due to the limitations of Scratch is a nifty idea. Yet, I later on thought about whether I would have wanted to be introduced to programming that way. Granted, it presents an ingenious way to familiarize people with computational thinking. Loops can easily be created by putting conditions together like Lego blocks. This may be fun for a while.

Quite possibly the most abused cat in history

Quite possibly the most abused cat in history

However, where Scratch falls short is in showing how powerful programming can be. For instance, when learning a high-level language, it doesn’t take long until a beginner is able to generate the Fibonacci sequence, or prime numbers, and if this is too pedestrian, then maybe looking at an output showing the first eight Mersenne primes will do the trick. The computer can uncover those within seconds, while it took mathematicians about 2,2000 years to achieve the same. If this isn’t fascinating, then I wouldn’t know what is.

Kids are probably more into graphics than number theory, though. But regarding 2D graphics I could see no advantage of Scratch either. In a high-level language, loops will help you to achieve a lot with very little code. If you wanted to fill the screen with 100 space invaders, then picking the position of the first one and placing the others relative to it is a fairly basic procedure. Unlike staring at Mersenne primes, this may actually be appealing to a slightly larger fraction of kids.

Scratch, on the other hand, feels a lot like work. Instead of letting the computer take care of repetitive steps, you will find yourself copying scripts and assigning it to different sprites. It reminded me of my frustrations with painting as a kid, because if I wanted to change something, I’d normally have to redraw the entire image. Likewise, even thought Scratch does offer a glimpse at the power of programming, you will find yourself doing a lot of manual work, and sometimes you have to literally rebuild something instead of easily changing it.

Just imagine you want to modify the script that governs the behavior of your sprites. If there are eight sprites, then there are eight copies of the script to modify. This means that you’ll first incorporate the changes, and then spend some time clicking and copying and dragging. Making a cat bump into a wall may be fun for a seven-year old, but having to do roughly four times the work to do the same with four cats certainly isn’t.

How would a smart kid react once he found out that in order to move four cats around, he’d have to put together a script each for every single sprite? Or if you told him that he can only place objects with the mouse, but if he wants their placement to be pixel-perfect, he’d have to assign values individually? Again, in a high-level language a loop would take care of this and you’d be done.

I have a hard time not seeing a bright kid getting bored to tears once they go beyond the very limited scope of, say, the basic tutorials. Letting a ball bounce around on the canvas just isn’t so much fun, and snapping code blocks together becomes tedious very quickly. To do anything remotely complicated, you end up clicking and dragging and snapping and duplicating so many steps that you’ll get frustrated as the work required increases exponentially. Scratch thus manages to successfully transfer a lot of the tedium and repetition when interacting with physical objects into the digital domain, but hardly any of the advantages.

Wasn’t the point of learning how to program to remove at least some of the tedium that comes with repetitive tasks? I remembered a statement one of my friends, a guy with a background in bioinformatics, once made. Paraphrased, it was something like:

I feel sorry for PhD students repeating their experiments over and over, but it makes me appreciate even more how great it is to utilize computers, especially when there is a task that has to be done repeatedly. It’s just a matter of executing the same script again.

You don’t even have to look at bioinformatics to come to this conclusion. Some people use scripts to take care of repetitive tasks in Excel, others spend literally hours a day copying and pasting data from one file or application to another. Probably the best thing about Scratch is that it prepares kids for that kind of toxic working culture. However, it’s probably a better idea to keep Scratch out of their sight and instead introduce them to Khan Academy’s computer science section. I haven’t had an in-depth look at it yet, but my first impression of their Processing.js powered platform was quite positive. It even makes me want to play around with it as an adult.