There are many obvious problems with the “peer review” model of essay grading Coursera is piloting. On the other hand, one may think that automatic grading of code submissions in MOOCs is much more reasonable. It’s certainly more practical, and it might explain why the majority of current online courses are related to computer science. Obviously, some piece of code either works, or it doesn’t. Yet, a few important issues seem to be neglected. Working code should of course be a necessity, but it’s not all there is to writing “good” code. While one may want to argue that the points I am going to raise are more relevant for real-life software development and less of an issue in university courses, bad practices can still lead to bad habits that only take more time to correct later on.
A big issue is that auto-graders do not recognize poorly structured code. If the code is working but sloppily written, without any comments and poor visual guidance, the software won’t notice it. To some degree this could be rectified, but the current versions of the auto-graders on Coursera, EdX and Udacity are oblivious to it. Questions of coding style are, of course, to some degree subjective. Yet, following well-thought out style guides can lead to greatly increased readability. Even just blank lines to separate functions and variables can go a long way, and certainly few people doubt that it’s a good idea to not write more than a certain amount of characters per line. (It may not be 80, though.) A good tutor, either in industry or at university, will point this out to you. The autograder won’t because it can’t.
Hand in hand with poor structure goes over-engineering. I’m currently going through MITx’s 6.00x Introduction to Computer Science and Programming. Among the introductory CS courses I’ve either taken or had a look at, it seems to be the best one by a wide margin. One reason is that the exercises extend beyond the merely mechanical application of knowledge. For instance, in an early exercise (lecture 4: problem 5), you are asked to find the maximum of three numbers without using conditional statements. Instead, you have to use the in-built min() and max() functions of Python.
Here is the code skeleton:
def clip(lo, x, hi): ''' Takes in three numbers and returns a value based on the value of x. Returns: - lo, when x < lo - hi, when x > hi - x, otherwise ''' # Your code here
This is not an overly difficult exercise, but you may have to think about it for a minute or two.
One possible solution is:
return max(min(hi,x),lo)
One of the students, though, presented a fabulous case for overthinking the problem. He writes:
The trick here is to know that the boolean True and False have values you can use:
True == 1 False == 0So you can test for the three conditions (lower, in-range, higher) and store each in a separate boolean variable. Then you can use those booleans (think now of 1 or 0) in a return statement that multiplies the low-test boolean by lo, the in-range boolean by x, and the high-test boolean by hi.
The return value is simply the booleans multiplied by each argument, and added together. That will result in math something like these
1 * lo + 0 * x + 0 * hi # too low 0 * lo + 1 * x + 0 * hi # in range 0 * lo + 0 * x + 1 * hi # too high
This is only possible on a weakly moderated platform, where any post with the veneer of plausibility, and written in an authoritative tone, tends to impress the inexperienced. Indeed, other students were busy thanking that guy for his “help”, until finally someone chimed in and clarified that this solution was entirely missing the point.
Poor structure and complicated code are not the only issues you may face. If you’ve ever worked with legacy code, then you’ve probably come across poorly chosen or variable names. This may not be an big issue in a 50-line script someone writes for his CS 101 class. However, in any context where someone else has to work with your code, this can quickly lead to problems because the other person may need much more time to familiarize himself with it. Besides, that other person may well be the original author in a few months, once he is no longer familiar with the problem he wanted to solve.
What you also sometimes see is people writing code in languages other than English. Yet, English is the de facto Lingua Franca of programming. It’s not so uncommon that someone posts a code snippet somewhere, and asks for help. If a variable is called, say, “naam” instead of name, I don’t necessarily need to know that this is Dutch to correctly guess its meaning. It remains a minor annoyance, though. Besides, there are enough languages out there, and a plethora of possible variable names that bear little to no resemblance to their English counterparts, that guessing won’t help much in the long run.
This is no trifling matter. Just think of Open Office, which is based on an office suite called Star Office. Star Office was originally developed by a team of German developers who, you guessed it, did not comment their code in English. The company behind Star Office was acquired by Sun in 1999, and the software was finally open sourced in 2000. In 2012, though, there are still German comments left in the source code, waiting to be translated into English. The German programmers could have written their comments in English, which may have taken a bit longer. However, by neglecting this proverbial “stitch in time” the problem got compounded. I don’t even want to speculate how much time was wasted on fixing this issue subsequently. Eric S. Raymond writes about this as well in “How To Become A Hacker“, where he points out that English is the “working language of the hacker culture and the Internet.” This situation hasn’t changed.
On a side note, the EdX team recently had some technical issues with their auto-grader. It couldn’t evaluate code that contained Unicode characters. However, this wasn’t a bug but a feature! If anything, you want to discourage students from putting special characters from all the languages in the world in their code. It may even make sense to artificially restrict the auto-grader to the ASCII standard. This doesn’t apply to the real world, but for CS 101 it’s probably appropriate.
In general, I think that having a few units on how to write clean code would be very beneficial as it would help students to acquire good habits. This doesn’t just apply to beginners. There are also some seemingly experienced programmers on those platforms, presumably using online courses to learn a new programming language or brushing up on their computer science knowledge. At least that’s my guess when I see someone posting nicely written code in Python, in which the lines end with semicolons. This only illustrates my point: Old habits die hard, so you better acquire good ones as early as you can.