Monthly Archives: October 2012

Automatic Grading of Code Submissions in MOOCs isn’t Perfect Either

There are many obvious problems with the “peer review” model of essay grading Coursera is piloting. On the other hand, one may think that automatic grading of code submissions in MOOCs is much more reasonable. It’s certainly more practical, and it might explain why the majority of current online courses are related to computer science. Obviously, some piece of code either works, or it doesn’t. Yet, a few important issues seem to be neglected. Working code should of course be a necessity, but it’s not all there is to writing “good” code. While one may want to argue that the points I am going to raise are more relevant for real-life software development and less of an issue in university courses, bad practices can still lead to bad habits that only take more time to correct later on.

A big issue is that auto-graders do not recognize poorly structured code. If the code is working but sloppily written, without any comments and poor visual guidance, the software won’t notice it. To some degree this could be rectified, but the current versions of the auto-graders on Coursera, EdX and Udacity are oblivious to it. Questions of coding style are, of course, to some degree subjective. Yet, following well-thought out style guides can lead to greatly increased readability. Even just blank lines to separate functions and variables can go a long way, and certainly few people doubt that it’s a good idea to not write more than a certain amount of characters per line. (It may not be 80, though.) A good tutor, either in industry or at university, will point this out to you. The autograder won’t because it can’t.

Hand in hand with poor structure goes over-engineering. I’m currently going through MITx’s 6.00x Introduction to Computer Science and Programming. Among the introductory CS courses I’ve either taken or had a look at, it seems to be the best one by a wide margin. One reason is that the exercises extend beyond the merely mechanical application of knowledge. For instance, in an early exercise (lecture 4: problem 5), you are asked to find the maximum of three numbers without using conditional statements. Instead, you have to use the in-built min() and max() functions of Python.

Here is the code skeleton:

def clip(lo, x, hi):

'''
 Takes in three numbers and returns a value based on the value of x.
 Returns:
 - lo, when x < lo
 - hi, when x > hi
 - x, otherwise
 '''
 # Your code here

This is not an overly difficult exercise, but you may have to think about it for a minute or two.

One possible solution is:

return max(min(hi,x),lo)

One of the students, though, presented a fabulous case for overthinking the problem. He writes:

The trick here is to know that the boolean True and False have values you can use:
True == 1
False == 0
So you can test for the three conditions (lower, in-range, higher) and store each in a separate boolean variable. Then you can use those booleans (think now of 1 or 0) in a return statement that multiplies the low-test boolean by lo, the in-range boolean by x, and the high-test boolean by hi.
The return value is simply the booleans multiplied by each argument, and added together. That will result in math something like these
1 * lo + 0 * x + 0 * hi # too low
0 * lo + 1 * x + 0 * hi # in range
0 * lo + 0 * x + 1 * hi # too high

This is only possible on a weakly moderated platform, where any post with the veneer of plausibility, and written in an authoritative tone, tends to impress the inexperienced. Indeed, other students were busy thanking that guy for his “help”, until finally someone chimed in and clarified that this solution was entirely missing the point.

Poor structure and complicated code are not the only issues you may face. If you’ve ever worked with legacy code, then you’ve probably come across poorly chosen or variable names. This may not be an big issue in a 50-line script someone writes for his CS 101 class. However, in any context where someone else has to work with your code, this can quickly lead to problems because the other person may need much more time to familiarize himself with it. Besides, that other person may well be the original author in a few months, once he is no longer familiar with the problem he wanted to solve.

What you also sometimes see is people writing code in languages other than English. Yet, English is the de facto Lingua Franca of programming. It’s not so uncommon that someone posts a code snippet somewhere, and asks for help. If a variable is called, say, “naam” instead of name, I don’t necessarily need to know that this is Dutch to correctly guess its meaning. It remains a minor annoyance, though. Besides, there are enough languages out there, and a plethora of possible variable names that bear little to no resemblance to their English counterparts, that guessing won’t help much in the long run.

This is no trifling matter. Just think of Open Office, which is based on an office suite called Star Office. Star Office was originally developed by a team of German developers who, you guessed it, did not comment their code in English. The company behind Star Office was acquired by Sun in 1999, and the software was finally open sourced in 2000. In 2012, though, there are still German comments left in the source code, waiting to be translated into English. The German programmers could have written their comments in English, which may have taken a bit longer. However, by neglecting this proverbial “stitch in time” the problem got compounded. I don’t even want to speculate how much time was wasted on fixing this issue subsequently. Eric S. Raymond writes about this as well in “How To Become A Hacker“, where he points out that English is the “working language of the hacker culture and the Internet.” This situation hasn’t changed.

On a side note, the EdX team recently had some technical issues with their auto-grader. It couldn’t evaluate code that contained Unicode characters. However, this wasn’t a bug but a feature! If anything, you want to discourage students from putting special characters from all the languages in the world in their code. It may even make sense to artificially restrict the auto-grader to the ASCII standard. This doesn’t apply to the real world, but for CS 101 it’s probably appropriate.

In general, I think that having a few units on how to write clean code would be very beneficial as it would help students to acquire good habits. This doesn’t just apply to beginners. There are also some seemingly experienced programmers on those platforms, presumably using online courses to learn a new programming language or brushing up on their computer science knowledge. At least that’s my guess when I see someone posting nicely written code in Python, in which the lines end with semicolons. This only illustrates my point: Old habits die hard, so you better acquire good ones as early as you can.

A Critical View on Coursera’s Peer Review Process

2 Replies

Coursera certainly deserves praise for opening access to higher education, especially since they also offer courses in the humanities, unlike competitors like Udacity or EdX. Solutions to exercises in mathematics or computer science can easily be graded because there is an expected correct answer at undergraduate level courses, but assessing the relative merits of an essay in, say, literary studies isn’t quite so straightforward. Coursera attempts to solve this problem by letting students grade the essays of their peers.

I do see some issues with automated grading even in code submissions, but that’s a topic for another article. Right now I am more concerned with the peer review system Coursera has implemented. I am sure they will attempt to modify their system eventually, but at the moment there are some serious issues. Please note that I am not speaking as an observer on the sidelines. I have sampled numerous courses, and finished finished three so far. Especially in more technical courses, the content seems to be very good, and for a motivated self-learner you could easily substitute a course at a brick-and-mortar university by one of Coursera’s, if you are more concerned about learning something new and care little about getting a paper.

On the other hand, the humanities courses don’t seem to fare that well. You’d really have to lower your expectations. I’ll talk about the shortcomings in a moment, but before that, let’s review the ambitions Coursera has for their peer review process:

Following the literature on student peer reviews, we have developed a process in which students are first trained using a grading rubric to grade other assessments. This has been shown to result in accurate feedback to other students, and also provide a valuable learning experience for the students doing the grading. Second, we draw on ideas from the literature on crowd-sourcing, which studies how one can take many ratings (of varying degrees of reliability) and combine them to obtain a highly accurate score. Using such algorithms, we expect that by having multiple students grade each homework, we will be able to obtain grading accuracy comparable or even superior to that provided by a single teaching assistant.

What actually happens is that every student has to evaluate five essays to receive feedback on his own work, which in turn also gets evaluated by five other students. Your score is merely an average of the scores you have received from other students in the course. Scoring is done according to various rubrics, reflecting the criteria of the assignment. Let’s say you were asked to discuss events that happened in a certain time period. If you did that, you were supposed to get one point, if not, you got zero. Most of the rubrics were quite surprising if your assumption was that people writing those essays actually paid attention to the directions. It turned out that many didn’t. I read a few essays that were completely unrelated. The bar seems to be rather low.

An article on Inside HigherEd recently pointed out some issues with Coursera’s peer review system. I do think that the author had a rather narrow perspective, despite raising awareness of some issues with the nature of the feedback you receive from other students. She pointed out that feedback was highly variable, ranging from a “good job!” to downright abusive language. Further, feedback is one-directional, and there is no way to comment on feedback or challenge a perceived unfair assessment. Given the current state of Coursera with low-quality feedback being the norm, I am tempted to view this as a feature and not a bug. Yet, in a traditional university, a great part of the learning experience consists in discussing your work with your supervisor.

The other issues Inside HigherEd mention go hand in hand: Anonymity of feedback is not necessarily a problem, but the associated lack of community is. In many courses, the forum have a low signal to noise ratio. Of course, this won’t matter if someone finds it deeply gratifying to express “me too!” statements in poor English. I’d imagine that most exchanges via text messages or on Twitter follow exactly this pattern. Personally, I saw little value in threads in which over 100 people reveal which country they are from, though.

But let’s look at methodological problems of the crowd review system Coursera uses. The assumption is that a random number of students is perfectly able to objectively assess the works of others, despite their own shortcomings. This itself is a rather daring hypothesis that hardly passes the laugh test since it presupposes that all students, once they submit their essays, turn into omniscient and objective evaluators who are able to detect the kind of mistakes they made before their role switched from student to peer reviewer. Further down I’ll present an argument that is supposed to show that crowd review can’t even work in a best case scenario, but for now I’ll focus on what I perceive to be the current reality on Coursera.

Providers of MOOCs focus on vanity metrics like the number of “enrollments.” It seems that no matter what kind of course is launched, eventually between 50,000 and 100,000 students sign up. But this number itself is highly inflated. It’s common that not even half the students finish the first week’s assignment. This is not because they are all unmotivated or realize that they had bitten off more than they can chew. Instead, “enrolling” is the only way to sample the course content. If you just wanted to have a look, you’d have to register first. I am fairly certain that if, for instance, the first week of lectures and assignments was available immediately, then the big drop off in numbers could be largely avoided. This would be a straightforward solution, but how impressive would it be if Daphne Koller and Andrew Ng had to tell their investors that some recent changes lowered the number of enrollments by 50%? This is not just an issue of Coursera or Udacity. Twitter and Facebook are also full of inactive accounts, in addition to fake ones. Yet, yet their reported total number of users is rarely questioned in the press.

[EDIT: The world of MOOCs is moving fast. Coursera has very recently begun making all video lectures of about a dozen courses available without registration.]

After week one about half the students are left. This will still be an enormous amount of people. Yet, it doesn’t mean that they are all automatically well-qualified. “Opening up” higher education brings this to light, and only reveals educational deficits. I am far from putting the blame on the students. Many first-world countries rather invest in their military or propping up a global fantasy economy instead of creating equal opportunities by investing in schools or libraries. Therefore, it is little surprise that quite a few of the people who are active on the forums, or participate in the peer review process, would seem quite out of place at a proper university. Given that all it takes to sign up is an email address, this is to be expected.

Unfortunately, an immediate effect is the relatively low quality of the discussion forums, which I had touched upon before. If you skim a few threads, you’ll be amazed at the widespread lack of etiquette, and often poor grammar and orthography. It’s not as bad as a random collection of comments on YouTube, but at worst it’s quite close. As the courses go on, it normally gets better, but the first two or three weeks normally make you want to stay away from online discussions.

The rather diverse student body has ramifications for crowd reviewing as a whole. Put five students with a poor command of English in a room, tell them to grade the each others’s work, and the result won’t be especially awe-inspiring. While this example may sound absurd, it is a rather common occurrence on Coursera. Of course, if you only want to check for basic writing ability, then you can just run a grammar and spell checker. I don’t want to sound condescending, but about half the students whose essays I read did’t even bother with this. But if people can’t even meet this goal, then it’s arguably too much to ask for coherent arguments.

As Coursera expands, I can only see this getting worse, because September will never end. Without strong moderation, the quality of sites degrades. However, since strong moderation isn’t conducive to growth, good sites sooner or later degenerate. It’s probably only a matter of time until people will post “First!” or “+1” on a thread where someone asks a question about algorithmic efficiency.

Finally, let’s consider a best case scenario. One may want to think that in an ideal world you could have stellar students grading the work of other equally stellar students. It’s quite obvious that they would be able to give each other much more meaningful feedback. Yet, they don’t know what they don’t know. A motivated student will therefore learn much more from interacting with a tutor who is able to make him consider a different point of view. In fact, one of the benefits of attending a university course that focuses on writing is that you can discuss your work with someone who is more experienced and more educated. Of course, not all professors are, but at the better universities you should find a lot of smart professors and highly competent tutors. It will take significant breakthroughs in AI to replicate something similar.

In summary, it seems that in a best-case scenario Coursera will stunt the growth of motivated students. This ties in with a recent talk by David Patterson, where he stated that MOOCs are a “godsend for the 99%” since they were better than other options. The “1%” were CS majors at good schools, and their education was undeniably superior to what a MOOC could provide. He was referring to his own Software as a Service course, which was originally offered through Coursera, but is now available on EdX.

David Patterson on the “99%” in Education

Patterson was referring to a computer science course. In the humanities, though, a lot has to change until a similar claim could be made. The worst case scenario I described, which seems rather close to the reality on Coursera, has quite some room for improvement. I am tempted to say that this is an enormous problem, and I don’t see how the goals of “opening up higher education” in the humanities and raising standards can be achieved. In the end there may be the realization that high school education has to be fixed before MOOCs can reach their full potential. We’ll have to see how well Khan Academy will be doing on that front, but it certainly looks promising.

Review: Internet History, Technology and Security — Coursera

3 Replies

I recently finished my third course on Coursera, Internet History, Technology and Security, which was taught by Charles Severance aka. Dr Chuck from the University of Michigan’s School of Information. In a brisk seven weeks, Dr Chuck recapitulated the history of electronic computing, ranging from Alan Turing to the modern Internet, highlighting key people and technologies, but also problems and conflicts. The course relied a lot on interviews with pioneers like the NCSA’s Larry Smarr or Tim Berners-Lee. There was even a short segment with a young and enthusiastic Jeff Bezos, outlining his vision for Amazon.com. Charles Severance then filled in the missing pieces and put those interviews in context.

Charles Severance deserves praise for his engaging teaching style. I think he is an outstanding teacher. While some MOOCs have the professor lecturing in front of a sterile background (cf. Princeton’s Intro to Sociology course), or, worse, merely slap recorded lectures and PowerPoint slides together (cf. Berkeley’s Software as a Service), Dr Chuck invites you to sit in his office with him. You can even watch him take a sip of his favorite soda every once in a while. I won’t mention names, but I think that many other lecturers could learn a great deal from him. Just compare the following screenshots, and you’ll surely notice a difference.

Dr Chuck, teaching Internet History

EdX.org: Intro to AI (Berkeley)

It’s an even more pronounced difference when watching the actual videos. Please note that I’m only talking about the presentation. The Intro to AI course looks excellent, and I’m considering taking it the next time it is offered. Yet, the presentation could be improved considerably. But let’s go back to Internet History.

In broad strokes, the history of the Internet is probably familiar to anyone who uses the Internet for more than poking their virtual friends on Facebook and playing browser games. You certainly don’t have to be a “geek” to have heard about the ARPANET. However, looking at the development of the infrastructure of the Internet in greater detail is quite worthwhile. A particularly illuminating part was the history of the NSFNet, and elaborations on how the backbone of what evolved into the Internet had to be snuck past the lobbyists on Capitol Hill. The pitch was that NSFNet would merely connect supercomputers located at various research universities. The lobbyists considered this to not be a “market”, and thus let the researchers have their way. But then scientists from all over the USA desired to have access to this network. University administrators feared losing their best researchers if they didn’t find ways to give them access to the NSFNet, and within a relatively short period of time it had become the norm for university researchers to have access to this network, and supercomputers, and the rest is history, as the trite saying goes.

Retracing the history of the Internet as it evolved gives an appreciation of the many struggles and serendipitous accidents the then nascent technology went through. It is easy to take things for granted, just like today’s teenagers couldn’t even imagine a world without smart phones or Facebook. However, history could have taken a much different path, and it’s not implausible to think that there was a possibility that the Internet was owned by AT&T and you would be billed by the minute for access. Or just imagine Microsoft had won the browser wars and found ways to inhibit the development of alternative browsers. In that case we probably would have been stuck with Internet Explorer 6 until the cows come home.

On a related note, let’s not forget contemporary attempts by the “elites” to rewrite history. In July, the Wall Street Journal published an op-ed in which Gordon Crovitz claimed that it was an urban myth that the government had “invented the Internet”. Instead it was, of course, due to the private sector. Vint Cerf, one of the “fathers of the Internet” later on stated, “I would happily fertilize my tomatoes with Crovitz’ assertion.” The Internet is nowadays a highly politicized medium, and competing interest groups are feverishly trying to promote their own goals. An example of this was given by Barak Obama who claimed that, “Government research created the Internet so that all the companies could make money off the Internet.” That’s almost as embarrassing as Al Gore, who presumably still thinks that he invented the Internet. Politics surely is a great source of amusement. I don’t get the impression that our “leaders” value truth much, but if they would, then guys like Crovitz or Obama should seriously consider taking Charles Severance’s course next time.

Internet History, Technology and Security is by no means a rigorous course, which isn’t necessarily bad. Dr Chuck explains TCP/IP and other concepts in layman-friendly terms. However, before you know it, you’ll have learnt about the entire Internet protocol suite. This is hardly common knowledge among the folks who harvest crops on FarmVille, even though you’ll probably have a hard time impressing people with a background in computer science. Still, it’s quite refreshing that Dr Chuck didn’t try to cram too much information into this course. Some people may want to know about implementation details, but a curious user is normally served perfectly well with the bird’s eye view.

Similarly, if you just want to write some simple scripts that make your life easier, or play around with Google’s App Engine it’s certainly better to know roughly what happens with the code you write in a high-level language and how it ends up as programs the computer can actually execute. After all, it’s not as if you’re typing magic incantations in your text editor which the computer just knows how to interpret. On the other hand, studying compilers is probably something you’d want to stay away from in this case. Not everybody needs to be a toolmaker.

I would assume that many more people are interested in using technology and having a basic understanding instead of knowing about minute details. If you’re using a power drill, you are normally not concerned about the engineering processes behind it either. However, if you skim a site like Hacker News, it doesn’t take long until people want to tell you that you need to know this and that before you should even start considering using certain technology. One of the more amusing threads had one guy list various branches of high-level mathematics that were deemed absolutely necessary to be familiar with in order to be a good programmer. Sure, some mathematics helps, and some computer science won’t hurt either. But people go overboard so quickly that you can’t help but shake your head. No, seriously, you do not need to study finite state machines before you can write a simple game in JavaScript.

I’m looking forward to Dr Chuck’s next courses, which seem to promise a similar down-to-earth approach. In his last lecture, he mentioned that he intends to offer two follow-up courses, one based on his free book Python For Informatics: Exploring Information, and another on using the Google App Engine. His Python book looks good, so this course is definitely one I’ll consider taking in the future. On his website you’ll also find materials for a more advanced course on Database Application Design, teaching PHP and mySQL. Overall, this strikes me as a suitable course sequence for people who want to use technology for their professional and private goals, without getting bogged down in often quite superfluous theoretical details.

When Did Progress Bars Become Extinct?