A Critical View on Coursera’s Peer Review Process

Coursera certainly deserves praise for opening access to higher education, especially since they also offer courses in the humanities, unlike competitors such as Udacity or EdX. Solutions to exercises in mathematics or computer science can easily be graded because, in undergraduate-level courses, there is an expected correct answer, but assessing the relative merits of an essay in, say, literary studies isn’t quite so straightforward. Coursera attempts to solve this problem by letting students grade the essays of their peers.

I do see some issues with automated grading even in code submissions, but that’s a topic for another article. Right now I am more concerned with the peer review system Coursera has implemented. I am sure they will attempt to modify their system eventually, but at the moment there are some serious issues. Please note that I am not speaking as an observer on the sidelines. I have sampled numerous courses, and finished three so far. Especially in more technical courses, the content seems to be very good, and a motivated self-learner could easily substitute one of Coursera’s courses for a course at a brick-and-mortar university, provided they are more concerned about learning something new and care little about getting a piece of paper.

On the other hand, the humanities courses don’t seem to fare that well. You’d really have to lower your expectations. I’ll talk about the shortcomings in a moment, but before that, let’s review the ambitions Coursera has for their peer review process:

Following the literature on student peer reviews, we have developed a process in which students are first trained using a grading rubric to grade other assessments. This has been shown to result in accurate feedback to other students, and also provide a valuable learning experience for the students doing the grading. Second, we draw on ideas from the literature on crowd-sourcing, which studies how one can take many ratings (of varying degrees of reliability) and combine them to obtain a highly accurate score. Using such algorithms, we expect that by having multiple students grade each homework, we will be able to obtain grading accuracy comparable or even superior to that provided by a single teaching assistant.

What actually happens is that every student has to evaluate five essays to receive feedback on his own work, which in turn also gets evaluated by five other students. Your score is merely an average of the scores you have received from other students in the course. Scoring is done according to various rubrics, reflecting the criteria of the assignment. Let’s say you were asked to discuss events that happened in a certain time period. If you did that, you were supposed to get one point, if not, you got zero. Most of the rubrics were quite surprising if your assumption was that people writing those essays actually paid attention to the directions. It turned out that many didn’t. I read a few essays that were completely unrelated. The bar seems to be rather low.
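The averaging described above can be illustrated with a minimal sketch. The rubric layout, the sample scores, and the function names below are all hypothetical and chosen only for illustration; this is not Coursera's actual code. The median is included purely as a contrast, to show how much a single careless or hostile reviewer can move a plain average; nothing here suggests that Coursera uses a median.

```python
from statistics import mean, median

# Hypothetical data: five peer reviewers each score one essay on three
# binary rubric items (1 = criterion met, 0 = criterion not met).
peer_rubric_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],  # an outlier, e.g. a careless or hostile reviewer
    [1, 1, 1],
]


def total_per_reviewer(rubric_scores):
    """Sum the rubric items each reviewer awarded."""
    return [sum(items) for items in rubric_scores]


def average_score(rubric_scores):
    """Plain average of the reviewers' totals, as described in the text."""
    return mean(total_per_reviewer(rubric_scores))


def median_score(rubric_scores):
    """Median of the reviewers' totals, shown only for comparison."""
    return median(total_per_reviewer(rubric_scores))


if __name__ == "__main__":
    print("average:", average_score(peer_rubric_scores))
    print("median: ", median_score(peer_rubric_scores))
```

With only five reviewers per essay, one zero from a single reviewer pulls the average well below what the other four agreed on, while the median stays where the majority put it.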

An article on Inside HigherEd recently pointed out some issues with Coursera’s peer review system. I do think that the author had a rather narrow perspective, despite raising awareness of some issues with the nature of the feedback you receive from other students. She pointed out that feedback was highly variable, ranging from a “good job!” to downright abusive language. Further, feedback is one-directional, and there is no way to comment on feedback or challenge a perceived unfair assessment. Given the current state of Coursera with low-quality feedback being the norm, I am tempted to view this as a feature and not a bug. Yet, in a traditional university, a great part of the learning experience consists in discussing your work with your supervisor.

The other issues Inside HigherEd mentions go hand in hand: Anonymity of feedback is not necessarily a problem, but the associated lack of community is. In many courses, the forums have a low signal-to-noise ratio. Of course, this won’t matter if someone finds it deeply gratifying to express “me too!” statements in poor English. I’d imagine that most exchanges via text messages or on Twitter follow exactly this pattern. Personally, I saw little value in threads in which over 100 people reveal which country they are from, though.

But let’s look at the methodological problems of the crowd review system Coursera uses. The assumption is that a random group of students is perfectly able to objectively assess the works of others, despite their own shortcomings. This itself is a rather daring hypothesis that hardly passes the laugh test since it presupposes that all students, once they submit their essays, turn into omniscient and objective evaluators who are able to detect the kind of mistakes they made before their role switched from student to peer reviewer. Further down I’ll present an argument that is supposed to show that crowd review can’t even work in a best case scenario, but for now I’ll focus on what I perceive to be the current reality on Coursera.

Providers of MOOCs focus on vanity metrics like the number of “enrollments.” It seems that no matter what kind of course is launched, eventually between 50,000 and 100,000 students sign up. But this number itself is highly inflated. It’s common that not even half the students finish the first week’s assignment. This is not because they are all unmotivated or realize that they have bitten off more than they can chew. Instead, “enrolling” is the only way to sample the course content. If you just want to have a look, you have to register first. I am fairly certain that if, for instance, the first week of lectures and assignments was available immediately, then the big drop-off in numbers could be largely avoided. This would be a straightforward solution, but how impressive would it be if Daphne Koller and Andrew Ng had to tell their investors that some recent changes lowered the number of enrollments by 50%? This is not just an issue of Coursera or Udacity. Twitter and Facebook are also full of inactive accounts, in addition to fake ones. Yet their reported total number of users is rarely questioned in the press.

[EDIT: The world of MOOCs is moving fast. Coursera has very recently begun making all video lectures of about a dozen courses available without registration.]

After week one about half the students are left. This will still be an enormous number of people. Yet, it doesn’t mean that they are all automatically well-qualified. “Opening up” higher education brings this to light, and only reveals educational deficits. I am far from putting the blame on the students. Many first-world countries would rather invest in their military or prop up a global fantasy economy than create equal opportunities by investing in schools or libraries. Therefore, it is little surprise that quite a few of the people who are active on the forums, or participate in the peer review process, would seem quite out of place at a proper university. Given that all it takes to sign up is an email address, this is to be expected.

Unfortunately, an immediate effect is the relatively low quality of the discussion forums, which I touched upon before. If you skim a few threads, you’ll be amazed at the widespread lack of etiquette, and often poor grammar and orthography. It’s not as bad as a random collection of comments on YouTube, but at worst it’s quite close. As the courses go on, it normally gets better, but the first two or three weeks tend to make you want to stay away from online discussions.

The rather diverse student body has ramifications for crowd reviewing as a whole. Put five students with a poor command of English in a room, tell them to grade each other’s work, and the result won’t be especially awe-inspiring. While this example may sound absurd, it is a rather common occurrence on Coursera. Of course, if you only want to check for basic writing ability, then you can just run a grammar and spell checker. I don’t want to sound condescending, but about half the students whose essays I read didn’t even bother with this. But if people can’t even meet this goal, then it’s arguably too much to ask for coherent arguments.

As Coursera expands, I can only see this getting worse, because September will never end. Without strong moderation, the quality of sites degrades. However, since strong moderation isn’t conducive to growth, good sites sooner or later degenerate. It’s probably only a matter of time until people post “First!” or “+1” on a thread where someone asks a question about algorithmic efficiency.

Finally, let’s consider a best case scenario. One may want to think that in an ideal world you could have stellar students grading the work of other equally stellar students. It’s quite obvious that they would be able to give each other much more meaningful feedback. Yet, they don’t know what they don’t know. A motivated student will therefore learn much more from interacting with a tutor who is able to make him consider a different point of view. In fact, one of the benefits of attending a university course that focuses on writing is that you can discuss your work with someone who is more experienced and more educated. Of course, not all professors are, but at the better universities you should find a lot of smart professors and highly competent tutors. It will take significant breakthroughs in AI to replicate something similar.

In summary, it seems that in a best-case scenario Coursera will stunt the growth of motivated students. This ties in with a recent talk by David Patterson, where he stated that MOOCs are a “godsend for the 99%” since they were better than other options. The “1%” were CS majors at good schools, and their education was undeniably superior to what a MOOC could provide. He was referring to his own Software as a Service course, which was originally offered through Coursera, but is now available on EdX.

David Patterson on the "99%" in Education

Patterson was referring to a computer science course. In the humanities, though, a lot has to change until a similar claim could be made. The worst case scenario I described, which seems rather close to the reality on Coursera, has quite some room for improvement. I am tempted to say that this is an enormous problem, and I don’t see how the goals of “opening up higher education” in the humanities and raising standards can be achieved. In the end there may be the realization that high school education has to be fixed before MOOCs can reach their full potential. We’ll have to see how well Khan Academy will be doing on that front, but it certainly looks promising.
