Monthly Archives: October 2013

Replicating a BSc in Computer Science through MOOCs

Introduction

Two years into the MOOC revolution, we’re now at a stage where the content of entire degree programs can be found online. Of course, going through MOOCs won’t get you any paper certificates, even though you can certainly pay for virtual proofs of your identity if you are so inclined: Coursera offers a “Signature Track”, and EdX recently added a similar option to a number of courses. You can acquire the knowledge, however.

I’m not suggesting that MOOCs can be a replacement for the entire college experience, though. You’ll miss out on a few aspects, such as interacting with your professors, but in any large university you won’t interact much with your professors either. Further, there is no equivalent of writing a thesis, or getting involved with research. Apart from those limitations MOOCs are a dream come true for an autodidact.

I was wondering whether it was possible to follow a typical CS curriculum solely with (free) online courses, and it turned out that it is largely possible. There is also a paid-for option through the University of London, but their Computing and Information Systems BSc, taught by Goldsmiths, comes with a sticker price of close to GBP 5,000. So, let’s say you don’t want to spend that much money, or much more on a traditional college degree in the UK or US. How far could you get?

Please note that I won’t bother with any “general education” requirements. In Europe 3-year BSc programs are the norm, and you primarily focus on your major subject. In the US, though, you’ve got to endure roughly a year’s worth of courses that are entirely unrelated to the subject you’re actually studying. They are proclaimed to turn you into a well-rounded person. This is pretty much just marketing speak. What most commonly happens is that people pick courses in which they hope to get an easy “A”, and most students don’t take them seriously anyway. A case in point is the Harvard cheating scandal of 2012. Oh, and I certainly would feel ripped off if I were forced to pay several thousand dollars for the doubtful privilege of taking a survey course on popular music. This isn’t an essay on the shady sides of US higher education, though, so I won’t dwell on that topic any longer.

Computer science historically grew out of either engineering or mathematics, and therefore there are differences between programs. Some have a stronger focus on CS theory and mathematics, others include many courses related to electrical engineering in their curriculum. I’ll instead focus on what I consider to be “core CS” courses, and I will highlight some specializations that are well-represented online. CS is not applied IT, though, so anything in the vein of Codecademy doesn’t quite qualify for this article. Some typical courses you won’t find on Coursera et al. yet. However, I’ll point to alternative resources in that case.

Introductory Programming and CS

There is a wealth of introductory programming courses. I think there are benefits in beginning with a functional programming language: it entails a much reduced level of artificial complexity, and it’s much easier to reason about programs without mutation. A very good first course would therefore be Introduction to Systematic Program Design – Part 1 by Gregor Kiczales, which uses the Lisp dialect Racket. Part 2 is planned. Those courses are based on the classic introductory CS textbook How to Design Programs by Felleisen et al. I don’t like that book due to its slow pacing. Prof. Kiczales’ course is much more digestible, though.
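
To illustrate the point about mutation, here is a minimal sketch. It is written in Python rather than Racket, and the function names are made up for this example; nothing here is taken from the course itself. The first function changes the list it is given, so every other part of the program holding that list sees the change. The second builds a new list and leaves its input alone, which makes it easier to reason about in isolation.

```python
def append_mutating(items, value):
    """Appends to the caller's list in place and returns that same list."""
    items.append(value)
    return items


def append_pure(items, value):
    """Returns a new list; the caller's list is left untouched."""
    return items + [value]


original = [1, 2, 3]

# The pure version leaves `original` as it was.
pure_result = append_pure(original, 4)
snapshot_after_pure = list(original)

# The mutating version silently changes `original` for everyone who holds it.
mutated_result = append_mutating(original, 4)
```

After running this, `snapshot_after_pure` still holds the three original elements, while `original` itself has grown by one because `append_mutating` modified it.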

You’ll probably want to pick up a more mainstream language such as Python or Java. For Python, I would have recommended Udacity’s CS 101 a year ago. A problem with the Udacity platform is that the forums are a wasteland. I don’t like some of their forced attempts at involving students, either. For instance, in CS 101 they asked students to submit exercise questions themselves, and probably in a misguided attempt to be “inclusive” they put exercises online that show very poor style, such as printing a value instead of returning it. Some other student’s exercise, which is likewise part of a set of optional problems for CS 101, has a bug I reported over half a year ago in the forum. There has been no response to it at all. However, there are now about a dozen threads of students who are confused by the original problem because their (correct) solution does not take that particular bug into account.
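
For readers wondering why printing instead of returning counts as poor style, here is a small sketch. The functions are hypothetical and not taken from the Udacity exercises. A function that prints its result hands nothing back to the caller, so the result cannot be reused in further computation.

```python
def square_printing(x):
    """Displays the square but implicitly returns None."""
    print(x * x)


def square_returning(x):
    """Returns the square so the caller decides what to do with it."""
    return x * x


# The returning version composes with other code.
total = square_returning(3) + square_returning(4)

# The printing version shows a number on screen, but the caller gets None back.
printed_result = square_printing(3)
```

Trying to add two calls of `square_printing` together would raise a `TypeError`, since both evaluate to `None`.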

Udacity also offers an introductory Programming in Java course, which I haven’t gone through. It’s probably okay. If you can motivate yourself, I’d recommend Stanford’s CS106A: Programming Methodology for self-study. Mehran Sahami is a fabulous lecturer, and his course is very thorough. I taught myself Java with it, and Allen Downey’s free textbook Think Java.

If Udacity’s CS 101 is not so great, then what’s the alternative if you want to learn Python? I think it’s Rice University’s An Introduction to Interactive Programming in Python. You’ll build increasingly complex games, and by the end of the class you’ll have written between 1,000 and 2,000 lines of code, which will get you a lot of practice. It’s an entertaining class, which I’d recommend even for guys with some years of experience, particularly if you’ve never built simple games.

Those courses will teach you enough programming skills. They should be followed with a course on data structures and algorithms, i.e. a traditional second course in CS. Unfortunately, the most thorough treatment of that topic is taught in Java: Algorithms (Part I, Part II) by Kevin Wayne and Robert Sedgewick. It would be preferable if an algorithms course were taught either in a language-agnostic way or in a more expressive language. Tim Roughgarden’s excellent Algorithms: Design and Analysis (Part 1, Part 2) is language-agnostic. This course also includes a review of important data structures. However, to get the most out of these courses you’ll need some mathematical maturity.

Mathematics

It seems that there is relatively little use for continuous mathematics within CS. Calculus is nonetheless commonly taught, which is arguably due to historic reasons. I don’t think you could make a good utilitarian argument for studying calculus within CS. However, you could easily argue that a certain degree of mathematical maturity makes you a lot smarter. You’ll certainly be less impressed by mainstream media information if you know a bit of statistics and probability.

If you didn’t take calculus in high school, then I’d recommend two fun introductory classes from Ohio State University: Calculus One and Calculus Two. Calculus One is an introductory course on single-variable calculus. The presentation of the material is a bit different from what you might be used to in mathematics, but I don’t mind the popularization of the material. For instance, I had no idea what a grain elevator was when I first encountered that term in calculus problems, so I appreciate that Jim Fowler and Bart Snapp use examples you can more easily relate to. Calculus Two covers series. If you like mathematics, you’ll probably enjoy it, but I view the material as mostly optional. You’ll come across telescoping series in some analyses of algorithms, though.
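
As a reminder of what a telescoping series looks like, here is a standard textbook example (my own choice, not taken from either course). Each term is split into a difference, and all the intermediate terms cancel:

```latex
\[
\sum_{k=1}^{n} \frac{1}{k(k+1)}
  = \sum_{k=1}^{n} \left( \frac{1}{k} - \frac{1}{k+1} \right)
  = 1 - \frac{1}{n+1}
\]
```

Only the first and the last term survive, which is the kind of simplification that occasionally turns a messy sum in a running-time analysis into a closed form.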

A much more important topic is discrete mathematics. Unfortunately, a MOOC on that topic is still missing. Thankfully, MIT OCW offers a great course in Mathematics for Computer Science, with fabulous lecture notes. There are several versions available. The Fall 2010 version has video lectures, while the Spring 2010 version comes with a very good set of lecture notes.

Lastly, there is linear algebra. I did have high hopes for Coding the Matrix: Linear Algebra through Computer Science Applications by Philip Klein. Unfortunately, this course was an utter disappointment. There were countless technical problems, and occasionally poorly worded questions in the assignments. It was not uncommon that I spent more time trying to please the autograder than actually solving the programming exercises. I also remember one particularly unpleasant Saturday afternoon where I was puzzling over autograder feedback, only to later learn that there was a problem with the grading script that rejected correct solutions. I hope that those issues will eventually get sorted out. An even bigger problem, though, was that the lectures weren’t very good. Philip Klein literally read the dense text on the slides to you, line by line. This was arguably the worst presentation of the roughly two dozen MOOCs I’ve either audited or completed. (I did earn a statement of accomplishment, in case you are wondering, but it was a real drag.)

The big draw of Coding the Matrix, computer science applications, turned out to be much less exciting in practice. You’d work on toy problems that illustrate, say, Hamming codes or image transformations, but the scale was so small that you walked away thoroughly unimpressed. Of course, we were using a slow interpreted language like Python, and working on small problems. I would have much preferred to have been properly exposed to linear algebra, and then shown realistic applications. Alternatively, one could have used high-performance libraries so that you could have solved moderately sized problems.

EdX has an upcoming course that seems to move more in that direction, though: Linear Algebra – Foundations to Frontiers. Then there is also Linear Algebra on MIT OCW with fabulous lectures by Gilbert Strang. He is an enthusiastic lecturer, and he develops the material properly, which makes it easy to follow. A further bonus is that he made the linear algebra textbook he wrote freely available as a PDF. However, going through a course on MIT OCW might require more motivation and determination since there are no fixed deadlines, and no automatically graded exercises or exams.

If you’re fine with a less traditional way of teaching mathematics, you could also make use of Khan Academy, which covers calculus, statistics and probability, as well as linear algebra. There is currently very little discrete mathematics on offer, though.

Now we’ve got basic CS, programming, data structures and algorithms covered. There is only one course missing to complete the content of a typical CS minor.

Systems

To round off your basic CS education, one course in “systems” should be added to the mix. Such courses seem to be much more common in traditional CS programs that grew out of engineering departments, while they are either electives or wholly absent in other CS programs.

I’ll admit that I have a poor background in systems, with only one project-based university course under my belt that I didn’t consider thorough enough. This is therefore an area I intend to explore further. The two most interesting options seem to be by the University of Washington and the University of Texas, Austin. I didn’t have enough spare time when it was first offered, but I had a look at the materials, and I got a very good first impression of the University of Washington course The Hardware/Software Interface. A related course is the upcoming EdX offering Embedded Systems – Shape The World.

Specializations

With the requirements of a CS minor out of the way, what would you want to go on to study? I’m quite amazed at the wealth of offerings. Of course you won’t find any cutting edge research seminars online. If you’re serious about CS research, then MOOCs are only a poor substitute, but most of what you’d find in a typical taught BSc or MSc program, as opposed to a research-based one, you can find online as well.

If you’re interested in knowledge that is more immediately useful, pick Jennifer Widom’s thorough Intro to Databases course. It covers theory, and also a lot of practice. For anyone only wanting to learn a bit of SQL it’s overkill, though.

If networks are what interests you, then you can start by taking the University of Washington’s Computer Networks, followed by Georgia Tech’s Software Defined Networking.

Are you interested in learning more about programming languages and their implementations? In this case, there is a wealth of resources available, too. Peter Van Roy of the University of Louvain, author of Concepts, Techniques, and Models of Computer Programming, is going to offer a course on programming paradigms on EdX. You could follow this up with Dan Grossman’s fabulous course on Programming Languages. That course focuses on the elements of programming languages. A good complement to Dan Grossman’s course is Wesley Weimer’s Programming Languages: Building a Web Browser, which gives you a good foundation for a course in compilers. Wesley Weimer is another one of my favorite lecturers, by the way.

Computer science legend Jeff Ullman is about to offer his course on automata theory for the second time on Coursera. His colleague Alex Aiken teaches one on Compilers. This is another one of the courses I have not taken yet. The syllabus looks quite intimidating, though. It has one of the highest estimates for weekly workload of any course on Coursera, 10 to 20 hours, and judging from feedback on the web, it’s pretty accurate.

A hot topic at the moment is Machine Learning. Coursera lists courses from Stanford, the University of Toronto, and the University of Washington. EdX offers the Caltech course Learning from Data, which has a reputation for being the most rigorous online course in ML.

Traditional AI seems to have taken a backseat compared to Machine Learning in recent years, but it nonetheless has a strong online representation. Udacity now hosts the seminal “AI Class”, Introduction to Artificial Intelligence, taught by Peter Norvig and Sebastian Thrun. Sebastian Thrun also teaches a more advanced class on self-driving cars: Artificial Intelligence for Robotics. Alternatively, you could take the UC Berkeley course Artificial Intelligence on EdX.

Conclusion

While I’ve given an overview of core CS courses and a few specializations, there is a lot more you could learn. There are courses on scientific computing, computer architecture, cryptography, computer graphics, parallel programming, and even on computational investing and quantum computing. The list goes on and on. I do miss a course on operating systems, though.

I merely wanted to highlight some of the larger areas of computer science, and show that they are already well-represented online. The selection is largely based on my personal interests. Still, I think my presentation convincingly conveyed that there is, a mere two years after the MOOC revolution started, an absolutely staggering number of MOOCs available. Just look at the numbers of CS courses, generously interpreted, that are currently listed on the websites of the major providers! EdX counts a total of 18, and so does Udacity, incidentally. Coursera, on the other hand, lists 91. This is a total of 127 courses in CS, and this is not taking into account the many courses that are tangentially related to CS, like mathematics, or statistics and data analysis.