Category Archives: Education

The Value Proposition of the Georgia Tech-Udacity Online Master of Science in Computer Science (OMS CS)

I used to be bullish about MOOCs, i.e. massive open online courses. That was until they stopped being open by erecting paywalls, and instead of striving towards replicating university-level education online instead watered down the material, chopped up single courses into entire course sequences, and put a highly disingenious marketing spiel on them. The Georgia Tech-Udacity Online Master of Science in Computer Science (OMS CS) seems to share many of the same downsides of MOOCs, while the upsides are largely imaginary. Let me elaborate.

First, I want to be clear that this article is based on a Central European background, where education works a bit differently than in the US or UK. Attending university is not nearly as common as in the US. On the other hand, we also lack thousands of insitutions of questionable reputation. This is sometimes flippantly summarized in the statement that the US may well have most of the 50 best universities in the world, but they also have a lot of the 5000 worst. In Europe, a decent Master’s degree is supposed to introduce you to research. At the very least, you will produce a lengthy research-oriented thesis, which may very well be a stepping stone to a future PhD. Coursework-only MSc degree programs, like the OMS CS, are rather unheard of.

In fact, just taking a certain number of taught courses and receiving a piece of paper in return seems like a rather dubious value proposition. After all, the value of a degree program is hardly in the sum of the courses you have taken. Particularly, if those courses only cover standard material, which you could easily have picked up yourself from a textbook, or MIT OCW. Instead, I see most of the value in getting at least some exposure to ongoing research, and maybe even having the chance of participating in research. On-campus students at Georgia Tech certainly have this possibility. However, students in the OMS CS do not. Thus, they are clearly second-rate citizens.

Obviously, research is constantly progressing. This has the consequence that beyond the undergraduate level courses are hardly static. If you taught a course on automata theory in 2016, you probably could have used the same material in 1996, and even in 2036 you or your successor would do fine with it. The reason is that this particular field is bascially a dead end for research, if I am not mistaken. In any case, there is certainly not a lot of funding in this area. To make it simpler, just take a basic course on data structures: linked lists, stacks, queues, trees and so on will be required basics for computer science majors until kingdom come.

On the other hand, at the graduate-level you are getting closer to the research frontier. In several of the courses in my CS MSc program we at the very least briefly touched upon ongoing research problems. In fact, it was not at all uncommon that professors would use at least one lecture to present their own research, which was, unsurprisingly, rather typical in courses that were advertised as joint MSc/Phd courses. Those normally included local and external guest lecturers as well. Furthermore, at least some recent papers are included in a typical graduate-level course. Sometimes you may even get access to papers that are under submission. As a consequene, the content of a graduate-level course is hardly static but will instead, albeit to varying degrees, depending on how far a particular field moves, change year after year.

In contrast, let us now look at what the OMS CS does. Frankly, it is rather off-putting, and, at least from my point of view, contradicts what graduate-level education should stand for. Instead of leading you closer to the research frontier, they instead record courses, conserve them, and broadcast them in frequent intervals, apparently with only little changes. Professors receive a one-off compensation for creating a course, and a fixed fee for each run. This alone should be reason enough to turn every intellectually curious student off. Besides, the Udacity courses I took were among the weaker MOOCs I have encountered, so the thought of having Udacity-style MOOCs as the main teaching vehicle makes me raise an eyebrow.

There are other downsides, the most important one being the lack of personal interaction. Sure, you may be able to post on a course forum, but this is worlds apart from meeting with a faculty member for a face-to-face dicussion. In fact, persoal interactions may be one of the most important aspects of a college education. No, I am not going to pretend that every student you will meet at the graduate level is highly motivated and has an IQ that is off the charts, or that every member of faculty will be a great teacher. However, you will interact with some ambitious and knowledgeable people, and you will learn a tremendous amount from them. This may be in the form of group or partner projects in regular courses, or in research projects, such as your thesis project, which will happen under the supervision of a faculty member, who may be a world-class expert in his field. A side effect is that you build your personal network. In that area, the OMS CS seems to offer very little.

During the OMS CS you will not get the chance to make many personal contacts. Consequently, you will most likely not be able to get decent references, and you won’t produce a thesis. Thus, the OMS CS will be of rather dubious value if you consider pursing a PhD at some point. Your coursework will not count for all that much in your application. Instead, demonstrated research experience is much more important, and good recommendations.

The saving grace for the Georgia Tech-Udacity Online Master of Science in Computer Science therefore seems to be that it allegedly boosts your credentials for work in the industry. Frankly, I am not even convinced that this is the case. If you want to work in a big company, then not having a degree will count heavily against you. However, the difference between an MSc and a BSc is rather negligible. In that regard, I am tempted to say that a coursework-only MSc ranks below a BSc that includes a thesis project. Further, an MSc with a clear research component surely ranks much higher than a coursework-only MSc.

Terminal Master’s programs are simply cash-cows for universities. If you are from Europe and consider it, I would wonder why you just don’t get an MSc at a decent local universtiy for free. In the US, though, the stigma is clearly that a terminal Master’s program is not leading towards research and simply repackages undergraduate courses at a steep price, even though that price tag is reduced for the OSM CS. Normally, the MSc is a consolation price if you decide to, for whatever reason, not continue with a PhD. You were still paid for it. On the other hand, a terminal MSc, particularly a coursework-only degree, signals that you had to pay for what someone else was able to get for free.

To wrap this up, let’s just look at the numbers. Of 8,000 applicants, 3,000 were admitted. Admitting such a high percentage of applicants does not look good. Sure, waive the “widening access” flag all you want. However, the problem remains that at the graduate level you need to have a higher bar, because just a few poor students will worsen the experience for the other students. For an extreme example of this, look at the discussion forums of a typical MOOC! There is so much noise that it’s next to impossible to have a decent discussion. However, if you look at the fact that Georgia Tech charges not only for each coures, but also for each semester you are enrolled, it makes perfect fiscal sense to admit even students who are not likely to graduate. In summary, the OMS CS is certaily great for Georgia Tech and Udacity, but I am not at all convinced that it is a smart choice for students. If you consider this program, then don’t be blinded by the Georgia Tech logo, but instead consider the financial cost of the program in relation to what you will gain. To me it seems like a rather bad deal.

Diceware, Security Theater, and “Women in Tech” Propaganda

I recently came across an article on ArsTechnica that was the epitome of the inane “More XX chromosome carriers in Tech” propaganda that has been infesting this industry for quite some time. Of course, the problem is not that there should be no women in tech, but that there is a celebration of pseudo-achievements that do the industry, the organizations behind this social engineering initiative, and women themselves a disservice.

In the story, entitled This 11-year-old is selling cryptographically secure passwords for $2 each, we learn that Mira Modi the daughter of Julia Angwin, a “veteran privacy-minded journalist” is selling “cryptographically secure passwords”. As you you read the article, you are tempted to conclude that Mira Modo has simply been paraded around by her mother, as she seems to have little insight into what she is actually doing. In the end, the entire story boils down to her mother looking for a cute marketing gimmick, which made her enlist her daughter for selling passwords on her book tour — and now there is a website, too!

The idea behind Diceware passwords is simple: you roll five dice, or one die five times, and then look up the corresponding word in a publicly available list, which pairs a number combination with a string. For instance, if you throw 5-5-6-5-5, your passphrase is “suave”. One word itself is not particularly secure, since it would yield to a simple dictionary attack. Thus, you should repeat this process a few times, and concatenate all phrases you generate this way. There are 6^5 = 7776 words in the Dicewords list, so feel free to exponentiate this number at will. For instance, after six rounds, you end up with one out of 7776^6 possible combinations.

Overall, following this procedure is trivial. Yet, ArsTechnica found plenty to gush about. Mira Modi is quoted with,

I wanted to make it a public thing because I wasn’t getting very much money,” she said. “I thought it would be fun to have my own website.”

Each time an order comes in, Modi rolls physical dice and looks up the words in a printed copy of the Diceware word list. She writes—by hand—the corresponding password string onto a piece of paper and sends it by postal mail to the customer. (Full disclosure: I ordered two.)

That sounds like an amazing business indeed. In fact, she does not even generate enough money to cover the cost of a domain name and hosting. Considering that she has getting free publicity by her mother, it’s not a very impressive business at all. Yet, this does not keep the fawning journalist from demonstrating to the world his naivety:

If she kept busy at it full-time, Modi would be raking in about $12 per hour—fully one-third more than New York state’s $8.75 minimum wage, which is set to go up to $9.00 on December 31, 2015. As of now, she said she’s sold “around 30” in total, including in-person sales.

Oh, so she barely sells any passwords, but if she suddenly managed to sell more passwords in a day than she sold since starting her business, she would beat minimum wage. Needless to say, if she ever had to hire employees, in case her business enters hockey stick growth, she won’t even break even, assuming labor, materials and shipping aren’t for free. I found the numbers amusing, though, because they suggest that it takes her ten minutes to generate a pass phrase. Guess what also takes about ten minutes? Writing a script that does this for you automatically and which can spew out pass phrases until the cows come home. In very straightforward Python, it looks like this (also on Github):

from random import randint

WORDLIST = "diceware.wordlist.asc.txt"
PARTS    = 6

def createPhrase(dct, nums):
    result = ""
    for i in range(0, nums):
        result += dct[throwDice()] + " "
    return result

def throwDice():
    result = ""
    for i in range(0,5):
        result += str(randint(1,6))
    return result

def main():
    with open(WORDLIST, "r") as ins:
        diceDict = {}
        for line in ins:
            (nums, word) = line.split()
            diceDict[nums] = word

    print createPhrase(diceDict, PARTS)

main()

A clear giveaway that little Mira Modi does not really grasp her business, which was arguably all the creation of her mother anyway is that the ArsTechnica article shows the following picture:

Can publicly available information be top secret?
Can publicly available information be top secret?

That’s right, a publicly available list is declared “top secret”.

What I find grating about articles like this is that they celebrate non-achievement. It would be far more impressive to find an 11-year old who is able to write a simple program that would automate this task, and if said 11-year old had a well-connected parent, we would surely see articles in ArsTechnica about them as well, which would be just as misplaced because a modicum of precocity is hardly worthy of a story. This ties into a much bigger problem of today’s education at the primary and secondary level. Pupils are coddled, and don’t really learn to exert real effort. But those who continue their education at a decent university and who do not enroll in a fluffy subject will have a rude awakening. After covering Diceware and hinting at the downsides of today’s education, it’s now time to move on to talk about “security theater”.

I mentioned that a computer program would be able to automate the task of generating pass phrases. The Diceware webpage does not agree, but it is mistaken:

Can I use a computer to generate Diceware passphrases?

Generating truly random numbers using a computer is very tricky. The so-called random number generators that come with most programming libraries are nowhere near good enough. For most users dice is by far a better way to select passphrase words.

This is nonsense. To be more polite, it is an example of FUD. True, the randomness in your machine is not truly random. Using a computer program for generating Diceware phrases, though, is safe for precisely the reason that it is simply implausible that an attacker would be able to recreate and maintain the exact same state (!) your machine was in when you ran a Diceware generator. If an entity was as powerful as that, it surely would find more efficient uses for their resources.

FUD seems to be the forte of Arnold G. Reinhold, the creator of Dicewords, as he gives the following advice:

For maximum security make sure you are alone and close the curtains. Write on a hard surface — not on a pad of paper. After you memorize your passphrase, burn your notes, pulverize the ashes and flush them down the toilet.

This would have worked in the world of James Bond in the 1960s. However, today’s super villains of course have cameras installed everywhere, so you better construct the resulting passphrase entirely in your head. Indeed, once you try them on, tinfoil hats are somewhat comfortable. Let’s get back to reality, though. In case you have not been following the news for a couple of years, you should know that privacy on the Internet does not really exist anymore.

Widespread electronic surveillance is commonplace. For instance, the NSA has direct access to Google, Facebook and Apple, as well as others, though their PRISM program anyway. (Here is an article from The Guardian on PRISM.) Thus, they would not need to “hack” your password. Also, there have been several cases of backdoors. So, despite advances in cryptography, it seems that the intelligence agencies of the world would simply sidestep the issues of passwords altogether by getting access to your data in some other way. This does not mean that all your passwords should be “password”, though. Thus, let’s hope that Mira Modi always draws the curtains before generating Diceware pass phrases and does not communicate with her customers electronically.

Compiling Agda to System Fω in Theory

In early June I submitted the final revision of my Bachelor’s thesis, with the potentially impenetrable title “Compiling Agda to System Fω in Theory”.

I wrote this thesis under the supervision of Andreas Abel (Chalmers University of Technology), who is one of the main Agda implementers.

Abstract:

We develop a theoretical foundation for compiling the programming language Agda to System Fω, which is a stepping stone towards a compiler from Agda to Haskell. The practical relevance for software engineering and the problem of providing correctness guarantees for programs is highlighted. After describing relevant λ-calculi, we specify the semantics for compiling Agda to System Fω. Finally, we illustrate those compilation rules by manually translating several Agda code examples to System Fω.

The full paper is available here:
http://gregorulm.com/files/GregorUlm.BSc.FinalRevision.pdf

Computer Science Resources for Autodidacts

Introduction

My post Replicating a BSc in Computer Science through MOOCs is one of the more popular on this blog. A few people were asking me whether I could recommend some books that are particularly suited for autodidacts. In the following, I’m listing resources I found suitable for self-study, including some particularly suitable online courses (MOOCs) to complement your studies.

I do not want anybody to take my word as gospel, though. I’m going to list resources I found helpful. I have not conducted a systematic literature review but merely went through a number of books until I found one I could work with. I recommend the resources below based on my own experience. There may be better resources out there. So, if you’ve found some other books or online courses helpful, then feel free to comment below. Further, there are a lot of rather poor books and courses, which I’m not going to mention, as I found it more constructive to focus on positive examples.

I’m only focussing on key topics in the undergraduate curriculum, i.e. programming, data structures, discrete mathematics, and algorithms. I may post a follow-up on more advanced resources some time in the future, possibly on functional programming and programming language theory. The time spans I mention are based on the assumption that you have a few hours per day available for self-study. Overall, this study program will take you well over a year to finish, and will require serious dedication.

Imperative and Object-Oriented Programmming

I’d recommend learning Python since it’s easy to get started with it. It’s pretty painful to use for anything moderately complex, though. Downey’s book Think Python: How to Think Like a Computer Scientist is concise and contains plenty of exercises. If possible, go through Rice University’s Coursera course An Introduction to Interactive Programming in Python as well. These two resources alone will have you write several thousand lines of Python code. Four months seem like an adequate time span for these two resources.

Java is more of a means to an end if you want to put a skill on your CV recruiters care about. Once more, I’d recommend Downey. Think Java is likewise concise. Since you’ve learnt Python already, Java should be relatively straightforward to pick up. Therefore, schedule one month for this book.

Data Structures

You should have had exposure to data structures in the previously mentioned resources. To gain further practice, go through both the Java and Python sections on CodingBat. You should complete all exercises on your own. If you struggle, then brush up on the basics of Java and Python.

Two months should be plenty of time for all exercises. There are several hundred on there, but if you do a handful to a dozen every day, you’ll progress quickly. You should get to a point where you can solve those exercises straight away, maybe with the exception of a few of the more elaborate ones.

Discrete Mathematics

Schaum’s Outline of Discrete Mathematics is very good. It’s great to have just for the exercises. If the explanations are too sparse for your taste, then Eric Lehman and Tom Leighton’s lecture notes Mathematics for Computer Science, which are freely available online, will serve you well. For instance, I haven’t come across an explanation of generative functions that was clearer and more accessible than theirs.

For my university course in discrete mathematics I bought Rosen’s Discrete Mathematics and Its Applications, as it was the prescribed course literature. I thought it was lacking in some regards. In particular, I wasn’t too fond of the approach of using several, sometimes convoluted, examples without explaining the underlying principles in the abstract. Schaum’s makes a great companion for it, though.

A book some people recommend is Concrete Mathematics by Graham, Knuth, and Patashnik. In the very first line of the preface of said book you can read that it was written for a course at Stanford that was primarily taken by graduate students. I’m tempted to say that just like it is the case with Knuth’s The Art of Computer Programming, people recommend it because they’ve heard the title somewhere and had the impression that it was an authoritative book.

There is no MOOC on discrete mathematics, as far as I know. The Saylor Academy offers course materials, though. Their exam is a bit on the easy side and therefore a good lower bound to aim for. If you don’t have access to a university course, then using the Saylor Academy materials as a guide, and supplementing them with exercises taken from Schaum’s Outline would be a good strategy. You’ll probably need two to three months of part-time study.

Algorithms

Probably the most accessible algorithms textbook is Algorithm Design by Kleinberg and Tardos. I’ve only studied part of this book in the context of taking Stanford’s Coursera course Algorithms: Design and Analysis. Alternatively, Introduction to Algorithms on MIT OCW seems like a good resource, based on the lectures I watched.

If you can, clear your schedule and take the Algorithms course on Coursera, which is a two-part course sequence. It will take you three to four months of study. I haven’t found the time to take part II yet, though. However, I did have a very good impression of part I, and benefited quite a bit from it.

Review: Introduction to Functional Programming (edX)

I just finished the last problem set in Erik Meijer’s online course Introduction to Functional Programming. This seems like a good opportunity to briefly reflect on it. There aren’t a lot of functional programming MOOCs available. I’m only aware of two Coursera courses, one on FP in Scala, and another on FRP in Akka. While Erik Meijer repeatedly made the point that his course was not on Haskell but on FP in general, there most certainly was a strong focus on exploring functional programming with Haskell.

The recommended course literature was Graham Hutton’s Programming in Haskell, which is incidentally the same book I used when I took a similar course at university. As far as programming-related textbooks go, Hutton’s book is among the best, as he explains topics concisely, and poses carefully selected exercises to the reader. It’s the exact opposite of your typical Java or Python textbook, or the, in my opinion, highly overrated “Learn you a (Haskell|Erlang) for Great Good” books, but that may be a topic for another article.

If you just used the textbook, you’d be well-prepared for the homework exercises and labs already. Still, I enjoyed Erik Meijer’s presentation, and his sometimes quirky remarks, such as that he wants his students to “think like a fundamentalist and code like a hacker”. In special “jam sessions” he demonstrated functional programming concepts in other languages, such as Scala, Dart, Hack, and Kotlin. What I also liked was that some of the labs were offered in several programming languages. The very first lab was offered in Haskell, Groovy, F#, Frege (!), and Ruby, for instance, which led me to playing around with some new languages.

This course is certainly, for the most part, comparable to a university-level course in functional programming. I do have some gripes with the form of the assessment, though. For instance, a common type of question asked you to indicate which of a given number of alternative function definitions were valid. Sometimes the code was obfuscated, and since you couldn’t just copy and paste it, it could easily happen that a GHCi error message was due to a mistake you made while copying the program. This might not have been a problem if those questions had been rare, but because there were so many of them, the tedium was palpable. In later weeks I skipped those questions since I saw very little educational value in them.

Further, the labs were a bit too straight-forward for my taste, but that may be a limitation of the MOOC format. The advice “follow the types” was repeated like a mantra. It is of course a good idea to use type signatures as a guide. However, being given a template that contains correct type signatures and that only requires one to write a few lines of code — if I’m not mistaken, in some weeks the labs required just about a dozen lines of Haskell in total — seems partly misguided. Obviously, it is much more difficult to design a program yourself, and define the correct type signatures. Merely filling in function definitions, on the other hand, is somewhat akin to painting by numbers. It might therefore be a good idea to add a peer-reviewed project to this course in its next iteration. My experience with peer-review on MOOCs is mixed, but it’s better than nothing. After all, the theory behind FP is sufficiently covered. It’s just that the course doesn’t require writing a lot of code, which could only be excused on the labs that focus on type classes and monads.

Overall, Introduction to Functional Programming is a very good course. However, if you’re taking it as a novice, you might want to do the exercises in Hutton’s book in order to get more practice with programming in Haskell.