
Summer Internship Opportunity at the Fraunhofer-Chalmers Centre in Gothenburg, Sweden

The Systems and Data Analysis department at the Fraunhofer-Chalmers Centre for Industrial Mathematics intends to hire two students for a summer internship. The starting date is flexible; the salary is competitive for Sweden. However, note that there is no relocation assistance. You would work under my supervision, assisting in the further development of a prototype of a distributed system that I built from scratch.

Here is the job ad:


Become a Summer Intern at the Fraunhofer-Chalmers Centre for Industrial Mathematics!

The Fraunhofer-Chalmers Research Centre for Industrial Mathematics (FCC) offers Software, Services and Contract Research for a broad range of industrial applications. Modelling, Simulation and Optimization of products and processes can boost technical development, improve efficiency and cut costs of both large and small businesses. Since 2001, our highly skilled team of mathematicians and engineers has successfully solved problems for more than 170 clients. We combine consultancy services with innovative research and development based on a wide spectrum of competences.

We are looking for two ambitious students with backgrounds in computer science or related fields to assist in an ongoing applied research project in the Systems and Data Analysis department. You will contribute to the further development of a distributed system in an Internet of Things context. There are several areas you could get involved in. Ideally, you have gained experience in two of the following:

- C programming
- Erlang programming; alternatively experience in any other programming language that supports the actor model
- design and implementation of embedded domain-specific languages
- front-end development (HTML5)

Your ideal profile:
- Chalmers student at the Master's level, preferably in the penultimate year
- Pursuing a degree in Computer Science or a similar field
- Previous work experience in the software industry or as a student research assistant
- Ability to work independently, with weekly supervision

If you maintain a personal code repository (GitHub, GitLab, Bitbucket, etc.), please highlight this in your application. If you have other samples of work to show, such as a portfolio of projects on a blog or personal website, we would be keen to have a look.

This internship is a full-time fixed-term position for six weeks. The starting date is flexible.

Contact persons:

Mats Jirstrand, Head of Department
mats.jirstrand@fcc.chalmers.se, 031-772 42 50

Emil Gustavsson, Applied Researcher/Data Scientist
emil.gustavsson@fcc.chalmers.se, 031-772 42 92

Gregor Ulm, Research and Development Engineer
gregor.ulm@fcc.chalmers.se, 031-772 42 71

Please send your application, marked "Summer Intern SYS", consisting of a cover letter, CV, and a current academic transcript, to recruit@fcc.chalmers.se.

Interviews will be held on a rolling basis. Please apply as soon as possible.

www.fcc.chalmers.se

Latency and Throughput in Center versus Edge Stream Processing

Earlier this year I finished my Computer Science Master's thesis project at Chalmers University of Technology in Gothenburg, Sweden, titled “Latency and Throughput in Center versus Edge Stream Processing: A Case Study in the Transportation Domain”. The project report as well as most of the code are available in my GitLab repository msc-thesis-streamprocessing.

Abstract:

The emerging Internet of Things (IoT) enables novel solutions. In this thesis report, we turn our attention to the problem of providing targeted accident notifications in near real-time. We use traffic data generated by Linear Road, a popular benchmark for stream processing engines, and simulate a possible real-world scenario in which connected cars continuously send position updates. We analyze this stream of position updates with the goal of identifying accidents, so that targeted accident notifications can be issued. This means that only cars within a certain distance of a known accident site will be notified.

In a real-world scenario, the required data analysis could be performed in different ways. We consider two possibilities. First, position reports are aggregated by roadside units (RSUs) and forwarded to a central server. Afterwards, the results are sent back to the cars, again involving RSUs for transmission. We refer to this as center stream processing. Second, all data analysis is performed on RSUs. An RSU is less powerful than a server. However, RSUs are located much closer to the cars than a central server. We refer to this case as edge stream processing. Performing computations directly on RSUs has the benefit that the cost of the round-trip for data transmission from the RSUs to the server and back is avoided. We use a contemporary stream processing engine for data analysis, and compare the latency and throughput of an implementation of our solution to the accident notification problem in both cases.
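
To make the detection task a bit more concrete, here is a minimal, engine-agnostic Python sketch of the kind of rule commonly used with Linear Road data: a car counts as stopped once it reports the same position several times in a row, and an accident is assumed wherever at least two cars are stopped at the same spot. The thresholds, data types, and function names below are illustrative assumptions; the actual implementation in the thesis runs on a stream processing engine and is documented in the report.

    from collections import defaultdict, deque

    STOPPED_AFTER = 4      # consecutive identical position reports (assumed threshold)
    MIN_STOPPED_CARS = 2   # stopped cars needed at one position to infer an accident

    recent_positions = defaultdict(lambda: deque(maxlen=STOPPED_AFTER))
    stopped_at = defaultdict(set)  # position -> ids of cars currently stopped there

    def process_position_report(car_id, position):
        """Consume one position update; return the accident site, if one is detected."""
        history = recent_positions[car_id]
        history.append(position)

        # A new report means the car may have moved, so clear its old "stopped" mark.
        for cars in stopped_at.values():
            cars.discard(car_id)

        if len(history) == STOPPED_AFTER and len(set(history)) == 1:
            stopped_at[position].add(car_id)
            if len(stopped_at[position]) >= MIN_STOPPED_CARS:
                return position  # notify only cars approaching this position
        return None

    # Example: car 7 stops, then car 9 stops at the same position.
    for _ in range(4):
        process_position_report(7, (1, 52))
    for _ in range(3):
        process_position_report(9, (1, 52))
    print(process_position_report(9, (1, 52)))  # prints (1, 52): accident detected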

PLC Factory – Automating Large-Scale PLC Development

I spent this summer working at the European Spallation Source (ESS) in Lund, Sweden. My contribution was the creation of PLC Factory, a tool that automates development for programmable logic controllers (PLCs). A paper on this project is forthcoming.

The code of PLC Factory is available on the Bitbucket account of ESS. Since PLC Factory was developed as FOSS, I can make the code I wrote available on my personal GitLab account as well.

The draft of the PLC Factory paper is likewise available on GitLab. Here is the full title information including the abstract:

PLC FACTORY: AUTOMATING ROUTINE TASKS IN LARGE-SCALE PLC SOFTWARE DEVELOPMENT

Authors:
G. Ulm, D. Brodrick, N. Levchenko, F. Bellorini

Abstract:
At the European Spallation Source in Lund, Sweden, the entire facility, including all its instruments, will be controlled by a large number of programmable logic controllers (PLCs). Programming PLCs, however, entails a significant amount of repetition and is thus an error-prone and time-consuming task. Given that PLCs interface with hardware, there is an economic aspect as well, since programming errors may cause damage to equipment. With PLC Factory, we managed to automate repetitive tasks associated with PLC programming and with interfacing PLCs from EPICS. This tool is being adopted at ESS and has shown potential for a large increase in productivity compared to the status quo. We describe PLC Factory as well as the embedded domain-specific programming language PLCF# it is built upon.
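
PLCF# itself is specified in the paper and the repository linked above; the snippet below is merely a hypothetical Python illustration of the general idea behind template-driven code generation for PLCs, namely expanding one declarative description per device into otherwise repetitive Structured Text blocks. Neither the template syntax nor the device properties shown here are taken from PLC Factory.

    # Hypothetical illustration only; this is not PLCF# and not PLC Factory's output format.
    from string import Template

    FUNCTION_BLOCK_TEMPLATE = Template(
        "FUNCTION_BLOCK FB_${device}\n"
        "VAR\n"
        "  ${device}_status : INT;\n"
        "  ${device}_setpoint : REAL := ${setpoint};\n"
        "END_VAR\n"
        "END_FUNCTION_BLOCK\n"
    )

    # One entry per device; in a real tool this would come from a device database.
    devices = [
        {"device": "Pump01", "setpoint": "3.5"},
        {"device": "Pump02", "setpoint": "4.0"},
    ]

    # Expanding a template per device replaces the copy-and-paste step that makes
    # manual PLC programming repetitive and error-prone.
    for props in devices:
        print(FUNCTION_BLOCK_TEMPLATE.substitute(props))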

Greed has ended the promised MOOC revolution

Introduction

In this post, I will critically reflect on the MOOC phenomenon. I will briefly retell its recent history, discuss the providers' shift towards monetization as well as the cynical behavior that emerged in this context, and conclude with my view on the value proposition of MOOC providers. I will start with my personal background, as it relates to MOOCs.

I have completed over 40 MOOCs within roughly the last three years. My main focus was on computer science and related subjects, but I also took a substantial number of courses in other disciplines. Those courses were a great complement to the software engineering Bachelor’s program I was studying during that time. Initially, I was a great supporter of MOOCs, despite quickly noticing their shortcomings, such as automatic grading and peer reviews. I even readily agreed to be a Community TA for one course at Coursera in early 2014, in order to give something back to the community. These days, though, I would no longer volunteer for any for-profit MOOC provider, after seeing what the major players Coursera, edX, and Udacity have turned into.

A brief history

Stanford offered three university courses online in 2011, including a very popular Artificial Intelligence class by Peter Norvig and Sebastian Thrun. The astounding public response was arguably fueled by the prospect of getting access to the same materials as Stanford students, sans personal interaction with teachers and teaching assistants, and earning a PDF certificate as proof of your attendance. Online courses had been available for a long time, though. The most prominent provider was MIT with their OpenCourseWare (MIT OCW) platform. The downside of that platform is that if you wanted to recreate the learning experience of an MIT OCW course, you would have to grade your own assignments and exams, which is of course a ludicrous proposition. Thus, as an autodidact, you were hardly better off than if you had worked through a textbook at your own pace. At least in computer science and mathematics there are a lot of high-quality textbooks available, which often include answers to at least some exercises.

MOOCs seemed to want to improve the lives of autodidacts. Indeed, a very large part of the early adopters were people who had already completed college degrees. A lot of the earlier courses were furthermore reasonably rigorous. There were shortcomings with regard to the assignments, since those tended to rely heavily on multiple-choice questions, which simplifies assessment. More extensive projects were graded by your peers, which is a far cry from having a teaching assistant provide personal feedback, as is the case at university. Interactive exercises are great to have, as you get immediate feedback. In short, if you went through a MOOC, you got a pretty decent experience, as long as you took courses that followed textbooks, provided extensive lecture notes, and covered subjects that do not require specialized equipment.

Going sour

In 2012, MOOCs seemed to be too good to be true. Sure, they were hardly perfect, and the limitations seemed rather substantial. Yet it looked as though all of those drawbacks could be solved, and some of them have been. Particularly edX seemed to focus on high-quality courses, aiming to get as close to the classroom experience as possible. In some of the courses I took, even course books were made available for free, albeit in a cumbersome interface. If you find an edX course offered by MIT that has a counterpart on MIT OCW, you may find that the content on edX is in no way watered down. I probably should not make a hasty generalization, though, so let me qualify that statement: the few courses I compared on both platforms seemed more or less equivalent.

I was particularly impressed by some of the innovations at edX. For instance, some MITx courses offer the option of feedback from an MIT teaching assistant, for a substantial fee, of course. In some of the computer science courses, the assignments are fairly substantial, going beyond what I have seen at Coursera or Udacity. I have not encountered any complex projects, though, as they would need more guidance than current MOOC interfaces can provide. There are open challenges, like administering group projects, which are common in CS education; I am not aware of any attempt to replicate this in a MOOC.

Of the three MOOC providers I am most familiar with, Coursera, Udacity, and edX, it seems that only edX is interested in reaching university standards, with the aforementioned limitations, of course. Udacity has gone the trade-school route, offering "nano degrees" in specialized subjects like mobile app development or web development. The Udacity courses I have taken were okay, but I would not at all be interested in paying for them, as they had too many drawbacks and seemed very much focused on applying knowledge instead of building fundamental understanding.

Coursera has been undergoing a rather dramatic shift. At first, it seemed that they were aiming for academic excellence. I liked their focus on the sciences, and in particular their broad offering in computer science. It was unfortunate that many courses would only run once a year. This problem is currently being resolved, considering that a lot of courses on Coursera have been remodeled into self-paced versions. That is the only positive change I can see on that platform, though. Otherwise, I am deeply disappointed. While there used to be a fair number of standard-length courses that were clearly based on their campus counterparts, the current situation is quite different. There have been instances where single courses were broken up into several much shorter ones and repackaged as a "specialization".

Of course, innovation comes with a price tag at Coursera. I will talk about the value proposition of MOOCs a little later. For now, let me only state that Coursera seems to have taken some inspiration from the kind of sites that pop up on your screen if you don't suppress ads in your web browser. You know, those with long and fancy sales letters and bullet points that reiterate what a great deal their dubious product is. Coursera's Data Science specialization is currently available for $470 and consists of not one but a staggering ten courses. However, those ten courses in no way correspond to ten proper university courses. In fact, they used to be one or two longer courses. This reminds me of bloggers who want to sell you a bundle of ebooks, where each "book" contains 20 to 30 pages with just a few dozen words per page.

Lastly, edX has not been inactive on the monetization front either. While at first it seemed the focus was on standard-length courses and adding a price tag to them, nowadays you do see quite a few Coursera-style offerings as well, with "XSeries" programs consisting of courses that are each four weeks long. Again, as we have seen with Coursera, splitting a 10- or 12-week course into three courses seems to have been done with an eye towards profit maximization and possibly the intention to deceive customers. Thus, keep in mind that most if not all of these course programs in no way reflect the pacing of a typical university course.

The value proposition

An appealing aspect of MOOCs was that you were rewarded with a PDF certificate that indicated completion of the course. This was hardly comparable to getting a new entry on your university transcript after passing a proctored exam at a brick-and-mortar institution. Still, it provided some evidence that you had studied the course materials. Cheating happens at universities, too, after all.

I never saw any value in paying for a "verified" certificate. Apparently, this sentiment was shared by a large enough number of other MOOC users. Apart from the dubious value proposition, there was also an aesthetic objection: I considered the design of the non-free certificates at both Udacity and edX, with their tacky ribbons and huge logos, far less pleasing than the much simpler free certificates. Demand for verified certificates apparently wasn't what it should have been, considering that the free alternative was infinitely better value for money. Thus, the three major MOOC providers no longer offer free certificates. Udacity was the first to drop them, in early 2014, because they wanted to "maximize the learning outcome for our students". Their first instructor, Dave Evans, who gave the Introduction to Computer Science class, disagreed with that decision and for a while apparently issued certificates himself, until Udacity told him to stop. Coursera followed suit in October 2015, but everything is fine because they "remain fully committed to [their] financial aid program". Apparently, merely dropping free certificates did not have the desired effect on autodidacts, so nowadays you cannot even access assignments without paying their "small fee". This makes their computer science courses useless for anybody who does not want to part with their money. In December, edX apparently did not want to feel left out, so they had to "ensure that edX certificates carry the merit our learners deserve", and axed the free honor code certificates, which had previously been praised as a great motivation on their homepage. Why can't they all just openly say that they want to improve their bottom line, maybe not in those words, instead of making such bullshit statements?

Now that you have to pay if you want a certificate, one has to ask what the value proposition of MOOCs is. It is of course very cynical that the big providers still claim to offer "free" education when, particularly in the case of Coursera, the freely available content is heavily restricted. Improving the value proposition by eliminating the free option, while still claiming to offer "free" education, is quite something. Further, edX amuses me by asking for donations while pestering me to pay for certificates. At brick-and-mortar universities, they at least let you graduate before you get calls, emails, and letters from the fundraising office.

From my perspective, as a European who is used to high-quality education for a very low cost or entirely for free, the suggestion of paying for a lesser product is dubious at best. Also, I don't look particularly favorably on the deception by Coursera and edX of splitting up regular courses into several smaller ones, which primarily serves to mislead the customer. Sock puppets would of course respond that caveat emptor applies. Scammers use the same defense. So, let's do some basic math. As of today, I have completed 43 MOOCs. Some of those courses have in the meantime been split into several smaller ones, so let's say my 43 courses are equivalent to 50 "courses" as they are currently offered. Those courses are by no means equivalent to a college degree, but I would not be surprised if you could get a BA in information technology at a second-tier university for less effort. Coursera and edX have several price levels, ranging from $49 at the low end all the way to $99 and $149 per course. Let's be generous and assume an average price of $89 for a verified certificate. This comes to roughly $4,500 in total, for a bunch of PDFs! To put this sum into perspective: for a comparable amount of money you can cover the cost of a high-quality and respected distance-learning degree from the University of London.
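
For transparency, here is the back-of-envelope calculation behind that figure. Only the price tiers come from the providers; the 50-course equivalence and the $89 average are my own rough assumptions.

    # Rough cost estimate from the paragraph above; assumptions, not provider data.
    # 43 completed MOOCs correspond to roughly 50 courses as they are now packaged.
    equivalent_current_courses = 50
    assumed_avg_certificate_price = 89   # USD, somewhere between the $49 and $149 tiers

    total_cost = equivalent_current_courses * assumed_avg_certificate_price
    print(total_cost)  # 4450, i.e. roughly $4,500 for a stack of PDFs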

In the end, greed won. Of course you can turn this around and call me "greedy" for taking so many MOOCs. Well, here is the kicker: any non-technical "certificate" has very little value to begin with, apart from possibly indicating that you have an interest in a particular subject. Thus, its value is approximately zero. The value of an arts degree from a brand-name university comes primarily from the brand name of the university, and to a much lesser degree from what they teach you. Conversely, the value of a technical education, even at a no-name institution, is primarily due to the knowledge you acquire. With a technical degree from a second-rate institution you will have a hard time getting into Wall Street or this year's hot Internet companies, but you will find employment relatively easily, unlike people with a non-technical background, who have far worse career prospects. This means that a MOOC certificate, even one for marketable skills, is relatively useless, since the skill matters more than the certificate.

The fact that the big three MOOC providers abolished free certificates seems to clearly indicate that they were unable to sufficiently differentiate paid-for certificates from free ones. Thus, by reducing the options of their users, they hope to increase their revenue. As I wrote before, the marketing and presentation are becoming misleading if not shady, for instance when one course is repackaged as many, suggesting greater value than is delivered. The new target customer is far from the old one. No longer are autodidacts an interesting clientele. Instead, the ill-informed are the new target audience, people who may salivate at the thought of getting a Yale or Harvard credential for a relatively modest price, possibly not realizing that they are about to acquire a useless product.

Retrospective: Software Engineering Summer Internship (2015) at Jeppesen Systems AB in Gothenburg

Introduction

Jeppesen is a fully owned subsidiary of Boeing and part of their Digital Aviation business. Concretely, they provide navigational information and crew and fleet management solutions, and offer an industry-leading optimization product. The Gothenburg office originally started as Carmen Systems, which was spun off from Chalmers University of Technology. Ties between Chalmers and Jeppesen are still strong. The Gothenburg office of Jeppesen employs around 350 people, of whom around 300 are technical staff. Further, there are around 60 contractors in the building.

Jeppesen was not an entirely new entity to me. I had been mentored by one of their managers for about a year, through whom I met several other Jeppesen employees, including a veteran software architect with a very strong technical background and an experienced software developer. Further, I was one of the winners of a coding competition they held this spring, and one of five students who were invited to what turned out to be an intimate recruiting event, where we were outnumbered by Jeppesen managers and employees. Several managers presented their divisions, the challenges they were working on, and, of course, the kind of skills they are looking for in future employees. Alas, back then they did not have any openings, but we were told that a number of full-time positions would open up in autumn, and possibly some summer internships as well. Both turned out to be the case.

The interview

I applied for a data processing project with a parallel programming component, which I will describe in more detail below. But before I got to do any work, I had to make it through the interview. Job interviews are often cringe-worthy, particularly the "your greatest weakness" kind. The interview experience at Jeppesen was distinctly positive, though. The hiring manager briefly came in to shake hands and exchange some niceties, but otherwise the interview was conducted by two senior engineers. They described the project, how they work, and the technologies they use.

What I greatly appreciated was that they were looking for competence instead of buzzwords. We even talked a bit about the necessary computer science background for their project. It was helpful that I had heard of recursive descent parsing, had written a parser for a subset of C++, and had implemented a compiler for a C-like language. Normally, a coding test is also part of the interview process, but I presume this was skipped due to my performance in their recent coding challenge. Still, they asked me a few "design" questions, like how I would approach this or that problem, or the pros and cons of concrete approaches to given problems. Thankfully, it was more about how to break down a complex task into manageable subproblems, instead of the dreaded UML and design patterns interview.

The project

Of course I signed an NDA and can therefore not be too specific, so I am only going to describe the problem in general terms. Let's say your software runs on dozens or possibly even hundreds of machines at a customer site. Each copy of the program produces partly unstructured log files, recording all kinds of things. In order to extract useful information from those log files, you have to run a parser that understands the structure of the files and extracts the relevant information, which may then be processed further. To make this even more fun, let the client machines run around the clock and update their log files continually. This sounds fairly complicated already. However, keep in mind that you have more than one customer and far more than one parser, and that the parsers are currently executed one after another, in sequential order, via a cron job once a day, for historical reasons, perhaps because the programs predate the many-core CPU era.

The goal of the project I was given was to write a framework that parses all log files, on the order of tens of thousands of them, in pseudo-real time, so that you can extract more useful information from the existing data. For instance, if you take a metric like 'active instances' only once a day, you will learn a lot less than if you are able to update this metric every few minutes.

Concretely, the framework I developed unified the existing parsers for several Jeppesen products. Understanding and modifying the existing parsers required a little bit of background in programming languages. This was pretty fun, but the most interesting aspect was finding ways to effectively parallelize the execution of the parsers. The hardware at our disposal was pretty decent, and running a single-threaded program on a many-core CPU is a bit of a waste, to put it mildly. My approach was to avoid all shared state; once that problem was solved, the actual parallelization was easy to achieve. The machine I was mostly working on was an Intel Xeon with 16 cores and over 100 GB of RAM, which was far from the most powerful machine I had access to. Even on that one, we easily hit our performance target.
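
As an illustration of that approach, here is a minimal Python sketch, under the assumption that each parser invocation is a pure function of its input file: with no shared state between invocations, a process pool can fan the work out across all cores. The directory layout and the toy parse function are placeholders, not the actual Jeppesen framework.

    from multiprocessing import Pool
    from pathlib import Path

    def parse_log_file(path):
        """Parse a single log file; shares no state with other invocations."""
        records = []
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if "ERROR" in line:          # stand-in for a real parser/grammar
                    records.append(line.strip())
        return path, records

    if __name__ == "__main__":
        log_files = sorted(Path("logs").glob("**/*.log"))  # placeholder location
        with Pool() as pool:                 # defaults to one worker per CPU core
            for path, records in pool.imap_unordered(parse_log_file, log_files):
                print(path, len(records))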

In my last two weeks, I had plenty of time to ensure the documentation was comprehensive. At the same time, my main supervisor put the project successfully into production at two customer sites, both major European airlines. At both sites we encountered minimal issues that were easily resolved. In one case, the volume was much greater than the fairly large test data directory I had been working with, so I needed to use a more efficient data structure, and make use of memoization at one point in order to resolve a bottleneck. At the other customer site, a slightly different naming convention for some files was used, which was an even easier fix. My last update was that this product has been successfully rolled out to two more customers.
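
The memoization mentioned above was essentially the textbook technique of caching the result of a repeated, expensive lookup; here is a hypothetical Python sketch, not the actual code:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def resolve_source_metadata(source_id):
        """Stand-in for an expensive lookup that used to run once per log line.

        With the cache in place it runs only once per distinct source_id.
        """
        return {"source": source_id, "normalized": source_id.lower()}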

The software development process

While Jeppesen is officially committed to "Agile", they nonetheless have a strong engineering culture, which means that they use software processes not for their own sake but where there is a clear benefit. For the first few days on the job, I was expected to do the daily "stand-up" song and dance, report on everything I had done and what I intended to do that day, and give various estimates about the next few days. Personally, I found this more annoying than helpful. After I showed a proof of concept of the product within a few days, instead of the planned two weeks, my supervisors concluded that I could apparently work well on my own and scrapped the daily meetings. Instead, we had the occasional demo session, or just casually discussed progress during company breakfast.

I did like that, after proving myself, I was given relatively free rein with regard to the development, while of course still conforming to the overall vision of the product, which I had received at the beginning in the form of a succinct document. The eventual software development process was somewhat reminiscent of iterative and incremental development.

The work environment

Based on my experience at a previous employer, and from various company visits, I would say that Jeppesen has a clearly above-average office environment. It is not uncommon for three or four employees to share a reasonably spacious office. However, there are also several open-plan office areas, the kind that is claimed to foster cooperation and communication, even though such spaces are "damaging to the workers' attention spans, productivity, creative thinking, and satisfaction". Well, Joe Programmer, no matter where in the world he works, sadly does not command enough status to get his own office. Thus, companies are, quite literally, squandering money by not giving people access to (genuinely) quiet working environments.

Still, thanks to artificial dividers, plants, and a reasonable amount of space, the open-space areas at Jeppesen are quite okay. I was sitting in one of the larger ones in the building, among around 15 employees, all fairly well spread-out. The main office hours are a bit busy, with infrequent distractions. However, if you come in very early or stay late, you can enjoy a quiet working environment and get a lot done. During the busier hours of the day, you may notice decreased productivity, compared to, say, intently working on something in the privacy of your home.

A few perks are offered, too. Every morning there is a free basic breakfast, which is also a great opportunity to talk to people outside of your department. Further, bowls of fruit are placed all over the building. This being Sweden, there is also a strong tradition of having 'fika', i.e. a short break for coffee and pastries. It is also noteworthy that a masseuse visits the office at regular intervals; you can sign up at reception and have the cost deducted from your pay. Regular employees are even entitled to a wellness allowance ('friskvård') for that purpose, or for other health-related expenditures.

Working hours are quite flexible. You are expected to put in 40 hours per week. As a rank-and-file employee you have to be present during core business hours (9:00 to 15:00), but you do have some flexibility otherwise. For instance, taking the occasional long lunch break, and staying longer some other days is certainly something you can discuss with your manager, just like taking half a day off and putting in a few more hours on some other days to make up for it. Overall, they try to achieve a good work/life balance.

Summary

This was a very enjoyable internship, and a summer well-spent. The end result was that I wrote, relatively independently, a medium-sized application weighing in at thousands of lines of code, all tested and well-documented. It was a good mixture of intellectual challenge and more mundane practical aspects. In the beginning, intellectual challenges were more dominant, like the entire design of the product, or figuring out effective parallelization. Later on, the focus was much more on polishing the code base, and eventually deploying it. The latter was not quite as mentally stimulating, but on the other hand it was immensely gratifying to see the project hitting or exceeding all performance targets, as well as its successful deployment.