Contraction Clustering (RASTER): paper and reference implementations

One of the results of my work at the Fraunhofer-Chalmers Research Centre for Industrial Mathematics (FCC) was the discovery of a very fast special-purpose linear-time clustering algorithm, Contraction Clustering (RASTER). I presented our work at the Third International Conference on Machine Learning, Optimization and Big Data (MOD 2017) in Volterra, Italy, earlier this month.

At FCC we decided to create reference implementations in a number of programming languages. You will find reference implementations of RASTER in Python, Erlang, Haskell, and Scala in my GitLab repository. Note that this is a mirror of the official FCC source code release.

Our paper will appear in the Proceedings of MOD 2017, which are part of the Springer Lecture Notes in Computer Science, early next year. The submitted self-archived manuscript of our RASTER paper is likewise available on my GitLab account.

Accepted Paper for IFL 2017: Purely Functional Federated Learning in Erlang

My current work at the Fraunhofer-Chalmers Research Centre for Industrial Mathematics focuses on distributed data analytics in a large-scale industrial setting. We are closely collaborating with Volvo Cars and Volvo Trucks. As part of my work, I explored the suitability of Erlang for distributed machine learning tasks. I wrote up a draft paper on my implementation of Federated Learning in Erlang, which got accepted to IFL 2017, the 29th symposium on Implementation and Application of Functional Languages. It will take place in Bristol, UK, from 30 August to 1 September, 2017. The abstract is reproduced below.


Purely Functional Federated Learning in Erlang

Authors: Ulm, Gregor; Emil Gustavsson, Mats Jirstrand

Arguably the biggest strength of the functional programming language Erlang is how straightforward it is to implement concurrent and distributed programs with it. Numerical computing, on the other hand, is not necessarily seen as one of its strengths. The recent introduction of Federated Learning, a concept according to which edge devices are leveraged for decentralized machine learning tasks, while a central server only updates and distributes a global model, provided the motivation for exploring how well Erlang was suited to such a use case. We present a framework for Federated Learning in Erlang, written in a purely functional style, and compare two versions of it: one that has been exclusively written in Erlang, and one in which Erlang is relegated to coordinating client processes that rely on performing numerical computations in the programming language C. Initial results are promising, as we learnt that a real-world industrial use case of distributed data analytics can easily be tackled with a system purely written in Erlang.
The novelty of our work is that we present the first implementation of a Federated Learning framework in a functional programming language, with the added benefit of being purely functional. In addition, we demonstrate that Erlang can not only be leveraged for message passing but also performs adequately for practical machine learning tasks.

Accepted Paper for MOD 2017: Contraction Clustering (RASTER): A Big Data Algorithm for Density-Based Clustering in Constant Memory and Linear Time

My first big project at the Fraunhofer Chalmers Research Centre for Industrial Mathematics centered around processing huge geospatial data sets. The industrial partner was Scania. The main part of my contribution consisted of generating labelled evolving network graphs that make it possible to trace the routes of vehicles over time. During that work, I discovered a linear-time clustering algorithm (Contraction Clustering, which is itself contracted to RASTER) for detecting clusters with dense centers. Those centers are nodes in the aforementioned graphs. The key novelty of my work is the exploitation of domain knowledge, which facilitates clustering with just one pass through the entire input, neighborhood lookup in O(1), and constant memory requirements.

My paper on RASTER got accepted to MOD 2017, which will take place from 14 to 17 September in Volterra, Italy. The abstract of my paper is below.

Contraction Clustering (RASTER): A Big Data Algorithm for Density-Based Clustering in Constant Memory and Linear Time

Authors: Ulm, Gregor; Emil Gustavsson, Mats Jirstrand

Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering methods are infeasible due to their memory requirements or runtime complexity. Contraction Clustering (RASTER) is a linear-time algorithm for identifying density-based clusters. Its coefficient is negligible as it depends neither on input size nor the number of clusters. Its memory requirements are constant. Consequently, RASTER is suitable for big data applications where the size of the data may be huge. It consists of two steps: (1) a contraction step which projects objects onto tiles and (2) an agglomeration step which groups tiles into clusters. Our algorithm is extremely fast. In single-threaded execution on a contemporary workstation, it clusters ten million points in less than 20 seconds when using a slow interpreted programming language like Python. Furthermore, RASTER is easily parallelizable.

Accepted Paper for ICALEPCS 2017: PLC Factory: Automating routine tasks in large-scale PLC software development

My paper on the application PLC Factory, of which I wrote at the European Spallation Source in Lund, Sweden, in 2016, got accepted to ICALEPCS 2017. This year’s ICALEPCS conference will take place in Barcelona, Spain, from 8 October to 13 October 2017. The abstract is below. The final revision of the paper is available on my Gitlab repository.

PLC FACTORY: AUTOMATING ROUTINE TASKS IN LARGE-SCALE PLC SOFTWARE DEVELOPMENT

Authors:
G. Ulm, F. Bellorini, D. Brodrick, R. Fernandes, N. Levchenko, D. Piso Fernandez

Abstract:
At the European Spallation Source in Lund, Sweden, the entire facility including all its instruments will be controlled by a large number of programmable logic controllers (PLCs). Programming PLCs, however, entails a significant amount of repetition. It is thus an error-prone and time-consuming task. Given that PLCs interface with hardware, this involves economic aspects as well, due to the fact that programming errors may cause damage to equipment. With PLC Factory, we managed to automate repetitive tasks associated with PLC programming and interfacing PLCs from EPICS. This tool is being adopted at ESS, and has shown potential for a large increase in productivity compared to the status quo. We describe PLC Factory as well as its embedded domain-specific programming language PLCF#, which it is built upon.

Summer Internship Opportunity at the Fraunhofer-Chalmers Centre in Gothenburg, Sweden

The Systems and Data Analysis department at the Fraunhofer-Chalmers Centre for Industrial Mathematics intends to hire two students for a summer internship. The starting date is flexible; the salary is competitive for Sweden. However, note that there is no relocation assistance. You would be working under my supervision and assist in the further development of a prototype of a distributed system, which I built from scratch.

Here is the job ad:


Become a Summer Intern at the Fraunhofer-Chalmers Centre for Industrial Mathematics!

The Fraunhofer-Chalmers Research Centre for Industrial Mathematics (FCC) offers Software, Services and Contract Research for a broad range of industrial applications. Modelling, Simulation and Optimization of products and processes can boost technical development, improve efficiency and cut costs of both large and small businesses. Since 2001, our highly skilled team of mathematicians and engineers has successfully solved problems for more than 170 clients. We combine consultancy services with innovative research and development based on a wide spectrum of competences.

We are looking for two ambitious students with backgrounds in computer science or related fields to assist in an ongoing applied research project in the Systems and Data Analysis department. You will contribute to the further development of a distributed system in an Internet of Things context. There are several areas you could get involved in. Ideally, you have gained experience in two of the following:

- C programming
- Erlang programming; alternatively experience in any other programming language that supports the actor model
- design and implementation of embedded domain-specific languages
- front-end development (HTML5)

Your ideal profile:
- Chalmers student at the Master's level, preferably in the penultimate year
- Pursuing a degree in Computer Science or a similar field
- Previous work experience in the software industry or as a student research assistant
- Ability to work independently, based on supervision on a weekly basis

If you maintain a private code repository (Github, Gitlab, Bitbucket etc.), then please highlight this in your application. If you have other samples of work to show, such as a portfolio of projects on a blog or private website, we would be keen to have a look.

This internship is a full-time fixed-term position for six weeks. The starting date is flexible.

Contact persons:

Mats Jirstrand, Head of Department
mats.jirstrand@fcc.chalmers.se, 031-772 42 50

Emil Gustavsson, Applied Researcher/Data Scientist
emil.gustavsson@fcc.chalmers.se, 031-772 42 92

Gregor Ulm, Research and Development Engineer
gregor.ulm@fcc.chalmers.se, 031-772 42 71

Please send your application, marked "Summer Intern SYS", consisting of a cover letter, CV, and a current academic transcript, to recruit@fcc.chalmers.se.

Interviews will be held continually. Please apply as soon as possible.

www.fcc.chalmers.se