Internship Opportunity at the Fraunhofer-Chalmers Centre in Gothenburg, Sweden

The Systems and Data Analysis department at the Fraunhofer-Chalmers Centre for Industrial Mathematics intends to hire a student for an internship. The starting date is flexible, but the end date is firm: the end of May 2018. You would be working under my supervision and contributing to algorithm development in the realm of big data analytics. This internship is related to our work on Contraction Clustering (RASTER).

Here is the job ad:


Internship at the Fraunhofer-Chalmers Centre for Industrial Mathematics

The Fraunhofer-Chalmers Research Centre for Industrial Mathematics (FCC)
offers software, services and contract research for a broad range of
industrial applications. Modelling, simulation and optimization of products
and processes can boost technical development, improve efficiency and cut
costs of both large and small businesses. Since 2001, our highly skilled team
of mathematicians and engineers has successfully solved problems for more than
170 clients. We combine consultancy services with innovative research and
development based on a wide spectrum of competences.

We are looking for an ambitious student with a background in computer science
or related fields to assist in an ongoing applied research project in the
Systems and Data Analysis department. You will contribute to research in data
stream processing that is conducted in the area of distributed data analytics.

Your task:
- Implement a stream processing algorithm that was developed in-house
- Compare the implementation with other existing algorithms on a variety
of metrics

Required background:
- Functional programming in Scala or Haskell

Meriting:
- Experience implementing algorithms based on a mathematical specification
or pseudocode
- Algorithms/Machine Learning, in particular clustering
- Stream processing, in particular Apache Spark - Structured Streaming
- Data visualization, in particular Matplotlib

Your ideal profile:
- Chalmers student at the Master's level, preferably in the penultimate year
- Pursuing a degree in Computer Science or a similar field
- Previous work experience in the software industry or as a student research
assistant
- Ability to work independently

If you maintain a personal code repository (GitHub, GitLab, Bitbucket, etc.),
then please highlight this in your application. If you have other samples of
work to show, such as a portfolio of projects on a blog or private website,
we would be keen to have a look.

This internship is a paid part-time (4h/week) fixed-term position until the
end of May 2018. The starting date is flexible.

Contact persons:

Mats Jirstrand, Head of Department
mats.jirstrand@fcc.chalmers.se, 031-772 42 50

Emil Gustavsson, Applied Researcher/Data Scientist
emil.gustavsson@fcc.chalmers.se, 031-772 42 92

Gregor Ulm, Research and Development Engineer
gregor.ulm@fcc.chalmers.se, 031-772 42 71

Please send your application, marked "Contracted Student / SYS (OODIDA)",
consisting of a cover letter, CV, and a current academic transcript, to
recruit@fcc.chalmers.se.

Interviews will be held on a rolling basis. Please apply as soon as possible.

www.fcc.chalmers.se
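The ad does not name the in-house stream processing algorithm. Since the internship is tied to our work on RASTER, here is a minimal, hypothetical Python sketch of the kind of incremental computation a stream version of a grid-based clusterer might involve: each micro-batch of points is projected onto grid tiles and folded into a running tile count, without mutating the previous state. The function names, the `precision` parameter, and the micro-batch model are assumptions made for illustration; this is not the code an intern would be implementing.

```python
import math
from collections import Counter
from functools import reduce


def project(point, precision):
    """Map an (x, y) point to the integer grid tile that contains it.

    `precision` (hypothetical) is the number of decimal digits kept.
    """
    scale = 10 ** precision
    x, y = point
    return (math.floor(x * scale), math.floor(y * scale))


def update_counts(counts, batch, precision=1):
    """Return a new Counter with one micro-batch of points added.

    Counter addition builds a fresh Counter, so `counts` is left untouched
    and the running state can be threaded through a fold.
    """
    return counts + Counter(project(p, precision) for p in batch)


def significant_tiles(counts, threshold):
    """Tiles that have received at least `threshold` points so far."""
    return {tile for tile, n in counts.items() if n >= threshold}


# Usage: fold two small micro-batches into the running state.
batches = [[(0.05, 0.05), (0.05, 0.06)], [(0.05, 0.07), (2.5, 2.5)]]
state = reduce(update_counts, batches, Counter())
print(significant_tiles(state, threshold=3))  # {(0, 0)}
```

Because the state holds one counter per occupied tile rather than the points themselves, its size is bounded by the number of distinct tiles, not by the length of the stream.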

Contraction Clustering (RASTER) paper published

Our paper “Contraction Clustering (Raster): A Big Data Algorithm for Density-Based Clustering in Constant Memory and Linear Time” has been published in the Springer Lecture Notes in Computer Science series.

Here is the full citation with doi:

Ulm G., Gustavsson E., Jirstrand M. (2018) Contraction Clustering (Raster). In: Nicosia G., Pardalos P., Giuffrida G., Umeton R. (eds) Machine Learning, Optimization, and Big Data. MOD 2017. Lecture Notes in Computer Science, vol 10710. Springer, Cham
https://doi.org/10.1007/978-3-319-72926-8_6

Alternatively, the submitted manuscript is available in my GitLab repository:
https://gitlab.com/gregor_ulm/publications

Accepted Abstract for SweDS 2017: Functional Federated Learning in Erlang

At IFL 2017 in Bristol, UK, I presented our work-in-progress paper “Purely Functional Federated Learning in Erlang.” An update to this work will be presented as a poster at the upcoming 5th Swedish Workshop on Data Science (SweDS 2017), which will take place from 12 to 13 December 2017 in Gothenburg, Sweden. It is hosted by the University of Gothenburg. The abstract is reproduced below.

Functional Federated Learning in Erlang

Authors: G. Ulm, E. Gustavsson, and M. Jirstrand

A modern connected car produces gigabytes to terabytes of data per day. Collecting the data generated by an entire fleet of cars and processing it centrally on a server farm is thus not feasible: the total amount of data generated by cars, i.e. on edge devices, is too large to be transmitted efficiently to a central server. However, the CPUs in edge devices, such as connected cars but also ordinary smartphones that connect to the cloud, have become increasingly powerful in recent years. Tapping into this computational resource is one way of addressing the problem of processing big data generated by large numbers of edge devices.

One such approach is distributed data processing. Using the example of training an Artificial Neural Network, we introduce a framework for distributed data processing. A particular focus is on the implementation language, Erlang. Arguably the biggest strength of the functional programming language Erlang is how straightforward it makes implementing concurrent and distributed programs. Numerical computing, on the other hand, is not necessarily seen as one of its strengths.

The recent introduction of Federated Learning, a concept according to which edge devices are leveraged for decentralized machine learning tasks while a central server only updates and distributes a global model, provides the motivation for exploring how well Erlang is suited to such a use case. We present a framework for Federated Learning in Erlang, written in a purely functional style. Erlang is used not only for coordinating data processing tasks but also for performing numerical computations. Initial results show that Erlang is well suited to this kind of task.

We provide an overview of the general framework and also discuss an existing and fully realized in-house prototypical implementation that performs distributed machine learning tasks according to the Federated Learning paradigm. While we focus on Artificial Neural Networks, our Federated Learning framework is of a more general nature and could also be used with other machine learning algorithms.

The novelty of our work is that we present the first publicly available implementation of a Federated Learning framework; our work is also the first implementation of Federated Learning in a functional programming language, with the added benefit of being purely functional. In addition, we demonstrate that Erlang not only can be leveraged for message passing but also performs adequately for practical machine learning tasks.

Our presentation is based on our work-in-progress paper “Purely Functional Federated Learning in Erlang”, which we presented at IFL 2017. The context of this research is our ongoing involvement in the Vinnova-funded project "On-board/off-board distributed data analysis" (OODIDA), which is a joint project between the Fraunhofer-Chalmers Research Centre for Industrial Mathematics, Chalmers University of Technology, Volvo Car Corporation, Volvo Trucks, and Alkit Communications.
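The division of labour described in this abstract, where clients train on their own data and the server only merges and redistributes a global model, can be illustrated with a short sketch. It is written in Python rather than Erlang, uses a toy linear model in place of a neural network, and assumes federated averaging (each client's weights weighted by its number of samples) as the server's update rule; the abstract does not state which aggregation rule the framework uses. All function names are hypothetical. In keeping with the purely functional style, no function mutates its arguments.

```python
from functools import reduce


def local_update(weights, data, lr=0.1, epochs=20):
    """Client side: fit y = w0 + w1 * x to local data by batch gradient descent.

    Returns a new weight tuple; nothing is mutated.
    """
    n = len(data)

    def step(w, _):
        # Gradient of (1/2) * mean squared error with respect to (w0, w1).
        g0 = sum(w[0] + w[1] * x - y for x, y in data) / n
        g1 = sum((w[0] + w[1] * x - y) * x for x, y in data) / n
        return (w[0] - lr * g0, w[1] - lr * g1)

    return reduce(step, range(epochs), weights)


def federated_average(updates):
    """Server side: average client weights, weighted by local sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return tuple(sum(w[i] * n for w, n in updates) / total for i in range(dim))


def federated_round(global_weights, client_datasets):
    """One round: every client trains locally; the server only aggregates."""
    updates = [(local_update(global_weights, d), len(d)) for d in client_datasets]
    return federated_average(updates)


# Usage: two clients whose private data both follow y = 1 + 2x.
clients = [
    [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
    [(0.25, 1.5), (0.75, 2.5)],
]
model = reduce(lambda w, _: federated_round(w, clients), range(100), (0.0, 0.0))
print(model)  # close to (1.0, 2.0)
```

Only model weights and sample counts cross the client/server boundary; the raw data points never leave the clients, which is the point of the approach for fleets of connected cars.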

Contraction Clustering (RASTER): paper and reference implementations

One of the results of my work at the Fraunhofer-Chalmers Research Centre for Industrial Mathematics (FCC) was the discovery of a very fast special-purpose linear-time clustering algorithm, Contraction Clustering (RASTER). I presented our work at the Third International Conference on Machine Learning, Optimization and Big Data (MOD 2017) in Volterra, Italy, earlier this month.
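For readers who have not seen the paper, here is a minimal Python sketch of the general idea. It is not the FCC reference implementation: points are projected onto a grid of tiles by scaling and flooring their coordinates, tiles containing at least `threshold` points are kept as significant, and adjacent significant tiles are merged into clusters, discarding clusters with fewer than `min_size` tiles. The parameter names and the choice of an 8-neighbourhood are assumptions made for this sketch. Each point is touched once during projection and the clustering step only visits significant tiles, so the running time is linear in the number of points, assuming constant-time hash lookups.

```python
import math
from collections import Counter


def raster(points, precision=1, threshold=5, min_size=2):
    """Minimal RASTER-style clustering sketch (not the reference implementation).

    points:    iterable of (x, y) floats
    precision: decimal digits kept when projecting points to tiles (assumed name)
    threshold: minimum points for a tile to be significant (assumed name)
    min_size:  minimum significant tiles per cluster (assumed name)
    """
    scale = 10 ** precision
    # Step 1: a single pass projects every point to a tile and counts per tile.
    counts = Counter((math.floor(x * scale), math.floor(y * scale)) for x, y in points)
    # Step 2: keep only significant tiles; the points themselves are not stored.
    unvisited = {tile for tile, n in counts.items() if n >= threshold}
    # Step 3: flood-fill over the 8-neighbourhood to group adjacent tiles.
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster, frontier = {start}, [start]
        while frontier:
            tx, ty = frontier.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (tx + dx, ty + dy)
                    if neighbour in unvisited:
                        unvisited.remove(neighbour)
                        cluster.add(neighbour)
                        frontier.append(neighbour)
        if len(cluster) >= min_size:
            clusters.append(cluster)
    return clusters


# Usage: two adjacent dense tiles form one cluster; the lone outlier is dropped.
pts = [(0.05, 0.05)] * 5 + [(0.15, 0.05)] * 5 + [(5.05, 5.05)]
print([sorted(c) for c in raster(pts, precision=1, threshold=5, min_size=2)])
# [[(0, 0), (1, 0)]]
```

Flooring rather than truncating keeps points just below zero in their own tile instead of merging them into tile 0.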

At FCC we decided to create reference implementations in a number of programming languages. You will find reference implementations of RASTER in Python, Erlang, Haskell, and Scala in my GitLab repository. Note that this is a mirror of the official FCC source code release.

Our paper will appear early next year in the Proceedings of MOD 2017, which are published in the Springer Lecture Notes in Computer Science series. The submitted, self-archived manuscript of our RASTER paper is likewise available on my GitLab account.

Accepted Paper for IFL 2017: Purely Functional Federated Learning in Erlang

My current work at the Fraunhofer-Chalmers Research Centre for Industrial Mathematics focuses on distributed data analytics in a large-scale industrial setting. We collaborate closely with Volvo Cars and Volvo Trucks. As part of my work, I explored the suitability of Erlang for distributed machine learning tasks. I wrote up a draft paper on my implementation of Federated Learning in Erlang, which was accepted at IFL 2017, the 29th Symposium on Implementation and Application of Functional Languages. It will take place in Bristol, UK, from 30 August to 1 September 2017. The abstract is reproduced below.


Purely Functional Federated Learning in Erlang

Authors: G. Ulm, E. Gustavsson, and M. Jirstrand

Arguably the biggest strength of the functional programming language Erlang is how straightforward it makes implementing concurrent and distributed programs. Numerical computing, on the other hand, is not necessarily seen as one of its strengths. The recent introduction of Federated Learning, a concept according to which edge devices are leveraged for decentralized machine learning tasks while a central server only updates and distributes a global model, provided the motivation for exploring how well Erlang was suited to such a use case. We present a framework for Federated Learning in Erlang, written in a purely functional style, and compare two versions of it: one that has been written exclusively in Erlang, and one in which Erlang is relegated to coordinating client processes that perform their numerical computations in C. Initial results are promising: we found that a real-world industrial use case of distributed data analytics can easily be tackled with a system written purely in Erlang.
The novelty of our work is that we present the first implementation of a Federated Learning framework in a functional programming language, with the added benefit of being purely functional. In addition, we demonstrate that Erlang not only can be leveraged for message passing but also performs adequately for practical machine learning tasks.