A preprint of our paper “OODIDA: On-board/Off-board Distributed Data Analytics for Connected Vehicles” is now available on arXiv. It describes a distributed data analytics system for the automotive industry, targeting a fleet of reference vehicles. The abstract is reproduced below:
OODIDA: On-board/Off-board Distributed Data Analytics
for Connected Vehicles
Gregor Ulm, Emil Gustavsson, and Mats Jirstrand
Connected vehicles may produce gigabytes of data per
hour, which makes centralized data processing
impractical at the fleet level. In addition, there
are the problems of distributing tasks to edge
devices and processing them efficiently. Our solution
to this problem is OODIDA (On-board/off-board
Distributed Data Analytics), which is a platform that
tackles both task distribution to connected vehicles
as well as concurrent execution of large-scale tasks
on arbitrary subsets of clients. Its message-passing
infrastructure has been implemented in Erlang/OTP,
while the end points are language-agnostic. OODIDA is
highly scalable and able to process a significant
volume of data on resource-constrained clients.
Our paper “Functional Federated Learning in Erlang (ffl-erl)” has been accepted for publication in the Proceedings of the 26th International Workshop on Functional and (Constraint) Logic Programming (WFLP 2018). These proceedings will appear as Springer Lecture Notes in Computer Science Vol. 11285.
The abstract is below:
Functional Federated Learning in Erlang (ffl-erl)
Gregor Ulm, Emil Gustavsson, and Mats Jirstrand
The functional programming language Erlang is well-suited
for concurrent and distributed applications, but numerical
computing is not seen as one of its strengths. Yet, the
recent introduction of Federated Learning, which leverages
client devices for decentralized machine learning tasks,
while a central server updates and distributes a global
model, motivated us to explore how well Erlang is suited to
that problem. We present the Federated Learning framework
ffl-erl and evaluate it in two scenarios: one in which the
entire system has been written in Erlang, and another in
which Erlang is relegated to coordinating client processes
that rely on performing numerical computations in the
programming language C. There is a concurrent as well as a
distributed implementation of each case. We show that Erlang
incurs a performance penalty, but for certain use cases this
may not be detrimental, considering the trade-off between
speed of development (Erlang) versus performance (C). Thus,
Erlang may be a viable alternative to C for some practical
machine learning tasks.
I just added solutions to the three new problems in the Map-1 section of CodingBat and fixed the incorrect rendering of the code samples on that page.
Mastering CodingBat (Java), Vol. 1: Basics is now also available as an ebook for Amazon Kindle. Amazon also offers a free reader app for computers and mobile phones. Check out the product page of this book for further information, including a sample PDF.
Our paper “A Performance Evaluation of Federated Learning Algorithms” has been accepted at the Second Workshop on Distributed Infrastructures for Deep Learning (DIDL 2018), which is colocated with the 2018 ACM/IFIP International Middleware Conference (Middleware 2018). This conference will take place from December 10 to 14 in Rennes, France. The abstract is reproduced below.
A Performance Evaluation of Federated Learning Algorithms
Adrian Nilsson, Simon Smith, Gregor Ulm, Emil Gustavsson, Mats Jirstrand (Fraunhofer-Chalmers Centre & Fraunhofer Center for Machine Learning)
Federated learning proposes an environment for distributed machine learning where a global model is learned by aggregating models that have been trained locally on data generating clients. Contrary to centralized optimization, clients can be very large in number and are characterized by challenges of data and network heterogeneity. Examples of clients include smartphones and connected vehicles, which highlights the practical relevance of this approach to distributed machine learning. We compare three algorithms for federated learning and benchmark their performance against a centralized approach where data resides on the server. The algorithms covered are Federated Averaging (FedAvg), Federated Stochastic Variance Reduced Gradient, and CO-OP. They are evaluated on the MNIST dataset using both i.i.d. and non-i.i.d. partitionings of the data. Our results show that, among the three federated algorithms, FedAvg achieves the highest accuracy, regardless of how data was partitioned. Our comparison between FedAvg and centralized learning shows that they are practically equivalent when i.i.d. data is used, but the centralized approach outperforms FedAvg with non-i.i.d. data.
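For readers unfamiliar with the aggregation step the abstract mentions, here is a minimal sketch of one server-side round in the spirit of Federated Averaging. It assumes each client model is a flat list of floats and that clients are weighted by their local sample counts. The function name `federated_average` and its parameters are hypothetical, chosen for illustration, and are not taken from the paper's code.

```python
def federated_average(client_weights, client_sizes):
    """Combine locally trained models into a global model.

    client_weights: list of weight vectors (lists of floats), one per client.
    client_sizes:   number of local training samples held by each client.
    Both names are illustrative assumptions, not the paper's API.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client models must have the same shape")
        # Each client contributes in proportion to its share of the data.
        for i, w in enumerate(weights):
            global_weights[i] += w * n / total
    return global_weights


if __name__ == "__main__":
    # Two clients; the second holds three times as much data as the first.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a full system the server would send the result back to the clients, which would train on it locally before the next round. This sketch covers only the averaging.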