Upcoming Poster Presentation at Euro-Par 2019

Our paper “Active-Code Replacement in the OODIDA Data Analytics Platform” has been accepted to the poster session at Euro-Par 2019, which will take place from August 26 to 30 in Goettingen, Germany. The abstract is reproduced below. A preprint of the paper the poster is based on is available on arXiv, where you also find a preprint of the paper on OODIDA.

Title: Active-Code Replacement in the \OODIDA Data Analytics Platform
Authors: Gregor Ulm, Emil Gustavsson, Mats Jirstrand 

OODIDA (On-board/Off-board Distributed Data Analytics) is a
platform for distributing and executing concurrent data analytics
tasks. It targets fleets of reference vehicles in the automotive
industry and has a particular focus on rapid prototyping. Its
underlying message-passing infrastructure has been implemented in
Erlang/OTP. External Python applications perform data analytics
tasks. Most work is performed by clients (on-board). A central
server performs supplementary tasks (off-board). OODIDA can be
automatically packaged and deployed, which necessitates restarting
parts of the system, or all of it. This is potentially disruptive. 
To address this issue, we added the ability to execute user-
defined Python modules on clients as well as the server. These 
modules can be replaced without restarting any part of the system 
and they can even be replaced between iterations of an ongoing 
assignment. This facilitates use cases such as iterative A/B
testing of machine learning algorithms or deploying experimental
algorithms on-the-fly.

New Preprint: Contraction Clustering (RASTER): A Very Fast Big Data Algorithm for Sequential and Parallel Density-Based Clustering in Linear Time, Constant Memory, and a Single Pass

A preprint of our paper “Contraction Clustering (RASTER): A Very Fast Big Data Algorithm for Sequential and Parallel Density-Based Clustering in Linear Time, Constant Memory, and a Single Pass” is now available on arXiv. The abstract is reproduced below:

Title: Contraction Clustering (RASTER): A Very Fast Big Data
Algorithm for Sequential and Parallel Density-Based Clustering
in Linear Time, Constant Memory, and a Single Pass

Authors: Gregor Ulm, Simon Smith, Adrian Nilsson,
Emil Gustavsson, Mats Jirstrand

Abstract: Clustering is an essential data mining tool for
analyzing and grouping similar objects. In big data
applications, however, many clustering algorithms are
infeasible due to their high memory requirements and/or
unfavorable runtime complexity. In contrast, Contraction
Clustering (RASTER) is a single-pass algorithm for identifying
density-based clusters with linear time complexity. Due to its
favorable runtime and the fact that its memory requirements
are constant, this algorithm is highly suitable for big data
applications where the amount of data to be processed is huge.
It consists of two steps: (1) a contraction step which
projects objects onto tiles and (2) an agglomeration step
which groups tiles into clusters. This algorithm is extremely
fast in both sequential and parallel execution. In single-
threaded execution on a contemporary workstation, an 
implementation in Rust processes a batch of 500 million points 
with 1 million clusters in less than 50 seconds. The speedup 
due to parallelization is significant, amounting to a factor 
of around 4 on an 8-core machine. 

New Preprint: Active-Code Replacement in the OODIDA Data Analytics Platform

A preprint of our paper “Active-Code Replacement in the OODIDA Data Analytics Platform” is now available on arXiv. It describes a key feature of OODIDA, which is a distributed system for data analytics for the automotive industry, targeting a fleet of reference vehicles. With active-code reloading, it is possible to replace code for custom computations without taking any component of the system down. The abstract is reproduced below:

Active-Code Replacement in the OODIDA Data Analytics Platform
Gregor Ulm, Emil Gustavsson, Mats Jirstrand

OODIDA (On-board/Off-board Distributed Data Analytics) is a
platform for distributing and executing concurrent data analysis
tasks. It targets a fleet of reference vehicles in the
automotive industry and has a particular focus on rapid
prototyping. Its underlying message-passing infrastructure has
been implemented in Erlang/OTP, but the external applications
for user interaction and carrying out data analysis tasks use
a language-independent JSON interface. These applications are
primarily implemented in Python. A data analyst interacting with
OODIDA uses a Python library. The bulk of the data analytics
tasks are performed by clients (on-board), while a central server
performs supplementary tasks (off-board). OODIDA can be
automatically packaged and deployed, which necessitates restarting
parts of the system, or all of it. This is potentially disruptive.
To address this issue, we added the ability to execute
user-defined Python modules on both the client and the server,
which can be replaced without restarting any part of the system.
Modules can even be swapped between iterations of an ongoing
assignment. This facilitates use cases such as iterative A/B
testing of machine learning algorithms or deploying experimental
algorithms on-the-fly. Active-code replacement is a key feature
of our system as well as an example of interoperability between
a functional and a non-functional programming language.

New Preprint: OODIDA: On-board/Off-board Distributed Data Analytics for Connected Vehicles

A preprint of our paper “OODIDA: On-board/Off-board Distributed Data Analytics for Connected Vehicles” is now available on arXiv. It describes a distributed system for data analytics for the automotive industry, targeting a fleet of reference vehicles. The abstract is reproduced below:

OODIDA: On-board/Off-board Distributed Data Analytics
for Connected Vehicles
Gregor Ulm, Emil Gustavsson, and Mats Jirstrand

Connected vehicles may produce gigabytes of data per
hour, which makes centralized data processing
impractical at the fleet level. In addition, there
are the problems of distributing tasks to edge
devices and processing them efficiently. Our solution
to this problem is OODIDA (On-board/off-board
Distributed Data Analytics), which is a platform that
tackles both task distribution to connected vehicles
as well as concurrent execution of large-scale tasks
on arbitrary subsets of clients. Its message-passing
infrastructure has been implemented in Erlang/OTP,
while the end points are language-agnostic. OODIDA is
highly scalable and able to process a significant
volume of data on resource-constrained clients.