July | 2017 | Gregor Ulm

My first big project at the Fraunhofer Chalmers Research Centre for Industrial Mathematics centered around processing huge geospatial data sets. The industrial partner was Scania. The main part of my contribution consisted of generating labelled evolving network graphs that make it possible to trace the routes of vehicles over time. During that work, I discovered a linear-time clustering algorithm (Contraction Clustering, which is itself contracted to RASTER) for detecting clusters with dense centers. Those centers are nodes in the aforementioned graphs. The key novelty of my work is the exploitation of domain knowledge, which facilitates clustering with just one pass through the entire input, neighborhood lookup in O(1), and constant memory requirements.

My paper on RASTER got accepted to MOD 2017, which will take place from 14 to 17 September in Volterra, Italy. The abstract of my paper is below.

Contraction Clustering (RASTER): A Big Data Algorithm for Density-Based Clustering in Constant Memory and Linear Time

Authors: Ulm, Gregor; Emil Gustavsson, Mats Jirstrand

Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering methods are infeasible due to their memory requirements or runtime complexity. Contraction Clustering (RASTER) is a linear-time algorithm for identifying density-based clusters. Its coefficient is negligible as it depends neither on input size nor the number of clusters. Its memory requirements are constant. Consequently, RASTER is suitable for big data applications where the size of the data may be huge. It consists of two steps: (1) a contraction step which projects objects onto tiles and (2) an agglomeration step which groups tiles into clusters. Our algorithm is extremely fast. In single-threaded execution on a contemporary workstation, it clusters ten million points in less than 20 seconds when using a slow interpreted programming language like Python. Furthermore, RASTER is easily parallelizable.

My paper on the application PLC Factory, of which I wrote at the European Spallation Source in Lund, Sweden, in 2016, got accepted to ICALEPCS 2017. This year’s ICALEPCS conference will take place in Barcelona, Spain, from 8 October to 13 October 2017. The abstract is below. The final revision of the paper is available on my Gitlab repository.

PLC FACTORY: AUTOMATING ROUTINE TASKS IN LARGE-SCALE PLC SOFTWARE DEVELOPMENT

Authors: G. Ulm, F. Bellorini, D. Brodrick, R. Fernandes, N. Levchenko, D. Piso Fernandez

Abstract: At the European Spallation Source in Lund, Sweden, the entire facility including all its instruments will be controlled by a large number of programmable logic controllers (PLCs). Programming PLCs, however, entails a significant amount of repetition. It is thus an error-prone and time-consuming task. Given that PLCs interface with hardware, this involves economic aspects as well, due to the fact that programming errors may cause damage to equipment. With PLC Factory, we managed to automate repetitive tasks associated with PLC programming and interfacing PLCs from EPICS. This tool is being adopted at ESS, and has shown potential for a large increase in productivity compared to the status quo. We describe PLC Factory as well as its embedded domain-specific programming language PLCF#, which it is built upon.

Gregor Ulm

Monthly Archives: July 2017

Accepted Paper for MOD 2017: Contraction Clustering (RASTER): A Big Data Algorithm for Density-Based Clustering in Constant Memory and Linear Time

Accepted Paper for ICALEPCS 2017: PLC Factory: Automating routine tasks in large-scale PLC software development