BTW


Keynote

Time: 09:00-10:00
Location: Physics HS1
Session Chair: Felix Naumann (Hasso Plattner Institute, University of Potsdam)

  • Ihab Ilyas (University of Waterloo):
    Building Scalable Machine Learning Solutions for Data Cleaning

Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details of configuring and deploying ML techniques are the biggest hurdle. In this talk I discuss why leveraging data semantics and domain-specific knowledge is key to delivering the optimizations necessary for truly scalable ML curation solutions. The talk focuses on two main problems: (1) entity consolidation, which is arguably the most difficult data curation challenge because it is notoriously complex and hard to scale; and (2) data repair, where our new system HoloClean uses probabilistic inference to suggest repairs for identified errors and anomalies. Both problems have challenged researchers and practitioners for decades because of the combinatorial explosion of the solution space and the lack of ground truth. There is a large body of work on these problems in both academia and industry. Techniques have included human curation, rule-based systems, and automatic discovery of clusters using predefined thresholds on record similarity. Unfortunately, none of these techniques alone has been able to provide sufficient accuracy and scalability. The talk aims to provide deeper insight into the entity consolidation and data repair problems and discusses how machine learning, human expertise, and problem semantics can collectively deliver a scalable, high-accuracy solution.


Demo-Flash

Time: 10:00-10:30
Location: Physics HS1


Session 6: Query Processing and Optimization II

Time: 11:00-12:30
Location: Physics HS1
Session Chair: Wolfgang Lehner (TU Dresden)
Type: parallel with Session 7 and Sponsor tutorial Actian Vector

  • Maximilian Schüle (TU Munich), Linnea Passing (TU Munich), Alfons Kemper (TU Munich) and Thomas Neumann (TU Munich):
    Ja-(zu-)SQL: Evaluation einer SQL-Skriptsprache für Hauptspeicherdatenbanksysteme
    (English: Ja-(zu-)SQL: Evaluation of an SQL Scripting Language for Main-Memory Database Systems)
    (scientific program, short paper)

  • Adrian Bartnik (TU Berlin), Bonaventura Del Monte (DFKI GmbH), Tilmann Rabl (TU Berlin, DFKI GmbH) and Volker Markl (TU Berlin, DFKI GmbH):
    On-the-fly Reconfiguration of Query Plans for Stateful Stream Processing Engines
    (scientific program, full paper)

  • Yvonne Hegenbarth (Software AG) and Gerald Ristow (Software AG):
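    Konzept und Implementierung eines echtzeitfähigen Model Management Systems – am Beispiel zur Überwachung von Lastprognosen für den Intraday Stromhandel
    (English: Concept and Implementation of a Real-Time-Capable Model Management System, Illustrated by the Monitoring of Load Forecasts for Intraday Electricity Trading)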
    (industrial program, full paper)

Session 7: Similarity

Time: 11:00-12:30
Location: Zuse 037
Session Chair: Tomas Seidl (LMU Munich)
Type: parallel with Session 6 and Sponsor tutorial Actian Vector

  • Jan Martin Keil (Friedrich Schiller University Jena):
    Efficient Bounded Jaro-Winkler Similarity Based Search
    (scientific program, short paper)

  • Xiao Chen (Otto-von-Guericke-University of Magdeburg), Gabriel Campero Durand (Otto-von-Guericke-University of Magdeburg), Roman Zoun (Otto-von-Guericke-University of Magdeburg), David Broneske (Otto-von-Guericke-University of Magdeburg), Yang Li (Otto-von-Guericke-University of Magdeburg) and Gunter Saake (Otto-von-Guericke-University of Magdeburg):
    The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution
    (scientific program, short paper)

  • Michael Günther (TU Dresden), Maik Thiele (TU Dresden) and Wolfgang Lehner (TU Dresden):
    Fast Approximated Nearest Neighbor Joins For Relational Database Systems
    (scientific program, full paper)

Sponsor tutorial Actian Vector

Title: The analytic database Actian Vector
Time: 11:00-12:30
Location: Zuse 001


Session 8: Machine Learning

Time: 13:30-15:00
Location: Physics HS1
Session Chair: Kai-Uwe Sattler (TU Ilmenau)
Type: parallel with Demos

  • Maximilian Schüle (TU Munich), Frédéric Simonis (TU Munich), Thomas Heyenbrock (TU Munich), Alfons Kemper (TU Munich), Stephan Günnemann (TU Munich) and Thomas Neumann (TU Munich):
    In-Database Machine Learning: Gradient Descent and Tensor Algebra for Main Memory Database Systems
    (scientific program, full paper)

  • Matthias Boehm (Graz University of Technology), Alexandre Evfimievski (IBM Research – Almaden, San Jose) and Berthold Reinwald (IBM Research – Almaden, San Jose):
    Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning
    (scientific program, full paper)

  • Lars Bremer (IBM Germany Research & Development GmbH), Mariya Chkalova (IBM Germany Research & Development GmbH) and Martin Oberhofer (IBM Germany Research & Development GmbH):
    Machine Learning Applied to the Clerical Task Management Problem in Master Data Management Systems
    (industrial program, full paper)

Demo group 2

Time: 13:30-15:00
Location: Zuse 210
Type: parallel with Session 8

  • Jurica Seva (HU Berlin), Julian Goetze (University Hospital of Tübingen), Mario Lamping (Charité), Damian Tobias Rieke (Charité, Berlin Institute of Health), Reinhold Schäfer (Deutsches Krebsforschungszentrum) and Ulf Leser (HU Berlin):
    Information Retrieval for Precision Oncology

  • Alexander Krause (TU Dresden), Annett Ungethüm (TU Dresden), Thomas Kissinger (TU Dresden), Dirk Habich (TU Dresden) and Wolfgang Lehner (TU Dresden):
    NeMeSys – Energy Adaptive Graph Pattern Matching on NUMA-based Multiprocessor Systems

  • Thomas Lindemann (TU Dortmund), Patrick Brinkmann (TU Dortmund), Fadi Dalbah (TU Dortmund), Christian Hakert (TU Dortmund), Philipp-Jan Honysz (TU Dortmund), Daniel Matuszczyk (TU Dortmund), Nikolas Müller (TU Dortmund), Alexander Schmulbach (TU Dortmund), Stefan Petyov Todorinski (TU Dortmund), Oliver Tüselmann (TU Dortmund), Shimon Wonsak (TU Dortmund) and Jens Teubner (TU Dortmund):
    MAGPIE: A Scalable Data Storage System for Efficient High Volume Data Queries

  • Daniyal Kazempour (LMU Munich), Maksim Kazakov (LMU Munich), Peer Kröger (LMU Munich) and Thomas Seidl (LMU Munich):
    DICE: Density-based Interactive Clustering and Exploration

  • Stefan Hagedorn (TU Ilmenau), Oliver Birli (TU Ilmenau) and Kai-Uwe Sattler (TU Ilmenau):
    Processing Large Raster and Vector Data in Apache Spark

  • Mark Lukas Möller (University of Rostock), Nicolas Berton (ENSEIRB-MATMECA), Meike Klettke (University of Rostock), Stefanie Scherzinger (OTH Regensburg) and Uta Störl (Hochschule Darmstadt):
    jHound: Large-Scale Profiling of Open JSON Data

  • M. Ali Rostami (University of Leipzig, ScaDS Dresden Leipzig), Eric Peukert (University of Leipzig, ScaDS Dresden Leipzig), Moritz Wilke (University of Leipzig, ScaDS Dresden Leipzig) and Erhard Rahm (University of Leipzig, ScaDS Dresden Leipzig):
    Big graph analysis by visually created workflows

  • Roman Zoun (University of Magdeburg), Kay Schallert (University of Magdeburg), David Broneske (University of Magdeburg), Wolfram Fenske (University of Magdeburg), Marcus Pinnecke (University of Magdeburg), Robert Heyer (University of Magdeburg), Sven Brehmer (Bruker Daltonik GmbH), Dirk Benndorf (University of Magdeburg) and Gunter Saake (University of Magdeburg):
    MSDataStream – Connecting a Bruker Mass Spectrometer to the Internet

Session 9: Challenges in Data Processing

Time: 15:30-17:00
Location: Zuse 037
Session Chair: Andreas Heuer (University of Rostock)
Type: parallel with Demos and Sponsor tutorial Exasol 

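  • Christoph Gröger (Robert Bosch GmbH) and Eva Hoos (Robert Bosch GmbH):
    Ganzheitliches Metadatenmanagement im Data Lake: Anforderungen, IT-Werkzeuge und Herausforderungen in der Praxis
    (English: Holistic Metadata Management in the Data Lake: Requirements, IT Tools and Challenges in Practice)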
    (industrial program, full paper)

  • Kai-Uwe Sattler (TU Ilmenau):
    Presentation of the DFG Priority Programme "Scalable Data Management for Future Hardware" (SPP 2037)

  • Poster session of the SPP 2037 in the atrium

Sponsor tutorial Exasol

Time: 15:30-17:00
Location: Zuse 219
Type: parallel with Session 9 and Demos

Demo group 2

Time: 15:30-17:00
Location: Zuse 210
Type: parallel with Session 9 and Sponsor tutorial Exasol


FGDB meeting

Time: 17:00-18:00
Location: Zuse 037
Chair: Felix Naumann (Hasso Plattner Institute, University of Potsdam)


Dinner

Time: 19:00
Location: Radisson Blu Hotel