BTW 2019 - Workshop on Big (and Small) Data in Science and Humanities

The importance of data has increased dramatically in almost all scientific disciplines over the last decade, e.g. in meteorology, genomics, complex physics simulations, biological and environmental research, and recently also in the humanities. This development is due to great advances in data acquisition and data accessibility, e.g. improvements in remote sensing, powerful mobile devices, the popularity of social networks and the ability to handle unstructured data (including texts). On the one hand, the availability of such masses of data is leading scientific disciplines to rethink how to extract useful information and how to foster research. On the other hand, researchers feel lost in the masses of data because appropriate data management, integration, analysis and visualization tools have not been available so far. However, this is starting to change with the recent development of big data technologies and with progress in natural language processing, semantic technologies and other areas, which seem to be not only useful in business but also offer great opportunities in science and the humanities. Scientific workflows need to be realized as flexible end-to-end analytic solutions to allow for complex data processing, integration, analysis and visualization of Big Data in various application domains.

This workshop intends to bring together scientists from various disciplines with database researchers to discuss real-world problems in data science as well as recent big data technology. The workshop will consist of three parts: inspiring invited talks from international experts, presentation of accepted workshop papers and concrete working groups on challenging subjects.

Program

The workshop program included a keynote talk on digital humanities by Andreas Henrich, in which he discussed current approaches, challenges and applications in the context of data integration, data federation and data analysis for the humanities. We further selected six contributions that address different challenges in the context of data-driven analytics. The papers contribute to the management and analysis of data from various domains, such as mobile data, automobile data, textual data like legal texts and bibliographic data, as well as ecological data. The proposed approaches relate to the analysis and use of complex graphs and ontologies, item set mining and entity extraction, as well as evaluation and quality criteria.

Two papers focus on methods and models in the context of data analytics. Rost et al. present an extension of the graph data management tool Gradoop to support temporal graph analytics. They added time properties to vertices, edges and graphs and used them within graph operators, e.g. to analyze temporal citation patterns as presented in a bibliographic usage scenario.
Spieß and Reimann analyzed the regulation and control of vehicle components in automotive series production. They developed an adapted item set mining approach in order to successfully perform association analysis for the domain-specific problem of automatically identifying vehicles with high risk of failure.

Three papers deal with the analysis and extraction of information from textual data.
Cornelia Kiefer presents and discusses quality indicators for textual data. Besides assessing the quality of the texts themselves, her aim is to predict the quality of text analysis results and to decide whether default text mining modules are likely to cope with the textual data. In her evaluation, she investigates texts, e.g. from production, news and tweets, using the proposed quality indicators. The goal of Wehnert et al. is to provide a decision support system for legal regulations, e.g. to inform companies about relevant regulatory changes that need to be considered. In this work, they use linked laws from their ontology of legal textbooks and develop a context selection mechanism to help users navigate their legal knowledge base, e.g. to find all applications of a law.
Udovenko et al. present a hybrid approach to extract entities from scientific publications in the ecological domain. They propose a framework including the use of domain-related ontologies for entity annotation, and run an initial evaluation for entity extraction from publications on biodiversity.

Finally, Steinberg et al. present a comparative evaluation of different software solutions that support the form-based collection of mobile data. Nowadays, mobile devices heavily support the data collection process, and users often build on existing infrastructure and software to collect and submit data. The paper reports on experiences with the whole data collection workflow and compares eight existing tools in terms of their features and characteristics.

All contributions of this year's BigDS workshop give new domain-relevant insights and promote the use of generic as well as domain-specific methods for scientific data management and analytics. The detailed program can be found here.

Topics of Interest

In the context of big and small data in science and humanities, the scope of the workshop includes, but is not limited to:

Big Data architectures for research
Design, implementation, optimization of scientific workflows
Data integration for scientific applications
Data archives, data repositories, data governance
Data provenance, data quality and data curation
Natural language processing, text analytics
Semantic technologies
Low-latency processing of scientific data streams
Transformation & exchange of very large scientific data
Scalable data analytics
Predictive models in science
Scalable visualization and visual analytics
User interfaces for big data
Case studies and best practices of big data in science and humanities
Big Data and grand challenge science questions
New applications in humanities and social sciences

Submission Guidelines

Submitted papers will be refereed by the workshop Program Committee. Accepted papers will appear in the BTW’19 Workshops proceedings, published as part of LNI. The papers should be written in German or English and adhere to the LNI formatting guidelines. Research and Experience papers are limited to 10 pages, Position papers to 6 pages.

Research papers must be original, unpublished work that is not under review elsewhere. Experience reports must be identified as such and are expected to include a comprehensive discussion of the approach taken, the experiences gained, and an assessment of both. All papers and reports must be submitted as PDF documents through https://easychair.org/conferences/?conf=bigdsbtw2019

Authors of accepted high-quality papers will be invited to submit an extended version of their paper for publication in Datenbank-Spektrum.

Important Dates

03.12.2018	Submission of Contributions (Extended Deadline)
18.01.2019	Author Notification
31.01.2019	Camera Ready

Workshop Organizers

Anika Groß, Hochschule Anhalt
Friederike Klan, DLR Institut für Datenwissenschaften
Birgitta König-Ries, Friedrich-Schiller-Universität Jena
Peter Reimann, Universität Stuttgart
Bernhard Seeger, Philipps-Universität Marburg

Program Committee

Alsayed Algergawy, Universität Jena
Peter Baumann, Universität Bremen
Matthias Bräger, CERN
Thomas Brinkhoff, FH Oldenburg
Michael Diepenbroek, Universität Bremen
Jana Diesner, University of Illinois at Urbana-Champaign
Johann-Christoph Freytag, Humboldt-Universität zu Berlin
Michael Gertz, Universität Heidelberg
Thomas Heinis, Imperial College London
Andreas Henrich, Universität Bamberg
Jens Kattge, Max-Planck-Institut für Biogeochemie
Alfons Kemper, TU München
Bertram Ludaescher, University of Illinois at Urbana-Champaign
Alexander Markowetz, Universität Bonn
Jens Nieschulze, Universität Göttingen
Eric Peukert, Universität Leipzig
Norbert Ritter, Universität Hamburg
Kai-Uwe Sattler, TU Ilmenau
Holger Schwarz, Universität Stuttgart
Uta Störl, Hochschule Darmstadt
Andreas Thor, HfT-Leipzig
