The importance of data has increased dramatically in almost all scientific disciplines over the last decade, e.g. in meteorology, genomics, complex physics simulations, biological and environmental research, and recently also in the humanities. This development is driven by great advances in data acquisition and data accessibility, such as improvements in remote sensing, powerful mobile devices, the popularity of social networks, and the ability to handle unstructured data, including texts. On the one hand, the availability of such large data volumes prompts scientific disciplines to rethink how to extract useful information and how to foster research. On the other hand, researchers feel lost in these data volumes because appropriate tools for data management, integration, analysis, and visualization have not been available so far. This is starting to change with the recent development of big data technologies and with progress in natural language processing, semantic technologies, and related fields, which are not only useful in business but also offer great opportunities in science and the humanities. Scientific workflows need to be realized as flexible end-to-end analytic solutions that allow for complex data processing, integration, analysis, and visualization of Big Data in various application domains.
This workshop intends to bring together scientists from various disciplines with database researchers to discuss real-world problems in data science as well as recent big data technology. The workshop will consist of three parts: inspiring invited talks from international experts, presentation of accepted workshop papers and concrete working groups on challenging subjects.
The workshop program included a keynote talk on digital humanities by Andreas Henrich, who discussed current approaches, challenges, and applications in the context of data integration, data federation, and data analysis for the humanities. We further selected six contributions that address different challenges in the context of data-driven analytics. The papers contribute to the management and analysis of data from various domains, such as mobile data, automobile data, textual data like legal texts and bibliographic data, as well as ecological data. The proposed approaches relate to the analysis and use of complex graphs and ontologies, item set mining and entity extraction, as well as evaluation and quality criteria.
Two papers focus on methods and models in the context of data analytics. Rost et al. present an extension of the graph data management tool Gradoop to support temporal graph analytics. They add time properties to vertices, edges, and graphs and use them within graph operators, e.g. to analyze temporal citation patterns, as demonstrated in a bibliographic usage scenario.
Spieß and Reimann analyze the regulation and control of vehicle components in automotive series production. They develop an adapted item set mining approach to perform association analysis for the domain-specific problem of automatically identifying vehicles with a high risk of failure.
Three papers deal with the analysis and extraction of information from textual data.
Cornelia Kiefer presents and discusses quality indicators for textual data. Besides the quality of the texts themselves, her aim is to predict the quality of text analysis results and to decide whether default text mining modules are likely to handle the textual data well. In her evaluation she investigates texts from domains such as production, news, and tweets using the proposed quality indicators. The goal of Wehnert et al. is to provide a decision support system for legal regulations, e.g. to inform companies about relevant regulatory changes that need to be considered. They use linked laws from their ontology of legal textbooks and develop a context selection mechanism that helps users navigate their legal knowledge base, e.g. to find all applications of a law.
Udovenko et al. present a hybrid approach to extracting entities from scientific publications in the ecological domain. They propose a framework that uses domain-related ontologies for entity annotation and run an initial evaluation of entity extraction from publications on biodiversity.
Finally, Steinberg et al. present a comparative evaluation of different software solutions that support the form-based collection of mobile data. Mobile devices now play a major role in the data collection process, and users often build on existing infrastructure and software to collect and submit data. The paper reports on experiences with the whole data collection workflow and compares eight existing tools in terms of their features and characteristics.
All contributions of this year's BigDS workshop provide new domain-relevant insights and promote the use of generic as well as domain-specific methods for scientific data management and analytics. The detailed program can be found on the workshop website.
Topics of Interest
In the context of big and small data in science and humanities, the scope of the workshop includes, but is not limited to:
- Big Data architectures for research
- Design, implementation, optimization of scientific workflows
- Data integration for scientific applications
- Data archives, data repositories, data governance
- Data provenance, data quality and data curation
- Natural language processing, text analytics
- Semantic technologies
- Low-latency processing of scientific data streams
- Transformation & exchange of very large scientific data
- Scalable data analytics
- Predictive models in science
- Scalable visualization and visual analytics
- User interfaces for big data
- Case studies and best practices of big data in science and humanities
- Big Data and grand challenge science questions
- New applications in humanities and social sciences
Submitted papers will be refereed by the workshop Program Committee. Accepted papers will appear in the BTW’19 workshop proceedings, published as part of the GI Lecture Notes in Informatics (LNI). Papers must be written in German or English and adhere to the LNI formatting guidelines. Research and experience papers are limited to 10 pages, position papers to 6 pages.
Research papers must describe original, unpublished work that is not under review elsewhere. Experience reports must be marked as such, and a comprehensive discussion of the approach taken, the experiences made, and their assessment is expected. All papers and reports must be submitted as PDF documents through https://easychair.org/conferences/?conf=bigdsbtw2019
Authors of accepted high-quality papers will be invited to submit an extended version of the paper for publication in Datenbank Spektrum.
Important Dates
- 03.12.2018: Submission of Contributions (Extended Deadline)
Workshop Organizers
- Anika Groß, Hochschule Anhalt
- Friederike Klan, DLR Institut für Datenwissenschaften
- Birgitta König-Ries, Friedrich-Schiller-Universität Jena
- Peter Reimann, Universität Stuttgart
- Bernhard Seeger, Philipps-Universität Marburg
Program Committee
- Alsayed Algergawy, Universität Jena
- Peter Baumann, Universität Bremen
- Matthias Bräger, CERN
- Thomas Brinkhoff, FH Oldenburg
- Michael Diepenbroek, Universität Bremen
- Jana Diesner, University of Illinois at Urbana-Champaign
- Johann-Christoph Freytag, Humboldt-Universität zu Berlin
- Michael Gertz, Universität Heidelberg
- Thomas Heinis, Imperial College London
- Andreas Henrich, Universität Bamberg
- Jens Kattge, Max-Planck-Institut für Biogeochemie
- Alfons Kemper, TU München
- Bertram Ludaescher, University of Illinois at Urbana-Champaign
- Alexander Markowetz, Universität Bonn
- Jens Nieschulze, Universität Göttingen
- Eric Peukert, Universität Leipzig
- Norbert Ritter, Universität Hamburg
- Kai-Uwe Sattler, TU Ilmenau
- Holger Schwarz, Universität Stuttgart
- Uta Störl, Hochschule Darmstadt
- Andreas Thor, HfT-Leipzig