Big Data and Exascale Computing (BDEC) Community Report

Authors: Prof. Jack Dongarra (University of Tennessee)

BP Abstract: The emergence of large-scale data analytics and machine learning in a wide variety of scientific fields has disrupted the landscape on which emerging plans for exascale computing are developing. Participants in the international workshop series on Big Data and Extreme-scale Computing (BDEC) are systematically mapping out the ways in which the major issues associated with data-intensive science interact with plans for achieving exascale computing. This meeting will present an overview of this roadmapping effort and elicit community input on the development of plans for the convergence of currently bifurcated software ecosystems on a common software infrastructure.

Long Description: Anticipating the National Strategic Computing Initiative (NSCI) in the US, and the H2020 Extreme Computing initiative in Europe, the Big Data and Exascale Computing (BDEC) workshop series is building an international effort (funded by the NSF, DOE, the EU and Japan) to develop a plan for transnational cooperation in the design and development of a new generation software infrastructure for extreme-scale science and engineering. Building on earlier efforts of the International Exascale Software Project (IESP) and the ongoing European EXtreme Data and Computing Initiative (EXDCI), the BDEC community is working on a plan for a common, high-quality computational environment that 1) systematically maps out and analyzes the major issues associated with the integration of big data analytics and extreme-scale modeling, and that 2) uses this analysis to define one or more pathways from the current state of software balkanization to a common software ecosystem that supports big data analytics and HPC-oriented modeling and simulation. The proposed BoF will offer an overview of the BDEC advanced roadmapping and planning effort. It will include the presentation and discussion of the “Pathways to Convergence” report that is currently being drafted by the BDEC community. BoF leaders will seek feedback from the BoF attendees on the emerging plan for a software infrastructure that brings together and integrates the currently bifurcated software stacks of big data analytics and HPC-driven modeling and simulation. With contributions from the US, the EU and Asia, the themes of the BDEC effort intersect with an extremely broad cross section of interests of the SC’16 community.
It carries forward earlier planning efforts for international cooperation in the co-design and co-development of software infrastructure for extreme-scale science in a broad spectrum of research domains, but it reframes the problems involved to take full account of the varied (and evolving) workflows and software ecosystems that different communities have created to work with data flows and computing resources that, both separately and together, are unprecedented in their scale. All the imposing design and development issues of creating an exascale-capable software stack remain. But the supercomputers that need this stack must now be viewed as the nodes (albeit the largest and perhaps the most important nodes) in a very large network of computing resources that will be required to generate, collect, manipulate, transform, analyze, and collaboratively explore gigantic mountains of data. If we seek to avoid putting arbitrary barriers between different scientific domains and research communities, then bringing together big data analytics and extreme-scale computing will require a complete rethinking of the scientific software stack. As the draft BDEC report will show, the three BDEC workshops that have been held have made substantial progress in achieving this necessary change in perspective. Its four leading topic areas (software infrastructure, system architecture, operations and resource management, and representative science applications) framed the context for discussions that took place at the workshops; at the BoF, a panel of experts from the BDEC community will raise issues and solicit input in each of these areas.

