New Technologies, Platforms and Services for Sharing Research Data: Data Commons, Distributed Clouds, and Distributed Data Services
Authors: Dr. Robert Grossman (University of Chicago)
Abstract: Data commons co-locate data, storage, and computing infrastructure with core data services and commonly used tools and applications for managing, analyzing, and sharing data to create an interoperable resource for the research community. This session will discuss practical experiences designing, building, and operating data commons for the research community. It will also discuss key services that data commons require, such as index services and metadata services.
Long Description: The amount of available scientific data far exceeds the research community's ability to analyze it, so there is a critical need for new algorithms, software applications, software services, and cyberinfrastructure to support data throughout its life cycle in data science. While the research community is exploring a number of approaches to data-intensive computing, one emerging approach is to build what are called data commons. Key data commons services include digital ID services, metadata services, data sharing services, high-performance transport services, and data export services.
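As a rough illustration of how two of these services might fit together, the sketch below pairs a digital ID (index) service with a metadata service in a single in-memory class. It is not the implementation used by any of the commons named in this description; the class names `IndexRecord` and `DataCommonsIndex`, the methods `register`, `resolve`, and `search`, and the example bucket URL are all hypothetical, chosen only to show the idea of assigning a persistent identifier to a data object and resolving it to checksums, locations, and descriptive metadata.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    """One entry in a hypothetical digital ID (index) service."""
    digital_id: str
    sha256: str
    size: int
    urls: list = field(default_factory=list)


class DataCommonsIndex:
    """Toy in-memory stand-in for digital ID and metadata services."""

    def __init__(self):
        self._records = {}   # digital_id -> IndexRecord
        self._metadata = {}  # digital_id -> dict of descriptive fields

    def register(self, content: bytes, urls, metadata=None) -> str:
        """Mint a new digital ID and record checksum, size, and locations."""
        digital_id = str(uuid.uuid4())
        self._records[digital_id] = IndexRecord(
            digital_id=digital_id,
            sha256=hashlib.sha256(content).hexdigest(),
            size=len(content),
            urls=list(urls),
        )
        self._metadata[digital_id] = dict(metadata or {})
        return digital_id

    def resolve(self, digital_id: str) -> IndexRecord:
        """Map a digital ID back to its index record."""
        return self._records[digital_id]

    def search(self, **criteria):
        """Return the digital IDs whose metadata matches every criterion."""
        return [
            did for did, md in self._metadata.items()
            if all(md.get(key) == value for key, value in criteria.items())
        ]
```

A caller would register an object once, for example `did = index.register(data, ["s3://example-bucket/sample.txt"], {"project": "demo"})`, and later use `index.resolve(did)` or `index.search(project="demo")` to find it again. Keeping the identifier separate from the storage URLs is the design point: the data can move or be replicated without changing the ID that researchers cite.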
We describe the design, architecture and services of several data commons that the Center for Data Intensive Science (CDIS) and the Open Commons Consortium (OCC) developed and operated for the research community in conjunction with the NCI Genomic Data Commons, the NOAA Big Data Project, and the Open Science Data Cloud (OSDC).
The goals of this BOF include bringing together members of the HPC community involved in all aspects essential for building data commons, discussing challenges, and exploring and soliciting solutions. We expect the outcomes of this BOF to be: a) increased awareness of the requirements of scientific researchers in many fields working with large datasets; b) discussion leading to increased understanding of what core functionalities, technologies, and innovations may be required to create effective data commons; c) increased awareness of the resources already available to the research community; and d) potential collaborations to build and peer data commons.
The Center for Data Intensive Science (CDIS) and the Open Commons Consortium (OCC) have led well-attended BOFs at SC12 through SC14 on the Open Science Data Cloud (OSDC) and the NSF-supported Partnership for International Research and Education (PIRE) program. At SC15 we led a joint BOF with the Research Data Alliance entitled “Integrating Data Commons and Other Data Infrastructure with HPC to Accelerate Research and Discovery”. The OSDC includes one of the first known examples of a data commons, with nearly a petabyte of general datasets (including NASA satellite data, 1000 Genomes data, US Census data, and others) available to the research community.