Large-scale efforts are underway at NIH to tackle hurdles associated with accessing and using Big Data. Making public data, especially large, commonly used data sets, easily accessible in the cloud will reduce the burden and cost of individual investigators independently moving and storing data, make it possible to compute against data sets, and permit novel uses across data sets. The NIH Commons was created as a platform to make this possible. It is a shared virtual space where scientists can work with the digital objects of biomedical research, allowing them to find, manage, share, use and reuse data, software, metadata and workflows. For digital objects to be in the Commons, they must have attributes that make them Findable, Accessible, Interoperable and Reusable (FAIR); that is, they must follow the FAIR principles.
Because the Commons was only recently introduced and is still under development, many in the research community are still wondering what it is composed of and how it works. This week, NIH released a document that provides a clearer description of the Commons and some of the pilot programs initiated to develop and test its various components.
The Data Science at NIH online forum, INPUT | OUTPUT, published the first in a series of blog posts describing the four main building blocks of the Commons that work together to form a complex ecosystem:
- A computing environment, such as the cloud or HPC (High Performance Computing) resources, which supports access, utilization and storage of digital objects.
- Public datasets that adhere to Commons Digital Object Compliance principles.
- Software services and tools that enable a broad array of data indexing and sharing, and connectivity between repositories and registries.
- A set of Digital Object Compliance principles that describe the properties that make digital objects findable, accessible, interoperable and reusable (FAIR).
Each of the components is being further refined and integrated through a series of Commons pilots. Because development is ongoing, NIH hopes to foster discussion and gather input from the research community through the INPUT | OUTPUT blog. Stay tuned for future postings.
Read the full document, including details about specific pilots, here.