Background

The Department of Energy (DOE) National Laboratory Data Curation Working Group (DCWG) first convened in the spring of 2023. The group was proposed at the 2022 DOE Data Days in response to the challenges posed by the exponential growth in data generation across the diverse landscape of the United States National Laboratory complex. The surge in data volume, the velocity of its production, and the wide diversity of data types have created the need for a dedicated consortium to tackle the complex data-related issues faced by these laboratories. Recognizing these challenges and the need for a systematic approach to tackling them, a group of curators, librarians, information scientists, researchers, and developers representing the different laboratories formed a working group to identify emerging issues and share expertise in data curation.

Our Goal

The goal of the DOE Data Curation Working Group (DOE DCWG) is to address the full lifecycle of data, from planning data management and ensuring early liaison engagement with scientists to establishing data curation standards and tools, data literacy materials, repository best practices, and citation-level metadata harvesting protocols, all informed by the FAIR data principles. As a strategic, cross-functional decision-making entity, the DCWG identified seven objectives that members work on asynchronously throughout the fiscal year. This allows the group to focus on implementing data curation best practices that are adapted to the DOE National Laboratories environment. By sharing this model, we hope to communicate our challenges and successes and receive feedback from the larger data curation community on the efforts of the working group.

Seven Objectives of the DCWG

  • Work with DOE OSTI (Office of Scientific and Technical Information) to develop a more mature data sharing plan and discovery layer
  • Research and implement metadata and ontologies that improve discovery and interoperability at the general and specialist/disciplinary level
  • Develop a "Data Curator Toolkit" to help curators assess datasets when they are too large to curate manually
  • Develop best practices for Machine Actionable Data Management Plans that reflect the needs of the laboratories
  • Develop best practices for liaison work and training
  • In collaboration with our partners at the Data Curation Network (DCN), create documentation and primers for curators
  • Machine Learning - Create scripts and/or a proof-of-concept short paper demonstrating how access to metadata from various repositories can enable semantic search capabilities