This page provides guidance to NGA LTER researchers who are submitting data. It summarizes the steps to publish research data to our archive. Additionally, it serves as a repository of our site’s data policies.
The primary responsibility for data completeness and integrity (quality control) rests with the Principal Investigator (PI) who submits the data. However, the Information Manager (IM) handles the process of publishing the data. Successful data management therefore requires a good partnership between these two parties.
No data can be archived until the PI and IM agree that the data and metadata are complete and accurate. This collaboration takes time and can be labor-intensive; new datasets or unique research techniques in particular may present challenges. However, following standard best practices of data management can smooth the process.
As noted on the Information Management webpage, the Research Workspace is NGA LTER’s primary tool for data management. We collate data and author metadata in the Research Workspace, which is cloud-based and accessed via a web browser.
Documentation for the Research Workspace and its metadata editor is linked below. In addition, the site IM provides onboarding and training for NGA LTER researchers, graduate students, and technicians. Please contact the IM if additional training is needed.
These are the steps for submitting data in NGA LTER:
- The PI submits a data management plan to the IM at the beginning of a new initiative. Together, they update the plan annually thereafter. They establish a Research Workspace Project in accordance with the plan.
- The PI prepares their data files according to the NGA LTER Best Practices and uploads them to the Research Workspace Project.
- The IM drafts an initial metadata record for the dataset in the Research Workspace Metadata Editor, using a generic NGA site template and the data management plan.
- The PI completes the metadata record to fully document the dataset. This metadata becomes the dataset’s public record in DataONE’s search and data catalog, so this step is critical.
- When the dataset is ready for review, the PI alerts the IM. Then the IM copies the data within the Research Workspace to a new folder that will become the archive version. The IM checks the metadata for adherence to standards.
- If needed, the IM and PI collaborate on improvements that should be made to the data formats or metadata.
- When the changes are complete, the IM submits a publication agreement to the PI via email. Then the PI agrees to the publication, also via email.
- The IM then submits the data and metadata record as a whole to the DataONE archive pipeline. After successful ingest and re-index, the dataset appears in the NGA LTER DataONE catalog. It is during this step that the DOI, or Digital Object Identifier, is issued.
Dataset Updates and Revisions
The DOI of a dataset refers to a version of the dataset that is frozen in time; this immutability gives a DOI its value. Because of it, however, changes to the data or metadata within the Research Workspace do not automatically transfer to the archived version of the data. Dataset revisions must therefore go through the same procedure as the initial data submission, and a revised DOI must be issued.
This applies to revisions of any sort, from the correction of typos to annual updates after new data are collected. In each case, the IM must transfer the new version of the data and/or metadata into the archive folder and then alert DataONE about the update. Please work with the IM to revise previously archived datasets and their metadata. The IM may delay multiple revisions in order to consolidate them into a single submission.
Researchers associated with our site are expected to upload their data in the NGA LTER campaign on Research Workspace within 1 year of collection, even if it is in a preliminary state. This will enable the IM to begin preparing the data for publication. It also enables intra-site collaborations. Data in the Research Workspace are not publicly accessible by default.
The National Science Foundation (NSF) and the LTER Network require that data are publicly available according to their policies. According to LTER’s Data Access Policy, “data are to be released to the general public according to the terms of the general data use agreement … within 2 years from collection and no later than the publication of the main findings from the dataset.”
Data Use Agreement
When the LTER’s Data Access Policy says “publicly available”, it means within the context of intellectual rights. Text within each metadata document defines the agreement between the data user and the LTER site releasing the data. To date, LTER has accepted the Creative Commons licensing framework as the correct text to use. There are ongoing discussions about whether these licenses are the most appropriate for scientific data. Nevertheless, LTER’s current language regarding licenses states:
“LTER data and metadata may be released under the license: CC BY – Attribution. This license lets others distribute, remix, tweak, and build upon your work (even commercially), as long as you are credited for the original creation. This is the most accommodating of licenses offered, and is recommended for maximum dissemination and use of licensed materials.” — LTER’s Data Access Policy
“Alternatively, LTER data and metadata may be released into the public domain: CC0 – No Rights Reserved. CC0 states that data are placed in the public domain, so that others may freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law. It is usual practice for major databases to make data freely available under CC0.” — LTER’s Data Access Policy
We assume that data users will conform to scientific norms regarding citing datasets and contacting PIs when their use of the data is extensive enough to warrant collaborations.
This language should be used to acknowledge NGA LTER funding sources in metadata:
“This material is based upon work supported by a consortium that comprises the Alaska Ocean Observing System (AOOS), the Exxon Valdez Oil Spill Trustee Council via its long-term monitoring program GulfWatchAlaska, the North Pacific Research Board through its Seward Line Long-term Monitoring program, and by the National Science Foundation through the Northern Gulf of Alaska Long-Term Ecological Research program (OCE-1656070). Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the funding agencies.”
This statement goes in the “Additional Credits” field of the Contacts section of the Resource Overview.
NGA LTER Best Practices
We have developed rules and suggestions to follow when formatting tabular data files, for two reasons. First, these best practices help us create data files that are easier to document and archive. Second, standardizing elements such as column names makes files easier to visualize, integrate, and analyze. Conforming to these best practices should therefore be considered part of the data file QC process.
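Checks like these can often be scripted before upload. The sketch below illustrates the idea for column names; note that the specific naming rule shown (lowercase letters, digits, and underscores, starting with a letter) is an assumption for illustration only, not the actual NGA LTER convention — consult the Best Practices document for the real rules.

```python
import csv
import re

# Illustrative rule only: lowercase letters, digits, and underscores,
# starting with a letter. The real NGA LTER conventions are defined in
# the Best Practices document.
COLUMN_NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")

def check_column_names(csv_path):
    """Return the header names in a CSV file that violate the assumed rule."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    return [name for name in header if not COLUMN_NAME_RE.match(name)]
```

For example, a file whose header row is `station_id,Temp (C),salinity` would yield `["Temp (C)"]`, flagging the one column name that needs renaming before submission.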
These best practices are tracked in GitHub as they evolve: NGA LTER Best Practices
Chris Turner (Email)
Axiom Data Science