U.S. Environmental Protection Agency. 2022. Using Data Repositories for Ocean and Coastal Acidification Monitoring Data. Washington D.C., Document No. EPA-842-R-22001. ------- Authors: Gulf of Maine Research Institute Riley Young Morse Northeast Regional Association of Coastal Ocean Observing Systems Tom Shyka EPA Office of Wetlands, Oceans, and Watersheds Holly Galavotti EPA Region 1 (retired) Matthew Liebman Any mention of trade names, products, or services does not imply an endorsement by the U.S. Government or EPA. EPA does not endorse any commercial products, services, or enterprises. The views expressed in this report are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency. ------- Contents Acknowledgments 1 Executive Summary 2 Project Overview 3 Background on Online Data Repositories 5 Approach 6 Evaluation of Online Data Repositories 7 Interviews with Data Providers and Data Users 7 Descriptions of Online Data Repositories 9 Online Data Repository Evaluation 12 Data Repository Test Cases 16 Results 19 Best Practices for Preparing and Submitting Data 20 References 22 Appendix A. Glossary of Terms 23 Appendix B. Interview Questions 26 Appendix C. Summary of Interviews with Data Managers and Data Users 27 Appendix D. Data Repository Submission Detail 29 NCEI OCADS 29 CUAHSI HydroShare 32 CUAHSI HydroServer/HIS 34 Environmental Information Exchange Network (EN) 35 IOOS RA ERDDAP 35 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Acknowledgments This project was supported by the U.S. Environmental Protection Agency (EPA) and developed in collaboration with the Casco Bay Estuary Partnership, Northeastern Regional Association of Coastal Ocean Observing Systems (NERACOOS), Gulf of Maine Research Institute, the participating National Estuary Programs and their partners. Special thanks to the following individuals who participated in interviews about managing and accessing ocean and coastal acidification data: Dr. Curtis Bohlen, Director, Casco Bay Estuary Partnership Pam DiBona, Executive Director, Massachusetts Bay National Estuary Partnership Mike Doan, Research Associate, Friends of Casco Bay Dr. Chris Hunt, Research Scientist, University of New Hampshire Dr. Emilio Mayorga, Senior Oceanographer, Applied Physics Laboratory, University of Washington Dr. Karina Nielsen, Executive Director, Estuary & Ocean Science Center, San Francisco State University Nicole Petersen, Water Quality Specialist, Barnegat Bay Partnership Dr. Grace Saba, Assistant Professor, Rutgers University Dr. Jim Vasslides, Senior Program Scientist, Barnegat Bay Partnership Prassede Vella, Staff Scientist, Massachusetts Bay National Estuary Partnership Test Case Participants: Mike Doan, Research Associate, Friends of Casco Bay Dr. Chris Hunt, Research Scientist, University of New Hampshire Jennie Rheuban, Research Specialist, Marine Chemistry & Geochemistry, Woods Hole Oceanographic Institution Reviewers: Jonathan Pollak, Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) Dr. Liqing Jiang, NOAA's National Centers for Environmental Information (NCEI) Ocean Carbon and Acidification Data Portal (OCADS) Katherine Weiler, EPA, Office of Wetlands, Oceans, and Watersheds Dwane Young, EPA, Office of Wetlands, Oceans, and Watersheds 1 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Executive Summary Since 2015, the U.S. Environmental Protection Agency (EPA) has funded and supported efforts to expand ocean and coastal acidification (OCA) monitoring to include in situ, autonomous pH and pC02 sensors at several National Estuary Program (NEP) sites. These NEPs and their partners have generated a large volume of high temporal resolution carbonate chemistry data and associated parameters including temperature, salinity, and dissolved oxygen. NEPs, however, often lack information, staff, and funding resources to store and share data beyond a local computer network. This results in limited discoverability and accessibility of data to a broader community of researchers and partners. Typically, data are made available through downloads from organization websites or by responding to direct requests for data. Continuous monitoring adds a new challenge where high frequency data result in large files, which cannot be readily transmitted and require greater storage capacity. Therefore, EPA commissioned this report to provide information that NEPs, their partners, and other monitoring groups can use to submit time-series ocean and coastal acidification data to publicly accessible online data repositories. Several online data repositories were evaluated for submitting ocean and coastal acidification data collected through the NEP and other monitoring programs. The evaluation of the repositories was informed by interviews with NEP data providers and OCA community data users to better understand their data management needs. The findings from the interviews were synthesized to produce a set of attributes to evaluate the data repositories. Test cases were conducted with three data providers to evaluate their experience using data repositories. The evaluation identified NOAA's National Centers for Environmental Information (NCEI) Ocean Carbon and Acidification Data Portal (OCADS) and The Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI)'s HydroShare Data Portal as the two most suitable repositories for ocean and coastal time series data because they are user-friendly and employ a robust approach to metadata requirements to make datasets discoverable, accessible, and interoperable. Other repositories that were evaluated include CUAHSI Hydrologic Information System (HIS), EPA's The Exchange Network (EN), and NOAA's Integrated Ocean Observing System (IOOS) Regional Association partnerships. It is important to note that this is not a comprehensive review of online data repositories that could house data from the NEPs. Rather, the focus was on several federally supported domain-specific ocean and coastal acidification and water quality repositories that are recognized for meeting the requirements for FAIR1 (Findable, Accessible, Interoperable, Reusable) data principles. In this report, general best practices are described for choosing a data repository, using discovery tools, preparing metadata and data, understanding processes for submitting data, and accessing a published dataset. Overall, new collaboration and online data sharing tools and approaches offer greater opportunities for scientists and managers to communicate results to a broad community, including large continuous monitoring datasets. Because this report may introduce new terminology to the reader, a glossary is provided in Appendix A. 1. Fair Data Principles - https://www. go-fair, ora/fair-principles/ 2 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Project Overview Estuaries and coastal areas are highly vulnerable to the impacts of ocean and coastal acidification (OCA), particularly on shellfish, coral reefs, fisheries, and the commercial and recreational industries that they support (Washington State Blue Ribbon Panel on Ocean Acidification, 2012; Gledhill et al., 2015; Saba et al., 2019). To assess this vulnerability, high resolution monitoring data are needed at varying spatial and temporal scales to provide actionable information tailored to each estuary. Estuarine specific drivers can contribute to acidification, such as nutrient enrichment from stormwater, agriculture and wastewater discharges, upwelling of (XX; rich seawater, elevated atmospheric C02from urban and agricultural activities, benthic and marsh-driven processes, and alkalinity and carbon content of freshwater flows (Duarte et al., 2013; Rheuban et al., 2019; Turner et al., 2021). Therefore, synoptic water quality surveys are expanding in scope to encompass carbonate chemistry parameters, such as pH and partial pressure of carbon dioxide (pC02). Since 2015, the Environmental Protection Agency (EPA) has funded and supported efforts to expand coastal acidification monitoring at several National Estuary Program (NIP) sites. The NEP is an EPA place-based program to protect and restore the water quality and ecological integrity of "estuaries of national significance." Each of the 28 NEPs sets priorities and develops a plan for restoring and protecting water quality in their estuary and watershed. With partner organizations, NEPs conduct water quality monitoring to track trends and to evaluate changes in response to management actions. The NEP sites conducting coastal acidification monitoring using autonomous pH and pC02 sensors include Casco Bay (Maine), Massachusetts Bay (Massachusetts), San Francisco Bay (California), Santa Monica Bay (California), Tampa Bay (Florida), Long Island Sound (Connecticut/ New York), Coastal Bend Bays (Texas), Tillamook Bay (Oregon), and Barnegat Bay (New Jersey), Mobile Bay (Alabama), Indian River Lagoon (Florida), and Puget Sound (Washington). These NEPs and their partners have generated a large volume of high temporal resolution carbonate chemistry data and associated parameters including temperature, salinity, and dissolved oxygen. These data have improved our understanding of spatial and temporal variability of coastal water carbonate chemistry and the drivers responsible for this variability (Rosenau et al., 2021). Several of the NEPs have telemetry capabilities for transmitting sensor data remotely in real-time. Other NEPs manually download the sensor data from an onboard instrument data logger during routine maintenance, typically every two to six weeks. Some sensors take measurements at 15-rnin intervals, which results in data files that can be larger than traditional water quality sampling programs. In some cases, observations are averaged and reported on an hourly basis to make the data easier to manage, but may result in the loss of detail and resolution. Detailed information about coastal acidification monitoring deployments in the NEPs can be found in EPA's report: Measuring Coastal Acidification Using In Situ Sensors in the National Estuary Program (EPA, 2021). The NEP OCA sensor monitoring projects have succeeded in collecting several years of high-resolution baseline data and are in the process of reporting results to the scientific and coastal management communities. NEPs, however, are frequently challenged by the capacity to store and provide access to high resolution continuous monitoring data. They typically do not have a consistent approach for analyzing, sharing, and archiving their ocean and coastal acidification data. Most of the data are stored on local computer networks at the NEP or with their partners, which include universities, state agencies, and other organizations. This has resulted in some data that are not as easily accessible (i.e., "findable") to the broader scientific and management community. It is noted that some NEPs have developed partnerships with federal agencies such as U.S. Geological Survey (USGS) and NOAA's Integrated Ocean Observing 3 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Systems (IOOS) to make their data more accessible. in addition to the NEPs, there are an increasing number of organizations, including community science organizations, that are beginning to monitor OCA data. The recently released EPA document "Guidelines for Measuring Changes in Seawater pl-l and Associated Carbonate Chemistry in Coastal Environments of the Eastern U.S" (Pimenta and Grear, 2018) provides guidelines for sample collection, preservation, and analysis. Many of these organizations, however, are also challenged by data management. As efforts and interests on OCA continue to rise around the nation, monitoring groups need information about how to manage data locally and make it more accessible to a wider public. In addition, there are new reguirements from funders, agencies, and journal publishers to improve the accessibility of data so that it is available to the scientific community and other stakeholders. Some publications, such as Nature, recommend that where possible datasets should be submitted to discipline-specific, community-recognized repositories.2 Therefore, EPA commissioned this report to provide information that NEPs, their partners, and other monitoring groups can use to submit time-series data to publicly accessible online data repositories. Data repositories provide the capacity to store and provide public access to data which can minimize the in-house technology burden on individual organizations. By putting data into a data repository, the data and associated metadata collected can be made available for discovery and use in research, management, and industry operations. Storage of data in a repository can also satisfy data management and data sharing reguirements from funding agencies and peer-reviewed publications. It is also important to note that sharing data in online repositories is a component of the trend towards Open Science3 principles, where science is conducted in a way so that others can collaborate and share research data and processes (e.g., code for producing graphics or statistical analyses on platforms such as GitHub). However, finding the right repository for a dataset can be a time-consuming process. Data Provider , . , . . . . . . | publications Findable and preserved for Data Provider manages data and metadata in , H Accessible Reusability local system and publishes to repository i __ __ ___ ___ ___ _______ __ / Figure 1 - Adapted from Common data management workflow (Amorim, 2014). This figure demonstrates a common data workflow where the data provider (e.g., NEP project data manager or principal investigator) collects sensor data, prepares metadata, and stores both records together in an internal data management system. By publishing data to an online repository, these datasets become findable and accessible to a larger audience of data users. 2. https://www. nature, com/sdata/policies/repositories 3. Center for Open Science: htlosj/www. cos, io/ Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Fully reviewing and evaluating the many publicly available data repositories would be a large undertaking outside the scope of this report. At the outset of this project, a few key repositories were identified that were deemed by the authors and advisors to be well-suited for the continuous coastal acidification monitoring data collected by the NEPs. Although this report focuses on the needs of the NEP and their partners, additional audiences that could benefit from this project include academia, the Coastal Acidification Networks, non-governmental organizations, and citizen and community science groups involved in collecting OCA data (e.g. Gassett et al., 2021). In addition, the best practices described are applicable to other water quality monitoring programs. Background on Online Data Repositories Online data repositories are large, well-managed, database infrastructures designed to store data, enable public access to datasets, and in some cases act as a permanent archive for the data. Repositories are a central common point for storing related files and may be specialized by subject matter, scientific discipline, or geospatial region. All repositories support the idea that sharing data improves awareness of the results of a study and enables use of data beyond the initial purpose. Making research and monitoring data available online increases transparency, supports reproducibility of the original work, and enables data users anywhere to access, share, understand, compare, and synthesize results from the research or monitoring effort. Data repositories vary in the level of complexity and required elements from the data provider. At one end of the spectrum are the generalist repositories that are open to submissions of data regardless of data type, content, or discipline. These repositories have minimal requirements for metadata that are necessary to catalog the dataset for discovery through search tools and allow the dataset to be cited by others (i.e., citation metadata). At the other end of (0 > 3 a) (A I/) 0) 0 o re si re ฆo c Domain Specific Specialized Format/ Data Schema iCUAHSI HIS NATIONAL CENTERS FOR _ป ENVIRONMENTAL INFORMATION OCADS \7/ ERDDAP Easier access to scientific data DataฎN Environmental Information exchange ^Network fyffigshare zenodo iCUAHSI HydroShare Biological & Chemical Oceanography Data Managomen: Office Metadata and data formatting requirements, interoperability Adapted from Corinna Gries, cgries@wisc.edu, https://EnvironmentalDatalriitiative.org Figure 2 - Adapted from figure developed by Corinna Gries, Environmental Data Initiative https://EnvironmentalDatalnitiative,org. This figure illustrates the range of data repositories from the general to the highly specialized. Repositories in the bottom left have fewer requirements for the data provider while the repositories in the upper right have specific requirements and data formats. 5 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- the spectrum are repositories that are discipline-specific and require metadata and datasets to conform to a specific data schema that defines variable names and units, often with a specific, standard vocabulary. The repositories evaluated in this project were those that are domain-specific with more general metadata requirements. While some repositories may fit somewhere in between the two ends of the spectrum, a citation metadata data repository allows the data provider to upload data in the original (or native) file format (e.g., .csv, Excel, multidimensional space-time data sets (NetCDF)) along with metadata required by the individual repository. These metadata typically include basic descriptive information (e.g., study name, data provider name, institution, temporal and geospatial bounds of study, project abstract, keywords) and information about the fields (e.g., parameters, units, calculations). Once datasets are added to the repository, they can be found using search tools that query the metadata and the original data files can be downloaded by end users for further analysis. Some repositories enable users to organize similar datasets into collections around specific studies or events. The data repositories that have a specialized data schema are more complex but enable greater interoperability of datasets. These data repositories require that datasets (generally time series) conform to a defined data schema with required fields vocabularies for naming conventions the data itself (e.g., parameter name, units) that must be matched by the incoming dataset. Templates are available to the data provider to map or align the metadata and data fields to the data schema. From there, the datasets can be integrated into a relational database where the data observations can be queried, visualized, and compared to other data using tools provided by the repository. In all types of repositories, well-described datasets are more findable and accessible to the public. A citation metadata data repository can be an easier path for a group with limited time or resources for data management because of the minimal metadata requirements and the ability to submit data in the original file format. Additionally, in a citation metadata data repository, the data provider can define the parameter names and units for the data. By contrast, the data repositories with a more defined data schema and controlled vocabularies require that the data provider transform the data to conform to the data schema. Approach To better understand the OCA data management needs of the NEPs, a series of interviews were conducted with NEP data providers and users of the data. Data providers often include the principal investigators, or technical data managers for the project, who are responsible for maintaining datasets on local computer networks. Data providers often report the analyses of data to the public, such as a NEP State of the Bay report. Interviews with data users outside of the individual NEPs were also conducted to help inform NEP principal investigators and data managers about additional uses of OCA data and considerations about documenting and sharing these data. The findings from the interviews were synthesized to produce a set of attributes to evaluate the data repositories. These attributes were then used to inform three test cases that were conducted with data providers (who are also principal investigators of the research) to evaluate the process of preparing and submitting data to online repositories. This report summarizes the development and application of the attributes in the test cases including: 1) identifying and selecting a data repository; 2) preparing metadata and data for submission; and 3) evaluating accessibility of the data. Online data repositories are identified that are suitable for submission of continuous ocean and coastal acidification data collected by the NEP and other monitoring groups. Lastly, general best practices are described for accessing and using search tools in a data repository, preparing metadata and data, understanding processes for submitting data, and testing for access and publication of a dataset. 6 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Evaluation of Online Data Repositories Interviews with Data Providers and Data Users After reviewing EPA's report Measuring Coastal Acidification Using In Situ Sensors in the National Estuary Program (EPA, 2021), five data providers were interviewed to better understand how NEPs are managing and sharing OCA data and submitting data to online data repositories. Three data users were also interviewed to identify issues related to accessing data from repositories or issues working with the OCA data in general. Data Provider Interviews: Submitting Data to a Repository Interviews were conducted with five data providers regarding the workflow from data collection to data submission. They were asked about the type of data being collected; the workflow from the sensor to the internal data management system; if metadata records had been established; guality assurance/guality control (QA/QC) processes; whether the NEP had submitted or planned to submit to a public data repository; attributes for selecting a repository; and challenges or concerns submitting data to that repository. Full guestions and responses are provided in Appendices B and C. Common practices and issues emerged. The NEPs and their monitoring partners: manage sensor OCA data in-house (database, local file store) and make data available upon reguest (but not online); establish metadata records that are not always complete; can share data with stakeholders including state agencies, scientists, and more recently aguaculturists; have technical capability to manage data and share to a public repository, but lack time to evaluate and choose repository and maintain regular updates (e.g., guarterly, annually); are interested in making OCA data more accessible; are interested in more guidance on QA/QC for OCA data; and are interested in common processes and tools, such as the NOAA National Estuary Research Reserve System's Centralized Data Management Office (CDMO). Data User Interviews: Accessing Data in a Repository Interviews were conducted with three users of OCA data about their experiences discovering and accessing data from online repositories. These users included a university research scientist interested in OCA data to support research and modeling efforts, and two NEP managers who use OCA data to produce synthesis reports on local environmental conditions (e.g., Casco Bav State of the Bav report. 6th edition). The full list of guestions and responses are provided in Appendices B and C. Common issues are grouped into three categories - data discovery, metadata, and data formats. Data Discovery Data users often must look at multiple repositories, which all have their own data access and discovery tools. Common repositories mentioned were NOAA's National Centers for Environmental Information (NCEI) Ocean Carbon and Acidification Data Portal (OCADS), Biological & Chemical Oceanography Data Management Office (BCO-DMO), and state agency data repositories. The larger online repositories generally have good search tools and return datasets that match the search gueries. However, the results often return too many datasets in the search results, reguiring the user to evaluate each result. The user must often repeat the search with different keywords and parameters to further filter the list or evaluate each data set individually. The repositories generally organize the datasets by title which can be complicated and hard to understand. Often, the data user must contact the data provider to answer specific guestions before they feel comfortable using the dataset. An ideal functionality for a repository, as identified by the data users interviewed, would include better search functionality 7 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- and the ability to programmaticaliy acquire the data through web services such as an application programming interface (API). Metadata While the data users had different purposes for the data, the most common issues were that the metadata provide insufficient information for a dataset. There is often inconsistent use of field names by data providers measuring similar parameters as well as a lack of detail about accuracy and uncertainty ranges of sensors. Data providers have different styles and approaches for describing metadata, and data users find that they often need to search through protocols to find the relevant information. Some data repositories generate search lists from user-entered parameters which can result in multiple versions of similar terms making it difficult to do effective searches. The QA/QC processes are often not well described in the metadata or are buried in protocols. Data users found it difficult to understand processes around flagged data. More clarity on how data were flagged, definitions of the flags, and whether the flagged data were removed from the dataset would be desired. To avoid making assumptions, data users regularly need to contact the data provider to address questions about the data. Data users found that they are more likely to use data when the metadata are clear and QA/QC processes are well described. A basic metadata record that would satisfy the data user needs would include: well-documented QA/QC procedures, ideally represented in formal documentation of data quality practices such as a Quality Assurance Project Plan (QAPP); information about methodology and study design that is easy to find; description of how data were processed or analyzed; well-described variables using common vocabularies for terms (e.g., units, instruments, date/time, location); and detail on how flagged data are handled. Data Formats There were differences in preferred file formats among the data users. Some prefer NetCDF output while others prefer .csv/Excel if well-described. Some common problems that were noted with .csv and Excel files had to do with file size and file structure. When the data files are too large, it can create issues with analysis programs (e.g., R). In cases where data providers add metadata to the header of the document, it can cause problems with analysis programs skipping rows. 8 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Descriptions of Online Data Repositories Data Repositories Considered Working with project partners at the outset of this project, several large, federally supported repositories with a focus on water quality data and/or OCA data were selected for evaluation.4 These include: NOAA's National Centers for Environmental Information (NCEI) Ocean Carbon and Acidification Data Portal (OCADS) The Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI)'s HydroShare Data Portal and Hydrologic Information System (HIS) EPA's The Exchange Network (EN) NOAA's Integrated Ocean Observing System (IOOS) ฆ Specifically, IOOS Regional Associations that are using ERDDAP5 software to manage and disseminate data. These data repositories were reviewed and evaluated based on the issues identified during the interviews with data providers and data users. The evaluations outline the challenges and opportunities associated with submitting data to different repositories. The repositories were also evaluated on the degree to which they adhere to the FAIR data standards, which include making data Findable, Accessible, Interoperable, and Reusable (Wilkinson et al, 2016). While this evaluation approach was specifically applied to a few select repositories based on the type of data, the structured approach described here could be used to evaluate other data repositories.These repositories are briefly described below. Additional information about preparing data and submitting to the repositories is available in Appendix D. Name: NCEI Ocean Carbon Data System (OCADS) URL: https://www.ncei.noaa.aov/access/ocean-carbon-data-svstem/ Owner/operator: NOAA National Centers for Environmental Information (NCEI); with funding provided by NOAA/ Climate Program Office/Ocean Observing and Monitoring Division, NOAA/Ocean Acidification Program, and the National Aeronautics and Space Administration (NASA). NOAA's NCEI OCADS is a data repository established in 2017 as a carbon data specific repository with a mission to host and provide access to ocean carbon datasets collected worldwide. It is also the permanent home of data from the Carbon Dioxide Information Analysis Center (CDIAC), which includes over 30 years of data, as well as newly acquired ocean carbon data. The data are stored and served from NCEI's archive data access services. NCEI supports FAIR data publication and is listed in the Repository Finder tool https://repositorvfinder.datacite.org/ developed by the Enabling FAIR Data Project. The OCADS data repository specializes in carbon data and includes templates with metadata and data recommendations for parameter names and attributes specific to OCA. The template was developed by working with the OCA research community to their meet scientific needs. OCADS manages controlled vocabularies for many metadata elements such as parameters, observation type, instruments (including uncertainty descriptions specific to certain sensor manufacturers), institution type, etc. All parameter names provided in the dataset are mapped to a controlled vocabulary; previously this was done by NCEI after submission. Going forward, data providers will be able to do this during submission using the Scientific Data Information System (SDIS). 4. Other National Science Foundation (NSF) supported repositories such as Biological & Chemical Oceanography Data Manage- ment Office (BCO-DMO) for ecosystem research data and DataONE were also considered but this project was limited in scope. 5. https://coastwatch. pfea. noaa. gov/erddap/index. html National Centers for Environmental Infori NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION NATIONAL OCEANIC AND Information Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- HYDROSHARE Name: CUAHSI HydroShare URL: https://www.hvdroshare.org/ Owner/operator: The Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) HydroShare is a data repository developed by CUAHSI for the purpose of storing water-resources data and models, and making them accessible to the public. HydroShare provides a data portal to discover data as well as tools to manipulate and visualize data. HydroShare was developed and supported by the National Science Foundation (NSF) as well as an active membership program. It was built using open-source software (Python/Django, iRODS) and has a collaborative project on GitHub (https://github.com/hvdroshare/') to further develop the platform and share data and models. CUAHSI supports FAIR data publication and is listed in the Repository Finder tool https://repositorvfinder. datacite.org/ developed by the Enabling FAIR Data Project. The data types supported include hydrological time series, geographic features (vector data), geographic rasters (gridded data), multidimensional space-time data sets (NetCDF) and composite/complex datasets (river geometry). HydroShare only reguires basic citation metadata for submitted datasets. The actual datasets submitted to HydroShare are not automatically integrated into a relational database where the values can be gueried or visualized through tools on the website. The reguired descriptive metadata that accompanies the dataset enables discovery through the search portal. HydroShare enables collaboration during the preparation of a dataset and provides capacity to group individual datasets under a common theme or project. Metadata entries can also be created on HydroShare where the dataset itself are not uploaded, but a link is provided to access data stored in another location. CUAHSI universities allied for water research Name: CUAHSI Hydrologic Information System (HIS) URL: http://data.cuahsi.org/ Owner/operator: The Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) CUAHSI's HIS portal, known as HydroClient, provides access to millions of time series observations collected by federal agencies, researchers, and volunteer groups. The data are primarily hydrological and include stream gauge measurements, meteorological stations, grab samples, and soil moisture measurements. It is a more complex data repository that reguires that metadata and data are transformed into a discipline-specific data schema for hydrological data where they are then integrated into a relational database. Data submitted to HIS are available through the HydroClient data portal. Datasets can be directly downloaded in the original format or visualized through online tools available on HydroClient. Upon discussion with CUAHSI program staff, the newer HydroShare data tool was recommended for time series such as the continuous OCA sensor data. This was largely because the template reguired to map datasets for submission to HydroClient can be cumbersome and the recommended format is specific to the hydrological community. Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Environmental Information exchange ^Network Name: The Exchange Network (EN) URL: https://www.epa.aov/exchanaenetwork. https://www.exchangenetwork.net/ Owner/operator: U.S. Environmental Protection Agency EN was designed to improve data sharing between states, territories, tribes, and EPA. Prior to the EN, data sharing was difficult because of inconsistencies with how data were collected and stored as well as incompatibility with computer systems. The EN consists of community developed "exchanges" that are essentially data schema for specific disciplines. Generally state agencies adopt relevant schema data standards into their own data systems and set up EN software (known as a server node) that enables them to share data directly to EPA's Central Data Exchange through standardized web services and APIs. For water data, the Water Quality Exchange (WQX) is the primary data exchange. Non-state agency data providers can share data directly with the relevant state agency through individual arrangement. Non-state entities can use a tool called WQX Web (https://www.epa.gov/waterdata/wqx-web-account- registration) to upload data files. The process does involve setting up a user account for the organization and mapping the data to the WQX format. WQX currently does not support continuous water quality monitoring data and therefore it may not be the ideal repository for the NEP OCA data. However, EPA continues to explore approaches for sharing and storing continuous monitoring data. If high frequency time series such as continuous monitoring data is submitted to the Exchange Network, it is recommended that data be summarized by hourly or daily averages with the full data logger files uploaded as an attachment. IOOS Name: The U.S. Integrated Ocean Observing System (IOOS) Regional Associations URL: https://ioos.noaa.gov/regions/ Owner/operator: NOAA IOOS is a coordinated network of people and technology that work together to compile and distribute observation data from ocean and coastal environments and guide the development of activities within defined regions. IOOS is comprised of eleven Regional Associations (RAs) that serve stakeholders from the nation's coastal communities including the Great Lakes, the Caribbean and the Pacific Islands and territories. NEPs can partner with IOOS RAs to make OCA data accessible through the RA's data distribution services. Most of the eleven IOOS RAs have implemented a data management software tool called ERDDAP to provide access to a multitude of ocean observing datasets from the region. For example, the San Francisco Estuary Partnership works with the Central and Northern California Ocean Observing System (CENCOOS) to integrate data from their OCA monitoring buoy. The data are available through the CENCOOS ERDDAP installation https://erddap.cencoos.org/erddap/tabledap/tiburon-co2.html. All RAs have a charge from IOOS to serve as a Regional Information Coordination Entity (RICE)6. This includes integrating local ocean observing data from the region and making it accessible to the public. The first step for an interested NEP partner is to reach out to the regional RA to discuss the data and develop a plan for integrating the data. 6. https://ioos.noaa.aov/about/aovernance-and-manaaement/certification-extendina-reach-reaional-data/ Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Online Data Repository Evaluation A series of attributes were developed from the results of the interviews to evaluate the individual data repositories. The attributes were organized into four categories: metadata, data preparation and submission, data access, and other attributes. Other attributes include information about long-term stability of the repository, associated costs, data archiving, and additional notable features. The following questions were developed for each category: Metadata Are there metadata requirements? Are there templates provided? What metadata standards are used? Are there metadata file format requirements? Do the repositories adhere to FAIR principles? Data Preparation and Submission What guidance is available for data providers? Are data templates available? What are the file type requirements/recommendations? What is the process for submitting data? Data Access How are data accessed? What guidance is provided for users? What filter/tools are available? Are there other OCA datasets in the repository? Are there APIs for data? Other Attributes Are Digital Object Identifiers (DOIs) available? Is the repository expected to be stable in the long-term? Are there collaborative features? Is there a cost to submit and publish data? Does it meet the FAIR data standard? Metadata All the repositories required some level of metadata that describe the dataset and are used to find the dataset through the search tools. OCADS required additional metadata that describe the carbon data parameters being collected. CUAFISI HIS and EPA's EN required that all metadata conform to a discipline specific schema. All repositories provided support for preparing metadata for submission through examples or templates. There were varying levels of guidance or requirements to use specific discipline-based vocabularies or standards for the data. While the initial repository evaluation occurred before the test case work, it should be noted that this was an area that was identified as needing the most time to navigate by the test case participants. Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- TABLE 1. METADATA EVALUATION ATTRIBUTES DATA REPOSITORY OCADS/NCEI CUAHSI HydroS ha re CUAHSI HIS IOOS ERDDAP EPA Exchange Network Metadata Yes Yes Yes Yes Yes Requirements Metadata Yes Yes Yes Yes Yes Templates Available Metadata WHP-Exchange No specific Dublin core, CF WQX (Water Vocabularies Format vocabularies or ODM, data Quality data (CCHDO); data standards schema, and Exchange) schema WOCE flags required controlled and controlled recommended vocabulary vocabulary Metadata Metadata added Citation metadata Added to Excel Work directly Work directly with Submission through form on through form, templates with Regional Exchange Network Process SDIS site can upload (Standard or Association; node (usually state supplementary Advanced) ERDDAP has agency) or submit PDF data submission template via WQX Web CF - NetCDF Climate and Forecast (CF) Conventions CCHDO - Clivar and Hydrographic Data Office (for global CTD and hydrographic data) ODM - Observations Data Model, data structure used by CUAHSI HIS WHP Exchange format - a text-based format for bottle and CTD data WOCE - World Ocean Circulation Experiment Preparing/Submitting Metadata and Data The evaluation attributes for preparing and submitting data included whether metadata and data templates were available, if there were data type limitations (e.g., numeric, date/time, spatial, string), recommendations or requirements for file format, and guidance on how to submit data. TABLE 2. EVALUATION ATTRIBUTES FOR PREPARING AND SUBMITTING DATA DATA REPOSITORY OCADS/NCEI CUAHSI HydroS ha re CUAHSI HIS IOOS ERDDAP EPA Exchange Network Data Templates Available Yes - additional data templates for underway, profile, mooring Yes - also provides examples of other datasets Yes - template for data schema Yes - IOOS ERDDAP gold standard Yes - template for data schema Dataset File Format ASCII/.csv, NetCDF .csv, .xlsx, PDF .xlsx ASCII/.csv, NetCDF, XML XML, .xlsx, .csv How to Submit Data Create account on OAP Science Data Information System (SDIS) website Create account, use website form to provide metadata, and upload dataset Fill out data templates, upload files through website Work directly with Regional Association; ERDDAP has data submission template Work directly with Exchange Network node (usually state agency) or submit via WQX Web Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Accessing Data Evaluation of attributes for accessing data included description of the data access portal, availability of an API for acquiring data, ease of discovery/access of data, and whether other OCA data are in the repository. All the repositories have data portals with similar features for discovering and accessing data. The queries can be done by filtering based on keywords, location, and date. The query results are only as good as the data entered by the data provider, reinforcing the need to develop and provide good, descriptive metadata. TABLE 3. EVALUATION ATTRIBUTES FOR ACCESSING DATA DATA REPOSITORY OCADS/NCEI CUAHSI Hydro- Share CUAHSI HIS IOOS ERDDAP EPA Exchange Network Data Access Portal Data Portal - map/filter search Data Portal - map/filter search Data Portal - map/filter search ERDDAP search/ filter interface, API USGS/EPA Water Quality Portal map/ filter search API for Data Yes - for metadata, data stored in files (.csv/.xlsx) Yes - for metadata, data stored in files (.csv/.xlsx) Yes Yes - for metadata and data Yes - for metadata and data; data can be combined across organizations because the data are in common format and use common terms Ease of Accessibility Filter based search (keywords, location, date) Filter based search (keywords, location, date) Filter based search (keywords, location, date) Filter based search (keywords, location, date) Filter based search (keywords, location, date) Data Discovery URL https://www. ncei.noaa.aov/ access/oads/ https://www. hvdroshare.ora/ search https://data. cuahsi.ora/ Varies by institution, CENCOOS example: https:// erddap.cencoos. ora/erddap/ tabledap/tiburon- co2.html https://www. waterqualitvdata. usL Other OCA Data in Repository Yes - OCA data are the primary focus of repository Yes Yes Yes - varies by RA Yes Other Attributes The attributes for this category includes issues raised in the interviews focused on publishing and collaboration not covered by the previous categories. They included the ability to get a Digital Object Identifier (DOI) number, long term storage or archiving policies, collaborative features, costs, versioning, limitations, and stability of platform. Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- TABLE 4. OTHER EVALUATION ATTRIBUTES DATA REPOSITORY OCADS/NCEI CUAHSI HydroShare CUAHSI HIS IOOS ERDDAP EPA Exchange Network DOIs Yes - Upon request, Yes - DOI numbers are Yes - DOI numbers Not issued by No Available are available after minted immediately on are minted after ERDDAP but data are accepted publishing data data are accepted can be added by NCEI (1-2 weeks) as variable in configuration Long Term Yes - Data are Yes - NSF commitment Yes - NSF No - but RA Yes - Water Data Storage archived at NCEI to maintain data commitment to may have an Quality Portal (NOAA's official data repository maintain data archiving plan archive) repository with NCEI Collaborative No Yes - can group No No No Features datasets under collections, tools for interaction within platform Cost for Not for NOAA No cost at present and No cost at present RA may No Submitting supported data; no immediate plans to and no immediate request May be costs for change plans to change support for very large datasets; publishing Non-NOAA funded data data will be subject to approval Data Submitted data are Submitted data can be Submitted data are Public only Supports Publishing publicly available kept private until ready publicly available public and Options to make public private data (e.g., regulatory) Limitations Datasets larger than 2 GB data per user but 2 GB data per user. Work with WQX is not 2 GB must be sent can create additional Larger datasets Regional structured for via FTP; for larger users for collection can be added Association continuous datasets, contact if needed. Larger directly to the party to set up monitoring OCADS directly datasets can be added middleware used by agreement data, requires directly to the party CUAHSI CiRODSI before conversion to middleware used by submitting binned data CUAHSI (iRODS) Long Term Yes - Uses NCEI Yes - Data system Yes - Data Yes - RAs are EPA has Stability data system as stored at Renaissance system stored supported by long-term backend. OCADS Computing Institute at Renaissance NOAA IOOS; commitment is upgrading to (RENCI), supported Computing Institute specific data to EN better metadata user by NSF; offsite backup (RENCI); supported systems can interface and search storage by NSF; offsite vary by RA portal in 2021 backup storage Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Data Repository Test Cases Following the initial review and evaluation of the data repositories, test cases were conducted to determine which repository may be the best fit for the data management needs of the NEPs and related OCA monitoring groups. SHELL DAY Test Case: Shell Day Data https://iopscience.iop.Org/article/10.1088/1748-9326/abcb39: http://www.necan.org/ShellDav Data Provider: Jennie Rheuban, Woods Hole Oceanographic Institution Data Summary: Citizen Science data from Shell Day monitoring event. Discrete samples of surface water samples were collected up to five times at different tidal stages on one day throughout New England in August 2019. Bottles were delivered to participating laboratories and analyzed for total alkalinity. Salinity and temperature were also recorded by samplers. Data Description: Discrete sampling data/historic/final data set Requirement for Data Repository: Need to submit data to accompany a paper in publication, DOI is required. Repository Tested: CUAHSI HydroShare Recommend: Yes Link to Dataset: https://www.hvdroshare.org/resource/4364cffedc7e49d49255eef5f8e83148/ Overall Ms. Rheuben reported that the HydroShare platform was straightforward and easy to use, largely because she had a separate file for a detailed metadata record and the data files were ready to submit. It only took her a few hours to set up an account, fill out the required metadata fields through an online form, and upload the individual data and supplemental metadata files. While developing the record, she noted that the data can be kept private, and other collaborators can be given access. This was extremely useful in finalizing the dataset with her collaborators. Metadata General project metadata elements were required and included project name, authors, keywords, project summary, and temporal and spatial boundaries. Additional fields were available to list contributors and funders. There is no requirement or recommendation for structuring the data in terms of vocabulary or required parameters. The general project metadata elements included an abstract, or description, of the dataset. She felt this was easy to generate using content from the paper that was written about the project and dataset. Additional sentences were added to describe the parameters and keywords. Ms. Rheuben had familiarity submitting other carbonate datasets to OCADS where additional metadata requirements include information about the data. These metadata are mapped to a template that allows for information to be provided about uncertainty for carbonate data parameters. She was surprised that HydroShare did not provide a similar template. Submitting Data There was flexibility on the platform to use a variety of data formats for data and metadata files which was helpful. There are several states of data visibility after submitting data that are well-described. These include Private - not visible publicly, but can be shared with collaborators; Public - publicly visible, but DOI isn't published, can still edit; and Published - DOI finalized, dataset record is closed to edit. Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Publication and Collaboration Tools The feature to add collaborators for co-editing in draft mode was helpful to Ms. Rheuben and is a feature unique to HydroShare. For this highly collaborative project, she found it useful to have the ability to add co-authors and funders. However, the format was structured for federal grants and didn't fit all funder cases such as foundations. Before adding the keywords, she tried to infer what the search tools are looking for to tailor the information so the dataset would be discoverable. She looked at a few other datasets and did something similar with the structure of her metadata. Getting a DOI quickly was a requirement for this data set as a publication was about to go to press. The DOI was minted instantly, and a citation was provided in draft mode even though the data weren't finalized and published. This was particularly helpful for her to be able to send to the journal immediately. There were some lingering questions Ms. Rheuban had about best practices for updating this dataset annually. It wasn't clear to her if it is recommended to do a separate submission with a new DOI issued each year. But she noted that approach is consistent with projects like the Global Carbon Project that has a yearly citation and DOI for data. C-tutCv Partnership Test Case: Casco Bay Estuary Partnership Continuous Monitoring OCA system, South Portland, Maine Data Provider: Dr. Chris Hunt, University of New Hampshire (UNH) Data Summary: Casco Bay Estuary Partnership is an NEP collecting continuous monitoring OCA data. The University of New Hampshire manages the sensors and processes the data. Data Description: One continuous monitoring station, housing sensors for measuring pH, salinity, temperature, pCO.-,, and DO, was deployed in Casco Bay. There is no telemetry, so once per month the data were downloaded from the sensor loggers. The raw data were processed by UNH using MATLAB and vendor-provided software and output to an Excel file with averaged hourly readings. The data were then organized in annual files and sent to the Casco Bay Estuary Partnership. Requirement for Data Repository: None, however, it is planned to publish the data findings in the future. Repository Tested: OCADS Recommend: Yes Link to Dataset: https://www.ncei.noaa.aov/data/oceans/ncei/ocads/metadata/0229832.html Overall Because of his familiarity submitting other datasets to NCEI, Dr. Hunt found the process for submission to OCADS was straightforward and easy to follow. The website provided clear guidance on accessing the metadata templates and how to submit once they are completed. Metadata Dr. Hunt stores the monthly data logger data in an Excel file that contains basic metadata about the dataset. Each year an annual report is submitted to the Casco Bay Estuary Partnership that contains additional metadata. OCADS provides a metadata submission form (as an Excel file template) specific to OCA data with required and recommended fields. In the OCADS template, there are over two hundred metadata elements where information can be provided if available, but only about 36 fields are required. The most specificity that is asked for in this template is for calibration information; for this dataset, that only applied to manufacturer information and was easy to add. For data that might have a lab analysis component or was measured Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- on a research or monitoring cruise, there would likely be more information about calibration that could be added, and the template would accommodate that data. Submitting Data Once the metadata submission form was filled out, it was an easy process to email the contact person listed on the OCADS website. Dr. Hunt received confirmation that the metadata and data files were received, and he was given an estimate of about a week to have the data published. He received notice within a week that the data were published. Note: since the use case testing, NCEI has updated to the Scientific Data Information System (SDIS) for submitting data. More information is available in Appendix D. Publication and Collaboration Tools OCADS did not have the collaboration tools that were present in CUAHSI. Once submitted and accepted by OCADS, the data set cannot be edited, and all data are publicly available. Friends of Casco Bay Casco BAYKEEPER Test Case: Friends of Casco Bay Data Provider: Mike Doan, Friends of Casco Bay Data Summary: Casco Bay, Maine Data Description: A continuous monitoring station has been deployed off Cousins Island in Yarmouth, near the coastal midpoint of Casco Bay since 2016. Year-round, the station collects hourly measurements of depth, temperature, salinity, dissolved oxygen (DO), chlorophyll, turbidity, pH, and the partial pressure of carbon dioxide (pC02). The data are processed annually and shared with the Casco Bay Estuary Partnership. https://www.cascobavestuarv.org/resources/data-open-science/ https://www.cascobavestuarv.ora/casco-bav/monitorina/ https://www.cascobav.org/our-work/science/continuous-monitoring-station/ Requirement for Data Repository: None, however, there is interest in sharing data more broadly. Repository Tested: CUAHSI HydroShare Recommend: Yes Link to Dataset: http://www.hvdroshare.org/resource/0aadc8e61e68436abb5e99e0be6565a2 ("currently in private mode) Overall Friends of Casco Bay downloads the data from the sensor data logger every two to three weeks and manages the data in a local database along with metadata about the project and sensor. They also have a QAPP for the monitoring program that is available upon request. They have been interested in sharing data more broadly but have not submitted to an online data repository. The process to create an account on HydroShare was simple and straightforward. Metadata The basic metadata that is required to set up the dataset is added through a form on the website. Having this information ready to go made the process very easy. Looking at a few similar data records was helpful in structuring keywords and additional metadata for the data set. Submitting Data The data are kept in an Excel file and the metadata in a PDF. It was straightforward to upload the data as attachments to the record. Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Publication and Collaboration Tools The dataset is on HydroShare but is still in unpublished mode pending final decision from Friends of Casco Bay to make it publicly available. Results As a result of the interviews, online data repository review, and the test cases, two primary data repositories, NOAA's OCADS and CUASHI's HydroShare, were found to be suitable options for the NEPs and other data managers for submission of ocean and coastal acidification data (both discrete and continuous). These repositories both employ a similar approach to basic metadata requirements to make datasets more discoverable and accessible. HydroShare requires basic metadata that describes the project (e.g., who conducted the research, the temporal and geospatial bounds of the study, a summary or abstract description of the project, and keywords that are used to find the dataset). OCADS also requires similar project level metadata as well as metadata specific to OCA parameters (e.g., parameter name, units, uncertainty of sensor). The other repositories that were evaluated are also suitable for OCA data but had additional process steps that were not evaluated with the test cases. These repositories were CUAHSI HIS, EPA Exchange Network, and IOOS Regional Association partnerships. The HydroShare platform is very easy to use and provides some useful features that OCADS does not. Specifically, HydroShare allows users to: collaborate on dataset submission, organize datasets under a collection, and select from several options for publication states based on the requirements of the data provider. One of the limitations of HydroShare, however, is a lack of specific requirements or prompts for detailed metadata about OCA parameters which puts the onus on the data provider to include that information. NOAA's OCADS is a discipline-specific repository for global carbon chemistry data and is a valuable resource for the OCA community, providing a "one-stop shop" for OCA data. OCADS provides more structure to the data provider by requiring a metadata template to describe OCA metadata. While there are more required metadata fields than the other platforms, it is up to the data provider to create as thorough a record as possible. Submitting the data is straightforward, and the OCADS contacts were quick to respond once the data were submitted. In sum, if the data provider values sharing additional datasets that are not specifically measuring OCA data, HydroShare would be suitable for the collaborative features and ability to group multiple datasets into collections. In contrast, if the data provider is only interested in submitting OCA data, the OCADS repository could be the best choice. OCADS is interested in being a clearinghouse for OCA datasets and synthesized data products such as the recently developed Coastal Ocean Data Analysis Product for North America (CODAP-NA) dataset comprised of data from over 60 cruises and 3000 oceanographic profiles7. There is also a hybrid model where an NEP could use HydroShare to set up a metadata-only record that points to a dataset that is located within another repository such as OCADS. This approach would be useful if the NEP was interested in managing multiple datasets with HydroShare, but also wanted to make sure the OCA data was findable through the dedicated OCADS repository. While these repositories were found to be best suited to the individual test cases, they do have limitations in enabling full interoperability of data across organizations and datasets because the measurements or observations are not standardized to a common template or schema. If that level of interoperability is desired by the organization, using The Exchange Network, CUAHSI HydroClient, or another domain specific repository and conforming to the specialized data schema would be necessary. Overall, new collaboration and online data sharing tools and approaches are available to scientists and managers to communicate results to a broad community. These approaches, such as Open Science, and the FAIR0 data principles also offer greater opportunities to scientists to share large continuous monitoring datasets. 7. https://essd. copernicus. ora/articles/13/2777/2021/ 8. FAIR Data Principles - https://www. go-fair.ora/fair-principles/ Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Best Practices for Preparing and Submitting Data Choose Repository Before submitting data to an online repository, it is important to identify the repository that is best suited for the dataset and the level of expertise of the data submitter. Understanding the repository submission process at the outset of a project can provide insight into best practices for developing metadata and structuring datasets. It is important to be familiar with the requirements from the selected data repository. Communication with the OCA community of practice, for example through the Ocean Acidification Information Exchange, is a good source of information on how to best represent data using standard names if there are minimal requirements for the data. If a standard does not exist for the type of data being submitted, it might be helpful to search for similar datasets in the repository as examples for structuring the data and metadata. For example, OCADS recommends the use of data standards and quality flag codes from earlier work of the Clivar & Carbon Hvdroaraphic Data Office (CCHDO) and World Ocean Circulation Experiment (WOCE). OCADS is in the process of updating these standards. Test Access and Discovery Tools The data submitter should become familiar with the discovery tools from the selected data repository. This will be a helpful guide in making sure the data are well described and findable. Use the repository's tools to discover and evaluate similar datasets. This can help inform what information should be included to make the dataset more findable. What filters are available? Is the search conducted using open ended keywords or a curated list of selectable terms? Is the title descriptive of the dataset? Are similar datasets stored in the data repository? Create Metadata Independent of which data repository is ultimately chosen, creating and managing a thorough metadata document for the OCA dataset is important and can be stored as a text or PDF file. Once this information has been developed about the dataset, it can be used to add data to any repository. Depending on the requirements of the individual repository, this information can be easily transferred to a template, added to an online form, or uploaded as a file. This file should contain all the information about the dataset to help others filter for relevance. Good metadata will enable researchers to determine if data are valuable for their needs. Most online repositories require some common descriptive metadata necessary to catalog the dataset within the repository, including project title, contributing researchers, project summary, spatial and temporal extent, parameters measured, and keywords. To ensure data are optimized for reusability, the FAIR data principles should be applied to the metadata that describe a dataset. The FAIR principles were primarily designed to improve machine-readability (i.e., the ability use computational systems to find, access, integrate data from multiple sources (i.e., make interoperable), and reuse data with no or minimal human intervention). Even if machine-readability is not a primary consideration for the data provider, implementing FAIR data principles will improve the reusability of the data by others. More detailed metadata to describe the parameters being measured should include the parameter names, units, calibration detail, uncertainty from sensor provider, laboratory processes, calculations, and other data specific descriptions. The repository may provide a template to enter the metadata that can be submitted with the dataset. Basic information that should be described in a metadata document: Description of the program and project. ฆ Include institution, title, abstract, funders, partners. Description of the origin of the dataset in the metadata. ฆ Where were the data collected? Provide a geospatial bounding box of the study area. ฆ When were the data collected? Provide a temporal range if the data are complete. Method of data collection. 20 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Indication if the data are processed or a raw dataset. Description of the data being collected. ฆ Variables, units, vocabularies, uncertainty (from instruments). Description of the purpose of the dataset. Why was this dataset created? What was the goal when the dataset was created? While data may have been created for a specific purpose, describing the data may enable other purposes or applications of the data. Indication if there are usage limitations or considerations, Indication if the data can be shared openly or are they too sensitive to be shared publicly. Are the data freely available for use? For what purposes? Description of how to access the data (online? email? contact?). If data are available online be sure to include a get data link so the data can be easily accessed. Submit Data and Test Access and Publication of the Dataset Once the metadata have been created and any requirements for the dataset are met (file type or format), it can be submitted to the data repository. Depending on the repository, the data submitter may be asked to create an account or directed to send an email to an individual. Once the data are submitted, it should be verifed that the data are stored accurately, a DOI is provided (if available from the repository), and that the data can be found using the search tools. 21 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- References Amorim, R.C., Castro, J.A., Rocha, J., and C. Ribeiro. (2014). LabTablet: semantic metadata collection on a multi- domain laboratory notebook. In: Communications in Computer and Information Science 478:193-205. Duarte, C.M., Hendriks, I.E., Moore, T.S., Olsen, Y.S., etai. (2013). Is Ocean Acidification an Open-Ocean Syndrome? Understanding Anthropogenic Impacts on Seawater pH. Estuaries and Coasts 36: 221-236. https://doi.Org/10.1007/ s12237-013-9594-3. Gledhill, D.K., White, M.M., Salisbury, J., Thomas, H., etai. (2015). Ocean and coastal acidification off New England and Nova Scotia. Oceanography 28(2): 182-197. http://dx.doi.org/10.5670/oceanog.2015.41. Gassett, P.R., O'Brien-Clayton, K., Bastidas, C., Rheuban, J.E., etai. (2021). Community Science for Coastal Acidification Monitoring and Research. Coastal Management AQ{b)\^Q-b^. https://doi.Org/10.1080/08920753.2021.19 47131. Jiang, L.-Q., O'Connor, S. A., Arzayus, K. M., and Parsons, A. R.: A metadata template for ocean acidification data, Earth Syst. SData, 7, 117-125, https://doi.org/10.5194/essd-7-117-2015. 2015. Pimenta, A. and J. Grear. Guidelines for Measuring Changes in Seawater pH and Associated Carbonate Chemistry in Coastal Environments of the Eastern United States. U.S. EPA Office of Research and Development, Washington, DC, EPA/600/R-17/483, 2018. Rheuban, J.E., Doney, S.C., McCorkle, D.C., and R.W. Jakuba. (2019). Quantifying the effects of nutrient enrichment and freshwater mixing on coastal ocean acidification. Journal of Geophysical Research: Oceans 124(12):9085-9100. httos://doi.oro/10.1029/2019JC015556. Rosenau, N.A., Galavotti, H., Yates, K.K., Bohlen, C.C., etai. (2021). Integrating High-Resolution Coastal Acidification Monitoring Data Across Seven United States Estuaries. Frontiers in Marine Science 8:679913. https://doi.org/10.3389/ fmars. 2021.679913. Saba, G.K., Goldsmith, K.A., Cooley, S.R., Grosse, D., etai. (2019). Recommended priorities for research on ecological impacts of ocean and coastal acidification in the U.S. Mid-Atlantic. Estuarine, Coastal and Shelf Science 225:106188. https://d0i.0rg/l 0.1016/j.ecss. 2019.04.022. Turner, J., Gassett, P., Dohrn, C., Miller, H., etai. (2021). Opportunities for U.S. State Governments and in-Region Partners to Address Ocean Acidification through Management and Policy Frameworks. Coastal Management 49(5):436-457. httos://doi.oro/10.1080/08920753.2021.1947126. U.S. Environmental Protection Agency. (2021). Measuring Coastal Acidification Using In Situ Sensors in the National Estuary Program. Washington D.C., Document No. EPA-842-R-21001. Washington State Blue Ribbon Panel on Ocean Acidification. (2012). Ocean Acidification: From Knowledge to Action, Washington State's Strategic Response. H. Adelsman and L. Whitely Binder (eds). Washington Department of Ecology, Olympia, Washington. Publication no. 12-01-015. https://cig.uw.edu/publications/ocean-acidification-from-knowledge- to-action-washington-states-strategic-response/. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., etai. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3:160018. https://dx.doi.org/10.1038%2Fsdata.2016.18. 22 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Appendix A: Glossary of Terms Application Programming Interface (API) - Programming code that enables separate software applications to communicate even if they are written in different programming languages. APIs are typically used to programmatically retrieve or submit data from one application to another application. CF (Climate and Forecast) Conventions - NetCDF (Network Common Data Form) Climate and Forecast metadata conventions https://cfconventions.org/. Citation Metadata - High-level information about a data set that can be used to cite the data for use by others. Depending on the requirements of the repository, this information can be entered through a website form or submitted to data repositories as a separate file along with the file of data observations. Citation metadata generally includes the name of the study or dataset, summary of the data and parameters measured, researcher names and affiliations, dates and geospatial range of the study, and keywords that will make the data more findable. The Darwin Core Standard (DwC) - A data standard for compiling biodiversity data from varied and variable sources. https://dwc.tdwa.org/ (see data standard below). Data Archive - A system for the long-term storage of data. Generally, archiving is the process of moving data out of a more easily accessible system and into a long-term storage system. The terms "archive" and "repository" are used to describe systems for long-term data storage and are often interchanged. A formal data archive generally indicates there are specific procedures and rules for storage and retrieval of data. NOAA's National Center for Environmental Information (NCEI) serves as the official Archive for data collected by NOAA scientists and NOAA funded projects and research efforts (e.g., the Integrated Ocean Observing System). Other non-NOAA data can be submitted to be archived, but must adhere to guidelines and are subject to approval. Other federal agencies may have formal data archiving policies and procedures. Database - A structured format used to store and manage data. In a relational database structure, data are stored in tables made up of rows and columns. The relational database also contains descriptions of how the data tables relate to each other so that it can be accessed through queries. Examples of relational database software include Microsoft SQL, MySQL, PostgreSQL, and Oracle. Data Management System - Systems used to manage, organize, and provide access to data. Data management systems can include relational databases (e.g., Oracle, SQL Server, Microsoft Access, MySQL) or flat file data storage (e.g., Microsoft Excel or .csv). Data Repository - A data storage system containing a collection of individual datasets that have been organized in a logical manner and made accessible for use. The method of data storage used in a particular data repository can range from a structured relational database to a collection of individual files such as spreadsheets, documents, or even image or video files. Data Schema - The structure of a database defined by how data are organized and their relationships. It is also defined by the business rules or constraints on the data. Elements include description of data fields and data types. Data Standard - A set of specifications or rules for how data should be described and recorded. Data standards are generally developed and maintained through consensus of a group of subject matter experts. Mapping datasets to a common format appropriate to the content area of the data will enable the ability to share, combine, and better understand data from different sources. CF Conventions, Darwin Core, ISO, and WQX are examples of data standards used by the water quality community. 23 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- DOI - Digital Object Identifier, a unique string of numbers and letters used to give a unique identifier to an article, document, or dataset. Some data repositories will issue or mint a DOI for datasets contained in the repository. Others will allow users to add a DOI obtained through another provider. DOIs are increasingly being required for datasets referenced in journal publications. ERDDAP - Open-source data management software used to set up a data repository for storing gridded and tabular scientific datasets in common file formats. Developed by NOAA's Southwest Fisheries Science Center, ERDDAP software is free and open source and can be installed and managed on a local or cloud-based server. https://coastwatch.pfea.noaa.aov/erddap/index.html. FAIR Data - Data that meet the principles of Findable, Accessible, Interoperable, Reusable. The GO FAIR Initiative is a stakeholder driven effort to implement FAIR principles for data, https://www.ao-fair.ora/fair-principles/. FTP - File Transfer Protocol, a methodology for transferring files between computers across the internet using standard internet protocols. It is often used for files that are too large to transmit through email. GitHub - A repository hosting platform for storing and sharing open-source software code. GitHub is also a version control system which enables collaborative development on public repositories for maintenance, upgrades, and improvements, http://aithub.com. Metadata - Information that describes and is essential for understanding the data. Citation metadata often required by data repositories includes the researcher's name and affiliation, and temporal and spatial extent of where the dataset was collected. Additional descriptive metadata often required by data repositories includes a description, keywords, usage and citation information, DOI, and study design and methods to name a few. Metadata that describes the actual observations should include date and time of observation, variable names, label or description, and units at a minimum. Data repositories and data standards will have required metadata elements and example templates. NetCDF - Network Common Data Form, a file format for storing complex scientific data that contains multiple variables, such as a data from an observing platform, satellite, or model grid (e.g., temperature, salinity, air pressure, etc.). The file format is "self-describing" which means it includes information within the file that describes the structure and layout of the file that can be interpreted by software designed to read the file type, https://www.unidata.ucar.edu/software/netcdf/. Published data - Data that are finished, have received formal DOIs, and are discoverable and accessible by the public. Once published, data managers can no longer modify the content or metadata descriptions. QAPP (Quality Assurance Project Plan) - A formal written document describing the detailed quality control procedures that will be used to achieve a specific project's data quality requirements. https://www.epa.aov/citizen-science/qualitv-assurance-handbook-and-auidance-documents-citizen-science-projects. Vocabulary - Often referred to as controlled vocabulary, provides a consistent way to describe data using a set of standard terms (e.g., field names, units, categories). Vocabularies are developed and maintained through communities of practice. Examples include Climate and Forecast conventions and Darwin Core. Web Service - Software that enables two machines (or applications) to send and receive data over the internet. A request is made from the client machine to the server, and information (typically data) is returned. Client and server do not have to have the same configuration to communicate when using a web service. XML - extensible Markup Language is a markup language like FITML that was designed to store and transport data. What makes it extensible is flexibility within the language for a user to define custom tags that describe the data. 2 1 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Appendix B: Interview Questions Responses can be found in Appendix C. Data Manager Can you describe the flow of data from sensors to internal data management system? ฆ Highlight challenges and aspects that work well. What format are the data stored in (Excel, database)? Has a metadata record been developed? Is there a QAPP and does it address QA/QC or flags on data? Have the data been submitted to a data repository? ฆ If so, why are they submitting data to that system, and what is the desired functionality of the data repository/ management system? ฆ If not, what are the limitations? Who are the users/stakeholders of your data? Data User How do you use OCA sensor data sets? How do you currently access OCA sensor data sets? Do you have a preferred file format (e.g., csv, MATLAB) for data? How important is access to metadata? Do you use any online data portals/repositories to access OCA other water quality data? What works and what doesn't work for the portal(s)? Do you use any online portals/repositories for submitting data? What works and what doesn't work for submissions? What functionality are you looking for in an online repository? What is your ideal process for accessing data? What is your ideal process for uploading data? When you download a data set do you expect to get metadata? What metadata do you expect to get? How important is a DOI in understanding origin/provenance of data (for access or for citation/publication)? How often do you need to contact someone associated with the data before using it? If that information was included in metadata, would it remove that burden? Does that clarification information make it back into the dataset in some way? 25 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Appendix C. Summary of Interviews with Data Managers and Data Users Responses to interview questions in Appendix B. NEP DATA COLLECTION, PROCESSING, AND STORAGE METHODS NORTHEAST/MID-ATLANTIC Casco Bay - Friends of Casco Bay (FOCB) One continuous monitoring station off Cousin's Island, plan to deploy two more Requires on site work to maintain, need to visit every two weeks (land access preferred) YSI data sonde measures pH, temperature, salinity, chlorophyll, fdom, pC02 - happy with system, requires maintenance Take grab samples for validation Have existing QAPP and added to it for OCA sensors QA/QC procedures extension of work they've always done Like EPA document for lab, but wish there was field station QA/QC guidance Every two weeks upload data to local database following QAPP ฆ Convert pC02 data, calculate TA and other OCA indicators with C02 ฆ Values added to Excel spreadsheet ฆ Flag QA/QC issues Metadata are essentially ongoing log Data used by Maine Dept of Env Protection, Univ of New Hampshire, Univ of New England, EPA and Casco Bay Estuary Program (CBEP) Have been working with Maine DEP to get data into Water Resources Database (WRDB) ฆ Like this because other groups use the data ฆ It has been a big undertaking to convert to ME's Environmental and Geographic Analysis Database (EGAD) database first Interested in sharing data more broadly Casco Bay - University of New Hampshire (UNH) Three sensors measuring pH, salinity, temp, pC02 funded by Casco Bay Estuary Program (CBEP) No telemetry, must retrieve loggers and manually download data monthly Data processed by UNH using MATLAB into hourly data in Excel, organize into annual files Files are stored at UNH, create annual QC file to send with data (primarily CBEP) No telemetry, so they don't know if sensors stopped working until they are downloaded Excel data contains basic metadata, also create annual report with additional metadata CBEP data has not been submitted to online repository UNH has submitted other OCA data to OCADS and SOCAT (Surface Ocean C02 Atlas) cruises Massachusetts Bays Current sensor in Duxbury, data are telemetered to UMass Boston, plans for another sensor in Barnstable Data available on request, interest in making data accessible (considering NERACOOS) Developing QAPP with UMass Would be interested in IOOS Quality Assurance/Quality Control of Real-Time Oceanographic Data (QARTOD)/QA QC for OCA data Stakeholders include shellfish industry, state agency, research Barnegat Bay Three sensor systems (YSI EXO platform) All telemeter but only one has OCA data (pH, C02) YSI data goes to NJ DEP website available in real time, but no OCA data Developing a QC process with NOAA lab for OCA pH and C02 data are pulled off logger and stored as .csv files No metadata records for OCA data Interested in DOI, but not a requirement 26 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- NEP DATA COLLECTION, PROCESSING, AND STORAGE METHODS WEST COAST San Francisco Bay Data stream sent to CENCOOS and are available in ERDDAP Following IOOS guidance working with CENCOOS (RA) to apply QARTOD and archive For non-real-time data, periodically send to CENCOOS to be posted; not high level of QC Maintain basic internal metadata, QA/QC Flaven't submitted to other OCA repositories, time/resource limited, hasn't been a big priority Interviews with Data Users DATA USER DATA DISCOVERY AND ACCESS SUMMARY NEP Program Director (Curtis Bohlen, Casco Bay Estuary Partnership) Prefers data in csv or Excel, but concerns with large data files and analysis software (R) Metadata is critical and not well defined or buried in protocols ฆ Specifically, would like more clarity on how data was processed, what calculations were used, sampling design, where/when collected Flas used online repositories (NCEI, OCADS), prefers API to script data, must use different process for every repository Data formats and vocabularies are inconsistent across data providers DOI not critical yet, but can see growing need CBEP has started using GitHub repository to share data and analysis from State of the Bay report OCA Researcher (Grace Saba, PhD, Rutgers University) NCEI is the first stop for full datasets, generally easy to get list of matching datasets ฆ Often must search with in the results, change search parameters or contact data provider to be sure Uses data from Barnegat Bay, but must contact directly to get data Gets data from state repositories like NJDEP Flas gotten data from BCO DMO Prefers NetCDF, but can work with csv or Excel Metadata is usually lacking information about QC, often hidden in general protocols ฆ Often hard to tell if flagged data are removed or just flagged ฆ Would like to see better QC with carbonate data generally to feel confident using in analysis DOI becoming more important for citing in reporting and with data requirements for projects Mass Bays Estuary Partnership (Pam DiBona, Prasseda Vella) Using OCA data to address shellfish industry concerns, working with OCA commission Often get data directly from organization website or regional collaborations (NE Ocean Data Portal) ฆ Data are often stale Flexible in terms of data types, Excel is preferred if had to choose Finds most providers have different style/narrative for metadata Need QA/QC information to know how data was collected to not have to make assumptions EPA requires QAPP for data providers which is helpful Prefer data portal to discover/access data, less important to visualize and see trends over time but would find useful DOI is not important at this stage Often must contact data providers to understand study design and metadata Very interested in QARTOD recommendations for OCA data 27 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Appendix D. Data Repository Submission Detail Disclaimer: This section was accurate at the time of publication. The process for submitting data to the repositories is subject to change or updates. Please visit the repository website for updated information or guidance before submitting data. NCEI OCADS Submitting Data The OCADS data portal provides support and guidance for preparing data for submission. The recommended approach is to prepare metadata following a template provided (Metadata submission form). This template is an Excel file with metadata elements (both required and recommended) for the user to provide input specific to the dataset. There are supporting documents for help on metadata element names (Instruction file). Of the 221 elements, 36 are required and several are repeat blocks for multiple variables. The global variables include basic program level detail (PI name and institution, title of project, abstract, author list). The main component of the metadata is the "variable metadata" section. Required elements include variable abbreviation, unit, uncertainty. Specific variable blocks for OCA variables (DIC, TA, pH, pC02A, pC02D) are provided to provide additional detail where appropriate (e.g., for pH, temperature of measurement, temperature recorded, uncertainty). Other values (e.g., date, longitude, latitude, depth, bottle number, OA flags, standard deviation) that are not considered independent don't require separate sections. It is recommended to avoid special characters (degree symbol, sigma, etc.). Additional information is available through a 2015 paper that describes a metadata template for ocean acidification data (Jiang et. al., 2015). Work is currently underway to update these standards. Guidance for preparing data files is available through additional templates based on the type of data. These include Underway, Profile data (e.g., CTD, discrete bottle water samples), and Mooring data (e.g., buoys). The templates provide recommendations for column header names and an example data file for each category of data. No specific file format is required, but ASCII (.csv) or NetCDF is recommended, and proprietary file types are to be avoided if possible. Using CCHDO parameters and WOCE quality control flags is also recommended whenever possible. Once the data templates are complete, the metadata and data files are then uploaded through the NOAA PMEL Scientific Data Information System (SDIS) (https://data.pmel.noaa.aov/sdia/oap/Dashboard/OAPUploadDashboard. html). A video tutorial is available on the website. If the data files are too large (>20 MB), NCEI will provide an alternative method for transferring data. 1. Underway Column header names description (data file (csv) example). 2. Profile data (e.g. CTD, discrete bottle water samples) Column header names description (data file (csv) example). 3. Mooring data (e.g. buoys) Column header names description (data file (csv) example). While the data are mapped to a template, the search and discovery tools only search metadata and data are downloaded in their original file format. Acquiring Data from OCADS OCADS provides a data access portal (https://www.ncei.noaa.aov/access/oads/) to discover and acquire datasets. Several filters are available to narrow down the selection of data. They include filters for: "Core variables" (e.g., TA, DIC, pH, C02, temp, salinity, nutrient, etc.); "Other variables" (an uncurated list of user submitted variables); "Observation category" (e.g. surface underway, profile, time series, model output, benthic FOCE); "Additional terms" (free text keyword search) "Observation date" (start/end date), and a map where users can enter bounding box coordinates to filter geospatially. Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- The results are sorted by most recently submitted data, but no additional sorting options are given at this time. The results include the title, first two lines of abstract, a thumbnail image of sample area and links to NCEI metadata and Project metadata. The number of matching results is indicated with a link to further refine the search. The full results can be accessed through web services (RSS, ATOM, KML, JSON, CSV). The metadata page includes the full description of the dataset, including contact information, citation information, DOI, usage constraints, etc. From the metadata page, data can be downloaded via HTTPs or FTP. Instructions: Please try not to change the order of Rows No. 1 through No. 211, as the information will be read by a computer computer program later on. Starting from No. 212, please first append the additional variable sections, then the non-measured variable sections, then the additional principal investigator sections (if there are more than three Pis), and then the platform sections (if there are more than 3 platforms). Please do not use special characters. No Metadata element name Your input Help reference no. 1 Submission Date 1 "1 2 Accession no. of related data sets 2 3 lnvestfgator-1 name 3.1 4 lnvestigator-1 institution 3.2 ! 5 lnvestigator-1 address 3.3 6 lnvestigator-1 phone 3.4 ' 7 lnvestigator-1 email 3.5 8 lnvestigator-1 researcher ID 3.6 9 10 lnvestigator-1 ID type (ORCID, Researcher ID, etc.) 3.7 tnvestigator-2 name 3.1 11 lnvestigator-2 institution 3.2 12 lnvestigator-2 address 3.3 13 lnvestigator-2 phone 3.4 14 lnvestigator-2 email 3.5 15 lnvestigator-2 researcher ID 3.6 16 17 lnvestigator-2 ID type (ORCID, Researcher ID, etc.) 3.7 lnvestigator-3 name 3.1 18 lnvestigator-3 institution 3.2 19 lnvestigator-3 address 3.3 20 lnvestigator-3 phone 3.4 21 lnvastigator-3 email 3.5 22 lnvestigator-3 researcher ID 3.6 23 24 lnvestigator-3 ID type (ORCID, Researcher ID, etc.) 3.7 Data submitter name 4.1 25 Data submitter institution 4.2 26 Data submitter address 4.3 27 Data submitter phone 4.4 26 Data submitter email 4.5 29 Data submitter researcher ID 4.6 30 31 Data submitter ID type (ORCID, Researcher ID, etc.) 4.7 Title 5 Figure 3 - screen capture of OCADS metadata template showing required fields (in red) Instmctions: Please try not to change the order of Rows No. 1 through No. 211, as the information will be read by a computer computer program later on. Starting from No. 212, please first append the additional variable sections, then the non-measured variable sections, then the additional principal investigator sections (if there are more than three Pis), and then the platform sections (if there are more than 3 platforms). Please do not use special characters. No Metadata element name Your input Help reference no. 66 DIC: Variable abbreviation in data files 22.1 67 DIC: Variable unit 22.5 68 DIC: Observation type 22.2 69 DIC: Measured or calculated 22.6 70 DIC: Calculation method and parameters 22.7 _Z!_ DIC: Sampling instrument 22.8 72 DIC: Analyzing instrument 22.9 73 DIC: Detailed sampling and analyzing information 22.10 74 DIC: Field replicate information 22.11 75 DIC: Standardization technique description 22.12.1 76 DIC: Frequency of standardization 22.12.2 77 DIC: CRM manufacturer 22.12.3.1 78 DIC: Batch number 22.12.3.2 79 DIC: How were the samples preserved (HgCI2, or others) 22.13.1 80 DIC: Concentration and amount of the preservative added 22.13.2 81 DIC: Preservative correction description 22.13.3 82 DIC: Uncertainty 22.14 83 DIC: Data quality flag description 22.15 84 DIC: Method reference (citation) 22.16 85 DIC: Researcher Name 22.17.1 86 87 DIC: Researcher Institution 22.17.2 TA: Variable abbreviation in data files 23.1 88 TA: Variable unit 23.5 89 TA: Observation type 23.2 90 TA: Measured or calculated 23.6 91 TA: Calculation method and parameters 23.7 92 TA: Sampling instrument 23.8 93 TA: Analyzing instrument 23.9 Figure 4 - screen capture of OCADS metadata template showing additional required metadata for describing data parameters (in red) 29 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- OAP Scientific Data Information System Please enter your OAP Dashboard Login Credentials Username: | Password: | 3^ Submit Request an account Need login help? NOAA | OAR | PMEL | Privacy Policy | Disclaimer | Accessibility Figure 5 - screen capture of Ocean Acidification Program Scientific Data Information System (OAP SDIS) login page OAP Science Data Information System Manage Metadata; FOCB Citizen Steward 2016_2017FINAL*lsx Sond Feedback Preferences Help Investigators Citation Information Time and Location Information Finding Platforms OIC pC02A PCQ2D Preview Download Save Upload OADS Metadata fiซ (XML. Excel, or CSV) Last Name Enter the Information for this Data Submitter. <*) Denotes a required field. First Name * M l. Last Name * | First Name 0 | M.l.(s) institution * | Institution Address Line 1 | Address First Line Address Line 2 (Optional) Address Second Line State/Province Zip Code/Postal Code Zip Code/Postal Code Country Country Telephone Number Emai Address * Telephone Number Researcher ID Type Researcher ID | OAR | PMEL | Privacy Pdley v_2Q2I062at239 Figure 6 - screen capture of OAP SDIS metadata entry form Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- CUAHSI HydroShare Submitting Data Data submitters can publish and share datasets, manage access to shared content for collaborators, obtain a digital object identifier (DOI) that can be used in publications and citations, and aggregate resources into collections. Once in HydroShare, data resources can be readily discovered and acquired. HydroShare requires that data submitters create an account before uploading data; this is done by simply providing an email address and password, and validating the link. Support is available through a collection of FAQs (https://help.hvdroshare.org/) as well as a tutorial to describe the process for uploading data. The guidance recommends organizing data in advance of submission, especially where there are multiple files to be submitted. Recommendations include taking into consideration how potential end-users would want to interact with the data resources and the structure that will enable access, interpretation, and reuse. Each data submission is considered a "resource" which is defined as the data and associated metadata for a single unit of digital content. Comparable terms used by other systems include "package" or "dataset." Within a resource on HydroShare, multiple files of different types can be grouped within one resource and examples are provided on how one might structure individual data resources. Another level of organization called a "collection" is available, where multiple, individual data resources can be grouped together under a common theme such as an event (e.g., Hurricane Harvey) or a network of groups monitoring similar data. A template is provided to make it easier to add metadata for the resource. There are only a few required fields for the metadata: title, abstract, and keywords. Recommended fields include temporal and spatial coverage of the resource. Additional metadata fields are available to add references, sources, related resources, credits (funding agencies), and contributors. Best practices for generating the content are provided as well as guidance for naming files. A guide is provided for data authors and publishers to provide guidance for structuring the submitted data files and metadata. The guidance is to use file formats that are open and documented standards with widespread usage in the community of practice. Tabular data should be submitted as .csv rather than Excel, with additional .csv files instead of workbooks. Each column in a table should have a detailed description in the abstract, in a README file, or in a descriptive header within the file that contains the data table. This includes the full name of the variable, a description of what it represents, units, and how it was obtained for each column. A workflow guideline is provided for submitting data that are in a complex Excel workbook with multiple tabs for raw data, finished products, and intermediate steps. Specific guidance for data and common vocabulary or standard names is not provided. Once the data resource is added, it can be made private or shared in three states - discoverable, public, or published. These are described as follows: Discoverable - A resource that can be discovered by anyone in HydroShare, but only users with permission can access the content files. Public - A resource that can be discovered by anyone, and anyone can access the files. The resource may not be final, may be subject to change, could be deleted, and is not considered to be published. Published - A resource that is finished and is formally published. Published resources receive formal DOIs and, once published, users can no longer modify their content or metadata descriptions. Published resources can be discovered and accessed by the public. Upon first submitting a resource and before publishing, data submitters can obtain the actual citation and DOI that will be finalized once the data are published. This facilitates providing the information to a journal that may have a deadline associated with publication. 31 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- There are no specific data structure requirements for data resources which makes the platform very flexible but provides fewer supports for data providers. This is meant to allow flexibility for data that needs to conform to a particular data model or resource type, and data providers can define specific file formats, syntax, and file hierarchies. The idea is that the data detail is well described by resource-level metadata to include information for all resources and extensions for each resource type. Discovering Data HydroShare has a data discovery tool (https://www.hvdroshare.org/search/) to find public and discoverable resources. Filters include selecting a temporal range (using a date selection) and geospatial range (by selecting a spatial boundary from a map). Additional filters are available for categories such as author, subject, resource type, and status (public or published). The filter terms are generated from the submitted data resources and are uncurated. A text field is provided for searching via keywords. ฉHYDROSHARE HOME MY RESOURCES DISCOVER COLLABORATE APPS HELP & * Abstract ฃ.+ O Subject Keywords xamples: Hydrologic_modeling, USU, land use O Deleting all keywords will set the resource sharing status to private. No subject keywords have been added. Figure 7 - screen capture of HydroShare metadata submission form showing abstract and keyword fields 32 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- ฉHYDROSHARE HOME MY RESOURCES DISCOVER COLLABORATE APPS HELP A Coverage & O O You can set the spatial and temporal coverage manually by using the map to place a point or box or by filling in coordinates. Alternatively, you can add content files to x your resource that have spatial coverage information (e.g., geographic feature, geographic raster, multidimensional, etc.) and then click the button to set the coverage from the content files. Spatial: Place/Area Name Q Coordinate System/Geographic Projection; WGS 84 EPSG.-4326 & Coordinate Units: Decimal degrees North Latitude East Longitude South Latitude West Longitude 180 to 180 I O Point () Box IN OH P* MDDE WV DE KV VA ซ ;?>/ "c North Atlantic Ocean ฉ + Go gle Map data ฉ2021 Google. INEGI Terms of Use Temporal: ง Start Date ง End Date Figure 8 - screen capture from HydroShare showing geospatial and temporal coverage fields CUAHSI HydroServer/HIS The original data repository developed by CUAHSI is known as HydroServer (https://hvdroserver.cuahsi.org/') and is part of the Hydrological Information System, This data repository was a precursor to HydroShare and consists of a relational database and data services. HydroServer is a more robust data integration effort and is designed primarily for time series data. The platform provides detailed templates in Excel for users to format their individual dataset. The data must be mapped to the template, and there are six reguired tables (general metadata) and seven optional tables. Once data are submitted to HydroServer, they are integrated into a relational database at CUAHSI, and the time series data can be discovered through a web application called HydroClient (https://data.cuahsi.org/). While this data repository reguires more work on the part of the data submitter to prepare and submit, the data values become gueryable and can be manipulated and visualized as basic plots through available tools and applications. This platform was designed to support hydrological data and may be more challenging to map OCA data to the provided templates. An interest in integrating a wider variety of data (models, geospatial) led to the development of HydroShare, which is more of a data catalog with no specific requirements on data type or format. 33 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- Environmental Information Exchange Network (EN) Submitting Data https://acwi.aov/monitorina/pubs/misc/publishina vour data wqp.pdf Data are mapped to one of several available data schemas, called Data Exchanges, and submitted to the network (e.g., WQX for water data). Data are stored and managed through a centralized database at EPA (CDX/STORET). The EN is primarily for regulatory data from states, tribes, and some citizen monitoring data. Submitting to the EN typically requires working with a state agency to submit data to EPA. The WQX is the most appropriate standard for OCA data, but has limited capacity to handle continuous monitoring data. Accessing Data Data submitted to the Exchange Network WQX can be access through the National Water Quality Monitoring Council Water Quality Portal: h ttp s: //www, wate rq u a I i tvd ata. u s/ IOOS RA ERDDAP Submitting Data Each IOOS Regional Association manages data independently for their region and most of the eleven RAs have an ERDDAP server for distribution and access to data. If an NEP is interested in working with the local RA to put OCA data into ERDDAP, the first step would be to make contact to discuss the process for submitting data, which generally involves describing the dataset (parameters, units) and providing basic metadata about the dataset. Accessing Data Once datasets are integrated into the RA ERDDAP, they can be accessed directly from the dedicated URL. For example, the San Francisco Estuary Partnership OCA data can be accessed through the Central and Northern California Ocean Observing System (CENCOOS) ERDDAP: https://erddap.cencoos.org/erddap/tabledap/tiburon-co2.html General Comments Feedback on the data access portals was collected from the interviews. Common issues included difficulty navigating and filtering results. The titles aren't always descriptive of the dataset and requires a bit of digging into the data. It was also easy to limit results by over selecting terms and keywords. The query results are only as good as the data entered by the data provider, reinforcing the need to develop and provide good, descriptive metadata. 3 1 Using Data Repositories for Ocean and Coastal Acidification Monitoring Data ------- |