SCIENTIFIC DOCUMENTATION FOR EMAP DATA: GUIDELINES FOR THE INFORMATION MANAGEMENT CATALOG Prepared for Dr. Robert F. Shepanek Office of Modeling, Monitoring Systems, and Quality Assurance U.S. Environmental Protection Agency 401 M Street, S.W. Washington, DC 20460 Prepared by Donald E. Strebel Jeffrey B. Frithsen Versar, Inc. 9200 Rumsey Road Columbia, Maryland 21045 30 April 1995 ------- The suggested citation for this report is: Strebel, D.E. and J.B. Frithsen. 1995. Scientific Documentation for EMAP Data: Guidelines for the Information Management Catalog. Draft: April 30, 1995. Report prepared for U.S. Environmental Protection Agency, Office of Modeling, Monitoring Systems and Quality Assurance, Washington, DC. Prepared by Versar, Inc., Columbia, MD. ------- Drait^0_A£riM_995i EXECUTIVE SUMMARY The Environmental Monitoring and Assessment Program (EMAP) is an interagency effort coordinated by the U.S. Environmental Protection Agency and designed to collect information to assess the condition of the nation's ecological resources. The Information Management System for EMAP was developed to capture, preserve, and provide to users data and information collected and prepared by the program. The report describes the data catalog component of the EMAP Information Management System. Together with the data directory and dictionary, the catalog is one of three components that provides information about data (metadata) to users. The catalog contains the detailed, scientific documentation about data that enables assessment scientists and managers to understand the conditions, assumptions, and methods under which data were collected and compiled. Presented in this report is a summary of the metadata components for the EMAP Information Management System, design requirements for the catalog, and a description of the catalog structure that was developed in response to those requirements. Requirements for the catalog were determined by EMAP assessment scientists and other users through multiple joint application design sessions and feedback provided through EMAP task group information managers. These requirements were refined based upon the overall vision for the EMAP Information Management provided by the Strategic Plan (Shepanek 1994) and review of existing standards and procedures for the completion of scientific documentation. In response to these requirements, the catalog was designed as an integral component of the EMAP relational data base. The structure of the catalog is, therefore, a set of fields organized in relational tables. The drafting of scientific documentation is presented as a collaborative effort between scientific investigators and information management staff. Guidelines presented in this report for scientific investigators present writing a catalog entry as an analogous process to writing a scientific publication. Guidelines presented for information management staff focus on formats and the definition of specific fields. This report also contains examples of data documentation to be used by others for the compilation of additional catalog entries. ------- Drait^0_A£riM_995i ACKNOWLEDGMENTS This report was developed with input from Paul Cole, Gary Collins, Steve Hale, Melissa Hughes, Judith Lear, Jane Lovelace, Doug Mann, Jeff Rosen, and David Strevel. Technical oversight and review as provided by Robert Shepanek. This report was prepared in response to work assignment 4-12 of contract 68- DO-0093 from the U.S. Environmental Protection Agency to Versar, Inc. ------- Drait^0_A£riM_995i TABLE OF CONTENTS Page EXECUTIVE SUMMARY iii ACKNOWLEDGEMENTS iv ABBREVIATIONS vii 1.0 INTRODUCTION 1-1 1.1 EMAP INFORMATION MANAGEMENT SYSTEM 1-1 1.2 METADATA COMPONENTS 1-1 1.3 ORGANIZATION OF REPORT 1-4 2.0 DATA CATALOG REQUIREMENTS AND CONTENTS 2-1 2.1 REQUIREMENTS 2-1 2.2 DATA CATALOG COMPONENTS 2-3 3.0 WRITING THE CATALOG ENTRY 3-1 3.1 DATA SET IDENTIFICATION 3-1 3.2 INVESTIGATOR INFORMATION 3-2 3.3 DATA SET ABSTRACT 3-3 3.4 OBJECTIVES AND INTRODUCTION 3-4 3.5 METHODS 3-4 3.5.1 Data Acquisition 3-5 3.5.2 Data Preparation and Sample Processing 3-6 3.6 DATA MANIPULATIONS 3-6 3.7 DATA DESCRIPTION 3-7 3.7.1 Description of Parameters 3-8 3.7.2 Data Record Example 3-8 3.7.3 Related Data Sets 3-9 3.8 GEOGRAPHIC AND SPATIAL INFORMATION 3-10 3.9 QUALITY CONTROL/QUALITY ASSURANCE 3-11 3.10 DATA ACCESS 3-12 ------- Drait^0_A£riM_995i TABLE OF CONTENTS (Continued) Page 3.11 REFERENCES 3-12 3.11.1 EMAP References 3-13 3.11.2 Background References 3-14 3.12 GLOSSARY AND TABLE OF ACRONYMS 3-14 3.13 PERSONNEL INFORMATION 3-15 4.0 EXAMPLES 4-1 4.1 BIOLOGICAL DATA SET 4-1 4.1.1 Data Set Identification 4-1 4.1.2 Investigator Information 4-2 4.1.3 Data Set Abstract 4-2 4.1.4 Objectives and Introduction 4-2 4.1.5 Methods 4-3 4.1.6 Data Manipulations 4-5 4.1.7 Data Description 4-6 4.1.8 Geographic and Spatial Information 4-8 4.1.9 Quality Control/Quality Assurance 4-8 4.1.10 Data Access 4-9 4.1.11 References 4-10 4.2 CHEMICAL DATA SET 4-11 5.0 CATALOG OUTPUT TO USERS 5-1 5.1 THE METADATA PUBLICATION 5-1 5.2 SELECTION OF METADATA COMPONENTS 5-3 6.0 REFERENCES 6-1 APPENDIX DETAILED SPECIFICATIONS FOR DATA SET CATALOG CONTENTS ------- Drait^0_A£riM_995i ABBREVIATIONS ASCII American Standard Coding for Information Interchange BOREAS Boreal Ecosystem-Atmosphere Study EMAP Environmental Monitoring and Assessment Program EOSDIS Earth Observing System Data and Information System FGDC Federal Geographic Data Committee FIFE First International Satellite Land Surface Climatology Project Field Experiment FTP File Transfer Protocol IM Information Management IMS Information Management System JAD Joint Application Design LTER Long-Term Ecological Research NASA National Aeronautics and Space Administration NRC National Research Council NSF National Science Program QA/QC Quality Assurance/Quality Control RDBMS Relational Data Base Management System SOP Standard Operating Procedure TGIM Task Group Information Manager TIFF Tagged Image File Format URL Univesal Resource Locator USEPA U.S. Environmental Protection Agency UT Univesal Time WAIS Wide-Area Information Server WWW World Wide Web ------- Drait^0_A£riM_995i 1.0 INTRODUCTION The U.S. Environmental Protection Agency (USEPA) is coordinating the Environmental Monitoring and Assessment Program (EMAP), a multiagency effort to establish a national monitoring program designed to collect the information necessary to assess the condition of the nation's ecological resources. The purpose of this document is to provide scientists and information managers within the program guidance concerning the writing of detailed documentation for EMAP and other environmental data sets. Detailed documentation for data enable assessment scientists and managers to understand the conditions, assumptions, and methods under which data were collected and compiled, thus allowing the data to be used for some purpose. The detailed documentation is one of several levels of information about data (metadata) included in the EMAP Information Management System. 1.1 EMAP INFORMATION MANAGEMENT SYSTEM The EMAP Information Management System (IMS) was designed based upon evaluations of the diverse types of users the system must serve; the great diversity of data types that must be captured, stored, and provided to those users; the types of environmental assessments that those users are likely to complete; and, the analytical, statistical, and data visualization tools users will need to complete those assessments. The results of those evaluations formed the basis for the EMAP Information Management Strategic Plan (Shepanek 1994), a document that has guided the activities of the EMAP Information Management (IM) Task Group. The core of the EMAP IMS is a distributed relational data base containing both data and descriptive information (metadata) about data residing in the data base or in files external to the data base. Metadata are a central component of the IMS. A fundamental requirement outlined in the strategic plan is that metadata components must be both flexible and robust to meet the needs of EMAP users. When fully implemented, EMAP data will be collected throughout the United States and involve the resources of multiple federal agencies. Initial users of EMAP data represent resource and coordinating groups within the program. Each of these groups has developed their own information management centers and some groups have implemented multiple data centers based upon the regional implementation of monitoring efforts. In addition to these initial data users, a diverse set of users outside of the program has arisen. These users represent various government agencies, industry, academia, and private nongovernment organizations. 1.2 METADATA COMPONENTS To meet the data needs of these varied users, EMAP metadata components are structured and organized so that users can easily find the information needed to select EMAP data sets, while not becoming ------- Drait^0_A£riM_995i inundated with unnecessary information. This organization is based upon identifying components of metadata that may be logically grouped based upon the expectations of data users. Metadata components are often referred to using terms that are not uniformly defined or applied by the information management community (Strebel and Frithsen 1991). Terms such as inventory, directory, catalog, and dictionary refer to different levels of metadata, the elements of which are typically defined by individual programs and not by generally accepted guidelines. The three principal metadata components being developed as part of the EMAP IMS are the data set directory, catalog, and dictionary. Each component provides users with different types of information needed to identify, describe, and locate data sets. The definitions recommended for EMAP for each of these metadata components (Table 1-1) are consistent with those given in the NASA Earth Sciences Lexicon (NASA 1991), and the outline for the EMAP virtual repository presented in the EMAP Information Management Strategic Plan (Shepanek 1994; USEPA 1994). Metadata components are designed to meet the needs of data base personnel and scientists and managers that use the data. The functionalities provided by the data set directory, catalog, and dictionary for data base personnel and other users are different (Table 1-2). In general, data base personnel use metadata to index, track, and organize data; others use metadata to identify what data are available and to obtain information used to understand the data. The functions are complementary, but in general the documentation required by data base personnel should not obfuscate other users' understanding of the data. In a previous report (Strebel and Frithsen 1991), metadata components were outlined and organized in a manner analogous to a scientific publication. This analogy has since been further developed (Strebel and Meeson 1992; Strebel et al. 1994a) to reflect the importance of metadata development to the general scientific community. Completely describing a data set is analogous to writing a manuscript for publication in a scientific journal (Strebel et al. 1994a). The metadata analogy to the manuscript is the data set catalog, which is also referred to as detailed documentation. This detailed documentation includes information concerning the originators of the data set, the general purpose for which the data were collected, sampling and laboratory methods, descriptions of the data and any manipulations or transformations of the data, related quality control and quality assurance measurements, procedures necessary for data access, and references to publications that use the data set. ------- Drait^0_A£riM_995i Table 1-1. Working definitions for EMAP metadata components Metadata Component Definition Data set directory Summarized data set documentation. A uniform set of descriptions of a large number of data sets, containing information suitable for making an initial determination of the existence and nature of each data set (NASA 1991). Data set catalog Detailed data set documentation - also referred to as the scientific documentation. A uniform set of detailed descriptions of a number of data sets and related entities, containing information suitable for making a determination of the nature of each data set and its potential usefulness for a specific application (NASA 1991). Data dictionary Fundamental data set documentation. The data dictionary provides a short scientific description of a parameter or variable in a data set, along with format and other basic information used in storing, searching, and displaying the data item. The dictionary contains information about the contents of each table in the relational data base. Table 1-2. Functions of metadata components Metadata Component Use by Data Base Personnel Use of Others Directory - Index and track data - Identify data sets - Select data sets of interest Catalog - Record ancillary information about data - Obtain descriptions of the data Dictionary - Organize the data - Define data formats - Select data items of interest - Understand how to use the data ------- Drait^0_A£riM_995i A summary of the detailed documentation is provided in the data set directory. Directory entries are analogous to the abstract of a scientific paper and contain information to assist the data user in identifying data sets that may be of interest. The directory is linked to the data set catalog so that users may quickly locate additional information concerning data sets of interest. Further, directory level information helps data management personnel index and track data sets. Details for the design of the EMAP data directory and provided in Frithsen and Strebel (1995). A section of a data set catalog is directly linked to the data dictionary and provides technical specifications for each data item. This fundamental documentation is used by data management personnel to organize the data and by data users to select data items of interest and to determine output formats. In addition to summarized, detailed, and fundamental documentation (directory, catalog, and dictionary metadata components, respectively), auxiliary documentation may be used to store additional information related to data sets. This auxiliary documentation can include methods manuals, photographic and electronic images of field and laboratory data sheets and quality assurance audit reports, and publications that use the data set in question. The method of access to this type of information generally depends upon user defined requirements. The EMAP Strategic Plan (Shepanek 1994) outlines a design for a virtual repository (USEPA 1994) linking common metadata components (directory, catalog, and dictionary) with auxiliary documentation in related data bases, thus providing users with links between related information about EMAP and EMAP data sets. The term inventory was used in earlier metadata discussions to refer to a subset of the information contained within the data set directory. In this context, the inventory contains information used to index and track data sets only, with little summary of the contents of the data set. This definition of inventory is redundant with what is being referred to in the directory, and is in conflict with the definition embraced by other environmental science programs (NASA 1991). The term inventory should refer to a uniform set of descriptions of elements, or granules of a data set. (A data set granule is the smallest aggregation of data that is independently managed.) The descriptions contained within the inventory provide the information needed to select the data granule of interest. Granule descriptions typically include temporal and spatial coverage, data quality indicators, and physical storage information. The contents of the inventory, therefore, should be tailored to the set of data granules to be described. Guidance on the contents of specialized inventories will be provided in a future report. Principal components of the directory and catalog are shown in Figure 1-1. Also shown is how the components relate to the processes affecting the maturation of data from collection to publication. The development of metadata is presented as a process that begins with compilation of objectives and collection methods for the catalog by sample collection investigators. When the data are electronically captured in the IMS, IM staff begin a directory entry. Additional details are provided for the directory as the data are ------- PROCESSES AFFECTING DATA MATURITY DIRECTOR Y COM PONENTS Data Set ID Information Data Set Abstract & Keywords COMPONENTS V Contact & Availability Information Objectives, Collection Methods CATALOG Data Set ID Information Processing Methods & QA/QC Analytical Manipula- tions Metadata Publication Development Contact & Access Information Figure 1-1. Principal components of the directory and catalog as they relate to the maturation of data ------- Drait^0_A£riM_995i processed and aggregated and various quality control information is assessed. Both the directory and the catalog undergo a review by internal and external environmental and information management scientists to assess completeness and usefulness to future assessment scientists and interested public users. 1.3 ORGANIZATION OF REPORT This section of the report has presented background information for the development of the EMAP IMS and the metadata components included in the EMAP IMS. The user defined requirements for the detailed, scientific documentation component of metadata (the data catalog) are presented in Section 2 along with a summary of the individual fields that comprise the catalog. Section 3 provides guidance to the EMAP assessment scientist for the writing of a catalog entry. Examples of catalog entries are provided in Section 4 to show how information is formatted within specific fields. Guidelines for the display of catalog information to users is provided in Section 5. A detailed appendix is provided to define for information management staff the format and suggested content of each field included in the catalog. ------- Drait^0_A£riM_995i 2.0 DATA CATALOG REQUIREMENTS AND CONTENTS Specific requirements for the design and composition of the data catalog were developed based upon the vision provided in the EMAP Information Management Strategic Plan (Shepanek 1994), input from representative users of EMAP data, insight gained from data documentation efforts completed for other programs, and general guidance from the scientific information management community. Input from representative users of EMAP data was obtained through various joint application design (JAD) sessions (USEPA 1993a,b; Palmer and Fields 1994). Additional input was provided by task group information managers (TGIM) who interact with EMAP assessment staff almost daily. Additional requirements for the catalog were developed from an evaluation of the lessons learned from the development of data documentation for other projects, specifically NASA'S EOSDIS, FIFE, and BOREAS projects and the National Science Program's (NSF) Long-Term Ecological Research (LTER) Program (Michener et al. 1990; Meeson et al. 1993; EOSDIS 1994; Strebel et al. 1994b; Justice et al. 1995). Requirements for the EMAP data catalog also reflect guidance provided from various interagency efforts concerning information management for environmental data. These efforts include the National Research Council's report concerning data management for global change assessment (NRC 1991) and the Federal Geographic Data Committee's guidance for the compilation of spatial metadata (FGDC 1994). 2.1 REQUIREMENTS The following general requirements have been defined for catalog documents: • The document must be a stand-alone description of the data set The detailed documentation presented in the data set catalog should contain enough information so that a potential user can fully evaluate the application of a data set for a particular purpose. This requirement is a component of what has been termed the "20-year rule" for data documentation (NRC 1991), i.e.: "Will someone 20 years from now, not familiar with the data or how they were obtained, be able to find data sets of interest and then fully understand and use the data solely with the aid of the documentation archived with the data set?" • The document must be structured for usability Users should be able to browse the detailed information in the catalog starting at specific sections of interest (for example, methods, data descriptions, publications, etc.). They should also be able to read the entire document as if it were a scientific paper describing the data set. ------- Drait^0_A£riM_995i • Catalog information must be linked to the data Access to the information in the data set catalog should also be provided directly from the EMAP data base. This represents a data management challenge because documentation about data is organized around data sets but the data are stored in a combination of relational tables in a data base and conventional data files in ASCII, SAS, or other formats. • Detailed documentation must accompany distributed data A stated requirement of EMAP (Kirkland 1994) is that complete data documentation will accompany data distributed by the program. In this way, users will be provided information with which to understand the appropriate uses and limitation of the data. The documentation that needs to accompany data is contained within the data set catalog. • The design of the catalog must minimize redundancy. Complete scientific documentation includes information that may be stored in other components of the EMAP IMS. Information for personnel contacts, methods, and publications are examples of catalog information that is already a part of the overall EMAP relational database. The catalog needs to be organized so that this information is stored once, but accessed in various forms. This is consistent with the concept of a virtual repository for information about the program, EMAP data sets, and the EMAP IMS (USEPA 1994). • Catalog information must be linked to the EMAP Data Set Directory The data set directory (Frithsen and Strebel 1995) will, in most cases, be the first stop for users interested in obtaining information about EMAP data sets. The directory should be linked to the data set catalog so that potential users can obtain detailed information about data sets identified through the directory and thus, select data sets of interest. • The data set catalog must be linked to the data base dictionary Descriptions of the data that are included as part of the data set catalog are derived directly from the data base dictionary; therefore, the catalog and dictionary need to be linked to ensure that changes within the data base are represented in the detailed documentation. ------- Drait^0_A£riM_995i 2.2 DATA CATALOG COMPONENTS There is a logical organization for the detailed information contained within a data set catalog entry. This organization reflects the expectations of a potential user who might browse the information presented in the catalog. The organization of the data catalog is patterned after that of a scientific paper (Strebel et al. 1994a). Title and author information is followed by an introduction presenting the objective and purpose of the data set. There is a methods section describing how the data were collected or acquired and the analytical procedures used to process samples. A separate section describes any analytical transformations used to prepare the data in the data set. Similar to the scientific paper, the introduction and method sections are followed by descriptions of results. The results presented in the catalog present a description of the contents of the data set and the geographic and spatial scope represented in the data. Summarized results of quality control and quality assurance procedures are also presented. The data set catalog ends with a discussion of how to access the data and other means by which the program has distributed the data. This section is followed by a list of bibliographic references pertaining to the collection and processing of the data, and demonstrating how the data has been used. The information presented in the data catalog supplements information provided in the data set directory and the data base dictionary; however, selected information included in the directory and dictionary are also included in the data catalog. The design of the catalog is such that the information stored in the three main metadata components (directory, catalog, and dictionary) is brought together in various views that in turn make the data catalog. The components of the data set catalog are organized into 13 sections representing separate groups of related information. The 13 sections are listed in Table 2-1. Each section includes fields used to organize and store information contained within the catalog. These fields generally contain text which in some cases may be several paragraphs to pages in length. The fields have been defined to assist information managers responsible for the completion of data catalog entries. By providing specifically defined fields instead of fewer, general fields, the information manager and the scientist that produced the data set (the catalog authors), are prompted for the information necessary to complete the data set catalog entry. ------- Drait^0_A£riM_995i Table 2-1. Major sections of the EMAP Data Catalog 1. Data set identification 2. Investigator information 3. Data set Abstract 4. Objectives and introduction 5. Data acquisition and processing methods 6. Data manipulations 7. Data description 8. Geographic and spatial information 9. Quality control and quality assurance 10. Data access and distribution 11. References 12. Glossary and Table of Acronyms 13. Personnel Information The specific fields defined for each of the 13 section of the catalog are summarized in Table 2-2. Several fields included in the catalog are replicated from the data objects that represent groups of tables describing the data directory, contacts, methods, and publications. These fields are noted in Table 2-2. The components of the tables describing the data directory are described in the guidance document for the data directory (Frithsen and Strebel 1995). Earlier drafts of this document were the basis for the data object describing contacts within the program. The directory and the contact data objects are already implemented in the EMAP Oracle data base. The tables describing the methods data object are being designed by the Methods Coordinating Group with assistance from the EMAP Information Management User Interaction and Planning Team. The components of the tables describing the EMAP publications data object are defined in this document and were developed based upon the information presented in Lear and Chapman (1994) and the components of the data base used to develop that publication (Lear, personnel communication). ------- Drait^0_A£riM_995i Table 2-2. Individual fields included in the EMAP Data Catalog. Fields replicated from other data objects in the EMAP relation data base are footnoted. Numbers refer to major sections defined in Table 2-1. 1. Data Set Identification Title of Catalog document Author(s) of the Catalog entry Catalog revision date Data set name(a) Task Group(a) Data set identification code(a) Version(a) Requested Acknowledgement 2. Investigator Information Principal Investigator Sample Collection Investigator Sample Processing Investigator Data Analysis Investigator Additional Investigator 3. Data Set Abstract Abstract of the Data Set(a) Keywords for the Data Set(a) 4. Objectives and Introduction Program Objective Data Set Objective Data Set Background Information Summary of data set parameters 5.1 Data Acquisition Sampling Objective Sample Collection Methods Summary Beginning Sampling Date Ending Sampling Date Sampling Platform Sampling Equipment Manufacturer of Sampling Equipment Key Variables Sampling Method Calibration Sample Collection Quality Control Sample Collection Method Reference Sample Collection Method Deviations 5.2 Data Preparation and Sample Processing Data Preparation Objective Data Processing Methods Summary Sampling Processing Method Calibration Sample Processing Quality Control Sample Processing Method Reference Sample Processing Method Deviations 6. Data Manipulations Name of New or Modified Value Data Manipulation Description Data Manipulation Examples Data Manipulation Computer Code File Data Manipulation Computer Code Language Data Manipulation Computer Code 7.1 Description of Parameters Parameter Name SAS Parameter Name Parameter label or description Units of measurement Parameter data type Precision to which values are reported Accuracy of the data values Minimum Value in Data Set Maximum Value in Data Set 7.2 Data Record Example Column Names for Example Records Example Data Records 7.3 Related Data Sets Related Data Set Name Related Data Set Identification Code ------- Drait^0_A£riM_995i Table 2-2. (Continued) 8. Geographic and Spatial Information 10. Data Access Minimum Longitude Data Access Procedures Maximum Longitude Data Access Restrictions Maximum Latitude Data Access Contact Person Minimum Latitude Data Set Format Name of the area or region Information Concerning Anonymous FTP Direct Spatial Reference Method Information Concerning Gopher Horizontal Coordinate System Used Information Concerning World Wide Web Resolution of Horizontal Coordinates EMAP CD-ROM Containing the Data set Units for Horizontal Coordinates Vertical Coordinate System Resolution of Vertical Coordinates Units for Vertical Coordinates 11. References 9. Quality Control/Quality Assurance Reference Type(d) Measurement Quality Objectives Reference Author® Quality Assurance/Control Methods Reference Author's Affiliation® Actual Measurement Quality Title of Reference® Sources of Error Journal or Volume Title® Known Problems with the Data Journal or Volume Editor® Confidence Level/Accuracy Judgement Page and Volume Reference® Allowable Minimum Values Date the Reference was Published® Allowable Maximum Values Location of Publishing Organization® QA Reference Data Name of Publishing Organization® Reference Report Number or Other ID® Procite Record Number for the Reference® 12. Table of Acronyms 13. Personnel Information Glossary Term or Acronym Formal Title(a-b) Definition of Glossary Term or Acronym Last Name(a-b) First Name(a-b) Middle Initial(a-b) Role(a-b) Line 1 of Address(a-b) Line 2 of Address(a-b) Line 3 of Address(a-b) Line 4 of Address(a-b) City(a-b) State(a,b) Zip Code(a-b) County(a,b) Voice Phone Number(a-b) Fax Phone Number(a-b) Email Address(a-b) Email Network(a-b) Additional Email Information(a-b) (a) Field included in the Data Set Directory (b) Field included in the Contacts data base object. (c) Field included in the Methods data base object. ® Field included in the Publications data base object ------- Drait^0_A£riM_995i 3.0 WRITING THE CATALOG ENTRY The catalog represents a technical document written to convey technical information to a scientific audience. Catalog entries are best written by the investigators who directed the collection and analysis of the data being documented. In general, information management staff will not have the technical background to write or review catalog entries; however, the data documentation effort represents a collaborative effort between scientific and information management staff. Principal investigators provide technical descriptions that enhance understanding of the data while information management staff help to organize those descriptions within the information management system. The data set catalog contains detailed, background information about a data set. This information includes descriptions of data collection methods, laboratory and analytical procedures, summaries of quality assurance results, and other types of information that can be utilized by potential users to understand and begin using the data for some purpose. These uses of the data may be entirely different from the uses for which the data were originally collected. Information contained within the catalog should be of sufficient detail and completeness to minimize the need for most users to access additional sources of information. This chapter lays out the organizational structure and content of the catalog entry at a level appropriate for primary authors. The author of a catalog entry should normally be a scientist familiar with the data set who will approach this task as one of writing a scientific paper about the data set. In most cases, the primary author will be the investigator responsible for collecting and analyzing the data. Some technical details, of course, will be provided by the cognizant TGIM when the document is finalized. It is the responsibility of the TGIM to represent EMAP IM in obtaining, submitting, and maintaining catalog entries that adequately describe all EMAP data sets. The detailed technical reference for the material in this section is given in the Appendix. 3.1 DATA SET IDENTIFICATION This section of the catalog contains information used to identify the data set (Table 3-1). The data set name gives a convenient user reference for the data set, while the data set identification allows the information management staff to build technical links between the catalog document and the data set itself. ------- Drait^0_A£riM_995i Table 3-1. Catalog components describing data set identification information Field Field Description Title: Title of Catalog document Cat Author: Author(s) of the Catalog entry Cat Rev Date: Catalog revision date Data Set Name: Data set name Task Group Task Group Code Data Set Id: Data set identification code Version Version number Req Acknowl Requested acknowledgement Discussion: These fields identify the document and the data set to which it refers. The title, author, and catalog revision date are used to identify and track the document itself. The title of the data set catalog entry, in most cases, will be similar to the name of the data set, although perhaps more succinct. Catalog author is one of the defined roles supported in the contacts data base object. The field identifies people who have contributed to writing the catalog entry and may be repeated as many times as is necessary to identify multiple catalog authors. Additional information about each named individual is provided in the Personnel Information section of the catalog (Section 3.13). The catalog revision date field is a mandatory field used to record when the catalog entry was last revised (e.g. to reflect the results of new analyses or to incorporate information concerning additional data uses, limitations, and publications). The data set identification code contains information about the EMAP task group from which the data set originates. Along with Task Group and Version Number, the Data Set Identification Number uniquely identifies a data set or view. Because the data set may be referenced in scientific publications, a suitable acknowledgement for its use should be suggested. Data sets collected by EMAP staff under programmatic auspices will have a general EMAP acknowledgement statement. Where a data set is the product of a specific investigator's work, and should be so referenced, the investigator should provide appropriate acknowledgement text. 3.2 INVESTIGATOR INFORMATION This section of the data set catalog contains information identifying the individuals, or group of individuals, who produced the data set (Table 3-2). This normally means principal investigators and their immediate associates. Reference to whom should be contacted to obtain the data, and data acquisition procedures, are handled in a separate section (Section 3.10). Separate roles have been defined for investigators that have been responsible for various aspects of the development of a data set. These roles reflect that EMAP monitoring has been completed using multiple teams of contractor and federal staff and that often no one investigator is responsible for all activities related to a data set. The fields may be repeated as many times as ------- Drait^0_A£riM_995i is necessary to represent multiple investigators. This section of the catalog references investigator names only; additional information about each named individual is provided in the Personnel Information section of the catalog (Section 3.13) Table 3-2. Catalog components identifying investigators Field Field Description Princ Invest: Sam Coll Invest: Sam Proc Invest: Data Anal Invest: Add Invest: Principal Investigator Sample Collection Investigator Sample Processing Investigator Data Analysis Investigator Additional Investigator Discussion: These fields identify the investigators who had some active role in producing the data set. Those actively concerned with the generation and use of the data set are intended to be listed here; individuals responsible for the administration and management of the program are more appropriately referenced in the data set contact fields provided in the directory. Although highly desirable, it is not necessary to provide information for principal investigators to complete a data set catalog entry. If principal investigator information is missing, the name of the contact person referenced in the directory will be given. If no contact person is identified, the name of the data center originating the data set will be given. 3.3 DATA SET ABSTRACT A brief summary of the data set, like the abstract of a paper, allows a potential user to browse without learning all of the details. The General Keyword field can be repeated as many times as necessary. Table 3-3. Catalog components describing abstract information Field Field Description Abstract: Gen Keyword: Abstract of the Data Set Keywords for the Data Set ------- Drait^0_A£riM_995i Discussion: These fields are used to provide an overview of the data set. The Abstract should be a paragraph or two and summarize the main points expanded in the following sections. Any appropriate keywords should be listed in the Keyword field. A menu of suggested keywords is provided through the data set directory; however, catalog authors are not limited to these keywords. 3.4 OBJECTIVES AND INTRODUCTION Information in this section provides background and justification for each data set within the context of the study from which it originated (Table 3-4). This material can usually be abstracted from research proposals or project plans. The Program Objective field will describe overall EMAP program objectives using standard text supplied by the Information Management Staff, and need not be completed by the catalog author. Table 3-4. Catalog components describing objectives and introduction information Field Field Description Prog Objectiv: Data Objectiv: Data Backgrd: Parameter Sum: Program Objective Data Set Objective Data Set Background Information Summary of data set parameters Discussion: These fields are roughly the equivalent of an introductory section in a scientific paper. They consist of a paragraph or two of textual material that gives an overview of the program and data set. These fields must be provided to complete a data catalog entry. The Data Objective field is used to describe the specific goals that were to be met by the collection of this data set. The Data Set Background Information, on the other hand, sets this data collection activity in a broader context of scientific motivation and interactions with other data available or to be collected. The last field provides a summary (not an exhaustive list) of the parameters, variables, and measured quantities reported in the data set. 3.5 METHODS The methods employed to acquire the data set are summarized in this section of the catalog. Documents and reports containing more extensive descriptions of methods, including standard operating ------- Drait^0_A£riM_995i procedures (e.g., Conkling and Byers, 1993; Macauley and Summers 1991) are more appropriately stored separately from the catalog, but are referenced in the data set catalog. Descriptions of methods are organized in two subsections: data acquisition (including sampling; Table 3-5) and data preparation (including sample processing; Table 3-6). 3.5.1 Data Acquisition Table 3-5. Catalog components describing data acquisition methods Field Field Description Samp Objectiv: Sampling Objective Sampl Method: Sample Collection Methods Summary Beg Sampl Date: Beginning Sampling Date End Sampl Date: Ending Sampling Date Platform: Sampling Platform Sample Equip: Sampling Equipment Equip Manufac: Manufacturer of Sampling Equipment Key Variables: Key Variables Sam Meth Cal: Sampling Method Calibration Sam Qual Con: Sample Collection Quality Control Sam Meth Ref: Sample Collection Method Reference Sam Meth Dev: Sample Collection Method Deviations Discussion: These text fields describe why, how, and when sampling data were acquired. In general, they would consist of a sentence or two to possibly a paragraph each, forming the equivalent of a sampling methods description in a scientific paper. For the most part, the information required is straightforward. The methods summary need not be exhaustive, as long as an adequate bibliographic reference to the complete description of the sample collection methods is included. (It is anticipated that the data set catalog will be linked to extensive text files containing complete descriptions of methods and standard operating procedures.) The most critical field may be the last, which is used to document any known deviations from the methods referenced. Knowledge of such deviations may be necessary to assess the overall quality of the data or to evaluate the significance of any long-term changes identified from the integration of separate data sets collected in multiple years. ------- Drait^0_A£riM_995i 3.5.2 Data Preparation and Sample Processing Table 3-6. Catalog components describing data preparation and sample processing methods Field Field Description Proc Objectiv: Data Proc Sum: Sampl Proc Calib: Proc Qual Con: Samp Proc Ref: Sampl Proc Dev: Data Preparation Objective Data Processing Methods Summary Sampling Processing Method Calibration Sample Processing Quality Control Sample Processing Method Reference Sample Processing Method Deviations Discussion: These text fields describe why and how samples were processed. In general, they would consist of a sentence or two to possibly a paragraph each, forming the equivalent of a sample processing description in a scientific paper. For the most part, the information required is straightforward. The processing summary need not be exhaustive, as long as an adequate bibliographic reference to the complete description of the sample processing procedures is included. (It is anticipated that the data set catalog will be linked to extensive text files containing complete descriptions of methods and standard operating procedures.) The most critical field may be the last, which is used to document any known deviations from the standard oper- ating procedures referenced. Knowledge of such deviations may be necessary to assess the overall quality of the data or to evaluate the significance of any long-term changes identified from the integration of separate data sets collected in multiple years. 3.6 DATA MANIPULATIONS This section of the data set catalog provides documentation for any manipulations of the data subsequent to data acquisition and data preparation (Table 3-7). Documentation of these data manipulations should be sufficiently detailed so that future users of the data will fully understand what was done. The fields used in this section may be repeated as many times as is necessary to document the data manipulations used to produce the final data set. ------- Drait^0_A£riM_995i Table 3-7. Catalog components describing data manipulations Field Field Descriptions Deriv Value Name: Data Manip Desc: Data Manip Exmpl: Man Code File: Man Code Lang: Manip Code: Name of New or Modified Value Data Manipulation Description Data Manipulation Examples Data Manipulation Computer Code File Data Manipulation Computer Code Language Data Manipulation Computer Code Discussion: These fields describe any algorithms that have been used to derive new values or quantities from existing data. Such data manipulations may include, for example, conversions to different units, transformation of continuous data values to discrete values, or derivation of unmeasured quantities that are uniquely determined by known data. The data manipulation description and example fields must be completed and include complete descriptions of algorithms or reference information (e.g., look up tables) applied. It may be necessary to use a "pseudo-code" approach to define the algorithm in a stepwise fashion in ASCII text. It is optional, but encouraged to include relevant program segments of the actual computer code used and to submit the program file for archiving with the data. 3.7 DATA DESCRIPTION The data description sections give details of each value reported in the data set, sufficient for a user to read the data set file and ingest it into application software. The description is provided in three parts: 1) A description of the quantities in the data set (Table 3-8), 2) An example data record (Table 3-9), and 3) A cross reference to related data sets (Table 3-10). The data description contents and accuracy are primarily a function of the investigator submitting the data set. However, the description provides the basis for organizing the data itself in the EMAP relational data base management system (RDBMS), and will be entered into the RDBMS data dictionary and used heavily by the information management staff. This usage will generate feedback to the investigator and to this catalog document through efforts to standardize like parameter names and definitions across data sets, checks of maximum and minimum values, reports generating data record examples, and identification of related data sets. ------- Drait^0_A£riM_995i 3.7.1 Description of Parameters Parameters in the data set are listed and described in this section of the data catalog. This information is abstracted from the data dictionary for the EMAP monitoring data base and may be presented in tabular form. The following information should be provided for each quantity (column) in the data set: Table 3-8. Catalog components describing data set parameters Field Field Descriptions Param Name: Parameter Name SAS Param Name: SAS Parameter Name Param Descrip: Parameter label or description Units: Units of measurement Data Type: Parameter data type Precision: Precision to which values are reported Accuracy: Accuracy of the data values Act Min Value: Minimum Value in Data Set Act Max Value: Maximum Value in Data Set Discussion: These fields describe the basic characteristics (name, type, accuracy, range) of values that are reported in the data set. The minimum value and maximum value fields are optional for non-numeric values. The shortened (8 character) SAS name should be provided if the data set or the parameters in the data base originated from a SAS data set. The parameter description is a one line complete and precise scientific name of the quantity, not an extensive definition or algorithm. 3.7.2 Data Record Example A display of a limited number of records or observations in the data set assists the data user in understanding the structure and composition of the data set. This section of the data set catalog presents several example records from the data set. The records to be displayed can be provided by the investigator or can be generated from the EMAP relational data base after the data set is loaded into the data base. If the data set is complex or there are specific examples that should be shown, the investigator should elect to provide the material at the time the catalog document is written. The Example Data Values line is repeated once for each data set record included. ------- Drait^0_A£riM_995i Table 3-9. Catalog components describing data record Field Field Description Header Line: Exmpl Data Values: Column Names for Example Records Example Data Records Discussion: The example data values from a data set record are recorded in this field. They should be spaced so as to line up with the column names in the header line, when displayed in a fixed (non proportional) font. Special arrangements should be made if this field is to be used in the description of binary data sets (for example, SAS files, GIS coverages, or a satellite images). The field serves two purposes: to give the user a preview of the scientific data and to give the programmer who must access the data formatting and verification information. For the first purpose, if numbers are relevant, they can be extracted, and a "pseudo record" constructed. Similarly, a thumbnail version of a displayed GIS or image data set could be attached to the catalog document. Actual byte values can be included to meet the second purpose, if necessary. For example, a TIFF image header and the corresponding values in the data file could be listed in the two fields. 3.7.3 Related Data Sets The names and identification codes of data sets containing similar or related data are referenced in this section of the data catalog (Table 3-10). The references should be entered exactly to make possible relational links (via the RDBMS) to the documentation for those data sets. These fields can be repeated to accommodate multiple data set references. Table 3-10. Catalog components describing data record Field Field Description Related DS Name: Task Group: Related DS ID: Version: Related Data Set Name Task Group Code Related Data Set Identification Code Version number Discussion: The name and identification code of a data set containing similar or related data is recorded in this field. ------- Drait^0_A£riM_995i 3.8 GEOGRAPHIC AND SPATIAL INFORMATION Information about the spatial coverage of the data set, particularly information that is specific to spatial data sets, is provided in this section of the data set catalog (Table 3-11). In the geographic coverage section, specific spatial data organization and reference information will be supplied if applicable to the data set. (The spatial data documentation standards recommended by the Federal Geographic Data Committee (FGDC 1994) have been considered in designing this set of fields - see the Appendix for additional details.) Table 3-11. Catalog components describing geographic and spatial information Field Field Description Min Longitude: Minimum Longitude Max Longitude: Maximum Longitude Max Latitude: Maximum Latitude Min Latitude: Minimum Latitude Geo Keyword: Name of the area or region Spatial Ref Meth: Direct Spatial Reference Method Horiz Coord Sys: Horizontal Coordinate System Used Horiz Resolution: Resolution of Horizontal Coordinates Horiz Coord Units: Units for Horizontal Coordinates Vertical Coord Sys: Vertical Coordinate System Vertical Resolution: Resolution of Vertical Coordinates Vertical Coord Units: Units for Vertical Coordinates Discussion: These fields provide basic information about the spatial extent and organization of the data set. The maximum and minimum longitude and latitude give the bounding coordinates (West, East, North, South) marking the limits of coverage of a data set. Latitude and longitude values are expressed in decimal degrees. The geographic keywords or names should uniquely identify the spatial extent of the data set using a term descriptive of the boundaries (e.g. New York State; EPA Region IX) or geographic features (e.g. Chesapeake Bay; Virginian Province). This field may be repeated to indicate multiple overlapping region names. The spatial reference method informs the user whether the data set organization is point, vector, or raster. ------- Drait^0_A£riM_995i 3.9 QUALITY CONTROL/QUALITY ASSURANCE Quality control and quality assurance information is used to understand the limits of the data. Specific data collected to assess data quality may be included in this section of the data set catalog (Table 3-12), or may be included in a separate data set that is referenced in the data set catalog. Table 3-12. Catalog components describing quality control/quality assurance information Field Field Description Meas Qual Obj: Measurement Quality Objectives QA/QC Meth: Quality Assurance/Control Methods Act Meas Quality: Actual Measurement Quality Sources of Error: Sources of Error Known Data Prob: Known Problems with the Data Conf Level Stmnt: Confidence Level/Accuracy Judgement Allow Min Value: Allowable Minimum Values Allow Max Value: Allowable Maximum Values QA Ref Data: QA Reference Data Discussion: In these fields the investigator should provide brief discussions of the quality issues associated with the data set. The first three fields encode the formal quality objectives, methods, and results obtained. The next set of fields should contain more descriptive information such as field notes on sources of error, problems encountered by the investigator or others in using the data set for analysis, and a subjective evaluation of the original investigator's confidence in the data set. The allowable minimum and maximum values are useful QA check and search query information. The last field may include actual reference data or the name of a file or document that has the appropriate reference data. ------- Drait^0_A£riM_995i 3.10 DATA ACCESS This section is designed to provide users with information on how to access data (Table 3-13). Table 3-13. Catalog components describing data access information Field Field Description Data Access: Data Access Restrict: Data Access Contact: Data Set Format: FTP In for: Gopher Infor: WWW Infor: EMAP CD-ROM: Data Access Procedures Data Access Restrictions Data Access Contact Person Data Set Format Information Concerning Anonymous FTP Information Concerning Gopher Information Concerning World Wide Web EMAP CD-ROM Containing the Data set Discussion: The procedures discussion should provide general information about the different ways users can access data including telephone contact, dial-in lines, and Internet. Access to data via Internet most likely will utilize standard file transfer protocols (FTP), as well as the Internet discovery and retrieval tools Gopher and World Wide Web (WWW) provided through the Agency's public access server (Strebel and Frithsen 1995). Additional reference may be made to the EMAP Internet server for draft documents and data; however, access to this server is restricted and these restrictions will need to be documented. Contact information is entered in a descriptive way in these fields, but the content should be essentially the same as required for the investigator name and address section. 3.11 REFERENCES This section of the data set catalog provides a list of any published documentation relevant to the data collected. Documentation may include manufacturer's instruction manuals, government technical manuals, user's guides, etc. Also referenced should be any technical reports and scientific publications concerning the methods, instruments, or data described in this document. Publications by the principal investigator or investigating group that would help a reader understand or analyze the data are particularly important. The format of the bibliographic references is taken from the EMAP bibliographic data base specification (Lear, pers. comm.). The section is broken into two subsections to accommodate references that will be maintained in the relational tables for the EMAP bibliographic data object (detailed reference format) and those background and general references which are primarily of use to the reader and will not be tracked (brief reference format). ------- Drait^0_A£riM_995i 3.11.1 EMAP References References given in this section are those published or used as part of EMAP and which should be tracked in the data table for the EMAP bibliographic data object (Table 3-14). Table 3-14. Catalog components describing EMAP reference information Field Field Description Ref Type: Reference Type Ref Author: Reference Author Ref Authors Affil: Reference Author's Affiliation Ref Title: Title of Reference Volume Title: Journal or Volume Title Volume Editor: Journal or Volume Editor Page Ref: Page and Volume Reference Date of Ref: Date the Reference was Published Place of Pub: Location of Publishing Organization Publisher: Name of Publishing Organization Ref Other ID: Reference Report Number or Other ID Procite Rec Num: Procite Record Number for the Reference Discussion: Standard bibliographic reference information is provided in these fields. The type of documentation referenced may include journal articles, workshop proceedings, books, reports, films, video tapes, audio tapes, CD-ROMs,etc. All authors are listed using paired repeats of the fields Reference Author and Reference Authors Affiliation. If the reference is an article in a journal, a proceedings volume, or other compendium, the title and editor of the full work are given in the "Volume" fields. For special reports, technical memoranda, or other material that carries an internal organizational identification code or report number, it should be recorded in the Reference Other Identification field. The Procite Record Number is optional. It is included in the catalog specifications because the EMAP bibliography is currently maintained in a Procite data base. Eventually, all EMAP bibliographic information should be stored and maintained in the EMAP Oracle data base, eliminating the effort needed to maintain separate data bases. ------- Drait^0_A£riM_995i 3.11.2 Background References The references listed in this section are materials in the open literature which may support the collection, use, and interpretation of the data set, but are not directly related to or products of EMAP. As such they need not be tracked in the EMAP bibliographic data object, but should be captured as part of the documentation. A single text field is provided, but it should contain all of the relevant information that would otherwise have been listed in the previous section (Table 3-15). Table 3-15. Catalog components describing EMAP background references Field Field Description Supt Ref: Supporting Reference Discussion: This field contains the reference information as described above for a single supporting reference relevant to the data set. It may be repeated as many times as required. 3.12 GLOSSARY AND TABLE OF ACRONYMS The detailed documentation for each data set are likely to contain terms and acronyms that are unfamiliar to some users of the data. Catalog authors should provide definitions for those terms that have specific meaning within EMAP or within a particular discipline (Table 3-16). All acronyms used should be recorded in these fields (which are repeated in pairs) to provide convenient references for detailed documentation reports and on-line documentation searches. Table 3-16. Catalog components describing glossary terms and acronyms used in the catalog entry Field Field Description Acronym: Acronym Def: Glossary term or acronym used in the Detailed Documentation Definition of glossary term in acronym Discussion: These fields provide definitions for terms and acronyms in the catalog that may be unfamiliar to some users of the data. The EMAP Master Glossary (EMAP 1993) should be consulted for suggested definitions for terms that have specific meaning within EMAP. ------- Drait^0_A£riM_995i 3.13 PERSONNEL INFORMATION This section of the data set catalog contains information identifying the individuals associated with the data set and named in the data set catalog (Table 3-17). This includes catalog authors (Section 3.1), investigators (Section 3.2), and data set contacts (Section 3.10). Whereas names only were provided in previous sections, this section provides address, telephone and email information for individuals named within the catalog. The fields used to provide this information are the same fields used in the data set directory and the contacts data base object. Individuals in the contacts data base object are linked to the catalog through their defined roles and the data set identification information. Table 3-17. Catalog components describing personnel information. Title: Formal Title LstName: Last Name FrstName: First Name Mid Init: Middle Initial Role Role Address 1: Line 1 of address Address2: Line 2 of address Address3: Line 3 of address Address4: Line 4 of address City: City State: State Zip: Zip Code Country: Country Voice Phone: Voice Phone Number Fax Phone: Fax Phone Number Email Addr: Email Address Email Netwk: Email Network Add EM Inf. Additional Email Information ------- Drait^0_A£riM_995i 4.0 EXAMPLES The purpose of this chapter is to illustrate how typical data set information may be associated with the fields of the catalog. The format in this chapter would be convenient for writing the entry, updating it , submitting it to the IMS staff, and loading it into the IMS documentation data base. It is not intended to be the output format seen by data users and others who consult the data set documentation. The output formats are discussed in Chapter 5. The example entries are drawn from information obtained from the EMAP Estuaries Resource Group. Example catalog documentation is provided for a biological data set (1990 Virginian Province Benthic Data Set) and a chemical data set (1990 Virginian Province Sediment Metal Chemistry Data Set). The catalog documentation examples are provided as illustrations only and for this draft have not been reviewed by resource group scientists or information management specialists from the EMAP Estuaries Resource Group. Although attempts were made to complete accurate catalog entries, some information was not available for this draft and hypothetical information was provided to complete the example. Readers are cautioned not to use these examples as working documentation for the data sets described. Note: The examples were incompletely developed for the 30 April 1995 draft of this report. 4.1 BIOLOGICAL DATA SET This example describes a benthic data set collected in 1990. Although it is based on an actual example, some information is hypothetical to complete the example. 4.1.1 Data Set Identification Title Cat Author Cat Rev Date Data Set Name Task Group Data Set Id Version Req Acknowl 1990 Virginian Province Benthic Data for Species Abundance and Biomass Jeffrey B. Frithsen 1995-04-01 EMAP - Esturaries Program Level Database - 1990 Virginian Province Benthic Species Data Set Summarized by Station 01 1001 001 These data were produced as part of the U.S. EPA's Environmental Monitoring and Assessment Program (EMAP). Although the data described in this article have been funded wholly or in part by the U.S. Environmental Protection Agency through its EMAP Estuaries Program, it has not been subjected to Agency ------- Drait^0_A£riM_995i review, and therefore does not necessarily reflect the views of the Agency and no official endorsement should be inferred. 4.1.2 Investigator Information Princ Invest: Sam Coll Invest: Sam Proc Invest: Data Anal Invest: Add Invest: Charles J. Strobel Jeffrey B. Frithsen A. Frederick Holland Steve Weisberg 4.1.3 Data Set Abstract Abstract: This data set contains measurements for the abundance and biomass of bottom- dwelling (benthic) macroinvertebrates collected from sediments within the estuaries of the Virginian Province (Cape Code to Chesapeake Bay). All samples were collected during the summer using a young-modified Van Veen grab. Abundance measurements were made for individual taxa; biomass measurements were made for selected species and taxonomic groups. Gen Keyword: Benthic Gen Keyword: Macroinvertebrate Gen Keyword: Estuary 4.1.4 Objectives and Introduction Prog Objectiv: The EPA's Environmental Monitoring and Assessment Program (EMAP) was designed to provide quantitative assessment of the national extent of environmental problems by measuring status and change in selected indicators of ecological condition. Data Objectiv: The specific objective of the investigation described in this document was to collect information to characterize the bottom dwelling (benthic) animal assemblages in the estuaries of the EMAP-Estuaries Virginian Province. Data Backgrd: Macrobenthic organisms play an important role in the estuarine conceptual model. As major secondary consumers in estuarine ecosystems, they represent an important linkage between primary producers and higher trophic levels for both planktonic and detritus- based webs. They are a particularly important food ------- Drait^0_A£riM_995i source for juvenile fish and crustaceans and also include many commercially and recreationally important species. The benthic macroinvertebrate species composition and abundance indicator has been placed in the core group not only because of its importance, but also because of its responsiveness to the kinds of environmental stress gradients of interest to EMAP-E. Benthic assemblages are composed of diverse taxa with a variety of reproductive modes, geeding guilds, life history characteristics, and physiological tolerances to environmental conditions. As a result, benthic populations respond to changes in conditions, both natural and anthropogenic, in a variety of ways. Responses of some species (e.g., filter feeders, species with pelagic life stages) are indicative of water quality changes, while responses of others (e.g., organisms that burrow in or feed on sediments) are indicative of changes in sediment quality. Parameter Sum: Benthic species composition, abundance, and biomass were estimated for each of three sediment grabs taken at each sampling station. 4.1.5 Methods 4.1.5.1 Data Acquisition Samp Objectiv: Sampl Method: Beg Sampl Date End Sampl Date Platform To collect sediment samples suitable for the analysis of benthic assemblage characteristics. Three replicate sediment samples were to be collect as each EMAP-VP station for benthic species composition, abundance, and biomass. Sediment grabs used for benthic samples were randomly interspersed with the grabs used for sediment chemistry/toxicity samples. 1990-06-20 1990-09-22 Samples were collected from 8-m (24 ft) twin-engine Chesapeake style workboats. Sample Equip: A 1/25 m2 stainless stell Young-modified Van Veen grab sampler with a maximum penetration depth of 10 cm was used to collect sediment samples. The sampler was constructed entirely of stainless steel and had been Kynar (similar to Teflon) coated. ------- Drait^0_A£riM_995i Equip Manufac: Young: Falmouth, MA Key Variables: RPD Depth; Sam Meth Cal: The sample gear required no calibration. Sam Qual Con: Acceptable grabs penetrated the sediments at least 7 cm. Grabs containing no sediments partially filled grabs, or grabs with grossly slumped surfaces were unacceptable. Grabs completely filled to the top where the sediment was in direct contact with the hinged top were also unacceptable. Sam Meth Ref: Strobel, C.J. 1990. Environmental Monitoring and Assessment Program - Near Coastal Component: 1990 Demonstration Project Field Operations Manual. Environmental Research Laboratory, Narragansett, RI. U.S. Environmental Protection Agency, Office of Research and Development. Sam Meth Dev: None. 4.1.5.2 Data Preparation and Sample Processing Proc Objectiv: To process sediment samples to characterize benthic assemblages in terms of species composition, abundance, and biomass. Data Proc Sum: Benthic fauna identified included those commonly termed "macrofauna" by benthic ecologists. "Meiofaunal" groups were not identified or enumerated. These groups included: nematodes, ostracods, turbellarians, harpacticoid copepods and foraminifera. In addition to meiofauna, taxonomic groups having only planktonic forms were excluded from the identification process. Examples of these groups were copepods and cladocerans. Benthic fauna were identified to the lowest practical taxonomic level. Macrobenthos were identified to species, except for the following groups: class Anthozoa (class), subclass Copepoda (order), phylum Nemertinea (phylum), subclass Ostracoda (subclass) and class Turbellaria (class). For samples collected in low salinity (less than 5 ppt) water, oligochaets were identified to species and chironomides to genus. Above 5 ppt salinity, oligochaetes were identified to class and chironomids were identified to family. Sampl Proc Calib: No calibration required. ------- Drait^0_A£riM_995i Proc Qual Con: Ten percent of all samples were resorted as a quality control check on each technician's efficiency. Samp Proc Ref: Klemm, D.J., L.B. Lobring, J.W. Eichelberger, A. Alford-Stevens, B.B. Porter, R.F. Thomas, J.M. Lazorchak, G.B. Collins, and R.L. Graves. 1993. Environmental Monitoring and Assessment Program (EMAP) Laboratory Methods Manual: Estuaries. U.S. Environmental Protection Agency, Environmental Research Laboratory, Cincinnati, OH. Frithsen, J.B. 1991. Technical scientific assistance to EMAP Near Coastal Group for the processing and integration of EMAP-NC Demonstration Project benthic data. 6 May 1991. Report completed for the U.S. Environmental Protection Agency, Office of Research and Development, Environmental Monitoring and Assessment Program (EMAP), Washington, DC. Report prepared by Versar, Inc., Columbia, MD. Sampl Proc Dev: 4.1.6 Data Manipulations Deriv Value Name Data Manip Desc Data Manip Exmpl Man Code File Man Code Lang Manip Code BSPECNUM BSPECNUM (Total number of individuals of each species code = abundances of a species code summed across 'n' grabs collected at a station, where n=lto 3. Deriv Value Name: Data Manip Desc: B SPEC MA BSPEC MA (mean abundance of indivuals for each species) = Abundances of a species code summed over 'n' grabs and divided by 'n' grabs ------- Drait^0_A£riM_995i Dat manip Exmpl Man Code File Man Code Lang Manip Code Deriv Value Name: Data Manip Desc: Dat Manip Exmpl: Man Code File: Man Code Lang; Manip Code: BSPECSTD BSPECSTD (standard deviation of the mean abundance) = Standard deviation of the mean abundance of each species code 4.1.7 Data Description 4.1.7.1 Description of Parameters Param Name: SAS Param Name: Param Descrip: Units: Data Type: Precision: Accuracy: Act Min Value: Act Max Value: STA_NAME Station identification code Character Param Name: SAS Param Name: Param Descrip: Units: Data Type: Precision: Accuracy: Act Min Value: Act Max Value: VST DATE Date sampling started at this station Date (YYMMDD) 90-06-20 90-09-22 Param Name: SAS Param Name: Param Descrip: Units: Data Type: Precision: SCINAME Scientific name of the species or taxonomic group Character ------- Drait^0_A£riM_995i Accuracy Act Min Value Act Max Value Param Name: SAS Param Name: Param Descrip: Units: Data Type: Precision: Accuracy: Act Min Value: Act Max Value: BSPECNUM Number of individuals for a specific value of SCINAME Number per 440 cm2 grab Numeric 1 +- 10 % 0 Param Name: SAS Param Name: Param Descrip: Units: Data Type: Precision: Accuracy: Act Min Value: Act Max Value: B SPEC MA Mean number of individuals for a specific value of SCINAME for all grabs collected at a station Mean number per 440 cm2 grab Numeric 1 0 Param Name: SAS Param Name: Param Descrip: Units: Data Type: Precision: Accuracy: Act Min Value: Act Max Value: BSPECSTD Standand deviation of the mean number of individuals for a specific value of SCINAME for all grabs collected at a station Numeric 4.1.7.2 Data Record Example Header Line: Exmpl Data Values: ------- Drait^0_A£riM_995i 4.1.7.3 Related Data Sets Related DS Name: Related DS ID: 4.1.8 Geographic and Spatial Information Min Longitude: 77.17048 Max Longitude: 70.04186 Max Latitude: 41.38330 Min Latitude: 36.49546 Geo Keyword: Virginian Province Geo Keyword: EPA Region I Geo Keyword: EPA Region II Geo Keyword: EPA Region III Spatial Ref Meth: Point Horiz Coord Sys: Geographic Horiz Resolution: Horiz Coord Units: Vertical Coord Sys: Vertical Resolution: Vertical Coord Units: 4.1.9 Quality Control/Quality Assurance Meas Qual Obj: Measurement quality objectives were outlined in the Quality Assurance Project Plan (Valente et al. 1991). The objective of laboratory processing was to achieve a maximum error for sorting, counting, and taxonomic identifications of 10% of the individuals in the sample. QA/QC Meth: Quality control for processing grab samples involved both sorting and counting check systems. A check on the efficiency of the sorting process was required to document the accuracy of the organism extraction process. In addition to sorting QA, it was necessary to perform checks on the accuracy of sample counting. This was done in conjunction with taxonomic identification and used the same criteria presented for taxonomic identification quality control. ------- Drait^0_A£riM_995i Act Meas Quality: Actual measurement quality as indicated by quality assurance audits indicated the error for sorting as 3.1% of the number of specimens in the sample. The error for species identification and enumerations was 1.4%. Sources of Error: The methods used to process benthic samples require that a small number of representative specimens of each species be set aside in a taxonomic reference collection. Therefore, the total biomass is underestimated for those samples from which reference specimens were taken. Total macrofaunal biomass was also potentially underestimated for samples from tidal freshwater and oligohaline salinity regions where the number of chironomids or the number of oligochaetes was less than 20. In those instances where the number of oligochaetes or chironomids was less than 20, all specimens were mounted for identification and no biomass measurements were made. Known Data Prob: Conf Level Stmnt Allow Min Value Allow Max Value QA Ref Data Known data problems were not judged to be significant but were documented in the report describing the laboratory processing of benthic samples (Frithsen 1991). 4.1.10 Data Access Data Access Data Access Restrict Data Access Contact Data Set Format FTP Infor Gopher Infor WWW Infor EMAP CD-ROM ------- Drait^0_A£riM_995i 4.1.11 References 4.1.11.1 EMAP References Ref Type Ref Author Ref Authors Affil Ref Author Ref Authors Affil Ref Author Ref Authors Affil Ref Author Ref Authors Affil Ref Author Ref Authors Affil Ref Author Ref Authors Affil Ref Author Ref Authors Affil Ref Author Ref Authors Affil Ref Title Volume Title Volume Editor Page Ref Date of Ref Place of Pub Publisher: Ref Other ID: Procite Rec Num: Manual Klemm, D. J. Lobring, L. B. Eichelberger, J. W. Potter, B . B. Thomas, R. F. Lazorchak, J. M. Collins, G. B. Graves, R. L. Environmental Monitoring and Assessment Program (EMAP) Laboratory Methods Manual: Estuaries. September, 1991 Cincinnati, Ohio U.S. EPA Office of Research and Development, Environmental Monitoring Systems Laboratory EPA/600/4-91/xxx 4.1.11.2 Background References Supt Ref: 4.1.12 Glossary and Table of Acronyms Acronym: ppt Acronym Def: parts per thousand ------- Drait^0_A£riM_995i 4.1.13 Personnel Title Dr. Lst Name Frithsen Frst Name Jeffrey Middle Init B. Role Catalog Author Address 1 Versar, Inc. Address2 9200 Rumsey Road Address3 Address4 City Columbia State MD Zip 21045 Country USA Voice Phone 410-740-6112 FAX Phone 410-964-5156 Email Address Frithsenj ef@V ersar. Com Email Network Internet Add EM Info 4.2 CHEMICAL DATA SET This example was not included in the April 30, 1995 draft of this report. ------- Drait^0_A£riM_995i 5.0 CATALOG OUTPUT TO USERS The catalog fields are designed to reflect the elements of a scientific paper. The reader should be given a document-like text format that is easy to read and understand. There should also be a capability to search a document by section. This chapter gives guidance on constructing those output formats, but does not explore implementation details. 5.1 THE METADATA PUBLICATION The catalog sections are to be presented as the sections of a paper. Each section header should match the section number and title (to permit searching the file electronically). Within each section, the fields names are to be deleted and the field contents are to be aggregated into complete sentences and paragraphs that can be easily read. Some sections will have more descriptive text than others. For example, the first section contains brief identification information and might be formatted (compare 4.1.1 above): Section 1: Data Set Identification 1990 Virginian Province Benthic Data for Species Abundance and Biomass This document was written by Jeffrey B. Frithsen This document was last revised on April 1, 1995 This document describes the EMAP-E Estuaries-1990 Virginian Province Benthic Species Data Set Summaried by Station The data set identification number is 1001, Version 001. The following acknowledgment is requested by the investigators of anyone who uses and publishes on these data: These data were produced as part of the U.S. EPA's Environmental Monitoring and Assessment Program (EMAP). Although the data described in this article have been funded wholly or in part by the U. S. Environmental Protection Agency through its EMAP Estuaries Program, it has not been subjected to Agency review, and therefore does not necessarily reflect the views of the Agency and no official endorsement should be inferred. On the other hand, a section with a significant amount of text already provided in descriptive format would simply be broken into paragraphs. For example, consider this section about quality control (compare 4.2.9 above): Section 9: Quality Control/Quality Assurance Measurement Quality Objectives (MQOs) for the 1990 Virginian Province sediment chemistry analyses were defined in the 1990 Demonstration Project Quality Assurance Project Plan for EMAP-Near Coastal (Valente, et al., 1990). This plan required each laboratory to analyze the ------- Drait^0_A£riM_995i following quality control (QC) samples along with every batch or "set" of field chemistry samples: laboratory reagent blank, calibration check standards, laboratory fortified sample matrix, laboratory duplicate, and Laboratory Control Material (LCM). Results for these QC samples had to fall within certain pre-established control limits for the analysis of a batch of samples to be considered acceptable. Results of QC sample analyses are stored in the EMAP-Estuaries data base and are available upon request. For the analysis of major and trace elements by ICP-AES and GFAA, the laboratory generally met the pre-established acceptability criteria (control limits) for the QC samples. For the 1990 mercury analyses, the average percent recovery in the reference material fell just outside the accuracy control limit range of 85% to 115%, suggesting that mercury may have been slightly under-recovered in some sample batches. All QC results for the analysis of total organic carbon in the 1990 sediment samples fell within required control limits. A major deficiency in the 1990 organics data set is related to the laboratory's failure to achieve the target detection limits originally specified in the QA plan. These target detection limits were lOng/g (dry weight) for each PAH compound and 0.5 ng/g for each PCB congener and pesticide. In general, the detection limits achieved by the laboratory ranged from 1.5 to 30 times higher than the target value for PAH compounds and up to 15 times higher than the target value for PCB congeners and pesticides. In addition, the detection limits varied widely because the laboratory analyzed a different amount (i.e., dry weight) of sedimentfrom each sample. As a result, the analytes of interest were not detected in a large number of samples, and the "calculated detection limit (i.e., the theoretical concentration of each analyte necessary for detection) differed significantly from sample to sample. Data users are cautioned that there are several major deficiencies in the 1990 sediment organics data set that might limit or preclude the use of these data. These deficiencies were the result of numerous methodological and QA/QC problems experienced by the laboratory responsible for the analysis. Data users are cautioned that there are deficiencies in the 1990 sediment data set for butyltin compounds which might limit or preclude the use of these data. The laboratory's failure to detect the butyltin compounds of interest (TBT, DBT, MBT) in the majority of samples analyzed suggests a potential deficiency from the method detection limits for the individual analytes. The method detection limit (MDL) established by the laboratory was 4 ng/g dry weight for both TBT and DBT and 10 ng/g dry weight for MBT. Assuming these MDLs are valid, it is probable that contamination by butyltin compounds may be more widespread than indicated by these data. If the target detection limits had been achieved and consistent sample sizes had been used, the organic analytes of interest probably would have been detected and quantified in most of the 1990 Virginian Province samples. In reality, analytes of interest present in the samples at low concentrations were not detected and therefore not reported. This limits the comparability of the ------- Drait^0_A£riM_995i 1990 organics data with other data sets for which lower detection limits were achieved and limits data users' ability to make quantiative evaluations of sediment contamination for these organic compounds in the Virginian Province. The 1990 mercury and TOC results were deemed acceptable for use without qualification. Actual report formatting instructions will depend on how the documentation fields suggested in this document are implemented in a data base. This is an implementation issue which cannot be addressed at this time. 5.2 SELECTION OF METADATA COMPONENTS A full document may be difficult to view and search electronically. For the EMAP IMS online system, a document should be viewable using a menu that allows the reader to select specific components or sections of the document. It is suggested that document sections formatted as described in 5.1 above be stored so that they may be accessed by search tools (such as gopher) or hypertext links (as in html) organized by the primary document sections. That is, the user should be given the document title and the following list of options from which to choose: The components of the data set catalog are organized into the following sections: 1. Data set identification 2. Investigator information 3. Data set Abstract 4. Objectives and introduction 5. Data acquisition and processing methods 6. Data manipulations 7. Data description 8. Geographic and spatial information 9. Quality control and quality assurance 10. Data access and distribution 11. References 12. Glossary and Table of Acronyms 13. Personnel Information. A metadata access system will be required to present the user with lists of metadata documents from which to choose. This may be based on the EMAP data set directory or organized hierarchically by resource group and data type in a manner parallel to the data directories on the public access server. The metadata access system is a separate design topic and is not discussed further in this document. ------- Drait^0_A£riM_995i 6.0 LITERATURE CITED Conkling, B.L. and G.E. Byers (eds.). 1993. Forest Health Monitoring Field Methods Guide, Revised July, 1993. Internal Report. EPA/600/X-92/073. U.S. Environmental Protection Agency, Las Vegas, NV. EMAP. 1993. Environmental Monitoring and Assessment Program: Master Glossary. EPA/620/R-93/013. U.S. Environmental Protection Agency, Office of Research and Development, Environmental Monitoring and Assessment Program (EMAP), Research Triangle Park, NC. EOSDIS 1994. Interim Release2 ECS Core Metadata Baseline. 194-00269TPW. EOSDIS Core System Project. Hughes Applied Information Systems, Inc., Landover. MD. FGDC 1994. Content Standards for Digital Geospatial Metadata. June 8, 1994. Federal Geographic Data Committee. Washington, DC. Frithsen, J.B. and D.E. Strebel. 1995. Summary Documentation for EMAP Data: Guidelines for the Information Management Directory. April 30,1995. Report prepared for the U.S. Environmental Protection Agency, Office of Research and Development, Environmental Monitoring and Assessment Program, Washington, DC. Report prepared by Versar, Inc., Columbia, MD. Justice, C.O., G.B. Bailey, M.E. Maiden, S.I. Rasool, D.E. Strebel and J.D. Tarpley. 1995. Recent data and information system initiatives for remotely sensed measurments of the land surface. Remote Sensing of the Environment 51:23 5-244. Kirkland, L.L. 1994. EMAP Quality Management Plan. U.S. Environmental Protection Agency, Washington DC. Lear, J.S. and C.B. Chapman, eds. 1994. Environmental Monitoring and Assessment Program (EMAP Cumulative Bibliography. EPA/620/R-94/024. U.S. Environmental Protection Agency, Office of Research and Development, Environmental Monitoring and Assessment Program (EMAP), Research Triangle Park, NC. Lear, J.S. 1994. Personal communicaton. Components of the EMAP bibliographic data base. ManTech Environmental Technology, Inc., Research Triangle Park, NC. Macauley, J.M. and J.K. Summers. 1991. Near Coastal Louisianian 1991 Province Demonstration Project - Field Operations Manual. Environmental Research Laboratory, Gulf Breeze, FL. U.S. Environmental Protection Agency, Office of Research and Development. ------- Drait^0_A£riM_995i Meeson, B.W., D.E. Strebel and E.D. Paylor. 1993. Earth science information systems: A perspective from the Pilot Land Data System. In, A. Zygielbaum (ed) Earth and Space Science Information Systems, AMerican Institute of Physics Conference Proceedings 283, AIP Press, NY, PP. 216-226. Michener, W.K., A.B. Miller, and R. Nottrott. 1990. Long-Term Ecological Research Network Core Data Set Catalog. Belle W. Baruch Institute for Marine Biology and Coastal Research, University of South Carolina, Columbia, SC. NRC. 1991. Solving the global change puzzle: A U.S. Strategy for Managing Data and Information. A report by the Committee on Geophysical Data, Commission on Geosciences, Environment, and Resources, National Research Council. National Academy Press, Washington, DC. Palmer, J.D. and A. Fields. 1994. Report on the Environmental Monitoring and Assessment Program, Surface Waters JAD Session, Surface Waters Facilities, Corvallis, OR, June 6-9, 1994. June 23, 1994. Report completed for the U.S. Environmental Protection Agency, Office of Research and Development, Environmental Monitoring and Assessment Program (EMAP), Washington, DC. Report prepared by George Mason University, Fairfax, VA. Shepanek, R. 1994. EMAP Information Management Strategic Plan: 1993-1997. EPA/620/R-94-012. U.S. Environmental Protection Agency, Office of Research and Development, Environmental Monitoring and Assessment Program (EMAP), Washington, DC. Strebel, D.E., B.W. Meeson, and A.K. Nelson. 1994a. Scientific information systems: A conceptual framework. In: Environmental Information Management and Analysis: Ecosystem to Global Scales. W.K. Michener, J.W. Brunt, and S.G. Stafford, eds. Taylor and Francis, Bristol, PA. Strebel, D.E., D.R. Landis, K.F. Huemmrich and B.W. Meeson. 1994b. Collected Data of The First ISLSCP Field Experiment, Volume 1: Surface Observations and Non-Image Data Sets. Published on CD- ROM by NASA. USEPA. 1993a. Summary of the Proof of Concept Joint Application Design (JAD) Session II. January 15, 1993. U.S. Environmental Protection Agency, Office of Research and Development, Environmental Monitoring and Assessment Program (EMAP), Washington, DC. USEPA. 1993b. System Design Specifications for the Proof of Concept (POC). February 26, 1993. U.S. Environmental Protection Agency, Office of Research and Development, Environmental Monitoring and Assessment Program (EMAP), Washington, DC. ------- Drait^0_A£riM_995i USEPA. 1994. EMAP Information Management Virtual Repository. Draft September 9, 1994. U.S. Environmental Protection Agency, Office of Research and Development, Environmental Monitoring and Assessment Program (EMAP), Washington, DC. ------- Drait^0_A£riM_995i APPENDIX DETAILED SPECIFICATIONS FOR DATA SET CATALOG CONTENTS ------- Drait^0_A£riM_995i This appendix is intended to be a field-by-field technical reference for those concerned with finalizing, formatting, and submitting catalog documents to the EMAP Information Management System. Note that, although all information is entered and stored in individual fields for entry into the data base management system, the information is designed to be provided to potential users in a document-like report format that is easy to read and understand (see Chapter 5). A.l DATA SET IDENTIFICATION This section of the catalog contains information used to identify the data set. Most of this information is taken directly from the data set directory. Title: Title Description: This field contains the title of the data set catalog entry. Recommendation: The title of the data set catalog entry, in most cases, will be similar to the title of the data set. The field should be a variable-length text field. Typically the title is no more than 160 characters. Cat Author: Catalog Author Description: The name or names of the author(s) of the catalog document are provided using this field. One name, last name first, should be entered into each instance of the field. The field may be repeated as many times as necessary. Note that those making significant revisions to an existing document should add their name to the author list as well as update the Catalog revision date. Recommendation: This should be a variable length character field. Cat Rev Date: Catalog revision date Description: The scientific documentation of a data set will change over time to reflect the results of new analyses and to incorporate information concerning additional data uses, limitations, and publications. The catalog revision date field is used to record when the catalog entry was last revised. Recommendation: This field is mandatory. Dates should be given in the format YYYY-MM-DD, where DD is the two-digits for the date, MM is the two digits ------- Drait^0_A£riM_995i signifying the month, and YYYY is the four-digit year. Leading zeroes are used in the entry if needed. Data Set Name: Data set name Description: This field contains a descriptive name for a data set. This field is also included in the directory. Recommendation: The format of this field is specified in the guidance provided for building directory entries. Task Group: Task Group Description: Name of the EMAP task group from which the data set originates. Recommendation: This field is mandatory. EMAP task groups are referenced using a unique two-digit code. Valid codes for each task group are given in Table A-l. Table A-l. Valid entries for EMAP Task Groups 01 Estuaries 02 Forests 03 Surface Waters 04 Agricultural Lands 05 Rangelands 06 Great Lakes 07 Landscape Ecology 08 Wetlands 09 Assessment 10 Design and Statistics 11 Information Management 12 Indicators 13 Integration 14 Landscape Characterization 15 Logistics and Methods 16 Stressors 17 EMAP Center ------- Drait^0_A£riM_995i Data Set Id: Data set identification code Description: Number assigned by information management personnel within an EMAP task group to identify a data set. This field is mandatory. The data set identification is a positive whole number. Each task group will ensure that no data set from the same task group is assigned the same number. The Data Set ID and the Task Group will be concatenated to determine a unique identifier for each data set; therefore, different task groups having the same data set identification number will not present a problem. Version: Version number for a data set Recommendation: This field is mandatory. The version number should be a positive whole number. Comment: Any change in a data set results in the creation of a new data set or an updated version of an old data set. All changes need to be documented and that documentation included as part of the data set catalog. Req Acknowl: Requested acknowledgement Description: Because the data set may be referenced in scientific publications, a suitable acknowledgement for its use should be suggested. Data sets collected by EMAP staff under programmatic auspices will have a general EMAP acknowledgement statement. Where a data set is the product of a specific principal investigator's work, and should be so referenced, the principal investigator should provide appropriate acknowledgement text. For example, it could be requested that a specific published paper describing the data set be referenced or that certain individuals who made exceptional contributions be acknowledged by investigators using the data set. Recommendation: This field should be a variable length character field. A.2 INVESTIGATOR INFORMATION This section of the data set catalog contains information identifying the individuals, or group of individuals, who produced the data set. This normally means principal investigators and their immediate associates. Contact people and procedures are handled in a separate section. Separate roles have been defined for investigators that have been responsible for various aspects of the development of a data set. These roles reflect that EMAP monitoring has been completed using multiple teams of contractor and federal staff and that ------- Drait^0_A£riM_995i often no one investigator is responsible for all activities related to a data set. The fields may be repeated as many times as is necessary to represent multiple investigators. This section of the catalog references investigator names only; additional information about each named individual is provided in the Personnel Information section of the catalog (Section 3.13) Princ Invest: Principal Investigator Description: The principal investigator is the scientist responsible for all aspects of the steps associated with the creation of the data set including sampling design, sample collection, sample processing, data analysis, and publication. That person may be the technical director of an EMAP Resource Group, a lead scientist from one of the USEPA laboratories, an academic scientist working under a cooperative agreement, or a contractor. Recommendation: The first name, middle initial, and last name of an individual should be given. Titles and other information should be omitted. Sam Coll Invest: Sample Collection Investigator Description: The individual who directly supervised the collection of samples should be named in this field. Recommendation: The first name, middle initial, and last name of an individual should be given. Titles and other information should be omitted. Sam Proc Invest: Sample Processing Investigator Description: The individual who directly supervised the processing of samples should be named in this field. Recommendation: The first name, middle initial, and last name of an individual should be given. Titles and other information should be omitted. Data Anal Invest: Data Analysis Investigator Description: The individual who directly supervised the analysis of data leading to the current data set should be named in this field. ------- Drait^0_A£riM_995i Recommendation: The first name, middle initial, and last name of an individual should be given. Titles and other information should be omitted. Add Invest: Additional Investigators: Description: Other individuals associated with the creation of this data set should be named in this field. Recommendation: The first name, middle initial, and last name of an individual should be given. Titles and other information should be omitted. It is not necessary to provide information for principal investigators to complete a data set catalog entry. If principal investigator information is missing, the name of the contact person referenced in the directory will be given. If no contact person is identified, the name of the data center originating the data set will be given. A.3 DATA SET ABSTRACT A brief summary of the data set, like the abstract of a paper, allows a potential user to browse without learning all of the details. The General Keyword field can be repeated as many times as necessary. These fields are used to provide an overview of the data set as well as to provide information that can be extracted into indexes and summaries of aggregates of data sets. Abstract: Abstract of the Data Set Description: The Abstract should be a paragraph or two that summarizes the main points expanded in the following sections. That is, the short story on what data was collected, how it can be used, how it has been used, and how good is it. Recommendation: This field is should be a variable length character field. Gen Keyword: Keywords for the Data Set Description: Any appropriate keywords should be listed in the Keyword field. Recommendation: This field is restricted to no more than 40 characters. It may be repeated as often as necessary. ------- Drait^0_A£riM_995i A.4 OBJECTIVES AND INTRODUCTION This material can usually be abstracted from research proposals or project plans and provides the general information concerning why the particular study was completed. The material presented is organized into several variable-length text fields. Information must be provided for each field to complete a data catalog entry. The Program Objective field will describe overall EMAP program objectives using standard text supplied by the Information Management Staff, and need not be completed by the catalog author. Prog Objectiv: Program Objective Description: The field presents the objective of the program under which the data set was collected. The objective of the program provides a general background for the specific data set. Many data sets will have the same program objective. Recommendation: This field is a variable length text field. No limit is specified, but typically one to three paragraphs is sufficient to define the general objectives of a program. Data Objectiv: Data Set Objective Description: The objective or purpose of collecting the data set is specified in this field. Recommendation: This field should be a variable length text field. No limit is specified, but typically one paragraph is sufficient to define the specific objectives of collecting a data set. Data Backgrd: Data Set Background Information Description: This field contains background information for the data set and helps the potential data user understand the rationale for collecting the data.that were to be met by the collection sets this data collection activity in a broader context of scientific motivation and interactions with other data available or to be collected. Recommendation: This field should be a variable length text field. No limit is specified, but typically several paragraphs are used to provide background information for a data set. ------- Drait^0_A£riM_995i Parameter Sum: Summary of data set parameters Description: This field contains a summary of the parameters in the data set. The field is not a duplication of the data set dictionary and should not contain a simple list of parameters included in the data set. Recommendation: This field should be a variable length text field. No limit is specified, but typically one or two paragraphs are used to summarize parameters in the data set. A.5 METHODS The methods employed to create the data set are summarized in this section of the catalog. More detailed descriptions of methods, including standard operating procedures (SOPs), are referenced in the data set catalog and are more appropriately stored in separately from the catalog. Descriptions of methods are organized in two subsections: data acquisition (including sampling) and data preparation (including sample processing). A.5.1 Data Acquisition Samp Objectiv: Sampling Objective Description: This field contains a brief summary of the objective of data collection or sampling. Recommendation: This field is a variable length text field. No limit is specified, but typically one or two sentences is sufficient to define the general objectives of sampling. Sampl Method: Sample Collection Methods Summary Description: This field provides a brief summary of the sampling method used. Recommendation: This field should be a variable length text field. No limit is specified, but typically one or two paragraphs is usually sufficient to summarize methods. ------- Drait^0_A£riM_995i Beg Sampl Date: Beginning Sampling Date Description: The earliest data acquisition date in the data set. Recommendation: This field should be a date field. End Sampl Date: Ending Sampling Date Description: The latest data acquisition date in the data set. Recommendation: This field should be a date field. Platform: Sampling Platform Description: This field contains information about the platform from which sampling was conducted. Platforms include boats, vehicles, satellites, etc. Recommendation: This field could be restricted to specific terms defined in a look-up table; however, it is more flexible if formatted as a variable length text field, restricted to 80 characters. Sample Equip: Sampling Equipment Description: A description of the sampling equipment or instrument used to collect the samples is given in this field. Recommendation: This field is a variable length text field. No limit is specified, but typically one or two sentences is sufficient to define the general objectives of sampling. Equip Manufac: Manufacturer of Sampling Equipment Description: This field contains the name of the manufacturer of the equipment specified in the field 'Sample Equip'. Recommendation: This field is a variable length text field restricted to no more than 80 characters. Key Variables: Key Variables Description: This field presents those variables measured directly with the sampling equipment or instrument listed above. The field may have a value of "none" if no data were directly generated as a result of the use of particular equipment. ------- Drait^0_A£riM_995i Recommendation: This field is a variable length text field having no length restrictions. Sam Meth Cal: Sampling Method Calibration Descriptions: Specific calibration procedures for sampling equipment or instrumentation is documented in this field. Recommendation: This field should be a variable length text field. Sam Qual Con: Sample Collection Quality Control Descriptions: Specific procedures used to ensure the consistent quality of samples is documented in this field. Recommendation: This field should be a variable length text field. Sam Meth Ref: Sample Collection Method Reference Description: This field contains the bibliographic reference to the complete description of the sample collection methods. Recommendation: This field should be a variable length text field. Note: The data set catalog should be linked to files containing the complete descriptions of methods and standard operating procedures. These text files may be extensive due to documentation of step-by-step procedures that are not normally provided in the data set catalog. Sam Meth Dev: Sample Collection Method Deviations Description: This field is used to document any known deviations from the methods referenced in the previous field. Departures from standard procedures need to be documented as completely as possible. These descriptions are useful to scientists interested in assessing the overall quality of the data. Further, these descriptions are necessary to evaluate the significance of any long-term changes identified from the integration of separate data sets collected in multiple years. Recommendation: This field should be a variable length text field. ------- Drait^0_A£riM_995i A.5.2 Data Preparation and Sample Processing Proc Objectiv: Data Preparation Objective Descriptions: This field contains a brief summary of the objective of data preparation and sample processing steps. Recommendation: This field should be a variable length text field. No limit is specified, but typically one or two sentences is sufficient to define the general objectives of sampling. Data Proc Sum: Data Processing Methods Summary Descriptions: This field provides a brief summary of the methods used to process samples. Recommendation: This field should be a variable length text field. No limit is specified, but typically one or two paragraphs is usually sufficient to describe processing methods. Sampl Proc Calib: Sampling Processing Method Calibration Description: Specific procedures used to calibrate instruments or gear used to process samples is documented in this field. Recommendation: This field should be a variable length text field. Proc Qual Con: Sample Processing Quality Control Description: Specific procedures used to ensure the consistent quality of samples is documented in this field. Recommendation: This field should be a variable length text field. Samp Proc Ref: Sample Processing Method Reference Description: This field contains the bibliographic reference to the complete description of the data processing and methods. Recommendation: This field should be a variable length text field. Note: The data set catalog should be linked to files containing the complete descriptions of methods and standard operating procedures. These text files may be extensive due to documentation of step-by-step procedures that are not ------- Drait^0_A£riM_995i normally provided in the data set catalog. Sampl Proc Dev: Sample Processing Method Deviations Description: This field is used to document any known deviations from the methods referenced in the previous field. Departures from standard procedures need to be documented as completely as possible. These descriptions will be useful to scientists interested in assessment the overall quality of the data and are necessary to assess long-term changes from the integration of data sets collected in multiple years. This field should be a variable length text field. A.6 Data Manipulations This section of the data set catalog provides documentation for any manipulations of the data subsequent to data acquisition and data preparation. Data manipulations include: • conversions to different units (mg/1 to mg/kg for example) • transformation of continuous data values to discrete values (for example, conversion of benthic index values to nondegraded and degraded categories) • deviation of values from existing data (for example, calculation of water density from temperature and salinity values) Documentation of these data manipulations should be sufficiently detailed so that future users of the data will fully understand what was done. The fields used in this section may be repeated as many times as is necessary to document the data manipulations used to produce the final data set. Deriv Value Name: Name of New or Modified Value Description: This field contains the name of any new or modified value which was created as a result of data manipulations. Recommendation: The length of this field should be restricted to 80 characters. Data Manip Desc: Data Manipulation Description Description: This field contains an in-depth description of any data manipulations used to create or modify the contents of the data set. This field is mandatory and should include complete descriptions of algorithms or reference information (e.g. ------- Drait^0_A£riM_995i look up tables) applied. For complex formulas, remember that digital transfer between machines and software works best with simple ASCII text. It may be necessary to use a "pseudo-code" approach to define your algorithm in a stepwise fashion. Recommendation: This field should be a variable length text field. Data Manip Exmpl: Data Manipulation Examples Description: This field contains a few examples of the input data and values derived from them using the algorithm described. Intermediate calculations need not be shown, but the examples should be sufficient for someone who works through or programs the algorithm to determine that they achieve the correct results. Recommendation: This field should be a variable length text field. Man Code File: Data Manipulation Computer Code File Description: This field contains the name and location of the file containing the original computer code used to complete data manipulations. Providing an entry in this field is optional. Recommendation: The length of this field should be restricted to 80 characters. Man Code Lang: Data Manipulation Computer Code Language Description: This field documents the language of the computer code used to complete data manipulations (for example, SAS, FORTRAN, BASIC, C, etc.). This field need should be completed if either the preceding or following items are used. Recommendation: The length of this field should be restricted to 40 characters. Manip Code: Data Manipulation Computer Code Description: This field can be used to provide the specific segment of computer code that performs the calculations or derivations described in the Data Manip field. Completing this field is optional, but is encouraged if complex algorithms have been used. Recommendation: This field should be a variable length text field. ------- Drait^0_A£riM_995i A.7 Data Description The data description contents and accuracy are primarily a function of the investigator submitting the data set. However, the description provides the basis for organizing the data itself in the EMAP relational data base management system (DBMS), and will be entered into the DBMS data dictionary and used heavily by the information management staff. This usage will generate feedback to the investigator and to this catalog document through efforts to standardize like parameter names and definitions across data sets, checks of maximum and minimum values, reports generating data record examples, and identification of related data sets. A.7.1 Description of Parameters Parameters in the data set are listed and described in this section of the data catalog. This information is abstracted from the data dictionary for the EMAP monitoring data base and may be presented in tabular form. The following information should be provided for each parameter in the data set: Param Name: Parameter Name Description: The name given to the quantity in the data set (e.g. a column name in the ORACLE data base and/or the ASCII text file. Recommendation: This field should be a character field of length 40. SAS Param Name: SAS Parameter Name Description: This is an optional field to be used if the data set is distributed in a SAS data file format. It contains the shortened form (i.e. 8 character limit) of the parameter name used in SAS. Recommendation: This field should be a character field of length 8. Param Descrip: Parameter label or description Description: The full English name of the quantity, with adjectives or modifiers as required to be scientifically precise. Recommendation: This field should be a character field of length 80. Units: Units of measurement Description: The physical units of measure used for recording the value in the data set. Indexes and other unitless numerical values should be indicated with the ------- Drait^0_A£riM_995i term "Unitless". Qualitative values or quantities which are not associated with a measurement scale should be indicated "N/A" Recommendation: This field should be a variable length text field. Data Type: Parameter data type Description: This field describes the fundamental digital storage type of the parameter, primarily for use with programs that read or interpret the data. For data in the relational data base, this will normally be numeric, character, or date. When an ASCII data file is being described, it may be useful to also specify numeric fields as integer, real, or complex. Data encoded in application specific formats should be described as binary. Recommendation: This field should be a variable length text field. Precision: Precision to which values are reported Description: The precision of a value is the smallest value increment that is reported by the instrument or procedure by which the value is determined. For example, the last digit on an electronic display of an instrument represents the instrument precision. If the values in the data set are recorded or calculated at less than the instrument precision, then the actual smallest value increment in the data set should be entered in this field. For example, even though a GPS unit reports location to the nearest hundredth of a meter, the investigator decides it is most appropriate to report the data only to the nearest meter. Then the precision of the coordinate value would be 1 meter. Recommendation: This field should be real numeric field, and should use the same units as the parameter value itself. Accuracy: Accuracy of the data values Description: The accuracy of a value is an indication of the total measurement error (random and/or systematic) associated with the value. This can be as small as +/- 1/2 of the instrument precision, but in most cases is significantly larger. The accuracy is normally determined by observing the deviations in calibration tests or repeated measurements of the same sample. Calculated values have accuracies determined from the individual accuracies by the standard statistical rules for propagation of error. The absolute value of the measured or calculated ------- Drait^0_A£riM_995i accuracy for the values of this parameter in the data set should be entered in this field. Recommendation: This field should be a real numeric field and should use the same units as the parameter value itself. Act Min Value: Minimum Value in Data Set Description: This field is used to record the actual minimum value recorded in the data set. It is used for numeric fields only. Recommendation: This field should be a numeric field. Act Max Value: Maximum Value in Data Set Description: This field is used to record the actual maximum value recorded in the data set. It is used for numeric fields only. Recommendation: This field should be a numeric field. A.7.2 Data Record Example A display of a limited number of records or observations in the data set assists the data user in understanding the structure and composition of the data set. This section of the data set catalog presents in tabular form a display of several example records from the data set. For simple data sets, the first couple of records may suffice. If there is a complex structure to the data (widely varying parameters, or parameter dependent measurements, for example), it may be useful to select a few records that illustrate the range of values or the patterns of missing values. In no case should more than 20 records be included here. The records to be displayed can be provided by the investigator or can be generated from the EMAP relational data base after the data set is loaded into the data base. If the data set is complex or there are specific examples that should be shown, the investigator should elect to provide the material at the time the catalog document is written. The Example Data Values line is repeated once for each data set record included. Special arrangements should be made if this field is to be used in the description of binary data sets (for example, SAS files, GIS coverages, or a satellite images). The field serves two purposes: to give the user a preview of the scientific data and to give the programmer who must access the data formatting and verification information. For the first purpose, if numbers are relevant, they can be extracted, and a "pseudo record" constructed. Similarly, a thumbnail version of a displayed GIS or image data set could be attached to the catalog document. ------- Drait^0_A£riM_995i Actual Byte values can be included to meet the second purpose, if necessary - e.g. A TIFF image header and the corresponding values in the data file could be listed in the two fields. Header Line: Column Names for Example Records Description: This field gives the names of each column (in order) shown in the sample data records. Recommendation: This field should be a variable length text field. Exmpl Data Values: Example Data Records Description: The example data values from a data set record are recorded in this field. They should be spaced so as to line up with the column names in the header line, when displayed in a fixed (non proportional) font. Recommendation: This field should be a variable length text field. A.7.3 Related Data Sets The names and identification codes of data sets containing similar or related data are referenced in this section of the data catalog. The references should be entered exactly to make possible relational links (via the RDBMS) to the documentation for those data sets. These fields can be repeated to accommodate multiple data set references. Related DS Name: Related Data Set Name Description: The name of a data set containing similar or related data is recorded in this field. The contents of this field should match the contents of the "Data Set Name" field of the referenced data set. Recommendation: The format of this field is the same as the Data Set Name field, and is specified in the guidance provided for building directory entries. Related DS ID: Related Data Set Identification Code Description: The identification code for a data set containing similar or related data is recorded in this field. The contents of this field should match the contents of the "Data Set Id" field of the referenced data set. ------- Drait^0_A£riM_995i Recommendation: The format of this field is the same as the Data Set Id field, and is specified in the guidance provided for building directory entries. A.8 GEOGRAPHIC AND SPATIAL INFORMATION This section of the data set catalog will contain information about the spatial coverage of the data set. Information specific to the documentation of spatial data sets will also be provided in this section of the catalog. The documentation of geospatial data sets will be based upon the spatial metadata standards being developed by the Federal Geographic Data Committee (FGDC). Note, however, that the FGDC standards address only a subset of complete scientific data set documentation, and do so at a level of detail appropriate for a granule of data (i.e. a single GIS coverage or a single satellite image). The catalog contains descriptive information common to all data granules in a data set. Granule specific details in the FGDC standard are not appropriately covered in the catalog, but may be included in an inventory specifically designed for an individual data set. The FGDC standards are intended to provide a common set of terminology and definitions; they explicitly do not specify (i) the means to organize information in a computer system, (ii) the means to organize information in a data transfer, or (iii) the means by which the information is transmitted , communicated, or presented to the user. Note also that the standards specify acceptable output formats for four types of values: (1) calendar dates (YYYYMMDD and variants thereof), (2) time of day (HHMMSSSS, either local, local with UT differential, or UT), (3) latitude and longitude (decimal degrees), (4) network addresses and file names (URL internet convention). The information need not be submitted or stored in these formats, although it may be convenient to do so, as long as the necessary information is present. Data management systems such as ORACLE include sophisticated date, time, numeric, and lexical conversion and manipulation functions that allow essentially arbitrary report formatting to be determined by the user. Most of the relevant details of the sections of the FGDC metadata, i.e. data set identification, quality, parameters, and distribution information, will be addressed elsewhere in the catalog in the broader context of documenting all data sets uniformly. In the geographic coverage section, specific spatial data organization and reference information will be supplied if applicable to the data set. Min Longitude: Minimum Longitude Description: The bounding coordinates (West, East, North, South) give the limits of coverage of a data set expressed by latitude and longitude values. Under most circumstances the minimum longitude value is the Western Bounding Coordinate. For data sets that include a complete band of latitude around the Earth, the Western Bounding Coordinate shall be assigned the value -180.0, and the Eastern Bounding Coordinate shall be assigned the value 180.0. Latitude and longitude are expressed in decimal degrees. ------- Drait^0_A£riM_995i Recommendation: This should be a real numeric field. Max Longitude: Maximum Longitude Description: The bounding coordinates (West, East, North, South) give the limits of coverage of a data set expressed by latitude and longitude values. Under most circumstances the maximum longitude value is the Eastern Bounding Coordinate. For data sets that include a complete band of latitude around the Earth, the Western Bounding Coordinate shall be assigned the value -180.0, and the Eastern Bounding Coordinate shall be assigned the value 180.0. Latitude and longitude are expressed in decimal degrees. Recommendation: This should be a real numeric field. Max Latitude: Maximum Latitude Description: The bounding coordinates (West, East, North, South) give the limits of coverage of a data set expressed by latitude and longitude values. Under most circumstances, the maximum latitude will be the North Bounding Coordinate. Latitude and longitude are expressed in decimal degrees. Recommendation: This should be a real numeric field. Min Latitude: Minimum Latitude Description: The bounding coordinates (West, East, North, South) give the limits of coverage of a data set expressed by latitude and longitude values. Under most circumstances the minimum latitude will be the South Bounding Coordinate. Latitude and longitude are expressed in decimal degrees. Recommendation: This should be a real numeric field. Geo Keyword: Name of the area or region Description: This should be a searchable indirect spatial reference. An indirect spatial reference describes a location without using coordinates, usually by a feature such as apolitical entity (county, state), a road, or a geological province. The reference should uniquely identify the spatial extent of the data set, e.g. by using a name or a code that identifies the feature (such as a county FIPS code or a HUC code). To include multiple overlapping names (e.g. an EPA Region and ------- Drait^0_A£riM_995i the states within it), the Geo Keyword field can be repeated as many times as necessary. Recommendation: This field should be a variable length character field. Spatial Ref Meth: Direct Spatial Reference Method Description: This field provides information on the system of objects used to represent space in the data set. Valid entries are limited to the values "Point", "Vector", or "Raster". Recommendation: This should be a fixed character field of length 6. Horiz Coord Sys: Horizontal Coordinate System Used Description: This field names the reference frame or system from which linear or angular quantities are measured and assigned to the position that a point occupies. In general this will be "Geographic" (for latitude/longitude references) or the name of a planar map projection (such as Albers Equal Area; Universal Transverse Mercator; or Space Oblique Mercator). Where a special or unique projection is used (i.e. where the relationship between the coordinates and geographic coordinates is not known), use the term "Local System". Recommendation: This should be a variable length character field Horiz Resolution: Resolution of Horizontal Coordinates Description: The minimum distance between two adjacent coordinate values or grid cells. In raster data sets, these values normally are the dimensions of the pixel or grid cell. In vector data sets, the resolution is the shortest line that is encoded in the data set. Recommendation: This should be a real numeric field Horiz Coord Units: Units for Horizontal Coordinates Description: The units of measure in which the coordinates are reported. Recommendation: This should be a variable length character field. ------- Drait^0_A£riM_995i Vertical Coord Sys: Vertical Coordinate System Description: This field names the reference frame or system from which vertical distances (altitudes or depths) are measured. This will normally consist of an Altitude Datum Name (e.g. North American Vertical Datum of 1988) or a Depth Datum Name (e.g. Local Surface, Mean Sea Level) Recommendation: This field should be a variable length character field. Vertical Resolution: Resolution of Vertical Coordinates Description: The minimum vertical distance between two adjacent altitude or depth values. Recommendation: This should be a real numeric field Vertical Coord Units: Units for Vertical Coordinates Description: The units of measure in which the vertical coordinates are reported. Recommendation: This should be a variable length character field. A.9 QUALITY CONTROL/QUALITY ASSURANCE Quality control and quality assurance information is used to understand the limits of the data. This information includes: methods used to measure and ensure data quality, measurement quality objectives, and summaries of data quality parameters. Specific data collected to assess data quality may be included in this section of the data set catalog, or may be included in a separate data set that is referenced in the data set catalog. Meas Qual Obj: Measurement Quality Objectives Description: This field lists any specific a priori objectives established for measurement or sampling data. Recommendation: This should be a variable length character field. ------- Drait^0_A£riM_995i QA/QC Meth: Quality Assurance/Control Methods Description: This field should hold a brief description of the methods used to measure and ensure data quality. Recommendation: This should be a variable length character field. Act Meas Quality: Actual Measurement Quality Description: The results of any assessments of measurement quality, including the measurement error for parameters and variables and a summary of data quality parameters. For spatial data, the positional accuracy should be noted. Recommendation: This should be a variable length character field. Sources of Error: Sources of Error Description: Any uncontrolled factors which may have impacted the quality of the measurements should be described in this field. In particular, sources of error that cause the actual measurement quality to vary greatly from the measurement quality objective for a parameter should be noted. Also include systematic measurement errors that may not be reflected in quality assessments, but may affect the usefulness of the data in analysis. Recommendation: This should be a variable length character field. Known Data Prob: Known Problems with the Data Description: This field should contain a discussion of problems that have been documented for the data set. Recommendation: This should be a variable length character field. Conf Level Stmnt: Confidence Level/Accuracy Judgement Description: Subjective statement of Investigator's confidence in the data. Recommendation: This should be a variable length character field. ------- Drait^0_A£riM_995i Allow Min Value: Allowable Minimum Values Description: List the parameter name and the allowable (physically, biological, mathematical limits) maximum value. Repeat the field as many times as required to describe the parameters in the data set. Recommendation: This should be a variable length character field. Allow Max Value: Allowable Maximum Values Description: List the parameter name and the allowable (physically, biological, mathematical limits) minimum value. Repeat the field as many times as required to describe the parameters in the data set Recommendation: This should be a variable length character field. QA Ref Data: QA Reference Data Description: May include actual reference data or the name of a file or document that has the appropriate reference data. Recommendation: This should be a variable length character field. A.10 DATA ACCESS In an integrated environment, metadata for a data set will be directly linked to the data. Data access will be direct and the user will require no additional information. However, there will be instances when direct links from the metadata to the data will not be possible. In those instance where the metadata are not linked to data, it is necessary to provide information regarding data access. This section is currently designed to provide users with information on how to access data. The section may be expanded to include information concerning data archiving; however, data archiving is primarily a data management function that is of little interest to the user. It is recommended that data archiving information be stored separately from the data set catalog. Data Access: Data Access Procedures Description: Procedures for accessing data is given in this field. Procedures should provide general information about the different ways users can access data including telephone contact, anonymous FTP WWW, Gopher, WAIS, dial-in lines, etc. Recommendation: This field should be a variable length text field. ------- Drait^0_A£riM_995i Data Access Restrict: Data Access Restrictions Description: If there are restrictions on accessing or using the data, they should be explained clearly in this field. Recommendation: This field should be a variable length text field. Data Access Contact: Data Access Contact person Description: The primary person to contact to obtain the data set or information about the data set. For a data set that is located in a Task Group data base, this will normally be the data librarian. For data sets available through the EMAP IMS, the IMS staff will provide the name of the appropriate contact. The name and address of the person or organization to contact should be included in this field using the fields that provide information for the principal investigator as a guide. Recommendation: This field should be a variable length text field. Data Set Format: Data Set Format Description: Give a description of the format of the file(s) containing the data set, e.g. ASCII, SAS, ARC/Info Export. Repeat this field once for every format type that is available. Recommendation: This field should be a variable length text field. FTP Infor: Information Concerning Anonymous FTP Description: Information concerning the access of EMAP data sets using the Internet and anonymous file transfer protocols should be provided as part of the data access documentation. This information should include the URL (Universal Resource Locator), name of the node on the network, login and password information, and the name of the directories containing the data of interest. This information should be provided for those data sets that contain unrestricted information only. Recommendation: This field should be a variable length text field. Gopher Infor: Information Concerning Gopher Description: Information concerning the access of EMAP data sets using Gopher services via the Internet should be provided as part of the data access ------- Drait^0_A£riM_995i documentation. This information should include the URL (Universal Resource Locator), name of the node on the network, login and password information, and the name of the directories containing the data of interest. This information should be provided for those data sets that contain unrestricted information only. Recommendation: This field should be a variable length text field. WWW Infor: Information Concerning World Wide Web Description: Information concerning the access of EMAP data sets using WWW browsers via the Internet should be provided as part of the data access documentation. This information should include the URL (Universal Resource Locator), name of the node on the network, login and password information, and the name of the directories containing the data of interest. This information should be provided for those data sets that contain unrestricted information only. Recommendation: This field should be a variable length text field. EMAP CD-ROM: EMAP CD-ROM Containing the Data set Description: CD-ROM is becoming a widely used method to distribute data and metadata. It is likely that EMAP will also distribute data using CD-ROM technology sometime in the near future. This field documents the name of the CD-ROM that contains the data set. The field may be repeated in where the data set is saved on more than one CD-ROM. Recommendation: This field should be a variable length text field. A. 11 REFERENCES This section of the data set catalog provides a list of any published documentation relevant to the data collected. Documentation may include manufacturer's instruction manuals, government technical manuals, user's guides, etc. Also referenced should be any technical reports and scientific publications concerning the methods, instruments, or data described in this document. Publications by the Principal Investigator or investigating group that would help a reader understand or analyze the data are particularly important. The format of the bibliographic references is taken from the EMAP bibliographic data base specification. The section is broken into two subsections to accommodate references that will be maintained in the EMAP bibliographic data base (detailed reference format) and those background and general references which are primarily of use to the reader and will not be tracked (brief reference format) ------- Drait^0_A£riM_995i A.ll.l EMAP References References given in this section are those published or used as part of EMAP and which should be tracked in the EMAP bibliographic data base. RefType: Reference Type Description: This field indicates the type of documentation being referenced. Valid values include Journal Article, Workshop Proceedings, Book, Report, Film, Video Tape, Audio Tape, CD-ROM. Recommendation: This should be a variable length character field. Ref Author: Reference Author Description: Names of the authors are given in this field. It is repeated (in conjunction with the following field) as many times as necessary to list all authors. Recommendation: This should be a variable length character field. Ref Authors Affil: Reference Author's Affiliation Description: The professional affiliation and address of each author is provided in this field. It is paired with the previous field, and is repeated once for each author. Recommendation: This should be a variable length character field. Ref Title: Title of Reference Description: The full title of the referenced material is given in this field. Recommendation: This should be a variable length character field. Volume Title: Journal or Volume Title Description: If the reference is an article in a journal, a proceedings volume, or other compendium, the title of the full work is given in this field. Recommendation: This should be a variable length character field. ------- Drait^0_A£riM_995i Volume Editor: Journal or Volume Editor Description: If the reference is an article in a compendium or other edited work, the name of the overall editor is entered in this field. Recommendation: This should be a variable length character field. Page Ref: Page and Volume Reference Description: The standard volume and page citation information should be entered into this field. Recommendation: This should be a variable length character field. Date of Ref: Date the Reference was Published Description: This field contains the publication date for the reference. Recommendation: This should be a variable length character field. Place of Pub: Location of Publishing Organization Description: The location (city, state, country) of the organization publishing the reference is listed in this field. Recommendation: This should be a variable length character field. Publisher: Name of Publishing Organization Description: The business name of the publishing organization is entered in this field. Recommendation: This should be a variable length character field. Ref Other ID: Reference Report Number or Other ID Description: For special reports, technical memoranda, or other material that carries an internal organizational identification code or report number, it should be recorded in this field. Recommendation: This should be a variable length character field. ------- Drait^0_A£riM_995i Procite Rec Num: Procite Record Number for the Reference Description: If the reference has a Procite record number, it should be entered in this field. Recommendation: This field should be formatted to match the Procite record number format. A.11.2 Background References The references listed in this section are materials in the open literature which may support the collection, use, and interpretation of the data set, but are not directly related to or products of EMAP. As such they need not be tracked in the EMAP bibliographic data base, but should be captured as part of the documentation. A single text field is provided, but it should contain all of the relevant information that would otherwise have been listed in the previous section. Supt Ref: Supporting Reference Description: This field contains the reference information as described above for a single supporting reference relevant to the data set. It may be repeated as many times as required. Recommendation: This field should be a variable length text field. A.12 GLOSSARY AND TABLE OF ACRONYMS The detailed documentation for each data set is likely to contain terms and acronyms that are unfamiliar to some potential users of the data. These acronyms will be defined the first time they are used; however, due to the length of the documentation, a separate table of acronyms is suggested to assist the user. Acronym: Acronym used in the Detailed Documentation Description: Acronyms used in the text of the detailed documentation should be listed using this field. The field should be repeated once for each acronym, in conjunction with the following field. Recommendation: This field should be a variable length text field. ------- Drait^0_A£riM_995i Acronym Def: Definition of Acronym Description: The full expanded version of each acronym should be given in this field. It is paired with the preceding field. Recommendation: This field should be a variable length text field. A.13 PERSONNEL INFORMATION This section of the data set catalog contains information identifying the individuals who are associated with the data set and named in the data set catalog. This normally includes principal investigators, co- investigators, catalog authors and contributors, data librarians, and other contact people. The fields used to identify data set personnel are similar to those used in the data set contact fields of the data set directory and may be linked to those fields. These fields may be repeated as many times as is necessary to identify people. The personnel named in the catalog entry and described in this section should be identified by their role with respect to this data set using the Role field. For example, if the person is named as Cat Author in section 1, then Role in this section would be Catalog Author. Similarly, the role is used to identify the Principle Investigator, the sub-investigators, and the contact people. There may be multiple individuals listed for a role or multiple roles for an individual. In implementation, these fields may be used to reconstruct the name fields in the primary sections if desired. That is, the Role, Last Name, First Name, and Middle Initial fields may be selected and concatenated to form the Cat Author, Principal Investigator Contact Person, etc., fields. However, we recommend that these fields be written and maintained separately within the catalog section of the data base, because the role field of the contacts data objects will be frequently updated and easily corrupted. Once lost in a blanket update of the contacts data object, catalog specific information may not be recoverable, especially if the catalog entry is not used for QA'd for an appreciable period after the update. Title: Formal title Description: The formal title for investigators is given in this field (i.e., Mr., Mrs., Miss, Dr., or Ms). Recommendation: This field is restricted to no more than 5 characters. Valid entries are given in Table A-2. Table A-2. Valid entries for contact title Dr Miss Mr Miss Prof ------- Drait^0_A£riM_995i LstName: Last name Recommendation: This field is restricted to no more than 30 characters. FrstName: First name Recommendation: This field is restricted to no more than 15 characters. Mid Init: Middle Initial Recommendation: This field is restricted to no more than 1 character. Role: Role described in the Catalog Description: The role is the name of the field in catalog sections 1-12 in which the person is listed, e.g., Catalog Author. Recommendation: This is a variable length character field. Valid entries role are provided in Table A-3. Table A-3. Valid entries for role Director, EMAP Technical Director Technical Coordinator Task Group Information Manager Data Center Information Manager Data Base Administrator Data Librarian Regional Environmental Services Division Director Principal Investigator Sample Collection Investigator Sample Processing Investigator Data Analysis Investigator Additional Investigator Catalog Author Reference Author Quality Assurance Officer Chief Scientist ------- Drait^0_A£riM_995i Address 1 Address2 Address3 Address4 Line 1 of address Line 2 of address Line 3 of address Line 4 of address Description: Organizational names, street addresses, rural route codes, or post office box numbers may be specified in four address fields (Address 1 through Address4). Recommendation: The four address fields are restricted to no more than 40 characters each. City: City Recommendation: This field is restricted to no more than 30 characters. State: State Recommendation: This field is restricted to no more than 2 characters. The two- letter state abbreviation is used to identify each state. Zip: Zip code Recommendation: This field is restricted to 10 characters to identify the zip code. Country: Country Recommendation: This field is restricted to no more than 40 characters. Voice Phone: Voice phone number Recommendation: Phone numbers should include area codes. Up to 18 characters can be used. Phone extension numbers, if applicable, can be added at the end of the phone numbers as 'XI234'. FAX Phone: FAX phone number Recommendation: Phone numbers should include area codes. Up to 18 characters can be used. Phone extension numbers, if applicable, can be added at the end of the phone numbers as 'XI234'. ------- Fields specifying the electronic mail addresses may be repeated as many times as needed to reflect multiple electronic mail addresses. Email addresses should be given for both internal (EPA) and external (Internet) networks to be accessible to the widest range of potential users. Email Address: Email address Recommendation: This field is restricted to no more than 80 characters. Email Network: Email network Recommendation: This field is restricted to no more than 20 characters. Valid entries for this field are given in Table A-4. Add EM Info: Additional Email Information Description: Any additional information concerning electronic mail addresses may be given in this field. For example, this field may be used to describe which Email network is preferred by a particular person or organization. Recommendation: This field is restricted to no more than 80 characters. ------- |