United States Environmental Protection Agency
Office of Water (WH-556F)
EPA 503/6-91/001
November 1990

National Estuary Program
Tier I Data Management

FINAL REPORT ON NATIONAL ESTUARY PROGRAM TIER 1 DATA MANAGEMENT SYSTEMS SUMMARY

U.S. Environmental Protection Agency
Office of Water
Office of Marine and Estuarine Protection
October 19, 1990

TABLE OF CONTENTS

1.0 INTRODUCTION
  1.1 DIMS OBJECTIVES AND DESIGN ALTERNATIVES
  1.2 EPA'S NATIONAL SYSTEMS
  1.3 BASIS FOR DIMS PLANS
2.0 METHODS
3.0 DIMS DESCRIPTIONS
  3.1 CHESAPEAKE BAY PROGRAM
    3.1.1 Individual Roles
    3.1.2 Major Features
    3.1.3 DIMS Users
    3.1.4 Tools or Products Available
    3.1.5 QA/QC Techniques
    3.1.6 Estimated Costs
    3.1.7 System Development Milestones/Schedule
    3.1.8 Recommendations
  3.2 PUGET SOUND ESTUARY PROGRAM
    3.2.1 Individual Roles
    3.2.2 Major Features
    3.2.3 DIMS Users
    3.2.4 Tools or Products Available
    3.2.5 QA/QC Techniques
    3.2.6 Estimated Costs
    3.2.7 System Development Milestones/Schedule
    3.2.8 Recommendations
  3.3 ALBEMARLE-PAMLICO ESTUARINE STUDY
    3.3.1 Individual Roles
    3.3.2 Major Features
    3.3.3 DIMS Users
    3.3.4 Tools or Products Available
    3.3.5 QA/QC Techniques
    3.3.6 Estimated Costs
    3.3.7 System Development Milestones/Schedule
    3.3.8 Recommendations
  3.4 NARRAGANSETT BAY PROJECT
    3.4.1 Individual Roles
    3.4.2 Major Features
    3.4.3 DIMS Users
    3.4.4 Tools or Products Available
    3.4.5 QA/QC Techniques
    3.4.6 Estimated Costs
    3.4.7 System Development Milestones/Schedule
    3.4.8 Recommendations
  3.5 LONG ISLAND STUDY
  3.6 BUZZARDS BAY PROJECT
    3.6.1 Individual Roles
    3.6.2 Major Features
    3.6.3 DIMS Users
    3.6.4 Tools or Products Available
    3.6.5 QA/QC Techniques
    3.6.6 Estimated Costs
    3.6.7 System Development Milestones/Schedule
  3.7 SAN FRANCISCO ESTUARY PROGRAM
    3.7.1 Individual Roles
    3.7.2 Major Features
    3.7.3 DIMS Users
    3.7.4 Tools or Products Available
    3.7.5 QA/QC Techniques
    3.7.6 Estimated Costs
    3.7.7 System Development Milestones/Schedule
    3.7.8 Recommendations
4.0 SUMMARY

LIST OF TABLES

TABLE 1. MATRIX SHOWING SYSTEM FEATURES COMMON TO THE SIX TIER 1 ESTUARY PROGRAMS AND CHESAPEAKE BAY PROGRAM
TABLE 2. MATRIX SHOWING USER ACCESS FOR THE SIX TIER 1 ESTUARY PROGRAMS AND CHESAPEAKE BAY PROGRAM

1.0 INTRODUCTION

Under the Clean Water Act, the Environmental Protection Agency (EPA) is accountable for environmental data collected as part of the National Estuary Program (NEP). Programs within the NEP are responsible for managing the data relevant to their missions. To manage these data, NEPs must adopt a data management strategy and implement data/information management systems (DIMS). DIMS can include local computing resources as well as EPA's national computer systems.

1.1 DIMS OBJECTIVES AND DESIGN ALTERNATIVES

DIMS often serve multiple objectives. One common objective is to provide a centralized source of data and/or information that is useful to a variety of analysts and managers. A second objective of an NEP DIMS is to archive the results of new studies conducted under the NEP. A third is to make data or information available to the general public. A fourth possible objective is to provide online functions that help users analyze data. Each of these objectives can lead to a different system implementation.

In certain situations, a "centralized" DIMS refers to a nationwide system available to a broad range of potential users. A centralized DIMS can also denote a system that serves an array of users within one region, such as the community of scientists involved in an estuary program. During the early stages of the NEP, Tier I programs received assistance in computerizing their data in a centralized DIMS.
At that time, data sets were loaded into individual files stored as SAS data sets at the National Computer Center (NCC). Although these data were centrally located, they were intended to be used primarily by the contributing estuary programs. In this original centralized form (i.e., in SAS on the NCC), the data sets required knowledge of SAS to retrieve and/or examine the data. The prerequisite of SAS familiarity was too demanding for many users, and it prompted two kinds of DIMS development. In one scenario, data sets could be incorporated into EPA's ODES, which offered online functions for basic data analysis without requiring significant knowledge of SAS. In the other scenario, some Tier I programs chose to develop custom systems aimed at the particular needs of their respective programs.

DIMS objectives other than centralized access are equally important and influence how DIMS are implemented. A top priority should be ensuring that the DIMS helps scientists and managers characterize the estuary. Likewise, DIMS are intended to enable managers to analyze trends that show the effectiveness of the Comprehensive Conservation and Management Plans (CCMPs). However, a DIMS cannot be expected to produce the analyses itself. It is critical for NEP data managers to distinguish between two data management functions: storage and analysis. Most DIMS are designed to store raw information, and characterization and CCMP efforts can be served by storing and retrieving information with a DIMS. Serving characterization or CCMP efforts with data analysis functions, by contrast, requires fairly complex system designs.

System designs for data storage and/or analysis have evolved during the course of the NEP. Some complex systems, such as the Chesapeake Bay systems, have been built to serve regional needs. Other regional systems have been designed primarily to store and disseminate information.
Data dissemination allows users to analyze the data with their own facilities. In many cases, these regional systems were developed because the SAS data sets on the NCC were not being used by NEP scientists or managers. Regional DIMS must also address needs that extend beyond the immediate set of users. During 1990, EPA adopted an NEP Data Management Policy intended to serve both regional and national goals for the use of data. This policy addresses how NEP data can be managed regionally while also making use of EPA's centralized data systems. This report briefly reviews how the national and regional DIMS developed during the Tier I programs.

1.2 EPA'S NATIONAL SYSTEMS

At the beginning of the Tier I programs, there were many national data systems, including EPA's STORET, BIOS, and ODES, as well as NOAA's NODC. At that time, none of these systems had been adapted specifically to the NEP. Each stored data relevant to the NEP, but the systems were dissimilar: STORET focused primarily on contaminant levels, BIOS generically stored biological measurements, and NODC was intended to store oceanographic data. None of these three systems had much analytical capability. ODES, which employed formats modeled after those of NODC, was originally intended to store and analyze data from 301(h) dischargers.

EPA has supported the development and initial use of the Ocean Data Evaluation System (ODES) as a national data management system. ODES provides EPA with accountability for the quality of data sets, as well as for data storage, access, user support, system maintenance, and general analytical tools. Recently, ODES functions have expanded to include data dissemination, incorporation of data from other EPA databases, portability to Geographic Information Systems, and user-friendly data entry modules. Extensive manuals document the data formats and utilities.
Successful implementation of ODES depends on continued EPA commitment to this centralized approach to data management and on effective education of prospective users, such as the NEP Management Conferences.

1.3 BASIS FOR DIMS PLANS

Each NEP Management Conference should implement the most effective, cost-efficient, and timely means of handling its data and information. Proper data/information handling is essential for estuarywide status and trends reports, for characterization efforts, and for executing the Comprehensive Conservation and Management Plan (CCMP). Each Management Conference's method of handling data/information should adhere to, and benefit from, the NEP Data Management Policy established by EPA's Office of Marine and Estuarine Protection (OMEP). To achieve this objective, NEP Management Conferences should consider the variety of user needs, technical requirements, and alternatives for implementing DIMS.

This report describes the DIMS used by Tier 1 estuary programs. The DIMS descriptions cover the computing resources used, as well as factors essential to data management such as staffing and Quality Assurance/Quality Control (QA/QC). It is hoped that Tier 2 estuary programs can adopt some of the successful ingredients of the Tier 1 systems.

2.0 METHODS

Representatives from the six original estuaries designated under the NEP (Puget Sound, WA; San Francisco Bay, CA; Albemarle-Pamlico Sounds, NC; Long Island Sound, NY and CT; Narragansett Bay, RI; and Buzzards Bay, MA) and from the Chesapeake Bay Program were contacted to provide information for this report. This report serves as a vehicle for transferring data management strategies from the established Tier 1 programs to the newer Tier 2 programs. Individuals responsible for data management activities in the seven programs were interviewed by telephone.
Appropriate interviewees were identified through discussions with regional EPA program coordinators and regional data managers (listed in the table below), as well as through Battelle's participation in Tier 1 programs.

Tier 1 Program                         Data Management Coordinator
Chesapeake Bay Program                 Lowell Bahner, Senior Computer Scientist, (301) 266-6873
Puget Sound Estuary Program            Roberta Feins, Scientist/Systems Analyst, (206) 464-7320
Albemarle-Pamlico Estuarine Project    Karen Siderelis, Director, Center for GIS & Analysis, (919) 733-2090
Narragansett Bay Project               Stephen Hale, (401) 792-6617, (401) 277-3165
Long Island Sound Study                Jane Copeland, Minicomputer Specialist; Barbara Finazzo, EPA Region II; (401) 782-3168, (212) 264-8968
Buzzards Bay Project                   Neil MacGaffey, Database Administrator, (508) 748-3600
San Francisco Bay Estuary Project      Mark Flachsbart, DIMS Representative, (415) 464-7990; Thomas Gulbransen, DIMS Administrator, (617) 934-0571

During the interviews, information on the following major topics was solicited:

• Individual roles of key DIMS personnel
• DIMS major features
• DIMS users
• Tools or products available from DIMS
• Quality Assurance/Quality Control techniques
• Estimated costs
• System development milestones/schedule
• Recommendations

Data and information management system approaches vary greatly across the Tier I National Estuary Programs. While some NEPs have implemented formal DIMS with standardized procedures, other programs are still developing their DIMS and basic data management strategies. A basic data management strategy, such as adopting a centralized approach to data compilation and storage, is a fundamental element of NEP data management and a precursor to DIMS development: DIMS development should be based on an agreed-upon data management strategy. This report attempts to include any available elements of Tier 1 data management strategies.
However, the emphasis of this report is on functional descriptions of the existing DIMS.

3.0 DIMS DESCRIPTIONS

The DIMS descriptions are presented below for each Tier I NEP individually. Within each description, information is presented for each of the major topics that is relevant.

3.1 CHESAPEAKE BAY PROGRAM

The Chesapeake Bay Program (CBP) is often held up as the model estuarine resource management program. The program has a higher level of funding than other programs in the NEP and therefore faces DIMS budget constraints very different from those of Tier 2 programs. The CBP can procure the equipment and dedicate the staff necessary to run a large data management operation. In addition, the CBP has been active nearly twice as long as the Tier I NEPs.

3.1.1 Individual Roles

The CBP is supported by a data management staff of 25. All staff members are trained to program in SAS, the statistical package that serves as the data management software. This minimizes the retraining that might otherwise be necessary because of staff transfers or attrition. The DIMS staff at CBP includes two full-time modelers. Additionally, six staff members are trained in geographic information processing with ARC/INFO, and three of them are dedicated to it. Geographic Information System (GIS)-trained staff is expected to increase to 12 within a few months.

Each individual study within the program (e.g., water quality monitoring, point source discharges) has one or two staff dedicated to managing data and results. These people are responsible for interacting with the State agencies that collect the data to resolve data submission issues. They are also responsible for collecting documentation for data received from State agencies.

3.1.2 Major Features

The CBP DIMS runs on a VAX 8600 computer. Sixty terminals, including Digital Equipment Corporation terminals, Macintosh and DOS PCs, and Tektronix graphics terminals, are interfaced to the main computer.
The computer system is connected to EPA's wide area communications network and includes a variety of plotters and printers. System software includes SAS, ARC/INFO, and Digital Command Language (DCL). Program staff have developed custom DCL programs to run SAS and ARC/INFO interactively.

The Chesapeake Bay Program uses SAS as its data management system. This was considered the best option for Chesapeake Bay among the data management systems available to Federal programs at the time of initial development.

A Fortran-based watershed model of Chesapeake Bay is maintained by data management staff. The model is linked to the ARC/INFO GIS to display spatial and temporal output. A three-dimensional, time-variant model of Chesapeake Bay is under development and will run on the EPA Cray supercomputer in FY92.

3.1.3 DIMS Users

Access to the Chesapeake Bay DIMS is widely available. The system is accessible not only to the State and Federal agencies collecting data but also to the research community. The public also has access to the system via modem. Tools have been developed to make the system user-friendly.

3.1.4 Tools or Products Available

The Chesapeake Bay Program staff has developed several custom tools to assist users and data management staff:

• MONITOR reformats data received from various State organizations into a standard format so that they can be added to the SAS monitoring database.
• CHESSEE is an index tool for the DIMS. CHESSEE serves as a card catalog, supplying summary information on all the data stored in the database. Queries for summary information are made using keywords. CHESSEE is available to the public via modem.
• BAYSTAT gives researchers and computer scientists access to the raw data rather than to summary information. Menus are used to select the type of analysis to be performed on specific types of data. The results may be printed in hard copy or viewed on screen.
• HISTATS is used to access the historical database maintained by the Chesapeake Bay Program. Because of gaps in earlier data, HISTATS stores data by decade.
• QAQCSTATS provides user access to the QA/QC database.
• VOLUME is Fortran-based software that models Chesapeake Bay water quality in three dimensions. The results are mapped in color and as videos that show how water quality changes through time.

Standard GIS tools have not been developed because of the diversity of requests for analyses; each GIS project is supported with custom-developed procedures.

3.1.5 QA/QC Techniques

The Chesapeake Bay Program has a full-time data QA/QC officer. The program considers this essential, especially for monitoring data: because data collection is such a large effort, a full-time QA/QC officer is warranted. The QA officer ensures that data collection and reporting procedures include measures to improve and document the quality of the data being produced. The program was without a QA/QC officer for 4 years and ran into difficulties with misreported data. Significant time and money were needed to recompile retrospective data and ensure their quality through examination of laboratory records.

3.1.6 Estimated Costs

Approximately 21 percent of each individual study's budget (e.g., water quality monitoring projects, point source discharge projects) is allocated to data management activities.

3.1.7 System Development Milestones/Schedule

Chesapeake Bay data management staff recommend that first-year data management activities include establishing quality guidelines for data submission. This requires negotiation with State agencies, the scientific community, and management. While these guidelines are being developed, a feasible scope of work under the proposed standards should be defined. The hardware and software design of the system can be resolved after the decisions necessary to implement the program have been made.
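Quality guidelines for data submission of the kind recommended above can be made machine-checkable. The sketch below is a minimal Python illustration, not the CBP's SAS-based procedure: the field names, parameter codes, valid ranges, and the "R" qualifier code are hypothetical assumptions. It rejects records that lack required fields or use an unknown parameter code, and it attaches a qualifier to out-of-range values instead of discarding them, so analysts can decide later whether to omit them.

```python
from dataclasses import dataclass, field

# Hypothetical submission guideline. Field names, parameter codes, and
# valid ranges are illustrative assumptions, not CBP standards.
REQUIRED_FIELDS = ("station", "date", "parameter", "value")
VALID_RANGES = {
    "salinity_ppt": (0.0, 40.0),
    "dissolved_oxygen_mg_l": (0.0, 20.0),
    "water_temp_c": (-2.0, 40.0),
}


@dataclass
class CheckResult:
    accepted: list = field(default_factory=list)  # records loaded, possibly qualified
    rejected: list = field(default_factory=list)  # (record, reason) pairs returned to submitter


def check_submission(records):
    """Apply the guideline to one batch of submitted records."""
    result = CheckResult()
    for rec in records:
        missing = [name for name in REQUIRED_FIELDS if rec.get(name) in (None, "")]
        if missing:
            result.rejected.append((rec, "missing: " + ", ".join(missing)))
            continue
        if rec["parameter"] not in VALID_RANGES:
            result.rejected.append((rec, "unknown parameter: " + rec["parameter"]))
            continue
        low, high = VALID_RANGES[rec["parameter"]]
        # Out-of-range values are kept but flagged with the hypothetical
        # qualifier "R" so that analysts can choose to omit them.
        qualifier = "" if low <= rec["value"] <= high else "R"
        result.accepted.append({**rec, "qualifier": qualifier})
    return result


def usable_for_trends(records):
    """Keep only records that carry no QA/QC qualifier."""
    return [rec for rec in records if not rec["qualifier"]]


# Example batch with invented values: one good record, one out-of-range
# temperature, and one record missing its station code.
batch = [
    {"station": "STA-01", "date": "1990-06-01", "parameter": "salinity_ppt", "value": 12.4},
    {"station": "STA-01", "date": "1990-06-01", "parameter": "water_temp_c", "value": 85.0},
    {"station": "", "date": "1990-06-01", "parameter": "salinity_ppt", "value": 11.9},
]
result = check_submission(batch)
print(len(result.accepted), "accepted;", len(result.rejected), "rejected")
print("usable for trends:", len(usable_for_trends(result.accepted)))
```

Flagging rather than deleting suspect values keeps the submitted record intact while still letting trend analyses exclude it.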
The Chesapeake Bay DIMS is still being enhanced, especially its GIS-based components.

3.1.8 Recommendations

Chesapeake Bay data management staff stressed the need for on-site computer resources, which they deemed essential because of turnaround time. They also preferred the computing power of a shared database system over unlinked resources on independent microcomputers. The need for a QA/QC person was also stressed.

The Chesapeake Bay Program also recommends a data management plan for each monitoring study in the program. The Chesapeake Bay Program's data management plans were developed as guidelines for data submission by State and Federal agencies; the plans also contain a complete data dictionary and all QA/QC procedures. Program staff recommended that a standard be established throughout all National Estuary Programs. The Chesapeake Bay documents are available to other programs, and the Chesapeake Bay staff expressed willingness to assist other programs with data management strategy development and system planning by conducting on-site demonstrations or conferences.

3.2 PUGET SOUND ESTUARY PROGRAM

The Puget Sound Water Quality Authority was created by State mandate to develop a management plan for the estuary. The Authority, EPA, and the Washington Department of Ecology joined efforts under the NEP and created the Puget Sound Estuary Program (PSEP). As a result, the PSEP has strong State and local support.

The Puget Sound Ambient Monitoring Program (PSAMP) is one part of the PSEP. PSAMP is designed to be a long-term, comprehensive monitoring program for the Sound. A data management system has been developed with EPA funding to handle PSAMP data; the actual monitoring activities and subsequent data management costs are funded by the State of Washington. The data management system is coordinated by Puget Sound Water Quality Authority personnel.
The data management system developed for PSAMP is designed to store and manage data from PSAMP monitoring, as well as additional data of value for assessing environmental conditions in the Sound. These additional data may come from the PSEP or from other agencies. Each agency conducting monitoring determines what other data to include in its own monitoring database; the PSAMP Steering Committee decides what additional data should be included in the PSAMP central database. PSEP contracts now require that all data be submitted in the PSAMP data transfer format so that they can be loaded into the PSAMP central database. From the central database, PSEP data collected in 1990 will be transferred to OMEP in ODES format.

3.2.1 Individual Roles

PSAMP staff at the Puget Sound Water Quality Authority coordinate the monitoring program. For several years, one full-time staff member was responsible for assessing data management needs and developing a data management approach; a programmer was added within the last year. Much of the earlier data management effort was performed by contractors, who collected, digitized, and archived historical data. Other State agencies also have staff or contractors involved in developing their distributed databases.

Ongoing PSAMP data management at the Water Quality Authority will require one full-time data manager and one staff member for data analysis and report preparation. An additional 3.5 staff will be needed to manage the data in the State agencies doing the monitoring. The monitoring agencies are required to produce yearly reports of monitoring results, and the central staff will prepare a yearly summary.

3.2.2 Major Features

The PSEP data management strategy is based on a distributed system approach. Raw monitoring data are the responsibility of the agency that collects them.
The Puget Sound Water Quality Authority has developed a central database for storing summary data from the agencies involved in the monitoring program. The central database is maintained on a PC using dBASE IV software. No software or database requirements have been imposed on the agencies that manage the raw data. A data management strategy document was produced that specifies strict transfer formats for data transmitted to the central system. These same formats can be used for data exchange among agencies, and the data management staff recommend that other studies in the PSEP use them. The Puget Sound Water Quality Authority is developing a GIS with other State agencies that run ARC/INFO software.

3.2.3 DIMS Users

The PSEP explored developing a networked DIMS that would allow multiuser access, but the networked system was not developed because of budget constraints. The agencies that perform monitoring studies have access to the DIMS; the scientific community and the public do not and must request data from the Puget Sound Water Quality Authority. The Authority is responsible for filling data requests and providing information in the form of reports. To date, this has been a very minor task.

3.2.4 Tools or Products Available

The Puget Sound Estuary Program has developed the following tools for getting information about the Sound:

• A computerized bibliography called SOUND ACCESS
• A library, maintained by a contractor
• The PSAMP central database
• The Puget Sound Environmental Atlas

SOUND ACCESS was installed at the University of Washington library to provide greater public access. Keywords are used to select information about Puget Sound. As yet, the system is not widely distributed. SOUND ACCESS is scheduled to be updated this year by the Puget Sound Water Quality Authority. The PSAMP central database was written in dBASE IV (source code available).
An important element of PSAMP is the systematic set of formats established for data transfer or exchange. An extensive set of documents has been produced to describe the PSAMP components. The program has also produced the Puget Sound Environmental Atlas, a two-volume hard-copy set of maps showing the Sound's resources and pollutant levels. The GIS will be used to update the Atlas.

3.2.5 QA/QC Techniques

Agencies that carry out environmental monitoring as part of the Puget Sound Ambient Monitoring Program must develop a QA plan for data management as well as for sampling. The QA plan must include PSAMP protocols for sample analysis and QA/QC. Technical review of the data is the responsibility of each agency. Data submitted to the central database are checked for required fields and value ranges. Qualifiers have been added to the database to describe the level of QA/QC for individual data records, which allows users to omit questionable data points from status and trends analyses.

3.2.6 Estimated Costs

A set percentage of funding for data management has not been established for the PSEP. The staff estimates that about $200,000 was spent over 3 years on salaries, software, applications development, and hardware for the central system. An additional $30,000 was spent on the needs assessment and $20,000 on the SOUND ACCESS tool, for a total of about $250,000 in identified central costs. There are also costs spread across the individual agencies that enter and maintain their distributed databases. Data management activities are funded at approximately 10 percent of the requested budget.

3.2.7 System Development Milestones/Schedule

A data management strategy for the PSEP was developed over the course of 1 year. The goals and objectives of the data management system were defined through a user needs assessment. Alternative systems were brought before the Monitoring Management Committee, which decided to develop a distributed database system.
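The distributed design chosen by the committee can be illustrated with a short sketch. In this minimal Python example, each agency keeps its raw records and contributes only summary rows to the central database. The agency names, station IDs, parameter name, and the choice of a count and a mean per station, year, and parameter as the "summary data" are hypothetical assumptions; the actual PSAMP transfer format and summary contents are not reproduced here.

```python
from statistics import mean

# Hypothetical raw records held by two monitoring agencies. Agency names,
# station IDs, parameter, and values are illustrative only.
AGENCY_RAW_DATA = {
    "agency_a": [
        {"station": "A1", "year": 1990, "parameter": "fecal_coliform", "value": 14.0},
        {"station": "A1", "year": 1990, "parameter": "fecal_coliform", "value": 22.0},
        {"station": "A2", "year": 1990, "parameter": "fecal_coliform", "value": 9.0},
    ],
    "agency_b": [
        {"station": "B1", "year": 1990, "parameter": "fecal_coliform", "value": 30.0},
    ],
}


def summarize(raw_records):
    """Reduce one agency's raw records to a summary row per station/year/parameter."""
    groups = {}
    for rec in raw_records:
        key = (rec["station"], rec["year"], rec["parameter"])
        groups.setdefault(key, []).append(rec["value"])
    return [
        {"station": s, "year": y, "parameter": p, "n": len(vals), "mean": mean(vals)}
        for (s, y, p), vals in sorted(groups.items())
    ]


def build_central_database(agency_data):
    """The central database holds only summaries, tagged with the submitting agency."""
    central = []
    for agency, raw in agency_data.items():
        for row in summarize(raw):
            central.append({"agency": agency, **row})
    return central


central_db = build_central_database(AGENCY_RAW_DATA)
for row in central_db:
    print(row)
```

In this arrangement the raw records never leave the collecting agency, which mirrors the division of responsibility described in 3.2.2.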
Documentation on system design and data formats was prepared during the same year. System implementation has taken an additional 1.5 years. Implementation of the GIS is on a 2-year schedule: existing data from the 1987 Environmental Atlas are being converted into GIS format in 1990, and the Atlas will be updated during the second year.

During 1990, the data management staff plans to examine historical data for Puget Sound. A list of potential studies will be presented to the Steering Committee, which is responsible for the final selection of studies to be added to the central database and used for the Atlas update.

3.2.8 Recommendations

The PSEP staff cautions that the data management planning process should be based on realistic expectations about budget and results. In programs of this type, there are often conflicting opinions about spending money on collecting new samples versus managing existing data. The distributed approach is useful because it gives the agency collecting the data responsibility for all aspects of data submission and formatting. This ensures better data documentation and encourages "ownership" and use of the data by agencies. The PSEP staff expressed willingness to work with new programs on data management issues.

3.3 ALBEMARLE-PAMLICO ESTUARINE STUDY

The Albemarle-Pamlico Estuarine Study (APES) has combined its data management efforts with those of the State of North Carolina. By doing so, the study benefits technically, financially, and in its implementation schedule.

3.3.1 Individual Roles

The APES has one full-time data coordinator position. The coordinator is responsible for all aspects of data management for the program but does not actually perform the data manipulations. This position is presently open.

Early in the program, the APES Management Conference decided to link APES data management efforts with State efforts.
The North Carolina Center for Geographic Information and Analysis (formerly the Land Resources Information Service) is the office that handles State data management and GIS efforts. The Center is organized as a service bureau and fits easily into the role of working with the APES. Because the State already had staff and hardware in place, minimal training was necessary for the staff to handle NEP data. The Center has a staff of 20.

3.3.2 Major Features

At the beginning of the APES, the hardware and software at the Center for Geographic Information and Analysis included a Data General minicomputer running ARC/INFO as the GIS. The minicomputer has since been replaced with Sun workstations. Early in the APES program, it was decided to use a GIS as an integrated database and to combine efforts with the existing State GIS. The GIS does not serve as a major comprehensive database but as a selective database that can be integrated with other systems, such as EPA's STORET. APES data management staff are focusing on the use of particular data and are not building a regional system for computerized archiving of program data. The Center is adding ERDAS image processing software and is also acquiring SAS statistical software.

3.3.3 DIMS Users

The APES data management staff envision three major user groups for the APES DIMS: resource managers (State and Federal agencies), researchers (universities and contractors performing projects under the APES), and others (private citizens and industries). Resource managers are expected to be the major users of the DIMS. Currently, State agencies use the system via either direct access or data requests; direct access is limited because the dial-in software is not fully operational. Data from APES research projects have been added to the system, but researchers have not been active users.
The data management staff hopes to see an increase in users from the research community over the next few months. The only public exposure to the DIMS has been through GIS presentations at public meetings.

3.3.4 Tools or Products Available

A custom user interface to ARC/INFO is being developed to make the database more accessible to resource management agencies. The GIS has been used to provide acreage data, a preliminary status and trends report, and maps. Because these GIS outputs are specialized, their production has not been automated into a fixed set of tools. The APES is developing software to integrate the GIS with other systems.

Efforts to build a data inventory or index are under way. The data referenced in the inventory are not necessarily being actively managed by the program. A data dictionary is being developed to describe each data set.

3.3.5 QA/QC Techniques

QA/QC procedures are in place only for cartographic data. The APES is documenting the sources, dates, and limitations of data sets, but no effort is being made to correct existing data.

3.3.6 Estimated Costs

Twenty percent of program funds are used for data management activities. This figure reflects the APES Management Conference's recognition of the importance of data management. It amounts to approximately $150,000 per year, including the salary of the data coordinator.

3.3.7 System Development Milestones/Schedule

A working version of the access interface software should be completed within 6 to 9 months. The database is scheduled to be fully designed during 1990, but it may not be fully populated before the program is over.

3.3.8 Recommendations

Staff involved in APES data management encourage other programs to look at existing computer resources. If the APES had started from scratch, 3 to 4 people would have been needed for data entry, liaison, and similar tasks. NEP data management cannot be handled by one staff person unless work is contracted out.
APES ran into problems at the beginning because decisions were delayed while the program tried to hire a data coordinator. The hiring process took longer than expected, and the new hire left the position after 6 months. Meanwhile, the program delayed a formal needs assessment and database design, which therefore were not agreed upon and finalized early in the program. The DIMS remained active because some data had been entered into the GIS. The program considers a GIS a good data management option because its mapping versatility builds public support and interest.

3.4 NARRAGANSETT BAY PROJECT

The Narragansett Bay Project (NBP) has benefited from the presence of EPA's Environmental Research Laboratory at Narragansett and from strong State interest in the Bay. Data management efforts have been enhanced by the availability of existing data management systems operated by EPA and the State.

3.4.1 Individual Roles

Data management staff for the NBP include a data management coordinator, two FOCUS programmers, and an ARC/INFO programmer. The senior FOCUS programmer is responsible for developing the database system, and the junior FOCUS programmer is setting up a data library and tracking down data. The Project is guided by the Data Management Working Group, a subgroup of the Science and Technical Committee.

3.4.2 Major Features

The NBP DIMS uses FOCUS data management software (an EPA and Rhode Island Department of Environmental Management standard) on a MicroVAX computer. The MicroVAX is networked to the Environmental Research Laboratory's VAX 785 computer, which is a node on the EPA national network. The NBP DIMS, called the Narragansett Bay Data System (NBDS), features an online index to all data on the system as well as to certain relevant data from other data systems. Users can dial up the database from a terminal or from a personal computer equipped with a modem. The system is menu-driven, which means that the user makes choices from a tree of menus.
This makes it fairly easy for almost everyone to use, including people with little computer experience. Users have direct access to the data contained in the system. Various formats, code tables, and naming conventions were borrowed from the CBP, NODC, and ODES to achieve some degree of inter-estuarine compatibility.

The NBP DIMS also uses the Rhode Island GIS located on the University of Rhode Island's Prime 6350. The GIS is implemented with ARC/INFO software. Additional hardware includes IBM and Macintosh microcomputers. A data transfer mechanism is available for downloading data from the database to the GIS or to microcomputers.

The Project funded the Pell Marine Science Library of the University of Rhode Island's Graduate School of Oceanography to update the Bay Bib, a bibliography of Bay-related publications. The Bay Bib will be available through Inmagic, a microcomputer-based bibliographic software package. The NBDS, the GIS, and the Bay Bib are cross-referenced by geographic region, station ID, and accession number.

3.4.3 DIMS Users

A formal Survey of Potential Narragansett Bay Data System Users was conducted to obtain input on the type of system to develop. The NBP DIMS is in the development stage and is not yet available for full online access. The data being collected as part of the characterization study are being used by NBP staff, the State, and the principal investigators. The user needs survey indicated that the research community and State and Federal agencies need access to the completed DIMS. Nonprofit environmental groups also requested access to the system. The survey also indicated some interest in a citizen's account, which the general public could use to obtain summary information about the Bay and the Project.

3.4.4 Tools or Products Available

The NBP data management system is in the development phase. The goal is to have the FOCUS database online by the end of 1990.
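Cross-referencing by shared keys, as described above for the NBDS, the GIS, and the Bay Bib, can be illustrated with a short Python sketch. It is not the NBP's FOCUS or ARC/INFO implementation: the record classes, field names, and sample values below are hypothetical, chosen only to show how a region key groups items from all three systems and how a station ID leads, through accession numbers, to related publications.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical record layouts; the field names are assumptions for
# illustration, not the actual NBDS, GIS, or Bay Bib structures.
@dataclass
class DataRecord:
    """A measurement held in the database (the NBDS role)."""
    station_id: str
    region: str
    accession: str
    parameter: str
    value: float


@dataclass
class GisFeature:
    """A sampling station as mapped in the GIS."""
    station_id: str
    region: str
    lat: float
    lon: float


@dataclass
class BibEntry:
    """A publication in the bibliography (the Bay Bib role)."""
    accession: str
    region: str
    citation: str


def build_cross_reference(records, features, citations) -> Dict[str, Dict[str, list]]:
    """Group items from all three systems under their shared region key."""
    by_region: Dict[str, Dict[str, list]] = defaultdict(
        lambda: {"data": [], "gis": [], "bib": []})
    for r in records:
        by_region[r.region]["data"].append(r)
    for f in features:
        by_region[f.region]["gis"].append(f)
    for c in citations:
        by_region[c.region]["bib"].append(c)
    return dict(by_region)


def citations_for_station(station_id: str, records: List[DataRecord],
                          citations: List[BibEntry]) -> List[BibEntry]:
    """Follow station ID -> accession numbers -> bibliography entries."""
    accessions = {r.accession for r in records if r.station_id == station_id}
    return [c for c in citations if c.accession in accessions]


if __name__ == "__main__":
    # Invented sample values; not real Narragansett Bay data.
    records = [
        DataRecord("ST-01", "Upper Bay", "ACC-100", "dissolved_oxygen", 6.2),
        DataRecord("ST-02", "Lower Bay", "ACC-200", "dissolved_oxygen", 7.9),
    ]
    features = [GisFeature("ST-01", "Upper Bay", 41.75, -71.38)]
    citations = [BibEntry("ACC-100", "Upper Bay", "Hypothetical survey report")]

    xref = build_cross_reference(records, features, citations)
    print({region: {k: len(v) for k, v in items.items()} for region, items in xref.items()})
    print(citations_for_station("ST-01", records, citations))
```

Because each system carries the same keys, none of them needs to copy the others' contents; a lookup simply follows the shared identifiers.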
The program is also developing an index that can be searched by parameter, species, or area to find information on available data, collection dates and locations, and the collecting agency. A major use of the data system will be to store and analyze data from the long-term monitoring program to determine whether the implemented management recommendations are having the desired effect of improving Bay water quality.

3.4.5 QA/QC Techniques

Data submitted to the NBP must conform to the Narragansett Bay Data System Data Submission Manual. The NBP has adopted the geographic data standards used by the GIS center at the University of Rhode Island. Monitoring data receive technical review through the peer review process established by the NBP Management Conference. Agencies collecting data are required to develop a QA document as part of the funding agreement between the program and the agency. The FOCUS database has computerized routines for checking formats, codes, frequencies, ranges, and character frequencies.

3.4.6 Estimated Costs

Data management costs for the NBP are difficult to estimate because of support from EPA's Environmental Research Laboratory - Narragansett and the State. Salary for four full-time staff is approximately $180,000 per year, but the program has had four staff only for the past year. The program received a 2-year grant ($240,000) to develop the data system, covering hardware, software, and personnel costs.

3.4.7 System Development Milestones/Schedule

The NBP is working to create a system designed around local needs. Demonstrations of system prototypes and development are an important part of the design process; they are used to secure support for the system and to obtain input.

3.4.8 Recommendations

The data management staff for the NBP recommend using commercially available database management systems. An index/catalog of available data is an invaluable tool.
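An index of the kind the NBP is developing, searchable by parameter, species, or area and reporting collection dates, locations, and the collecting agency, can be sketched as a small catalog filter. This is a hypothetical Python illustration, not the NBDS index: the CatalogEntry fields and the sample entries are assumptions. Each criterion left as None matches anything, so the same function answers parameter-only, species-only, or combined queries.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """One data set described in the index (field names are hypothetical)."""
    name: str
    parameters: FrozenSet[str]   # measured parameters, e.g. "salinity"
    species: FrozenSet[str]      # taxa sampled, if any
    area: str                    # geographic area name
    first_collected: date
    last_collected: date
    agency: str                  # collecting agency


def search(index: List[CatalogEntry], parameter: Optional[str] = None,
           species: Optional[str] = None, area: Optional[str] = None) -> List[CatalogEntry]:
    """Return entries that match every criterion supplied; None means 'any'."""
    hits = []
    for entry in index:
        if parameter is not None and parameter not in entry.parameters:
            continue
        if species is not None and species not in entry.species:
            continue
        if area is not None and entry.area != area:
            continue
        hits.append(entry)
    # Earliest collections first, so a user sees the period of record.
    return sorted(hits, key=lambda e: e.first_collected)


if __name__ == "__main__":
    # Invented catalog entries for illustration only.
    index = [
        CatalogEntry("Hypothetical water quality survey",
                     frozenset({"salinity", "dissolved_oxygen"}), frozenset(),
                     "Upper Bay", date(1985, 6, 1), date(1988, 9, 30), "Agency A"),
        CatalogEntry("Hypothetical shellfish survey",
                     frozenset({"shell_length"}), frozenset({"Mercenaria mercenaria"}),
                     "Lower Bay", date(1987, 5, 1), date(1987, 8, 31), "Agency B"),
    ]
    for e in search(index, parameter="salinity"):
        print(e.name, e.area, e.first_collected, e.last_collected, e.agency)
```

The index answers "what exists, where, when, and from whom" without holding the data themselves, which is why it remains useful even for data sets that are not yet online.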
Programs should explore the resources already available to them. The NBP chose FOCUS as its data management system because it is used by EPA's National Research Laboratory and the Rhode Island Department of Environmental Management. Data management staff stressed the development of local systems that have the financial support of the local community. A long-term commitment is important, and mechanisms to keep the database up to date have to be implemented. The staff expressed the need for some national standardization of NEP policy. New estuary programs can benefit from using formats and code tables that have become somewhat standard, such as the NODC taxonomic codes and the Chemical Abstracts Service chemical codes. The Chesapeake Bay data management staff assisted the Narragansett Bay Project with its data management efforts; the NBP has also corresponded with staff involved in the other Tier I estuary management efforts.

3.5 LONG ISLAND SOUND STUDY

The initial data management efforts for the Long Island Sound Study (LISS) focused on collecting historical data, which were reviewed manually. The historical data were put online at EPA's NCC, but the general consensus was that the data were inaccessible and poorly documented with respect to quality assurance. The Management Conference decided that a DIMS was not necessary for developing a characterization of Long Island Sound. Two models are being developed by the LISS: a hydrodynamic model developed with contractor support and a water quality model developed by the National Oceanic and Atmospheric Administration. The completed models will be coupled together. Both modeling efforts receive data directly from the State agencies and universities that collect the data. The Management Conference has decided that a DIMS is necessary for managing monitoring data collected during implementation of the CCMP.
A data management working group of 8-10 people has been tasked with developing a DIMS for these monitoring data. The group recently completed an extensive survey of prospective users' needs; the results have been compiled and are available from the LISS program office. To date, no DIMS is in place. During 1990, the working group expects to reach a decision on its data management approach and to hire a full-time data manager. Efforts to implement a DIMS and compile data will begin after that person is hired. The study anticipates using an existing data management system that is easily accessible and meets the needs of users. LISS DIMS users will include State and Federal agencies and citizen groups. The agencies will have access to the entire database to determine the success of the CCMP, while citizen groups will access summary information. The study is exploring the option of establishing a repository for the historical data and the data collected as part of the characterization study; the repository may reside at Connecticut's environmental library.

3.6 BUZZARDS BAY PROJECT

The Buzzards Bay Project (BBP) is presently designing and implementing a DIMS to support the Comprehensive Conservation and Management Plan. Earlier data management efforts did not involve a DIMS; instead, they consisted of compiling separate data files on the EPA NCC. Active development of the DIMS began when a data manager was hired.

3.6.1 Individual Roles

Presently, one full-time staff position is dedicated to the design, implementation, and management of the DIMS. This database administrator leads a working group of approximately 12 regional scientists, program managers, and system administrators. The working group provides input to DIMS development by identifying priorities, existing resources, and related activities.
The working group also provides feedback on DIMS planning by reviewing system prototypes under development. DIMS plans are formally reviewed by State personnel according to the standards and criteria used to manage all software development by State agencies. Protocols from the Massachusetts State geographic information system group are being followed during development of the GIS component of the BBP DIMS. State agencies also play a large role in implementing the Buzzards Bay DIMS because most historical data exist in various agency systems. Likewise, the Buzzards Bay DIMS will be able to accommodate new data being collected by these organizations.

3.6.2 Major Features

The Buzzards Bay DIMS has two major components: a marine monitoring database and a GIS. The marine database is being implemented with ORACLE database software. The database will have an index function that references both online data and other data types that were collected by surveys but are not yet online. The GIS is being developed in ARC/INFO, and a direct link between the two components is planned using an ARC/INFO utility. System components are being built and tested on a microcomputer. The system plan calls for full-scale implementation on a VAX at the State environmental agency's data center. Migration to the State's minicomputer will occur when multiuser access is needed, and it will also enable direct integration with related databases maintained by other groups within the State. The Buzzards Bay DIMS is being implemented in ORACLE primarily because all marine monitoring data collected in the State, by either the BBP or regional organizations, are being stored in ORACLE. ORACLE is also the State environmental agency's standard database management software.

3.6.3 DIMS Users

Direct access to the Buzzards Bay DIMS is limited to the database administrator.
The database administrator fills data requests from State and Federal agencies, the scientific community, and citizens' groups. Users analyze these data on their own computer systems.

3.6.4 Tools or Products Available

The DIMS being implemented for Buzzards Bay will be used to produce outputs that address specific questions about estuarine conditions. The database will be able to generate reports about particular stations or areas in the estuary. Station-level data retrievals are well suited to reporting water quality data, especially for sub-areas of the estuary that have been the subject of concentrated surveying efforts. Station-level reports are starting to be produced and will become more extensive as the database expands. Spatial presentation of marine data has been used to depict the frequency of water quality measurements exceeding given thresholds; such depictions, along with spatial analysis, are among the short-term objectives of the data management effort.

3.6.5 Quality Assurance/Quality Control Techniques

Water quality data for the BBP were examined for quality assurance based on review of historical data files and field sampling documentation. However, a program-wide data quality control plan does not yet exist. Quality assurance requirements for future data submissions and data recording have been established. QA/QC will also be applied when Buzzards Bay data are submitted for archiving in ODES. Some information relevant to quality assurance is stored as part of the index to existing data sets, including data sources, contacts, sampling organization, analytical laboratory, and a bibliographic reference.

3.6.6 Estimated Costs

For the first few years, data management efforts focused on compiling historical data and entering data into files at NCC. The contractor who compiled these data was available to assist users in gaining access to the files.
The cost of these activities is not known because they were not separate budget items. Recently, data management resources have been redirected toward implementing the regional DIMS described above. The 2-year cost of DIMS implementation, including staff and equipment, is approximately $150,000. Additional costs may be incurred beyond 1991 but have not yet been budgeted.

3.6.7 System Development Milestones/Schedule

All system development has taken place over the past 18 months. Full implementation is planned over the next few years; however, some detailed steps are scheduled for the immediate future. Existing data sets are being added to the DIMS. GIS data are being added to support program objectives, and GIS products are regularly produced using existing MassGIS data. Other data types, such as environmental conditions, physical/chemical characteristics, and geographic data, will be added to the DIMS individually according to priority data needs. Initial data loading is focusing on nutrients, toxics, and pathogens. A comprehensive survey of existing data sets has been completed: this eight-page survey was filled out for approximately 55 data sets, and comparable information, prepared by a contractor, exists for an additional 50-60 data sets.

3.7 SAN FRANCISCO ESTUARY PROGRAM

The San Francisco Estuary Program (SFEP) has developed a DIMS and produced raw data files. The DIMS is intended to provide users with information such as bibliographic references, indices of data sets, and testimony. The DIMS is not intended to provide users with data; selected data are available from files that reside on NCC. The SFEP Technical Advisory Committee decided that it was in the best interest of the SFEP to focus the DIMS on information management, and this early decision defined the target need that the existing DIMS meets.

3.7.1 Individual Roles

The primary roles in SFEP DIMS implementation were program managers and system developers.
The program managers included an SFEP staff member and members of the Technical Advisory Committee. The system developers who provided input into DIMS development included contractors and a regional nonprofit organization with DIMS development interests.

The role of the program managers was to define the objectives of the DIMS. Once the objectives were defined, the program managers determined the most appropriate organizational mechanisms for developing the DIMS. The program managers were also asked periodically to review DIMS features. The role of the system developers was to determine the most feasible way to design and build a DIMS that met users' needs. Contractors played a key role in designing, building, and loading the DIMS. The nonprofit organization also played a key role in gathering the basic information and organizing it for input into the system, and it provided staff, facilities, and other resources to assist in DIMS implementation.

3.7.2 Major Features

The SFEP DIMS has three main features: a State Hearings Testimony database, a data index, and a bibliography. The State Hearings Testimony database is a text-based system that allows retrieval of articles, exhibits, and testimony admitted in a court case in the State of California. The data index and bibliography provide references to data sets or studies identified as relevant to the San Francisco Bay estuary system. The SFEP DIMS makes use of local computing resources and NCC. The three features described above were developed using INFO/DB+, C, and user-interface software on a MicroVAX that is networked to microcomputers. They can be reached by modem through a menu-driven user interface. The SFEP DIMS also includes a series of dedicated terminals and modems located at universities and at State and Federal offices throughout the State.
Data files were stored on the NCC IBM mainframe using SAS software and the TSO operating system. There are no direct connections between the local DIMS and the NCC system.

3.7.3 DIMS Users

The SFEP DIMS is accessible to State and Federal agencies, the scientific community, and the public. Access to the system may be gained via modem or through dedicated terminals located around the State. User accounts are available upon request; to date, there are over 40 user accounts. Usage statistics are maintained on the frequency and duration of system use.

3.7.4 Tools or Products Available

The tools and products available from the SFEP DIMS are described briefly in the preceding section on system features.

3.7.5 Quality Assurance/Quality Control Techniques

Information included in the DIMS features is compiled by staff members, who check it for completeness based on their familiarity with the topics. Keyword entries are manually added to the testimony transcript data tape. Information is loaded into the index, bibliography, and testimony database by subroutines written by the system administrator. The loading process is not subject to quality control beyond the loading program's ability to reject and report specific entries. Because raw data are not included in the SFEP DIMS, no data-submission quality control standards or procedures are needed. All new data generated by SFEP field surveys are expected to be submitted in accordance with ODES conventions.

3.7.6 Estimated Costs

The costs of SFEP DIMS development have fluctuated according to how many individuals were dedicated to development and loading activities. During the first 2 years, DIMS development costs included multiple system analysts, data technicians, and the purchase of hardware and software.
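The QA/QC approaches described in this report, such as the NBP's computerized checks on formats, codes, and ranges and the SFEP loading program's ability to reject and report specific entries, can be combined in a simple load routine. The Python sketch below is illustrative only: the station ID pattern, the parameter code table, and the acceptable ranges are hypothetical placeholders rather than any program's actual standards, and it does not attempt the frequency checks the NBP also mentions.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical validation rules. A real program would take these from its
# data submission manual and standard code tables; these values are placeholders.
STATION_FORMAT = re.compile(r"[A-Z]{2}-\d{2}")        # e.g. "NB-07"
PARAMETER_RANGES = {                                  # code -> (low, high)
    "DO": (0.0, 20.0),     # dissolved oxygen, mg/L
    "SAL": (0.0, 40.0),    # salinity, ppt
    "TEMP": (-2.0, 35.0),  # water temperature, deg C
}


@dataclass
class LoadReport:
    accepted: List[Dict] = field(default_factory=list)
    rejected: List[Tuple[int, str]] = field(default_factory=list)  # (entry number, reason)


def check_entry(entry: Dict) -> str:
    """Return '' if the entry passes all checks, otherwise the first failure."""
    if not STATION_FORMAT.fullmatch(str(entry.get("station", ""))):   # format check
        return "station ID not in expected format"
    code = entry.get("parameter")
    if code not in PARAMETER_RANGES:                                   # code check
        return f"unknown parameter code {code!r}"
    try:
        value = float(entry.get("value"))                              # must be numeric
    except (TypeError, ValueError):
        return "value is not numeric"
    low, high = PARAMETER_RANGES[code]                                 # range check
    if not low <= value <= high:
        return f"value {value} outside range {low}-{high}"
    return ""


def load(entries: List[Dict]) -> LoadReport:
    """Accept entries that pass; reject and report the others."""
    report = LoadReport()
    for number, entry in enumerate(entries, start=1):
        reason = check_entry(entry)
        if reason:
            report.rejected.append((number, reason))
        else:
            report.accepted.append(entry)
    return report


if __name__ == "__main__":
    # Invented submission batch for illustration.
    batch = [
        {"station": "NB-07", "parameter": "DO", "value": "7.4"},
        {"station": "NB7", "parameter": "DO", "value": "6.9"},
        {"station": "NB-08", "parameter": "PH", "value": "8.1"},
        {"station": "NB-09", "parameter": "SAL", "value": "55"},
    ]
    result = load(batch)
    print(len(result.accepted), "accepted")
    for number, reason in result.rejected:
        print(f"entry {number} rejected: {reason}")
```

Rejecting with a reason, rather than silently dropping entries, gives the submitter something concrete to correct and resubmit.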
Most recently, SFEP DIMS costs have been limited to a system operator and a data technician, primarily to operate and maintain the system. In total, the SFEP data and information management effort over the past 3 years has cost approximately $600,000.

3.7.7 System Development Milestones/Schedule

Each of the SFEP DIMS features took approximately 6-12 months to design and implement. The most important milestones in SFEP DIMS development were the decisions concerning priorities and program direction. System development was discontinuous over the past 3 years while priority objectives for the DIMS were reevaluated. Informal user needs surveys were conducted at the outset of DIMS activities and again recently. The first survey sought to identify the relative demand for centralized data management versus centralized access to information and references. The second survey was intended to reevaluate current demand for the existing DIMS features and to reassess whether centralized data access was now necessary.

3.7.8 Recommendations

The information system, like any data management system, must be kept up to date, and new information should be added as it becomes available. The system becomes less useful if it contains historical information but not the latest information.

4.0 SUMMARY

The DIMS described above were produced to satisfy diverse needs that varied according to the regional priorities facing each of the National Estuary Programs. By examining the assumptions and development objectives behind each individual NEP DIMS, these descriptions can improve the development of Tier II DIMS. This summary highlights the commonalities and significant differences among Tier I DIMS, along with some explanation of their rationale.

All seven estuary programs interviewed for this report had at least one staff member dedicated to data management activities.
Several programs stressed the importance of having one identified person responsible for coordinating all data management activities. Even when data were managed at several locations, a central person with data management responsibility improved coordination and enabled data integration. In the one noted case where a QA/QC officer was appointed, costs for data integration were significantly reduced. Even though each Tier I NEP had a person dedicated to data management, the Management Conferences differed greatly in their commitment of resources to DIMS development or to a long-term data management strategy. In the absence of a strategy for managing data, some of the early data gathering efforts were not well coordinated. Likewise, it was often difficult to commit resources for planning and developing a DIMS without a clear understanding of the DIMS objectives. Full-scale DIMS could be implemented in Tier I NEPs only when the Management Conferences were given the opportunity to review, comprehend, and modify the DIMS objectives. In other words, progress in Tier I DIMS development depended on how well the major proponent of a DIMS could enlist adequate, multiyear commitments of resources and staff from the Management Conference.

The major DIMS features developed, and their occurrence within the individual estuary programs, are summarized in a matrix (Table 1). One feature common to the majority of programs is a GIS; another is a computerized index of data sets or information sources. Features presented as conceptually similar across programs are not necessarily identical, especially with respect to how fully they are implemented. Equally important are the many differences among the DIMS features implemented in these programs.

Five of the seven programs interviewed maintain a GIS.
These programs are the Chesapeake Bay Program, Puget Sound Estuary Program, Albemarle-Pamlico Estuarine Study, Narragansett Bay Project, and Buzzards Bay Project.

TABLE 1. MATRIX SHOWING SYSTEM FEATURES COMMON TO THE SIX TIER 1 ESTUARY PROGRAMS AND THE CHESAPEAKE BAY PROGRAM.

Feature    CBP PSEP APES NBP LISS BBP SFEP
Mainframe computer with networked PCs and graphics terminals    X X X
Personal computers    X
MicroVAX    X X X X
Sun workstations    X
EPA network    X X
SAS statistical software    Xa [illegible] [illegible] X
GIS with ARC/INFO    X X Xa X X
dBASE IV software    Xa
FOCUS database management system    Xa
ORACLE database management system    Xa
INFO/DB+ software
Custom integration software    X X
Image processing software    Xp
Model of estuary    X X
Index    X X X X
Data formatting tools    X X
Data accessing tools    X X X
Transfer format documentation    X X X
Data dictionary    X X X X
Computerized QA/QC routines    X X X
Bibliography    X X X
Text-based information system    X X
Dial-in access    X Xp Xp Xp
Data in ODES
Data at NCC    X X X X X

a Used as DIMS.
b Used for statistical analysis, not database management.
Xp Planned feature.

All five use ARC/INFO as their GIS software. Each program uses its GIS for specific needs, which are not necessarily similar across programs. All of them use the GIS to present spatial information, such as simultaneous overlays of ecological resource maps and observed contaminant levels. A few programs use the GIS to help assemble data sets by using maps to determine or verify sampling station locations. Battelle did not gather sufficient information to characterize how Tier I programs may be using GIS to conduct spatial analyses, such as quantifying changes in habitat area or proximity to incidences of contaminant levels that violate EPA criteria. Spatial analysis should be a major value that GIS implementation adds to the NEPs, provided that the underlying data are rigorous enough to support the analyses.
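The exceedance maps mentioned for the BBP and the proximity analyses mentioned above both start from simple station-level calculations that a GIS would then display. The Python sketch below is a hypothetical illustration, not any program's method: it computes, for each station, the fraction of measurements above a threshold, and lists stations within a given distance of a point using the haversine great-circle formula on a spherical Earth. The station IDs, coordinates, measurements, and threshold are invented and do not represent an EPA criterion.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def exceedance_frequency(measurements: Iterable[Tuple[str, float]],
                         threshold: float) -> Dict[str, float]:
    """Fraction of each station's measurements that exceed the threshold."""
    total: Dict[str, int] = defaultdict(int)
    over: Dict[str, int] = defaultdict(int)
    for station, value in measurements:
        total[station] += 1
        if value > threshold:
            over[station] += 1
    return {s: over[s] / total[s] for s in total}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def stations_near(point: Tuple[float, float],
                  stations: Dict[str, Tuple[float, float]],
                  radius_km: float) -> List[str]:
    """Sorted IDs of stations within radius_km of point (lat, lon)."""
    lat, lon = point
    return sorted(s for s, (slat, slon) in stations.items()
                  if haversine_km(lat, lon, slat, slon) <= radius_km)


if __name__ == "__main__":
    # Invented measurements and threshold; not real Buzzards Bay data or an EPA criterion.
    obs = [("BB-01", 0.9), ("BB-01", 1.4), ("BB-02", 0.3), ("BB-02", 0.4)]
    print(exceedance_frequency(obs, threshold=1.0))
    stations = {"BB-01": (41.60, -70.80), "BB-02": (41.55, -70.70)}
    print(stations_near((41.60, -70.80), stations, radius_km=5.0))
```

A GIS would typically symbolize the per-station fractions by class on a map; the calculation itself is only as reliable as the underlying station locations and measurements.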
Understanding the spatial analyses possible with a GIS can provide integral feedback to help NEPs decide which types of data to gather. Whereas a GIS can be helpful in planning data collection, an index of data or information is essential to using what is already available. Data or information indices were built by five programs (CBP, APES, NBP, BBP, and SFEP) and included as part of the DIMS, with varying levels of detail. It is unclear how many of these indices will be updated as new entries are generated. Dial-in access will eventually be available for four programs: APES, NBP, and LISS anticipate developing dial-in access, whereas CBP's system is already available.

Differences in DIMS features between NEPs arose for many reasons, including:
(1) Priorities of the management, technical, and citizens advisory committees
(2) Decisions to build on existing State agency infrastructures, including staff, software, and/or hardware
(3) Adoption of a decentralized approach to data handling/analyses
(4) Selection of different data management software and/or hardware
(5) Different uses of previously available data management support
(6) Emphasis on information dissemination in lieu of data management
(7) Funding levels and sources

Tier I DIMS are implemented on hardware ranging from mainframes and minicomputers to microcomputers. The hardware platform employed has depended on the system needed, the financial resources committed, the development staff involved, and the objectives for meeting end users' needs. MicroVAX computers are currently used by two programs (NBP and SFEP). The BBP plans eventually to move to a VAX at the State environmental agency's data center. Two DIMS are part of the EPA communications network: the CBP DIMS is a node on the network, and the NBP DIMS is interfaced to the computer system at EPA's National Research Laboratory, which is also a node on the EPA network.
The six programs with a DIMS completed or in development chose six different software systems for database management. Two programs (CBP and APES) that employ custom integrated software are supported by large data management staffs. SFEP is the only program that developed an information management system. Having different database management software does not prevent programs from maintaining consistent data sets: proper use of any database management system allows data to be transferred between programs even when the original database structures differ. More important to data consistency are policies or standards for data encoding. Few Tier I NEPs had program-wide standards for encoding data. Those programs that had standards (NBP, PSEP, and CBP) were able to integrate separate data sources more completely than programs where data compilation was not coordinated by standards. Tier II NEPs should learn from the successes of earlier NEPs that were able to adopt program-wide encoding standards. ODES is an example of a large data management system that provides inherent value to Tier II NEPs because it provides these fundamental standards (referred to as "formats") for encoding new data. Other programs, such as PSEP's and CBP's systems, also benefit from being able to consider uses for data instead of reinvesting in reformatting exercises each time new data combinations are needed.

The final aspect of this summary is to review the types of users that each system is intended to serve. Table 2 shows the users served by each program's DIMS, distinguishing between two ways a DIMS can be operated to serve a given data need: online and on demand.

TABLE 2. MATRIX SHOWING USER ACCESS FOR THE SIX TIER 1 ESTUARY PROGRAMS AND THE CHESAPEAKE BAY PROGRAM.

Users    CBP PSEP APES NBP LISS BBP SFEP
Online
  State & Federal agencies    X X X X X
  Scientific community    X [illegible] X
  Public    X X ?
On demand
  State & Federal agencies    X X X X X
  Scientific community    X X X X
  Public    X X

Online uses denote DIMS that can be operated directly by the user group; for the public, this means that a citizen could gain direct access to the DIMS contents. On-demand uses refer to DIMS that could be operated by a system analyst on behalf of the given type of user.

All the programs interviewed, with the exception of LISS, support data needs for State and Federal agencies. Five programs (CBP, PSEP, NBP, BBP, and SFEP) support data requirements for the scientific community. Only CBP and SFEP provide direct access for the scientific community, and these two programs also support direct access for the public.

In summary, the Tier I DIMS contain a variety of features that serve diverse user needs. In planning a Tier II DIMS, it is important to consider these efforts and successes. In addition, proper DIMS design and implementation must take into account the long-term budgetary commitments required, system development time constraints, and the level of quality assurance expected of the DIMS contents.