United States Environmental Protection Agency
Office of Water (WH-556F)
EPA 503/6-91/001
November 1990
EPA National Estuary Program
Tier I Data Management

-------
FINAL REPORT
on
NATIONAL ESTUARY PROGRAM
TIER 1 DATA MANAGEMENT SYSTEMS SUMMARY
U.S. Environmental Protection Agency
Office of Water
Office of Marine and Estuarine Protection
October 19, 1990

-------
TABLE OF CONTENTS
1.0 INTRODUCTION
1.1	DIMS OBJECTIVES AND DESIGN ALTERNATIVES
1.2	EPA'S NATIONAL SYSTEMS
1.3	BASIS FOR DIMS PLANS
2.0 METHODS
3.0 DIMS DESCRIPTIONS
3.1	CHESAPEAKE BAY PROGRAM
3.1.1	Individual Roles
3.1.2	Major Features
3.1.3	DIMS Users
3.1.4	Tools or Products Available
3.1.5	QA/QC Techniques
3.1.6	Estimated Costs
3.1.7	System Development Milestones/Schedule
3.1.8	Recommendations
3.2	PUGET SOUND ESTUARY PROGRAM
3.2.1	Individual Roles
3.2.2	Major Features
3.2.3	DIMS Users
3.2.4	Tools or Products Available
3.2.5	QA/QC Techniques
3.2.6	Estimated Costs
3.2.7	System Development Milestones/Schedule
3.2.8	Recommendations
3.3	ALBEMARLE-PAMLICO ESTUARINE STUDY
3.3.1	Individual Roles
3.3.2	Major Features
3.3.3	DIMS Users
3.3.4	Tools or Products Available
3.3.5	QA/QC Techniques
3.3.6	Estimated Costs
3.3.7	System Development Milestones/Schedule
3.3.8	Recommendations
3.4	NARRAGANSETT BAY PROJECT
3.4.1	Individual Roles
3.4.2	Major Features
3.4.3	DIMS Users
3.4.4	Tools or Products Available
3.4.5	QA/QC Techniques
3.4.6	Estimated Costs
3.4.7	System Development Milestones/Schedule
3.4.8	Recommendations
3.5	LONG ISLAND STUDY
3.6	BUZZARDS BAY PROJECT
3.6.1	Individual Roles
3.6.2	Major Features
3.6.3	DIMS Users
3.6.4	Tools or Products Available
3.6.5	QA/QC Techniques
3.6.6	Estimated Costs
3.6.7	System Development Milestones/Schedule
3.7	SAN FRANCISCO ESTUARY PROGRAM
3.7.1	Individual Roles
3.7.2	Major Features
3.7.3	DIMS Users
3.7.4	Tools or Products Available
3.7.5	QA/QC Techniques
3.7.6	Estimated Costs
3.7.7	System Development Milestones/Schedule
3.7.8	Recommendations
4.0 SUMMARY

-------
LIST OF TABLES
TABLE 1. MATRIX SHOWING SYSTEM FEATURES COMMON TO THE SIX TIER 1
ESTUARY PROGRAMS AND CHESAPEAKE BAY PROGRAM
TABLE 2. MATRIX SHOWING USER ACCESS FOR THE SIX TIER 1 ESTUARY
PROGRAMS AND CHESAPEAKE BAY PROGRAM

-------
1.0 INTRODUCTION
Under the Clean Water Act, the Environmental Protection Agency (EPA) is
accountable for environmental data collected as part of the National Estuary
Program (NEP). Programs within the NEP are responsible for the management of
data relevant to their mission. To manage these data, NEPs must adopt a data
management strategy and implement data/information management systems (DIMS).
DIMS can include local computing resources as well as EPA's national computer
systems.
1.1 DIMS OBJECTIVES AND DESIGN ALTERNATIVES
DIMS often serve multiple objectives. One common objective is to provide a
centralized source of data and/or information that can be useful to a variety
of analysts or managers. Another objective of an NEP DIMS is to archive the
results of new studies conducted by the NEPs. Oftentimes, NEP DIMS are
expected to make data or information available to the general public. A
fourth possible objective of DIMS could be to provide online functions that
help users to analyze data. Each of these objectives can lead to different
system implementations.
In certain situations, a "centralized" DIMS can refer to a nationwide system
available to a broad range of potential users. A centralized DIMS can also
denote a system that serves an array of users within one region, such as the
community of scientists involved in an estuary program. During the early
stages of the NEP, Tier I programs were assisted in the automation of their
data into a centralized DIMS. At that time, efforts were made to load data
sets into individual data files that were stored as SAS data sets on the
National Computer Center (NCC). Although these data were centrally located,
they were intended to be used primarily by the contributing estuary programs.
In their original centralized disposition (i.e., in SAS on NCC), these data
sets required knowledge of SAS in order to retrieve and/or examine the data.
The prerequisite of SAS familiarity was too demanding for many and it prompted
two types of DIMS development reactions. In one scenario, data sets could be
incorporated into EPA's ODES system, which offered online functions for basic
data analyses without significant knowledge of the SAS software. In another
scenario, some Tier I programs chose to develop custom systems aimed at
serving the particular needs of their respective programs.
DIMS objectives other than centralized access are equally important and
influential in how DIMS are implemented. A top priority should be ensuring
that the DIMS will help scientists and managers to characterize the estuary.
Likewise, DIMS are intended to enable managers to analyze for trends that show
the effectiveness of the Comprehensive Conservation and Management Plans.
However, DIMS cannot be expected to produce the analyses. It is critical for
NEP data managers to recognize two distinct data management functions:
storage and analytical capability. Most DIMS are designed to store raw
information. Characterization and CCMP efforts can be served by storing and
retrieving information with a DIMS. To serve characterization or CCMP efforts
with data analysis functions requires fairly complex system designs.
System designs to serve data storage and/or analytical needs have evolved
during the course of the NEP. Some complex systems have been built, such as
the Chesapeake Bay Systems, to serve regional needs. Other regional systems
have been designed basically to store and disseminate information. Data
dissemination allows the users to analyze the data with their own facilities.
In many cases, these regional systems were developed because the SAS data sets
on NCC were not being utilized by the NEP scientists or managers.
Regional DIMS must also address needs that extend beyond the
immediate set of users. During 1990, EPA adopted an NEP Data Management
Policy that is intended to serve regional and national goals for the use of
data. This policy addresses how NEP data can be managed regionally while also
utilizing EPA's centralized data systems. This report reviews briefly how the
national and regional DIMS developed during the Tier I programs.
1.2 EPA'S NATIONAL SYSTEMS
At the beginning of the Tier I programs, there were many national data
systems, including EPA's STORET, BIOS, and ODES, as well as NOAA's NODC. At
that time, none of these systems was adapted specifically to the NEP. Each
stored data relevant to the NEP. However, the systems were dissimilar.
STORET focused primarily on contaminant levels. BIOS generically stored
biological measurements. NODC was intended to store oceanographic data.
These systems did not have very much analytical capability. ODES, which
employed formats modeled after NODC, was originally intended to store and
analyze data from 301(h) dischargers.
EPA has supported the development and initial use of the Ocean Data Evaluation
System (ODES) as a national data management system. ODES provides EPA with
accountability for the quality of data sets, as well as for data storage,
access, user support, system maintenance, and general analytical tools.
Recently, ODES functions have expanded to include data dissemination,
incorporation of data from other EPA databases, portability to Geographic
Information Systems, and user-friendly data entry modules. There are
extensive manuals to document data formats and utilities. Successful
implementation of ODES depends on continued commitment from EPA to support
this centralized approach to data management, and effective education of
prospective users such as the NEP Management Conferences.
1.3 BASIS FOR DIMS PLANS
Each NEP Management Conference should implement the most effective, cost-
efficient, and timely means of handling its data and information. Proper
data/information handling is essential for estuarywide status and trends
reports, characterization efforts, and use in executing the Comprehensive
Conservation and Management Plan (CCMP). The NEP Management Conference method
of handling data/information should adhere to, and benefit from, the NEP Data
Management Policy as established by EPA's Office of Marine and Estuarine
Protection (OMEP). To achieve this objective, NEP Management Conferences
should consider the variety of user needs, technical requirements, and
alternatives for implementing DIMS.
This report describes the DIMS used by Tier 1 estuary programs. The DIMS
descriptions provide information on the computing resources utilized, as well
as on the factors essential to data management such as staffing and Quality
Assurance/Quality Control (QA/QC). It is hoped that Tier 2 estuary programs
can adopt some of the successful ingredients of Tier 1 systems.
2.0 METHODS
Representatives from the six original estuaries designated to the NEP (Puget
Sound, WA; San Francisco Bay, CA; Albemarle-Pamlico Sounds, NC; Long Island
Sound, NY and CT; Narragansett Bay, RI; and Buzzards Bay, MA) and the
Chesapeake Bay Program were contacted to provide information for this report.
This report serves as a vehicle of technology transfer for data-management
strategies from established Tier 1 programs to newer Tier 2 programs.
Individuals responsible for data management activities in the seven programs
were interviewed by telephone.
Appropriate interviewees were identified through discussions with regional EPA
program coordinators and regional data managers (listed in the table below),
as well as through Battelle's participation in Tier 1 programs.
Tier 1 Program                        Data Management Coordinator           Telephone
Chesapeake Bay Program                Lowell Bahner                         (301) 266-6873
                                      Senior Computer Scientist
Puget Sound Estuary Program           Roberta Feins                         (206) 464-7320
                                      Scientists/Systems Analyst
Albemarle-Pamlico Estuarine Project   Karen Siderelis                       (919) 733-2090
                                      Director, Center for GIS & Analysis
Narragansett Bay Project              Stephen Hale                          (401) 792-6617
                                                                            (401) 277-3165
Long Island Sound Study               Jane Copeland                         (401) 782-3168
                                      Minicomputer Specialist
                                      Barbara Finazzo                       (212) 264-8968
                                      EPA Region II
Buzzards Bay Project                  Neil MacGaffey                        (508) 748-3600
                                      Database Administrator
San Francisco Bay Estuary Project     Mark Flachsbart                       (415) 464-7990
                                      DIMS Representative
                                      Thomas Gulbransen                     (617) 934-0571
                                      DIMS Administrator
During the interview discussions, information on the following major topics
was solicited:
•	Individual roles of key DIMS personnel
•	DIMS major features
•	DIMS users
•	Tools or products available from DIMS
•	Quality Assurance/Quality Control techniques
•	Estimated costs
•	System development milestones/schedule
•	Recommendations
The Data and Information Management System approaches vary greatly across the
Tier I National Estuary Programs. While some NEPs have implemented formal
DIMS with standardized procedures, other programs are still developing their
DIMS and basic data management strategies. It is important to note that a
basic data management strategy, such as adopting a centralized approach to
data compilation and storage, is a fundamental element of NEP data management.
Such a strategy is a precursor to DIMS development. DIMS development should
be based on an agreed-upon data management strategy. This report attempts to
include any available elements of Tier 1 data management strategies. However,
the emphasis of this report is to provide functional descriptions of the
existing DIMS.
3.0 DIMS DESCRIPTIONS
The DIMS descriptions are presented below for each Tier I NEP individually.
Within each description, information is presented for the major topics that
are relevant.
3.1 CHESAPEAKE BAY PROGRAM
The Chesapeake Bay Program (CBP) often is held up as the model estuarine
resource management program. The program has a higher level of funding than
other programs in the NEP and thereby has DIMS budget constraints that are
very different from those of Tier 2 programs. The CBP may procure equipment
and dedicate staff necessary to run a large data management operation. In
addition, the CBP has been active for nearly twice as long as the Tier I NEPs.
3.1.1 Individual Roles
The CBP is supported by a data management staff of 25 individuals. All staff
members are trained to program in the SAS statistical package that serves as
the data management software. This helps to minimize retraining, which might
otherwise be necessary because of staff transfers or attrition. The DIMS
staff at CBP includes two full-time modelers. Additionally, six staff members
are trained in geographic information processing with ARC/INFO, and three of
these are dedicated to it. Geographic Information System (GIS) trained staff
is expected to increase to 12 within a few months.
Each individual study within the program (i.e., water quality monitoring,
point source discharges, etc.) has one or two staff dedicated to managing
data/results. These people are responsible for interacting with the State
agencies that collect the data to resolve data submission issues. They are
also responsible for collecting documentation for data received from State
agencies.
3.1.2 Major Features
The CBP DIMS is run on a VAX 8600 computer. Sixty terminals, including Digital
Equipment Corporation, Macintosh, and DOS PCs and Tektronix graphic terminals,
are interfaced to the main computer. The computer system is interfaced to
EPA's wide area communications network. The system includes a variety of
plotters and printers.
System software includes SAS, ARC/INFO, and Digital Command Language (DCL).
Program staff have developed custom DCL programs to run SAS and ARC/INFO
interactively. The Chesapeake Bay Program uses SAS as its data management
system. This was considered the best available option for Chesapeake Bay,
based on the data management systems available to Federal programs at the time
of initial development. A Fortran-based watershed model of Chesapeake Bay is
maintained by data management staff. The model is linked to ARC/INFO
Geographic Information System (GIS) to display spatial and temporal output. A
three-dimensional time-variant model of Chesapeake Bay is currently under
development. It will run on the EPA Cray supercomputer in FY92.
3.1.3 DIMS Users
Access to the Chesapeake Bay DIMS is widely available. The system is
accessible not only by State and Federal agencies collecting data but also by
the research community. The public also has access to the system via modems.
Tools have been developed to make the system user-friendly.
3.1.4 Tools or Products Available
The Chesapeake Bay Program staff has developed several custom tools to assist
users and data management staff.
•	MONITOR reformats data received from various state organizations
into a standard format so that it can be added to the SAS monitoring
database.
•	CHESSEE is an index tool for the DIMS. CHESSEE serves as a card
catalog supplying summary information of all the data stored in the
database. Queries for summary information are made by using
keywords. CHESSEE is available to the public via modems.
•	BAYSTAT allows researchers and computer scientists access to the raw
data instead of to summary information. Menus are used to select
the type of analysis to be performed on specific types of data. The
analysis may be printed out in hard copy or viewed on screen.
•	HISTATS is used to access the historical database maintained by the
Chesapeake Bay Program. Due to gaps in earlier data, HISTATS stores
data by decade.
•	QAQCSTATS provides user access to the QA/QC database.
•	VOLUME is Fortran-based software that models Chesapeake Bay water
quality in three dimensions. The results are mapped in color and as
videos which show how water quality changes through time.
Standard tools for the GIS have not been developed due to the diversity of
requests for analyses. Each GIS project is supported with custom-developed
procedures.
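The kind of reformatting that MONITOR performs can be pictured with a minimal sketch. The report does not describe MONITOR's actual input or output layouts, and the CBP tools were built around SAS and DCL rather than Python, so the agency names, field names, and station codes below are all hypothetical.

```python
# Hypothetical per-agency layouts: each maps an agency's own column names
# to the standard names. None of these names come from CBP documents.
FIELD_MAPS = {
    "agency_a": {"STA": "station", "SAMPDATE": "date", "DO_MGL": "dissolved_oxygen_mg_l"},
    "agency_b": {"site_id": "station", "collected": "date", "do_value": "dissolved_oxygen_mg_l"},
}

# Standard layout (also hypothetical) that every record is converted to.
STANDARD_FIELDS = ("station", "date", "dissolved_oxygen_mg_l")


def to_standard(agency, record):
    """Rename one agency record's fields to the standard layout."""
    field_map = FIELD_MAPS[agency]
    renamed = {field_map[name]: value for name, value in record.items() if name in field_map}
    missing = [field for field in STANDARD_FIELDS if field not in renamed]
    if missing:
        raise ValueError(f"{agency} record is missing fields: {missing}")
    # Emit fields in a fixed order so every record looks the same downstream.
    return {field: renamed[field] for field in STANDARD_FIELDS}


if __name__ == "__main__":
    print(to_standard("agency_a", {"STA": "ST01", "SAMPDATE": "1990-06-12", "DO_MGL": 7.4}))
    print(to_standard("agency_b", {"site_id": "ST02", "collected": "1990-06-13", "do_value": 6.9}))
```

Keeping the per-agency mappings in one table means that adding a new contributor only requires a new mapping entry, not new conversion logic.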
3.1.5	QA/QC Techniques
The Chesapeake Bay Program has a full-time data QA/QC officer. The program
feels that this is essential, especially with monitoring data. Since data
collection is such a large effort, a full-time QA/QC officer is warranted.
The QA Officer ensures that data collection and reporting procedures include
measures to increase and document the quality of the data being produced. The
program was without a QA/QC officer for 4 years and ran into some difficulties
with misreported data. Significant time and money were needed to recompile and
ensure the quality of retrospective data through examination of laboratory
records.
3.1.6	Estimated Costs
Approximately 21 percent of the individual study (i.e., water quality
monitoring projects, point source discharges projects) budget is allocated to
data management activities.
3.1.7 System Development Milestones/Schedule
Chesapeake Bay data management staff recommend that first-year data management
activities include the establishment of quality guidelines for data
submission. This requires negotiation with State agencies, the scientific
community, and management. While developing these guidelines, a feasible
scope of work under the proposed standards should be defined. The hardware
and software design of the system can be resolved after making decisions
necessary to implement the program. The Chesapeake Bay DIMS is still being
enhanced, especially GIS-based components.
3.1.8 Recommendations
Chesapeake Bay data management staff stressed the need for on-site computer
resources. This was deemed essential because of turnaround time. Also,
computing power from a shared database system was preferred over unlinked
resources obtained through independent microcomputers. The need for a QA/QC
person was also stressed.
The Chesapeake Bay Program also recommends data management plans for each
monitoring study in the program. Chesapeake Bay Program data management plans
were developed as guidelines for State and Federal agencies for data
submission. The plans also contain a complete data dictionary and all QA/QC
procedures. Program staff recommended that a standard be established
throughout all National Estuary Programs. The Chesapeake Bay documents are
available to other programs. The Chesapeake Bay staff expressed willingness
to assist other programs with data-management strategy development and system
planning by conducting on-site demonstrations or conferences.
3.2 PUGET SOUND ESTUARY PROGRAM
The Puget Sound Water Quality Authority was created by State mandate to
develop a management plan for the estuary. The Authority, EPA, and the
Washington Department of Ecology joined efforts under the NEP and created the
Puget Sound Estuary Program (PSEP). As a result, the PSEP has strong State
and local support.
The Puget Sound Ambient Monitoring Program (PSAMP) is one portion of the PSEP.
PSAMP is designed to be a long-term comprehensive monitoring program for the
Sound. A data management system has been developed to handle PSAMP data.
This data management system was developed with funding from the EPA. Actual
monitoring activities and subsequent data management costs are funded by the
State of Washington. The data management system is coordinated by Puget Sound
Water Quality Authority personnel.
The data management system developed for PSAMP is designed to store and manage
data from PSAMP monitoring, as well as additional data of value to the
assessment of environmental conditions in the Sound. These additional data
may be from PSEP, or from other agencies. Each agency conducting monitoring
can determine what other data should be included in its own monitoring
database; the PSAMP Steering Committee assesses what additional data should be
included in the PSAMP central database. PSEP contracts now require that all
data be submitted in PSAMP data transfer format, so that they can be loaded
into the PSAMP central database. From the central database, PSEP data
collected in 1990 will be transferred to OMEP in ODES format.
3.2.1 Individual Roles
PSAMP staff at the Puget Sound Water Quality Authority are involved in
coordination of the monitoring program. For several years, there was one
full-time staff member responsible for assessing data management needs and
developing a data management approach. A programmer was added to the staff
within the last year. Much of the earlier data management effort was
performed by contractor staff who collected, digitized, and archived
historical data. Other state agencies also have staff or contractors involved
in development of their distributed databases.
Ongoing PSAMP data management at the Water Quality Authority will require one
full-time data manager, and one staff member for data analyses and report
preparation. An additional 3.5 staff will be needed to manage the data in the
state agencies doing the monitoring. The monitoring agencies are required to
produce yearly reports of monitoring results, and the central staff will
prepare a yearly summary.
3.2.2 Major Features
The PSEP data management strategy is based on a distributed system approach.
Raw monitoring data are the responsibility of the agency that collects them.
The Puget Sound Water Quality Authority has developed a central database for
storing summary data from the agencies involved in the monitoring program.
The central database is maintained on a PC using dBASE IV software. No
software or database requirements have been imposed on the agencies that
manage the raw data. A data management strategy document was produced that
outlines strict transfer formats for data transmittal to the central system.
These same formats can be used for data exchange among agencies. The data
management staff recommend that these formats be used by other studies in the
PSEP. The Puget Sound Water Quality Authority is developing a GIS with other
state agencies that run ARC/INFO software.
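The split between agency-held raw data and a central summary database can be illustrated with a short sketch. The report does not say how PSAMP summary data were computed, and the central database was built in dBASE IV, not Python; the record fields and the choice of count, mean, minimum, and maximum as summary statistics are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize(raw_records):
    """Collapse raw measurements into one summary row per
    (agency, station, parameter). Field names are hypothetical."""
    groups = defaultdict(list)
    for rec in raw_records:
        key = (rec["agency"], rec["station"], rec["parameter"])
        groups[key].append(rec["value"])
    return [
        {
            "agency": agency,
            "station": station,
            "parameter": parameter,
            "n": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for (agency, station, parameter), values in sorted(groups.items())
    ]


if __name__ == "__main__":
    # Raw records stay with the collecting agency; only the summary rows
    # would be sent on to the central database.
    raw = [
        {"agency": "agency_a", "station": "P1", "parameter": "salinity_ppt", "value": 28.0},
        {"agency": "agency_a", "station": "P1", "parameter": "salinity_ppt", "value": 30.0},
        {"agency": "agency_b", "station": "P7", "parameter": "temperature_c", "value": 11.5},
    ]
    for row in summarize(raw):
        print(row)
```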
3.2.3 DIMS Users
The PSEP explored the option of developing a networked DIMS that would allow
multiuser access to the system. The networked system was not developed
because of budget constraints. The agencies that perform monitoring studies
have access to the DIMS, but the scientific community and public do not and
must request data from the Puget Sound Water Quality Authority. The Authority
is responsible for filling data requests and providing information in the form
of reports. To date, this has been a very minor task.
3.2.4 Tools or Products Available
The Puget Sound Estuary Program has developed the following tools for getting
information about the Sound:
• Computerized bibliography called SOUND ACCESS
•	Library, maintained by a contractor
•	PSAMP central database
•	Puget Sound Environmental Atlas
SOUND ACCESS was installed at the University of Washington library to provide
greater public access. Keywords are used to select information about Puget
Sound. As yet, the system is not widely distributed. SOUND ACCESS is
scheduled to be updated this year by the Puget Sound Water Quality Authority.
The central database, PSAMP, was written in dBASE IV (source code available).
An important element in PSAMP is the systematic set of formats that have been
established for data transfer or exchange. An extensive set of documents has
been produced to describe PSAMP components.
The program has also produced the Puget Sound Environmental Atlas. The Atlas
is a hard-copy two-volume set of maps that show the Sound's resources and
pollutant levels. The GIS will be used to update the Atlas.
3.2.5	QA/QC Techniques
Agencies that carry out environmental monitoring as part of the Puget Sound
Ambient Monitoring Program must develop a QA plan for data management as well
as for sampling. The QA plan is required to include PSAMP protocols for
sample analysis and QA/QC. Technical review of the data is the responsibility
of each agency. Data submitted to the central database are checked for
required fields and value ranges. Qualifiers have been added to the database
to describe the level of QA/QC for individual data records. This allows users
to omit questionable data points for status and trends analyses.
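A minimal sketch of the checks described above (required fields, value ranges, and QA/QC qualifiers) follows. The report does not list the actual PSAMP required fields, range limits, or qualifier codes, so every field name, limit, and code here is a hypothetical stand-in.

```python
# All names, limits, and codes below are hypothetical placeholders.
REQUIRED_FIELDS = ("station", "date", "parameter", "value")
VALUE_RANGES = {
    "salinity_ppt": (0.0, 40.0),
    "temperature_c": (-2.0, 35.0),
}


def check_record(record):
    """Return a list of problems found in one submitted record."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS
                if record.get(field) in (None, "")]
    if problems:
        return problems
    limits = VALUE_RANGES.get(record["parameter"])
    if limits is not None:
        low, high = limits
        if not low <= record["value"] <= high:
            problems.append(f"value {record['value']} outside {low}-{high}")
    return problems


def usable_for_trends(records, accepted=("A",)):
    """Keep only records whose qualifier marks them as accepted."""
    return [rec for rec in records if rec.get("qualifier") in accepted]


if __name__ == "__main__":
    good = {"station": "P1", "date": "1990-05-01", "parameter": "salinity_ppt",
            "value": 28.5, "qualifier": "A"}
    suspect = dict(good, value=55.0, qualifier="Q")
    print(check_record(good))
    print(check_record(suspect))
    print(len(usable_for_trends([good, suspect])))
```

Storing the qualifier with each record, rather than deleting doubtful values, leaves the decision to exclude them with the analyst.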
3.2.6	Estimated Costs
A set percentage of funding for data management has not been established for
PSEP. The staff estimates that about $200,000 was spent over 3 years on
salaries, software, applications development, and hardware for the central
system. An additional $30,000 was spent on the needs assessment and $20,000
on the SOUND ACCESS tool. There are also costs spread across the individual
agencies that enter and maintain their distributed databases. Data management
activities are funded at approximately 10 percent of the requested budget.
3.2.7 System Development Milestones/Schedule
A data management strategy for the PSEP was developed over the course of 1
year. The goals and objectives of the data management system were defined
through user needs assessment. Alternative systems were brought before the
Monitoring Management Committee. The committee decided to develop a
distributed database system. Documentation on system design and data formats
was prepared during the same year. System implementation has taken an
additional 1.5 years.
Implementation of the GIS is on a 2-year schedule. Existing data from the
1987 Environmental Atlas are being converted into GIS format in 1990. The
Atlas will be updated during the second year. During 1990, the data
management staff plans to examine historical data for Puget Sound. A list of
potential studies will be presented to the Steering Committee. The committee
is responsible for the final selection of studies to be added to the central
database and used for the Atlas update.
3.2.8 Recommendations
The PSEP staff cautions that the data-management planning process should rest
on realistic expectations about budget and results. In programs of this type, there is
often conflicting opinion on spending money on collecting new samples versus
managing existing data.
The distributed approach is useful because it tasks the agency collecting the
data with the responsibility for all aspects of data submission and
formatting. This ensures better data documentation, and encourages
"ownership" and use of the data by agencies. The PSEP staff expressed
willingness to work with new programs on data management issues.
3.3 ALBEMARLE-PAMLICO ESTUARINE STUDY
The Albemarle-Pamlico Estuarine Study (APES) has combined its data management
efforts with those of the State of North Carolina. By doing this, the study
benefits technically, financially, and with respect to implementation
schedules.
3.3.1 Individual Roles
The APES has one full-time data coordinator position. The coordinator, who is
responsible for all aspects of data management for the program, does not
actually perform the data manipulations. Presently, this position is open.
Early in the program, the APES Management Conference decided to link APES data
management efforts with State efforts. The North Carolina Center for
Geographic Information and Analysis (formerly Land Resources Information
Service) is the office that handles State data management and GIS efforts.
The Center is organized as a service bureau and easily fits into the role of
working with the APES. Since the State had staff and hardware in place,
minimal training was necessary for the staff to handle NEP data. The Center
has 20 individuals on staff.
3.3.2 Major Features
At the beginning of the APES, the existing hardware and software at the Center
for Geographic Information and Analysis included a Data General minicomputer
that used ARC/INFO as the GIS. Since then, the minicomputer has been replaced
with Sun workstations. It was decided early in the APES program to use a GIS
as an integrated database and to combine efforts with the existing State GIS.
The GIS does not serve as a major comprehensive database, but as a selective
database that can be integrated with other systems such as EPA's STORET. APES
data management staff is focusing on the use of particular data and is not
building a regional system for computerized archiving of program data. The
Center is in the process of adding ERDAS Image Processing software and is also
acquiring SAS statistical software.
3.3.3 DIMS Users
The APES data management staff envisions three major user groups of the APES
DIMS. These groups include resource managers (State and Federal agencies),
researchers (universities and contractors performing projects under APES), and
others (private citizens and industries).
Resource managers are expected to be the major users of the DIMS. Currently,
the system is used by State agencies via either direct access or data
requests. Direct access is limited because the software for dialing in is not
fully operational. Data from APES research projects have been added to the
system but researchers have not been active users. The data management staff
hopes to see an increase in users from the research community over the next
few months. The only public exposure to the DIMS has been through GIS
presentations for public meetings.
3.3.4 Tools or Products Available
A custom user interface to ARC/INFO is being developed to make the database
more accessible to resource management agencies. The GIS has been used to
provide acreage data, a preliminary status and trends report, and maps. Since
these GIS outputs are specialized, their production has not been automated
into a fixed set of tools. The APES is developing software to integrate the
GIS with other systems.
Efforts to build a data inventory or index are under way. Not all data
referenced in the inventory are being actively managed by the program. A data
dictionary is being developed to describe each data set.
3.3.5 QA/QC Techniques
QA/QC procedures are in place only for cartographic data. APES is documenting
sources, dates, and limitations of data sets but no effort is being made to
correct existing data.
3.3.6 Estimated Costs
Twenty percent of program funds are used for data management activities. This
figure is seen as recognition of the importance of data management by the APES
Management Conference. This amounts to approximately $150,000 per year,
including the salary of the data coordinator.
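The two figures above allow a rough back-calculation of the program's annual funds. The report does not state that total; the short sketch below simply divides the quoted cost by the quoted share, and because both inputs are rounded in the source, the result is only an approximation. The function name is illustrative.

```python
def implied_total(data_mgmt_cost, share_percent):
    """Back out total program funds from a data management cost
    and the percentage of funds it represents."""
    if share_percent <= 0:
        raise ValueError("share_percent must be positive")
    return data_mgmt_cost * 100 / share_percent


if __name__ == "__main__":
    # $150,000 per year at 20 percent of program funds (both from the text).
    total = implied_total(150_000, 20)
    print(f"Implied APES program funds: about ${total:,.0f} per year")
```

On these figures, the implied program funding is about $750,000 per year.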
3.3.7 System Development Milestones/Schedule
Within 6-9 months, a working version of the access interface software should
be completed. The database is scheduled to be fully designed during 1990.
The database may not be fully populated before the program is over.
3.3.8 Recommendations
Staff involved in APES data management encourages other programs to look at
existing computer resources. If APES had started from scratch, 3-4 people
would have been needed for data entry, liaison, etc. NEP program data
management cannot be handled by one staff person unless work is contracted
out.
APES ran into problems in the beginning because of a delay in making decisions
while trying to hire a data coordinator. The hiring process took more time
than expected, and the new hire left the position after 6 months. Meanwhile,
the program delayed a formal needs assessment and database design. The needs
assessment and database design were not agreed upon and finalized early in the
program. The DIMS remained active because some data had been entered into the
GIS. The program feels that a GIS is a good data management option. The
mapping versatility gains public support and interest.
3.4 NARRAGANSETT BAY PROJECT
The Narragansett Bay Project (NBP) has benefited from the presence of EPA's
National Research Laboratory at Narragansett and a strong State interest in
the Bay. Data management efforts have been enhanced by the availability of
existing data management systems operated by EPA and the State.
3.4.1 Individual Roles
Data management staff for the NBP include a data management coordinator, two
FOCUS programmers, and an ARC/INFO programmer. The senior-level FOCUS
programmer is responsible for developing the database system and the junior-
level FOCUS programmer is setting up a data library and tracking down data.
The Project is guided by the Data Management Working Group, a sub-group of the
Science and Technical Committee.
3.4.2 Major Features
The NBP DIMS uses FOCUS (an EPA and RI Department of Environmental Management
standard) data management software on a microVAX computer. The microVAX is
networked to EPA's National Research Laboratory VAX 785 computer, which is a
node on the EPA national network. The NBP DIMS, called the Narragansett Bay
Data System (NBDS), features an on-line index to all data on the system as
well as to certain relevant data from other data systems. The user can dial
up the database from a terminal or a personal computer equipped with a modem.
The system is menu-driven, which means that the user makes choices from a tree
of menus. This makes it fairly easy for almost everyone to use, including
people with little computer experience. Users have direct access to the data
contained in the system. Various formats, code tables, and naming conventions
were borrowed from the CBP, NODC, and ODES to try to obtain some degree of
inter-estuarine compatibility.
The NBP DIMS also uses the Rhode Island GIS located on the University of Rhode
Island's Prime 6350. The GIS is implemented with ARC/INFO software.
Additional hardware includes IBM and Macintosh microcomputers. A data
transfer mechanism is available for downloading data from the database to the
GIS or microcomputers.
The Project funded the Pell Marine Science Library of the University of Rhode
Island's Graduate School of Oceanography to update the Bay Bib, a bibliography
of Bay-related publications. This will be available in Inmagic, a
microcomputer-based bibliographic software package. The NBDS, the GIS, and the Bay Bib
are cross-referenced by geographic regions, station IDs, and accession
numbers.
3.4.3 DIMS Users
A formal Survey of Potential Narragansett Bay Data System Users was conducted
to obtain input on the type of system to develop. The NBP DIMS is in the
development stage and not available for full online access. The data being
collected as part of the characterization study are being used by NBP staff,
the State, and the principal investigators. The user needs survey indicated
that the research community and State and Federal agencies need access to the
completed DIMS. Nonprofit environmental groups also requested access to the
system. The survey indicated some interest in a citizen's account, which the
general public could use to obtain summary information about the Bay and the
Project.
3.4.4 Tools or Products Available
The NBP data management system is in the development phase. The goal is to
have the FOCUS database online by the end of 1990. The program is also
developing an index to search by parameters, species, or areas to find
information concerning available data, collection dates and locations, and
collection agency. A major use of the data system will be to store and
analyze data obtained from the long-term monitoring program to see whether the
implemented management recommendations are having the desired effect in
improving Bay water quality.
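As a rough illustration of the kind of index lookup described above, the
following Python sketch filters a small catalog of data sets by parameter,
species, or area. The record fields, the sample entries, and the search_index
function are hypothetical and are not drawn from the NBP system, which is
being built in FOCUS.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One catalog record describing a data set (hypothetical fields)."""
    title: str
    agency: str
    start_year: int
    end_year: int
    areas: set = field(default_factory=set)
    parameters: set = field(default_factory=set)
    species: set = field(default_factory=set)


def search_index(entries, parameter=None, species=None, area=None):
    """Return the entries that match every criterion that was supplied."""
    hits = []
    for entry in entries:
        if parameter is not None and parameter not in entry.parameters:
            continue
        if species is not None and species not in entry.species:
            continue
        if area is not None and area not in entry.areas:
            continue
        hits.append(entry)
    return hits


# Illustrative entries only; these are not real data sets.
CATALOG = [
    IndexEntry("Upper Bay nutrient survey", "Agency A", 1985, 1988,
               areas={"Upper Bay"}, parameters={"nitrate", "phosphate"}),
    IndexEntry("Shellfish tissue metals", "Agency B", 1986, 1989,
               areas={"Upper Bay", "Lower Bay"}, parameters={"copper"},
               species={"quahog"}),
]

if __name__ == "__main__":
    for entry in search_index(CATALOG, area="Upper Bay", parameter="copper"):
        print(entry.title, entry.agency, entry.start_year, entry.end_year)
```

Each hit carries the collecting agency and the collection dates, so a user
can decide whether to request the underlying data.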
3.4.5 QA/QC Techniques
Data submitted to the NBP must be in accordance with the Narragansett Bay
Data System Data Submission Manual. The NBP has adopted the geographic data
standards used by the GIS center at the University of Rhode Island.
Monitoring data receive technical review through the peer review process
established by the NBP Management Conference. Agencies collecting data are
required to develop a QA document as part of the funding agreement between the
program and the agency. The FOCUS database has computerized routines for
checking formats, codes, frequencies, ranges, and character frequencies.
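The report does not describe how the FOCUS routines are written. The Python
sketch below only illustrates what format, code, range, and frequency checks
of this kind might look like. The station ID pattern, the parameter codes,
and the valid ranges are all hypothetical and are not taken from the
Narragansett Bay Data System Data Submission Manual.

```python
import re
from collections import Counter

# Hypothetical rules; the actual NBDS formats and code tables are not given here.
STATION_FORMAT = re.compile(r"^[A-Z]{2}\d{3}$")
VALID_RANGES = {"TEMP": (-2.0, 35.0), "SALIN": (0.0, 40.0), "DO": (0.0, 20.0)}


def check_record(record):
    """Return a list of problems found in one record (empty if it passes)."""
    problems = []
    # Format check: the station ID must match the expected pattern.
    if not STATION_FORMAT.match(str(record.get("station", ""))):
        problems.append("bad station format")
    # Code check: the parameter must appear in the code table.
    code = record.get("parameter")
    if code not in VALID_RANGES:
        problems.append("unknown parameter code")
        return problems  # no range can be tested without a known code
    # Range check: the value must be numeric and within the allowed limits.
    try:
        value = float(record.get("value"))
    except (TypeError, ValueError):
        problems.append("value is not numeric")
        return problems
    low, high = VALID_RANGES[code]
    if not low <= value <= high:
        problems.append("value out of range")
    return problems


def code_frequencies(records):
    """Count how often each parameter code occurs, to spot odd distributions."""
    return Counter(r.get("parameter") for r in records)
```

A submission could be screened by calling check_record on each row and
returning the rows with a non-empty problem list to the collecting agency.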
3.4.6 Estimated Costs
Data management costs for the NBP are difficult to estimate because of support
from EPA's Environmental Research Laboratory - Narragansett and the state.
Salary for four full-time staff is approximately $180,000 per year, but the
program has had four staff only in the past year. The program received a 2-year
grant ($240,000) to develop the data system. This included hardware,
software, and personnel costs.
3.4.7 System Development Milestones/Schedule
The NBP is working to create a system that has been designed around local
needs. Demonstrations of system prototypes and development are an important
part of the design process. Demonstrations are used to secure support for the
system and obtain input.
3.4.8 Recommendations
The data management staff for the NBP recommend using commercially available
database management systems. An index/catalog of available data is an
invaluable tool. Programs should explore resources that are available. The
NBP chose FOCUS as its data management system because it is used by EPA's
National Research Laboratory and the Rhode Island Department of Environmental
Management. Data management staff stressed the development of local systems
that have the financial support of the local community. A long-term
commitment is important. Mechanisms to keep the database up to date have to
be implemented.
The staff expressed the need for some national standardization of NEP policy.
New estuary programs can benefit from the use of various formats, code tables,
and so on (such as the NODC taxonomic codes and the Chemical Abstracts Service
chemical codes) which have become somewhat standard. The Chesapeake Bay data
management staff assisted the Narragansett Bay Project with its data
management efforts; the NBP has also corresponded with staff involved in the
Tier I estuary management efforts.
3.5 LONG ISLAND SOUND STUDY
The initial data management efforts for the Long Island Sound Study (LISS)
were the collection of historical data. Review of the historical data was
performed manually. The historical data were put online at EPA's NCC but the
general consensus was that the data were inaccessible with poor documentation
on quality assurance. The Management Conference decided that a DIMS was not
necessary in developing a characterization of Long Island Sound.
Two models are being developed by the LISS: a hydrodynamic model developed by
contractor support and a water quality model developed by the National
Oceanic and Atmospheric Administration. The completed models will be
coupled together. Both modeling efforts receive data directly from State
agencies and universities collecting data.
The Management Conference has decided that a DIMS is necessary for managing
monitoring data collected during the implementation of the CCMP. A data
management working group of 8-10 people has been tasked with developing a DIMS
for these monitoring data. This group has recently completed an extensive
survey of prospective users' needs. The results of the survey have been
compiled and are available from the LISS program office. To date, an actual
DIMS is not in place. During 1990, the working group expects to reach a
decision on its data management approach and hire a full-time data manager.
Efforts to implement a DIMS and compile data will begin after a staff person
is hired. The study anticipates using an existing data management system that
is easily accessible and meets the needs of the users.
LISS DIMS users will include State and Federal agencies and citizen groups.
The agencies will have access to the entire database to determine the success
of the CCMP. Citizen groups will access summary information.
The study is exploring the option of establishing a repository for the
historical data and the data collected as part of the characterization study.
The data repository may reside at Connecticut's environmental library.
3.6 BUZZARDS BAY PROJECT
The Buzzards Bay Project (BBP) is presently designing and implementing a DIMS.
The DIMS is being developed to support the Comprehensive Conservation and
Management Plan. Earlier data management efforts did not involve a DIMS, but
instead consisted of compiling separate data files on the EPA NCC. Active
development of the DIMS began when a data manager was hired.
3.6.1 Individual Roles
Presently, there is one full-time staff position dedicated to design,
implementation, and management of the DIMS. This database administrator leads
a working group of approximately 12 regional scientists, program managers, and
system administrators. The working group provides input to DIMS development
by identifying priorities, existing resources, and related activities. The
working group also provides feedback to DIMS planning by reviewing system
prototypes under development. DIMS plans are formally reviewed by state
personnel according to standards and criteria used to manage all software
development by state agencies. Protocols from Massachusetts' state geographic
information system group are being followed during development of the GIS
component of the BBP DIMS. State agencies also play a large role in
implementation of the Buzzards Bay DIMS because most historical data exist in
various agency systems. Likewise, the Buzzards Bay DIMS implementation will
be able to accommodate new data being collected by these organizations.
3.6.2 Major Features
The Buzzards Bay DIMS has two major components, a marine monitoring database
and a GIS. The marine database is being implemented with ORACLE database
software. The database will have an index function to reference both online
data and other data types collected by surveys that are not yet online. The
GIS is being developed in ARC/INFO. A direct link between these two
components is planned by using an ARC/INFO utility. System components are
being built and tested on a microcomputer.
The system plan calls for full-scale implementation to be on a VAX at the
State environmental agency's data center. Migration to the state's
minicomputer will occur when multiuser access is needed. Implementation on
the State's minicomputer will also enable direct integration with related
databases maintained by other groups within the State.
The Buzzards Bay DIMS is being implemented in ORACLE primarily because all
marine monitoring data collected in the State, either by BBP or regional
organizations, are being stored in ORACLE. ORACLE is also the State
environmental agency's standard database management software.
3.6.3 DIMS Users
Direct access to the Buzzards Bay DIMS is limited to the database
administrator. The database administrator fills any data requests from State
and Federal agencies, the scientific community, and citizens groups. Analyses
of these data are accomplished by users on their own computer systems.
3.6.4 Tools or Products Available
The DIMS being implemented for Buzzards Bay will be used to produce outputs to
support specific questions about estuarine conditions. The database will be
able to generate reports about particular stations or areas in the estuary.
Station-level data retrievals are well suited to report water quality data,
especially for sub-areas of the estuary that have been the subject of
concentrated surveying efforts. Reports about station level data are starting
to be produced and will be more extensive as the database expands. Spatial
presentation of marine data has been used to depict frequency of water quality
measurements exceeding given thresholds; these kinds of depictions, and also
spatial analysis, are among the short-term objectives of the data management
effort.
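The exceedance-frequency depiction mentioned above can be sketched in a few
lines of Python. This is only an illustration of the calculation, assuming
measurements arrive as (station, value) pairs. The function name, the sample
values, and the threshold are hypothetical, and the BBP system itself is
being built in ORACLE and ARC/INFO.

```python
from collections import defaultdict


def exceedance_frequency(measurements, threshold):
    """Return, per station, the fraction of measurements above the threshold.

    measurements is an iterable of (station_id, value) pairs.
    """
    totals = defaultdict(int)
    exceedances = defaultdict(int)
    for station, value in measurements:
        totals[station] += 1
        if value > threshold:
            exceedances[station] += 1
    return {station: exceedances[station] / totals[station] for station in totals}


if __name__ == "__main__":
    # Made-up values for two stations, for illustration only.
    sample = [("S1", 10), ("S1", 30), ("S2", 5), ("S2", 6)]
    for station, fraction in sorted(exceedance_frequency(sample, 14).items()):
        print(station, fraction)
```

The per-station fractions are the values a GIS would then shade or symbolize
on a map of the estuary.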
3.6.5 Quality Assurance/Quality Control Techniques
Water quality data for the BBP were examined for assurance of quality based on
review of historical data files and field sampling documentation. However, a
program-wide data quality control plan does not exist yet. Quality assurance
requirements for future data submissions and data recording have been
established. QA/QC will also be applied when Buzzards Bay data are submitted
for archival in ODES.
Some information relevant to quality assurance is stored as part of the index
to existing data sets. This information includes items such as data sources,
contact(s), sampling organization, analytical laboratory, and a bibliographic
reference.
3.6.6 Estimated Costs
For the first few years, data management efforts focused on compiling
historical data, and entering data into files at NCC. The contractor who
compiled these data was available to assist users in gaining access to the
files. The cost for these activities is not known because they were not
separate budget items. Recently, data management resources have been
redirected toward implementing the regional DIMS described above. The
two-year cost for DIMS implementation, including staff and equipment, is
approximately $150,000. Additional costs may be incurred beyond 1991, but
have not yet been budgeted.
3.6.7 System Development Milestones/Schedule
All system development has taken place over the past 18 months. Full
implementation is planned over the next few years; however, some detailed
steps are scheduled for the immediate future. Existing data sets are being
added to the DIMS. GIS data are being added to support program objectives,
and GIS products are regularly produced using existing MassGIS data. Other
data types, such as environmental conditions, physical/chemical
characteristics, and geographic data will be added to the DIMS individually
based on the priority data needs. Initial data loading is focusing on
nutrients, toxics, and pathogens.
A comprehensive survey of existing data sets has been completed. This
eight-page survey was completed for approximately 55 data sets. Comparable
information, prepared by a contractor, exists for an additional 50-60 data
sets.
3.7 SAN FRANCISCO ESTUARY PROGRAM
The San Francisco Estuary Program (SFEP) has developed a DIMS and produced raw
data files. The DIMS is intended to provide users with information such as
bibliographic references, indices of data sets, and testimonies. The DIMS is
not intended to provide users with data; select data are available from files
which reside on NCC. The SFEP Technical Advisory Committee decided that it
was in the best interest of the SFEP to focus on information management in the
DIMS. This early decision defined the need that the existing DIMS was built
to meet.
3.7.1 Individual Roles
The primary roles in SFEP DIMS implementation were program managers and system
developers. The program managers included an SFEP staff member and members of
the technical advisory committee. The system developers who provided input
into the DIMS development included contractors and a regional nonprofit
organization with DIMS development interests.
The role of the program managers was to define the objectives of the DIMS.
Once the objectives were defined, the program managers determined the most
appropriate organizational mechanisms for developing the DIMS. The program
managers were also asked periodically to review DIMS features.
The role of the system developers was to determine the most feasible way to
design and build a DIMS that met the users' needs. Contractors played a key
role in designing, building, and loading the DIMS. The nonprofit organization
also played a key role in gathering the basic information and organizing it
for input into the system. It should be noted that the nonprofit organization
provided staff, facilities, and other resources to assist in DIMS
implementation.
3.7.2 Major Features
There are three main features of the SFEP DIMS: a State Hearings Testimony
database, a data index, and a bibliography. The State Hearings Testimony
database is a text-based system that allows retrieval of articles, exhibits,
and testimony admitted in a court case in the State of California. The data
index and bibliography provide reference to data sets or studies identified as
relevant to the San Francisco Bay estuary system.
The SFEP DIMS makes use of local computing resources and NCC. The three SFEP
DIMS features described above were developed by using INFO/DB+, C, and
user-interface software on a MicroVAX that is networked to microcomputers. The
three features on the local system can be accessed with modems that access a
menu-driven user interface. The SFEP DIMS includes a series of dedicated
terminals and modems located at universities and at State and Federal offices
throughout the State. Data files were stored on the NCC IBM mainframe by
using SAS software and the TSO operating system. There are no direct
connections between the local DIMS and the NCC system.
3.7.3 DIMS Users
The SFEP DIMS is accessible to State and Federal agencies, the scientific
community, and the public. Access to the system may be gained via modems or
use of dedicated terminals located around the State. User accounts are
available upon request. To date, there are over 40 user accounts. Use
statistics are maintained on frequency and duration of system usage.
3.7.4 Tools or Products Available
The tools and products available from the SFEP DIMS are described briefly in
the preceding section on system features.
3.7.5 Quality Assurance/Quality Control Techniques
Any information that is included in the DIMS features is compiled by members
of the staff who check for completeness of information based on their
familiarity with the topics. Keyword entries are manually added to the
testimony transcript data tape. Information is loaded into the index,
bibliography, and testimony database via subroutines written
by the system administrator. The information loading process is not subject
to quality control beyond the loading program's ability to reject and report
specific entries.
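The behavior described above, a loading program that rejects and reports
specific entries, can be illustrated with a short Python sketch. The required
fields and the load_entries function are hypothetical. The actual SFEP
subroutines were written by the system administrator and are not documented
in this report.

```python
# Hypothetical required fields for a bibliography entry.
REQUIRED_FIELDS = ("author", "title", "year")


def load_entries(raw_entries):
    """Split entries into those accepted for loading and those rejected.

    Returns (loaded, rejected), where rejected is a list of
    (position, reason) pairs that can be printed as a report.
    """
    loaded, rejected = [], []
    for position, entry in enumerate(raw_entries, start=1):
        missing = [name for name in REQUIRED_FIELDS
                   if not str(entry.get(name, "")).strip()]
        if missing:
            rejected.append((position, "missing: " + ", ".join(missing)))
            continue
        if not str(entry["year"]).isdigit():
            rejected.append((position, "year is not a number"))
            continue
        loaded.append(entry)
    return loaded, rejected


if __name__ == "__main__":
    # Made-up entries, for illustration only.
    batch = [
        {"author": "A", "title": "T", "year": "1988"},
        {"author": "", "title": "T2", "year": "1989"},
    ]
    accepted, report = load_entries(batch)
    print(len(accepted), "loaded")
    for position, reason in report:
        print("entry", position, "rejected:", reason)
```

A check of this kind catches only structural problems. It does not verify
that the content of an accepted entry is correct, which is why the staff
review described above still matters.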
Because raw data are not included in the SFEP DIMS, there is no need for data-
submission quality control standards or procedures. It is expected that all
new data generated by SFEP field surveys will be submitted in accordance with
ODES conventions.
3.7.6 Estimated Costs
The costs for SFEP DIMS development have fluctuated according to how many
individuals were dedicated to development and loading activities. During the
first 2 years, DIMS development costs included multiple system analysts, data
technicians and purchase of hardware and software. Most recently, SFEP DIMS
costs have been limited to a system operator and data technician, primarily to
operate and maintain the system. In total, the SFEP data and information
management effort over the past 3 years has cost approximately $600,000.
3.7.7 System Development Milestones/Schedule
Each of the SFEP DIMS features took approximately 6-12 months to design and
implement. The most important milestones in SFEP DIMS development were the
decisions concerning priorities and program direction. System developments
were discontinuous over the past 3 years while priority objectives for the
DIMS were reevaluated.
Informal user needs surveys were conducted at the outset of DIMS activities
and again recently. The first survey polled users to identify the relative
demand for centralized data management versus centralized access to
information and references. The second needs survey was intended to
reevaluate the recent demand for the existing DIMS features, as well as
reassess whether centralized data access was now necessary.
3.7.8 Recommendations
The information system, like any data management system, must be kept up to
date. New information should be added as it becomes available. The system
becomes less useful if it contains historical information, but does not
include the latest information.
4.0 SUMMARY
The DIMS described above have been produced to satisfy a diversity of needs
that varied according to the regional priorities facing each of the National
Estuary Programs. These descriptions can improve development of Tier II DIMS
through examination of the individual NEP DIMS assumptions and development
objectives. This summary highlights the commonalities and significant
departures between Tier I DIMS along with some explanation of their rationale.
All seven estuary programs interviewed for this report had at least one staff
member dedicated to data management activities. Several programs stressed the
importance of having one identified person who was responsible for the
coordination of all data management activities. Even when data were managed
at several locations, a central person with data management responsibility
increased coordination and enabled data integration. In the one noted case
where a QA/QC officer was appointed, costs for data integration were
significantly reduced.
Even though a person was dedicated to data management within each Tier I NEP,
the Management Conferences differed greatly in their commitment of resources
to DIMS development or a long-term data management strategy. In the absence
of a strategy for managing data, some of the early data gathering efforts were
not well coordinated. Likewise, it was often difficult to commit resources
for planning and development of a DIMS without a clear understanding of the
DIMS objectives. Implementation of full-scale DIMS in Tier I NEPs was
possible only when the management conferences were given the opportunity to
review, comprehend, and modify the DIMS objectives. In other words, Tier I
DIMS development progress was a function of how well the major proponent of
DIMS could enlist adequate, multiyear commitments of resources and staff from
the management conferences.
The major DIMS features developed and the occurrence of each within the
individual estuary programs are best summarized in a matrix (Table 1). One
feature common to the majority of programs is a GIS. Another common feature
is a computerized index of data sets or information sources. Features that
are presented as conceptually similar between programs are not necessarily
identical, especially with respect to how fully the features are implemented.
Equally important is the variety of differences between DIMS
features implemented in these programs.
Five of the seven programs interviewed maintain a GIS. These programs are
Chesapeake Bay Program, Puget Sound Estuary Program, Albemarle-Pamlico
Estuarine Study, Narragansett Bay Project, and Buzzards Bay Project.
TABLE 1. MATRIX SHOWING SYSTEM FEATURES COMMON TO THE SIX TIER 1 ESTUARY
PROGRAMS AND THE CHESAPEAKE BAY PROGRAM.
Feature                 CBP   PSEP  APES  NBP   LISS  BBP   SFEP
Main Frame Computer
with networked PCs &
graphics terminals      X     X     -     -     -     -     X
Personal Computers      -     X     -     -     -     -     -
MicroVAX                -     -     -     X     -     X     X
Sun Workstations        -     -     X     -     -     -     -
EPA network             X     -     -     X     -     -     -
SAS statistical
software                Xa    -     Xb    X     -     -     -
GIS w/ARC/INFO          X     X     Xa    X     -     X     -
dBASE IV software       Xa    -     -     -     -     -     -
Focus database
management system       -     -     -     Xa    -     -     -
Oracle database
management system       -     -     -     -     -     Xa    -
INFO/DB+ software       -     -     -     -     -     -     X
Custom integration
software                X     -     X     -     -     -     -
Image Processing
software                -     -     Xp    -     -     -     -
Model of estuary        X     -     -     -     X     -     -
INDEX                   X     -     X     X     -     X     X
Data formatting tools   X     -     -     X     -     -     -
Data accessing tools    X     -     X     X     -     -     -
TABLE 1. (Continued)
Feature	CBP PSEP APES NBP LISS BBP SFEP
Transfer format documentation	X X	X
Data dictionary	X	X X	X
Computerized QA/QC routines	X X	X
Bibliography	X X	X
Text-based information system	X	X
Dial in access	X	Xp Xp	Xp
Data in ODES
Data at NCC	X	X	X	X X
a Used as DIMS
b Used for statistical analysis, not database management
Xp: Planned features
All five use ARC/INFO as the GIS software. Each program uses its GIS for
specific needs, which are not necessarily similar between programs. All
programs use the GIS for presentation of spatial information, such as
simultaneous overlays of ecological resource maps and observed contaminant
levels. A few programs use the GIS to assist in assembling data sets by using
maps to determine/verify sampling station locations. Battelle did not gather
sufficient information to characterize how Tier I programs may be using GIS to
conduct spatial analyses, such as quantifying change in habitat areas or
proximity to incidences of contaminant levels in violation of EPA criteria.
Spatial analyses should be a major value added to the NEPs by GIS
implementation, provided that the underlying data are rigorous enough to
support the tests. Understanding spatial analyses possible with a GIS can
serve as integral feedback to help NEPs decide which types of data to gather.
Whereas a GIS can be helpful to plan data collection, an index of data or
information is essential to utilizing what is already available. Data or
information indices were built by five programs (CBP, APES, NBP, BBP, and
SFEP). These indices were included as part of the DIMS, with varying levels
of detail. It is unclear how many of these indices will be updated as new
entries are generated. Dial-in access will eventually be available for four
programs. APES, NBP, and LISS anticipate developing dial-in access, whereas
CBP's system is already available.
Differences in DIMS features between NEPs arose for many reasons, including:
(1)	Priorities of the management, technical, and citizens advisory
committees
(2)	Decisions to build on existing State agency infrastructures,
including staff, software, and/or hardware
(3)	Adoption of a decentralized approach to data handling/analyses
(4)	Selection of different data management software and/or hardware
(5)	Different uses of previously available data management support
(6)	Emphasis on information dissemination in lieu of data management
(7)	Funding levels and sources
Tier I DIMS are implemented on hardware that ranges in scope from mainframes
and minicomputers to microcomputers. The hardware platform employed has
depended on the system needed, financial resources being committed,
development staff involved, and objectives in meeting end-users' needs.
MicroVAX computers are currently used by two programs (NBP and SFEP). The BBP
plans to switch to a mainframe VAX computer eventually. Two DIMS are part of
the EPA communications network. The CBP DIMS is a node on the network and the
NBP DIMS is interfaced to EPA's National Research Laboratory computer system,
which is also a node on the EPA network.
The six programs with a DIMS completed or in development chose six different
software systems for database management. Two programs (CBP and APES)
employing custom integrated software are supported by large data management
staff. SFEP is the only program that developed an information management
system. Having different database management software does not preclude
programs from supporting consistency of data sets. Proper use of any database
management system can afford transfer of data between programs even if the
original database structures may be different. Of more importance to data
consistency are policies or standards for data encoding.
Few Tier I NEPs had program-wide standards for encoding data. Those programs
that had standards (NBP, PSEP and CBP) were able to more completely integrate
separate data sources than those programs where data compilation was not
coordinated by standards. Tier II NEPs should learn from the database
management successes of earlier NEPs who were able to adopt program-wide
encoding standards. ODES is an example of a large data management system that
provides inherent value to Tier II NEPs because it provides these fundamental
standards (referred to as "formats") for encoding new data. Other programs,
such as PSEP's system and the CBP's, also benefit from being able to consider
uses for data instead of reinvesting in reformatting exercises each time new
data combinations are needed.
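The point about encoding standards can be illustrated with a small Python
sketch: two programs that store the same parameters under different local
codes can still merge their records if each maps its codes to one shared code
table. All of the codes, station IDs, and values below are hypothetical; they
are not ODES formats or codes used by any Tier I program.

```python
# Hypothetical mappings from each program's local codes to shared codes.
PROGRAM_A_TO_SHARED = {"wtemp": "TEMP", "sal": "SALIN"}
PROGRAM_B_TO_SHARED = {"T_WATER": "TEMP", "SALINITY": "SALIN"}


def to_shared(records, code_map):
    """Re-encode records with shared parameter codes.

    Returns (converted, unmapped). Records whose local code has no shared
    equivalent are set aside rather than guessed at.
    """
    converted, unmapped = [], []
    for record in records:
        shared_code = code_map.get(record["parameter"])
        if shared_code is None:
            unmapped.append(record)
        else:
            # Copy the record so the original data are left unchanged.
            converted.append({**record, "parameter": shared_code})
    return converted, unmapped


if __name__ == "__main__":
    # Made-up records from two programs, for illustration only.
    program_a = [{"station": "A1", "parameter": "wtemp", "value": 11.0}]
    program_b = [{"station": "B7", "parameter": "SALINITY", "value": 29.5}]
    merged = (to_shared(program_a, PROGRAM_A_TO_SHARED)[0]
              + to_shared(program_b, PROGRAM_B_TO_SHARED)[0])
    for record in merged:
        print(record["station"], record["parameter"], record["value"])
```

When a program adopts the shared codes at data entry, the mapping step
disappears, which is the reformatting effort the paragraph above says
standards avoid.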
The final aspect of summarizing Tier I DIMS is to review the types of users
that each system is intended to serve. Table 2 shows the users served by the
individual program DIMS. The table includes a distinction between how the
DIMS is operated to serve a given data need.
TABLE 2. MATRIX SHOWING USER ACCESS FOR THE SIX TIER 1 ESTUARY PROGRAMS AND
THE CHESAPEAKE BAY PROGRAM.
Users	CBP PSEP APES NBP LISS BBP SFEP
Online
State & Federal Agencies	X	X	X	X	X
Scientific Community	X	X	X
Public	X	X
On Demand
State & Federal Agencies	X	X	X	X	X
Scientific Community	X	X	X	X
Public	X	X
Online uses denote DIMS which
can be operated directly by the user group. In the case of the public, this
would mean that a citizen could gain direct access to the DIMS contents. On-
demand uses refer to those DIMS that could be operated by a system
analyst on behalf of the given type of user. All the programs interviewed,
with the exception of LISS, support data needs for State and Federal agencies.
Five programs (CBP, PSEP, NBP, BBP, and SFEP) support data requirements for
the scientific community. Only CBP and SFEP provide direct access for the
scientific community. These two programs also support direct access for the
public.
In summary, the Tier I DIMS contain a variety of features that serve a
diversity of user needs. In planning a Tier II DIMS, it is important to
consider the above efforts and successes. In addition to these
considerations, proper DIMS design and implementation must examine the long-
term budgetary commitments required, system-development time constraints, and
level of quality assurance expected of the DIMS contents.