United States
Environmental Protection
Agency
Robert S. Kerr Environmental
Research Laboratory
Ada OK 74820
Research and Development
EPA/600/S2-89/040 Sept. 1989
Project Summary
The Establishment of a
Groundwater Research Data
Center for Validation of
Subsurface Flow and
Transport Models
Paul K. M. van der Heijde, Wilbert I. M. Elderhorst, Rachel A. Miller, and
Manjit F. Trehan
The International Ground Water
Modeling Center has established a
Groundwater Research Data Center
that provides information on datasets
resulting from publicly funded field
experiments and related bench
studies in soil and groundwater pol-
lution and distributes datasets for
testing and validation of models for
flow and contaminant transport in the
subsurface. To fulfill its advisory
role, the Data Center analyzes
information and documentation
resulting from field and laboratory
experiments in the saturated and
unsaturated zones and evaluates the
appropriate datasets for their
suitability in model testing and
validation. To assure consistency in
the analysis and description of these
datasets and to provide an efficient
way to search, retrieve, and report
information on these datasets, the
Center has developed a com-
puterized data directory, SATURN,
programmed independently from any
proprietary software. As secondary
users of such data are highly inter-
ested in information about the
assessment of data quality, a primary
concern of the Center is the
evaluation and documentation of the
level of quality assurance applied
during data acquisition, data hand-
ling, and data storage. In addition to
providing referral services, the Data
Center distributes, on an "as-is"
basis, selected high-quality datasets
described in the data directory. The
datasets of concern represent differ-
ent hydrological, geological, and
geographic-climatic settings, pollu-
tant compositions, and degrees of
contamination.
This Project Summary was devel-
oped by EPA's Robert S. Kerr Environ-
mental Research Laboratory, Ada, OK,
to announce key findings of the re-
search project that is fully docu-
mented in a separate report of the
same title (see Project Report order-
ing information at back).
Introduction
The ability to predict accurately the
transport and fate of potential con-
taminants is critical to the success of
most groundwater regulations. Attempts
at protecting the integrity of an aquifer or
engineered facility through monitoring of
groundwater quality alone often are inef-
fective alternatives to predictive model-
ing. Thus, development and adoption of
methods for predicting pollutant transport
and fate in the saturated and unsaturated
zones of the subsurface are key elements
of the EPA's groundwater research stra-
tegy. The development and accuracy of
such predictive capabilities cannot take
place without an equally significant effort
in subsurface characterization.
With the growing availability and use of
subsurface flow and transport models,
concerns regarding their validity and
accuracy have increased. Model testing, or
more specifically model validation, pro-
vides model users, decision makers,
policy makers, and legal authorities with
information on a model's performance
characteristics—information needed to
judge the usefulness of the model results
for their problem assessments.
The Groundwater Review Committee of
EPA's Science Advisory Board con-
cluded that regardless of the type of
model chosen, increased emphasis
should be given to field testing and field
validation. Data generated in association
with remedial action and monitoring
Superfund sites may be used to fulfill
model validation requirements. The Re-
view Committee commented that these
data should be made available for use by
other investigators. The Review Com-
mittee also found that the conclusions of
many publicly funded research efforts are
based on data not available for peer
review. Therefore, the Committee recom-
mended that databases from field
research projects be made readily avail-
able to other groups.
No institution has existed for rapidly
locating and searching soil water and
groundwater research databases or for
standardizing data integrity and docu-
mentation of research datasets. Existing
centralized database facilities for ground-
water resource management do not
provide the detail and quality of data
required to successfully complete re-
search on contaminant transport and fate.
In many research projects, the lack of
rapid access to these data causes delays
and unnecessary expense, leaving
many model validation initiatives
incomplete. The groundwater research
strategy prepared by the U.S. Environ-
mental Protection Agency (USEPA) and
the National Center for Ground Water
Research states that the data accumu-
lated through Agency-funded research
will be made available to the Agency and
to the user community through informa-
tion transfer. A central data clearinghouse
could acquire and distribute such data in
error-free, machine-usable form, efficient-
ly and economically.
In addressing this need, the Holcomb
Research Institute of Butler University,
with support from USEPA, has established
the Groundwater Research Data
Center within the framework of the
International Ground Water Modeling
Center (IGWMC). The new Data Center
provides information and referral services
regarding datasets resulting from publicly
funded field and laboratory research on
soil and groundwater pollution. In
addition, the Data Center has established
procedures for selecting, evaluating,
documenting, and redistributing such
datasets. Creation of the Data Center is
expected to lead to additional protocols
for error checking, documentation, ac-
cessing, and transferring this kind of
research data, and for acknowledging the
rights that researchers have vested in
their data.
Project Approach
The project consisted of two phases:
(1) determination of the scope and design
of the Data Center, and (2) development
of facilities and implementation of opera-
tional procedures and organizational
framework.
The first phase consisted of five ele-
ments: analysis of data needs and
potential users; survey and analysis of
existing datasets; assessment of quality
assurance (QA) requirements; determina-
tion of computer and other facilities for an
operational data center; and operational
design of the Data Center.
The analysis of soil and groundwater
research data needs and the identification
of potential users of high-quality, well-
documented datasets provided guidance,
justification, and motivation for the devel-
opment of the Data Center. To determine
the required level-of-effort and to obtain
baseline information for the design of the
Data Center facilities, the availability and
status of a number of groundwater data-
sets resulting from publicly funded re-
search have been evaluated. Current
practices in collecting, handling, storing,
documenting and distributing these data-
sets have been studied.
Other data centers utilizing high-quality
environmental research and monitoring
datasets have been contacted to benefit
from their experience in such areas as
dataset acquisition, data handling, and
quality assurance procedures. Specifical-
ly, issues related to the invested rights of
researchers involved in the data col-
lection have been discussed.
Quality assurance (QA), an essential
task for a central data distribution facility,
must be incorporated on two levels: (1)
the quality of the datasets of interest
needs to be determined and docu-
mented; and (2) adequate quality assur-
ance procedures need to be established
for the operation of the Data Center in
such areas as dataset evaluation, referral,
management, and transfer.
To determine the level of detail
required for the Data Center in the
evaluation of the quality of prospective
datasets, an inventory has been made of
standards and current accepted practices
as documented in the open literature and
technical guidance of regulatory
agencies.
Based on the findings in phase 1, the
institutional structure for the Data Center
has been determined and the database
framework created. Two types of
database have been developed: (1) a
directory-type or referral database
containing descriptive information on
datasets available from the Data Center or
from other sources; and (2) a database
containing the datasets selected for
distribution by the Data Center. Information
resulting from the dataset survey in
phase 1 has been incorporated into the
referral database.
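The split between a referral directory and a distribution database can be pictured with a short sketch. It is only an illustration under stated assumptions: the field names, the `search` helper, and the three example entries are hypothetical placeholders, and none of it reflects the actual record layout of SATURN, which this summary does not describe.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DirectoryEntry:
    """One descriptive record in a referral directory (hypothetical fields)."""
    dataset_id: str
    title: str
    zone: str              # "saturated" or "unsaturated"
    qa_documented: bool    # is the QA applied during acquisition documented?
    held_by_center: bool   # True if the data themselves are in the distribution database


def search(entries: Iterable[DirectoryEntry],
           zone: Optional[str] = None,
           qa_documented: Optional[bool] = None,
           held_by_center: Optional[bool] = None) -> List[DirectoryEntry]:
    """Return the entries that match every criterion that is not None."""
    hits = []
    for e in entries:
        if zone is not None and e.zone != zone:
            continue
        if qa_documented is not None and e.qa_documented != qa_documented:
            continue
        if held_by_center is not None and e.held_by_center != held_by_center:
            continue
        hits.append(e)
    return hits


# Made-up entries for illustration only; they describe no real dataset.
EXAMPLE_ENTRIES = [
    DirectoryEntry("D001", "Tracer test, sand aquifer", "saturated", True, True),
    DirectoryEntry("D002", "Column leaching study", "unsaturated", True, False),
    DirectoryEntry("D003", "Plume monitoring network", "saturated", False, False),
]
```

In this sketch an entry with `held_by_center=False` corresponds to a dataset that the directory describes but that must be obtained from another source.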
Arrangements have been made to
protect the integrity of datasets in their
transfer from their generators to the Data
Center and from the Data Center to
secondary users. Furthermore, quality
assurance procedures have been implemented
for data handling, storing, archiving, and
backup. Different levels of implementation
are distinguished, dependent on the
quality and extent of the dataset, the
level of documentation, and the
importance of the data. Technical support
for format and transfer medium, and to a
limited extent for the analysis of the data,
will be provided; the level of support will
depend on the implementation level
selected. Policies have been devised
regarding such issues as proprietary
rights, conditional use, potential liability,
and other legal and ethical issues.
As a part of the IGWMC, the Data
Center's activities will be subject to
annual review by the IGWMC [...]
Board and the International Technical
Advisory Committee (ITAC).
Groundwater Research Data
Data on groundwater quality and
quantity are characterized in both the
spatial and temporal domains. Two
types of data are distinguished:
site-specific data, and generic,
site-independent data. It should be noted
that in this report the term groundwater is
used for water in both the saturated and
unsaturated zones of the aquatic
subsurface.
Certain kinds of site-specific data are
constant for the time period under
consideration, but may vary from location to
location. Other site-specific data
show a significant time-dependent
behavior. Collection of such data is
generally aimed at identifying regional
patterns during a certain time period or
studying the time variability at specific
locations. These objectives of
site-specific data collection may change
during the operation of the data collection
network, due to changes in management
needs, technology, and institutional
arrangements. Subsequently, the design
and operation (when and where to
sample or measure, and which variable to
measure) may be altered. Such variability
certainly applies to research data
networks, which are often project-oriented
and of relatively short duration.
Because water in the underground
often moves quite slowly, abiotic or biotic
transformations may represent significant
attenuation processes in the transport
and fate of pollutants. The presence of
such processes results in a significant
increase in data requirements for the
predictive analysis of water quality. Much of
this additional data is generic and can be
established off-site in controlled
laboratory or field experiments in combination
with relevant site characteristics. Such
generic, site-independent data on specific
chemicals are increasingly available from
research on the basic processes that
govern contaminant transport and fate,
and are crucial for successful application
of computer-based prediction techniques
in specific hydrogeologic environments.
At the beginning of many research
projects requiring data acquisition, the
establishment of efficient data manage-
ment practices is often more difficult than
anticipated. Traditionally, researchers
have had almost total control over the
form and documentation of their data;
even contractual requirements for data in
machine-processible form have had little
effect on the ultimate availability and
utility of most data. In addition, control by
funding agencies over procedures and
quality of data collection, storing, and
distribution to a large number of
institutions requires extensive
organizational arrangements and additional
personnel. This is especially true when
securing the collected data for distribu-
tion after the final research has been
completed and the original research staff
is no longer available or when no funding
is available for continuing data manage-
ment at each individual site.
Datasets for use in transport and fate
modeling studies require a high level of
detail concerning soil and aquifer prop-
erties, density of data points, contaminant
behavior, and qualitative data descriptors.
Specific data requirements for subsurface
models include the need to define
precisely the units of measure of each
input value; for example, point versus
averaged values.
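One way to picture this requirement is a record type that refuses a model input unless its unit and its kind of value (point or averaged) are stated. The sketch below is a hypothetical illustration, not a format used by the Data Center; the class name, field names, and the example unit string are assumptions.

```python
from dataclasses import dataclass

# Hypothetical vocabulary: how a reported number relates to space or time.
SUPPORT_TYPES = ("point", "averaged")


@dataclass(frozen=True)
class InputValue:
    """A model input that cannot be created without a unit and a support type."""
    name: str
    value: float
    unit: str      # e.g. "m/d"; free text in this sketch
    support: str   # "point" or "averaged"

    def __post_init__(self) -> None:
        if not self.unit.strip():
            raise ValueError(f"{self.name}: unit of measure is required")
        if self.support not in SUPPORT_TYPES:
            raise ValueError(
                f"{self.name}: support must be one of {SUPPORT_TYPES}"
            )


def describe(v: InputValue) -> str:
    """Render a value with its unit and support for dataset documentation."""
    return f"{v.name} = {v.value} {v.unit} ({v.support} value)"
```

Rejecting incomplete values at creation time means a dataset description cannot silently mix point measurements with spatial or temporal averages.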
Data quality is often critical in model
validation due to the sensitivity of most
models to changes in certain parameters.
Although a given field investigation may
result in a large amount of data, the
usefulness of the study site for model
validation is determined to a large extent
by the quality of the data, as reported in
the data documentation. However, the
data documentation often lacks
detail, especially with respect to data
quality.
Secondary Use of Research
Data
A recent EPA groundwater protection
data-requirements study stressed the
importance of improved access to
existing soil water and groundwater data
and of lowering the transaction costs
associated with obtaining and using such
data. The report indicates that knowledge
about and access to the large volume of
groundwater data being generated from
federal programs and state initiatives is
limited, because the data are managed
by many organizations and are stored in
many different locations, files and for-
mats. In addition, relatively few of these
soil water and groundwater datasets are
computerized, and a central cataloging
facility is lacking. Although the study's
conclusions are concerned with all
groundwater data useful in the protection
of groundwater resources, they apply
equally well to research data.
Sharing Research Data
Availability and accessibility of envi-
ronmental research data are discussed in
a wide variety of environmental literature.
Reviews of data availability indicate that
many researchers give little thought to
the use of their data other than for
immediate research purposes. The ap-
praisal by researchers of the importance
of data accessibility is reflected in their
approach to data management. Many
consider it an administrative chore to be
handled separately from the research,
usually at the end of the study. Other
investigators show a keen awareness of
the importance of data management both
for their own use and the use of others.
Sharing data from detailed groundwater
monitoring studies and laboratory bench
studies is a subject of concern both
economically and with respect to the
advancement of scientific research. Due
to the ever-increasing cost of field studies
and the extensive sampling periods
required for transport and fate studies, it
has become essential to share ground-
water data so that unnecessary duplica-
tion can be avoided. Sharing data not
only produces cost benefits; it "reinforces
open, scientific inquiry; permits verifi-
cation, refutation, or refinement of original
research results; stimulates improve-
ments in measurement and data col-
lection methods; allows more efficient
use of resources spent on data collection;
encourages interdisciplinary use of data;
and strongly discourages the uncommon,
but nevertheless serious, problem of
fraudulent research."
A comprehensive referral center as
represented by the IGWMC Groundwater
Research Data Center, focusing on
selected datasets for groundwater model
validation and testing, will help to avoid
situations where datasets of value to
many potential users go unrecognized
and therefore unused.
-------
Paul K. M. van der Heijde, Wilbert I. M. Elderhorst, Rachel A. Miller, and
Manjit F. Trehan are with Butler University, Indianapolis, IN 46208.
Joe R. Williams is the EPA Project Officer (see below).
The complete report, entitled "The Establishment of a Groundwater Research Data
Center for Validation of Subsurface Flow and Transport Models," (Order No. PB
89-224 455/AS; Cost: $28.95, subject to change) will be available only from:
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650
The EPA Project Officer can be contacted at:
Robert S. Kerr Environmental Research Laboratory
U.S. Environmental Protection Agency
Ada, OK 74820
United States
Environmental Protection
Agency
Center for Environmental Research
Information
Cincinnati OH 45268