United States
Environmental Protection
Agency	
Robert S. Kerr Environmental
Research Laboratory
Ada OK 74820
Research and Development
EPA/600/S2-89/040  Sept. 1989
Project  Summary

The  Establishment of a
Groundwater  Research  Data
Center  for Validation  of
Subsurface  Flow  and
Transport  Models
Paul K. M. van der Heijde, Wilbert I. M. Elderhorst, Rachel A. Miller, and
Manjit F. Trehan
  The International  Ground Water
Modeling Center has established a
Groundwater Research Data Center
that provides information on datasets
resulting from publicly funded  field
experiments  and related bench
studies in soil and groundwater pol-
lution and  distributes datasets for
testing and validation of models for
flow and contaminant transport in the
subsurface.   To fulfill its  advisory
role,  the  Data Center analyzes
information  and documentation
resulting from field  and  laboratory
experiments in the  saturated  and
unsaturated zones and evaluates the
appropriate  datasets  for  their
suitability  in  model testing  and
validation. To assure consistency in
the analysis and description of these
datasets and  to provide an efficient
way to search, retrieve, and report
information on these datasets, the
Center has  developed  a com-
puterized data directory,  SATURN,
programmed independently from any
proprietary  software.  As  secondary
users of such data are highly inter-
ested  in information  about the
assessment of data quality, a primary
concern  of the Center is  the
evaluation and documentation of the
level of quality assurance applied
during data acquisition, data
handling, and data storage. In addition to
providing referral services, the Data
Center distributes, on an "as-is"
basis, selected high-quality datasets
described in the data directory. The
datasets of concern represent differ-
ent hydrological, geological, and
geographic-climatic settings,  pollu-
tant compositions, and degrees  of
contamination.
  This Project  Summary was  devel-
oped by EPA's Robert S. Kerr Environ-
mental Research Laboratory, Ada, OK,
to announce key findings of the re-
search project that is fully  docu-
mented in a separate report of the
same title (see Project  Report order-
ing information at back).


Introduction
  The  ability to predict accurately the
transport and fate of  potential con-
taminants is critical to the success of
most groundwater regulations. Attempts
at protecting the integrity of an aquifer or
engineered facility through monitoring of
groundwater quality alone often are inef-
fective alternatives to predictive  model-
ing. Thus, development and adoption of
methods for predicting pollutant transport
and fate in the saturated and unsaturated
zones of the subsurface are key elements
of the EPA's groundwater research
strategy. Such predictive capabilities
cannot be developed, or their accuracy
established, without an equally
significant effort in subsurface
characterization.
  With the growing availability and use of
subsurface flow and transport models,
concerns regarding their validity and
accuracy have increased. Model testing, or
more specifically model  validation,  pro-
vides model  users, decision  makers,
policy makers, and  legal  authorities with
information on  a model's  performance
characteristics—information  needed  to
judge the usefulness of the model results
for their problem assessments.
  The Groundwater Review Committee of
EPA's Science  Advisory  Board con-
cluded that regardless  of  the type  of
model  chosen,  increased  emphasis
should be given  to field testing  and field
validation. Data generated in association
with remedial action and monitoring at
Superfund sites may be used to fulfill
model validation requirements.  The Re-
view Committee commented  that  these
data should be made available for use by
other investigators. The  Review Com-
mittee also found that the conclusions of
many publicly funded research efforts are
based on data  not available  for peer
review. Therefore, the Committee recom-
mended  that  databases  from  field
research projects be made readily avail-
able to other groups.
  No institution  has existed  for rapidly
locating and  searching  soil  water and
groundwater  research databases  or  for
standardizing data  integrity  and docu-
mentation of  research datasets.  Existing
centralized database facilities  for ground-
water resource  management  do not
provide  the detail and  quality  of data
required  to successfully complete  re-
search on contaminant transport and fate.
In many research projects, the lack of
rapid access to these data causes delays
and unnecessary expense, leaving
many model validation initiatives
incomplete. The groundwater research
strategy prepared  by the U.S. Environ-
mental Protection Agency (USEPA) and
the National  Center for  Ground Water
Research  states that the data  accumu-
lated  through Agency-funded  research
will be made  available to the Agency and
to the user community through informa-
tion transfer. A central data clearinghouse
could acquire and distribute such data in
error-free, machine-usable form, efficient-
ly and economically.
  In addressing  this need,  the  Holcomb
Research  Institute of  Butler  University,
with support from  USEPA, has  estab-
lished the Ground Water Research Data
Center  within the framework of the
International  Ground Water  Modeling
Center (IGWMC). The new Data  Center
provides information and referral services
regarding  datasets resulting from publicly
funded  field  and laboratory research on
soil  and  groundwater  pollution.  In
addition, the Data Center has established
procedures  for  selecting,  evaluating,
documenting, and  redistributing  such
datasets. Creation of the Data Center is
expected to lead to  additional protocols
for  error checking,  documentation,  ac-
cessing,  and  transferring  this kind  of
research data, and for acknowledging the
rights that  researchers  have vested  in
their data.

Project Approach
  The  project consisted of two phases:
(1) determination of the scope and design
of the Data Center, and  (2) development
of facilities  and implementation of opera-
tional  procedures and organizational
framework.
  The  first  phase consisted of five ele-
ments:  analysis of  data needs  and
potential  users;  survey  and analysis  of
existing datasets; assessment  of quality
assurance (QA) requirements; determina-
tion of computer  and other  facilities for an
operational  data center; and operational
design of the Data Center.
  The  analysis of soil and groundwater
research data needs and the identification
of potential  users of high-quality,  well-
documented datasets provided guidance,
justification, and  motivation for the devel-
opment of the Data Center. To determine
the  required level-of-effort and to obtain
baseline information for the design of the
Data Center facilities, the availability and
status of a  number of groundwater data-
sets resulting from publicly funded  re-
search have  been  evaluated. Current
practices in collecting, handling, storing,
documenting and distributing these data-
sets have been studied.
  Other data centers  utilizing high-quality
environmental research  and monitoring
datasets have been contacted  to benefit
from their experience in such areas as
dataset acquisition, data handling, and
quality assurance procedures. Specifical-
ly, issues related to the invested rights of
researchers involved in  the  data col-
lection have been discussed.
  Quality assurance  (QA),  an essential
task for a central data distribution facility,
must be  incorporated on two levels: (1)
the  quality  of the datasets of interest
needs  to be determined  and  docu-
mented; and (2) adequate quality assur-
ance procedures need to be established
for  the operation of  the Data Center in
such areas  as dataset evaluation, referral,
management, and transfer.
  To determine the level of detail
required for the Data Center in the
evaluation of the quality of prospective
datasets, an inventory has been made of
standards and current accepted practices
as documented in the open literature and
technical guidance of regulatory
agencies.
  Based on the findings in phase 1, the
institutional structure for the Data Center
has been determined and the database
framework created. Two types of
database have been developed: (1) a
directory-type or referral database
containing descriptive information on
datasets available from the Data Center or
from other sources; and (2) a database
containing the datasets selected for
distribution by the Data Center. Information
resulting from the dataset survey in
phase 1 has been incorporated in the
referral database.
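
  The directory-type referral database described above can be pictured with a minimal sketch. It is not the SATURN directory, whose record layout this summary does not give. The field names, the `search` and `report` helpers, and the two sample entries are all hypothetical, chosen only to show how descriptive records could be searched, retrieved, and reported separately from the datasets themselves.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """One descriptive record in a referral directory (hypothetical fields)."""
    dataset_id: str
    title: str
    zone: str                      # "saturated", "unsaturated", or "both"
    source: str                    # holder of the data
    distributed_by_center: bool = False
    keywords: set = field(default_factory=set)


def search(entries, zone=None, keyword=None, distributed_only=False):
    """Return the entries that match every criterion that was supplied."""
    results = []
    for entry in entries:
        if zone is not None and entry.zone not in (zone, "both"):
            continue
        if keyword is not None and keyword.lower() not in entry.keywords:
            continue
        if distributed_only and not entry.distributed_by_center:
            continue
        results.append(entry)
    return results


def report(entries):
    """Format entries as plain-text lines for a referral reply."""
    return [f"{e.dataset_id}: {e.title} (held by {e.source})" for e in entries]


# Made-up entries for illustration only; they do not describe real datasets.
DIRECTORY = [
    DirectoryEntry("EX-001", "Example tracer test", "saturated",
                   "Data Center", True, {"tracer", "sand"}),
    DirectoryEntry("EX-002", "Example column study", "unsaturated",
                   "other source", False, {"column", "sorption"}),
]
```

  In this sketch, `report(search(DIRECTORY, zone="saturated"))` lists only the first entry, while `distributed_only=True` narrows a search to datasets that the hypothetical center itself distributes.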
  Arrangements have been made to
protect dataset integrity in their transfer
from their generators to the Data Center
and from the Data Center to secondary
users. Furthermore, quality assurance
procedures have been implemented for
data handling, storing, archiving, and
backup. Different levels of implementation
are distinguished, dependent on the
quality and extent of the dataset, the
level of documentation, and the
importance of the data. Technical support
for format and transfer medium, and to a
limited extent for the analysis of the data,
will be provided; the level of support will
depend on the implementation level
selected. Policies have been devised
regarding such issues as proprietary
rights, conditional use, potential liability,
and other legal and ethical issues.
  As a part of the IGWMC, the
Center's activities will be subject to
annual review by the IGWMC
Board and the International Technical
Advisory Committee (ITAC).

Groundwater Research Data
  Data on groundwater quality and
quantity are characterized in both the
spatial and temporal domains. Two
types of data are distinguished:
site-specific data, and generic,
site-independent data. It should be noted
that in this report the term groundwater
is used for
water in both the saturated and
unsaturated zones of the subsurface.
  Certain kinds of site-specific data are
constant for the time period under
consideration, but may vary from location to
location. Other site-specific data
show a significant time-dependent
behavior. Collection of such data is
generally aimed at identifying regional
patterns during a certain time period or
studying the time variability at specific
locations. These objectives of
site-specific data collection may change
during the operation of the data collection
network, due to changes in management
needs, technology, and institutional
arrangements. Subsequently, the design
and operation (when and where to
sample or measure, and which variable to
measure) may be altered. Such variability
certainly applies to research data
networks, which are often project-oriented
and of relatively short duration.
  Because water in the underground
often moves quite slowly, abiotic or biotic
transformations may represent significant
attenuation processes in the transport
and fate of pollutants. The presence of
such processes results in a significant
increase in data requirements for the
predictive analysis of water quality. Much of
this additional data is generic and can be
established off-site in controlled
laboratory or field experiments in combination
with relevant site characteristics. Such
generic, site-independent data on specific
chemicals are increasingly available from
research on the basic processes that
govern contaminant transport and fate,
and are crucial for successful application
of computer-based prediction techniques
in specific hydrogeologic environments.
  At the beginning of  many  research
projects requiring  data  acquisition, the
establishment of  efficient  data  manage-
ment practices is often more difficult than
anticipated.  Traditionally,  researchers
have had almost total  control  over the
form and documentation  of  their data;
even contractual requirements for data in
machine-processible form have had little
effect on the ultimate availability and
utility of most data. In addition, control by
funding agencies over procedures and
quality of data collection, storing, and
distribution to a large number of
institutions requires extensive
organizational arrangements and additional
personnel. This is especially true when
securing  the collected data  for  distribu-
tion after the final research  has   been
completed and the original research staff
is no longer available or  when no funding
is available for continuing data manage-
ment at each individual site.
  Datasets for use  in transport and fate
modeling  studies require a high level of
detail concerning soil and  aquifer prop-
erties, density of data points, contaminant
behavior, and qualitative data descriptors.
Specific data requirements for subsurface
models  include the  need to  define
precisely  the units  of  measure  of  each
input value; for  example,  point  versus
averaged values.
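
  The requirement to define each input value precisely, including whether it is a point or an averaged value, can be illustrated with a minimal sketch. The `InputValue` record, its field names, and the sample values below are hypothetical and are not taken from the project report. The sketch only shows one way a dataset description could refuse a value that arrives without its unit of measure or without the extent over which it was averaged.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Support(Enum):
    """Whether a value was measured at a point or averaged over some extent."""
    POINT = "point"
    AVERAGED = "averaged"


@dataclass(frozen=True)
class InputValue:
    """A model input value with explicit descriptors (hypothetical layout)."""
    name: str
    value: float
    unit: str
    support: Support
    averaging_extent: Optional[str] = None   # required for averaged values

    def __post_init__(self):
        # Reject values whose unit of measure is missing or blank.
        if not self.unit.strip():
            raise ValueError(f"{self.name}: unit of measure is required")
        # An averaged value is ambiguous unless its extent is stated.
        if self.support is Support.AVERAGED and not self.averaging_extent:
            raise ValueError(f"{self.name}: averaged value needs its extent")

    def describe(self):
        """Return a one-line description suitable for dataset documentation."""
        text = f"{self.name} = {self.value} {self.unit} ({self.support.value}"
        if self.averaging_extent:
            text += f" over {self.averaging_extent}"
        return text + ")"
```

  Under these assumptions, an averaged hydraulic head entered without an averaging extent raises `ValueError` instead of passing silently into a validation run.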
  Data quality is often  critical  in model
validation  due to the sensitivity  of  most
models to changes in certain parameters.
Although a given field  investigation may
result in  a  large amount  of  data, the
usefulness of the study site for model
validation  is determined to a large extent
by the quality of the data, as reported in
the data documentation. However,  often
the data documentation is lacking in
detail, especially with   respect  to  data
quality.

Secondary Use of Research
Data
  A  recent  EPA  groundwater  protection
data-requirements  study stressed  the
importance  of  improved  access  to
existing  soil water and  groundwater data
and  of  lowering  the transaction  costs
associated with obtaining and using  such
data. The  report indicates that knowledge
about and access to the large volume of
groundwater data being  generated  from
federal programs and state initiatives is
limited,  because  the data are managed
by many organizations  and are  stored in
many different  locations,  files  and  for-
mats. In addition, relatively few of these
soil water and groundwater  datasets are
computerized, and  a central cataloging
facility is  lacking. Although the study's
conclusions are  concerned  with all
groundwater data useful in the protection
of groundwater  resources,  they apply
equally well to research data.

Sharing Research  Data
  Availability and accessibility  of envi-
ronmental research data are  discussed in
a wide variety of environmental literature.
Reviews of data availability  indicate that
many researchers give little thought to
the  use  of their data  other  than for
immediate research purposes. The ap-
praisal by researchers of the importance
of data accessibility is reflected in their
approach  to data  management.  Many
consider it an administrative chore to be
handled  separately from  the research,
usually  at  the  end of the study.  Other
investigators show a  keen  awareness of
the importance of data management both
for their own use and the use of others.
  Sharing data from detailed groundwater
monitoring studies and laboratory bench
studies is a  subject of  concern  both
economically  and with respect  to the
advancement of scientific research.  Due
to the ever-increasing cost of field studies
and  the  extensive   sampling  periods
required for transport and fate studies, it
has become essential to  share  ground-
water data so that unnecessary  duplica-
tion can be avoided.  Sharing data not
only produces cost benefits; it "reinforces
open,  scientific  inquiry;  permits  verifi-
cation, refutation, or refinement of original
research  results;  stimulates improve-
ments in  measurement  and  data  col-
lection methods;  allows  more efficient
use of resources spent on data collection;
encourages interdisciplinary use  of  data;
and strongly discourages the uncommon,
but nevertheless  serious, problem of
fraudulent research."
  A comprehensive  referral  center as
represented by the IGWMC Groundwater
Research  Data  Center,  focusing  on
selected datasets for  groundwater model
validation  and testing, will  help to avoid
situations where datasets of value to
many potential users  go unrecognized
and therefore unused.

-------
Paul K. M. van der Heijde, Wilbert I. M. Elderhorst, Rachel A. Miller, and
Manjit F. Trehan are with Butler University, Indianapolis, IN 46208.
Joe R. Williams is the EPA Project Officer (see below).
The complete report, entitled "The Establishment of a Groundwater Research Data
 Center for Validation of Subsurface Flow and Transport Models," (Order No. PB
 89-224 455/AS; Cost: $28.95, subject to change) will be available only from:
        National Technical Information Service
        5285 Port Royal Road
        Springfield, VA 22161
        Telephone: 703-487-4650
The EPA  Project Officer can be contacted at:
        Robert S. Kerr Environmental Research Laboratory
        U.S. Environmental Protection Agency
       Ada, OK 74820