United States Environmental Protection Agency
Office of Research and Development
Washington, D.C. 20460
EPA/600/R-00/036
February 2000

         Environmental Technology
         Verification Report

         Environmental Decision
         Support Software

         University of Tennessee
         Research Corporation
         Spatial Analysis and Decision
         Assistance (SADA)

-------
                THE ENVIRONMENTAL TECHNOLOGY VERIFICATION
                                        PROGRAM.

                  ETV Joint Verification Statement
 TECHNOLOGY TYPE:    ENVIRONMENTAL DECISION SUPPORT SOFTWARE
 APPLICATION:           INTEGRATION, VISUALIZATION, SAMPLE OPTIMIZATION,
                           COST-BENEFIT, AND RISK ANALYSIS OF ENVIRONMENTAL
                           DATA SETS

 TECHNOLOGY NAME:   Spatial Analysis and Decision Assistance (SADA)

 COMPANY:              University of Tennessee Research Corporation (UTRC)
                           1534 White Avenue, Suite 403
                           Knoxville, TN 37996-1527

 PHONE:                  (865) 241-5741

 WEBSITE:                www.sis.utk.edu/cis/SADA
The U.S. Environmental Protection Agency (EPA) has created the Environmental Technology Verification
Program (ETV) to facilitate the deployment of innovative or improved environmental technologies through
performance verification and dissemination of information. The goal of the ETV Program is to further
environmental protection by substantially accelerating the acceptance and use of improved and cost-effective
technologies. ETV seeks to achieve this goal by providing high-quality, peer-reviewed data on technology
performance to those involved in the design, distribution, financing, permitting, purchase, and use of
environmental technologies.

ETV works in partnership with recognized standards and testing organizations and stakeholder groups
consisting of regulators, buyers, and vendor organizations, with the full participation of individual technology
developers. The program evaluates the performance of innovative technologies by developing test plans that
are responsive to the needs of stakeholders, conducting field or laboratory tests (as appropriate), collecting
and analyzing data, and preparing peer-reviewed reports. All evaluations are conducted in accordance with
rigorous quality assurance protocols to ensure that data of known and adequate quality are generated and that
the results are defensible.

The Site Characterization and Monitoring Technologies Pilot (SCMT), one of 12 technology areas under
ETV, is administered by EPA's National Exposure Research Laboratory (NERL). With the support of the
U.S. Department of Energy's (DOE's) Environmental Management (EM) program, NERL selected a team
from Brookhaven National Laboratory (BNL) and Oak Ridge National Laboratory (ORNL) to perform the
verification of environmental decision support software. This verification statement provides a summary of
the test results of a demonstration of the University of Tennessee Research Corporation's (UTRC's) Spatial
Analysis and Decision Assistance (SADA)™ environmental decision support software product.

DEMONSTRATION DESCRIPTION
In September 1998, the performance of five decision support software (DSS) products was evaluated at the
New Mexico Engineering Research Institute located in Albuquerque, New Mexico. In October 1998, a sixth
DSS product was tested at BNL in Upton, New York. Each technology was independently evaluated by
comparing its analysis results with measured field data and, in some cases, known analytical solutions to the
problem.
   EPA-VS-SCM-35       The accompanying notice is an integral part of this verification statement         February 2000

-------
Depending on the software, each was assessed for its ability to evaluate one or more of the following
endpoints of environmental remediation problems: visualization, sample optimization, and cost-benefit
analysis.  The capabilities of the DSS were evaluated in the following areas: (1) the effectiveness of
integrating data and models to produce information that supports the decision, and (2) the information and
approach used to support the analysis.  Secondary evaluation objectives were to examine the DSS for its
reliability, resource requirements, range of applicability, and ease of operation. The verification study
focused on the developers' analysis of multiple test problems with different levels of complexity. Each
developer analyzed a minimum of three test problems. These test problems, generated mostly from actual
environmental data from six real remediation sites, were identified as Sites A, B, D, N, S, and T. The use of
real data  challenged the software systems because of the variability in natural systems.

The University of Tennessee Research Corporation (UTRC) demonstrated Spatial Analysis and Decision
Assistance (SADA) by performing visualization, sample optimization, and cost-benefit analysis for Sites N
and S. Site N had two separate problems, and both were evaluated using SADA. The Site N problems were
two-dimensional (2-D) soil contamination problems for three heavy metals (arsenic, chromium, and
cadmium). In the Site N sample optimization problem, data were supplied over a limited area of the site, and
the analyst was asked to develop a sampling strategy that characterized the remainder of the 125-acre site
while taking only 80 additional samples. The Site N cost-benefit problem contained 524 data points on a 14-
acre region of the site and required the analyst to perform a cost-benefit analysis of the remediation costs vs
cleanup goal for each of the three contaminants. In addition, the analyst was asked to estimate the human
health risks based on current conditions. The Site S test problem was a three-dimensional (3-D) groundwater
contamination cost-benefit problem for a single contaminant (chlordane). The analyst was provided with a
series of wells containing chlordane concentrations as a function of depth. The analyst was asked to define the
region, mass, and volume of the plume at contaminant threshold concentrations of 5 and 500 µg/L. Based on
this information and groundwater flow rates, estimates of current and future human health risks were
requested.
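
As an illustration of the kind of calculation the Site S problem asks for, the sketch below sums the volume and dissolved mass of grid cells whose interpolated concentration meets a threshold. It is not SADA's algorithm. The function name, the uniform cell volume, the porosity value, and the sample concentrations are all assumptions made for the example.

```python
def plume_volume_and_mass(concs_ug_per_l, cell_volume_m3,
                          threshold_ug_per_l, porosity=0.3):
    """Return (volume_m3, dissolved_mass_g) for cells at or above a threshold.

    concs_ug_per_l: interpolated concentration in each grid cell (ug/L).
    cell_volume_m3: volume of one (uniform) grid cell; an assumption here.
    porosity: assumed fraction of the cell volume occupied by groundwater.
    """
    volume_m3 = 0.0
    mass_g = 0.0
    for conc in concs_ug_per_l:
        if conc >= threshold_ug_per_l:
            volume_m3 += cell_volume_m3
            # Water in the cell (L) = cell volume (m3) * porosity * 1000 L/m3.
            water_l = cell_volume_m3 * porosity * 1000.0
            # Mass (g) = concentration (ug/L) * water (L) * 1e-6 g/ug.
            mass_g += conc * water_l * 1e-6
    return volume_m3, mass_g


# Hypothetical concentrations for four cells of 10 m3 each.
cells = [2.0, 10.0, 600.0, 1000.0]
for threshold in (5.0, 500.0):
    print(threshold, plume_volume_and_mass(cells, 10.0, threshold))
```

Running the same grid at both thresholds shows how the estimated plume volume shrinks as the threshold rises while most of the mass remains in the high-concentration cells.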

SADA was used to integrate large quantities of data into a visual framework for assistance in understanding a
site's contamination problem. For the Site N sample  optimization problem, the data were used to develop a
sampling scheme to characterize the site. Upon completion of the data-collection phase of the problem, maps
with the probability of exceeding threshold concentrations were provided. For the Site N cost-benefit
problem, SADA was used to estimate the cost of cleanup versus the cleanup threshold. Human health risks
were evaluated on the basis of current conditions. For the Site S problem, SADA was used to estimate the
volume of contamination above threshold levels and  human health risks based on current conditions.
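
A cost-versus-threshold curve of the kind described for the Site N cost-benefit problem can be sketched as follows. This is a minimal illustration, not SADA's implementation. It assumes a uniform grid, a single unit remediation cost per unit area, and invented concentration values.

```python
def cleanup_cost_curve(cell_concs, cell_area, unit_cost, thresholds):
    """Return a list of (threshold, area_to_remediate, cost) tuples.

    cell_concs: estimated concentration in each grid cell.
    cell_area:  area of one grid cell (uniform grid is an assumption).
    unit_cost:  assumed remediation cost per unit area.
    thresholds: candidate cleanup goals, in the same units as cell_concs.
    """
    curve = []
    for threshold in thresholds:
        # A cell is remediated when its estimate exceeds the cleanup goal.
        n_cells = sum(1 for conc in cell_concs if conc > threshold)
        area = n_cells * cell_area
        curve.append((threshold, area, area * unit_cost))
    return curve


# Hypothetical arsenic estimates (mg/kg) on four cells of 100 area units each.
for point in cleanup_cost_curve([10, 80, 130, 300], 100.0, 50.0, [75, 125]):
    print(point)
```

Each tuple is one point on the curve: a stricter (lower) cleanup goal enlarges the area to be remediated and therefore raises the cost.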

Details of the demonstration, including an evaluation of the software's  performance, may be found in the
report entitled Environmental Technology Verification Report: University of Tennessee Research
Corporation,  Spatial Analysis  and Decision Assistance (SADA), EPA/600/R-00/036.

TECHNOLOGY DESCRIPTION
SADA is an environmental software product that incorporates tools from various fields — including
visualization, geospatial analysis, statistical analysis, human health risk assessment, cost-benefit analysis,
sampling design, and decision  analysis — into a dynamic and interactive environment. Each of these modules
can be used independently or in an integrated fashion to address site-specific concerns in the characterization
and remedial  action design. SADA was designed to simplify and streamline several of the processes in
environmental characterization, risk assessment, and cost-benefit analysis to bring the information together in
a way that can help users make decisions about their particular site in a quick and cost-effective manner.
SADA is designed to assist environmental professionals who need to examine the data within a spatial
context. SADA runs on Windows 95, 98, and NT platforms.

VERIFICATION OF PERFORMANCE
The following performance characteristics of SADA were observed:

-------
Decision Support: SADA was designed as a decision support tool and directly addresses environmental
questions such as (1) the location and size of the area of contamination, (2) the size of the cleanup zone at a
specified contaminant threshold concentration or risk level, (3) the confidence in predicting the area of
contamination or cleanup zone, (4) the costs for remediating the cleanup zone, (5) the human health risks, and
(6) the optimal location for the next set of samples to best define the extent of contamination. In the
demonstration, UTRC was able to use SADA to quickly import data on contaminant concentrations, overlay
site maps, and integrate this information on a single platform. SADA demonstrated the ability to place the
information in a visual context and produced 2-D and 3-D maps that support data interpretation and decision
making. SADA was used in the demonstration to automatically generate maps showing contaminant
concentration, recommended cleanup zones, cost-benefit curves, and human health risk. These maps can be
based on the probability of exceeding specified contaminant threshold concentrations or risk levels and at
specified probability levels. SADA was also used to predict new sample locations based on statistical and/or
geostatistical analyses of the existing data.
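
The idea of mapping the probability of exceeding a threshold can be illustrated with a short sketch. It assumes that each grid cell has an estimated mean and standard deviation and that the estimation error is normally distributed. That normality assumption, the function names, and the rule of flagging cells whose exceedance probability reaches a chosen cutoff are all hypothetical conventions for this example and may differ from the method and probability-level definition used in SADA.

```python
import math


def exceedance_probability(mean, std_dev, threshold):
    """P(concentration > threshold) for a normal estimate at one location."""
    if std_dev <= 0.0:
        return 1.0 if mean > threshold else 0.0
    z = (threshold - mean) / std_dev
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cells_to_flag(cells, threshold, cutoff):
    """Return ids of cells whose exceedance probability is at least cutoff.

    cells is a list of (cell_id, estimated_mean, estimated_std_dev) tuples.
    """
    return [cell_id for cell_id, mean, std_dev in cells
            if exceedance_probability(mean, std_dev, threshold) >= cutoff]


# Hypothetical estimates (mg/kg) for three cells and a 100 mg/kg threshold.
cells = [("a", 120.0, 10.0), ("b", 100.0, 10.0), ("c", 80.0, 10.0)]
print(cells_to_flag(cells, 100.0, 0.5))
print(cells_to_flag(cells, 100.0, 0.9))
```

Under this convention, raising the cutoff shrinks the flagged area to the cells where exceedance is most certain.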

Documentation  of the SADA Analysis: UTRC staff used SADA to generate reports that provided an
adequate explanation of the process and parameters used to analyze each problem.  Documentation of data
transfer, manipulations of the data (e.g., how to treat contamination data as a function of depth in a well),  and
analyses were included. Model selection and parameters for statistical analysis and contouring were also
provided in the exportable documentation.

Comparison with Baseline Analysis and Data: SADA was able to generate 2-D and 3-D maps of
contaminant concentrations, human health risk, probability of exceeding contaminant threshold
concentrations as a function of degree of probability, and remedial zone maps  for specified contaminant
thresholds and probability levels. The maps included posting of data at the sample location, color coding of
sample points to represent a parameter (concentration or risk),  contaminant concentration contours, human
health point risks, and human health risk contours. SADA also generated cost-benefit curves for the cost of
remediation vs the cleanup threshold.  These curves could be calculated for varying degrees of probability in
the data. For the Site N sample optimization problem, the SADA analysis generated an acceptable match to
the data and the baseline analysis. When compared with the baseline geostatistical  analysis that used the entire
data set, SADA identified approximately 75% of the site that had arsenic contamination above 125 mg/kg
with the constraint of an additional 80 samples to characterize the entire 125-acre site. For the Site N cost-
benefit problem, contaminant contour and probability maps were consistent with the baseline interpolation
and geostatistical analysis. Estimates of the area where the contamination exceeded the threshold
concentrations matched, to within 21%, the baseline  interpolation and geostatistical analyses at the 50%
probability levels. Likewise, the area estimates at the 90% probability level were within 21% of the baseline
analyses and geostatistical analysis. The slight differences between SADA and the baseline analysis were due
to the different parameters used for interpolation.  For the Site S cost-benefit test problem, at the 50%
probability level there is good agreement between SADA, the baseline analysis using  Surfer™, and the
baseline geostatistical analysis. In fact, all three area estimates are within 13% of each other, indicating
agreement. For the 10% probability level, the SADA area estimates are 6% less at the 5-µg/L threshold and
9% less at 500-µg/L than the baseline area estimates. The difference between the SADA results and the
baseline analysis is due to the slightly different selection of boundaries of contamination and kriging
parameters selected for the analyses. Overall, there is close agreement among the area estimates produced by
SADA and the baseline geostatistical  analysis. Both the noncarcinogenic and carcinogenic risks calculated by
SADA for Site N and Site S were accurate and consistent with the baseline analysis and EPA's risk
assessment guidance for Superfund for all of the test problems.
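
The risk calculations referred to above follow the general intake-based form used in EPA's risk assessment guidance for Superfund: a chronic daily intake is computed from concentration and exposure parameters, then multiplied by a slope factor (carcinogenic risk) or divided by a reference dose (noncarcinogenic hazard quotient). The sketch below shows that arithmetic for ingestion of drinking water. The function names and every parameter value are illustrative assumptions and are not taken from SADA's databases or from the demonstration.

```python
def chronic_daily_intake(conc_mg_per_l, intake_l_per_day, exposure_freq_days_per_yr,
                         exposure_duration_yr, body_weight_kg, averaging_time_days):
    """Chronic daily intake in mg/kg-day for ingestion of water."""
    numerator = (conc_mg_per_l * intake_l_per_day
                 * exposure_freq_days_per_yr * exposure_duration_yr)
    return numerator / (body_weight_kg * averaging_time_days)


def carcinogenic_risk(intake_mg_per_kg_day, slope_factor):
    """Excess lifetime cancer risk = intake * slope factor."""
    return intake_mg_per_kg_day * slope_factor


def hazard_quotient(intake_mg_per_kg_day, reference_dose):
    """Noncarcinogenic hazard quotient = intake / reference dose."""
    return intake_mg_per_kg_day / reference_dose


# Illustrative residential parameters (assumed, not from the report):
# 0.005 mg/L (5 ug/L), 2 L/day, 350 days/yr, 30 yr, 70 kg, 70-yr averaging time.
cdi = chronic_daily_intake(0.005, 2.0, 350.0, 30.0, 70.0, 70.0 * 365.0)
print(cdi, carcinogenic_risk(cdi, slope_factor=0.35))  # slope factor is a placeholder
```

For a noncarcinogenic endpoint the averaging time is normally set to the exposure duration rather than a lifetime, so the intake passed to `hazard_quotient` would be computed with that shorter averaging time.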

Multiple Lines of Reasoning: UTRC staff conducted multiple data explorations and evaluations that were
supported by the statistical and geostatistical functions in SADA. This information provided a quantitative
measure of the confidence that could be placed in the decision. Several data interpolation routines were
considered on a problem-specific basis before UTRC staff selected the best one for data analysis. Several
sample optimization schemes are available for use. Selection of a particular scheme depends on the objectives
of the analysis and the amount of data.

-------
In addition to performance criteria, the following secondary criteria were evaluated:

Ease of Use: The demonstration showed that SADA was easy to use. The SADA graphical user interface has
a logical structure to facilitate use of the options in the software package. SADA accepts database files in
comma-delimited format; however, database files were supplied in .dbf format. The analyst imported the .dbf
files into another software program (Microsoft Excel) and converted them into comma-delimited files.
Drawing and map files could be read in .dxf format. Other common image file formats such as .jpg and .bmp
were not supported. Visualization results can be output to any other Windows application that supports the
use of the Clipboard, including commonly available software (e.g., Microsoft PowerPoint, Word, and
WordPerfect).

Efficiency and Range of Applicability: SADA relies on a flexible database format with user-defined inputs.
This provides a flexible platform that addresses problems efficiently and is tailored to the problem under
study. The database permits filtering on the contaminant identifier and location. SADA has an auxiliary
database that contains contaminants identified by name and Chemical Abstracts Service (CAS) number. This
feature facilitates data checking. SADA also has databases  containing toxicological and exposure scenario
parameters. These databases facilitate human health risk assessment. The software provides analysis on
spatially correlated data and can simulate a wide range of environmental media and conditions (e.g.,
contaminant in groundwater, soil, sediment, or surface water; multiple contaminants on a single site) to be
evaluated.
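
Filtering a comma-delimited data file by contaminant and location, as described above, can be sketched with the Python standard library. The column names (Name, CAS, Easting, Northing, Value) and the sample rows are hypothetical and do not represent SADA's required file layout.

```python
import csv
import io


def filter_records(csv_text, contaminant, x_range=None, y_range=None):
    """Return rows for one contaminant, optionally limited to a bounding box.

    Column names are assumptions made for this example.
    """
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Name"].strip().lower() != contaminant.lower():
            continue
        x, y = float(row["Easting"]), float(row["Northing"])
        if x_range and not (x_range[0] <= x <= x_range[1]):
            continue
        if y_range and not (y_range[0] <= y <= y_range[1]):
            continue
        selected.append(row)
    return selected


# Hypothetical comma-delimited input with three samples.
SAMPLE = """Name,CAS,Easting,Northing,Value
Arsenic,7440-38-2,100.0,200.0,80.5
Cadmium,7440-43-9,110.0,210.0,3.2
Arsenic,7440-38-2,500.0,900.0,140.0
"""

for record in filter_records(SAMPLE, "arsenic", x_range=(0.0, 200.0)):
    print(record["Name"], record["CAS"], record["Value"])
```

Carrying the CAS number alongside the name gives a second key for checking that records labeled with the same contaminant name really refer to the same substance.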

Training and Technical Support: SADA requires training for efficient and proper use. An analyst with a
background in environmental problems and a basic knowledge  of database operations, human health risk
assessment, and statistics/geostatistics can be using SADA  after one or two days of training. A detailed on-
line help system is supplied with the software package. The on-line help provides examples of how to conduct
analysis and gives recommendations on approaches  to statistical/geostatistical modeling. Examples of
software applications are provided as part of the software packages. A two-day training course is available.
Technical  support is available through e-mail.

Operator Skill Base: Effective use of all of the features of SADA requires that the operator possess a
thorough understanding of the use of geospatial modeling in analyzing environmental problems and human
health risk assessment. This includes an understanding of interpolation algorithms and geostatistics along with
a fundamental knowledge of database manipulations, sample optimization, and cost-benefit analysis.

Platform: During the demonstration, SADA Beta Version 3.0 was operated on a Windows 95 operating
system using a laptop with a 266-MHz Pentium processor,  128 MB of RAM, and 4 MB of video memory.

Cost: SADA will be distributed free over the Internet.

Overall Evaluation: The technical team concluded that the main strength of SADA is its technical approach
to assist environmental decision-makers by defining areas of concern based on user-defined contaminant
concentrations or human health risks. SADA's use of a geostatistical approach provides an estimate of the
degree of uncertainty in the prediction that provides  key information to assist in the selection  of future sample
locations and in determining cost-risk tradeoffs. The incorporation of databases of risk parameters, coupled
with the pull-down menus in SADA, makes risk calculations easy to perform. The integration of geostatistical
analysis, human health risk assessment, cost-benefit analysis, sampling design, and decision analysis into a
single software product makes SADA a powerful tool for analyzing spatially correlated data.  SADA
demonstrated the capability to accurately perform sample optimization analysis, estimate areas and volumes
of contamination for cost-benefit analysis, and estimate the probability of exceeding threshold levels in
concentration or risk.

The technical team did not notice any major limitations in SADA. Several minor limitations were noted. The
3-D visualizations provided only a qualitative depiction of the plume because a frame of reference (axis  scale
or surface maps) was not provided. Maps and drawings could be imported only as .dxf files; the capability to
import other graphic formats would be beneficial. Finally, data files could be imported only in comma-
delimited format, which requires reformatting in another software product.

A credible computer analysis of environmental problems requires good data, reliable and appropriate
software, adequate conceptualization of the site, and a technically defensible problem analysis. The results of
the demonstration show that the SADA software can be used to generate reliable and useful analyses for
evaluating environmental contamination problems. This is the only component of a credible analysis that can
be addressed by the software. The results of a SADA analysis can support decision making. Although SADA
has been demonstrated to have the capability to produce reliable and useful analyses, improper use of the
software can cause the results of the analysis to be misleading or inconsistent with the data. As with any
complex environmental DSS product, the quality of the output is directly dependent on the skill of the
operator.

As with any technology selection, the user must determine if this technology is appropriate for the application
and the project data quality objectives. For more  information on this and other verified technologies visit the
ETV web site at http://www.epa.gov/etv.
Gary J. Foley, Ph.D.
Director
National Exposure Research Laboratory
Office of Research and Development
                           David E. Reichle, Ph.D.
                           ORNL Associate Laboratory Director
                           Life Sciences and Environmental Technologies
   NOTICE: EPA verifications are based on evaluations of technology performance under specific, predetermined
   criteria and appropriate quality assurance procedures. EPA, ORNL, and BNL make no expressed or implied
   warranties as to the performance of the technology and do not certify that a technology will always operate as
   verified. The end user is solely responsible for complying with any and all applicable federal, state, and local
   requirements. Mention of commercial product names does not imply endorsement.

-------
                                   EPA/600/R-00/036
                                   February 2000
Environmental Technology
Verification Report
Environmental  Decision Support
Software
University of Tennessee Research
Corporation
Spatial Analysis and Decision
Assistance (SADA)
                    By

                  Terry Sullivan
              Brookhaven National Laboratory
                Upton, New York 11983

                Anthony Q. Armstrong
                  Amy B. Dindal
                 Roger A. Jenkins
              Oak Ridge National Laboratory
               Oak Ridge, Tennessee 37831

                  Jeff Osleeb
                  Hunter College
               New York, New York 10021

                  Eric N. Koglin
             U.S. Environmental Protection Agency
              Environmental Sciences Division
             National Exposure Research Laboratory
              Las Vegas, Nevada 89193-3478

-------
                                           Notice
The U.S. Environmental Protection Agency (EPA), through its Office of Research and Development (ORD),
and the U.S. Department of Energy's (DOE's) Environmental Management Program through the National
Analytical Management Program (NAMP), funded and managed, through Interagency Agreement No.
DW89937854 with Oak Ridge National Laboratory, the verification effort described herein. This report has
been peer-reviewed and administratively reviewed and has been approved for publication as an EPA
document. Mention of trade names or commercial products does not constitute endorsement or
recommendation for use of a specific product.

-------
                                  Table of Contents
    List of Figures	    v
    List of Tables 	   vii
    Foreword	    ix
    Acknowledgments	    xi
    Abbreviations and Acronyms	  xiii

1   INTRODUCTION	    1
    Background	    1
    Demonstration Overview	    2
    Summary of Analysis Performed by SADA	    2

2   SADA TECHNOLOGY DESCRIPTION	    4
    SADA Features	    4
    SADA Assessment Capabilities	    4
        Human Health Risk	    4
        Geospatial Analysis	    5
        Decision Analysis	    5

3   DEMONSTRATION PROCESS AND DESIGN	    6
    Introduction	    6
    Development of Test Problems	    6
        Test Problem Definition	    6
        Summary of Test Problems	    6
        Analysis of Test Problems	    7
    Preparation of Demonstration Plan	    9
    Summary of Demonstration Activities	    9
    Evaluation Criteria	   10
        Criteria for Assessing Decision Support	   10
            Documentation of the Analysis and Evaluation of the Technical Approach	   10
            Comparison of Projected Results with the Data and Baseline Analysis	   11
            Use of Multiple Lines of Reasoning	   11
        Secondary Evaluation Criteria	   11
            Documentation of Software	   11
            Training and Technical Support	   11
            Ease of Use	   12
            Efficiency  and Range of Applicability	   12

4   SADA EVALUATION	   13
    SADA Technical Approach	   13
    SADA Implementation of Geostatistical Approach 	   13
    Description of Test Problems	   14
        Site N Sample Optimization	   14
        Site N Cost-Benefit Problem	   14
        Site S Cost-Benefit Problem	   15
    Evaluation of SADA	   16
        Decision Support	   16
            Documentation of the SADA Analysis and Evaluation of the Technical Approach	   16
            Comparison of SADA Results with the Baseline Analysis and Data	   16
                Site N Sample Optimization Problem	   16

                                             iii

-------
                Site N Cost-Benefit Problem	   21
                Site S Cost-Benefit Problem	   26
            Multiple Lines of Reasoning	   33
        Secondary Evaluation Criteria	   33
            Ease of Use	   33
            Efficiency and Range of Applicability	   33
            Training and Technical Support	   33
        Additional Information about the SADA Software	   33
    Summary of Performance	   35

5   SADA UPDATE AND REPRESENTATIVE APPLICATIONS	   37
    Objective	   37
    Technology Update	   37
    Representative Applications	   37

6   REFERENCES	   38

Appendix A— Summary of Test Problems	   39
Appendix B — Description of Interpolation Methods	   45
                                         iv

-------
                                      List of Figures


 1    Site N initial sample locations provided to the analyst and arsenic contours for two threshold
      concentrations	  17
 2    Site N final sample locations obtained using SADA	  18
 3    SADA contour map for Site N arsenic concentration after completion of the sample
      optimization	  18
 4    Baseline analysis generated by Surfer for the Site N sample optimization problem	  19
 5    Baseline analysis generated by GSLIB for the Site N sample optimization problem	  20
 6    Sample locations and arsenic concentrations generated by SADA for the Site N
      cost-benefit problem	  21
 7    Arsenic contours generated by SADA for the Site N cost-benefit problem	  22
 8    Baseline analysis performed by Surfer, with kriging interpolation of the data, for
      arsenic concentration contours in the Site N cost-benefit problem	  23
 9    Cleanup zones for arsenic threshold concentration of 75 mg/kg with 50%
      probability  level generated by SADA	  24
10    Cleanup zones for arsenic threshold concentration of 75 mg/kg with 90%
      probability  level generated by SADA	  24
11    SADA-generated  curve for arsenic cleanup costs as a function of concentration at the 50%
      probability level	  25
12    SADA-generated curve for arsenic cleanup costs as a function of concentration at the 90%
      probability level	  26
13    Carcinogenic risk for arsenic based on residential scenario produced using SADA	  27
14    Summed carcinogenic risk for the three contaminants	  27
15    Two-dimensional depiction of chlordane concentrations for one 5-ft interval and
      3-D depiction of entire plume generated by SADA for Site S  	  29
16    Site S chlordane 5- and 500-µg/L contours for the analytical solution and the
      baseline contour obtained using kriging with an anisotropy ratio of 0.3	  30
17    SADA-generated Site S chlordane contours at 500 µg/L produced using the
      maximum value in each well	  31
18    Site S areas where chlordane threshold of 5 µg/L is exceeded with 90% probability	  32
19    Site S chlordane carcinogenic risk contours based on ingestion of groundwater for  a
      residential scenario 	  34

-------
                                     List of Tables
1    Summary of test problems	   7
2    Data supplied for test problems	   7
3    Site N soil contamination threshold concentrations for the sample optimization problem	  14
4    Site N soil contamination threshold concentrations for the cost-benefit problem	  15
5    Comparison of baseline analyses and SADA area estimates for the Site N cost-benefit
     test problem at the 50% probability level	  25
6    Comparison of baseline analyses and SADA area estimates for the Site N cost-benefit
     test problem at the 90% probability level	  25
7    Volume estimates for the Site S cost-benefit test problem by baseline analysis
     methods  and SADA	  35
8    SADA performance summary	  36
                                             vii

-------
                                          Foreword
The U.S. Environmental Protection Agency (EPA) is charged by Congress with protecting the nation's natural
resources. The National Exposure Research Laboratory (NERL) is EPA's center for the investigation of
technical and management approaches for identifying and quantifying risks to human health and the
environment. NERL's research goals are to (1) develop and evaluate technologies for the characterization and
monitoring of air, soil, and water; (2) support regulatory and policy decisions; and (3) provide the science
support needed to ensure effective implementation of environmental regulations and strategies.

EPA created the Environmental Technology Verification (ETV) Program to facilitate the deployment of
innovative technologies through performance verification and information dissemination. The goal of the
ETV Program is to further environmental protection by substantially accelerating the acceptance and use of
improved and cost-effective technologies. The ETV Program is intended to assist and inform those involved
in the design, distribution, permitting, and purchase of environmental technologies.  This program is
administered by NERL's Environmental Sciences Division in Las  Vegas, Nevada.

The U.S. Department of Energy's (DOE's) Environmental Management (EM) program has partnered with
EPA to provide cooperative technical management and funding support. DOE EM realizes that its goals for
rapid and cost-effective cleanup hinge on the deployment of innovative environmental  characterization and
monitoring technologies. To this end, DOE EM shares the goals and objectives of the ETV.

Candidate technologies for these programs originate from the private sector and must be commercially ready.
Through the ETV Program, developers are given the opportunity to conduct rigorous demonstrations of their
technologies under realistic field conditions. By completing the evaluation and distributing the results, EPA
establishes a baseline for acceptance and use of these technologies.
Gary J. Foley, Ph.D.
Director
National Exposure Research Laboratory
Office of Research and Development
                                                ix

-------
                                  Acknowledgments


The authors wish to acknowledge the support of all those who helped plan and conduct the demonstration,
analyze the data, and prepare this report. In particular, we recognize the technical expertise of Jeff Van Ee
(EPA NERL) and Budhendra Bhaduri (ORNL), who were peer reviewers of this report. For internal peer
review, we thank Marlon Mezquita (EPA Region 9); for technical and logistical support during the
demonstration, Dennis Morrison (NMERI); for evaluation of training during the demonstration, Marlon
Mezquita and Gary Hartman [DOE's Oak Ridge Operations Office (ORO)]; for computer and network
support, Leslie Bloom (ORNL); and for technical guidance and project management of the demonstration,
David Garden and Regina Chung (ORO), David Bottrell (DOE Headquarters), Stan Morton (DOE Idaho
Operations Office), Deana Crumbling (EPA's Technology Innovation Office), and Stephen Billets (EPA
NERL). The authors also acknowledge the participation of Robert Stewart of the University of Tennessee,
who performed the analyses during the demonstration.

For more information on the Decision Support Software Technology Demonstration, contact

       Eric N. Koglin
       Project Technical Leader
       Environmental Protection Agency
       Characterization and Research Division
       National Exposure Research Laboratory
       P.O. Box 93478
       Las Vegas, Nevada 89193-3478
       (702) 798-2432
For more information on the SADA product, contact

       Robert N. Stewart
       University of Tennessee
       Temple Court #304
       804 Volunteer Boulevard
       Knoxville, TN 37996-1610
       (865) 241-5741

-------

                          Abbreviations and Acronyms
As            arsenic
.bmp          bitmap file
BNL          Brookhaven National Laboratory
CAS          Chemical Abstract Service
Cd            cadmium
CD-ROM     compact disk — read only memory
Cr            chromium
CTC          carbon tetrachloride
DBCP         dibromochloropropane
.dbf          database file
DCA          dichloroethane
DCE          dichloroethene
DCP          dichloropropane
DOE          U.S. Department of Energy
DSS          decision support software
.dxf          data exchange format file
EDB          ethylene dibromide
EM           Environmental Management Program (DOE)
EPA          U.S. Environmental Protection Agency
ESRI          Environmental Systems Research Institute
ETV          Environmental Technology  Verification Program
FTP          file transfer protocol
Geo-EAS      Geostatistical Environmental Assessment Software
GIS           geographic information system
GUI          graphical user interface
GSLIB        Geostatistical Software Library (software)
HTML        Hypertext Markup Language
IDW          inverse distance weighting
MB           megabyte
MHz          megahertz
NAMP        National Analytical Management Program (DOE)
NERL         National Exposure Research Laboratory (EPA)
NMERI       New Mexico Engineering Research Institute
ORD          Office of Research and Development (EPA)
ORNL        Oak Ridge  National Laboratory
PCE          perchloroethene or tetrachloroethene
ppm          parts per million
QA           quality assurance
QC           quality control
RAM         random access memory
SADA        Spatial Analysis and Decision Assistance (software)
SCMT        Site Characterization and Monitoring Technology
SVGA        super video graphics adapter
TCA          trichloroethane
TCE          trichloroethene
Tc-99         technetium-99
UTRC        University of Tennessee Research Corporation
VC           vinyl chloride
VOC         volatile organic compound
2-D           two-dimensional
3-D           three-dimensional

-------
                               Section 1 — Introduction
Background
The U.S. Environmental Protection Agency (EPA)
has created the Environmental Technology
Verification Program (ETV) to facilitate the
deployment of innovative or improved
environmental technologies through performance
verification and dissemination of information. The
goal of the ETV Program is to further environmental
protection by  substantially accelerating the
acceptance and use of improved and cost-effective
technologies.  ETV seeks to achieve this goal by
providing high-quality, peer-reviewed data on
technology performance to those involved in the
design, distribution, financing, permitting, purchase,
and use of environmental technologies.

ETV works in partnership with recognized standards
and testing organizations and stakeholder groups
consisting of regulators, buyers, and vendor
organizations, with the full participation of
individual technology developers. The program
evaluates the performance of innovative
technologies by developing test plans that are
responsive to  the needs of stakeholders, conducting
field or laboratory tests (as appropriate), collecting
and analyzing data, and preparing peer-reviewed
reports. All evaluations are conducted in accordance
with rigorous  quality  assurance (QA) protocols to
ensure that data of known and adequate quality are
generated and that the results are defensible.

ETV is a voluntary program that seeks to provide
objective performance information to all of the
actors in the environmental marketplace and to assist
them in making informed technology decisions.
ETV does not rank technologies or compare their
performance,  label or list technologies as acceptable
or unacceptable, seek to determine "best available
technology," nor approve or disapprove
technologies.  The program does not evaluate
technologies at the bench or pilot scale and does not
conduct or support research.

The program now operates 12 pilots covering a
broad range of environmental areas. ETV has begun
with a 5-year  pilot phase (1995-2000) to test a wide
range of partner and procedural alternatives in
various pilot areas, as well as the true market
demand for and response to such a program. In these
pilots, EPA utilizes the expertise of partner
"verification organizations" to design efficient
processes for conducting performance tests of
innovative technologies.  These expert partners are
both public and private organizations, including
federal laboratories, states, industry consortia, and
private sector facilities. Verification organizations
oversee and report verification activities based on
testing and QA protocols developed with input from
all major stakeholder/customer groups associated
with the technology area. The demonstration
described in this report was administered by the Site
Characterization and Monitoring Technology
(SCMT) Pilot.  (To learn more about ETV, visit
ETV's Web site at http://www.epa.gov/etv.)

The SCMT pilot is  administered by EPA's National
Exposure Research Laboratory (NERL). With the
support of the U.S.  Department of Energy's (DOE's)
Environmental Management (EM) program, NERL
selected a team from Brookhaven National
Laboratory (BNL) and Oak Ridge National
Laboratory (ORNL) to perform the verification of
environmental  decision support software. Decision
support software (DSS) is designed to integrate
measured or modeled data (such as soil  or
groundwater contamination levels) into a framework
that can be used for decision-making purposes.
There are many potential ways to use such software,
including visualization of the nature and extent of
contamination, locating optimum future samples,
assessing costs of cleanup versus  benefits obtained,
or estimating human health risks.  The primary
objective of this demonstration was to conduct an
independent evaluation of each software's capability
to evaluate three common endpoints of
environmental  remediation problems: visualization,
sample optimization, and cost-benefit analysis.
These endpoints were defined as follows.

•   Visualization — using the software to organize
    and display site and contamination data in ways
    that promote understanding of current
    conditions, problems, potential solutions, and
    eventual cleanup choices;
•   Sample optimization  — selecting the minimum
    number of samples needed to define a
    contaminated area within a predetermined
    statistical confidence;

•   Cost-benefit analysis — assessment of either the
    size of the zone to be remediated according to
    cleanup goals, or estimation of human health
    risks due to the contaminants. These can be
    related to costs of cleanup.

The developers were permitted to select the
endpoints that they wished to demonstrate because
each piece of software had unique features and
focused on different aspects of the three endpoints.
Some focused entirely on visualization and did not
attempt sample optimization or cost-benefit analysis,
while others focused on the technical aspects of
generating cost-benefit or sample-optimization
analysis, with a minor emphasis on visualization.
The evaluation of the DSS  focused only on the
analyses conducted during the demonstration. No
penalty was assessed for performing only part of the
problem (e.g., performing only visualization).

Evaluation of software that is used for complex
environmental problems is by necessity primarily
qualitative in nature. It is not meaningful to
quantitatively evaluate how well predictions match
at locations where data have not been collected.
(This is discussed in more detail in Appendix B.) In
addition, the selection of a software product for a
particular application relies heavily on the user's
background, personal preferences (for instance,
some people prefer Microsoft Word, while others
prefer Corel WordPerfect for word processing), and
the intended use of the software (for example,
spreadsheets can be used for managing data;
however, programs specifically designed for
database management would be a better choice for
this type of application). The objective of these
reports is to provide sufficient information to judge
whether the DSS product has the analysis
capabilities and features that will be useful for the
types of problems typically encountered by the
reader.

Demonstration Overview
In September 1998, a demonstration was conducted
to verify the performance of five environmental
software programs: Environmental Visualizations
System (C Tech Development Corp.), ArcView and
associated software extenders [Environmental
Systems Research Institute (ESRI)], GroundwaterFX
(DecisionFX Corp.), SamplingFX (DecisionFX
Corp.), and SitePro (Environmental Software Corp.).
In October, a sixth software package from the
University of Tennessee Research Corporation
(UTRC), Spatial Analysis and Decision Assistance
(SADA), was tested. This report contains the
evaluation of SADA.

Each developer was asked to use their own software
to address a minimum of three test problems. In
preparation for the demonstration, ten sites were
identified as having data sets that might provide
useful test cases for the demonstration. All of these
data received a quality control review to screen out
sites that did not have adequate data sets. After the
review, ten test problems were developed from field
data at six different sites. Each site was given a
unique identifier (Sites A, B, D, N, S, and T). Each
test problem focused on different aspects of
environmental remediation problems. From the
complete data sets, test problems that were subsets
of the entire data set were prepared. The
demonstration technical team performed an
independent analysis of each of the ten test problems
to ensure that the data sets were complete.

All developers were required to choose either Site S
or Site N as one of their three problems because
these sites had the most data available for
developing a quantitative evaluation of DSS
performance.

Each DSS was evaluated on its own merits based on
the  evaluation criteria presented in Section 3.
Because of the inherent variability in soil and
subsurface contamination, most of the evaluation
criteria are qualitative. Even when a direct
comparison is made between the developer's
analysis and the baseline analysis, different
numerical algorithms and assumptions  used to
interpolate data between measured values at known
locations make it almost impossible to  make a
quantitative judgement as to which technical
approach is superior. The comparisons, however, do
permit an evaluation of whether the analysis is
consistent with the data supplied for the analysis and
therefore useful in supporting remediation decisions.

Summary of Analysis Performed by
SADA
SADA is a Windows 95, 98, or NT environmental
software product that incorporates tools from various
fields — including visualization, geospatial analysis,
statistical analysis, human health risk assessment,
cost-benefit analysis, sampling design,  and decision
analysis — into an interactive environment. SADA
relies mainly  on statistical and geostatistical
algorithms to quantify the nature and extent of
uncertainties in environmental data and various
cost-risk methods to provide objective guidance on
key decision analysis needs. SADA provides the
information in a visual form, as two-dimensional
(2-D) and three-dimensional (3-D) graphics, to assist
the user in data interpretation and provides statistical
information about the contamination (e.g., area or
volume of contamination, standard deviation,
probability  of exceeding cleanup goals).

UTRC staff chose to use SADA to perform all three
endpoints (visualization, sample optimization, and
cost-benefit analysis) using data from the Site N
sample optimization problem, the Site N cost-benefit
problem, and the Site S cost-benefit problem.
Visualization results were presented on all three
problems. The three problems were analyzed using a
statistical approach that permitted the evaluation to
be defined in terms of probability of exceeding a
threshold in risk or concentration.

The Site N  sample optimization problem involved
soil contamination from three heavy metals —
arsenic, cadmium, and chromium. Data were
provided from a small section of the 125-acre site,
and the analyst was required to define contamination
throughout  the site using only 80 additional samples.
SADA was used in an iterative fashion to select a
few sample locations for further data collection. This
information was used to generate the next set of
sample locations, and the process continued until 80
sample locations had been specified. Using the final
data set, SADA generated contaminant concentration
contour maps and remediation zone maps based on
contaminant threshold levels and the probability
level of the interpolation results. These maps were
overlain with site features (roads and surface
waterways).
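
The iterative design described above can be sketched as a simple loop. This is an illustration only and is not SADA's algorithm: the gap-filling rule (pick the candidate locations farthest from any existing sample), the batch size, and the names `farthest_candidates`, `iterative_design`, and `measure` are all assumptions made for the example. Only the 80-sample budget comes from the test problem.

```python
import math

def farthest_candidates(existing, candidates, k):
    """Return the k candidates with the largest distance to their nearest existing sample."""
    def gap(p):
        return min(math.dist(p, s) for s in existing)
    return sorted(candidates, key=gap, reverse=True)[:k]

def iterative_design(initial, candidates, measure, budget=80, batch=5):
    """Add samples in batches until the budget is spent or no candidates remain."""
    samples = dict(initial)                      # (x, y) -> measured value
    remaining = [c for c in candidates if c not in samples]
    spent = 0
    while spent < budget and remaining:
        k = min(batch, budget - spent)
        picks = farthest_candidates(list(samples), remaining, k)
        for p in picks:
            samples[p] = measure(p)              # stands in for field collection and analysis
            remaining.remove(p)
        spent += len(picks)
    return samples

# Hypothetical 10 x 10 grid of candidate locations and a made-up measurement function.
grid = [(i, j) for i in range(10) for j in range(10)]
design = iterative_design({(0, 0): 1.0}, grid, measure=lambda p: float(p[0] + p[1]))
```

Because gaps are recomputed only between batches, picks within one batch may cluster; a smaller batch size reduces that effect at the cost of more rounds.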

The Site N  cost-benefit problem also involved soil
contamination from the three heavy metals. SADA
was used to generate maps for each contaminant
with sample locations color-coded by concentration
and suggested cleanup areas for each contaminant at
two threshold levels. These cleanup maps were
calculated based on the probability of exceeding the
threshold concentration for two probability levels.
SADA was then used to produce cost-benefit curves
of remediation cost vs cleanup goal for each
contaminant at the two probability levels. Finally,
SADA was used to estimate human health risks
based on current conditions for all contaminants.
The risks were summed to obtain total risk, and
SADA generated maps of human health risk.  In
addition, SADA generated maps with sample points
that had contamination levels exceeding a threshold
health risk inscribed in a square to facilitate location
of these points.
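
A cost-benefit curve of the kind described above relates each candidate cleanup goal to the cost of remediating everything that exceeds it. The sketch below is a simplified illustration, not SADA's method: it uses point estimates on a grid rather than exceedance probabilities, and the concentrations, `cell_area`, and `unit_cost` values are hypothetical.

```python
def remediation_cost(estimates, goal, cell_area, unit_cost):
    """Cost of remediating every grid cell whose estimate exceeds the cleanup goal."""
    n_cells = sum(1 for c in estimates if c > goal)
    return n_cells * cell_area * unit_cost

def cost_benefit_curve(estimates, goals, cell_area, unit_cost):
    """Return (cleanup goal, cost) pairs, one per candidate goal."""
    return [(g, remediation_cost(estimates, g, cell_area, unit_cost)) for g in goals]

# Hypothetical estimated concentrations (ppm) for eight grid cells.
grid = [5.0, 12.0, 40.0, 75.0, 130.0, 20.0, 8.0, 90.0]
curve = cost_benefit_curve(grid, goals=[10, 50, 100], cell_area=100.0, unit_cost=2.5)
# curve == [(10, 1500.0), (50, 750.0), (100, 250.0)]
```

A stricter (lower) cleanup goal pulls more cells into the remediation zone, so cost rises as the goal falls.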

The Site S cost-benefit problem involved
groundwater contamination by chlordane. SADA
was used to define the 3-D volume of groundwater
contamination above specified contamination levels
and to estimate human health risks from drinking the
contaminated water. SADA was used to divide the
data into a series of 5-ft vertical strata based on
depth below ground surface. The data were analyzed
in these strata to generate a series of 2-D maps for
concentration contours, cleanup zones based on
threshold concentration levels, and carcinogenic and
noncarcinogenic health risks.  The maps were
prepared at two probability levels.  The 2-D
representations of each stratum were combined to
provide a 3-D depiction of the concentration,
cleanup zone, and risks.
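
Dividing 3-D data into fixed-thickness depth strata, as described above, amounts to binning each sample by its depth. The sketch below assumes records of the form (x, y, depth in feet, concentration) and treats the top of each stratum as inclusive and the bottom as exclusive; both the record layout and that boundary convention are assumptions, not details taken from SADA.

```python
from collections import defaultdict

def group_by_stratum(samples, thickness=5.0):
    """Group (x, y, depth_ft, conc) records into depth strata of fixed thickness."""
    strata = defaultdict(list)
    for x, y, depth, conc in samples:
        index = int(depth // thickness)          # 0 for [0, 5), 1 for [5, 10), ...
        top = index * thickness
        strata[(top, top + thickness)].append((x, y, conc))
    return dict(sorted(strata.items()))          # shallowest stratum first

# Hypothetical samples: (x, y, depth below ground surface in ft, concentration).
samples = [(0, 0, 2.0, 1.0), (1, 1, 5.0, 2.0), (2, 2, 7.5, 3.0), (3, 3, 14.9, 4.0)]
strata = group_by_stratum(samples)
```

Each stratum can then be interpolated as an independent 2-D data set and the resulting maps stacked by depth.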

Section 2 contains a brief description of the
capabilities of SADA.  Section 3 outlines the
approach used to develop the test problems, a
summary description of the ten test problems, the
approach used to perform the baseline analyses  used
for comparison with the developers' analyses, and the
evaluation criteria. Section 4 presents the technical
review of the analyses performed by SADA. This
includes a detailed discussion of the problems
attempted, comparisons of the SADA analyses and
the baseline results, and an evaluation of SADA
against the  criteria established in Section 3. Section
5 presents an update on the SADA technology and
provides examples of representative applications of
SADA in environmental problem-solving.

-------
                  Section 2 — SADA Technology Description
The following section provides a general overview
of the capabilities of UTRC's SADA software
product. The information was supplied by UTRC.

SADA Features
Spatial Analysis and Decision Assistance (SADA) is an
environmental software product for Windows NT 4
(Service Pack 4 or higher) and Windows 95/98 that
incorporates tools from various fields — including
visualization, geospatial analysis, statistical analysis,
human health risk assessment, cost-benefit analysis,
sampling design, and decision analysis — into a
dynamic and interactive environment. Each of these
modules can be used independently or in an
integrated fashion to address site-specific concerns
in the characterization and remedial action design.
SADA was designed to simplify and streamline
several of the processes in environmental
characterization and to bring the information
together in a way that can help users make decisions
about their particular site in a quick and cost-
effective manner. SADA may be found useful by
anyone who needs to look at data within a spatial
context. These users include

•  statisticians,
•  human health risk assessors,
•  GIS/visualization users,
•  project managers, and
•  stakeholders.

SADA was developed by UTRC and can be
accessed through the SADA website at
http://www.sis.utk.edu/cis/SADA. Technical
assistance is contained  in SADA's on-line help and
on the SADA website.  Formal training modules are
currently being developed.

SADA provides 2-D and 3-D visualization. The
visualization techniques in SADA were designed to
be simple to use and easy to understand and to
facilitate the data-exploration, modeling, and
decision analysis components. Two-dimensional
information is presented as simple xy plots. Three-
dimensional information is presented through two
different methods. The first is by 2-D slices (layers)
in the third dimension.  The user can easily set the
depth of each of these layers.  The layer approach,
while not a true 3-D visualization, provides a way to
see results quickly in daily application. In addition,
environmental data are often categorized by depth
(e.g., surface, subsurface, 0-1 ft, 0-2 ft) during
remedial investigations, and SADA was designed to
fit into this type of framework. The second method
allows the user to view true 3-D volume rendering.
All the standard methods for viewing 3-D
information are available.

SADA can accept any map layer from a geographic
information system (GIS) if saved in a data
exchange format (.dxf) file. Multiple layers can be
imported into SADA, and the user can  control the
layer order and coloring scheme.

In addition, the user can select a subregion of the site
to conduct the analysis. This region is defined by a
user-defined polygon with only the interior region
considered in the analysis.

SADA provides methods for quick and easy data
exploration. Tools include statistical analysis, visual
database queries, and basic data screening exercises.
All these tools can be applied to the  entire site or to
any subset of the site. Similarly, they may be applied
to all or  some of the contaminants. In addition, the
user may select any region of the site and
immediately view the human health risk results for
that region.

For most applications, minimum system
requirements for SADA are a Pentium  computer
with 32 MB of RAM, a clock speed  of 120 MHz, a
disk drive with 50 MB of free space, and a super
video graphics adapter (SVGA) monitor.  For more
involved modeling, particularly for 3-D geospatial
models,  a higher-performance computer is
recommended. As an example, SADA has
performed well under these conditions  on a
266-MHz Pentium Pro with 128 MB of RAM and
100 MB of free disk space. To visualize true 3-D
volumes a minimum of 16 million colors is required.

SADA Assessment Capabilities
Human Health Risk
SADA provides the user with a full human health
risk assessment module and associated databases.
The risk models follow EPA's Risk Assessment
Guidance for Superfund (EPA 1989) and can be
customized to fit site-specific exposure conditions.
Updated toxicological databases and default scenario
parameters can be downloaded over the web directly
from SADA.  For radioactive and nonradioactive
contaminants, SADA simulates five land-use
scenarios (residential,  industrial, agricultural,
recreational, and excavation) and five exposure
pathways [ingestion, inhalation, dermal contact,
external (radiation), and food consumption]. The
exposures resulting from different pathways and
contaminants can be summed to provide total
exposure from all contaminants.
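
The summation across pathways and contaminants described above can be illustrated with a nested table of per-pathway values. This is a minimal sketch that assumes the contributions are simply additive; the contaminant names, pathways, and numbers are hypothetical, and the function `total_risk` is not part of SADA.

```python
def total_risk(risks):
    """Sum risk over pathways for each contaminant, then over all contaminants.

    risks: {contaminant: {pathway: risk}}
    Returns (per-contaminant totals, overall total).
    """
    by_contaminant = {name: sum(paths.values()) for name, paths in risks.items()}
    return by_contaminant, sum(by_contaminant.values())

# Hypothetical per-pathway risk values for two contaminants.
risks = {
    "arsenic": {"ingestion": 2e-5, "dermal contact": 1e-6},
    "cadmium": {"inhalation": 3e-6},
}
per_contaminant, overall = total_risk(risks)
```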

Geospatial Analysis
SADA provides several tools for performing
geospatial analysis. These include methods for
measuring spatial correlation among data, modeling
spatial correlation,  and producing concentration,
risk, probability, variance, and cleanup maps.
Among these tools are four geospatial interpolators:
ordinary kriging, indicator kriging, inverse distance,
and nearest neighbor. With these tools, the user can
generate concentration-contour, probability, risk,
and remedial design maps.
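
Two of the interpolators named above, inverse distance and nearest neighbor, are simple enough to sketch directly. The code below is a generic textbook version, not SADA's implementation; the `power` exponent of 2 and the (x, y, value) sample layout are assumptions, and the kriging methods are not shown.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (xi, yi, value) samples."""
    num = den = 0.0
    for xi, yi, v in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return v                    # honor the measured value at a sample location
        w = 1.0 / d ** power            # closer samples receive larger weights
        num += w * v
        den += w
    return num / den

def nearest_neighbor(samples, x, y):
    """Value of the sample closest to (x, y)."""
    return min(samples, key=lambda s: math.hypot(x - s[0], y - s[1]))[2]

# Hypothetical samples: two measurements 2 units apart.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint = idw(samples, 1.0, 0.0)               # equal weights, so 20.0
nearest = nearest_neighbor(samples, 1.5, 0.0)   # closer to the second sample, so 30.0
```

Evaluating either function at every node of a regular grid yields the values from which a concentration-contour map can be drawn.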

Decision Analysis
SADA's decision support tools include cost-benefit
analysis, defining areas of concern, and sampling
optimization. SADA produces cost-benefit curves
that demonstrate the relationship between the
cleanup goal (concentration- or risk-based) and the
cost of remediation. Based on the decision rule,
SADA estimates the location of areas of concern.
The decision rule includes components such as the
cleanup goal, the level of confidence, and whether
the goal applies to the entire site or any part of the
site. These areas of concern can then serve as a basis
for remedial action design. SADA allows the user to
choose from a variety of strategies for determining
where to collect data in the next round of sampling.
Depending on the chosen geospatial interpolator, the
following five strategies are available: adaptive fill,
estimate rank, variance rank, percentile rank, and
uncertainty rank.
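
The report names the sampling strategies but does not define them. As a hedged illustration, the sketch below reads "estimate rank" as choosing the grid nodes with the highest estimated value and "variance rank" as choosing the nodes with the highest estimation variance; those readings, the node layout, and the function `rank_candidates` are assumptions rather than SADA's documented behavior.

```python
def rank_candidates(nodes, key, n):
    """Return the n grid nodes with the largest value of the given attribute."""
    return sorted(nodes, key=lambda node: node[key], reverse=True)[:n]

# Hypothetical interpolation output at four grid nodes.
nodes = [
    {"xy": (0, 0), "estimate": 12.0, "variance": 0.5},
    {"xy": (0, 1), "estimate": 48.0, "variance": 2.0},
    {"xy": (1, 0), "estimate": 30.0, "variance": 9.0},
    {"xy": (1, 1), "estimate": 5.0, "variance": 4.0},
]
by_estimate = [n["xy"] for n in rank_candidates(nodes, "estimate", 2)]
by_variance = [n["xy"] for n in rank_candidates(nodes, "variance", 2)]
```

Under these readings the two strategies can select different locations: one targets suspected hot spots, the other targets places where the model is least certain.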

-------
              Section 3 — Demonstration Process and Design
Introduction
The objective of this demonstration was to conduct
an independent evaluation of the capabilities of
several DSSs in the following areas: (1) effective-
ness in integrating data and models to produce
information that supports decisions pertaining to
environmental contamination problems, and (2) the
information and approach used to support the
analysis. Specifically, three endpoints were
evaluated:

•   Visualization — Visualization software was
    evaluated in terms of its ability to integrate site
    and contamination data in a coherent and
    accurate fashion that aids in understanding the
    contamination problem. Tools used in
    visualization can range from data display in
    graphical or contour form to integrating site
    maps and aerial photos into the results.
•   Sample optimization — Sample optimization
    was evaluated for soil and groundwater
    contamination problems in terms of the
    software's ability to select the minimum number
    of samples needed to define a contaminated
    region with a specified level of confidence.
•   Cost-benefit analysis — Cost-benefit analysis
    involved either defining the size of remediation
    zone as a function of the cleanup goal or
    evaluating the potential human health risk. For
    problems that defined the contamination zone,
    the cost could be evaluated in terms of the size
    of the zone, and cost-benefit analysis could be
    performed for different cleanup levels or
    different statistical confidence levels. For
    problems that calculated human health risk, the
    cost-benefit calculation would require
    computing the cost to remediate the
    contamination as a function of reduction in
    health risk.

Secondary evaluation objectives for this
demonstration were to examine the reliability,
resource requirements, range of applicability, and
ease of operation of the DSS. The developers
participated in this demonstration in order to
highlight the range and utility of their software in
addressing the three endpoints discussed above.
Actual users might achieve results that  are less
reliable, as reliable, or more reliable than those
achieved in this demonstration, depending on their
expertise in using a given software to solve
environmental problems.

Development of Test Problems
Test Problem Definition
A problem development team was formed to collect,
prepare, and conduct the baseline analysis of the
data. A large effort was initiated to collect data sets
from actual sites with an extensive data collection
history. Literature review and contact with different
government agencies (EPA field offices, DOE, the
U.S. Department of Defense, and the U.S.
Geological Survey) identified ten different sites
throughout the U.S. that had the potential for
developing test problems for the demonstration. The
data from these ten sites were screened for
completeness of data, range of environmental
conditions covered, and potential for developing
challenging and defensible test problems for the
three endpoints of the demonstration. The objective
of the screening was to obtain a set of problems that
covered a wide range of contaminants (metals,
organics, and radionuclides), site conditions, and
source conditions (spills, continual slow release, and
multiple releases over time). On the basis of this
screening,  six sites were selected for development of
test problems. Of these six sites, four had sufficient
information to provide multiple test problems. This
provided a total of ten test problems for use in the
demonstration.

Summary of Test Problems
A detailed description of the ten test problems was
supplied to the developers as part of the
demonstration (Sullivan, Armstrong, and Osleeb
1998).  A general description of each of the problems
can be found in Appendix A. This description
includes the operating history of the site, the
contaminants of concern, and the objectives of the
test problem (e.g., define the volume over which the
contaminant concentration exceeds 100 µg/L). The
test problems analyzed by UTRC are discussed in
Section 4 as part of the evaluation of SADA's
performance.

Table 1 summarizes the ten problems by site
identifier, location of contamination (soil or
groundwater), problem endpoints, and contaminants
of concern. The visualization endpoint could be
performed on all ten problems. In addition, there
were four sample optimization problems, four
cost-benefit problems, and two problems that combined
sample optimization and cost-benefit issues. The
range of contaminants considered included metals,
volatile organic compounds (VOCs), and
radionuclides. The range of environmental
conditions included two- and three-dimensional soil
and groundwater contamination problems over
varying geologic, hydrologic, and environmental
settings. Table 2 provides a summary of the types of
data supplied with each problem.

 Table 1.  Summary of test problems

 Site        Media        Problem endpoints                    Contaminants
 identifier
 ----------  -----------  -----------------------------------  ---------------------------------------
 A           Groundwater  Visualization, sample optimization   Dichloroethene, trichloroethene
 A           Groundwater  Visualization, cost-benefit          Perchloroethene, trichloroethane
 B           Groundwater  Visualization, sample optimization,  Trichloroethene, vinyl chloride,
                          cost-benefit                         technetium-99
 D           Groundwater  Visualization, sample optimization,  Dichloroethene, dichloroethane,
                          cost-benefit                         trichloroethene, perchloroethene
 N           Soil         Visualization, sample optimization   Arsenic, cadmium, chromium
 N           Soil         Visualization, cost-benefit          Arsenic, cadmium, chromium
 S           Groundwater  Visualization, sample optimization   Carbon tetrachloride
 S           Groundwater  Visualization, cost-benefit          Chlordane
 T           Soil         Visualization, sample optimization   Ethylene dibromide,
                                                               dibromochloropropane, dichloropropane,
                                                               carbon tetrachloride
 T           Groundwater  Visualization, cost-benefit          Ethylene dibromide,
                                                               dibromochloropropane, dichloropropane,
                                                               carbon tetrachloride

Analysis of Test Problems
Prior to the demonstration, the demonstration
technical team performed a quality control
examination of all data sets and test problems. This
involved reviewing database files for improper data
(e.g., negative concentrations), removing
information that was not necessary for the
demonstration (e.g., site descriptors), and limiting
the data to the contaminants, the region of the site,
and the time frame covered by the test problems
(e.g., only data from one year for three
contaminants).  For sample optimization problems, a
limited data set was prepared for the developers as  a
starting point for the analysis. The remainder of the
data was reserved to provide input concentrations to
developers for their sample optimization analysis.
For cost-benefit problems, the analysts were
provided with an extensive data set for each test
problem with a few data points reserved for
checking the DSS analysis. The data quality review
also involved importing all graphics files (e.g., .dxf
and .bmp) that contained information on surface
structures such as buildings, roads, and water bodies
to ensure that they were readable and useful for
problem development. Many of the drawing files
were prepared as ESRI shape files compatible with
ArcView™. ArcView was also used to examine the
graphics files.

     Table 2. Data supplied for test problems

 Site history          Industrial operations, environmental settings, site descriptions
 Surface structure     Road and building locations, topography, aerial photos
 Sample locations      x, y, z coordinates for soil surface samples, soil borings,
                       groundwater wells
 Contaminants          Concentration data as a function of time and location (x, y, and z)
                       for metals, inorganics, organics, radioactive contaminants
 Geology               Soil boring profiles, bedrock stratigraphy
 Hydrogeology          Hydraulic conductivities in each stratigraphic unit; hydraulic head
                       measurements and locations
 Transport parameters  Sorption coefficient (Kd), biodegradation rates, dispersion
                       coefficients, porosity, bulk density
 Human health risk     Exposure pathways and parameters, receptor location
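
The quality control screening of the database files described in this subsection (rejecting improper values such as negative concentrations, dropping fields that are not needed, and limiting the data to selected contaminants and a single time frame) can be sketched as a record filter. The field names, the record layout, and the function `screen_records` are hypothetical; the demonstration team's actual procedure is not documented at this level of detail.

```python
KEEP_FIELDS = ("x", "y", "z", "contaminant", "year", "conc")

def screen_records(records, contaminants, year):
    """Keep valid records for the chosen contaminants and year, with only the needed fields."""
    kept = []
    for rec in records:
        if rec["conc"] is None or rec["conc"] < 0:
            continue                                   # improper data, e.g. negative concentration
        if rec["contaminant"] not in contaminants or rec["year"] != year:
            continue                                   # outside the scope of the test problem
        kept.append({k: rec[k] for k in KEEP_FIELDS})  # drops extras such as site descriptors
    return kept

# Hypothetical raw records, including one negative value and one out-of-scope year.
raw = [
    {"x": 1, "y": 2, "z": 0, "contaminant": "arsenic", "year": 1996, "conc": 14.0, "site": "N"},
    {"x": 3, "y": 4, "z": 0, "contaminant": "arsenic", "year": 1996, "conc": -1.0, "site": "N"},
    {"x": 5, "y": 6, "z": 0, "contaminant": "cadmium", "year": 1993, "conc": 2.0, "site": "N"},
    {"x": 7, "y": 8, "z": 0, "contaminant": "lead", "year": 1996, "conc": 9.0, "site": "N"},
]
clean = screen_records(raw, contaminants={"arsenic", "cadmium", "chromium"}, year=1996)
```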

Once the quality control evaluation was completed,
the test problems were developed. The test problems
were designed to be manageable within the time
frame of the demonstration and were often a subset
of the total data set.  For example, in some cases, test
problems were developed for a selected region of the
site. In other cases, the database could have
contained information for tens of contaminants,
while the test problems themselves were limited to
the three or four principal contaminants. At some
sites, data were available over time periods
exceeding 10 years.  For the DSS test problems, the
analysts were typically supplied chemical and
hydrologic data for a few sampling periods.

Once the test problems were developed, the
demonstration technical team conducted a complete
analysis of each test problem. These analyses served
as the baseline for evaluating results from the
developers. Each analysis consisted of taking the
entire data set and obtaining an estimate of the
plume boundaries for the specified threshold
contaminant concentrations and estimating the area
of contamination above the specified thresholds for
each contaminant.

The independent data analysis was performed using
Surfer™ (Golden Software 1996). Surfer was
selected for the task because it is a widely  used,
commercially available software package with the
functionality necessary to examine the data. This
functionality includes the ability to import drawing
files to use as layers in the map, and the ability to
interpolate  data in two dimensions.  Surfer has eight
different interpolation methods, each of which can
be customized by changing model parameters, to
generate contours. These different contouring
options were used to generate multiple views of the
interpolated regions  of contamination  and
hydrologic information. The best fit to the data was
used as the baseline  analysis. For 3-D problems, the
data were grouped by elevation to provide a series of
2-D slices of the problem. The distance between
slices ranged between 5 and 10 ft depending on the
availability of data.  Compilation of vertical slices
generated 3-D depictions of the data sets.
Comparisons of the baseline analysis to the SADA
results are presented in Section 4.

In addition to Surfer, two other software packages
were used to provide an independent analysis of the
data and to provide an alternative representation for
comparison with the Surfer results. The
Geostatistical Software Library Version 2.0 (GSLIB)
and Geostatistical Environmental Assessment
Software Version 1.1 (Geo-EAS) were selected
because both provide enhanced geostatistical
routines that assist in data exploration and selection
of modeling parameters to provide extensive
evaluations of the data from a spatial context
(Deutsch and Journel 1992; Englund and Sparks
1991). These three analyses provide multiple lines of
reasoning, particularly for the test problems that
involved geostatistics. The results from Surfer,
GSLIB, and Geo-EAS were compared and
contrasted to determine the best fit of the data, thus
providing a more robust baseline analysis for
comparison to the developers' results.

Under actual  site conditions, uncertainties and
natural variability make it impossible to define
plume boundaries exactly. In these case studies, the
baseline analyses serve as a guideline for evaluating
the accuracy of the analyses prepared by the
developers. Reasonable agreement should be
obtained between the baseline and the developer's
results. A discussion of the technical approaches and
limitations to estimating physical properties at
locations that are between data collection points is
provided in Appendix B.

To minimize problems in evaluating the software
associated with uncertainties in the data, the
developers were required to perform an analysis of
one problem from either Site N or Site S. For Site N,
with over 4000 soil contamination data points, the
baseline analysis reflected the actual site conditions
closely; and if the developers performed an accurate
analysis, the correlation between the two should be
high. For Site S, the test problems used actual
contamination data as the basis for developing a
problem with a known solution. In both Site S
problems, the data were modified to simulate a
constant source term to the aquifer in which the
movement of the contaminant can be described by
the classic advective-dispersive transport equation.
Transport parameters were based on the actual data.
These assumptions permitted release to the aquifer
and subsequent transport to be represented by a
partial differential equation that was solved
analytically. This analytical solution could be used
to determine the concentration at any point in the
aquifer at any time. Therefore, the developer's
results can be compared against calculated
concentrations with known accuracy.
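
To show how an analytical solution yields a concentration at any point and time, the sketch below evaluates the classic one-dimensional solution for a constant-concentration source (commonly attributed to Ogata and Banks). The report does not state which analytical solution was used for Site S, so the 1-D form, the function name `ogata_banks`, and the parameter values are assumptions for illustration only.

```python
import math


def ogata_banks(x, t, c0, v, d):
    """1-D concentration at distance x and time t for a constant source c0.

    v is the seepage velocity and d the dispersion coefficient, in
    consistent units. Illustrative only; not the report's solution.
    """
    if t <= 0:
        return c0 if x == 0 else 0.0
    root = 2.0 * math.sqrt(d * t)
    first = math.erfc((x - v * t) / root)
    z2 = (x + v * t) / root
    exponent = v * x / d
    if exponent < 600.0:
        second = math.exp(exponent) * math.erfc(z2)
    else:
        # For large arguments use erfc(z) ~ exp(-z*z) / (z*sqrt(pi)) so that
        # exp() is never asked for a value that would overflow.
        second = math.exp(exponent - z2 * z2) / (z2 * math.sqrt(math.pi))
    return 0.5 * c0 * (first + second)


# Hypothetical parameters: unit source, v = 1, d = 1, evaluated at t = 10
near = ogata_banks(5.0, 10.0, 1.0, 1.0, 1.0)
far = ogata_banks(15.0, 10.0, 1.0, 1.0, 1.0)
```

Because the function can be evaluated at any location and time, values computed this way can serve as a reference against which interpolated or simulated concentrations are checked.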

After completion of the development of the ten test
problems, a predemonstration test was conducted. In
the predemonstration, the developers were supplied
with a problem taken from Site D that was similar to
test problems for the demonstration. The objective of
the predemonstration was to provide the developers
with a sample problem with the level of complexity
envisioned for the demonstration. In addition, the
predemonstration allowed the developers to process
data from a typical problem in advance of the
demonstration and allowed the demonstration
technical team to determine if any problems
occurred during data transfer or because of problem
definition. The results of the predemonstration were
used to refine the problems used in the
demonstration.


Preparation of Demonstration Plan
In conjunction with the development of the test
problems, a demonstration plan (Sullivan and
Armstrong  1998) was prepared to ensure that all
aspects of the demonstration were documented and
scientifically sound and that operational procedures
were conducted within quality assurance
(QA)/quality control (QC) specifications. The
demonstration plan covered

•   the roles and responsibilities of demonstration
    participants;
•   the procedures governing demonstration
    activities such as data collection to define test
    problems and data preparation, analysis, and
    interpretation;
•   the experimental design of the demonstration;
•   the evaluation criteria against which the DSS
    would be judged; and
•   QA and QC procedures for conducting the
    demonstration and for assessing the quality of
    the information generated from the
    demonstration.

All parties involved with implementation of the plan
approved and signed the demonstration plan prior to
the start of the demonstration.

Summary of Demonstration Activities
On September 14-25, 1998, the Site Characterization
and Monitoring Technology Pilot, in
cooperation with DOE's National Analytical
Management Program, conducted a demonstration to
verify the performance of five environmental DSS
packages. The demonstration was conducted at the
New Mexico Engineering Research Institute,
Albuquerque, New Mexico. An additional software
package was tested on October 26-29, 1998, at
Brookhaven National Laboratory, Upton, New York.

The first morning of the demonstration was devoted
to a brief presentation of the ten test problems, a
discussion of the output requirements to be provided
from the developers for evaluation, and transferring
the data to the developers. The data from all ten test
problems — along with a narrative that provided a
description of each site, the problems to be
solved, the names of data files, structure of the data
files, and a list of output requirements — were given
to the developers. The developers were asked to
address a minimum of three test problems for each
software product.

Upon completion of the review of the ten test
problems and the discussion of the outputs required
from the developers, the developers received data
sets for the problems by file transfer protocol (FTP)
from a remote server or on a high-capacity
removable disk. Developers downloaded the data
sets to their own personal computers, which they had
supplied for the demonstration. Once the data
transfers of the test problems were complete and the
technical team had verified that each developer had
received the data sets intact, the developers were
allowed to proceed with the analysis at their own
pace. During the demonstration, the technical team
observed the developers, answered questions, and
provided data as requested by the developers for the
sample optimization test problems. The developers
were given 2 weeks to complete the analysis for the
test problems that they selected.

The third day of the demonstration was visitors' day,
an open house during which people interested in
DSS could learn about the various products being
tested. During the morning of visitors' day,
presenters from EPA, DOE, and the demonstration
technical team outlined the format and content of the
demonstration. This was followed by a presentation
from the developers on the capabilities of their
respective software products. In the afternoon,
attendees were free to meet with the developers for a
demonstration of the  software products and further
discussion.

Prior to leaving the test facility, the developers were
required to provide the demonstration technical team
with the final output files generated by their
software. These output files were transferred by FTP
to an anonymous server or copied to a zip drive or
compact disk-read only memory (CD-ROM). The
technical team verified that all files generated by the
developers during the demonstration were provided
and intact. The developers were given a 10-day
period after the  demonstration to provide a written
narrative of the work that was performed and a
discussion of their results.

Evaluation Criteria
One important objective of DSS is to integrate data
and models to produce information that supports an
environmental decision. Therefore, the  overriding
performance goal  in this demonstration was to
provide a credible analysis. The credibility of a
software and computer analysis is built  on four
components:

•   good data,
•   adequate and reliable  software,
•   adequate conceptualization of the site, and
•   well-executed problem analysis (van der Heijde
    and Kanzer 1997).

In this demonstration, substantial efforts were taken
to evaluate the data and remove data of poor quality
prior to presenting it to the developers.  Therefore,
the developers were directed to assume that the data
were of good quality. The technical team provided
the developers with detailed site maps and test
problem instructions on the requested analysis and
assisted in site conceptualization. Thus, the
demonstration was primarily to test the adequacy of
the software and the skills of the analyst. The
developers operated their own software on their own
computers throughout the  demonstration.

Attempting to define  and measure  credibility makes
this demonstration far different from most
demonstrations in the ETV program in which
measurement devices are evaluated. In the typical
ETV demonstrations, quality  can be measured in a
quantitative and statistical manner. This is not true
for DSS. While there are some quantitative
measures, there are also many qualitative measures.
The criteria for evaluating the DSS's ability to
support a credible analysis are discussed below. In
addition, a number of secondary objectives, also
discussed below, were used to evaluate the software.
These included documentation of software, training
and technical support, ease of use of the software,
efficiency, and range of applicability.

Criteria for Assessing Decision Support
The developers were asked to use their software to
answer questions pertaining to environmental
contamination problems. For visualization tools,
integration of geologic data, contaminant data,  and
site maps to define the contamination region at
specified concentration levels was requested. For
software tools that address sample optimization
questions, the developers were asked to suggest
optimum sampling locations, subject to constraints
on the number of samples or  on the confidence with
which contamination concentrations were known.
For software tools that address cost-benefit
problems, the developers were asked either to define
the volume (or area) of contamination and, if
possible, supply the  statistical confidence with
which the estimate was made, or to estimate human
health risks resulting from exposure to the
contamination.

The criterion for evaluation was the credibility of the
analyses to support the decision. This evaluation was
based on several points, including

•   documentation of the use of the models, input
    parameters, and  assumptions;
•   presentation of the results in a clear and
    consistent manner;
•   comparison of model results with the data and
    baseline analyses;
•   evaluation of the use of the models; and
•   use of multiple lines of reasoning to support the
    decision.

The following sections provide more detail on each
of these topics.

Documentation of the Analysis and
Evaluation of the Technical Approach
The developers were requested to supply  a concise
description of the objectives of the analysis, the
procedures used in the analysis, the conclusions of
the analysis with technical justification of the
conclusions, and a graphical display of the results of
the analysis. Documentation of key input parameters
and modeling assumptions was also requested.
Guidance was provided on the quantity and type of
information requested to perform the evaluation.

Based on observations obtained during the
demonstration and the documentation supplied by
the developers, the use of the models was evaluated
and compared to standard practices. Issues in proper
use of the models include selection of appropriate
contouring parameters, spatial and temporal
discretization, solution techniques, and parameter
selection. This evaluation was performed as a QA
check to determine if standard practices were
followed. This evaluation was useful in determining
whether the cause of discrepancies between model
projections and the data resulted from operator
actions or from the model itself and was
instrumental in understanding the role of the
operator in obtaining quality results.

Comparison of Projected Results with the
Data and Baseline Analysis
Quantitative comparisons between DSS-generated
predictions and the data or baseline analyses were
performed and evaluated. In addition, DSS-
generated estimates of the mass and volume of
contamination were compared to the baseline
analyses to evaluate the ability of the software to
determine the extent of contamination. For
visualization and cost-benefit problems, developers
were given a detailed data set for the test problem
with only a few data points held back for checking
the consistency of the analysis. For sample
optimization problems, the developers were
provided with a limited data set to begin the
problem. In this case, the data not supplied to the
developers were used for checking the accuracy of
the sample optimization  analysis. However, because
of the inherent variability in environmental  systems
and the  choice of different models and parameters by
the analysts, quantitative measures of the accuracy
of the analysis are difficult to obtain and defend.
Therefore, qualitative evaluations of how well the
model projections reproduced the trends in the data
were also performed.
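
A quantitative comparison against held-back data can be as simple as the sketch below. The report does not say which statistics were computed, so the choice of mean error and root-mean-square error, the function name `holdout_errors`, and the sample values are assumptions.

```python
import math


def holdout_errors(predicted, measured):
    """Return (mean error, root-mean-square error) for paired values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    diffs = [p - m for p, m in zip(predicted, measured)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Made-up concentrations (mg/kg) at three held-back sample locations
bias, rmse = holdout_errors([110.0, 95.0, 400.0], [100.0, 105.0, 380.0])
```

A mean error near zero with a large root-mean-square error would suggest predictions that are unbiased overall but poor at individual locations, which is one reason qualitative review of trends remains useful.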

A major component of the analysis of environmental
data sets involves predicting physical or chemical
properties (contaminant concentrations, hydraulic
head, thickness of a geologic layer, etc.) at locations
between measured data. This process, called
interpolation, is often critical in developing an
understanding of the nature and extent of the
environmental problem. The premise of interpolation
is that the estimated value of a parameter is a
weighted average of measured values  around it.
Different interpolation routines use different criteria
to select the weights. Because estimating values
between measured data points is important in many
fields of science, a large number of
interpolation routines exist. Three classes of
interpolation routines commonly used in
environmental analysis are nearest neighbor, inverse
distance, and kriging. These three classes of
interpolation, and their strengths and limitations, are
discussed in detail in Appendix B.
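
The weighted-average premise can be made concrete with a basic inverse-distance estimate, sketched below. The function name `idw_estimate`, the exponent of 2, and the data are assumptions for illustration; production packages add search radii, anisotropy, and other options not shown here.

```python
import math


def idw_estimate(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points is a list of (xi, yi, value) tuples.
    """
    num = 0.0
    den = 0.0
    for xi, yi, value in points:
        dist = math.hypot(x - xi, y - yi)
        if dist == 0.0:
            return value  # exactly on a measurement
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den


# Two made-up measurements; the midpoint receives equal weights
points = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint = idw_estimate(points, 1.0, 0.0)
```

Nearest neighbor and kriging fit the same pattern and differ only in how the weights are chosen: nearest neighbor gives all weight to the closest measurement, while kriging derives the weights from the spatial correlation of the data.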

Use of Multiple Lines of Reasoning
Environmental decisions are often made with
uncertainties because of an incomplete
understanding of the problem and lack of
information, time, and/or resources. Therefore,
multiple lines of reasoning are valuable in obtaining
a credible analysis. Multiple lines of reasoning may
incorporate statistical analyses, which in addition to
providing an answer provide an estimate of the
probability that the answer is correct.  Multiple lines
of reasoning may also incorporate alternative
conceptual models or multiple simulations with
different parameter sets. The DSS packages were
evaluated on their capabilities to provide multiple
lines of reasoning.

Secondary Evaluation Criteria
Documentation of Software
The software was evaluated in terms of its
documentation. Complete documentation includes
detailed instructions on how to use the software
package, examples of verification tests performed
with the software package, a discussion of all output
files generated by the software package, a discussion
of how the output files may be used by other
programs (e.g., ability to be directly imported into an
Excel spreadsheet), and an explanation of the theory
behind the technical approach used in the software
package.

Training and Technical Support
The developers were asked to list the background
knowledge necessary to successfully
operate the software package (i.e., basic
understanding of hydrology, geology, geostatistics,
etc.) and the auxiliary software used by the software
package (e.g., Excel). In addition, the operating
systems (e.g., Unix, Windows NT) under which the
DSS can be used were requested. A discussion of
training, software documentation, and technical
support provided by the developers was also
required.

Ease of Use
Ease of use is one of the most important factors to
users of computer software. Ease of use was
evaluated by examining the software package's
operation and on the basis of the adequacy of
on-line help, the availability of technical support, the
flexibility to change input parameters and databases
used by the software package, and the time required
for an experienced user to set up the model and
prepare the analysis (that is, input preparation time,
time required to run the simulation, and time
required to prepare graphical output).

The demonstration technical team observed the
operation of each software product during the
demonstration to assist in determining the ease  of
use. These observations documented operation and
the technical skills required for operation. In
addition, several members of the technical team
were given a 4-hour tutorial by each developer on
their respective software to gain an understanding of
the training level required for software operation as
well as the functionalities of each software.

Efficiency and Range of Applicability
Efficiency was evaluated on the basis of the
resources that were necessary to evaluate the test
problems. This was assessed through the number of
problems completed as a function of time required
for the analysis and computing capabilities.

Range of applicability is defined as  a measure of the
software's ability to represent a wide range of
environmental conditions and was evaluated through
the range of conditions over which the software was
tested and the number of problems analyzed.

                            Section 4 — SADA Evaluation
SADA Technical Approach
For sample optimization and quantification of
uncertainties in predicted values, the technical
approach applied in SADA is based on geostatistics.
Geostatistical methods are based on the premise that
measured variables located close to each other will
have similar values, while variables far apart will
have little correlation between their corresponding
values. A statistical measure for this
interrelationship is summarized by the correlation
between variables measured at different
points in space. This measure or related measures,
such as the variogram and covariance, form the
central idea around which linear estimation methods
in geostatistics operate. The use of correlation
measures also separates this estimation method from
other interpolation algorithms such as inverse
distance, nearest neighbor, linear interpolation,
splines, and quadrature methods. Using a statistical
estimator allows the estimation error to be calculated
along with the estimate. Thus, a geostatistical
method provides both the most likely value and an
estimate of the range  of other possible values for a
given location. This is important information
because the spatial variability present in most
variables is such that  error-free estimation is not
possible. In fact, often there are many possible
solutions to the estimation problem that agree with
the measurements (Appendix B). Ordinary and
indicator kriging, which are estimation methods
available in SADA, represent the more  common
geostatistical methods used to provide smoothed
estimates of variables.
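
The variogram mentioned above can be estimated from data by averaging squared differences between pairs of measurements grouped by separation distance. The sketch below shows one common form of that calculation; the function name `empirical_semivariogram`, the binning rule, and the data are assumptions and do not describe SADA's own implementation.

```python
import math


def empirical_semivariogram(points, lag, n_lags):
    """Return [(mean pair distance, semivariance, pair count), ...].

    points is a list of (x, y, value) tuples. Pairs are binned by
    separation distance into n_lags bins of width lag; empty bins
    are omitted from the result.
    """
    sums = [0.0] * n_lags
    dists = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag)
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                dists[k] += h
                counts[k] += 1
    return [
        (dists[k] / counts[k], sums[k] / (2.0 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]


# Three made-up measurements along a line
gamma = empirical_semivariogram(
    [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0)], lag=1.0, n_lags=3
)
```

In this small example the semivariance rises with separation distance, which is the behavior expected when nearby values are more alike than distant ones.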

For human-health risk analysis, SADA has
toxicological exposure databases and databases for
five land-use scenarios and multiple exposure
pathways. Default values for the different land-use
scenarios (residential, agricultural, recreational, and
excavation) and exposure pathways (ingestion,
inhalation, dermal contact, external radiation, and
food consumption)  used in the risk assessment
calculations follow EPA guidance (EPA 1989) and
can be modified by the user for site-specific
applications. The exposure concentrations used in
the risk assessment can be based on interpolated
estimates of the measured values and adjusted for
probability  level.

SADA Implementation of Geostatistical Approach
SADA imports measured data, defines a grid (i.e.,
divides the area of concern into a number of 2-D
rectangular blocks), provides algorithms to calculate
the spatial correlation of the data in 2-D (i.e.,
generates a variogram), and from the variogram
obtains estimates for the parameters necessary for
kriging interpolation of the data. The kriging process
provides an estimate of the most likely value of the
variable and a statistical measure of the variability
expected at that location. SADA help files provide
guidance for calculating the spatial correlation and
kriging parameters. For 3-D problems, the data are
sliced into vertical sections and a 2-D analysis is
performed for each vertical layer and then
summarized into 3-D images.

In estimating the volume of contaminated media that
contains contamination at levels above the cleanup
level as a function of probability levels, SADA
performs an  analysis for each vertical layer defined
by the analyst. In this approach,  SADA determines
the contamination volume as a function of
probability by using one of the two available
geostatistical interpolation algorithms (indicator
kriging or ordinary kriging) to calculate the nominal
value and associated standard deviation at every
location in the model. For the case of the 90%
probability level, at each model location an estimate
of the concentration is obtained such that 90% of the
time, the actual  value is expected to be greater than
the estimated value for that nominal concentration
and standard deviation. Thus, the resulting estimate
of contamination is the region for which there is a
90% probability that the contaminated region is at
least that large.  The approach used in SADA is consistent
with the EPA data quality objectives guidance (EPA
1994).
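
The probability-level adjustment described above can be sketched as follows. The sketch assumes the estimation error at each location is normally distributed about the nominal value, which the report does not state; the function names, grid values, and cell area are likewise hypothetical.

```python
from statistics import NormalDist


def estimate_at_probability(mean, std, probability):
    """Value the true concentration is expected to exceed with the given
    probability, assuming a normal estimation error (an assumption made
    for this sketch)."""
    if std == 0.0:
        return mean
    return NormalDist(mean, std).inv_cdf(1.0 - probability)


def contaminated_area(cells, threshold, probability, cell_area):
    """Sum the area of grid cells whose adjusted estimate exceeds threshold.

    cells is a list of (nominal value, standard deviation) pairs.
    """
    count = sum(
        1
        for mean, std in cells
        if estimate_at_probability(mean, std, probability) > threshold
    )
    return count * cell_area


# Four made-up grid cells (mg/kg) and a 500 mg/kg threshold
cells = [(600.0, 50.0), (520.0, 40.0), (480.0, 60.0), (300.0, 30.0)]
areas = {p: contaminated_area(cells, 500.0, p, 100.0) for p in (0.1, 0.5, 0.9)}
```

Under these assumptions the area shrinks as the probability level rises: only cells that exceed the threshold even after the downward adjustment are counted at the 90% level.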

The objective in sample optimization is to collect
samples at the locations that will provide the
maximum amount of information to define the
extent of contamination. Depending on the
interpolation scheme used, five schemes may be
used to select sample locations: adaptive fill,
estimate rank, variance rank, percentile rank, and
uncertainty rank. Adaptive fill is useful during early
stages of analysis when data are sparsely located and
the statistics of the data are not well defined. As
more data are collected, the other techniques, which
use the data and its statistical properties, can be used
to assist in selection of sample locations.

Description  of Test Problems
SADA is an environmental decision support
software product that incorporates tools from various
fields — including visualization, geospatial analysis,
statistical analysis, human health risk assessment,
cost-benefit analysis, sampling design, and decision
analysis — into a dynamic and  interactive
environment. UTRC used SADA to analyze
problems for Sites N and S; SADA addressed all
three endpoints of the demonstration. As part of the
demonstration, several dozen visualization outputs
were generated. A few examples that display the
range of SADA's capabilities and features are
included in this report. A general description of each
test problem and the analysis performed using
SADA follows. Detailed descriptions of all test
problems are provided in Appendix A and in
Sullivan, Armstrong, and Osleeb (1998).

Site N Sample  Optimization Problem
The  objective of the Site N sample optimization
problem was to challenge the software's ability to
develop a sample optimization scheme to
characterize a 125-acre site. The Site N data set
contained the most extensive and reliable data set for
evaluating the accuracy of the analysis for a soil
contamination problem. To focus only on the
accuracy of the soil sample optimization analysis,
the test problem was simplified by removing
information regarding groundwater contamination at
this site and by limiting the problem to three
contaminants.

This test problem considers surface soil
contamination (2-D) for three contaminants —
arsenic (As), cadmium (Cd), and chromium (Cr). The
analyst was given an extensive data set for a small
region of the site (<10 acres) that was highly
contaminated and asked to develop a sample
optimization scheme to define the extent of
contamination on the entire site as defined by two
threshold concentrations for each contaminant
(Table 3). Budgetary restraints limited the number of
additional sample locations to 80. Because of the
limited number of samples, the analyst was asked to
supply estimates  of the extent of contamination
based on the confidence in their results.

SADA was used  to perform an iterative analysis in
which several suggested sample locations were
requested. Data from these locations were supplied
to the analyst, and this information was used to
define the next set of sample locations. The process
continued until data at the 80 additional sample
locations were provided. SADA was used to produce
a site map with all sample locations color-coded by
concentration, concentration contour maps for each
contaminant, and maps of the probability of
exceeding the arsenic threshold concentration of
500 mg/kg.

Site N Cost-Benefit Problem
The objective of  the Site N cost-benefit problem was
to challenge the software's ability to perform cost-
benefit analysis as defined in terms of area of
contaminated soil above two threshold
concentrations for three contaminants. The Site N
data set contained the most extensive and reliable
data set for  evaluating the accuracy of the analysis
for a soil contamination problem. To focus only on
the accuracy of the soil cost-benefit analysis, the
problem was simplified by removing information
regarding groundwater contamination at this  site and
by limiting  the problem to three contaminants.

This test problem considers surface soil
contamination (2-D) for three contaminants — As,
Cd, and Cr. The developers were given an extensive
data set for a small region of the site and asked to
conduct a cost-benefit analysis to evaluate the area
and cost for remediation to achieve specified
threshold concentrations provided in Table 4.

        Table 3.  Site N soil contamination threshold concentrations for the sample
                  optimization problem

Contaminant      Minimum threshold        Maximum threshold
                 concentration (mg/kg)    concentration (mg/kg)
Arsenic (As)     125                      500
Cadmium (Cd)     70                       700
Chromium (Cr)    370                      3700

SADA estimated the areal extent of the soil
contamination by taking the supplied contaminant
concentration data and interpolating over the region.
The following output was generated for this
problem:

•   for each contaminant (As, Cd, and Cr), a map
    with roads, water bodies, and buildings overlain
    with concentration contours at the specified
    threshold concentrations;
•   a map with the contaminated areas defined as a
    function of the probability of exceeding
    specified threshold concentrations at defined
    probability levels of 10, 50, and 90%;
•   an estimate of the area of contamination of each
    contaminant above its minimum threshold
    concentration at defined probability levels of 10,
    50, and 90%;
•   curves  of remediation costs for each
    contaminant's minimum threshold concentration
    and probability levels of 10, 50, and 90%; and
•   noncarcinogenic and carcinogenic risk maps for
    each contaminant as well as summed
    noncarcinogenic and carcinogenic risk maps
    (i.e., sum of risks from As, Cd, and Cr).

Site S Cost-Benefit Problem
The Site S  cost-benefit problem was designed as a
method for assessing the accuracy with which the
software can predict the volume and area of
contamination to assist in cost-benefit analysis as a
function of cleanup goals. Site S contains the most
extensive and reliable data set for evaluating the
accuracy of the analysis for a 3-D groundwater
problem. To focus only on the accuracy of the
analysis, the problem was simplified by removing
information regarding surface structures (e.g.,
buildings and roads) and selecting only one
contaminant for the test problem.

This test problem was a 3-D groundwater
contamination cost-benefit problem for a single
contaminant, chlordane. The data consisted of a
series of wells containing chlordane concentrations
as a function of depth. The analyst was asked to
define the region, mass, and volume of the plume at
contamination concentrations of 5 and 500 µg/L.
The analysis could be extended to include definition
of the plume volumes as a function of three
probability levels, 10, 50, and 90%.

SADA estimated the extent of the plume by dividing
the data into strata that were 5 ft thick and analyzing
the contaminant data in each stratum.  Then, the
strata were combined to produce 3-D depictions of
the chlordane contamination. The analysis by SADA
produced the 2-D region contaminated by chlordane
as well as 3-D plume maps. SADA was used to
generate the following output for this problem:

•   graphics of chlordane contours for each 5-ft
    interval (2-D depiction);
•   a 3-D chlordane contour plume map;
•   chlordane-concentration contour maps for
    specified threshold concentrations of 5 and
    500 µg/L;
•   maps (2-D and 3-D) indicating areas where the
    5-µg/L chlordane threshold concentration was
    exceeded at 90% probability; and
•   maps (2-D and 3-D) of carcinogenic risk
    estimates based on ingestion of chlordane-
    contaminated groundwater for a residential
    scenario.

        Table 4. Site N soil contamination threshold concentrations for the cost-benefit
                 problem

Contaminant      Minimum threshold        Maximum threshold
                 concentration (mg/kg)    concentration (mg/kg)
Arsenic (As)     75                       500
Cadmium (Cd)     70                       700
Chromium (Cr)    370                      3700

Evaluation of SADA
Decision Support
During the demonstration, SADA was able to
quickly import data on contaminant concentrations
and site maps and integrate this information on a
single platform. SADA demonstrated the ability to
place the information in a visual context and
generated 2-D and 3-D maps that support data
interpretation and decision-making. SADA was used
in the demonstration to automatically generate
contaminant concentration maps, recommended
cleanup zones, cost-benefit curves, and human
health risk maps. These maps can be based on the
probability of exceeding specified contaminant
threshold concentrations or risk levels and at
specified probability levels. SADA was also used to
predict new sample locations based on geostatistical
analysis of the existing data. The accuracy of the
analysis is discussed in the section on comparison of
SADA results with baseline data and analysis.

Documentation of the SADA Analysis and
Evaluation of the Technical Approach
For each analysis, UTRC provided a step-by-step
description of the SADA manipulations necessary to
import data provided and perform the desired
analysis. The  steps proceeded logically and in a
straightforward manner. Manipulations to format the
data within the SADA architecture were relatively
simple. Files containing data were supplied to the
analyst using  a .dbf format. Prior to using these files
in SADA, these files were imported into another
program (Microsoft Excel) and saved using ASCII
comma-delimited file format (.csv). Discussions
occurred on the choice of the different model
approaches (adaptive fill, rank uncertainty, etc.) used
in performing the sample optimization problem.
Model selection and parameters for contouring,
human health risk assessment,  and cost-benefit
analysis were also provided in the output files and
documentation. The technical approach used by
UTRC followed standard practices.

Comparison of SADA Results with the
Baseline Analysis and Data
Site N Sample Optimization Problem
For the Site N sample optimization test problem,
data from soil samples from the southwest corner of
the site, which indicated contamination above the
threshold concentrations, were provided to the
analyst. Figure 1 presents a site map with overlays
of roads,  ponds, and creeks. Initial sample locations
are marked with the symbol +  and arsenic contours
are shown for threshold concentrations of 125 and
500 mg/kg. Since this initial region contained data
covering only a small area of the entire site, the test
problem required that additional sample locations be
determined to characterize contamination for the
entire site. UTRC initiated the analysis by using
SADA to plot sample locations and contaminant
data for As, Cd, and Cr. Although UTRC evaluated
all three contaminants, only the results for arsenic
are presented in this report. Next, SADA was used to
predict additional sampling locations to define the
boundary of arsenic contamination in the initial
region in the southwest corner. SADA required four
rounds of sampling and a total of 19  additional
samples to bound the initial small region of
contamination. Next, the UTRC analyst used the
adaptive fill model in SADA to obtain an additional
40 sample locations throughout the entire site. Using
this information, the analyst performed rank
uncertainty analysis (in which locations with the
highest uncertainty in exceeding the threshold
concentration are selected for sampling) to select the
final 21 sampling locations. During the analysis,
UTRC examined the effect of model parameters
(correlation length  and probability level) for the area
predicted to exceed the threshold concentration.
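
The rank uncertainty step described above can be illustrated with a minimal sketch. It is not SADA's implementation: the scoring function p(1 - p), the function name `rank_uncertainty`, and the candidate coordinates and probabilities are all assumptions made for illustration. The idea is only that locations whose probability of exceeding the threshold is closest to 50% are the least certain and are therefore sampled first.

```python
def rank_uncertainty(prob_exceed, n_new):
    """Return the n_new candidate locations with the most uncertain outcome.

    prob_exceed maps a candidate location (easting, northing) to the estimated
    probability that the concentration there exceeds the threshold.
    Uncertainty is scored as p * (1 - p), which peaks at p = 0.5; this
    scoring rule is an assumption of the sketch, not taken from SADA.
    """
    scored = sorted(
        prob_exceed.items(),
        key=lambda item: item[1] * (1.0 - item[1]),
        reverse=True,
    )
    return [location for location, _ in scored[:n_new]]


# Hypothetical candidate locations and exceedance probabilities.
candidates = {
    (30100.0, 20400.0): 0.95,  # almost certainly above the threshold
    (30600.0, 21000.0): 0.52,  # close to a coin flip
    (31200.0, 21800.0): 0.10,  # almost certainly below the threshold
    (31700.0, 22500.0): 0.40,
}

picked = rank_uncertainty(candidates, n_new=2)
print(picked)

# Sample counts reported for the three stages of the Site N problem.
stage_counts = {"bounding": 19, "adaptive fill": 40, "rank uncertainty": 21}
print(sum(stage_counts.values()))
```

The three stage counts reported in the text (19, 40, and 21) add up to the 80 additional samples allowed by the test problem.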

Figure 2 shows the final sample locations selected
by UTRC using SADA for the area of concern at
Site N. Color-coded circles denote sample locations.
The highest arsenic concentrations are denoted in
red and the lowest concentrations in purple. From
the SADA-generated map, it can be seen that the
entire area of concern has been evaluated with more
samples taken in the regions of higher
concentrations. Figure 3 presents the final nominal
arsenic concentration contour map as determined
using SADA and the data obtained through the 80
additional samples. The map is color-coded based on
default concentrations and the maximum
concentration. This makes it extremely difficult to
judge the location of the regions above the threshold
concentrations of 125 and 500 mg/kg, which are not
displayed directly on the map. In addition, the choice
of dark purple for the regions of low concentration
makes it difficult to see the base site map containing
streets and other points of reference. During the
demonstration, the  UTRC analyst showed that the
default parameters used in generating the contour
maps could be updated, thus incorporating user-
specified parameters. Updating to user-specified
parameters  would have been helpful in comparing
the areal extent of predicted contamination with the
baseline data, but this was not done for this map.

[Figure 1 image: "Site N Sample Optimization, Initial Data Set: Arsenic". Contours at 125 (blue) and 500 (red) mg/kg; sample locations are marked with a +.]
Figure 1. Site N initial sample locations provided to the analyst and arsenic contours for two
       threshold concentrations (125 and 500 mg/kg).

Figure 2.   Site N final sample locations obtained using SADA. Sample locations are
           color-coded, with red indicating the highest arsenic concentrations and
           purple the lowest.
[Figure 3 image; legend values, top to bottom: 4140.85, 3323.76, 2492.82, 1661.83, 830.94, 0.00]
   Figure 3.  SADA contour map for Site N arsenic concentration after
              completion of the sample optimization.

Figure 3 shows several areas of the site to be above
the arsenic threshold concentrations of 125 and
500 mg/kg. Areas colored blue, green, or red are
above the 500 mg/kg threshold concentration. Areas
in light purple may be above the  125  mg/kg
threshold, but this is not clear. Areas in dark purple
are below the minimum threshold concentration of
125 mg/kg.

For the baseline analysis of Site N using the entire
data set (4,187 sample points), the technical team
used two approaches to generate multiple lines of
reasoning. The first approach consisted of evaluating
the entire data set using the ordinary kriging
interpolator in Surfer (Figure 4). In addition, the data
set was evaluated using the indicator kriging approach
available in GSLIB Version 2.0, as shown in
Figure 5. A comparison of these two figures
indicates that the two approaches yielded similar
results, thus providing a baseline for comparison to
the SADA results. A comparison of SADA results
(Figure 3) with the baseline results (Figures 4 and 5)
indicates that SADA was able to define most, but
not all, of the regions contaminated above the
arsenic threshold concentrations while using only 80
additional data points (2% of the complete data set).
The data supplied to the analyst indicated that the
southwest corner of the site was contaminated above
the threshold levels (see Figure 1). The contours
generated with SADA after sample optimization and
data collection indicate that there are four newly
identified areas of contamination that had arsenic
concentrations in excess of 500 mg/kg (blue and
green regions in Figure 3).

[Figure 4 image: map with Easting (ft) axis.]
Figure 4. Baseline analysis generated by Surfer for the Site N sample optimization problem.
          Arsenic contours are shown at 125 mg/kg (blue) and 500 mg/kg (red). Sample locations
          are marked with a +.

Figure 5. Baseline analysis generated by GSLIB for the Site N sample optimization problem. Arsenic contours are
          shown at 125 mg/kg (blue) and 500 mg/kg (red). Sample locations are marked with a 'o'.

The baseline and SADA analyses predicted the areas
in which the contaminant concentration exceeded the
threshold concentration (Table 3). In addition to
visual comparison of the results, the calculated areas
were used as a basis for comparison. The baseline
analysis estimated the area that exceeded the arsenic
threshold concentration of 125 mg/kg to be
955,000 ft² and the area exceeding the arsenic
threshold concentration of 500 mg/kg to be
247,500 ft². SADA estimated the area with arsenic
concentrations greater than 125 mg/kg to be
714,500 ft² and the area greater than 500 mg/kg to be
186,605 ft². The SADA estimates were
approximately 75% of the baseline analysis. The
technical evaluation team concluded that this degree
of accuracy was reasonable considering the
constraint of 80 additional samples to characterize the
entire 125-acre site.
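
The "approximately 75%" figure can be checked directly from the four area estimates quoted above. The short sketch below only repeats that arithmetic; the variable names are illustrative and are not part of SADA or the baseline tools.

```python
# Area estimates (ft^2) quoted in the text, keyed by arsenic threshold (mg/kg).
baseline_ft2 = {125: 955_000, 500: 247_500}
sada_ft2 = {125: 714_500, 500: 186_605}

# Ratio of the SADA estimate to the baseline estimate at each threshold.
ratios = {t: sada_ft2[t] / baseline_ft2[t] for t in baseline_ft2}
for threshold, ratio in ratios.items():
    print(f"{threshold} mg/kg: SADA area is {ratio:.1%} of the baseline area")

# Fraction of the complete data set represented by the 80 additional samples.
sample_fraction = 80 / 4187
print(f"80 of 4,187 samples = {sample_fraction:.1%}")
```

Both ratios round to 75%, and the 80 samples amount to about 2% of the complete data set, in line with the statements in the text.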

UTRC used SADA to generate maps that estimated
the probability of exceeding the 500 mg/kg arsenic
threshold. Using these maps, the analyst prepared
maps of the area requiring remediation for different
probability levels. Similar analyses were performed
for the other two contaminants, cadmium and
chromium.

Site N Cost-Benefit Problem
For the Site N cost-benefit test problem, the analyst
was given contaminant data for soil on a region of
the site that was well characterized. The goal of this
test problem was to demonstrate the software's
capability to conduct a cost-benefit analysis for soil
contamination to support environmental decisions.

UTRC initiated the analysis by using SADA to plot
the sample locations and contaminant data on  a map
with overlays of roads and water bodies (Figure 6).
Next, UTRC performed statistical analyses of the
data to determine the appropriate geospatial tool not
only for interpolating contaminant concentrations
between points but also for quantifying the
uncertainty in these estimations. Selection of a
geostatistical routine is the foundation for
determining  the area of remediation and for later
cost-benefit analysis. Given a correlation structure,
the distribution of data points dictates the choice of
geostatistical routines. However, the ability to use a
geostatistical method depends on the existence of a
spatial correlation structure in the data. After
examination of the data distributions, UTRC
selected indicator kriging for interpolating As, Cd,
and Cr concentration data. Use of geostatistical
analysis allowed the UTRC analyst to provide
results in terms of probability levels or other
uncertainty qualifiers. During the geostatistical
selection process, UTRC demonstrated the
functionality of the multiple statistical and
geostatistical routines in the SADA software,
thereby providing multiple lines of reasoning. This
functionality allows users of SADA software to
conduct multiple evaluations; however, ultimately it
is up to the analyst to optimize the treatment of the
data.
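
To show why indicator kriging yields results in terms of probability levels, the sketch below illustrates the indicator transform that precedes the interpolation: each measured concentration is replaced by 1 if it exceeds the threshold and 0 otherwise. Interpolating these 0/1 values then gives an estimate that can be read as a probability of exceedance. The kriging step itself is not shown, and the function name and sample values are hypothetical.

```python
def indicator_transform(concentrations, threshold):
    """Map each concentration to 1 if it exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in concentrations]


# Hypothetical arsenic concentrations (mg/kg) at five sample locations.
arsenic_mg_per_kg = [12.0, 80.0, 640.0, 75.0, 130.0]

indicators = indicator_transform(arsenic_mg_per_kg, threshold=75.0)
print(indicators)

# The plain average of the indicators is the fraction of samples above the
# threshold; kriging replaces this average with spatially weighted averages.
fraction_above = sum(indicators) / len(indicators)
print(fraction_above)
```

Because the transform depends only on whether a value is above or below the threshold, it makes no assumption about the shape of the data distribution, which is why the approach suits data that are neither normally nor lognormally distributed.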

After completion of the spatial correlation modeling,
SADA was used to generate contaminant
concentration contour maps and conduct cost-benefit
analyses. The application and utilization of indicator
kriging to produce these maps and conduct the
cost-benefit analyses was demonstrated to be
straightforward using SADA. Figure 7 is a plot of
arsenic concentration contours generated by SADA.
Figure 8 presents the test team's baseline analysis,
obtained by interpolation of the data with Surfer.
The lack of a direct correspondence between the
threshold levels and the contour levels in the SADA
analysis (Figure 7) makes comparison difficult.
However, there is general agreement between the
two analyses. Again, the choice of dark purple for
the region of low concentration makes determination
of exact locations on the site difficult. A transparent
color would have been a better choice.

Figure 6. Sample locations and arsenic concentrations (mg/kg) generated by SADA for the Site N
          cost-benefit problem.

Figure 7. Arsenic contours (mg/kg) generated by SADA for the Site N cost-benefit problem.

The test problem definition requested the analyst to
estimate the area of contamination at three
probability levels (10%, 50%,  and 90%) for each
threshold concentration. The probability level
corresponds to the amount of uncertainty in the
decision. The 10% probability  level is the level at
which the analyst believes that there is a 10%
probability that the concentration of the contaminant
at a specified location exceeds  the threshold
concentration. This leads to a maximum estimate of
the contaminated region. Similarly, the 90%
probability level corresponds to the level at which the
analyst believes that there is a 90% probability that
the concentration of the contaminant  at a specified
location exceeds the threshold concentration. Based
on the arsenic contours, SADA was used to define
remediation zones for removing arsenic-
contaminated soil at the 75 mg/kg threshold at the
50% probability level (Figure 9) and the 90%
probability level (Figure 10). The gray highlighted
regions in these figures represent the areas that
exceed the threshold. The area exceeding the
75 mg/kg arsenic concentration is much smaller at
the 90% probability level than  at the 50% probability
level, consistent with the higher probability of
exceeding the threshold. Figure 9, depicting the 50%
probability level, shows a very close agreement with
the baseline analysis at the 75 mg/kg arsenic
threshold, shown in Figure 8. The minor differences
near the edges of the predicted contamination zone
are caused by the different treatment of the data in
the two analyses and the slightly differing selection
of data boundaries.
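
The relationship between probability level and remediation area can be sketched as follows. Given a gridded map of exceedance probabilities, the cleanup zone at a given probability level is the set of cells whose probability is at least that level, so a higher level can only shrink the zone. The grid values, the cell area, and the function name are hypothetical and are not SADA output.

```python
def remediation_area(prob_exceed_grid, prob_level, cell_area_ft2):
    """Total area of grid cells whose exceedance probability >= prob_level."""
    n_cells = sum(
        1 for row in prob_exceed_grid for p in row if p >= prob_level
    )
    return n_cells * cell_area_ft2


# Hypothetical 3 x 3 grid of probabilities of exceeding the threshold.
grid = [
    [0.05, 0.15, 0.30],
    [0.20, 0.60, 0.95],
    [0.55, 0.92, 0.99],
]

areas = {
    level: remediation_area(grid, level, cell_area_ft2=100.0)
    for level in (0.10, 0.50, 0.90)
}
for level, area in areas.items():
    print(f"{level:.0%} probability level: {area:.0f} ft^2")
```

In this toy grid the 10% level gives the largest zone and the 90% level the smallest, mirroring the ordering described in the text.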

For comparison, baseline area estimates were
generated using Surfer to interpolate the data using
kriging. Additional baseline  analyses were
conducted using kriging interpolants in GSLIB and
Geo-EAS. Table 5 presents the area estimates of the
baseline kriging analysis, baseline geostatistical
analysis at the 50% probability level, and the SADA
analysis at the 50% probability level. When
comparing SADA results to the baseline
geostatistical analysis, the area contaminated with
arsenic estimated by SADA was 20% less than the
baseline analysis. Likewise, the area contaminated
with cadmium was 20% less than the baseline
geostatistical analysis whereas the area contaminated
with chromium was only 4% less than the baseline
analysis. These slight  differences between SADA
results and the baseline geostatistical area estimates
are due to different parameters selected by the
analyst during data interpolations using kriging.
However, it is concluded that the SADA area
estimates at the 50% probability level are consistent
with the baseline analysis.
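
The percentage differences quoted above follow from the Table 5 area estimates. The sketch below repeats that arithmetic for the baseline geostatistical and SADA columns at the 50% probability level; the dictionary layout and function name are illustrative only.

```python
# Table 5 area estimates (ft^2) at the 50% probability level.
table5 = {
    "Arsenic": {"baseline_geostat": 389_000, "sada": 312_048},
    "Cadmium": {"baseline_geostat": 325_000, "sada": 258_336},
    "Chromium": {"baseline_geostat": 30_500, "sada": 29_232},
}


def percent_less(baseline, estimate):
    """Percentage by which the estimate falls below the baseline."""
    return 100.0 * (baseline - estimate) / baseline


differences = {
    name: percent_less(row["baseline_geostat"], row["sada"])
    for name, row in table5.items()
}
for name, diff in differences.items():
    print(f"{name}: SADA area is {diff:.1f}% less than the baseline")
```

The computed differences are roughly 20% for arsenic and cadmium and roughly 4% for chromium, consistent with the rounded values in the text.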

[Figure 8 image: map with Easting (ft) axis.]
Figure 8. Baseline analysis performed by Surfer, with kriging interpolation of the data, for arsenic concentration
          contours in the Site N cost-benefit problem. Areas in blue correspond to regions above the arsenic 75
          mg/kg threshold. Areas in red correspond to regions above the arsenic 500 mg/kg threshold.

A comparison between SADA results at the 90%
probability level and the baseline analysis is
presented in Table 6. The 90% probability level
corresponds to the level at which the analyst believes
that there is a 90% probability that the contamination
concentration at a specified location exceeds the
threshold concentration. The area estimates
generated by SADA for As, Cd, and Cr are 12, 21,
and 8%, respectively, less than the baseline analysis.
As previously observed, the SADA area estimates
are slightly less than the baseline analysis. The
difference is due to the difference in the parameters
selected for data interpolation by the SADA analyst
and the technical team who performed the baseline
analysis.  However, the technical evaluation team
concluded that the SADA generated area  estimates
at the 90% probability level are consistent with the
baseline analysis.

Next, the UTRC analyst used SADA to estimate the
cost for remediating the arsenic-contaminated soil.
SADA generated cost curves based on the area of
the site exceeding the arsenic threshold
concentration of 75 mg/kg and an assumed unit cost
of $20/ft³ for excavating and treating the
contaminated soil (Figures 11 and 12). Figures 11
and 12 demonstrate one of the multiple lines of
reasoning that SADA can perform to assist decision
making. When using SADA, the operator can select
any point on the curve to obtain an estimate of cost
at a specified cleanup level. This is displayed in
Figures 11 and 12 as the pair of numbers
(concentration, cost) above the rectangle in the
figures. In this example, for the 50% probability
level, the cost to remediate soil contaminated above
520 mg/kg is $1.25 × 10⁶ (Figure 11), and at the 90%
probability level the cost to remediate contaminated
soil above 790 mg/kg is $1.37 × 10⁵ (Figure 12).

Figure 9. Cleanup zones (gray areas) for arsenic threshold concentration of 75 mg/kg with 50%
          probability level generated by SADA.

Figure 10. Cleanup zones (gray areas) for arsenic threshold of 75 mg/kg with 90% probability
           level generated by SADA.

Table 5. Comparison of baseline analyses and SADA area estimates for the Site N
         cost-benefit test problem at the 50% probability level

                                     Area of contamination (ft²)
             Concentration   Baseline    Baseline kriging    SADA
             threshold       kriging     (geostatistics      (50%
Constituent  (ppm)           (Surfer)    50% probability)    probability)
Arsenic      75              330,000     389,000             312,048
Cadmium      70              285,000     325,000             258,336
Chromium     370             37,100      30,500              29,232

Table 6. Comparison of baseline analyses and SADA area estimates for the
         Site N cost-benefit test problem at the 90% probability level

                             Area of contamination (ft²)
             Concentration   Baseline kriging    SADA
             threshold       (geostatistics      (90%
Constituent  (ppm)           90% probability)    probability)
Arsenic      75              187,226             164,016
Cadmium      70              155,385             122,976
Chromium     370             12,190              11,232

[Figure 11 image: "Arsenic, Inorganic Cost Benefit Curve for Conf = 50%"; x-axis: Cleanup Goal (mg/kg); selected point (5.2E+2, 1247017.625).]
Figure 11. SADA-generated curve for arsenic cleanup costs as a function of
           concentration at the 50% probability level.

[Figure 12 image: "Arsenic, Inorganic Cost Benefit Curve for Conf = 90%"; selected point (7.9E+2, 137580.421875).]
Figure 12. SADA-generated curve for arsenic cleanup costs as a function of
           concentration at the 90% probability level.
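
The shape of the cost curves discussed above (Figures 11 and 12) can be illustrated with a minimal sketch: for each candidate cleanup goal, count the soil volume whose estimated concentration exceeds the goal and multiply by the unit cost. Only the $20/ft³ unit cost comes from the text; the block concentrations, the block volume, and the function name are hypothetical, and this is not SADA's cost model.

```python
def cost_curve(concentrations, block_volume_ft3, unit_cost_per_ft3, cleanup_goals):
    """Return the remediation cost for each cleanup goal.

    A block is counted as requiring remediation when its estimated
    concentration exceeds the cleanup goal.
    """
    costs = []
    for goal in cleanup_goals:
        n_blocks = sum(1 for c in concentrations if c > goal)
        costs.append(n_blocks * block_volume_ft3 * unit_cost_per_ft3)
    return costs


# Hypothetical estimated arsenic concentrations (mg/kg) for six soil blocks.
block_conc = [10.0, 80.0, 120.0, 600.0, 900.0, 40.0]
goals = [75.0, 500.0, 800.0]  # candidate cleanup goals (mg/kg)

costs = cost_curve(
    block_conc,
    block_volume_ft3=100.0,   # hypothetical volume of each block
    unit_cost_per_ft3=20.0,   # unit cost assumed in the test problem
    cleanup_goals=goals,
)
for goal, cost in zip(goals, costs):
    print(f"cleanup goal {goal:.0f} mg/kg: ${cost:,.0f}")
```

Cost falls as the cleanup goal becomes less stringent, which is the trade-off the operator explores by selecting points along the curve.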

To demonstrate the human health risk assessment
capabilities of SADA, UTRC calculated both the
carcinogenic and noncarcinogenic risks from
exposure to contaminated soil. As discussed  in
Section 2, SADA follows the Risk Assessment
Guidance for Superfund Sites  (EPA, 1989) and
guidance for calculating risk-based preliminary
remediation goals (EPA 1991). SADA has a
complete data base of parameters to automate the
human health risk calculations. The spatially
distributed estimates of As, Cd, and Cr soil
concentrations after completion of sample
optimization were used as input concentrations for
the human health risk modules in SADA to calculate
risk and present the risk results spatially (Figure 13).
SADA calculated risks associated with exposure to a
single contaminant (arsenic; Figure 13) and also
summed risk from exposure to the three
contaminants (As, Cd, and Cr) identified in the test
problem (Figure 14). The ability to sum risks is
important when multiple contaminants are found at a
site. The risk maps could have been improved by
operator intervention to change the scale
(logarithmic scale would be more appropriate for
risk) and color scheme. The use of a linear scale
does not allow much differentiation in risk to be
seen.
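
The two points made above, summing risk across contaminants and preferring a logarithmic scale for display, can be sketched as follows. The risk values are hypothetical placeholders rather than SADA results, and the calculation of each contaminant's risk from concentration and exposure parameters is not shown.

```python
import math


def summed_risk(risk_by_contaminant):
    """Add per-contaminant risk estimates location by location."""
    columns = zip(*risk_by_contaminant.values())
    return [sum(values) for values in columns]


# Hypothetical carcinogenic risk estimates at three map locations.
risks = {
    "As": [1e-6, 2e-4, 5e-3],
    "Cd": [2e-7, 1e-5, 1e-4],
    "Cr": [3e-7, 4e-5, 9e-4],
}

total = summed_risk(risks)
log_total = [math.log10(r) for r in total]

for r, lg in zip(total, log_total):
    print(f"summed risk {r:.2e}  (log10 = {lg:.2f})")
```

On a linear color scale the two lower values would be nearly indistinguishable from zero next to the highest one, whereas their base-10 logarithms are spread over more than three units.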

The technical team evaluated the risk calculations in
SADA by independently calculating  the risks using
contaminant concentrations from the baseline data
and risk equations documented in EPA's Risk
Assessment Guidance for Superfund  (EPA 1989)
and guidance for calculating risk-based preliminary
remediation goals (EPA 1991). The technical team,
using the same input parameters as the UTRC
analyst and the same approach, independently
reproduced the risks calculated by SADA for As,
Cd, and Cr as well as the summed risk from
exposure to all contaminants for Site  N. This is the
expected result and indicates that the EPA guidance
on calculating human health risk is correctly
implemented in SADA.

Site S Cost-Benefit Problem
The Site S data set consisted of a series of wells with
chlordane concentrations provided on a 5-ft vertical
spacing. The test problem was developed to evaluate
the software's capability to address a 3-D
groundwater contamination problem from a cost-
benefit perspective. UTRC used SADA to evaluate
the chlordane groundwater data at 5-ft vertical
intervals to create both 2-D and 3-D depictions of
the contamination. The UTRC analyst began with a
data exploration exercise to determine spatial
correlation in the data and to define the boundaries
of the chlordane contamination. The analyst
concluded from the data exploration using SADA
that the chlordane plume begins near the surface in
the north and increases in depth as it travels south.
Also, a significant anisotropy in the data was
observed and confirmed in the spatial correlation
analysis. The analyst also concluded that the data
were neither lognormally nor normally distributed,
and therefore indicator kriging was the interpolator
of choice.

Figure 13. Carcinogenic risk for arsenic based on residential scenario produced using SADA.
Figure 14. Summed carcinogenic risk for the three contaminants (arsenic + chromium + cadmium).

Next, UTRC used the indicator kriging function in
SADA to generate 2-D contours of the chlordane
data at 5-ft vertical intervals. The 2-D sections were
combined to produce a 3-D depiction of the entire
chlordane plume (Figure 15). The visualizations
presented in Figure 15 could have been better. For
the 2-D representation, the visualizations had the
following problems:
•   The color key did not match the threshold values
    of 5 and 500 µg/L, making comparison with the
    baseline analysis difficult.
•   The implications of the blank areas near the
    edge of the plume are not clear. When SADA
    does not have enough data to obtain a reliable
    estimate for concentration, the region with
    inadequate information is depicted without color
    coding, as is seen in Figure 15.
•   The sample locations are marked with a circle,
    but these are difficult to  see.

For the 3-D representation, the lack of coordinates
on the axes makes it difficult to interpret the figure.
In addition, the fuzzy black region around the plume
also makes data interpretation difficult. However,
close examination of the 3-D map indicates a plume
whose centerline becomes deeper as the distance
from the source increases. This trend as depicted (a
long, narrow plume that is moving deeper) is
generally consistent with the data.

The chlordane data for Site S were generated by the
technical team using a simulation model  with a
constant source of chlordane supplied to  the aquifer.
Hydrologic and chlordane transport parameters were
obtained from field data at Site S and were taken to
be constant over the problem domain. These
assumptions permitted release and transport of
chlordane through the aquifer to be represented by a
partial differential equation that was solved
analytically. This analytical  solution provides  an
exact value for chlordane concentrations at each
location to compare with the SADA results.

To evaluate the accuracy of the plume maps
generated by SADA, the demonstration technical
team compared the SADA results with the analytical
solution as well as with the baseline analysis
conducted by the technical team using the same data
set as that supplied to UTRC. The baseline analysis
was obtained by using Surfer to interpolate the data
using  ordinary kriging with an anisotropy ratio of
0.3. The concentration contour presented in
Figure 16 represents the maximum concentration
observed at the location (independent of depth) and
provides a 2-D representation of the  extent of the
plume. Figure 16 shows contours based on the
analytical solution and contours based on the
baseline analysis. For the 500-µg/L contour, the
analytical solution has a length of approximately
1000 ft and a maximum width of 100 ft. The
analytical 500-µg/L contour reaches approximately
100 ft farther south than the transect defined by
wells DP-122 through DP-125 (Figure 16). Because
the Surfer-generated baseline analysis was based on
the incomplete data set supplied to UTRC, it does
not show the 500-µg/L contour extending as far
south as the analytical solution does. Examining the
data supplied for the baseline analysis, wells DP-123
and 124 are both below the 500-µg/L concentration
limit. Thus, the baseline analysis indicates that the
contour does not extend to these locations. The
analytical solution, however, shows a region
between these wells where the concentration exceeds
the threshold, and therefore, the 500-µg/L contour
does extend past these locations. The difference seen
in the 5-µg/L contour also arises from incomplete
data.
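
The 2-D plume extent in Figure 16 is based on the maximum concentration observed at each location, independent of depth. A minimal sketch of that reduction is given below, with each well's vertical profile collapsed to its maximum and then compared against the 5- and 500-µg/L thresholds. The well names, depths, and concentrations are hypothetical, as are the function names.

```python
def max_over_depth(profiles):
    """Collapse each well's depth profile to its maximum concentration."""
    return {well: max(by_depth.values()) for well, by_depth in profiles.items()}


def classify(max_conc, low=5.0, high=500.0):
    """Label a maximum concentration (ug/L) against the two thresholds."""
    if max_conc > high:
        return "above 500"
    if max_conc > low:
        return "above 5"
    return "below 5"


# Hypothetical chlordane profiles: depth (ft) -> concentration (ug/L).
profiles = {
    "well_A": {5: 2.0, 10: 40.0, 15: 610.0},
    "well_B": {5: 0.5, 10: 3.0, 15: 1.0},
    "well_C": {5: 8.0, 10: 120.0, 15: 30.0},
}

maxima = max_over_depth(profiles)
labels = {well: classify(conc) for well, conc in maxima.items()}
print(maxima)
print(labels)
```

Taking the maximum over depth is a conservative choice for mapping: a location is flagged if the threshold is exceeded at any sampled depth.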

The baseline analysis tends to predict a larger area of
contamination than does the analytical  solution.
These discrepancies illustrate the difficulty in
obtaining precise plume boundaries without
measuring data at every location.

The SADA-generated analysis of the 500-µg/L
contour, based on the maximum measured value in
each well (Figure 17), provided results (gray-shaded
area) consistent with the Surfer baseline analysis
depicted in Figure 16. There are minor differences
between the SADA and baseline analyses at the
leading edge of the 5- and 500-µg/L contours, with
the baseline analysis predicting a slightly larger area
of contamination. This is due to the different
parameters and analysis techniques used in the two
analyses.

Further comparison shows that the leading edge of
the SADA-predicted 500-µg/L plume (Figure 17)
extends to the south approximately 100 ft beyond
the transect defined by wells DP-111 through
DP-115, indicating a close match with the baseline
analysis, in which the plume extended about 200 ft
past the transect (Figure 16). The analytical solution
indicates a larger area of contamination above the
500-µg/L level and a smaller area of contamination
above the 5-µg/L level than either the baseline
analysis or SADA analysis. This is expected because
the analytical solution contours were based on more
information about the plume location. The technical
test team evaluated the maps produced by SADA at
each vertical level by comparing these with a
baseline analysis on the same data set. On the basis
of these comparisons, the technical team concluded
that SADA produced an adequate match to the
baseline analysis.

[Figure 15 image: "CHLORDANE Estimates"; 2-D panel with easting axis and color scale, and 3-D panel with color scale.]
Figure 15. Two-dimensional depiction of chlordane concentrations for one 5-ft interval and 3-D depiction of entire
           plume generated by SADA for Site S.

[Figure 16 image: map with Easting (ft) axis. Legend: analytical solution (solid line); best estimate from kriging interpolation, based on baseline data (dashed line).]
Figure 16. Site S chlordane 5- and 500-µg/L contours for the analytical solution (solid lines) and the baseline contour
           (dashed line) obtained using kriging with an anisotropy ratio of 0.3. Sample locations are posted (+) and
           labeled on the map.

To complete the cost-benefit portions of the Site S
test problem, UTRC used SADA to produce
depictions of areas of chlordane-contaminated
groundwater at the chlordane threshold
concentrations of 5 and 500 µg/L with probability
levels of 10 and 50%. The region enclosed by the
10% probability level contour represents the region
in which there is at least a 10% chance that
contamination exists above the threshold value and
gives a larger estimate of area than the 50%
probability level contour. Figure 18 provides an
example of the SADA output depicting areas of
contamination above the 5-µg/L threshold
concentration at the 10% probability level, showing
one vertical level and the 3-D representation of the
plume. The visualization problems noted for
Figure 15 also apply to Figure 18.

For evaluation, the volume of groundwater with
chlordane concentrations above the 5-|jg/L and
500-|ag/L thresholds at the 10 and 50% probability
levels estimated by SADA were compared to the
volume estimated by the technical team (Table 7).
For the 50% probability level, there is good
agreement between SADA, the baseline analysis
using Surfer, and the baseline geostatistics. In fact,
all three volume estimates are with 13% of each
other, indicating agreement. For the 10% probability
                                                 30

-------
                     Figure 17. SADA-generated Site S chlordane contours at 500 µg/L
                               produced using the maximum value in each well. Colored
                               circles note sample locations. (Panel title: Chlordane
                               Contours, Decision = 500 ppb; horizontal axis: Easting.)
level, the SADA volume estimates are 6% less at the
5-µg/L threshold and 9% less at 500 µg/L than the
baseline volume estimates. The difference between the
SADA results and the baseline analysis is due to the
slightly different kriging parameters selected for
each analysis. Overall, the technical team concluded
that there is close agreement among the volume
estimates produced by SADA and the baseline
geostatistical analysis.
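
The percentages quoted above can be checked against the Table 7 volumes with a short sketch. The report does not say how its percentages were computed, so the choice of reference here is an assumption: shortfalls are taken relative to the baseline geostatistical volume, and the spread among estimates relative to the largest estimate. The function names are hypothetical.

```python
# Volumes in cubic feet, transcribed from Table 7 (None marks "NA").
TABLE_7 = {
    "5 ug/L, 50%": {"surfer": 15_783_000, "geostatistics": 17_423_070, "sada": 16_250_000},
    "5 ug/L, 10%": {"surfer": None, "geostatistics": 22_458_800, "sada": 20_990_000},
    "500 ug/L, 50%": {"surfer": 571_250, "geostatistics": 589_600, "sada": 510_000},
    "500 ug/L, 10%": {"surfer": None, "geostatistics": 1_520_800, "sada": 1_380_000},
}


def percent_below(value, reference):
    """How far value falls below reference, as a percentage of reference."""
    return 100.0 * (reference - value) / reference


def spread_percent(values):
    """Largest minus smallest available estimate, as a percentage of the largest."""
    present = [v for v in values if v is not None]
    return 100.0 * (max(present) - min(present)) / max(present)


for case, row in TABLE_7.items():
    shortfall = percent_below(row["sada"], row["geostatistics"])
    spread = spread_percent(row.values())
    print(f"{case}: SADA {shortfall:.1f}% below geostatistics, spread {spread:.1f}%")
```

With these conventions the 10% probability level shortfalls come out between 6 and 7% at 5 µg/L and between 9 and 10% at 500 µg/L, and the spread among the three 50% probability level estimates stays below 14%.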

The last analysis conducted by UTRC for Site S was
to demonstrate the capabilities of SADA in assessing
human health risk. UTRC calculated the
carcinogenic risks from exposure to  chlordane-
contaminated groundwater following EPA guidance
(EPA, 1989; EPA, 1991).
The chlordane concentrations were used as input
concentrations for the risk modules in SADA to
calculate and present the risk results in tabular form
as well as spatially in risk maps. Figure 19 presents
SADA carcinogenic risk outputs spatially in 2-D
(one vertical level) and 3-D. The limitations to these
visualizations are similar to those found in Figure
15. In addition, the risk map could have been
improved by using a logarithmic scale to provide
clearer definition of risk zones.

As in the Site N example, the technical team
evaluated the risk calculations in SADA by
independently calculating the risks using chlordane
concentrations from the baseline data and the risk
equations documented in  EPA 1989 and in EPA's
guidance for calculating preliminary remediation

-------
Figure 18.  Site S areas where chlordane threshold of 5 µg/L is exceeded with 10% probability: 2-D depiction for one
            5-ft interval and 3-D depiction of the entire plume generated by SADA. (Panel title as printed:
            Chlordane Contours, Decision = 500 ppb; horizontal axis: Easting.)

-------
goals (EPA 1991). Using the same technical
approach and risk input parameters as the UTRC
analyst, the technical team was able to independently
reproduce the carcinogenic risks calculated by
SADA for chlordane. This is the expected result and
indicates that the EPA guidance on calculating
human health risk is correctly implemented in
SADA.
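
For readers unfamiliar with the calculation being verified, the following is a minimal sketch of the standard drinking-water ingestion intake and linear cancer-risk equations of the kind given in EPA (1989). The exposure values are common residential defaults used here as assumptions, and the concentration and slope factor are illustrative placeholders. None of these inputs is taken from the UTRC analysis, and the function names are hypothetical.

```python
def chronic_daily_intake(conc_mg_per_l, water_l_per_day, exposure_days_per_year,
                         exposure_years, body_weight_kg, averaging_days):
    """Intake in mg per kg body weight per day, averaged over averaging_days."""
    total_mass_mg = (conc_mg_per_l * water_l_per_day
                     * exposure_days_per_year * exposure_years)
    return total_mass_mg / (body_weight_kg * averaging_days)


def carcinogenic_risk(intake_mg_per_kg_day, slope_factor_per_mg_per_kg_day):
    """Excess lifetime cancer risk under the linear low-dose approximation."""
    return intake_mg_per_kg_day * slope_factor_per_mg_per_kg_day


# Assumed inputs (illustrative only, not from the report).
intake = chronic_daily_intake(
    conc_mg_per_l=0.005,         # 5 ug/L expressed in mg/L
    water_l_per_day=2.0,         # assumed drinking-water ingestion rate
    exposure_days_per_year=350,  # assumed exposure frequency
    exposure_years=30,           # assumed exposure duration
    body_weight_kg=70.0,         # assumed adult body weight
    averaging_days=70 * 365,     # assumed lifetime averaging time
)
risk = carcinogenic_risk(intake, slope_factor_per_mg_per_kg_day=0.35)  # placeholder slope factor
print(f"intake = {intake:.3e} mg/kg-day, risk = {risk:.2e}")
```

An independent check of the kind the technical team describes amounts to running the same equations with the same inputs outside the software and comparing the results.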

Multiple Lines of Reasoning
The UTRC analyst conducted multiple data
explorations and evaluations that were supported by
the statistical and geostatistical functions in SADA.
This information provided a quantitative measure of
the confidence that could be placed in the decision.
Several data interpolation routines were considered
on a problem-specific basis before selecting the best
one for data analysis. Several sample optimization
schemes are available for use. Selection of a
particular scheme depends on the objectives of the
analysis and the amount of data.

Secondary Evaluation Criteria
Ease of Use
The analysis team found that SADA was easy to use.
It has a graphical user interface (GUI) with pull-
down menus to permit use of the options in the
software. SADA imports database files in comma-
delimited format. It also imports .dxf image files and
integrates them into the visualization of the problem.

The GUI provided a platform to address problems
efficiently and to tailor the analysis to the problem
under study (e.g., contours at certain threshold
concentrations). The database structure permitted
queries on any field (e.g., chemical name, date,
concentration, and well identifiers) and also permitted
filtering (e.g., to include only data within a range of
elevations or to include selected data points).
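
The kind of field query and elevation filter described above can be sketched on a comma-delimited file as follows. This is only an illustration of the idea, not SADA's implementation. The column names, well identifiers, and values are hypothetical.

```python
import csv
import io

# Hypothetical comma-delimited data; column names are illustrative only.
SAMPLE_DATA = """well_id,chemical,date,elevation_ft,concentration_ug_per_l
MW-1,chlordane,1998-03-01,95.0,620.0
MW-1,chlordane,1998-03-01,90.0,15.0
MW-2,chlordane,1998-03-02,92.5,3.0
MW-2,carbon tetrachloride,1998-03-02,92.5,40.0
MW-3,chlordane,1998-03-03,70.0,510.0
"""


def load_records(text):
    """Parse comma-delimited text into dictionaries with numeric fields converted."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        row["elevation_ft"] = float(row["elevation_ft"])
        row["concentration_ug_per_l"] = float(row["concentration_ug_per_l"])
        records.append(row)
    return records


def select(records, chemical, min_elevation_ft, max_elevation_ft):
    """Keep records for one chemical within an inclusive elevation range."""
    return [
        r for r in records
        if r["chemical"] == chemical
        and min_elevation_ft <= r["elevation_ft"] <= max_elevation_ft
    ]


records = load_records(SAMPLE_DATA)
subset = select(records, "chlordane", 85.0, 100.0)
for r in subset:
    print(r["well_id"], r["elevation_ft"], r["concentration_ug_per_l"])
```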

SADA exhibited the capability to export text and
graphics directly to standard word processing
software. It was also able to generate project files,
which allow the entire project to be moved to another
machine with SADA software.

During the demonstration, several members of the
technical team received a 4-hour introduction to
SADA. The reviewers observed that SADA was a
large, feature-rich software program that has an on-line
manual and case studies included in the software to
guide the novice user through the system and
applications. The reviewers felt that with one or two
days of training, they would be able to use the
fundamental features found in SADA. However,
they all felt that regular use of the product would be
needed to efficiently utilize all of the features found
in the product, especially the geostatistical and
human health risk features.

Efficiency and Range of Applicability
During the demonstration, UTRC provided one staff
member for 4 days to perform the analysis of three
problems. An additional 4 days were spent preparing
the report that documented the models, assumptions,
parameter choices, and results of the analysis. The
demonstration showed that the software was capable
of importing data and .dxf files to perform an
evaluation. The software has a flexible structure that
allows a wide range of environmental conditions
(e.g., contaminant in groundwater, soil, multiple
contaminants on a single site) to be represented.

Training and Technical Support
UTRC provides a number of options for SADA
training and technical support. These include

•    an extensive on-line help manual,
•    tutorial case studies provided with the software,
•    a 2-day training course, and
•    technical support via internet.

The on-line user manual provides detailed
instructions on how to operate SADA. The manual
also gives recommendations on approaches to
statistical/geostatistical modeling and other aspects
of the code. The manual is organized in an orderly
fashion and contains pictures of the pull-down
menus the operator would see when using SADA.
Screen captures demonstrating the type of output
produced by SADA are also available in the on-line
help.

Additional Information about the SADA Software
To use SADA efficiently, the operator should
possess a basic understanding of the use of
geospatial modeling in analyzing environmental
problems and human health risk assessment. This
includes an understanding of interpolation
algorithms and geostatistics along with fundamental
knowledge about database manipulations, sample
optimization, and cost-benefit analysis.

-------
                                 Chlordane Carcinogenic Risk Contours (figure image not reproduced)
-------
          Table 7.   Volume estimates for the Site S cost-benefit test problem by baseline
                     analysis methods and SADA

                                    Volume of chlordane-contaminated groundwater (ft3)
Chlordane threshold
concentration and                   Baseline Surfer    Baseline geostatistics
probability level                   kriging            kriging                    SADA
5 µg/L, 50% probability level       15,783,000         17,423,070                 16,250,000
5 µg/L, 10% probability level       NA                 22,458,800                 20,990,000
500 µg/L, 50% probability level     571,250            589,600                    510,000
500 µg/L, 10% probability level     NA                 1,520,800                  1,380,000

During the demonstration, SADA Beta Version 3.0
was operated on a Windows 95 platform using a
laptop with a 266-MHz Pentium processor, 128 MB
of RAM and 4 MB of video memory. SADA
requires a minimum of 15 MB of disk space to load.

Summary of Performance
A summary of SADA's performance is presented in
Table 8. Overall, the technical team observed that
the main strength of SADA is its technical approach
to assist environmental decision makers by defining
areas of concern based on user-defined contaminant
concentrations or human health risks. SADA's
estimate of the degree of uncertainty in the
prediction provides key information to assist in
selection of future sample locations and in
determining cost/risk tradeoffs. The incorporation of
databases of risk parameters coupled with the pull-
down menus makes risk calculations easy to perform.
The integration of geostatistical analysis, human
health risk assessment, cost-benefit analysis,
sampling design, and decision analysis into a single
software product makes SADA a powerful tool for
analyzing spatially correlated data. SADA
demonstrated the ability to perform accurate sample
optimization analysis, estimate areas and volumes of
contamination for cost-benefit analysis, and estimate
probability of exceeding thresholds based on
contaminant concentrations or risk.

The technical team did not notice any major
limitations in SADA. Several minor limitations were
noted: the 3-D visualizations provided only a
qualitative depiction of the plume because a frame of
reference (axis scale or surface maps) was not
provided; maps and drawings could be imported
only as .dxf files; and data files could be imported
only in comma-delimited format, which requires
reformatting in another software product.

-------
Table 8. SADA performance summary

Feature/parameter                Performance summary

Decision support                 SADA integrated data, site maps, and surface features into 2-D and 3-D
                                 spatial representations of the problem. SADA is designed to
                                 automatically generate cost-benefit curves and human health risk maps.
                                 These maps can be based on the probability of exceeding threshold
                                 concentrations. SADA is also designed to assist the analyst in selecting
                                 sample optimization schemes based on geostatistical analysis of the
                                 existing data.

Documentation of analysis        Documentation of the process and parameters were provided and
                                 assumptions explained. Input data and parameters, outputs, and maps
                                 were exported to word processing files to document the analysis.

Comparison with baseline         2-D and 3-D contaminant concentration contours consistent with measured
analysis and data                data and baseline analysis.
                                 Accurately mapped sample locations and groundwater wells, buildings and
                                 surface features.
                                 Accurately posted data to sample locations.
                                 Quasi-3D layered maps of contaminant concentrations consistent with data.
                                 Estimates of area and/or volume of contaminated media for Site N (2-D)
                                 and S (3-D) were consistent with the baseline kriging analysis and
                                 baseline geostatistical analysis.
                                 Calculations of human health risk were consistent with EPA's Risk
                                 Assessment Guidance for Superfund: Human Health Evaluation Manual
                                 (EPA 1991).

Multiple lines of reasoning      SADA can calculate the probability of exceeding a threshold value (risk
                                 or concentration). This facilitates a better understanding of the
                                 uncertainty involved in the decision.

Ease of use                      User-friendly, logical layout of menus on the graphical user interfaces.
                                 Performing risk calculations made easy through the extensive database of
                                 contaminants, exposure properties, and risk scenario parameters.

Efficiency                       Three problems completed and documented with 8 person days of effort.

Range of applicability           SADA is designed to handle any form of spatially correlated data.
                                 Therefore, it can handle contamination in soils, groundwater, surface
                                 water, and air. It contains an extensive database of more than 1000
                                 contaminant names and CAS numbers. Its extensive database on risk
                                 parameters, which follows EPA's Risk Assessment Guidance for Superfund,
                                 permits easy calculation of human health risk.

Training and technical support   On-line users' manual
                                 On-line help
                                 Examples provided with the software
                                 Two-day training course
                                 Technical support through internet

Operator skill base              Fundamental understanding of environmental contamination problems,
                                 geospatial analysis, cost-benefit analysis and human health risk
                                 assessment

Operating system                 Windows 95, 98, NT

Cost                             Free

-------
     Section 5 — SADA Update and Representative Applications
Objective
The purpose of this section is to allow UTRC to
provide information regarding new developments
with SADA since the demonstration activities. In
addition, UTRC has provided a list of representative
applications in which its technology has been or is
currently being used.

Technology Update
SADA's initial full public release of version 1.0 was
in November 1999. Version 1.0 of SADA has
undergone additional development on a number of
features since the beta version evaluated for this
report. The major  improvements include the
following:

•    Significant improvement of the computational
     speed and visual appearance in 3-D
     visualizations. Also, axis labels are included
     with the 3-D output in order to provide a frame
     of reference during visualization.
•    SADA is now able to import Microsoft Access
     97 files in addition to comma-delimited data
     files.
•    Creation of an auto-documentation feature for
     SADA. In this feature, SADA generates a
     report that documents (at the user's requested
     level of detail) all sources, models, parameters,
     and assumptions used to produce the result
     being viewed in SADA. This report is in
     Hypertext Markup Language (HTML) format
     and can be read by many word processing
     programs (e.g., Microsoft Word or Corel
     WordPerfect) or network browsers (e.g.,
     Netscape or Microsoft Internet Explorer).
•    Expansion of the human health risk module to
     include a larger combination of media/land-use
     scenarios. Also, total risk and PRG values can
     explicitly include or exclude certain pathways.
     Inhalation pathway modifications have been
     made to comply with recent EPA guidance on
     this pathway.

Representative Applications
In applications to date, SADA has been found to
significantly reduce the amount of startup time and
modeling efforts associated with site
characterization and risk assessment. In addition,
SADA has been found to process and present
information in a clear, transparent manner that
directly supports decision processes and serves as a
communication tool between technical and
non-technical audiences.

SADA has already been distributed worldwide via
the internet. SADA users have applied SADA in a
variety of situations ranging from environmental
remediation to human health risk analysis at oil
production sites. It is estimated that approximately
150 private-sector companies have been using the
Beta version of SADA. In addition, a number of
government agencies, including the Environmental
Protection Agency, and various state offices have
evaluated or implemented SADA. The following list
provides some specific examples of how SADA has
been used:

•   determination of areas of concern in a harbor-
    dredging application in the northeast United
    States;
•   geospatial modeling of human health risk in oil
    production sites in Australia;
•   classical risk assessments at state facilities in
    Tennessee and Kentucky;
•   secondary sampling design at Portsmouth
    Gaseous Diffusion Plant in Ohio; and
•   remedial design at a reactor facility in Oak
    Ridge, Tennessee.

-------
                               Section 6 — References
Deutsch, C. V., and A. Journel. 1992. Geostatistical Software Library Version 2.0 and User's Guide for
GSLIB 2.0. Oxford Press.

Englund, E. J., and A. R. Sparks. 1991. Geo-EAS (Geostatistical Environmental Assessment Software) and
User's Guide, Version 1.1. EPA 600/4-88/033.

EPA (U.S. Environmental Protection Agency). 1989. Risk Assessment Guidance for Superfund. Vol. 1,
Human Health Evaluation Manual. EPA/540/1-90/002. Office of Emergency and Remedial Response, U.S.
Environmental Protection Agency, Washington, D.C.

EPA (U.S. Environmental Protection Agency). 1991. Risk Assessment Guidance for Superfund. Vol. 1,
Part B, Human Health Evaluation Manual: Development of Risk-Based Preliminary Remediation Goals.
OSWER Directive 9285.7-01B. Office of Emergency and Remedial Response, U.S. Environmental Protection
Agency, Washington, D.C.

EPA (U.S. Environmental Protection Agency). 1994. Guidance for the Data Quality Objective Process,
QA/G-4. EPA/600/R-96/055. U.S. Environmental Protection Agency,  Washington, D.C.

Golden Software. 1996. Surfer Version 6.04, June 24. Golden Software Inc., Golden, Colorado.

Sullivan, T. M., and A. Q. Armstrong. 1998.  "Decision Support Software Technology Demonstration Plan."
Environmental & Waste Technology Center, Brookhaven National Laboratory, Upton, N.Y., September.

Sullivan, T. M., A. Q. Armstrong, and J. P. Osleeb.  1998. "Problem Descriptions for the Decision Support
Software Demonstration." Environmental & Waste Technology Center, Brookhaven National Laboratory,
Upton, N.Y., September.

van der Heijde, P. K. M., and D. A. Kanzer. 1997. Ground-Water Model Testing: Systematic Evaluation and
Testing of Code Functionality and Performance. EPA/600/R-97/007. National Risk Management Research
Laboratory, U.S. Environmental Protection Agency, Cincinnati, OH.

-------
                  Appendix A— Summary of Test Problems
Site A:  Sample Optimization Problem
Site A has been in operation since the late 1940s as an industrial machine plant that used solvents and
degreasing agents. It overlies an important aquifer that supplies more than 2.7 million gal of water per day for
industrial, commercial, and residential use. Site characterization and monitoring activities were initiated in the
early 1980s, and it was determined that agricultural and industrial activities were sources of contamination.
The industrial plant was shut down in 1985. The primary concern is volatile organic compounds (VOCs) in
the aquifer and their potential migration to public water supplies. Source control is considered an important
remediation objective to prevent further spreading of contamination.

The objective of this Site A problem was to challenge the software's capabilities as a sample optimization
tool. The Site A test problem presents a three-dimensional (3-D) groundwater contamination scenario where
two VOCs, dichloroethene (DCE) and trichloroethene (TCE), are present. The data that were supplied to the
analysts included information on hydraulic head, subsurface geologic structure, and chemical concentrations
from seven wells that covered an approximately 1000-ft square. Chemical analysis data were collected at 5-ft
intervals from each well.

The design objective of this test problem was for the analyst to predict the optimum sample locations to
define the depth and location of the plume at contamination levels exceeding the threshold concentration
(either 10 or 100 µg/L). Because of the limited data set provided to the analysts and the variability found in
natural systems, the analysts were asked to estimate the plume size and shape as well as the confidence in
their prediction. A high level of confidence indicates that there is a high probability that the contaminant
exceeds the threshold at that location. For example, at the 10-µg/L threshold, the 90% confidence level plume
is defined as the region in which there is greater than a 90% chance that the contaminant concentration
exceeds 10 µg/L. The analysts were asked to define the plume for three confidence levels: 10% (maximum
plume, low certainty, and larger region), 50% (nominal plume), and 90% (minimum plume, high certainty,
and smaller region). The initial data set provided to the analysts was a subset of the available baseline data
and intended to be insufficient for fully defining the extent of contamination in any dimension. The analyst
used the initial data set to make a preliminary estimate of the dimensions of the plume and the level of
confidence in the prediction. In order to improve the confidence and better define the plume boundaries, the
analyst needed to determine where the next sample should be collected. The analyst conveyed this
information to the demonstration technical team, which then provided the analyst with the contamination data
from the specified location or locations. This iterative process continued until the analyst reached the test
problem design objective.

Site A:  Cost-Benefit  Problem
The objectives of the Site A cost-benefit problem were (1) to determine the accuracy with which the software
predicts plume boundaries to define the extent of a 3-D groundwater contamination problem on a large scale
(the problem domain is approximately 1 square mile) and (2) to evaluate human health risk estimates resulting
from exposure to contaminated groundwater. The VOC contaminants of concern for the cost-benefit problem
were perchloroethene (PCE)  and trichloroethane (TCA).

In this test problem analysts were to define the location and depth of the PCE plume at concentrations of 100
and 500 µg/L and TCA concentrations of 5 and 50 µg/L at confidence levels of 10 (maximum plume),
50 (nominal plume), and 90% (minimum plume). This information could be used in a cost-benefit analysis of
remediation goals versus cost of remediation. The analysts were provided with geological information,
borehole logs, hydraulic data, and an extensive chemical analysis data set consisting of more than 80 wells.
Chemical analysis data were collected at 5-ft intervals from each well. Data from a few wells were withheld
from the analysts to provide a reference to check interpolation routines. Once the analysts defined the PCE
and TCA plumes, they were asked to calculate the human health risks associated with drinking 2 L/d of

-------
contaminated groundwater at two defined exposure points over the next 5 years. One exposure point was in
the central region of the plume and one was at the outer edge. This information could be used in a cost-benefit
analysis of reduction of human health risk as a function of remediation.

Site B:  Sample Optimization and Cost-Benefit  Problem
Site B is located in a sparsely populated area of the southern United States on a 1350-acre site about 3 miles
south of a large river. The site is typical of many metal fabrication or industrial facilities because it has
numerous potential sources of contamination (e.g., material storage areas, process activity areas, service
facilities, and waste management areas). As with many large manufacturing facilities, accidental releases
from laboratory activities and cleaning operations introduced solvents and other organic chemicals into the
environment, contaminating soil, groundwater, and surface waters.

The objective of the Site B test problem was to challenge the software's capabilities as a sample optimization
and cost-benefit tool. The test problem presents a two-dimensional (2-D) groundwater contamination scenario
with three contaminants — vinyl chloride (VC), TCE, and technetium-99 (Tc-99). Chemical analysis data
were collected at a series of groundwater monitoring wells on a quarterly basis for more than 10 years along the
direction of flow near the centerline of the plume. The analysts were supplied with data from one sampling
period.

There were two design objectives for this test problem. First, the analyst was to predict the optimum sample
location to define the depth and location of the plume at specified contaminant threshold concentrations with
confidence levels of 50, 75, and 90%. The initial data set provided to the analyst was a subset of the available
baseline data and was intended to be insufficient for fully defining the extent of contamination in two
dimensions. The analyst used the initial data set to make a preliminary estimate of the dimensions of the
plume and the level of confidence in the prediction. In order to improve the confidence in defining the plume
boundaries, the analyst needed to determine the location for collecting the next sample. The analyst conveyed
this information to the demonstration technical team, who then provided the analyst with the contamination
data from the specified location or locations. This iterative process continued until the analyst reached the
design objective.

Once the location and depth of the plume were defined, the second design objective was addressed. The second
design objective was to estimate the volume of contamination at the specified threshold concentrations at
confidence levels of 50, 75, and 90%. This information could be used in a cost-benefit analysis of remediation
goals versus cost of remediation. Also, if possible, the analyst was asked to calculate health risks associated
with drinking 2 L/d of contaminated groundwater from two exposure points in the plume. One exposure point
was near the centerline of the plume, while the other was on the edge of the plume. This information could be
used in a cost-benefit analysis of reduction of human health risk as a function of remediation.

Site D:  Sample Optimization and Cost-Benefit  Problem
Site D is located in the western United States and consists of about 3000 acres of land bounded by municipal
areas on the west and southwest and unincorporated areas on the northwest and east. The site has been an active
industrial facility since it began operation in 1936. Operations have included maintenance and repair of
aircraft and, recently, the maintenance and repair of communications equipment and electronics. The aquifer
beneath the site is several hundred feet thick and consists of three  or four different layers of sand or silty sand.
The primary concern is VOC contamination of soil and groundwater as well as contamination of soil with
metals.

The objective of the Site D problem was to test the software's capability as a tool for sample optimization and
cost-benefit problems. This test problem was a 3-D groundwater sample optimization problem for four VOC
contaminants — PCE, DCE, TCE, and trichloroethane (TCA).  The test problem required the developer to
predict the optimum sample locations to define the region of the contamination that exceeded threshold
concentrations for each contaminant. Contaminant data were supplied for a series of wells screened at
different depths for four quarters in a 1-year time frame. This initial data set was insufficient to fully define
the extent of contamination. The analyst used the initial data set to make a preliminary estimate of the

-------
dimensions of the plume and the level of confidence in the prediction. In order to improve the confidence in
the prediction of the plume boundaries, the analyst needed to determine the location for collecting the next
sample. The analyst conveyed this information to the demonstration technical team, who then provided the
analyst with the contamination data from the specified location or locations. This iterative process was
continued until the analyst determined that the data could support definition of the location and depth of the
plume exceeding the threshold concentrations with confidence levels of 10, 50, and 90% for each
contaminant.

After the analyst was satisfied that the sample optimization problem was complete and the plume was defined,
he or she was given the option to continue and perform a cost-benefit analysis. At Site D, the cost-benefit
problem required estimation of the volume of contamination at specified threshold concentrations with
confidence levels of 10, 50, and 90%. This information could then be used in a cost-benefit analysis of
remediation goals versus cost of remediation.

Site N:  Sample Optimization Problem
Site N is located in a sparsely populated area of the southern United States and is typical of many metal
fabrication or industrial facilities in that it has numerous potential sources of contamination (e.g., material
storage areas, process activity areas, service facilities, and waste management areas). Industrial operations
include feed and withdrawal of material from the primary process, recovery of heavy metals from various
waste materials, and treatment of industrial wastes. The primary concern is contamination of the surface soils
by heavy metals.

The objective of the Site N sample optimization problem was to challenge the software's capability as a
sample optimization tool to define the areal extent of contamination. The Site N data set contains the most
extensive and reliable data for evaluating the accuracy of the analysis for a soil contamination problem. To
focus only on the accuracy of the soil sample optimization analysis, the problem was simplified by removing
information  regarding groundwater contamination at this site, and it was limited to three contaminants. The
Site N test problem involves surface soil contamination (a 2-D problem) for three contaminants — arsenic
(As), cadmium (Cd), and chromium (Cr). Initial sampling indicated a small contaminated region on the site;
however, the initial sampling was limited to only a small area (less than 5% of the site area).

The design objective of this test problem was for the analyst to develop a sampling plan that defines the
extent of contamination on the 150-acre site based on exceedance of the specified threshold concentrations
with confidence levels of 10, 50, and 90%. Budgetary constraints limited the total expenditure for sampling
to $96,000. Sample costs were $1200 per sample, which included collecting and analyzing the surface soil
sample for all three contaminants. Therefore, the number of additional samples could not exceed 80
($96,000 divided by $1200 per sample). The
analyst used the initial data to define the areas of contamination and predict the location of additional
samples. The analyst was then provided with additional data at these locations and could perform the sample
optimization process again until the areal extent of contamination was defined or the maximum number of
samples (80) was attained. If the analyst determined that 80 samples were insufficient to adequately
characterize the entire 150-acre site, the analyst was asked to use the software to select the regions with the
highest probability of containing contaminated soil.

Site N:  Cost-Benefit  Problem
The objective of the Site N cost-benefit problem was to challenge the software's ability to perform cost-
benefit analysis as defined in terms of area of contaminated  soil above threshold concentrations and/or
estimates of human health risk from exposure to contaminated soil. This test problem considers surface soil
contamination (2-D) for three contaminants — As, Cd, and  Cr. The analysts were given an extensive data set
for a small region of the site and asked to conduct a cost-benefit analysis to evaluate the cost for remediation
to achieve specified threshold concentrations. If possible, an estimate of the confidence in the projected
remediation areas was provided at the 50 and 90% confidence limits. For human health risk analysis, two
scenarios were considered. The first was the case of an on-site worker who was assumed to have consumed
500 mg/d of soil for one year during excavation activities. The worker would have worked in all areas of the
site during the excavation process. The second scenario considered a resident who was assumed to live on a
200- by 100-ft area at a specified location on the site and to have consumed 100 mg/d of soil for 30 years.
This information could be used in a cost-benefit (i.e., reduction of human health risk) analysis as a function of
remediation.

Site S:  Sample Optimization Problem
Site S has been in operation since 1966. It was an industrial fertilizer plant producing pesticides and fertilizer
and used industrial solvents such as carbon tetrachloride (CTC) to clean equipment. Recently, it was
determined that routine process operations were causing a release of CTC onto the ground; the CTC was then
leaching into the subsurface. Measurements of the CTC concentration in groundwater have been as high as
80 ppm a few hundred feet down-gradient from the source area. The site boundary is approximately 5000 ft
from the facility where the release occurred. Sentinel wells at the boundary are not contaminated with CTC.

The objective of the Site S sample optimization problem was to challenge the software's capability as a
sample optimization tool. The test problem involved a 3-D groundwater contamination scenario for a single
contaminant, CTC. To focus only on the accuracy of the analysis, the problem was simplified. Information
regarding surface structures (e.g., buildings and roads) was not supplied to the analysts. In addition, the data
set was modified such that the contaminant concentrations were known exactly at each point (i.e., release and
transport parameters were specified, and concentrations could be determined from an analytical solution).
This analytical solution permitted a reliable benchmark for evaluating the accuracy of the software's
predictions.

The design objective of this test problem was for the analyst to define the location and depth of the plume at
CTC concentrations exceeding 5 and 500 µg/L with confidence levels of 10, 50, and 90%. The initial data set
provided to the analysts was insufficient to define the plume accurately. The analyst used the initial data to
make a preliminary estimate of the dimensions of the plume and the level of confidence  in the prediction. In
order to improve the confidence in the predicted plume boundaries, the  analyst needed to determine where the
next sample should be collected. The analyst conveyed this information to the demonstration technical team,
who then provided the analyst with the contamination data from the specified location or locations. This
iterative process continued until the analyst reached the design objective.

Site S:  Cost-Benefit Problem
The objective of the Site S cost-benefit problem was to challenge the software's capability as a cost-benefit
tool. The test problem involved a 3-D groundwater cost-benefit problem for a single contaminant, chlordane.
Analysts were given an extensive data set consisting of data from 34 wells over an area that was 2000 ft long
and 1000 ft wide. Vertical chlordane contamination concentrations were provided at 5-ft intervals from the
water table to beneath the deepest observed contamination.

This test problem had three design objectives. The first was to define the region, mass, and volume of the
plume at chlordane concentrations of 5 and 500 µg/L. The second objective was to extend the analysis to
define the plume volumes as a function of three confidence levels — 10, 50, and 90%. This information could
be used in a cost-benefit analysis of remediation goals versus cost of remediation. The third objective was to
evaluate the human health risk at three drinking-water wells near the site, assuming that a resident drinks
2 L/d of water from a well screened over a 10-ft interval across the maximum chlordane  concentration in the
plume. The analysts were asked to estimate the health risks at two locations at times of 1, 5,  and 10 years in
the future. For the health risk analysis, the analysts were told to assume source control preventing further
release of chlordane to the aquifer. This information could be used in a  cost-benefit analysis of reduction of
human health risk as a function of remediation.

Site T:  Sample Optimization Problem
Site T was developed in the 1950s as an area to store agricultural equipment as well as fertilizers, pesticides,
herbicides, and insecticides. The site consists of 18 acres  in an undeveloped area of the western United States,
with the nearest residence being approximately 0.5 mile north of the site. Mixing operations (fertilizers and
pesticides or herbicides and insecticides) were discontinued or replaced in the 1980s when concentrations of
pesticides and herbicides in soil and wastewater were determined to be of concern.

The objective of the Site T sample optimization problem was to challenge the software's capability as a
sample optimization tool. The test problem presents a surface and subsurface soil contamination scenario for
four VOCs: ethylene dibromide (EDB), dichloropropane (DCP), dibromochloropropane (DBCP), and CTC.
This sample optimization problem had two stages. In the first stage, the analysts were asked to prepare a
sampling strategy to define the areal extent of surface soil contamination that exceeded the threshold
concentrations listed in Table A-1 with confidence levels of 10, 50, and 90% on a 50- by 50-ft grid. This was
done in an iterative fashion in which the analysts  would request data at additional locations and repeat the
analysis until they could determine, with the aid of their software, that the plume was adequately defined.

The stage two design objective addressed subsurface contamination. After defining the region of surface
contamination, the analysts were asked to define subsurface contamination in the regions found to have
surface contamination above the 90% confidence limit. In stage two, the analysts were asked to suggest
subsurface sampling locations on a 10-ft vertical scale to fully characterize the soil contamination at depths
from 0 to 30 ft below ground surface (the approximate location of the aquifer).

Site T: Cost-Benefit Problem
The objective of the Site T cost-benefit problem was to challenge the software's capability as a cost-benefit
tool. The test problem involved a 3-D groundwater contamination scenario with four VOCs (EDB, DCP,
DBCP, and CTC). The analysts were given an extensive data set and asked to estimate the volume, mass, and
location of the plumes at specified threshold concentrations for each VOC. If possible, the analysts were
asked to estimate the 50 and 90% confidence plumes at the specified concentrations. This information could
be used in a cost-benefit analysis of various remediation goals versus the cost of remediation. For health risk
cost-benefit analysis, the analysts were asked to evaluate the risks to a residential receptor (with location and
well screen depth specified) and an on-site receptor over the next 10 years. For the residential receptor,
consumption of 2 L/d of groundwater was the exposure pathway. For the on-site receptor, groundwater
consumption of 1 L/d was the exposure pathway.  For both human health risk estimates, the analysts were told
to assume removal of any and all future sources that may impact the groundwater. This information could be
used in a cost-benefit analysis of various remediation goals versus the cost of remediation.

                Table A-1.  Site T soil contamination threshold concentrations

        Contaminant                      Threshold concentration (µg/kg)
        Ethylene dibromide (EDB)                      21
        Dichloropropane (DCP)                        500
        Dibromochloropropane (DBCP)                   50
        Carbon tetrachloride (CTC)                     5

           Appendix B — Description of Interpolation Methods

A major component of the analysis of environmental data sets involves predicting physical or chemical
properties (contaminant concentrations, hydraulic head, thickness of a geologic layer, etc.) at locations
between measured data. This process, called interpolation, is often critical in developing an understanding of
the nature and extent of the environmental problem. The premise of interpolation is that the estimated value of
a parameter is a weighted average of measured values around it. Different interpolation routines use different
criteria to select the weights. Because of the importance of obtaining estimates of data between measured data
points in many fields of science, a large number of interpolation routines exist.

Three classes of interpolation routines commonly used in environmental analysis are nearest neighbor, inverse
distance, and kriging. These three classes cover the range found in the software used in the demonstration and
use increasingly complex models to select their weighting functions.

Nearest neighbor is the simplest interpolation routine. In this approach, the estimated value of a parameter is
set to the value of the spatially nearest neighbor. This routine is most useful when the analyst has a lot of data
and is estimating parameters at only a few locations. Another simple interpolation scheme is averaging of
nearby data points. This scheme is an extension of the nearest neighbor approach and interpolates parameter
values as an average of the measured values within the neighborhood (specified distance). The weights for
averaging interpolation are all equal to 1/n, where n is the number of data points used in the average. Beyond
selecting which data points to use, the nearest neighbor and averaging interpolation routines do not use any
information about the location of the data values.
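
The two simple schemes described above can be sketched in a few lines. This is an illustrative sketch only: the function names, the sample coordinates, and the measured values are hypothetical, and the code is not taken from any DSS product in the demonstration.

```python
import math


def distance(a, b):
    """Straight-line distance between two (x, y) locations."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_neighbor(samples, target):
    """Return the value of the sample closest to the target location.

    samples is a list of ((x, y), value) pairs.
    """
    _, value = min(samples, key=lambda s: distance(s[0], target))
    return value


def neighborhood_average(samples, target, radius):
    """Equal-weight (1/n) average of all samples within radius of the target.

    Returns None when no sample falls inside the neighborhood.
    """
    nearby = [v for loc, v in samples if distance(loc, target) <= radius]
    if not nearby:
        return None
    return sum(nearby) / len(nearby)


# Hypothetical measurements: ((x, y), concentration)
samples = [((0.0, 0.0), 10.0), ((10.0, 0.0), 20.0), ((0.0, 10.0), 60.0)]
target = (2.0, 1.0)

print(nearest_neighbor(samples, target))                   # 10.0
print(neighborhood_average(samples, target, radius=9.0))   # 15.0
```

In both functions the sample locations are used only to decide which measurements are included; once selected, every measurement carries the same weight regardless of its distance from the target.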

Inverse  distance weighting (IDW) interpolation is another simple interpolation routine that is widely used. It
does account for the spatial distance between data values and the interpolation location. Estimates of the
parameter are obtained from a weighted average of neighboring measured values. The weights of IDW
interpolation are proportional to the inverse of these distances raised to a power. The assigned weights are
fractions that are normalized such that the sum of all the weights is equal to 1.0. In environmental problems,
contaminant concentrations typically vary by several orders of magnitude. For example, the concentration
may be  a few thousand micrograms per liter near the source and tens of micrograms per liter away from the
source. With IDW, the extremely high concentrations tend to have influence over large distances, causing
smearing of the estimated area of contamination. For example, for a location that is 100 m from a measured
value of 5 µg/L and 1000 m from a measured value of 5000 µg/L, using a distance weighting power of 1 in
IDW yields a weighted contribution of 5000/1000 = 5 for the high-concentration data point and 5/100 = 0.05
for the low-concentration data point. Thus, the predicted value is much more heavily influenced by the large
measured value that is physically farther from the location at which an estimate is desired. To minimize this
problem, the power applied to the inverse distance can be increased to further reduce the effect of data points
located farther away. IDW does
not directly account for spatial correlation that often exists in the data. The choice of the power used to obtain
the interpolation weights is dependent on the skills of the analyst and is often obtained through trial and error.
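
The two-point example in this paragraph can be worked through numerically. The sketch below assumes the common IDW form in which each weight is the inverse distance raised to a power, normalized so the weights sum to 1.0. The function name idw_estimate is hypothetical, and the code is not the implementation used by any DSS product in the demonstration.

```python
def idw_estimate(observations, power=1.0):
    """Inverse distance weighted estimate.

    observations is a list of (distance, value) pairs with distance > 0.
    Returns the estimate and the normalized weights.
    """
    raw = [1.0 / d ** power for d, _ in observations]
    total = sum(raw)
    weights = [w / total for w in raw]  # normalized: weights sum to 1.0
    estimate = sum(w * v for w, (_, v) in zip(weights, observations))
    return estimate, weights


# Example from the text: 5 µg/L at 100 m and 5000 µg/L at 1000 m.
obs = [(100.0, 5.0), (1000.0, 5000.0)]

for p in (1.0, 2.0):
    est, w = idw_estimate(obs, power=p)
    print(f"power={p}: estimate={est:.1f} µg/L, weights={[round(x, 4) for x in w]}")
```

With a power of 1 the estimate is about 459 µg/L, even though the nearer measurement is only 5 µg/L. Raising the power to 2 lowers the estimate to about 54 µg/L, which illustrates how a larger power reduces the influence of the distant high value.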

The third class of interpolation schemes is kriging. Kriging attempts to develop an estimate of the spatial
correlation in the data to assist in interpolation. Spatial correlation represents the correlation between two
measurements as a function of the distance and direction between their locations. Ordinary kriging
interpolation methods develop the spatial correlation function on the assumption that the
measured data points are normally distributed. This kriging method is often used in environmental
contamination problems and was used by some DSS products in the demonstration and in the baseline
analysis. If the data are neither lognormal nor normally distributed, interpolations can be handled with
indicator kriging. Some of the DSS products in this demonstration used this approach. Indicator kriging
differs from ordinary kriging in that it makes no assumption on the distribution of data and  is essentially  a
nonparametric counterpart to ordinary kriging.

Both kriging approaches involve two steps. In the first step, the measured data are examined to determine the
spatial correlation structure that exists in the data. The parameters that describe the correlation structure are
calculated as a variogram. The variogram merely describes the spatial relationship between data points.
Fitting a model to the variogram is the most important and technically  challenging step. In the second step,
the kriging process interpolates data values at unsampled locations by a moving-average technique that uses
the results from the variogram to calculate the weighting factors. In kriging, the spatial correlation structure is
quantitatively  evaluated and used to calculate the interpolation weights.
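
The first step can be illustrated with an empirical semivariogram, which averages half the squared difference between pairs of measurements grouped by separation distance. The sketch below makes several simplifying assumptions: it ignores direction (an omnidirectional variogram), it uses fixed-width distance bins, and the function name and sample data are hypothetical. Fitting a model to the resulting points, which the text identifies as the most challenging step, is not shown.

```python
import math
from collections import defaultdict


def empirical_semivariogram(samples, lag_width):
    """Omnidirectional empirical semivariogram.

    samples is a list of ((x, y), value) pairs. Pairs are grouped into
    bins [k*lag_width, (k+1)*lag_width). Returns a dict mapping bin index
    k to (mean pair distance, semivariance, number of pairs).
    """
    bins = defaultdict(lambda: [0.0, 0.0, 0])  # sum of h, sum of squared diffs, count
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            (xi, yi), zi = samples[i]
            (xj, yj), zj = samples[j]
            h = math.hypot(xi - xj, yi - yj)
            acc = bins[int(h // lag_width)]
            acc[0] += h
            acc[1] += (zi - zj) ** 2
            acc[2] += 1
    return {
        k: (h_sum / n, sq_sum / (2 * n), n)
        for k, (h_sum, sq_sum, n) in sorted(bins.items())
    }


# Hypothetical transect: values measured at unit spacing along a line.
transect = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0), ((2.0, 0.0), 4.0), ((3.0, 0.0), 7.0)]

for k, (h, gamma, n) in empirical_semivariogram(transect, lag_width=1.0).items():
    print(f"bin {k}: mean distance={h:.1f}, semivariance={gamma:.3f}, pairs={n}")
```

In this made-up transect the semivariance grows with separation distance, which is the pattern expected when nearby measurements are more alike than distant ones.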

Although geostatistical-based interpolation approaches are more mathematically rigorous than the simple
interpolation approaches using nearest neighbor or IDW, they are not necessarily  better representations of the
data. Statistical and geostatistical approaches attempt to minimize a mathematical constraint, similar to a least
squares minimization used in curve-fitting of data. While the solution provided is  the "best" answer within the
mathematical constraints applied to the problem, it is not necessarily the best fit of the data. There are two
reasons for this.

First, in most environmental problems, the data are insufficient to determine the optimum model to use to
assess the data. Typically, there are several different models that can provide a defensible assessment of the
spatial correlation in the data. Each of these models has its own strengths and limitations, and the model
choice is subjective.  In principle, selection of a geostatistical model is equivalent  to picking the functional
form of the equation when curve-fitting. For example, given three data points, (1,1), (2,4), and (3,9),
the analyst may choose to determine the best-fit line. Doing so gives the expression y = 4x - 3.33, where y is
the dependent variable and x is the independent variable. This has a goodness of fit correlation of 0.97, which
most would consider to be a good fit of the data. This equation is the "best" linear fit of the data constrained
to minimization of the sum of the squares of the residuals (difference between measured value and predicted
value at the locations of measured values). Other functional forms (e.g., exponential, trigonometric,  and
polynomial) could be used to assess the data. Each of these would give a different "best" estimate for
interpolation of the data. In this example, the data match exactly with y = x^2, and this is the best match of this
data. However, the analyst cannot know with any high degree of confidence that this is the best match.
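
The linear fit in this example can be reproduced with an ordinary least-squares calculation. In the sketch below, the function name linear_fit is hypothetical, and the goodness-of-fit statistic is assumed to be the coefficient of determination (r squared), because the report does not state which statistic it used.

```python
def linear_fit(points):
    """Ordinary least-squares line through (x, y) points.

    Returns (slope, intercept, r_squared).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit([(1, 1), (2, 4), (3, 9)])
print(f"y = {slope:.2f}x + ({intercept:.2f}), r^2 = {r_squared:.3f}")
```

The slope of 4 and the intercept of about -3.33 match the expression in the text. Under the assumed definition, r squared comes out at roughly 0.98, close to the value quoted above.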

This conundrum leads to the second reason for the difficulty, if not impossibility,  of finding the most
appropriate model to use for interpolation — which is that unless the analyst is extremely fortunate, the
measured data will not conform to the mathematical model used to represent the data. This difficulty is often
attributed to the variability found in natural systems, but is in fact a measure of the difference between the
model and the real-world data. To continue with the previous example, assume that another data point is
collected at x = 2.5 and the value is y = 6.67. This latest value falls on the previous linear best-fit line, and the
correlation coefficient increases to 0.98. Further, it does not fall on the curve y = x^2. The best-fit 2nd-order
polynomial now changes from y = x^2 to become y = 0.85x^2 + 0.67x - 0.55. The one data point dramatically
changed the "best"-fit parameters for the polynomial and therefore the  estimated value at locations that do not
have measured values.
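
The shift in the best-fit polynomial can be checked with a least-squares quadratic fit. The sketch below solves the normal equations directly using only the Python standard library. The function names are hypothetical, and this is one of several equivalent ways to compute the fit.

```python
def solve_linear_system(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quadratic_fit(points):
    """Least-squares coefficients (c2, c1, c0) of y = c2*x^2 + c1*x + c0."""
    power_sums = [sum(x ** k for x, _ in points) for k in range(5)]
    moments = [sum(y * x ** k for x, y in points) for k in range(3)]
    normal = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    c0, c1, c2 = solve_linear_system(normal, moments)
    return c2, c1, c0


three_points = [(1, 1), (2, 4), (3, 9)]
four_points = three_points + [(2.5, 6.67)]

for label, pts in (("three points", three_points), ("four points", four_points)):
    c2, c1, c0 = quadratic_fit(pts)
    print(f"{label}: y = {c2:.2f}x^2 + ({c1:.2f})x + ({c0:.2f})")
```

With the original three points the fit recovers y = x^2 to within rounding. Adding the fourth point moves the coefficients to approximately 0.85, 0.67, and -0.55, in agreement with the values given in the text.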

Lack of any clear basis for choosing one mathematical model over another and the fact that the data are not
distributed in a manner consistent with the simple mathematical functions in the model also apply to the
statistical and geostatistical approaches, albeit in a more complicated manner. In natural systems, the
complexity increases over the above example because of the multidimensional spatial characteristics of
environmental problems. This example highlighted the difficulty in concluding that one data representation is
better than another. At best, the interpolation can be reviewed to determine if it is consistent with the data.
The example also highlights the need for multiple lines of reasoning when assessing environmental data sets.
Examining the data through use of different contouring algorithms and model parameters often helps lead to a
more consistent understanding of the data and helps eliminate poor choices for interpolation parameters.