GEOSTATISTICAL SAMPLING AND EVALUATION
   GUIDANCE FOR SOILS AND SOLID MEDIA
                REVIEW DRAFT
         United States Environmental Protection Agency
                Office of Solid Waste
                Washington, DC 20460
                  February 1996

                         TABLE OF CONTENTS
1.  MOTIVATION FOR THIS GUIDANCE	1-1

   1.1  RELATIONSHIP OF USEPA'S DQO PROCESS TO SOIL GUIDANCE	1-2
   1.2  FORMULATION OF KEY QUESTIONS	1-3
   1.3  DEFINING VIOLATIONS OF REGULATORY STANDARDS	1-6
   1.4  DETERMINING AN ACCEPTABLE LEVEL OF STATISTICAL
       UNCERTAINTY	1-7
   1.5  ENSURING THAT CONCLUSIONS HAVE WELL-DEFINED
       STATISTICAL MEANING	1-10
   1.6  ESTABLISHING CLEAR SITE BOUNDARIES	1-11
   1.7  THREE STATISTICAL OPTIONS FOR MAKING COMPLIANCE
       DECISIONS	1-12

2.  USING AN AVERAGE TO DETERMINE COMPLIANCE	2-1

   2.1  LAND'S METHOD FOR ESTIMATING NON-NORMAL MEANS	2-4
   2.2  ARITHMETIC MEAN OFTEN USED FOR LONG-TERM EXPOSURE
       ANALYSES	 2-6
   2.3  UNIQUENESS OF SOIL SAMPLING FRAMEWORK	 2-7
       2.3.1  A Note on Classical Sampling Techniques	 2-9
       2.3.2  Insights into Spatial Dependence	2-11
   2.4  POTENTIAL DIFFICULTIES IN USING AVERAGES TO
       DETERMINE COMPLIANCE	2-13
       2.4.1  Strict Random Sampling May Not Be Efficient or Cost-effective	2-14
   2.5  GEOSTATISTICAL SAMPLING TECHNIQUES  	2-15
       2.5.1  Polygons of Influence	2-16
             2.5.1.1 Second-Order Stationarity	2-18
             2.5.1.2 Polygon Size Determination	2-21
        2.5.2  Cell Declustering	2-25
             2.5.2.1 Cell Size Determination	2-27
       2.5.3  Kriging	2-28
            2.5.3.1 Local Search Neighborhoods  	2-30
        2.5.4  Summary	2-31
   2.6  ESTIMATING CONFIDENCE BOUNDS ON A GLOBAL AVERAGE	2-31
       2.6.1  Effect on Variance Estimates  	2-32
       2.6.2  Isaaks and Srivastava Method	2-33

3.  USING AN UPPER PERCENTILE TO MEASURE COMPLIANCE	 3-1

   3.1  DEFINITION	 3-1
   3.2  ADVANTAGES	 3-3
   3.3  POTENTIAL DIFFICULTIES	  3-3
        3.3.1  Summary	  3-4
   3.4  STRATEGIES TO ESTIMATE AN UPPER PERCENTILE	  3-5
        3.4.1  Approximate Confidence Bounds	  3-7
        3.4.2  Estimation Using an Isopleth Contour Map 	3-12

4. USING HOTSPOTS TO DETERMINE COMPLIANCE	  4-1

   4.1  DEFINITION	  4-1
   4.2  ISSUES CONCERNING MINIMUM-SIZED HOTSPOTS	  4-2
        4.2.1  Usually Need a Grid-based Sampling Design	  4-2
        4.2.2  Very Large Sites Need Special Consideration	  4-5

5. DECISION ERROR AND UNCERTAINTY VERSUS SAMPLING COSTS ......  5-1

   5.1  KEY ISSUES	  5-1
        5.1.1  No Limited Sampling Program Can Guarantee a Correct Decision  ....  5-1
        5.1.2  Goal: Quantify Uncertainty Relative to a Given Set of Sampling Costs  .  5-1
   5.2  COMPLIANCE VIOLATIONS DEFINED BY AVERAGES  	  5-5
        5.2.1  The Loss Function Approach	  5-8
   5.3  VIOLATIONS DEFINED BY UPPER PERCENTILES	5-10
   5.4  USING HOTSPOTS TO DEFINE VIOLATIONS	5-12
   5.5  FACTORING IN SAMPLING COSTS	5-14
        5.5.1  Strategies to Balance Cost Versus Uncertainty  	5-15
   5.6  DETERMINING SAMPLE SIZE VIA GEOSTATISTICAL SIMULATION . . . 5-17
        5.6.1  What To Do When a Full Site Model is Not Available	5-29

6. SAMPLING PLANS AND SCENARIOS	  6-1

   6.1  COMMON ISSUES	  6-1
        6.1.1  Establishing Site Boundaries	  6-1
        6.1.2  Common Questions To Be Answered	  6-2
   6.2  ISSUES UNIQUE TO GEOSTATISTICAL STUDIES	  6-2
        6.2.1  Grid-Based Sampling 	  6-2
        6.2.2  Two-Stage Sampling	  6-4
   6.3  SCENARIO #1: SAMPLING EX SITU SOIL	  6-8
        6.3.1  First Considerations	  6-8
        6.3.2  Sampling Considerations	  6-8
   6.4  SCENARIO #2: SAMPLING IN SITU SOIL FROM A WASTE SITE	6-10
        6.4.1  First Considerations	6-10
        6.4.2  Site Stratification	6-12
        6.4.3  Concept of Kriging	6-13
         6.4.4  Ordinary Kriging	6-14
               6.4.4.1  Search Radius	6-16
         6.4.5  Block Kriging	6-16
         6.4.6  Indicator Kriging	6-18
               6.4.6.1  Performing Indicator Kriging	6-20
         6.4.7  Block Indicator Kriging	6-22
         6.4.8  Indicator Kriging Example	6-25
               6.4.8.1  Site Background and Initial Sampling	6-25
               6.4.8.2  First-Phase Indicator Kriging	6-27
               6.4.8.3  Block Classification  	6-33
               6.4.8.4  Second-Phase Sampling	6-35
    6.5   MULTIPLE INDICATOR KRIGING	6-42
                                 LIST OF FIGURES

Figure 1.    Polygons of Influence	2-17
Figure 2.    Moving Windows 	2-19
Figure 3.    Tables of Window Means and Standard Deviations	2-20
Figure 4.    FORTRAN Program to Calculate Influence Polygon Weights	2-22
Figure 5.    Cell Declustering Example	2-26
Figure 6.    Empirical Distribution Function  	  3-6
Figure 7.    Declustered vs Naive Estimates	  3-8
Figure 8.    Determining Extent of Hotspot Area  	  4-4
Figure 9.    Example of Power Curve	  5-4
Figure 10.   Computing False Negative Rates	  5-7
Figure 11.   Loss Function Approach	 5-9
Figure 12.   Average Cost vs Sampling Intensity	5-22
Figure 13.   Optimal Sample Size vs Number of Phases  	5-23
Figure 14.   Approximate Type I Error Rates	5-26
Figure 15.   Example of Randomized Grid	  6-5
Figure 16.   Site Layout	6-26
Figure 17.   Beryllium Data Postplot  	6-28
Figure 18.   Postplot of Indicator Data	6-29
Figure 19.   Sample Indicator Variogram	6-30
Figure 20.   Model Indicator Variogram  	6-32
Figure 21.   First Indicator Kriging Results  	6-34
Figure 22.   Adding One Second-Phase Sample	6-36
Figure 23.   Corresponding Indicator Postplot	6-37
Figure 24.   After 13 Second-Phase Samples	6-39
Figure 25.   Corresponding Indicator Postplot	6-40
Figure 26.   Final Indicator Kriging Results	6-41

                               LIST OF APPENDICES
 Appendix A: The Data Quality Objectives Process
 Appendix B: Confidence Intervals for Nonlinear Functions
 Appendix C: Other Elements of Sampling Design Optimization
 Appendix D: Bright-Line Values and Opportunities for Sample Compositing
 Appendix E: Relating Soil and Site Characteristics to Sampling Designs

                     1. MOTIVATION FOR THIS GUIDANCE
       Under the Resource Conservation and  Recovery Act (RCRA), the United  States
 Environmental Protection Agency (USEPA) is proposing new regulations for treatment, storage, and
 disposal facilities involved with the management of contaminated media, including contaminated
 soils, ground water, and sediments, during government-overseen remedial actions.  Through the
 proposed new regulation, known as the Hazardous Waste Identification Rule for Contaminated
 Media (HWIR-media), USEPA intends to develop more flexible management standards for media
 and wastes generated during cleanup actions.  The HWIR-media proposal would establish modified
 Land Disposal Restriction (LDR) treatment requirements, Minimum Technological Requirements
 (MTRs), and permitting procedures for higher risk contaminated media that are subject to the
 hazardous waste regulations, and would give USEPA and authorized states the authority to exempt
 certain lower risk contaminated media from regulation as hazardous wastes under Subtitle C of
 RCRA.

       HWIR-media would establish two new regulatory designations for contaminated media that
 would otherwise be subject to regulation under the current RCRA Subtitle C regulations. Those two
 designations, "above the bright-line" and "below the bright-line," would distinguish between lower
 risk and higher risk contaminated media.  Media that are designated as above the bright-line would
 remain subject to RCRA Subtitle C, but HWIR-media proposes  a more relaxed set of regulatory
 standards (by comparison to the current RCRA regulations) for those media. USEPA and authorized
 states would have the authority to exempt media that are designated as "below the bright-line" from
 the RCRA Subtitle C regulations, and establish waste management requirements for those media on
 a site-specific basis. The proposed rule would specify bright-line values for as many hazardous
 constituents as possible; that is, all constituents for which USEPA has sufficient verified human
health effects data to calculate the bright-line levels. The bright-line levels themselves are based on
a simple residential exposure scenario, assuming ingestion and inhalation of contaminants by
 humans. In setting these levels, USEPA proposes to use a 10⁻³ risk level for carcinogens and a
 hazard index of 10 for non-carcinogens.

       To take advantage of the new flexibility that this rule proposes, USEPA anticipates that many
 facilities conducting (or anticipating) remedial actions will collect, analyze, and evaluate samples
 of contaminated media to determine whether bright-line levels are exceeded. In determining whether
 concentrations of constituents in contaminated media lie above  or below bright-line values,
 sampling, chemical analysis, and data evaluation procedures must be employed.  Possible chemical
 analysis  methods are  presented in  USEPA's Test Methods for Evaluating Solid  Waste,
 Physical/Chemical Methods (SW-846) (USEPA 1986). Various sampling and statistical evaluation
 procedures for determining compliance with a standard have been described by USEPA guidance
 (e.g., USEPA 1989 and USEPA 1995).  Much of this previous USEPA guidance does not provide
 an adequate description of geostatistical techniques that can be used for making these compliance
 determinations, although it acknowledges geostatistical procedures as an acceptable alternative
 approach. This document reviews geostatistical sampling and evaluation procedures that USEPA
 believes are appropriate for assessing the status of contaminated media (as compared to bright-line
 levels) under the HWIR-media proposal.  It is not an exhaustive treatise on geostatistics, nor is it
 meant to serve as a primer for those readers with little or no experience in designing and conducting
 a geostatistical study. This  document cites several useful references, however, for those interested
 in pursuing further study in the field of geostatistics.

       Readers of this guidance are strongly encouraged to review the provisions of the HWIR-
 media proposal. Information about the proposal can be obtained from the RCRA-Superfund Hotline
 (the toll-free long distance telephone number is 1-800-424-9346; the local number for callers in the
 Washington, DC Metropolitan Area is 703-412-9810).

 1.1    RELATIONSHIP OF USEPA'S DQO  PROCESS TO SOIL GUIDANCE
       The process of sampling soils in potentially contaminated areas to meet specific statistical
 and data quality objectives has caused confusion and uncertainty on the part of both regulators and
 the regulated community.  Often,  the confusion stems from the unclear link between the "bottom
 line" objectives for the sampling exercise and a clear statistical description of the problem, including
 what statistic(s) will be used to measure compliance under a given environmental standard and what
 specific statistical hypothesis is being tested. Using a formal process to establish Data Quality
Objectives (DQOs) on a site-specific basis (see Appendix A) can alleviate difficult questions that
may arise after the sampling data have been collected and processed. These questions may include
the following:

       •  Can the collected soils data be used to answer the "bottom line" objectives?
       •  Have the data been collected in a way that allows the estimation of statistical uncertainty
          or error?
       •  Have money or resources been wasted on sampling that wasn't really needed?
       •  Could additional sampling have been done that would have allowed the bottom line
          objective to be answered with significantly less uncertainty?

       Without establishing relevant DQOs prior to actual soil sampling, all of the above questions
can potentially cripple the site assessment process, which ultimately can lead to over- or under-
cleaning areas of concern at a site. The purpose of this document is to address these and similar
questions, giving concrete and realistic strategies for conducting soils sampling, and addressing the
statistical uncertainty, benefits, and disadvantages of distinct approaches to sampling.   The
conceptual framework and the examples provided in this guidance are not meant to be exhaustive
or to address every relevant issue. However, the strategies  described are designed to address
fundamental questions that should be answered as part of the sampling design and data analysis
process.

1.2    FORMULATION OF KEY QUESTIONS
       The first step in any sampling design process is to formulate the key or "bottom line"
question(s). What is the real issue at stake and how can it be answered in a quantitative, statistical
fashion? Will the data be used to make exploratory estimates of the distribution of one or more
contaminants, or are the data supposed to measure compliance with a given action level or regulatory
standard? What does the client or regulator actually want to conclude when all is said and done?
Only by explicitly  formulating the fundamental questions can a proper sampling strategy be
developed.  Even then, the real question may prove too difficult to answer with the  available
 resources or physical constraints. But it is far better to realize and consider such dilemmas at the
 front end of a site assessment, as opposed to 6 months into a sampling campaign.

        With respect to HWIR-media,  the most basic question revolves around the bright-line
 standards: does a site exhibit concentration levels for one or more constituents that exceed bright-
 line? Depending on which side of bright-line the concentrations fall, jurisdiction of the site might
 change from Federal control to state or local control. The bottom line objective is to determine
 whether Federal RCRA regulations shall be applied in making site clean-up decisions, or whether
 state discretion can be exercised in making these decisions. Such questions necessitate a statistical
 hypothesis testing framework, rather than a subjective set of decision criteria based on a biased
 exploratory description of the site.

       Deciding that a testing framework is necessary leads to consideration of the most appropriate
 type of hypothesis structure. A statistical hypothesis test involves a quantitative and objective
judgment between two competing, alternative, and mutually exclusive statements about reality.
 Because the typical hypothesis test is analogous to a criminal trial,  one side of the hypothesis
 (i.e., the null hypothesis H0) is akin to a presumption of innocence and is initially granted greater
 statistical "presumption" than the competing hypothesis (i.e., the alternative hypothesis HA). Since
 the presumption is not equally balanced between the two sides, one of the two competing statements
 must be designated the null hypothesis and the other must be designated the alternative. Rather than
 being arbitrary, this choice will affect how the statistics computed from the data will be used to
 assess compliance.

       In assessing compliance with a HWIR-media bright-line level, one must either initially
assume that concentration levels onsite exceed the bright-line or that they do not.  Generally, in this
guidance, sites that will be sampled are assumed to be already in violation of the bright-line unless
strong onsite sampling evidence indicates otherwise.  Consistent with this assumption, the basic
hypothesis framework is as follows: H0: concentrations exceed bright-line vs. HA: concentrations
are below bright-line.
 By defining the hypothesis framework this way, good sampling evidence will be necessary to
 statistically conclude that a site falls below bright-line levels.

       Once the basic hypothesis structure has been chosen, it must be refined  by choosing a
 specific test statistic to evaluate the data.  Especially when the test measures compliance with a
 regulatory standard, the test statistic should measure a characteristic of the data comparable to the
 standard itself.  For instance, in the HWIR-media proposal, the statistic of choice is the arithmetic
 mean concentration; the recommended comparison is one between an estimate of the true  onsite
 mean concentration level and the regulatory bright-line.  Such a comparison will be appropriate if
 the distribution of individual concentration levels is not too (positively) skewed and if the bright-line
 is supposed to represent a long-term average exposure scenario (e.g., as in a chronic exposure risk
 model).

        In other cases, the test statistic might need to be different.  For distributions of contaminants
 that are highly skewed, certain local sub-areas of the site often have very high concentration  levels
 compared to the remainder. These sub-areas are often referred to as "hotspots." In this scenario,
 depending on the definition of the regulatory standard, an arithmetic mean of a small number of
 hotspot concentrations with a large number of much lower values may or may not answer the right
 question.  If the action level is supposed to instead represent the "typical" short-term exposure of a
 random individual wandering  around the site, estimating a median concentration may be more
 appropriate than the arithmetic mean, since the individual is less likely to wander into a hotspot on
the majority of such ventures.

       A  different situation would occur if the bright-line were specifically designed to protect
 against the smaller possibility of encountering a hotspot.  In this case, the statistic of greatest interest
might be the probability of locating a hotspot of specified size and intensity using a given sampling
 design. Or, if there are potentially many small hotspot areas, the interest might focus on estimating
 an appropriate  upper percentile of the onsite concentration distribution.  That is, if bright-line
 represents a protection against short-term, acute  exposure as opposed to a  long-term, chronic
 (i.e., average) exposure, the most appropriate statistic might be a determination of the upper 90th or
 95th percentile of all concentration levels likely to be encountered.

 1.3    DEFINING VIOLATIONS OF REGULATORY STANDARDS
       The study design should also designate what statistical criteria will be used to make bottom-
 line compliance decisions and how these criteria are consistent with the established hypothesis
 structure.  A case in point would be the criteria for determining compliance with the draft Superfund
 Soil Screening Level (SSL) guidance. Under the draft guidance, a specific SSL number would be
 violated anytime the upper 95 percent confidence limit on the arithmetic mean concentration
 exceeded the standard. Therefore, only if the upper confidence limit were below the standard would
 state or local authorities be given jurisdiction over the handling of the site.

        It is important to recognize that this proposed statistical criterion only makes sense given a
 hypothesis structure that initially presumes the average concentration level to be greater than the
 HWIR-media bright-line levels.  With an alternative hypothesis (HA) of "the average is lower than
 bright-line," there is strong evidence that the null hypothesis (H0) is incorrect when the upper
 confidence limit is less than bright-line because, by implication, the entire confidence interval
 around the true average is below the compliance standard.  If, however, the null hypothesis were to
 presume that the average concentration is no greater than bright-line unless proven otherwise, the
 proposed statistical criterion would offer little assurance in deciding between the two hypothetical
 alternatives.  The reason for this is that the true average would only be above bright-line with
 statistical assurance if the entire confidence interval were above bright-line, necessitating that the
 lower confidence limit, as well as the upper limit, be above bright-line. Allowing the confidence
 interval to potentially "straddle" the bright-line would lead to an indeterminate result.

       The same logic shows that conclusive evidence about the true concentration average and its
relation to bright-line can only be obtained if the entire confidence interval is on one side or the
other. If the confidence interval straddles bright-line, it is unclear whether the true average is above
or below the standard.  In addition, under the null hypothesis "H0: true average is above bright-line,"
it is technically misleading to say that bright-line has been violated whenever  only the  upper 95
 percent confidence limit is above the bright-line.  In that case, the null hypothesis is not actually
 being rejected; the evidence is simply not strong enough to conclude that the true average is below
 bright-line. However, given the initial presumption of the null hypothesis, for policy purposes and
 assigning proper jurisdiction, it may be presumed that the true mean is above bright-line.

       The indeterminate case—where the confidence interval straddles bright-line—is a good
 candidate for further sampling efforts.  For a fixed level of confidence and method of sampling, the
 only effective way to decrease the confidence interval width and thereby improve the chances that
 the confidence interval will fall completely to one side or the other of the bright-line is to increase
 the number of samples used to compute the interval.  It would be possible, in fact, though not
 standard statistical practice, to defer determination of whether or not the clean-up of a site should
 fall under Federal purview until enough evidence had been collected to construct a non-straddling
 confidence interval. Then the statistical issue would be more akin to developing a stopping rule in
 the field of sequential sampling: continue collecting data until the confidence interval results are
 unambiguous.
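       Although this guidance does not prescribe any particular software, the decision logic just
described can be summarized in a few lines of code.  The sketch below is illustrative only; the
function name is hypothetical, and the interval endpoints are assumed to come from a statistically
valid procedure such as those discussed in Chapter 2.

```python
def classify_against_bright_line(lcl, ucl, bright_line):
    """Compare a confidence interval for the true mean against the
    bright-line standard (hypothetical helper, for illustration only)."""
    if ucl < bright_line:
        # Entire interval below the standard: site demonstrated below bright-line.
        return "below bright-line"
    if lcl > bright_line:
        # Entire interval above the standard: site presumed above bright-line.
        return "above bright-line"
    # Interval straddles the standard: indeterminate; more samples may resolve it.
    return "indeterminate"

# Illustrative intervals compared against a 40 ppb bright-line:
print(classify_against_bright_line(28.6, 33.4, 40.0))   # below bright-line
print(classify_against_bright_line(35.1, 44.8, 40.0))   # indeterminate
```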

        When considering what hypothesis structure is appropriate and what statistic(s) will be used
 to test the hypothesis (remember that testing will often be equated with measuring compliance), keep
 in mind that a hypothesis geared toward judging whether the true mean concentration is above or
 below bright-line may give little to no information about other statistical characteristics of the site.
 For instance, knowing that the mean concentration is below bright-line may not indicate whether any
 hotspots exist that exceed bright-line. It also may be impossible to determine whether an upper
 percentile of the concentration distribution is above or below bright-line. Of course, if these
 additional characteristics have no bearing on site characterization, the additional information may
 not be needed. A thorough understanding of what is being tested statistically, however, can prove
 useful if information about more than one key site characteristic is desired.

 1.4    DETERMINING AN ACCEPTABLE LEVEL OF STATISTICAL UNCERTAINTY
       Any statistical decision carries with it, by definition, some level of uncertainty. The DQO
process recommended by USEPA specifically acknowledges that any decision based on sample data
 and statistical manipulation necessarily involves the risk of both false positive (Type I) and false
 negative (Type II) errors. When it is understood what those errors represent in terms of the statistical
 hypothesis framework and the real-life consequences attendant to each type, it should be possible
 to posit a reasonable or tolerable level of uncertainty and risk. In no case can the risk of either type
 of error be completely eliminated; rather the goal is to minimize the risks to acceptable levels.  Often
 this process is the key to designing an adequate sampling regime.

       Partly because the two alternatives in a statistical hypothesis framework are not given equal
 presumption, the risk of a Type I error is rarely equal to the risk of a Type II error for a given
 sampling scheme.  The two error types also may  be associated with very different practical
 implications. So it is extremely important to explicitly translate the concepts of false positives and
 false negatives  into site-specific consequences, prior to designing the sampling regime.  For
 example, if the hypothesis structure is defined as H0: true mean > bright-line vs. HA: true mean ≤
 bright-line, a false positive is defined as rejecting the null hypothesis when H0 is true, and a false
 negative is defined as accepting the null hypothesis when the alternative hypothesis is true.

        More practically, a false positive would thus imply that the true concentration mean is greater
 than bright-line, but that it was concluded to be less than bright-line. Ultimately, under the proposed
 HWIR-media, this would mean a decision to let the site be handled by state or local authorities when
 in fact the site should be regulated under Federal control. Similarly, a false negative would imply
 that the true mean is less than bright-line, but that it was (wrongly) reported as exceeding bright-line.
 The consequence of this kind of error, of course, is that the site would fall under Federal jurisdiction
 when, in relation to bright-line, the site could be handled under state or local authority (with the
 potential to exercise more flexible criteria in setting clean-up standards).

       Note that the two types of errors result in different consequences. One error may even be
judged to be qualitatively worse than the other (though this may depend on the decision-maker's
 perspective). In general, the practical and economic consequences of each error type should be
judged in advance, so that the sampling strategy may be tailored to minimize the type of error
 considered most egregious. Indeed it is rarely possible, even if more data can be collected, to
minimize both kinds of errors at the same rate. Often some imbalance between the expected rate of
false positives and false negatives will still exist.

       Furthermore, the false negative rate depends explicitly on the statistical power of the
hypothesis test, that is, the probability that the null hypothesis will be correctly rejected (i.e., a true
positive).  In the above example, since the null hypothesis is rejected only when the sample data
indicate the true mean is less than bright-line, the statistical power of the test will tend to increase
the farther the true mean is actually located below bright-line. Why? Because a soil concentration
distribution with true mean far below bright-line will tend to generate sampling data also well below
bright-line. By contrast, a distribution with true mean barely below bright-line will tend to generate
sampling data that is both above and below bright-line, leading to a less certain determination.
Consequently, the false negative rate also depends on where the true mean is located, dropping in
proportion to how far the true mean is below the standard.
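       To illustrate how the false negative rate falls as the true mean moves farther below the
standard, the short sketch below computes the approximate power of the one-sided test described
above.  It assumes roughly normal, independent data; the bright-line, standard deviation, and sample
size are illustrative values only and do not come from this guidance.

```python
import numpy as np
from scipy import stats

def power_below_bright_line(true_mean, bright_line, sigma, n, alpha=0.05):
    """Approximate power of the one-sided test H0: mean >= bright-line vs.
    HA: mean < bright-line for roughly normal, independent samples.
    The false negative rate is one minus the returned power."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha, df)                        # rejection cutoff
    delta = (true_mean - bright_line) / (sigma / np.sqrt(n))   # noncentrality
    return stats.nct.cdf(-t_crit, df, delta)

# Illustrative only: bright-line of 40 ppb, standard deviation of 5 ppb, 12 samples.
for mu in (38, 35, 32, 29):
    p = power_below_bright_line(mu, 40.0, 5.0, 12)
    print(f"true mean {mu} ppb: power {p:.2f}, false negative rate {1 - p:.2f}")
```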

       Under a different kind of hypothesis, the meaning and consequences attached to Type I and
Type II errors may be completely different.  For instance, suppose the null hypothesis states that no
hotspots of a given size and contamination level exist, and the alternative hypothesis states that there
exists one or more such hotspots. Then a false positive error would mean that a qualifying hotspot
was "found" when no such contaminated sub-area actually exists (depending on how hotspots are
determined, a case like this could occur if sampling data are used to create a contoured map of the
site in order to locate potential hotspot areas). A false negative would mean that one or more real
hotspots was missed during sampling and analysis, leading to the false conclusion that the site is
devoid of such areas. Again, the relative impacts and real-life consequences of making either a
Type I error or a Type II error should be established ahead of time and fully acknowledged. Missing
a real hotspot may entail certain risks to the environment and/or human health. Declaring the false
presence of a hotspot may entail unnecessary clean-up expenditures. Neither mistake is desirable,
but one may be less desirable than the other for political, scientific, economic or other reasons.
Consequently, knowing ahead of time what the errors represent  in real terms will allow the
development of a sampling plan that at the very least ensures that one type of risk is minimized even
at the expense of the other.

       The process of weighing the distinct real-life consequences of Type I versus Type II errors
 and adjusting the sampling design to limit the probability of one or both types to agreed upon levels
 lies at the heart of developing a good sampling framework. In some cases it may be possible to
 create tables and/or graphs that cross-tabulate increments in false positive and false negative error
 rates against the degree of sampling effort necessary to achieve these levels of risk. For example,
 if sampling is to be conducted on a regular grid across the site, it may be possible to tabulate the
 sampling effort in terms of grid spacing (i.e., the minimum distance between adjacent sampling
 nodes on the grid).  From the desired grid spacing and physical boundaries of the site in question,
 it can be determined how many samples must be collected and how much the sampling and analysis
 of those samples will cost.  Then, the planner is in a position to perform a cost-benefit analysis to
 either (1) minimize the expected chances of Type I and/or Type II errors for a given set of available
 resources or (2) determine the level of resources needed to minimize the risk of these same two error
 types to acceptable levels.
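       As a rough illustration of how grid spacing translates into sample counts and costs, consider
the short sketch below.  It assumes a simple square grid and a flat per-sample cost; the site area,
candidate spacings, and cost figure are hypothetical and would be replaced by site-specific values.

```python
import math

def samples_and_cost(site_area_m2, grid_spacing_m, cost_per_sample):
    """Approximate number of nodes on a square grid covering the site and the
    corresponding sampling-and-analysis cost (illustrative only; real designs
    must also account for site shape, edge effects, and analyte-specific costs)."""
    n_samples = math.ceil(site_area_m2 / grid_spacing_m ** 2)
    return n_samples, n_samples * cost_per_sample

# Example: a 4-hectare site, candidate spacings of 10-40 m, $400 per sample.
for spacing in (40, 30, 20, 10):
    n, cost = samples_and_cost(40_000, spacing, 400)
    print(f"spacing {spacing:>2} m: about {n:>3} samples, roughly ${cost:,}")
```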

 1.5    ENSURING  THAT  CONCLUSIONS HAVE  WELL-DEFINED STATISTICAL
       MEANING
       Another step in designing a sampling and analysis  program is to anticipate the kinds of
 conclusions that are possible.  These conclusions should of course have a well-defined statistical
 meaning, but also should satisfy regulatory goals. Under the proposed HWIR-media, the bright-line
 standards are generally meant to represent average contaminant levels above which Federal
jurisdiction is required. In this context, it makes sense to develop statistical statements about the
 average onsite contaminant levels.  However, one still should be careful about the type of average
 needed. For instance, at a site with a large hotspot area and relatively clean surrounding areas, a
 valid  statistical estimate  of the overall site mean might suggest that bright-line has  not been
 exceeded.  But the average contamination level at one or more minimum exposure units
 (EUs)—perhaps representing half an acre each—either within or near the hotspot area might indeed
 exceed bright-line.  If one needs to know whether any of these individual EUs exceeds bright-line,
 the sampling program should be designed so that estimates of concentration levels within individual
 EUs can be made, rather than just an estimate of the overall site average.  Conversely, if the overall
 site mean is judged to exceed  bright-line, primarily on the basis of a single hotspot area, but the
majority of exposure units have average concentration levels  below bright-line, it would be
advantageous to be able to target for clean-up only those EUs with sufficiently high contamination,
rather than being forced to remediate the entire site.

       Some regulatory thresholds are not defined in terms of average concentration levels, but
instead depend on other statistical characteristics of the soil. What if the threshold is designed to
represent the upper 90th or 95th percentile of the concentration distribution? In such cases, the
statistical conclusion should involve an inference about the appropriate upper percentile of the soil
population and not the mean.  An example might be whether the data indicate with 95 percent
confidence that 90 percent of all the soil concentrations lie below the compliance standard.  This
approach fits the rationale behind exposure to chemicals with acute health effects.

1.6    ESTABLISHING CLEAR SITE BOUNDARIES
       Another concern that particularly plagues soil studies is the need to establish and define clear
site boundaries. If a single hotspot exists within a broad landfill area, no legitimate statement about
the average contaminant level can be made without reference to a particular set of site boundaries
because the average level in an area narrowly defined around the hotspot will be much greater than
the average over an area including much of the surrounding "clean" land.  The wider the boundaries
are drawn, the greater the chance that uncontaminated soil will  be  statistically mixed  with
contaminated soil to produce a lower overall average.

       In this context, if the site boundaries are not already well-established by political and/or
geographical boundaries, one  must  watch for  attempts to statistically dilute the impact  of a
contaminated zone through the arbitrary widening of site boundaries.  The  distribution of soil
contamination will change whenever the boundaries are changed. One practical solution to this
dilemma has already been mentioned as part of the HWIR-media proposal. That is, divide the site
into  a series of identically sized sub-units (e.g., half-acre EUs) that correspond to the smallest
amount of land worth remediating or for which health and environmental risks can reasonably be
anticipated and developed. Then a sampling and analysis plan can be developed to estimate the
statistical characteristics of each sub-unit separately, independent of the overall site boundaries. No
dilution "effects" would occur because instead of an overall site average, interest would lay in
estimating the average of each sub-unit individually, where each sub-unit has predefined and fixed
boundaries.

1.7    THREE STATISTICAL OPTIONS FOR MAKING COMPLIANCE DECISIONS
       Overall, there appear to be three basic statistical options useful for determining compliance
with regulatory soils standards:  (1) using an arithmetic mean or median, (2) using an upper
percentile such as the 90th or 95th, and (3) determining compliance via the presence of one or more
hotspots of a given size.  As noted above, each of these options can be rendered almost meaningless
unless the site boundaries are well-defined and pre-established. In addition, certain interrelationships
exist between the three options that should be noted.
       For instance, to establish that a hotspot of given size exceeds the compliance standard, the
precise meaning of hotspot must be defined. Theoretically, a hotspot could represent an excessively
high "point*4 concentration, perhaps the size of a square centimeter.  Such hotspots would be
extremely difficult to locate, and, no matter what the concentration level, might be so small as to not
constitute any tangible health or environmental risk. Instead, a hotspot is usually defined as a
concentration above some threshold over a minimum-sized area (e.g., a 10-meter radius circle).
Even this last statement does not completely define the term hotspot, for there are at least two
additional possibilities—must a hotspot exceed the threshold (1) in every portion of the minimum-
sized area uniformly, or (2) only in average concentration level across the hotspot area?

       The connection between compliance option (3) and compliance option (1) now becomes
apparent.  For determining that a hotspot exists under the last definition of hotspot given above is
equivalent to determining that the average concentration level within a hotspot-sized sub-area of the
site exceeds a particular threshold.  In fact, the HWIR-media proposal of determining whether the
average level of each EU onsite is above or below bright-line is akin to locating the presence of any
EU-sized hotspots, where the hotspot threshold is taken equal to bright-line.

       Of course, for hotspots defined as minimum-sized sub-areas in which every portion of the
sub-area must exceed the threshold, the sampling strategy would often be substantially different from
the case where the hotspot is defined as a type of average concentration.  It is therefore important
to know ahead of time how hotspots are to be defined if they are to constitute the core of compliance
determination.

       Another connection exists between compliance option (2) and compliance option (3). By
definition, an upper percentile (e.g., the 95th) is an upper concentration limit above which only a
small percentage of concentration values is expected. If the concentration value equal to the true
95th percentile were known in advance (which of course is unrealistic), one could theoretically set
a cutoff at this  value and define a  hotspot as any minimum-sized area in which  every point
concentration was equal to or greater than the cutoff.  In this way, defining hotspots would be
analogous to picking some (usually unknown) upper percentile of the population of concentration
values. Locating all such hotspots and measuring their areal extent in relation to the total site area
would give an estimate of the percentage of concentration values above the selected cutoff and hence
an estimate of the percentile represented by the cutoff.

       Of course, since hotspots must be defined in relation to a minimum-sized area, locating all
such hotspots might not locate all portions  of the soil with concentrations above the cutoff.
Depending on the pattern of contamination, high concentration levels could occur in patches of soil
smaller than the minimum size needed to qualify as a hotspot.  Moreover, though the defining of
hotspots is related to the problem of estimating upper percentiles, there are substantial differences
in the sampling strategy appropriate for each problem. A sampling strategy that searches until the
first hotspot is located may not provide enough information to estimate the upper 90th percentile,
for instance.

       Understanding the similarities and distinctions between the three compliance options is
instructive to understanding the overall soil sampling framework.  Much more will be said in the
pages that follow about the peculiarities of soil sampling,  especially since patterns  of soil
contamination often exhibit strong degrees of spatial correlation.  Because of this, developing a valid
 and efficient soil sampling plan often requires more work than developing a plan for populations
 without such dependence. Before a more detailed discussion, however, it may be helpful to look at
 each of the compliance options in turn, contrasting their advantages and disadvantages as statistical
 tools.

            2.  USING AN AVERAGE TO DETERMINE COMPLIANCE
        The arithmetic average (mean) and the median are the most frequently referenced
statistical procedures in other United States Environmental Protection Agency (USEPA) guidance
(see, for example, USEPA 1989 in the reference section).   In addition, statistical procedures
involving the mean or a comparison of means are the most common in general statistical literature,
and for good reason.  The mean is mathematically tractable, easily understood in terms of its
physical  implications,  and backed by a wealth of statistical theory and application. Because the
arithmetic mean is often approximately normally distributed even when the underlying population
is not, due to the Central Limit Theorem, it is often possible to make good statistical inferences
about the mean of an underlying population. Extensive tables of the normal distribution exist, as
well as ready formulas in most statistics textbooks for computing confidence intervals around the
mean, and comparisons of means via t-tests or F-tests.  These formulas tend to work quite well
provided that the set of sample concentration values is approximately statistically independent and
the underlying population is not too heavily skewed.

       Using an average to determine compliance often conforms with the meaning of the standard
that one compares site data against.  Some standards (such as maximum contaminant levels
[MCLs] promulgated under the Safe Drinking Water Act) are based on lifetime average daily
dose exposure scenarios.  Comparing averaged site data to standards computed on the basis of
average exposure scenarios assures a logical and consistent decision framework.  Other standards
could be based on median concentrations, such as the LC50 (the concentration lethal to half of the
test indicator organisms under a given exposure scenario). Comparing the estimated median level
at the site to a standard based on an LC50 also represents a logical and consistent decision
framework.

       Example
       Suppose the following 12 soil measurements of cadmium have  been obtained from
randomly selected locations within a small landfill that has handled smelter waste in the past.
Because the  soil samples were collected strictly at random within the boundaries of the landfill,
the set  of  measurements may  be treated  as  a statistically  independent  random sample.
 Furthermore, initial testing of the data has indicated that the sample distribution is reasonably
 symmetric (skewness coefficient γ1 = -0.195) and passes a Shapiro-Wilk test of normality
 (W= 0.967) at the 0.01 level of significance.
       Sample Number          Concentration
             1                    35.7 ppb
             2                    24.0 ppb
             3                    32.3 ppb
             4                    39.1 ppb
             5                    28.0 ppb
             6                    31.6 ppb
             7                    27.9 ppb
             8                    33.5 ppb
             9                    23.2 ppb
            10                    34.0 ppb
            11                    30.5 ppb
            12                    32.4 ppb

If the bright-line for mean cadmium has been set at 40 ppb, is the true mean cadmium level at the
landfill sufficiently low to allow for local or state handling of clean-up activities at the site?
Assuming that a 95% confidence level has been chosen and that the null hypothesis presumes the
true mean is actually above bright-line, the question can be answered by computing a one-sided
upper confidence limit on the mean and comparing this limit against bright-line.  The normal-
based formula for such a limit is given by:

       UCL(0.95) = x̄ + t(0.95, n-1) × s / √n

where x̄ is the sample mean, s is the sample standard deviation, and the symbol t(0.95, n-1)
represents the upper 95th percentile of the t-distribution with (n-1) degrees of freedom (d.f.).
Plugging the cadmium measurements into this formula results in an upper 95% confidence limit of
33.42 ppb, which when compared to the bright-line standard of 40 ppb suggests that the true mean
cadmium concentration level is low enough to allow for state or local control of the site.

        Note that since the implicit alternative hypothesis (HA) in this example is that the true mean
cadmium level is less than bright-line, it makes sense to compare an upper confidence limit (UCL)
on the mean to the standard. Only when the UCL is below bright-line is one 95% confident that
the true mean is really below 40 ppb. It should also be noted that this approach of comparing a
one-sided upper 95% confidence limit against bright-line (the lower confidence limit was not
needed because it would not help determine whether the true cadmium average was less than
bright-line) is identical to performing a one-tailed, one-sample t-test at the 5% significance level
of the following hypothesis: H0: μ ≥ BL vs. HA: μ < BL.
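       For readers who wish to verify the arithmetic, the short script below reproduces the
calculation.  It uses the data listed in the table above, with a standard statistics library standing in
for the t-tables referenced in the text; it is a sketch for checking the numbers, not a required
procedure.

```python
import numpy as np
from scipy import stats

# Cadmium measurements (ppb) from the table above.
cadmium = np.array([35.7, 24.0, 32.3, 39.1, 28.0, 31.6,
                    27.9, 33.5, 23.2, 34.0, 30.5, 32.4])
bright_line = 40.0
n = cadmium.size

mean = cadmium.mean()
s = cadmium.std(ddof=1)                # sample standard deviation
t95 = stats.t.ppf(0.95, n - 1)         # upper 95th percentile of t with n-1 d.f.
ucl95 = mean + t95 * s / np.sqrt(n)    # one-sided upper 95% confidence limit

print(f"mean = {mean:.2f} ppb, upper 95% confidence limit = {ucl95:.2f} ppb")
print("below bright-line" if ucl95 < bright_line else "not shown to be below bright-line")
```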

       Despite the familiarity of normal-based tests and confidence limits, in many soil sampling
problems, neither the assumption of independence nor that of approximate symmetry holds.  Often the
samples are  collected in a systematic fashion, which, combined with the spatial dependence
exhibited by  soils, leads to a collection of non-independent samples.  Likewise,  the underlying
population is  often fairly heavily skewed, usually to the right, and characterized by a majority of
lower level concentrations combined with a long right-hand tail of extreme values.
                                                          i
       Leaving aside the issue of spatial dependence, inferences on means for certain kinds of
skewed populations can be made either exactly  or approximately through the  use  of special
techniques.  In particular, if a confidence interval around the mean is desired, Land  (1971 and
1975) has developed an exact technique (with extensive tables for implementing it) when the
underlying population is lognormal.  He also developed a more complicated approximate
technique when the population can be transformed  to normality via any other increasing, 1-1, and
twice differentiable transformation (e.g., square root, cube root).

       For those with little experience in estimating means of skewed populations, examining the
lognormal distribution is instructive.  The lognormal distribution is traditionally designated by the
notation Λ(μ,σ) (see Aitchison and Brown 1969), where μ and σ denote the parameters controlling
the location and scale of the population.  Unlike the normal distribution, however, which is
typically designated by N(μ,σ) and where μ and σ denote the exact mean and standard deviation,
these same parameters play a different role in the lognormal case.  For one thing, because the
lognormal distribution is positively skewed, its mean is larger than its median.  In the symmetric
normal distribution, the mean and median are identical.

       However,  because  lognormal data that have been transformed via logarithms are
 approximately normal in distribution on the transformed scale, a common mistake is to assume
 that a standard normal-based confidence interval formula can be applied to the transformed data,
 with the confidence interval endpoints retransformed  to the original scale to get a confidence
 interval around the mean. Invariably, such an interval will be biased to the low side.  In fact, the
 procedure just described actually produces a confidence interval around the median of a lognormal
 population, rather than the higher mean.  The reason for this is that the mean of log-transformed
 lognormal data gives an estimate of the lognormal parameter μ.  When this estimate is
 retransformed to the original scale, one has an estimate of e^μ—the lognormal median—not an
 estimate of the actual lognormal mean, which is given by exp(μ + 0.5σ²).  Correctly accounting
 for this so-called "transformation bias" is the core of Land's procedure.
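       A brief simulation makes the size of this transformation bias concrete.  The parameter values
below are purely illustrative; the point is simply that the back-transformed log-scale mean tracks
the lognormal median rather than the (larger) lognormal mean.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.2                        # illustrative log-scale parameters

x = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

naive = np.exp(np.log(x).mean())            # back-transformed mean of the logs
print(f"back-transformed log mean : {naive:6.2f}  (near the median e^mu = {np.exp(mu):.2f})")
print(f"sample arithmetic mean    : {x.mean():6.2f}")
print(f"true lognormal mean       : {np.exp(mu + 0.5 * sigma**2):6.2f}")
```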

 2.1    LAND'S METHOD FOR ESTIMATING NON-NORMAL MEANS
       The basic steps involved in Land's procedure for approximately lognormal data include:
 (1) transforming the data via logarithms (natural  logs are typically used); (2) computing the sample
 mean and standard deviation on the log scale; (3) obtaining the correct bias-correction factor(s)
 (H) from Land's tables (1975), where the correct factor depends on sample size, sample standard
 deviation, and desired confidence level (1-α); and (4) plugging all the factors into one or both of
 the equations given below (depending on whether a one-sided or two-sided interval is desired),
 where ȳ is the mean of the log-scale data and s_y is the log-scale standard deviation.
       LCL(1-α) = exp[ ȳ + 0.5·s_y² + s_y·H(α) / √(n-1) ]

       UCL(1-α) = exp[ ȳ + 0.5·s_y² + s_y·H(1-α) / √(n-1) ]

       Example
       Suppose that, at the same landfill described in the previous example, eight randomly
selected locations are sampled for benzene with the following results:
       Sample Number          Concentration
             1                     0.5 ppm
             2                     0.5 ppm
             3                     1.6 ppm
             4                     1.8 ppm
             5                     1.1 ppm
             6                    16.1 ppm
             7                     1.6 ppm
             8                    <0.5 ppm

To construct a two-sided 90% confidence interval around the true mean benzene concentration at
the landfill, what procedure should
be followed? First, the skewness and normality of the data set should be tested. Since the one
non-detect concentration is unknown but presumably between 0 ppm and the reporting limit of 0.5
ppm, a reasonable compromise is to impute this value at 0.25 ppm, halfway between the two
limits.  Then the skewness is computed at γ1 = 2.21 and the Shapiro-Wilk test statistic on the raw
data works out to W=  0.521, failing an assumption of normality at well below the 1 % level of
significance.

       On  the other hand, first transforming the data via natural logarithms gives a smaller
skewness coefficient of γ1 = 0.90 and a Shapiro-Wilk statistic of W = 0.896.  Since these values
are consistent with normality on the log-scale, the data set should be treated as lognormal for
estimation purposes.  Furthermore, the skewness of the  raw-scale data is too high to reasonably
estimate the population  mean as if the data were normal.  Therefore, Land's equations should be
used to construct a two-sided confidence interval on the mean.  In this case, the percentage in each
tail would be set at 5% to reach the desired 90% confidence level.

       Using Land's (1975) tables to pick the bias-correction factors, with a sample size of eight
(d.f. = 7) and a standard deviation on the log-scale of 1.2575 log(ppm), one finds that H(0.05) =
-1.687 and H(0.95) = 4.068.  Plugging these values, along with the log-scale mean of 0.2037
log(ppm), into the above equations leads to the 90% two-sided confidence limits of LCL(0.90) = 1.21
ppm and UCL(0.90) = 18.69 ppm.  One could then conclude with 90% statistical confidence that the
 true average benzene concentration at the landfill is between 1.2 and 18.7 ppm. Note that by
 ignoring Land's bias-correction and applying the normal-based confidence limits instead, the
 lower and upper confidence limits would be computed as -0.66 ppm and 6.52 ppm, demonstrating
 the importance of properly assessing the data distribution in advance.

       When the data are not lognormal but can be approximately normalized through some other
 1-1, increasing transformation, the approximate but more complicated method described by Land
 (1975) can be used for constructing a confidence interval around the mean.  The basic strategy
 involves the following observation: for the lognormal distribution, the logarithm of the exact mean
 is equal to (μ + 0.5σ²), that is, a function of the form (μ + λσ²).  The Land procedure for
 lognormal data involves adding and/or subtracting an α-level adjustment factor to this function,
 thus creating a confidence interval for the function (μ + λσ²).  Then, upon re-exponentiation of
 these confidence limits, one has a confidence interval for the final lognormal mean, that is
 exp(μ + 0.5σ²).

        Unfortunately, the logarithmic transformation is the only one for which the true mean
 (E[X]) can be expressed as a function of the parameters μ and σ in the form (μ + λσ²).
 Therefore, the confidence interval factors developed by Land only apply directly to the lognormal
 case.  For instance, if a square root transformation is used to produce approximate normality, the
 true mean is of the form (μ² + σ²).  If a cube root is used, the true mean is equal to (μ³ + 3μσ²).
 To develop confidence intervals for non-linear functions of these types, see Appendix B for
 details.
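       A quick simulation check of these moment relationships (with arbitrary illustrative parameter
values) may help confirm the algebra before turning to Appendix B.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 0.5
y = rng.normal(mu, sigma, size=500_000)     # normal on the transformed scale

# If sqrt(X) is N(mu, sigma), then E[X] = mu^2 + sigma^2.
print(np.mean(y**2), mu**2 + sigma**2)
# If the cube root of X is N(mu, sigma), then E[X] = mu^3 + 3*mu*sigma^2.
print(np.mean(y**3), mu**3 + 3 * mu * sigma**2)
```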

2.2    ARITHMETIC MEAN OFTEN USED FOR LONG-TERM EXPOSURE ANALYSES
       Another reason for the popularity of the arithmetic mean as a measure of compliance is
that it has frequently been used in past rules and  guidance involving risk-assessment.  For
example, several USEPA documents mandate the use of an upper confidence limit on the mean
to determine whether health and/or environmental safety standards have been violated.  The
arithmetic mean is singled out in these cases (as well as in the current HWIR-media proposal)
because it can be interpreted as a type of long-term average risk exposure.  For instance, a typical
model of long-term  exposure to a given carcinogen postulates that the arithmetic mean is a
reasonable estimate of the typical level of chemical ingested by a randomly selected individual
who moves across the site in a random pattern over an extended period of time (often several
years to several decades).

       In these kinds of exposure models, a key numerical component is either an estimate of the
mean contaminant level or, as mentioned above, an estimate of the upper confidence limit on the
mean. These estimates are  used along with estimates of average body mass, ingestion rates,
toxicity of the chemical,  and other estimates to develop the final long-term risk-exposure
estimates.

       Of course, long-term exposure is not always the most pertinent factor in deciding whether
a particular site needs remedial action.  Occasionally, there is more interest in acute, short-term
exposure effects. In such cases, one may be less interested in the average level of contamination
across the site  area and more interested  in possible pockets of high-level contamination
(i.e., hotspots) that  could trigger acute exposure among one or more segments of the local
populace if encountered. Or one may need to generate an estimate of a particular upper percentile
of the contaminant population, such as the 90th or 95th. In general, it is good to keep in mind
what kinds of estimates are of greatest interest and need. It should not be automatically assumed
that the mean is necessarily the best or only estimate needed for a particular problem.

2.3    UNIQUENESS OF SOIL SAMPLING FRAMEWORK
       Estimating means (or  other statistics) of soil populations is often more difficult than with
other media such  as air or water.  The soil sampling  framework typically exhibits  unique
characteristics that must be accounted for when developing a valid sampling and analysis plan.
Unlike environmental media that tend to exhibit frequent migration of contaminants (e.g., surface
water, air, etc.), soil populations tend to be relatively fixed with respect to geography and exhibit
either immobile or very slowly changing patterns of soil contamination.   Such conditions
substantially constrain the kinds of sampling techniques that can be employed in collecting soil
samples.

       As an illustrative contrast, consider the problem of collecting a random sample of air
measurements from an air monitoring station. Even though the monitoring station is fixed at one
geographic location and monitoring may occur at regular time intervals, it is still often possible
to collect a random series of essentially statistically independent samples, primarily because air
conditions, including wind directions, convection currents, etc., and contaminant levels  change
so frequently. In other words, the series of samples collected at a single point in space can often
be used as an approximate random sample from the entire population of possible air measurements
(though certainly not in every case).  The same strategy applied to a soil population would not
result in a valid random sample. Conditions in the soil at a particular location are unlikely to
change much between sampling episodes, forcing the collection of soil samples at geographically
distinct locations in order to obtain independent samples and to obtain a "representative" set of
samples from the site.

       In addition, there are often persistent geochemical and geologic relationships between
adjacent parcels of soil that remain essentially fixed, unlike many air and water populations.
These relationships  induce certain similarities  in  the soil at adjacent locations, perhaps even
similar levels of contamination, so that similar groups of soil parcels tend to be clustered together
in patches. This general phenomenon is known as "positive spatial dependence" or "spatial
correlation." That is, parcels in nearby locations tend to be more similar than parcels separated
by greater distances.

       Again, the frequent migration of contaminants in other environmental media typically
results in little to no discernable spatial correlation, although there are some populations besides
soils that exhibit spatial or geographic dependence.  Consider for instance measuring the income
patterns of heads of households in the United States. Patterns of residential development have led
to distinct geographic clusterings highly correlated with income level.  Therefore, the incomes of
two neighbors living across the street from each other are likely to show greater similarity than
the incomes of two individuals living across town.  This same principle often holds for patterns
of soil contamination.

       In summary, it is frequently true that soil populations exhibit a greater or lesser degree of
positive spatial correlation, that is, the tendency for higher concentration levels to be clumped
together and, consequently, for lower concentration levels to be clumped together.  In many air
and water populations, greater homogeneity or mixing of the contaminants and much less evidence
of spatial dependence is expected.  Because of these facts, the sampling of soils typically must be
handled differently than the sampling of other environmental media.

2.3.1  A Note on Classical Sampling Techniques
       Of course, the presence of significant spatial correlation does not preclude the use of
simple random sampling to garner a random sample of soil measurements.  One could randomly
select coordinates from a grid superimposed over the site area and collect samples from each of
these randomly chosen locations.  The resulting set of measurements would indeed be a random
sample from the underlying soil population; however, such a scheme often would prove to be less
than advantageous from both practical and theoretical considerations. One drawback is that
selecting samples in random locations, as opposed to collection from nodes along a systematic
grid, is more time consuming and can present practical and logistical difficulties in terms of trying
to physically locate the sampling points.  Also, a random scatter of locations is likely to leave
some portions of the site with few and widely scattered sampling locations, necessitating greater
travel expense per sample.

       From a theoretical standpoint,  the presence of spatially correlated and hence, clumped
data, as opposed to more homogenous patterns of contamination, implies that random sampling
will often be much more inefficient than systematic sampling at estimating quantities like the
overall mean. Inefficiency in  this context refers to the degree of precision achieved given the
amount of sampling effort undertaken.  Systematic sampling plans (e.g., grid sampling) are more
likely to capture small clumped areas of higher concentration and therefore tend to do a better job,
given the same number of samples, of precisely estimating the mean.  Systematic sampling plans
also tend to provide better information for estimating localized statistics, since a systematic plan
ensures that all portions of the site are sampled while a random sample does not.

       On the other hand, proper statistical analysis of systematic samples must explicitly account
for the  presence  of spatial correlation,  while  analysis of a  simple  random sample of soil
measurements  can rely on classical and readily available formulas, at least when estimating the
overall site mean.  But as seen below, a  site will often be divided into several exposure units
(EUs) and interest will focus on the localized statistics associated with each EU (e.g., whether a
given EU exceeds bright-line for a particular contaminant).  Simple random samples generally will
not provide adequate information to make such localized determinations with adequate precision
and statistical confidence.

       One alternative sampling approach intermediate to these schemes is the use of stratified
random sampling.  Under this approach, the site is first divided up into smaller sub-areas or
strata.  Then in each stratum, a random sample of measurements  is collected.  Local statistics can
be estimated within each stratum, and  the entire set of measurements can  be combined using a
well-defined weighting scheme to estimate the overall mean and variance  of the site. In some
cases, particularly  if the site has  distinct patterns of contamination in different sub-areas and the
general outlines of these sub-areas are known in advance and can be used to define the strata,
stratified random sampling can be more efficient than even systematic sampling in estimating the
overall mean.  However, it may not be  clear in advance how to stratify the site, and  the optimal
strata may not necessarily correspond to the boundaries of EUs  onsite.
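
       To make the weighting scheme concrete, the free-form Fortran sketch below computes the
classical stratified estimates of the site mean and of the variance of that mean using
area-proportional stratum weights; all stratum areas, means, variances, and sample sizes shown
are hypothetical.

         program stratified_mean
         ! Sketch of the classical stratified estimator of a site mean:
         ! each stratum is weighted by its share of the total site area.
         ! All areas, stratum statistics, and sample sizes are hypothetical.
         implicit none
         integer, parameter :: nstrata = 3
         real :: area(nstrata), xbar(nstrata), s2(nstrata), w(nstrata)
         integer :: n(nstrata)
         real :: total_area, mean_st, var_st
         integer :: h

         area = (/ 40000., 25000., 10000. /)   ! stratum areas (sq. ft.)
         xbar = (/ 2.1, 14.6, 55.0 /)          ! stratum sample means (ppm)
         s2   = (/ 1.2, 30.5, 410.0 /)         ! stratum sample variances
         n    = (/ 10, 12, 8 /)                ! samples per stratum

         total_area = sum(area)
         mean_st = 0.
         var_st  = 0.
         do h = 1, nstrata
            w(h) = area(h)/total_area          ! area-proportional weight
            mean_st = mean_st + w(h)*xbar(h)
            var_st  = var_st  + w(h)**2 * s2(h)/real(n(h))
         end do
         print *, 'Stratified mean:', mean_st, '  variance of mean:', var_st
         end program stratified_mean

Each stratum weight is simply the fraction of the total site area occupied by that stratum, so
strata contribute to the overall estimate in proportion to their size rather than to the number of
samples they happen to contain.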

       More typically, stratification can be very useful in separating  large chunks of the site prior
to developing a detailed sampling strategy, where the only information available might consist of
historical land use or general waste disposal practices.  In cases of this sort, it may be possible
to isolate (i.e., stratify) certain portions  of the site as  being improbable areas  for significant
contamination.   Then different types of sampling could potentially be carried out  in different
strata, depending on the general probability of substantial contamination.  For instance, an area
thought to have little or no previous exposure might be sampled via a series of composites, while
an area known for past industrial usage might require systematic sampling followed by additional
second-phase sampling in locations identified as potential hotspots.  Portable, field screening
analytical instruments can provide assistance in stratifying the  site  into areas with potential
hotspots (see Appendix C for more details).

2.3.2  Insights into Spatial Dependence
       Once described, the concept of (positive) spatial  dependence or correlation is fairly
straightforward.  Sometimes it is referred to as spatial continuity, to emphasize the fact that as one
moves from one point to another nearby location in a field of spatially correlated data, the values
encountered at points along the way should not change dramatically.  Rather,  one should witness
a more gradual or continuous change in concentration level, much as one standing on a gentle
slope moves gradually from one  height to another, instead of having to jump a steep cliff.
However, just because spatial continuity is not hard to grasp conceptually does not mean it is easy
to describe in terms of useful statistical and/or probabilistic models.

       Despite the straightforward concept of spatial continuity, the basic probabilistic models
used in spatial statistical analysis  vary depending on the  analyst's viewpoint.  Since the soil
concentration at each location is presumed fixed (at least for a given time period), the concept of
randomness can only be introduced in one of two ways:
       •  Either the site is thought of as a single realization of a hypothetical geological process
          (and/or contaminant release and migration process) that might have produced an
          infinite number of different onsite concentration patterns (all with certain common
          characteristics like a common mean and variance), or
       •  The site is presumed fixed in pattern but unknown prior to sampling.  In this model,
          randomness is introduced by the process of sampling itself, since one does not know
          in advance what samples will be collected or where the soil locations will be.
       Theoretical  statistical properties  such as expected value,  bias, and  variance can  be
computed under either basic model, though the results will generally differ.  The first model
 (sometimes called the superpopulation model) has certain theoretical advantages, but is less tied
 to the reality of a specific site. It would mainly be used to assess the typical effectiveness of a
 given data evaluation strategy averaged over many similar sites possessing the same common
 statistical properties (like the mean and variance and degree of spatial correlation). To visualize
 this, imagine a single univariate distribution (bag) of concentration measurements being parcelled
 out repeatedly  to  different locations at the  site.  Each time the bag is empty,  the overall
 concentration pattern is observed and then the process begins anew, thus ensuring  that the
 univariate distribution of concentrations always has the same statistical properties, even if the
 spatial pattern differs with each parcelling.

        The second basic model conditions all probability statements on the specific, but unknown,
 pattern  of onsite concentrations.  Sample locations must be picked at random in  order to
 approximate the process of "mixing" the "bag" of possible concentration measurements in order
 to get a random sample.  Under this model, the data may be treated as an independent, identically
 distributed (i.i.d.) random sample in estimating the mean and variance.

       Regardless of which model is used, some very important assumptions about the pattern of
 spatial correlation have to be made before any statistical conclusions can be drawn regarding the
 sampled data.  The only data that are available are those measurements made at the sampled
 locations. To the extent that the pattern of spatial correlation observed between pairs of these
. points is reflective of the unsampled remainder of the site, estimates and conclusions drawn from
 the data should be reasonable. However, one cannot typically select alternate sets of samples to
verify this assumption about the pattern of spatial correlation.  Moreover, the estimated pattern of
 spatial continuity is typically an average correlation over  all pairs of sampled  data.   The
 correlation could possibly be stronger in some parts of the site than in others. Yet there is usually
 not enough data to verify  whether or not a single pattern of spatial continuity will adequately
 describe the entire site area. Instead  it  is merely assumed as a necessary prerequisite for
 generating local estimates of the concentration distribution.
2.4    POTENTIAL DIFFICULTIES IN USING AVERAGES TO DETERMINE COMPLIANCE
       Because of the presence of strong spatial continuity at many soil sites, there can be
potential difficulties  in using averages (e.g., arithmetic means) to determine compliance with
regulatory standards or even screening levels. For one, the sampling strategy must be carefully
planned  if  one wants to  avoid generating severely biased estimates  of  the  mean and/or
underestimates of the variability. Experience has shown that many sites are sampled in either a
haphazard  manner or via "professional judgment," instead of following  a structured  and
statistically  defensible sampling plan.  Such haphazard  sampling  can easily miss areas, or
"clumps," of soil having either particularly large or particularly small concentration levels.

       Haphazard or judgment sampling would not be a significant problem if the underlying
media were fairly homogenous throughout in terms of concentration level.  In that case, the choice
of sampling  location would be much less important, and the resulting set of samples would tend
to behave similarly to a true random sample.  However, the clumping and patchwork of
concentration levels exhibited by many soil sites necessitates a more careful approach. Only by
maximizing  (1) the probability that all portions of the site will have a chance to be sampled and
(2) the fraction of the site that is actually covered by sampling, will one tend to ensure that all
clumps are "hit."  And only by hitting all of the clumps will one be able to accurately estimate
mean concentration levels and, in particular, the inherent local variation in concentration levels.

       In addition, the actual univariate distribution of the soil does not play as big a role as is
often assumed from a classical statistics perspective.  If one could dig up the entire site, measure
each parcel of soil, and plot all these measurements on a histogram, it might be that the overall
concentration distribution is  approximately normal or lognormal.  However,  knowing  this
information would not indicate the degree of spatial continuity in the  soil  population.  One
normally distributed soil  population might be very spatially discontinuous, where low and high
concentrations are as likely to be found close together as are pairs of high concentrations or pairs
of low concentrations.  Another normally distributed population might exhibit  strong spatial
correlation, so that essentially all the high concentration values are located in one corner of the
site and all the low concentration values are in another corner. The pattern of spatial continuity
will have a large impact on the kind of sampling strategy chosen.  In the latter case, if no samples
are collected from the corner of high values, a very biased estimate of the overall  mean will
result, and chances are good that the variance will be underestimated as well.

       Note that spatial correlation can exist not only along the horizontal plane of a site, but also
along the vertical dimension of depth.  While somewhat beyond the scope  of this document, one
must be very careful before averaging multiple samples collected along a vertical borehole. Such
sampling is of course very practical and cost effective, but if the data  exhibit strong spatial
continuity in a vertical direction (such as the accumulation of contaminants in a clay layer of soil),
the samples collected along any given borehole will tend to be correlated and non-independent.
This in turn can lead to biased estimates of the mean.
2.4.1  Strict Random Sampling May Not Be Efficient or Cost-effective
       Using a strictly random sampling approach to locate soil samples (i.e.,  selecting a random
set of geographic coordinates) will give a mutually independent set of samples, but not necessarily
a cost-effective or efficient set of samples.  Randomization of the sample coordinates will lead to
a random sample of soil measurements regardless of the pattern  of spatial variability onsite.
Consequently, the results of such sampling can be used to estimate the mean and variance and
determine confidence bounds on the overall site mean.

       However, randomly selected sample coordinates are likely to miss significant portions of
the site area, as some of the coordinates are likely to be bunched together and others are likely
to be spread much farther apart. Furthermore, the cost of physically locating and sampling at
random coordinates can be much higher than that required for a systematically laid out sampling
plan such as a systematic grid.  And, if strong spatial correlation  does characterize  the soil
population, greater numbers of randomly selected samples will typically  be  necessary than the
number required on a systematic grid to estimate the  overall  site average with the same level of
precision. Thus, random samples tend to be less efficient for estimation in such cases.
       The reason for this last statement is that randomly selected locations around the site will
rarely provide as much areal coverage as the same number of locations positioned along a
regularly-spaced grid.  To the extent that there exists strong spatial correlation and the data are
grouped in clumps of highs and lows, a systematically placed sample is more likely to garner
measurements from all the major clumped areas and hence to enable better and more precise
estimates of the mean.  To get the same kind of areal coverage from a strictly random sample
would generally require more sample locations.

       Interestingly enough, the same chain of reasoning shows why stratified random sampling
tends to fall in between simple random sampling and systematic grid sampling in terms of relative
efficiency at estimating the overall mean.  Because the site is first stratified into smaller sub-areas,
a certain level of site  coverage  is guaranteed even  though samples  within each  sub-area are
selected at random instead of systematically.  But the overall site coverage and extent of a
stratified random sample still will not tend to be as great as that achieved by  a systematic grid,
unless of course the site is divided into as many strata  as samples and only one  sample is  selected
from each stratum.

2.5    GEOSTATISTICAL SAMPLING TECHNIQUES
       Barring the collection of a simple random sample, which tends to be the least efficient way
to sample soil populations, any method used to estimate an overall site mean (as well as the site
variance) must properly account for the pattern  of spatial continuity.  Any non-random or
partially-random sampling scheme (including a systematic grid design) will tend to produce biased
estimates if not adjusted for the degree of spatial correlation.   Fortunately, there exist three
specific techniques designed to minimize the biasing impact of spatial correlation while generating
reasonable  estimates of the mean.

       Each of these techniques—polygons  of influence, cell declustering,  and kriging—has
advantages and disadvantages. In addition, only the  method of kriging can be used to generate
both global estimates (e.g., the overall site mean) and local estimates (e.g., concentration averages
in a particular localized EU), along with estimates of variability for each.  However, kriging is
significantly more complicated to implement for a given set of data.  Therefore the other two
techniques also are described for use in those circumstances where kriging cannot be implemented
or where a simpler technique will be sufficient.  All three techniques  involve algorithms for
developing  unequal  weighting of the sample data prior to estimating the mean.  Only by
determining proper weights for the data can the mean be estimated in a way that accounts for the
spatial continuity.

2.5.1  Polygons of Influence
       The first  technique, polygons of influence, is also sometimes known as "nearest neighbor
estimation."  The reason for this is mat every unsampled location at the site is given an estimate
equal to the known sample value nearest to the unsampled location. These local concentration
estimates are not likely to be correct, since no allowance is made for changes in concentration as
one moves from the sampled location to a nearby unsampled spot.  However, this technique is
useful for capturing the idea that nearby locations should have similar concentration levels in the
presence of spatial continuity. On average, the unsampled locations near a sampled point should
be roughly similar to the sampled value. Thus, even if the specific local estimates are off the
mark, the global average estimate might still be close to the target.

       An important consequence of this simple estimation scheme is that the  site will end up
being divided into a patchwork of multi-sided polygons with a sampled location at the middle of
each one. Each polygon will consist of those locations closer to the sampled location than to any
other sampled spot. And each soil parcel within a particular polygon will have an estimated value
identically equal to the sampled location at the  polygon's center,  thus the name polygon of
influence: the sampled value extends its influence to all unsampled points closer to it than to any
other sampled point.

       In effect,  nearest neighbor estimation leads to a series of weights attached to the sampled
values that are equivalent to the ratio of (1) the area of a given sample's polygon of influence and
(2) the total area of the site (see Figure 1).  Consequently, samples that are highly clustered
together tend to receive less relative weight per sample in estimating the overall mean, while
isolated samples in sparse quadrants tend to receive greater relative weight.  Samples along a
systematic regularly-spaced grid tend to receive nearly equal weights in this estimation scheme,
because in that case the polygons of influence are mostly identical (except perhaps for samples
located along the "edge" or border of the site).

                             Figure 1. Polygons of Influence
       [Map of the site area showing sample locations and the influence polygon surrounding each
       sampled point.]

       The basic philosophy behind nearest neighbor estimation schemes is that, in the presence
of strong spatial continuity, samples  clustered close together will tend  to provide redundant
information and then only from a limited portion of the site. Isolated samples in sparser areas
provide the only hard data from those parts of the site and so should be  given greater relative
weight.  In general, the polygons of influence method will provide an unbiased estimate of the
overall site mean, at least under conditions of second-order stationarity.

2.5.1.1    Second-Order Stationarity
       Second order stationarity refers to a certain set of probabilistic assumptions that can be
made about  the site.   In particular, if the mean level of contamination does  not change
dramatically from one portion of the site to another (so that there are no discernable trends in the
mean) and if the pattern of spatial variability likewise remains about the same, the site is said to
satisfy the  conditions of second-order stationarity. These same  conditions will be required to
perform kriging analyses and represent the basic  assumptions of all practical  spatially related
statistical analyses.

       Conditions of second-order stationarity are not necessarily easy to verify, but often are
assumed nonetheless. However, one partial check that works well particularly for a site with
regularly gridded sample data is to  form what are known as "local window"  statistics.  Each
window of sample data is formed by mentally superimposing a small rectangle or "window" over
the site layout marked with the sample locations. By placing as many partially overlapping
windows as possible and computing the mean and variance of the sample data in each window (see
Figure 2),  these statistics can then be tabled (as in Figure 3, using the data  in Figure 2  with a
window size of 3x3) or graphed on a three-dimensional plot using the coordinates of each window
center as the x and y axis values.  Then the tables and/or plots can be examined for any
significant trends or non-stationary features in either the mean or variance.  Data that satisfy
second-order stationarity will exhibit variations in the mean and/or variance that look more like
random noise than any recognizable pattern.

                                Figure 2. Moving Windows
       [Grid of sample measurements with two example 3x3 moving-window placements outlined;
       the first window shown has mean 92.3 and standard deviation 17.7, the second has mean
       99.3 and standard deviation 26.9.]

               Figure 3. Tables of Window Means and Standard Deviations

   Window
   means      x=1     x=2     x=3     x=4     x=5     x=6
   y=1       85.2    95.6   111.3    97.7    78.4    76.6
   y=2       85.2    94.7   111.1   102.4    88.2    85.8
   y=3       87.2    94.7   107.8   104.3    93.7    91.1
   y=4       86.3    93.9   105.8   106.0    96.4    92.8
   y=5       83.9    90.7    98.7   105.0    99.1    94.6

   Window
   SDs        x=1     x=2     x=3     x=4     x=5     x=6
   y=1       15.4    20.2     8.7    33.1    36.5    34.1
   y=2       16.0    21.0     8.5    20.7    24.8    21.6
   y=3       10.6    14.8     8.9    17.3    21.4    18.5
   y=4        9.5    13.7    11.1    16.7    21.4    17.9
   y=5        6.3     9.6    11.1     8.3    14.8    15.8
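
       As a rough illustration of how such a check might be automated, the free-form Fortran
sketch below slides a 3x3 window across a regular grid of measurements and prints the window
mean and standard deviation at each position; the grid is filled with hypothetical random values
rather than actual site data.

         program moving_windows
         ! Sketch of the "local window" check for second-order stationarity:
         ! slide a 3x3 window across a regular grid of measurements and
         ! print the mean and standard deviation inside each window.
         ! The grid values are hypothetical stand-ins for real site data.
         implicit none
         integer, parameter :: nx = 8, ny = 7, w = 3
         real :: z(nx, ny), winmean, winsd, s, ss
         integer :: i, j, ii, jj, m

         call random_number(z)
         z = 100.*z

         do j = 1, ny - w + 1
            do i = 1, nx - w + 1
               s  = 0.
               ss = 0.
               m  = w*w
               do jj = j, j + w - 1
                  do ii = i, i + w - 1
                     s  = s  + z(ii, jj)
                     ss = ss + z(ii, jj)**2
                  end do
               end do
               winmean = s/real(m)
               winsd   = sqrt((ss - real(m)*winmean**2)/real(m - 1))
               print '(a,2i3,2f10.2)', ' window', i, j, winmean, winsd
            end do
         end do
         end program moving_windows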

2.5.1.2    Polygon Size Determination
       The most difficult part of using the polygons of influence method in practice is defining
the actual  polygons associated with each sample point. If the samples are taken strictly on a
regularly spaced grid, the polygons can be determined with reasonable ease.  However, if the
sampling has been collected at irregularly spaced coordinates, or, as is reasonably common in soil
studies, the samples have been collected in two or more phases, with only the first phase samples
having been obtained along a regular grid, the determination of the actual influence polygons is
more difficult.

       Statistical software packages exist that can take a set of coordinates data combined with
data describing the boundaries of the site area and produce the required polygons and associated
sample weights.  But if access to such a package is not readily available, one of two additional
options may be attempted. If the number of sample points is not too large and the boundaries of
the site are fairly regular, one can trace the site boundaries onto pre-gridded graph paper and plot
the sample locations by hand  (or produce such a graph on a computer).   Then, picking each
sample location in turn, draw the perpendicular bisector of the imaginary line segment
between the given sample location and any other sample point.  Repeat this process for each
pairing of the  selected location and other sample points, form the smallest closed polygon from
the intersections of these bisector line segments, and measure the approximate area of the resulting
polygon  using the gridded boxes on the graph paper.  Dividing this number by the total  area of
the site provides the appropriate sample weight for that sample location. This graphical method
becomes extremely cumbersome if there are more than just a handful of sampling points.

       A quicker, more accurate method is to develop a short computer program to compute the
approximate influence polygons, such as the one given in Figure 4 below and written in
FORTRAN.  The basic idea is to discretize the entire site into an array of points.

          Figure 4. FORTRAN Program to Calculate Influence Polygon Weights
  C      PROGRAM TO COMPUTE APPROXIMATE INFLUENCE POLYGON WEIGHTS
  C
         parameter(ndata=74)
         real t(ndata,5),lambda(ndata)
         real iseg1,iseg2,iseg3,iseg4,iseg5
         real seg1,seg2,seg3,seg4,seg5,seg6,seg7,seg8,seg9,seg10,seg11
         real seg12,seg13,seg14,seg15,seg16,seg17,seg18,seg19,seg20
         integer zone,count(ndata),sum,minzone,check
         character*4 target1,target2,loc1,loc2

         open(2,file='xx.xx',status='old')
         rewind 2
  C
  C      READ SOILS DATA AND ASSOCIATED COORDINATES
  C
         do 10 k=1,ndata
             read(2,*) t(k,1),t(k,2),bore,t(k,3),t(k,4),t(k,5)
             count(k)=0
  10     continue
  C
  C      FOR EACH POINT ON DISCRETE GRID, CHECK IF POINT IS WITHIN SITE BOUNDARIES.  IF SO,
  C      COMPUTE NEAREST SAMPLE POINT AND INCREASE COUNT OF INFLUENCE POLYGON FOR
  C      THAT SAMPLE
  C
         check=0
         do 70 ix=45,525
               do 60 jy=147,435
                     loc1=target1(real(ix),real(jy))
                     if (loc1.eq.'out ') goto 60
                     loc2=target2(real(ix),real(jy))
                     if (loc2.eq.'in  ') goto 45
                     sum = sum + count(k)
   80    continue

         write(*,*) check,sum
         open(3,file='wgts.rain',status='new')
         do 100 k=1,ndata
                lambda(k) = real(count(k))/real(sum)
                write(3,150) (t(k,i),i=1,5),lambda(k)
  100    continue
  150    format(3f8.1,2f10.2,f15.8)
         stop
         end

         character*4 function target1(x,y)
  C
  C      DETERMINE WHETHER A DISCRETIZED POINT W IS IN THE PERIMETER
  C
         real x,y,seg1,seg2,seg3,seg4,seg5,seg6,seg7,seg8,seg9,seg10
         real seg11,seg12,seg13,seg14,seg15,seg16,seg17,seg18,seg19
         real seg20
         logical area1,area2,area3,area4
         area1=((seg1(x).le.y).and.(seg2(x).le.y).and.(seg3(x).le.y)
     +         .and.(seg21(x).le.y).and.(seg17(x).ge.y).and.(seg18(x).ge.y)
     +         .and.(seg19(x).ge.y).and.(seg20(x).ge.y))
         area2=((seg4(x).le.y).and.(seg5(x).le.y).and.(seg6(x).le.y)
     +         .and.(seg22(x).le.y).and.(seg16(x).ge.y).and.(seg21(x).ge.y))
         area3=((seg7(x).le.y).and.(seg23(x).le.y).and.(seg15(x).ge.y)
     +         .and.(seg22(x).ge.y))
         area4=((seg8(x).le.y).and.(seg9(x).le.y).and.
     +         (seg10(x).le.y).and.(seg11(x).le.y).and.(seg12(x).ge.y).and.
     +         (seg13(x).ge.y).and.(seg14(x).ge.y).and.(seg23(x).ge.y))
         if (area1.or.area2.or.area3.or.area4) then
                target1='in  '
                else
                target1='out '
         end if
         return
         end

         real function seg1(x)
  C
  C      CALCULATE BOUNDARY CONDITION BASED ON LINEAR SEGMENT BETWEEN TWO
  C      OUTER BOREHOLES
  C
         real x,mm,bb
         mm= (180.-213.)/(51.-45.)
         bb= 213.- mm*45.
         seg1= mm*x + bb
         return
         end
         real function seg2(x)
  C
  C      CALCULATE BOUNDARY CONDITION BASED ON LINEAR SEGMENT BETWEEN TWO
  C      OUTER BOREHOLES
  C
         real x,mm,bb
         mm= (159.-180.)/(78.-51.)
         bb= 180.- mm*51.
         seg2= mm*x + bb
         return
         end

         real function seg23(x)
  C
  C      CALCULATE BOUNDARY CONDITION BASED ON LINEAR SEGMENT BETWEEN TWO
  C      OUTER BOREHOLES
  C
         real x,mm,bb
         mm= (396.-159.)/(408.-378.)
         bb= 159.- mm*378.
         seg23= mm*x + bb
         return
         end


  C
  C      DETERMINE WHICH SAMPLE IS CLOSEST TO DISCRETIZED POINT W;
  C      THEREFORE, IN WHICH ZONE W SITS
  C
         parameter(ndata=74)
         real mindist,dist,x,y,z,tt(ndata,4)
         integer zon
         mindist=400.
         zon=1
         do 200 k=1,ndata
              dist= sqrt((tt(k,1)-x)**2 + (tt(k,2)-y)**2 + (tt(k,3)-z)**2)
         if (dist.lt.mindist) then
              mindist=dist
              zon=k
         end if

  200    continue
         minzone= zon
         return
         end
       A finer discretization leads to more accurate polygons but also requires more computing
time.  At each discretized point within the site boundaries, determine the distance from that point
to each of the known sample locations.  The minimum such distance then determines which sample
point acquires the discretized point for its polygon of influence.  Looping through all the
discretized points in this fashion and keeping tallies of the total number of points assigned to each
sample location results in approximate areas of the influence polygons, which can then be divided
by the total number of discretized points within the site to generate the sample weights.
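
       For readers who do not wish to work through the full listing in Figure 4, the same
discretization idea can be sketched much more compactly.  The free-form Fortran program below
assumes a simple rectangular site boundary (an irregular boundary would require a
point-in-polygon test like the one implemented by the target and seg functions in Figure 4); the
sample coordinates and grid spacing are hypothetical.

         program polygon_weights
         ! Compact sketch of the discretization approach to influence-polygon
         ! weights: assign every grid node inside the site to its nearest
         ! sample and convert the tallies to weights.  A rectangular site
         ! boundary is assumed; the sample coordinates and the 2-unit grid
         ! spacing are hypothetical.
         implicit none
         integer, parameter :: ndata = 4
         real :: xs(ndata), ys(ndata), lambda(ndata), d2, best
         integer :: count(ndata), ix, iy, k, nearest, total

         xs = (/ 20., 80., 150., 220. /)
         ys = (/ 40., 160., 60., 120. /)
         count = 0
         total = 0

         do ix = 0, 260, 2
            do iy = 0, 200, 2
               best = huge(best)
               nearest = 1
               do k = 1, ndata
                  d2 = (real(ix) - xs(k))**2 + (real(iy) - ys(k))**2
                  if (d2 < best) then
                     best = d2
                     nearest = k
                  end if
               end do
               count(nearest) = count(nearest) + 1
               total = total + 1
            end do
         end do

         lambda = real(count)/real(total)   ! influence-polygon weights
         do k = 1, ndata
            print *, 'sample', k, '  weight =', lambda(k)
         end do
         end program polygon_weights

The overall site mean is then estimated as the weighted sum of the sample concentrations using
these weights.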

2.5.2  Cell Declustering
       Cell declustering  is an alternate method of determining sample  weights for global
estimation of the overall site mean.  It is similar to the polygons of influence method, but can be
easier to use in some circumstances. Like the first approach, the basic goal is to assign weights
to the sample values that account for the basic clustering of the data. If the sample locations are
strictly laid out on a regularly-spaced grid, both cell-declustering and polygons of influence will
result in equal sample weights of 1/n for all  sample locations (ignoring edge effects  due to
irregularly shaped boundaries).  However, when the data are inherently clustered,  especially in
the presence of significant spatial continuity,  the clustered samples  most likely will provide
somewhat redundant information about the sampled area and could bias  the estimate of the overall
mean if they are not downweighted in an appropriate way.  This process  of downweighting is what
declustering is designed to do.

       The basic algorithm for cell declustering is as follows. First,  the site is divided into an
array of equal-area, non-overlapping rectangles, with the ratio of sides in each rectangle selected
in approximate proportion to the lengths of the site boundaries considered as a whole (see
Figure 5).  Then the number of sample locations falling within each rectangle is tallied and used
to compute the final sample weights for each sample location. Because each rectangular region
has the same area, the sample weights are computed so that each rectangular area makes an equal
contribution to the overall mean estimate.  To that end, the sample weight for a given location is
computed by taking the reciprocal of the product of the number of sample points located within
its rectangle and the number of rectangles used to subdivide the site area.  For example, if a given
sample was 1 of 10 total samples in its rectangle, and if a total of 20 rectangles were used to carve
up the site, each sample within that particular rectangle would receive a sample weight of 1/200.
Samples in another rectangle containing only 5 locations would receive weights equal to 1/100.
In each case, the total weight contributed by any single rectangle is 1/20, the reciprocal of the
number of total rectangular regions.  In this way, those samples that are the most clustered receive
the least individual weight, while those samples that are the most isolated receive greater
proportional weight.

                           Figure 5. Cell Declustering Example
       [Site map divided into equal-area rectangles, with the sample values posted at their
       locations and the number of samples (n) falling in each rectangle tallied.]

       One difference between cell-declustering and influence polygons is that while the
individual weights for every sample falling within the same rectangular region are identical in the
former approach, no two sample weights are necessarily the same in the latter approach.  In this
sense, cell declustering does not try to determine the exact area of influence of any given sample
point.  However, it is far easier to compute the sample weights under cell declustering, even when
there is a lot of sample data.
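
       A minimal sketch of the weight calculation, using hypothetical coordinates and an
arbitrary 4 x 3 overlay of cells, is given below.  Following the description above, each weight is
the reciprocal of the product of the number of samples in the cell and the total number of cells;
in practice the weights are often renormalized to sum to one when some cells contain no samples.

         program cell_weights
         ! Sketch of cell-declustering weights: overlay equal-area rectangles
         ! on the site, count the samples in each, and weight every sample by
         ! 1/(samples in its cell * number of cells), as described above.
         ! Coordinates, site dimensions, and the 4 x 3 overlay are hypothetical.
         implicit none
         integer, parameter :: ndata = 6, ncellx = 4, ncelly = 3
         integer, parameter :: ncells = ncellx*ncelly
         real :: xs(ndata), ys(ndata), wgt(ndata), cellx, celly
         integer :: cell(ndata), tally(ncells), k, ic, jc

         xs = (/ 10., 12., 15., 120., 180., 230. /)
         ys = (/ 20., 25., 22., 90., 150., 60. /)
         cellx = 260./real(ncellx)      ! cell width  (site is 260 units wide)
         celly = 180./real(ncelly)      ! cell height (site is 180 units tall)

         tally = 0
         do k = 1, ndata
            ic = min(int(xs(k)/cellx), ncellx - 1)
            jc = min(int(ys(k)/celly), ncelly - 1)
            cell(k) = jc*ncellx + ic + 1             ! cell index, 1..ncells
            tally(cell(k)) = tally(cell(k)) + 1
         end do

         do k = 1, ndata
            wgt(k) = 1.0/(real(tally(cell(k)))*real(ncells))
            print *, 'sample', k, '  weight =', wgt(k)
         end do
         end program cell_weights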

2.5.2.1    Cell Size Determination
       The only other practical concern with cell-declustering is how to choose the size of the
 rectangular regions in the first place.  It is clear that if the rectangles were made so small that only
 one sample fell into any given rectangle, all the sample weights would be identical.  At the same
 time, if only one rectangle were formed (that is, a rectangle equivalent in area to the  entire site),
the sample weights would again be identical.  Unfortunately, there is no single correct choice of
rectangle size between these two  extremes that will always give the best overall mean estimate.
Different choices of rectangle size will alter the final estimate, making the use of cell-declustering
seem somewhat arbitrary.

       One approach around this difficulty is to compute the cell-declustering estimate repeatedly
 for different rectangle sizes to see if the mean is fairly stable over some range of sizes.  If
 approximately  the same mean estimate is derived for different size choices, it should be a
reasonable final estimate.  If the mean is not stable but clustered sampling is known to have
occurred only in those portions of the site with higher valued areas, Isaaks and Srivastava (1989)
recommend that the lowest mean estimate be selected.  After all, the unweighted data will tend
to give too high a mean estimate since so many clustered samples would be preferentially located
in the high-valued areas.  Therefore they argue that the choice of rectangle size should depend on
the lowest mean estimate generated, since this estimate would hopefully represent the case of
greatest declustering.

2.5.3  Kriging
       The last method for generating overall mean estimates is known as kriging, named after
a pioneering South African mining engineer who first tried to use mathematical  methods to
describe the location and extent of ore bodies in gold mining.  Kriging will be discussed in more
detail in later sections of this document; however, it is useful to give a brief description here of
how it can be used to do global estimation.  In general, kriging is most often thought of as a local
estimation tool, because it is typically used to generate point estimates at unknown site locations
and, consequently, to produce contoured maps of the spatial distribution of concentration values
onsite.

       Whether used for local estimation or global estimation, kriging fundamentally involves the
calculation of optimal weights for the sample data.  These sample weights are optimal  in the sense
that kriging will generate the single best linear combination of the sample data for a given model
of the spatial correlation (i.e., spatial variability).  Therefore, if one assumes that the modeled
pattern of spatial variability  is correct, and this is a big assumption indeed, kriging will  produce
the best  linear estimates of the  concentration values at unsampled points.  Other estimates,
including non-linear ones, of course are possible to compute and may provide better  results than
kriging in certain situations.  However, kriging has proved to be a very useful tool over the  last
20 years and there are several software programs available both commercially and  through the
USEPA that are designed to implement kriging algorithms, including GEO-EAS and GEOPACK
(see USEPA 1988 and USEPA 1990).
       Since kriging also results in a weighted linear combination of the sample data it would be
tempting to simply plug all the data into one of the software programs mentioned above, let it
compute overall sample weights for the data, then use these sample weights to estimate the overall
mean.  Unfortunately, this would be very difficult unless the size of the data set were rather
modest.  To generate kriging sample weights for an entire site all at once, the site area first has
to be discretized into a regular grid of points, and then a system of linear equations, with as many
equations and unknowns in each equation as the number of sample data, has to be solved
simultaneously.  The computational difficulties of this approach make it impractical for
most cases.

       An easier and computationally tractable approach is to produce local estimates for each
discretized point via kriging and then average all these local estimates into an overall site mean.
This approach will typically result in  approximately the same global mean estimate as the
computationally cumbersome method described above. In addition, the local estimates also can
be combined in other ways to generate mean estimates for specific portions of the site, such as
localized EUs.  A mean estimate for a specific EU would consist merely of those discretized local
point estimates that were included in the EU's boundaries.  Hence, kriging can be used to generate
local, as well as global, estimates using the same data and techniques.
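
       As a simple sketch of this bookkeeping, suppose the local kriging estimates have already
been produced on a regular grid by a package such as GEO-EAS or GEOPACK.  Averaging the
gridded estimates over the whole site, or over only those nodes falling inside a particular EU,
then gives the global and EU means, as in the hypothetical free-form Fortran program below.

         program global_from_local
         ! Sketch of turning gridded local kriging estimates into global and
         ! exposure-unit (EU) means.  The grid zstar would normally be produced
         ! by a kriging package such as GEO-EAS or GEOPACK; here it is filled
         ! with hypothetical values, and the EU is taken to be a 10 x 10 block
         ! of grid nodes purely for illustration.
         implicit none
         integer, parameter :: nx = 50, ny = 40
         real :: zstar(nx, ny), global_mean, eu_mean
         integer :: i, j, neu

         call random_number(zstar)
         zstar = 200.*zstar

         global_mean = sum(zstar)/real(nx*ny)

         eu_mean = 0.
         neu = 0
         do j = 1, 10
            do i = 1, 10
               eu_mean = eu_mean + zstar(i, j)
               neu = neu + 1
            end do
         end do
         eu_mean = eu_mean/real(neu)

         print *, 'Global mean estimate:', global_mean
         print *, 'EU mean estimate:    ', eu_mean
         end program global_from_local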

       The savings in computational  effort and  expense from  producing  localized kriging
estimates is gained by understanding the underlying principles that make kriging work.  First, it
should be understood that kriging can only proceed once the observed pattern of spatial variability
has been modeled and supplied to the kriging algorithm.   Because some  degree of spatial
correlation is assumed,  only those sampled locations closest  to the unsampled point being
estimated are expected to significantly influence its estimated value.  Samples that are a greater
distance away should have relatively little influence since the greater the separation between any
pair of points, the more they behave statistically as independent random values, meaning that one
does not influence the other in any predictable manner.
 2.5.3.1    Local Search Neighborhoods
       Because of this feature of spatially correlated data, it is possible within the kriging
 algorithm to define what is known as a local search neighborhood, consisting of a small number
of samples in the local neighborhood of the unsampled point to be estimated.  This small set of
 samples is assumed to provide all the necessary information to adequately estimate the unsampled
 value. In fact, for each unsampled location, the samples in the local search neighborhood for that
location are assigned local weights that add to unity, so that the final local estimate is equivalent
 to a weighted linear average of the sample data in the local neighborhood.  But the estimate also
 can be viewed as a weighted linear combination of the entire sample data set—one in which those
 samples outside the local neighborhood are merely given weights of zero.

       As one moves from one unsampled point to another, the samples within the local search
 neighborhood will change, and so will the local kriging weights.  Of course, there will be overlap
 in these local neighborhoods for unsampled locations that are relatively close together. However,
 even when the local neighborhoods for two distinct but  nearby locations completely overlap, the
 local kriging weights attached to each sample may change, since the configuration of the sample
 points in relation to the unknown location will have changed, thus affecting the apparent pattern
 of spatial correlation influences on the unknown value.

       When each local estimate over the entire site is averaged together to form the estimated
 site mean, each local value could be written as a weighted linear combination of the entire sample
 data set, as noted above. Any given sample point will then have a series of different local kriging
 weights associated with it, due to the variety of unknown locations that included that particular
 sample in the local search neighborhood. The remaining weights for that sample will be equal to
 zero,  since  they would not have been included in the search neighborhood for many of the
 estimated locations.

       Thus, an average of these local estimates results  in a series of other averages: each sample
point will be associated with an average local kriging weight, which approximates the influence
of that particular sample when all unsampled locations are considered together.  Because of this,
 the overall average of the individual local estimates can be used to compute final sample weights
 of the overall data set for use in estimating the global mean. As noted before, this is not the
 typical use of kriging, but it does provide one more reasonable method to make such global
estimates.  For further details on using kriging in global estimation, the reader is encouraged to
 study Isaaks and Srivastava (1989).

 2.5.4  Summary
       All three of the methods examined in this section involve forming a weighted average of
 the sample data, as opposed to a simple arithmetic average.  The necessity for weighting the data
 is a consequence of the presence of spatial correlation. Simple averages will almost surely result
 in biased estimates of the mean unless a truly random sample of site locations is selected for
 sampling.  If this is not the case, one of the above methods should be used instead.
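
       In each case the estimator has the same general form: the estimated mean is the weighted
sum w1x1 + w2x2 + ... + wnxn, where the xi are the sample concentrations and the wi are
non-negative weights (summing to one, possibly after renormalization) that reflect how much of
the site each sample is taken to represent.  The three methods differ only in how those weights
are derived from the spatial configuration of the samples.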

 2.6    ESTIMATING CONFIDENCE BOUNDS ON A GLOBAL AVERAGE
       Even if a random sample of points is selected, and the mean estimate is based on this data
 set, it is generally necessary to also assess the variability or precision of the estimate. Often this
 translates into constructing a confidence interval for the mean.  As discussed previously, if a
 strong pattern of spatial correlation is evident, a random sample will not tend to result in a very
 precise estimate of the mean, at least not compared to alternate sampling strategies.  Of course,
 these alternate strategies  require  one to  recognize and account for the  presence of  spatial
 continuity,  but once this is done, it is possible to substantially sharpen one's estimate of the global
 mean and the confidence interval bounds surrounding it.

       Properly accounting  for the presence of  spatial correlation in developing  a  global
 confidence interval is not, however, a trivial or fail-safe matter.  In fact, the results depend
 significantly on one's view of the nature of randomness in spatially continuous data. As discussed
 previously, the superpopulation model assumes that the actual observed site is just one of an
 infinite number of similar sites that could have been created with the same conditions of second-
 order stationarity. That is, if a computer were given a set of constraints including the overall
mean of the data, the overall variance, and a function describing the pattern of spatial continuity,
 it could produce an infinite series of sites satisfying these conditions but potentially looking quite
 different from the actual site in question.  Such is the nature of spatial variability or spatial
 covariance models.

       The existing formulas for estimating the variability associated with a global mean estimate
 all depend explicitly on the spatial covariance pattern developed from the data.  Since this pattern
 of covariance is generally modeled using only the assumptions of constant mean and second-order
stationarity, the model is typically more general than the actual observed pattern of variability on
 site. Because of this, the same model could fit equally well to an infinite number of possible site
 realizations.  Consequently, such spatial covariance models may or may not give appropriate
 variance estimates. That is to say, the variance estimates will tend to apply to all possible sites
with the same mean and stationarity characteristics and may tend to be too large in general.

 2.6.1  Effect on Variance Estimates
       The reason for this is that the variance estimates of the global mean generated by a typical
 spatial covariance model necessarily attempt to account for the infinite  number of site realizations
 consistent with a similar spatial correlation pattern. Consequently, they are not the same variance
 estimates as one would get by assuming a fixed spatial pattern in which variability in the results
 can only be induced by choosing a different set of samples.  As discussed before, this latter idea
 of a fixed, one-of-a-kind spatial pattern makes more sense from a realistic viewpoint than the
 superpopulation model,  and the estimate of global variability under a fixed spatial pattern
framework is usually smaller than the variability estimate under the superpopulation framework.
 As a consequence, confidence bounds on a global mean derived from a random function model
 of the spatial covariance pattern are likely to be overly conservative and wide, thus not as sharp
as those estimated by assuming a fixed spatial pattern.

       As attractive as the fixed spatial pattern framework sounds, it entails significant practical
difficulties.  The main problem is that the variability under a fixed spatial pattern is difficult to
estimate given the kinds of data usually available. Since variability under a fixed spatial pattern
is strictly the result of altering the configuration of sample locations, the only good way to
measure it  is to have access to all the true site data and then repeatedly vary the sample
configuration to see how the results differ. For instance, one might repeatedly alter the random
starting point associated with a systematic grid and measure how much the results change from
iteration to iteration. Of course in practice, the true site data for every possible location will not
be known in advance (otherwise no sampling plan would be needed!), so one is forced to rely on
spatial covariance models and their inherent shortcomings.

2.6.2  Isaaks and Srivastava Method
       As for actually computing a confidence interval on the global mean estimate, Isaaks and
Srivastava (1989)  suggest one method that  should give  reasonable,  if somewhat overly
conservative, results.  The method involves four basic steps and some key assumptions:
Step 1.  Explore the pattern of observed spatial variability at the site and develop a reasonable
         functional model of the spatial covariance. Note the following assumption:
              The statistical characteristics of the variable under study should not vary from
               one location to another.  Specifically, we assume that the random variables
              that model the real data values all have the same expected value and that the
              covariance between any two of these random variables does not depend on
              their specific locations, but rather only on the separation vector between
              them.   In practice,  these theoretical  considerations  translate  into an
               assumption that there is no noticeable trend in the local means and variances.
              (Isaaks and Srivastava 1989, p. 508)
         That is, if the assumptions of constant mean and second-order stationarity seem to hold,
         the spatial covariance model can be expressed as C(h), where h represents the separation
         vector between any two points and C is the function modeling the covariance between
         them.
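
          For illustration only, the following short Python sketch shows one common functional form
          that might serve as C(h) in Step 1 -- an isotropic exponential covariance model.  The sill and
          range values are hypothetical and would in practice be fit to the site data; the model form
          itself is only one of several that could be appropriate.

              import numpy as np

              def exponential_covariance(h, sill=2.5, range_=150.0):
                  """Hypothetical isotropic exponential covariance model C(h).

                  h      : separation distance(s) between two locations (same units as range_)
                  sill   : C(0), the common variance of the random variables modeling the data
                  range_ : distance at which spatial correlation has decayed to about 5 percent
                  """
                  h = np.asarray(h, dtype=float)
                  return sill * np.exp(-3.0 * h / range_)

              # Covariance drops from the sill toward zero as the separation distance grows.
              print(exponential_covariance([0.0, 50.0, 150.0, 500.0]))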

Step 2.  Adjust the sill of the spatial covariance model to a reasonable estimate of the global
         variance. As will be discussed later, modeling the spatial covariance pattern at any site
         involves examining all possible pairs of sample points, arranged into mutually exclusive

         groups having the same or nearly the same separation vector.  Those pairs in groups
         with the largest separation vectors typically exhibit the least spatial correlation.  In fact,
         as the separation distance increases from  zero  to infinity,  the degree  of spatial
         correlation tends to drop down to a minimum.  The highest level of covariance thus
         occurs for a zero separation vector (i.e., C(0)), which is never actually observed, but
         which represents the common variance of all the random values being used to model the
         actual data values.

          C(0) is also equivalent to what is known as the "sill," which represents the upper plateau
         of a variogram, a kind of inverted spatial covariance.  A variogram models the  squared
         concentration differences expected between any two randomly chosen locations, using
         the  separation vector between the points as the key input to  the functional model.
          Consequently, the sill of a variogram is essentially the maximum level of variability
         expected between two points separated by a large distance, when the two points ought
         to have essentially no spatial correlation (i.e.  they should be roughly statistically
         independent).

         Because the sill should represent the level of variability one would expect from non-
         spatially correlated (i.e., independent) data, the modeled sill, or C(0), of the spatial
          covariance model should closely match the variance derived from a strictly random
         sample of site locations.  Of course,  if no strictly random sample is available, a
         reasonable alternative is to use a declustered estimate of the global variance. Such an
         estimate can be  derived using the same weights that were computed to calculate a
         declustered estimate of the global  mean (using either polygons of influence, cell
         declustering, or kriging).  The only additional calculation needed is a weighted estimate
         of the global variance, which can be computed with the following formula:

               \hat{\sigma}^2 = \sum_j w_j [ z(x_j) - \hat{m} ]^2
          where w_j represents the weight computed for the jth sample point, z(x_j) is the sample
          value at the jth location, and \hat{m} is the declustered estimate of the global mean.

         Once the global variance has been estimated, it can be compared with the modeled sill
         to see how well they match up. If there is a significant difference, it is recommended
         that the spatial covariance model (or alternatively the variogram) be scaled so that C(0)
         or the sill is equivalent to the estimated global variance.
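
          A minimal Python sketch of the Step 2 calculation is given below, assuming the declustering
          weights have already been computed (e.g., by polygons of influence) and sum to one.  The
          sample values, weights, and modeled sill shown are hypothetical.

              import numpy as np

              # Hypothetical declustering weights (must sum to 1) and sample concentrations (e.g., ppm)
              w = np.array([0.25, 0.10, 0.05, 0.20, 0.15, 0.25])
              z = np.array([12.0, 30.0, 55.0, 8.0, 21.0, 17.0])

              weighted_mean = np.sum(w * z)                          # declustered estimate of the global mean
              weighted_var = np.sum(w * (z - weighted_mean) ** 2)    # declustered estimate of the global variance

              # If the modeled sill C(0) differs noticeably from this estimate, rescale the whole
              # covariance model (or variogram) so that C(0) matches the declustered global variance.
              modeled_sill = 180.0                                   # hypothetical sill from the fitted model
              scale_factor = weighted_var / modeled_sill             # multiply the covariance model by this factor
              print(weighted_mean, weighted_var, scale_factor)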

Step 3.  Using the scaled spatial covariance model, compute the global estimation error variance,
          \sigma_R^2, using the following formula:

               \sigma_R^2 = \bar{C}_{AA} + \sum_i \sum_j w_i w_j C_{ij} - 2 \sum_i w_i \bar{C}_{iA}
          The weights w_i in this formula are precisely the same weights used to estimate the
          weighted global mean.  The term \bar{C}_{AA} is the average covariance between any two points
          randomly selected from the site area A and must be computed by repeatedly applying the
          spatial covariance model to all possible pairs of points that result when the entire site
          is discretized into a regular grid of locations.  The other covariance terms are similarly
          defined:  C_{ij} is simply the modeled spatial covariance between locations x_i and x_j, while
          \bar{C}_{iA} is the average covariance between location x_i and all possible points in the site
          area A.

         Note that the  estimation error variance is designed to estimate the typical squared
         difference expected  between the true site mean and the weighted mean estimate.
         Therefore, it can be used to construct an approximate confidence interval around the true
         site mean.

Step 4.  Once the error variance is computed, the last calculation is to add and subtract twice the
         square root  of the error variance onto the weighted mean estimate. This results in an
         approximate 95%  confidence interval.  Of course, the key assumption in this step is that

         the distribution of errors captured in the error variance statistic is approximately normal,
         so that a normal-based confidence interval can be constructed. Because of the averaging
         involved in estimating the global mean, such an assumption is often fairly reasonable,
         but it may not be in all cases.  To get approximate confidence intervals at other levels
         of confidence, one  would simply multiply the square root of the error variance by an
          appropriate standard normal quantile such as z_{α/2}.
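
          The sketch below strings Steps 3 and 4 together for a hypothetical site, assuming an
          isotropic exponential covariance model, a small set of sample locations with declustering
          weights summing to one, and a square site area discretized into a regular grid.  It is only a
          sketch of the calculation described above, not a substitute for a full geostatistical study.

              import numpy as np

              def cov(h, sill=25.0, range_=150.0):
                  """Hypothetical isotropic exponential covariance model C(h)."""
                  return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / range_)

              def pairwise_dist(a, b):
                  """All pairwise distances between two sets of 2-D points."""
                  return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

              # Hypothetical sample locations (x, y in feet), declustering weights, and values
              samples = np.array([[50.0, 60.0], [210.0, 75.0], [120.0, 180.0], [260.0, 240.0]])
              w = np.array([0.30, 0.25, 0.25, 0.20])          # weights sum to 1
              z = np.array([14.0, 32.0, 9.0, 21.0])           # concentrations, e.g., ppm
              m_hat = np.sum(w * z)                           # declustered global mean estimate

              # Discretize a hypothetical 300 ft x 300 ft site into a regular grid of points
              gx, gy = np.meshgrid(np.arange(15.0, 300.0, 30.0), np.arange(15.0, 300.0, 30.0))
              grid = np.column_stack([gx.ravel(), gy.ravel()])

              C_AA = cov(pairwise_dist(grid, grid)).mean()              # average covariance within area A
              C_iA = cov(pairwise_dist(samples, grid)).mean(axis=1)     # each sample vs. area A
              C_ij = cov(pairwise_dist(samples, samples))               # covariance between sample pairs

              # Step 3: global estimation error variance for the weighted mean
              sigma2_R = C_AA + w @ C_ij @ w - 2.0 * np.sum(w * C_iA)

              # Step 4: approximate 95% confidence interval = weighted mean +/- 2 standard errors
              half_width = 2.0 * np.sqrt(sigma2_R)
              print(m_hat - half_width, m_hat + half_width)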

         Again it should be  noted that this method of constructing confidence intervals on the
         global mean is highly dependent on the model of spatial covariance. Such models treat
         the observed site as just one possible realization of an infinite number that could have
         been observed under the same stationarity conditions. As Isaaks and Srivastava describe
         this dilemma,
              If by a global confidence interval we intend to describe how the estimation
              error of the global mean might fluctuate if other sample data sets had been
              collected, then the error variance given by our random function model will
               likely be too large. Rather than calculate the error variance over all possible
              outcomes, we should be calculating it only over the one outcome that has real
              significance.   The  range of  possible fluctuations over  all outcomes is
              generally larger than the range of possible fluctuations over a single outcome.
              (1989,  p.  509)
       3. USING AN UPPER PERCENTILE TO MEASURE COMPLIANCE
3.1    DEFINITION
       Instead of the more typical estimates of the mean, one alternate strategy for measuring
compliance is to use an estimate of a pertinent  and meaningful upper  percentile  of the
concentration distribution, for example, the upper 80th or upper 90th percentile.  Depending on
the hypothesis being tested, one could initially presume at a given site that the upper percentile
of interest was either above the compliance standard or below it. In using an upper percentile to
determine compliance with a standard, the presumption would be that the upper percentile exceeds
the standard unless strong evidence indicates that the percentile is actually less than the  cutoff.
Note that using an upper percentile to compare to a  bright-line standard in this manner would
actually entail a more stringent test than the comparison of the mean against bright-line.  The
reason is that  since the upper 80th or 90th percentiles will almost certainly be above the mean,
demonstrating  that an upper percentile is less than bright-line will require stronger evidence than
that needed to prove  the mean is less than the standard.

       Given the greater stringency of a test on an upper percentile, one might question why an
upper percentile  would ever be  considered for compliance  purposes.  The answer  lies in
understanding  how the two competing statistics—the mean versus an upper percentile—differ in
the information they provide about the underlying population of measurements.  The mean is of
course well-known as an excellent estimator of long-term, chronic phenomena, where long-term
exposure is based on averaging the  effects of many possible exposure events over a period of time.
When chronic, long-term averages are really of interest,  the mean is typically the best available
estimator. However, in some situations there may be greater interest in possible acute effects or
transient exposures associated with significant short-term risk. Such exposure events may not
happen often or on a regular basis but may be important to track for monitoring and compliance
purposes.

       Exposure of infants to nitrate concentrations in excess of 10 mg/L (NO3- as N) in drinking
water provides an example of a scenario where concern over acute  exposure effects prevails. The
flora in the intestinal tract of infant humans and animals does not fully develop until the age of

 approximately 6 months. This results in a lower acidity in the intestinal tract, which permits the
 growth of nitrate-reducing bacteria.  These bacteria convert nitrate to nitrite.  When absorbed into
 the bloodstream, nitrite interferes with the blood's ability to carry oxygen.  Oxygen
 starvation in this manner produces a bluish skin discoloration—a condition known as "blue baby"
 syndrome (or methemoglobinemia)—which can result in serious health problems, even death.

       In such a scenario, suppose an acute, short-term exposure above some critical level should
 normally occur in no more than 10 percent of all exposure events. Then the critical level so
 identified would be equivalent to the upper 90th percentile of all exposure events by definition.
 If, upon investigation, it was determined that more than 10 percent of the exposures exceeded the
 critical level, the 90th percentile would cease to be equal to the critical level and would actually
 lie above it.  On the other hand, if fewer than 10 percent of the exposures actually exceeded the
 critical level, the 90th percentile would lie below this point.  Thus, measuring the location of a
 particular percentile relative to a critical level or standard provides information on the percentage
 of exposure events that exceed the standard.

       Relating exposure events to soil concentration levels and bright-line numbers is
 accomplished by assuming that contact with a parcel of contaminated soil equates to some likely
 exposure amount.  Following the example above, if more than 10 percent of the soil
 concentrations are above bright-line, then the 90th percentile of the soil distribution lies above
 bright-line, and perhaps the 90th percentile of the exposure event distribution does as well.  In such
 cases where acute, short-term exposures above a certain level must be controlled, the evidence
 from the site might suggest the need for further remediation or Federal involvement.

       One additional way to view upper percentiles is to consider the process of picking random
 locations around the  site and recording the percentage of times the soil measurement exceeds
 bright-line (or another standard of interest).  If the percentage of such "hits" is equal to p, the
 percentile associated with the standard is the (100-p)th.  Again, determining whether the actual
percentage is greater or less than p will indicate whether the percentile of interest is above
or below the standard, respectively.

 3.2   ADVANTAGES
       Unlike the estimation of an overall average, an upper percentile can be used to ensure that
 no more than a small fraction of the site area is contaminated above a given level.  For sites with
 a non-uniform or non-homogenous pattern of contamination, and where a small portion of the site
 is heavily contaminated but the rest is essentially clean, the overall mean  is likely to be the
 average of many low values mixed with an occasional high concentration. The net result will tend
 to be a lower mean estimate that may not adequately reflect the presence of highly contaminated
 soil samples.

       Estimation of an upper percentile such as the 90th also allows one to explicitly measure
 how  often randomly selected parcels of soil will have  concentrations above the compliance
 standard and, equivalently, to measure what specific fraction of the site is contaminated above the
 action level.  For instance, the presence of a single physical spill zone that contaminates a small
 portion of the site but not the bulk of the area lends itself to either estimating an upper percentile
 or the use of a hotspot search as described later.

 3.3    POTENTIAL DIFFICULTIES
        Despite the desirability of estimating upper percentiles in cases where acute
 exposure events are of primary importance, there are potential difficulties that must be considered
 in designing the sampling plan.  The key point to remember is that accurate estimation
 of an upper percentile will be very difficult if none of the soil samples exhibits any of the high-
 level concentrations that exist on site.  Due to contaminant depositional and migration patterns and
 strong spatial continuity, the highly contaminated soil may be confined to small
 portions of the site.  Unless the sampling plan purposefully attempts to sample these areas, they
 may easily be missed, resulting in a set of measurements that is much lower in concentration and
 therefore offers little clue as to the kinds of extreme concentration measurements that may be
 present.

        If the site has only one or perhaps just a few large hotspots, sampling can be designed to
 find these hotspots even on a relatively coarse grid, making estimation of the upper percentile less difficult.

However, if the site has many small hotspots scattered around the site, it may be more difficult
to determine their location without very intensive and costly sampling, leading to less certainty
about whether or not the upper percentile of interest is actually below the compliance standard.

       Because of these factors, accurately estimating an upper percentile can prove challenging.
As noted above, it is generally very important to ensure that some samples are collected from
hotspot areas on site, for unless the  true distributional pattern of the concentrations can be
determined from the limited sample data and a parametric model such as the lognormal assumed,
it would be unusual to estimate an upper percentile value as greater than any of the observed data
values.  Without any samples from the hotspot areas, the collection of available samples will
consist of lower  level measurements, leading to lower-level and inaccurate estimates of the upper
percentile.

       Strict random sampling of soil locations tends to be an even poorer choice of strategy when
it comes to estimating upper percentiles than it does for estimating the overall mean, especially
if the hotspot areas are concentrated in just a small portion of the site area.  A random sample can
easily miss such "targets" completely.  On the other hand, grid-based sampling will tend to be
much more efficient at locating hotspot "target" areas, given the same degree of sampling effort,
at least if the hotspots are fairly large relative to the grid-spacing.  If the hotspots are rather small
and scattered, even a grid-based design might prove inadequate without very intensive sampling
efforts and  a tightly-spaced grid.  In general,  it is easier to estimate a mean than an upper
percentile, easier in the sense of degree of precision gained per unit of sampling effort.

3.3.1  Summary
       Fortunately, the bright-line numbers are based on  a lifetime average exposure scenario
where chronic effects are the primary concern. Many health-based standards, such as the majority
of drinking water  maximum  contaminant  levels (MCLs), were  established using  the same
considerations.  Some standards, however, such as the MCL for nitrate, use an acute effects
exposure scenario due to more immediate health responses.  In any case, a correct understanding
of the meaning of a standard (e.g., bright-line, a ground-water protection standard, or a clean-up
standard) is needed to design a meaningful statistical test to determine compliance.

3.4    STRATEGIES TO ESTIMATE AN UPPER PERCENTILE
       To estimate an upper percentile, for example the (1-p) x 100% percentile, the goal is to
estimate the concentration level that will be exceeded by only (p x 100)% of the underlying population.
The most direct way of doing this is to use a declustered (i.e., weighted) version of the sample
data to form a weighted cumulative distribution function (cdf), F(x).  Then the value of the inverse
cdf, F^{-1}(1-p), can be used as the desired estimate.

       To see how this  works,  consider the simpler case of equally-weighted data (perhaps
obtained through simple  random sampling).  By sorting the data values in ascending order and
plotting the cumulative probabilities associated with this list on the y-axis of a graph against the
concentration values along  the x-axis, this function, sometimes referred to as the  empirical
distribution function (cdf), increases in jumps that are multiples of 1/n as one moves from left to
right across the graph (see Figure 6).  If there is a unique measurement at a  given concentration,
the graph jumps by exactly 1/n.  If there are multiple measurements at a given concentration, the
graph jumps by 1/n times the number of such measurements.   In all cases,  however, each
additional measurement causes the graph to increase by an amount identical to its sample weight.
These increases all equal 1/n since the n samples are equally weighted.

       In the case of weighted data, the principle for constructing the weighted cdf is the same,
but the weights are potentially different for each measurement. The data values are again sorted
in ascending order; the only difference is that each measurement adds a jump to the graph only
equal to its weight w_j.  In the end, a very similar looking distribution function emerges, only with
unequal jumps.

       Once the cumulative distribution is drawn, it is relatively easy to  graphically estimate
various upper percentiles  associated with the underlying soil population.  Since the scaling of the
vertical axis runs from 0 to 1, the upper (1-p) x 100% percentile can be estimated by drawing a

[Figure 6.  Empirical Distribution Function -- cumulative probability (0 to 1, vertical axis) plotted against the sorted concentration values x_1 through x_8 (horizontal axis).]
 horizontal line from the left edge of the graph at the point (1-p) until it intersects the weighted
 cdf.  The concentration level associated  with this point is the estimated percentile.   From a
 computational standpoint, a slightly different result can be derived by computing the weighted cdf
 and interpolating linearly between those two known values of the cdf that bound the desired
 percentile. Either way, it is extremely important to first create a reasonably declustered version
 of the sample data set. Otherwise, the estimates of upper percentiles are  likely to be significantly
 biased (see Figure 7).
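
       A short Python sketch of this declustered percentile calculation follows, assuming the
declustering weights are already in hand.  The concentrations and weights are hypothetical, and
linear interpolation between the bounding cdf values follows the computational shortcut mentioned
above.

    import numpy as np

    def weighted_percentile(values, weights, prob):
        """Estimate the prob-th quantile (e.g., prob=0.90 for the 90th percentile)
        from a declustered (weighted) sample by interpolating the weighted cdf."""
        values = np.asarray(values, dtype=float)
        weights = np.asarray(weights, dtype=float)
        order = np.argsort(values)                     # sort data in ascending order
        v, w = values[order], weights[order] / weights.sum()
        cdf = np.cumsum(w)                             # weighted cumulative probabilities
        return float(np.interp(prob, cdf, v))          # linear interpolation between cdf jumps

    # Hypothetical concentrations (ppm) and declustering weights
    conc = [12.0, 30.0, 55.0, 8.0, 21.0, 17.0, 44.0, 26.0]
    wts  = [0.20, 0.05, 0.05, 0.20, 0.15, 0.15, 0.05, 0.15]
    print(weighted_percentile(conc, wts, 0.90))        # estimated upper 90th percentile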

 3.4.1  Approximate Confidence Bounds
       Ideally, one might want to estimate a confidence interval around the true percentile, so
 that, as with the global mean, one could conclude with a specific level of confidence that the true
 percentile is definitely below or above the compliance standard, depending on whether the upper
 confidence limit was below or the lower confidence limit was above  the standard respectively.
 The literature on this specific topic unfortunately appears to be very limited, so the following
 outlined approach should be considered approximate at best.

       The goal of this approach is to mimic the estimation of upper percentile confidence
 intervals in the case of simple random samples from normally distributed populations.  In this
 setting, the estimated mean and standard deviation are used to construct intervals of the form:

          ( \bar{x} + K_1 s, \; \bar{x} + K_2 s )

where the values K_1 and K_2 depend on the sample size, confidence level, and percentile being
estimated (Hahn and Meeker 1991).  Since this technique works well with simple random samples,
one simple approach would be to sample locations at random from the site.  Then one would only
need to verify the key assumption of approximate normality of the data or find a transformation
of the data that would lead to approximate normality.

       To modify this approach for spatially correlated soil samples in non-random or partially-
random designs (such as systematic grids), the basic idea is to use declustered estimates of the
global mean and variance to substitute for the arithmetic  mean and simple random variance
estimate  used in the Hahn and  Meeker formulation.  However,  additional complications are

[Figure 7.  Declustered vs. Naive Estimates -- (a) polygonal estimates; (b) cell-declustering estimates; (c) naive estimates from equal weighting.  In each panel the true cumulative distribution appears as the thick line; the corresponding estimate appears as the thin line.]
 immediately introduced, including whether or not the underlying population is really normal, and
 the fact that the correlated nature of the measurements impacts  the effective sample size being
 used, which in turn impacts the choice of K.

       The first of these complications can be explored by comparing the weighted distribution
 of the declustered data with the normal distribution, most usefully via a probability plot.  By
 forming the weighted cdf, the data values are associated with specific cumulative probabilities.
 Computing the expected normal distribution quantiles (z-values) for these same cumulative
 probabilities and plotting  the sample values by these z-values should give an indication as to how
 closely the data follow an approximate normal distribution.
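
       One way to carry out this check is sketched below:  the weighted cumulative probabilities
are converted to standard normal quantiles and paired with the sorted data, so that an
approximately linear plot of the pairs suggests approximate normality.  The data and weights are
hypothetical, and the midpoint plotting positions are simply one reasonable convention.

    import numpy as np
    from scipy import stats

    # Hypothetical declustered sample: concentrations and weights (weights sum to 1)
    conc = np.array([12.0, 30.0, 55.0, 8.0, 21.0, 17.0, 44.0, 26.0])
    wts  = np.array([0.20, 0.05, 0.05, 0.20, 0.15, 0.15, 0.05, 0.15])

    order = np.argsort(conc)
    v, w = conc[order], wts[order]
    # Plotting positions: cumulative weight at the midpoint of each jump, which keeps
    # the probabilities strictly between 0 and 1
    cum = np.cumsum(w) - 0.5 * w
    z = stats.norm.ppf(cum)            # expected standard normal quantiles

    # If the (z, v) pairs fall close to a straight line, approximate normality is
    # reasonable; the correlation coefficient gives a crude single-number summary.
    print(np.corrcoef(z, v)[0, 1])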

       The second complication is more troublesome: while declustering the data effectively
 downweights data values that are closely clustered together and  are therefore more likely to be
 similar in  value, the declustering process does not eliminate the spatial correlations between
 nearby sample pairs. While it is possible  to estimate the global variance using the declustered
 data, because of these pairwise correlations the variance estimate is not based on a full set of n
 independent data points.   Rather the effective  sample size is  somewhat less, based on the
declustering weights and  more generally the set of pairwise sample correlations.  Because the K
values in Hahn and Meeker (1991) are based on n independent data points and don't account for
possible correlations in the data, the value n in their tables must be replaced by an effective
 sample size ν that better accounts for the effects of weighting and correlation.

       As  a word of caution, however, the effective sample size suggested below is at best
approximate, since it too does not fully account for the pairwise spatial correlations but rather
adjusts  the effective sample size  for the  declustering  weights.  Still it may give a decent
 approximation in many cases.  Given a set of declustering weights, w_j, the quantities needed to
 compute an approximate confidence interval around a true upper percentile include the weighted,
 declustered mean, given by

          \hat{m} = \sum_j w_j \, z(x_j) ,
 the weighted, declustered estimate of the global variance, given by

          \hat{\sigma}^2 = \frac{1}{1 - \phi} \sum_j w_j \, [ z(x_j) - \hat{m} ]^2 ,

 where a slight modification has been made to the estimate suggested by Isaaks and Srivastava
 (1989, p. 431) in order to make the estimate unbiased; and the estimated effective sample size,
 solved by using the method of moments to fit a chi-square variable to the global variance estimate
 above, and given by:

          \nu = 1 + \frac{(1 - \phi)^2}{\phi - 2\Delta + \phi^2} ,

 where the intermediate quantities are φ = Σ_j w_j² and Δ = Σ_j w_j³.

       Once these quantities have been computed, the effective sample size can be used to look up
appropriate values of K_1 and K_2 from the tables in Hahn and Meeker (1991).  Then the approximate
confidence interval will be of the form:

          ( \hat{m} + K_1 \hat{\sigma}, \; \hat{m} + K_2 \hat{\sigma} )
       Example
       As a simple example of these computations, suppose the polygons of influence method has
been used to produce the following declustering weights for 10 sample concentrations of toluene
given below. Also suppose that a 90% confidence interval around the upper 95th percentile is
desired.

     Sample Number     Concentration     Declustering Weight
           1              125 ppb              0.20
           2              221 ppb              0.10
           3              316 ppb              0.02
           4              430 ppb              0.01
           5              524 ppb              0.05
           6              618 ppb              0.30
           7              722 ppb              0.15
           8              835 ppb              0.03
           9              927 ppb              0.10
          10             1020 ppb              0.04
       The declustered mean estimate is equal to the sum of each weight times the corresponding
sample value, for a total of 22.17 ppb.  The sum of the squared declustering weights gives φ =
0.178, while the sum of the cubed weights gives Δ = 0.0408.  Plugging the declustered mean and
φ into the formula for the declustered variance then gives 19.417 ppb², leading to a declustered
standard deviation estimate of 4.4065 ppb.  Using the values for φ and Δ, the estimated degrees
of freedom (d.f.) for the best approximating chi-square variable is (ν - 1) = 5.28, giving a final
effective sample size of 6.28, or approximately 6.  Thus, instead of 10 independent samples, the
tables in Hahn and Meeker should be used with n = 6.¹

       Taking α = .05 for a two-sided 90% confidence interval around the 95th percentile, the
Hahn and Meeker tables yield K_1 = 0.875 and K_2 = 3.708.  Finally, the approximate 90%
confidence bounds are computed as LCL = 22.17 + (4.4065)(0.875) = 26.03 ppb and UCL
= 22.17 + (4.4065)(3.708) = 38.51 ppb.  By comparison, if the weighting of the data is ignored
and a naive confidence interval is formed with the vector of sample data, the confidence limits
instead would be estimated as (29.6 ppb, 40.5 ppb), where in this latter case the sample size for
selecting the K values is taken equal to 10.
   ¹Some of the tables in Hahn and Meeker can also be found in Statistical Training Course Materials (pp. 61-62
of the "Outline of Statistical Training Course"), available from the RCRA docket [USEPA/530-R-93-003; phone
number (703) 603-9230].
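
       The calculations illustrated in the example can be scripted generically, as sketched below
under the assumption that the declustered variance and effective sample size take the forms given
earlier in this section.  The concentrations and weights in the sketch are hypothetical, and the K_1
and K_2 multipliers must still be looked up in the Hahn and Meeker (1991) tables for the effective
sample size, confidence level, and percentile of interest (the values shown are simply those quoted
in the example).

    import numpy as np

    def declustered_summary(z, w):
        """Declustered mean, unbiased declustered variance, and effective sample size,
        following the expressions given earlier in this section."""
        z, w = np.asarray(z, dtype=float), np.asarray(w, dtype=float)
        phi, delta = np.sum(w ** 2), np.sum(w ** 3)
        mean = np.sum(w * z)
        var = np.sum(w * (z - mean) ** 2) / (1.0 - phi)
        nu = 1.0 + (1.0 - phi) ** 2 / (phi - 2.0 * delta + phi ** 2)
        return mean, var, nu

    # Hypothetical concentrations (ppb) and declustering weights (summing to 1)
    conc = [14.0, 25.5, 31.0, 40.2, 52.4, 60.1, 72.9, 80.5, 95.7, 101.0]
    wts = [0.20, 0.10, 0.02, 0.01, 0.05, 0.30, 0.15, 0.03, 0.10, 0.04]
    mean, var, nu = declustered_summary(conc, wts)

    # K1 and K2 must come from the Hahn and Meeker (1991) tables, entered with the
    # rounded effective sample size; the values below are those quoted in the example
    # for a two-sided 90% interval around the 95th percentile with n = 6.
    K1, K2 = 0.875, 3.708
    sd = np.sqrt(var)
    print(round(nu, 2), (mean + K1 * sd, mean + K2 * sd))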

 3.4.2  Estimation Using an Isopleth Contour Map
       Since the (1-p) x 100% upper percentile can be viewed as that concentration level exceeded
 by only (p x 100)% of the underlying soil population, an alternate approach to estimating a percentile
 would be through the use of an isopleth contour map of the site.  For a particular contour, if one
 could integrate all the site area on the map above this level and divide the result by the total site
 area, one would have an estimate of the fraction, say q, of soil concentrations exceeding this level,
 and hence an estimate of the (1-q) x 100% percentile.

       Since the calculation of isopleth contours would require some kind of spatial interpolation
 and knowledge about how to conduct at  least  a simple geostatistical study  of the data, the
 declustering approach described above may seem the easier of the two routes. Still, if enough
 sample data are available to run the kriging algorithm, which can provide regularly gridded
 estimates of the concentration levels throughout the site, it is not hard  to generate isopleth
 contours with a package such as GEO-EAS (USEPA 1988). If the contours of the site map are
 overlaid on graph paper, or are approximated by regular, square pixels (such as in the GEO-EAS
 display),  it is not a difficult matter to estimate  the percentage of exceedances above a given
 concentration level and thus to estimate various percentiles.

       If one is interested in estimating the percentage of "hits" or exceedances above a certain
 cutoff level, for example the bright-line standard for  a certain constituent, but no information on
 other isopleth contours is really needed, the kriging estimates can be easily dichotomized into a
 series of 0s and 1s according to whether a specific estimate is below or above the cutoff.
 Then these dichotomized values can be mapped on a pixel grid to indicate just those areas of the
 site where exceedances of the cutoff occur.  One word of caution: contouring of 0/1 type data can
 lead to averaging in areas containing a mixture of 0s and 1s and result in isopleth contours
 between 0.0 and 1.0.  This may be acceptable for some situations, but it would not be useful for
estimating the areal portion of the site with exceedances.
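
       A brief sketch of the dichotomization idea follows, assuming a regular grid of kriging
 estimates is available as a two-dimensional array.  The grid values and cutoff are hypothetical
 (randomly generated for illustration).

    import numpy as np

    # Hypothetical kriged estimates on a regular grid of pixels (rows x columns), e.g., ppm
    rng = np.random.default_rng(0)
    kriged = rng.lognormal(mean=2.0, sigma=0.8, size=(40, 40))

    cutoff = 20.0                                  # e.g., a bright-line standard
    indicator = (kriged > cutoff).astype(int)      # 1 = pixel estimate exceeds the cutoff

    exceed_fraction = indicator.mean()             # estimated fraction of site area above the cutoff
    print(exceed_fraction)                         # the cutoff then estimates roughly the
                                                   # (1 - exceed_fraction) x 100% percentile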

       In general, the mapping and isopleth contours available through kriging are valuable tools
 in pinpointing those local areas of the site that may  be out of compliance.  As will be seen in a

 later section, it is possible through these methods to make local estimates of the contamination
 pattern in each exposure unit (EU), which provides a wealth of information not available through
 global  analyses of the data.  On the other hand, the kriging solution is not a panacea for the
 problem of estimating global percentiles.  In particular, the kriging method does not provide an easy
 way to produce confidence interval estimates for these quantities.
             4.  USING HOTSPOTS TO DETERMINE COMPLIANCE
4.1    DEFINITION
       One additional approach that can be used in some cases to assess compliance is through
the use of "hotspots." A hotspot is defined as any contiguous sub-area of a site that has a
concentration level either much greater than typical site concentrations or which exceeds a pre-
specified action level.  For instance, if the vast majority of concentrations for benzene at a site
are either non-detect or less than 10 ppm, but a moderate size patch of soil near a dumping point
has concentrations in the range of 100 to 500 ppm, this entire contaminated patch could be labeled
as a  hotspot. The principal difference between hotspots and upper percentiles is that a hotspot
refers to a geographically connected area of high-level contamination, whereas the high-level
concentrations in an upper percentile might be scattered around the site.

       Of course, even within a hotspot, the concentrations will probably not be completely
uniform. For this reason, the exact definition of a hotspot should be specified and tied either to
(1) the minimum concentration exceeded by all points within the hotspot area or to (2) the average
concentration within the hotspot.  For compliance purposes, it is also necessary to specify how
large the contaminated area must be to qualify as a hotspot.  For instance, suppose there exists
a 2-foot diameter patch of soil with concentrations that greatly exceed the compliance standard,
but the surrounding soil is within acceptable levels.  While this 2-foot patch could be considered
a hotspot, it is not clear  whether the expense to exhaustively sample  in order to identify and
remediate such a small area would be worthwhile or even necessary to be protective of human
health and the environment. So a minimum size is necessary to avoid declaring violations on the
basis of point exceedances that do not represent meaningful  or practical areas for subsequent
clean-up or other regulatory action.

       In addition, the sampling effort necessary to find such a small hotspot  could easily be
prohibitive.  Locating a hotspot with any efficiency requires some type  of structured, sequential
sampling  scheme.  The  available literature on  locating hotspots (Gilbert 1987)  provides
probabilities of finding one or more hotspots only for regularly spaced grid sampling designs and
then only for hotspots that have been defined in terms of a minimum size and particular geometry.

 It is not difficult to compute similar probabilities using strictly random sampling; however, the
 loss of sampling efficiency tends to be substantial compared to using a regular sampling grid.
 Consequently,  it is almost always significantly more expensive to find hotspots  via random
 sampling than through the use of a systematic grid design.

       Because of these difficulties, a minimum size area (and approximate geometrical shape
 such as a  circle,  square, or rectangle) should be specified in advance before searching for
 hotspots. The choice of minimum area is somewhat arbitrary, but some guidelines can be drawn
 from previous United States Environmental Protection Agency  (USEPA) risk assessment efforts
 (e.g., Neptune et al. 1990; Barth 1989).  For most situations, the smallest contiguous physical
 area of regulatory and/or risk assessment concern would be half an acre, particularly when the
 land is to  be used for residential purposes.  Of course, exceptions  might be necessary or
 reasonable in some cases.  If the land were to be used strictly for industrial applications, the half
 acre minimum might be smaller than necessary.  In that case, hotspots might need to be as large
 as 1 to 5 acres before regulatory action was required.  On the other hand, if the land was targeted
 for more sensitive uses (e.g., a playground), the minimum hotspot size might have to be smaller
 (e.g., a 10-foot square or a 6-foot radius circle).

 4.2    ISSUES CONCERNING MINIMUM-SIZED HOTSPOTS
 4.2.1  Usually Need a Grid-based Sampling Design
       To  flesh out some  of the important conceptual  issues connected with hotspots, it is
 important to understand why hotspots are often very difficult to locate through the use of strictly
 random sampling. As was discussed earlier, if sampling points around the site are selected at
 random, one has no control or foreknowledge as to what parts of the site will actually  be sampled.
 Some portions may be heavily sampled while others are left practically  untouched.  Only in the
 long run (i.e., with a very large sample size) does random sampling promise to evenly cover
the site area. Because of this fact, a single hotspot is more easily located through  the use  of a
grid-based  design. With a systematic grid sample, the grid can be "tightened" or "coarsened"
by making  the  spacing between grid nodes smaller or larger respectively, thus increasing or
decreasing  the chance of finding a particular hotspot.

       As mentioned above, given a particular geometry (e.g., circular or elliptical) for a potential
 hotspot, the tightness of the grid and the  minimum size of the hotspot can be combined  to
 determine the probability of hitting (i.e., intersecting) the hotspot at one or more of the sample
 points.  In this way, one can gauge what level of sampling intensity is necessary to locate a
 particular hotspot with a specified  probability, the sampling intensity measured by the total
 number of samples needed to complete a regular grid with a certain internode spacing.
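
        The hit probability for a given grid spacing and hotspot geometry can be taken from
 published tables (Gilbert 1987) or approximated by simulation.  The sketch below uses a simple
 Monte Carlo approximation for a circular hotspot of a given radius randomly positioned relative to
 a square grid; the radius and spacing shown are hypothetical.

    import numpy as np

    def hit_probability(radius, spacing, n_trials=100_000, seed=1):
        """Approximate probability that a circular hotspot of the given radius is
        intersected by at least one node of a square grid with the given spacing.

        The hotspot center is placed uniformly at random within one grid cell, which
        (by symmetry) is equivalent to randomizing the grid starting point."""
        rng = np.random.default_rng(seed)
        centers = rng.uniform(0.0, spacing, size=(n_trials, 2))
        # The nearest grid node to a point inside the cell [0, spacing)^2 is one of the
        # four cell corners, so the hotspot is "hit" if any corner lies within the radius.
        corners = np.array([[0.0, 0.0], [spacing, 0.0], [0.0, spacing], [spacing, spacing]])
        d = np.sqrt(((centers[:, None, :] - corners[None, :, :]) ** 2).sum(axis=2))
        return float((d.min(axis=1) <= radius).mean())

    # Example: a 25-ft radius hotspot searched with a 50-ft square grid
    print(hit_probability(radius=25.0, spacing=50.0))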

       Of course, even if the probability is high that at least one sample node will hit the hotspot,
 unless two adjacent samples both hit the contaminated area, it will be impossible to say how large
 an extent the hotspot actually covers except that its radius is smaller than the internode spacing.
 If only one sample hits the hotspot, it might not even satisfy the minimum size requirement. The
 sample might just be a "lucky" hit on a much smaller bullseye (the same could be said about two
 hits on adjacent grid nodes, but the probability is much smaller that each would separately hit a
 contaminated area not connected to the other).

       The point of these speculations is twofold.  First, if a hotspot search is the goal of the
 sampling effort, the grid should be designed to ensure that a minimum-sized hotspot will be hit
 with a certain probability.  The grid also should not be so coarse that the probable extent of the
 hotspot (once it is found) is difficult to determine. One possible option would be to design a tight
 enough grid so that the probability is high that two adjacent grid nodes will both hit any hotspot
 of minimum size.

       Second, finding a potential hotspot is not the same as determining its extent.  Very often
 additional "second-phase" sampling will be necessary in areas that the initial "first-phase" samples
 indicate might qualify as hotspots.  This second-phase sampling can be conducted in a variety of
 ways.  For the purpose of determining the extent of a potential hotspot, it may be useful to place
 second-phase samples around the initially triggering grid node in a series of radial transects (see
 Figure 8).  That way, the pattern of contamination can be more carefully defined.  If the area
 qualifies as a hotspot, its total extent can be determined.  It also can be determined if the area does
 not qualify as a hotspot.  Of course, it is important to plan the overall sampling effort in a way

[Figure 8.  Determining Extent of Hotspot Area -- x marks first-phase grid locations; o marks second-phase samples placed in radial transects around a potential hotspot.]
 that will account for the need for additional second-phase samples, so that the sampling budget
 is not expended after the first-phase grid.  As will be discussed, two-phase sampling is often an
 extremely useful and cost-efficient strategy not only for locating hotspots, but also for estimating
 average soil concentrations in local blocks around the site (e.g., exposure units [EUs]).

 4.2.2 Very Large Sites Need Special Consideration
       One additional caveat concerning the search for hotspots is that very large or extensive site
 areas may require special consideration. At a very large site, the sampling effort and resources
 required to reliably locate hotspots of a minimum size will almost certainly be much greater than
 the effort required at smaller sites (i.e., the number of samples on a regular sampling grid tends
 to grow in proportion to the area of the site for a given internode spacing).  If these sampling
 costs become prohibitive, and if the site is not targeted for particularly sensitive uses, one option
 would be to define the minimum size qualification for a true hotspot in a way that specifically
 links it to the total site area.   For instance, the minimum  size could be defined as a small
 percentage, perhaps one percent, of the total area, so that a contaminated area would only qualify
 as a hotspot in a 1,000-acre site if its extent were at least 10 acres.  The idea behind this approach
 is to minimize the sampling effort and costs required for large sites, and also to avoid mandating
 remediation or further regulatory action for areas amounting to only a tiny fraction of
 the total site area.

       Another  approach  to  make  sampling  at large sites  economically  feasible  and
 environmentally  protective utilizes  composite  sampling and  sequential analysis strategies.
 Composite  sampling involves physically combining two or more samples  (preferably from
 adjoining EUs within a single sample stratum  and of equal volume)  for  a single laboratory
 analysis.  If the composite measurement is judged "clean," i.e., less than the applicable standard divided by the
 number of samples used to construct the composite (e.g., if the standard equals 20 ppm and four
 individual samples are composited, the composite measurement would need to be no greater than
 5 ppm), then no further analyses are needed on those samples.  If, however, the composite
 measurement exceeds the cutoff above, then further analyses would be required to identify which
 specific EU(s) exceed the standard.  To optimize this search, sequential analysis suggested by

 Li and Rajagopol (1994) is recommended.  This approach, and safeguards to prevent "diluting"
 hotspots through compositing, are discussed further in Appendices C and D.
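
        A minimal sketch of the composite screening rule just described is given below.  The
 standard, the number of samples per composite, and the measurement are hypothetical, and the
 sequential follow-up strategy itself is not shown.

    def composite_decision(composite_measurement, standard, n_in_composite):
        """Screening rule: a composite of n equal-volume samples passes only if its
        measurement is at or below the standard divided by n, so that even if all of
        the contamination came from one sample, that sample could not exceed the
        standard."""
        cutoff = standard / n_in_composite
        if composite_measurement <= cutoff:
            return "clean - no further analysis of these samples"
        return "exceeds cutoff - analyze individual samples to identify the EU(s) responsible"

    # Example from the text: standard = 20 ppm, four samples composited -> cutoff = 5 ppm
    print(composite_decision(composite_measurement=6.2, standard=20.0, n_in_composite=4))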
    5.  DECISION ERROR AND UNCERTAINTY VERSUS SAMPLING COSTS
 5.1   KEY ISSUES
 5.1.1  No Limited Sampling Program Can Guarantee a Correct Decision
       One of the most important things to understand about the design of a sampling program
 is that, like anything in life, "you get what you pay for." The field of statistics is often described
 as the art of managing uncertainty or the science of uncertainty, but too many non-statisticians
 expect that fancy statistical jargon can magically eliminate any uncertainty in their particular
 results.  Most uncertainty can be eliminated, but there is no magic bullet; the elimination of
 uncertainty always comes at a price.  The price tag is typically a significant increase in the
 required sampling effort and/or sampling cost. As a simple example, suppose one decides to cut
 the internode distance by half on a regular square grid. This will increase the total number of
 samples, and probably the total sampling cost, by a factor of four.

       Even with an increased sampling effort, despite a potentially large increase in the overall
 study cost, some uncertainty will remain.  Only if the entire site were exhaustively sampled could
 all uncertainty be theoretically eliminated.  Barring this,  no sampling program can absolutely
 guarantee that a correct compliance decision will be made.  However, sampling  plans can be
 designed to balance the cost of sampling against the uncertainty that will be incurred.  This
 goal—choosing the sampling design and the intensity of sampling to eliminate the greatest degree
 of uncertainty for minimum cost—is the art of good statistical planning.

 5.1.2  Goal:  Quantify Uncertainty Relative to a Given Set of Sampling Costs
       Quantifying uncertainty relative to a given set of sampling costs is not an easy task. The
 problem and statistical formulation must be well-defined.  Furthermore, the statistical planner
 must determine whether any hard constraints are already fixed. Hard constraints can make the
job of sample planning much easier.  For instance, if the sampling budget is fixed in advance and
 there is no penalty or savings associated with not using the entire  budget, the solution is simply
 to compute how many samples can be collected and then determine how much uncertainty can be
 eliminated with that number of samples. Similarly, if the desired level of maximum uncertainty

 is fixed, one can determine how many samples need to be collected under a particular design to
 meet or exceed the uncertainty requirement.

        However,  in many instances there may be  no fixed constraints in either the sampling
 budget or the level of desired uncertainty.   Several United States  Environmental Protection
 Agency (USEPA) guidance documents do recommend particular levels of uncertainty, although
 these documents are by no means uniform in their approach.  Still, one approach would be to pick
 desired uncertainty levels based on past precedent, and use this precedent as a hard constraint in
 order to determine the needed sampling budget.  For managers that need greater flexibility in
 setting the sampling budget, an alternate approach would be to first recognize that any change in
 sampling intensity (e.g., on a regular grid) will be tied to a specific change in the overall  level of
 uncertainty.  By hypothetically varying the sampling intensity, one should be able to map or tabulate
 the expected changes in uncertainty level.

       To be more precise, the total uncertainty associated with a statistical hypothesis test used
 to measure compliance is generally divided into two components:  Type I error rates (false
 positives) and Type II error rates (false negatives).  False positives represent cases where the null
 hypothesis H0 is falsely rejected.  In the typical scenario under the Hazardous Waste Identification
 Rule for Contaminated Media (HWIR-media), the null hypothesis is that the true arithmetic mean
 (either of the site as a whole or perhaps within  a single exposure unit [EU]) exceeds bright-line,
 while the alternative hypothesis HA  is that the  mean is actually below bright-line.  Therefore, a
 false positive represents the case where the mean is judged below bright-line when in fact it
 exceeds bright-line.  A false negative, on the other hand,  represents the case where the  mean is
judged to be above bright-line when in fact it is not.

       Both of these errors contribute  to the  overall level of uncertainty associated with any
 specific sampling design. Neither can be eliminated completely, but one or the other or possibly
 both can be minimized through a prudent choice of sampling plan. However, not all changes to
 a sampling plan will impact the Type I and Type II error rates in the same way.  Increasing the
 sampling intensity on a regular grid, for instance, may  decrease  the likelihood of both false

positive and false negative mistakes, but the impact might be more significant on the Type I error
than the Type II error (or vice-versa).  One might have to hypothetically compute the impact of
several alternative sampling plans to find one that minimizes both the false positive and false
negative rates to acceptable levels.

       Furthermore, while a fixed null hypothesis often means that the false positive rate is
readily computed,  the false negative rate explicitly depends on the specific alternative hypothesis
being considered.   For example, suppose the alternative hypothesis states  that the overall  site
mean is less than bright-line.  This hypothesis is actually what is known as a compound hypothesis
representing an infinite number of alternatives.  In fact, the probability of falsely accepting the
null hypothesis that the overall mean is greater than bright-line greatly depends on how far below
bright-line the true mean actually sits.  If the true mean is substantially lower than bright-line, it
should  not be too hard to judge correctly, thus making a false  negative mistake unlikely.
However, if the true mean is barely beneath bright-line, it may be very difficult with the data
collected not to make a false negative error, since the data are likely to be too close to bright-line
to judge accurately.  Thus the false negative rate is not in general a fixed quantity, but varies as
a function of the strength of the alternative hypothesis.

       One way to explicitly  account for the variety  of possible false negative rates  is by
constructing a power curve.  A power curve is a graph which plots the probability of accepting
the alternative hypothesis (along the vertical axis) against the strength of the alternative.  For
example,  if the alternative hypothesis  is parameterized in terms of the number of standard
deviation units that the alternative mean is below the null mean, the horizontal  axis would be
scaled from zero units at the left endpoint up to perhaps four or  five units on  the right. Then the
power curve would  increase with increasing  x-axis units, representing the ever increasing
probability that the null hypothesis will be rejected as the true (alternative) mean gets lower and
lower (see Figure 9).  By comparing power curves for alternate sampling strategies, one could
choose a strategy on the basis of which power curve was  steeper and more likely to result in a
correct decision to reject H0.  This approach was taken in the revised Superfund Soil Screening
Level guidance (USEPA 1995).
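
       As a rough illustration, the sketch below computes approximate power curves for a one-sided
test of the mean at two hypothetical sample sizes, using a simple normal (z-test) approximation.  It
is meant only to show how curves for competing sampling intensities might be compared, not to
reproduce Figure 9 or the Soil Screening Level calculations.

    import numpy as np
    from scipy import stats

    def approx_power(delta_sd_units, n, alpha=0.05):
        """Approximate probability of rejecting H0 (mean >= bright-line) when the true
        mean sits delta_sd_units standard deviations below bright-line, for a simple
        random sample of size n (normal z-test approximation)."""
        delta = np.asarray(delta_sd_units, dtype=float)
        z_alpha = stats.norm.ppf(1.0 - alpha)
        # Reject H0 when the sample mean falls far enough below bright-line
        return stats.norm.cdf(delta * np.sqrt(n) - z_alpha)

    effect = np.linspace(0.0, 4.0, 9)        # x-axis: SD units below bright-line
    print(approx_power(effect, n=10))        # flatter curve (less sampling effort)
    print(approx_power(effect, n=30))        # steeper curve (more sampling effort)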

[Figure 9.  Example of Power Curve -- probability of rejecting H0 (vertical axis, 0 to 1) plotted against the number of standard deviation units the true mean lies below bright-line (horizontal axis).]
       Of course, the calculation of an entire power curve may be difficult depending on the
 statistical decision criterion to be used. A simpler method designed to fix the false negative rate
 only for a particular choice of alternative hypothesis can be found within the USEPA's Decision
 Error Feasibility Trials (DEFT) sampling design software (see Appendix A).  In this package,
 there is deliberate recognition of the fact that for null and alternative hypotheses that differ only
 slightly (e.g., the alternative mean is slightly smaller than the  null mean), the probability of
 correctly rejecting H0 will be small, leading to large false negative rates.  Instead of trying to
 minimize error rates in these cases, the DEFT program asks the user to supply a "gray region"
 in which the false negative rates are of lesser concern. Anything beyond  the gray region is
 considered of substantial importance, so for these alternatives the false negative rate must be
 minimized to a pre-specified level. In practice, these constraints imply that the user must decide
 on a minimum difference of practical concern (e.g., the difference between the null and alternative
 means is of practical significance whenever it is at least 20 mg/kg). Then the sampling program
 is designed to ensure that such a difference will be discovered with a specific level of probability
 during statistical testing.

 5.2    COMPLIANCE VIOLATIONS DEFINED BY AVERAGES
       Quantifying uncertainty through the computation of false positive rates and false negative
 rates and comparing these rates against the degree of sampling effort or total sampling cost is the
 ultimate goal in planning a sampling program.  However, the correct computation of error rates
 greatly depends on how a violation or  out-of-compliance declaration is defined.  Under the
 proposed HWIR-media, the null hypothesis assumes that mean contamination levels on site exceed
 bright-line standards at one  or more EUs.  Statements about mean concentration levels do not
 directly address upper percentiles or hotspots, nor do they address the error rates associated with
 compliance decisions based on these alternate criteria. However, it is possible to compute false
 positive and  false negative error rates for mean estimates, though one must first specify the kind
 of mean estimate to be computed.

       If an overall "global" site mean is desired, the methods described earlier for computing
 global confidence intervals around the mean can be used to judge compliance with bright-line by

 comparing the upper confidence limit against the standard.  In this case, the maximum false
 positive rate will simply equal the complement of the confidence level chosen (i.e., alpha).  The
 false negative rate, on the other hand, will depend on the type of sampling design used to generate
 the confidence level.

        If strictly random sampling was used to select n data locations, the confidence interval can
 be based on the classical t-distribution with (n-1) degrees of freedom (d.f.).  Similarly, one can
 use the non-central t-distribution with the same d.f. to determine false negative rates for various
 alternate mean levels and work backwards to find that sample size n resulting in the desired false
 negative rate for the minimum difference of practical  significance.
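       The sketch below illustrates this non-central t calculation under the assumptions just
stated (strictly random sampling and the HWIR-style hypotheses H0: mean at or above bright-line
vs. HA: mean below bright-line); the bright-line, alternative mean, and standard deviation shown
are hypothetical:

from scipy.stats import t, nct

def false_negative_rate(n, bright_line, alt_mean, sigma, alpha=0.05):
    # Probability of failing to reject H0 when the true mean equals alt_mean (< bright_line)
    df = n - 1
    t_crit = t.ppf(alpha, df)                         # reject H0 when the t-statistic falls below this
    nc = (alt_mean - bright_line) / (sigma / n**0.5)  # non-centrality parameter under the alternative
    power = nct.cdf(t_crit, df, nc)
    return 1.0 - power

# Work backwards: smallest n giving a false negative rate of at most 0.10 when the true
# mean sits 20 mg/kg below a 100 mg/kg bright-line (sigma assumed to be 35 mg/kg).
n = 2
while false_negative_rate(n, 100.0, 80.0, 35.0) > 0.10:
    n += 1
print(n)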

        If non-random or partially-random sampling is used, the global confidence interval will
 be based explicitly on the estimated spatial covariance model and the key assumption that the
 distribution of estimation errors is approximately normal. The estimation variance so constructed
 can be used to compute a very approximate estimate of the Type II error for an alternate
 hypothesized mean below bright-line.  As an example, suppose a 95% upper one-sided confidence
 interval around the global mean was computed by adding a multiple of 1.645 times the square root
 of the estimation variance to the declustered global mean estimate.  Assuming this upper
 confidence limit falls above bright-line, a false negative would occur if the true site mean were
 actually below bright-line.  By assuming that the same estimation variance and normal
 distributional pattern is equally valid for any alternative mean, one can easily compute, for a
 normal curve centered on the hypothesized alternative,  that portion of the curve exceeding bright-
 line. In this way, one can compute the approximate false negative rate for any mean below bright-
 line (see Figure 10).
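       A minimal sketch of this approximation follows; the bright-line value and the estimation
variance used here are hypothetical, and the errors are assumed normal with the same estimation
variance for every hypothesized alternative mean:

from scipy.stats import norm

bright_line = 100.0        # compliance standard (mg/kg), hypothetical
est_variance = 225.0       # estimation variance of the declustered global mean, hypothetical
sigma_e = est_variance ** 0.5

def approx_false_negative(alt_mean):
    # Portion of a normal curve centered on alt_mean (with sd sigma_e) that exceeds bright-line
    return 1.0 - norm.cdf((bright_line - alt_mean) / sigma_e)

for alt_mean in (70.0, 80.0, 90.0):   # hypothesized alternative means below bright-line
    print(alt_mean, round(approx_false_negative(alt_mean), 3))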

                        Figure 10.  Computing False Negative Rates
           (normal curve centered on the hypothesized alternative mean; the area of
            the curve above the bright-line concentration, computed from the
            cumulative normal distribution, gives the approximate false negative rate)

       The above methods of determining error rates are well suited when one wants to estimate
an overall mean, but they do not provide help when one needs to estimate a series of localized
means, such as the mean at each EU on the site.  When decisions about remediation or further
regulatory control/action need to be made on an EU-specific basis, estimates of the global site
mean may give little information about local error rates.  Under the proposed HWIR-media, in this
more complex case, a false positive would be incurred whenever a particular EU is designated as
below bright-line when in fact the local EU mean is greater than bright-line.  Likewise, a false
negative would represent any EU designated as being above bright-line when in fact the true local
EU mean was below the standard.

 5.2.1  The Loss Function Approach
       Overall false positive and false negative rates for the site as a whole can be measured in
 a variety of ways.  One approach would be to count, for a given sampling design,  the overall
 expected number of EU-specific false positive and false negative decisions taken as a percentage
 of the total number of decisions to be made (i.e., equal to the  number of EUs on the site).
 Another approach would be to assign a loss function to the decision process that totals the
 expected sampling and remediation costs associated with a given sample design. For instance, in
 a paper by Englund and Heravi (1994) on phased sampling designs, they suggest the use of a
 linear loss function keyed to the action level (e.g., bright-line).

       In this model, the cost of remediating a particular EU is exactly equal to the cost of not
 remediating when the true EU mean is equal to the action level (see Figure 11).  Above and below
 the action level, the cost of not remediating an EU is directly proportional to the true  local mean
 concentration, but the cost of remediation remains constant.  As  a consequence, EUs with mean
 concentrations lower than the action level cost less not to remediate than to remediate, while the
 reverse is true for EUs with mean levels higher than the action level.  Such a model is a simplistic
 approximation of the actual losses incurred, but it can be a useful measure for assessing the
 approximate overall combined error rates.  By  attaching hypothetical dollar amounts to false
 positive  and false negative decisions and combining these  amounts with costs incurred from
 correct remediation decisions and degree of sampling, a single cost measure can capture much of
 the statistical uncertainty due to sampling and analysis in a simple way and allow for easy
 comparison among competing sampling designs.¹
   ¹Note that in the Englund and Heravi (1994) paper the null/alternative hypothesis structure is reversed from the HWIR-
media proposal.  In particular, they test H0: μ ≤ action level vs. HA: μ > action level.
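       To illustrate the structure of such a linear loss function, the sketch below (the action
level, remediation cost, and proportionality constant are hypothetical, and the routine is only an
illustration of the model Englund and Heravi describe) assigns zero loss to correct decisions and
a proportional loss to decision errors:

ACTION_LEVEL = 100.0          # mg/kg, hypothetical
REMEDIATION_COST = 50000.0    # fixed cost of remediating one EU, hypothetical
K = REMEDIATION_COST / ACTION_LEVEL   # cost of NOT remediating is K times the true mean

def decision_loss(true_mean, remediate):
    # Loss incurred for one EU, given its true mean and the remediation decision
    cost_remediate = REMEDIATION_COST
    cost_leave = K * true_mean
    cost_incurred = cost_remediate if remediate else cost_leave
    cost_correct = min(cost_remediate, cost_leave)   # the lower of the two lines is the correct decision
    return cost_incurred - cost_correct              # zero when the decision was correct

# A false positive (remediating a clean EU) and a false negative (leaving a dirty EU alone):
print(decision_loss(true_mean=60.0, remediate=True))    # loss proportional to (AL - 60)
print(decision_loss(true_mean=150.0, remediate=False))  # loss proportional to (150 - AL)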

                              Figure 11.  Loss Function Approach
                H0: remediation unit clean          HA: remediation unit dirty (μ > AL)

                   False Positive Loss:  cost incurred by incorrectly remediating clean soil,
                    i.e., soil with concentrations below the action level.
                   False Negative Loss:  cost incurred by failing to remediate dirty soil,
                    i.e., soil with concentrations above the action level.

               The figure plots the cost of remediation (horizontal line) and of non-remediation
               (diagonal line) against the true remediation unit concentration.  The two costs are
               set equal at the action level.  At any concentration, the lower line represents a
               correct decision and the upper line an incorrect decision; the difference between
               them is the loss incurred if a decision error is made.

                                                     Geostatistical Soil Sampling Guidance
 5.3    VIOLATIONS DEFINED BY UPPER PERCENTILES
       For violations defined by upper percentiles like the 90th or 95th, the decision is based on
 whether a pre-specified fraction of the onsite concentrations is greater than the compliance
 standard.  For instance, if the 90th percentile is used, the goal would be to ensure that no more
 than 10 percent of all the expected soil concentrations exceeded the standard, or equivalently, that
 the 90th percentile did not exceed the regulatory limit.  If it did, it would imply that at least 10
 percent and probably more of the highest concentrations would violate the standard.

       As noted before, upper percentile violations are best used in comparisons against standards
 that are designed to guard against short-term, acute exposure risks.  This kind of comparison
 might be important  if individual high-level exposure events could cause significant health or
 environmental risks. Note, however, that measuring upper percentiles is not quite the same as
 locating hot spot areas.  Estimating an upper percentile does not account for the number, size, or
 location of hotspots onsite; rather, the estimate simply identifies whether or not a certain fraction
 of the total soil has a concentration greater than the standard.

       To decide with statistical confidence that  a given percentile either exceeds or does  not
 exceed a fixed regulatory limit, the same kinds of questions connected with estimating averages
 must be addressed.  First,  is the decision to be based on an overall, global percentile estimate or
 on a series of localized, EU-specific percentiles? In the global case, while it may be possible to
 construct an approximate confidence interval around the desired percentile, it should not be
 assumed that the sampling distribution of the percentile estimate is approximately normal. Unlike
 the situation for estimating a global mean, a normal distribution cannot be invoked to approximate
 the likely Type II error rate associated with a particular sampling plan.
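       For context, one standard classical device for bounding an upper percentile is to use an
order statistic selected via the binomial distribution, as sketched below.  This device presumes
approximately independent random samples, so the strong spatial correlation discussed throughout
this guidance can undermine its nominal confidence level; the sample size, percentile, and
confidence level shown are hypothetical:

from scipy.stats import binom

def order_statistic_index(n, p=0.90, confidence=0.95):
    # Smallest k such that the k-th order statistic (k-th smallest of n) serves as an
    # upper confidence bound on the p-th percentile with at least the stated confidence.
    for k in range(1, n + 1):
        if binom.cdf(k - 1, n, p) >= confidence:
            return k
    return None   # n is too small for the requested confidence

print(order_statistic_index(n=60, p=0.90, confidence=0.95))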

       Even if a specific confidence level is set for the hypothesis H0: true percentile >
 standard vs. HA: true percentile ≤ standard, and an upper confidence limit on the true percentile
 is compared against the standard, this approach only specifies the Type I (false positive) error
 rate and not the Type II error probability of falsely declaring the percentile to be above the
 limit when it is actually below.  As with estimating a mean, the Type II error probability will depend on what
 the true percentile is and how far above the standard it sits.  But because the distribution of an
 upper percentile is not normal, the existing research literature does not appear to be helpful in
 determining likely Type II error rates associated with such an estimation strategy, especially vis-a-
 vis fixed levels of sampling effort, grid size, sampling design, etc. If a series of localized, EU-
 specific percentiles is needed instead, an alternate approach would be to modify Englund and
 Heravi's (1994) loss function model.  In the case of estimating a mean, an equal trade-off between
 costs and benefits was assumed whenever the true mean was equal to the action level (e.g., bright-
 line).  Above the action level, the loss associated with not remediating the site was assumed to be
 proportional to the difference between the true mean concentration and the action level.  Below
 the action level, loss occurred if the site was remediated, since the costs of remediating were
 assumed to be greater than the cost of taking no action.2

       In principle, the same kind of conceptual model could be applied to an upper percentile,
 perhaps with modifications to the  concentration levels at which costs and benefits would equally
 trade-off and the degree of loss to be expected from either (1) not remediating when the true
 percentile was above the action level or (2) remediating when the action level equaled or exceeded
 the true percentile.  As will be seen below, a conceptual loss framework of this kind combined
 with the  development of a simulated geostatistical site model can be used to estimate the overall
 expected loss due to false positive and false negative errors in a series of individual EUs.  The
 same basic method could be used to estimate EU-specific  upper percentiles instead of means and
 to compare these percentile estimates against the action level or compliance standard for each EU.
 The only serious complication might be in deriving an approximate sampling distribution for an
upper percentile  over a localized area, given the high degree of spatial correlation  often
anticipated.
   ²Under the HWIR-media proposal, the consequences of the loss function could be similar (i.e., the application of
stringent Federal standards when state oversight—and perhaps more flexible clean-up decision criteria—is justified
represents a false negative loss, and conversely, the application of state oversight to a site when more stringent Federal
standards are warranted represents a false positive loss). The discussion here, however, is generic, as it is thought
that this guidance will have application beyond the HWIR-media proposal.

 5.4   USING HOTSPOTS TO DEFINE VIOLATIONS
       Using the presence of hotspots as a criterion for regulatory violation is more complicated
 in some respects than using either a mean or upper percentile, especially when it comes to pinning
 down the expected uncertainty associated with a sampling regime designed to locate one or more
 hotspots of a given minimum size.  At least two important factors affect the overall uncertainty.
 First,  there is the probability  of actually locating a potential hotspot.  Even sampling on a
 systematic grid does not guarantee that  a hotspot, if it indeed  exists, will be  found.   The
 probability will depend not only on the coarseness of the sampling grid in relation to the minimum
 area to be detected,  but also on the shape and orientation of the hotspot if it is not completely
 circular (see Gilbert 1987).

       As an example, if the diameter of a circular hotspot is at least 30 meters and the spacing
 between rectangular grid nodes is 21  meters or less, any such hotspot will be sampled with
 certainty. However, if a closely-spaced grid is too expensive to implement and a looser grid is
 used instead, the probability of intersecting a hotspot decreases in relation to the grid spacing.
 Furthermore, even in this example the  probability need not be certain if the hotspot is elliptical
 instead of circular, for then it may be possible to "fit" the hotspot between grid nodes without any
 of the samples intersecting the hotspot area. Rectangular grids, of course, are only one possibility
 in the layout of a sampling plan. Triangular or hexagonal  grids will incur different probabilities
 of success in locating a hotspot, again depending on the shape and orientation of the hotspot area
 relative to the grid.
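       The geometric rule behind the example above, together with a small Monte Carlo check for
looser grids, is sketched below for a square grid; the hotspot diameter and the grid spacings used
are hypothetical:

import math, random

def max_square_spacing(diameter):
    # For a square grid with spacing g, the farthest any point lies from its nearest node is
    # g*sqrt(2)/2, so a circular hotspot of this diameter is always hit when g <= diameter/sqrt(2)
    # (30 m / sqrt(2) is roughly 21 m, matching the example above).
    return diameter / math.sqrt(2)

def hit_probability(diameter, spacing, trials=100000):
    # Monte Carlo estimate of the chance a randomly placed circular hotspot covers a grid node
    radius = diameter / 2.0
    hits = 0
    for _ in range(trials):
        # place the hotspot center uniformly within one grid cell; nodes sit at the cell corners
        x, y = random.uniform(0, spacing), random.uniform(0, spacing)
        corners = [(0, 0), (spacing, 0), (0, spacing), (spacing, spacing)]
        if any(math.hypot(x - cx, y - cy) <= radius for cx, cy in corners):
            hits += 1
    return hits / trials

print(round(max_square_spacing(30.0), 1))        # about 21.2 m
print(hit_probability(30.0, spacing=35.0))       # detection probability under a looser grid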

       The probability will also be impacted by the number of minimum size hotspots existing
onsite. Most of the available literature on hotspots (e.g., Gilbert 1987) deals with the  worst-case
scenario of having to locate a single hotspot. If multiple hotspots exist but only one need be found
to trigger a remediation effort, the probability of locating any of the hotspots will improve,
perhaps dramatically, even with a grid  too coarse to guarantee finding a lone hotspot.
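       As a rough illustration of this effect (treating the hotspots as if they occupied
effectively independent positions, which is an idealization, and using a hypothetical miss
probability):

p_miss = 0.40                      # chance the grid misses one particular hotspot, hypothetical
for k in (1, 2, 3, 5):             # number of similar hotspots present onsite
    print(k, round(1 - p_miss ** k, 3))   # chance of locating at least one of them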

       The second factor impacting overall uncertainty is the probability of correctly determining
the extent of any potential hotspot to see if it meets the minimum size requirements.  Just because
a potential hotspot has been located via an extreme sample concentration collected from a node
on the sampling grid, it does not guarantee that the extreme concentration is part of a hotspot of
sufficient extent.  The sample may represent nothing more than an isolated but tiny patch of high
concentrations, too small to be worth the remediation effort and/or cost.  Only a connected patch
of high concentrations that satisfies the minimum extent requirements should be classified as a bona
fide hotspot.  However, to determine this extent more accurately, additional sampling may be
required in a second stage, after the first-stage grid results have been reported. Such additional
sampling will impact the overall sampling costs as well as the overall uncertainty associated with
declaring a hotspot exceedance.

       An important consideration to remember is what is being tested when hotspots are used
to establish compliance violations.  Unless a known spill or incident of contamination makes it
likely that a hotspot already exists a priori onsite, the usual hypothesis is H0: no hotspot of
minimum size exists  vs. HA: one or more such hotspots exist. In this testing framework, a Type
I error occurs whenever a hotspot is declared to be  found, but in reality it does not exist. This
can happen, of course, when an isolated, extremely high measurement  is made, but  the
surrounding soil samples do not qualify as hotspot concentrations. The likelihood of such a false
alarm will depend in part on the probability of laboratory measurement error and also  on  the
degree of spatial correlation exhibited onsite. If there is a strong as opposed to weak pattern of
spatial correlation, the probability will be much greater that an extreme, high soil concentration
will be surrounded by other high soil concentrations.

       On the other hand, a Type II error occurs when no hotspot is located, even though such
a hotspot exists. This will happen most frequently when the first-stage grid completely misses the
hotspot.  It could also occur,  however, if a potential hotspot is located but the second-stage
sampling used to measure its extent is not adequate to determine whether or not the minimum size
requirement is met. For either type of decision error then, it is important to consider those factors
associated with correctly measuring the extent of a potential hotspot.
       Some of these factors include the specific geometric pattern of the second-stage samples
around the initial extreme first-stage soil concentration, and what kind of strategy is employed in
placing these samples. Without excavating an extensive area around the extreme measurement,
there will always be uncertainty as to whether the extreme value is part of a connected patch of
high concentrations or just an isolated parcel.  However, if the degree of spatial correlation onsite
has previously been estimated and found  to be fairly high, it may be possible to use a radial
second-stage sampling pattern suggested earlier, where the number of radial "spokes" or transects
and the number of samples taken along each transect are balanced against the available sampling
and analysis budget.

       Similar considerations of sampling budget could be used to possibly tighten the first-stage
sampling grid during the planning phase to ensure that the extent of a minimum-size hotspot can
be determined with a minimum of second-stage sampling effort.  The idea in this case would be
to design the first-stage grid to ensure with high probability that any hotspot would be intersected
by at least two adjacent grid nodes.  While this would not eliminate the need to perform some
additional second-stage sampling between these "hits" on the chance that they were isolated soil
parcels and not part of a connected hotspot, it would certainly reduce the amount of second-stage
effort needed. The cost, however, would be reflected in the more intensive first-stage sampling
necessary, and this cost could outweigh the second-stage savings depending on the specific site.
Other strategies to locate hotspots—while simultaneously conserving sampling and analytical
costs—using sample compositing and sequential analysis are discussed in Appendix C.

5.5    FACTORING IN SAMPLING COSTS
       As mentioned earlier, one of the most important goals of any sample design is to minimize
the false positive and false negative error rates to acceptable levels while keeping total sampling
and analysis costs to a minimum.  Clearly then, sampling costs must be factored into any decision
about the  final sampling design. In some cases, the number of samples  to be taken will be an
adequate surrogate for actual sampling costs. However, overall analytical costs per sample should
also be factored into the sampling cost equation.  The choice between one sampling design versus
 another may depend more on the analytical cost involved than the cost of collecting the actual soil
 cores.

       One example of this would be a decision to use an inexpensive field screening method to
 measure soil concentrations as opposed to an expensive, but more precise, laboratory analytical
 method.  While the field screening method would not generally provide  as much sample-specific
 measurement precision, a much larger number of samples overall might be collected for the same
 sampling and analysis budget. In at least some cases, the larger number of samples possible using
 a field screening method will more than make up for the loss of measurement precision in terms
 of  how  much the overall  rates of false positive  and false negative errors are  reduced.
 Furthermore, high-cost laboratory precision is generally not needed to identify hotspots, making
 field screening methods ideal for this task.

       Another example would be the decision to use composite sampling techniques to reduce
the analytical expense associated with having to measure a large number of soil cores. In areas
of a site with little known or expected contamination,  such composite sampling methods can
accommodate a large number of individual samples, but substantially minimize the number of
analytical measurements (and hence overall sampling and analysis costs)  needed to make accurate
compliance decisions.

5.5.1  Strategies to Balance Cost Versus Uncertainty
       In line with the ultimate goal of proactive sample planning, one would ideally want a table
or graph that portrayed, for a particular type of sample design (e.g., simple random, stratified,
regular grid, etc), expected changes in the overall level of uncertainty as the sample intensity or
total sampling cost increased.  Then one could (1) pick a required maximum uncertainty level and
find the corresponding sampling intensity, (2) fix the sampling intensity under a given sampling
budget and find the corresponding level of uncertainty, or (3) choose a reasonable tradeoff
between minimizing the sampling budget and minimizing the expected uncertainty.  By generating
such tables or graphs for each kind of sample design, one could compare, for fixed sampling
 budget or fixed uncertainty level, the type of sample design that would generate the best results.
 Unfortunately, statistical planning is not always so easy, although there are certain exceptions.

       If a classical simple random or stratified random sampling design is planned, and the goal
 of the study is to make a decision about a single parameter such as the overall site mean, one can
 use USEPA's Decision Error Feasibility Trials (DEFT) software package to interactively balance
 sampling and analysis costs versus expected Type I and II error rates (see Appendix A).  In this
 package, once a basic hypothesis structure and a minimum practical difference are chosen, the
 user can fix the level of tolerable (i.e., maximum desired) error rates and the program will
 generate the total sampling costs required.  If the total costs are too high, the tolerable error limits
 can be raised until the sampling cost is acceptable.
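       The sketch below mimics the kind of trade-off DEFT lets the user explore interactively:
the tolerable false negative rate is relaxed in steps until the implied sampling cost fits a
hypothetical budget.  It illustrates only the logic, not the DEFT program itself, and it reuses
the same normal-theory sample size approximation sketched earlier; every numeric input is
hypothetical:

import math
from scipy.stats import norm

COST_PER_SAMPLE = 400.0    # collection plus analysis, dollars per sample, hypothetical
BUDGET = 10000.0           # total sampling budget, hypothetical
delta, sigma, alpha = 20.0, 35.0, 0.05

beta = 0.05
while beta < 0.5:
    n = math.ceil(((norm.ppf(1 - alpha) + norm.ppf(1 - beta)) * sigma / delta) ** 2)
    if n * COST_PER_SAMPLE <= BUDGET:
        print(f"beta={beta:.2f}, n={n}, cost={n * COST_PER_SAMPLE:.0f}")
        break
    beta += 0.05            # relax the tolerable false negative rate and try again
else:
    print("Budget cannot meet even relaxed error-rate targets; revisit the design.")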

       In addition to the DEFT software, a document by Mason (1992) provides a discussion
 (with references) on a method for integrating sampling costs with the precision of the estimates
 obtained from a soil sampling program. Both of these approaches are certainly useful for cases
 of a single site parameter; however, when a compliance decision must be made for each EU
 individually, a different approach to balancing uncertainty and sampling costs must generally be
 considered. One possibility would be to perform a stratified random design where each EU is
 designated as a separate stratum, with the goal of making individual stratum estimates.  Usually,
 though, the degree of sampling required to generate estimates with the desired precision in each
 EU would be prohibitive if there are more than just a few EUs on site.

       Alternately, geostatistical sample designs can be considered that involve a regularly-spaced
 grid of sample locations and perhaps multiple phases of sample collection. The advantage of such
 designs, as will be detailed later in this document,  is that  the degree of spatial dependence
exhibited by the data can be explicitly accounted for and compliance decisions for each EU can
 be generated in an efficient manner.  However, accounting for spatial dependence makes the pre-
 computation of required sampling intensity much more difficult.
       In fact, without an attempt at more complicated geostatistical site simulation, the total
number of samples required ahead of time cannot be readily forecasted.  At best, an iterative
procedure is described below whereby samples collected during a first phase of sampling are used
to predict those specific  EUs with the highest potential for making a false positive or false
negative decision error. Then additional samples can be collected in a second phase from those
specific EUs and the error rates recomputed.  Additional samples can be collected until the
expected error rates fall within tolerable levels.

5.6    DETERMINING SAMPLE SIZE VIA GEOSTATISTICAL SIMULATION
       The  basic process of estimating total sample  size on a  site-specific  basis using a
geostatistical sampling design is described in the following pages.  This algorithm is described in
more detail in a paper by Englund and Heravi (1994) entitled "Phased Sampling for Soil
Remediation." Overall, the approach they recommend is computationally intensive but feasible
with today's ever-improving personal computers.  Some prototype software is apparently already
available through USEPA/EMSL-Las Vegas. As will be seen below, some information about the
pattern of spatial variability at the site and actual sample data unfortunately are needed ahead of
time to estimate the final required sample size.  At present there seems to be no good way to
estimate the required size of a geostatistical sample without a fair amount of prior information,
information that will probably require some initial sampling expense.

       Englund  and Heravi measure uncertainty via the  loss function approach previously
described. The cost of not remediating a particular EU is modeled as a linear function of the true
mean concentration, while the cost of remediation is considered  as a constant for each EU
regardless of mean level.  In addition, the process of sampling and analysis is taken as a constant
per sample and added onto the total expected cost.   In this way,  they are able to reduce the
problem to one of minimizing the overall expected cost under a given sampling design.

       The problem is also specific to the geostatistical framework because compliance decisions
(i.e., the decision whether or not to remediate) are made for each local EU and not just for the
site as a whole.  As a result, the sampling designs considered by Englund and Heravi are
 geostatistical in nature and include a one-phase design that approximates a regularly-spaced grid
 (more precisely a stratified random design with one sample location per stratum), and multi-phase
 designs that begin with the same initial "regular" grid but then add additional samples in locations
 suggested by the first phase results.

       As  for the compliance decisions themselves, the estimated mean at each local EU is
 compared to an action level. If the mean is greater than the action level, remediation is initiated.
 If not, the EU is left alone.  This corresponds to a statistical hypothesis structure with a null
 hypothesis of the form H0: μ ≤ action level, and an alternative of the form HA: μ > action level.
 Such a structure is essentially the reverse of the proposed HWIR-media, where the hypotheses are
 of the form H0: μ ≥ bright-line vs. HA: μ < bright-line.  Reversal of the structure flips the roles
 of the Type I and Type II errors as described in Englund and Heravi; otherwise their method
 should yield very similar error rates and sampling cost considerations to the HWIR-media.

       The single most important requirement for mimicking Englund and Heravi's sample size
 calculations is a fully known site model. By fully known, we mean a site at which the values of
 all possible discrete sampling locations are known in advance.  Of course, such a model is
 completely impractical, because knowing  the values in advance would preclude any need for
 sampling.  Still, such fully known sites provide a fertile ground for testing alternate sampling
 strategies to determine which ones  work best.  In  addition, Englund and Heravi allude to an
 intermediate method for partially known sites, where some sampling information is available and
 the pattern  of spatial correlation is either already known or can be reasonably estimated from the
 existing sample data.

       This latter method is known as conditional geostatistical simulation (CS). For details on
 implementing the method, the reader is referred to Englund and Heravi (1993). The basic idea
 involves the need to estimate an individual value at each possible discrete sampling location, the
 same kind of information one would have if the site were fully known.  However, since the values
 at all unknown locations must be estimated from the already existing sampling data, the data are
simulated through a process that ensures that the known sample data are equal to the simulated
values and that the simulated  values at unknown locations have the same pattern of spatial
correlation exhibited by the sample data.  By the end of the process, an estimated but fully known
site model is built, from which Englund and Heravi's algorithm can be used to design the most
efficient sampling program.

       The obvious drawbacks to the CS approach are the complicated computational algorithms
needed to build an estimated site model and the fact that some prior information about the site
(i.e., existing sampling data and perhaps prior knowledge of the pattern of spatial dependence)
is necessary to build the model.  As noted above, it does not appear that one can adequately plan
a geostatistical sample design without taking some samples first or having prior existing data with
which to work.

       In fact, the prior sampling information is very important to correctly plan the sample size.
In the two fully known site models used by Englund and Heravi, the first with significantly greater
spatial continuity than the second, the total number of samples needed for the most efficient
designs ranged from close to 60 in the first model (representing approximately one sample for
every 330 "pixels," or discrete sampling locations) to approximately 250 in the second model
(approximately one sample for every 80 pixels).

       Sample  sizes of this magnitude may  seem excessive to someone used to working with
classical statistical  methods and the estimation of a single population parameter.  But  in
geostatistical studies, the goal of mapping the entire site so that localized concentrations can be
estimated (e.g., the mean of a local EU) is far more difficult to achieve within tolerable error
bounds.  Consequently, much more sampling is generally required.  It is also clear that unless
prior information about the spatial covariance pattern is either known or can be estimated,  the
minimum intensity of sampling required is likely to be purely guesswork with no particular basis.
Geostatistical sampling designs have no magic "30 sample" rules of thumb as are sometimes used
in classical statistics studies.
       Further advice on geostatistical  sampling,  especially  for  the purpose of estimating
 directional variograms, is given by Flatman and Yfantis (1984), where they observe that "the
 number of samples to be taken on each transect is site specific.  Mining geologists suggest 50
 samples per direction but some natural scientists have had usable results with  10.  If too few
 samples are taken for directional semi-variograms, the composite or omnidirectional semi-
 variogram can be used. Too few samples will be obvious from a stochastic appearance of the
 semi-variograms. Too small a sample size is the most common and most subtle way to invalidate
 any statistical procedure, and geostatistics  is no exception." It should also be noted that not only
 would one need to sample along at least two to four separate transects for a proper variogram
 analysis, but additional sampling also  would usually be needed to actually  perform a kriging
 exercise.
        Once a site model has been developed, the Englund and Heravi approach involves the
 following steps:

 Step 1.   Divide the site into n equally spaced blocks and randomly select one sample location
          from each block.  Strictly speaking, this procedure is an example of stratified random
          sampling.  However, the same procedure is closely mimicked by a regularly-spaced
          grid with random starting point in  the first block.  The samples collected on this
          "grid" represent the  first phase of sampling.

 Step 2.   After collecting the first phase sampling measurements, use ordinary block kriging (see
          below for a full description) to estimate the mean level of concentration within each
          EU on site. If the design is strictly a one-phase type, remediate any EU with estimated
          mean concentration above the action level. If the design instead is multi-phase, do not
           perform any remediation until the additional samples have been located and the block
           kriging estimates updated using the additional data.  Instead, compute the local kriging
          standard deviation (KSD) associated with each block (i.e., EU).

Step 3.   For one-phase designs, compute approximate Type I and Type II errors by comparing
           the estimated block means with the actual site model block averages.  Kriged (i.e.,
           estimated) block means that are higher than the action level when the site model mean
           is lower than the action level are counted as false positives.  Kriged block means that
           are lower than the action level when the site model mean is higher than the action level
           are counted as false negatives.  Compute the total cost (i.e., remediation +  non-
           remediation + sampling) for this design and repeat the process described in steps 1-3
           for several different grids and  a  range of different one-phase sample sizes.  By
           calculating the average total cost among the replicated designs at each fixed, distinct
           sample size and then connecting the average total costs  across the range of sample
            sizes, as in Figure 12, one can develop a graphical relationship between expected
            total cost and overall sample size.  From this graph, one can then determine the point
            of minimum total cost and the sample size associated with it.  This sample size should
           approximate the most efficient (i.e., least uncertain) one-phase design.
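            A schematic version of the Step 3 bookkeeping is sketched below, assuming a fully
            known (or simulated) site model so that true block means are available; the action
            level and unit costs are hypothetical, and the kriging itself is assumed to have been
            done elsewhere:

ACTION_LEVEL = 100.0          # mg/kg, hypothetical
REMEDIATION_COST = 50000.0    # cost to remediate one EU, hypothetical
COST_PER_SAMPLE = 400.0       # sampling plus analysis, hypothetical
K = REMEDIATION_COST / ACTION_LEVEL

def total_cost(true_means, kriged_means, n_samples):
    # Remediation + non-remediation + sampling cost for one design, with error tallies
    false_pos = false_neg = 0
    cost = 0.0
    for true_m, est_m in zip(true_means, kriged_means):
        remediate = est_m > ACTION_LEVEL
        cost += REMEDIATION_COST if remediate else K * true_m
        if remediate and true_m <= ACTION_LEVEL:
            false_pos += 1          # remediated an EU the site model says is clean
        if not remediate and true_m > ACTION_LEVEL:
            false_neg += 1          # left alone an EU the site model says is dirty
    return cost + n_samples * COST_PER_SAMPLE, false_pos, false_neg

            Repeating this calculation for several grids at each fixed sample size, and averaging
            the totals, traces out a cost curve of the kind shown in Figure 12.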

Step 4.    For two- or multi-phase designs, the path to the most efficient sampling program is
           more complicated, but generally does not require any increase in the overall sampling
           budget.   Englund and Heravi found that regardless  of the number  of phases of
           sampling used, the minimum cost designs involved approximately the same  total
            number of samples (see Figure 13).  Of course, in a two-phase design, the number of
           first-phase samples used in the regular "grid" is lessened, so that additional samples
           can be allocated for the second phase.  Still, the most efficient two-phase designs used
           a combined phase one and phase two sample size almost identical to the size of the
           most efficient one-phase only design. The same conclusion was reached for multi-
           phase designs.  As Englund and Heravi note, the "optimal total number of samples is
           essentially independent of the number of phases." (1994, p.  247)

           The advantage of adding a second phase of sampling is that the most efficient designs
           have a lower expected total cost than the most efficient one-phase designs. However,
           this conclusion does depend somewhat on the relative proportion of samples allocated
           to the first phase compared to the second.  Englund and Heravi note that "for two
            phase designs, 75 percent of samples in the first phase is near optimal; 20 percent
            or less is actually counterproductive." (1994, p. 247)  The upshot is that enough
            samples need to be allocated to the first phase regular grid to allow effective
            placement of the second phase locations.

                        Figure 12.  Average Cost vs Sampling Intensity
              (panels A-F: average total cost plotted against number of samples,
               shown for both the mean estimate and the upper 90th percentile)

                    Figure 13.  Optimal Sample Size vs Number of Phases
              (cost curves comparing one-phase, two-phase, and N-phase sampling,
               plotted against total number of samples)

           Another consequence of the above 75 percent optimality criterion is that, given a fully
           known or fully estimated site model in hand, one can determine the most efficient
           (i.e., least costly) sample size for a two-phase design simply by simulating the most
            efficient one-phase design.  This is because, once the most efficient total sample
            size, N, in a one-phase design is established, Englund and Heravi's results suggest
            that the most efficient two-phase design can be set up by allocating 0.75×N samples
            to the first phase and the remaining 0.25×N samples to the second phase.

 Step 5.    To actually locate second phase samples once the first phase samples have been
           collected, Englund and Heravi use a method based on the ordinary KSD. While by
           no means  the only method available for efficient allocation of second phase samples,
           the approach is highly instructive and will allow the reader to understand how Type
            I and Type II errors can be estimated for each remediation unit (or EU under the
           proposed HWIR-media).

           As described earlier, the first phase samples can be used in conjunction with a model
           for the spatial variability onsite to  construct "best" linear estimates of the  mean
           concentration levels in each pre-designated block. A block is simply a well-defined
           subunit of the site, and can refer to EUs, remediation units, or any series of pre-
           defined subareas enclosed within the site boundaries.  These best linear estimates are
           known as kriging estimates.  They are "best" in the sense that, if the spatial covariance
            model has been chosen correctly, such estimates minimize what is known as the
           estimation variance, or the  variance of errors (i.e., discrepancies) between the true
           values and the estimated ones.
           Along with the block kriging estimates for each remediation unit, the kriging algorithm
           allows for the calculation of the minimized estimation variance for each block, also
           known as th^ordinary kriging variance.  This variance estimate will differ depending
           on the block being estimated, since it depends not only on the spatial covariance model
           chosen but also heavily on the number and spatial arrangement of actual sample values
           located close to the block. Indeed, the ordinary kriging variance will tend to shrink
           as additional samples are located within or near the block being estimated.  This fact
           is precisely what lies behind the improvement to be gained from placement of second
           phase samples after a first phase grid has been analyzed.  By locating second phase
            samples in areas of the site where there exists the greatest uncertainty about the block
           estimates (i.e., the largest kriging variances), one can reduce the overall level of
           uncertainty in the most efficient way.
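            For readers unfamiliar with the mechanics, a bare-bones ordinary point kriging sketch
            is given below; block kriging replaces the point-to-target covariances with
            block-averaged ones, a detail handled by packages such as GEO-EAS.  The covariance
            model, its range and sill, and all data values are hypothetical:

import numpy as np

def cov(h, sill=700.0, a=50.0):
    # Exponential covariance model C(h) = sill * exp(-3h/a); sill roughly the data variance,
    # range parameter a in meters (both hypothetical)
    return sill * np.exp(-3.0 * np.asarray(h) / a)

def ordinary_krige(xy, z, target):
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)   # sample-to-sample distances
    d0 = np.linalg.norm(xy - target, axis=1)                      # sample-to-target distances
    # Ordinary kriging system in covariance form: [C 1; 1' 0][w; mu] = [c0; 1]
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.append(cov(d0), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = w @ z
    krig_var = cov(0.0) - w @ cov(d0) - mu        # minimized estimation variance
    return estimate, krig_var

xy = np.array([[0.0, 0.0], [40.0, 0.0], [0.0, 40.0], [40.0, 40.0]])   # sample coordinates (m)
z = np.array([80.0, 120.0, 95.0, 150.0])                              # sample concentrations (mg/kg)
print(ordinary_krige(xy, z, target=np.array([20.0, 20.0])))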
           Still, the method is not simply to place second phase samples in locations where the
           kriging variance (or equivalently, the KSD, because the KSD is just the square root
           of the kriging variance) is largest.  Rather, the goal is to place additional samples
            where the estimated Type I or Type II error is greatest.  This latter criterion generally
           will lead to a different set of sample placements.

            To see how this works, first imagine a block (i.e., remediation unit) with an estimated
           (i.e., kriged) mean concentration above the action level and an estimated KSD. By
           making the very approximate assumption that the block mean estimate is normal with
           standard deviation equal to the KSD, we can estimate  the Type I error probability
            associated with this block (only a Type I error would be possible in this case, because
            the estimated mean is greater than the action level and remediation will be initiated
            [i.e., HA is the accepted hypothesis]).  Referring to Figure 14 below, the
           approximate Type I error probability for the block in question would equal the chance
           that the kriged mean was observed at a certain point above the action level even though
            the true block mean is less than or equal to the action level.  Assuming the KSD
            describes the variability in the kriged mean, this calculation is equivalent to (1) the
            area under the normal curve above the estimated mean when the curve is centered on
            the action level; or, by symmetry, (2) the area under the normal curve below the
            action level when the curve is centered on the estimated mean.

                        Figure 14.  Approximate Type I Error Rates
              (normal curve centered on the kriged block mean, with the tail cut
               off at the action level)

           Conceptually, if the kriged mean is significantly above  the action level, the entire
           normal curve centered on the estimated mean will essentially be above the action level,
            leading to a negligible false positive rate.  On the other hand, if the kriged mean is
           very close to the action level, almost half the normal curve will be below the action
           level, suggesting a Type I error rate of almost 50 percent.  Clearly, in this latter case
           one would want to place additional second phase samples to improve the  kriged
           estimate and reduce the KSD.  It can also be seen why a large KSD at a particular
           block would not necessarily be associated with a large Type I error probability. If the
           kriged estimate is far above the action level, the distribution of errors with a large
           KSD might still fall almost  entirely above the standard, leading to a small false
           positive probability.

           The case where the kriged block mean is initially below the action level is quite
            similar.  Only Type II errors would be possible for such blocks, because the lower
           kriged estimate leads to a decision not to remediate. The false negative rate  in this
            scenario would correspond to the probability that the true mean is greater than the
           action level even though the kriged mean is observed at a certain point below this
           level.  The exact false negative rate would depend on precisely how far above the
           action level the true mean lies, but the largest possible false negative rate would occur
            when the true mean is essentially equal to the action level.  One can compute this
            probability as the area under a normal curve above the action level when the curve is
            centered at the kriged mean.  In this setting, the false positives
            and false negatives are balanced in the sense that, regardless of which side of the
           action level the kriged estimate sits, the probability of error is greatest when the
           estimate is close to the action level and smallest when the estimate is much smaller or
           much greater than this limit.
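            This per-block approximation can be written compactly, as in the sketch below; the
            kriged means, KSDs, and action level shown are hypothetical:

from scipy.stats import norm

def block_error_probability(kriged_mean, ksd, action_level=100.0):
    # Approximate Type I error if the block is slated for remediation, Type II otherwise,
    # treating the kriged block mean as normal with standard deviation equal to the KSD
    if kriged_mean > action_level:
        return norm.cdf((action_level - kriged_mean) / ksd)       # Type I (false positive)
    return 1.0 - norm.cdf((action_level - kriged_mean) / ksd)     # Type II (false negative), worst case

for m, s in [(160.0, 25.0), (105.0, 25.0), (95.0, 25.0), (40.0, 25.0)]:
    print(m, round(block_error_probability(m, s), 3))   # near 0.5 close to the action level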

 Step 6.    Once the kriged means have been estimated for each remediation unit, the Type I
           errors have been computed for blocks with mean estimates greater than the action
            level, and the Type II errors have been computed for blocks with mean estimates less
           than the action level, the worst single error (note that these will  never exceed 50
           percent unless  an asymmetric error distribution is assumed) is chosen,  and a single
           second-phase sample location is randomly selected from the corresponding block.
           Then the errors are recomputed before the next second-phase  sample location is
           selected, and the worst remaining error chosen.

           The reason for  recomputing the error probabilities after each new sample location is
           chosen has to do with the nature of the kriging algorithm.  Because kriging involves
           a weighted linear combination of the sample data, with weights specifically depending
           on the strength of the spatial covariances between neighboring pairs  of samples, each
           time a new sample is located, the number and arrangement of samples in the vicinity
            of the particular block is affected.  Not only does this tend to improve the kriged mean
           estimate and decrease the KSD for this block, it also affects the KSDs and kriged
           estimates in neighboring blocks. Thus if the worst two errors occur in  side-by-side
           blocks, and a new second phase sample is located in the block with the  worst single
           error,  recomputation of the remaining errors may lead to the worst remaining error
           being located in a different portion of the site.  What initially was the second worst
           error may be dampened substantially by the addition of a second phase sample at the
           neighboring block.

           To avoid the problem of having to actually collect physical second phase  samples one
           at a time in order to iteratively recompute the false positive and false negative errors,
           Englund and Heravi recommend only updating the KSDs, not the kriged mean
           estimates themselves. As shown below, the formula for the kriging standard deviation
           does not depend at all on the actual kriged estimate.  It depends only on the pairs of
           spatial covariances among neighboring points. Since the spatial covariance model is
            already known prior to second phase sampling, a new sampling location can be chosen
          and the KSD recomputed without having to physically collect and analyze the soil
          core. The errors are then recomputed using the updated KSDs along with the first
          phase kriged mean estimates for each block.
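           In outline, the selection loop looks like the sketch below.  The function that re-runs
           the kriging variance calculation for the augmented sample layout is only a placeholder,
           and the data structure and action level are assumptions made for illustration:

from scipy.stats import norm

def approx_error(kriged_mean, ksd, action_level):
    # Normal/KSD approximation from Step 5; the worst case is about 0.5 near the action level
    return norm.cdf(-abs(kriged_mean - action_level) / ksd)

def plan_second_phase(blocks, n_new, recompute_ksds, action_level=100.0):
    # blocks: list of dicts with 'kriged_mean', 'ksd', and 'location' entries (assumed structure)
    chosen = []
    for _ in range(n_new):
        worst = max(blocks, key=lambda b: approx_error(b["kriged_mean"], b["ksd"], action_level))
        chosen.append(worst["location"])            # put the next second-phase sample in this block
        recompute_ksds(blocks, worst["location"])   # placeholder: shrinks KSDs in and around this block
    return chosen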

Step 7.   Once all the second phase sample locations have been selected, the physical samples
          are collected and analyzed. Then ordinary kriging is performed on the combined first
          and second phase data to derive updated estimates for each remediation unit along with
          updated KSDs.  The units actually selected for final remediation are based on whether
          or not the updated kriged mean exceeds the action level. One can also estimate the
           final Type I and Type II error probabilities by using the updated KSDs with the newly
          kriged mean estimates in the manner described above.

           There is no absolute guarantee that all the specific Type I and Type II error rates will
           fall below a pre-specified limit.  However, if the calculation of the total sample size
           needed for the least costly design has been run correctly and approximately 75 percent of the total
          samples have been allocated to the first phase, it will probably be quite difficult to
          improve the overall design or significantly reduce the error rates to any greater degree.
          The Englund and Heravi method does not answer every question that one might want
          to know in advance, but it does appear to be a practical strategy.

5.6.1  What To Do When a Full Site Model is Not Available
       Two important scenarios not covered by Englund and Heravi involve cases where a fully
known or fully estimated site model is not available.  Understandably, many practitioners may not
have the ready expertise to implement a conditional geostatistical simulation (used by Englund and
Heravi to develop a fully estimated site model), even though they may be able to use a software
package like GEO-EAS or GEO-PACK to run a basic kriging algorithm. In these cases, we have
two recommendations:
1.     If the total  sampling budget is fixed,  use Englund and Heravi's results to allocate
       75 percent of the total number of samples to the first phase and 25 percent to the second
        phase.  Once all the samples have been collected, the block-specific Type I and II error
       rates can be computed to see if they fall within tolerable limits.

2.     If the sampling budget is not fixed, but firm error rate targets are, make an educated guess
       as to the number of samples to be allocated to the first phase regular grid. Once the first
       phase samples have been collected and analyzed, follow the procedure described above for
       iteratively locating second phase sample locations, but institute one small change. Instead
        of stopping with a pre-determined number of samples allocated to the second phase,
       keep allocating additional samples until all the remaining error rates have fallen below  the
       targeted limits.
Neither of these two recommendations is ideal. The first does not guarantee pre-specified error
rates for each block, and the second does not specify the total sample size. However, if no other
prior information or sampling data is available from the site, it will be impossible to develop a
fully estimated site model or use the Englund and Heravi method to estimate the total sample size
associated with minimum overall loss (i.e., cost).  In the end, one of these additional constraints
will have to be employed to finalize the sampling design.
                     6. SAMPLING PLANS AND SCENARIOS
       Establishing an efficient and workable sampling plan requires a great deal of forethought
on the part of the planning team.  The following section provides examples of typical soil
sampling scenarios and shows how one might think through the issues unique to each.  The focus
is on soil sampling studies that call for geostatistical methods, because other USEPA guidance
documents provide thorough instructions for scenarios where classical statistical methods may be
used (for instance, see USEPA 1989).

6.1    COMMON ISSUES
6.1.1  Establishing Site Boundaries
       In any soil sampling study there are common issues  that must be addressed. One such
issue is the establishment of well-defined site boundaries.  Just as the target population must be
clearly  defined in a questionnaire survey,  so too the target population  of soils must  be
established.  This latter task is accomplished by setting  clear, unambiguous boundaries.  Many
sites will already have formal boundaries in place.  For those that do not, study
boundaries  should be mapped out, perhaps with the  help  of information on past  waste
management practices or land usage.

       Of critical importance is that all parties agree on the extent of the site, for this defines
the target soil population and the area over which statistical inferences will be made. As noted
earlier, any statement made about the soil distribution is highly influenced by the areal extent
of the soil in question.  A mean concentration for lead estimated in a 1-acre area surrounding
a smelter would be much different from a mean estimated over a 40-acre area surrounding the
same plant.  Any statistical inference must be put into the context of the
population one is targeting, and soils are no exception.
6.1.2  Common Questions To Be Answered
       The  following common questions should be answered before choosing a particular
sampling strategy:
       •  Is there a concrete and practical algorithm for implementing the strategy?
       •  What situations or scenarios are most appropriate for its use?
       •  What are the  advantages and  disadvantages of this method as opposed to other
          possible methods?
       •  What is the relative cost in terms of sampling, measurement,  and analysis of the
          data?
       •  What levels of precision and/or uncertainty should be expected?
       •  How will the collected data need to be analyzed statistically?
       •  Can the amount of required sampling data be estimated ahead of time, and if not,
          what additional information will be  needed to make such an estimate?
       •  Will the data support the decisions that need to be made at the site?

Answering these questions as part of the overall data quality objectives  (DQO)  process will help
tremendously in developing a solid and realistic sampling plan.  However, there are additional
issues of sample design  that tend to be unique to  geostatistical  sampling plans.  The most
important of these are discussed below.

6.2    ISSUES UNIQUE TO GEOSTATISTICAL STUDIES
6.2.1  Grid-Based Sampling
       An important feature of most geostatistical designs is the use of some type of systematic
regular grid in locating sample points (at least those in the first phase).  There are many types
of regular grids, but all feature identical spacing between immediately adjacent grid nodes.  Such
regular spacing serves the practical purpose of allowing field teams to easily locate the next
sample point once a sample point is marked.  It also serves the purpose of providing maximum
spatial coverage of a site by a given number of samples.  This can be crucial because, unlike

data from independent populations, spatially correlated data exhibit areal patterns and distinct
spatial features.  To ensure that such patterns are actually sampled with the highest possible
probability, collecting data on a regular grid is the most efficient approach.

       While all regular grids tend to work reasonably well, there are differences in typical
efficiency depending on the type of grid pattern chosen.  The most common grid types include
square, triangular, and hexagonal patterns, each one creating a set of grid nodes superimposed
on the site and exhibiting repeated geometric patterns of the type suggested by the name (e.g.,
a square grid creates a set of non-overlapping but adjacent squares, each square defined by the
grid nodes at its corners).

       For the same total number of sample points,  the slight differences in geometry resulting
from the use of different grid types lead to somewhat different levels of efficiency and accuracy
when it comes to modeling the pattern of spatial correlation.  The reader is referred to a more
comprehensive study by Yfantis, Flatman, and Behar (1987) for further details.  The authors'
general conclusion is that an equilateral triangular grid worked slightly better for the majority
of cases they studied.  However, this study did not include the effects of a second or additional
phases of sampling.  It is possible that when a multiple-phase sample is planned, the specific
type of first-phase grid may be less important than using the preceding methods to locate
second-phase (or later) samples in locations that most reduce the probabilities of Type I and
Type II errors.

       When a single-phase design is to be used, the type of regular grid is probably more
important.  This will be particularly true if, instead of estimating block means through the use
of kriging, the goal is to locate hotspots of a particular minimum size with specified probability.
In this case, the spacing between adjacent grid nodes is quite critical, and a triangular or
hexagonal design may maximize the spatial coverage of the grid more so than a square or
rectangular design.  Gilbert (1987) and Yfantis et al. (1987) provide further information on this
point.
       To actually carry out a grid-based sampling design, either the total number of (first-
phase) samples or the desired internodal spacing must be chosen in advance.  If only the total
number of samples is known, a potential grid of the desired type should be superimposed on a
map of the site and the internodal distance adjusted as necessary until the grid fits within the site
boundaries and maximizes the spatial coverage.

       The starting point of the grid should also be considered. To introduce a certain degree
of randomness into the sampling procedure, it is generally recommended that the very first node
of the grid be located at random within the subarea defined as follows: construct the polygon
of influence of a sample point fitted as closely as possible into the northwest corner of the
site boundaries.

       For example, using a square grid in a site with rectangular boundaries, the first  sample
point will have a square polygon of influence with side length equal to the internodal spacing.
By fitting such an influence polygon into the northwest corner of the site boundaries and picking
the first grid node location at random from this subarea, the internodal spacing and square design
of the grid can be used to locate all the remaining grid nodes relative to the starting point (see
Figure 15).

6.2.2  Two-Stage Sampling
       Unlike  many more common  sampling plans,  geostatistical  soil studies  are often
characterized by the collection of samples in multiple phases or stages, the most popular example
being the two-phase plan.  The purpose for breaking the sampling into distinct groups is to
enable the investigator to use the information from the first-phase samples to help identify where
the second-phase samples would be most effectively collected.   Often, in the presence of
significant spatial correlation, there are spatial patterns within the soil concentrations that show
up in the  first-phase, which need to  be  detailed more  carefully by prudent  second-phase
sampling.  Taking all samples in a single phase would not allow for an efficient use of a  limited
sampling budget.
[Figure 15.  Example of Randomized Grid: first influence polygon fitted into the NW corner of the site; x = grid nodes]

       One example where a two-phased sampling approach is important and most cost-effective
is in the search for hotspots.  As  noted  before, the statistical uncertainties associated  with
sampling hotspots come not only from  identifying potential hotspot areas  but also correctly
measuring their areal extent.  To minimize both these uncertainties, the usual approach would
be first to collect samples off a systematic  grid in the first phase, where the grid is made "tight"
enough to make it highly probable that a hotspot will be identified, or "hit," if it exists.  Then,
part of the sampling budget would be reserved to collect second-phase samples near and around
any first-phase measurements that qualified as potential hotspots, in order to determine whether
the extreme first-phase measurements really satisfied the minimum-size hotspot requirement.
The second-phase would thus consist of more intensive sampling around selected first-phase grid
nodes, the selection depending on the first-phase results.  In this way, the first-phase samples
are used to help pinpoint where second-phase samples are most needed.

       Of course, there are many ways  to allocate second-phase samples once the first-phase
grid has been sampled. To keep the overall sampling costs down, which of course will depend
on the combined amounts  of first- and second-phase sampling, specific strategies and spacing
patterns  for second  stage  sampling  need to be developed.  One possible strategy mentioned
earlier is to imagine a set of radial "spokes" emanating from any first-phase grid node that
registers as a potential hotspot.  Second-phase samples can be located along these "spokes" to
measure the extent of the  hotspot area.  In fact, if the second-phase samples can be analyzed
sequentially, as might be possible with a field screening method of analysis (see Appendix C),
the samples could be measured in an ever-increasing series of "rings" around the first-phase grid
node, with second-phase sample collection being stopped whenever the samples  in the  last "ring"
indicate concentrations that are no longer at hotspot levels.
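
       As an illustration of the "spokes" and "rings" idea, the following Python sketch generates
candidate second-phase locations around a suspected hotspot node.  The number of spokes, the
ring spacing, and the number of rings are purely illustrative assumptions; in practice they would
be chosen from the minimum hotspot size of interest and the available sampling budget.

# Sketch: candidate second-phase locations along radial "spokes" around a
# first-phase node that registered as a potential hotspot.  The number of
# spokes, ring spacing, and number of rings are illustrative assumptions.
import math

def spoke_locations(node_x, node_y, ring_spacing, n_rings=3, n_spokes=8):
    """Return a list of rings; each ring is a list of (x, y) sample points."""
    rings = []
    for r in range(1, n_rings + 1):
        radius = r * ring_spacing
        ring = [(node_x + radius * math.cos(2 * math.pi * s / n_spokes),
                 node_y + radius * math.sin(2 * math.pi * s / n_spokes))
                for s in range(n_spokes)]
        rings.append(ring)
    return rings

# Ring 1 would be sampled (and screened) first; sampling stops once a ring
# no longer shows hotspot-level concentrations.
rings = spoke_locations(500.0, 500.0, ring_spacing=25.0)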

       Even with a  well-planned scheme for allocating second-phase samples to measure the
extent of potential hotspots, it may not be possible to accurately forecast the total amount of
sampling that will be required.  First, it will not necessarily be known in advance how many
first-phase grid nodes will register as possible hotspots; consequently, the amount of second-
phase sampling required may not be known either.  In this  case, the total expected amount of
sampling would be relative to the number of initially identified potential hotspot areas.  Second,

unless the same number of second-phase samples is collected at each first-stage grid node, the
exact amount of second-phase sampling needed may be uncertain.  Sequential analysis of the
second-phase samples allows one to efficiently "shut down" the sampling when the extent of a
hotspot area has been reached.   Since there is no guarantee that a first-phase sample will
intersect a hotspot at its center, it can also happen that instead of concentric "rings" around the
potential hotspot grid node, the sampling will  have to follow a  "lopsided"  pattern to most
effectively characterize the areal extent.  In either case,  the  necessary intensity  of second-phase
sampling may not be known in advance.

       Another kind  of sequential  sampling strategy would be  important if only  a single
minimum-size hotspot is  necessary  to trigger clean-up of the entire site. If more than one
potential hotspot were identified on the first-phase grid, one  could start with the first possibility
and perform enough second-phase sampling to determine whether or not that particular location
qualified as a hotspot.  If so, remedial action would be mandated regardless of whether any other
hotspots might exist.   If not, the next potential hotspot area could be sampled, and so on until
the first adequately-sized hotspot was located and measured.

       On the other hand,  if every hotspot has to be remediated separately, but resources
allocated for second-phase sampling are thin, consideration should be given to tightening the
first-stage grid to make it probable that any minimum sized hotspot would be hit by at least two
adjacent grid nodes, thus implicitly giving information about the minimum size of the hotspot,
even though no additional second-phase sampling is conducted.  Such a strategy would not be
fail-safe;  if two very small hotspots were positioned in just the right way, two adjacent grid
samples could both be "hits" and yet represent different  patches of extreme concentrations,
neither of which might satisfy the minimum  size requirement.  To  get around this problem, of
course,  some additional second-phase  sampling could still be conducted.  However, it should be
recognized that sufficient tightening of the first-phase grid to intersect each  hotspot with at least
two sample nodes may require significant sampling effort.  On a rectangular grid,  for instance,
halving the distance between adjacent nodes requires approximately a four-fold increase in the
number of samples.
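
       The four-fold rule follows from the fact that the number of nodes on a square grid is
approximately the site area divided by the square of the internodal spacing.  The short sketch
below illustrates the relationship; the 40-acre site area is an assumption used only for the
arithmetic.

# Sketch: approximate number of nodes on a square grid as a function of
# internodal spacing (n is roughly site area / spacing**2).  The 40-acre
# site is an assumption used only to illustrate the four-fold rule.
site_area_ft2 = 40 * 43_560                 # 40 acres expressed in square feet

for spacing_ft in (200, 100, 50):
    n_nodes = site_area_ft2 / spacing_ft ** 2
    print(f"spacing {spacing_ft:>3} ft -> roughly {n_nodes:.0f} nodes")
# Each halving of the spacing multiplies the required number of samples
# by approximately four.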
6.3    SCENARIO #1:  SAMPLING EX SITU SOIL
6.3.1  First Considerations
       The first general scenario to be considered is based on those situations where ex situ
soils, perhaps as part of a waste pile or other well-defined waste/soil segregation area, need to
be analyzed.  The assumption is that soil and/or waste has been moved to the pile or segregation
area from its original location.  It may not be known, however, whether the entire waste pile
has soil concentrations above the applicable health limit or compliance standard.  If the soil
concentrations are not homogeneous in distribution, there could, in fact, exist possible hotspot
areas within the pile, while the rest of the soil is at a much lower level of contamination.

       If the compliance limit is akin to a bright-line standard, it will be assumed that the main
objective is to assess the average concentration of the pile and to determine (1) whether the pile
average is above or below bright-line, and possibly (2) how far above or below bright-line the
average sits.  Despite the possibility of mini-hotspot areas within a given pile, it will also be
assumed that because the soil and/or  waste has been excavated  from the original site and
aggregated elsewhere, no significant spatial correlation exists.  Consequently,  the waste pile will
be treated as a spatial "grabbag," that is, a random or at least quasi-random  mixture of parcels
of soil and waste.

6.3.2  Sampling Considerations
       If the  ex situ soil is assumed to be mostly a random mixture of soils, in many cases
simple random sampling from  the pile might be sufficient to adequately construct a confidence
interval  around the average  and thereby make a  comparison to bright-line.   No special
geostatistical techniques need be employed, unless it is strongly suspected that the pile contains
possible hotspot areas of its own.  Certainly, the process of searching for hotspots via a grid-
based sampling design may be difficult to implement on a conically-shaped waste pile.   Even the
process of  simple random  sampling on such a pile may cause some logistical difficulties.
Random sampling of soil locations implies that all possible locations are selected with equal
probability.   There is no differentiation between locations along  the surface  of the pile and
locations at the bottom of the pile.  In fact, it is quite crucial to have the tools necessary to be

able to collect sample cores from the bottom or the middle of the waste pile.  Otherwise, the
statistical analysis is likely to  give a  biased estimate of the  overall average and  a biased
comparison against bright-line.

       Not all waste piles or other segregated areas of excavated soils are likely to be strictly
random mixtures of soils. Knowing when the segregated soil is  a random mixture and when it
is not can be very important in  designing the sampling plan.  If, for example, an existing pile
at a site  has been built in stages corresponding to different waste management practices over
different  time intervals, it may be advantageous to stratify the pile into distinct sub-areas and to
sample each sub-area independently. After all, the average level  of contamination may be quite
different within each sub-area  (corresponding to a different time period, waste management
practice, or even soil particle size distribution) and the soil concentrations within each of the
sub-areas may be much more homogeneous than the soils found in other sub-areas.

       If this is the case, simple random sampling of the impoundment should not be used.
Rather, stratified random sampling should be employed.  Under stratified random sampling,
samples  are collected at random in each sub-area, statistically weighted, and  then combined
together  with the samples  from the other sub-areas to  form  an estimate  of the overall
impoundment average.  The weight for a given sub-area is usually taken as the ratio of the
volume of that sub-area to the volume of the whole impoundment.  The advantage of doing this
is that the mean estimate based on combining separate strata is often much more precise than a
mean estimate based on simple random sampling.  In other words, a confidence interval around
the mean under stratified random sampling will often be much shorter or tighter than a similar
confidence interval under simple random sampling.  This, in turn, leads to better and more
definitive comparisons against bright-line or any other compliance standard.
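
       A minimal sketch of the stratified calculation is given below.  The three strata, their
volume fractions, and the concentration values are made-up numbers used only to show the
arithmetic; the estimator is the usual volume-weighted combination of the stratum means, with
the variance of each stratum mean computed under simple random sampling within the stratum.

# Sketch: volume-weighted stratified estimate of the pile-wide mean,
# assuming simple random samples within each sub-area (stratum).  The
# strata, volume fractions, and concentrations are made-up illustrative
# values, not data from any real unit.
import numpy as np

strata = {
    # stratum label: (fraction of total volume, sample concentrations in ppm)
    "sub-area A": (0.50, np.array([12.0, 18.0, 9.0, 15.0])),
    "sub-area B": (0.30, np.array([40.0, 55.0, 61.0, 47.0])),
    "sub-area C": (0.20, np.array([110.0, 95.0, 130.0, 120.0])),
}

mean_est = 0.0
var_est = 0.0
for label, (w, x) in strata.items():
    mean_est += w * x.mean()
    # variance of the stratum mean (finite-population correction ignored)
    var_est += (w ** 2) * x.var(ddof=1) / len(x)

print(f"stratified mean estimate : {mean_est:.1f} ppm")
print(f"standard error of mean   : {var_est ** 0.5:.1f} ppm")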

       Since the main focus of this document is on the issues that are unique to sampling soils
within a geostatistical framework, the  basic aspects of simple  random and stratified  random
sampling will not be described in detail.  Rather,  the reader is referred to previous  USEPA
guidance (USEPA 1989), where  these methods are described clearly and accompanied by several

useful examples.  The reader should also note that other physical characteristics of the sampling
process, especially the degree of inherent heterogeneity within the soil itself and the techniques
used to extract and mix samples, can significantly impact the overall statistical representativeness
of the soil samples used in the analysis. An extensive framework for addressing these concerns
is provided in Pitard (1993); see the overview and further references in Appendix B.

6.4    SCENARIO #2: SAMPLING IN SITU SOIL FROM A WASTE SITE
6.4.1  First Considerations
       The general soil sampling plan to be developed for in situ soils at a waste  site will be
based on the goals of the proposed Hazardous Waste Identification Rule (HWIR-media).  In this
setting, it is assumed that decisions are needed not necessarily for the site as a whole, but rather
for individual subareas  of the site that constitute exposure units (EUs).  As referenced earlier,
these EUs might typically be defined as half-acre rectangles, although the potential usage of the
land (e.g., industrial park, school zone) and whether or not this usage is particularly sensitive
(in terms of exposure) could affect the ultimate EU size.

       In each EU, it is assumed that a decision must be made about the mean soil concentration
level for a series of environmental chemicals.  In particular, the hypothesis framework proposed
under the HWIR-media is one in which if the true mean concentration level for any constituent
of concern is greater than the bright-line standard for that chemical, the specific EU and perhaps
the site as a whole will need to be regulated under federal hazardous waste standards. On the
other hand, if the true mean level is less than or equal to bright-line for all of these constituents,
the EU  can be regulated under state or local authority rather than requiring Federal mandates.

       Please note that the proposed solution to this problem will  be applied  to a single
constituent, and does not address the question of the  statistical risks  involved when making joint
inferential statements about an entire series of chemicals.  Such joint statements would have to
incorporate the set of spatial intercorrelations between different pairs of chemical parameters and
the likely pattern of co-occurrence for these chemicals. Such discussions fall beyond the scope
of this  guidance document, although the interested  reader is encouraged to investigate the
technique of co-kriging as a possible partial solution (see Isaaks and Srivastava 1989).

       Note also that the proposed solution treats each EU as a separate unit requiring a separate
decision.  If a decision about the mean concentration for the site as a whole is all that is required,
one  should use the techniques for estimating confidence intervals around  the  global mean
described earlier.  Of course, the global mean concentration could be less than bright-line even
while one or more  EU-specific  means is  greater than bright-line.   None of the subsequent
discussion will directly deal with the statistical risks associated with a slightly different decision
rule: mandate Federal control under the HWIR-media if any single EU mean estimate exceeds
bright-line; allow local or state control only if all EU mean estimates are below bright-line.
Under this latter framework, the false negative rate would have to be defined as the joint
probability that all of the true EU means are below bright-line when one or more of the mean
estimates exceed bright-line.  The false positive rate would similarly be defined as the joint
probability that the true mean in at least one EU is greater than bright-line  when all the EU
mean estimates register below the bright-line.  Such joint probabilities are  beyond the scope of
this document and would require new research on the optimization of sample designs under that
kind of decision criteria.

       The overarching goal, then, of in situ soil sampling is to accurately judge the mean
concentration at each EU while spending as few resources as possible on sample collection and
analysis.   To do this most efficiently  requires consideration of  three major topics and the
interplay between them: the geostatistical technique of kriging, the use of field screening versus
laboratory analytic  measurement methods, and the process of sample compositing to reduce
analytical costs.  One  should keep in mind that efficiency in the sense used here refers to
minimizing the overall "costs" associated with the entire decision-making process.  These costs
include the following:

       •   Cost of collecting each physical sample core
       •   Cost of analyzing either an individual sample core or  a composite formed from two
           or more cores
       •   Cost of regulating a specific EU under Federal rules (this might include remediation
           costs; note that the Englund  and Heravi loss  function  model [1994] counts
           remediation costs as a constant  regardless of the actual contamination level)
       •   Cost of regulating a specific EU under state and/or local authority (if the unit is to
           be left alone in these cases, this cost might involve the impact of any residual
           contamination that is not remediated; in the model suggested by Englund and Heravi,
           the cost of residual, unremediated contamination is assumed to be proportional to the
           actual concentration level).
6.4.2  Site Stratification
       The most practical framework for solving this problem is as follows.  First consider any
and all information on past waste management practices at the  site and any known locations of
past spills  or dumping areas.  Then consider detailed information on site-specific geology and
any site characterization data (see Appendix B for specific geologic factors that should be taken
into account when assessing the impact of historical waste management practices and appropriate
physical sampling techniques).  Combining these two sources of expert judgment information,
determine whether the site can clearly be delineated into broad areas of probable contamination
above bright-line versus  areas of no probable contamination.

       The basic idea behind this last step is to stratify the site, if possible,  into areas where
little or nothing  is likely to be found and other areas  where the contamination level is either
unknown or probably above bright-line. In this way, those areas with no probable contamination
can be sampled in a fundamentally different way than  areas where contamination is probable.
In particular,  the recommended strategy in low contamination areas is to use either (1)  field
screening  methods,  which  generally have lower precision than  laboratory-based analytical
measurement techniques  but which can reliably determine when a sample is below bright-line
levels; (2) composite sampling to cut down on analytical costs if an expensive analytical method
must be used; or (3) a combination of compositing and field screening to lower overall sampling
and analysis costs even more.

       As  will be discussed in more detail in the following pages, compositing  involves creating
a mixture of two or more individual sample cores in order to estimate the approximate average
concentration of the individual samples without having to measure each one separately. Because
it is often difficult to adequately homogenize a composite of soils (see Appendix C) and because
the variability associated with the individual samples making up  the composite cannot accurately

be estimated unless the individual samples are also measured, compositing is only recommended
for determinations when a specific EU mean is likely to be well below bright-line (e.g., the EU
is located within a low contamination zone).  Mean levels in other EUs should be estimated via
the kriging techniques described below.  However, the use of compositing, field screening, or
both in areas of no probable contamination should reduce overall sampling and analysis costs
in significant portions of many sites.

       For  areas  where  contamination is highly  probable,  the  recommended  strategy is
significantly different,  though field screening methods (where applicable) could still be used,
particularly if one implements a multiple indicator kriging (MIK) approach.  The goal in
contaminated areas of the site is to compare an estimate of the mean  in each EU against bright-
line, but to do enough sampling in the right places to keep the probabilities of false positive and
false negative errors within tolerable limits.

6.4.3  Concept of Kriging
       The  basic technique for making EU-specific mean estimates is kriging.  In general,
kriging refers  to  a "weighted-moving-average estimator where the weights  are  chosen to
minimize the estimation (kriging) variance" (Englund and Heravi 1994).  This means that any
kriging estimate  is a weighted linear average of the sample data  located near the unknown
location being estimated.  Furthermore, the weights used in the kriging estimate are chosen so
that the variance between the estimated value and the unknown value is minimized.  The key to
this last step is the existence of a reasonable model describing the pattern of spatial covariance
at the site.  Thus, kriging involves two major tasks:  first, the sample data or other prior site-
specific information must be used to develop a reasonable spatial covariance  model; second, the
basic kriging equations used to determine the appropriate sample weights must be solved with
the help of the  covariance model developed in step 1.

       The details for mastering the art of kriging are too involved  to be adequately described
within a guidance document of the current scope.  Helpful guidance and several useful references
already exist for  someone learning how to krige;  one  of the best books is  by Isaaks and
Srivastava, entitled An Introduction to Applied Geostatistics  (1989). In the following discussion,

we will sketch the process for developing estimates using three types of kriging (ordinary,
indicator, and multiple indicator); however, the reader is strongly encouraged to study the
additional references and to work through the examples provided in the users guides to the
USEPA-developed software packages GEO-EAS and GEO-PACK.  Neither of these packages is
difficult to use, and at least the first two types of kriging mentioned above can be run on both.

6.4.4  Ordinary Kriging
       Ordinary kriging (OK) is the most common and the best known form of kriging.  As
described earlier, OK makes the assumption that the area to be kriged satisfies the second-order
stationarity requirements of a constant (but unknown) mean and a spatial covariance function that
depends only on the separation vector between any two points and not on the location of the
points themselves.  It should be noted in connection with these assumptions that if the site has
first been stratified into areas of probable and no probable contamination through the use of prior
expert geologic knowledge, the chances should improve that second-order stationarity will be
approximately valid in the areas to be kriged.

       OK produces estimates of the actual concentration or trait being measured, so that the
OK estimate is a weighted linear combination of the concentration measurements at nearby
sampled locations.  If we let Z(x) represent the unknown value at location x, this estimate can
be expressed as

       \hat{Z}(x) = \sum_{i=1}^{n} w_i Z(x_i)

       The weights w_i, as noted above, depend explicitly on the location and arrangement of
sample points near to the unknown location x, through the chosen model of spatial covariance.
In the simplest case, this model will depend only on the distance of separation between any two
samples and not on the direction of the separation vector.  Then the spatial covariance model
can be written as C(h), with h representing the separation distance.  With this notation, the set
of weights w_i that best estimate Z(x), in the sense that the estimation variance is minimized, can
be found via the solution of a set of simultaneous linear equations of the form:

       \sum_{j=1}^{n} w_j C_{ij} + \lambda = C_{ix}   (one equation for each i = 1, ..., n),   with   \sum_{j=1}^{n} w_j = 1
where one such equation is formed for each sample location x_i, the sum of the weights w_j is
unity, and as before C_{ij} is the spatial covariance between locations x_i and x_j, while C_{ix} is the
spatial covariance between sample location x_i and the unknown location x.

       To ensure that the weights add up to one, and hence that the estimate for Z(x) is indeed
a weighted average of the nearby sample data, note the Lagrange parameter lambda (\lambda) in the
kriging equations above.  Once the weights have been determined, this Lagrange parameter can
also be expressed as a function of the weights and then substituted into a formula for the kriging
estimation variance, leading to a form of the variance that depends only on the weights w_i and
the spatial covariances between the pairs of sample locations:

       \sigma_{OK}^2(x) = C(0) - \sum_{i=1}^{n} w_i C_{ix} - \lambda

In this last equation, the first term on the right-hand side refers to the estimated global site
variance, which can be computed from the spatial covariance model as C(0).
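
       To make the mechanics concrete, the sketch below builds and solves the ordinary kriging
system for a single unknown location, using a small set of hypothetical samples and an assumed
isotropic exponential covariance model.  The coordinates, concentrations, and model parameters
are illustrative only; in an actual study the covariance (or variogram) model would first be fitted
to the site data.

# Sketch: building and solving the ordinary kriging system for one unknown
# location.  The sample coordinates, concentrations, and the isotropic
# exponential covariance model are illustrative assumptions; in practice
# the covariance (variogram) model is fitted to the site data first.
import numpy as np

def cov(h, sill=1.0, rng=300.0):
    """Assumed isotropic exponential covariance model C(h)."""
    return sill * np.exp(-3.0 * h / rng)

xy = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [150.0, 150.0]])  # sample locations (ft)
z = np.array([40.0, 55.0, 35.0, 70.0])                                   # concentrations (ppm)
x0 = np.array([60.0, 60.0])                                              # location to estimate

n = len(xy)
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)   # sample-to-sample distances
d0 = np.linalg.norm(xy - x0, axis=1)                          # sample-to-unknown distances

# (n+1) x (n+1) system: covariances plus the unbiasedness (sum-to-one) constraint
A = np.zeros((n + 1, n + 1))
A[:n, :n] = cov(d)
A[:n, n] = 1.0
A[n, :n] = 1.0
b = np.append(cov(d0), 1.0)

sol = np.linalg.solve(A, b)
w, lam = sol[:n], sol[n]                         # kriging weights and Lagrange parameter

z_hat = w @ z                                    # kriged estimate at x0
ksd = np.sqrt(cov(0.0) - w @ cov(d0) - lam)      # kriging standard deviation
print(np.round(w, 3), round(float(z_hat), 1), round(float(ksd), 2))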

       The square root  of this estimation variance  is also known as the kriging standard
deviation (KSD) referred to earlier. The KSD tells us something about the estimated precision
of the kriged estimate at location x.  Though, as Isaaks and Srivastava (1989), Joumel (1988),
and Englund and Heravi (1994),  among others, are quick to point out, the KSD is often a less
than robust estimate of variability even when the kriged estimate of Z(x) is acceptable.  These
authors also point out  that the errors  of estimation in kriging are  often  non-normal in
distribution, so that one  should generally be skeptical of trying to use the KSD as a kind of
normal  standard error for constructing confidence intervals.

       The most instructive feature of the KSD is that it does provide an excellent relative
measure of the precision of our kriged estimate compared  to the kriged estimates at other
locations.  Note that even as the kriged estimate changes from location to location, the KSD is
also location-specific.  The KSD will in fact be larger at one estimated location than another
whenever the nearby samples within the "search radius" of the kriging algorithm are either not
as numerous, not as close  to the unknown location x, or arranged in a less symmetrical fashion

around  x.   Since the OK estimate at  location x can also be viewed  as  a kind of linear
interpolation of the nearby samples,  the KSD does a great job of indicating at which estimated
locations the interpolation is less certain and more imprecise.  Because of this fact, one can use
OK  and the  KSD in the way Englund  and Heravi (1994) describe to choose  second-phase
locations where additional sampling will lead to the greatest lowering of the overall uncertainty.

6.4.4.1   Search Radius
       The search radius described above is just what it sounds like.  At any location x to be
estimated (that is, kriged), a pre-specified distance from x is used to locate nearby sample
points.  Those samples falling within the search radius are then used in the kriging equations
above to develop the kriging weights w_i.  The reason for not using all the sample data in these
equations is  that the solution to the kriging equations would be far more  computationally
intensive and time-consuming (often prohibitively so). The fact is that beyond the few nearest
samples, which receive the  vast majority of the total  weight due to  their  proximity  to the
unknown location, other more distant samples generally receive very little kriging weight even
when included in the calculations.  The search radius is then a convenient  and  time-saving tool
for limiting the number of necessary calculations while negligibly affecting the kriging estimates
themselves.
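
       In code, the search radius amounts to a simple distance filter applied before the kriging
system is assembled.  The sketch below assumes NumPy and an arbitrary 250-foot radius chosen
only for illustration.

# Sketch: selecting the samples that fall within the search radius of the
# location x0 to be kriged.  NumPy and the 250-ft radius are assumptions.
import numpy as np

def neighbors_within(sample_xy, x0, radius):
    """Indices of sample locations within `radius` of the point x0."""
    d = np.linalg.norm(np.asarray(sample_xy) - np.asarray(x0), axis=1)
    return np.flatnonzero(d <= radius)

# Only these nearby samples are passed to the kriging equations for x0;
# samples farther away would receive negligible weight in any event.
rng = np.random.default_rng(0)
sample_xy = rng.uniform(0.0, 1600.0, size=(30, 2))   # hypothetical locations
idx = neighbors_within(sample_xy, x0=(800.0, 800.0), radius=250.0)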

6.4.5  Block Kriging

       The discussion above sketches the basic algorithm for estimating the value Z(x) at any
unknown location x.  It does not, however, offer the more important estimate of the mean level
within a specific EU.  Such estimates are known as block estimates, because a mean level is
computed for an entire sub-area of the site rather than at a single location (a point estimate).
Fortunately, the kriging equations developed above can be readily modified to allow easy
computation of block estimates for any size block.

       The key to this modification is understanding that the  spatial covariance between any
point location x and an entire block A is really just an average of the spatial  covariances between
point x  and  each individual  location y  within A.  With this  in mind, the computation of a

covariance C(x,A) can be accomplished practically  by discretizing block A into a series of
individual locations y, and then computing the individual covariances C(x,y) for each point.  The
average of all these values C(x,y) will then equal the desired covariance C(x,A).  Of course, in
a program such as GEO-EAS or GEO-PACK one has only to specify the block size and the
number of discretizing points; the program does the rest of the calculation automatically.
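
       The discretization step can be sketched as follows.  The exponential covariance model
and the 4-by-4 discretization of the block are assumptions made for illustration; GEO-EAS and
GEO-PACK carry out the equivalent calculation internally once the block size and number of
discretizing points are specified.

# Sketch: approximating the point-to-block covariance C(x, A) by
# discretizing block A into a regular set of points and averaging the
# point-to-point covariances.  The exponential covariance model and the
# 4 x 4 discretization are assumptions made for illustration.
import numpy as np

def cov(h, sill=1.0, rng=300.0):
    """Assumed isotropic exponential covariance model."""
    return sill * np.exp(-3.0 * h / rng)

def point_block_cov(x, block_min, block_max, n_disc=4):
    """Average covariance between point x and a rectangular block."""
    gx = np.linspace(block_min[0], block_max[0], n_disc)
    gy = np.linspace(block_min[1], block_max[1], n_disc)
    pts = np.array([(px, py) for px in gx for py in gy])
    d = np.linalg.norm(pts - np.asarray(x, dtype=float), axis=1)
    return cov(d).mean()

# C(x, A) for a sample located 50 ft west of a 400 ft x 400 ft exposure unit
c_xA = point_block_cov((-50.0, 200.0), (0.0, 0.0), (400.0, 400.0))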

       Using the above notation, the kriging estimate for the mean concentration in a block A can
be written:

       \hat{Z}(A) = \sum_{i=1}^{n} w_i Z(x_i)

where the weights are based on the block covariances instead of just the point covariances.
Likewise, the kriging equations undergo a slight modification to become:

       \sum_{j=1}^{n} w_j C_{ij} + \lambda = C_{iA}   (one equation for each i = 1, ..., n),   with   \sum_{j=1}^{n} w_j = 1

Finally, the block kriging variance can be expressed as:

       \sigma_{BK}^2(A) = \bar{C}_{AA} - \sum_{i=1}^{n} w_i C_{iA} - \lambda

where the first right-hand term \bar{C}_{AA} is the average spatial covariance between any two points
selected from block A.  In ordinary point kriging, by contrast, this term is the global variance
at the site, which also represents the average covariance between any two points taken from the
site as a whole.

       These equations allow one to develop weights w_i for a linear interpolation of the sample
data nearest the block being estimated, and to estimate the (relative) precision of this mean
estimate through the block KSD.  They also provide an overview of the general kriging
algorithm and its power to determine local estimates for any sub-area or individual location at
the site.  Unfortunately, since much soil concentration data is non-normally distributed and the
KSD is not a very robust estimate of precision in the face of such data, we do not recommend
OK for the general soils sampling problem introduced above unless one is well-versed in the

finer points of kriging.  Instead, the foregoing description of OK will prepare us adequately to
introduce a second, somewhat simpler and yet more robust form of kriging known as indicator
kriging (IK).

6.4.6  Indicator Kriging
       Indicator kriging (IK) is very similar to OK in almost every way except one:  instead of
estimating the true concentration value at an unknown location x (i.e., Z(x)), one estimates the
probability that the concentration at x is below a particular cutoff or threshold c.  The reason
for the tremendous similarity is that simple IK is identical to running OK after first using an
indicator transformation on the concentration data, Z(x).  The indicator transformation changes
each value Z(x) into either a 1 or a 0, depending respectively on whether Z(x) is at or below
the cutoff c or above it.  In fact, the indicator function can be defined for any cutoff c as:

       I_c(x) = \begin{cases} 1 & \text{if } Z(x) \le c \\ 0 & \text{if } Z(x) > c \end{cases}
       An important distinction between IK and OK is that by using the indicator function to
transform the data Z(x) into a  series of Os and Is,  information is lost about the  actual
concentration magnitudes.   Thus, IK retains less  information  about the distribution of
concentrations than OK.  For a given cutoff, one only knows using IK whether or not the
concentration at a particular sample location x is greater than the cutoff.  To  get a more
complete picture of the distribution of Z(x), one would need to define a large series of cutoffs
{c_1 < c_2 < ... < c_k} and a corresponding set of indicators I_{c_j}(x).  For example, if nine
cutoffs were defined at 5, 10, 15, ..., 45 ppm and the true concentration at x were, say, 17 ppm,
we could express the vector of indicators I_c(x) as {0,0,0,1,1,1,1,1,1}.  Looking at this vector
of indicators, one could quickly determine, without knowing Z(x), that the concentration must lie
somewhere between the third cutoff of 15 and the fourth cutoff of 20.  Note that the estimate
could be made even more precise by defining additional cutoffs and corresponding indicators.
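
       The indicator transformation itself is a one-line calculation.  The sketch below shows it
for a single bright-line cutoff and for a series of cutoffs; the 100 ppm cutoff and the sample
concentrations are illustrative assumptions.

# Sketch: the indicator transformation for a single cutoff and for a series
# of cutoffs.  The cutoffs and concentrations are illustrative assumptions.
import numpy as np

def indicator(z, cutoff):
    """1 where the concentration does not exceed the cutoff, 0 otherwise."""
    return (np.asarray(z) <= np.asarray(cutoff)).astype(int)

z = np.array([12.0, 17.0, 48.0, 130.0, 260.0])     # measured values, ppm

# Single cutoff at an assumed 100 ppm bright-line level
print(indicator(z, 100.0))                         # -> [1 1 1 0 0]

# A series of cutoffs (5, 10, ..., 45 ppm) applied to a single value of
# 17 ppm reproduces the indicator vector discussed in the text
print(indicator(17.0, np.arange(5, 50, 5)))        # -> [0 0 0 1 1 1 1 1 1]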

       For the simple case where only one indicator has been defined for a single cutoff, much
information about Z(x) is lost. However, the loss of information also helps to make estimates
derived from IK more robust than those from OK.  Instead of having to estimate the precise
concentration value Z(x), one has only to estimate whether or not Z(x) is greater than a single
cutoff.  And by choosing the cutoff at a level equal to bright-line (or, in other cases, a
compliance or clean-up standard), one can make inferential statements about whether or not a
given point location or block exceeds bright-line using IK.

       Note in this  connection that the difficulty in precisely estimating a concentration value
at an unknown location x rises all the more when the values at sample locations are only known
approximately. This will almost always be the case to some degree for concentration data, since
measurements on any chemical can only be made with a certain degree of precision.  The worse
the precision of the analytical method being used, the greater the level of  uncertainty
surrounding the  sampled values and the greater the implicit uncertainty in the kriged estimate
Z(x).  However,  for an indicator cutoff set equal to bright-line, the degree of uncertainty about
the indicator value may be much smaller.

       For example, suppose that a sample value at location x is measured as Z(x)=30 ppm
with an analytical precision standard deviation equal to 5  ppm.  Assuming that the analytical
errors of measurements are symmetric and normally distributed, the true concentration value
would have very low probability of exceeding a bright-line cutoff equal to 50 ppm (since the
observed measurement is approximately four standard deviations below bright-line), although
the true value could easily fall anywhere within the range of 20 to 40  ppm. Thus, there are
many cases where a measured value can be placed with great certainty above or below a specific
cutoff, even if the measurement  itself is not  very precise.  As a consequence, IK can be used
to generate fairly robust results even  in those cases where OK might give more uncertain
estimates.
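
       The worked example above can be checked directly.  The sketch below assumes normally
distributed analytical errors and uses the SciPy normal distribution only to compute the tail
probability.

# Sketch: probability that the true concentration exceeds the bright-line
# cutoff, given the measured value and the analytical standard deviation.
# Normal measurement errors are assumed; SciPy supplies the tail area.
from scipy.stats import norm

measured = 30.0        # ppm
sigma = 5.0            # analytical standard deviation, ppm
bright_line = 50.0     # ppm

p_exceed = norm.sf((bright_line - measured) / sigma)
print(f"P(true value > {bright_line:.0f} ppm) = {p_exceed:.1e}")   # about 3e-05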

       A specific implication of the above discussion is that IK can be considered for use even
when it may be too expensive to do precise analytical laboratory testing on a large number of
samples.   The  earlier discussion on decision errors and sampling costs noted  that typical
geostatistical studies often require sample sizes in the range of a few dozen to a few hundred
samples, depending on the degree of spatial correlation at the site. If the analytical laboratory
measurement method of choice is very expensive, a significant portion of the study resources
might have to be devoted to the analysis of soil samples.  Rather than collecting and analyzing
only a few samples because of the prohibitive analytical costs, one might instead use a less
precise but far cheaper field screening method (see Appendix C for a list of such methods); the
bright-line standard may be high enough to allow acceptable certainty about the indicator
function even though switching to the field method decreases the measurement precision.  And
by using a much cheaper measurement technique, the number of samples that can be collected
within budget can be raised to tolerable ranges for efficient kriging.  Fortunately, this appears
to be the case for many of the constituents with bright-line numbers listed in the HWIR-media
proposal.

6.4.6.1   Performing Indicator Kriging
       The first step in performing an IK study to assess compliance with bright-line is to
transform the sample concentration measurements into a series of indicator values, using a cutoff
equal to the bright-line level.  Then the sample data can be written in the form I_c(x) instead of
Z(x) for each location x.  Once this is done, the remaining steps are identical to OK.  First, the
sample indicator data are used to develop a spatial covariance model C_I(h) for the pattern of
differences between pairs of indicator values separated by distance vector h.  Note that the forms
of such indicator covariance models are more highly constrained than in the case where the
actual concentrations are used.  This follows because the difference between any two indicator
values can only equal -1, 0, or 1.  Therefore, it is often easier to develop a reasonable indicator
covariance model than a covariance model based on concentration values.
       Once the indicator covariance model C_I(h) has been developed, the same kriging equations
that were developed for OK can be solved to find the best kriging weights w_i.  These include
the IK equations, expressed as:

       \sum_{j=1}^{n} w_j C_I(x_i, x_j) + \lambda = C_I(x_i, x)   (one equation for each i = 1, ..., n),   with   \sum_{j=1}^{n} w_j = 1

the IK variance, expressed as:

       \sigma_{IK}^2(x) = C_I(0) - \sum_{i=1}^{n} w_i C_I(x_i, x) - \lambda

and the kriged estimate itself, which, as in OK, represents a weighted linear average of the
sample indicator data close to the location being estimated.  This last estimate has the form:

       \hat{I}_c(x) = \sum_{i=1}^{n} w_i I_c(x_i)
       Unlike OK, where the kriged estimate can be interpreted as the estimated concentration
at location x (or, in the case of ordinary block kriging, as the mean concentration within the
block), the kriged estimate in IK will generally not be a value equal to 0 or 1.  Instead, the
estimates are likely to be fractions between 0 and 1, and thus not directly comparable to the
indicator data.  Because each IK estimate is a weighted average of other indicator values, it
fortunately has a simple and extremely useful interpretation.  To see this, recall that each 0-
valued indicator represents a sample point where the concentration is greater than the cutoff,
while each 1-valued indicator is a sample point where the concentration is no greater than the
cutoff.  For an unknown location x surrounded closely by mostly 0-valued indicators, the
probability would seem high that, if the nearby concentrations are mostly greater than the cutoff,
so too is the unknown concentration at x.  Such is the impact of positive spatial
correlation.  On the other hand, an unknown location surrounded mostly by 1-valued indicators
would seem likely to have a concentration less than the cutoff for the same reason, again
encouraged by the presence of positive spatial correlation.
       Now in the first case above, the weighted average of the surrounding indicators is likely
to be close to 0, but  will be identically 0 only if all the sample indicators within the search
radius are 0-valued.  Likewise, in the second case,  the weighted average is likely to be close to
1 but not identically equal to one.  A third case could include those locations x surrounded by
an equal mixture of 0-valued and 1-valued indicators.  In this case, the weighted  average is
likely to be closer to 0.5 than to either 0 or 1.

       All in all, the weighted averages of the sample indicators offer a probability-based scale
for estimating whether the actual indicator value  at an unknown location x is either above or
below the cutoff. If the kriged value is close to 1, the probability is high that indeed Z(x) is less
than the cutoff.  If the kriged value is close to 0, the probability is low that Z(x) is less than the
cutoff.   And if the kriged value is close to 0.5, we are completely uncertain one way or the
other (i.e., there  is  about a 50-50  chance that the true  indicator could  go either way).
Therefore, any estimate from an IK study can be specifically interpreted as the probability that
the true concentration does not exceed the chosen cutoff.

6.4.7  Block Indicator Kriging
       As with point locations, all the comments and formulas concerning block OK translate
directly to block IK once the indicator transformation has been made and an indicator spatial
covariance model has been chosen.  One has only to substitute the indicator sample data I_c(x_i)
for the concentrations Z(x_i), and the indicator covariance function C_I(h) for the concentration
covariance function C(h).  The only real difference is in the interpretation of the weighted average
estimate for a given block.  With IK, the kriged block estimate represents the estimated
probability that the true block mean concentration does not exceed the cutoff (e.g., bright-line).
Therefore, it is possible to directly use the block IK estimates to assess compliance with a
bright-line standard and to estimate the false positive and false negative risks associated with
these decisions.

       One simple compliance strategy would be  the following:
       Step 1.       Collect and  analyze  a single phase of soil samples at the nodes of  a
                     regular  grid,  superimposed  upon  that  portion  of the  site  where
                     contamination is strongly suspected.  Translate the sample concentrations
                     into indicator  data with a  cutoff equal to the bright-line standard and
                     choose an appropriate indicator covariance model.

       Step 2.       Divide the potentially contaminated area into a  series of adjacent non-
                     overlapping blocks equal in size to an EU.  Then use block IK to estimate
                     the probability of non-exceedance at each block.

       Step 3.       Classify EU block estimates with probabilities no greater than 0.5 as units
                     where the mean concentration exceeds bright-line, and estimates with
                     probabilities larger than 0.5 as units where the mean concentration does
                     not exceed bright-line.

       Step 4.       Compute first-phase false positive and false negative error rates as
                     follows.  For units classified in Step 3 as exceeding bright-line
                     (representing an acceptance of the null hypothesis that mean levels at the
                     EU require Federal control), the false negative rate can be estimated
                     simply as the indicator block estimate, for this estimate is the probability
                     that the unit mean is actually less than bright-line, even though it has been
                     classified as exceeding the standard.  Similarly, for units classified in Step
                     3 as not exceeding bright-line (representing a rejection of the null
                     hypothesis), the false positive rate can be computed as one minus the
                     indicator block estimate (that is, 1 - I(EU)), for this estimate is the
                     probability that the unit mean exceeds bright-line, even though it has been
                     classified otherwise.

                     Please note that both of these error rates can have a maximum value near
                     50 percent, occurring precisely when the indicator block estimate itself is
                     near 0.5.  This should make sense, because in these cases the chances are

                    approximately 50 percent that the true mean concentration exceeds bright-
                    line and 50 percent that it does not.  Thus we should expect the greatest
                    degree of overall uncertainty.

       Step 5.       To minimize the block-specific decision errors, choose the worst single
                     estimated Type I or Type II error and pick a second-phase sample location
                     at random from the offending EU.  Unlike the Englund and Heravi
                     method outlined earlier, since the Type I and Type II error rates are
                     estimated from the IK estimates themselves and not just from the kriging
                     standard deviation, this new sample must be measured prior to locating
                     other second-phase samples.  However, the rest of Englund and Heravi's
                     (1994) algorithm can be employed to iteratively recompute the estimated
                     block error rates and pick successive second-phase sample locations one
                     by one in the same fashion until either the sampling budget is exhausted
                     or all the estimated decision errors are within acceptable bounds.

       Step 6.       Once all second-phase samples have been located and measured,
                     recompute the block IK estimates one last time using the combined set of
                     first-phase and second-phase samples.  Reclassify each EU block
                     according to the method in Step 3 and compute the associated block-
                     specific Type I and Type II error rates as in Step 4.

                     If all error rates are within acceptable bounds, the geostatistical analysis
                     is finished.  If not, one could consider an additional phase of sampling
                     similar to phase two in order to refine the estimates further.  Note also
                     that all of the above geostatistical analysis can be done in a straightforward
                     manner in a package such as GEO-EAS or GEO-PACK simply by
                     converting the concentration data to indicator form (GEO-EAS provides
                     this as a built-in capability) and running ordinary block kriging on the
                     transformed data.  We present a simple example below.
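
       The classification and error-rate calculations in Steps 3 and 4 reduce to a few lines of
arithmetic once the block IK estimates are in hand.  In the Python sketch below, the block
estimates are made-up values standing in for the output of an ordinary block kriging run on the
indicator data.

# Sketch: Steps 3 and 4 applied to hypothetical block IK output.  Each
# value is the estimated probability that the true EU mean does NOT exceed
# bright-line; these numbers are invented for illustration only.
block_ik = {"EU-01": 0.92, "EU-02": 0.48, "EU-03": 0.10, "EU-04": 0.55}

for eu, p_below in block_ik.items():
    if p_below <= 0.5:
        # Classified as exceeding bright-line; the estimated false negative
        # rate equals the indicator block estimate itself
        print(f"{eu}: exceeds bright-line, est. false negative rate = {p_below:.2f}")
    else:
        # Classified as not exceeding bright-line; the estimated false
        # positive rate is one minus the indicator block estimate
        print(f"{eu}: below bright-line, est. false positive rate = {1 - p_below:.2f}")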
6.4.8  Indicator Kriging Example
6.4.8.1   Site Background and Initial Sampling
       The Beryllium Products Company (BPC) formerly produced beryllium, beryllium alloys,
and beryllium products at its west Texas facility for use in the defense and electronics industries.
As part of its  manufacturing operations, BPC generated wastewater treatment sludge listed as
USEPA hazardous waste No. F006.  These wastewater treatment sludges were drummed and
managed offsite as hazardous waste; however, filtered wastewater and numerous other non-listed
waste streams  were discharged to a small onsite lagoon,  and additional treatment of these
wastewaters led  to the deposition of additional F006 and beryllium-contaminated sludges.
Discharges to  the lagoon occurred in the period between April 1975 and September 1980.  In
September 1980, the residual wastewater was drained from the lagoon. The resulting unit was
approximately  160 feet long by 320 feet wide and 5 feet deep and located approximately 600 feet
from  a river that flows northeast past the facility, forming its northern boundary.

       In September 1989, a Resource Conservation and Recovery Act (RCRA) Facility
Assessment (RFA) was conducted at BPC.  The RFA identified two solid waste management
units (SWMUs) at the facility, including the onsite lagoon, SWMU #1 (see Figure 16).  Based
on information gathered during the RFA, the former lagoon was considered to pose imminent
danger to human health and the environment.  A limited sampling effort was undertaken to
characterize the remaining sludge in the unit.  Two months following the sampling effort, 51,000
cubic feet of residual sludge was excavated and disposed offsite as hazardous waste.

       A RCRA Facility Investigation (RFI) was initiated at the site in 1992.  Because
prevailing winds at the facility are from the southwest to the northeast, wind erosion from the
landfill and leaching of the waste from the landfill to the river were determined to be possible
contaminant transport pathways.  A soil sampling plan was designed and implemented to
determine the distribution of beryllium in shallow soils downwind and downstream of SWMU
#1.  The samples were not collected systematically, but were chosen by expert judgement from
a subset of nodes of a regular grid placed over the site.  A finer sampling grid was established
for the immediate vicinity of the excavated lagoon to evaluate the beryllium contamination
remaining from the excavated sludges.  (Note the orientation of the finer scale grid and its

                              Figure 16.  Site Layout

       [Site map of the BPC facility showing SWMU #1, soil sample locations, and grid
        locations for soil sampling.]
relationship to the SWMU orientation.  Grids oriented along the directions of greatest
and least spatial continuity allow for the construction of variograms that best describe the
correlation structure of the data, which in turn leads to more accurate estimates of that structure
and to successful development and modeling of the variogram for the site.)

       Initially, 30 samples were collected and tested for beryllium contamination.  For purposes
of this example, the action level of interest under the HWIR-media was assumed to be 100 ppm
and the size of each exposure unit (EU) was assumed to be approximately 400 feet by 400 feet.
Overall, there are 15 such EUs that lie essentially within the boundaries of the study area.  For
each EU, the goal was to determine whether the average concentration level lies above or below
the 100 ppm action level.  Any EU with an average level above 100 ppm would need to be
regulated under Federal hazardous waste mandates.  EUs with average levels below 100 ppm
could be managed under state or local regulations.

       In  addition, given the statistical uncertainty associated with any decision about  a
particular EU, it was required up front that the estimates of average beryllium levels be made
so that there would be no greater than a 20 percent probability of a Type I or Type II error.
In other words, there should be at most a 20 percent chance that a particular EU would be
incorrectly classified as being either above or below the action level.  Such a constraint on the
probabilities of possible decision errors usually necessitates additional sampling, but goes a long
way toward establishing objective statistical measures that all interested parties can agree upon.

6.4.8.2    First-Phase Indicator Kriging
       At the BPC site, block IK was used to analyze the original 30 samples, using an indicator
cutoff level of 100 ppm.  The first steps in this analysis were to construct a postplot of the
beryllium concentrations in the geostatistical software package GEO-EAS (see Figure 17), which
plots the data values on a simple map reflecting the geographic coordinates of each sample, and
then to transform the data into a series of indicator 0s and 1s using the 100 ppm cutoff (see
Figure 18).  These indicator data, and not the original beryllium concentrations, were used to
estimate a  sample  indicator variogram (see Figure 19).
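       The indicator transform itself is a one-line rule.  A minimal sketch (in Python, using
hypothetical concentration values rather than the actual BPC measurements) follows the
convention of Figure 18, in which an indicator of 1 denotes a concentration below the 100 ppm
cutoff:

          # Indicator transform at the 100 ppm bright-line; GEO-EAS provides the
          # same conversion as a built-in option.  Values below are hypothetical.
          CUTOFF = 100.0   # ppm

          def indicator(z, cutoff=CUTOFF):
              """1 if the measured concentration is below the cutoff, else 0."""
              return 1 if z < cutoff else 0

          beryllium_ppm = [21.0, 11.4, 79.5, 172.0, 356.0, 500.0, 3.0]
          print([indicator(z) for z in beryllium_ppm])   # [1, 1, 1, 0, 0, 0, 1]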

                              Figure 17.  Beryllium Data Postplot

       [Postplot of the beryllium data on Easting (100 ft) versus Northing (100 ft) axes, with
        sample values plotted by quartile:  1st quartile (+), 3.0-8.0; 2nd quartile (X), 8.0-21.0;
        3rd quartile (O), 21.0-172.0; 4th quartile (*), 172.0-500.0 ppm.]
                              Figure 18.  Postplot of Indicator Data

       [Postplot of I(x) for bright-line = 100 ppm:  indicator 1 where concentration < 100 ppm,
        indicator 0 where concentration > 100 ppm.]
                              Figure 19.  Sample Indicator Variogram

       [Sample variogram for I(x) plotted against distance (0 to 20 lag units).  Parameters:
        423 pairs; direction 0.000 with tolerance 90.000; MaxBand n/a; indicator minimum 0.00,
        maximum 1.000, mean 0.667, variance 0.22222.]
       As discussed earlier, the sample variogram is essentially the average squared difference
between indicator values when pairs of sample points, each having roughly equal separation lags,
are grouped together.  Because indicator values can only be 0 or 1, the squared difference for
any pair must also be either 0 or 1.  When an average is taken over a group of such pairs, the
result will necessarily be between 0 and 1, as can be seen along the vertical axis of the sample
variogram for the BPC indicator data.  One thing to watch for when creating a sample variogram
in a program such as GEO-EAS is that the user can choose how many pairs of roughly equal
separation lags should be grouped together before averaging by adjusting the "lag tolerance."
Often, if a sample variogram looks too ragged, without any discernible smooth pattern, it can
be helpful to reduce the number of approximate lags (i.e., increase the lag tolerance) and
consequently increase the number of pairs used in averaging at each lag.
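       The grouping-and-averaging step can be sketched as follows.  This is a simplified Python
illustration of the calculation, not the GEO-EAS implementation; the coordinates and indicators
are hypothetical, and the conventional one-half factor of the semivariogram is included:

          # Sample variogram for indicator data.  Pairs are pooled into lag classes
          # of width lag_width; widening lag_width mimics increasing the lag tolerance.
          import numpy as np

          def sample_variogram(coords, values, lag_width, n_lags):
              coords = np.asarray(coords, dtype=float)
              values = np.asarray(values, dtype=float)
              sums = np.zeros(n_lags)
              counts = np.zeros(n_lags, dtype=int)
              n = len(values)
              for i in range(n):
                  for j in range(i + 1, n):
                      h = np.linalg.norm(coords[i] - coords[j])   # separation distance
                      k = int(h // lag_width)                     # lag class for this pair
                      if k < n_lags:
                          sums[k] += 0.5 * (values[i] - values[j]) ** 2
                          counts[k] += 1
              gamma = np.full(n_lags, np.nan)
              gamma[counts > 0] = sums[counts > 0] / counts[counts > 0]
              return gamma, counts

          coords = [(0, 0), (1, 0), (0, 2), (3, 1), (4, 4)]   # hypothetical locations
          ind = [1, 1, 0, 0, 1]                               # hypothetical indicator data
          print(sample_variogram(coords, ind, lag_width=2.0, n_lags=3))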

       Once a reasonably smooth pattern is discernible in the sample variogram, an analytical
model for the variogram must be chosen.  GEO-EAS supplies a very helpful interactive
modeling module that can be used to experiment with different variogram choices to see which
ones best fit the sample variogram values.  For the BPC indicator data, a simple spherical
model was selected with nugget parameter equal to 0.03, sill set to 0.31, and range equal to 13
distance units (i.e., 1,300 ft).  The final model is portrayed in Figure 20 and indicates an
adequate fit to the sample variogram pattern.
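       For reference, the fitted model can be written out directly.  The sketch below (in Python)
treats the quoted sill of 0.31 as the variogram's highest point, consistent with the definition of
the sill in the following paragraph:

          # Spherical variogram model with the parameters fitted above:
          # nugget 0.03, total sill 0.31, range 13 distance units (1,300 ft).
          import numpy as np

          def spherical(h, nugget=0.03, sill=0.31, rng=13.0):
              h = np.asarray(h, dtype=float)
              r = np.clip(h / rng, 0.0, 1.0)
              gamma = nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)
              return np.where(h > 0, gamma, 0.0)   # exactly zero at zero separation

          print(spherical([0, 4, 8, 13, 20]))      # levels off at 0.31 beyond the 13-unit range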

       Note that variograms are essentially inverted covariance models.  In the presence of
strong spatial correlation, the spatial covariance will tend to be largest for small separation lags
and gradually decrease (usually to zero) at larger lags.  The variogram, on the other hand, tends
to be an increasing function of the lag separation.  The nugget value occurs at zero lag
separation and represents the highest degree of spatial correlation in the data; accordingly, lower
nuggets suggest greater spatial correlation than higher nuggets.  In similar fashion, the
variogram sill (i.e., its highest point) represents the level of least spatial correlation.  Roughly
speaking, the smallest lag at which the sill is reached is labeled the range of the variogram.
This lag is the separation distance between any pair of sample locations beyond which there is
no detectable spatial covariance.
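       The inverted relationship can be checked numerically:  under second-order stationarity the
covariance at lag h is simply the sill minus the variogram value at that lag.  A minimal sketch,
using a few approximate values read from the fitted model above:

          # C(h) = sill - gamma(h):  covariance is largest at small lags and falls
          # to zero at and beyond the range.  Variogram values are approximate
          # readings from the spherical model (nugget 0.03, sill 0.31, range 13).
          sill = 0.31
          gamma = {0.0: 0.00, 2.0: 0.09, 6.0: 0.21, 13.0: 0.31, 20.0: 0.31}
          covariance = {h: round(sill - g, 2) for h, g in gamma.items()}
          print(covariance)   # {0.0: 0.31, 2.0: 0.22, 6.0: 0.1, 13.0: 0.0, 20.0: 0.0}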
                              Figure 20.  Model Indicator Variogram

       [Fitted variogram model overlaid on the sample variogram for I(x), plotted against
        distance (0 to 20 lag units); same data parameters as Figure 19.]
       Using the final variogram model, the block IK equations were used repeatedly to estimate
an average indicator value for each 400-foot by 400-foot block at the BPC site.  The results are
presented in Figure 21.  Some blocks on the figure are labeled as missing because they fell
either completely or mostly outside the site boundaries.  The remaining blocks have estimated
values between 0 and 1, representing the estimated probability that a particular block has an
average beryllium concentration below the 100 ppm action level.  This interpretation follows
from the discussion of IK results in the previous section.

6.4.8.3    Block Classification
       Classification of each block (i.e., EU) occurs according to its IK estimate.  If the
estimate is less than 0.5, the probability is greatest that the true average beryllium level is above
100 ppm, and so the EU would require Federal control under the HWIR-media.  On the other hand, if
the estimate is greater than 0.5, the probability is greatest that the true average is below 100
ppm and hence the EU could be handled by state or local authority.
       Of particular interest, most of the estimated blocks have values between 0.8 and 1.0.
For these EUs, IK estimates that the probability is at least 80 percent that the mean beryllium
level is below 100 ppm.  Furthermore, given a decision to treat these EUs as only needing state
or local control (that is, accepting the alternative hypothesis HA that the true mean level is less
than 100 ppm), the estimated Type I error rate for each block is no greater than 20 percent (i.e.,
one minus the IK estimate).  In similar fashion, two blocks in the bottom row of Figure 21 have
IK estimated probabilities between 0.0 and 0.2.  These EUs will be treated as falling under
Federal jurisdiction as per the HWIR-media, since the probability is so low that the true mean
beryllium level is below 100 ppm.  Furthermore, given the decision to accept H0 for these
blocks, the Type II error rate is identically equal to the IK estimate (that is, no greater than 20
percent), since a false negative mistake would occur whenever the true beryllium mean is less
than the action level but the block is classified as being above it.

       All of the EUs discussed so far can be classified within the required error bounds (i.e.,
with a Type I or Type II error no greater than 20 percent).  Three remaining blocks located near
SWMU #1 are not so certain.  Their estimated IK probabilities fall between 0.2 and 0.8 and

                              Figure 21.  First Indicator Kriging Results

       [Block map of the original indicator kriging results (no added second-phase samples) on
        Easting (100 ft) versus Northing (100 ft) axes, with each block shaded by estimated
        probability:  0.8-1.0, 0.6-0.8, 0.4-0.6, 0.2-0.4, 0.0-0.2, or Missing.]
therefore have estimated Type I or Type II errors greater than 20 percent.  For instance, the first
block in the next-to-last row has an estimated IK value of 0.519.  Though this EU can be
classified as having a true mean less than the action level, the chance of a Type I error is
approximately 48 percent (i.e., 1 - 0.519 = 0.481).

6.4.8.4    Second-Phase Sampling
       To reduce the uncertainty associated with these ambiguously classified EUs, additional
samples must be collected from the BPC site and the IK results reanalyzed.  As mentioned
earlier, the one significant drawback to using IK for classification of block averages is that each
new sample must be measured and reanalyzed statistically before collecting the next second-
phase sample.  In practical terms, the process works as follows.  First, the two blocks with the
largest Type I and Type II error rates should be identified.  The largest Type I error will
accompany the block with estimated IK probability closest to, but exceeding, 0.5.  Similarly,
the largest Type II error will accompany the block with estimated IK probability closest to, but
no greater than, 0.5.

       For the BPC data, the second block on the next-to-last row has an estimated IK value of
0.334.  It is classified as having a true mean greater than 100 ppm (consistent with H0), but it
also possesses the largest estimated Type II error, at about 33 percent.  The first block on the
same row is even more uncertain since, as mentioned above, the approximate Type I error is
just over 48 percent.  Picking the block with the single largest error probability (i.e., 48
percent), a new random sample location was selected from the associated EU, with a measured
value of 15.5 ppm (see Figures 22 and 23 for postplots of the augmented data set).  This
augmented data set was then reanalyzed using IK as before, with block averages re-estimated
for each EU based on the new sample information.  Finally, the largest Type I and Type II
errors were again determined to see if all the error probabilities were under 20 percent.
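       The overall second-phase procedure can be summarized in skeleton form.  In the sketch
below (Python), run_block_ik() and collect_sample() are hypothetical placeholders for the
site-specific kriging run (e.g., in GEO-EAS) and for collecting and measuring one new field
sample; they are not functions of any existing package:

          # Skeleton of the one-sample-at-a-time second-phase procedure.
          def phased_sampling(data, run_block_ik, collect_sample,
                              target_error=0.20, budget=20):
              for _ in range(budget):                  # stop when the budget runs out
                  estimates = run_block_ik(data)       # one IK estimate per EU block
                  errors = [1.0 - p if p > 0.5 else p  # Type I if p > 0.5, else Type II
                            for p in estimates]
                  worst = max(range(len(errors)), key=lambda i: errors[i])
                  if errors[worst] <= target_error:
                      return estimates                 # all blocks within the error bound
                  # Collect and measure one new sample at a random location within
                  # the offending EU, then re-krige before picking the next location.
                  data.append(collect_sample(worst))
              return run_block_ik(data)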

       In the BPC example, the addition of the new sample measurement decreased the largest
Type I error to 20.8 percent, which occurred in the third block on the next-to-last row, but
increased the largest Type II error to 47.9 percent, this time at the first block on the next-to-last
row.  The reason that both error rates changed simultaneously is that the IK estimate at the sampled
                              Figure 22.  Adding One Second-Phase Sample

       [Postplot of the beryllium data after adding one second-phase sample (15.5 ppm, labeled
        "New Sample"), using the same axes and quartile symbols as Figure 17.]
                              Figure 23.  Corresponding Indicator Postplot

       [Postplot of I(x) after adding one second-phase sample:  indicator 1 where concentration
        < 100 ppm, indicator 0 where concentration > 100 ppm.]
block changed from 0.519 originally to 0.479.  The drop was enough to force a re-classification
of the block from below the action level to above the action level, which can easily occur for
highly uncertain blocks.  Indeed, such uncertainty is the main impetus for additional second-
phase sampling.

       Because the largest errors are still both greater than 20 percent, and because the single
largest error occurs within the same EU that was previously sampled, the whole random
sampling and re-analysis algorithm must be repeated, again on the first block of the next-to-last
row.  In fact, the process must be repeated as many times as necessary until the largest Type I
and Type II errors are both under 20 percent.  The same block will not always be targeted for
additional second-phase sampling, but one can expect the bulk of the newly placed samples to
occur within the most initially uncertain blocks.

       To satisfy the error probability requirements, the BPC data set had to be  augmented with
13 additional second-phase samples, leading to a total data set of 43 measurements.  Postplots
of the  final data set  (Figures 24 and 25) clearly  show  that the additional samples were
concentrated in the areas near SWMU #1, indicating  the statistical uncertainty associated with
determining the extent of beryllium  contamination above the action level near the original
lagoon.  The final block IK results are presented in Figure 26, where it is apparent now that all
the estimated EUs have IK probabilities either between 0.8 and 1.0 or between 0.0 and 0.2.  The
lower-valued estimates accompany EUs classified with true means above the action level and
Type II error rates under 20 percent.  Likewise, the higher-valued estimates accompany EUs
classified with true means lower than the action level and Type I errors also under 20 percent.

       The effect of adding the additional 13 samples was to more accurately classify those
blocks that initially possessed large Type I or Type II errors.  Without the more complicated and
involved process of developing a fully estimated site model, the number of second-phase samples
that would be needed to satisfy the target error rates could not be known in advance.  However,
since the process of sampling and re-analyzing the data is iterative, additional sampling (and the
associated costs of measurement) could be stopped at the appropriate time, without wasting
money on more samples than were actually needed.  At the same time, since it may

                              Figure 24.  After 13 Second-Phase Samples

       [Postplot of the beryllium data after adding 13 second-phase samples, using the same
        axes and quartile symbols as Figure 17.]
                              Figure 25.  Corresponding Indicator Postplot

       [Postplot of I(x) after adding 13 second-phase samples:  indicator 1 where concentration
        < 100 ppm, indicator 0 where concentration > 100 ppm.]
                              Figure 26.  Final Indicator Kriging Results

       [Block map of the revised indicator kriging results with 13 added second-phase samples,
        using the same axes and probability classes as Figure 21; every estimated block now falls
        in either the 0.8-1.0 class or the 0.0-0.2 class.]
be inefficient from a practical standpoint to collect just one new sample at a time before re-
estimating the IK results, a slightly more efficient alternative would be to collect two new
samples per iteration, one from the block with the largest Type I error and one from the block with
the largest Type II error.  Such a strategy may be more feasible to implement and should give very
similar results in most cases.

6.5    MULTIPLE INDICATOR KRIGING
       The simple  IK approach should be practical and easy to implement for many  sites and
constituents, particularly if it is easy to classify any given concentration measurement as  being
either below or above the bright-line cutoff.  When field screening techniques are used, the loss
of measurement precision for certain chemicals may result in measured values that are difficult
to classify as indicators, especially considering the large measurement standard error associated
with some screening methods.  In these cases, it may be helpful to consider MIK, a variant of
IK.

       MIK is essentially an extension of simple IK where indicator data are defined at each
sample location for more than one cutoff.  Typically a series of increasing cutoffs is set, chosen
to span the likely range of observed data.  Then at each sample location, the indicator
transform for each cutoff is used to create a vector of indicator data (see earlier discussion of
IK).  Simple IK is then performed for the indicator data associated with each cutoff, so that in
the end, the user is left with a vector of IK kriged estimates at each unknown location x.
Assuming the vector of cutoffs has been arranged in increasing order and the IK estimates are
arranged in a corresponding order, the vector of estimates at any given location represents the
(ideally increasing) probabilities that the true concentration is less than the corresponding
cutoff.  In fact, this vector can be thought of as an estimate of the cumulative probability
distribution of Z(x), where the distribution is computed only at the chosen set of cutoffs.
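       At a sampled location the multiple-indicator transform is straightforward.  A brief sketch
(Python, with hypothetical cutoffs and a hypothetical measured value) shows the vector of
indicators and its reading as a discretized cumulative distribution:

          # Multiple-indicator transform at one sample location:  one indicator per
          # cutoff, equal to 1 wherever the measured value does not exceed the cutoff.
          cutoffs = [10.0, 20.0, 30.0, 50.0, 100.0]   # ppm, arranged in increasing order

          def indicator_vector(z, cutoffs):
              return [1 if z <= c else 0 for c in cutoffs]

          # Read as Prob[Z(x) <= cutoff]:  the vector is nondecreasing in the cutoff.
          print(indicator_vector(35.0, cutoffs))      # [0, 0, 0, 1, 1]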

       While performing an MIK study clearly involves more work than simple IK (since simple
IK must be run for each cutoff and not just at the single cutoff equal to bright-line), there are
some significant benefits.  Only two of the benefits will be sketched here (see Journel 1988).
First and foremost, the MIK approach offers a natural way to incorporate the measurement
uncertainty associated with any observed value.  With highly precise measurements, one would
generally be able to classify a particular sample value as either above or below each indicator
cutoff.  With less precise measurements, this may not be possible; however, it is possible with
MIK to explicitly capture the uncertainty surrounding a particular value.

       To illustrate, suppose at location x that the measured value is Z(x) = 20 ppb with a
measurement standard error of 5 ppb.  If the vector of chosen cutoffs is given by {10, 20, 30}
and the distribution of measurement errors is normal, we could make the following calculations.
For the first cutoff, the measured value is clearly greater.  However, the uncertainty associated
with Z(x) suggests that there is approximately a 2.5 percent chance (the area of the normal curve
with parameters N(20,5) below a level of 10) that the true value is less than this cutoff.
Therefore, we could set the first indicator value to 0.025 instead of 0 to indicate our slight
uncertainty about the true concentration.  For the second cutoff, we would normally set the
indicator value to 1, but since there is approximately a 50 percent chance that the true value
actually exceeds this level, we could set the indicator value to 0.5 instead.  Finally, for the last
cutoff, though the measured value is clearly less than 30 ppb, there is approximately a 2.5
percent chance that the true value actually exceeds this point.  Therefore the indicator value
could be set to 0.975.
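       These three numbers come directly from the normal cumulative distribution.  A short check
(Python) reproduces them, to within the rounding used in the text:

          # Soft indicators for a measured value of 20 ppb with a 5 ppb measurement
          # standard error, evaluated at the cutoffs {10, 20, 30}.
          from statistics import NormalDist

          def soft_indicators(z, std_err, cutoffs):
              """Prob[true value <= cutoff] under a N(z, std_err) measurement model."""
              meas = NormalDist(mu=z, sigma=std_err)
              return [round(meas.cdf(c), 3) for c in cutoffs]

          print(soft_indicators(20.0, 5.0, [10.0, 20.0, 30.0]))   # [0.023, 0.5, 0.977]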

       Of course, handling the measurement uncertainty in this way no longer leaves us with
a simple series of 0s and 1s, but these 0s and 1s can likewise be regarded as probabilities in
their own right.  A 0-valued indicator suggests there is absolutely no chance that the measured
value is less than the cutoff, while a 1-valued indicator implies that the opposite is certain.  So
in principle, the modification of the indicator data values does not at all affect the interpretation
of the IK results.  In fact, the astute reader will have noticed that this same approach can be
used with the simple IK method described above, where the only cutoff is set to bright-line.
Nothing is different in MIK except the presence of multiple cutoffs.

       The second major advantage of MIK over simple IK is that better estimates of the block-
specific mean concentration level can usually be obtained.  In simple IK one can only compute
a probability that the mean concentration is below the cutoff.  In MIK, a fairly robust but
approximate estimate of the actual concentration level can be computed, often leading to a better
classification rule for deciding whether or not a specific EU exceeds bright-line.
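       One common way of producing such an estimate (often called an E-type estimate) is to treat
the MIK probabilities at a block as a discretized cumulative distribution and average the class
midpoints.  The sketch below (Python) illustrates the idea only; the class midpoints and the
assumed upper tail are illustrative choices, not a prescription:

          # Approximate block mean from MIK output:  probabilities Prob[Z <= cutoff]
          # treated as a discretized CDF and averaged over class midpoints.
          def e_type_estimate(cutoffs, cdf_probs, z_min=0.0, z_tail=None):
              if z_tail is None:
                  z_tail = 1.5 * cutoffs[-1]           # crude assumption for the upper tail
              edges = [z_min] + list(cutoffs) + [z_tail]
              probs = [0.0] + list(cdf_probs) + [1.0]
              estimate = 0.0
              for k in range(len(edges) - 1):
                  midpoint = 0.5 * (edges[k] + edges[k + 1])
                  estimate += (probs[k + 1] - probs[k]) * midpoint
              return estimate

          # Hypothetical MIK probabilities at one block, for cutoffs of 10, 20, and 30 ppm:
          print(e_type_estimate([10.0, 20.0, 30.0], [0.2, 0.55, 0.9]))   # about 18.8 ppm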

       All in all, the MIK approach is more complicated than the simple IK method, but
substantial gains can  often be realized.  After all, with MIK, not only does one have estimated
probabilities as to whether EUs exceed bright-line, one also has similar probabilities for all the
other cutoffs as well, offering a more detailed and informative picture of the  distribution of
onsite concentration  levels.
                                REFERENCES
Aitchison,  J., and J.A.C. Brown.  1969.  The Lognormal Distribution.  Cambridge:
      Cambridge University Press.

Barth, D.S., B.J. Mason, T.H. Starks, and K.W. Brown. 1989. Soil Sampling Quality
      Assurance User's Guide, Second Edition.  USEPA Cooperative Agreement No. CR
      814701, EMSL-Las Vegas, NV.

Englund, E.J., and N. Heravi. 1993.  Conditional simulation:  practical application for
       sampling design optimization, in Geostatistics Troia '92.  A. Soares, Ed.  Kluwer
      Academic Publishers.

Englund,  E.J.,  and N. Heravi.   1994.   Phased  sampling for  soil remediation.
      Environmental and Ecological Statistics,  Vol. 1, pp. 247-263.

Flatman, G.T., and A.A. Yfantis. 1984.  Geostatistical strategy for soil sampling: the
      survey and the census.  Environmental Monitoring and Assessment, 4, pp. 335-349.

Gilbert, R.O. 1987.  Statistical Methods for Environmental Pollution Monitoring. New
      York: Van Nostrand Reinhold.

Hahn, G.J., and W.Q. Meeker.  1991.  Statistical Intervals: A Guide for Practitioners.
      New York: Wiley & Sons.

Isaaks, E.H., and R.M. Srivastava. 1989.  An Introduction to Applied Geostatistics. New
      York: Oxford Univ. Press.

Journel, A.  1988.   Non-parametric geostatistics for risk and additional sampling
      assessment, in Principles of Environmental  Sampling. L. Keith, Ed.  American
      Chemical Society, pp. 45-72.

Land, C.E.  1971.  Confidence intervals for linear functions of the normal mean and
      variance.  Annals of Mathematical Statistics, 42, pp. 1187-1205.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean
      and  variance,  in Selected Tables in Mathematical Statistics, vol. III. American
      Mathematical Society, Providence, RI, pp. 385-419.
Li, P.C., and R. Rajagopal.  1994.  "Utility of Screening in Environmental Monitoring,
      Part 2:  Sequential Screening Methods for Monitoring VOCs at Waste  Sites."
      American Environmental Laboratory.  May.

Mason, B.J. 1992.  Preparation of Soil Sampling Protocols: Sampling Techniques and
      Strategies.  EMSL-Las Vegas, NV, USEPA, (EPA/600/R-92/128).

Neptune, D., E.P. Brantly, M.J. Messner, and D.I. Michael.  1990.  Quantitative decision
      making in Superfund: a data  quality objectives case study. HMC, pp. 18-27.

Pitard, F.F.  1993. Pierre Gy's Sampling Theory and Sampling Practice: Heterogeneity,
      Sampling Correctness, and Statistical Process Control. 2nd Ed. Boca Raton, FL:
      CRC Press, Inc.

USEPA. 1986. Test Methods for Evaluating Solid Waste, Physical/Chemical Methods.
      3rd Edition and its updates.  November.

USEPA. 1988.  GEO-EAS (Geostatistical Environmental Assessment Software) User's
      Guide. EMSL-Las Vegas,  NV, (EPA 600/4-88/033).

USEPA. 1989. Methods for Evaluating the Attainment of Cleanup Standards:  Volume
      I: Soils and Solid Media.  Washington, DC: EPA, (EPA 230/02-89-042).

USEPA. 1990.  Geostatistics for Waste Management: A User's Manual for the GEO-
      PACK (Version 1.0) Geostatistical Software System. Robert S. Kerr Environmental
      Research Lab, ADA, Oklahoma.  (EPA/600/8-90/004).

USEPA. 1995. Superfund Soil Screening Level Guidance. Washington, DC: EPA, (EPA
       540/F-95/041; NTIS #: PB.963501).

Yfantis, A.A., G.T. Flatman, and J.V. Behar.  1987.  Efficiency of kriging estimation for
      square, triangular, and hexagonal grids. Mathematical Geology, 19, pp. 183-205.
                                   APPENDIX A

                    THE DATA QUALITY OBJECTIVES PROCESS
     One planning tool available to help in the design and implementation of a sampling and
analysis project is the data quality objective (DQO) process developed by the United States
Environmental Protection Agency's (USEPA's) Quality Assurance Management Staff (QAMS).
DQOs are qualitative and quantitative statements that clarify the study objective, define the type,
quantity, and quality of required data, and specify the tolerable limits on decision errors.  DQOs
are used to define the quality control (QC) requirements for data collection, sampling and analysis,
and data review and evaluation.  These QC requirements are included in the quality assurance
(QA) objectives for environmental measurements, and the DQOs also are incorporated into a
quality assurance project plan (QAPP).  DQO development is an ongoing process involving
discussions between management and technical staff.  This process is a practical means for
specifying and ensuring that the requested information is known to be of the "required" quality.

     Failure to establish DQOs prior to implementing field and laboratory activities can cause
difficulties for site investigators in the form of inefficiencies, increased costs, or generation of
unusable data.  For example, if low-cost analytical techniques will suffice, but higher cost
techniques are selected, time and money are wasted.

THE DQO PROCESS
     A seven-step DQO Process has been developed for uniform and consistent data generation
activities:
     Step 1:  State the Problem
     Step 2:  Identify the Decision
     Step 3:  Identify Inputs to the Decision
     Step 4:  Define the Study Boundaries
     Step 5:  Develop a Decision Rule
     Step 6:  Specify Tolerable Limits on Decision Errors
     Step 7:  Optimize the Design for Obtaining Data.
     The DQO process is based on  the guidance document issued  by QAMS as "final" in
 September 1994.  Guidance for the Data Quality Objectives Process (EPA QA/G-4) provides
 general  guidance to organizations  for  developing  data quality  criteria  and performance
 specifications for decision making.  Chapter One of the SW-846 Methods Manual also provides
 additional guidance on the QA program in the Office of Solid Waste (OSW).

Step 1:  State the Problem
     The first step in any decision-making process is to define the problem that has resulted in the
inception of the study.  A planning team is assembled and is tasked with developing the project-
specific DQOs.  The planning team comprises personnel representing all phases of the project and
may include technical project managers, QA/QC managers, data users, and decision makers.  The
primary decision maker, or leader, must be identified.  Where applicable, field and lab
technicians, chemists, statisticians, and modelers also should be recruited for the planning team.
 The responsibilities of each team member should  be clearly defined during this initial planning
 stage.

     A concise description of the problem must be developed during this early stage of DQO
development.  Existing information should be summarized, and the need for additional
information should be determined.  Literature searches can be performed, and historical data or
ongoing studies related to the current site can be evaluated.

     Available financial and manpower resources must be identified, and project milestones and
deadlines also should be determined, if sufficient information is present.

Step 2: Identify the Decision
     This step is used to define the decision statement that the study must resolve.  The decision
statement is a consolidation of the principal study question and alternative actions.  The principal
study question identifies the key unknown conditions or unresolved issues that will be used to
reveal the solution to the problem being investigated. Alternative actions, which are items that
may be taken to solve the problem based on the outcome or on the decisions arrived at from the
study, also are identified.

Step 3: Identify Inputs to the Decision
     Specific information required to resolve the decision statement must be identified during this
step in the DQO development process.  The selected data acquisition approach will lead to the next
set of questions that address the specific  types of information needed to support  the decision
statement.  Sources of the necessary information are then developed and can include regulatory
guidance, scientific literature, historical data or past projects that were similar in  scope to the
current effort.

     A bright-line, defined as the threshold value that provides the criterion for choosing between
alternative actions, needs to be established.

     Existing analytical methods are evaluated to determine if the method  will perform as
published, or if method modification or method development needs to be included in the study.
Each analyte of interest should have a method detection limit or level of quantitation  assigned, as
this performance information is used later in the DQO Process (Steps 5 & 7).

Step 4: Define the Study Boundaries
     Two types of boundaries must be defined and quantified: spatial and temporal.  Spatial
boundaries define the physical area to be studied and the locations to collect samples.  Temporal
boundaries describe the timeframe that the study data will represent and when the samples should
be collected. To arrive at these boundaries, the characteristics that define the population must be
identified. For instance, the compounds of interest and the matrix that should be evaluated might
need to be selected to determine if the compounds are present and at what typical concentrations.
     The spatial boundaries, or the geographic area to be studied, must be specified using some
physical feature or border, such as units of measure.  Where possible, the population should be
further segregated into more homogenous subsets, or strata, as a means of reducing variability.

Step 5: Develop a Decision Rule
     A statement must be developed that combines the parameters of interest and the action levels
with the DQO outputs already developed.  The combination of these elements forms the
decision rule and summarizes what attributes the decision maker wants to study and how the
information will assist in solving the central problem.  The four elements that form the decision
rule include:  (1) the parameter of interest that describes a characteristic of the statistical
population, (2) the scale of decision making, defined in Step 4 when boundaries were defined,
(3) the bright-line (action level or a measurement threshold value), used as a criterion to choose
between alternative actions through the use of "if/then" statements, and (4) the alternative
actions, as developed in Step 2.

Step 6: Specify Limits on Decision Errors
     Decision makers are interested in knowing the true state of some feature of the environment
(e.g., the concentration of the constituent of concern in soil). However, data generated from a
sampling and analysis program can only be used to estimate this state, and there is a chance that
the data are in error and the correct decision will not be made. This step in the DQO development
process allows the decision makers to specify acceptable or tolerable limits on decision errors.

     There are at least two primary reasons why the decision maker might not determine the true
value of a population parameter.  First, sampling design error occurs when the sampling design
is unable to capture the complete extent of variability that exists in the true state of the
environment.  Second, measurement error, which is a combination of random and systematic
errors, results from various steps in the measurement process including sample collection, sample
handling, sample preparation, sample analysis, data reduction, and data handling.  The
combination of sampling design error and measurement error can be viewed as the total study
error and may lead to decision errors.

     To estimate the probability of decision errors, the anticipated range of results from the
parameters of interest usually must be determined, perhaps through the use of historical data.
Values between the observed upper and lower bounds, or values drawn from a distribution modeled
after the historical data, can be used to estimate how likely the various decision errors are,
depending on the hypothesis framework that has been constructed.  The statistical
hypotheses associated with any decision criterion consist of a null hypothesis, supposed to
represent the initially assumed condition of the site, and the alternative hypothesis, representing
the condition of the site when the null hypothesis is not true.  Often the null hypothesis indicates
the location of the center (in terms of concentration levels) of the hypothesized sampling
distribution, but it can describe other characteristics of the site population (e.g., an upper
percentile).  Both the null and alternative hypotheses make statements regarding a characteristic
of the population rather than a characteristic of a sample.  The probability of a decision error is
determined by estimating the chance that one of the two hypotheses will be accepted when in fact
the opposite hypothesis is true.

     To identify the decision errors and to construct the hypothesis framework, four steps must
be performed.  (1) Both types of decision errors must be defined, determining
which occurs above the action level and which occurs below the action level.  (2) Potential
consequences of each decision error must be specified and the impact of arriving at the incorrect
decision considered.  The severity of the error may affect economic and social costs or have
ramifications for human health and the environment.  One of the two types of errors (e.g., above
or below the action level) often will have a greater impact than the other.  (3) The decision maker
should evaluate which scenario results in more serious consequences.  (4) The null hypothesis,
or baseline condition, should be defined, and what constitutes a false-positive or a
false-negative result should be determined.  The term false-positive is assigned to the decision error
where the decision maker rejects the null hypothesis when it is true.  Conversely, a false-negative
is the resulting decision error if the null hypothesis is not rejected when it is false.

     Some decision errors may be considered minor and of minimal impact to use of the data.
This "grey region" should be specified as a range of values having little or no adverse
consequences to the project.  Use of these grey regions is often important as a tool for
building tolerable limits on the probability of making an incorrect decision.

 Step 7: Optimize the Design for Obtaining Data
     The final step addresses the design of a resource-effective data collection system to satisfy
the DQOs.  The DQO outputs produced in all preceding steps should be verified to be internally
consistent.  The design options should be developed by weighing cost against achievement of
the DQOs.  General data collection designs can then be developed as either a factorial design,
systematic sampling, composite sampling, or one of the following random sampling designs:
simple, stratified, or sequential.

     In general, three statistical expressions need to be selected to optimize the data collection
 design.  (1) An appropriate method for testing the statistical hypothesis framework must be
 chosen. (2) A statistical model used to compare measured values to the modeled values must be
 developed and tested for consistency with the observed data. Once established, the model also can
 be  used to more thoroughly describe the components  of error or bias that may exist in the
 measured values.  (3) Finally, a cost evaluation of number of samples versus  the total cost of
 sampling and analysis must be developed. Using these statistical expressions, an optimal sample
 size and sampling layout can  be chosen to meet the DQOs for each data collection design
 alternative.

 Quality Assurance Review
     Lastly, have the document peer reviewed, preferably by personnel experienced in statistical
data collection designs.  Ensure that all aspects of the project have been documented to minimize
the number of assumptions made during performance of the project.

DQO DECISION ERROR FEASIBILITY TRIALS (DEFT) SOFTWARE
     The two most intensive steps in the DQO Process are Step 6: Specify Tolerable Limits on
Decision Errors, and Step 7:  Optimize the Design for Obtaining Data. During Step 7, the entire
set  of DQO outputs is incorporated into a sampling design.  If the DQO constraints are not
feasible, it is necessary to iterate through one or more of the earlier steps of the DQO Process to
identify a sampling design that will meet the budget and generate adequate data for the decision.
This iteration can be time consuming and costly.  EPA developed the DEFT User's Guide and
software (USEPA 1994c) to streamline this iterative process. Users can change DQO constraints
such as limits on decision errors or the "grey region" and evaluate how these changes affect the
sample size for several basic sampling designs.  The output of the DEFT software can be used to
set upper and lower bounds on the sample size (i.e., the appropriate number of observations).
Through this process, the planning team can evaluate whether these constraints are appropriate or
feasible before the sampling and analysis plan is developed.

       Users of the DEFT software are first prompted to enter information from the DQO outputs
based on a series of entry screens. Specific information requested by the DEFT software includes:

     •   Parameter of interest
     •   Minimum and maximum values (range) of the parameter of interest
     •   Action level (i.e., the bright-line)
     •   Null and alternative hypothesis
     •   Bounds of the gray region
     •   Estimate of the standard deviation
     •   Cost per sample for sample collection (i.e., field cost per sample)
     •   Cost per sample for sample analysis (i.e., laboratory cost per sample)
     •   Probability limits on decision errors for the bounds of the gray region
     •   Any additional limits on decision errors.

     The DEFT software automatically starts with a simple random sampling design, so the
information requested corresponds to this design.

EXAMPLE APPLICATION OF THE DEFT SOFTWARE
     At a  site contaminated with polynuclear aromatic  hydrocarbons  (PAH)  compounds,
contaminated soil has been excavated and placed on a pad.  Investigators are interested in
determining whether the mean concentration of benzo(a)pyrene (BAP) exceeds the bright-line
standard of 90 mg/Kg. The investigators have decided to use the DEFT software during the DQO
Process to help optimize the study design.

Parameter of Interest
     The parameter of interest for this study is the population mean of the concentration of BAP.

Minimum and Maximum Values (Range) of the Parameter of Interest
     Based on data generated during a preliminary study of the contaminated soil, the minimum
concentration of BAP was 62 mg/Kg and the maximum was 120 mg/Kg.

Action Level
     The action level, or bright-line, for BAP is 90 mg/Kg.

Null and Alternative Hypothesis
     H0: mean ≥ bright-line vs. HA: mean < bright-line.

Bounds of the Gray Region
     The gray region is bounded on one side by the bright-line (90 mg/Kg).  For "H0:
mean ≥ bright-line vs. HA: mean < bright-line," DEFT sets a default value for the other bound
of the gray region at the midpoint between the minimum concentration (62 mg/Kg) and the bright-
line.  In this example, the lower bound of the gray region is 76 mg/Kg.

Estimate of the Standard Deviation
     If there is no estimate of the standard deviation available, DEFT calculates a default value
given by:

     (Maximum Concentration - Minimum Concentration)/6

     In this example, the estimate of the standard deviation is (120 - 62)/6 = 9.67.

Cost Per Sample for Sample Collection
     The cost of sample collection is approximately $67.00 per sample.  This estimate is based
on the following assumptions:

     •   Two field sampling technicians are required.
     •   Samplers can collect, prepare, and ship 12 samples per 8-hour day.
     •   Labor rate is $50.00/hour ("loaded" rate).

Cost Per Sample for Sample Analysis
     The cost per soil sample analysis for semi-volatile organic compounds, including BAP, is
approximately $400.00 per sample.

Probability Limits on Decision Errors for the Bounds of the Gray Region
     For this example, the probability of making a false positive error is set at α = 0.01, and the
probability of making a false negative error is set at β = 0.05.

     After the above information is entered into the DEFT software, sampling design and DQO
summary information is provided.  For this example, a simple random sampling design would
require 11 samples at a total cost of $5,137.00 (see attached "Design/DQO Summary Screen" and
"Decision Performance Goal Diagram Screen with the Performance Curve").
 ******************************************

       Design/DQO Summary Screen

 For the Sampling Design of:  Simple Random Sampling
 Total Cost:  $5137.00
 Laboratory Cost per Sample:  $400.00
 Field Cost per Sample:  $67.00
 Number of Samples:  11

       Data Quality Objectives
 Action Level:  90.00
 Gray Region:  76.00 - 90.00
 Null Hypothesis:  mean ≥ 90.00
 Standard Deviation (SD):  9.67

       Decision Error Limits
             conc.          prob(error)      type
             76.00          0.0500           F(-)
             90.00          0.0100           F(+)
 ******************************************
       [Decision Performance Goal Diagram Screen with the Performance Curve, plotted against
        the true mean concentration (73.60 to 120 mg/Kg); the action level of 90 mg/Kg and the
        gray region from 76 to 90 mg/Kg are marked.]
RELEVANT GUIDANCE ON DATA QUALITY OBJECTIVES

1.   USEPA.  1994a.  EPA Requirements for Quality Assurance Project Plans for Environmental
     Data Operations (Interim Final). EPA QA/R-5.  Quality Assurance Management Staff
     (QAMS), Washington, DC.

     Presents detailed specifications and instructions for the information that must be contained
     in a QAPP for environmental data operations performed by or on behalf of USEPA and the
     procedures for its review and approval.

2.   USEPA.  1994b.  Guidance for the Data Quality Objectives Process (Final). EPA QA/G-4.
     Quality Assurance Management Staff (QAMS), Washington, DC.

     Offers general guidance on developing data quality criteria and performance specifications
     for data operations.  The document outlines the seven distinct steps of the DQO Process:
      state the problem; identify the decision; identify inputs to the decision; define the study
      boundaries; develop a decision rule; specify limits on decision errors; and optimize the design
     for obtaining data. Includes a detailed example and a glossary.

3.   USEPA.  1994c.  Data Quality Objectives Decision Error Feasibility Trials (DQO/DEFT)
      Version 4.0 Software and User's Guide.  EPA QA/G-4D (Final).  Quality Assurance
     Management Staff (QAMS), Washington, DC.

     The DEFT software uses the outputs from Steps 1 through 6 of the DQO Process to allow
     a decision maker or member of the DQO planning team to quickly generate cost information
     about several simple sampling designs based on the DQO constraints.

4.   USEPA.  1996. Guidance for Data Quality Assessment (Final). EPA QA/G-9. Quality
     Assurance Management Staff (QAMS), Washington, DC.

     The purpose of this guidance is to demonstrate the use of EPA's data quality assessment
     (DQA) process in evaluating environmental data sets and to provide  some graphical and
     statistical tools that are useful in performing DQA.
                                   APPENDIX B.

          CONFIDENCE INTERVALS FOR NON-LINEAR FUNCTIONS
       To develop confidence intervals for non-linear functions of the parameters μ and σ which
are not of the form (μ + λσ²), the method of Land (1975) is described below.  Land used the fact
that the confidence regions associated with the function (μ + λσ²), for which exact confidence
intervals can be developed and which includes the case of the lognormal mean, are parallel lines
when drawn in the μ-σ² plane (see Figure B-1).  Even when the "equi-confidence" bands (those
combinations of the parameters μ and σ² that are equally plausible for a given set of data) for
another non-linear transformation are curved, in a small enough region of the parameter space
they will be approximately parallel.  In such a setting, it should be possible to approximate the
slightly curved confidence bands with straight lines of the form (μ + λσ²).  This idea is the basis
for Land's approximation.

       Although many common non-linear transformations are applied to data to achieve
approximate normality, the logarithmic transformation is the only one for which the true mean
(E[X]) can be expressed as a function of the form (μ + λσ²).  Other common transformations
resulting in normally distributed data have true means that are more complicated functions of the
parameters μ and σ² (see Table B-1). In each of these cases, the exact confidence region around
the true mean is a curved, non-linear area in the μ-σ² plane.

                   Table B-1. Transformations and Expected Values

      Transformation      Normalizing Function      E[X]
      Logarithmic         Y = log(X)                exp(μ + 0.5σ²)
      Square Root         Y = √X                    μ² + σ²
      Cube Root           Y = X^(1/3)               μ³ + 3μσ²
      Fourth Root         Y = X^(1/4)               μ⁴ + 6μ²σ² + 3σ⁴
      Reciprocal          Y = 1/X                   (1/μ)[1 + Σ_{i≥1} (σ/μ)^(2i) ∏_{l=1}^{i} (2l-1)]
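       The entries in Table B-1 follow directly from the moments of a normally distributed variable
on the transformed scale.  As a quick numerical check (not part of Land's procedure), the
following Python sketch compares two of the table entries against simulated data; the parameter
values are arbitrary and chosen only for illustration.

        import numpy as np

        rng = np.random.default_rng(1)
        mu, sigma = 5.0, 1.5                        # arbitrary parameters on the transformed scale
        y = rng.normal(mu, sigma, size=1_000_000)   # Y = g(X), assumed normal

        # Square-root transformation (X = Y**2): E[X] = mu**2 + sigma**2
        print(np.mean(y**2), mu**2 + sigma**2)

        # Fourth-root transformation (X = Y**4): E[X] = mu**4 + 6*mu**2*sigma**2 + 3*sigma**4
        print(np.mean(y**4), mu**4 + 6*mu**2*sigma**2 + 3*sigma**4)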
                              Figure B-1.  Confidence Contours

       [Figure: construction of an approximate confidence limit for a non-linear function f(μ, σ²),
       shown as contours in the μ-σ² plane.]
       To complete the chain of reasoning in Land's development, the non-linear (i.e., curved)
confidence region can only be adequately approximated in a small area very near the location of
the most plausible set of parameters.  Using the technique of maximum likelihood as the
estimation engine, this set of parameters is labeled the maximum likelihood estimates (e.g., μ̂, σ̂).
Then, at this point in the (μ, σ²)-plane, the value of lambda (λ) can be determined so that the line
(μ + λσ²) is tangential to the (curved) confidence contour for the true mean E[X].  Because the
true confidence contours are roughly parallel in this small region around the maximum likelihood
estimates (because these estimates should be close to the true mean), confidence limits for the
approximating function (μ + λσ²) also should roughly equal confidence limits for the true mean.

       Computationally, the details of Land's method are as follows:

Step 1.   Find a transformation that leads to approximately normally distributed data with
          parameters (μ, σ²).  Determine the function of these parameters f(μ, σ²) equivalent to
          the true mean E[X], such as the examples for the square root and cube root given
          above.

Step 2.   Compute estimates of the mean μ and variance σ² on the transformed scale, given by
          the usual sample mean and variance.  These values are essentially equal to the
          maximum likelihood estimates.  Then, taking these estimates as the point of tangency,
          compute lambda by taking the ratio of partial derivatives of the function f(μ, σ²), first
          with respect to σ² and then with respect to μ, evaluated at the point (μ̂, σ̂²).  That is,
          compute:

               λ̂ = [∂f(μ, σ²)/∂σ²] / [∂f(μ, σ²)/∂μ],   evaluated at (μ, σ²) = (μ̂, σ̂²)
Step 3.    Using this value of λ, compute the adjusted standard deviation S* = 2λσ̂, and then use
           the tables provided by Land (replacing s with S*) to interpolate the factors H_α and
           H_{1-α} used to construct
           confidence limits for the function (μ + λσ²).  The limits Q_α and Q_{1-α} are computed
           with the following equation, which is identical to a lognormal confidence limit on the
           mean except that 0.5 is replaced by λ:

               Q_α = μ̂ + λσ̂² + σ̂H_α/√(n - 1)
       Setting first Q_α and then Q_{1-α} equal to (μ + λσ²) gives an approximation to the actual
       confidence limits.  However, an infinite number of parameter combinations (μ, σ²) satisfy
       these equations, so there is still a need to find the parameter values (μ̃, σ̃²) on the line that
       are the most plausible given the observed data.  Finding this point in the (μ, σ²)-plane
       requires the use of a technique known as restricted maximum likelihood estimation.
Step 4.   Compute the solution given by Land to the restricted maximum likelihood problem to
          determine the most plausible α-level parameter values (μ̃, σ̃²):

               σ̃² = 2k[ √( k² + (Q_α - μ̂)² + (n-1)σ̂²/n ) - k ]     and     μ̃ = Q_α - λσ̃²

          where k = (2λ)⁻¹.
Step 5.   Compute each final α-level confidence limit for E[X] as f(μ̃, σ̃²), that is, the function
          from Step 1 evaluated at the restricted maximum likelihood estimates.
       Example

       Consider the following data set comprising 10 lead measurements from a series of soil
cores.  Analysis of the data determined that a square root transformation best approximates
Draft                                    B-4                           February 1996

-------
                                                    Geostatistical Soil Sampling Guidance
normality on the transformed scale.  To find approximate 95% confidence limits on the true
population mean, the steps from Land's algorithm above were followed.


                 Sample Number                                 Concentration
                      1                                         10.5 ppm
                      2                                         10.2 ppm
                      3                                         28.4 ppm
                      4                                         50.9 ppm
                      5                                         50.1 ppm
                      6                                         54.0 ppm
                      7                                         7.0 ppm
                      8                                         32.1 ppm
                      9                                         36.7 ppm
                      10                                        21.1 ppm

       First, the mean and standard deviation on the transformed scale were computed to be 5.229
sqrt(ppm) and 1.751 sqrt(ppm), respectively.  Then, because a square root transformation was
used, the true mean has the form E[X] = μ² + σ².  Consequently, the ratio of partial derivatives
can be expressed as (2μ)⁻¹, which gives an estimate of lambda of 0.09562 when the sample mean
is substituted for μ.  Multiplying the sample standard deviation by 2λ gives S* = 0.335.  Using
Land's (1975) table with 9 degrees of freedom and taking α = 0.025, the appropriate adjustment
factors can be interpolated as H_.025 = -1.958 and H_.975 = 2.556.  The estimates for Q_.025 and
Q_.975 then follow as 4.3794 and 7.0140, respectively, using the equation:
               Q_α = 5.229 + (0.09562)(1.751)² + (1.751)H_α/√9
       Finally, one pair of estimates (μ̃, σ̃²) was computed for each value of Q_α, corresponding
to α-levels of .025 and .975.  These pairs resulted from the formulas in Step 4 above, giving
(μ̃, σ̃²) equal to (4.0565, 3.3770) and (6.4734, 5.6534).  Plugging each of these pairs into the
formula for E[X] in turn led to an approximate 95% confidence interval of (19.83 ppm, 47.56
ppm).

       Thus, the approximate Land procedure estimated with 95% confidence that the true lead
population mean is somewhere between the limits of 19.8 ppm and 47.6 ppm.  Had Land's
procedure not been used to account for the transformation bias due to using the square root
function, a normal-based confidence interval on the transformed scale would lead to limits of 3.976
sqrt(ppm) and 6.48 sqrt(ppm).  Squaring these limits in turn gives the biased confidence interval
(15.8 ppm, 42.0 ppm).  Both of these latter limits are lower than the approximately correct limits
resulting from Land's method.  Hence the square root transformation, like the log transformation,
will lead to biased confidence limits on the mean that are too low, unless the transformation bias
is accounted for.
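       For readers who wish to check the arithmetic, the following Python sketch reproduces the
square root example above.  It assumes the adjustment factors interpolated from Land's (1975)
tables are supplied as inputs (they are not computed here), and the variable names are illustrative
only.

        import math

        data_ppm = [10.5, 10.2, 28.4, 50.9, 50.1, 54.0, 7.0, 32.1, 36.7, 21.1]
        y = [math.sqrt(x) for x in data_ppm]              # transformed scale, Y = sqrt(X)
        n = len(y)
        mu_hat = sum(y) / n                               # about 5.229 sqrt(ppm)
        s2_hat = sum((v - mu_hat) ** 2 for v in y) / (n - 1)
        s_hat = math.sqrt(s2_hat)                         # about 1.751 sqrt(ppm)

        # Step 2: lambda = (df/d sigma^2) / (df/d mu) for f(mu, sigma^2) = mu^2 + sigma^2
        lam = 1.0 / (2.0 * mu_hat)
        k = 1.0 / (2.0 * lam)

        # Adjustment factors interpolated from Land's tables in the example above
        H = {0.025: -1.958, 0.975: 2.556}

        limits = {}
        for alpha, H_a in H.items():
            # Step 3: confidence limit for the linear function (mu + lambda * sigma^2)
            Q = mu_hat + lam * s2_hat + s_hat * H_a / math.sqrt(n - 1)
            # Step 4: restricted maximum likelihood estimates on the line mu + lambda*sigma^2 = Q
            sig2_t = 2.0 * k * (math.sqrt(k**2 + (Q - mu_hat)**2 + (n - 1) * s2_hat / n) - k)
            mu_t = Q - lam * sig2_t
            # Step 5: back-transform through f(mu, sigma^2) = mu^2 + sigma^2
            limits[alpha] = mu_t**2 + sig2_t

        print(limits)   # roughly {0.025: 19.8, 0.975: 47.6} ppm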
                                     APPENDIX C.

               OTHER ELEMENTS OF SAMPLE DESIGN OPTIMIZATION
     To optimize a sampling design to meet the Data Quality Objectives, project planners must select
analytical methods with respect to their sensitivity, the action or "bright-line" level, and any
compositing strategy that might be employed.   This appendix provides general guidance and
references for topics important to sample design optimization: (1) laboratory versus field screening
methods, (2) optimizing analytical methods selection, (3) bright-line values versus laboratory/field
methodology quantitation limits, and (4) compositing strategies with respect to quantitation limits.

C.1  ANALYTICAL METHODS FOR SOIL SAMPLE ANALYSIS: COMPARING LABORATORY
     METHODS VERSUS FIELD SCREENING METHODS
     Environmental pollutants may be classified as organic or inorganic compounds. The organic
compounds are usually divided into volatile organics, semivolatile organics, petroleum
hydrocarbons, pesticides, herbicides, polychlorinated biphenyls, and dioxins/furans.  Inorganic
compounds include elemental metals and other inorganic constituents such as cyanide.  Field
screening, field characterization/quantitation, or laboratory analysis are all viable options to
characterize soil samples for many of these chemical classes.  The choice of using field
methods rather than shipping samples offsite to a laboratory will depend on the level of
sample characterization that is required, quality control constraints, and the ultimate use of
the data.  The relative cost of laboratory versus field analyses also must be considered.

     Laboratory analyses provide a more controlled environment to produce analytical data using
equipment that is often more complex and more accurate than field equipment.  On the downside,
laboratory analyses are more expensive than field analyses, require trained technicians and
chemists to perform the analyses, and laboratory turn-around time can frequently be a limiting
 factor.  Shipping constraints also must be considered.  The laboratory analytical instruments
 require a certain amount of "stability" (e.g., temperature, power requirements, etc.) and usually
 do not lend themselves to movement or vibration, making them unsuitable for field operations.

     In the following sections, overviews of both laboratory and field methods will be presented
 that individually address each type of analyte that may be required to characterize the chemical
 composition of a soil sample. Sample collection considerations also will be discussed.

 Volatile Organics
     Volatile organic compounds (VOCs) are defined as compounds that have a relatively low
 boiling point and a relatively high vapor pressure.  As a rule of thumb, VOCs generally have
 boiling points less than about 200°C or have vapor pressures of greater than 0.1  mmHg.  When
 looking for VOCs in environmental matrices, it is necessary to collect and store samples using
 special techniques that take into account the possibility that the compounds could degrade in the
 sample container or could be released as soon as the cover from the sample container is opened.
 Many  commonly encountered industrial solvents are VOCs.  Chlorinated solvents including
 trichloroethene and chlorofluorocarbons (e.g., Freons) in addition to petroleum hydrocarbons and
 aromatic compounds such as benzene, toluene, ethylbenzene, and xylenes (BTEX) are classified
 as VOCs. Most regulated VOCs are liquids at room temperature; however, a few of the VOCs,
 such as chloromethane, bromomethane, and vinyl chloride, are gaseous at standard temperature
 and pressure (STP: 25°C and 1 atm). Some commonly encountered chemical names are shown in
Table 1 at the end of this appendix.

     Volatile organic analyses (VOAs) can successfully be performed either in the laboratory or
 in the field. Analyses applicable to each setting will be discussed in the following sections.

Laboratory Methods for Analyzing VOCs
     The traditional method  for analysis of VOCs utilizes a laboratory sample introduction
procedure called purge-and-trap.  An inert gas is bubbled, or purged, through an aqueous sample.
In the case of a soil, a slurry is prepared with the soil mixed with either water or with methanol,

and the slurry is then subjected to the purging procedure. The purging strips the VOCs from the
sample and the VOCs are then "trapped" on an adsorbent material.  After purging is complete, this
trap is heated to release, or volatilize, the compounds.  The mixture of compounds is then
separated into individual constituents in a gas chromatograph (GC) instrument and the VOCs are
detected by any number of analytical detection systems.

     The USEPA Office  of Solid Waste's  Test Methods for  Evaluating Solid Waste:
Physical/Chemical Methods, 3rd Edition and Updates (SW-846) is the commonly used method
manual when performing analyses on environmental matrices in support of Resource Conservation
and Recovery  Act (RCRA)  regulations.  Methods 8260 and  8021  are SW-846 determinative
methods that may be used for analysis of VOCs.  Each of these methods can use a "purge-and-
trap" coupled to a GC and a detector to determine whether certain VOCs are present and how
much of those VOCs are present.  Methods 8260 and 8021 can be used for the analysis of either
water or soil (solid waste) matrices. The primary difference between Method 8260 and Method
8021 is the type of detector used after GC separation.  Method 8260 uses a mass spectrometer
(MS) to determine the compounds and provide confirmation information regarding the molecular
weight and structure of the compounds of interest being detected. Method 8021 uses two simpler
detectors in series: a photoionization detector (PID) and an electrolytic conductivity detector
(ELCD).  The PID is sensitive to pi electrons in aromatic compounds such as benzene or toluene.
The  ELCD  is sensitive to electronegative atoms such  as  chlorine  or bromine.  NOTE:
Historically, Methods 8240, 8010,  and 8020 had been the SW-846 methods of choice for VOCs.
However, Methods 8260 and 8021 were added to SW-846 several  years ago so that specific
guidance regarding the use of capillary columns could be added to the manual. Capillary columns
provide improved resolution of GC peaks compared to the older technology, packed columns,
specified  in Methods 8240, 8010, and 8020. Furthermore, Methods 8010 and 8020 have been
combined into a single method (Method 8021), saving time and analysis costs.

     A benefit to using a method that employs GC/MS instrumentation, rather than just gas
chromatography (GC), is unequivocal confirmation regarding the presence of a compound.
However, GC/MS analyses have the highest per sample cost of any commonly used method for
VOC determination.  Methods, such as Method 8021, that use GC coupled to detection systems
 specific to certain chemicals (e.g., halogenated compounds) can have some limitations in the
 identification process, but typically achieve lower detection limits for analytes of interest.  The
 cost of GC methods for VOC analysis can often be 50 percent less than GC/MS methods.

     As shown in Table 2 at the end of this appendix, there are a number of different GC methods
 and sample introduction techniques that may be used to analyze samples for different analytes.

 Field Screening Procedures for Analyzing VOCs
     There  are alternative methods for analysis  of VOCs in a field environment rather than
 shipping samples to an offsite laboratory.  Due in part to the volatility of these analytes and their
 relatively low molecular weights, VOCs present in soils can be sampled by testing or "sniffing"
 the vapors released from the solid matrix. Options include:

       •  Immunoassay
       •  Flame Ionization Detector
       •  Portable Gas Chromatograph
       •  Photoionization Detector (PID).

 Sample Collection for VOCs
     Due to the potential for volatilization of analytes of interest, collection of soil samples for
laboratory analyses requires some special  handling and conscientiousness by the technician.
Head space must be minimized while, at the same time, the amount of sample handling during
filling of sample jars must be lessened.  This is especially important when samples are collected
during elevated ambient temperatures.

     Solid samples must be maintained at a maximum temperature of 4°C from the time of
collection until analysis is performed.   Chemical preservative is not used for solid samples.
However, aqueous samples, including trip blanks, equipment blanks, and field blanks must be
preserved to a pH of < 2 using either hydrochloric acid (HCl) or sulfuric acid (H2SO4).
     SW-846 methods for VOC analysis permit a maximum of 14 days from the date of sample
collection until the laboratory analysis is performed.  This is referred to as a "holding time" and
defines the maximum allowable time period for analyses.  Data generated after the holding time
has elapsed may be discarded as invalid.
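     As a simple illustration of the holding-time rule described above, the short Python sketch
below flags an analysis performed outside the 14-day window; the dates and function name are
hypothetical.

        from datetime import date

        HOLDING_TIME_DAYS = 14   # SW-846 holding time for VOC analyses, per the text above

        def within_holding_time(collected: date, analyzed: date) -> bool:
            # True if the analysis falls within the allowable holding time
            return (analyzed - collected).days <= HOLDING_TIME_DAYS

        # Hypothetical example: collected June 3, analyzed June 20 (17 days) -> exceeded
        print(within_holding_time(date(1995, 6, 3), date(1995, 6, 20)))   # False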

Semivolatile Organic Compounds
     Semivolatile organic compounds (SVOCs) include a number of commonly encountered
compounds including polycyclic aromatic hydrocarbons or polynuclear aromatic hydrocarbons
(PAHs or PNAs, respectively); aromatic compounds including dichlorobenzenes and
trichlorobenzenes;  nitrosamines; and phenolic compounds.  As the name implies, SVOCs are only
slightly volatile at ambient temperatures and have greater molecular weights than VOCs.  Most
SVOCs, in their pure form, are solids at ambient temperature.  Due to their lower volatility
(higher boiling point and lower vapor pressure), SVOCs can be  extracted from common
environmental matrices using heated solvents without significant loss.

     Phthalates, another class of semivolatile target analytes, are frequently detected in
environmental samples.  Phthalates are found predominantly in
plastic materials and may be present in a soil sample as a result of either field or laboratory
contamination.  The use of polyvinyl chloride (PVC) gloves can transfer phthalates, particularly
bis(2-ethylhexyl)phthalate, should a glove inadvertently contact the sample.  Nitrile gloves are
recommended for general purpose field and laboratory use to minimize potential phthalate
contamination.

Laboratory Methods for the Analysis of SVOCs
     Prior to instrumental analysis, soil samples must be extracted with an organic solvent,
typically methylene chloride or mixtures of acetone and hexane.  This procedure is referred to as
a "liquid-solid" extraction. In the USEPA SW-846 Methods Manual, there are method numbers
specific to the extraction procedures independent of the instrumental analysis methods. Method
3550 uses a sonic probe that emits high-frequency sound energy to disrupt the solids and to provide
aggressive contact with the solvent.  Method 3540 uses a Soxhlet extractor system that
performs an extraction technique similar to the method used to brew ground coffee. The soil is
placed in an inert glass fiber vessel called a thimble.  The thimble is placed into a special glass
holder that is located between a condenser and a flask. The flask is filled with the extraction
solvent.  As the solvent is heated and boils, the vapors rise and condense back into a liquid that
then drips through the solid sample contained in the thimble. Over a period of hours, the organic
contaminants are "washed" from the solid sample matrix and transferred into the solvent. The
volume of solvent containing the organic compounds can later be concentrated and subsequently
analyzed by analytical instrumentation.

     Terms frequently encountered when discussing SVOC data are "base-neutrals" and
"acids," abbreviated as BNAs or ABNs. The SVOCs can be categorized into acid compounds,
basic  compounds, or neutral compounds,  depending on  the pH of each compound and  its
preferential solubility. Methodology for  the extraction of  soils is performed  at a single pH
(neutral) compared to aqueous samples that are extracted at both a basic and an acidic pH.

     Method 8270 is the SW-846 method used to identify and quantify SVOCs using a GC/MS
instrument. Method 8250 is also listed in SW-846, but is an inferior packed column version of
Method 8270.  The primary difference between the  two methods is the  GC column used to
separate the mixture of SVOCs.  Method 8270 uses a capillary column that has better resolution
of GC peaks and thus more effectively separates complex mixtures of SVOCs.

     There are GC methods available for analysis of SVOCs but, just as with VOCs, the data
generated by GC/MS methods are far more definitive than those from GC-only methods. The
GC/MS methods produce a mass spectrum for each detected compound. Chemical compounds
produce mass spectra that are unique, analogous to each person having a distinctive fingerprint
pattern.

     High performance liquid chromatography,  sometimes referred to as  high  pressure liquid
chromatography (HPLC), also has been used successfully for the separation and detection of the
PAH  compounds.  SW-846 Method 8310 uses HPLC coupled to fluorescence  and ultraviolet
detectors. Method 8100 is a GC method specific for analysis of PAH compounds.
Field Screening for the Analysis of SVOCs
     Immunoassay test kits have been developed for the detection of pentachlorophenol (PCP),
petroleum hydrocarbons, and explosives (TNT, RDX) as listed in Table 4 at the end of this
appendix.  A more detailed discussion of immunoassay technologies is provided in a separate
section of this appendix.

     A quick "screen" to determine the presence of PAH compounds in field environments is to
use the property of ultraviolet fluorescence.  Certain compounds, when exposed to higher energy
sources such as ultraviolet (UV) light, will fluoresce. PCBs may also fluoresce, so this technique
can only be used as a broad-based screen to determine if a soil contains certain contaminants.

Discussion of GC Field Units
     The majority of field portable GCs  are designed only to sample atmospheric vapors.
Laboratory methods used in the field to extract soil and water samples can only be performed in
a mobile laboratory due to the complexity of the sample preparation procedures and due to health
and safety constraints.  Therefore, the portable GCs are predominantly used to analyze for VOCs
in the form of soil gases, headspace (outgassing of volatile compounds from waters or soils), or
ambient air.

     The portable GCs require electrical  power provided by either batteries or generators.  If
using batteries, the instrument operational time may be a limiting factor.  The GC system also will
require compressed gases that are used as the carrier gas and, depending on the type of detector
being used, flame gases (for FID and FPD detectors) also will need to be supplied. Regulations
regarding transportation of compressed gases must be observed. Smaller cylinders of gases,
called lecture bottles, are more portable than full-sized gas cylinders. The lecture bottle size may
restrict the operational period of the instrument as a result of limited gas supply.  Also, there are
fieldable gas generators, which are easier to ship, but require some operational instruction.

     The detection limits that can be achieved using a field GC unit may not be as low as those of
a gas chromatograph in a laboratory.  Detector sensitivity in a field unit may be less than in laboratory
models. Also, the detection capability for compounds measured from headspace or ambient air will
be poorer than for direct measurement of water or soil matrices.

Petroleum Hydrocarbons
     One of the most common analyses required for a site evaluation is a determination of the
possibility and  extent of petroleum hydrocarbon (PHC)  contamination.   Although the basic
analysis of PHCs in  aqueous and solid matrices has long been understood, there is  still a
significant amount of confusion and misinformation as individuals attempt to request analyses
from laboratories in support of regulations. Much of the confusion stems from the fact that States,
for the most part, have been left with the task of establishing the methods for PHC analyses.
While some States refer the regulated community to established methods or "modified" methods
in their regulations, other States have developed their own standardized method for determining
the extent of PHC contamination.

     In several cases, the methods required by State regulations provide little information about
the fraction of PHCs that were released on the site.  Some methods may even show false positives
depending on the nature of the soil matrix. Without PHC fraction information, the individual
tasked  with the evaluation of the site will find it difficult to recommend cleanup or remediation
approaches.  Therefore, it is important to select PHC analytical methods that will satisfy both
the regulatory body's requirements and the background information needs of the site evaluation.

     Table 3, found at the end of this appendix and titled "Common Petroleum Hydrocarbon
(PHC) Methods of Analysis," lists several of the common quantitative analysis procedures
available at the release of this document.  The procedures can be divided into two types of
analyses:  spectrophotometric infrared (IR) and gas  chromatography/flame ionization detector
(GC/FID). The two methods listed in Table 3 that use the spectrophotometric IR approach are
USEPA Method 418.1 and Southern California Laboratory (SCL) Method 418. The technique
involves extraction of the sample using Freon-113, which is a chlorofluorocarbon (CFC)  and is
known to degrade ozone in the Earth's upper atmosphere.  The USEPA and the international
community are phasing out the use of ozone-depleting CFCs. The extract is then cleaned up through
the use of a silica gel column to remove nonpetroleum hydrocarbons (e.g., fatty acids, greases,
and oils) that are also extracted by the Freon.  The cleaned-up extract is then analyzed by IR to
produce  a single number representing the total amount of PHC recovered  by the process.
However, there is evidence that the single silica gel cleanup is not adequate to remove all of the
potential positive interferences that may occur in complex environmental matrices.

     The other methods listed in Table 3 utilize GC/FID as an analytical technique.  The most
common GC/FID procedures for PHCs can be divided into gasoline range organics (GROs) and
diesel range organics (DROs).  GROs are considered VOCs and DROs are considered SVOCs.
Samples should be handled using the proper preservation and handling precautions specified for
VOCs and SVOCs to prevent losses. The results from the GC/FID analysis are graphically
displayed as a series of peaks representing the various components of the PHC mixture.  The total
area under these peaks is directly related to the concentration of PHCs in the sample.
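     Since the GC/FID result is proportional to total peak area, the quantitation step can be
sketched as integrating the chromatogram and applying a response factor derived from a standard.
The Python example below is illustrative only; the retention-time, signal, and response-factor
values are hypothetical.

        import numpy as np

        # Hypothetical chromatogram: time (minutes) and FID signal above baseline
        time_min = np.linspace(0.0, 20.0, 401)
        signal = np.exp(-0.5 * ((time_min - 8.0) / 0.6) ** 2)       # one synthetic peak

        # Total area under the peaks (trapezoidal rule)
        dt = time_min[1] - time_min[0]
        area = float(np.sum(0.5 * (signal[1:] + signal[:-1]) * dt))

        RESPONSE_FACTOR = 150.0   # hypothetical mg/kg of PHC per unit area, from a standard
        print(area * RESPONSE_FACTOR)   # about 226 mg/kg in this made-up example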

     The general approach to GRO analysis involves the purge-and-trap GC sample introduction
technique.  Water samples are measured into the purging unit, while soil samples must have a
specified amount of water added after they are placed in the purging unit.  Both the API and
Wisconsin GRO procedures specify addition of methanol to soil samples while in  the field to
disperse and extract the sample  prior to introduction into the purge-and-trap  unit.  Although
methanol has been shown to prevent losses of VOC target analytes prior to analysis, the solvent
can be difficult to work with in the field from a health and safety perspective.

     The general approach to DRO analysis is extraction with a solvent, usually methylene
chloride (CH2Cl2), followed by injection of an aliquot of the sample extract into the GC/FID.
Extraction can be performed by Soxhlet, sonication, or prolonged shaking/mixing with the
extraction solvent.

     Contacting the State regulatory branch in charge of underground storage tank (UST) testing
is normally the way to determine the proper method to satisfy State regulations.  Contacting the
USEPA Regional Coordinator for UST issues also may yield a list of contacts at the State level
or a list of methods that are approved for States in that USEPA Region.
Some States even have newsletters that highlight changes and updates to UST regulations. Any
historical data, including records of what was stored in the tanks or previous analyses of samples
collected from the site, may provide important information about what PHC fractions to test for
in the samples.

     If the site is not relatively new, then it is very likely that any PHC mixture spilled at the site
has "weathered." The term weathered refers to the changes that occur to a PHC mixture upon
exposure to  the environment over time. Evaporation and biodegradation are two of the driving
forces in the weathering process. After exposure to weathering, a sample of gasoline will have
a different GC peak pattern than a characteristic gasoline standard peak pattern.  Sometimes new
GC peaks can be seen as biodegradation occurs and new compounds are formed in the degradation
process.  Other peaks  may diminish in intensity or completely disappear.

Pesticides and Herbicides
     There are numerous types of pesticides and herbicides that may be present in a soil sample.
Many of these compounds are slow to degrade and their degradation products also may be targeted
for analysis.  GC has been the preferred method of analysis due in part to superior detection  limits
compared with alternative instrumentation. The use of compound-specific GC detectors optimizes
the detection limits.  For example, a method that uses an electron capture detector (ECD) is
specific to halogenated compounds (e.g., ones that contain chlorine, fluorine, or bromine),
and the nitrogen-phosphorus detector (NPD) is used for compounds that have nitrogen and/or
phosphorus atoms.  A summary of methods and the compounds amenable to detection are
provided in Table 2.

     Due to the laborious extraction procedures required prior to instrumental analysis, pesticide
analyses,  like the SVOC  analyses, are most frequently performed  in  a laboratory.   The
development of immunoassay test kits now permits field detection of some pesticide and herbicide
compounds. Compounds suitable for immunoassay detection  are shown in Table 4.
     Also provided in Table 4 are a number of additional immunoassay kits specific to either
pesticide or herbicide compounds that are currently under review based on recent performance
data, or are under development by commercial manufacturers. Kits for new compounds likely
will be developed based on market demand.

Polychlorinated Biphenyk
     Polychlorinated biphenyls (PCBs) have been persistent in the environment following decades
of use in power transformer oils, hydraulic fluids, capacitors, and lighting ballasts in addition to
other less frequent uses.  PCBs also are referred to by their trade names, including Aroclors.
Aroclors are identified by their chlorine content as percent chlorine by weight.  For Aroclor
nomenclature, the last two digits of the four-digit identifying code indicate this percentage.  For
example, Aroclor 1260 is approximately 60 percent chlorine by weight (NOTE:  this rule is not
applicable to Aroclor 1016, which has a total chlorine content similar to that of Aroclor 1242 but a
different congener distribution).  Also, the more chlorinated PCBs are more viscous in their pure
form.
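     The nomenclature rule above reduces to reading the last two digits of the Aroclor code, with
Aroclor 1016 treated as an exception.  A minimal Python sketch follows; the function name is
illustrative, not a standard routine.

        from typing import Optional

        def aroclor_percent_chlorine(code: str) -> Optional[int]:
            # Last two digits give the nominal percent chlorine by weight, except
            # Aroclor 1016, whose chlorine content is close to that of Aroclor 1242.
            if code == "1016":
                return None
            return int(code[-2:])

        print(aroclor_percent_chlorine("1260"))   # 60
        print(aroclor_percent_chlorine("1254"))   # 54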

     Since these compounds are heavily chlorinated, the analysis is performed  using the same
methods as the organochlorine pesticides such as GC/ECD by SW-846 Method 8081 (or the old
packed column version, Method 8080). In addition, immunoassay test kits have been developed,
as listed in Table 4, that are specific to PCB detection, permitting easier and quicker field
detection.

     PCB determinations have been adapted  for immunoassay field testing by the  major kit
manufacturers. Detection limits are in the 1 ppm range depending on the specific PCB Aroclors
present in the sample. A quick "screen" to determine the presence of PCBs in field environments
is to use the property of ultraviolet fluorescence, as described above for PAH
evaluations.  Certain compounds, when exposed to higher energy sources such as UV light, will
fluoresce. Other compounds may also fluoresce, so this technique can only be used as a broad-
based screen to determine if a soil contains contaminants.
Dioxins and Furans
     Polychlorinated dibenzo-p-dioxins (PCDDs) and dibenzofurans (PCDFs) are analyzed using
methods that can only be performed in a sophisticated laboratory.  The methods, including SW-
846 Method 8280 and Method 8290, include extensive  (and  therefore,  expensive) sample
preparation procedures and utilize GC/MS equipment calibrated to exacting specifications in order
to separate and identify the 17 2,3,7,8-substituted PCDDs/PCDFs of toxicological and regulatory
concern. The primary difference between Methods 8280 and 8290 is that Method 8280 is a "low
resolution" mass spectrometric method, capable of differentiating compounds that differ by one
atomic mass unit (amu), while Method 8290 is a "high resolution" method capable of identifying
compounds that differ in mass by as little as 5/10,000ths of an amu. The higher mass resolution
allows the analyst to achieve lower detection limits and prevents several common interferences in
the low resolution method from being misidentified as dioxins or furans. Both methods are very
costly due to the level of sample preparation to remove potential interferences and ready the
sample for instrumental analysis.

     Field determination of dioxins and furans, in  its current state, has only limited use as a
screening technique for potential  sites of  contamination.  The primary difficulty  with the
immunoassay technique is that the test kits cannot differentiate between different isomers of
dioxins and furans, and may give a positive response to any compound having a dioxin structure.

Metals and Cyanide
Laboratory Methods for the Analysis of Metals
     The customary methods for analysis of the individual metallic elements include either atomic
absorption (AA) spectroscopy or inductively coupled argon plasma/atomic emission spectroscopy
(ICP).  Each technique is performed in the laboratory and is not readily amenable to field use.
ICP provides simultaneous analysis of more than one element but sacrifices some sensitivity
compared to AA, which typically achieves superior detection limits but is limited to only one
element per analysis. There are different atomic absorption analysis techniques. Graphite furnace
atomic absorption (GFAA) achieves the best detection limits and is the preferred method for
analysis of the  "heavy metals." Heavy metals exhibiting high toxicities include arsenic, lead,
selenium, and thallium.  Mercury can be analyzed only by AA (rather than ICP) and uses a
slightly different technique (cold vapor atomic absorption [CVAA]) than other elements amenable
to AA.   Flame atomic absorption (FLAA) and  hydride generation AA also have specific
applications but are not encountered as frequently as ICP, GFAA, and CVAA.  Analysis using
ICP requires compressed gases and GFAA has significant electrical demands, both limiting the
applicability to field adaptation.

     For the analysis of soil samples, the matrix must be prepared prior to either AA or ICP
analyses. Analogous to liquid-solid extraction performed for organic determinations, the solid
samples must be digested with acids prior to analysis for metals.  Different SW-846 preparation
methods  are  available  depending on the sample  matrix, the  targeted metals,  and  the
instrumentation that will be used for  analysis.  Most of the methods require a portion of the
sample that has been mixed with acid to be slowly heated on an electrical hot plate. Newer
methods are currently available that perform the digestion procedure in a laboratory microwave
oven, reducing the preparation time and increasing digestion efficiencies, especially for more
complex solid matrices.

Field Methods for Metals Identification
     The immunoassay test kits that have been developed to date are primarily for the detection
of organic compounds.  Development of  immunoassay for inorganic species has not been
extensively investigated on a commercial basis.  Kits are becoming available for mercury
(measured as the mercuric ion) and for lead.

     However, analysis of some inorganics, such as  lead, can  be performed with equipment
designed  for field use. The X-ray fluorescence (XRF) instrument is commercially available and
can be used for some inorganic applications. The cost of the instrument is significant and some
models require advanced training to ensure proper usage. The XRF unit comes with radioactive
source material, and special permitting may be required prior to transport to various field
locations.  There are simpler XRF units that are limited to lead analysis only, but
their streamlined operation permits fast confirmation of lead content in soil, paint, and paint dust.

     Cyanide
     Cyanide refers to all compounds containing a CN group that can be determined analytically
as the cyanide ion, CN⁻.  Cyanide compounds are referred to as either simple or complex
cyanides.  This distinction between simple and complex cyanides provides a mechanism to predict
the relative solubility or dissociation with the ultimate formation of toxic hydrogen cyanide
(HCN), which is a gas.

     Laboratory analytical methods are available to determine a number of potential cyanide
classifications including total, amenable, and "free" cyanide concentrations.  The terms "weak"
cyanide and "dissociable" cyanide can be used interchangeably with free cyanide.

     Total cyanide is defined as both simple and complex compounds that are converted to HCN
gas when sulfuric acid is added and heat is applied using a customized distillation and HCN
adsorption apparatus.  However, it is important to note that the total cyanide distillation procedure
may not fully recover some of the more stable metallic cyanide complexes and may decompose
certain organic-cyanide compounds. Therefore, the total cyanide concentration reported may
underestimate the sum of all cyanide compounds present.  The sodium hydroxide adsorption
solution obtained from the total cyanide distillation contains HCN as NaCN, and can be analyzed
using a variety of procedures (colorimetric, potentiometric, or titrimetric) depending on the
sensitivity required.

     Amenable cyanide refers to the portion of total cyanide  that is effectively destroyed through
excess chlorination.  In order to determine amenable cyanide, two distillations similar to a total
determination are required: one on a chlorinated aliquot and one on an unchlorinated aliquot.  Both
adsorption solutions are analyzed, and the difference in concentration equals the cyanide
amenable to chlorination.  Since amenable cyanide includes primarily unstable and easily
dissociable cyanide compounds, the amenable concentration can be used to estimate the relative
sample toxicity.
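     The amenable cyanide calculation described above is simply the difference between the two
distillation results.  A minimal Python sketch, with hypothetical concentrations in mg/kg:

        def amenable_cyanide(total_unchlorinated: float, total_after_chlorination: float) -> float:
            # Cyanide amenable to chlorination = total CN in the unchlorinated aliquot
            # minus total CN remaining after chlorination
            return total_unchlorinated - total_after_chlorination

        print(amenable_cyanide(12.0, 4.5))   # 7.5 mg/kg amenable to chlorination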
     Another means to estimate the cyanide toxicity of a sample matrix is the determination of free
cyanide, which includes those cyanide compounds that readily liberate HCN upon addition of a
dilute acid solution. The determination of free cyanide is similar to that of total cyanide, except
that the distillation procedure required for total cyanide is omitted.

     Because both total and amenable cyanide require a distillation with specialized glassware,
a laboratory setting is necessary. The availability of field testing applications to the determination
of cyanide are therefore limited to the  quantitation of free cyanide only.   Field test kits are
designed to provide a colorimetric determination (as measured either visually or with a pocket
colorimeter) of free cyanide based on the sample matrix reaction with a dilute acid. The free
cyanide concentrations detected in the field should only be used for screening purposes due to a
number of possible matrix interferences.  Therefore, all critical field data points should be
confirmed with a replicate laboratory analysis.

     Ion Chromatography
     Ion chromatography  is a technique for sequential determination of anions, cations, or
selected organic acids using  ion exchange  with either a  conductivity,  amperometric,  or
colorimetric detector. The  technique is applicable to aqueous samples that have been filtered to
remove particles greater than 0.45 µm.  Solid samples may be analyzed,  but a leaching and
filtration step would be necessary prior  to analysis.  Because solid particles are excluded from
analysis, the concentration measured is more indicative of the dissolved rather than the total
concentration present.

     The basic ion chromatography system includes an injection port through which the aqueous
sample is introduced and merged with an eluent stream that is pumped onto a packed column
containing a resin material  with active sites that have an affinity for the constituents of interest.
Analytes are separated and have unique retention times based on their affinity to these active sites.
The individual analyte concentrations are measured by a detector and displayed as instrument
responses over analysis time. These responses are recorded as peaks at  a given retention time and
are integrated against a known series of calibration standards to calculate the sample concentration.
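     The final quantitation step described above amounts to interpolating each integrated peak
against the calibration standards.  The Python sketch below assumes a simple linear calibration;
the standard concentrations and peak areas are hypothetical.

        import numpy as np

        # Hypothetical calibration standards: concentration (mg/L) vs. integrated peak area
        std_conc = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
        std_area = np.array([1210.0, 2395.0, 4820.0, 12050.0, 24100.0])

        slope, intercept = np.polyfit(std_conc, std_area, 1)   # least-squares calibration line

        def sample_concentration(peak_area: float) -> float:
            # Invert the calibration line: concentration implied by an integrated peak area
            return (peak_area - intercept) / slope

        print(round(sample_concentration(7300.0), 2))   # about 3.0 mg/L for this hypothetical curve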

     Although ion  chromatography procedures provide a mechanism to determine multiple
analytes with a high degree of method sensitivity, there are numerous matrix interferences that
inhibit adequate analyte separation and create co-elution problems.  In most instances,  these
interferences can be adequately resolved through trial-and-error experimentation; however, an
experienced chromatographer is recommended for these situations.  Accurate sample
quantification is susceptible to reagent and eluent background contamination as well as to ambient
temperature and humidity extremes.  For this reason, even though the ion chromatography
equipment and supplies are portable, analytical procedures should be limited to a controlled
laboratory environment.

The When and Why of Field Screening and Analysis
     The type of field equipment to be used and the tests to be performed will be determined, in
part, by the type and quality of information required on a quick-response basis.  There must be
a planned approach to sample and site characterization.  For example, if rapid data acquisition is
needed to determine the boundaries of contamination caused by a chemical spill, then a semi-
quantitative or even a qualitative test may be sufficient to gather preliminary information. The
determination would be able to identify the most contaminated sites based on a positive test for
a specific chemical compound at a stated detection limit.  The total financial burden would be
minimized, but so would the quantity of information generated by many of the available field
tests.

     In order to acquire results in a more timely manner and  perform tests at lower costs
compared to laboratory analyses, a number of field screening techniques or semi-quantitative tests
are available for use by field personnel.  Portable analysis is currently at a stage where new
developments and improved technologies are emerging on a regular basis.  An annual
international conference is  devoted to discussing and comparing field screening methods for
hazardous wastes and toxic  chemicals. During the technical program, new methods and new
instrumentation are  presented in addition to performance data generated with newly introduced
products available for commercial use.
     The following sections are presented to illustrate the different types of field testing apparatus
that are currently available  and also to highlight the advantages as well  as the limitations of
various equipment.

Immunoassay Test Kits
     The application of newer technologies, such as immunoassay test kits, is increasing as more
analytes are incorporated into this methodology.  The frequency of immunoassay test kit usage
is expanding based on a number of factors:

     •   The test is less costly than laboratory tests.
     •   The turnaround is rapid, typically less than 1 hour from sample collection to final
         result.
     •   Immunoassay permits a cost effective means to map contaminated sites, monitor
         remediation, identify hot spots, determine risk, verify that field equipment has been
         decontaminated, or choose selected samples for laboratory analysis to confirm and
         quantify field assays.

     Presented on a simplified basis, the immunoassay technique uses a controlled reaction to
initiate a competition for antibody binding sites between the target compound and an "enzyme-
conjugate."  The enzyme-conjugate is capable of producing a detectable color.  Antibodies
specific to the targeted analyte are linked to binding sites on a solid substrate.  The solid substrate
captures any target analyte molecules present in the sample, which are collected on a membrane
surface of the collection device.  Color developing solutions are placed into the collection
device, and the presence (or absence) of the compounds of interest can be semi-quantitated using
either a portable spectrophotometer or reflectometer.  The more color that develops, the more
enzyme-conjugate (rather than target analyte) is bound; less color production therefore signifies
a greater concentration of contaminant.  Qualitative determinations regarding the presence or absence of
a compound also can be quickly performed visually through comparison of the resultant color and
color reference cards supplied by the manufacturers.
     As usage increases, performance data are being compiled to substantiate manufacturers'
claims regarding detection limits and frequency of either false-positive results or false-negative
results.  As with any procedure, the user must be aware of the limitations in the technology.
Some of the manufacturers state that the immunoassay kit is to be used as a screening test only.
One of the immunoassay manufacturers notes that "actual (PCB) quantitation is only possible if
the contaminating  Aroclor is known and if the assay is standardized using that Aroclor."  In
addition, the  manufacturer explains that "soil  sampling error may significantly affect testing
reliability.  The distribution of PCBs in different soils can be extremely heterogeneous. Soils
should be homogenized thoroughly before analysis by any method." Remember that the results
of any analytical testing are only as valid as the sample collected.

     A false-positive is defined as any sample yielding a positive result but containing less than
one-half the method minimum detection limit (MDL).  False positive reactions can be caused by the
presence of cross-reactants or mixtures of compounds in the soil samples. For instance, mixed
products such as diesel fuel, home heating oil, fuel oil, or Bunker C oil may elicit a positive
response in the PAH test kit.  Similarly, the presence of chlorinated compounds such as
dichlorobenzene, trichlorobenzene, and pentachlorophenol has the potential to produce cross-
reactivity (i.e., a positive response) when employing the PCB test kit.  Users of immunoassay test
kits must be  aware of possible interferences,  or compounds that cause reactivity, when the
potential exists for  co-occurrence of compounds that may generate a positive response.  Table 4
indicates chemically similar compounds for each test kit that have been demonstrated to trigger
a positive response.  There could be additional compounds that are not listed in this table that may
exhibit cross-reactivity based on chemical similarity to the target analyte.

     Conversely,  the definition of a false-negative is a sample that yields a negative result but
contains more than twice the MDL.
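     The two definitions above reduce to threshold comparisons between the kit result and a
confirmatory laboratory concentration.  The Python sketch below applies them directly; the
function and variable names are illustrative.

        def classify_screen(kit_positive: bool, lab_conc: float, mdl: float) -> str:
            # False positive: positive kit result but lab concentration < 0.5 * MDL
            # False negative: negative kit result but lab concentration > 2 * MDL
            if kit_positive and lab_conc < 0.5 * mdl:
                return "false positive"
            if not kit_positive and lab_conc > 2.0 * mdl:
                return "false negative"
            return "consistent"

        print(classify_screen(True, 0.2, 1.0))    # false positive
        print(classify_screen(False, 2.5, 1.0))   # false negative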

     Also, the user must be aware that test kits such as the total petroleum hydrocarbon
immunoassay  will  react to any number of hydrocarbon  compounds.  The lack of specificity in
certain immunoassay tests has the potential to confuse the user, especially if high concentrations
or a large diversity of native compounds are present in a soil. This would include soils that are
characterized by high concentrations of organic materials such as organic acids and other natural
decomposition compounds.

     To minimize the amount of error introduced during sample preparation when using an
immunoassay field test kit, the following guidelines should be observed. The temperature of the
soil used in the test must  be maintained within a range defined by the manufacturer  of the
immunoassay kits.  Color development is both time and temperature dependent, and the entire
assay procedure must be performed at one temperature. Typical working  ranges are between
15°C and 30°C.  Chemical reagents supplied in the test kits must not be exposed to temperature
extremes while in storage or during shipping of the test kits to the field locations.  Also, reagents
supplied in the test kits have a finite shelf-life. Use after the manufacturer's recommended dates
must be avoided. Do not mix reagents from different production lots.  Liquid reagents should be
dispensed onto the side of the vessel wall, rather than dropping the solution into the middle of the
well. Ensure that the dispenser tips on each reagent bottle do not contact any other solutions or
any surfaces on the test kit.

     Once the test is initiated, perform it without delay and without stopping between steps.
This is especially important when testing for VOCs.

Onsite Mobile Laboratories
     To achieve laboratory quality data using the same methods used in  environmental testing and
analysis labs, remediation  and testing laboratories have developed  mobile laboratories to be
brought onsite  to the project location.   These mobile labs conduct full-scale analytical
determinations, eliminating delays experienced in shipping samples to laboratories, breakage of
samples during shipment, and the need to compete for instrument time with other projects at the
laboratory facility.

     Much of the instrumentation used in a static laboratory is sensitive to changes in environment
(temperature and atmospheric pressure).  Compressed gases and a source of stable electrical
power are required. These variables make the operation of a mobile lab facility very challenging.
Of course, costs are high due to dedicated instrumentation and personnel.

      Most of the newer technologies that are commercially available are focused on the sample
preparation aspect rather than instrumental analysis.  For instance, more rapid sample preparation
is accomplished using  solid phase microextraction (SPME), solid phase extraction  (SPE),
accelerated solvent extraction (ASE), supercritical fluid extraction (SFE), microwave digestions
(for  metals  sample preparation), and  microwave-assisted extractions for organic  sample
preparations.  Each  of these techniques reduces solvent usage, eliminating time consuming
concentration procedures that often increase health and safety hazards.

      As these techniques  are  developed, they improve the logistics of mobile laboratory
operations. Laboratory-grade instrumentation can be used onsite, while sample preparation times
are significantly reduced along with the inherent hazards of sample preparation using flammable
solvents.

REFERENCES ON ANALYTICAL METHODS AND FIELD SCREENING
1.   Cincotta, J.J.  1994.  "Mobile Laboratory Advances."  Environmental Testing and Analysis.
      September/October. p. 20.
      Adaptation of lab methods to mobile lab environments is explored in this brief overview
      article.
2.   Dohmann, L.  1993.  "Screening Soil Samples in the Field Using a Rapid Immunoassay Test
     Kit."  American Environmental Laboratory.  August.  Vol. 5, No. 5, p. 31.
     The use of a commercially available field screening kit for PCB determinations is discussed and
     includes procedures for conducting the test and interpretation of the results.
3.   Friedman, David.  1993/1994.  "The Power of Environmental Screening Tests."
     Environmental Lab.  December/January, p. 16.
     This brief overview article discusses how to use screening tests effectively and defines a
      screening test.
4.   Gy, Pierre M.   1994.  "Sampling - Diving into the Unknown." LC-GC.  November.
     Vol. 12, No. 11. p. 808.
     A detailed discussion of sampling practices and techniques is presented from the chemist's
     viewpoint. Variables associated with proper sampling, introduction of error, reproducibility,
     and selection of a subsample contained within a collection jar for analysis are explored.

5.   Hartigen, J.  Christopher.   1993/1994.  "Screening  for PCBs."  Environmental Lab.
     December/January, p. 18.

     Case study of a PCB field screen application that compares Method 8080 (GC/ECD) with
     the use of a commercially available immunoassay test kit.

6.   Lesnik,  Barry.   1993/1994.   "SW-846:  The  Current Status."   Environmental Lab.
     December/January, p. 46.

     Provides updated information regarding the RCRA Organic Methods Development Program
     and status of Methods included in SW-846 Updates 1, 2, & 3.

7.   Murray, Kent S.  1994.  "The TRPH Controversy."  American Environmental Laboratory.
     October,  p. 8.

     A discussion regarding the various nomenclature and terminology used to describe petroleum
     hydrocarbon analysis and the methodology currently available for this type of analytical
     determination.

8.   Parsons,  A., and  A. Weiss.   1994.  "Quantitative Analysis  of PCBs in Soil  Using
     Immunoassay and a Low-Cost Programmable Spectrophotometer."  American Environmental
     Laboratory.  February.  Vol. 6, No. 1.  p. 10.

     Field quantitation of PCBs was performed using a commercially available immunoassay test
     kit.  The article discusses procedures for using the test kit and compares the results obtained
     using the CLP Method (GC/ECD) with the use of the immunoassay test kit.

9.   Schneider, J.F., J.D. Taylor, D.A. Bass, S.D. Zellmer, and M. Rieck.  1994.  "Evaluation
     of a Field Portable X-ray Fluorescence Spectrometer for  the Determination of Lead
     Contamination in Soil." American Environmental Laboratory.  November/December. Vol.
     6,  No. 10.  p. 35.

     Summarizes a study that utilized a portable XRF unit with comparison to data from laboratory
     analyses using ICP instrumentation.

10.  Applicable articles and poster presentations from the proceedings of the Field Screening
     Methods for Hazardous Wastes and Toxic Chemicals, Second International Symposium.

11.  Bernick, M., D. Idler, L. Kaelin, D. Miller, J. Patel, G. Prince, and M. Springer.  An
     Evaluation of Field Portable XRF Soil Preparation Methods,   p. 603.
Drqfi                                   C-21                            February 1996

-------
                                           Geostatisticd Soil Sampling Guidance-Appendix C
     A portable XRF unit was used to measure metals during a post remedial site lead survey.
     Field procedures were reviewed. Results were statistically compared to laboratory splits.

 12.  Chamerlik-Cooper, M., R. Carlson, and R.  Harrison. Determination of PCBs by Enzyme
      Immunoassay.  p. 625.

      Use of immunoassay techniques was investigated and cross-reactivity potential was
     measured. Test sensitivity and spike recoveries were also quantified.

 13.  Duquette, P., P. Guire, M. Swanson, M. Hamilton, S. Chudzik, and R. Chappa. Fieldable
      Enzyme Immunoassay Kits for Drugs and Environmental Chemicals.  p. 633.

      Pentachlorophenol (PCP) in soils and sediments was tested using immunoassay
      technologies.  The applicability of immunoassay to this environmental pollutant was
      investigated.

 14.  Cole in, W.H.,  L.A. Eccles, W.H. Engelmann, R.E.  Enwall,  G.A. Raab, and C.A.
      Kuharic.  Rapid Assessment of Superfund Sites for Hazardous Materials with X-Ray
      Fluorescence Spectrometry.  p. 497.

     Use of XRF is discussed, including how to properly calibrate the instrument, and which QA
     procedures should be implemented, in order to optimize the quality of  the screening data
     used to identify areas contaminated with inorganics.

 15.  Gabry, J.  Comparison of Mobile Laboratory XRF and CLP Split Sample Lead Results from
      a Superfund Site Remediation in New Jersey.  p. 671.

     A mobile laboratory XRF spectrophotometer was used to determine soil lead concentrations
     during a Superfund site remediation.  XRF results were compared to AA and/or ICP
     analyses using the CLP analysis protocol.  Interferences  to the XRF analyses were
     investigated and discussed.

 16.  Jenkins, T., K. Lang, M. Stutz, and M. Walsh. Development of Field Screening Methods
     for  TNT and RDX in Soil and Groundwater.  p. 683.

     Sample preparation and field determinations using a spectrophotometric technique to measure
     absorbance of the sample extract are presented.

 17.  Piorek,  S., and J. Pasmore.  A Si/Li Based  High Resolution Portable X-Ray Analyzer for
     Field Screening of Hazardous Waste, p. 737.

     The technical specifications of a commercial, field portable XRF analyzer are presented and
     its attributes and limitations are discussed.

 18.  Riddell, A., A. Hafferty, and T. Yerian.  Field Analytical Support Project (FASP)
     Development of High-Performance Liquid Chromatography  (HPLC) Techniques for On-Site

      Analysis of Polycyclic Aromatic Hydrocarbons (PAHs) at Pre-remedial Superfund Sites.  p.
     751.

     The use of laboratory HPLC equipment transported to the field is explored. Application of
     HPLC to analysis of PAHs was performed.

19.  Theis, T., A. Collins, P. Monsour,  S. Pavlostathis, and C. Theis.  Analysis of Total
      Polyaromatic Hydrocarbons using Ultraviolet-Fluorescence Spectrometry.  p. 805.

     PAHs are one of a number of compounds that emit a portion of incident UV radiation at
     longer wavelengths.  The article discusses extraction  techniques and sources of errors in
     measuring total PAH by UV-fluorescence.

20.  Vo-Dinh, T., G.H. Miller, A. Pal, W. Watts, and M. Uriel.  Rapid Screening Technique
      for Polychlorinated Biphenyls (PCBs) Using Room Temperature Phosphorescence.  p. 819.

     The use of a spectrofluorimeter equipped with a phosphoroscope is examined for easy, rapid
     and low cost field screening for PCBs.

21.  Wohltjen, H., N.L. Jarvis, and J. Lint.  A New Approach for On-Site Monitoring of Organic
     Vapors at Low PPB Levels,  p. 829.

     The poster presentation summarizes the development of a small, lightweight, portable gas
     chromatograph that uses ambient air as a carrier gas.  The GC was used to measure organic
     vapors including aromatic hydrocarbons.

22.  Wylie, D.E., L.D. Carlson, R. Carlson, F.W. Wagner, and S.M. Schuster.  Detection of
     Mercuric Ion  in Water with a Mercury-Specific Antibody,  p. 845.

      Early work to develop an immunoassay test specific for an inorganic compound is
      described.  Mercuric ion is targeted as the analyte for establishing the presence of
      mercury.

C.2 OPTIMIZING ANALYTICAL METHOD SELECTION

     The selection of the appropriate analytical methods involves more than simply finding the
compounds of concern listed in the method description. One of the most commonly encountered
issues involves that of method sensitivity.  As noted in the sections below, sensitivity is often
expressed in terms of "detection limits" which vary by method and across laboratories.  Prior to
reviewing the discussion of detection limits below, one must determine the purpose of a given
analysis, and in particular, if there is a numerical concentration value that will drive the decision-
making process.  For instance,  an analysis may seek to demonstrate that a sample or  samples
contain contaminant X at greater than 50 ppb. In this instance, the selection of the method must
enable one to measure the concentration of X at a level at least as low as 50 ppb; however, one
need not choose a method that is capable of measuring concentrations down to 1 ppb.

     One also needs to consider the precision and accuracy needed in the final results relative to
the decision to be made.  Thus, if the decision is whether two samples containing contaminant X
are statistically different,  the user may need to choose between methods on the basis of both
sensitivity and precision. A method with greater precision will allow one to distinguish between
the samples with greater certainty.

     In addition,  one needs to address the possibility of compositing of samples.  Compositing
can save analytical costs under some circumstances.  However, the effects of compositing need
to be considered in establishing the needed sensitivity, since each sample added to a composite
dilutes the sensitivity of the method to the contents of any one sample.  The effects of compositing
on sensitivity are  described in the section on bright-line values versus quantitation limits.
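
     To illustrate the dilution effect described above, the short Python sketch below computes the
sensitivity a method would need to achieve when samples are composited.  The function name, the
equal-mass assumption, and the example numbers are illustrative only and are not drawn from this
guidance.

     # Hedged sketch: quantitation limit needed when n samples are composited,
     # assuming equal-mass aliquots and a single sample at the bright-line value.
     def required_quantitation_limit(bright_line, n_in_composite):
         """Concentration the method must resolve in the composite so that one
         sample at the bright-line value is not diluted below the reporting level."""
         return bright_line / n_in_composite

     # Example: a 50 ppb bright-line with 4-sample composites implies the method
     # must reliably quantitate roughly 12.5 ppb in the composite.
     print(required_quantitation_limit(50.0, 4))   # prints 12.5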

Discussion of Detection Limits as They Relate to Method Sensitivity
     A basic consideration in the design of any environmental sampling and analysis program is
the sensitivity of the overall program relative to the compounds of interest. In most applications,
sensitivity  is equated with "detection  limit."   However,  as can  be seen from  a review  of
regulations and guidance documents across many USEPA programs and materials from outside

sources, the concepts and terminology encompassed by "detection limit" vary widely.  The
purpose of the following discussion is to clarify the various concepts and terms that one may
encounter, and  to focus considerations  of sensitivity on those terms specific to the  RCRA
program.

     An important distinction that must be made in any discussion of sensitivity is the difference
between detecting a substance and providing a quantitative estimate of the concentration of that
substance in a specific matrix.  Unfortunately, the distinction between "detection limits" and
"quantitation  limits" is often overlooked.  At its simplest, "detection" means the ability to specify
the presence or absence of a substance. While presence or absence (+/-) may still be acceptable
for reporting the results from some traditional bacteriological measurements,  the majority of
environmental measurements made today in response to Federal or state regulations require a
numerical estimate of the concentration of the substance under consideration.  As a result, various
USEPA programs have developed definitions of "detection limits" and "quantitation limits" that
are used  in those programs.  These "limits" are expressed as  numerical values, and the
differentiation between "detection" and "quantitation" has become a function of the degree of
inherent uncertainty in the limit.  As typically defined, "detection limits" have a greater associated
uncertainty than "quantitation limits."  The following are commonly employed terms. The source
of the term is indicated in parentheses:
      •   Method Detection Limit (Office of Water, Office of Solid Waste, Office of Ground
          Water and Drinking Water)
      •   Estimated Quantitation Limit (Office of Solid Waste)
      •   Practical Quantitation Limit (Office of Solid Waste)
      •   Practical Quantitation Limit (Office of Ground Water and Drinking Water)
      •   Instrument Detection Limit (Office of Solid Waste, Contract Laboratory Program)
      •   Reporting Limit (Office of Solid Waste)
      •   Regulatory Threshold (Office of Solid Waste)


     •   Contract-Required Detection Limit and Contract-Required Quantitation Limit (Contract
         Laboratory Program)
     •   Limit of Detection and Limit of Quantitation (American Chemical Society).

The following sections describe these terms, focusing first on the definitions used by the RCRA
program.

     In addition, in recent years, concerned members of the regulated community have funded
a series of studies evaluating alternative means of expressing detection and quantitation limits.
The stated goals of these studies are to apply more rigorous statistical methods to the derivation
of such limits, based on an examination of the instrument calibration data from multiple
laboratories. Dr. R.D. Gibbons of the University of Chicago has authored several papers on this
topic and his conclusions are summarized following the discussion of the terms listed above.

     Method Detection Limit
     The Method Detection Limit (MDL) is defined in the RCRA program as:
     the minimum concentration of an analyte that can be measured and reported with 99%
     confidence that the analyte concentration is greater than zero as determined by a
     specific laboratory method.

This definition was first used by the USEPA Office of Water, and published in 40 CFR Part 136.
It was later adopted by the Office of Solid Waste with minor modifications and described in
Chapter One of SW-846. As defined by either Office, the MDL may be determined in reagent
(blank) water, since reagent water is available to  all  laboratories.  The MDL may  also be
determined  in the matrix of interest, provided that the analyte is present or has been added at the
appropriate concentration.

     Under this definition, the MDL is a single-laboratory concept, and represents a snapshot of
the analytical capabilities of that laboratory.  The definition  in 40 CFR Part 136 specifies a
procedure for the determination of the MDL that involves the preparation and analysis of at least
seven aliquots of reagent water or the matrix of interest in a six-step process.  Those steps are:
      •   Step 1:    Make an initial estimate of the detection limit.
      •   Step 2:    Prepare a laboratory standard  in the specific  matrix  that contains an
                     acceptable concentration of analyte.
      •   Step 3:    Take a minimum of seven aliquots of the sample and process these sample
                     aliquots through  the entire analytical method, recording the measurements.
      •   Step 4:    Calculate  the variance and  the  standard  deviation of  the replicate
                     measurements.
      •   Step 5:    Calculate the MDL as the standard deviation times the Student's t value.
      •   Step 6:    Verify  the reasonableness of  the  estimated  detection limit and  the
                     subsequent MDL determination.
In adopting the MDL, the Office of Solid Waste permitted it to be calculated from as few as four
replicate measurements, while employing the same statistical calculations.  The net effect is that
the Student's t value used in Step 5 increases as the number of replicates decreases, usually
resulting in a larger MDL value than would be calculated from seven replicates.  The six steps
are described below:
     Step 1 involves making an initial estimate of the detection limit using the laboratory's best
judgement.  One of the four following techniques set forth in Appendix B to 40 CFR Part 136
must be used:

     •    Determine the concentration value that corresponds to an instrument signal-to-noise ratio in the
          range of 2.5 to 5
     •    Determine the concentration equivalent to three times the standard deviation of replicate
          instrumental measurements of the analyte in reagent (blank) water
     •    Locate the region of the standard curve where there is a significant change in sensitivity
          (i.e., a break in the slope of the standard curve)
     •    Utilize the instrumental limitations.
     In Step 2, the laboratory prepares a "standard" containing the analyte.  This standard may
be prepared from reagent water, from a clean solid matrix such as sand, or from an actual waste

sample matrix. If a waste sample matrix is to be employed, a waste sample should be obtained
and analyzed to determine if the analyte is in the recommended range of one to five times the
estimated detection limit. If so, it can be used for developing an MDL. If not,  and the measured
level of analyte is less than this recommended range, a known amount of analyte should be added
to bring the level of analyte to the recommended range of between one and five times  the
estimated detection limit.  On the other hand, if the measured level of analyte is greater than five
times the estimated detection limit,  there are two options. The first is to obtain, if possible,
another waste sample with a lower level of analyte.  The other is to use the sample as is, as long
as the analyte level does not exceed ten times the USEPA-specified MDL.

     Step 3 involves analyzing a minimum of seven aliquots of the prepared standard.  This is the
heart of the MDL development process. Strict adherence to procedures and well-kept records are
essential.
     If a blank measurement is required by the analytical method to calculate the measured level
of analyte (e.g., for a spectrophotometric method involving a color change), then a separate blank
is analyzed for each sample aliquot.  Each blank measurement should be subtracted from the
respective sample measurement to obtain the final result.  These blank-subtracted results are then
used to calculate the MDL.  (NOTE:  This blank subtraction only applies to those methods that
explicitly require it.  It does not apply to those methods that require the preparation of a method
blank concurrent with the sample preparation, i.e., it does not apply to the 6000, 7000, or 8000
series SW-846 methods).
     It may be economically and technically desirable to evaluate the estimated MDL before
analyzing all seven aliquots.  To ensure that the MDL estimate
is appropriate, it is necessary to determine that a lower concentration of analyte will not result in
a significantly  lower MDL. To do this, two aliquots of the waste sample should be processed
through  the entire  method, including  the  blank measurements described above.   If the
measurements indicate that the sample is in a desirable range for determination of the MDL, the
five additional aliquots can be analyzed and all  seven measurements can then be used to calculate
the MDL. If the measurements indicate the sample is not in the correct range, the MDL should
be re-estimated, a new waste sample obtained as described in Step 2,  and Step 3 should be
repeated.

     In Step 4, the variance (S²) and the standard deviation (S) of the replicate measurements
taken in Step 3 are calculated as follows:

                    S² = [ Σ(Xi²) − (ΣXi)²/n ] / (n − 1)

and

                    S = √(S²)

where X1 ... Xn are the analytical results in the final method reporting units obtained from the n
sample aliquots and Σ indicates the sum of the values X1 to Xn.  Note that the calculation of S²
uses n − 1 degrees of freedom.

     Step 5 involves the calculation of the MDL using the equation:

                    MDL = t(n−1, 1−α=0.99) × S

where t(n−1, 1−α=0.99) is the Student's t value appropriate for a 99% confidence level and n − 1
degrees of freedom.
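
     The arithmetic of Steps 4 and 5 can be illustrated with a short Python sketch; the seven
replicate results below are hypothetical, and the sketch is not a substitute for the procedure in
40 CFR Part 136.

     # Illustrative sketch of Steps 4 and 5 using hypothetical replicate results.
     from statistics import stdev
     from scipy.stats import t

     replicates = [1.9, 2.4, 2.1, 2.6, 2.0, 2.3, 2.2]   # n = 7 spiked-aliquot results

     n = len(replicates)
     s = stdev(replicates)              # Step 4: standard deviation with n-1 degrees of freedom
     t_99 = t.ppf(0.99, df=n - 1)       # one-sided Student's t at the 99% confidence level
     mdl = t_99 * s                     # Step 5: MDL = t x S

     print(round(t_99, 3), round(mdl, 3))   # for n = 7, t is approximately 3.143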

     Step 6 is the iterative procedure in 40 CFR Part 136 that should be used to verify the
reasonableness of the MDL. It is also the step that is most often overlooked.  The calculated
MDL value is compared to the estimated detection limit determined in Step 1. If the spiking level
used was not within a factor of five of the calculated MDL, then the calculated MDL value is not
valid.  Samples spiked at a very high concentration may result in an MDL value that is not
representative of the capabilities of the analytical procedure at the lower limit of its sensitivity.
     The procedure in 40 CFR Part 136 also describes how to calculate 95% upper and lower
confidence limits for the MDL.  Based on seven replicate analyses, the lower confidence limit is
calculated as LCL = 0.64 x MDL and the upper confidence limit is UCL  = 2.20 x MDL. If
fewer replicates are analyzed, the multipliers used here will change (see 40 CFR Part 136). The
limits are rarely reported by the laboratory and their utility is subject to debate.

     As defined by both the RCRA and Clean Water Act (CWA) programs, the MDL is a single-
laboratory measurement. Therefore, one should expect that the exact MDL for an analyte will
vary from laboratory to laboratory, even when the same analytical method is employed.  Further,
the MDL will also vary within  a single laboratory when determined repeatedly over time.
However, the within-laboratory variability should be relatively small.

     Estimated Quantitation Limit
     By  its very  definition,  the MDL  should provide only a 1% risk  of a false  positive
determination, that is, claiming that the substance is present when, in reality, it is not.  However,
at the MDL, there is a 50% chance that a false negative determination will be made, saying that
the substance is not present when it is.  Given this relative lack of certainty (i.e., the 50% chance
of a false negative) measurements reported at or near the MDL are not considered particularly
reliable by many parties, especially for monitoring compliance with a regulatory limit. Therefore,
various USEPA Offices have applied the concept of a "quantitation limit," the concentration at
which one has reasonable certainty that the reported concentration is correct, and which provides
adequate protection against false negative determinations.

     The Office of Solid Waste currently uses the Estimated Quantitation Limit (EQL) as a means
of providing guidance to laboratories and RCRA-regulated facilities on the ability of the SW-846
methods to provide results with acceptable certainty.  Beginning with the 2nd Update to the 3rd
Edition, many SW-846 methods provide EQLs for guidance.  EQLs are usually provided for a
clean matrix such as reagent water and may be provided for other matrices of interest as well.
EQLs for solid matrices are sometimes expressed as a multiplier times the aqueous EQL.  This
multiplier accounts for differences in sample sizes between matrices.   EQLs are also usually

expressed on a wet weight or "as received" basis, and do not account for the reporting of dry
weight results.

     By their nature, EQLs are also method- and analyte-specific.  The EQL value for an analyte
by a specific method will always be greater than its associated MDL for the same  analytical
method.  As noted in Chapter One of SW-846, the EQL represents:

     The lowest concentration that can be  reliably achieved within specified limits of
     precision and accuracy during routine laboratory operations. The EQL is generally
     5 to 10 times the MDL.  However, it may be nominally chosen within these guidelines
     to simplify data reporting.  For many analytes the EQL analyte concentration is
     selected as the lowest non-zero standard in the calibration curve....

As will be seen, this definition of the EQL includes aspects of several other terms used by
USEPA, including the PQL from Drinking Water and the CRQL from the Contract Laboratory
Program. However, in practice, the derivations of EQL values are less clear than in these other
programs.

     Practical Quantitation Limit - OSW
     The Practical Quantitation Limit (PQL) was an earlier attempt to provide guidance in SW-
846, and was used prior to the 3rd Edition and in many methods in the 1st Update to the 3rd
Edition.  The idea behind the PQL  was to  provide a limit higher than the MDL that  would
represent what could be practically achieved in a commercial environmental testing laboratory.
The term was abandoned by OSW after the 1st Update for several reasons, including the fact that
the Office of Ground Water and Drinking Water employed the same name for a different concept.

     In practice, the PQL values listed in many older SW-846 methods were drawn from similar
methods in USEPA Programs such as the Contract Laboratory Program (CLP), and therefore,
were based on the definition of the Contract-Required Quantitation Limit (CRQL) discussed later
in this document.  Thus, it was quite common to see PQLs of 10 µg/L for aqueous samples and
330 µg/kg for solid samples.  These values were drawn from the CLP semivolatile organic
methods, where a 1-L aqueous sample or a 30-g soil sample was extracted with an organic

solvent.  However, in other SW-846 methods, the derivation of the PQL values was far from
clear.  In some cases, it appeared that  the values were simply copied from method to method
without regard for critical differences  in sample size, final extract volumes, and calibration
standards.

     As noted above, the PQL was abandoned by OSW and replaced with the EQL.  However,
if an SW-846 method has not been revised since the 2nd Edition, it may still contain PQL values.

     Practical Quantitation Limit - Drinking Water
     Currently, the Office of Ground Water and Drinking Water (OGWDW) calculates PQLs by
two different techniques.  The first is used in the absence of adequate performance data.  The
second is based on data obtained through performance evaluation (PE) studies and interlaboratory
studies.

     If sufficient PE data are not available or if the PE data are not available at the concentration
of regulatory interest, the PQL may be based on single laboratory or "interlaboratory MDLs."
The OGWDW is one of the few USEPA Offices to use the concept of an "interlaboratory MDL."
The interlaboratory MDL is essentially an MDL calculated  from data from MDL studies
conducted at a number of laboratories, and is sometimes referred to as a "pooled MDL." It is the
standard deviation of a series of replicate MDL measurements made in more than one laboratory
times the Student's t value for the total number of replicates.  While the increased number of
replicates should yield greater statistical  certainty than a single-laboratory MDL employing only
seven replicates, the fact that the different laboratories each prepare  the  replicates, often at
noticeably different concentrations (based on their individual estimates of method sensitivity),
results in a higher MDL value. While this is viewed by the regulated community as accounting
for interlaboratory variability, it  often simply lumps the "good" laboratories in with the "not so
good" laboratories, and tells the data user little about the capabilities of a single laboratory or the
method.
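
     The sketch below illustrates the pooled ("interlaboratory") MDL concept as described in this
paragraph: replicate results from several laboratories, often spiked at different levels, are
combined, and the overall standard deviation is multiplied by the Student's t value for the total
number of replicates.  The laboratory names and results are hypothetical.

     # Illustrative sketch of an interlaboratory ("pooled") MDL; data are hypothetical.
     from statistics import stdev
     from scipy.stats import t

     lab_replicates = {
         "lab_A": [2.0, 2.3, 1.9, 2.4, 2.1, 2.2, 2.5],
         "lab_B": [2.8, 3.1, 2.6, 3.3, 2.9, 3.0, 2.7],   # spiked at a noticeably higher level
         "lab_C": [1.7, 2.2, 1.8, 2.0, 2.1, 1.9, 2.3],
     }

     all_results = [x for results in lab_replicates.values() for x in results]
     pooled_mdl = t.ppf(0.99, df=len(all_results) - 1) * stdev(all_results)

     # The spread between laboratories (not just within them) inflates the standard
     # deviation, so this pooled MDL is higher than a typical single-laboratory MDL.
     print(round(pooled_mdl, 2))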
     In the absence of PE data, the PQL is usually set at 10 times the MDL.  OGWDW believes
that setting the PQL at 10 times the MDL achieved by good laboratories is a fair expectation
during routine operation of most qualified state and commercial laboratories.  If the situation
warrants, for instance, for a cancer-causing substance, a PQL of 5 times the MDL may be
considered.

     The following procedure is currently being used by the OGWDW to calculate PQLs from
PE data (a simplified sketch of step 4 appears after the list):
     1.  Regression equations for precision and accuracy are generated using the USEPA and
         State laboratory data for each contaminant.
     2.  The percent recovery and relative standard deviation are calculated at the proposed
         regulatory limit using the regression equations generated from the data for each
         contaminant.  The percent recovery and relative standard deviation are used to estimate
         the 95% confidence limits.
     3.  The laboratory data are evaluated to determine the range as a percent of the true value.
     4.  The  PQLs are  set at a concentration where at  least  75%  of USEPA and  State
         laboratories are within the specified acceptance range.
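
     A simplified Python sketch of step 4 follows.  The candidate concentrations, laboratory
results, and the ±40 percent acceptance window are hypothetical; in practice the acceptance ranges
come from the regression step described above.

     # Simplified sketch of step 4: choose the lowest candidate concentration at which
     # at least 75% of laboratories report results inside the acceptance range.
     def pql_from_pe_data(pe_results, rel_window=0.40):
         """pe_results maps each spiked (true) concentration to the labs' reported values."""
         for true_conc in sorted(pe_results):
             reported = pe_results[true_conc]
             lo, hi = true_conc * (1 - rel_window), true_conc * (1 + rel_window)
             fraction_in_range = sum(lo <= r <= hi for r in reported) / len(reported)
             if fraction_in_range >= 0.75:
                 return true_conc
         return None   # no candidate concentration met the 75% criterion

     example = {
         2.0: [0.9, 1.4, 2.6, 3.3, 2.1, 0.7, 2.2, 3.0],     # only 4 of 8 within +/-40%
         5.0: [4.1, 5.6, 4.8, 6.3, 5.2, 3.4, 4.9, 5.7],     # all 8 within +/-40%
         10.0: [9.2, 10.8, 9.7, 11.3, 10.1, 8.9, 10.4, 9.5],
     }
     print(pql_from_pe_data(example))   # prints 5.0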

Unfortunately, it is not always possible to determine whether a drinking water PQL was
determined from PE data or as a simple multiple of the MDL without reading the text of the Safe
Drinking Water Act regulation under which it was developed.

     By their very nature, PQL values from drinking water methods should never be applied
outside of OGWDW Programs because drinking water represents a much cleaner sample matrix
than virtually any matrix encountered under RCRA or even CWA. Further, because there are
often significant differences in the analytical methods themselves, including sample sizes, the
OGWDW PQLs should not even be applied to RCRA ground water monitoring results without
a thorough comparison of the methods employed.
     Instrument Detection Limit
     USEPA  has recognized that the overall sensitivity and reliability of an environmental
 measurement are dependent on a number of factors, including those related to sample collection,
 sample preparation, and analysis. In an attempt to separate the analytical issues from those related
 to sample collection and handling, various USEPA Offices have narrowed the MDL concept to
 address the Instrument Detection Limit (IDL).

     Whereas the procedure for determining the MDL involves the preparation and analysis of
 replicate samples of reagent water or other matrix, the IDL procedure involves only replicate
 measurements of an analytical standard containing the substance. The standard is prepared in a
 solution that can be directly introduced into the instrument and that resembles the final solution
 prepared from an environmental  sample.  Thus, the variability due to the sample collection,
 handling, and preparation steps has been eliminated in the IDL.  IDL values are then calculated
 from the standard deviation of at least seven replicate measurements of the standard, and represent
 the capability of the instrument itself to measure the substance.  The IDL is a snapshot of
 instrument capability, and will vary with time, between instruments, and between laboratories.
The IDL concept is not explicitly described in 40 CFR Part 136, but has been included in Contract
 Laboratory Program documentation for inorganic analyses.
     By its  derivation, the  IDL for an analyte should always be less than the associated MDL
value for the same instrumental procedure.  In practice, the IDL is often calculated for inorganic
analytes such as metals. It is rarely calculated for organic analytes, in part, because instrumental
limitations are rarely the controlling  factor  in establishing sensitivity for the organic methods.
Rather, the  level of chemical interferences co-extracted  from a sample will often control the
sensitivity of techniques such as GC  or GC/MS. Thus, the IDL for an organic analyte would
provide an overly optimistic and rarely  attained estimate of the sensitivity of the technique.  Given
the cost of generating IDL values for organic analytes, the IDL is generally considered of minimal
utility for organic contaminants.
     Reporting Limit and Regulatory Threshold
     A reporting limit is usually a numerical value specified in a regulatory permit issued under
 RCRA (or other authority) that specifies the cut-off point for reporting analytical results.  Values
 below a reporting limit may be considered too uncertain or not of sufficient consequence to be
 used for compliance monitoring purposes.  Thus, a permit may stipulate that values below a
 specified reporting limit be reported as zero.  The reporting limit may be driven by considerations
 of risk assessment or a regulatory threshold or action limit.

     A regulatory threshold is usually an amount or concentration of a substance that must be
 exceeded to trigger some action by USEPA or other regulatory authority. As implied by its name,
 the threshold is set forth in a specific regulation, or under the auspices of a specific regulatory
 authority.

     The important aspect of reporting limits or regulatory thresholds to consider is that in order
 to demonstrate compliance with such a limit, the overall sampling and analysis program must be
 designed to achieve a sensitivity lower than such limits.  While this may seem obvious, it is an
 often overlooked part of the overall Data Quality Objective (DQO) process. Conversely, another
 common mistake is to specify sensitivity that is much lower than the reporting limit or regulatory
 threshold,  which complicates the analytical process by gearing the analysis  to very low
 concentrations that are not expected in the samples.  For instance, why use a method capable of
 quantitating 0.1 µg/L of a compound when the regulatory threshold is set at 100 µg/L?

     Contract-Required Detection Limit and Contract-Required Quantitation Limit
     In response to burgeoning analytical demands placed on USEPA under the Comprehensive
 Environmental Response, Compensation, and Liability Act (CERCLA) of 1980, USEPA established the
 Contract Laboratory Program (CLP) to provide for routine analytical services under the Superfund
 Program.  The terms Contract-Required Detection Limit (CRDL) and Contract-Required
 Quantitation Limit (CRQL) were established by the CLP in its various Statements of Work (SOWs).
The CRDLs are specified in the CLP inorganic SOWs and the CRQLs are specified in the organic

 SOWs.  While the methods employed by the CLP bear a great number of similarities to some SW-
 846 methods and methods  from other USEPA Programs, the SOWs represent the basis of a
 contract between a laboratory and USEPA.  As such, the laboratory is required by its contract to
 meet certain levels of sensitivity.

     In 1986 the CLP organic contracts recognized that the CRDL was actually a quantitation
 limit. The CLP formally defined the CRQL as:

     ...the concentration in the sample equivalent to the concentration of the lowest
     calibration standard analyzed for each analyte.

 Since the organic SOW specifies the concentrations of the three to five initial calibration standards
 analyzed by each of the procedures, specifies the sample volume or weight extracted, and the final
 extract volume and injection volume, it is possible to set a value for each analyte in each matrix
 that can be related to the quantitative ability of the method demonstrated during the initial
 calibration.  The "cost" of this ability  is the relative inflexibility inherent in the CLP methods.
 However, there is much to  be said for the resulting standardization of reporting requirements.
 The CLP SOWs provide CRDLs and CRQLs for both aqueous and soil samples.  The CRQLs for
 organics may also be divided into low and medium values for the soil samples, as two different
 sample sizes may be employed, depending on the expected concentration.

     Unfortunately, few USEPA methods outside of the CLP actually specify the concentrations
 of the calibration standards, or set the low point concentration  of the initial calibration. These
other methods may stipulate setting the low point standard at or near the MDL, but given the
 relative latitude for calibration in these methods and the lack of specific requirements, it is rarely
possible to derive a method-specific measure of sensitivity that is equivalent to the CRQL or
CRDL.   In addition, since  the CLP methods are  geared towards the range of concentrations
 typically of interest at Superfund sites, they do not approach the limits of method sensitivity.
Rather, they rely on the inherent standardization of the analyses to allow comparable results to be
generated among different CLP laboratories.
     Despite their name, the CRDL values for inorganics are actually quantitation limits as well.
The CLP inorganic SOWs require that a standard be analyzed at the concentration equivalent to
the CRDL, and in fact, require the analysis of another standard at half the CRDL.

     For solid samples, both the CRDL and the CRQL are based on the wet weight of a sample
as received by the laboratory.  As a result, these values must be adjusted for the dry weight of
solid samples and other dilutions of sample extracts or digestates.  Both these adjustments will
increase the CRQL or CRDL for the specific sample.
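
     The general direction of these adjustments can be sketched in Python as follows; the function
and the example numbers are illustrative only and do not reproduce the exact equations in the
CLP SOWs.

     # Illustrative sketch (not the CLP SOW equations): adjusting a wet-weight
     # CRQL/CRDL for dry-weight reporting and for dilution of the extract or digestate.
     def adjusted_quantitation_limit(nominal_limit, percent_solids, dilution_factor=1.0):
         """Both a lower percent solids and a larger dilution factor raise the limit."""
         return nominal_limit * dilution_factor / (percent_solids / 100.0)

     # A 330 ug/kg wet-weight CRQL, for a sample that is 60% solids with a 5x extract
     # dilution, becomes 330 * 5 / 0.60 = 2750 ug/kg on a dry-weight basis.
     print(adjusted_quantitation_limit(330.0, 60.0, 5.0))   # prints 2750.0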

     In addition, while the CRDL and CRQL function  as  standardized reporting levels,  the
organic SOW encourages the laboratories to report values below the CRQL, but to flag these
 results with a contract-specific data qualifier, the "J" flag, which indicates that the value is an
 estimate.  These estimated concentrations probably have a significantly greater uncertainty
 associated with them than values above the CRQL.

     The CLP inorganic SOW requires that the laboratories  determine and report their IDL values
quarterly, using the procedure described earlier. The requirement for organic IDL reporting was
 dropped in 1986, due to its lack of utility and cost.

     As noted earlier, the PQL values for many older SW-846 methods appear to have been
copied directly from the CLP CRQLs and CRDLs. While the  CLP values cannot be directly
related to most other methods, they can be a useful indication of what can be routinely achieved
 in a capable laboratory, assuming that one compares critical method considerations such as sample
size, extract volume, and injection volume.

     Limit of Detection and Limit of Quantitation
     The American Chemical Society has a standing committee on  environmental improvement
that has published guidelines for data acquisition and data quality  evaluation in environmental
chemistry. In a 1980 publication, the ACS committee put  forth two terms related to sensitivity,
the Limit of Detection (LOD) and the Limit of Quantitation (LOQ).  These terms were included

in an updated discussion around 1988.  As a result, the terms LOD and LOQ are frequently
encountered in any review of method sensitivity.

     As defined by ACS, the LOD is very similar to the MDL defined by USEPA.  While the
MDL is defined as a concentration that can be distinguished from zero with 99 percent certainty,
the LOD is defined as the concentration that can be distinguished from a blank. For methods that
involve a blank subtraction, this difference is significant; however, if one is evaluating most
 USEPA methods for organics and inorganics, no blank subtraction is performed during routine
analysis, and the LOD and the MDL can be considered closely related. The ACS procedure for
determining the LOD also employs replicate measurements and recommends that the standard
deviation of the replicates be multiplied by a factor of 3. This multiplier is close to the Student's
t-value of 3.143 for seven replicates specified in USEPA's MDL procedure. In order to provide
greater quantitative certainty, the ACS recommended that the LOQ be calculated as ten times the
standard deviation, or 3.3 times the LOD.
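
     Using the same hypothetical replicate results as the earlier MDL sketch, the ACS conventions
reduce to simple multipliers of the replicate standard deviation:

     # Illustrative sketch of the ACS conventions: LOD = 3 * s, LOQ = 10 * s.
     from statistics import stdev

     replicates = [1.9, 2.4, 2.1, 2.6, 2.0, 2.3, 2.2]   # hypothetical replicate results
     s = stdev(replicates)

     lod = 3.0 * s
     loq = 10.0 * s
     print(round(lod, 2), round(loq, 2), round(loq / lod, 2))   # LOQ/LOD ratio is about 3.33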

     In addition, the ACS and USEPA engaged in joint discussions in 1992 in an effort to resolve
the differences between their respective approaches and in order to address the concerns of the
regulated community about compliance monitoring decisions. The results of these efforts were
to be a Federal Register proposal of two new terms, the Reliable Detection Level (RDL) and the
 Reliable Quantitation Level (RQL).  The word "limit" was changed to "level" in an attempt to
 make a distinction between the operationally defined measure of sensitivity and the impression that
 a "limit" was some fundamental restriction.  The RDL and RQL included the ability to specify
 a level of confidence in the measurement other than the 99% certainty specified in the definition
of the MDL (for instance, 90% or 95%). However, the resulting approach was simply another
choice of a multiplier for a standard deviation derived from replicate analyses of  spiked reagent
water. The conceptual problems remained, despite a new pair of terms, and no proposal has been
put forth by USEPA as of 1995.
     Gibbons Approach
     As noted earlier, some members of the regulated community have voiced concerns about the
adequacy of the MDL with respect to compliance monitoring. These concerns have included the
use of MDL values in both the  RCRA and NPDES programs.  In  studies sponsored by the
regulated community,  Dr.  R.D. Gibbons of the University of Chicago has developed an
alternative approach to estimating detection and quantitation limits based on statistical evaluations
of the calibration data from  multiple laboratories.  Dr. Gibbons has published these results and
the regulated community has presented them to USEPA for consideration.

     While the statistical derivation of the limits is beyond the scope of this document, Gibbons'
basic approach is summarized here:  Routine calibration  data for a given method are collected
from multiple laboratories (e.g., 10 or more), verified, and screened for gross outliers (defined
as measured concentrations that are not within an order of magnitude  of the "true" value of the
standard).  The data are then used to generate a detection limit estimator that is designed to
balance both false positive and false negative results at 1%  each. This estimator is also designed
to determine the value above which 99% of all future measurements can be  statistically
distinguished from zero.  This is in sharp contrast to the approach embodied  in the USEPA MDL
procedure. As noted in the  discussion of the USEPA MDL, that value represents a 1 % risk of
false positives and a 50% risk of false negative results.  It is also designed to address the
concentration above which a single future measurement can be distinguished from zero.

     Gibbons notes that the current  USEPA MDL procedure will produce different results
depending on the spiking level employed by the laboratory.  While true, this concern discounts the
iterative evaluation procedure described by USEPA in the Part 136 MDL procedure, a step that is,
admittedly, sometimes ignored by laboratories.  According to Gibbons, his approach has
the advantage of linking the MDL estimator to the concentrations used in the instrument
calibration, rather than to an unspecified spiking concentration.  Gibbons has also noted that
USEPA's equation for calculating the MDL  (see 40 CFR Part 136 and SW-846) is not a
confidence interval or any other usual statistical estimate.
     Gibbons also describes the use of inter-laboratory data to derive a PQL estimator.  This
estimator represents the concentration of an analyte at which the precision of the replicate
calibration data falls within a desired range.  Thus, Gibbons believes that it represents not only
a "quantitative" value, but one at which the uncertainty can be specified. For instance, Gibbons
suggests a relative standard deviation  (RSD) of either 10% or 20%, depending on the desired
accuracy. In general, his PQL estimators for a 20% RSD are 3-4 times lower than those derived
for a 10% RSD. He also derives an upper confidence limit for the PQL.

     The net result of these involved statistical manipulations is detection and quantitation limits
that are 5-10 times those that might be derived using the current USEPA MDL approach.
Gibbons argues that his quantitation limits ought to be considered when establishing compliance
monitoring levels.

     USEPA has not completed formal consideration of the Gibbons approach, but a number of
concerns have been raised. These include the fact that the approach relies on calibration data from
multiple laboratories. Given the differences in  instrumentation and specific procedures employed
across commercial laboratories, USEPA expects that difficulties may arise in pooling such data.
For instance, a laboratory using Method 8260 to monitor volatile organic contaminants in ground
water is likely to calibrate its instrument for a lower range of concentrations and perhaps use
a larger purge volume than a laboratory analyzing waste samples.  Thus, while both laboratories
can be said to be  using the same basic method, it is not clear that it would be appropriate to pool
the calibration data for both instruments  to determine a single MDL or PQL estimator. While this
situation could be addressed using the Gibbons approach, it is not clear that it would be.

     It is also not entirely clear that the balance of false positive and false negative rates both need
to be held at the 1 % level. USEPA regulatory compliance monitoring programs have traditionally
been concerned about the occurrence of false positive results, thus the 1 % rate was incorporated
into the current MDL definition. The greater risk of false negative results inherent in the current
USEPA MDL would seem to favor the regulated  entity, and it is unclear that there is sufficient
reason to change that risk.  In addition, compliance monitoring determinations have typically been
based on a single sampling event, often a grab  sample. If each sampling event is viewed as an
independent action, the fact that the current MDL predicts the result of a single future sample
analysis would seem to be appropriate.

     In addition, the Gibbons approach assumes that USEPA establishes compliance monitoring
levels solely as a function of the analytical methodology. While USEPA has used this approach
in monitoring compliance with some risk-based regulatory limits, the majority of limits associated
with both the NPDES and RCRA Listing programs are based on technology evaluations, not risk
assessments. Thus, the compliance limits are typically derived from best demonstrated available
technology (BDAT) evaluations and are often well above the detection and quantitation capabilities
of available monitoring methods.
     While there are a variety of terms and concepts used to describe method sensitivity, the ones
of greatest significance to RCRA Programs are:

     •   Method Detection Limit (MDL)
     •   Estimated Quantitation Limit (EQL)
     •   Regulatory Threshold.

     The MDL values listed in SW-846 methods may have been derived from reagent water under
the best possible conditions. When choosing between methods within SW-846, the MDLs from
each method may be compared with greater certainty; however, these values are provided in the
methods for guidance, and may not be achievable in many real-world matrices.

     MDL values provided by individual laboratories must be recognized  as a snapshot of
laboratory capability.  These MDL values may be determined in reagent water, a clean solid
matrix, or the actual matrix of interest, and the matrix used must be known for the values to be
compared.   The  utility of these MDLs is  increased if they are determined frequently, as
comparisons can be made over time and the capability of the laboratory can be more meaningfully
projected to future sample analyses.  However, the MDL values remain matrix-, instrument-,

method-, and laboratory-specific.  Thus, MDL values should not be used to establish trigger points
in a decision-making process.

     Similarly, EQLs are provided in SW-846 for guidance and can be useful in comparing
methods within SW-846. EQLs should not be used to assess laboratory capability or performance
unless the reviewer has detailed knowledge of the reporting conventions used by the laboratory
and laboratory data on dilutions, interferences, matrix spike recoveries, etc.

     Regulatory thresholds (i.e., action limits or "bright-lines") should be used to determine the
appropriate analytical procedures to be employed.  This determination should be made using the
Data Quality Objective process, to ensure that analytical resources are appropriately employed and
can demonstrate compliance with the applicable regulatory limits.

     The approach described by Gibbons has attracted much attention. Whether it could be put
into practice in a fashion that provided  an appropriate balance between protection of the
environment and the interests of the regulated community remains to be seen.

     Co-occurrence of Contaminants, Constituents, and Method Selection
     Current industrial processes and waste handling procedures rarely result in the release of
individual chemical contaminants to the environment.  Rather, wastes usually contain a series of
chemicals that reflect the process from which the waste is derived.  The compounds present may
be closely related chemically, forming classes of compounds such as PAHs,  or be characteristic
of the original source material, such as the BTEX compounds found in petroleum products.  This
phenomenon is known as co-occurrence. The patterns of occurrence of chemicals in these wastes
can sometimes even be used to trace the sources of wastes mixed together prior to disposal.

     Co-occurrence can be exploited in sampling designs in several ways.  First, initial sampling
efforts can  target classes of compounds rather than individual components.  Since analyses for
classes of compounds can  usually be conducted at lower cost and often more rapidly, resources
can be conserved and the results can be used to focus more expensive analyses on the most likely
trouble spots.  The overall result can be the collection and analysis of more samples on the same
overall budget.  Many field screening techniques discussed earlier are readily applied to classes
of compounds known to co-occur.

     Second, co-occurrence patterns can be used to minimize the instances in which aberrant
analytical results drive a decision-making process.  For instance, if compound-specific laboratory
results indicate that only one of a series of co-occurring compounds is present, one could evaluate
those results more carefully before using the data in a decision.  There may be a variety of reasons
why the other compounds were not detected, but knowledge of co-occurrence patterns can prevent
a spurious result from leading to an inappropriate decision.

     In considering co-occurrence, a few general suggestions can be offered.

     •   Co-occurrence in soils generally follows patterns similar to co-occurrence in ground
         water (Rajagopal and Li 1994, Li and Rajagopal 1994a, and Li and Rajagopal 1994b).

     •   Initial soil sampling can be focused on analyses for total target compound categories,
         e.g., PAHs, TPHs, BTEX (Clarkson et al. 1992, Kampe and Leschber 1989, Lajoie
         and Strom 1994).  This may be more applicable to field screening techniques, but tests
         for total compound categories can often be conducted in the field at substantially less
         expense than full compound-specific analyses, and the detection of a compound category
         can serve to limit the compound-specific analyses requested.

     •   There are two ways to identify specific potentially co-occurring compounds:

         -   use studies of industries to determine compounds that might occur together (e.g.
             USEPA 1986).

         -   use existing case studies and look for similar waste sites (e.g. USEPA 1995, Wise
             and Trantolo 1994, or Hoddinott 1992).

     •   Some groups of compounds have been identified as occurring together in wastes or
         soils:

         -   1-Methyl naphthalene and 2-Methyl naphthalene; pyrene  and benzo[a]pyrene;
             acenaphthene, anthracene, fluorene, fluoranthene, phenanthrene, and pyrene in
             industrial waste streams (USEPA 1986).

         -   A 12 or a 15 compound screen was developed for nationwide hazardous waste sites
             that correctly identified contamination in 99 percent or 100 percent of 1565 wells
             studied (Li and Rajagopal 1994). The authors state that the results can be applied,
             with appropriate modifications, to soil samples.

     •  Measurements of microbial activity combined with biomass-specific respiration have
         been shown to be good indicators of heavy metal and pesticide contamination in soils
         (Brookes 1993).

     •  The ratios and distributions of 16 USEPA priority PAHs and dibenzothiophene can be
         used to identify the petroleum product type that caused the soil contamination (Douglas et al.
         1992).

     •  The USEPA has a published database that summarizes site specific information  for
         hazardous waste sites across the United States. This database may be used to determine
         co-occurrence of hazardous compounds (USEPA 1995, and personal communication
         with Joe Williams, Ada Lab).
NOTE:     This reference list is not comprehensive.  In particular, it does not include the
            numerous references that contain specific site case studies that might be used to
            establish specific chemical suites that might occur together in soils at hazardous
            waste sites.

1.   Brookes, P.C.  1993.  "The Potential of Microbiological Properties as Indicators in Soil
     Pollution Monitoring."  Soil Monitoring: Early Detection and Surveying of Soil
     Contamination and Degradation.  R. Schulin, ed.  Birkhauser Verlag.  Boston.

     Discusses the advantages and disadvantages of using soil microbiology to indicate soil
     pollution.  Specifically discusses heavy metal pollution and pesticide pollution. No
     single microbiological property is a good indicator of pollution, but when several are used
     in conjunction, and the inherent variability of biological processes in soils is recognized,
     the methods can be useful.

2.   Clarkson, J.R., E.A. Peuler, C.A. Menzie, D.T. Crotwell, T.V. Bordenave, M.C. Metcalf,
     and D.H. Pahl. 1992.  "Field Screening Procedures Applied to Soils for Use in Risk
     Assessment."  Superfund Risk Assessment in Soil Contamination Studies.  ASTM STP 1158.
     K.B. Hoddinott, ed.  American Society for Testing and Materials. Philadelphia.

     Discusses rapid field analytical techniques to screen for the presence of significant
     levels of  categories of pollutants.   Specifically, screening procedures  useful  in
     approximating the categories heavy metals, volatile organics, base/neutral extractable
     organics, and acid extractable organics are discussed.

3.   Douglas, G.S., K.J. McCarthy, D.T. Dahlen, J.A. Seavey, W.G. Steinhauer, R.C. Prince,
     and D.L. Elmendorf.  1992.  "The Use of Hydrocarbon Analyses for Environmental
     Assessment and Remediation." Journal of Soil Contamination. Vol. 1. No. 3. pp. 197-
     216.

     Discusses the use of total recoverable hydrocarbon analyses  to determine the source
     of site contamination. Uses the ratio of 16 PAH compounds to predict the product
     causing contamination.

4.   Hoddinott, K.B., ed.  1992. Superfund Risk Assessment in Soil Contamination Studies.  ASTM
     STP 1158. American Society for Testing and Materials.  Philadelphia.

     See the table of contents for useful case studies of hazardous waste sites.

5.   Kampe, W., and R. Leschber. 1989.  "Occurrence of Organic Pollutants in Soil and Plants
     After Intensive Sewage Sludge Application."  Organic Contaminants in Waste Water,
     Sludge, and Sediment: Occurrence, Fate, and Disposal.  D. Quaghebuer, I. Temmerman,
     and G. Angeletti, eds. Elsevier Applied Science. New York.

     Discusses the transport of definable chlorinated hydrocarbons, hexachlorobenzene,
     polychlorinated biphenyls (PCBs), and four polyaromatic hydrocarbons between soils
     and plants as a result of sewage sludge applications.  PCBs were found to be elevated
     in soils, but did not transfer from soil to plant.

6.   Lajoie, C.A., and P.P. Strom.  1994.  Biodegradation of Polynuclear Aromatic
     Hydrocarbons in Coal Tar Oil Contaminated Soil.

     Analyzes the microbial degradation products of coal tar and creosote by compound in
     soils. Lists the general compounds contained in coal tar and creosote. Describes the
     fate and  rate of disappearance of these compounds under various environmental
     conditions.

7.   Li, P.C., and R. Rajagopal.  1994a. "Utility  of Screening in Environmental Monitoring,
     Part 2: Sequential  Screening Methods for Monitoring VOCs at Waste Sites." American
     Environmental Laboratory.  May, 1994.

     Continues research described in Part 1, by identifying a subset of 12 or 15 VOCs that
     can successfully determine contamination in wells at hazardous waste sites.  Data from
     national surveys are verified using data from a single Iowa landfill.

8.   Li, P.C., and R. Rajagopal. 1994b. "Utility  of Screening in Environmental Monitoring,
     Part 3: Sample Compositing."  American Environmental Laboratory.  May, 1994.

     Continues research described in Part 2.  Discusses sample compositing as a cost-
     effective screening  strategy for the identification of hot spots or plumes.

9.   Rajagopal, R., and P.C. Li.  1994.  "Utility of Screening in Environmental Monitoring, Part
     1: Occurrence  and Distribution of VOCs  in Groundwater."   American Environmental
     Laboratory.  April, 1994.
     Review of U.S. Groundwater Supply Survey, U.S. Hazardous Waste Site Survey, and
     Iowa Drinking Water Supply Survey to determine the frequency of occurrence of
     VOCs in groundwater. Indicates that only a small number of the 35, 36, or 31 VOCs
     considered, respectively, occurs routinely in groundwater samples.

 10.  USEPA. 1986. Industry Standards Data Base (ISDB) System, Draft User's Guide.  USEPA
     Waste Identification Branch. Office of Solid Waste.  January, 1986.

     A repository  of  chemical  manufacturing  and waste generation and management
     information for the Hazardous Waste Listing Program.  Includes information such as:
     facility identification information, industry segment represented, products of concern,
     production quantity, product processes involved, waste streams generated, waste type,
     waste quantity, waste composition, waste management practices, specific on-site waste
     management facility data, and identification of off-site facilities that accept specific
     wastes.

 11.  USEPA.  1995. Subsurface Remediation Technology Database.  USEPA Robert S. Kerr
     Environmental Research Laboratory. Ada, OK.

     Provides site-specific information on subsurface contamination for more than 600
     hazardous waste sites nationwide. Database components include site characterization,
     methods of remediation, contaminants, consulting firms, and references cited.  Joe
     Williams of USEPA Ada indicates that the database can be searched by industry to
     summarize co-occurring chemicals.  The database should be available on the Ada
     bulletin board by April, 1995.

 12.  Wise, D.L., and D.J. Trantolo.  1994.  Remediation of Hazardous Waste Contaminated
     Soils. Marcel Dekker, Inc. New York.

     See the table of contents for useful case studies of hazardous waste sites.

 13.  Gibbons, R.D.  1995.  "Some Statistical and Conceptual Issues in the Detection of Low-Level
     Environmental Pollutants."  Environmental and Ecological Statistics. Vol. 2, No. 2. June
     1995. pp. 125-167.

     Describes traditional detection limit estimators and critically evaluates them.
C.3 BRIGHT-LINE VALUES VS. LABORATORY/FIELD METHODOLOGY
     QUANTITATION LIMITS
     Current quantitative methods available both for the laboratory and for field identification of
organic and inorganic compounds are capable of readily achieving bright-line values. Tables 1
through 4 list target constituents with identified bright-line values. Typical reporting limits for
both laboratory and field methods are shown in the attached tables for comparison with bright-line
values.  A reporting limit is equal to the sample concentration equivalent to the low point of the
calibration curve.  The reporting limits shown are different from method detection limits (MDL)
or instrumentation detection limits (IDL). The reporting limits listed for compounds in Tables 1
through 4 would be applicable to solid/soil samples that do not contain interferences necessitating
further sample preparation procedures (e.g., cleanup, dilutions, etc.) prior to  instrumental
analysis.

     Field detection methods, either for quantitative or screening purposes, have the capability
of determining concentrations of analytes well within bright-line values.  The applicable on-site
methods include test kits (e.g., immunoassay, colorimetric, etc.), X-ray fluorescence (XRF), and
laboratory instrumentation configured for field usage.  There are additional on-site analytical
procedures currently in use such as the HNu Photoionization Detector (PID).  However, this
particular PID instrument does not isolate, separate, identify, or quantify any one compound.  The
instrument registers a combined response that accounts  for the presence of mixtures of organic
compounds, and therefore, will not be discussed here.

     The bright-line values are typically three to four orders of magnitude greater than detection
limits of available laboratory and field  methodologies.   Because analytical methods are often
capable of detecting part-per-billion (ppb) concentrations, and bright-line values are usually several
orders of magnitude higher, an analyst could combine many samples together (i.e., composite)
and, if there are no concentrations of target  analytes in the samples above the bright-line
concentration, determine that the samples are without significant contamination.  The compositing
approach is an option to reduce overall  analytical testing costs while achieving detection limits
below bright-line values.

     The process of compositing or combining samples results in the dilution of each sample that
is composited.  For example, if equal volumes of two samples are mixed together, the relative
detection limit of each individual sample will be increased by a factor of two.  Tables 1 through
4 show the theoretical maximum number of individual samples that could be composited while still
detecting constituents present in soil at the bright-line values.  This number was derived from the
following equation:
                   Theoretical number of samples         Bright-line Value
                  composited for a single analysis   =   Detection Limit


     This equation is given in a slightly different format by Skalski and Thomas (1984), whereby
the maximum number of samples or increments that can be composited is given by:

          n ≤ AL/MDL

where:    AL is the action level, or in this case the bright-line value, and MDL is the minimum
          detection limit, or as referred to here, the reporting limit.
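
     As a worked illustration of these two equivalent expressions, a minimal sketch (Python)
is given below; the bright-line value and reporting limit used are hypothetical and are not
taken from the tables.

          def max_composite_size(bright_line, reporting_limit):
              """Theoretical maximum number of samples per composite, n <= AL/MDL."""
              return bright_line / reporting_limit

          # Hypothetical constituent with a bright-line value of 10 mg/kg and a
          # reporting limit of 0.005 mg/kg:
          n_max = max_composite_size(10.0, 0.005)   # 2000 samples for these inputs
          # Compositing n samples in equal portions dilutes each one n-fold, so the
          # effective reporting limit for an individual sample becomes n * 0.005 = 10 mg/kg,
          # which is exactly the bright-line value.
          print(n_max)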

     Certainly, the theoretical numbers of samples shown in Tables 1  - 4  that might be
composited together for volatiles, semivolatiles, pesticides, PCBs, and metals are neither practical
nor realistic.  But the numbers  illustrate that many samples can be combined, and  existing
methodologies will permit the detection of constituents at the bright-line values.

     There is another consideration that must be evaluated when determining the optimum number
of individual samples to be combined to form the composite  sample.  If the analysis of the
composite sample indicates the presence of target analytes at or above the bright-line, additional
analytical procedures would be required.  The procedures would be used to isolate,  through
analysis of individual samples, the sample or samples that exceed the bright-line.  If many samples
had been composited into a single analysis, then  many more individual samples would have to be
analyzed to identify the specific samples that show contamination. Additional costs associated
with the  further  analyses would then  be  incurred, negating  the cost savings derived from
compositing samples for an initial single analysis.  For this reason, the investigator should use the
composite sampling approach only in those areas believed to be uncontaminated.
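
     The cost tradeoff described above can be made explicit with a simple expected-value
calculation. The sketch below (Python) assumes, purely for illustration, that individual samples
exceed the bright-line independently with probability p and that every sample in an exceeding
composite is reanalyzed individually; neither assumption comes from this guidance.

          def expected_analyses(k, p):
              """Expected number of analyses for one composite of k samples.

              Assumes independent exceedances with probability p per sample and
              individual retesting of all k samples whenever the composite itself
              shows a target analyte at or above the bright-line value.
              """
              p_retest = 1.0 - (1.0 - p) ** k       # chance the composite triggers retesting
              return 1.0 + p_retest * k

          for p in (0.01, 0.10, 0.50):
              print(p, expected_analyses(10, p))
          # p = 0.01: about 2 analyses per 10 samples (large savings over 10 individual tests)
          # p = 0.10: about 7.5 analyses per 10 samples (modest savings)
          # p = 0.50: about 11 analyses per 10 samples (compositing costs more than it saves)

Under these assumptions the savings vanish as the chance of an exceedance grows, which is
consistent with restricting compositing to areas believed to be uncontaminated.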

     Laboratory methods other than SW-846 methods may be used for identification of volatile
organic compounds, semivolatile organic compounds, pesticides, PCBs, and metals. Since many
of the alternative methods (e.g., USEPA waste water 6QO series, USEPA CLP protocols, etc.) use
analytical instrumentation that is equivalent to the equipment specified in the SW-846 methods
cited in this document, bright-line values would be achieved when alternative methods from other
USEPA programs are applied. For example, analysis of benzene by SW-846 Method 8260 meets
the bright-line value as does the analysis for benzene using USEPA Methods 624 and 524.
 C.4 COMPOSITING STRATEGIES
     There are  two primary approaches to  obtain a composite sample comprised of many
 individually collected soil samples. Both techniques are performed under controlled conditions
 in the laboratory and not in the field.  One approach involves the subsampling of multiple grab
 samples immediately prior to sample extraction. A second approach involves the analysis of an
 individual sample aliquot taken from a composite sample prepared in the laboratory from larger
 amounts of individual grab samples.

     The information presented below is provided only as recommended or suggested compositing
 approaches.  There may be additional compositing arrangements that  will  permit analysis of
 composited samples while still attaining detection limits necessary to achieve bright-line values.
 The design of a successful compositing strategy should maximize the number of samples combined
 into a single analysis without sacrificing the "representativeness" of any individual sample used
 in the composite mixture.

     For any compositing strategy, it is recommended that a portion of each  individual sample,
 as collected, be retained in case reanalysis of the original sample is required.

 Approach #1 - Combining samples to equal the amount of material specified by
           an analysis method.
     This analysis approach uses equal portions of multiple samples to arrive at a composite equal
 to the weight specified by the method. The  subsample composite approach may be used for
 samples to be analyzed for volatiles, as there is no direct mixing or physical blending of the
 individual sample aliquots.  Examples of subsampling composite strategies are shown below:
     Analytical          Laboratory          Method Specified   Number of Samples     Subsample Size Used
     Parameter           Method              Sample Size        per Total Composite   per Individual Grab

     Volatiles           8260A               5 grams            5                     1 gram
     Semivolatiles       8270B               30 grams           10                    3 grams
     Pesticides/PCBs     8080A/8081          30 grams           10                    3 grams
     Metals              6010A/7000 series   1 gram             NA/See Below          --

     For example, volatile organic analysis using SW-846 Method 8260A requires a 5 gram
 aliquot of soil. If five separate samples are combined into one sample for analysis, the sample for
 analysis would be prepared with 1 gram of soil from each of the five individual samples.  Note
 that for volatile organics, each individual sample that makes up the combined mixture is measured
 directly from the field collection container into the vessel used for analysis. Prior to the purge-
 and-trap analysis for volatile organics, the sample is not stirred or mixed, to ensure that any loss
 of volatile constituents is minimized.

     Semivolatile  organic analysis by SW-846 Method 8270B requires 30  grams of soil.
 Preparing a single sample of 30 grams soil could be performed by using 3 grams of soil obtained
 from each of 10 individual samples.  The same combining strategy is suggested for pesticide and
 PCB analyses using Methods 8080A or 8081.
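
     The per-grab subsample masses cited above follow directly from dividing the method-specified
sample mass by the number of grab samples combined; a minimal sketch (Python), using only the
method masses and composite sizes given in the table above:

          def subsample_mass(method_mass_g, n_grabs):
              """Equal-portion mass (grams) taken from each individual grab sample."""
              return method_mass_g / n_grabs

          print(subsample_mass(5, 5))     # Method 8260A:  1.0 gram per grab
          print(subsample_mass(30, 10))   # Methods 8270B and 8080A/8081:  3.0 grams per grab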

 Use of sample aliquots that are too small often leads to inaccurate analysis since the aliquot
 may be non-representative, especially in nonhomogeneous matrices.  For example, only 1 gram
 of sample is required for a total metals analysis.  Therefore, combining equal portions of multiple
 samples may not be advisable when performing metals analyses.
Approach #2 - Lab compositing moderate amounts of individual samples and
           subsampling from this composite mixture
     This laboratory compositing approach specifies the measurement of equal amounts (>50
grams) of several individual soil samples followed by mixing and blending of the mixture to obtain
a homogeneous mixture. Subsequent analysis is performed from a single aliquot subsampled from
the mixed and homogenized composite sample.

     The process of obtaining a composite, prepared in the laboratory, is particularly useful when
the amount of sample required for an analysis is small. This is true in the case of metals analysis
requiring only 1 gram of sample material. However, it is important to note that this type of lab
composite is not applicable to analyses for volatile organic compounds. The extensive handling
performed during the mixing and homogenizing procedures used during this type of compositing
procedure results in significant losses of volatile analytes from solid sample matrices.

     This technique also may not be appropriate for moist samples that are difficult to mix or
sample matrices that do not appear to be homogeneous based on physical/chemical attributes of
the matrix.  Samples containing debris, rocks, or vegetation and samples that appear not to be
uniform based on color, striations, or other apparent variabilities, including multiphasic matrices,
should not be subjected to this compositing approach.
                                        TABLE 1
              FREQUENTLY ENCOUNTERED VOCs HAVING COMMON SYNONYMS

     (Table entries pair each frequently encountered volatile organic compound, listed by
     chemical name, with its common synonym.)

                                        TABLE 2
                        SW-846 GAS CHROMATOGRAPHIC METHODS

     Capillary Column          Packed Column¹
     Version of Method         Version of Method       Compounds Determined

     8015B (proposed)          NA                      Nonhalogenated volatiles; petroleum hydrocarbons
     8021²                     8010 and 8020           Halogenated and aromatic volatiles
     NA                        8040                    Phenols
     8061                      8060                    Phthalate esters
     NA                        8070                    Nitrosamines
     8081                      8080                    Organochlorine pesticides and PCBs
     8091 (proposed)           8090                    Nitroaromatics and cyclic ketones
     NA                        8100                    Polyaromatic hydrocarbons (PAHs)
     8111 (proposed)           8110                    Haloethers
     8121                      8120                    Chlorinated hydrocarbons
     8141                      8140                    Organophosphorus pesticides
     8151                      8150                    Chlorinated herbicides

     (The original table also lists the detector(s) used for each method, e.g., FID, PID, ECD,
     ELCD, NPD, FPD, and TEA.)

     ¹    Generally speaking, packed columns do not resolve GC peaks as well as capillary columns.
     ²    PID and ELCD are used in series.
     NA   No alternative column procedure exists for this analysis.

                                        TABLE 3
               COMMON PETROLEUM HYDROCARBON (PHC) METHODS OF ANALYSIS

     (Columns: Method Number/Designation; Full Method Name; Source; Range of PHCs
     Detected; Procedure Summary; Applicable Matrix; Detection Limit.  The methods
     summarized include USEPA Method 418.1 (TRPH) and proposed SW-846 Method 8015B,
     American Petroleum Institute (API) methods for gasoline range organics (GRO), diesel
     range organics (DRO), and PHCs in soil, State of California LUFT Field Manual methods,
     and State of Wisconsin DNR modified GRO and DRO methods.)

                                        TABLE 4
         COMPOUNDS AMENABLE FOR DETECTION BY FIELD TEST KIT DETERMINATION

     Analyte                        Type of Detection Technology    SW-846 Method    Method Status

     Pentachlorophenol (PCP)        ELISA/Immunoassay               4010             Finalized (promulgated in Update IIA)
     2,4-D                          ELISA/Immunoassay               4015             Proposed in Update III
     Polychlorinated Biphenyls
       (PCBs)                       ELISA/Immunoassay               4020             Proposed in Update III
     Total Petroleum Hydrocarbons   ELISA/Immunoassay               4030             Proposed in Update III
     Polyaromatic Hydrocarbons
       (PAHs)                       ELISA/Immunoassay               4035             Proposed in Update III
     Toxaphene                      ELISA/Immunoassay               4040             Proposed in Update III
     Chlordane                      ELISA/Immunoassay               4041             Proposed in Update III
     DDT                            ELISA/Immunoassay               4042             Proposed in Update III
     Trinitrotoluene (TNT)          Colorimetric                    8515             Proposed in Update III
                                    ELISA/Immunoassay               4050             Proposed in Update III
     RDX                            ELISA/Immunoassay               4051             Proposed in Update III
     Triazines                      ELISA/Immunoassay               NA               In OSW review
     Mercury                        ELISA/Immunoassay               4500             In OSW review

     (For each kit, the original table also lists the manufacturer, minimum detection limit
     (MDL), approved matrices, and the cross-reactivity of related compounds tested and
     potential interferences.)

     ELISA  Enzyme-Linked Immunosorbent Assay
        NA  Information not currently available
         ¹  Depending on specific Aroclor(s) present
         ²  Depending on specific petroleum product(s) present
         ³  Depending on the specific nitro-toluene compound(s) present

                              APPENDIX D
    
    
    
    
    
    
    
    
                    TABLES OF BRIGHT-LINE VALUES AND
    
    
    
    
                 OPPORTUNITIES FOR SAMPLE COMPOSITING
TABLE 1 - BRIGHT-LINE VALUES AND EQLs USED TO CALCULATE A THEORETICAL MAXIMUM NUMBER OF
COMPOSITED SAMPLES FOR VOLATILE ORGANIC COMPOUNDS IN SOIL/SOLID MATRIX (All concentrations in mg/Kg)

     (Columns: Compound; CAS No.; HWIR-Media Bright-Line Values; EQLs for Purge-&-Trap
     GC/MS Method 8260A; Theoretical Max. Number of Composited Samples.  The theoretical
     maximum is the number of individual samples that can be composited while still achieving
     a reporting limit equal to the bright-line value.  EQL values are taken from Method 8260A;
     EQLs are highly matrix-dependent, may not always be achievable, and should be used only
     as guidance.)

TABLE 2 - BRIGHT-LINE VALUES AND ESTIMATED QUANTITATION LIMITS (EQLs) USED TO CALCULATE
THEORETICAL MAXIMUM NUMBER OF COMPOSITED SAMPLES FOR SEMIVOLATILE ORGANIC COMPOUNDS
IN SOIL/SOLID MATRIX

     (Columns: Compound; CAS No.; HWIR-Media Bright-Line Values; EQLs; Theoretical Max.
     Number of Composited Samples.)

TABLE 3 - BRIGHT-LINE VALUES AND ESTIMATED QUANTITATION LIMITS (EQLs) USED TO CALCULATE
THEORETICAL MAXIMUM NUMBER OF COMPOSITED SAMPLES FOR ORGANOCHLORINE PESTICIDES AND
POLYCHLORINATED BIPHENYLS IN SOIL/SOLID MATRIX (All concentrations in mg/Kg)

     (Columns: Compound; CAS No.; HWIR-Media Bright-Line Values; EQLs; Theoretical Max.
     Number of Composited Samples.)

TABLE 4 - BRIGHT-LINE VALUES AND ESTIMATED INSTRUMENTAL DETECTION LIMITS USED TO CALCULATE
THEORETICAL MAXIMUM NUMBERS OF COMPOSITED SAMPLES FOR METALS AND CYANIDE IN SOIL/SOLID MATRIX

     (Columns: Compound; CAS No.; HWIR-Media Bright-Line Values; Estimated IDLs for ICP/AES
     Method 6010A; Theoretical Max. Number of Composited Samples; Reporting Limits for XRF
     Testing Methods.)

                                       APPENDIX E
    
           RELATING SOIL AND SITE CHARACTERISTICS TO SAMPLING DESIGNS
    
    
    INTRODUCTION
           Studies to characterize the concentration of contaminants in soil generally fall into one of
    two scenarios:
    
       1.     Characterization of ex-situ soils (e.g., sampling to characterize soil that has been
                 excavated, treated, stored, etc.), or
    
           2.     Characterization of in-situ soil contaminated by a release.
    
           One of the key characteristics of soil that must be addressed under both scenarios is the
    extreme variability of soil.  Techniques designed to take the variation into account must be
    employed in any soil sampling plan.  This includes the sampling design, the collection procedures,
sample handling (both in the field and the laboratory), the analytical procedures, and the data
analysis.  In addition, studies to characterize in-situ soil contaminated by releases must also
    consider site and waste characteristics (e.g.,  contaminant fate and transport characteristics and
geologic and soil characteristics). These factors are important because they influence how the site
    should be stratified as part of the design of the statistical sampling plan.
    
       This appendix provides (1) an overview of guidance on relating site and soil
    characteristics to sampling design, and (2) an overview of techniques that can be used to ensure
    that "correct" samples are extracted in the field.
    

     RELATING SITE CHARACTERISTICS TO SAMPLING DESIGN
           As part of the sampling design process, investigators should review existing information
     on the contaminants of concern, the source of the contaminants, the environmental setting, and
     existing monitoring data and develop a conceptual model of the site or population of soil to be
     characterized.  The conceptual model will help the investigators in  the data quality objective
     process.  Specifically,  a conceptual model of the  site will allow investigators to "state the
     problem", "define the study boundaries", and "optimize the design for obtaining data", all key
     steps in the data quality objectives process.
    
     Waste and Unit Characterization
           The physical and chemical properties of the waste and waste constituents affect their fate
     and transport in soil and influence the design of the sampling plan both in terms of field sampling
     procedures and the depth and location of sampling points.  Chemicals released to soil may undergo
     transformation and/or degradation by chemical and biological processes, be adsorbed onto soil
particles, or may volatilize into soil pore spaces or into the air (USEPA, 1989b). Chemical
     properties will influence field sampling methods. For example, sampling for volatile organic
     compounds requires use of soil sampling and sample handling techniques which minimize the loss
    of volatiles.  Guidance on sampling of soil for VOCs can be found in Mason (1992) and USEPA
(1991c).
    
           Developing an understanding of the release type, mechanism, depth, and  magnitude will
    also play an important role in sampling design.  For example,  if a release is from a localized
     (point)  source, contamination  might be characterized by  a limited area of  relatively high
    contaminant concentration surrounded by larger areas of relatively clean soil (USEPA, 1989b).
    In this scenario, it would be appropriate to stratify the site in terms of high concentration areas
    and low concentration areas.
           Comprehensive guidance on assessing waste and unit characteristics for soil sampling
    programs can be found in USEPA (1989b) and USEPA (1991a).
    
    Relating Site and Environmental Characteristics to Sampling Design
           Site  geology,  soil characteristics,  and other environmental factors are important for
    determining the routes of migration of soil pollutants and are a factor in any attempt to stratify the
    area or site  into homogeneous soil types as part of the sampling design (Mason, 1992). These
    factors include
    
           1.     Surface features such as topography, drainage patterns,  erosion potential, and
                 vegetation;
    
       2.     Stratigraphic/hydrologic features such as soil horizons, presence of hardpans (i.e.,
                 fragipans and durapans), particle size distribution, hydraulic conductivity, pH,
                 porosity, clay content, amount of organic matter, cation exchange capacity, and
                 depth to ground water, and
    
           3.     Meteorological  factors   such as  temperature,  precipitation,  runoff,  and
                 evapotranspiration (USEPA, 1989c).
    
    Soil and contaminant characteristics should be evaluated by investigators to determine the potential
    mobility of the contaminants in soil and the likely extent of contaminants in the soil (both vertical
    and horizontal).  For example, clay content, organic matter content, texture, permeability, pH,
Eh, and cation exchange capacity (CEC) influence the ability of the soil to retain contaminants.
    
           Comprehensive guidance on relating  site and environmental characteristics to sampling
design can be found in Mason (1992) and USEPA (1991a, 1991b, 1989b, 1989c).
    
    PARTICULATE SAMPLING THEORY
           A sampling theory developed by Pierre Gy can provide guidance to site investigators to
    address the challenge posed by the inherent heterogeneity of soil. Gy's theory addresses seven
    types of sampling error and offers proven techniques for minimizing sampling error that is
    introduced in sample-taking and in the  subsequent  handling, subsampling, and preparation
    (USEPA, 1992).  Gy's particulate sampling theories are based upon the relationship that exists
    between the variability of the material, the particle sizes in the material, the distribution of the
    component of interest (pollutant), and the size of the sample taken.  The variability found in
particulate material is based upon the number of particles included in the sample.  Thus, the
    smaller the particle the lower the variability in a fixed weight of sample submitted for analysis.
    Comminution (or reduction of particle size) to attain this relationship may not be practical in
    environmental sampling because of safety, costs, and the handling requirements that are required
    to apply the theory. Much of the theory can not be applied to sampling and analysis for volatile
    organic compounds (VOCs) because of the grinding, mixing, and subsampling that is required to
    obtain a correct sample (Mason, 1992).  Guidance on sampling of soil for VOCs can be found in
    Mason (1992) and USEPA (1991c).
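
     The particle-size/sample-mass relationship referred to above is commonly summarized by
the simplified form of Gy's fundamental-error formula (see Pitard, 1993, for the full
development). The sketch below (Python) is only an illustration of that widely cited form;
the factor values shown are typical textbook defaults, not site-specific inputs from this
guidance.

          def fundamental_error_variance(d_cm, sample_mass_g, lot_mass_g,
                                         c=1.0e6, liberation=1.0,
                                         shape=0.5, granulometric=0.25):
              """Relative variance of the fundamental sampling error (simplified Gy form):

                  sigma_FE^2 = f * g * c * l * d^3 * (1/M_S - 1/M_L)

              d_cm            nominal top particle size, d (cm)
              sample_mass_g   sample mass, M_S (grams)
              lot_mass_g      mass of the lot being sampled, M_L (grams)
              c               constitution (mineralogical) factor, g/cm^3; material-specific
              liberation      liberation factor, l (0 to 1)
              shape, granulometric   dimensionless shape (f) and granulometric (g) factors
              """
              return (shape * granulometric * c * liberation * d_cm ** 3
                      * (1.0 / sample_mass_g - 1.0 / lot_mass_g))

          # With the sample mass held fixed, halving the top particle size cuts the
          # fundamental-error variance roughly eightfold (hypothetical inputs):
          print(fundamental_error_variance(0.2, 100.0, 1.0e6))
          print(fundamental_error_variance(0.1, 100.0, 1.0e6))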
    
    Sample Correctness:
    
           As described in Mason (1992), Gy's theory makes use of the concept of sample correctness
    which is a primary structural property. A primary structural property is a property that is intrinsic
    to the material (i.e., soil) itself and to the equipment used  to extract the sample,  and is
independent of the sampling problem itself. A sample is correct when all particles in a randomly
chosen sampling unit have the same probability of being selected for inclusion in the sample.
Conversely, all particles that do not belong to the material to be sampled should have a zero
probability of selection if the sample is to be correct.  Grab samples and judgmental samples are
not correct.
    
    Sampling Devices and Sample Collection:
    
           There are statistical and sampling procedures that can be used to control or reduce the
    effect of sampling errors.  Correct sampling requires the use of proper tools.  The sampling tool
    must traverse the entire strata or portion of the strata that is considered to be the interval of
interest.  While Pitard (1989 and 1993) has pointed out that the technology for correct sampling
    remains to be developed, Mason (1992) suggests Shelby tube samplers, split spoon samplers, and
    other core sampling devices. Mason (1992) advises against the use of augers. Guidance on the
    location and extraction of soil samples in the field can be found in Mason (1992), ASTM (1994),
    USEPA (1991c), USEPA (1989a), and USEPA  (1993).
    
           Subsampling in the field and in the  analytical laboratory also requires good sampling
    protocol.  For example, Gy recommends scoops and spatulas that are flat, not spoon-shaped, to
    avoid the preferential sampling of coarse particles.  In the laboratory, error can be introduced by
poorly designed riffle splitters, spatulas, and vibrating tools. Gy recommends that the sample be
    subsampled using a system of alternate shoveling wherein a large sample is "dealt out" into several
    smaller piles.  One of these subsamples is chosen for the analysis.  This method avoids preferential
sampling by saving the subsample selection until last (USEPA, 1992; Pitard, 1989 and
1993).
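
     A minimal sketch (Python) of the alternate-shoveling idea described above, assuming the
sample has already been divided into discrete increments and that the choice of which pile to
analyze is deferred until all increments have been dealt out:

          import random

          def alternate_shovel(increments, n_piles=5):
              """Deal increments round-robin into n_piles, then pick one pile at random.

              Deferring the random choice of pile until the end avoids preferentially
              selecting material from any one part of the original sample.
              """
              piles = [[] for _ in range(n_piles)]
              for i, increment in enumerate(increments):
                  piles[i % n_piles].append(increment)
              return random.choice(piles)          # the subsample submitted for analysis

          # Hypothetical: twenty scoop increments labeled by their order of removal
          subsample = alternate_shovel(["increment-%d" % k for k in range(1, 21)])
          print(subsample)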
    
           Mason  (1992) provides a general overview of Gy's sampling theory as it applies to soils
    and soil sampling.  Appendix A and B of Mason (1992) provide guidance on the application of
    Gy's theory to  soil sampling.  Also see Bilonick (1990), Pitard (1989), Pitard (1993), and USEPA
    (1992).
                           Annotated Bibliography/List of References
    
    ASTM,  1994, ASTM Standards on Ground Water and Vadose Zone Investigations (Second
    Edition), American Society for Testing and Materials, Philadelphia.
    
           Contains various ASTM field and laboratory methods that are used for soil investigations
       including: sampling by auger (D1452); split barrel sampling (D1586); thin-walled tube
           sampling  (D1587); Unified Soil Classification  System (D2487);  description  and
           identification of soils (D2488); practices for preserving and transporting soil samples
       (D4220); soil sampling from the vadose zone (D4700); decontamination (D5088); and
           various laboratory methods.
    
    Barth, D.S., BJ. Mason, T.H. Starks, and K.W. Brown.  1989.   Soil Sampling Quality
    Assurance  User's Guide.  2nd Ed. EPA 600/8-89/046 (NTIS PB89-189864) Environmental
    Monitoring Systems Laboratory, Las Vegas, NV. 225 pp.
    
           Soil sampling guidance discussing sample site selection, sample collection, sampling
           handling, sample analysis, and interpretation of resulting data. Discusses setting Type I
       and II error rates (Chapter 7), and how to design an "exploratory" and "definitive" study.
           Many sections lack detail and useful specifics.  Includes a glossary.
    
Bilonick, Richard A. 1990. Gy's Particulate Material Sampling Theory in Statistical Sampling:
Past, Present and Future - Theoretical and Practical. ASTM STP 1097, Milton J. Kowalewski
and Josh B. Tyne, Eds., American Society for Testing and Materials, Philadelphia.
    
       An outline of Gy's contributions to particulate material sampling is presented.
Flatman, G.T., and A.A. Yfantis.  1983.  Geostatistical Strategy for Soil Sampling: The Survey
    and the Census in Environmental Monitoring and Assessment. 4, 335-349.
    
           Describes a two-stage soil sampling strategy including modeling of a semi-variogram and
           subsequent kriging analysis to give isopleth maps.
    
    Keith, L.H.  1988. Principles of Environmental Sampling. American Chemical Society, 458 pp.
    
           Provides  30 chapters by various authors  discussing the many variables and special
           techniques needed to plan and execute reliable sampling activities.  Several chapters are
           applicable to the soil sampling guidance:  "Sampling for Tests of Hypothesis When Data
       Are Correlated in Space and Time" by L.E. Bergman and W.F. Quimby (Chapter 2, pp.
       25-42); "Nonparametric Geostatistics for Risk and Additional Sampling Assessment" by
       A.G. Journel (Chapter 3, pp. 45-71); "Geostatistical Approaches to the Design of
       Sampling Regimes" by G.T. Flatman, E.J. Englund, and A.A. Yfantis (Chapter 4, pp. 73-
       80); "Sampling Variability in Soil and Solid Wastes" by E.K. Triegel (Chapter 27, pp.
       385-393); and "Relations of Sampling Design to Analytical Precision Estimates" by L.J.
       Holcombe (Chapter 28, pp. 395-406).
    
    Mason, B.J.  1992.  Preparation of Soil Sampling  Protocols:   Sampling  Techniques and
    Strategies.  EPA/600/R-92/128, Environmental Monitoring  Systems  Laboratory, Office of
    Research and Development, USEPA, Las Vegas, NV.
    
•      Update of a 1983 guidance.  Provides discussion of particulate sampling theory and
           application of  Gy's Sampling Theory to  U.S. EPA soil  sampling.   Provides brief
           discussion (page 3-5 and 3-6) on a method to determine if any samples within a group of
           samples combined into  a composite are  above an action level.  The guidance  is
    
       comprehensive; however, some sections lack detail.
    
     •      "Clay content, organic matter content, texture, permeability, pH, Eh, and cation exchange
           capacity (CEC) will influence the rate of migration and form of the chemical found in
           leachate migrating from the waste. These factors must be considered by the investigator
           when designing a soil sampling effort," p.  1-6.
    
     •      Increment Delimitation Error, the result of incorrectly defining the shape of the volume
           of material that is to be extracted, such as a soil horizon, pp. 2-7 to 2-8.
    
•      Definition of "stratified sampling" and "strata" (from a sampling perspective), p. 3-8.
    
     •      Where to obtain background information on geology and soils, pp. 4-4 and 4-5.
    
•      Definition of "soil horizon" and considerations for sampling specific strata, pp. 5-1 and 5-2.
    •      Stratified random sampling, pp. 6-12 through 6-14.  Includes Table 6-1, "Factors that can
           be used to stratify soils."
    
    •      Soil type is the main factor in determining the background or control sampling point, p.
           6-17.
    
    •      Use of soil punch tubes fitted with teflon caps, teflon tape, and vapor sealant to collect
           samples for VOC analysis, p. 7-2.
    
    •      Determining geological structure, e.g., the effect of structure on the variances obtained
    
           by sampling different media, pp. 9-3 to 9-4.
    
     •      Advantages and disadvantages of various sample collection methods (e.g., trench sampling
           allows determination of soil horizons, scoop sampling does not allow good determination
           of depth of soil sampled), Chapter 7.
    
Pitard, F.F.  1989.  Pierre Gy's Sampling Theory and Sampling Practice. (2 volumes) CRC
     Press, Inc, Boca Raton, Florida.
    
           This text has been updated. See Pitard (1993).
    
Pitard, Francis F.  1993.  Pierre Gy's Sampling Theory and Sampling Practice: Heterogeneity,
     Sampling Correctness, and Statistical Process Control. 2nd ed.  CRC Press, Inc., Boca Raton,
     FL.  488pp.
    
           Provides comprehensive study of heterogeneity, covering the basic principles of sampling
           theory and its various applications.  Provides specific methods for minimizing sampling
           errors in the field and in the laboratory. See Chapters 14 and 21.
    
     USEPA.  1993.  Subsurface Characterization and Monitoring Techniques: A Desk Reference
     Guide (Volume I: Solids and Ground Water,  Volume 11: The Vadose Zone, Field Screening, and
    Analytical Methods). EPA/625/R-93/003a, Environmental Monitoring Systems Laboratory, Las
     Vegas, NV.
    
       Volume I, Section 2, Drilling and Solids Sampling Methods, covers 20 drilling methods
           and devices for sampling soils and geologic materials.  The section also briefly identifies
           important soil physical properties that are described in  the field.  Section  10, Field
    
           Screening and Analytical Methods, covers a large number of techniques and groups of
           techniques for field screening and analysis: chemical field measurement, sample extraction
           procedures, gaseous phase analytical techniques, luminescence/spectroscopic techniques,
           wet chemistry methods, and others.
    
    USEPA.  1992.  Correct Sampling Using the Theories of Pierre Gy (fact sheet).  Office of
    Research and Development Technology Support Project, EMSL Las Vegas, NV. 2 pp.
    
           Fact sheet describing the seven types of sampling errors, sample integrity issues, and
           correct sampling devices.
    
USEPA. 1991a. Seminar Publication. Site Characterization for Subsurface Remediation. Office
    of Research and Development, Washington, D.C.  259 pp.
    
    •      Geologic aspects of site remediation, including stratigraphy,  lithology, and structural
           geology, pp. 23-27.
    
    •      Characterization of the Vadose Zone, Chapter 5.
    
    •      Soil/Aquifer matrix, particle size, mineralogy, CEC, p. 106.
    
    •      Considerations regarding sample type and size, specifically influence of fractures, p. 128.
    
•      Sampling for calcium carbonate and iron/manganese oxides is important if remediation
       involves air stripping, p. 131.
    
•      Physicochemical Processes:  Organic Contaminants, Chapter 10. Discussion of sorption, etc.
•      Physicochemical Processes: Inorganic Contaminants, Chapter 12.  Discusses speciation,
           adsorption, ion exchange, etc.
    
•      Mass balance conceptual framework, Chapter 14, pp. 203-209.
    
    •      Volatile organic carbon adsorption to soil surface  in the presence of two soil moisture
           regimes, Figure 15-6, p. 226.
    
    USEPA.  1991b.  Description and Sampling of Contaminated Soils: A Field Pocket Guide (by
Russell Boulding). EPA/625/12-91/002, 122 pp.
    
    •      Good reference for describing soil physical parameters, color, porosity,  permeability,
           engineering properties, soil chemistry and biology (see Chapter 3, pages 27 through 92).
           Sample collection procedures lack detail. Sample collection and preparation procedures
           are not consistent with Gy's Theory.
    
    •      Provides procedures for conducting field description of soils and  methods of field
           determination of soil chemical and physical properties, including: bulk density (Section
           3.1.6c), organic matter content by ignition test procedure (Section 3.3.1), clay mineralogy
           (Section 3.3.7), determination of soluble salts (chloride and sulfate) (Section 3.3.6).
    
    •      Provides general  soil sample collection procedures for volatiles (Section A.2.1) and
           semivolatiles and metals (Section A.2.2).
    
USEPA.  1991c.  Ground-Water Issue, Soil Sampling and Analysis for Volatile Organic
Compounds. EPA/540/4-91/001, Office of Research and Development and Office of Solid Waste
     and Emergency Response.
    
           Discusses issues and provides guidance related to sampling of soil for VOCs. Includes
           discussion of devices for sampling, minimizing loss of VOC during sampling, appropriate
           containers for sample storage, and reliable preservation procedures.

USEPA. 1989a. Methods for Evaluating the Attainment of Cleanup Standards, Volume 1: Soils
and Solid Media. EPA 230/02-89-042 (NTIS PB89-234959), Statistical Policy Branch, Office
of Policy, Planning, and Evaluation, Washington, DC.
    
           Sampling and analysis methods for evaluating whether a soils remediation effort has been
           successful. The guidance describes basic statistical concepts related to sampling, designs
           of sampling and analysis plans, field sampling procedures, testing a mean against a
           standard, comparing a proportion to a standard, sequential sampling, hot spot detection,
           and geostatistics, and includes a glossary.  Many of the procedures described in Chapter 5 (Field
           Sampling Procedures) could be incorporated by reference. The chapter provides specific
           guidance on identifying sample locations  (p. 5-13), setting up a grid for systematic
           sampling, and methods for acquiring samples (Figure 5-8). Page 5-15 provides a number
           of references regarding composite sampling.
    
USEPA.  1989b.  RCRA Facility Investigation (RFI) Interim Final Guidance.  EPA
530/SW-89-031, OSWER Directive 9502.00-6D.
    
           Table of transformation/transport processes in soil,  including process (biodegradation,
           photodegradation, hydrolysis,  oxidation/reduction, volatilization,  adsorption,  and
           dissolution) and key factor (e.g., pH, nutrient concentrations, soil porosity, CEC),
           Table 9-3, p. 9-10.
    
    •      Discussion of contaminant and soil characteristics (including pH, pKa, solubility, Kow,
           biodegradability, moisture content, oxygen content) and their influence on contaminant fate
           and transport, pp. 9-11 to 9-17.
    
    •      Considerations for determining depth of the release, pp. 9-20 to 9-22.
    
•      Discussions of spatial variability and spatial and temporal fluctuations in soil moisture
       content, pp. 9-25 to 9-28.
    
    •      Discussions  of soil  classification, particle  size  distribution,  porosity,  hydraulic
           conductivity, relative permeability,  sorptive capacity and Kd, CEC, organic carbon
           content,  soil pH, depth to ground water, pore-water velocity, percolation, and volumetric
           water content and their influence on contaminant fate and transport, pp. 9-28 to 9-38.
    
    •      Predicting mobility of hazardous constituents in soil, pp. 9-48 to 9-60.
    
    •      General discussion of sampling techniques, pp. 9-66 to 9-69.
    
    USEPA. 1989c.  Seminar Publication.  Transport and Fate of Contaminants in the Subsurface.
EPA/625/4-89/019, Center for Environmental Research Information and R.S. Kerr Environmental
Research Laboratory, 148 pp.
    
    •      Discussion of subsurface chemical processes (e.g., hydrolysis, sorption, biodegradation,
           volatilization), Chapter 5.

 •      Expanded discussion of biodegradation, Chapters 7 and 8.
    
 USEPA. 1989d. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
     Manual (Part A), Interim Final. EPA/540/1-89/002.
    
            While this document is intended for use in the Superfund Program, it contains information
            and references related to soil sampling that might be useful.  Specifically, see: Section
            4.4.3, "Background Sample Size", discusses sample size and Type I and Type II errors.
           Section 4.5.2, "Soil", discusses soil sampling considerations. Section 4.6, "Developing
           an Overall Strategy for Sample Collection", includes discussions of sample size, locations,
           types, and field screening analyses.
    
 USEPA. 1987. Data Quality Objectives for Remedial Response Activities.  Office of Emergency
 and Remedial Response and Office of Waste Programs Enforcement, Washington, D.C., NTIS
 PB90-272634.
    
 •      Case study illustrating the development of a conceptual model; uses the example that soil
        with a pH of 10 will bind metals, p. 3-9.
    
     •      Brief mention of spatial variability (p. C-4) and stratification (p. C-6).
    
van Ee, J.J., L.J. Blume, and T.H. Starks.  1990. A Rationale for the Assessment of Errors in
the Sampling of Soils. EPA 600/4-90/013, Environmental Monitoring Systems Laboratory, Las
Vegas, NV. 57 pp.
    
           EMSL guidance for determining how many, and what types, of samples are required to
           assess the quality of data in a field sampling effort, and how the information from the
           quality assessment samples can be used to assess sampling and measurement errors.
    