Methods for Evaluating the Attainment of Cleanup Standards for Soils and Solid Media: a Guide

United States
Environmental Protection
Agency
Office of
Solid Waste and
Emergency Response
Publication: 9355.4-04FS
July 1991
A Guide:
Methods for Evaluating the
Attainment of Cleanup Standards
For Soils and Solid Media
Office of Emergency and Remedial Response
Hazardous Site Control Division OS-220W
Quick Reference Fact Sheet
GOALS

This fact sheet highlights statistical concepts and methods used in the evaluation of the attainment of cleanup standards. It provides
an example of a basic procedure for determining sample size required to obtain a given confidence level focusing on a cleanup standard
specified as a mean concentration with a specified confidence. It does not provide policy on specification of cleanup levels but should
be considered a technical reference guide for using some of the more common methodologies. More detailed information on these and
other methodologies can be obtained from Methods for Evaluating the Attainment of Cleanup Standards. Volume 1: Soils and Solid
Media, EPA 230/02-89/042. Copies of this volume are available from the National Technical Information Service, Springfield, VA
22161. Price: $28.95 (paper), $6.95 (microfiche).
[Terms in bold, italicized print are defined in the glossary on the last page of this fact sheet.]
WHY ARE STATISTICS IMPORTANT?

Statistical methods perform a powerful and useful function. They
allow extrapolation from a set of samples to the entire site in a
scientifically valid fashion.

Extrapolation involves uncertainty. Statistical methods enable
estimation and management of the uncertainty. Ideally, uncer-
tainty may be reduced to any desired level given complete
freedom in sampling and testing. This is seldom a viable option,
so statistics are used to determine a balance between sampling and
certainty.

Statistical principles can be used to design sampling plans that
correlate with methods of analysis tailored to evaluating attainment
of cleanup standards. Correlated sampling and analysis
methodologies offer higher confidence levels in decision-making.

Efficient statistical sampling plans can be developed to detect the
presence of hot spots on a site. The plans allow the prediction of
the uncertainty of overlooking a hot spot of a specified size.
Sequential test procedures test only enough samples to accept or
reject a clean or not-clean hypothesis and this can quickly indicate
highly contaminated areas or areas of very low contamination.

Statistical methods can be used to compute mean concentrations
over areas where information indicates that contaminant levels
are substantially higher or lower than surrounding levels. This
provides a more accurate evaluation by limiting dilution of
the mean by data from unaffected soil units.
ROLE OF STATISTICS

If a remedial cleanup goal is that each square meter of site soil
surface shall have a residual concentration level no greater than
(C) ppm, how can the attainment of such a goal be measured? If
the site area is one hectare (2.47 acres), there are ten thousand
square meters of surface area. To be absolutely sure, one must test
each square meter for contamination (assuming one sample from each
square meter is known to be representative of the whole square meter).
Obviously, collecting ten thousand samples is prohibitive. So, what are
the alternatives?

If the number of samples that can be economically and practically
acquired is limited, the question immediately arises: how representative
of the whole site is a small set of samples? There is a
chance, for instance, that too many samples came either from
relatively clean areas of the site or from the more heavily contaminated
areas of the site. These possibilities present a finite probability
that a false positive (α) or false negative (β) conclusion may be
drawn, where the actual condition of the site is misinterpreted
because of uncertainties in sampling. Statistical sampling and
analysis techniques allow a determination of the level of confidence
for a specific set of conditions. These techniques can be
used to evaluate data or determine how much data are required to
confirm that a designated cleanup level has been attained.

Statistical evaluations also provide a logical, consistent approach
for optimizing results from limited resources. The known properties
of sample data distributions are used to design sampling
plans and data analysis routines to provide predictable confidence
levels for decisions. The confidence levels attainable will depend
on the quantity and quality of available data.

It helps to think of cleanup standards as having four components:
1) the magnitude: the concentration deemed protective of human
health and the environment; 2) a sampling plan to evaluate
attainment of the specified concentration; 3) a method for comparing
the data collected to the cleanup level; and 4) the
probability of mistakenly declaring the sample area clean (false
positive rate). All but the first component depend heavily on statistical
analysis. Figure 1 indicates the steps that must be completed to
define attainment objectives.
Various methods can be used to compare data to cleanup levels,
e.g., 1) average condition (mean concentration (x̄) is below the
cleanup level at a specified confidence level); 2) value rarely to
be exceeded (a specified proportion of soil is below a cleanup
standard); or 3) hot spots that should be found if present. Examples
of other options in which methods are combined are
provided in Box 1. It is important to consider the attainment
evaluation during the site investigation so that the method for
evaluating attainment can be included in the remedy specification.
FIGURE 1 (flowchart) - Steps for defining attainment objectives

  START
  1. Define the sample areas
  2. Specify the chemicals to be tested
  3. Establish the cleanup standard
  4. Specify the parameter to be compared to the cleanup standard
  5. Specify the probability of mistakenly declaring the sample area clean
  6. Review all elements of the attainment objectives
  7. Are any changes in the attainment objectives required?
     Yes: return to the steps above and revise. No: continue.
  8. Specify sampling and analysis plan
BOX 1 - Examples of Using Multiple Attainment Criteria

Most of the soil has concentrations below the
cleanup standard, while the concentrations that are above
the cleanup standard are not too large. This standard may be
accomplished by testing whether the 75th
percentile is below the cleanup standard and
whether the mean of those concentrations above
the cleanup standard is less than twice the cleanup
standard.

The mean concentration is less than the cleanup
standard and the standard deviation (σ) of the data
is small, thus limiting the number of extreme
concentrations. This standard may be
accomplished by testing if the mean is below the
cleanup standard and the coefficient of variation (cv)
is less than a low level (0.5, for example).

The mean concentration is less than the cleanup
standard and the remaining contamination is
uniformly distributed across the sample area
relative to the overall spread of the data. Testing
these criteria may be accomplished by testing for a
mean below the cleanup standard and variability
between strata means that is not large compared to
the variability within strata (analysis of variance).

The mean concentration is less than the cleanup
standard and no area of contaminated soil
(assumed to be circular) is larger than a specified
size.
STATISTICAL METHODOLOGY LIMITATIONS
When key assumptions about the site and collected
data are violated, the statements of data
confidence may change. Statistical assumptions
include: the sample area is homogeneous; the
distribution of data is normal, or can be transformed
into near normal data (e.g., taking the log
of the data tends to normalize the data, thus allowing
standard procedures to be used); and sampling
locations were selected using a simple random
sampling procedure.

PROCESS - DETERMINING WHETHER THE
MEAN CONCENTRATION AT A SITE IS
LESS THAN THE CLEANUP  STANDARD
Power Curve
The probability of declaring a sample area clean will depend on
the sample population mean concentration. The relationship
between a population mean and decision outcome is shown in
Figure 2. This relationship is known in statistics as a "power
curve."

Power curves can facilitate understanding the relationship between
mean concentration and confidence level. Power curves
also can help determine an appropriate sample size.
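
The relationship a power curve describes can be sketched numerically. The following is a minimal illustration, not the procedure used to draw the curves in this fact sheet. It assumes normally distributed sample means, a known standard deviation, simple random sampling, and a one-sided test that declares the area clean when the upper confidence limit falls below the cleanup standard. The function name and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def probability_declared_clean(true_mean, cleanup_standard, sigma, n, alpha):
    """Approximate chance that a one-sided test declares the area clean.

    Illustrative assumptions: known standard deviation `sigma` and a
    normal approximation for the mean of `n` samples.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # critical value for the false positive rate
    std_error = sigma / sqrt(n)              # standard error of the sample mean
    return std_normal.cdf((cleanup_standard - true_mean) / std_error - z_alpha)


# Hypothetical inputs: 1 ppm cleanup standard, sigma = 0.5 ppm, 20 samples, alpha = 0.10.
for mean in (0.5, 0.7, 0.9, 1.0, 1.2):
    p = probability_declared_clean(mean, 1.0, 0.5, 20, 0.10)
    print(f"true mean {mean:.1f} ppm -> P(declared clean) = {p:.3f}")
```

Under these assumptions the probability equals α when the true mean sits exactly at the cleanup standard and rises toward 1 as the true mean falls. Increasing the number of samples or reducing the variance steepens the curve.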
Sampling Plan

Once the cleanup concentration and statistical method (i.e., for
this discussion, the mean concentration) have been specified, the
sampling and analysis plans should be developed. There are two
basic types of sampling plans: systematic and random. These are
illustrated in Figure 3.

Pros and Cons - Systematic or Random Sampling
Systematic sampling is generally easier to carry out. Such
sampling almost always results in both lower costs and higher
data reliability than simple random sampling. Systematic sampling
also protects against having large contiguous areas of high
FIGURE 2 - Power Curve
This curve represents a condition where, when both the false negative (β) and false positive (α) risks are set at 10%, the population
mean concentration must be 0.5 ppm (or less) in order to be 90% certain the site is clean at the 1 ppm level. Power curves have
been developed for several values of α and can be found in Appendix A of Methods for Evaluating the Attainment of Cleanup
Standards. They are defined by the cleanup level, the false negative rate, and the variance and can be used to determine the mean
concentration required to achieve a particular false positive rate. (See example calculation at end of fact sheet.)

[Plot: the vertical axis runs to 100% and marks the false negative rate (β) and the false positive rate (α). The horizontal axis is the
population mean concentration in ppm (ticks at 0, 0.1, 0.5, 0.7, 1.0, and 1.4), with the clean target at 1.0 ppm. Three power curves
are drawn: for a data set with 0 variance, for a data set with moderate variance*, and for a data set with large variance*.]

*Whether the variance is considered low, moderate, or high will depend on the magnitude of the standard and the risk level
it represents; e.g., a variance that is 10 times the magnitude of the standard may be considered moderate if the standard is
conservative (i.e., if the standard is set low).

FIGURE 3 - Strategies for Selecting Sampling Locations

SYSTEMATIC SAMPLING DESIGN EXAMPLE
Systematic sampling distributes the sampling points uniformly over the site area of interest. The systematic sampling plan provides a
uniform site coverage with a larger grid spacing.
[Diagram: a grid of sampling points inside the site boundary, with the grid dimension marked. An identified portion (stratum) of the
site with other sampling needs is treated separately.]

RANDOM SAMPLING DESIGN EXAMPLE
True random selection of sampling points requires that each sample point chosen must be independent of the location of all other sample
points. The random sampling plan has a better chance of detecting site anomalies than the systematic sampling plan.

    $X \text{ coordinate} = X_{\min} + (X_{\max} - X_{\min}) \cdot RND$
    $Y \text{ coordinate} = Y_{\min} + (Y_{\max} - Y_{\min}) \cdot RND$

where RND is a random number between 0 and 1, drawn separately for each coordinate.
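
The coordinate formulas for random sampling can be sketched in a few lines. This is an illustrative example only: it assumes a rectangular sample area described by its minimum and maximum X and Y coordinates, and it uses Python's standard pseudo-random generator as the source of RND. The function name, the optional seed, and the 100 m by 100 m area are hypothetical.

```python
import random


def random_sampling_points(x_min, x_max, y_min, y_max, n_points, seed=None):
    """Return n_points (x, y) locations drawn independently and uniformly
    within the rectangle bounded by the given coordinates."""
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    points = []
    for _ in range(n_points):
        # Each coordinate uses its own random draw (RND) between 0 and 1.
        x = x_min + (x_max - x_min) * rng.random()
        y = y_min + (y_max - y_min) * rng.random()
        points.append((x, y))
    return points


# Hypothetical 100 m x 100 m (one hectare) sample area, five locations.
for x, y in random_sampling_points(0.0, 100.0, 0.0, 100.0, n_points=5, seed=1):
    print(f"({x:6.1f}, {y:6.1f})")
```

Recording the seed with the sampling plan lets the same set of locations be regenerated later. For an irregular site boundary, points that fall outside the boundary would need to be discarded and redrawn, which this sketch does not do.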
Hence, μ1 < Cs. The relationship of μ1, Cs, α, and β is illustrated
in Figure 4.
The variance is generally not known at the time that the sample
size is being calculated but can be estimated from any data that
do exist or crudely approximated using the formula:
    $\hat{\sigma}^2 \text{ (estimated variance)} = (\text{Range}/6)^2$
where Range is the expected spread between the smallest and
largest values.
Box 2 shows a sample calculation of sample size.
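
As a rough illustration of the range-based approximation, the sketch below assumes the common rule of thumb that the range of roughly normal data spans about six standard deviations, so the standard deviation is taken as Range/6 and the variance as its square. The function name and the concentrations used are hypothetical.

```python
def estimate_variance_from_range(expected_min, expected_max):
    """Crude variance estimate from the expected spread of the data.

    Assumption: the range covers about six standard deviations,
    so sigma is approximately range / 6 and the variance is sigma squared.
    """
    sigma = (expected_max - expected_min) / 6.0
    return sigma ** 2


# Hypothetical site where concentrations are expected to run from 2 to 20 ppm.
variance = estimate_variance_from_range(2.0, 20.0)
print(f"estimated variance = {variance:.1f} ppm^2")
```

An estimate from existing site data, where available, is preferable to this approximation.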


 Evaluation of Attainment
The mean of the sampling data is an estimate of the mean
contamination of the entire sample area; it does not convey
information regarding the reliability of the estimate. Through the
use of a "confidence interval," it is possible to provide a range of
values within which the true mean is expected to lie at a stated
level of confidence.


The formula for an upper one-sided 100(1−α) percent confidence
limit around the population mean is presented below:

    $\mu_{U\alpha} = \bar{x} + t_{1-\alpha,\,df} \, \dfrac{s}{\sqrt{n}}$

where:
x̄ = computed mean level of contamination
s = the standard deviation of the sampling data
n = the number of samples
df = the degrees of freedom (= n−1)
$\mu_{U\alpha}$ = confidence limit


The appropriate value of $t_{1-\alpha,\,df}$ can be obtained from Table 2. The
one-sided confidence interval can be used to test whether the site
has attained the cleanup standard.


To determine whether the site meets a specified cleanup standard,
use the upper one-sided confidence limit $\mu_{U\alpha}$ defined in the above
equation. If $\mu_{U\alpha}$ < Cs, conclude that the area attains the cleanup
standard. If $\mu_{U\alpha}$ ≥ Cs, conclude that the area does not attain the
cleanup standard.


EXAMPLE CALCULATION USING THE
POWER CURVES
At a former wood processing plant it is desirable to determine if
the average concentrations of PAH compounds in the surface soil
are below 50 ppm (the cleanup standard Cs). The project managers
have decided that the dangers from long-term exposure can
be reasonably controlled if the mean concentration in the sample
area is less than the cleanup standard. The false positive rate for
the test is to be at most 5% (i.e., α = .05). The false negative rate
is desired to be no more than 20% (i.e., β = .20). The coefficient
of variation of the data is thought to be about 1.2. The power
curves for α = .05 (see Figure 5) and the approximate sample sizes
for random sampling were reviewed.
FIGURE 4 - Relationship of μ1 and Cs
  H0: The site exceeds the cleanup standard (i.e., μ ≥ Cs)
  H1: The site attains the cleanup standard (i.e., μ < Cs)
  [Diagram relating μ1, Cs, α, and β]
  Cs = Cleanup Standard
BOX 2 - Example Sample Size Calculation
If the site cleanup target (Cs) is 12 ppm, the alternative clean
decision level (μ1) is 11 ppm, and the expected variance (σ²) of the
data is 8, we can obtain a 95% confidence level (false positive rate =
.05) with a 10% risk (false negative rate = 0.10) of failing to declare a
clean site clean by determining the number of samples from:

    $n = \sigma^2 \left[ \dfrac{z_{1-\alpha} + z_{1-\beta}}{C_s - \mu_1} \right]^2 = 8 \left[ \dfrac{1.645 + 1.282}{12 - 11} \right]^2 = 68.54$, rounded up to 69 samples

where $z_{1-\alpha} = z_{.95} = 1.645$ and $z_{1-\beta} = z_{.90} = 1.282$.

Note: If the false negative risk is decreased from 10% to 5%, the
number of samples required would increase to 87. The reduction of
risk always requires increasing sample size.
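
The Box 2 arithmetic can be reproduced with a short script. This is a sketch under stated assumptions rather than an official implementation: it uses the sample size formula for testing a mean with simple random sampling, and it obtains the z values from the standard normal distribution in Python's standard library instead of reading them from Table 1. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(variance, cleanup_standard, mu1, alpha, beta):
    """Number of samples needed to test whether the mean is below the
    cleanup standard, given false positive rate alpha and false negative
    rate beta at the alternative mean mu1 (which must be below the standard)."""
    if mu1 >= cleanup_standard:
        raise ValueError("mu1 must be less than the cleanup standard")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # e.g. about 1.645 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(1 - beta)    # e.g. about 1.282 for beta = 0.10
    n = variance * ((z_alpha + z_beta) / (cleanup_standard - mu1)) ** 2
    return math.ceil(n)  # always round up to a whole sample


# Box 2 inputs: Cs = 12 ppm, mu1 = 11 ppm, variance = 8.
print(required_sample_size(8, 12, 11, alpha=0.05, beta=0.10))
print(required_sample_size(8, 12, 11, alpha=0.05, beta=0.05))
```

Lowering either error rate, raising the variance, or moving μ1 closer to Cs all increase the result, which matches the note in Box 2.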
The curves in Figure 5 illustrate the relationship between cleanup level and
probability of attainment for various sample sizes. Approximate
sample sizes for a range of coefficients of variation are presented
below the figure as a guide to determining which curve is appropriate
for the situation under consideration. Using this information,
the following conclusions can be made:

    While it would be desirable to have a test with power
    curves similar to E and F, the sample sizes of more than
    100 will cost too much.

    Power curves A, B, and C have unacceptably low power
    (i.e., the power, 1−β, is too low) when the mean concentration
    is roughly 75% of the cleanup level (i.e., about 37.5 ppm).
    For example, at .75 on the x axis, curves A, B, and C give
    power (on the y axis) of approximately .15 to .40 (i.e., β
    error rates of .85 to .60). This clearly is undesirable in
    most situations. Viewing the table in Figure 5, we see
    that in order to have a false negative rate of 20% or less
    the site mean concentration would have to be approximately
    25% of the cleanup level for curve A to 57% of the
    cleanup level for curve C.

    Consequently, a reasonable compromise between high
    power and low sample size is to have a test with a power
    curve similar to D.
Based on the specifications above and the table at the bottom of
Figure 5, the information needed to calculate the sample size is:
    α = .05;
    β = .20; and
    μ1 = Cs × .69 = 34.5 ppm.


These values can be used to calculate sample size. From
Table 1:
    $z_{1-\alpha} = 1.645$
    $z_{1-\beta} = .842$
    C.V. = 1.2

    σ = (1.2)(34.5) = 41.4
    σ² = 1713.96

    Number of samples = $\sigma^2 \left[ \dfrac{z_{1-\alpha} + z_{1-\beta}}{C_s - \mu_1} \right]^2$

    Number of samples = $1713.96 \left[ \dfrac{1.645 + .842}{50 - 34.5} \right]^2$

                      = 44.13, rounded up to 45 samples
This number is smaller than the numbers presented below Figure 5
because the numbers in Figure 5 are calculated to be
conservative estimates (Cs was used to calculate σ rather than μ1).
Once the samples are taken, attainment can be evaluated as
follows:
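The following data are known or calculated:

    Cs = Cleanup Standard = 50 ppm
    x̄ = Mean concentration = 38 ppm
    s = Standard deviation = 15
    n = Number of samples = 51 (df = 50)
    $t_{.95,\,50} \approx \dfrac{1.684 + 1.671}{2} = 1.677$ (midway between the tabulated values for 40 and 60 degrees of freedom)

The upper one-sided 95% confidence interval goes to

    $\mu_{U\alpha} = \bar{x} + t_{1-\alpha,\,df} \, \dfrac{s}{\sqrt{n}} = 38 + 1.677 \, \dfrac{15}{\sqrt{51}} = 41.52$

Since 41.5 < 50, there is a 95% confidence that the mean
concentration of the sample area attains the cleanup
standard of 50 ppm.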
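
The attainment check in this example can be sketched as follows. The code is illustrative only: the function names are hypothetical, the t value is supplied by the user from a t table rather than computed, and n = 51 is the sample count implied by the 41.52 ppm result of the worked example.

```python
from math import sqrt


def upper_confidence_limit(mean, std_dev, n, t_value):
    """Upper one-sided confidence limit on the mean: mean + t * s / sqrt(n)."""
    return mean + t_value * std_dev / sqrt(n)


def attains_cleanup_standard(mean, std_dev, n, t_value, cleanup_standard):
    """The area attains the standard only if the upper limit is below it."""
    return upper_confidence_limit(mean, std_dev, n, t_value) < cleanup_standard


# Values from the worked example: mean 38 ppm, s = 15, t = 1.677, Cs = 50 ppm.
ucl = upper_confidence_limit(mean=38.0, std_dev=15.0, n=51, t_value=1.677)
print(f"upper confidence limit = {ucl:.2f} ppm")
print("attains standard:", attains_cleanup_standard(38.0, 15.0, 51, 1.677, 50.0))
```

Passing the t value in explicitly keeps the sketch free of third-party dependencies; the value must correspond to the chosen α and to n − 1 degrees of freedom.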
FIGURE 5 - Power Curves for α = 5%
[Plot: probability of deciding the site attains the cleanup standard (vertical axis) versus the true parameter as a
fraction of Cs (horizontal axis), showing power curves A through F.]

Approximate sample sizes for simple random sampling for testing the parameters indicated

                                            Power Curve
                              A        B        C        D        E        F
Parameters for the power curves
  α                          .05      .05      .05      .05      .05      .05
  β                          .20      .20      .20      .20      .20      .20
  μ1                       .25·Cs   .43·Cs   .57·Cs   .69·Cs   .77·Cs  (illegible)
Parameter being tested: Mean
  with cv (data) = .5          4        5        9       17       30       61
  with cv (data) = 1          11       20       34       65      117      242
  with cv (data) = 1.5        25       43       76      145      264      544
TABLES

TABLE 1 - Z Values for Selected Alpha and Beta

    α or β      z(1−α) or z(1−β)
    0.450       0.124
    0.400       0.253
    0.350       0.385
    0.300       0.524
    0.250       0.674
    0.200       0.842
    0.100       1.282
    0.050       1.645
    0.025       1.960
    0.010       2.326
    0.0050      2.576
    0.0025      2.807
    0.0010      3.090

TABLE 2 - Table of t for Selected Alpha and Degrees of Freedom
Use alpha to determine which column to use based on the desired parameter, $t_{1-\alpha,\,df}$
GLOSSARY

Distribution - The frequencies with which measurements in a
data set fall within specified intervals.

False Negative (β) - The probability of mistakenly concluding
that the sample area has not attained the cleanup level when it
has. It is known as the probability of making a Type II error.

False Positive (α) - The probability of mistakenly concluding
that the sample area has attained the cleanup level when it has
not. It is known as the probability of making a Type I error.

Hypothesis - An assumption about a property or characteristic
of a population under study. The goal of statistical inference is
to decide which of two complementary hypotheses is likely to
be true. In the context of this document, the null hypothesis is
that the sample area has not achieved the cleanup standard and
the alternative hypothesis is that it has.

Normal Distribution - A family of "bell-shaped" distributions,
or curves, where each individual distribution is uniquely
defined by its mean and variance.

Sample Area - The specific area within a waste site for which
a separate decision on attainment is to be reached.

Sample Mean - The arithmetic average of a set of sample
measurements, x1, x2, ..., xn, defined to be:
    $\bar{x} = \dfrac{1}{n} \sum_{i=1}^{n} x_i$

Sample Population - The total number of soil/solid media
units at a waste site for which inferences regarding attainment
of cleanup standards are to be made.

Sample Standard Deviation - The more commonly used
measure of dispersion of the sample measurements, defined to
be:
    $s = \sqrt{s^2}$ (See definition for variance)

Sequential Test Procedures - Sampling process that terminates
when enough evidence is obtained to either accept or
reject the null hypothesis.

Simple Random Sample - A sample of n units collected from
a population of interest (for example, all possible samples of soil
units at a site) such that each unit has an equal chance of being
selected.

Variance - A measurement of dispersion of the sample
measurements, x1, x2, ..., xn, defined to be:
    $s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$