United States
Environmental Protection
Agency
Office of
Solid Waste and
Emergency Response
Publication: 9355.4-04FS
July 1991
EPA
A Guide:
Methods for Evaluating the
Attainment of Cleanup Standards
For Soils and Solid Media
Office of Emergency and Remedial Response
Hazardous Site Control Division OS-220W
Quick Reference Fact Sheet
GOALS
This fact sheet highlights statistical concepts and methods used in the evaluation of the attainment of cleanup standards. It provides
an example of a basic procedure for determining sample size required to obtain a given confidence level focusing on a cleanup standard
specified as a mean concentration with a specified confidence. It does not provide policy on specification of cleanup levels but should
be considered a technical reference guide for using some of the more common methodologies. More detailed information on these and
other methodologies can be obtained from Methods for Evaluating the Attainment of Cleanup Standards. Volume 1: Soils and Solid
Media, EPA 230/02-89/042. Copies of this volume are available from the National Technical Information Service, Springfield, VA
22161. Price: $28.95 (paper), $6.95 (microfiche).
[Terms in bold, italicized print are defined in the glossary on the last page of this fact sheet.]
WHY ARE STATISTICS IMPORTANT?
Statistical methods perform a powerful and useful function. They
allow extrapolation from a set of samples to the entire site in a
scientifically valid fashion.
Extrapolation involves uncertainty. Statistical methods enable
estimation and management of the uncertainty. Ideally, uncer-
tainty may be reduced to any desired level given complete
freedom in sampling and testing. This is seldom a viable option,
so statistics are used to determine a balance between sampling and
certainty.
Statistical principles can be used to design sampling plans that
correlate with methods of analysis tailored to evaluating attain-
ment of cleanup standards. Correlated sampling and analysis
methodologies offer higher confidence levels in decision-making.
Efficient statistical sampling plans can be developed to detect the
presence of hot spots on a site. The plans allow the prediction of
the uncertainty of overlooking a hot spot of a specified size.
Sequential test procedures test only enough samples to accept or
reject a clean or not-clean hypothesis and this can quickly indicate
highly contaminated areas or areas of very low contamination.
Statistical methods can be used to compute mean concentrations
over areas where information indicates that contaminant levels
are substantially higher or lower than surrounding levels. This
provides more accurate evaluation through limiting dilution of
the mean by data from unaffected soil units.
ROLE OF STATISTICS
If a remedial cleanup goal is that each square meter of site soil
surface shall have a residual concentration level no greater than
(C) ppm, how can the attainment of such a goal be measured? If
the site area is one hectare (2.47 acres), there are ten thousand
square meters of surface area. To be absolutely sure, one must test
each square meter for contamination (if one sample from each
meter is known to be representative of the whole meter). Obvi-
ously, ten thousand samples is prohibitive. So, what are the
alternatives?
If the number of samples that can be economically and practically
acquired is limited, the question immediately arises: how repre-
sentative of the whole site is a small set of samples? There is a
chance, for instance, that either too many samples came from
relatively clean areas of the site or from the more heavily contami-
nated areas of the site. The possibilities present a finite probability
that a false positive (α) or false negative (β) conclusion may be
drawn where the actual condition of the site is misinterpreted
because of uncertainties in sampling. Statistical sampling and
analysis techniques allow a determination of the level of confi-
dence for a specific set of conditions. These techniques can be
used to evaluate data or determine how much data are required to
confirm that a designated cleanup level has been attained.
Statistical evaluations also provide a logical consistent approach
for optimizing results from limited resources. The known prop-
erties of sample data distributions are used to design sampling
plans and data analysis routines to provide predictable confidence
levels for decisions. The confidence levels attainable will depend
on the quantity and quality of available data.
It helps to think of cleanup standards as having four components:
1) the magnitude - concentration deemed protective of human
health and the environment; 2) a sampling plan to evaluate
attainment of the specified concentration; 3) a method for com-
paring the data collected to the cleanup level; and 4) the
probability of mistakenly declaring the sample area clean (false
positive rate). All but the first component depend heavily on statistical
analysis. Figure 1 indicates the steps that must be completed to
define attainment objectives.
Various methods can be used to compare data to cleanup levels,
e.g., 1) average condition (mean concentration (x̄) is below
cleanup level at a specified confidence level); 2) value rarely to
be exceeded (specified proportion of soil is below a cleanup
standard); or 3) hot spots that should be found if present. Ex-
amples of other options in which methods are combined are
provided in Box 1. It is important to consider the attainment
evaluation during the site investigation so that the method for
evaluating attainment can be included in the remedy specification.
[Figure 1: flowchart of the steps for defining attainment objectives]
START
Define the sample areas
Specify the chemicals to be tested
Establish the cleanup standard
Specify the parameter to be compared to the cleanup standard
Specify the probability of mistakenly declaring the sample area clean
Review all elements of the attainment objectives
Are any changes in the attainment objectives required? If yes, revisit the earlier steps; otherwise, continue.
Specify sampling and analysis plan
BOX 1 - Examples of Using Multiple
Attainment Criteria
Most of the soil has concentrations below the
cleanup standard while the concentrations that are above
the cleanup standard are not too large. This standard may be
accomplished by testing whether the 75th
percentile is below the cleanup standard and
whether the mean of those concentrations above
the cleanup standard is less than twice the cleanup
standard.
The mean concentration is less than the cleanup
standard and the standard deviation (σ) of the data
is small, thus limiting the number of extreme
concentrations. This standard may be
accomplished by testing if the mean is below the
cleanup standard and the coefficient of variation (r)
is less than a low level (.5 for example).
The mean concentration is less than the cleanup
standard and the remaining contamination is
uniformly distributed across the sample area
relative to the overall spread of the data. Testing
these criteria may be accomplished by testing for a
mean below the cleanup standard and variability
between strata means that is not large compared to
the variability within strata (analysis of variance).
The mean concentration is less than the cleanup
standard and no area of contaminated soil
(assumed to be circular) is larger than a specified
size.
STATISTICAL METHODOLOGY
LIMITATIONS
When key assumptions about the site and col-
lected data are violated, the statements of data
confidence may change. Statistical assumptions
include: the sample area is homogeneous; the
distribution of data is normal, or can be trans-
formed into near normal data (e.g., taking the log
of the data tends to normalize the data thus allowing
standard procedures to be used); and sampling
locations were selected using a simple random
sampling procedure.
PROCESS - DETERMINING WHETHER THE
MEAN CONCENTRATION AT A SITE IS
LESS THAN THE CLEANUP STANDARD
Power Curve
The probability of declaring a sample area clean will depend on
the sample population mean concentration. The relationship
between a population mean and decision outcome is shown in
Figure 2. This relationship is known in statistics as a "power
curve."
Power curves can facilitate understanding the relationship be-
tween mean concentration and confidence level. Power curves
also can help determine an appropriate sample size.
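The shape of a power curve can be sketched numerically. The example below is a minimal illustration, assuming a one-sided z-test on the sample mean with a known standard deviation; it is not the procedure used to draw the curves in Methods for Evaluating the Attainment of Cleanup Standards, and the function name `power` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power(true_mean, cleanup_standard, sigma, n, alpha=0.05):
    """Approximate probability of declaring the sample area clean.

    Assumption: one-sided z-test on the sample mean with known sigma.
    The area is declared clean when the sample mean falls far enough
    below the cleanup standard to keep the false positive rate at alpha.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    standard_error = sigma / sqrt(n)
    return _STD_NORMAL.cdf((cleanup_standard - true_mean) / standard_error - z_alpha)


if __name__ == "__main__":
    # Hypothetical inputs: 1 ppm standard, sigma of 0.5 ppm, 10 samples, alpha of 10%.
    for mean in (0.1, 0.5, 0.7, 1.0, 1.4):
        p = power(mean, 1.0, 0.5, 10, alpha=0.10)
        print(f"population mean = {mean:.1f} ppm, P(declare clean) = {p:.3f}")
```

Under these assumptions the curve equals the false positive rate when the population mean sits exactly at the cleanup standard and rises toward 1 as the population mean falls below it; a larger sample size or a smaller variance makes the rise steeper.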
Sampling Plan
Once the cleanup concentration and statistical method (i.e., for
this discussion, the mean concentration) have been specified, the
sampling and analysis plans should be developed. There are two
basic types of sampling plans: systematic and random. These are
illustrated in Figure 3.
Pros and Cons - Systematic or Random Sampling
Systematic sampling is generally easier to carry out. Such
sampling almost always results in both lower costs and in higher
data reliability than simple random sampling. Systematic sam-
pling also protects against having large contiguous areas of high
FIGURE 2 - Power Curve
This curve represents a condition where, when both the false negative (β) and false positive (α) risks are set at 10%, the population
mean concentration must be 0.5 ppm (or less) in order to be 90% certain the site is clean at the 1 ppm level. Power curves have
been developed for several values of α and can be found in Appendix A of Methods for Evaluating the Attainment of Cleanup
Standards. They are defined by the cleanup level, the false negative rate, and the variance and can be used to determine the mean
concentration required to achieve a particular false positive rate. (See example calculation at end of fact sheet.)
[Figure 2 plot. Vertical axis labels: 100%; false negative rate, β; false positive rate, α. Horizontal axis: population mean concentration, ppm, with ticks at 0, 0.1, 0.5, 0.7, 1.0 (clean target, ppm), and 1.4. Curves shown: power curve for data set with 0 variance; power curve for data set with moderate variance*; power curve for data set with large variance*.]
*Whether the variance is considered low, moderate, or high will depend on the magnitude of the standard and the risk level
it represents; e.g., a variance that is 10 times the magnitude of the standard may be considered moderate if the standard is
conservative (i.e., if the standard is set low).
FIGURE 3 - Strategies for Selecting Sampling Locations
SYSTEMATIC SAMPLING DESIGN EXAMPLE
Systematic sampling distributes the sampling points uniformly over the site area of interest. The systematic sampling plan provides a
uniform site coverage with a larger grid spacing.
[Systematic sampling diagram. Labels: SITE BOUNDARY; Grid Dimension (L); IDENTIFIED PORTION (STRATUM) OF THE SITE WITH OTHER SAMPLING NEEDS IS TREATED SEPARATELY.]
RANDOM SAMPLING DESIGN EXAMPLE
True random selection of sampling points requires that each sample point chosen must be independent of the location of all other sample
points. The random sampling plan has a better chance of detecting site anomalies than the systematic sampling plan.
$$X\ \text{Coordinate} = X_{\min} + (X_{\max} - X_{\min}) \cdot \mathrm{RND}$$
$$Y\ \text{Coordinate} = Y_{\min} + (Y_{\max} - Y_{\min}) \cdot \mathrm{RND}$$
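The coordinate formulas for random sampling can be sketched in a few lines. This is an illustration only: it assumes RND is a uniform random number between 0 and 1 drawn independently for each coordinate, and it assumes a rectangular bounding box around the sample area. The function name `random_sample_points` and the `seed` parameter are hypothetical.

```python
import random


def random_sample_points(x_min, x_max, y_min, y_max, n, seed=None):
    """Return n (x, y) sampling locations drawn independently and uniformly.

    Assumption: RND is a uniform random number in [0, 1), drawn separately
    for each coordinate of each point.
    """
    rng = random.Random(seed)  # a seed makes the plan reproducible
    points = []
    for _ in range(n):
        x = x_min + (x_max - x_min) * rng.random()
        y = y_min + (y_max - y_min) * rng.random()
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Hypothetical 100 m by 50 m sample area with 5 sampling locations.
    for x, y in random_sample_points(0.0, 100.0, 0.0, 50.0, 5, seed=1):
        print(f"x = {x:6.1f} m, y = {y:5.1f} m")
```

For an irregular site, one common approach is to discard any point that falls outside the site boundary and draw a replacement, so that every location inside the boundary keeps an equal chance of selection.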
Hence, μ1 < Cs. The relationship of μ1, Cs, α, and β is illustrated
in Figure 4.
The variance is generally not known at the time that the sample
size is being calculated but can be estimated from any data that
do exist or crudely approximated using the formula:
$$\hat{\sigma}^2\ (\text{estimated variance}) = (\text{Range}/6)^2$$
where Range is the expected spread between the smallest and
largest values.
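As a quick illustration of the range-based approximation, the sketch below reads Range/6 as a crude estimate of the standard deviation (for roughly normal data nearly all values lie within three standard deviations of the mean) and squares it to obtain the variance. That reading is an assumption stated here for clarity, and the function name `variance_from_range` is hypothetical.

```python
def variance_from_range(smallest, largest):
    """Crude variance estimate from the expected spread of the data.

    Assumption: Range/6 approximates the standard deviation, so the
    estimated variance is (Range/6) squared.
    """
    estimated_sd = (largest - smallest) / 6.0
    return estimated_sd ** 2


if __name__ == "__main__":
    # Hypothetical expected concentrations between 2 ppm and 20 ppm.
    print(variance_from_range(2.0, 20.0))
```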
Box 2 shows a sample calculation of sample size.
Evaluation of Attainment
The mean of the sampling data is an estimate of the mean
contamination of the entire sample area; it does not convey
information regarding the reliability of the estimate. Through the
use of a "confidence interval," it is possible to provide a range of
values within which the true mean is located.
The formula for an upper one-sided 100(1-α) percent confidence
limit around the population mean is presented below:
$$\mu_{U\alpha} = \bar{x} + t_{1-\alpha,df}\,\frac{s}{\sqrt{n}}$$
where:
$\bar{x}$ = computed mean level of contamination
$s$ = the standard deviation of the sampling data
$n$ = the number of samples
$df$ = the degrees of freedom (= n-1)
$\mu_{U\alpha}$ = upper one-sided confidence limit
The appropriate value of $t_{1-\alpha,df}$ can be obtained from Table 2. The
one-sided confidence interval can be used to test whether the site
has attained the cleanup standard.
To determine whether the site meets a specified cleanup standard,
use the upper one-sided confidence limit $\mu_{U\alpha}$ defined in the above
equation. If $\mu_{U\alpha} < C_s$, conclude that the area attains the cleanup
standard. If $\mu_{U\alpha} \geq C_s$, conclude that the area does not attain the
cleanup standard.
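The decision rule can be sketched as follows. This is a minimal illustration in which the t value is looked up by the user in Table 2 and passed in rather than computed; the function names are hypothetical, and the inputs are the figures from the example calculation later in this fact sheet.

```python
from math import sqrt


def upper_confidence_limit(mean, sd, n, t_value):
    """Upper one-sided confidence limit: mean + t * sd / sqrt(n).

    t_value is the tabulated t for the chosen alpha and n - 1 degrees
    of freedom; it is supplied by the caller, not computed here.
    """
    return mean + t_value * sd / sqrt(n)


def attains_standard(mean, sd, n, t_value, cleanup_standard):
    """True when the upper confidence limit is below the cleanup standard."""
    return upper_confidence_limit(mean, sd, n, t_value) < cleanup_standard


if __name__ == "__main__":
    # Inputs from the fact sheet example: mean 38 ppm, s = 15, n = 51, t = 1.677.
    limit = upper_confidence_limit(38.0, 15.0, 51, 1.677)
    print(f"upper limit = {limit:.2f} ppm")
    print("attains 50 ppm standard:", attains_standard(38.0, 15.0, 51, 1.677, 50.0))
```

Passing the t value in keeps the sketch free of third-party dependencies; a statistics library could supply the quantile instead.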
EXAMPLE CALCULATION USING THE
POWER CURVES
At a former wood processing plant it is desirable to determine if
the average concentrations of PAH compounds in the surface soil
are below 50 ppm (the cleanup standard Cs). The project man-
agers have decided that the dangers from long-term exposure can
be reasonably controlled if the mean concentration in the sample
area is less than the cleanup standard. The false positive rate for
the test is to be at most 5% (i.e., α = .05). The false negative rate
is desired to be no more than 20% (i.e., β = .20). The coefficient
of variation of the data is thought to be about 1.2. The power
curves for α = .05 (see Figure 5) and the approximate sample sizes
for random sampling were reviewed.
FIGURE 4 - Relationship of μ1 and Cs
H0: The site exceeds the cleanup standard (i.e., μ ≥ Cs)
H1: The site attains the cleanup standard (i.e., μ < Cs)
[Figure 4 diagram. Labels: μ1; Cs = Cleanup Standard; α; β.]
BOX 2 - Example Sample Size Calculation
If the site cleanup target (Cs) is 12 ppm, the alternative clean
decision level (μ1) is 11 ppm, and the expected variance (σ²) of the
data is 8, we can obtain a 95% confidence level (false positive rate =
0.05) with a 10% risk (false negative rate = 0.10) of failing to declare
the site clean when its mean is at μ1 by determining the mean site
sample concentration from the following number of samples:
$$n = \sigma^2 \left( \frac{z_{1-\alpha} + z_{1-\beta}}{C_s - \mu_1} \right)^2 = 8 \left( \frac{1.645 + 1.282}{12 - 11} \right)^2 \approx 68.5,\ \text{rounded up to 69 samples}$$
where $z_{1-\alpha} = z_{0.95} = 1.645$ and $z_{1-\beta} = z_{0.90} = 1.282$ (Table 1).
Note: If the false negative risk is decreased from 10% to 5%, the
number of samples required would increase to 87. The reduction of
risk always requires increasing sample size.
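The Box 2 arithmetic can be reproduced with a short sketch. It assumes the sample size formula n = σ²((z(1-α) + z(1-β)) / (Cs - μ1))², as reconstructed from the Box 2 numbers, and it computes the z values from the standard normal distribution instead of reading them from Table 1. The function name `required_samples` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_samples(variance, cleanup_standard, mu1, alpha, beta):
    """Number of samples for a one-sided test of the mean, rounded up.

    Assumed formula: n = variance * ((z_alpha + z_beta) / (Cs - mu1)) ** 2,
    where mu1 is the alternative clean decision level (mu1 < Cs).
    """
    if mu1 >= cleanup_standard:
        raise ValueError("mu1 must be below the cleanup standard")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    z_beta = std_normal.inv_cdf(1.0 - beta)    # about 1.282 for beta = 0.10
    n = variance * ((z_alpha + z_beta) / (cleanup_standard - mu1)) ** 2
    return ceil(n)


if __name__ == "__main__":
    # Box 2 inputs: variance 8, Cs = 12 ppm, mu1 = 11 ppm, alpha = 0.05.
    print(required_samples(8.0, 12.0, 11.0, alpha=0.05, beta=0.10))
    print(required_samples(8.0, 12.0, 11.0, alpha=0.05, beta=0.05))
```

The sample size grows with the variance and shrinks with the square of the gap between Cs and μ1, so halving that gap roughly quadruples the number of samples.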
These curves illustrate the relationship between cleanup level and
probability of attainment for various sample sizes. Approximate
sample sizes for a range of coefficients of variation are presented
below the figure as a guide to determining which curve is appro-
priate for the situation under consideration. Using this informa-
tion, the following conclusions can be made:
While it would be desirable to have a test with power
curves similar to E and F, the sample sizes of more than
100 will cost too much.
Power curves A, B, and C have unacceptably low power
(i.e., the power, 1 - β, is too low) when the mean concentration
is roughly 75% of the cleanup level (i.e., 37 ppm).
For example, at .75 on the x axis, curves A, B, and C give
power (on the y axis) of approximately .15 to .40 (i.e., β
error rates of .85 to .60). This clearly is undesirable in
most situations. Viewing the table in Figure 5, we see
that in order to have a false negative rate of 20% or less
the site mean concentration would have to be approxi-
mately 25% of the cleanup level for curve A to 57% of the
cleanup level for curve C.
Consequently, a reasonable compromise between high
power and low sample size is to have a test with a power
curve similar to D.
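Based on the specifications above and the table at the bottom of
Figure 5, the information needed to calculate the sample size is:
α = .05;
β = .20; and
μ1 = Cs * .69 = 34.5 ppm.
These values can be used to calculate sample size. From
Table 1:
$z_{1-\alpha} = 1.645$
$z_{1-\beta} = 0.842$
C.V. = 1.2
σ = (1.2)(34.5) = 44.4
σ² = 1971.36
$$\text{Number of samples} = \sigma^2 \left( \frac{z_{1-\alpha} + z_{1-\beta}}{C_s - \mu_1} \right)^2 = 1971.36 \left( \frac{1.645 + 0.842}{50 - 34.5} \right)^2 = 50.75,\ \text{rounded up to 51 samples}$$
(Note: the product (1.2)(34.5) evaluates to 41.4. The value 44.4 is retained here because the sample size above and the evaluation that follows are based on it; with σ = 41.4 the same formula gives about 45 samples.)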
This number is smaller than the numbers presented below Fig-
ure 5 because the numbers in Figure 5 are calculated to be
conservative estimates (Cs was used to calculate σ rather than μ1).
Once the samples are taken, attainment can be evaluated as
follows:
The following data are known or calculated:
Cs = Cleanup Standard = 50 ppm
x̄ = Mean concentration = 38 ppm
s = Standard deviation = 15
n = Number of samples = 51 (df = 50)
$t_{1-\alpha,df} = (1.684 + 1.671)/2 \approx 1.677$ (midpoint of the tabulated t values for 40 and 60 degrees of freedom)
The upper one-sided 95% confidence interval
goes to
$$\mu_{U\alpha} = \bar{x} + t_{1-\alpha,df}\,\frac{s}{\sqrt{n}} = 38 + 1.677\,\frac{15}{\sqrt{51}} = 41.52$$
Since 41.52 < 50, there is 95% confidence that the mean
concentration of the sample area attains the cleanup
standard of 50 ppm.
FIGURE 5 - Power Curves for α = 5%
[Figure 5 plot. Vertical axis: Probability of Deciding the Site Attains the Cleanup Standard. Horizontal axis: True parameter as a fraction of Cs. Power curves are labeled A through F; the parameters for the power curves are tabulated below.]
Approximate sample sizes for simple random sampling for testing the parameters indicated
Power Curve:                    |  A  |   B    |   C    |   D    |   E    |  F
Parameters being tested
  α                             | .05 |  .05   |  .05   |  .05   |  .05   | .05
  β                             | .20 |  .20   |  .20   |  .20   |  .20   | .20
  μ1                            | --  | .43 Cs | .57 Cs | .69 Cs | .77 Cs | --
Mean
  with cv (data) = .5           |  4  |   5    |   9    |   17   |   30   | 61
  with cv (data) = 1            | 11  |   20   |   34   |   65   |  117   | 242
  with cv (data) = 1.5          | 25  |   43   |   76   |  145   |  264   | 544
TABLES
TABLE 1 - Z Values for Selected Alpha and Beta
α or β   | z(1-α) or z(1-β)
0.450    | 0.124
0.400    | 0.253
0.350    | 0.385
0.300    | 0.524
0.250    | 0.674
0.200    | 0.842
0.100    | 1.282
0.050    | 1.645
0.025    | 1.960
0.010    | 2.326
0.0050   | 2.576
0.0025   | 2.807
0.0010   | 3.090
TABLE 2 - Table of t for Selected Alpha and Degrees of Freedom
Use alpha to determine which column to use based on the desired parameter, t(1-α, df).
GLOSSARY
Distribution -The frequencies with which measurements in a
data set fall within specified intervals.
False Negative (β) - The probability of mistakenly concluding
that the sample area has not attained the cleanup level when it
has. It is known as the probability of making a Type II error.
False Positive (α) - The probability of mistakenly concluding
that the sample area has attained the cleanup level when it has
not. It is known as the probability of making a Type I error.
Hypothesis - An assumption about a property or characteristic
of a population under study. The goal of statistical inference is
to decide which of two complementary hypotheses is likely to
be true. In the context of this document, the null hypothesis is
that the sample area has not achieved the cleanup standard and
the alternative hypothesis is that it has.
Normal Distribution - A family of "bell-shaped" distribu-
tions, or curves, where each individual distribution is uniquely
defined by its mean and variance.
Sample Area - The specific area within a waste site for which
a separate decision on attainment is to be reached.
Sample Mean - The arithmetic average of a set of sample
measurements, x1, x2, ..., xn, defined to be:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Sample Population - The total number of soil/solid media
units at a waste site for which inferences regarding attainment
of cleanup standards are to be made.
Sample Standard Deviation - The more commonly used
measure of dispersion of the sample measurements, defined to
be:
$$s = \sqrt{s^2}$$
(See definition for variance)
Sequential Test Procedures - Sampling process that termi-
nates when enough evidence is obtained to either accept or
reject the null hypothesis.
Simple Random Sample - A sample of n units collected from
a population of interest (for example, all possible samples of soil
units at a site) such that each unit has an equal chance of being
selected.
Variance - A measurement of dispersion of the sample
measurements, x1, x2, ..., xn, defined to be:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$