-------
-------
Guidance Document on the Statistical Analysis
of Ground-Water Monitoring Data
at RCRA Facilities
-------
PREFACE
This guidance document has been developed primarily for evaluating
ground-water monitoring data at RCRA (Resource Conservation and Recovery Act)
facilities. The statistical methodologies described in this document can be
applied to both hazardous (Subtitle C of RCRA) and municipal (Subtitle D of
RCRA) waste land disposal facilities.
The recently amended regulations concerning the statistical analysis of
ground-water monitoring data at RCRA facilities (53 FR 39720, October 11,
1988), provide a wide variety of statistical methods that may be used to
evaluate ground-water quality. To the experienced and inexperienced water
quality professional, the choice of which test to use under a particular set
of conditions may not be apparent. The reader is referred to Section 4 of
this guidance, "Choosing a Statistical Method," for assistance in choosing an
appropriate statistical test. For relatively new facilities that have only
limited amounts of ground-water monitoring data, it is recommended that a form
of hypothesis test (e.g., parametric analysis of variance) be employed to
evaluate the data. Once sufficient data are available (after 12 to 24 months
or eight background samples), another method of analysis such as the control
chart methodology described in Section 7 of the guidance is recommended. Each
method of analysis and the conditions under which it will be used can be
written in the facility permit. This will eliminate the need for a permit
modification each time more information about the hydrogeochemistry is
collected, and more appropriate methods of data analysis become apparent.
This guidance was written primarily for the statistical analysis of
ground-water monitoring data at RCRA facilities. The guidance has wider
applications, however, if one examines the spatial relationships involved
between the monitoring wells and the potential contaminant source. For
example, Section 5 of the guidance describes background well (upgradient) vs.
compliance well (downgradient) comparisons. This scenario can be applied to
other non-RCRA situations involving the same spatial relationships and the
same null hypothesis. The explicit null hypothesis (H0) for testing contrasts
between means, or where appropriate between medians, is that the means between
groups (here monitoring wells) are equal (i.e., no release has been detected),
or that the group means are below a prescribed action level (e.g., the ground-
water protection standard). Statistical methods that can be used to evaluate
these conditions are described in Sections 5.2 (Analysis of Variance), 5.3
(Tolerance Intervals), and 5.4 (Prediction Intervals).
A different situation exists when compliance wells (downgradient) are
compared to a fixed standard (e.g., the ground-water protection standard). In
that case, Section 6 of the guidance should be consulted. The value to which
the constituent concentrations at compliance wells are compared can be any
standard established by a Regional Administrator, State or county health
official, or another appropriate official.
A note of caution applies to Section 6. The examples used in Section 6
are used to determine whether ground water has been contaminated as a result
of a release from a facility. When the lower confidence limit lies entirely
above the ACL (alternate concentration limit) or MCL (maximum concentration
limit), further action or assessment may be warranted. If one wishes to
determine whether a cleanup standard has been attained for a Superfund site or
a RCRA facility in corrective action, another EPA guidance document entitled,
"Statistical Methods for the Attainment of Superfund Cleanup Standards
(Volume 2: Ground Water - Draft)," should be consulted. This draft Superfund
guidance is a multivolume set that addresses questions regarding the success
of air, ground-water, and soil remediation efforts. Information about the
availability of this draft guidance, currently being developed, can be
obtained by calling the RCRA/Superfund Hotline, telephone (800) 424-9346 or
(202) 382-3000.
Those interested in evaluating individual uncontaminated wells or in an
intrawell comparison are referred to Section 7 of the guidance which describes
the use of Shewhart-CUSUM control charts and trend analysis. Municipal water
supply engineers, for example, who wish to monitor water quality parameters in
supply wells, may find this section useful.
Other sections of this guidance have wide applications in the field of
applied statistics, regardless of the intended use or purpose. Sections 4.2
and 4.3 provide information on checking distributional assumptions and
equality of variance, while Sections 8.1 and 8.2 cover limit of detection
problems and outliers. Helpful advice and references for many experiments
involving the use of statistics can be found in these sections.
Finally, it should be noted that this guidance is not intended to be the
final chapter on the statistical analysis of ground-water monitoring data, nor
should it be used as such. 40 CFR Part 264 Subpart F offers an alternative
[§264.97(h)(5)] to the methods suggested and described in this guidance
document. In fact, the guidance recommends a procedure (confidence intervals)
for comparing monitoring data to a fixed standard that is not mentioned in the
Subpart F regulations. This is neither contradictory nor inconsistent but
rather epitomizes the complexities of the subject matter and exemplifies the
need for flexibility due to the site-specific monitoring requirements of the
RCRA program.
-------
CONTENTS
Preface iii
Figures vi
Tables vii
Executive Summary E-1
1. Introduction 1-1
2. Regulatory Overview 2-1
2.1 Background 2-1
2.2 Overview of Methodology 2-3
2.3 General Performance Standards 2-3
2.4 Basic Statistical Methods and Sampling
Procedures 2-6
3. Choosing a Sampling Interval 3-1
3.1 Example Calculations 3-8
3.2 Flow Through Karst and "Pseudo-Karst" Terranes 3-11
4. Choosing a Statistical Method 4-1
4.1 Flowcharts—Overview and Use 4-1
4.2 Checking Distributional Assumptions 4-4
4.3 Checking Equality of Variance: Bartlett's Test 4-17
5. Background Well to Compliance Well Comparisons 5-1
5.1 Summary Flowchart for Background Well to
Compliance Well Comparisons 5-2
5.2 Analysis of Variance 5-5
5.3 Tolerance Intervals Based on the Normal
Distribution 5-20
5.4 Prediction Intervals 5-24
6. Comparisons with MCLs or ACLs 6-1
6.1 Summary Chart for Comparison with MCLs or ACLs 6-1
6.2 Statistical Procedures 6-1
7. Control Charts for Intra-Well Comparisons 7-1
7.1 Advantages of Plotting Data 7-1
7.2 Correcting for Seasonality 7-2
7.3 Combined Shewhart-CUSUM Control Charts for Each
Well and Constituent 7-5
7.4 Update of a Control Chart 7-10
7.5 Nondetects in a Control Chart 7-12
8. Miscellaneous Topics 8-1
8.1 Limit of Detection 8-1
8.2 Outliers 8-11
Appendices
A. General Statistical Considerations and Glossary of
Statistical Terms A-1
B. Statistical Tables B-1
C. General Bibliography C-1
D. Federal Register, 40 CFR, Part 264 D-1
-------
FIGURES
Number Page
3-1 Hydraulic conductivity of selected rocks 3-3
3-2 Range of values of hydraulic conductivity and permeability 3-4
3-3 Conversion factors for permeability and hydraulic
conductivity units 3-4
3-4 Total porosity and drainable porosity for typical
geologic materials 3-7
3-5 Potentiometric surface map for computation of hydraulic
gradient 3-9
4-1 Flowchart overview 4-3
4-2 Probability plot of raw chlordane concentrations 4-11
4-3 Probability plot of log-transformed chlordane concentrations 4-13
5-1 Background well to compliance well comparisons 5-3
5-2 Tolerance limits: alternate approach to background
well to compliance well comparisons 5-4
6-1 Comparisons with MCLs/ACLs 6-2
7-1 Plot of unadjusted and seasonally adjusted monthly
observations 7-6
7-2 Combined Shewhart-CUSUM chart 7-11
-------
TABLES
Number Page
2-1 Summary of Statistical Methods 2-7
3-1 Default Values for Effective Porosity (Ne) for Use in Time
of Travel (TOT) Analyses 3-5
3-2 Specific Yield Values for Selected Rock Types 3-6
3-3 Determining a Sampling Interval 3-11
4-1 Example Data for Coefficient-of-Variation Test 4-8
4-2 Example Data Computations for Probability Plotting 4-10
4-3 Cell Boundaries for the Chi-Squared Test 4-14
4-4 Example Data for Chi-Squared Test 4-15
4-5 Example Data for Bartlett's Test 4-19
5-1 One-Way Parametric ANOVA Table 5-8
5-2 Example Data for One-Way Parametric Analysis of Variance 5-11
5-3 Example Computations in One-Way Parametric ANOVA Table 5-12
5-4 Example Data for One-Way Nonparametric ANOVA--Benzene
Concentrations (ppm) 5-18
5-5 Example Data for Normal Tolerance Interval 5-23
5-6 Example Data for Prediction Interval—Chlordane Levels 5-27
6-1 Example Data for Normal Confidence Interval--Aldicarb
Concentrations in Compliance Wells (ppb) 6-4
6-2 Example Data for Log-Normal Confidence Interval--EDB
Concentrations in Compliance Wells (ppb) 6-6
6-3 Values of M and n+1-M and Confidence Coefficients for
Small Samples 6-9
6-4 Example Data for Nonparametric Confidence Interval—T-29
Concentrations (ppm) 6-10
6-5 Example Data for a Tolerance Interval Compared to an ACL 6-13
7-1 Example Computation for Deseasonalizing Data 7-4
7-2 Example Data for Combined Shewhart-CUSUM Chart—Carbon
Tetrachloride Concentration (µg/L) 7-9
8-1 Methods for Below Detection Limit Values 8-2
8-2 Example Data for a Test of Proportions 8-6
8-3 Example Data for Testing Cohen's Test 8-9
8-4 Example Data for Testing for an Outlier 8-13
-------
ACKNOWLEDGMENT
This document was developed by EPA's Office of Solid Waste under the
direction of Dr. Vernon Myers, Chief of the Ground-Water Section of the Waste
Management Division. The document was prepared by the joint efforts of
Dr. Vernon B. Myers, Mr. James R. Brown of the Waste Management Division,
Mr. James Craig of the Office of Policy Planning and Information, and
Mr. Barnes Johnson of the Office of Policy, Planning, and Evaluation. Tech-
nical support in the preparation of this document was provided by Midwest
Research Institute (MRI) under a subcontract to NUS Corporation, the prime
contractor with EPA's Office of Solid Waste. MRI staff who assisted with the
preparation of the document were Jairus D. Flora, Jr., Ph.D., Principal
Statistician, Ms. Karin M. Bauer, Senior Statistician, and Mr. Joseph S.
Bartling, Assistant Statistician.
-------
EXECUTIVE SUMMARY
The hazardous waste regulations under the Resource Conservation and
Recovery Act (RCRA) require owners and operators of hazardous waste facilities
to utilize design features and control measures that prevent the release of
hazardous waste into ground-water. Further, regulated units (i.e., all sur-
face impoundments, waste piles, land treatment units, and landfills that
receive hazardous waste after July 26, 1982) are also subject to the ground-
water monitoring and corrective action standards of 40 CFR Part 264, Sub-
part F. These regulations require that a statistical method and sampling pro-
cedure approved by EPA be used to determine whether there are releases from
regulated units into ground water.
This document provides guidance to RCRA facility permit applicants and
writers concerning the statistical analysis of ground-water monitoring data at
RCRA facilities. Section 1 is an introduction to the guidance; it describes
the purpose and intent of the document and emphasizes the need for site-
specific considerations in implementing the Subpart F regulations of 40 CFR
Part 264.
Section 2 provides the reader with an overview of the recently promul-
gated regulations concerning the statistical analysis of ground-water moni-
toring data (53 FR 39720, October 11, 1988). The requirements of the
regulation are reviewed, and the need to consider site-specific factors in
evaluating data at a hazardous waste facility is emphasized.
Section 3 discusses the important hydrogeologic parameters to consider
when choosing a sampling interval. The Darcy equation is used to determine
the horizontal component of the average linear velocity of ground water. This
parameter provides a good estimate of time of travel for most soluble con-
stituents in ground water and may be used to determine a sampling interval.
In karst, cavernous volcanics, and fractured geologic environments, alterna-
tive methods are needed to determine an appropriate sampling interval. Exam-
ple calculations are provided at the end of the section to further assist the
reader.
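As a rough illustration of the calculation summarized above, the sketch below
computes the horizontal average linear velocity from the Darcy equation,
v = K i / n_e, and converts it into a time of travel over a given distance.
The function names, the units, and the input values are assumptions chosen
for illustration only; site-specific values must come from the hydrogeologic
investigation.

```python
def average_linear_velocity(hydraulic_conductivity, hydraulic_gradient,
                            effective_porosity):
    """Horizontal average linear velocity v = K * i / n_e.

    The result has the same length/time units as the hydraulic conductivity.
    """
    if not 0 < effective_porosity <= 1:
        raise ValueError("effective porosity must be in (0, 1]")
    return hydraulic_conductivity * hydraulic_gradient / effective_porosity


def time_of_travel(distance, velocity):
    """Time for ground water to travel `distance` at `velocity`."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return distance / velocity


# Hypothetical inputs (not taken from the guidance):
k_ft_per_day = 10.0   # hydraulic conductivity, ft/day
gradient = 0.005      # hydraulic gradient, ft/ft
porosity = 0.25       # effective porosity, dimensionless

velocity = average_linear_velocity(k_ft_per_day, gradient, porosity)
days_to_travel_100_ft = time_of_travel(100.0, velocity)
print(f"velocity = {velocity:.3f} ft/day")
print(f"time of travel over 100 ft = {days_to_travel_100_ft:.0f} days")
```

A travel time of this kind can then be weighed against the well spacing when
a sampling interval is chosen. The sketch does not apply to karst, cavernous
volcanic, or fractured settings, where alternative methods are needed.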
Section 4 provides guidance on choosing an appropriate statistical
method. A flow chart to guide the reader through this section, as well as
procedures to test the distributional assumptions of data, are presented.
Finally, this section outlines procedures to test specifically for equality of
variance.
Section 5 covers statistical methods that may be used to evaluate ground-
water monitoring data when background wells have been sited hydraulically
upgradient from the regulated unit, and a second set of wells is sited
hydraulically downgradient from the regulated unit at the point of compli-
ance. The data from these compliance wells are compared to data from the
background wells to determine whether a release from a facility has
occurred. Parametric and nonparametric analysis of variance, tolerance inter-
vals, and prediction intervals are suggested methods for this type of compari-
son. Flow charts, procedures, and example calculations are given for each
testing method.
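To make the background-to-compliance comparison more concrete, the following
minimal sketch computes the F statistic of a one-way parametric analysis of
variance, treating each well as a group. The function name and the
concentrations are hypothetical and are not the guidance's example data. The
resulting F would still have to be compared with a tabulated critical value at
the chosen significance level.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and some replication")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical concentrations: one background well and two compliance wells.
wells = [
    [1.0, 2.0, 3.0],   # background (upgradient)
    [2.0, 3.0, 4.0],   # compliance well 1
    [5.0, 6.0, 7.0],   # compliance well 2
]
f_stat, df_b, df_w = one_way_anova_f(wells)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```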
Section 6 includes statistical procedures that are appropriate when
comparing ground-water constituent concentrations to fixed concentration
limits (e.g., alternate concentration limits or maximum concentration lim-
its). The methods applicable to this type of comparison are confidence inter-
vals and tolerance intervals. As in Section 5, flow charts, procedures, and
examples explain the calculations necessary for each testing method.
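The comparison with a fixed limit can be sketched as follows, assuming
normally distributed data: a lower confidence limit on the mean is computed
and compared with the limit. The function names, the data, and the limits are
hypothetical. The t value is treated as an input that the user looks up in a
standard t table for the chosen confidence level and n - 1 degrees of freedom.

```python
import math
import statistics


def lower_confidence_limit(values, t_value):
    """One-sided lower confidence limit on the mean: mean - t * s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    return mean - t_value * sd / math.sqrt(n)


def exceeds_limit(values, t_value, limit):
    """True when the lower confidence limit lies above the fixed limit."""
    return lower_confidence_limit(values, t_value) > limit


# Hypothetical compliance-well concentrations (ppb), not from the guidance.
concentrations = [8.0, 10.0, 12.0, 10.0]
t_99_3df = 4.541   # tabulated one-sided 99% t value for 3 degrees of freedom

lcl = lower_confidence_limit(concentrations, t_99_3df)
print(f"lower confidence limit = {lcl:.2f} ppb")
print("exceeds a hypothetical limit of 7.0 ppb:",
      exceeds_limit(concentrations, t_99_3df, 7.0))
print("exceeds a hypothetical limit of 5.0 ppb:",
      exceeds_limit(concentrations, t_99_3df, 5.0))
```

Only when the lower limit lies above the fixed standard does the comparison
point to an exceedance; a sample mean above the standard is not enough on its
own.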
Section 7 presents the case where the level of each constituent within a
single, uncontaminated well is being compared to its historic background con-
centrations. This is known as an intra-well comparison. In essence, the data
for each constituent in each well are plotted on a time scale and inspected
for obvious features such as trends or sudden changes in concentration
levels. The method suggested in this section is a combined Shewhart-CUSUM
control chart.
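A combined Shewhart-CUSUM scheme can be sketched in a few lines. Each new
observation is standardized against the well's own background mean and
standard deviation, a one-sided cumulative sum is updated, and the point is
flagged if either the standardized value or the cumulative sum crosses its
limit. The function name, the data, and the default parameter values (k = 1,
h = 5, and a Shewhart limit of 4.5, all in standard-deviation units) are
assumptions made for illustration; the actual parameters are proposed by the
owner or operator and approved in the permit.

```python
import statistics


def shewhart_cusum(background, new_values, k=1.0, h=5.0, scl=4.5):
    """Return a list of (z, cusum, out_of_control) for each new observation.

    z     : observation standardized by the background mean and std. deviation
    cusum : one-sided cumulative sum, S_i = max(0, z_i - k + S_{i-1})
    flag  : True if z exceeds the Shewhart limit or cusum exceeds h
    """
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("background data show no variation")

    results = []
    cusum = 0.0
    for x in new_values:
        z = (x - mean) / sd
        cusum = max(0.0, z - k + cusum)
        results.append((z, cusum, z > scl or cusum > h))
    return results


# Hypothetical background and monitoring data for one well and constituent.
background = [8.0, 8.0, 10.0, 12.0, 12.0]   # mean 10, standard deviation 2
monitoring = [10.0, 14.0, 16.0, 18.0]

chart = shewhart_cusum(background, monitoring)
for period, (z, s, flag) in enumerate(chart, start=1):
    print(f"period {period}: z = {z:.1f}, cusum = {s:.1f}, flagged = {flag}")
```

In this made-up series no single value crosses the Shewhart limit, but the
steady upward drift pushes the cumulative sum over its limit in the fourth
period. Gradual trends of this kind are what the CUSUM part of the chart is
meant to catch.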
Section 8 contains a variety of special topics that are relatively short
and self-contained. These topics include methods to deal with data that are
below the limit of analytical detection and methods to test for outliers or
extreme values in the data.
Finally, the guidance presents appendices that cover general statistical
considerations, a glossary of statistical terms, statistical tables, and a
listing of references. These appendices provide necessary and ancillary
information to aid the user in evaluating ground-water monitoring data.
-------
SECTION 1
INTRODUCTION
The U.S. Environmental Protection Agency (EPA) promulgated regulations
for detecting contamination of ground water at hazardous waste land disposal
facilities under the Resource Conservation and Recovery Act (RCRA) of 1976.
The statistical procedures specified for use to evaluate the presence of con-
tamination have been criticized and require improvement. Therefore, EPA has
revised those statistical procedures in 40 CFR Part 264, "Statistical Methods
for Evaluating Ground-Water Monitoring Data From Hazardous Waste Facilities."
In 40 CFR Part 264, EPA has recently amended the Subpart F regulations
with statistical methods and sampling procedures that are appropriate for
evaluating ground-water monitoring data under a variety of situations (53 FR
39720, October 11, 1988). The purpose of this document is to provide guidance
in determining which situation applies and consequently which statistical
procedure may be used. In addition to providing guidance on selection of an
appropriate statistical procedure, this document provides instructions on
carrying out the procedure and interpreting the results.
The regulations provide three levels of monitoring for a regulated
unit: detection monitoring; compliance monitoring; and corrective action.
The regulations define conditions for a regulated unit to be changed from one
level of monitoring to a more stringent level of monitoring (e.g., from detec-
tion monitoring to compliance monitoring). These conditions are that there is
statistically significant evidence of contamination [40 CFR §264.91(a)(1) and
(2)].
The regulations allow the benefit of the doubt to reside with the current
stage of monitoring. That is, a unit will remain in its current monitoring
stage unless there is convincing evidence to change it. This means that a
unit will not be changed from detection monitoring to compliance monitoring
(or from compliance monitoring to corrective action) unless there is statisti-
cally significant evidence of contamination (or contamination above the com-
pliance limit).
The main purpose of this document is to guide owners, operators, Regional
Administrators, State Directors, and other interested parties in the selec-
tion, use, and interpretation of appropriate statistical methods for monitor-
ing the ground water at each specific regulated unit. Topics to be covered
include sampling needed, sample sizes, selection of appropriate statistical
design, matching analysis of data to design, and interpretation of results.
Specific recommended methods are detailed and a general discussion of evalu-
ation of alternate methods is provided. Statistical concepts are discussed in
Appendix A. References for suggested procedures are provided as well as
references to alternate procedures and general statistics texts. Situations
calling for external consultation are mentioned as well as sources for
obtaining expert assistance when needed.
EPA would like to emphasize the need for site-specific considerations in
implementing the Subpart F regulations of 40 CFR Part 264 (especially as
amended, 53 FR 39720, October 11, 1988). It has been an ongoing strategy to
promulgate regulations that are specific enough to implement, yet flexible
enough to accommodate a wide variety of site-specific environmental factors.
This is usually achieved by specifying criteria that are appropriate for the
majority of monitoring situations, while at the same time allowing
alternatives that are also protective of human health and the environment.
This philosophy is maintained in the recently promulgated amendments entitled,
"Statistical Methods for Evaluating Ground-Water Monitoring Data From
Hazardous Waste Facilities" (53 FR 39720, October 11, 1988). The sections that
allow for the use of an alternate sampling procedure and statistical method
[§264.97(g)(2) and §264.97(h)(5), respectively] are as viable as those that
are explicitly referenced [§264.97(g)(1) and §264.97(h)(1-4)], provided they
meet the performance standards of §264.97(i). Due consideration to this
should be given when preparing and reviewing Part B permits and permit
applications.
-------
SECTION 2
REGULATORY OVERVIEW
In 1982, EPA promulgated ground-water monitoring and response standards
for permitted facilities in Subpart F of 40 CFR Part 264, for detecting
releases of hazardous wastes into ground water from storage, treatment, and
disposal units, at permitted facilities (47 FR 32274, July 26, 1982).
The Subpart F regulations required ground-water data to be examined by
Cochran's Approximation to the Behrens-Fisher Student's t-test (CABF) to
determine whether there was a significant exceedance of background levels, or
other allowable levels, of specified chemical parameters and hazardous waste
constituents. One concern was that this procedure could result in a high rate
of "false positives" (Type I error), thus requiring an owner or operator
unnecessarily to advance into a more comprehensive and expensive phase of
monitoring. More importantly, another concern was that the procedure could
result in a high rate of "false negatives" (Type II error), i.e., instances
where actual contamination would go undetected.
As a result of these concerns, EPA amended the CABF procedure with five
different statistical methods that are more appropriate for ground-water
monitoring (53 FR 39720, October 11, 1988). These amendments also outline
sampling procedures and performance standards that are designed to help
minimize the chance that a statistical method will indicate contamination when
it is not present (Type I error) or fail to detect contamination when it is
present (Type II error).
2.1 BACKGROUND
Subtitle C of the Resource Conservation and Recovery Act of 1976 (RCRA)
creates a comprehensive program for the safe management of hazardous waste.
Section 3004 of RCRA requires owners and operators of facilities that treat,
store, or dispose of hazardous waste to comply with standards established by
EPA that are "necessary to protect human health and the environment." Sec-
tion 3005 provides for implementation of these standards under permits issued
to owners and operators by EPA or authorized States. Section 3005 also pro-
vides that owners and operators of existing facilities that apply for a permit
and comply with applicable notice requirements may operate until a permit
determination is made. These facilities are commonly known as "interim
status" facilities. Owners and operators of interim status facilities also
must comply with standards set under Section 3004.
EPA promulgated ground-water monitoring and response standards for
permitted facilities in 1982 (47 FR 32274, July 26, 1982), codified in 40 CFR
Part 264, Subpart F. These standards establish programs for protecting ground
water from releases of hazardous wastes from treatment, storage, and disposal
units. Facility owners and operators were required to sample ground water at
specified intervals and to use a statistical procedure to determine whether or
not hazardous wastes or constituents from the facility are contaminating
ground water. As explained in more detail below, the Subpart F regulations
regarding statistical methods used in evaluating ground-water monitoring data
that EPA promulgated in 1982 have generated criticism.
The Part 264 regulations prior to the October 11, 1988 amendments pro-
vided that the Cochran's Approximation to the Behrens-Fisher Student's t-test
(CABF) or an alternate statistical procedure approved by EPA be used to deter-
mine whether there is a statistically significant exceedance of background
levels, or other allowable levels, of specified chemical parameters and haz-
ardous waste constituents. Although the regulations have always provided
latitude for the use of an alternate statistical procedure, concerns were
raised that the CABF statistical procedure in the regulations was not appro-
priate. It was pointed out that: (1) the replicate sampling method is not
appropriate for the CABF procedure, (2) the CABF procedure does not adequately
consider the number of comparisons that must be made, and (3) the CABF does
not control for seasonal variation. Specifically, the concerns were that the
CABF procedure could result in "false positives" (Type I error), thus requir-
ing an owner or operator unnecessarily to collect additional ground-water
samples, to further characterize ground-water quality, and to apply for a
permit modification, which is then subject to EPA review. In addition, there
was concern that CABF may result in "false negatives" (Type II error), i.e.,
instances where actual contamination goes undetected. This could occur
because the background data, which are often used as the basis of the
statistical comparisons, are highly variable due to temporal, spatial,
analytical, and sampling effects.
As a result of these concerns, on October 11, 1988 EPA amended both the
statistical methods and the sampling procedures of the regulations, by requir-
ing (if necessary) that owners or operators more accurately characterize the
hydrogeology and potential contaminants at the facility, and by including in
the regulations performance standards that all the statistical methods and
sampling procedures must meet. Statistical methods and sampling procedures
meeting these performance standards would have a low probability of indicating
contamination when it is not present, and of failing to detect contamination
that actually is present. The facility owner or operator would have to demon-
strate that a procedure is appropriate for the site-specific conditions at the
facility, and to ensure that it meets the performance standards outlined
below. This demonstration holds for any of the statistical methods and sam-
pling procedures outlined in this regulation as well as any alternate methods
or procedures proposed by facility owners and operators.
EPA recognizes that the selection of appropriate monitoring parameters is
also an essential part of a reliable statistical evaluation. The Agency
addressed this issue in a previous Federal Register notice (52 FR 25942,
July 9, 1987).
-------
2.2 OVERVIEW OF METHODOLOGY
EPA has elected to retain the idea of general performance requirements
that the regulated community must meet. This approach allows for flexibility
in designing statistical methods and sampling procedures to site-specific
considerations.
EPA has tried to bring a measure of certainty to these methods, while
accommodating the unique nature of many of the regulated units in question.
Consistent with this general strategy, the Agency is establishing several
options for the sampling procedures and statistical methods to be used in
detection monitoring and, where appropriate, in compliance monitoring.
The owner or operator shall submit, for each of the chemical parameters
and hazardous constituents listed in the facility permit, one or more of the
statistical methods and sampling procedures described in the regulations
promulgated on October 11, 1988. In deciding which statistical test is
appropriate, he or she will consider the theoretical properties of the test,
the data available, the site hydrogeology, and the fate and transport charac-
teristics of potential contaminants at the facility. The Regional Administra-
tor will review, and if appropriate, approve the proposed statistical methods
and sampling procedures when issuing the facility permit.
The Agency recognizes that there may be situations where any one statis-
tical test may not be appropriate. This is true of new facilities with little
or no ground-water monitoring data. If insufficient data prohibit the owner
or operator from specifying a statistical method of analysis, then contingency
plans containing several methods of data analysis and the conditions under
which the method can be used will be specified by the Regional Administrator
in the permit. In many cases, the parametric ANOVA can be performed after six
months of data have been collected. This will eliminate the need for a permit
modification in the event that data collected during future sampling and
analysis events indicate the need to change to a more appropriate statistical
method of analysis. In the event that a permit modification is necessary to
change a sampling procedure or a statistical method, the reader is referred to
53 FR 37912, September 28, 1988. These are considered Class 1 changes requir-
ing Director approval and should follow minor modification procedures.
2.3 GENERAL PERFORMANCE STANDARDS
EPA's basic concern in establishing these performance standards for sta-
tistical methods is to achieve a proper balance between the risk that the pro-
cedures will falsely indicate that a regulated unit is causing background
values or concentration limits to be exceeded (false positives) and the risk
that the procedures will fail to indicate that background values or concen-
tration limits are being exceeded (false negatives). EPA's approach is
designed to address that concern directly. Thus any statistical method or
sampling procedure, whether specified here or as an alternative to those
specified, should meet the following performance standards contained in
40 CFR §264.97(i):
1. The statistical method used to evaluate ground-water monitoring data
shall be appropriate for the distribution of chemical parameters or
hazardous constituents. If the distribution of the chemical
parameters or hazardous constituents is shown by the owner or
operator to be inappropriate for a normal theory test, then the data
should be transformed or a distribution-free theory test should be
used. If the distributions for the constituents differ, more than
one statistical method may be needed.
2. If an individual well comparison procedure is used to compare an
individual compliance well constituent concentration with background
constituent concentrations or a ground-water protection standard,
the test shall be done at a Type I error level of no less than 0.01
for each testing period. If a multiple comparisons procedure is
used, the Type I experimentwise error rate shall be no less than
0.05 for each testing period; however, the Type I error of no less
than 0.01 for individual well comparisons must be maintained. This
performance standard does not apply to control charts, tolerance
intervals, or prediction intervals.
3. If a control chart approach is used to evaluate ground-water moni-
toring data, the specific type of control chart and its associated
parameters shall be proposed by the owner or operator and approved
by the Regional Administrator if he or she finds it to be protective
of human health and the environment.
4. If a tolerance interval or a prediction interval is used to evaluate
ground-water monitoring data, then the levels of confidence shall be
proposed; in addition, for tolerance intervals, the proportion of
the population that the interval must contain (with the proposed
confidence) shall be proposed by the owner or operator and approved
by the Regional Administrator if he or she finds these parameters to
be protective of human health and the environment. These parameters
will be determined after considering the number of samples in the
background data base, the distribution of the data, and the range of
the concentration values for each constituent of concern.
5. The statistical method will include procedures for handling data
below the limit of detection with one or more procedures that are
protective of human health and the environment. Any practical quan-
titation limit (PQL) approved by the Regional Administrator under
§264.97(h) that is used in the statistical method shall be the low-
est concentration level that can be reliably achieved within speci-
fied limits of precision and accuracy during routine laboratory
operating conditions available to the facility.
6. If necessary, the statistical method shall include procedures to
control or correct for seasonal and spatial variability as well as
temporal correlation in the data.
In referring to "statistical methods," EPA means to emphasize that the
concept of "statistical significance" must be reflected in several aspects of
the monitoring program. This involves not only the choice of a level of
significance, but also the choice of a statistical test, the sampling require-
ments, the number of samples, and the frequency of sampling. Since all of
these parameters interact to determine the ability of the procedure to detect
contamination, the statistical methods, like a comprehensive ground-water
monitoring program, must be evaluated in their entirety, not by individual
components. Thus a systems approach to ground-water monitoring is endorsed.
The second performance standard requires further comment. For individual
well comparisons in which an individual compliance well is compared to back-
ground, the Type I error level shall be no less than 1% (0.01) for each test-
ing period. In other words, the probability of the test resulting in a false
positive is no less than 1 in 100. EPA believes that this significance level
is sufficient in limiting the false positive rate while at the same time con-
trolling the false negative (missed detection) rate.
Owners and operators of facilities that have an extensive network of
ground-water monitoring wells may find it more practical to use a multiple
well comparisons procedure. Multiple comparisons procedures control the
experimentwise error rate for comparisons involving multiple upgradient and
downgradient wells. If this method is used, the Type I experimentwise error
rate for each constituent shall be no less than 5% (0.05) for each testing
period.
In using a multiple well comparisons procedure, if the owner or operator
chooses to use a t-statistic rather than an F-statistic, the individual well
Type I error level must be maintained at no less than 1% (0.01). This
provision should be considered if a facility owner or operator wishes to use a
procedure that distributes the risk of a false positive evenly throughout all
monitoring wells (e.g., Bonferroni t-test).
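The interaction between the individual-well and experimentwise Type I error levels can be illustrated with a short calculation. The following is a minimal sketch, assuming the comparisons are statistically independent; the function names and the use of 0.01 as a floor on the per-comparison level are illustrative assumptions, not part of the regulation.

```python
def experimentwise_error(alpha_per_test: float, n_comparisons: int) -> float:
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha_per_test) ** n_comparisons


def bonferroni_alpha(alpha_experimentwise: float, n_comparisons: int,
                     floor: float = 0.01) -> float:
    """Split the experimentwise level evenly across comparisons,
    but never go below the stated per-comparison floor (assumed 0.01)."""
    return max(alpha_experimentwise / n_comparisons, floor)


if __name__ == "__main__":
    # Hypothetical numbers of compliance wells compared to background.
    for k in (1, 2, 5, 10, 20):
        a = bonferroni_alpha(0.05, k)
        print(f"{k:2d} comparisons: per-test alpha = {a:.4f}, "
              f"experimentwise = {experimentwise_error(a, k):.4f}")
```

Under the independence assumption, 20 comparisons each held at 0.01 give an experimentwise rate of roughly 0.18, which shows how the per-comparison floor can carry the experimentwise rate above 0.05 at facilities with many wells.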
Setting these levels of significance at 1% and 5%, respectively, raises
an important question in how the false positive rate will be controlled at
facilities with a large number of ground-water monitoring wells and monitoring
constituents. The Agency set these levels of significance on the basis of a
single testing period and not on the entire operating life of the facility.
Further, large facilities can reduce the false positive rate by implementing a
unit-specific monitoring approach. Data from uncontaminated upgradient wells
can be pooled and treated as one group. This will not only reduce the number
of comparisons in a multiple well comparisons procedure but will also take
into account spatial heterogeneities that may affect background ground-water
quality. If the overall F-test is significant, then testing of the contrasts
between the mean of each compliance well concentration and the mean background
concentration must be performed for each constituent. This will identify the
monitoring wells that are out of compliance. The Type I error level for the
individual comparisons shall be no less than 0.01. Nonetheless, it is evident
that facilities with an extensive number of ground-water monitoring wells
which are monitored for many constituents may still generate a large number of
comparisons during each testing period.
In these particular situations, a determination of whether a release from
a facility has occurred may require the Regional Administrator to evaluate the
site hydrogeology, geochemistry, climatic factors, and other environmental
parameters to determine if a statistically significant result is indicative of
an actual release from the facility. In making this determination, the
Regional Administrator may note the relative magnitude of the concentration of
the constituent(s). If the exceedance is based on an observed compliance well
value that is the same relative magnitude as the PQL (practical quantitation
limit) or the background concentration level, then a false positive may have
occurred, and further sampling and testing may be appropriate. If, however,
the background concentration level or an action level is substantially
exceeded, then the exceedance is more likely to be indicative of a release
from the facility.
2.4 BASIC STATISTICAL METHODS AND SAMPLING PROCEDURES
The October 11, 1988 rule specifies five types of statistical methods to
detect contamination in ground water. EPA believes that at least one of these
types of procedures will be appropriate for virtually all facilities. To
address situations where these methods may not be appropriate, EPA has
included a provision for the owner or operator to select an alternate method
which is subject to approval by the Regional Administrator.
2.4.1 The Five Statistical Methods Outlined in the October 11, 1988 Final
Rule
1. A parametric analysis of variance (ANOVA) followed by multiple com-
parison procedures to identify specific sources of difference. The
procedures will include estimation and testing of the contrasts
between the mean of each compliance well and the background mean for
each constituent.
2. An analysis of variance (ANOVA) based on ranks followed by multiple
comparison procedures to identify specific sources of difference.
The procedure will include estimation and testing of the contrasts
between the median of each compliance well and the median background
levels for each constituent.
3. A procedure in which a tolerance interval or a prediction interval
for each constituent is established from the background data, and
the level of each constituent in each compliance well is compared to
its upper tolerance or prediction limit.
4. A control chart approach which will give control limits for each
constituent. If any compliance well has a value or a sequence of
values that lie outside the control limits for that constituent, it
may constitute statistically significant evidence of contamination.
5. Another statistical method submitted by the owner or operator and
approved by the Regional Administrator.
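As an illustration of the first method, the overall F-statistic of a one-way parametric ANOVA can be computed directly from the well data. This is a minimal sketch only: the function name and the concentration values are hypothetical, and the comparison of F against a tabulated critical value, as well as the follow-up contrasts, are left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-well samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of wells (groups)
    n = len(all_values)      # total number of observations

    # Between-well and within-well sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical data: four samples each from one background well
    # and two compliance wells.
    background = [1.0, 2.0, 3.0, 4.0]
    well_a = [2.0, 3.0, 4.0, 5.0]
    well_b = [5.0, 6.0, 7.0, 8.0]
    f_stat, df1, df2 = one_way_anova_f([background, well_a, well_b])
    print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```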
A summary of these statistical methods and their applicability is pre-
sented in Table 2-1. The table lists types of comparisons and the recommended
procedure and refers the reader to the appropriate sections where a discussion
and example can be found.
2-6
-------
TABLE 2-1. SUMMARY OF STATISTICAL METHODS

                                                                 SECTION OF
                    TYPE OF              RECOMMENDED             GUIDANCE
COMPOUND            COMPARISON           METHOD                  DOCUMENT

ANY COMPOUND        BACKGROUND VS        ANOVA                   5.2
IN BACKGROUND       COMPLIANCE WELL      TOLERANCE LIMITS        5.3
                                         PREDICTION INTERVALS    5.4
                    INTRA-WELL           CONTROL CHARTS          7

ACL/MCL SPECIFIC    FIXED STANDARD       CONFIDENCE INTERVALS    6.2.1
                                         TOLERANCE LIMITS        6.2.2

SYNTHETIC           MANY NONDETECTS      SEE BELOW DETECTION     8.1
                    IN DATA SET          LIMIT TABLE 8-1
2-7
-------
EPA is specifying multiple statistical methods and sampling procedures
and has allowed for alternatives because no one method or procedure is
appropriate for all circumstances. EPA believes that the suggested methods and
procedures are appropriate for the site-specific design and analysis of data
from ground-water monitoring systems and that they can account for more of the
site-specific factors than Cochran's Approximation to the Behrens-Fisher
Student's t-test (CABF) and the accompanying sampling procedures in the past
regulations. The statistical methods specified here address the multiple
comparison problems and provide for documenting and accounting for sources of
natural variation. EPA believes that the specified statistical methods and
procedures consider and control for natural temporal and spatial variation.
2.4.2 Site-Specific Considerations for Sampling
The decision on the number of wells needed in a monitoring system will be
made on a site-specific basis by the Regional Administrator and will consider
the statistical method being used, the site hydrogeology, the fate and
transport characteristics of potential contaminants, and the sampling procedure.
The number of wells must be sufficient to ensure a high probability of
detecting contamination when it is present. To determine which sampling
procedure should be used, the owner or operator shall consider existing data
and site characteristics, including the possibility of trends and seasonality.
These sampling procedures are:
1. Obtain a sequence of [at least four samples, taken at an interval that]
ensures, to the [greatest extent technically feasible, that an independent]
sample is obtained, [by reference to the uppermost aquifer's]
effective porosity, [hydraulic conductivity, and hydraulic gradient,]
and the fate and transport [characteristics of potential contaminants].
The sampling [...] the Regional Administrator [...].
2. [...]
These sampling procedures will allow the use of statistical methods that will
accurately detect contamination. These sampling procedures may be used to
replace the sampling method present in the former Subpart F regulations.
Rather than taking a single ground-water sample and dividing it into four
replicate samples, a sequence of at least four samples taken at intervals far
enough apart in time (daily, weekly, or monthly, depending on rates of
ground-water flow and contaminant fate and transport characteristics) will
help ensure the sampling of a discrete portion (i.e., an independent sample)
of ground water. In hydrogeologic environments where the ground-water velocity
prohibits one from obtaining four independent samples on a semiannual basis,
an alternate sampling procedure approved by the Regional Administrator may be
utilized [40 CFR §264.97(g)(1) and (2)].
The Regional Administrator shall approve an appropriate sampling procedure
and interval submitted by the owner or operator after considering the
effective porosity, hydraulic conductivity, and hydraulic gradient in the
uppermost aquifer under the waste management area, and the fate and transport
characteristics of potential contaminants. Most of this information is
already required to be submitted in the facility's Part B permit application
under §270.14(c) and may be used by the owner or operator to make this deter-
mination. Further, the number and kinds of samples collected to establish
background concentration levels should be appropriate to the form of statisti-
cal test employed, following generally accepted statistical principles
[40 CFR §264.97(g)]. For example, the use of control charts presumes a well-
defined background of at least eight samples per well. By contrast, ANOVA
alternatives might require only four samples per well.
It seems likely that most facilities will be sampling monthly over four
consecutive months, twice a year. In order to maintain a complete annual
record of ground-water data, the facility owner or operator may find it
desirable to obtain a sample each month of the year. This will help identify
seasonal trends in the data and permit evaluation of the effects of auto-
correlation and seasonal variation if present in the samples.
The concentrations of a constituent determined in these samples are
intended to be used in one-point-in-time comparisons between background and
compliance wells. This approach will help reduce the components of seasonal
variation by providing for simultaneous comparisons between background and
compliance well information.
The flexibility for establishing sampling intervals was chosen to allow
for the unique nature of the hydrogeologic systems beneath hazardous waste
sites. This sampling scheme will give proper consideration to the temporal
variation of and autocorrelation among the ground-water constituents. The
specified procedure requires sampling data from background wells, at the
compliance point, and according to a specific test protocol. The owner or
operator should use a background value determined from data collected under
this scenario if a test approved by the Regional Administrator requires it or
if a concentration limit in compliance monitoring is to be based upon
background data.
EPA recognizes that there may be situations where the owner or operator
can devise alternate statistical methods and sampling procedures that are more
appropriate to the facility and that will provide reliable results. There-
fore, today's regulations allow the Regional Administrator to approve such
procedures if he or she finds that the procedures balance the risk of false
positives and false negatives in a manner comparable to that provided by the
above specified tests and that they meet specified performance standards
[40 CFR §264.97(g)]. In examining the comparability of the procedure to
provide a reasonable balance between the risk of false positives and false
negatives, the owner or operator will specify in the alternate plan such
parameters as sampling frequency and sample size.
2.4.3 The "Reasonable Confidence" Requirement
The methods indicate that the procedure must provide reasonable confi-
dence that the migration of hazardous constituents from a regulated unit into
and through the aquifer will be detected. (The reference to hazardous
constituents does not mean that this option applies only to compliance
monitoring; the procedure also applies to monitoring parameters and
constituents in the detection monitoring program since they are surrogates
indicating the presence of hazardous constituents.) The protocols for the
specific tests, however, will be used as general benchmarks to define
"reasonable confidence" in the proposed procedure. If the owner or operator
shows that his or her suggested test is comparable in its results to one of
the specified tests, then it is likely to be acceptable under the "reasonable
confidence" test. There may be situations, however, where it will be
difficult to directly compare the performance of an alternate test to the
protocols for the specified tests. In such cases the alternate test will have
to be evaluated on its own merits.
2.4.4 Implementation
Owners and operators currently operating under a RCRA permit and
employing the CABF procedure may change this procedure to a more appropriate
procedure at the time of State or Regional permit review and update. Of
course, these owners and operators may also apply for a permit modification
under § 270.41(a)(3). This change is considered a Class 1 permit
modification. Class 1 permit modifications are technical in nature and
generally of limited interest to the public. Class 1 modifications may be made
with prior approval from the Director. The reader is referred to 53 FR 37912,
September 28, 1988 for more details about the permit modification process.
Under appropriate circumstances, the owner or operator may wish to
continue using the CABF procedure. This would involve a facility that has
comparably few monitoring wells (e.g., fewer than five) and monitors for only
a limited number of chemical parameters and hazardous constituents (e.g.,
fewer than four). In this case, fewer than 20 comparisons would be made each
testing period, and performing the CABF procedure at the 0.05 level of
significance may result in no more than one false positive each testing period.
The owner or operator should consider a similar evaluation when deciding the
adequacy of the CABF procedure for his or her facility. Likewise, the owner
or operator should also continually update the background concentrations in
upgradient monitoring wells and simultaneously compare aggregate upgradient
well data (background wells) to downgradient well data (compliance wells).
This practice will help reduce the component of temporal variability
associated with the CABF procedure. Further, efforts should be made to obtain
independent samples from the monitoring wells. Section 3 of the guidance
addresses how one might accomplish this task. If situations permit, the
replicate sampling procedure should be avoided. Replicate samples provide
information about analytical variability and accuracy. The goal of all RCRA
ground-water sampling programs should be to provide data about the
hydrogeochemical variability in the aquifers below the hazardous waste facility.
Obtaining independent samples when possible will help reduce the effects of
autocorrelation.
In all cases any statistical method or sampling procedure must be
approved by the Regional Administrator or State Director. Changing from one
statistical method or sampling procedure to another may be done at the time of
Regional or State permit review and update, or at any time a Class 1 permit
modification is approved (see 53 FR 37912, September 28, 1988).
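The evaluation suggested above for a facility continuing with the CABF procedure (number of comparisons per testing period and the resulting expected number of false positives) can be sketched as follows. The function name and the well and constituent counts are hypothetical; the calculation simply multiplies the number of comparisons by the significance level.

```python
def expected_false_positives(n_wells: int, n_constituents: int,
                             alpha: float = 0.05):
    """Return (number of comparisons, expected false positives per period),
    assuming one comparison per well per constituent at level alpha."""
    comparisons = n_wells * n_constituents
    return comparisons, comparisons * alpha


if __name__ == "__main__":
    # Hypothetical small facility: 4 compliance wells, 3 constituents.
    n, expected = expected_false_positives(4, 3)
    print(f"{n} comparisons, about {expected:.2f} false positives expected")

    # Hypothetical larger facility: 10 wells, 8 constituents.
    n, expected = expected_false_positives(10, 8)
    print(f"{n} comparisons, about {expected:.2f} false positives expected")
```

With fewer than 20 comparisons at the 0.05 level the expected count stays below one per testing period; at 80 comparisons it rises to about four.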
2-10
-------
SECTION 3
CHOOSING A SAMPLING INTERVAL
This section discusses the important hydrogeologic parameters to consider
when choosing a sampling interval. The Darcy equation is used to determine
the horizontal component of the average linear velocity of ground water for
confined, semiconfined, and unconfined aquifers. This value provides a good
estimate of time of travel for most soluble constituents in ground water, and
can be used to determine a sampling interval. Example calculations are pro-
vided at the end of the section to further assist the reader. Alternative
methods must be employed to determine a sampling interval in hydrogeologic
environments where Darcy's law is invalid. Karst, cavernous basalt, fractured
rocks, and other "pseudo karst" terranes usually require specialized monitor-
ing approaches.
Section 264.97(g) of 40 CFR Part 264 Subpart F provides the owner or
operator of a RCRA facility with a flexible sampling schedule that will allow
him or her to choose a sampling procedure that will reflect site-specific con-
cerns. This section specifies that the owner or operator shall, on a semi-
annual basis, obtain a sequence of at least four samples from each well, based
on an interval that is determined after evaluating the uppermost aquifer's
effective porosity, hydraulic conductivity, and hydraulic gradient, and the
fate and transport characteristics of potential contaminants. The intent of
this provision is to set a sampling frequency that allows sufficient time to
pass between sampling events to ensure, to the greatest extent technically
feasible, that an independent ground-water sample is taken from each well.
For further information on ground-water sampling, refer to the EPA "Practical
Guide for Ground-Water Sampling," Barcelona et al., 1985.
The sampling frequency of the four semiannual sampling events required in
Part 264 Subpart F can be based on estimates using the average linear velocity
of ground water. Two forms of the Darcy equation stated below relate ground-
water velocity (V) to effective porosity (Ne), hydraulic gradient (i), and
hydraulic conductivity (K):
    Vh = (Kh*i)/Ne   and   Vv = (Kv*i)/Ne
where Vh and Vv are the horizontal and vertical components of the average
linear velocity of ground water, respectively; Kh and Kv are the horizontal
and vertical components of hydraulic conductivity; i is the head gradient; and
Ne is the effective porosity. In applying these equations to ground-water
monitoring, the horizontal component of the average linear velocity (Vh) can
be used to determine an appropriate sampling interval. Usually, field
investigations will yield bulk values for hydraulic conductivity. In most
cases, the bulk hydraulic conductivity determined by a pump test, tracer test,
or a slug test will be sufficient for these calculations. The vertical
component of the average linear velocity of ground water (Vv) may be considered
in estimating flow velocities in areas with significant components of vertical
velocity such as recharge and discharge zones.
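The two forms of the Darcy equation differ only in which component of hydraulic conductivity is supplied. The following is a minimal sketch, assuming effective porosity is entered as a fraction rather than a percentage; the function name and the input values are hypothetical.

```python
def average_linear_velocity(k: float, gradient: float, ne: float) -> float:
    """Average linear velocity V = (K * i) / Ne.

    k        -- hydraulic conductivity component (e.g., ft/day)
    gradient -- head gradient i (dimensionless, e.g., ft/ft)
    ne       -- effective porosity as a fraction (0.20 for 20%)
    """
    if not 0.0 < ne <= 1.0:
        raise ValueError("effective porosity must be a fraction in (0, 1]")
    return k * gradient / ne


if __name__ == "__main__":
    # Hypothetical aquifer: Kh = 10 ft/day, Kv = 1 ft/day, i = 0.005, Ne = 20%.
    vh = average_linear_velocity(10.0, 0.005, 0.20)
    vv = average_linear_velocity(1.0, 0.005, 0.20)
    print(f"Vh = {vh:.3f} ft/day, Vv = {vv:.4f} ft/day")
```

The result carries the length and time units of the conductivity that was supplied, so units should be converted before the velocity is compared with a well diameter.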
To apply the Darcy equation to ground-water monitoring, one needs to
determine the parameters K, i, and Ne. The hydraulic conductivity, K, is the
volume of water at the existing kinematic viscosity that will move in unit
time under a unit hydraulic gradient through a unit area measured at right
angles to the direction of flow. The reference to "existing kinematic
viscosity" relates to the fact that hydraulic conductivity is not only determined
by the media (aquifer), but also by fluid properties (ground water or
potential contaminants). Thus, it is possible to have several hydraulic
conductivity values for many different chemical substances that are present in the
same aquifer. In either case it is advisable to use the greatest value for
velocity that is calculated using the Darcy equation to determine sampling
intervals. This will provide for the earliest detection of a leak from a
hazardous waste facility and expeditious remedial action procedures. A range
of hydraulic conductivities (the transmitted fluid is water) for various
aquifer materials is given in Figures 3-1 and 3-2. The conductivities are given
in several units. Figure 3-3 lists conversion factors to change between
various permeability and hydraulic conductivity units.
The hydraulic gradient, i, is the change in hydraulic head per unit of
distance in a given direction. It can be determined by dividing the
difference in head between two points on a potentiometric surface map by the
orthogonal distance between those two points (see example calculation). Water
level measurements are normally used to determine the natural hydraulic
gradient at a facility. However, the effects of mounding in the event of a leak
from a waste disposal facility may produce a steeper local hydraulic gradient
in the vicinity of the monitoring well. These local changes in hydraulic
gradient should be accounted for in the velocity calculations.
The effective porosity, Ne, is the ratio, usually expressed as a
percentage, of the total volume of voids available for fluid transmission to the
total volume of the porous medium dewatered. It can be estimated during a
pump test by dividing the volume of water removed from an aquifer by the total
volume of aquifer dewatered (see example calculation). Table 3-1 presents
approximate effective porosity values for a variety of aquifer materials. In
cases where the effective porosity is unknown, specific yield may be substi-
tuted into the equation. Specific yields of selected rock units are given in
Table 3-2. In the absence of measured values, drainable porosity is often
used to approximate effective porosity. Figure 3-4 illustrates representative
values of drainable porosity and total porosity as a function of aquifer
particle size. If available, field measurements of effective porosity are
preferred.
3-2
-------
[Figure: chart of hydraulic conductivity ranges for igneous and metamorphic
rocks (unfractured, fractured), basalt (unfractured, fractured, lava flow),
sandstone (fractured, semiconsolidated), shale (unfractured, fractured),
carbonate rocks (fractured, cavernous), clay, silt and loess, silty sand,
clean sand (fine, coarse), glacial till, and gravel. Horizontal axes: m/day
(10^-8 to 10^4), ft/day (10^-7 to 10^5), and gal/day-ft2 (10^-7 to 10^5).]
Source: Heath, R. C. 1983. Basic Ground-Water Hydrology. U.S. Geological
Survey Water Supply Paper, 2220, 84 pp.
Figure 3-1. Hydraulic conductivity of selected rocks.
3-3
-------
[Figure: chart of ranges of permeability, k (darcy, cm2), and hydraulic
conductivity, K (cm/s, m/s, gal/day/ft2), for rocks and unconsolidated
deposits.]
Source: Freeze, R. A., and J. A. Cherry. 1979. Ground Water. Prentice-
Hall, Inc., Englewood Cliffs, New Jersey, p. 29.
Figure 3-2. Range of values of hydraulic conductivity
and permeability.
                       Permeability, k*                       Hydraulic conductivity, K
              cm2            ft2            darcy          m/s            ft/s           gal/day/ft2
cm2           1              1.08 x 10^-3   1.01 x 10^8    9.80 x 10^2    3.22 x 10^3    1.85 x 10^9
ft2           9.29 x 10^2    1              9.42 x 10^10   9.11 x 10^5    2.99 x 10^6    1.71 x 10^12
darcy         9.87 x 10^-9   1.06 x 10^-11  1              9.66 x 10^-6   3.17 x 10^-5   1.82 x 10^1
m/s           1.02 x 10^-3   1.10 x 10^-6   1.04 x 10^5    1              3.28           2.12 x 10^6
ft/s          3.11 x 10^-4   3.35 x 10^-7   3.15 x 10^4    3.05 x 10^-1   1              5.74 x 10^5
gal/day/ft2   5.42 x 10^-10  5.83 x 10^-13  5.49 x 10^-2   4.72 x 10^-7   1.74 x 10^-6   1
*To obtain k in ft2, multiply k in cm2 by 1.08 x 10^-3.
Source: Freeze, R. A., and J. A. Cherry. 1979. Ground Water. Prentice-
Hall, Inc., Englewood Cliffs, New Jersey, p. 29.
Figure 3-3. Conversion factors for permeability and
hydraulic conductivity units.
3-4
-------
TABLE 3-1. DEFAULT VALUES FOR EFFECTIVE POROSITY (Ne) FOR USE
IN TIME OF TRAVEL (TOT) ANALYSES

                                                Effective porosity
Soil textural classes                           of saturation (a)

Unified soil classification system
  GW, GP, GM, GC, SW, SP, SM, SC                0.20 (20%)
  ML, MH                                        0.15 (15%)
  CL, OL, CH, OH, PT                            0.01 (1%) (b)
USDA soil textural classes
  Clays, silty clays, sandy clays
  Silts, silt loams, silty clay loams           0.10 (10%)
  All others                                    0.20 (20%)
Rock units (all)
  Porous media (nonfractured rocks              0.15 (15%)
  such as sandstone and some carbonates)
  Fractured rocks (most carbonates,             0.0001 (0.01%)
  shales, granites, etc.)

Source: Barari, A., and L. S. Hedges. 1985. Movement of Water
in Glacial Till. Proceedings of the 17th International Congress of the
International Association of Hydrogeologists, pp. 129-134.
(a) These values are estimates and there may be differences between
similar units. For example, recent studies indicate that
weathered and unweathered glacial till may have markedly
different effective porosities (Barari and Hedges, 1985; Bradbury
et al., 1985).
(b) Assumes de minimis secondary porosity. If fractures or soil
structure are present, effective porosity should be 0.001
(0.1%).
3-5
-------
TABLE 3-2. SPECIFIC YIELD VALUES FOR
SELECTED ROCK TYPES
Rock type Specific yield (%)
Clay 2
Sand 22
Gravel 19
Limestone 18
Sandstone (semiconsolidated) 6
Granite 0.09
Basalt (young) 8
Source: Heath, R. C. 1983. Basic Ground-Water
Hydrology. U.S. Geological Survey, Water Supply
Paper 2220, 84 pp.
3-6
-------
[Figure: total porosity and specific yield (drainable porosity), in percent
(0 to 50), plotted against maximum 10% grain size in millimeters (1/16 to
256). The maximum 10% grain size is the grain size at which the cumulative
total, beginning with the coarsest material, reaches 10% of the total sample.]
Source: Todd, D. K. 1980. Ground Water Hydrology. John
Wiley and Sons, New York. 534 pp.
Figure 3-4. Total porosity and drainable porosity for
typical geologic materials.
3-7
-------
Once the values for K, i, and Ne are determined, the horizontal component
of the average linear velocity of ground water can be calculated. Using the
Darcy equation, we can determine the time required for ground water to pass
through the complete monitoring well diameter by dividing the monitoring well
diameter by the horizontal component of the average linear velocity of ground
water. (If considerable exchange of water occurs during well purging, the
diameter of the filter pack may be used rather than the monitoring well diam-
eter.) This value will represent the minimum time interval required between
sampling events that will yield an independent ground-water sample. (Three-
dimensional mixing of ground water in the vicinity of the monitoring well will
occur when the well is purged before sampling, which is one reason why this
method only provides an estimation of travel time).
In determining these sampling intervals, one should note that many chemi-
cal compounds will not travel at the same velocity as ground water. Chemical
characteristics such as adsorptive potential, specific gravity, and molecular
size will influence the way chemicals travel in the subsurface. Large mole-
cules, for example, will tend to travel slower than the average linear veloc-
ity of ground water because of matrix interactions. Compounds that exhibit a
strong adsorptive potential will undergo a similar fate that will dramatically
change time of travel predictions using the Darcy equation. In some cases
chemical interaction with the matrix material will alter the matrix structure
and its associated hydraulic conductivity that may result in an increase in
contaminant mobility. This effect has been observed with certain organic
solvents in clay units (see Brown and Andersen, 1981). Contaminant fate and
transport models may be useful in determining the influence of these effects
on movement in the subsurface. A variety of these models are available on the
commercial market for private use.
3.1 EXAMPLE CALCULATIONS
EXAMPLE CALCULATION NO. 1: DETERMINING THE EFFECTIVE POROSITY (Ne)
The effective porosity, Ne, expressed in %, can be determined during a
pump test using the following method:
Ne = 100% x volume of water removed/volume of aquifer dewatered
Based on a pumping rate of the pump of 50 gal/min and a pumping
duration of 30 min, compute the volume of water removed as:
50 gal/min x 30 min = 1,500 gal
To calculate the volume of aquifer dewatered, use the formula:
V = (1/3)(pi)(r^2)(h)
where r is the radius (ft) of area affected by pumping and h (ft) is the drop
in the water level. If, for example, h = 3 ft and r = 18 ft, then:
V = (1/3)(3.1416)(18 ft)^2(3 ft) = 1,018 ft^3
Next, converting ft^3 of water to gallons of water,
V = (1,018 ft^3)(7.48 gal/ft^3) = 7,615 gal
Substituting the two volumes in the equation for the effective
porosity, obtain
Ne = 100% x 1,500/7,615 = 19.7%
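The arithmetic of Example Calculation No. 1 can be checked with a few lines of code. This is a sketch only: the function name is hypothetical, the dewatered zone is taken to be a cone as in the formula above, and 7.48 gal/ft^3 is the conversion factor used in the example.

```python
import math

GAL_PER_FT3 = 7.48  # conversion factor used in the example


def effective_porosity_percent(rate_gpm: float, minutes: float,
                               radius_ft: float, drawdown_ft: float) -> float:
    """Estimate effective porosity (%) from a pump test, treating the
    dewatered zone as a cone of the given radius and drawdown."""
    water_removed_gal = rate_gpm * minutes
    cone_volume_ft3 = (1.0 / 3.0) * math.pi * radius_ft ** 2 * drawdown_ft
    dewatered_gal = cone_volume_ft3 * GAL_PER_FT3
    return 100.0 * water_removed_gal / dewatered_gal


if __name__ == "__main__":
    # Values from the example: 50 gal/min for 30 min, r = 18 ft, h = 3 ft.
    ne = effective_porosity_percent(50.0, 30.0, 18.0, 3.0)
    print(f"Ne = {ne:.1f}%")
```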
EXAMPLE CALCULATION NO. 2: DETERMINING THE HYDRAULIC GRADIENT (i)
The hydraulic gradient, i, can be determined from a potentiometric
surface map (Figure 3-5 below) as i = Δh/l, where Δh is the difference in
hydraulic head measured at piezometers Pz1 and Pz2, and l is the orthogonal
distance between the two piezometers.
Using the values given in Figure 3-5, obtain
i = Δh/l = (29.2 ft - 29.1 ft)/100 ft = 0.001 ft/ft
[Figure: potentiometric surface map showing the 29.2 ft and 29.1 ft head
contours used in the calculation.]
Figure 3-5. Potentiometric surface map for computation
of hydraulic gradient.
This method provides only a very general estimate of the natural
hydraulic gradient that exists in the vicinity of the two piezometers.
Chemical gradients are known to exist and may override the effects of the
hydraulic gradient. A detailed study of the effects of multiple chemical
contaminants may be necessary to determine the actual average linear velocity
(horizontal component) of ground water in the vicinity of the monitoring
wells.
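The gradient calculation in Example Calculation No. 2 reduces to a single division. A minimal sketch, with a hypothetical function name and the head and distance values taken from the example:

```python
def hydraulic_gradient(head_upgradient_ft: float, head_downgradient_ft: float,
                       distance_ft: float) -> float:
    """Hydraulic gradient i = (difference in head) / (orthogonal distance)."""
    if distance_ft <= 0:
        raise ValueError("distance between piezometers must be positive")
    return (head_upgradient_ft - head_downgradient_ft) / distance_ft


if __name__ == "__main__":
    # Heads of 29.2 ft and 29.1 ft at piezometers 100 ft apart.
    i = hydraulic_gradient(29.2, 29.1, 100.0)
    print(f"i = {i:.3f} ft/ft")
```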
3-9
-------
EXAMPLE CALCULATION NO. 3: DETERMINING THE HORIZONTAL COMPONENT OF THE
AVERAGE LINEAR VELOCITY OF GROUND WATER (Vh)
A land disposal facility has ground-water monitoring wells that are
screened in an unconfined silty sand aquifer. Slug tests, pump tests, and
tracer tests conducted during a hydrogeologic site investigation have revealed
that the aquifer has a horizontal hydraulic conductivity (Kh) of 15 ft/day and
an effective porosity (Ne) of 15%. Using a potentiometric map (as in
example 2), the hydraulic gradient (i) has been determined to be 0.003 ft/ft.
To estimate the minimum time interval between sampling events that will
allow one to obtain an independent sample of ground water proceed as follows.
Calculate the horizontal component of the average linear velocity of
ground water (Vh) using the Darcy equation, Vh = (Kh*i)/Ne.
With Kh = 15 ft/day,
Ne = 15%, and
i = 0.003 ft/ft, calculate
Vh = (15)(0.003)/(15%) = 0.3 ft/day, or equivalently
Vh = (0.3 ft/day)(12 in/ft) = 3.6 in/day
Discussion: The horizontal component of the average linear velocity of
ground water, Vh, has been calculated and is equal to 3.6 in/day. Monitoring
well diameters at this particular facility are 4 in. We can determine the
minimum time interval between sampling events that will allow one to obtain an
independent sample of ground water by dividing the monitoring well diameter by
the horizontal component of the average linear velocity of ground water:
Minimum time interval = (4 in)/(3.6 in/day) = 1.1 days
Based on the above calculations, the owner or operator could sample every
other day. However, because the velocity can vary with recharge rates sea-
sonally, a weekly sampling interval would be advised.
Suggested Sampling Interval
Date Obtain Sample No.
June 1 1
June 8 2
June 15 3
June 22 4
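The steps of Example Calculation No. 3 can be sketched in code as follows. The function names are hypothetical, the calendar year is an arbitrary assumption made only so that dates can be generated, and the weekly spacing is the conservative choice discussed above rather than a computed quantity.

```python
from datetime import date, timedelta


def min_sampling_interval_days(kh_ft_per_day: float, gradient: float,
                               ne_fraction: float,
                               well_diameter_in: float) -> float:
    """Minimum time between independent samples: well diameter divided by
    the horizontal average linear velocity, Vh = (Kh * i) / Ne."""
    vh_ft_per_day = kh_ft_per_day * gradient / ne_fraction
    vh_in_per_day = vh_ft_per_day * 12.0
    return well_diameter_in / vh_in_per_day


def sampling_dates(first: date, n_samples: int = 4, spacing_days: int = 7):
    """Dates for a sequence of samples at a fixed spacing."""
    return [first + timedelta(days=spacing_days * k) for k in range(n_samples)]


if __name__ == "__main__":
    # Values from the example: Kh = 15 ft/day, i = 0.003, Ne = 15%, 4-in well.
    interval = min_sampling_interval_days(15.0, 0.003, 0.15, 4.0)
    print(f"Minimum interval = {interval:.1f} days")

    # Weekly schedule starting June 1 (year chosen arbitrarily).
    for number, day in enumerate(sampling_dates(date(1989, 6, 1)), start=1):
        print(day.strftime("%B %d").replace(" 0", " "), number)
```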
Table 3-3 gives some results for common situations.
3-10
-------
TABLE 3-3. DETERMINING A SAMPLING INTERVAL

UNIT            Kh (ft/day)    Ne (%)    Vh (in/mo)     SAMPLING INTERVAL
GRAVEL          10^4           19        9.6 x 10^4     DAILY
SAND            10^2           22        8.3 x 10^2     DAILY
SILTY SAND      10             14        1.3 x 10^2     WEEKLY
TILL            10^-3          2         9.1 x 10^-2    MONTHLY *
SS (SEMICON)    1              6         30             WEEKLY
BASALT          10^-1          8         2.28           MONTHLY *
The horizontal component of the average linear velocities is based on
a hydraulic gradient, i, of 0.005 ft/ft.
* Use a Monthly sampling interval or an alternate sampling procedure.
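The velocities in Table 3-3 follow from the Darcy equation with i = 0.005 ft/ft. The sketch below recomputes them; the function name is hypothetical, and the month length of 365/12 days is an assumption that is not stated in the table but is consistent with its rounded values.

```python
DAYS_PER_MONTH = 365.0 / 12.0  # assumed month length, about 30.4 days


def velocity_in_per_month(kh_ft_per_day: float, ne_percent: float,
                          gradient: float = 0.005) -> float:
    """Horizontal average linear velocity, Vh = (Kh * i) / Ne, in in/mo."""
    vh_ft_per_day = kh_ft_per_day * gradient / (ne_percent / 100.0)
    return vh_ft_per_day * 12.0 * DAYS_PER_MONTH


if __name__ == "__main__":
    # (unit, Kh in ft/day, Ne in %) as listed in Table 3-3.
    units = [
        ("GRAVEL", 1e4, 19), ("SAND", 1e2, 22), ("SILTY SAND", 10, 14),
        ("TILL", 1e-3, 2), ("SS (SEMICON)", 1, 6), ("BASALT", 1e-1, 8),
    ]
    for name, kh, ne in units:
        print(f"{name:13s} Vh = {velocity_in_per_month(kh, ne):.3g} in/mo")
```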
3.2 FLOW THROUGH KARST AND "PSEUDO-KARST" TERRANES
The Darcy equation is not valid in turbulent and nonlinear laminar flow
regimes. Examples of these particular hydrogeological environments include
karst and "pseudo-karst" (e.g., cavernous basalts and extensively fractured
rocks) terranes. Specialized methods have been investigated by Quinlan (1989)
for developing alternative monitoring procedures for karst and "pseudo-karst"
terranes. Dye tracing as described by Quinlan (1989) and Mull et al. (1988)
is useful for identifying flow paths and travel times in karst and "pseudo-
karst" terranes. Conventional ground-water monitoring wells in these
environments are often of little value in designing an effective monitoring
system. Field investigations are necessary to locate seeps and springs, which
may serve as better "monitoring wells" for identifying releases of hazardous
constituents into ground water and surface water.
SECTION 4
CHOOSING A STATISTICAL METHOD
This section discusses the choice of an appropriate statistical method.
Section 4.1 includes a flowchart to guide this selection. Section 4.2 contains
procedures to test the distributional assumptions of statistical methods and
Section 4.3 has procedures to test specifically for equality of variances.
The choice of an appropriate statistical test depends on the type of mon-
itoring and the nature of the data. The proportion of values in the data set
that are below detection is one important consideration. If most of the
values are below detection, a test of proportions is suggested.
One set of statistical procedures is suggested when the monitoring con-
sists of comparisons of water sample data from the background (hydraulically
upgradient) well with the sample data from compliance (hydraulically down-
gradient) wells. The recommended approach is analysis of variance (ANOVA).
Also, for a facility with limited amounts of data, it is advisable to ini-
tially use the ANOVA method of data evaluation, and later, when sufficient
amounts of data are collected, to change to a tolerance interval or a control
chart approach for each compliance well. However, alternate approaches are
allowed. These include adjustments for seasonality, use of tolerance inter-
vals, and use of prediction intervals. These methods are discussed in Sec-
tion 5.
When the monitoring objective is to compare the concentration of a haz-
ardous constituent to a fixed level such as a maximum concentration limit
(MCL), a different type of approach is needed. This type of comparison com-
monly serves as a basis of compliance monitoring. Control charts may be used,
as may tolerance or confidence intervals. Methods for comparison with a fixed
level are presented in Section 6.
When a long history of data from each well is available, intra-well com-
parisons are appropriate. That is, the data from a single uncontaminated well
are compared over time to detect shifts in concentration, or gradual trends in
concentration that may indicate contamination. Methods for this situation are
presented in Section 7.
4.1 FLOWCHARTS—OVERVIEW AND USE
The selection and use of a statistical procedure for ground-water moni-
toring is a detailed process. Because a single flowchart would become too
complicated for easy use, a series of flowcharts has been developed. These
flowcharts are found at the beginning of each section and are intended to
guide the user in the selection and use of procedures in that section. The
more detailed flowcharts can be thought of as attaching to the general
flowcharts at the indicated points.
Three general types of statistical procedures are presented in the flow-
chart overview (Figure 4-1): (1) background well to compliance well data
comparisons; (2) comparison of compliance well data with a constant limit such
as an alternate concentration limit (ACL) or a maximum concentration limit
(MCL); and (3) intra-well comparisons. The first question to be asked in
determining the appropriate statistical procedure is the type of monitoring
program specified in the facility permit. The type of monitoring program may
determine whether the appropriate comparison is among wells, of downgradient
well data to a constant, within a single well over time, or a special case.
If the facility is in detection monitoring, the appropriate comparison is
between wells that are hydraulically upgradient from the facility and those
that are hydraulically downgradient. The statistical procedures for this type
of monitoring are presented in Section 5. In detection monitoring, it is
likely that many of the monitored constituents will yield few quantified
results (i.e., much of the data are below the limit of analytical detection).
If this is the case, then the test of proportions (Section 8.1.3) may be rec-
ommended. If the constituent occurs in measurable concentrations in back-
ground, then analysis of variance (Section 5.2) is recommended. This method
of analysis is preferred when the data lack sufficient quantity to allow for
the use of tolerance intervals or control charts.
If the facility is in compliance monitoring, the permit will specify the
type of compliance limit. If the compliance limit is determined from the
background, the statistical method is chosen from those that compare back-
ground well to compliance well data. Statistical methods for this case are
presented in Section 5. The preferred method is the appropriate analysis of
variance method in Section 5.2, or if sufficient data permit, tolerance inter-
vals or control charts. The flow chart in Section 5 aids in determining which
method is applicable.
If a facility in compliance monitoring has a constant maximum concentra-
tion limit (MCL) or alternate concentration limit (ACL) specified, then the
appropriate comparison is with a constant. Methods for comparison with MCLs
or ACLs are presented in Section 6, which contains a flow chart to aid in
determining which method to use.
Finally, when more than one year of data have been collected from each
well, the facility owner or operator may find it useful to perform intra-well
comparisons over time to supplement the other methods. This is not a regula-
tory requirement, but it could provide the facility owner or operator with
information about the site hydrogeology. This method of analysis may be used
when sufficient data from an individual uncontaminated well exist and the data
allow for the identification of trends. A recommended control chart procedure
(Starks, 1988) suggests that a minimum background sample of eight observations
is needed. Thus an intra-well control chart approach could begin after the
first complete year of data collection. These methods are presented in
Section 7.
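The selection logic described in this section can be summarized as a small decision function. The sketch below is illustrative only: the function name and argument names are hypothetical, and the 0.5 cutoff is an assumed reading of "most of the values are below detection," not a threshold given in this guidance.

```python
def suggest_methods(monitoring, limit_type=None, frac_below_detection=0.0,
                    years_of_data=0.0):
    """Return a list of suggested approaches, following Section 4.1.

    monitoring: "detection" or "compliance"
    limit_type: "background" or "fixed" (MCL/ACL); used for compliance only
    frac_below_detection: fraction of results below the detection limit
    years_of_data: years of data collected from each well
    """
    suggestions = []
    if monitoring == "detection":
        if frac_below_detection > 0.5:  # assumed reading of "most"
            suggestions.append("Test of proportions (Section 8.1.3)")
        else:
            suggestions.append("Analysis of variance (Section 5.2)")
    elif monitoring == "compliance":
        if limit_type == "background":
            suggestions.append("Background/compliance well comparison (Section 5)")
        elif limit_type == "fixed":
            suggestions.append("Comparison with MCL/ACL (Section 6)")
        else:
            raise ValueError("limit_type must be 'background' or 'fixed'")
    else:
        raise ValueError("monitoring must be 'detection' or 'compliance'")
    if years_of_data > 1.0:
        # Optional supplement, not a regulatory requirement.
        suggestions.append("Intra-well control charts as a supplement (Section 7)")
    return suggestions


print(suggest_methods("detection", frac_below_detection=0.8))
print(suggest_methods("compliance", limit_type="fixed", years_of_data=2.0))
```

The flowcharts at the start of Sections 5, 6, and 7 refine each of these branches further.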
[Figure 4-1. Flowchart overview. Detection monitoring leads to
background/compliance well comparisons (Section 5). Compliance monitoring or
corrective action branches on the type of compliance limit: a background limit
leads to background/compliance well comparisons (Section 5), and an MCL/ACL
leads to comparisons with MCL/ACLs (Section 6). If more than 1 year of data is
available, intra-well comparisons using control charts (Section 7) may follow.]
4.2 CHECKING DISTRIBUTIONAL ASSUMPTIONS
The purpose of this section is to provide users with methods to check the
distributional assumptions of the statistical procedures recommended for
ground-water monitoring. It is emphasized that one need not do an extensive
study of the distribution of the data unless a nonparametric method of analy-
sis is used to evaluate the data. If the owner or operator wishes to trans-
form the data in lieu of using a nonparametric method, it must first be shown
that the untransformed data are inappropriate for a normal theory test.
Similarly, if the owner or operator wishes to use nonparametric methods, he or
she must demonstrate that the data do violate normality assumptions.
EPA has adopted this approach because most of the statistical procedures
that meet the criteria set forth in the regulations are robust with respect to
departures from many of the normal distributional assumptions. That is, only
extreme violations of assumptions will result in an incorrect outcome of a
statistical test. Moreover, it is only in situations where it is unclear
whether contamination is present that departures from assumptions will alter
the outcome of a statistical test. EPA therefore believes that it is protec-
tive of the environment to adopt the approach of not requiring testing of
assumptions of a normal distribution on a wide scale.
It should be noted that the normal distributional assumptions for
statistical procedures apply to the errors of the observations. Application
of the distributional tests to the observations themselves may lead to the
conclusion that the distribution does not fit the observations. In some cases
this lack of fit may be due to differences in means for the different wells or
some other cause. The tests for distributional assumptions are best applied
to the residuals from a statistical analysis. A residual is the difference
between the original observation and the value predicted by a model. For
example, in analysis of variance, the predicted values are the group means and
the residual is the difference between each observation and its group mean.
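As a minimal illustration of the residuals described above, the following sketch subtracts each group (well) mean from the observations in that group. The concentrations and well names are hypothetical and serve only to show the computation.

```python
from statistics import mean

# Hypothetical concentrations by well, for illustration only.
wells = {
    "background": [4.0, 5.0, 6.0],
    "compliance_1": [7.0, 9.0, 8.0],
}


def anova_residuals(groups):
    """Return each observation minus the mean of its own group."""
    residuals = []
    for values in groups.values():
        group_mean = mean(values)
        residuals.extend(x - group_mean for x in values)
    return residuals


print(anova_residuals(wells))
```

The distributional checks in the following sections would then be applied to these residuals rather than to the raw observations, so that a difference in well means is not mistaken for nonnormality.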
If the conclusion from testing the assumptions is that the assumptions
are not adequately met, then a transformation of the data may be used or a
nonparametric statistical procedure selected. Many types of concentration
data have been reported in the literature to be adequately described by a log-
normal distribution. That is, the natural logarithm of the original observa-
tions has been found to follow the normal distribution. Consequently, if the
normal distributional assumptions are found to be violated for the original
data, a transformation by taking the natural logarithm of each observation is
suggested. This assumes that the data are all positive. If the log trans-
formation does not adequately normalize the data or stabilize the variance,
one should use a nonparametric procedure or seek the consultation of a profes-
sional statistician to determine an appropriate statistical procedure.
The following sections present four selected approaches to check for
normality. The first option relies on literature citation; the other three
are statistical procedures. The choice is left to the user. The availability
of statistical software and the user's familiarity with it will be a factor in
the choice of a method. The coefficient of variation method, for example,
requires only the computation of the mean and standard deviation of the data.
Plotting on probability paper can be done by hand but becomes tedious with
many data sets. However, the commercial Statistical Analysis System (SAS)
software package provides a computerized version of a probability plot in its
PROC UNIVARIATE procedure. SYSTAT, a package for PCs, also has a probability
plot procedure. The chi-squared test is not readily available through commer-
cial software but can be programmed on a PC (for example in LOTUS 1-2-3) or in
any other (statistical) software language with which the user is familiar.
The amount of data available will also influence the choice. All tests of
distributional assumptions require a fairly large sample size to detect
moderate to small deviations from normality. The chi-squared test requires a
minimum of 20 samples for a reasonable test.
Other statistical procedures are available for checking distributional
assumptions. The more advanced user is referred to the Kolmogorov-Smirnov
test (see, for example, Lindgren, 1976) which is used to test the hypothesis
that data come from a specific (that is, completely specified) distribution.
The assumption of normality can thus be tested. A minimum sample
size of 50 is recommended for using this test.
A modification to the Kolmogorov-Smirnov test has been developed by
Lilliefors who uses the sample mean and standard deviation from the data as
the parameters of the distribution (Lilliefors, 1967). Again, a sample size
of at least 50 is recommended.
Another alternative for testing normality is the more involved
Shapiro-Wilk test. The interested user is referred to the relevant
article in Biometrika by Shapiro and Wilk (1965).
4.2.1 Literature Citation
PURPOSE
An owner or operator may wish to consult literature to determine what
type of distribution the ground-water monitoring data for a specific
constituent are likely to follow. In cases where insufficient data prevent
the use of a quantitative method for checking distributional assumptions,
this approach may be necessary, and it may make it easier to determine
whether there is statistically significant evidence of contamination.
PROCEDURE
One simple way to select a procedure based on a specific statistical
distribution is to cite a relevant published reference. The owner or operator
may find papers that discuss data resulting from sampling ground water and
conclude that such data for a particular constituent follow a specified dis-
tribution. Citing such a reference may be sufficient justification for using
a method based on that distribution, provided that the data do not show evi-
dence that the assumptions are violated.
To justify the use of a literature citation, the owner or operator needs
to make sure that the reference cited considers the distribution of data for
the specific compound being monitored. In addition, he or she must evaluate
the similarity of their site to the site that was discussed in the literature,
especially similar hydrogeologic and potential contaminant characteristics.
However, because many of the compounds may not be studied in the literature,
extrapolations to compounds with similar chemical characteristics and to sites
with similar hydrogeologic conditions are also acceptable. Basically, the
owner or operator needs to provide some reason or justification for choosing a
particular distribution.
4.2.2 Coefficient-of-Variation Test
Many statistical procedures assume that the data are normally distrib-
uted. The concentration of a hazardous constituent in ground water is inher-
ently nonnegative, while the normal distribution allows for negative values.
However, if the mean of the normal distribution is sufficiently above zero,
the distribution places very little probability on negative observations and
is still a valid approximation.
One simple check that can rule out use of the normal distribution is to
calculate the coefficient of variation of the data. The use of this method
was required by the former Part 264 Subpart F regulations pursuant to
Section 264.97(h)(1). Because most owners and operators as well as Regional
personnel are already familiar with this procedure, it will probably be used
frequently. The coefficient of variation, CV, is the standard deviation of
the observations, divided by their mean. If the normal distribution is to be
a valid model, there should be very little probability of negative values.
The number of standard deviations by which the mean exceeds zero determines
the probability of negative values. For example, if the mean exceeds zero by
one standard deviation, the normal distribution will have less than 0.159
probability of a negative observation.
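The link between the coefficient of variation and the probability of negative values can be checked numerically. The sketch below uses the standard normal distribution function from Python's standard library; the function name is hypothetical.

```python
from statistics import NormalDist


def prob_negative(mean, sd):
    """P(X < 0) for a normal distribution with the given mean and sd."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)


for k in (1, 2, 3):
    # A mean equal to k standard deviations corresponds to CV = 1/k.
    p = prob_negative(float(k), 1.0)
    print(f"mean = {k} sd (CV = {1 / k:.2f}): P(X < 0) = {p:.4f}")
```

A CV of 1.00 corresponds to the first case, where roughly 16% of the fitted normal distribution would lie below zero.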
Consequently, one can calculate the standard deviation of the observa-
tions, calculate the mean, and form the ratio of the standard deviation di-
vided by the mean. If this ratio exceeds 1.00, there is evidence that the
data are not normal and the normal distribution should not be used for those
data. (There are other possibilities for nonnormality, but this is a simple
check that can rule out obviously nonnormal data.)
PURPOSE
This test is a simple check for evidence of gross nonnormality in the
ground-water monitoring data.
PROCEDURE
To apply the coefficient-of-variation check for normality, proceed as
follows.
Step 1. Calculate the sample mean, $\bar{X}$, of n observations $X_i$, i = 1, ..., n.
$$\bar{X} = \left(\sum_{i=1}^{n} X_i\right)/n$$
Step 2. Calculate the sample standard deviation, S.*
$$S = \left[\sum_{i=1}^{n} (X_i - \bar{X})^2/(n - 1)\right]^{1/2}$$
Step 3. Divide the sample standard deviation by the sample mean. This
ratio is the CV.
$$CV = S/\bar{X}$$
Step 4. Determine if the result of Step 3 exceeds 1.00. If so, this is
evidence that the normal distribution does not fit the data adequately.
EXAMPLE
Table 4-1 is an example data set of chlordane concentrations in 24 water
samples from a fictitious site. The data are presented in order from least to
greatest.
Applying the procedure steps to the data of Table 4-1, we have:
Step 1. $\bar{X}$ = 1.52
Step 2. S = 1.56
Step 3. CV = 1.56/1.52 = 1.03
Step 4. Because the result of Step 3 was 1.03, which exceeds 1.00, we
conclude that there is evidence that the data do not adequately follow the
normal distribution. As will be discussed in other sections, one would then
either transform the data, use a nonparametric procedure, or seek professional
guidance.
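The four steps of this example can be reproduced with Python's standard library, whose `stdev` function uses the (n - 1) divisor of Step 2. The list below contains the 24 chlordane values of Table 4-1; the variable and function names are hypothetical.

```python
from statistics import mean, stdev

# The 24 chlordane concentrations (ppm) of Table 4-1.
chlordane_ppm = [
    0.04, 0.18, 0.18, 0.25, 0.29, 0.38, 0.50, 0.50,
    0.60, 0.93, 0.97, 1.10, 1.16, 1.29, 1.37, 1.38,
    1.45, 1.46, 2.58, 2.69, 2.80, 3.33, 4.50, 6.60,
]


def coefficient_of_variation(values):
    """Sample standard deviation (n - 1 divisor) divided by the sample mean."""
    return stdev(values) / mean(values)


cv = coefficient_of_variation(chlordane_ppm)
print(f"mean = {mean(chlordane_ppm):.2f}, S = {stdev(chlordane_ppm):.2f}, CV = {cv:.2f}")
print("evidence against normality" if cv > 1.00 else "no evidence from this check")
```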
* Throughout this document we use $S^2$ to denote the unbiased estimate of the
population variance $\sigma^2$. We refer to this unbiased estimate of the
population variance as the sample variance. The formula given in Step 2
above for S, the square root of the unbiased estimate of the population
variance, is used as the sample estimate of the standard deviation and is
referred to as the "sample standard deviation." Any computation of the
sample standard deviation or the sample variance, unless explicitly noted
otherwise, refers to these formulas. It should be noted that this esti-
mate of the standard deviation is not unbiased in that its expected value
is not equal to the population standard deviation. However, all of the
statistical procedures have been developed using the formulas as we
define them here.
TABLE 4-1. EXAMPLE DATA FOR COEFFICIENT-OF-VARIATION TEST

                      Chlordane
                      concentration (ppm)
Dissolved phase       0.04
                      0.18
                      0.18
                      0.25
                      0.29
                      0.38
                      0.50
                      0.50
                      0.60
                      0.93
                      0.97
                      1.10
                      1.16
                      1.29
                      1.37
                      1.38
                      1.45
                      1.46
Immiscible phase      2.58
                      2.69
                      2.80
                      3.33
                      4.50
                      6.60

NOTE. The owner or operator may choose to use parametric tests since
1.03 is so close to the limit but should use a transformation or a
nonparametric test if he or she believes that the parametric test results
would be incorrect due to the departure from normality.
4.2.3 Plotting on Probability Paper
PURPOSE
Probability paper is a visual aid and diagnostic tool in determining
whether a small set of data follows a normal distribution. Also, approximate
estimates of the mean and standard deviation of the distribution can be read
from the plot.
PROCEDURE
Let X be the variable; X1, X2,...,Xi,...,Xn the set of n observations.
The values of X can be raw data, residuals, or transformed data.
Step 1. Rearrange the observations in ascending order:
X(1) <= X(2) <= ... <= X(n).
Step 2. Compute the cumulative frequency for each distinct value X(i)
as (i/(n+1)) x 100%. The divisor of (n+1) is a plotting convention to avoid
cumulative frequencies of 100% which would be at infinity on the probability
paper.
If a value of X occurs more than once, then the corresponding value of i
increases appropriately. For example, if X(2) = X(3), then the cumulative
frequency for X(1) is 100*1/(n+1), but the cumulative frequency for X(2) or
X(3) is 100*(1+2)/(n+1).
Step 3. Plot the distinct pairs [X(i), (i/(n+1)) x 100] on probability
paper (this paper is commercially available) using an appropriate
scale for X on the horizontal axis. The vertical axis for the cumulative
frequencies is already scaled from 0.01 to 99.99%.
If the points fall roughly on a straight line (the line can be drawn with
a ruler), then one can conclude that the underlying distribution is approxi-
mately normal. Also, an estimate of the mean and standard deviation can be
made from the plot. The horizontal line drawn through 50% cuts the plotted
line at the mean of the X values. The horizontal line going through 84% cuts
the line at a value corresponding to the mean plus one standard deviation. By
subtraction, one obtains the standard deviation.
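Where probability paper is not at hand, the plotting positions of Steps 1 to 3 can be computed directly. The sketch below is one possible implementation, not a prescribed one: it returns the cumulative percentage 100 x i/(n+1) for each distinct value, together with the corresponding standard normal score, which plays the role of the vertical scale of probability paper. The function name is hypothetical, and the data are the chlordane values used in the example that follows.

```python
from collections import Counter
from statistics import NormalDist


def plotting_positions(values):
    """Return (x, cumulative %, normal score) for each distinct value.

    The cumulative percentage is 100 * i / (n + 1), where i counts all
    observations less than or equal to x, so tied values share the largest i.
    """
    n = len(values)
    counts = Counter(values)
    rows, i = [], 0
    for x in sorted(counts):
        i += counts[x]
        pct = 100.0 * i / (n + 1)
        z = NormalDist().inv_cdf(pct / 100.0)  # position on a normal scale
        rows.append((x, pct, z))
    return rows


chlordane_ppm = [
    0.04, 0.18, 0.18, 0.25, 0.29, 0.38, 0.50, 0.50,
    0.60, 0.93, 0.97, 1.10, 1.16, 1.29, 1.37, 1.38,
    1.45, 1.46, 2.58, 2.69, 2.80, 3.33, 4.50, 6.60,
]
for x, pct, z in plotting_positions(chlordane_ppm):
    print(f"{x:5.2f}  {pct:5.1f}%  {z:6.2f}")
```

Plotting x against the normal score on ordinary graph paper gives the same picture as plotting x against the cumulative percentage on probability paper: roughly straight for normal data, curved otherwise.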
REFERENCE
Dixon, W. J., and F. J. Massey, Jr. Introduction to Statistical Analysis.
McGraw-Hill, Fourth Edition, 1983.
EXAMPLE
Table 4-2 lists 22 distinct chlordane concentration values (X) along with
their frequencies. These are the same values as those listed in Table 4-1.
There is a total of n=24 observations.
Step 1. Sort the values of X in ascending order (column 1).
Step 2. Compute [100 x (i/25)], column 4, for each distinct value of X,
based on the values of i (column 3).
Step 3. Plot the pairs [X(i), 100 x (i/25)] on probability paper
(Figure 4-2).
TABLE 4-2. EXAMPLE DATA COMPUTATIONS FOR PROBABILITY PLOTTING

                     Concentration   Absolute
                     X               frequency    i    100 x (i/(n+1))   ln(X)
Dissolved phase      0.04            1             1          4          -3.22
                     0.18            2             3         12          -1.71
                     0.25            1             4         16          -1.39
                     0.29            1             5         20          -1.24
                     0.38            1             6         24          -0.97
                     0.50            2             8         32          -0.69
                     0.60            1             9         36          -0.51
                     0.93            1            10         40          -0.07
                     0.97            1            11         44          -0.03
                     1.10            1            12         48           0.10
                     1.16            1            13         52           0.15
                     1.29            1            14         56           0.25
                     1.37            1            15         60           0.31
                     1.38            1            16         64           0.32
                     1.45            1            17         68           0.37
                     1.46            1            18         72           0.38
Immiscible phase     2.58            1            19         76           0.95
                     2.69            1            20         80           0.99
                     2.80            1            21         84           1.03
                     3.33            1            22         88           1.20
                     4.50            1            23         92           1.50
                     6.60            1            24         96           1.89

[Figure 4-2. Probability plot of raw chlordane concentrations. X-axis:
concentration; Y-axis: 100 x (i/(n+1)).]
INTERPRETATION
The points in Figure 4-2 do not fall on a straight line; therefore, the
hypothesis of an underlying normal distribution is rejected. However, the
shape of the curve indicates a lognormal distribution. This is checked in the
next step.
Also, information about the solubility of chlordane in water is
helpful. Chlordane has a solubility (in water) that ranges up to
1.85 mg/L. Because the last six measurements exceed this solubility,
contamination is suspected.
Next, take the natural logarithm of the X-values (ln(X)) (column 5 of
Table 4-2). Repeat Step 3 above using the pairs [ln(X), 100 x (i/25)]. The
resulting plot is shown in Figure 4-3. The points fall approximately on a
straight line (hand-drawn) and the hypothesis of lognormality of X, i.e., that
ln(X) is normally distributed, can be accepted. The mean can be estimated at
slightly below 0 and the standard deviation at about 1.2 on the log scale.
CAUTIONARY NOTE
The probability plot is not a formal test of whether the data follow a
normal distribution. It is designed as a quick, graphical procedure to
identify cases of obvious nonnormality. Figure 4-3 is an example of a
probability plot of normal data, illustrating how a probability plot of normal
data looks. Figure 4-2 is an example of how nonnormal data look on a prob-
ability plot. Data that are sufficiently nonnormal to require use of a pro-
cedure not based on the normal distribution will show a definite curve. A
single point that does not fall on the straight line does not indicate non-
normality, but may be an outlier.
4.2.4 The Chi-Squared Test
The chi-squared test can be used to test whether a set of data properly
fits a specified distribution within a specified probability. Most introduc-
tory courses in statistics explain the chi-squared test, and its familiarity
among owners and operators as well as Regional personnel may make it a
frequently used method of analysis. In this application the assumed distribu-
tion is the normal distribution, but other distributions could also be used.
The test consists of defining cells or ranges of values and determining the
expected number of observations that would fall in each cell according to the
hypothesized distribution. The actual number of data points in each cell is
compared with that predicted by the distribution to judge the adequacy of the
fit.
PURPOSE
The chi-squared test is used to test the adequacy of the assumption of
normality of the data.
PROCEDURE
Step 1. Determine the appropriate number of cells, K. This number
usually ranges from 5 to 10. Divide the number of observations, N, by 4 and
keep the whole-number part of the result, using 10 if the result exceeds 10.
Choosing K in this way guarantees an expected count of at least four
observations in each of the K cells.
[Figure 4-3. Probability plot of log-transformed chlordane concentrations.
X-axis: ln(concentration); Y-axis: 100 x (i/(n+1)). The mean and the mean
plus one standard deviation are marked on the plot.]
Step 2. Standardize the data by subtracting the sample mean and dividing
by the sample standard deviation:
$$Z_i = (X_i - \bar{X})/S$$
Step 3. Determine the number of observations that fall in each of the
cells defined according to Table 4-3. The expected number of observations for
each cell is N/K, where N is the total number of observations and K is the
number of cells. Let $N_i$ denote the observed number in cell i (for i taking
values from 1 to K) and let $E_i$ denote the expected number of observations in
cell i. Note that in this case the cells are chosen to make the $E_i$'s equal.
TABLE 4-3. CELL BOUNDARIES FOR THE CHI-SQUARED TEST

Cell boundaries for equal expected cell sizes with the normal distribution

                 Number of cells (K)
    5       6       7       8       9      10
  -0.84   -0.97   -1.07   -1.15   -1.22   -1.28
  -0.25   -0.43   -0.57   -0.67   -0.76   -0.84
   0.25    0.00   -0.18   -0.32   -0.43   -0.52
   0.84    0.43    0.18    0.00   -0.14   -0.25
           0.97    0.57    0.32    0.14    0.00
                   1.07    0.67    0.43    0.25
                           1.15    0.76    0.52
                                   1.22    0.84
                                           1.28
Step 4. Calculate the chi-squared statistic by the formula below:
$$\chi^2 = \sum_{i=1}^{K} \frac{(N_i - E_i)^2}{E_i}$$
Step 5. Compare the calculated result to the table of the chi-squared
distribution with K-3 degrees of freedom (Table 1, Appendix B). Reject the
hypothesis of normality if the calculated value exceeds the tabulated value.
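The cell boundaries used in Step 3 are the quantiles of the standard normal distribution at 1/K, 2/K, ..., (K-1)/K, which is what makes the expected counts equal. A short sketch of Step 1 and of the boundary computation follows; the function names are hypothetical, and the procedure assumes a sample large enough for at least five cells (20 or more observations).

```python
from statistics import NormalDist


def number_of_cells(n_obs):
    """Step 1: whole-number part of N/4, capped at 10."""
    return min(10, n_obs // 4)


def cell_boundaries(k):
    """Standard normal quantiles at 1/K, 2/K, ..., (K-1)/K."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(j / k) for j in range(1, k)]


k = number_of_cells(21)
print(k, [round(b, 2) for b in cell_boundaries(k)])
```

Rounded to two decimals, these quantiles can be compared with the entries of Table 4-3 for the chosen K.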
REFERENCE
Remington, R. D., and M. A. Schork. Statistics with Applications to the
Biological and Health Sciences. Prentice-Hall, 1970. 235-236.
EXAMPLE
The data in Table 4-4 are N = 21 residuals from an analysis of variance
on dioxin concentrations. The analysis of variance assumes that the errors
(estimated by the residuals) are normally distributed. The chi-squared test
is used to check this assumption.

TABLE 4-4. EXAMPLE DATA FOR CHI-SQUARED TEST

                              Standardized
Observation     Residual      residual
     1           -0.45         -1.90
     2           -0.35         -1.48
     3           -0.35         -1.48
     4           -0.22         -0.93
     5           -0.16         -0.67
     6           -0.13         -0.55
     7           -0.11         -0.46
     8           -0.10         -0.42
     9           -0.10         -0.42
    10           -0.06         -0.25
    11           -0.05         -0.21
    12            0.04          0.17
    13            0.11          0.47
    14            0.13          0.55
    15            0.16          0.68
    16            0.17          0.72
    17            0.20          0.85
    18            0.21          0.89
    19            0.30          1.27
    20            0.34          1.44
    21            0.41          1.73

Step 1. Divide the number of observations, 21, by 4 to get 5.25. Keep
only the integer part, 5, so the test will use K = 5 cells.
Step 2. The sample mean and standard deviation are calculated and found
to be: $\bar{X}$ = 0.00, S = 0.24. The data are standardized by subtracting the mean
(0 in this case) and dividing by S. The results are also shown in Table 4-4.
Step 3. Determine the number of (standardized) observations that fall
into the five cells determined from Table 4-3. These divisions are: (1) less
than or equal to -0.84, (2) greater than -0.84 and less than or equal to
-0.25, (3) greater than -0.25 and less than or equal to +0.25, (4) greater
than 0.25 and less than or equal to 0.84, and (5) greater than 0.84. We find
4 observations in cell 1, 6 in cell 2, 2 in cell 3, 4 in cell 4, and 5 in
cell 5.
Step 4. Calculate the chi-squared statistic. The expected number in
each cell is N/K or 21/5 = 4.2.
$$\chi^2 = \frac{(4 - 4.2)^2}{4.2} + \cdots + \frac{(5 - 4.2)^2}{4.2} = 2.10$$
Step 5. The critical value at the 5% level for a chi-squared test with
2 (K-3 = 5-3 = 2) degrees of freedom is 5.99 (Table 1, Appendix B). Because
the calculated value of 2.10 is less than 5.99 there is no evidence that these
data are not normal.
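The worked example can be reproduced with the sketch below. It uses the 21 residuals of Table 4-4, the rounded K = 5 boundaries of Table 4-3, and the critical value of 5.99 quoted in Step 5; the function and variable names are hypothetical. The rounded boundaries are used deliberately, because one standardized residual lies very close to -0.25 and the cell counts depend on the boundary precision.

```python
from bisect import bisect_left
from statistics import mean, stdev

# Residuals from Table 4-4.
residuals = [
    -0.45, -0.35, -0.35, -0.22, -0.16, -0.13, -0.11,
    -0.10, -0.10, -0.06, -0.05, 0.04, 0.11, 0.13,
    0.16, 0.17, 0.20, 0.21, 0.30, 0.34, 0.41,
]
boundaries = [-0.84, -0.25, 0.25, 0.84]  # Table 4-3, K = 5
CRITICAL_5PCT_2DF = 5.99                 # value quoted in Step 5


def chi_squared_normality(values, boundaries):
    """Return (observed counts per cell, chi-squared statistic)."""
    m, s = mean(values), stdev(values)
    k = len(boundaries) + 1
    counts = [0] * k
    for x in values:
        z = (x - m) / s
        # bisect_left places z equal to a boundary in the lower cell,
        # matching the "less than or equal to" cell definitions.
        counts[bisect_left(boundaries, z)] += 1
    expected = len(values) / k
    statistic = sum((n_i - expected) ** 2 / expected for n_i in counts)
    return counts, statistic


counts, chi2 = chi_squared_normality(residuals, boundaries)
print(counts, round(chi2, 2))
print("reject normality" if chi2 > CRITICAL_5PCT_2DF else "no evidence against normality")
```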
INTERPRETATION
The cell boundaries are determined from the normal distribution so that
equal numbers of observations should fall in each cell. If there are large
differences between the number of observations in each cell and that predicted
by the normal distribution, this is evidence that the data are not normal.
The chi-squared statistic is a nonnegative statistic that increases as the
difference between the predicted and observed number of observations in each
cell increases.
If the calculated value of the chi-squared statistic exceeds the tabu-
lated value, there is statistically significant evidence that the data do not
follow the normal distribution. In that case, one would need to do a trans-
formation, use a nonparametric procedure, or seek consultation before inter-
preting the results of the test of the ground-water data. If the calculated
value of the chi-squared statistic does not exceed the tabulated critical
value, there is no significant lack of fit to the normal distribution and one
can proceed assuming that the assumption of normality is adequately met.
REMARK
The chi-squared statistic can be used to test whether the residuals from
an analysis of variance or other procedure are normal. In this case the
degrees of freedom are found by (number of cells minus one minus the number of
parameters that have been estimated). This may require more than the sug-
gested 10 cells. The chi-squared test does require a fairly large sample size
in that there should be generally at least four observations per cell.
4.3 CHECKING EQUALITY OF VARIANCE: BARTLETT'S TEST
The analysis of variance procedures presented in Section 5 are often more
sensitive to unequal variances than to moderate departures from normality.
The procedures described in this section allow for testing to determine
whether group variances are equal or differ significantly. Often in practice
unequal variances and nonnormality occur together. Sometimes a transformation
to stabilize or equalize the variances also produces a distribution that is
more nearly normal. This sometimes occurs if the initial distribution was
positively skewed with variance increasing with the number of observations.
Only Bartlett's test for checking equality, or homogeneity, of variances is
presented here. It encompasses checking equality of more than two variances
with unequal sample sizes. Other tests are available for special cases. The
F-test is a special situation when there are only two groups to be compared.
The user is referred to classical textbooks for this test (e.g., Snedecor and
Cochran, 1980). In the case of equal sample sizes but more than two variances
to be compared, the user might want to use Hartley's or maximum F-ratio test
(see Nelson, 1987). This test provides a quick procedure to test for variance
homogeneity.
PURPOSE
Bartlett's test is a test of homogeneity of variances. In other words,
it is a means of testing whether a number of population variances of normal
distributions are equal. Homogeneity of variances is an assumption made in
analysis of variance when comparing concentrations of constituents between
background and compliance wells, or among compliance wells. It should be
noted that Bartlett's test is itself sensitive to nonnormality in the data.
With long-tailed distributions the test too often rejects equality (homo-
geneity) of the variances.
PROCEDURE
Assume that data from k wells are available and that there are $n_i$ data
points for well i.
Step 1. Compute the k sample variances $S_1^2, \ldots, S_k^2$. The sample
variance, $S^2$, is the square of the sample standard deviation and is given
by the general equation

$$S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n-1)$$

where $\bar{X}$ is the average of the $X_1, \ldots, X_n$ values. Each variance
has associated with it $f_i = n_i - 1$ degrees of freedom. Take the natural
logarithm of each variance, $\ln(S_1^2), \ldots, \ln(S_k^2)$.
Step 2. Compute the test statistic

$$\chi^2 = f \ln(S_p^2) - \sum_{i=1}^{k} f_i \ln(S_i^2)$$

where

$$f = \sum_{i=1}^{k} f_i = \left(\sum_{i=1}^{k} n_i\right) - k$$

thus f is the total sample size minus the number of wells (groups); and

$$S_p^2 = \frac{1}{f} \sum_{i=1}^{k} f_i S_i^2$$

is the pooled variance across wells.
Step 3. Using the chi-squared table (Table 1, Appendix B), find the
critical value for $\chi^2$ with (k-1) degrees of freedom at a predetermined
significance level, for example, 5%.
INTERPRETATION
If the calculated value $\chi^2$ is larger than the tabulated value, then
conclude that the variances are not equal at that significance level.
REFERENCE
Johnson N. L., and F. C. Leone. Statistics and Experimental Design in
Engineering and the Physical Sciences. Vol. I, John Wiley and Sons, New York,
1977.
EXAMPLE
Manganese concentrations are given for k=6 wells in Table 4-5 below.
Note: Some numbers in Table 4-5 have been rounded.
TABLE 4-5. EXAMPLE DATA FOR BARTLETT'S TEST

Sampling date        Well 1   Well 2   Well 3      Well 4   Well 5    Well 6
January 1                50       46      272          34       48        68
February 1               73       77      171       3,940       54       991
March 1                 244                32                             54
April 1                 202                53
n_i =                     4        2        4           2        2         3
f_i = n_i - 1 =           3        1        3           1        1         2
S_i =                 95.27    21.92   111.60    2,761.96     4.24    536.98
S_i^2 =               9,076      480   12,455   7,628,423       18   288,348
f_i * S_i^2 =        27,228      480   37,365   7,628,423       18   576,696
ln(S_i^2) =            9.11     6.17     9.43       15.85     2.89     12.57
f_i * ln(S_i^2) =     27.33     6.17    28.29       15.85     2.89     25.14
Step 1. Compute the six sample variances and take their natural
logarithms, $\ln(S_1^2), \ldots, \ln(S_6^2)$, as 9.11, 6.17,..., 12.57,
respectively.
Step 2. Compute

$$\sum_{i=1}^{6} f_i \ln(S_i^2) = 105.67$$

This is the sum of the last line in Table 4-5.
Compute

$$f = \sum_{i=1}^{6} f_i = 3 + 1 + \cdots + 2 = 11$$

Compute

$$S_p^2 = \frac{1}{11} \sum_{i=1}^{6} f_i S_i^2 = \frac{1}{11}(27{,}228 + \cdots + 576{,}696) = \frac{1}{11}(8{,}270{,}210) = 751{,}837.27$$

Take the natural logarithm of $S_p^2$: $\ln(S_p^2) = 13.53$
Compute $\chi^2 = 11(13.53) - 105.67 = 43.16$
Step 3. The critical $\chi^2$ value with 6-1 = 5 degrees of freedom at the 5%
significance level is 11.1 (Table 1 in Appendix B). Since 43.16 is larger
than 11.1, we conclude that the six variances $S_1^2, \ldots, S_6^2$ are not
homogeneous at the 5% significance level.
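The arithmetic of this example can be checked with a short script. The sketch below is a minimal illustration, not part of the guidance: the function names are hypothetical, and it implements only the statistic as defined in Step 2 of the procedure. Some statistical software applies an additional small-sample correction factor to Bartlett's statistic, so packaged routines may return a somewhat different value. Because the script carries unrounded intermediate values, its result differs slightly from the hand calculation of 43.16.

```python
import math


def sample_variance(values):
    """Sample variance with (n - 1) in the denominator, as in Step 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def bartlett_statistic(groups):
    """Chi-squared statistic of Step 2: f*ln(Sp^2) - sum(f_i*ln(S_i^2))."""
    f_i = [len(g) - 1 for g in groups]             # degrees of freedom per well
    s2_i = [sample_variance(g) for g in groups]    # sample variance per well
    f = sum(f_i)                                   # total sample size minus number of wells
    pooled = sum(fi * s2 for fi, s2 in zip(f_i, s2_i)) / f
    return f * math.log(pooled) - sum(fi * math.log(s2) for fi, s2 in zip(f_i, s2_i))


# Manganese concentrations from Table 4-5, one list per well.
wells = [
    [50, 73, 244, 202],
    [46, 77],
    [272, 171, 32, 53],
    [34, 3940],
    [48, 54],
    [68, 991, 54],
]

chi2 = bartlett_statistic(wells)
critical = 11.1   # chi-squared table value for 5 degrees of freedom at the 5% level (from the text)
variances_differ = chi2 > critical
print(f"chi-squared = {chi2:.2f}; exceeds {critical}: {variances_differ}")
```

The same function could be applied to the logarithms of the concentrations to repeat the test on log-transformed data.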
INTERPRETATION
The sample variances of the data from the six wells were compared by
means of Bartlett's test. The test was significant at the 5% level, suggesting
that the variances are significantly unequal (heterogeneous). A log-transform
of the data can be done and the same test performed on the transformed
data. Generally, if the data follow a skewed distribution, this approach
resolves the problem of unequal variances and the user can proceed with an
ANOVA, for example.
On the other hand, unequal variances among well data could be a direct
indication of well contamination, since the individual data could come from
different distributions (i.e., different means and variances). Then the user
may wish to test which variance differs from which one. The reader is
referred here to the literature for a gap test of variance (Tukey, 1949;
David, 1956; or Nelson, 1987).
NOTE
In the case of k=2 variances, the test of equality of variances is
the F-test (Snedecor and Cochran, 1980).
Bartlett's test simplifies in the case of equal sample sizes, $n_i = n$,
i = 1,...,k. The test used then is Cochran's test. Cochran's test focuses on
the largest variance and compares it to the sum of all the variances. Hartley
introduced a quick test of homogeneity of variances that uses the ratio of the
largest over the smallest variances. Technical aids for the procedures under
the assumption of equal sample sizes are given by L. S. Nelson in the Journal
of Quality Technology, Vol. 19, 1987, pp. 107 and 165.
SECTION 5
BACKGROUND WELL TO COMPLIANCE WELL COMPARISONS
There are many situations in ground-water monitoring that call for the
comparison of data from different wells. The assumption is that a set of
uncontaminated wells can be defined. Generally these are background wells and
have been sited to be hydraulically upgradient from the regulated unit. A
second set of wells are sited hydraulically downgradient from the regulated
unit and are otherwise known as compliance wells. The data from these com-
pliance wells are compared to the data from the background wells to determine
whether there is any evidence of contamination in the compliance wells that
would presumably result from a release from the regulated unit.
If the owner or operator of a hazardous waste facility does not have
reason to suspect that the test assumptions of equal variance or normality
will be violated, then he or she may simply choose the parametric analysis of
variance as a default method of statistical analysis. In the event that this
method indicates a statistically significant difference between the groups
being tested, then the test assumptions should be evaluated.
This situation, where the relevant comparison is between data from back-
ground wells and data from compliance wells, is the topic of this section.
Comparisons between background well data and compliance well data may be
called for in all phases of monitoring. This type of comparison is the gen-
eral case for detection monitoring. It is also the usual approach for com-
pliance monitoring if the compliance limits are determined by the background
well constituent concentration levels. Compounds that are present in back-
ground wells (e.g., naturally occurring metals) are most appropriately
evaluated using this comparison method.
Section 5.1 provides a flowchart and overview for the selection of
methods for comparison of background well and compliance well data. Sec-
tion 5.2 contains analysis of variance methods. These provide methods for
directly comparing background well data to compliance well data. Section 5.3
describes a tolerance interval approach, where the background well data are
used to define the tolerance limits for comparison with the compliance well
data. Section 5.4 contains an approach based on prediction intervals, again
using the background well data to determine the prediction interval for com-
parison with the compliance well data. Methods for comparing data to a fixed
compliance limit (an MCL or ACL) will be described in Section 6.
5.1 SUMMARY FLOWCHART FOR BACKGROUND WELL TO COMPLIANCE WELL COMPARISONS
Figure 5-1 is a flowchart to aid in selecting the appropriate statistical
procedure for background well to compliance well comparisons. The first step
is to determine whether most of the observations are quantified (that is,
above the detection limits) or not. Generally, if more than 50% of the
observations are below the detection limit (as might be the case with detection or
compliance monitoring for volatile organics) then the appropriate comparison
is a test of proportions. The test of proportions compares the proportion of
detected values in the background wells to those in the compliance wells. See
Section 8.1 for a discussion of dealing with data below the detection limit.
If the proportion of detected values is 50% or more, then an analysis of
variance procedure is the first choice. Tolerance limits or prediction inter-
vals are acceptable alternate choices that the user may select. The analysis
of variance procedures give a more thorough picture of the situation at the
facility. However, the tolerance limit or prediction interval approach is
acceptable and requires less computation in many situations.
Figure 5-2 is a flowchart to guide the user if a tolerance limits
approach is selected. The first step in using Figure 5-2 is to determine
whether the facility is in detection monitoring. If so, much of the data may
be below the detection limit. See Section 8.1 for a discussion of this case,
which may call for consulting a statistician. If most of the data are quanti-
fied, then follow the flow chart to determine if normal tolerance limits can
be used. If the data are not normal (as determined by one of the procedures
in Section 4.2), then the logarithm transformation may be done and the trans-
formed data checked for normality. If the log data are normal, the lognormal
tolerance limit should be used. If neither the original data nor the log-
transformed data are normal, seek consultation with a professional
statistician.
If a prediction interval is selected as the method of choice, see Sec-
tion 5.4 for guidance in performing the procedure.
If analysis of variance is to be used, then continue with Figure 5-1 to
select the specific method that is appropriate. A one-way analysis of vari-
ance is recommended. If the data show evidence of seasonality (observed, for
example, in a plot of the data over time), a trend analysis or perhaps a two-
way analysis of variance may be the appropriate choice. These instances may
require consultation with a professional statistician.
If the one-way analysis of variance is appropriate, the computations are
performed, then the residuals are checked to see if they meet the assumptions
of normality and equal variance. If so, the analysis concludes. If not, a
logarithm transformation may be tried and the residuals from the analysis of
variance on the log data are checked for assumptions. If these still do not
adequately satisfy the assumptions, then a one-way nonparametric analysis of
variance may be done, or professional consultation may be sought.
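The branching described in this section can be summarized as a pair of small decision functions. This is a simplified sketch under stated assumptions: the function names and return strings are hypothetical, the tolerance limit and prediction interval alternatives are omitted, and the outcomes of the residual checks are passed in as flags rather than computed.

```python
def select_comparison_method(fraction_below_detection, seasonal=False):
    """Mirror the first branches of the Section 5.1 flowchart (simplified)."""
    if fraction_below_detection > 0.5:
        # Most observations are nondetects: compare proportions of detects.
        return "test of proportions"
    if seasonal:
        return "trend analysis or two-way ANOVA (consult a statistician)"
    return "one-way ANOVA"


def select_anova_variant(raw_residuals_ok, log_residuals_ok):
    """Pick the one-way ANOVA variant after the residual checks."""
    if raw_residuals_ok:
        return "parametric ANOVA on original data"
    if log_residuals_ok:
        return "parametric ANOVA on log-transformed data"
    return "nonparametric ANOVA or professional consultation"


print(select_comparison_method(0.7))
print(select_comparison_method(0.2))
print(select_anova_variant(False, True))
```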
[Figure 5-1. Flowchart for background well to compliance well comparisons.]
Figure 5-2. Tolerance limits: alternate approach to background
well to compliance well comparisons.
[Flowchart: Tolerance Limits -> Are Data Normal? If yes: Normal Tolerance
Limits -> Conclusions. If no: Take Log of Data -> Are Log Data Normal? If
yes: Lognormal Tolerance Limits -> Conclusions. If no: Consult with
Professional Statistician -> Conclusions.]
5.2 ANALYSIS OF VARIANCE
If contamination of the ground water occurs from the waste disposal
facility and if the monitoring wells are hydraulically upgradient and
hydraulically downgradient from the site, then contamination is unlikely to
change the levels of a constituent in all wells by the same amount. Thus,
contamination from a disposal site can be seen as differences in average con-
centration among wells, and such differences can be detected by analysis of
variance.
Analysis of variance (ANOVA) is the name given to a wide variety of sta-
tistical procedures. All of these procedures compare the means of different
groups of observations to determine whether there are any significant differ-
ences among the groups, and if so, contrast procedures may be used to
determine where the differences lie. Such procedures are also known in the
statistical literature as general linear model procedures.
Because of its flexibility and power, analysis of variance is the pre-
ferred method of statistical analysis when the ground-water monitoring is
based on a comparison of background and compliance well data. The ANOVA is
especially useful in situations where sample sizes are small, as is the case
during the initial phases of ground-water monitoring. Two types of analysis
of variance are presented: parametric and nonparametric one-way analyses of
variance. Both methods are appropriate when the only factor of concern is the
different monitoring wells at a given sampling period.
The hypothesis tests with parametric analysis of variance usually assume
that the errors (residuals) are normally distributed with equal variance.
These assumptions can be checked by saving the residuals (the difference
between the observations and the values predicted by the analysis of variance
model) and using the tests of assumptions presented in Section 4. Since the
data will generally be concentrations and since concentration data are often
found to follow the lognormal distribution, the log transformation is sug-
gested if substantial violations of the assumptions are found in the analysis
of the original concentration data. If the residuals from the transformed
data do not meet the parametric ANOVA requirements, then nonparametric
approaches to analysis of variance are available using the ranks of the obser-
vations. A one-way analysis of variance using the ranks is presented in
Section 5.2.2.
When several sampling periods have been used and it is important to con-
sider the sampling periods as a second factor, then two-way analysis of vari-
ance, parametric or nonparametric, is appropriate. This would be one way to
test for and adjust the data for seasonality. Also, trend analysis (e.g.,
time series) may be used to identify seasonality in the data set. If
necessary, data that exhibit seasonal trends can be adjusted. Usually, however,
seasonal variation will affect all wells at a facility by nearly the same
amount, and in most circumstances, corrections will not be necessary. Fur-
ther, the effects of seasonality will be substantially reduced by simultane-
ously comparing aggregate compliance well data to background well data.
Situations that require an analysis procedure other than a one-way ANOVA
should be referred to a professional statistician.
5.2.1 One-Way Parametric Analysis of Variance
In the context of ground-water monitoring, two situations exist for which
a one-way analysis of variance is most applicable:
* Data for a water quality parameter are available from several wells
but for only one time period (e.g., monitoring has just begun).
* Data for a water quality parameter are available from several wells
for several time periods. However, the data do not exhibit
seasonality.
In order to apply a parametric one-way analysis of variance, a minimum
number of observations is needed to give meaningful results. At least p ≥ 2
groups are to be compared (i.e., two or more wells). It is recommended that
each group (here, wells) have at least three observations and that the total
sample size, N, be large enough so that N-p ≥ 5. A variety of combinations of
groups and number of observations in groups will fulfill this minimum. One
sampling interval with four independent samples per well and at least three
wells would fulfill the minimum sample size requirements. The wells should be
spaced so as to maximize the probability of intercepting a plume of contamina-
tion. The samples should be taken far enough apart in time to guard against
autocorrelation.
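The minimum sample size recommendations above can be expressed as a simple check. This is a minimal sketch: the function name is hypothetical, and reading the thresholds as "at least two wells" and "N - p of at least 5" is an assumption about the intended inequalities.

```python
def meets_minimum_sample_size(observations_per_well):
    """Check the recommended minimums for a one-way parametric ANOVA."""
    p = len(observations_per_well)        # number of wells (groups)
    n_total = sum(observations_per_well)  # total sample size N
    return (
        p >= 2                                   # two or more wells
        and min(observations_per_well) >= 3      # at least three observations per well
        and n_total - p >= 5                     # enough error degrees of freedom
    )


# One sampling interval, four independent samples at each of three wells.
print(meets_minimum_sample_size([4, 4, 4]))
# Two wells with three samples each leave only N - p = 4.
print(meets_minimum_sample_size([3, 3]))
```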
PURPOSE
One-way analysis of variance is a statistical procedure to determine
whether differences in mean concentrations among wells, or groups of wells,
are statistically significant. For example, is there significant contamina-
tion of one or more compliance wells as compared to background wells?
PROCEDURE
Suppose the regulated unit has p wells and that $n_i$ data points
(concentrations of a constituent) are available for the ith well. These data can be
from either a single sampling period or from more than one. In the latter
case, the user could check for seasonality before proceeding by plotting the
data over time. Usually the computation will be done on a computer using a
commercially available program. However, the procedure is presented so that
computations can be done using a desk calculator, if necessary.
Step 1. Arrange the $N = \sum_{i=1}^{p} n_i$ data points in a data table as
follows (N is the total sample size at this specific regulated unit):
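                                             Well Total       Well Mean
Well No.   Observations                      (from Step 2)    (from Step 2)
1          $X_{11}, \ldots, X_{1n_1}$        $X_{1.}$         $\bar{X}_{1.}$
2
3
...
u          $X_{u1}, \ldots, X_{un_u}$        $X_{u.}$         $\bar{X}_{u.}$
...
p          $X_{p1}, \ldots, X_{pn_p}$        $X_{p.}$         $\bar{X}_{p.}$
                                             $X_{..}$         $\bar{X}_{..}$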
Step 2. Compute well totals and well means as follows:

$X_{i.} = \sum_{j=1}^{n_i} X_{ij}$, total of all $n_i$ observations at well i

$\bar{X}_{i.} = \frac{1}{n_i} X_{i.}$, average of all $n_i$ observations at well i

$X_{..} = \sum_{i=1}^{p} \sum_{j=1}^{n_i} X_{ij}$, grand total of all observations

$\bar{X}_{..} = \frac{1}{N} X_{..}$, grand mean of all observations

These totals and means are shown in the last two columns of the table above.
Step 3. Compute the sum of squares of differences between well means
and the grand mean:

$$SS_{Wells} = \sum_{i=1}^{p} n_i (\bar{X}_{i.} - \bar{X}_{..})^2 = \sum_{i=1}^{p} \frac{X_{i.}^2}{n_i} - \frac{X_{..}^2}{N}$$

(The formula on the far right is usually most convenient for calculation.)
This sum of squares has (p-1) degrees of freedom associated with it and is a
measure of the variability between wells.
Step 4. Compute the corrected total sum of squares

$$SS_{Total} = \sum_{i=1}^{p} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{..})^2 = \sum_{i=1}^{p} \sum_{j=1}^{n_i} X_{ij}^2 - \frac{X_{..}^2}{N}$$

(The formula on the far right is usually most convenient for calculation.)
This sum of squares has (N-1) degrees of freedom associated with it and is a
measure of the variability in the whole data set.
Step 5. Compute the sum of squares of differences of observations
within wells from the well means. This is the sum of squares due to error and
is obtained by subtraction:
$$SS_{Error} = SS_{Total} - SS_{Wells}$$
It has associated with it (N-p) degrees of freedom and is a measure of the
variability within wells.
Step 6. Set up the ANOVA table as shown below in Table 5-1. The sums
of squares and their degrees of freedom were obtained from Steps 3 through 5.
The mean square quantities are simply obtained by dividing each sum of squares
by its corresponding degrees of freedom.

TABLE 5-1. ONE-WAY PARAMETRIC ANOVA TABLE

Source of        Sums of      Degrees of
variation        squares      freedom      Mean squares                  F
Between wells    SS_Wells     p-1          MS_Wells = SS_Wells/(p-1)     F = MS_Wells/MS_Error
Error (within    SS_Error     N-p          MS_Error = SS_Error/(N-p)
wells)
Total            SS_Total     N-1
Step 7. To test the hypothesis of equal means for all p wells, compute
$F = MS_{Wells}/MS_{Error}$ (last column in Table 5-1). Compare this statistic to the
tabulated F statistic with (p-1) and (N-p) degrees of freedom (Table 2,
Appendix B) at the 5% significance level. If the calculated F value exceeds the
tabulated value, reject the hypothesis of equal well means. Otherwise,
conclude that there is no significant difference between the concentrations at
the p wells and thus no evidence of well contamination.
In the case of a significant F (calculated F greater than tabulated F in
Step 7), the user will conduct the next few steps to determine which compli-
ance well(s) is (are) contaminated. This will be done by comparing each com-
pliance well with the background well(s). Concentration differences between a
pair of background wells and compliance wells or between a compliance well and
a set of background wells are called contrasts in the ANOVA and multiple
comparisons framework.
Step 8. Determine if the significant F is due to differences between
background and compliance wells (computation of Bonferroni t-statistics).
Assume that of the p wells, u are background wells and m are compliance
wells (thus u + m = p). Then m differences — m compliance wells each compared
with the average of the background wells — need to be computed and tested for
statistical significance. If there are more than five downgradient wells, the
individual comparisons are done at the comparisonwise significance level of
1%, which may make the experimentwise significance level greater than 5%.
  *  Obtain the total sample size of all u background wells,
     $n_b = \sum_{i=1}^{u} n_i$.
  *  Compute the average concentration from the u background wells,
     $\bar{X}_b = \frac{1}{n_b} \sum_{i=1}^{u} X_{i.}$.
  *  Compute the m differences between the average concentrations from
     each compliance well and the average of the background wells,
     $\bar{X}_{i.} - \bar{X}_b$, i = 1,..., m.
  *  Compute the standard error of each difference as
     $SE_i = [MS_{Error} (1/n_b + 1/n_i)]^{1/2}$
     where $MS_{Error}$ is determined from the ANOVA table (Table 5-1) and $n_i$
     is the number of observations at well i.
  *  Obtain the t-statistic $t = t_{(N-p),(1-\alpha/m)}$ from Bonferroni's t-table
     (Table 3, Appendix B) with $\alpha$ = 0.05 and (N-p) degrees of freedom.
  *  Compute the m quantities $D_i = SE_i \times t$ for each compliance well i.
     If m > 5 use the entry for $t_{(N-p),(1-0.01)}$. That is, use the entry
     at m = 5.
Step 9. Compute the residuals. The residuals are the differences
between each observation and its predicted value according to the particular
analysis of variance model under consideration. In the case of a one-way
analysis of variance, the predicted value for each observation is the group
(that is, well) mean. Thus the residuals are given by:

$$R_{ij} = X_{ij} - \bar{X}_{i.}$$

The residuals, $R_{ij}$, can be used to check for departures from normality as
described in Section 4.2.
NOTE
The data can also be checked for equality of variances as described in
Section 4.3. The last column of Table 5-2 contains the standard deviations
estimated for each well, the $S_i$ used in Bartlett's test.
INTERPRETATION
If the difference $\bar{X}_{i.} - \bar{X}_b$ exceeds the value $D_i$, conclude that the ith
compliance well has significantly higher concentrations than the average of the
background wells. Otherwise conclude that the well is not contaminated. This
exercise needs to be performed for each of the m compliance wells individu-
ally. The test is designed so that the overall experimentwise error is 5% if
there are no more than five compliance wells.
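The relation between the comparisonwise level and the experimentwise level can be sketched with the Bonferroni inequality. In the block below, $\alpha_c$ (the level of each individual comparison) and $\alpha_e$ (the experimentwise level) are notation introduced here for illustration; they are not symbols used elsewhere in this guidance.

```latex
\alpha_{e} \;\le\; m\,\alpha_{c}
\qquad\Longrightarrow\qquad
\begin{cases}
m \le 5,\ \alpha_{c} = 0.05/m: & \alpha_{e} \le 0.05 \\
m = 6,\ \alpha_{c} = 0.01:     & \alpha_{e} \le 0.06
\end{cases}
```

If the m comparisons were independent, the experimentwise level would be $1 - (1 - \alpha_c)^m$, which for m = 6 and $\alpha_c$ = 0.01 is about 0.059. This is why holding each comparison at the 1% level with more than five compliance wells can push the experimentwise level above 5%.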
In some cases it may be appropriate to implement the ANOVA procedure
independently for an individual regulated unit. If there are more than five
wells at the compliance point and the waste management area consists of more
than one regulated unit, then the data may be evaluated separately for each
regulated unit if approved by the Regional Administrator or State Director.
In many cases the monitoring well system design and site hydrogeology will
determine if this approach is appropriate for a particular regulated unit.
This will help reduce the number of compliance wells used in a multiple well
comparisons procedure.
If a single regulated unit has more than five wells at the point of
compliance, refer to the caveat in the cautionary note.
CAUTIONARY NOTE
Should the regulated unit consist of more than five compliance wells,
then the Bonferroni t-test should be modified by doing the individual compari-
sons at the 1% level so that the Part 264 Subpart F regulatory requirement
pursuant to §264.97(i)(2) will be met. Alternately, a different analysis of
contrasts, such as Scheffe's, may be used. The more advanced user is referred
to the second reference below for a discussion of multiple comparisons.
REFERENCES
Johnson, Norman L., and F. C. Leone. 1977. Statistics and Experimental
Design in Engineering and the Physical Sciences. Vol. II, Second Edition,
John Wiley and Sons, New York.
Miller, Rupert G., Jr. 1981. Simultaneous Statistical Inference. Second
Edition, Springer-Verlag, New York.
EXAMPLE
Four lead concentration values at each of six wells are given in
Table 5-2 below. The wells consist of u=2 background and m=4 compliance
wells. (The values in Table 5-2 are actually the natural logarithms of the
original lead concentrations.)
Step 1. Arrange the 4 x 6 = 24 observations in a data table as follows:
TABLE 5-2. EXAMPLE DATA FOR ONE-WAY PARAMETRIC ANALYSIS OF VARIANCE

                        Natural log of Pb concentrations (µg/L)
                                                        Well total   Well mean   Well
Well No.          Date:  Jan 1   Feb 1   Mar 1   Apr 1    (X_i.)     (Xbar_i.)   std. dev.
1  Background wells       4.06    3.99    3.40    3.83     15.28       3.82      0.296
2                         3.83    4.34    3.47    4.22     15.86       3.97      0.395
3  Compliance wells       5.61    5.14    3.47    3.97     18.19       4.55      0.996 (max)
4                         3.53    4.54    4.26    4.42     16.75       4.19      0.453
5                         3.91    4.29    5.50    5.31     19.01       4.75      0.773
6                         5.42    5.21    5.29    5.08     21.00       5.25      0.143 (min)
                                                   X.. =  106.09   Xbar.. = 4.42
Step 2. The calculations are shown on the right-hand side of the data
table above. Sample standard deviations have been computed also.
Step 3. Compute the between-well sum of squares.

$$SS_{Wells} = \frac{1}{4}(15.28^2 + \cdots + 21.00^2) - \frac{1}{24} \times 106.09^2 = 5.75$$

with [6 (wells) - 1] = 5 degrees of freedom.
Step 4. Compute the corrected total sum of squares.

$$SS_{Total} = 4.06^2 + 3.99^2 + \cdots + 5.08^2 - \frac{1}{24} \times 106.09^2 = 11.92$$

with [24 (observations) - 1] = 23 degrees of freedom.
Step 5. Obtain the within-well or error sum of squares by subtraction.

$$SS_{Error} = 11.92 - 5.75 = 6.17$$

with [24 (observations) - 6 (wells)] = 18 degrees of freedom.
Step 6. Set up the one-way ANOVA as in Table 5-3 below:
TABLE 5-3. EXAMPLE COMPUTATIONS IN ONE-WAY PARAMETRIC ANOVA TABLE

Source of          Sums of    Degrees of
variation          squares    freedom      Mean squares        F
Between wells        5.76         5        5.76/5 = 1.15       1.15/0.34 = 3.38
Error                6.18        18        6.18/18 = 0.34
(within wells)
Total               11.94        23
Step 7. The calculated F statistic is 3.38. The tabulated F value with
5 and 18 degrees of freedom at the $\alpha$ = 0.05 level is 2.77 (Table 2,
Appendix B). Since the calculated value exceeds the tabulated value, the
hypothesis of equal well means must be rejected, and post hoc comparisons are
necessary.
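Steps 2 through 7 can be reproduced with a few lines of code. The following is a minimal sketch using the computational formulas of the procedure; the function name and the dictionary keys are hypothetical. Computed from the two-decimal values in Table 5-2, the sums of squares agree with Steps 3 through 5 to rounding, and the F ratio comes out near 3.35 rather than the 3.38 obtained from the rounded mean squares in Table 5-3. The conclusion is the same either way.

```python
def one_way_anova(groups):
    """Sums of squares, mean squares and F for a one-way layout (Steps 2-7)."""
    p = len(groups)                                   # number of wells
    n_total = sum(len(g) for g in groups)             # total sample size N
    grand_total = sum(sum(g) for g in groups)         # X..
    correction = grand_total ** 2 / n_total           # X..^2 / N
    ss_wells = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_total = sum(x ** 2 for g in groups for x in g) - correction
    ss_error = ss_total - ss_wells                    # obtained by subtraction
    ms_wells = ss_wells / (p - 1)
    ms_error = ss_error / (n_total - p)
    return {
        "ss_wells": ss_wells, "ss_error": ss_error, "ss_total": ss_total,
        "df_wells": p - 1, "df_error": n_total - p,
        "ms_wells": ms_wells, "ms_error": ms_error, "f": ms_wells / ms_error,
    }


# Natural logs of lead concentrations from Table 5-2, one list per well.
lead = [
    [4.06, 3.99, 3.40, 3.83],   # Well 1 (background)
    [3.83, 4.34, 3.47, 4.22],   # Well 2 (background)
    [5.61, 5.14, 3.47, 3.97],   # Well 3
    [3.53, 4.54, 4.26, 4.42],   # Well 4
    [3.91, 4.29, 5.50, 5.31],   # Well 5
    [5.42, 5.21, 5.29, 5.08],   # Well 6
]

table = one_way_anova(lead)
tabulated_f = 2.77   # F table value for 5 and 18 degrees of freedom at the 5% level (from the text)
reject_equal_means = table["f"] > tabulated_f
print({k: round(v, 2) for k, v in table.items()}, reject_equal_means)
```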
Step 8. Computation of Bonferroni t-statistics.
Note that there are four compliance wells, so m = 4 comparisons will
be made.
$n_b = 8$, total number of samples in background wells
$\bar{X}_b = 3.89$, average concentration of background wells
Compute the differences between the four compliance wells and the
average of the two background wells:
$\bar{X}_{3.} - \bar{X}_b$ = 4.55 - 3.89 = 0.66
$\bar{X}_{4.} - \bar{X}_b$ = 4.19 - 3.89 = 0.30
$\bar{X}_{5.} - \bar{X}_b$ = 4.75 - 3.89 = 0.86
$\bar{X}_{6.} - \bar{X}_b$ = 5.25 - 3.89 = 1.36
Compute the standard error of each difference. Since the number of
observations is the same for all compliance wells, the standard
errors for the four differences will be equal.
$SE_i = [0.34 (1/8 + 1/4)]^{1/2} = 0.357$ for i = 3,..., 6
From Table 3, Appendix B, obtain the critical t with (24 - 6) = 18
degrees of freedom, m = 4, and for $\alpha$ = 0.05. The approximate value
is 2.43 obtained by linear interpolation between 15 and 20 degrees
of freedom.
Compute the quantities $D_i$. Again, due to equal sample sizes, they
will all be equal.
$D_i = SE_i \times t = 0.357 \times 2.43 = 0.868$ for i = 3,..., 6
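The Bonferroni comparisons of Step 8 can be sketched as follows. The mean square error (0.34), the critical t (2.43), the background mean (3.89) and the well means are taken from the example as given; the variable names are hypothetical, and the standard error is computed as the square root of $MS_{Error}(1/n_b + 1/n_i)$, matching the numerical expression shown above.

```python
import math

ms_error = 0.34          # mean square error from Table 5-3
t_critical = 2.43        # Bonferroni t from the text (18 df, m = 4, alpha = 0.05)
n_background = 8         # total number of samples in the background wells
background_mean = 3.89   # average of the two background wells

# Compliance well number -> (well mean, number of observations), from Table 5-2.
compliance = {3: (4.55, 4), 4: (4.19, 4), 5: (4.75, 4), 6: (5.25, 4)}

results = {}
for well, (mean, n) in compliance.items():
    difference = mean - background_mean
    standard_error = math.sqrt(ms_error * (1 / n_background + 1 / n))
    d = standard_error * t_critical            # critical difference D_i
    results[well] = (difference, d, difference > d)

flagged = [well for well, (_, _, exceeds) in results.items() if exceeds]
for well, (difference, d, exceeds) in results.items():
    print(f"Well {well}: difference = {difference:.2f}, D = {d:.3f}, exceeds = {exceeds}")
```

With unequal sample sizes the same loop applies unchanged, since each $D_i$ is computed from that well's own n.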
Step 9. Compute the residuals using the data given in Table 5-2.
Residuals for Well 1:
    R11 = 4.06 - 3.82 =  0.24
    R12 = 3.99 - 3.82 =  0.17
    R13 = 3.40 - 3.82 = -0.42
    R14 = 3.83 - 3.82 =  0.01
Residuals for Well 2:
    R21 = 3.83 - 3.97 = -0.14
    R22 = 4.34 - 3.97 =  0.37
    R23 = 3.47 - 3.97 = -0.50
    R24 = 4.22 - 3.97 =  0.25
Residuals for Well 3:
    R31 = 5.61 - 4.55 =  1.06
    R32 = 5.14 - 4.55 =  0.59
    R33 = 3.47 - 4.55 = -1.08
    R34 = 3.97 - 4.55 = -0.58
Residuals for Well 4:
    R41 = 3.53 - 4.19 = -0.66
    R42 = 4.54 - 4.19 =  0.35
    R43 = 4.26 - 4.19 =  0.07
    R44 = 4.42 - 4.19 =  0.23
Residuals for Well 5:
    R51 = 3.91 - 4.75 = -0.84
    R52 = 4.29 - 4.75 = -0.46
    R53 = 5.50 - 4.75 =  0.75
    R54 = 5.31 - 4.75 =  0.56
Residuals for Well 6:
    R61 = 5.42 - 5.25 =  0.17
    R62 = 5.21 - 5.25 = -0.04
    R63 = 5.29 - 5.25 =  0.04
    R64 = 5.08 - 5.25 = -0.17
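The residuals can also be generated programmatically, which is convenient when they are to be passed to a normality check. The sketch below is illustrative only, with hypothetical variable names. It subtracts the unrounded well mean from each observation, so some values may differ from the listing above by 0.01, because the listing uses well means rounded to two decimals.

```python
# Natural logs of lead concentrations from Table 5-2, keyed by well number.
wells = {
    1: [4.06, 3.99, 3.40, 3.83],
    2: [3.83, 4.34, 3.47, 4.22],
    3: [5.61, 5.14, 3.47, 3.97],
    4: [3.53, 4.54, 4.26, 4.42],
    5: [3.91, 4.29, 5.50, 5.31],
    6: [5.42, 5.21, 5.29, 5.08],
}

residuals = {}
for well, values in wells.items():
    well_mean = sum(values) / len(values)               # predicted value for each observation
    residuals[well] = [x - well_mean for x in values]   # R_ij = X_ij - mean of well i

for well, r in residuals.items():
    print(well, [round(v, 2) for v in r])
```

Within each well the residuals sum to zero, which is a quick check that the well means were computed correctly.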
INTERPRETATION
The F test was significant at the 5% level. The Bonferroni multiple
comparisons procedure was then used to determine for which wells there was
statistically significant evidence of contamination. Of the four differences
$\bar{X}_{i.} - \bar{X}_b$, only $\bar{X}_{6.} - \bar{X}_b$ = 1.36 exceeds the critical value of 0.868. From
this it is concluded that there is significant evidence of contamination at
Well 6. Well 5 is right on the boundary of significance. It is likely that
Well 6 has intercepted a plume of contamination with Well 5 being on the edge
of the plume.
All the compliance well concentrations were somewhat above the mean con-
centration of the background levels. The well means should be used to indi-
cate the location of the plume. The findings should be reported to the
Regional Administrator.
5.2.2 One-way Nonparametric Analysis of Variance
This procedure is appropriate for interwell comparisons when the data or
the residuals from a parametric ANOVA have been found to be significantly dif-
ferent from normal and when a log transformation fails to adequately normalize
the data. In one-way nonparametric ANOVA, the assumption under the null
hypothesis is that the data from each well come from the same continuous dis-
tribution and hence have the same median concentrations of a specific hazard-
ous constituent. The alternatives of interest are that the data from some
wells show increased levels of the hazardous constituent in question.
The procedure is called the Kruskal-Wallis test. For meaningful results,
there should be at least three groups with a minimum sample size of three in
each group. For large data sets use of a computer program is recommended. In
the case of large data sets a good approximation to the procedure is to re-
place each observation by its rank (its numerical place when the data are
ordered from least to greatest) and perform the (parametric) one-way analysis
of variance (Section 5.2.1) on the ranks. Such an approach can be done with
some commercial statistical packages such as SAS.
PURPOSE
The purpose of the procedure is to test the hypothesis that all wells (or
groups of wells) around regulated units have the same median concentration of
a hazardous constituent. If the wells are found to differ, post-hoc compari-
sons are again necessary to determine if contamination is present.
Note that the wells define the groups. All wells will have at least four
observations. Denote the number of groups by K and the number of observations
in each group by $n_i$, with N being the total number of all observations. Let
$X_{ij}$ denote the jth observation in the ith group, where j runs from 1 to the
number of observations in the group, $n_i$, and i runs from 1 to the number of
groups, K.
PROCEDURE
Step 1. Rank all N observations of the groups from least to greatest.
Let $R_{ij}$ denote the rank of the jth observation in the ith group. As a
convention, denote the background well(s) as group 1.
Step 2. Add the ranks of the observations in each group. Call the sum
of the ranks for the ith group $R_i$. Also calculate the average rank for each
group, $\bar{R}_i = R_i / n_i$.
Step 3. Compute the Kruskal-Wallis statistic:

$$H = \left[\frac{12}{N(N+1)} \sum_{i=1}^{K} \frac{R_i^2}{n_i}\right] - 3(N+1)$$
Step 4. Compare the calculated value H to the tabulated chi-squared
value with (K-1) degrees of freedom, where K is the number of groups (Table 1,
Appendix B). Reject the null hypothesis if the computed value exceeds the
tabulated critical value.
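Steps 1 through 4 can be sketched in a few lines for the simple case in which no two observations are equal. The function name and the data below are hypothetical, and the statistic is the standard Kruskal-Wallis form, 12/(N(N+1)) times the sum of $R_i^2/n_i$, minus 3(N+1). Tied values need the average-rank treatment described later in this section and are not handled here.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H and the rank sums, assuming no tied observations."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    n_total = len(pooled)
    rank_sums = [sum(rank_of[x] for x in g) for g in groups]   # R_i for each group
    h = (12 / (n_total * (n_total + 1))
         * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
         - 3 * (n_total + 1))
    return h, rank_sums


# Hypothetical data: group 1 is the background, groups 2 and 3 are compliance wells.
example = [[1.2, 2.3, 3.1], [4.0, 5.5, 6.1], [7.7, 8.2, 9.9]]
h, rank_sums = kruskal_wallis_h(example)
critical = 5.99   # chi-squared critical value for K - 1 = 2 degrees of freedom at the 5% level
print(round(h, 2), rank_sums, h > critical)
```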
Step 5. If the computed value exceeds the value from the chi-squared
table, compute the critical difference for well comparisons to the background,
assumed to be group 1:

$$C_i = Z_{\alpha/(K-1)} \left[\frac{N(N+1)}{12}\right]^{1/2} \left[\frac{1}{n_1} + \frac{1}{n_i}\right]^{1/2}$$

for i taking values 2,..., K, where $Z_{\alpha/(K-1)}$ is the upper
($\alpha$/(K-1))-percentile from the standard normal distribution found in
Table 4, Appendix B. Note: If there are more than five compliance wells at the
regulated unit (K > 6), use $Z_{.01}$, the upper one-percentile from the
standard normal distribution.
Step 6. Form the differences of the average ranks for each group to the
background and compare these with the critical values found in step 5 to
determine which wells give evidence of contamination. That is, compare
$\bar{R}_i - \bar{R}_1$ to $C_i$ for i taking the values 2 through K. (Recall
that group 1 is the background.)
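Steps 5 and 6 can be sketched as below. This assumes the critical difference takes the usual large-sample form, $C_i = Z [N(N+1)/12]^{1/2} [1/n_1 + 1/n_i]^{1/2}$, with the normal percentile taken from Python's `statistics.NormalDist` instead of Table 4. The function name, the group sizes and the average ranks are hypothetical.

```python
import math
from statistics import NormalDist


def critical_differences(group_sizes, alpha=0.05):
    """Critical differences C_i for groups 2..K versus background group 1."""
    k = len(group_sizes)
    n_total = sum(group_sizes)
    # With more than five compliance wells (K > 6), use the upper 1% point.
    tail = 0.01 if k > 6 else alpha / (k - 1)
    z = NormalDist().inv_cdf(1 - tail)
    spread = math.sqrt(n_total * (n_total + 1) / 12)
    n_1 = group_sizes[0]
    return [z * spread * math.sqrt(1 / n_1 + 1 / n_i) for n_i in group_sizes[1:]]


# Hypothetical layout: a background group and two compliance wells, four samples each.
sizes = [4, 4, 4]
average_ranks = [3.0, 6.5, 10.0]   # hypothetical average ranks; group 1 is the background
c = critical_differences(sizes)
flagged = [i + 2 for i, c_i in enumerate(c)
           if average_ranks[i + 1] - average_ranks[0] > c_i]
print([round(v, 2) for v in c], flagged)
```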
While the above steps are the general procedure, some details need to be
specified further to handle special cases. First, it may happen that two or
more observations are numerically equal or tied. When this occurs, determine
the ranks that the tied observations would have received if they had been
slightly different from each other, but still in the same places with respect
to the rest of the observations. Add these ranks and divide by the number of
observations tied at that value to get an average rank. This average rank is
used for each of the tied observations. This same procedure is repeated for
any other groups of tied observations. Second, if there are any values below
detection, consider all values below detection as tied at zero. (It is
irrelevant what number is assigned to nondetected values as long as all such
values are assigned the same number, and it is smaller than any detected or
quantified value.)
The effect of adjusting for tied observations is to increase the value of the
statistic, H. Unless there are many observations tied at the same value, the
effect of ties on the computed test statistic is negligible (in practice, the
effect of ties can probably be neglected unless some group contains 10 percent
of the observations all tied, which is most likely to occur for concentrations
below detection limit). In the present context, the term "negligible" can be
more specifically defined as follows. Compute the Kruskal-Wallis statistic
without the adjustment for ties. If the test statistic is significant at the
5% level then conclude the test since the statistic with correction for ties
will be significant as well. If the test statistic falls between the 10% and
the 5% critical values, then proceed with the adjustment for ties as shown
below.
ADJUSTMENT FOR TIES
If there are 50% or more observations that fell below the detection
limit, then this method for adjustment for ties is inappropriate. The user is
referred to Section 8 "Miscellaneous Topics." Otherwise, if there are tied
values present in the data, use the following correction for the H statistic:
$$ H' = \frac{H}{1 - \sum_{i=1}^{g} T_i / (N^3 - N)} $$
where g = the number of groups of distinct tied observations and $T_i = (t_i^3 - t_i)$,
where $t_i$ is the number of observations in the tied group i. Note that unique
observations can be considered groups of size 1, with the corresponding
$T_i = (1^3 - 1) = 0$.
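The ranking, the statistic H, and the adjustment for ties described in this procedure can be sketched in Python as follows. This is a minimal illustration rather than part of the guidance: the function names `average_ranks` and `kruskal_wallis` and the sample concentrations are hypothetical, and nondetects are assumed to have been set to zero beforehand.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from least to greatest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end + 2) / 2.0  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, H adjusted for ties, average rank per group).

    groups[0] is taken to be the background well(s), as in Step 1.
    """
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)

    h = (12.0 / (n_total * (n_total + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3.0 * (n_total + 1))

    # T_i = t_i^3 - t_i for each distinct value; untied values contribute 0.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    h_adjusted = h / (1.0 - tie_sum / (n_total ** 3 - n_total))

    avg_ranks = [r / len(g) for r, g in zip(rank_sums, groups)]
    return h, h_adjusted, avg_ranks


# Hypothetical data: two nondetects (coded 0) in the background group.
h, h_adj, avg = kruskal_wallis([[0.0, 0.0], [1.0, 2.0]])
print(round(h, 2), round(h_adj, 2), avg)
```

The adjusted value is never smaller than the unadjusted one, which is why an unadjusted H that is already significant does not need the correction.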
REFERENCE
Hollander, Myles, and D. A. Wolfe. 1973. Nonparametric Statistical
Methods. John Wiley and Sons, New York.
EXAMPLE
The data in Table 5-4 represent benzene concentrations in water samples
taken at one background and five compliance wells.
Step 1. The 20 observations have been ranked from least to greatest.
The limit of detection was 1.0 ppm. Note that two values in Well 4 were below
detection and were assigned value zero. These two are tied for the smallest
value and have consequently been assigned the average of the two ranks 1 and
2, or 1.5. The ranks of the observations are indicated in parentheses after
the observation in Table 5-4. Note that there are 3 observations tied at 1.3
that would have had ranks 4, 5, and 6 if they had been slightly different.
These three have been assigned the average rank of 5 resulting from averaging
4, 5, and 6. Other ties occurred at 1.5 (ranks 7 and 8) and 1.9 (ranks 11 and
12).
Step 2. The values of the sums of ranks and average ranks are indicated
at the bottom of Table 5-4.
Step 3. Compute the Kruskal-Wallis statistic
$$ H = \frac{12}{20(20+1)} \left( \frac{34^2}{4} + \cdots + \frac{35.5^2}{3} \right) - 3(20+1) = 14.68 $$
[Table 5-4: benzene concentrations, with ranks in parentheses, for the
background well (Well 1) and five compliance wells (Wells 2 to 6). The
individual entries are not legible in this copy. The summary rows, as used in
Steps 3 through 6 of the example, are:]
                 Well 1   Well 2   Well 3   Well 4   Well 5   Well 6
n_i                 4        3        3        4        3        3
Sum of ranks       34       57      15.5      21       47      35.5
Average rank       8.5     19.0     5.17     5.25    15.67    11.83
K = 6, N = 20
ADJUSTMENT FOR TIES
There are four groups of ties in the data of Table 5-4:
$T_1 = (2^3 - 2) = 6$ for the 2 observations of 1.9.
$T_2 = (2^3 - 2) = 6$ for the 2 observations of 1.5.
$T_3 = (3^3 - 3) = 24$ for the 3 observations of 1.3.
$T_4 = (2^3 - 2) = 6$ for the 2 observations of 0.
Thus
$$ \sum_{i=1}^{4} T_i = 6 + 6 + 24 + 6 = 42 $$
and
$$ H' = \frac{14.68}{1 - 42/(20^3 - 20)} = \frac{14.68}{0.995} = 14.76, $$
a negligible change from 14.68.
Step 4. To test the null hypothesis of no contamination, obtain the
critical chi-squared value with (6-1) = 5 degrees of freedom at the 5%
significance level from Table 1, Appendix B. The value is 11.07. Compare the
calculated value, H', with the tabulated value. Since 14.76 is greater than
11.07, reject the hypothesis of no contamination at the 5% level. If the site
was in detection monitoring it should move into compliance monitoring. If the
site was in compliance monitoring it should move into corrective action. If
the site was in corrective action it should stay there.
In the case where the hydraulically upgradient wells serve as the back-
ground against which the compliance wells are to be compared, comparisons of
each compliance well with the background wells should be performed in addition
to the analysis of variance procedure. In this example, data from each of the
compliance wells would be compared with the background well data. This
comparison is accomplished as follows. The average ranks for each group, $\bar{R}_i$, are
used to compute differences. If a group of compliance wells for a regulated
unit have larger concentrations than those found in the background wells, the
average rank for the compliance wells at that unit will be larger than the
average rank for the background wells.
Step 5. Calculate the critical values to compare each compliance well
to the background well.
In this example, K=6, so there are 5 comparisons of the compliance wells
with the background wells. Using an experimentwise significance level of $\alpha$ =
0.05, we find the upper 0.05/5 = 0.01 percentile of the standard normal
distribution to be 2.33 (Table 4, Appendix B). The total sample size, N, is
20. The approximate critical value, $C_2$, is computed for compliance Well 2,
which has the largest average rank, as:
$$ C_2 = 2.33 \left[ \frac{20(21)}{12} \right]^{1/2} \left[ \frac{1}{4} + \frac{1}{3} \right]^{1/2} = 10.5 $$
The critical values for the other wells are:
10.5 for Wells 3, 5, and 6; and
9.8 for Well 4.
Step 6. Compute the differences between the average rank of each com-
pliance well and the average rank of the background well:
Differences                      Critical values
D2 = 19.0 - 8.5 = 10.5            C2 = 10.5
D3 = 5.17 - 8.5 = -3.33           C3 = 10.5
D4 = 5.25 - 8.5 = -3.25           C4 = 9.8
D5 = 15.67 - 8.5 = 7.17           C5 = 10.5
D6 = 11.83 - 8.5 = 3.33           C6 = 10.5
Compare each difference with the corresponding critical difference. D2 = 10.5
equals the critical value of C2 = 10.5. We conclude that the concentration of
benzene averaged over compliance Well 2 is significantly greater than that at
the background well. None of the other compliance wells has a benzene
concentration significantly higher than the average background value. Based upon
these results, only compliance Well 2 can be singled out as being
contaminated.
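The critical differences and rank differences of Steps 5 and 6 can be reproduced with a short Python sketch. The function name `critical_differences` is hypothetical, and the group sizes are those implied by the example (four observations for Wells 1 and 4, three for the others). The normal percentile is computed with the standard library rather than read from Table 4, so the critical values differ slightly from the rounded ones in the text; in particular, the Well 2 difference sits essentially at its critical value.

```python
from math import sqrt
from statistics import NormalDist


def critical_differences(group_sizes, alpha=0.05):
    """Critical differences C_i (i = 2..K) for comparing each compliance well
    to the background group, which is group_sizes[0]."""
    k = len(group_sizes)
    n_total = sum(group_sizes)
    # Comparisonwise level alpha/(K-1), but never below 0.01 (the K > 6 rule).
    level = max(alpha / (k - 1), 0.01)
    z = NormalDist().inv_cdf(1.0 - level)
    spread = sqrt(n_total * (n_total + 1) / 12.0)
    n_background = group_sizes[0]
    return [z * spread * sqrt(1.0 / n_background + 1.0 / n_i)
            for n_i in group_sizes[1:]]


# Group sizes implied by the example and the average ranks quoted in Step 6
# (Well 1 is the background well).
sizes = [4, 3, 3, 4, 3, 3]
avg_ranks = [8.5, 19.0, 5.17, 5.25, 15.67, 11.83]

crit = critical_differences(sizes)
diffs = [r - avg_ranks[0] for r in avg_ranks[1:]]
for well, (d, c) in enumerate(zip(diffs, crit), start=2):
    print(f"Well {well}: difference = {d:6.2f}, critical value = {c:5.2f}")
```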
For data sets with more than 30 observations, the parametric analysis of
variance performed on the rank values is a good approximation to the Kruskal-
Wallis test (Quade, 1966). If the user has access to SAS, the PROC RANK pro-
cedure is used to obtain the ranks of the data. The analysis of variance pro-
cedure detailed in Section 5.2.1 is then performed on the ranks. Contrasts
are tested as in the parametric analysis of variance.
INTERPRETATION
The Kruskal-Wallis test statistic is compared to the tabulated critical
value from the chi-squared distribution. If the test statistic does not
exceed the tabulated value, there is no statistically significant evidence of
contamination and the analysis would stop and report this finding. If the
test statistic exceeds the tabulated value, there is significant evidence that
the hypothesis of no differences in compliance concentrations from the back-
ground level is not true. Consequently, if the test statistic exceeds the
critical value, one concludes that there is significant evidence of contami-
nation. One then proceeds to investigate where the differences lie, that is,
which wells are indicating contamination.
The multiple comparisons procedure described in steps 5 and 6 compares
each compliance well to the background well. This determines which compliance
wells show statistically significant evidence of contamination at an experi-
mentwise error rate of 5 percent. In many cases, inspection of the mean or
median concentrations will be sufficient to indicate where the problem lies.
5.3 TOLERANCE INTERVALS BASED ON THE NORMAL DISTRIBUTION
An alternate approach to analysis of variance to determine whether there
is statistically significant evidence of contamination is to use tolerance
intervals. A tolerance interval is constructed from the data on (uncontam-
inated) background wells. The concentrations from compliance wells are then
compared with the tolerance interval. With the exception of pH, if the com-
pliance concentrations do not fall in the tolerance interval, this provides
statistically significant evidence of contamination.
Tolerance intervals are most appropriate for use at facilities that do
not exhibit high degrees of spatial variation between background wells and
compliance wells. Facilities that overlie extensive, homogeneous geologic
deposits (for example, thick, homogeneous lacustrine clays) that do not natu-
rally display hydrogeochemical variations may be suitable for this statistical
method of analysis.
A tolerance interval establishes a concentration range that is con-
structed to contain a specified proportion (P%) of the population with a
specified confidence coefficient, $\gamma$. The proportion of the population
included, P, is referred to as the coverage. The probability with which the
tolerance interval includes the proportion P% of the population is referred to
as the tolerance coefficient.
A coverage of 95% is recommended. If this is used, random observations
from the same distribution as the background well data would exceed the upper
tolerance limit less than 5% of the time. Similarly, a tolerance coefficient
of 95% is recommended. This means that one has a confidence level of 95% that
the upper 95% tolerance limit will contain at least 95% of the distribution of
observations from background well data. These values were chosen to be con-
sistent with the performance standards described in Section 2. The use of
these values corresponds to the selection of $\alpha$ of 5% in the multiple well
testing situation.
The procedure can be applied with as few as three observations from the
background distribution. However, doing so would result in a large upper
tolerance limit. A sample size of eight or more results in an adequate
tolerance interval. The minimum sampling schedule called for in the regulations
would result in at least four observations from each background well. Only if
a single background well is sampled at a single point in time is the sample
size so small as to make use of the procedure questionable.
Tolerance intervals can be constructed assuming that the data or the
transformed data are normally distributed. Tolerance intervals can also be
constructed assuming other distributions. It is also possible to construct
nonparametric tolerance intervals using only the assumption that the data came
from some continuous population. However, the nonparametric tolerance
intervals require such a large number of observations to provide a reasonable
coverage and tolerance coefficient that they are impractical in this
application.
The range of the concentration data in the background well samples should
be considered in determining whether the tolerance interval approach should be
used, and if so, what distribution is appropriate. The background well con-
centration data should be inspected for outliers and tests of normality
applied before selecting the tolerance interval approach. Tests of normality
were presented in Section 4.2. Note that in this case, the test of normality
would be applied to the background well data that are used to construct the
tolerance interval. These data should all be from the same normal
distribution.
In this application, unless pH is being monitored, a one-sided tolerance
interval or an upper tolerance limit is desired, since contamination is indi-
cated by large concentrations of the hazardous constituents monitored. Thus,
for concentrations, the appropriate tolerance interval is (0, TL), with the
comparison of importance being the larger limit, TL.
PURPOSE
The purpose of the tolerance interval approach is to define a concentra-
tion range from background well data, within which a large proportion of the
monitoring observations should fall with high probability. Once this is done,
data from compliance wells can be checked for evidence of contamination by
simply determining whether they fall in the tolerance interval. If they do
not, this is evidence of contamination.
In this case the data are assumed to be approximately normally distrib-
uted. Section 4.2 provided methods to check for normality. If the data are
not normal, take the natural logarithm of the data and see if the transformed
data are approximately normal. If so, this method can be used on the loga-
rithms of the data. Otherwise, seek the assistance of a professional
statistician.
PROCEDURE
Step 1. Calculate the mean, $\bar{X}$, and the standard deviation, S, from the
background well data.
Step 2. Construct the one-sided upper tolerance limit as
$$ TL = \bar{X} + K S, $$
where K is the one-sided normal tolerance factor found in Table 5, Appendix B.
Step 3. Compare each observation from compliance wells to the tolerance
limit found in Step 2. If any observation exceeds the tolerance limit, that
is statistically significant evidence that the well is contaminated. Note
that if the tolerance interval was constructed on the logarithms of the orig-
inal background observations, the logarithms of the compliance well observa-
tions should be compared to the tolerance limit. Alternatively the tolerance
limit may be transferred to the original data scale by taking the anti-
logarithm.
REFERENCE
Lieberman, Gerald J. 1958. "Tables for One-sided Statistical Tolerance
Limits." Industrial Quality Control. Vol. XIV, No. 10.
EXAMPLE
Table 5-5 contains example data that represent lead concentration levels
in parts per million in water samples at a hypothetical facility. The
background well data are in columns 1 and 2, while the other four columns
represent compliance well data.
TABLE 5-5. EXAMPLE DATA FOR NORMAL TOLERANCE INTERVAL

                     Lead concentrations (ppm)
          Background well            Compliance wells
Date        A       B       Well 1    Well 2    Well 3    Well 4
Jan 1      58.0    46.1     273.1*     34.1      49.9     225.9*
Feb 1      54.1    76.7     170.7*     93.7      73.0     183.1*
Mar 1      30.0    32.1      32.1      70.8     244.7*    198.3*
Apr 1      46.1    68.0      53.0      83.1     202.4*    160.8*

n    =  8
Mean = 51.4
SD   = 16.3

The upper 95% coverage tolerance limit with tolerance coefficient of 95% is
51.4 + (3.188)(16.3) = 103.4

* Indicates contamination
Step 1. The mean and standard deviation of the n = 8 observations have
been calculated for the background well. The mean is 51.4 and the standard
deviation is 16.3.
Step 2. The tolerance factor for a one-sided normal tolerance interval
is found from Table 5, Appendix B as 3.188. This is for 95% coverage with
probability 95% and for n = 8. The upper tolerance limit is then calculated
as 51.4 + (3.188)(16.3) = 103.4.
Step 3. The tolerance limit of 103.4 is compared with the compliance
well data. Any value that exceeds the tolerance limit indicates statistically
significant evidence of contamination. Two observations from Well 1, two
observations from Well 3, and all four observations from Well 4 exceed the
tolerance limit. Thus there is statistically significant evidence of con-
tamination at Wells 1, 3, and 4.
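The three steps of this example can be checked with a few lines of Python. This is only a sketch: the function name `upper_tolerance_limit` is hypothetical, and the tolerance factor is entered by hand as the value the example reads from Table 5, Appendix B. Carrying the unrounded mean and standard deviation gives a limit slightly below the 103.4 obtained from the rounded inputs; the wells flagged are the same.

```python
from statistics import mean, stdev

# Lead concentrations (ppm) from Table 5-5.
background = [58.0, 54.1, 30.0, 46.1, 46.1, 76.7, 32.1, 68.0]
compliance = {
    "Well 1": [273.1, 170.7, 32.1, 53.0],
    "Well 2": [34.1, 93.7, 70.8, 83.1],
    "Well 3": [49.9, 73.0, 244.7, 202.4],
    "Well 4": [225.9, 183.1, 198.3, 160.8],
}

# One-sided tolerance factor for n = 8, 95% coverage, 95% confidence
# (the value the example reads from Table 5, Appendix B).
K_FACTOR = 3.188


def upper_tolerance_limit(data, k_factor):
    """TL = mean + K * S, with S the sample standard deviation (n - 1 divisor)."""
    return mean(data) + k_factor * stdev(data)


tl = upper_tolerance_limit(background, K_FACTOR)
# Collect, for each compliance well, the observations above the limit.
exceedances = {well: [x for x in obs if x > tl] for well, obs in compliance.items()}
flagged = [well for well, hits in exceedances.items() if hits]
print(f"TL = {tl:.1f} ppm; wells with exceedances: {flagged}")
```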
INTERPRETATION
A tolerance limit with 95% coverage gives an upper bound below which 95%
of the observations of the distribution should fall. The tolerance coeffi-
cient used here is 95%, implying that at least 95% of the observations should
fall below the tolerance limit with probability 95%, if the compliance well
data come from the same distribution as the background data. In other words,
in this example, we are 95% certain that 95% of the background lead concentra-
tions are below 104 ppm. If observations exceed the tolerance limit, this is
evidence that the compliance well data are not from the same distribution, but
rather are from a distribution with higher concentrations. This is inter-
preted as statistically significant evidence of contamination.
5.4 PREDICTION INTERVALS
A prediction interval is a statistical interval calculated to include one
or more future observations from the same population with a specified confi-
dence. This approach is algebraically equivalent to the average replicate
(AR) test that is presented in the Technical Enforcement Guidance Document
(TEGD), September 1986. In ground-water monitoring, a prediction interval
approach may be used to make comparisons between background and compliance
well data. This method of analysis is similar to that for calculating a
tolerance limit, and familiarity with prediction intervals or personal prefer-
ence would be the only reason for selecting them over the method for tolerance
limits. The concentrations of a hazardous constituent in the background wells
are used to establish an interval within which K future observations from the
same population are expected to lie with a specified confidence. Then each of
K future observations of compliance well concentrations is compared to the
prediction interval. The interval is constructed to contain all of K future
observations with the stated confidence. If any future observation exceeds
the prediction interval, this is statistically significant evidence of contam-
ination. In application, the number of future observations to be collected,
K, must be specified. Thus, the prediction interval is constructed for a
specified time period in the future. One year is suggested. The interval can
be constructed either to contain all K individual observations with a
specified probability, or to contain the K' means observed at the K' sampling
periods.
The prediction interval presented here is constructed assuming that the
background data all follow the same normal distribution. If that is not the
case (see Section 4.2 for tests of normality), but a log transformation
results in data that are adequately normal on the log scale, then the interval
may still be used. In this case, use the data after transforming by taking
the logarithm. The future observations need to also be transformed by taking
logarithms before comparison to the interval. (Alternatively, the end points
of the interval could be converted back to the original scale by taking their
anti-logarithms.)
PURPOSE
The prediction interval is constructed so that K future compliance well
observations can be tested by determining whether they lie in the interval or
not. If not, evidence of contamination is found. Note that the number of
future observations, K, for which the interval is to be used, must be speci-
fied in advance. In practice, an owner or operator would need to construct
the prediction interval on a periodic (at least yearly) basis, using the most
recent background data. The interval is described using the 95% confidence
factor appropriate for individual well comparisons. It is recommended that a
one-sided prediction interval be constructed for the mean of the four observa-
tions from each compliance well at each sampling period.
PROCEDURE
Step 1. Calculate the mean, $\bar{X}$, and the standard deviation, S, for the
background well data (used to form the prediction interval).
Step 2. Specify the number of future observations for a compliance well
to be included in the interval, K. Then the interval is given by
$$ \left[ 0, \; \bar{X} + S \sqrt{\frac{1}{m} + \frac{1}{n}} \; t_{(n-1,\,K,\,0.95)} \right] $$
where it is assumed that the mean of the m observations taken at the K
sampling periods will be used. Here n is the number of observations in the
background data, and $t_{(n-1, K, 0.95)}$ is found from Table 3 in Appendix B. The
table is entered with K as the number of future observations, and degrees of
freedom, $\nu = n-1$. If K > 5, use the column for K = 5.
Step 3. Once the interval has been calculated, at each sampling period,
the mean of the m compliance well observations is obtained. This mean is com-
pared to see if it falls in the interval. If it does, this is reported and
monitoring continues. If a mean concentration at a sampling period does not
fall in the prediction interval, this is statistically significant evidence of
contamination. This is also reported and the appropriate action taken.
REMARK
For a single future observation, t is given by the t-distribution found
in Table 6 of Appendix B. In general, the interval to contain K future means
of sample size m each is given by
$$ \left[ 0, \; \bar{X} + S \sqrt{\frac{1}{m} + \frac{1}{n}} \; t_{(n-1,\,K,\,0.95)} \right] $$
where t is as before from Table 3 of Appendix B and where m is the number of
observations in each mean. Note that for K single observations, m=l, while
for the mean of four samples from a compliance well, m=4.
Note, too, that the prediction intervals are one-sided, giving a value
that should not be exceeded by the future observations. The 5% experimentwise
significance level is used with the Bonferroni approach. However, to ensure
that the significance level for the individual comparisons does not go below
1%, $\alpha/K$ is restricted to be 1% or larger. If more than five comparisons are
used, the comparisonwise significance level of 1% is used, implying that the
experimentwise level may exceed 5%.
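The restriction on the Bonferroni level can be illustrated with a small Python sketch. The function name `comparisonwise_alpha` and the choice of K values are hypothetical; the product of K and the comparisonwise level is the usual Bonferroni upper bound on the experimentwise error rate.

```python
def comparisonwise_alpha(n_comparisons, experimentwise=0.05, floor=0.01):
    """Bonferroni level alpha/K, restricted to be no smaller than the floor."""
    return max(experimentwise / n_comparisons, floor)


for k in (1, 2, 5, 8):
    a = comparisonwise_alpha(k)
    # k * a is the Bonferroni upper bound on the experimentwise error rate.
    print(f"K = {k}: comparisonwise = {a:.3f}, experimentwise bound = {k * a:.2f}")
```

For K = 8 the comparisonwise level stays at 1%, so the bound on the experimentwise rate rises to 8%.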
EXAMPLE
Table 5-6 contains chlordane concentrations measured at a hypothetical
facility. Twenty-four background observations are available and are used to
develop the prediction interval. The prediction interval is applied to K=2
sampling periods with m=4 observations at a single compliance well each.
Step 1. Find the mean and standard deviation of the 24 background well
measurements. These are 101 and 11, respectively.
Step 2. There are K = 2 future observations of means of 4 observations
to be included in the prediction interval. Entering Table 3 of Appendix B at
K = 2 and 20 degrees of freedom (the nearest entry to the 23 degrees of
freedom), we find $t_{(20, 2, 0.95)}$ = 2.09. The interval is given by
$$ [0, \; 101 + (11)(2.09)(1/4 + 1/24)^{1/2}] = [0, 113.4]. $$
Step 3. The mean of the four compliance well observations at each of
the two sampling periods is found and compared with the interval found in
Step 2. The mean of the first sampling period is 122 and that for the second
sampling period is 113. Comparing the first of these to the prediction inter-
val for two means based on samples of size 4, we find that the mean exceeds
the upper limit of the prediction interval. This is statistically significant
evidence of contamination and should be reported to the Regional Administra-
tor. Since the second sampling period mean is within the prediction interval,
the Regional Administrator may allow the facility to remain in its current
stage of monitoring.
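The calculation in this example can be reproduced with a short Python sketch. The function name `upper_prediction_limit` is hypothetical, and the t factor is entered by hand as the value the example reads from Table 3, Appendix B; the data are the chlordane concentrations of Table 5-6.

```python
from math import sqrt
from statistics import mean, stdev

# Chlordane concentrations (ppb) from Table 5-6.
background = [97, 103, 104, 85, 120, 105, 104, 108, 110, 95, 102, 78,
              105, 94, 110, 111, 80, 106, 115, 105, 100, 93, 89, 113]
periods = {
    "July 1, 1986": [123, 120, 116, 128],
    "October 1, 1986": [116, 117, 119, 101],
}

# Value the example reads from Table 3, Appendix B (K = 2, 20 degrees of freedom).
T_FACTOR = 2.09


def upper_prediction_limit(data, m, t_factor):
    """Upper limit for the mean of m future observations:
    mean + S * sqrt(1/m + 1/n) * t, with n the number of background values."""
    n = len(data)
    return mean(data) + stdev(data) * sqrt(1.0 / m + 1.0 / n) * t_factor


limit = upper_prediction_limit(background, m=4, t_factor=T_FACTOR)
# Compare each sampling-period mean with the upper limit.
exceeds = {date: mean(obs) > limit for date, obs in periods.items()}
print(f"upper limit = {limit:.1f} ppb; exceeds: {exceeds}")
```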
INTERPRETATION
A prediction interval is a statistical interval constructed from back-
ground sample data to contain a specified number of future observations from
the same distribution with specified probability. That is, the prediction
interval is constructed so as to have a 95% probability of containing the next
K sampling period means, provided that there is no contamination. If the
future observations are found to be in the prediction interval, this is evi-
dence that there has been no change at the facility and that no contamination
is occurring. If the future observation falls outside of the prediction
interval, this is statistical evidence that the new observation does not come
from the same distribution, that is, from the population of uncontaminated
water samples previously sampled. Consequently, if the observation is a con-
centration above the prediction interval's upper limit, it is statistically
significant evidence of contamination.
TABLE 5-6. EXAMPLE DATA FOR PREDICTION INTERVAL--CHLORDANE LEVELS

Background well data--Well 1
Sampling date        Chlordane concentration (ppb)
January 1, 1985       97   103   104    85
April 1, 1985        120   105   104   108
July 1, 1985         110    95   102    78
October 1, 1985      105    94   110   111
January 1, 1986       80   106   115   105
April 1, 1986        100    93    89   113
n    =  24
Mean = 101
SD   =  11

Compliance well data--Well 2
Sampling date        Chlordane concentration (ppb)
July 1, 1986         123   120   116   128     m = 4   Mean = 122   SD = 5
October 1, 1986      116   117   119   101     m = 4   Mean = 113   SD = 8
The prediction interval could be constructed in several ways. It can be
developed for means of observations at each sampling period, or for each in-
dividual observation at each sampling period.
It should also be noted that the estimate of the standard deviation, S,
that is used should be an unbiased estimator. The usual estimator, presented
above, assumes that there is only one source of variation. If there are other
sources of variation, such as time effects, or spatial variation in the data
used for the background, these should be included in the estimate of the
variability. This can be accomplished by use of an appropriate analysis-of-variance
model to include the other factors affecting the variability. Determination
of the components of variance in complicated models is beyond the scope
of this document and requires consultation with a professional statistician.
REFERENCE
Hahn, G. and Wayne Nelson. 1973. "A Survey of Prediction Intervals and Their
Applications." Journal of Quality Technology. 5:178-188.
-------
SECTION 6
COMPARISONS WITH MCLs OR ACLs
This section includes statistical procedures appropriate when the moni-
toring aims at determining whether ground-water concentrations of hazardous
constituents are below or above fixed concentration limits. In this situation
the maximum concentration limit (MCL) or alternate concentration limit (ACL)
is a specified concentration limit rather than being determined by the back-
ground well concentrations. Thus the applicable statistical procedures are
those that compare the compliance well concentrations estimated from sampling
with the prespecified fixed limits. Methods for comparing compliance well
concentrations to a (variable) background concentration were presented in
Section 5.
The methods applicable to the type of comparisons described in this sec-
tion include confidence intervals and tolerance intervals. A special section
deals with cases where the observations exhibit very small or no variability.
6.1 SUMMARY CHART FOR COMPARISON WITH MCLs OR ACLs
Figure 6-1 is a flow chart to aid the user in selecting and applying a
statistical method when the permit specifies an MCL or ACL.
As with each type of comparison, a determination is made first to see if
there are enough data for intra-well comparisons. If so, these should be done
in parallel with the other comparisons.
Here, whether the compliance limit is a maximum concentration limit (MCL)
or an alternate concentration limit (ACL), the recommended procedure to com-
pare the mean compliance well concentration against the compliance limit is
the construction of a confidence interval. This approach is presented in
Section 6.2.1. Section 6.2.2 adds a special case of limited variance in the
data. If the permit requires that a compliance limit is not to be exceeded
more than a specified fraction of the time, then the construction of tolerance
limits is the recommended procedure, discussed in Section 6.2.3.
6.2 STATISTICAL PROCEDURES
This section presents the statistical procedures appropriate for
comparison of ground-water monitoring data to a constant compliance limit, a
fixed standard. The interpretation of the fixed compliance limit (MCL or ACL)
is that the mean concentration should not exceed this fixed limit. An alter-
nate interpretation may be specified. The permit could specify a compliance
limit as a concentration not to be exceeded by more than a small, specified
proportion of the observations. A tolerance interval approach for such a
situation is also presented.
[Figure 6-1. Comparisons with MCLs/ACLs. Flow chart with the following box
labels: Comparisons with MCL/ACLs (Section 6); Intra-Well Comparisons if More
than 1 Yr of Data; Control Charts (Section 7); Type of Comparison; with Upper
95th Percentile; Confidence Intervals; Tolerance Limits; Are Data Normal?; Are
Log Data Normal?; Enough Data Available?; Nonparametric Confidence Intervals;
Consult with Professional Statistician; Conclusions.]
6.2.1 Confidence Intervals
When a regulated unit is in compliance monitoring with a fixed compliance
limit (either an MCL or an ACL), confidence intervals are the recommended pro-
cedure pursuant to §264.97(h)(5) in the Subpart F regulations. The unit will
remain in compliance monitoring unless there is statistically significant evi-
dence that the mean concentration at one or more of the downgradient wells
exceeds the compliance limit. A confidence interval for the mean concentra-
tion is constructed from the sample data for each compliance well individu-
ally. These confidence intervals are compared with the compliance limit. If
the entire confidence interval exceeds the compliance limit, this is statisti-
cally significant evidence that the mean concentration exceeds the compliance
limit.
Confidence intervals can generally be constructed for any specified
distribution. General methods can be found in texts on statistical inference,
some of which are referenced in Appendix C. A confidence limit based on the
normal distribution is presented first, followed by a modification for the
log-normal distribution. A nonparametric confidence interval is also
presented.
6.2.1.1 Confidence Interval Based on the Normal Distribution
PURPOSE
The confidence interval for the mean concentration is constructed from
the compliance well data. Once the interval has been constructed, it can be
compared with the MCL or ACL by inspection to determine whether the mean con-
centration significantly exceeds the MCL or ACL.
PROCEDURE
Step 1. Calculate the mean, $\bar{X}$, and standard deviation, S, of the sample
concentration values. Do this separately for each compliance well.
Step 2. For each well calculate the confidence interval as
$$ \bar{X} \pm t_{(0.99,\, n-1)} \, S / \sqrt{n} $$
where $t_{(0.99, n-1)}$ is obtained from the t-table (Table 6, Appendix B).
Generally, there will be at least four observations at each sampling period,
so t will usually have at least 3 degrees of freedom.
Step 3. Compare the intervals calculated in Step 2 to the compliance
limit (the MCL or ACL, as appropriate). If the compliance limit is contained
in the interval or is above the upper limit, the unit remains in compliance.
If any well confidence interval's lower limit exceeds the compliance limit,
this is statistically significant evidence of contamination.
REMARK
The 99th percentile of the t-distribution is used in constructing the
confidence interval. This is consistent with an alpha (probability of Type I
error) of 0.01, since the decision on compliance is made by comparing the
lower confidence limit to the MCL or ACL. Although the interval as con-
structed with both upper and lower limits is a 98% confidence interval, the
use of it is one-sided, which is consistent with the 1% alpha level of
individual well comparisons.
EXAMPLE
Table 6-1 lists hypothetical concentrations of Aldicarb in three compli-
ance wells. For illustration purposes, the MCL for Aldicarb has been set at
7 ppb. There is no evidence of nonnormality, so the confidence interval based
on the normal distribution is used.
TABLE 6-1. EXAMPLE DATA FOR NORMAL CONFIDENCE INTERVAL—ALDICARB
CONCENTRATIONS IN COMPLIANCE WELLS (ppb)
Sampling
date          Well 1    Well 2    Well 3
Jan. 1         19.9      23.7       5.6
Feb. 1         29.6      21.9       3.3
Mar. 1         18.7      26.9       2.3
Apr. 1         24.2      26.1       6.9
X =            23.1      24.6       4.5
S =             4.9       2.3       2.1
MCL = 7 ppb
Step 1. Calculate the mean and standard deviation of the concentrations
for each compliance well. These statistics are shown in the table above.
Step 2. Obtain the 99th percentile of the t-distribution with (4-1) = 3
degrees of freedom from Table 6, Appendix B as 4.541. Then calculate the con-
fidence interval for each well's mean concentration.
Well 1: $23.1 \pm 4.541(4.9)/\sqrt{4} = (12.0, 34.2)$
Well 2: $24.6 \pm 4.541(2.3)/\sqrt{4} = (19.4, 29.8)$
Well 3: $4.5 \pm 4.541(2.1)/\sqrt{4} = (-0.3, 9.3)$
where the usual convention of expressing the upper and lower limits of the
confidence interval in parentheses separated by a comma has been followed.
Step 3. Compare each confidence interval to the MCL of 7 ppb. When this
is done, the confidence interval for Well 1 lies entirely above the MCL of 7,
indicating that the mean concentration of Aldicarb in Well 1 significantly
exceeds the MCL. Similarly, the confidence interval for Well 2 lies entirely
above the MCL of 7. This is significant evidence that the mean concentration
in Well 2 exceeds the MCL. However, the confidence interval for Well 3
contains the MCL; its lower limit (-0.3) is below 7. Thus, there is no
statistically significant evidence that the mean concentration in Well 3
exceeds the MCL.
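The three steps of this example can be sketched in Python. This is a minimal illustration rather than part of the guidance itself: the function name `normal_confidence_interval` is hypothetical, and the t-value 4.541 is hard-coded from the value quoted in Step 2 instead of being looked up. Because the sketch carries the unrounded standard deviation, its limits can differ from the hand calculation by about 0.1.

```python
import math
import statistics

T_99_3DF = 4.541  # 99th percentile of t with 3 degrees of freedom, as quoted in Step 2


def normal_confidence_interval(values, t_value):
    """Return (lower, upper) = mean -/+ t * S / sqrt(n) for one well."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    half_width = t_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Aldicarb concentrations (ppb) from Table 6-1
wells = {
    "Well 1": [19.9, 29.6, 18.7, 24.2],
    "Well 2": [23.7, 21.9, 26.9, 26.1],
    "Well 3": [5.6, 3.3, 2.3, 6.9],
}
MCL = 7.0  # ppb

results = {}
for name, data in wells.items():
    lower, upper = normal_confidence_interval(data, T_99_3DF)
    # Only the lower limit is compared with the compliance limit (one-sided use).
    results[name] = (lower, upper, lower > MCL)
    print(f"{name}: ({lower:.1f}, {upper:.1f}) lower limit exceeds MCL: {lower > MCL}")
```

Under these assumptions the sketch flags Wells 1 and 2 and not Well 3, matching the conclusion of Step 3.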
INTERPRETATION
The confidence interval is an interval constructed so that it should contain
the true or population mean with specified confidence (98% in this
case). If this interval does not contain the compliance limit, then there is
statistically significant evidence that the mean concentration differs from
the compliance limit. If the lower end of the interval is above the compliance
limit, then the mean concentration is significantly greater than the
compliance limit, indicating noncompliance.
6.2.1.2 Confidence Interval for Log-Normal Data
PURPOSE
The purpose of a confidence interval for the mean concentration of log-
normal data is to determine whether there is statistically significant
evidence that the mean concentration exceeds a fixed compliance limit. The
interval gives a range that includes the true mean concentration with
confidence 98%. The lower limit will be below the true mean with confidence
99%, corresponding to an alpha of 1%.
PROCEDURE
This procedure is used to construct a confidence interval for the mean
concentration from the compliance well data when the data are log-normal (that
is, when the logarithms of the data are normally distributed). Once the
interval has been constructed, it can be compared with the MCL or ACL by
inspection to determine whether the mean concentration significantly exceeds
the MCL or ACL. Throughout the following procedures and examples, natural
logarithms (ln) are used.
Step 1. Take the natural logarithm of each data point (concentration
measurement). Also, take the natural logarithm of the compliance limit.
Step 2. Calculate the sample mean and standard deviation of the log-
transformed data from each compliance well. (This is Step 1 of the previous
section, working now with logarithms.)
Step 3. Form the confidence intervals for each compliance well as
$\bar{X} \pm t_{(0.99,\,n-1)}\, S/\sqrt{n}$
where $t_{(0.99,\,n-1)}$ is from the t-distribution in Table 6 of Appendix B. Here
t will typically have 3 degrees of freedom.
Step 4. Compare the confidence intervals found in Step 3 to the
logarithm of the compliance limit found in Step 1. If the lower limit of the
confidence interval exceeds the logarithm of the compliance limit,
there is statistically significant evidence that the unit is out of
compliance. Otherwise, the unit is in compliance.
EXAMPLE
Table 6-2 contains EDB concentration data from three compliance wells at
a hypothetical site. The MCL is assumed to be 20 ppb. For demonstration pur-
poses, the data are assumed not normal; a natural log-transformation
normalized them adequately. The lower part of the table contains the natural
logarithms of the concentrations.
TABLE 6-2. EXAMPLE DATA FOR LOG-NORMAL CONFIDENCE INTERVAL—EDB
CONCENTRATIONS IN COMPLIANCE WELLS (ppb)
Sampling
date          Well 1    Well 2    Well 3
Concentrations
Jan. 1         24.2      39.7      55.7
Apr. 1         10.2      75.7      17.0
Jul. 1         17.4      60.2      97.8
Oct. 1         39.7      10.9      25.3
X =            22.9      46.6      49.0
S =            12.6      28.0      36.6
MCL = 20 ppb
Natural log concentrations
Jan. 1         3.19      3.68      4.02
Apr. 1         2.32      4.33      2.84
Jul. 1         2.85      4.10      4.58
Oct. 1         3.68      2.39      3.23
X =            3.01      3.62      3.67
S =            0.57      0.86      0.78
ln (MCL) = 3.00
Step 1. The logarithms of the data are used to calculate a confidence
interval. Take the natural log of the concentrations in the top part of
Table 6-2 to find the values given in the lower part of the table. For exam-
ple, ln(24.2) = 3.19, . . ., ln(25.3) = 3.23. Also, take the logarithm of the
MCL to find that ln(20) = 3.00.
Step 2. Calculate the mean and standard deviation of the log concentra-
tions for each compliance well. These are shown in the table.
Step 3. Form the confidence intervals for each compliance well.
Well 1: $3.01 \pm 4.541(0.57)/\sqrt{4} = (1.72, 4.30)$
Well 2: $3.62 \pm 4.541(0.86)/\sqrt{4} = (1.67, 5.57)$
Well 3: $3.67 \pm 4.541(0.78)/\sqrt{4} = (1.90, 5.44)$
where 4.541 is the value obtained from the t-table (Table 6 in Appendix B) as
in the previous example.
Step 4. Compare the individual well confidence intervals with the MCL
(expressed on the log scale). The natural log of the MCL of 20 ppb is 3.00.
None of the individual well confidence intervals for the mean has a lower
limit that exceeds this value, so none of the individual well mean
concentrations significantly exceeds the MCL.
Note: The lower and upper limits of the confidence interval for each
well's mean concentration could be converted back to the original scale by
taking antilogs. For example, on the original scale, the confidence intervals
would be:
Well 1: (exp(1.72), exp(4.30)) or (5.58, 73.70)
Well 2: (exp(1.67), exp(5.57)) or (5.31, 262.43)
Well 3: (exp(1.90), exp(5.44)) or (6.69, 230.44)
These limits could be compared directly with the MCL of 20 ppb. It is gen-
erally easier to take the logarithm of the MCL rather than the antilogarithm
of all of the intervals for comparison.
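The log-normal procedure can be sketched the same way. This is an illustrative sketch only: `lognormal_confidence_interval` is a hypothetical name, the t-value is the 4.541 quoted in the example, and the logarithms are carried unrounded, so the limits may differ from the hand calculation in the second decimal place.

```python
import math
import statistics

T_99_3DF = 4.541  # t-value quoted in the example (3 degrees of freedom)


def lognormal_confidence_interval(values, t_value):
    """Confidence interval for the mean of ln(values); limits are on the log scale."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean = statistics.mean(logs)
    s = statistics.stdev(logs)  # sample standard deviation of the logs
    half_width = t_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# EDB concentrations (ppb) from Table 6-2
wells = {
    "Well 1": [24.2, 10.2, 17.4, 39.7],
    "Well 2": [39.7, 75.7, 60.2, 10.9],
    "Well 3": [55.7, 17.0, 97.8, 25.3],
}
LOG_MCL = math.log(20.0)  # compliance limit on the log scale

out_of_compliance = {}
for name, data in wells.items():
    lower, upper = lognormal_confidence_interval(data, T_99_3DF)
    out_of_compliance[name] = lower > LOG_MCL
    # Taking antilogs gives the same comparison on the original (ppb) scale.
    print(f"{name}: log scale ({lower:.2f}, {upper:.2f}), "
          f"original scale ({math.exp(lower):.2f}, {math.exp(upper):.2f})")
```

Comparing the lower limit with ln(MCL) and comparing its antilog with the MCL give the same decision, because the exponential function is increasing.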
INTERPRETATION
If the original data are not normal, but the log-transformation ade-
quately normalizes the data, the confidence interval (on the log scale) is an
interval constructed so that the lower confidence limit should be less than
the true or population mean (on the log scale) with specified confidence (99%
in this case). If the lower end of the confidence interval exceeds the appro-
priate compliance limit, then the mean concentration must exceed that compli-
ance limit. These results provide statistically significant evidence of
contamination.
6.2.1.3 Nonparametric Confidence Interval
If the data do not adequately follow the normal distribution even after
the logarithm transformation, a nonparametric confidence interval can be con-
structed. This interval is for the median concentration (which equals the
mean if the distribution is symmetric). The nonparametric confidence interval
is generally wider and requires more data than the corresponding normal dis-
tribution interval, and so the normal or log-normal distribution interval
should be used whenever it is appropriate. It requires a minimum of seven (7)
observations in order to construct an interval with a two-sided confidence
coefficient of 98%, corresponding to a one-sided confidence coefficient of
99%. Consequently, it is applicable only for the pooled concentration of
compliance wells at a single point in time or for special sampling to produce
a minimum of seven observations at a single well during the sampling period.
PURPOSE
The nonparametric confidence interval is used when the raw data have been
found to violate the normality assumption, a log-transformation fails to
normalize the data, and no other specific distribution is assumed. It
produces a simple confidence interval that is designed to contain the true or
population median concentration with specified confidence (here 98% two-sided,
corresponding to 99% one-sided). If this
confidence interval contains the compliance limit, it is concluded that the
median concentration does not differ significantly from the compliance
limit. If the interval's lower limit exceeds the compliance limit, this is
statistically significant evidence that the concentration exceeds the compli-
ance limit and the unit is out of compliance.
PROCEDURE
Step 1. Within each compliance well, order the n data from least to
greatest, denoting the ordered data by X(1), ..., X(n), where X(i) is the ith
value in the ordered data.
Step 2. Determine the critical values of the order statistics as
follows. If the minimum seven observations are used, the critical values are 1
and 7. Otherwise, find the smallest integer, M, such that the cumulative
binomial distribution with parameters n (the sample size) and p = 0.5 is at
least 0.99. Table 6-3 gives the values of M and n+1-M together with the exact
confidence coefficient for sample sizes from 4 to 11. For larger samples,
take as an approximation the nearest integer value to
$M = n/2 + 1 + Z_{0.99}\sqrt{n/4}$
where $Z_{0.99}$ is the 99th percentile from the normal distribution (Table 4,
Appendix B) and equals 2.33.
TABLE 6-3. VALUES OF M AND n+1-M AND CONFIDENCE
COEFFICIENTS FOR SMALL SAMPLES
                          Two-sided
  n      M     n+1-M     confidence
  4      4       1         87.5%
  5      5       1         93.8%
  6      6       1         96.9%
  7      7       1         98.4%
  8      8       1         99.2%
  9      9       1         99.6%
 10      9       2         97.9%
 11     10       2         98.8%
Step 3. Once M has been determined in Step 2, find n+1-M and take as the
confidence limits the order statistics, X(M) and X(n+1-M). (With the minimum
seven observations, use X(1) and X(7).)
Step 4. Compare the confidence limits found in Step 3 to the compliance
limit. If the lower limit, X(n+1-M), exceeds the compliance limit, there is
statistically significant evidence of contamination. Otherwise, the unit remains
in compliance.
REMARK
The nonparametric confidence interval procedure requires at least seven
observations in order to obtain a (one-sided) significance level of 1% (confi-
dence of 99%). This means that data from two (or more) wells or sampling
periods would have to be pooled to achieve this level. If only the four
observations from one well taken at a single sampling period were used, the
one-sided significance level would be 6.25%. This would also be the false
alarm rate.
Ties do not affect the procedure. If there are ties, order the observa-
tions as before, including all of the tied values as separate observations.
That is, each of the observations with a common value is included in the
ordered list (e.g., 1, 2, 2, 2, 3, 4, etc.). For ties, use the average of the
tied ranks as in Section 5.2.2, Step 1 of the example. The ordered statistics
are found by counting positions up from the bottom of the list as before.
Multiple values from separate observations are counted separately.
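The confidence coefficients in Table 6-3 and the 6.25% figure in the remark above can be checked from the binomial distribution with p = 0.5. The sketch below is illustrative: the function names are hypothetical, the (n, M) pairs are copied from Table 6-3 rather than derived, and a continuous distribution (no ties at the median) is assumed.

```python
from math import comb


def one_sided_alpha(n, m):
    """Probability that the lower limit X(n+1-m) lies above the true median.

    This happens when at most n - m of the n observations fall below the median.
    """
    return sum(comb(n, k) for k in range(n - m + 1)) / 2 ** n


def two_sided_confidence(n, m):
    """Confidence that the interval [X(n+1-m), X(m)] contains the median."""
    # By symmetry the upper limit falls below the median with the same probability.
    return 1.0 - 2.0 * one_sided_alpha(n, m)


# (n, M) pairs as listed in Table 6-3
table_6_3 = [(4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 9), (11, 10)]
for n, m in table_6_3:
    print(f"n = {n:2d}  M = {m:2d}  n+1-M = {n + 1 - m}  "
          f"two-sided confidence = {100 * two_sided_confidence(n, m):.1f}%")
```

For n = 4 and M = 4, `one_sided_alpha` returns 1/16 = 0.0625, the false alarm rate quoted in the remark for a single well with four observations.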
EXAMPLE
Table 6-4 contains concentrations of T-29 in parts per million from two
hypothetical compliance wells. The data are assumed to consist of four sam-
ples taken each quarter for a year, so that sixteen observations are available
TABLE 6-4. EXAMPLE DATA FOR NONPARAMETRIC CONFIDENCE
INTERVAL—T-29 CONCENTRATIONS (ppm)
                    Well 1                     Well 2
Sampling      Concentration             Concentration
date              (ppm)       Rank          (ppm)       Rank
Jan. 1             3.17        (2)           3.52        (6)
                   2.32        (1)          12.32       (15)
                   7.37       (11)           2.28        (4)
                   4.44        (6)           5.30        (7)
Apr. 1             9.50       (13)           8.12       (11)
                  21.36       (16)           3.36        (5)
                   5.15        (7)          11.02       (14)
                  15.70       (15)          35.05       (16)
Jul. 1             5.58        (8)           2.20        (3)
                   3.39        (3)           0.00      (1.5)
                   8.44       (12)           9.30       (12)
                  10.25       (14)          10.30       (13)
Oct. 1             3.65        (4)           5.93        (8)
                   6.15        (9)           6.39        (9)
                   6.94       (10)           0.00      (1.5)
                   3.74        (5)           6.53       (10)
from each well. The data are not normally distributed, neither as raw data
nor when log transformed. Thus, the nonparametric confidence interval is
used. The MCL is taken to be 15 ppm.
Step 1. Order the 16 measurements from least to greatest within each
well separately. The numbers in parentheses beside each concentration in
Table 6-4 are the ranks or order of the observation. For example, in Well 1,
the smallest observation is 2.32, which has rank 1. The second smallest is
3.17, which has rank 2, and so forth, with the largest observation of 21.36
having rank 16.
Step 2. The sample size is large enough so that the approximation is
used to find M.
$M = 16/2 + 1 + 2.33\sqrt{16/4} = 13.7 \approx 14$
Step 3. The approximate 98% confidence limits are given by the
16 + 1 - 14 = 3rd smallest observation and the 14th smallest observation. For
Well 1, the 3rd smallest observation is 3.39 and the 14th smallest observation
is 10.25. Thus the confidence limits for Well 1 are (3.39, 10.25). Similarly
for Well 2, the 3rd and 14th smallest observations are
found to give the confidence interval (2.20, 11.02). Note that for Well 2
there were two values below detection. These were assigned a value of zero
and received the two smallest ranks. Had there been three or more values
below the limit of detection, the lower limit of the confidence interval would
have been the limit of detection because these values would have been the
smallest values and so would have included the third order statistic.
Step 4. Neither of the two confidence intervals' lower limit exceeds the
MCL of 15. In fact, the upper limit is less than the MCL, implying that the
concentration in each well is significantly below the MCL.
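The steps of this example can be sketched in Python. The sketch is illustrative and assumes the large-sample approximation for M is appropriate (n above 11); the function name `nonparametric_interval` is hypothetical, and the value 2.33 is the normal percentile quoted in the procedure.

```python
import math

Z_99 = 2.33  # 99th percentile of the standard normal, as quoted in the procedure


def nonparametric_interval(values):
    """Return (M, lower, upper) using the large-sample approximation for M."""
    n = len(values)
    m = round(n / 2 + 1 + Z_99 * math.sqrt(n / 4))
    ordered = sorted(values)
    # Order statistics are 1-based in the text, so subtract 1 for list indexing.
    lower = ordered[(n + 1 - m) - 1]
    upper = ordered[m - 1]
    return m, lower, upper


# T-29 concentrations (ppm) from Table 6-4; values below detection entered as 0.00
well_1 = [3.17, 2.32, 7.37, 4.44, 9.50, 21.36, 5.15, 15.70,
          5.58, 3.39, 8.44, 10.25, 3.65, 6.15, 6.94, 3.74]
well_2 = [3.52, 12.32, 2.28, 5.30, 8.12, 3.36, 11.02, 35.05,
          2.20, 0.00, 9.30, 10.30, 5.93, 6.39, 0.00, 6.53]
MCL = 15.0  # ppm

for name, data in (("Well 1", well_1), ("Well 2", well_2)):
    m, lower, upper = nonparametric_interval(data)
    print(f"{name}: M = {m}, interval = ({lower}, {upper}), "
          f"lower limit exceeds MCL: {lower > MCL}")
```

Sorting the data replaces the explicit ranking in Step 1; tied values simply occupy adjacent positions in the sorted list.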
INTERPRETATION
The rank-order statistics used to form the confidence interval in the
nonparametric confidence interval procedure will contain the population median
with confidence coefficient of 98%. The population median equals the mean
whenever the distribution is symmetric. The nonparametric confidence interval
is generally wider and requires more data than the corresponding normal dis-
tribution interval, and so the normal or log-normal distribution interval
should be used whenever it is appropriate.
If the confidence interval contains the compliance limit (either MCL or
ACL), then it is reasonable to conclude that the median compliance well con-
centration does not differ significantly from the compliance limit. If the
lower end of the confidence interval exceeds the compliance limit, this is
statistically significant evidence at the 1% level that the median compliance
well concentration exceeds the compliance limit and the unit is out of
compliance.
6.2.2 Tolerance Intervals for Compliance Limits
In some cases a permit may specify that a compliance limit (MCL or ACL)
is not to be exceeded more than a specified fraction of the time. Since lim-
ited data will be available from each monitoring well, these data can be used
to estimate a tolerance interval for concentrations from that well. If the
upper end of the tolerance interval (i.e., upper tolerance limit) is less than
the compliance limit, the data indicate that the unit is in compliance. That
is, concentrations should be less than the compliance limit at least a speci-
fied fraction of the time. If the upper tolerance limit of the interval
exceeds the compliance limit, then the concentration of the hazardous con-
stituent could exceed the compliance limit more than the specified proportion
of the time.
This procedure compares an upper tolerance limit to the MCL or ACL. With
small sample sizes the upper tolerance limit can be fairly large, particularly
if large coverage with high confidence is desired. If the owner or operator
wishes to use a tolerance limit in this application, he/she should suggest
values for the parameters of the procedure subject to the approval of the
Regional Administrator. For example, the owner or operator could suggest a
95% coverage with 95% confidence. This means that the upper tolerance limit
is a value which, with 95% confidence, will be exceeded less than 5% of the
time.
PURPOSE
The purpose of the tolerance interval approach is to construct an inter-
val that should contain a specified fraction of the concentration measurements
from compliance wells with a specified degree of confidence. In this appli-
cation it is generally desired to have the tolerance interval contain 95% of
the measurements of concentration with confidence at least 95%.
PROCEDURE
It is assumed that the data used to construct the tolerance interval are
approximately normal. The data may consist of the concentration measurements
themselves if they are adequately normal (see Section 4.2 for tests of normal-
ity), or the data used may be the natural logarithms of the concentration
data. It is important that the compliance limit (MCL or ACL) be expressed in
the same units (either concentrations or logarithm of the concentrations) as
the observations.
Step 1. Calculate the mean, X, and the standard deviation, S, of the
compliance well concentration data.
Step 2. Determine the factor, K, from Table 5, Appendix B, for the sam-
ple size, n, and form the one-sided tolerance interval
[0, X + KS]
Table 5, Appendix B contains the factors for a 95% coverage tolerance interval
with confidence factor 95%.
Step 3. Compare the upper limit of the tolerance interval computed in
Step 2 to the compliance limit. If the upper limit of the tolerance interval
exceeds that limit, this is statistically significant evidence of contamina-
tion.
EXAMPLE
Table 6-5 contains Aldicarb concentrations at a hypothetical facility in
compliance monitoring. The data are concentrations in parts per million (ppm)
and represent observations at three compliance wells. Assume that the permit
establishes an ACL of 50 ppm that is not to be exceeded more than 5% of the
time.
Step 1. Calculate the mean and standard deviation of the observations
from each well. These are given in the table.
TABLE 6-5. EXAMPLE DATA FOR A TOLERANCE
INTERVAL COMPARED TO AN ACL
Sampling        Aldicarb concentrations (ppm)
date          Well 1    Well 2    Well 3
Jan. 1         19.       23.7      25.6
Feb. 1         29.       21.9      23.
Mar. 1         18.       26.9      22.
Apr. 1         24.       26.1      26.9
Mean           23.1      24.7      24.5
SD              4.93      2.28      2.10
ACL = 50 ppm
Step 2. For n = 4, the factor, K, in Table 5, Appendix B, is found to
be 5.145. Form the upper tolerance interval limits as:
Well 1: 23.1 + 5.145(4.93) = 48.5
Well 2: 24.7 + 5.145(2.28) = 36.4
Well 3: 24.5 + 5.145(2.10) = 35.3
Step 3. Compare the tolerance limits with the ACL of 50 ppm. Since the
upper tolerance limits are below the ACL, there is no statistically
significant evidence of contamination at any well. The site remains in compliance
monitoring.
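The tolerance limit calculation in this example can be sketched as follows. The sketch is illustrative: `upper_tolerance_limit` is a hypothetical name, the factor K = 5.145 is hard-coded from Step 2, and the means and standard deviations are taken as reported in Table 6-5 rather than recomputed from the individual observations.

```python
K_95_95_N4 = 5.145  # factor quoted in Step 2 for n = 4 (95% coverage, 95% confidence)
ACL = 50.0          # ppm


def upper_tolerance_limit(mean, sd, k):
    """One-sided upper tolerance limit, mean + K * S."""
    return mean + k * sd


# Summary statistics (mean, SD) as reported in Table 6-5
summary = {
    "Well 1": (23.1, 4.93),
    "Well 2": (24.7, 2.28),
    "Well 3": (24.5, 2.10),
}

limits = {
    name: upper_tolerance_limit(mean, sd, K_95_95_N4)
    for name, (mean, sd) in summary.items()
}
for name, limit in limits.items():
    print(f"{name}: upper tolerance limit = {limit:.1f} ppm, "
          f"exceeds ACL: {limit > ACL}")
```

If the data had been log-transformed, the ACL would need to be log-transformed as well before the comparison, as the procedure notes.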
INTERPRETATION
It may be desirable in a permit to specify a compliance limit that is not
to be exceeded more than 5% of the time. A tolerance interval constructed
from the compliance well data provides an estimated interval that will contain
95% of the data with confidence 95%. If the upper limit of this interval is
below the selected compliance limit, concentrations measured at the compliance
wells should exceed the compliance limit less than 5% of the time. If the
upper limit of the tolerance interval exceeds the compliance limit, then more
than 5% of the concentration measurements would be expected to exceed the
compliance limit.
6.2.3 Special Cases with Limited Variance
Occasionally, all four concentrations from a compliance well at a par-
ticular sampling period could be identical. If this is the case, the formula
for estimating the standard deviation at that specific sampling period would
give zero, and the methods for calculating parametric confidence intervals
would give the same limits for the upper and lower ends of the intervals,
which is not appropriate.
In the case of identical concentrations, one should assume that there is
some variation in the data, but that the concentrations were rounded and give
the same values after rounding. To account for the variability that was
present before rounding, take the least significant digit in the reported
concentration as having resulted from rounding. Assume that rounding results
in a uniform error on the interval centered at the reported value with the
interval ranging up or down one half unit from the reported value. This
assumed rounding is used to obtain a nonzero estimate of the variance for use
in cases where all the measured concentrations were found to be identical.
PURPOSE
The purpose of this procedure is to obtain a nonzero estimate of the
variance when all observations from a well during a given sampling period gave
identical results. Once this modified variance is obtained, its square root
is used in place of the usual sample standard deviation, S, to construct con-
fidence intervals or tolerance intervals.
PROCEDURE
Step 1. Determine the least significant value of any data point. That
is, determine whether the data were reported to the nearest 10 ppm, nearest 1
ppm, nearest 100 ppm, etc. Denote this value by 2R.
Step 2. The data are assumed to have been rounded to the nearest 2R, so
each observation is actually the reported value ±R. Assuming that the
observations were identical because of rounding, the variance is estimated to be
$R^2/3$, assuming the uniform distribution for the rounding error. This gives
the estimated standard deviation as
$S' = R/\sqrt{3}$
Step 3. Take this estimated value from Step 2 and use it as the estimate
of the standard deviation in the appropriate parametric procedure. That is,
replace S by S'.
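The value $R^2/3$ in Step 2 follows from the variance of a uniform distribution. As a short check, let U denote the rounding error (a symbol introduced here for illustration), assumed uniform on the interval from -R to +R with mean zero:

$$\operatorname{Var}(U) = \int_{-R}^{R} \frac{u^2}{2R}\,du = \frac{R^2}{3}, \qquad S' = \sqrt{R^2/3} = R/\sqrt{3}$$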
EXAMPLE
In calculating a confidence interval for a single compliance well, sup-
pose that four observations were taken during a sampling period and all
resulted in 590 ppm. There is no variance among the four values 590, 590,
590, and 590.
Step 1. Assume that each of the values 590 came from rounding the con-
centration to the nearest 10 ppm. That is, 590 could actually be any value
between 585.0 and 594.99. Thus, 2R is 10 ppm (rounded off), so R is 5 ppm.
Step 2. The estimate of the standard deviation is
$S' = 5/\sqrt{3} = 5/1.732 = 2.89$ ppm
Step 3. Use S' = 2.89 and X = 590 to calculate the confidence interval
(see Section 6.2.1) for the mean concentration from this well. This gives
$590 \pm (4.541)(2.89/\sqrt{4}) = (583.4, 596.6)$
as the 98% confidence interval of the average concentration. Note that 4.541
is the 99th percentile from the t-distribution (Table 6, Appendix B) with 3
degrees of freedom since the sample size was 4.
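The rounding adjustment in this example can be sketched in Python. The function names are hypothetical and the t-value is the 4.541 quoted above; the sketch assumes, as the procedure does, that the rounding error is uniform over one reporting unit.

```python
import math

T_99_3DF = 4.541  # t-value quoted above for 3 degrees of freedom


def rounding_standard_deviation(reporting_unit):
    """S' = R / sqrt(3), where the reporting unit is 2R."""
    r = reporting_unit / 2.0
    return r / math.sqrt(3.0)


def interval_for_identical_values(value, n, reporting_unit, t_value):
    """Confidence interval when all n observations equal `value`."""
    s_prime = rounding_standard_deviation(reporting_unit)
    half_width = t_value * s_prime / math.sqrt(n)
    return value - half_width, value + half_width


# Four observations of 590 ppm, reported to the nearest 10 ppm
s_prime = rounding_standard_deviation(10.0)
lower, upper = interval_for_identical_values(590.0, 4, 10.0, T_99_3DF)
print(f"S' = {s_prime:.2f} ppm, interval = ({lower:.1f}, {upper:.1f})")
```

The same S' can be substituted for S in the tolerance interval procedure of Section 6.2.2.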
INTERPRETATION
When identical results are obtained from several different samples, the
interpretation is that the data are not reported to enough significant figures
to show the random differences. If there is no extrinsic evidence invalidat-
ing the data, the data are regarded as having resulted from rounding more
precise results to the reported observations. The rounding is assumed to
result in variability that follows the uniform distribution on the range ±R,
where 2R is the smallest unit of reporting. This assumption is used to calcu-
late a standard deviation for the observations that otherwise appear to have
no variability.
REMARK
Assuming that the data are reported correctly to the units indicated,
other distributions for the rounding variability could be assumed. The max-
imum standard deviation that could result from rounding when the observation
is ±R is the value R.
SECTION 7
CONTROL CHARTS FOR INTRA-WELL COMPARISONS
The previous sections cover various situations where the compliance well
data are compared to the background well data or to specified concentration
limits (ACL or MCL) to detect possible contamination. This section discusses
the case where the level of each constituent within a single uncontaminated
well is being monitored over time. In essence, the data for each constituent
in each well are plotted on a time scale and inspected for obvious features
such as trends or sudden changes in concentration levels. The method sug-
gested here is a combined Shewhart-CUSUM control chart for each well and
constituent.
The control chart method is recommended for uncontaminated wells only,
when data comprising at least eight independent samples over a one-year period
are available. This requirement is specified under current RCRA regulations
and applies to each constituent in each well.
As discussed in Section 2, a common sampling plan will obtain four inde-
pendent samples from each well on a semi-annual basis. With this plan a con-
trol chart can be implemented when one year's data are available. As a result
of Monte Carlo simulations, Starks (1988) recommended at least four sampling
periods at a unit of eight or more wells, and at least eight sampling periods
at a unit with fewer than four wells.
The use of control charts can be an effective technique for monitoring
the levels of a constituent at a given well over time. It also provides a
visual means of detecting deviations from a "state of control." It is clear
that plotting of the data is an important part of the analysis process. Plot-
ting is an easy task, although time-consuming if many data sets need to be
plotted. Advantage should be taken of graphics software, since plotting of
time series data will be an ongoing process. New data points will be added to
the already existing data base each time new data are available. The
following few sections discuss, in general terms, the advantages of plotting
time series data and the corrective steps one could take when seasonality
is present in the data; they then present the detailed procedure for
constructing a Shewhart-CUSUM control chart, along with a demonstration of
that procedure.
7.1 ADVANTAGES OF PLOTTING DATA
While analyzing the data by means of any of the appropriate statistical
procedures discussed in earlier sections is recommended, we also recommend
plotting the data. Each data point should be plotted against time using a
time scale (e.g., month, quarter). A plot should be generated for each
constituent measured in each well. For visual comparison purposes, the scale
should be kept identical from well to well for a given constituent.
Another important application of the plotting procedure is for detecting
possible trends or drifts in the data from a given well. Furthermore, when
visually comparing the plots from several wells within a unit, possible con-
tamination of one rather than all downgradient wells could be detected which
would then warrant a closer look at that well. In general, graphs can provide
highly effective illustrations of the time series, allowing the analyst to
obtain a much greater sense of the data. Seasonal fluctuations or sudden
changes, for example, may become quite evident, thereby supporting the analyst
in his/her decision of which statistical procedure to use. General upward or
downward trends, if present, can be detected and the analyst can follow up
with a test for trend, such as the nonparametric Mann-Kendall test (Mann,
1945; Kendall, 1975). If, in addition, seasonality is suspected, the user can
perform the seasonal Kendall test for trend developed by Hirsch et al.
(1982). The reader is also referred to Chapters 16 "Detecting and Estimating
Trends" and 17 "Trends and Seasonality" of Gilbert's "Statistical Methods for
Environmental Pollution Monitoring," 1987. In any of the above cases, the
help of a professional statistician is recommended.
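As a rough illustration of what a trend test works with, the Mann-Kendall statistic S counts the pairs of time-ordered observations that increase minus the pairs that decrease. The sketch below computes only S, using hypothetical data; it does not assess significance, which requires tabled critical values or a normal approximation and is best done with a statistician's help, as noted above.

```python
def mann_kendall_s(series):
    """Mann-Kendall S: number of increasing pairs minus number of decreasing pairs."""
    s = 0
    for i in range(len(series) - 1):
        for j in range(i + 1, len(series)):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference; ties contribute 0
    return s


# Hypothetical time-ordered concentrations (illustrative values only)
rising = [2.0, 2.1, 2.3, 2.2, 2.6, 2.8]
print(mann_kendall_s(rising))
```

A large positive S suggests an upward trend and a large negative S a downward trend; values near zero suggest no monotonic trend.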
Another important use of data plots is that of identifying unusual data
points (e.g., outliers). These points should then be investigated for pos-
sible QC problems, data entry errors, or whether they are truly outliers.
Many software packages are available for computer graphics, developed for
mainframes, mini-, or microcomputers. For example, SAS features an easy-to-
use plotting procedure, PROC PLOT; where the hardware and software are avail-
able, a series of more sophisticated plotting routines can be accessed through
SAS GRAPH. On microcomputers, almost everybody has his or her favorite
graphics software that they use on a regular basis and no recommendation will
be made as to the most appropriate one. The plots shown in this document were
generated using LOTUS 1-2-3.
Once the data for each constituent and each well are plotted, the plots
should be examined for seasonality and a correction is recommended should
seasonality be present. A fairly simple-to-use procedure for deseasonalizing
data is presented in the following paragraphs.
7.2 CORRECTING FOR SEASONALITY
A necessary precaution before constructing a control chart is to take
into account seasonal variation of the data to minimize the chance of mistak-
ing seasonal effect for evidence of well contamination. This could result
from variations in chemical concentrations with recharge rates during
different seasons throughout the years. If seasonality is present, then
deseasonalizing the data prior to using the combined Shewhart-CUSUM control
chart procedure is recommended.
Many approaches to deseasonalize data exist. If the seasonal pattern is
regular, it may be modeled with a sine or cosine function. Moving averages
can be used, or differences (of order 12 for monthly data for example) can be
used. However, time series models may include rather complicated methods for
deseasonalizing the data. Another simpler method exists which should be ade-
quate for the situations described in this document. It has the advantage of
being easy to understand and apply, and of providing natural estimates of the
monthly or quarterly effects via the monthly or quarterly means. The method
proposed here can be applied to any seasonal cycle—typically an annual cycle
for monthly or quarterly data.
NOTE
Corrections for seasonality should be used with great caution as they
represent extrapolation into the future. There should be a good scientific
explanation for the seasonality as well as good empirical evidence for the
seasonality before corrections are made. Larger than average rainfalls for
two or three Augusts in a row do not justify the belief that there will
never be a drought in August, and this idea extends directly to groundwater
quality. In addition, the quality (bias, robustness, and variance) of the
estimates of the proper corrections must be considered even in cases where
corrections are called for. If seasonality is suspected, the user might want
to seek the help of a professional statistician.
PURPOSE
When seasonality is known to exist in a time series of concentrations,
then the data should be deseasonalized prior to constructing control charts in
order to take into account seasonal variation rather than mistaking seasonal
effects for evidence of contamination.
PROCEDURE
The following instructions to adjust a time series for seasonality are
based on monthly data with a yearly cycle. The procedure can be easily modi-
fied to accommodate a yearly cycle of quarterly data.
Assume that N years of monthly data are available. Let $X_{ij}$ denote the
unadjusted observation for the ith month during the jth year.
Step 1. Compute the average concentration for month i over the N-year
period:
$\bar{X}_i = (X_{i1} + \cdots + X_{iN})/N$
This is the average of all observations taken in different years but during
the same month. That is, calculate the mean concentrations for all Januarys,
then the mean for all Februarys and so on for each of the 12 months.
Step 2. Calculate the grand mean, $\bar{X}$, of all N*12 observations,
$\bar{X} = \sum_{i=1}^{12} \sum_{j=1}^{N} X_{ij}/(12N) = \sum_{i=1}^{12} \bar{X}_i/12$
Step 3. Compute the adjusted concentrations,
$Z_{ij} = X_{ij} - \bar{X}_i + \bar{X}$
Computing $X_{ij} - \bar{X}_i$ removes the average effect of month i from the monthly
data, and adding $\bar{X}$, the overall mean, places the adjusted $Z_{ij}$ values about the
same mean, $\bar{X}$. It follows that the overall mean adjusted observation, $\bar{Z}$,
equals the overall mean unadjusted value, $\bar{X}$.
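The three steps can be sketched in Python. This is a minimal sketch with a hypothetical function name and made-up quarterly data (not the data of Table 7-1); it assumes complete years, with one observation per season in every year.

```python
def deseasonalize(observations):
    """Adjust a series for seasonality.

    `observations[j][i]` is the value for season i (month or quarter) in year j.
    Returns (adjusted, seasonal_means, grand_mean); `adjusted` has the same layout.
    """
    n_years = len(observations)
    n_seasons = len(observations[0])
    # Step 1: mean of each season across years
    seasonal_means = [
        sum(year[i] for year in observations) / n_years for i in range(n_seasons)
    ]
    # Step 2: grand mean of all observations (mean of the seasonal means)
    grand_mean = sum(seasonal_means) / n_seasons
    # Step 3: remove the seasonal mean and add back the grand mean
    adjusted = [
        [year[i] - seasonal_means[i] + grand_mean for i in range(n_seasons)]
        for year in observations
    ]
    return adjusted, seasonal_means, grand_mean


# Hypothetical quarterly data for two years (illustrative values only)
data = [
    [2.0, 2.4, 2.8, 2.2],
    [2.2, 2.6, 3.0, 2.4],
]
adjusted, seasonal_means, grand_mean = deseasonalize(data)
print(seasonal_means, grand_mean)
print(adjusted)
```

In this made-up series every quarter rises by the same amount from one year to the next, so the adjusted values are constant within each year and the year-to-year increase remains visible after the seasonal pattern is removed.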
EXAMPLE
Columns 2 through 4 of Table 7-1 show monthly unadjusted concentrations
of a fictitious analyte over a 3-year period.
TABLE 7-1. EXAMPLE COMPUTATION FOR DESEASONALIZING DATA

                 Unadjusted           3-year        Monthly adjusted
               concentrations        monthly         concentrations
Month        1983    1984    1985    average       1983    1984    1985
January      1.99    2.01    2.15     2.05         2.11    2.13    2.27
February     2.10    2.10    2.17     2.12         2.14    2.15    2.21
March        2.12    2.17    2.27     2.19         2.10    2.15    2.25
April        2.12    2.13    2.23     2.16         2.13    2.14    2.24
May          2.11    2.13    2.24     2.16         2.12    2.13    2.25
June         2.15    2.18    2.26     2.20         2.12    2.15    2.23
July         2.19    2.25    2.31     2.25         2.11    2.16    2.23
August       2.18    2.24    2.32     2.25         2.10    2.16    2.24
September    2.16    2.22    2.28     2.22         2.11    2.17    2.22
October      2.08    2.13    2.22     2.14         2.10    2.16    2.24
November     2.05    2.08    2.19     2.11         2.11    2.14    2.25
December     2.08    2.16    2.22     2.16         2.09    2.17    2.23

Overall 3-year average = 2.17
Step 1. Compute the monthly averages across the 3 years. These values
are shown in the fifth column of Table 7-1.
Step 2. The grand mean over the 3-year period is calculated to be 2.17.
Step 3. Within each month and year, subtract the average monthly
concentration for that month and add the grand mean. For example, for January
1983, the adjusted concentration becomes
1.99 - 2.05 + 2.17 = 2.11
The adjusted concentrations are shown in the last three columns of Table 7-1.
The reader can check that the average of all 36 adjusted concentrations
equals 2.17, the average unadjusted concentration. Figure 7-1 shows the plot
of the unadjusted and adjusted data. The raw data clearly exhibit seasonality
as well as an upwards trend which is less evident by simply looking at the
data table.
INTERPRETATION
As can be seen in Figure 7-1, seasonal effects were present in the
data. After adjusting for monthly effects, the seasonality was removed as can
be seen in the adjusted data plotted in the same figure.
7.3 COMBINED SHEWHART-CUSUM CONTROL CHARTS FOR EACH WELL AND CONSTITUENT
Control charts are widely used as a statistical tool in industry as well
as research and development laboratories. The concept of control charts is
relatively simple, which makes them attractive to use. From the population
distribution of a given variable, such as concentrations of a given constit-
uent, repeated random samples are taken at intervals over time. Statistics,
for example the mean of replicate values at a point in time, are computed and
plotted together with upper and/or lower predetermined limits on a chart where
the x-axis represents time. If a result falls outside these boundaries, then
the process is declared to be "out of control"; otherwise, the process is
declared to be "in control." The widespread use of control charts is due to
their ease of construction and the fact that they can provide a quick visual
evaluation of a situation, and remedial action can be taken, if necessary.
In the context of ground water monitoring, control charts can be used to
monitor the inherent statistical variation of the data collected within a
single well, and to flag anomalous results. Further investigation of data
points lying outside the established boundaries will be necessary before any
direct action is taken.
A control chart that can be used on a real time basis must be constructed
from a data set large enough to characterize the behavior of a specific
well. It is recommended that data from a minimum of eight samples within a
year be collected for each constituent at each well to permit an evaluation of
the consistency of monitoring results with the current concept of the hydro-
geology of the site. Starks (1988) recommends a minimum of four sampling
periods at a unit with eight or more wells and a minimum of eight sampling
periods at a unit with less than four wells. Once the control chart for the
specific constituent at a given well is acceptable, then subsequent data
points can be plotted on it to provide a quick evaluation as to whether the
process is in control.
[Figure 7-1. Plot of the unadjusted and adjusted monthly data from the
example in Section 7.2; y-axis: analyte concentration (mg/L).]
The standard assumptions in the use of control charts are that the data
generated by the process, when it is in control, are independently (see
Section 2.4.2) and normally distributed with a fixed mean $\mu$ and constant
variance $\sigma^2$. The most important assumption is that of independence;
control charts are not robust with respect to departure from independence
(e.g., serial correlation, see glossary). In general, the sampling scheme will
be such that the possibility of obtaining serially correlated results is
minimized, as noted in Section 2. The assumption of normality is of somewhat
less concern, but should be investigated before plotting the charts. A
transformation (e.g., log-transform, square root transform) can be applied to
the raw data so as to obtain errors normally distributed about the mean. An
additional situation which may decrease the effectiveness of control charts is
seasonality in the data. The problem of seasonality can be handled by removing
the seasonality effect from the data, provided that sufficient data to cover
at least two seasons of the same type are available (e.g., 2 years of data
when the seasonal effect is monthly or quarterly). A procedure to correct a
time series for seasonality was shown above in Section 7.2.
PURPOSE
Combined Shewhart-cumulative sum (CUSUM) control charts are constructed
for each constituent at each well to provide a visual tool for detecting both
trends and abrupt changes in concentration levels.
PROCEDURE
Assume that data from at least eight independent samples of monitoring
are available to provide reliable estimates of the mean, $\mu$, and standard
deviation, $\sigma$, of the constituent's concentration levels in a given well.
Step 1. To construct a combined Shewhart-CUSUM chart, three parameters
need to be selected prior to plotting:
h - a decision interval value
k - a reference value
SCL - Shewhart control limit (denoted by U in Starks (1988))
The parameter k of the CUSUM scheme is directly obtained from the value,
D, of the displacement that should be quickly detected; k = D/2. It is recom-
mended to select k = 1, which will allow a displacement of two standard devia-
tions to be detected quickly.
When k is selected to be 1, the parameter h is usually set at values of 4
or 5. The parameter h is the value against which the cumulative sum in the
CUSUM scheme will be compared. In the context of groundwater monitoring, a
value of h = 5 is recommended (Starks, 1988; Lucas, 1982).
The upper Shewhart limit is set at SCL = 4.5 in units of standard devia-
tion. This combination of k = 1, h = 5, and SCL = 4.5 was found most appro-
priate for the application of combined Shewhart-CUSUM charts for groundwater
monitoring (Starks, 1988).
Step 2. Assume that at time period $T_i$, $n_i$ concentration measurements
$X_1, \ldots, X_{n_i}$ are available. Compute their average $\bar{X}_i$.
Step 3. Calculate the standardized mean
$Z_i = (\bar{X}_i - \mu)\sqrt{n_i}/\sigma$
where $\mu$ and $\sigma$ are the mean and standard deviation obtained from
prior monitoring at the same well (at least four sampling periods in a year).
Step 4. At each time period, $T_i$, compute the cumulative sum, $S_i$, as:
$S_i = \max\{0, (Z_i - k) + S_{i-1}\}$
where $\max\{A, B\}$ is the maximum of A and B, starting with $S_0 = 0$.
Step 5. Plot the values of $S_i$ versus $T_i$ on a time chart for this
combined Shewhart-CUSUM scheme. Declare an "out-of-control" situation at
sampling period $T_i$ if, for the first time, $S_i > h$ or $Z_i > SCL$. This
will indicate probable contamination at the well and further investigations
will be necessary.
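Steps 2 through 5 can be sketched in Python as shown below. This is an illustrative sketch rather than a prescribed implementation: the function name `shewhart_cusum` and its return layout are assumptions, while the default parameters (k = 1, h = 5, SCL = 4.5) and the input values are those of the carbon tetrachloride example in Table 7-2.

```python
import math


def shewhart_cusum(means, mu, sigma, n=1, k=1.0, h=5.0, scl=4.5):
    """Return one (period, Z_i, S_i, out_of_control) tuple per sampling period."""
    results = []
    s = 0.0  # S_0 = 0
    for period, xbar in enumerate(means, start=1):
        z = (xbar - mu) * math.sqrt(n) / sigma   # Step 3: standardized mean
        s = max(0.0, (z - k) + s)                # Step 4: cumulative sum
        results.append((period, z, s, s > h or z > scl))  # Step 5: check limits
    return results


# Monthly means from Table 7-2 (two measurements per period, mu = 5.5, sigma = 0.4).
means = [5.52, 5.60, 5.45, 5.15, 5.95, 5.54, 5.49, 6.08, 6.91, 6.78, 6.71, 6.65]
chart = shewhart_cusum(means, mu=5.5, sigma=0.4, n=2)
first_signal = next((p for p, z, s, out in chart if out), None)
```

With these inputs the first out-of-control period is the ninth, which agrees with the conclusion drawn from Table 7-2.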
REFERENCES
Lucas, J. M. 1982. "Combined Shewhart-CUSUM Quality Control Schemes."
Journal of Quality Technology. Vol. 14, pp. 51-59.
Starks, T. H. 1988 (Draft). "Evaluation of Control Chart Methodologies for
RCRA Waste Sites."
Hockman, K. K., and J. M. Lucas. 1987. "Variability Reduction Through
Subvessel CUSUM Control." Journal of Quality Technology. Vol. 19, pp. 113-121.
EXAMPLE
The procedure is demonstrated on a set of carbon tetrachloride
measurements taken monthly at a compliance well over a 1-year period. The
monthly means of two measurements each ($n_i = 2$ for all i) are presented in
the third column of Table 7-2 below. Estimates of $\mu$ and $\sigma$, the mean
and standard deviation of carbon tetrachloride measurements at that particular
well, were obtained from a preceding monitoring period at that well;
$\mu$ = 5.5 µg/L and $\sigma$ = 0.4 µg/L.
TABLE 7-2. EXAMPLE DATA FOR COMBINED SHEWHART-CUSUM CHART:
CARBON TETRACHLORIDE CONCENTRATION (µg/L)

          Sampling   Mean             Standardized
          period,    concentration,   mean,                     CUSUM,
Date      $T_i$      $\bar{X}_i$      $Z_i$         $Z_i - k$   $S_i$
Jan 6      1         5.52              0.07         -0.93        0
Feb 3      2         5.60              0.35         -0.65        0
Mar 3      3         5.45             -0.18         -1.18        0
Apr 7      4         5.15             -1.24         -2.24        0
May 5      5         5.95              1.59          0.59        0.59
Jun 2      6         5.54              0.14         -0.86        0.00
Jul 7      7         5.49             -0.04         -1.04        0.00
Aug 4      8         6.08              2.05          1.05        1.05
Sep 1      9         6.91              4.99 a        3.99        5.04 b
Oct 6     10         6.78              4.53 a        3.53        8.56 b
Nov 3     11         6.71              4.28          3.28       11.84 b
Dec 1     12         6.65              4.07          3.07       14.91 b

Parameters: Mean = 5.50; std = 0.4; k = 1; h = 5; SCL = 4.5.
a Indicates "out-of-control" process via Shewhart control limit ($Z_i > 4.5$).
b CUSUM "out-of-control" signal ($S_i > 5$).
Step 1. The three parameters necessary to construct a combined
Shewhart-CUSUM chart were selected as h = 5; k = 1; SCL = 4.5 in units of
standard deviation.
Step 2. The monthly means are presented in the third column of
Table 7-2.
Step 3. Standardize the means within each sampling period. These
computations are shown in the fourth column of Table 7-2. For example,
$Z_1 = (5.52 - 5.50)\sqrt{2}/0.4 = 0.07$.
Step 4. Compute the quantities $S_i$, i = 1, ..., 12. For example,
$S_1 = \max\{0, -0.93 + 0\} = 0$
$S_2 = \max\{0, -0.65 + 0\} = 0$
...
$S_5 = \max\{0, 0.59 + S_4\} = \max\{0, 0.59 + 0\} = 0.59$
$S_6 = \max\{0, -0.86 + S_5\} = \max\{0, -0.86 + 0.59\} = \max\{0, -0.27\} = 0$
etc.
These quantities are shown in the last column of Table 7-2.
Step 5. Construct the control chart. The y-axis is in units of standard
deviations. The x-axis represents time, or the sampling periods. For
each sampling period, $T_i$, record the values of $Z_i$ and $S_i$. Draw
horizontal lines at values h = 5 and SCL = 4.5. These two lines represent the
upper control limits for the CUSUM scheme and the Shewhart control limit,
respectively. The chart for this example data set is shown in Figure 7-2.
The combined chart indicates statistically significant evidence of con-
tamination starting at sampling period T9. Both the CUSUM scheme and the
Shewhart control limit were exceeded by S9 and Z9, respectively. Investi-
gation of the situation should begin to confirm contamination and action
should be required to bring the variability of the data back to its previous
level.
INTERPRETATION
The combined Shewhart-CUSUM control scheme was applied to an example data
set of carbon tetrachloride measurements taken on a monthly basis at a well.
The statistic used in the construction of the chart was the mean of two
measurements per sampling period. (It should be noted that this method can be
used on an individual measurement as well, in which case $n_i = 1$.) Estimates
of the mean and standard deviation of the measurements were available from
previous data collected at that well over at least four sampling periods.
The parameters of the combined chart were selected to be k = 1 unit, the
reference value or allowable slack for the process; h = 5 units, the decision
interval for the CUSUM scheme; and SCL = 4.5 units, the upper Shewhart control
limit. All parameters are in units of $\sigma$, the standard deviation obtained from
the previous monitoring results. Various combinations of parameter values can
be selected. The particular values recommended here appear to be the best for
the initial use of the procedure from a review of the simulations and recom-
mendations in the references. A discussion on this subject is given by Lucas
(1982), Hockman and Lucas (1987), and Starks (1988). The choice of the param-
eters h and k of a CUSUM chart is based on the desired performance of the
chart. The criterion used to evaluate a control scheme is the average number
of samples or time periods before an out-of-control signal is obtained. This
criterion is denoted by ARL or average run length. The ARL should be large
when the mean concentration of a hazardous constituent is near its target
value and small when the mean has shifted too far from the target. Tables
have been developed by simulation methods to estimate ARLs for given combina-
tions of the parameters (Lucas, Hockman and Lucas, and Starks). The user is
referred to these articles for further reading.
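To make the ARL criterion concrete, the following Python sketch estimates an average run length by simulation for a process whose standardized mean has shifted by a given number of standard deviations. It is only an illustration under stated assumptions: the function name `simulated_arl`, the number of runs, the seed, and the cap on periods are arbitrary choices, and the standardized values are assumed to be independent and normal with unit variance. It is not a substitute for the published ARL tables.

```python
import random


def simulated_arl(shift, k=1.0, h=5.0, scl=4.5, runs=2000, seed=0,
                  max_periods=100_000):
    """Estimate the average number of periods until the first signal."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0
    for _ in range(runs):
        s = 0.0
        for t in range(1, max_periods + 1):
            z = rng.gauss(shift, 1.0)      # standardized mean after the shift
            s = max(0.0, (z - k) + s)      # CUSUM update
            if s > h or z > scl:           # combined Shewhart-CUSUM signal
                break
        total += t                         # run length (capped at max_periods)
    return total / runs


arl_shift_2 = simulated_arl(2.0)  # shift of two standard deviations
arl_shift_3 = simulated_arl(3.0)  # larger shift should be detected sooner
```

Runs for an in-control process (shift = 0) can be very long, so a simulation of that case would need far more computing time than the shifted cases shown here.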
[Figure 7-2. Combined Shewhart-CUSUM chart for the example data of Table 7-2.]
7.4 UPDATE OF A CONTROL CHART
The control chart is based on preselected performance parameters as well
as on estimates of $\mu$ and $\sigma$, the parameters of the distribution of
the measurements in question. As monitoring continues and the process is found
to be in control, these parameters need periodic updating so as to incorporate
this new information into the control charts. Starks (1988) has suggested that
in general, adjustments in sample means and standard deviations be made after
sampling periods 4, 8, 12, 20, and 32, following the initial monitoring period
recommended to be at least eight sampling periods. Also, the performance
parameters h, k, and SCL would need to be updated. The author suggests that
h = 5, k = 1, and SCL = 4.5 be kept at those values for the first 12 sampling
periods following the initial monitoring plan, and that k be reduced to 0.75
and SCL to 4.0 for all subsequent sampling periods. These values and sampling
period numbers are not mandatory. In the event of an out-of-control state or
a trend, the control chart should not be updated.
7.5 NONDETECTS IN A CONTROL CHART
Regulations require that four independent water samples be taken at each
well at a given sampling period. The mean of the four concentration measure-
ments of a particular constituent is used in the construction of a control
chart. Now situations will arise when the concentration of a constituent is
below detection limit for one or more samples. The following approach is
suggested for treating nondetects when plotting control charts.
If only one of the four measurements is a nondetect, then replace it with
one half of the detection limit (MDL/2) or with one half of the practical
quantitation limit (PQL/2) and proceed as described in Section 7.3.
If either two or three of the measurements are nondetects, use only the
quantitated values (two or one, respectively) for the control chart and pro-
ceed as discussed earlier in Section 7.3.
If all four measurements are nondetects, then use one half of the
detection limit or practical quantitation limit as the value for the
construction of the control chart. This is an obvious situation of no
contamination of the well.
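The three rules above can be sketched in Python as follows. The function name `control_chart_value` and the use of `None` to mark a nondetect are assumptions made for illustration; the rules themselves are written for four measurements per sampling period, and the sketch simply applies the same logic to whatever list it is given.

```python
def control_chart_value(measurements, detection_limit):
    """Return the value to plot for one sampling period.

    measurements: list of concentrations, with None marking a nondetect.
    detection_limit: the MDL or PQL used for the DL/2 substitution.
    """
    detects = [m for m in measurements if m is not None]
    n_nondetects = len(measurements) - len(detects)
    half_dl = detection_limit / 2.0
    if n_nondetects == 0:
        values = detects
    elif n_nondetects == 1:
        values = detects + [half_dl]   # one nondetect: substitute DL/2
    elif detects:
        values = detects               # several nondetects: quantitated values only
    else:
        values = [half_dl]             # all nondetects: plot DL/2
    return sum(values) / len(values)


one_nd = control_chart_value([1.0, 2.0, 3.0, None], detection_limit=2.0)
two_nd = control_chart_value([1.0, None, None, 3.0], detection_limit=2.0)
all_nd = control_chart_value([None, None, None, None], detection_limit=2.0)
```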
In the event that a control chart requires updating and a certain
proportion of the measurements is below detection limit, then adjust the mean
and standard deviation necessary for the control chart by using Cohen's method
described in Section 8.1.3. In that case, the proportion of nondetects
applies to the pool of data available at the time of the updating and would
include all nondetects up to that time, not just the four measurements taken
at the last sampling period.
CAUTIONARY NOTE: Control charts are a useful supplement to other statistical
techniques because they are graphical and simple to use. However, it is
inappropriate to construct a control chart on wells that have shown evidence
of contamination or an increasing trend (see §264.97(a)(1)(i)). Further,
contamination may not be present in a well in the form of a steadily increasing
concentration profile—it may be present intermittently or may increase in a
step function. Therefore, the absence of an increasing trend does not
necessarily prove that a release has not occurred.
SECTION 8
MISCELLANEOUS TOPICS
This chapter contains a variety of special topics that are relatively
short and self contained. These topics include methods to deal with data
below the limit of detection and methods to check for, and deal with outliers
or extreme values in the data.
8.1 LIMIT OF DETECTION
In a chemical analysis some compounds may be below the detection limit
(DL) of the analytical procedure. These are generally reported as not
detected (rather than as zero or not present) and the appropriate limit of
detection is usually given. Data that include not detected results are a
special case referred to as censored data in the statistical literature. For
compounds not detected, the concentration of the compound is not known.
Rather, it is only known that the concentration of the compound is less than
the detection limit.
There are a variety of ways to deal with data that include values below
detection. There is no general procedure that is applicable in all cases.
However there are some general guidelines that usually prove adequate. If
these do not cover a specific situation, the user should consult a profes-
sional statistician for the most appropriate way to deal with the values below
detection.
A summary of suggested approaches to deal with data below the detection
limit is presented as Table 8-1. The method suggested depends on the amount
of data below the detection limit. For small amounts of below detection
values, simply replacing a "ND" (not detected) report with a small number, say
the detection limit divided by two, and proceeding with the usual analysis is
satisfactory. For moderate amounts of below detection limit data, a more
detailed adjustment is appropriate, while for large amounts one may need to
only consider whether a compound was detected or not as the variable of
analysis.
The meaning of small, moderate, and large above is subject to judgment.
Table 8-1 contains some suggested values. It should be recognized that these
values are not hard and fast rules, but are based on judgment. If there is a
question about how to handle values below detection, consult a statistician.
TABLE 8-1. METHODS FOR BELOW DETECTION LIMIT VALUES

Percentage of
nondetects in                                            Section of
the data base        Statistical analysis method         guidance document

Less than 15%        Replace NDs with MDL/2 or PQL/2,    Section 8.1.1
                     then proceed with parametric
                     procedures:
                       • ANOVA                           Section 5.2.1
                       • Tolerance Limits                Section 5.3
                       • Prediction Intervals            Section 5.4
                       • Control Charts                  Section 7

Between 15 and 50%   Use NDs as ties, then proceed
                     with Nonparametric ANOVA            Section 5.2.2
                     or
                     use Cohen's adjustment,             Section 8.1.3
                     then proceed with:
                       • Tolerance Limits                Section 5.3
                       • Confidence Intervals            Section 6.2.1
                       • Control Charts                  Section 7

More than 50%        Test of Proportions                 Section 8.1.2
It should be noted that the nonparametric methods presented earlier auto-
matically deal with values below detection by regarding them as all tied at a
level below any quantitated results. The nonparametric methods may be used if
there is a moderate amount of data below detection. If the proportion of non-
quantified values in the data exceeds 25%, these methods should be used with
caution. They should probably not be used if less than half of the data con-
sists of quantified concentrations.
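The percentage breakpoints suggested in Table 8-1 can be written as a small lookup, sketched below in Python. The function name `suggested_approach` and the returned strings are illustrative only, and the cutoffs are the judgment-based values from the table, not fixed rules.

```python
def suggested_approach(n_nondetects, n_total):
    """Map the percentage of nondetects to the approach suggested in Table 8-1."""
    pct = 100.0 * n_nondetects / n_total
    if pct < 15:
        return "substitute MDL/2 or PQL/2, then use parametric procedures"
    if pct <= 50:
        return "nonparametric ANOVA (NDs as ties) or Cohen's adjustment"
    return "test of proportions"


few = suggested_approach(3, 24)        # 12.5% nondetects
moderate = suggested_approach(6, 24)   # 25% nondetects
many = suggested_approach(56, 88)      # about 64% nondetects
```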
8.1.1 The DL/2 Method
The amount of data that are below detection plays an important role in
selecting the method to deal with the limit of detection problem. If a small
proportion of the observations are not detected, these may be replaced with a
small number, usually the method detection limit divided by 2 (MDL/2), and the
usual analysis performed. This is the recommended method for use with the
analysis of variance procedure of Section 5.2.1. Seek professional help if in
doubt about dealing with values below detection limit. The results of the
analysis are generally not sensitive to the specific choice of the replacement
number.
As a guideline, if 15% or fewer of the values are not detected, replace
them with the method detection limit divided by two and proceed with the
appropriate analysis using these modified values. Practical quantitation
limits (PQL) for Appendix IX compounds were published by EPA in the Federal
Register (Vol 52, No 131, July 9, 1987, pp 25947-25952). These give practical
quantitation limits by compound and analytical method that may be used in
replacing a small amount of nondetected data with the quantitation limit
divided by 2. If approved by the Regional Administrator, site-specific PQLs
may be used in this procedure. If more than 15% of the values are reported as
not detected, it is preferable to use a nonparametric method or a test of
proportions.
8.1.2 Test of Proportions
If more than 50% of the data are below detection but at least 10% of the
observations are quantified, a test of proportions may be used to compare the
background well data with the compliance well data. Clearly, if none of the
background well observations were above the detection limit, but all of the
compliance well observations were above the detection limit, one would suspect
contamination. In general the difference may not be as obvious. However, a
higher proportion of quantitated values in compliance wells could provide evi-
dence of contamination. The test of proportions is a method to determine
whether a difference in proportion of detected values in the background well
observations and compliance well observations provides statistically signifi-
cant evidence of contamination.
The test of proportions should be used when the proportion of quantified
values is small to moderate (i.e., between 10% and 50%). If very few quanti-
fied values are found, a method based on the Poisson distribution may be used
as an alternative approach. A method based on a tolerance limit for the
number of detected compounds and the maximum concentration found for any
detected compound has been proposed by Gibbons (1988). This alternative would
be appropriate when the number of detected compounds is quite small relative
to the number of compounds analyzed for as might occur in detection
monitoring.
PURPOSE
The test of proportions determines whether the proportion of compounds
detected in the compliance well data differs significantly from the proportion
of compounds detected in the background well data. If there is a significant
difference, this is statistically significant evidence of contamination.
PROCEDURE
The procedure uses the normal distribution approximation to the binomial
distribution. This assumes that the sample size is reasonably large.
Generally, if the proportion of detected values is denoted by P, and the sample
size is n, then the normal approximation is adequate, provided that nP and
n(1-P) both are greater than or equal to 5.
Step 1. Check criterion for using the normal approximation.
Determine X, the number of background well samples in which the
compound was detected, and Y, the number of compliance well samples
in which the compound was detected.
Let $n_b$ be the total number of background well samples analyzed and
$n_c$ be the total number of compliance well samples analyzed. Let
$n = n_b + n_c$.
Estimate P with P = (X + Y)/n.
Compute nP and n(1 - P). If both products are $\geq 5$, then the normal
approximation may be used.
Step 2. Compute the proportion of detects in the background well
samples:
$P_b = X/n_b$
Step 3. Compute the proportion of detects in the compliance well
samples:
$P_c = Y/n_c$
Step 4. Compute the standard error of the difference in proportions:
$S_D = \{[(X+Y)/(n_b+n_c)][1 - (X+Y)/(n_b+n_c)][1/n_b + 1/n_c]\}^{1/2}$
and form the statistic:
$Z = (P_b - P_c)/S_D$
Step 5. Compare the absolute value of Z to the 97.5th percentile from
the standard normal distribution, 1.96. If the absolute value of Z exceeds
1.96, this provides statistically significant evidence at the 5% significance
level that the proportion of compliance well samples where the compound was
detected exceeds the proportion of background well samples where the compound
was detected. This would be interpreted as evidence of contamination. (The
two-sided test is used to provide information about differences in either
direction.)
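Steps 1 through 5 can be sketched in Python as follows. The function name `proportions_z_test` and its return layout are assumptions for illustration; the counts used in the usage lines are the cadmium counts from the example that follows (8 detects in 24 background samples and 24 detects in 64 compliance samples).

```python
import math


def proportions_z_test(x, n_b, y, n_c, z_crit=1.96):
    """Two-sided test of proportions using the normal approximation.

    x, n_b: detects and total samples in the background wells.
    y, n_c: detects and total samples in the compliance wells.
    Returns (Z, significant), where significant means |Z| > z_crit.
    """
    n = n_b + n_c
    p = (x + y) / n                                   # pooled proportion (Step 1)
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not justified")
    p_b = x / n_b                                     # Step 2
    p_c = y / n_c                                     # Step 3
    s_d = math.sqrt(p * (1 - p) * (1 / n_b + 1 / n_c))  # Step 4
    z = (p_b - p_c) / s_d
    return z, abs(z) > z_crit                         # Step 5


z, significant = proportions_z_test(8, 24, 24, 64)
```

Because the comparison uses the absolute value of Z, the order of subtraction in the numerator does not affect the conclusion.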
EXAMPLE
Table 8-2 contains data on cadmium concentrations measured in background
well and compliance wells at a facility. In the table, "BDL" is used for
below detection limit.
Step 1. Check the adequacy of the normal approximation. From Table 8-2,
X = 8, $n_b$ = 24, Y = 24, $n_c$ = 64, and hence n = 88.
Calculate: P = (8 + 24)/(24 + 64) = 0.364
Compute: nP = 88(0.364) = 32
n(1-P) = 88(1 - 0.364) = 56
Since both of these exceed 5, the normal approximation is justified.
Step 2. Estimate the proportion above detection in the background
wells. As shown in Table 8-2, there were 24 samples from background wells
analyzed for cadmium, so $n_b$ = 24. Of these, 16 were below detection and X = 8
were above detection, so $P_b$ = 8/24 = 0.333.
Step 3. Estimate the proportion above detection in the compliance
wells. There were 64 samples from compliance wells analyzed for cadmium, with
40 below detection and 24 detected values. This gives $n_c$ = 64, Y = 24, so
$P_c$ = 24/64 = 0.375.
Step 4. Calculate the standard error of the difference in proportions.
$S_D = \{[(8+24)/(24+64)][1-(8+24)/(24+64)](1/24 + 1/64)\}^{1/2} = 0.115$
Step 5. Form the statistic Z and compare it to the normal distribution.
$Z = (0.375 - 0.333)/0.115 = 0.37$
which is less in absolute value than the value from the normal distribution,
1.96. Consequently, there is no statistically significant evidence that the
proportion of samples with cadmium levels above the detection limit differs in
the background well and compliance well samples.
TABLE 8-2. EXAMPLE DATA FOR A TEST OF PROPORTIONS

Cadmium concentration (µg/L) at background well (24 samples):
0.1     BDL     0.12    BDL     BDL*    BDL     0.26    BDL
BDL     0.1     BDL     0.014   BDL     BDL     BDL     BDL
BDL     0.12    BDL     0.21    BDL     0.12    BDL     BDL

Cadmium concentration (µg/L) at compliance wells (64 samples):
0.12    0.08    BDL     0.2     BDL     0.1     BDL     0.012
BDL     BDL     BDL     BDL     BDL     0.12    0.07    BDL
0.19    BDL     0.1     BDL     0.01    BDL     BDL     BDL
BDL     BDL     0.11    0.06    BDL     0.23    BDL     0.11
BDL     0.031   BDL     BDL     BDL     BDL     BDL     0.12
0.08    BDL     0.26    BDL     0.02    BDL     0.024   BDL
BDL     BDL     BDL     BDL     0.1     0.04    BDL     BDL
0.1     BDL     0.01    BDL     BDL     BDL     BDL     BDL

* BDL means below detection limit.
INTERPRETATION
Since the proportion of water samples with detected amounts of cadmium in
the compliance wells was not significantly different from that in the
background wells, the data are interpreted to provide no evidence of contam-
ination. Had the proportion of samples with detectable levels of cadmium in
the compliance wells been significantly higher than that in the background
wells this would have been evidence of contamination. Had the proportion been
significantly higher in the background wells, additional study would have been
required. This could indicate that contamination was migrating from an
off-site source, or it could mean that the hydraulic gradient had been
incorrectly estimated or had changed and that contamination was occurring from
the facility, but the ground-water flow was not in the direction originally
estimated. Mounding of contaminants in the ground water near the background
wells could also be a possible explanation of this observation.
8.1.3 Cohen's Method
If a confidence interval or a tolerance interval based upon the normal
distribution is being constructed, a technique presented by Cohen (1959)
specifies a method to adjust the sample mean and sample standard deviation to
account for data below the detection limit. The only requirements for the use
of this technique are that the data are normally distributed and that the
detection limit is always the same. This technique is demonstrated below.
PURPOSE
Cohen's method provides estimates of the sample mean and standard devia-
tion when some (< 50%) observations are below detection. These estimates can
then be used to construct tolerance, confidence, or prediction intervals.
PROCEDURE
Let n be the total number of observations, m represent the number of data
points above the detection limit (DL), and $X_i$ represent the value of the ith
constituent value above the detection limit.
Step 1. Compute the sample mean $\bar{x}_d$ from the data above the detection
limit as follows:
$\bar{x}_d = \frac{1}{m} \sum_{i=1}^{m} X_i$
Step 2. Compute the sample variance $S_d^2$ from the data above the detection
limit as follows:
$S_d^2 = \frac{\sum_{i=1}^{m} (X_i - \bar{x}_d)^2}{m-1} = \frac{\sum_{i=1}^{m} X_i^2 - \frac{1}{m}\left(\sum_{i=1}^{m} X_i\right)^2}{m-1}$
Step 3. Compute the two parameters, h and $\gamma$ (lowercase gamma), as
follows:
$h = (n - m)/n$
and
$\gamma = S_d^2/(\bar{x}_d - DL)^2$
where n is the total number of observations (i.e., above and below the
detection limit), and where DL is equal to the detection limit.
These values are then used to determine the value of the parameter
$\hat{\lambda}$ from Table 7 in Appendix B.
Step 4. Estimate the corrected sample mean, which accounts for the data
below detection limit, as follows:
$\bar{X} = \bar{x}_d - \hat{\lambda}(\bar{x}_d - DL)$
Step 5. Estimate the corrected sample standard deviation, which accounts
for the data below detection limit, as follows:
$S = (S_d^2 + \hat{\lambda}(\bar{x}_d - DL)^2)^{1/2}$
Step 6. Use the corrected values of $\bar{X}$ and S in the procedure for
constructing a tolerance interval (Section 5.3) or a confidence interval
(Section 6.2.1).
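The arithmetic of Steps 1 through 5 can be sketched in Python as below. The function names `censoring_parameters` and `cohen_estimates` are hypothetical, and the tabulated parameter must still be looked up (or interpolated) from Table 7 of Appendix B by the user; the sketch does not reproduce that table. The data are the 21 quantified sulfate values of Table 8-3, and the tabulated parameter value 0.14986 is the one obtained in the example below.

```python
import math
from statistics import mean, variance


def censoring_parameters(detects, n_total, dl):
    """Steps 1-3: mean and variance of the detects, plus h and gamma."""
    m = len(detects)
    xbar_d = mean(detects)
    s2_d = variance(detects)               # sample variance, m - 1 denominator
    h = (n_total - m) / n_total
    gamma = s2_d / (xbar_d - dl) ** 2
    return xbar_d, s2_d, h, gamma


def cohen_estimates(xbar_d, s2_d, dl, lam):
    """Steps 4-5: corrected mean and standard deviation for a tabulated lam."""
    adj_mean = xbar_d - lam * (xbar_d - dl)
    adj_sd = math.sqrt(s2_d + lam * (xbar_d - dl) ** 2)
    return adj_mean, adj_sd


# Quantified sulfate values from Table 8-3 (3 of the 24 results were < 1,450).
sulfate = [1850.0, 1760.0, 1710.0, 1575.0, 1475.0, 1780.0, 1790.0,
           1780.0, 1790.0, 1800.0, 1800.0, 1840.0, 1820.0, 1860.0,
           1780.0, 1760.0, 1800.0, 1900.0, 1770.0, 1790.0, 1780.0]
xbar_d, s2_d, h, gamma = censoring_parameters(sulfate, n_total=24, dl=1450.0)
adj_mean, adj_sd = cohen_estimates(xbar_d, s2_d, dl=1450.0, lam=0.14986)
```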
REFERENCE
Cohen, A. C., Jr. 1959. "Simplified Estimators for the Normal Distribution
When Samples are Singly Censored or Truncated." Technometrics. 1:217-237.
EXAMPLE
Table 8-3 contains data on sulfate concentrations. Three observations of
the 24 were below the detection limit of 1,450 mg/L and are denoted by
"< 1,450" in the table.
Step 1. Calculate the mean from the m = 21 values above detection
$\bar{x}_d$ = 1,771.9
Step 2. Calculate the sample variance from the 21 quantified values
$S_d^2$ = 8,593.69
TABLE 8-3. EXAMPLE DATA FOR COHEN'S TEST
Sulfate concentration (mg/L)
1,850
1,760
< 1,450
1,710
1,575
1,475
1,780
1,790
1,780
< 1,450
1,790
1,800
< 1,450
1,800
1,840
1,820
1,860
1,780
1,760
1,800
1,900
1,770
1,790
1,780
DL = 1,450 mg/L
Note: A symbol "<" before a number indicates that the value
is not detected. The number following is then the limit of
detection.
Step 3. Determine
h = (24-21)/24 = 0.125
and
$\gamma = 8593.69/(1771.9 - 1450)^2 = 0.083$
Enter Table 7 of Appendix B at h = 0.125 and $\gamma$ = 0.083 to determine the
value of $\hat{\lambda}$. Since the table does not contain these entries
exactly, double linear interpolation was used to estimate
$\hat{\lambda}$ = 0.14986.
REMARK
For the interested reader, the details of the double linear interpolation
are provided.
The values from Table 7 between which the user needs to interpolate are:
$\gamma$      h = 0.10     h = 0.15
0.05          0.11431      0.17935
0.10          0.11804      0.18479
There are 0.025 units between 0.10 and 0.125 on the h-scale. There are
0.05 units between 0.10 and 0.15. Therefore, the value of interest (0.125)
lies (0.025/0.05 * 100) = 50% of the distance along the interval between 0.10
and 0.15. To linearly interpolate between the tabulated values on the h axis,
the range between the values must be calculated, the value that is 50% of the
distance along the range must be computed and then that value must be added to
the lower point on the tabulated values. The result is the interpolated
value. The interpolated points on the h-scale for the current example are:
0.17935 - 0.11431 = 0.06504    0.06504 * 0.50 = 0.03252
0.11431 + 0.03252 = 0.14683
0.18479 - 0.11804 = 0.06675    0.06675 * 0.50 = 0.033375
0.11804 + 0.033375 = 0.151415
On the $\gamma$-axis there are 0.033 units between 0.05 and 0.083. There are
0.05 units between 0.05 and 0.10. The value of interest (0.083) lies
(0.033/0.05 * 100) = 66% of the distance along the interval between 0.05 and
0.10. The interpolated point on the $\gamma$-axis is:
0.151415 - 0.14683 = 0.004585    0.004585 * 0.66 = 0.0030261
0.14683 + 0.0030261 = 0.14986
Thus, $\hat{\lambda}$ = 0.14986.
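The double linear interpolation described in this remark can be sketched in Python as follows. The function name `interpolate_table_value` and the dictionary layout keyed by (gamma, h) are assumptions for illustration; only the four tabulated values quoted above are used.

```python
def interpolate_table_value(h, gamma, h_lo, h_hi, g_lo, g_hi, table):
    """Bilinear interpolation between four surrounding table entries.

    table maps (gamma, h) grid points to the tabulated parameter value.
    """
    frac_h = (h - h_lo) / (h_hi - h_lo)          # position along the h-scale
    frac_g = (gamma - g_lo) / (g_hi - g_lo)      # position along the gamma-scale
    # Interpolate along h at each of the two tabulated gamma values.
    at_g_lo = table[(g_lo, h_lo)] + frac_h * (table[(g_lo, h_hi)] - table[(g_lo, h_lo)])
    at_g_hi = table[(g_hi, h_lo)] + frac_h * (table[(g_hi, h_hi)] - table[(g_hi, h_lo)])
    # Then interpolate between those two results along gamma.
    return at_g_lo + frac_g * (at_g_hi - at_g_lo)


corners = {(0.05, 0.10): 0.11431, (0.05, 0.15): 0.17935,
           (0.10, 0.10): 0.11804, (0.10, 0.15): 0.18479}
lam = interpolate_table_value(0.125, 0.083, 0.10, 0.15, 0.05, 0.10, corners)
```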
Steps 4 and 5. The corrected sample mean and standard deviation are then
estimated as follows:
$\bar{X}$ = 1,771.9 - 0.14986 (1,771.9 - 1,450) = 1,723.66
S = $[8,593.69 + 0.14986(1,771.9 - 1,450)^2]^{1/2}$ = 155.31
Step 6. These modified estimates of the mean, $\bar{X}$ = 1723.66, and of the
standard deviation, S = 155.31, would be used in the tolerance or confidence
interval procedure. For example, if the sulfate concentrations represent
background at a facility, the upper 95% tolerance limit becomes
1723.7 + (155.3)(2.309) = 2082.3 mg/L
Observations from compliance wells in excess of 2,082 mg/L would give
statistically significant evidence of contamination.
INTERPRETATION
Cohen's method provides maximum likelihood estimates of the mean and
variance of a censored normal distribution, that is, of observations that
follow a normal distribution except for those below a limit of detection,
which are reported as "not detected." The modified estimates reflect the fact
that the not detected observations are below the limit of detection, but not
necessarily zero. The large sample properties of the modified estimates allow
them to be used with the normal theory procedures as a means of adjusting
for not detected values in the data. Use of Cohen's method in more
complicated calculations, such as those required for analysis of variance
procedures, requires special consideration from a professional statistician.
8.2 OUTLIERS
A concentration value for a ground-water constituent that differs greatly
from most other values in the data set for that constituent can be referred
to as an "outlier." Possible reasons for outliers include:
• A catastrophic unnatural occurrence such as a spill;
• Inconsistent sampling or analytical chemistry methodology that may
result in laboratory contamination or other anomalies;
• Errors in the transcription of data values or decimal points; and
• True but extreme ground-water constituent concentration measurements.
There are several tests to determine if there is statistical evidence
that an observation is an outlier. The reference for the test presented here
is ASTM paper E178-75.
PURPOSE
The purpose of a test for outliers is to determine whether there is
statistical evidence that an observation that appears extreme does not fit the
distribution of the rest of the data. If a suspect observation is identified
as an outlier, then steps need to be taken to determine whether it is the
result of an error or a valid extreme observation.
PROCEDURE
Let the sample of observations of a hazardous constituent of ground water
be denoted by $X_1, \ldots, X_n$. For specificity, assume that the data have
been ordered and that the largest observation, denoted by $X_n$, is suspected
of being an outlier. Generally, inspection of the data suggests values that
do not appear to belong to the data set. For example, if the largest
observation is an order of magnitude larger than the other observations, it
would be suspect.
Step 1. Calculate the mean, $\bar{X}$, and the standard deviation, S, of the
data including all observations.
Step 2. Form the statistic, $T_n$:
$T_n = (X_n - \bar{X})/S$
Note that $T_n$ is the difference between the largest observation and the
sample mean, divided by the sample standard deviation.
Step 3. Compare the statistic $T_n$ to the critical value given the sample
size, n, in Table 8 in Appendix B. If the $T_n$ statistic exceeds the critical
value from the table, this is evidence that the suspect observation, $X_n$,
is a statistical outlier.
Step 4. If the value is identified as an outlier, one of the actions
outlined below should be taken. (The appropriate action depends on what can
be learned about the observation.) The records of the sampling and analysis
of the sample that led to it should be investigated to determine whether the
outlier resulted from an error that can be identified.
• If an error (in transcription, dilution, analytical procedure, etc.)
can be identified and the correct value recovered, the observation should be
replaced by its corrected value and the appropriate statistical analysis done
with the corrected value.
• If it can be determined that the observation is in error, but the
correct value cannot be determined, then the observation should be deleted
from the data set and the appropriate statistical analysis performed. The
fact that the observation was deleted and the reason for its deletion should
be reported when reporting the results of the statistical analysis.
• If no error in the value can be documented, then it must be assumed
that the observation is a true but extreme value. In this case it must not be
altered. It may be desirable to obtain another sample to confirm the
observation. However, analysis and reporting should retain the observation
and state that no error was found in tracing the sample that led to the
extreme observation.
EXAMPLE
Table 8-4 contains 19 values of total organic carbon (TOC) that were
obtained from a monitoring well. Inspection shows one value which at 11,000
mg/L is nearly an order of magnitude larger than most of the other observa-
tions. It is a suspected outlier.
TABLE 8-4. EXAMPLE DATA FOR TESTING FOR AN OUTLIER
Total organic carbon (mg/L)
1,700
1,900
1,500
1,300
11,000
1,250
1,000
1,300
1,200
1,450
1,000
1,300
1,000
2,200
4,900
3,700
1,600
2,500
1,900
Step 1. Calculate the mean and standard deviation of the data.
$\bar{X}$ = 2,300 and S = 2,325.9
Step 2. Calculate the statistic $T_{19}$.
$T_{19} = (11000 - 2300)/2325.9 = 3.74$
Step 3. Referring to Table 8 of Appendix B for the upper 5% significance
level, with n = 19, the critical value is 2.532. Since the value of the
statistic T19 = 3.74 is greater than 2.532, there is statistical evidence
that the largest observation is an outlier.
Step 4. In this case, tracking the data revealed that the unusual value
of 11,000 resulted from a keying error and that the correct value was 1,100.
This correction was then made in the data.
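Steps 1 through 3 of this example can be reproduced with Python's standard library. The sketch below is illustrative only; the function name outlier_statistic is an assumption made here. The data are the 19 TOC values from Table 8-4 (before the keying error was corrected), and the critical value 2.532 is the Table 8 entry quoted in Step 3.

```python
import statistics

# Total organic carbon (mg/L) from Table 8-4, including the suspect value.
toc = [1700, 1900, 1500, 1300, 11000, 1250, 1000, 1300, 1200, 1450,
       1000, 1300, 1000, 2200, 4900, 3700, 1600, 2500, 1900]


def outlier_statistic(values):
    """T_n = (largest observation - sample mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # divides by n - 1, as in the guidance
    return (max(values) - mean) / sd


t19 = outlier_statistic(toc)
critical_value = 2.532              # Table 8, n = 19, upper 5% level
is_statistical_outlier = t19 > critical_value

print(round(t19, 2), is_statistical_outlier)
```

A result above the critical value only flags the observation; as Step 4 describes, what happens next depends on whether an error can be documented.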
INTERPRETATION
An observation that is 4 or 5 times as large as the rest of the data is
generally viewed with suspicion. An observation that is an order of magnitude
different could arise by a common error of misplacing a decimal. The test for
an outlier provides a statistical basis for determining whether an observation
is statistically different from the rest of the data. If it is, then it is a
statistical outlier. However, a statistical outlier may not be dropped or
altered just because it has been identified as an outlier. The test provides
a formal identification of an observation as an outlier, but does not identify
the cause of the difference.
Whether or not a statistical test is done, any suspect data point should
be checked. An observation may be corrected or dropped only if it can be
determined that an error has occurred. If the error can be identified and
corrected (as in transcription or keying) the correction should be made and
the corrected values used. A value that is demonstrated to be incorrect may
be deleted from the data. However, if no specific error can be documented,
the observation must be retained in the data. Identification of an observa-
tion as an outlier but with no error documented could be used to suggest
resampling to confirm the value.
-------
APPENDIX A
GENERAL STATISTICAL CONSIDERATIONS AND
GLOSSARY OF STATISTICAL TERMS
-------
GENERAL STATISTICAL CONSIDERATIONS
FALSE ALARMS OR TYPE I ERRORS
The statistical analysis of data from ground-water monitoring at RCRA
sites has as its goal the determination of whether the data provide evidence
of the presence of, or an increase in the level of contamination. In the case
of detection monitoring, the goal of the statistical analysis is to determine
whether statistically significant evidence of contamination exists. In the
case of compliance monitoring, the goal is to determine whether statistically
significant evidence of concentration levels exceeding compliance limits
exists. In monitoring sites in corrective action, the goal is to determine
whether levels of the hazardous constituents are still above compliance limits
or have been reduced to or below the compliance limit.
These questions are addressed by the use of hypothesis tests. In the
case of detection monitoring, it is hypothesized that a site is not contami-
nated; that is, the hazardous constituents are not present in the ground
water. Samples of the ground water are taken and analyzed for the
constituents in question. A hypothesis test is used to decide whether the
data indicate the presence of the hazardous constituent. The test consists of
calculating one or more statistics from the data and comparing the calculated
results to some prespecified critical levels.
In performing a statistical test, there are four possible outcomes. Two
of the possible outcomes result in the correct decision: (a) the test may
correctly indicate that no contamination is present or (b) the test may cor-
rectly indicate the presence of contamination. The other two possibilities
are errors: (c) the test may indicate that contamination is present when in
fact it is not or (d) the test may fail to detect contamination when it is
present.
If the stated hypothesis is that no contamination is present (usually
called the null hypothesis) and the test indicates that contamination is
present when in fact it is not, this is called a Type I error. Statistical
hypothesis tests are generally set up to control the probability of Type I
error to be no more than a specified value, called the significance level, and
usually denoted by α. Thus in detection monitoring, the null hypothesis would
be that the level of each hazardous constituent is zero (or at least below
detection). The test would reject this hypothesis if some measure of concen-
tration were too large, indicating contamination. A Type I error would be a
false alarm or a triggering event that is inappropriate.
In compliance monitoring, the null hypothesis is that the level of each
hazardous constituent is less than or equal to the appropriate compliance
limit. For the purpose of setting up the statistical procedure, the simple
null hypothesis that the level is equal to the compliance limit would be
used. As in detection monitoring, the test would indicate contamination if
some measure of concentration is too large. A Type I error, or false alarm,
would occur if the statistical procedure indicated that levels exceed the
appropriate compliance limits when, in fact, they do not.
PROBABILITY OF DETECTION AND TYPE II ERROR
The other type of error that can occur is called a Type II error. It
occurs if the test fails to detect contamination that is present. Thus a
Type II error is a missed detection. While the probability of a Type I error
can be specified, since it is the probability that the test will give a false
alarm, the probability of a Type II error depends on several factors, includ-
ing the statistical test, the sample size, and the significance level or prob-
ability of Type I error. In addition, it depends on the degree of contamina-
tion present. In general, the probability of a Type II error decreases as the
level of contamination increases. Thus a test may be likely to miss low lev-
els of contamination, less likely to miss moderate contamination, and very
unlikely to miss high levels of contamination.
One can discuss the probability of a Type II error as the probability of
a missed detection, or one can discuss the complement (one minus the prob-
ability of Type II error) of this probability. The complement, or probability
of detection, is also called the power of the test. It depends on the magni-
tude of the contamination so that the power or probability of detecting con-
tamination increases with the degree of contamination.
If the probability of a Type I error is specified, then for a given sta-
tistical test, the power depends on the sample size and the alternative of
interest. In order to specify a desired power or probability of detection,
one must specify the alternative that should be detected. Since generally the
power will increase as the alternative differs more and more from the null
hypothesis, one usually tries to specify the alternative that is closest to
the null hypothesis, yet enough different that it is important to detect.
In the detection monitoring situation, the null hypothesis is that the
concentration of the hazardous constituent is zero (or at least below detec-
tion). In this case the alternative of interest is that there is a concen-
tration of the hazardous constituent that is above the detection limit and is
large enough so that the monitoring procedure should detect it. Since it is a
very difficult problem to select a concentration of each hazardous constituent
that should be detectable with specified power, a more useful approach is to
determine the power of a test at several alternatives and decide whether the
procedure is acceptable on the basis of this power function rather than on the
power against a single alternative.
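A power function of this kind can be sketched as follows. This is a simplified illustration, not one of the procedures recommended in this guidance: it assumes a one-sided test of a mean based on n independent, normally distributed observations with known standard deviation sigma, and the function name power_one_sided_z is hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Probability of rejecting the null hypothesis when the true mean
    exceeds the null value by `delta` (one-sided z-test, known sigma)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)               # critical value at level alpha
    return 1 - z.cdf(z_crit - delta * n ** 0.5 / sigma)


# Evaluate the power at several alternatives rather than at a single one.
# delta is expressed in the same units as sigma (assumed values).
for delta in (0.0, 0.5, 1.0, 2.0):
    print(delta, round(power_one_sided_z(delta, sigma=1.0, n=4), 3))
```

At delta = 0 the function returns the significance level itself, and the power rises toward 1 as the alternative moves away from the null hypothesis or as n increases.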
In order to increase the power, a larger sample must be taken. This
would mean sampling at more frequent intervals. There is a limit to how much
can be achieved, however. In cases with limited water flow, it may not be
possible to sample wells as frequently as desired. If samples close together
in time prove to be correlated, this correlation reduces the information
available from the different samples. The additional cost of sampling and
analysis will also impose practical limitations on the sample size that can be
used.
Additional wells could also be used to increase the performance of the
test. The additional monitoring wells would primarily be helpful in ensuring
that a plume would not escape detection by missing the monitoring wells.
However, in some situations the additional wells would contribute to a larger
sample size and so improve the power.
In compliance monitoring the emphasis is on determining whether addi-
tional contamination has occurred, raising the concentration above a compli-
ance limit. If the compliance limit is determined from the background well
levels, the null hypothesis is that the difference between the background and
compliance well concentrations is zero. The alternative of interest is that
the compliance well concentration exceeds the background concentration. This
situation is essentially the same for power considerations as that of the
detection monitoring situation.
If compliance monitoring is relative to a compliance limit (MCL or ACL),
specified as a constant, then the situation is different. Here the null hypo-
thesis is that the concentration is less than or equal to the compliance
limit, with equality used to establish the test. The alternative is that the
concentration is above the compliance limit. In order to specify power, a
minimum amount above the compliance limit must be established and power speci-
fied for that alternative or the power function evaluated for several possible
alternatives.
SAMPLE DESIGNS AND ASSUMPTIONS
As discussed in Section 2, the sample design to be employed at a regu-
lated unit will primarily depend on the hydrogeologic evaluation of the
site. Wells should be sited to provide multiple background wells hydrauli-
cally upgradient from the regulated unit. The background wells allow for
determination of natural spatial variability in ground-water quality. They
also allow for estimation of background levels with greater precision than
would be possible from a single upgradient well. Compliance wells should be
sited hydraulically downgradient to each regulated unit. The location and
spacing of the wells, as well as the depth of sampling, would be determined
from the hydrogeology to ensure that at least one of the wells should inter-
cept a plume of contamination of reasonable size.
Thus the assumed sample design is for a sample of wells to include a
number of background wells for the site, together with a number of compliance
wells for each regulated unit at the site. In the event that a site has only
a single regulated unit, there would be two groups of wells, background and
compliance. If a site has multiple regulated units, there would be a set of
compliance wells for each regulated unit, allowing for detection monitoring or
compliance monitoring separately at each regulated unit.
Data from the analysis of the water at each well are initially assumed to
follow a normal distribution. This is likely to be the case for detection
monitoring of analytes in that levels should be near zero and errors would
likely represent instrument or other sampling and analysis variability. If
contamination is present, then the distribution of the data may be skewed to
the right, giving a few very large values. The assumption of normality of
errors in the detection monitoring case is quite reasonable, with deviations
from normality likely indicating some degree of contamination. Tests of nor-
mality are recommended to ensure that the data are adequately represented by
the normal distribution.
In the compliance monitoring case, the data for each analyte will again
initially be assumed to follow the normal distribution. In this case, how-
ever, since there is a nonzero concentration of the analyte in the ground
water, normality is more of an issue. Tests of normality are recommended. If
evidence of nonnormality is found, the data should be transformed or a
distribution-free test be used to determine whether statistically significant
evidence of contamination exists.
The standard situation would result in multiple samples (taken at dif-
ferent times) of water from each well. The wells would form groups of back-
ground wells and compliance wells for each regulated unit. The statistical
procedures recommended would allow for testing each compliance well group
against the background group. Further, tests among the compliance wells
within a group are recommended to determine whether a single well might be
intercepting an isolated plume. The specific procedures discussed and
recommended in the preceding sections should cover the majority of cases.
They do not cover all of the possibilities. In the event that none of the
procedures
described and illustrated appears to apply to a particular case at a given
regulated site, consultation with a statistician should be sought to determine
an appropriate statistical procedure.
The following approach is recommended. If a regulated unit is in detec-
tion monitoring, it will remain in detection monitoring until or unless there
is statistically significant evidence of contamination, in which case it would
be placed in compliance monitoring. Likewise, if a regulated unit is in com-
pliance monitoring, it will remain in compliance monitoring unless or until
there is statistically significant evidence of further contamination, in which
case it would move into corrective action.
In monitoring a regulated unit with multiple compliance wells, two types
of significance levels are considered. One is an experimentwise significance
level and the other is a comparisonwise significance level. When a procedure
such as analysis of variance is used that considers several compliance wells
simultaneously, the significance is an experimentwise significance. If
individual well comparisons are made, each of those comparisons is done at a
comparisonwise significance level.
The fact that many comparisons will be made at a regulated unit with
multiple compliance wells can make the probability that at least one of the
comparisons will be incorrectly significant too high. To control the false
positive rate, multiple comparisons procedures are allowed that control the
experimentwise significance level to be 5%. That is, the probability that one
or more of the comparisons will falsely indicate contamination is controlled
at 5%. However, to provide some assurance of adequate power to detect real
contamination, the comparisonwise significance level for comparing each
individual well to the background is required to be no less than 1%.
Control of the experimentwise significance level via multiple comparisons
procedures is allowed for comparisons among several wells. However, use of an
experimentwise significance level for the comparisons among the different haz-
ardous constituents is not permitted. Each hazardous constituent to be moni-
tored for in the permit must be treated separately.
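The relationship between the two significance levels can be illustrated with a small Python sketch. The function names are hypothetical. The first function uses the Bonferroni division of the experimentwise level among k comparisons; the second assumes, purely for illustration, that the k comparisons are independent, which need not hold for wells compared against a common background.

```python
import math


def bonferroni_comparisonwise(alpha_experiment, k):
    """Comparisonwise level that keeps the experimentwise level at or
    below alpha_experiment for k comparisons (Bonferroni)."""
    return alpha_experiment / k


def experimentwise_if_independent(alpha_comparison, k):
    """Chance of at least one false positive among k comparisons,
    ASSUMING the comparisons are independent."""
    return 1 - (1 - alpha_comparison) ** k


def meets_floor(alpha_comparison, floor=0.01):
    """True if the comparisonwise level is no less than the 1% floor."""
    return alpha_comparison > floor or math.isclose(alpha_comparison, floor)


for k in (2, 5, 6, 10):
    alpha_c = bonferroni_comparisonwise(0.05, k)
    print(k, round(alpha_c, 4), meets_floor(alpha_c))

# Five unadjusted 5% comparisons give a much larger experimentwise level.
print(round(experimentwise_if_independent(0.05, 5), 4))
```

Under these assumptions, dividing a 5% experimentwise level among more than five comparisons pushes the comparisonwise level below the 1% floor stated above.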
-------
GLOSSARY OF STATISTICAL TERMS
(underlined terms are explained subsequently)
Alpha (α)
    A Greek letter used to denote the significance level or probability of
    a Type I error.

Alpha-error
    Sometimes used for Type I error.

Alternative hypothesis
    An alternative hypothesis specifies that the underlying distribution
    differs from the null hypothesis. The alternative hypothesis usually
    specifies the value of a parameter, for example the mean concentration,
    that one is trying to detect.

Arithmetic average
    The arithmetic average of a set of observations is their sum divided by
    the number of observations.

Autocorrelation
    This is a measure of dependence among sequential observations from the
    same well. There are different orders of autocorrelation, depending on
    how far apart in time the correlation persists. For example, the
    first-order autocorrelation is the correlation between successive pairs
    of observations.

Biased estimator
    A biased estimator is an estimator that has an expectation or average
    value that is not equal to the parameter it is estimating. Often the
    bias decreases as the sample size increases.

Bonferroni t
    This is an approach, developed by Bonferroni, to control the
    experimentwise error rate in multiple comparisons. The number of
    comparisons or hypotheses to be tested is fixed (at k) and a "t"
    statistic is computed to test each of these. Instead of the usual "t"
    table, where each of these tests would be done at the significance level
    alpha, a special table is used so that each test is done at level
    alpha/k. This ensures that the experimentwise error rate is no more than
    alpha.

Comparisonwise error rate
    This term is used in association with multiple comparisons. It refers to
    the probability of an error occurring on a single comparison of several
    that might be done. It is computed assuming that the single comparison
    or hypothesis test is the only one being done.

Composite hypothesis
    This is a hypothesis for which not all relevant parameters are
    specified. A composite hypothesis is made up of two or more simple
    hypotheses. For example, the hypothesis that the data are normally
    distributed with unspecified mean and variance is a composite
    hypothesis.

Confidence coefficient
    The confidence coefficient of a confidence interval for a parameter is
    the probability that the random interval constructed from the sample
    data contains the true value of the parameter. The confidence
    coefficient is related to the significance level of an associated
    hypothesis test by the fact that the significance level (in percent) is
    one hundred minus the confidence coefficient (in percent).

Confidence interval
    A confidence interval for a parameter is a random interval constructed
    from sample data in such a way that the probability that the interval
    will contain the true value of the parameter is a specified value.

Cumulative distribution function
    The distribution function for a random variable, X, is a function that
    specifies the probability that X is less than or equal to t, for all
    real values of t.

Distribution-free
    This is sometimes used as a synonym for nonparametric. A statistic is
    distribution-free if its distribution does not depend upon which
    specific distribution function (in a large class) the observations
    follow.

Distribution function
    This document uses "Cumulative Distribution Function" and "Distribution
    Function" interchangeably. See Cumulative Distribution Function.

Estimator
    An estimator is a statistic computed from the observed data. It is used
    to estimate a parameter of interest; for example, the population mean.
    Often estimators are the sample equivalents of the population
    parameters.

Experimentwise error rate
    This term refers to multiple comparisons. If a total of n decisions are
    made about comparisons (for example of compliance wells to background
    wells) and x of the decisions are wrong, then the experimentwise error
    rate is x/n. The probability that x exceeds zero is the experimentwise
    significance.

Hypothesis
    This is a formal statement about a parameter of interest and the
    distribution of a statistic. It is usually used as a null hypothesis or
    an alternative hypothesis. For example, the null hypothesis might
    specify that ground water had a zero concentration of benzene and that
    analytical errors followed a normal distribution with mean zero and
    standard deviation 1 ppm.

Independence
    A set of events are independent if the probability of the joint
    occurrence of any subset of the events factors into the product of the
    probabilities of the events. A set of observations is independent if the
    joint distribution function of the random errors associated with the
    observations factors into the product of the distribution functions.

Mean
    Arithmetic average.

Median
    This is the middle value of a sample when the observations have been
    ordered from least to greatest. If the number of observations is odd,
    it is the middle observation. If the number of observations is even, it
    is customary to take the midpoint between the two middle observations.
    For a distribution, the median is a value such that the probability is
    one-half that an observation will fall above or below the median.

Multiple comparison procedure
    This is a statistical procedure that makes a large number of decisions
    or comparisons on one set of data. For example, at a sampling period,
    several compliance well concentrations may be compared to the
    background well concentration.

Nonparametric statistical procedure
    A nonparametric statistical procedure is a statistical procedure that
    has desirable properties that hold under mild assumptions regarding the
    data. Typically the procedure is valid for a large class of
    distributions rather than for a specific distribution of the data such
    as the normal.

Normal population, normality
    The errors associated with the observations follow the normal or
    Gaussian distribution function.

Null hypothesis
    A null hypothesis specifies the underlying distribution of the data
    completely. Often the null distribution specifies that there is no
    difference between the mean concentration in background well water
    samples and compliance well water samples. Typically, the null
    hypothesis is a simple hypothesis.

One-sided test
    A one-sided test is appropriate if concentrations higher than those
    specified by the null hypothesis are of concern. A one-sided test only
    rejects for differences that are large and in a prespecified direction.

One-sided tolerance limit
    This is an upper limit on observations from a specified distribution.

One-sided confidence limit
    This is an upper limit on a parameter of a distribution.

Order statistics
    The sample values observed after they have been arranged in increasing
    order.

Outlier
    An outlier is an observation that is found to lie an unusually long way
    from the rest of the observations in a series of replicate
    observations.

Parameter
    A parameter is an unknown constant associated with a population. For
    example, the mean concentration of a hazardous constituent in ground
    water is a parameter of interest.

Percentile
    A percentile of a distribution is a value below which a specified
    proportion or percent of the observations from that distribution will
    fall.

Post hoc comparison
    This is a comparison, say between hazardous constituent concentrations
    in two wells, that was found to be of interest after the data were
    collected. Special methods must be used to determine significance
    levels for post hoc comparisons.

Power
    The power of a test is the probability that the test will reject under
    a specified alternative hypothesis. This is one minus the probability
    of a Type II error. The power is a measure of the test's ability to
    detect a difference of specified size from the null hypothesis.

Sample standard deviation
    This is the square root of the sample variance.

Sample variance
    This is a statistic (computed on a sample of observations rather than
    on the whole population) that measures the variability or spread of the
    observations about the sample mean. It is the sum of the squared
    differences from the sample mean, divided by the number of observations
    less one.

Serial correlation
    This is the correlation of observations spaced a constant interval
    apart in a series. For example, the first order serial correlation is
    the correlation between adjacent observations. The first order serial
    correlation is found by correlating the pairs consisting of the first
    and second, second and third, third and fourth, etc., observations.

Significance level
    Sometimes referred to as the alpha level, the significance level of a
    test is the probability of falsely rejecting a true null hypothesis.
    The probability of a Type I error.

Simple hypothesis
    A hypothesis which completely specifies the distribution of the
    observed random variables. To completely define a distribution, both
    the type of distribution and numeric values for the parameters must be
    given.

Test statistic
    A test statistic is a value computed from the observed data. This value
    is used to test a hypothesis by relating the value to a distribution
    table and rejecting the hypothesis if the computed value falls in a
    region that has low probability under the hypothesis being tested. A
    "t" statistic, an "F" statistic, and a chi-squared statistic are
    examples.

Trend analysis
    This refers to a collection of statistical methods that analyze data to
    determine trends over time. The trends may be of various types, steady
    increases (or decreases), or a step increase at a point in time.

Type I error
    A Type I error occurs when a true null hypothesis is rejected
    erroneously. In the monitoring context a Type I error occurs when a
    test incorrectly indicates contamination or an increase in
    contamination at a regulated unit.

Type II error
    A Type II error occurs when one fails to reject a null hypothesis that
    is false. In the monitoring context, a Type II error occurs when
    monitoring fails to detect contamination or an increase in a
    concentration of a hazardous constituent.

Unbiased estimator
    An unbiased estimator is an estimator that has zero bias. That is, its
    expectation is equal to the parameter it is estimating. Its average
    value is the parameter.
-------
APPENDIX B
STATISTICAL TABLES
-------
CONTENTS
Table
1   Percentiles of the χ² Distribution With ν Degrees of Freedom,
    $\chi^2_{\nu,p}$
2   95th Percentiles of the F-Distribution With ν1 and ν2 Degrees of
    Freedom, $F_{\nu_1,\nu_2,0.95}$
3   95th Percentiles of the Bonferroni t-Statistics, t(ν, α/m)
4   Percentiles of the Standard Normal Distribution, $U_p$
5   Tolerance Factors (K) for One-Sided Normal Tolerance Intervals With
    Probability Level (Confidence Factor) γ = 0.95 and Coverage P = 95%
6   Percentiles of Student's t-Distribution
7   Values of the Parameter λ for Cohen's Estimates Adjusting for
    Nondetected Values
8   Critical Values for $T_n$ (One-Sided Test) When the Standard Deviation
    Is Calculated From the Same Sample
-------
TABLE 1. PERCENTILES OF THE χ² DISTRIBUTION WITH
ν DEGREES OF FREEDOM, $\chi^2_{\nu,p}$
  ν \ p   0.750   0.900   0.950   0.975   0.990   0.995   0.999
    1     1.323   2.706   3.841   5.024   6.635   7.879   10.83
    2     2.773   4.605   5.991   7.378   9.210   10.60   13.82
    3     4.108   6.251   7.815   9.348   11.34   12.84   16.27
    4     5.385   7.779   9.488   11.14   13.28   14.86   18.47
    5     6.626   9.236   11.07   12.83   15.09   16.75   20.52
    6     7.841   10.64   12.59   14.45   16.81   18.55   22.46
    7     9.037   12.02   14.07   16.01   18.48   20.28   24.32
    8     10.22   13.36   15.51   17.53   20.09   21.96   26.12
    9     11.39   14.68   16.92   19.02   21.67   23.59   27.88
   10     12.55   15.99   18.31   20.48   23.21   25.19   29.59
   11     13.70   17.28   19.68   21.92   24.72   26.76   31.26
   12     14.85   18.55   21.03   23.34   26.22   28.30   32.91
   13     15.98   19.81   22.36   24.74   27.69   29.82   34.53
   14     17.12   21.06   23.68   26.12   29.14   31.32   36.12
   15     18.25   22.31   25.00   27.49   30.58   32.80   37.70
   16     19.37   23.54   26.30   28.85   32.00   34.27   39.25
   17     20.49   24.77   27.59   30.19   33.41   35.72   40.79
   18     21.60   25.99   28.87   31.53   34.81   37.16   42.31
   19     22.72   27.20   30.14   32.85   36.19   38.58   43.82
   20     23.83   28.41   31.41   34.17   37.57   40.00   45.32
   21     24.93   29.62   32.67   35.48   38.93   41.40   46.80
   22     26.04   30.81   33.92   36.78   40.29   42.80   48.27
   23     27.14   32.01   35.17   38.08   41.64   44.18   49.73
   24     28.24   33.20   36.42   39.36   42.98   45.56   51.18
   25     29.34   34.38   37.65   40.65   44.31   46.93   52.62
   26     30.43   35.56   38.89   41.92   45.64   48.29   54.05
   27     31.53   36.74   40.11   43.19   46.96   49.64   55.48
   28     32.62   37.92   41.34   44.46   48.28   50.99   56.89
   29     33.71   39.09   42.56   45.72   49.59   52.34   58.30
   30     34.80   40.26   43.77   46.98   50.89   53.67   59.70
   40     45.62   51.80   55.76   59.34   63.69   66.77   73.40
   50     56.33   63.17   67.50   71.42   76.15   79.49   86.66
   60     66.98   74.40   79.08   83.30   88.38   91.95   99.61
   70     77.58   85.53   90.53   95.02   100.4   104.2   112.3
   80     88.13   96.58   101.9   106.6   112.3   116.3   124.8
   90     98.65   107.6   113.1   118.1   124.1   128.3   137.2
  100     109.1   118.5   124.3   129.6   135.8   140.2   149.4
SOURCE: Johnson, Norman L. and F. C. Leone. 1977. Statistics and
Experimental Design in Engineering and the Physical Sciences. Vol. I,
Second Edition. John Wiley and Sons, New York.
-------
TABLE 2. 95th PERCENTILES OF THE F-DISTRIBUTION WITH
ν1 AND ν2 DEGREES OF FREEDOM, Fν1,ν2,0.95

 ν2 \ ν1      1      2      3      4      5      6      7
    1     161.4  199.5  215.7  224.6  230.2  234.0  236.8
    2     18.51  19.00  19.16  19.25  19.30  19.33  19.35
    3     10.13   9.55   9.28   9.12   9.01   8.94   8.89
    4      7.71   6.94   6.59   6.39   6.26   6.16   6.09
    5      6.61   5.79   5.41   5.19   5.05   4.95   4.88
    6      5.99   5.14   4.76   4.53   4.39   4.28   4.21
    7      5.59   4.74   4.35   4.12   3.97   3.87   3.79
    8      5.32   4.46   4.07   3.84   3.69   3.58   3.50
    9      5.12   4.26   3.86   3.63   3.48   3.37   3.29
   10      4.96   4.10   3.71   3.48   3.33   3.22   3.14
   11      4.84   3.98   3.59   3.36   3.20   3.09   3.01
   12      4.75   3.89   3.49   3.26   3.11   3.00   2.91
   13      4.67   3.81   3.41   3.18   3.03   2.92   2.83
   14      4.60   3.74   3.34   3.11   2.96   2.85   2.76
   15      4.54   3.68   3.29   3.06   2.90   2.79   2.71
   16      4.49   3.63   3.24   3.01   2.85   2.74   2.66
   17      4.45   3.59   3.20   2.96   2.81   2.70   2.61
   18      4.41   3.55   3.16   2.93   2.77   2.66   2.58
   19      4.38   3.52   3.13   2.90   2.74   2.63   2.54
   20      4.35   3.49   3.10   2.87   2.71   2.60   2.51
   21      4.32   3.47   3.07   2.84   2.68   2.57   2.49
   22      4.30   3.44   3.05   2.82   2.66   2.55   2.46
   23      4.28   3.42   3.03   2.80   2.64   2.53   2.44
   24      4.26   3.40   3.01   2.78   2.62   2.51   2.42
   25      4.24   3.39   2.99   2.76   2.60   2.49   2.40
   26      4.23   3.37   2.98   2.74   2.59   2.47   2.39
   27      4.21   3.35   2.96   2.73   2.57   2.46   2.37
   28      4.20   3.34   2.95   2.71   2.56   2.45   2.36
   29      4.18   3.33   2.93   2.70   2.55   2.43   2.35
   30      4.17   3.32   2.92   2.69   2.53   2.42   2.33
   40      4.08   3.23   2.84   2.61   2.45   2.34   2.25
   60      4.00   3.15   2.76   2.53   2.37   2.25   2.17
  120      3.92   3.07   2.68   2.45   2.29   2.17   2.09
    ∞      3.84   3.00   2.60   2.37   2.21   2.10   2.01

 ν2 \ ν1      8      9     10     12     15     20     24     30     40     60    120      ∞
    1     238.9  240.5  241.9  243.9  245.9  248.0  249.1  250.1  251.1  252.2  253.3  254.3
    2     19.37  19.38  19.40  19.41  19.43  19.45  19.45  19.46  19.47  19.48  19.49  19.50
    3      8.85   8.81   8.79   8.74   8.70   8.66   8.64   8.62   8.59   8.57   8.55   8.53
    4      6.04   6.00   5.96   5.91   5.86   5.80   5.77   5.75   5.72   5.69   5.66   5.63
    5      4.82   4.77   4.74   4.68   4.62   4.56   4.53   4.50   4.46   4.43   4.40   4.36
    6      4.15   4.10   4.06   4.00   3.94   3.87   3.84   3.81   3.77   3.74   3.70   3.67
    7      3.73   3.68   3.64   3.57   3.51   3.44   3.41   3.38   3.34   3.30   3.27   3.23
    8      3.44   3.39   3.35   3.28   3.22   3.15   3.12   3.08   3.04   3.01   2.97   2.93
    9      3.23   3.18   3.14   3.07   3.01   2.94   2.90   2.86   2.83   2.79   2.75   2.71
   10      3.07   3.02   2.98   2.91   2.85   2.77   2.74   2.70   2.66   2.62   2.58   2.54
   11      2.95   2.90   2.85   2.79   2.72   2.65   2.61   2.57   2.53   2.49   2.45   2.40
   12      2.85   2.80   2.75   2.69   2.62   2.54   2.51   2.47   2.43   2.38   2.34   2.30
   13      2.77   2.71   2.67   2.60   2.53   2.46   2.42   2.38   2.34   2.30   2.25   2.21
   14      2.70   2.65   2.60   2.53   2.46   2.39   2.35   2.31   2.27   2.22   2.18   2.13
   15      2.64   2.59   2.54   2.48   2.40   2.33   2.29   2.25   2.20   2.16   2.11   2.07
   16      2.59   2.54   2.49   2.42   2.35   2.28   2.24   2.19   2.15   2.11   2.06   2.01
   17      2.55   2.49   2.45   2.38   2.31   2.23   2.19   2.15   2.10   2.06   2.01   1.96
   18      2.51   2.46   2.41   2.34   2.27   2.19   2.15   2.11   2.06   2.02   1.97   1.92
   19      2.48   2.42   2.38   2.31   2.23   2.16   2.11   2.07   2.03   1.98   1.93   1.88
   20      2.45   2.39   2.35   2.28   2.20   2.12   2.08   2.04   1.99   1.95   1.90   1.84
   21      2.42   2.37   2.32   2.25   2.18   2.10   2.05   2.01   1.96   1.92   1.87   1.81
   22      2.40   2.34   2.30   2.23   2.15   2.07   2.03   1.98   1.94   1.89   1.84   1.78
   23      2.37   2.32   2.27   2.20   2.13   2.05   2.01   1.96   1.91   1.86   1.81   1.76
   24      2.36   2.30   2.25   2.18   2.11   2.03   1.98   1.94   1.89   1.84   1.79   1.73
   25      2.34   2.28   2.24   2.16   2.09   2.01   1.96   1.92   1.87   1.82   1.77   1.71
   26      2.32   2.27   2.22   2.15   2.07   1.99   1.95   1.90   1.85   1.80   1.75   1.69
   27      2.31   2.25   2.20   2.13   2.06   1.97   1.93   1.88   1.84   1.79   1.73   1.67
   28      2.29   2.24   2.19   2.12   2.04   1.96   1.91   1.87   1.82   1.77   1.71   1.65
   29      2.28   2.22   2.18   2.10   2.03   1.94   1.90   1.85   1.81   1.75   1.70   1.64
   30      2.27   2.21   2.16   2.09   2.01   1.93   1.89   1.84   1.79   1.74   1.68   1.62
   40      2.18   2.12   2.08   2.00   1.92   1.84   1.79   1.74   1.69   1.64   1.58   1.51
   60      2.10   2.04   1.99   1.92   1.84   1.75   1.70   1.65   1.59   1.53   1.47   1.39
  120      2.02   1.96   1.91   1.83   1.75   1.66   1.61   1.55   1.50   1.43   1.35   1.25
    ∞      1.94   1.88   1.83   1.75   1.67   1.57   1.52   1.46   1.39   1.32   1.22   1.00

NOTE: ν1: Degrees of freedom for numerator
      ν2: Degrees of freedom for denominator
SOURCE: Johnson, Norman L. and F. C. Leone. 1977. Statistics and Experimental
Design in Engineering and the Physical Sciences. Vol. I. Second Edition. John
Wiley and Sons, New York.
B-5
-------
TABLE 3. 95th PERCENTILES OF THE BONFERRONI
t-STATISTICS, t(ν, α/m)

where ν = degrees of freedom associated with the mean
          squares error
      m = number of comparisons
      α = 0.05, the experimentwise error level

        m       1       2       3       4       5
  ν \ α/m    0.05   0.025  0.0167  0.0125    0.01
    4        2.13    2.78    3.20    3.51    3.75
    5        2.02    2.57    2.90    3.17    3.37
    6        1.94    2.45    2.74    2.97    3.14
    7        1.90    2.37    2.63    2.83    3.00
    8        1.86    2.31    2.55    2.74    2.90
    9        1.83    2.26    2.50    2.67    2.82
   10        1.81    2.23    2.45    2.61    2.76
   15        1.75    2.13    2.32    2.47    2.60
   20        1.73    2.09    2.27    2.40    2.53
   30        1.70    2.04    2.21    2.34    2.46
    ∞        1.65    1.96    2.13    2.24    2.33

SOURCE: For α/m = 0.05, 0.025, and 0.01, the percentiles
were extracted from the t-table (Table 6, Appendix B) for
values of F = 1-α of 0.95, 0.975, and 0.99, respectively.
For α/m = 0.05/3 and 0.05/4, the percentiles were
estimated using "A Nomograph of Student's t" by Nelson,
L. S. 1975. Journal of Quality Technology, Vol. 7,
pp. 200-201.
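
The column headings of Table 3 follow from dividing the experimentwise error level by the number of comparisons. The short Python sketch below reproduces that arithmetic, together with the percentile F = 1 - α/m that one would look up in a t-table. The function names are illustrative only and do not come from the guidance.

```python
# Per-comparison significance levels behind the Table 3 column headings.
ALPHA = 0.05  # experimentwise error level stated for Table 3


def per_comparison_level(m, alpha=ALPHA):
    """Return alpha/m, the level applied to each of m comparisons."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return alpha / m


def t_table_percentile(m, alpha=ALPHA):
    """Return F = 1 - alpha/m, the percentile to look up in a t-table."""
    return 1.0 - per_comparison_level(m, alpha)


# Levels for m = 1..5, rounded to four decimals as in the table heading.
levels = {m: round(per_comparison_level(m), 4) for m in range(1, 6)}
print(levels)
```

For m = 3 and m = 4 the resulting percentiles (about 0.9833 and 0.9875) are not columns of Table 6, which is consistent with the source note that those entries were estimated from a nomograph.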
B-6
-------
TABLE 4. PERCENTILES OF THE STANDARD NORMAL DISTRIBUTION, Up

   p     0.000   0.001   0.002   0.003   0.004   0.005   0.006   0.007   0.008   0.009
  0.50  0.0000  0.0025  0.0050  0.0075  0.0100  0.0125  0.0150  0.0175  0.0201  0.0226
  0.51  0.0251  0.0276  0.0301  0.0326  0.0351  0.0376  0.0401  0.0426  0.0451  0.0476
  0.52  0.0502  0.0527  0.0552  0.0577  0.0602  0.0627  0.0652  0.0677  0.0702  0.0728
  0.53  0.0753  0.0778  0.0803  0.0828  0.0853  0.0878  0.0904  0.0929  0.0954  0.0979
  0.54  0.1004  0.1030  0.1055  0.1080  0.1105  0.1130  0.1156  0.1181  0.1206  0.1231
  0.55  0.1257  0.1282  0.1307  0.1332  0.1358  0.1383  0.1408  0.1434  0.1459  0.1484
  0.56  0.1510  0.1535  0.1560  0.1586  0.1611  0.1637  0.1662  0.1687  0.1713  0.1738
  0.57  0.1764  0.1789  0.1815  0.1840  0.1866  0.1891  0.1917  0.1942  0.1968  0.1993
  0.58  0.2019  0.2045  0.2070  0.2096  0.2121  0.2147  0.2173  0.2198  0.2224  0.2250
  0.59  0.2275  0.2301  0.2327  0.2353  0.2378  0.2404  0.2430  0.2456  0.2482  0.2508
  0.60  0.2533  0.2559  0.2585  0.2611  0.2637  0.2663  0.2689  0.2715  0.2741  0.2767
  0.61  0.2793  0.2819  0.2845  0.2871  0.2898  0.2924  0.2950  0.2976  0.3002  0.3029
  0.62  0.3055  0.3081  0.3107  0.3134  0.3160  0.3186  0.3213  0.3239  0.3266  0.3292
  0.63  0.3319  0.3345  0.3372  0.3398  0.3425  0.3451  0.3478  0.3505  0.3531  0.3558
  0.64  0.3585  0.3611  0.3638  0.3665  0.3692  0.3719  0.3745  0.3772  0.3799  0.3826
  0.65  0.3853  0.3880  0.3907  0.3934  0.3961  0.3989  0.4016  0.4043  0.4070  0.4097
  0.66  0.4125  0.4152  0.4179  0.4207  0.4234  0.4261  0.4289  0.4316  0.4344  0.4372
  0.67  0.4399  0.4427  0.4454  0.4482  0.4510  0.4538  0.4565  0.4593  0.4621  0.4649
  0.68  0.4677  0.4705  0.4733  0.4761  0.4789  0.4817  0.4845  0.4874  0.4902  0.4930
  0.69  0.4959  0.4987  0.5015  0.5044  0.5072  0.5101  0.5129  0.5158  0.5187  0.5215
  0.70  0.5244  0.5273  0.5302  0.5330  0.5359  0.5388  0.5417  0.5446  0.5476  0.5505
  0.71  0.5534  0.5563  0.5592  0.5622  0.5651  0.5681  0.5710  0.5740  0.5769  0.5799
  0.72  0.5828  0.5858  0.5888  0.5918  0.5948  0.5978  0.6008  0.6038  0.6068  0.6098
  0.73  0.6128  0.6158  0.6189  0.6219  0.6250  0.6280  0.6311  0.6341  0.6372  0.6403
  0.74  0.6433  0.6464  0.6495  0.6526  0.6557  0.6588  0.6620  0.6651  0.6682  0.6713

NOTE: For values of P below 0.5, obtain the value of U(1-p) from Table 4 and
change its sign. For example, U0.45 = -U(1-0.45) = -U0.55 = -0.1257.
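
The sign-change rule in the note can be sketched in Python as follows. This is an illustration rather than part of the guidance: the standard library's `statistics.NormalDist().inv_cdf` stands in for a Table 4 lookup, and the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def upper_half_percentile(p):
    """Stand-in for a Table 4 lookup: U_p for 0.5 <= p < 1."""
    if not 0.5 <= p < 1.0:
        raise ValueError("Table 4 only lists p from 0.5 up to (not including) 1")
    return _STD_NORMAL.inv_cdf(p)


def normal_percentile(p):
    """Return U_p for any 0 < p < 1 using the sign-change rule in the note."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p >= 0.5:
        return upper_half_percentile(p)
    # Below 0.5: look up U_(1-p) and change its sign.
    return -upper_half_percentile(1.0 - p)


u_045 = round(normal_percentile(0.45), 4)
print(u_045)  # -0.1257, matching the example in the note
```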
(Continued)
B-7
-------
TABLE 4 (Continued)

   p     0.000   0.001   0.002   0.003   0.004   0.005   0.006   0.007   0.008   0.009
  0.75  0.6745  0.6776  0.6808  0.6840  0.6871  0.6903  0.6935  0.6967  0.6999  0.7031
  0.76  0.7063  0.7095  0.7128  0.7160  0.7192  0.7225  0.7257  0.7290  0.7323  0.7356
  0.77  0.7388  0.7421  0.7454  0.7488  0.7521  0.7554  0.7588  0.7621  0.7655  0.7688
  0.78  0.7722  0.7756  0.7790  0.7824  0.7858  0.7892  0.7926  0.7961  0.7995  0.8030
  0.79  0.8064  0.8099  0.8134  0.8169  0.8204  0.8239  0.8274  0.8310  0.8345  0.8381
  0.80  0.8416  0.8452  0.8488  0.8524  0.8560  0.8596  0.8633  0.8669  0.8705  0.8742
  0.81  0.8779  0.8816  0.8853  0.8890  0.8927  0.8965  0.9002  0.9040  0.9078  0.9116
  0.82  0.9154  0.9192  0.9230  0.9269  0.9307  0.9346  0.9385  0.9424  0.9463  0.9502
  0.83  0.9542  0.9581  0.9621  0.9661  0.9701  0.9741  0.9782  0.9822  0.9863  0.9904
  0.84  0.9945  0.9986  1.0027  1.0069  1.0110  1.0152  1.0194  1.0237  1.0279  1.0322
  0.85  1.0364  1.0407  1.0450  1.0494  1.0537  1.0581  1.0625  1.0669  1.0714  1.0758
  0.86  1.0803  1.0848  1.0893  1.0939  1.0985  1.1031  1.1077  1.1123  1.1170  1.1217
  0.87  1.1264  1.1311  1.1359  1.1407  1.1455  1.1503  1.1552  1.1601  1.1650  1.1700
  0.88  1.1750  1.1800  1.1850  1.1901  1.1952  1.2004  1.2055  1.2107  1.2160  1.2212
  0.89  1.2265  1.2319  1.2372  1.2426  1.2481  1.2536  1.2591  1.2646  1.2702  1.2759
  0.90  1.2816  1.2873  1.2930  1.2988  1.3047  1.3106  1.3165  1.3225  1.3285  1.3346
  0.91  1.3408  1.3469  1.3532  1.3595  1.3658  1.3722  1.3787  1.3852  1.3917  1.3984
  0.92  1.4051  1.4118  1.4187  1.4255  1.4325  1.4395  1.4466  1.4538  1.4611  1.4684
  0.93  1.4758  1.4833  1.4909  1.4985  1.5063  1.5141  1.5220  1.5301  1.5382  1.5464
  0.94  1.5548  1.5632  1.5718  1.5805  1.5893  1.5982  1.6072  1.6164  1.6258  1.6352
  0.95  1.6449  1.6546  1.6646  1.6747  1.6849  1.6954  1.7060  1.7169  1.7279  1.7392
  0.96  1.7507  1.7624  1.7744  1.7866  1.7991  1.8119  1.8250  1.8384  1.8522  1.8663
  0.97  1.8808  1.8957  1.9110  1.9268  1.9431  1.9600  1.9774  1.9954  2.0141  2.0335
  0.98  2.0537  2.0749  2.0969  2.1201  2.1444  2.1701  2.1973  2.2262  2.2571  2.2904
  0.99  2.3263  2.3656  2.4089  2.4573  2.5121  2.5758  2.6521  2.7478  2.8782  3.0902

SOURCE: Johnson, Norman L. and F. C. Leone. 1977. Statistics and Experimental
Design in Engineering and the Physical Sciences. Vol. I, Second Edition. John
Wiley and Sons, New York.
B-8
-------
TABLE 5. TOLERANCE FACTORS (K) FOR ONE-SIDED NORMAL TOLERANCE
INTERVALS WITH PROBABILITY LEVEL (CONFIDENCE FACTOR)
γ = 0.95 AND COVERAGE P = 95%

     n       K            n       K
     3     7.655          75    1.972
     4     5.145         100    1.924
     5     4.202         125    1.891
     6     3.707         150    1.868
     7     3.399         175    1.850
     8     3.188         200    1.836
     9     3.031         225    1.824
    10     2.911         250    1.814
    11     2.815         275    1.806
    12     2.736         300    1.799
    13     2.670         325    1.792
    14     2.614         350    1.787
    15     2.566         375    1.782
    16     2.523         400    1.777
    17     2.486         425    1.773
    18     2.453         450    1.769
    19     2.423         475    1.766
    20     2.396         500    1.763
    21     2.371         525    1.760
    22     2.350         550    1.757
    23     2.329         575    1.754
    24     2.309         600    1.752
    25     2.292         625    1.750
    30     2.220         650    1.748
    35     2.166         675    1.746
    40     2.126         700    1.744
    45     2.092         725    1.742
    50     2.065         750    1.740
    55     2.036         775    1.739
    60     2.017         800    1.737
    65     2.000         825    1.736
    70     1.986         850    1.734
                         875    1.733
                         900    1.732
                         925    1.731
                         950    1.729
                         975    1.728
                        1000    1.727

SOURCE: (a) for sample sizes ≤ 50: Lieberman, Gerald J. 1958. "Tables for
One-sided Statistical Tolerance Limits." Industrial Quality Control. Vol. XIV,
No. 10. (b) for sample sizes > 50: K values were calculated from large
sample approximation.
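
The source does not say which large-sample approximation was used for the entries above n = 50. As an illustration only, the Python sketch below applies one widely quoted approximation for one-sided normal tolerance factors (commonly attributed to Natrella, 1963). That choice is an assumption, so results may differ from the tabled K values in the last digit. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_one_sided_k(n, coverage=0.95, confidence=0.95):
    """Approximate one-sided normal tolerance factor K for sample size n.

    Assumed formula (not confirmed by the source):
        a = 1 - z_g**2 / (2 * (n - 1))
        b = z_p**2 - z_g**2 / n
        K = (z_p + sqrt(z_p**2 - a * b)) / a
    where z_p and z_g are standard normal percentiles for the coverage
    and the confidence level. Intended for large n only.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    z_p = NormalDist().inv_cdf(coverage)    # percentile for coverage P
    z_g = NormalDist().inv_cdf(confidence)  # percentile for confidence level
    a = 1.0 - z_g**2 / (2.0 * (n - 1))
    b = z_p**2 - z_g**2 / n
    return (z_p + sqrt(z_p**2 - a * b)) / a


for n in (75, 100, 1000):
    print(n, round(approx_one_sided_k(n), 3))
```

Under these assumptions the computed values agree with the tabled entries for n = 75, 100, and 1000 to within about 0.001.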
B-9
-------
TABLE 6. PERCENTILES OF STUDENT'S t-DISTRIBUTION
(F = 1-α; n = degrees of freedom)

  n \ F     .60     .75     .90     .95    .975     .99    .995    .9995
    1     0.325   1.000   3.078   6.314  12.706  31.821  63.657  636.619
    2     0.289   0.816   1.886   2.920   4.303   6.965   9.925   31.598
    3     0.277   0.765   1.638   2.353   3.182   4.541   5.841   12.941
    4     0.271   0.741   1.533   2.132   2.776   3.747   4.604    8.610
    5     0.267   0.727   1.476   2.015   2.571   3.365   4.032    6.859
    6     0.265   0.718   1.440   1.943   2.447   3.143   3.707    5.959
    7     0.263   0.711   1.415   1.895   2.365   2.998   3.499    5.405
    8     0.262   0.706   1.397   1.860   2.306   2.896   3.355    5.041
    9     0.261   0.703   1.383   1.833   2.262   2.821   3.250    4.781
   10     0.260   0.700   1.372   1.812   2.228   2.764   3.169    4.587
   11     0.260   0.697   1.363   1.796   2.201   2.718   3.106    4.437
   12     0.259   0.695   1.356   1.782   2.179   2.681   3.055    4.318
   13     0.259   0.694   1.350   1.771   2.160   2.650   3.012    4.221
   14     0.258   0.692   1.345   1.761   2.145   2.624   2.977    4.140
   15     0.258   0.691   1.341   1.753   2.131   2.602   2.947    4.073
   16     0.258   0.690   1.337   1.746   2.120   2.583   2.921    4.015
   17     0.257   0.689   1.333   1.740   2.110   2.567   2.898    3.965
   18     0.257   0.688   1.330   1.734   2.101   2.552   2.878    3.922
   19     0.257   0.688   1.328   1.729   2.093   2.539   2.861    3.883
   20     0.257   0.687   1.325   1.725   2.086   2.528   2.845    3.850
   21     0.257   0.686   1.323   1.721   2.080   2.518   2.831    3.819
   22     0.256   0.686   1.321   1.717   2.074   2.508   2.819    3.792
   23     0.256   0.685   1.319   1.714   2.069   2.500   2.807    3.767
   24     0.256   0.685   1.318   1.711   2.064   2.492   2.797    3.745
   25     0.256   0.684   1.316   1.708   2.060   2.485   2.787    3.725
   26     0.256   0.684   1.315   1.706   2.056   2.479   2.779    3.707
   27     0.256   0.684   1.314   1.703   2.052   2.473   2.771    3.690
   28     0.256   0.683   1.313   1.701   2.048   2.467   2.763    3.674
   29     0.256   0.683   1.311   1.699   2.045   2.462   2.756    3.659
   30     0.256   0.683   1.310   1.697   2.042   2.457   2.750    3.646
   40     0.255   0.681   1.303   1.684   2.021   2.423   2.704    3.551
   60     0.254   0.679   1.296   1.671   2.000   2.390   2.660    3.460
  120     0.254   0.677   1.289   1.658   1.980   2.358   2.617    3.373
    ∞     0.253   0.674   1.282   1.645   1.960   2.326   2.576    3.291

SOURCE: CRC Handbook of Tables for Probability and Statistics. 1966.
W. H. Beyer, Editor. Published by the Chemical Rubber Company. Cleveland,
Ohio.
B-10
-------
TABLE 7. VALUES OF THE PARAMETER λ FOR COHEN'S ESTIMATES
ADJUSTING FOR NONDETECTED VALUES
X -
.00 .9MIOO
.at .910941
.10 .010930
.13 .011.310
.30 .3US43
.23 .OUS12
.30 .012243
.33 .012S30
.40 .012784
.ti \ .01303*
.M .013179
.34 .013913
.«O .013739
.14 .013911
-TO .014.171
.75 .914378
.10 .01*37*
.84 .014775
.90 .4149*7
.iS .313134
1.90 .01133*
!\ .=.
.03
.02O40O
.021294
.0220*2
.022791
.3334M
.3240T«
.02414*
.02 3211
.023738
.02*243
.33175*
.327194
.027*49
.3210*7
.021314
.028927
.029330
.029723
.030107
.0304*3
.03MM
.30
.03
.330*03
.332223
.03339*
.11344*4
.035413
.038177
.037249
.03*077
.038*8*
.339*24
.040332
.041034
.341733
.042391
.343030
.043*32
.04423*
.044*4*
.043423
.043*19
.044340
.33
.04
.0413*3
.0433*3
.044902
.04131*
.047*2*
.041*3*
.03091*
.011120
.031173
.033113
.934133
.033019
.033993
.031*74
.OS7T21
.031331
.0393(4
.0*0133
.0*0923
.0*1*7*
.9*3413
.40
.as
.033307
.0*4670
.03*391
.03833*
.0399*0
.011322
.012919
.01434J
.013*10
.011121
.011133
.01930*
.070439
.371331
.3721O3
.073*43
.074*33
.073142
.079108
.077349
. 07*471
.43
.04
.013127
.061189
.011413
.070311
.072139
.074372
.071101
.077711
.079332
.010*43
.042301
.01370*
.OI3O11
.011381
.017170
.01*917
.09O133
.091319
.092477
.0»3111
.0*4720
.30
.or
.1)74931
.077909
.91094*
.M3OO9
.013210
.017413
.5*9433
.391333
.093193
.0*493*
.094*37
.09129*
.099*17
.10143
. 10292
.10431
.101*0
.10719
.10134
.10917
.1111*
.13
.0*
.0*44«*
.019134
.092132
.093129
.09121*
.100*3
. 1029S
.10311
. 10723
. 1092*
.11131
.1130*
.11490
. 111*1
.il»37
.12004
.m«7
.12323
.124*0
.12*32
.127*0
.M
.01
.09*24
.10197
.10334
. 10)43
.11134
.11401
.11SS7
.11914
.12130
. 12377
.13299
. 1290*
.13011
.13309
.13402
.13390
.-J773
. 13912
.14 126
.14297
.1*4*3
«3
.10
. 11030
.11431
.11*04
.1314*
.1341*
.13772
.13039
. 13333
.13193
.13847
.14090
. 14323
. 14312
.14773
.149*7
.11191
. 1.5400
. 11199
. 13793
.119*3
.11170
.10
.19
.17342
.17933
. 18479
.1*913
.1*4*0
. 19910
.20330
.237+7
.:;i39
.21317
.21812
.22233
.22378
.22910
.23234
.2333O
.22*31
.24111
.24412
.24740
.23012
.10
~\^
.242*1 00
.230331 .03
.14741 .10
.31403 .13
.27031 .20
.2T12I .23
.21193 .30
.3*737 .31
.29250 .40
.297*3 .41
.30233 .10
.30725 13
.31184 M
.31133 11
.320*3 .TO
.32419 .75
. 32903 1 JO
.33307 13
.23703 ,*>
.34091 .93
.34471 1.00
.90 */\
.30 | .31892
.33 .33793
. 10 33142
.13 .34410
.20 .31233
.23 .31993
.30 .38700
.40 .38033
.43 .3*143
.30 .39271
.11 .39470
.*0 .40447
.43 .4100*
.70 .41133
.73 42090
.40 .42812
.93 .43122
.90 .43823
1 .15 .44112
1.00 .44392
.4021
.4130
.4223
.4330
.4423
.4110
.4193
.4733
.4131
.49O4
.4978
.1043
'.3114
.1110
.1243
.1308
.1370
.1430
.349O
.3341
.'.941
.KM*
.3184
.129*
.1403
.1301
.1404
.1791
. »»0
.19*7
.1031
.3133
.0213
.1291
.3387
.5441
.6113
.63*4
.111*
.1724
.3911 .7091 .1381 .»OI 1.143 .33* 1.141 2.178 3.283 .90 I
.5101 .7232 .8540 .9994 t.iSl .318 1.183 2.203 3.314 93
.8234 .7400 .8703 1.017 I. 183 .379 1 308 2.229 3.3451 10
.6361 .7342 .3860 1.033 1.204 .400 '..530 2 231 1.3781 .13
.6413 .7873 ,3012 1.011 1.222 .413 all 2.280 3.403 23 j
.540O .7810 .911* 1.D4T 1.240 .439 .473 2.303 3.433 .23
.6713 .7937 .8300 1.013 1.2.17 .437 .iJ3 2.329 3.4«4 .30
,3821 .8040 .9437 1.09* 1.274 1.478 .713 2.333 3.492 ,33
.S927 .8179 .9370 1.113 1.290 1.494 .732 2.378 3.120 .40
.7329 .8293 .9700 1.137 1.204 1.311 .731 2.399 3.147| 43
1
.7119 .8401 .9121 1.141 1.221 1.32* .770 2.421 3.3751 .SO
.7115 .1317 .9»50 1.134 1.337 1.343 ,78* 2.443 3-JOi; .34
.7320 .3121 1.307 1.188 1.331 1.3*1 .S0« 2.4«3 3.5281 .SO
.7412 .8719 1.319 1.182 1.36* 1.377 .124 2.13S j.sul .43
.7302 .3U2 1.030 1.193 1.280 1.193 .341 2.107 3.379 .70
.7190 .J932 1.042 1.207 1.394 I. SOS .33* 2.128 2. 703 1 .75
.7878 .9031 1.033 1.220 1.408 U524 .873 2.148 3.730! .30
.7781 .J127 1.044 1.232 1.422 1.539 .492 2.3
-------
TABLE 8. CRITICAL VALUES FOR Tn (ONE-SIDED TEST) WHEN THE
STANDARD DEVIATION IS CALCULATED FROM
THE SAME SAMPLE
Number of
Qburvaiioaa.
m
3
4
i
6
7
I
9
10
11
12
13
14
15
16
17
IS
19
20
^i
22
23
24
2!
26
27
23
29
30
31
32
33
34
35
36
37
38
39
40
41
a;
43
44
45
4«
47
48
49
SO
Upper O.J*
Sifnifiaac*
£.««!
I.1S5
1.499
1.780
2.01 1
2.201
2J5S
2.492
2.606
1705
2.771
2.867
2.93S
1997
3052
j.IOJ
3.U9
3.191
3.230
3.266
3.300
3.332
3.362
3.J39
3.415
3.440
3464
3.4*6
3J07
3.523
3.546
3.565
3.582
3.599
3.61«
3.631
3.646
3.660
3.673
3.687
3.700
3.712
3.n*
3.736
3.747
3.757
3.763
3.779
3.789
Upper 0.5*
SUmficarK*
Le»d
1.155
1.496
1.764
1.973
2.139
1274
2J37
1482
2J64
1636
1699
1755
1806
tM2
1S»4
1932
196S
3.001
3.031
3.060
3.087
3.112
3.135
3.157
3.178
3.190
3.218
3.236
3.253
3.270
3.:S6
3.301
3JI6
3J30
3J43
3JS6
3J69
3J3I
3.393
3.404
3.415
3.425
34J5
3.445
3.455
3.464
3474
J.483
Upper 1*
Significance
Uvd
1.135
1.492
1.749
1.944
1097
1221
2J2J
1410
2.485
1550
1607
1659
1705
2.747
1785
1821
1854
1884
2.912
1939
1963
2.9S7
3.009
3.029
3.049
3.068
3.085
3.103
3.119
3.135
3.150
3.164
3.178
3.191
3.204
3.216
3J2S
3.240
3.2JI
3-261
3.271
3.2S2
3.292
3.302
3.310
3.319
3.329
3.336
Upper 15%
Significance
Lod
1.155
1.48 1
1.7IJ
t.887
1020
1126
1215
1290
2.355
2.412
2.462
2J07
1549
:..<»
2.610
1651
2.681
1709
1733
2.718
2.7S1
2.302
2.822
1841
1S59
2.S76
2.S93
2.908
2.924
2.938
2.952
2.965
2.979
2.991
3.003
3.014
3.023
3.036
3.046
3.057
3.067
3.073
3.0S5
3.094
J.J03
3.111
3.120
3.128
Upper 5*
Significaasi
Leod
1.15}
1.463
1.672
1.822
1.938
2.032
1110
1176
2.234
1285
2.331
2.371
2.409
2.443
2.475
1504
1532
2.557
1«80
2.603
2.624
1644
1663
1681
1698
2.714
2.730
1745
2.759
2.773
17S6
2.799
2.311
2.S23
2.835
2.846
1S57
1866
1S77
18S7
2.896
1905
1914
2.923
2.951
2.940
2.948
2.956
Upper 10%
Sigmficancr
Lent
1.148
1.425
1.602
1.729
1.3128
1.909
t.977
1036
10S8
1134
1175
1213
1247
^ ^7<4
2.309
1335
1361
13S5
240S
2429
2.448
2,467
2.486
2.J02
:.J19
2.534
2.5-9
2.563
2.577
2.J9I
?.M»
:.616
2.628
:!.6?9
;r.65o
.1.661
::.6"i
2.682
1692
1700
2.710
1719
1727
173*
2."-U
2.753
:.7«o
2.768
(Continued)
B-12
-------
TABLE 8 (Continued)
Number of
ObMTtaiMMS.
•
51
52
53
54
55
i6
57
58
59
60
61
«:
63
64
65
66
67
6S
69
70
71
72
73
74
75
76
77
7S
79
.10
3!
32
83
S4
85
36
87
Sa
39
90
91
9:
93
94
95
96
97
91
•>»
100
Upper O.I=V
Significant*
Level
3.798
3.808
3.816
3.825
3.834
3.342
3.851
3.8JS
3.867
3.J74
3SS2
3 Si-*
3.396
3.903
3.910
3.917
3.923
3.930
3936
3.W2
3.94g
3.954
J.960
3.965
3.971
3.97T
3.9S2
3.9*7
3.992
3.998
4.00:
4007
4oi:
4.017
4021
4026
4031
4035
4.03*
4.044
4.049
4.053
4057
4 OKI
40M
4.0»9
4.073
407i
4.0W
««i4
L'pperO.3*
Stamficaacc
Level
3.491
3.500
3.507
3.516,
3.524
3.J31
3.53*
3.J46
ZJi)
3.5<0
3.<6«
3.5"
3.J7»
3J>6
3J»2
3.3M
3.605
3.610
3.617
3.622
3.627
3.633
3.63*
3.043
3.<4S
3.6-4
3.653
3.M3
3.469
3.673
J*T7
3.6S2
3.6S7
3.691
3.645
3.699
3.704
3.708
3.712
3.716
1.720
3.715
3.72S
3.732
3.736
3.739
3.744
3.747
3.750
3.754
Upper 1%
Significant*
Lend
3J45
3J53
3J6I
3J6S
3J76
3.383
3J9I
3.397
3.405
3.411
3.415
3424
3.430
3.437
3.442
1.449
3.454
3.460
3.466
3.471
3.476
3482
3.487
3.492
3.496
3.502
3J07
3.511
3.516
3J2I
3.325
3.529
3.534
3.539
3.543
3.347
3.551
3-555
3J59
3.363
3.567
3-570
3.375
3.379
3.382
3.5S6
3.5S9
3.393
3.597
3.600
Upper 2.5*
Significance
Level
3.136
3.143
3.151
3.15J
3.166
3.172
3.180
J.I 36
3.193
3.IW
.'.205
? :i2
J2'.S
3.224
3.230
3035
3-241
3.246
3.252
3.257
3262
3.267
3.272
3.278
3.232
3.2S7
3.291
3.297
3.301
3.305
3.309
3.315
3.319
3.323
3.327
3.331
3.335
3.339
3.343
3.347
3-550
3.355
3.358
3.362
3J65
3.369
J.372
3.377
3.3SO
3-583
Upper 51
Significance
Loci
2.964
1971
2.978
2.9S6
2.992
3.000
3.006
3.013
3.019
3.025
3.032
3037
J044
3.049
3.055
3.061
3.066
3.071
3.076
30S2
3087
3.092
3.093
3 102
3.107
3.1 II
3.117
3.121
3.125
3 130
3.134
3.139
3.U3
3.147
3.151
3.155
3.160
3.163
3.167
3.171
J.174
3.179
3.182
3 186
3.189
3.193
3.196
3.201
3.204
3.207
Upptr I0"v
Significance
Level
2.77J
2.78J
2.790
1798
2.J04
2.811
2.8 IS
1824
2.831
2.837
2.M2
2.M*
:.s:4
:.,S60
2.S66
2.871
2.877
2.833
2.3XS
2.393
2.897
2.903
2.90S
2.912
2.917
2.922
2.927
2.931
2.935
2.940
2.945
2.949
2.953
2.957
2.961
3.966
2.970
2.973
2.977
2.981
2.9S4
2.989
2.993
2.996
3000
3.003
3.006
3.011
3014
3.017
(Continued)
B-13
-------
TABLE 8 (Continued)
Loci
Significance
Leper ir
Sifmlicancc
Sitnifioj.xc
Lerci
Significance
Lori
L'npcr WV
Significance
Le<-ei
i
1
1
1
1
1
!
i
1
1
1
1
1
1
!
1
i
1
1
!
1
I
!
1
i
i
1
1
j
1
1
!
'
1
i
;-
i
'•
i.
;.
jl
12
J3
»
J5
)t>
J7
"5
N
0
1
2
3
4
;
t>
7
1
9
n
.
2
3
4
5
5
7
j
t
0
1
2
i
4
5
t.
i
4
(i
1!
12
3
4
<
p
7
4.0»S
4095
4.0VS
4.102
4.105
4 109
4 112
4.116
4 1 19
4 12:
4 125
4.1:9
4.132
4 135
4 US
4 141
4 144
4 146
4 150
4153
4 156
4 ;59
4 161
4 1 />4
4 |6<»
4 IW*
4 173
4175
4 rs
4 ISO
4 183
4 1 85
4 ij.\
4 1VO
4.193
4 196
4 19*
4 2-lfi
4.2D3
4205
4.207
4209
4 212
4214
4216
42N
3.7M
3.765
37t.li
3771
3.774
3777
3 7*0
1?84
3.787
3.7W
3. ^3
3794
3.759
3.802
3 !05
3 80*
3 I'M
3SI4
3SI7
3SI9
3 S22
3 s:*
3827
3 S3 1
3S33
3 1-36
3 S3S
3 S40
3 543
3 £45
3 H»
3 .ISO
3 i.".'
.1 >Jo
3S7t
3.S7V
38S1
3 »>J
3.603
3.W
3 610
3 6U
3.617
3*20
3.623
3626
3 629
363:
3636
3639
jw:
1645
3647
3650
3653
3656
3659
3662
3.665
3667
1670
367:
3675
3677
36SO
3.6*3
3.M!6
)6j»
36"0
369.1
369!
3697
3 •••»
3-0:
3-04
3 ^O1
3 710
3 712
3.714
3.716
3 719
3 T:i
3723
3 *:;
3 •':•'
338*
3.390
3.393
3.39-
3400
3 40?
3.406
3.4C«
341:
3.415
3 418
3422
3 424
3427
3430
3433
3433
3,438
3441
3444
3.447
3.450
3452
3455
3.457
> 4150
3 4ft2
3 465
34o7
3 4TO
1471
' 47?
t 4~X
1 ^,\o
3 4»2
34S4
3.487
3 4«9
3491
.' 493
349">
.i 499
3 501
3 503
3 50<
3 507
3.509
3.214
3.217
3.220
3.2Z4
3.227
3.230
3.233
3 236
32--9
3.242
3 245
3.245
3.251
32J4
3257
3.259
3262
3265
3267
3.270
3 :-4
3 :-6
3 2 "9
3.231
3 254
3.2S6
3 2S9
3 291
3 294
5.296
j 29S
3 302
3 J04
3 306
3 309
3 311
3 3 1 3
3315
3 3'S
3320
3322
3.324
3.3>
332S
3.331
3 334
3.021
3024
3.02?
3030
3033
3037
30-WJ
3043
.* 04«
3049
305:
3055
30J!
3061
3064
3067
3070
3C.7J
3.0-5
3078
3.041
3C5.1
3 OD6
3 "if
3 o-:
3 045
1 OT
3 !
-------
APPENDIX C
GENERAL BIBLIOGRAPHY
C-1
-------
The following list provides the reader with those references directly
mentioned in the text. It also includes, for those readers desiring further
information, references to literature dealing with selected subject matters in
a broader sense. This list is in alphabetical order.
ASTM Designation: E178-75. 1975. "Standard Recommended Practice for Dealing
with Outlying Observations."
ASTM Manual on Presentation of Data and Control Chart Analysis. 1976. ASTM
Special Technical Publication 15D.
Barari, A., and L. S. Hedges. 1985. "Movement of Water in Glacial Till."
Proceedings of the 17th International Congress of the International Association of
Hydrogeologists. pp. 129-134.
Barcelona, M. J., J. P. Gibb, J. A. Helfrich, and E. E. Garske. 1985. "Prac-
tical Guide for Ground-Water Sampling." Report by Illinois State Water Sur-
vey, Department of Energy and Natural Resources for USEPA. EPA/600/2-85/104.
Bartlett, M. S. 1937. "Properties of Sufficiency and Statistical Tests."
Proceedings of the Royal Society of London, Series A. 160:268-282.
Box, G. E. P., and G. M. Jenkins. 1970. Time Series Analysis. Holden-Day, San
Francisco, California.
Brown, K. W., and D. C. Andersen. 1981. "Effects of Organic Solvents on the
Permeability of Clay Soils." EPA 600/2-83-016, Publication No. 83179978, U.S.
EPA, Cincinnati, Ohio.
Cohen, A. C., Jr. 1959. "Simplified Estimators for the Normal Distribution
When Samples Are Singly Censored or Truncated." Technometrics. 1:217-237.
Cohen, A. C., Jr. 1961. "Tables for Maximum Likelihood Estimates: Singly
Truncated and Singly Censored Samples." Technometrics. 3:535-541.
Conover, W. J. 1980. Practical Nonparametric Statistics. Second Edition, John
Wiley and Sons, New York, New York.
CRC Handbook of Tables for Probability and Statistics. 1966. William H. Beyer
(ed.). The Chemical Rubber Company.
Current Index to Statistics. Applications, Methods and Theory. Sponsored by
American Statistical Association and Institute of Mathematical Statistics.
Annual series providing indexing coverage for the broad field of statistics.
David, H. A. 1956. "The Ranking of Variances in Normal Populations." Jour-
nal of the American Statistical Association. Vol. 51, pp. 621-626.
Davis, J. C. 1986. Statistics and Data Analysis in Geology. Second Edition.
John Wiley and Sons, New York, New York.
C-3
-------
Dixon, W. J., and F. J. Massey, Jr. 1983. Introduction to Statistical Analysis.
Fourth Edition. McGraw-Hill, New York, New York.
Freeze, R. A., and J. A. Cherry. 1979. Groundwater. Prentice Hall, Inc.,
Englewood Cliffs, New Jersey.
Gibbons, R. D. 1987. "Statistical Prediction Intervals for the Evaluation of
Ground-Water Quality." Ground Water. Vol. 25, pp. 455-465.
Gibbons, R. D. 1988. "Statistical Models for the Analysis of Volatile
Organic Compounds in Waste Disposal Sites." Ground Water. Vol. 26.
Gilbert, R. 1987. Statistical Methods for Environmental Pollution Monitoring.
Professional Books Series, Van Nostrand Reinhold.
Hahn, G. and W. Nelson. 1973. "A Survey of Prediction Intervals and Their
Applications." Journal of Quality Technology. 5:178-188.
Heath, R. C. 1983. Basic Ground-Water Hydrology. U.S. Geological Survey
Water Supply Paper. 2220, 84 p.
Hirsch, R. M., J. R. Slack, and R. A. Smith. 1982. "Techniques of Trend
Analysis for Monthly Water Quality Data." Water Resources Research. Vol. 18,
No. 1, pp. 107-121.
Hockman, K. K., and J. M. Lucas. 1987. "Variability Reduction Through
Subvessel CUSUM Control." Journal of Quality Technology. Vol. 19, pp. 113-121.
Hollander, M., and D. A. Wolfe. 1973. Nonparametric Statistical Methods. John
Wiley and Sons, New York, New York.
Huntsberger, D. V., and P. Billingsley. 1981. Elements of Statistical Infer-
ence. Fifth Edition. Allyn and Bacon, Inc., Boston, Massachusetts.
Johnson, N. L., and F. C. Leone. 1977. Statistics and Experimental Design in
Engineering and the Physical Sciences. 2 Vol., Second Edition. John Wiley and
Sons, New York, New York.
Kendall, M. G., and A. Stuart. 1966. The Advanced Theory of Statistics.
3 Vol. Hafner Publication Company, Inc., New York, New York.
Kendall, M. G., and W. R. Buckland. 1971. A Dictionary of Statistical Terms.
Third Edition. Hafner Publishing Company, Inc., New York, New York.
Kendall, M. G. 1975. Rank Correlation Methods. Charles Griffin, London.
Langley, R. A. 1971. Practical Statistics Simply Explained. Second Edition.
Dover Publications, Inc., New York, New York.
Lehmann, E. L. 1975. Nonparametric Statistical Methods Based on Ranks.
Holden-Day, San Francisco, California.
C-4
-------
Lieberman, G. J. 1958. "Tables for One-Sided Statistical Tolerance
Limits." Industrial Quality Control. Vol. XIV, No. 10.
Lilliefors, H. W. 1967. "On the Kolmogorov-Smirnov Test for Normality with
Mean and Variance Unknown." Journal of the American Statistical Association.
62:399-402.
Lindgren, B. W. 1976. Statistical Theory. Third Edition. Macmillan.
Lucas, J. M. 1982. "Combined Shewhart-CUSUM Quality Control Schemes."
Journal of Quality Technology. Vol. 14, pp. 51-59.
Mann, H. B. 1945. "Non-parametric Tests Against Trend." Econometrica.
Vol. 13, pp. 245-259.
Miller, R. G., Jr. 1981. Simultaneous Statistical Inference. Second Edition.
Springer-Verlag, New York, New York.
Mull, D. S., T. D. Liebermann, J. L. Smoot, and L. H. Woosley, Jr. 1988.
"Application of Dye-Tracing Techniques for Determining Solute Transport
Characteristics of Ground Water in Karst Terranes." USEPA, EPA 904/6-88-001,
October 1988. 103 pp.
Nelson, L. S. 1987. "Upper 10%, 5%, and 1%—Points of the Maximum F-
Ratio." Journal of Quality Technology. Vol. 19, p. 165.
Nelson, L. S. 1987. "A Gap Test for Variances." Journal of Quality Technol-
ogy. Vol. 19, pp. 107-109.
Noether, G. E. 1967. Elements of Nonparametric Statistics. Wiley, New York.
Pearson, E. S., and H. O. Hartley. 1976. Biometrika Tables for Statisticians.
Vol. 1, Biometrika Trust, University College, London.
Quade, D. 1966. "On Analysis of Variance for the K-Sample Problem." Annals
of Mathematical Statistics. 37:1747-1748.
Quinlan, J. F. "Ground-Water Monitoring in Karst Terranes: Recommended
Protocols and Implicit Assumptions." EPA/600/X-89/050, March 1989.
Remington, R. D., and M. A. Schork. 1970. Statistics with Applications to the Bio-
logical and Health Sciences. Prentice-Hall, pp. 235-236.
Shapiro, S. S., and M. B. Wilk. 1965. "An Analysis of Variance Test for
Normality (Complete Samples)." Biometrika. Vol. 52, pp. 591-611.
Snedecor, G. W., and W. G. Cochran. 1980. Statistical Methods. Seventh Edi-
tion. The Iowa State University Press, Ames, Iowa.
C-5
-------
Starks, T. H. 1988 (Draft). "Evaluation of Control Chart Methodologies for
RCRA Waste Sites." Report by Environmental Research Center, University of
Nevada, Las Vegas, for Exposure Assessment Research Division, Environmental
Monitoring Systems Laboratory-Las Vegas, Nevada. CR814342-01-3.
"Statistical Methods for the Attainment of Superfund Cleanup Standards
(Volume 2: Ground Water—Draft)."
Steel, R. G. D., and J. H. Torrie. 1980. Principles and Procedures of Statistics,
A Biometrical Approach. Second Edition. McGraw-Hill Book Company, New York,
New York.
Todd, D. K. 1980. Ground Water Hydrology. John Wiley and Sons, New York,
534 p.
Tukey, J. W. 1949. "Comparing Individual Means in the Analysis of Vari-
ance." Biometrics. Vol. 5, pp. 99-114.
Statistical Software Packages:
BMDP Statistical Software. 1983. 1985 Printing. University of California
Press, Berkeley.
Lotus 1-2-3 Release 2. 1986. Lotus Development Corporation, 55 Cambridge
Parkway, Cambridge, Massachusetts 02142.
SAS: Statistical Analysis System, SAS Institute, Inc.
SAS® User's Guide: Basics, Version 5 Edition, 1985.
SAS® User's Guide: Statistics, Version 5 Edition, 1985.
SPSS: Statistical Package for the Social Sciences. 1982. McGraw-Hill.
SYSTAT: Statistical Software Package for the PC. Systat, Inc., 1800 Sherman
Avenue, Evanston, Illinois 60201.
C-6
-------
APPENDIX D
FEDERAL REGISTER, 40 CFR, Part 264
D-1
-------
Tuesday
October 11, 1988
Part II
Environmental
Protection Agency
40 CFR Part 264
Statistical Methods for Evaluating
Ground-Water Monitoring From
Hazardous Waste Facilities; Final Rule
D-3
-------
39728 Federal Register / Vol. 53, No. 196 / Tuesday, October 11, 1988 / Rules and Regulations
final authorization will have to revise
their programs to cover the additional
requirements in today's announcement.
Generally, these authorized State
programs must be revised within one
year of the date of promulgation of such
standards, or within two years if the
State must amend or enact a statute in
order to make the required revision (see
40 CFR 271.21). However, States may
always impose requirements which are
more stringent or have greater coverage
than EPA's programs.
Regulations which are broader in
scope, however, may not be enforced as
part of the federally-authorized RCRA
program.
B. Regulatory Impact Analysis
Executive Order 12291 (46 FR 13191,
February 9, 1981) requires that a
regulatory agency determine whether a
new regulation will be "major" and, if
so, that a Regulatory Impact Analysis be
conducted. A major rule is defined as a
regulation that is likely to result in:
1. An annual effect on the economy of
$100 million or more;
2. A major increase in costs or prices
for consumers, individual industries,
Federal, State, or local government
agencies, or geographic regions; or
3. Significant adverse effects on
competition, employment, investment,
productivity, innovation, or the ability of
United States-based enterprises to
compete with foreign-based enterprises
in domestic or export markets.
The Agency has determined that
today's regulation is not a major rule
because it does not meet the above
criteria. Today's action should produce
a net decrease in the cost of ground-
water monitoring at each facility. This
final rule has been submitted to the
Office of Management and Budget
(OMB) for review in accordance with
Executive Order 12291. OMB has
concurred with this final rule.
C. Regulatory Flexibility Act
Pursuant to the Regulatory Flexibility
Act, 5 U.S.C. 601 et seq., whenever an
agency is required to publish a general
notice of rulemaking for any proposed or
final rule, it must prepare and make
available for public comment a
regulatory flexibility analysis which
describes the impact of the rule on small
entities (e.g., small businesses, small
organizations, and small governmental
jurisdictions). The Administrator may
certify, however, that the rule will not
have a significant economic impact on a
substantial number of small entities. As
stated above, this final rule will have no
adverse impacts on businesses of any
size. Accordingly, I hereby certify that
this regulation will not have a
significant economic impact on a
substantial number of small entities.
This final rule, therefore, does not
require a regulatory flexibility analysis.
List of Subjects in 40 CFR Part 264
Hazardous material, Reporting and
recordkeeping requirements, Waste
treatment and disposal, Ground water,
Environmental monitoring.
Date: September 28, 1988.
Lee M. Thomas,
Administrator.
Therefore, 40 CFR Chapter I is
amended as follows:
PART 264—STANDARDS FOR
OWNERS AND OPERATORS OF
HAZARDOUS WASTE TREATMENT,
STORAGE, AND DISPOSAL
FACILITIES
1. The authority citation for Part 264
continues to read as follows:
Authority: Secs. 1006, 2002(a), 3004, and
3005 of the Solid Waste Disposal Act, as
amended by the Resource Conservation and
Recovery Act, as amended (42 U.S.C. 6905,
6912(a), 6924, and 6925).
2. In § 264.91 by revising paragraphs
(a)(1) and (a)(2) to read as follows:
§ 264.91 Required programs.
(a) * * *
(1) Whenever hazardous constituents
under § 264.93 from a regulated unit are
detected at a compliance point under
§ 264.95, the owner or operator must
institute a compliance monitoring
program under § 264.99. Detected is
defined as statistically significant
evidence of contamination as described
in § 264.98(f);
(2) Whenever the ground-water
protection standard under § 264.92 is
exceeded, the owner or operator must
institute a corrective action program
under § 264.100. Exceeded is defined as
statistically significant evidence of
increased contamination as described in
§ 264.99(d);
*****
3. Section 264.92 is revised to read as
follows:
§ 264.92 Ground-water protection
standard.
The owner or operator must comply
with conditions specified in the facility
permit that are designed to ensure that
hazardous constituents under § 264.93
detected in the ground water from a
regulated unit do not exceed the
concentration limits under § 264.94 in
the uppermost aquifer underlying the
waste management area beyond the
point of compliance under § 264.95
during the compliance period under
§ 264.96. The Regional Administrator
will establish this ground-water
protection standard in the facility permit
when hazardous constituents have been
detected in the ground water.
4. In § 264.97 by removing the word
"and" from the end of (a)(1),
redesignating and revising (g)(3) as
(a)(1)(i), adding (a)(3), revising
paragraphs (g) and (h), and adding (i)
and (j), to read as follows:
§ 264.97 General ground-water monitoring
requirements.
(a) * * *
(1) * * *
(i) A determination of background
quality may include sampling of wells
that are not hydraulically upgradient of
the waste management area where:
(A) Hydrogeologic conditions do not
allow the owner or operator to
determine what wells are hydraulically
upgradient; and
(B) Sampling at other wells will
provide an indication of background
ground-water quality that is
representative or more representative
than that provided by the upgradient
wells; and
*****
(3) Allow for the detection of
contamination when hazardous waste or
hazardous constituents have migrated
from the waste management area to the
uppermost aquifer.
*****
(g) In detection monitoring or where
appropriate in compliance monitoring,
data on each hazardous constituent
specified in the permit will be collected
from background wells and wells at the
compliance point(s). The number and
kinds of samples collected to establish
background shall be appropriate for the
form of statistical test employed,
following generally accepted statistical
principles. The sample size shall be as
large as necessary to ensure with
reasonable confidence that a
contaminant release to ground water
from a facility will be detected. The
owner or operator will determine an
appropriate sampling procedure and
interval for each hazardous constituent
listed in the facility permit which shall
be specified in the unit permit upon
approval by the Regional Administrator.
This sampling procedure shall be:
(1) A sequence of at least four
samples, taken at an interval that
assures, to the greatest extent
technically feasible, that an independent
sample is obtained, by reference to the
uppermost aquifer's effective porosity,
hydraulic conductivity, and hydraulic
gradient, and the fate and transport
characteristics of the potential
contaminants, or
(2) an alternate sampling procedure
proposed by the owner or operator and
approved by the Regional
Administrator.
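The sampling interval in paragraph (g)(1) is tied to effective porosity, hydraulic conductivity, and hydraulic gradient. One way to relate these quantities is the average linear (seepage) velocity from Darcy's law, v = K * i / n_e. The Python sketch below is illustrative only: the function names, the site values, and the travel distance used as a stand-in for "independent sample" are assumptions and are not taken from the rule.

```python
def seepage_velocity(hydraulic_conductivity, hydraulic_gradient, effective_porosity):
    """Average linear velocity v = K * i / n_e (same length/time units as K)."""
    if not 0 < effective_porosity <= 1:
        raise ValueError("effective porosity must be in (0, 1]")
    return hydraulic_conductivity * hydraulic_gradient / effective_porosity


def min_days_between_samples(travel_distance, velocity):
    """Days needed for ground water to move `travel_distance` at `velocity`."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return travel_distance / velocity


# Hypothetical site values (assumptions for illustration only).
K = 10.0      # hydraulic conductivity, ft/day
i = 0.005     # hydraulic gradient, ft/ft
n_e = 0.25    # effective porosity, dimensionless

velocity = seepage_velocity(K, i, n_e)           # ft/day
days = min_days_between_samples(1.0, velocity)   # assumed 1 ft of travel
print(f"velocity = {velocity:.3f} ft/day, interval = {days:.1f} days")
```

The travel distance is the judgment call here: a longer assumed distance gives a longer interval between samples, and the fate and transport characteristics named in paragraph (g)(1) are not modeled at all in this sketch.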
(h) The owner or operator will specify
one of the following statistical methods
to be used in evaluating ground-water
monitoring data for each hazardous
constituent which, upon approval by the
Regional Administrator, will be
specified in the unit permit. The
statistical test chosen shall be
conducted separately for each
hazardous constituent in each well.
Where practical quantification limits
(pql's) are used in any of the following
statistical procedures to comply with
§ 264.97(i)(5), the pql must be proposed
by the owner or operator and approved
by the Regional Administrator. Use of
any of the following statistical methods
must be protective of human health and
the environment and must comply with
the performance standards outlined in
paragraph (i) of this section.
(1) A parametric analysis of variance
(ANOVA) followed by multiple
comparisons procedures to identify
statistically significant evidence of
contamination. The method must
include estimation and testing of the
contrasts between each compliance
well's mean and the background mean
levels for each constituent.
(2) An analysis of variance (ANOVA)
based on ranks followed by multiple
comparisons procedures to identify
statistically significant evidence of
contamination. The method must
include estimation and testing of the
contrasts between each compliance
well's median and the background
median levels for each constituent.
(3) A tolerance or prediction interval
procedure in which an interval for each
constituent is established from the
distribution of the background data, and
the level of each constituent in each
compliance well is compared to the
upper tolerance or prediction limit.
(4) A control chart approach that gives
control limits for each constituent.
(5) Another statistical test method
submitted by the owner or operator and
approved by the Regional
Administrator.
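As a rough illustration of the first method listed in paragraph (h), the Python sketch below computes the F statistic of a one-way parametric ANOVA for one constituent, treating the background well and each compliance well as a group. The data are made up, the function name is hypothetical, and the sketch stops before the multiple comparisons (contrast) step that paragraph (h)(1) also requires. The F statistic would still have to be compared with a tabulated critical value at the chosen Type I error level.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per well."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the well means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual samples around their own well mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical concentrations, four samples per well (illustration only).
background = [10.0, 12.0, 11.0, 13.0]
well_a = [11.0, 13.0, 12.0, 14.0]
well_b = [18.0, 20.0, 19.0, 21.0]

f_stat, df_b, df_w = one_way_anova_f([background, well_a, well_b])
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```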
(i) Any statistical method chosen
under § 264.97(h) for specification in the
unit permit shall comply with the
following performance standards, as
appropriate:
(1) The statistical method used to
evaluate ground-water monitoring data
shall be appropriate for the distribution
of chemical parameters or hazardous
constituents. If the distribution of the
chemical parameters or hazardous
constituents is shown by the owner or
operator to be inappropriate for a
normal theory test, then the data should
be transformed or a distribution-free
theory test should be used. If the
distributions for the constituents differ,
more than one statistical method may be
needed.
(2) If an individual well comparison
procedure is used to compare an
individual compliance well constituent
concentration with background
constituent concentrations or a ground-
water protection standard, the test shall
be done at a Type I error level no less
than 0.01 for each testing period. If a
multiple comparisons procedure is used,
the Type I experimentwise error rate for
each testing period shall be no less than
0.05; however, the Type I error of no less
than 0.01 for individual well
comparisons must be maintained. This
performance standard does not apply to
tolerance intervals, prediction intervals
or control charts.
(3) If a control chart approach is used
to evaluate ground-water monitoring
data, the specific type of control chart
and its associated parameter values
shall be proposed by the owner or
operator and approved by the Regional
Administrator if he or she finds it to be
protective of human health and the
environment.
(4) If a tolerance interval or a
prediction interval is used to evaluate
ground-water monitoring data, the levels
of confidence and, for tolerance
intervals, the percentage of the
population that the interval must
contain, shall be proposed by the owner
or operator and approved by the
Regional Administrator if he or she finds
these parameters to be protective of
human health and the environment.
These parameters will be determined
after considering the number of samples
in the background data base, the data
distribution, and the range of the
concentration values for each
constituent of concern.
(5) The statistical method shall
account for data below the limit of
detection with one or more statistical
procedures that are protective of human
health and the environment. Any
practical quantification limit (pql)
approved by the Regional Administrator
under § 264.97(h) that is used in the
statistical method shall be the lowest
concentration level that can be reliably
achieved within specified limits of
precision and accuracy during routine
laboratory operating conditions that are
available to the facility.
(6) If necessary, the statistical method
shall include procedures to control or
correct for seasonal and spatial
variability as well as temporal
correlation in the data.
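Paragraph (i)(2) above sets two floors on the Type I error: 0.05 experimentwise for a multiple comparisons procedure and 0.01 for each individual well comparison. The Python sketch below shows one possible reading, assuming a Bonferroni-style split of the experimentwise rate across the compliance-well comparisons. The rule does not name a particular adjustment, so the split and the function names are assumptions.

```python
def per_comparison_alpha(n_comparisons, experimentwise_alpha=0.05, floor=0.01):
    """Split the experimentwise rate evenly, but never go below the floor."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return max(experimentwise_alpha / n_comparisons, floor)


def experimentwise_upper_bound(n_comparisons, alpha_each):
    """Bonferroni upper bound on the experimentwise Type I error rate."""
    return min(1.0, n_comparisons * alpha_each)


for wells in (4, 5, 10):
    alpha_each = per_comparison_alpha(wells)
    bound = experimentwise_upper_bound(wells, alpha_each)
    print(f"{wells} wells: alpha per well = {alpha_each:.4f}, "
          f"experimentwise bound = {bound:.2f}")
```

Under this reading, once there are more than five comparisons the 0.01 floor takes over and the experimentwise bound rises above 0.05.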
(j) Ground-water monitoring data
collected in accordance with paragraph
(g) of this section including actual levels
of constituents must be maintained in
the facility operating record. The
Regional Administrator will specify in
the permit when the data must be
submitted for review.
5. In § 264.98 by removing paragraphs
(i), (j) and (k), and by revising
paragraphs (c), (d), (f), (g), and (h) to
read as follows:
§ 264.98 Detection monitoring program.
*****
(c) The owner or operator must
conduct a ground-water monitoring
program for each chemical parameter
and hazardous constituent specified in
the permit pursuant to paragraph (a) of
this section in accordance with
§ 264.97(g). The owner or operator must
maintain a record of ground-water
analytical data as measured and in a
form necessary for the determination of
statistical significance under § 264.97(h).
(d) The Regional Administrator will
specify the frequencies for collecting
samples and conducting statistical tests
to determine whether there is
statistically significant evidence of
contamination for any parameter or
hazardous constituent specified in the
permit under paragraph (a) of this
section in accordance with § 264.97(g). A
sequence of at least four samples from
each well (background and compliance
wells) must be collected at least semi-
annually during detection monitoring.
* * * * *
(f) The owner or operator must
determine whether there is statistically
significant evidence of contamination
for any chemical parameter or
hazardous constituent specified in the
permit pursuant to paragraph (a) of this
section at a frequency specified under
paragraph (d) of this section.
(1) In determining whether
statistically significant evidence of
contamination exists, the owner or
operator must use the method(s)
specified in the permit under § 264.97(h).
These method(s) must compare data
collected at the compliance point(s) to
the background ground-water quality
data.
(2) The owner or operator must
determine whether there is statistically
significant evidence of contamination at
each monitoring well at the compliance
point within a reasonable period of time
after completion of sampling. The
Regional Administrator will specify in
the facility permit what period of time is
reasonable, after considering the
complexity of the statistical test and the
availability of laboratory facilities to
perform the analysis of ground-water
samples.
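To illustrate a comparison of compliance-point data with background under paragraph (f)(1), the Python sketch below runs a combined Shewhart-CUSUM control chart, one form of the control chart approach in § 264.97(h)(4). The background mean and standard deviation, the observations, and the parameter values k, h, and SCL are placeholders. Under § 264.97(i)(3) the chart type and its parameters would be proposed by the owner or operator and approved by the Regional Administrator. Treating a value at or above a limit as a signal is also an assumption of this sketch.

```python
def shewhart_cusum(observations, background_mean, background_sd,
                   k=1.0, h=5.0, scl=4.5):
    """Return a list of (z, cusum, out_of_control) tuples, one per observation.

    z     : observation standardized against background
    cusum : running sum max(0, z - k + previous cusum)
    """
    results = []
    cusum = 0.0
    for x in observations:
        z = (x - background_mean) / background_sd
        cusum = max(0.0, z - k + cusum)
        out_of_control = z >= scl or cusum >= h
        results.append((z, cusum, out_of_control))
    return results


# Hypothetical compliance-well results against a background of mean 10, sd 2.
chart = shewhart_cusum([10.0, 12.0, 14.0, 16.0, 18.0],
                       background_mean=10.0, background_sd=2.0)
for period, (z, s, flag) in enumerate(chart, start=1):
    print(f"period {period}: z = {z:.1f}, cusum = {s:.1f}, signal = {flag}")
```

In this made-up series no single standardized value reaches the Shewhart limit, but the cumulative sum crosses h in the last period, which is the kind of gradual increase the CUSUM part is meant to catch.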
(g) If the owner or operator
determines pursuant to paragraph (f) of
this section that there is statistically
significant evidence of contamination
for chemical parameters or hazardous
constituents specified pursuant to
paragraph (a) of this section at any
monitoring well at the compliance point,
he or she must:
(1) Notify the Regional Administrator
of this finding in writing within seven
days. The notification must indicate
what chemical parameters or hazardous
constituents have shown statistically
significant evidence of contamination;
(2) Immediately sample the ground
water in all monitoring wells and
determine whether constituents in the
list of Appendix IX of Part 264 are
present, and if so, in what
concentration.
(3) For any Appendix IX compounds
found in the analysis pursuant to
paragraph (g)(2) of this section, the
owner or operator may resample within
one month and repeat the analysis for
those compounds detected. If the results
of the second analysis confirm the initial
results, then these constituents will form
the basis for compliance monitoring. If
the owner or operator does not resample
for the compounds found pursuant to
paragraph (g)(2) of this section, the
hazardous constituents found during this
initial Appendix IX analysis will form
the basis for compliance monitoring.
(4) Within 90 days, submit to the
Regional Administrator an application
for a permit modification to establish a
compliance monitoring program meeting
the requirements of § 264.99. The
application must include the following
information:
(i) An identification of the
concentration of any Appendix IX
constituent detected in the ground water
at each monitoring well at the
compliance point;
(ii) Any proposed changes to the
ground-water monitoring system at the
facility necessary to meet the
requirements of § 264.99;
(iii) Any proposed additions or
changes to the monitoring frequency,
sampling and analysis procedures or
methods, or statistical methods used at
the facility necessary to meet the
requirements of § 264.99;
(iv) For each hazardous constituent
detected at the compliance point, a
proposed concentration limit under
§ 264.94(a) (1) or (2), or a notice of intent
to seek an alternate concentration limit
under § 264.94(b); and
(5) Within 180 days, submit to the
Regional Administrator:
(i) All data necessary to justify an
alternate concentration limit sought
under § 264.94(b); and
(ii) An engineering feasibility plan for
a corrective action program necessary to
meet the requirement of § 264.100,
unless:
(A) All hazardous constituents
identified under paragraph (g)(2) of this
section are listed in Table 1 of § 264.94
and their concentrations do not exceed
the respective values given in that
Table; or
(B) The owner or operator has sought
an alternate concentration limit under
§ 264.94(b) for every hazardous
constituent identified under paragraph
(g)(2) of this section.
(6) If the owner or operator
determines, pursuant to paragraph (f) of
this section, that there is a statistically
significant difference for chemical
parameters or hazardous constituents
specified pursuant to paragraph (a) of
this section at any monitoring well at
the compliance point, he or she may
demonstrate that a source other than a
regulated unit caused the contamination
or that the detection is an artifact
caused by an error in sampling,
analysis, or statistical evaluation or
natural variation in the ground water.
The owner or operator may make a
demonstration under this paragraph in
addition to, or in lieu of, submitting a
permit modification application under
paragraph (g)(4) of this section;
however, the owner or operator is not
relieved of the requirement to submit a
permit modification application within
the time specified in paragraph (g)(4) of
this section unless the demonstration
made under this paragraph successfully
shows that a source other than a
regulated unit caused the increase, or
that the increase resulted from error in
sampling, analysis, or evaluation. In
making a demonstration under this
paragraph, the owner or operator must:
(i) Notify the Regional Administrator
in writing within seven days of
determining statistically significant
evidence of contamination at the
compliance point that he intends to
make a demonstration under this
paragraph;
(ii) Within 90 days, submit a report to
the Regional Administrator which
demonstrates that a source other than a
regulated unit caused the contamination
or that the contamination resulted from
error in sampling, analysis, or
evaluation;
(iii) Within 90 days, submit to the
Regional Administrator an application
for a permit modification to make any
appropriate changes to the detection
monitoring program facility; and
(iv) Continue to monitor in accordance
with the detection monitoring program
established under this section.
(h) If the owner or operator
determines that the detection monitoring
program no longer satisfies the
requirements of this section, he or she
must, within 90 days, submit an
application for a permit modification to
make any appropriate changes to the
program.
6. In § 264.99 by revising paragraph
(c), revising paragraphs (d), (f), and (g),
removing paragraph (h), redesignating
paragraph (i) as (h), (j) as (i) and (k) as
(j), revising the redesignated paragraphs
(h) introductory text and (i) introductory
text, and removing paragraph (l) to read
as follows:
§ 264.99 Compliance monitoring program.
*****
(c) The Regional Administrator will
specify the sampling procedures and
statistical methods appropriate for the
constituents and the facility, consistent
with § 264.97 (g) and (h).
(1) The owner or operator must
conduct a sampling program for each
chemical parameter or hazardous
constituent in accordance with
§ 264.97(g).
(2) The owner or operator must record
ground-water analytical data as
measured and in form necessary for the
determination of statistical significance
under § 264.97(h) for the compliance
period of the facility.
(d) The owner or operator must
determine whether there is statistically
significant evidence of increased
contamination for any chemical
parameter or hazardous constituent
specified in the permit, pursuant to
paragraph (a) of this section, at a
frequency specified under paragraph (f)
of this section.
(1) In determining whether
statistically significant evidence of
increased contamination exists, the
owner or operator must use the
method(s) specified in the permit under
§ 264.97(h). The method(s) must
compare data collected at the
compliance point(s) to a concentration
limit developed in accordance with
§ 264.94.
(2) The owner or operator must
determine whether there is statistically
significant evidence of increased
contamination at each monitoring well
at the compliance point within a
reasonable time period after completion
of sampling. The Regional Administrator
will specify that time period in the
facility permit, after considering the
complexity of the statistical test and the
availability of laboratory facilities to
perform the analysis of ground-water
samples.
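Paragraph (d)(1) calls for comparing compliance-point data with a concentration limit using the method specified in the permit. As one hedged illustration, the Python sketch below computes a one-sided lower confidence limit on the mean of a sampling sequence and checks it against a limit. The choice of a confidence-limit approach, the data, the limit, and the function name are all assumptions. The t value must be looked up for the chosen confidence level and n - 1 degrees of freedom (4.541 is the tabulated one-sided 99 percent value for 3 degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev


def lower_confidence_limit(samples, t_value):
    """One-sided lower confidence limit on the mean: mean - t * s / sqrt(n)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate variability")
    return mean(samples) - t_value * stdev(samples) / sqrt(n)


# Hypothetical sequence of four samples from one compliance well.
samples = [52.0, 55.0, 51.0, 54.0]
concentration_limit = 50.0          # hypothetical limit for this constituent

lcl = lower_confidence_limit(samples, t_value=4.541)
exceeds = lcl > concentration_limit
print(f"lower limit = {lcl:.2f}, statistically above the limit: {exceeds}")
```

Here the sample mean is above the hypothetical limit, but the lower confidence limit is not, so this sketch would not report statistically significant evidence of an exceedance.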
* * * * *
(f) The Regional Administrator will
specify the frequencies for collecting
samples and conducting statistical tests
to determine statistically significant
evidence of increased contamination in
accordance with § 264.97(g). A sequence
of at least four samples from each well
(background and compliance wells)
must be collected at least semi-annually
during the compliance period of the
facility.
(g) The owner or operator must
analyze samples from all monitoring
wells at the compliance point for all
constituents contained in Appendix IX
of Part 264 at least annually to
determine whether additional hazardous
constituents are present in the
uppermost aquifer and, if so, at what
concentration, pursuant to procedures in
§ 264.98(f). If the owner or operator finds
Appendix IX constituents in the ground
water that are not already identified in
the permit as monitoring constituents,
the owner or operator may resample
within one month and repeat the
Appendix IX analysis. If the second
analysis confirms the presence of new
constituents, the owner or operator must
report the concentration of these
additional constituents to the Regional
Administrator within seven days after
the completion of the second analysis
and add them to the monitoring list. If
the owner or operator chooses not to
resample, then he or she must report the
concentrations of these additional
constituents to the Regional
Administrator within seven days after
completion of the initial analysis and
add them to the monitoring list.
(h) If the owner or operator
determines pursuant to paragraph (d) of
this section that any concentration
limits under § 264.94 are being exceeded
at any monitoring well at the point of
compliance he or she must:
* * * * *
(i) If the owner or operator
determines, pursuant to paragraph (d) of
this section, that the ground-water
concentration limits under this section
are being exceeded at any monitoring
well at the point of compliance, he or
she may demonstrate that a source other
than a regulated unit caused the
contamination or that the detection is an
artifact caused by an error in sampling,
analysis, or statistical evaluation or
natural variation in the ground water. In
making a demonstration under this
paragraph, the owner or operator must:
* * * * *
[FR Doc. 88-22913 Filed 10-7-88; 8:45 am]
BILLING CODE 6560-50-M
-------