United States
              Environmental Protection
              Agency
              Environmental Monitoring
              Systems Laboratory
              P.O. Box 93478
              Las Vegas NV 89193-3478
EPA/600/4-88/040
December 1988
              Research and Development
Evaluation of Control
Chart Methodologies
for RCRA Waste
Sites

-------
EVALUATION OF CONTROL CHART METHODOLOGIES
            FOR RCRA WASTE SITES
                      by

                Thomas H. Starks
           Environmental Research Center
          University of Nevada, Las Vegas
              Las Vegas, NV 89154
                CR 814342-01-3
                Project Officer
               George T. Flatman
      Exposure Assessment Research Division
Environmental Monitoring Systems Laboratory-Las Vegas
         Las Vegas, Nevada 89193-3478
  OFFICE OF RESEARCH AND DEVELOPMENT
 U.S. ENVIRONMENTAL PROTECTION AGENCY
           LAS VEGAS, NEVADA

-------
                                   NOTICE

Although the information described in this document has been funded wholly or in part by
the United States Environmental Protection Agency under Cooperative Agreement CR
814342-01-3 with the Environmental Research Center, University of Nevada, Las Vegas,
it does not necessarily reflect the views of the Agency and no official endorsement should
be inferred.

Mention of trade names or commercial products does not constitute endorsement or
recommendation for use.

-------
                                  ABSTRACT

      This report is a discussion of decision rules relating to the monitoring of ground
water at hazardous  waste  sites that are subject to regulation under the Resource
Conservation and Recovery Act of 1976 (RCRA). A nested random-effects model for
ground-water quality parameter measurements is suggested and decision procedures are
developed in terms of that model. Particular attention is paid to the possible application of
industrial quality control schemes to the ground-water monitoring problem. A decision
procedure that changes over time as more information about well and aquifer characteristics
accumulate is proposed. This procedure involves the use of outlier tests and of Shewhart-
CUSUM quality control schemes.

-------
                                 CONTENTS
Notice	ii
Abstract	    iii
Figures   	v
Tables	    vi
Abbreviations and Symbols   	   vii
     1.    Introduction	1
     2.    Summary and Conclusions	2
     3.    RCRA Regulations	4
     4.    Ground-Water Characteristics	5
     5.    Statistical Model	6
                 Nested random-effects model	6
                 Liggett's procedure    	   11
     6.    Problems Associated With the Analysis of Ground-Water Data   .   13
     8.    Quality Control Schemes	   16
                 Shewhart charts	   17
                 The CUSUM quality control scheme	   18
                 The combined Shewhart CUSUM scheme	   21
                 Multivariate quality control schemes   	   21
     8.    Monte Carlo Investigation of Run-Length Distributions  ....   23
     9.    A Decision Procedure	   36

References	   38

-------
                                 FIGURES

Number                                                         Page
   1     The CUSUM control chart	20
   2     Shewhart quality control chart	20
   3     Schematic of Monte Carlo Simulation under ideal conditions   .  .  26

-------
                                  TABLES

Number                                                             Page
   1      Analysis of Variance	    8
   2      CUSUM Quality Control Scheme Example	   19
   3      In-Control Run-Length Statistics for w Wells Under
          Ideal Conditions (N= 100 Trials)	27
   4      Run-Length Statistics for Out-of-Control Situations
          (N= 100 Trials, Ideal Conditions)	27
   5      In-Control Run-Length Statistics for w Wells When Measurements
          are Serially Correlated (N= 100 Trials, Normal Distribution)  ...   29
   6      Run-Length Statistics for Serially-Correlated Data in
          Out-of-Control Situations (N=100 Trials)	29
   7      Sample Statistics for Log-Gamma (0,6) Deviates (N=1000)  ...   30
   8      In-Control Run-Length Statistics When Data are Log-Gamma(a,6)
          Random Deviates (N= 100 Trials)	30
   9      Run-Length Statistics for Out-of-Control Situations When Data are
          Log-Gamma(a,6) Random Deviates (N= 100 Trials)	30
   10     Run-Length Statistics When Data are Gamma(3,6) Random
          Deviates (N=100 Trials)	32
   11     In-Control Run-Length Statistics for Measurements on Four
          GQP at Each of Four Wells (N= 100 Trials, Ideal Conditions) ...   33
   12     Run-Length Statistics for Out-of-Control Situations in Which One
          of the Four GQP Means has Increased by Two Standard Deviations
          at Two of the Four Wells (N=100 Trials, Ideal Conditions)  ...   33
   13     In-Control Run-Length Statistics for Learning Periods of Lengths
          Four and Eight (N= 100 Trials, Ideal Conditions) .	35

-------
               LIST OF ABBREVIATIONS AND SYMBOLS
ABBREVIATIONS
     ARL  . . . .  Average run length
     GQP  . . . .  Ground-water quality parameter
     HWS  . . . .  Hazardous waste site
     OOC  . . . .  Out of control
     RCRA . . . .  Resource Conservation and Recovery Act of 1976

SYMBOLS
     h  . . . . .  CUSUM upper control limit
     k  . . . . .  Parameter of CUSUM quality control scheme
     μ  . . . . .  Mean value of distribution of X
     ρ  . . . . .  Correlation
     s  . . . . .  Sample estimate of standard deviation
     σ  . . . . .  Standard deviation
     σ² . . . . .  Variance
     Σ  . . . . .  Covariance matrix for vector X
     t  . . . . .  Number of sampling periods
     T² . . . . .  Test statistic for Hotelling's test of a multivariate mean
     U  . . . . .  Shewhart upper control limit
     w  . . . . .  Number of wells
     X  . . . . .  (Transformed) measurement of a GQP,
                      or a vector of measurements of GQP
     x̄  . . . . .  Sample mean of measurements of a GQP

-------
                                  SECTION 1

                               INTRODUCTION
     Under the Resource Conservation and Recovery Act of 1976 (RCRA), the U.S.
Environmental Protection Agency has  developed regulations for landfills, surface
impoundments, waste piles, and land treatment units that are used to treat, store, or dispose
of hazardous wastes. These regulations include requirements for the monitoring of ground
water in the uppermost aquifer below the hazardous waste site (HWS).  This monitoring
involves the drilling of wells into the uppermost aquifer that are at appropriate locations and
depths to yield ground water samples that represent the quality of background ground water
and the quality of ground water passing the point of compliance. The sampling and
analysis of monitoring-well water is conducted at regular time intervals to help determine
whether a release from the HWS has entered the uppermost aquifer. There are several
areas in this monitoring program that appear to need further development of methodology.
They include development of better methods for obtaining precise  and unbiased
measurements of some constituents such as volatile organics, better specifications for well
construction, better methods for detection and accommodation of shifting direction and rate
of aquifer flow, and better decision rules based on measurements of water samples drawn
from wells near the  HWS for determining when additional regulatory action may be
required.   This paper discusses  the problem of developing good decision rules and
recommends that the development be based  on a realistic model for the ground-water
measurements.  A nested random-effects model is suggested and statistical procedures
based on that model are formulated and criticized.  Industrial quality control schemes are
considered in terms of their possible application to the ground-water monitoring decision
problem.

-------
                                   SECTION 2

                        SUMMARY AND CONCLUSIONS
     All statistical decision procedures are based on assumed measurement models.
Decision procedures based on unrealistic models will not succeed in providing answers to
ground-water monitoring decision problems, no  matter how simple or elegant  the
procedures may be. It is essential that a realistic workable model for the measurements be
formulated and used both in construction and evaluation of decision procedures. A nested
random-effects model is presented to illustrate a model approach  and to indicate  the
difficulties inherent in developing good statistical procedures for monitoring of ground-
water quality. Obviously, no statistical measurement model is the  correct model for a
system of nature, but the model for a decision procedure should be as reasonable and as
simple as possible.

     Any decision procedure, based on measurements of the quality  of the ground water
taken in each sampling period, where decisions are made at the end of each sampling period
as to whether or not additional regulatory actions are required, is by definition a quality
control  scheme.  In addition, for a quality control  scheme, one is interested in  the
distribution of run lengths in both in-control and out-of-control situations. That is, a good
decision procedure  (quality control scheme) is one with large average in-control run lengths
and small average out-of-control run lengths. Hence, consideration in comparing decision
procedures for RCRA sites should be given to the distributions of their run lengths rather
than to their probabilities of Type I and Type II errors on individual applications of the
decision rule in each sampling period. (However, the two types of criteria are obviously
not unrelated.) In choosing a quality control scheme for use at RCRA  sites it is reasonable
to consider quality control  schemes that have been used successfully in other settings,
particularly in industrial settings.

     The formulation of good decision procedures for determining when increased
monitoring activity is needed at a HWS is extremely difficult because of the slow
acquisition, low precision, and multivariate nature of ground-water monitoring data, along
with system instability due to intrusions into the aquifer caused by human activity outside the HWS.
The first two of these problems force the initiation of quality control schemes before good

-------
(highly precise) estimates of measurement distribution parameters can be obtained. With
good estimates of the measurement distribution parameters, it is possible to mathematically
derive the run-length distributions for various quality control schemes.  However, without
these good estimates, it is necessary to employ Monte Carlo techniques to estimate the
distribution properties of run lengths when the process is in-control (i.e., site is not leaking
into the aquifer) and when the process is out-of-control (i.e., site is leaking into the aquifer and a
plume is passing through one or more well sites).  The Monte Carlo analysis of the
Shewhart-CUSUM quality control  technique indicates the type  of results that can be
obtained with this method and also indicates that the method is reasonably robust with
respect to left-skewed, non-normal probability distributions of measurements. However,
the technique is not robust with respect to lack of independence between measurements; in
particular, its in-control run-length characteristics are adversely altered by positive serial
correlations.
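The kind of Monte Carlo estimation described above can be sketched in a few lines. The sketch below assumes standardized, independent normal measurements; the CUSUM parameters h = 5 and k = 0.5 and the Shewhart limit of 4.5 standard deviations are illustrative choices for a combined scheme, not values prescribed by this report.

```python
import random

def run_length(shift=0.0, h=5.0, k=0.5, shewhart_limit=4.5, max_n=100000):
    """Simulate one run of a combined Shewhart-CUSUM scheme on
    standardized measurements Z ~ N(shift, 1); return the number of
    sampling periods until the scheme signals."""
    S = 0.0
    for n in range(1, max_n + 1):
        z = random.gauss(shift, 1.0)
        S = max(0.0, S + z - k)              # one-sided CUSUM statistic
        if z > shewhart_limit or S > h:      # Shewhart or CUSUM signal
            return n
    return max_n

random.seed(1)
trials = 200
arl_in  = sum(run_length(0.0) for _ in range(trials)) / trials  # in-control ARL
arl_out = sum(run_length(1.0) for _ in range(trials)) / trials  # mean up 1 sigma
print(arl_in, arl_out)
```

A good scheme shows a large in-control average run length and a small out-of-control average run length, which is exactly what the two estimates contrast.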

     The Monte Carlo simulation results and methods discussed in this report provide a
basis for comparison and evaluation of all other decision procedures, since similar Monte
Carlo simulations can be performed on any decision procedure to obtain estimates of the
run-length distributions of such procedures.

-------
                                  SECTION 3

                             RCRA REGULATIONS
     Prior to discussion of decision procedures for ground-water monitoring at HWS, it is
well to review decision procedures in the RCRA regulations. These procedures are given
in a partial revision of 40 CFR 264 that recently appeared in the Federal Register (53 FR
39720-39731, October 11, 1988) and that will go into effect on April 11, 1989.  In these
revised regulations, the previously required use of Cochran's approximation to the
Behrens-Fisher Student's t-test was removed. In its place, a choice is allowed between (i)
a parametric analysis of variance followed by multiple comparison procedures, (ii) an
analysis of variance based on ranks followed by multiple comparison procedures, (iii) a
tolerance interval or a prediction interval procedure, (iv) a control chart approach that gives
control limits for each constituent, or (v) another statistical test method submitted by the
owner or operator and approved by the Regional Administrator. The first three procedures
require the comparison of sample measurements taken at wells at the "compliance point(s)"
with sample measurements obtained at "background wells." The sampling procedure at a
given well at a particular sampling period is that "(1) A sequence of at least four samples,
taken at  an  interval that assures, to the greatest extent technically feasible, that  an
independent sample is obtained,... or (2) an alternate sampling procedure proposed by the
owner or operator and approved by the Regional Administrator."

     In the past, RCRA regulations have  considered the monitoring of concentrations of
certain ground-water constituents or of the levels of certain indicator parameters. In the
remainder of this paper no distinction will be made between constituent concentrations and
values of indicator parameters; they will both be called values of ground-water quality
parameters (GQP).  In addition, only one-sided statistical tests will be considered, since in
cases, such as for pH, where increases or  decreases in population mean indicate possible
leakage from the HWS, the one-sided procedures may be easily modified to accommodate
the two-sided situation.

-------
                                  SECTION 4

                    GROUND-WATER CHARACTERISTICS
     To develop a reasonable model for the measurements of GQP, some discussion of
ground-water characteristics is required. The horizontal velocity of water in an aquifer is
slow (a few meters per day is  considered a high velocity) and lateral dispersion of a
contaminant is considerably slower than fluid flow, which means that plumes do not widen
to any great extent as they extend through the aquifer (Freeze and Cherry, 1979, p. 26-29,
104,394-395).  At a particular sampling time, the water at different wells near a HWS will
have entered the aquifer at different times and, therefore, carry different concentrations of
various monitored constituents.  Hence, the  value of a monitored parameter may be
increasing at one well while  decreasing at another. Concentrations of contaminants in
samples of ground water may also change due to changes in water table. A high water
table resulting from recent rains  or flooding  may cause a  reduction in measured
concentrations of a pollutant because of dilution of the contaminant or because the
contaminant is floating at the top of the aquifer which is now above the well screen. On the
other hand, a high water table may cause an increase in the  concentrations of some
contaminants caused by leaching activity in the vadose zone.  In any case, changes in the water
table may have similar, virtually simultaneous, effects on all wells near a HWS.

-------
                                 SECTION 5

                            STATISTICAL MODEL
     The basic measurement is of the concentration of a GQP at a well at a particular
sampling time. Under the revised RCRA regulations it is usually obtained as the arithmetic
mean of the analytical results on at least four water samples drawn from the well at a
particular sampling time.  Decision rules must be based on these basic measurements and
their distributions.  Let X represent an appropriately transformed measurement of the GQP
from a well near a HWS. The transformation (if necessary) is performed to stabilize the
variance (i.e., to provide a measurement whose variance does not change when the mean
concentration changes). Typically this transformation also changes the distribution to one
that is more nearly normal.
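As a hypothetical illustration of such a transformation (the lognormal data below are simulated, not taken from this report), a logarithmic transformation removes the dependence of the spread on the concentration level:

```python
import math
import random

random.seed(7)

def sd(xs):
    """Sample standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Two lognormal samples whose typical concentration level differs tenfold.
low  = [math.exp(random.gauss(1.0, 0.5)) for _ in range(2000)]
high = [math.exp(random.gauss(1.0 + math.log(10.0), 0.5)) for _ in range(2000)]

# Raw scale: the spread grows with the level.
print(sd(low), sd(high))
# Log scale: both spreads are near 0.5, independent of the level.
print(sd([math.log(x) for x in low]), sd([math.log(x) for x in high]))
```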

NESTED RANDOM-EFFECTS MODEL

     To take into account the properties of ground water mentioned above, a model is
considered in which temporal effects are allowed to be different at each well. This model is
a nested random-effects model (see Scheffe, 1959, Section 7.6).  Let X_Bhi be the
background (see above RCRA regulations) measurement X at sampling time i from up-
gradient well h.  Then in this nested random-effects model

             X_Bhi = μ_B + W_h + T_hi + M_hi            h = 1,...,w;  i = 1,...,t;

where μ_B is the unknown mean level for X over the background area and over the time
period during which sampling has been performed in the area; W_h is the random effect,
with expected value zero and variance σ_W², due to location and construction of well h; T_hi is
a random temporal effect, with expected value zero and variance σ_T², caused by the taking
of a sample at time i from well h; and M_hi is a random variable, with expected value zero
and variance σ_M², that represents the error due to sample acquisition and sample analysis
(i.e., M_hi represents measurement error). If several true replicate samples are obtained at
each sampling time at each well, it is possible to obtain estimates of σ_T² and σ_M² from the

-------
data. However, true replicate samples are difficult to obtain; and it is even more difficult to
convincingly demonstrate that they are true replicate samples by statistical arguments. One
solution is to work exclusively with the means of the samples taken at a particular sampling
time. In this case, one essentially confounds temporal and measurement error effects. To
provide a simpler nested random effects model for the sample mean, the last two terms of
the previous model are replaced by E_hi:

                             X_Bhi = μ_B + W_h + E_hi

where E_hi now has mean 0 and variance σ_E² = σ_T² + σ_M².
-------
An unbiased estimate of the variance of the sample mean X̄_B is then
(Mean Square for Wells)/(wt), based on (w-1) degrees of freedom.  The usual procedure of
taking the sample variance s² (= ΣΣ(X_Bhi - X̄_B)²/(wt - 1)) of all the wt measurements
divided by wt to estimate the variance of the sample mean is not appropriate here because
there is more than one source of variation in the data.
                     TABLE 1. ANALYSIS OF VARIANCE

Source            d.f.       Mean Square                       E(Mean Square)
Wells             w-1        tΣ(X̄_Bh - X̄_B)²/(w-1)            σ_E² + tσ_W²
Within Wells      w(t-1)     ΣΣ(X_Bhi - X̄_Bh)²/(w(t-1))       σ_E²
Total             wt-1
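The expected-mean-square column of Table 1 suggests method-of-moments estimates of the two variance components. A small simulation of the simplified model X_Bhi = μ_B + W_h + E_hi (the well count, number of sampling periods, and variance values below are arbitrary illustrative choices) can check them:

```python
import random

random.seed(3)
w, t = 8, 40                        # wells and sampling periods (illustrative)
mu_B, sig_W, sig_E = 6.5, 0.30, 0.15

# Simulate X_Bhi = mu_B + W_h + E_hi.
W = [random.gauss(0.0, sig_W) for _ in range(w)]
X = [[mu_B + W[h] + random.gauss(0.0, sig_E) for _ in range(t)] for h in range(w)]

well_means = [sum(row) / t for row in X]
grand_mean = sum(well_means) / w

ms_wells  = t * sum((m - grand_mean) ** 2 for m in well_means) / (w - 1)
ms_within = sum((x - well_means[h]) ** 2
                for h in range(w) for x in X[h]) / (w * (t - 1))

# E(MS within) = sigma_E^2 and E(MS wells) = sigma_E^2 + t*sigma_W^2,
# so the moment estimates are:
sig_E2_hat = ms_within
sig_W2_hat = (ms_wells - ms_within) / t
print(sig_E2_hat, sig_W2_hat)       # true values are 0.0225 and 0.09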
     For a down-gradient well, one wants to know whether the value of X in the present
sample represents evidence indicating that leachate from the HWS has reached the aquifer.
Hence, each time a down-gradient well is sampled, the value of the measurement from the
sample taken from the well at that time will be of interest to the owner/operator and to the
regulatory agencies.  Let X_qc represent the sample mean of the measurements taken from
down-gradient well q at the current sampling time c.  By using the same reasoning
employed to model a measurement from a background well, one has

                               X_qc = μ_qc + M_qc

where μ_qc is the mean value of X at well q at this current sampling time c, and M_qc is the
measurement error.  If there is no leakage from the HWS, then it may be reasonable to
treat X_qc as a measurement coming from the same population as the background
measurements, in which case E(X_qc) = μ_B.
     If one makes the reasonable assumption that there is spatial and temporal variation in
ground-water quality, and thereby in the value of X, then it is evident that the expected
value μ_qc of X at a particular down-gradient well at a particular time is not likely to equal
the expected value μ_B of X over both a large region and an extended period of time even
when there is no leakage from the HWS. Nevertheless, there have been a number of

-------
proposals (e.g., Miller and Kohout, 1984), and two RCRA regulations, that include a
statistical test of the equality of the two means as the decision rule in deciding whether
additional monitoring activity is required.  The two RCRA regulation procedures, which
are being removed under the current revisions, have decision  rules involving Cochran's
approximation to the Behrens-Fisher Student's t-test (in 40 CFR 264) and Student's t-test
(in 40 CFR 265) which can only be justified by selecting a model in which all the variation
between measurements, when  there is no leakage from the HWS, is due to measurement
errors.

     With the nested random-effects model, one can develop a test based on Cochran's
approximation to the Behrens-Fisher Student's t-test that, given the usual assumptions of
normality and independence of observations and sufficiently large numbers of up-gradient
and down-gradient wells, is a valid test.  The test statistic is

                          t = (X_qc - X̄_B)/√(s_qc² + s_b²)

where s_b² = (Mean Square for Wells)/(wt) is based strictly on the wt background
measurements; s_qc² = Σ(X_hc - X̄_c)²/(w* - 2), where w* is the number of down-gradient
wells; X_hc is the basic measurement of the GQP for down-gradient well h (≠ q) during the
current sampling; and X̄_c is the sample mean over all (w* - 1) down-gradient wells, other
than well q, sampled during the current sampling period.  Now s_b² and s_qc² are
independent unbiased estimates of the variances of X̄_B and X_qc with the appropriate
distributions for the test provided w > 1 and w* > 2.  The critical value for the test is

                     t* = (s_qc² t_c + s_b² t_b)/(s_qc² + s_b²)

where t_c and t_b are the upper 100α-percentiles of Student's t-distributions with (w* - 2)
and (w - 1) degrees of freedom, respectively.  This test is not a test of the null hypothesis
that the mean μ_qc of the measurement at the monitored well at this sampling time is exactly
the same as the mean μ_B of the background measurement model.  Instead, it is basically a
test of the null hypothesis that the measurements from the down-gradient wells currently
being monitored are members of the same super-population as the background
measurements X_Bhi, versus the alternative hypothesis that the true model for the
measurement X_qc on the monitored well is the same as for the background measurements
X_Bhi except that the mean has increased from μ_B to some larger value.

-------
EXAMPLE 1: Consider a HWS where there are four up-gradient wells that have
been sampled in each of four quarters to provide 16 background measurements, and
four down-gradient wells.  The data from the background measurements and the
current measurements of the down-gradient wells are as follows:

                          Background Wells
                      1        2        3        4
First Quarter       6.32     6.55     6.21     7.02
Second Quarter      6.20     6.60     6.75     6.94
Third Quarter       6.51     6.81     6.33     6.75
Fourth Quarter      6.03     6.70     6.27     7.13

Well means:        6.265    6.665    6.390    6.960

                   Down-gradient Wells
                      1        2        3        4
        X_hc:       7.14     6.35     6.66     6.00

Analysis of Variance of Background Data

Source                d.f.     Mean Square    E(Mean Square)
Wells                  3        0.3821        σ_E² + 4σ_W²
Other (within wells)  12        0.0349        σ_E²
Total                 15

  X̄_B = 6.570,  s_b² = 0.3821/16 = 0.0239,  q = 1,  X_1c = 7.14,  s_1c² = 0.1090

For down-gradient well 1, the test statistic value is

             t = (7.14 - 6.570)/√(0.1090 + 0.0239) ≈ 1.56.

The critical value for a one-sided test with 0.01 significance level is

    t* = ((0.1090)(6.965) + (0.0239)(4.541))/(0.1090 + 0.0239) ≈ 6.529.

We do not accept the alternative hypothesis that X_1c belongs to a population with a
larger mean than that for background measurements.  The measurement at
down-gradient well 1 does not indicate a need for increased monitoring activity.
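The arithmetic in Example 1 can be reproduced with a short script; the 1-percent critical values t(0.01, 2 d.f.) = 6.965 and t(0.01, 3 d.f.) = 4.541 are standard t-table entries.

```python
import math

# Background measurements: rows are quarters, columns are wells 1-4.
bg = [[6.32, 6.55, 6.21, 7.02],
      [6.20, 6.60, 6.75, 6.94],
      [6.51, 6.81, 6.33, 6.75],
      [6.03, 6.70, 6.27, 7.13]]
down = [7.14, 6.35, 6.66, 6.00]      # current down-gradient measurements
w = t = 4

well_means = [sum(col) / t for col in zip(*bg)]
xbar_B = sum(well_means) / w

ms_wells = t * sum((m - xbar_B) ** 2 for m in well_means) / (w - 1)
sb2 = ms_wells / (w * t)             # s_b^2 = (Mean Square for Wells)/(wt)

# Test well q = 1; the other down-gradient wells provide s_qc^2.
x_qc, others = down[0], down[1:]
w_star = len(down)
xbar_c = sum(others) / len(others)
sqc2 = sum((x - xbar_c) ** 2 for x in others) / (w_star - 2)

t_stat = (x_qc - xbar_B) / math.sqrt(sqc2 + sb2)

tc, tb = 6.965, 4.541                # upper 1% t points, 2 and 3 d.f.
t_crit = (sqc2 * tc + sb2 * tb) / (sqc2 + sb2)
print(round(t_stat, 2), round(t_crit, 2))
```

Since t ≈ 1.56 falls well below t* ≈ 6.53, the example's conclusion of no signal follows.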

-------
     This proposed test has several deficiencies, but the principal one is lack of power.
Practical considerations keep the numbers w and w*, and thereby the power, small.  In
addition, for this test there is no analytical way to determine the power of the test, but it
could be estimated by simulation procedures.

     If the measurements taken at the current sampling time from the point of compliance
and background wells are the only ones considered, any one of several outlier tests given in
the text by Barnett and Lewis (1978) may be employed. One such test of whether the k
(k=1,...,w*) largest measurements in a sample of size n are outliers has the test statistic T,
where X_(i) is the ith order statistic (X_(n) represents the largest observation in a sample of
size n = w + w*), and X̄ and s are the usual sample mean and sample standard deviation.
Table IXa, p. 304, of the above-mentioned text contains critical values for the test statistic T.
This test is a likelihood ratio test for a location slippage alternative in which k observations
arise from a normal population with mean (μ_B + a), a > 0, while the remaining observations
are from a normal population with mean μ_B and the same variance.  As with the previous
test, this outlier test will have poor power characteristics when the number of wells is small.

LIGGETT'S PROCEDURE

     A model and a decision procedure have been proposed by Liggett (1985).  He
suggests that, in the absence of leakage from the HWS, measurements at up-gradient and
down-gradient wells may be modeled by

         X_hik = μ + W_h + T_i + e_hik       h = 1,...,w;  i = 1,...,t;  k = 1,...,r

where X_hik is the measurement on the kth replicate sample from well h at time i, μ is the
general mean, W_h is a fixed unknown spatial effect, T_i is a fixed unknown temporal effect,
and e_hik is the random effect due to sampling and measurement error.  This fixed-effects
model implies that one has an interest in the comparison of the particular values at up-
gradient wells at each time of sampling, and that if concentration increases at one well, it
should increase by a corresponding amount at all other wells. Liggett states:

   Although the background  quality  of ground water varies both spatially and
   temporally, temporal variations at nearby points are related. This fact suggests that
   (apart from sampling and measurement error) temporal variations are the same at


-------
   nearby points.  Wells located close together and sunk to the same depth will satisfy
   this assumption, which provides a model for the background quality around a waste
   management facility.

Under these assumptions, which are somewhat contrary to the discussion of ground-water
characteristics given in Section 4, Liggett's model represents the situation where there is no
leakage from the HWS. If such leakage occurs, one might expect the plume to reach one or
more of the down-gradient wells, and the increase in X would be larger at those wells than
at the others for the particular measurement period. This unequal change in concentrations
at the various wells is, in terms of analysis-of-variance models, an "interaction between
well location and time" and may be included on the right-hand side of the model equation as
an additional term I_hi. Under the hypothesis of no leakage, I_hi = 0 for all h and i.  Liggett
suggests examination of the residuals (measurements after subtraction of the estimated
mean, well effects, and time effects) to search for non-zero interaction effects.  He also
recommends performance of a standard analysis of variance F-test of this no-leakage
hypothesis. If the hypothesis (H: I_hi = 0) is rejected, leakage is a possible cause.  This
F-test requires that r, the number of replicate samples taken at each well at each sampling
time, be greater than one. Such true replicate samples may be very difficult to obtain.
Replicate samples are such that the measurements of each GQP are values of independent
and identically distributed random variables. Identical distributions are hard to obtain
because well water characteristics often  change as  samples are drawn owing to the
mechanical action of drawing the samples.  Independence is difficult to obtain since if one
sample is contaminated in the sampling process, there is a good chance that the others are
also.  In addition, the samples will  typically  be handled as a  batch  throughout the
measurement process and thereby have within-batch correlations.
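Liggett's interaction check can be sketched as a standard two-way ANOVA with replication. Everything below is a hypothetical illustration (the well and time counts, variances, and a "leak" injected at one well during the last two sampling periods are invented for the sketch); the interaction mean square is compared against the replicate-error mean square.

```python
import random

random.seed(5)
w, t, r = 4, 6, 2                    # wells, sampling times, replicates
mu, sig = 6.5, 0.10
Wh = [random.gauss(0.0, 0.2) for _ in range(w)]   # fixed well effects
Ti = [random.gauss(0.0, 0.2) for _ in range(t)]   # fixed time effects

def interaction_F(leak):
    """Simulate X_hik = mu + W_h + T_i + I_hi + e_hik, where the
    interaction I_hi equals `leak` at well 0 in the last two sampling
    periods, and return MS(interaction)/MS(error)."""
    X = [[[mu + Wh[h] + Ti[i]
           + (leak if (h == 0 and i >= t - 2) else 0.0)
           + random.gauss(0.0, sig) for _ in range(r)]
          for i in range(t)] for h in range(w)]
    cell = [[sum(X[h][i]) / r for i in range(t)] for h in range(w)]
    row = [sum(cell[h]) / t for h in range(w)]
    col = [sum(cell[h][i] for h in range(w)) / w for i in range(t)]
    g = sum(row) / w
    ms_int = r * sum((cell[h][i] - row[h] - col[i] + g) ** 2
                     for h in range(w) for i in range(t)) / ((w - 1) * (t - 1))
    ms_err = sum((x - cell[h][i]) ** 2
                 for h in range(w) for i in range(t)
                 for x in X[h][i]) / (w * t * (r - 1))
    return ms_int / ms_err

f_null = interaction_F(0.0)          # no leakage: F should be near 1
f_leak = interaction_F(1.0)          # leakage at one well: F inflates
print(f_null, f_leak)
```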
     While Liggett's model may apply in some situations, such as where changes in GQP
values are primarily due to changes in the height of the water table, it is this author's opinion
that Liggett's model is unlikely to have wide application, due to the ground-water
characteristics mentioned in Section 4.  Even for a small HWS, one might expect the age
of the water at up- and down-gradient wells to differ by many months, and, therefore, one
should not expect the same temporal effect at every well.

-------
                                  SECTION 6

  PROBLEMS ASSOCIATED WITH THE ANALYSIS OF GROUND-WATER DATA
     There are several characteristics of ground-water monitoring data that severely
complicate the development of statistical decision procedures.  They are as follows:

    1.  Several water quality parameters are tested for change from background;
    2.  Changes in aquifer water quality and flow characteristics are effected by human
       interventions off the HWS such as by intermittent pumping of water from the
       aquifer and by accidental spills of pollutants into the aquifer;
    3.  For some monitored substances, such as volatile organic compounds,
       measurement error variance and bias tend to be large;
    4.  Cyclic (seasonal) variation in background GQP values may occur;
    5.  For some GQP, the chemical analyses will result in "below instrument detection
       limit," or "not detected," for most of the water samples submitted; and
    6.  It is difficult to obtain consecutive samples that are true replicates from a well at a
       particular sampling time.
     Large measurement-error variances, serial correlations, and practical constraints make
it difficult to quickly obtain an adequate number of independent background measurements
needed to characterize the system.  Similarly, it is difficult to determine and obtain a
sufficient number of samples and aliquots at a monitoring well to estimate current GQP
values with high precision or to provide tests with good power characteristics.

     The fact that the values of several water quality parameters are being monitored at
each of several down-gradient wells implies that in each sampling period many decisions
must be  made as to whether parameter values at down-gradient wells have increased
sufficiently to justify some form of regulatory action.  If a separate test of hypotheses is
performed for each of these decisions, the probability of falsely rejecting at least one null
hypothesis is likely to be quite high, even though the probability of such a false-positive is

-------
low for each test.  While the simplest, and typical, reaction to a large concentration that
signals the occurrence of an unusual event is to take another sample to confirm  the
measurement, one cannot be complacent about false alarms and the added burden that they
place on the monitoring system. (One of the nice features of Liggett's procedure is that it
provides  one  test across all monitored wells for a given water quality parameter.)  To
reduce the number of tests, one might think of using a multivariate procedure such as a
Hotelling's T²-test (Anderson, 1984, p. 159), but such tests usually require estimation of
the covariance matrix of the vector of measurements.  Unfortunately, the number of
sampling periods required to obtain the data to estimate the covariance matrix is greater than
the dimension of the observation vector.   Practical considerations and start-up time
restrictions make it unlikely that the covariance matrix could be adequately estimated from
measurements taken prior to start-up of the HWS.  In addition, human intervention is likely
to change the covariance matrix of the observations during monitoring.  Beyond these
problems with the covariance matrix, a significantly large T2-value does not necessarily
imply an unusual increase in any of the monitored chemical concentrations.

     Another, and probably better, approach to the multiple tests problem is to use the
Bonferroni simultaneous inference technique (Miller, 1966, Chapter 1) which prescribes
the use of a significance level of α/m for each of the m tests to be performed at a given time
of sampling so as to keep the overall probability of making at least one false-positive (Type
I) error at or below α. Naturally, the reduction of the significance levels of the individual
tests from α to α/m also reduces the power of the tests.
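As a minimal sketch of the adjustment (in Python for illustration; the function name is ours, not this report's):

```python
from statistics import NormalDist

def bonferroni_critical_z(alpha, m):
    """Upper critical z-value when each of m one-sided tests is run at
    level alpha/m, keeping the overall false-positive probability <= alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / m)

# A single one-sided test at alpha = 0.05 uses z of about 1.645; with m = 20
# tests (e.g., 5 parameters at 4 wells) each test must use z of about 2.81.
z_single = bonferroni_critical_z(0.05, 1)
z_twenty = bonferroni_critical_z(0.05, 20)
```

The rise in the critical value from roughly 1.645 to roughly 2.81 is the price paid in power for controlling the overall false-positive rate.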

     Cyclic effects are difficult to accurately estimate until data are obtained for several
cycles. The number of cycles needed to reach a desired level of accuracy will depend on
the size of the effects relative to the size of the random variability in the system and
measurement process. Even if cyclic effects are present, it will usually be best to treat the
cyclic variation as random variation in the early phases of the monitoring process to avoid
poor estimation of the cyclic effects and under-estimation of the amount of random variation.
The slow acquisition of ground-water data and the long expected periods of the cycles
imply that one may be several years into the monitoring program before reasonably good
estimates of cyclic effects can be produced.

     The "below instrument detection limits" measurements of a water quality parameter
are impossible to employ in the types of statistical analyses and decision rules mentioned
above. While in many cases, these qualitative statements can be replaced by instrument
values  obtained in the chemical analysis, these numbers are likely to have a considerably
different error distribution than will observations that are above the instrument detection
limits.  The decision as to whether an above-detection-limit measurement represents an

increase in the mean concentration of the chemical over previous "not detected" readings is
often a question which must involve the analytical chemist due to his knowledge of the
quality of the current measurement.  Statistical methods proposed by Aitchison (1955), and
considered in terms of air monitoring by Owen and DeRouen (1980), for positive
continuous random variables with positive probability mass at zero may be useful here.
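As a sketch of the delta-distribution idea (illustrative Python; the simple estimator shown, the detected fraction times the mean of the detected values, is one textbook form of the Aitchison approach and is not prescribed by this report):

```python
def delta_mean(values, detection_limit=0.0):
    """Mean estimate for data with a probability mass at 'not detected':
    (fraction detected) * (mean of detected values), treating non-detects
    as zeros in the spirit of Aitchison (1955)."""
    detected = [v for v in values if v > detection_limit]
    if not detected:
        return 0.0
    return (len(detected) / len(values)) * (sum(detected) / len(detected))
```

For example, with two non-detects and detected values of 2.0 and 4.0, the estimate is (2/4)(3.0) = 1.5.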

     Inability to obtain true replicate samples makes it impossible to estimate sampling
variance and difficult to obtain desired levels of precision for quality estimates.
                                  SECTION 7

                        QUALITY CONTROL SCHEMES
     If the slow flow of water and the narrowness of plumes in the aquifer make the tests
of means and interactions inappropriate, and outlier tests are lacking in power, what other
approach to decision procedures might one apply? First, one should note that in a RCRA
ground water monitoring situation a decision is made at the end of each sampling period as
to whether additional regulatory action is needed. Hence,  during the operating life of a
hazardous waste facility, one is dealing with a sequence of decisions  rather than just one
decision. The one-decision case is a test of hypothesis situation in which it makes sense to
consider the significance level and power of the test. However, in  the situation where
decisions are made sequentially over time, one is dealing with a quality control scheme and
should be interested in distributions of run lengths. An in-control run length is the number
of sampling periods from start-up until a decision is made, on the basis of water sample
measurements, that additional regulatory  action is required when,  in fact, there is no
leakage from the HWS. An out-of-control run length is the number of sampling periods
from the time that a pollutant plume originating from the HWS intercepts a well site until a
decision is made that additional regulatory action is required. Naturally, one wants to use a
quality control scheme that has, on average, long in-control run  lengths and short out-of-
control run lengths. From this perspective, we see that all decision procedures  that have
been suggested for monitoring ground water at RCRA sites are in truth quality control
schemes. It seems reasonable to think that if one is choosing a quality control scheme, one
should consider quality control schemes that have proved successful in the past. That is,
one should consider industrial quality control schemes. In fact, Vaughan and Russell
(1983) have suggested the use of industrial quality control schemes for monitoring effluent
from waste treatment plants, which is a somewhat similar problem to  that encountered in
RCRA ground-water monitoring.

     A prime consideration in using industrial quality control methods to monitor ground-
water quality at a HWS is that they remove effects of location and well construction from
the decision process. Instead of comparing the measurement of a GQP at a particular well
with measurements at other wells, one compares the current measurement of the GQP with

the past history of the GQP measurements in water from this well (or compares the current
average GQP measurement over wells with the history of such averages from the same set
of wells). Some other advantages, and also some disadvantages related to the earlier list of
problems with ground-water measurements, will become evident as the nature of industrial
quality control schemes is presented.

     Discussion in this section is restricted to one-sided quality control schemes where the
common concern in monitoring is to detect an increase in the GQP. The extension to two-
sided schemes is straightforward if they are needed for indicators such as pH.

SHEWHART CHARTS

     The Shewhart (1931) quality control chart is one of the oldest and simplest of the
industrial quality control procedures.  The chart is simply a graph of time of sampling, or
sample number if samples are equally spaced in time, versus the sample mean value for the
quality parameter being monitored.  Time, or sample number, is the abscissa and sample
mean value the ordinate of a point on the graph. Typically the horizontal axis is positioned
so as to intersect the vertical axis at the steady-state mean value μ for the quality parameter.
A horizontal line is also drawn to intersect the vertical axis at μ + Zσ, where Z is the upper α
quantile of the standard normal distribution and σ is the long-run standard deviation of the
sample means. This line is called the upper control limit, and when a point falls above the
line the process is declared out of control. The average in-control run length (i.e., the
average number of samples between declarations that the system is out of control, when in
fact it is in control) is 1/α if the sample means have a normal sampling distribution. The
commonly used value of Z is 3, for which the corresponding value of α is 0.0013. In
industrial quality control the sample sizes are usually somewhere between 5 and 10
depending on cost and internal variability between members of a sample.
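The control limit and its in-control average run length can be sketched as follows (illustrative Python; function names are ours):

```python
from statistics import NormalDist

def upper_control_limit(mu, sigma, z=3.0):
    """Upper control limit mu + z*sigma for a one-sided Shewhart chart."""
    return mu + z * sigma

def in_control_arl(z=3.0):
    """Average in-control run length 1/alpha, where alpha = P(Z > z)
    for normally distributed sample means."""
    alpha = 1.0 - NormalDist().cdf(z)
    return 1.0 / alpha
```

With Z = 3, alpha is about 0.0013, so a false alarm occurs on average only once in roughly 740 samples.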

     Lorenzen and Vance (1986) give a procedure for determining on an economic basis
the sample  size n, Z, and the time between samples.  It would appear that their approach
could be generalized to other control schemes and to ground-water monitoring situations.

     A second control chart is often kept for the variability of the product. It is similar to
the Shewhart chart for the sample means, only now the ordinate is the sample range, or
standard deviation, and the horizontal lines representing upper and lower control limits are
located on  the basis of the distribution of the statistic (sample range or sample standard
deviation)  under the assumptions of a normal distribution for the quality parameter
measurements. In practice the lower limit is seldom used (Guttman et al., 1982, p. 275).
This chart is not nearly as robust with respect to the assumption of normality as is the chart

for the sample means, and so out-of-control situations for variability must be viewed with
more skepticism than similar results on the means chart. If the variability of measurements
of the quality parameter has changed, the height of the upper control line on the Shewhart
chart for means is adjusted accordingly, or action is required to bring the variability back to
its previous level.

THE CUSUM QUALITY CONTROL SCHEME

     The CUSUM (for cumulative summation) control scheme derives from a paper by
Page (1954) and is somewhat more complicated than the Shewhart chart. (A review article
by Lucas (1985a) gives the current state of development of this procedure.) The CUSUM
control scheme makes use of information in the present sample and in the previous samples
in reaching decisions as to whether the process is in-control, whereas the Shewhart chart
makes decisions on only the current observation (i.e., the Shewhart chart is a graphical
representation of a sequence of individual tests of the mean; whereas, the CUSUM scheme
is a sequential probability ratio test of the mean). The one-sided CUSUM scheme involves
the computation of a cumulative sum S which for the ith sample is given by the formula

     Si = Max{0, zi - k + Si-1},

where zi is the standardized ith sample mean (i.e., zi = [Xi - μ]/σ), and k is a parameter
of the control scheme. When Si exceeds a specified value h, the process is declared out of
control (i.e., in pollution monitoring, a decision is made to begin additional monitoring
activity). The values of h and k are chosen to obtain desired average run lengths (ARL)
under in-control and specified out-of-control situations.  For a scheme designed to be
sensitive to changes in mean quality of size Dσ, k is usually chosen to be D/2, and h is
selected to give the largest in-control ARL consistent with an adequately small out-of-
control ARL. Typically S0 is taken to be 0; however, Lucas and Crosier (1982a) have
shown that by employing a slightly higher value of h and starting with S0 = h/2 one can
decrease the out-of-control ARL while maintaining the in-control ARL. They also give tables of
ARL for in-control and out-of-control situations and various values of h and k. However,
if one is reasonably confident that there is no contamination coming from the HWS at the
start, then one can start with S0 = 0 and somewhat decrease the chance of an early false
positive.

   EXAMPLE 2. To illustrate the Shewhart and CUSUM schemes, random normal
   (μ=10, σ²=4) deviates were drawn from a table of such numbers. To obtain an
   out-of-control situation, 2 was added to each of the numbers drawn after the fourth
   (i=4). These numbers are to represent the sample means of a process that went out

of control between the fourth and fifth samplings.  The in-control situation has
sample means distributed N(10,4), so zi = (Xi - μ)/σ = (Xi - 10)/2. The CUSUM
scheme (Table 2) indicates that a decision to take action (process is out of control)
should be made at i=10. The CUSUM chart (Figure 1) gives a visual impression of
when the process went out of control. The corresponding Shewhart chart, with an
upper control limit of (μ + 3σ) = 16, is given in Figure 2.
        TABLE 2. CUSUM QUALITY CONTROL SCHEME EXAMPLE
                                (k=0.5, h=5)

               In-Control               Out-of-Control at i=5
      i       X        z       S        X        z       S
      0                      0.000                     0.000
      1    14.504   2.252    1.752   14.504   2.252    1.752
      2    11.108   0.554    1.806   11.108   0.554    1.806
      3     7.594  -1.203    0.103    7.594  -1.203    0.103
      4     7.580  -1.210    0.000    7.580  -1.210    0.000
      5    11.588   0.794    0.294   13.588   1.794    1.294
      6    12.002   1.001    0.795   14.002   2.001    2.595
      7    10.434   0.217    0.512   12.434   1.217    3.312
      8     9.378  -0.311    0.000   11.378   0.689    3.501
      9    10.708   0.354    0.000   12.708   1.354    4.355
     10    11.278   0.639    0.139   13.278   1.639    5.494 X
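The CUSUM recursion underlying Table 2 can be checked with a short program (illustrative Python; recomputed S values for the later out-of-control rows may differ slightly from the printed table because of rounding in the printed intermediate values):

```python
def cusum_path(zs, k=0.5, h=5.0, s0=0.0):
    """One-sided CUSUM S_i = Max{0, z_i - k + S_(i-1)}; returns the list of
    S values and the 1-based index of the first signal (S > h), if any."""
    s, path, signal = s0, [], None
    for i, z in enumerate(zs, start=1):
        s = max(0.0, z - k + s)
        path.append(s)
        if signal is None and s > h:
            signal = i
    return path, signal

# Standardized sample means z_i = (X_i - 10)/2 from the in-control column of
# Table 2; the out-of-control series adds 1 (= 2/sigma) from i = 5 onward.
z_in = [2.252, 0.554, -1.203, -1.210, 0.794, 1.001, 0.217, -0.311, 0.354, 0.639]
z_out = z_in[:4] + [z + 1.0 for z in z_in[4:]]
```

The in-control series never signals, while the shifted series first exceeds h = 5 at i = 10, matching the decision point in the example.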
           [Figure 1. The CUSUM control chart (k=0.5, h=5).]

           [Figure 2. Shewhart quality control chart (μ=10, σ=2).]
     This example illustrates the weakness of the standard Shewhart chart in detecting
small changes in the value of the mean. One way to reduce this weakness in the Shewhart
chart is to declare the process out of control whenever there are r successive sample means
with values above μ.  The value of r is usually chosen to be 7 or 8.  However, this
procedure also reduces the in-control ARL.

THE COMBINED SHEWHART CUSUM SCHEME

     The Shewhart scheme is better than the CUSUM scheme in quickly detecting large
(>3σ) shifts in the mean μ; whereas, the CUSUM scheme is usually faster in detecting a
small change in μ that persists. Bissell (1984) has also shown that the CUSUM scheme is
to be preferred when the mean is increasing in a linear time trend. To take advantage of the
good properties of both tests, Lucas (1982) suggested combining the two procedures. This
is accomplished by declaring the process out of control if any sample mean is above a
specified upper Shewhart limit or if the CUSUM Si is above a specified limit h.  To keep a
reasonable in-control ARL, Lucas suggests using an upper Shewhart control level of
(μ + 4σ). Lucas calculated that if this upper Shewhart control limit is used with CUSUM
parameter values k=0.5 and h=5, then the in-control ARL is 459 while the out-of-control
ARL is 10.4 if the true mean shifted upward by σ, and only 1.6 if the mean shifted
upward by 4σ. If the nature of the data is such that there are occasional outliers (perhaps
due to contamination of samples during handling and processing), Lucas and Crosier
(1982b) suggest the use of a two-in-a-row rule.  That is, require values in two successive
samplings above the upper Shewhart control limit before declaring an out-of-control
situation based on the Shewhart control limit.
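The combined decision rule, including the optional two-in-a-row variant, can be sketched as follows (illustrative Python; the function name is ours):

```python
def combined_signal(zs, k=0.5, h=5.0, U=4.0, two_in_a_row=False):
    """Combined Shewhart-CUSUM: signal when the CUSUM sum exceeds h, or
    when a standardized sample mean exceeds the Shewhart limit U (two
    successive such means, if the two-in-a-row outlier rule is in force).
    Returns the 1-based sampling period of the first signal, else None."""
    s = 0.0
    prev_above_u = False
    for i, z in enumerate(zs, start=1):
        s = max(0.0, z - k + s)
        shewhart = (z >= U) and (prev_above_u or not two_in_a_row)
        if s > h or shewhart:
            return i
        prev_above_u = (z >= U)
    return None
```

A single spike of z = 4.5 signals immediately under the plain rule but not under the two-in-a-row rule, while a persistent small shift is eventually caught by the CUSUM component either way.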

     For measurements that are typically near or below the instrument minimum detection
limits, it may be possible in some cases to treat the measurements, or some transformation
of the  measurements,  as Poisson count data  (Ingamells and Switzer, 1973).  Lucas
(1985b) has discussed how the CUSUM quality control approach can be employed with
Poisson count data.

MULTIVARIATE QUALITY CONTROL SCHEMES

     In monitoring ground water at a HWS, the concentrations of several GQP may be
measured at each of several monitored wells. If a quality control chart  is kept for each
parameter, at each well, then one has the problem that while the chance of a false alarm is
kept small for each chart, the overall probability of a false alarm becomes large (i.e., the in-
control ARL for the whole set of charts may be quite small in spite of large  in-control
ARL's for each individual chart). This problem is addressed in survey papers by Jackson

(1985) and Alt (1985). One approach is to use a control scheme that in effect performs a
Hotelling's T2 test on each sample.  This is essentially  a two-sided procedure in that
abnormally low concentrations or unusual combinations of moderate concentrations, as
well as abnormally large concentrations of some pollutants may trigger the alarm. The
statistic calculated at each time of sampling is

     T2 = (X - μ)' Σ⁻¹ (X - μ),

where X is the vector of (transformed) parameter measurements at the monitored wells, μ
is the steady-state mean vector and Σ is the (positive definite) covariance matrix for the
vector of the random variables corresponding to X. Under assumptions of multivariate
normality for the measurement vector and a steady state system, T2 will have a chi-square
distribution with degrees of freedom equal to the dimension of the X vector.  Hence the
control chart is similar to the Shewhart chart in that each observed T2-value is plotted
against the number of its sampling period, and the system is declared out of control if T2
exceeds an upper control limit which is the upper α-percent point of the appropriate chi-
square distribution. Jackson and Mudholkar (1979) have suggested additional statistics
that might be monitored to determine whether outliers or changes in Σ have occurred.
Montgomery and Klatt (1972) have determined optimum sample size and interval between
sampling times for the T2 control procedure.
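The T2 statistic is a simple quadratic form; a sketch for two parameters follows (illustrative Python; the identity inverse-covariance matrix and the example values are assumptions for illustration):

```python
def hotelling_t2(x, mu, sigma_inv):
    """Quadratic form T2 = (x - mu)' Sigma^-1 (x - mu) for one sampling
    period; sigma_inv is the inverse covariance matrix as a list of rows."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    n = len(d)
    return sum(d[i] * sigma_inv[i][j] * d[j] for i in range(n) for j in range(n))

# Two uncorrelated, unit-variance GQPs, so Sigma^-1 is the identity; under
# control T2 is chi-square with 2 degrees of freedom (upper 5% point ~5.99).
identity = [[1.0, 0.0], [0.0, 1.0]]
t2 = hotelling_t2([12.0, 51.0], [10.0, 50.0], identity)  # 2^2 + 1^2 = 5.0
```

Note that the same T2 value can arise from one large deviation or several moderate ones, which is the interpretability problem noted above.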

     A procedure, called MCUSUM for multivariate CUSUM, proposed by Woodall and
Ncube (1985) advocates running individual CUSUM charts on the different GQP or on
principal components of the GQP values and choosing h and k so that the combined ARL
under in-control conditions will be acceptable.  Unfortunately, when running CUSUMs on
measurements of individual GQP  where measurements on different parameters  (or
measurements on the same parameter from different wells) are correlated, the calculation of
appropriate values for h and k is extremely difficult. If principal components are used,
computations of appropriate h and k values are easier to perform, but the control charts for
principal components will have to be two-sided in most cases. When an action level is
exceeded, it is difficult to interpret the cause, which may not be  an excessively high
concentration of a monitored parameter but simply an unusual value for a linear
combination of GQP values.  (This is the same problem encountered with the T2
procedure.)
                                 SECTION 8

      MONTE CARLO INVESTIGATION OF RUN-LENGTH DISTRIBUTIONS
     In industrial quality control, the period between samplings is typically short (minutes
or hours), and sampling and measurement are inexpensive relative to the value of
the product.  For this reason, the learning period required to obtain excellent estimates of
the process mean quality μ, the process quality standard deviation σ, and the appropriate
data transformation is short.  After these three items are obtained, it is possible to
mathematically determine the distribution of the run lengths of the control process for in-
control and for out-of-control (OOC) situations (Brook and Evans, 1972). In the RCRA
hazardous waste site quality control situation, sampling and measurement is expensive, and
the time between samplings is long (typically, three or six months). For this  reason,
quality control decision procedures must be in place before good estimates are available and
adjustments must be made in the values used for μ and σ while the quality control process
is in operation. This, in turn, implies that one cannot use the usual mathematical procedure
to determine the properties of average run lengths. The alternative that is employed in this
study is to estimate the distributions of run lengths by using Monte Carlo procedures.

     With the Monte Carlo procedure pseudo-random numbers are generated by  a
computer to represent measurements of GQP at wells around a HWS. A specified quality
control procedure is followed and one observes how many simulated sampling periods are
passed through before an out-of-control signal is produced.  This process is repeated 100
times, and information as to the distribution of the run lengths is observed.   For
determining  the distribution of run lengths in an OOC state, the mean of the random
deviates being generated is increased at some randomly chosen point in the process, and the
number of sampling periods required to obtain an out-of-control signal after the change is
recorded. This process is repeated 100 times to give an empirical distribution of OOC run
lengths.
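A stripped-down version of one such trial can be sketched as follows (illustrative Python in place of the FORTRAN/IMSL routines used in this study; unlike the procedure described below, this sketch assumes μ and σ are known and monitors a single chart):

```python
import random
import statistics

def run_length(k=0.75, h=5.0, U=4.0, shift=0.0, max_periods=1000, rng=None):
    """One Monte Carlo trial: number of sampling periods until a combined
    Shewhart-CUSUM signal on standardized N(shift, 1) deviates, Winsorized
    at max_periods (true mu and sigma assumed known in this sketch)."""
    rng = rng or random.Random()
    s = 0.0
    for i in range(1, max_periods + 1):
        z = rng.gauss(shift, 1.0)
        s = max(0.0, z - k + s)
        if s > h or z >= U:
            return i
    return max_periods

rng = random.Random(1988)
in_control = [run_length(rng=rng) for _ in range(100)]
out_of_control = [run_length(shift=2.0, rng=rng) for _ in range(100)]
```

Repeating the trial 100 times and summarizing the run lengths, as here, is the empirical-distribution step described above; in-control run lengths are long and out-of-control run lengths short, as desired.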

     The first step in the development of a combined Shewhart-CUSUM quality control
scheme is to choose a learning procedure and to determine appropriate parameters U, h,
and k, for the control scheme. For the Shewhart-CUSUM scheme, one declares an out-of-
control situation at sampling period i if, for the first time, Si = Max{0, zi - k + Si-1} ≥ h, or

zi ≥ U, where zi is the standardized ith observation (i.e., zi = (xi - μ)/σ).  In the learning
procedure, it is  necessary to decide on the number of sampling periods to use to obtain
initial estimates for the target quality value and for the standard deviation of the quality
measurements.  Next, a decision  is needed as to how often and when to update the
estimates. Finally the parameters U, h, and k, have to be determined so that the quality
control scheme will have good operating characteristics under ideal conditions (i.e., steady
state, independence of all measurements, normal distribution or transformation to a normal
distribution of GQP measurements from a well under control).  Once a procedure that
works well under ideal conditions is found, the next step is to see how it will work when
some of the ideal conditions are absent.

     As a starting point, it was decided to use U=4, h=5, and k=0.5 in the Shewhart-
CUSUM scheme.  Pseudo-random N(0,1) deviates were generated using the subroutine
GGNML in the International Mathematical and Statistical Library (IMSL) of computer
subroutines for FORTRAN programs. The scheme started on the ninth sampling period
and employed the sample means and standard deviations obtained over the first  eight
simulated sampling periods as surrogates for the true unknown  means and standard
deviations. If no OOC signal had been obtained, the estimates of the process quality mean
and standard deviation were recalculated after every fourth sampling period up to the 32nd
by using all available data. Sample means and standard deviations were calculated for each
well and for the  sampling period averages over all wells.  That is, for a given contaminant,
control charts were kept for each well, and an additional chart was kept for the sampling
period GQP average over wells. If upper control limits were exceeded on any one of these
(w+1) charts, an OOC signal was given. To save computer  time the simulation process
stopped automatically if the simulation went through 1000 sampling periods (after the eight
period learning  stage) without an OOC signal.  The occurrence of an OOC signal or the
completion of sampling period 1000 marked the completion of a trial.  In-control  GQP
mean values were determined for the w wells by choosing w random deviates from a
N(0,1) distribution, multiplying each by 0.5, and adding 6 to each such product to obtain
w N(6,0.25) deviates. For each well in each sampling period, a pseudo-random N(0,1)
deviate was added to the appropriate well GQP mean value to  obtain a GQP measurement.
With w=4 wells, 100 trials gave a median run length to a false positive of 76.5 and a
sample mean run length of 160.8.  None of the 100 trials ran through the possible  1000
sampling periods.  Since these average run lengths were considered too small, h was
changed from 5 to 5.5, and the simulation tried again. The median in-control run length
increased to 144.5, and the (Winsorized) sample mean was 242, with 5 trials running the
full 1000 possible sampling periods. (Sample mean is "Winsorized" in the sense that trials
that would have reported more than 1000 sampling periods if the simulation process had
been allowed to continue were reported as 1000 in calculating the mean.) While this
modification was an improvement, results were not as good as hoped for, and it also had a
bad effect on average run length in OOC situations.  For this reason, h was returned to the
original value 5, and k was increased from 0.5 to 0.75.  This had a more dramatic effect.
The median in-control run length increased to 619.5 and the (Winsorized) sample mean
was 577. There were no false-positive signals over the 1000 sampling periods on 35 of the
100 runs.

     While the last mentioned procedure has some good in-control characteristics, it still
has some problems, such as many short in-control run lengths and many long OOC run
lengths. Additional modifications were tried until a procedure was arrived at that seemed a
good compromise.  Under this compromise, U=4.5, h=5, and k=1 for the first 12
sampling periods after the eight period learning stage, and then U is reduced to 4.0, k is
reduced to 0.75, and  these two parameters are held at these values for all subsequent
periods.  Adjustments in sample means and sample standard deviations are made after
sampling periods 4, 8, 12, 20, and 32, following the  learning stage.  (Figure 3 gives a
schematic of this Monte Carlo simulation procedure.)  Now, for w=4 wells, in two
separate simulation runs, the Monte Carlo procedure gave median in-control run lengths of
502 and 637.5, and (Winsorized) sample mean run  lengths of 550.5 and 589.6. No OOC
signals were given on 33 trials of the 100 trials on one simulation run and on 41 trials on
the other run.  While these averages are no better than those obtained earlier, there
were far fewer very short in-control run lengths, and better results were obtained in OOC
situations. As the number of wells at a site is increased, the number of control charts that
can give a signal increases, and thereby, the average of in-control run lengths decreases.
For eight wells, the median in-control run length over 100 trials was 228 on one simulation
run and 249.5 on the  other,  and the (Winsorized) sample means  were 342.5 and 385.5.
Information on in-control run lengths is given in Table 3.

     Under the ideal conditions and the procedure specified in the preceding paragraph,
OOC situations were investigated and the OOC run  length (number of simulated sampling
periods until OOC signal after increase in one or more well GQP means) empirical
distributions were obtained.  These OOC situations involved m wells (out of the w wells
being monitored) whose mean contaminant concentrations were increased by r standard
deviations at a randomly selected period (from 1 to 48, with equal probabilities) after the
learning stage. The results of these OOC simulations are presented in Table 4. The table
shows, as one would expect, that the distribution of run lengths is primarily dependent on
number of wells, m, intercepted by the contaminant plume, and by the amount of increase,
r standard deviations, in GQP mean values. The number of wells, w,  at the site has little if
any influence on the distribution of OOC run lengths.
  [Figure 3. Schematic of the Monte Carlo simulation under ideal conditions.
  Abbreviations: m = number of wells intercepted by plume; w = number of
  wells at the HWS; Xinc = increase in the mean value of the GQP when the
  plume intercepts a well; L = number of samplings in the learning period;
  Dseed = seed number for the random number generator.]

  TABLE 3.  IN-CONTROL RUN-LENGTH STATISTICS FOR w WELLS UNDER
                    IDEAL CONDITIONS (N = 100 TRIALS)

      w      Median      Mean*
      4       502        550.5
      4       637        580.6
      8       228        342.5
      8       249.5      385.5

   TABLE 4.  OUT-OF-CONTROL RUN-LENGTH STATISTICS UNDER IDEAL
                    CONDITIONS (N = 100 TRIALS)

     mb   rc   Median   Onesd   Twos   Threes   Fours   #FPe   Comments
      1    1     16        1      3       2       2      5     three 1000's
      1    2      4        7      9      12      27      7     maximum = 112
      1    3      3       16     25      31      19      3     maximum = 8
      2    2      3        6     19      27      21      4     maximum = 40
      4    1      4        5     13      23      13      4     maximum = 163
      4    2      1       54     33      10       2      1     maximum = 4
      1    2      4        2     10      22      18      6     maximum = 557
      2    2      3        3     23      37      11      5     maximum = 14

a w = number of wells.
b m = number of wells intercepted by plume.
c The mean value of the GQP measurements is increased by rσ at wells intercepted by
   plume.

     The next set of simulations employs normal distributions as before, but now each
sample measurement from a well is serially correlated, with correlation ρ1, with the
preceding sample measurement from that well.  All other serial correlations are zero. Two
values for ρ1 are used: 0.11 and 0.36. Table 5 shows the in-control run-length statistics,
and Table 6 gives the OOC run length results. Comparison of these two tables with the
preceding two tables indicates that even a moderate positive serial correlation seriously
shortens the in-control run lengths, but has little effect on the OOC run lengths. Hence,
positive serial correlation in measurements between sampling periods will lead to more
false positive results than expected under ideal conditions; or, putting it another way, this
quality control scheme is not robust with respect to the assumption of independence
between measurements from different sampling periods.
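One way to generate deviates with this correlation structure (the report does not state its generator; an MA(1) construction is assumed here for illustration) can be sketched as:

```python
import math
import random

def ma1_deviates(n, rho1, rng):
    """Unit-variance deviates whose lag-1 correlation is rho1 and whose
    correlations at all longer lags are zero, via an MA(1) construction
    (requires 0 < |rho1| <= 0.5):
        X_i = (e_i + theta * e_(i-1)) / sqrt(1 + theta^2),
    where theta solves theta / (1 + theta^2) = rho1."""
    theta = (1.0 - math.sqrt(1.0 - 4.0 * rho1 * rho1)) / (2.0 * rho1)
    scale = math.sqrt(1.0 + theta * theta)
    e_prev = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        out.append((e + theta * e_prev) / scale)
        e_prev = e
    return out
```

An AR(1) process would also give lag-1 correlation ρ1 but nonzero correlations at longer lags, so the MA(1) form matches the stated model more closely.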

     Another question of interest is robustness of the procedure to incorrect distribution
assumptions. For measurements of concentrations, it is quite common to perform a log-
transformation to stabilize variances and to make the distribution more symmetric and more
like a normal (Gaussian) distribution. If the original distribution of the concentrations is
log-normal, the log-transformation is appropriate. However, if the original distribution is
not log-normal, the log-transformation leads to something other than the desired normal
distribution. In order to check the robustness of the procedure to this sort of problem, data
were used that were log-transformations of pseudo-random deviates generated from
Gamma(α,β) distributions. Pseudo-random Gamma(α,β) deviates were generated using
the subroutine GGAMR in the International Mathematical and Statistical Library (IMSL) of
computer subroutines for FORTRAN programs. Two values, 1.5 and 3.0, of the shape
parameter α were employed together with a scale parameter value of β = 6.  The
distributions of log-transformations of such gamma deviates were investigated by
generating 1000 random Gamma(1.5,6) and 1000 Gamma(3,6) deviates, taking the log-
transforms of all deviates, and then calculating sample distribution statistics for the two sets
of data. The results are given in Table 7. These results show that the log-Gamma(1.5,6)
deviates have a sample distribution with a larger negative skewness and a larger kurtosis
than that of the log-Gamma(3.0,6) deviates. The sample distribution of the
log-Gamma(3.0,6) deviates is more like that expected for a sample from a normal
distribution than that obtained from the other set of deviates. Owing to the skewness of the
two sample distributions, to obtain shifts in the mean comparable to those employed in the
previous simulations with normal distributions, we employed the interquartile range of each
set of deviates divided by 1.35 as estimates of the standard deviations.  The run-length
results for log-Gamma data are given in Tables 8 and 9. While very good in-control run-
length characteristics were obtained for both values of α, the OOC results were less
favorable than those obtained with a normal distribution, and somewhat worse for α = 1.5

       TABLE 5. IN-CONTROL RUN-LENGTH STATISTICS WHEN
             MEASUREMENTS ARE SERIALLY CORRELATEDᵃ
               (N = 100 TRIALS, NORMAL DISTRIBUTION)

     w       ρ₁      Median     Meanᵇ     # ≤ 10    # of 1000's

     4      0.11      274.5     380.4        3          16
     4      0.36       82.0     154.2        6           3

   ᵃ Serial correlations ρⱼ = 0 for j > 1.
   ᵇ Simulations stopped after 1000 samples and mean is Winsorized mean.
TABLE 6. RUN-LENGTH STATISTICS FOR SERIALLY-CORRELATEDᵃ DATA IN
            OUT-OF-CONTROL SITUATIONS (N = 100 TRIALS)

 wᵇ  m  r    ρ₁   Median  Ones  Twos  Threes  Fours  #FP   Comments

 4   2  2   0.11     3      9     16     25     26     2   Ninety-six ≤ 10
 4   2  2   0.36     3      6     19     21     19    10   Eighty-nine ≤ 10

ᵃ Serial correlations ρⱼ = 0 for j > 1.
ᵇ See Table 4 for definitions of symbols in this row of headings.
 TABLE 7. SAMPLE STATISTICS FOR LOG-GAMMA(α,6) DEVIATES (N = 1000)

     α     Mean     Std. Dev.     Skewness      Kurtosisᵃ       σ̂ᵇ

    1.5    1.84       0.98         -0.93          1.50         0.879
    3.0    2.71       0.63         -0.47          0.21         0.633

 ᵃ Kurtosis defined so that kurtosis for a normal distribution is zero.
 ᵇ This is an estimate of dispersion obtained by dividing the interquartile range by 1.35.
   TABLE 8. IN-CONTROL RUN-LENGTH STATISTICS WHEN DATA ARE
          LOG-GAMMA(α,6) RANDOM DEVIATES (N = 100 TRIALS)

      α        w       Median      Meanᵃ       # ≤ 10     # of 1000's

     1.5       4        1000       972.0          1           97
     3.0       4        1000       976.8          1           96

    ᵃ Simulations stopped after 1000 samples and mean is Winsorized mean.
  TABLE 9. RUN-LENGTH STATISTICS FOR OUT-OF-CONTROL SITUATIONS
         WHEN DATA ARE LOG-GAMMA(α,6) RANDOM DEVIATES
                            (N = 100 TRIALS)

  α   wᵃ  m  rᵇ  Median  Ones  Twos  Threes  Fours  #FP    Comments

 1.5  4   2  2      5      0     3     16      28    1     Ninety-five ≤ 10
 3.0  4   2  2      4      4     6     27      30    2     Ninety-seven ≤ 10

 ᵃ See Table 4 for definitions of symbols in this row of headings.
 ᵇ Standard deviation is σ̂, the interquartile range of the log-gamma deviates
   divided by 1.35 (see Table 7).
than for α = 3.0.  The good in-control run-length results were probably caused by the
negative skewness of the distributions.  The poorer OOC run-length performance for
α = 1.5 was no doubt caused by the extent of the deviation of the log-Gamma(1.5,6)
distribution from normality. This indicates that while there is a considerable amount of
distribution-assumption robustness in this method, some care should be employed in
choosing a suitable data transformation so as to keep the true distribution of the
transformed data from deviating too wildly from the normal.
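
The Table 7 computation is easy to repeat. A minimal Python sketch, with NumPy
standing in for IMSL's GGAMR (an assumption, so the deviates, and hence the exact
statistics, will differ somewhat from the report's):

```python
import numpy as np

def shape_stats(y):
    """Sample mean, std. dev., skewness, and excess kurtosis (normal = 0)."""
    d = y - y.mean()
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    return y.mean(), y.std(ddof=1), m3 / m2**1.5, m4 / m2**2 - 3.0

rng = np.random.default_rng(0)
for alpha in (1.5, 3.0):
    # 1000 log-transformed Gamma(alpha, 6) deviates (scale parametrization).
    y = np.log(rng.gamma(shape=alpha, scale=6.0, size=1000))
    mean, sd, skew, kurt = shape_stats(y)
    q75, q25 = np.percentile(y, [75, 25])
    print(f"alpha={alpha}: mean={mean:.2f} sd={sd:.2f} skew={skew:.2f} "
          f"kurt={kurt:.2f} sigma_hat={(q75 - q25) / 1.35:.3f}")
```

With α = 1.5 the skewness comes out clearly negative and the excess kurtosis positive,
consistent with the pattern in Table 7.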

     To show what happens when a right-skewed distribution is not transformed, run-
length distribution results are presented in Table 10 for Monte Carlo runs on Gamma(3,6)
data that are not log-transformed.  The average in-control run-length distribution is much
worse (shorter), and the average OOC run-length distribution is somewhat worse (longer)
than for the corresponding log-transformed data cases.

     The next set of Monte Carlo simulations is for the case where several different GQP
are measured on each water sample.  It seems reasonable to assume that these different
GQP measurements will be positively correlated. To see the effect of having several GQP
to measure and the effect of correlation, two situations are considered: four GQP with zero
correlation between the four measurements, and four GQP with correlations of 0.7
between GQP measurements.  In a sampling period, water samples are taken from each of
four wells, and measurements of the four GQP are obtained for each sample by use of the
multivariate normal random number generating program GGNSM in the International
Mathematical and Statistical Library (IMSL) of computer subroutines for FORTRAN
programs.  The Shewhart-CUSUM quality control scheme is applied to each GQP
measurement at each well, which means that there are a total of 16 control charts being
constructed for the four individual wells and 4 control charts for the four GQP averages
over wells. If any one of the 20 charts shows an OOC situation, the process is declared to
be out-of-control.  Two repetitions of each 100-trial simulation were performed in order to
obtain some idea of the variability of results that might be obtained.  The results of Monte
Carlo runs on in-control situations are given in Table 11.  Since there are 20 control charts
for the four wells (rather than the 5 charts used in the single variable case [Table 3]),
average run lengths to false OOC signals should be expected to be smaller, and they are.
However, Table 11 does show that the median in-control run length is longer when there is
a positive correlation between GQP measurements.  For OOC situations, the mean value of
the measurements on one of the four GQP was increased by two standard deviations at
two of the four wells starting with a randomly chosen sampling period numbered between
1 and 48 inclusive.  Table 12 gives the results of these simulations. A comparison with the
results in Table 4 shows that average OOC run lengths are little affected by increasing the
number of variables being monitored and that correlation between

        TABLE 10. RUN-LENGTH STATISTICS WHEN DATA ARE
           GAMMA(3,6) RANDOM DEVIATES (N = 100 TRIALS)

  wᵃ  m  rᵇ  Median  Ones  Twos  Threes  Fours  #FP   Comments

  4   0  0     58      1     3      3      2     99   Only one 1000
  4   2  2      4      3     8     15     15     28   Sixty-five ≤ 10

 ᵃ See Table 4 for definitions of symbols in this row of headings.
 ᵇ Standard deviation is σ̂, the interquartile range of 1000 Gamma(3,6)
   deviates, generated by the software package used in this simulation,
   divided by 1.35.
 TABLE 11.  IN-CONTROL RUN-LENGTH STATISTICS FOR MEASUREMENTS
                ON FOUR GQP AT EACH OF FOUR WELLS
                  (N = 100 TRIALS, IDEAL CONDITIONS)

   Correlationᵃ     Median      Meanᵇ       # ≤ 20        # of 1000's

       0              99        197.6         14               3
       0              87        142.9         10               2
      0.7            141        253.2         14              10
      0.7            129        235.8         10               9

 ᵃ The covariance matrix for the simulated multivariate normal distribution was the
    identity matrix for the zero correlation case, and it was a matrix with 1's on the
    diagonal and 0.7's off the diagonal for the 0.7 correlation case.
 ᵇ Winsorized mean in that 1000 was used as run length for trials in which an OOC
    signal was not obtained.
TABLE 12. RUN-LENGTH STATISTICS FOR OUT-OF-CONTROL SITUATIONS IN
      WHICH ONE OF THE FOUR GQP MEANS HAS INCREASED BY TWO
          STANDARD DEVIATIONS AT TWO OF THE FOUR WELLS
                  (N = 100 TRIALS, IDEAL CONDITIONS)

Correlationᵃ Median  Ones  Twos  Threes  Fours  #FP    Comments

     0          3      4    13     23      19    23    Seventy-six ≤ 11, max. = 509
     0          3     10    18     25      12    11    Eighty-eight ≤ 9, max. = 31
    0.7         4      3    10     27      23    13    Eighty-seven ≤ 13, max. = 13
    0.7         3      6    17     27      15    17    Eighty-two ≤ 15, max. = 66

ᵃ The covariance matrix for the simulated multivariate normal distribution was the identity
matrix for the zero correlation case, and it was a matrix with 1's on the diagonal and 0.7's
off the diagonal for the 0.7 correlation case. See Table 4 for definitions of other headings.
GQP measurements also has little effect on OOC run length distribution.  However, as
would be expected by the increased number of control schemes being run, the numbers of
false positives (i.e., OOC signals prior to the system going out of control) are larger here
than in the earlier case where only one GQP was being monitored.
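
The correlated four-GQP measurements can be sketched in Python, with NumPy's
multivariate normal generator standing in for IMSL's GGNSM (an assumption; the report
used FORTRAN):

```python
import numpy as np

def gqp_samples(n_wells=4, n_gqp=4, rho=0.7, rng=None):
    """Draw one sampling period's measurements: one multivariate normal
    vector of n_gqp GQP values per well, with pairwise correlation rho
    (1's on the diagonal of the covariance matrix, rho off the diagonal)."""
    rng = rng or np.random.default_rng()
    cov = np.full((n_gqp, n_gqp), rho)
    np.fill_diagonal(cov, 1.0)
    return rng.multivariate_normal(np.zeros(n_gqp), cov, size=n_wells)
```

Each of the 16 well-by-GQP series, plus the 4 across-well averages, then gets its own
control chart; an OOC signal on any of the 20 charts flags the site.
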
     One other point that needs investigation is the length of the learning period. In the
above Monte Carlo investigations, the learning period has been set at eight sampling
periods. Under current (soon to be outdated) RCRA regulations (see Section 2), a learning
period of four sampling periods over a one-year period is specified to determine
characteristics of the sampling distribution of the measurements at up-gradient
(background) wells.  If in a quality control scheme the learning period is set to four
sampling periods, then, in the case of four wells, the number of degrees of freedom for the
pooled estimate of the variance at the beginning of monitoring under the quality control
scheme would be only 12, whereas it is 28 when the learning period is eight sampling
periods. Under ideal conditions, the 95% confidence intervals for σ² are

                      0.51 s² < σ² < 2.72 s²     for 12 d.f.,

and

                      0.62 s² < σ² < 1.83 s²     for 28 d.f.

For a monitoring system with eight wells, the pooled estimator of the variance at the end of
a learning period of length four would have 24 degrees of freedom, whereas it would have
56 degrees of freedom for a learning period of length eight. Monte Carlo simulations under
ideal conditions were run in which learning periods of length four were employed to
compare the in-control run-length distributions in this case with the results in Table 3,
which give the corresponding results when the learning period is eight samplings.  The
results of these simulations and a repetition of the results from Table 3 are presented in
Table 13. Learning periods of length four, for the four-well situation, give more very short
in-control run lengths and, in addition, the median run lengths are much shorter than the
corresponding results for the eight-period learning period.  For the eight-well situation,
there is still some evidence of more very short in-control run lengths for the shorter
learning period case, but there is no evident difference in median in-control run lengths
between the two learning periods. This indicates that if one is using eight or more wells in
a site monitoring system, a learning period of length four might be considered; but for
fewer wells, one definitely should use the longer learning period. The longer learning
period is always to be preferred if it can be afforded.
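
The two interval factors follow from the usual chi-square confidence interval for a
variance, (df·s²/χ²₁₋ₐ/₂, df·s²/χ²ₐ/₂). A quick check in Python, assuming SciPy (which
the report of course did not use); the computed factors match those above to within
rounding:

```python
from scipy.stats import chi2

def var_ci_factors(df, conf=0.95):
    """Multipliers (lo, hi) such that lo*s^2 < sigma^2 < hi*s^2 is the
    conf-level chi-square confidence interval for a variance estimated
    with df degrees of freedom."""
    a = (1.0 - conf) / 2.0
    return df / chi2.ppf(1.0 - a, df), df / chi2.ppf(a, df)

print(var_ci_factors(12))   # roughly (0.51, 2.72)
print(var_ci_factors(28))   # roughly (0.63, 1.83)
```
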
TABLE 13. IN-CONTROL RUN-LENGTH STATISTICS FOR LEARNING PERIODS
    OF LENGTHS FOUR AND EIGHTᵃ (N = 100 TRIALS, IDEAL CONDITIONS)

   Lᵇ      w      Median      Meanᶜ      # ≤ 10     # of 1000's

   8       4       502        550.5         0           33
   8       4       637        580.6         2           41
   4       4       399        511.5        10           34
   4       4       384.5      506.6        14           34
   8       8       228        342.5         2           12
   8       8       249.5      385.5         0           14
   4       8       281        412.1         8           18
   4       8       235.5      355.36       13           14

ᵃ For L = 8, data are repeated from Table 3.
ᵇ L is learning period length in terms of sampling periods.
ᶜ Simulations stopped after 1000 samples and mean is Winsorized mean.
                                  SECTION 9

                           A DECISION PROCEDURE
     A decision procedure for ground-water monitoring must be adaptive and pragmatic.
It must be in accord with available knowledge about the aquifer being sampled. At the
beginning of sampling when little information about the system is available, simple
procedures such as the outlier tests mentioned earlier should be employed.  As more
information about the mean  and variance of measurements taken at the various wells
becomes available, changes to more powerful procedures should be possible.

     At the end of eight (say) sampling periods, the temporal variance for each GQP may
be estimated by using the analysis of variance procedure discussed earlier.  With this
variance estimate and the sample mean for the GQP at a well, one can initiate a combined
Shewhart-CUSUM scheme with possible inclusion of the robust two-in-a-row rule of
Lucas and Crosier (1982b) that was mentioned above.  At the start, it would be well to
choose a larger than normal CUSUM parameter k and Shewhart upper limit U, as
mentioned in the Monte Carlo study (Section 7), because of the lack of precision in the
estimation of μ and σ.  Periodically the data should be combined with the data from
previous periods, provided a basic change in the system (i.e., the aquifer) has not
occurred, to obtain improved estimates of μ and σ.  The quality control scheme should
then be updated by using these new estimates for μ and σ.  If experience indicates a fairly
steady system, and that the estimates of μ and σ are reasonably accurate, the values of k
and U may be reduced to obtain smaller average OOC run lengths.
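
The combined scheme is straightforward to state in code. A minimal sketch for a single
well and GQP; the parameter values k, h, and U shown are illustrative placeholders, not
the report's recommendations:

```python
import numpy as np

def shewhart_cusum(x, mu, sigma, k=1.0, h=5.0, U=4.5):
    """Combined Shewhart-CUSUM chart on standardized measurements
    (in the spirit of Lucas, 1982).  Signals OOC when the one-sided
    upper CUSUM exceeds h or a single standardized value exceeds
    the Shewhart limit U.  Returns the run length to the first OOC
    signal, or None if no signal occurs."""
    s = 0.0
    for t, xi in enumerate(x, start=1):
        z = (xi - mu) / sigma       # standardize with learning-period estimates
        s = max(0.0, s + z - k)     # one-sided upper CUSUM accumulation
        if s > h or z > U:
            return t
    return None
```

As the text suggests, mu and sigma would be refreshed periodically from the pooled data,
and k, h, and U relaxed once those estimates stabilize.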

     If it is clearly possible to distinguish up-gradient and down-gradient wells (which is
often not possible owing to shifting gradients or slow flow), additional decision-making
information may be available.  When the control scheme for a GQP at a down-gradient well
indicates an out-of-control situation, the pattern of change in measurements of the GQP at
this well should be compared to the patterns observed at the up-gradient wells. If a similar
pattern was observed earlier, or currently, at one or more up-gradient wells, it would
indicate that the out-of-control situation is being caused by something other than the HWS
and that  increased monitoring activity would  not be required.  The importance of
monitoring up-gradient wells and keeping corresponding control charts on them cannot be

overemphasized.  They are the principal means of determining when out-of-control
situations down-gradient are caused by off-HWS activities.  In addition, when similar
patterns cannot be found up-gradient, this result gives greater confidence to a call for
additional regulatory action in down-gradient, out-of-control situations. The nature of the
GQP must also be taken into consideration. For example, a pollutant with a low specific
gravity may, in the case of a leak, mound above the aquifer and spread out in all
directions; its plume may thereby arrive at up-gradient wells before, or at the same time
as, it intercepts down-gradient wells.
     In situations where it is impossible to clearly define wells as up-gradient or down-
gradient, control charts should also be kept for wells on all sides of the HWS. However,
the inferences mentioned in the preceding paragraph will not be applicable.
     The combined Shewhart-CUSUM procedure was selected because different types of
leakage (e.g., small or massive) are possible at a HWS.  The robust two-in-a-row rule
should be considered because gross errors in GQP measurements are a problem, usually
due to contamination at some stage of the measurement system.  It would be useful in
situations where the slow flow of water in the aquifer allows a decision to wait until the
next sampling period without significantly increasing health risks or remedial costs if a
leak has occurred.  However, in most cases, it will be necessary to follow the current
procedure of checking for measurement error by taking a second sample at the well as
soon as possible after an unusually high sample measurement is obtained.
                                REFERENCES

1. Aitchison, J. On the Distribution of a Positive Random Variable Having a Discrete
   Probability Mass at the Origin. J. American Statistical Assoc.  50:901-908, 1955.

2. Alt, F. B.  Multivariate Control Charts. In: Encyclopedia of Statistical Sciences, Vol.
   6, (eds. S. Kotz and N. L. Johnson) John Wiley & Sons, New York, 1985. pp. 110-
   122.

3. Anderson, T. W. An Introduction to Multivariate Statistical Analysis (2nd Ed.). John
   Wiley & Sons, New York, 1984. 675 pp.

4. Barnett, V., and T. Lewis.  Outliers in Statistical Data.  John Wiley & Sons, New
   York, 1978.  365 pp.

5. Bissell, A. F.  The Performance of Control Charts and Cusums Under Linear Trend.
   Applied Statistics 33(2):145-151, 1984.

6. Code of Federal Regulations, 40, Parts 264 and 265.  Office of the Federal Register,
   Washington, D.C., 1985.

7. Freeze, R. A., and J. A. Cherry.  Groundwater. Prentice-Hall, Inc., Englewood
   Cliffs, NJ, 1979.  602 pp.

8. Guttman, I., S. S. Wilks, and J. S. Hunter. Introductory Engineering Statistics (3rd
   Ed.). John Wiley & Sons, New York, 1982.  580 pp.

9. Ingamells, C. O., and P. Switzer.  A Proposed Sampling Constant for Use in
   Geochemical Analysis. Talanta 20(6):547-568, 1973.

10. Jackson, J. E. Multivariate Quality Control. Communications in Statistics - Theory &
   Methods  14(11):2657-2688, 1985.

11. Jackson, J. E., and G. S. Mudholkar.  Control Procedures for Residuals Associated
   With Principal Component Analysis. Technometrics 21(3):341-349, 1979.
12. Liggett, W. Statistical Aspects of Designs for Studying Sources of Contamination. In:
   Quality Assurance for Environmental Measurements: STP 867,  J. K. Taylor and T.
   W. Stanley, Eds., American Society for Testing and  Materials, Philadelphia, PA,
   1985. pp. 22-40.

13. Lorenzen, T. J., and L. C. Vance.  The Economic Design of Control Charts: A
   Unified Approach. Technometrics 28(1):3-10, 1986.

14. Lucas, J. M.  Combined Shewhart-CUSUM Quality Control Schemes. J. Quality
   Technology 14: 51-59, 1982.

15. Lucas, J. M.   Cumulative Sum (CUSUM) Control Schemes.  Communications in
   Statistics - Theory & Methods 14(11):2689-2704, 1985a.

16. Lucas, J. M.  Counted Data CUSUM's. Technometrics 27(2):129-144, 1985b.
17. Lucas, J. M., and R. B. Crosier. Fast Initial Response for CUSUM Quality Control
   Schemes. Technometrics 24(3):199-205, 1982a.

18. Lucas, J. M., and R. B. Crosier. Robust CUSUM: A Robustness Study for CUSUM
   Quality Control Schemes. Communications in Statistics - Theory & Methods 11(23):
   2669-2687, 1982b.

19. Miller, M. D., and F. C. Kohout.  RCRA Ground Water Monitoring Statistical
   Comparisons: A Better Version  of the Student's t-Test.   In:  Proceedings of the
   NWWA/API Conference on Petroleum Hydrocarbons and Organic Chemicals in
   Ground Water - Prevention, Detection and  Restoration.   National Water Well
   Association, Worthington, OH, 1984. pp. 211-223.

20. Miller, R. G.  Simultaneous Statistical Inference. McGraw-Hill, Inc., New York,
   1966.  272 pp.

21. Montgomery, D. C., and P. J. Klatt.  Economic Design of T² Control Charts to
   Maintain Current Control of a Process. Management Science 19(1):76-89, 1972.

22. Owen, W. J., and T. A. De Rouen.  Estimation of the Mean for Log Normal Data
   Containing Zeros and Left-censored Values, With Applications to the Measurement of
   Worker Exposure to Air Contaminants. Biometrics 36(4):707-719, 1980.
23. Page, E. S. Continuous Inspection Schemes. Biometrika 41(1):100-114, 1954.

24. Scheffe, H.  The Analysis of Variance. John Wiley & Sons, New York, 1959.  477 pp.

25. Shewhart, W. A.  Economic Control of Quality of Manufactured Product. Van
   Nostrand, New York, 1931.

26. Vaughan, W. J., and C. S. Russell.  Monitoring Point Sources of Pollution: Answers
   and More Questions From Statistical Quality Control. American Statistician 37(4,
   Part 2):476-487, 1983.

27. Woodall, W. H., and M. M. Ncube.  Multivariate CUSUM Quality-Control
   Procedures. Technometrics 27(3):285-291, 1985.