                   STATISTICAL METHOD
                   BACKGROUND DOCUMENT
1.1  Introduction

     The regulation as proposed on December 18, 1978 (43 FR
     59005) called for the use of the Student's t Test to
     detect statistically significant differences in the
     concentration of leachates found in upgradient and
     downgradient ground water samples.  The Student's t
     Test was initially chosen because it would provide a
     relatively sensitive (powerful) test for detecting
     statistically significant differences in groundwater
     leachate concentration using small sample sizes (i.e.,
     sample sizes of seven (7) observations per sample of
     well water).  Based on comments received in response to
     the proposed use of the Student's t Test, EPA has
     changed the statistical methodology to the Mann-Whitney
     U Test.  The rationale for this change is documented in
     the remainder of this paper.

1.2  Comments as they relate to the Statistical Methodology

     Comments relating to the statistical methodology
     proposed in the December 18, 1978 publication (43 FR
     59005) can be classified into two general areas:  (1)
     violations of the mathematical assumptions underlying
     the statistical methodology; and (2) statistical
     sensitivity (power) of the test.

     Violations of the underlying distribution assumptions
     of the Statistical Model

     The mathematical model underlying the Student's t Test
     assumes that the sample observations (data) have been
     drawn from a population (all possible observations) in
     which the measured observations of leachate
     concentration are independent and normally distributed.
     Further, the model assumes that all sample measurements
     were taken at the same point in time and represent a
     random sample of observations from populations with a
     constant mean and variance.

     Concern was expressed that the above assumptions would
     not be met by the sampling methodology presented in the
     proposed regulations.  EPA agrees that these concerns
     would be valid under certain circumstances.  In
     response to those concerns, EPA has changed the
     statistical methodology and will use the Mann-Whitney U
     Test instead of the Student's t Test.  The Mann-Whitney
     U Test is a nonparametric test (i.e., a statistical
     test which makes no assumptions regarding the nature of
     the underlying population distribution), and as such
     does not depend on samples drawn from populations with
     known distributions.

     Rationale for Use of the Mann-Whitney U Test

     Based on the comments received, EPA is proposing to
     substitute the nonparametric Mann-Whitney U Test for
     the more stringent (i.e., resting on more assumptions
     in its mathematical model) Student's t Test.  The
     Mann-Whitney U Test is nonparametric and only requires
     that the sample be drawn from a population of
     independent and continuous measurements.  The Student's
     t Test assumes the underlying population to be normally
     distributed in addition to the measurements being
     independent and continuous.  The assumption of
     normality inherent in the Student's t Test is what
     caused concern in applying that methodology to detect
     significant differences in leachate concentration.  The
     observations (seven measurements/well water sample) in
     each sample are subject to measurement error as well as
     laboratory bias; therefore it is possible that the
     underlying error distribution is not normal and the
     sample of seven measurements represents some other
     unknown population.1/  Given this situation, the
     nonparametric Mann-Whitney U Test is better suited as a
     decision model than the more "stringent" Student's t
     Test.
1/   See Tai, Larry S.L., Statistical Methods for Determining
     the Measurement Precision of Drinking Water Contaminants,
     EPA, TSC-PD-A223-1, Final Report Task 1, Contract No.
     68-01-5086, July 1979, for a discussion of non-normal error
     distribution for samples

     Nonparametric statistical tests are uniformly less
     powerful than their parametric analogs when all
     assumptions of the parametric model are met; however,
     when the underlying assumptions are not met, the
     nonparametric technique can provide greater sensitivity
     than the parametric analog.

     The power of the Mann-Whitney U Test compares favorably
     with the Student's t Test; when the two populations
     sampled are assumed to differ only in location (i.e.,
     mean or median) the Mann-Whitney U Test is almost equal
     in power to the Student's t Test.2/
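
     The power comparison described above can be explored
     with a small simulation.  The sketch below is
     illustrative only and is not part of the Agency's
     analysis.  It assumes a pooled-variance two-sample t
     statistic, the one-sided 5% critical value of 1.782 for
     12 degrees of freedom, normal and lognormal populations,
     and arbitrary choices for the shift, the number of
     trials, and the random seed.  The U decision rule (U
     less than eleven) is the one given in Section 1.3, so
     the two tests are not run at exactly the same
     significance level.

```python
import random
import statistics

T_CRIT = 1.782  # assumed: one-sided 5% point of Student's t, 12 degrees of freedom
U_CRIT = 11     # rule from Section 1.3: significant if U is less than eleven


def mann_whitney_u(control, experimental):
    """Count, over all control values, the experimental values that precede them."""
    return sum(1 for c in control for e in experimental if e < c)


def pooled_t(control, experimental):
    """Two-sample t statistic with a pooled variance estimate (an assumed form)."""
    n_c, n_e = len(control), len(experimental)
    pooled_var = ((n_c - 1) * statistics.variance(control)
                  + (n_e - 1) * statistics.variance(experimental)) / (n_c + n_e - 2)
    std_err = (pooled_var * (1.0 / n_c + 1.0 / n_e)) ** 0.5
    return (statistics.mean(experimental) - statistics.mean(control)) / std_err


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_lognormal(rng):
    return rng.lognormvariate(0.0, 1.0)


def rejection_rates(shift, draw, trials=2000, n=7, seed=0):
    """Fraction of simulated data sets in which each test declares a difference.

    The experimental well is the control distribution moved up by `shift`.
    """
    rng = random.Random(seed)
    t_hits = 0
    u_hits = 0
    for _ in range(trials):
        control = [draw(rng) for _ in range(n)]
        experimental = [draw(rng) + shift for _ in range(n)]
        if pooled_t(control, experimental) > T_CRIT:
            t_hits += 1
        if mann_whitney_u(control, experimental) < U_CRIT:
            u_hits += 1
    return t_hits / trials, u_hits / trials


if __name__ == "__main__":
    for name, draw in (("normal", draw_normal), ("lognormal", draw_lognormal)):
        for shift in (0.0, 1.5):
            t_rate, u_rate = rejection_rates(shift, draw)
            print(f"{name:9s} shift={shift:3.1f}  t: {t_rate:.3f}  U: {u_rate:.3f}")
```

     With a shift of zero the rates estimate how often each
     test signals a difference when none exists; with a
     positive shift they estimate power.  Results will vary
     with the assumed distributions and settings.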

     In summary, the Mann-Whitney U Test was selected
     because of its computational simplicity and broad range
     of applicability in situations where the more stringent
     assumptions of parametric techniques would be violated.

1.3  Protocol for using the Mann-Whitney U Test

     The Agency is considering requiring a minimum of seven
     observations in each of the samples to be compared.
     This should provide adequate power for detecting
     meaningful differences in leachate concentrations.  In
     the example below, the seven observations for the
     experimental sample are compared to the seven
     observations in the control sample using the following
     procedure.

     (1)  Combine the two data sets in a single list,
          arranged from lowest to highest values.  For
          example, assume we obtained the following sets of
          observations for the control (C) and the
          experimental (E) wells, measured in mg/l:

               C    3.1, 3.2, 3.3, 3.4, 4.2, 4.5, 5.0

               E    4.0, 4.3, 4.8, 5.2, 5.5, 5.6, 5.8

          These fourteen data points would be reordered as
          follows:

               3.1  3.2  3.3  3.4  4.0  4.2  4.3
                C    C    C    C    E    C    E

               4.5  4.8  5.0  5.2  5.5  5.6  5.8
                C    E    C    E    E    E    E
2/   See Gibbons, Non-Parametric Statistical Inference,
     McGraw-Hill, 1971, pp. 148-149, for a discussion of the
     relative efficiency of the Mann-Whitney U and Student's t
     statistics.

     (2)  For each control value (C) count the number of
          experimental (E) values which precede it.  The
          process is shown below:

               C  C  C  C  E  C  E  C  E  C  E  E  E  E
               0  0  0  0     1     2     3

          That is to say, each of the first four control
          values has no experimental value preceding it, the
          fifth control value has one, the sixth two, and
          the last has three values preceding it.  In the
          case of ties, list the control value before the
          experimental value.

     (3)  Sum the counts obtained for the control values.
          The Mann-Whitney U statistic is the sum of these
          counts, that is:

               U = 0 + 0 + 0 + 0 + 1 + 2 + 3 = 6

     (4)  Determine if there is a statistically significant
          difference between the control and experimental
          samples.  If the calculated value of U is less
          than eleven, there is a statistically significant
          difference in the control and experimental samples
          at the 95% confidence level.
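
     Steps (1) through (4) can be sketched in a few lines of
     code.  The example below is a minimal illustration, not
     an official implementation.  The function names are
     hypothetical, and the exact tail probability is an added
     check that assumes every ordering of the fourteen values
     is equally likely when the two wells do not differ.

```python
from functools import lru_cache
from math import comb


def mann_whitney_u(control, experimental):
    """Steps (1)-(3): for each control value, count the experimental
    values that precede it in the combined ordering, then sum the counts.
    The strict '<' lists a control value ahead of a tied experimental
    value, as the tie rule in step (2) directs."""
    return sum(1 for c in control for e in experimental if e < c)


@lru_cache(maxsize=None)
def orderings(n_c, n_e, u):
    """Number of ways to interleave n_c C's and n_e E's so that the
    statistic equals u."""
    if u < 0:
        return 0
    if n_c == 0 or n_e == 0:
        return 1 if u == 0 else 0
    # The last position holds either a C (all n_e E's precede it, adding
    # n_e to the statistic) or an E (which adds nothing).
    return orderings(n_c - 1, n_e, u - n_e) + orderings(n_c, n_e - 1, u)


def lower_tail(u, n_c, n_e):
    """Exact probability of a statistic of u or smaller when the two
    samples come from the same population (an assumption of this sketch)."""
    favorable = sum(orderings(n_c, n_e, k) for k in range(u + 1))
    return favorable / comb(n_c + n_e, n_c)


# Data from step (1), in mg/l.
control = [3.1, 3.2, 3.3, 3.4, 4.2, 4.5, 5.0]
experimental = [4.0, 4.3, 4.8, 5.2, 5.5, 5.6, 5.8]

u = mann_whitney_u(control, experimental)
print("U =", u)                                       # U = 6
print("significant:", u < 11)                         # significant: True
print("lower tail:", round(lower_tail(u, 7, 7), 4))   # lower tail: 0.0087
```

     Counting pairs directly gives the same U as sorting the
     combined list and tallying by hand, and avoids building
     the ordered list for small samples such as these.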

REFERENCES

1.   Mary Gibbons Natrella, Experimental Statistics,
     (Washington, D.C.: U.S. Government Printing Office,
     1966) p. 1-14; pp. 2-12 to 2-15.

2.   Sidney Siegel, Nonparametric Statistics, (New York:
     McGraw-Hill Book Co., 1956)  pp. 116-126.