United States
Environmental Protection
Agency
Office of Marine
and Estuarine Protection
Washington DC 20460
EPA 430/09-87-005
June 1987
Water
Technical Support Document for
ODES Statistical Power Analysis
-------
TC 3953-03
Final Report
TECHNICAL SUPPORT DOCUMENT FOR
ODES STATISTICAL POWER ANALYSIS
for
U.S. Environmental Protection Agency
Office of Marine and Estuarine Protection
Washington, DC 20460
June 1987
by
Tetra Tech, Inc.
11820 Northup Way, Suite 100
Bellevue, Washington 98005
-------
CONTENTS
Page
LIST OF FIGURES iii
LIST OF TABLES iv
ACKNOWLEDGMENTS v
INTRODUCTION 1
POWER OF STATISTICAL TESTS 2
HYPOTHESIS TESTING 2
POWER ANALYSIS 6
POWER CALCULATIONS 9
ANALYSIS OF VARIANCE 9
EXAMPLE ANALYSES 16
EXAMPLE DATA AND PRELIMINARY ANALYSES 16
Example 1 18
Example 2 20
Example 3 27
Example 4 27
Example 5 30
SUMMARY 32
REFERENCES 34
APPENDIX A A-1
-------
FIGURES
Number Page
1 Hypothesis testing: possible circumstances and test outcomes 3
2 Probability densities of the F statistic 5
3 Power of the F-test vs. minimum detectable difference for
specified design parameters 19
4 Effects of increased unexplained sample variability on the
power of the one-way ANOVA 21
5 Power of the F-test vs. minimum detectable difference for
specified design parameters 24
6 Effects of increased sampling efforts on the power of the
one-way ANOVA 28
7 Minimum detectable difference vs. number of replicates for
fixed set of design parameters 29
8 Effects of increased unexplained sample variability on the
minimum detectable difference 31
iii
-------
TABLES
Number Page
1 Analysis of variance table for one-way layout 11
2 Example data sets 17
3 Measured contaminant concentrations in fish tissue (mg/kg)
at six monitoring stations and one-way ANOVA results for
tests of differences among observed mean concentrations 23
4 Log-transformed contaminant concentrations in fish tissue
(mg/kg) at six monitoring stations and one-way ANOVA results
for tests of differences among mean concentrations 26
iv
-------
ACKNOWLEDGMENTS
This document has been reviewed by the 301(h) Task Force of the Environ-
mental Protection Agency, which includes representatives from the Water
Management Divisions of U.S. EPA Regions I, II, III, IV, IX, and X; the
Office of Research and Development, Environmental Research Laboratory-
Narragansett (located in Narragansett, RI, and Newport, OR); and the Marine
Operations Division in the Office of Marine and Estuarine Protection, Office
of Water.
This technical guidance document was produced for the U.S. Environmental
Protection Agency under the 301(h) post-decision technical support contract
No. 68-01-6938, Allison J. Duryee, Project Officer. This report was
prepared by Tetra Tech, Inc., under the direction of Dr. Thomas C. Ginn.
The primary author was Mr. Thomas M. Grieb. The computer program for the
power analysis tool was developed by Mr. Thomas M. Grieb and Dr. Michael J.
Ungs. Ms. Marcy B. Brooks-McAuliffe performed technical editing and
supervised report production.
-------
INTRODUCTION
The Ocean Data Evaluation System (ODES) provides users with a wide
range of statistical tools for analyzing monitoring data in the ODES
database. One of the most valuable tools in ODES for assessing discharge-
related effects is Analysis of Variance (ANOVA). This tool enables the
statistical evaluation of differences in biological and chemical variables
among sampling stations (e.g., discharge vs. control). As a companion to the
ANOVA tool, ODES also contains a Statistical Power Analysis Tool that is
used in the design of new monitoring programs and in the interpretation of
ANOVA test results.
In simple terms, statistical power analysis is the evaluation of the
ability to detect significant statistical results when real differences
exist in a particular monitoring variable. Application of the tool enables
the investigation of the statistical implications of alternative sampling
strategies (e.g., numbers of sample replicates or sampling stations). This
application is especially useful in designing new monitoring programs or in
evaluating the effectiveness (or cost efficiency) of existing programs.
Power analysis is also an important follow-up procedure when ANOVA is
used to detect discharge-related effects. In such cases, power analysis
can be used to assess the possibility that an absence of significant ANOVA
results is caused by an inadequate sampling design.
As a supplement to the ODES Tool, this document provides a review of
the basic concepts of hypothesis testing and statistical power analysis.
The kinds of power analyses that can be conducted using ODES are described,
and the uses of the tool are illustrated with several examples.
-------
POWER OF STATISTICAL TESTS
HYPOTHESIS TESTING
The statistical tests available in ODES are often applied in the
evaluation of monitoring data to test a particular scientific hypothesis.
ANOVA is a frequently used test that enables the evaluation of statistical
significance of observed differences in the values of measured environmental
variables. In the most basic application of ANOVA for environmental effects
data, a number (n) of replicate samples is collected at a number (k) of
fixed sampling stations or sampling events. This application is referred to
as a fixed-effects one-way design because only a single factor (e.g., sampling
stations) is evaluated in each analysis. The one-way ANOVA design also
tests only a single hypothesis during each application. For example, in a
design involving multiple samples of an environmental variable at several
sampling stations, the ANOVA tests the null hypothesis that the effects of
station location on the variable are not statistically significant (i.e., that
all station means are equal). The test of the null hypothesis is based on
the F-statistic, which is a ratio of the variability among ANOVA groups
(e.g., sampling stations) to the variability within groups.
The testing circumstances and outcomes associated with testing the null
hypothesis are shown in Figure 1. Four possible outcomes exist:
1. The hypothesis is true and it is not rejected.
2. The hypothesis is true and it is rejected.
3. The hypothesis is false and it is not rejected.
4. The hypothesis is false and it is rejected.
-------
[Figure 1. Hypothesis testing: possible circumstances and test outcomes.
The figure is a two-by-two decision table: the true state of the null
hypothesis (true or false) against the test decision (accept or reject),
with the two incorrect outcomes shaded.]
-------
The shaded areas shown in Figure 1 represent incorrect decisions. The
incorrect rejection of the null hypothesis is referred to as a Type I
error. The probability of a Type I error, designated α, represents the
significance level of the statistical test. The incorrect acceptance of the
null hypothesis is referred to as the β error, where β represents the
probability of this incorrect decision. The β error is also known as the
Type II error. The probabilities of the correct acceptance and rejection of
the null hypothesis are represented by the complements of the Type I and
Type II errors (i.e., 1-α and 1-β), respectively.
The probability densities of the test statistic for the ANOVA (i.e.,
the F statistic) are shown in Figure 2 for test conditions corresponding to
a true null hypothesis and an alternative hypothesis. Under the alternative
hypothesis (H0 is false), there is a fixed but unspecified effect due to
sampling location so that the mean values among stations are not equal.
These distributions will be used to demonstrate the relationship between the
four possible outcomes of the hypothesis testing process shown in Figure 1
and to provide a probabilistic interpretation of the power of a statistical
test.
The probability density of F when the null hypothesis is true is shown
in Figure 2a. This figure shows the probability of a Type I or α error as
the shaded area under the curve to the right of Fα [i.e., the value of F
corresponding to the selected significance level of the test (α)]. Values
of F obtained in the test of significance that are greater than Fα will lead
to a rejection of a true null hypothesis (a Type I error), and these values
are said to represent the critical region of the test. Values of F obtained
in the test of significance that are less than the critical value (Fα) lead
to acceptance of the true null hypothesis, and these values represent the
acceptance region of the test. The probability of the correct acceptance of
the true null hypothesis is therefore 1-α and is represented by the unshaded
area under the curve.
The corresponding probability density of F for the alternative
hypothesis (H0 is false) is shown in Figure 2b. Under the alternative
hypothesis, the distribution is shifted to the right as the expected value
-------
[Figure 2 layout: panel (a) shows the probability density of F when the
null hypothesis is true, with the Type I error (α) shaded in the critical
region to the right of Fα; panel (b) shows the density of F when the null
hypothesis is false, with the Type II error (β) shaded to the left of Fα
and the power of the test (1-β) falling in the critical region.]
Figure 2. Probability densities of the F statistic.
-------
of the test statistic (F) increases. However, the value of the rejection
criterion (Fα) remains unchanged, and the shaded area under the curve to the
right of Fα now represents the probability of correctly rejecting the null
hypothesis when it is false (i.e., of detecting a statistical difference
when one actually exists). This probability is referred to as the power of
the statistical test. The complement of the power of the test is the
probability of accepting a false null hypothesis (β). The β or Type II
error is shown in Figure 2b as the shaded area under the curve to the left
of Fα.
With the probability densities in Figure 2, it is possible to
demonstrate a dependency between Type I and Type II errors in the comparison
of H0 and the fixed alternative hypothesis H1. For example, the
probability of rejecting a true null hypothesis (Type I error) can be
minimized by decreasing α. This is equivalent to moving Fα to the right and
decreasing the critical regions in both Figures 2a and b. However, as can
be seen in Figure 2b, the decrease in the Type I error (α) achieved in this
manner is accompanied by an increase in the Type II or β error.
While this relationship between α and β exists for the comparison of
any fixed alternatives or any given statistical tests, this type of analysis
ignores other sampling parameters such as level of sampling effort and
variability within the sampling environment. As described below, it is
possible to decrease β while holding α constant. The emphasis in this
description, however, is on the evaluation of the power (1-β) of the
statistical test under various sampling conditions.
POWER ANALYSIS
The power of the one-way fixed effects ANOVA, or the probability of
correctly rejecting the false null hypothesis (1-β), is determined by the
following five design parameters:
• Significance level of the test (α)
• Number of sampling stations
-------
• Number of replicates
• Minimum detectable difference (i.e., the smallest difference
that can be detected among means of the fixed-effects
variable)
• Unexplained sample variance (i.e., natural variability within
the sampling environment).
The relationship between the power of a statistical test and the design
parameters makes several types of power analyses possible. For example, the
power of the test can be determined as a function of the five design
parameters. Alternatively, the value for any individual design parameter
required to obtain a specified power of the statistical test can be deter-
mined as a function of the other four parameters. It is this latter type of
analysis that can be used to evaluate methods for decreasing the Type II
error (β) or, equivalently, increasing the power of the test while holding α
constant.
Two basic applications of the ODES power analysis tool are described in
this document. The first is in the evaluation of reported results of
statistical tests of significance. For example, acceptance of the null
hypothesis at some specified significance level does not imply that it is
true and, therefore, does not demonstrate the absence of differences in the
dependent variable of interest. In reference to the above discussion on
hypothesis testing, the acceptance of the null hypothesis does not provide
information on the probability of the Type II error or of accepting a null
hypothesis that is false. The probability of the Type II error should be
investigated as a matter of course. Power analyses should be conducted to
evaluate the ability of the statistical test to detect the existence of
effects given the values of the remaining design parameters. In the Example
Analyses section below, calculations are presented to evaluate the
probability of detecting specific levels of differences in the mean value
between sampling stations.
-------
The second basic application of the power analysis tool is the
evaluation of the performance of monitoring programs. This application can
be used to select study design specifications for proposed monitoring
programs or to evaluate the effectiveness of existing monitoring programs.
When existing data are available for a selected monitoring variable, power
calculations provide a quantitative comparison of alternative sampling
layouts. For example, using historical data, the minimum detectable
difference in a selected monitoring variable can be determined as a function
of the number of sample replicates. Examples of these calculations are also
provided in a subsequent section.
8
-------
POWER CALCULATIONS
Just as there are many types of power analyses, there are also several
different procedures that can be used to calculate the power of the statis-
tical test for any particular analysis. For example, several methods for
calculating the power of the ANOVA have been described. For the most part,
these methods involve the use of look-up tables or nomographs (e.g., Scheffe
1959; Pearson and Hartley 1951; Tang 1938; Lehmer 1944; Winer 1971; and
Cohen 1977). However, there are differences in the nomenclature associated
with the description of these methods and the associated tables. This lack
of conformity can cause confusion in comparing the different formulas.
There are also many different ways to formulate the power analysis
calculation. For example, Cohen (1977) provided three different formulations
of the power test for the ANOVA based on the assumed degree of departure
from the null hypothesis (no effect).
To provide a good understanding of the power calculations performed by
the ODES power analysis tool, a complete description of the methods used is
presented below. This description includes a review of the statistical test
and the formulation of the power calculation performed by the ODES tool.
ANALYSIS OF VARIANCE
In the evaluation of environmental monitoring data, ANOVA techniques
can be used to relate explicitly observations of interest (e.g., chemical
concentrations in marine sediments) to various environmental factors and
random errors. This partitioning of field observations can be demonstrated
with the ANOVA experimental model shown in Equation (1), which partitions a
single observation (Yij) into several components:

Yij = μ + γi + εij   (1)
-------
where:
Yij = Observation at Station i and Replicate j of, for example, the
concentration of a selected chemical
μ = Mean of all Yij observations
γi = Effect of the ith level of an environmental factor (e.g., station
location)
εij = Random errors not accounted for by either μ or γi.
Under the example model formulation, the effects of environmental factors
(e.g., station location) on individual observations can be tested for statis-
tical significance. The null hypothesis tested is that the station location
has no effect on observed contaminant concentrations, or stated
formally: γ1 = γ2 = ... = γI = 0. Similarly, more complex models can be formulated
to test for the effect of more than one environmental factor as well as the
statistical significance of interactions among factors.
The results of a one-way ANOVA are usually summarized in a manner
similar to that shown in Table 1. The test statistic is the F ratio, which
is the ratio of the between-groups mean square (BMS) to the within-groups
mean square (WMS). As indicated in Table 1, the WMS is an unbiased estimate
of the population variance, while the expected value of the BMS is
represented by the sum of the population variance and another term
representing the actual fixed effects. This added quantity is:

(I-1)⁻¹ Σ Ji (γi - γ̄)²

where:
I = The number of sampling stations
Ji = The number of replicates at the ith station
γi = The true value of the ith effect
γ̄ = The mean of the treatment effects.
Under the null hypothesis, the value of the actual fixed effects term is 0,
and the expected value of the F ratio is equal to 1. When fixed effects are
10
-------
TABLE 1. ANALYSIS OF VARIANCE TABLE FOR ONE-WAY LAYOUT
Source           Sum of Squares         d.f.   Mean Square   E(MS)
Between groups   SSB = Σ Ji(ȳi - ȳ)²    I-1    SSB/(I-1)     σ² + (I-1)⁻¹ Σ Ji(γi - γ̄)²
Within groups    SSW = ΣΣ(yij - ȳi)²    n-I    SSW/(n-I)     σ²
Total            ΣΣ(yij - ȳ)²           n-1

where:
yij = Observation at group (station) i and replicate j
ȳi = ith group mean
ȳ = Overall mean of all i, j observations
I = Number of sampling stations
n = Total number of observations
SSB = Between-groups sum of squares
SSW = Within-groups sum of squares
E(MS) = Expected value of the mean square
Ji = Number of replicates at the ith station
γi = True value of the ith effect
γ̄ = Mean of the treatment effects
σ² = Population variance.
11
-------
observed in the monitoring program, the value of this quantity increases and
results in an increase in the value of the numerator of the F ratio. Large
effects will result in an increase in the power of the test (i.e., the
probability of rejecting a false null hypothesis).
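The quantities in Table 1 translate directly into a computation. The
following sketch is written in Python purely for illustration (ODES itself
was not implemented in Python; NumPy and SciPy are assumed available) and
uses only the definitions above:

    import numpy as np
    from scipy import stats

    def one_way_anova(groups):
        # groups: one array of replicate measurements per station
        groups = [np.asarray(g, dtype=float) for g in groups]
        I = len(groups)                              # number of stations
        n = sum(len(g) for g in groups)              # total observations
        grand = np.concatenate(groups).mean()        # overall mean
        ssb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
        ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
        bms = ssb / (I - 1)                          # between-groups mean square
        wms = ssw / (n - I)                          # within-groups mean square
        f_ratio = bms / wms
        p_value = stats.f.sf(f_ratio, I - 1, n - I)  # upper-tail probability
        return f_ratio, p_value, wms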
In performing power analyses, a set of effects is assumed. However,
when a sample design involves several station locations, many different sets
of effects can be assumed. For example, alternative hypotheses can be
constructed such that actual station effects of a certain magnitude occur at
one, two, three, or more of the total number of sampling locations.
Additionally, as mentioned above, Cohen (1977) described methods for
calculating the power of the F test corresponding to "small," "medium," and
"large" effects. The magnitude of the effects can also be varied among
stations, so that an infinite number of alternative hypotheses can be
constructed for evaluation in power analyses.
The power analyses conducted by ODES are formulated to provide a
conservative estimate of statistical power. Using this conservative
approach, alternative hypotheses are constructed such that the effects occur
in the combination that is the most difficult to detect. Scheffe (1959)
showed that this conservative set of effects is defined by:
|γi - γj| = Δ ;   γk = (γi + γj)/2 , for all k ≠ i or j   (2)

where:
Δ = The maximum difference in actual effects
γk = The true value of the kth effect.

Equation (2) states that the two effects (γi and γj) associated with
the hypothesis of interest differ by Δ, while all other effects (γk) are
equal to the mean of these two. For the maximum difference in effects equal
to Δ, this arrangement gives the lowest test power.
As illustrated in Figure 2, determining the power of a test involves
the calculation of the area under the curve in the critical region of the
12
-------
noncentral F probability density (i.e., the probability density of F when
the null hypothesis is false). This amounts to integrating the density
function over this critical region. The appropriate mathematical expression
is:

Power = 1 - β = ∫[Fα to ∞] p(F | ν1, ν2, λ) dF   (3)

where:
ν1 = Numerator degrees of freedom
ν2 = Denominator degrees of freedom
λ = Noncentrality parameter (defines the shape of the noncentral
distribution).
The numerical integration methods used to solve Equation (3) are described
in Appendix A. Scheffe (1959) showed that the noncentrality parameter in
Equation (3) satisfies the following relationship:

λσ² = J Σ (γi - γ̄)²   (4)

where:
σ² = Population variance
J = Number of replicate samples at each station.
Using this information and assuming equal numbers of replicate samples (J),
the expected value of the between-groups mean square (BMS) in the ANOVA table
(Table 1) can be rewritten as:

E(BMS) = σ² + (I-1)⁻¹ σ²λ   (5)
Combining Equation (5) with the relationship expressed in Equation (4)
provides the necessary information to solve for the noncentrality parameter.
The value of the noncentrality parameter can then be used to characterize the
13
-------
noncentral density function and provide a basis for solving Equation (3) to
determine the power of the test. Under the conditions imposed in Equa-
tion (2),

|γi - γ̄| = Δ/2   (6)

for the two extreme γi, and

|γk - γ̄| = 0

for the others. Therefore,

σ²λ = J Δ²/2 , and   (7)

λ = J Δ²/(2σ²)   (8)
As previously stated, the ODES power analysis tool can be used to
perform two types of analyses for the one-way ANOVA. In the first type of
analysis, the ODES tool is used to determine the probability of detection
vs. specified values of minimum detectable difference. For this analysis,
the user must specify the significance level of the test, the number of sta-
tions, the number of replicates at each station, and an estimate of the
unexplained variability. This analysis requires the solution of Equation (8)
to determine the noncentrality parameter and subsequently the solution of
Equation (3).
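As a concrete illustration, a minimal sketch of this first type of analysis
follows (the same illustrative Python assumptions as above, with SciPy's
noncentral F distribution standing in for the numerical integration
described in Appendix A):

    from scipy import stats

    def anova_power(alpha, stations, replicates, delta, variance):
        # Power of the one-way fixed-effects ANOVA under the conservative
        # effects arrangement of Equation (2); delta is the minimum
        # detectable difference in the original units of measurement.
        v1 = stations - 1                                  # numerator d.f.
        v2 = stations * (replicates - 1)                   # denominator d.f.
        lam = replicates * delta ** 2 / (2.0 * variance)   # Equation (8)
        f_crit = stats.f.ppf(1.0 - alpha, v1, v2)          # critical value F_alpha
        return stats.ncf.sf(f_crit, v1, v2, lam)           # Equation (3)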
In the second type of analysis, the minimum detectable difference (A)
is calculated for varying numbers of replicate samples at each station. In
this analysis, the values of the other design parameters [i.e., the signifi-
14
-------
cance level (α), number of stations, and unexplained sample variance] are
fixed, and Equation (7) is solved for Δ:

Δ = (2λσ²/J)^(1/2)   (9)
The formulation of the power calculation in this manner requires the inverse
solution for the noncentral F density function (Equation 3) [i.e., given the
power and appropriate degrees of freedom, solve for the noncentrality
parameter (λ)].
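One way to perform this inverse solution, sketched under the same
illustrative assumptions as the anova_power() function above, is a
numerical search over Δ, since power increases monotonically with Δ:

    from scipy import optimize

    def min_detectable_difference(alpha, stations, replicates, variance,
                                  target_power=0.80):
        # Find the delta at which Equation (3) yields the target power,
        # i.e., solve Equation (9) without analytically inverting the
        # noncentral F distribution.
        def gap(delta):
            return anova_power(alpha, stations, replicates,
                               delta, variance) - target_power
        hi = 1.0
        while gap(hi) < 0.0:      # widen the bracket until power exceeds target
            hi *= 2.0
        return optimize.brentq(gap, 1e-9, hi)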
15
-------
EXAMPLE ANALYSES
The application of the ODES power analysis tool for the one-way fixed
effects ANOVA is presented below in several example analyses. The purpose
of these examples is twofold. First, these examples summarize the informa-
tion presented in this guidance document. They are intended to demonstrate
the concepts presented and to familiarize the ODES user with the capabili-
ties and potential uses of the power analysis tool. Second, the examples
are intended to demonstrate the use of the ODES power analysis tool in the
design and evaluation of monitoring programs.
EXAMPLE DATA AND PRELIMINARY ANALYSES
Data used in the examples provided below are summarized in Table 2.
For each data set, values for the estimated mean, residual error variance,
and coefficient of variation are presented. These data were selected from
historical data compiled in a previous report (Tetra Tech 1987).
From Equations (7) and (8) above, it is clear that an estimate of the
unexplained sample variance (i.e., the natural variability not accounted for
by the statistical model) is required to conduct power analyses. This
estimate can be obtained by conducting a site-specific preliminary study or,
alternatively, by using existing sampling data. The unexplained sample
variance is one of the five design parameters described above, and it can be
viewed as an estimate of the denominator in the F ratio used to evaluate the
significance of the ANOVA statistical tests. This quantity is shown in the
one-way ANOVA table (Table 1) as the within-groups mean square and represents
the average variance within groups. Where sample data are available, this
design parameter can be estimated in one of two ways. First, a preliminary
ANOVA can be conducted and the value of the within-groups mean square (WMS)
used. Second, the sample variance can be computed from all available data,
ignoring sample location. The first value provides an estimate of the
variance that is unexplained by the statistical model. Therefore, if the
16
-------
TABLE 2. EXAMPLE DATA SETS
                                Estimated Residual     Coefficient of
Data Set   Estimated Mean (X̄)   Error Variance (s²)    Variation (s/X̄, percent)
1          0.304                0.0243                 51.3
2          0.766                0.3324                 75.3
3          5.067                39.2918                123.7
17
-------
effects of sample locations are found to be significant in the F test
conducted with the ANOVA, the within-groups mean square will have a value
less than the overall sample variance.
Since the residual error variance design parameter is an estimate of
the denominator in the F ratio, it can be seen that the overall sample
variance obtained from existing data provides a conservative estimate for
the purposes of conducting power analyses. However, where available data
can be fit to the ANOVA model, the within-groups mean square estimate
provides a more realistic estimate of the expected value of the denominator
in the F ratio. In the examples provided below, estimates of the residual
error variance (unexplained sample variance) were obtained from the within-
groups mean square after analyzing the data using a one-way ANOVA.
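The two candidate estimates can be compared directly. A short sketch (same
illustrative Python assumptions, reusing the one_way_anova() function above
with the contaminant data presented later in Table 3):

    import numpy as np

    data = [[16.0, 4.2, 5.0, 2.3, 4.5],    # Table 3, station 1
            [3.7, 32.0, 3.8, 3.7, 4.2],    # station 2
            [1.8, 2.2, 3.2, 7.9, 3.0],     # station 3
            [1.1, 1.3, 1.5, 0.5, 3.5],     # station 4
            [2.4, 1.8, 2.8, 3.0, 2.4],     # station 5
            [5.3, 2.9, 3.4, 18.0, 4.6]]    # station 6
    _, _, wms = one_way_anova(data)              # within-groups mean square
    overall = np.concatenate(data).var(ddof=1)   # variance ignoring stations
    # wms (39.29) is smaller than the overall variance (40.33) to the extent
    # that station location explains part of the observed variability.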
Example 1
The first example demonstrates the use of the ODES power analysis tool
to determine the power of the fixed effects one-way ANOVA for proposed
monitoring designs. The output generated for this first type of power
analysis is presented in Figure 3. For a fixed set of sampling design
parameters [fixed number of stations, replicate samples, significance level
(a), and unexplained sample variance], the probability of correctly rejecting
a false null hypothesis (i.e., the power of the test) is shown as a function
of the minimum detectable difference between stations, expressed as a
percentage of the overall mean.
To generate this power curve, values for the number of stations (4),
sample replicates (5), significance level (a=0.05), estimated variance
(0.0243), and estimated mean (0.304) were entered as input to the power
analysis tool. The estimated values of the mean and variance were obtained
from Data Set 1 (Table 2). The values of the other design parameters were
selected to evaluate the proposed example monitoring program design.
Suppose, for example, that the objective of the proposed monitoring
program is to detect differences in measured values equal to the overall
mean among all four stations (i.e., 100 percent of the mean). The results
18
-------
[Figure 3 plots power (probability of detection) against minimum
detectable difference, expressed as a percentage of the mean, for the
fixed design parameters listed above: stations = 4, replicates = 5,
significance level (α) = 0.05, estimated variance (σ²) = 0.0243, and
estimated mean = 0.304.]
19
-------
presented in Figure 3 indicate that under the proposed design, the proba-
bility of correctly detecting a difference equal to the overall mean (100
percent of the mean value) is approximately 0.63.
In light of the previous discussion of the conservative nature of the
ODES power analysis, these results indicate that given mean values of 0.304
at two of the four stations and values of 0.152 and 0.456 at the remaining
stations, the difference between the extreme values (equal to the overall
mean) will be detected in the test of significance with a probability of
approximately 0.60. In a similar manner, the other points on this curve can
be used to determine the power of the test for a wide range of differences
in mean values. For example, these results also indicate that the proposed
design would have a very low power if the objective of the monitoring
program was to detect station differences equal to 50 percent of the mean.
In this case, the estimated statistical power would be only about 0.20.
Given these results, different values for the fixed design parameters (e.g.,
number of replicate samples) could be entered in the analysis to evaluate
alternative designs.
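For reference, the value read from Figure 3 can be approximated with the
anova_power() sketch given earlier (an illustrative call, not actual ODES
output):

    power = anova_power(alpha=0.05, stations=4, replicates=5,
                        delta=0.304, variance=0.0243)
    # delta = 0.304 is 100 percent of the Data Set 1 mean; the computed
    # power is roughly 0.6, consistent with the curve in Figure 3.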
Example 2
The example data sets in Table 2 exhibit increasing levels of sample
variability. The selected coefficients of variation are 51.3, 75.3, and
123.7, respectively. From the previous discussion and intuitively, it can
be seen that an increase in the level of unexplained sample variance results
in a decrease in the power of the test. Equivalently, associated with an
increase in sample variability, there is an increase in the minimum differ-
ence that can be detected for a fixed level of statistical power.
The effect of increased sample variability is shown in Figure 4 for the
three example data sets. The power curve (A) presented in Figure 4 is the
same one presented in Figure 3. In this case, the minimum detectable
difference that can be detected with a power of 0.80 is approximately 1.2
times the overall mean. For the same level of power, the minimum differences
that can be detected for levels of sampling variance corresponding to
20
-------
[Figure 4. Effects of increased unexplained sample variability on the
power of the one-way ANOVA: power vs. minimum detectable difference
(percent of mean) for the three example data sets (curves A, B, and C;
coefficients of variation 51.3, 75.3, and 123.7), with fixed design
parameters: significance level (α) = 0.05, stations = 4, replicates = 5.]
21
-------
coefficients of variation equal to 75.3 and 123.7 are 1.8 and 3 times the
value of the overall mean, respectively.
The relationship between sample variability and the power of the one-
way ANOVA can also be demonstrated using individual data sets. For example,
the individual measurements from Data Set 3 (Table 2) and the results of a
one-way ANOVA for tests of differences among observed means at six stations
are presented in Table 3. From these data, we see that the maximum differ-
ence between observed mean values at the six monitoring stations is 7.9
mg/kg [9.5 (STA 2)-1.6 (STA 4)] or 156 percent of the overall mean, and that
this observed difference between stations is not statistically significant
(p=0.36).
Given the relatively high level of the estimated sample variability as
indicated by the coefficient of variation (123.7) given in Table 2, these
results are not unexpected. However, power analyses can be used to evaluate
the example monitoring program results in terms of the probability of
detecting the maximum differences between observed mean values at the six
monitoring stations (7.9 mg/kg or approximately 160 percent of the overall
mean).
Figure 5(B) shows the results of a power analysis conducted for the
following fixed design parameters: stations (6), replicates (5), statistical
significance [(α) = 0.05], and estimated variance [(σ²) = 39.3] obtained
from the data given in Table 3. These results indicate that the probability
of detecting statistically significant differences among stations equal to
160 percent of the overall mean is less than 0.30. In other words, given the
example study design, there is a small probability of statistically verifying
the significance of observed differences among the monitoring stations.
One strategy that is often used in these types of analyses is to
transform the observed values to meet the variance assumptions for ANOVA.
As shown below, the ODES power analysis tool can be used to evaluate the
statistical implication of this strategy.
22
-------
TABLE 3. MEASURED CONTAMINANT CONCENTRATIONS IN FISH
TISSUE (mg/kg) AT SIX MONITORING STATIONS AND ONE-WAY
ANOVA RESULTS FOR TESTS OF DIFFERENCES AMONG
OBSERVED MEAN CONCENTRATIONS
                                 Stations
Replicate      1      2      3       4      5      6
1           16.0    3.7    1.8     1.1    2.4    5.3
2            4.2   32.0    2.2     1.3    1.8    2.9
3            5.0    3.8    3.2     1.5    2.8    3.4
4            2.3    3.7    7.9     0.5    3.0   18.0
5            4.5    4.2    3.0     3.5    2.4    4.6
X̄            6.4    9.5    3.6     1.6    2.5    6.8

Overall Mean = 5.07

ANOVA Table

Source           D.F.   Sum of Squares   Mean Square   F Ratio   F Prob.
Between groups    5        226.7026        45.3405      1.154    0.3601
Within groups    24        943.0036        39.2918
Total            29      1,169.7061
23
-------
[Figure 5. Power of the F-test vs. minimum detectable difference (percent
of mean) for fixed design parameters: significance level (α) = 0.05,
stations = 6, replicates = 5. Curve A is based on the log-transformed data
(estimated variance = 0.46); curve B on the untransformed data (estimated
variance = 39.3).]
24
-------
Log-transformed values of observed concentrations in Data Set 3 along
with ANOVA test results for differences among the mean values in log space
are presented in Table 4. These results indicate that the relative sample
variability is reduced in log space [e.g., the coefficient of variation
(s/X̄) in log space is 66.8, and the differences among stations are
statistically significant (p=0.013)]. Furthermore, power analyses shown in
Figure 5(A) indicate that the probability of detecting statistically
significant differences among stations equal to the maximum differences
between observed mean values in log space at the six monitoring stations
[1.77 (STA 2)-0.26 (STA 4) = 1.51 or approximately 120 percent of the
overall mean] is approximately 0.70.
In assessing monitoring program design it is important to evaluate the
changes that can be detected in the original units of measurement rather
than the transformed values. The following conversion relationship between
logarithmic parameters and arithmetic parameters of a lognormal distribution
can be used:
μ = exp(μln + 0.5 σln²)

where:
μ = Arithmetic mean
μln = Mean of the log-transformed values
σln² = Variance of the log-transformed values.
The overall mean (1.229) and within-groups mean square (0.4609) from Table 4
are used as estimates of μln and σln², respectively. Using this relationship
and the results shown in Figure 5(A), the probability of detecting
statistically significant differences in contaminant concentrations in fish
tissue of approximately 5.7 mg/kg among stations with the existing monitoring
program design and using log-transformed values is 0.70.
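The 5.7 mg/kg figure can be reproduced by applying the conversion
relationship to the log-space difference (a minimal sketch under the same
illustrative Python assumptions as earlier):

    import math

    delta_ln = 1.77 - 0.26    # maximum difference of log-space station means
    var_ln = 0.4609           # within-groups mean square in log space (Table 4)
    delta = math.exp(delta_ln + 0.5 * var_ln)   # approximately 5.7 mg/kg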
25
-------
TABLE 4. LOG-TRANSFORMED CONTAMINANT CONCENTRATIONS
IN FISH TISSUE (mg/kg) AT SIX MONITORING STATIONS
AND ONE-WAY ANOVA RESULTS FOR TESTS OF
DIFFERENCES AMONG MEAN CONCENTRATIONS
                                  Stations
Replicate       1       2       3        4       5       6
1            2.773   1.308   0.588    0.095   0.875   1.668
2            1.435   3.466   0.788    0.262   0.588   1.065
3            1.609   1.335   1.163    0.405   1.030   1.224
4            0.833   1.308   2.067   -0.693   1.099   2.890
5            1.504   1.435   1.099    1.253   0.875   1.526
X̄            1.63    1.77    1.14     0.26    0.89    1.67

Overall Mean = 1.229

ANOVA Table

Source           D.F.   Sum of Squares   Mean Square   F Ratio   F Prob.
Between groups    5         8.5186         1.7037       3.697    0.0127
Within groups    24        11.0605         0.4609
Total            29        19.5791
26
-------
Example 3
The power of the test is also affected by changes in the other design
parameters. For example, the effects of changes in the number of replicate
samples at each station on the power of the one-way ANOVA are shown in
Figure 6. In this example, corresponding to Data Set 2 in Table 2, the
power has been calculated for three levels of sample effort. As indicated,
the power of the test increases for an increase in the number of replicate
samples at each station.
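Curves like those in Figure 6 can be regenerated by sweeping the replicate
count with the anova_power() sketch given earlier (the replicate levels and
station count below are illustrative assumptions, not the exact values used
to draw Figure 6):

    # Power at a difference equal to 100 percent of the Data Set 2 mean;
    # the station count (4) is assumed for illustration.
    for replicates in (3, 5, 10):      # assumed levels of sampling effort
        p = anova_power(alpha=0.05, stations=4, replicates=replicates,
                        delta=0.766, variance=0.3324)
        print(replicates, round(p, 2))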
Example 4
The ODES power analysis tool can also be used to determine the minimum
detectable difference as a function of the number of replicate samples at
each station. In this type of analysis, the power of the test as well as
the other design parameters (i.e., number of stations, significance level,
and estimated sample variance) are held constant. For these analyses the
power of the test was arbitrarily set at 0.80. The results of this type of
analysis are shown for Data Set 1 (Table 2) in Figure 7. As indicated in
Figure 7, the minimum detectable difference decreases with an increase in
the number of replicate samples at each station. In other words, the
ability of the ANOVA to detect significant differences among sampling
stations increases with an increase in the level of sample replication.
Results of this analysis indicate, for example, that with three replicate
samples at four stations a difference of approximately 1.8 times the overall
mean value can be detected between stations with a probability of 0.80. A
difference between stations approximately equal to the overall mean value
can be detected with seven replicates at each station.
Figure 7 also illustrates an important concept in the use of power
analyses to evaluate alternative sampling designs. While increasing the
number of replicates always increases the level of detection between
stations, a disproportionate increase in the level of detection is achieved
initially. The benefits of increased sample replication diminish with each
additional sample, and at some point the increase in the level of detection
is negligible.
27
-------
[Figure 6. Effects of increased sampling effort on the power of the
one-way ANOVA: power vs. minimum detectable difference (percent of mean)
for three levels of sample replication, using the estimated mean and
variance of Data Set 2 with the other design parameters held fixed.]
28
-------
[Figure 7 plots minimum detectable difference (percent of mean) against
number of replicates (up to 16). Fixed design parameters: statistical
significance (α) = 0.05, power (1-β) = 0.80, stations = 4, estimated
variance (σ²) = 0.0243.]
Figure 7. Minimum detectable difference vs. number of replicates for
fixed set of design parameters.
29
-------
Example 5
The effect of an increase in the level of unexplained variability on
the performance of a monitoring program is shown in Figure 8. In this
example, the relationship between minimum detectable difference and number
of replicates is shown for the three levels of variability represented in
the example data sets (Table 2). As indicated, for identical sampling
designs (all other design parameters held constant), there is a substantial
increase in minimum detectable difference between stations with an increase
in unexplained variability. The effect of this increase in the minimum
detectable difference is to reduce the sensitivity of the monitoring program.
30
-------
[Figure 8 plots minimum detectable difference (percent of mean) against
number of replicates for three levels of unexplained variability
(curves A, B, and C). Fixed design parameters: statistical significance
(α) = 0.05, stations = 4, power (1-β) = 0.80.]
Figure 8. Effects of increased unexplained sample variability on the
minimum detectable difference. Coefficients of variation:
A=51.3, B=75.3, C=123.7.
31
-------
SUMMARY
The ODES power analysis tool can be used to evaluate the power of the
one-way fixed effects ANOVA and provides the ability to conduct two basic
types of analysis. However, within these two analysis types, it is possible
to evaluate many combinations of the five basic design parameters affecting
the power of the statistical test.
The two types of power analysis available on ODES correspond to the two
primary intended applications of the power analysis tool. The first type of
power analysis is used to evaluate the power of the one-way ANOVA for
various levels of differences between monitoring stations or, equivalently,
levels of effects between treatments. This type of analysis is described
above in Examples 1-3 and is primarily intended for the evaluation of
existing monitoring data. In this application, referred to here as an a
posteriori analysis, the focus is on the evaluation and interpretation of
statistical analyses in which the null hypothesis has been accepted. As
previously indicated, failure to reject the null hypothesis does not justify
its acceptance. Acceptance of the null hypothesis should instead be followed
by an evaluation of the probability of the corresponding Type II error (i.e.,
the probability of accepting a null hypothesis when it is false). The a
posteriori analysis should be conducted to evaluate the probability of
detecting specific levels of differences between stations or effects
associated with different treatments, given the fixed parameters of the
experimental design. Several recent papers provide examples and discussions
of the application of power analyses in this type of evaluation (Parkhurst
1985; Toft and Shea 1983; Rotenberry and Wiens 1985).
The second type of power analysis, described in Examples 4 and 5, is
used to determine the minimum detectable difference for selected levels of
sample replication. This type of analysis, referred to here as an a priori
analysis of power, is especially useful in the evaluation of proposed
monitoring programs in terms of the ability to correctly detect differences
32
-------
among sampling stations. These analyses can be used to provide a
quantitative comparison of alternative sampling layouts. For example, the
level of sampling effort required to obtain a selected level of sensitivity
in the monitoring program can be determined. Using this type of analysis, it
is also possible to allocate sampling effort between numbers of stations and
replicate samples to obtain specified monitoring program objectives
(e.g., the detection of a difference in the dependent variable equal to
50 percent of the overall mean).
The a priori analyses can also be useful in identifying modifications
to the monitoring program for increased effectiveness. Most environmental
samples are relatively expensive to collect and analyze. Therefore, it is
important to evaluate the cost efficiency of alternative designs, especially
relative to the number of replicate samples. Such analyses can be conducted
after the collection of several data sets to determine the optimum number of
replicate samples needed for the most cost-effective accomplishment of
overall monitoring program objectives.
33
-------
REFERENCES
Cohen, J. 1977. Statistical power analysis for the behavioral sciences.
Academic Press, New York, NY.
Lehmer, E. 1944. Inverse tables of probabilities of error of the second
kind. Ann. Math. Stat. 15:388-398.
Parkhurst, D.F. 1985. Interpreting failure to reject a null hypothesis.
Bull. Ecol. Soc. 66:301-302.
Pearson, E.S., and H.O. Hartley. 1951. Charts of the power function for
analysis of variance tests, derived from the non-central F-distribution.
Biometrika. 38:112-130.
Rotenberry, J.T., and J.A. Wiens. 1985. Statistical power analysis and
community-wide patterns. Am. Nat. 125:164-168.
Scheffe, H. 1959. The analysis of variance. John Wiley & Sons, New York,
NY. 477 pp.
Tang, P.C. 1938. The power function of the analysis of variance tests with
tables and illustrations of their use. Stat. Res. Mem. 2:126-149.
Tetra Tech. 1987. Bioaccumulation monitoring guidance: Strategies for
sample replication and compositing. Final Report. Prepared for U.S. Envir-
onmental Protection Agency Office of Marine and Estuarine Protection. Tetra
Tech, Inc., Bellevue, WA. 51 pp.
Toft, C.A., and P.J. Shea. 1983. Detecting community-wide patterns:
estimating power strengthens statistical inference. Am. Nat. 122:618-625.
Winer, B.J. 1971. Statistical principles in experimental design. McGraw-
Hill, New York, NY.
34
-------
APPENDIX A
POWER CALCULATIONS
-------
APPENDIX A
Power Calculations
The power of the statistical test is the complement of the Type II
error (β), which is mathematically defined as follows:

(A1)   β = ∫[t=0 to Fα] p(t | ν1, ν2, λ) dt

where p(·) is the probability density function of the non-central F-
distribution; ν1 and ν2 are the degrees of freedom for the numerator and
denominator, respectively; λ is the noncentrality parameter; t is a dummy
variable of integration, corresponding to the F-ratio; and Fα is the
critical value (i.e., the value of F corresponding to the selected
significance level α of the test). The probability density function for the
non-central F-distribution can be expressed as an infinite series involving
the Beta function B(·,·) (Pearson and Hartley, 1951):

(A2)   p(t | ν1, ν2, λ) = Σ[j=0 to ∞] {e^(-λ/2) (λ/2)^j / j!}
         × (ν1/ν2)^(ν1/2+j) t^(ν1/2+j-1) [1 + ν1 t/ν2]^(-(ν1+ν2+2j)/2)
         / B(ν1/2+j, ν2/2)
Substituting Eq. (A2) into Eq. (A1) and rearranging terms gives

(A3)   β = Σ[j=0 to ∞] {e^(-λ/2) (λ/2)^j / j!}
         × ∫[u=0 to ν1 Fα/(ν1+2j)] [(ν1+2j)/ν2]^((ν1+2j)/2) u^((ν1+2j)/2-1)
           [1 + (ν1+2j) u/ν2]^(-(ν1+ν2+2j)/2) / B((ν1+2j)/2, ν2/2) du

where u is a new variable of integration, such that

(A4)   u = ν1 t / (ν1 + 2j)
A-1
-------
The distribution function of the central F can be expressed as follows:

(A5)   P(F0 | ν1, ν2) = ∫[t=0 to F0] (ν1/ν2)^(ν1/2) t^(ν1/2-1)
         [1 + ν1 t/ν2]^(-(ν1+ν2)/2) dt / B(ν1/2, ν2/2)

A numerical solution for Eq. (A5) is given by Abramowitz and
Stegun (1972).
By comparing Eqs. (A3) and (A5), the Type II error can be written as
an infinite series of the central F distribution:

(A6)   β = P(Fα | ν1, ν2, λ)
         = Σ[j=0 to ∞] {e^(-λ/2) (λ/2)^j / j!} P(ν1 Fα/(ν1+2j) | ν1+2j, ν2)
Eq. (A6) was programmed in Fortran IV and its results exactly matched
those of Lehmer (1944) to five significant digits using less than 100
summations.
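The original program was written in Fortran IV; an equivalent sketch in
Python (illustrative only, assuming SciPy provides the central F
distribution function of Eq. A5) is:

    import math
    from scipy import stats

    def type_ii_error(f_alpha, v1, v2, lam, terms=100):
        # Evaluate the series in Eq. (A6): beta as a Poisson-weighted sum
        # of central F distribution functions.
        beta = 0.0
        for j in range(terms):
            weight = math.exp(-lam / 2.0) * (lam / 2.0) ** j / math.factorial(j)
            beta += weight * stats.f.cdf(v1 * f_alpha / (v1 + 2 * j),
                                         v1 + 2 * j, v2)
        return beta

The power of the test is then 1 - type_ii_error(Fα, ν1, ν2, λ).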
REFERENCES
Abramowitz, M. and I.A. Stegun. 1972. Handbook of mathematical functions.
National Bureau of Standards, Applied Mathematics Series 55, 1046 pages.
Lehmer, E. 1944. Inverse tables of probabilities of errors of the second
kind. Annals of Mathematical Statistics, Volume 15, pages 388-398.
Pearson, E.S. and H.O. Hartley. 1951. Charts of the power function for
analysis of variance tests, derived from the non-central F-distribution.
Biometrika, Volume 38, pages 112-130.
A-2
-------