United States
Environmental Protection
Agency
Office of Marine
and Estuarine Protection
Washington DC 20460
EPA 430/09-87-005
June 1987
Water
Technical Support Document for
ODES Statistical Power Analysis
-------
TC 3953-03
Final Report
TECHNICAL SUPPORT DOCUMENT FOR
ODES STATISTICAL POWER ANALYSIS
for
U.S. Environmental Protection Agency
Office of Marine and Estuarine Protection
Washington, DC 20460
June 1987
by
Tetra Tech, Inc.
11820 Northup Way, Suite 100
Bellevue, Washington 98005
-------
CONTENTS
Page
LIST OF FIGURES iii
LIST OF TABLES iv
ACKNOWLEDGMENTS v
INTRODUCTION 1
POWER OF STATISTICAL TESTS 2
HYPOTHESIS TESTING 2
POWER ANALYSIS 6
POWER CALCULATIONS 9
ANALYSIS OF VARIANCE 9
EXAMPLE ANALYSES 16
EXAMPLE DATA AND PRELIMINARY ANALYSES 16
Example 1 18
Example 2 20
Example 3 27
Example 4 27
Example 5 30
SUMMARY 32
REFERENCES 34
APPENDIX A A-1
-------
FIGURES
Number Page
1 Hypothesis testing: possible circumstances and test outcomes 3
2 Probability densities of the F statistic 5
3 Power of the F-test vs. minimum detectable difference for
specified design parameters 19
4 Effects of increased unexplained sample variability on the
power of the one-way ANOVA 21
5 Power of the F-test vs. minimum detectable difference for
specified design parameters 24
6 Effects of increased sampling efforts on the power of the
one-way ANOVA 28
7 Minimum detectable difference vs. number of replicates for
fixed set of design parameters 29
8 Effects of increased unexplained sample variability on the
minimum detectable difference 31
iii
-------
TABLES
Number Page
1 Analysis of variance table for one-way layout 11
2 Example data sets 17
3 Measured contaminant concentrations in fish tissue (mg/kg)
at six monitoring stations and one-way ANOVA results for
tests of differences among observed mean concentrations 23
4 Log-transformed contaminant concentrations in fish tissue
(mg/kg) at six monitoring stations and one-way ANOVA results
for tests of differences among mean concentrations 26
iv
-------
ACKNOWLEDGMENTS
This document has been reviewed by the 301(h) Task Force of the Environ-
mental Protection Agency, which includes representatives from the Water
Management Divisions of U.S. EPA Regions I, II, III, IV, IX, and X; the
Office of Research and Development, Environmental Research Laboratory-
Narragansett (located in Narragansett, RI, and Newport, OR); and the Marine
Operations Division in the Office of Marine and Estuarine Protection, Office
of Water.
This technical guidance document was produced for the U.S. Environmental
Protection Agency under the 301(h) post-decision technical support contract
No. 68-01-6938, Allison J. Duryee, Project Officer. This report was
prepared by Tetra Tech, Inc., under the direction of Dr. Thomas C. Ginn.
The primary author was Mr. Thomas M. Grieb. The computer program for the
power analysis tool was developed by Mr. Thomas M. Grieb and Dr. Michael J.
Ungs. Ms. Marcy B. Brooks-McAuliffe performed technical editing and
supervised report production.
-------
INTRODUCTION
The Ocean Data Evaluation System (ODES) provides users with a wide
range of statistical tools for analyzing monitoring data in the ODES
database. One of the most valuable tools in ODES for assessing discharge-
related effects is Analysis of Variance (ANOVA). This tool enables the
statistical evaluation of differences in biological and chemical variables
among sampling stations (e.g., discharge vs. control). As a companion to the
ANOVA tool, ODES also contains a Statistical Power Analysis Tool that is
used in the design of new monitoring programs and in the interpretation of
ANOVA test results.
In simple terms, statistical power analysis is the evaluation of the
ability to detect significant statistical results when real differences
exist in a particular monitoring variable. Application of the tool enables
the investigation of the statistical implications of alternative sampling
strategies (e.g., numbers of sample replicates or sampling stations). This
application is especially useful in designing new monitoring programs or in
evaluating the effectiveness (or cost efficiency) of existing programs.
Power analysis is also an important follow-up procedure when ANOVA is
used to detect discharge-related effects. In such cases, power analysis
can be used to assess the possibility that an absence of significant ANOVA
results is caused by an inadequate sampling design.
As a supplement to the ODES Tool, this document provides a review of
the basic concepts of hypothesis testing and statistical power analysis.
The kinds of power analyses that can be conducted using ODES are described,
and the uses of the tool are illustrated with several examples.
-------
POWER OF STATISTICAL TESTS
HYPOTHESIS TESTING
The statistical tests available in ODES are often applied in the
evaluation of monitoring data to test a particular scientific hypothesis.
ANOVA is a frequently used test that enables the evaluation of statistical
significance of observed differences in the values of measured environmental
variables. In the most basic application of ANOVA for environmental effects
data, a number (n) of replicate samples is collected at a number (k) of
fixed sampling stations or sampling events. This application is referred to
as a fixed-effects one-way design because only a single factor (e.g., sampling
stations) is evaluated in each analysis. The one-way ANOVA design also
tests only a single hypothesis during each application. For example, in a
design involving multiple samples of an environmental variable at several
sampling stations, the ANOVA tests the null hypothesis that the effects of
station location on the variable are not statistically significant (i.e., that
all station means are equal). The test of the null hypothesis is based on
the F-statistic, which is a ratio of the variability among ANOVA groups
(e.g., sampling stations) to the variability within groups.
The testing circumstances and outcomes associated with testing the null
hypothesis are shown in Figure 1. Four possible outcomes exist:
1. The hypothesis is true and it is not rejected.
2. The hypothesis is true and it is rejected.
3. The hypothesis is false and it is not rejected.
4. The hypothesis is false and it is rejected.
-------
[Figure 1. Hypothesis testing: possible circumstances and test outcomes.
The figure is a two-by-two decision table: the true state of the null
hypothesis (true or false) against the test decision (accept or reject),
with the two incorrect outcomes shaded.]
-------
The shaded areas shown in Figure 1 represent incorrect decisions. The
incorrect rejection of the null hypothesis is referred to as a Type I
error. The probability of a Type I error, designated α, represents the
significance level of the statistical test. The incorrect acceptance of the
null hypothesis is referred to as the β error, where β represents the
probability of this incorrect decision. The β error is also known as the
Type II error. The probabilities of the correct acceptance and rejection of
the null hypothesis are represented by the complements of the Type I and
Type II errors (i.e., 1-α and 1-β), respectively.
The probability densities of the test statistic for the ANOVA (i.e.,
the F statistic) are shown in Figure 2 for test conditions corresponding to
a true null hypothesis and an alternative hypothesis. Under the alternative
hypothesis (H0 is false), there is a fixed but unspecified effect due to
sampling location so that the mean values among stations are not equal.
These distributions will be used to demonstrate the relationship between the
four possible outcomes of the hypothesis testing process shown in Figure 1
and to provide a probabilistic interpretation of the power of a statistical
test.
The probability density of F when the null hypothesis is true is shown
in Figure 2a. This figure shows the probability of a Type I or α error as
the shaded area under the curve to the right of Fα [i.e., the value of F
corresponding to the selected significance level of the test (α)]. Values
of F obtained in the test of significance that are greater than Fα will lead
to a rejection of a true null hypothesis (a Type I error), and these values
are said to represent the critical region of the test. Values of F obtained
in the test of significance that are less than the critical value (Fα) lead
to acceptance of the true null hypothesis, and these values represent the
acceptance region of the test. The probability of the correct acceptance of
the true null hypothesis is therefore 1-α and is represented by the unshaded
area under the curve.
The corresponding probability density of F for the alternative
hypothesis (H0 is false) is shown in Figure 2b. Under the alternative
hypothesis, the distribution is shifted to the right as the expected value
-------
[Figure 2 layout: panel (a) shows the probability density of F when the
null hypothesis is true, with the Type I error (α) shaded in the critical
region to the right of Fα; panel (b) shows the density of F when the null
hypothesis is false, with the Type II error (β) shaded to the left of Fα
and the power of the test (1-β) falling in the critical region.]
Figure 2. Probability densities of the F statistic.
-------
of the test statistic (F) increases. However, the value of the rejection
criterion (Fα) remains unchanged, and the shaded area under the curve to the
right of Fα now represents the probability of correctly rejecting the null
hypothesis when it is false (i.e., of detecting a statistical difference
when one actually exists). This probability is referred to as the power of
the statistical test. The complement of the power of the test is the
probability of accepting a false null hypothesis (β). The β or Type II
error is shown in Figure 2b as the shaded area under the curve to the left
of Fα.
With the probability densities in Figure 2, it is possible to
demonstrate a dependency between Type I and Type II errors in the comparison
of H0 and the fixed alternative hypothesis H1. For example, the
probability of rejecting a true null hypothesis (Type I error) can be
minimized by decreasing α. This is equivalent to moving Fα to the right and
decreasing the critical regions in both Figures 2a and b. However, as can
be seen in Figure 2b, the decrease in the Type I error (α) achieved in this
manner is accompanied by an increase in the Type II or β error.
While this relationship between α and β exists for the comparison of
any fixed alternatives or any given statistical tests, this type of analysis
ignores other sampling parameters such as level of sampling effort and
variability within the sampling environment. As described below, it is
possible to decrease β while holding α constant. The emphasis in this
description, however, is on the evaluation of the power (1-β) of the
statistical test under various sampling conditions.
POWER ANALYSIS
The power of the one-way fixed effects ANOVA, or the probability of
correctly rejecting the false null hypothesis (1-β), is determined by the
following five design parameters:
• Significance level of the test (α)
• Number of sampling stations
-------
• Number of replicates
• Minimum detectable difference (i.e., the smallest difference
that can be detected among means of the fixed-effects
variable)
• Unexplained sample variance (i.e., natural variability within
the sampling environment).
The relationship between the power of a statistical test and the design
parameters makes several types of power analyses possible. For example, the
power of the test can be determined as a function of the five design
parameters. Alternatively, the value for any individual design parameter
required to obtain a specified power of the statistical test can be deter-
mined as a function of the other four parameters. It is this latter type of
analysis that can be used to evaluate methods for decreasing the Type II
error (β) or, equivalently, increasing the power of the test while holding α
constant.
Two basic applications of the ODES power analysis tool are described in
this document. The first is in the evaluation of reported results of
statistical tests of significance. For example, acceptance of the null
hypothesis at some specified significance level does not imply that it is
true and, therefore, does not demonstrate the absence of differences in the
dependent variable of interest. In reference to the above discussion on
hypothesis testing, the acceptance of the null hypothesis does not provide
information on the probability of the Type II error or of accepting a null
hypothesis that is false. The probability of the Type II error should be
investigated as a matter of course. Power analyses should be conducted to
evaluate the ability of the statistical test to detect the existence of
effects given the values of the remaining design parameters. In the Example
Analyses section below, calculations are presented to evaluate the
probability of detecting specific levels of differences in the mean value
between sampling stations.
-------
The second basic application of the power analysis tool is the
evaluation of the performance of monitoring programs. This application can
be used to select study design specifications for proposed monitoring
programs or to evaluate the effectiveness of existing monitoring programs.
When existing data are available for a selected monitoring variable, power
calculations provide a quantitative comparison of alternative sampling
layouts. For example, using historical data, the minimum detectable
difference in a selected monitoring variable can be determined as a function
of the number of sample replicates. Examples of these calculations are also
provided in a subsequent section.
8
-------
POWER CALCULATIONS
Just as there are many types of power analyses, there are also several
different procedures that can be used to calculate the power of the statis-
tical test for any particular analysis. For example, several methods for
calculating the power of the ANOVA have been described. For the most part,
these methods involve the use of look-up tables or nomographs (e.g., Scheffe
1959; Pearson and Hartley 1951; Tang 1938; Lehmer 1944; Winer 1971; and
Cohen 1977). However, there are differences in the nomenclature associated
with the description of these methods and the associated tables. This lack
of conformity can cause confusion in comparing the different formulas.
There are also many different ways to formulate the power analysis
calculation. For example, Cohen (1977) provided three different formulations
of the power test for the ANOVA based on the assumed degree of departure
from the null hypothesis (no effect).
To provide a good understanding of the power calculations performed by
the ODES power analysis tool, a complete description of the methods used is
presented below. This description includes a review of the statistical test
and the formulation of the power calculation performed by the ODES tool.
ANALYSIS OF VARIANCE
In the evaluation of environmental monitoring data, ANOVA techniques
can be used to relate explicitly observations of interest (e.g., chemical
concentrations in marine sediments) to various environmental factors and
random errors. This partitioning of field observations can be demonstrated
with the ANOVA experimental model shown in Equation (1), which partitions a
single observation (Yij) into several components:

Yij = μ + γi + εij   (1)
-------
where:
Yij = Observation at Station i and Replicate j of, for example, the
concentration of a selected chemical
μ = Mean of all Yij observations
γi = Effect of the ith level of an environmental factor (e.g., station
location)
εij = Random errors not accounted for by either μ or γi.
Under the example model formulation, the effects of environmental factors
(e.g., station location) on individual observations can be tested for statis-
tical significance. The null hypothesis tested is that the station location
has no effect on observed contaminant concentrations, or stated
formally: γ1 = γ2 = ... = γI = 0. Similarly, more complex models can be formulated
to test for the effect of more than one environmental factor as well as the
statistical significance of interactions among factors.
The results of a one-way ANOVA are usually summarized in a manner
similar to that shown in Table 1. The test statistic is the F ratio, which
is the ratio of the between-groups mean square (BMS) to the within-groups
mean square (WMS). As indicated in Table 1, the WMS is an unbiased estimate
of the population variance, while the expected value of the BMS is
represented by the sum of the population variance and another term
representing the actual fixed effects. This added quantity is:

(I-1)⁻¹ Σ Ji (γi - γ̄)²

where:
I = The number of sampling stations
Ji = The number of replicates at the ith station
γi = The true value of the ith effect
γ̄ = The mean of the treatment effects.
Under the null hypothesis, the value of the actual fixed effects term is 0,
and the expected value of the F ratio is equal to 1. When fixed effects are
10
-------
TABLE 1. ANALYSIS OF VARIANCE TABLE FOR ONE-WAY LAYOUT
Source           Sum of Squares         d.f.   Mean Square   E(MS)
Between groups   SSB = Σ Ji(ȳi - ȳ)²    I-1    SSB/(I-1)     σ² + (I-1)⁻¹ Σ Ji(γi - γ̄)²
Within groups    SSW = ΣΣ(yij - ȳi)²    n-I    SSW/(n-I)     σ²
Total            ΣΣ(yij - ȳ)²           n-1

where:
yij = Observation at group (station) i and replicate j
ȳi = ith group mean
ȳ = Overall mean of all i, j observations
I = Number of sampling stations
n = Total number of observations
SSB = Between-groups sum of squares
SSW = Within-groups sum of squares
E(MS) = Expected value of the mean square
Ji = Number of replicates at the ith station
γi = True value of the ith effect
γ̄ = Mean of the treatment effects
σ² = Population variance.
11
-------
observed in the monitoring program, the value of this quantity increases and
results in an increase in the value of the numerator of the F ratio. Large
effects will result in an increase in the power of the test (i.e., the
probability of rejecting a false null hypothesis).
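The quantities in Table 1 translate directly into a computation. The
following sketch is written in Python purely for illustration (ODES itself
was not implemented in Python; NumPy and SciPy are assumed available) and
uses only the definitions above:

    import numpy as np
    from scipy import stats

    def one_way_anova(groups):
        # groups: one array of replicate measurements per station
        groups = [np.asarray(g, dtype=float) for g in groups]
        I = len(groups)                              # number of stations
        n = sum(len(g) for g in groups)              # total observations
        grand = np.concatenate(groups).mean()        # overall mean
        ssb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
        ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
        bms = ssb / (I - 1)                          # between-groups mean square
        wms = ssw / (n - I)                          # within-groups mean square
        f_ratio = bms / wms
        p_value = stats.f.sf(f_ratio, I - 1, n - I)  # upper-tail probability
        return f_ratio, p_value, wms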
In performing power analyses, a set of effects is assumed. However,
when a sample design involves several station locations, many different sets
of effects can be assumed. For example, alternative hypotheses can be
constructed such that actual station effects of a certain magnitude occur at
one, two, three, or more of the total number of sampling locations.
Additionally, as mentioned above, Cohen (1977) described methods for
calculating the power of the F test corresponding to "small," "medium," and
"large" effects. The magnitude of the effects can also be varied among
stations, so that an infinite number of alternative hypotheses can be
constructed for evaluation in power analyses.
The power analyses conducted by ODES are formulated to provide a
conservative estimate of statistical power. Using this conservative
approach, alternative hypotheses are constructed such that the effects occur
in the combination that is the most difficult to detect. Scheffe (1959)
showed that this conservative set of effects is defined by:
|γi - γj| = Δ ;   γk = (γi + γj)/2 , for all k ≠ i or j   (2)

where:
Δ = The maximum difference in actual effects
γk = The true value of the kth effect.

Equation (2) states that the two effects (γi and γj) associated with
the hypothesis of interest differ by Δ, while all other effects (γk) are
equal to the mean of these two. For the maximum difference in effects equal
to Δ, this arrangement gives the lowest test power.
As illustrated in Figure 2, determining the power of a test involves
the calculation of the area under the curve in the critical region of the
12
-------
noncentral F probability density (i.e., the probability density of F when
the null hypothesis is false). This amounts to integrating the density
function over this critical region. The appropriate mathematical expression
is:

Power = 1 - β = ∫[Fα to ∞] p(F | ν1, ν2, λ) dF   (3)

where:
ν1 = Numerator degrees of freedom
ν2 = Denominator degrees of freedom
λ = Noncentrality parameter (defines the shape of the noncentral
distribution).
The numerical integration methods used to solve Equation (3) are described
in Appendix A. Scheffe (1959) showed that the noncentrality parameter in
Equation (3) satisfies the following relationship:

λσ² = J Σ (γi - γ̄)²   (4)

where:
σ² = Population variance
J = Number of replicate samples at each station.
Using this information and assuming equal numbers of replicate samples (J),
the expected value of the between-groups mean square (BMS) in the ANOVA table
(Table 1) can be rewritten as:

E(BMS) = σ² + (I-1)⁻¹ σ²λ   (5)
Combining Equation (5) with the relationship expressed in Equation (4)
provides the necessary information to solve for the noncentrality parameter.
The value of the noncentrality parameter can then be used to characterize the
13
-------
noncentral density function and provide a basis for solving Equation (3) to
determine the power of the test. Under the conditions imposed in Equa-
tion (2),

|γi - γ̄| = Δ/2   (6)

for the two extreme γi, and

|γk - γ̄| = 0

for the others. Therefore,

σ²λ = J Δ²/2 , and   (7)

λ = J Δ²/(2σ²)   (8)
As previously stated, the ODES power analysis tool can be used to
perform two types of analyses for the one-way ANOVA. In the first type of
analysis, the ODES tool is used to determine the probability of detection
vs. specified values of minimum detectable difference. For this analysis,
the user must specify the significance level of the test, the number of sta-
tions, the number of replicates at each station, and an estimate of the
unexplained variability. This analysis requires the solution of Equation (8)
to determine the noncentrality parameter and subsequently the solution of
Equation (3).
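As a concrete illustration, a minimal sketch of this first type of analysis
follows (the same illustrative Python assumptions as above, with SciPy's
noncentral F distribution standing in for the numerical integration
described in Appendix A):

    from scipy import stats

    def anova_power(alpha, stations, replicates, delta, variance):
        # Power of the one-way fixed-effects ANOVA under the conservative
        # effects arrangement of Equation (2); delta is the minimum
        # detectable difference in the original units of measurement.
        v1 = stations - 1                                  # numerator d.f.
        v2 = stations * (replicates - 1)                   # denominator d.f.
        lam = replicates * delta ** 2 / (2.0 * variance)   # Equation (8)
        f_crit = stats.f.ppf(1.0 - alpha, v1, v2)          # critical value F_alpha
        return stats.ncf.sf(f_crit, v1, v2, lam)           # Equation (3)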
In the second type of analysis, the minimum detectable difference (A)
is calculated for varying numbers of replicate samples at each station. In
this analysis, the values of the other design parameters [i.e., the signifi-
14
-------
cance level (α), number of stations, and unexplained sample variance] are
fixed, and Equation (7) is solved for Δ:

Δ = (2λσ²/J)^(1/2)   (9)
The formulation of the power calculation in this manner requires the inverse
solution for the noncentral F density function (Equation 3) [i.e., given the
power and appropriate degrees of freedom, solve for the noncentrality
parameter (λ)].
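One way to perform this inverse solution, sketched under the same
illustrative assumptions as the anova_power() function above, is a
numerical search over Δ, since power increases monotonically with Δ:

    from scipy import optimize

    def min_detectable_difference(alpha, stations, replicates, variance,
                                  target_power=0.80):
        # Find the delta at which Equation (3) yields the target power,
        # i.e., solve Equation (9) without analytically inverting the
        # noncentral F distribution.
        def gap(delta):
            return anova_power(alpha, stations, replicates,
                               delta, variance) - target_power
        hi = 1.0
        while gap(hi) < 0.0:      # widen the bracket until power exceeds target
            hi *= 2.0
        return optimize.brentq(gap, 1e-9, hi)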
15
-------
EXAMPLE ANALYSES
The application of the ODES power analysis tool for the one-way fixed
effects ANOVA is presented below in several example analyses. The purpose
of these examples is twofold. First, these examples summarize the informa-
tion presented in this guidance document. They are intended to demonstrate
the concepts presented and to familiarize the ODES user with the capabili-
ties and potential uses of the power analysis tool. Second, the examples
are intended to demonstrate the use of the ODES power analysis tool in the
design and evaluation of monitoring programs.
EXAMPLE DATA AND PRELIMINARY ANALYSES
Data used in the examples provided below are summarized in Table 2.
For each data set, values for the estimated mean, residual error variance,
and coefficient of variation are presented. These data were selected from
historical data compiled in a previous report (Tetra Tech 1987).
From Equations (7) and (8) above, it is clear that an estimate of the
unexplained sample variance (i.e., the natural variability not accounted for
by the statistical model) is required to conduct power analyses. This
estimate can be obtained by conducting a site-specific preliminary study or,
alternatively, by using existing sampling data. The unexplained sample
variance is one of the five design parameters described above, and it can be
viewed as an estimate of the denominator in the F ratio used to evaluate the
significance of the ANOVA statistical tests. This quantity is shown in the
one-way ANOVA table (Table 1) as the within-groups mean square and represents
the average variance within groups. Where sample data are available, this
design parameter can be estimated in one of two ways. First, a preliminary
ANOVA can be conducted and the value of the within-groups mean square (WMS)
used. Second, the sample variance can be computed from all available data,
ignoring sample location. The first value provides an estimate of the
variance that is unexplained by the statistical model. Therefore, if the
16
-------
TABLE 2. EXAMPLE DATA SETS
                                Estimated Residual     Coefficient of
Data Set   Estimated Mean (X̄)   Error Variance (s²)    Variation (s/X̄, percent)
1          0.304                0.0243                 51.3
2          0.766                0.3324                 75.3
3          5.067                39.2918                123.7
17
-------
effects of sample locations are found to be significant in the F test
conducted with the ANOVA, the within-groups mean square will have a value
less than the overall sample variance.
Since the residual error variance design parameter is an estimate of
the denominator in the F ratio, it can be seen that the overall sample
variance obtained from existing data provides a conservative estimate for
the purposes of conducting power analyses. However, where available data
can be fit to the ANOVA model, the within-groups mean square estimate
provides a more realistic estimate of the expected value of the denominator
in the F ratio. In the examples provided below, estimates of the residual
error variance (unexplained sample variance) were obtained from the within-
groups mean square after analyzing the data using a one-way ANOVA.
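The two candidate estimates can be compared directly. A short sketch (same
illustrative Python assumptions, reusing the one_way_anova() function above
with the contaminant data presented later in Table 3):

    import numpy as np

    data = [[16.0, 4.2, 5.0, 2.3, 4.5],    # Table 3, station 1
            [3.7, 32.0, 3.8, 3.7, 4.2],    # station 2
            [1.8, 2.2, 3.2, 7.9, 3.0],     # station 3
            [1.1, 1.3, 1.5, 0.5, 3.5],     # station 4
            [2.4, 1.8, 2.8, 3.0, 2.4],     # station 5
            [5.3, 2.9, 3.4, 18.0, 4.6]]    # station 6
    _, _, wms = one_way_anova(data)              # within-groups mean square
    overall = np.concatenate(data).var(ddof=1)   # variance ignoring stations
    # wms (39.29) is smaller than the overall variance (40.33) to the extent
    # that station location explains part of the observed variability.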
Example 1
The first example demonstrates the use of the ODES power analysis tool
to determine the power of the fixed effects one-way ANOVA for proposed
monitoring designs. The output generated for this first type of power
analysis is presented in Figure 3. For a fixed set of sampling design
parameters [fixed number of stations, replicate samples, significance level
(a), and unexplained sample variance], the probability of correctly rejecting
a false null hypothesis (i.e., the power of the test) is shown as a function
of the minimum detectable difference between stations, expressed as a
percentage of the overall mean.
To generate this power curve, values for the number of stations (4),
sample replicates (5), significance level (a=0.05), estimated variance
(0.0243), and estimated mean (0.304) were entered as input to the power
analysis tool. The estimated values of the mean and variance were obtained
from Data Set 1 (Table 2). The values of the other design parameters were
selected to evaluate the proposed example monitoring program design.
Suppose, for example, that the objective of the proposed monitoring
program is to detect differences in measured values equal to the overall
mean among all four stations (i.e., 100 percent of the mean). The results
18
-------
[Figure 3 plots power (probability of detection) against minimum
detectable difference, expressed as a percentage of the mean, for the
fixed design parameters listed above: stations = 4, replicates = 5,
significance level (α) = 0.05, estimated variance (σ²) = 0.0243, and
estimated mean = 0.304.]
19
-------
presented in Figure 3 indicate that under the proposed design, the proba-
bility of correctly detecting a difference equal to the overall mean (100
percent of the mean value) is approximately 0.63.
In light of the previous discussion of the conservative nature of the
ODES power analysis, these results indicate that given mean values of 0.304
at two of the four stations and values of 0.152 and 0.456 at the remaining
stations, the difference between the extreme values (equal to the overall
mean) will be detected in the test of significance with a probability of
approximately 0.60. In a similar manner, the other points on this curve can
be used to determine the power of the test for a wide range of differences
in mean values. For example, these results also indicate that the proposed
design would have a very low power if the objective of the monitoring
program was to detect station differences equal to 50 percent of the mean.
In this case, the estimated statistical power would be only about 0.20.
Given these results, different values for the fixed design parameters (e.g.,
number of replicate samples) could be entered in the analysis to evaluate
alternative designs.
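For reference, the value read from Figure 3 can be approximated with the
anova_power() sketch given earlier (an illustrative call, not actual ODES
output):

    power = anova_power(alpha=0.05, stations=4, replicates=5,
                        delta=0.304, variance=0.0243)
    # delta = 0.304 is 100 percent of the Data Set 1 mean; the computed
    # power is roughly 0.6, consistent with the curve in Figure 3.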
Example 2
The example data sets in Table 2 exhibit increasing levels of sample
variability. The selected coefficients of variation are 51.3, 75.3, and
123.7, respectively. From the previous discussion and intuitively, it can
be seen that an increase in the level of unexplained sample variance results
in a decrease in the power of the test. Equivalently, associated with an
increase in sample variability, there is an increase in the minimum differ-
ence that can be detected for a fixed level of statistical power.
The effect of increased sample variability is shown in Figure 4 for the
three example data sets. The power curve (A) presented in Figure 4 is the
same one presented in Figure 3. In this case, the minimum detectable
difference that can be detected with a power of 0.80 is approximately 1.2
times the overall mean. For the same level of power, the minimum differences
that can be detected for levels of sampling variance corresponding to
20
-------
[Figure 4. Effects of increased unexplained sample variability on the
power of the one-way ANOVA: power vs. minimum detectable difference
(percent of mean) for the three example data sets (curves A, B, and C;
coefficients of variation 51.3, 75.3, and 123.7), with fixed design
parameters: significance level (α) = 0.05, stations = 4, replicates = 5.]
21
-------
coefficients of variation equal to 75.3 and 123.7 are 1.8 and 3 times the
value of the overall mean, respectively.
The relationship between sample variability and the power of the one-
way ANOVA can also be demonstrated using individual data sets. For example,
the individual measurements from Data Set 3 (Table 2) and the results of a
one-way ANOVA for tests of differences among observed means at six stations
are presented in Table 3. From these data, we see that the maximum differ-
ence between observed mean values at the six monitoring stations is 7.9
mg/kg [9.5 (STA 2)-1.6 (STA 4)] or 156 percent of the overall mean, and that
this observed difference between stations is not statistically significant
(p=0.36).
Given the relatively high level of the estimated sample variability as
indicated by the coefficient of variation (123.7) given in Table 2, these
results are not unexpected. However, power analyses can be used to evaluate
the example monitoring program results in terms of the probability of
detecting the maximum differences between observed mean values at the six
monitoring stations (7.9 mg/kg or approximately 160 percent of the overall
mean).
Figure 5(B) shows the results of a power analysis conducted for the
following fixed design parameters: stations (6), replicates (5), statistical
significance [(α) = 0.05], and estimated variance [(σ²) = 39.3] obtained
from the data given in Table 3. These results indicate that the probability
of detecting statistically significant differences among stations equal to
160 percent of the overall mean is less than 0.30. In other words, given the
example study design, there is a small probability of statistically verifying
the significance of observed differences among the monitoring stations.
One strategy that is often used in these types of analyses is to
transform the observed values to meet the variance assumptions for ANOVA.
As shown below, the ODES power analysis tool can be used to evaluate the
statistical implication of this strategy.
22
-------
TABLE 3. MEASURED CONTAMINANT CONCENTRATIONS IN FISH
TISSUE (mg/kg) AT SIX MONITORING STATIONS AND ONE-WAY
ANOVA RESULTS FOR TESTS OF DIFFERENCES AMONG
OBSERVED MEAN CONCENTRATIONS
                                 Stations
Replicate      1      2      3       4      5      6
1           16.0    3.7    1.8     1.1    2.4    5.3
2            4.2   32.0    2.2     1.3    1.8    2.9
3            5.0    3.8    3.2     1.5    2.8    3.4
4            2.3    3.7    7.9     0.5    3.0   18.0
5            4.5    4.2    3.0     3.5    2.4    4.6
X̄            6.4    9.5    3.6     1.6    2.5    6.8

Overall Mean = 5.07

ANOVA Table

Source           D.F.   Sum of Squares   Mean Square   F Ratio   F Prob.
Between groups    5        226.7026        45.3405      1.154    0.3601
Within groups    24        943.0036        39.2918
Total            29      1,169.7061
23
-------
[Figure 5. Power of the F-test vs. minimum detectable difference (percent
of mean) for fixed design parameters: significance level (α) = 0.05,
stations = 6, replicates = 5. Curve A is based on the log-transformed data
(estimated variance = 0.46); curve B on the untransformed data (estimated
variance = 39.3).]
24
-------
Log-transformed values of observed concentrations in Data Set 3 along
with ANOVA test results for differences among the mean values in log space
are presented in Table 4. These results indicate that the relative sample
variability is reduced in log space [e.g., the coefficient of variation
(s/X̄) in log space is 66.8, and the differences among stations are
statistically significant (p=0.013)]. Furthermore, power analyses shown in
Figure 5(A) indicate that the probability of detecting statistically
significant differences among stations equal to the maximum differences
between observed mean values in log space at the six monitoring stations
[1.77 (STA 2)-0.26 (STA 4) = 1.51 or approximately 120 percent of the
overall mean] is approximately 0.70.
In assessing monitoring program design it is important to evaluate the
changes that can be detected in the original units of measurement rather
than the transformed values. The following conversion relationship between
logarithmic parameters and arithmetic parameters of a lognormal distribution
can be used:
μ = exp(μln + 0.5 σln²)

where:
μ = Arithmetic mean
μln = Mean of the log-transformed values
σln² = Variance of the log-transformed values.
The overall mean (1.229) and within-groups mean square (0.4609) from Table 4
are used as estimates of μln and σln², respectively. Using this relationship
and the results shown in Figure 5(A), the probability of detecting
statistically significant differences in contaminant concentrations in fish
tissue of approximately 5.7 mg/kg among stations with the existing monitoring
program design and using log-transformed values is 0.70.
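The 5.7 mg/kg figure can be reproduced by applying the conversion
relationship to the log-space difference (a minimal sketch under the same
illustrative Python assumptions as earlier):

    import math

    delta_ln = 1.77 - 0.26    # maximum difference of log-space station means
    var_ln = 0.4609           # within-groups mean square in log space (Table 4)
    delta = math.exp(delta_ln + 0.5 * var_ln)   # approximately 5.7 mg/kg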
25
-------
TABLE 4. LOG-TRANSFORMED CONTAMINANT CONCENTRATIONS
IN FISH TISSUE (mg/kg) AT SIX MONITORING STATIONS
AND ONE-WAY ANOVA RESULTS FOR TESTS OF
DIFFERENCES AMONG MEAN CONCENTRATIONS
                                  Stations
Replicate       1       2       3        4       5       6
1            2.773   1.308   0.588    0.095   0.875   1.668
2            1.435   3.466   0.788    0.262   0.588   1.065
3            1.609   1.335   1.163    0.405   1.030   1.224
4            0.833   1.308   2.067   -0.693   1.099   2.890
5            1.504   1.435   1.099    1.253   0.875   1.526
X̄            1.63    1.77    1.14     0.26    0.89    1.67

Overall Mean = 1.229

ANOVA Table

Source           D.F.   Sum of Squares   Mean Square   F Ratio   F Prob.
Between groups    5         8.5186         1.7037       3.697    0.0127
Within groups    24        11.0605         0.4609
Total            29        19.5791
26
-------
Example 3
The power of the test is also affected by changes in the other design
parameters. For example, the effects of changes in the number of replicate
samples at each station on the power of the one-way ANOVA are shown in
Figure 6. In this example, corresponding to Data Set 2 in Table 2, the
power has been calculated for three levels of sample effort. As indicated,
the power of the test increases for an increase in the number of replicate
samples at each station.
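Curves like those in Figure 6 can be regenerated by sweeping the replicate
count with the anova_power() sketch given earlier (the replicate levels and
station count below are illustrative assumptions, not the exact values used
to draw Figure 6):

    # Power at a difference equal to 100 percent of the Data Set 2 mean;
    # the station count (4) is assumed for illustration.
    for replicates in (3, 5, 10):      # assumed levels of sampling effort
        p = anova_power(alpha=0.05, stations=4, replicates=replicates,
                        delta=0.766, variance=0.3324)
        print(replicates, round(p, 2))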
Example 4
The ODES power analysis tool can also be used to determine the minimum
detectable difference as a function of the number of replicate samples at
each station. In this type of analysis, the power of the test as well as
the other design parameters (i.e., number of stations, significance level,
and estimated sample variance) are held constant. For these analyses the
power of the test was arbitrarily set at 0.80. The results of this type of
analysis are shown for Data Set 1 (Table 2) in Figure 7. As indicated in
Figure 7, the minimum detectable difference decreases with an increase in
the number of replicate samples at each station. In other words, the
ability of the ANOVA to detect significant differences among sampling
stations increases with an increase in the level of sample replication.
Results of this analysis indicate, for example, that with three replicate
samples at four stations a difference of approximately 1.8 times the overall
mean value can be detected between stations with a probability of 0.80. A
difference between stations approximately equal to the overall mean value
can be detected with seven replicates at each station.
Figure 7 also illustrates an important concept in the use of power
analyses to evaluate alternative sampling designs. While increasing the
number of replicates always increases the level of detection between
stations, a disproportionate increase in the level of detection is achieved
initially. The benefits of increased sample replication diminish with each
additional sample, and at some point the increase in the level of detection
is negligible.
27
-------
[Figure 6. Effects of increased sampling effort on the power of the
one-way ANOVA: power vs. minimum detectable difference (percent of mean)
for three levels of sample replication, using the estimated mean and
variance of Data Set 2 with the other design parameters held fixed.]
28
-------
[Figure 7 plots minimum detectable difference (percent of mean) against
number of replicates (up to 16). Fixed design parameters: statistical
significance (α) = 0.05, power (1-β) = 0.80, stations = 4, estimated
variance (σ²) = 0.0243.]
Figure 7. Minimum detectable difference vs. number of replicates for
fixed set of design parameters.
29
-------
Example 5
The effect of an increase in the level of unexplained variability on
the performance of a monitoring program is shown in Figure 8. In this
example, the relationship between minimum detectable difference and number
of replicates is shown for the three levels of variability represented in
the example data sets (Table 2). As indicated, for identical sampling
designs (all other design parameters held constant), there is a substantial
increase in minimum detectable difference between stations with an increase
in unexplained variability. The effect of this increase in the minimum
detectable difference is to reduce the sensitivity of the monitoring program.
30
-------
[Figure 8 plots minimum detectable difference (percent of mean) against
number of replicates for three levels of unexplained variability
(curves A, B, and C). Fixed design parameters: statistical significance
(α) = 0.05, stations = 4, power (1-β) = 0.80.]
Figure 8. Effects of increased unexplained sample variability on the
minimum detectable difference. Coefficients of variation:
A=51.3, B=75.3, C=123.7.
31
-------
SUMMARY
The ODES power analysis tool can be used to evaluate the power of the
one-way fixed effects ANOVA and provides the ability to conduct two basic
types of analysis. However, within these two analysis types, it is possible
to evaluate many combinations of the five basic design parameters affecting
the power of the statistical test.
The two types of power analysis available on ODES correspond to the two
primary intended applications of the power analysis tool. The first type of
power analysis is used to evaluate the power of the one-way ANOVA for
various levels of differences between monitoring stations or, equivalently,
levels of effects between treatments. This type of analysis is described
above in Examples 1-3 and is primarily intended for the evaluation of
existing monitoring data. In this application, referred to here as an a
posteriori analysis, the focus is on the evaluation and interpretation of
statistical analyses in which the null hypothesis has been accepted. As
previously indicated, failure to reject the null hypothesis does not justify
its acceptance. Acceptance of the null hypothesis should instead be followed
by an evaluation of the probability of the corresponding Type II error (i.e.,
the probability of accepting a null hypothesis when it is false). The a
posteriori analysis should be conducted to evaluate the probability of
detecting specific levels of differences between stations or effects
associated with different treatments, given the fixed parameters of the
experimental design. Several recent papers provide examples and discussions
of the application of power analyses in this type of evaluation (Parkhurst
1985; Toft and Shea 1983; Rotenberry and Wiens 1985).
The second type of power analysis, described in Examples 4 and 5, is
used to determine the minimum detectable difference for selected levels of
sample replication. This type of analysis, referred to here as an a priori
analysis of power, is especially useful in the evaluation of proposed
monitoring programs in terms of the ability to correctly detect differences
32
-------
among sampling stations. These analyses can be used to provide a
quantitative comparison of alternative sampling layouts. For example, the
level of sampling effort required to obtain a selected level of sensitivity
in the monitoring program can be determined. Using this type of analysis, it
is also possible to allocate sampling effort between numbers of stations and
replicate samples to obtain specified monitoring program objectives
(e.g., the detection of a difference in the dependent variable equal to
50 percent of the overall mean).
The a priori analyses can also be useful in identifying modifications
to the monitoring program for increased effectiveness. Most environmental
samples are relatively expensive to collect and analyze. Therefore, it is
important to evaluate the cost efficiency of alternative designs, especially
relative to the number of replicate samples. Such analyses can be conducted
after the collection of several data sets to determine the optimum number of
replicate samples needed for the most cost-effective accomplishment of
overall monitoring program objectives.
33
-------
REFERENCES
Cohen, J. 1977. Statistical power analysis for the behavioral sciences.
Academic Press, New York, NY.
Lehmer, E. 1944. Inverse tables of probabilities of error of the second
kind. Ann. Math. Stat. 15:388-398.
Parkhurst, D.F. 1985. Interpreting failure to reject a null hypothesis.
Bull. Ecol. Soc. 66:301-302.
Pearson, E.S., and H.O. Hartley. 1951. Charts of the power function for
analysis of variance tests, derived from the non-central F-distribution.
Biometrika. 38:112-130.
Rotenberry, J.T., and J.A. Wiens. 1985. Statistical power analysis and
community-wide patterns. Am. Nat. 125:164-168.
Scheffe, H. 1959. The analysis of variance. John Wiley & Sons, New York,
NY. 477 pp.
Tang, P.C. 1938. The power function of the analysis of variance tests with
tables and illustrations of their use. Stat. Res. Mem. 2:126-149.
Tetra Tech. 1987. Bioaccumulation monitoring guidance: Strategies for
sample replication and compositing. Final Report. Prepared for U.S. Envir-
onmental Protection Agency Office of Marine and Estuarine Protection. Tetra
Tech, Inc., Bellevue, WA. 51 pp.
Toft, C.A., and P.J. Shea. 1983. Detecting community-wide patterns:
estimating power strengthens statistical inference. Am. Nat. 122:618-625.
Winer, B.J. 1971. Statistical principles in experimental design. McGraw-
Hill, New York, NY.
34
-------
APPENDIX A
POWER CALCULATIONS
-------
APPENDIX A
Power Calculations
The power of the statistical test is the complement of the Type II
error (β), which is mathematically defined as follows:

(A1)   β = ∫[t=0 to Fα] p(t | ν1, ν2, λ) dt

where p(·) is the probability density function of the non-central F-
distribution; ν1 and ν2 are the degrees of freedom for the numerator and
denominator, respectively; λ is the noncentrality parameter; t is a dummy
variable of integration, corresponding to the F-ratio; and Fα is the
critical value (i.e., the value of F corresponding to the selected
significance level α of the test). The probability density function for the
non-central F-distribution can be expressed as an infinite series involving
the Beta function B(·,·) (Pearson and Hartley, 1951):

(A2)   p(t | ν1, ν2, λ) = Σ[j=0 to ∞] {e^(-λ/2) (λ/2)^j / j!}
         × (ν1/ν2)^(ν1/2+j) t^(ν1/2+j-1) [1 + ν1 t/ν2]^(-(ν1+ν2+2j)/2)
         / B(ν1/2+j, ν2/2)
Substituting Eq. (A2) into Eq. (A1) and rearranging terms gives

(A3)   β = Σ[j=0 to ∞] {e^(-λ/2) (λ/2)^j / j!}
         × ∫[u=0 to ν1 Fα/(ν1+2j)] [(ν1+2j)/ν2]^((ν1+2j)/2) u^((ν1+2j)/2-1)
           [1 + (ν1+2j) u/ν2]^(-(ν1+ν2+2j)/2) / B((ν1+2j)/2, ν2/2) du

where u is a new variable of integration, such that

(A4)   u = ν1 t / (ν1 + 2j)
A-1
-------
The distribution function of the central F can be expressed as follows:

(A5)   P(F0 | ν1, ν2) = ∫[t=0 to F0] (ν1/ν2)^(ν1/2) t^(ν1/2-1)
         [1 + ν1 t/ν2]^(-(ν1+ν2)/2) dt / B(ν1/2, ν2/2)

A numerical solution for Eq. (A5) is given by Abramowitz and
Stegun (1972).
By comparing Eqs. (A3) and (A5), the Type II error can be written as
an infinite series of the central F distribution:

(A6)   β = P(Fα | ν1, ν2, λ)
         = Σ[j=0 to ∞] {e^(-λ/2) (λ/2)^j / j!} P(ν1 Fα/(ν1+2j) | ν1+2j, ν2)
Eq. (A6) was programmed in Fortran IV and its results exactly matched
those of Lehmer (1944) to five significant digits using less than 100
summations.
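The original program was written in Fortran IV; an equivalent sketch in
Python (illustrative only, assuming SciPy provides the central F
distribution function of Eq. A5) is:

    import math
    from scipy import stats

    def type_ii_error(f_alpha, v1, v2, lam, terms=100):
        # Evaluate the series in Eq. (A6): beta as a Poisson-weighted sum
        # of central F distribution functions.
        beta = 0.0
        for j in range(terms):
            weight = math.exp(-lam / 2.0) * (lam / 2.0) ** j / math.factorial(j)
            beta += weight * stats.f.cdf(v1 * f_alpha / (v1 + 2 * j),
                                         v1 + 2 * j, v2)
        return beta

The power of the test is then 1 - type_ii_error(Fα, ν1, ν2, λ).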
REFERENCES
Abramowitz, M. and I.A. Stegun. 1972. Handbook of mathematical functions.
National Bureau of Standards, Applied Mathematics Series 55, 1046 pages.
Lehmer, E. 1944. Inverse tables of probabilities of errors of the second
kind. Annals of Mathematical Statistics, Volume 15, pages 388-398.
Pearson, E.S. and H.O. Hartley. 1951. Charts of the power function for
analysis of variance tests, derived from the non-central F-distribution.
Biometrika, Volume 38, pages 112-130.
A-2
-------