EPA CONTRACT NO. 68-01-6938
TC-3953-03
FINAL REPORT
BIOACCUMULATION
MONITORING GUIDANCE:
STRATEGIES FOR SAMPLE
REPLICATION AND COMPOSITING
JUNE, 1987
PREPARED FOR:
MARINE OPERATIONS DIVISION
OFFICE OF MARINE AND ESTUARINE PROTECTION
U. S. ENVIRONMENTAL PROTECTION AGENCY
WASHINGTON, DC 20460
-------
EPA Contract No. 68-01-6938
TC 3953-03
Final Report
BIOACCUMULATION MONITORING GUIDANCE:
STRATEGIES FOR SAMPLE REPLICATION
AND COMPOSITING
for
U.S. Environmental Protection Agency
Office of Marine and Estuarine Protection
Washington, DC 20460
June 1987
by
Tetra Tech, Inc.
11820 Northup Way, Suite 100
Bellevue, Washington 98005
-------
PREFACE
This manual has been prepared by the U.S. Environmental Protection
Agency (EPA) Marine Operations Division, Office of Marine and Estuarine
Protection in response to requests for guidance from U.S. EPA regional
offices and coastal municipalities planning 301(h) monitoring programs for
municipal discharges into the marine environment. The members of the 301(h)
Task Force of EPA, which includes representatives for the U.S. EPA Regions
I, II, III, IV, IX, and X, the Office of Research and Development, and the
Office of Water, are to be commended for their vital role in the develop-
ment of this guidance by the technical support contractor, Tetra Tech, Inc.
This report provides guidance on the selection of appropriate replica-
tion and compositing strategies for bioaccumulation monitoring studies.
This report is one element of the Bioaccumulation Monitoring Guidance
Series. The purpose of this series is to provide guidance for monitoring of
priority pollutant residues in tissues of resident marine organisms. These
guidance documents were prepared for the 301(h) sewage discharge permit
program under the U.S. EPA Office of Marine and Estuarine Protection, Marine
Operations Division. Other documents in this series include:
Estimating the Potential for Bioaccumulation of Priority
Pollutants and 301(h) Pesticides (Tetra Tech 1985a)
Selection of Target Species and Review of Available Bioaccumu-
lation Data, Volumes I and II (Tetra Tech 1987a,b)
Recommended Analytical Detection Limits (Tetra Tech 1985b).
The statistical analyses conducted in this document are based on the
Ocean Data Evaluation System (ODES) Tool No. 14 for Statistical Power
Analysis. The Technical Support Document for ODES Statistical Power
Analysis (Tetra Tech 1987c) describes the basis for, and application of,
these analytical procedures.
The information provided herein will be useful to U.S. EPA monitoring
program reviewers, permit writers, permittees, and other organizations
involved in performing nearshore monitoring studies. Bioaccumulation
monitoring has become increasingly important in assessing pollution effects;
therefore this guidance should have broad applicability in the design and
interpretation of marine and estuarine monitoring programs.
-------
CONTENTS
Page
PREFACE ii
LIST OF FIGURES v
LIST OF TABLES vi
ACKNOWLEDGMENTS vii
INTRODUCTION 1
MONITORING PROGRAM PERFORMANCE 2
METHODS OF ANALYSIS 2
HYPOTHESIS TESTING 3
POWER ANALYSES FOR INDIVIDUAL TISSUE SAMPLES 5
Analytical Methods 6
Preliminary Analyses 10
Analytical Results 14
Summary 28
COMPOSITE SAMPLING STRATEGIES 29
POWER ANALYSES FOR COMPOSITE SAMPLES 33
Analytical Methods 33
Simulation Analyses 33
Power Analyses 37
Summary 41
SUMMARY AND RECOMMENDATIONS 44
REFERENCES 49
-------
FIGURES
Number Page
1 Hypothesis testing: possible circumstances and test
outcomes 4
2 Frequency distribution for calculated values of the
coefficient of variation for 23 historical data sets 15
3 Minimum detectable difference among sampling stations as a
function of the coefficient of variation 21
4 Minimum detectable difference vs. number of replicates at
selected levels of unexplained variance for 4 and 6 stations 23
5 Minimum detectable difference vs. number of replicates at
selected levels of unexplained variance for 8 and 16 stations 24
6 Minimum detectable difference in the tissue concentration
of selected contaminants vs. number of replicates 27
7 Effects of increasing composite sample size on estimate of
the mean 35
8 Power of statistical tests vs. number of samples in
composite replicate samples 39
-------
TABLES
Number Page
1 Analysis of variance table for one-way layout 7
2 Summary of data used in power analysis 11
3 Summary of one-way analysis of variance results for
historical data 13
4 Results of power analyses showing the minimum detectable
difference in the concentration of selected contaminants 16
5 Results of simulation analyses demonstrating the effect of
composite sampling on the estimate of the population mean 36
6 Probability of detecting specified levels of minimum
detectable differences for selected grab-sampling and
composite-sampling strategies 42
-------
ACKNOWLEDGMENTS
This document has been reviewed by the 301(h) Task Force of the Environ-
mental Protection Agency, which includes representatives from the Water
Management Divisions of U.S. EPA Regions I, II, III, IV, IX, and X; the
Office of Research and Development - Environmental Research Laboratory-
Narragansett (located in Narragansett, RI and Newport, OR); and the Marine
Operations Division in the Office of Marine and Estuarine Protection, Office
of Water.
This technical guidance document was produced for the U.S. Environmental
Protection Agency under the 301(h) post-decision technical support contract
No. 68-01-6938, Allison J. Duryee, Project Officer. This report was
prepared by Tetra Tech, Inc., under the direction of Dr. Thomas C. Ginn.
The primary author was Mr. Thomas M. Grieb. Ms. Marcy B. Brooks-McAuliffe
performed technical editing and supervised report production.
-------
INTRODUCTION
Monitoring of toxic pollutants in body tissues of marine organisms is
an important assessment technique for evaluating effects of coastal sewage
discharges and other sources of pollution. A key consideration in the
design of bioaccumulation studies is related to the type and number of
samples to be analyzed. Measured concentrations of chemicals in organism
tissue samples commonly display high levels of variability, resulting from
natural biological factors as well as from analytical procedures. Assessment
of this variability is an important step in developing an optimal sampling
design. Chemical analyses of tissue samples also represent a relatively
expensive component of a monitoring program. Without an a priori evaluation
of alternative sampling strategies, there is the possibility of analyzing an
excessive number of samples (with associated high costs) or of analyzing too
few samples, in which case high variability leads to equivocal results.
The objective of this report is to evaluate the applicability of alter-
native sampling strategies for bioaccumulation monitoring programs. A
statistical approach is presented for determining the levels of difference
in bioaccumulation that can be reliably detected with varying levels of
sampling effort. Example analyses are presented to demonstrate the effects
of alternative sampling designs. These example analyses are based on
historical data from bioaccumulation monitoring programs that used tissues
from individual target species recommended in an earlier report in this
series (Tetra Tech 1987a). The results of additional analyses employing
simulation methods are used to provide a comparison of grab- and
composite-sampling strategies.
-------
MONITORING PROGRAM PERFORMANCE
METHODS OF ANALYSIS
An evaluation of the accumulation of toxic pollutants and pesticides in
marine organisms is an important part of 301(h) monitoring programs. The
objective of the bioaccumulation component of 301(h) monitoring programs is
to determine whether the discharge causes an elevation in the body burden of
toxic chemicals in organisms living nearby. This objective is generally
addressed by comparing tissue contaminant levels in organisms near the
discharge and at a reference area. Measured tissue contaminant levels used
for such analyses commonly exhibit a large degree of variability resulting
from factors such as measurement errors and natural variability. This
variability may be great enough to severely limit the ability to detect
statistically significant changes. However, statistical techniques can be applied to deal
with these sources of uncertainty and to make statistically valid comparisons
of bioaccumulation levels among monitoring stations.
The 301(h) bioaccumulation monitoring studies are generally designed
based on the hypothesis that discharge effects are indicated by measurable
differences in bioaccumulation levels among monitoring stations or monitoring
events. Given this assumption, statistical techniques can be used to
distinguish discharge-related effects from natural variability. This can be
accomplished by partitioning field observations into several components.
Analysis of variance (ANOVA) techniques, which are commonly used to analyze
301(h) monitoring data, relate observations of interest (e.g., bioaccumulation
levels) explicitly to various environmental factors and random
errors. This partitioning of field observations can be demonstrated with
the ANOVA experimental model shown in Equation (1), which decomposes a
single observation ($Y_{ij}$) into several components:
$$Y_{ij} = \mu + \xi_i + e_{ij} \qquad (1)$$
where:
$Y_{ij}$ = Observation at station i and replicate j of, for example, the
tissue concentration of a selected metal
$\mu$ = Mean of all $Y_{ij}$ observations
$\xi_i$ = Effect of the ith level of an environmental factor (e.g., station
location)
$e_{ij}$ = Random errors not accounted for by either $\mu$ or $\xi_i$.
Under the example model formulation, the effects of environmental factors
(e.g., station location) on individual observations can be tested for statis-
tical significance. The null hypothesis tested is that the station location
has no effect on observed contaminant concentrations, or stated formally:
$\xi_1 = \dots = \xi_n = 0$. Similarly, more complex models can be formulated to
test for the effect of more than one environmental factor (including time)
as well as the statistical significance of the interaction among factors.
HYPOTHESIS TESTING
The circumstances and outcomes associated with testing the null
hypothesis are shown in Figure 1. Four possible outcomes exist:
1. The hypothesis is true and it is not rejected.
2. The hypothesis is true and it is rejected.
3. The hypothesis is false and it is not rejected.
4. The hypothesis is false and it is rejected.
The shaded areas shown in Figure 1 represent incorrect decisions. The
incorrect rejection of the null hypothesis is referred to as a Type I
error. The probability of a Type I error, designated $\alpha$, represents the
significance level of the statistical test. The incorrect acceptance of the
-------
[Figure 1: two-by-two decision table. Rows: decision (accept, reject). Columns: hypothesis actually true, hypothesis actually false.]
Figure 1. Hypothesis testing: possible circumstances and test outcomes.
-------
null hypothesis is referred to as the $\beta$ error, where $\beta$ represents the
probability of this incorrect decision. The $\beta$ error is also known as the
Type II error. The probabilities of the correct acceptance and rejection of
the null hypothesis are represented by the complements of the Type I and
Type II errors, respectively.
The probability of correctly rejecting the false null hypothesis (i.e.,
of detecting an effect when one exists) is referred to as the power of the
statistical test. Because the objective of the bioaccumulation monitoring
program is to correctly detect the effects of station location or time of
sampling, the power of a statistical test serves as a basis for evaluating
the performance of the monitoring program. When existing data are available
for the selected monitoring variables, power calculations can be made to
provide a quantitative comparison of alternative sampling layouts. For
example, the probability of correctly detecting the effects of station
location can be determined for a specified level of sampling effort. These
methods can also be used to evaluate and interpret statistical analyses in
which the null hypothesis has been accepted. In this case, the probability
of detecting specific levels of differences between stations or effects
associated with different treatments can be determined for the fixed
parameters of the experimental design.
POWER ANALYSES FOR INDIVIDUAL TISSUE SAMPLES
The power of the statistical test is determined by the following five
study design parameters:
Significance level of the test
Number of sampling stations
Number of replicates
Minimum detectable difference specified for the monitoring variable
Residual error variance (i.e., natural variability within the system).
This relationship between the power of a statistical test and the design
parameters makes several types of power analyses possible. For example, the
power of the test can be determined as a function of the five design
parameters. Alternatively, the value for any individual design parameter
required to obtain a specified power of the statistical test can be
determined as a function of the other four parameters.
For this report, power analyses were conducted using historical data to
determine the minimum detectable difference in the tissue concentration of
specified contaminants as a function of the number of sample replicates.
The purpose of this type of analysis was to determine the level of difference
in tissue concentration of contaminants that can be identified in a test of
statistical significance. In each individual analysis, the power of the
statistical test as well as the other design parameters (i.e., number of
stations, significance level, and residual error variance) were held
constant. However, a series of these analyses was also conducted for
different numbers of stations and values of residual error variance to
demonstrate the effect of these design parameters on the ability to detect
changes among sampling locations.
Analytical Methods
The results of a one-way ANOVA are usually summarized in a manner
similar to that shown in Table 1. The test statistic is the F ratio, which
is the ratio of the between-groups mean square (BMS) to the within-groups
mean square (WMS). As indicated in Table 1, the WMS is an unbiased estimate
of the population variance ($\sigma^2$), while the expected value of the BMS is
represented by the sum of the population variance and another term
representing the actual fixed effects. This added quantity is:
-------
TABLE 1. ANALYSIS OF VARIANCE TABLE FOR ONE-WAY LAYOUT

Source          Sum of Squares                                       d.f.   Mean Square    E(MS)
Between groups  $SS_B = \sum_i J_i (\bar{y}_{i.} - \bar{y})^2$       I-1    $SS_B/(I-1)$   $\sigma^2 + (I-1)^{-1} \sum_i J_i (\xi_i - \bar{\xi})^2$
Within groups   $SS_W = \sum_i \sum_j (y_{ij} - \bar{y}_{i.})^2$     n-I    $SS_W/(n-I)$   $\sigma^2$
Total           $\sum_i \sum_j (y_{ij} - \bar{y})^2$                 n-1

where:
$y_{ij}$ = Observation at group (station) i and replicate j
$\bar{y}_{i.}$ = ith group mean
$\bar{y}$ = Overall mean of all i, j observations
I = Number of sampling stations
n = Total number of observations
$SS_B$ = Between groups sum of squares
$SS_W$ = Within groups sum of squares
E(MS) = Expected mean square
$J_i$ = Number of replicates at the ith station
$\xi_i$ = True value of the ith effect
$\bar{\xi}$ = Mean of the treatment effects
$\sigma^2$ = Population variance.
-------
$$(I-1)^{-1} \sum_i J_i (\xi_i - \bar{\xi})^2$$
where:
I = The number of sampling stations
$J_i$ = The number of replicates at the ith station
$\xi_i$ = The true value of the ith effect
$\bar{\xi}$ = The mean of the treatment effects.
Under the null hypothesis, the value of the actual fixed effects term
is 0, so the BMS and WMS both estimate the population variance and the F
ratio is expected to be close to 1. When fixed effects are observed in the
monitoring program, the value of this term increases and results in an
increase in the value of the numerator of the F ratio. Large effects will
result in an increase in the power of the test (i.e., the probability of
rejecting a false null hypothesis).
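The quantities in Table 1 can be illustrated with a short calculation. The following is a minimal sketch, not part of the ODES tool; the function name and the tissue concentrations are hypothetical and were chosen only to show how the BMS, WMS, and F ratio are formed.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (BMS, WMS, F) for a list of per-station replicate lists."""
    n = sum(len(g) for g in groups)            # total number of observations
    num_stations = len(groups)                 # I in Table 1
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-groups sum of squares: J_i * (station mean - overall mean)^2
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared deviations from each station mean
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    bms = ss_between / (num_stations - 1)
    wms = ss_within / (n - num_stations)
    return bms, wms, bms / wms


# Hypothetical tissue concentrations (mg/kg): three stations, four replicates.
stations = [
    [0.10, 0.12, 0.14, 0.12],
    [0.11, 0.13, 0.15, 0.13],
    [0.20, 0.22, 0.24, 0.22],
]
bms, wms, f_ratio = one_way_anova(stations)
print(f"BMS={bms:.5f}  WMS={wms:.5f}  F={f_ratio:.2f}")
```

In this made-up example the third station is elevated relative to the other two, so the BMS is much larger than the WMS and the F ratio is well above 1.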
In performing power analyses, a set of effects is assumed and the
performance of the sampling design is evaluated as if these assumed effects
actually occurred. However, when a sample design involves several station
locations, many sets of effects can be assumed. For example, alternative
hypotheses can be constructed under which actual station effects of a certain
magnitude occur at one, two, three, or more of the total number of sampling
locations. The magnitude of the effects could also be varied among
stations. It can be seen that an infinite number of alternative hypotheses
can be constructed for evaluation in power analyses.
The power analyses presented in this report were conducted to provide a
conservative evaluation of monitoring program performance. The testable
hypothesis used in these analyses was constructed such that the effects
occur in the combination that is most difficult to detect. Scheffe (1959)
showed that this conservative set of effects is defined by:
$$|\xi_i - \xi_j| = \Delta; \qquad \xi_k = \frac{\xi_i + \xi_j}{2}, \quad \text{for all } k \neq i \text{ or } j \qquad (2)$$
where:
$\Delta$ = The maximum difference in actual effects
$\xi_k$ = The true value of the kth effect.
Equation (2) states that the two effects associated with the hypothesis
of interest differ by $\Delta$ while all other effects are equal to the mean of
these two. For the maximum difference in effects equal to $\Delta$, this
arrangement gives the lowest test power.
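As a worked illustration, the fixed-effects term in the expected BMS reduces to a simple function of the maximum difference under this arrangement. The sketch below assumes the same number of replicates, J, at every station, which Equation (2) does not require. The symbol $\lambda$ is the conventional noncentrality parameter of the F distribution; it is not used elsewhere in this report and is introduced here only for illustration.

```latex
\begin{aligned}
\xi_i &= \bar{\xi} + \tfrac{\Delta}{2}, \qquad
\xi_j = \bar{\xi} - \tfrac{\Delta}{2}, \qquad
\xi_k = \bar{\xi} \quad (k \neq i, j) \\
\sum_{k=1}^{I} J\,(\xi_k - \bar{\xi})^2
  &= J\left[\left(\tfrac{\Delta}{2}\right)^2 + \left(-\tfrac{\Delta}{2}\right)^2\right]
   = \frac{J\Delta^2}{2} \\
E(\mathrm{BMS}) &= \sigma^2 + \frac{J\Delta^2}{2\,(I-1)}, \qquad
\lambda = \frac{J\Delta^2}{2\sigma^2}
\end{aligned}
```

Because only two stations contribute to the sum, adding stations at the midpoint increases I without increasing the numerator, which is why this arrangement is the hardest to detect.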
The power analyses presented in this report were conducted on the Ocean
Data Evaluation System (ODES). The power analysis tool available on ODES is
described in a user-guidance document (Tetra Tech and American Management
Systems 1986). Statistical power analyses and methods of calculation are
described by Scheffe (1959) and Cohen (1977).
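The dependence of power on the design parameters can also be explored by simulation. The following is a minimal Monte Carlo sketch and is not the ODES procedure: it assumes normally distributed errors and equal replication, estimates the critical F value empirically rather than from tables, and uses hypothetical function names. Concentrations are expressed in units of the overall mean, so the residual standard deviation equals the coefficient of variation divided by 100.

```python
import random


def f_ratio(groups):
    """F ratio (BMS / WMS) for equal-sized groups of replicates."""
    k, j = len(groups), len(groups[0])
    means = [sum(g) / j for g in groups]
    grand = sum(means) / k
    bms = j * sum((m - grand) ** 2 for m in means) / (k - 1)
    wms = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (j - 1))
    return bms / wms


def simulate_f(effects, sigma, reps, rng):
    """Draw one data set with the given station effects and return its F ratio."""
    return f_ratio([[rng.gauss(e, sigma) for _ in range(reps)] for e in effects])


def estimate_power(stations, reps, cv_pct, delta_pct, alpha=0.05, trials=4000, seed=1):
    """Monte Carlo power for a difference of delta_pct (percent of the mean)."""
    rng = random.Random(seed)
    sigma = cv_pct / 100.0    # residual standard deviation, in units of the mean
    delta = delta_pct / 100.0
    # Empirical critical value: upper-alpha quantile of F under the null hypothesis.
    null_f = sorted(simulate_f([0.0] * stations, sigma, reps, rng) for _ in range(trials))
    critical = null_f[int((1 - alpha) * trials)]
    # Least favorable alternative: two stations differ by delta, the rest sit midway.
    effects = [delta / 2, -delta / 2] + [0.0] * (stations - 2)
    hits = sum(simulate_f(effects, sigma, reps, rng) > critical for _ in range(trials))
    return hits / trials


# Design taken from Table 4, Data Set 1: 4 stations, 5 replicates, CV of 51.4,
# and a difference of 121 percent of the mean.
power = estimate_power(stations=4, reps=5, cv_pct=51.4, delta_pct=121)
print(f"estimated power: {power:.2f}")
```

Table 4 reports a power of 0.80 for this design, so an estimate in that neighborhood would be consistent with the tabulated result. The simulated value varies with the seed and the number of trials.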
Recent evidence concerning the robustness of the ANOVA model to
deviations from assumptions of normality and equal variances indicates the
appropriateness of these statistical methods in environmental monitoring
applications (Grieb 1985). However, nonparametric statistical methods such
as the Kruskal-Wallis one-way analysis of variance by ranks (Kruskal and
Wallis 1952) could also be used for the analysis of these bioaccumulation
data. While the statistical analysis results in this report apply to the
parametric ANOVA model, these results can also be used to evaluate the
corresponding performance of alternative, nonparametric statistical methods
by computing the power-efficiency of the nonparametric analog. The power-
efficiency of the nonparametric test provides a comparison of the sample
size required to achieve the same level of power associated with the
corresponding parametric tests. For example, the power-efficiency of
statistical Test B relative to Test A is given by:
$$\frac{N_A}{N_B} \times 100 \ \text{percent}$$
where:
$N_B$ = The number of samples required in Test B to achieve the same level
of power obtained in Test A with a sample size of $N_A$.
Calculation of the power-efficiency ratio for the Kruskal-Wallis test is
described in Andrews (1954) and Lehmann (1975).
Preliminary Analyses
To conduct the power analysis, it is necessary to obtain an estimate of
the residual error variance (i.e., the natural variability not accounted for
by the statistical model). This estimate can be obtained by conducting a
site-specific preliminary study or by using existing sampling data. For the
purposes of demonstrating the power analysis techniques in this report,
historical data were compiled and analyzed in a one-way ANOVA to estimate
the residual error variance. These estimates were then used as study design
parameters in individual power analyses.
A summary of the historical data is provided in Table 2. Data were
obtained for five taxa: three fish species (Dover sole, English sole, and
winter flounder) and two invertebrate taxa (American lobster and Cancer
spp.). Replicate measurements of tissue concentrations of selected
contaminants were obtained from various numbers of sample locations. Tissue
concentrations of these pollutants were obtained for both muscle and liver
tissues. These data were compiled by Tetra Tech (1987a) as part of a review
of bioaccumulation data on target species recommended for 301(h) discharge
monitoring and were derived from analyses of tissue samples from individual
organisms (i.e., no composite samples). The raw data are presented in Tetra
Tech (1987b). In general, replicated data for tissue body burdens of
priority pollutants are limited. However, while the number of contaminants
included in these data is limited, two important chemical groups of concern
-------
TABLE 2. SUMMARY OF DATA USED IN POWER ANALYSIS

                                  Type of                 Number of  Number of
Taxon                             Tissue   Contaminant    Stations   Replicates  Location                          Reference
American lobster                  Muscle   PCBs, Hg, Cd   4          10          Long Island Sound; NY Bight Apex  Roberts et al. (1982)
(Homarus americanus)
Dover sole                        Muscle   PCBs, DDT      3          12          Southern California Bight         Sherwood et al. (1980)
(Microstomus pacificus)           Muscle   Cu             2          6
                                  Liver    PCBs, DDT      3          12
                                  Liver    Ag, Cd         2          6
Winter flounder                   Muscle   PCBs, DDT      4          12          NY Bight Apex                     Sherwood et al. (1980)
(Pseudopleuronectes americanus)   Muscle   Cu             3          6
                                  Liver    PCBs, DDT      4          12
                                  Liver    Cd, Zn         3          6
English sole                      Muscle   As, Pb, Hg     6          5           Commencement Bay, WA              Tetra Tech (1985c)
(Parophrys vetulus)
Crab                              Muscle   PCBs, Pb, Hg   4          5           Commencement Bay, WA              Tetra Tech (1985c)
(Cancer spp.)
-------
in terms of bioaccumulation, metals and chlorinated organic compounds, are
represented.
The residual error variance design parameter can be viewed as an
estimate of the denominator in the F ratio, which is used to evaluate the
significance of the ANOVA statistical tests. This quantity is shown in the
one-way ANOVA table (Table 1) as the within-groups mean square and represents
the average variance within groups. Where sample data are available,
this design parameter can be estimated in one of two ways. First, a
preliminary ANOVA can be conducted and the value of the within-groups mean
square used. Second, the sample variance can be computed from all available
data ignoring sample location. The first value provides an estimate of the
variance that is unexplained by the statistical model. Therefore, if the
effects of sample locations are found to be significant in the F test
conducted with the ANOVA, the within-groups mean square will have a value
that is less than the overall sample variance.
Because the variance design parameter is an estimate of the denominator
in the F ratio, it can be seen that the overall sample variance obtained
from existing data provides a larger and, therefore, more conservative
estimate for the purposes of conducting power analyses. In this case, the
estimated value of the difference that can be detected between stations will
be larger than if the power analyses were conducted using the within-groups
mean square as an estimate of the denominator in the F ratio. However, where
available data can be fit to the ANOVA model, the estimate of the within-
groups mean square provides a more realistic estimate of the expected value
of the denominator in the F ratio.
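The difference between the two variance estimates can be seen in a small calculation. The following is a minimal sketch using hypothetical tissue concentrations and hypothetical function names; it is not drawn from the historical data sets.

```python
from statistics import mean, variance


def within_groups_mean_square(groups):
    """Variance left unexplained after station means are removed (WMS)."""
    n = sum(len(g) for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return ss_within / (n - len(groups))


def overall_sample_variance(groups):
    """Sample variance of all observations, ignoring sample location."""
    return variance([y for g in groups for y in g])


# Hypothetical tissue concentrations (mg/kg): three stations, four replicates.
stations = [
    [0.10, 0.12, 0.14, 0.12],
    [0.11, 0.13, 0.15, 0.13],
    [0.20, 0.22, 0.24, 0.22],
]
wms = within_groups_mean_square(stations)
s2 = overall_sample_variance(stations)
print(f"within-groups mean square = {wms:.6f}")
print(f"overall sample variance   = {s2:.6f}")
```

Because the station means differ in this made-up example, the overall sample variance absorbs the between-station differences and is the larger, more conservative of the two estimates.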
The historical data described in Table 2 were analyzed using a one-way
ANOVA design to obtain values of the within-groups mean square for subsequent
power analyses. The results of 23 analyses are shown in Table 3. For each
analysis, the estimated mean tissue concentration of the pollutant, the
within-groups mean square, and a coefficient of variation are presented.
The occurrence of a significant F test is also indicated in this table.
-------
TABLE 3. SUMMARY OF ONE-WAY ANALYSIS OF VARIANCE RESULTS FOR HISTORICAL DATA

                                                Estimated Mean    Within-groups   Coefficient of
Data                   Type of                  Concentration,    Mean Square     Variation
Set  Taxon             Tissue   Contaminant     x-bar (mg/kg)     (sigma^2)       (sigma/x-bar x 100)
1    American Lobster  Muscle   Total PCBs      .152              .0061           51.4
2    American Lobster  Muscle   Hg              .215              .0076           40.6
3    American Lobster  Muscle   Cd              .018              <.0001          28.3
4    Dover Sole        Muscle   Total PCBs      .766              .3324           75.3
5    Dover Sole        Muscle   Total DDT       1.279             6.7761          203.5
6    Dover Sole        Muscle   Cu              .075              .0003           23.1
7    Dover Sole        Liver    Total PCBs      .925              6.1616          268.4
8    Dover Sole        Liver    Total DDT       .382              .0615           64.9
9    Dover Sole        Liver    Ag              .1162             .0023           41.3
10   Dover Sole        Liver    Cd              .7251             .1194           47.7
11   Winter Flounder   Muscle   Total PCBs      .0906             .0007           29.2
12   Winter Flounder   Muscle   Total DDT       .0126             <.0001          41.3
13   Winter Flounder   Muscle   Cu              4.389             6.0750          56.2
14   Winter Flounder   Liver    Total PCBs      4.015             4.2335          51.2
15   Winter Flounder   Liver    Total DDT       .607              .0996           52.0
16   Winter Flounder   Liver    Cd              .093              .0023           51.6
17   Winter Flounder   Liver    Zn              29.594            17.2450         14.0
18   English Sole      Muscle   As              5.067             39.2918         123.7
19   English Sole      Muscle   Pb              .247              .0118           44.0
20   English Sole      Muscle   Hg              .0572             .0006           43.0
21   Crab              Muscle   Total PCBs      .0918             .0087           101.6
22   Crab              Muscle   Pb              .316              .1227           110.8
23   Crab              Muscle   Hg              .094              .0033           61.1

Significance of Test: 13 asterisks (*a) appear in this column; their alignment with individual data sets is not recoverable.
a F Test significant, P<0.05.
-------
Coefficients of variation presented in Table 3 were calculated as the
ratio of the square root of the within-groups mean square to the estimated
overall mean tissue concentration. This ratio was multiplied by 100 so that
the coefficient of variation is expressed as a percentage. These values
thus represent a normalized measure of the unexplained variability (uncer-
tainty) within the data set and, as demonstrated below, an important
indicator of the level at which statistically significant differences can be
detected. A frequency distribution of the observed values of the coeffi-
cients of variation is presented in Figure 2. Values ranged from 14.0 to
268.4, but the majority occurred between 40 and 60.
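The calculation described above can be reproduced from the values in Table 3. The following is a minimal sketch; the function name is hypothetical, and the three data sets were chosen arbitrarily as examples.

```python
from math import sqrt


def coefficient_of_variation(within_ms, mean_conc):
    """Square root of the within-groups mean square over the mean, as a percentage."""
    return 100.0 * sqrt(within_ms) / mean_conc


# (mean concentration in mg/kg, within-groups mean square) from Table 3.
table3 = {
    1: (0.152, 0.0061),      # American lobster, muscle, total PCBs
    4: (0.766, 0.3324),      # Dover sole, muscle, total PCBs
    17: (29.594, 17.2450),   # Winter flounder, liver, Zn
}
cv = {ds: coefficient_of_variation(wms, m) for ds, (m, wms) in table3.items()}
for ds, value in cv.items():
    print(f"Data Set {ds}: CV = {value:.1f}")
```

These three values agree with the coefficients of variation listed in Table 3 (51.4, 75.3, and 14.0). Small discrepancies can arise for other data sets because the tabulated mean squares are rounded.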
Analytical Results
Results of the power analyses conducted for each historical data set
are summarized in Table 4. Results of all analyses are expressed as a
percentage of the mean contaminant concentration observed in the particular
data set. The presentation of the minimum detectable difference as a percent
of the observed mean value, rather than as a concentration of the contami-
nant, provides a basis for comparing the results obtained for the different
data sets. For example, this makes it possible to readily evaluate the
effect of the increased sample variability, expressed as an increase in the
coefficient of variation, on the ability to detect statistically significant
differences among sampling locations. This presentation format also confers
a general applicability to these analyses, as the results can be applied to
any data set exhibiting the same or similar coefficient of variation.
However, as discussed below, it is important to evaluate individual monitor-
ing programs in terms of the value of the contaminant concentration that can
be detected among sampling locations.
The analyses presented in Table 4 were conducted with the number of
sampling stations fixed at four or eight. This number of stations was
selected to represent an expected range in many 301(h) bioaccumulation
monitoring programs. The selection of two levels of sampling effort also
provided the opportunity to demonstrate the relative effect of an increase
in the number of sampling locations on the ability to detect differences in
tissue concentrations among stations.
-------
[Figure 2: histogram of the number of data sets by coefficient of variation ($\sigma/\bar{x} \times 100$), with the horizontal axis running from 1 to >130.]
Figure 2. Frequency distribution for calculated values of the coefficient of variation for 23
historical data sets.
-------
TABLE 4. RESULTS OF POWER ANALYSES SHOWING THE MINIMUM DETECTABLE
DIFFERENCE IN THE CONCENTRATION OF SELECTED CONTAMINANTS

Type^a = tissue type; Mean = mean concentration (x-bar), mg/kg; CV = coefficient of variation (sigma/x-bar x 100); Sta. = number of stations.
Entries are the minimum detectable difference^b (percent of mean) for the number of replicates in the column heading.

Set  Taxon             Contaminant Type^a  Mean     CV     Sta.     2     3     4     5     6     8    10    12    14
1    American lobster  Total PCBs  M       0.152    51.4      4   283   178   141   121   108    91    80    72    67
                                                              8   286   195   158   137   123   104    91    83    76
2    American lobster  Hg          M       0.215    40.6      4   223   140   112    96    85    72    63    57    53
                                                              8   226   154   125   108    97    82    72    65    60
3    American lobster  Cd          M       0.018    28.3      4   156    98    78    67    60    50    44    40    37
                                                              8   158   107    87    76    68    57    50    46    42
4    Dover sole        Total PCBs  M       0.766    75.3      4   414   260   207   178   158   133   117   106    98
                                                              8   419   285   232   201   179   152   134   121   112
5    Dover sole        Total DDT   M       1.279    203.5     4 1,120   704   560   481   428   361   317   287   264
                                                              8 1,134   772   626   542   485   410   362   328   302
6    Dover sole        Cu          M       0.075    23.1      4   127    80    64    55    49    41    36    32    30
                                                              8   129    87    71    61    55    47    41    37    34
-------
TABLE 4. (Continued)

Type^a = tissue type; Mean = mean concentration (x-bar), mg/kg; CV = coefficient of variation (sigma/x-bar x 100); Sta. = number of stations.
Entries are the minimum detectable difference^b (percent of mean) for the number of replicates in the column heading.

Set  Taxon             Contaminant Type^a  Mean     CV     Sta.     2     3     4     5     6     8    10    12    14
7    Dover sole        Total PCBs  L       0.925    268.4     4 1,477   929   739   634   565   476   418   378   348
                                                              8 1,495 1,018   826   715   640   541   478   433   398
8    Dover sole        Total DDT   L       0.382    64.9      4   357   225   179   153   137   115   101    91    84
                                                              8   362   246   200   173   155   131   116   105    96
9    Dover sole        Ag          L       0.1162   41.3      4   228   143   114    98    87    73    65    58    54
                                                              8   231   157   127   110    99    83    74    67    61
10   Dover sole        Cd          L       0.7251   47.7      4   262   165   131   113   100    84    74    67    62
                                                              8   266   181   147   127   114    96    85    77    71
11   Winter flounder   Total PCBs  M       .0906    29.2      4   160   101    80    69    61    52    45    41    38
                                                              8   162   110    90    78    69    59    52    47    43
12   Winter flounder   Total DDT   M       .0126    41.3      4   229   144   114    98    87    74    65    59    54
                                                              8   231   158   128   111    99    84    74    67    62
13   Winter flounder   Cu          M       4.389    56.2      4   309   194   155   133   118   100    88    79    73
                                                              8   313   213   173   150   134   113   100    91    83
-------
TABLE 4. (Continued)

Type^a = tissue type; Mean = mean concentration (x-bar), mg/kg; CV = coefficient of variation (sigma/x-bar x 100); Sta. = number of stations.
Entries are the minimum detectable difference^b (percent of mean) for the number of replicates in the column heading.

Set  Taxon             Contaminant Type^a  Mean     CV     Sta.     2     3     4     5     6     8    10    12    14
14   Winter flounder   Total PCBs  L       4.015    51.2      4   282   177   141   121   108    91    80    72    66
                                                              8   286   194   158   137   122   103    91    83    76
15   Winter flounder   Total DDT   L       0.607    52.0      4   286   180   143   123   109    92    81    73    67
                                                              8   290   197   160   139   124   105    93    84    77
16   Winter flounder   Cd          L       0.093    51.6      4   284   179   142   122   109    92    81    73    67
                                                              8   288   196   159   138   123   104    92    83    77
17   Winter flounder   Zn          L       29.594   14.0      4    77    49    39    33    30    29    22    20    18
                                                              8    78    53    43    37    33    28    25    23    21
18   English sole      As          M       5.067    123.7     4   681   428   341   292   260   219   193   174   160
                                                              8   689   469   381   330   295   249   220   199   184
19   English sole      Pb          M       0.247    44.0      4   242   152   121   104    93    78    69    62    57
                                                              8   245   167   135   117   105    89    78    71    65
20   English sole      Hg          M       0.0572   43.0      4   237   149   118   102    90    76    67    61    56
                                                              8   239   163   132   115   102    87    76    69    64
-------
TABLE 4. (Continued)

Type^a = tissue type; Mean = mean concentration (x-bar), mg/kg; CV = coefficient of variation (sigma/x-bar x 100); Sta. = number of stations.
Entries are the minimum detectable difference^b (percent of mean) for the number of replicates in the column heading.

Set  Taxon             Contaminant Type^a  Mean     CV     Sta.     2     3     4     5     6     8    10    12    14
21   Cancer crabs      Total PCBs  M       .0918    101.6     4   558   351   279   240   213   180   158   143   132
     (Cancer spp.)                                            8   565   385   312   270   242   205   180   163   150
22   Cancer crabs      Pb          M       .316     110.8     4   610   384   305   262   233   197   173   156   144
     (Cancer spp.)                                            8   618   420   341   295   264   224   197   179   164
23   Cancer crabs      Hg          M       .094     61.1      4   336   211   168   144   128   108    95    86    79
     (Cancer spp.)                                            8   340   232   188   163   146   123   109    98    91

a Tissue type: M = muscle, L = liver.
b Power analyses conducted at fixed levels of statistical significance (0.05) and power (0.80).
-------
As indicated in Table 4, these analyses were conducted at fixed levels
of statistical significance ($\alpha$) and power ($1-\beta$) (see Figure 1). The value
of the statistical power was set at 0.80. Thus, the probability of detecting
the minimum differences shown in Table 4 in a one-way ANOVA statistical test
is 0.80. The significance level selected for these power analyses was 0.05,
which corresponds to the value most commonly selected in statistical tests.
Values of the coefficient of variation are presented in Table 4 to
facilitate comparison of the power analysis results. These data show that
for an increase in the value of this measure of unexplained variability,
there is a corresponding increase in the minimum detectable difference among
sampling stations. This relationship can be seen by examining the results
for the smallest and largest values of the coefficient. The smallest value
of this measure of variability was obtained for Data Set 17 (Table 4). For
the zinc concentration observed in the liver tissue of winter flounder, the
computed value of the coefficient of variation is 14.0. Results of the
power analyses obtained for this data set indicate that the minimum differ-
ence in the zinc concentration that can be detected with four replicate
samples at four and eight sampling stations is 39 and 43 percent of the
overall mean value, respectively. The largest value for the coefficient of
variation (268.4) was obtained for the concentration of total PCBs in the
liver tissue of Dover sole (Data Set 7, Table 4). Results of the power
analyses obtained for this data set indicate that with four replicate samples
at four sampling stations the minimum difference in values of tissue concen-
tration that can be detected among stations is approximately 7 times as
great as the overall mean concentration. With four replicate samples at
eight stations, this minimum detectable value is over 8 times as great as
the observed mean tissue concentration.
Analytical results from all 23 data sets (Table 4) are summarized in
Figure 3. For each value of the coefficient of variation, the corresponding
minimum difference, expressed as a percentage of the mean, that can be
detected with five replicate samples at eight stations is plotted. The data
in Figure 3 show that the increase in the minimum detectable difference
among sampling stations is a linear function of the coefficient of variation.
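The linear relationship can be checked directly against Table 4. The short sketch below divides the tabulated minimum detectable difference by the coefficient of variation for Data Sets 21 through 23 (five replicates, eight stations). The variable and function names are illustrative only and do not come from the report.

```python
# Values copied from Table 4 (Continued): coefficient of variation and
# minimum detectable difference (percent of mean) for five replicates at
# eight stations, power = 0.80, significance level = 0.05.
TABLE4_FIVE_REPLICATES_EIGHT_STATIONS = {
    21: (101.6, 270),  # Cancer spp., total PCBs
    22: (110.8, 295),  # Cancer spp., Pb
    23: (61.1, 163),   # Cancer spp., Hg
}


def mdd_per_unit_cv(cv, mdd_percent):
    """Return the minimum detectable difference per unit of CV."""
    return mdd_percent / cv


ratios = {
    data_set: mdd_per_unit_cv(cv, mdd)
    for data_set, (cv, mdd) in TABLE4_FIVE_REPLICATES_EIGHT_STATIONS.items()
}

for data_set, ratio in sorted(ratios.items()):
    print(f"Data Set {data_set}: MDD / CV = {ratio:.2f}")
```

The three ratios agree to within rounding of the tabulated values, which is what a straight line through the origin in Figure 3 implies for a fixed design.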
-------
Figure 3. Minimum detectable difference among sampling stations as a function of the
coefficient of variation. [Plot not reproduced. Horizontal axis: coefficient
of variation, 0 to 280. Vertical axis: minimum detectable difference, 0 to 800.
Legend: fixed design parameters (stations, replicates, power, statistical
significance).]
-------
From Figures 2 and 3, it can be seen that the greatest proportion of
the values of the coefficient of variation for the historical data sets fall
within the range of 40-60. Therefore, examination of the power analysis
results for data sets with coefficients of variation between 40 and 60
provides an estimate of the expected performance of bioaccumulation moni-
toring programs. For example, the calculated value of the coefficient of
variation for Data Set 2 (mercury, lobster) shown in Table 4 is 40.6.
Results of the power analysis for these data indicate that with five
replicate samples at either four or eight stations, the difference in the
concentration of mercury that could be detected between stations is between
0.206 and 0.232 mg/kg. This minimum detectable difference is approximately
equal to the overall observed mean concentration (0.215 mg/kg) of mercury
among sampling stations. Data Set 23 (mercury, crab), on the other hand,
represents conditions at the other end of this range. In this case, the
minimum detectable difference in the concentration of mercury in the muscle
tissue of Cancer spp. with the collection of five replicate samples is
144 percent and 163 percent of the mean value for four and eight stations,
respectively. Thus, in the majority of the data sets evaluated, the
observed coefficient of variation is between 40 and 60, and the collection
of five replicates at eight or fewer stations resulted in the ability to
detect differences in tissue concentrations of contaminants less than or
equal to 163 percent of the overall mean contaminant concentration.
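As a small illustration of how Table 4 can be read in the other direction, the sketch below looks up the smallest tabulated number of replicates that meets a target minimum detectable difference for Data Set 23. The helper name is hypothetical, and the lookup only considers the replicate counts that appear in the table.

```python
# Minimum detectable difference (percent of mean) for Data Set 23
# (mercury in Cancer spp. muscle), copied from Table 4 (Continued).
REPLICATES = [2, 3, 4, 5, 6, 8, 10, 12, 14]
MDD_FOUR_STATIONS = [336, 211, 168, 144, 128, 108, 95, 86, 79]
MDD_EIGHT_STATIONS = [340, 232, 188, 163, 146, 123, 109, 98, 91]


def smallest_tabulated_replication(target_percent, replicates, mdd_percent):
    """Return the smallest tabulated replicate count whose minimum
    detectable difference does not exceed target_percent, else None."""
    for n, mdd in zip(replicates, mdd_percent):
        if mdd <= target_percent:
            return n
    return None


four = smallest_tabulated_replication(100, REPLICATES, MDD_FOUR_STATIONS)
eight = smallest_tabulated_replication(100, REPLICATES, MDD_EIGHT_STATIONS)
print("4 stations:", four)
print("8 stations:", eight)
```

For a target equal to the mean (100 percent), the lookup returns 10 replicates at four stations and 12 replicates at eight stations.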
Additional power analyses presented in Figures 4 and 5 were conducted
to summarize the results in Table 4 and to demonstrate the importance of the
level of unexplained variability, represented by the residual error variance
design parameter, in determining the expected performance of a monitoring
program. Specifically, these analyses demonstrate the effect of increased
levels of unexplained variance on 1) the ability to detect a specified
difference between stations and 2) the relative effect of increased numbers
of stations on the minimum detectable difference. These analyses were
conducted for four levels of unexplained variability. Coefficients of
variation were set at 30, 50, 70, and 90. The number of sampling stations
was set at 4, 6, 8, and 16. As with the previous analyses presented in
Table 4, all calculations were conducted for fixed levels of power (0.8) and
statistical significance (0.05).
-------
Figure 4. Minimum detectable difference vs. number of replicates at
selected levels of unexplained variance for 4 and 6 stations.
Power of test = 0.80, significance level = 0.05.
[Plot not reproduced. Horizontal axis: number of replicates, up to 16.
Vertical axis: minimum detectable difference, 50 to 550. Curves: coefficients
of variation of 30, 50, 70, and 90, each for 4 and 6 stations.]
-------
Figure 5. Minimum detectable difference vs. number of replicates at
selected levels of unexplained variance for 8 and 16 stations.
Power of test = 0.80, significance level = 0.05.
[Plot not reproduced. Horizontal axis: number of replicates, up to 16.
Vertical axis: minimum detectable difference, 50 to 550. Curves: coefficients
of variation of 30, 50, 70, and 90, each for 8 and 16 stations.]
-------
Results of these analyses, like those presented in Table 4, have general
applicability, because the minimum detectable difference is expressed as a
percentage of the mean. Additionally, the power curves are presented for
coefficients of variation representing a wide range of unexplained varia-
bility in the sampling environment. These curves can be used to evaluate
monitoring program performance for sampling designs using 4-16 stations and
for sampling data exhibiting coefficients of variation between 30 and 90.
This range includes the majority of historical data sets compiled for this
study.
These results show that for an increase in the level of unexplained
variance, the minimum detectable difference between sampling stations in-
creases. For example, in Figure 4 it can be seen that with five replicates
at four stations the minimum detectable difference between stations ranges
from approximately 70 percent of the mean for a coefficient of variation of
30 to 212 percent of the mean for a coefficient of variation of 90. Corres-
pondingly, both figures show that as the level of unexplained variance
increases, a greater level of sample replication is required to detect a
specified level of difference. For example, in a sample design with four
sampling stations (Figure 4), the number of replicate samples required to
detect a difference between stations equal to the mean is 3, 7, 12, and
about 17, respectively, for coefficients of variation of 30, 50, 70, and 90.
Results of these analyses also demonstrate that for a fixed level of
sample variability, the minimum detectable difference between stations
increases as the number of stations increases. This increase is small,
however, compared to the effect of increased variability in the sampling
environment. For example, in Figure 5, for a coefficient of variation equal
to 30, the minimum difference detectable with five replicates is approxi-
mately 80 and 90 percent of the mean for 8 and 16 stations, respectively. In
general, monitoring program performance, measured by the ability to detect
specified differences among stations, is increased for a fixed level of
sampling effort by the collection of more replicates at fewer stations.
However, the effect of number of stations on program performance is small
relative to that of the number of replicate samples.
-------
In Table 4 and Figures 3 through 5, the minimum detectable difference
is expressed in terms of a percentage of the overall mean. As indicated,
this provides a direct basis for the comparison of results among data sets,
and the results presented in Figures 4 and 5 allow a quick evaluation of the
expected performance of a large number of study designs. However, in many
monitoring programs, there may be an interest not only in the relative
change in contaminant concentrations among sampling locations, but also in
the minimum value of the contaminant concentration that can be detected. In
fact, results of power analyses used to evaluate individual monitoring
program design are generally expressed in terms of the measured units.
In Figure 6, results of power analyses are shown for selected data sets
presented in Table 4. In each of the four plots shown, the minimum detect-
able difference in the concentration of a selected contaminant is shown as a
function of the number of sample replicates. Additionally, the mean concen-
tration of the particular contaminant in each example case is shown to
indicate the relative performance of monitoring programs at the different
levels of unexplained variability.
Power analyses conducted with Data Set 11 (Table 4) are shown in
Figure 6a. The number of replicates required to detect a specified difference
in the concentration of total PCBs in the muscle tissue of winter flounder
between stations is shown. These data are characterized by a relatively low
coefficient of variation (29.1), indicating a low level of unexplained
variation. As a result, small differences in the contaminant concentration
can be detected with low levels of sample replication. Four or more
replicates will provide adequate replication to detect differences of
approximately 0.09 mg/kg of the contaminant in muscle tissue.

Figure 6. Minimum detectable difference in the tissue concentration
of selected contaminants vs. number of replicates.
[Plots not reproduced. Each panel shows the minimum detectable difference
against the number of replicates (up to 16), together with the observed mean
concentration. Panels: (a) Data Set 11, winter flounder, PCB (muscle),
C.V. = 29.1; (b) Data Set 2, lobster, Hg (muscle), C.V. = 40.6; (c) Dover
sole, DDT (liver), C.V. = 64.9; (d) Data Set 21, Cancer spp., PCB (muscle),
C.V. = 101.6.]

The results of power analyses conducted with data collected from
sampling environments exhibiting increasing levels of unaccountable variation
are shown in Figures 6b, c, and d. The calculated coefficients of variation
for these data sets are 40.6, 64.9, and 101.6, respectively. As the plots
indicate, successive increases in the coefficient of variation are accompanied
by a decrease in the ability to detect differences relative to the
overall mean concentration among stations. However, in Figures 6b and 6d,
the contaminant concentrations that can be detected at comparable levels of
sampling effort are similar. Likewise, from Figure 6c, 10 replicates at
each station are required to detect a difference approximately equal to the
overall mean among stations. In comparison, even with 15 replicate samples
at each station, the minimum detectable difference in the contaminant
concentration is greater than the overall mean in Figure 6d. However, the
values of the minimum detectable differences in terms of the contaminant
concentration corresponding to 10 replicates at each station are 0.39 mg/kg
(Figure 6c) and 0.16 mg/kg (Figure 6d). Thus, while the minimum detectable
difference in terms of a percentage of the overall mean is greater in one
analysis (Figure 6d), the minimum detectable contaminant concentration is
considerably less than that found in the other analysis (Figure 6c). These
results indicate the importance of evaluating the performance of monitoring
programs both in terms of the relative change in contaminant concentration
that can be detected among sampling locations as well as the minimum
contaminant concentration that can be detected.
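The conversion between the two ways of expressing the minimum detectable difference is simple arithmetic. The sketch below applies it to values from Table 4 (Continued); the function name is illustrative and not part of the report.

```python
def absolute_mdd(mean_mg_per_kg, mdd_percent_of_mean):
    """Convert a minimum detectable difference expressed as a percentage
    of the overall mean into concentration units (mg/kg)."""
    return mean_mg_per_kg * mdd_percent_of_mean / 100.0


# Data Set 23 (Hg in Cancer spp. muscle): mean 0.094 mg/kg, five replicates.
hg_four_stations = absolute_mdd(0.094, 144)
hg_eight_stations = absolute_mdd(0.094, 163)

# Data Set 21 (total PCBs in Cancer spp. muscle): mean 0.0918 mg/kg,
# five replicates at four stations.
pcb_four_stations = absolute_mdd(0.0918, 240)

print(f"Hg, 4 stations:  {hg_four_stations:.3f} mg/kg")
print(f"Hg, 8 stations:  {hg_eight_stations:.3f} mg/kg")
print(f"PCB, 4 stations: {pcb_four_stations:.3f} mg/kg")
```

A data set with a large relative difference can still correspond to a small absolute concentration when its mean is low, which is the point made in the comparison of Figures 6c and 6d.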
Summary
1. Analyses of 23 historical data sets on tissue contaminants
indicate that, with the collection of individual tissue
samples, a difference of ≤163 percent of the mean could be
detected in the majority of cases (assumes five replicates at
eight or fewer sampling stations, α = 0.05, power = 0.80).
2. Many important chemicals (e.g., PCBs in Dover sole and crabs)
displayed much higher variability, however. In these cases,
a similar analytical design could only detect differences of
about 200-700 percent of the mean.
-------
COMPOSITE SAMPLING STRATEGIES
The historical data sets compiled for this report (Table 2) were based
on similar sampling and analytical approaches. In all cases, tissue samples
were obtained from selected organisms and analyzed individually to determine
the concentration of particular contaminants. This type of sample is
referred to as a grab sample, since the individual tissue samples are used
to provide an estimate of the contaminant concentration in the tissue of
specified populations. In each data set presented, a fixed number of these
individual estimates was obtained and analyzed statistically to estimate
distributional parameters of the underlying population and to make statis-
tical comparisons of these parameters among sampling sites.
An alternative to the analysis of tissue from individual organisms is
the analysis of composite samples. Composite tissue sampling consists of
mixing tissue samples from two or more individual organisms collected at a
particular site and analyzing this mixture as a single sample. The analysis
of a composite sample, therefore, provides an estimate of an average tissue
concentration for the individual organisms that make up the composite
sample. This composite sampling strategy is often used in effluent sampling
(Schaeffer and Janardan 1978; Schaeffer et al. 1980) to estimate average
concentrations of water quality variables in cases where the individual
chemical analyses are expensive but the cost of collecting individual
samples is relatively small. Composite sampling is also used in the
collection of samples from bioaccumulation monitoring systems containing
caged specimens of bivalve molluscs (Risebrough et al. 1980; Gordon et
al. 1980). The collection of composite samples is also required in other
cases where the tissue mass of an individual organism is insufficient for
the analytical protocol. An evaluation of the appropriateness of composite
sampling in bioaccumulation monitoring programs is provided below.
Composite sampling of the tissue from selected organisms represents an
attempt to prepare a sample that will represent the average concentration.
-------
If $X_1, X_2, \ldots, X_n$ represent the contaminant concentrations of n tissue samples
from individual organisms, these samples can be mixed to obtain a single
composite observation:
$$Z = \sum_{i=1}^{n} \omega_i X_i \qquad (3)$$
where:
$\omega_i$ = The proportion of total sample contributed by the ith component.
Rhode (1976) has shown that the expected value and variance of Z are given
by:
$$E(Z) = \mu \qquad (4)$$
$$\operatorname{Var}(Z) = \frac{\sigma^2}{n} + n\,\sigma_\omega^2\,\sigma^2 \qquad (5)$$
where:
$\mu$ = Population mean
$\sigma^2$ = Population variance
$\sigma_\omega^2$ = Variance of the composite proportions
n = Number of samples in each composite.
If the values of $\omega_i$ in Equation (3) are equal for all i, then the
numerical value of Z is equivalent to the mean of the n samples, $\bar{X}$, where:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad (6)$$
In this case, for the cost of analyzing a single composite sample, an
estimate of the mean of n grab samples is obtained. However, a consequence
of selecting the composite sampling strategy in the above example is the
loss of information concerning individual sample variability. As shown
below, the range of values (minimum and maximum concentrations) contributing
to the mean concentration is not known.
The above results apply to single composite samples. However, replicate
composite samples can also be used in bioaccumulation monitoring programs.
The basic sample design previously described for the historical data sets
involved the collection of replicate grab samples from two or more locations
and the statistical comparison of the mean values among sampling locations.
As an alternative to this design, replicate composite samples, each composed
of tissue from several organisms, could be collected at specified sampling
locations with the objectives of obtaining a more accurate estimate of the
true mean at each location and increasing the power of the statistical
tests.
The comparison of single composite and replicate grab samples can be
extended to replicate composite samples (Rhode 1976, 1979). The mean of m
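composite samples ($Z_1, Z_2, \ldots, Z_m$) is given by:
$$\bar{Z} = \frac{1}{m}\sum_{j=1}^{m} Z_j \qquad (7)$$
The expected value and variance of $\bar{Z}$ are given by:
$$E(\bar{Z}) = \mu \qquad (8)$$
$$\operatorname{Var}(\bar{Z}) = \frac{\sigma^2}{mn} + \frac{n\,\sigma_\omega^2\,\sigma^2}{m} \qquad (9)$$
The consequences of Equations 8 and 9 are pertinent to the evaluation
of composite sampling strategies for bioaccumulation monitoring programs.
For example, when the composites consist of samples of equal mass (i.e., the
same mass of tissue is taken from each organism) ($\sigma_\omega^2 = 0$), then:
$$\frac{\operatorname{Var}\bar{X}}{\operatorname{Var}\bar{Z}} = n \qquad (10)$$
where:
$$\operatorname{Var}\bar{X} = \frac{\sigma^2}{m} \qquad (11)$$
$$\operatorname{Var}\bar{Z} = \frac{\sigma^2}{mn} \qquad (12)$$
m = Number of replicate samples (grab or composite) used in the
estimate of the population variance ($\sigma^2$)
n = Number of samples constituting the composite sample.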
Thus, from Equation 10, it can be seen that the collection of replicate
composite tissue samples at specified sampling locations will result in a
more efficient estimate of the mean (i.e., the variance of the mean obtained
with replicate composite samples is smaller than that obtained with the
collection of replicate grab samples). From Equation 9, it should also be
noted that for unequal proportions of composite samples (i.e., tissue mass),
the variance of the series of composite samples increases and, in extreme
cases, exceeds the variance of grab samples. A table of values for the
upper bound of the variance of the proportions ($\sigma_\omega^2$) that lead to such an
increase in composite variance is presented in Schaeffer and Janardan
(1978). However, these tabulated values are extremely high when compared
with expected values of $\sigma_\omega^2$ associated with preparing tissue-sample composites.
For example, using the Dirichlet model for compositing probabilities,
Rhode (1979) gave:
(13)
Var Z
as the increase in precision that can be achieved at the additional cost of
the compositing process. For the analyses presented below, it was assumed
that the composite samples consist of individual samples of equal proportions
and therefore that $\sigma_\omega^2 = 0$.
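The variance relationships discussed in this section can be sketched numerically. The code below assumes that the variance of a single composite has the form $\sigma^2/n + n\,\sigma_\omega^2\,\sigma^2$ and that the mean of m independent composites has that variance divided by m; this form is an assumption consistent with the equal-proportion results in Equations 11 and 12. The function names and the break-even helper are illustrative only.

```python
def composite_variance(sigma2, n, sigma_w2=0.0):
    """Variance of one composite of n organisms (assumed form:
    sigma2 / n + n * sigma_w2 * sigma2)."""
    return sigma2 / n + n * sigma_w2 * sigma2


def variance_of_mean(sigma2, m, n=1, sigma_w2=0.0):
    """Variance of the mean of m replicates, each a composite of n
    organisms. n = 1 with sigma_w2 = 0 is the grab-sample case."""
    return composite_variance(sigma2, n, sigma_w2) / m


def break_even_sigma_w2(n):
    """Variance of the proportions at which one composite is exactly as
    variable as one grab sample, under the assumed form above."""
    return (n - 1) / n**2


sigma2 = 70.90  # population variance used in the simulation analyses
grab = variance_of_mean(sigma2, m=5)
composite = variance_of_mean(sigma2, m=5, n=4)
print("Var of mean, 5 grab samples:     ", round(grab, 3))
print("Var of mean, 5 composites of 4:  ", round(composite, 3))
print("Ratio (grab / composite):        ", round(grab / composite, 3))
print("Break-even sigma_w2 for n = 4:   ", break_even_sigma_w2(4))
```

With equal proportions the ratio equals the number of organisms per composite, which is the efficiency gain described in the text.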
POWER ANALYSES FOR COMPOSITE SAMPLES
Analytical Methods
Historical data that could be used to evaluate the applicability of
composite sampling in bioaccumulation monitoring programs were not avail-
able. Instead, simulation methods were used to make a direct comparison of
grab- and composite-sampling strategies. Simulation refers to the use of
numerical techniques to generate random variables with specified statistical
properties. For the analyses described below, computer programs were
written 1) to produce individual random samples from populations with
statistical properties similar to those of the historical data described in
Table 2 and Figure 2 and 2) to construct composite samples.
The algorithms used to generate the individual random samples are
described in Rubinstein (1981). All algorithms used required the generation
of independent random variables uniformly distributed over the interval
(0, 1). The program developed to perform these simulations used the congruential
method described by Lewis et al. (1969). Normally distributed
variables were generated using the approach developed by Box and Muller
(1958).
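A minimal sketch of this two-step approach is given below: a multiplicative congruential generator supplies uniform deviates, and the Box and Muller transformation converts pairs of them into normal deviates. The multiplier (16807) and modulus (2^31 - 1) are assumed here because they are commonly associated with the Lewis et al. (1969) generator; the report does not state the constants actually used, and the function names and seed are illustrative.

```python
import math

MODULUS = 2**31 - 1   # assumed modulus (not stated in the report)
MULTIPLIER = 16807    # assumed multiplier (not stated in the report)


def lcg_uniforms(seed, count):
    """Yield `count` pseudo-random uniforms on (0, 1) from a
    multiplicative congruential generator."""
    state = seed
    for _ in range(count):
        state = (MULTIPLIER * state) % MODULUS
        yield state / MODULUS


def box_muller_normals(uniforms, mean=0.0, sd=1.0):
    """Convert an even-length sequence of uniforms into normal deviates
    with the given mean and standard deviation."""
    values = list(uniforms)
    normals = []
    for u1, u2 in zip(values[0::2], values[1::2]):
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        normals.append(mean + sd * radius * math.cos(angle))
        normals.append(mean + sd * radius * math.sin(angle))
    return normals


# Draw 20,000 values from a population like that of Analysis 1.
sample = box_muller_normals(
    lcg_uniforms(seed=12345, count=20000), mean=18.52, sd=math.sqrt(70.90)
)
sample_mean = sum(sample) / len(sample)
sample_var = sum((x - sample_mean) ** 2 for x in sample) / (len(sample) - 1)
print(f"sample mean = {sample_mean:.2f}, sample variance = {sample_var:.2f}")
```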
Two sets of analyses are described below. In the first set, simulation
methods were used to show the effect of sample compositing on the estimate
of the population mean. Power analyses were used in the second set of
analyses to demonstrate the effect of increasing the number of samples in a
composite sample on the probability of detecting specified levels of
differences among stations.
Simulation Analyses
The first set of analyses was conducted to demonstrate the effect of
increasing the number of individual samples in the composite on the estimate
of the mean. The simulated sampling consisted of randomly selecting 10,000
composite samples from two populations exhibiting two different levels of
variability in the sampling environment. The mean value in both populations
was fixed at 18.52, but the population variances were set at 70.90 or 354.19,
corresponding to coefficients of variation of 45.5 and 101.6, respectively.
These population characteristics were selected as representative of the
range of values for the coefficient of variation observed in the historical
data sets (Table 2 and Figure 2). Coefficients of variation of 40-50 percent
were measured in several historical data sets for metals, including American
lobster muscle (mercury), Dover sole liver (silver and cadmium), and English
sole muscle (lead and mercury). Coefficients of variation of approximately
100 percent were observed for arsenic in English sole muscle and for lead
and PCBs in Cancer spp.
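The correspondence between the population variances and the stated coefficients of variation can be verified with a short calculation, taking the coefficient of variation as the standard deviation divided by the mean, times 100. The function name is illustrative.

```python
import math


def coefficient_of_variation(mean, variance):
    """Return the coefficient of variation in percent."""
    return 100.0 * math.sqrt(variance) / mean


cv_low = round(coefficient_of_variation(18.52, 70.90), 1)
cv_high = round(coefficient_of_variation(18.52, 354.19), 1)
print(cv_low, cv_high)
```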
To demonstrate the effect of increasing the number of samples consti-
tuting the composite sample, the sample variance and the range of observed
values were recorded in each experiment. The results of these experiments
are summarized in Figure 7 and Table 5. A graphic display of the increase
in the ability to estimate the population mean obtained by increasing the
number of individual samples in composite samples is provided in Figure 7.
The 95 percent confidence intervals shown in Figure 7 represent the range
within which 95 percent of all samples in the simulation experiments fell.
As the number of individual samples per composite increased, the observed
range of mean values decreased substantially.

Figure 7. Effects of increasing composite sample size on estimate of
the mean.
[Plots not reproduced. Each panel shows the 95 percent confidence interval
against the number of samples in the composite (4, 6, 10, 20).
Analysis 1: mean (μ) = 18.52, variance (σ²) = 70.90, coefficient of
variation = 45.5. Analysis 2: mean (μ) = 18.52, variance (σ²) = 354.19,
coefficient of variation = 101.6.]

TABLE 5. RESULTS OF SIMULATION ANALYSES DEMONSTRATING THE
EFFECT OF COMPOSITE SAMPLING ON THE ESTIMATE OF THE
POPULATION MEAN

Analysis 1. Mean (μ) = 18.52
            Variance (σ²) = 70.90
            Coefficient of Variation = 45.5

                                  95 Percent Confidence Interval
Number of Samples    Observed     Minimum     Maximum     Observed
in Composite         Variance     Value       Value       Range
       4              17.29        10.4        26.7        16.3
       6              12.31        11.6        25.4        13.8
      10               7.02        13.3        23.7        10.4
      20               3.49        14.9        22.2         7.3

Analysis 2. Mean (μ) = 18.52
            Variance (σ²) = 354.19
            Coefficient of Variation = 101.6

                                  95 Percent Confidence Interval
Number of Samples    Observed     Minimum     Maximum     Observed
in Composite         Variance     Value       Value       Range
       4              88.29         0.1        36.9        36.8
       6              60.34         3.3        33.7        30.4
      10              35.57         6.8        30.2        23.4
      20              16.73        10.5        26.5        16.0

The actual values obtained at the boundaries of these confidence
intervals shown in Figure 7 (minimum and maximum values) are presented in
Table 5. In Analysis 1, sampling was conducted from a normal population
with a mean of 18.52 and a variance of 70.90. Therefore, 95 percent of all
values in this population ranged from approximately 1.7 to 35.4. This range
(33.7) would be expected from randomly selecting a large number of individual
samples from the specified population. For the same specified population,
composite sampling resulted in a much more precise estimate of the population
mean. With four individual samples in each composite, 95 percent of all
values obtained from the 10,000 simulated samples were between 10.4 and
26.7. This range (16.3) is approximately 50 percent of the range that would
be obtained with the collection of a similarly large number of individual
grab samples from this population. Furthermore, an increase in the number
of samples from 4 to 20 in each composite sample decreased the range of
values that define this 95 percent confidence interval by approximately
another 50 percent.
Similar results were obtained in the second experiment that involved
simulated sampling from a population with the same mean value (18.52), but
with the variance increased to 354.2. The simulated collection of composite
samples, each consisting of four individual samples from the population,
resulted in a reduction in the range of values in the 95 percent confidence
interval by a factor of 2. A similar reduction in this range was obtained
by increasing the number of samples in the composite to 20.
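A simulation of this kind can be sketched with the Python standard library. The code below is not the program used for the report: it relies on Python's built-in generator rather than the congruential method, assumes equal-proportion composites from a normal population, and takes the interval bounds as the empirical 2.5 and 97.5 percentiles. The function name and seed are illustrative.

```python
import random
import statistics


def simulate_composites(n_per_composite, mean=18.52, variance=70.90,
                        n_composites=10_000, seed=1):
    """Simulate equal-proportion composites and return the observed
    variance plus the empirical 2.5 and 97.5 percentiles."""
    rng = random.Random(seed)
    sd = variance ** 0.5
    composites = sorted(
        sum(rng.gauss(mean, sd) for _ in range(n_per_composite)) / n_per_composite
        for _ in range(n_composites)
    )
    lower = composites[int(0.025 * n_composites)]
    upper = composites[int(0.975 * n_composites) - 1]
    return statistics.variance(composites), lower, upper


for n in (4, 6, 10, 20):
    var, lower, upper = simulate_composites(n)
    print(f"n = {n:2d}: variance = {var:6.2f}, "
          f"interval = {lower:5.1f} to {upper:5.1f}, range = {upper - lower:5.1f}")
```

The simulated variances should fall close to the population variance divided by the number of samples per composite, in line with the pattern in Table 5, although the exact figures depend on the random number stream.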
Power Analyses
The results of power analyses conducted with the historical data sets
of individual grab samples are presented above in Figures 4 through 6. In
these analyses, the minimum detectable difference between sampling stations
was shown as a function of the number of replicate grab samples at each
station. These results are shown for specified sets of design parameters
(i.e., number of stations, significance level of the test, residual error
variance, and power of the test). To demonstrate the effect of sample
compositing on the power of the statistical test of significance, additional
power analyses were conducted. In these analyses, the number of stations
(5), number of replicate samples at each station (5), significance level of
the test (0.05), residual error variance level, and level of minimum
detectable difference were fixed. The power of the test or probability of
detecting the specified minimum difference was then calculated as a function
of the number of individual samples constituting each replicate composite
sample.
Power analyses were conducted for three levels of sample variability.
All design parameters except the residual error variance were identical in
each set of analyses. Values of the residual error variance were selected
to represent the range of values found in the historical data sets (Table 2
and Figure 2). The coefficients of variation selected for these three sets
of analyses were 45.5, 101.6, and 203.5. The highest level of variability
(coefficient of variation = 203.5) is equal to that measured in Dover sole
for DDT concentrations in muscle tissue.
The results of the power analyses conducted at the two lower levels of
sample variability are shown in Figure 8a. In each analysis, the probability
of statistically detecting a difference equal to the overall sample mean
among stations increases with the collection of replicate composite samples
at each station and as the number of samples constituting the composite
increases. For example, in Analysis 1 (Figure 8a), conducted at the lowest
level of sample variability (coefficient of variation = 45.5), the proba-
bility of detecting the specified difference among stations with five
replicate grab samples (i.e., number of samples = 1) is 0.70. With the
collection of five replicate composite samples, each composed of two
individual samples at each station, the power of the test increases to 0.96,
and with four or more samples per composite, the detection of the specified
difference between stations is virtually assured.
The benefits of the composite sampling strategy are more apparent from
the analysis conducted at the intermediate level of sample variability
(Analysis 2, Figure 8a). The probability of statistically detecting the
specified difference among stations with the collection of five individual
grab samples (number of samples = 1) is only 0.17. The power of the test
increases to 0.59 with the collection of 5 replicate composite samples, with
each composite composed of 4 samples in equal proportions, to 0.90 with 8
samples per composite, and to 0.96 with 10 samples per composite.
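The effect of compositing on power can also be approximated by Monte Carlo simulation of a one-way ANOVA. The sketch below is an illustration under stated assumptions, not the analytical method used for the report: it places two of the five station means one overall mean apart with the remaining stations at the midpoint, treats a composite of n organisms as a single normal deviate with standard deviation reduced by the square root of n, and uses a critical F value of 2.87 for 4 and 20 degrees of freedom taken from standard tables. All names are hypothetical.

```python
import random

F_CRITICAL_4_20 = 2.87  # upper 5 percent point of F(4, 20), standard tables


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulated_power(cv_percent, n_per_composite=1, trials=2000, seed=1):
    """Estimate the probability of detecting a difference equal to the
    overall mean with 5 stations and 5 replicates per station."""
    rng = random.Random(seed)
    mean = 1.0
    # Standard deviation of an equal-proportion composite of n organisms.
    sd = cv_percent / 100.0 / n_per_composite ** 0.5
    # Assumed configuration: extremes differ by the overall mean.
    station_means = [mean - 0.5, mean, mean, mean, mean + 0.5]
    hits = 0
    for _ in range(trials):
        groups = [[rng.gauss(mu, sd) for _ in range(5)] for mu in station_means]
        if one_way_f(groups) > F_CRITICAL_4_20:
            hits += 1
    return hits / trials


power_low_cv = simulated_power(45.5)
power_mid_cv = simulated_power(101.6)
power_mid_cv_composite = simulated_power(101.6, n_per_composite=4)
print(power_low_cv, power_mid_cv, power_mid_cv_composite)
```

Under these assumptions the estimates should show the same ordering as Figure 8a: power is lowest for grab samples at the higher coefficient of variation and rises when each replicate is a composite.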

Figure 8. Power of statistical tests vs. number of samples in composite
replicate samples. Fixed design parameters: number of stations
= 5, number of replicates = 5, significance level = 0.05, minimum
detectable difference = 100 percent of overall mean value.
[Plots not reproduced. Vertical axis: power, 0.0 to 1.0. Horizontal axis:
number of samples. Panel (a): Analyses 1 and 2 (coefficients of variation
of 45.5 and 101.6), 0 to 16 samples. Panel (b): Analysis 3 (coefficient of
variation of 203.5), 0 to 30 samples.]

The results of both sets of analyses shown in Figure 8a also demonstrate
the phenomenon of diminishing returns for continued increases in the number
of samples per composite. In Analysis Set 1, for example, virtually no
increase in the power of the statistical test was achieved by increasing
the number of samples per composite above three. In the second analysis set,
substantial increases in statistical power were achieved by increasing the
number of samples in each composite from 2 to 10. However, with each
successive increase in sample size, the relative benefit was reduced until
very little was gained by increasing the sample size above 10. This
phenomenon is analogous to that observed in the results of the power
analyses presented in Figures 4 through 6. In these previous analyses, the
benefit achieved in the minimum detectable difference between stations also
decreased with the addition of each successive replicate grab sample.
However, the main difference is that while the cost of collecting and
processing additional replicate samples (grab or composite) is substantial,
the cost of collecting additional samples for each composite replicate
sample is relatively small.
The results of the power analyses conducted at the highest level of
sample variability selected are shown in Figure 8b. These results are shown
separately from the first two sets of analyses to include a larger range of
values for the number of samples in each composite. These results are
directly comparable, however, and provide additional evidence of the
increased power obtained by the collection of replicate composite samples.
These analyses indicate the relatively low statistical power associated with
samples displaying a high level of natural variability. Under these
conditions (coefficient of variation = 203.5), the probability of detecting
a difference among stations equal to the overall mean with the collection of
five individual replicate grab samples is 0.08. The power of the test is
doubled with the collection of replicate composite samples composed of four
samples (power = 0.17). With the collection of 10 samples per replicate
composite sample, the power is increased to 0.38. However, given the high
level of background variability, the collection of replicate composite
samples composed of 25 individual samples each is required to obtain a
testing power of 0.80. Additionally, due to the diminishing returns
associated with increasing the number of samples per composite, the collec-
tion of replicate composite samples consisting of 32 samples each is
required to obtain a statistical power of approximately 0.90.
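One way to see why so many samples per composite are needed at high variability is to compute the coefficient of variation among composites. Under the equal-proportion assumption used in these analyses, the variance of a composite of n samples is the population variance divided by n, so the coefficient of variation shrinks by the square root of n. The function name below is illustrative.

```python
import math


def effective_cv(cv_percent, n_per_composite):
    """Coefficient of variation among equal-proportion composites of
    n_per_composite samples, given the CV among individual samples."""
    return cv_percent / math.sqrt(n_per_composite)


cv_high_25 = effective_cv(203.5, 25)
cv_high_20 = effective_cv(203.5, 20)
cv_mid_4 = effective_cv(101.6, 4)
print(f"CV 203.5, 25 per composite: {cv_high_25:.1f}")
print(f"CV 203.5, 20 per composite: {cv_high_20:.1f}")
print(f"CV 101.6,  4 per composite: {cv_mid_4:.1f}")
```

With a coefficient of variation of 203.5, about 20 samples per composite are needed before the composites are only as variable as individual grab samples from the lowest-variability case (45.5).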
A final set of power analyses was conducted to provide a direct
comparison of grab-sampling and composite-sampling strategies. The number
of stations (5), significance level of the test (0.05), and residual error
variance level (coefficient of variation = 101.6) were fixed in all
analyses. In individual analyses, the probability of detecting specified
minimum differences between stations was determined for selected numbers of
replicate grab samples and for a fixed number of replicate composite samples
at each station composed of selected numbers of individual samples. The
results of these analyses are summarized in Table 6.
Results of the first three analyses shown in Table 6 demonstrate the
effect of increasing the number of replicate grab samples on the ability to
detect statistically significant differences between sampling stations. For
example, the probability of detecting a difference equal to the overall mean
among stations is increased from 0.17 to 0.35 by increasing the number of
replicate grab samples at each of the five stations from 5 (Analysis 1) to 10
(Analysis 3). These first three results demonstrate from a different
perspective what was previously shown in Figures 4 through 6, namely that an
increase in the number of replicate samples is accompanied by an increase in
the ability to identify differences among sampling stations. The results of
Analyses 4 through 6 presented in Table 6 demonstrate the effect of sample
compositing on the ability to detect differences between stations. These
analyses were conducted for five replicate composite samples at each station
and various numbers of individual samples per composite. These results,
therefore, are directly comparable to those provided in Analysis 1 for the
collection of five replicate grab samples at each station. Comparison of
the probability of detecting a difference between stations equal to the
overall mean in Analyses 1, 4, 5, and 6 indicates that a substantial
increase in the power of the statistical test can be achieved by the
collection of replicate composite samples. These analyses demonstrate that
the collection of five replicate composite samples each consisting of four
samples will increase the power to 0.59. The power is further increased to
0.96 by increasing the samples in each composite sample to 10.
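The comparison above can be approximated with a short Monte Carlo sketch.
This is an illustration only, not the ODES Power Analysis Tool. It assumes
normally distributed values, a one-way analysis of variance, an alternative
in which two stations differ by the specified amount while the remaining
stations sit at the overall mean, and a composite standard deviation equal
to the individual standard deviation divided by the square root of the
number of samples per composite. The function names and the number of
simulations are hypothetical choices made for this example.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F statistic for equal-sized groups of values."""
    k, n = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)  # equal group sizes, so mean of means
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (k * (n - 1))
    return ms_between / ms_within


def estimate_power(delta, cv, n_rep, per_composite, k=5, alpha=0.05,
                   n_sim=2000, seed=1):
    """Estimate the probability of detecting a difference `delta`.

    `delta` and `cv` are proportions of the overall mean, which is set
    to 1.0. Assumption: a composite of m samples has standard deviation
    cv / sqrt(m).
    """
    rng = random.Random(seed)
    sd = cv / per_composite ** 0.5

    def draw(station_means):
        return [[rng.gauss(mu, sd) for _ in range(n_rep)]
                for mu in station_means]

    # Critical value: empirical (1 - alpha) quantile of F with no difference.
    null_f = sorted(f_statistic(draw([1.0] * k)) for _ in range(n_sim))
    critical = null_f[min(n_sim - 1, int((1 - alpha) * n_sim))]

    # Assumed alternative: two stations differ by delta, the rest are equal.
    alt_means = [1.0 - delta / 2, 1.0 + delta / 2] + [1.0] * (k - 2)
    hits = sum(f_statistic(draw(alt_means)) > critical for _ in range(n_sim))
    return hits / n_sim


# Five replicates per station; grab samples versus composites of 4 and 10.
for m in (1, 4, 10):
    print(m, estimate_power(delta=1.0, cv=1.016, n_rep=5, per_composite=m))
```

Because the estimates are simulated, they vary with the seed and the number
of simulations, and they should be read as approximate.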
Summary
1. Based on simulation results for a given number of samples,
composite sampling results in a much more precise estimate of
the mean than analysis of grab samples.
TABLE 6. PROBABILITY OF DETECTING SPECIFIED LEVELS OF MINIMUM
DETECTABLE DIFFERENCES FOR SELECTED GRAB-SAMPLING
AND COMPOSITE-SAMPLING STRATEGIES.
FIXED DESIGN PARAMETERS: NUMBER OF STATIONS = 5,
SIGNIFICANCE LEVEL = 0.05, COEFFICIENT OF VARIATION = 101.6

          Number of                      Minimum Detectable Difference
          Replicate                      (expressed as a proportion
          Samples     Number of          of the overall mean)
Analysis  at Each     Samples in         0.25   0.50   1.0    1.5    2.0
Number    Station     Composite          Corresponding Probability of Detection

1         5           1 (grab sample)    0.06   0.08   0.17   0.35   0.59
2         8           1 (grab sample)    0.06   0.10   0.27   0.58   0.85
3         10          1 (grab sample)    0.07   0.11   0.35   0.71   0.94
4         5           4                  0.08   0.17   0.59   0.93   1.00
5         5           8                  0.10   0.31   0.90   1.00   1.00
6         5           10                 0.11   0.38   0.96   1.00   1.00
2. The precision of the estimated mean increases with increasing
numbers of individual samples constituting a composite sample.
3. Because of reduced sample variance, composite sampling
results in a considerable increase in statistical power over
grab sampling (for a given number of samples analyzed).
4. For most contaminants, the collection of six to eight samples
per composite results in adequate statistical power, with
little relative gain in power for additional samples.
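The diminishing returns noted in the summary points above can be
illustrated with a small calculation. This sketch assumes that each
composite contains equal amounts of tissue from independent individuals and
that analytical error is negligible, so that the coefficient of variation
among composites is the individual coefficient of variation divided by the
square root of the number of samples per composite. The function name is
hypothetical.

```python
import math


def composite_cv(individual_cv, per_composite):
    """Coefficient of variation among composites of `per_composite` samples.

    Assumes independent individuals, equal tissue amounts, and negligible
    analytical error.
    """
    return individual_cv / math.sqrt(per_composite)


# Each doubling of the composite size removes less variability than the last.
for m in (1, 2, 4, 6, 8, 16, 32):
    print(f"{m:2d} per composite: CV = {composite_cv(101.6, m):5.1f}")
```

Under these assumptions, going from 1 to 2 samples per composite reduces
the coefficient of variation by far more than going from 16 to 32.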
SUMMARY AND RECOMMENDATIONS
This document describes the use of power analyses in designing 301(h)
bioaccumulation monitoring programs and provides evaluations of alternative
sampling strategies. These methods can be used to evaluate alternative
designs on the basis of the level of sampling effort required to obtain
desired levels of precision. For example, existing data can be analyzed to
determine the minimum differences in contaminant concentrations that can be
detected for selected levels of sample replication. The probability of
detecting specific levels of differences in tissue contaminant concentrations
for alternative sampling designs can also be determined.
The example analyses presented in this report were conducted on the
Ocean Data Evaluation System (ODES) using the statistical power analysis
tool. The ODES Power Analysis Tool can be used to assess bioaccumulation
monitoring programs from two perspectives. In the monitoring program design
phase, these techniques can be used in a prospective manner to evaluate
alternative design parameters such as numbers of samples and sampling
stations. The techniques can also be used retrospectively when monitoring
data are available to evaluate overall monitoring program performance. For
example, if greater statistical power were desired for future data, the
gain in power from increasing the number of replicate samples could be
weighed against the increased program costs.
The use of power analyses in designing bioaccumulation studies was
demonstrated with both historical and simulated data. Twenty-three
historical data sets were compiled from published reports and analyzed.
Data were obtained for a total of five common marine species, three body
tissues, and measured values of nine contaminants. These data encompassed a
wide range of sample variability, and the results of the analyses conducted
provide an indication of the approximate levels of statistical power that
can be achieved with the collection of replicate grab samples at selected
sampling locations. Simulation techniques, sometimes referred to as Monte
Carlo techniques, were used to produce data from specified sampling distri-
butions with fixed parameters. These data were essential for the evaluation
of grab- vs. composite-sampling strategies because equivalent historical
data for composite samples were not available.
In addition to the description and demonstration of power analysis
techniques, a primary objective of this document was to evaluate composite-
vs. grab-sampling strategies. Based on the results presented in this report,
the collection of replicate composite samples is recommended for most
bioaccumulation monitoring programs. The results of the analyses using
simulated data demonstrated that the collection of replicate composite
tissue samples at selected sampling locations provides a better estimate of
the population mean. The results of power analyses using these simulated
data also demonstrate that the corresponding decrease in the sample variance
that is achieved with composite sampling leads to an increase in the power
of statistical tests. For example, with an overall coefficient of variation
of approximately 100, the analyses demonstrated that the probability of
detecting a difference in tissue contaminant concentrations equal to the
overall mean among 5 stations was 0.17 with the collection of 5 replicate
grab samples at each station and 0.59, 0.90, and 0.96 with the collection of
5 replicate composite samples consisting of 4, 8, and 10 samples, respec-
tively.
Based on the levels of variability in the measurements of tissue
contaminant concentrations that were identified in the historical data sets
and the results of analyses presented in this report, it was concluded that
the collection of replicate composite samples makes it feasible to distin-
guish elevated tissue concentrations of contaminants between sampling
locations. However, the selection of the appropriate numbers of replicate
composite samples and numbers of samples per replicate will depend on
site-specific levels of sample variability in the tissues and contaminants
of concern. When these kinds of historical data are available for a
particular study site, the tools demonstrated in this document can be used
to make quantitative comparisons of alternative sampling designs and to
select the appropriate level of sampling effort. Where historical data are
not available, pilot studies may be conducted to estimate the level of
expected variability in contaminant concentrations for selected species and
tissues. Alternatively, the observed level of variability in tissue
concentrations for selected contaminants and species at other locations
could be used to estimate sample variability. Where these data cannot be
obtained, the collection of five replicate composite samples, each consisting
of equal amounts of tissue from six individual organisms, is recommended.
The selection of these design parameters assumes a coefficient of variation
calculated among stations of approximately 100 percent and will result in
the ability to detect a difference between stations equal to the overall
mean concentration among stations with an estimated probability of detection
(power of the statistical test) equal to 0.80. Note that most data sets
reviewed herein had coefficients of variation less than 100 percent. Therefore, the
recommended design will generally result in either a lower detectable
difference or higher power than stated above. If this general design
specification is used, power analyses should be conducted after site-
specific data are available to evaluate the exact probability of detecting
specific levels of differences between stations.
The objectives of a bioaccumulation monitoring program should be
evaluated prior to selecting a sampling strategy. Composite sampling
methods are appropriate for monitoring programs that have as a primary
objective the determination of differences in contaminant tissue concentra-
tions among sampling stations. However, as shown in Equations 9 and 11, the
variance of composite samples is substantially less than the population
variance. As a consequence, the range of values obtained from composite
samples is not representative of the true range of tissue concentrations in
individual organisms of the sampled population. These results demonstrate
that composite sampling will not detect extreme values. Therefore, bioac-
cumulation monitoring programs using a composite sampling strategy may not
detect the existence of tissue concentrations that exceed legal limits or
action levels for contaminants in fish tissues. For example, from the
historical data compiled for this study, the mean concentration of total PCBs
in Dover sole muscle tissue (Data Set 4, Table 3) was 0.766 ppm. However, 2
of the 36 values in this data set exceed the U.S. Food and Drug Administra-
tion legal limit of 2.0 ppm (U.S. Food and Drug Administration 1984). These
values would not have been detected in a monitoring program based on the
collection of composite samples. Therefore, if the objective of a bioaccumu-
lation study is to determine compliance with specified tissue concentration
limits, the program should include the collection of tissue samples from
individual organisms. This objective could be accomplished by two different
monitoring strategies. In the first, the entire study could be designed to
collect replicate grab samples. In this design there would be a much lower
statistical power to detect among-station differences than could be accom-
plished with a composite sampling strategy using the same number of samples.
Alternatively, the program could be designed to collect replicate composite
samples at all sampling stations with the collection of supplemental
individual tissue samples at areas of expected high tissue concentrations.
This program design would enable a more efficient assessment of among-
station differences in addition to providing an assessment of regulatory
compliance in specified areas of concern.
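The masking of extreme values by compositing can be illustrated with a
simulation. This is a sketch under stated assumptions, not a reanalysis of
the historical data: the mean (0.766 ppm) and the limit (2.0 ppm) are taken
from the example above, while the lognormal distribution, the coefficient
of variation of 100 percent, the composite size of six, and the function
name are hypothetical choices made for this example.

```python
import math
import random


def exceedance_fractions(mean=0.766, cv=1.0, limit=2.0, per_composite=6,
                         n=20000, seed=1):
    """Fraction of individuals and of composites above `limit`.

    Individual concentrations are drawn from an assumed lognormal
    distribution with the given mean and coefficient of variation.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1 + cv ** 2))
    mu = math.log(mean) - sigma ** 2 / 2

    def individual():
        return rng.lognormvariate(mu, sigma)

    def composite():
        # Equal amounts of tissue from each organism: a simple average.
        return sum(individual() for _ in range(per_composite)) / per_composite

    frac_individuals = sum(individual() > limit for _ in range(n)) / n
    frac_composites = sum(composite() > limit for _ in range(n)) / n
    return frac_individuals, frac_composites


ind, comp = exceedance_fractions()
print(f"individuals above limit: {ind:.3f}")
print(f"composites above limit:  {comp:.3f}")
```

In this hypothetical population, values above the limit occur far less
often among composites than among individual organisms, which is why
individual tissue samples are needed when compliance is the objective.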
A primary objective of 301(h) bioaccumulation monitoring programs is to
determine whether the discharge causes an increase in the body burden of
toxic chemicals in indigenous organisms. Monitoring programs may also use
caged molluscs as sentinel organisms to evaluate uptake of toxic pollutants.
For these kinds of studies on indigenous or transplanted organisms, a
composite sampling strategy is recommended. Evaluation of effects on
recreational and commercial fisheries is another important component of some
301(h) monitoring programs. Where such fisheries are included in the
assessment, it is important to document whether tissue contaminant levels
exceed applicable criteria or standards. In these cases, the 301(h)
monitoring program may contain the dual objectives discussed previously.
Based on the statistical evaluations conducted herein, it is recommended
that such dual-objective programs be designed to collect composite tissue
samples at all sampling stations, and that they also include the collection
of individual tissue samples for commercial and recreational species in
selected areas.
An additional concern relative to selecting between analysis of
individual organisms and composite samples involves an evaluation of the
numbers of organisms required for collection. As stated previously, the
analytical costs associated with processing grab samples and composite
samples are essentially equal. Overall cost differences between the two
strategies are associated with the additional time needed to collect
organisms and process tissues for composite sampling. For the general
composite sampling design recommended above, 30 organisms would be required
at each sampling station. For a comparable design involving analysis of
individual organisms, only five organisms would be required at each sampling
station. Therefore, the decision on appropriate sampling strategy should
also involve an assessment of the feasibility of collecting the required
numbers of organisms.
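The collection effort for the two designs can be tallied as follows. This
is a minimal sketch; the function name and the returned field names are
hypothetical, and the number of analyses per station is taken to equal the
number of replicates in both designs.

```python
def station_effort(replicates, per_composite=1):
    """Organisms collected and samples analyzed at one station."""
    return {
        "organisms": replicates * per_composite,
        "analyses": replicates,
    }


# Recommended composite design: 5 replicates of 6 organisms each.
print(station_effort(replicates=5, per_composite=6))
# Comparable grab-sample design: 5 individual organisms.
print(station_effort(replicates=5))
```

Both designs require the same number of analyses per station, so the
difference lies in the number of organisms that must be collected and
processed.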
REFERENCES
Andrews, F.C. 1954. Asymptotic behavior of some rank tests of analysis of
variance. Ann. Math. Stat. 25:724-736.
Box, G.E.P., and M.E. Muller. 1958. A note on the generation of random
normal deviates. Ann. Math. Stat. 29:610-611.
Cohen, J. 1977. Statistical power analysis for the behavioral sciences.
Academic Press, New York, NY.
Gordon, M., G.A. Knauer, and J.H. Martin. 1980. Mytilus californianus as a
bioindicator of trace metal pollution: variability and statistical consider-
ations. Mar. Pollut. Bull. 11:195-198.
Grieb, T.M. 1985. Robustness of the analysis of variance in environmental
monitoring applications. Report EA 4015. Electric Power Research Institute,
Palo Alto, CA. 72 pp.
Kruskal, W.H., and W.A. Wallis. 1952. Use of ranks in one-criterion
variance analysis. J. Am. Statist. Assoc. 47:583-612.
Lehmann, E.L. 1975. Nonparametrics: statistical methods based on ranks.
Holden-Day, Inc., San Francisco, CA. 457 pp.
Lewis, P.A.W., A.S. Goodman, and J.M. Miller. 1969. A pseudo-random number
generator for the System/360. IBM Syst. J. 8:199-200.
Rhode, C.A. 1976. Composite sampling. Biometrics 32:278-282.
Rhode, C.A. 1979. Batch, bulk, and composite sampling, pp. 365-377. In:
Sampling Biological Populations. R.M. Cormack et al. (eds). International
Co-operative Publishing House, Fairland, MD.
Risebrough, R.W., B.W. deLappe, E.F. Letterman, J.L. Lane, M. Firestone-
Gillis, A.M. Springer, and W. Walker II. 1980. California mussel watch:
1977-1978. Volume III - Organic Pollutants in Mussels, Mytilus californianus
and M. edulis along the California Coast. Water Quality Monitoring Report
No. 79-22. Prepared by Bodega Marine Laboratory, Bodega Bay, CA, for
California State Water Resources Control Board, Sacramento, CA. 109 pp. +
appendices.
Roberts, A.E., D.R. Hill, and E.G. Tifft. 1982. Evaluation of New York
Bight lobsters for PCBs, DDT, petroleum hydrocarbons, mercury, and cadmium.
Bull. Environ. Contam. Toxicol. 29:711-718.
Rubinstein, R.Y. 1981. Simulation and the Monte Carlo method. John Wiley
and Sons, New York, NY. 278 pp.
Schaeffer, D.J., and K.G. Janardan. 1978. Theoretical comparison of grab
and composite sampling programs. Biometrical J. 20:215-227.
Schaeffer, D.J., H.W. Kerster, and K.G. Janardan. 1980. Grab versus
composite sampling: a primer for the manager and engineer. Environ.
Manage. 4:157-163.
Scheffe, H. 1959. The analysis of variance. John Wiley and Sons, New
York, NY. 477 pp.
Sherwood, M.J., A.J. Mearns, D.R. Young, B.B. McCain, R.A. Murchelano,
G. Alexander, T.C. Heeson, and T.-K. Jan. 1980. A comparison of trace
contaminants in diseased fishes from three areas. Southern California
Coastal Water Research Project, Long Beach, CA. 131 pp.
Tetra Tech. 1985a. Bioaccumulation monitoring guidance: 1. estimating
the potential for bioaccumulation of priority pollutants and 301(h) pesti-
cides discharged into marine and estuarine waters. Final Report prepared
for Marine Operations Division, Office of Marine and Estuarine Protection,
U.S. Environmental Protection Agency. EPA Contract No. 68-01-6938. Tetra
Tech, Inc., Bellevue, WA. 56 pp. + appendices.
Tetra Tech. 1985b. Bioaccumulation monitoring guidance: 3. recommended
analytical detection limits. Final Report prepared for Marine Operations
Division, Office of Marine and Estuarine Protection, U.S. Environmental
Protection Agency. EPA Contract No. 68-01-6938. Tetra Tech, Inc., Bellevue,
WA. 23 pp.
Tetra Tech. 1985c. Commencement Bay nearshore/tideflats remedial investi-
gation. Final Report. Volumes 1 and 2. Prepared for Washington State
Department of Ecology under Contract No. C84031. Tetra Tech, Inc., Bellevue,
WA.
Tetra Tech. 1987a. Bioaccumulation monitoring guidance: 2. selection of
target species and review of available bioaccumulation data, volume I. EPA
430/9-86-005. U.S. Environmental Protection Agency, Marine Operations
Division, Office of Marine and Estuarine Protection, Washington, DC. 52 pp.
Tetra Tech. 1987b. Bioaccumulation monitoring guidance: 2. selection of
target species and review of available bioaccumulation data, volume II:
appendices. EPA 430/9-86-006. U.S. Environmental Protection Agency, Marine
Operations Division, Office of Marine and Estuarine Protection, Washington,
DC.
Tetra Tech. 1987c. Technical support document for ODES statistical power
analysis. Final Report prepared for Marine Operations Division, Office of
Marine and Estuarine Protection, U.S. Environmental Protection Agency. EPA
Contract No. 68-01-6938. Tetra Tech, Inc., Bellevue, WA. 34 pp. + appendix.
Tetra Tech and American Management Systems, Inc. 1986. ODES user's guide:
supplement A - description and use of Ocean Data Evaluation System (ODES)
tools. Prepared for U.S. Environmental Protection Agency. Tetra Tech,
Inc., Bellevue, WA.
U.S. Food and Drug Administration. 1984. Polychlorinated biphenyls (PCBs)
in fish and shellfish; reduction of tolerances; final decision. U.S. FDA,
Rockville, MD. Federal Register, Vol. 49, No. 100. pp. 21514-21520.