EPA CONTRACT NO. 68-01-6938
TC-3953-03
FINAL REPORT
BIOACCUMULATION
MONITORING GUIDANCE:
STRATEGIES FOR SAMPLE
REPLICATION  AND COMPOSITING

JUNE, 1987

PREPARED FOR:
MARINE OPERATIONS DIVISION
OFFICE OF MARINE AND ESTUARINE PROTECTION
U. S. ENVIRONMENTAL PROTECTION AGENCY

WASHINGTON, DC 20460

-------
EPA Contract No. 68-01-6938
TC 3953-03

Final Report
BIOACCUMULATION MONITORING GUIDANCE:

STRATEGIES FOR SAMPLE REPLICATION
AND COMPOSITING
for

U.S. Environmental Protection Agency
Office of Marine and Estuarine Protection
Washington, DC  20460
June 1987
by

Tetra Tech, Inc.
11820 Northup Way, Suite 100
Bellevue, Washington  98005

-------
                                  PREFACE
     This manual  has been  prepared  by  the  U.S.  Environmental  Protection
Agency  (EPA) Marine  Operations Division, Office  of Marine  and Estuarine
Protection  in  response  to  requests  for  guidance  from U.S.  EPA regional
offices and  coastal  municipalities  planning  301(h)  monitoring programs for
municipal  discharges into the marine  environment.  The members of the 301(h)
Task Force  of EPA,  which  includes  representatives  for the U.S. EPA Regions
I, II, III,  IV, IX, and X,  the  Office  of Research  and Development,  and the
Office of Water,  are to be  commended  for  their vital  role  in the develop-
ment of this guidance by the technical  support contractor, Tetra Tech, Inc.

     This report provides guidance on  the  selection of appropriate replica-
tion and  compositing strategies  for  bioaccumulation monitoring studies.
This  report is  one element  of  the  Bioaccumulation  Monitoring Guidance
Series.  The purpose of this series  is  to provide guidance for monitoring of
priority pollutant  residues  in  tissues of  resident marine organisms.  These
guidance  documents  were prepared  for  the 301(h)  sewage discharge  permit
program under the U.S. EPA Office of  Marine and Estuarine Protection, Marine
Operations Division.  Other documents in this  series  include:

     •    Estimating the  Potential   for  Bioaccumulation  of Priority
          Pollutants and 301(h)  Pesticides (Tetra Tech  1985a)

     •    Selection of Target Species and Review of  Available Bioaccumu-
          lation Data, Volumes I and  II (Tetra Tech 1987a,b)

     •    Recommended Analytical Detection Limits (Tetra Tech 1985b).

     The statistical  analyses conducted  in this document are  based  on the
Ocean  Data  Evaluation System (ODES) Tool   No.  14  for  Statistical  Power
Analysis.    The Technical  Support Document  for ODES Statistical  Power

-------
Analysis  (Tetra  Tech  1987c)  describes the  basis  for, and  application  of,
these analytical  procedures.

     The information provided herein will be useful  to  U.S.  EPA monitoring
program reviewers,  permit  writers,  permittees,  and  other  organizations
involved  in  performing  nearshore monitoring  studies.   Bioaccumulation
monitoring has become  increasingly  important in assessing pollution effects;
therefore  this guidance should  have broad applicability in  the  design  and
interpretation of marine and  estuarine monitoring programs.

-------
                                  CONTENTS


                                                                        Page

PREFACE                                                                  ii

LIST OF FIGURES                                                           v

LIST OF TABLES                                                           vi

ACKNOWLEDGMENTS                                                         vii

INTRODUCTION                                                              1

MONITORING PROGRAM PERFORMANCE                                            2

     METHODS OF ANALYSIS                                                  2

     HYPOTHESIS TESTING                                                   3

     POWER ANALYSES FOR INDIVIDUAL TISSUE SAMPLES                         5

          Analytical Methods                                              6
          Preliminary Analyses                                           10
          Analytical Results                                             14
          Summary                                                        28

COMPOSITE SAMPLING STRATEGIES                                            29

     POWER ANALYSES FOR COMPOSITE SAMPLES                                33

          Analytical Methods                                             33
          Simulation Analyses                                            33
          Power Analyses                                                 37
          Summary                                                        41

SUMMARY AND RECOMMENDATIONS                                              44

REFERENCES                                                               49

-------
                                  FIGURES
Number                                                                 Page
   1    Hypothesis testing:   possible circumstances  and  test
        outcomes                                                         4
   2    Frequency distribution for calculated  values of  the
        coefficient of variation for 23 historical data  sets              15
   3    Minimum detectable difference among sampling stations as a
        function of the coefficient of variation                          21
   4    Minimum detectable difference vs.  number of replicates at
        selected levels of unexplained variance for  4 and  6 stations      23
   5    Minimum detectable difference vs.  number  of  replicates at
        selected levels of unexplained variance for  8 and  16  stations     24
   6    Minimum detectable difference in the tissue  concentration
        of selected contaminants vs. number of replicates                 27
   7    Effects of increasing composite sample size  on estimate of
        the mean                                                         35
   8    Power of statistical  tests vs. number  of  samples in
        composite replicate samples                                      39

-------
                                  TABLES


Number                                                                 Page

   1    Analysis of variance table for one-way layout                     7

   2    Summary of data used in power analysis                          11

   3    Summary of one-way analysis of variance results  for
        historical  data                                                 13

   4    Results of power analyses showing the minimum  detectable
        difference in the concentration of selected  contaminants         16

   5    Results of simulation analyses demonstrating the effect of
        composite sampling on the estimate of the population mean        36

   6    Probability of detecting specified levels of minimum
        detectable differences for selected grab-sampling and
        composite-sampling strategies                                   42

-------
                              ACKNOWLEDGMENTS
     This document has  been  reviewed by the 301(h) Task Force of the Environ-
mental  Protection  Agency,  which  includes representatives from  the Water
Management Divisions of  U.S.  EPA  Regions  I,  II,  III,  IV, IX, and  X; the
Office  of  Research and  Development -  Environmental  Research Laboratory-
Narragansett (located  in  Narragansett,  RI  and Newport,  OR);  and the Marine
Operations Division in the Office of Marine and  Estuarine  Protection, Office
of Water.

     This technical  guidance document was produced for the U.S.  Environmental
Protection Agency under the 301(h)  post-decision technical support contract
No.  68-01-6938, Allison  J.  Duryee,  Project  Officer.   This  report was
prepared  by  Tetra Tech,  Inc., under  the  direction of  Dr. Thomas  C.  Ginn.
The primary  author  was Mr.  Thomas M. Grieb.  Ms.  Marcy B. Brooks-McAuliffe
performed technical editing and supervised report production.

-------
                               INTRODUCTION
     Monitoring of toxic  pollutants  in  body tissues of marine organisms is
an important assessment  technique  for evaluating effects of coastal  sewage
discharges and  other sources  of pollution.   A key  consideration  in  the
design  of bioaccumulation  studies   is  related to  the type and  number of
samples to be  analyzed.   Measured concentrations of  chemicals in  organism
tissue samples  commonly  display high levels of variability, resulting  from
natural biological  factors as well as from analytical procedures.   Assessment
of this variability  is  an important  step in developing  an  optimal  sampling
design.  Chemical  analyses  of  tissue  samples  also  represent  a  relatively
expensive component of a monitoring program.  Without an a priori evaluation
of alternative sampling strategies, a program may analyze an
excessive number of samples (with associated high costs) or too
few samples, in which case high variability produces equivocal results.

     The objective of this report is  to evaluate the applicability of alter-
native  sampling strategies  for  bioaccumulation  monitoring programs.  A
statistical approach  is  presented  for determining the levels  of  difference
in  bioaccumulation that  can  be reliably  detected  with  varying  levels of
sampling effort.   Example analyses are  presented  to demonstrate the effects
of  alternative sampling  designs.   These example  analyses  are  based on
historical data from bioaccumulation monitoring programs that  used tissues
from  individual target  species  recommended in  an  earlier  report in  this
series  (Tetra   Tech  1987a).   The results  of  additional   analyses  employing
simulation methods are used to provide a comparison of grab- and composite-
sampling strategies.

-------
                      MONITORING PROGRAM PERFORMANCE
METHODS OF ANALYSIS

     An evaluation of the accumulation of toxic pollutants  and  pesticides in
marine organisms is  an  important  part of  301(h)  monitoring  programs.   The
objective of the bioaccumulation component of  301(h) monitoring programs is
to determine whether the discharge causes an  elevation  in the body  burden of
toxic  chemicals  in organisms  living  nearby.   This  objective  is  generally
addressed  by  comparing  tissue  contaminant  levels  in  organisms   near the
discharge and at a reference area.  Measured tissue  contaminant levels used
for such  analyses  commonly  exhibit a  large  degree of  variability  resulting
from  factors  such  as  measurement  errors  and  natural  variability.   This
variability may be great enough to severely limit the ability to detect
statistically significant changes.  However, statistical techniques can be applied to deal
with these sources of uncertainty and to make statistically  valid comparisons
of bioaccumulation levels among monitoring stations.

     The  301(h)  bioaccumulation monitoring  studies  are  generally designed
based  on  the  hypothesis that discharge effects are  indicated  by measurable
differences in bioaccumulation  levels among monitoring stations  or monitoring
events.    Given  this  assumption, statistical  techniques can be used to
distinguish discharge-related effects  from natural  variability. This can be
accomplished  by  partitioning  field observations  into  several components.
Analysis  of variance (ANOVA) techniques, which  are commonly used to  analyze
301(h) monitoring data, relate observations of interest (e.g., bioaccumulation
levels) explicitly to various environmental factors and random
errors.  This partitioning of field observations can be demonstrated with
the ANOVA experimental model shown in Equation (1), which decomposes a
single observation ($Y_{ij}$) into several components:

-------
                         $Y_{ij} = \mu + \xi_i + e_{ij}$                      (1)

where:

   $Y_{ij}$ = Observation at station i and replicate j of, for example, the
              tissue concentration of a selected metal
      $\mu$ = Mean of all $Y_{ij}$ observations
    $\xi_i$ = Effect of the ith level of an environmental factor (e.g., station
              location)
   $e_{ij}$ = Random error not accounted for by either $\mu$ or $\xi_i$.

Under  the example  model  formulation,  the effects  of  environmental  factors
(e.g., station location)  on  individual observations can  be tested for statis-
tical significance.  The null  hypothesis tested  is  that the station location
has  no effect on  observed  contaminant  concentrations,  or stated formally:
$\xi_1 = \dots = \xi_n = 0$.  Similarly, more complex models can be formulated to
test  for the  effect of more than  one environmental factor (including time)
as well as the statistical significance  of the interaction among factors.
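
The partitioning in Equation (1) can be illustrated numerically.  The following is a minimal sketch in Python, not part of the ODES procedures; the station data and the function name decompose are hypothetical.  Each observation is split into the grand mean, an estimated station effect (station mean minus grand mean), and a residual.

```python
from statistics import mean


def decompose(stations):
    """Split each observation into grand mean + station effect + residual.

    `stations` is a list of lists: one inner list of replicates per station.
    Returns a list of (observation, grand_mean, effect, residual) tuples.
    """
    all_obs = [y for reps in stations for y in reps]
    grand_mean = mean(all_obs)
    parts = []
    for reps in stations:
        effect = mean(reps) - grand_mean            # estimated station effect
        for y in reps:
            residual = y - grand_mean - effect      # estimated random error
            parts.append((y, grand_mean, effect, residual))
    return parts


# Hypothetical tissue concentrations (mg/kg): 3 stations x 3 replicates
example = [[0.10, 0.14, 0.12], [0.15, 0.19, 0.17], [0.30, 0.26, 0.28]]
for y, mu, xi, e in decompose(example):
    print(f"{y:.2f} = {mu:.3f} + ({xi:+.3f}) + ({e:+.3f})")
```

By construction the three components add back to the original observation, and the residuals within each station sum to zero.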

HYPOTHESIS TESTING

      The possible circumstances and outcomes of testing the null
hypothesis are shown in Figure 1.   Four  possible  outcomes  exist:

      1.   The hypothesis  is true and it  is not rejected.

      2.   The hypothesis  is true and it  is rejected.

      3.   The hypothesis  is false and it is not  rejected.

      4.   The hypothesis  is false and it is rejected.

      The shaded  areas  shown in Figure 1 represent  incorrect  decisions.  The
incorrect   rejection  of  the  null   hypothesis  is referred to  as a  Type  I
error.  The probability of a Type I error, designated $\alpha$, represents the
significance level of the statistical  test.   The incorrect acceptance of the

-------
[Figure 1 graphic: two-by-two matrix of the decision (accept or reject the
hypothesis) against the hypothesis being actually true or actually false.]

Figure 1. Hypothesis testing: possible circumstances and test outcomes.

-------
null hypothesis is referred to as the $\beta$ error, where $\beta$ represents the
probability of this incorrect decision.  The $\beta$ error is also known as the
Type II error.   The  probabilities  of  the  correct  acceptance and rejection of
the null  hypothesis are  represented  by  the complements  of the  Type  I  and
Type II errors,  respectively.

     The probability of correctly  rejecting  the false null  hypothesis  (i.e.,
of detecting an effect when one exists)  is  referred  to  as  the power of the
statistical test.  Because  the  objective of the  bioaccumulation monitoring
program is to correctly  detect  the effects  of  station  location  or time of
sampling,  the power of  a statistical test serves as  a basis  for evaluating
the performance of the monitoring  program.   When existing data are  available
for the  selected monitoring  variables,  power calculations can  be made to
provide a  quantitative  comparison of alternative  sampling  layouts.   For
example,  the  probability  of correctly detecting  the  effects  of  station
location can be determined for a  specified  level  of  sampling  effort.  These
methods can also  be used to evaluate and interpret  statistical  analyses in
which the null  hypothesis has been accepted.   In  this case, the  probability
of  detecting specific  levels of differences between  stations  or  effects
associated with  different treatments  can be determined  for the fixed
parameters of the experimental design.
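
The four outcomes and their probabilities can be summarized compactly.  The array below only restates the definitions given above; $H_0$ is used here as shorthand for the null hypothesis and is not a symbol used elsewhere in this report.

```latex
\begin{array}{lcc}
                   & H_0 \text{ actually true} & H_0 \text{ actually false} \\
\text{Accept } H_0 & 1 - \alpha                & \beta \ (\text{Type II error}) \\
\text{Reject } H_0 & \alpha \ (\text{Type I error}) & 1 - \beta \ (\text{power})
\end{array}
```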

POWER ANALYSES FOR INDIVIDUAL TISSUE  SAMPLES

     The power of the statistical test  is determined by the  following  five
study design parameters:

     •    Significance level of the test

     •    Number of sampling stations

     •     Number of replicates

-------
     •    Minimum  detectable difference  specified  for the  monitoring
          variable

     •    Residual error variance (i.e., natural variability within the
          system).

This relationship between  the  power of  a statistical test and  the  design
parameters makes several types of power analyses possible.  For example, the
power  of  the  test  can be  determined as  a  function of  the five  design
parameters.  Alternatively, the value  for any  individual  design parameter
required  to obtain  a  specified  power of the  statistical  test  can  be
determined as a function of the other four parameters.

     For this report,  power analyses were  conducted using historical data to
determine the minimum detectable difference  in  the  tissue  concentration  of
specified contaminants as  a function  of  the number  of  sample replicates.
The purpose of this type of analysis was to determine the  level  of difference
in tissue concentration  of contaminants that can be identified in a test of
statistical significance.    In  each individual  analysis,  the  power  of the
statistical test as  well   as the  other design  parameters  (i.e.,  number  of
stations,  significance  level,  and  residual  error  variance)  were  held
constant.   However,  a  series of these analyses  was  also  conducted for
different  numbers of  stations and values  of residual  error  variance  to
demonstrate the effect of  these design parameters  on the ability to detect
changes among sampling locations.

Analytical Methods

     The  results  of  a  one-way ANOVA  are  usually  summarized in  a  manner
similar to that shown  in Table  1.   The test statistic is the F ratio, which
is  the  ratio of  the  between-groups mean  square (BMS) to the within-groups
mean square (WMS).  As indicated in Table 1, the WMS is an unbiased estimate
of the population variance ($\sigma^2$), while the expected value of the BMS is
represented  by  the  sum   of  the  population variance  and  another  term
representing the actual  fixed effects.   This added quantity  is:

-------
             TABLE  1.  ANALYSIS  OF  VARIANCE  TABLE  FOR ONE-WAY  LAYOUT

Source          Sum of Squares                                     d.f.    Mean Square    E(MS)

Between groups  $SS_B = \sum_i J_i (\bar{y}_{i.} - \bar{y})^2$     $I-1$   $SS_B/(I-1)$   $\sigma^2 + (I-1)^{-1} \sum_i J_i (\xi_i - \bar{\xi})^2$

Within groups   $SS_W = \sum_i \sum_j (y_{ij} - \bar{y}_{i.})^2$   $n-I$   $SS_W/(n-I)$   $\sigma^2$

Total           $\sum_i \sum_j (y_{ij} - \bar{y})^2$

where:
       $y_{ij}$ = Observation at group (station) i and replicate j
 $\bar{y}_{i.}$ = ith group mean
      $\bar{y}$ = Overall mean of all i, j observations
              I = Number of sampling stations
              n = Total number of observations
         $SS_B$ = Between groups sum of squares
         $SS_W$ = Within groups sum of squares
          E(MS) = Expected mean square
          $J_i$ = Number of replicates at the ith station
        $\xi_i$ = True value of the ith effect
    $\bar{\xi}$ = Mean of the treatment effects
     $\sigma^2$ = Population variance.

-------
                    $(I-1)^{-1} \sum_i J_i (\xi_i - \bar{\xi})^2$

where:

            I = The number of sampling stations
        $J_i$ = The number of replicates at the ith station
      $\xi_i$ = The true value of the ith effect
  $\bar{\xi}$ = The mean of the treatment effects.

     Under the null hypothesis, the value of the actual fixed effects term
is 0, so the BMS and the WMS both estimate the population variance and the
F ratio is expected to be close to 1.  When fixed effects are present in the
monitoring program, the value of this term increases and results in an
increase in the value of the numerator of the F ratio.  Larger effects
therefore increase the power of the test (i.e., the probability of
rejecting a false null hypothesis).
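
As an illustration of the quantities summarized in Table 1, the sketch below computes the BMS, the WMS, and the F ratio for a one-way layout.  It is a minimal example with made-up data and a hypothetical function name (one_way_anova); it is not the ODES implementation.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (BMS, WMS, F) for a one-way layout.

    `groups` is a list of lists: one inner list of replicates per station.
    """
    n = sum(len(g) for g in groups)          # total number of observations
    k = len(groups)                          # number of stations (I in Table 1)
    grand = mean([y for g in groups for y in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    bms = ss_between / (k - 1)               # between-groups mean square
    wms = ss_within / (n - k)                # within-groups mean square
    return bms, wms, bms / wms


# Made-up data: 3 stations x 3 replicates
bms, wms, f_ratio = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(bms, wms, f_ratio)
```

For these made-up data the station means are 2, 3, and 6, which gives BMS = 13, WMS = 1, and F = 13 (up to floating-point rounding).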

     In  performing  power analyses,  a  set  of  effects  is  assumed and  the
performance  of the sampling design  is  evaluated as if these assumed effects
actually occurred.   However,  when a sample  design involves several  station
locations, many sets of  effects  can be assumed.   For  example,  alternative
hypotheses can be constructed under which actual  station  effects of a certain
magnitude occur  at one, two,  three,  or more  of  the total number of sampling
locations.   The magnitude  of  the  effects  could also  be varied  among
stations.  It  can be  seen that  an infinite  number of alternative hypotheses
can be constructed for  evaluation in power analyses.

     The power analyses presented in this report were conducted  to provide a
conservative  evaluation  of monitoring program performance.  The  testable
hypothesis used in  these analyses  was constructed  such that   the  effects
occur in  the  combination that is most difficult to detect.  Scheffe (1959)
showed that  this conservative set of effects is defined by:

-------
       $\xi_i - \xi_j = \Delta$;    $\xi_k = \frac{\xi_i + \xi_j}{2}$,    for all $k \neq i$ or $j$            (2)
where:

     $\Delta$ = The maximum difference in actual effects
      $\xi_k$ = The true value of the kth effect.

     Equation (2) states that the two effects associated with the hypothesis
of interest differ by $\Delta$ while all other effects are equal to the mean of
these two.  For the maximum difference in effects equal to $\Delta$, this
arrangement gives the lowest test power.
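
The effect of the arrangement in Equation (2) on test power can be explored by simulation.  The sketch below is a Monte Carlo stand-in for an analytical power calculation and is not the ODES tool.  It assumes normally distributed errors with a common standard deviation, equal replication at every station, and a critical value taken empirically from simulated null data.  All function names and default settings are hypothetical.

```python
import random


def f_ratio(groups):
    """F ratio (BMS / WMS) for a one-way layout given as a list of lists."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    bms = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (k - 1)
    wms = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n - k)
    return bms / wms


def least_favorable_effects(n_stations, delta):
    """Equation (2): two effects differ by delta; the rest equal their mean."""
    return [delta / 2.0, -delta / 2.0] + [0.0] * (n_stations - 2)


def simulate_f(effects, n_reps, sigma, rng):
    """Draw one simulated data set and return its F ratio."""
    groups = [[rng.gauss(xi, sigma) for _ in range(n_reps)] for xi in effects]
    return f_ratio(groups)


def estimate_power(n_stations, n_reps, delta, sigma,
                   alpha=0.05, n_sims=2000, seed=1):
    """Estimate the probability of rejecting the null hypothesis."""
    rng = random.Random(seed)
    # Empirical critical value: (1 - alpha) quantile of F with no station effects.
    null_f = sorted(simulate_f([0.0] * n_stations, n_reps, sigma, rng)
                    for _ in range(n_sims))
    f_crit = null_f[min(int((1.0 - alpha) * n_sims), n_sims - 1)]
    effects = least_favorable_effects(n_stations, delta)
    hits = sum(simulate_f(effects, n_reps, sigma, rng) > f_crit
               for _ in range(n_sims))
    return hits / n_sims


# Power for 4 stations and 5 replicates at two sizes of difference
# (delta is expressed in the same units as sigma).
for delta in (0.5, 3.0):
    print(delta, estimate_power(4, 5, delta=delta, sigma=1.0))
```

Because both the critical value and the rejection rate are simulated, the estimates carry sampling noise; increasing n_sims reduces it at the cost of run time.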

     The power analyses presented in this  report  were conducted on the Ocean
Data Evaluation System (ODES).  The power  analysis tool available on ODES is
described in  a user-guidance  document (Tetra Tech  and  American Management
Systems 1986).  Statistical power  analyses  and  methods of  calculation are
described by Scheffe  (1959) and Cohen  (1977).

     Recent evidence  concerning the  robustness  of the  ANOVA  model  to
deviations from assumptions of normality  and  equal  variances  indicates the
appropriateness of these  statistical  methods  in environmental monitoring
applications (Grieb  1985).  However,  nonparametric  statistical methods such
as the  Kruskal-Wallis one-way analysis of  variance by ranks  (Kruskal  and
Wallis 1952) could also be used for the analysis of these bioaccumulation
data.  While the statistical  analysis  results  in this report  apply to the
parametric  ANOVA  model,  these   results can also be  used  to  evaluate the
corresponding performance  of  alternative,  nonparametric statistical methods
by computing  the power-efficiency  of  the  nonparametric  analog.  The power-
efficiency of  the  nonparametric  test provides  a comparison of  the sample
size  required to achieve the  same  level of  power associated  with the
corresponding  parametric  tests.    For example,   the  power-efficiency  of
statistical Test B  relative to Test A  is given by:

-------
                          (100) $\frac{N_A}{N_B}$ percent
where:
     $N_B$ = The number of samples required in Test B to achieve the same level
             of power obtained in Test A with a sample size of $N_A$.

Calculation of the power-efficiency ratio for the Kruskal-Wallis test is
described in Andrews (1954) and Lehmann (1975).
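
The power-efficiency ratio can be applied in either direction, as the following minimal sketch shows.  The function names and sample sizes are hypothetical.  The value 3/pi (about 95.5 percent) used in the second call is the commonly cited asymptotic efficiency of the Kruskal-Wallis test relative to the F test under normality; it is an assumption here, not a result taken from this report.

```python
import math


def power_efficiency(n_a, n_b):
    """Power-efficiency of Test B relative to Test A, in percent."""
    return 100.0 * n_a / n_b


def samples_needed(n_a, efficiency_percent):
    """Sample size Test B needs to match the power of Test A at size n_a."""
    return math.ceil(n_a * 100.0 / efficiency_percent)


# Test B needs 20 samples to match Test A with 19 samples.
print(power_efficiency(19, 20))

# Assumed asymptotic efficiency under normality (3/pi, in percent).
assumed_efficiency = 100.0 * 3.0 / math.pi
print(samples_needed(12, assumed_efficiency))
```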

Preliminary Analyses

     To conduct the power analysis,  it is necessary to obtain an estimate of
the residual  error variance  (i.e., the natural variability not  accounted for
by the  statistical  model).   This estimate  can  be  obtained  by conducting a
site-specific preliminary study  or by  using  existing sampling data.   For the
purposes  of demonstrating  the  power  analysis  techniques in  this  report,
historical  data were compiled and analyzed  in a one-way  ANOVA to  estimate
the residual  error variance.  These  estimates  were  then used as study design
parameters in individual  power analyses.

     A  summary  of the historical data  is  provided in Table  2.   Data were
obtained for five taxa:  three  fish species (Dover sole,  English sole, and
winter  flounder)  and  two invertebrate  taxa  (American lobster  and Cancer
spp.).   Replicate  measurements of  tissue concentrations  of  selected
contaminants were obtained from  various  numbers of  sample  locations.  Tissue
concentrations of these  pollutants  were  obtained for  both muscle and liver
tissues.  These data were compiled by  Tetra  Tech  (1987a) as part of  a review
of bioaccumulation data  on  target species  recommended for 301(h) discharge
monitoring and were derived from analyses of tissue samples from individual
organisms (i.e., no composite samples).   The raw  data  are  presented  in Tetra
Tech  (1987b).   In  general, replicated data  for tissue body  burdens of
priority pollutants are limited.  However,  while the  number of contaminants
included in these data is limited,  two important chemical groups of concern

-------
TABLE 2.  SUMMARY OF DATA USED IN POWER ANALYSIS

                        Type of                  Number of  Number of
Taxon                   Tissue   Contaminant     Stations   Replicates  Location                   Reference

American lobster        Muscle   PCBs, Hg, Cd        4          10      Long Island Sound          Roberts et al. (1982)
(Homarus americanus)                                                    NY Bight Apex

Dover sole              Muscle   PCBs, DDT           3          12      Southern California Bight  Sherwood et al. (1980)
(Microstomus            Muscle   Cu                  2           6
pacificus)              Liver    PCBs, DDT           3          12
                        Liver    Ag, Cd              2           6

Winter flounder         Muscle   PCBs, DDT           4          12      NY Bight Apex              Sherwood et al. (1980)
(Pseudopleuronectes     Muscle   Cu                  3           6
americanus)             Liver    PCBs, DDT           4          12
                        Liver    Cd, Zn              3           6

English sole            Muscle   As, Pb, Hg          6           5      Commencement Bay, WA       Tetra Tech (1985c)
(Parophrys vetulus)

Crab                    Muscle   PCBs, Pb, Hg        4           5      Commencement Bay, WA       Tetra Tech (1985c)
(Cancer spp.)

-------
in terms  of  bioaccumulation,  metals  and chlorinated organic compounds, are
represented.

     The  residual  error  variance  design  parameter  can  be viewed  as an
estimate of the denominator  in  the  F ratio, which  is  used  to evaluate the
significance  of the ANOVA statistical tests.  This  quantity is shown  in the
one-way ANOVA table (Table 1) as  the within-groups mean square and repres-
ents the average variance within  groups.  Where sample data are available,
this design  parameter can  be estimated  in one  of  two ways.   First,  a
preliminary ANOVA can be conducted  and  the  value of  the within-groups mean
square used.   Second,  the sample variance  can  be computed from all available
data ignoring sample location.  The first value provides an estimate  of the
variance that  is unexplained  by the  statistical  model.   Therefore,  if the
effects  of  sample locations are found  to be  significant  in the F test
conducted with the ANOVA, the within-groups mean square will have a value
that is less  than  the  overall  sample  variance.

     Because  the variance design parameter is  an estimate of the denominator
in the  F  ratio, it can  be  seen that the overall   sample variance  obtained
from existing  data provides  a  larger  and, therefore, more  conservative
estimate for the purposes of  conducting power analyses.  In this case, the
estimated value of  the difference that can be  detected between stations will
be larger than if the power analyses  were conducted using the within-groups
mean square as an estimate of the denominator in the F ratio.  However, where
available data  can  be fit to the ANOVA model,  the estimate of the within-
groups mean square  provides a more realistic estimate of the expected value
of the denominator  in  the F  ratio.
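
The difference between the two variance estimates can be seen with a small numerical example.  The sketch below uses made-up data with clear station differences; the function names are hypothetical.  The overall sample variance ignores station membership, whereas the within-groups mean square measures only the spread around each station mean.

```python
from statistics import mean, variance


def within_groups_mean_square(groups):
    """Pooled variance around each station mean (WMS in Table 1)."""
    n = sum(len(g) for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return ss_within / (n - len(groups))


def overall_sample_variance(groups):
    """Sample variance of all observations, ignoring sample location."""
    return variance([y for g in groups for y in g])


# Made-up data: 3 stations x 3 replicates with distinct station means
data = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(within_groups_mean_square(data))   # spread within stations only
print(overall_sample_variance(data))     # also includes station differences
```

For these data the within-groups mean square is 1 and the overall sample variance is 4, so a power analysis based on the overall variance would report a larger, more conservative minimum detectable difference.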

     The historical data described in Table 2 were  analyzed using a one-way
ANOVA design  to obtain values of the within-groups mean square  for subsequent
power analyses.  The results of 23 analyses are  shown  in Table 3.   For each
analysis, the  estimated mean tissue concentration  of the pollutant,  the
within-groups mean  square,  and  a coefficient  of  variation  are presented.
The occurrence of  a significant  F test is  also indicated  in this table.

-------
        TABLE 3.   SUMMARY OF ONE-WAY ANALYSIS OF VARIANCE RESULTS FOR HISTORICAL DATA

                                                  Estimated Mean        Within-groups       Coefficient of
Data                   Type of                    Concentration,        Mean Square         Variation                        Significance
Set   Taxon            Tissue   Contaminant       $\bar{x}$ (mg/kg)     ($\sigma^2$)        ($\sigma/\bar{x} \times 100$)    of Test

  1   American Lobster Muscle   Total PCBs            0.152                0.0061               51.4                            *a
  2                             Hg                    0.215                0.0076               40.6                            *
  3                             Cd                    0.018               <0.0001               28.3                            *
  4   Dover Sole       Muscle   Total PCBs            0.766                0.3324               75.3                            *
  5                             Total DDT             1.279                6.7761              203.5                            *
  6                             Cu                    0.075                0.0003               23.1
  7                    Liver    Total PCBs            0.925                6.1616              268.4
  8                             Total DDT             0.382                0.0615               64.9
  9                             Ag                    0.1162               0.0023               41.3
 10                             Cd                    0.7251               0.1194               47.7
 11   Winter Flounder  Muscle   Total PCBs            0.0906               0.0007               29.2                            *
 12                             Total DDT             0.0126              <0.0001               41.3
 13                             Cu                    4.389                6.0750               56.2
 14                    Liver    Total PCBs            4.015                4.2335               51.2                            *
 15                             Total DDT             0.607                0.0996               52.0                            *
 16                             Cd                    0.093                0.0023               51.6                            *
 17                             Zn                   29.594               17.2450               14.0
 18   English Sole     Muscle   As                    5.067               39.2918              123.7
 19                             Pb                    0.247                0.0118               44.0                            *
 20                             Hg                    0.0572               0.0006               43.0                            *
 21   Crab             Muscle   Total PCBs            0.0918               0.0087              101.6                            *
 22                             Pb                    0.316                0.1227              110.8
 23                             Hg                    0.094                0.0033               61.1                            *

a  F Test significant, P<0.05.

-------
     Coefficients  of variation presented in Table  3  were calculated as the
ratio of the square root of the within-groups  mean square to the estimated
overall  mean tissue concentration.   This  ratio  was  multiplied by 100 so that
the coefficient of variation  is expressed  as a  percentage.   These values
thus represent a  normalized measure of the unexplained variability (uncer-
tainty)  within  the data  set  and,   as  demonstrated  below,  an  important
indicator of the level  at  which  statistically  significant  differences can be
detected.  A frequency  distribution of the observed  values  of the coeffi-
cients of variation  is  presented  in Figure 2.   Values ranged from 14.0 to
268.4, but the  majority occurred between  40 and 60.
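
The calculation described above can be reproduced directly from the values in Table 3.  The following minimal sketch (the function name is hypothetical) recomputes the coefficient of variation for four of the data sets from their mean concentrations and within-groups mean squares.

```python
import math


def coefficient_of_variation(within_ms, mean_conc):
    """Coefficient of variation in percent: 100 * sqrt(WMS) / mean."""
    return 100.0 * math.sqrt(within_ms) / mean_conc


# data set number -> (mean concentration in mg/kg, within-groups mean square),
# as listed in Table 3
table3 = {
    1: (0.152, 0.0061),
    4: (0.766, 0.3324),
    5: (1.279, 6.7761),
    17: (29.594, 17.2450),
}

for data_set, (mean_conc, wms) in table3.items():
    print(data_set, round(coefficient_of_variation(wms, mean_conc), 1))
```

The recomputed values agree with the tabulated coefficients of variation to within rounding of the published means and mean squares.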

Analytical  Results

     Results of the  power analyses conducted  for  each historical data set
are summarized  in Table  4.   Results  of all   analyses are  expressed  as  a
percentage of the mean contaminant  concentration observed in the  particular
data set. The  presentation of the minimum detectable difference as a percent
of the observed mean value, rather than as a  concentration of  the  contami-
nant,  provides  a basis  for comparing the results obtained for  the different
data  sets.   For example,  this  makes  it possible to readily evaluate the
effect of the increased sample variability, expressed  as  an  increase in the
coefficient  of  variation,  on the ability to detect  statistically significant
differences  among  sampling locations.   This presentation  format also confers
a general applicability to these analyses, as  the  results can  be  applied to
any data  set  exhibiting  the  same  or  similar coefficient  of variation.
However, as  discussed below, it  is  important to evaluate  individual monitor-
ing programs in terms of the value  of the contaminant concentration  that can
be detected  among  sampling locations.
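
Both uses of the tabulated results can be sketched briefly.  For a fixed number of stations and replicates, the values in Table 4 appear to be very nearly proportional to the coefficient of variation (for example, 283/51.4 and 223/40.6 are both about 5.5).  Assuming that proportionality holds, the hypothetical helper functions below rescale a tabulated minimum detectable difference to another coefficient of variation and convert a percentage to a concentration.

```python
def scale_mdd(mdd_percent, cv_reference, cv_target):
    """Rescale a tabulated MDD (percent of mean) to a different CV.

    Assumes the MDD is proportional to the CV when the number of stations
    and replicates is unchanged.
    """
    return mdd_percent * cv_target / cv_reference


def mdd_concentration(mdd_percent, mean_conc):
    """Convert an MDD in percent of the mean to concentration units."""
    return mdd_percent / 100.0 * mean_conc


# Table 4, data set 1 (CV = 51.4, mean = 0.152 mg/kg), 4 stations, 5 replicates
mdd_pct = 121.0

# Rescaled to the CV of data set 2 (40.6); Table 4 lists 96 for that case.
print(round(scale_mdd(mdd_pct, 51.4, 40.6), 1))

# Expressed as a concentration for data set 1.
print(round(mdd_concentration(mdd_pct, 0.152), 3), "mg/kg")
```

Expressing the result as a concentration makes it easier to judge whether a detectable difference of that size is meaningful for the contaminant in question.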

     The analyses  presented  in  Table 4  were  conducted with  the  number of
sampling stations fixed at four or eight.  These numbers of stations were
selected to represent the range expected in many 301(h) bioaccumulation
monitoring programs.  The  selection of two levels of sampling effort also
provided the opportunity  to demonstrate  the  relative effect of an  increase
in the number of sampling locations on the ability to detect differences in
tissue concentrations among stations.


-------
[Figure 2 graphic: histogram of frequency of occurrence, with the coefficient
of variation ($\sigma/\bar{x} \times 100$) on the horizontal axis.]

Figure 2. Frequency distribution for calculated values of the coefficient of variation for 23
          historical data sets.

-------
TABLE 4.   RESULTS OF POWER ANALYSES SHOWING THE MINIMUM DETECTABLE
     DIFFERENCE IN THE CONCENTRATION OF SELECTED CONTAMINANTS

Mean = mean concentration ($\bar{x}$), mg/kg.  CV = coefficient of variation ($\sigma/\bar{x} \times 100$).

                                                                                Minimum Detectable Difference (Percent of Mean)
Data                                   Tissue                    Number of      by Number of Replicates
Set   Taxon             Contaminant    Type     Mean      CV     Stations         2      3      4      5      6      8     10     12     14

  1   American lobster  Total PCBs     M        0.152    51.4       4           283    178    141    121    108     91     80     72     67
                                                                    8           286    195    158    137    123    104     91     83     76
  2   American lobster  Hg             M        0.215    40.6       4           223    140    112     96     85     72     63     57     53
                                                                    8           226    154    125    108     97     82     72     65     60
  3   American lobster  Cd             M        0.018    28.3       4           156     98     78     67     60     50     44     40     37
                                                                    8           158    107     87     76     68     57     50     46     42
  4   Dover sole        Total PCBs     M        0.766    75.3       4           414    260    207    178    158    133    117    106     98
                                                                    8           419    285    232    201    179    152    134    121    112
  5   Dover sole        Total DDT      M        1.279   203.5       4         1,120    704    560    481    428    361    317    287    264
                                                                    8         1,134    772    626    542    485    410    362    328    302
  6   Dover sole        Cu             M        0.075    23.1       4           127     80     64     55     49     41     36     32     30
                                                                    8           129     87     71     61     55     47     41     37     34

-------
TABLE 4.  (Continued)

                                              Mean  Coeff.
                                            Tissue  of Var.            Minimum Detectable Difference(b)
Data                               Tissue    Conc.  (s/x̄                    (Percent of Mean),
Set  Taxon             Contaminant Type(a)   (x̄),  × 100)                by Number of Replicates
                                             mg/kg          Stations     2     3     4     5     6     8    10    12    14

  7  Dover sole        Total PCBs  L         0.925   268.4         4 1,477   929   739   634   565   476   418   378   348
                                                                   8 1,495 1,018   826   715   640   541   478   433   398
  8  Dover sole        Total DDT   L         0.382    64.9         4   357   225   179   153   137   115   101    91    84
                                                                   8   362   246   200   173   155   131   116   105    96
  9  Dover sole        Ag          L        0.1162    41.3         4   228   143   114    98    87    73    65    58    54
                                                                   8   231   157   127   110    99    83    74    67    61
 10  Dover sole        Cd          L        0.7251    47.7         4   262   165   131   113   100    84    74    67    62
                                                                   8   266   181   147   127   114    96    85    77    71
 11  Winter flounder   Total PCBs  M        0.0906    29.2         4   160   101    80    69    61    52    45    41    38
                                                                   8   162   110    90    78    69    59    52    47    43
 12  Winter flounder   Total DDT   M        0.0126    41.3         4   229   144   114    98    87    74    65    59    54
                                                                   8   231   158   128   111    99    84    74    67    62
 13  Winter flounder   Cu          M         4.389    56.2         4   309   194   155   133   118   100    88    79    73
                                                                   8   313   213   173   150   134   113   100    91    83

-------
TABLE 4.  (Continued)

                                              Mean  Coeff.
                                            Tissue  of Var.            Minimum Detectable Difference(b)
Data                               Tissue    Conc.  (s/x̄                    (Percent of Mean),
Set  Taxon             Contaminant Type(a)   (x̄),  × 100)                by Number of Replicates
                                             mg/kg          Stations     2     3     4     5     6     8    10    12    14

 14  Winter flounder   Total PCBs  L         4.015    51.2         4   282   177   141   121   108    91    80    72    66
                                                                   8   286   194   158   137   122   103    91    83    76
 15  Winter flounder   Total DDT   L         0.607    52.0         4   286   180   143   123   109    92    81    73    67
                                                                   8   290   197   160   139   124   105    93    84    77
 16  Winter flounder   Cd          L         0.093    51.6         4   284   179   142   122   109    92    81    73    67
                                                                   8   288   196   159   138   123   104    92    83    77
 17  Winter flounder   Zn          L       17.2450    14.0         4    77    49    39    33    30    25    22    20    18
                                                                   8    78    53    43    37    33    28    25    23    21
 18  English sole      As          M         5.067   123.7         4   681   428   341   292   260   219   193   174   160
                                                                   8   689   469   381   330   295   249   220   199   184
 19  English sole      Pb          M         0.247    44.0         4   242   152   121   104    93    78    69    62    57
                                                                   8   245   167   135   117   105    89    78    71    65
 20  English sole      Hg          M        0.0572    43.0         4   237   149   118   102    90    76    67    61    56
                                                                   8   239   163   132   115   102    87    76    69    64

-------
TABLE 4.  (Continued)

                                              Mean  Coeff.
                                            Tissue  of Var.            Minimum Detectable Difference(b)
Data                               Tissue    Conc.  (s/x̄                    (Percent of Mean),
Set  Taxon             Contaminant Type(a)   (x̄),  × 100)                by Number of Replicates
                                             mg/kg          Stations     2     3     4     5     6     8    10    12    14

 21  Cancer crabs      Total PCBs  M        0.0918   101.6         4   558   351   279   240   213   180   158   143   132
     (Cancer spp.)                                                 8   565   385   312   270   242   205   180   163   150
 22  Cancer crabs      Pb          M         0.316   110.8         4   610   384   305   262   233   197   173   156   144
     (Cancer spp.)                                                 8   618   420   341   295   264   224   197   179   164
 23  Cancer crabs      Hg          M         0.094    61.1         4   336   211   168   144   128   108    95    86    79
     (Cancer spp.)                                                 8   340   232   188   163   146   123   109    98    91

(a) Tissue type:  M = muscle, L = liver.
(b) Power analyses conducted at fixed levels of statistical significance (0.05) and power (0.80).

-------
     As indicated in Table 4, these analyses were conducted at fixed levels
of statistical significance (α) and power (1-β) (see Figure 1).  The value
of the statistical power was  set at 0.80.  Thus, the  probability  of detecting
the minimum differences shown in  Table 4 in a one-way ANOVA  statistical test
is 0.80.  The significance level  selected for these  power  analyses was 0.05,
which corresponds to the value most  commonly selected in statistical tests.

     Values  of  the coefficient  of  variation  are  presented in  Table  4 to
facilitate comparison  of  the  power  analysis results.  These data show that
for an  increase   in the value of this measure of unexplained  variability,
there is a corresponding increase in the minimum detectable  difference among
sampling stations.  This  relationship  can  be  seen by examining  the results
for the smallest  and largest values of  the  coefficient.   The smallest value
of this measure of  variability was obtained for  Data Set  17 (Table 4).   For
the zinc concentration observed in the  liver tissue  of winter flounder,  the
computed value  of  the coefficient  of  variation is  14.0.   Results  of  the
power analyses obtained for this data  set  indicate that the minimum differ-
ence  in  the  zinc concentration  that  can be  detected  with four  replicate
samples at  four  and eight  sampling stations  is  39 and  43 percent  of  the
overall mean  value, respectively.  The  largest  value for  the coefficient of
variation (268.4) was  obtained  for  the  concentration of  total  PCBs in  the
liver  tissue of   Dover sole  (Data Set  7,  Table 4).   Results  of the power
analyses obtained for  this data set  indicate that with four replicate  samples
at four sampling  stations the minimum difference in  values of  tissue  concen-
tration that can be  detected  among  stations  is approximately  7  times as
great  as  the overall   mean concentration.  With  four replicate samples at
eight  stations,  this  minimum detectable value  is over 8  times  as great as
the observed mean tissue concentration.

     Analytical   results  from all  23  data  sets (Table 4) are summarized in
Figure 3.  For each value of the  coefficient of variation, the corresponding
minimum  difference,  expressed  as  a  percentage of  the   mean,  that  can be
detected with five replicate samples at eight stations is  plotted.   The  data
in  Figure  3  show that  the  increase  in  the  minimum detectable difference
among sampling stations is a  linear  function of the coefficient of variation.
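
     The linear relationship can be checked directly against Table 4.  The
sketch below is illustrative only: it takes a subset of the tabulated values
for five replicates at eight stations, fits a straight line through the
origin by least squares, and compares the fitted values with the tabulated
ones.  The function and variable names are hypothetical, and the fitted slope
is an empirical summary of the table, not a value reported in this document.

```python
# (coefficient of variation, minimum detectable difference as percent of mean),
# copied from Table 4 for five replicates at eight stations.
TABLE4_ROWS = {
    17: (14.0, 37),
    6: (23.1, 61),
    2: (40.6, 108),
    1: (51.4, 137),
    23: (61.1, 163),
    21: (101.6, 270),
    5: (203.5, 542),
    7: (268.4, 715),
}


def fit_slope_through_origin(pairs):
    """Least-squares slope k for the model y = k * x."""
    pairs = list(pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    return sum_xy / sum_xx


slope = fit_slope_through_origin(TABLE4_ROWS.values())

# Difference between each tabulated value and the fitted straight line.
residuals = {ds: mdd - slope * cv for ds, (cv, mdd) in TABLE4_ROWS.items()}

for ds in sorted(TABLE4_ROWS):
    cv, mdd = TABLE4_ROWS[ds]
    print(f"Data Set {ds:2d}: CV {cv:6.1f}  table {mdd:4d}  fitted {slope * cv:6.1f}")
```

     If the relationship is linear, the fitted values should agree with the
tabulated values to within rounding of the table entries.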

-------
[Figure 3: plot of minimum detectable difference (scale 0 to 800) against coefficient of variation (scale 0 to 280). Fixed design parameters listed on the plot: stations, replicates, power, statistical significance.]
Figure 3.  Minimum detectable difference among sampling stations as a function of the
          coefficient of variation.

-------
     From  Figures  2 and 3,  it  can  be  seen that the greatest proportion of
the values of the coefficient of variation for the  historical data sets fall
within  the range of  40-60.   Therefore,  examination of  the  power analysis
results  for  data  sets  with  coefficients  of variation  between  40  and  60
provides  an  estimate of the expected  performance  of bioaccumulation moni-
toring  programs.   For example,  the calculated  value of  the  coefficient of
variation for Data  Set 2  (mercury,  lobster)  shown  in Table 4  is  40.6.
Results  of the  power analysis  for  these data indicate that  with  five
replicate  samples  at  either four or eight stations,  the difference in the
concentration of mercury that could be detected between  stations is between
0.206 and  0.232 mg/kg.  This minimum detectable difference is approximately
equal to  the  overall  observed mean  concentration  (0.215 mg/kg)  of mercury
among sampling  stations.   Data  Set 23  (mercury, crab),  on  the other hand,
represents conditions at  the other end  of this range.   In  this  case,  the
minimum detectable difference in the concentration of mercury in the muscle
tissue  of  Cancer  spp. with the  collection  of five  replicate  samples is
144 percent and 163  percent  of  the mean value  for four  and eight stations,
respectively.   Thus,  in the majority  of the  data  sets evaluated,  the
observed coefficient  of variation  is between 40 and 60,  and the collection
of five  replicates  at eight or fewer  stations  resulted  in  the  ability to
detect  differences  in  tissue concentrations of contaminants  less  than or
equal to 163 percent of the overall  mean  contaminant concentration.

     Additional power analyses  presented in  Figures  4  and 5 were conducted
to summarize the results in Table  4 and to demonstrate  the importance of the
level of unexplained variability,  represented by the  residual error variance
design  parameter,  in  determining  the expected  performance of  a monitoring
program.   Specifically, these analyses  demonstrate the effect of increased
levels  of  unexplained  variance on 1)  the ability to  detect a  specified
difference between stations  and  2) the relative effect of increased numbers
of stations  on the  minimum detectable  difference.   These  analyses  were
conducted for four  levels of unexplained  variability.   Coefficients of
variation were set at 30, 50, 70,  and  90.   The number of sampling stations
was  set  at 4, 6,  8, and  16.   As with  the  previous  analyses presented in
Table 4, all  calculations  were conducted  for  fixed  levels of  power (0.8) and
statistical significance (0.05).

-------
[Figure 4: curves of minimum detectable difference (scale 50 to 550) against number of replicates (4 to 16), for coefficients of variation of 30, 50, 70, and 90 and for 4 and 6 stations.]
   Figure 4. Minimum detectable difference vs. number of replicates at
            selected levels of unexplained variance for 4 and 6 stations.
            Power of test = 0.80, significance level = 0.05.

-------
[Figure 5: curves of minimum detectable difference (scale 50 to 550) against number of replicates (4 to 16), for coefficients of variation of 30, 50, 70, and 90 and for 8 and 16 stations.]
   Figure 5. Minimum detectable difference vs. number of replicates at
            selected levels of unexplained variance for 8 and 16 stations.
            Power of test = 0.80, significance level = 0.05.

-------
     Results of these analyses, like those presented in Table  4, have general
applicability, because the minimum detectable  difference  is  expressed  as a
percentage of  the  mean.   Additionally, the power  curves  are presented  for
coefficients of  variation  representing a wide range  of  unexplained varia-
bility in  the  sampling environment.   These curves  can be used to evaluate
monitoring program performance for sampling designs using 4-16 stations  and
for sampling  data  exhibiting coefficients of  variation  between  30 and  90.
This range includes  the majority  of  historical  data sets compiled for this
study.

     These results  show  that for  an  increase in  the level  of unexplained
variance,  the  minimum detectable difference between  sampling  stations  in-
creases.   For example, in Figure 4 it  can  be seen  that with  five  replicates
at four  stations the minimum detectable difference between  stations ranges
from approximately 70  percent of the mean  for  a  coefficient  of variation of
30 to 212 percent of the  mean for a coefficient of  variation  of 90.  Corres-
pondingly,  both  figures show  that  as  the  level   of unexplained  variance
increases, a  greater level  of sample  replication  is  required  to detect  a
specified  level  of  difference.   For example,  in a sample design with  four
sampling  stations  (Figure 4),  the number of replicate samples required to
detect a difference  between  stations  equal  to  the  mean is 3,  7, 12,  and
about 17, respectively, for coefficients of variation of  30,  50, 70, and  90.

     Results of  these analyses  also demonstrate that for a fixed  level of
sample variability,  the minimum  detectable  difference  between  stations
increases  as  the  number  of  stations  increases.  This  increase  is small,
however,   compared  to  the effect  of increased variability  in  the  sampling
environment.  For example, in Figure 5, for a coefficient of  variation  equal
to 30, the minimum  difference detectable  with five  replicates is  approxi-
mately 80 and 90 percent of the mean  for 8 and 16 stations, respectively.   In
general,  monitoring  program  performance,  measured  by the ability to detect
specified  differences among  stations,  is increased  for a  fixed level of
sampling  effort  by  the  collection  of more  replicates  at   fewer  stations.
However,   the  effect of number of  stations on program performance  is  small
relative to that of  the number of replicate samples.

-------
     In Table 4 and  Figures  3  through  5,  the minimum detectable difference
is expressed in terms  of  a percentage  of the overall mean.   As indicated,
this provides a direct basis for the comparison of results among data sets,
and the results presented  in  Figures  4  and 5  allow a quick evaluation of the
expected performance of a  large  number of  study designs.   However,  in many
monitoring programs,  there may  be  an  interest  not  only  in  the relative
change  in  contaminant  concentrations among  sampling  locations,  but  also in
the minimum value  of the contaminant  concentration that can be detected.  In
fact,  results  of  power   analyses used to  evaluate  individual  monitoring
program design are generally  expressed  in terms of the measured units.
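
     Converting a minimum detectable difference from a percentage of the
mean to measured units is a single multiplication.  The sketch below
illustrates the conversion with the Table 4 entries for Data Set 2 (mercury
in lobster muscle, mean 0.215 mg/kg, five replicates); the function name is
hypothetical.

```python
def mdd_in_measured_units(percent_of_mean, mean_concentration):
    """Convert a minimum detectable difference from percent of the mean
    to the units of the mean concentration (here mg/kg)."""
    return percent_of_mean / 100.0 * mean_concentration


# Table 4, Data Set 2, five replicates: 96 percent of the mean at four
# stations and 108 percent of the mean at eight stations.
mdd_four_stations = mdd_in_measured_units(96, 0.215)
mdd_eight_stations = mdd_in_measured_units(108, 0.215)

print(f"4 stations: {mdd_four_stations:.3f} mg/kg")
print(f"8 stations: {mdd_eight_stations:.3f} mg/kg")
```

     These values correspond to the range of 0.206 to 0.232 mg/kg quoted
earlier for this data set.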

     In Figure 6,  results  of  power analyses are shown for selected data sets
presented in Table 4.  In  each of the  four plots shown,  the minimum detect-
able difference in the concentration  of a selected contaminant is shown as a
function of the number of  sample  replicates.  Additionally, the mean concen-
tration of  the  particular contaminant  in  each example  case is shown  to
indicate the  relative  performance of monitoring programs  at  the different
levels of unexplained variability.

     Power  analyses conducted  with  Data Set  11  (Table  4)  are  shown  in
Figure 6a. The number of replicates required to detect a specified difference
in the  concentration  of total  PCBs  in  the muscle tissue of winter flounder
between stations is shown.  These data  are characterized by a  relatively low
coefficient of  variation  (29.1),  indicating a  low level  of  unexplained
variation.  As a result,  small differences in the contaminant concentration
can  be detected  with low  levels  of  sample  replication.   Four  or more
replicates will  provide  adequate replication  to detect differences  of
approximately 0.09 mg/kg of the contaminant  in muscle tissue.

     The  results  of  power  analyses  conducted  with  data collected  from
sampling environments exhibiting  increasing levels of  unaccountable variation
are shown in Figures 6b, c, and d.  The calculated coefficients of variation
for these data  sets  are 40.6,  64.9,  and 101.6, respectively.  As the  plots
indicate, successive  increases  in  the  coefficient of variation are accom-
panied  by  a  decrease in  the  ability to detect differences relative to the


-------
[Figure 6: four panels, each plotting the minimum detectable difference in tissue concentration against the number of replicates (4 to 16) for 4 stations, with the observed mean concentration marked. Panels: (a) Data Set 11, winter flounder, PCB (muscle), C.V. = 29.1; (b) Data Set 2, lobster, Hg (muscle), C.V. = 40.6; (c) Data Set 8, Dover sole, DDT (liver), C.V. = 64.9; (d) Data Set 21, Cancer spp., PCB (muscle), C.V. = 101.6.]
           Figure 6.  Minimum detectable difference  in the tissue concentration
                        of selected contaminants vs. number of replicates.

-------
overall mean  concentration  among  stations.   However,  in  Figures 6b and 6d,
the contaminant concentrations that can be detected at comparable levels of
sampling effort  are similar.  Likewise,  from  Figure  6c,  10  replicates at
each station are required to detect a difference approximately equal to the
overall mean among  stations.  In comparison, even with 15  replicate samples
at  each  station,  the  minimum detectable difference  in  the contaminant
concentration is greater  than the  overall mean in Figure 6d.   However, the
values of  the minimum detectable  differences  in terms  of the contaminant
concentration corresponding to 10 replicates at each station  are 0.39 mg/kg
(Figure 6c) and 0.16 mg/kg  (Figure 6d).  Thus, while the minimum detectable
difference  in  terms of a percentage of the  overall mean is greater in one
analysis (Figure 6d),  the minimum detectable  contaminant  concentration is
considerably less than that found in the other analysis  (Figure 6c).  These
results indicate the importance of evaluating the performance of monitoring
programs both in terms of the relative change in contaminant concentration
that can  be detected  among sampling  locations  as well   as the  minimum
contaminant concentration that can  be detected.

Summary

     1.   Analyses of 23 historical data sets on tissue contaminants
          indicate that, with the collection of individual tissue
          samples, a difference of ≤163 percent of the mean could be
          detected in the majority of cases (assumes five replicates at
          eight or fewer sampling stations, α = 0.05, power = 0.80).

     2.   Many important  chemicals  (e.g.,  PCBs  in  Dover sole and crabs)
          displayed much  higher variability, however.   In  these cases,
          a similar analytical design could only detect  differences of
          about 200-700 percent of  the  mean.

-------
                       COMPOSITE SAMPLING  STRATEGIES
     The historical  data sets compiled for this report (Table 2) were based
on similar sampling  and analytical  approaches.   In all cases, tissue samples
were obtained from selected organisms  and  analyzed individually to determine
the  concentration   of particular contaminants.  This  type of  sample  is
referred to as a grab  sample,  since  the  individual  tissue samples are used
to provide  an estimate of the contaminant concentration  in  the  tissue  of
specified populations.   In each data set presented,  a fixed number of these
individual  estimates  was   obtained  and analyzed statistically  to estimate
distributional parameters  of the underlying population  and to make statis-
tical comparisons of these parameters  among  sampling  sites.

     An alternative  to the analysis  of tissue from individual organisms is
the  analysis  of  composite  samples.   Composite  tissue sampling  consists  of
mixing tissue samples  from two  or  more individual  organisms collected at a
particular site and  analyzing this  mixture as  a  single sample.  The analysis
of a composite sample,  therefore, provides an estimate of  an average  tissue
concentration  for   the  individual  organisms  that  make  up  the  composite
sample.  This composite sampling strategy  is often used  in  effluent sampling
(Schaeffer  and  Janardan  1978;  Schaeffer  et  al. 1980)  to estimate average
concentrations of  water  quality  variables  in  cases  where  the  individual
chemical  analyses   are  expensive but the cost of  collecting  individual
samples  is  relatively  small.   Composite  sampling   is  also used in  the
collection  of samples from bioaccumulation  monitoring  systems  containing
caged  specimens  of   bivalve  molluscs   (Risebrough et  al.  1980;  Gordon  et
al. 1980).  The  collection of  composite samples is  also  required in other
cases where the  tissue mass of an  individual  organism  is insufficient for
the analytical protocol.   An evaluation of  the  appropriateness of composite
sampling in bioaccumulation monitoring programs  is provided below.

     Composite sampling of the  tissue  from  selected organisms represents an
attempt to  prepare  a sample that will represent the  average  concentration.


-------
If $X_1, X_2, \ldots, X_n$ represent the contaminant concentrations of n tissue samples
from  individual  organisms,  these samples can  be  mixed to  obtain  a single
composite observation:

                    $$ Z = \sum_{i=1}^{n} \omega_i X_i $$                    (3)

where:

     $\omega_i$ = The proportion of total sample contributed by the ith component.

Rhode (1976) has  shown  that  the  expected  value and variance of Z are  given
by:

                    $$ E(Z) = \mu $$                                         (4)

                    $$ \mathrm{Var}(Z) = \frac{\sigma^2}{n} + n \sigma_\omega^2 \sigma^2 $$     (5)

where:
      $\mu$ = Population mean
      $\sigma^2$ = Population variance
      $\sigma_\omega^2$ = Variance of the composite proportions
      n = Number of samples in each composite.

      If the values of $\omega_i$ in Equation (3) are equal for all i, then the
numerical value of Z is equivalent to the mean of the n samples, $\bar{X}$, where:

                    $$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i $$           (6)
In  this  case,  for the  cost of  analyzing  a  single  composite  sample,  an
estimate of the mean  of  n grab samples is obtained.  However, a consequence

-------
of  selecting  the composite  sampling  strategy in the  above  example is the
loss of  information concerning  individual  sample  variability.   As  shown
below,  the range of values (minimum and maximum concentrations) contributing
to the  mean concentration is not known.
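
     Equation (3) can be illustrated with a short sketch.  The concentrations
and tissue masses below are hypothetical values chosen only to show the
arithmetic; the function name is likewise an assumption.  With equal masses
the composite value equals the arithmetic mean of the individual samples,
while unequal masses weight the composite toward the larger contributions.

```python
def composite_concentration(concentrations, masses):
    """Concentration of a composite formed by mixing individual tissue
    samples, each weighted by its proportion of the total mass."""
    total_mass = sum(masses)
    proportions = [mass / total_mass for mass in masses]
    return sum(w * x for w, x in zip(proportions, concentrations))


# Hypothetical concentrations (mg/kg) in tissue from three organisms.
concentrations = [0.10, 0.20, 0.60]

# Equal masses: the composite equals the mean of the three samples.
equal_mass = composite_concentration(concentrations, [5.0, 5.0, 5.0])

# Unequal masses: the third organism contributes half of the composite.
unequal_mass = composite_concentration(concentrations, [1.0, 1.0, 2.0])

print(f"equal masses:   {equal_mass:.3f} mg/kg")
print(f"unequal masses: {unequal_mass:.3f} mg/kg")
```

     In both cases only a single number is returned, so the spread of the
individual concentrations (here 0.10 to 0.60 mg/kg) cannot be recovered from
the composite result.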

     The above results  apply to  single composite samples.   However, replicate
composite samples can  also  be used in bioaccumulation monitoring programs.
The basic  sample design previously described  for  the historical data  sets
involved the collection of replicate grab samples from two or  more locations
and the statistical comparison of  the  mean values among  sampling locations.
As an alternative to this design, replicate composite  samples,  each  composed
of  tissue  from  several  organisms,  could be collected  at  specified  sampling
locations with  the  objective of obtaining a  more  accurate estimate of the
true mean  at each  location and to increase the power  of the  statistical
tests.

     The comparison of single composite and  replicate grab  samples can  be
extended to replicate  composite  samples  (Rhode 1976,  1979).   The mean  of  m
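composite samples $(Z_1, Z_2, \ldots, Z_m)$ is given by:

                    $$ \bar{Z} = \frac{1}{m} \sum_{j=1}^{m} Z_j $$           (7)

The expected value and variance of $\bar{Z}$ are given by:

                    $$ E(\bar{Z}) = \mu $$                                   (8)

                    $$ \mathrm{Var}(\bar{Z}) = \frac{\sigma^2}{mn} + \frac{n \sigma_\omega^2 \sigma^2}{m} $$     (9)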

     The consequences  of Equations 8 and 9 are pertinent  to  the evaluation
of  composite sampling strategies  for bioaccumulation monitoring  programs.
For example, when the  composites consist of samples of equal mass (i.e., the
same mass of tissue is taken from each organism) ($\sigma_\omega^2 = 0$), then:


-------
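                    $$ \mathrm{Var}(\bar{Z}) = \frac{\mathrm{Var}(\bar{X})}{n} $$     (10)

where:

                    $$ \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{m} $$           (11)

                    $$ \mathrm{Var}(\bar{Z}) = \frac{\sigma^2}{mn} $$          (12)

      m = Number of  replicate  samples  (replicate or composite) used in the
          estimate of the population variance ($\sigma^2$)
      n = Number of samples constituting the composite  sample.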

Thus,  from  Equation 10, it  can  be  seen  that the  collection  of  replicate
composite tissue  samples  at specified  sampling  locations  will  result  in a
more efficient estimate of the mean (i.e., the variance of  the  mean obtained
with  replicate  composite  samples  is smaller  than  that obtained  with the
collection of replicate grab samples).   From Equation  9,  it should also  be
noted that for unequal proportions of composite samples (i.e.,  tissue mass),
the variance of  the series of composite  samples increases and, in extreme
cases,  exceeds  the variance  of  grab samples.   A table of  values for the
upper bound of  the variance of the  proportions  ($\sigma_\omega^2$)  that lead to such  an
increase  in  composite  variance  is  presented in  Schaeffer  and  Janardan
(1978).   However,  these tabulated  values  are extremely high when  compared
with  expected  values of  $\sigma_\omega^2$ associated  with preparing tissue-sample
composites.  For  example,  using  the  Dirichlet  model  for compositing
probabilities, Rhode (1979) gave:
                                                                        (13)
                              Var Z
as the increase in precision that can  be  achieved  at  the  additional  cost of
the compositing process.   For  the  analyses  presented  below, it was  assumed

-------
that the composite  samples consist of individual  samples of equal proportions
and therefore that $\sigma_\omega^2 = 0$.
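
     The variance relationships discussed above can be sketched numerically.
The example assumes that the compositing proportions are independent of the
contaminant concentrations, so that the variance of a single composite is
$\sigma^2/n + n \sigma_\omega^2 \sigma^2$ and the variance of the mean of m
composites is that quantity divided by m.  The function names, and the choice
of five replicates, are hypothetical; the population variance of 70.90 is the
value used in the simulation analyses described later in this report.

```python
def variance_of_grab_mean(sigma2, m):
    """Variance of the mean of m replicate grab samples."""
    return sigma2 / m


def variance_of_composite_mean(sigma2, m, n, sigma_w2=0.0):
    """Variance of the mean of m replicate composites of n samples each.

    sigma_w2 is the variance of the compositing proportions; it is zero
    when every organism contributes an equal share of the composite.
    """
    single_composite = sigma2 / n + n * sigma_w2 * sigma2
    return single_composite / m


sigma2 = 70.90  # population variance
m = 5           # hypothetical number of replicates
n = 4           # individual samples per composite

grab = variance_of_grab_mean(sigma2, m)
composite = variance_of_composite_mean(sigma2, m, n)
unequal = variance_of_composite_mean(sigma2, m, n, sigma_w2=0.01)

print(f"replicate grab samples:        {grab:.3f}")
print(f"replicate composites (equal):  {composite:.3f}")
print(f"replicate composites (uneven): {unequal:.3f}")
```

     With equal proportions the variance of the mean is reduced by a factor
of n; any variation in the proportions erodes part of that gain.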

POWER ANALYSES FOR  COMPOSITE  SAMPLES

Analytical Methods

     Historical  data  that  could be  used  to evaluate the  applicability  of
composite sampling  in  bioaccumulation  monitoring programs were  not  avail-
able.  Instead,  simulation methods were used to make a direct comparison of
grab- and composite-sampling strategies.  Simulation refers to the use of
numerical techniques to generate random variables with specified statistical
properties.    For  the analyses  described  below, computer  programs  were
written   1)  to  produce  individual  random samples from  populations  with
statistical  properties similar to those of the historical data described in
Table 2 and Figure  2 and 2)  to construct  composite samples.

     The  algorithms  used to  generate  the  individual  random  samples  are
described in Rubinstein (1981).  All  algorithms  used  required the generation
of independent random variables uniformly distributed over the interval
(0, 1).  The program developed to perform these simulations used the
congruential method described by Lewis et al. (1969).  Normally distributed
variables were  generated using  the approach developed  by Box  and  Muller
(1958).
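
     The two-step procedure described above (uniform variates from a
congruential generator, then normal variates by the Box and Muller
transformation) can be sketched as follows.  This is not the original
program, which is not reproduced in this report.  The multiplier 16807 and
modulus 2^31 - 1 are assumed here as a commonly cited multiplicative
congruential parameterization, and the class and function names are
hypothetical.

```python
import math

# Assumed parameters for a multiplicative congruential generator.
MODULUS = 2**31 - 1
MULTIPLIER = 16807


class CongruentialGenerator:
    """Multiplicative congruential generator of uniform variates on (0, 1)."""

    def __init__(self, seed=12345):
        if not 0 < seed < MODULUS:
            raise ValueError("seed must lie strictly between 0 and the modulus")
        self.state = seed

    def uniform(self):
        # The state never reaches 0, so the result is strictly inside (0, 1).
        self.state = (MULTIPLIER * self.state) % MODULUS
        return self.state / MODULUS


def box_muller(generator, mean=0.0, sd=1.0):
    """One normal variate from two uniform variates (Box and Muller 1958)."""
    u1 = generator.uniform()
    u2 = generator.uniform()
    radius = math.sqrt(-2.0 * math.log(u1))
    z = radius * math.cos(2.0 * math.pi * u2)
    return mean + sd * z


generator = CongruentialGenerator(seed=2024)
draws = [box_muller(generator, mean=18.52, sd=math.sqrt(70.90))
         for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / (len(draws) - 1)
print(f"sample mean {sample_mean:.2f}, sample variance {sample_var:.2f}")
```

     The sample mean and variance should fall close to the specified
population values (18.52 and 70.90 in this illustration).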

     Two  sets of analyses are described below.   In the first set, simulation
methods were  used  to  show the  effect of  sample  compositing on the estimate
of  the population  mean.   Power analyses  were  used in  the  second  set of
analyses  to demonstrate the effect of  increasing the number of samples in a
composite  sample   on the  probability   of  detecting specified  levels of
differences among  stations.

Simulation Analyses

     The  first  set of analyses was  conducted  to demonstrate the effect of
increasing the number of individual  samples in the composite on the estimate


-------
 of  the  mean.   The simulated sampling consisted of randomly selecting 10,000
 composite  samples from two  populations  exhibiting  two different  levels  of
 variability in  the sampling  environment.  The mean value in both populations
 was  fixed at 18.52, but the population variances were set at 70.90 or 354.19,
 corresponding  to coefficients  of variation  of 45.5  and 101.6, respectively.
 These  population characteristics  were selected  as  representative  of  the
 range of values for  the coefficient of variation  observed in the historical
 data sets (Table 2 and Figure 2).  Coefficients of variation of 40-50 percent
 were measured  in  several historical data sets for metals, including American
 lobster muscle  (mercury), Dover  sole  liver (silver and cadmium), and English
 sole muscle  (lead and mercury).   Coefficients of  variation of approximately
 100  percent were observed for arsenic in English sole muscle and  for  lead
 and PCBs in Cancer spp.

     To demonstrate  the effect of increasing the number  of  samples  consti-
 tuting the composite  sample, the sample variance and  the  range  of observed
 values were  recorded  in each experiment.  The results  of these experiments
 are summarized  in Figure  7  and Table 5.   A graphic display  of  the increase
 in  the  ability to estimate  the  population  mean  obtained  by increasing  the
 number of  individual  samples in composite samples is  provided  in Figure 7.
 The  95  percent confidence  intervals  shown  in Figure  7  represent  the range
 within which 95 percent of all  samples in the simulation experiments fell.
 As  the  number of individual  samples  per composite increased,  the observed
 range of mean values decreased substantially.
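
     A minimal sketch of this type of simulation experiment is given below.
It is not the program used for this report: it substitutes the normal
generator in the Python standard library for the congruential and Box and
Muller routines, assumes equal proportions in each composite, and uses
hypothetical function names and an arbitrary seed.  Exact results will
therefore differ slightly from those reported in Table 5.

```python
import random
import statistics


def simulate_composites(mean, variance, n_per_composite,
                        n_composites=10_000, seed=1):
    """Simulate equal-proportion composite samples from a normal population.

    Returns the observed variance of the composite values and the bounds
    of the central range containing 95 percent of the simulated composites.
    """
    rng = random.Random(seed)
    sd = variance ** 0.5
    composites = []
    for _ in range(n_composites):
        draws = [rng.gauss(mean, sd) for _ in range(n_per_composite)]
        composites.append(sum(draws) / n_per_composite)
    composites.sort()
    lower = composites[int(0.025 * n_composites)]
    upper = composites[int(0.975 * n_composites) - 1]
    return statistics.variance(composites), lower, upper


# Population used in Analysis 1: mean 18.52, variance 70.90.
results = {}
for n in (4, 6, 10, 20):
    results[n] = simulate_composites(18.52, 70.90, n)
    observed_variance, lower, upper = results[n]
    print(f"n = {n:2d}: variance {observed_variance:6.2f}, "
          f"95 percent range {lower:5.1f} to {upper:5.1f}")
```

     The observed variance should be close to the population variance
divided by the number of samples in each composite, and the 95 percent
range should narrow as that number increases.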

     The  actual  values  obtained at  the  boundaries  of these  confidence
 intervals shown in  Figure 7 (minimum  and maximum values) are  presented  in
 Table 5.   In  Analysis  1,  sampling was conducted  from a  normal  population
 with a mean of  18.52 and a  variance  of  70.90.  Therefore,  95 percent of  all
 values in this  population ranged from approximately 1.7 to 35.4.  This range
 (33.7) would be expected from randomly selecting a large number of individual
 samples from  the specified  population.  For  the  same  specified population,
 composite sampling resulted in  a  much  more precise estimate of the population
mean.  With  four individual  samples  in each  composite,  95 percent  of  all
values obtained from  the  10,000  simulated  samples  were  between 10.4  and
26.7.  This range (16.3) is approximately 50 percent  of the range that would


-------
[Figure 7: two panels showing 95% confidence intervals about the mean against number of samples in composite (4, 6, 10, 20). Analysis 1: mean (μ) = 18.52, variance (σ²) = 70.90, coefficient of variation = 45.5. Analysis 2: mean (μ) = 18.52, variance (σ²) = 354.19, coefficient of variation = 101.6.]
Figure 7.  Effects of increasing composite sample size on estimate of
           the mean.
                              35

-------
      TABLE 5.  RESULTS OF SIMULATION ANALYSES DEMONSTRATING THE
         EFFECT OF COMPOSITE SAMPLING ON THE ESTIMATE OF THE
                           POPULATION MEAN

Analysis 1.  Mean (μ) = 18.52   Variance (σ²) = 70.90   Coefficient of Variation = 45.5

                                    95 Percent Confidence Interval
   Number of Samples    Observed    Minimum    Maximum    Observed
     in Composite       Variance     Value      Value       Range

           4             17.29       10.4       26.7        16.3
           6             12.31       11.6       25.4        13.8
          10              7.02       13.3       23.7        10.4
          20              3.49       14.9       22.2         7.3

Analysis 2.  Mean (μ) = 18.52   Variance (σ²) = 354.19   Coefficient of Variation = 101.6

                                    95 Percent Confidence Interval
   Number of Samples    Observed    Minimum    Maximum    Observed
     in Composite       Variance     Value      Value       Range

           4             88.29        0.1       36.9        36.8
           6             60.34        3.3       33.7        30.4
          10             35.57        6.8       30.2        23.4
          20             16.73       10.5       26.5        16.0

-------
be obtained  with the collection of  a  similarly large number of  individual
grab samples  from  this  population.   Furthermore,  an increase in the number
of samples from 4 to 20 in each composite sample decreased the range of
values  that  define  this  95 percent confidence  interval  by approximately
another 50 percent.
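
     The simulation experiment described above can be sketched in a few
lines of code. The following is a minimal illustration under stated
assumptions, not the program used to produce Figure 7 and Table 5. It
assumes that the value of a composite sample equals the arithmetic mean of
its constituent individual samples (equal proportions, no subsampling or
analytical error), and it uses the normal generator in the Python standard
library. The function name simulate_composites and the random seed are
hypothetical.

```python
import random
import statistics


def simulate_composites(mean, variance, n_per_composite,
                        n_composites=10_000, seed=1):
    """Simulate composite samples drawn from a normal population.

    Each composite is taken to be the arithmetic mean of n_per_composite
    individual values (an assumption of this sketch). Returns the variance
    of the simulated composites and the bounds of their central 95 percent.
    """
    rng = random.Random(seed)
    sd = variance ** 0.5
    composites = sorted(
        statistics.fmean(rng.gauss(mean, sd) for _ in range(n_per_composite))
        for _ in range(n_composites)
    )
    lower = composites[int(0.025 * n_composites)]
    upper = composites[int(0.975 * n_composites) - 1]
    return statistics.variance(composites), lower, upper


# Population used in Analysis 1: mean = 18.52, variance = 70.90.
for n in (4, 6, 10, 20):
    var, lower, upper = simulate_composites(18.52, 70.90, n)
    print(f"n = {n:2d}  variance = {var:6.2f}  "
          f"95% range = {lower:5.1f} to {upper:5.1f}  width = {upper - lower:5.1f}")
```

     Under these assumptions the simulated variances should fall close to
the population variance divided by the number of samples per composite,
which is the pattern shown by the observed variances in Table 5.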

     Similar results were obtained in the second experiment, which involved
simulated sampling from a population with the same mean value (18.52) but
with the variance increased to 354.2. The simulated collection of composite
samples, each consisting of four individual samples from the population,
reduced the range of values in the 95 percent confidence interval by a
factor of 2 relative to individual samples. Increasing the number of
samples in the composite from 4 to 20 reduced this range by a similar
factor (from 36.8 to 16.0; Table 5).
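
     The factor-of-2 reductions follow from simple arithmetic if the
variance of a composite is taken to be the population variance divided by
the number of samples in the composite. The sketch below makes that
assumption (equal proportions, no additional compositing error) and
expresses the 95 percent range as four standard deviations, the convention
that reproduces the range of 33.7 quoted above for the Analysis 1
population. The function name expected_composite is hypothetical; the
observed values are copied from Table 5, Analysis 2.

```python
import math

POP_VARIANCE = 354.19  # Analysis 2 population variance

# Number of samples in composite -> (observed variance, observed range), Table 5
OBSERVED = {4: (88.29, 36.8), 6: (60.34, 30.4), 10: (35.57, 23.4), 20: (16.73, 16.0)}


def expected_composite(variance, n):
    """Expected variance and 95 percent range (4 standard deviations)
    of a composite of n samples, assuming composite variance = variance / n."""
    var_c = variance / n
    return var_c, 4.0 * math.sqrt(var_c)


for n, (obs_var, obs_range) in OBSERVED.items():
    var_c, width = expected_composite(POP_VARIANCE, n)
    print(f"n = {n:2d}  expected variance = {var_c:6.2f} (observed {obs_var:6.2f})  "
          f"expected range = {width:5.1f} (observed {obs_range:5.1f})")
```

     Because the range scales with the square root of the variance, a
fourfold increase in the number of samples per composite is needed for each
halving of the range.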

Power Analyses

     The results of  power analyses  conducted  with the historical data  sets
of individual grab samples are  presented above  in Figures  4 through 6.   In
these analyses,  the  minimum detectable difference  between sampling stations
was  shown  as a  function  of  the number of replicate  grab  samples at  each
station.  These  results  are  shown  for specified  sets of design  parameters
(i.e., number of  stations,  significance level  of  the  test, residual  error
variance,  and  power of  the  test).   To  demonstrate  the effect of  sample
compositing on the power of  the statistical  test of significance, additional
power analyses  were  conducted.  In  these  analyses,  the number  of stations
(5), number of replicate  samples at  each station  (5),  significance level  of
the  test  (0.05),  residual  error variance  level,  and level of  minimum
detectable difference were  fixed.   The power  of the test or probability  of
detecting the specified minimum difference  was  then calculated  as a function
of the  number of  individual  samples constituting  each replicate  composite
sample.

     Power analyses  were  conducted  for three levels of sample  variability.
All design parameters except the residual error variance were identical in
each set of  analyses.   Values of the  residual  error variance were selected
to represent the range of values found in  the historical data  sets  (Table 2


-------
 and  Figure 2).   The coefficients  of variation  selected  for  these  three  sets
 of analyses were 45.5, 101.6, and  203.5.  The  highest  level  of  variability
 (coefficient  of variation = 203.5) is equal to that  measured  in  Dover  sole
 for  DDT  concentrations  in muscle tissue.

      The results of the power analyses  conducted  at the two lower levels of
 sample variability  are shown in Figure 8a.   In each analysis, the  probability
 of statistically detecting  a  difference equal  to  the  overall sample  mean
 among stations  increases  with  the  collection of replicate composite  samples
 at each  station and as  the number of  samples constituting  the  composite
 increases.   For example,  in  Analysis 1  (Figure  8a), conducted  at  the  lowest
 level of sample variability  (coefficient of variation  = 45.5),   the  proba-
 bility  of  detecting  the  specified difference  among  stations   with  five
 replicate  grab  samples  (i.e.,  number  of samples  =  1)  is  0.70.   With the
 collection  of  five replicate composite samples, each composed  of  two
 individual  samples  at each station, the power of the test increases to 0.96,
 and with four or more  samples  per  composite, the  detection  of  the specified
 difference  between  stations is virtually assured.

      The  benefits of the  composite  sampling strategy  are more  apparent  from
 the  analysis  conducted  at  the  intermediate   level  of sample  variability
 (Analysis  2,  Figure 8a).   The  probability  of  statistically detecting  the
 specified  difference  among  stations with the collection of  five  individual
 grab  samples  (number  of samples =  1) is only 0.17.   The power of the  test
 increases to 0.59 with the collection of 5  replicate composite  samples,  with
 each  composite  composed of  4 samples in equal   proportions,  to 0.90  with 8
 samples  per composite, and to 0.96 with  10  samples per composite.
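
     The effect of compositing on power can also be explored by simulation.
The sketch below is an illustration only and is not the ODES Power Analysis
Tool. It estimates the power of a one-way analysis of variance for five
stations with five replicates each by repeated simulated sampling. Three
assumptions are made that are not stated in this section: observations are
normally distributed, the overall mean is scaled to 1.0, and the specified
difference is placed between two stations with the remaining stations at
the midpoint. The critical value 2.866 is the tabulated upper 5 percent
point of the F distribution with 4 and 20 degrees of freedom. The names
anova_f and simulated_power are hypothetical.

```python
import random

F_CRIT_4_20 = 2.866  # tabulated upper 5% point of F with (4, 20) degrees of freedom


def anova_f(groups):
    """One-way analysis of variance F statistic for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulated_power(cv_percent, per_composite, diff=1.0, n_sims=4000, seed=7):
    """Fraction of simulated surveys (5 stations, 5 replicates) in which the
    F test at the 0.05 level detects a difference among stations."""
    rng = random.Random(seed)
    # Standard deviation of one replicate composite, overall mean scaled to 1.0.
    # Drawing the composite directly is equivalent to averaging per_composite
    # normal values under the equal-proportions assumption.
    sd = (cv_percent / 100.0) / per_composite ** 0.5
    # Assumed configuration: two stations differ by diff, the rest at the midpoint.
    station_means = [1.0 - diff / 2, 1.0, 1.0, 1.0, 1.0 + diff / 2]
    detections = 0
    for _ in range(n_sims):
        groups = [[rng.gauss(mu, sd) for _ in range(5)] for mu in station_means]
        if anova_f(groups) > F_CRIT_4_20:
            detections += 1
    return detections / n_sims


for m in (1, 4, 8, 10):
    print(f"samples per composite = {m:2d}  "
          f"estimated power = {simulated_power(101.6, m):.2f}")
```

     Simulated estimates of this kind carry sampling error and depend on
the assumed pattern of station means, so they should be read as a check on
the trend (power rising with the number of samples per composite) rather
than as a replacement for the values reported in Figure 8.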

     The results of  both sets of analyses shown  in Figure 8a  also demonstrate
 the phenomenon  of diminishing returns for continued increases  in  the  number
of samples per composite. In Analysis Set 1, for example, virtually no
increase in the power of the statistical test was achieved by increasing
the number of samples per composite above three. In the second analysis set,
substantial increases  in  statistical  power  were achieved by increasing the
number of  samples  in  each  composite  from  2 to  10.    However,  with  each
successive  increase in  sample  size, the  relative benefit was  reduced until


-------
[Figure 8 (image not reproduced): power of the statistical test (0.0 to 1.0)
vs. NUMBER OF SAMPLES in each composite. Panel (a): Analysis 1 (coefficient
of variation = 45.5) and Analysis 2 (101.6), 0 to 16 samples. Panel (b):
Analysis 3 (coefficient of variation = 203.5), 0 to 30 samples.]

Figure 8.  Power of statistical tests vs. number of samples in composite
          replicate samples. Fixed design parameters: number of stations
          = 5, number of replicates = 5, significance level = 0.05, minimum
          detectable difference = 100 percent of overall mean value.

-------
very  little  was  gained  by increasing  the sample  size  above 10.   This
phenomenon  is  analogous to  that  observed  in  the  results  of  the power
analyses presented in Figures 4 through 6.   In  these previous  analyses,  the
benefit achieved in the minimum detectable  difference  between  stations  also
decreased  with the  addition  of each  successive  replicate  grab  sample.
However,  the  main difference  is  that while the  cost of  collecting   and
processing  additional replicate samples (grab or  composite)  is substantial,
the cost  of collecting  additional  samples for each  composite  replicate
sample is relatively small.

     The  results  of  the power  analyses  conducted  at  the highest level  of
sample variability selected  are  shown  in Figure 8b.  These results are shown
separately  from the first two sets of analyses  to  include a larger  range of
values  for  the number  of  samples in  each composite.   These  results   are
directly comparable,  however,  and  provide  additional  evidence  of  the
increased power obtained by  the collection of  replicate  composite  samples.
These analyses  indicate  the  relatively  low  statistical  power associated  with
samples displaying  a high  level  of  natural  variability.   Under these
conditions  (coefficient  of variation = 203.5),  the probability of detecting
a difference among stations  equal  to  the overall mean with the  collection of
five individual replicate grab  samples is  0.08.   The power of  the  test  is
doubled with the collection of replicate composite samples  composed  of  four
samples (power  =  0.17).  With  the  collection  of  10  samples  per replicate
composite sample,  the power  is  increased to 0.38.   However, given the  high
level  of  background  variability,   the  collection  of replicate  composite
samples composed  of  25  individual  samples each  is required  to  obtain  a
testing  power  of 0.80.  Additionally,  due  to the diminishing  returns
associated with increasing  the number of samples per composite,  the  collec-
tion  of  replicate  composite  samples  consisting  of 32  samples  each  is
required to obtain a  statistical power  of approximately 0.90.
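
     The power values reported for the three levels of variability can be
cross-checked with one line of arithmetic. If the variance of a composite
is taken to be the population variance divided by the number of samples in
the composite (an assumption of this sketch: equal proportions and no
additional compositing error), the coefficient of variation of a composite
is the original coefficient divided by the square root of the number of
samples. Designs with similar effective coefficients of variation should
then show similar power. The function name effective_cv is hypothetical;
the power values are those reported in the text and in Table 6.

```python
import math


def effective_cv(cv_percent, per_composite):
    """Coefficient of variation of a composite, assuming the composite
    variance equals the population variance divided by per_composite."""
    return cv_percent / math.sqrt(per_composite)


# (coefficient of variation, samples per composite, power reported in this report)
CASES = [
    (203.5, 4, 0.17),
    (101.6, 1, 0.17),
    (203.5, 10, 0.38),
    (203.5, 25, 0.80),
    (203.5, 32, 0.90),
    (101.6, 8, 0.90),
    (101.6, 10, 0.96),
    (45.5, 2, 0.96),
]

for cv, m, reported_power in sorted(CASES, key=lambda c: -effective_cv(c[0], c[1])):
    print(f"CV = {cv:5.1f}  samples per composite = {m:2d}  "
          f"effective CV = {effective_cv(cv, m):6.1f}  reported power = {reported_power:.2f}")
```

     For example, four samples per composite at a coefficient of variation
of 203.5 give an effective coefficient of about 101.8, and the reported
power (0.17) equals that for grab samples at a coefficient of variation of
101.6.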

     A  final  set  of power  analyses  was conducted to provide a direct
comparison of grab-sampling  and composite-sampling  strategies.   The number
of stations (5), significance level   of  the test (0.05),  and  residual error
variance  level   (coefficient  of  variation  =  101.6) were  fixed  in   all
analyses.    In  individual  analyses,  the probability of detecting specified

-------
minimum differences between stations was determined for selected numbers of
replicate grab samples and for a fixed  number  of  replicate composite samples
at each  station composed of  selected  numbers of  individual  samples.   The
results of these analyses are  summarized  in  Table 6.

     Results of  the  first three analyses shown  in  Table  6 demonstrate the
effect of increasing the number of replicate grab samples  on the ability to
detect statistically significant differences between sampling stations.  For
example, the probability of detecting a difference equal to  the overall mean
among  stations  is  increased from 0.17 to 0.35  by  increasing  the number of
replicate grab samples at each of the five stations  from  5  (Analysis 1) to 10
(Analysis  3).   These  first  three results  demonstrate  from  a different
perspective what  was  previously  shown in  Figures  4  through  6—that  an
increase in the number of replicate samples  is accompanied by an  increase in
the ability to identify differences among sampling stations.  The results of
Analyses 4  through  6  presented  in Table 6  demonstrate the  effect of sample
compositing on  the  ability to detect  differences  between stations.   These
analyses were conducted for five replicate composite samples at each station
and  various numbers of  individual  samples  per  composite.   These results,
therefore,  are  directly  comparable  to  those provided  in Analysis 1 for the
collection  of  five  replicate  grab samples  at each  station.  Comparison of
the  probability of detecting  a difference between  stations equal  to the
overall  mean  in Analyses  1,  4,  5,  and  6  indicates  that  a substantial
increase  in  the  power  of  the  statistical  test  can  be  achieved  by the
collection of replicate composite samples. These analyses demonstrate that
the collection of five replicate composite samples, each consisting of four
samples, will increase the power from 0.17 to 0.59. The power is further
increased to 0.96 by increasing the number of samples in each composite to 10.
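
     The comparisons in this section can be repeated for other levels of
difference by treating Table 6 as a lookup table. The sketch below simply
encodes the tabulated probabilities; the names TABLE_6,
detection_probability, and smallest_detectable are hypothetical, and the
default target power of 0.80 is an illustrative choice.

```python
# Minimum detectable differences, as proportions of the overall mean.
MDD_LEVELS = (0.25, 0.50, 1.0, 1.5, 2.0)

# (replicates per station, samples per composite) -> probabilities of detection
TABLE_6 = {
    (5, 1): (0.06, 0.08, 0.17, 0.35, 0.59),    # Analysis 1, grab samples
    (8, 1): (0.06, 0.10, 0.27, 0.58, 0.85),    # Analysis 2, grab samples
    (10, 1): (0.07, 0.11, 0.35, 0.71, 0.94),   # Analysis 3, grab samples
    (5, 4): (0.08, 0.17, 0.59, 0.93, 1.00),    # Analysis 4, composites
    (5, 8): (0.10, 0.31, 0.90, 1.00, 1.00),    # Analysis 5, composites
    (5, 10): (0.11, 0.38, 0.96, 1.00, 1.00),   # Analysis 6, composites
}


def detection_probability(replicates, per_composite, mdd):
    """Tabulated probability of detecting a difference of size mdd."""
    return TABLE_6[(replicates, per_composite)][MDD_LEVELS.index(mdd)]


def smallest_detectable(replicates, per_composite, target_power=0.80):
    """Smallest tabulated difference detected with at least target_power,
    or None if no tabulated difference reaches it."""
    for mdd, power in zip(MDD_LEVELS, TABLE_6[(replicates, per_composite)]):
        if power >= target_power:
            return mdd
    return None


for design in TABLE_6:
    print(design, smallest_detectable(*design))
```

     With five replicate analyses per station, grab samples do not reach a
power of 0.80 for any tabulated difference, whereas composites of eight
samples reach it for a difference equal to the overall mean.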

Summary

     1.   Based on simulation results, for a given number of samples
          analyzed, composite sampling yields a much more precise estimate
          of the mean than analysis of grab samples.

-------
      TABLE 6.   PROBABILITY OF DETECTING SPECIFIED LEVELS OF MINIMUM
             DETECTABLE DIFFERENCES FOR SELECTED GRAB-SAMPLING
                    AND COMPOSITE-SAMPLING STRATEGIES.
             FIXED DESIGN PARAMETERS:   NUMBER OF STATIONS = 5,
        SIGNIFICANCE LEVEL = 0.05, COEFFICIENT OF VARIATION = 101.6

           Number of                       Minimum Detectable Difference
           Replicate                        (expressed as a proportion
            Samples       Number of            of the overall mean)
Analysis    at Each       Samples in       0.25    0.50    1.0     1.5     2.0
 Number     Station       Composite      Corresponding Probability of Detection

   1           5       1 (grab sample)     0.06    0.08    0.17    0.35    0.59
   2           8       1 (grab sample)     0.06    0.10    0.27    0.58    0.85
   3          10       1 (grab sample)     0.07    0.11    0.35    0.71    0.94
   4           5             4             0.08    0.17    0.59    0.93    1.00
   5           5             8             0.10    0.31    0.90    1.00    1.00
   6           5            10             0.11    0.38    0.96    1.00    1.00

-------
2.   The precision of the  estimated mean increases with increasing
     numbers of  individual samples constituting a composite sample.

3.   Because  of  reduced  sample  variance,  composite  sampling
     results in a considerable increase in statistical  power over
     grab sampling (for a  given  number  of  samples analyzed).

4.   For most contaminants,  the  collection of six to eight samples
     per composite  results  in  adequate  statistical  power,  with
     little relative gain  in power for  additional samples.

-------
                        SUMMARY AND RECOMMENDATIONS
     This document describes the  use  of  power analyses in designing 301(h)
bioaccumulation monitoring programs and provides evaluations of alternative
sampling  strategies.   These methods  can  be  used  to  evaluate  alternative
designs  on  the basis  of the  level  of sampling effort  required  to obtain
desired levels of precision.  For example, existing data can be analyzed to
determine the minimum differences in  contaminant concentrations that can be
detected  for  selected levels  of sample  replication.   The probability  of
detecting specific levels of differences in tissue contaminant concentrations
for alternative sampling designs can also  be  determined.

     The  example  analyses presented  in  this report were  conducted  on the
Ocean  Data  Evaluation System  (ODES)  using  the  statistical  power analysis
tool.  The  ODES  Power Analysis Tool  can  be  used to assess bioaccumulation
monitoring programs from two perspectives.   In the monitoring program design
phase, these  techniques  can  be  used  in  a  prospective  manner to evaluate
alternative design  parameters   such  as numbers of  samples  and  sampling
stations.  The techniques can  also be used  retrospectively when monitoring
data are available to evaluate overall monitoring program performance. For
example, if greater statistical power were desired for future data, the
gain in power from increasing the number of replicate samples could be
weighed against the increased program costs.

     The  use  of  power  analyses  in  designing  bioaccumulation  studies was
demonstrated  with  both  historical  and simulated  data.   Twenty-three
historical  data  sets were  compiled  from  published  reports and  analyzed.
Data were obtained for  a total of five  common marine species, three body
tissues, and measured values of nine contaminants.   These data  encompassed  a
wide range of  sample  variability, and  the  results of the analyses  conducted
provide  an  indication of the  approximate  levels  of statistical power that
can  be  achieved  with the  collection  of  replicate grab  samples at  selected
sampling  locations.   Simulation techniques,  sometimes  referred to as  Monte


-------
Carlo techniques, were used to produce data from specified sampling distri-
butions with fixed parameters.  These data were  essential for the evaluation
of  grab-  vs.  composite-sampling strategies  because  equivalent historical
data for composite samples were not  available.

     In  addition to  the description  and demonstration of  power  analysis
techniques, a primary objective of this document was to evaluate composite-
vs. grab-sampling strategies.   Based  on the results  presented  in  this report,
the  collection  of  replicate  composite samples is  recommended  for  most
bioaccumulation  monitoring  programs.   The  results  of the  analyses  using
simulated  data  demonstrated  that  the  collection  of  replicate composite
tissue samples at selected sampling  locations provides a better estimate of
the  population mean.   The results of  power  analyses  using these simulated
data also demonstrate that the corresponding decrease  in the  sample variance
that is achieved with composite  sampling  leads  to  an  increase  in the power
of statistical  tests.   For example,  with  an overall coefficient  of variation
of  approximately  100, the  analyses demonstrated  that  the  probability  of
detecting  a  difference  in  tissue contaminant  concentrations equal  to the
overall  mean among  5  stations was 0.17 with the collection  of  5 replicate
grab samples at each station and  0.59,  0.90, and 0.96 with  the collection of
5 replicate composite  samples consisting  of  4,  8, and  10  samples,  respec-
tively.

     Based  on the  levels  of  variability in  the measurements of  tissue
contaminant concentrations that were identified  in  the historical  data sets
and the results of analyses presented in  this report, it was concluded that
the collection of replicate composite  samples makes  it feasible to distin-
guish  elevated  tissue  concentrations  of contaminants  between  sampling
locations.  However, the  selection of  the appropriate  numbers  of replicate
composite samples  and  numbers  of samples per  replicate will depend  on
site-specific levels of  sample variability in the  tissues  and  contaminants
of  concern.   When  these  kinds of historical data  are  available  for  a
particular study site, the tools  demonstrated in this  document  can  be used
to  make quantitative  comparisons of  alternative  sampling designs  and  to
select the appropriate level  of sampling  effort.  Where historical  data are
not  available,  pilot  studies may  be  conducted to  estimate the  level  of


-------
expected variability in contaminant concentrations for selected species and
tissues.  Alternatively,  the  observed  level  of variability  in tissue
concentrations  for  selected contaminants and  species  at  other  locations
could be used  to estimate  sample variability.  Where these data  cannot  be
obtained, the collection of five replicate composite samples, each consisting
of equal amounts of  tissue  from  six  individual  organisms,  is  recommended.
The selection of these design parameters assumes a coefficient  of variation
calculated among  stations  of approximately 100 percent and will  result  in
the ability  to detect a  difference  between   stations equal to the overall
mean concentration among  stations  with an  estimated probability of detection
(power  of  the statistical  test)  equal  to 0.80.  Note that most  data sets
reviewed herein had coefficients  of variation  less than 100.  Therefore, the
recommended  design will  generally  result   in either a  lower detectable
difference or  higher power  than  stated  above.   If  this  general design
specification  is used,  power  analyses  should  be   conducted  after  site-
specific data are  available  to  evaluate  the  exact probability  of detecting
specific levels of differences between stations.

     The objectives of a bioaccumulation monitoring program should be
evaluated  prior  to  selecting  a sampling   strategy.   Composite sampling
methods  are  appropriate  for  monitoring  programs that  have as  a  primary
objective the determination  of differences in contaminant tissue concentra-
tions among sampling stations.  However,  as shown  in Equations  9 and 11, the
variance of  composite  samples  is substantially  less than  the population
variance.  As  a consequence, the  range  of values obtained  from composite
samples  is not representative of the  true  range of tissue concentrations in
individual  organisms of the  sampled  population.  These results demonstrate
that composite  sampling will  not  detect extreme values.   Therefore, bioac-
cumulation monitoring programs using  a  composite  sampling  strategy may not
detect  the existence  of  tissue concentrations  that exceed  legal  limits  or
action  levels  for contaminants  in  fish  tissues.   For example,  from  the
historical  data compiled for  this  study, the mean concentration  of total PCBs
in Dover sole muscle tissue (Data  Set  4,  Table 3)  was  0.766 ppm.  However, 2
of the 36 values  in this data set exceed  the  U.S. Food and Drug Administra-
tion legal  limit of 2.0 ppm (U.S.  Food and Drug  Administration  1984).  These
values  would  not  have been  detected  in a monitoring  program  based  on the


-------
collection of composite  samples.  Therefore, if the objective of a bioaccumu-
lation study is to determine compliance with specified tissue concentration
limits, the  program  should include  the  collection  of tissue  samples  from
individual organisms.   This objective could  be accomplished by two different
monitoring strategies.   In the first, the entire study could be designed to
collect replicate grab  samples.   In this  design  there  would  be a much lower
statistical  power  to detect  among-station  differences  than  could  be accom-
plished with a composite sampling  strategy  using the same number of samples.
Alternatively, the program could be designed to collect replicate composite
samples at  all  sampling  stations  with  the  collection of  supplemental
individual tissue  samples  at  areas of  expected  high tissue concentrations.
This  program design  would  enable a more  efficient assessment  of  among-
station differences  in addition  to providing an assessment  of regulatory
compliance in specified areas  of concern.
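
     The masking of extreme values by compositing can be illustrated with a
short simulation. The sketch below is hypothetical and does not use the
measurements in Data Set 4. It assumes that individual tissue
concentrations follow a lognormal distribution with the reported mean of
0.766 ppm and an assumed coefficient of variation of 100 percent, and that
each composite is the mean of six organisms. It compares how often an
individual organism and a composite exceed the 2.0 ppm limit. The function
names are hypothetical.

```python
import math
import random


def lognormal_params(mean, cv):
    """Parameters (mu, sigma) of the underlying normal distribution for a
    lognormal variable with the given mean and coefficient of variation."""
    sigma2 = math.log(1.0 + cv ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def exceedance_fractions(mean=0.766, cv=1.0, limit=2.0, per_composite=6,
                         n_composites=20_000, seed=3):
    """Fractions of individual organisms and of composite samples whose
    concentration exceeds the limit (distribution and cv are assumptions)."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, cv)
    organisms_over = 0
    composites_over = 0
    for _ in range(n_composites):
        tissue = [rng.lognormvariate(mu, sigma) for _ in range(per_composite)]
        organisms_over += sum(x > limit for x in tissue)
        composites_over += (sum(tissue) / per_composite) > limit
    return (organisms_over / (n_composites * per_composite),
            composites_over / n_composites)


individual, composite = exceedance_fractions()
print(f"fraction of individual organisms above limit: {individual:.3f}")
print(f"fraction of composite samples above limit:    {composite:.3f}")
```

     Under these assumptions a few percent of individual organisms exceed
the limit, while composites do so far less often, which is the reason
supplemental individual samples are suggested where compliance is an
objective.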

     A primary objective of 301(h)  bioaccumulation monitoring programs is to
determine whether  the  discharge causes an  increase  in the  body  burden of
toxic chemicals  in indigenous organisms.   Monitoring  programs  may also use
caged molluscs as sentinel  organisms to evaluate uptake of toxic pollutants.
For  these  kinds  of  studies on  indigenous  or transplanted  organisms,  a
composite  sampling  strategy  is  recommended.    Evaluation  of  effects  on
recreational and commercial fisheries is  another important component of some
301(h) monitoring  programs.   Where such  fisheries are  included in  the
assessment,  it  is  important  to document whether  tissue  contaminant levels
exceed applicable  criteria or  standards.    In  these  cases,  the  301(h)
monitoring  program  may contain  the dual  objectives  discussed previously.
Based  on  the  statistical  evaluations  conducted  herein,  it  is recommended
that  such dual-objective programs  be  designed  to  collect  composite tissue
samples at all sampling stations,  and that  they also include the collection
of  individual  tissue  samples  for commercial  and  recreational  species in
selected areas.

     An  additional  concern  relative  to selecting between  analysis of
individual  organisms  and composite  samples involves  an evaluation  of the
numbers of organisms  required  for collection.   As  stated  previously, the
analytical  costs associated  with  processing  grab  samples  and  composite


-------
samples are  essentially equal.   Overall  cost differences  between  the two
strategies are  associated with  the  additional  time  needed  to  collect
organisms and  process tissues  for composite sampling.    For  the  general
composite sampling design recommended above, 30 organisms would be  required
at each  sampling station.   For  a  comparable  design  involving analysis of
individual organisms, only five organisms  would be required  at  each  sampling
station.   Therefore, the decision  on  appropriate  sampling  strategy should
also involve  an assessment  of the feasibility of  collecting  the  required
numbers of organisms.
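
     The collection requirements of alternative designs can be tallied as
follows. This is a minimal sketch; the function name sampling_effort is
hypothetical, and the use of five stations is an assumption carried over
from the fixed design parameters of the power analyses.

```python
def sampling_effort(stations, replicates, per_composite=1):
    """Number of laboratory analyses and organisms implied by a design.

    per_composite = 1 corresponds to analysis of individual organisms.
    """
    return {
        "analyses": stations * replicates,
        "organisms_per_station": replicates * per_composite,
        "organisms_total": stations * replicates * per_composite,
    }


# Recommended general design: 5 replicate composites of 6 organisms each.
print("composite:", sampling_effort(stations=5, replicates=5, per_composite=6))
# Comparable design based on individual organisms.
print("grab:     ", sampling_effort(stations=5, replicates=5))
```

     Both designs require the same number of analyses (25 for five
stations), but the composite design requires 30 organisms per station
compared with 5 for the design based on individual organisms.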

-------
                                REFERENCES
Andrews, F.C.  1954.  Asymptotic behavior of some rank tests of analysis of
variance.  Ann. Math. Stat.  25:724-736.

Box, G.E.P.,  and  M.E.  Muller.   1958.   A  note  on the  generation  of random
normal  deviates.  Ann.  Math. Stat.  29:610-611.

Cohen, J.  1977.  Statistical power analysis for the behavioral sciences.
Academic Press, New York, NY.

Gordon, M., G.A. Knauer, and J.H.  Martin.   1980.  Mytilus californianus as a
bioindicator of trace metal  pollution:  variability and statistical  consider-
ations.  Mar. Pollut. Bull.  11:195-198.

Grieb,  T.M.   1985.   Robustness of the analysis of variance  in  environmental
monitoring applications.  Report EA  4015.  Electric Power Research  Institute,
Palo Alto, CA.  72 pp.

Kruskal, W.H., and W.A. Wallis.  1952.  Use of ranks in one-criterion
variance analysis.  J. Am. Statist. Assoc.  47:583-612.

Lehmann, E.L.   1975.  Nonparametrics:   statistical  methods based  on ranks.
Holden-Day, Inc.,  San Francisco,  CA.  457  pp.

Lewis,  P.A.W., A.S. Goodman, and  J.M.  Miller.   1969.   A pseudo-random number
generator for the System/360.  IBM Syst.  J.  8:199-200.

Rhode,  C.A.  1976.  Composite sampling.   Biometrics  32:278-282.

Rhode,  C.A.   1979.   Batch,  bulk, and composite sampling,   pp.  365-377.   In:
Sampling Biological  Populations.  R.M. Cormack et al.  (eds).   International
Co-operative Publishing House, Fairland, MD.

Risebrough,  R.W.,  B.W.  deLappe,  E.F.  Letterman,  J.L. Lane,  M.  Firestone-
Gillis,  A.M.  Springer,  and  W.  Walker II.   1980.   California mussel watch:
1977-1978.  Volume III - Organic Pollutants in Mussels, Mytilus californianus
and M. edulis  along the California Coast.  Water Quality  Monitoring Report
No. 79-22.   Prepared  by  Bodega  Marine  Laboratory, Bodega  Bay, CA,  for
California State Water  Resources  Control  Board,  Sacramento, CA.   109  pp. +
appendices.

Roberts, A.E.,  D.R. Hill,  and  E.G. Tifft.   1982.   Evaluation  of New York
Bight lobsters for  PCBs,  DDT, petroleum hydrocarbons,  mercury,  and cadmium.
Bull. Environ. Contam.  Toxicol.  29:711-718.

Rubinstein, R.Y.  1981.   Simulation and the Monte Carlo method.   John  Wiley
and Sons, New York, NY.  278 pp.

-------
Schaeffer, D.J.,  and K.G. Janardan.   1978.  Theoretical  comparison of grab
and composite sampling programs.  Biometrical  J.  20:215-227.

Schaeffer,  D.J.,  H.W.  Kerster,  and  K.G. Janardan.   1980.   Grab versus
composite  sampling:   a  primer  for the  manager  and  engineer.   Environ.
Manage. 4:157-163.

Scheffe,  H.   1959.   The analysis of  variance.   John Wiley  and  Sons, New
York, NY.  477 pp.

Sherwood, M.J.,  A.J.  Mearns,  D.R. Young,  B.B. McCain,  R.A. Murchelano,
G. Alexander,  T.C.  Heeson,  and T.-K. Jan.   1980.   A comparison  of  trace
contaminants  in  diseased  fishes  from three  areas.   Southern California
Coastal Water Research Project, Long  Beach, CA.   131 pp.

Tetra  Tech.   1985a.  Bioaccumulation monitoring  guidance:   1.   estimating
the potential  for  bioaccumulation  of  priority pollutants and 301(h) pesti-
cides discharged  into marine and estuarine waters.   Final  Report  prepared
for Marine Operations  Division, Office  of  Marine and  Estuarine Protection,
U.S. Environmental  Protection  Agency.   EPA Contract  No.  68-01-6938.   Tetra
Tech, Inc., Bellevue, WA.  56 pp.  + appendices.

Tetra Tech.   1985b.   Bioaccumulation monitoring guidance:   3.  recommended
analytical detection  limits.   Final  Report prepared  for Marine Operations
Division, Office  of Marine  and  Estuarine  Protection,  U.S. Environmental
Protection Agency.  EPA Contract No. 68-01-6938.  Tetra Tech,  Inc., Bellevue,
WA.  23 pp.

Tetra Tech.   1985c.   Commencement  Bay nearshore/tideflats remedial  investi-
gation.   Final  Report.   Volumes  1 and  2.   Prepared for  Washington  State
Department of Ecology under  Contract No. C84031.  Tetra Tech,  Inc., Bellevue,
WA.

Tetra Tech.   1987a.   Bioaccumulation monitoring guidance:  2.   selection of
target species and review of available bioaccumulation data,  volume I.  EPA
430/9-86-005.   U.S.  Environmental  Protection  Agency,  Marine Operations
Division, Office of Marine and Estuarine Protection, Washington,  DC.  52 pp.

Tetra Tech.   1987b.   Bioaccumulation monitoring guidance:  2.   selection of
target  species  and  review  of available  bioaccumulation data,  volume II:
appendices.  EPA 430/9-86-006.   U.S.  Environmental  Protection Agency, Marine
Operations Division, Office of Marine and  Estuarine Protection, Washington,
DC.

Tetra Tech.   1987c.   Technical  support document for ODES statistical  power
analysis.  Final  Report  prepared  for Marine Operations  Division,  Office of
Marine and Estuarine  Protection, U.S. Environmental Protection Agency.  EPA
Contract No.  68-01-6938.   Tetra Tech, Inc., Bellevue, WA.  34  pp. + appendix.

Tetra Tech and American  Management Systems, Inc.   1986.   ODES user's guide:
supplement A  -  description  and use of  Ocean  Data Evaluation System (ODES)
tools.   Prepared  for  U.S. Environmental  Protection Agency.   Tetra  Tech,
Inc., Bellevue, WA.

-------
U.S.  Food and Drug  Administration.   1984.   Polychlorinated  biphenyls (PCBs)
in fish  and  shellfish;  reduction of tolerances; final  decision.   U.S.  FDA,
Rockville, MD.  Federal  Register, Vol.  49,  No.  100.   pp. 21514-21520.

-------