EPA
        United States
        Environmental Protection
        Agency
           Office of Research and
           Development
           Washington DC 20460
EPA/600/R-97/114
July 1999
A Review of Single
Species Toxicity Tests:
Are the Tests Reliable
Predictors of Aquatic
Ecosystem Community
Responses?

-------

                       By
                     Victor de Vlaming1
                 Teresa J. Norberg-King2
             1 State Water Resources Control Board
                    901 P Street
                    PO Box 944213
              Sacramento, California 94244-2130


               2Mid-Continent Ecology Division
                 6201 Congdon Boulevard
                 Duluth, MN 55804-1636
              Office of Research and Development
             U.S. Environmental Protection Agency
                 Duluth, Minnesota 55804

-------
                                      Notice
This document has been reviewed according to U.S. Environmental Protection Agency Policy and
approved for publication. Mention of trade names or commercial products does not constitute
endorsement or recommendation for use.

The views expressed in this document are those of the individual authors and do not necessarily
reflect the views and policies of the U.S. Environmental Protection Agency or the State Water
Resources Control Board.

-------
                                      Abstract
This document provides a comprehensive review to evaluate the reliability of single species (also
referred to as indicator species) toxicity test results in predicting aquatic ecosystem impacts, also
known as the ecological relevance of laboratory single species toxicity tests.  Since aquatic
ecosystem biological assessments have been performed to determine whether toxicity test results
are predictive of biological community impacts, the strengths and limitations of these validation tools
have been assessed.  Ecological relevance has been analyzed in studies  on ambient waters,
effluents, and other types of aqueous media. Furthermore, the effectiveness of laboratory single
species toxicity tests with individual chemicals in predicting biological community impacts and/or
environmental adverse effect concentrations is evaluated.  Merits of published criticisms of the
predictive effectiveness of single species used in laboratory toxicity tests are analyzed.  Also, the
question of whether single species used in laboratory toxicity tests are more sensitive than most
natural populations is discussed. Alternatives to single species toxicity tests are explored. A
preponderance of evidence reveals that laboratory single species toxicity test results are reliable
qualitative predictors of aquatic ecosystem community impacts.

-------
                                      Foreword
The US Environmental Protection Agency (USEPA) has begun a long-term process aimed at
restoring and maintaining the chemical, physical, and biological integrity of the Nation's waters. One
major element in this effort was removing the discharge of toxic materials in toxic amounts to surface
waters. Through the policy designed to reduce or eliminate toxics discharges and to assist in
achieving  objectives of the Clean Water Act (CWA), USEPA issued technical direction in  the
Technical  Support Document for Water Quality-Based Toxics (TSD) guidance (March 1984 Policy
for the Development of Water Quality-Based Permit Limitations for Toxic Pollutants; 49 FR 9016).
Through these directives, the Agency described its integrated toxics control program. The integrated
program consists of the application of both chemical-specific and biological methods to address the
discharge of toxic pollutants. USEPA continued the toxics control program by developing effluent
toxicity test methods, and these methods are being used to assess the quality of surface waters,
effluents, stormwater, and other types of aqueous media. Toxicity tests used for biological
monitoring can assess the combined effect of mixtures and unknown constituents in a water
sample, which in turn provides a direct evaluation of whether aquatic life is being protected.

Many laboratory toxicity tests are used to determine compliance with enforceable water quality
standards and effluent limits. The concept behind, and the intent of, the single species toxicity tests
(also referred to as indicator species tests)  is to assess the probability  of impacts  on aquatic
ecosystems.  To be effective water quality monitoring tools, toxicity test results should have a
predictive relationship with aquatic ecosystem impacts. USEPA (1991) reported that the results of
indicator species  toxicity tests are effective predictors of aquatic ecosystem impacts.  This
comprehensive literature review was undertaken to provide a critical examination of the relationships
among ambient water toxicity, effluent toxicity, and effects on organisms in ambient waters.

-------
                                        Contents
Abstract
Foreword
List of Tables
List of Figures
Acknowledgments
Acronyms and Abbreviations
Definitions

Section 1

  1.0 Introduction

  2.0 Intent of Single Species Toxicity Tests

  3.0 Validation Procedures: Ecological Surveys/Bioassessments
      3.1  Bioassessments
      3.2  Can Laboratory Single Species Tests Be Validated?
      3.3  To What Extent Should These Tests Be Validated?

  4.0 False Positives and False Negatives

  5.0 Field Studies
      5.1  CETTP Studies
      5.2  Associated Studies
           5.2.1  South Elkhorn Creek Study
           5.2.2  North Carolina Study
      5.3  Review of CETTP Studies
           5.3.1  Dickson et al. Analysis
           5.3.2  Marcus and McDonald Analysis
      5.4  Independent Evaluation of Statistical Analyses
      5.5  Review of CETTP Studies in Which Significant Correlation Was Not Observed
           5.5.1  Ottawa River Study
           5.5.2  Five Mile Creek Study
           5.5.3  Skeleton Creek
           5.5.4  Ohio River
           5.5.5  General Comments Regarding the Four CETTP Studies Summarized

  6.0 Criticisms of CETTP and Associated Studies
      6.1  CETTP Studies Compared Ambient Water Test Results with Bioassessment Variables
      6.2  Nonrandom Selection of Study Areas and Sites
      6.3  Use of the Most Sensitive Toxicity Test Results
      6.4  Relationship Between Toxicity Test Results and Instream Biological Measurements Relied
           Heavily on High Magnitude Toxicity
      6.5  Temporal Repeatability of the Ambient Water Toxicity/Biological Response Was Not
           Demonstrated
      6.6  Confounding Factors Were Not Considered
      6.7  Was the CETTP Classification System Mathematically Biased?
      6.8  High Rate of False Positives
      6.9  Miscellaneous Criticisms
      6.10 Conclusions

-------
                                    Contents (continued)

  7.0 Single Species Tests with Effluent

  8.0 Single Species Tests with Individual Chemicals or Small Groups of Chemicals
      8.1  Organic Chemicals: Pesticides
      8.2  Organic Chemicals: Nonpesticides
      8.3  Metals
      8.4  Other Data and Views of Predictiveness of Single Species Test Results

  9.0 Comparison of Single Species and Multiple Species (Microcosm, Mesocosm) Toxicity Test
      Results
      9.1  Okkerman et al. (1993)
      9.2  Emans et al. (1993)
      9.3  Slooff (1985)
      9.4  Persoone and Janssen (1994)
      9.5  Phluger (1994)
      9.6  Dorn (1996)
      9.7  Crane (1995)

  10.0 Alternatives to Single Indicator Species Tests
      10.1 Tests with Single Indigenous Species
      10.2 Tests with Multiple Indigenous Species

  11.0 Studies in Ocean or Estuarine Settings

Section 2
  1.0 Conclusions
  2.0 Summary

Section 3
  1.0 References
  2.0 Bibliography

Appendices

  Appendix A  Single Species Tests with Effluents

  Appendix B  Single Species Tests with Individual Chemicals

  Appendix C  Single Species Tests with Ocean Water or Sediment

  Appendix D  Strengths and Limitations of Single Species Toxicity Tests

-------
                                      List of Tables
Table 1.    Toxicity testing summary for the Ottawa River site study (Mount et al., 1984)

Table 2.    Equations showing relationships between laboratory (single species) and ecosystem
           determined endpoints

Table 3.    Summary of studies examining the relationship between laboratory single species
           test results and aquatic ecosystem responses


                                     List of Figures
Figure 1.    Summary of Eagleson et al., 1990 analysis

Figure 2.    Summary of Dickson et al., 1992 analysis

Figure 3.    Summary of studies in which a cladoceran was used as a laboratory test organism
           when comparing toxicity test results to ecological survey data and/or field test
           concentrations

Figure 4.    Summary of studies reviewed in this report in which the results of laboratory single
           species toxicity tests were compared to biological community surveys and/or field
           effect concentrations

-------
                            Acknowledgments
 This document was peer reviewed by numerous individuals, and this version of the document
 incorporates reviewer recommendations. These review comments were valuable
 in improving the quality, accuracy, and clarity of the literature review.

 The reviewers of the early drafts of this document who offered helpful suggestions are:
  Larry Ausley (North Carolina DENR, Raleigh, NC),
  Tom Dean (Coastal Resources Associates, Vista, CA),
  Debra Denton (USEPA, Region 9, San Francisco, CA),
  Regina Donohoe (California EPA, Sacramento, CA),
  Chris Foe (Central Valley Regional Water Quality Control Board, Sacramento, CA),
  Jeff Miller (Aqua-Science, Davis, CA),
  Don Mount (ASCI Corporation, Duluth, MN), and
  Michael Perrone (State Water Resources Control Board, Sacramento, CA).

 The critiques and suggestions of the following individuals were particularly valuable:
  Brian Anderson (University of CA-Santa Cruz, Monterey, CA),
  Gordon Anderson (deceased, formerly of Santa Ana Regional Water Quality Control Board,
  Riverside, CA),
  Rodger Baird (City Sanitation Districts of Los Angeles, Whittier, CA),
  Peter Chapman (EVS Consultants, Vancouver, BC),
  JoAnne Cox (State Water Resources Control Board, Sacramento, CA),
  Carol DiGiorgio (Department of Water Resources, Sacramento, CA), and
  Mike Marcus (The Cadmus Group, Albuquerque, NM).

John Cairns, Jr. (Virginia Tech, Blacksburg, VA) provided an abundance of the relevant
literature for this review.

We appreciate the peer reviews conducted by Robert Spehar and Jo Thompson, USEPA,
Office of Research and Development, Mid-Continent Ecology Division, Duluth, MN.

Without the assistance and  cooperation of Robert Holmes (California State University,
Humboldt, Arcata, CA) and M. Perrone (Water Resources Control Board) this document could
not have been produced.

-------
                    Acronyms and Abbreviations
A. punctulata      sea urchin, Arbacia punctulata
AEC                Acceptable Effluent Concentration
C. dubia           cladoceran, Ceriodaphnia dubia
C. variegatus      sheepshead minnow, Cyprinodon variegatus
C. parvula         red algae, Champia parvula
CETTP              Complex Effluent Toxicity Testing Program
CWA                Clean Water Act
EC                 Effect Concentration
FIFRA              Federal Insecticide, Fungicide, and Rodenticide Act
IC                 Inhibition Concentration
IWC                Instream Waste Concentration
km                 kilometers
LOEC               Lowest Observed Effect Concentration
m                  meters
M. bahia           mysid shrimp, Mysidopsis bahia
M. berylina        inland silverside, Menidia berylina
NOEC               No Observed Effect Concentration
NPDES              National Pollutant Discharge Elimination System
P. promelas        fathead minnow, Pimephales promelas
POTW               Publicly Owned Treatment Works (wastewater treatment plant),
                   also referred to as WWTP
r                  Correlation coefficient
RWC                Receiving Water Concentration
S. capricornutum   green algae, Selenastrum capricornutum
STP                Sewage Treatment Plant
TSCA               Toxic Substances Control Act
TSD                Technical Support Document (cf., USEPA, 1991)
WET                Whole Effluent Toxicity
WWTP               Wastewater Treatment Plant, also referred to as a POTW

-------
                                                 Definitions
  Accuracy is the degree of difference between observed
  values and known or actual values. This is appropriate for
  chemical and physical measurements, but not biological
  systems. Toxicity is relative rather than absolute and the
  organisms measure toxicity without a reference organism
  in a reference toxicant solution.

  Acute Toxicity is a test to determine the concentration of
  effluent or receiving waters (or ambient waters) that produces
  an adverse effect on a group of test organisms during a short-term
  exposure (e.g., 24, 48, or 96 h). The endpoint is lethality.
  Acute toxicity is measured using statistical procedures (e.g.,
  point estimate techniques or a t-test). Acute toxicity is usually
  defined as TUa = 100/LC50.

  Acute-to-Chronic Ratio (ACR) is the ratio of the acute toxicity
  of an effluent or a toxicant to its chronic toxicity. It is used
  as a factor for estimating chronic toxicity on the basis of acute
  toxicity data, or for estimating acute toxicity on the basis of
  chronic toxicity data.
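
  The ACR definition above can be illustrated with a minimal sketch. It
  assumes, for illustration only, that acute toxicity is expressed as an LC50
  and chronic toxicity as a NOEC, both in the same units (e.g., percent
  effluent); the function names are hypothetical and are not taken from any
  USEPA method.

```python
def acute_to_chronic_ratio(acute_lc50, chronic_noec):
    """Return the ACR, assuming acute = LC50 and chronic = NOEC (same units)."""
    if acute_lc50 <= 0 or chronic_noec <= 0:
        raise ValueError("concentrations must be positive")
    return acute_lc50 / chronic_noec


def estimate_chronic_from_acute(acute_lc50, acr):
    """Estimate a chronic endpoint from acute data by dividing by the ACR."""
    if acr <= 0:
        raise ValueError("ACR must be positive")
    return acute_lc50 / acr


def estimate_acute_from_chronic(chronic_noec, acr):
    """Estimate an acute endpoint from chronic data by multiplying by the ACR."""
    if acr <= 0:
        raise ValueError("ACR must be positive")
    return chronic_noec * acr


# Hypothetical values: LC50 of 50% effluent and NOEC of 5% effluent.
acr = acute_to_chronic_ratio(50.0, 5.0)            # 10.0
chronic = estimate_chronic_from_acute(40.0, acr)   # 4.0
```

  The same ratio is used in both directions, which is why an ACR derived for
  one effluent or toxicant should be applied to another only with caution.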

 Additivity is the  characteristic property of a mixture of
 toxicants that exhibits a total toxic effect equal to the arithmetic
 sum of the effects of the individual toxicants.

 Ambient Toxicity is measured by a toxicity test on a sample
 collected from a surface water.

 Bioassay is a test used to evaluate the relative potency of
 a chemical or a mixture of chemicals by comparing its effect
 on a living organism with the effect of a standard preparation
 on the same type of organism. Bioassays frequently are used
 in the pharmaceutical industry to evaluate the potency of
 vitamins and drugs.

 Criteria Continuous Concentration (CCC) is the USEPA
 national water quality criteria recommendation for the highest
 instream concentration of a toxicant or an effluent to which
 organisms can be exposed indefinitely without causing
 unacceptable effect.

 Criteria Maximum Concentration (CMC) is the USEPA
 national water quality criteria recommendation for the highest
 instream concentration of a toxicant or an effluent to which
 organisms can be exposed for a brief period of time without
 causing an acute effect.

 Chronic Toxicity is defined as a long-term toxicity test in
 which sublethal effects (e.g., reduced growth or reproduction)
 are usually measured in addition to lethality. Chronic toxicity
 is defined as TUc = 100/NOEC or TUc = 100/ECp (ICp).
 The ECp and ICp values should be the approximate equivalent
 of the NOEC calculated by hypothesis testing for each test
 method.

  Coefficient of Variation (CV) is a standard statistical measure
  of the relative variation of a distribution or set of data, defined
  as the standard deviation divided by the mean. Coefficient
  of variation is a measure of precision within (intralaboratory)
  and among (interlaboratory) laboratories.
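
  As a minimal sketch of the CV definition above, the following computes the
  standard deviation divided by the mean for a set of repeated test results.
  The use of the sample (n - 1) standard deviation and the example values are
  assumptions made for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Return CV = standard deviation / mean (sample standard deviation assumed)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Hypothetical endpoint values from three repeated reference toxicant tests.
cv = coefficient_of_variation([10.0, 12.0, 14.0])
cv_percent = 100.0 * cv  # CV is often reported as a percentage
```

  Applied to results from one laboratory this describes intralaboratory
  precision; applied to results pooled across laboratories it describes
  interlaboratory precision.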

  Critical Life Stage is the period of time in an organism's life
  span in which it is the most susceptible to adverse effects
  caused by exposure to toxicants, usually during early development
  (egg, embryo, larvae). Chronic toxicity tests are often
  run on critical life stages to replace long duration, life-cycle
  tests since the most toxic effect usually occurs during the
  critical life stage.

 Effect Concentration (EC) is a point estimate of the toxicant
 concentration that would cause an observable adverse effect
 (e.g., survival or fertilization) in  a given percent of the test
 organisms, calculated from a continuous model (e.g., USEPA
 Probit Model).

 Hypothesis Testing is a technique (e.g., Dunnett's test) that
 determines what concentration is statistically different from
 the control. Endpoints determined from hypothesis testing
 are NOEC and LOEC. Null hypothesis (Ho): The effluent
 is not toxic; Alternative hypothesis (Ha): The effluent is toxic.

 Inhibition Concentration (IC) is a point estimate of the
 toxicant concentration that would cause  a given percent
 reduction in a non-quantal biological measurement (e.g.,
 reproduction or growth) calculated from a continuous model.

 Instream Waste Concentration (IWC) is the concentration
 of a toxicant in a riverine system  after mixing. Also referred
 to as the receiving water concentration (RWC). The IWC
 or RWC is the inverse of the dilution factor.
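
 The inverse relationship between the IWC and the dilution factor can be
 sketched as follows. This assumes, for illustration, that the dilution factor
 is the total flow after mixing divided by the effluent flow, and that the IWC
 is expressed as percent effluent; the function names are hypothetical.

```python
def dilution_factor(effluent_flow, upstream_flow):
    """Total flow after complete mixing divided by effluent flow (same units)."""
    if effluent_flow <= 0 or upstream_flow < 0:
        raise ValueError("effluent flow must be positive and upstream flow non-negative")
    return (effluent_flow + upstream_flow) / effluent_flow


def instream_waste_concentration_pct(dilution):
    """IWC as percent effluent: the inverse of the dilution factor, times 100."""
    if dilution < 1:
        raise ValueError("dilution factor must be at least 1")
    return 100.0 / dilution


# Hypothetical flows: 2 units of effluent entering 18 units of upstream flow.
df = dilution_factor(2.0, 18.0)              # 10.0
iwc = instream_waste_concentration_pct(df)   # 10.0 percent effluent
```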

 LC50 is the toxicant concentration that would cause death
 to 50% of the test organisms.

 Lowest Observed Effect Concentration (LOEC) is the
 lowest concentration of toxicant to which organisms are
 exposed in a test, which causes statistically significant adverse
 effects on the test organisms (i.e., where the values for the
 observed endpoints are statistically significantly different from
 the control). The definitions of NOEC and LOEC assume
 a strict dose-response relationship between toxicant
 concentration and organism response. If this assumption
 always held, there would be no issue concerning
 the endpoint definitions because the NOEC would always
 be a lower concentration level than the LOEC. However,
 this strict dose-response relationship does not exist with all
 toxicants. When it does not, the test must be repeated or
 the lowest NOEC should be reported for compliance purposes.

Minimum Significant Difference (MSD) is the magnitude
of difference from control where the null hypothesis is rejected
in a statistical test comparing a treatment with a control. MSD
is based on the number of replicates, control performance
and power of the test.

Mixing Zone is an area where an effluent discharge undergoes
initial dilution and may be extended to cover the secondary
mixing in the ambient waterbody. A mixing zone is an
allocated impact zone where water quality criteria can be
exceeded as long as acutely toxic conditions are prevented.

No Observed Adverse Effect Level (NOAEL) is a tested
dose of an effluent or a toxicant below which no adverse
biological effects are observed, as identified from chronic
or subchronic human epidemiology studies or animal exposure
studies.

No Observed Effect Concentration (NOEC) is the highest
tested concentration of toxicant to which organisms are
exposed in a full life-cycle or partial life-cycle (short-term)
test, that causes no observable adverse effect on the test
organism (i.e., the highest concentration of toxicant at which
the values for the observed responses are not statistically
significantly different from the controls). NOECs calculated
by hypothesis testing are dependent upon the concentrations
selected.
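
The NOEC and LOEC definitions can be illustrated with a short sketch that
takes the outcome of a hypothesis test at each tested concentration and
returns the two endpoints. The statistical test itself (e.g., Dunnett's test)
is not implemented here; the significance flags are assumed to come from it.
When the response is not strictly dose-dependent, this sketch reports the
highest non-significant concentration below the LOEC, which is one
conservative convention assumed for illustration.

```python
def noec_loec(results):
    """Return (NOEC, LOEC) from (concentration, significantly_different) pairs.

    Either value is None when it cannot be determined from the tested series.
    """
    ordered = sorted(results)
    significant = [conc for conc, is_sig in ordered if is_sig]
    # LOEC: lowest tested concentration that differs significantly from control.
    loec = min(significant) if significant else None
    # NOEC: highest non-significant concentration below the LOEC (assumption).
    candidates = [conc for conc, is_sig in ordered
                  if not is_sig and (loec is None or conc < loec)]
    noec = max(candidates) if candidates else None
    return noec, loec


# Hypothetical dilution series (percent effluent) with a strict dose-response.
strict = [(6.25, False), (12.5, False), (25.0, True), (50.0, True), (100.0, True)]
print(noec_loec(strict))      # (12.5, 25.0)

# Hypothetical series in which the dose-response is not strict.
non_strict = [(6.25, False), (12.5, True), (25.0, False), (50.0, True)]
print(noec_loec(non_strict))  # (6.25, 12.5)
```

Because only tested concentrations can be returned, the result depends on the
dilution series chosen, as the definition notes.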

Point Estimation Techniques (e.g., Probit, Interpolation
Method, Spearman-Karber) are used to determine the
effluent concentration at which adverse effects (e.g., on fertilization,
growth or survival) occurred; for example, the concentration at
which a 25% reduction in fertilization occurred.
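
A point estimate such as the concentration causing a 25% reduction can be
sketched with simple linear interpolation between tested concentrations. The
code below is a simplified illustration only: it assumes equal replication,
smooths the mean responses so they do not increase with concentration, and
omits confidence intervals. It is not the USEPA interpolation program, and
the function name and example data are hypothetical.

```python
def icp(concentrations, mean_responses, p=25.0):
    """Estimate the concentration causing a p percent reduction from control.

    concentrations: ascending, with the control (0) first.
    mean_responses: mean response at each concentration.
    Returns None if a p percent reduction is not reached in the tested range.
    """
    if len(concentrations) != len(mean_responses) or len(concentrations) < 2:
        raise ValueError("need matching series with at least two points")

    # Pool adjacent means so the smoothed series is non-increasing.
    blocks = []  # each block holds [sum of means, number of means pooled]
    for response in mean_responses:
        blocks.append([float(response), 1])
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   < blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    smoothed = []
    for total, count in blocks:
        smoothed.extend([total / count] * count)

    # Response level that corresponds to a p percent reduction from control.
    target = smoothed[0] * (1.0 - p / 100.0)

    # Interpolate linearly within the first interval that brackets the target.
    for j in range(len(smoothed) - 1):
        upper, lower = smoothed[j], smoothed[j + 1]
        if upper > lower and upper >= target >= lower:
            fraction = (upper - target) / (upper - lower)
            return concentrations[j] + fraction * (concentrations[j + 1] - concentrations[j])
    return None


# Hypothetical reproduction data (mean young per female) by percent effluent.
ic25 = icp([0.0, 6.25, 12.5, 25.0, 50.0, 100.0],
           [20.0, 21.0, 18.0, 12.0, 6.0, 0.0])
```

Unlike the NOEC, an estimate of this kind is not restricted to the
concentrations that were actually tested.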

Precision is a measure of mutual agreement among
individual measurements or enumerated values of the
same property of the sample; can be described by the
mean, standard deviation and coefficient of variation. The
precision is usually discussed by test consistency or repeatability
both within a laboratory (intralaboratory) and
among several laboratories (interlaboratory) using the
same test method and reference toxicant.

Receiving Water Concentration (RWC) is the concentration
of a toxicant or the parameter toxicity in the receiving
water (i.e., riverine, lake, reservoir, estuary or ocean) after
mixing. Isopleths of effluent concentration can be
established by dye studies or modeling techniques in
determining CMC and CCC.

Significant Difference is defined as statistically significant
difference (e.g., 95% confidence level) in the means of two
distributions of sampling results.

Test Acceptability Criteria (TAC) are the specific criteria,
defined in the test method, that the effluent and the
concurrent reference toxicant controls must meet for
toxicity test results to be acceptable or valid for compliance
(e.g., for the Ceriodaphnia dubia survival and reproduction test,
the criteria are: the test must achieve at least 80% survival
and average 15 young/female in the controls).
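
The Ceriodaphnia dubia example above can be expressed as a simple check on
control performance. The 80% survival and 15 young/female thresholds are the
ones stated in the definition; the function name, the input layout, and the
choice to average young over all control females supplied are assumptions
made for illustration.

```python
def meets_c_dubia_tac(control_survived, control_young_counts,
                      min_survival_pct=80.0, min_mean_young=15.0):
    """Check control survival and reproduction against the stated criteria.

    control_survived: one True/False value per control organism.
    control_young_counts: number of young produced by each control female.
    """
    if not control_survived or not control_young_counts:
        raise ValueError("control data must not be empty")
    survival_pct = 100.0 * sum(control_survived) / len(control_survived)
    mean_young = sum(control_young_counts) / len(control_young_counts)
    return survival_pct >= min_survival_pct and mean_young >= min_mean_young


# Hypothetical control data: 9 of 10 organisms survive, averaging 18 young.
survived = [True] * 9 + [False]
young = [20, 19, 18, 17, 21, 18, 16, 19, 17, 15]
acceptable = meets_c_dubia_tac(survived, young)  # True
```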

Toxicity Tests are laboratory experiments which employ
the use of standardized test organisms to measure the
adverse effect (e.g., growth, survival or reproduction) of
effluent or receiving waters.

Toxic Unit Acute (TUa) is the reciprocal of the effluent
concentration that causes 50% of the organisms to die by
the end of the acute exposure period (i.e.,
TUa = 100/LC50).

Toxic Unit Chronic (TUc) is the reciprocal of the effluent
concentration that causes no observable effect on the test
organisms by the end of the chronic exposure period (i.e.,
TUc = 100/NOEC).
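
The two toxic unit formulas above can be sketched directly. The sketch
assumes the LC50 and NOEC are expressed as percent effluent (greater than 0
and no more than 100); the function names are hypothetical.

```python
def toxic_units_acute(lc50_percent):
    """TUa = 100 / LC50, with the LC50 in percent effluent."""
    if not 0 < lc50_percent <= 100:
        raise ValueError("LC50 must be in the range (0, 100] percent effluent")
    return 100.0 / lc50_percent


def toxic_units_chronic(noec_percent):
    """TUc = 100 / NOEC, with the NOEC in percent effluent."""
    if not 0 < noec_percent <= 100:
        raise ValueError("NOEC must be in the range (0, 100] percent effluent")
    return 100.0 / noec_percent


# Hypothetical results: LC50 of 50% effluent and NOEC of 12.5% effluent.
tua = toxic_units_acute(50.0)     # 2.0
tuc = toxic_units_chronic(12.5)   # 8.0
```

Because the endpoint is in the denominator, a lower LC50 or NOEC (a more
toxic effluent) yields a larger number of toxic units.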

Toxic Units (TUs) are a measure of toxicity in an effluent
as determined by the acute toxicity units or chronic
toxicity units. Higher TUs indicate greater toxicity.

Toxicity Identification Evaluation (TIE) is a set of
procedures to identify the specific chemical(s) responsible
for effluent toxicity. TIEs are a subset of the Toxicity
Reduction Evaluation (TRE).

Toxicity Reduction Evaluation (TRE) is a site-specific
study conducted in a stepwise process designed to identify
the causative agents of effluent toxicity, isolate the
sources of toxicity, evaluate the effectiveness of toxicity
control options, and then confirm the reduction in effluent
toxicity.

Whole Effluent Toxicity (WET) is the total toxic effect of
an  effluent or receiving water measured directly with a
toxicity test.

-------
                                             Section 1
 1.0 Introduction
 The Clean Water Act (CWA), Federal Insecticide, Fungicide,
 and Rodenticide Act (FIFRA), and Toxic Substances
 Control Act (TSCA) are the federal legislation mandating
 that potential hazards of chemicals and wastewaters be
 assessed. In particular, the CWA aims at preventing the
 release of toxic concentrations of chemicals, regardless
 of whether they originate from point or nonpoint sources,
 into the nation's surface waters by stating "it is the national
 policy that the discharge of toxic pollutants in toxic amounts
 be prohibited."

 As part of the effort to implement the above CWA policy,
 the USEPA incorporated toxicity-based discharge limits
 into National  Pollutant Discharge  Elimination System
 (NPDES) permits.  To support this approach, USEPA
 published a Technical Support Document (TSD) (USEPA,
 1991) and short-term toxicity test methodologies (USEPA,
 1994a; 1994b; hereafter referred to as the USEPA toxicity
 tests). The intent of these toxicity tests is to rapidly and
 reliably estimate the potential chronic effects of toxic chemicals
 in ambient water, wastewater, stormwater, and
 other water matrices on aquatic life.

 For freshwater ecosystems, USEPA has focused on three
 species for short-term tests designed to estimate the degree
 of chronic toxicity in a water sample (USEPA, 1994a).
 These freshwater methods include a fish, larval fathead
 minnow (Pimephales promelas), a zooplankton
 (Ceriodaphnia dubia), and an alga (Selenastrum capricornutum).
 The marine and estuarine short-term tests estimate
 chronic toxicity (USEPA, 1994b) with two fish species,
 sheepshead minnow (Cyprinodon variegatus) and the
 inland silverside (Menidia berylina), a red alga (Champia
 parvula), an east coast mysid (Mysidopsis bahia), and a
 sea urchin (Arbacia punctulata).

 USEPA states "whole effluent toxicity (WET) is a useful
 parameter for assessing and protecting against impacts
 upon water quality and designated uses caused by the
 aggregate toxic effect of the discharge of pollutants" (in
 the TSD; USEPA, 1991). Four data sets were the focus of
 support for the reliability of the USEPA toxicity test results
 in predicting aquatic ecosystem community responses:
 USEPA's Complex Effluent Toxicity Testing Program
 (CETTP) studies (USEPA, 1991), the South Elkhorn
 Creek, Kentucky study (Birge et al., 1989), the Trinity
 River, Texas study (Dickson et al., 1989), and the North
 Carolina study performed by Eagleson et al. (1990).
 The eight CETTP studies include: Scippo Creek, Ohio
 (Mount and Norberg-King, 1985); Ottawa River, Ohio
 (Mount et al., 1984); Five Mile Creek, Alabama (Mount et
 al., 1985); Skeleton Creek, Oklahoma (Norberg-King and
 Mount, 1986); Naugatuck River, Connecticut (Mount et al.,
 1986a); Back River, Maryland (Mount et al., 1986b); Ohio
 River, West Virginia (Mount et al., 1986c); and Kanawha
 River, West Virginia (Mount and Norberg-King, 1986). In
 these studies the 7-d Ceriodaphnia and/or early life stage
 larval fathead minnow toxicity test results from surface
 water and/or effluents were compared with data from
 aquatic ecosystem community surveys (bioassessments)
 to determine whether the toxicity test results were effective
 predictors of instream biological responses. USEPA
 concluded (USEPA, 1991) that the four data sets "comprise
 a large database specifically collected to determine
 the validity of toxicity tests to predict receiving water
 community impact. The results, when linked together,
 clearly show that if toxicity is present (in discharges) after
 considering dilution, impact will also be present."

 Criticisms of the CETTP and associated studies,  as well
 as their conclusions, have been published (see Section 6
 below). In a broader sense, there have been questions
 regarding the reliability of single species (frequently
 described as indicator species) toxicity test results in
 predicting aquatic ecosystem responses (impairments).
 Moreover, there are questions regarding the validity of, and
 the uncertainty associated with, extrapolations from single
 indicator species toxicity test results to aquatic ecosystem
 responses. USEPA also bases its chemical-specific
 water quality criteria on laboratory single species toxicity
 test estimates of chronic toxicity, yet the validity and
 reliability of these criteria are less frequently questioned.

 The central aspect of the uncertainty appears to be
 whether the indicator species toxicity test results, obtained
 under controlled laboratory conditions, can be reliably
 translated into responses by complex and multivariate
 aquatic ecosystem communities. For example, laboratory
 effluent toxicity test results could overestimate biological
 community responses if aquatic ecosystem physical/chemical
 or biotic factors mitigated (e.g., altered
 chemical bioavailability) effluent toxicity. On the other
 hand, some aquatic ecosystem physical/chemical and
 biotic factors could act as stressors which exacerbate the
 effects of toxic chemicals such that laboratory toxicity test
 results underestimate instream biological responses.
 There is also the concern that indicator species toxicity test
 results do not represent the range of sensitivities and the
 different levels of biological organization which exist in
 aquatic ecosystems. These, as well as other, concerns
 regarding the reliability of single species toxicity test
 results in predicting aquatic ecosystem biological responses
 will be considered in this document.

 Regulatory agencies have tended to rely on single species
 test results, especially those of the USEPA toxicity tests, on surface
 water and wastewater samples to estimate potential
 toxicity threats to aquatic ecosystem communities. Since
 there have been criticisms of the predictive effectiveness
 of single species tests, the intent in this review is to
 evaluate and summarize the published literature, as well
 as other available reports, on the ecological relevance of
 laboratory single species toxicity test results. This review
 examines, but is not limited to, the CETTP and associated
 studies (e.g., Birge et al., 1989; Dickson et al., 1989;
 Eagleson et al., 1990). Various aspects of the reliability of
 single species toxicity test results as predictors of biological
 community impacts have been reviewed by many
 authors (as noted in Section 3). This report is a comprehensive
 review of the literature in this area.

 The following sections address:
 * the intent of single species toxicity tests,
 * the procedures (bioassessments) used to "validate"
   indicator species toxicity test results,
 * the concepts of false positives and false negatives,
 * the USEPA CETTP and associated studies,
 * criticisms of the CETTP studies,
 * single species tests with effluent,
 * single species tests with individual chemicals or small
   groups of chemicals,
 * comparisons of single species and multiple species test
   results, and
 * alternatives to single species toxicity tests.

 The vast majority of the literature in this area relates to
 toxicity tests with freshwater species and ecosystems.
 There is a paucity of studies which attempt to relate
 laboratory toxicity test results with bay and estuary or
 ocean impacts; nonetheless, the few relevant studies
 (Section 11) are summarized in this review. The conclusions
 (Section 2) of this report are weighted toward
 freshwater toxicity test results as predictors of aquatic
 ecosystem community responses.

 2.0 Intent of Single Species Toxicity Tests
 Before summarizing and discussing data which relate to
 how reliably the USEPA toxicity tests (and other single
 species toxicity test results) predict ecosystem responses,
 a consideration of the intent of these tests seems warranted.

A criticism of the single species tests has been that their
results are invalid predictors of aquatic community re-
 sponses  because  only  qualitative  (i.e.,  statistically
 significant toxicity test results indicate some degree of
 biological community response/impairment)  rather than
 quantitative relationships are established between toxicity
 test  results  and  ecosystem  community  responses.
 Quantitative in this context refers to a case in which some
 level or percent response in toxicity test results can be
 directly correlated with a specific level/percent response
 in instream biological communities. However, these tests
 were  not  designed  to  be quantitative  predictors  of
 ecosystem responses. The USEPA toxicity tests and other
 indicator  single  species  tests  were  intended  to be
 screening  tools  (i.e.,  to indicate the  potential  for
 wastewater or ambient water samples to cause biological
 community  impacts, characterizing relative  ecosystem
 effects) and "early warning" signals (a measurement which
 indicates the potential for aquatic ecosystem impairment
 prior to actual damage to biological communities) (USEPA,
 1991; USEPA, 1994a). The toxicity tests are applicable to
 ambient water samples regardless  of the  sources (i.e.,
 point or nonpoint) of contaminants.

 Because the USEPA toxicity tests were intended  to be
 early warning signals of biological community impacts, the
 results of a single  toxicity test should not constitute a
 violation of  a water quality standard, or of  an effluent
 limitation. Unfortunately, such misuses have occurred and
 these cases may be major contributors to  the criticisms
 leveled at the USEPA toxicity tests.

 3.0   Validation  Procedures:   Ecological
 Surveys/Bioassessments
 3.1 Bioassessments
 The method generally used for "validating" the  reliability of
 single species toxicity test results in predicting aquatic
 ecosystem impairments (and "safe" concentrations) has
 been to perform ecological surveys (biological assess-
 ments), and then compare these data to toxicity test results
 with water samples from the same ecosystem sites or to
 data from effluent toxicity tests.  Bioassessments can
 consist of estimates of species composition, diversity, and
 density of aquatic organisms.

 Because bioassessments play a crucial role  in this
 "validation" process, there are  considerations regarding
 these procedures which must be  explored. From these
 surveys, a judgement is made as to whether or not the
 aquatic ecosystem or a part of it is impacted. Bioassess-
 ments are not de facto better or easier to interpret than
 other types of measurements.

 Bioassessments are subject to most of the same pitfalls as
 other biological and toxicological studies, including poor
 design and careless performance. Sound experimental
 design and careful  conduct are crucial, requiring  a
thorough understanding of the complexity  of aquatic
ecosystems, as well as confounding factors (e.g., current
velocity, depth, light penetration, shading, temperature,
substrate, organic matter, nutrients) which can affect site
selection so that they can be "controlled" or accounted for.
Moreover,  sites within a stream should  be chosen to
minimize differences among them with respect to physical
and chemical parameters. The idea is to minimize factors
which can  influence ecosystem parameters so that any
change can be ascribed to toxic chemicals.

To be effective in  "validation" of  toxicity test  results,
ecological  surveys  must be able to clearly distinguish
between contaminant-caused effects and all other effects
on aquatic populations.  Aquatic  ecosystem  biological
surveys are not, by themselves, sufficient to determine
toxic chemical impacts because biological community
structure and function are influenced by a host of other
factors  (e.g., dissolved oxygen, temperature, physical
parameters, habitat conditions).

Limitations (LaPoint, 1994; 1995) in bioassessment studies
have  included  failure to consider seasonal variations
(frequently sampling is only a one or two time event), poor
selection of endpoints  (endpoints should be reliable,
having ecological relevance), poor sampling procedures,
lack of sample replication, failure to consider nonchemical
stressors, failure to identify cause  of change, use of
inappropriate procedures and statistics which are not
standardized,  and failure  to provide  early warning of
impairment.  Many  ecological assessments have been
characterized by a high degree of variability (greater than
in chemical and toxicity measurements), imprecisions, and
lack of repeatability (e.g., LaPoint, 1994; 1995).

Many bioassessments provide qualitative (not quantitative)
data; for example, macroinvertebrate surveys with kick-
nets are qualitative and, usually, are not replicated. Most
of the ecological surveys associated with field "validation"
of single species test results have consisted of "simplistic
field designs" and "superficial study" of the natural system
(Neuhold, 1986; Chapman  et al., 1987; Luoma, 1995).
Neuhold (1986) contends that measurements such as
biomass and population numbers, which are frequently the
basis of ecological surveys, are too insensitive as
endpoints because they take considerable time to change
enough to "clear" the background noise level.  Interpreta-
tions of bioassessment data are frequently controversial.
For example, according to LaPoint et al. (1996), the
ecological significance of contaminant effects is more
difficult to judge in a biological assessment than in
laboratory single species toxicity tests because of the large
number of aquatic species potentially responding in the system.
Clements and Kiffney  (1996) state, "Most importantly,
inability to establish a direct cause-and-effect relationship
between contaminants and  selected endpoints greatly
limits instream biomonitoring."

There is yet to be agreement on meaningful ecological
endpoints  or the  amount  of change  in  ecological
measurements which represent impairment. It has been
difficult to identify and measure subtle damage in aquatic
ecosystems.  No  procedures/protocols for performing
ecological surveys on large waterways, such as major
rivers, have been published. Developing scientifically valid
biological assessment methods for such systems  is
needed, but it seems unlikely that regulatory agencies will
have the budgets to fund such large efforts. In relation to
these bioassessment concerns, the difficulties surrounding
"validation" of laboratory single species toxicity test results
have been reviewed (Cairns, 1983; 1988a; Livingstone and
Meeter, 1985; Chapman, 1995a,b). These considerations
should be  remembered  when  using bioassessment
measurements in evaluating the predictive accuracy of the
single species or multiple species toxicity test results.

The intent here is not to malign bioassessments, but to
draw attention to the fact that they are  not de facto
conclusive.  On  the other hand,  well  designed  and
performed bioassessments are powerful tools, crucial to
environmental monitoring and assessment. The advan-
tages of using  ecological  surveys  and,  in  particular,
macroinvertebrate surveys as water quality indicators have
been thoroughly discussed in  an informative book edited
by Davis and Simon (1995).

3.2 Can Laboratory Single Species Tests Be
Validated?
Mount (1995) suggests that it is impossible to conclusively
establish that ecosystem  impairments are caused by
ambient water or effluent toxicity. This is because there
are many stressors and other confounding factors at work
in natural ecosystems. Proving cause in complex, poorly
understood ecosystems will be difficult at best.  Recently,
Chapman (1995a)  wrote,  "Basically,  I  consider the
perceived need for validation of the laboratory by field
studies to be incorrect dogma." A reactive toxicity test can
confirm ecosystem impairments, but proactive tests can
only be "validated" by waiting for ecosystem effects  to
appear.  Furthermore, absence of biological community
effects can never be fully proven.  While recognizing and
addressing various shortcomings in ecotoxicology,
Chapman (1995a) warns against perpetuating the "estab-
lished" validation dogma; he points out, as did Mount
(1995), that field studies can never validate laboratory
studies since there is no certainty that effects observed (or
not) in field studies were caused by effects measured in
the lab.

Another inherent problem in "validating" that single species
toxicity test results  can  be  reliably  extrapolated  to
ecosystem responses is the status of many aquatic
ecosystems. Given the ever-expanding number of aquatic
 population declines (e.g., Herbold et al., 1992; Obrebski
 et al., 1992; Bailey et al., 1994) and number of extinct,
 endangered, and threatened species, it is clear that many
 aquatic ecosystems are partially to seriously impaired. If
 single  species test results are  to  be early warnings
 (predictive of future events), proactive  in function, they
 cannot be "validated" in all circumstances with existing
 ecological conditions. The point is not to discontinue study
 of the relationship, but rather to understand the limitations
 of the procedures used to "validate" the predictiveness of
 single species tests.

 3.3  To What  Extent Should These Tests Be
 Validated?
 Without partial disturbance of healthy or relatively healthy
 aquatic ecosystems, it may  not be possible to "validate"
 that extrapolations from laboratory toxicity tests reliably
 predict aquatic ecosystem  responses.   Depending  on
 scale, biological surveys, especially if repeated through
 time, may be  destructive to aquatic ecosystems.  The
 question is,  should toxic chemicals  be  released into
 ecosystems  to repeatedly  establish  a  link between
 laboratory toxicity test results and ecological impairments?

 Since unequivocal demonstration that effluent or ambient
 water toxicity is the sole cause of ecosystem impairments
 may not be possible, it seems sensible to question how much
 effort, time, and money should be expended to "validate"
 a quantitatively  accurate  correlation  between  single
 species  toxicity test results and instream biological
 responses.

 USEPA's toxicity tests were designed as screening tools
 to provide  early warning  of potential environmental
 impacts. For this and other reasons mentioned above, it
 has been difficult to establish a quantitative correlation
 between  the  results  of  these tests   and ecological
 responses in  all aquatic ecosystems.   As  Chapman
 (1995b) suggests, we can never be sure that a proactive
 prediction (based on laboratory toxicity test results)  is
 correct  without allowing  for potential environmental
 degradation. Possibly, surrogate aquatic ecosystems will
 allow us to establish a better link between laboratory test
 results  and  ecosystem  responses,  while  minimizing
 impacts on natural aquatic systems.

 4.0  False Positives and False Negatives
 In this review, the concepts of "false positives" and "false
 negatives"  will  emerge when comparing the  results of
single species tests with ecological survey measurements.
Caution is essential in the application of such concepts.

There has been a tendency to label any one statistically
significant toxicity test result which does not match with an
ecological endpoint as a false positive.  This may be an
inaccurate designation. A single effluent or ambient water
  sample can contain toxic levels of chemicals but, due to
  effluent  or ambient water variability,  the duration,
  magnitude, and frequency of the toxicity are not sufficient
  to elicit a measurable biological community response.
  More sampling and testing could reveal this. On the other
  hand, the toxic sample could be an early warning, signaling
  toxicity of a magnitude, duration, and frequency to cause
  adverse ecosystem responses.   The  false positive
  designation is also based  on the  assumption that the
  measure of ecosystem integrity  is accurate.

  A false negative designation has sometimes been applied
  to cases in which statistically significant toxicity is absent
  from an effluent or ambient water sample, but instream
  impairment is indicated.   Such a designation  is not
  necessarily true.   For example, one  sample may not
  typically characterize effluent or ambient water toxicity.
  More frequent sampling  and testing could reveal that
  toxicity is of sufficient magnitude, duration, and frequency
  to evoke biological community responses. On the other
  hand, assuming that the bioassessment measurement
  reliably demonstrated impairment, the impact could be a
  consequence of nonchemical,  non-wastewater related
 causes. The presence of bioaccumulative toxic chemicals
  in a water sample could lead to a false negative designa-
 tion because short term toxicity tests are not designed to
 detect such substances.  In addition, there are biological
 endpoints in aquatic ecosystems which are  not repre-
 sented in indicator species toxicity tests.

 Following a systematic analysis, Luoma and  Ho (1993)
 concluded  that false negative  predictions (finding no
 statistically significant toxicity in laboratory single species
 tests  when, in truth, there  is biological  community
 degradation) are  just as probable as false  positive
 predictions. Luoma and Ho contend that "false negatives
 may be common in toxicity tests, but they are difficult to
 detect.  The main reason is that the ecological tests
 included  in  many  validation studies  are insensitive.
 Typically, validations are conducted only at one point in
 time, make inadequate replication, consider ambiguous
 community  structure indices, or do  a poor job of docu-
 menting exposures." Caution should be exercised when
 describing the relationship between a single toxicity test
 result and an index (which represents the integration of
 many types of stresses over time) of ecosystem integrity,
 as a false positive or negative.

 5.0  Field Studies
 5.1 CETTP Studies
 The eight CETTP and three related studies examined the
 relationship between 7-d Ceriodaphnia  and/or larval
 fathead minnow early life stage toxicity test results on
 surface water or wastewater and  instream survey indices
 for zooplankton, benthic macroinvertebrates, and/or fish
 populations. The intent of these studies was to determine
 how effectively toxicity test results on ambient waters or
 effluents  corresponded with ("predicted") estimates of
 aquatic ecosystem community health. In the eight CETTP
 studies there were 80 sites in eight different watersheds
 where instream bioassessment indices were compared to
 surface water toxicity test results.

 The intent is not to summarize and evaluate each of these
 CETTP studies separately since they have, as a group,
 been the subject of recent analyses (Dickson et al., 1992;
 Marcus and McDonald, 1992).  The approach is to
 summarize two of the studies (Birge et al., 1989; Eagleson
 et al., 1990) which have been associated with the CETTP
 and then the two analyses (Dickson et al., 1992;  Marcus
 and McDonald, 1992) of the  CETTP  studies.  This
 summary is  followed by an evaluation of those two
 analyses by an independent statistician. The final portion
 of this section is a review of four CETTP studies which,
 according to Marcus and  McDonald  (1992),  do  not
 evidence  a statistically significant canonical correlation
 between  toxicity  test  results and instream indices of
 biological community health.

 Marcus and McDonald  (1992)  make the  interesting
 observation that, "There is, unfortunately, an  excess
 emphasis by many investigators  and  reviewers  on
 significance in assessing statistical results. The question
 of primary concern is not whether there  is high  or low
 frequency of significant correlations, but what the degree
 of correlation between pairings of each laboratory and field
 variable is." Examination of the CETTP studies reveals a
 distinct qualitative correspondence   between  ambient
 water toxicity and ecosystem variables. In most of the
 CETTP  studies  there  appeared  to   be  biological
 impairments in a gradient below discharge points (which
 showed toxic effluents) compared to upstream sampling
 sites. In most cases, ambient water toxicity at a site was
 associated  with  biological  community   impairments.
 Because of small sample sizes in the CETTP studies,
 routine correlative parametric statistics were not applied
 to compare bioassessment and toxicity test data.

 Statistics are frequently used to demonstrate "proof" of
 effect, the threshold of effects being arbitrary. McBride et
 al. (1993)  conclude that routine application of significance
 tests does not  extract the maximum information from
 environmental data.  These authors  discuss the advan-
 tages of equivalence tests where the investigator must
 state what degree of difference is considered a practical
 difference. In an equivalence test the null hypothesis is
 that  the  difference  in means  is greater than  some
 practically significant value which the tester must state in
 advance. They recommend that environmental managers
 and  scientists focus  attention on statistical power (the
 probability of rejecting the null hypothesis of no difference
 in the test groups when in fact it is false; ideally the level
 of power should be high) and decide what is a practical
 difference.  This practical difference concept could apply
 to both the bioassessment and toxicity data.
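
 The contrast between a significance test and an equivalence test can be
 sketched in a few lines. The following is a minimal illustration only and
 is not the procedure of McBride et al. (1993): it assumes a two one-sided
 tests formulation with a large-sample normal approximation, and the
 function name, the data, and the practical difference values are all
 hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def equivalence_p_value(x, y, practical_diff):
    """p-value for H0: |mean(x) - mean(y)| >= practical_diff.

    Two one-sided tests with a normal approximation (an assumption made
    for brevity; small samples would call for a t distribution).
    """
    diff = mean(x) - mean(y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    std_normal = NormalDist()
    # H0: diff <= -practical_diff; small p when diff is well above the lower bound.
    p_lower = 1.0 - std_normal.cdf((diff + practical_diff) / se)
    # H0: diff >= +practical_diff; small p when diff is well below the upper bound.
    p_upper = std_normal.cdf((diff - practical_diff) / se)
    return max(p_lower, p_upper)


# Hypothetical test-organism responses at a reference and a downstream site.
reference = [24, 26, 25, 27, 23, 25]
downstream = [25, 24, 26, 25, 26, 24]

p_wide = equivalence_p_value(reference, downstream, practical_diff=3.0)
p_narrow = equivalence_p_value(reference, downstream, practical_diff=0.5)
print(f"practical difference 3.0: p = {p_wide:.2g}")
print(f"practical difference 0.5: p = {p_narrow:.2g}")
```

 With these made-up data the two sites are declared equivalent when a
 difference of 3.0 is the smallest one that matters, but not when the
 practical difference is set at 0.5. The conclusion therefore depends on a
 value the investigator must state in advance, which is the point of the
 approach.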

 In a book on ecological risk estimation Bartell et al. (1992)
 write, "It might also be easier to design experiments or to
 monitor natural systems for qualitative endpoints rather
 than having to demonstrate statistical differences between
 quantitative  results.   The large variances that  typify
 ecological experiments  may  argue  for adopting  more
 qualitative endpoints." Statistics are a valuable tool in our
 attempt  to   understand  biological and  ecosystem
 operations. As we endeavor to comprehend the biological
 world,  it may be  useful,  however,  to  remember that
 statistical significance  does  not guarantee biological
 significance and biological significance does not always
 equate with statistical significance.


 5.2 Associated Studies
 5.2.1 South Elkhorn Creek Study
 Birge and associates (Birge et al., 1989; 1990) performed
 ecological assessments on a stream that received a
 point-source discharge.  Results of single species toxicity
 tests were  compared to  ecological endpoints.   One
 objective was to assess the reliability of the laboratory test
 results  in predicting ecological responses.  Ecological
 measurements  included  macroinvertebrate  species
 richness, abundance, diversity, and functional group
 analysis. Toxicity in effluent and ambient water samples
 from  the different stream  sites was assessed using a
 fathead minnow embryo/larval 8-d test.

 The point-source  discharge  was from a wastewater
 treatment plant (WWTP) into Town Branch  Creek. Town
 Branch Creek entered into South Elkhorn Creek about 14
 km below the WWTP outfall.  There were three control
 sites, one above the WWTP outfall on Town Branch Creek
 and two on  South Elkhorn  Creek above the confluence
 with Town Branch Creek.  There were seven sampling
 sites at various distances downstream of the discharge
 point. The most distant station was 67.8 km downstream
 of the WWTP outfall.

 Embryo-larval survival in water samples from all three
 control sites (upstream of the WWTP) was greater than 90%.
 Toxicity tests with WWTP effluent generated data on effect
 concentrations (expressed as percent effluent). Hydrology
 of the creek was studied so that percent dilution of effluent
 could be predicted at each sampling site.  Toxicity at sites
 downstream of the  discharge point reflected the toxicity
 predicted by the effluent toxicity test data considering
 instream dilution. That is, instream toxicity was  reliably
 predicted by effluent dilution data.   These data are
significant  in  that  they demonstrate that  the  major
 modification of  effluent  toxicity was stream  dilution;
physical and chemical characteristics of the stream did not
appear to mitigate toxicity to any great extent.
Ambient water samples  collected at the  three  sites
immediately  below the  point  of discharge  showed
statistically  significant toxicity,  whereas  none of the
reference sites yielded significant toxicity. A decreasing
gradient of toxicity downstream of the discharge point was
evident. Both the fish and invertebrate data suggested
adverse impacts at the three sites immediately below the
discharge point. Below these heavily impacted sites there
tended to be a gradient of increasing diversity of both fish
and macroinvertebrates downstream of  the discharge
point. The correlation coefficient (r) between embryo/larval
survival  in water samples from the  stream sites and
estimated percent effluent at those sites was -0.87 (i.e., the
greater the percent effluent at a site the lower the embryo-
larval survival). The number of fish species (r = -0.83) and
number of invertebrate taxa (r = -0.94) were also inversely
correlated with the estimated percent effluent at a site.
The correlation coefficient between embryo-larval survival
in water samples from the stream sites and number of
invertebrate taxa was r = 0.96, while the value for the
number of fish species was r = 0.92.  All of these corre-
lation coefficients were statistically significant. Results of
this study illustrate that the laboratory toxicity test results were
very reliable predictors of instream biological community
responses.  Data from this  study  were included in the
statistical analysis by Dickson et  al. (1992).
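
The correlation coefficients reported above are ordinary product-moment
correlations between paired site values. A minimal sketch of the calculation
follows; the site values are hypothetical and are not the Birge et al. data,
and the function name is chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical values for seven downstream sites (NOT the published data).
percent_effluent = [60, 45, 30, 15, 8, 4, 2]
survival_percent = [20, 35, 55, 75, 85, 90, 92]

r = pearson_r(percent_effluent, survival_percent)
print(f"r = {r:.2f}")
```

A strongly negative r, as in the study, means that survival falls as the
estimated percent effluent at a site rises.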

5.2.2 North Carolina Study
Effluent toxicity test results were compared to indices of
aquatic ecosystem community health at 43 sites on rivers
and streams in North Carolina (Eagleson et al., 1990).
Toxicity tests were performed with  both municipal waste
treatment and industrial  facilities effluents.  The  7-d
Ceriodaphnia test was used to estimate chronic toxicity in
effluents. Instream biological responses were gauged by
surveys of benthic macroinvertebrates above and below
points of discharge. Attempts were made to reduce habitat
type confounding factors, as well as other  physical
confounding factors.  Care  was taken to compare the
results of toxicity tests and field responses at  low and
average flow  conditions;  toxicity  decay was  also
incorporated into the comparisons.

Results of this study revealed that, if proper consideration
was given to effluent dilution, the USEPA toxicity test
results can be reliable predictors of ecological effects.
Comparisons of upstream and  downstream sites with
regard  to  biological  indices  were  made with  the
nonparametric Wilcoxon signed-rank test.  If a site
downstream of an effluent discharge point showed
a statistically significant response (degradation)
compared to the reference site above the discharge point,
the site was classified as "instream impact measured."  If
there  were  no  differences between  upstream  and
downstream site biological  indices measurements, the site
was classified as "no instream impact measured." When
an  effluent sample, diluted to the  appropriate instream
waste concentration (IWC),  resulted in  a statistically
significant response compared to controls, the sample was
designated as "instream impact predicted." If an effluent
sample did not produce statistically significant toxicity, it
was designated as "instream effect not predicted."

The  classification  system  described in  the above
paragraph was combined into a contingency table which
is best illustrated in Figure 1.

Toxicity test predictions were accurate in 88% of the cases.
If non-effluent anthropogenic factors contributed heavily to
instream biological impacts, one might expect a high
frequency of "false negatives." However, there were only
5% false negatives. If habitat differences or other physical
factors  between   the  reference site and  the  sites
downstream contributed to those sites being classified as
impaired, one would expect  a more equal distribution
between the two different categories with impacted sites.
However,  the  distribution  in the two categories where
instream impact was measured was very unequal (i.e., two
tests showing no toxicity and 29 testing positive for toxicity,
cf., Figure 1).

Although these investigators did not apply statistical analy-
ses to this contingency table, results from Fisher Exact and
Chi-Square tests showed statistical significance (P<0.001).
Accordingly, one must reject the null hypothesis that toxicity
test results do not predict biological responses.  Even
though there is some potential that confounding factors
(see   Section  6.6  below)  influenced  biological
measurements, it appears that the dominant impairments
were due to wastewater constituents. Assuming that the
ecological indicators were accurate, the results of this
study provide a strong case that the Ceriodaphnia (even
though not indigenous to these stream ecosystems)
toxicity test results were reliable qualitative predictors of
aquatic ecosystem  impairments.   A more powerful
statistical design would have included more non-impacted
sites, but prior to the ecological surveys the nature of sites
was unknown.
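
The contingency-table reasoning above can be checked directly from the four
counts given in Figure 1. The sketch below is an illustration, not the
analysis actually run for this review: it assumes a two-sided Fisher exact
test that sums all tables no more probable than the observed one, and the
variable and function names are hypothetical.

```python
from math import comb

# Counts from Figure 1 (Eagleson et al., 1990), 43 sites in total.
predicted_measured = 29          # impact predicted, impact measured
predicted_not_measured = 3       # impact predicted, no impact measured
not_predicted_measured = 2       # no impact predicted, impact measured
not_predicted_not_measured = 9   # no impact predicted, no impact measured


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Number of tables with k in the top-left cell, margins held fixed.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                  if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


total = (predicted_measured + predicted_not_measured
         + not_predicted_measured + not_predicted_not_measured)
accuracy = (predicted_measured + not_predicted_not_measured) / total
false_negative_rate = not_predicted_measured / total
p_value = fisher_exact_two_sided(predicted_measured, predicted_not_measured,
                                 not_predicted_measured,
                                 not_predicted_not_measured)

print(f"sites: {total}, accuracy: {accuracy:.0%}, "
      f"false negatives: {false_negative_rate:.0%}")
print(f"Fisher exact two-sided p = {p_value:.2g}")
```

Integer arithmetic is used for the table weights so that the comparison with
the observed table is exact; only the final ratio is a floating-point value.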

5.3 Review of CETTP Studies
5.3.1 Dickson et al. Analysis
In 1992 Dickson and colleagues published the results of
a study undertaken to statistically analyze data from the
eight CETTP studies, the South Elkhorn Creek, Kentucky
study (Birge et al., 1989), and a study on the Trinity River,
Texas (Dickson et al., 1989). The intent of the Dickson et
al. (1992)  study was to apply a  statistical method and
classification approach to all of the above mentioned data
to elucidate relationships between surface water toxicity
test results and ecosystem community responses.

           Toxicity test predicts instream impact; instream survey measures impact [29/43 = 67%]
           Toxicity test predicts no instream impact; instream survey measures no impact [9/43 = 21%]
           Toxicity test predicts instream impact; instream survey measures no impact [3/43 = 7%]
           Toxicity test predicts no instream impact; instream survey measures impact [2/43 = 5%]
 Figure 1.  Summary of Eagleson et al., 1990.  The categories represent the four possible outcomes when comparing
 laboratory effluent toxicity test results to ecological survey data collected at 43 stations.

After data from all of the studies listed above were entered
into a database, a canonical correlation analysis was
performed to examine the relationship between ambient
water toxicity and estimates of biological community
condition.  Canonical correlation tests for significant
relationships between two matrices of data.  The
bioassessment metrics can be explored and meshed into
a variate for ecosystem condition, which in turn is
compared to the toxicity variate composed of the toxicity
data (e.g., from both Ceriodaphnia and larval fathead
minnow tests). A goal of canonical correlation is to identify
the combination of the predictor variables (i.e., toxicological
responses) and response variables (biological community
indices) that has the strongest correlation among all
possible combinations. The output of canonical correlation
includes indicators of the relative importance (sometimes
defined as weights) of each variable to the overall
correlation.

There were two major goals in the Dickson et al. (1992)
study:  1) ascertain whether or not statistically significant
correlations existed between the surface  water toxicity
variable and the biological community variable and 2) use
the results of the canonical correlations to identify
important variables. Using the toxicity test and biological
community indices variables, a classification system was
developed to determine the reliability of toxicity test results
as predictors  of instream community responses.

A major aspect of the analysis was data collected in the
Trinity River study (Dickson et al., 1989). In that study the
relationship between ambient water toxicity test results and
biological community response was scrutinized at 11 sites
along the river.  Reference sites were located above a
WWTP discharge point and the remaining sites were below
the outfall.  The relationships between ambient  water
toxicity and biological community indices were examined
through time, with sampling and testing in six separate
months. Assessments of ambient water toxicity consisted
of the Ceriodaphnia and larval fathead minnow short-term
test estimates of chronic toxicity.   Instream biological
community assessments included fisheries data (richness,
evenness, and an index of biotic integrity) and benthic
macroinvertebrate data (richness and evenness).

Separate canonical correlations were performed with the
toxicity variable compared to the fisheries indices and to
the benthic macroinvertebrate indices; the toxicity variable
was not correlated with a consolidated bioassessment vari-
able   consisting  of   combined   fisheries   and
macroinvertebrate  indices.    Statistically  significant
(p<0.001) coefficients of determination (r² represents the
proportion of variation in one variable determined by the
variation of the other) were observed for both canonical
(range of r² was 0.38 to 0.59) and robust canonical (range
of r² was 0.38 to 0.94) analyses in all six months of the Trinity
River study for the fisheries and macroinvertebrate mea-
surements. These findings imply that the matrix of toxicity
test results was an effective predictor of instream biological
community responses.

Unfortunately, detailed information on canonical correlation
for the CETTP studies was not presented by Dickson et al.
(1992).  In fact, r² values were presented for only the Five Mile
Creek (Mount et al., 1985) and Kanawha River (Mount and
Norberg-King, 1989) studies.  Data showing the relative
contributions (i.e., weights) of each of the toxicological and
each  of  the  biological community variables were not
presented  for the  two   CETTP studies.   Statistically
 significant r² values from the robust canonical correlations were
 noted for the Five Mile Creek (r² = 0.81, p = 0.0005) and
 Kanawha River (r² = 0.81, p<0.00001) data.  These
 correlations suggest that toxicity test results from ambient
 water samples were reliable predictors  of  instream
 biological community responses.

 Based on the canonical analyses, fish species richness
 was  shown to be an  important aquatic  ecosystem
 response variable.  Therefore, Dickson et al. (1992) se-
 lected fish richness as the ecological response variable in
 all studies where it was available for the next phase of the
 analysis. However, two CETTP studies, Kanawha River
 (Mount and Norberg-King, 1986) and Ohio River (Mount
 et al., 1986c) did not include fish surveys, so benthic
 macroinvertebrates  richness  was substituted as  the
 biological response variable.

 The next step in the analysis was to develop a classifica-
 tion system to judge whether or not a site was predicted to
 be impacted based on ambient water toxicity data, and
 whether or not a site was observed to be impacted based
 on instream community metrics.

 For the ambient water toxicity data, a low value for test
 performance (e.g., low Ceriodaphnia neonate production
 or low larval minnow growth) was used to classify a site as
 "impact predicted" and a high value for test species perfor-
 mance was classified as "impact not predicted."  For
 instream biological community variables a low value (e.g.,
 low species richness) resulted in that site being classified
 as "impact  observed," whereas  a  high biological
 community  value classified  the site  as  "impact not
 observed."   Rather than establish arbitrary thresholds
 (cutoffs)  for classification of ambient water toxicity results
 into categories, the natural variability of the measured
 parameters was incorporated into the system.  Because
 the measure of toxicity consisted of the sum of a subset of
 the  toxicity  variables,  with  each of these  variables
 standardized, and with the assumption that the majority of
 the observations were normally distributed, the authors
 reasoned that the sum of a  set of these variables should
 have an approximately  normal distribution.  Assuming
 independence of variables, the authors reasoned that the
 sum divided by the square root of the number of variables
 being summed should have a standard normal distribution.
 For these reasons, Dickson  et al. (1992) concluded that a
 classification scheme could be defined such that a site
 would be classified "impact predicted" if the normalized
 toxicity measure fell below a threshold obtained from
 percentiles of the standard normal distribution.
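
 The normalization reasoning above can be illustrated with a short sketch. This is not the code used by Dickson et al. (1992); the function names, the example values, and the default 5th percentile are assumptions made only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def normalized_toxicity(z_scores):
    """Sum standardized toxicity variables and divide by sqrt(n).

    If each input is an independent standard normal variate, the result
    is itself approximately standard normal.
    """
    n = len(z_scores)
    if n == 0:
        raise ValueError("need at least one standardized variable")
    return sum(z_scores) / sqrt(n)


def impact_predicted(z_scores, percentile=0.05):
    """Classify a site as 'impact predicted' when the normalized score
    falls below the chosen percentile of the standard normal distribution."""
    threshold = NormalDist().inv_cdf(percentile)
    return normalized_toxicity(z_scores) < threshold


# Hypothetical standardized endpoints for one site
# (e.g., Ceriodaphnia neonate production and larval minnow growth).
site = [-1.6, -1.9]
print(normalized_toxicity(site))
print(impact_predicted(site, percentile=0.05))
```

 Raising the percentile argument toward 0.95 flags more sites as "impact predicted," which is the trade-off between the two misclassification errors.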

 Controversy  surrounds  the biological metrics  and the
 amount of change or difference in these metrics which
 represents impairment. Therefore, as with the toxicity test
data, Dickson et al. (1992) used the biological community
data to determine a classification. Sites that revealed a
 biological response below a corresponding Poisson distri-
 bution percentile (representing counts of the number of fish
 and invertebrates at a site) were classified as "impact ob-
 served."
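
 A Poisson percentile cutoff of this kind can be sketched as follows. The source does not state how the Poisson mean was chosen, so the reference mean here is a hypothetical input, as are the function names.

```python
from math import exp


def poisson_percentile(mean, p):
    """Smallest count k such that P(X <= k) >= p for X ~ Poisson(mean)."""
    if mean <= 0 or not 0 < p < 1:
        raise ValueError("mean must be positive and p strictly between 0 and 1")
    k = 0
    term = exp(-mean)       # P(X = 0)
    cumulative = term
    while cumulative < p:
        k += 1
        term *= mean / k    # P(X = k) computed from P(X = k - 1)
        cumulative += term
    return k


def impact_observed(count, reference_mean, p=0.05):
    """Flag a site whose count falls below the chosen Poisson percentile."""
    return count < poisson_percentile(reference_mean, p)


# Hypothetical example: a reference mean of 12 species.
print(poisson_percentile(12, 0.05))
print(impact_observed(5, reference_mean=12))
```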

 Two misclassification errors were possible in this scheme:
 1) misclassifying a nonimpacted site as impacted or
 2) misclassifying an impacted site as nonimpacted. The
 percentile selected for the threshold depends on which
 misclassification error is of greater concern. If the
 desire is to keep the error rate of classifying an
 impacted site as nonimpacted low, then one might select
 a 95th percentile threshold. To keep the error rate of clas-
 sifying a nonimpacted site  as impacted  low,  the 5th
 percentile could be the threshold.

The classification scheme described above produced two-way
contingency tables for predicted and observed
impacts at aquatic ecosystem sites. Fisher's test was used
to evaluate the  accuracy of toxicity test predictions of
instream impacts. The classification scheme was applied
to the CETTP and Trinity River data sets, as well as to the
combined data sets. Using both the 95-95 and the  5-5
percentile cutoffs, strong, statistically significant qualitative
relationships were demonstrated between ambient water
toxicity and instream biological response (impairment).
The contingency table for the 95-95  percentile threshold
using the combined data sets is reproduced below.

Figure 2 shows the  data from  a contingency table
summarizing the data analyzed by Dickson et al. (1992).
The total percentage of sites in all of the CETTP and Trinity
River studies where toxicity test results reliably predicted
instream biological findings was 84.4%. Fisher's Exact test
revealed that toxicity test results effectively (p  =  0.003)
predicted instream biological  responses.   The   low
percentage  (6.2%) of "false  negatives" suggests that
factors other than toxicity were not major contributors to
biological community impacts.
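
The agreement figure and the exact test can be checked from the four counts shown in Figure 2 (128, 15, 10, and 7 of 160 sites). The sketch below is a plain hypergeometric calculation, not the authors' software. The use of a one-sided test is an assumption, since the source does not say which alternative was used.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of a value in
    cell `a` at least as large as the one observed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    upper = min(row1, col1)
    favorable = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return favorable / comb(total, col1)


# Rows: impact predicted / not predicted; columns: impact observed / not observed.
a, b, c, d = 128, 15, 10, 7
agreement = (a + d) / (a + b + c + d)
p_value = fisher_exact_one_sided(a, b, c, d)
print(f"agreement = {agreement:.1%}")
print(f"one-sided p = {p_value:.4f}")
```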

These data can be grouped and examined in a different
manner.  Grouping by whether sites  were biologically
impacted or not yields totals of 138 and 22, respectively.
For a stronger statistical design, a much larger number of
potentially  unimpacted  sites  would  be  necessary.
However, the condition of sites was unknown prior to  the
biological surveys.  Looking only at the impacted sites,
ambient water toxicity tests predicted impacts correctly in
93% of the cases with 7% "false negatives." Examination
of the non-impacted site data reveals  that toxicity tests
were reliable predictors  in 32% of the cases,  with 68%
"false positives."  This potential (see discussion on false
positives/negatives above) high rate of "false positives" is
disturbing and confirms that the results of a single toxicity
test should not be used to characterize wastewater or
ambient water toxicity.
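
The regrouped percentages follow directly from the four Figure 2 counts, which give 128 + 10 = 138 impacted sites and 15 + 7 = 22 non-impacted sites. A minimal sketch of the arithmetic, with illustrative names that do not come from the source:

```python
def classification_rates(tp, fp, fn, tn):
    """Return agreement rates for a 2x2 predicted-versus-observed table.

    tp: impact predicted, impact observed
    fp: impact predicted, no impact observed
    fn: no impact predicted, impact observed
    tn: no impact predicted, no impact observed
    """
    impacted = tp + fn
    nonimpacted = fp + tn
    return {
        "impacted_sites": impacted,
        "nonimpacted_sites": nonimpacted,
        "correct_among_impacted": tp / impacted,
        "false_negative_rate": fn / impacted,
        "correct_among_nonimpacted": tn / nonimpacted,
        "false_positive_rate": fp / nonimpacted,
    }


rates = classification_rates(tp=128, fp=15, fn=10, tn=7)
for name, value in rates.items():
    print(name, value if isinstance(value, int) else f"{value:.1%}")
```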
  [Figure 2: proportions of the 160 stream sites in each outcome category]
    Toxicity test predicts instream impact; instream survey measures impact: 128/160 (80%)
    Toxicity test predicts instream impact; instream survey measures no impact: 15/160 (9.4%)
    Toxicity test predicts no instream impact; instream survey measures impact: 10/160 (6.2%)
    Toxicity test predicts no instream impact; instream survey measures no impact: 7/160 (4.4%)
  Figure 2.  Summary of Dickson et al., 1992 analysis. The categories represent the four possible outcomes when
  comparing laboratory toxicity test results on ambient water samples from stream sites with ecological survey data from
  the same sites. Total number of stream sites is 160.

The procedures used in this study required that gradients
of ambient water toxicity and of biological community re-
sponses exist. The statistical analyses performed revealed
that the frequency  of observing instream impairments
when  toxicity test   results predicted  an impact was
significantly greater than the overall frequency of impair-
ments observed.  The analysis by Dickson et al. (1992)
provides a compelling qualitative relationship  between
ambient water toxicity and indigenous species responses.
Toxicity test endpoints identified as effective qualitative
predictors  of  aquatic   ecosystem  responses  were
Ceriodaphnia neonate production  and larval fathead
minnow growth.

5.3.2 Marcus and McDonald Analysis
Marcus and McDonald (1992) also analyzed the CETTP
and Elkhorn Creek (Birge et al., 1989) data sets using ca-
nonical correlation.  In this analysis, the null hypothesis
was that no  correlation  existed between the matrix of
instream biological  community measurements and the
matrix of toxicity test results (i.e., neonate production by
Ceriodaphnia and larval fathead minnow growth).

The results of their analysis showed a statistically signifi-
cant canonical  correlation occurred  in four of the eight
CETTP site studies (Scippo Creek: r = 0.93, Naugatuck
River:  r =  0.78, Back River: r =  0.996, and Kanawha
River: r = 0.79) as well as in the Elkhorn Creek (Birge et
al., 1989) data set (r = 0.99). This translates to five of the
nine data sets (streams/rivers) analyzed. Relatively high
values were found for the canonical correlation
coefficients. For all but two of the nine data sets the
coefficients indicated a greater than 50% (r > 0.7) relationship
between the sets of laboratory toxicity test results and the
instream biological variables.  Marcus  and McDonald
(1992) emphasized these high correlation coefficients and
downplayed statistical significance.  Except  for  the
Naugatuck River study (Mount et al., 1986a), canonical
variable weights (weights refer to the relative importance
of each variable to the overall correlation) were not shown
in this publication, hindering the ability of the reader to
interpret the statistical analysis.
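
The link between a coefficient above 0.7 and a "greater than 50%" relationship presumably rests on the squared correlation (0.7 squared is 0.49). That reading is an assumption, not something stated by Marcus and McDonald (1992). Under it, the five significant coefficients reported above correspond to the following shared variance:

```python
# Canonical correlation coefficients reported for the five data sets
# with statistically significant results.
coefficients = {
    "Scippo Creek": 0.93,
    "Naugatuck River": 0.78,
    "Back River": 0.996,
    "Kanawha River": 0.79,
    "Elkhorn Creek": 0.99,
}


def shared_variance(r):
    """Proportion of variance shared by two canonical variates (r squared)."""
    return r * r


for site, r in coefficients.items():
    print(f"{site}: r = {r}, r^2 = {shared_variance(r):.3f}")
```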

Marcus and McDonald (1992) concluded that, "Although
future improvements will be made in these test methods,
and better methods may be developed, we conclude that
at this time these two toxicity test methods  (i.e.,  the
Ceriodaphnia and larval fathead minnow short term esti-
mates of chronic toxicity) can  be potentially  useful
assessment tools for screening and monitoring."

In the analysis of the CETTP data, Marcus and McDonald
(1992) found that the ambient toxicity measures often
showed greater relationships to instream biological mea-
surements than expected by chance. They observed that
potentially important relationships appeared often. "Our
analyses of the  CETTP data indicate that results from the
tests  of ambient water toxicity often contain potentially
important biological information about relationships con-
tained in variables of these field variables."  (Marcus and
 McDonald, 1992). In other words, qualitative relationships
 appeared often in the CETTP data.

 Marcus and McDonald (1992) reported that Ceriodaphnia
 neonate production generally had the greatest incidence
 of significant correlations to biological community mea-
 sures (greatest potential for predicting impairments).

 Based on simple correlation analysis of the CETTP data,
 Parkhurst (1996) suggested that ambient toxicity did not
 show a strong relationship with measures of instream
 biological communities. Nonetheless, a statistically significant
 relationship was noted between ambient water toxicity and
 instream biological indices in five of nine CETTP and
 associated studies. Furthermore, only sublethal endpoints from
 the toxicity tests were used in the correlation analysis; that
 is, Parkhurst (1996) omitted lethality data from his analysis.

 5.4 Independent Evaluation of Statistical Analyses
 Smith (1994) evaluated the appropriateness of the statistical
 methods used by Dickson et al. (1992) and Marcus and
 McDonald (1992), along with the major differences between
 the two analyses. In this review, Smith (1994) was
 not convinced that the canonical analyses were the optimal
 statistical approach to examine the CETTP and Trinity
 River data, as canonical correlation assumes linear
 relationships. Note that Marcus and McDonald (1992) did
 address the linearity question within and between the sets
 of variables.  Smith suggests that there are many cases
 where biological community parameters have been shown
 to have nonlinear relationships with toxicity. Furthermore,
 canonical correlation focuses on linear combinations of
 toxicity and instream response variables that correlate
 maximally. Smith  concluded that, "It is possible that the
 most ecologically meaningful relationships between the
 toxicity tests and instream responses are not represented
 by maximal correlations."

 As indicated above, Marcus and McDonald (1992) did not
 provide canonical variable weights in their publication,
 except for one analysis, rendering interpretation of their
 appraisal difficult. Dickson et al. (1992) presented canonical
 variable weights for the Trinity River study, but not for the
 CETTP studies. Smith observes that examination of the
 canonical variable weights, especially for two (February
 and June) of the six months of Trinity River sampling data,
 calls into question the usefulness of the canonical
 procedure for analyzing the CETTP and associated studies
 data. More specifically, toxicity test variable weights and
 bioassessment variable weights sometimes had opposite
 signs  (plus  or minus).    When both  toxicity  and
 bioassessment weights have the same sign (plus, plus or
 minus,  minus) the data indicate  that increased larval
fathead  survival/growth  and/or   Ceriodaphnia
survival/neonate production  were correlated with greater
diversity/density in the  biological community  measure-
 ments.  However, when the signs were opposite, the
 indication  is  that an increase in one  variable was
 accompanied by a decrease in the other variable (e.g.,
 higher growth/reproduction with lower diversity/density).
 Biologically, the opposite signs appear inconsistent with
 the expected relationship between toxicity and community
 parameters.

 Smith suggests analyzing  the CETTP and associated
 studies  using  separate  analyses  of the toxicity and
 instream response data, possibly ordination techniques.
 He commented that, "These analyses  would provide
 insight into the relationships and patterns shown by toxicity
 data alone and the instream response data alone. From
 these  analyses, I would produce interpretable variables
 summarizing the different patterns observed (for both
 toxicity and response variable sets  separately). I  would
 then correlate the summary variables for the toxicity tests
 with the  summary variables for the instream responses
 using multiple regression. Each regression analysis would
 involve an instream response summary variable as the
 dependent variable,  and  the toxicity test summary
 variables as the independent variables.  Bivariate plots
 would be useful to see the nature of the relationships and
 determine if nonlinearity needs to be taken into account
 when using the analytical tools."
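
 The regression step in Smith's suggestion can be sketched as below. The ordination step that would produce the summary variables is omitted, and the data are hypothetical; this shows only an ordinary least squares fit of one instream summary variable on two toxicity summary variables, solved through the normal equations.

```python
def fit_multiple_regression(predictors, response):
    """Ordinary least squares with an intercept.

    predictors: one row of toxicity summary variables per site.
    response:   one instream response summary value per site.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in predictors]
    p = len(rows[0])
    # Augmented normal equations: (X'X | X'y).
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(p)]
        + [sum(row[i] * y for row, y in zip(rows, response))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Hypothetical summary variables for five sites.
toxicity_summary = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
instream_summary = [1, 3, -2, 0, 2]   # generated as 1 + 2*x1 - 3*x2
coefficients = fit_multiple_regression(toxicity_summary, instream_summary)
print(coefficients)
```

 Plotting each toxicity summary variable against the response before fitting, as Smith recommends, would show whether a linear model of this kind is adequate.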

 Assuming that the variables used to classify impairments
 were sufficient, Smith indicated that the conclusions made
 by Dickson et al.  (1992) from their classification system
 were reasonable.

 Data from only two CETTP studies were analyzed in com-
 mon by the two different groups of investigators. Both sets
 of authors reported statistically significant canonical corre-
 lations (but the correlation coefficients differed) for the
 Kanawha River data set.  Dickson et al. reported a statisti-
 cally significant canonical correlation coefficient for the
 Five Mile Creek data set, whereas Marcus and McDonald
 (1992) did not. The differences between the two analyses
 probably related to the fact that Dickson et al.  (1992) did
 not "mesh" the bioassessment fish and macroinvertebrate
 data whereas Marcus and McDonald did.

 From Smith's review, it is clear that there are various ways
to statistically analyze studies which attempt to examine
the relationship between toxicity test results and instream
 biological community responses. Data can  be used or
grouped  in various arrays such that the outcome  of an
analysis can be very different.

5.5  Review of CETTP  Studies in  Which A
Significant Correlation Was Not Observed
In the Marcus and McDonald (1992) canonical analysis, four
of the CETTP studies, Ottawa River (Mount et al., 1984),
Five Mile Creek (Mount et al., 1985), Skeleton Creek
(Norberg-King and Mount, 1986), and Ohio River (Mount
et al., 1986c) did not produce a statistically significant cor-
relation  between ambient water toxicity test results and
instream biological community parameters. The nature of
this type of analysis can obscure valuable pieces of data,
as well as informative observations.  For this reason, in-
structive aspects of these four studies are  summarized
below as the studies provide very useful information which
was not revealed by any canonical correlation analysis.

5.5.1  Ottawa River Study
The CETTP Ottawa River study included three discharges.
The most upstream was a sewage treatment plant (STP),
next was the refinery, and the last discharge was a chemi-
cal manufacturing plant. Outfalls from all of these facilities
were within a 1.3 km range  on the river.  Ecological
surveys were performed twice (1982 and 1983) at nine
different sites on the river. Two sites were upstream of the
three outfalls. Sites 2 and 3 were immediately above and
below the STP outfall, respectively. Sites 3 and 4 were
immediately above and below the refinery outfall,
respectively. Sites 4 and 5 were immediately above and
below the chemical plant outfall, respectively. Sites 6, 7,
8, and 9 were approximately 6.8, 13.1, 31.7, and 57.8 km
downstream of the chemical plant outfall.  If the discharges
from these plants were impairing  the Ottawa River
ecosystem, one  would expect that  sites 3, 4, 5, and
perhaps 6 would appear degraded compared to sites 1
(above discharges) and 9 (most distant downstream).
The STP discharge contributed approximately 72 to 82%
of the Ottawa River flow.  The refinery and chemical plant
                          effluents contributed about 17 to 32% and 8 to 10% of the
                          river flow, respectively.

                          Seven day early life stage toxicity tests with larval fathead
                          minnows and Ceriodaphnia were performed on effluents
                          from the three facilities and on surface water samples col-
                          lected at the nine river sites.  Benthic macroinvertebrate
                          and fish population indices were used to assess instream
                          biological community condition. Table 1 summarizes the
                          toxicity testing  data  from  the Ottawa River  tests.
                          Examination of the benthic macroinvertebrate diversity,
                          community loss index, and dominant taxa data suggests
                          that the sites immediately downstream of the outfalls were
                          impaired compared to sites 1 and 9. Sites 3, 4, 5, and 6
                          appeared the most impacted; recovery was not evident
                          until site 8 (31.7 km downstream of the chemical plant
                          outfall).

                          The  fish  population  data  generally   agreed  with
                          macroinvertebrate results. In the 1982 sample, sites 4, 5,
                          6, and 7 were characterized by zero to 8 species and a
                          total of approximately 50 individuals at all four sites. At all
                          the  other sites there were 11 to 18 species with the
                          number of individuals in the  thousands.  For the 1983
                          sampling, sites 3, 4, 5, and 6 appeared to be the most
                          impacted with 1  to 5 species  and low total counts. The
                          remainder  of the sites were characterized by 10 to 23
                          species and high total counts.

Results from this investigation revealed a qualitative
correspondence between effluent and ambient water toxicity, as
well as between ambient water toxicity and ecosystem
responses at sites downstream of effluent discharge
points. Although statistical analyses were not performed,
there was distinct correspondence between ambient water
toxicity and biological community responses.

Table 1.  Toxicity testing summary for the Ottawa River study (Mount et al., 1984)

Effluent
  Larval Fathead Minnow Tests
    STP:             No significant toxicity 1982, 1983
    Refinery:        Significant toxicity in 50% effluent 1982; 100% effluent 1983
    Chem. facility:  No significant toxicity in 1982; significant toxicity in 1% effluent in 1983
  C. dubia Tests
    STP:             Significant toxicity in 10% effluent 1983
    Refinery:        Significant toxicity in 10% effluent 1983
    Chem. facility:  No significant toxicity

Ambient Water Toxicity
  Fathead Minnow Test
    Significant toxicity at sites 3 through 8 compared to sites 1 and 9 (1983)
  C. dubia Test
    Significant toxicity at sites 3 through 6 compared to sites 1 and 9 (1982);
    significant toxicity at sites 3 through 7 compared to sites 1 and 9 (1983).

 5.5.2 Five Mile Creek Study
 The Five Mile Creek study included three dischargers, two
 coke plants and a WWTP. Nine study sites were located
 along the creek: two sites above the first point of discharge
 and the remainder downstream of at least one discharge
 point. Coke plant #1 outfall was 2.3 km upstream of coke
 plant #2 and, in turn, coke plant #2 was 10.7 km upstream
 of the WWTP outfall.

 The 7-d larval fathead minnow and Ceriodaphnia toxicity
 tests were used to  assess toxicity in ambient  water
 samples from each of the sites, as well as in effluent
 samples. Ecological surveys were conducted at all sites
 in  February and October; effluent and ambient  water
 toxicity tests were also performed during these months.

 The effluent LOEC for the larval fathead tests was 1% and
 3% in October and February, respectively, for coke
 plant #1. The coke plant #2 effluent LOEC for October and
 February was 30% and 10%, respectively. In February,
 significant toxicity was seen in the larval fathead test at the
 two sites  below the coke  plants.  Ceriodaphnia tests
 conducted during February were not reliable because of
 problems in the culture population. In October, the LOEC
 in Ceriodaphnia tests was 10% effluent for coke plant #1
 and 30% effluent for coke plant #2. In general, the WWTP
 effluent appeared to contain little toxicity in either of the
 toxicity tests.

 There were fewer benthic macroinvertebrate taxa and
 lower density  at sites immediately below the coke plant
 outfalls, but the data were not conclusive.  Likewise, no
 consistent pattern was seen in zooplankton data. At sites
 above the coke plant outfalls 4 to 6 (total count >100) and
 10 to 12 (total count >1,600) fish species were counted in
 February and  October, respectively. In  February, 0 to 2
 (total count = 9) fish species were noted at the sites
 immediately below (500 m) the outfalls of the coke plants.
 In October, 1 to 8 (total count =  83) fish species were
 counted below the outfalls of the coke plants. This paucity
 of fish species and numbers of individuals below the coke
 plants' discharge points suggests that those effluents
 adversely affected fish life.

 To understand this study it is important to note that the two
 coke plant effluents contributed a relatively low percentage
 of stream flow: less than 1% for plant #1 and usually not
 more than 8% for plant #2. In other words, the instream
 waste concentration (IWC) for these discharges was fairly
 low. Based on the results of the effluent toxicity tests, ac-
 ceptable waste concentrations (AECs) were calculated for
 the effluents of both coke plants. Comparison of the IWCs
 to AECs revealed that the AEC seldom exceeded the IWC
 with the exception of the sites immediately downstream of
 the two outfalls. These were the sites where there was a
 paucity of fish species and numbers. While the coke plant
 effluents (as well as ambient water) toxicity qualitatively
 predicted ecosystem impairments, the effluent tests tend-
 ed to "underestimate" fish population impairment. That is,
 some would suggest the  effluent toxicity tests yielded
 "false negatives"!

 That the canonical correlation analysis of Marcus and Mc-
 Donald (1992) did not "recognize" the specifics described
 above is not surprising since their analysis consolidated
 toxicity testing as well as bioassessment data. The only
 significant impairments in Five Mile Creek appeared to be
 on fish populations immediately downstream of coke plant
 discharges.  In the Dickson et al. (1992) study (where
 fishery data were correlated with the consolidated toxicity
 data) a statistically significant canonical correlation coeffi-
 cient between toxicity and bioassessment data was noted
 in the Five Mile Creek study.
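
 The instream waste concentration discussed for Five Mile Creek is the share of fully mixed stream flow made up of effluent. The sketch below computes it and compares it with an acceptable concentration supplied by the user. The flows and the AEC value are hypothetical, because the source does not report them here, and the function names are illustrative only.

```python
def instream_waste_concentration(effluent_flow, upstream_flow):
    """Percent of the fully mixed downstream flow made up of effluent.

    Both flows must be in the same units.
    """
    total = effluent_flow + upstream_flow
    if total <= 0:
        raise ValueError("total flow must be positive")
    return 100.0 * effluent_flow / total


def iwc_exceeds_aec(iwc_percent, aec_percent):
    """True when the instream waste concentration is above the AEC."""
    return iwc_percent > aec_percent


# Hypothetical flows giving an effluent share of 8% of stream flow.
iwc = instream_waste_concentration(effluent_flow=8, upstream_flow=92)
print(f"IWC = {iwc:.1f}% of stream flow")
print(iwc_exceeds_aec(iwc, aec_percent=3.0))   # hypothetical AEC of 3%
```

 The calculation assumes complete mixing, which is unlikely at sites immediately below an outfall, where local effluent concentrations can be higher than the fully mixed IWC.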

 5.5.3 Skeleton Creek
 The Skeleton Creek study consisted of ten  sites where
 ecological surveys (Norberg-King and Mount, 1986) were
 performed and water samples collected for toxicity analy-
 sis. Sites were on Skeleton Creek or its tributary, Boggy
 Creek. During the study there were two major discharges,
 a refinery on Boggy Creek and a fertilizer manufacturing
 facility on Skeleton Creek a short distance below the con-
 fluence with Boggy Creek. On Boggy Creek there were
 two sampling sites above the refinery discharge point and
 one 200 meters below this point. On Skeleton Creek there
 was one site above the confluence with Boggy Creek and,
 thus, above the fertilizer plant discharge point. There were
 also sites immediately below the confluence with Boggy
 Creek and 300 meters below the outfall of the fertilizer
 plant. Five sites were at various distances downstream of
 the fertilizer plant on Skeleton Creek.

 The 7-d larval fathead minnow and Ceriodaphnia toxicity
 tests were used to assess effluents, as well as ambient
 water samples collected at  each of the sites.  Larval
fathead minnows were more sensitive than cladocerans to
the effluents from both the refinery and the fertilizer plant.
 Ten percent effluent from both facilities yielded statistically
significant larval fathead responses.

 Statistically significant larval fathead minnow mortality was
seen only in the ambient water sample collected immedi-
ately  below the fertilizer  plant outfall.   Statistically
significant larval fathead minnow growth inhibition was
seen only in ambient water samples collected immediately
below the refinery and fertilizer plant outfalls.
 These toxicity test results were consistent with the fish
 population data; the site immediately below the fertilizer
 plant was the only station where there were no fish.

 The fact that there were no fish collected at the site below
 the fertilizer plant and that the ambient water sample from
 this site was the only ambient sample to cause significant
 larval fathead mortality (effluent from this facility was also
 the most toxic to larval fish) suggests an effective corre-
 spondence between toxicity test results and instream bio-
 logical responses. That this relationship was lost in the
 canonical  matrices (Marcus and McDonald,  1992) of
 toxicity and bioassessment metrics is not surprising.
 In a study done at the same time as USEPA's, Burton and
 Lanza (1987)  reported that microbial assays revealed
 toxicity in  ambient waters below the two discharge points
 and that these toxicity results were inversely correlated
 with instream biological community data.

 5.5.4 Ohio River
 In this study, a 12 km segment of the Ohio River was in-
 vestigated.  Within the study area there was a steel mill
 with multiple outfalls and a WWTP.  This study included
 eight sampling sites. One site was located above the steel
 mill and WWTP outfalls. Other sites were situated immedi-
 ately upstream and downstream of the outfalls. The last
 river site was approximately 2.5 km downstream of the last
 steel mill outfall.

 Planktonic and benthic macroinvertebrate data were col-
 lected at each site only once. Ambient water samples from
 these sites were tested only once  with the  7-d larval
 fathead minnow and Ceriodaphnia toxicity tests; effluents
 were not tested.

 None of the surface water samples yielded significant
 toxicity to  Ceriodaphnia in a 7-d test.  The larval fathead
 minnow toxicity test results were variable and inconsistent.
 Examination of the plankton data revealed little correspon-
 dence  to  points   of  discharge.    The   benthic
 macroinvertebrate data indicated possible impacts only at
 sites immediately below steel mill outfalls. These potential
 impacts were not predicted by the Ceriodaphnia or larval
 minnow toxicity tests.  If the instream biological responses
 were ecologically meaningful, they were underestimated
 (i.e., yielded false negatives) by the USEPA toxicity tests.

 Given the above observations, it is not particularly
 surprising that the canonical correlations (Marcus and McDonald,
 1992) did not identify a statistically significant relationship
 between toxicity test results and instream biological
 measurements, because neither varied greatly.  Therefore,
failure to find a significant canonical correlation in this
study should not be used to discredit the USEPA toxicity
tests, since there was little gradient in either the toxicity or
biological community variables.

 5.5.5 General Comments Regarding the Four CETTP
 Studies Summarized
 After reviewing the four CETTP studies in which the
 Marcus and McDonald (1992) canonical correlation did not
 find a statistically significant correlation between a matrix
 of toxicity test results and a matrix of bioassessment
 metrics, it is not surprising that a statistically significant
 relationship was not identified. Moreover, in three of the
 studies, consolidation of the data obscured clear
 relationships between effluent/ambient water toxicity and instream
 measurements. Furthermore, Marcus and McDonald
 (1992) argued against placing too much value on the use
 of statistical significance and emphasized the high
 correlation coefficient values identified in their analysis of the
 CETTP and associated studies data. In the fourth study
 (Ohio River), significant toxicity was infrequent and differences
 in instream parameters were minimal, conditions that are not
 ideal for demonstrating statistically significant correlations.
 It would be incorrect to suggest that the four CETTP studies
 described above constitute evidence that the USEPA toxicity
 tests are unreliable qualitative predictors of instream
 biological community responses.

 6.0 Criticisms of CETTP and  Associated
 Studies
 A group of authors (Parkhurst et al., 1990; Marcus and
 McDonald, 1992; Parkhurst, 1995, 1996) has criticized the
 CETTP and associated studies.  The criticisms generally
 relate to design and analysis considerations, most of which
 are stated below. These publications consist of criticisms
 of the CETTP and associated studies and do not provide
 additional data regarding the predictiveness of the USEPA
 toxicity tests results. Moreover, empirical evidence which
 suggests that the USEPA toxicity tests are not reliable
 qualitative predictors of instream impairments has not been
 provided. Criticisms of the CETTP studies are stated and
 discussed below.

 6.1 CETTP Studies Compared Ambient Water Test
 Results with Bioassessment Variables
 A major criticism of the CETTP studies was that compari-
 sons were made between ambient water rather than
 effluent toxicity test results and biological community  re-
 sponses. The implication appears to be that abiotic and
 biotic factors other than  dilution can  mitigate effluent
toxicity. Parkhurst (1995) suggests that a missing link in
these studies was to connect surface water with effluent
toxicity.

 Discussion: Effluent toxicity was measured in seven of the
eight CETTP studies and, although statistical correlations
were not performed,  effluent toxicity corresponded with
ambient water toxicity and ecological responses.  This
criticism fails  to recognize that the most probable cause
(critics point this out, see Section 6.6) of toxicity in the
streams/rivers investigated was discharged effluents.
 In the seven CETTP studies where effluent toxicity was
 measured, ambient water was not significantly toxic at
 sites above discharge points (or it was less than below
 discharge points).   Where  effluent toxicity was  noted,
 ambient water toxicity was generally seen at sites below
 the discharge point when dilution was taken into consid-
 eration. Furthermore, in most of the seven CETTP studies
 when effluent toxicity was identified there tended to be
 gradients  (i.e.,  greatest toxicity  immediately  below
 discharge points, with progressively lower levels of toxicity
 at sites downstream) of ambient water toxicity below points
 of discharge. Also, where there was effluent toxicity there
 was generally evidence of instream impairments below the
 discharge  points   when   dilution  was   taken  into
 consideration.  Although statistical correlations were not
 performed between effluent and ambient water toxicity (or
 instream biological measurements), it seems that effluent
 was responsible for ambient water toxicity and ambient
 water toxicity  was the  major  cause  of  instream
 impairments.

 6.2  Nonrandom  Selection of Study Areas and
 Sites
 Another major criticism of the CETTP studies is that study
 areas and sampling sites were not selected  randomly.
 Because of this, the contention is that findings  cannot be
 extrapolated using  statistical-based  induction to other
 aquatic ecosystems and, secondly, there was not a strong
 statistically based experimental design. A corollary to this
 criticism  is that USEPA intentionally selected rivers and
 streams where there were likely to be water quality prob-
 lems caused by discharged effluents.

 Discussion:  This criticism has merit and should be con-
 sidered when evaluating the  CETTP data. Design of the
 CETTP studies was not perfect from a statistical analysis
 standpoint.  More upstream (control) sites would have
 been desirable. Some argue that all sites below a
 discharge point represent pseudoreplicates. However, as a
 practical matter, limited funds and other resources require
 regulatory agencies to focus on areas where there are
 likely to be environmental problems so there can be
 remediation and restoration. Indeed, the idea was to study
 streams potentially impacted by effluent toxicity. It has not
 been the focus of regulatory agencies to study areas which
 are pristine or which have a low probability of water quality
 problems. Moreover, the intent of the CETTP studies was
to examine the relationship of probable effluent toxicity and
potential instream toxicity, as well as biological community
 responses.

 Random  selection of study areas would have resulted in
investigations of rivers and streams where there were no
discharges and  possibly waterbodies known to receive
effluents free of toxicity. A recommendation has not been
 advanced  as to  the  number and  types of aquatic
 ecosystems which should be studied before a consensus
 can be achieved on the effectiveness, or lack thereof, of
 single species toxicity test results in predicting qualitative
 ecosystem responses.  USEPA (1991) suggests that it is
 reasonable to assume that, in the absence of data showing
 otherwise, the relationship between ambient water toxicity
 and aquatic ecosystem impacts is independent of
 waterbody type.

 Random selection of sites on a stream would result in con-
 founding factors.   For comparison among sites or to a
 reference site, all sites should be equivalent,  including
 physical/chemical habitat and substrate; with this control
 the  major variable would be the  potential of  chemical
 toxicity from point or nonpoint sources. Random selection
 of sites could also introduce the confounding factor of non-
 chemical, anthropogenic effects on biotic communities.

 The criticisms that more sites (controls) upstream of
 discharge points were necessary, that more non-impacted
 sites were necessary for an acceptable statistical design,
 and that all sites below discharge points were
 pseudoreplicates have some merit, but also disregard
 some facts and observations. In design of the CETTP and
 associated studies, it was unknown whether or not sites
 downstream of discharge points would show ambient water
 toxicity; whether the ecological surveys would indicate
 that these sites were impacted also was unknown.
 While, from a purely statistical standpoint, the
 sites downstream of discharge points could be considered
 pseudoreplicates, this criticism fails to recognize that in a
 majority of the CETTP and associated studies there were
 progressive gradients of decreasing ambient water toxicity
 below discharge points which corresponded with progres-
 sive gradients of "improvements" in biological community
 indices.

 The criticism that there was limited statistical correlative
 analysis in the original CETTP publications is valid.
 However, as indicated above, this was a consequence of
 relatively small sample sizes (i.e., number of sites in each
 study).   This statistical  analysis  criticism has  been
 addressed in part by the Dickson et al. (1992) and Marcus
 and McDonald  (1992)  analyses.   As indicated  in
 Section 5.4 above, there are other ways that the CETTP
 data could be grouped and statistically analyzed.

 6.3  Use of the Most Sensitive   Toxicity  Test
 Results
 Marcus and McDonald (1992) called attention to the use
 in two CETTP studies (Norberg-King and Mount, 1986;
 Mount et al., 1986) of data from the most sensitive of two
toxicity tests to  relate with  the  most sensitive
 bioassessment measurements.

 Discussion:  The USEPA procedure has biological and
 statistical limitations; however, it also has some logic from
 an ecological  perspective.  Because  sensitivities of
 different test organisms vary with the toxic chemical or
 combination of chemicals, the occurrence and combina-
 tions of toxic chemicals can vary along a stream, and
 assemblages of organisms change  along a  stream, it
 seems ideal to test with a suite of species and then relate
 these data to instream biological community variations.
 Likewise,  different   components   of  the  instream
 communities are likely to respond to different chemicals or
 combinations of chemicals.

 The limited responses (only two USEPA toxicity tests)
 measured in the laboratory toxicity tests, compared to the
 multiple responses in aquatic ecosystems, necessitate
 that all possible relationships be explored. Therefore,
 while the limitations of using maximum responses are
 recognized, such responses may provide insights into
 interactions of toxicity and community responses.  Because
 of the extremely limited number of species and biological
 endpoints represented in the USEPA toxicity tests, there
 has been a tendency for regulatory conservatism (use of
 results of the most sensitive species).  Whether or not
 this conservatism is completely justified remains to be
 determined; however, the results of this review show that
 laboratory single species tests more frequently yield
 reliable predictions, or underestimates, of biological
 community responses than overestimates of impacts.

 6.4 Relationship Between Toxicity Test Results
 and Instream Biological Measurements Relied
 Heavily on High Magnitude Toxicity
 Another criticism of the CETTP conclusions is that the
 correspondence between  ambient  water toxicity and
 ecosystem community impairments relied extensively on
 areas and sites where toxicity was relatively high.

 Discussion: There is merit to this criticism, but the overall
 significance is uncertain since toxicity theory is based on
 a concentration-response relationship (i.e., a greater
 response with higher toxicity). There should be no surprise
 that higher levels of toxicity (enough to cause lethality) in
 ambient or effluent water samples can yield measurable
 responses in ecosystem parameters. Furthermore,
 biological responses, like all measurements, are less
 reliable near detection limits. "False positives" are of
 greater concern in situations where surface water or
 effluent toxicity is relatively low and near detection
 limits. Reliably detecting biological community impairments
 when the concentrations of toxic chemicals are near the
 effect thresholds is difficult; detection of such
 impairments also will be obscured by the complexity and
 natural variability in aquatic ecosystems. It should be
 emphasized that, in the CETTP studies, toxicity test
 "predictions" were based on effects (including sublethal)
 in the 7-d early life stage tests.

 6.5 Temporal Repeatability of the Ambient Water
 Toxicity/Biological   Response   Was   Not
 Demonstrated
 The CETTP studies did not confirm through time the corre-
 spondence  of  surface  water  toxicity with  instream
 biological variables.

 Discussion:  There is some validity in this criticism, yet
 there can be wide temporal variations in effluent and ambi-
 ent toxicity. Temporal variations in the relationships be-
 tween toxicity and biological community parameters were
 considered in some of the CETTP and associated studies
 (Dickson et al.,  1989;  Mount et  al., 1984; Mount et al.,
 1985). Defining the magnitude, duration, and frequency of
 effluent/ambient water toxicity is important. Understanding
 natural seasonal variations in aquatic biological
 communities is essential when attempting to relate these
 variables to potential controlling factors.  Significant
 variations in stream flow and physico/chemical factors can
 also influence the relationship between effluent toxicity
 and biological community responses and must be considered
 in describing a temporal relationship.  Failure to
 demonstrate a statistically significant correlation between
 effluent/ambient water toxicity and biological community
 responses throughout the year does
 not discount the possibility of ecosystem impairments from
 toxic chemicals (from point or nonpoint sources) during
 portions of the year. The issue of temporal repeatability of
 the relationship between effluent or ambient water toxicity
 and biological community responses has been addressed
 by  Dickson et al. (1989, 1996).

 6.6 Confounding Factors Were Not Considered
 Parkhurst and associates (Parkhurst, 1995, 1996;
 Parkhurst et al., 1990) suggested that several factors other
 than ambient water toxicity could have affected biological
 communities, but were not considered in the CETTP studies.
 They contend that both natural (e.g., poor habitat, low
 oxygen, nutrient enrichment, organic enrichment, natural
 seasonal variations) and non-effluent, anthropogenic factors
 could have  been responsible for biological community
 changes in the CETTP studies.

 Discussion:  While contending that confounding factors
 were not considered, these authors also point out that dis-
 charged effluents were the most probable cause of water
 quality problems. If their confounding factors theory is cor-
 rect one  would expect a high percentage  of  "false
 negatives" (toxicity test results predict no instream impact,
 but impact measured) in  the CETTP and associated
 studies.  However, "false negatives" were noted in only
 6.3% of  the  160 sites in  the  CETTP and associated
studies.

 Irrespective of potential confounding factors, statistically
significant canonical correlations were seen between ambi-
ent water toxicity test results and biological community
 responses (Dickson et al., 1992; Marcus and McDonald,
 1992).  The criticism of confounding factors appears to
 disregard observations in the CETTP and associated studies
 that revealed impairments on a progressive gradient below
 effluent discharge points (i.e., the greatest impairments
 were at sites nearest the discharge point, decreasing with
 distance from the discharge point). The argument regarding
 confounding factors makes little biological sense given
 that CETTP sites upstream of discharge sites generally
 indicated "healthy" communities, whereas sites below
 discharge points (which showed toxic effluents) tended to
 suggest impairments.

 The high frequency of accurate predictions in the Dickson
 et al. (1992) classification system of instream biological
 responses based on toxicity test  results in the CETTP and
 associated studies is rather surprising  given that these
 relationships were based on the results of single, or few,
 toxicity tests with a single bioassessment index (which
 tends to be temporally  integrative, but which does  not
 incorporate natural variations).

 6.7 Was the CETTP Classification System
 Mathematically Biased?
 Marcus and McDonald  (1992)  criticized the procedure
 used in some of the CETTP studies for identifying correct
 predictions of biological  impairments based on toxicity
 testing data.

 Discussion: This criticism appears accurate.  No
 consistent method was used throughout the CETTP studies to
 select correct and incorrect predictions. Based on these
 CETTP comparisons, some studies concluded that the
 degree of toxicity was related to the degree of instream
 taxa reduction. The use of various analyses of the data
 appears to have been an attempt to convert a
 qualitative relationship between toxicity test results and
 instream biological responses to a quantitative one.

 6.8 High Rate of False Positives
 Parkhurst (1992) suggested that the rate of "false
 positives" (toxicity test results predict instream impact, but
 no impact observed) was 68% in the CETTP, South Elkhorn
 Creek (Birge et al., 1989), and Trinity River (Dickson et
 al., 1989) studies, and 23% in the North Carolina
 (Eagleson et al., 1990) study.

 Discussion: Using all available data, the actual rates of
 "false positives" were 9.4% and 7%, respectively, in the
 CETTP/Associated studies and the North Carolina study.
 Parkhurst's values are based on only a portion of the data
 collected in all of the studies: the sites identified as not
 impacted. While there may be some value in the approach
 presented by Parkhurst (1992), it certainly ignores a very
 large portion of the data collected.
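
 The gap between the two sets of "false positive" rates is
 largely a matter of which denominator is used. The sketch
 below is illustrative only. The total of 160 sites comes
 from the text; the count of 15 false positive sites is
 back-calculated from 9.4% of 160; the count of 22
 non-impacted sites is a hypothetical value chosen for
 illustration and is not a figure reported by Parkhurst
 (1992) or in the CETTP studies.

```python
def percent_rate(count, denominator):
    """Return count expressed as a percentage of denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * count / denominator


# Only TOTAL_SITES is taken from the text; the other two are assumptions.
TOTAL_SITES = 160
FALSE_POSITIVE_SITES = 15   # back-calculated: about 9.4% of 160
NON_IMPACTED_SITES = 22     # hypothetical, for illustration only

# Rate when every site studied is counted in the denominator.
rate_all_sites = percent_rate(FALSE_POSITIVE_SITES, TOTAL_SITES)

# Rate when only sites judged "not impacted" are counted.
rate_non_impacted = percent_rate(FALSE_POSITIVE_SITES, NON_IMPACTED_SITES)

print(f"All sites as denominator:          {rate_all_sites:.1f}%")
print(f"Non-impacted sites as denominator: {rate_non_impacted:.1f}%")
```

 The same number of misclassified sites yields a much larger
 percentage when the denominator is restricted to the
 non-impacted subset.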

6.9 Miscellaneous Criticisms
Some criticisms of the CETTP studies do not relate directly
to those investigations. These criticisms include:
*  the size and assimilative capacity of the receiving
   waterbody are not considered when evaluating WET test
   results,
*  the duration of exposure in aquatic ecosystems, relative
   to test duration, is not considered in the evaluation of
   WET test results, and
*  actual effluent dilution and flow conditions are not
   usually considered in the evaluation of USEPA toxicity
   test results.

Discussion: Some of these criticisms have merit, yet they
are less concerned with the reliability with which
USEPA toxicity test results predict ecosystem responses
than with the possibility that the results of a single (or a
few) toxicity tests could be used as evidence of an effluent
permit violation (i.e., they represent potential
implementation problems). Certainly, such factors must be
considered and incorporated into risk assessments.

6.10 Conclusions
The CETTP studies suffered from some design and
interpretive problems. However, even critics of the CETTP and
associated studies tend to agree that there is a good
qualitative relationship between USEPA toxicity test results
and aquatic ecosystem community responses. These critics
correctly assert that a quantitative relationship has not
been established.  Although critical of the CETTP and
associated studies, Parkhurst et al. (1992) accept that
these studies demonstrate that, if adequate consideration
is given to effluent dilution, USEPA toxicity test results
should be reliable predictors of ecological impairments.
What appear to be lacking in the criticisms of the CETTP
studies are: 1) experimental data which indicate that single
species (EPA toxicity tests) test results are more frequently
unreliable than reliable predictors of ecosystem
impacts, and 2) suggestions for effective alternatives to the
single species tests.

Recognizing that ecosystems are complex and
multivariate, with many interacting factors, and that sample
sizes were rather small, it is not surprising that the CETTP
and associated studies did not establish a quantitative
relationship between USEPA toxicity test results and
biological community responses. However, the qualitative
association established was convincing enough to accept
the results as predictive of probable biological impacts.  If
a series of ambient water or effluent water tests produces
statistically significant toxicity in the USEPA toxicity tests,
some degree of ecosystem impairment is likely. Since the
USEPA toxicity tests provide an early warning and are
predictive of probable aquatic ecosystem impairments, it
is not essential that they be highly quantitative predictors
of biological community impacts.

7.0 Single Species Tests with Effluent
Investigations in which effluents were tested with single
species toxicity tests and in which some ecological survey
data were collected from the receiving stream for compara-
tive purposes were reviewed. A summary of these reviews
is presented in Appendix A.  Studies reviewed in this
Appendix, as well as in Appendices B and C, were located
through literature searches. All studies related to the topic
were reviewed; none were screened out.  These studies
represent a special  concern  because of the  criticism
related to the correspondence between single  species
toxicity test results and ecosystem responses.

Appendix A summarizes 13 publications and the tabula-
tions presented below are by study (i.e., by the outcome
of the entire study, not by subcomponents within studies).
In nine (69%) of the 13 studies, early life stage test
NOEC/LOECs from effluent tests provided reliable
qualitative predictions of instream impairments. In three
(23%) studies, early life stage effluent test NOEC/LOECs
underestimated instream responses.  Results from one
study were inconclusive, owing to study design and
interpretive inconsistencies. Based on effluent toxicity test
results, no overestimations of instream impacts were noted
in these 13 studies.
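
The by-study tabulation above can be reproduced with a few
lines of arithmetic. The sketch below uses only the counts
given in the text (nine, three, one, and zero of 13 studies);
the category labels are shorthand introduced here for
illustration.

```python
# Outcome counts for the 13 effluent studies, as stated in the text.
outcomes = {
    "reliable qualitative prediction": 9,
    "underestimated instream response": 3,
    "inconclusive": 1,
    "overestimated instream impact": 0,
}

total_studies = sum(outcomes.values())

# Percentage of studies in each category, rounded to whole percent.
percentages = {
    label: round(100 * count / total_studies)
    for label, count in outcomes.items()
}

for label, pct in percentages.items():
    print(f"{label:35s} {outcomes[label]:2d} of {total_studies}  ({pct}%)")
```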

The 13 studies summarized in  Appendix A, as well as the
Eagleson et al. (1990) study discussed above,
demonstrate that single species toxicity test results on
effluents can provide reliable qualitative predictions of
biological community responses or tend to underestimate
ecosystem impairments.

8.0 Single  Species Tests  with  Individual
Chemicals or Small Groups of Chemicals
Studies in which single species toxicity tests were used to
assess the toxicity of a single chemical or a small combina-
tion of chemicals and predict aquatic ecosystem biological
responses were evaluated and summarized  in Appendix
B, which  is subdivided into sections on pesticides, other
organic chemicals, metals and miscellaneous substances.

8.1 Organic Chemicals: Pesticides
Eighteen studies dealing with pesticides are summarized
in Appendix B. The most studied pesticide in this group of
investigations  is the  organophosphorus  insecticide
chlorpyrifos (seven studies). In  14 (78%) of the 18 studies,
single species laboratory toxicity test results reliably
predicted direct field adverse effect concentrations.  In many
of the studies the single species laboratory tests failed to
predict the secondary (indirect) effects seen in the field
experiments, such that biological community effects were
underestimated by the laboratory single species toxicity
test results. In four of the studies reviewed in Appendix B
the   laboratory   single  species  toxicity  test  effect
concentrations overestimated the field effect concentration
(i.e., the laboratory single species data underestimated the
biological community  responses).   Although use  of
daphnids in laboratory tests has been  criticized by some
because they are indicator, rather than resident species,
data in  12 of the 18 studies suggest  that daphnids are
reliable  (or tend to underestimate aquatic ecosystem
impacts) predictors of a biological community response.

8.2 Organic Chemicals:  Nonpesticides
Eleven investigations of organic chemicals were reviewed
and summarized in Appendix B. Laboratory single species
toxicity test results were reliable predictors of biological
community effect concentrations in seven (64%) of the
eleven studies.  In most of the studies in which
laboratory effect concentrations were considered reliable
predictors, single species test results were somewhat
higher than the field effect concentrations (i.e., biological
communities were somewhat more sensitive to chemicals
than predicted by the laboratory tests).  Laboratory toxicity
tests overestimated field effect concentrations in two (18%)
studies.   Results of two  studies were inconclusive or
mixed.

8.3 Metals
Ten studies dealing with metal toxicity are reviewed in Ap-
pendix B. Results of five (50%) of the ten studies suggest
that laboratory single species test effect concentrations are
reliable qualitative predictors of biological community effect
concentrations and responses.  In four (40%) of the studies
laboratory single species effect concentrations were
notably higher than field effect concentrations (i.e., laboratory
single species tests underestimated aquatic ecosystem
impacts). One of the ten studies was inconclusive.

8.4 Other Data and Views of Predictiveness  of
Single  Species Test Results
Persoone and Janssen (1994) submitted that environmen-
tal factors may notably  modulate toxicity  (e.g.,  alter
bioavailability) as measured in laboratory tests. A majority
of the studies, with the exception of investigations on met-
als, summarized in Appendix B do not support that claim.
Speculations that laboratory toxicity test results estimate
effect concentrations (e.g., LOECs, NOECs) that are con-
siderably below instream effect concentrations have been
voiced, but most of the data reviewed herein fail to support
those conjectures.

La Point (1994) concludes that direct, but not secondary,
responses of fish in ecosystems can be predicted from
laboratory single species test results.  Luoma (1995)
suggests that accurate predictions of metal impacts based
on single species test results are rare. Luoma (1995) also
wrote, "As toxicity tests are increasingly used in
contaminant management, reliance on insensitive
procedures dominated by type II error (false negative) will
lead to regulations that underprotect nature."  Luoma
(1995) listed the uncertainties in single species tests which
result in underestimation of impacts due to metals on
biological communities.  These sources of uncertainty
include:

  *  choice of species (sensitive and ecological
     keystone species unrepresented),
  *  exposure time (underestimated),
  *  exposure route (rarely considered),
  *  multigenerational life cycle (unrepresented),
  *  higher-order secondary effects (rarely considered),
     and
  *  interaction with natural disturbances (rarely
     considered).

  Table 2. Equations showing relationships between
           laboratory (single species) and ecosystem
           determined endpoints (data from Slooff et al.,
           1986)

  Using acute toxicity data the following equation was
  derived:

      log NOEC(ecosystem) = -0.55 + 0.81 log LC50(single species tests)

      In this case, n = 54, r = 0.77, and the
      uncertainty factor was 85.7.

  Using chronic toxicity data the following equation was
  derived:

      log NOEC(ecosystem) = 0.63 + 0.85 log NOEC(single species tests)

      In this case, n = 51, r = 0.85, and the
      uncertainty factor was 33.5.

 Margins of uncertainty in predicting toxicity from laboratory
 single species tests to higher levels of biological organiza-
 tion were  determined by  regression  and  correlation
 analyses (Slooff et al., 1986).  Analyses were performed
 on  log-transformed data. The 95% uncertainty factors
 were  determined as the  minimum ratio of the estimated
 toxicity value and  its  upper and lower 95% confidence
 (prediction) limits.

 The regression  analysis consisted of  regressing eco-
 system-determined effect concentrations on laboratory
 single species toxicity test  effect concentrations.   The
 uncertainty factor was defined as the minimum ratio of the
 estimated effect concentration and its 95% prediction limit.
 So, the smaller the value of the uncertainty factor, the more
 reliably would single species toxicity test results predict
 biological community effect concentrations.
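
 One way to read this definition is sketched below. It
 assumes an ordinary least-squares fit on log10-transformed
 data and takes the "minimum ratio" to be the ratio at the
 mean of the log-transformed laboratory values, where the
 prediction interval is narrowest. The function names, the
 example data, and the caller-supplied critical t value are
 all hypothetical; Slooff et al. (1986) may have computed the
 factor differently.

```python
import math


def fit_log_log(lab_values, field_values):
    """Least-squares fit of log10(field) on log10(lab).

    Returns (intercept, slope, residual_sd, n).
    """
    x = [math.log10(v) for v in lab_values]
    y = [math.log10(v) for v in field_values]
    n = len(x)
    if n < 3:
        raise ValueError("need at least three paired values")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(sse / (n - 2))
    return intercept, slope, residual_sd, n


def uncertainty_factor(residual_sd, n, t_crit):
    """Ratio of the 95% prediction limit to the estimate at mean log10(lab).

    t_crit is the two-sided 95% critical t value for n - 2 degrees of
    freedom, supplied by the caller (an assumption of this sketch).
    """
    half_width_log10 = t_crit * residual_sd * math.sqrt(1.0 + 1.0 / n)
    return 10.0 ** half_width_log10


# Hypothetical paired effect concentrations lying exactly on a line.
lab = [1.0, 10.0, 100.0, 1000.0]
field = [2.0, 20.0, 200.0, 2000.0]
intercept, slope, sd, n = fit_log_log(lab, field)
factor = uncertainty_factor(sd, n, t_crit=4.303)  # t(0.975, df=2)
print(f"slope={slope:.3f} intercept={intercept:.3f} factor={factor:.3f}")
```

 With perfectly collinear data the residual scatter is zero
 and the factor collapses to one; real data with more scatter
 give larger factors.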

 Using acute toxicity data for 34 chemicals, the
 relationships shown in Table 2 were determined. Slooff et al.
 (1986) concluded that data from laboratory single species
 toxicity tests  are reliable  enough for ecological risk
 assessments.
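
 As a rough illustration of how the Table 2 equations might
 be applied, the sketch below assumes base-10 logarithms and
 that laboratory and ecosystem values share the same
 concentration units (neither is stated in this section).
 Treating the uncertainty factor as a multiplicative band
 around the estimate is likewise an interpretation, and the
 function names are hypothetical.

```python
import math

# (intercept, slope, uncertainty factor) as given in Table 2.
ACUTE = (-0.55, 0.81, 85.7)    # predictor: single species LC50
CHRONIC = (0.63, 0.85, 33.5)   # predictor: single species NOEC


def predict_ecosystem_noec(lab_value, coefficients):
    """Estimate the ecosystem NOEC from a single species test value."""
    if lab_value <= 0:
        raise ValueError("concentration must be positive")
    intercept, slope, _ = coefficients
    return 10.0 ** (intercept + slope * math.log10(lab_value))


def uncertainty_band(lab_value, coefficients):
    """Return (lower, estimate, upper), assuming a multiplicative band."""
    _, _, factor = coefficients
    estimate = predict_ecosystem_noec(lab_value, coefficients)
    return estimate / factor, estimate, estimate * factor


# Hypothetical laboratory LC50 of 100 concentration units.
low, est, high = uncertainty_band(100.0, ACUTE)
print(f"acute-based estimate: {est:.2f} (band {low:.3f} to {high:.1f})")
```

 The width of the band, spanning several orders of magnitude
 for the acute equation, shows why these relationships are
 better suited to qualitative than to quantitative prediction.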

 The studies summarized in Appendix B suggest that labo-
 ratory single species test results afford a reliable qualitative
 prediction  (are  reliable  for extrapolations)  of  aquatic
 biological community responses or of environmental effect
 concentrations. Tabulation of the 47 studies (tabulation is
 by outcome of the entire  study) reviewed in Appendix B
 yields the results presented in Table 3.

 Single species toxicity test results usually provide enough
 information to take action.  These tests can be used to
 determine whether concentrations of chemicals in a water
 sample are sufficient to affect biological functions. Subsequent
 action can be taken to determine the chemicals causing
 toxicity and/or the persistence and  magnitude of the
 toxicity in the effluent or the water body. Clearly, the
 results of a single toxicity test should not be equated with
 ecosystem impairment; a test result is not de facto,
 definitive proof of biological impairment.

Table 3. Summary of studies examining the relationship
         between laboratory single species test results and
         aquatic ecosystem responses (Appendix B).

Outcome                                            Percent of studies

Laboratory single species effect concentration            68%
provides reliable prediction of biological
community effect concentration and/or responses

Laboratory single species effect concentration >          23%
field effect concentration (single species test
underestimates biological community responses)

Mixed or inconclusive results                              9%

9.0  Comparison  of  Single   Species  and
Multiple  Species  (Microcosm,  Mesocosm)
Toxicity Test Results
Intuitively one might suspect that single species  toxicity
test results  would not  predict biological  community
responses as reliably as multiple species (this term is used
to include both micro- and mesocosm studies) test results.
 Direct comparisons have  not  been frequent, but five
 groups of authors  (Slooff, 1985; Emans et al.,  1993;
 Okkerman et al., 1993; Persoone and Janssen,  1994;
 Dorn,  1996)  have  published literature reviews which
 address this issue.

 9.1 Okkerman et al. (1993)
 NOECs from single species and multiple
 species tests were compared by Okkerman et al. (1993)
 in an endeavor to gain insight into whether aquatic
 ecosystems can be protected by setting a "safe" concentration
 derived from single species toxicity test results.
 To achieve this, Okkerman et al. (1993) performed an
 extensive literature search to locate all available multiple
 species studies.  These studies were then screened against
 rigorous criteria to identify the multiple species studies
 considered to be reliable.  Some important criteria were
 that a study had to include  several taxonomic groups in
 fairly realistic ecosystems, the concentration of the chemi-
 cal had to be analytically verified, and a concentration-
 response relationship had to occur. NOECs from the multi-
 ple species studies were for direct effects only.

 For those compounds where a multiple species NOEC was
 considered reliable, the authors searched for single
 species tests with an NOEC they considered reliable. Data
 were sufficient and reliable enough to make the multiple
 species and single species comparison for only ten
 organic compounds, most of which were pesticides. When
 more than one single species NOEC was available, the
 comparison was  made using the value for the  most
 sensitive species. The comparison was the ratio of the
 multiple species and single species NOECs.  The closer
 the ratio was to one, the less divergent the single species
 and multiple species NOECs.
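
 The comparison described above can be sketched as follows.
 The function name and the example NOEC values are
 hypothetical and are not data from Okkerman et al. (1993);
 the only elements taken from the text are the use of the
 most sensitive single species NOEC and the ratio of the
 multiple species NOEC to it.

```python
def noec_ratio(multiple_species_noec, single_species_noecs):
    """Ratio of the multiple species NOEC to the lowest single species NOEC."""
    if multiple_species_noec <= 0 or not single_species_noecs:
        raise ValueError("need a positive NOEC and at least one lab value")
    # The most sensitive species is the one with the lowest NOEC.
    most_sensitive = min(single_species_noecs)
    if most_sensitive <= 0:
        raise ValueError("NOECs must be positive")
    return multiple_species_noec / most_sensitive


# Hypothetical values in matching concentration units.
ratio = noec_ratio(10.0, [4.0, 2.0, 8.0])
print(f"multiple species / single species NOEC ratio: {ratio:.1f}")
```

 A ratio near one means the two kinds of test point to about
 the same "safe" concentration; a ratio above one means the
 most sensitive laboratory species responded at a lower
 concentration than the multiple species system.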

 For all ten chemicals, the ratio was five or less; for six of
 the compounds, the ratio was approximately one or less
 than one; for the remaining four chemicals the ratio ranged
 from 2.5 to 5. These investigators concluded that despite
 the general concept that effects  assessments should be
 conducted in  actual  aquatic ecosystems  or multiple
 species tests, in only a few cases did NOECs differ greatly
 between single species and multiple species tests. They
 also surmised that with some caution, due primarily to a
 paucity of data, single species toxicity test data are a good
 starting  point for  establishing "safe" concentrations for
 aquatic ecosystems.

 9.2 Emans et al. (1993)
The accuracy of extrapolating from single species toxicity
test results to aquatic ecosystem communities also was
examined by Emans et al. (1993). Their approach was to
compare NOECs derived from  multiple  species field
studies with those from single species toxicity tests. If field
multiple species toxicity test results were more reliable
predictors of how biological communities would respond
 to a chemical(s) than single species test results, then one
 would suspect that "safe" concentrations generated from
 these two different procedures would differ appreciably.

 After an extensive literature search, acceptable data for the
 comparison of single species and multiple species tests
 were identified for 29 chemicals. Based on statistical anal-
 yses, the authors concluded that "there seems to be no
 reason to believe that organisms differ in sensitivity under
 field and laboratory conditions." Moreover, when species
 tested in the multiple species experiments were compared
 with similar or related species in single species studies
 (given corresponding response parameters and equivalent
 exposure concentrations) their response/sensitivity to a
 given chemical appeared essentially equivalent. Results
 of this inquiry suggest that single species toxicity test
 results are  reliable predictors of  biological community
 responses.  With the caution that there are limited  data,
 these authors conclude that it is acceptable to derive "safe"
 concentrations from single species toxicity test data.

 9.3 Slooff (1985)
 Slooff (1985) reached the same conclusion as Okkerman
 et al. (1993) and Emans et al. (1993) regarding the
 equivalency of effect concentrations from single species and
 multiple species toxicity tests after reviewing the
 literature.

 9.4 Persoone and Janssen (1994)
 The potential of laboratory single species test  results to
 reliably predict biological community responses was
 examined in an extensive four-year interlaboratory study with
 four  chemicals  (copper,   atrazine,   lindane,   and
 dichloroaniline) by Persoone and Janssen (1994). NOECs
from  outdoor   stream  and  pond  microcosms  were
 compared with those from single species laboratory tests.
The NOECs from the field studies were within one order
 of magnitude  of the NOECs of  the  most  sensitive
 laboratory test, suggesting that the single species tests are
effective qualitative  predictors  of ecosystem effect
concentrations.

 In their review  of the literature on field "validation" of
predictions based on single  species toxicity  test  data
 Persoone and Janssen wrote, "One of the most striking
conclusions of this  literature  study  is that, in general,
NOECs derived from (a selected battery of) single species
laboratory tests relate relatively well to single species and
multiple species NOECs obtained in field studies."  Such
studies are not truly "validation", but they do argue that
abiotic and biotic factors in aquatic ecosystems do not
greatly modify effect concentrations  or bioavailability of
chemicals compared to laboratory tests.

9.5 Phluger (1994)
NOECs from multiple species field and single species labo-
ratory tests for ten pesticides were compared by Phluger
 (cited in Persoone and Janssen, 1994). For all ten pesti-
 cides there was less than one order of magnitude differ-
 ence between the single species laboratory and multiple
 species field test NOECs, suggesting that the single spe-
 cies test results were reliable qualitative predictors of bio-
 logical community responses.

 9.6 Dorn (1996)
 In three separate stream mesocosm experiments,  testing
 a homologous series of nonionic alcohol ethoxylate
 surfactants, Dorn (1996) found that laboratory single species
 toxicity test effect concentrations were within a factor of
 three of mesocosm effect concentrations. In summarizing
 a review  of the literature Dorn  (1995)  concluded that
 effects  observed in numerous mesocosm studies are
 consistent with  laboratory single  species toxicity test
 results when exposures are reconciled correctly.

 9.7 Crane (1995)
 In a Society of Environmental Toxicology and Chemistry
 (SETAC) News article, Crane (1995) states, "Such tests
 (mesocosms) cost several million dollars to perform, but
 the results
 obtained from them have shown no greater sensitivity or
 predictive power and certainly no greater interpretability,
 than considerably cheaper laboratory tests with  single
 species".  Crane refers to the reviews of Okkerman et al.
 (1993) and Emans et al. (1993) as support for his position.

 10.0 Alternatives to Single Indicator Species
 Tests
 If the desire is to continue with testing which can provide
 an early warning of probable aquatic biological community
 impairments while having good qualitative reliability in pre-
 dicting ecosystem responses, possible options to the exist-
 ing USEPA toxicity tests and other single species  proce-
 dures include: 1) single indigenous species tests  and 2)
 multiple indigenous species tests. Desirable test and end-
 point characteristics  for reliably predicting instream
 biological  community responses have been listed (16
 items) by  Cairns and Niederlehner (1995). No current
 single species test can meet these criteria.

 10.1 Tests with Single Indigenous Species
 A common criticism of the indicator species tests is that the
 species does not occur in a particular waterbody. The
 argument is that the indicator species test should  be re-
 placed with an indigenous species test. From a biological
 perspective, the use of an indigenous species is sound, but
 care  must be  taken in  the selection of a  replacement
species from the same phyletic group. Selecting an indige-
 nous species  from an  impaired or partially impaired
waterbody could be a mistake since that species  would
 represent  a species likely to have developed tolerance to
chemical stressors. In the case of impaired systems,
selecting a species that could live in such a habitat, or one that
previously did (according to historical data), may be necessary.
While  single indigenous species tests may decrease the
uncertainty  associated with extrapolating from  single
species test results to biological community responses, we
need evidence that such tests will significantly increase the
accuracy  of predicting  instream  impairments.  Such
indigenous species tests may  not significantly improve
predictive accuracy enough to justify the time, effort, and
cost of developing standard (with the essential QA/QC)
protocols with indigenous species for multiple watersheds.
Is it desirable, feasible, and cost effective to develop proto-
cols for indigenous species in each watershed or subre-
gions of watersheds? Furthermore, there is little evidence
that indigenous species test results are more reliable pre-
dictors of biological community responses in complex and
multivariate ecosystems.

Currently available single species tests can effectively re-
veal when there are significant levels of toxic chemicals in
an effluent or ambient water sample. The statistical proba-
bility that any one test species represents the most or the
least sensitive species, life stage, or endpoint in a given
ecosystem is very low. Persoone and Gillett (1990) con-
clude that  single indicator species toxicity tests do not
represent the most sensitive species or endpoints,  and
especially key components, in aquatic ecosystems. In fact,
one could argue that the single species tests (especially
the USEPA toxicity tests) have been effective predictors
of ecosystem responses because they manifest relatively
average sensitivities compared to most aquatic ecosystem
species and endpoints. Probability theory also advises us
that there can never be enough predictive potential in the
results of a single species toxicity test to encompass all
possible effects on ecosystem structure and function.
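
One way to illustrate the probability argument above: if a test species
is treated as a random draw from n species with distinct sensitivities
(a simplifying assumption made here for illustration, not a claim from
the studies reviewed), the chance that it is either the most or the
least sensitive is 2/n. The function name below is hypothetical.

```python
from fractions import Fraction

def prob_extreme(n_species):
    """P(a randomly chosen species is the most or the least sensitive of n)."""
    if n_species < 2:
        raise ValueError("need at least two species")
    # Two of the n equally likely sensitivity ranks are the extremes.
    return Fraction(2, n_species)

p_small = prob_extreme(20)    # assemblage of 20 species (illustrative size)
p_large = prob_extreme(100)   # assemblage of 100 species (illustrative size)
```

Under this assumption the probability shrinks quickly as the assemblage
grows, which is consistent with the statement that it is very low for
any realistic ecosystem.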

Luoma and Carter (1993) conclude that single species
toxicity test results, when combined with chemical
measurements and benthic community surveys, have shown
reliable qualitative relationships among toxicity test
responses, chemical concentrations, and changes in
biological community structure.  Slooff and Canton (1983)
asserted that the sensitivities in testing with three indicator
species (alga, daphnid, fish) effectively represented aquatic
organism sensitivity ranges for approximately 75% of the
chemicals they tested. Many of the pitfalls of developing
new single species tests are discussed by Luoma (1995).
Luoma is  not convinced that increasing the number of
standardized single species toxicity tests will improve the
accuracy of predicting ecosystem impacts.  In  reviewing
the validity of using indicator, rather than resident species
toxicity tests, Dorn (1996) suggested that use of  "new"
tests  with resident species may  not give   us  better
resolution of aquatic ecosystem responses than do the
well-developed indicator species tests.  Rather than
develop a host of new single species tests, Dorn (1996)
advises that a better use of resources would be to assure
that laboratory and field exposure regimes are comparable
(i.e., improve exposure assessments).

Chapman (1995a) concludes that the current standardized
single species test protocols do not represent the more
sensitive  ecosystem  endpoints;  he  also  notes  that
daphnids and fathead minnows are usually not the most
sensitive components in aquatic ecosystems.  Several
other authors (Persoone et al., 1990; Baird, 1992; Forbes
and Depledge, 1992; Clements and Kiffney, 1996; LaPoint
et al., 1996) also proposed that the USEPA toxicity tests
and   other  single  species  tests  most  frequently
underestimate effects in aquatic ecosystems. Examination
of USEPA's chemical-specific water quality criteria docu-
ments illustrates that the USEPA toxicity test species
(USEPA, 1994a,b) are not consistently among the most
sensitive species tested.

In combination with Toxicity Identification Evaluations
(TIEs) and chemical analyses, the current set of single
species toxicity tests appears to be effective in the
identification of toxicity, as well as its sources and causes.
Therefore, available funds and efforts could be focused on
improving these procedures rather than on developing a
host of  new  indigenous  species testing procedures.
Persoone et al. (1990) assert that despite limitations of
indicator single species tests, they have been extremely
useful  and reliable predictors of ecosystem responses.
Cairns and Mount (1990) conclude that developing toxicity
test  methodologies with "new"  aquatic  organisms is
probably not productive unless the response of this new
species has a high correspondence with responses of
many other aquatic species. Cairns and Mount state, "For
regulatory purposes, it is unquestionably sound to use test
organisms that have been widely used for toxicity testing
and whose strengths and weaknesses for this purpose are
well  known." There is little or no evidence that  use of
indigenous species in single species laboratory toxicity
tests will improve our ability to predict responses in the
field.

10.2 Tests With Multiple Indigenous Species
Several researchers have argued for the use of multiple
species  (micro/mesocosm)  tests rather than   single
species tests in regulatory settings.  The literature on
multiple species toxicity test strengths and limitations will
not be summarized here. Suffice it to say that the limita-
tions of these tests seem to be equivalent or greater than
for single species  toxicity tests (Dickson et al., 1985;
Mount, 1985; Slooff, 1985; Cairns et al., 1993; Cairns and
Smith, 1994; Dickson, 1995; LaPoint, 1995; Smith, 1995).
Generally, the designs of multiple species tests are highly
variable with no standardized protocols  or endpoints.
Multiple species tests are predisposed to  be ecosystem
specific,  which is a strength as well as a weakness.
Results of multiple species tests tend to be more variable
than those from single species toxicity tests. Factors that
increase complexity of toxicity tests may boost ecological
relevance, but result in greater variability, as well as less
reliability and repeatability.

Although  there is controversy on  this issue,  multiple
species tests have not been found to be more sensitive
than single species tests.  Information is increasing, but
"validation" of these multiple  species  test results with
ecosystem bioassessments  has  not  been  frequent.
Neuhold (1986) postulates that microcosm tests present
interpretation problems and are not likely to offer reliable
predictions of natural ecosystem impacts. Responses in
control systems are difficult to replicate and responses in
treatment groups tend to diverge greatly (Gearing, 1989).
Several groups of investigators (Cairns, 1983; Giesy and
Allred, 1985;  Slooff et  al., 1986; Luoma and  Ho,  1993)
conclude  that mesocosms are better suited to testing
process questions than to replicating nature. In regard to
multiple species tests Bailey (1995) concluded that "I am
not convinced that complex (i.e., multiple species) tests
accompanied  by simple models offer a  reduction in
(predictive) uncertainty over simple tests accompanied by
complex models."  Slooff (1985) argued that there is no
evidence that multiple species tests are more reliable in
predicting instream impacts than are  results of  single
species toxicity tests.

Although multiple species tests may have greater predic-
tive capacity than single species test results, they have
limitations which include:

+  There  are no standardized protocols and developing
   multiple species testing procedures will be costly. End-
   points have not been agreed upon. Designs, endpoints
   measured, and statistical analyses of multiple species
   tests vary widely, resulting in considerable debate re-
   garding the interpretation of every study.
+  Multiple species protocols may have to be altered or
   developed for each aquatic ecosystem.
+  Most multiple species tests are not designed to be early
   warning signals.
+  Multiple species tests tend to have high within and be-
   tween  test variability, and especially high between
   laboratory variability (more than for the single species
   tests).
+  Predictions of ecosystem responses based on multiple
   species test results will likely be qualitative rather than
   quantitative.

According to Giesy and Allred (1985), variability increases
with the size and complexity of multiple species study de-
sign. These authors contend  that replicability (ability to
establish more than one experimental unit within a particu-
lar experimental treatment) of multiple species tests is gen-
erally sufficient, but that the realism and accuracy in these
tests is largely unresolved.  Further, Giesy and  Allred
(1985) claim that repeatability (duplicating results of a test
at a later time) of multiple species tests has seldom been
examined.

The intent is not to discredit the importance of multiple
species tests. The point is that multiple species tests may
not be more reliable screening or predictive tools than are
single species tests. Multiple species tests are important
for providing fundamental information on structure and
function of aquatic ecosystems and for potential follow-up
on single species toxicity test data.

11.0 Studies in Ocean or Estuarine Settings
Although the relationship between ocean water toxicity and
water column biological community health has  not been
examined to any great extent, the link between laboratory
marine  sediment  toxicity  and biological  community
response has been studied. This review does not include
an exhaustive examination of the literature dealing with
marine sediment toxicity.  However, several studies (see
Appendix C) suggest that the results of sediment toxicity
tests are fairly reliable qualitative predictors of benthic
community responses.  In  a review of  the literature on
laboratory sediment toxicity testing with single species
Lamberson et al. (1992) concluded that, despite realized
and potential problems,  the  test results  have proven
"enormously  successful  as   both   research   and
management tools."  As Luoma and Ho (1993) conclude
in a review of the literature on sediment toxicity tests, it is
inappropriate to use data from single species tests alone
to quantitatively  predict  specific  aquatic ecosystem
impacts.

Appendix C includes summaries  of ten studies in which
laboratory single species toxicity test results on marine
sediment samples were evaluated in terms of predicting
benthic biological community responses. In all ten of these
studies, the laboratory sediment tests were reliable qualita-
tive predictors of benthic community effects; the laboratory
tests tended to underestimate the extent of benthic
community impacts.

Richardson and Martin (1994) critiqued the strengths and
shortfalls of using ocean and estuarine toxicity testing as
a procedure for evaluating potential water quality impacts.
While constraints, including laboratory-to-field verification,
of single species toxicity testing are thoroughly discussed,
these authors strongly advocate toxicity water
quality standards and toxicity  monitoring, similar to that
outlined  in  the  California  Ocean  Plan (State Water
Resources Control Board, 1990),  on a world-wide basis.

-------
                                              Section 2
 1.0 Conclusions
 Criticisms of single species tests, including the USEPA
 toxicity tests, have included excessive between test and
 between  laboratory variability,  as  well  as questions
 regarding the ecological relevance of test results. While
 some accept that contaminants are responsible for bio-
 logical impacts based on an inverse correlation between
 chemical  concentrations and biological indices, there has
 been some reluctance to make similar, parallel interpreta-
 tions when ambient water or wastewater toxicity  and
 biological indices are inversely correlated. Determination
 of contaminant concentrations per se provides no infor-
 mation on the  bioavailability of these compounds to
 resident biota.

 The information presented in this review offers compelling
 evidence  that the USEPA toxicity tests and other single
 species toxicity  test results are, in a  majority of cases,
 reliable qualitative predictors  of  aquatic  ecosystem
 community  responses.     However,  this   qualitative
 relationship must be based on a series of  test results
 (persistent toxicity) not on a single test result. Participants
 at a 1996 Pellston conference on toxicity testing (Grothe
 et al., 1996; Waller et al., 1996) concluded that the USEPA
 toxicity tests provide "an  effective  tool for predicting
 receiving system impacts when appropriate considerations
 of exposure are considered.  Further, laboratory to field
 validation  is not essential for the continued use of these
 toxicity tests." According to that reference, the participants
 felt "It is unmistakable and clear that WET procedures,
 when used properly and for the intended purpose,  are
 reliable predictors of environmental impacts."

 Ideally, laboratory toxicity tests should provide ecologically
 relevant, reliable, and repeatable data. In practice, how-
 ever,  incorporating  these desirable characteristics into
 laboratory toxicity tests has been difficult. The single spe-
 cies toxicity tests are successful (compared to multiple
 species tests) in providing reliable and repeatable data, but
 at some expense to ecological  relevance (e.g., Calow,
 1992).  To assess  effects  of contaminants  on  aquatic
 biological  communities there is a  need for integrated,
weight-of-evidence approaches. Especially at moderately
polluted sites, a multiplicity of testing methods is helpful in
estimating and evaluating biological community responses.

The optimal approach may be to integrate ecological
surveys, toxicity tests, and chemical analyses to better
understand contaminant effects on the health of aquatic
ecosystems. The principal hindrance for this approach has been
 the complexity and costs of such combined, extensive ef-
 forts.

 While there is some merit to the criticisms of the CETTP
 studies, they are not persuasive enough to doubt the effec-
 tiveness of the USEPA toxicity tests as qualitative forecast-
 ers of  biological  community responses. There are  no
 empirical data which demonstrate that these tests fail to
 render  reliable  extrapolations to  instream biological
 responses. Thus, when used appropriately as early warn-
 ing screening procedures, these tests provide a powerful
 monitoring tool. When test results fail as reliable qualita-
 tive predictors of instream impairments, they have more
 frequently underestimated aquatic ecosystem impacts. The
 idea that biotic and abiotic factors in  the environment
 significantly decrease bioavailability and toxicity was not
 supported by a majority of the studies reviewed.

 Data from the 7-d  Ceriodaphnia test, in particular, have
 been very   reliable predictors of  instream biological
 responses. This may be due, in part, to the large database
 for this  species.  On the  other hand, Slooff and Canton
 (1983) contend that single  species  tests with daphnids
 have been extremely efficient in the identification of chemi-
 cal concentrations harmful to aquatic ecosystems.  Sum-
 marized in this document are some 49 studies in which a
 cladoceran (Ceriodaphnia or Daphnia) was utilized as the
 laboratory test species.  These tests were performed at
 many locations across the country with  a wide variety of
 ambient water types (in most of which these cladocerans
 were not resident), effluents, and chemicals (including pes-
 ticides,  other organic chemicals,  metals, and inorganic
 chemicals).   Results from these  laboratory cladoceran
 tests were  reliable  predictors of  aquatic  ecosystem
 biological community  responses  or  adverse effect
 concentrations in 33 (67%) of the 49 studies (cf., Figure 3).
 The laboratory cladoceran tests underestimated biological
 community  responses  (or  overestimated  ecosystem
 adverse effect concentrations of a chemical) in 16 (33%)
 of the investigations. There were no studies in which the
 cladoceran tests overestimated impairments to biological
communities.
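
 The percentages reported for the cladoceran studies follow directly
 from the study counts given above. The short Python sketch below
 reproduces that arithmetic; the function and dictionary names are
 hypothetical, and only the counts (33, 16, and 0 of 49) come from
 this review.

```python
def outcome_percentages(counts):
    """Convert outcome counts to whole-number percentages of the total."""
    total = sum(counts.values())
    return {outcome: round(100 * n / total) for outcome, n in counts.items()}

# Counts of cladoceran studies by outcome, as summarized in this review.
cladoceran_studies = {
    "reliable_predictor": 33,
    "underestimated_response": 16,
    "overestimated_response": 0,
}
cladoceran_total = sum(cladoceran_studies.values())
cladoceran_pct = outcome_percentages(cladoceran_studies)
```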

 Defending single species  toxicity  tests beyond  their
capabilities is to no one's advantage. Single species test
 results alone are not reliable quantitative forecasters of
 toxic chemical impacts on complex ecosystems. The sim-
 plicity of the single species tests comes at a cost of inter-
 pretation and predictive depth.  The test protocols can
 always be improved so that we are more confident of their
 meaning and  so that their results are more reliable
 predictors of instream impacts. A better understanding of
 the limitations of extrapolations  is needed so that mod-
 ifications can  be made which  increase reliability and
 decrease uncertainty, as well as establish a stronger
 theoretical basis for extrapolations. A list of limitations and
 strengths of single species tests is provided in Appendix D.

 Establishing a quantitative correlation between a biological
 response from a single grab or composite water sample
 and the biological community responses is not only impos-
 sible, but unnecessary. However, with a good temporal
 representation of ambient or effluent toxicity and with
 carefully designed/performed seasonal bioassessment data
 from streams, statistically significant correlations between
 data sets have been possible. However, thorough ecologi-
 cal surveys in large rivers and bays will be difficult and
 expensive; there is likely to be considerable controversy
 regarding what sites,  if any, should serve as  reference
 ("clean") sites and, if there are not reference sites, what
 represents a "healthy" aquatic ecosystem.  Many  rivers
and bays in the U.S. are significantly altered by human
activities; attempting  to attribute degradation in  these
 systems to toxic chemicals with the use of bioassessments
 will be very difficult.  Even though there cannot be direct
 proof that toxic chemicals are a cause of declining aquatic
 organism populations, it is not advisable to forsake the
 qualitatively predictive early warning tools.

 The reliability with which single species toxicity test results
 predict biological community responses relates to several
 factors. One major factor was addressed by Dickson et al.
 (1992); they observed that when effluent or ambient water
 toxicity  is  relatively  low or when impacts on aquatic
 ecosystems are moderate it will be difficult to establish a
 relationship between toxicity and instream ecological
 responses.  The strength of  the predictive capacity of
 single species test results is substantially enhanced when
 the test is performed  with  ambient  water (e.g., as
 compared to effluent) and with higher magnitude toxicity
 in the sample.  Chapman et al. (1987) came to a similar
 conclusion regarding magnitude of toxicity in relation to
 sediment tests. We appear to be approaching consensus
 that when significant lethality (and in the case of effluents,
 assuming accurate dilution has been considered) is seen
 in toxicity tests there is a very high potential of aquatic
 ecosystem impairment. As this connection is accepted, we
continue to struggle with the idea that sublethal effects on
indicator species  can  result in detectable  adverse
ecosystem responses.
[Figure 3: pie chart of study outcomes]
  67% (33/49): Laboratory toxicity tests provide reliable prediction of biological community responses and/or
               aquatic ecosystem adverse effect concentrations
  33% (16/49): Laboratory toxicity tests underestimate biological community responses and/or overestimate
               aquatic ecosystem adverse effect concentrations
Figure 3.  Summary of studies in which a cladoceran was used as a laboratory test organism when comparing toxicity
test results to ecological survey data and/or field effect concentrations. Percentages represent the outcomes of the
studies. Total number of studies is 49.

-------
Possibilities for decreasing the extrapolation uncertainties
and improving the predictiveness of single species test
results include: 1) more thorough characterization (persis-
tence, frequency, magnitude) of ambient water or effluent
toxicity, 2) more effective matching (or accounting for) of
exposure patterns in natural ecosystems as compared to
laboratory tests, 3) develop a more thorough comprehen-
sion  of  what constitutes critical  aquatic  ecosystem
endpoints, 4) improved simulation, or consideration of,
ecosystem characteristics  and processes in laboratory
tests (a corollary to this point would be to avoid defaulting
to worst case scenarios in all cases), 5) more thorough
knowledge of environmental fate  and  bioavailability of
chemicals, 6) develop models which map the quantitative
and qualitative  relation between  single  species  test
endpoints and important ecosystem endpoints (this would
include focusing on the relative sensitivities of surrogate
species compared to key ecosystem endpoints), 7) focus
on or develop tests with endpoints which have a clear
connection to important ecosystem structures/functions, 8)
enhance the intertest repeatability of single species tests,
9) improve understanding of how toxicity is manifested in
complex ecosystems, and 10) develop field and laboratory
approaches which are complementary.

A convincing relationship has been established between
ambient water toxicity (as manifested by single species
tests) and biological community responses, but has such
connection been authenticated between effluent toxicity
and instream impairments? The effluent-biological com-
munity link has not  been as thoroughly investigated.
Nonetheless, in  several  recent studies (see Section  1.0,
subsection 7.0), as well as the CETTP and associated
studies, where effluent toxicity was assessed, a reliable
qualitative estimate of  instream biological  effects was
obtained. This relationship was most evident when flow
and dilution of the receiving water  were effectively esti-
mated and when environmental exposure duration was
matched (or accounted for) by laboratory toxicity test
duration.

Recently, Sprague  (1995) observed that single species
toxicity test results give  us answers that support action,
and the important response is to take action rather than
wait for a 98% certainty. Because these single species
toxicity test results are, in a large majority of cases, reliable
qualitative predictors of biological community responses,
the controversy surrounding these tests can diminish if the
data from these tests are used appropriately. Moreover,
if the results  of a single test are not characterized as a
violation of an effluent limit or a water quality standard, but
rather  as a  gauge of relative toxicity and, therefore, a
signal   to   initiate  repeated  or  more  frequent
sampling/testing (or TIEs) to better characterize potential
effluent or ambient water toxicity, regulated entities may be
less critical of the single species tests.  Prior to making
predictions  regarding  biological  community  impacts,
ambient water or effluent toxicity must be characterized so
there can be more certainty regarding the nature of the
toxicity. Furthermore, it is difficult to control toxicity until its
nature, cause, and source are known.  If the results of
single species tests are used to signal the potential for
instream  impairments,  then  the toxicity test (USEPA,
1994a,b) results do not have to be quantitative predictors,
but rather effective qualitative predictors.  These tests do
provide a reliable and repeatable  qualitative predictive
capability. Mount (1995) stated, "It is the application of the
toxicity data, not its inherent validity, that is questionable."
In harmony with Mount, Luoma and Carter (1993) conclude
that it is the "interpretation and application of results" from
single species tests that are controversial.

Critics of the single  species tests  fail to recognize that
these tests, even with nonindigenous species, reveal that
water samples contain biologically  significant concentra-
tions of toxic chemicals. Results of  laboratory single spe-
cies tests are  based on  the toxicological  principle of
concentration-response. This principle is fundamental and
well established.  The effectiveness of the single species
tests in predicting biological community responses is cen-
tered in this principle of concentration-response.

If the single species tests continue to be used  as early
warning, screening tools (identification of a potential prob-
lem which is further investigated), there is less necessity
for  developing new standardized  indigenous  species
testing protocols. The probability that any particular test
species, whether or not it is indigenous to the particular
aquatic ecosystem,  represents the most or the least
sensitive group of  species or endpoints in a  specific
biological assemblage is very low. More likely, the sensi-
tivity of the test species would fall somewhere within the
sensitivity distribution of organisms from an aquatic eco-
system. This is perhaps one of the reasons why the short-
term toxicity tests (USEPA, 1994a,b) and  other single
species test  results have  been  effective qualitative
predictors of ecosystem biological responses. Developing
testing protocols with indigenous species in many different
aquatic ecosystems may improve accuracy of predictions;
however, such efforts will be expensive and difficult
undertakings (e.g., Persoone and Gillett, 1990; Luoma,
1995).  Furthermore, there is little evidence,  and no
guarantee, that the  reliability of environmental impact
predictions will be significantly enhanced with indigenous
species tests.

Slooff (1985), as well as Persoone and Janssen (1994)
discuss the  wide  range  of  sensitivities  of  aquatic
organisms, life stages, and endpoints to toxic chemicals.
These authors also observe that enlarging the suite of test
species, life stages,  and endpoints almost always results
 in lowering environmental effect  concentrations (i.e.,
 LOECs/NOECs decrease with increasing number of test
 species, life stages, and endpoints-i.e., with increasing
 amounts of toxicity data).  Several  authors (e.g.,  Slooff
 and Canton, 1983; Persoone and Gillett, 1990; Persoone
 et al., 1990; Chapman, 1995b; Luoma, 1995; Underwood,
 1995; Dorn, 1996) maintain that indicator species are not
 likely to represent the most sensitive aquatic ecosystem
 response,  but rather have been selected for robustness,
 ease of culture and  availability. Bartell et al.  (1992)
 propose that rare and sensitive species, by definition, are
 seldom selected for routine toxicity testing. Several other
 authors (Persoone et al., 1990; Baird, 1992; Forbes and
 Depledge, 1992; Clements and Kiffney, 1996; LaPoint et
 al., 1996) also proposed that the USEPA's toxicity tests for
 effluents and receiving waters (USEPA, 1994a,b) and
 other single species  toxicity  tests  most  frequently
 underestimate effects in aquatic ecosystems.
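
 The observation that effect concentrations fall as more species, life
 stages, and endpoints are tested has a simple arithmetic basis: the
 lowest value in a data set can only stay the same or decrease as
 results are added. The Python sketch below illustrates this with a
 running minimum; the function name and the NOEC values are
 hypothetical and are not taken from any study in this review.

```python
from itertools import accumulate

def running_lowest_noec(noecs):
    """Lowest NOEC observed so far after each additional test result."""
    return list(accumulate(noecs, min))

# Hypothetical NOECs (ug/L) in the order the tests were added.
noecs = [12.0, 30.0, 5.0, 8.0, 1.5]
lowest_so_far = running_lowest_noec(noecs)
```

 Each added result either leaves the lowest NOEC unchanged or lowers
 it, so the sequence never increases.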

 Because single species test results are reliable qualitative
 predictors of biological community responses, the burden
 of proof in demonstrating that persistent toxicity is not im-
 pacting biological communities perhaps should rest with
 the entity(ies) responsible for the contaminants. Moreover,
 at some stage it should become incumbent on the entity
 responsible for the probable  environmental impacts to
 demonstrate the absence of ecosystem impairments. As
 stated by Luoma (1995), "The toxicity tests tool may never
 achieve the  high probability  prediction   capabilities
 'required' by ardent critics; however, this does not prevent
 the approach from being a useful tool in the developing
 arsenal available to study the effects of contaminants".

 2.0 Summary
 Regulatory agencies have tended to rely on single species
 toxicity tests, particularly USEPA's toxicity tests, on surface
 or effluent water  samples to identify potential chemical
 toxicity threats  to aquatic   biological communities.
 Questions regarding the reliability of these laboratory test
 results in predicting impairments to biological communities
 have  been  advanced.    Of   particular concern are
 uncertainties of extrapolating from the outcomes of these
 highly controlled  laboratory  tests to complex  and
 multivariate ecosystems. This document is an interpretive
 review of the literature on this question of ecological
 relevance of single species toxicity test results; it includes,
 but is not restricted to USEPA's Complex Effluent Toxicity
Testing Program (CETTP-conducted for the purpose of
examining the predictive correspondence  between the
short-term toxicity tests (USEPA, 1994a,b) and instream
impacts).

Aquatic ecosystem surveys typically have been used to
assess the reliability of single species toxicity test results
extrapolations.  Potential limitations  to the use of these
bioassessments for "validating" predictions extrapolated
 from single species tests are discussed; caution is urged
 in the interpretation of these surveys. Strengths and
 limitations of the CETTP and associated studies, as well as of
 two  recent  statistical analyses  of those  studies,  are
 evaluated.

 Approximately 80 studies in which single species tests
 were used to assess ambient water or effluent toxicity and
 in which some ecological survey data were gathered, for
 the purpose of exploring the correspondence between tox-
 icity data and biological community responses, are critically
 evaluated.   A preponderance of evidence reveals that
 USEPA's toxicity tests (USEPA, 1994a,b) and other single
 species test results are, in a majority of cases, reliable
 qualitative (some level of response seen) predictors of
 aquatic ecosystem community effects. In this document 77
 independent studies in which the results of laboratory indi-
 cator single species toxicity tests are assessed with regard
 to reliability in predicting aquatic ecosystem biological com-
 munity responses (and/or adverse effect concentrations)
 are summarized. In 57 (74%) of the studies the indicator
 single  species  tests  provided  reliable  qualitative
 predictions of biological community impacts or adverse
 effect concentrations (cf. Figure 4). The laboratory single
 species tests underestimated aquatic ecosystem effects
 (and/or overestimated the biological community adverse
 effect concentration of a chemical) in 16 (21%) of the 77
 studies.  Results  of four (5%)  of  the studies  were
 inconclusive or mixed. There are no experimental data
 which demonstrate that the single species tests generally
 fail to render reliable qualitative extrapolations to biological
 community responses.
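
The percentages quoted above follow directly from the study counts. As a minimal sketch (the dictionary keys and the function name are illustrative labels, not terms from the report), the arithmetic can be checked as follows:

```python
# Study counts as reported in the summary text (77 studies in total).
# The outcome labels are shorthand chosen for this sketch.
counts = {
    "reliable qualitative prediction": 57,
    "underestimated ecosystem effects": 16,
    "mixed or inconclusive": 4,
}


def outcome_percentages(counts):
    """Return each outcome's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


percentages = outcome_percentages(counts)
total = sum(counts.values())
for name, pct in percentages.items():
    print(f"{name}: {counts[name]}/{total} = {pct}%")
```

The three counts sum to 77, and the rounded shares are 74%, 21%, and 5%, matching the values given in the text.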

 While criticisms of the USEPA toxicity  tests (USEPA,
 1994a,b) and other single species tests have some merit,
 they are not persuasive enough to cast doubt on the effec-
 tiveness of these tests in predicting ecosystem impacts.
 When used appropriately as early warning  signals and with
 dependable temporal representation of ambient water or
 effluent toxicity, these tests provide a powerful monitoring
 tool. When single species tests fail as reliable qualitative
 predictors, they most frequently underestimate impacts to
 the ecosystem community.  Single species test results
 alone are not reliable quantitative forecasters of toxic
 chemical impacts on complex ecosystems.

 The predictive power of single species tests is substan-
 tially enhanced when ambient water, as compared to
 discharge, is tested and when higher magnitude toxicity
 exists; reliability is also improved when exposure patterns
 in natural ecosystems are matched or accounted for and,
 in the case of effluents, when realistic estimates of dilution
 are taken into account.

Alternatives to indicator species tests are explored. There
is a paucity of evidence that the current standardized toxicity
 testing protocol (including the USEPA toxicity tests;
 USEPA, 1994a,b) test species are more sensitive to toxic
 chemicals than resident species. If the single species tests
 continue to be used as early warning signals there is less
 necessity for developing new  standardized indigenous
species testing protocols.  The wisdom of developing a
host of new standardized tests with indigenous species,
unless  they will  substantially  improve  accuracy  of
predicting ecosystem impacts, is questionable.
[Figure 4: pie chart with three segments: 74%, 21%, and 5%.]

     74%: Laboratory single species toxicity tests provide reliable prediction of biological community responses
       and/or aquatic ecosystem adverse effect concentration [57/77]

     21%: Laboratory single species toxicity tests underestimate biological community responses and/or
       overestimate aquatic ecosystem adverse effect concentration (i.e., adverse effects occur at lower
       concentration than predicted in laboratory test) [16/77]

      5%: Laboratory single species toxicity tests yielded mixed or inconclusive results [4/77]
Figure 4. Summary of studies reviewed in this report in which the results of laboratory single species toxicity tests were
compared to biological community surveys and/or field effect concentrations. Tabulation is by overall outcome of the
study. Total number of studies summarized is 77.

-------
                                            Section 3
 1.0 References

 Review articles or publications relating to the ecological
 relevance of single species toxicity tests are noted by ***.
 References are inclusive for appendices.

 Adams, W.J., R.A. Kimerle, B.B. Heidolph, and P.R.
  Michael. 1983. Field Comparison of Laboratory-derived
  Acute and Chronic Toxicity Data. pp. 367-385. In: W.E.
  Bishop, R.D. Cardwell, and B.B. Heidolph, eds., Aquatic
  Toxicology and Hazard Assessment, ASTM STP 802.
  American  Society  for  Testing  and  Materials,
  Philadelphia, PA.

 Bailey, H.C. 1995. Letter to The  Editor. Human Ecol. Risk
  Assess. 1:459-463.

 Bailey, H.C., C. Alexander, C. DiGiorgio, M. Miller, S.I.
  Doroshov, and D.E. Hinton. 1994. The Effects of Agricultural
  Discharges on Striped Bass (Morone saxatilis) in
  California's  Sacramento-San Joaquin  Drainage.
  Ecotoxicology 3:123-142.

 Baird,  D.J.  1992. Predicting Population Response to
  Pollutants: in Praise of Clones. A Comment on Forbes
  & Depledge. Funct. Ecol. 6:616-617.

 Barbour, M.T., J.M. Diamond,  and C.O. Yoder. 1996.
  Biological  Assessment  Strategies:  Applications and
  Limitations, pp. 245-270.  In: D.R. Grothe, K.L. Dickson,
  and D.K. Reed-Judkins, eds., Whole Effluent Toxicity
  Testing: An Evaluation of Methods and Prediction of
  Receiving System Impacts. SETAC Press, Pensacola,
  FL.

 Bartell, S.M., R.H. Gardner, and R.V. O'Neill.  1992.
  Ecological Risk Estimations.  Lewis Publishers, Boca
  Raton, FL.

 Baughman, D.S., D.W.  Moore,  and G.I. Scott. 1989. A
  Comparison and Evaluation of Field and Laboratory
  Toxicity   Tests with  Fenvalerate on an Estuarine
  Crustacean. Environ.  Toxicol. Chem. 8:417-429.

Becker,  D.S., G.R.  Bilyard, and  T.C. Ginn.  1990.
  Comparisons  Between Sediment Toxicity Tests and
  Alterations of Benthic Macroinvertebrate Assemblages
  at a Marine Superfund Site: Commencement Bay, Washington.
  Environ. Toxicol. Chem. 9:669-685.

 Birge, W.J.  and J.A. Black. 1990. In Situ Toxicological
  Monitoring: Use in Quantifying  Ecological  Effects of
  Toxic Wastes, pp. 215-231. In: S.S. Sandhu, W.R.
  Lower, F.J. de Serres, W.A. Suk, and R.R. Tice, eds., In
  Situ Evaluation of Biological Hazards of Environmental
  Pollutants. Plenum Press, New York, NY.

 Birge, W.J., J.A. Black, T.M. Short, and A.G. Westerman.
  1989. A Comparative  Ecological and  Toxicological
  Investigation of Secondary Wastewater Treatment Plant
  Effluent and  Its Receiving Stream. Environ.  Toxicol.
  Chem. 8:437-450.

 Birge, W.J., D.J. Price, D.P. Keogh, J.A. Zuiderveen, and
  M.D. Kercher. 1992. Biological Monitoring Program for
  the Paducah Gaseous Diffusion Plant. Annual Report for
  study period Oct. 1990 through March 1992. Submitted
  to Oak Ridge National Laboratory, Oak Ridge, TN.

 Boelter, A.M.,  F.N.  Lamming, A.M.  Farag,  and H.L.
  Bergman. 1992. Environmental Effects of Saline Oil-field
  Discharges on Surface Waters. Environ. Toxicol. Chem.
  11:1187-1195.

 Boyle, T.P., S.E. Finger, R.L. Paulson, and C.F. Rabeni.
  1985. Comparison of Laboratory and Field Assessment
  of Fluorene. Part II: Effects on the Ecological Structure
  and Function of Experimental Pond Ecosystems, pp.
  134-151. In: T.P. Boyle, ed., Validation and Predictability
  of Laboratory Methods for Assessing the Fate and Ef-
  fects of Contaminants in Aquatic Ecosystems. ASTM
  STP 865.  American Society for Testing and Materials,
  Philadelphia, PA.

Brock, T.C.M., S.J.H. Crum, R. van Wijngaarden, B.J.
  Budde, J. Tijink, A. Zuppelli, and P. Leeuwangh. 1992.
  Fate and Effects of the Insecticide Dursban 4E in Indoor
  Elodea-dominated and Macrophyte-free Freshwater
  Model Ecosystems:  I. Fate and Primary Effects of the
  Active Ingredient Chlorpyrifos. Arch. Environ. Contam.
  Toxicol. 23:69-84.

Burton, G.A. Jr. and G.R. Lanza. 1987. Aquatic Microbial
  Activity  and  Macrofaunal  Profiles of an Oklahoma
  Stream.  Wat. Res. 21:1173-1182.

 Burton,  G.A. Jr., A. Drotar, J.  M. Lazorchak, and L.L.
   Baals.  1987.  Relationship of Microbial Activity and
   Ceriodaphnia Responses to Mining Impacts on the Clark
   Fork River, Montana. Arch. Environ. Contam. Toxicol.
   16:523-530.

 ***Cairns, J. Jr. 1983. Are  Single Species Toxicity Tests
   Alone Adequate for Estimating Environmental Hazard?
   Hydrobiologia 100:47-57.

 ***Cairns, J. Jr. 1986. What Is Meant by Validation of Pre-
   dictions Based   on  Laboratory  Toxicity  Tests?
   Hydrobiologia 137:271-278.

 ***Cairns, J. Jr. 1988a. What Constitutes Field Validation
   of Predictions Based on Laboratory Evidence? pp. 361-368.
   In: Adams, G.A. Chapman, and W.G. Landis, eds.,
   Aquatic Toxicology and Hazard Assessment Tenth Vol-
   ume, ASTM STP 971, American Society for Testing and
   Materials, Philadelphia, PA.

 ***Cairns, J. Jr. 1988b. Should Regulatory Criteria and
   Standards Be Based on Multispecies Evidence? Environ.
   Profess. 10:157-165.

 ***Cairns, J. Jr. 1988c.  Putting the Eco in Ecotoxicology.
   Regulatory Toxicol. Pharm. 8:226-238.

 Cairns, J. Jr., and D.S. Cherry. 1983. A Site-specific Field
   and Laboratory Evaluation of Fish and Asiatic Clam Pop-
   ulation  Responses  to  Coal  Fired  Power  Plant
   Discharges. Wat. Sci. Tech. 15:31-58.

 ***Cairns, J., Jr. and D.I. Mount. 1990. Aquatic Toxicology,
   Part 2. Environ. Sci. Technol. 24:154-161.

 Cairns, J. Jr., and B.R. Niederlehner. 1995. Predictive
   Ecotoxicology: Methods for Making Estimates and Pre-
   dictability in Ecotoxicology. pp. 667-680. In: D.J.
   Hoffman, B.A.  Rattner, G.A. Burton, and J. Cairns Jr.,
  eds., Handbook of Ecotoxicology, CRC Press, Inc., Boca
  Raton,  FL.

 Cairns, J. Jr., and E.P. Smith. 1994. The Statistical Validity
  of Biomonitoring Data. pp. 49-68. In: S.L. Loeb, and A.
  Spacie, eds., Biological Monitoring of Aquatic Systems.
  Lewis Publishers, Boca Raton, FL.

 Cairns, J. Jr., D.S. Cherry, and J.D. Giattina. 1982. Cor-
  respondence Between Behavioral Responses of Fish in
  Laboratory and Field Heated Chlorinated Effluents, pp.
  207-215. In: W.J. Mitsch, R.W. Bosserman, and J.M.
  Klopatek, eds., Energy and  Ecological Modelling.
  Elsevier Scientific Publishers Co., Amsterdam.

Cairns, J. Jr., P.V. McCormick, and S.E. Belanger. 1993.
  Prospects   for  the  Continued  Development  of
   Environmentally-realistic Toxicity Tests Using Microor-
   ganisms. J. Environ. Sci. 5:253-268.

 Calow, P. 1992. The Three R's of Ecotoxicology. Funct.
   Ecol. 6:617-619.

 Canfield, T.G.,  N.E.  Kemble, W.G. Brumbaugh,  F.G.
   Dwyer, C.G. Ingersoll and J.F. Fairchild. 1994. Use of
   Benthic Invertebrate Structure and the Sediment Quality
   Triad to Evaluate Metal-contaminated Sediment in the
   Upper  Clark Fork River, Montana. Environ. Toxicol.
   Chem.  13:1999-2012.

 Carlson, A.R., H. Nelson, and D. Hammermeister. 1986.
   Development and Validation of Site-specific Water Qual-
   ity Criteria for Copper. Environ. Toxicol. Chem. 5:997-
   1012.

 ***Chapman,  P.M.  1995a.  Extrapolating  Laboratory
   Toxicity Results to the Field. Environ. Toxicol. Chem.
   14:927-930.

 Chapman, P.M. 1995b. Do  Sediment  Toxicity Tests
   Require Validation? Environ. Toxicol. Chem.
   14:1451-1453.

 ***Chapman, P.M. 1995c. Ecotoxicology and Pollution-Key
   Issues. Marine Poll. Bull. 16:405-415.

 Chapman, P.M., R.N. Dexter, and E.R. Long. 1987. Syn-
   optic Measures of Sediment Contamination, Toxicity and
   Infaunal Community Composition (The Sediment Quality
   Triad) in San Francisco Bay. Mar. Ecol. Prog. Ser.
   37:75-96.

 Clark, J.R., P.W. Borthwick, L.R. Goodman, J.M. Patrick,
   Jr., E.M. Lores, and J.C. Moore. 1987. Comparison of
   Laboratory  Toxicity Test Results with  Responses of
   Estuarine Animals Exposed to Fenthion in the Field.
   Environ. Toxicol. Chem. 6:151-160.

 Clements, W.H. and P.M. Kiffney. 1994. Integrated Labo-
   ratory and Field  Approach for Assessing  Impacts of
   Heavy Metals at the Arkansas River, Colorado. Environ.
   Toxicol. Chem. 13:397-404.

 Clements, W.H.  and P.M. Kiffney. 1996. Validation of
  Whole Effluent Toxicity Tests: Integrated Studies Using
  Field Assessments, Microcosms, and Mesocosms. pp.
  229-244. In: D.R. Grothe, K.L. Dickson, and D.K. Reed-
  Judkins, eds., Whole Effluent Toxicity Testing: An Evaluation
  of Methods and Prediction of Receiving System
  Impacts. SETAC Press, Pensacola, FL.

Cooper, W.E., and R.J. Stout. 1985. The Monticello Experiment:
  A Case Study, pp. 96-116. In: Multispecies
  Toxicity Testing, J. Cairns, Jr., ed., Pergamon Press,
  New York, NY.

Crane, M. 1995. Society of Environmental Toxicology and
  Chemistry (SETAC) News Article.  Vol. 15,  No.   2.,
  March.

Crossland, N.O. 1984. Fate and Biological Effects of
  Methyl Parathion in  Outdoor Ponds  and Laboratory
  Aquaria. Ecotox. Environ. Safe. 8:482-495.

Crossland, N.O. and J.M. Hillaby. 1985. Fate and Effects
  of 3,4-dichloroaniline in the Laboratory and in Outdoor
  Ponds: II. Chronic Toxicity to Daphnia sp. and Other
  Invertebrates. Environ. Toxicol. Chem. 4:489-499.

Crossland, N.O. and C.J.M. Wolff. 1985. Fate and Biological
  Effects of Pentachlorophenol in Outdoor Ponds.
  Environ. Toxicol. Chem. 4:73-86.

Crossland, N.O., G.C. Mitchell, and P.B. Dorn. 1992. Use
  of Outdoor Artificial  Streams to Determine Threshold
  Toxicity Concentrations for a Petrochemical  Effluent.
  Environ. Toxicol. Chem. 11:49-59.

Davis, W.S. and T.P. Simon. 1995. Biological Assessment
  and Criteria. Lewis Publishers, Boca Raton, FL.

deNoyelles, F. Jr.,  and W.D. Kettle. 1985. Experimental
  Ponds for Evaluating Toxicity Tests Predictions, pp. 91-103.
  In: T.P. Boyle, ed., Validation and Predictability of
  Laboratory Methods for Assessing the Fate and Effects
  of Contaminants in Aquatic Ecosystems, ASTM STP
  865. American Society for Testing and Materials, Phila-
  delphia, PA.

Diamond, J.M., J.C. Hall, D.M. Pattie, and D. Gruber.
  1994. Use of an Integrated Approach to Determine Site-
  specific Effluent  Metal  Limits. Water Environ. Res.
  66:733-743.

Dickson, K.L.  1995.  Progress in Toxicity Testing-An
  Academic's Viewpoint, pp. 209-216. In: J. Cairns, Jr.
  and B.R. Niederlehner, eds., Ecological Toxicity Testing,
  Lewis Publishers, Boca Raton, FL.

Dickson, K.L., T. Duke, and G. Loewengart. 1985. A Synopsis:
  Workshop on Multispecies Toxicity Tests, pp. 76-88.
  In: J. Cairns, Jr., ed., Multispecies Toxicity Testing.
  Pergamon  Press, New York, NY.

Dickson, K.L., W.T. Waller, J.H. Kennedy, W.R. Arnold,
  W.P. Desmond, S.D. Dyer, J.F. Hall, J.T.  Knight, Jr., D.
  Malas, M.L. Martinez, and S.L. Mutzner. 1989. A Water
  Quality and Ecological Survey of the Trinity River, Vols.
  I and II. Final Report. City of Dallas Water Utilities,
  Dallas, TX.

***Dickson, K.L., W.T. Waller, J.H. Kennedy, and L.P.
  Ammann. 1992. Assessing the Relationship Between
  Ambient Toxicity and  Instream Biological  Response.
  Environ. Toxicol. Chem. 11:1307-1322.

Dickson, K.L., W.T. Waller, J.H. Kennedy, L.P. Ammann,
  R. Guinn, and T.J. Norberg-King.  1996. Relationships
  Between Effluent Toxicity, Ambient Toxicity, and Receiv-
  ing System Impacts: Trinity River Dechlorination Case
  Study, pp. 287-308. In: D.R. Grothe, K.L. Dickson, and
  D.K. Reed-Judkins, eds., Whole Effluent Toxicity
  Testing: An Evaluation of Methods and Prediction of
  Receiving System Impacts. SETAC Press, Pensacola,
  FL.

Dorn, P. 1996. An Industrial Perspective on Whole Effluent
  Toxicity Testing, pp. 16-37. In: D.R. Grothe, K.L. Dickson,
  and D.K. Reed-Judkins, eds., Whole Effluent Toxicity
  Testing: An Evaluation of Methods and Prediction of
  Receiving System Impacts. SETAC Press, Pensacola,
  FL.

Dorn, P.B., R.  van Compernolle,  C.L. Meyer, and N.O.
  Crossland. 1991. Aquatic Hazard Assessment of the
  Toxic Fraction from the Effluent of a Petrochemical Plant.
  Environ. Toxicol. Chem. 10:691-703.

Eagleson, K.W.,  D.L. Lenat, L.W.  Ausley, and F.B.
  Winborne. 1990. Comparison of Measured Instream
  Biological Responses with Responses Predicted Using
  the Ceriodaphnia dubia Chronic Toxicity Test. Environ.
  Toxicol. Chem. 9:1019-1028.

Eaton, J., J. Arthur, R. Hermanutz, R. Kiefer, L.  Mueller, R.
  Anderson, R. Erickson, B. Nordling, J. Rogers, and H.
  Pritchard. 1985. Biological  Effects of Continuous and
  Intermittent Dosing of Outdoor Experimental Streams
  with Chlorpyrifos. pp. 85-118. In: R.C. Bahner and D.J.
  Hansen, eds., Aquatic Toxicology and Hazard Assess-
  ment: Eighth Symposium, ASTM STP 891, American
  Society for Testing and Materials,  Philadelphia,  PA.

Eisele, P.J. and R. Hartung. 1976. The Effects of
  Methoxychlor on Riffle Invertebrate Populations and
  Communities. Trans. Am. Fish. Soc. 105:628-633.

***Emans, H.J.B., E.J. v.d. Plassche, J.H. Canton, P.C.
  Okkerman, and P.M. Sparenburg. 1993. Validation of
  Some Extrapolation Methods Used for Effect Assess-
  ment. Environ. Toxicol. Chem. 4:155-166.

Fairchild, J.F., F.J. Dwyer, T.W. La Point, S.A. Burch, and
  C.G. Ingersoll. 1993. Evaluation of a Laboratory-gener-
  ated  NOEC  for Linear  Alkylbenzene Sulfonate in
  Outdoor Experimental Streams. Environ. Toxicol. Chem.
  12:1763-1775.

 Fairchild, J.F., T.W. La Point, J.L. Zajicek, M.K. Nelson, F.
  J.  Dwyer,  and  P.A.  Lovely.  1992.  Population-,
  Community- and Ecosystem-level Responses of Aquatic
  Mesocosms to Pulsed Doses of a Pyrethroid Insecticide.
  Environ.  Toxicol. Chem. 11:115-129.

 Ferraro, S.P., R.C. Swartz, F.A. Cole, and D.W. Schults.
  1991. Temporal Changes in the Benthos Along a Pollu-
  tion Gradient: Discriminating the Effects of Natural Phe-
  nomena  from Sewage-industrial Wastewater  Effects.
  Estuarine, Coastal and Shelf Science 33:383-407.

 ***Forbes,  V.E. and M.H. Depledge. 1992.  Predicting
  Population Response to Pollutants: Significance of Sex.
  Funct. Ecol. 6:376-381.

 Franco, P.J., J.M. Giddings, S.E. Herbes, L.A. Hook, J.D.
  Newbold, W.K. Roy, G.R. Southworth, and A.J. Stewart.
  1984. Effects of Chronic Exposure to Coal-derived Oil on
  Freshwater  Ecosystems:  I. Microcosms. Environ.
  Toxicol. Chem. 3:447-463.

 Frithsen, J.B.,  D.  Nacci, C. Oviatt, C.J. Strobel, and R.
  Walsh. 1989. Using Single-species and Whole Ecosys-
  tem Tests to Characterize the  Toxicity of a Sewage
  Treatment Plant Effluent, pp. 231-250. In: G.W. Suter
  II and M.A. Lewis, eds., Aquatic Toxicology and Environmental
  Fate: Eleventh Volume, ASTM STP 1007, American
  Society for Testing and Materials, Philadelphia, PA.

 Gearing, J.N. 1989. The Role of Aquatic Microcosms  in
  Ecological Research as Illustrated by Large Marine Sys-
  tems, pp. 411-448.  In: Ecotoxicology:  Problems and
  Approaches (S.A. Levin, M.A. Harwell, J.R. Kelly, and
  K.D. Kimball, eds.). Springer Verlag, New York, NY.

 Geckler, J.R.,  W.B.  Horning, T.M.  Nieheisel,  Q.H.
  Pickering, E.L.  Robinson, and  C.E.  Stephan.  1976.
  Validity of Laboratory Tests for Predicting Copper
  Toxicity in Streams. EPA 600/3-76-116. Cincinnati, OH.

Giddings, J.M., and P.J. Franco. 1985. Calibration of Laboratory
  Toxicity Tests with Results from Microcosms and
  Ponds, pp. 104-119. In: T.P. Boyle, ed., Validation and
  Predictability of Laboratory Methods for Assessing the
  Fate  and   Effects  of  Contaminants   in  Aquatic
  Ecosystems,  STP 865. American Society for Testing
  and Materials, Philadelphia, PA.

Giddings, J.M., P.J. Franco, S.M. Bartell, R.M. Cushman,
  S.E. Herbes, L.A. Hook, J.D. Newbold, G.R. Southworth,
  and A.J. Stewart.  1984.  Effects of Contaminants  on
  Aquatic Ecosystems: Experiments with Microcosms and
  Outdoor Ponds. Oak Ridge National Laboratory, Oak
  Ridge, TN.

 Giesy, J.P. and P.M. Allred. 1985. Replicability of Aquatic
   Multispecies Test Systems,  pp. 187-247. In: J. Cairns
   Jr., ed. Multispecies Toxicity Testing. Pergamon Press,
   New York, NY.

 Giesy, J.P., Jr., H.J. Kania, J.W. Bowling, R.L. Knight, S.
   Mashburn, and S. Clarkin. 1979. Fate and Biological
   Effects  of  Cadmium  Introduced   into  Channel
   Microcosms. EPA 600/3-79-039. Duluth, MN.

 Gonzalez, M.J. and T.M. Frost. 1994. Comparisons of
   Laboratory Toxicity Tests and a Whole-lake Experiment:
   Rotifer Responses to Experimental Acidification. Ecological
   Applications 4(1):69-80.

 ***Grothe, D.R.,  K.L.  Dickson, and D.K. Reed-Judkins
   (eds.). 1996. Whole Effluent Toxicity Testing: An Evalu-
   ation of Methods and Prediction of Receiving System
   Impacts. SETAC Press, Pensacola, FL.

 Hansen, S.R. and R.R. Garton. 1982. Ability of Standard
   Toxicity Tests to Predict the Effects of the Insecticide
   Diflubenzuron on Laboratory Stream Communities. Can.
   J.  Fish. Aquat. Sci. 39:1273-1288.

 Havas,  M.  and  T.C.   Hutchinson.  1982.  Aquatic
   Invertebrates from the Smoking Hills, N. W. T.: Effect of
   pH and Metals on Mortality. Can. J. Fish. Aquat. Sci.
   39:890-903.

 Herbold, B., A.D. Jassby, and P.B. Moyle. 1992. Status
   and Trends Report on Aquatic Resources in the San
   Francisco Estuary. San Francisco Estuary Report, CA.

 Hitchock, S.W. 1965. Field and Laboratory Studies of DDT
   on Aquatic Insects. Conn. Ag. Exp. Bull. (New Haven)
   668:1-32.

 Kersting, K. and R. van  Wijngaarden. 1982. Effects of
   Chlorpyrifos on a Microecosystem. Environ. Toxicol.
   Chem. 11:365-372.

 Lamberson, J.O., T.H. DeWitt, and R.C. Swartz. 1992.
   Assessment of Sediment Toxicity to Marine Benthos.
   pp. 183-211. In: G.A. Burton, Jr., ed. Sediment Toxicity
   Assessment, Lewis Publishers, Boca Raton, FL.

 LaPoint, T.W. 1994. Interpreting the Results of Agricultural
   Microcosm Tests: Linking Laboratory and Experimental
   Field Results to Predictions of Effect in Natural Ecosystems,
   pp. 83-94. In: I.R. Hill, F. Heimbach, P.
   Leeuwangh, and P. Matthiessen, eds., Freshwater Field
   Tests for Hazard Assessment of Chemicals. Lewis Pub-
   lishers, Boca Raton,  FL.

 LaPoint,  T.W.  1995. Signs  and  Measurements of
   Ecotoxicology in the Aquatic Environment, pp. 13-24.
   In: D.J. Hoffman, B.A. Rattner, G.A. Burton, Jr., and J.
   Cairns, Jr., eds., Handbook of Ecotoxicology. Lewis Pub-
   lishers, Boca Raton, FL.

 LaPoint, T.W., M.T. Barbour, D.L. Burton, D.S.  Cherry,
   W.H. Clements, J.M. Diamond, D.R. Grothe, M.A. Lewis,
   D.K. Reed-Judkins, and G.W. Saalfeld. 1996. Field Assessments,
   pp. 191-228. In: D.R. Grothe, K.L. Dickson,
   and D.K. Reed-Judkins, eds., Whole Effluent Toxicity
   Testing: An Evaluation of Methods and Prediction  of
   Receiving System Impacts. SETAC Press, Pensacola,
   FL.

 LaPoint, T.W., J.F. Fairchild, E.E. Little, and S.E. Finger.
   1989.   Laboratory  and  Field  Techniques   in
   Ecotoxicological Research: Strengths and Limitations.
   pp. 239-255. In: A. Boudou and F. Ribeyre, eds., Aquatic
   Ecotoxicology: Fundamental Concepts and Methodolo-
   gies, II. CRC Press, Inc.  Boca Raton, FL.

 Larsen, D.P., F. deNoyelles Jr., F. Stay, and T.
   Shiroyama. 1986. Comparisons of Single-species, Micro-
   cosm and Experimental  Pond Responses to Atrazine
   Exposure. Environ. Toxicol. Chem. 5:179-190.

 Leeuwangh, P., T.C.M. Brock, and K. Kersting. 1994. An
   Evaluation of Four Types of Freshwater Model Ecosys-
   tem for Assessing the Hazard of Pesticides. Human and
   Experimental Toxicology 13:888-899.

 Little, E.E., F. J. Dwyer, J.F. Fairchild, A.J. DeLonay,  and
   J.L. Zajicek. 1993. Survival of Bluegill and Their Behavioral
   Responses During Continuous and Pulsed Exposures
   to Esfenvalerate, a Pyrethroid Insecticide. Environ.
   Toxicol. Chem. 12:871-878.

 ***Livingston, R.J. and D.A. Meeter. 1985. Correspondence
   of Laboratory and Field Results: What Are the
   Criteria for Verification? pp. 76-88. In: J. Cairns, Jr.,
   ed., Multispecies Toxicity Testing. Pergamon Press, New
  York, NY.

 Long, E.R. and P.M. Chapman. 1985. A Sediment Quality
  Triad: Measures of Sediment Contamination, Toxicity
  and Infaunal Community  Composition in Puget Sound.
  Marine Pollution Bulletin 10:405-415.

***Luoma, S.N. 1995. Prediction of Metal Toxicity in Nature
  from Toxicity Tests: Limitations and Research Needs.
  pp. 610-659. In: A. Tessier and D. Turner, eds., Metal
  Speciation and Bioavailability in Aquatic Systems. John
  Wiley & Sons, Ltd., New York, NY.

Luoma, S.N. and J.L. Carter. 1993. Understanding  the
  Toxicity of Contaminants in Sediments: Beyond the Tox-
  icity Tests-based Paradigm. Environ.  Toxicol.  Chem.
  12:793-796.

 Luoma, S.N. and K.T. Ho. 1993. Appropriate Uses of Marine
   and Estuarine Sediment Toxicity Tests, pp. 193-225.
   In: P. Calow, ed., Handbook of Ecotoxicology.
   Blackwell Scientific, Oxford, U.K.

 ***Marcus, M.D. and L.L. McDonald. 1992. Evaluating the
   Statistical  Basis  for Relating  Receiving  Impacts to
   Effluent and Ambient Toxicities. Environ. Toxicol. Chem.
   11:1389-1402.

 Marshall, J.S. 1978. Field Verification of Cadmium Toxicity
   to  Laboratory Daphnia Populations.  Bull. Environ.
   Contam. Toxicol. 20:387-393.

 Mayer, F.L. and M.R. Ellersieck. 1986. Manual of Acute
   Toxicity: Interpretation and Database for 410 Chemicals
   and 66  Species  of Freshwater Animals. Reference
   Source Publication 160. U.S. Fish and Wildlife Service,
   Dept. of Interior, Washington D.C.

 McBride, G.B., J.C. Loftis, and N.C. Adkins. 1993. What Do
   Significance Tests Really Tell Us about the Environ-
   ment? Environ. Manage. 17:423-432.

 Moore, M.V.,  and R.W. Winner. 1989. Relative Sensitivity
   of Ceriodaphnia dubia Laboratory Tests and Pond Com-
   munities  of Zooplankton and Benthos to Chronic Copper
   Stress. Aquat. Toxicol. 15:311-330.

 ***Mount,  D.I. 1985.  Scientific  Problems  in  Using
   Multispecies  Toxicity Tests for Regulatory Purposes.
   pp. 13-18. In: J. Cairns, Jr., ed., Multispecies Toxicity
   Testing. Pergamon Press, New York, NY.

 Mount, D.I. 1995. Development and Current Use of Single
   Species Aquatic  Toxicity Tests,  pp. 97-104.  In:  J.
   Cairns, Jr.  and B.R.  Niederlehner,  eds.,  Ecological
   Toxicity Testing, Lewis Publishers, Boca Raton, FL.

 Mount, D.I. and T.J. Norberg-King, eds. 1985. Validity of
   Effluent and Ambient Toxicity Tests for Predicting Biolog-
   ical Impact, Scippo Creek, Circleville, Ohio. EPA 600/3-
  85-044. Duluth, MN.

 Mount, D.I. and T.J. Norberg-King, eds. 1986.  Validity of
  Effluent and Ambient Toxicity Tests for Predicting Bio-
  logical Impact, Kanawha River, Charleston, West Vir-
  ginia. EPA 600/3-86-006. Duluth, MN.

Mount, D.I., T.J. Norberg-King and A.E. Steen. 1986a.
   Validity of Effluent and Ambient Toxicity Tests for Pre-
  dicting   Biological   Impact,   Naugatuck  River,
  Waterbury, Connecticut. EPA 600/8-86-005. Duluth, MN.

Mount, D.I., N.A. Thomas, T.J. Norberg, M.T. Barbour, T.H.
  Roush and  W.F. Brandes. 1984. Effluent and Ambient
  Toxicity Testing and Instream Community Response on
  the Ottawa River, Lima, Ohio. EPA 600/3-84-080.
  Duluth, MN.

Mount, D.I., A.E. Steen and T.J. Norberg-King, eds. 1985.
  Validity of Effluent and Ambient Toxicity Testing for Pre-
  dicting Biological Impact on Five Mile Creek, Birming-
  ham, Alabama. EPA 600/8-85-015.  Duluth, MN.

Mount, D.I., A.E. Steen and T.J. Norberg-King, eds. 1986b.
  The Validity of Effluent and Ambient Toxicity Tests for
  Predicting Biological Impact, Back River, Baltimore Harbor,
  Maryland. EPA 600/8-86-001. Duluth, MN.

Mount, D.I., A.E. Steen and T.J. Norberg-King, eds. 1986c.
  Validity of Ambient Toxicity Tests for Predicting Biologi-
  cal Impact,  Ohio River, near Wheeling, West Virginia.
  EPA 600/3-85-071. Duluth, MN.

Mount, D.R., K.R. Drottar, D.D. Gulley, J.P. Fillo, and P.E.
  O'Neil. 1992. Use of Laboratory Toxicity Data for Evalu-
  ating the Environmental Acceptability of Produced Water
  Discharge to Surface Waters, pp. 175-185.  In: J.P. Ray
  and F.R. Engelhardt, eds., Produced Water. Plenum
  Press, New York, NY.

Neuhold, J.M. 1986. Toward a Meaningful Interaction Be-
  tween Ecology and Aquatic Toxicology, pp. 11-21.  In:
  T.M. Poston and R.  Purdy, eds., Aquatic Toxicology and
  Environmental Fate, ASTM STP 921. American Society
  for Testing and Materials.

Niederlehner,  B.R.,  K.W. Pontash,  J.R. Pratt, and J.
  Cairns, Jr. 1990.  Field Evaluation of Predictions of Envi-
  ronmental Effects from a Multispecies-Microcosm Tox-
  icity Test. Arch. Environ. Contam. Toxicol. 19:62-71.

Niederlehner, B.R., J.R. Pratt, A.L. Buikema, Jr., and J.
  Cairns, Jr. 1985. Laboratory Tests Evaluating the Effects
  of Cadmium on Freshwater Protozoan Communities.
  Environ. Toxicol.  Chem. 4:155-165.

Nimmo, D.R., D. Link,  L.P. Parrish, G.J. Rodriguez, and W.
  Wuerthele. 1989. Comparison of On-site and Laboratory
  Toxicity Tests: Derivation of Site-specific  Criteria for
  Unionized Ammonia in a Colorado Transitional Stream.
  Environ. Toxicol.  Chem. 8:1177-1189.

Nimmo, D.R., M.H. Dodson, P.M. Davies, J.C. Greene, and
  M.A. Kerr. 1990. Three Studies Using Ceriodaphnia to
  Detect Nonpoint Sources of Metals from Mine Drainage.
  J. Water. Poll. Contr. Fed. 62:7-14.

Norberg-King, T.J. and D.I. Mount, eds. 1986. Validity of
  Effluent and Ambient Toxicity Tests for Predicting Bio-
  logical Impact,  Skeleton  Creek,  Enid,   Oklahoma.
  EPA 600/8-86-002. Duluth, MN.

Obrebski, S., J.J. Orsi, and W. Kimmerer. 1992. Long-term
  Trends in Zooplankton Distributions and Abundance in
  the Sacramento-San Joaquin Estuary. Interagency Ecological
  Studies Program for the Sacramento-San Joaquin
  Delta Estuary. Technical Report No. 32.

***Okkerman, P.C., E.J. v.d. Plassche, H.J.B. Emans, and
  J.H. Canton. 1993. Validation of Some Extrapolation
  Methods with Toxicity Data Derived from Multiple Spe-
  cies Experiments. Ecotox. Environ. Safe. 25:341-359.

***Parkhurst, B.R. 1995. Are Single Species Toxicity Test
  Results Valid Indicators of Effects to Aquatic Communities?
  pp. 105-121. In: J. Cairns, Jr. and B.R. Niederlehner,
  eds., Ecological Toxicity Testing, Lewis Publishers,
  Boca Raton, FL.

***Parkhurst, B.R. 1996. Predicting Receiving System Im-
  pacts from Effluent Toxicity.  pp. 309-321.  In: D.R.
  Grothe, K.L. Dickson, and D.K. Reed-Judkins, eds.,
  Whole Effluent Toxicity Testing: An Evaluation of Methods
  and Prediction of Receiving System Impacts.
  SETAC Press, Pensacola, FL.

***Parkhurst, B.R., M.D. Marcus, and C.E. Noel.  1990.
  Review of the Results of EPA's Complex  Effluent
  Toxicity Testing Program.  Utility Water Act Group,
  Washington, D.C.

***Persoone, G. and J. Gillett. 1990. Toxicological Versus
  Ecotoxicological Testing, pp. 287-289. In: P. Bourdeau,
  E. Somers, G.M. Richardson, and J.R. Hickman, eds.,
  Short-term Toxicity Tests for Non-Genotoxic Effects,
  John Wiley and Sons Ltd., New York, NY.

***Persoone, G. and C.R. Janssen. 1994. Field Validation
  of Predictions Based on Laboratory Toxicity Tests, pp.
  379-397. In: I.R. Hill, F. Heimbach, P.I. Leeuwangh, and
  P. Matthiessen, eds., Freshwater Field Tests for Hazard
  Assessment of Chemicals, Lewis  Publishers,  Boca
  Raton, FL.

***Persoone,  G., D. Calamari, and D. Wells.  1990.
  Possibilities and Limitations of Predictions from Short-term
  Tests in the Aquatic Environment, pp. 301-312. In:
  P. Bourdeau, E.  Somers, G.M. Richardson, and J.R.
  Hickman,  eds., Short-term Toxicity Tests for Non-
  Genotoxic Effects, John Wiley and Sons Ltd, New York,
  NY.

Pontash, K.W. and J. Cairns Jr. 1991.  Multispecies Toxicity
  Tests Using Indigenous Organisms: Predicting the Ef-
  fects of Complex Effluents  in Streams. Arch. Environ.
  Contam. Toxicol. 20:103-112.

Pontash, K.W., B.R. Niederlehner, and J. Cairns, Jr. 1989.
  Comparisons of Single-species, Microcosm and Field
   Responses to a Complex Effluent.  Environ. Toxicol.
   Chem. 8:521-532.

 Pratt, J.R., J. Mitchell, R. Ayers, and J. Cairns, Jr. 1989.
   Comparison of Estimates of Effects of a  Complex
   Effluent at Differing Levels of Biological Organization.
   pp. 174-188.  In: G.W. Suter and M.A. Lewis, eds.,
   Aquatic Toxicology and Environmental Fate, ASTM STP
   1007. American Society for Testing and Materials, Phila-
   delphia, PA.

 Richardson, B.J. and M.  Martin. 1994. Marine  and
   Estuarine Toxicity Testing: a Way to Go? Additional
   sitings  from   Northern  and Southern  hemisphere
   perspectives. Marine Poll. Bull. 28:138-142.

 Roberts, J.R., D.W. Rodgers, J.R. Bailey, and M.A. Rorke.
   1978. Polychlorinated Biphenyls: Biological Criteria for
   an Assessment of Their Effects on Environmental Qual-
   ity.  National Research Council of Canada, Ottawa.

 Robinson, R.D., J.H. Carey, K.R. Solomon, I.R. Smith,
   M.R. Servos, and K.R. Munkittrick.  1994. Survey of
   Receiving-water Environmental Impacts Associated with
   Discharges  from Pulp Mills.  1.  Mill Characteristics,
   Receiving-water Profiles and Laboratory Toxicity Tests.
   Environ. Toxicol. Chem. 13:1075-1088.

 Sasson-Brickson, G. and G.A. Burton, Jr. 1991. In Situ and
   Laboratory Sediment Toxicity Testing with Ceriodaphnia
   dubia. Environ. Toxicol. Chem. 10:201-207.

 Schimmel, S.C., G.E. Morrison, and M.A. Heber. 1989a.
   Marine Complex Effluent Toxicity Program: Test Sensi-
   tivity, Repeatability and Relevance to Receiving Water
   Toxicity. Environ. Toxicol. Chem. 8:739-746.

 Schimmel, S.C., G.B. Thursby, M.A. Heber, and M.J.
   Chammas. 1989b. Case Study of a Marine Discharge:
   Comparison of  Effluent and Receiving Water Toxicity.
   pp. 159-173. In: G.W. Suter, II and M.A. Lewis, eds.,
  Aquatic Toxicology and Environmental Fate: Eleventh
  Volume, ASTM STP 1007, American Society for Testing
  and Materials, Philadelphia, PA.

Sherman,  R.E., S.P. Gloss, and L.W. Lion. 1987. A Com-
  parison of Toxicity Tests Conducted in the Laboratory
  and in Experimental Ponds Using Cadmium  and the
  Fathead Minnow (Pimephales promelas). Water Res.
  21:317-323.

Siefert, R.E., S.J. Lozano, J.C. Brazner, and M.L. Knuth.
  1989. Littoral Enclosures for Aquatic Field Testing of
  Pesticides: Effects of Chlorpyrifos on a Natural System.
  Entomological Soc. Amer., Misc. Publ. 75:57-73.

 Slooff, W. 1985. The Role of Multispecies Testing in
   Aquatic Toxicology, pp. 45-60. In: J. Cairns, Jr., ed.,
   Multispecies Toxicity Testing, Pergamon Press, New
   York, NY.

 Slooff, W. and J.H. Canton. 1983. Comparison of the
   Susceptibility of 11 Freshwater Species to 8 Chemical
   Compounds. II. (Semi) Chronic Toxicity Tests. Aquat.
   Toxicol. 4:271-282.

 Slooff, W., J.A.M. van Oers and D. de Zwart. 1986. Mar-
   gins of Uncertainty in Ecotoxicological Hazard Assess-
   ment. Environ. Toxicol. Chem. 5:841-852.

 Smith, E.P. 1995. Design and Analysis of Multispecies
   Experiments,  pp. 73-95.  In: J. Cairns, Jr., and B.R.
   Niederlehner, eds., Ecological Toxicity Testing, Lewis
   Publishers, Boca Raton, FL.

 Smith, R. 1994. Contract Report by EcoAnalysis Inc.,
   Ojai, CA. Submitted to the State Water Resources
   Control Board, Sacramento, CA.

 Sprague, J. 1995.  A Brief Critique of Today's Use of
   Aquatic Toxicity Tests. Human Ecol. Risk Assess. 1:167-170.

 State Water Resources Control Board. 1990. Water Quality
   Control Plan for Ocean Waters of California (California
   Ocean Plan). SWRCB Resolution No. 90-27.

 Stay, F.S., D.P. Larsen, A. Katko, and C.M. Rohm. 1985.
  Effects of Atrazine on Community Level Responses in
  Taub Microcosms, pp. 75-90. In:  T.P. Boyle,  ed.,
  Validation and Predictability of Laboratory Methods for
  Assessing the Fate and  Effects  of Contaminants in
  Aquatic Ecosystems, ASTM STP 865, American Society
  for Testing and Materials, Philadelphia, PA.

 Stephenson, R.R., and D.F. Kane. 1984. Persistence and
  Effects of Chemicals in Small Enclosures in Ponds. Arch.
  Environ. Contam. Toxicol. 13:313-326.

 Swartz, R.C., F.A. Cole, J.O. Lamberson, S.P. Ferraro,
  D.W. Schults, W.A. DeBen, H. Lee II, and R.J. Ozretich.
  1994. Sediment Toxicity, Contamination and Amphipod
  Abundance at a DDT- and Dieldrin-contaminated Site in
  San Francisco Bay. Environ. Toxicol. Chem. 13:949-962.

Swartz, R.C., W.A. DeBen, K.A. Sercu, and J.O.
  Lamberson. 1982. Sediment Toxicity and the Distribution
  of  Amphipods in Commencement Bay, Washington,
  USA. Marine Pollution Bulletin 13:359-364.

Swartz, R.C., D.W. Schults, G.R. Ditsworth, W.A. DeBen,
  and F.A. Cole. 1985. Sediment Toxicity, Contamination,
  and Macrobenthic Communities near a Large Sewage
  Outfall, pp. 152-175. In: T.P. Boyle, ed., Validation and
  Predictability of Laboratory Methods for Assessing the
  Fate  and  Effects  of  Contaminants  in  Aquatic
  Ecosystems, ASTM  STP 865, American Society for
  Testing and Materials, Philadelphia, PA.

Swartz, R.C., D.W. Schults, J.O. Lamberson, R.J. Ozretich,
  and J.K. Stull. 1991. Vertical Profiles of Toxicity, Organic
  Carbon, and Chemical Contaminants in Sediment Cores
  from the Palos  Verdes Shelf and  Santa Monica,
  California. Marine Environ. Res. 31:215-225.

Underwood,  A.J.  1995.   Toxicological  Testing  in
  Laboratories  Is Not Ecological Testing of Toxicology.
  Human Ecol. Risk Assess. 1:178-182.

USEPA. 1984. Ambient Water Quality Criteria for Cad-
  mium 1984. EPA 440/5-84-032. Washington, D.C.

USEPA. 1991.  Technical Support Document for Water
  Quality-based Toxics  Control.   EPA/505/2-90-001.
  Washington, D.C.

USEPA. 1994a. Short-term Methods for Estimating the
  Chronic Toxicity of Effluents and Receiving Waters to
  Freshwater Organisms. 3rd ed.  EPA 600/4-91/002.
  Cincinnati, OH.

USEPA. 1994b. Short-term Methods for Estimating the
  Chronic Toxicity of Effluents and Receiving Waters to
  Marine and Estuarine Organisms. 2nd ed. EPA 600/4-
  91/003. Cincinnati, OH.

Van den Brink, P.J., R.P.A. Van Wijngaarden, W.G.H.
  Lucassen, T.C.M. Brock, and P. Leeuwangh. 1996. Ef-
  fects of the Insecticide Dursban 4E (Active Ingredient
  Chlorpyrifos)  in  Outdoor Experimental  Ditches:  II.
  Invertebrate  Community Responses and  Recovery.
  Environ.  Toxicol. Chem. 15:1143-1153.

 Van  Wijngaarden, R.P.A., P.J. van den Brink, S.J.H.
  Crum,  J.H. Oude Voshaar, T.C.M.  Brock, and  P.
  Leeuwangh. 1996. Effects of the insecticide Dursban 4E
  (active ingredient chlorpyrifos) in outdoor experimental
  ditches: I. Comparison of short-term toxicity between the
  laboratory  and  the field.   Environ.  Toxicol.  Chem.
  15:1133-1142.

***Waller, W.T., L.P. Ammann, W.J. Birge, K.L. Dickson,
  P.B. Dorn, N.E. LeBlanc, D.I. Mount, B.R. Parkhurst,
  H.R. Preston, S.C. Schimmel, A. Spacie, and G.B.
  Thursby. 1996. Predicting Instream Effects from WET
  Tests, pp. 271-286. In: D.R. Grothe, K.L. Dickson, and
  D.K. Reed-Judkins, eds., Whole Effluent Toxicity
  Testing: An Evaluation of Methods and Prediction of
  Receiving System Impacts. SETAC Press, Pensacola,
  FL.

Weiss, C.M. 1976. Field Evaluation of the Algal Assay
  Procedure on Surface Waters of North Carolina, pp. 29-76.
  In: E.J. Middlebrooks, D.H. Falkenburg and T.E. Maloney,
  eds., Biostimulation and Nutrient Assessment, Ann Arbor
  Science, MI.

Yoder, C.O. 1991. Answering Some Concerns about Biological
  Criteria Based on Experiences in Ohio. pp. 95-104. In: EPA
  Water Quality Standards for the 21st Century, Proceedings
  of a Conference. USEPA, Washington, D.C.

2.0 Bibliography

***Boyle, T.P. 1985. Research Needs in Validating and
  Determining the Predictability of Laboratory Data to the
  Field, pp. 61-66. In: R.C. Bahner and D.J. Hansen, eds.,
  Aquatic Toxicology and Hazard Assessment, ASTM STP
  891. American Society for Testing and Materials,
  Philadelphia, PA.

***Cairns, J. Jr.  1993.  Environmental Science  and
  Resource Management in the 21st Century: Scientific
  Perspective.  Environ. Toxicol. Chem. 12:1321-1329.

***Cairns, J. Jr., and J.R. Pratt. 1989. The Scientific Basis
  for Toxicity Tests. Hydrobiologia 188/189:5-20.

***Kimball, K.D. and S.A. Levin. 1985. Limitations of Laboratory
  Toxicity Tests: The Need for Ecosystem-level Testing.
  BioScience 35:165-171.

***Maltby, L. and P. Calow. 1989.  The Application of
  Toxicity Tests in the Resolution of Environmental Prob-
  lems;   Past,  Present  and  Future.  Hydrobiologia
  188/189:65-76.

***Mount, D.I. 1994. A Comparison of Strengths and Limitations
  of Chemical Specific Criteria, Whole Effluent Toxicity
  Testing, and Biosurveys. Contract report submitted to
  USEPA Office of Wastewater Enforcement and Compliance,
  Washington, DC.

***Parkhurst, B.R.  and D.I. Mount. 1991. The Water
  Quality-based Approach to Toxics Control:  Narrowing
  the Gap  Between  Science and  Regulation.  Water
  Environ. Tech. 3:45-47.

                                           Appendix  A
                         Single  Species Tests with Effluent
 The following is an interpretive summary of studies in which
 effluents were tested with single species toxicity tests and in
 which some ecological survey data were collected for
 comparative purposes.

 A.1 Dickson et al. (1996)
 A study (thesis project of R. Guinn, as summarized by
 Dickson et al., 1996) was conducted by the Institute of
 Applied Sciences at the University of North Texas to examine
 the effects of dechlorinating the effluent from a wastewater
 treatment facility (WWTP) on aquatic biological communities
 in the West Fork of the Trinity River, Texas. The WWTP
 effluent, at its discharge point, constitutes up to 96% of the
 river's flow during low flow periods. An objective of the study
 was to evaluate the relationships among effluent toxicity,
 river water toxicity, and biological community responses.

 Field assessments were performed to determine resident
 biota and abiotic factors in the river both upstream and
 downstream of the WWTP. Effluent and ambient water
 toxicity were assessed with USEPA's 7-d Ceriodaphnia
 survival/reproduction and larval fathead minnow
 survival/growth tests. In addition, ambient water toxicity in
 the river was assessed in situ with caged organisms:
 fathead minnows and Asiatic clams (Corbicula fluminea).
 Two sampling sites (controls) were located upstream and
 five sites were downstream of the WWTP outfall. The first
 two sites below the outfall were within 1.25 miles of the
 discharge point and the remaining three sites were at
 various locations 17 miles or less downstream. Ecological
 surveys included fish and benthic macroinvertebrate
 collections. Ambient water toxicity testing was conducted
 with samples collected at all seven sites.

 When this study was initiated, the WWTP was chlorinating
 its effluent. Effluent and ambient water toxicity testing, as
 well as biological sampling, was conducted during this period
 to establish a baseline for comparison with data collected
 after the implementation of dechlorination. During this
 pre-dechlorination period, data were collected in two months
 (August and October).

 With both the larval fathead minnow survival and growth
 endpoints, statistically significant toxicity (compared to the
 two upstream sites) was observed in the effluent and in
 ambient water from the first two sites downstream of the
 WWTP outfall; results were the same in August and October.
 Dechlorination of the water samples from the two sites
 below the outfall removed the toxicity. Statistically
 significant toxicity in the larval fathead minnow tests was not
 observed in ambient water samples from sites 5, 6, and 7
 downstream of the outfall.

 Statistically significant toxicity (compared to the upstream
 sites) was recorded in the effluent and water samples from
 all sites downstream of the WWTP with the Ceriodaphnia
 tests (both survival and reproduction). Dechlorination of
 the toxic ambient water samples failed to remove the
 toxicity, suggesting that other contaminants were causing
 the water flea responses.  In October, statistically signifi-
 cant toxicity (compared to upstream sites) was noted in the
 Ceriodaphnia tests with effluent and in water samples from
 the two downstream sites nearest the outfall, but not at
 sites 5, 6, and 7.

 In the biological surveys, no fish were collected at the two
 sites below the WWTP outfall. Between 200 and 4,500 fish
 were collected at other sites on the river.  Fish species
 richness, evenness, and diversity were fairly equivalent at
 all sites except the two below the outfall. Densities of ben-
 thic macroinvertebrates were lower at the two sites below
 the outfall than at the two upstream reference sites as well
 as  at sites 5,  6, and 7, below the outfall.

 Based on the data collected during the pre-dechlorination
 period, the authors predicted that effluent dechlorination
 would remove toxicity to larval fathead minnows and
 possibly restore the environment below the WWTP outfall
 so that those areas could be colonized by fish. Because
 the toxicity to the water flea could not be totally attributed
 to chlorine, the authors suggested that dechlorination
 might not alter Ceriodaphnia responses. Impacts to
 instream biota therefore remained possible because of
 non-chlorine contaminants.

 Following activation of the dechlorination system, WWTP
 effluent and river water samples at all seven sites were
 collected and tested on a monthly basis for a total of 17
 test periods. Dechlorination appeared to remove effluent
 and ambient water toxicity when larval fathead minnows
 were used to screen samples. Dechlorination did not
 remove all of the effluent or ambient water toxicity detected
 with Ceriodaphnia. The TIE identified diazinon as a major
 cause of the effluent and ambient water daphnid toxicity.

 During the pre-dechlorination period, caged fathead minnows
 did not survive at river stations 3 and 4, immediately
 below the WWTP outfall and approximately one mile
 downstream, respectively. With the exception of one of four
 testing periods after implementation of dechlorination,
 survival of caged fathead minnows at stations 3 and 4 was
 equivalent to all other stations. Juvenile Corbicula were
 exposed in situ for one-month periods on five different test
 dates: one pre-dechlorination and four after initiation of
 dechlorination. Prior to initiation of dechlorination, clam
 mortality was 100% at stations 3 and 4, while there was
 100% survival at all five of the other stations. After
 implementation of dechlorination, clam survival at stations
 3 and 4 was 100%. However, shell growth was
 significantly lower at stations 3 and 4 (compared to all
 other stations), suggesting the presence of an effluent
 contaminant other than chlorine. The in situ tests support
 the results observed in the laboratory toxicity tests with
 effluent and ambient water samples.

 Following dechlorination, fish were present at all river
 stations, supporting the authors' prediction of the possibility
 of recolonization at sites 3 and 4 with the implementation
 of dechlorination. However, in three of four surveys after
 dechlorination was initiated, the river station nearest the
 outfall was found to have fish assemblages dissimilar to
 those of the other stations. Macroinvertebrate surveys
 revealed significant improvement in diversity and evenness
 at stations 3 and 4 following initiation of dechlorination,
 although the total number of organisms was lower compared
 to the other stations.

 In concluding, Dickson et al. (1996) state: 1) "The results
 of this case study add to the growing weight-of-evidence
 to document a relationship between effluent toxicity (even
 chronic toxicity) and receiving system impacts for
 effluent-dominated systems" and 2) "We believe that
 establishing a quantitative relationship between WET test
 results, ambient toxicity, and receiving systems effects, as a
 means for validating WET test results, is not possible given
 the methods, approaches, and resources currently available.
 However, we believe the weight-of-evidence strongly supports
 that such a qualitative relationship exists."

 A.2 Pontasch et al. (1989)
 These researchers compared microcosm (multiple species
 tests consisting of indigenous benthic macroinvertebrates
 and protozoans)  responses to a complex effluent with
 responses observed in  short-term  estimates of chronic
 toxicity (Ceriodaphnia survival and reproduction).  The
 predictive utility of these tests was evaluated in relation to
 observed  effects in the stream  receiving the complex
 effluent.

 The results of this study demonstrated that the
 Ceriodaphnia reproduction response successfully estimated
 no effect concentrations for the assessment of aquatic
 community biological responses. Information from the
 multiple species tests provided more specific predictions
 than did the single species test.

 The cladoceran reproduction results slightly underestimated
 the effects of the complex effluent on the receiving
 stream. Ceriodaphnia survival results in the laboratory
 toxicity tests underestimated instream impairments of the
 effluent. Similar findings had been made by Pontasch and
 Cairns (1991), in whose study laboratory toxicity tests with
 D. magna underestimated biological community impairments
 in the stream receiving the discharge. Underestimation
 refers to the situation in which the laboratory toxicity test
 indicates a higher effect concentration than that which
 actually causes instream impairments. On the other hand,
 Cairns and Cherry (1983) demonstrated, in tests with a
 power plant effluent, that single species test results can
 effectively predict ecosystem biological responses.

  A.3 Niederlehner et al. (1990)
 The predictive validity of a microcosm (multiple species)
 toxicity test was evaluated by Niederlehner et al. (1990).
 The study was conducted on a stream which receives a
 complex industrial discharge. A control site was located
 immediately upstream of the outfall; site 1 was approximately
 five meters downstream of the outfall; sites 2, 3,
 and 4 were 0.25, 1.4, and 6.4 km downstream of the
 outfall, respectively. Care was taken to select sites with
 similar characteristics, especially substrate type. The
 concept was to assure that the effluent was the major
 variable among the sites. Effluent dilution at each of
 these sites was estimated using electrical conductivity. In
 addition to the microbial microcosm test, dilutions of the
 effluent also were tested with the 7-d Ceriodaphnia test.
 The instream measurements taken at each of the sites as
 indicators of biological community health included species
 richness of protozoans and a semi-quantitative survey of
 benthic macroinvertebrates.
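
 The summary above does not say how conductivity readings were
 converted to effluent dilutions. One common approach, assumed
 here and not confirmed as the study's method, treats conductivity
 as a conservative tracer in a two-component mix of upstream water
 and effluent. The function name and all readings in this sketch
 are hypothetical and are not data from Niederlehner et al. (1990).

```python
def effluent_fraction(site_cond, upstream_cond, effluent_cond):
    """Estimate the effluent fraction at a site from conductivity.

    Assumes conservative two-component mixing of upstream water and
    effluent (an illustrative assumption, not the study's documented
    method). All three readings must share the same units.
    """
    if effluent_cond == upstream_cond:
        raise ValueError("effluent and upstream conductivity must differ")
    fraction = (site_cond - upstream_cond) / (effluent_cond - upstream_cond)
    # Measurement noise can push the estimate slightly outside 0..1.
    return min(1.0, max(0.0, fraction))


# Hypothetical readings in uS/cm, chosen only to illustrate the arithmetic.
percent = 100 * effluent_fraction(site_cond=240.0,
                                  upstream_cond=100.0,
                                  effluent_cond=1100.0)
print(f"estimated effluent concentration: {percent:.1f}%")
```

 The estimate is only as good as the tracer assumption: other
 conductivity sources between the outfall and the site would bias
 the result.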

 In both the microcosm and the water flea  reproductive
 response tests the LOEC and NOEC were 3% and 1%
 effluent, respectively. In the field survey, significant effects
 on protozoan  and macroinvertebrate  species richness
 were seen at site 1, just below the outfall;  estimated
 effluent concentration at this site was 14.1%.  High per-
 centages of chironomid species and low percentages of
 mayfly species were seen at sites 1, 2, and 3, but at site
 4 the composition  of the two groups was similar to the
 control site. Generally, chironomids  are  considered
 tolerant and mayfly species intolerant of water pollution.

 If the species composition of these two groups is used as
 a sensitive indicator of ecosystem responses, then effluent
 effects were seen all the way down to site 3. Estimated
 effluent concentration at this site was 3.5%. The
 Ceriodaphnia reproduction and microcosm tests estimated
 the LOEC to be 3% effluent. Therefore, both tests reliably
 predicted instream biological community responses.
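
 The comparison behind this conclusion can be sketched as a simple
 threshold check: an instream effect is predicted wherever the
 estimated effluent concentration is at or above the laboratory
 LOEC. Only the two site concentrations quoted in this summary are
 used; the function name and the "at or above" decision rule are
 illustrative assumptions.

```python
LOEC_PERCENT = 3.0  # LOEC from the Ceriodaphnia reproduction and microcosm tests

# Estimated instream effluent concentrations quoted in the summary (% effluent).
site_effluent_percent = {"site 1": 14.1, "site 3": 3.5}


def effect_predicted(concentration, loec=LOEC_PERCENT):
    """Return True when the instream concentration reaches the LOEC."""
    return concentration >= loec


predictions = {site: effect_predicted(conc)
               for site, conc in site_effluent_percent.items()}

for site, flag in predictions.items():
    label = "effect predicted" if flag else "no effect predicted"
    print(f"{site}: {site_effluent_percent[site]}% effluent -> {label}")
```

 Both sites exceed the 3% LOEC, which matches the field
 observations reported above for site 1 (species richness) and
 for sites down to site 3 (chironomid and mayfly composition).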

 A.4 Diamond et al. (1994)

 A rather detailed analysis of this publication is provided
 because the ecological survey, as well as other components
 of this study, represents the type of design and
 analysis that is to be avoided when attempting to assess
 the reliability of extrapolations from laboratory toxicity test
 data to instream responses. Our evaluation of the data
 presented led to the conclusion that the effluent under study
 was adversely impacting the stream and river into which it
 was discharged. Diamond et al. (1994) concluded that the
 effluent was not impacting the stream.

 Diamond and associates conducted a study on a wastewater
 treatment facility (WWTP) effluent, the stream (X-trib) into
 which the effluent was discharged, and the South Anna River
 (in Virginia) into which X-trib discharged. Chemical specific
 analyses and USEPA toxicity tests (USEPA, 1994a) were
 performed on effluent samples; stream bioassessments were
 implemented on X-trib, the South Anna River, and on two
 reference streams (other tributaries to the South Anna River).

 X-trib, the receiving water, was described as heavily
 channelized with concrete structures. WWTP effluent
 comprised approximately 98% of X-trib flow during low flow
 periods. The South Anna River was described as being
 forested over much of its watershed and apparently
 unimpaired by anthropogenic influences above its confluence
 with X-trib. Two sites on X-trib were selected, one in an
 open, sunny area above the WWTP point of discharge and
 the other in a shaded area below the point of discharge.
 The selection of these two sites appears unfortunate in that
 the two sites fail to match in habitat type; therefore,
 effluent constituents are not the only variable that differs
 between the sites.

 Two reference sites were chosen. The first was on an open
 stream which discharged into the South Anna River and was to
 serve as a matched or control site for the upstream site on
 X-trib. The second reference site was on a shaded stream which
 also discharged into the South Anna River; this site was
 intended as a control for the lower site on X-trib. The
 authors indicated that the reference sites provided
 information on fauna capable of inhabiting X-trib. However,
 the authors concluded that the two reference sites on the
 South Anna River tributaries had better habitats for fish
 and macroinvertebrates than the X-trib sites. Therefore,
 these sites should be disqualified as reference sites
 because habitat differences rather than water quality could
 account for biological community differences. Selection of
 such sites reveals questionable study design and
 represents a serious flaw in this study. Interpretation of
 results is clearly confounded. Four sites were selected
 on the South Anna River. One site was above the
 confluence with X-trib and the three other sites were
 downstream of the confluence.

 Bioassessments focused on benthic macroinvertebrates
 and fish populations. Two types of bioassessments were
 performed. The first type involved introduced substrate at
 the sites. This substrate consisted of rocks collected at the
 upstream South Anna River site. The authors' rationale for
 this procedure related to previous impact to X-trib and the
 South Anna River from toxic substances discharged from
 the WWTP. The authors fail to address the question of
 why fauna would not have naturally recolonized sites on
 X-trib and the South Anna River if water quality had improved.
 From a biological perspective, the existing
 macroinvertebrate communities at a given site better
 represent water quality over time than introduced fauna. If the
 introduced substrate procedure is to be used, information
 on the response time of the introduced macroinvertebrates
 to toxic substances (particularly metals, to which responses
 tend to be slow as bioaccumulation occurs) should have been
 provided, but was not. Furthermore, the introduced substrates
 were placed at each site for only four weeks;
 macroinvertebrate communities tend to respond slowly to
 metals and other toxicants which exert effects after
 bioaccumulation.

 The authors placed much less emphasis on the second
 type of bioassessment procedure which was grab samples
 at each site. Clearly, however, these resident communities
 would be much more representative of bioaccumulative
 substances. Sampling was conducted during fall (October)
 and spring (April).

 Toxicity tests were performed using the 7-d larval fathead
 minnow and Ceriodaphnia protocols. Ceriodaphnia tests
 were completed on two effluent samples taken in May and
 two collected in October. No sample revealed toxicity.
 Larval minnow tests were conducted with two effluent samples
 collected in October (neither indicated toxicity) and
 one sample taken in May. This May sample indicated
 significant toxicity. Unfortunately, two other effluent samples
 collected in May (after the first May sample indicated toxicity)
 were not tested with larval fathead minnows. The failure
 to follow up on the first indication of toxicity was an
 experimental error. Furthermore, the very few effluent samples
 which were tested do not allow characterization of the
 WWTP effluent (WWTP effluents tend to show considerable
 temporal variability). Therefore, the authors' conclusion
 that the toxicity data indicated the effluent should not
 impact the receiving water biota is not supported by the data
 presented. The seven-day tests are not good measures
 of bioaccumulative impacts of metals.

 Although replicates were included in the ecological survey,
 variability among the replicates was not reported, which
 further complicates data interpretation.

 Grab sample bioassessment data collected in the fall
 explicitly revealed that the X-trib sites did not correspond with
 the reference sites. According to several of the
 macroinvertebrate indices, the lower X-trib site was impaired
 compared to the upstream site and to the reference
 site; fish data also suggested that the lower site was
 impacted. As indicated above, the introduced substrate (IS)
 data should be interpreted with caution; nonetheless, even
 these data imply that the lower X-trib site was impaired
 compared to the site above the discharge point. Perhaps
 more importantly, the fall grab samples at South Anna River
 sites downstream of the confluence with X-trib indicate that
 they were impacted. No fish data were reported for the
 South Anna River.

 IS data were presented for only two X-trib sites and two
 South Anna River sites (those above and below the X-trib
 confluence). Failure to include other downstream sites, as
 well as the short exposure time (see above), limits the
 value of these data. Nevertheless, examination of the
 dominant taxa on the IS suggests that water quality at the
 upstream South Anna River site was better than at the site
 below the confluence. Although differences between
 means of several of the bioassessment metrics when
 comparing the upstream and downstream river sites are
 large, they were reported as not being statistically different.
 This is likely due to the fact that an analysis of variance
 was applied to data from both X-trib and the South Anna River.
 This application does not seem justified given that the
 tributary and the river are such different habitats.
 Moreover, the means of the macroinvertebrate indices at the
 two X-trib sites were frequently so large that variation in the
 data sets masked differences between South Anna River sites.

 In the spring collections, the macroinvertebrate grab samples
 analyzed from the lower X-trib site indicated that it
 was impacted compared to both the upstream and reference
 sites. IS data from X-trib for the spring sampling
 period were not presented. During the spring, grab samples
 for macroinvertebrate analysis were taken at only two
 South Anna River sites, the upstream site and the downstream
 site nearest the confluence. The absence of data from the
 other two South Anna River sites further limits this data set.
 Although there were few apparent statistical differences
 between macroinvertebrate indices from the two sites,
 biological community composition indicated that the site below
 the X-trib confluence was impacted; the same trend was
 noted in the IS data.

 Data presented in this publication do not support the
 authors' contention that neither X-trib nor the South Anna
 River is impacted by WWTP effluent constituents. They
 attribute the impacts indicated in X-trib by the grab sample
 macroinvertebrate data to habitat limitations. If this is
 actually the case, one must conclude that their study design
 was flawed from the outset. However, their conclusion is
 not supported by the differences between the upstream
 and downstream sites (as shown in all types of
 bioassessment data).

 A.5 Birge et al. (1992)
 Birge et al. were involved in a relatively long-term study of
 the effluents produced by the Paducah Gaseous Diffusion
 Plant (PGDP) and the streams into which these effluents
 are discharged, Big Bayou and Little Bayou Creeks. Toxic-
 ity, chemical, and bioassessment monitoring were per-
 formed. Specifically investigating the relationship between
 effluent/ambient toxicity test results and instream biological
 responses was not a stated goal  of this study, but some
 interesting information can be gleaned from their results.

 The PGDP has 16 potential discharge points into the two
 creeks. The focus was on eight of these effluents because
 they constitute continuous discharge to the streams.
 Seven-day Ceriodaphnia and larval fathead minnow tests
 were conducted with 51 undiluted effluent samples and
 with 37 stream samples collected on four different
 occasions. Instream biological assessments (primarily
 number of taxa and density) of benthic macroinvertebrates
 were performed at three separate times (1987-91) at eight
 sites. One of these sites was above the discharge points,
 three sites were at increasing distances from the last
 discharge point, and the other four sites formed a gradient
 within the spatial range of the several discharge points.

 Bioassessment data were collected in 1987 through mid-
 1988. Four separate sampling events indicated instream
 biological impairment at sites within the range of discharge
 points (as compared to the upstream reference site).  At
 sites below the last discharge point, there appeared to be
 progressive recovery as measured by number of taxa and
 density of macroinvertebrates. Ecological survey data
 collected in 1990 and 1991, but not 1989, were similar to
those collected in earlier years. The toxicity testing data
 are summarized below:
   Larval Fathead Minnows
     Effluent: significant mortality in 31/51 samples (61%)
     Ambient water downstream: significant mortality in 18/37 samples (49%)

   Ceriodaphnia
     Effluent: significant toxicity in 11/51 samples (22%)
     Ambient water downstream: significant toxicity in 4/37 samples (11%)

The difference in undiluted effluent and ambient water
toxicity appeared to be primarily a dilution phenomenon.
Generally, effluent toxicity predicted instream toxicity when
dilution was taken into consideration.  On a qualitative
basis,  instream  toxicity  reliably  predicted  instream
biological responses.
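
The percentages in the toxicity summary above follow directly from
the sample counts. A minimal sketch of that arithmetic is given
below; the dictionary layout and function name are illustrative
choices, and only the counts quoted in the summary are used.

```python
# (samples with a significant effect, samples tested), as quoted above.
counts = {
    ("larval fathead minnow", "effluent"): (31, 51),
    ("larval fathead minnow", "ambient water downstream"): (18, 37),
    ("Ceriodaphnia", "effluent"): (11, 51),
    ("Ceriodaphnia", "ambient water downstream"): (4, 37),
}


def percent_affected(affected, tested):
    """Share of tested samples with a significant effect, as a whole percent."""
    return round(100 * affected / tested)


percentages = {key: percent_affected(*pair) for key, pair in counts.items()}

for (species, sample), pct in percentages.items():
    print(f"{species:22s} {sample:26s} {pct}%")
```

For both species the ambient water percentage is lower than the
undiluted effluent percentage, which is the pattern the authors
attribute primarily to dilution.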


A.6 Pratt et al. (1989)
The potential impact of a municipal sewage effluent on
Smith River (Virginia) was evaluated  using  acute  and
chronic single species toxicity tests and a microcosm test
consisting of indigenous microbiota.  Effect levels obtained
in the single species and microcosm studies on effluent
were compared with the estimated  instream waste con-
centration (IWC) and with results of an ecological survey.

The study consisted of two sites upstream of the
wastewater treatment facility (WWTP) and three sites
below the outfall of the facility.  A survey of benthic
macroinvertebrates and protozoan communities was con-
ducted at each of these sites.  Effluent from the WWTP
was tested in the 7-d larval fathead minnow and
Ceriodaphnia tests, as well as in the indigenous species
microcosm test.  The microcosm test consisted of  micro-
organisms.  River water samples collected at one  site
above the WWTP outfall and at all sites below the outfall
were tested in the 7-d Ceriodaphnia test, but not the larval
fathead minnow test.

The macroinvertebrate data suggested impairments (com-
pared to the upstream control) at the first two sites below
the WWTP outfall, with  recovery at the  third site.  The
Ceriodaphnia tests did not show significant toxicity in water
samples collected at the two impacted sites. LOECs were
30% effluent in the microcosm and Ceriodaphnia tests and
15% in the larval minnow test. Maximum IWC was esti-
mated to be 9.5% effluent; NOECs  in all  laboratory tests
were  10%  effluent.  Therefore, both  the effluent  and
ambient  water  single  species tests  underestimated
instream impacts.
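
The reasoning behind this conclusion can be sketched as a comparison of the instream waste concentration with the laboratory effect levels. The percent-effluent values below are those reported in the summary above; the function name and the three outcome labels are illustrative assumptions, not a procedure described by Pratt et al.

```python
# Values are percent effluent, as reported in the summary above.
# The function and its outcome labels are hypothetical, for illustration.
def predicted_instream_effect(iwc: float, noec: float, loec: float) -> str:
    """Classify the expected instream outcome from laboratory effect levels."""
    if iwc >= loec:
        return "effect predicted"
    if iwc <= noec:
        return "no effect predicted"
    return "indeterminate"

MAX_IWC = 9.5  # estimated maximum instream waste concentration
NOEC = 10.0    # reported for all laboratory tests

loecs = {
    "microcosm": 30.0,
    "Ceriodaphnia": 30.0,
    "larval fathead minnow": 15.0,
}

predictions = {
    test: predicted_instream_effect(MAX_IWC, NOEC, loec)
    for test, loec in loecs.items()
}

for test, outcome in predictions.items():
    print(f"{test}: {outcome}")
```

Because the maximum IWC lies below every NOEC, each laboratory test predicts no instream effect, whereas the macroinvertebrate survey indicated impairment at the first two sites below the outfall.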

A.7 Crossland et al. (1992)
The toxic fraction (chlorinated ethers) of a petrochemical
manufacturing plant effluent was studied in simulated out-
door streams. Four different concentrations of the effluent
extract were tested in the streams; exposure was for 21 to
28 days.  Two untreated streams served  as controls.

The LOEC and NOEC (Gammarus pulex) in the simulated
streams were 0.86 ug/L and 0.44 ug/L, respectively.
These values were compared to the NOEC from a 7-d
Daphnia magna laboratory test; the reproduction NOEC in
this test was 1.0 ug/L.  Although a 21-day Daphnia test
would have been more appropriate, the result from the
single species test was an effective qualitative predictor of
effect concentration in the  mesocosm; the Daphnia data
slightly overestimated the artificial stream effect concen-
tration.


A.8 Robinson et al. (1994)
These investigators conducted an examination of the rela-
tionship between environmental responses at 11 pulp mills,
their pulping processes, degree of effluent treatment, and
bleaching technologies.  Water samples from upstream
and downstream of the pulp mill discharge points were
screened in the 7-d larval fathead minnow and
Ceriodaphnia tests. These data were compared to physio-
logical data collected from fish and benthic
macroinvertebrate data from above and below the dis-
charge points.

At four of 11 pulp mills the benthic macroinvertebrate com-
munities were characterized as highly impacted below the
discharge point compared to upstream sites. Statistically
significant toxicity was detected in water samples down-
stream of all four of these mills in the larval fathead test,
but at only one site in the Ceriodaphnia test. These four
mills only had primary effluent treatment.  Although the
larval minnow test reliably indicated instream impacts at
the four sites, the Ceriodaphnia test was  less effective.

Neither of the single species tests predicted the physiologi-
cal impairments seen in fish collected below the pulp mill
outfalls.    Physiological responses  associated with
reproductive dysfunction (decreased sex steroid levels and
gonad size) and other disturbances (increased liver size
and enzyme abnormalities) were observed in fish collected
below pulp mill discharge points regardless of mill process,
bleaching technology, or effluent treatment.  This study
represents a case in which the single species tests failed
to predict (i.e.,  underestimated) instream impacts  of
effluents.

A.9 Sasson-Brickson and Burton (1991)
In situ exposures of C. dubia were conducted in a stream
known to be impacted (based on benthic macroinvertebrate
and  fish community data)  by several effluents.   The
C. dubia were in sediment exposure chambers placed in
the stream for 48 h at an  impacted site and at a reference
site.  Sediments from the impacted and reference sites
also were tested in the  laboratory with C. dubia  using
sediment solid phase, interstitial water, and elutriate tests.

Both the in situ and laboratory tests indicated statistically
significant sediment toxicity; the responses in the labora-
tory tests were greater than in the in situ exposures. The
authors concluded that the in situ exposures proved to be
sensitive indicators of both degraded and  nondegraded
stream conditions. They also implied that  the in situ re-
sponses were more reliable than the laboratory responses.
This may not be a valid conclusion  since neither the
 laboratory nor the in situ responses were quantitatively
 correlated with instream biological community impacts.

 A.10 Barbour et al. (1996)
 Barbour et al. (1996) summarized studies conducted by
 Ohio EPA (see also Yoder, 1991) in which agreement be-
 tween data from 48-h C. dubia  and fathead minnow
 toxicity tests and from biosurveys were analyzed. Toxicity
 tests were performed on effluent and in some cases on
 mixing zone water samples. These authors surmised that
 the Ohio EPA analysis indicates that "the observance of
 acute toxicity, or lack thereof, in an effluent and to a lesser
 degree in mixing zones is not necessarily reflected by the
 instream communities." According to these authors other
 impacts often pre-empted or masked effects of toxicity.
 The authors concluded, "These results should not be mis-
 construed to claim toxicity testing is an invalid assessment
 and regulatory tool."

 Indeed, caution  should be  used in making conclusions
 from the Ohio EPA data for several reasons. Toxicity tests
 were  not performed  on  water samples collected at
 biosurvey sites.   No  information was provided on the
 degree of effluent dilution at each of the biosurvey sites.
 It appears that toxicity tests were performed on  whole
 effluent, without  dilution series to assess effect concen-
 trations.  Predicting biological community impacts based
 on the results of one toxicity test on effluent (or mixing
 zone sample), as the authors indicate, is unsound.

 Barbour et al. (1996) also summarized similar studies per-
 formed by the Florida Department of Environmental
 Protection (DEP).  In this project, 48-h toxicity tests with
 Ceriodaphnia and Notropis leedsi (a marine minnow) were
 performed on effluents from 107 facilities classified  into
 several industrial categories. Macroinvertebrate surveys
were conducted  on streams into which the facilities dis-
charged.  Comparisons were  made between effluent
toxicity and biosurvey data.
 Combining data from all facilities, the following relationships
 between effluent toxicity and instream survey data were
 obtained:

      Effluent toxic, stream site impaired = 24.0%;

      Effluent toxic, stream site not impaired = 10.7%;

      Effluent not toxic, stream site impaired = 41.3%;

      Effluent not toxic, stream site not impaired = 24.0%.

 Toxicity tests reliably "predicted" instream conditions in
 48% of the 107 situations. "False positives" were
 relatively rare (10.7%).  "False negatives" were much
 more common (41.3%). Florida DEP attributed biological
 impairment at a large portion of the "false negative" sites
 to non-effluent related factors.
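
The 48% agreement figure is the sum of the two concordant categories. A minimal sketch of that bookkeeping follows; the percentages are those reported above, and the variable names are illustrative assumptions.

```python
# Percentages are those reported for the Florida DEP data above.
# Keys are (effluent result, stream condition); names are illustrative.
outcomes = {
    ("toxic", "impaired"): 24.0,
    ("toxic", "not impaired"): 10.7,
    ("not toxic", "impaired"): 41.3,
    ("not toxic", "not impaired"): 24.0,
}

# Concordant cases: the toxicity result and the biosurvey agree.
agreement = outcomes[("toxic", "impaired")] + outcomes[("not toxic", "not impaired")]

# Discordant cases, labeled as in the text.
false_positive = outcomes[("toxic", "not impaired")]
false_negative = outcomes[("not toxic", "impaired")]

total = sum(outcomes.values())

print(f"agreement:       {agreement:.1f}%")
print(f"false positives: {false_positive:.1f}%")
print(f"false negatives: {false_negative:.1f}%")
print(f"total:           {total:.1f}%")
```

The four categories account for all of the facilities, so the discordant cases make up the remaining 52%, most of them "false negatives."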

 Barbour et al. (1996) concluded that lack of agreement
 in this study was  not necessarily due  to  contradiction
 between the toxicity testing and biosurveys. This conclu-
 sion seems valid since in many  cases biological impair-
 ment was due to causes other than effluent toxicity. Also,
 toxicity tests  were performed on only one sample from
 each facility (as indicated above, one sample is not likely
 to characterize the effluent of a facility). Furthermore, the
 same cautions as mentioned above in regard to the Ohio
 EPA data apply here. That  is, toxicity tests were not
 performed on water samples collected at biosurvey sites.
 No information was provided on the  degree of effluent
 dilution at each  of the biosurvey sites.  It  appears that
 toxicity tests were performed on whole  effluent, without
 dilution series to assess effect concentrations.

 A.11 Mount et al. (1992)
 During fossil fuel production, water pumped from the
 formation is separated and discarded, frequently into
 marine or freshwater environments.  This fraction, com-
 monly termed "produced water," can contain a diverse
 array of contaminants including brine, hydrocarbons, heavy
 metals, surfactants, and corrosion inhibitors.

 Mount  and colleagues reported on a series of laboratory
 and field studies which were conducted on produced water
 from a coal bed methane operation in the Cedar Cove De-
 gasification Field of Alabama. The produced waters were
 discharged into Little Hurricane Creek. The primary goal
 of the studies was to determine the environmental accept-
 ability of discharging produced water into this creek.

 Toxicity tests were performed on the produced water using
 USEPA's fathead minnow and Ceriodaphnia tests; the
cladocerans proved to be the more useful monitoring tool
 in these studies. Concurrent with the laboratory toxicity
tests, a series of instream surveys were performed on Little
 Hurricane Creek. Based on these data the authors con-
cluded, "Research conducted at Cedar Cove suggests that
 laboratory toxicity tests can be used to predict instream
 effects of produced water discharge."

-------
                                           Appendix  B
              Single Species Tests with Individual  Chemicals
  The  following consists of an interpretive summary of
  studies in which  single species tests were used to assess
  the toxicity of a single chemical or combination of a small
  number of chemicals and predict effect concentrations on
  aquatic ecosystem biological responses.

  B.1 Organic Chemicals: Pesticides
  B.1.1 Hansen and Garton (1982)
  These investigators assessed the ability of single species
  toxicity test results to reliably predict the effects of the in-
  secticide diflubenzuron on complex laboratory stream com-
  munities. The single species tests included five "chronic
  tests" with five different species, including a 21-d Daphnia
  magna test.  The laboratory stream communities were
  stocked from a natural source and then exposed to the
  pesticide for five months. Effects on the stream communi-
  ties were appraised at the functional group level using bio-
  mass and diversity.

  For Daphnia, the 21-day LC50 for this pesticide was
  0.1 ug/L. Statistically significant effects on invertebrate
  shredder, scraper, and collector/gatherer/filterer functional
  groups were evident in the mesocosm after 5 to 7 months
  exposure at a nominal concentration of 0.1 ug
  diflubenzuron/L.  The Daphnia toxicity test results ap-
  peared to reliably predict the responses of aquatic inver-
  tebrate communities.  However, there is uncertainty in
  these data.  For example, durations of exposure in the
  laboratory and field settings were very different and
  mesocosm exposures were not analytically confirmed
  (dissipation and degradation usually result in lower than
  predicted exposure concentrations). LC50s are not nec-
  essarily the optimal predictor tool; however, if environ-
  mental concentrations of a chemical approach the LC50
  level, biological community impairments are probable.
 Another confounding factor in this study was that the
 control populations declined during the five month course
 of the study.

 Although there were uncertainties and confounding factors
 in this study, the correspondence between laboratory and
 field effect concentration supports the hypothesis that labo-
 ratory test results are predictive indicators of direct effects
 in the environment. Concentrations below the laboratory-
 determined LC50 were not tested  in the mesocosms, so
it is not possible to know whether field effect levels were
overestimated.

  B.1.2 Baughman et al. (1989)
  To evaluate the usefulness of laboratory toxicity tests in
  predicting fenvalerate (a pyrethroid insecticide) impacts,
  Baughman et al. conducted laboratory and field tests with
  the grass shrimp (Palaemonetes pugio).  Two types of
  laboratory tests were conducted: 96-h static-renewal tests
  and 6-h pulse dose exposures.  The response (endpoint)
  compared between laboratory and field was the LC50.

  Response of grass shrimp in the  field was similar to the
  laboratory toxicity tests (i.e., concentrations which were
  shown to produce lethality in the laboratory also caused
  mortality in field settings).  These results indicated that
  physical and chemical factors in natural ecosystems did
  not appreciably modify the toxicity of fenvalerate.

  In this study, laboratory test data were  not extrapolated
 across species, but rather to the same species in natural
 stream conditions. Although  not a powerful support of the
 reliability of laboratory single species  test results as pre-
 dictors of instream biological impacts, this study does show
 a correspondence between laboratory and natural ecosys-
 tem effect concentrations.

 B.1.3 Clark et al. (1987)
 Clark and colleagues scrutinized laboratory toxicity test
 results  as  predictors of  effects  of fenthion,  an
 organophosphorus insecticide, on  caged animals in field
 settings. The laboratory tests were 96-h mortality deter-
 minations on a mysid (M. bahia), the pink shrimp (Penaeus
 duorarum), the grass shrimp (Palaemonetes pugio), and the
 sheepshead minnow (C. variegatus). The responses used
 for comparisons were 24-h and 48-h LC50s.  Caged
 animal  tests   and  environmental   chemical  studies
 (measurements of fenthion) were executed in a bay and
 a pond connected to Santa Rosa Sound, as well as in an
 estuarine bay.

 Results of this study reveal that the laboratory-derived
 LC50s were reasonable predictors of mortality to the same
 species in the field, but only when laboratory and field ex-
 posure regimes were similar. The laboratory LC50s were
 not effective predictors of sublethal effects. As in many of
the studies summarized above, the findings of this study
disclose that physical and chemical factors in aquatic eco-
systems did not appreciably alter the toxicity of this pesti-
cide. Caging of animals in the  field  did not allow for
 possible  avoidance  behavior.  The  advantages  and
 limitations of using acute exposure LC50s as predictors
 of instream biological responses were mentioned above.

 B.1.4 Fairchild et al. (1992)
 Population, community and ecosystem level responses to
 pulse  doses of esfenvalerate, a pyrethroid insecticide,
 were studied in experimental aquatic mesocosms. Differ-
 ent mesocosms were dosed at nominal concentrations of
 0, 0.25, 0.67, and 1.71 ug/L esfenvalerate (each concen-
 tration had triplicated mesocosms). The pulse dosings
 were 15 minute applications  to achieve the nominal con-
 centrations every two weeks for a total of three months.

 Static acute (48-h) toxicity tests with the insecticide were
 conducted with D. magna and provided an LC50 of 0.27
 ug/L esfenvalerate; neither a LOEC nor a NOEC was
 reported.  This laboratory effect level was compared to
 effect levels seen in the mesocosm portion of the study.

 In the mesocosm component of the study, zooplankton and
 benthic macroinvertebrate populations were significantly
 decreased at the pulse dose treatment of 0.25 ug/L. There
 were also shifts in community composition and dominance
 at this treatment level. This was the lowest pulse dose
 tested, so an NOEC was not established. The effect con-
 centration in the mesocosm was comparable to the labora-
 tory generated 48-h LC50 for this pesticide. With the differ-
 ence in exposure patterns and durations in the laboratory
 (single species) test and multispecies mesocosm studies,
 it is remarkable that a mesocosm effect concentration cor-
 responded so well with the laboratory test results.

 In the mesocosm, 0.67 ug/L  esfenvalerate reduced sur-
 vival, biomass, and reproductive success of bluegill sun-
 fish. The laboratory LC50 for juvenile bluegills exposed to
 esfenvalerate  for 96 h ranged  from  0.42 to 1.35 ug/L
 (Mayer and Ellersieck, 1986). This finding indicates that
 a laboratory effect concentration translates reliably into a
 field effect level.  Also inherent in this observation is that
the complex, multivariate conditions in the mesocosm did
 not appreciably modify the toxicity (i.e., bioavailability) of
 chemicals seen in highly controlled  laboratory studies.
Overall, the results of this study imply that single species
toxicity test results can qualitatively predict effect concen-
trations in more complex, multivariate systems. Another
study (Little et al.,  1993) with fenvalerate also indicated
that  laboratory determined effect concentrations  were
reliable predictors of effect concentrations in natural sys-
tems. For fenvalerate, and possibly other pesticides which
have relatively short half-lives and may not exist in aquatic
ecosystems  for extended periods, acute toxicity test
endpoints may be  reliable  predictors of  biological
community responses.

 B.1.5 Slooff (1985)
 A multiple species microcosm toxicity test was conducted
 in the Netherlands to determine the NOEC for the herbi-
 cide, dichlobenil. An NOEC was also determined for
 Daphnia magna in the 21-d short-term estimate of chronic
 toxicity.

 The microcosm NOEC from a  400 day exposure  was
 0.3 ug/L. The NOEC, 0.1 ug/L, determined in the Daphnia
 test was a qualitatively accurate predictor of the mesocosm
 no  effect level.   In Slooff's  review of the literature on
 dichlobenil, the NOEC determined from 167 field expo-
 sures of various species was also 0.1 ug/L.

 After reviewing other data in the literature and in relation
 to these data, Slooff (1985) submits that multiple species
 (micro-  or mesocosm) toxicity test results are not better
 predictors of aquatic ecosystem responses than are single
 species toxicity test results. He concludes that the multiple
 species test results have many uses, but that, at their cur-
 rent stage of development, they do not improve predictions
 of ecosystem impairments.

 B.1.6 Larsen et al. (1986)
 Microcosm test data have been proposed as having more
 ecological relevance than laboratory indicator species test
 results.  Larsen and co-workers compared the predictive
 reliability of "surrogate" species and mesocosm toxicity test
 results with responses in experimental ponds to the herbi-
 cide, atrazine. This study compared the responses of algal
 tests, an algal microcosm, and experimental ponds exposed
 to similar concentrations of the herbicide. Eight different
 algal species were included in the indicator species tests.
 The endpoints used for comparisons in all three  systems
 were EC50s (the chemical concentration at which 50% of
 the test  population exhibits a  response).

 According  to  these  investigators,  the basic  similarity
 among the EC50 values across test systems suggests that
 results from a combination of single species tests or from
the mesocosm provided a reasonable estimate of the
 concentration of atrazine that produced similar effects on
the experimental pond. Both the lowest and highest EC50
came from single species tests. These authors conclude
that, "because broad ranges in species sensitivities occur,
 use of only a few test species might not offer sufficient
environmental protection."  Improvement in predictive
ability occurs when several  species  are used as  test
organisms.   Although this  study provided  valuable
 information, the EC50 endpoint may not be the most
 realistic  response measure to compare tests. One would
predict that a concentration of chemical(s) which is high
enough  to  affect 50% of a test  population, has a high
potential of evoking significant biological community re-
sponses.

  Single species toxicity tests, microcosm, and outdoor ex-
  perimental pond exposures have been employed by other
  investigators (Stay et al., 1985; de Noyelles and Kettle,
  1985) to ascertain the effects of atrazine on algal primary
  production. Both the single species and microcosm tests
  were predictive of atrazine concentrations which signifi-
  cantly reduced production in the outdoor ponds. However,
  recovery from atrazine stress was not predicted by the
  laboratory tests. In the single species and microcosm tests
  there was only limited recovery whereas the pond commu-
  nities recovered more quickly because sensitive algal spe-
  cies were  replaced by algal species more resistant to
  atrazine.  The ecological significance (over time) of this
  shift to more chemically resistant assemblages was not
  discussed and is unknown.  Composition could change at
  all trophic levels due to the shift in algal species. All algal
  species were affected by atrazine in all test regimes, but
  only the pond study revealed assemblage shifts.

  B.1.7 Crossland (1984)
  Studies with the insecticide methyl parathion revealed that
  laboratory single species toxicity test results underesti-
  mated the secondary effects (indirect effects that are not
  represented by direct action of a chemical on an individual
 species,  but rather result from interrelationships among
 components of a biological community) of this pesticide in
 outdoor ponds. The concentrations of methyl parathion
 eliciting toxicity, and thus decreasing populations, in zoo-
 plankton and benthic macroinvertebrates in the outdoor
 ponds were reliably predicted by the laboratory single spe-
 cies toxicity test results.

 The decreased populations of mayfly larvae and daphnids,
  caused by methyl parathion, secondarily resulted in blooms
 of filamentous algae. Death and decay of the algae in turn
 decreased dissolved  oxygen resulting in death of fish.
 Loss of invertebrate food items also caused reduced fish
 populations and smaller sized fish.

  B.1.8 Stephensen and Kane (1984)
  The fate and biological effects of the pesticides methyl
 parathion and linuron in outdoor ponds were studied. The
 relative sensitivities (response per concentration) were
 similar in both the laboratory and ponds for both pesticides.
 Furthermore, the response concentrations determined for
 Daphnia magna in the laboratory correlated closely with
 effect concentrations in the outdoor pond. The authors
 concluded that biotic and abiotic factors existing in ponds
  did not alter the toxicity (i.e., bioavailability compared to the
 laboratory tests) of these two pesticides.

  B.1.9 Van Wijngaarden et al. (1996)
 Using the insecticide Dursban  4E  (active ingredient
 chlorpyrifos, an organophosphorus pesticide) these inves-
tigators compared the results of laboratory indicator spe-
cies toxicity tests with laboratory tests on indigenous spe-
cies, as well as with data from outdoor mesocosm tests.
  Mesocosms were sprayed once with the intent of achieving
  nominal chlorpyrifos concentrations of 0.1, 0.9, 6, and
  44 ug/L. Analytical measurements of chlorpyrifos were
  used to determine exposure and effect concentrations.
  Effects in the  mesocosms were assessed by sampling
  zooplankton and macroinvertebrates; in addition, in situ
  cage experiments were performed with several species.

  The indicator species, D. magna, was almost as sensitive
  to chlorpyrifos as the indigenous species. The difference
  between the laboratory EC50s for the daphnid (1.0 ug/L)
  and that for the most sensitive indigenous species,
  Gammarus pulex (0.8 ug/L) was small, suggesting that the
  indicator species was not more sensitive to the insecticide.
  Effect concentrations (for nine invertebrate species) deter-
  mined in single species laboratory tests were compared to
 effect concentrations derived in the mesocosm exposures.

 The authors concluded that laboratory single species EC
 values were reliable estimators of mesocosm ECs, differ-
 ing by less than a factor of three for the seven species
 studied. Essentially the same conclusions were reached
 when comparing ECs from the laboratory toxicity tests with
 those from the cage experiments. These data indicate that
 chlorpyrifos bioavailability was not significantly  reduced
 under the mesocosm conditions.
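
Agreement "within a factor of three" can be expressed as a fold difference between two effect concentrations. The sketch below is illustrative only: the function names are assumptions, and because the paired laboratory and mesocosm values are not reproduced in this summary, the example uses the two laboratory EC50s quoted above (1.0 and 0.8 ug/L) simply to show the calculation.

```python
# Illustrative helpers; the names are hypothetical and not from the cited study.
def fold_difference(ec_a: float, ec_b: float) -> float:
    """Return the ratio of the larger to the smaller effect concentration."""
    if ec_a <= 0 or ec_b <= 0:
        raise ValueError("effect concentrations must be positive")
    return max(ec_a, ec_b) / min(ec_a, ec_b)

def within_factor(ec_a: float, ec_b: float, factor: float = 3.0) -> bool:
    """True if the two values differ by less than the given factor."""
    return fold_difference(ec_a, ec_b) < factor

# Laboratory EC50s quoted above (ug/L): D. magna and Gammarus pulex.
daphnia_ec50 = 1.0
gammarus_ec50 = 0.8

ratio = fold_difference(daphnia_ec50, gammarus_ec50)
print(f"fold difference: {ratio:.2f}")
print("within a factor of three:", within_factor(daphnia_ec50, gammarus_ec50))
```

The same comparison could be applied to each laboratory and mesocosm pair if those values were tabulated.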

 Although there was considerable spatial and temporal vari-
  ation of chlorpyrifos concentrations in the mesocosm expo-
  sures, ECs determined under those conditions were similar
 to the laboratory ECs obtained under constant exposure
 regimes (i.e., variable and constant exposure regimes led
 to  comparable  effects).   According to these authors,
 laboratory single species toxicity test results can  be used
 to estimate direct effects in field populations.

 In a subsequent publication, van den Brink et al. (1996)
  reported that recovery of invertebrate populations after the
 single application  of chlorpyrifos required  three to six
 months. The investigators also suggested that "safe" con-
 centrations determined in short-term single species labora-
 tory toxicity tests are sufficient to protect invertebrate com-
 munities.

 Several other investigators (Eaton et al., 1985; Brock et al.,
 1992; Leeuwangh et al.,  1994) also concluded that the
 direct effects of chlorpyrifos on aquatic invertebrate com-
 munities can  be reliably predicted  on the basis  of
 laboratory single species toxicity data; that is, population
 responses observed in microcosms and mesocosms were
consistent with  laboratory  single species toxicity test
 results.

 B.1.10 Kersting and van Wijngaarden (1982)
The effects of chlorpyrifos were  studied in a laboratory
microcosm system.  Daphnia magna was the herbivore
component in the microcosm; this species was also the
subject of a laboratory 48-h lethality test. The microcosms
 received a single application of chlorpyrifos, and the re-
 sponses were followed for 130 days.

 Chlorpyrifos concentration in the Daphnia component of
 the mesocosm was 0.5 ug/L on day 1, decreasing to
 0.2 ug/L by day 7. The laboratory 48-hour LC25 for Daph-
 nia was 0.4 ug/L.  Although pesticide concentration
 decreased rapidly to below the LC25, Daphnia populations
 in the exposed microcosms decreased 36% and 42% in
 the two replicates.  Populations recovered within two
 weeks. The ecological significance of the magnitude and
 duration of population declines is unknown.

 Arguably, the laboratory LC25 was an effective predictor
 of the  population decline in the mesocosm.  However,
 chlorpyrifos treatment resulted in other biotic and abiotic
 changes in the microcosms. The laboratory Daphnia tests
 underestimated these other, "secondary" mesocosm ef-
 fects.

 B.1.11 Siefert et al. (1989)
 The effects of chlorpyrifos in natural pond enclosures were
 investigated. The pesticide was applied once to sets  of
 replicate ponds to achieve three different test concentra-
 tions;  pond   concentrations  of  the  pesticide  were
 analytically monitored.  Phytoplankton, periphyton, zoo-
 plankton,  benthic macroinvertebrates,  and fish  were
 sampled periodically up to 30 d post-application. Limita-
 tions in this study include the absence of normal exchange
 between the enclosures and the remainder of the pond;
 also chlorpyrifos  adsorbed to the wall  material  of the
 enclosures (this problem relates mostly to environmental
 fate, rate  of loss, but also to a decrease  in potential
 exposure); this difficulty  was  partially  offset by the
 monitoring of pesticide concentrations in the water column.

 The targeted concentrations were 20, 5, and 0.5 ug/L.
 Chlorpyrifos concentrations decreased rapidly after appli-
 cation (see above) to 10, 1, and 0.2 ug/L by day two post-
 application.  Cladocerans were the most sensitive of the
 zooplankton species, with all  five identified species
 showing dramatic and statistically significant population
 declines  at  the  lowest  chlorpyrifos  concentrations.
 Chironomids  were  the   most  sensitive  benthic
 macroinvertebrates, with 9 of 10 of the identified species
 responding to the  lowest chlorpyrifos treatment with sta-
tistically significant population declines.

 Laboratory determined acute toxicity LC50s (54 species)
from the literature were compared to the pond effect con-
centrations.  In general, the single species LC50 values
were higher than the LOEC determined in the pond study
(i.e.,  LC50s  underestimated  biological  community
responses); LC50s from Daphnia and Gammarus were the
most accurate forecasters of the pond LOEC. Significant
reductions in growth rates in larval fish, not predicted by
direct effects of chlorpyrifos, were also noted in this study.
 The  authors  attributed  these  secondary  effects to
 chlorpyrifos-caused  declines  of  invertebrate  forage
 organisms.

 B.2 Additional Organic Chemicals
 B.2.1 Cooper and Stout (1985)
 The effects of p-cresol on the biota in outdoor experimental
 stream channels (analogs of natural streams) were com-
 pared to the results of single species tests with this chemi-
 cal. Three hypotheses were tested:
     1) The transfer of laboratory acute toxicity test results
     to field situations  is  possible  without  serious
     distortion.
     2) Data from single species tests with p-cresol will
     yield similar results as multiple species, community
     level, tests.
     3) Pulsed exposures with short time intervals be-
     tween events will produce the same ecological re-
     sponses as continuous exposure with the same inte-
     gral of exposure (integral of concentration X time).

 In regard to the first hypothesis, data from this  study
 showed that the acute toxicity tests with fathead minnows,
 largemouth bass, smallmouth bass, damselfly larvae,
 and amphipods estimated  survivorship rates consistent
 with results of the experimental  stream experiments.
 These investigators also concluded that results of the sin-
 gle species tests were effective predictors of community
 level responses.  The third hypothesis was found to be
 untrue in that the pulse exposure (with same integral) pro-
 duced greater impacts than continuous exposure. These
 pulse-response results are useful in interpreting data col-
 lected in agricultural settings where aquatic communities
 may be exposed to pulses of pesticides.
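
The "integral of exposure" in the third hypothesis is the area under the concentration versus time curve. The sketch below shows how a pulsed and a continuous regime can share the same integral while differing in peak concentration. All concentrations and durations are hypothetical values chosen for illustration; they are not data from Cooper and Stout, and the function name is an assumption.

```python
# Each regime is a list of (concentration, duration_hours) segments with
# constant concentration within a segment. All numbers are hypothetical.
def exposure_integral(segments):
    """Return the integral of concentration x time for piecewise-constant exposure."""
    return sum(conc * hours for conc, hours in segments)

# Continuous exposure: one low concentration held for 24 h.
continuous = [(0.5, 24.0)]

# Pulsed exposure: three 2-h pulses separated by 9-h clean intervals (24 h total).
pulsed = [(2.0, 2.0), (0.0, 9.0), (2.0, 2.0), (0.0, 9.0), (2.0, 2.0)]

continuous_integral = exposure_integral(continuous)
pulsed_integral = exposure_integral(pulsed)

continuous_peak = max(conc for conc, _ in continuous)
pulsed_peak = max(conc for conc, _ in pulsed)

print("continuous integral:", continuous_integral, "peak:", continuous_peak)
print("pulsed integral:    ", pulsed_integral, "peak:", pulsed_peak)
```

In this hypothetical case the two regimes have identical integrals, but the pulsed regime reaches a peak concentration four times higher, which is one way equal integrals could still produce unequal biological responses.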

 B.2.2 Dorn  et al. (1991)
 These researchers undertook a project to estimate the
 environmental effects of a chloroether fraction from a chem-
 ical plant effluent.  The chemical plant effluent had been
 shown to be toxic to sheepshead minnows (Cyprinodon
 variegatus) and a mysid (Mysidopsis bahia).  Toxicity
 identification evaluation (TIE) procedures demonstrated
 that the primary cause of toxicity was a mixture of
 pentachloroethers.

 To gauge effect concentrations of the chloroether fraction,
 laboratory  toxicity  tests were  executed  with Daphnia
 magna, fathead minnow larvae, and Mysidopsis bahia.
 Effect concentrations for the chloroether fraction also were
 assessed in outdoor artificial streams.

The most sensitive indicator species was the water flea;
the NOEC for this species was 1.0 mg/L.  NOECs in the
outdoor streams were 0.44 and 0.26 mg chloroethers/L for
Gammarus and rainbow trout, respectively. The laboratory
effect concentrations for the chloroethers in single species
 tests were reliable qualitative predictors of effect
 concentrations in the outdoor stream experiments. The
 outdoor stream communities were somewhat more sensitive
 to the toxicants than indicated by the laboratory single
 species tests.

 In a follow-up study (Crossland et al., 1992), a range of
 chloroether fraction concentrations was tested in outdoor
 artificial streams. Exposure was for 28 days. Four different
 concentrations were tested; there were no replicate
 streams except for the control treatment. Three mesh
 bags of macroinvertebrates were introduced into each
 stream. However, the number of benthic
 macroinvertebrates of a given species was not equivalent
 in the different treatment groups at the time of pretreatment
 sampling; thus statistical comparisons among the
 treatments were not possible. Furthermore, there was
 considerable variation among "replicates" within each
 stream. Feeding rates of the amphipod, Gammarus pulex,
 also were assessed in the artificial streams. These and
 other factors render interpretation of macroinvertebrate
 data difficult.

 Gammarus numbers were significantly reduced at a
 chloroether concentration of 0.86 mg/L, but not at
 0.44 mg/L (the NOEC). Invertebrate drift (possibly indicating
 an unhealthy condition) also appeared to be increased
 at chloroether concentrations of 0.44 mg/L and above. In
 laboratory 21-d Daphnia magna tests, the chloroether
 NOEC was 1.0 mg/L and the LOEC was between 1 and
 2.5 mg/L. Although comparable effect concentrations were
 seen in the laboratory single species test and the artificial
 stream data, the outdoor populations were somewhat more
 sensitive to the chloroethers.

 B.2.3 Fairchild et al. (1993)
 Laboratory and field studies were conducted with linear
 alkylbenzene sulfonate (LAS, an anionic surfactant), by
 Fairchild et al., to evaluate the use of laboratory-generated
 NOECs for protecting aquatic ecosystems. Laboratory
 toxicity tests included the 7-d fathead minnow test and a
 7-d test with the freshwater amphipod, Hyalella azteca. A
 series of these tests with exposures to a range of LAS
 concentrations resulted in a laboratory estimate of a
 NOEC. This laboratory-predicted NOEC was then
 tested in the field with a 45-d exposure in outdoor experimental
 streams (three replicates).

 In these experimental streams, at LAS concentrations
 equivalent to the laboratory NOECs, no biological
 community impairments were seen as gauged by surveys
 of benthic macroinvertebrates, periphyton growth, detrital
 processing, and fathead minnow populations. The authors
 concluded that their results indicated that the
 laboratory-generated NOEC for LAS predicted environmentally
 protective concentrations. Results of this study do not
 demonstrate that concentrations above the laboratory NOEC
 would have engendered impacts in the outdoor
 experimental streams, but do suggest that the single species
 toxicity test results can be useful tools in predicting
 environmentally "safe" concentrations.

 B.2.4 Boyle et al. (1985)
 These investigators compared the responses (survival and
 growth) of bluegill sunfish and largemouth bass exposed
 to fluorene (a polynuclear aromatic hydrocarbon) in laboratory
 30-d partial life cycle tests to the responses of the
 same species exposed in outdoor ponds.

 The laboratory toxicity test results underestimated the
 response of these fish species in the outdoor ponds. That is,
 the fish in the experimental ponds were more
 sensitive to fluorene (i.e., responses occurred at a lower
 concentration than in the laboratory tests). In contrast,
 laboratory toxicity tests overestimated responses of zooplankton,
 phytoplankton, and some insect populations to fluorene.

 B.2.5 Giddings and Franco (1985)
 The effects of a synthetic coal-derived crude oil were
 assessed in outdoor ponds and indoor microcosms. The
 results of these tests were compared with data from labo-
 ratory single species toxicity tests. Response concentra-
 tions were similar in the microcosms and pond studies. A
 "safe" exposure concentration for this organic compound
 was derived from the pond study. Without an application
 factor, the USEPA final acute and chronic values were
 higher than this "safe" concentration, whereas the LOEC
 of a 28-d D. magna laboratory test provided an effective
 prediction of the "safe" concentration.

 B.2.6 Crossland and Wolff (1985)
 In this study [97] the effects of pentachlorophenol (PCP)
 were examined in outdoor experimental ponds. PCP was
 repeatedly applied to the subsurface water of three ponds
 with the aim of maintaining an average concentration of 50
 to 100 ug/L for 30 days. There were also replicate control
 ponds. Actual pond concentrations of PCP averaged 19
 to 21 ug/L (days 1 through 14) and 60 to 69 ug/L (days 15
 through 43). No statistically significant effects were observed
 on algal, zooplankton, benthic macroinvertebrate,
 or fish populations. It should be noted, however, that there
 was considerable within and between replicate pond variability.
 The three lowest laboratory determined PCP LC50
values gleaned from the literature were 52 ug/L (96-h rainbow
trout), 100 ug/L (8-d snail egg production and
viability), and 130 ug/L (16-day snail egg viability). Since
most of these effect concentrations from the most sensitive
species in the database were greater than PCP
concentrations in the ponds, impacts would not be
predicted. Based on these observations, the authors
contended that a combination of single species toxicity test
results can effectively predict environmentally "safe"
concentrations for a chemical. The variability of the
treatment concentrations, the concentration variation
within and between replicate ponds, and the fact
that a pond effect concentration was not established
render this study inconclusive regarding the accuracy of
single species test results in predicting environmental
impacts.

B.2.7 Giddings et al. (1984)
These investigators (Giddings et al., 1984; Franco et al.,
1984) examined the impacts of phenolic compounds on
biological communities in outdoor ponds. The phenolic
compounds were administered to replicate ponds daily for
56 days; five different treatment levels were compared to
control ponds. A laboratory-generated 28-d test LOEC for
Daphnia magna was a relatively good forecaster of a
phenol effect concentration in the experimental ponds.
However, the most sensitive indices of biological commu-
nity structure/function were affected at phenol concentra-
tions lower than the laboratory chronic LOEC. A 48-h test
LC50 for Daphnia was not a good predictor of pond effects
because much lower concentrations of the phenolic compounds
impacted pond communities.

B.2.8 Nimmo et al. (1989)
Acute (96-h) toxicity tests with fathead minnows, Johnny
darters (Etheostoma nigrum), and white suckers (Catostomus
commersoni), as well as acute and 7-d C. dubia toxicity
tests, were conducted by Nimmo et al. (1989) to evaluate
whether river water (Vrain River in Colorado) ameliorated
toxicity of ammonia compared to laboratory tests in which
well water was used.

For most of the test species, ammonia LC50s were equivalent
in the river water compared to the laboratory well
water. These data illustrated that there was not an amelioration
of ammonia toxicity by river water; that is, the laboratory test
results did not overestimate toxicity measured instream.
Related to the above observation, laboratory single
species toxicity tests with polychlorinated biphenyls
overestimated the concentration demonstrated in field
studies to decrease diversity in invertebrate populations;
that is, the field populations were more sensitive (Roberts
et al., 1978).

B.2.9 Adams et al. (1983)
The toxicity of a commercial phosphate ester product
(PEP) determined in outdoor tanks was compared with that
determined in the laboratory. The test organisms were D. magna and
fathead minnows. Five concentrations of the PEP were
tested in the outdoor tanks, without replicates; the five
concentrations were tested in a series of tanks with and without
sediment. PEP concentrations were analytically monitored
and exposure concentrations were maintained for two
months. Laboratory toxicity tests consisted of 30-d fathead
minnow and 21-d D. magna flow-through tests.

LOECs for fathead survival in the lab, outdoor no sediment
tank, and outdoor sediment tank were 410, 826, and
545 ug/L, respectively. The laboratory tests overestimated
the toxicity of PEP, but estimates were within an order of
magnitude of one another. In the Daphnia tests, the reproduction
LOECs for the lab, outdoor no sediment tank, and
outdoor sediment tank were 100, 136, and 226 ug/L,
respectively. The laboratory water flea tests gave a fairly
reliable qualitative estimate of the PEP LOEC.
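
The comparison of laboratory and outdoor-tank effect concentrations can be checked with simple arithmetic. The following is a minimal sketch using only the six effect concentrations quoted above; the function names and the "within a factor of 10" test are illustrative choices, not part of the original study.

```python
import math

# Effect concentrations (ug/L) as quoted in the text for Adams et al. (1983).
LOECS_UG_PER_L = {
    "fathead minnow survival": {
        "lab": 410.0,
        "outdoor, no sediment": 826.0,
        "outdoor, sediment": 545.0,
    },
    "Daphnia reproduction": {
        "lab": 100.0,
        "outdoor, no sediment": 136.0,
        "outdoor, sediment": 226.0,
    },
}


def fold_difference(lab_loec, outdoor_loec):
    """Ratio > 1 means the effect appeared at a lower concentration in the lab."""
    return outdoor_loec / lab_loec


def within_order_of_magnitude(a, b):
    """True when two positive concentrations differ by no more than a factor of 10."""
    return abs(math.log10(a / b)) <= 1.0


def compare_to_lab(loecs):
    """Return (endpoint, setting, outdoor/lab ratio, within-10x flag) rows."""
    rows = []
    for endpoint, values in loecs.items():
        lab = values["lab"]
        for setting, outdoor in values.items():
            if setting == "lab":
                continue
            rows.append((endpoint, setting,
                         fold_difference(lab, outdoor),
                         within_order_of_magnitude(lab, outdoor)))
    return rows


for endpoint, setting, ratio, close in compare_to_lab(LOECS_UG_PER_L):
    print(f"{endpoint} | {setting}: outdoor/lab = {ratio:.2f}, within 10x: {close}")
```

Every outdoor/lab ratio is above 1 and below 10, which is the arithmetic behind the statements that the laboratory tests overestimated toxicity but stayed within an order of magnitude of the outdoor results.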

A major limitation of this study was that the outdoor tanks
were not ecosystem surrogates; they did not contain other
biological communities, but only the test species. Small
sample sizes and the lack of replication were among the
other factors which compromise the reliability of data
generated in this study.

B.3 Metals
B.3.1 Canfield et al. (1994)
This group evaluated the potential impacts of past mining
activities on the Clark Fork River (Montana) aquatic eco-
system using a benthic invertebrate community assess-
ment, chemical analyses on sediment, and laboratory
whole-sediment toxicity tests with an amphipod, Hyalella
azteca, a midge, Chironomus riparius, a cladoceran, Daphnia
magna, and larval rainbow trout, Oncorhynchus
mykiss. The study included six sites in the Clark Fork
River watershed: one control/reference station on an
uncontaminated tributary and five sites downstream of past
mining areas.

Sediment concentrations of metals (especially copper) were
high at Clark Fork sites 1 through 4. A metals concentration
gradient from the most upstream sites to the
most downstream site was observed. The control site on
the tributary had the lowest sediment metal concentration.
The authors cautioned that there were many confounding
factors (including a possible sampling bias) influencing the
benthic invertebrate data which rendered interpretation
difficult. The authors pointed out that many chironomids
are tolerant of degraded conditions. Furthermore, the
percentage of the Chironomidae community composed of
Tanypodinae (considered to be relatively pollution tolerant)
was much higher at sites 1, 2, and 3 (upstream sites) than
at the control and downstream sites.

The amphipod tests revealed a gradient of toxicity, being
highest at the most upstream site and lowest at the control
site. Therefore, the results of the laboratory single species
toxicity tests were consistent with sediment metal concen-
trations and the distribution of chironomids. The investiga-
tors concluded that chemical analyses, laboratory toxicity
tests, and aquatic community evaluations all provided evi-
dence of metal-induced degradation to benthic populations
in the river.

 B.3.2 Burton et al. (1987)
 The Clark Fork River was also the subject of another
 study. This group evaluated a battery of aquatic toxicity
 tests including the 7-d Ceriodaphnia test and 12 microbial
 enzyme activity assays. Results of the laboratory toxicity
 tests were  compared to instream parameters  including
 diatom diversity and density, as well as metal concentrations
 (in both water column and sediment). Data were
 collected at 13 sites along the river and one control site.
 As in the previous study (Canfield et al., 1994), sites were
 on a downstream gradient below an area with past mining
 activities.  Both Ceriodaphnia and microbial tests were
 conducted in the laboratory with water samples from the
 river sites.

 Ceriodaphnia survival (r = 0.94 and 0.93) and neonate
 production (r = 0.93 and 0.92) showed statistically significant
 (p<0.001) positive correlations with diatom density
 and diversity, respectively. Survival (r = -0.92 and -0.94)
 and neonate production (r = -0.92 and -0.94) were negatively
 correlated with water column copper and zinc,
 respectively.
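
For readers unfamiliar with the statistic, the sketch below shows how a Pearson correlation coefficient of the kind reported above is computed. The site data are hypothetical (invented for illustration, not taken from the Clark Fork study), and the assumption that the study's reported coefficients are Pearson coefficients is ours; the source does not name the method.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five sites (invented for illustration only).
copper_ug_per_l = [5.0, 10.0, 20.0, 40.0, 80.0]
survival_percent = [95.0, 90.0, 70.0, 40.0, 10.0]

r = pearson_r(copper_ug_per_l, survival_percent)
print(f"r = {r:.2f}")
```

A coefficient near -1 for survival against a metal concentration, or near +1 for survival against diatom density, is the pattern the study reports.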

 These data suggest the Ceriodaphnia toxicity test results
 were effective predictors of instream metal contamination
 and of diatom population variations. Laboratory microbial
 enzyme assays for galactosidase, glucosidase, and pro-
 tease activities also showed statistically significant nega-
 tive correlations with diatom populations in the river (i.e.,
 low diatom  diversity was associated with high enzyme
 activity).

 B.3.3 Clements and Kiffney (1994)
 These researchers attempted to assess impacts of metals
 from a mining site discharging into the Arkansas River (in
 Colorado). Three sites were selected: one site upstream
 of the mining operation discharge and two downstream of
 the discharge. Whether caution was taken in the selection
 of these sites to assure similar substrates, as well as other
 physical and chemical conditions is unclear. The intent
 was that the upstream site would serve as  a reference
 point. The second site was 6 km downstream and the third
 site was 45 km downstream of the mining operation input;
 the third was conceived as a site to represent biological
 community recovery. Unfortunately, two creeks
 discharged into the Arkansas River below the input from
 the mining operation, confounding interpretation of data
 collected at site 2.  Furthermore, site 3 was below a town
 and no attempt was made to account for toxic inputs from
 the town or other sources (only metal analyses were
 performed on water samples).

 Water samples collected at each site were screened with
 the 7-d Ceriodaphnia test, neonate production being the
 endpoint. Benthic macroinvertebrate bioaccumulation of
 metals was assessed. Benthic invertebrate community
 structure was surveyed by determining the number of taxa
 and the number of individuals in each taxon. All assessments
 were conducted in fall and spring, except that toxicity
 testing at site 2 was performed only with a spring water
 sample.

 In the fall water samples, zinc concentrations were highest
 at site 1, upstream of the mine input. Water sample toxicity
 was highest at site 3 and lowest at site 2 (just below input
 from the mining operation) during the fall. The pattern of
 toxicity in these fall water samples did not correspond to
 heavy metal concentrations in the same samples. The
 causes of toxicity in the fall water samples are unknown
 because Toxicity Identification Evaluations (TIEs) or organic
 chemical analyses were not performed.

 In the spring, all heavy metal concentrations were lowest
 in water samples from the upstream "control" site and highest
 at site 2, immediately below the mining operation input.
 Cadmium, copper, and zinc concentrations at site 2 were
 5, 8, and 5.5 fold higher, respectively, than at the reference
 site. The Ceriodaphnia test was conducted only with a
 water sample from site 2 and the results mirrored the intense
 metal contamination.

 Neither the number of invertebrate taxa nor the number of
 individuals within taxa showed the site nearest to the
 mining operation input to be the most impacted. Only the
 bioaccumulation data and changes in the composition of
 dominant macroinvertebrate groups suggested that site 2
 was the most impaired compared to the other two sites.
 Interpretation of data collected in this study is difficult for
 several reasons. It is not clear how carefully the three sites
 were matched in terms of substrate and other
 physical/chemical factors. In the spring, when the other data
 were more understandable, toxicity testing was incomplete
 and organic chemical analyses were not performed.
 Stream flow and rainfall conditions were not included in the
 manuscript, and these factors could influence the
 measurements and interpretation of results. The authors
 counsel that the different approaches used in their study
 provided divergent information regarding metal impacts,
 so they recommend an integrated approach to assessing
 impacts on streams. While an integrated approach to
 assessing impacts on aquatic ecosystems should be
 supported, design of this study was not optimal and, thus,
 the results were inconclusive.

 B.3.4 Niederlehner et al. (1985)
 Several researchers have counseled that single species
toxicity tests lack many important interactive characteristics
of multivariate, complex ecosystems and, therefore, may
 not be accurate predictors of biological community
 responses. Niederlehner et al. (1985) stated that "A
multispecies or microcosm test incorporate some of the
emergent properties of communities of ecosystems and
 serve as an intermediate between the simplicity of the
 single species toxicity tests and the unreproducible com-
 plexity of the environment."

 These researchers scrutinized the responses of protozoan
 communities to cadmium exposures. Effects of cadmium
 were evaluated by observing colonization of the
 protozoans in polyurethane foam (PF) islands for 28 d.
 Exposures were in duplicate tubs using five different
 cadmium concentrations.

 From the experiments, NOECs for protozoan colonization
 impairment ranged from 0.8 to 9.5 ug Cd/L. In the ambient
 water quality criteria document for cadmium (USEPA,
 1984), chronic values (ChV) adjusted for hardness range
 from 0.14 (Daphnia magna) to 15.04 ug/L (fathead
 minnow) (selecting values from studies with hardness
 equivalent to the range seen in the microcosm study).
 Cladoceran chronic values range from 3.9 ug Cd/L for
 Ceriodaphnia reticulata to 0.14 ug Cd/L for D. magna.
 Overall, the data in the Niederlehner et al. (1985) report
 suggest that the laboratory single species toxicity test
 results underestimate field effects.

 It is not evident that the microcosm results were a better
 predictor of a safe cadmium concentration since the
 chronic values from 15 of the 16 species listed in USEPA's
 criterion document were within the range of NOECs noted
 in the microcosm study. Arguably, the conclusion that the
 protozoan microcosm "tests were comparable to traditional
 single species tests in time and expense required, but had
 the advantages of utilizing indigenous organisms and
 including processes characteristic of communities, but not
 single species" is highly questionable, given that the microcosm
 tests were 28-d exposures.

 B.3.5 Moore and Winner (1989)
 These investigators conducted a study in outdoor ponds
 in Ohio to ascertain the effects of various concentrations
 of copper on zooplankton and benthic macroinvertebrates.
 Laboratory 7-d Ceriodaphnia toxicity tests were conducted
 to evaluate the ability to predict effect levels of copper on
 pond invertebrate communities.

 The results of the Ceriodaphnia tests predicted the effects
 of copper on pond populations of Daphnia ambigua, but
 underestimated the impacts of copper on other important
 species, such as rotifers, copepods, mayfly juveniles, and
 chironomids.

 B.3.6 Geckler et al. (1976)
The results of laboratory chronic toxicity tests in which
fathead minnows, green sunfish, and longear sunfish were
exposed, in separate tests (i.e., not a multiple species
test), to various concentrations of copper were compared
to responses of fish in a natural stream. Effects were seen
at a somewhat lower copper concentration in the stream
 than predicted by the laboratory toxicity tests. The authors
 concluded that, "Agreement between the predictions from
 laboratory toxicity tests and the observed field effects is
 surprisingly close considering the measurement errors
 involved." Similarly, laboratory toxicity test results provided
 reasonable estimates of the metal concentrations which
 impacted crustaceans inhabiting tundra ponds (Havas and
 Hutchinson, 1982).

 B.3.7 Giesy et al. (1979)
 Giesy et al. (1979) studied the effects of different cadmium
 concentrations in outdoor experimental stream channels.
 The results show that single species toxicity test results do
 not predict secondary effects (cf., Crossland above) in
 aquatic ecosystems. The primary direct effect of cadmium
 in these channels was on crayfish; however, the direct effect
 of cadmium on crayfish could have been measured in
 the lab. In this mesocosm study, the crayfish was a "keystone"
 species. The decrease in crayfish population
 greatly influenced community structure including
 macrophytes, insects, and clams. These secondary effects
 were not predicted by laboratory toxicity tests, which therefore
 underestimated biological community impacts.

 B.3.8 Marshall (1978)
 Marshall (1978) compared the short-term (7 to 9 d) toxicity
 of cadmium to laboratory and natural populations of Daph-
 nia galeata in Lake Michigan. As well as controls, there
 were four different exposure concentrations for both the
 laboratory and field populations. Results of this investiga-
 tion indicated that the  characteristics of Lake Michigan
 water did not appreciably alter the responses of Daphnia
 to cadmium.  Furthermore, responses to cadmium  were
 equivalent in the laboratory and in the lake experiments.

 This is in contrast to the study by Sherman et al. (1987)
 which demonstrated that laboratory toxicity test results with
 cadmium on fathead minnows could not be used to extrap-
 olate to field situations unless hardness and pH in the labo-
 ratory tests are equivalent. In  general, laboratory toxicity
 test results underestimated field effects of cadmium.

 B.4 Miscellaneous
 B.4.1 Boelter et al. (1992)
 Ambient water samples from streams receiving discharges
 of coproduced brine (water that is extracted along with
 petroleum products from underground deposits) from an
 oil field in Wyoming were collected and tested for toxicity
 (Boelter et al., 1992). The 7-d Ceriodaphnia test was one
 of the testing procedures.

 Exposure to water samples collected downstream, but not
 upstream, of the oil field discharges significantly reduced
 Ceriodaphnia survival and neonate production. Application
of TIE procedures to toxic samples indicated that toxicity
could not be attributed to nonpolar organics, heavy metals,
or hydrogen sulfide. TIE results along with analytical
chemistry data established that the cause of toxicity was
sodium, potassium, bicarbonate, and carbonate ions.
Concentrations of these ions were sufficiently high to be
toxic to many aquatic organisms.

This study is one of many studies which illustrate that the
Ceriodaphnia test in combination with TIE and analytical
chemistry procedures has effectively identified causes
and sources of toxicity in surface waters, storm waters, and
effluents.

B.4.2 Gonzalez and Frost (1994)
The responses of two rotifer species, Keratella cochlearis
and K. taurocephala, to low pH were compared in laboratory
toxicity tests and in a natural lake (Little Rock Lake in
Wisconsin). This lake, formed by seepage, consists of two
basins which were separated by a vinyl curtain. One of the
basins was acidified over time, whereas the other was not
modified. Populations of the two rotifer species in the two
basins were compared through time. Short term (30-d to
96-h) laboratory toxicity tests were conducted with each of
the species using water samples from the two basins of the
lake.

The authors concluded that the laboratory tests were not
predictive of results obtained in the lake component of the
study and recommended caution when extending results
from laboratory studies to natural ecosystems. More
specifically, the authors suggested that laboratory tests did not
explain the population increase of K. taurocephala in the
acidified basin. However, the laboratory tests did reveal
that K. cochlearis is very sensitive to low pH, whereas K.
taurocephala was much less sensitive. K. cochlearis essentially
disappeared from the acidified basin while the
population of K. taurocephala in that basin increased.
Thus, field observations were not necessarily at odds with
the laboratory toxicity tests. Furthermore, the population
of K. taurocephala in the reference basin remained very
low throughout the study. The K. taurocephala population
increase in the acidified basin appeared to be due to a
reduction of predators, this reduction being caused by the
low pH. The laboratory tests with the rotifers would not
predict effects on predators.

Other aspects of this study complicate interpretation of the
data and acceptance of the authors' conclusions. These
factors include the absence of replication in the field component
of the study and differences in the two basins. The
acidified basin underwent thermal stratification and became
anoxic, whereas the reference basin did not.

B.4.3 Other Studies
Other investigators (Hitchock, 1965; Eisle and Hartung,
1976; Weiss, 1976; Cairns et al., 1982; Crossland and
Hillaby, 1985) have examined the correspondence of laboratory
indicator species toxicity test results and biological
community responses; the comparisons generally support
the qualitative adequacy of the single species test results
as predictors of instream responses.

Some studies (Carlson et al., 1986; Nimmo et al.,  1990)
were not specifically designed for examining the reliability
of the single species test results in predicting aquatic eco-
system responses, but provided qualitative indications of
a good correspondence.

                                        Appendix  C
        Single Species Tests with Ocean Water or  Sediment
C.1  Swartz et al. (1994)
Sediment toxicity, as assessed with the amphipod,
Eohaustorius estuarius, sediment chemical analyses, and
the abundance of benthic amphipods were examined
along a gradient in the Lauritzen Channel and adjacent
areas of Richmond Harbor, California. Dieldrin and DDT
were formulated at a facility on Lauritzen Channel from
1945 to 1966.

Objectives included: 1) examination of the relationship
between sediment contamination by DDT and dieldrin,
sediment toxicity to Eohaustorius, and the field abundance
of amphipods at nine sites in the Lauritzen
Channel/Richmond Harbor area; 2) identification of the
lowest DDT and dieldrin concentrations associated with
effects on amphipod survival in laboratory toxicity tests and
effects on abundance of amphipods in the field; and 3)
evaluation of the relative contributions of DDT, dieldrin,
PAHs, PCBs, and metals to sediment toxicity and to
amphipod abundance in the study area.

Sediment contamination by  both dieldrin and the sum of
DDT and its metabolites was positively correlated with
sediment toxicity and negatively correlated with the abun-
dance of amphipods in the study area; DDT (plus its me-
tabolites) was the dominant toxicological factor. These
researchers concluded, "Correlations between toxicity,
contamination, and biology indicate that sediment toxicity
to Eohaustorius estuarius, Rhepoxynius abronius, or
Hyalella azteca in laboratory tests provide reliable evidence
of biologically adverse sediment contamination in
the field."

In five other studies (Swartz et al., 1985, 1986, 1991;
Ferraro et al., 1991; Hake et al., 1994) statistically significant
positive correlations were found between the sum of
DDT plus its metabolites in sediment and mortality of amphipods
in laboratory sediment toxicity tests. Statistically
significant negative correlations were seen between sediment
toxicity and amphipod abundance in field sediment
samples (i.e., high sediment toxicity related to low abundance).
Thus, the weight-of-evidence from these studies
suggests that significant toxicity in laboratory sediment
toxicity tests provides a reliable qualitative prediction of
benthic biological community responses.

C.2 Chapman et al. (1987)
These researchers conducted an investigation in the San
Francisco Bay area which involved measurements of sediment
contamination by chemical analyses; toxicity through
sediment toxicity tests (mortality of the amphipod,
Rhepoxynius abronius, larval development of the mussel,
Mytilus edulis, behavior of a clam, Macoma balthica, and
reproduction of the copepod, Tigriopus californicus); and
benthic infaunal community structure through taxonomic
analyses of macroinfauna.

Sediment samples were collected at three stations at each
of three sites in the San Francisco Bay: Islais Waterway,
Oakland, and San Pablo Bay. Chemical analyses
indicated that the Islais Waterway site was more contami-
nated by a number of potentially toxic substances than the
Oakland site, while the latter site was more contaminated
than the San Pablo Bay site.

Benthic community analyses, as well as toxicity test results
(especially the mussel larvae, amphipod, and clam behavior
tests) suggested that the rank of pollution-induced degradation
was: Islais Waterway > Oakland > San Pablo
Bay. Moreover, there was concordance among the three
synoptic measurements. The authors argue that all three
types of assessment are critical for assessing pollution-induced
degradation of aquatic biological communities.

C.3 Swartz et al. (1985)
Sediment toxicity, chemical contamination, and
macrobenthic community structure were examined at
seven stations along a gradient northward from Los Angeles
County Sanitation Districts' sewage outfalls on the
Palos Verdes Shelf and compared to control conditions in
Santa Monica Bay. Sediment toxicity was assessed with
laboratory toxicity tests utilizing the amphipod,
Rhepoxynius abronius.

Significant reductions in macrobenthic species  richness,
density, biomass, and infaunal indices occurred at the
three stations which also showed significant toxicity in the
laboratory tests. There was a close inverse relationship
between sediment toxicity and benthic community mea-
surements. The authors concluded that sediment toxicity
tests can be useful in predicting benthic community im-
pacts, but cautioned that the amphipod test is not particu-
  larly sensitive.  Moreover, absence of statistically signifi-
  cant toxicity in this test should not be interpreted as evi-
  dence of a healthy benthic community (i.e., the test yields
  many false negatives).


  C.4 Long and Chapman (1985)
  To assess biological community effects of sediment contamination,
  Long and Chapman advocate the use of a Sediment
  Quality Triad (chemical, toxicity, and benthic infaunal
  data). The authors contend that too much emphasis is
  placed on the determination of distribution and concentra-
  tion of chemicals in the designation of problem  areas or
  "hotspots." They further assert that chemical data alone
  provide little or no information regarding the possible bio-
  logical significance of such chemical accumulations. The
  objective of this publication was to determine the corre-
  spondence among measures of the three components of
 the Triad; data from several studies on Puget Sound,
 Washington were used.

 Toxicity data were derived from six different laboratory
 sediment tests (amphipod lethality, oligochaete respiration,
 oyster larval abnormality, fish cell effects, and polychaete
 life-cycle effects). Data from these tests were combined
 into a toxicity summary index. Four indices were
 concluded to be effective indicators of benthic community
 health. All four indices represent percent contribution of
 specific taxonomic groups to the total benthic community:
 contribution of echinoderms (pollution sensitive, so high
 percentage represents healthy community); contribution
 of arthropods (many are pollution sensitive, so higher
 percentage represents healthy community); contribution
 of phoxocephalid amphipods (pollution sensitive, so higher
 percentage represents healthy community); and
 contribution of polychaetes and molluscs (many are
 relatively pollution tolerant, so high percentage can
 represent impacted community).
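
 As an illustration only, the four percent-contribution indices
 described above could be computed along the following lines.
 This is a minimal sketch in Python; the function name, the
 taxon labels, and the station counts are hypothetical and are
 not taken from Long and Chapman (1985).

```python
def percent_contribution(counts, groups):
    """Return each group's share (percent) of total benthic abundance.

    counts: mapping of taxon label -> number of individuals at a station
    groups: mapping of index name -> taxon labels that make up that index
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("station has no organisms; indices are undefined")
    return {
        name: 100.0 * sum(counts.get(taxon, 0) for taxon in taxa) / total
        for name, taxa in groups.items()
    }


# Hypothetical abundance counts for one station (illustrative values only).
station = {
    "echinoderms": 20,
    "phoxocephalid amphipods": 30,
    "other arthropods": 50,
    "polychaetes": 60,
    "molluscs": 40,
}

# The four indices named in the text. Phoxocephalid amphipods are counted
# on their own and also as part of the arthropods.
indices = {
    "echinoderms": ["echinoderms"],
    "arthropods": ["phoxocephalid amphipods", "other arthropods"],
    "phoxocephalid amphipods": ["phoxocephalid amphipods"],
    "polychaetes and molluscs": ["polychaetes", "molluscs"],
}

result = percent_contribution(station, indices)
for name, pct in result.items():
    print(f"{name}: {pct:.1f}%")
```

 Under the interpretation given in the text, higher values of
 the first three indices would point toward a healthier
 community, and a higher value of the fourth toward an
 impacted one.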

 Using the above indicators of benthic community health,
 the toxicity test summary index was a reliable predictor
 of biological community impacts. In fact, good overall
 correspondence among the three components of the Triad
 was observed. On a station-by-station basis, the chemical
 data alone were not always reliable indicators of biological
 effects.


 C.5 Becker et al. (1990)
 Laboratory sediment toxicity tests and benthic
 macroinvertebrate assemblage surveys were conducted
 at 43 stations in Commencement Bay, Washington; there
 were four reference sites in Carr Inlet. The toxicity tests
 included the amphipod (Rhepoxynius abronius) mortality
 test, the oyster (Crassostrea gigas) larval development
 test, and the Microtox™ test. A numerical classification
 analysis was applied to the benthic assemblage data.
 Sediment samples were also subjected to chemical
 analyses for organic compounds and metals.

 Toxicity test results and benthic assemblage alterations
 were inversely related, whereas toxicity was positively
 correlated with chemical concentrations. This suggests that
 most biological effects resulted from chemical toxicity.
 That is, the laboratory toxicity tests were reliable qualitative
 predictors of biological community responses.

 To evaluate the correspondence between toxicity test re-
 sults and alterations of benthic assemblages, three types
 of comparisons were made. Concordance was first deter-
 mined; this is a measure of agreement between results of
 toxicity tests and  macroinvertebrate surveys (i.e., both
 show statistically significant effects or both show no statisti-
 cally significant effects). Statistical significance of concor-
 dance was evaluated using  a binomial test and an
 expected level of concordance of 0.5 (i.e., that for random
 agreement). Of the 47 stations, the benthic assemblages
 at 19 were deemed altered. Concordance was 60% (not
 significant) with the amphipod test, 81% (p<0.001) with the
 oyster larval test, and 68% (p<0.01) with the Microtox™
 test. Sensitivity of the toxicity tests was represented as the
 percentage of stations with altered benthic assemblages
 that also revealed statistically significant toxicity.
 Sensitivity was 84%, 68%, and 42%, respectively, for the
 Microtox™, oyster larval, and amphipod tests. Efficiency
 of the toxicity tests was determined as the percentage of
 tests which identified only those stations with altered
 benthic assemblages. Efficiency was 81%, 57%, and 50% for
 the oyster larval, Microtox™, and amphipod tests,
 respectively. The authors concluded that the laboratory
 sediment tests, especially the oyster larval tests, were
 reasonable predictors of altered benthic assemblages.
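
 The three comparisons can be sketched in a few lines of
 Python. This is an illustration under stated assumptions, not
 the authors' procedure: the station counts below are
 back-calculated here so that they reproduce the percentages
 quoted above (47 stations, 19 altered) and are not reported in
 this review; efficiency is read as the share of stations
 flagged toxic that also had altered benthos; and the binomial
 test is taken to be one-sided. The function names are
 hypothetical.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """One-sided binomial probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def compare(tp, fp, fn, tn):
    """Summarize agreement between a toxicity test and the benthic survey.

    tp: toxic and benthos altered      fp: toxic, benthos not altered
    fn: not toxic, benthos altered     tn: not toxic, benthos not altered
    """
    n = tp + fp + fn + tn
    agree = tp + tn
    return {
        "concordance": agree / n,         # both agree (effect or no effect)
        "sensitivity": tp / (tp + fn),    # altered stations flagged as toxic
        "efficiency": tp / (tp + fp),     # toxic flags that were truly altered
        "p_value": upper_tail(agree, n),  # versus 0.5 random agreement
    }


# Assumed counts (tp, fp, fn, tn), chosen only to match the reported
# percentages; they are NOT taken from Becker et al. (1990).
assumed_counts = {
    "oyster larval": (13, 3, 6, 25),
    "Microtox": (16, 12, 3, 16),
    "amphipod": (8, 8, 11, 20),
}

summary = {name: compare(*c) for name, c in assumed_counts.items()}
for name, s in summary.items():
    print(
        f"{name}: concordance {s['concordance']:.0%}, "
        f"sensitivity {s['sensitivity']:.0%}, "
        f"efficiency {s['efficiency']:.0%}, p = {s['p_value']:.4f}"
    )
```

 With these assumed counts the sketch returns the concordance,
 sensitivity, and efficiency percentages quoted in the text,
 and the one-sided binomial probabilities fall on the same side
 of the reported significance thresholds.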


 C.6 Swartz et  al. (1982)
 The toxicity of 175 sediment samples from Commencement
 Bay was measured in the laboratory Rhepoxynius
 abronius survival test. The relationship between these
 toxicity test results and benthic community data from these
 sites was explored. Benthic community data exhibited a
 negative correlation (decreased amphipod density and
 species richness  with  higher levels  of  toxicity) with
 laboratory sediment toxicity. The authors concluded that
 the correlation between laboratory and field results indi-
 cated that the sediment toxicity tests were reliable predic-
 tors of biological community responses.


 C.7 Schimmel et al. (1989a,b)
Studies were conducted to  assess the relationship  be-
tween effluent and ocean water toxicity. The estimates of
chronic toxicity were made from 1982 to 1984 at seven
locations along the Atlantic and  Gulf Coasts with effluent
and ocean water samples (USEPA, 1994b).
Effluent dilutions at various locations in receiving waters
were estimated with dye studies so that effect concentra-
tions could be compared. Data presented by these inves-
tigators reveal that effluent toxicity reliably reflected receiv-
ing water toxicity (effect concentrations in effluent and
ocean water samples with equivalent dilution
corresponded). The results of these studies signify that the
ocean receiving waters had little effect on the
toxicity/bioavailability of chemicals in the effluent. The "missing
link" in these investigations was establishing a connection
between marine water toxicity and biological community
responses.


C.8 Frithsen et al. (1989)
Using indicator species toxicity tests and an ecosystem
survey, a four month study was conducted to evaluate the
toxicity of a sewage effluent. Effluent discharge was into
Narragansett Bay. Effluent toxicity was evaluated with the
sea urchin (Arbacia punctulata) sperm cell test. Ecological
effects of the effluent were assessed in mesocosms con-
sidered by the authors to be functional analogs of shallow,
unstratified coastal systems such as Narragansett Bay.

The sewage effluent consistently tested toxic in the sea
urchin test, with the average EC50 being 1.1% effluent.
Little information could be gleaned from the mesocosm
data due to several problems.  There were unexplained
effects on phytoplankton and organic carbon loading, which
led to hypoxia. Toxicity measured in the mesocosm did
not correlate with that in the sewage effluent. There was
incomplete mixing of effluent in the mesocosms.  Signifi-
cant toxicity was detected in the control mesocosms. Tox-
icity in all mesocosms was highly variable and not related
to effluent toxicity. Because of the confounding factors, a
conclusion that the  mesocosm data failed to confirm
laboratory effluent toxicity data would be inappropriate.
The study was inconclusive.

-------
                                         Appendix D
  Strengths and Limitations of Single Species Toxicity Tests
D.1 Strengths of Single Species Tests
There is no instrument that can measure or predict how
organisms will respond to toxic chemicals. Furthermore,
chemical analyses of effluent or ambient water samples
do not yield information on toxicological additivity,
bioavailability, synergism, or cumulative effects. Many
wastewaters and ambient waters are complex, containing
constituents that interact and that differ in toxicity;
therefore, single chemical standards are important, but of
limited value in protecting water quality.

   1)  Single species tests integrate additivity and cu-
      mulative interactions of chemicals.
   2)  Single species tests provide a direct measure of
      chemical bioavailability.
   3)  Single species tests measure responses to toxicants
      for which there are no chemical-specific
      water quality standards.
   4)  Single species tests have provided  reliable esti-
      mates of concentrations (for many different types
      of chemicals) which cause effects in  aquatic eco-
      systems.
   5)  Because they are highly standardized with specific
      quality assurance and control requirements, single
      species  tests provide  reliable, repeatable, and
      comparable results with good precision compared
      to other types of chemical and biological tests.
   6)  Single species tests provide an early warning signal
      so that actions can be taken to minimize significant
      ecosystem impacts  (especially with regard to the
      discharge or release of  toxic chemicals).
   7)  Single species tests can be performed relatively
      rapidly and inexpensively. This allows for the
      accumulation of a data set which better characterizes
      the wastewater or ambient water system.
D.2 Limitations of Single Species Tests
Several definite and potential limitations of single species
toxicity tests have been identified.

    1)  Results of a single test do not characterize the
        duration or frequency of toxicity in wastewater or
        ambient waters. Instream exposure reflects
        ambient water/effluent characteristics over time (days,
        weeks), whereas exposure in the laboratory
        reflects the characteristics of ambient water or
        wastewater in a grab sample or composite
        sample of one day.
    2)  Results of a test or tests with an effluent do not
        allow for assessment of cumulative effects of
        toxic substances from different sources in aquatic
        ecosystems.
    3)  The range of sensitivities (to toxic substances) of
        organisms and functions in aquatic ecosystems
        may not be encompassed by single species tests.
    4)  Effects due to bioaccumulation/bioconcentration,
        as well as delayed or secondary effects, are not
        measured.
    5)  Results of single species tests may underestimate
        ecosystem community responses because
        of the multiple stressors acting on natural
        populations and communities. Single species tests
        include a limited range of endpoints (responses)
        compared to aquatic ecosystems.
    6)   Results of single species tests may not be pre-
       dictive of trophic interactions and ecosystem
       operational processes (tests do not incorporate
       aquatic ecosystem complexity).
    7)  Physical and chemical, as well as biotic, factors
        in aquatic ecosystems could modify (increase or
        decrease) bioavailability or toxicity compared to
        laboratory tests. The highly controlled exposure
        regimes in the laboratory may not reflect the
        multivariate and complex exposure conditions in
        natural settings.
   8)  Single species tests tend to use non-indigenous
       species that may not represent local biota.
   9)  Single species toxicity test results fail to account
       for indirect effects of contaminants.
   10) Single species tests tend to use genetically
        homogeneous laboratory populations.
U.S. GOVERNMENT PRINTING OFFICE: 1999 - 550-101/2OOM

-------