EPA
United States
Environmental Protection
Agency
Office of Chemical Safety
and Pollution Prevention
(7101)      January 2012
       Ecological Effects
       Test Guidelines

       OCSPP 850.2000:
       Background
       and Special
       Considerations-
       Tests with  Terrestrial
       Wildlife

-------
                                     NOTICE

     This guideline is  one of a series of test guidelines established by the United States
Environmental  Protection Agency's Office of Chemical Safety and  Pollution  Prevention
(OCSPP) for use  in  testing pesticides  and chemical substances to develop data  for
submission to the Agency under the Toxic Substances Control Act (TSCA) (15 U.S.C. 2601,
et seq.), the Federal Insecticide, Fungicide and Rodenticide Act (FIFRA) (7 U.S.C.  136, et
seq.), and section 408 of the Federal Food, Drug and Cosmetic Act (FFDCA) (21 U.S.C. 346a).
Prior to April 22, 2010,  OCSPP was known as the Office of Prevention, Pesticides and Toxic
Substances  (OPPTS).   To distinguish  these guidelines from  guidelines issued by other
organizations, the  numbering convention  adopted in 1994 specifically included  OPPTS as
part of the guideline's number.  Any test guidelines  developed after April 22, 2010 will use
the new acronym (OCSPP) in their title.

     The OCSPP harmonized test guidelines serve as a compendium of accepted scientific
methodologies and protocols that are intended to provide data to inform regulatory decisions
under TSCA, FIFRA, and/or FFDCA.  This document provides guidance for conducting the
test,  and is  also used by  EPA, the public,  and  the companies that are subject to data
submission requirements under TSCA, FIFRA, and/or the FFDCA. As a guidance document,
these guidelines are not binding on either EPA or any outside parties,  and the EPA may
depart from the  guidelines where circumstances warrant and without prior notice. At places
in this guidance, the Agency uses the word "should." In this  guidance, the use  of "should"
with regard to an action means that the action is recommended rather than mandatory. The
procedures contained in  this guideline are strongly  recommended for generating the data
that are the subject of the guideline, but EPA recognizes that departures may be appropriate
in specific  situations. You may propose alternatives to the recommendations described in
these guidelines, and the Agency will assess them  for  appropriateness  on a case-by-case
basis.

     For additional information about these test guidelines and to access these guidelines
electronically, please  go  to  http://www.epa.gov/ocspp  and  select  "Test Methods &
Guidelines" on  the left side navigation  menu.  You may also access the  guidelines in
http://www.regulations.gov grouped by Series under Docket ID #s:
EPA-HQ-OPPT-2009-0150 through EPA-HQ-OPPT-2009-0159, and EPA-HQ-OPPT-2009-0576.

-------
OCSPP 850.2000: Background and special considerations: tests with terrestrial
wildlife.

(a) Scope—
(1) Applicability.  This guideline is intended to be used to help develop data to submit to EPA
under the  Toxic Substances  Control  Act  (TSCA) (15  U.S.C.  2601,  et  seq.),  the  Federal
Insecticide, Fungicide, and Rodenticide Act (FIFRA) (7 U.S.C.  136,  et seq.), and the Federal
Food, Drug, and Cosmetic Act (FFDCA) (21  U.S.C. 346a).
(2) Background. This guideline provides general information applicable to conducting OCSPP
Series 850, Group B toxicity tests with terrestrial wildlife species.  The source materials used in
developing this  harmonized OCSPP guideline are:  OPP 70-1 General Information, OPP 70-2
Definitions, OPP 70-3 General Test Standards,  OPP 70-4 Reporting  and Evaluation of Data
(Pesticide Assessment Guidelines Subdivision E—Hazard Evaluation: Wildlife and Aquatic
Organisms); the Pesticide Reregistration Rejection Rate Analysis Ecological Effects report and
the background materials in the OCSPP Series 850, Group B specific guidelines.

(3) General

       (i) The OCSPP  Series 850, Group B provides guidelines  applicable  to conducting
       laboratory and field toxicity tests with terrestrial species, including birds and mammals.
       Field tests are designed on a case-by-case basis.  The guidelines in OCSPP Series 850,
       Group  B are applicable to  evaluating the hazards and risks of industrial  chemicals and
       pesticides to terrestrial wildlife exposed directly  or  indirectly.  Data concerning  the
       effects of pesticides on terrestrial wildlife  are used  in ecological risk  assessment  of
       pesticides (40 CFR part 158, paragraph (k)(29) of this guideline).  These data are also of
       use in  assessments of  potential off-target injury to endangered and threatened wildlife
       species listed by the Fish and Wildlife Service, Department of Interior, and when toxicity
       concerns arise from incidents or during Special Review.   These data are used for both
       deterministic and probabilistic risk assessments.

       (ii)  Information is provided on  the design and conduct of tests with  terrestrial wildlife,
       emphasizing  the  importance  of adequately  characterizing the test  substance, use  of
       suitable experimental design, and establishing the physical and chemical conditions  of the
       test system  in order to provide a scientifically sound understanding  of how the test
       substance behaves under test conditions.  Also considered  are the factors  that can  affect
       the test outcome and interpretation of test results. This general information is primarily
       applicable to the guidelines for laboratory toxicity tests, since field tests are designed on a
       case-by-case  basis.  However, the  OCSPP 850.2000  guideline  lists critical quality
       assurance and reporting standards common to all the guidelines in the OCSPP Series 850,
       Group  B guidelines.

       (iii) The OCSPP Series 850, Group B guidelines have generally been validated in formal
       round-robin tests or through repeated use.
       (iv) Each submitted study should meet the data quality objectives for which the test is
       designed.  Test validity elements critical to determining the scientific soundness and
acceptability of the study have been listed for each guideline in the OCSPP Series 850,
Group B.

(v) The  guidelines  contained in OCSPP Series 850,  Group  B  recommend  specific
procedures to be used in almost all circumstances to produce a satisfactory study,
but also provide general guidance  that allows  for some latitude, based  upon study-
specific circumstances.   It is recognized that certain  problems,  some of which are
unavoidable, may arise both before and during  testing  and provisions  have thus  been
made in the guidelines for dealing with those that are commonly  encountered.  These
guidelines  provide for exceptions, while at the same  time maintaining  a high level of
scientifically sound, state-of-the-art guidance  so that following this guidance will provide
ecological  effect information that is scientifically defensible for its intended use, while
also taking into consideration the chemistry and environmental fate  of the test substance.
For  a  satisfactory  test, the  experimental  design,  execution  of the  experiments,
classification of the organism, sampling,  measurement, and  data analysis  should  be
accomplished by  use  of sound scientific  techniques  recognized by the scientific
community.  Uniformity  of procedures, materials,  and reporting should be maintained
throughout the toxicity evaluation process. Refinements of procedures to increase  their
accuracy and effectiveness are  encouraged.   When  such  refinements  include major
modifications  of any  test  procedure,   the  Agency  should  be consulted  before
implementation.  Also, when in doubt, users of these guidelines should consult with the
appropriate regulatory  authorities  for clarification or  additional  information  before
proceeding.  All references supplied with respect to protocols or other test standards are
provided as recommendations.

(vi) For pesticides, a tiered testing approach given in  40 CFR 158.243 and 40  CFR
158.630 for terrestrial wildlife provides for greater efficiency of testing  resources while
assuring  data development  as warranted to  meet the objectives of a hazard and risk
assessment.  To reduce or eliminate unnecessary toxicity testing for regulatory decision
making the specific test requirements for pesticides in 40 CFR part  158 depend upon the
use pattern of the pesticide and the potential for exposure of wildlife. In addition, there is
a hierarchal or tier system which progresses  from basic laboratory  tests  to applied  field
tests, where the results of each tier of tests should be evaluated to determine the potential
of the pesticide to cause adverse effects,  and to determine whether further testing is
warranted to meet the objectives of the hazard or risk assessment (40  CFR part 202).
Tests in the lower tiers  (Tier  I and Tier II) are designed to screen test  substances to
determine  the potential  to  cause adverse effects on  survival  and reproduction.  For
pesticides,  a Tier I test,  referred to as a limit test in these Group  B guidelines, tests a
single concentration and  compares effects observed with appropriate controls.   Tier II
testing for  pesticides (multiple-concentration  definitive test in these Group B guidelines)
provides for generation  of dose-response  curves for test substances which are  known
toxicants or which in Tier I testing demonstrated toxicity.  The wild  mammal toxicity test
described  in OCSPP  850.2400 and  the field study described  in OCSPP  850.2500 are
considered  Tier III, and are  designed on a case-by-case basis to further refine and
characterize the estimate of risks to terrestrial  wildlife.

(vii) Data  on toxicity to terrestrial wildlife may also  be used  to evaluate the potential
hazard and risk of industrial chemicals.  Terrestrial wildlife toxicity data are requested
       when the pattern of production, use, or disposal indicates exposure to terrestrial wildlife.
       This testing is part of the Tier I (base set)  suite of tests in the OPPT testing scheme
       developed  for determining environmental effects  (see  the references  in  paragraphs
       (k)(12), (k)(13), (k)(18), (k)(19), (k)(31), and (k)(32) of this guideline for further details).
       This testing scheme is  deterministic for the most part, flexible, sequential, consistent,
       iterative, transparent, discriminatory of the extent of toxicity, and applicable to all types
       of chemicals.

       (viii) For industrial  chemicals, toxicity to birds may not be known  or suspected, in
       contrast to pesticides where it may  be established that the substance is or could be toxic
       to terrestrial wildlife.  Thus, for industrial chemicals the maximum amount of toxicity test
       information should be obtained from the initial or lower tier test,  Avian Acute  Oral
       Toxicity Test  (the OCSPP 850.2100 guideline).  In contrast to pesticides, the toxicity of
       industrial  chemicals to terrestrial  wildlife is generally uncharacterized.   Thus,  range-
       finding tests,  a preliminary step to define dose-response testing, are  more commonly
       conducted  than limit or maximum  challenge tests that use only one test concentration.
       For non-toxic or low toxicity chemicals (based on the results of the OCSPP 850.2100
       test) it is  likely that no further higher tier testing (OCSPP 850.2200 guideline  (Avian
       Dietary Toxicity Test)  or OCSPP  850.2300 guideline (Avian Reproduction  Test), or
       both) would be supported or recommended. For industrial chemicals, the base set Tier I
       tests and requirement to proceed from one tier to the  next are referenced in paragraphs
       (k)(12), (k)(13), (k)(18), (k)(19), (k)(31), and (k)(32) of this guideline.

       (ix)  While performing field tests, all necessary measures should be taken to ensure that
       nontarget  plants and animals, especially endangered or threatened species, will  not be
       adversely affected either by direct hazard or by impact on food supply or food chain.

(b) Definitions. Terms used in the OCSPP Series 850, Group B guidelines have the meanings
set forth in  Section 3 FIFRA   regulations at  40  CFR  152.3  (Pesticide Registration  and
Classification Procedures); 40 CFR part 158.300  (Product Chemistry Definitions); 40 CFR part
160 (Good Laboratory Practice Standards); and in TSCA Section  3 regulations  40 CFR part 792
(Good Laboratory Practice  Standards); and the  Agency's "Terms of Environment, Glossary,
Abbreviations and Acronyms" (see paragraph (k)(23) of this guideline). The definitions in this
section apply to  the  OCSPP Series  850  Group  B test guidelines and where applicable, the
individual test guidelines contain  additional or test-specific definitions.

Acclimation is the physiological  or behavioral adaptation by test animals to one or more new
environmental conditions and basal diet associated with the test procedure.

Active Ingredient (a.i.) is  any substance (or group of structurally similar substances if specified
by the Agency) that will prevent, destroy, repel or mitigate any pest, or that functions as  a plant
regulator, desiccant, or defoliant within the  meaning of FIFRA (40 CFR 152.3).

Acute  toxicity is  the  discernible  adverse  effects (lethal  or sublethal) induced in an organism
within a short exposure period (usually not constituting a substantial portion of the  total life
cycle or life span, e.g., days).

Acute  toxicity test is  a comparative study  in which organisms are  subjected to a severe, short-
term stimulus (test substance). The organisms, exposed to different concentrations of the test
substance (except in a limit test), are observed for a short period usually not constituting a
substantial portion of the total life cycle or life span.  Acute exposure typically includes a lethal
biological response of relatively quick progression.

Adjuvant is a subsidiary ingredient or additive in a mixture which modifies, enhances or prolongs
by physical action  the activity of the  active ingredient(s).   Examples of agricultural chemical
adjuvants include but are not limited  to  surfactants, crop oils, anti-foaming agents, buffering
compounds, drift control agents, compatibility agents, stickers and spreaders.

Basal diet is the food or diet as it is prepared or received from the supplier, without the addition
of any vehicle, diluent or test substance.

Chronic toxicity test is a  comparative study  in  which organisms are  exposed to  different
concentrations of the test substance generally for a relatively long period that constitutes a
substantial, nearly complete, or complete portion of the total life  cycle  or life span.  Chronic
exposure typically  induces  a  sublethal biological  response  of  relatively slow progression, or
which is cumulative in nature. For some chemicals with certain modes-of-action, shorter-term
exposure may result in chronic or latent  effects, and continued or cumulative exposure  is
therefore not necessary.

Concentration-response curve is  the  graphical and mathematical  relationship between the
concentration  of a test substance and a specific biological response produced from toxicity tests
when response (e.g., proportion or percent mortality) values are  plotted against concentration of
test substance for a given  exposure duration. This is also referred to as the dose-response curve
or concentration-effect curve.

Control refers to test organisms exposed to test conditions and test matrix  (capsule, diet, gavage)
in the absence of any introduced test  substance as part of  the test design for the purpose of
establishing a basis of comparison with a test substance for  known chemical or biological
measurements.

Effect Concentration (EC50) is the experimentally derived concentration of test substance
in the diet that would be expected to affect 50 percent of a test population of test animals which
is exposed exclusively to the treated diet under specified exposure conditions.

Formulation, as used within these guidelines, is a packaged end-use product (e.g., dust, wettable
powder, emulsifiable concentrate, ultra low volume, etc.) of the test substance and may contain
one or more active ingredients and one or more  inert ingredients.

Hatch refers to  eggs or young birds that  are the same age and  that are derived  from the same
adult breeding population, where the adults are of the same strain and stock.

Holding refers to the period from the time test organisms are received in the laboratory until they
are used in testing  or begin acclimation to test conditions.  Holding conditions may include
quarantine, lower temperatures to minimize disease,  or other conditions that are different from
test conditions.  Where holding conditions are  different from test conditions, the test organisms
should be acclimated to test conditions prior to testing so as not to stress the organisms.

Inert ingredient is any substance (or group of structurally similar substances if designated by the
Agency), other than an active ingredient, which is intentionally included in a pesticide product
(40 CFR 152.3).

Inhibition Concentration (IC50) is the experimentally derived concentration of test substance in
the diet that would be expected to inhibit 50 percent of a test population of test animals which is
exposed exclusively to the treated diet under specified exposure conditions.

Lethal concentration, median (LC50) is the experimentally derived concentration of the test
substance in  the  diet that would be expected  to  result in mortality of 50 percent (50%) of a
population of test  animals which  is exposed exclusively to  the treated  diet  under  specified
exposure conditions.

Lethal dose, median (LD50) is the experimentally derived dose of the test substance that would be
expected to result in mortality of 50% of a population of test animals which is treated  with a
single oral dose under specified exposure conditions.

Limit of detection  (LOD)  is the analytic  level  below which the qualitative  presence  of  the
material is uncertain.  This is typically defined by the lowest  concentration producing a signal
two standard  deviations above the background noise from a matrix blank sample.

Limit of quantification (LOQ) is the analytic level below which the quantitative amount of the
material is uncertain.  This is typically defined by the  lowest  concentration  of fortified  matrix
successfully analyzed.

Limit test is a toxicity test performed with a single test substance concentration or dose and a
control to establish that the value for the measurement endpoint of concern (e.g., LC50, LD50) is
greater than the test substance concentration or dose (limit concentration or dose, respectively).

Lowest observed effect concentration (LOEC) is the lowest concentration of a test substance to
which  organisms are  exposed under specified exposure conditions that causes  a statistically
significant adverse effect as compared to the control(s).  Throughout these guidelines, the terms
LOEC and lowest observed adverse effect concentration (LOAEC)  have the same meaning.

Lowest observed effect level  (LOEL) is the lowest dose level of a  test substance to  which
organisms  are  exposed under specified  test conditions that  causes a statistically  significant
adverse effect as  compared to the control(s). Throughout these guidelines,  the terms LOEL and
lowest observed adverse effect level (LOAEL) have the same meaning.

Maximum acceptable toxicant concentration (MATC) is the highest concentration at which a test
substance can be  present and not be toxic to the test organism.  The MATC lies  within the range
between the LOEC  and NOEC. Operationally, for industrial chemicals, the MATC is defined as
the geometric mean of these values.  The MATC is also referred to (in the Pre-Manufacture
Notification (PMN) program  of  OPPT) as the  chronic value or chronic no-effect-concentration
(NEC).
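As an illustration of the operational definition above, the following minimal Python sketch computes the MATC as the geometric mean of the NOEC and LOEC. The function name `matc` and the example concentrations are hypothetical and are not part of this guideline.

```python
import math


def matc(noec: float, loec: float) -> float:
    """Return the geometric mean of the NOEC and LOEC.

    Both values are assumed to be positive and expressed in the same
    units, with the NOEC below the LOEC.
    """
    if noec <= 0 or loec <= 0:
        raise ValueError("NOEC and LOEC must be positive")
    if noec >= loec:
        raise ValueError("NOEC must be lower than LOEC")
    return math.sqrt(noec * loec)


# Hypothetical example: NOEC = 10 and LOEC = 40 (same units).
example_matc = matc(10.0, 40.0)
```

Because the geometric mean always falls between its two inputs, the result lies within the NOEC-to-LOEC range, consistent with the definition.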

Measured concentration  is an analytically derived quantitative measure which lies above  the
method detection limit.

Measurement endpoint is a quantitative measurable response to a stressor that is used to infer a
measure of  protection or  evaluate  risk  to  valued  environmental  entities.   Examples  of
measurement endpoints include,  but  are not  limited to,  mortality (e.g.,  LD50; NOEL), body
weight (e.g., NOEL), number of eggs (e.g., NOEL), etc.  Each test-specific guideline identifies
the measurement endpoint(s) to be determined by the prescribed study. The term "measurement
endpoint" is used synonymously with the term "measures of effect".

Method detection limit (MDL) is  operationally defined as  the concentration of constituent that,
when processed through the complete method, produces a  signal with 99% probability that it is
different from the blank.  It is computed as the standard deviation multiplied by the Student's t
constant corresponding to the appropriate  degrees of freedom (n-1).   Thus, for seven spiked
samples prepared at the hypothetical LOQ, the MDL is 3.143 times the standard deviation of the
mean of the seven replicate measurements.
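The MDL calculation described above can be sketched in Python as follows. This is an illustrative sketch only: it assumes the standard deviation is the sample standard deviation of the replicate measurements, the function name `method_detection_limit` is hypothetical, and the t values for more than seven replicates are taken from standard one-sided 99% Student's t tables rather than from this guideline.

```python
import statistics

# One-sided 99% Student's t critical values, keyed by degrees of freedom (n - 1).
# The value for 6 degrees of freedom (seven replicates) is the 3.143 cited in the text;
# the other entries are assumptions drawn from standard t tables.
T_99_ONE_SIDED = {6: 3.143, 7: 2.998, 8: 2.896, 9: 2.821}


def method_detection_limit(replicates):
    """Return t(n-1, 0.99) multiplied by the sample standard deviation."""
    degrees_of_freedom = len(replicates) - 1
    if degrees_of_freedom not in T_99_ONE_SIDED:
        raise ValueError("no tabulated t value for this number of replicates")
    spread = statistics.stdev(replicates)  # sample standard deviation (n - 1 denominator)
    return T_99_ONE_SIDED[degrees_of_freedom] * spread


# Hypothetical measured concentrations from seven spiked samples (same units).
spiked = [0.9, 1.0, 1.1, 1.0, 0.9, 1.1, 1.0]
mdl = method_detection_limit(spiked)
```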

No observed effect concentration (NOEC) is the  highest  concentration of a test substance to
which organisms  are exposed under specified exposure conditions that  does  not cause  a
statistically significant adverse effect as compared to the control(s).  The NOEC is  the test
concentration immediately below the LOEC and  can  only be defined in the presence of the
LOEC.    Throughout  these guidelines,  the terms  NOEC  and  no observed  adverse effect
concentration (NOAEC) have the  same meaning.

No observed effect level (NOEL) is the highest dose level of a test substance to which organisms
are exposed under specified exposure conditions that does not cause a statistically significant
adverse effect as compared to  the control(s). The  NOEL is the test dosage immediately below
the LOEL  and can only be defined in the presence of the LOEL.  Throughout these guidelines,
the terms NOEL and no observed  adverse effect level (NOAEL) have the same meaning.
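The relationship between the lowest-effect and no-effect values defined above can be sketched as follows. This is a hedged illustration, not a procedure from this guideline: the function name `no_effect_and_lowest_effect` is hypothetical, and the significance flags are assumed to come from a separate statistical comparison of each treatment level with the control.

```python
def no_effect_and_lowest_effect(levels, significant):
    """Return (no-effect level, lowest-effect level) for the tested levels.

    levels      -- tested concentrations or doses, control excluded
    significant -- parallel booleans, True where the level differed
                   significantly (and adversely) from the control
    """
    pairs = sorted(zip(levels, significant))
    for index, (level, is_significant) in enumerate(pairs):
        if is_significant:
            # The no-effect value is the tested level immediately below the
            # lowest-effect value; it is undefined if the lowest tested
            # level already shows an effect.
            below = pairs[index - 1][0] if index > 0 else None
            return below, level
    # No significant effect at any level: following the definitions above,
    # the no-effect value is not defined without a lowest-effect value.
    return None, None


# Hypothetical dietary concentrations with effects at the two highest levels.
noec, loec = no_effect_and_lowest_effect([10, 20, 40, 80], [False, False, True, True])
```

The sketch applies the "immediately below" rule literally, so a non-significant level above the lowest-effect level does not change the result.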

Reagent water is water that has  been prepared by  deionization, glass  distillation, or reverse
osmosis.

Replicate is the experimental unit  within a toxicity test.  It is the smallest physical entity to which
treatments can be independently assigned.

Subchronic toxicity test is a  comparative study with terrestrial organisms that has characteristics
of both acute and chronic toxicity  tests, but with more of the latter. Organisms are subjected to a
stimulus (test substance), of a longer duration than  an acute test, but of a shorter duration than a
chronic test.  Subchronic exposure typically induces a lethal or sublethal biological response of
relatively moderate progression for periods that constitute a portion of the total life  cycle or life
span.

Test substance refers to the specific form of a chemical substance or mixture being evaluated
(e.g., pesticide active ingredient or formulation, or industrial chemical).

Treatment group refers to the set  of replicates  that receive the same amount (if any) of the test
substance;  controls are treatment groups that receive none of the test substance.

Vehicle is any agent which facilitates the mixture, dispersion, or solubilization of a test substance
with  a carrier (e.g.,  diet, capsule or gavage solution, drinking water) used to expose  the test
organisms (40 CFR 160.3, 40 CFR 792.3).

(c) Apparatus, facilities and equipment—

(1) Laboratory facilities and equipment.  The type of facilities and equipment for conducting
the toxicity tests with the organisms in this group of guidelines varies depending upon the nature
of the test and the organism.  In general the laboratory toxicity tests with terrestrial wildlife use
normal laboratory glassware, supplies and equipment as well  as equipment for housing animals
and controlling temperature, humidity and lighting.  Enclosures for  animals should be large
enough to permit normal behavior and movement.  Housing and maintenance conditions should
be in accordance with acceptable  animal husbandry practices (e.g., United  States Department of
Agriculture Animal  Care Regulations).   Construction materials  and equipment that are toxic,
may affect toxicity, or that  may sorb test substances should not be used.  Pens  should be
constructed of galvanized metal, stainless steel, or perfluorocarbon plastic (e.g., Teflon).  Wire
mesh  should  be used for floors and external walls; solid sheeting  should  be used for common
walls and ceilings.  See test-specific OCSPP Series 850, Group B guidelines for identification of
any atypical facility, equipment, or supplies used in the test.

(2)  Maintenance and reliability.   All equipment  used in conducting the test, including
equipment used to  prepare and administer the test substance, and  equipment to maintain  and
record environmental conditions,  should be of such design and capacity that  tests involving this
equipment can be conducted in a reliable and scientific manner. Equipment should be inspected,
cleaned, and maintained regularly, and be properly calibrated.

(3) Permits.  Experimental use permits may be required  for the  terrestrial testing of pesticides
under field conditions.  Consulting with the Agency prior to conducting a field test is recommended
to identify what, if any, federal permits are required.

(d) Experimental design and data analysis—

(1) Design elements.  Elements of experimental design such as  the number of test treatments,
progression factor between treatment levels, number of replicates, and  number of organisms per
replicate and per treatment  are based upon the purpose of the test, variability  expected in
response measurements, and  the type  of statistical procedures that  will be used to  evaluate the
results.  See the test-specific  guidelines for specific information relating to these  aspects of test
design.  General principles of test design are set forth in this guideline.  General guidance on the
statistical analysis of laboratory ecotoxicity tests  can  be  found in the references in paragraphs
(k)(1), (k)(2), (k)(15), (k)(16), (k)(17), (k)(26), (k)(27) and (k)(28) of this guideline.

(2) Calculation of endpoints—

       (i) Background.

              (A) Data generated in ecotoxicity tests with terrestrial  animals may be of three
              types:

              (1) Quantal (dichotomous), where the variable has only two  mutually exclusive
              outcomes  (e.g., dead or alive)—note  that quantal  data are a special case of
              discrete  data;

       (2) Discrete, where there is a finite number of values possible or there is a space
       on  the  number line between two possible values  (e.g., number of young
       produced); or

       (3) Continuous, where the variable can assume a continuum of possible outcomes
       (e.g., height and weight).

       (B) These data may be analyzed using regression-based techniques or hypothesis-
       testing procedures depending on the objectives and endpoints of a specific test
       guideline. Traditionally, the results  of acute toxicity tests have been expressed as
       point estimates (e.g., LC50 or LD50 for lethality, or EC50 or IC50 for other effects),
       while the results of chronic tests have been expressed as the results of hypothesis-
       testing procedures to determine  the NOEC and LOEC (or NOEL  and LOEL).
       Regarding terminology, the term ICx is more appropriately used for continuous
       endpoints, rather than ECx. For information on the advantages and disadvantages
       of these approaches, see the references  in  paragraphs (k)(5),  (k)(8), (k)(16),
       (k)(17)  and (k)(20) of this guideline.  Specific test guideline objectives, either
       point  estimate  or  hypothesis-based endpoints or both, are identified in  each
       specific test guideline.

(ii) Point estimates and concentration-response or dose-response tests.  This type of
toxicity test is designed to allow calculation of a concentration- or dose-response curve
(mathematical model) and to estimate one or more specific points (point estimates) on the
curve, such as an LD50.  Because of the normal variation in sensitivity of individuals
within a group of test organisms,  a measure of the degree of certainty in the model
parameters and the point estimate value(s) should be determined.

       (A) No  single  statistical technique is appropriate for all data sets,  and  the
       assumptions and requirements of each method should be known before using (see
       paragraphs (k)(1), (k)(4), (k)(6), (k)(7), (k)(9), (k)(10), (k)(11), (k)(14), (k)(20),
       and (k)(30) of this guideline). Not all methods suitable for continuous data are
       appropriate for  quantal data  (see paragraphs (k)(4) and (k)(14) of this guideline).
       For point estimate tests, regression-based methods (e.g., probit) that model  the
       full concentration- or dose-response relationship and  provide error estimates of
       the  model parameters and point estimate(s) are desired.  The regression model
       used to  fit  data should be recorded,  and  the error  estimates of the model
       parameters (e.g., standard error of slope and intercept), and goodness-of-fit should
       be calculated and recorded.  For a point estimate (e.g., LC50 or LD50) the 95%
       confidence limits and standard error are calculated and recorded. If data do not fit
       a regression-based model, other point estimator methods (e.g., binomial, moving
       average, trimmed Spearman-Karber, linear interpolation (e.g., Bootstrap ICp)) are
       available (see paragraphs (k)(24), (k)(27) and (k)(28) of this guideline). Which of
       these other methods is selected is dependent upon the shape of the concentration-
       response curve, the  number of treatments with partial  mortalities  (i.e., where
       mortality  is greater than 0% but  less  than 100%), the magnitude of these
       mortalities, and the  number of replicates.   The  method used  to  estimate  the
       endpoint and,  if applicable,  the 95%  confidence  limits for the point  estimate
       should be recorded.

       (B) Dose-response models are good estimating tools only for the range of doses
       used to fit them; therefore, endpoints that are extrapolated beyond the range of the
       doses tested would be considered to be of lower confidence or potentially, of such
       low confidence that they would not be appropriate to estimate.
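To make the idea of a non-regression point estimator concrete, the following Python sketch implements the simplest, untrimmed form of the Spearman-Karber estimator for a median lethal dose. It is an assumption-laden illustration, not the trimmed procedure referenced in paragraph (d)(2)(ii)(A): it requires 0% mortality at the lowest dose, 100% at the highest, and non-decreasing mortality in between, it raises an error rather than smoothing non-monotonic data, and it does not compute confidence limits. The function name `spearman_karber_ld50` and the example data are hypothetical.

```python
import math


def spearman_karber_ld50(doses, n_dead, n_total):
    """Untrimmed Spearman-Karber estimate of the median lethal dose.

    doses   -- positive doses in strictly ascending order
    n_dead  -- number of mortalities at each dose
    n_total -- number of animals treated at each dose
    """
    if not (len(doses) == len(n_dead) == len(n_total)) or len(doses) < 2:
        raise ValueError("need at least two doses with matching counts")
    if any(later <= earlier for earlier, later in zip(doses, doses[1:])):
        raise ValueError("doses must be strictly ascending")
    log_dose = [math.log10(dose) for dose in doses]
    mortality = [dead / total for dead, total in zip(n_dead, n_total)]
    if mortality[0] != 0.0 or mortality[-1] != 1.0:
        raise ValueError("untrimmed estimator needs 0% and 100% mortality at the extremes")
    if any(later < earlier for earlier, later in zip(mortality, mortality[1:])):
        raise ValueError("mortality must be non-decreasing with dose")
    # Weight the midpoint of each log-dose interval by the increase in
    # mortality across that interval, then back-transform the sum.
    log_ld50 = sum(
        (mortality[i + 1] - mortality[i]) * (log_dose[i] + log_dose[i + 1]) / 2
        for i in range(len(doses) - 1)
    )
    return 10 ** log_ld50


# Hypothetical limit-spaced design: 10 birds per dose level.
estimate = spearman_karber_ld50([10, 100, 1000], [0, 5, 10], [10, 10, 10])
```

Because the estimate is built only from intervals between tested doses, it cannot fall outside the tested range, which is in keeping with the caution against extrapolation in paragraph (d)(2)(ii)(B).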

(iii) Hypothesis-based methods—

       (A) Multiple-concentration or dose-definitive tests.  In this type of test, the
       purpose is to determine if the biological response to a treatment level differs from
       the response of a control.  Hypothesis testing-based endpoints, expressed as the
       NOEC and LOEC (or  NOEL and  LOEL), are  calculated by determining
       statistically significant differences from the control.  The null hypothesis is that
       no difference exists among the mean  (or median if nonparametric) control and
       treatment responses. The alternative hypothesis is that the treatment(s) results  in
       an adverse biological  effect relative  to the  control  sample.   Parametric and
       nonparametric analysis of variance (ANOVA) tests and multiple-comparison tests
       are often  appropriate  for  continuous  data  and  for  count data  and may  be
       appropriate for some categorical data (rank, order, score). Contingency table tests
       are usually  appropriate for categorical data. Parametric tests are based on normal
       distribution theory and assume that the data  within  treatments are a random
       sample from an approximately normal distribution and that the error variance  is
       constant  among treatments.   These  assumptions should  be examined using
       appropriate tests,  and  data transformations (see paragraph  (d)(2)(iv)(A)  of this
       guideline) or non-parametric techniques should be used where the assumptions
       are not met. Where possible, multiple-comparison tests that restrict the number of
       comparisons made should be used.  Generally, the more powerful multiple-
       comparison tests are those that assume a concentration- or dose-response
       relationship in the data.  When the assumption of a monotonic dose-response
       holds, Williams' test and Jonckheere's test are examples of parametric and
       nonparametric tests, respectively, that can be used.  When the assumption of a
       monotonic dose-response fails, Dunnett's t-test and either Steel's many-one rank
       test or the Wilcoxon rank sum test with Bonferroni adjustment are examples of
       parametric and nonparametric multiple-comparison tests, respectively, that require
       no assumption about the dose-response but restrict comparisons of the treatments to a
       control. A measure of the sensitivity of the test, such as the minimum significant
       difference (parametric tests), should be calculated.  Alternatively, a calculation  of
       the number of replicates  necessary to achieve  data quality objectives given the
       actual measured test responses and variability should be made.  At a minimum,
       the percent reduction from the control for each treatment should be calculated.
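
The minimum calculation named above, the percent reduction from the control for each treatment, can be sketched as follows. This is an illustrative sketch only: the function name, the treatment labels, and the replicate values are hypothetical and do not come from this guideline.

```python
from statistics import mean


def percent_reduction_from_control(control, treatments):
    """Return {label: percent reduction of the treatment mean relative to the control mean}."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean is zero; percent reduction is undefined")
    return {
        label: 100.0 * (control_mean - mean(values)) / control_mean
        for label, values in treatments.items()
    }


# Hypothetical replicate responses (e.g., eggs laid per hen) for a control
# and three treatment levels; the labels are placeholders.
control = [50.0, 52.0, 48.0, 50.0]
treatments = {
    "low": [49.0, 51.0, 50.0, 50.0],
    "mid": [45.0, 44.0, 46.0, 45.0],
    "high": [30.0, 28.0, 32.0, 30.0],
}
reductions = percent_reduction_from_control(control, treatments)
for label, value in reductions.items():
    print(f"{label}: {value:.1f}% reduction from control")
```

A negative value would indicate that the treatment mean exceeded the control mean.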

       (B) Types of decisions and errors.

       (1) Table 1  presents the two possible outcomes and decisions that can be reached
       in the  statistical  hypothesis tests discussed in paragraph  (d)(3)(ii)(A) of this
       guideline:

       (a) There is no difference among the mean control and treatment responses; or


              (b)  There is a difference among the  mean  control  and treatment responses
              (concerned with direction, where response is adverse relative to the control).

              (2) Statistical tests of hypothesis can be designed to control for the chances of
              making incorrect decisions.  The types of incorrect and correct decisions that can
              be made in a hypothesis-based test and the probability of making these decisions
              are represented in Table 1.  For multiple comparison tests the Type I error rate is
              controlled to account for multiple test comparisons.

       Table 1. Types of Errors and the Probabilities of Making Correct and Incorrect
       Decisions Based on the Results of Testing

                                      Test Decision Outcome
       Actual (or True) Condition     Treatment Response >             Treatment Response <
                                      Control Response                 Control Response
       -----------------------------  -------------------------------  -----------------------------------
       Treatment Response >           Correct decision                 Type I error (false positive)
       Control Response               probability = 1 - alpha (α)      probability = α

       Treatment Response <           Type II error (false negative)   Correct decision
       Control Response               probability = beta (β)           probability (power of test) = 1 - β

              (C) Power of the test.  Power of the test versus percent reduction in treatment
              response  relative to the control  mean at various coefficients of variation  is
              provided  in the reference in paragraph (k)(24) of this guideline.  Examples are
              specifically given for 5 and 8 replicates for a one-tailed test alpha (α) of 0.05 and
              0.10.  Effects on the number of replicates  at various coefficients of variation are
              also provided in the reference in paragraph (k)(24) of this  guideline for various
              low α and beta (β) values (i.e., α + β = 0.25).  See also the  references  in
              paragraphs (k)(9) and (k)(25) of this guideline.
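
To illustrate how power depends on the percent reduction to be detected, the coefficient of variation, and the number of replicates, the sketch below uses a simple normal approximation for a one-tailed comparison of one treatment with the control. It is not the method used in the cited references: it assumes equal replication, a common known variance, and a single comparison, so it will be somewhat optimistic for small numbers of replicates. The function name and input values are hypothetical.

```python
from statistics import NormalDist


def approximate_power(percent_reduction, cv_percent, n_replicates, alpha=0.05):
    """Normal-approximation power of a one-tailed, two-group comparison of means.

    percent_reduction: true reduction of the treatment mean, as a percent of the control mean
    cv_percent: coefficient of variation among replicates, in percent (must be > 0)
    n_replicates: replicates per group (assumed equal in both groups)
    """
    std_normal = NormalDist()
    # True difference divided by the standard error of the difference between two means.
    effect = (percent_reduction / cv_percent) / (2.0 / n_replicates) ** 0.5
    critical_value = std_normal.inv_cdf(1.0 - alpha)
    return std_normal.cdf(effect - critical_value)


for n in (5, 8):
    power = approximate_power(percent_reduction=25.0, cv_percent=20.0, n_replicates=n)
    print(f"{n} replicates: approximate power = {power:.2f}")
```

With no true reduction the function returns alpha, which is the Type I error rate in Table 1.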

              (D) Limit  test.  In a  limit test it is only necessary to ascertain that: a fixed
              standard  (such  as  the LD50 or LC50 for acute oral and subchronic  dietary,
              respectively) is  greater than a given threshold; and/or the  response  at the limit
              dose or concentration  does not differ  from  the control  response.   Only  one
              treatment, the limit dose  or concentration, and  the  appropriate control(s) are
              tested. This is referred to as a limit test or maximum challenge concentration test.

              (1)  Fixed standard. For a fixed standard limit test, the null hypothesis is that the
              estimated limit treatment parameter (e.g., percent survival or average weight gain)
              is greater than or equal to  the fixed threshold value  (e.g., 50% survival).   The
              alternative hypothesis is that the estimated limit parameter  is less than the fixed
              threshold value  (e.g., 50% survival).  (The test is concerned with direction; where
              the response is inhibition relative to the control, switch the hypotheses around.)
              Examples of statistical approaches are one-sample binomial tests or one-sample t-tests.
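
As an illustration of a one-sample binomial test for the fixed standard case, the sketch below computes the exact lower-tail probability of observing a given number of survivors or fewer when the true survival proportion equals the threshold. The function name and the counts are hypothetical.

```python
from math import comb


def binomial_lower_tail_p(successes, n, p0):
    """Exact P(X <= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1.0 - p0) ** (n - k) for k in range(successes + 1)
    )


# Hypothetical limit test: 10 organisms at the limit dose, 2 survive.
# Null hypothesis: survival proportion >= 0.50; alternative: survival < 0.50.
p_value = binomial_lower_tail_p(successes=2, n=10, p0=0.5)
print(f"one-sided p-value = {p_value:.4f}")
```

A small p-value is evidence against the null hypothesis that survival is at or above the threshold.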

              (2) Difference between two means  (or medians). For testing if the treatment
              level affects the test organism, the null hypothesis is that the treatment mean (or
              median) response is equal to the control response mean (or median) level and the
              alternative hypothesis is that the treatment mean response differs from the control
              response.  The direction of the alternative hypothesis depends on what is
              considered an adverse direction for the specific response being evaluated, such as
              decreased number of eggs laid, decreased proportion of uncracked eggs, or
              decreased number of surviving 14-day-old chicks as compared to the control
              response. Examples of parametric and nonparametric two-group comparison tests
              are Student's t-test and the Wilcoxon rank sum test, respectively.
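
For small groups, a one-sided Wilcoxon rank sum test can be computed exactly by enumerating every way the pooled ranks could be split between the two groups. The sketch below does this using only the standard library. It tests whether the treatment responses tend to be lower than the control responses, which assumes that a decrease is the adverse direction. The function names and data are hypothetical, and the enumeration is practical only for small sample sizes.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_p_lower(control, treatment):
    """Exact one-sided p-value: P(treatment rank sum <= observed) under the null."""
    ranks = midranks(list(control) + list(treatment))
    observed = sum(ranks[len(control):])
    all_sums = [sum(subset) for subset in combinations(ranks, len(treatment))]
    return sum(s <= observed + 1e-9 for s in all_sums) / len(all_sums)


# Hypothetical replicate responses (e.g., eggs laid per hen).
control = [12, 14, 13, 15]
treatment = [8, 9, 10, 11]
p_value = rank_sum_p_lower(control, treatment)
print(f"one-sided p-value = {p_value:.4f}")
```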

       (iv) Transformations, outliers, and non-detects—

              (A)  Transformations.  Transformation of data (e.g., square root, log, arcsine-
              square root) may be useful for a number of statistical analysis purposes.  The two
              main reasons are to satisfy assumptions for statistical testing and to derive a linear
              relationship between  two variables,  so that linear regression analysis can be
              applied.  Added benefits include consolidating data that may be spread out or that
              have several extreme values (see reference in paragraph (k)(25) of this guideline).
              Once the data have been transformed, all statistical analyses are performed on the
              transformed data.
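
As one example of the transformations listed above, the arcsine-square root transformation is often applied to proportions before parametric analysis. A minimal sketch follows, with a hypothetical function name and hypothetical input proportions.

```python
import math


def arcsine_sqrt(proportion):
    """Arcsine-square root transform of a proportion in [0, 1]; result is in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Hypothetical proportions (e.g., fraction of eggs hatched per replicate).
proportions = [0.0, 0.25, 0.5, 1.0]
transformed = [arcsine_sqrt(p) for p in proportions]
for p, t in zip(proportions, transformed):
    print(f"{p:.2f} -> {t:.4f} rad")
```

All subsequent statistical analyses would then be run on the transformed values, as stated above.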

              (B) Outliers.  Outliers are measurements that are extremely large or small relative
              to the rest of the  data  and,  therefore,  are suspected  of misrepresenting the
              population from which they were collected.  Unless there is a known documented
              reason for the outlier(s), such as measurement system  problems or instrument
              breakdown, the  statistical  analyses  performed should  at a  minimum  include
              results using the full data set (i.e., the suspected outlier(s) are  not  discarded).
              Outliers  should not be discarded based on a statistical outlier test (see reference in
              paragraph  (k)(25)  of this guideline).  The  analyst may conduct  all statistical
              analysis  of the  data with both  a  full  and truncated  (presumed  outliers are
              discarded) data set, however, so that  the effect of the presumed outlier(s) on the
              conclusion may be assessed.

              (C) Nondetects.  Data generated from chemical analysis that fall below the LOD
              of the analytical procedure are generally described as not detected, or nondetects
              (rather than as zero or not present), and the appropriate LOD should be reported.
              There are a variety of ways to evaluate data that include both detected and  non-
              detected values (see reference in paragraph  (k)(25) of this guideline). However,
              for a satisfactory test in a number  of the Group B guidelines, test substance
              concentrations should not be  below  the LOD (see specific OCSPP Series  850,
              Group B guidelines), except in controls.

(3) Selection of test treatments—

       (i) Point estimate and  concentration-response or dose-response test.  Toxicity  tests
       where the objective is the concentration- or dose-response  curve (mathematical model)
       and a specific point(s) on the curve (e.g., LD50), usually consist of a control treatment and
       at least five test treatments which should bracket the specific point response of concern
       for the test.  To obtain a reasonably precise estimate of the LC50 or LD50 using probit
       analysis, for example, one  or more treatments should kill between, but not include, 0 and
       50% of the test organisms  and one or more  treatments should kill  between,  but not
       include, 50 and 100% of the test organisms. The spacing between test treatment levels
       (doses or concentrations) depends upon the expected slope of the concentration- or dose-
       response curve,  information about which can be gained during a range-finding test.  The
       test treatment levels (doses or concentrations) are usually selected in a geometric series in
       which the ratio is between 1.5 and 3.2.  When the objective of the test is to determine a
       regression-based estimate and sampling  size constraints apply, the use of more treatment
       levels is preferable to the use of more replicates. The inclusion of additional treatment
       levels rather  than additional  replicates  results in better  characterization of the overall
       concentration- or dose-response relationship.
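
A geometric series of treatment levels can be generated as sketched below. The function name, the starting level, and the ratio are hypothetical. The sketch treats the usual 1.5 to 3.2 range for the ratio as a hard check purely for illustration, although the text describes it as the usual range rather than a strict limit.

```python
def geometric_series(lowest, ratio, n_levels=5):
    """Return n_levels treatment levels, each `ratio` times the previous one."""
    if not 1.5 <= ratio <= 3.2:
        raise ValueError("ratio is outside the usual 1.5 to 3.2 range")
    return [lowest * ratio**i for i in range(n_levels)]


# Hypothetical series: five levels starting at 10 (units as used in the test).
levels = geometric_series(lowest=10.0, ratio=2.0)
print(levels)
```

The lowest level and ratio would normally be chosen from range-finding results so that the series brackets the response of concern.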

       (ii) Hypothesis-based test—

              (A) Multiple-concentration or -dose definitive test. Each test usually consists
              of a control  treatment and  at least five test treatments which  span the expected
              environmental concentrations and where at least the lowest treatment level is the
              NOEC (or NOEL).  The test treatments are usually selected in a geometric series
              in which the ratio is between 1.5 and 3.2.  A key assumption is that the response
              data are monotonic with increasing  concentration or dose  (i.e., the  degree of
              biological  effect increases as concentration or dose increases) or that there is a
              threshold response such that a NOEC (or NOEL)  for a given biological response
              should not occur at a treatment concentration or dose higher than one found to be
              statistically different from the  control for the given biological response.  If these
              assumptions do  not hold, it is recommended that  additional concentrations or
              doses be included to better characterize the relationship of the biological response
              with exposure concentration or  dose.  If high variability in a given response
              measurement is expected, increasing the number of replicates is recommended.

              (B) Limit test. A limit test  consists of a single treatment level and the appropriate
              control.   Individual  OCSPP  Series  850  Group   B guidelines identify the
              concentration or dose that satisfies the limit treatment level test for that guideline.

(4) Randomization.   For  test results to be satisfactory, test treatments should be randomly
assigned to  individual test containers  and the  test containers randomly  assigned to locations.
Randomized block designs may be used. For  test results to be satisfactory, test organisms should
ideally  be randomly assigned to the test containers; where this is not practical, impartial
assignment can be used (with the exception of assignment intentionally according to sex). (Note:
random assignment as used here  implies  a mathematically based unbiased assignment method
and impartial assignment implies a non-mathematically based unbiased assignment procedure.)
All test containers should be treated as similarly as possible to  eliminate potential  bias in test
results.  The methods used to randomize  treatments among test containers and test containers
among locations should be recorded,  as well as methods of impartial organism assignment to test
containers.
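
One possible way to carry out and document a completely randomized assignment is sketched below, using a seeded pseudorandom generator so that the layout can be reproduced from the recorded seed. The function name, treatment labels, and seed are hypothetical, and a randomized block design would need a different procedure.

```python
import random


def randomize_layout(treatments, replicates, seed):
    """Randomly assign treatments to containers and containers to locations."""
    rng = random.Random(seed)  # record the seed with the study records
    assignments = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(assignments)  # random treatment for each container
    locations = list(range(1, len(assignments) + 1))
    rng.shuffle(locations)  # random location for each container
    return [
        {"container": i + 1, "treatment": t, "location": loc}
        for i, (t, loc) in enumerate(zip(assignments, locations))
    ]


layout = randomize_layout(["control", "low", "high"], replicates=4, seed=12345)
for row in layout:
    print(row)
```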

(5) Number of  replicates.  The number of replicate  test containers for a given treatment is
dependent upon the objective of the test. Except for field tests which are designed on a case-by-
case basis, the minimum number of replicates  for a given test is  described in each individual
OCSPP Series 850 Group B guideline.

       (i) Regression-based test.  When  the objective of the test  is to determine a regression-
       based  estimate and sample  size   constraints  apply,  the  inclusion of  additional
       concentrations rather than additional  replicates results in better characterization of the
       overall concentration-response  relationship.  The objective of some OCSPP Group B
       guideline tests includes determination of both  a regression-based point estimate
       and a hypothesis-based endpoint (e.g., NOEC), in which case the minimum number
       of replicates will be determined by the hypothesis-based method.

       (ii)  Hypothesis-based test.  For hypothesis-based tests,  the determination of the test-
       specific number of replicates  depends upon the objectives of the test, the statistical
       method(s) that may be used, the coefficient of variation, the size of effect to be detected,
       and  the  acceptable  error  rate.   (Note: several of the  recommended  non-parametric
       multiple-comparison tests cannot be performed without at least four
       replicates.)  Individual testing  facilities should consider variability observed  in  their
       laboratory and  adjust the number of replicates upward where  the minimum replication
       number identified  in a test specific guideline is not sufficient to  provide the statistical
       power to  detect adverse effects to the test  organisms or, if  appropriate, identify and
       correct any  environmental, handling, and culturing conditions, etc. that  are resulting in
       the high variability.

(e) Test substance characterization—

(1) Background  information on the  test substance.  The information in paragraphs  (e)(1)(i)
through (e)(1)(vi) of this guideline should be known about the test substance prior to testing:

       (i) Chemical name; CAS number; molecular structure; source; lot or batch number; purity
       or percent active ingredient (a.i.); identities and concentrations of major  ingredients and
       major impurities; radiolabeling if any,  location of label(s), and  radiopurity; date of most
       recent assay and expiration date for  sample.

       (ii)  Appropriate storage and handling conditions for  the test  substance to protect the
       integrity of the test substance.  (Note: health and safety precautions  should also be
       known. These  considerations are beyond the scope of these guidelines and depend  upon
       the characteristics of the test substance).

       (iii) Physical and chemical properties of the test substance, including solubility in water
       and  various  solvents;  vapor pressure; hydrolysis at  various  pH, etc.   Of particular
       relevance are rates for processes such as hydrolysis, photolysis, and volatilization.

       (iv) Stability and solubility as relevant, under the test conditions (see paragraph (e)(2) of
       this guideline).

       (v) Physical and chemical properties and stability information for the  analytical standard
       (if applicable).

       (vi)  Analytical  method for quantification of the test substance in the  feed or dosing
       solutions.  Analyses are conducted with the specific media which will be  used during the
       test,  i.e., under test conditions.

(2) Preliminary analyses.

       (i) The Agency recommends preliminary testing of the test substance. The information
       about stability of the test substance should be developed under actual test conditions.
       This information can be gained while doing the range-finding studies.

       (ii) Information on the behavior of a test substance should be based on experiments which
       are conducted under the  same  conditions as those occurring during the test.  These
       include but are not limited to:

              (A) Test diet or dosing solution characteristics.

              (B) Temperature, humidity, lighting.

              (C) With test organisms in place (when practical).

              (D) Use of the same test containers.

       (iii) The tests in paragraphs (e)(2)(iii)(A) through  (e)(2)(iii)(D) of this guideline should
       be performed:

              (A) Stability trials should be conducted under actual test conditions.

              (B) If relevant, solubility trials should be conducted under test conditions.

              (C) Chemical analysis methods as detailed in paragraph (g) of this guideline.

              (D) Determination of storage stability of the test substance in the samples to be
              collected for chemical analyses.   This  includes  determining  whether  and  how
              samples can be stored for future analysis.

(3) Sample storage.  If samples  of the diet or other dosing preparation collected for  chemical
analysis cannot be analyzed immediately,  they  should be handled and stored appropriately to
minimize  loss  of the test  substance.  Loss  could be  caused by such processes as  microbial
degradation, hydrolysis,  oxidation, photolysis,  reduction, sorption, or volatilization.   Stability
determination under storage conditions, whether it refers to storing the test substance before
testing or storing samples awaiting analysis, is required by GLP  regulation.  Test  substance
stability under storage conditions should be documented.

(4) Stability.  A test substance is considered to be stable under actual test conditions if, under
those conditions, it does not degrade, volatilize, dissipate, or otherwise decline to concentrations
less than 80% of the initial measured concentration during the study period.
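
The 80% criterion can be checked as sketched below. The function name and the measured values are hypothetical, and the sketch assumes all concentrations are in the same units.

```python
def is_stable(initial_measured, later_measured, threshold=0.80):
    """True if no later measurement is below `threshold` times the initial measured value."""
    return all(c >= threshold * initial_measured for c in later_measured)


# Hypothetical measured concentrations over the study period (same units throughout).
print(is_stable(100.0, [95.0, 88.0, 81.0]))  # all values at or above 80% of the initial value
print(is_stable(100.0, [95.0, 79.0]))  # one value below 80% of the initial value
```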

(5) Analytical test substance determinations—

       (i) Measurement at initiation and termination of testing.

       (A) For stable test substances in the diet it is preferred that the concentration in the diet
       be confirmed at the beginning and end of the longest exposure period in laboratory tests,
       but minimally, analyses should be conducted at the initiation of exposure.

       (B) If the test substance is known to be unstable to the  extent that 20% or more loss
       occurs  over the longest exposure period under test  conditions, then a second  series of
       analyses of the  same concentrations  previously  analyzed should be  conducted with
       samples taken at the end of the exposure period.  If it is observed that the stability or
       homogeneity of the test substance in the diet or dosing solutions cannot be maintained,
       care should be taken in the interpretation of the results and a note made that the results
       may not be reproducible.

       (ii) Field tests. For field tests, the media and frequency of testing depend on the objective
       of the study and the stability and fate of the test substance, and are determined on a
       case-by-case basis.

(6) Measured  concentrations  versus   nominal  concentrations.  This  section  describes
acceptable limits of deviation of measured from nominal concentrations.

       (i) Pesticides and  other chemicals  that are used at  very  low levels tend to have high
       biological activity.  For this reason, it is imperative  that the toxicity data developed for
       these test substances be accurate and scientifically defensible.  Toxicity results should
       also be precise (repeatable and reproducible).

       (ii) Measured concentrations are used because:

       (A) There  are concerns that the actual concentrations to which  the test organisms  are
       exposed  may differ  from "nominal."   This  variation  may  be  due  to  chemical
       characteristics, test conditions, or mechanical  apparatus.   Exposure  estimates using
       measured concentrations account  for  characteristics that make  testing difficult (high
       volatility, short half-life, etc.).  These  characteristics are not a  reason  for developing
       misleading toxicity values from laboratory tests.

       (B) Measured concentrations confirm that the test system was designed appropriately and
       is  operating acceptably.  Measurement of test concentrations is not performed  solely to
       determine if the technician knows how to prepare the dosing matrix once. Among other
       things,  this measurement also ensures that the dosing matrix was prepared correctly each
       time. It corroborates the precision of the technician or mechanics of the test system.

       (iii) If test levels  are not measured, the nominal values  are used to calculate the test
       endpoints.  If the  test substance has degraded  or has become  unavailable because of
       volatility, photodegradation, hydrolysis, etc., the test substance may be characterized as
       less toxic than it actually is.

       (iv) When a laboratory test design has  been specifically modified to accommodate  the
       instability  of test substance  or   other factors  likely  to cause variability  in  test
       concentrations,  and the design is judged adequate  based on  sufficient preliminary
       information,  the  study will  not  be rejected solely on  the  grounds  that  measured
       concentrations varied  by more than 20% of the nominal  concentration.  (This assumes
       that the  preliminary  stability  tests were  conducted under test conditions  essentially
       identical to the actual  test conditions.)  A change in measured test concentration of more
       than 20% from the nominal concentration during the test will generally not  result in
       rejection, provided that the conditions in paragraph (e)(6)(iv)(A) through (e)(6)(iv)(E) of
       this guideline are met:

       (A) A reasonable and  scientific explanation is given,  and the variability of results
       produced by the chemical analysis method is adequately characterized.

       (B) All test treatment levels exhibit a similar (but not necessarily identical) shift.   If
       concentrations  at  some   treatment  levels  go  up  substantially  (>20%)  and  test
       concentrations at  other treatment levels go  down substantially (>20%), they will not be
       considered to have exhibited a similar shift.  The most important validity element is that
       test levels not experience a shift in "order." That is, the highest test level should remain
       the highest;  the next should remain second, etc.  If orders are shifted, the test may be
       rejected,  since  regression  analysis would  not yield  statistically sound median lethal
       concentrations and confidence limits.

       (C) The variability of the measured concentrations is acceptable.

       (D) A statistically valid  endpoint can be derived from  the measured concentrations  (e.g.,
       either an LC50, LD50, etc., or that the LC50 or LD50 is greater than the limit concentration).

       (E) The preliminary stability information is provided  with complete documentation and
       description of methods used to derive such information.
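
The numerical parts of the conditions above (deviation from nominal, similar shift, and preserved order) can be screened as sketched below. The function names and concentrations are hypothetical. The sketch covers only the arithmetic and does not replace the scientific judgment that the other conditions call for.

```python
def percent_deviation(nominal, measured):
    """Signed deviation of a measured concentration from nominal, in percent."""
    return 100.0 * (measured - nominal) / nominal


def similar_shift(nominal, measured, limit=20.0):
    """False if some levels rise by more than `limit` percent while others fall by more than it."""
    deviations = [percent_deviation(n, m) for n, m in zip(nominal, measured)]
    return not (any(d > limit for d in deviations) and any(d < -limit for d in deviations))


def order_preserved(nominal, measured):
    """True if ranking the levels by measured concentration matches the nominal ranking."""
    by_nominal = sorted(range(len(nominal)), key=lambda i: nominal[i])
    by_measured = sorted(range(len(measured)), key=lambda i: measured[i])
    return by_nominal == by_measured


# Hypothetical nominal and measured concentrations for five treatment levels.
nominal = [10.0, 20.0, 40.0, 80.0, 160.0]
measured = [7.0, 14.0, 29.0, 60.0, 118.0]
print([round(percent_deviation(n, m), 1) for n, m in zip(nominal, measured)])
print(similar_shift(nominal, measured), order_preserved(nominal, measured))
```

In this example every level is more than 20% below nominal, but all levels shift in the same direction and keep their order.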

       (v) In some  cases, high variability cannot be avoided because the test concentrations are
       approaching the LOD or  because  of unavoidable binding of the test  substance to the
       chemical analysis apparatus.  When the ratio of the highest to the lowest
       measured concentration is expected to exceed 1.5, the submitter is strongly
       advised to justify an exception to this standard in advance of conducting the studies.  This
       exception justification should include:

       (A) Documentation of the preliminary analyses indicating this problem.

       (B) The specific steps that will be taken to reduce the variation.

       (C) The fully developed chemical analysis method.

       (D) The raw data,  standards, quality  control  samples,  and  chromatograms from  a
       representative analysis  using the  method.   For each chemistry method,  identify the
       method detection limit and limit of quantification.

       (vi) The  Agency will decide on  each exception justification on a case-by-case basis.
       However,  if a series of tests are to be conducted with one chemical and it is anticipated
       that these  limits will be exceeded,  one exception justification may cover more than one
       study.  The Agency will then exercise judgment in evaluating studies with test substances
       that are difficult to measure.

(f) Preparation of test substance.

(1) The preferred choice for preparation of the test substance is to use reagent water (deionized,
distilled, or reverse osmosis water), provided the test substance can be dissolved in water and
does not readily hydrolyze. If the test substance cannot be dissolved in reagent water, vehicles
are often used. If a vehicle (i.e., a solvent) is absolutely necessary to dissolve the test substance,
the amount used should not exceed the minimum volume necessary to dissolve or suspend the
test substance.  If the test substance is a mixture, formulation, or commercial product, none of the
ingredients is considered a vehicle unless an extra amount is used in its preparation for testing.

       (i) Preferred vehicles include acetone, methylene chloride, table grade corn oil, propylene
       glycol, gum arabic (acacia), and  1% carboxymethylcellulose.

       (ii) If a vehicle is used to prepare the test substance, a  vehicle control is included in the
       test.  The  same batch of vehicle used to prepare the test  treatment doses or concentrations
       is used in the vehicle control.  For a valid test, the selected vehicle should not affect the
       test organisms  at the  concentration used.   A vehicle should  not interfere with the
       metabolism (degradation) of the test substance, alter the chemical properties of the test
       substance, or produce physiological or toxic effects to test organisms.

       (iii) Ideally, vehicle concentration should be kept constant in the vehicle control and all
       test treatments.    If the  concentration  of vehicle is  not kept  constant, the  highest
       concentration of vehicle used in any test treatment level should be used in the vehicle
       control. The vehicle should not comprise more than 2% by weight of the treated diet.

(2) All techniques used in  stock solution preparation of test  substance  (shaking, stirring,
sonication, heating, solvent,  etc.) should be recorded.  The  appearance of the stock solution
should be observed and  recorded.

(3) If the test substance is a  formulated pesticide product, the test concentrations should be
expressed in terms of the concentration of a.i.
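
Two of the calculations implied in this paragraph, expressing a formulated product concentration in terms of a.i. and checking the vehicle fraction of the treated diet, can be sketched as follows. The function names and all numeric values are hypothetical.

```python
def active_ingredient_concentration(product_concentration, percent_ai):
    """Convert a concentration of formulated product to a concentration of a.i."""
    if not 0.0 < percent_ai <= 100.0:
        raise ValueError("percent a.i. must be greater than 0 and at most 100")
    return product_concentration * percent_ai / 100.0


def vehicle_percent_by_weight(vehicle_mass, treated_diet_mass):
    """Vehicle as a percentage of the treated diet, by weight (same mass units)."""
    return 100.0 * vehicle_mass / treated_diet_mass


# Hypothetical: 500 mg product per kg diet, product containing 40% a.i.
ai_mg_per_kg = active_ingredient_concentration(500.0, 40.0)
# Hypothetical: 15 g vehicle in 1000 g of treated diet.
vehicle_pct = vehicle_percent_by_weight(15.0, 1000.0)
print(ai_mg_per_kg, vehicle_pct, vehicle_pct <= 2.0)
```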

(g) Analytical methods and sampling for verification of exposure—

(1) Method validation.

       (i) The analytical method used  to measure the amount of test substance in the diet or
       dosing solutions should be validated by appropriate laboratory practices before beginning
       the definitive test.  An analytical method is not acceptable if likely degradation products
       of the test substance, such as hydrolysis and oxidation products, give positive or negative
       interferences which cannot be  systematically identified and mathematically corrected,
       unless it is shown that such degradation products are not present during the test.

       (ii) Method validation is conducted for  the purpose  of determining the linear range,
       detection  limit, accuracy and precision (repeatability and reproducibility) of the  method
       for analysis of the test  substance under the conditions of the test.  Thus, quality control
       (fortification)  samples  should be  prepared  at concentrations spanning  the  range of
       concentrations to be used in the  definitive test, using the same procedures (vehicles, etc.)
       and in the same matrix (feed, etc.) as will be used in the  definitive test.

       (iii) The method validation should include a determination of linearity between detector
       response and test substance concentration, the LOQ, the MDL, method accuracy (average
       percent recovery) and precision (relative  standard deviation).  The method validation
       should establish the acceptance criteria for the quality control (QC)  samples that will be
       prepared and analyzed during the test.
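
Method accuracy (average percent recovery) and precision (relative standard deviation) can be computed from fortified samples as sketched below. The function name and data are hypothetical, and the sketch uses the sample standard deviation of the individual recoveries.

```python
from statistics import mean, stdev


def recovery_summary(fortified, measured):
    """Return (average percent recovery, relative standard deviation in percent)."""
    recoveries = [100.0 * m / f for f, m in zip(fortified, measured)]
    average_recovery = mean(recoveries)
    rsd = 100.0 * stdev(recoveries) / average_recovery
    return average_recovery, rsd


# Hypothetical fortification levels and measured results (same units).
fortified = [10.0, 10.0, 10.0]
measured = [9.0, 10.0, 11.0]
average_recovery, rsd = recovery_summary(fortified, measured)
print(f"average recovery = {average_recovery:.1f}%, RSD = {rsd:.1f}%")
```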

(2) Collection of samples.   Samples  should be collected in  such  a manner as to provide an
accurate representation of the matrix being sampled. Samples should be processed and analyzed
immediately, or handled and stored in a manner which minimizes loss of test substance through
microbial  degradation, photodegradation, chemical reaction,  volatilization, sorption or other
processes.

(3) Analysis of test samples.  Concurrent with each analysis of test  samples, quality control
(fortified) samples should be  analyzed.  QC samples are prepared by adding known amounts of
the test substance to the test matrix. Minimally, one QC sample should be at the low end  of the
test concentration range and one QC sample at the high end.  A control (zero-level fortification)
sample should also be included.  To determine concordance with the provisions of paragraphs
(e)(4), (e)(6)(iv), and (e)(6)(v) of this guideline, test sample recoveries may be corrected for
inherent method bias as determined from the concurrent analysis of freshly fortified QC samples.
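
One common way to correct a measured test sample concentration for method bias is to divide it by the mean fractional recovery of the concurrent QC samples. The guideline does not prescribe a formula, so the sketch below is an assumption; the function name and values are hypothetical.

```python
def correct_for_recovery(measured, mean_qc_recovery_percent):
    """Adjust a measured concentration using the mean concurrent QC recovery (percent)."""
    if mean_qc_recovery_percent <= 0.0:
        raise ValueError("mean QC recovery must be greater than zero")
    return measured * 100.0 / mean_qc_recovery_percent


# Hypothetical: test sample measured at 45.0, concurrent QC recovery averaging 90%.
corrected = correct_for_recovery(45.0, 90.0)
print(corrected)
```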

(h) Reference toxicants.  Historically, reference toxicity testing has been thought to provide
three types of information relevant to the interpretation of toxicity test data: an indication of the
relative health of the organisms used in the test; a demonstration that the laboratory can perform
the test procedure in a reproducible manner over a period of time; and information to indicate
whether the sensitivity of a particular strain or population in use at a laboratory is comparable to
that of those used in other facilities and how intra- and inter-laboratory sensitivity varies over
time.  However, performance  of control organisms over time may be a better indicator of success
in handling and testing of at least some organisms.  Nonetheless,  periodic reference  toxicant
testing can  provide an indication  of the overall comparability of results within and among
laboratories.  Although a positive control is not standard for each test, a quarterly or semiannual
positive control (on a guideline-specific basis) can  serve as a means  of detecting  possible
interlaboratory or temporal variation. A reference toxicant might also be desirable when there is
any significant change in food, housing, source of test  animals  or in  other  test conditions.
Despite these potential uses  of reference toxicants, alternative means of assessing organism
health, test reproducibility, and intra- and inter-laboratory variation should be considered in order
to minimize the use of test animals.

(i) Monitoring of test conditions.  Test conditions are specified in  each test-specific guideline.
These  conditions include  environmental factors such as  temperature, humidity,  and lighting.
Methods used for monitoring test conditions should be in  accordance with  established methods
(e.g., those published by U.S. EPA, ASTM, or APHA et al.).

(1) Temperature.   Preferably, temperature should be monitored continuously (recorded at least
hourly).  Alternatively, the maximum and minimum should be measured daily (that is, at
least two measurements during each 24-hour period of the study).
Temperature measurements should be made in at least one representative location.
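
A logged temperature record can be screened against these two monitoring options with a short script. The sketch below assumes readings are stored as (timestamp, temperature) pairs; the function names, the data layout, and the example values are hypothetical and are not part of this guideline.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def largest_gap(readings):
    """Longest interval between consecutive readings, after sorting by time."""
    times = sorted(t for t, _ in readings)
    return max((b - a for a, b in zip(times, times[1:])), default=timedelta(0))


def meets_hourly_recording(readings, limit=timedelta(hours=1)):
    """True if no two consecutive readings are more than `limit` apart.

    This checks only the spacing between readings, not whether the record
    covers the whole study period.
    """
    return largest_gap(readings) <= limit


def daily_min_max(readings):
    """Map each calendar date to (minimum, maximum) recorded temperature."""
    by_day = defaultdict(list)
    for t, temp in readings:
        by_day[t.date()].append(temp)
    return {day: (min(vals), max(vals)) for day, vals in by_day.items()}


if __name__ == "__main__":
    # Hypothetical two-day record with one reading per hour.
    start = datetime(2012, 1, 1, 0, 0)
    log = [(start + timedelta(hours=i), 20 + i % 5) for i in range(48)]
    print(meets_hourly_recording(log))
    print(daily_min_max(log))
```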

(2) Humidity. Where applicable,  humidity should be monitored continuously in at least one
representative location.
(3) Lighting.  Guidance for lighting in laboratory toxicity tests can be found in the reference in
paragraph (k)(3) of this guideline.

(j) Reporting—

(1) Background information.  In addition to the reporting requirements prescribed in the Good
Laboratory Practices  Standards (40 CFR part 792 and  40 CFR  part 160), the  report should
include the information in paragraphs (j)(1)(i) through (j)(1)(vi) of this guideline:

       (i) Test facility (name and location), test dates, and personnel.

       (ii) The  name of the  sponsor,  study director, principal investigator,  names  of other
       scientists or professionals, and the names of all  supervisory personnel involved in  the
       study.

       (iii) Raw data sufficient to allow independent confirmation of the study authors'
       conclusions should be presented with the study report. Raw data include all measurements
       recorded during the study that are used for the reconstruction and evaluation of the study
       report, including, but not limited to, effects (mortality, growth, etc.), environmental
       conditions (temperature, etc.), and test substance concentration or dose measured as
       specified.  The absence of raw data may make the study incomplete and impossible to
       review for scientific soundness and thus can lead to rejection of the study as
       scientifically unsound.

       (iv)  The  signed  and   dated  reports  of each of the  individual  scientists  or other
       professionals involved in the study, including each person who, at the request or direction
       of the testing  facility  or sponsor,  conducted an analysis  or evaluation of data or
       specimens from the study after data generation was completed.

       (v) The locations where all raw data and the final report are stored.

       (vi) The statement prepared and signed by the quality  assurance unit  identifying whether
       or not the study was  conducted in compliance with Good Laboratory  Practices Standards
       (40 CFR part 792 or 40 CFR part 160).  Alternatively, the statement can indicate that the
       study was conducted under OECD Principles of Good Laboratory Practice, in accordance
       with the multilateral agreement with OECD member countries.

(2) Data elements. The test report should provide a complete and accurate description of test
procedures and evaluation of test results  including but not limited  to the  material in  paragraphs
(j)(2)(i) through (j)(2)(xiii) of this guideline:

       (i) Objectives and procedures stated in the guideline, including any changes or deviations
       or occurrences which may have influenced the results of the test.

       (ii) Identification of the test substance (including source, lot or batch number,  and purity)
       and known physical  and chemical properties that are  pertinent to the test.  As relevant,
       solubility and stability of the test  substance under the test conditions, and stability of the
       test substance under storage conditions if stored prior to analysis. It should be reported if
       a formulation is being tested. Where appropriate, a cross-reference to OCSPP Series 830
       (Product Properties Test Guidelines) guideline study results can be used to report these
       data.

(iii) Methods of preparation of the test substance and the concentrations or doses used in
definitive testing.  If vehicles are used, the name and source of the vehicle, the nominal
concentration of the test substance  in the vehicle, and the vehicle concentration(s) and
dosage used in the test.

(iv) Information about the test organisms.

(v) A description of the test system used in definitive and any preliminary testing. This
includes a description  of  the test chambers,  method  of test  substance introduction,
number of organisms per chamber, number of replicates per treatment, all environmental
parameters,  description  of any feeding during the test (if applicable), including type of
food, source, amount given and frequency.

(vi) Document and submit to the Agency the preliminary test results for review with the
study to which they apply.

(vii) Results of measurements  of test substance.  All analytical procedures and results
should be described. Report all chemistry methods used in preliminary trials, in range-
finding tests, in establishing percent purity of batches of test substance, or in measuring
concentrations in  feed,  dosing solutions,  or animals.  Include in the documentation a
complete description of the method  so that a bench chemist can independently determine
what equipment to use  and perform the analysis.  Also  include the raw data, standards,
quality control samples, and chromatograms from samples taken during either definitive
or range-finding tests, not of standard or samples from recovery tests. For a satisfactory
test,  the accuracy of the  method, LOD, MDL, and LOQ should be given.

(viii) Any difficulties in maintaining  constant test substance concentrations should also
be reported.  If it is observed that the stability or  homogeneity  of the test  substance
cannot be maintained, care should be taken in the interpretation of the results,  and note
made that the results may not be reproducible.

(ix) Methods, frequency, and results of environmental monitoring performed during the
study (temperature, lighting, etc.) and other records of test conditions.

(x) Biological observations should be  reported  in sufficient detail to allow complete
independent evaluation  of the results (see specific  test guidelines in this group for a
description of what should be reported).

(xi) All data developed during the study that are suggestive or predictive of toxic effects
and all concomitant gross toxicological manifestations.

(xii) Calculated endpoints and  a description of all statistical methods, including: software
used, handling of outlier data points, handling  of non-detect or zero values,  tests to
validate the assumptions of the analyses, level of significance, any data transformations,
and, for hypothesis tests, a measure of the sensitivity of the test (either the minimum
significant difference or the percent change from the control that this minimum difference
represents). Raw data should be reported to allow independent verification of statistical
procedures.

       (xiii) Methods used for test chamber and treatment randomization as well as methods for
       impartial assignment of test organisms to test chambers.
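
The sensitivity measure for hypothesis tests described in paragraph (j)(2)(xii) can be illustrated with a brief sketch. It assumes the form of the minimum significant difference (MSD) used with Dunnett-type comparisons to a control, MSD = d * sqrt(MSE) * sqrt(1/n_control + 1/n_treatment), where d is a tabled critical value supplied by the analyst and MSE is the error mean square from the analysis of variance. The function names are hypothetical, and the critical value in the example is a placeholder, not a tabled value.

```python
from math import sqrt


def minimum_significant_difference(critical_value, mse, n_control, n_treatment):
    """MSD = d * sqrt(MSE) * sqrt(1/n_control + 1/n_treatment).

    critical_value (d) must come from the table or software appropriate to
    the hypothesis test actually used; it is not computed here.
    """
    return critical_value * sqrt(mse) * sqrt(1.0 / n_control + 1.0 / n_treatment)


def msd_percent_of_control(msd, control_mean):
    """Express the MSD as a percent change from the control mean."""
    return 100.0 * msd / control_mean


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    msd = minimum_significant_difference(
        critical_value=2.0, mse=4.0, n_control=8, n_treatment=8
    )
    print(msd, msd_percent_of_control(msd, control_mean=20.0))
```

Reporting the MSD as a percent of the control mean makes the sensitivity of tests with different endpoints or units easier to compare.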

(k) References. The references in this paragraph should be consulted for additional background
material on this test guideline.

(1) American  Public  Health  Association,  American  Water Works  Association,  Water
Environment Federation,  1998.    Standard Methods for  the  Examination  of Water and
Wastewater, 20th edition. Part 8010, Toxicity: Introduction.

(2) American Society for Testing and Materials, 2003.  ASTM E 1847-96.  Standard practice for
statistical analysis of toxicity tests conducted under ASTM guidelines.  In: Annual  Book  of
ASTM Standards, Vol. 11.06, West Conshohocken, PA. Current edition approved December 10,
1996; Reapproved 2003.

(3) American Society for Testing and Materials, 2002.  ASTM E 1733-95.  Standard guide for
the use of lighting in laboratory testing.  In: Annual Book of ASTM Standards, Vol. 11.06, West
Conshohocken, PA.  Current edition approved September 10, 1995; Reapproved 2002.

(4) Bruce, R.D. and D.J. Versteeg, 1992.  A statistical procedure for modeling continuous
toxicity data. Environmental Toxicology and Chemistry 11: 1485-1491.

(5) Chapman, G.A., B.S. Anderson, A.J. Bailer, R.B. Baird, R. Berger, D.T. Burton, D.L.
Denton, W.L. Goodfellow, M.A. Heber, L.L. McDonald, T.J. Norberg-King and P.J. Ruffier,
1996. Methods and appropriate endpoints. In Whole Effluent Toxicity Testing, D.R. Grothe,
K.L. Dickson and  D.K. Reed-Judkins,  eds., SETAC Press, Pensacola,  FL.

(6) Daum, R.J., 1970.  Revision of two  computer programs for probit analysis.  Bulletin of the
Entomological Society of America 16:10-15.

(7) Daum, R.J. and W. Killcreas, 1966.  Two computer programs for probit analysis. Bulletin of
the Entomological Society of America 12:365-369.

(8) deBruijn, J.H.M. and M. Hof, 1997.  How to measure no effect.  Part IV: How acceptable is
the ECx from an environmental policy point of view? Environmetrics 8:263-267.

(9) Fairweather,  P.G.,  1991. Statistical  power and  design  requirements  for environmental
monitoring.  Australian Journal of Marine and Freshwater Research 42:555-567.

(10) Finney, D.J., 1971. Probit Analysis, 3rd ed., Cambridge: London and New York.

(11) Litchfield, J.T., Jr. and F. Wilcoxon,  1949.  A simplified method of evaluating dose-effect
experiments.  Journal of Pharmacology and Experimental Therapeutics 96:99-133.

(12) Nabholz, J.V., 1991.  Environmental hazard and risk assessment under the Toxic Substances
Control Act. Science of the Total Environment, 109/110: 649-665.

(13) Nabholz, J.V., P. Miller and M. Zeeman,  1993.  Environmental risk assessment of new
chemicals under the Toxic  Substances  Control  Act (TSCA) Section 5,  In Environmental
Toxicology and Risk Assessment, Landis, W.G., Hughes, J.S., and Lewis, M.A., eds., ASTM
STP 1179, American Society for Testing and Materials, Philadelphia, PA, pp. 40-55.

(14) Nyholm, N., P.S. Sorenson, K.O. Kusk, and E.R. Christensen, 1992. Statistical treatment of
data from microbial toxicity tests.  Environmental Toxicology and Chemistry 11:157-167.

(15) Organization for Economic Co-operation and Development, 1998.  Report of the OECD
Workshop on Statistical Analysis of Aquatic  Toxicity Data.   OECD Series on Testing and
Assessment, No.  10. ENV/MC/CHEM(98)18.

(16) Organization for Economic Co-Operation and Development. 2006. Current Approaches in
the Statistical Analysis of Ecotoxicity Data: A Guidance to  Application.  OECD Series on
Testing and Assessment, No. 54. ENV/JM/MONO(2006)18.

(17) Pack, S., 1993.  A review of statistical data analysis and experimental design in OECD
aquatic toxicology test guidelines.  Report to OECD. Paris.

(18) Smrchek, J.C., R. Clements, R. Morcock, and  W. Rabert, 1993.  Assessing ecological
hazard under TSCA:  methods and evaluation of data, In Environmental Toxicology and Risk
Assessment, Landis, W.G., Hughes, J.S., and Lewis,  M.A., eds., ASTM STP 1179, American
Society for Testing and Materials,  Philadelphia, PA, pp. 22-39.

(19) Smrchek, J.C.  and M.G.  Zeeman,  1998.   Assessing risks to  ecological  systems from
chemicals.  In Handbook of Environmental Risk Assessment and Management, P. Calow, ed.,
Blackwell Science, Ltd., Oxford, UK, pp. 24-90, Chapter 3.

(20) Stephan, C.E., 1997.  Methods for calculating an LC50.  In Aquatic Toxicology and Hazard
Evaluation, ASTM STP 634, F.L.  Mayer and J.L. Hamelink, eds., American Society  for Testing
and Materials, Philadelphia, PA.

(21) U.S.  Environmental  Protection Agency,  1982.    Pesticide   Assessment  Guidelines
Subdivision E, Hazard Evaluation: Wildlife and Aquatic Organisms.   Office of Pesticides and
Toxic Substances, Washington, D.C. EPA-540/9-82-024, October 1982.

(22) U.S. Environmental Protection Agency, 1994.  Pesticides Reregistration Rejection Rate
Analysis: Ecological Effects, Office of Prevention, Pesticides and Toxic Substances, EPA 738-
R-94-035, December, 1994.

(23) U.S.  Environmental  Protection  Agency,  1997.   Terms  of   Environment,  Glossary,
Abbreviations, and Acronyms, Communications, Education, and Public Affairs, EPA 175-B-97-
001, December 1997.

(24) U.S. Environmental Protection Agency, 2000.  Methods  for Measuring the Toxicity and
Bioaccumulation of Sediment-Associated Contaminants with Freshwater Invertebrates, Second
Edition, EPA 600/R-99/064, March 2000.
(25) U.S.  Environmental Protection Agency, 2000.   Guidance for Data Quality Assessment,
Practical Methods  for Data  Analysis.  EPA QA/G9.  Office of Environmental Information,
Washington, DC. EPA/600/R-96/084, July.

(26) U.S. Environmental Protection Agency, 2002.  Methods for measuring the acute toxicity of
effluents and receiving waters to freshwater and marine organisms.  Fifth edition, Office of
Water, Washington, DC. EPA-821-R-02-012.

(27) U.S.  Environmental Protection  Agency, 2002.  Short-term methods  for estimating the
chronic toxicity  of effluents and receiving waters to freshwater organisms.   Fourth edition,
Office of Water, Washington, DC. EPA-821-R-02-013.

(28) U.S.  Environmental Protection  Agency, 2002.  Short-term methods  for estimating the
chronic toxicity  of effluents and receiving waters to marine  and estuarine organisms,  Third
edition, Office of Water, Washington, DC. EPA-821-R-02-014.

(29) U.S.  Environmental Protection Agency,  Code of  Federal  Regulations  (CFR)   Title
40—Pesticide Programs Subchapter E—Pesticide Programs.  Part 158—Data Requirements for
Pesticides.

(30) VanEwijk, P.H. and J.A. Hoekstra, 1993.  Calculation of the EC50 and its confidence
interval when a subtoxic stimulus is present.  Ecotoxicology and Environmental Safety 25:25-32.

(31) Zeeman, M. and J. Gilford, 1993. Ecological  hazard evaluation and risk assessment  under
EPA's Toxic Substances  Control Act (TSCA): an  introduction.  In Environmental Toxicology
and Risk Assessment, Landis,  W.G.,  Hughes, J.S., and Lewis, M.A., eds.,  ASTM STP  1179,
American Society for Testing and Materials, Philadelphia, PA, pp. 7-21.

(32) Zeeman, M.G., 1995.  Ecotoxicity testing and estimation methods developed under Section
5 of the Toxic Substances Control Act (TSCA), In Fundamentals of Aquatic Toxicology, 2nd
Edition, G.M. Rand, ed., Taylor and Francis, Washington, DC, pp. 703-715.