EPA/630/R-94/007
                                                                February 1995
The Use of the Benchmark Dose Approach in Health Risk Assessment
                             Principal Authors
             Kenny Crump, Ph.D.: Clement International Corporation
              Bruce Allen, M.S.:  Clement International Corporation
               Elaine Faustman, Ph.D.:  University of Washington
                           EPA Technical Panel
      Michael Dourson, Ph.D.: Environmental Criteria and Assessment Office
      Carole Kimmel, Ph.D.:  Office of Health and Environmental Assessment
            Harold Zenick, Ph.D.:  Health Effects Research Laboratory
                       Risk Assessment Forum Staff
                     Executive Director: William Wood, Ph.D.
                 Science Coordinator:  Harry Teitelbaum, Ph.D.
                        Risk Assessment Forum
                 U.S. Environmental Protection Agency
                        Washington, DC  20460

-------
                                   DISCLAIMER

      This document has been reviewed in accordance with U.S. Environmental Protection
Agency policy and approved for publication.  Mention of trade names or commercial
products does not constitute endorsement or recommendation for use.

-------
                                CONTENTS

LIST OF TABLES	   v

LIST OF FIGURES	   vi

FOREWORD	    vii

ACKNOWLEDGEMENTS	  viii

CONTRIBUTORS AND REVIEWERS	   ix

1.   INTRODUCTION	   1

2.   BACKGROUND	     3
    2.1.   Cancer Versus Noncancer Effects	   3
    2.2.   Overview of the NOAEL Approach to Determining RfDs and RfCs	   4
    2.3.   Overview of the Benchmark Approach	   5

3.   DETAILED DESCRIPTION OF THE BMD APPROACH	   9
    3.1.   Selection of Responses to Model	   9
         3.1.1.  Example:  Selecting Endpoints Within a Single Study	  11
    3.2.   Use of Categorical Versus Continuous Data in BMD Modeling	  12
         3.2.1.  Example	  14
    3.3.   Mathematical Models for Defining a BMD	  16
         3.3.1.  Criteria for Selecting Models	  16
         3.3.2.  Example	  25
         3.3.3.  Additional Research	  27
    3.4.   Adjusting for Lack of Fit	  27
         3.4.1.  Examples	  29
         3.4.2.  Additional Research	  36
    3.5.   Measure of Increased Risk	  37
         3.5.1.  Examples	  40
         3.5.2.  Additional Research	  44
    3.6.   Selection of a Benchmark Level of Risk	  44
         3.6.1.  Examples	  45
         3.6.2.  Additional Research	  46
    3.7.   Confidence Limit Calculation	  46
         3.7.1.  Example	  47
         3.7.2.  Additional Research	  48
    3.8.   Choosing an Appropriate BMD	  48
         3.8.1.  Examples	  49
         3.8.2.  Additional Research	  49

-------
    3.9.   Uncertainty Factors	  49
         3.9.1.  Example	  51
         3.9.2.  Additional Research	  52
    3.10.  Summary of BMD Decisions	  52

4.  DETAILED COMPARISON OF NOAEL AND BMD APPROACHES	  55
    4.1.  Conceptual Basis	  55
    4.2.  Relative Sizes of NOAELs and BMDs	  56
    4.3.  Constraints Imposed by the Experimental Design	  56
    4.4.  Number of Experimental Subjects and Their Distribution into Treatment
         Groups	  57
    4.5.  Incorporation of Dose-Response Information   	  58
    4.6.  Sensitivity to Data Interpretation  and to Small Changes in Data	  60
    4.7.  Model Sensitivity	  60
    4.8.  Quantitative Estimates of Risk	  61
    4.9.  Statistical Expertise	  62

5.  SUMMARY OF RESEARCH NEEDS	  63
    5.1.  Summary of Research Needs Related to BMD Decision Points  	  63
    5.2.  Additional Topics For Investigation/Development  	  63
         5.2.1.  Comparison of Dose-Response Curves for Different Types of Data
                and Toxic Endpoints	  63
         5.2.2.  Development of Dose-Response Models for Multiple Endpoints of
                Toxicity	  64

6.  REFERENCES	  66

APPENDIX: STATISTICAL METHODS	A-1

GLOSSARY	G-1

-------
                                LIST OF TABLES

Table 1.     Description of Typical Uncertainty and Modifying Factors in Deriving
            Reference Doses (RfDs)	   6

Table 2.     Steps and Decisions Required in the BMD Approach	  10

Table 3.     Acrylamide-Induced Tibial Nerve Degeneration in Rats	  15

Table 4.     Dose-Response Models Proposed for Estimating BMDs	  17

Table 5.     EGPE-Induced Extramedullary Hematopoiesis in the Spleen of Rats	  30

Table 6.     EGME-Induced Testicular Toxicity in Rats and Mice	  33

Table 7.     Gestational Weight Gains in Pregnant Rats	  41

Table 8.     BMDs (mg/kg/day) Calculated for Sulfamethazine Data	  43

Table 9.     Summary of Decisions and Options for BMD Approach	  53

-------
                                LIST OF FIGURES

Figure 1.    Example of calculation of a BMD	   8

Figure 2.    Examples of quantal linear regression (QLR) curves:
            $P(d) = c + (1-c)\{1 - \exp[-q_1(d - d_0)]\}$	  18

Figure 3.    Examples of quantal quadratic regression (QQR) curves:
            $P(d) = c + (1-c)\{1 - \exp[-q_1(d - d_0)^2]\}$	  19

Figure 4.    Examples of quantal Weibull (QW) curves:
            $P(d) = c + (1-c)\{1 - \exp[-q_1 d^k]\}$	  20

Figure 5.    Examples of continuous quadratic regression (CQR) curves:
            $m(d) = c + q_1(d - d_0)^2$	  21

Figure 6.    Examples of continuous power (CP) curves: $m(d) = c + q_1 d^k$	  22

Figure 7.    Moderate to severe nerve degeneration in rats following acrylamide
            exposure	  26

Figure 8.    Extramedullary hematopoiesis of the spleen in rats following EGPE
            exposure	  32

Figure 9.    Testes weights in rats following EGME exposure	  34

Figure 10.   Testes weights in mice following EGME exposure	  35

Figure 11.   Weight gain during gestation in rats exposed to sulfamethazine	  42

Figure 12.   Example of BMDs calculated from steep versus gradual dose
            responses	  59

-------
                                      FOREWORD

       The EPA established the Risk Assessment Forum to develop scientific consensus on
 risk assessment issues and to incorporate these ideas into Agency guidance.   Part of this
 function is to focus on and stimulate Agency discussion of promising new risk assessment
 techniques.  For almost 10 years, scientists have been studying the benchmark dose (BMD)
 as a promising technique for the quantitative assessment of noncancer health  effects.  This
 report was developed to  serve as a background document for discussing benchmark dose
 applications to noncancer risk assessment.
       Cancer risk assessment uses an assortment of quantitative methods, whereas, until
 recently, quantitative approaches  to noncancer risk assessment were much more limited.  The
 EPA is now developing new quantitative methods for noncancer risk assessment.  The
 information presented in this report is one step in developing the basis for an EPA consensus
 on the role of benchmark methods in the quantitative  assessment of noncancer health risk.
 The report presents a basic overview of the benchmark method, which may provide an
 additional quantitative approach to current EPA practice (i.e., the no observed adverse effect
 level/uncertainty factor approach for calculating a reference dose).  Clement International
 Corporation developed the main body of this report under contract to EPA.1  Agency
 scientists modified the Clement draft to prepare this document, which more closely reflects
 current EPA terminology and practice.
       The document focuses especially on critical decisions that must be made in deriving a
 BMD and applying the BMD in risk assessment.  Major decisions in using the BMD are
 explained, and the sensitivity of the final result to each assumption is evaluated.  The
document also identifies many unresolved issues in benchmark dose application and identifies
research that may help resolve some of these issues.  Technical guidance on study selection,
data selection, and model selection is not provided, but the reader is referred  to appropriate
sources on these highly technical topics.
1Kenny S. Crump; Bruce C. Allen; Elaine M. Faustman.  1992. "The Use of the Benchmark Dose Approach
 in Health Risk Assessment."  EPA Contract No. 68-C8-0036.

-------
                              ACKNOWLEDGEMENTS

       This U.S. Environmental Protection Agency (EPA) report was developed under the
auspices of EPA's Risk Assessment Forum, a standing committee of EPA scientists charged
with developing risk assessment guidance for Agency-wide use. An Office of Research and
Development intraoffice technical panel including Drs. Michael Dourson (Environmental
Criteria and Assessment Office-Cincinnati), Carole Kimmel (Office of Health and
Environmental Assessment), and Harold Zenick (Health Effects Research Laboratory) led this
effort.  This document is based, in large part, on a final report titled The Use of the
Benchmark Dose Approach in Health Risk Assessment that was prepared by Dr.  Kenny S.
Crump and Bruce C.  Allen, Clement International Corporation, and Dr. Elaine  M.
Faustman, University of Washington, under contract to the EPA.  The Risk Assessment
Forum gratefully acknowledges their contribution.

-------
                       CONTRIBUTORS AND REVIEWERS


      Numerous experts, both inside and outside the Agency, provided technical review for
this document.  In 1993, the following scientists participated in a peer review of the draft
document and provided written comments to EPA via mail.  Editorial assistance was
provided by R.O.W. Sciences, Inc.

Chao Chen
U.S. Environmental Protection Agency
Office of Research and Development
Washington, DC

John Christopher
California Environmental Protection
  Agency
Sacramento, CA

George Daston
The Procter and Gamble Company
Miami Valley Laboratories
Cincinnati, OH

Kerry Dearfield
U.S. Environmental Protection Agency
Office of Pesticide Programs
Washington, DC

Michael Dourson
U.S. Environmental Protection Agency
Environmental Criteria and Assessment
  Office
Cincinnati, OH

David Gaylor
National Center For Toxicological
  Research
Jefferson, AR

Suzanne Gianinni
U.S. Environmental Protection Agency
Office of Policy, Planning and Evaluation
Washington, DC

Mari Golub
California Regional Primate Research
 Center
University of California at Davis
Davis, CA

Lee Gorsky
U.S. Environmental Protection Agency
Region 5
Chicago, IL

Richard Hertzberg
U.S. Environmental Protection Agency
Environmental Criteria and Assessment
 Office
Cincinnati, OH

Robert Kavlock
U.S. Environmental Protection Agency
Health Effects Research Laboratory
Research Triangle Park, NC

Linda Knauf
U.S. Environmental Protection Agency
Environmental Criteria and Assessment
 Office
Cincinnati, OH

-------
Daniel Krewski
Health & Welfare Canada
Ottawa, Ontario
Canada

Steven Lewis
Exxon Biomedical Sciences, Inc.
East Millstone, NJ

Elizabeth Margosches
U.S. Environmental Protection Agency
Office of Pollution Prevention and Toxics
Washington, DC

Edward Ohanian
U.S. Environmental Protection Agency
Office of Water
Washington, DC

William Pease
University of California at Berkeley
Berkeley, CA

Louise Ryan
Dana-Farber Cancer Institute
Boston, MA

Chon Shoaf
U.S. Environmental Protection Agency
Environmental Criteria and Assessment
  Office
Research Triangle Park, NC

Jeanette Wiltse
U.S. Environmental Protection Agency
Office of Research and Development
Washington, DC

Suzanne Wuerthele
U.S. Environmental Protection Agency
Region 8
Denver, CO

-------
                                 1. INTRODUCTION

       The U.S. Environmental Protection Agency (EPA) frequently calculates a reference
 dose (RfD) or reference concentration (RfC), which is used along with other scientific
 information in setting standards for noncancer human  health effects.  An RfD or RfC is an
 estimate (with uncertainty spanning perhaps an order of magnitude) of daily exposure (RfD)
 or continuous inhalation exposure (RfC)  to the human population (including sensitive
 subgroups) that is likely to be without an appreciable risk of deleterious effects during a
 lifetime (U.S. EPA, 1992).  The EPA estimates the RfD or RfC by first determining the no
 observed adverse effect level (NOAEL) for the critical effect; the NOAEL represents the
 highest experimental dose for which no adverse health effects have been documented.
 Missing information is then accounted for by the application of uncertainty factors to estimate
 a reference dose or concentration.
       Using the NOAEL in determining RfDs and RfCs has many limitations (reviewed by
 Kimmel and Gaylor [1988] and others and  noted by EPA's Science Advisory Board [U.S.
 EPA, 1986, 1988a, b,  1989]).  These limitations include the following:

       •     The experimental dose called the NOAEL is based on scientific judgment and
             is often a source of controversy.
       •     Experiments involving fewer animals tend to produce larger NOAELs and, as
             a consequence, may produce larger RfDs or RfCs (the reverse would seem
             more appropriate in a regulatory context because larger experiments should
             provide greater evidence of safety).
       •     The  slope of the dose response plays little role in determining the NOAEL.
       •     In conjunction with exposure data, the RfD/RfC can be used to estimate the
             size of the  "population at risk" but not the size of their risks.
       •     The NOAEL is limited to the doses tested experimentally.
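
       The second limitation above can be illustrated with a minimal sketch.  It assumes,
purely for illustration, that each dose group is compared with controls by a one-sided Fisher
exact test at the 0.05 level (this report does not prescribe a particular test), and the group
sizes and response counts are hypothetical.  The same 20 percent response rate is not
statistically distinguishable from controls with 10 animals per group, so the dose could be
labeled a NOAEL, but it is clearly significant with 50 animals per group.

```python
from math import comb


def fisher_one_sided_p(treated_resp, treated_n, control_resp, control_n):
    """One-sided Fisher exact p-value for an increase in the treated group.

    Sums the hypergeometric probabilities of seeing at least `treated_resp`
    responders in the treated group, given the table margins.
    """
    total_resp = treated_resp + control_resp
    total_n = treated_n + control_n
    denom = comb(total_n, total_resp)
    p = 0.0
    for x in range(treated_resp, min(treated_n, total_resp) + 1):
        p += comb(treated_n, x) * comb(control_n, total_resp - x) / denom
    return p


# Hypothetical data: 20 percent response at the tested dose, none in controls.
p_small = fisher_one_sided_p(2, 10, 0, 10)    # 10 animals per group
p_large = fisher_one_sided_p(10, 50, 0, 50)   # 50 animals per group

ALPHA = 0.05  # assumed significance level, not taken from this report
for label, p in (("n = 10", p_small), ("n = 50", p_large)):
    verdict = "effect detected" if p < ALPHA else "no effect detected (possible NOAEL)"
    print(f"{label}: p = {p:.4f} -> {verdict}")
```

       Under these assumptions the smaller experiment supports a higher NOAEL simply because
it has less power to detect the effect.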

      These and other limitations of the NOAEL approach prompted development of an
alternative that applies uncertainty factors to a benchmark dose (BMD) rather than to  a

-------
NOAEL (Crump, 1984; Gaylor, 1989; and others). A BMD is a statistical lower confidence
limit for a dose that produces a predetermined change in response rate of an adverse effect
(called the benchmark response or BMR) compared to background.  Unlike the NOAEL, the
BMD takes into account dose-response information by fitting a mathematical model to dose-
response data. The BMR is generally set near the lower limit of responses that can be
measured directly in animal experiments  of typical size. Thus, unlike the risk assessment
methods that EPA employs with cancer effects (Anderson et al., 1983), the BMD method
does not extrapolate to doses far below the experimental range.
       The EPA believes that the BMD approach presents a significant opportunity to
improve the scientific basis of noncancer risk assessment. This document aims to encourage
further application and development of the method by outlining the benchmark approach.  It
is hoped the BMD will add a new perspective to risk assessment and overcome some
limitations of the NOAEL.  To do this, the risk assessment  community must first become
familiar with the benchmark approach and its opportunities  and limitations.
       This document provides background information on  applying the BMD approach;
discusses the goals,  strengths, and limitations of the BMD approach; and provides a detailed
comparison of the NOAEL and BMD methods as well as examples of the steps required in
calculating  the BMD. The description of the benchmark approach includes the important
decisions and options at each step.  Finally, the document suggests areas for additional
research. As work  on the BMD continues and the approach matures, it is anticipated that
guidance will be developed for applying  the benchmark dose in estimating the RfD.

-------
                                  2. BACKGROUND

2.1.  CANCER VERSUS NONCANCER EFFECTS
       Assessment of risk from exposure to toxic chemicals has traditionally been performed
differently, depending on whether the response is cancer or a noncancer health effect (U.S.
EPA, 1987).  EPA cancer risk assessments use dose-response models to extrapolate risks
measured in high-dose animal experiments to the much lower doses typical of human
environmental exposures. This extrapolation depends on the dose-response model selected;
different models can fit experimental data equally well, yet yield low-dose risk estimates that
differ by many orders of magnitude (Crump, 1985). EPA generally uses a dose-response
model for estimating cancer risks that assumes that increased risk is proportional to dose at
low doses (i.e., increased risk varies linearly with dose at low doses) (U.S. EPA, 1987).  An
important consequence of this assumption is that any dose, no matter how small, is assumed
to result in some increase in cancer risk (i.e.,  it is assumed that a threshold for response does
not exist).
       Much of the rationale for these assumptions is based on  the idea that carcinogenicity
is mediated through genotoxicity. The possibility that a single molecule of a genotoxicant
may be sufficient to alter the deoxyribonucleic acid (DNA) in a single cell so that a cancer is
eventually produced suggests that—no matter how unlikely such an event is—the dose-
response relationship cannot have a threshold and must be linear, at least at low doses (NRC,
1977).  Crump et al. (1976) argued more generally that whenever a biological effect occurs
spontaneously in the absence of any exposure, and the effect of the toxic insult is mediated
through augmenting processes that are already operating spontaneously, a threshold would
not be present and the response should vary approximately linearly with sufficiently low
doses.
      In contrast to risk assessment for cancer, less effort has been directed at developing
dose-response models for noncancer effects. One reason for this has been the lack of a
consensus regarding the  shape of the dose-response curve, especially  below the NOAEL for
noncancer effects.  Many scientists believe that thresholds are likely to exist for  many
chemically induced biological effects, particularly noncancer effects.  Another reason is the

-------
 diverse nature of noncancer effects.  The term noncancer effect is nonspecific and
 encompasses a wide variety of responses, including adverse effects on specific organs or
 organ systems, reproductive capacity, viability and structure of developing offspring in utero,
 and survival.2 For even a single type of effect,  the response can range in severity from mild
 and reversible to irreversible and life-threatening.  The severity of the response may depend
 on both the level and duration of exposure.  Modeling this diversity of response represents a
 major challenge.
       The EPA often assesses risks for noncancer effects by applying uncertainty factors to
 a NOAEL for a critical effect.  This method does  not involve dose-response models.  The
 purpose of this report is to discuss the benchmark  dose option in which the NOAEL is
 replaced with a BMD determined by a dose-response model. It is important to keep in mind,
 however, that the calculation of a BMD does not involve using a dose-response model to
 extrapolate risks to low doses, as the EPA does when conducting risk assessments for cancer
 effects.

 2.2.  OVERVIEW OF THE NOAEL APPROACH TO DETERMINING RfDs AND
      RfCs
       The BMD and NOAEL  have a number of features in common.  Before the BMD
 method is presented, the NOAEL is briefly described.  A NOAEL is defined as:

       "An exposure level at which there are no statistically or biologically significant
       increases in the frequency or severity of adverse effects between the exposed
       population and its appropriate control.  Some effects may be produced at this
       level, but they are not considered as adverse, nor precursors to adverse
       effects.  In an experiment with several NOAELs, the regulatory focus is
       primarily on the highest one, leading to the common usage of the term
       NOAEL as the highest exposure without adverse effect."  (U.S. EPA, 1992).

2The words "effect" and "response" are used interchangeably in this document and refer generally to conditions
 that are considered adverse. Although the term "risk" is sometimes used in a similar manner to denote a
 specific adverse effect (e.g., cancer or reduced fertility), in this document "risk" is used quantitatively and
 refers specifically to an increased probability of an adverse effect.

-------
        An RfD, or RfC,3 is obtained by dividing the NOAEL by one or more uncertainty
 factors.

        Different reference values may be developed for different routes of exposure (i.e.,
 oral RfDs and inhalation RfCs) and specific health effects (e.g., RfD(DT) values for developmental
 effects).  Similar techniques can also be used for different durations of exposure (e.g., for
 subchronic exposures of about 7 years).  The overall approach to determining an RfD, which
 is outlined below, is generally the same for each type of reference value.
        An RfD determination begins  with a review of the relevant literature to identify the
 "critical effect(s)"4 on which the RfD is to be based.  This determination takes into account
 the overall quality of the studies, the  route and duration of exposure, and the range of health
 effects.  If adequate human data are available, such data are used as the basis for the RfD;
 otherwise data from animal studies are used.  Among the well-conducted studies, the RfD is
 based on the study demonstrating the critical effect at the lowest dose.  This dose is the
 lowest observed adverse effect level (LOAEL).  The responses in the critical study that are
 obtained at doses below the LOAEL are examined to confirm that they constitute NOAELs.
 The RfD is calculated by dividing the largest NOAEL from the critical study by appropriate
 uncertainty factors.  Table 1 presents  uncertainty factors currently used by EPA (U.S. EPA,
 1992).
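
        The arithmetic of the final step can be sketched as follows.  This is a minimal
 illustration, not an EPA procedure: the function name, the NOAEL value, and the choice of
 factors are hypothetical, and the check against 10,000 applies the ceiling noted in Table 1
 to the product of the standard uncertainty factors only, which is an assumption about how
 that note is meant to be read.

```python
from math import prod

MAX_TOTAL_UF = 10_000  # ceiling noted in Table 1 for minimum-confidence data bases


def reference_dose(noael, uncertainty_factors, modifying_factor=1.0):
    """Return NOAEL / (product of UFs * MF), in the units of the NOAEL."""
    if noael <= 0:
        raise ValueError("NOAEL must be positive")
    if not 0 < modifying_factor <= 10:
        raise ValueError("MF must be greater than zero and at most 10")
    total_uf = prod(uncertainty_factors.values())
    if total_uf > MAX_TOTAL_UF:
        raise ValueError("combined uncertainty factor exceeds 10,000")
    return noael / (total_uf * modifying_factor)


# Hypothetical case: a chronic animal study gives a NOAEL of 5 mg/kg/day, so
# only the interhuman (H) and animal-to-human (A) factors are applied, at 10 each.
rfd = reference_dose(5.0, {"H": 10, "A": 10})
print(f"RfD = {rfd:g} mg/kg/day")
```

        The same function applies unchanged when a BMD is substituted for the NOAEL, which is
 the option discussed in the remainder of this report.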

2.3. OVERVIEW OF THE BENCHMARK APPROACH
       A BMD is defined as a statistical lower confidence limit on the dose producing a
predetermined level of change in adverse response compared with the response in untreated
animals (the benchmark response, or BMR).  For example, a BMD could represent a 95
percent statistical lower confidence limit on the dose corresponding to a 1 percent increase in
3The remainder of this report will refer only to RfDs; however, the discussion is equally applicable to RfCs.
4The first adverse effect(s), or known precursor(s), that occurs as the dose rate increases (U.S. EPA, 1992).


-------
 Table 1.  Description of Typical Uncertainty and Modifying Factors in Deriving Reference
 Doses (RfDs)(a)

 Standard uncertainty
 factors (UFs)(c)           General guidelines(b)
 -------------------------  -----------------------------------------------------------------
 H (Interhuman)             Generally use a 10-fold factor when extrapolating from valid
                            experimental results from studies using prolonged exposure to
                            average healthy humans.  This factor is intended to account for
                            the variation in sensitivity in the human population.

 A (Experimental animal     For RfDs, generally use a 10-fold factor when extrapolating
    to man)                 from valid results of long-term studies on experimental animals
                            when results of studies of human exposure are not available or
                            are inadequate.  For RfCs, this factor is reduced to 3-fold
                            when a NOAEL (human equivalent concentration) is used as
                            the basis of the estimate.  In either case this factor is intended
                            to account for the uncertainty in extrapolating animal data to
                            humans.

 S (Subchronic to           Generally use a 10-fold factor when extrapolating from less
    chronic)                than chronic results on experimental animals or humans.  This
                            factor is intended to account for the uncertainty in extrapolating
                            from less than chronic NOAELs to chronic NOAELs.

 L (LOAEL to NOAEL)         Generally use a 10-fold factor when deriving an RfD/RfC
                            from a LOAEL instead of a NOAEL.  This factor is intended
                            to account for the uncertainty in extrapolating from LOAELs to
                            NOAELs.

 D (Incomplete data         Generally use a 10-fold factor when extrapolating from valid
    base to complete)       results in experimental animals when the data are "incomplete."
                            This factor is intended to account for the inability of any single
                            study to address adequately all possible adverse outcomes.

 Modifying factor (MF)      Use professional judgment to determine an additional
                            uncertainty factor termed a modifying factor (MF) that is
                            greater than zero and less than or equal to 10.  The magnitude
                            of the MF depends on the professional assessment of scientific
                            uncertainties of the study and data base not explicitly treated
                            above (for example, the number of animals tested).  The
                            default value for the MF is 1.

 (a) Source:  Adapted in part from Dourson and Stara (1983), Barnes and Dourson (1988), and
     Jarabek et al. (1989).
 (b) Professional judgment is required to determine the appropriate value to use for any given
     UF.  The values listed in this table are nominal values that are frequently used by the EPA.
 (c) Note:  The maximum uncertainty factor to be used with the minimum confidence data base
     is 10,000.

-------
 an adverse response over that found in untreated animals.  The benchmark level of adverse
 change in response (the BMR) is 1 percent in this example.
       A BMD is calculated by  fitting a mathematical dose-response model to data using
 appropriate statistical procedures.  The calculations necessary to determine a BMD are
 illustrated in figure 1 for a hypothetical set of dose-response data. The horizontal axis
 indicates the doses to which the  animals were exposed, and the vertical axis gives the
 percentage of animals having a particular adverse response.  Each solid symbol represents
 the average outcome in an experimental dose group.  For simplicity, it is assumed that the
 adverse effect did not occur in untreated animals. The figure depicts a mathematical dose-
 response model fit to the data and a corresponding curve (also derived from the mathematical
 model) of statistical lower bounds on doses corresponding to various levels of response. The
 predetermined level of increased response (the BMR) used to define the BMD is shown on
 the response (vertical) axis.  The resulting BMD plotted on the dose (horizontal) axis is
 determined as the lower bound on dose corresponding to an increased response equal to the
 BMR. Figure 1 also  shows the NOAEL determined from these data.  Although in this
 particular hypothetical example the BMD is illustrated as being smaller than the NOAEL, a
 BMD can be either less than or greater than the corresponding NOAEL.
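
        The calculation described above can be sketched in a few lines.  This is a minimal
 illustration under stated assumptions, not the procedure used in this report: the data are
 hypothetical, the model is the one-parameter form P(d) = 1 - exp(-q*d) with no background
 response (matching the simplification made for figure 1), the BMR of 10 percent is chosen
 only for the example, and the lower limit on dose is obtained from a one-sided 95 percent
 likelihood-ratio (profile likelihood) bound on q.  All function names are hypothetical.

```python
import math

CHI2_ONE_SIDED_95 = 2.706  # 90th percentile of chi-square with 1 df (1.645**2)


def log_likelihood(q, doses, tested, responding):
    """Binomial log-likelihood (up to a constant) for P(d) = 1 - exp(-q*d)."""
    total = 0.0
    for d, n, r in zip(doses, tested, responding):
        if r > 0:
            total += r * math.log(-math.expm1(-q * d))  # r * log P(d)
        total -= (n - r) * q * d                        # (n - r) * log(1 - P(d))
    return total


def fit_slope(doses, tested, responding, lo=1e-9, hi=1.0):
    """Maximum likelihood estimate of q by ternary search on [lo, hi].

    The log-likelihood is concave in q, so ternary search finds its maximum;
    the bracket is an assumption suited to the dose scale used below.
    """
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, doses, tested, responding) < log_likelihood(
            m2, doses, tested, responding
        ):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


def upper_limit_on_slope(q_hat, doses, tested, responding):
    """Largest q whose likelihood-ratio statistic against q_hat equals the bound."""
    target = log_likelihood(q_hat, doses, tested, responding) - CHI2_ONE_SIDED_95 / 2.0
    lo, hi = q_hat, 2.0 * q_hat
    while log_likelihood(hi, doses, tested, responding) > target:
        hi *= 2.0  # expand until the bound is bracketed
    for _ in range(200):  # bisection
        mid = (lo + hi) / 2.0
        if log_likelihood(mid, doses, tested, responding) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def dose_at_extra_risk(q, bmr):
    """Dose d solving 1 - exp(-q*d) = bmr."""
    return -math.log1p(-bmr) / q


# Hypothetical quantal data: dose, animals tested, animals responding.
doses = [0.0, 10.0, 30.0, 100.0]
tested = [20, 20, 20, 20]
responding = [0, 2, 5, 13]
BMR = 0.10

q_hat = fit_slope(doses, tested, responding)
q_upper = upper_limit_on_slope(q_hat, doses, tested, responding)
dose_mle = dose_at_extra_risk(q_hat, BMR)  # best estimate of the dose at the BMR
bmd = dose_at_extra_risk(q_upper, BMR)     # lower confidence limit on that dose

print(f"Best-estimate dose at BMR: {dose_mle:.2f}")
print(f"BMD (lower 95% limit):     {bmd:.2f}")
```

        Because the dose at a given response decreases as q increases, the upper limit on q
 corresponds to the lower limit on dose, which is the quantity this report calls the BMD.
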
        The BMD approach addresses several quantitative or statistical criticisms of the
 NOAEL approach presented earlier. The goals of the BMD approach include providing
 flexibility with respect to the definition of the BMD (i.e., not to be restricted  to one of the
 experimental dose levels) and accounting more appropriately for sample size and dose-
 response characteristics (Crump,  1984; Dourson et al., 1985; Kimmel and Gaylor, 1988).
       Even though mathematical models are used in  the BMD approach, their use is less
 certain for "low-dose extrapolation," at doses far below the range for which increased
 responses can be directly  measured.  Because the models proposed for the BMD approach
 are statistical models,  their predictions may be seriously in error if used to extrapolate to low
 doses without incorporating detailed information on the mechanisms through which the
 toxicant causes the particular adverse effect being modeled.  On the other hand, because the
calculation of a BMD does not involve extrapolation far beyond the range of the experimental
data, it should not be highly dependent on the dose-response model used.

-------
 [Figure 1 (graphic not reproduced).  Vertical axis: percentage of animals responding, with the
 BMR marked; horizontal axis: dose, with the BMD and NOAEL marked.  The plot shows data points
 with confidence bars, the best-fitting dose-response model, and the lower statistical limit on
 dose.  BMR = target response level used to define BMD.]

 Figure 1. Example of calculation of a BMD

-------
               3.  DETAILED DESCRIPTION OF THE BMD APPROACH

        The determination of an RfD with the BMD approach involves three basic steps.
  First, a response or group of responses from one or more experiments is selected.  Second,
  BMDs are calculated for the selected responses.  Third, a single BMD is determined from
  among those calculated, and an RfD is calculated by dividing that BMD by appropriate
  uncertainty factors. Each of these steps involves a number of decision points that will be
  discussed in detail in this section.  In the first step, one must decide how to select the
  experiments and responses for calculating BMDs.  In the third step, the values of the
  uncertainty factors must be chosen. These selections and decisions are required in both the
 NOAEL and  BMD approaches.
       Particular attention will be focused here on the second step, which is unique to  the
 BMD approach.  This step involves specifying the form  in which the data will be recorded
 for modeling, choosing a dose-response model, selecting the  mathematical definition of
 altered response, stipulating  the benchmark level of altered response (the BMR) used to
 define the BMD, and selecting the procedure for computing statistical confidence limits used
 to calculate the BMD (including selecting the size of the confidence limit).
       Each of the decision points required in the BMD approach is listed in table 2. These
 decision points and the options available for those decisions are discussed in detail in the
 following sections. Many of these issues were also discussed at a recent workshop sponsored
 by  the EPA, American Industrial Health Council, and International Life Sciences Institute
 (Barnes et al., 1994). In applying the BMD method, the EPA may find it desirable to
 provide guidance for choosing among these options so that RfDs obtained with the BMD
 approach are calculated in a consistent manner.

3.1.  SELECTION OF RESPONSES TO MODEL
       There may be several  toxicity studies for a particular substance, and  each  study may
contain data for a number of  biological effects.  When calculating a BMD, dose-response
models must be applied to one or more effects from one or more studies.  The following
section discusses several options for selecting  responses for modeling.

-------
Table 2.  Steps and Decisions Required in the BMD Approach

  Step                              Decisions
  --------------------------------  --------------------------------
  1.    Study/response selection    Experiments to include
                                    Responses to model

  2.    Model dose response         Format of data
                                    Mathematical model(s)
                                    Handling lack of fit
                                    Measure of altered response

  3.    Calculate BMD(s)            BMR definition
                                    Confidence limit calculation

  4.    Determine RfD               Specific BMD for RfD calculation
                                    Uncertainty factors

-------
        Toxicologic considerations indicate some data are unsuitable for modeling based on
 the overall quality of the study, the route of exposure used in the study vis-a-vis the route of
 exposure for which an RfD is required, and the range of health effects studied vis-a-vis those
 that the RfD is intended to cover.  The EPA uses these same considerations to focus
 attention on more relevant studies for identifying NOAELs (Barnes and Dourson,  1988).
 Additionally, specific responses in studies may be eliminated from consideration if there is
 no convincing evidence of a dose effect for those responses. Such a determination may be
 based on the opinions of those who conducted the experiment, possibly supplemented by
 additional statistical tests.  After toxicologically irrelevant data are eliminated, several studies
 and endpoints often remain, each with a different dose response.  How can these diverse data
 be handled using the  BMD approach?
        One option is  to apply dose-response models to all relevant responses.  While this
 option has the advantage of completeness, it may require a large effort if the data base is
 sizable.  Further,  it may be difficult to interpret results from a large number of dose-
 response analyses.  Selecting critical effects, as the EPA does in the NOAEL approach
 (Barnes and Dourson, 1988), helps limit the scope of modeling.
       The most limited modeling chooses only the effect(s) seen at  the LOAEL, thus
 minimizing the number of responses modeled.  This seems inappropriate, however, for the
 BMD because, unlike a NOAEL, the calculation of a BMD depends  on the slope of the dose
 response.  Thus, it is possible that an effect may have a higher LOAEL but a lower BMD
 than an effect with a steeper dose response but a lower LOAEL.  This is a potential drawback
 to modeling a single effect at the lowest LOAEL.

 3.1.1.  Example:  Selecting Endpoints Within a Single Study
       Sanders et al. (1974) tested the effects of dietary exposure to Aroclor 1254, a PCB
 mixture, on several biological responses in male albino mice (ICR strain).  The researchers
 examined effects of 2  weeks of exposure on pentobarbital-induced sleeping  time; food
 consumption; serum corticosterone; and weights of the liver, testes, preputial glands, adrenal
glands, and vesicular glands.  Serum corticosterone levels were elevated for all doses tested
(62.5, 250, and 1,000 ppm; see footnote 5), pentobarbital-induced sleeping time and food consumption
were reduced, and liver weight was increased at 250 and 1,000 ppm.  Adrenal glands were
significantly heavier only at 1,000 ppm.  Weights of testes, preputials, and vesicular glands
were not significantly affected.
      For this modeling exercise, changes that showed no response to dose (in the testes,
preputials, and vesicular glands) can be ignored. If one chose to model only responses seen
at the LOAEL (62.5 ppm in this case), only serum corticosterone level would be modeled.
Otherwise, serum corticosterone, liver weight, adrenal gland weight, pentobarbital-induced
sleeping time, and food consumption could be modeled.  Depending on the slope of the dose-
response curves, any one of these responses could yield the smallest BMD.
      If other studies of PCB were included in the data base, the 62.5 ppm dose (suitably
transformed  to yield consistent units across all studies) might not be the LOAEL among all
the studies (i.e., Sanders et al. [1974] is not the critical study).  However, even in the more
general context of all relevant PCB studies, one of the responses from Sanders et al. (1974)
could yield the smallest BMD, again depending on the dose-response slopes and the  doses
used in the other studies.

3.2.   USE OF CATEGORICAL VERSUS CONTINUOUS DATA IN BMD MODELING
      Noncancer health effects can be recorded in either categorical or continuous formats.
In a categorical format, possible responses are divided into two or more groups, and the
numbers of responses in each  group are recorded.  For example, organ degeneration may be
recorded as absent, mild, moderate, or severe.  The most commonly used format for
categorization of data is the quantal format in which only the presence or absence of the
response in an experimental subject is noted.  At the other extreme, a response may  be
capable of assuming a continuum of values and be recorded in a continuous format.  Organ
weights and serum enzyme levels are examples of responses that are often recorded in a
continuous format (see footnote 6).

Footnote 5: Although a higher dose of 4,000 ppm was tested, all mice exposed at that level died within 7 days of initial
exposure.

         The format used for expressing a response may be determined largely by what is
  customary or appropriate for a particular type of response.  For example, cancer responses
  and particular types of developmental effects are generally recorded in a quantal form simply
  as present or absent without more detailed categorization.  On the other hand, fetal weight
  can be expressed and modeled as either a quantal or continuous variable (Kavlock et al.,
  1994).
        Additionally, unless there is access to the raw data from a study, the format for
 expressing a response will be limited by the format in which the data are summarized.
 Clearly, data cannot be categorized  more finely than in the data summary available. When
 the raw data for a response in question are available in a continuous format, they can either
 be used directly in a continuous format in the dose-response models or converted into a
 categorical format by dividing the range of the responses into subintervals and recording the
 number of subjects with responses in each subinterval.  For certain continuous responses, a
 particular interval in the range may be considered to represent the "normal range" for this
 response.  Normal ranges can be used to define corresponding quantal responses in  a very
 natural fashion by considering a subject to be affected if its response is outside the normal
 range.
        In some cases, recoding continuous data into a quantal form relates the data more
 directly to adverse response.  For example, liver weight as a fraction of total body weight is
 not adverse per se, but it may represent an adverse response when it reaches a certain level.
If this level was specified, then animals  with  liver weight to total weight ratios above that
level could be considered to be adversely affected. As another example, because a body
weight reduction of >10 percent is generally considered adverse (OSTP, 1985), body weight
changes could be treated quantally by using the 10 percent cutpoint to define the presence of
an adverse response.

Footnote 6: An additional possibility is for a response to be reported in a format that is a combination of continuous and
categorical.  Consider, for example, the measurement of a serum enzyme level by an analytical method that has
a detection limit of x micrograms per liter.  Subjects with a response higher than x would have their enzyme
level recorded continuously, whereas subjects with enzyme levels below the detection limit would have their
response categorized as

       A disadvantage of recoding continuous data into a categorical form is that information
on the magnitude of the response is lost.  However, a possible advantage is that, because
generally some of the responses of interest must be categorical, comparisons among
responses may be facilitated if they are all categorical, particularly if they are all quantal.
On the other hand, the data needed to define a categorical response may not be available in
the published report.
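
As a minimal sketch of the recoding described above, the following Python fragment converts individual body weights into a quantal response using the 10 percent cutpoint. The weights, the control mean, and the function name are hypothetical values chosen for illustration; they are not taken from any study cited here.

```python
def recode_quantal(values, reference, max_reduction=0.10):
    """Return 1 (affected) or 0 (not affected) for each value.

    A subject counts as affected when its value is reduced by more than
    max_reduction (as a fraction) relative to the reference value.
    """
    cutoff = reference * (1.0 - max_reduction)
    return [1 if v < cutoff else 0 for v in values]


# Hypothetical body weights (g) for one dose group; control mean assumed to be 300 g.
control_mean = 300.0
dose_group = [295.0, 268.0, 301.0, 255.0, 272.0]

flags = recode_quantal(dose_group, control_mean)
number_affected = sum(flags)   # animals counted as responders
number_tested = len(flags)     # animals in the dose group
```

The pair (number_affected, number_tested) is the form of input that the quantal models need for each dose group. The magnitude of each reduction is discarded in the process, which is the loss of information noted above.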
       In the case of categorical data, the information generally required for application of
dose-response models includes the experimental doses, the total number of animals in each
dose group, and the number of these whose responses are in each of the categories.
Generally, interest will be in the special case of quantal responses (i.e., two categories), and
only models for this special case will be discussed.
       In the case of continuous data, for application of a number of dose-response models
(specifically those that assume that responses  at each dose level are normally distributed),
experimental doses, the number of animals in each dose group, the mean response in each
group, and the sample variance of the response in each group must be known.
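
The summary inputs listed above can be assembled from raw data, when raw data are available, along the following lines. This is only a sketch: the organ weights and the function name are hypothetical.

```python
from statistics import mean, variance

# Hypothetical individual organ weights (g), keyed by dose (ppm).
raw = {
    0.0: [1.10, 1.05, 1.20, 1.15],
    100.0: [1.00, 1.08, 0.95, 1.02],
    300.0: [0.80, 0.90, 0.85, 0.78],
}


def summarize(groups):
    """Return (dose, n, mean, sample variance) for each dose group."""
    rows = []
    for dose in sorted(groups):
        values = groups[dose]
        # statistics.variance uses the n - 1 denominator (sample variance).
        rows.append((dose, len(values), mean(values), variance(values)))
    return rows


summary = summarize(raw)
```

Each row of summary carries the four quantities named in the text: the dose, the number of animals, the group mean, and the sample variance.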

3.2.1. Example
       Johnson et al.  (1986)  examined the effect in rats of chronic acrylamide exposure on
degeneration of tibial nerves.  The degree of degeneration (from very slight to severe) was
recorded for each rat.  Data of this form are categorical but not quantal.  Because
degeneration of the type observed has been observed in aging rats (Johnson et al.,  1986) and
because very slight and slight degeneration was observed at roughly the same rate in all dose
groups, adverse effect was defined to be moderate or  severe degeneration.  This definition
could also define a quantal response, with degeneration that was slight or very slight
counting as no response and moderate and severe degeneration counting as a response.  The
numbers of male rats with  moderate or severe degeneration are displayed in table 3.

-------
Table 3.  Acrylamide-Induced Tibial Nerve Degeneration in Rats

Data

  Dose              Number      Number
  (mg/kg/day)       affected    tested
  0                 9           60
  0.01              6           60
  0.1               12          60
  0.5 (NOAEL)       13          60
  2.0               16          60

Modeling Result

  Model    Goodness-of-fit    BMD (mg/kg/day)
           p-value            (5% extra risk)
  QQR      0.34               0.83
  QW       0.48               0.31

QQR = quantal quadratic regression.
QW  = quantal Weibull.

Source: Johnson et al. (1986).

-------
3.3.  MATHEMATICAL MODELS FOR DEFINING A BMD
       Table 4 lists various models for quantal and continuous data (Crump, 1984; Gaylor,
1989; Gaylor and Slikker, 1990).  Fitting the model to experimental data gives estimates of
three or more parameters that describe each model. This fitting, usually accomplished
through maximum likelihood methods (see appendix), estimates  the probability of response
(for quantal data) or the mean response (for continuous data) for each dose level.  The same
methods also compute a lower statistical confidence limit for the dose corresponding to the
BMR.   This lower confidence limit defines the BMD.
       The models shown here, as well as many other possible models, relate  the response to
the dose, d.  The response variable is denoted in table 4 either by P(d), the probability of
response for a disease outcome that is either present or absent (quantal), or by m(d), the
mean value of a continuously measured parameter of health or well-being.  In the
equations that include it, d0 is a threshold dose level, i.e., a dose level below which the response
variable is unaffected (i.e., at doses less than or equal to the threshold, the response variable
value remains at c, the value of that variable in the absence of dosing).  For the quantal
models, the probability of response is assumed to increase as dose level increases.  For the
continuous models, mean response can either increase or decrease as a function of dose level.
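
The threshold behavior described above can be sketched in Python for three of the quantal models in table 4. The function names are illustrative, and the parameter values in the usage lines are the ones quoted in the notes to figures 2 through 4; nothing here is a fitted result.

```python
import math


def qlr(d, c, q1, d0):
    """Quantal linear regression: response stays at c up to the threshold d0."""
    if d <= d0:
        return c
    return c + (1.0 - c) * (1.0 - math.exp(-q1 * (d - d0)))


def qqr(d, c, q1, d0):
    """Quantal quadratic regression: same threshold, quadratic in (d - d0)."""
    if d <= d0:
        return c
    return c + (1.0 - c) * (1.0 - math.exp(-q1 * (d - d0) ** 2))


def qw(d, c, q1, k):
    """Quantal Weibull: no threshold; any positive dose raises the response."""
    return c + (1.0 - c) * (1.0 - math.exp(-q1 * d ** k))


below = qlr(20.0, c=0.05, q1=0.05, d0=30.0)    # below threshold: background rate c
above = qlr(50.0, c=0.05, q1=0.05, d0=30.0)    # above threshold: elevated response
weibull = qw(50.0, c=0.05, q1=2.57e-4, k=2.0)  # nonthreshold model at the same dose
```

At doses at or below d0 the two threshold models return the background value c unchanged, whereas the Weibull model exceeds c at every positive dose.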

3.3.1.   Criteria for Selecting Models
      Ability to Describe the Observed Dose Response.  Since the goal of the BMD
approach is estimation of a lower bound on dose for some level of risk not far below the
observed range, the model should give adequate predictions of the observed experimental
responses.  Goodness-of-fit tests (see appendix) can be applied to determine if a model
adequately describes the dose-response data.
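
One common form of goodness-of-fit check for quantal data is the Pearson chi-square statistic, sketched below. The observed counts are those of table 3; the predicted probabilities are a hypothetical flat model at the pooled response rate, used only to exercise the function, and the appendix remains the reference for the test actually applied in this document.

```python
def pearson_chi_square(n_tested, n_affected, predicted):
    """Pearson chi-square statistic for quantal (binomial) dose groups.

    predicted holds the model-predicted response probability for each
    dose group; every value must lie strictly between 0 and 1.
    """
    statistic = 0.0
    for n, x, p in zip(n_tested, n_affected, predicted):
        expected = n * p
        statistic += (x - expected) ** 2 / (expected * (1.0 - p))
    return statistic


# Observed counts from table 3 (Johnson et al., 1986).
n_tested = [60, 60, 60, 60, 60]
n_affected = [9, 6, 12, 13, 16]

# Hypothetical predictions: no dose effect, every group at the pooled rate.
pooled = sum(n_affected) / sum(n_tested)
flat_statistic = pearson_chi_square(n_tested, n_affected, [pooled] * 5)
```

The statistic is conventionally referred to a chi-square distribution whose degrees of freedom equal the number of dose groups minus the number of estimated parameters; a small p-value indicates lack of fit.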
      Each of the models presented in table 4 is capable of describing a range of dose-
response patterns.  Figures 2 through  6 show the dose-response curves obtained with some of
these models.  The QPR and LN models will provide dose-response shapes similar to that
shown for the QW model (figure 4).  Similarly, the CPR model  will provide a range of
patterns similar to that shown for the CP model (figure 6).  Although figures 5 and 6 depict
the CQR and CP models applied to a response that decreases as the dose increases, these and

-------
Table 4.  Dose-Response Models Proposed for Estimating BMDs

  Model                                           Formula

Quantal Data
  Quantal linear regression (QLR)                 P(d) = c + (1-c){1 - exp[-q1 (d - d0)]}
  Quantal quadratic regression (QQR)              P(d) = c + (1-c){1 - exp[-q1 (d - d0)^2]}
  Quantal polynomial regression (QPR)             P(d) = c + (1-c){1 - exp[-q1 d - ... - qk d^k]}
  Quantal Weibull (QW)                            P(d) = c + (1-c){1 - exp[-q1 d^k]}
  Log-normal (LN)                                 P(d) = c + (1-c) N(a + b log d)

Continuous Data
  Continuous linear regression (CLR)              m(d) = c + ...
  Continuous quadratic regression (CQR)           m(d) = c + q1 (d - d0)^2
  Continuous linear-quadratic regression (CLQR)   m(d) = c + ...
  Continuous polynomial regression (CPR)          m(d) = c + q1 d + ... + qk d^k
  Continuous power (CP)                           m(d) = c + q1 d^k

Note:  P(d) is the probability of a response at the dose, d; m(d) is the mean response at the
dose, d.  In all models, c, q1, ..., qk, and d0 are parameters estimated from data.  For the
quantal models, 0 ≤ c ≤ 1, qi ≥ 0, and k ≥ 1.  N(x) denotes
the normal cumulative distribution function.

Source: Crump (1984); Gaylor (1989); Gaylor and Slikker (1990).

-------
Figure 2.  Examples of quantal linear regression (QLR) curves:
           P(d) = c + (1-c){1 - exp[-q1 (d - d0)]}

[Plot of the probability of response against dose (0 to 100), with d0 marked on the dose axis.
Legible curve labels: q1 = 0.05 and q1 = 0.005.]

All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c = 0.05 and d0 = 30.  The parameter q1 is the dose coefficient in this model; larger values of q1 give
steeper dose response.

-------
Figure 3.  Examples of quantal quadratic regression (QQR) curves:
           P(d) = c + (1-c){1 - exp[-q1 (d - d0)^2]}

[Plot of the probability of response against dose (0 to 100), with d0 marked on the dose axis.
Curve labels: q1 = 0.01, q1 = 0.001, q1 = 0.0001, and q1 = 0.00001.]

All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c = 0.05 and d0 = 30.  The parameter q1 is the dose coefficient in this model; larger values of q1 give
steeper dose response.

-------
Figure 4.  Examples of quantal Weibull (QW) curves:
           P(d) = c + (1-c){1 - exp[-q1 d^k]}

[Plot of the probability of response against dose (0 to 100).
Curve labels: q1 = 1.29E-2, k = 1; q1 = 1.82E-3, k = 1.5; q1 = 2.57E-4, k = 2.]

All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c = 0.05.  The parameter q1 is the dose coefficient in this model; larger values of q1 give steeper dose response.
The parameter k is the power on dose; larger values of k give more curvature.

-------
Figure 5.  Examples of continuous quadratic regression (CQR) curves:
           m(d) = c + q1 (d - d0)^2

[Plot of mean response (0 to 100) against dose (0 to 100).
Curve labels: q1 = -0.01, q1 = -0.005, q1 = -0.0025, and q1 = -0.001.]

All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c = 90 and d0 = 20.  The parameter q1 is the dose coefficient in this model; larger values of q1 give
steeper dose response.

-------
Figure 6.  Examples of continuous power (CP) curves:
           m(d) = c + q1 d^k

[Plot of mean response (0 to 100) against dose (0 to 100).
Curve labels as printed: q1 = -0.2, k = 1; q1 = -0.0283, k = 1.5; q1 = -0.004, k = 4; q1 = -5.0E-5, k = 3.]

All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c = 90.  The parameter q1 is the dose coefficient in this model; larger values of q1 give steeper dose response.
The parameter k is the power on dose; larger values of k give more curvature.

-------
all of the models for continuous data also can be applied to responses that increase with
increasing dose.
       It is often the case that several models will adequately describe the data under
consideration.  When that is true, other considerations must be used to decide on the model
to use for BMD calculation.

       Statistical Assumptions. Important considerations in selecting a model are the
reasonableness of the statistical assumptions underlying a model and the procedures used to
fit it to the data.  In most instances, it may be reasonable to assume that quantal results arise
from binomial variation about a dose-dependent, expected number of responders.  This
means that each subject is assumed to respond independently of all other subjects and that all
animals in a given dose group have an equal probability of responding. These assumptions
are generally made when the models for quantal data listed in table 4 are applied.  Similarly,
a continuous endpoint is generally assumed to display variation in accordance with dose-
dependent normal distributions. In other words, each subject is assumed to respond
independently of all other subjects, and the responses of animals in a particular dose group
are distributed according to a normal probability distribution. The methods proposed by
Crump (1984) for fitting the continuous models listed in table 4 assume this type of normal
variation. There are situations, however, when the binomial or normal assumptions may not
be appropriate.   In those cases, one should consider alternative models that are based on
more appropriate assumptions.  Whatever assumptions are made should be documented and
the reasons for  their selection described.  For example, in studies of developmental toxicity
where responses within and across litters are observed,  the response in one fetus may not be
independent of the response in other fetuses in the same litter.  Consequently, the  assumption
of independence inherent in models that assume binomial variation is not strictly valid,
although this assumption may still provide reasonable results in specific cases.
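
Under the binomial assumption just described, the quantity that maximum likelihood fitting maximizes can be written down directly. The sketch below evaluates the binomial log-likelihood of the table 3 counts for any set of predicted probabilities; the function name is illustrative, and the two sets of probabilities compared are hypothetical, not fitted models from this document.

```python
import math


def binomial_log_likelihood(n_tested, n_affected, probs):
    """Log-likelihood of quantal data, assuming independent responders
    and one response probability per dose group (0 < p < 1)."""
    total = 0.0
    for n, x, p in zip(n_tested, n_affected, probs):
        total += (math.log(math.comb(n, x))
                  + x * math.log(p)
                  + (n - x) * math.log(1.0 - p))
    return total


# Observed counts from table 3 (Johnson et al., 1986).
n_tested = [60, 60, 60, 60, 60]
n_affected = [9, 6, 12, 13, 16]

# Two hypothetical sets of probabilities: the observed proportions
# themselves, and a single pooled rate shared by all groups.
observed = [x / n for x, n in zip(n_affected, n_tested)]
pooled = [sum(n_affected) / sum(n_tested)] * len(n_tested)

ll_observed = binomial_log_likelihood(n_tested, n_affected, observed)
ll_pooled = binomial_log_likelihood(n_tested, n_affected, pooled)
```

A dose-response model supplies the probabilities through its formula, and fitting searches for the parameter values that make this log-likelihood as large as possible. When litter effects violate independence, this likelihood is no longer the appropriate one.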
      Alternative models that assume more general forms of variation for quantal responses
from developmental toxicity experiments have been developed (Rai and Van Ryzin, 1985;
Kupper et al., 1986; Kodell et al.,  1991; Ryan, 1992).  Such models should be  considered
when the BMD approach is applied to responses observed in individual fetuses.

-------
       As a different example, it may be necessary to transform continuous data in some
cases so that they better satisfy the assumptions of a normal distribution.  A log-transform is
often used for this purpose.  Kendall (1951) presents statistical tests that can be used to
determine if data are consistent with a normal assumption.

       Biological Considerations. Even though the models in table 4 are descriptive and do
not incorporate detailed information on biological mechanisms, certain general biological
considerations may be used to help select the dose-response models to be used for BMD
calculation.
       One example could be in selecting a threshold versus a nonthreshold model. The
quantal models QLR and QQR involve a threshold dose, d0.  Doses below this threshold (see footnote 7)
are assumed not to affect the probability of a response.  On the other hand,  the quantal
models QPR, QW, and LN do not involve a threshold dose; consequently, with  these models
any dose, no matter how small, is assumed to increase the probability of a response.  One
possible input for selecting a model is to apply threshold models to responses that  are thought
likely to have thresholds, and nonthreshold models for responses for which thresholds are
considered less likely, based on consideration of biological mechanisms (see footnote 8).
       Because a BMD is a dose corresponding to a finite (nonzero)  increment in response
(the BMR), even if a threshold exists, the model  predictions are only used for doses that are
above the threshold.  In applying both threshold and nonthreshold models  to several data
sets, Crump (1984) did not find large differences between BMDs calculated from models
involving thresholds and those not involving thresholds.  Indeed, one goal in selecting a
BMR is for the resulting BMD not to be highly dependent on the underlying model. If this
goal is accomplished, then it should make little practical difference whether the model used
includes a threshold.

Footnote 7: A distinction is sometimes made between a threshold for an individual and a threshold for a population. In the
models listed in table 4 that incorporate a threshold dose, d0, the single threshold applies to the population.

Footnote 8: The existence or nonexistence of a threshold for an effect can never be known with certainty based on
experimental dose-response studies alone. If no responses are found at a given dose, another experiment
employing larger numbers of animals may detect a response. Conversely, if responses are detected at a given
dose, it is always possible that a threshold might exist at some lower dose.

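
The observation that a BMR above zero always corresponds to a dose above the threshold can be checked by inverting the model formulas. The sketch below assumes the usual definition of extra risk, [P(d) - P(0)] / [1 - P(0)], which this section uses but does not restate, and it returns the maximum likelihood dose for a given BMR, not the lower confidence limit that defines the BMD. Function names are illustrative; the QQR parameters are taken from the notes to figure 3 and the QW parameters from figure 4.

```python
import math


def dose_at_extra_risk_qqr(bmr, q1, d0):
    """Dose at which the QQR model gives extra risk bmr (0 < bmr < 1)."""
    return d0 + math.sqrt(-math.log(1.0 - bmr) / q1)


def dose_at_extra_risk_qw(bmr, q1, k):
    """Dose at which the QW model gives extra risk bmr (0 < bmr < 1)."""
    return (-math.log(1.0 - bmr) / q1) ** (1.0 / k)


d_qqr = dose_at_extra_risk_qqr(0.05, q1=0.001, d0=30.0)
d_qw = dose_at_extra_risk_qw(0.05, q1=2.57e-4, k=2.0)

# Round trip: extra risk recomputed from the QQR formula at d_qqr.
check = 1.0 - math.exp(-0.001 * (d_qqr - 30.0) ** 2)
```

Because -log(1 - BMR) is positive for any BMR above zero, the QQR dose always exceeds d0, so the part of the curve at or below the threshold never enters the calculation.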
        Biological considerations also might be used to select models based on the biological
 plausibility of the dose-response curve shape.  Consider, for example, the difference between
 the QLR and QQR models (table 4) at doses near the threshold dose, d0.  While the QLR
 model has a  sharp transition from the background response rate to the dose-dependent rate at
 the threshold, the transition for the QQR model is smoother, without the apparent abrupt
 change (compare figures 2 and 3).  In some circumstances, a smooth (continuous) change of
 slope may be deemed more reasonable for the response under consideration and the QQR
 model favored over the QLR model.  In this case as well, however, if the BMR is selected
 appropriately (i.e.,  large enough so that lower bounds on dose for that level of risk are not
 overly dependent on the choice of model), it should make little practical difference which of
 these models is selected.
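
The difference in shape at the threshold can be made explicit by differentiating the QLR and QQR formulas of table 4 for doses above d0. This is a worked derivation offered for illustration, not a result quoted from the cited references.

```latex
\begin{align*}
\text{QLR:}\quad \frac{dP}{dd} &= (1-c)\,q_1 \exp[-q_1 (d-d_0)]
  \;\longrightarrow\; (1-c)\,q_1 \quad \text{as } d \to d_0^{+},\\
\text{QQR:}\quad \frac{dP}{dd} &= 2\,(1-c)\,q_1 (d-d_0) \exp[-q_1 (d-d_0)^2]
  \;\longrightarrow\; 0 \quad \text{as } d \to d_0^{+}.
\end{align*}
```

Below d0 both models are flat, so the slope of the QLR curve jumps from 0 to (1-c)q1 at the threshold, whereas the slope of the QQR curve passes through the threshold without a jump.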

        Use of Multiple Models. It may be difficult to limit the calculations to a single model
 based on the criteria discussed above.  Consequently, it may be desirable to apply several
 models. More than one of those models may fit the observed responses equally well. The
 decisions required in that case are discussed  in section 3.8.

 3.3.2.  Example
       Johnson et al. (1986) observed tibial  nerve degeneration induced in rats by
 acrylamide.  Table 3 shows the responses in quantal form as discussed in section 3.2.1.
 Table 3 also summarizes the results of fitting two quantal models (the quantal quadratic
 regression, QQR, and quantal Weibull, QW, models from table 4).  Figure 7 shows the rates
 of moderate and severe degeneration, the best-fitting QQR model, and the best-fitting QW
 model.
       Both models fit the data satisfactorily; chi-squared goodness-of-fit tests yielded
p-values greater than 0.05 (see appendix).  Because the fits of both models to the data are
adequate and the statistical assumptions underlying the two models are identical, these
considerations do not suggest acceptance of one model over the other.  The QW model does
not allow a threshold, whereas the QQR model does. If it is suspected that tibial nerve
degeneration does not have a threshold, then one might prefer to use the QW model. If a
threshold is likely, then the QQR model might be preferred.  The high rates of very slight
and slight degeneration and nonzero rates of moderate and severe degeneration in the control
group of Johnson et al. (1986), in addition to other biological considerations, suggest a finite
background rate, in which case a threshold may not be a good assumption.

Figure 7.  Moderate to severe nerve degeneration in rats following acrylamide exposure

[Plot against dose (0 to 2.5 mg/kg/day) of the data points with 95% confidence bars and the
fitted QQR and QW curves.]

Source: Johnson et al., 1986

3.3.3.  Additional Research
       Several types of data may require other types of models.  As described above, studies
of developmental toxicants evaluate responses in fetuses.   Different fetuses from  the same
litter may not respond independently to developmental toxicants, and models that account for
possible  "litter effects" may be needed.  Special approaches also may be needed for modeling
data in which, in addition to knowing whether an animal was affected, the level of effect
may be categorized (e.g., mild, moderate, severe).  While such categorization is generally
ignored,  other models can use the additional information.
       In some studies several different durations of exposure are used.  Correspondingly,
there may be a need  for different assessments for different durations of human exposure.
Models that incorporate duration of exposure as well as the dose level can combine all the
exposure information.
       Some models exist that are applicable to each of these situations (see Rai and Van
Ryzin, 1985; Kupper et al.,  1986; Kodell et al., 1991; Clement International Corporation,
1990a, b; Ryan et al., 1991; Catalano et al., 1993). The applicability of such models to
calculating BMDs is being studied (Faustman et  al., 1994; Allen et al., 1994a, b).

3.4.   ADJUSTING FOR LACK OF FIT
       None of the models listed in table 4 will provide a reasonable fit to certain data sets.
Frequently this is due to reduced responses at higher doses that are inconsistent with the
dose-response trend seen at lower doses. One likely reason is interference at higher doses by
competing mechanisms of toxicity. Whenever a lack of fit occurs, one should be sure that
all affected animals are taken into account.  For example, in some experiments, if a high
incidence of response is seen at lower doses, the experimenter may not look for the effect at
higher doses.  As another example, suppose a BMD is calculated based on the response,
"mild atrophy."  If mild atrophy progresses to "moderate atrophy" and subsequently to
"severe atrophy," then animals with these more severe forms could be considered to be
affected as well.  In general, if a BMD is calculated based on a toxic response that can
progress to more severe forms (possibly known by names different from the original
response), animals with more severe forms of the response also should be considered to be
affected.
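
The counting rule in this paragraph can be sketched as follows. The severity scale and the list of grades are hypothetical; the point is only that animals graded above the chosen response are counted together with those at it.

```python
# Hypothetical ordered severity scale, least to most severe.
SEVERITY = ["none", "mild", "moderate", "severe"]


def count_affected(grades, response="mild"):
    """Count animals whose grade is at or above the chosen response."""
    cutoff = SEVERITY.index(response)
    return sum(1 for g in grades if SEVERITY.index(g) >= cutoff)


# Hypothetical grades for one high-dose group.
grades = ["none", "mild", "severe", "moderate", "none", "mild"]

mild_only = grades.count("mild")           # ignores progressed animals
mild_or_worse = count_affected(grades)     # includes moderate and severe
```

Counting only the animals graded exactly "mild" would understate the response in a group where the lesion has progressed, which is one way an apparent downturn at high doses can arise.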
       A plateau in the responses at the higher doses can be caused by saturation of
metabolic or delivery systems for the ultimate toxic substance.  Such an effect also can cause
dose-response models not to fit the data adequately. It may be possible to overcome this
problem by estimating the delivered dose to the site of action and then applying this dose in
the dose-response models  rather than an external measure of exposure  (Andersen et al.,
1987).  In this approach, pharmacokinetic data on animals are used to  estimate internal dose
to the target tissue that results from the experimental dosing regimen.  The BMD method is
applied to the internal dose.  Human pharmacokinetic information is then used to estimate the
external dose that would result in this internal dose. This external dose is  then used to
estimate the RfD.
       In other cases, a particular  response may be reduced at higher doses due to
interference by other responses that are not a progressive form of the response of interest.
One such example is when a dose-related toxic response that occurs primarily in aged
animals is not expressed because of premature deaths due to other toxic effects.  A more
subtle example is when moderate doses cause a particular organ to be enlarged, but still
higher doses cause the same organ to atrophy through an independent mechanism. In these
cases, it is not appropriate to combine these separate toxic responses into a common
response.
       Whenever the responses at the higher doses are reduced, so that none of the models
listed in table 4 fit, one option is to look for a more flexible model that can adequately
describe the dose response.  A seeming advantage to this approach is that one may be able to
incorporate all the data into the analysis. A danger in this approach is that the attempt to fit
 the high-dose data will skew the dose response at the lower doses that are of more direct
 interest.
        When none of the models provide an adequate fit, a simpler and perhaps better
 advised approach is to omit the data at the highest dose and refit the models to the remaining
 data.  This process can be continued, and an adequate fit will eventually be obtained (see footnote 9).
 Applying this process to  toxicologic test data will be greatly limited by the small number of
 doses that are typical in these experiments.  This approach is used by the EPA in risk
 assessments for cancer based on the linearized multistage model (Anderson et al., 1983).
 The rationale for eliminating data at the highest dose as opposed to lower doses is that the
 data at the highest dose should be the least informative of responses in the lower dose region
 of interest, i.e.,  the area of critical effects.
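
The refitting procedure described above can be sketched as a loop. Everything about the fitting step is assumed: fit is a caller-supplied function, the 0.05 cutoff and the minimum of three dose groups are illustrative choices, and placeholder_fit returns made-up p-values solely so that the loop can be run on the doses and counts of table 5.

```python
def fit_dropping_high_doses(doses, n_tested, n_affected, fit,
                            alpha=0.05, min_groups=3):
    """Refit after removing the highest dose group until the fit is adequate.

    fit(doses, n_tested, n_affected) must return (parameters, p_value),
    where p_value is a goodness-of-fit p-value. Returns
    (parameters, p_value, doses_used), or None when too few groups remain.
    """
    order = sorted(range(len(doses)), key=lambda i: doses[i])
    while len(order) >= min_groups:
        d = [doses[i] for i in order]
        n = [n_tested[i] for i in order]
        x = [n_affected[i] for i in order]
        params, p_value = fit(d, n, x)
        if p_value >= alpha:
            return params, p_value, d
        order = order[:-1]  # drop the highest remaining dose group
    return None


def placeholder_fit(d, n, x):
    """Stand-in for a real model fit; the p-values are made up."""
    return None, (0.01 if max(d) == 15.0 else 0.5)


doses = [0.0, 1.88, 3.75, 7.50, 15.0]
n_tested = [10, 10, 10, 10, 10]
n_affected = [0, 0, 3, 4, 0]

result = fit_dropping_high_doses(doses, n_tested, n_affected, placeholder_fit)
```

The min_groups guard reflects the limitation noted in the text: with the small number of doses typical of toxicologic studies, only one or two groups can be dropped before too little data remain to fit a model.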

 3.4.1.  Examples
        Ethylene  glycol monopropyl ether (EGPE) was examined for toxic effects in rats
 when administered for 6  weeks via gavage (Katz et al., 1984).   At the end of the 6-week
 exposure period, the spleen appeared dark and enlarged in several rats,  presumably as a
 result of the exposure.  Histopathological examination  found congestion or extramedullary
 hematopoiesis. Table 5 displays the rates of hematopoiesis.  Note that no animals in the high-dose
 group had that lesion. This may be due to competing manifestations of toxicity or other
 unexplained reasons (see footnote 10).
       The NOAEL for this study was determined by applying a statistical technique referred
 to as the no statistical significance of trend (NOSTASOT) approach (Tukey et al., 1985).
 The NOSTASOT approach (described in  some detail in the appendix) applied  to all of the
 dose groups indicated that there was no significant trend for larger doses to yield larger
proportions of responders (the Mantel-Haenszel trend test p-value was about 0.56).

Footnote 9: The only exception to this is if at the lowest dose there is a statistically significant deficit in adverse response
compared with control animals.

Footnote 10: Four of the high-dose rats had enlarged and darkened spleens; six high-dose rats had congestion in the spleen.
These other endpoints might be used in lieu of extramedullary hematopoiesis for determining an RfD for
EGPE, but for the sake of illustration the hematopoiesis response is discussed here.

-------
Table 5.  EGPE-Induced Extramedullary Hematopoiesis in the Spleen of Rats

Data

  Dose               Number      Number
  (mmole/kg/day)     affected    tested
  0                  0           10
  1.88 (NOAEL)       0           10
  3.75               3           10
  7.50               4           10
  15.0               0           10

Modeling results (a)

  Model    Goodness-of-fit    BMD (mmole/kg/day) for extra risk of
           p-value            10%      5%       1%
  QQR      0.30               2.24     1.56     0.69
  QW       0.18               0.99     0.48     0.094

(a) The results for the models are those fit to all dose groups except for the highest dose group.
    Neither model adequately fit data from all dose groups.

Source: Katz et al. (1984).

-------
 However, the NOSTASOT procedure applied to the data both without the highest dose group
 and without the highest two dose groups detected significant trends.  Moreover, the pairwise
 comparison of the 7.5 mmole/kg dose group and the control group indicated a significantly
 increased rate of response at 7.50 mmole/kg (p = 0.04 by Fisher's exact test).  The pairwise
 comparison of the 3.75 mmole/kg dose group and controls was not significant (p = 0.11 by
 Fisher's exact test).
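
The pairwise comparisons above can be reproduced from the counts in table 5. The following is a minimal sketch, assuming a one-sided Fisher's exact test (the text does not state the sidedness); the function name `fisher_one_sided` is introduced here for illustration only.

```python
from math import comb

def fisher_one_sided(treated_affected, treated_n, control_affected, control_n):
    """One-sided Fisher exact p-value for an increased response in the treated group."""
    total_affected = treated_affected + control_affected
    total_n = treated_n + control_n
    denom = comb(total_n, total_affected)
    upper = min(total_affected, treated_n)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    return sum(
        comb(treated_n, k) * comb(control_n, total_affected - k)
        for k in range(treated_affected, upper + 1)
    ) / denom

# Counts from table 5: 4/10 at 7.50 and 3/10 at 3.75 mmole/kg/day, 0/10 in controls.
p_750 = fisher_one_sided(4, 10, 0, 10)
p_375 = fisher_one_sided(3, 10, 0, 10)
print(round(p_750, 2), round(p_375, 2))  # prints 0.04 0.11
```

Under this one-sided assumption the computed values agree with the p-values quoted in the text.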
       The application of the BMD approach was also interesting in  this case.  Neither the
 QQR model nor the QW model could fit the dose-response data when all dose groups were
 included (p-values less than  0.02).   However, dropping the highest dose group (see section
 3.4) resulted in acceptable fits for both models (table 5).  Figure 8 shows the results of
 fitting the models to the data, ignoring the highest dose group.
       Table 5 shows the BMD estimates corresponding to three levels of extra risk for the
 QQR and QW models.
       Another example of lack of fit that is not so directly accommodated is provided by a
 study of glycol ether-induced reproductive toxicity.  Miller et al. (1981) examined the effects
 of 9-day inhalation exposures (6  hours per day) to ethylene glycol  monomethyl ether
 (EGME) on testicular toxicity in rats and mice.  Toxicity was determined by measuring testes
 weights (table 6).  In both rats and mice, testes weights were significantly decreased
 following exposure to 1,000 ppm.  The NOAEL was 300 ppm for both rats and mice.
       The best-fitting CQR and CP models are shown in figures 9 and 10.  Although both
 models fit the rat data adequately, neither model adequately  describes the mouse data (table
 6).  The lack of fit to the mouse data is due primarily to the 100 ppm dose group, for which
the testes weights were larger (on average) than those of controls, and to the small amount of
variation in the observed results.
       The case of the mouse data illustrates one of the difficulties that can arise in applying
the BMD approach. The lack of fit in this case was  not due to a smaller adverse effect at the
highest dose but rather an opposite response at the low dose.  Therefore, dropping dose
groups (as discussed in this section) will not lead to an adequate fit.11

[Figure 8.  Extramedullary hematopoiesis of the spleen in rats following EGPE exposure.
 Plot against dose (mmole/kg/day) showing data points with 95% confidence bars and the
 fitted QQR and QW models (models fit without consideration of high dose group).
 Source: Katz et al., 1984]

Table 6.  EGME-Induced Testicular Toxicity in Rats and Mice

Data
                    Rats(a)                    Mice(a)
  Dose (ppm)        Average weight    SD       Average weight    SD
  0                 2.82              0.10     0.21              0.01
  100               2.88              0.05     0.23              0.01
  300 (NOAEL)       2.70              0.20     0.20              0.02
  1000              1.50              0.10     0.10              0.01

Modeling results
            Rats                              Mice
            Goodness-of-fit    BMD            Goodness-of-fit    BMD
  Model     p-value            (ppm)          p-value            (ppm)
  CQR       0.13               315            <0.01              --
  CP        0.17               184            <0.01              --

(a) Five animals in each dose group.  Reported are average testes weights (in grams) and
    standard deviations (SD) for testes weights in each dose group.

Source: Miller et al. (1981).

[Figure 9.  Testes weights in rats following EGME exposure.  Plot of testes weight against
 dose (ppm) showing data points with 95% confidence bars and the fitted CQR and CP models.
 Source: Miller et al., 1981]

[Figure 10.  Testes weights in mice following EGME exposure.  Plot of testes weight against
 dose (ppm) showing data points with 95% confidence bars and the fitted CQR and CP models.
 Source: Miller et al., 1981]

       It is not likely that alternative models will provide better fits to the mouse data, as
long as such models postulate a monotone dose response.  Models with a monotone dose
response will not be able to predict the increased testes weights in the 100 ppm group.  Biological and
toxicological considerations may dictate that a nonmonotone response pattern is feasible in
this instance, in which case one may conclude that doses of 100 ppm or less to male mice do
not result in testicular weight loss.  Alternatively,  it may be determined that the observed
variation among the responses underestimates the true variability associated with the
testicular response, in which case the predictions of the CQR and CP models may be
adequate for the application of the BMD approach.
        The estimated BMDs corresponding to a 5  percent relative decrease in testes weight in
rats were 315 and 184 ppm, respectively, for the CQR and CP models.  These two BMDs
bracket the NOAEL of 300 ppm.

3.4.2.  Additional Research
        Additional research is needed to develop guidelines and suitable options to adjust for
poor fit.  Guidance is especially needed on the biological and toxicological considerations to
apply when dropping doses.
        As discussed earlier, use of estimates of internal dose at the site of toxicity could
result in more appropriate RfDs, regardless of whether the NOAEL or BMD approach is
used.  Once additional pharmacokinetic data are available and experience is gained from
applying them to the BMD, then the EPA may develop additional guidance for applying
pharmacokinetic data in calculating RfDs.  The EPA has some experience in using such data
in estimating RfDs, but the EPA is using such data routinely for developing RfCs (e.g.,
Jarabek et al., 1989, 1990; U.S. EPA, 1990).  This effort may solve some problems of poor
fit because, as mentioned above, certain pharmacokinetic behaviors might account for
dose-response patterns that are not strictly monotone (e.g., plateaus in response rates due to
saturation of crucial metabolic pathways).

11The small standard deviations reported for all the dose group responses entail small estimates of "pure error"
  used for comparison with the error between model predictions and observations. An F test is performed,
  where the numerator represents the error for lack of fit and the denominator represents the pure error or the
  variability of the observed weights around the group-specific means. When the estimate of pure error is small
  (i.e., when standard deviations are small), deviations of the model predictions from the observations may be
  significant, even when they appear to be in fairly close agreement.
  In some instances, values may be erroneously recorded as standard deviations, when in fact they represent
  standard error of the means.  Whenever this occurs, there is more variability in the observations than
  suggested by the reported standard deviations, and the models may provide a satisfactory fit. The best
  insurance against such an error is to have available the results in individual animals. In this case, if the values
  reported by Miller et al. (1981) are actually the standard error of the means, the CQR and CP models would
  adequately fit the mouse data.

  3.5.  MEASURE OF INCREASED RISK
        Crump (1984) proposed two measures of increased response for quantal data,
  "additional risk" and "extra risk."  Additional risk is defined as
                                  AR(d) = P(d) - P(0),
 and extra risk as
                           ER(d) = [P(d) - P(0)] / [1 - P(0)].
 In these equations, P(d) is the probability of response at dose d, and P(0) is the probability of
 response in the absence of exposure (d = 0).
       Additional risk is the additional proportion of total animals that respond in the
 presence of the dose.  Extra risk is the fraction of animals that would respond when exposed
 to a dose, d,  among animals who otherwise would not respond. Extra risk is typically used
 by the EPA in risk assessments for cancer (Anderson et al., 1983).
       Extra  risk is additional risk divided by the proportion of animals that will not respond
 in the absence of exposure.  Thus, extra risk and additional risk will coincide for responses
 that do not occur spontaneously (i.e. , when background rate is zero).
       Additional risk and extra risk differ quantitatively in the manner in which they
incorporate background response.  For example, if a dose increases one type of response
from 0 percent to 1 percent and increases a second type of response from 90 percent to 91
percent, the additional risk is 1 percent in both cases.  However, the extra risk is 1 percent
in the former case and 10 percent in the latter case.
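
The arithmetic in this example can be checked directly. The sketch below implements the two definitions as stated in the text (extra risk being additional risk divided by the proportion of animals that do not respond in the absence of exposure); the function names are introduced here for illustration only.

```python
def additional_risk(p_d, p_0):
    """AR(d): increase in the probability of response over background."""
    return p_d - p_0

def extra_risk(p_d, p_0):
    """ER(d): additional risk divided by the fraction not responding at background."""
    return (p_d - p_0) / (1.0 - p_0)

# The two cases from the text: 0% -> 1% and 90% -> 91%.
cases = {"background 0%": (0.01, 0.00), "background 90%": (0.91, 0.90)}
for label, (p_d, p_0) in cases.items():
    ar = additional_risk(p_d, p_0)
    er = extra_risk(p_d, p_0)
    print(f"{label}: additional risk = {ar:.2f}, extra risk = {er:.2f}")
```

Both cases give an additional risk of 0.01, while the extra risk is 0.01 in the first case and 0.10 in the second.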
       For continuous data, Crump (1984) suggested two measures of increased response
analogous to those defined above for quantal data.  The first is the difference between the
mean response expected under exposure to dose d and the mean response expected in the
absence of exposure:
                                   | m(d) - m(0) |,
where m(d) is the mean value of the continuous measure of response for dose d. The
vertical lines are symbols for absolute value and are incorporated to allow the expression to
be applicable regardless of whether increases or decreases in the mean response are
considered adverse.
       The second measure proposed by Crump (1984) for continuous data normalized
differences in mean responses by the background mean response:

                                 |m(d)-m(0)|  /m(0).

This measure of adverse response involves the fractional change in response rather than the
absolute amount of change.
       Crump (1984) also suggested that changes in a continuous endpoint could be assessed
relative to the variability of that endpoint.  His suggestion was to measure adverse response
by
                               | m(d) - m(0) | / σ(0),
where σ(0) is the standard error of the responses in the control group.
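
As an illustration, the three measures for continuous data can be computed from group summary statistics. This is a minimal sketch using the rat testes weights from table 6 (controls and the 1,000 ppm group); the function names are hypothetical, and the use of the reported control-group standard deviation for σ(0) is an assumption made for illustration.

```python
def absolute_change(m_d, m_0):
    """|m(d) - m(0)|: absolute difference in mean response."""
    return abs(m_d - m_0)

def relative_change(m_d, m_0):
    """|m(d) - m(0)| / m(0): difference normalized by the background mean."""
    return abs(m_d - m_0) / m_0

def scaled_change(m_d, m_0, sigma_0):
    """|m(d) - m(0)| / sigma(0): difference relative to control-group variability."""
    return abs(m_d - m_0) / sigma_0

m_0 = 2.82       # mean testes weight (g), rat controls, table 6
sigma_0 = 0.10   # assumed sigma(0): reported control-group SD, table 6
m_d = 1.50       # mean testes weight (g), rats at 1,000 ppm, table 6

print(f"absolute: {absolute_change(m_d, m_0):.2f} g")
print(f"relative: {relative_change(m_d, m_0):.3f}")
print(f"scaled:   {scaled_change(m_d, m_0, sigma_0):.1f}")
```

The absolute value makes each measure usable whether increases or decreases in the mean are considered adverse.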
       None of the measures proposed above for continuous variables take into consideration
the definition of an adverse effect (e.g.,  ranges of a continuous variable indicative of
  abnormality).  Gaylor and Slikker (1990) suggested an approach for continuous data that
  would allow one to estimate the probability of an adverse effect from continuous data without
  the necessity of first categorizing the continuous responses observed (although it would still
  be necessary to conceptualize a categorization into normal  and abnormal ranges of response).
  Suppose there is a value of the response, A, that defines an adverse effect (e.g., responses
  greater than A  are considered to be adverse).  The approach of Gaylor and Slikker calls for
  dose-response modeling of the continuous data, followed by conversion of the mean and
 variance estimates to statements about the probability of observing adverse effects (e.g.,
 effects greater than A) at given dose levels.
        To implement the approach, one first fits a dose-response model to the observed
 continuous endpoints and obtains estimates of the mean value of the response at a dose d,
 m(d), and the standard deviation for the observations at that dose, σ(d).  Then the
 probability, P(d), of an adverse response at dose d can be computed as

                           P(d) = Probability (RESPONSE > A).

 This probability can be computed from knowledge of the mean, m(d), and standard
 deviation, σ(d).  By using these probabilities, the equations for additional risk, AR, or extra
 risk, ER, for quantal responses can be applied in the subsequent steps of the BMD approach.
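
The conversion from a fitted mean and standard deviation to a probability of an adverse response can be sketched as follows, assuming normally distributed responses and an effect defined as adverse when the response exceeds A. All numerical values (means, standard deviations, and the cutoff A) are hypothetical and chosen only for illustration; they do not come from any study discussed here.

```python
from statistics import NormalDist

def prob_adverse(mean, sd, cutoff):
    """P(response > cutoff) for a normally distributed response."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(cutoff)

def extra_risk(p_d, p_0):
    """Extra risk as defined for quantal responses."""
    return (p_d - p_0) / (1.0 - p_0)

# Hypothetical values: control mean 100, SD 10, adverse if response > 120.
A = 120.0
p_0 = prob_adverse(mean=100.0, sd=10.0, cutoff=A)   # background probability
p_d = prob_adverse(mean=110.0, sd=10.0, cutoff=A)   # probability at some dose d

print(f"P(0) = {p_0:.4f}, P(d) = {p_d:.4f}, extra risk = {extra_risk(p_d, p_0):.4f}")
```

For an endpoint where decreases are adverse (such as testes weight), the same idea applies with the lower tail, that is, the probability that the response falls below the cutoff.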
       To use the approach suggested by Gaylor and Slikker (1990), one must assume a
 normal distribution for the continuous endpoints.12 The need to assume some specific
 distribution is not  a disadvantage because a distribution must be assumed whenever a model
 is fit to continuous data (see appendix).  An advantage of this approach is that it allows a
 common measure of adverse response to be used with both quantal and continuous data.
 Another advantage is that, unlike the data needed to define categorical responses from
continuous data, the data necessary for implementation of this approach are likely to be
summarized in a published report.

12The method could readily be generalized to a non-normal distribution by replacing m(d) and σ(d) with the
 mean and standard deviation of that distribution.  However, the data needed for efficient estimation of the
 parameters of a non-normal distribution generally will not be summarized in a published report of a study.
3.5.1.  Examples
       In examples presented in sections 3.3.1 and 3.4.1 that used quantal responses (see
tables 3 and 5), extra risk was the measure of altered response used for BMD calculation.  In
the example of EGME-induced testicular toxicity (table 6), for which responses were
measured on a continuous scale, the measure of altered response used was relative change in
weight (absolute change in mean testes weight normalized by the mean background [control]
testes weight).
       Consider the case  of maternal effects induced by sulfamethazine during pregnancy.
As part of a developmental toxicity study of sulfamethazine, the National Center for
Toxicological Research (NCTR) conducted a preliminary study to determine the toxicity of
that compound to pregnant animals (NCTR, 1981).  Sulfamethazine was administered to CD
rats at  seven dose levels on gestation days 6 through 15. The maternal weight gain data for
the entire gestational period are shown in table 7. Weight gains were decreased at the three
highest doses.  Weight gains in the four lowest dose groups,  though larger than in controls,
were not statistically different from controls.  NCTR reported a significant trend for
decreased weight gain, as tested by Jonckheere's test.  Application of a procedure for
determining trends for continuous endpoints based on the CP model (see appendix)
established 600 mg/kg/day as the NOAEL.  Both the CQR and CP models fit the data very
well (figure 11). The BMDs estimated from these models are displayed in table 8.  Shown
in table 8 are BMD estimates for two dose-response models and two measures of altered
response (as well as three levels for the BMR and three confidence limit sizes; these are
discussed below).
       For both the CQR and CP models, the estimate of the BMDs depended greatly on the
measure of risk. The differences across the two measures of risk were greater for the CP
model  (especially for smaller BMRs and for the larger confidence limits).
       The results for the two models were most comparable when the absolute differences
in the means were normalized by the background mean (and when either the BMR was 5
percent or greater or the confidence limit size was less than or equal to 95 percent).
Normalizing by background response rates makes the BMD less dependent on the specific
model used.
Table 7.  Gestational Weight Gains in Pregnant Rats

  Dose (mg/kg/day)    Average weight gain    SD      N
  0                   118.6                  24.7    13
  75                  126.4                  14.8    7
  150                 130.6                  10.5    6
  300                 125.1                  8.2     6
  450                 122.8                  10.6    6
  600 (NOAEL)         117.4                  14.1    5
  900                 100.0                  20.1    6
  1200                75.2                   58.9    4

SD = standard deviation,
N = number of pregnant animals for which weight gains were determined.

Source:  NCTR (1981).

[Figure 11.  Weight gain during gestation in rats exposed to sulfamethazine.  Plot of
 weight gain against dose (mg/kg/day) showing data points with 95% confidence bars and
 the fitted CQR and CP models.  Source: NCTR, 1981]

Table 8.  BMDs (mg/kg/day) Calculated for Sulfamethazine Data

                                                        Confidence limit size
  Model            Measure of risk             BMR      90%       95%       99%
  CQR              Absolute difference of      10       49.0      47.3      44.6
  (p = 0.73)(a)    means                       5        34.7      33.5      31.5
                                               1        15.5      15.0      14.1
                   Absolute difference of      10       558.0     540.0     510.0
                   means normalized by         5        395.0     382.0     361.0
                   background mean             1        176.0     171.0     161.0
  CP               Absolute difference of      10       28.9      18.0      5.29
  (p = 0.70)(a)    means                       5        19.0      11.2      2.82
                                               1        7.18      3.71      0.655
                   Absolute difference of      10       533.0     491.0     405.0
                   means normalized by         5        355.0     311.0     226.0
                   background mean             1        136.0     105.0     54.5

(a) p-values for goodness-of-fit.

3.5.2.  Additional Research
        It is not clear when measures expressed relative to background (e.g., extra risk and
absolute differences in means divided by background means) are preferable to measures
expressed as absolute changes.  Additional research is required to provide guidance regarding
the measure of altered response that is most appropriate in particular instances.
       The method described by Gaylor and Slikker (1990) permits a BMD to be calculated
from response probabilities irrespective of whether the underlying data are quantal or
continuous.  Although the method is conceptually sound, the statistical methodology needed
for calculating confidence limits needs to be presented and computer software to implement
the methodology needs to be developed.  Support for these implementations and
investigations of properties of the approach is needed.  Particular aspects of the method that
need to be addressed include questions regarding the definition of normal and abnormal
ranges (whether based on professional, toxicological judgment or defined in terms of
variability in the control, or other background, populations). Also of particular importance
are methods for determining probabilities of being abnormal that are based on confidence
limits rather than maximum likelihood estimates.

3.6.  SELECTION OF A BENCHMARK LEVEL OF RISK
       The BMD is a lower statistical confidence limit on the dose corresponding to a
specified level of risk called the benchmark risk, or BMR.  Thus, before calculating a BMD,
the BMR must first be specified.  Several considerations may influence the selection of a
BMR. The first consideration is that, when used  for determining the RfD, the BMD is used
like the NOAEL.  This suggests that the BMR should be selected near the low end of the
range of increased  risks that can be detected in a bioassay of typical size.  Comparison of the
BMD with the NOAEL for a large number of developmental toxicity data sets indicated a
BMR in  the range  of 5 to 10 percent resulted in a BMD that was on average similar to the
NOAEL (Allen et  al.,  1994a, b; Faustman et al., 1994).
       Another consideration is that an important goal of the BMD  approach is that the
approach be relatively model independent; that is, different dose-response models that fit the
data should give comparable estimates of the BMD. However, it is well known that different
 mathematical dose-response models can fit data equally well and yet produce widely
 divergent estimates of risk at doses far below the range that produce measurable increases in
 response (Crump, 1985).  Thus, for the BMD approach to be relatively model independent,
 the BMR cannot be much smaller than increased responses that can be measured reliably in
 experimental groups of typical size.
       Some simple quantitative considerations can provide guidance with respect to the
 setting of the BMR.  Consider a quantal response in a relatively large dose group of 100
 animals and suppose that the observed response rate is 1 percent.  A 95 percent confidence
 interval for the true rate of response ranges from about 0.025 percent to 5.4 percent. (A confidence
 interval for the difference between the rate in this group and that in a control group would be
 even larger.)  This illustrates the fact that increased responses of 1 percent or less cannot be
 measured with much precision in bioassays of typical size; that is, a BMR below 1 percent
 would be expected to be outside the range of risks that could be measured accurately in
 typical experiments.
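
The width of such an interval can be examined with a short calculation. The sketch below assumes an exact (Clopper-Pearson) binomial interval; the text does not state which interval method was used, so the limits computed here need not match the quoted figures exactly. The helper names are introduced for illustration only.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(x, n, level=0.95):
    """Two-sided exact binomial confidence interval for x responders out of n."""
    alpha = 1 - level
    # Lower limit solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, x / n)
    # Upper limit solves P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, x / n, 1.0)
    return lower, upper

lower, upper = exact_interval(1, 100)
print(f"{100 * lower:.3f}% to {100 * upper:.2f}%")
```

With one responder among 100 animals, the upper limit is more than five times the observed rate, which is the imprecision the paragraph describes.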
       Various papers (Crump, 1984; Dourson et al.,  1985; Kimmel and Gaylor, 1988;
 Gaylor, 1989; Allen et al., 1994a, b) have proposed a BMR for quantal responses in the
 range of 1 percent to 10 percent.  Less attention has been given to corresponding levels for
 continuous effects (Kavlock et al.,  1995).  If the approach of Gaylor and Slikker (1990) (see
 section 3.5) is used for continuous effects, then it may be possible to use the same BMR for
 continuous responses as for quantal responses.

 3.6.1.  Examples
       In  the example of EGPE-induced toxicity in the spleen of rats (section 3.4.1, table 5),
 BMDs were calculated for BMRs of 10 percent, 5 percent, and 1 percent.  For the two
 quantal models examined, the BMD estimates differed by slightly more than a factor of 2 for
 10 percent extra risk. There was less agreement at lower risk levels, and at an extra risk of
 1 percent, BMDs from the two models differed by a factor of 7.3.  For the QQR model, the
BMDs corresponding to 10 percent and 5 percent extra risk bracket the NOAEL.  All BMDs
calculated for the QW model fall below the NOAEL, with the BMD for 10 percent risk
being about one-half the NOAEL value.
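
The comparisons in this example follow directly from the BMDs in table 5, as the short sketch below shows. The dictionary layout and variable names are illustrative only.

```python
# BMDs (mmole/kg/day) from table 5, keyed by model and extra risk (percent).
bmd = {
    "QQR": {10: 2.24, 5: 1.56, 1: 0.69},
    "QW": {10: 0.99, 5: 0.48, 1: 0.094},
}
noael = 1.88  # mmole/kg/day, table 5

# Ratio of QQR to QW BMDs: a rough indicator of model dependence at each BMR.
ratios = {bmr: bmd["QQR"][bmr] / bmd["QW"][bmr] for bmr in (10, 5, 1)}
for bmr, ratio in ratios.items():
    print(f"BMR {bmr:>2}%: QQR/QW ratio = {ratio:.1f}")

# BMD at 10 percent extra risk relative to the NOAEL for each model.
for model in bmd:
    print(f"{model}: BMD10 / NOAEL = {bmd[model][10] / noael:.2f}")
```

The ratio grows from about 2.3 at 10 percent extra risk to about 7.3 at 1 percent, which illustrates the increasing model dependence at lower BMRs.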
      In the example of sulfamethazine-induced effects on the continuous variable,
gestational weight gain (table 7), BMDs were calculated for three levels of the BMR, 10
percent, 5 percent, and 1 percent (table 8).  For the two models considered and for each of
the measures of risk, the results were more similar across models (i.e., there was greater
model independence) when the BMR was 5 percent or greater.

3.6.2. Additional Research
      One of the desired features of the BMD approach is that,  because extrapolation far
beyond the range of the data is avoided, the procedure should be relatively independent of
the dose-response model used.  The extent to which this is the case depends in part on the
BMR selected.  As lower BMRs are  used, the corresponding BMDs should become more
model dependent because one is extrapolating further beyond the range of the data.  This was
observed in the examples.  However, as observed  in the examples, there will be some
divergence in BMDs regardless of the BMR selected.  The goal in selecting the BMR is to
make it as small as practical without the BMD becoming too model dependent.  Although a
BMR of 1 percent to 10  percent has  been recommended by various authors (Crump,  1984;
Kimmel and Gaylor, 1988;  Gaylor, 1989), there has been no systematic study of data from a
number of chemicals to determine how model dependent the BMD is for various values of
the BMR; however, the  studies by Allen et al. (1994a, b) and Faustman et al. (1994) suggest
that BMDs from several  models were similar for several developmental toxicity endpoints.
A systematic study of this kind could provide a more definitive basis for
selecting a BMR and could evaluate the model uncertainty at the recommended BMR. It also
could provide experience on the performance of various models and information on how well
models fit data and what problems might arise from their application.  A related open
question is how much a nonrandom or nonbinomial distribution of responses affects
confidence limit calculation (especially for biphasic distributions).

3.7.   CONFIDENCE LIMIT CALCULATION
      Decisions to be made in calculating a lower confidence limit for the dose
corresponding to the BMR involve selecting the procedure for calculating confidence limits
and the size of the confidence limits. Recall that the  BMD is defined to be the lower
 confidence limit on dose corresponding to the BMR.  The lower limit, as opposed to the
 maximum likelihood estimate, is used for several reasons, the foremost being that statistical
 confidence limits are influenced by the sample size of an experiment.  That NOAEL
 judgments generally do not account for sample sizes is one major criticism of the NOAEL.
 Other factors that make the lower confidence limit preferable to the maximum likelihood
 estimate include the facts that the lower limit will be more stable to minor changes in the
 data and that the lower limit  may be estimable even in some cases where the maximum
 likelihood estimate is not.
       Confidence limits based on maximum likelihood theory have a number of desirable
 statistical properties (Cox and Lindley, 1974).  Maximum likelihood methods can derive
 confidence limits either from the asymptotic distribution of the parameter estimates
 themselves or the asymptotic distribution  of the likelihood ratio statistic (Cox and Lindley,
 1974).  Crump and Howe (1985) found that the latter approach (described in the appendix)
 appeared to have superior statistical qualities in dose-response applications. This approach is
 incorporated into GLOBAL 82 (Howe and Crump,  1982), the computer program that has
 been used by the EPA for dose-response modeling for cancer.
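
The likelihood-ratio approach to a lower confidence limit on dose can be sketched in a few lines. The example below is illustrative only: it assumes a one-parameter one-hit model, P(d) = 1 - exp(-q*d), with zero background, which is not one of the models (QQR, QW) used in this report and is not the GLOBAL 82 implementation. It uses the table 5 data with the highest dose group dropped, and all function names are hypothetical.

```python
import math

# Table 5 data with the highest dose group dropped (dose in mmole/kg/day).
DOSES = [0.0, 1.88, 3.75, 7.50]
AFFECTED = [0, 0, 3, 4]
TESTED = [10, 10, 10, 10]

def log_likelihood(q):
    """Binomial log-likelihood under the assumed one-hit model P(d) = 1 - exp(-q*d)."""
    total = 0.0
    for d, x, n in zip(DOSES, AFFECTED, TESTED):
        if x > 0:
            total += x * math.log(1.0 - math.exp(-q * d))
        total -= (n - x) * q * d  # log(1 - P(d)) equals -q*d
    return total

def argmax_unimodal(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def upper_limit_on_slope(q_hat, drop, hi=2.0, iters=200):
    """Bisection for the q above q_hat where the log-likelihood has fallen by `drop`."""
    target = log_likelihood(q_hat) - drop
    lo = q_hat
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_likelihood(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

BMR = 0.10                 # 10 percent extra risk
CHI2_ONE_SIDED_95 = 2.706  # 1.645**2: chi-square (1 df) cutoff for a one-sided 95% limit

q_hat = argmax_unimodal(log_likelihood, 1e-6, 2.0)
q_upper = upper_limit_on_slope(q_hat, CHI2_ONE_SIDED_95 / 2.0)

# With zero background, extra risk equals P(d), so dose = -ln(1 - BMR) / q.
dose_mle = -math.log(1.0 - BMR) / q_hat
dose_lower = -math.log(1.0 - BMR) / q_upper
print(f"MLE of dose at BMR: {dose_mle:.2f}; lower 95% limit: {dose_lower:.2f}")
```

The upper limit on the slope parameter corresponds to the lower limit on dose. Because this sketch uses a different model, its output is not expected to reproduce the BMDs in table 5.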
       The size of statistical confidence limits ranges from 90 percent to 99 percent in most
 applications.  Instead of being based  on scientific rationale, this range seems to be purely
 conventional.  The EPA has generally employed one-sided 95 percent confidence limits in
 risk assessments  for cancer effects (Anderson et al., 1983).

 3.7.1 Example
       In the example of sulfamethazine effect on weight gain during  pregnancy (section
 3.5.1, tables 7 and 8), BMDs were calculated for three sizes of confidence limits.  For the
 CQR  model, the choice of confidence limit size had very little impact on the BMD estimates.
For the CP model,  however,  the choice of confidence limit size was much more important,
especially when absolute difference in the means was used as the measure of risk. The
importance  of confidence limit size with the CP model increased as BMR decreased; e.g.,
the BMD estimate for the 1 percent BMR was more sensitive to the choice of confidence
limit size than was  the estimate for the 5 percent BMR.
       The results for the two models were most comparable when the absolute differences
in the means were normalized by the background mean (and when either the BMR was 5
percent or greater or the confidence limit size was less than or equal to 95 percent). This
suggests that not only will the BMD estimates be model dependent for low levels of risk, but
that they may also be model dependent when wide confidence limits are calculated.
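
One way to summarize the sensitivity described in this example is the ratio of the BMD at the 90 percent confidence limit size to the BMD at the 99 percent size. The sketch below computes that ratio from the values in table 8; the choice of this ratio as a summary, and the data layout, are assumptions made for illustration.

```python
# BMDs (mg/kg/day) from table 8: (model, measure, BMR) -> (90%, 95%, 99%).
TABLE_8 = {
    ("CQR", "absolute", 10): (49.0, 47.3, 44.6),
    ("CQR", "absolute", 5): (34.7, 33.5, 31.5),
    ("CQR", "absolute", 1): (15.5, 15.0, 14.1),
    ("CQR", "normalized", 10): (558.0, 540.0, 510.0),
    ("CQR", "normalized", 5): (395.0, 382.0, 361.0),
    ("CQR", "normalized", 1): (176.0, 171.0, 161.0),
    ("CP", "absolute", 10): (28.9, 18.0, 5.29),
    ("CP", "absolute", 5): (19.0, 11.2, 2.82),
    ("CP", "absolute", 1): (7.18, 3.71, 0.655),
    ("CP", "normalized", 10): (533.0, 491.0, 405.0),
    ("CP", "normalized", 5): (355.0, 311.0, 226.0),
    ("CP", "normalized", 1): (136.0, 105.0, 54.5),
}

# Ratio of the 90% BMD to the 99% BMD; values near 1 indicate low sensitivity.
sensitivity = {key: bmds[0] / bmds[2] for key, bmds in TABLE_8.items()}

for (model, measure, bmr), ratio in sensitivity.items():
    print(f"{model:<3} {measure:<10} BMR {bmr:>2}: 90%/99% ratio = {ratio:.2f}")
```

For the CQR model every ratio is close to 1.1, whereas for the CP model with the absolute difference of means the ratio rises from about 5.5 at a BMR of 10 to about 11 at a BMR of 1.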

3.7.2.  Additional Research
       The appearance of the interactions between BMR, BMD, and confidence limits
highlights two features: the care with which one must consider the options for all of the
decision points, and the need for additional research to investigate the interrelationships
among the decisions.  As an extension to the research suggested in section 3.6.2, one should
also consider the impact of the size of the confidence limits on the model independence of
the BMD approach.  It is clear that this cannot be done in isolation from the choices
concerning the BMR level.  Some guidelines for selecting confidence limit size also could be
developed that consider the adequacy (from a health-protective policy perspective) of
confidence limits of various sizes.

3.8.   CHOOSING AN APPROPRIATE BMD
       Depending on the models and the responses analyzed, the procedures discussed to this
point may yield a single BMD, multiple BMDs calculated from applying multiple models to
individual responses, multiple BMDs calculated from different responses in a single study, or
multiple BMDs calculated from different studies.
       Multiple BMDs may arise when different models fit the data for a single response in a
single study. Different BMDs could also come from a single study if more than one
response is modeled.  Selecting any BMD other than the smallest one from that study might
lead to an RfD that is not protective against the effect corresponding to the smallest BMD.
Different BMDs could arise for the same response from different studies.  Potential
differences among studies with regard to species of animal studied, dosing patterns, and other
features of experimental design make it difficult to  specify a general  rule that would be
applicable in all situations.

  3.8.1.  Examples
        In the examples discussed above, BMDs for a single endpoint in a single study have
  been calculated using two different models (tables 3, 5, 6, and 8).  Because it may not be
  possible to eliminate one model from consideration  (because of lack of fit, inappropriate
  statistical assumptions, or biological considerations; see section 3.3), some judgment must be
  made about the treatment of the pairs of BMDs arising from the two models.
        Consider the example presented in table 3.  Two options for dealing with multiple
  BMDs from a single endpoint can be illustrated.  The first is to use the smallest of the
  BMDs, which in this case is 0.31 mg/kg/day.  The  second option is to combine the
  estimates.   If a geometric average is used, the resulting BMD  estimate for acrylamide-
  induced nerve degeneration is 0.51 mg/kg/day.  For the sake of this example, attention is
 limited to the two models, QQR and QW.
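
The two options can be expressed compactly. Because the individual BMDs from table 3 are not repeated in this section, the sketch below uses the 10 percent extra risk BMDs from table 5 (QQR and QW models) purely as stand-in inputs; the function names are hypothetical.

```python
from math import prod

def smallest_bmd(bmds):
    """Option 1: take the smallest BMD across models."""
    return min(bmds)

def geometric_mean_bmd(bmds):
    """Option 2: combine the BMDs with a geometric average."""
    return prod(bmds) ** (1.0 / len(bmds))

# Stand-in inputs: BMDs (mmole/kg/day) at 10% extra risk from table 5.
bmds = [2.24, 0.99]

print(f"smallest: {smallest_bmd(bmds):.2f}")
print(f"geometric mean: {geometric_mean_bmd(bmds):.2f}")
```

The geometric average always lies between the smallest and largest inputs, so the first option is never less conservative than the second.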

 3.8.2.  Additional Research
        Determining how to  deal with multiple  BMDs requires  more extensive discussion.
 Current RfD/RfC workgroup policies for selecting the "critical effect" provide a beginning
 basis for developing some guidelines. Additional work needs to describe how to use the
 BMD approach to develop endpoint-specific RfDs/RfCs and characterize RfDs based on
  multiple endpoints. Specific scientific criteria, such as mechanistic information, may be
 particularly useful in choosing endpoints.

 3.9.  UNCERTAINTY FACTORS
       Once a unique BMD  is selected, an RfD is obtained by  dividing the BMD by one or
 more uncertainty factors.   This same step is required in the NOAEL approach, but the
 uncertainty factors are applied to the NOAEL rather  than to the BMD.
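
The calculation itself is a single division, sketched below. The BMD and the uncertainty factors shown are hypothetical inputs chosen for illustration (the BMD is the QW value at 10 percent extra risk from table 5); they are not a recommended set of factors.

```python
from math import prod

def reference_dose(bmd, uncertainty_factors):
    """RfD obtained by dividing the BMD by the product of the uncertainty factors."""
    return bmd / prod(uncertainty_factors)

# Hypothetical inputs: a BMD of 0.99 mmole/kg/day and two factors of 10.
rfd = reference_dose(0.99, [10, 10])
print(f"RfD = {rfd:.4f} mmole/kg/day")
```

The same function applies unchanged in the NOAEL approach, with the NOAEL passed in place of the BMD.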
       The uncertainty factors that used to be routinely applied to NOAELs (table 1) have
been criticized as being arbitrary, but it is more appropriate to  consider them imprecise.
Limited publications on the application of uncertainty factors to BMDs suggest uncertainty
factors to account for within-human and animal-to-human variability,  the severity of the
modeled effect, and the slope of the dose-response curve (Dourson et al.,  1985).  A recent
paper also suggests that the BMD at 10 percent will frequently be near the LOAEL (Farland
and Dourson, 1992).
       New approaches to the definition and calculation of uncertainty factors are being
investigated (Hattis and Lewis, 1992; Lewis et al.,  1990; Dourson et al., 1992; Renwick,
1991, 1993). This work should be applicable to BMDs as  well as to NOAELs, but it should
be noted that, unlike the NOAEL, the calculation of the BMD depends on the BMR as well
as the size of the statistical confidence bound employed.  These additional considerations may
need to be accounted for when selecting uncertainty factors for BMDs.
        Some biological considerations (e.g., relating to the possibility of a threshold for the
responses under investigation) could affect the selection of  uncertainty factors.  The manner
in which these considerations affect uncertainty factors is unclear at present.
       Kimmel and Gaylor (1988) presented another option for determining acceptable doses
that specifies a  level of extra risk (e.g.,  10^-5) that is deemed to be  sufficiently health
protective.  They used the upper confidence limit of the dose-response curve to estimate a
lower confidence limit on the effective dose (ED) for a given level of response (i.e.,  the
lower confidence limit on the ED to produce a 10 percent response is the LED10).  The
LED10 is equivalent to the BMD for a 10 percent response  as proposed by Crump (1984).
Adjustment factors (F) were then applied to the LED10 to achieve a specific level of excess
risk (e.g., 10⁻⁴ to 10⁻⁵). Thus, to achieve a risk level of 10⁻⁵ a factor of 1,000 would be
applied to the LED10.  This approach assumes that animal risk is approximately equal to
human risk. Thus, if the true dose response was linear, then the excess human risk level
would be no greater than the lower limit on the dose corresponding to a risk of 10⁻⁵.
However, if the dose response is highly nonlinear so that a threshold exists and if the LED10
is below the threshold, then  the true human risk would be zero and this approach would be
highly conservative.
       This option is equivalent to the approach proposed by Mantel and Schneiderman
(1975) for cancer toxicity and is similar to extrapolating to the presumed human risk with a
linear dose-response function (e.g., the linearized multistage approach) that the EPA applies
to cancer data (Anderson et al., 1983).  In the context of the EPA's current approach to
calculating the RfD, this option is based on a different philosophy, which does not consider
 adjustments for intra- and interspecies differences, missing data, or other factors explicitly.
 In addition, it requires value judgments as to what level of excess risk should be considered
 acceptable in the context of these uncertainties.
3.9.1.  Example
        Consider the example in section 3.3.1 of acrylamide neurotoxicity (table 3).  If the
uncertainty factors applied are the same as those used in the current NOAEL/RfD approach, the
factors that might be relevant are a factor of 10 for animal-to-human extrapolation, another
factor of 10 for human variability, and another factor of 10 because it is not known whether
the electron microscopic changes occurring at the LOAEL also occur at the NOAEL.  These
yield a total uncertainty factor of 1,000.  Application of this uncertainty factor to the two
BMD estimates shown in table 3 yields RfDs of 0.31 µg/kg/day and 0.83 µg/kg/day. If an
average of the two BMDs were selected (see the example in section 3.8.1), then the resulting
RfD would be 0.5 µg/kg/day.
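The arithmetic in this example can be checked with a short sketch. The BMD values used here are back-calculated from the RfDs quoted above rather than read from table 3, the factor names are illustrative labels only, and treating the "average" as a geometric mean is an assumption (it is one of the combining options listed in table 9).

```python
from math import prod, sqrt

# Uncertainty factors named in the example; the keys are illustrative labels.
uncertainty_factors = {
    "animal_to_human": 10,
    "human_variability": 10,
    "effects_at_noael_unknown": 10,
}
total_uf = prod(uncertainty_factors.values())  # 10 * 10 * 10

# Assumed BMDs in mg/kg/day, back-calculated from the RfDs quoted in the text.
bmds_mg = [0.31, 0.83]


def rfd_ug_per_kg_day(bmd_mg, factor):
    """Divide a BMD (mg/kg/day) by a factor and express the result in ug/kg/day."""
    return bmd_mg * 1000.0 / factor


rfds = [rfd_ug_per_kg_day(b, total_uf) for b in bmds_mg]

# Geometric mean of the two BMDs (an assumption about how the average is taken).
average_bmd = sqrt(bmds_mg[0] * bmds_mg[1])
rfd_from_average = rfd_ug_per_kg_day(average_bmd, total_uf)

print(total_uf, rfds, round(rfd_from_average, 1))
```

Under these assumptions the sketch reproduces the total factor of 1,000 and the three RfDs given in the example.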
       If the option for factor selection described by Kimmel and Gaylor (1988) were to be
used, one would have to determine a level of risk that is sufficiently low to provide safety for human
populations. Suppose that for the endpoints under consideration in this example a risk level
of 10⁻⁴ is acceptable.  Then the factor that is applied to the BMDs is determined by the ratio
of the BMR (in this case, 0.05) and the acceptable level of risk.  Thus, the factor selected in
this case would be 0.05/10⁻⁴ = 500.  The RfDs calculated using this approach would be 0.6
µg/kg/day and 2 µg/kg/day (for the BMDs in table 3) or 1 µg/kg/day (if the average of those
BMDs were used).
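As a rough check on the risk-based factor, the ratio of the BMR to the acceptable risk can be computed directly. The BMDs below (0.31 and 0.83 mg/kg/day) are assumed values back-calculated from the RfDs in the preceding paragraph of this example, and the variable names are illustrative.

```python
# Risk-based factor in the style of Kimmel and Gaylor (1988): BMR / acceptable risk.
bmr = 0.05               # benchmark response used in this example
acceptable_risk = 1e-4   # risk level supposed acceptable in this example
risk_based_factor = bmr / acceptable_risk

# Assumed BMDs in mg/kg/day (back-calculated, not taken from table 3).
bmds_mg = [0.31, 0.83]

# RfDs in ug/kg/day before rounding.
rfds_ug = [b * 1000.0 / risk_based_factor for b in bmds_mg]

print(risk_based_factor, rfds_ug)
```

The unrounded values, about 0.62 and 1.66 µg/kg/day, round to the 0.6 and 2 µg/kg/day reported in the text.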
       The NOAEL derived from these data is 0.5 mg/kg/day.¹³  Applying an uncertainty
factor of 1,000 to that NOAEL yields an RfD of 0.5 µg/kg/day.  That value is the same as
the RfD calculated using the average BMD and the same uncertainty factor as the NOAEL
approach.  The RfDs calculated from the BMDs using the Kimmel and Gaylor (1988)
approach to deriving uncertainty factors are slightly larger than the RfD based on the
NOAEL.

¹³ Although Johnson et al. (1986) reported that the high-dose group experienced significantly greater mortality
 than the controls, the data reported in the manuscript are not adequate for conducting a mortality-adjusted test.
 However, the authors noted that the Mantel-Haenszel test showed a significant dose-related trend in
 degeneration of tibial nerves when applied to all the dose groups, and they stated that the degeneration results
 for doses of 0.5 mg/kg/day and below were "comparable to controls." The Mantel-Haenszel test applied to
 the data without adjustment for survival differences was not significant when the highest dose group was
 ignored. From such information we conclude that 0.5 mg/kg/day is the NOAEL for tibial nerve degeneration
 in male rats.

3.9.2.  Additional Research
       The uncertainty factors applied to a NOAEL to calculate an RfD have been applied
extensively for a number of years. The "traditional" factors (the first four of table 1) are
based on deliberation and debate by toxicologists over a number of years.  They reflect a
large collection of informed judgment that is a continuing subject for research (Calabrese,
1985; Hattis et al.,  1987; Hattis and Lewis, 1992; Lewis et al., 1990; Dourson et al.,  1992;
Renwick, 1991, 1993).  A similar process of building scientific consensus has recently begun for
BMDs (Barnes et al., 1994).
       The EPA will continue to promote public discussion of the roles of the dose-response
slope and  of biological considerations (e.g., the likelihood of thresholds) in determining
appropriate uncertainty factors. This will include discussion of applications of the BMD
procedure to data from a variety of chemicals with  a variety of toxic endpoints.  Investigating
uncertainty factors in the context of the BMD approach complements recent work that
reconsiders the uncertainty factors used in the NOAEL approach.

3.10.  SUMMARY OF BMD DECISIONS
       The decisions required in implementing the BMD approach were presented, along
with some of the available options for each of the decisions, including options proposed in
the literature.  Options for each of the decisions are summarized in table 9.  By no means do
these exhaust all the possibilities. The options presented were selected because they were
judged to  have scientific merit, seemed reasonable,  or have a history of use.

Table 9.  Summary of Decisions and Options for BMD Approach

Decision                               Options

1.  Selection of studies               a.  All relevant, high-quality studies
                                       b.  A single, "critical" study

2.  Selection of responses             a.  All responses from selected studies
                                       b.  Responses observed at LOAEL

3.  Format of data                     a.  Convert continuous data to categorical data
                                       b.  Transform continuous data (e.g.,
                                           log-transformation)
                                       c.  Retain original, continuous format

4.  Mathematical model(s)              a.  All models with adequate fit to the data
                                       b.  Models with most appropriate statistical
                                           assumptions
                                       c.  Models most appropriately reflecting
                                           biological considerations (e.g., threshold)
                                       d.  Models satisfying combinations of a-c

5.  Handling lack of fit               a.  Try more flexible model(s)
                                       b.  Omit high-dose data if lack of fit is due to
                                           those data
                                       c.  Use measure of internal dose

6.  Measure of altered response
       Quantal data                    a.  Additional risk
                                       b.  Extra risk
       Continuous data                 a.  Absolute difference in means
                                       b.  Absolute difference in means normalized by
                                           background mean
                                       c.  Absolute difference in means normalized by
                                           background standard error
                                       d.  Gaylor and Slikker (1990) approach with
                                           additional risk
                                       e.  Gaylor and Slikker (1990) approach with
                                           extra risk

7.  BMR definition                     a.  1 percent to 10 percent risk

8.  Confidence limit calculation
       Method                          a.  Likelihood theory, based on asymptotic
                                           distribution of likelihood ratio statistic
                                       b.  Likelihood theory, based on asymptotic
                                           distribution of parameter estimates
       Size                            a.  90 percent to 99 percent

9.  Specific BMD for RfD calculation
       Multiple BMDs for a single      a.  Select smallest BMD
       endpoint                        b.  Combine BMDs (e.g., geometric average)
       Multiple BMDs from a single     a.  Select smallest BMD
       study
       Multiple BMDs from multiple     a.  Select smallest BMD
       studies                         b.  Average BMDs for different species and/or
                                           sexes
                                       c.  Use most appropriate species and/or sex

10. Uncertainty factors                a.  Use same factors as used in NOAEL
                                           approach
                                       b.  Use NOAEL factors modified by average
                                           ratio, BMD/NOAEL
                                       c.  Use risk-based factors (Kimmel and Gaylor,
                                           1988)
                                       d.  Use factors dependent on choice of BMR
                                           and confidence limit size
                                       e.  Use factors that consider dose-response
                                           slope and/or biological considerations

        4.  DETAILED COMPARISON OF NOAEL AND BMD APPROACHES

4.1.  CONCEPTUAL BASIS
       A NOAEL for an experiment (if one exists) is an experimentally determined exposure
level at which there is no statistically or biologically significant increase in the frequency or
severity of adverse effects between the exposed population and its appropriate control.  Some
effects may be produced at this level, but they are not considered adverse or precursors to
adverse effects.  In an experiment with several NOAELs, the regulatory focus is primarily on
the highest one, leading to the common usage of the term NOAEL as the highest exposure
without adverse effect (U.S.  EPA,  1992). The NOAEL has sometimes been referred to as
an "experimental threshold."  Instead, the LOAEL should be considered the "experimental
threshold" but should not necessarily be considered an estimator of a biological threshold (if
one exists).  The definition of the LOAEL implies that it is the lowest experimental dose
associated with an  adverse effect.  It is also the case that a NOAEL represents a  dose at
which there is no significant  change (from control) in response.  There may, in fact, be some
instances where effects are seen at the NOAEL but not at a statistically or biologically
significant level.
       The NOAEL traditionally  has been used for effects that are expected to have a
threshold.  On the  other hand, use of mathematical dose-response models has generally been
reserved for effects, particularly cancer effects,  that are considered not to have a threshold.
Conceptually, there is no reason why mathematical dose-response models cannot  be applied
to threshold effects as well as nonthreshold effects. A threshold can be incorporated into a
model as a parameter, and the value of the threshold can be estimated. In fact, several dose-
response models (QLR, QQR, CLR, and CQR)  listed in table 4 for  use in the BMD
approach explicitly incorporate a threshold dose, d0.  The implications of different threshold
estimates for choosing BMD  models and of different models for estimating thresholds have
not yet been investigated but  deserve much additional study because estimates of threshold
are likely to vary widely,  depending on model choice.
      Further, when calculating a BMD  using a dose-response model, it is not strictly
necessary that threshold effects be modeled with threshold models and nonthreshold effects
with nonthreshold models because in the calculation of a BMD, the mathematical model is
used only to estimate doses corresponding to a given level of increased response (the BMR).
Thus, even if a threshold exists for an effect, the dose-response model is used for prediction
only at doses above the threshold.
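The point can be illustrated with a minimal sketch of a quantal model that carries a threshold parameter. The functional form (background response below the threshold d0, exponential rise above it) is an assumed stand-in for the threshold models of table 4, the parameter values are hypothetical, and the dose computed is only the central estimate for the chosen BMR; a BMD proper would be a lower confidence limit obtained from a fitted model.

```python
from math import exp, log


def response(dose, background, slope, threshold):
    """Assumed threshold model: background response at or below the threshold."""
    if dose <= threshold:
        return background
    return background + (1.0 - background) * (1.0 - exp(-slope * (dose - threshold)))


def extra_risk(dose, background, slope, threshold):
    """Extra risk: [P(d) - P(0)] / [1 - P(0)]."""
    p0 = response(0.0, background, slope, threshold)
    return (response(dose, background, slope, threshold) - p0) / (1.0 - p0)


def dose_at_bmr(bmr, slope, threshold):
    """Dose whose extra risk equals the BMR under the assumed model (closed form)."""
    return threshold - log(1.0 - bmr) / slope


# Hypothetical parameter values, chosen only for illustration.
background, slope, threshold = 0.05, 0.2, 1.0
bmr = 0.10

d = dose_at_bmr(bmr, slope, threshold)
print(d, extra_risk(d, background, slope, threshold))
```

Because any positive BMR requires a response above background, the dose returned always lies above the threshold, which is why the model is never asked to predict at or below d0 in this calculation.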

4.2. RELATIVE SIZES OF NOAELs AND BMDs
      The fact that a BMD corresponds to a specified level of change in response to an
adverse effect (for quantal data, generally 1 percent to 10 percent increased risk, as discussed
earlier) and a NOAEL ostensibly corresponds to an experimental dose with no adverse effect
does not imply that NOAELs will necessarily be smaller than BMDs (and consequently that
larger uncertainty factors may be appropriate for BMDs). First, a BMD is defined as a
statistical lower limit, which introduces an element of conservatism in its definition.  Second,
one cannot conclude that no adverse effects are possible at a NOAEL or that effects will
necessarily be observed at the BMD. The BMD corresponding to an extra risk of 1 percent
was smaller than the corresponding NOAEL for each of  10 data sets studied by Gaylor
(1989).  Among five sets of quantal data studied by Crump (1984), the BMD corresponding
to an extra risk of 1 percent was larger than the NOAEL in one case by a factor of 1.4, and
smaller than the NOAEL in three cases by factors ranging from 1.1 to 2.6 (one data set did
not define a NOAEL).  However, it is unclear whether the data sets used in these studies are
typical of those to which the BMD method would be applied if the method is used routinely.
In a comparison study of a large number of developmental toxicity data sets (Allen et al.,
1994a, b; Faustman et al., 1994), a BMD corresponding to an extra risk of 5 percent was on
average similar to the NOAEL when expressed  as probability of response per litter.

4.3. CONSTRAINTS IMPOSED BY THE EXPERIMENTAL DESIGN
      Whereas the BMD can theoretically assume any value, the NOAEL, by definition, is
one of the experimental doses.  This constraint may appear unnecessarily restrictive in some
cases. If, for example, only a marginally significant effect is seen at the LOAEL and there
is a large gap between the LOAEL and the next lowest dose, then the NOAEL could be
considerably smaller than would be obtained from a study employing more doses or a more
 judicious selection of doses.  On the other hand, a BMD could be estimated at a dose
 intermediate between the LOAEL and the NOAEL.
        The NOAEL approach must be modified whenever effects are seen at all doses and,
 consequently, a NOAEL cannot be determined. Two approaches have been used in this situation.
 One approach has been to require the study to be repeated at lower doses to define a
 NOAEL.  This alternative may be costly and time consuming and may appear to be
 unnecessary whenever a clear dose response is defined by the original  experiment (Crump,
 1984).  The other approach has been to use the  LOAEL instead of a NOAEL and to apply an
 additional uncertainty factor (generally 10; see table 1).  On the other  hand, the BMD
 approach does not have this limitation because a BMD can be determined regardless of
 whether a NOAEL is defined by  the data.

 4.4.   NUMBER OF EXPERIMENTAL SUBJECTS AND THEIR DISTRIBUTION
       INTO TREATMENT GROUPS
       One of the major differences between the NOAEL and BMD approaches is the
 manner in which they incorporate sample size.  If fewer animals are tested per group, it is
 less likely that a real difference in response rates between two groups will be detected.
 Thus, in the same species, experiments with fewer animals per dose group will tend to find
 larger NOAELs than experiments with more animals per dose group.  These considerations
 have led the EPA to impose minimum requirements for numbers of animals per test group.
 For example, the guidelines for developmental toxicity testing protocols recommend at least
 20 animals per dose group (U.S.  EPA, 1986).  This aspect of the NOAEL approach is the
 opposite of what seems appropriate; a larger study should afford greater evidence of safety
 and therefore should result in a larger RfD.
       On the other hand,  a BMD will appropriately tend to be larger when estimated from a
 study employing larger numbers of animals per dose group. This is because a BMD is
defined as a lower statistical confidence limit and a larger study will tend to define narrower
confidence bounds (i.e., larger lower limits and  smaller upper limits).
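The narrowing of confidence bounds with sample size can be sketched for a single dose group. The example below computes an exact (Clopper-Pearson) one-sided 95 percent upper bound on a response rate for the same observed proportion (10 percent) at three hypothetical group sizes. It is a simplification: a BMD calculation fits a model to all dose groups, and the group sizes and response counts here are invented for illustration.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x + 1))


def upper_bound(x, n, alpha=0.05):
    """Exact one-sided upper confidence bound on a response rate, by bisection."""
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        # The CDF falls as p rises; the bound is where it reaches alpha.
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Same observed response rate (10 percent) with hypothetical group sizes.
bounds = {n: upper_bound(n // 10, n) for n in (10, 20, 50)}
for n, ub in bounds.items():
    print(n, round(ub, 3))
```

The upper bound on response moves closer to the observed 10 percent as the group grows. A tighter upper bound on response corresponds to a larger lower bound on the dose producing a given response, which is the direction of the effect described above.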
       With either the NOAEL or the BMD approach it is desirable to  have data from
several treatment groups.  With the BMD approach such data help define the shape of the
 dose response, which is estimated by the model; consequently, such data permit more
 accurate estimation of the BMD.  Having several treatment groups is also desirable when
 applying the NOAEL approach as this increases the range of possibilities for the NOAEL and
 consequently may increase the precision of the resulting RfD.
       For a given total number of experimental animals, the more dose groups in the
 experiment, the fewer animals that can be tested at each dose.  Dividing a given total number
 of animals into more treatment groups will generally not have a major impact on a BMD
 calculation because the BMD approach does not focus on dose groups individually but instead
 fits a single dose-response model to all available data from a study. This is not the case with
 the NOAEL approach, however. Because the NOAEL compares individual responses at
 individual doses to responses in a control group, dividing a given number of animals into
 more groups decreases the power for detecting an effect at any particular dose and
 consequently tends to result in a higher NOAEL.¹⁴

 4.5.   INCORPORATION OF DOSE-RESPONSE INFORMATION
       A NOAEL may be based solely on information concerning whether an effect is
 observed at particular doses; the relationships among the magnitudes of the responses at the
 given doses  may not be taken into account.  On the other hand, the BMD is based on a dose-
 response curve that  naturally takes into account the shape of the dose response.
       This  is illustrated in figure 12 in which  the QW model has been used to  determine
BMDs for two hypothetical data sets.  The first data set (marked by x's) has a steep dose
response above the LOAEL, which in this example equals 1 mg/kg/day.  The second data set
 (marked by o's) has identical responses up to the LOAEL but then has a more gradual dose
response at doses  above the LOAEL. Also plotted are BMDs for the two data sets
corresponding to risks of 1 percent.  The first data set produces a higher BMD than the
second, which seems reasonable given the respective dose-response shapes.  On the other
hand, the NOAEL, which is insensitive to the steepness of the dose response, is the same for
both data sets.

[Figure 12. Example of BMDs calculated from steep versus gradual dose responses.  Horizontal axis: Dose (mg/kg/day); axis markers, left to right: BMD2, NOAEL, BMD1.]

¹⁴ This effect could be mitigated by using a statistical trend test to test for a dose-response trend among the
 doses at and below a potential NOAEL. Such a test uses data from all doses in a range rather than comparing
 a single dose group with a control group. The NOSTASOT (no statistical significance of trend; Tukey et al.,
 1985) test procedure was proposed specifically for this situation.

4.6.   SENSITIVITY TO DATA INTERPRETATION AND TO SMALL CHANGES IN
       DATA
       The NOAEL involves a number of decision points for which slight changes in data
can have a sizable effect on the outcome.  Determinations of a LOAEL and a NOAEL are
based, at least in part, on the degree of statistical significance.  Changes in responses of only
a few animals (or in even a single animal) can change a significant response  to nonsignificant
and vice versa. Further, according to the definition of a NOAEL, effects that are not
statistically significant can be determined to be biologically significant. The  calculation of
BMD, on the other hand, does not require judgments about whether an effect is present in
individual dose groups. The BMD also appears to be less sensitive than the NOAEL to small
changes in the data.¹⁵
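A small calculation shows how a single animal can move a pairwise comparison across the conventional 0.05 significance level. The sketch uses a one-sided Fisher exact test on hypothetical groups of 20 animals with no responders among controls; the choice of test, the group size, and the counts are assumptions made for illustration, not data from any study cited here.

```python
from math import comb


def fisher_one_sided_p(x_treated, x_control, n):
    """One-sided Fisher exact p-value for a higher response in the treated group.

    Both groups are assumed to contain n animals.
    """
    total = x_treated + x_control
    denom = comb(2 * n, total)
    # Probability that x_treated or more of the responders fall in the treated
    # group when responders are allocated at random between the two groups.
    upper = min(n, total)
    tail = sum(comb(n, k) * comb(n, total - k) for k in range(x_treated, upper + 1))
    return tail / denom


p_five = fisher_one_sided_p(5, 0, 20)  # 5/20 treated versus 0/20 control
p_four = fisher_one_sided_p(4, 0, 20)  # 4/20 treated versus 0/20 control
print(round(p_five, 4), round(p_four, 4))
```

With five responders the comparison falls below 0.05, and with four it does not, so under these assumptions the classification of that dose as a LOAEL or a NOAEL would turn on one animal.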

4.7. MODEL SENSITIVITY
       Because the NOAEL does not use dose-response models, the issue of  model
sensitivity applies only to the BMD approach.  As the calculation  of a BMD  does not
extrapolate results to doses far below those for which effects are observed, the BMD
approach has been presented  as being relatively model independent (Crump, 1984). It
appears, however, that this issue has not been investigated thoroughly. Crump (1984)
applied four dose-response models to each of four sets of quantal data and one set of
continuous data.  The ratios of the largest to the smallest of the four BMDs for each of the
five data sets were 1.2, 1.1, 1.2, 1.4, and 1.3.  The corresponding ratios for the BMD10s
were 1.3, 1.1, 1.2, 1.2, and  1.1. These ratios are small compared with the large model
differences that occur when extrapolating to much lower doses (Crump, 1985).
¹⁵ A situation in which a BMD might be affected by a small change in data is when there is a borderline lack of
 fit of models to the data and a decision must be made regarding whether to omit data at the highest dose.

 4.8.  QUANTITATIVE ESTIMATES OF RISK
        The NOAEL is defined as an experimental dose at which there is no significant
 increase in response.  At the NOAEL the risk under study conditions has been theoretically
 estimated by Gaylor (1989) as high as about 5 percent.
        Unlike the NOAEL, the BMD approach  associates a risk with each dose based on a
 mathematical dose-response model.  The calculation of a BMD uses the predictions of the
 model only at doses at and above the BMR (doses that typically correspond to altered
 responses of 1 percent or greater).  However, if desired, the model used to calculate the
 BMD also could be used to estimate risks for lower doses, even though this is not part of the
 BMD approach per se.  If such low-dose extrapolation is performed, it should be recognized
 that the results are likely to be highly model dependent and apply only under the conditions
 of the study used in modeling. It is still necessary to account for any species, exposure time,
 and other differences between the study population and the human population of interest.
       There are two obvious ways this extrapolation could be carried out.  The first is to
 simply use the predictions of the model used in calculating the BMD or the confidence limits
 on these predictions.  This approach incorporates all the uncertainties in current procedures
 for estimating cancer risks.  The  second method, which is similar to the method proposed by
 Kimmel and Gaylor (1988) for determining factors for the BMD approach (see section 3.9 on
 uncertainty factors), is to assume that human risk is approximately equal to the animal risk
 and the dose response is linear below the BMR.  Thus, the risk at a dose, d, which is less
 than the BMD, is estimated as (d/BMD)*BMR.  This approach generally yields results
 similar to those obtained with the linearized multistage approach that the EPA applies to
 carcinogens (Anderson et al., 1983). This approach could greatly under- or overestimate the
risk from substances with a dose response that includes a threshold or is highly nonlinear.
Even so, such conservative estimates of risk could be useful in some applications.  For
example, if such a conservative procedure predicted a low risk,  this  would indicate that the
true risk is at least this low and possibly much lower.
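The second method amounts to a straight line from the origin to the point (BMD, BMR). A minimal sketch follows; the function name is illustrative, and the BMD of 0.31 mg/kg/day with a BMR of 0.05 is an assumed value borrowed from the acrylamide example in section 3.9.1.

```python
def linear_low_dose_risk(dose, bmd, bmr):
    """Risk at a dose at or below the BMD, assuming linearity down to zero dose."""
    if not 0.0 <= dose <= bmd:
        raise ValueError("linear extrapolation applies only for 0 <= dose <= BMD")
    return (dose / bmd) * bmr


# Assumed illustrative values: BMD in mg/kg/day and its benchmark response.
bmd, bmr = 0.31, 0.05

# Dose equal to the BMD divided by 500, in mg/kg/day.
dose = bmd / 500.0
risk = linear_low_dose_risk(dose, bmd, bmr)
print(risk)
```

Dividing the BMD by 500 gives an estimated risk of about 1 in 10,000, the same relationship used to derive the risk-based factor in section 3.9.1. As the text notes, the estimate holds only under the linearity assumption and the conditions of the study.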

4.9. STATISTICAL EXPERTISE
       Both the NOAEL and BMD approaches require use of statistical methods.  With the
NOAEL approach, statistical tests for comparing two groups of data as well as tests for a
dose-response trend across several dose groups may be needed.  These same tests may be
required in applying the BMD  approach (e.g., to determine the critical effect).  In addition,
the BMD method requires statistical methods to fit mathematical dose-response models to
data.  Statistical goodness-of-fit tests are needed to determine how well these models describe
the data. Further, a statistical  confidence limit on dose corresponding to a given BMR needs
to be calculated to define the BMD. Thus, the BMD method requires greater use of
statistical methodology than the NOAEL.
       Existing computer packages can perform the statistical tests required. Moreover,
programs are available that fit  most of the models listed in table 4 to data using the method
of maximum likelihood (Crump, 1984). Those programs also test goodness-of-fit and
calculate the required confidence intervals.  Consequently, a person with scientific credentials
who understands basic statistical concepts and the basic ideas of the NOAEL and BMD
approaches and who has access to the necessary computer programs and facilities for running
them should be able to perform the necessary analyses.  Although a statistician should not be
required to perform the calculations, one should be available for consultation.
Implementation of these methods would be facilitated if a user-friendly,  special-purpose
program were available that  could perform the necessary calculations. Also useful would be
some special training (e.g., a 1-day seminar) for presenting the statistical methods used in the
BMD approach and the use of computer programs for making the necessary calculations.

                        5.  SUMMARY OF RESEARCH NEEDS
        The preceding discussion suggested several areas in which additional research into the BMD
 approach could be of value.  These are summarized here. Two additional
 investigations/developments are also discussed.  Some of these research needs could be
 addressed through a study that involves computing BMDs corresponding to various BMRs
 using several dose-response models for a number of data sets.

 5.1.  SUMMARY OF RESEARCH NEEDS RELATED TO BMD DECISION POINTS
       The areas identified as requiring additional research are the following:
       1.     Development of dose-response models and related methods for use with
              various types of data (see section 3.3.2).
       2.     Guidelines for handling lack of fit (section 3.4.2).
       3.     Development of methods for applying pharmacokinetic considerations (section
              3.3.2).
       4.     Guidelines for selecting appropriate measure(s) of altered response (section
              3.5.2).
       5.     Study of the sensitivity of the BMD to choice of model, particularly in relation
              to the level of the BMR (section 3.6.2) and to the confidence limit size
              (section 3.7.2).
       6.     Guidelines for selecting a single BMD when more than one is calculated
              (section 3.8.2).
       7.     Investigation of uncertainty factors (section 3.9.2).
5.2. ADDITIONAL TOPICS FOR INVESTIGATION/DEVELOPMENT
5.2.1. Comparison of Dose-Response Curves for Different Types of Data and Toxic
      Endpoints
      In the process of applying the BMD approach to a number of data sets, as is required
for the last two research recommendations above, it could be worthwhile from a theoretical
perspective to evaluate the various dose-response curve shapes for different forms of data
(e.g., quantal versus continuous), for different toxic endpoints, and for different chemical
classes. Such a study could provide information on which endpoints appear to have a
threshold response versus a nonthreshold response and whether the dose responses of the
same effect from different chemicals appear to have the same shape.  This information could
be used to construct hypotheses regarding underlying mechanisms that could be tested in
subsequent experiments.  It would be particularly interesting to determine whether noncancer
responses appear in general to be "threshold-like."  This research would have implications
concerning the appropriateness of applying  different types of procedures for setting allowable
exposure for carcinogenic effects and various categories of noncarcinogenic effects.
       One way to conduct such a study would be to apply the QW and CP models and study
the values of the shape parameter, k, from  these models.  A value of k = 1 is consistent
with a linear no-threshold dose response, whereas large values of k are more indicative of a
threshold.
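The role of the shape parameter can be seen in a brief sketch. It assumes a Weibull-type extra risk of the form 1 - exp(-q d^k), which is a common way of writing a quantal Weibull model and is taken here as a stand-in for the QW model of table 4; the values of q, k, and the doses are hypothetical.

```python
from math import expm1


def weibull_extra_risk(dose, q, k):
    """Extra risk under an assumed Weibull form: 1 - exp(-q * dose**k)."""
    return -expm1(-q * dose**k)


def risk_ratio_for_tenfold_lower_dose(dose, q, k):
    """Extra risk at dose/10 divided by extra risk at dose."""
    return weibull_extra_risk(dose / 10.0, q, k) / weibull_extra_risk(dose, q, k)


# Hypothetical values chosen only to contrast the two shapes.
q, dose = 1.0, 0.1
ratio_linear = risk_ratio_for_tenfold_lower_dose(dose, q, k=1)
ratio_steep = risk_ratio_for_tenfold_lower_dose(dose, q, k=4)
print(ratio_linear, ratio_steep)
```

With k = 1 a tenfold reduction in dose lowers the extra risk roughly tenfold, as expected for a linear no-threshold curve. With k = 4 the same reduction lowers the risk by about four orders of magnitude, which is the threshold-like behavior associated with large k.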

5.2.2. Development of Dose-Response Models for Multiple Endpoints of Toxicity
       Many types of noncancer toxicity are characterized by multiple types of effects;  e.g.,
liver toxicity may  be characterized by enzyme changes, pathology, weight changes, etc., and
neurotoxicity may include altered behavior, neurophysiological and neurochemical changes,
as well as altered structure.  Various endpoints within organ systems are likely to be
interdependent but are often treated independently for purposes of risk assessment.  A few
attempts have been made to model the interdependence of endpoints of toxicity, primarily in
the area of developmental toxicity.  For example, Ryan et al. (1991) and Catalano et al.
(1993) have shown a correlation between fetal weight and malformations and have developed
a multinomial model that accounts for fetal weight,  malformations, and prenatal death.  This
approach allows for the analysis of continuous and discrete outcomes by assuming that the
discrete outcome has some corresponding unobserved latent variable,  and that the continuous
outcome and the latent variable share a joint normal distribution.  Results of BMD
calculations using  this  approach showed that a lower value resulted for the multivariate
approach than for each outcome modeled individually, thus taking into account the risks from
all adverse events at once.

                                 6.  REFERENCES
Allen, B. C.; Kavlock, R. J.; Kimmel, C. A.; Faustman, E. M. (1994a) Dose-response
      assessment for developmental toxicity: II. Comparison of generic benchmark dose
      estimates with NOAELs. Fund. Appl. Toxicol. 23: 487-495.

Allen, B. C.; Kavlock, R. J.; Kimmel, C. A.; Faustman, E. M. (1994b) Dose-response
      assessment for developmental toxicity: III. Statistical models. Fund. Appl. Toxicol.
      23: 496-509.

Andersen, M.; Clewell, H.; Gargas, M.; Smith, F.; Reitz, R. (1987) Physiologically based
      pharmacokinetics and the risk assessment process for methylene chloride. Toxicol.
      Appl. Pharmacol. 87: 185-205.

Anderson, E.; Carcinogen Assessment Group of the U.S. Environmental Protection Agency.
      (1983) Quantitative approaches in use to assess cancer risk. Risk Anal. 3: 277-295.

Barnes, D. G.; Dourson, M. L. (1988) Reference dose (RfD): description and use in health
      risk assessments. Reg. Toxicol. Pharmacol. 8: 471-488.

Barnes, D. G.; Daston, G. P.;  Evans, J. S.; Jarabek, A. M.; Kavlock,  R. J.; Kimmel,
      C.  A.; Park, C.; Spitzer, H.L. (1995) Benchmark dose workshop: criteria for use of
      a benchmark dose to estimate a reference dose. Reg. Toxicol. Pharmacol., in press.

Bickel, P.; Doksum, K. (1977) Mathematical statistics: basic ideas and selected topics.  San
      Francisco: Holden-Day, Inc.

Calabrese, E. J. (1985) Uncertainty factors and interindividual variation. Reg. Toxicol.
      Pharm. 5: 190-196.

Catalano, P. J.; Scharfstein, D. O.; Ryan, L. M.; Kimmel, C. A.; Kimmel, G. L. (1993) A
      statistical model for fetal death, fetal weight, and malformation in developmental
      toxicity studies.  Teratology 47: 281-290.

Chemical Rubber Company (CRC). (1970) Standard mathematical tables.  Selby, S., ed.
      18th ed. Cleveland,  OH: Chemical Rubber Company.

Clement International Corporation. (1990a) Health effects and dose-response assessment for
      hydrogen chloride following short-term  exposure.  Unpublished report prepared for
      EPA Office of Air Quality Planning and Standards.

-------
  Clement International Corporation. (1990b) Health effects and dose-response assessment for
        acrolein following short-term exposure. Unpublished report prepared for EPA Office
        of Air Quality Planning and Standards.

  Cox, D.; Lindley, D. (1974) Theoretical statistics. London: Chapman & Hall.

  Crump, K. (1984) A new method for determining allowable daily intakes. Fund. Appl.
        Toxicol. 4: 854-871.

  Crump, K. (1985) Mechanisms leading to dose-response models. In: Ricci, P., ed.
        Principles of health risk assessment. Englewood Cliffs, NJ: Prentice Hall; pp.
        321-372.

 Crump, K.; Howe, R. (1985) A review of methods for calculating confidence limits in low
        dose extrapolation. In: Krewski, D., ed. Toxicological risk assessment. Boca Raton,
        FL: CRC Press, Inc.

 Crump, K.; Hoel, D.; Langley, H.; Peto, R. (1976) Fundamental carcinogenic processes and
        their implications to  low dose risk assessment. Cancer Res. 36: 2973-2979.

 Dourson, M.; Stara, J. (1983) Regulatory history and experimental support for uncertainty
        (safety) factors. Reg. Toxicol. Pharmacol. 3: 224-238.

 Dourson, M.; Hertzberg, R.; Hartung, R.; Blackburn, K. (1985) Novel methods for the
       estimation of acceptable daily intake.  Toxicol. Ind. Health 1: 23-41.

 Dourson, M. L.; Knauf,  L.  A.; Swartout, J. C. (1992) On reference dose (RfD) and its
       underlying toxicity data base. Toxicol. Ind. Health.  8: 171-189.

 Farland, W. F.; Dourson, M. L. (1992) Noncancer health endpoints: approaches to
        quantitative risk assessment. In: Cothern, C. R., ed. Risk assessment. Boca Raton:
       Lewis Publ.; pp. 87-106.

 Faustman, E.M.; Allen, B.C.; Kavlock, R.J.; Kimmel, C.A. (1994) Dose-response
       assessment for developmental toxicity: I. Characterization of data base and
       determination of NOAELs. Fund. Appl. Toxicol. 23: 478-486.

Gaylor, D. (1989) Quantitative risk analysis for quantal reproductive and developmental
       effects. Environ. Health Perspect. 79: 243-246.

Gaylor, D.; Slikker, W., Jr. (1990) Risk assessment for neurotoxic effects. Neurotoxicology
       11: 211-218.

-------
Haseman, J. (1984) Statistical issues in the design, analysis and interpretation of animal
       carcinogenicity studies. Environ. Health Perspect. 58: 385-392.

Hattis, D.; Lewis, S. (1992) Reducing uncertainty with adjustment factors. The Toxicologist.
       12: 1327.

Hattis, D.; Erdreich, L.; Ballew, M. (1987) Human variability in susceptibility to toxic
       chemicals--a preliminary analysis of pharmacokinetic data from normal volunteers.
       Risk Anal. 7: 415-426.

Howe, R.; Crump, K. (1982) GLOBAL 82: a computer program to extrapolate quantal
       animal toxicity data to low doses. Prepared for the  Office of Carcinogen Standards,
       Occupational Safety and Health Administration, U.S. Department of Labor, Contract
       41USC252C3.

Jarabek, A. M.; Menache, M. G.; Overton, J. H.; Dourson, M. L.; Miller, F. J. (1989)
       Inhalation reference dose (RfD): an application of interspecies dosimetry modeling for
       risk assessment of insoluble particles. Health Phys.  57:  177-183.

Jarabek, A. M.; Menache, M. G.; Overton, J. H.; Dourson, M. L.; Miller, F. J. (1990)
       The U.S. Environmental Protection Agency's inhalation RfD  methodology: risk
       assessment for air toxics. Toxicol.  Ind. Health 6: 279-301.

Johnson, K.;  Gorzinski, S.; Bodner, K.; Campbell, R.; Wolf, C.; Friedman, M.; Mast, R.
       (1986) Chronic toxicity and oncogenicity study on acrylamide incorporated in the
       drinking water of Fischer 344 rats. Toxicol. Appl. Pharmacol. 85: 154-168.

Katz, G.; Krasavage, W.; Terhaar, C. (1984)  Comparative acute and subchronic toxicity of
       ethylene glycol monopropyl ether and ethylene glycol monopropyl ether acetate.
       Environ.  Health Perspect. 57: 165-175.

Kavlock, R. J.; Allen, B.  C.; Kimmel, C. A.; Faustman,  E. M. (1995) Dose-response
       assessment for developmental toxicity:  IV.  Benchmark doses for fetal weight
       changes.  Fund. Appl. Toxicol. (in press).

Kendall, M. (1951) The advanced theory of statistics. Vol. 1. 5th ed. New York: Hafner
       Publishing Company.

Kimmel, C.; Gaylor, D. (1988) Issues in  qualitative and quantitative risk analysis for
       developmental toxicology.  Risk Anal. 8: 15-21.

Kodell, R. L.; Howe, R. B.; Chen,  J. J.;  Gaylor, D. W.  (1991) Mathematical modeling of
       reproductive and developmental toxic effects for quantitative risk assessment.  Risk
       Anal. 11(4): 583-590.

-------
  Kupper, L.; Portier, C.; Hogan, M.; Yamamoto, E. (1986) The impact of litter effects on
        dose-response modeling in teratology.  Biometrics 42: 85-98.

  Lehmann, E. (1975) Nonparametrics. Statistical methods based on ranks. San Francisco:
        Holden-Day, Inc.

  Lewis, S. C.; Lynch, J. R.; Nikiforov, A. I. (1990) A new approach to deriving community
        exposure guidelines from no-observed-adverse-effect levels. Reg. Toxicol. Pharmacol.
        11: 314-330.

  Mantel, N.; Schneiderman, M. A. (1975) Estimating "safe" levels, a hazardous undertaking.
        Cancer Res. 35: 1379-1386.

  Miller, R.; Ayres, J.; Calhoun, L.; Young, J.; McKenna, M. (1981) Comparative short-term
        inhalation toxicity of ethylene glycol monomethyl ether and propylene glycol
        monomethyl ether in rats and mice. Toxicol. Appl. Pharmacol. 61: 368-377.

 National Center for Toxicological Research (NCTR). (1981) Teratological evaluation of
       sulfamethazine. Prepared for Research Triangle Institute; July 8, 1981;
       RTI-48/31U-2077.

 National Research Council (NRC) (1977) Drinking water and health.  Washington, DC:  Safe
       Drinking Water Committee, National Academy of Sciences.

 Office of Science and Technology Policy (OSTP). (1989) Chemical carcinogens: A review of
       the science and its associated principles. In: Cohrssen, J. J.; Covello, V. T. (eds.)
       Risk analysis: A guide to principles and methods for analyzing health and
       environmental risks. Washington, DC: Council on Environmental Quality; NTIS
       PB89-137772 RDM.

 Peto, R.;  Pike, M.; Day,  N.; Gray, R.; Lee, P.; Parish, S.; Peto, J.; Richards, S.;
       Wahrendorf, J. (1980) Guidelines for simple, sensitive significance tests for
       carcinogenic effects in long-term animal experiments. Annex. In:  Long-term and
       short-term screening assays for carcinogens: a critical appraisal. IARC monographs
       on the evaluation of the carcinogenic risk of chemicals to humans, supplement 2.
       Lyon: International Agency for Research on Cancer; pp. 311-426.

Rai, K.; Van Ryzin, J. (1985) A dose-response model for teratological experiments involving
       quantal response. Biometrics 41:  1-9.

Renwick, A. G. (1991) Safety factors and establishment of acceptable daily intakes. Food
      Add. Contamin. 8:  135-150.

-------
Renwick, A. G. (1993) Data derived safety factors for the evaluation of food additives and
      environmental contaminants. In press.

Ryan, L. M. (1992) Quantitative risk assessment for developmental toxicity. Biometrics 48:
      163-174.

Ryan, L. M.; Catalano, P. J.; Kimmel, C. A.; Kimmel, G.L. (1991) On the relationship
      between fetal weight and malformation in developmental toxicity studies. Teratology
      44: 215-223.

Sanders, O. T.; Zepp, R. L.; Kirkpatrick, R. L. (1974) Effect of PCB ingestion on sleeping
      times, organ weights, food consumption, serum corticosterone and survival of albino
      mice. Bull.  Environ. Contam. Toxicol.  12: 394-399.

SAS. (1988) SAS/STAT User's Guide, Release 6.03 edition. Cary, NC: SAS Institute, Inc.

Tarone, R.; Ware, J. (1977) On distribution-free tests for equality of survival distributions.
      Biometrika  64: 156-160.

Tukey, J.; Ciminera, J.; Heyse, J. (1985) Testing the statistical certainty of a response to
      increasing doses of a drug.  Biometrics 41: 295-301.

U.S. Environmental Protection Agency (U.S. EPA). (1986) Guidelines for the health
      assessment of suspect developmental toxicants. Federal Register 50: 39426-39436.

U.S. Environmental Protection Agency (U.S. EPA). (1987) The  risk assessment guidelines of
       1986. Washington, DC: Office of Health and Environmental Assessment. EPA/600-8-
       87-045.

U.S. Environmental Protection Agency (U.S. EPA). (1988a) Proposed guidelines for
       assessing male reproductive risk. Federal Register 53: 24850-24969.

U.S. Environmental Protection Agency (U.S. EPA). (1988b) Proposed guidelines for
       assessing female reproductive risk. Federal Register 53:24834-24847.

U.S. Environmental Protection Agency (U.S. EPA). (1989) Risk assessment guidance for
       Superfund. Vol. I: Human health evaluation manual. Interim final.  Washington, DC:
       Office of Emergency and Remedial Response.

U.S. Environmental Protection Agency (U.S. EPA). (1990) Interim methods for development
       of inhalation reference concentrations. Washington, DC: Office of Health and
       Environmental Assessment. EPA 600/8-88/066F.

-------
U.S. Environmental Protection Agency (U.S. EPA). (1992) IRIS. Background document
      (4/1/91). Cincinnati, OH: Office of Health and Environmental Assessment,
      Environmental Criteria and Assessment Office.

-------

-------
                       APPENDIX—STATISTICAL METHODS
 A.1. BMD APPROACH
       This section describes the statistical procedures associated with the fitting of the BMD
 models to experimental data.  The likelihood approach to parameter estimation is presented
 as are the methods used to evaluate the fit of the models to the data.
       Maximum Likelihood Procedures for Quantal Endpoints.  Consider an experiment
 with g dose levels $d_1, \ldots, d_g$, and let $N_i$ and $X_i$, respectively, be the number of animals tested
 and the number of animals affected at the ith dose level.  Let P(d) be the probability that an
 animal is affected when exposed to a dose d. Assuming that $X_i$ has a binomial distribution
 with parameters $N_i$ and $P(d_i)$, the likelihood of the data can be written as

 $$L_Q = \prod_{i=1}^{g} \binom{N_i}{X_i} P(d_i)^{X_i} [1 - P(d_i)]^{N_i - X_i}.$$

The parameters that define P(d) are the only unknowns; they are estimated by the values that
maximize the value of $L_Q$ (Cox and Lindley, 1974).
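
As a rough illustration of the maximization step, the sketch below evaluates the logarithm of the binomial likelihood and searches a coarse grid for the best-fitting parameters. The two-parameter model `one_hit`, the grid ranges, and the data are all assumptions made up for this example; they are not taken from this document, and a grid search only stands in for the iterative optimization routines used in practice.

```python
import math

def quantal_log_likelihood(doses, n_tested, n_affected, prob):
    """Log of the binomial likelihood for a dose-response function prob(d)."""
    total = 0.0
    for d, n, x in zip(doses, n_tested, n_affected):
        # Clip the probability so that log() stays finite at 0 and 1.
        p = min(max(prob(d), 1e-12), 1.0 - 1e-12)
        total += (math.log(math.comb(n, x))
                  + x * math.log(p) + (n - x) * math.log(1.0 - p))
    return total

def one_hit(c, q):
    """Hypothetical illustrative model: P(d) = c + (1 - c) * (1 - exp(-q * d))."""
    return lambda d: c + (1.0 - c) * (1.0 - math.exp(-q * d))

def grid_mle(doses, n_tested, n_affected, c_grid, q_grid):
    """Crude grid search for the (c, q) pair with the largest log-likelihood."""
    best = None
    for c in c_grid:
        for q in q_grid:
            ll = quantal_log_likelihood(doses, n_tested, n_affected, one_hit(c, q))
            if best is None or ll > best[0]:
                best = (ll, c, q)
    return best

# Made-up data, for illustration only.
doses = [0.0, 10.0, 50.0, 100.0]
n_tested = [50, 50, 50, 50]
n_affected = [2, 5, 15, 27]
c_grid = [i / 100 for i in range(0, 21)]       # background rate from 0 to 0.20
q_grid = [i / 10000 for i in range(1, 201)]    # dose coefficient from 0.0001 to 0.02
ll_hat, c_hat, q_hat = grid_mle(doses, n_tested, n_affected, c_grid, q_grid)
```

Working with the log-likelihood rather than the likelihood itself avoids numerical underflow and does not change the location of the maximum.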
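       Maximum Likelihood Procedures for Continuous Endpoints.  Consider an
experiment with g dose levels $d_1, \ldots, d_g$, and let $N_i$ be the number of animals in the ith dose
group, and let $X_{ij}$, j = 1, ..., $N_i$, i = 1, ..., g, represent the response of the jth animal in the
ith dose group.  It is assumed that $X_{ij}$ has a normal distribution with mean $m(d_i)$ and variance
$\sigma_i^2$. The unknown parameters in the model consist of the parameters defining m(d) (see table
4 of the text), plus $\sigma_1^2, \ldots, \sigma_g^2$.  Let $\bar{x}_i = \sum_j X_{ij} / N_i$, where the sum runs
from 1 to $N_i$, be the sample mean in the ith dose group, and let the sample variance be

$$s_i^2 = \frac{\sum_j (X_{ij} - \bar{x}_i)^2}{N_i - 1},$$

where, again, the sum runs from 1 to $N_i$. Then the likelihood of the data can be written as

$$L_C = \prod_{i=1}^{g} (2\pi\sigma_i^2)^{-N_i/2} \exp\{-[(N_i - 1)s_i^2 + N_i(\bar{x}_i - m(d_i))^2] / 2\sigma_i^2\}.$$

The parameters of the continuous BMD model, as well as the variances $\sigma_1^2, \ldots, \sigma_g^2$, [...]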
-------
 quantal endpoints, an approximate chi-square test is employed; for continuous endpoints, an
 F test is performed.
       For quantal responses, the observed values are numbers of responders and the models
 predict numbers of responders.  The chi-squared test statistic, C, is

 $$C = \sum_{i=1}^{g} \frac{[X_i - N_i P(d_i)]^2}{N_i P(d_i) [1 - P(d_i)]}$$

 where the sum runs from 1 to g and the notation here is the same as that presented earlier.
 The degrees of freedom associated with this test are normally g-[number of parameters
 estimated].  If some of the parameter estimates fall on the boundary of the parameter space,
 the degrees of freedom are approximated as follows (Anderson et al.,  1983).  From the
 number of dose groups, subtract  1  for estimating the parameter c (the  background rate) and
 subtract 1 for each of the other parameters for which the maximum likelihood estimate is not
 a boundary value.16
       The value of C may be compared to the quantiles of a chi-square distribution.  For
 example, if C equals or exceeds the quantile for (1-α), where α = 0.01, then we may
 conclude that the model did not fit the observed data.
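
The goodness-of-fit calculation for quantal data can be sketched as follows. The function name, the argument `n_params_not_on_boundary` (the count of estimated parameters, including the background rate, that do not sit on a boundary), and the small table of 0.99 chi-square quantiles are choices made for this illustration only.

```python
# 0.99 quantiles of the chi-square distribution for small degrees of freedom,
# as found in standard statistical tables.
CHI2_Q99 = {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277}

def chi_square_fit(n_tested, n_affected, fitted_probs, n_params_not_on_boundary):
    """Return the chi-squared statistic C and its approximate degrees of freedom."""
    c_stat = 0.0
    for n, x, p in zip(n_tested, n_affected, fitted_probs):
        # (observed - predicted responders)^2 over the binomial variance
        c_stat += (x - n * p) ** 2 / (n * p * (1.0 - p))
    df = len(n_tested) - n_params_not_on_boundary
    return c_stat, df

def rejects_fit_at_1_percent(c_stat, df):
    """True when C equals or exceeds the 0.99 quantile, i.e. the model did not fit."""
    return c_stat >= CHI2_Q99[df]
```

A fitted probability of exactly 0 or 1 would make a denominator zero, so a practical implementation would need to guard against that case.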
       For continuous responses, the mean squared error for lack of fit is compared to the
 mean squared error associated with pure error to determine if a continuous model has fit the
 data.  The sum of squares associated with the pure error is

 $$SS_e = \sum_{i=1}^{g} (N_i - 1) s_i^2,$$

which has $df_e = \sum (N_i - 1)$ degrees of freedom.  In both cases the sum runs from 1 to g and $N_i$
and $s_i^2$ are as defined above.  The sum of squares associated with lack of fit is
16The parameters in the quantal BMD models are constrained to lie within certain ranges (see table 3). A
 parameter estimate may equal one of the values that define the range for the parameter, in which case a
 degree of freedom is not lost.


-------
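$$SS_f = \sum_{i=1}^{g} N_i [\bar{x}_i - m(d_i)]^2,$$

which has $df_f$ degrees of freedom; here $\bar{x}_i$ is the sample mean of the ith dose group and
$m(d_i)$ is the mean predicted by the fitted model.  The value of $df_f$ is equal to the number of dose groups,
g, minus 1 (for the estimation of the background parameter c) minus 1 for each of the other
parameters for which the estimated value is not equal to a boundary value.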
      The test statistic

$$F' = \frac{SS_f / df_f}{SS_e / df_e}$$

is distributed according to an F distribution with degrees of freedom $df_f$ and $df_e$. The value
of F' can be compared to tabulated quantiles of the F distribution with the specified degrees
of freedom (Bickel and Doksum, 1977; CRC, 1970) to determine if the model fits the data.
For example, when F' equals or exceeds the quantile corresponding to (1-α), where α =
0.01, then we may conclude that the model did not fit the observed data.
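
The F' calculation can be sketched from group summary statistics as shown below. This is only an illustration: it assumes that the lack-of-fit sum of squares is the usual weighted sum of squared differences between the group means and the fitted means, and the function and argument names are invented for the example.

```python
def lack_of_fit_f(n, means, variances, fitted_means, n_params_not_on_boundary):
    """Return (F', df_f, df_e) for a continuous model.

    n, means, variances: per-group sample sizes, sample means, sample variances.
    fitted_means: model-predicted means m(d_i) at the fitted parameter values.
    n_params_not_on_boundary: estimated parameters (including the background
    parameter) whose estimates are not boundary values.
    """
    ss_e = sum((ni - 1) * s2 for ni, s2 in zip(n, variances))   # pure error
    df_e = sum(ni - 1 for ni in n)
    # Assumed form of the lack-of-fit sum of squares (see lead-in).
    ss_f = sum(ni * (xb - m) ** 2 for ni, xb, m in zip(n, means, fitted_means))
    df_f = len(n) - n_params_not_on_boundary
    return (ss_f / df_f) / (ss_e / df_e), df_f, df_e
```

The returned value would then be compared with a tabulated F quantile for (df_f, df_e) degrees of freedom.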
      Application of the BMD Approach to Two Dose Groups.  Although the BMD models
listed in table 4 involve three or more parameters, the recommended method for computing
statistical bounds will provide a unique lower bound dose even when the data are for only
two dose groups (e.g., a control group and one treatment group). For the QQR and CQR
models, the lower bound is the same as the one that would have been obtained had the
parameter $d_0$ been fixed at $d_0$ = 0.  For the QW and CP models, the lower bound is the
same as the one that would have been obtained if the parameter k had been fixed at k =  1.
This value of k makes the models assume a linear, no-threshold form. Similar results apply
to other models.
       Unlike the statistical bounds, the maximum likelihood estimate (MLE) of dose
obtained using the models will not be unique when there are only two dose groups. If an
MLE is required in such a situation, it is recommended that it be calculated using the models
and constraints discussed in the previous paragraph (i.e., $d_0$ = 0 for the QQR and CQR
models and k  = 1 for the QW and CP models).  These selections will generally provide the
lowest possible MLE of dose corresponding to a fixed, small level of increased response.

-------
       Computer Programs.  The fitting procedures described above require sophisticated
 optimization routines involving iterative numerical calculations.  K.S. Crump Division of
 Clement International has developed software to perform the calculations and to evaluate the
 fit of models to the data. The software implements the QQR, QW, CQR, and CP models,
 among others. That software was used for all examples discussed in this document.
 A.2.  STATISTICAL DETERMINATION OF A NOAEL
       A NOAEL is defined as the highest experimental dose at which there are no
 statistically or biologically significant increases in frequency or severity of adverse health
 effects compared with corresponding controls.  Thus, there should be no statistically
 significant evidence of a relationship between dose and response for doses up to the NOAEL.
 Although pairwise tests that compare a single treatment group with the control group are
 generally used in determining NOAELs, trend tests are available that make use of the data
 from all  of the dose groups up to and including the putative NOAEL.  These procedures test
 for the presence of a trend toward increased responses at increasingly higher doses.  These
 tests incorporate more of the data than pairwise tests; consequently they are generally more
powerful.
       NOSTASOT Dose.  Tukey et al. (1985) have proposed a procedure for determining a
no statistical significance of trend (NOSTASOT) dose. This procedure has greater power for
determining dose relationships than do multiple pairwise tests (Tukey et al., 1985) and can be
used to define a NOAEL.  The procedure is described as follows.
       First, select a suitable trend test. Selection  of such  a test depends on the type of
endpoint  in question and the data available for  analysis.  Recommended tests for the
situations likely to arise in the analysis of noncancer health effects  are presented below.

-------
       After selecting the appropriate trend test, apply the test to all dose groups.  If the test
indicates no significant trend, then the highest dose may be considered a NOAEL.17 If the
test applied to all dose groups detects a significant trend, then the highest dose group cannot
be a NOAEL. In that case, delete the highest dose group from consideration and repeat the
trend test.   The highest dose level for which there is no statistically significant trend is the
NOAEL (NOSTASOT dose) if biological or toxicological considerations do not suggest
otherwise.
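
The step-down logic of the procedure can be sketched as below. The function name and the `trend_is_significant` callable (any trend test that returns True or False for the dose groups it is given) are hypothetical and introduced only for illustration; the biological and toxicological judgment mentioned above is not captured by the code.

```python
def nostasot_dose(doses, groups, trend_is_significant):
    """Return the NOSTASOT dose, or None if every tested dose shows a trend.

    doses: dose levels in increasing order, control first.
    groups: per-dose data in the same order (any form the trend test accepts).
    trend_is_significant: callable taking (doses, groups) and returning a bool.
    """
    k = len(doses)
    while k >= 2:
        # Test for trend using all groups up to and including the kth.
        if not trend_is_significant(doses[:k], groups[:k]):
            return doses[k - 1]
        k -= 1  # delete the highest remaining dose group and repeat
    return None

# Usage with a stand-in "trend test" that flags a trend whenever the
# highest remaining group has 5 or more responders (made-up rule and data).
example = nostasot_dose([0, 1, 3, 10, 30], [0, 1, 2, 9, 12],
                        lambda d, g: g[-1] >= 5)
```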
       Recommended Trend Tests.  Trend tests are proposed here for continuous endpoints
and quantal endpoints.
       For quantal endpoints, the Mantel-Haenszel trend test (Haseman, 1984) is
recommended.  The Mantel-Haenszel trend test relies on the following test statistic:

$$Z = \frac{\sum d_i (X_i - E_i)}{\sqrt{V}}$$

where $E_i = N_i (\sum X_i / \sum N_i)$, $d_i$ is the dose level for group i, $N_i$ is the number of animals tested
in group i, $X_i$ is the number of animals with the endpoint of interest in group i, and

$$V = \frac{(\sum N_i - \sum X_i)(\sum X_i)[(\sum N_i)(\sum N_i d_i^2) - (\sum N_i d_i)^2]}{(\sum N_i)^2 (\sum N_i - 1)}.$$

In all these equations the summations run over all dose groups.  The significance of the
Mantel-Haenszel test can be determined by comparing the value of Z with quantiles from a
standard normal table (Bickel and Doksum, 1977; CRC, 1970).  At the 5 percent level of
significance, for example, Z > 1.645 indicates a significant trend.
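
A minimal sketch of this calculation follows, with the denominator of Z taken as the square root of V. The function name and the example data are invented for illustration.

```python
import math
from statistics import NormalDist

def mantel_haenszel_trend_z(doses, n_tested, n_affected):
    """Trend statistic Z computed from dose levels and quantal counts."""
    n_tot = sum(n_tested)
    x_tot = sum(n_affected)
    # Expected responders in each group under no dose-related trend.
    expected = [n * x_tot / n_tot for n in n_tested]
    numerator = sum(d * (x - e) for d, x, e in zip(doses, n_affected, expected))
    sum_nd = sum(n * d for n, d in zip(n_tested, doses))
    sum_nd2 = sum(n * d * d for n, d in zip(n_tested, doses))
    v = ((n_tot - x_tot) * x_tot * (n_tot * sum_nd2 - sum_nd ** 2)
         / (n_tot ** 2 * (n_tot - 1)))
    return numerator / math.sqrt(v)

# One-sided 5 percent critical value from the standard normal distribution.
critical_5_percent = NormalDist().inv_cdf(0.95)

# Made-up two-group example: 2/10 responders in controls, 8/10 at dose 1.
z_example = mantel_haenszel_trend_z([0.0, 1.0], [10, 10], [2, 8])
trend_detected = z_example > critical_5_percent
```

V is zero when no animals (or all animals) respond, so a practical implementation would need to handle that case before dividing.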
17Some judgment may be required because in certain circumstances the absence of a significant trend when
 considering all the doses may reflect biological realities that cannot be accounted for by a single trend test.
 As an example, consider an experiment with a compound that causes two effects. Suppose the occurrence of
 one of the endpoints makes the observation of the second endpoint less likely (e.g., death or resorption in
 developmental toxicity studies obscures the occurrence of malformations).  In such an instance, the lack of
 significant trend for the second endpoint, when considering all dose groups, may reflect the fact that the first
 endpoint is occurring so often in the high-dose group(s) that the second endpoint cannot be detected in as
 many animals and consequently makes the trend for that endpoint nonsignificant.


-------
        The Mantel-Haenszel test, as stated, may not be appropriate whenever there are
 significant differences in survival.  An important case is one in which the presence of the
 toxic effect is only identified at necropsy and it is not a fatal effect (i.e., does not cause the
 death of the animal).  In this case the period of observation for the experiment can be
 divided into subintervals within which there is relatively little variation in death times.  The
 $X_i$, $N_i$, $E_i$, and V values can be calculated as described above for each subinterval.  A new Z
 statistic is calculated as

 $$Z = \frac{\sum_k \sum_i d_i (X_{ik} - E_{ik})}{(\sum_k V_k)^{1/2}}$$

 where $X_{ik}$ is the number of animals with the toxic effect among animals in the ith treatment
 group that die in the kth subinterval, $E_{ik}$ is the corresponding expected number based on
 animals that die in the kth subinterval, and $V_k$ is the corresponding variance.  The
 significance of this test is also evaluated by comparing this statistic with quantiles from a
 standard normal table.
       Modifications of the Mantel-Haenszel test that are appropriate when either (1) the
 toxic effect  causes the death of the animal or (2) the effect can be identified before the death
 of the animal are discussed in Peto et al. (1980).
       For continuous endpoints, Jonckheere's trend test is recommended (Lehmann, 1975).
 This is a nonparametric test that is an extension of the Mann-Whitney (Wilcoxon) test.  A
 nonparametric test is recommended here because such tests make few assumptions about the
 distribution of the endpoint under consideration.  Given the variety of endpoints that may be
 analyzed under the NOAEL approach, the lack of distributional assumptions with the
Jonckheere test may be advantageous.
       To apply Jonckheere's test, one must have the individual observations (i.e., the values
of the endpoint for each animal examined). When working from summary reports of the
experiments (especially those found in the published literature), these individual values may
not be available.  In such a case, the following likelihood ratio trend test based on the CP
model is recommended.

-------
       First, fit the CP model to the data with the power, k, set equal to one.  Second, apply
the CP model with the dose coefficient, $q_1$, set equal to zero and k still equal to one.  In each
run, the log-likelihood is maximized; denote the values of the two log-likelihoods as $LL_1$ and
$LL_2$ for the first run and the second run, respectively. Then,

$$CHI = 2(LL_1 - LL_2)$$

is a likelihood ratio test statistic that is distributed approximately as a chi-square with
one degree of freedom under the null hypothesis of no treatment effect. The statistic CHI
tests whether the linear dose coefficient is significant (i.e., whether a significant dose-related
trend exists). Comparison of CHI with the one-degree-of-freedom chi-squared quantile
corresponding to (1-2α) determines whether the trend is significant for significance level α,
based on a one-sided (directional) test of trend.
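
The comparison can be sketched as below, given the two maximized log-likelihoods from separate model fits (the fitting itself is not shown). The function name is invented for this example. The sketch relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its (1-2α) quantile equals the square of the standard normal (1-α) quantile.

```python
from statistics import NormalDist

def lr_trend_test(ll_1, ll_2, alpha=0.05):
    """Return (CHI, critical value, significant?) for the likelihood ratio trend test.

    ll_1: maximized log-likelihood with the dose coefficient estimated.
    ll_2: maximized log-likelihood with the dose coefficient fixed at zero.
    """
    chi = 2.0 * (ll_1 - ll_2)
    # One-degree-of-freedom chi-square quantile at (1 - 2*alpha),
    # obtained as the squared standard normal quantile at (1 - alpha).
    critical = NormalDist().inv_cdf(1.0 - alpha) ** 2
    return chi, critical, chi >= critical

# Made-up log-likelihood values, for illustration only.
chi, critical, significant = lr_trend_test(-100.0, -102.0, alpha=0.05)
```

Because the test is directional, the sign of the estimated dose coefficient should also be checked to confirm that the trend is in the direction of interest.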
       Alternatively, versions of nonparametric trend tests that are extensions of log-rank and
Wilcoxon tests (Tarone and Ware, 1977) may be applied to either quantal or continuous data.
       Pairwise Tests. As alternatives to the trend tests listed above, one may wish to
employ pairwise tests to determine if a dose group is significantly different from the control
group irrespective of the overall trend. As noted, however, the trend tests have greater
power for detecting a significant dose-related increase than do the pairwise tests  (Tukey et
al., 1985).  The problem of multiple comparisons must also be considered when doing many
pairwise tests. Nevertheless, pairwise tests may provide useful supplementary information
that can be used in addition to the NOSTASOT approach.
       For quantal data, Fisher's exact test is the recommended pairwise test (Bickel and
Doksum, 1977).  For continuous  data, a nonparametric approach is recommended for the
pairwise comparisons as well as the trend tests.  The Mann-Whitney (Wilcoxon) test is
suitable in cases where the individual data are available (Lehmann,  1975).  When group
means and standard deviations are available but the individual results are not available, t-tests
may be applied to test for pairwise differences (Bickel and Doksum, 1977).  The
nonparametric approach is preferred when the individual data are available because it avoids
distributional assumptions.
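
For quantal data, a one-sided Fisher's exact test comparing one treatment group with the control group can be sketched directly from the hypergeometric distribution. The function name and argument names are invented for this illustration.

```python
import math

def fisher_exact_one_sided(x_control, n_control, x_treated, n_treated):
    """P-value for an increased response in the treated group.

    Sums the hypergeometric probabilities of tables with at least as many
    treated responders as observed, holding the row and column totals fixed.
    """
    total_responders = x_control + x_treated
    n_total = n_control + n_treated
    denominator = math.comb(n_total, total_responders)
    upper = min(n_treated, total_responders)
    p_value = 0.0
    for k in range(x_treated, upper + 1):
        # math.comb returns 0 when the control group cannot hold the remainder.
        p_value += (math.comb(n_treated, k)
                    * math.comb(n_control, total_responders - k) / denominator)
    return p_value
```

When several treatment groups are each compared with the control in this way, the multiple-comparison issue noted above still applies.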

-------
      Computer Programs.  Statistical software packages such as SAS (SAS, 1988) contain
programs that can implement most of the statistical tests discussed for the NOSTASOT
procedure.

-------

-------
                                      GLOSSARY

 Adverse effect.  A biochemical change, functional impairment, or pathological lesion that
 either singly or in combination adversely affects the performance of the whole organism or
 reduces an organism's ability to respond to an additional environmental challenge.

 Benchmark dose (BMD). A statistical lower confidence limit on the dose producing a
 predetermined, altered response for an effect.

 Benchmark response (BMR).  A predetermined level of altered response or risk at which
 the benchmark dose is calculated.

 Biologically significant effect.  A response in an organism or other biological system that is
 considered to have a substantial or noteworthy effect (positive or negative) on the well-being
 of the biological system.  Used to distinguish such effects from statistically significant effects or changes,
 which may or may not be meaningful to the general state of health of the system.

 Chronic exposure. Long-term exposure usually lasting 6 months to a lifetime.

 Confidence limit. A confidence interval for a parameter is a range of values that has a
 specified probability (e.g., 95 percent) of containing the parameter.  The confidence limit
 refers to the upper or lower value of the range (e.g., upper confidence limit).

 Continuous endpoint.  A measure of effect that is expressed  on a continuous scale (e.g.,
body weight or serum enzyme levels).

Critical effect. The first adverse effect, or its known precursor, that occurs as the dose  rate
increases.

-------
Critical study.  A bioassay performed on the most sensitive species used as the basis of RfD
determination.

Developmental toxicity. Adverse effects on the developing organism that may result from
exposure prior to conception or postnatally to the time of sexual maturation.  Adverse
developmental effects may be detected at any point in the life span of the organism.  Major
manifestations of developmental toxicity include death of the developing organism; induction
of structural abnormalities (teratogenicity); altered growth;  and functional deficiency.

Dose-response relationship.  A relationship between (1) the dose, either "administered  dose"
(i.e., exposure) or absorbed dose, and (2) the extent of toxic injury produced by that
chemical. Response can be expressed either as the severity of injury or proportion of
exposed subjects affected.   A dose-response assessment is one of the four steps in a risk
assessment.

Endpoint. An observable or measurable biological or chemical event used as an index  of the
effect of a chemical on a cell, tissue, organ, organism, etc.

Extrapolation.  An estimate of response or quantity at a point  outside the range of the
experimental data. Also refers  to the estimation  of a measured response in a different
species or by a different route than that used in the experimental study of interest (i.e.,
species-to-species, route-to-route,  acute-to-chronic, high-to-low).

Genotoxic.  A broad term that  usually refers to a chemical that has the ability to damage
DNA or  the chromosomes. This can be determined directly by measuring mutations or
chromosome abnormalities or indirectly by measuring DNA repair,  sister-chromatid
exchange, etc.  Mutagenicity is a  subset of genotoxicity.

Lifetime. Covering the life span  of an organism (generally considered 70 years for humans).

-------
 Lowest observed adverse effect level (LOAEL).  The lowest dose or exposure level of a
 chemical in a study at which there is a statistically or biologically significant increase in the
 frequency or severity of an adverse effect in the exposed population as compared with an
 appropriate, unexposed  control group.

 Maximum likelihood estimate (MLE).   A statistical best estimate of the value of a
 parameter from a given data set.

 Model.  A mathematical representation of a natural system intended to mimic the behavior of
 the real system, allowing description of empirical data, and predictions about untested states
 of the system.

 Neurotoxicity. Ability  to damage nervous tissue.

 No observed adverse effect level (NOAEL). An exposure level at which there are no
 statistically or biologically significant increases in the frequency or severity of adverse  effects
 between the exposed population and its appropriate control; some effects may be produced at
 this level, but they are not considered as adverse or precursors to adverse effects.  In an
 experiment with several NOAELs,  the regulatory focus is primarily on the highest one,
 leading to the common usage of the term  NOAEL as the highest exposure without adverse
 effect.

 Pharmacokinetics.  The field of study concerned with defining,  through measurement  or
 modeling, the absorption, distribution, metabolism,  and excretion of drugs or chemicals in a
biological system  as a function of time.

Population variability.  The concept of differences in susceptibility of individuals within a
population to toxicants due to variations such as genetic differences in metabolism and
response of biological tissue to chemicals.

-------
Quantal endpoint.  A dichotomous measure of effect; each animal is scored "normal" or
"affected" and the measure of effect is the proportion of scored animals that are affected.

Reference concentration (RfC).  An estimate (with uncertainty spanning perhaps an order of
magnitude) of a continuous inhalation exposure to the human population (including sensitive
subgroups) that is likely to be without an appreciable risk of deleterious noncancer effects
during a lifetime.

Reference dose (RfD).  An estimate (with uncertainty spanning perhaps an order of
magnitude) of a daily exposure to the human population (including sensitive subgroups) that
is likely to be without appreciable risk of deleterious noncancer effects during a lifetime.

Reproductive toxicity.  Harmful effects on fertility, gestation, or offspring caused by
exposure of either parent to a substance.

Risk.  The probability of injury, disease, or death under specific circumstances, relative to
the background probability.  In quantitative terms, risk is expressed in values ranging from
zero (representing the certainty that the probability of harm is no greater than the background
probability) to one (representing the certainty that harm will occur).

Risk assessment.  The scientific activity of evaluating the toxic properties of a chemical and
the conditions of human exposure to it both to ascertain the likelihood that exposed humans
will be adversely affected and to characterize the nature of the effects they may experience.
The assessment may involve the following four steps:
       Hazard identification. The determination of whether a particular chemical is or is
       not causally linked to particular health effect(s).
       Dose-response assessment. The determination of the relation between the magnitude
       of exposure and the probability of occurrence of the health effects in question.
       Exposure assessment. The determination of the extent of human exposure.
       Risk characterization. The description of the nature and often the magnitude of
       human risk, including attendant uncertainty.

 Statistically significant effect.  In statistical analysis of data, a health effect that exhibits
 differences between a study population and a control group that are unlikely to have arisen
 by chance alone.
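
As a minimal sketch of how "unlikely to have arisen by chance alone" might be quantified for quantal data, the following computes a one-sided Fisher exact p-value from the counts of affected animals in a control and a treated group. The choice of test, the function name, and the counts are assumptions for illustration; the glossary entry does not prescribe a particular statistical test or significance level.

```python
from math import comb


def fisher_one_sided(ctrl_affected: int, ctrl_total: int,
                     trt_affected: int, trt_total: int) -> float:
    """P-value for 'treated incidence exceeds control incidence'.

    Under the null hypothesis, the number of affected animals in the
    treated group follows a hypergeometric distribution.
    """
    total = ctrl_total + trt_total              # all animals scored
    affected = ctrl_affected + trt_affected     # all affected animals
    denom = comb(total, trt_total)
    upper = min(affected, trt_total)
    # Sum the probabilities of outcomes at least as extreme as observed.
    tail = sum(
        comb(affected, k) * comb(total - affected, trt_total - k)
        for k in range(trt_affected, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical study: 0 of 10 controls affected, 5 of 10 treated affected.
    print(fisher_one_sided(0, 10, 5, 10))
```

A small p-value indicates that a difference this large would rarely arise by chance if exposure had no effect.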

 Subchronic exposure.  Exposure to a substance spanning no more than approximately 10
 percent of the lifetime of an organism.

 Threshold toxicant.  A substance showing an apparent level of effect that is a minimally
 effective dose, above which a response may occur and below which no response is expected.

 Uncertainty.  In the conduct of risk assessment (hazard identification, dose-response
 assessment, exposure assessment, risk characterization) the need to make assumptions or best
judgments in the absence of precise scientific data creates uncertainties.  These uncertainties,
 expressed qualitatively and sometimes quantitatively, attempt to define the usefulness of a
particular evaluation in making a decision based on the available data.

Uncertainty factor (UF).  One of several, generally 10-fold factors used in operationally
deriving the reference dose (RfD) from experimental data.  UFs are intended to account for
(1) the variation in sensitivity among members of the human population; (2) the uncertainty
in extrapolating animal data to the case of humans; (3) the uncertainty in extrapolating from
data obtained in  a study that is of less-than-lifetime exposure; and (4) the  uncertainty in using
LOAEL data rather than NOAEL data.
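
The arithmetic implied by this definition can be sketched as follows, assuming the conventional form in which the experimental dose (a NOAEL or LOAEL) is divided by the product of the applicable UFs. The function name and the numerical values are hypothetical and are not taken from any study.

```python
import math
from typing import Sequence


def reference_dose(point_of_departure: float,
                   uncertainty_factors: Sequence[float]) -> float:
    """Divide a NOAEL or LOAEL (e.g., in mg/kg-day) by the product of the UFs."""
    if point_of_departure <= 0:
        raise ValueError("point of departure must be positive")
    if any(uf < 1 for uf in uncertainty_factors):
        raise ValueError("uncertainty factors are expected to be >= 1")
    return point_of_departure / math.prod(uncertainty_factors)


if __name__ == "__main__":
    # Hypothetical animal NOAEL of 10 mg/kg-day with two 10-fold factors:
    # (1) human variability and (2) animal-to-human extrapolation.
    print(reference_dose(10.0, [10, 10]))
    # Same dose treated as a LOAEL, adding a third 10-fold factor (4).
    print(reference_dose(10.0, [10, 10, 10]))
```

Each additional 10-fold factor lowers the resulting value by an order of magnitude.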
                                                 *U.S. GOVERNMENT PRINTING OFFICE: 1995-650-006/2204 1
