Use of the Benchmark Dose Approach in Health Risk Assessment


                                                            EPA/630/R-94/007
                                                                February 1995
The Use of the Benchmark Dose Approach in Health Risk Assessment
                             Principal Authors
             Kenny Crump, Ph.D.: Clement International Corporation
              Bruce Allen, M.S.:  Clement International Corporation
               Elaine Faustman, Ph.D.:  University of Washington
                           EPA Technical Panel
      Michael Dourson, Ph.D.: Environmental Criteria and Assessment Office
      Carole Kimmel, Ph.D.:  Office of Health and Environmental Assessment
            Harold Zenick, Ph.D.:  Health Effects Research Laboratory
                       Risk Assessment Forum Staff
                 Executive Director:  William Wood, Ph.D.
                 Science Coordinator:  Harry Teitelbaum, Ph.D.
                        Risk Assessment Forum
                 U.S. Environmental Protection Agency
                        Washington, DC  20460

-------
                                   DISCLAIMER

      This document has been reviewed in accordance with U.S. Environmental Protection
Agency policy and approved for publication.  Mention of trade names or commercial
products does not constitute endorsement or recommendation for use.

-------
                                CONTENTS

LIST OF TABLES

LIST OF FIGURES

FOREWORD

ACKNOWLEDGEMENTS

CONTRIBUTORS AND REVIEWERS

1.   INTRODUCTION

2.   BACKGROUND
    2.1.   Cancer Versus Noncancer Effects
    2.2.   Overview of the NOAEL Approach to Determining RfDs and RfCs
    2.3.   Overview of the Benchmark Approach

3.   DETAILED DESCRIPTION OF THE BMD APPROACH
    3.1.   Selection of Responses to Model
         3.1.1.  Example:  Selecting Endpoints Within a Single Study
    3.2.   Use of Categorical Versus Continuous Data in BMD Modeling
         3.2.1.  Example
    3.3.   Mathematical Models for Defining a BMD
         3.3.1.  Criteria for Selecting Models
         3.3.2.  Example
         3.3.3.  Additional Research
    3.4.   Adjusting for Lack of Fit
         3.4.1.  Examples
         3.4.2.  Additional Research
    3.5.   Measure of Increased Risk
         3.5.1.  Examples
         3.5.2.  Additional Research
    3.6.   Selection of a Benchmark Level of Risk
         3.6.1.  Examples
         3.6.2.  Additional Research
    3.7.   Confidence Limit Calculation
         3.7.1.  Example
         3.7.2.  Additional Research
    3.8.   Choosing an Appropriate BMD
         3.8.1.  Examples
         3.8.2.  Additional Research
    3.9.   Uncertainty Factors
         3.9.1.  Example
         3.9.2.  Additional Research
    3.10.  Summary of BMD Decisions

4.   DETAILED COMPARISON OF NOAEL AND BMD APPROACHES
    4.1.   Conceptual Basis
    4.2.   Relative Sizes of NOAELs and BMDs
    4.3.   Constraints Imposed by the Experimental Design
    4.4.   Number of Experimental Subjects and Their Distribution into Treatment
          Groups
    4.5.   Incorporation of Dose-Response Information
    4.6.   Sensitivity to Data Interpretation and to Small Changes in Data
    4.7.   Model Sensitivity
    4.8.   Quantitative Estimates of Risk
    4.9.   Statistical Expertise

5.   SUMMARY OF RESEARCH NEEDS
    5.1.   Summary of Research Needs Related to BMD Decision Points
    5.2.   Additional Topics For Investigation/Development
         5.2.1.  Comparison of Dose-Response Curves for Different Types of Data
                 and Toxic Endpoints
         5.2.2.  Development of Dose-Response Models for Multiple Endpoints of
                 Toxicity

6.   REFERENCES

APPENDIX: STATISTICAL METHODS

GLOSSARY

-------
                                LIST OF TABLES

Table 1.     Description of Typical Uncertainty and Modifying Factors in Deriving
            Reference Doses (RfDs)

Table 2.     Steps and Decisions Required in the BMD Approach

Table 3.     Acrylamide-Induced Tibial Nerve Degeneration in Rats

Table 4.     Dose-Response Models Proposed for Estimating BMDs

Table 5.     EGPE-Induced Extramedullary Hematopoiesis in the Spleen of Rats

Table 6.     EGME-Induced Testicular Toxicity in Rats and Mice

Table 7.     Gestational Weight Gains in Pregnant Rats

Table 8.     BMDs (mg/kg/day) Calculated for Sulfamethazine Data

Table 9.     Summary of Decisions and Options for BMD Approach

-------
                                LIST OF FIGURES

Figure 1.    Example of calculation of a BMD

Figure 2.    Examples of quantal linear regression (QLR) curves:
             $P(d) = c + (1 - c)\{1 - \exp[-q_1 (d - d_0)]\}$

Figure 3.    Examples of quantal quadratic regression (QQR) curves:
             $P(d) = c + (1 - c)\{1 - \exp[-q_1 (d - d_0)^2]\}$

Figure 4.    Examples of quantal Weibull (QW) curves:
             $P(d) = c + (1 - c)\{1 - \exp[-q_1 d^k]\}$

Figure 5.    Examples of continuous quadratic regression (CQR) curves:
             $m(d) = c + q_1 (d - d_0)^2$

Figure 6.    Examples of continuous power (CP) curves: $m(d) = c + q_1 d^k$

Figure 7.    Moderate to severe nerve degeneration in rats following acrylamide
             exposure

Figure 8.    Extramedullary hematopoiesis of the spleen in rats following EGPE
             exposure

Figure 9.    Testes weights in rats following EGME exposure

Figure 10.   Testes weights in mice following EGME exposure

Figure 11.   Weight gain during gestation in rats exposed to sulfamethazine

Figure 12.   Example of BMDs calculated from steep versus gradual dose
             responses

-------
FOREWORD

The EPA established the Risk Assessment Forum to develop scientific consensus on
risk assessment issues and to incorporate these ideas into Agency guidance. Part of this
function is to focus on and stimulate Agency discussion of promising new risk assessment
techniques. For almost 10 years, scientists have been studying the benchmark dose (BMD)
as a promising technique for the quantitative assessment of noncancer health effects. This
report was developed to serve as a background document for discussing benchmark dose
applications to noncancer risk assessment.
Cancer risk assessment uses an assortment of quantitative methods, whereas, until
recently, quantitative approaches to noncancer risk assessment were much more limited. The
EPA is now developing new quantitative methods for noncancer risk assessment. The
information presented in this report is one step in developing the basis for an EPA consensus
on the role of benchmark methods in the quantitative assessment of noncancer health risk.
The report presents a basic overview of the benchmark method, which may provide an
additional quantitative approach to current EPA practice (i.e., the no observed adverse effect
level/uncertainty factor approach for calculating a reference dose). Clement International
Corporation developed the main body of this report under contract to EPA.1 Agency
scientists modified the Clement draft to prepare this document, which more closely reflects
current EPA terminology and practice.
The document focuses especially on critical decisions that must be made in deriving a
BMD and applying the BMD in risk assessment. Major decisions in using the BMD are
explained, and the sensitivity of the final result to each assumption is evaluated. The
document also identifies many unresolved issues in benchmark dose application and identifies
research that may help resolve some of these issues. Technical guidance on study selection,
data selection, and model selection is not provided, but the reader is referred to appropriate
sources on these highly technical topics.

1Kenny S. Crump; Bruce C. Allen; Elaine M. Faustman. 1992. "The Use of the Benchmark Dose Approach
in Health Risk Assessment." EPA Contract No. 68-C8-0036.

-------
                              ACKNOWLEDGEMENTS

       This U.S. Environmental Protection Agency (EPA) report was developed under the
auspices of EPA's Risk Assessment Forum, a standing committee of EPA scientists charged
with developing risk assessment guidance for Agency-wide use. An Office of Research and
Development intraoffice technical panel including Drs. Michael Dourson (Environmental
Criteria and Assessment Office-Cincinnati), Carole Kimmel (Office of Health and
Environmental Assessment), and Harold Zenick (Health Effects Research Laboratory) led this
effort.  This document is based, in large part, on a final report titled The Use of the
Benchmark Dose Approach in Health Risk Assessment that was prepared by Dr.  Kenny S.
Crump and Bruce C.  Allen, Clement International Corporation, and Dr. Elaine  M.
Faustman, University of Washington, under contract to the EPA.  The Risk Assessment
Forum gratefully acknowledges their contribution.

-------
CONTRIBUTORS AND REVIEWERS

Numerous experts, both inside and outside the Agency, provided technical review for
this document. In 1993, the following scientists participated in a peer review of the draft
document and provided written comments to EPA via mail. Editorial assistance was
provided by R.O.W. Sciences, Inc.

Chao Chen
U.S. Environmental Protection Agency
Office of Research and Development
Washington, DC

John Christopher
California Environmental Protection
Agency
Sacramento, CA

George Daston
The Procter and Gamble Company
Miami Valley Laboratories
Cincinnati, OH

Kerry Dearfield
U.S. Environmental Protection Agency
Office of Pesticide Programs
Washington, DC

Michael Dourson
U.S. Environmental Protection Agency
Environmental Criteria and Assessment
Office
Cincinnati, OH

David Gaylor
National Center For Toxicological
Research
Jefferson, AR

Suzanne Gianinni
U.S. Environmental Protection Agency
Office of Policy, Planning and Evaluation
Washington, DC

Mari Golub
California Regional Primate Research
Center
University of California at Davis
Davis, CA

Lee Gorsky
U.S. Environmental Protection Agency
Region 5
Chicago, IL

Richard Hertzberg
U.S. Environmental Protection Agency
Environmental Criteria and Assessment
Office
Cincinnati, OH

Robert Kavlock
U.S. Environmental Protection Agency
Health Effects Research Laboratory
Research Triangle Park, NC

Linda Knauf
U.S. Environmental Protection Agency
Environmental Criteria and Assessment
Office
Cincinnati, OH

-------
Daniel Krewski
Health & Welfare Canada
Ottawa, Ontario
Canada

Steven Lewis
Exxon Biomedical Sciences, Inc.
East Millstone, NJ

Elizabeth Margosches
U.S. Environmental Protection Agency
Office of Pollution Prevention and Toxics
Washington, DC

Edward Ohanian
U.S. Environmental Protection Agency
Office of Water
Washington, DC

William Pease
University of California at Berkeley
Berkeley, CA

Louise Ryan
Dana-Farber Cancer Institute
Boston, MA

Chon Shoaf
U.S. Environmental Protection Agency
Environmental Criteria and Assessment
Office
Research Triangle Park, NC

Jeanette Wiltse
U.S. Environmental Protection Agency
Office of Research and Development
Washington, DC

Suzanne Wuerthele
U.S. Environmental Protection Agency
Region 8
Denver, CO

-------
1. INTRODUCTION

The U.S. Environmental Protection Agency (EPA) frequently calculates a reference
dose (RfD) or reference concentration (RfC), which is used along with other scientific
information in setting standards for noncancer human health effects. An RfD or RfC is an
estimate (with uncertainty spanning perhaps an order of magnitude) of daily exposure (RfD)
or continuous inhalation exposure (RfC) to the human population (including sensitive
subgroups) that is likely to be without an appreciable risk of deleterious effects during a
lifetime (U.S. EPA, 1992). The EPA estimates the RfD or RfC by first determining the no
observed adverse effect level (NOAEL) for the critical effect; the NOAEL represents the
highest experimental dose for which no adverse health effects have been documented.
Missing information is then accounted for by the application of uncertainty factors to estimate
a reference dose or concentration.
Using the NOAEL in determining RfDs and RfCs has many limitations (reviewed by
Kimmel and Gaylor [1988] and others and noted by EPA's Science Advisory Board [U.S.
EPA, 1986, 1988a, b, 1989]). These limitations include the following:

• The experimental dose called the NOAEL is based on scientific judgment and
is often a source of controversy.
• Experiments involving fewer animals tend to produce larger NOAELs and, as
a consequence, may produce larger RfDs or RfCs (the reverse would seem
more appropriate in a regulatory context because larger experiments should
provide greater evidence of safety).
• The slope of the dose response plays little role in determining the NOAEL.
• In conjunction with exposure data, the RfD/RfC can be used to estimate the
size of the "population at risk" but not the size of their risks.
• The NOAEL is limited to the doses tested experimentally.
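
The sample-size limitation in the list above can be illustrated numerically. The following sketch is not taken from this report. It assumes a hypothetical control response rate of 5 percent, a hypothetical true response rate of 30 percent at one treated dose, equal group sizes, and a one-sided Fisher exact test at the 0.05 level as the criterion for a statistically significant increase. It computes the exact probability that the treated dose shows no significant increase, and so could be labeled a NOAEL, for several group sizes.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k responders among n animals with response rate p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def fisher_one_sided(x_control, x_treated, n):
    """One-sided Fisher exact p-value for 'treated rate > control rate'.

    Both groups are assumed to contain n animals (an illustrative simplification).
    """
    total = x_control + x_treated
    upper = min(n, total)
    tail = sum(comb(n, k) * comb(n, total - k) for k in range(x_treated, upper + 1))
    return tail / comb(2 * n, total)


def prob_effect_missed(n, p_control, p_treated, alpha=0.05):
    """Exact probability that a truly active dose is NOT statistically significant."""
    missed = 0.0
    for x0 in range(n + 1):
        for x1 in range(n + 1):
            if fisher_one_sided(x0, x1, n) > alpha:
                missed += binom_pmf(x0, n, p_control) * binom_pmf(x1, n, p_treated)
    return missed


# Hypothetical response rates and group sizes, chosen only for illustration.
missed = {n: prob_effect_missed(n, 0.05, 0.30) for n in (10, 20, 50)}
for n, prob in missed.items():
    print(f"animals per group = {n:2d}: P(effect not detected) = {prob:.2f}")
```

Under these assumptions the chance of missing a real effect falls as the group size grows, so a smaller experiment is more likely to leave an active dose classified as a NOAEL.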

These and other limitations of the NOAEL approach prompted development of an
alternative that applies uncertainty factors to a benchmark dose (BMD) rather than to a
NOAEL (Crump, 1984; Gaylor, 1989; and others). A BMD is a statistical lower confidence
limit for a dose that produces a predetermined change in response rate of an adverse effect
(called the benchmark response or BMR) compared to background. Unlike the NOAEL, the
BMD takes into account dose-response information by fitting a mathematical model to dose-
response data. The BMR is generally set near the lower limit of responses that can be
measured directly in animal experiments of typical size. Thus, unlike the risk assessment
methods that EPA employs with cancer effects (Anderson et al., 1983), the BMD method
does not extrapolate to doses far below the experimental range.
The EPA believes that the BMD approach presents a significant opportunity to
improve the scientific basis of noncancer risk assessment. This document aims to encourage
further application and development of the method by outlining the benchmark approach. It
is hoped the BMD will add a new perspective to risk assessment and overcome some
limitations of the NOAEL. To do this, the risk assessment community must first become
familiar with the benchmark approach and its opportunities and limitations.
This document provides background information on applying the BMD approach;
discusses the goals, strengths, and limitations of the BMD approach; and provides a detailed
comparison of the NOAEL and BMD methods as well as examples of the steps required in
calculating the BMD. The description of the benchmark approach includes the important
decisions and options at each step. Finally, the document suggests areas for additional
research. As work on the BMD continues and the approach matures, it is anticipated that
guidance will be developed for applying the benchmark dose in estimating the RfD.

-------
2. BACKGROUND

2.1. CANCER VERSUS NONCANCER EFFECTS
Assessment of risk from exposure to toxic chemicals has traditionally been performed
differently, depending on whether the response is cancer or a noncancer health effect (U.S.
EPA, 1987). EPA cancer risk assessments use dose-response models to extrapolate risks
measured in high-dose animal experiments to the much lower doses typical of human
environmental exposures. This extrapolation depends on the dose-response model selected;
different models can fit experimental data equally well, yet yield low-dose risk estimates that
differ by many orders of magnitude (Crump, 1985). EPA generally uses a dose-response
model for estimating cancer risks that assumes that increased risk is proportional to dose at
low doses (i.e., increased risk varies linearly with dose at low doses) (U.S. EPA, 1987). An
important consequence of this assumption is that any dose, no matter how small, is assumed
to result in some increase in cancer risk (i.e., it is assumed that a threshold for response does
not exist).
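
The model dependence of low-dose extrapolation can be seen in a small numerical sketch. The example is illustrative only and makes several assumptions not found in this report: zero background response, a one-hit model P(d) = 1 - exp(-q d) that is linear at low doses, a Weibull model P(d) = 1 - exp(-q d^k) with a hypothetical shape k = 3, and a single hypothetical anchor point (10 percent extra risk at a dose of 100 units) through which both curves are forced. Because the curves share one point rather than being fitted to data, the sketch shows how far two models can diverge at low doses, not how well either fits.

```python
import math


def one_hit_risk(dose, q):
    """Extra risk under a one-hit model, P(d) = 1 - exp(-q * d)."""
    return -math.expm1(-q * dose)


def weibull_risk(dose, q, k):
    """Extra risk under a Weibull model, P(d) = 1 - exp(-q * d**k)."""
    return -math.expm1(-q * dose**k)


# Hypothetical anchor: both curves give 10 percent extra risk at dose 100.
ANCHOR_DOSE, ANCHOR_RISK, SHAPE = 100.0, 0.10, 3.0
hazard = -math.log1p(-ANCHOR_RISK)        # value of q*d (or q*d**k) at the anchor
q_one_hit = hazard / ANCHOR_DOSE
q_weibull = hazard / ANCHOR_DOSE**SHAPE

# Extrapolate both curves to a dose 1,000-fold below the anchor.
low_dose = ANCHOR_DOSE / 1000.0
risk_linear = one_hit_risk(low_dose, q_one_hit)
risk_weibull = weibull_risk(low_dose, q_weibull, SHAPE)
orders_apart = math.log10(risk_linear / risk_weibull)

print(f"one-hit risk at low dose: {risk_linear:.3e}")
print(f"Weibull risk at low dose: {risk_weibull:.3e}")
print(f"difference: about {orders_apart:.1f} orders of magnitude")
```

In the one-hit model, halving a low dose approximately halves the extra risk, which is the low-dose linearity described above; the Weibull curve with k greater than 1 falls away much faster.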
Much of the rationale for these assumptions is based on the idea that carcinogenicity
is mediated through genotoxicity. The possibility that a single molecule of a genotoxicant
may be sufficient to alter the deoxyribonucleic acid (DNA) in a single cell so that a cancer is
eventually produced suggests that—no matter how unlikely such an event is—the dose-
response relationship cannot have a threshold and must be linear, at least at low doses (NRC,
1977). Crump et al. (1976) argued more generally that whenever a biological effect occurs
spontaneously in the absence of any exposure, and the effect of the toxic insult is mediated
through augmenting processes that are already operating spontaneously, a threshold would
not be present and the response should vary approximately linearly with sufficiently low
doses.
In contrast to risk assessment for cancer, less effort has been directed at developing
dose-response models for noncancer effects. One reason for this has been the lack of a
consensus regarding the shape of the dose-response curve, especially below the NOAEL for
noncancer effects. Many scientists believe that thresholds are likely to exist for many
chemically induced biological effects, particularly noncancer effects. Another reason is the
diverse nature of noncancer effects. The term noncancer effect is nonspecific and
encompasses a wide variety of responses, including adverse effects on specific organs or
organ systems, reproductive capacity, viability and structure of developing offspring in utero,
and survival.2 For even a single type of effect, the response can range in severity from mild
and reversible to irreversible and life-threatening. The severity of the response may depend
on both the level and duration of exposure. Modeling this diversity of response represents a
major challenge.
The EPA often assesses risks for noncancer effects by applying uncertainty factors to
a NOAEL for a critical effect. This method does not involve dose-response models. The
purpose of this report is to discuss the benchmark dose option in which the NOAEL is
replaced with a BMD determined by a dose-response model. It is important to keep in mind,
however, that the calculation of a BMD does not involve using a dose-response model to
extrapolate risks to low doses, as the EPA does when conducting risk assessments for cancer
effects.

2.2. OVERVIEW OF THE NOAEL APPROACH TO DETERMINING RfDs AND
RfCs
The BMD and NOAEL have a number of features in common. Before the BMD
method is presented, the NOAEL is briefly described. A NOAEL is defined as:

"An exposure level at which there are no statistically or biologically significant
increases in the frequency or severity of adverse effects between the exposed
population and its appropriate control. Some effects may be produced at this
level, but they are not considered as adverse, nor precursors to adverse
effects. In an experiment with several NOAELs, the regulatory focus is
primarily on the highest one, leading to the common usage of the term
NOAEL as the highest exposure without adverse effect." (U.S. EPA, 1992).

An RfD, or RfC,3 is obtained by dividing the NOAEL by one or more
uncertainty factors.

2The words "effect" and "response" are used interchangeably in this document and refer generally to conditions
that are considered adverse. Although the term "risk" is sometimes used in a similar manner to denote a
specific adverse effect (e.g., cancer or reduced fertility), in this document "risk" is used quantitatively and
refers specifically to an increased probability of an adverse effect.

Different reference values may be developed for different routes of exposure (i.e.,
oral RfDs and inhalation RfCs) and specific health effects (e.g., RfD_DT values for developmental
effects). Similar techniques can also be used for different durations of exposure (e.g., for
subchronic exposures of about 7 years). The overall approach to determining an RfD, which
is outlined below, is generally the same for each type of reference value.
An RfD determination begins with a review of the relevant literature to identify the
"critical effect(s)"4 on which the RfD is to be based. This determination takes into account
the overall quality of the studies, the route and duration of exposure, and the range of health
effects. If adequate human data are available, such data are used as the basis for the RfD;
otherwise data from animal studies are used. Among the well-conducted studies, the RfD is
based on the study demonstrating the critical effect at the lowest dose. This dose is the
lowest observed adverse effect level (LOAEL). The responses in the critical study that are
obtained at doses below the LOAEL are examined to confirm that they constitute NOAELs.
The RfD is calculated by dividing the largest NOAEL from the critical study by appropriate
uncertainty factors. Table 1 presents uncertainty factors currently used by EPA (U.S. EPA,
1992).
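
The arithmetic of the final step can be sketched as follows. This is a minimal illustration, not an EPA procedure: the function name, the example NOAEL of 5 mg/kg/day, and the choice of factors are hypothetical. The limits on the modifying factor (greater than zero, at most 10) and the 10,000 ceiling follow the notes to table 1; applying that ceiling to the product of the standard uncertainty factors alone is an assumption of this sketch.

```python
import math

MAX_TOTAL_UF = 10_000  # ceiling noted with table 1; its scope here is an assumption


def reference_dose(noael, uncertainty_factors, modifying_factor=1.0):
    """Divide a NOAEL by the product of uncertainty factors and a modifying factor.

    noael               -- NOAEL from the critical study (e.g., mg/kg/day)
    uncertainty_factors -- mapping of factor label (H, A, S, L, D) to its value
    modifying_factor    -- professional-judgment factor; default value is 1
    """
    if noael <= 0:
        raise ValueError("NOAEL must be positive")
    if not 0 < modifying_factor <= 10:
        raise ValueError("modifying factor must be greater than 0 and at most 10")
    total_uf = math.prod(uncertainty_factors.values())
    if total_uf > MAX_TOTAL_UF:
        raise ValueError("combined uncertainty factor exceeds 10,000")
    return noael / (total_uf * modifying_factor)


# Hypothetical case: chronic animal study, NOAEL = 5 mg/kg/day,
# 10-fold interhuman (H) and 10-fold animal-to-human (A) factors.
rfd = reference_dose(5.0, {"H": 10, "A": 10})
print(f"RfD = {rfd} mg/kg/day")
```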

3The remainder of this report will refer only to RfDs; however, the discussion is equally applicable to RfCs.
4The first adverse effect(s), or known precursor(s), that occurs as the dose rate increases (U.S. EPA, 1992).

Table 1. Description of Typical Uncertainty and Modifying Factors in Deriving Reference
Doses (RfDs)(a)

Standard uncertainty       General guidelines(b)
factors (UFs)(c)

H (Interhuman)             Generally use a 10-fold factor when extrapolating from valid
                           experimental results from studies using prolonged exposure to
                           average healthy humans. This factor is intended to account for
                           the variation in sensitivity in the human population.

A (Experimental animal     For RfDs, generally use a 10-fold factor when extrapolating
to man)                    from valid results of long-term studies on experimental animals
                           when results of studies of human exposure are not available or
                           are inadequate. For RfCs, this factor is reduced to 3-fold
                           when a NOAEL (human equivalent concentration) is used as
                           the basis of the estimate. In either case this factor is intended
                           to account for the uncertainty in extrapolating animal data to
                           humans.

S (Subchronic to           Generally use a 10-fold factor when extrapolating from less
chronic)                   than chronic results on experimental animals or humans. This
                           factor is intended to account for the uncertainty in extrapolating
                           from less than chronic NOAELs to chronic NOAELs.

L (LOAEL to NOAEL)         Generally use a 10-fold factor when deriving an RfD/RfC
                           from a LOAEL instead of a NOAEL. This factor is intended
                           to account for the uncertainty in extrapolating from LOAELs to
                           NOAELs.

D (Incomplete data         Generally use a 10-fold factor when extrapolating from valid
base to complete)          results in experimental animals when the data are "incomplete."
                           This factor is intended to account for the inability of any single
                           study to address adequately all possible adverse outcomes.

Modifying factor (MF)      Use professional judgment to determine an additional
                           uncertainty factor termed a modifying factor (MF) that is
                           greater than zero and less than or equal to 10. The magnitude
                           of the MF depends on the professional assessment of scientific
                           uncertainties of the study and data base not explicitly treated
                           above (for example, the number of animals tested). The
                           default value for the MF is 1.

(a) Source: Adapted in part from Dourson and Stara (1983), Barnes and Dourson (1988), and Jarabek et al.
(1989).
(b) Professional judgment is required to determine the appropriate value to use for any given UF. The values
listed in this table are nominal values that are frequently used by the EPA.
(c) Note: The maximum uncertainty factor to be used with the minimum confidence data base is 10,000.

2.3. OVERVIEW OF THE BENCHMARK APPROACH
A BMD is defined as a statistical lower confidence limit on the dose producing a
predetermined level of change in adverse response compared with the response in untreated
animals (the benchmark response, or BMR). For example, a BMD could represent a 95
percent statistical lower confidence limit on the dose corresponding to a 1 percent increase in
an adverse response over that found in untreated animals. The benchmark level of adverse
change in response (the BMR) is 1 percent in this example.
A BMD is calculated by fitting a mathematical dose-response model to data using
appropriate statistical procedures. The calculations necessary to determine a BMD are
illustrated in figure 1 for a hypothetical set of dose-response data. The horizontal axis
indicates the doses to which the animals were exposed, and the vertical axis gives the
percentage of animals having a particular adverse response. Each solid symbol represents
the average outcome in an experimental dose group. For simplicity, it is assumed that the
adverse effect did not occur in untreated animals. The figure depicts a mathematical dose-
response model fit to the data and a corresponding curve (also derived from the mathematical
model) of statistical lower bounds on doses corresponding to various levels of response. The
predetermined level of increased response (the BMR) used to define the BMD is shown on
the response (vertical) axis. The resulting BMD plotted on the dose (horizontal) axis is
determined as the lower bound on dose corresponding to an increased response equal to the
BMR. Figure 1 also shows the NOAEL determined from these data. Although in this
particular hypothetical example the BMD is illustrated as being smaller than the NOAEL, a
BMD can be either less than or greater than the corresponding NOAEL.
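
The calculation described above can be sketched in a few lines of code. The sketch is one possible illustration under stated assumptions and is not the procedure given in this report's statistical appendix: the dose-group data are invented, the model is a one-hit curve P(d) = 1 - exp(-q d) with zero background response (matching the simplification used for figure 1), the BMR is set to 10 percent, and the 95 percent lower limit on dose is obtained from a likelihood-ratio bound (critical value 2.706, the square of 1.645).

```python
import math

# Hypothetical quantal data: (dose, animals tested, animals responding).
DATA = [(0.0, 50, 0), (10.0, 50, 3), (30.0, 50, 11), (100.0, 50, 31)]


def log_likelihood(q, data=DATA):
    """Binomial log-likelihood for the one-hit model P(d) = 1 - exp(-q * d)."""
    total = 0.0
    for dose, n, x in data:
        p = 1.0 - math.exp(-q * dose)
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against log(0)
        total += x * math.log(p) + (n - x) * math.log(1.0 - p)
    return total


def maximize(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


def benchmark_dose(bmr=0.10, crit=2.706, data=DATA):
    """Return (maximum likelihood dose at the BMR, lower-limit dose, i.e., the BMD)."""
    q_hat = maximize(lambda q: log_likelihood(q, data), 1e-8, 1.0)
    ll_max = log_likelihood(q_hat, data)

    def excess(q):
        # Positive once q is large enough to be rejected by the likelihood-ratio test.
        return 2.0 * (ll_max - log_likelihood(q, data)) - crit

    # An upper bound on the slope q corresponds to a lower bound on dose.
    lo, hi = q_hat, 2.0 * q_hat
    while excess(hi) < 0.0:
        hi *= 2.0
    for _ in range(200):                       # bisection on the upper slope bound
        mid = (lo + hi) / 2.0
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    q_upper = (lo + hi) / 2.0

    target = -math.log(1.0 - bmr)              # q * dose needed to reach the BMR
    return target / q_hat, target / q_upper


mle_dose, bmd = benchmark_dose()
print(f"dose at BMR (best fit): {mle_dose:.2f}")
print(f"BMD (lower limit):      {bmd:.2f}")
```

The BMD is always below the best-fit dose at the BMR, and the gap between the two narrows as group sizes increase, which is how the approach accounts for sample size.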
The BMD approach addresses several quantitative or statistical criticisms of the
NOAEL approach presented earlier. The goals of the BMD approach include providing
flexibility with respect to the definition of the BMD (i.e., not to be restricted to one of the
experimental dose levels) and accounting more appropriately for sample size and dose-
response characteristics (Crump, 1984; Dourson et al., 1985; Kimmel and Gaylor, 1988).
Even though mathematical models are used in the BMD approach, their use is less
certain for "low-dose extrapolation," at doses far below the range for which increased
responses can be directly measured. Because the models proposed for the BMD approach
are statistical models, their predictions may be seriously in error if used to extrapolate to low
doses without incorporating detailed information on the mechanisms through which the
toxicant causes the particular adverse effect being modeled. On the other hand, because the
calculation of a BMD does not involve extrapolation far beyond the range of the experimental
data, it should not be highly dependent on the dose-response model used.

-------
[Figure 1. Example of calculation of a BMD. Vertical axis: percentage of animals responding
(0 to 100), with the BMR marked; horizontal axis: dose, with the BMD and NOAEL marked. The
plot shows data points with confidence bars, the best-fitting dose-response model, and the
curve of lower statistical limits on dose. BMR = target response level used to define BMD.]

-------
3. DETAILED DESCRIPTION OF THE BMD APPROACH

The determination of an RfD with the BMD approach involves three basic steps.
First, a response or group of responses from one or more experiments is selected. Second,
BMDs are calculated for the selected responses. Third, a single BMD is determined from
among those calculated, and an RfD is calculated by dividing that BMD by appropriate
uncertainty factors. Each of these steps involves a number of decision points that will be
discussed in detail in this section. In the first step, one must decide how to select the
experiments and responses for calculating BMDs. In the third step, the values of the
uncertainty factors must be chosen. These selections and decisions are required in both the
NOAEL and BMD approaches.
Particular attention will be focused here on the second step, which is unique to the
BMD approach. This step involves specifying the form in which the data will be recorded
for modeling, choosing a dose-response model, selecting the mathematical definition of
altered response, stipulating the benchmark level of altered response (the BMR) used to
define the BMD, and selecting the procedure for computing statistical confidence limits used
to calculate the BMD (including selecting the size of the confidence limit).
Each of the decision points required in the BMD approach is listed in table 2. These
decision points and the options available for those decisions are discussed in detail in the
following sections. Many of these issues were also discussed at a recent workshop sponsored
by the EPA, American Industrial Health Council, and International Life Sciences Institute
(Barnes et al., 1994). In applying the BMD method, the EPA may find it desirable to
provide guidance for choosing among these options so that RfDs obtained with the BMD
approach are calculated in a consistent manner.

3.1. SELECTION OF RESPONSES TO MODEL
There may be several toxicity studies for a particular substance, and each study may
contain data for a number of biological effects. When calculating a BMD, dose-response
models must be applied to one or more effects from one or more studies. The following
section discusses several options for selecting responses for modeling.

-------
Table 2.  Steps and Decisions Required in the BMD Approach

  Step                              Decisions

  1.    Study/response selection    Experiments to include
                                    Responses to model

  2.    Model dose response         Format of data
                                    Mathematical model(s)
                                    Handling lack of fit
                                    Measure of altered response

  3.    Calculate BMD(s)            BMR definition
                                    Confidence limit calculation

  4.    Determine RfD               Specific BMD for RfD calculation
                                    Uncertainty factors

-------
Toxicologic considerations indicate some data are unsuitable for modeling based on
the overall quality of the study, the route of exposure used in the study vis-a-vis the route of
exposure for which an RfD is required, and the range of health effects studied vis-a-vis those
that the RfD is intended to cover. The EPA uses these same considerations to focus
attention on more relevant studies for identifying NOAELs (Barnes and Dourson, 1988).
Additionally, specific responses in studies may be eliminated from consideration if there is
no convincing evidence of a dose effect for those responses. Such a determination may be
based on the opinions of those who conducted the experiment, possibly supplemented by
additional statistical tests. After toxicologically irrelevant data are eliminated, several studies
and endpoints often remain, each with a different dose response. How can these diverse data
be handled using the BMD approach?
One option is to apply dose-response models to all relevant responses. While this
option has the advantage of completeness, it may require a large effort if the data base is
sizable. Further, it may be difficult to interpret results from a large number of dose-
response analyses. Selecting critical effects, as the EPA does in the NOAEL approach
(Barnes and Dourson, 1988), helps limit the scope of modeling.
The most limited modeling chooses only the effect(s) seen at the LOAEL, thus
minimizing the number of responses modeled. This seems inappropriate, however, for the
BMD because, unlike a NOAEL, the calculation of a BMD depends on the slope of the dose
response. Thus, it is possible that an effect with a shallow dose response may have a higher
LOAEL but a lower BMD than an effect with a steeper dose response and a lower LOAEL. This
is a potential drawback to modeling a single effect at the lowest LOAEL.

3.1.1. Example: Selecting Endpoints Within a Single Study
Sanders et al. (1974) tested the effects of dietary exposure to Aroclor 1254, a PCB
mixture, on several biological responses in male albino mice (ICR strain). The researchers
examined effects of 2 weeks of exposure on pentobarbital-induced sleeping time; food
consumption; serum corticosterone; and weights of the liver, testes, preputial glands, adrenal
glands, and vesicular glands. Serum corticosterone levels were elevated for all doses tested
(62.5, 250, and 1,000 ppm),5 pentobarbital-induced sleeping time and food consumption
were reduced, and liver weight was increased at 250 and 1,000 ppm. Adrenal glands were
significantly heavier only at 1,000 ppm. Weights of testes, preputials, and vesicular glands
were not significantly affected.
For this modeling exercise, changes that showed no response to dose (in the testes,
preputials, and vesicular glands) can be ignored. If one chose to model only responses seen
at the LOAEL (62.5 ppm in this case), only serum corticosterone level would be modeled.
Otherwise, serum corticosterone, liver weight, adrenal gland weight, pentobarbital-induced
sleeping time, and food consumption could be modeled. Depending on the slope of the dose-
response curves, any one of these responses could yield the smallest BMD.
If other studies of PCB were included in the data base, the 62.5 ppm dose (suitably
transformed to yield consistent units across all studies) might not be the LOAEL among all
the studies (i.e., Sanders et al. [1974] is not the critical study). However, even in the more
general context of all relevant PCB studies, one of the responses from Sanders et al. (1974)
could yield the smallest BMD, again depending on the dose-response slopes and the doses
used in the other studies.

3.2. USE OF CATEGORICAL VERSUS CONTINUOUS DATA IN BMD MODELING
Noncancer health effects can be recorded in either categorical or continuous formats.
In a categorical format, possible responses are divided into two or more groups, and the
numbers of responses in each group are recorded. For example, organ degeneration may be
recorded as absent, mild, moderate, or severe. The most commonly used format for
categorization of data is the quantal format in which only the presence or absence of the
response in an experimental subject is noted. At the other extreme, a response may be
capable of assuming a continuum of values and be recorded in a continuous format. Organ
weights and serum enzyme levels are examples of responses that are often recorded in a
continuous format.6

5Although a higher dose of 4,000 ppm was tested, all mice exposed at that level died within 7 days of initial
exposure.

The format used for expressing a response may be determined largely by what is
customary or appropriate for a particular type of response. For example, cancer responses
and particular types of developmental effects are generally recorded in a quantal form simply
as present or absent without more detailed categorization. On the other hand, fetal weight
can be expressed and modeled as either a quantal or continuous variable (Kavlock et al.,
1994).
Additionally, unless there is access to the raw data from a study, the format for
expressing a response will be limited by the format in which the data are summarized.
Clearly, data cannot be categorized more finely than in the data summary available. When
the raw data for a response in question are available in a continuous format, they can either
be used directly in a continuous format in the dose-response models or converted into a
categorical format by dividing the range of the responses into subintervals and recording the
number of subjects with responses in each subinterval. For certain continuous responses, a
particular interval in the range may be considered to represent the "normal range" for this
response. Normal ranges can be used to define corresponding quantal responses in a very
natural fashion by considering a subject to be affected if its response is outside the normal
range.
In some cases, recoding continuous data into a quantal form relates the data more
directly to adverse response. For example, liver weight as a fraction of total body weight is
not adverse per se, but it may represent an adverse response when it reaches a certain level.
If this level was specified, then animals with liver weight to total weight ratios above that
level could be considered to be adversely affected. As another example, because a body
weight reduction of >10 percent is generally considered adverse (OSTP, 1985), body weight
changes could be treated quantally by using the 10 percent cutpoint to define the presence of
an adverse response.

6An additional possibility is for a response to be reported in a format that is a combination of continuous and
categorical. Consider, for example, the measurement of a serum enzyme level by an analytical method that has
a detection limit of x micrograms per liter. Subjects with a response higher than x would have their enzyme
level recorded continuously, whereas subjects with enzyme levels below the detection limit would have their
response categorized as

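The recoding of body weight data with a 10 percent cutpoint can be sketched as follows. This is a minimal illustration only: the function name, the use of a control-group mean as the reference weight, and the example weights are assumptions made for the sketch and are not taken from any study cited here.

```python
def count_adverse(weights, reference_weight, cutpoint=0.10):
    """Count subjects whose body weight is reduced by more than `cutpoint`
    (expressed as a fraction) relative to `reference_weight`."""
    affected = 0
    for w in weights:
        reduction = (reference_weight - w) / reference_weight
        if reduction > cutpoint:
            affected += 1
    return affected


# Hypothetical body weights (g) for one dose group; the reference is an
# assumed control-group mean, not a value from the report.
control_mean = 300.0
dose_group = [298.0, 281.0, 265.0, 250.0, 290.0]

n_affected = count_adverse(dose_group, control_mean)
print(f"{n_affected} of {len(dose_group)} animals counted as adversely affected")
```

The resulting pair (number affected, number tested) is the quantal summary needed for each dose group; the magnitude of each animal's weight change is no longer retained.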
A disadvantage of recoding continuous data into a categorical form is that information
on the magnitude of the response is lost. However, a possible advantage is that, because
generally some of the responses of interest must be categorical, comparisons among
responses may be facilitated if they are all categorical, particularly if they are all quantal.
On the other hand, the data needed to define a categorical response may not be available in
the published report.
In the case of categorical data, the information generally required for application of
dose-response models includes the experimental doses, the total number of animals in each
dose group, and the number of these whose responses are in each of the categories.
Generally, interest will be in the special case of quantal responses (i.e., two categories), and
only models for this special case will be discussed.
In the case of continuous data, for application of a number of dose-response models
(specifically those that assume that responses at each dose level are normally distributed),
experimental doses, the number of animals in each dose group, the mean response in each
group, and the sample variance of the response in each group must be known.

3.2.1. Example
Johnson et al. (1986) examined the effect in rats of chronic acrylamide exposure on
degeneration of tibial nerves. The degree of degeneration (from very slight to severe) was
recorded for each rat. Data of this form are categorical but not quantal. Because
degeneration of the type observed has been observed in aging rats (Johnson et al., 1986) and
because very slight and slight degeneration was observed at roughly the same rate in all dose
groups, adverse effect was defined to be moderate or severe degeneration. This definition
could also define a quantal response, with degeneration that was slight or very slight
counting as no response and moderate and severe degeneration counting as a response. The
numbers of male rats with moderate or severe degeneration are displayed in table 3.
Table 3. Acrylamide-Induced Tibial Nerve Degeneration in Rats

Data

Dose (mg/kg/day)    Number affected    Number tested
0                   9                  60
0.01                6                  60
0.1                 12                 60
0.5 (NOAEL)         13                 60
2.0                 16                 60

Modeling Result

Model    Goodness-of-fit p-value    BMD (mg/kg/day) (5% extra risk)
QQR      0.34                       0.83
QW       0.48                       0.31

QQR = quantal quadratic regression.
QW = quantal Weibull.

Source: Johnson et al. (1986).
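
As a rough illustration of how the counts in table 3 relate to the notion of extra risk, the sketch below computes the observed response rate in each dose group and the corresponding observed extra risk. It assumes the usual definition of extra risk, [P(d) - P(0)] / [1 - P(0)], and works with raw proportions only; it does not reproduce the model fits or the BMDs reported in the table.

```python
# Data from table 3 (male rats; moderate or severe degeneration).
doses = [0.0, 0.01, 0.1, 0.5, 2.0]   # mg/kg/day
affected = [9, 6, 12, 13, 16]
tested = [60, 60, 60, 60, 60]

rates = [a / n for a, n in zip(affected, tested)]
background = rates[0]

# Extra risk: increase over the control rate, scaled by the fraction of
# animals that do not respond in the absence of dosing (assumed definition).
extra_risk = [(p - background) / (1.0 - background) for p in rates]

for d, p, er in zip(doses, rates, extra_risk):
    print(f"dose {d:>5}: observed rate {p:.3f}, observed extra risk {er:+.3f}")
```

Observed extra risk can be negative when a dose group responds less often than controls, as happens here at 0.01 mg/kg/day; a fitted monotone model smooths over such sampling variation.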
3.3. MATHEMATICAL MODELS FOR DEFINING A BMD
Table 4 lists various models for quantal and continuous data (Crump, 1984; Gaylor,
1989; Gaylor and Slikker, 1990). Fitting the model to experimental data gives estimates of
three or more parameters that describe each model. This fitting, usually accomplished
through maximum likelihood methods (see appendix), estimates the probability of response
(for quantal data) or the mean response (for continuous data) for each dose level. The same
methods also compute a lower statistical confidence limit for the dose corresponding to the
BMR. This lower confidence limit defines the BMD.
The models shown here, as well as many other possible models, relate the response to
the dose, d. The response variable is denoted in table 4 either by P(d), the probability of
response for a disease outcome that is either present or absent (quantal), or by m(d), the
mean value of a continuously measured parameter of health or well-being. In all of the
equations shown, d0 is a threshold dose level, i.e., a dose level below which the response
variable is unaffected (i.e., at doses less than or equal to the threshold, the response variable
value remains at c, the value of that variable in the absence of dosing). For the quantal
models, the probability of response is assumed to increase as dose level increases. For the
continuous models, mean response can either increase or decrease as a function of dose level.
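
To make the role of the parameters concrete, the following sketch codes two of the quantal models from table 4 (QQR and QW) and inverts each to find the dose at which the predicted extra risk equals a chosen BMR. The function names are illustrative, and the parameter values are simply those printed on the example curves in figures 3 and 4. Note that this yields only the central (best-estimate) dose for fixed parameter values; the BMD as defined above is a lower confidence limit on that dose, which requires the likelihood-based methods described in the appendix and is not attempted here.

```python
import math


def qqr(d, c, q1, d0):
    """Quantal quadratic regression: P(d) = c + (1-c){1 - exp[-q1 (d-d0)^2]}."""
    if d <= d0:
        return c  # at or below the threshold, response stays at background
    return c + (1.0 - c) * (1.0 - math.exp(-q1 * (d - d0) ** 2))


def qw(d, c, q1, k):
    """Quantal Weibull: P(d) = c + (1-c){1 - exp[-q1 d^k]}."""
    return c + (1.0 - c) * (1.0 - math.exp(-q1 * d ** k))


def dose_at_extra_risk_qqr(bmr, q1, d0):
    """Dose at which the QQR extra risk equals bmr (0 < bmr < 1)."""
    return d0 + math.sqrt(-math.log(1.0 - bmr) / q1)


def dose_at_extra_risk_qw(bmr, q1, k):
    """Dose at which the QW extra risk equals bmr (0 < bmr < 1)."""
    return (-math.log(1.0 - bmr) / q1) ** (1.0 / k)


# Parameter values taken from the example curves in figures 3 and 4.
c = 0.05
d_qqr = dose_at_extra_risk_qqr(0.10, q1=0.001, d0=30.0)
d_qw = dose_at_extra_risk_qw(0.10, q1=2.57e-4, k=2.0)
print(f"QQR: 10% extra risk at dose {d_qqr:.1f}")
print(f"QW:  10% extra risk at dose {d_qw:.1f}")
```

Because both models have the form c + (1-c){...}, the term in braces is the extra risk, so the background rate c drops out of the inversion.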

3.3.1. Criteria for Selecting Models
Ability to Describe the Observed Dose Response. Since the goal of the BMD
approach is estimation of a lower bound on dose for some level of risk not far below the
observed range, the model should give adequate predictions of the observed experimental
responses. Goodness-of-fit tests (see appendix) can be applied to determine if a model
adequately describes the dose-response data.
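
One common form of goodness-of-fit test for quantal data is the Pearson chi-squared test, sketched below. This is an assumed illustration rather than the specific procedure given in the appendix. The observed counts are those of table 3, but the predicted probabilities are invented for the example and are not fitted values from this report; the choice of degrees of freedom (dose groups minus estimated parameters) is likewise an assumption.

```python
import math


def pearson_chi2(observed, tested, predicted):
    """Pearson statistic comparing observed counts of responders with
    model-predicted response probabilities, one term per dose group."""
    stat = 0.0
    for x, n, p in zip(observed, tested, predicted):
        expected = n * p
        stat += (x - expected) ** 2 / (expected * (1.0 - p))
    return stat


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)
    return total


observed = [9, 6, 12, 13, 16]                 # table 3 counts
tested = [60, 60, 60, 60, 60]
predicted = [0.14, 0.14, 0.15, 0.19, 0.28]    # hypothetical model predictions

stat = pearson_chi2(observed, tested, predicted)
df = len(observed) - 3                        # assumes three estimated parameters
p_value = chi2_sf(stat, df)
print(f"chi-squared = {stat:.2f}, df = {df}, p = {p_value:.2f}")
```

A p-value above the chosen significance level would be read as no evidence of lack of fit, which is how the p-values in the examples of this section are interpreted.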
Each of the models presented in table 4 is capable of describing a range of dose-
response patterns. Figures 2 through 6 show the dose-response curves obtained with some of
these models. The QPR and LN models will provide dose-response shapes similar to that
shown for the QW model (figure 4). Similarly, the CPR model will provide a range of
patterns similar to that shown for the CP model (figure 6). Although figures 5 and 6 depict
the CQR and CP models applied to a response that decreases as the dose increases, these and
Table 4. Dose-Response Models Proposed for Estimating BMDs

Model                                            Formula

Quantal Data
Quantal linear regression (QLR)                  $P(d) = c + (1-c)\{1 - \exp[-q_1 (d - d_0)]\}$
Quantal quadratic regression (QQR)               $P(d) = c + (1-c)\{1 - \exp[-q_1 (d - d_0)^2]\}$
Quantal polynomial regression (QPR)              $P(d) = c + (1-c)\{1 - \exp[-q_1 d - \cdots - q_k d^k]\}$
Quantal Weibull (QW)                             $P(d) = c + (1-c)\{1 - \exp[-q_1 d^k]\}$
Log-normal (LN)                                  $P(d) = c + (1-c) N(a + b \log d)$

Continuous Data
Continuous linear regression (CLR)               $m(d) = c + q_1 (d - d_0)$
Continuous quadratic regression (CQR)            $m(d) = c + q_1 (d - d_0)^2$
Continuous linear-quadratic regression (CLQR)    $m(d) = c + q_1 d + q_2 d^2$
Continuous polynomial regression (CPR)           $m(d) = c + q_1 d + \cdots + q_k d^k$
Continuous power (CP)                            $m(d) = c + q_1 d^k$

Note: P(d) is the probability of a response at the dose, d; m(d) is the mean response at the
dose, d. In all models, c, $q_1, \ldots, q_k$, and $d_0$ are parameters estimated from data. For the
quantal models, $0 \le c \le 1$, $q_i \ge 0$, $k \ge 1$. N(x) denotes
the normal cumulative distribution function.

Source: Crump (1984); Gaylor (1989); Gaylor and Slikker (1990).

[Figure 2 plots the probability of response against dose (0 to 100) for several values of q1, including q1=0.05 and q1=0.005.]
All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c=0.05, d0=30. The parameter q1 is the dose coefficient in this model; larger values of q1 give steeper dose
response.

Figure 2. Examples of quantal linear regression (QLR) curves:
$P(d) = c + (1-c)\{1 - \exp[-q_1 (d - d_0)]\}$

[Figure 3 plots the probability of response against dose (0 to 100) for q1=0.01, q1=0.001, q1=0.0001, and q1=0.00001.]
All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c=0.05, d0=30. The parameter q1 is the dose coefficient in this model; larger values of q1 give steeper dose
response.

Figure 3. Examples of quantal quadratic regression (QQR) curves:
$P(d) = c + (1-c)\{1 - \exp[-q_1 (d - d_0)^2]\}$

[Figure 4 plots the probability of response against dose (0 to 100) for q1=1.29E-2, k=1; q1=1.82E-3, k=1.5; and q1=2.57E-4, k=2.]
All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c=0.05. The parameter q1 is the dose coefficient in this model; larger values of q1 give steeper dose response.
The parameter k is the power on dose; larger values of k give more curvature.

Figure 4. Examples of quantal Weibull (QW) curves:
$P(d) = c + (1-c)\{1 - \exp[-q_1 d^k]\}$

[Figure 5 plots the mean response (0 to 100) against dose (0 to 100) for q1=-0.01, q1=-0.005, q1=-0.0025, and q1=-0.001.]
All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c=90, d0=20. The parameter q1 is the dose coefficient in this model; larger values of q1 give steeper dose
response.

Figure 5. Examples of continuous quadratic regression (CQR) curves:
$m(d) = c + q_1 (d - d_0)^2$

[Figure 6 plots the mean response (0 to 100) against dose (0 to 100) for q1=-0.2, k=1; q1=-0.0283, k=1.5; q1=-0.004, k=4; and q1=-5.0E-5, k=3.]
All curves show maximum likelihood predictions of response, not confidence limits, for various choices of parameters.
For all curves, c=90. The parameter q1 is the dose coefficient in this model; larger values of q1 give steeper dose response.
The parameter k is the power on dose; larger values of k give more curvature.

Figure 6. Examples of continuous power (CP) curves:
$m(d) = c + q_1 d^k$

all of the models for continuous data also can be applied to responses that increase with
increasing dose.
It is often the case that several models will adequately describe the data under
consideration. When that is true, other considerations must be used to decide on the model
to use for BMD calculation.

Statistical Assumptions. Important considerations in selecting a model are the
reasonableness of the statistical assumptions underlying a model and the procedures used to
fit it to the data. In most instances, it may be reasonable to assume that quantal results arise
from binomial variation about a dose-dependent, expected number of responders. This
means that each subject is assumed to respond independently of all other subjects and that all
animals in a given dose group have an equal probability of responding. These assumptions
are generally made when the models for quantal data listed in table 4 are applied. Similarly,
a continuous endpoint is generally assumed to display variation in accordance with dose-
dependent normal distributions. In other words, each subject is assumed to respond
independently of all other subjects, and the responses of animals in a particular dose group
are distributed according to a normal probability distribution. The methods proposed by
Crump (1984) for fitting the continuous models listed in table 4 assume this type of normal
variation. There are situations, however, when the binomial or normal assumptions may not
be appropriate. In those cases, one should consider alternative models that are based on
more appropriate assumptions. Whatever assumptions are made should be documented and
the reasons for their selection described. For example, in studies of developmental toxicity
where responses within and across litters are observed, the response in one fetus may not be
independent of the response in other fetuses in the same litter. Consequently, the assumption
of independence inherent in models that assume binomial variation is not strictly valid,
although this assumption may still provide reasonable results in specific cases.
Alternative models that assume more general forms of variation for quantal responses
from developmental toxicity experiments have been developed (Rai and Van Ryzin, 1985;
Kupper et al., 1986; Kodell et al., 1991; Ryan, 1992). Such models should be considered
when the BMD approach is applied to responses observed in individual fetuses.
As a different example, it may be necessary to transform continuous data in some
cases so that they better satisfy the assumptions of a normal distribution. A log-transform is
often used for this purpose. Kendall (1951) presents statistical tests that can be used to
determine if data are consistent with a normal assumption.

Biological Considerations. Even though the models in table 4 are descriptive and do
not incorporate detailed information on biological mechanisms, certain general biological
considerations may be used to help select the dose-response models to be used for BMD
calculation.
One example could be in selecting a threshold versus a nonthreshold model. The
quantal models QLR and QQR involve a threshold dose, d0. Doses below this threshold7
are assumed not to affect the probability of a response. On the other hand, the quantal
models QPR, QW, and LN do not involve a threshold dose; consequently, with these models
any dose, no matter how small, is assumed to increase the probability of a response. One
possible input for selecting a model is to apply threshold models to responses that are thought
likely to have thresholds, and nonthreshold models for responses for which thresholds are
considered less likely, based on consideration of biological mechanisms.8
Because a BMD is a dose corresponding to a finite (nonzero) increment in response
(the BMR), even if a threshold exists, the model predictions are only used for doses that are
above the threshold. In applying both threshold and nonthreshold models to several data
sets, Crump (1984) did not find large differences between BMDs calculated from models
involving thresholds and those not involving thresholds. Indeed, one goal in selecting a
BMR is for the resulting BMD not to be highly dependent on the underlying model. If this
goal is accomplished, then it should make little practical difference whether the model used
includes a threshold.

7A distinction is sometimes made between a threshold for an individual and a threshold for a population. In the
models listed in table 4 that incorporate a threshold dose, d0, the single threshold applies to the population.
8The existence or nonexistence of a threshold for an effect can never be known with certainty based on
experimental dose-response studies alone. If no responses are found at a given dose, another experiment
employing larger numbers of animals may detect a response. Conversely, if responses are detected at a given
dose, it is always possible that a threshold might exist at some lower dose.

Biological considerations also might be used to select models based on the biological
plausibility of the dose-response curve shape. Consider, for example, the difference between
the QLR and QQR models (table 4) at doses near the threshold dose, d0. While the QLR
model has a sharp transition from the background response rate to the dose-dependent rate at
the threshold, the transition for the QQR model is smoother, without the apparent abrupt
change (compare figures 2 and 3). In some circumstances, a smooth (continuous) change of
slope may be deemed more reasonable for the response under consideration and the QQR
model favored over the QLR model. In this case as well, however, if the BMR is selected
appropriately (i.e., large enough so that lower bounds on dose for that level of risk are not
overly dependent on the choice of model), it should make little practical difference which of
these models is selected.

Use of Multiple Models. It may be difficult to limit the calculations to a single model
based on the criteria discussed above. Consequently, it may be desirable to apply several
models. More than one of those models may fit the observed responses equally well. The
decisions required in that case are discussed in section 3.8.

3.3.2. Example
Johnson et al. (1986) observed tibial nerve degeneration induced in rats by
acrylamide. Table 3 shows the responses in quantal form as discussed in section 3.2.1.
Table 3 also summarizes the results of fitting two quantal models (the quantal quadratic
regression, QQR, and quantal Weibull, QW, models from table 4). Figure 7 shows the rates
of moderate and severe degeneration, the best-fitting QQR model, and the best-fitting QW
model.
Both models fit the data satisfactorily; chi-squared goodness-of-fit tests yielded
p-values greater than 0.05 (see appendix). Because the fits of both models to the data are
adequate and the statistical assumptions underlying the two models are identical, these
considerations do not suggest acceptance of one model over the other. The QW model does
not allow a threshold, whereas the QQR model does. If it is suspected that tibial nerve
degeneration does not have a threshold, then one might prefer to use the QW model. If a
threshold is likely, then the QQR model might be preferred. The high rates of very slight
and slight degeneration and nonzero rates of moderate and severe degeneration in the control
group of Johnson et al. (1986), in addition to other biological considerations, suggest a finite
background rate, in which case a threshold may not be a good assumption.

[Figure 7 plots the observed response rates (data points with 95% confidence bars) and the fitted QQR and QW curves against dose in mg/kg/day.]
Figure 7. Moderate to severe nerve degeneration in rats following acrylamide exposure

Source: Johnson et al., 1986

3.3.3. Additional Research
Several types of data may require other types of models. As described above, studies
of developmental toxicants evaluate responses in fetuses. Different fetuses from the same
litter may not respond independently to developmental toxicants, and models that account for
possible "litter effects" may be needed. Special approaches also may be needed for modeling
data in which, in addition to knowing whether an animal was affected, the level of effect
may be categorized (e.g., mild, moderate, severe). While such categorization is generally
ignored, other models can use the additional information.
In some studies several different durations of exposure are used. Correspondingly,
there may be a need for different assessments for different durations of human exposure.
Models that incorporate duration of exposure as well as the dose level can combine all the
exposure information.
Some models exist that are applicable to each of these situations (see Rai and Van
Ryzin, 1985; Kupper et al., 1986; Kodell et al., 1991; Clement International Corporation,
1990a, b; Ryan et al., 1991; Catalano et al., 1993). The applicability of such models to
calculating BMDs is being studied (Faustman et al., 1994; Allen et al., 1994a, b).

3.4. ADJUSTING FOR LACK OF FIT
For certain data sets, none of the models listed in table 4 will provide a reasonable fit.
Frequently this is due to reduced responses at higher doses that are inconsistent with the
dose-response trend seen at lower doses. One likely reason is interference at higher doses by
competing mechanisms of toxicity. Whenever a lack of fit occurs, one should be sure that
all affected animals are taken into account. For example, in some experiments, if a high
incidence of response is seen at lower doses, the experimenter may not look for the effect at
higher doses. As another example, suppose a BMD is calculated based on the response,
"mild atrophy." If mild atrophy progresses to "moderate atrophy" and subsequently to
"severe atrophy," then animals with these more severe forms could be considered to be
affected as well. In general, if a BMD is calculated based on a toxic response that can
progress to more severe forms (possibly known by names different from the original
response), animals with more severe forms of the response also should be considered to be
affected.
A plateau in the responses at the higher doses can be caused by saturation of
metabolic or delivery systems for the ultimate toxic substance. Such an effect also can cause
dose-response models not to fit the data adequately. It may be possible to overcome this
problem by estimating the delivered dose to the site of action and then applying this dose in
the dose-response models rather than an external measure of exposure (Andersen et al.,
1987). In this approach, pharmacokinetic data on animals are used to estimate internal dose
to the target tissue that results from the experimental dosing regimen. The BMD method is
applied to the internal dose. Human pharmacokinetic information is then used to estimate the
external dose that would result in this internal dose. This external dose is then used to
estimate the RfD.
In other cases, a particular response may be reduced at higher doses due to
interference by other responses that are not a progressive form of the response of interest.
One such example is when a dose-related toxic response that occurs primarily in aged
animals is not expressed because of premature deaths due to other toxic effects. A more
subtle example is when moderate doses caused a particular organ to be enlarged, but still
higher doses caused the same organ to atrophy through an independent mechanism. In these
cases, it is not appropriate to combine these separate toxic responses into a common
response.
Whenever the responses at the higher doses are reduced, so that none of the models
listed in table 4 fit, one option is to look for a more flexible model that can adequately
describe the dose response. A seeming advantage to this approach is that one may be able to
incorporate all the data into the analysis. A danger in this approach is that the attempt to fit
the high-dose data will skew the dose response at the lower doses that are of more direct
interest.
When none of the models provide an adequate fit, a simpler and perhaps better
advised approach is to omit the data at the highest dose and refit the models to the remaining
data. This process can be continued, and an adequate fit will eventually be obtained.9
Applying this process to toxicologic test data will be greatly limited by the small number of
doses that are typical in these experiments. This approach is used by the EPA in risk
assessments for cancer based on the linearized multistage model (Anderson et al., 1983).
The rationale for eliminating data at the highest dose as opposed to lower doses is that the
data at the highest dose should be the least informative of responses in the lower dose region
of interest, i.e., the area of critical effects.
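
The procedure of omitting the highest dose and refitting can be sketched as a simple loop. Everything named here is hypothetical: `fit` stands for whatever routine estimates a BMD and a goodness-of-fit p-value, the 0.05 cutoff and the minimum of three dose groups are assumptions, and the exception noted in footnote 9 is not handled. The stub used in the example does no fitting; it returns a placeholder for the five-group case and, for the four-group case, the QQR values reported in table 5.

```python
def fit_dropping_high_doses(groups, fit, min_groups=3, alpha=0.05):
    """Refit after removing the highest dose group until the fit is adequate.

    groups: dose-group records sorted by increasing dose.
    fit: caller-supplied function returning (bmd, goodness_of_fit_p_value).
    Returns (bmd, p_value, groups_used), or None if too few groups remain.
    """
    remaining = list(groups)
    while len(remaining) >= min_groups:
        bmd, p_value = fit(remaining)
        if p_value > alpha:
            return bmd, p_value, remaining
        remaining = remaining[:-1]  # omit the current highest dose
    return None


# Table 5 dose groups: (dose in mmole/kg/day, number affected, number tested).
table5 = [(0.0, 0, 10), (1.88, 0, 10), (3.75, 3, 10), (7.50, 4, 10), (15.0, 0, 10)]


def stub_fit(groups):
    """Stand-in for a real fitting routine (hypothetical)."""
    if len(groups) == 5:
        return None, 0.01   # placeholder p-value, not a reported value
    return 1.56, 0.30       # table 5, QQR model, 5% extra risk


result = fit_dropping_high_doses(table5, stub_fit)
print(result)
```

With only four or five dose groups in a typical study, the loop can run at most once or twice before too few groups remain, which is the practical limitation noted above.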

3.4.1. Examples
Ethylene glycol monopropyl ether (EGPE) was examined for toxic effects in rats
when administered for 6 weeks via gavage (Katz et al., 1984). At the end of the 6-week
exposure period, the spleen appeared dark and enlarged in several rats, presumably as a
result of the exposure. Histopathological examination found congestion or extramedullary
hematopoiesis. Table 5 displays the rates of hematopoiesis. Note that no animals in the
high-dose group had that lesion. This may be due to competing manifestations of toxicity or other
unexplained reasons.10
The NOAEL for this study was determined by applying a statistical technique referred
to as the no statistical significance of trend (NOSTASOT) approach (Tukey et al., 1985).
The NOSTASOT approach (described in some detail in the appendix) applied to all of the
dose groups indicated that there was no significant trend for larger doses to yield larger
proportions of responders (the Mantel-Haenszel trend test p-value was about 0.56).

9The only exception to this is if at the lowest dose there is a statistically significant deficit in adverse response
compared with control animals.

10Four of the high-dose rats had enlarged and darkened spleens; six high-dose rats had congestion in the spleen.
These other endpoints might be used in lieu of extramedullary hematopoiesis for determining an RfD for
EGPE, but for the sake of illustration the hematopoiesis response is discussed here.

Table 5. EGPE-Induced Extramedullary Hematopoiesis in the Spleen of Rats

Data

Dose (mmole/kg/day)    Number affected    Number tested
0                      0                  10
1.88 (NOAEL)           0                  10
3.75                   3                  10
7.50                   4                  10
15.0                   0                  10

Modeling results(a)

                                      BMD (mmole/kg/day) for extra risk of
Model    Goodness-of-fit p-value      10%      5%       1%
QQR      0.30                         2.24     1.56     0.69
QW       0.18                         0.99     0.48     0.094

(a) The results for the models are those fit to all dose groups except for the highest dose group.
Neither model adequately fit data from all dose groups.

Source: Katz et al. (1984).
However, the NOSTASOT procedure applied to the data both without the highest dose group
and without the highest two dose groups detected significant trends. Moreover, the pairwise
comparison of the 7.5 mmole/kg dose group and the control group indicated a significantly
increased rate of response at 7.50 mmole/kg (p = 0.04 by Fisher's exact test). The pairwise
comparison of the 3.75 mmole/kg dose group and controls was not significant (p = 0.11 by
Fisher's exact test).
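
The two pairwise comparisons can be checked directly from the counts in table 5. The sketch below computes a one-sided Fisher's exact p-value (the probability of seeing at least the observed number of responders in the dosed group, given the total number of responders). That the one-sided form is the one used in the analysis is an assumption, made because it agrees with the p-values quoted above; the function name is illustrative.

```python
from math import comb


def fisher_one_sided(a, n1, c, n2):
    """P(at least `a` responders among the n1 dosed animals), given that
    a + c animals responded in total across dosed (n1) and control (n2) groups."""
    total = n1 + n2
    responders = a + c
    upper = min(responders, n1)
    numerator = sum(
        comb(responders, x) * comb(total - responders, n1 - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(total, n1)


# Counts from table 5: responders out of 10 animals per group.
p_750 = fisher_one_sided(4, 10, 0, 10)   # 7.50 mmole/kg/day versus control
p_375 = fisher_one_sided(3, 10, 0, 10)   # 3.75 mmole/kg/day versus control
print(f"7.50 vs control: p = {p_750:.3f}")
print(f"3.75 vs control: p = {p_375:.3f}")
```

The computed values, about 0.043 and 0.105, round to the 0.04 and 0.11 given in the text.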
The application of the BMD approach was also interesting in this case. Neither the
QQR model nor the QW model could fit the dose-response data when all dose groups were
included (p-values less than 0.02). However, dropping the highest dose group (see section
3.4) resulted in acceptable fits for both models (table 5). Figure 8 shows the results of
fitting the models to the data, ignoring the highest dose group.
Table 5 shows the BMD estimates corresponding to three levels of extra risk for the
QQR and QW models.
Another example of lack of fit that is not so directly accommodated is provided by a
study of glycol ether-induced reproductive toxicity. Miller et al. (1981) examined the effects
of 9-day inhalation exposures (6 hours per day) to ethylene glycol monomethyl ether
(EGME) on testicular toxicity in rats and mice. Toxicity was determined by measuring testes
weights (table 6). In both rats and mice, testes weights were significantly decreased
following exposure to 1,000 ppm. The NOAEL was 300 ppm for both rats and mice.
The best-fitting CQR and CP models are shown in figures 9 and 10. Although both
models fit the rat data adequately, neither model adequately describes the mouse data (table
6). The lack of fit to the mouse data is due primarily to the 100 ppm dose group, for which
the testes weights were larger (on average) than those of controls, and to the small amount of
variation in the observed results.
The case of the mouse data illustrates one of the difficulties that can arise in applying
the BMD approach. The lack of fit in this case was not due to a smaller adverse effect at the

[Figure 8 plots the observed response rates (data points with 95% confidence bars) and the fitted QQR and QW curves against dose in mmole/kg/day. Models fit without consideration of high dose group.]
Figure 8. Extramedullary hematopoiesis of the spleen in rats following EGPE exposure

Source: Katz et al., 1984

Table 6. EGME-Induced Testicular Toxicity in Rats and Mice

Data(a)

                   Rats                        Mice
Dose (ppm)         Average weight    SD        Average weight    SD
0                  2.82              0.10      0.21              0.01
100                2.88              0.05      0.23              0.01
300 (NOAEL)        2.70              0.20      0.20              0.02
1000               1.50              0.10      0.10              0.01

Modeling results

         Rats                                     Mice
Model    Goodness-of-fit p-value    BMD (ppm)     Goodness-of-fit p-value    BMD (ppm)
CQR      0.13                       315           <0.01                      --
CP       0.17                       184           <0.01                      --

(a) Five animals in each dose group. Reported are average testes weights (in grams) and
standard deviations (SD) for testes weights in each dose group.

Source: Miller et al. (1981).

[Figure 9 plots testes weights (data points with 95% confidence bars) and the fitted CQR and CP curves against dose in ppm (0 to 1200).]
Figure 9. Testes weights in rats following EGME exposure

Source: Miller et al., 1981

[Figure 10 plots testes weights (data points with 95% confidence bars) and the fitted CQR and CP curves against dose in ppm (0 to 1000).]
Figure 10. Testes weights in mice following EGME exposure

Source: Miller et al., 1981

highest dose but rather an opposite response at the low dose. Therefore, dropping dose
groups (as discussed in this section) will not lead to an adequate fit.11
It is not likely that alternative models will provide better fits to the mouse data, as
long as such models postulate a monotone dose response. Models with a monotone dose response will
not be able to predict the increased testes weights in the 100 ppm group. Biological and
toxicological considerations may dictate that a nonmonotone response pattern is feasible in
this instance, in which case one may conclude that doses of 100 ppm or less to male mice do
not result in testicular weight loss. Alternatively, it may be determined that the observed
variation among the responses underestimates the true variability associated with the
testicular response, in which case the predictions of the CQR and CP models may be
adequate for the application of the BMD approach.
The estimated BMDs corresponding to a 5 percent relative decrease in testes weight in
rats were 315 and 184 ppm, respectively, for the CQR and CP models. These two BMDs
bracket the NOAEL of 300 ppm.

3.4.2. Additional Research
Additional research is needed to develop guidelines and suitable options to adjust for
poor fit. Guidance is especially needed on the biological and toxicological considerations to
apply when dropping doses.
As discussed earlier, use of estimates of internal dose at the site of toxicity could
result in more appropriate RfDs, regardless of whether the NOAEL or BMD approach is
used. Once additional pharmacokinetic data are available and experience is gained from
applying them to the BMD, then the EPA may develop additional guidance for applying
pharmacokinetic data in calculating RfDs. The EPA has some experience in using such data
in estimating RfDs, but the EPA is using such data routinely for developing RfCs (e.g.,
Jarabek et al., 1989, 1990; U.S. EPA, 1990). This effort may solve some problems of poor
fit because, as mentioned above, certain pharmacokinetic behaviors might account for
dose-response patterns that are not strictly monotone (e.g., plateaus in response rates due to
saturation of crucial metabolic pathways).

11 The small standard deviations reported for all the dose group responses entail small estimates of "pure error"
used for comparison with the error between model predictions and observations. An F test is performed,
where the numerator represents the error for lack of fit and the denominator represents the pure error or the
variability of the observed weights around the group-specific means. When the estimate of pure error is small
(i.e., when standard deviations are small), deviations of the model predictions from the observations may be
significant, even when they appear to be in fairly close agreement.
In some instances, values may be erroneously recorded as standard deviations, when in fact they represent
standard error of the means. Whenever this occurs, there is more variability in the observations than
suggested by the reported standard deviations, and the models may provide a satisfactory fit. The best
insurance against such an error is to have available the results in individual animals. In this case, if the values
reported by Miller et al. (1981) are actually the standard error of the means, the CQR and CP models would
adequately fit the mouse data.
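
The effect described in footnote 11 can be illustrated with a short sketch. It computes only the "pure error" term (the pooled within-group variance that forms the denominator of the lack-of-fit F test) from the mouse summary statistics in table 6, under two readings of the reported spread values. The function name and its arguments are illustrative assumptions, and the lack-of-fit numerator is not computed because the fitted model predictions are not tabulated here.

```python
import math

# Mouse testes-weight summary statistics from table 6 (five animals per group).
GROUP_SIZES = [5, 5, 5, 5]
REPORTED_SPREAD = [0.01, 0.01, 0.02, 0.01]  # reported as standard deviations


def pure_error_mean_square(spreads, sizes, spreads_are_sem=False):
    """Pooled within-group variance (denominator of the lack-of-fit F test).

    If spreads_are_sem is True, each value is treated as a standard error
    of the mean and converted to a standard deviation: SD = SEM * sqrt(n).
    """
    numerator = 0.0
    degrees_of_freedom = 0
    for spread, n in zip(spreads, sizes):
        sd = spread * math.sqrt(n) if spreads_are_sem else spread
        numerator += (n - 1) * sd ** 2
        degrees_of_freedom += n - 1
    return numerator / degrees_of_freedom


as_sd = pure_error_mean_square(REPORTED_SPREAD, GROUP_SIZES)
as_sem = pure_error_mean_square(REPORTED_SPREAD, GROUP_SIZES, spreads_are_sem=True)
print(f"pure error if values are SDs:  {as_sd:.6f}")
print(f"pure error if values are SEMs: {as_sem:.6f}")
print(f"ratio: {as_sem / as_sd:.1f}")
```

With five animals per group, reading the values as standard errors makes the pure-error estimate five times larger, so any lack-of-fit F statistic would be five times smaller for the same model predictions.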

3.5. MEASURE OF INCREASED RISK
Crump (1984) proposed two measures of increased response for quantal data,
"additional risk" and "extra risk." Additional risk is defined as
AR(d) = P(d) - P(0),
and extra risk as
ER(d) = [P(d) - P(0)] / [1 - P(0)].
In these equations, P(d) is the probability of response at dose d, and P(0) is the probability of
response in the absence of exposure (d = 0).
Additional risk is the additional proportion of total animals that respond in the
presence of the dose. Extra risk is the fraction of animals that would respond when exposed
to a dose, d, among animals who otherwise would not respond. Extra risk is typically used
by the EPA in risk assessments for cancer (Anderson et al., 1983).
Extra risk is additional risk divided by the proportion of animals that will not respond
in the absence of exposure. Thus, extra risk and additional risk will coincide for responses
that do not occur spontaneously (i.e., when background rate is zero).
Additional risk and extra risk differ quantitatively in the manner in which they
incorporate background response. For example, if a dose increases one type of response
from 0 percent to 1 percent and increases a second type of response from 90 percent to 91
percent, the additional risk is 1 percent in both cases. However, the extra risk is 1 percent
in the former case and 10 percent in the latter case.
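
The two definitions, and the numerical contrast just described, can be checked with a minimal sketch. The function names are illustrative, not taken from any published software.

```python
def additional_risk(p_dose, p_background):
    """AR(d) = P(d) - P(0)."""
    return p_dose - p_background


def extra_risk(p_dose, p_background):
    """ER(d) = [P(d) - P(0)] / [1 - P(0)]."""
    if p_background >= 1.0:
        raise ValueError("extra risk is undefined when every animal responds at background")
    return (p_dose - p_background) / (1.0 - p_background)


# The two cases from the text: 0% -> 1% and 90% -> 91%.
for p0, pd in [(0.00, 0.01), (0.90, 0.91)]:
    print(f"P(0)={p0:.2f}  P(d)={pd:.2f}  "
          f"AR={additional_risk(pd, p0):.3f}  ER={extra_risk(pd, p0):.3f}")
```

Both cases give an additional risk of 0.01, while the extra risk is 0.01 in the first case and 0.10 in the second.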
For continuous data, Crump (1984) suggested two measures of increased response
analogous to those defined above for quantal data. The first is the difference between the
mean response expected under exposure to dose d and the mean response expected in the
absence of exposure:
| m(d) - m(0) |,
where m(d) is the mean value of the continuous measure of response for dose d. The
vertical lines are symbols for absolute value and are incorporated to allow the expression to
be applicable regardless of whether increases or decreases in the mean response are
considered adverse.
The second measure proposed by Crump (1984) for continuous data normalized
differences in mean responses by the background mean response:

|m(d)-m(0)| /m(0).

This measure of adverse response involves the fractional change in response rather than the
absolute amount of change.
Crump (1984) also suggested that changes in a continuous endpoint could be assessed
relative to the variability of that endpoint. His suggestion was to measure adverse response
by
| m(d) - m(0) | / σ(0),
where σ(0) is the standard error of the responses in the control group.
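
The three continuous measures can be compared side by side in a small sketch. For illustration only, the observed rat group means from table 6 (control and 300 ppm) stand in for the model-predicted means m(0) and m(d), and the reported control-group spread of 0.10 stands in for σ(0); both substitutions are assumptions made for the example, and the function names are hypothetical.

```python
def absolute_change(mean_dose, mean_control):
    """| m(d) - m(0) |"""
    return abs(mean_dose - mean_control)


def relative_change(mean_dose, mean_control):
    """| m(d) - m(0) | / m(0); assumes a positive control mean."""
    return abs(mean_dose - mean_control) / mean_control


def standardized_change(mean_dose, mean_control, sigma_control):
    """| m(d) - m(0) | / sigma(0)"""
    return abs(mean_dose - mean_control) / sigma_control


# Rat testes weights (grams) from table 6: control and 300 ppm groups.
m0, md, sigma0 = 2.82, 2.70, 0.10

print(f"absolute change:     {absolute_change(md, m0):.3f} g")
print(f"relative change:     {relative_change(md, m0):.4f}")
print(f"standardized change: {standardized_change(md, m0, sigma0):.2f}")
```

The same 0.12 g difference reads as roughly a 4 percent relative change or as 1.2 units of control variability, which shows why the choice of measure affects what a given BMR means.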
None of the measures proposed above for continuous variables take into consideration
the definition of an adverse effect (e.g., ranges of a continuous variable indicative of
abnormality). Gaylor and Slikker (1990) suggested an approach for continuous data that
would allow one to estimate the probability of an adverse effect from continuous data without
the necessity of first categorizing the continuous responses observed (although it would still
be necessary to conceptualize a categorization into normal and abnormal ranges of response).
Suppose there is a value of the response, A, that defines an adverse effect (e.g., responses
greater than A are considered to be adverse). The approach of Gaylor and Slikker calls for
dose-response modeling of the continuous data, followed by conversion of the mean and
variance estimates to statements about the probability of observing adverse effects (e.g.,
effects greater than A) at given dose levels.
To implement the approach, one first fits a dose-response model to the observed
continuous endpoints and obtains estimates of the mean value of the response at a dose d,
m(d), and the standard deviation for the observations at that dose, σ(d). Then the
probability, P(d), of an adverse response at dose d can be computed as

P(d) = Probability (RESPONSE > A).

This probability can be computed from knowledge of the mean, m(d), and standard
deviation, σ(d). By using these probabilities, the equations for additional risk, AR, or extra
risk, ER, for quantal responses can be applied in the subsequent steps of the BMD approach.
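
A minimal sketch of this conversion follows, assuming normally distributed responses and an adverse range defined as values above A. The numerical values (means, standard deviation, and a cutoff two standard deviations above the control mean) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_adverse(mean, sd, cutoff):
    """P(d) = Probability(RESPONSE > A) for a normal response with the given mean and SD."""
    return 1.0 - normal_cdf((cutoff - mean) / sd)


def extra_risk(p_dose, p_background):
    """ER(d) = [P(d) - P(0)] / [1 - P(0)]."""
    return (p_dose - p_background) / (1.0 - p_background)


# Hypothetical model output: m(0) = 100, m(d) = 110, sigma = 10 at both doses.
control_mean, dose_mean, sd = 100.0, 110.0, 10.0
cutoff_a = control_mean + 2.0 * sd  # hypothetical definition of "adverse"

p0 = prob_adverse(control_mean, sd, cutoff_a)
pd = prob_adverse(dose_mean, sd, cutoff_a)
print(f"P(0) = {p0:.4f}, P(d) = {pd:.4f}")
print(f"additional risk = {pd - p0:.4f}, extra risk = {extra_risk(pd, p0):.4f}")
```

Once P(0) and P(d) are available, the remaining BMD steps are the same as for quantal data.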
To use the approach suggested by Gaylor and Slikker (1990), one must assume a
normal distribution for the continuous endpoints.12 The need to assume some specific
distribution is not a disadvantage because a distribution must be assumed whenever a model
is fit to continuous data (see appendix). An advantage of this approach is that it allows a
common measure of adverse response to be used with both quantal and continuous data.
Another advantage is that, unlike the data needed to define categorical responses from
continuous data, the data necessary for implementation of this approach are likely to be
summarized in a published report.

12 The method could readily be generalized to a non-normal distribution by replacing m(d) and σ(d) with the
mean and standard deviation of that distribution. However, the data needed for efficient estimation of the
parameters of a non-normal distribution generally will not be summarized in a published report of a study.

3.5.1. Examples
In examples presented in sections 3.3.1 and 3.4.1 that used quantal responses (see
tables 3 and 5), extra risk was the measure of altered response used for BMD calculation. In
the example of EGME-induced testicular toxicity (table 6), for which responses were
measured on a continuous scale, the measure of altered response used was relative change in
weight (absolute change in mean testes weight normalized by the mean background [control]
testes weight).
Consider the case of maternal effects induced by sulfamethazine during pregnancy.
As part of a developmental toxicity study of sulfamethazine, the National Center for
Toxicological Research (NCTR) conducted a preliminary study to determine the toxicity of
that compound to pregnant animals (NCTR, 1981). Sulfamethazine was administered to CD
rats at seven dose levels on gestation days 6 through 15. The maternal weight gain data for
the entire gestational period are shown in table 7. Weight gains were decreased at the three
highest doses. Weight gains in the four lowest dose groups, though larger than in controls,
were not statistically different from controls. NCTR reported a significant trend for
decreased weight gain, as tested by Jonckheere's test. Application of a procedure for
determining trends for continuous endpoints based on the CP model (see appendix)
established 600 mg/kg/day as the NOAEL. Both the CQR and CP models fit the data very
well (figure 11). The BMDs estimated from these models are displayed in table 8. Shown
in table 8 are BMD estimates for two dose-response models and two measures of altered
response (as well as three levels for the BMR and three confidence limit sizes; these are
discussed below).
For both the CQR and CP models, the estimate of the BMDs depended greatly on the
measure of risk. The differences across the two measures of risk were greater for the CP
model (especially for smaller BMRs and for the larger confidence limits).
The results for the two models were most comparable when the absolute differences
in the means were normalized by the background mean (and when either the BMR was 5
percent or greater or the confidence limit size was less than or equal to 95 percent).
Normalizing by background response rates makes the BMR less dependent on the specific
model used.

Table 7. Gestational Weight Gains in Pregnant Rats

Dose (mg/kg/day)    Average weight gain     SD       N
0                         118.6            24.7     13
75                        126.4            14.8      7
150                       130.6            10.5      6
300                       125.1             8.2      6
450                       122.8            10.6      6
600 (NOAEL)               117.4            14.1      5
900                       100.0            20.1      6
1200                       75.2            58.9      4

SD = standard deviation.
N = number of pregnant animals for which weight gains were determined.

Source: NCTR (1981).

Figure 11. Weight gain during gestation in rats exposed to sulfamethazine
(Data points with 95% confidence bars; x-axis: dose, mg/kg/day; y-axis: weight gain; curves: CQR and CP models.)

Source: NCTR, 1981

Table 8. BMDs (mg/kg/day) Calculated for Sulfamethazine Data

                                                            Confidence limit size
Model           Measure of risk                     BMR     90%      95%      99%
CQR             Absolute difference of means         10     49.0     47.3     44.6
(p = 0.73)(a)                                         5     34.7     33.5     31.5
                                                      1     15.5     15.0     14.1

                Absolute difference of means         10    558.0    540.0    510.0
                normalized by background mean         5    395.0    382.0    361.0
                                                      1    176.0    171.0    161.0

CP              Absolute difference of means         10     28.9     18.0      5.29
(p = 0.70)(a)                                         5     19.0     11.2      2.82
                                                      1      7.18     3.71     0.655

                Absolute difference of means         10    533.0    491.0    405.0
                normalized by background mean         5    355.0    311.0    226.0
                                                      1    136.0    105.0     54.5

(a) p-values for goodness-of-fit.

3.5.2. Additional Research
It is not clear when measures expressed relative to background (e.g., extra risk and
absolute differences in means divided by background means) are preferable to measures
expressed as absolute changes. Additional research is required to provide guidance regarding
the measure of altered response that is most appropriate in particular instances.
The method described by Gaylor and Slikker (1990) permits a BMD to be calculated
from response probabilities irrespective of whether the underlying data are quantal or
continuous. Although the method is conceptually sound, the statistical methodology needed
for calculating confidence limits needs to be presented and computer software to implement
the methodology needs to be developed. Support for these implementations and
investigations of properties of the approach is needed. Particular aspects of the method that
need to be addressed include questions regarding the definition of normal and abnormal
ranges (whether based on professional, toxicological judgment or defined in terms of
variability in the control, or other background, populations). Also of particular importance
are methods for determining probabilities of being abnormal that are based on confidence
limits rather than maximum likelihood estimates.

3.6. SELECTION OF A BENCHMARK LEVEL OF RISK
The BMD is a lower statistical confidence limit on the dose corresponding to a
specified level of risk called the benchmark risk, or BMR. Thus, before calculating a BMD,
the BMR must first be specified. Several considerations may influence the selection of a
BMR. The first consideration is that, when used for determining the RfD, the BMD is used
like the NOAEL. This suggests that the BMR should be selected near the low end of the
range of increased risks that can be detected in a bioassay of typical size. Comparison of the
BMD with the NOAEL for a large number of developmental toxicity data sets indicated a
BMR in the range of 5 to 10 percent resulted in a BMD that was on average similar to the
NOAEL (Allen et al., 1994a, b; Faustman et al., 1994).
Another consideration is that an important goal of the BMD approach is that the
approach be relatively model independent; that is, different dose-response models that fit the
data should give comparable estimates of the BMD. However, it is well known that different
mathematical dose-response models can fit data equally well and yet produce widely
divergent estimates of risk at doses far below the range that produce measurable increases in
response (Crump, 1985). Thus, for the BMD approach to be relatively model independent,
the BMR cannot be much smaller than increased responses that can be measured reliably in
experimental groups of typical size.
Some simple quantitative considerations can provide guidance with respect to the
setting of the BMR. Consider a quantal response in a relatively large dose group of 100
animals and suppose that the observed response rate is 1 percent. A 95 percent confidence
interval for the true rate of response ranges from 0.025 percent to 5.4 percent. (A confidence
interval for the difference between the rate in this group and that in a control group would be
even wider.) This illustrates the fact that increased responses of 1 percent or less cannot be
measured with much precision in bioassays of typical size; that is, a BMR below 1 percent
would be expected to be outside the range of risks that could be measured accurately in
typical experiments.
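
The width of such an interval can be checked numerically. The sketch below assumes an exact binomial (Clopper-Pearson) interval, which is one common choice; the text does not state which method was used. For one responder among 100 animals it gives roughly 0.025 percent to 5.4 percent. The helper names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi], given that f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(x, n, level=0.95):
    """Two-sided exact (Clopper-Pearson) confidence interval for a binomial rate."""
    alpha = 1 - level
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper


lower, upper = exact_interval(1, 100)
print(f"95% interval: {100 * lower:.3f}% to {100 * upper:.2f}%")
```

The interval spans more than two orders of magnitude, which is the imprecision the paragraph above refers to.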
Various papers (Crump, 1984; Dourson et al., 1985; Kimmel and Gaylor, 1988;
Gaylor, 1989; Allen et al., 1994a, b) have proposed a BMR for quantal responses in the
range of 1 percent to 10 percent. Less attention has been given to corresponding levels for
continuous effects (Kavlock et al., 1995). If the approach of Gaylor and Slikker (1990) (see
section 3.5) is used for continuous effects, then it may be possible to use the same BMR for
continuous responses as for quantal responses.

3.6.1. Examples
In the example of EGPE-induced toxicity in the spleen of rats (section 3.4.1, table 5),
BMDs were calculated for BMRs of 10 percent, 5 percent, and 1 percent. For the two
quantal models examined, the BMD estimates differed by slightly more than a factor of 2 for
10 percent extra risk. There was less agreement at lower risk levels, and at an extra risk of
1 percent, BMDs from the two models differed by a factor of 7.3. For the QQR model, the
BMDs corresponding to 10 percent and 5 percent extra risk bracket the NOAEL. All BMDs
calculated for the QW model fall below the NOAEL, with the BMD for 10 percent risk
being about one-half the NOAEL value.
In the example of sulfamethazine-induced effects on the continuous variable,
gestational weight gain (table 7), BMDs were calculated for three levels of the BMR, 10
percent, 5 percent, and 1 percent (table 8). For the two models considered and for each of
the measures of risk, the results were more similar across models (i.e., there was greater
model independence) when the BMR was 5 percent or greater.
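
The degree of model dependence in table 8 can be summarized as the ratio of the larger to the smaller BMD across the two models. The sketch below does this for the 95 percent confidence limit column; the dictionary layout and function name are illustrative choices, and the numbers are copied from table 8.

```python
# 95% confidence limit BMDs (mg/kg/day) from table 8, keyed by BMR in percent.
BMD_95 = {
    "absolute": {
        "CQR": {10: 47.3, 5: 33.5, 1: 15.0},
        "CP": {10: 18.0, 5: 11.2, 1: 3.71},
    },
    "normalized": {
        "CQR": {10: 540.0, 5: 382.0, 1: 171.0},
        "CP": {10: 491.0, 5: 311.0, 1: 105.0},
    },
}


def model_ratio(measure, bmr):
    """Ratio of the larger to the smaller BMD across the two models."""
    cqr = BMD_95[measure]["CQR"][bmr]
    cp = BMD_95[measure]["CP"][bmr]
    return max(cqr, cp) / min(cqr, cp)


for measure in BMD_95:
    for bmr in (10, 5, 1):
        print(f"{measure:<10}  BMR {bmr:>2}%  CQR/CP ratio = {model_ratio(measure, bmr):.2f}")
```

For both measures the ratio grows as the BMR shrinks, and it stays closest to 1 for the normalized measure, in line with the observations above.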

3.6.2. Additional Research
One of the desired features of the BMD approach is that, because extrapolation far
beyond the range of the data is avoided, the procedure should be relatively independent of
the dose-response model used. The extent to which this is the case depends in part on the
BMR selected. As lower BMRs are used, the corresponding BMDs should become more
model dependent because one is extrapolating further beyond the range of the data. This was
observed in the examples. However, as observed in the examples, there will be some
divergence in BMDs regardless of the BMR selected. The goal in selecting the BMR is to
make it as small as practical without the BMD becoming too model dependent. Although a
BMR of 1 percent to 10 percent has been recommended by various authors (Crump, 1984;
Kimmel and Gaylor, 1988; Gaylor, 1989), there has been no systematic study of data from a
number of chemicals to determine how model dependent the BMD is for various values of
the BMR; however, the studies by Allen et al. (1994a, b) and Faustman et al. (1994) suggest
that BMDs from several models were similar for several developmental toxicity endpoints.
A systematic study of this kind could provide a more definitive basis for
selecting a BMR and could evaluate the model uncertainty at the recommended BMR. It also
could provide experience on the performance of various models and information on how well
models fit data and what problems might arise from their application. A related open
question is how much a nonrandom or nonbinomial distribution of responses affects
confidence limit calculation (especially for biphasic distributions).

3.7. CONFIDENCE LIMIT CALCULATION
Decisions to be made in calculating a lower confidence limit for the dose
corresponding to the BMR involve selecting the procedure for calculating confidence limits
and the size of the confidence limits. Recall that the BMD is defined to be the lower
confidence limit on dose corresponding to the BMR. The lower limit, as opposed to the
maximum likelihood estimate, is used for several reasons, the foremost being that statistical
confidence limits are influenced by the sample size of an experiment. That NOAEL
judgments generally do not account for sample sizes is one major criticism of the NOAEL.
Other factors that make the lower confidence limit preferable to the maximum likelihood
estimate include the facts that the lower limit will be more stable to minor changes in the
data and that the lower limit may be estimable even in some cases where the maximum
likelihood estimate is not.
Confidence limits based on maximum likelihood theory have a number of desirable
statistical properties (Cox and Lindley, 1974). Maximum likelihood methods can derive
confidence limits either from the asymptotic distribution of the parameter estimates
themselves or the asymptotic distribution of the likelihood ratio statistic (Cox and Lindley,
1974). Crump and Howe (1985) found that the latter approach (described in the appendix)
appeared to have superior statistical qualities in dose-response applications. This approach is
incorporated into GLOBAL 82 (Howe and Crump, 1982), the computer program that has
been used by the EPA for dose-response modeling for cancer.
The size of statistical confidence limits ranges from 90 percent to 99 percent in most
applications. Instead of being based on scientific rationale, this range seems to be purely
conventional. The EPA has generally employed one-sided 95 percent confidence limits in
risk assessments for cancer effects (Anderson et al., 1983).
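
The likelihood-ratio approach can be sketched for a deliberately simple case. The example below assumes a one-parameter "one-hit" model, P(d) = 1 - exp(-b*d), with zero background, and uses hypothetical quantal data; it is not the QQR, QW, or GLOBAL 82 implementation. A one-sided 95 percent lower limit on dose corresponds to the largest slope b whose log-likelihood lies within half the chi-square critical value (1.645 squared, about 2.706, for one degree of freedom) of the maximum.

```python
import math

# Hypothetical quantal data: (dose, animals tested, animals responding).
DATA = [(0.0, 50, 0), (1.0, 50, 3), (2.0, 50, 7), (4.0, 50, 14)]
Z_ONE_SIDED_95 = 1.6448536269514722  # standard normal 95th percentile


def log_likelihood(b, data=DATA):
    """Binomial log-likelihood for P(d) = 1 - exp(-b*d); assumes no responders at dose 0."""
    total = 0.0
    for dose, n, x in data:
        if x > 0:
            total += x * math.log(1.0 - math.exp(-b * dose))
        total += (n - x) * (-b * dose)  # log(1 - P(d)) = -b*d
    return total


def maximize(f, lo, hi, iters=200):
    """Ternary search for the maximum of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def bmd_estimates(bmr, data=DATA):
    """Return (maximum likelihood dose, lower confidence limit on dose) for extra risk = bmr."""
    b_hat = maximize(lambda b: log_likelihood(b, data), 1e-6, 5.0)
    ll_hat = log_likelihood(b_hat, data)
    crit = Z_ONE_SIDED_95 ** 2  # 90th percentile of chi-square with 1 df
    lo, hi = b_hat, 5.0
    for _ in range(200):  # bisection for the upper limit on the slope b
        mid = (lo + hi) / 2
        if 2.0 * (ll_hat - log_likelihood(mid, data)) < crit:
            lo = mid
        else:
            hi = mid
    b_upper = (lo + hi) / 2
    dose_for = lambda b: -math.log(1.0 - bmr) / b  # solves 1 - exp(-b*d) = bmr
    return dose_for(b_hat), dose_for(b_upper)


mle_dose, lower_limit = bmd_estimates(0.10)
print(f"maximum likelihood dose at 10% extra risk: {mle_dose:.3f}")
print(f"95% lower confidence limit (BMD):          {lower_limit:.3f}")
```

With more parameters the same idea applies, but the likelihood must be re-maximized over the remaining parameters at each trial dose (a profile likelihood), which this one-parameter sketch avoids.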

3.7.1. Example
In the example of sulfamethazine effect on weight gain during pregnancy (section
3.5.1, tables 7 and 8), BMDs were calculated for three sizes of confidence limits. For the
CQR model, the choice of confidence limit size had very little impact on the BMD estimates.
For the CP model, however, the choice of confidence limit size was much more important,
especially when absolute difference in the means was used as the measure of risk. The
importance of confidence limit size with the CP model increased as BMR decreased; e.g.,
the BMD estimate for the 1 percent BMR was more sensitive to the choice of confidence
limit size than was the estimate for the 5 percent BMR.
The results for the two models were most comparable when the absolute differences
in the means were normalized by the background mean (and when either the BMR was 5
percent or greater or the confidence limit size was less than or equal to 95 percent). This
suggests that not only will the BMD estimates be model dependent for low levels of risk, but
that they may also be model dependent when wide confidence limits are calculated.

3.7.2. Additional Research
The appearance of the interactions between BMR, BMD, and confidence limits
highlights two features: the care with which one must consider the options for all of the
decision points, and the need for additional research to investigate the interrelationships
among the decisions. As an extension to the research suggested in section 3.6.2, one should
also consider the impact of the size of the confidence limits on the model independence of
the BMD approach. It is clear that this cannot be done in isolation from the choices
concerning the BMR level. Some guidelines for selecting confidence limit size also could be
developed that consider the adequacy (from a health-protective policy perspective) of
confidence limits of various sizes.

3.8. CHOOSING AN APPROPRIATE BMD
Depending on the models and the responses analyzed, the procedures discussed to this
point may yield a single BMD, multiple BMDs calculated from applying multiple models to
individual responses, multiple BMDs calculated from different responses in a single study, or
multiple BMDs calculated from different studies.
Multiple BMDs may arise when different models fit the data for a single response in a
single study. Different BMDs could also come from a single study if more than one
response is modeled. Selecting any BMD other than the smallest one from that study might
lead to an RfD that is not protective against the effect corresponding to the smallest BMD.
Different BMDs could arise for the same response from different studies. Potential
differences among studies with regard to species of animal studied, dosing patterns, and other
features of experimental design make it difficult to specify a general rule that would be
applicable in all situations.

3.8.1. Examples
In the examples discussed above, BMDs for a single endpoint in a single study have
been calculated using two different models (tables 3, 5, 6, and 8). Because it may not be
possible to eliminate one model from consideration (because of lack of fit, inappropriate
statistical assumptions, or biological considerations; see section 3.3), some judgment must be
made about the treatment of the pairs of BMDs arising from the two models.
Consider the example presented in table 3. Two options for dealing with multiple
BMDs from a single endpoint can be illustrated. The first is to use the smallest of the
BMDs, which in this case is 0.31 mg/kg/day. The second option is to combine the
estimates. If a geometric average is used, the resulting BMD estimate for acrylamide-induced
nerve degeneration is 0.51 mg/kg/day. For the sake of this example, attention is
limited to the two models, QQR and QW.
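
Both options are simple to compute. The sketch below uses the two BMD values implied by this example (0.31 mg/kg/day and the 0.83 mg/kg/day value used in section 3.9.1); the variable names are illustrative.

```python
import math

# BMDs (mg/kg/day) from the two models for acrylamide-induced nerve degeneration.
bmds = [0.31, 0.83]

# Option 1: take the smallest BMD.
smallest = min(bmds)

# Option 2: combine the estimates with a geometric average.
geometric_mean = math.exp(sum(math.log(b) for b in bmds) / len(bmds))

print(f"smallest BMD:      {smallest:.2f} mg/kg/day")
print(f"geometric average: {geometric_mean:.2f} mg/kg/day")
```

A geometric average is less influenced by the larger value than an arithmetic average would be (0.57 mg/kg/day for these two values).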

3.8.2. Additional Research
Determining how to deal with multiple BMDs requires more extensive discussion.
Current RfD/RfC workgroup policies for selecting the "critical effect" provide a beginning
basis for developing some guidelines. Additional work needs to describe how to use the
BMD approach to develop endpoint-specific RfDs/RfCs and characterize RfDs based on
multiple endpoints. Specific scientific criteria, such as mechanistic information, may be
particularly useful in choosing endpoints.

3.9. UNCERTAINTY FACTORS
Once a unique BMD is selected, an RfD is obtained by dividing the BMD by one or
more uncertainty factors. This same step is required in the NOAEL approach, but the
uncertainty factors are applied to the NOAEL rather than to the BMD.
The uncertainty factors that used to be routinely applied to NOAELs (table 1) have
been criticized as being arbitrary, but it is more appropriate to consider them imprecise.
Limited publications on the application of uncertainty factors to BMDs suggest uncertainty
factors to account for within-human and animal-to-human variability, the severity of the
modeled effect, and the slope of the dose-response curve (Dourson et al., 1985). A recent
paper also suggests that the BMD at 10 percent will frequently be near the LOAEL (Farland
and Dourson, 1992).
New approaches to the definition and calculation of uncertainty factors are being
investigated (Hattis and Lewis, 1992; Lewis et al., 1990; Dourson et al., 1992; Renwick,
1991, 1993). This work should be applicable to BMDs as well as to NOAELs, but it should
be noted that, unlike the NOAEL, the calculation of the BMD depends on the BMR as well
as the size of the statistical confidence bound employed. These additional considerations may
need to be accounted for when selecting uncertainty factors for BMDs.
Some biological considerations (e.g., relating to the possibility of a threshold for the
responses under investigation) could affect the selection of uncertainty factors. The manner
in which these considerations affect uncertainty factors is unclear at present.
Kimmel and Gaylor (1988) presented another option for determining acceptable doses
that specifies a level of extra risk (e.g., 10⁻⁵) that is deemed to be sufficiently health
protective. They used the upper confidence limit of the dose-response curve to estimate a
lower confidence limit on the effective dose (ED) for a given level of response (i.e., the
lower confidence limit on the ED to produce a 10 percent response is the LED10). The
LED10 is equivalent to the BMD for a 10 percent response as proposed by Crump (1984).
Adjustment factors (F) were then applied to the LED10 to achieve a specific level of excess
risk (e.g., 10⁻⁴ to 10⁻⁵). Thus, to achieve a risk level of 10⁻⁵ a factor of 1,000 would be
applied to the LED10. This approach assumes that animal risk is approximately equal to
human risk. Thus, if the true dose response was linear, then the excess human risk level
would be no greater than the lower limit on the dose corresponding to a risk of 10⁻⁵.
However, if the dose response is highly nonlinear so that a threshold exists and if the LED10
is below the threshold, then the true human risk would be zero and this approach would be
highly conservative.
This option is equivalent to the approach proposed by Mantel and Schneiderman
(1975) for cancer toxicity and is similar to extrapolating to the presumed human risk with a
linear dose-response function (e.g., the linearized multistage approach) that the EPA applies
to cancer data (Anderson et al., 1983). In the context of the EPA's current approach to
calculating the RfD, this option is based on a different philosophy, which does not consider
adjustments for intra- and interspecies differences, missing data, or other factors explicitly.
In addition, it requires value judgments as to what level of excess risk should be considered
acceptable in the context of these uncertainties.

3.9.1. Example
Consider the example in section 3.3.1 of acrylamide neurotoxicity (table 3). If the
uncertainty factors applied are the same as those used in the current NOAEL/RfD approach, the
factors that might be relevant are a factor of 10 for animal-to-human extrapolation, another
factor of 10 for human variability, and another factor of 10 because it is not known whether
the electron microscopic changes occurring at the LOAEL also occur at the NOAEL. These
yield a total uncertainty factor of 1,000. Application of this uncertainty factor to the two
BMD estimates shown in table 3 yields RfDs of 0.31 μg/kg/day or 0.83 μg/kg/day. If an
average of the two BMDs were selected (see the example in section 3.8.1), then the resulting
RfD would be 0.5 μg/kg/day.
If the option for factor selection described by Kimmel and Gaylor (1988) were to be
used, one must determine a level of risk that is sufficiently low to provide safety for human
populations. Suppose that for the endpoints under consideration in this example a risk level
of 10⁻⁴ is acceptable. Then the factor that is applied to the BMDs is determined by the ratio
of the BMR (in this case, 0.05) and the acceptable level of risk. Thus, the factor selected in
this case would be 0.05/10⁻⁴ = 500. The RfDs calculated using this approach would be 0.6
μg/kg/day and 2 μg/kg/day (for the BMDs in table 3) or 1 μg/kg/day (if the average of those
BMDs were used).
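
The arithmetic for both options can be laid out in a short sketch. The BMD values, the three factors of 10, the BMR of 0.05, and the acceptable risk of 10⁻⁴ come from this example; the function and variable names are illustrative.

```python
# BMDs from table 3 and their geometric average (section 3.8.1), in mg/kg/day.
BMDS_MG = {"smaller BMD": 0.31, "larger BMD": 0.83, "average BMD": 0.51}


def rfd_ug_per_kg_day(bmd_mg, factor):
    """Divide a BMD in mg/kg/day by a factor and express the result in ug/kg/day."""
    return bmd_mg * 1000.0 / factor


# Option 1: traditional uncertainty factors (animal-to-human, human variability,
# and uncertainty about effects at the NOAEL), each equal to 10.
uncertainty_factor = 10 * 10 * 10

# Option 2: Kimmel and Gaylor (1988) style factor, BMR divided by acceptable risk.
bmr, acceptable_risk = 0.05, 1e-4
risk_based_factor = bmr / acceptable_risk

for label, bmd in BMDS_MG.items():
    print(f"{label}: "
          f"RfD = {rfd_ug_per_kg_day(bmd, uncertainty_factor):.2f} ug/kg/day (factor 1,000), "
          f"{rfd_ug_per_kg_day(bmd, risk_based_factor):.2f} ug/kg/day (factor 500)")
```

Rounded to one significant figure, the factor-of-500 results match the values of 0.6, 2, and 1 μg/kg/day quoted above.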
The NOAEL derived from these data is 0.5 mg/kg/day.13 Applying an uncertainty
factor of 1,000 to that NOAEL yields an RfD of 0.5 μg/kg/day. That value is the same as
the RfD calculated using the average BMD and the same uncertainty factor as the NOAEL
approach. The RfDs calculated from the BMDs using the Kimmel and Gaylor (1988)
approach to deriving uncertainty factors are slightly larger than the RfD based on the
NOAEL.

13 Although Johnson et al. (1986) reported that the high-dose group experienced significantly greater mortality
than the controls, the data reported in the manuscript are not adequate for conducting a mortality-adjusted test.
However, the authors noted that the Mantel-Haenszel test showed a significant dose-related trend in
degeneration of tibial nerves when applied to all the dose groups, and they stated that the degeneration results
for doses of 0.5 mg/kg/day and below were "comparable to controls." The Mantel-Haenszel test applied to
the data without adjustment for survival differences was not significant when the highest dose group was
ignored. From such information we conclude that 0.5 mg/kg/day is the NOAEL for tibial nerve degeneration in male
rats.

3.9.2. Additional Research
The uncertainty factors applied to a NOAEL to calculate an RfD have been applied
extensively for a number of years. The "traditional" factors (the first four of table 1) are
based on deliberation and debate by toxicologists over a number of years. They reflect a
large collection of informed judgment that is a continuing subject for research (Calabrese,
1985; Hattis et al., 1987; Hattis and Lewis, 1992; Lewis et al., 1990; Dourson et al., 1992;
Renwick, 1991, 1993). Development of a similar scientific consensus has recently begun for use with
BMDs (Barnes et al., 1994).
The EPA will continue to promote public discussion of the roles of the dose-response
slope and of biological considerations (e.g., the likelihood of thresholds) in determining
appropriate uncertainty factors. This will include discussion of applications of the BMD
procedure to data from a variety of chemicals with a variety of toxic endpoints. Investigating
uncertainty factors in the context of the BMD approach complements recent work that
reconsiders the uncertainty factors used in the NOAEL.

3.10. SUMMARY OF BMD DECISIONS
The decisions required in implementing the BMD approach were presented, along
with some of the available options for each of the decisions, including options proposed in
the literature. Options for each of the decisions are summarized in table 9. By no means do
these exhaust all the possibilities. The options presented were selected because they were
judged to have scientific merit, seemed reasonable, or have a history of use.

Table 9. Summary of Decisions and Options for BMD Approach

Decision                              Options

1. Selection of studies               a. All relevant, high-quality studies
                                      b. A single, "critical" study

2. Selection of responses             a. All responses from selected studies
                                      b. Responses observed at LOAEL

3. Format of data                     a. Convert continuous data to categorical data
                                      b. Transform continuous data (e.g., log-transformation)
                                      c. Retain original, continuous format

4. Mathematical model(s)              a. All models with adequate fit to the data
                                      b. Models with most appropriate statistical assumptions
                                      c. Models most appropriately reflecting biological
                                         considerations (e.g., threshold)
                                      d. Models satisfying combinations of a-c

5. Handling lack of fit               a. Try more flexible model(s)
                                      b. Omit high-dose data if lack of fit is due to those data
                                      c. Use measure of internal dose

6. Measure of altered response
     Quantal data                     a. Additional risk
                                      b. Extra risk
     Continuous data                  a. Absolute difference in means
                                      b. Absolute difference in means normalized by
                                         background mean
                                      c. Absolute difference in means normalized by
                                         background standard error
                                      d. Gaylor and Slikker (1990) approach with additional risk
                                      e. Gaylor and Slikker (1990) approach with extra risk

7. BMR definition                     a. 1 percent to 10 percent risk

8. Confidence limit calculation
     Method                           a. Likelihood theory, based on asymptotic distribution
                                         of likelihood ratio statistic
                                      b. Likelihood theory, based on asymptotic distribution
                                         of parameter estimates
     Size                             a. 90 percent to 99 percent

9. Specific BMD for RfD calculation
     Multiple BMDs for a single       a. Select smallest BMD
     endpoint                         b. Combine BMDs (e.g., geometric average)
     Multiple BMDs from a single      a. Select smallest BMD
     study
     Multiple BMDs from multiple      a. Select smallest BMD
     studies                          b. Average BMDs for different species and/or sexes
                                      c. Use most appropriate species and/or sex

10. Uncertainty factors               a. Use same factors as used in NOAEL approach
                                      b. Use NOAEL factors modified by average ratio,
                                         BMD/NOAEL
                                      c. Use risk-based factors (Kimmel and Gaylor, 1988)
                                      d. Use factors dependent on choice of BMR and
                                         confidence limit size
                                      e. Use factors that consider dose-response slope
                                         and/or biological considerations

4. DETAILED COMPARISON OF NOAEL AND BMD APPROACHES

4.1. CONCEPTUAL BASIS
A NOAEL for an experiment (if one exists) is an experimentally determined exposure
level at which there is no statistically or biologically significant increase in the frequency or
severity of adverse effects between the exposed population and its appropriate control. Some
effects may be produced at this level, but they are not considered adverse or precursors to
adverse effects. In an experiment with several NOAELs, the regulatory focus is primarily on
the highest one, leading to the common usage of the term NOAEL as the highest exposure
without adverse effect (U.S. EPA, 1992). The NOAEL has sometimes been referred to as
an "experimental threshold." Instead, the LOAEL should be considered the "experimental
threshold" but should not necessarily be considered an estimator of a biological threshold (if
one exists). The definition of the LOAEL implies that it is the lowest experimental dose
associated with an adverse effect. It is also the case that a NOAEL represents a dose at
which there is no significant change (from control) in response. There may, in fact, be some
instances where effects are seen at the NOAEL but not at a statistically or biologically
significant level.
The NOAEL traditionally has been used for effects that are expected to have a
threshold. On the other hand, use of mathematical dose-response models has generally been
reserved for effects, particularly cancer effects, that are considered not to have a threshold.
Conceptually, there is no reason why mathematical dose-response models cannot be applied
to threshold effects as well as nonthreshold effects. A threshold can be incorporated into a
model as a parameter, and the value of the threshold can be estimated. In fact, several dose-
response models (QLR, QQR, CLR, and CQR) listed in table 4 for use in the BMD
approach explicitly incorporate a threshold dose, d0. The implications of different threshold
estimates for choosing BMD models and of different models for estimating thresholds have
not yet been investigated but deserve much additional study because estimates of threshold
are likely to vary widely, depending on model choice.
Further, when calculating a BMD using a dose-response model, it is not strictly
necessary that threshold effects be modeled with threshold models and nonthreshold effects
with nonthreshold models because in the calculation of a BMD, the mathematical model is
used only to estimate doses corresponding to a given level of increased response (the BMR).
Thus, even if a threshold exists for an effect, the dose-response model is used for prediction
only at doses above the threshold.
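
The point that a threshold can be carried as a model parameter, and that the model is then used only above that threshold, can be sketched in a few lines. The functional form below (a Weibull-type extra risk with threshold dose d0) and all parameter values are assumptions made for illustration; they are not taken from table 4. The function returns the maximum likelihood dose for a given BMR, not the BMD itself, which is a lower confidence limit on that dose.

```python
import math

def extra_risk(dose, b, k, d0):
    """Extra risk under an assumed threshold model:
    0 at or below d0, and 1 - exp(-b * (dose - d0)**k) above it."""
    if dose <= d0:
        return 0.0
    return -math.expm1(-b * (dose - d0) ** k)

def dose_at_extra_risk(bmr, b, k, d0):
    """Dose at which the assumed model predicts an extra risk equal to bmr."""
    if not 0.0 < bmr < 1.0:
        raise ValueError("bmr must lie strictly between 0 and 1")
    return d0 + (-math.log1p(-bmr) / b) ** (1.0 / k)

# Hypothetical parameter values, chosen only to illustrate the calculation.
dose_10 = dose_at_extra_risk(0.10, b=0.5, k=2.0, d0=1.0)
```

Because any BMR greater than zero maps to a dose strictly greater than d0, the calculation never requires the model to predict responses at doses below the threshold.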

4.2. RELATIVE SIZES OF NOAELs AND BMDs
The fact that a BMD corresponds to a specified level of change in response to an
adverse effect (for quantal data, generally 1 percent to 10 percent increased risk, as discussed
earlier) and a NOAEL ostensibly corresponds to an experimental dose with no adverse effect
does not imply that NOAELs will necessarily be smaller than BMDs (and consequently that
larger uncertainty factors may be appropriate for BMDs). First, a BMD is defined as a
statistical lower limit, which introduces an element of conservatism in its definition. Second,
one cannot conclude that no adverse effects are possible at a NOAEL or that effects will
necessarily be observed at the BMD. The BMD corresponding to an extra risk of 1 percent
was smaller than the corresponding NOAEL for each of 10 data sets studied by Gaylor
(1989). Among five sets of quantal data studied by Crump (1984), the BMD corresponding
to an extra risk of 1 percent was larger than the NOAEL in one case by a factor of 1.4, and
smaller than the NOAEL in three cases by factors ranging from 1.1 to 2.6 (one data set did
not define a NOAEL). However, it is unclear whether the data sets used in these studies are
typical of those to which the BMD method would be applied if the method is used routinely.
In a comparison study of a large number of developmental toxicity data sets (Allen et al.,
1994a, b; Faustman et al., 1994), a BMD corresponding to an extra risk of 5 percent was on
average similar to the NOAEL when expressed as probability of response per litter.

4.3. CONSTRAINTS IMPOSED BY THE EXPERIMENTAL DESIGN
Whereas the BMD can theoretically assume any value, the NOAEL, by definition, is
one of the experimental doses. This constraint may appear unnecessarily restrictive in some
cases. If, for example, only a marginally significant effect is seen at the LOAEL and there
is a large gap between the LOAEL and the next lowest dose, then the NOAEL could be
considerably smaller than would be obtained from a study employing more doses or a more
judicious selection of doses. On the other hand, a BMD could be estimated at a dose
intermediate between the LOAEL and the NOAEL.
The NOAEL approach must be modified whenever effects are seen at all doses, and
consequently a NOAEL is not determined. Two approaches have been used in this situation.
One approach has been to require the study to be repeated at lower doses to define a
NOAEL. This alternative may be costly and time consuming and may appear to be
unnecessary whenever a clear dose response is defined by the original experiment (Crump,
1984). The other approach has been to use the LOAEL instead of a NOAEL and to apply an
additional uncertainty factor (generally 10; see table 1). On the other hand, the BMD
approach does not have this limitation because a BMD can be determined regardless of
whether a NOAEL is defined by the data.

4.4. NUMBER OF EXPERIMENTAL SUBJECTS AND THEIR DISTRIBUTION
INTO TREATMENT GROUPS
One of the major differences between the NOAEL and BMD approaches is the
manner in which they incorporate sample size. If fewer animals are tested per group, it is
less likely that a real difference in response rates between two groups will be detected.
Thus, in the same species, experiments with fewer animals per dose group will tend to find
larger NOAELs than experiments with more animals per dose group. These considerations
have led the EPA to impose minimum requirements for numbers of animals per test group.
For example, the guidelines for developmental toxicity testing protocols recommend at least
20 animals per dose group (U.S. EPA, 1986). This aspect of the NOAEL approach is the
opposite of what seems appropriate; a larger study should afford greater evidence of safety
and therefore should result in a larger RfD.
On the other hand, a BMD will appropriately tend to be larger when estimated from a
study employing larger numbers of animals per dose group. This is because a BMD is
defined as a lower statistical confidence limit and a larger study will tend to define narrower
confidence bounds (i.e., larger lower limits and smaller upper limits).
With either the NOAEL or the BMD approach it is desirable to have data from
several treatment groups. With the BMD approach such data help define the shape of the
dose response, which is estimated by the model; consequently, such data permit more
accurate estimation of the BMD. Having several treatment groups is also desirable when
applying the NOAEL approach as this increases the range of possibilities for the NOAEL and
consequently may increase the precision of the resulting RfD.
For a given total number of experimental animals, the more dose groups in the
experiment, the fewer animals that can be tested at each dose. Dividing a given total number
of animals into more treatment groups will generally not have a major impact on a BMD
calculation because the BMD approach does not focus on dose groups individually but instead
fits a single dose-response model to all available data from a study. This is not the case with
the NOAEL approach, however. Because the NOAEL compares individual responses at
individual doses to responses in a control group, dividing a given number of animals into
more groups decreases the power for detecting an effect at any particular dose and
consequently tends to result in a higher NOAEL.14
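
The loss of power that comes from splitting a fixed number of animals into more groups can be illustrated with a rough calculation. The sketch below uses a normal approximation to a one-sided comparison of two proportions with equal group sizes; the response rates, the 0.05 significance level, and the function name are assumptions chosen for illustration and do not come from this report.

```python
import math
from statistics import NormalDist

def approx_power(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-proportion z-test (equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    p_bar = 0.5 * (p_control + p_treated)
    # Standard error of the difference under the null (pooled) and the alternative.
    se_null = math.sqrt(2.0 * p_bar * (1.0 - p_bar) / n_per_group)
    se_alt = math.sqrt((p_control * (1.0 - p_control)
                        + p_treated * (1.0 - p_treated)) / n_per_group)
    z = ((p_treated - p_control) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z)

# Hypothetical rates: 5 percent response in controls, 25 percent at the dose tested.
power_50 = approx_power(0.05, 0.25, n_per_group=50)
power_20 = approx_power(0.05, 0.25, n_per_group=20)
```

Under these assumed rates, the approximate power is noticeably lower with 20 animals per group than with 50, so the same true effect is more likely to go undetected and the dose is more likely to be labeled a NOAEL.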

4.5. INCORPORATION OF DOSE-RESPONSE INFORMATION
A NOAEL may be based solely on information concerning whether an effect is
observed at particular doses; the relationships among the magnitudes of the responses at the
given doses may not be taken into account. On the other hand, the BMD is based on a dose-
response curve that naturally takes into account the shape of the dose response.
This is illustrated in figure 12 in which the QW model has been used to determine
BMDs for two hypothetical data sets. The first data set (marked by x's) has a steep dose
response above the LOAEL, which in this example equals 1 mg/kg/day. The second data set
(marked by o's) has identical responses up to the LOAEL but then has a more gradual dose
response at doses above the LOAEL. Also plotted are BMDs for the two data sets
corresponding to risks of 1 percent. The first data set produces a higher BMD than the
second, which seems reasonable given the respective dose-response shapes. On the other
hand, the NOAEL, which is insensitive to the steepness of the dose response, is the same for
both data sets.

[Figure 12. Example of BMDs calculated from steep versus gradual dose responses. Horizontal axis: dose (mg/kg/day), with BMD2, NOAEL, and BMD1 marked.]

14 This effect could be mitigated by using a statistical trend test to test for a dose-response trend among the
doses at and below a potential NOAEL. Such a test uses data from all doses in a range rather than comparing
a single dose group with a control group. The NOSTASOT (no statistical significance of trend; Tukey et al.,
1985) test procedure was proposed specifically for this situation.

4.6. SENSITIVITY TO DATA INTERPRETATION AND TO SMALL CHANGES IN
DATA
The NOAEL involves a number of decision points for which slight changes in data
can have a sizable effect on the outcome. Determinations of a LOAEL and a NOAEL are
based, at least in part, on the degree of statistical significance. Changes in responses of only
a few animals (or in even a single animal) can change a significant response to nonsignificant
and vice versa. Further, according to the definition of a NOAEL, effects that are not
statistically significant can be determined to be biologically significant. The calculation of
BMD, on the other hand, does not require judgments about whether an effect is present in
individual dose groups. The BMD also appears to be less sensitive than the NOAEL to small
changes in the data.15

4.7. MODEL SENSITIVITY
Because the NOAEL does not use dose-response models, the issue of model
sensitivity applies only to the BMD approach. As the calculation of a BMD does not
extrapolate results to doses far below those for which effects are observed, the BMD
approach has been presented as being relatively model independent (Crump, 1984). It
appears, however, that this issue has not been investigated thoroughly. Crump (1984)
applied four dose-response models to each of four sets of quantal data and one set of
continuous data. The ratios of the largest to the smallest of the four BMDs for each of the
five data sets were 1.2, 1.1, 1.2, 1.4, and 1.3. The corresponding ratios for the BMD10s
were 1.3, 1.1, 1.2, 1.2, and 1.1. These ratios are small compared with the large model
differences that occur when extrapolating to much lower doses (Crump, 1985).
15 A situation in which a BMD might be affected by a small change in data is when there is a borderline lack of
fit of models to the data and a decision must be made regarding whether to omit data at the highest dose.

4.8. QUANTITATIVE ESTIMATES OF RISK
The NOAEL is defined as an experimental dose at which there is no significant
increase in response. At the NOAEL the risk under study conditions has been theoretically
estimated by Gaylor (1989) as high as about 5 percent.
Unlike the NOAEL, the BMD approach associates a risk with each dose based on a
mathematical dose-response model. The calculation of a BMD uses the predictions of the
model only at doses at and above the BMR (doses that typically correspond to altered
responses of 1 percent or greater). However, if desired, the model used to calculate the
BMD also could be used to estimate risks for lower doses, even though this is not part of the
BMD approach per se. If such low-dose extrapolation is performed, it should be recognized
that the results are likely to be highly model dependent and apply only under the conditions
of the study used in modeling. It is still necessary to account for any species, exposure time,
and other differences between the study population and the human population of interest.
There are two obvious ways this extrapolation could be carried out. The first is to
simply use the predictions of the model used in calculating the BMD or the confidence limits
on these predictions. This approach incorporates all the uncertainties in current procedures
for estimating cancer risks. The second method, which is similar to the method proposed by
Kimmel and Gaylor (1988) for determining factors for the BMD approach (see section 3.9 on
uncertainty factors), is to assume that human risk is approximately equal to the animal risk
and the dose response is linear below the BMR. Thus, the risk at a dose, d, which is less
than the BMD, is estimated as (d/BMD)*BMR. This approach generally yields results
similar to those obtained with the linearized multistage approach that the EPA applies to
carcinogens (Anderson et al., 1983). This approach could greatly under- or overestimate the
risk from substances with a dose response that includes a threshold or is highly nonlinear.
Even so, such conservative estimates of risk could be useful in some applications. For
example, if such a conservative procedure predicted a low risk, this would indicate that the
true risk is at least this low and possibly much lower.
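
The linear extrapolation described above, risk = (d/BMD)*BMR for doses below the BMD, amounts to a one-line calculation. The sketch below is illustrative only; the function name and the numerical values are assumptions and are not taken from any data set in this report.

```python
def linear_low_dose_risk(dose, bmd, bmr):
    """Risk at a dose at or below the BMD, assuming a dose response that is
    linear through the origin between zero dose and the BMD."""
    if bmd <= 0.0:
        raise ValueError("bmd must be positive")
    if not 0.0 < bmr < 1.0:
        raise ValueError("bmr must lie strictly between 0 and 1")
    if not 0.0 <= dose <= bmd:
        raise ValueError("dose must lie between 0 and the BMD")
    return (dose / bmd) * bmr

# Hypothetical values: a BMD of 5 mg/kg/day defined at a BMR of 10 percent.
risk = linear_low_dose_risk(dose=0.5, bmd=5.0, bmr=0.10)
```

With these assumed inputs the dose is one-tenth of the BMD, so the estimated risk is one-tenth of the BMR. As the text cautions, such an estimate applies only under the conditions of the study used in modeling.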
4.9. STATISTICAL EXPERTISE
Both the NOAEL and BMD approaches require use of statistical methods. With the
NOAEL approach, statistical tests for comparing two groups of data as well as tests for a
dose-response trend across several dose groups may be needed. These same tests may be
required in applying the BMD approach (e.g., to determine the critical effect). In addition,
the BMD method requires statistical methods to fit mathematical dose-response models to
data. Statistical goodness-of-fit tests are needed to determine how well these models describe
the data. Further, a statistical confidence limit on dose corresponding to a given BMR needs
to be calculated to define the BMD. Thus, the BMD method requires greater use of
statistical methodology than the NOAEL.
Existing computer packages can perform the statistical tests required. Moreover,
programs are available that fit most of the models listed in table 4 to data using the method
of maximum likelihood (Crump, 1984). Those programs also test goodness-of-fit and
calculate the required confidence intervals. Consequently, a person with scientific credentials
who understands basic statistical concepts and the basic ideas of the NOAEL and BMD
approach and who has access to the necessary computer programs and facilities for running
them should be able to perform the necessary analyses. Although a statistician should not be
required to perform the calculations, one should be available for consultation.
Implementation of these methods would be facilitated if a user-friendly, special-purpose
program were available that could perform the necessary calculations. Also useful would be
some special training (e.g., a 1-day seminar) for presenting the statistical methods used in the
BMD approach and the use of computer programs for making the necessary calculations.

5. SUMMARY OF RESEARCH NEEDS
The discussion suggested several areas in which additional research into the BMD
approach could be of value. These are summarized here. Two additional
investigations/developments are also discussed. Some of these research needs could be
addressed through a study that involves computing BMDs corresponding to various BMRs
using several dose-response models for a number of data sets.

5.1. SUMMARY OF RESEARCH NEEDS RELATED TO BMD DECISION POINTS
The areas identified as requiring additional research are the following:
1. Development of dose-response models and related methods for use with
   various types of data (see section 3.3.2).
2. Guidelines for handling lack of fit (section 3.4.2).
3. Development of methods for applying pharmacokinetic considerations (section
   3.3.2).
4. Guidelines for selecting appropriate measure(s) of altered response (section
   3.5.2).
5. Study of the sensitivity of the BMD to choice of model, particularly in relation
   to the level of the BMR (section 3.6.2) and to the confidence limit size
   (section 3.7.2).
6. Guidelines for selecting a single BMD when more than one is calculated
   (section 3.8.2).
7. Investigation of uncertainty factors (section 3.9.2).

5.2. ADDITIONAL TOPICS FOR INVESTIGATION/DEVELOPMENT
5.2.1. Comparison of Dose-Response Curves for Different Types of Data and Toxic
Endpoints
In the process of applying the BMD approach to a number of data sets, as is required
for the last two research recommendations above, it could be worthwhile from a theoretical
perspective to evaluate the various dose-response curve shapes for different forms of data
(e.g., quantal versus continuous), for different toxic endpoints, and for different chemical
classes. Such a study could provide information on which endpoints appear to have a
threshold response versus a nonthreshold response and whether the dose responses of the
same effect from different chemicals appear to have the same shape. This information could
be used to construct hypotheses regarding underlying mechanisms that could be tested in
subsequent experiments. It would be particularly interesting to determine whether noncancer
responses appear in general to be "threshold-like." This research would have implications
concerning the appropriateness of applying different types of procedures for setting allowable
exposure for carcinogenic effects and various categories of noncarcinogenic effects.
One way to conduct such a study would be to apply the QW and CP models and study
the values of the shape parameter, k, from these models. A value of k = 1 is consistent
with a linear no-threshold dose response, whereas large values of k are more indicative of a
threshold.
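
The role of the shape parameter can be seen by comparing low-dose predictions for different values of k. The sketch below assumes a Weibull-type extra risk of the form 1 - exp(-b * d**k); this form, the scale parameter b, and the doses used are assumptions for illustration, and the exact parameterization of the QW and CP models in table 4 may differ.

```python
import math

def weibull_extra_risk(dose, b, k):
    """Extra risk under an assumed Weibull-type model: 1 - exp(-b * dose**k)."""
    return -math.expm1(-b * dose ** k)

# Hypothetical comparison at a low dose (0.1) with the same scale parameter.
risk_k1 = weibull_extra_risk(0.1, b=1.0, k=1.0)   # close to linear at low dose
risk_k4 = weibull_extra_risk(0.1, b=1.0, k=4.0)   # nearly flat at low dose
```

With k = 1 the low-dose risk is roughly proportional to dose, whereas with k = 4 it is several orders of magnitude smaller at the same dose, which is the "threshold-like" behavior that large values of k indicate.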

5.2.2. Development of Dose-Response Models for Multiple Endpoints of Toxicity
Many types of noncancer toxicity are characterized by multiple types of effects; e.g.,
liver toxicity may be characterized by enzyme changes, pathology, weight changes, etc., and
neurotoxicity may include altered behavior, neurophysiological and neurochemical changes,
as well as altered structure. Various endpoints within organ systems are likely to be
interdependent but are often treated independently for purposes of risk assessment. A few
attempts have been made to model the interdependence of endpoints of toxicity, primarily in
the area of developmental toxicity. For example, Ryan et al. (1991) and Catalano et al.
(1993) have shown a correlation between fetal weight and malformations and have developed
a multinomial model that accounts for fetal weight, malformations, and prenatal death. This
approach allows for the analysis of continuous and discrete outcomes by assuming that the
discrete outcome has some corresponding unobserved latent variable, and that the continuous
outcome and the latent variable share a joint normal distribution. Results of BMD
calculations using this approach showed that a lower value resulted for the multivariate
approach than for each outcome modeled individually, thus taking into account the risks from
all adverse events at once.

6. REFERENCES
Allen, B. C.; Kavlock, R. J.; Kimmel, C. A.; Faustman, E. M. (1994a) Dose-response
assessment for developmental toxicity: II. Comparison of generic benchmark dose
estimates with NOAELs. Fund. Appl. Toxicol. 23: 487-495.

Allen, B. C.; Kavlock, R. J.; Kimmel, C. A.; Faustman, E. M. (1994b) Dose-response
assessment for developmental toxicity: III. Statistical models. Fund. Appl. Toxicol.
23: 496-509.

Andersen, M.; Clewell, H.; Gargas, M.; Smith, F.; Reitz, R. (1987) Physiologically based
pharmacokinetics and the risk assessment process for methylene chloride. Toxicol.
Appl. Pharmacol. 87: 185-205.

Anderson, E.; Carcinogen Assessment Group of the U.S. Environmental Protection Agency.
(1983) Quantitative approaches in use to assess cancer risk. Risk Anal. 3: 277-295.

Barnes, D. G.; Dourson, M. L. (1988) Reference dose (RfD): description and use in health
risk assessments. Reg. Toxicol. Pharmacol. 8: 471-488.

Barnes, D. G.; Daston, G. P.; Evans, J. S.; Jarabek, A. M.; Kavlock, R. J.; Kimmel,
C. A.; Park, C.; Spitzer, H.L. (1995) Benchmark dose workshop: criteria for use of
a benchmark dose to estimate a reference dose. Reg. Toxicol. Pharmacol., in press.

Bickel, P.; Doksum, K. (1977) Mathematical statistics: basic ideas and selected topics. San
Francisco: Holden-Day, Inc.

Calabrese, E. J. (1985) Uncertainty factors and interindividual variation. Reg. Toxicol.
Pharm. 5: 190-196.

Catalano, P. J.; Scharfstein, D. O.; Ryan, L. M.; Kimmel, C. A.; Kimmel, G. L. (1993) A
statistical model for fetal death, fetal weight, and malformation in developmental
toxicity studies. Teratology 47: 281-290.

Chemical Rubber Company (CRC). (1970) Standard mathematical tables. Selby, S., ed.
18th ed. Cleveland, OH: Chemical Rubber Company.

Clement International Corporation. (1990a) Health effects and dose-response assessment for
hydrogen chloride following short-term exposure. Unpublished report prepared for
EPA Office of Air Quality Planning and Standards.

Clement International Corporation. (1990b) Health effects and dose-response assessment for
acrolein following short-term exposure. Unpublished report prepared for EPA Office
of Air Quality Planning and Standards.

Cox, D.; Lindley, D. (1974) Theoretical statistics. London: Chapman & Hall.

Crump, K. (1984) A new method for determining allowable daily intakes. Fund. Appl.
Toxicol. 4: 854-871.

Crump, K. (1985) Mechanisms leading to dose-response models. In: Ricci, P., ed.
Principles of health risk assessment. Englewood Cliffs, NJ: Prentice Hall; pp. 321-372.

Crump, K.; Howe, R. (1985) A review of methods for calculating confidence limits in low
dose extrapolation. In: Krewski, D., ed. Toxicological risk assessment. Boca Raton,
FL: CRC Press, Inc.

Crump, K.; Hoel, D.; Langley, H.; Peto, R. (1976) Fundamental carcinogenic processes and
their implications to low dose risk assessment. Cancer Res. 36: 2973-2979.

Dourson, M.; Stara, J. (1983) Regulatory history and experimental support for uncertainty
(safety) factors. Reg. Toxicol. Pharmacol. 3: 224-238.

Dourson, M.; Hertzberg, R.; Hartung, R.; Blackburn, K. (1985) Novel methods for the
estimation of acceptable daily intake. Toxicol. Ind. Health 1: 23-41.

Dourson, M. L.; Knauf, L. A.; Swartout, J. C. (1992) On reference dose (RfD) and its
underlying toxicity data base. Toxicol. Ind. Health. 8: 171-189.

Farland, W. F.; Dourson, M. L. (1992) Noncancer health endpoints: approaches to
quantitative risk assessment. In: Cothern, C. R., ed. Risk assessment. Boca Raton:
Lewis Publ.; pp. 87-106.

Faustman, E.M.; Allen, B.C.; Kavlock, R.J.; Kimmel, C.A. (1994) Dose-response
assessment for developmental toxicity: I. Characterization of data base and
determination of NOAELs. Fund. Appl. Toxicol. 23: 478-486.

Gaylor, D. (1989) Quantitative risk analysis for quantal reproductive and developmental
effects. Environ. Health Perspect. 79: 243-246.

Gaylor, D.; Slikker, W., Jr. (1990) Risk assessment for neurotoxic effects. Neurotoxicology
11: 211-218.

Haseman, J. (1984) Statistical issues in the design, analysis and interpretation of animal
carcinogenicity studies. Environ. Health Perspect. 58: 385-392.

Hattis, D.; Lewis, S. (1992) Reducing uncertainty with adjustment factors. The Toxicologist.
12: 1327.

Hattis, D.; Erdreich, L.; Ballew, M. (1987) Human variability in susceptibility to toxic
chemicals--a preliminary analysis of pharmacokinetic data from normal volunteers.
Risk Anal. 7: 415-426.

Howe, R.; Crump, K. (1982) GLOBAL 82: a computer program to extrapolate quantal
animal toxicity data to low doses. Prepared for the Office of Carcinogen Standards,
Occupational Safety and Health Administration, U.S. Department of Labor, Contract
41USC252C3.

Jarabek, A. M.; Menache, M. G.; Overton, J. H.; Dourson, M. L.; Miller, F. J. (1989)
Inhalation reference dose (RfD): an application of interspecies dosimetry modeling for
risk assessment of insoluble particles. Health Phys. 57: 177-183.

Jarabek, A. M.; Menache, M. G.; Overton, J. H.; Dourson, M. L.; Miller, F. J. (1990)
The U.S. Environmental Protection Agency's inhalation RfD methodology: risk
assessment for air toxics. Toxicol. Ind. Health 6: 279-301.

Johnson, K.; Gorzinski, S.; Bodner, K.; Campbell, R.; Wolf, C.; Friedman, M.; Mast, R.
(1986) Chronic toxicity and oncogenicity study on acrylamide incorporated in the
drinking water of Fischer 344 rats. Toxicol. Appl. Pharmacol. 85: 154-168.

Katz, G.; Krasavage, W.; Terhaar, C. (1984) Comparative acute and subchronic toxicity of
ethylene glycol monopropyl ether and ethylene glycol monopropyl ether acetate.
Environ. Health Perspect. 57: 165-175.

Kavlock, R. J.; Allen, B. C.; Kimmel, C. A.; Faustman, E. M. (1995) Dose-response
assessment for developmental toxicity: IV. Benchmark doses for fetal weight
changes. Fund. Appl. Toxicol. (in press).

Kendall, M. (1951) The advanced theory of statistics. Vol. 1. 5th ed. New York: Hafner
Publishing Company.

Kimmel, C.; Gaylor, D. (1988) Issues in qualitative and quantitative risk analysis for
developmental toxicology. Risk Anal. 8: 15-21.

Kodell, R. L.; Howe, R. B.; Chen, J. J.; Gaylor, D. W. (1991) Mathematical modeling of
reproductive and developmental toxic effects for quantitative risk assessment. Risk
Anal. 11(4): 583-590.

Kupper, L.; Portier, C.; Hogan, M.; Yamamoto, E. (1986) The impact of litter effects on
dose-response modeling in teratology. Biometrics 42: 85-98.

Lehmann, E. (1975) Nonparametrics. Statistical methods based on ranks. San Francisco:
Holden-Day, Inc.

Lewis, S. C.; Lynch, J. R.; Nikiforov, A. I. (1990) A new approach to deriving community
exposure guidelines from no-observed-adverse-effect levels. Reg. Toxicol. Pharmacol.
11: 314-330.

Mantel, N.; Schneiderman, M. A. (1975) Estimating "safe" levels, a hazardous undertaking.
Cancer Res. 35: 1379-1386.

Miller, R.; Ayres, J.; Calhoun, L.; Young, J.; McKenna, M. (1981) Comparative short-term
inhalation toxicity of ethylene glycol monomethyl ether and propylene glycol
monomethyl ether in rats and mice. Toxicol. Appl. Pharmacol. 61: 368-377.

National Center for Toxicological Research (NCTR). (1981) Teratological evaluation of
sulfamethazine. Prepared for Research Triangle Institute; July 8, 1981;
RTI-48/31U-2077.

National Research Council (NRC) (1977) Drinking water and health. Washington, DC: Safe
Drinking Water Committee, National Academy of Sciences.

Office of Science and Technology Policy (OSTP). (1989) Chemical carcinogens: A review of
the science and its associated principles. In: Cohrssen, J. J.; Covello, V. T. (eds.)
Risk analysis: A guide to principles and methods for analyzing health and
environmental risks. Washington, DC: Council on Environmental Quality NTIS
PB89-137772 RDM.

Peto, R.; Pike, M.; Day, N.; Gray, R.; Lee, P.; Parish, S.; Peto, J.; Richards, S.;
Wahrendorf, J. (1980) Guidelines for simple, sensitive significance tests for
carcinogenic effects in long-term animal experiments. Annex. In: Long-term and
short-term screening assays for carcinogens: a critical appraisal. IARC monographs
on the evaluation of the carcinogenic risk of chemicals to humans, supplement 2.
Lyon: International Agency for Research on Cancer; pp. 311-426.

Rai, K.; Van Ryzin, J. (1985) A dose-response model for teratological experiments involving
quantal response. Biometrics 41: 1-9.

Renwick, A. G. (1991) Safety factors and establishment of acceptable daily intakes. Food
Add. Contamin. 8: 135-150.

Renwick, A. G. (1993) Data derived safety factors for the evaluation of food additives and
environmental contaminants. In press.

Ryan, L. M. (1992) Quantitative risk assessment for developmental toxicity. Biometrics 48:
163-174.

Ryan, L. M.; Catalano, P. J.; Kimmel, C. A.; Kimmel, G.L. (1991) On the relationship
between fetal weight and malformation in developmental toxicity studies. Teratology
44: 215-223.

Sanders, O. T.; Zepp, R. L.; Kirkpatrick, R. L. (1974) Effect of PCB ingestion on sleeping
times, organ weights, food consumption, serum corticosterone and survival of albino
mice. Bull. Environ. Contam. Toxicol. 12: 394-399.

SAS. (1988) SAS/STAT User's Guide, Release 6.03 edition. Cary, NC: SAS Institute, Inc.

Tarone, R.; Ware, J. (1977) On distribution-free tests for equality of survival distributions.
Biometrika 64: 156-160.

Tukey, J.; Ciminera, J.; Heyse, J. (1985) Testing the statistical certainty of a response to
increasing doses of a drug. Biometrics 41: 295-301.

U.S. Environmental Protection Agency (U.S. EPA). (1986) Guidelines for the health
assessment of suspect developmental toxicants. Federal Register 50: 39426-39436.

U.S. Environmental Protection Agency (U.S. EPA). (1987) The risk assessment guidelines of
1986. Washington, DC: Office of Health and Environmental Assessment. EPA/600-8-
87-045.

U.S. Environmental Protection Agency (U.S. EPA). (1988a) Proposed guidelines for
assessing male reproductive risk. Federal Register 53: 24850-24969.

U.S. Environmental Protection Agency (U.S. EPA). (1988b) Proposed guidelines for
assessing female reproductive risk. Federal Register 53: 24834-24847.

U.S. Environmental Protection Agency (U.S. EPA). (1989) Risk assessment guidance for
Superfund. Vol. I: Human health evaluation manual. Interim final. Washington, DC:
Office of Emergency and Remedial Response.

U.S. Environmental Protection Agency (U.S. EPA). (1990) Interim methods for development
of inhalation reference concentrations. Washington, DC: Office of Health and
Environmental Assessment. EPA 600/8-88/066F.

U.S. Environmental Protection Agency (U.S. EPA). (1992) IRIS. Background document
(4/1/91). Cincinnati, OH: Office of Health and Environmental Assessment,
Environmental Criteria and Assessment Office.
-------
APPENDIX—STATISTICAL METHODS
A.1. BMD APPROACH
This section describes the statistical procedures associated with the fitting of the BMD
models to experimental data. The likelihood approach to parameter estimation is presented,
as are the methods used to evaluate the fit of the models to the data.
Maximum Likelihood Procedures for Quantal Endpoints. Consider an experiment
with g dose levels $d_1, ..., d_g$, and let $N_i$ and $X_i$, respectively, be the number of animals tested
and the number of animals affected at the ith dose level. Let P(d) be the probability that an
animal is affected when exposed to a dose d. Assuming that $X_i$ has a binomial distribution
with parameters $N_i$ and $P(d_i)$, the likelihood of the data can be written as

$$L_Q = \prod_{i=1}^{g} \binom{N_i}{X_i} \, P(d_i)^{X_i} \, [1 - P(d_i)]^{N_i - X_i}$$

The parameters that define P(d) are the only unknowns; they are estimated by the values that
maximize the value of $L_Q$ (Cox and Lindley, 1974).
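As an illustration of the maximization just described, the following Python sketch evaluates the binomial log-likelihood and locates its maximum by a coarse grid search. The one-hit form of P(d), the parameter names c and q, the grid ranges, and the data are assumptions made for the example; they are not taken from the models in table 3, and practical software would use an iterative optimizer rather than a grid.

```python
import math

def log_likelihood(params, doses, n_tested, n_affected, prob):
    """Binomial log-likelihood, omitting the constant binomial coefficients."""
    total = 0.0
    for d, n, x in zip(doses, n_tested, n_affected):
        p = min(max(prob(d, params), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += x * math.log(p) + (n - x) * math.log(1.0 - p)
    return total

def one_hit(d, params):
    """Assumed illustrative model: P(d) = c + (1 - c) * (1 - exp(-q * d))."""
    c, q = params
    return c + (1.0 - c) * (1.0 - math.exp(-q * d))

def grid_mle(doses, n_tested, n_affected, steps=200):
    """Crude grid search over (c, q) for the largest log-likelihood."""
    best, best_ll = None, -math.inf
    q_max = 5.0 / max(doses)
    for i in range(steps + 1):
        c = 0.5 * i / steps            # background rate searched over [0, 0.5]
        for j in range(steps + 1):
            q = q_max * j / steps      # slope searched over [0, 5 / max dose]
            ll = log_likelihood((c, q), doses, n_tested, n_affected, one_hit)
            if ll > best_ll:
                best, best_ll = (c, q), ll
    return best, best_ll

# Hypothetical data: four dose groups of 20 animals each.
doses = [0.0, 10.0, 30.0, 100.0]
n_tested = [20, 20, 20, 20]
n_affected = [1, 2, 6, 14]
(c_hat, q_hat), ll_hat = grid_mle(doses, n_tested, n_affected)
```

The binomial coefficients are dropped because they do not depend on the parameters and so do not change where the maximum occurs.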
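Maximum Likelihood Procedures for Continuous Endpoints. Consider an
experiment with g dose levels $d_1, ..., d_g$, and let $N_i$ be the number of animals in the ith dose
group, and let $X_{ij}$, j = 1, ..., $N_i$, i = 1, ..., g represent the response of the jth animal in the
ith dose group. It is assumed that $X_{ij}$ has a normal distribution with mean $m(d_i)$ and variance
$\sigma_i^2$. The unknown parameters in the model consist of the parameters defining m(d) (see table
4 of the text), plus $\sigma_1^2, ..., \sigma_g^2$. Let $\bar{x}_i$ and $s_i^2$ denote the sample mean and
sample variance of the responses in the ith dose group:

$$\bar{x}_i = \frac{\sum_j X_{ij}}{N_i}, \qquad s_i^2 = \frac{\sum_j (X_{ij} - \bar{x}_i)^2}{N_i - 1}$$

where both sums run over j from 1 to $N_i$. Then the likelihood of the data can be written as

$$L_C = \prod_{i=1}^{g} (2\pi\sigma_i^2)^{-N_i/2} \exp\{-[(N_i - 1) s_i^2 + N_i (\bar{x}_i - m(d_i))^2] / 2\sigma_i^2\}$$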
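The parameters of the continuous BMD model, as well as the variances $\sigma_1^2, ..., \sigma_g^2$, are
estimated by the values that maximize this likelihood.
Goodness-of-fit tests are used to evaluate the fit of the models to the data. For
quantal endpoints, an approximate chi-square test is employed; for continuous endpoints, an
F test is performed.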
For quantal responses, the observed values are numbers of responders and the models
predict numbers of responders. The chi-squared test statistic, C, is

$$C = \sum_{i=1}^{g} \frac{[X_i - N_i P(d_i)]^2}{N_i \, P(d_i) \, [1 - P(d_i)]}$$

where the sum runs from 1 to g and the notation here is the same as that presented earlier.
The degrees of freedom associated with this test are normally g-[number of parameters
estimated]. If some of the parameter estimates fall on the boundary of the parameter space,
the degrees of freedom are approximated as follows (Anderson et al., 1983). From the
number of dose groups, subtract 1 for estimating the parameter c (the background rate) and
subtract 1 for each of the other parameters for which the maximum likelihood estimate is not
a boundary value.16
The value of C may be compared to the quantiles of a chi-square distribution. For
example, if C equals or exceeds the quantile for (1-$\alpha$), where $\alpha$ = 0.01, then we may
conclude that the model did not fit the observed data.
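The chi-square comparison described above can be sketched in Python as follows. The fitted probabilities and counts are hypothetical, the function names are illustrative, and the short table of 0.99 quantiles holds standard tabulated values for 1 to 4 degrees of freedom.

```python
def chi_square_fit(n_tested, n_affected, p_fitted):
    """Statistic C = sum (X_i - N_i p_i)^2 / (N_i p_i (1 - p_i))."""
    c_stat = 0.0
    for n, x, p in zip(n_tested, n_affected, p_fitted):
        c_stat += (x - n * p) ** 2 / (n * p * (1.0 - p))
    return c_stat

def degrees_of_freedom(n_groups, n_other_params, n_on_boundary):
    """g minus 1 for the background parameter c, minus 1 for each other
    parameter whose estimate is not a boundary value."""
    return n_groups - 1 - (n_other_params - n_on_boundary)

# 0.99 quantiles of the chi-square distribution (standard table values).
CHI2_Q99 = {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277}

n_tested = [20, 20, 20, 20]
n_affected = [1, 2, 6, 14]
p_fitted = [0.05, 0.12, 0.28, 0.70]   # hypothetical fitted probabilities
c_stat = chi_square_fit(n_tested, n_affected, p_fitted)
df = degrees_of_freedom(4, n_other_params=2, n_on_boundary=0)
rejected = c_stat >= CHI2_Q99[df]     # True would indicate lack of fit at alpha = 0.01
```

With these inputs C is roughly 0.115 on 1 degree of freedom, far below 6.635, so the hypothetical model would not be rejected.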
For continuous responses, the mean squared error for lack of fit is compared to the
mean squared error associated with pure error to determine if a continuous model has fit the
data. The sum of squares associated with the pure error is

$$SS_e = \sum_{i=1}^{g} (N_i - 1) \, s_i^2$$

which has $df_e = \sum_{i=1}^{g} (N_i - 1)$ degrees of freedom. In both cases the sum runs from 1 to g and $N_i$
and $s_i^2$ are as defined above. The sum of squares associated with lack of fit is

$$SS_f = \sum_{i=1}^{g} N_i \, [\bar{x}_i - m(d_i)]^2$$

where $\bar{x}_i$ is the sample mean of the ith dose group and $m(d_i)$ is the mean predicted by the fitted
model, and which has $df_f$ degrees of freedom. The value of $df_f$ is equal to the number of dose groups,
g, minus 1 (for the estimation of the background parameter c) minus 1 for each of the other
parameters for which the estimated value is not equal to a boundary value.

16 The parameters in the quantal BMD models are constrained to lie within certain ranges (see table 3). A
parameter estimate may equal one of the values that define the range for the parameter, in which case a
degree of freedom is not lost.
The test statistic

$$F' = \frac{SS_f / df_f}{SS_e / df_e}$$

is distributed according to an F distribution with degrees of freedom $df_f$ and $df_e$. The value
of F' can be compared to tabulated quantiles of the F distribution with the specified degrees
of freedom (Bickel and Doksum, 1977; CRC, 1970) to determine if the model fits the data.
For example, when F' equals or exceeds the quantile corresponding to (1-$\alpha$), where $\alpha$ =
0.01, then we may conclude that the model did not fit the observed data.
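A minimal Python sketch of the F' calculation from summary statistics follows. It assumes the lack-of-fit sum of squares takes the usual form, the sum over groups of N_i times the squared difference between the observed group mean and the model-predicted mean; the data, the fitted means, and the function name are hypothetical.

```python
def lack_of_fit_f(n, means, sds, fitted_means, df_fit):
    """Return (F', df_f, df_e) from group sizes, sample means, sample standard
    deviations, and model-predicted means m(d_i)."""
    ss_e = sum((ni - 1) * si ** 2 for ni, si in zip(n, sds))     # pure error
    df_e = sum(ni - 1 for ni in n)
    ss_f = sum(ni * (xi - mi) ** 2                                 # lack of fit (assumed form)
               for ni, xi, mi in zip(n, means, fitted_means))
    return (ss_f / df_fit) / (ss_e / df_e), df_fit, df_e

# Hypothetical summary data for four dose groups of 10 animals each.
n = [10, 10, 10, 10]
means = [100.0, 98.0, 93.0, 85.0]
sds = [5.0, 5.0, 6.0, 6.0]
fitted = [100.5, 97.5, 92.5, 85.5]    # hypothetical model predictions
f_stat, df_f, df_e = lack_of_fit_f(n, means, sds, fitted, df_fit=2)
```

Here F' is about 0.16 with 2 and 36 degrees of freedom. The resulting value would then be compared with a tabulated F quantile, which this sketch does not compute.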
Application of the BMD Approach to Two Dose Groups. Although the BMD models
listed in table 4 involve three or more parameters, the recommended method for computing
statistical bounds will provide a unique lower bound dose even when the data are for only
two dose groups (e.g., a control group and one treatment group). For the QQR and CQR
models, the lower bound is the same as the one that would have been obtained had the
parameter $d_0$ been fixed at $d_0$ = 0. For the QW and CP models, the lower bound is the
same as the one that would have been obtained if the parameter k had been fixed at k = 1.
This value of k makes the models assume a linear, no-threshold form. Similar results apply
to other models.
Unlike the statistical bounds, the maximum likelihood estimate (MLE) of dose
obtained using the models will not be unique when there are only two dose groups. If an
MLE is required in such a situation, it is recommended that it be calculated using the models
and constraints discussed in the previous paragraph (i.e., $d_0$ = 0 for the QQR and CQR
models and k = 1 for the QW and CP models). These selections will generally provide the
lowest possible MLE of dose corresponding to a fixed, small level of increased response.
Computer Programs. The fitting procedures described above require sophisticated
optimization routines involving iterative numerical calculations. K.S. Crump Division of
Clement International has developed software to perform the calculations and to evaluate the
fit of models to the data. The software implements the QQR, QW, CQR, and CP models,
among others. That software was used for all examples discussed in this document.
A.2. STATISTICAL DETERMINATION OF A NOAEL
A NOAEL is defined as the highest experimental dose at which there are no
statistically or biologically significant increases in frequency or severity of adverse health
effects compared with corresponding controls. Thus, there should be no statistically
significant evidence of a relationship between dose and response for doses up to the NOAEL.
Although pairwise tests that compare a single treatment group with the control group are
generally used in determining NOAELs, trend tests are available that make use of the data
from all of the dose groups up to and including the putative NOAEL. These procedures test
for the presence of a trend toward increased responses at increasingly higher doses. These
tests incorporate more of the data than pairwise tests; consequently they are generally more
powerful.
NOSTASOT Dose. Tukey et al. (1985) have proposed a procedure for determining a
no statistical significance of trend (NOSTASOT) dose. This procedure has greater power for
determining dose relationships than do multiple pairwise tests (Tukey et al., 1985) and can be
used to define a NOAEL. The procedure is described as follows.
First, select a suitable trend test. Selection of such a test depends on the type of
endpoint in question and the data available for analysis. Recommended tests for the
situations likely to arise in the analysis of noncancer health effects are presented below.
After selecting the appropriate trend test, apply the test to all dose groups. If the test
indicates no significant trend, then the highest dose may be considered a NOAEL.17 If the
test applied to all dose groups detects a significant trend, then the highest dose group cannot
be a NOAEL. In that case, delete the highest dose group from consideration and repeat the
trend test. The highest dose level for which there is no statistically significant trend is the
NOAEL (NOSTASOT dose) if biological or toxicological considerations do not suggest
otherwise.
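The step-down logic of the NOSTASOT procedure can be sketched in Python as follows. The function name and the callback `is_trend_significant` are hypothetical; the callback stands in for whichever trend test has been selected, and the sketch returns None when a significant trend remains even with only the control and the lowest dose group, a case the procedure itself does not spell out.

```python
def nostasot_dose(doses, is_trend_significant):
    """Return the NOSTASOT dose, or None if no dose qualifies.

    `doses` must be sorted in increasing order (control first).
    `is_trend_significant(k)` applies the chosen trend test to the
    first k dose groups and returns True if the trend is significant.
    """
    for k in range(len(doses), 1, -1):     # drop the highest group on each pass
        if not is_trend_significant(k):
            return doses[k - 1]            # highest dose with no significant trend
    return None

# Hypothetical study: the trend is significant only when all four groups are included.
doses = [0.0, 10.0, 30.0, 100.0]
result = nostasot_dose(doses, lambda k: k >= 4)
```

In this hypothetical case the result is the 30.0 dose level. Any biological or toxicological judgment about the result remains outside the code.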
Recommended Trend Tests. Trend tests are proposed here for continuous endpoints
and quantal endpoints.
For quantal endpoints, the Mantel-Haenszel trend test (Haseman, 1984) is
recommended. The Mantel-Haenszel trend test relies on the following test statistic:

$$Z = \frac{\sum d_i (X_i - E_i)}{\sqrt{V}}$$

where $E_i = N_i \, (\sum X_i / \sum N_i)$, $d_i$ is the dose level for group i, $N_i$ is the number of animals tested
in group i, $X_i$ is the number of animals with the endpoint of interest in group i, and

$$V = \frac{(\sum N_i - \sum X_i) \, (\sum X_i) \, [(\sum N_i)(\sum N_i d_i^2) - (\sum N_i d_i)^2]}{(\sum N_i)^2 \, (\sum N_i - 1)}$$

In all these equations the summations run over all dose groups. The significance of the
Mantel-Haenszel test can be determined by comparing the value of Z with quantiles from a
standard normal table (Bickel and Doksum, 1977; CRC, 1970). At the 5 percent level of
significance, for example, Z > 1.645 indicates a significant trend.
17 Some judgment may be required because in certain circumstances the absence of a significant trend when
considering all the doses may reflect biological realities that cannot be accounted for by a single trend test.
As an example, consider an experiment with a compound that causes two effects. Suppose the occurrence of
one of the endpoints makes the observation of the second endpoint less likely (e.g., death or resorption in
developmental toxicity studies obscures the occurrence of malformations). In such an instance, the lack of
significant trend for the second endpoint, when considering all dose groups, may reflect the fact that the first
endpoint is occurring so often in the high-dose group(s) that the second endpoint cannot be detected in as
many animals and consequently makes the trend for that endpoint nonsignificant.
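The Mantel-Haenszel trend statistic for quantal endpoints can be computed directly from the group counts. The Python sketch below takes Z to be the dose-weighted sum of observed minus expected counts divided by the square root of V, which is the form implied by the comparison with standard normal quantiles; the function name and the data are hypothetical.

```python
import math

def mantel_haenszel_trend(doses, n_tested, n_affected):
    """Return (numerator, V, Z) with Z = sum d_i (X_i - E_i) / sqrt(V)."""
    sum_n = sum(n_tested)
    sum_x = sum(n_affected)
    expected = [n * sum_x / sum_n for n in n_tested]            # E_i
    numerator = sum(d * (x - e)
                    for d, x, e in zip(doses, n_affected, expected))
    sum_nd = sum(n * d for n, d in zip(n_tested, doses))
    sum_nd2 = sum(n * d * d for n, d in zip(n_tested, doses))
    variance = ((sum_n - sum_x) * sum_x * (sum_n * sum_nd2 - sum_nd ** 2)
                / (sum_n ** 2 * (sum_n - 1)))
    return numerator, variance, numerator / math.sqrt(variance)

# Hypothetical data: four dose groups of 10 animals each.
doses = [0.0, 1.0, 2.0, 3.0]
n_tested = [10, 10, 10, 10]
n_affected = [0, 1, 3, 6]
num, var, z = mantel_haenszel_trend(doses, n_tested, n_affected)
significant = z > 1.645    # one-sided test at the 5 percent level
```

For these counts the numerator is 10, V is about 9.62, and Z is about 3.22, so the trend would be judged significant at the 5 percent level.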

The Mantel-Haenszel test, as stated, may not be appropriate whenever there are
significant differences in survival. An important case is one in which the presence of the
toxic effect is only identified at necropsy and it is not a fatal effect (i.e., does not cause the
death of the animal). In this case the period of observation for the experiment can be
divided into subintervals within which there is relatively little variation in death times. The
$X_i$, $N_i$, $E_i$, and V values can be calculated as described above for each subinterval. A new Z
statistic is calculated as

$$Z = \frac{\sum_k \sum_i d_i (X_{ik} - E_{ik})}{\sqrt{\sum_k V_k}}$$

where $X_{ik}$ is the number of animals with the toxic effect among animals in the ith treatment
group that die in the kth subinterval, $E_{ik}$ is the corresponding expected number based on
animals that die in the kth subinterval, and $V_k$ is the corresponding variance. The
significance of this test is also evaluated by comparing this statistic with quantiles from a
standard normal table.
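The survival-adjusted statistic can be sketched in Python by accumulating the numerator and the variance over subintervals. The sketch assumes the statistic is the sum of the per-subinterval numerators divided by the square root of the summed variances; the function names, the rule for skipping uninformative subintervals, and the data are hypothetical.

```python
import math

def stratum_terms(doses, n_died, n_with_effect):
    """Numerator sum d_i (X_ik - E_ik) and variance V_k for one subinterval."""
    sum_n, sum_x = sum(n_died), sum(n_with_effect)
    if sum_n < 2 or sum_x == 0 or sum_x == sum_n:
        return 0.0, 0.0                # subinterval carries no trend information
    num = sum(d * (x - n * sum_x / sum_n)
              for d, n, x in zip(doses, n_died, n_with_effect))
    sum_nd = sum(n * d for n, d in zip(n_died, doses))
    sum_nd2 = sum(n * d * d for n, d in zip(n_died, doses))
    var = ((sum_n - sum_x) * sum_x * (sum_n * sum_nd2 - sum_nd ** 2)
           / (sum_n ** 2 * (sum_n - 1)))
    return num, var

def stratified_trend_z(doses, strata):
    """strata: list of (n_died, n_with_effect) pairs, one per subinterval."""
    terms = [stratum_terms(doses, n, x) for n, x in strata]
    return sum(t[0] for t in terms) / math.sqrt(sum(t[1] for t in terms))

# Hypothetical data: two subintervals, five deaths per dose group in each.
doses = [0.0, 1.0, 2.0, 3.0]
strata = [([5, 5, 5, 5], [0, 0, 1, 2]),
          ([5, 5, 5, 5], [0, 1, 2, 4])]
z = stratified_trend_z(doses, strata)
```

With these counts Z is about 3.27, which would again be compared with a standard normal quantile such as 1.645.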
Modifications of the Mantel-Haenszel test that are appropriate when either (1) the
toxic effect causes the death of the animal or (2) the effect can be identified before the death
of the animal are discussed in Peto et al. (1980).
For continuous endpoints, Jonckheere's trend test is recommended (Lehmann, 1975).
This is a nonparametric test that is an extension of the Mann-Whitney (Wilcoxon) test. A
nonparametric test is recommended here because such tests make few assumptions about the
distribution of the endpoint under consideration. Given the variety of endpoints that may be
analyzed under the NOAEL approach, the lack of distributional assumptions with the
Jonckheere test may be advantageous.
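For readers unfamiliar with Jonckheere's test, the Python sketch below computes the statistic and its large-sample normal approximation. The mean and variance formulas are the standard ones that assume no tied observations; the half-count for ties, the function name, and the data are assumptions made for the example.

```python
import math

def jonckheere_z(groups):
    """groups: lists of individual observations, ordered by increasing dose.
    Returns (J, Z) using the normal approximation without a tie correction."""
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5      # common convention for ties
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(m * m for m in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(m * m * (2 * m + 3) for m in sizes)) / 72.0
    return j_stat, (j_stat - mean) / math.sqrt(var)

# Hypothetical individual responses for a control and two dose groups.
groups = [[1.0, 2.0, 3.0], [2.5, 3.5, 4.5], [4.0, 5.0, 6.0]]
j_stat, z = jonckheere_z(groups)
```

A large positive Z indicates responses that increase with dose; for an endpoint expected to decrease with dose (e.g., body weight), the sign of the comparison would be reversed.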
To apply Jonckheere's test, one must have the individual observations (i.e., the values
of the endpoint for each animal examined). When working from summary reports of the
experiments (especially those found in the published literature), these individual values may
not be available. In such a case, the following likelihood ratio trend test based on the CP
model is recommended.
First, fit the CP model to the data with the power, k, set equal to one. Second, apply
the CP model with the dose coefficient, $q_1$, set equal to zero and k still equal to one. In each
run, the log-likelihood is maximized; denote the values of the two log-likelihoods as $LL_1$ and
$LL_2$ for the first run and the second run, respectively. Then,

$$\mathrm{CHI} = 2 \, (LL_1 - LL_2)$$

is a likelihood ratio test statistic that is distributed approximately as a chi-square with
one degree of freedom under the null hypothesis of no treatment effect. The statistic CHI
tests whether the linear dose coefficient is significant (i.e., whether a significant dose-related
trend exists). Comparison of CHI with the one-degree-of-freedom chi-squared quantile
corresponding to (1-2$\alpha$) determines whether the trend is significant for significance level $\alpha$,
based on a one-sided (directional) test of trend.
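The comparison with the (1-2α) quantile can be sketched in Python without a chi-square table, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The log-likelihood values and the function name below are hypothetical, and the sketch does not check the sign of the fitted dose coefficient, which a directional test would also need to consider.

```python
from statistics import NormalDist

def lr_trend_test(ll_linear, ll_null, alpha=0.05):
    """Return (CHI, critical value, significant) for the likelihood ratio trend test."""
    chi = 2.0 * (ll_linear - ll_null)
    # The (1 - 2*alpha) quantile of a 1-df chi-square equals the squared
    # (1 - alpha) quantile of the standard normal distribution.
    critical = NormalDist().inv_cdf(1.0 - alpha) ** 2
    return chi, critical, chi >= critical

# Hypothetical maximized log-likelihoods from the two model runs.
chi, critical, significant = lr_trend_test(ll_linear=-41.2, ll_null=-44.9)
```

At α = 0.05 the critical value is about 2.71 (the square of 1.645), so the hypothetical CHI of 7.4 would indicate a significant trend.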
Alternatively, versions of nonparametric trend tests that are extensions of log-rank and
Wilcoxon tests (Tarone and Ware, 1977) may be applied to either quantal or continuous data.
Pairwise Tests. As alternatives to the trend tests listed above, one may wish to
employ pairwise tests to determine if a dose group is significantly different from the control
group irrespective of the overall trend. As noted, however, the trend tests have greater
power for detecting a significant dose-related increase than do the pairwise tests (Tukey et
al., 1985). The problem of multiple comparisons must also be considered when doing many
pairwise tests. Nevertheless, pairwise tests may provide useful supplementary information
that can be used in addition to the NOSTASOT approach.
For quantal data, Fisher's exact test is the recommended pairwise test (Bickel and
Doksum, 1977). For continuous data, a nonparametric approach is recommended for the
pairwise comparisons as well as the trend tests. The Mann-Whitney (Wilcoxon) test is
suitable in cases where the individual data are available (Lehmann, 1975). When group
means and standard deviations are available but the individual results are not available, t-tests
may be applied to test for pairwise differences (Bickel and Doksum, 1977). The
nonparametric approach is preferred when the individual data are available because it avoids
distributional assumptions.
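As an illustration of a pairwise comparison for quantal data, the following Python sketch computes a one-sided Fisher's exact test p-value for a treated group against the control group. The function name and the counts are hypothetical, and the one-sided form (testing for an increase in the treated group) is an assumption of the example.

```python
from math import comb

def fisher_exact_one_sided(x_control, n_control, x_treated, n_treated):
    """Probability of x_treated or more responders in the treated group,
    given the margins, under the hypergeometric null distribution."""
    total_x = x_control + x_treated
    total_n = n_control + n_treated
    denom = comb(total_n, total_x)
    upper = min(total_x, n_treated)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible tables contribute nothing to the sum.
    return sum(comb(n_treated, k) * comb(n_control, total_x - k)
               for k in range(x_treated, upper + 1)) / denom

# Hypothetical counts: 0 of 10 controls affected, 5 of 10 treated animals affected.
p_value = fisher_exact_one_sided(0, 10, 5, 10)
```

For these counts the p-value is 252/15504, or about 0.016. When several dose groups are each compared with the control, the multiple-comparison issue noted above still applies.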
Computer Programs. Statistical software packages such as SAS (SAS, 1988) contain
programs that can implement most of the statistical tests discussed for the NOSTASOT
procedure.
-------
GLOSSARY

Adverse effect. A biochemical change, functional impairment, or pathological lesion that
either singly or in combination adversely affects the performance of the whole organism or
reduces an organism's ability to respond to an additional environmental challenge.

Benchmark dose (BMD). A statistical lower confidence limit on the dose producing a
predetermined, altered response for an effect.

Benchmark response (BMR). A predetermined level of altered response or risk at which
the benchmark dose is calculated.

Biologically significant effect. A response in an organism or other biological system that is
considered to have a substantial or noteworthy effect (positive or negative) on the well-being
of the biological system. The term is used to distinguish such effects from statistically significant effects or changes,
which may or may not be meaningful to the general state of health of the system.

Chronic exposure. Long-term exposure usually lasting 6 months to a lifetime.

Confidence limit. A confidence interval for a parameter is a range of values that has a
specified probability (e.g., 95 percent) of containing the parameter. The confidence limit
refers to the upper or lower value of the range (e.g., upper confidence limit).

Continuous endpoint. A measure of effect that is expressed on a continuous scale (e.g.,
body weight or serum enzyme levels).

Critical effect. The first adverse effect, or its known precursor, that occurs as the dose rate
increases.

Critical study. A bioassay performed on the most sensitive species used as the basis of RfD
determination.

Developmental toxicity. Adverse effects on the developing organism that may result from
exposure prior to conception or postnatally to the time of sexual maturation. Adverse
developmental effects may be detected at any point in the life span of the organism. Major
manifestations of developmental toxicity include death of the developing organism; induction
of structural abnormalities (teratogenicity); altered growth; and functional deficiency.

Dose-response relationship. A relationship between (1) the dose, either "administered dose"
(i.e., exposure) or absorbed dose, and (2) the extent of toxic injury produced by that
chemical. Response can be expressed either as the severity of injury or proportion of
exposed subjects affected. A dose-response assessment is one of the four steps in a risk
assessment.

Endpoint. An observable or measurable biological or chemical event used as an index of the
effect of a chemical on a cell, tissue, organ, organism, etc.

Extrapolation. An estimate of response or quantity at a point outside the range of the
experimental data. Also refers to the estimation of a measured response in a different
species or by a different route than that used in the experimental study of interest (i.e.,
species-to-species, route-to-route, acute-to-chronic, high-to-low).

Genotoxic. A broad term that usually refers to a chemical that has the ability to damage
DNA or the chromosomes. This can be determined directly by measuring mutations or
chromosome abnormalities or indirectly by measuring DNA repair, sister-chromatid
exchange, etc. Mutagenicity is a subset of genotoxicity.

Lifetime. Covering the life span of an organism (generally considered 70 years for humans).

Lowest observed adverse effect level (LOAEL). The lowest dose or exposure level of a
chemical in a study at which there is a statistically or biologically significant increase in the
frequency or severity of an adverse effect in the exposed population as compared with an
appropriate, unexposed control group.

Maximum likelihood estimate (MLE). A statistical best estimate of the value of a
parameter from a given data set.

Model. A mathematical representation of a natural system intended to mimic the behavior of
the real system, allowing description of empirical data, and predictions about untested states
of the system.

Neurotoxicity. Ability to damage nervous tissue.

No observed adverse effect level (NOAEL). An exposure level at which there are no
statistically or biologically significant increases in the frequency or severity of adverse effects
between the exposed population and its appropriate control; some effects may be produced at
this level, but they are not considered as adverse or precursors to adverse effects. In an
experiment with several NOAELs, the regulatory focus is primarily on the highest one,
leading to the common usage of the term NOAEL as the highest exposure without adverse
effect.

Pharmacokinetics. The field of study concerned with defining, through measurement or
modeling, the absorption, distribution, metabolism, and excretion of drugs or chemicals in a
biological system as a function of time.

Population variability. The concept of differences in susceptibility of individuals within a
population to toxicants due to variations such as genetic differences in metabolism and
response of biological tissue to chemicals.

Quantal endpoint. A dichotomous measure of effect; each animal is scored "normal" or
"affected" and the measure of effect is the proportion of scored animals that are affected.

Reference concentration (RfC). An estimate (with uncertainty spanning perhaps an order of
magnitude) of a continuous inhalation exposure to the human population (including sensitive
subgroups) that is likely to be without an appreciable risk of deleterious noncancer effects
during a lifetime.

Reference dose (RfD). An estimate (with uncertainty spanning perhaps an order of
magnitude) of a daily exposure to the human population (including sensitive subgroups) that
is likely to be without appreciable risk of deleterious noncancer effects during a lifetime.

Reproductive toxicity. Harmful effects on fertility, gestation, or offspring caused by
exposure of either parent to a substance.

Risk. The probability of injury, disease, or death under specific circumstances, relative to
the background probability. In quantitative terms, risk is expressed in values ranging from
zero (representing the certainty that the probability of harm is no greater than the background
probability) to one (representing the certainty that harm will occur).

Risk assessment. The scientific activity of evaluating the toxic properties of a chemical and
the conditions of human exposure to it both to ascertain the likelihood that exposed humans
will be adversely affected and to characterize the nature of the effects they may experience.
The assessment may involve the following four steps:
Hazard identification. The determination of whether a particular chemical is or is
not causally linked to particular health effect(s).
Dose-response assessment. The determination of the relation between the magnitude
of exposure and the probability of occurrence of the health effects in question.
Exposure assessment. The determination of the extent of human exposure.
Risk characterization. The description of the nature and often the magnitude of
human risk, including attendant uncertainty.

Statistically significant effect. In statistical analysis of data, a health effect that exhibits
differences between a study population and a control group that are unlikely to have arisen
by chance alone.

Subchronic exposure. Exposure to a substance spanning no more than approximately 10
percent of the lifetime of an organism.

Threshold toxicant. A substance showing an apparent level of effect that is a minimally
effective dose, above which a response may occur and below which no response is expected.

Uncertainty. In the conduct of risk assessment (hazard identification, dose-response
assessment, exposure assessment, risk characterization) the need to make assumptions or best
judgments in the absence of precise scientific data creates uncertainties. These uncertainties,
expressed qualitatively and sometimes quantitatively, attempt to define the usefulness of a
particular evaluation in making a decision based on the available data.

Uncertainty factor (UF). One of several, generally 10-fold factors used in operationally
deriving the reference dose (RfD) from experimental data. UFs are intended to account for
(1) the variation in sensitivity among members of the human population; (2) the uncertainty
in extrapolating animal data to the case of humans; (3) the uncertainty in extrapolating from
data obtained in a study that is of less-than-lifetime exposure; and (4) the uncertainty in using
LOAEL data rather than NOAEL data.
*U.S. GOVERNMENT PRINTING OFFICE: 1995-650-006/2204 1
-------
-------