DRAFT--DO NOT CITE OR QUOTE                                        EPA/600/P-96/002A
                                                                   August 9, 1996
                                                                   External Review Draft

               Benchmark Dose Technical Guidance Document
                                         NOTICE

       THIS DOCUMENT IS A PRELIMINARY DRAFT. It has not been formally released by the
       U.S. Environmental Protection Agency and should not at this stage be construed to represent
       Agency policy. It is being circulated for comment on its technical accuracy and policy
       implications.

                               Risk Assessment Forum
                        U.S. Environmental Protection Agency
                               Washington, DC 20460

                                   DISCLAIMER

       This document is an external draft for review purposes only and does not constitute U.S.
 Environmental Protection Agency policy. Mention of trade names or commercial products does
 not constitute endorsement or recommendation for use.
                                    CONTENTS

Figures .......................................................................... vi
Preface .......................................................................... vii
Authors and Reviewers ............................................................ ix
I. EXECUTIVE SUMMARY ............................................................. 1
      A. Introduction ............................................................ 1
      B. Decision Process for Carrying Out a Benchmark Dose/Concentration Analysis 3
            1. Data array analysis - endpoint selection .......................... 3
            2. Minimum data set for calculating a BMD/C .......................... 4
            3. Criteria for selecting the benchmark response (BMR) ............... 4
            4. Order of model application ........................................ 5
            5. Determining the model structure ................................... 5
            6. Confidence limit calculation ...................................... 7
            7. Selection of "equivalent" models .................................. 7
            8. Selecting the BMD/C ............................................... 8
            9. Use in Risk Assessment ............................................ 9
II. INTRODUCTION ................................................................. 10
      A. Background .............................................................. 10
      B. Purpose of This Guidance Document ....................................... 16
      C. Definition of the Benchmark Dose or Concentration ....................... 17
III. BENCHMARK DOSE GUIDANCE ..................................................... 18
      A. Data Array Analysis - Endpoint Selection ................................ 18
            1. Selection of Endpoints to be Modeled .............................. 19
            2. Minimum Data Set for Calculating a BMD/C .......................... 19
            3. Combining Data for a BMD/C Calculation ............................ 20
      B. Criteria for Selecting the Benchmark Response Level (BMR) ............... 20
      C. Mathematical Modeling ................................................... 22
            1. Introduction ...................................................... 22
            2. Order of Model Application ........................................ 23
            3. Determining the Model Structure ................................... 23
            4. Confidence Limit Calculation ...................................... 25
            5. Selection of "Equivalent" Models .................................. 25
            6. Selecting the BMD/C ............................................... 26
IV. USING THE BMD/C IN NONCANCER DOSE-RESPONSE ANALYSIS .......................... 28
      A. Introduction ............................................................ 28
      B. Effect of BMD/C Approach on Use of Uncertainty Factors .................. 28
      C. Dose-Response Characterization .......................................... 29
V. USE OF BENCHMARK-STYLE APPROACHES IN CANCER RISK ASSESSMENT ................... 31
      A. Use in Hazard Ranking ................................................... 31
            1. Guidance for Developing Comparable ED10s .......................... 31
            2. Sources of TD50s and ED10s ........................................ 32
      B. Use in Low-Dose Extrapolation ........................................... 33
            1. Proposal for Low-Dose Extrapolation under the 1996 Proposed
               Guidelines for Carcinogen Risk Assessment ......................... 33
            2. Issues in Choosing a Point of Departure ........................... 37
      C. Dose-Response Characterization .......................................... 38
VI. FUTURE PLANS ................................................................. 40
      A. Updating of Guidance Document ........................................... 40
      B. Potential for Use of the BMD/C in Cost-Benefit Analyses ................. 40
            1. Background ........................................................ 40
            2. Benefits estimates using BMD/C and RfD/C .......................... 41
            3. Issues in quantifying noncancer health benefits ................... 42
      C. Research and Implementation Needs ....................................... 43
VII. REFERENCES .................................................................. 46

APPENDICES
A. Aspects of Design, Data Reporting, and Route Extrapolation
   Relevant to BMD/C Analysis .................................................... 55
      1. Design .................................................................. 55
      2. Aspects of Data Reporting ............................................... 55
      3. Route Extrapolation ..................................................... 57
B. Selecting the Benchmark Response (BMR) Level .................................. 58
      1. Biologically Significant Change for Specifying the BMR .................. 58
      2. Limit of Detection for Specifying the BMR ............................... 59
      3. Examples ................................................................ 62
      4. Selecting the Critical Power Level for the Limit of Detectability ....... 65
C. Mathematical Modeling ......................................................... 66
      1. Introduction ............................................................ 66
      2. Model Selection ......................................................... 66
            a. Type of endpoint .................................................. 67
            b. Experimental design ............................................... 68
            c. Constraints and variables ......................................... 69
      3. Model Fitting ........................................................... 69
      4. Assessing How Well the Model Describes the Data ......................... 71
      5. Comparing Models ........................................................ 72
      6. Using Confidence Limits to Get a BMD/C .................................. 73
D. Examples of BMD/C Analyses .................................................... 76
      Example #1: Carbon Disulfide ............................................... 76
      Example #2: 1,1,1,2-Tetrafluoroethane (HFC-134a) ........................... 80
      Example #3: Boron .......................................................... 85
      Example #4: 1,3-Butadiene .................................................. 91
E. List of Models Planned to be Included in the First Release of the EPA Software 94
                                 LIST OF FIGURES

1.    Example of calculation of a BMD ............................................ 13
2.    Dose response curves for models incorporating different forms of spontaneous
      background response ....................................................... 61
3.    Detectable extra risk or additional risk for a sample size of 25 (N) ....... 64
4.    Carbon disulfide modeling .................................................. 79
5.    HFC-134a modeling .......................................................... 84
6.    Boron modeling - continuous data ........................................... 89
7.    Boron modeling - dichotomous data .......................................... 90
8.    1,3-Butadiene modeling ..................................................... 93
                                     PREFACE

       This draft document on guidance for application of the benchmark dose/concentration
(BMD/C) approach in cancer and noncancer dose-response assessment is being developed by a
Technical Panel of the Risk Assessment Forum for use by the U.S. Environmental Protection
Agency. This document is intended to be used together with the background document on use of
the benchmark dose approach published earlier by the Agency (EPA, 1995c). While the major
audience for this document is risk assessors within the Agency, it is expected that this guidance
will be considered and possibly used by other organizations as well.
       This draft guidance document is currently under development, but is being made available
at this stage for a peer consultation workshop to be held September 10-11, 1996, at the Holiday
Inn, Bethesda, MD. Several experts in the areas of toxicology, statistics, and mathematical
modeling have been asked to review the document and provide input at this early stage of
development on several issues for which there is ongoing discussion within the Agency. These
issues include: the appropriate selection of studies and responses for BMD/C analysis, use of
biological significance or limit of detection for selection of the benchmark response (BMR),
model selection and fitting, use of the lower confidence limit as the BMD/C, and the default
decision approach proposed in this document for the BMD/C analysis.
       Our overall goal in developing this document is to have a procedure that is usable, that has
reasonable criteria and defaults to avoid proliferation of analyses and model shopping, and that
promotes consistency among analyses. Ultimately, we are trying to move cancer and noncancer
assessments closer together, using precursor and mode of action data to extend and inform our
understanding of risk in the range of extrapolation. We would like to have in one package
something that is usable for cancer and noncancer assessments when endpoints are relevant to
both.
       We also are asking reviewers to comment on how understandable the document is for the
general toxicologist and risk assessor. In an effort to achieve a readable and usable document, the
primary information on approaches and defaults is contained in the main body of the document,
with more detailed discussion of various steps in the process included in the appendices. Several
examples of application of the BMD/C approach using the procedures recommended in this
guidance are included in an appendix. In a separate but parallel effort, the EPA is developing
user-friendly software for BMD/C analysis to be distributed widely for use by the risk assessment
community, and our goal is to make the software consistent with the guidance and default
procedures developed in this document.
                          AUTHORS AND REVIEWERS

TECHNICAL PANEL

Carole A. Kimmel, Chair*
R. Woodrow Setzer, Jr.*
Dan Guth*
Elizabeth Margosches*
Suzanne Giannini-Spohn*
Linda Teuschler*
Jim Cogliano*
Annie Jarabek*
Jeanette Wiltse
Robert MacPhail
Yogendra Patel
William Sette
Carole Braverman
Rick Hertzberg
John Vandenberg
Hugh Pettigrew


* Authors
                              I. EXECUTIVE SUMMARY

A. Introduction
       The US EPA conducts risk assessments for an array of noncancer health effects as well as
for cancer. The process of risk assessment, based on the National Research Council (NRC)
paradigm (1983), has several steps: hazard characterization, dose-response analysis, exposure
assessment, and risk characterization. Risk assessment begins with a thorough evaluation of all
the available data to identify and characterize potential health hazards. The next stage of the
process, dose-response analysis, involves an analysis of the relationship between exposure to the
chemical and health-related outcomes, and historically has been done very differently for cancer
and noncancer health effects. The common practice under the 1986 Guidelines for Carcinogen
Risk Assessment (EPA, 1986a) was to assume low dose linearity and model tumor incidence data
by applying the linearized multistage (LMS) procedure, which extrapolates risk as the 95% upper
confidence limit. The standard practice for the dose-response analysis of noncancer health effects
has been to determine a lowest-observed-adverse-effect-level (LOAEL) and a no-observed-
adverse-effect-level (NOAEL). The LOAEL is the lowest dose for a given chemical at which
adverse effects have been detected, while the NOAEL is the highest dose at which no adverse
effects have been detected. The NOAEL (or LOAEL, if a NOAEL is not present) is adjusted
downward by uncertainty factors intended to account for limitations in the available data to arrive
at an exposure that is likely to be without adverse effects in humans. The NOAEL can also be
compared with the human exposure estimate to derive a margin of exposure (MOE). The
LOAEL and NOAEL represent an operational definition of quantities that can characterize a
study, and do not necessarily have any consistent association with underlying biological processes
or with thresholds.
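       As a minimal numerical illustration of this adjustment, the sketch below divides an invented
NOAEL by the product of the applicable uncertainty factors; the values and the example factors of
10 for interspecies and intraspecies extrapolation are used only for illustration.

    # Hypothetical illustration of the NOAEL/uncertainty-factor adjustment.
    noael = 10.0                     # mg/kg-day, highest dose with no observed adverse effect
    uncertainty_factors = [10, 10]   # e.g., interspecies and intraspecies extrapolation

    total_uf = 1
    for uf in uncertainty_factors:
        total_uf *= uf

    # The NOAEL is divided by the product of the applicable uncertainty factors to
    # arrive at an exposure likely to be without adverse effects in humans (an RfD).
    rfd = noael / total_uf
    print(f"RfD = {noael} / {total_uf} = {rfd} mg/kg-day")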
       With the recent publication of EPA's Proposed Guidelines for Carcinogen Risk
Assessment (1996a), the dichotomy between quantitative approaches for cancer and noncancer
risks will begin to break down. The proposed guidelines promote the understanding of an agent's
mode of action in producing tumors to reduce the uncertainty in describing the likelihood of harm
and in determining the dose-response(s). The proposed guidelines call for modeling of not only
tumor data but other responses thought to be important precursor events in the carcinogenic
process. The dose-response assessment under these guidelines is a two-step process: (1) response
data are modeled in the range of empirical observation -- modeling in the observed range is done
with biologically-based, case-specific, or with appropriate curve-fitting models; and (2)
extrapolation below the range of observation is accomplished by modeling if there are sufficient
data or by a default procedure (linear, nonlinear, or both). A point of departure for extrapolation
is estimated from this modeling, and low dose extrapolation proceeds from this point of
departure.
       The benchmark dose approach discussed in this document is a way of determining the
point of departure, and can be used in cancer and noncancer risk assessment as the basis for linear
low-dose extrapolation, calculation of an MOE, or application of uncertainty factors for
calculating oral reference doses (RfDs), inhalation reference concentrations (RfCs), or other
exposure estimates.
       Several limitations to the use of the LOAEL and NOAEL have been discussed in the
literature (Crump, 1984; Gaylor, 1983; Kimmel and Gaylor, 1988) and by the EPA's Science
Advisory Board (EPA, 1986b, 1988a, b, 1989c). These include the fact that the NOAEL is
limited to one of the doses in the study and is dependent on study design, in particular on dose
selection and spacing. The NOAEL also does not account for variability in the data, and thus, a
study with a limited number of animals will often result in a higher NOAEL than one which has
more animals. In addition, the slope of the dose-response curve is not taken into account in the
selection of a NOAEL and is not usually considered unless the slope is very steep or very shallow.
Additionally, a LOAEL cannot be used to derive a NOAEL when one does not exist in a study;
rather, an uncertainty factor is applied to account for this limitation.
       In an effort to address some of the limitations of the LOAEL and NOAEL, Crump (1984)
proposed the benchmark dose (BMD) approach as an alternative. Using this approach, the
experimental data are modeled, and an oral benchmark dose (BMD) or inhalation benchmark
concentration (BMC) in the observable range is estimated. The BMD/C (the term used when the
text can apply to values derived from either oral or inhalation exposure data) is not constrained to
be one of the experimental doses, and can be used as a more consistent point of departure than
either the LOAEL or NOAEL. This approach uses the dose-response information inherent in the
data. The BMD/C accounts for the variability in the data since it is defined as the lower
confidence limit on the dose estimated to produce a given level of change in response (termed the
benchmark response, BMR) from controls. The BMD/C approach models all of the data in a
study and the slope of the dose-response curve is integral to the BMD/C estimation. A BMD/C
can be estimated even when all doses in a study are associated with a response (i.e., when there is
no NOAEL). The BMD/C estimate is best when there are doses in the study near the range of the
BMD/C.
B. Decision Process for Carrying Out a Benchmark Dose/Concentration Analysis
       This document describes the proposed approach for carrying out a complete BMD/C
analysis. It is organized in the form of a decision process including the rationale and defaults for
proceeding through the analysis, follows a similar framework to that outlined in the background
document (EPA, 1995c), and is meant to be used in conjunction with that document. The
guidance here imposes some constraints on the BMD/C analysis through decision criteria, and
provides defaults when more than one feasible approach exists. Steps in the guidance are
discussed in the body of the document and in more detail in appendices to this document. The
following guidance is applicable to dose-response analysis of noncancer health effects and may
also be applicable to cancer dose-response analysis under the proposed guidelines (EPA, 1996a).
1. Data array analysis - endpoint selection
       Selection of the appropriate studies and endpoints is discussed in Appendix A and in
various EPA publications (U.S. EPA, 1991a, 1994c, 1995f, 1996a and b). In general, the
selection of endpoints to model should focus on endpoints that are relevant or assumed relevant
to humans and are potentially the "critical" effect (i.e., the most sensitive). Since differences in
slope could result in an endpoint having a lower BMD but a higher LOAEL or NOAEL than
another endpoint, selection of endpoints should not be limited only to the one with the lowest
LOAEL or NOAEL. In general, endpoints should be modeled if their LOAEL is up to 10-fold
above the lowest LOAEL. This will ensure that no endpoints with the potential to have the lowest
BMD are excluded from the analysis on the basis of the value of the LOAEL.
2. Minimum data set for calculating a BMD/C
       Once the critical endpoints have been selected, the data sets are examined for the
appropriateness of a BMD/C analysis. The following constraints on data sets are used:
•      At a minimum, the number of dose groups and subjects should be sufficient to allow
       determination of a LOAEL.
•      There must be more than one exposure group with a response different than controls (this
       could be determined with a pairwise comparison or a trend test). With only one
       responding group, there is inadequate information about the shape of the dose-response
       curve, and mathematical modeling becomes too arbitrary.
•      Dose-response modeling is not appropriate if the responding groups all show responses
       near maximum (e.g., greater than 50% response for quantal data or a clear plateau of
       response for continuous data). In this case, there is inadequate information about the
       shape of the curve in the low dose region near the BMR.
3. Criteria for selecting the benchmark response (BMR)
       The BMD/C approach requires that a level of change in response (the BMR) be specified
in order to calculate the BMD/C. In this proposal, there are two bases for specifying the BMR: a
biologically significant change in response for continuous endpoints, or the limit of detection for
either quantal or continuous data. In most cases, the question concerning how much of a change
in a continuous endpoint is biologically significant has not been addressed, and the level of change
that is considered adverse is based in large part on the detectable level of response or limit of
detection for a particular study design. For quantal data, the number of responders is counted,
and again the limit of detection is used. The limit of detection is based on the background
response rate, sample size, power level, and whether extra or additional risk is used in the model.
Standard study designs for various endpoints should be used to determine the general background
rate for a response level and the number of animals typically used. The following default
decisions will be used in selection of the BMR.
•      Biological significance: If a particular level of response for a continuous endpoint has
       been determined to be "biologically significant," the BMR is based on that degree of
       change from background (e.g., adult body weight reduction >10% from the control value;
       a 10% or more decrease in nerve conduction velocity).
•      Limit of Detection: When the biologically significant level of response has not been
       determined or is equated with a statistically significant response, the limit of detection
       method is used. To find the magnitude of response just detectable (the limit of detection),
       a default power level of 50% (may increase, pending simulation studies) and a one-sided
       test with a Type I error of 0.05 (p<0.05) is used (see the sketch following this list).
•      Defaults: For quantal data, a 10% increase in extra risk will be used as the default
       approach when neither biological significance nor the limit of detection has been
       determined. When the BMR is set on the basis of biological significance, extra risk will be
       used as a default. When the limit of detection approach is used for the BMR, it does not
       matter whether extra or additional risk is used, as the BMR that corresponds to the same
       limit of detection for the risk formulation is determined and used in the model.
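       The sketch below is one minimal way to make the limit-of-detection idea concrete. It uses a
crude normal approximation to a one-sided comparison of a single dosed group against controls; it
is not the procedure developed in Appendix B, and the background rate and group size are invented
for the illustration.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq

    def detectable_additional_risk(p0, n, alpha=0.05, power=0.50):
        # Smallest additional risk (p1 - p0) that a one-sided, two-group comparison
        # of n animals per group would just detect, using a normal approximation
        # to the difference of two proportions.
        z_alpha = norm.ppf(1 - alpha)      # one-sided critical value
        z_power = norm.ppf(power)          # 0.0 at the 50% default power

        def gap(p1):
            se = np.sqrt(p0 * (1 - p0) / n + p1 * (1 - p1) / n)
            return (p1 - p0) - (z_alpha + z_power) * se

        return brentq(gap, p0 + 1e-6, 1 - 1e-6) - p0

    p0, n = 0.05, 20                       # hypothetical background rate and group size
    add_risk = detectable_additional_risk(p0, n)
    extra_risk = add_risk / (1 - p0)       # extra risk rescales by the non-responding fraction
    print(f"detectable additional risk ~ {add_risk:.3f}; as extra risk ~ {extra_risk:.3f}")

Whichever risk formulation is used, the BMR is then set at the response level that this kind of
calculation indicates is just detectable for the standard study design.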
4. Order of model application
       The models should be executed either using software produced and distributed by the
EPA or using software that carries out the curve fitting in a manner similar to the EPA software.
Allowing the use of more than one model at this stage is done because there is not enough
experience with modeling a wide variety of endpoints, and the best-fitting model may be
somewhat endpoint-specific. The following order for running the models should be used:
•      Continuous data: A linear model should be run first. If the fit of a linear model is not
       adequate, the polynomial model should be run, followed by the continuous power model.
       Other models may be applied at the discretion of the risk assessor.
•      Dichotomous data: As there is currently no rationale for selecting one versus another, one
       or more of the following models should be applied: the log-logistic, Weibull, and quantal
       polynomial models (a sketch of fitting one such model follows this list). For developmental
       toxicity data, models with fetuses nested within litters should be used. The nested
       log-logistic model tended to give a better fit more often in studies by Allen et al. (1994b),
       but other models often gave a good fit as well. Other models may be applied at the
       discretion of the risk assessor.
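       As an illustration of what running one of these models involves, the following sketch fits a
quantal Weibull model by maximum likelihood to an invented dichotomous data set and reports the
maximum likelihood estimate of the dose corresponding to 10% extra risk. It is a simplified
stand-in for the EPA software, not a description of it, and it does not impose the parameter
constraint discussed under item 5 below.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical dichotomous data: dose, group size, and responders per group.
    dose = np.array([0.0, 10.0, 30.0, 100.0])
    n    = np.array([20, 20, 20, 20])
    resp = np.array([0, 2, 6, 16])

    def weibull_p(d, b, k):
        # Quantal Weibull model with no background term: P(d) = 1 - exp(-b * d**k).
        return np.where(d > 0, 1.0 - np.exp(-b * d ** k), 0.0)

    def neg_log_lik(theta):
        b, k = np.exp(theta)               # log-scale parameters keep b, k > 0
        p = np.clip(weibull_p(dose, b, k), 1e-10, 1 - 1e-10)
        return -np.sum(resp * np.log(p) + (n - resp) * np.log(1 - p))

    fit = minimize(neg_log_lik, x0=np.log([0.01, 1.0]), method="Nelder-Mead")
    b_hat, k_hat = np.exp(fit.x)

    # With no background term, extra risk equals P(d), so the dose giving the
    # default 10% extra risk (the BMD, as a maximum likelihood estimate) is:
    bmr = 0.10
    bmd_mle = (-np.log(1 - bmr) / b_hat) ** (1 / k_hat)
    print(f"b = {b_hat:.4g}, k = {k_hat:.3g}, BMD (MLE, 10% extra risk) = {bmd_mle:.3g}")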
5. Determining the model structure
       The parameters included in the models most commonly used for dose-response analysis,
and examination of the change in the shape of the dose-response curve as those parameters are
changed, are described in the background document (EPA, 1995c). This section provides
guidance on choosing a model structure appropriate to the data being analyzed:
•      Dose Parameter Restriction: As a default, the exponent on the dose parameter in the
       quantal Weibull model and the coefficient on slope in the log-logistic model will be
       constrained to be greater than or equal to 1. Unconstrained models may be applied if
       necessary to fit certain data. This constraint is typically necessary to avoid unstable
       numerical properties in calculating the confidence interval when the parameter value is less
       than 1.
•      Degree of the Polynomial: A 2-step procedure will be used as a default for determining
       the degree of the exponent in any polynomial model for continuous or quantal data (this is
       a place-holder, pending further discussion and agreement). First, a default of the
       degree equal to k-1, where k is the number of groups, will be used. Second, a stepwise
       top-down reduction in the degree will be performed to select the model with the fewest
       parameters that still achieves adequate fit (based on p>0.05 from the goodness-of-fit
       statistic); a sketch of this stepwise reduction follows this list. This approach will be built
       into software distributed by EPA.
•      Background Parameter: As a default approach, the background parameter will not be
       included in the models. A background term should be included if there is evidence of a
       background response. This would be the case if there is a non-zero response in the
       control group. A background parameter would also be included if the first 2 or more
       dosed groups do not show a monotonic increase in response (e.g., if the first 2 dosed
       groups show the same response or if the second dosed group has a lower response than
       the lowest dosed group). If there is doubt about whether a background parameter is
       needed, it is usually conservative (i.e., will result in a lower BMD) if the background
       parameter is excluded from the model. This is an area where more work is needed.
•      Use of extra risk versus additional risk for quantal data: As in selection of the BMR,
       above.

•      Threshold Parameter: A so-called "threshold" (intercept) term will not be included in the
       models used for BMD/C analysis because it is not a biologically meaningful parameter
       (i.e., not the same as a biological threshold) and because most data sets can be fit
       adequately without this parameter and the associated loss of a degree of freedom. This
       will be the default built into the software distributed by EPA.
•      Conversion of continuous data to dichotomous format: The standard approach to BMD/C
       analysis will be to model the continuous data directly, without conversion to the
       dichotomous format. Alternatively, a hybrid approach such as that described by Crump
       (1995) can be used. The hybrid approach models the continuous data, then uses the
       resulting distribution of the control data to calculate a probability estimate.
              The conversion of continuous data to dichotomous data and modeling of the
       dichotomous data is not preferred because of the loss of information about the distribution
       of the response that is inherent in the approach, and because of the uncertainties in
       defining a cut-off response level to distinguish responders and nonresponders. Conversion
       to dichotomous data could be considered in the rare cases where much is known about the
       biological consequences of the response and a cutoff can be defined more confidently, or
       in cases where the need for a probabilistic estimate of response outweighs the loss of
       information.
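       The following sketch illustrates the stepwise top-down reduction in polynomial degree
described above for a continuous endpoint reported as group means and standard errors. The data,
the use of a weighted least-squares polynomial fit, and the crude chi-square lack-of-fit statistic are
all assumptions made for the example; they are not the EPA software's implementation.

    import numpy as np
    from scipy.stats import chi2

    # Hypothetical continuous summary data: dose, group mean, and standard error.
    dose = np.array([0.0, 5.0, 15.0, 50.0, 150.0])
    mean = np.array([100.0, 98.0, 93.0, 84.0, 60.0])
    sem  = np.array([2.0, 2.0, 2.0, 2.5, 3.0])

    k = len(dose)                          # number of dose groups
    adequate = None
    for degree in range(k - 1, 0, -1):     # start at degree k-1, then step down
        df = k - (degree + 1)
        if df <= 0:
            continue                       # saturated fit; no lack-of-fit test possible
        coefs = np.polyfit(dose, mean, degree, w=1.0 / sem)
        fitted = np.polyval(coefs, dose)
        # Crude goodness-of-fit: weighted residual sum of squares referred to a
        # chi-square with (groups - parameters) degrees of freedom.
        gof = np.sum(((mean - fitted) / sem) ** 2)
        p = chi2.sf(gof, df)
        print(f"degree {degree}: GOF = {gof:.2f}, df = {df}, p = {p:.3f}")
        if p > 0.05:
            adequate = degree              # lowest degree so far with adequate fit
        else:
            break                          # fit became inadequate; stop reducing

    print("selected degree:", adequate)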
6. Confidence limit calculation
       The confidence limits will be calculated using likelihood theory, and they will be based on
the asymptotic distribution of the likelihood ratio statistic (with the exception of models fit using
GEE methods; see Appendix C). This will be the default built into the software distributed by
EPA. The 95% lower confidence bound on dose will be used as the default confidence interval
for calculating the BMD/BMC.
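       A minimal profile-likelihood sketch of this kind of calculation is shown below for the quantal
Weibull fit from the earlier sketch. The grid search, the reparameterization of the model in terms of
the BMD, and the use of the chi-square 90th percentile as the cutoff for a one-sided 95% bound are
assumptions of this illustration, not a specification of the EPA software.

    import numpy as np
    from scipy.optimize import minimize, minimize_scalar
    from scipy.stats import chi2

    # Same invented dichotomous data and quantal Weibull model (no background)
    # as in the model fitting sketch above.
    dose = np.array([0.0, 10.0, 30.0, 100.0])
    n    = np.array([20, 20, 20, 20])
    resp = np.array([0, 2, 6, 16])
    bmr  = 0.10                            # 10% extra risk

    def nll(b, k):
        p = np.where(dose > 0, 1.0 - np.exp(-b * dose ** k), 0.0)
        p = np.clip(p, 1e-10, 1 - 1e-10)
        return -np.sum(resp * np.log(p) + (n - resp) * np.log(1 - p))

    full = minimize(lambda t: nll(np.exp(t[0]), np.exp(t[1])),
                    x0=np.log([0.01, 1.0]), method="Nelder-Mead")
    nll_hat = full.fun
    b_hat, k_hat = np.exp(full.x)
    bmd_mle = (-np.log(1 - bmr) / b_hat) ** (1 / k_hat)

    def profile_nll(bmd):
        # Hold the BMD fixed (b is then determined by requiring 10% extra risk
        # exactly at that dose) and minimize the negative log likelihood over k.
        def nll_k(log_k):
            k = np.exp(log_k)
            b = -np.log(1 - bmr) / bmd ** k
            return nll(b, k)
        return minimize_scalar(nll_k, bounds=(np.log(0.1), np.log(10.0)),
                               method="bounded").fun

    # One-sided 95% lower bound: the smallest dose whose profile deviance
    # 2*(nll - nll_hat) stays within the chi-square(1) 90th percentile.
    cutoff = chi2.ppf(0.90, df=1) / 2.0
    grid = np.linspace(0.1 * bmd_mle, bmd_mle, 200)
    inside = [d for d in grid if profile_nll(d) - nll_hat <= cutoff]
    bmdl = min(inside) if inside else grid[0]
    print(f"BMD (MLE) = {bmd_mle:.3g}, 95% lower confidence bound ~ {bmdl:.3g}")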
7. Selection of "equivalent" models
       Because each of the available models has some degree of flexibility and is capable of
describing a range of dose-response patterns, it may be the case that several models seem
appropriate for the analysis of the data under consideration. This section describes how to
evaluate whether these models are equivalent based upon an evaluation of statistical and
biological considerations.
•      All models should be retained for which the goodness-of-fit (GOF) statistic gives p>0.05,
       i.e., the model is not rejected at the 0.05 level (see the sketch at the end of this section).
•      A graphical representation of the model should also be developed and evaluated for each
       model. Models should be eliminated that do not adequately describe the dose-response in
       the range near the BMR. This is because it is possible that an adequate model fit could be
       obtained based on the GOF criteria alone, but the fit of the data is not adequate at the low
       end of the dose-response curve. Such a case should not be used for the BMD/C
       calculation. This is a subjective judgment and requires that software for dose-response
       analysis provide adequate graphical functions. No objective criteria for this decision are
       currently available. It should be noted, however, that most quantal models assume that
       each animal responds independently and has an equal probability of responding. Similarly,
       for continuous data, the responses are assumed to be distributed according to a normal
       probability distribution. When these assumptions are not appropriate, for example in
       studies of developmental toxicity where the responses are correlated within litters,
       alternative model structures may be used. Biological considerations might also be helpful
       to determine adequate model fit. For example, a smooth change of slope may be deemed
       more reasonable for a given response than an abrupt change.
•      If adequate fit is not achieved because of the influence of high dose groups with high
       response rates, the assessor should consider adjusting the data set by eliminating the high
       dose group. This practice carries with it the loss of a degree of freedom, but may be
       useful in cases where the response plateaus or drops off at high doses. Since the focus of
       the BMD analysis is on the low dose and response region, eliminating high dose groups is
       reasonable.
       At this point, the remaining models should be considered "equivalent" in terms of their
usefulness for BMD/C analysis, especially when there is no biological basis for distinguishing
between the models or for choosing the best model.
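       As a concrete illustration of the statistical retention criterion above, the sketch below
computes a Pearson chi-square goodness-of-fit statistic for one candidate quantal model from its
fitted response probabilities; the observed counts, fitted probabilities, and parameter count are
invented for the example, and the Pearson form is only one of several GOF statistics that could be
used.

    import numpy as np
    from scipy.stats import chi2

    # Hypothetical observed counts and fitted probabilities for one candidate model.
    n      = np.array([20, 20, 20, 20])
    resp   = np.array([0, 2, 6, 16])
    fitted = np.array([0.01, 0.09, 0.32, 0.79])   # model-predicted P(response)
    n_parameters = 2

    # Pearson chi-square goodness-of-fit statistic and its p-value.
    expected = n * fitted
    x2 = np.sum((resp - expected) ** 2 / (expected * (1 - fitted)))
    df = len(n) - n_parameters
    p_value = chi2.sf(x2, df)

    # The model is retained as a candidate only if it is not rejected (p > 0.05).
    print(f"X2 = {x2:.2f}, df = {df}, p = {p_value:.3f}, retained: {p_value > 0.05}")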
8. Selecting the BMD/C
       As the models remaining have met the default statistical criterion for adequacy and visually
fit the data, any of them theoretically could be used for determining the BMD/C. The remaining
criteria for selecting the BMD/C are somewhat arbitrary, and are adopted as defaults (a sketch of
the resulting decision rule follows this list).
•      If the BMD/C estimates from the remaining models are within a factor of 3, then they are
       considered to show no appreciable model dependence and will be considered
       indistinguishable in the context of the precision of the methods. Models are ranked based
       on the values of their Akaike Information Criterion (AIC), a measure of the deviance of
       the model fit adjusted for the degrees of freedom, and the model with the lowest AIC is
       used to calculate the BMD/C.
•      If the BMD/C estimates from the remaining models are not within a factor of 3, some
       model dependence of the estimate is assumed. Since there is no clear remaining biological
       or statistical basis on which to choose among them, the lowest BMD/C is selected as a
       reasonable conservative estimate. If the lowest BMD/C from the available models appears
       to be an outlier compared to the other results (e.g., if the other results are within a factor
       of 3), then additional analysis and discussion would be appropriate. Additional analysis
       might include the use of additional models, the examination of the parameter values for the
       models used, or an evaluation of the MLEs to determine if the same pattern exists as for
       the BMD/Cs. Discussion of the decision procedure should always be provided.
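       A minimal sketch of this default decision rule, using invented BMD/C estimates, log
likelihoods, and parameter counts for three retained models, is shown below.

    # Hypothetical results for three "equivalent" models that survived the fit criteria.
    candidates = {
        #  name          (BMD/C, maximized log likelihood, number of parameters)
        "log-logistic":  (12.4, -31.2, 2),
        "Weibull":       (15.1, -31.5, 2),
        "quantal poly":  (18.8, -30.9, 3),
    }

    bmds = [bmd for bmd, _, _ in candidates.values()]
    spread = max(bmds) / min(bmds)

    if spread <= 3.0:
        # Estimates within a factor of 3: treat them as indistinguishable and take
        # the BMD/C from the model with the lowest AIC = 2*parameters - 2*loglik.
        best = min(candidates, key=lambda m: 2 * candidates[m][2] - 2 * candidates[m][1])
        print(f"spread = {spread:.2f}x; lowest-AIC model is {best}, BMD/C = {candidates[best][0]}")
    else:
        # Spread exceeds a factor of 3: model dependence is assumed, and the lowest
        # BMD/C is taken as a reasonable conservative choice (subject to the outlier
        # check described above).
        print(f"spread = {spread:.2f}x; taking the lowest BMD/C = {min(bmds)}")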
9. Use in Risk Assessment
       The BMD/C derived based on this default approach should be used in risk assessment as
discussed in other guidance documents (U.S. EPA, 1991a, 1994c, 1995f, 1996a and b).
                                II. INTRODUCTION

A. Background
       The US EPA conducts risk assessments for an array of noncancer health effects as well as
for cancer. The process of risk assessment, based on the National Research Council (NRC)
(1983) paradigm, has several steps: hazard characterization, dose-response analysis, exposure
assessment, and risk characterization. Risk assessment begins with a thorough evaluation of all
the available data to identify and characterize potential health hazards. The next stage of the
process, dose-response analysis, involves an analysis of the relationship between exposure to the
chemical and health-related outcomes, and historically has been done very differently for cancer
and noncancer health effects. The common practice under the 1986 Guidelines for Carcinogen
Risk Assessment (EPA, 1986a) was to assume low dose linearity and model tumor incidence data
by applying the linearized multistage (LMS) procedure, which extrapolates risk as the 95% upper
confidence limit. This linear default position is, in part, based on the belief that the process of
cancer induction is similar among chemicals, namely electrophilic reaction of carcinogens with
DNA, causing mutations that are essential elements of the carcinogenic process.
       The standard practice for the dose-response analysis of noncancer health effects has been
to determine a lowest-observed-adverse-effect-level (LOAEL) and a no-observed-adverse-effect-
level (NOAEL). The LOAEL is the lowest dose for a given chemical at which adverse effects
have been detected, while the NOAEL is the highest dose at which no adverse effects have been
detected. The NOAEL (or LOAEL, if a NOAEL is not present) is adjusted downward by
uncertainty factors intended to account for limitations in the available data to arrive at an
exposure that is likely to be without adverse effects in humans. The NOAEL can also be
compared with the human exposure estimate to derive a margin of exposure (MOE). The general
default assumption for noncancer health effects is one of a threshold response, and thus is the
basis for the NOAEL/uncertainty factor or NOAEL/MOE approach. The LOAEL and NOAEL
represent an operational definition of quantities that can characterize a study, and do not
necessarily have any consistent association with underlying biological processes or with
thresholds.
       With the recent publication of EPA's Proposed Guidelines for Carcinogen Risk
Assessment (1996a), the dichotomy between quantitative approaches for cancer and noncancer
risks will begin to break down. The proposed guidelines promote the understanding of an agent's
mode of action in producing tumors to reduce the uncertainty in describing the likelihood of harm
and in determining the dose-response(s). The proposed guidelines call for modeling of not only
tumor data, but other responses thought to be important precursor events in the carcinogenic
process. Thus, the dose-response extrapolation procedure follows conclusions in the hazard
assessment about the agent's carcinogenic mode of action. The dose-response assessment under
these guidelines is a two-step process: (1) response data are modeled in the range of empirical
observation -- modeling in the observed range is done with biologically-based, case-specific, or
with appropriate curve-fitting models; and then (2) extrapolation below the range of observation
is accomplished by modeling if there are sufficient data or by a default procedure (linear,
nonlinear, or both). A point of departure for extrapolation is estimated from this modeling. The
linear default is a straight-line extrapolation to the origin from the point of departure, and the
nonlinear default approach begins at the identified point of departure and provides a margin of
exposure (MOE) analysis rather than estimating the probability of effects at low doses.
       The benchmark dose approach discussed in this document is a way of determining the
point of departure, and can be used as the basis for linear low-dose extrapolation, calculation of
an MOE, or application of uncertainty factors for calculating oral reference doses (RfDs),
inhalation reference concentrations (RfCs), or other exposure estimates.
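       To make the two defaults concrete, the following minimal sketch uses an invented point of
departure and an invented human exposure estimate; the numbers carry no significance beyond the
illustration.

    # Hypothetical point of departure: 10% extra risk estimated at 25 mg/kg-day.
    pod_dose = 25.0        # point of departure (mg/kg-day)
    pod_risk = 0.10        # extra risk at the point of departure

    # Linear default: straight-line extrapolation from the point of departure to
    # the origin, so low-dose extra risk is approximately (pod_risk / pod_dose) * d.
    slope = pod_risk / pod_dose
    low_dose = 0.01        # mg/kg-day
    print(f"extrapolated extra risk at {low_dose} mg/kg-day ~ {slope * low_dose:.1e}")

    # Nonlinear default: no low-dose risk estimate; instead report a margin of
    # exposure comparing the point of departure with a human exposure estimate.
    human_exposure = 0.05  # mg/kg-day, hypothetical
    print(f"MOE = {pod_dose / human_exposure:.0f}")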
       Several limitations to the use of the LOAEL and NOAEL have been discussed in the
literature (Crump, 1984; Gaylor, 1983; Kimmel and Gaylor, 1988) and by the EPA's Science
Advisory Board (EPA, 1986b, 1988a, b, 1989c). These include the fact that the NOAEL is limited
to one of the doses in the study and is dependent on study design, in particular on dose selection
and spacing. The NOAEL also does not account for variability in the data, and thus, a study with
a limited number of animals will often result in a higher NOAEL than one which has more
animals. In addition, the slope of the dose-response curve is not taken into account in the
selection of a NOAEL and is not usually considered unless the slope is very steep or very shallow.
Additionally, a LOAEL cannot be used to derive a NOAEL when one does not exist in a study;
rather, an uncertainty factor is applied to account for this limitation.
       In an effort to address some of the limitations of the LOAEL and NOAEL, Crump (1984)
proposed the benchmark dose (BMD) approach as an alternative. Using this approach, the
experimental data are modeled, and an oral benchmark dose (BMD) or inhalation benchmark
concentration (BMC) in the observable range is estimated (see Fig. 1). The BMD/C is not
constrained to be one of the experimental doses, and can be used as a more consistent point of
departure than either the LOAEL or NOAEL. This approach uses the dose-response information
inherent in the data. The BMD/C accounts for the variability in the data since it is defined as the
lower confidence limit on the dose estimated to produce a given level of change in response
(termed the benchmark response, BMR) from controls. The BMD/C approach models all of the
data in a study and the slope of the dose-response curve is integral to the BMD/C estimation. A
BMD/C can be estimated even when all doses in a study are associated with a response (i.e., when
there is no NOAEL). The BMD/C estimate is best when there are doses in the study near the
range of the BMD/C.
 16      types: quantal and continuous. Quantal data are often dichotomous (yes/no) responses and are
 17      usually presented as counts or incidence of a particular effect or of individuals affected.  Quantal
 18      data are represented by such endpoints as  tumor incidence, mortality,  or malformed offspring. A
 19      specialized case of quantal responses is when data are categorized by severity of effect, e.g., mild,
 20      moderate, or severe. This categorization of data is often used for histopathological lesions. At the
 21      other extreme are continuous data, which  represent a continuum of response and are usually
 22      represented by a measurement.  Body weight, serum liver enzyme activity or nerve conduction
 23     velocity are examples of continuous responses.  Continuous data can also be expressed as quantal
 24      responses by determining some magnitude of change from controls that is considered  significant,
 25      then counting the number of individuals above or below that cutoff level. The level of significance
 26      can be based either on biological significance, or on statistical significance.  For example, a
 27      decrease in adult body weight may not be  considered adverse until it is > 10%, this magnitude of
         change is based on biological significance.
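       A minimal sketch of expressing a continuous response in quantal form is shown below. The
body weights and the 10% cutoff relative to the control mean are invented for the illustration; the
point is simply the counting of individuals beyond a cutoff.

    import numpy as np

    # Hypothetical adult body weights (grams) for a control and a dosed group.
    control = np.array([402, 418, 395, 410, 388, 407, 415, 399, 392, 411])
    dosed   = np.array([371, 340, 385, 352, 366, 330, 379, 345, 360, 338])

    # Biologically significant cutoff: a body weight more than 10% below the
    # control mean counts as an adverse (quantal) response.
    cutoff = 0.90 * control.mean()

    affected = np.sum(dosed < cutoff)
    print(f"cutoff = {cutoff:.0f} g; affected animals = {affected} of {len(dosed)}")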
[Figure 1. Example of calculation of a BMD. The figure plots percent of animals responding
against dose, showing the data points with confidence bars, the best-fitting dose-response model,
the BMR (the target response level used to define the BMD), the BMD with its lower statistical
limit on dose, and the NOAEL.]
       Calculation of a BMD/C requires selection of a benchmark response (BMR) level, i.e., the
level of change in response from controls that is considered to be an adverse effect. Basing BMRs
on the level of biological significance automatically determines the level of the BMR. Basing them
on statistical significance, however, is dependent on the limit of detection for a particular study
design. Selection of the BMR based on the limit of detection is discussed in detail in Appendix B.
       Since the initial proposal of the BMD by Crump (1984), a number of papers have been
published dealing with application of the BMD/C approach. Many of these are included in the
reference list for this document. A background document on the use of the BMD/C in health risk
assessment was published by the EPA's Risk Assessment Forum (EPA, 1995c). This document
provided a framework for the decision points in the process of applying the BMD/C approach.
The background document is to be used as a companion to the current guidance document.
       Several workshops and symposia have been held to discuss the application of the BMD/C
and appropriate methodology (Kimmel et al., 1989; California EPA, 1993; Beck et al., 1993; SRA
Symposium, 1994; Barnes et al., 1995). The most recent workshop was conducted by ILSI Risk
Science Institute and sponsored by the EPA and AIHC (Barnes et al., 1995). The participants at
the workshop generally endorsed the application of the BMD/C for all quantal noncancer
endpoints and particularly for developmental toxicity, where a good deal of research has been
done. Less information was available at the time of the EPA/AIHC/ILSI workshop on the
application of the BMD/C approach to continuous data, and more work was encouraged. A
number of other issues concerning the application of the BMD/C were discussed. The guidance
and default options set forth in the current document are based in part on the outcome of this
workshop, the background document (EPA, 1995c), and on more recent information and
discussions.
       A number of research efforts, many of which have dealt with reproductive and
developmental toxicity data, have provided extremely useful information for application of the
BMD/C approach (e.g., Alexeeff et al., 1993; Catalano et al., 1993; Chen et al., 1991; Krewski
and Zhu, 1994, 1995; Auton, 1994; Crump, 1995). In a series of papers by Faustman et al.
(1994), Allen et al. (1994a and b), and Kavlock et al. (1995), the BMD approach was applied to a
large database of developmental toxicity studies. In brief, the results of these studies showed that
when data were expressed as counts of dichotomous endpoints (i.e., number of litters per dose
group with resorptions or malformations), the NOAEL was approximately 2-3 times higher than
the BMD for a 10% probability of response above control values (approximately 20 animals per
dose group), and 4-6 times higher than the BMD for a 5% probability of response. When the data
were expressed as the proportion of affected fetuses per litter (nested dichotomous data), the
NOAEL was on average 0.7 times the BMD for a 10% probability of response, and was
approximately equal, on average, to the BMD for a 5% probability of response. Expressing the
data as the proportion of affected fetuses per litter is the more appropriate way to analyze
developmental toxicity data. However, the results of the quantal data analysis also may apply to
using the BMD/C approach with other quantal data, and suggest that the NOAEL in these cases
may be at or above the 10% true response level, depending on sample size and background rate.
       Since reduced fetal weight in developmental toxicity studies often shows the lowest
NOAEL among the various endpoints evaluated, the application of the BMD to these continuous
data also was evaluated (Kavlock et al., 1995). A variety of cutoff values was explored for
defining an adverse level of weight reduction below control values. In some cases, data were
analyzed using a continuous power model, and in other cases, the data were transformed to
dichotomous data. Comparisons with the NOAEL showed that several cutoff values could be
used to give values similar to the NOAEL. These analyses suggest ways in which BMD/Cs may
be developed for continuous data from a variety of endpoints.
       In a recent paper, Crump (1995) detailed a new approach to deriving a BMD/C for
continuous data based on a method originally proposed by Gaylor and Slikker (1990). This
approach makes use of the distribution of continuous data, estimates the incidence of individuals
falling above or below a level considered to be adverse, and gives the probability of responses at
specified doses above the control levels. This results in an expression of the data in the same
terms as that derived from analyses of quantal data, and allows more direct comparison of BMDs
derived from continuous and quantal data. This approach has not been applied to many data sets,
as software has not been developed until recently.
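       The sketch below illustrates the general idea behind this kind of hybrid approach under
simplifying assumptions (a normally distributed response with constant standard deviation, a cutoff
set so that 1% of controls are counted as affected, and invented fitted means); it is not the specific
method of Crump (1995) or Gaylor and Slikker (1990).

    import numpy as np
    from scipy.stats import norm

    # Fitted control mean and common standard deviation from a continuous model.
    control_mean, sd = 100.0, 8.0
    p0 = 0.01                              # fraction of controls allowed below the cutoff
    cutoff = norm.ppf(p0, loc=control_mean, scale=sd)

    doses        = np.array([0.0, 10.0, 30.0, 100.0])
    fitted_means = np.array([100.0, 96.0, 90.0, 78.0])   # model-predicted means

    # Probability that an individual falls below the adverse cutoff at each dose,
    # expressed as extra risk over the control probability p0.
    prob_response = norm.cdf(cutoff, loc=fitted_means, scale=sd)
    extra_risk = (prob_response - p0) / (1 - p0)
    for d, er in zip(doses, extra_risk):
        print(f"dose {d:5.1f}: extra risk = {er:.3f}")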
       Another approach to the derivation of BMD/Cs for noncancer health effects is the
multinomial modeling approach. Many noncancer health effects are characterized by multiple
endpoints that are not completely independent of one another. Catalano et al. (1993), Chen et al.
(1991), and Zhu et al. (1994) have worked on this issue using developmental toxicity data, and
have shown that, in general, the BMD derived from a multinomial modeling approach is lower
than that for any individual endpoint. This approach has not been applied to other health effects
data, but should be kept in mind when multiple related outcomes are being considered for a
particular health effect.
       Other types of approaches have been proposed for analysis of noncancer data, e.g.,
categorical regression modeling (Dourson et al., 1985; Hertzberg, 1989; Hertzberg and Miller,
1985; Guth; Simpson), and Bayesian approaches (Hasselblad and Jarabek, 1995), that may be
more appropriate in some cases. There has not been much experience as yet in using these
approaches for deriving BMD/Cs.
       As discussed earlier, most of the effort on development of BMD/C approaches has been
related to noncancer health effects, but recent proposed guidelines for cancer risk assessment
(EPA, 1996a) suggest an approach to dose-response analysis that is similar to use of the BMD/C
for noncancer risk assessment. While terminology and exact procedures are not the same, the
concept is parallel to the BMD/C approach (see Section V for further discussion). Thus,
guidance is provided in this document on the use of the BMD/C as the point of departure for low
dose extrapolation of both cancer and noncancer health effects.

B. Purpose of This Guidance Document
       The purpose of this document is to provide guidance for the Agency and the outside
community on the application of the BMD/C approach in the calculation of RfD/Cs and other
exposure standards and in the estimation of cancer risks for EPA use. The document provides
guidance based on today's knowledge and understanding, and on experience gained in using this
approach. The Agency is actively applying this methodology and evaluating the outcomes for the
purpose of gaining experience in using it with a variety of endpoints. This document is intended
to be updated as new information becomes available that would suggest approaches and default
options alternative or additional to those indicated here. The document should not be viewed as
precluding additional research on modified or alternative approaches that will improve
quantitative risk assessment. In fact, the use of improved scientific understanding and
development of more mechanistically-based approaches to dose-response modeling is strongly
encouraged by the Agency.

C. Definition of the Benchmark Dose or Concentration
       The benchmark dose or concentration (BMD/C) is defined as the statistical lower
       confidence limit on the dose estimated to produce a predetermined level of change
       in response (the benchmark response - BMR) relative to controls.

       The BMD/C is intended to be used as an alternative to the NOAEL in deriving a point of
departure for low dose extrapolations. The BMD/C is a dose corresponding to some change in
the level of response relative to background, and is not dependent on the doses used in the study.
The BMR is based on a biologically significant level of response or on the response level at the
lower detection limit of the observable dose range for a particular endpoint in a standard study
design. The BMD/C approach does not reduce uncertainty inherent in extrapolating from animal
data to humans (except for that in the LOAEL to NOAEL extrapolation), and does not require
that a study identify a NOAEL, only that at least one dose be near the range of the response level
for the BMD/C.
                              III. BENCHMARK DOSE GUIDANCE
 2
 3     ' •      This section describes the proposed approach for carrying out a complete BMD/C
 4      analysis. It is organized in the form of a decision process including the rationale and defaults for
 5      proceeding through the analysis, and follows a similar framework to that outlined in the
 6      background document (EPA, 1995c). The guidance here imposes some constraints on the
 7      BMD/C analysis through decision criteria, and provides defaults when more than one feasible
 8      approach exists. Steps in the guidance are discussed here and in more detail in Appendices A-C.
 9                                                    .             :             •     •   .   .    ,
 10      A. Data Array Analysis - Endpoint Selection
 11             The first step in the process is a complete qualitative review of the literature to identify
 12      and characterize the hazards related to a particular compound or exposure situation. This process
 13      is the same whether a BMD/C analysis or a NOAEL approach is used. Guidance on review of
 14      data for risk assessment can be found in a number of EPA publications (EPA, 1991a, 1994c,
 15      1995f, 1996a and b). Further discussion of design issues, data reporting, and route extrapolation
 16      relevant to the BMD/C analysis can be found in Appendix A.
 17            Following a complete qualitative review of the data, the risk assessor must select the
 18      studies appropriate for benchmark dose analysis. The selection of the appropriate studies is based
 19      on the human exposure situation that is being addressed, the quality of the studies, and the
20      relevance and reporting adequacy of the endpoints.
21            The process of selecting studies for benchmark analysis is intended to identify those
 22      studies for which modeling is feasible, so that BMD/Cs can be calculated and used in risk
 23      assessment. In most cases the selection process will identify a single study or very few studies for
 24      which calculations are relevant, and all studies should be modeled.  However, cases with many
 25      studies, or studies in which many endpoints are reported would require a very large number of
 26      BMD/C calculations. Multivariate analysis of such studies to describe correlations between
 27      effects is a useful tool, but has not been explored in terms of the BMD/C method.
 28            The analysis of a large number of endpoints may result in benchmark calculations which
29      are redundant in the information they convey. In these cases, it is useful to select a subset of the

endpoints as representative of the effects in the target organ or the study.  This selection can be
made on the basis of sensitivity or severity, which may be more easily compared within a single
study in the same target organ than across studies. Within an experiment, an endpoint may be
selected based on how well it represents others for the same target organ and on its dose-response
behavior. It is reasonable to select a representative endpoint that shows smoothly increasing
response with increasing dose in order to obtain a good fit of the dose-response model.
1.  Selection of Endpoints to be Modeled
       Once endpoints have been evaluated with regard to their relevance for BMD/C analysis,
the selection of endpoints to model should focus on endpoints that are relevant or assumed
relevant to humans and potentially the "critical" effect (i.e., the most sensitive).  Since differences
in slope could result in an endpoint having a lower BMD but a higher LOAEL or NOAEL than
another endpoint, selection of endpoints should not be limited only to the one with the lowest
LOAEL.  In general, endpoints should be modeled if their LOAEL is up to 10-fold above the
lowest LOAEL. This will ensure that no endpoints with the potential to have the lowest BMD/C
are excluded from the analysis on the basis of the value of the LOAEL.
2. Minimum Data Set for Calculating a BMD/C
       Once the critical endpoints have been selected, the data sets are examined for the
appropriateness of a BMD/C analysis. The following constraints on data sets are used:
•      At a minimum, the number of dose groups and subjects should be sufficient to allow
       determination of a LOAEL.
•      There must be more than one exposure group with a response different than controls (this
       could be determined with a pairwise comparison or a trend test). With only one
       responding group, there is inadequate information about the shape of the dose-response
       curve, and mathematical modeling becomes too arbitrary.
•      Dose-response modeling is not appropriate if the responding groups all show responses
       near maximum (e.g., greater than 50% response for quantal data or a clear plateau of
       response for continuous data). In this case, there is inadequate information about the
       shape of the curve in the low dose region near the BMR.
 1      3.  Combining Data for a BMD/C Calculation
 2            Data sets that are statistically and biologically compatible may be combined prior to dose-
 3      response modeling and thus generate fewer BMD/C estimates for comparison.  The combining of
 4      appropriate data sets prior to modeling has several advantages. It leads to increased confidence,
 5      both statistical and biological, in the calculated BMD/C. In addition, the use of combined data
 6      sets may encourage further research to be conducted on that compound if the additional data can
 7      affect the BMD/C estimate. Example #3 for boron in Appendix D is a case where data could be
 8      combined for the BMD/C analysis.
 9
10      B.  Criteria for Selecting the Benchmark Response Level (BMR)
11            The Benchmark Dose Workshop (Barnes et al., 1995) recommended that the BMR
12      "should be within or near the experimental range of the doses studied." The Workshop's
13      recommendation in this regard was to use a 5% or 10% increase in response levels above controls
14      as the BMR. However, this recommendation addressed only the case of dichotomous data and
15      did not deal with continuous endpoints.  In addition, the variety of experimental designs used in
16      toxicology and the associated variety of experimental endpoints and their ranges of variability may
17      require a broader range of approaches. This proposal for selecting the BMR attempts to take into
18      account the wide array of toxicological responses.
19            Any toxicological study is limited in its ability to detect an increase in the incidence of
20      adverse responses or a change in the mean of continuous endpoints.  This limitation is determined
21      by aspects of the study design: largely, the number of independent experimental units per dose
22      group, the nature of any nesting or repeated measures in the design, the background incidence of
23      the adverse response (for dichotomous endpoints) or the variance of the control response (for
24      continuous endpoints), as well as the intended form of significance testing. For some endpoints,
25      such as adult weight, serum liver enzyme activities, and certain neurological measurements like
26      nerve conduction velocity, there is also a minimum change from control levels that is considered
27      "biologically significant." For endpoints for which there is no agreed upon biologically significant
28      change, quantifying the concept of "limit of detection" for a toxicological study can simplify
29      specifying the BMR.

 1             The BMD/C approach requires that a level of change in response (the BMR) be specified
 2      in order to calculate the BMD/C. In this proposal, there are two bases for specifying the BMR: a
 3      biologically significant change in response for continuous endpoints, or the limit of detection for
 4      either quantal or continuous data.  In most cases, the question concerning how much of a change
 5      in a continuous endpoint is biologically significant has not been addressed, and the level of change
 6      that is considered adverse is based in large part on the detectable level of response or limit of
 7      detection for a particular study design. For quantal data, the number of responders is counted,
 8      and again the limit of detection is used.  The limit of detection is based on the background
 9      response rate, sample size, power level, and whether extra or additional risk is used in the model.
10      Standard study designs2 for various endpoints should be used to determine the general
11      background rate for a response level and the number of animals typically used.  These concepts
12      are discussed in depth in Appendix B.
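
       The following sketch is purely illustrative (it is not Agency software, and the study-design
values are hypothetical): it approximates the smallest detectable increase in a quantal response
rate, and the corresponding extra risk, for a one-sided pairwise comparison at the 0.05
significance level with 50% power, using a normal approximation to the two-proportion test.

       # Illustrative sketch only: approximate the quantal limit of detection for a
       # one-sided pairwise comparison of proportions (alpha = 0.05, power = 50%).
       # Names and study-design values below are hypothetical.
       from scipy.stats import norm
       from scipy.optimize import brentq

       def detectable_rate(p0, n, alpha=0.05, power=0.50):
           """Treated-group rate p1 just detectable against background p0 with n per group."""
           z_a = norm.ppf(1 - alpha)      # one-sided critical value
           z_b = norm.ppf(power)          # equals 0.0 at 50% power

           def gap(p1):
               se = (p0 * (1 - p0) / n + p1 * (1 - p1) / n) ** 0.5
               return (p1 - p0) - (z_a + z_b) * se

           return brentq(gap, p0 + 1e-9, 1 - 1e-9)

       p0, n = 0.05, 20                        # 5% background, 20 animals per group
       p1 = detectable_rate(p0, n)
       extra_risk = (p1 - p0) / (1 - p0)       # limit of detection expressed as extra risk
       print(f"detectable rate ~{p1:.2f}; extra risk ~{extra_risk:.2f}")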
13             The following default decisions will be used in selection of the BMR.
14      •     Biological significance: If a particular level of response for a continuous endpoint has
15             been determined to be "biologically significant," the BMR is based on that degree of
16             change from background (e.g., adult body weight reduction >10% from the control value;
17             a 10% or more decrease in nerve conduction velocity).
18      •     Limit of Detection: When the biologically significant level of response has not been
19             determined or is equated with a statistically significant response, the limit of detection
20             method is used. To find the magnitude of response just detectable (the limit of detection),
21             a default power level of 50% (may increase, pending simulation studies) and a one-
22             sided test with a Type I error of 0.05 (p<0.05) is used.
23      •     Defaults: For quantal data, a 10% increase in extra risk will be used as the default
24             approach when neither biological significance nor the limit of detection has been
               2Using a cadre of standard study designs for a variety of endpoints is one way to reduce
        the uncertainty that can be introduced when different individuals make decisions for limits of
        detection. We assume here the use of Agency testing protocols as the basis for standard "good"
        study designs. In areas where standard testing protocols have not been developed, the Agency
        encourages activities that can assist in identifying a "most common well-designed protocol" for
        typically-studied endpoints.
 1             determined. When the BMR is set on the basis of biological significance, extra risk will be
 2             used as a default. When the limit of detection approach is used for the BMR, it does not
 3             matter whether extra or additional risk is used, as the BMR that corresponds to the same
 4             limit of detection for the risk formulation is determined and used in the model (the two
               risk formulations are defined immediately below).
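
       For reference, the two quantal risk formulations named above are conventionally defined as
follows (standard usage, stated here for convenience):

       \[ \text{additional risk: } A(d) = P(d) - P(0), \qquad \text{extra risk: } E(d) = \frac{P(d) - P(0)}{1 - P(0)}, \]

where P(d) is the probability of response at dose d.  The two coincide when the background
response P(0) is zero; otherwise extra risk is at least as large as additional risk.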
  6      C. Mathematical Modeling        •       "                                   ,      .
  7      1.  Introduction
 8             The goal of the mathematical modeling in benchmark dose computation is to fit a model to
 9      dose-response data that describes the data set, especially at the lower end of the observable dose-
 10      response range. The fitting must be done in a way that allows the uncertainty inherent in the data
 11      to be quantified and related to the estimate of the dose that would yield the benchmark response.
 12      In practice, this procedure will involve first selecting a family or families of models for further
 13      consideration, based on characteristics of the data and experimental design, and fitting the models
 14      using one of a few established methods.  Subsequently, a lower bound on dose is calculated at the
 15      BMR. This section is too brief to do more than introduce the topic of nonlinear modeling. Some
 16      references for further reading are: Chapter 10 of Draper and Smith (1981), Clayton and Hills
 17      (1993), Davidian and Giltinan (1995), and McCullagh and Nelder (1989).
 18            Dose-response models are expressed as functions of dose, possibly covariates, and a set of
 19      constants, called parameters, that govern the details of the shape of the resulting curve.  They are
 20      fitted to a data set by finding values of the parameters that adjust the predictions of the model for
 21      observed values of dose and covariates to be close to the observed response. At present,
 22      although biological models may often be expressed as nonlinear models (e.g., Michaelis-Menten
 23      processes), nonlinear models do not necessarily have a biological interpretation. Thus, criteria for
 24      final model selection will be based solely on whether various models describe the data,
 25      conventions for the particular endpoint under consideration, and, sometimes, the desire to fit the
 26      same basic model form to multiple data sets.  Since it is preferable to use special purpose
 27      modeling software, EPA is in the process of developing user-friendly software which includes
 28      several models and default processes as described in this document. The models included in the
 29      software are listed in Appendix E.
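
       As a concrete, purely illustrative sketch of the fitting step described above (the model form,
data, and parameter names are assumptions made for the example; in practice the Agency software
mentioned above would be used), the following fits a quantal Weibull model to dichotomous
dose-response data by maximum likelihood and computes the central-estimate BMD at a 10%
extra-risk BMR:

       # Illustrative only: maximum likelihood fit of a quantal Weibull model,
       #   P(d) = g + (1 - g) * (1 - exp(-b * d**a)),
       # to hypothetical dichotomous dose-response data.
       import numpy as np
       from scipy.optimize import minimize

       dose   = np.array([0.0, 10.0, 30.0, 100.0])   # dose levels
       n      = np.array([20, 20, 20, 20])           # animals per group
       events = np.array([1, 3, 8, 17])              # responders per group

       def prob(d, g, a, b):
           return g + (1.0 - g) * (1.0 - np.exp(-b * d**a))

       def negloglik(theta):
           g, a, b = theta
           p = np.clip(prob(dose, g, a, b), 1e-10, 1 - 1e-10)
           return -np.sum(events * np.log(p) + (n - events) * np.log(1.0 - p))

       # Power (exponent) constrained to be >= 1, the default restriction in this
       # guidance; the upper bound is arbitrary, for numerical stability only.
       fit = minimize(negloglik, x0=[0.05, 1.0, 0.01], method="L-BFGS-B",
                      bounds=[(0.0, 0.99), (1.0, 18.0), (1e-8, None)])
       g, a, b = fit.x

       bmr = 0.10                                    # 10% extra risk
       bmd = (-np.log(1.0 - bmr) / b) ** (1.0 / a)   # solves E(BMD) = BMR
       print("parameters:", fit.x, " BMD (MLE):", round(bmd, 1))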

       This section provides guidance on how to go about choosing a model structure
appropriate to the data being analyzed, the order of model application, confidence limit
calculation, selection of "equivalent" models, and selection of the BMD/C to use as the point of
departure. More in-depth discussion of these topics can be found in Appendix C. Examples of
BMD/C modeling of actual data sets can be found in Appendix D.
2.  Order of Model Application
       The models should be executed either using software produced and distributed by the
EPA or using software that carries out the curve fitting in a manner similar to the EPA software.
Allowing the use of more than one model at this stage is done because there is not enough
experience with modeling a wide variety of endpoints, and the best-fitting model may be
somewhat endpoint-specific.  The following order for running the models should be used
(common forms of these default models are sketched after this list):
•      Continuous data: A linear model should be run first. If the fit to a linear model is not
       adequate, the polynomial model should be run, followed by the continuous power model.
       Other models may be applied at the discretion of the risk assessor.
•      Dichotomous data: As there is currently no rationale for selecting one versus another, one
       or more of the following models should be applied: the log-logistic, Weibull, and
       polynomial models. For developmental toxicity data, models with fetuses nested within
       litters should be used. The nested log-logistic model tended to give a better fit more often
       in studies modeled by Allen et al. (1994b), but other models often gave a good fit as well.
       Other models may be applied to dichotomous data at the discretion of the risk assessor.
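
       For orientation only, conventional parameterizations of the models named above are shown
below; the exact forms implemented in the Agency software may differ in detail.  Here d is dose,
\mu(d) the mean of a continuous endpoint, P(d) the probability of response for a quantal
endpoint, and \gamma a background parameter (not included by default; see Section III.C.3):

       \[ \text{linear: } \mu(d) = \beta_0 + \beta_1 d; \qquad
          \text{polynomial: } \mu(d) = \beta_0 + \beta_1 d + \cdots + \beta_k d^k; \qquad
          \text{power: } \mu(d) = \beta_0 + \beta_1 d^{\delta} \]

       \[ \text{quantal Weibull: } P(d) = \gamma + (1-\gamma)\,(1 - e^{-\beta d^{\alpha}}); \qquad
          \text{log-logistic: } P(d) = \gamma + \frac{1-\gamma}{1 + e^{-(\alpha + \beta \ln d)}} \]

       \[ \text{quantal polynomial (multistage-type): } P(d) = \gamma + (1-\gamma)\,\bigl(1 - e^{-(\beta_1 d + \cdots + \beta_k d^k)}\bigr) \]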
 3. Determining the Model Structure
       The parameters included in the models most commonly used for dose-response analysis,
 and examination of the change in the shape of the dose-response curve as those parameters are
 changed, are described in the background document (EPA, 1995c).  This section provides
 guidance on choosing a model structure appropriate to the data being analyzed.
•      Dose Parameter Restriction:  As a default, the exponent on the dose parameter in the
       quantal Weibull model and the slope coefficient in the log-logistic model will be
       constrained to be greater than or equal to 1. Unconstrained models may be applied if
       necessary to fit certain data.  This constraint is typically necessary to avoid unstable
       numerical properties in calculating the confidence interval when the parameter value is less
       than 1.
•      Degree of the Polynomial: A 2-step procedure will be used as a default for determining
       the degree of the polynomial model for both continuous and quantal data
       (this is a place-holder, pending further discussion and agreement). First, a default of
       the degree equal to k-1, where k is the number of groups, will be used. Second, a
       stepwise top-down reduction in the degree will be performed to select the model with the
       fewest parameters that still achieves adequate fit (based on p>0.05 from the goodness-of-
       fit statistic). This approach will be built into software distributed by EPA (a schematic
       sketch of this selection loop appears after this list).
•      Background Parameter: As a default approach, the background parameter will not be
       included in the models.  A background term should be included if there is evidence of a
       background response.  This would be the case if there is a non-zero response in the
       control group.  A background parameter would also be included if the first 2 or more
       dosed groups do not show a monotonic increase in response (e.g., if the first 2 dosed
       groups show the same response or if the second dosed group has a lower response than
       the lowest dosed group).  If there is doubt about whether a background parameter is
       needed, it is usually conservative (i.e., will result in a lower BMD) if the background
       parameter is excluded from the model. This is an area where more work is needed.
•      Use of extra risk versus additional risk for quantal data: As in selection of the BMR,
       above.
•      Threshold parameter:  A so-called "threshold" (intercept) term will not be included in the
       models used for BMD/C analysis because it is not a biologically meaningful parameter
       (i.e., not the same as a biological threshold) and because most data sets can be fit
       adequately without this parameter and the associated loss of a degree of freedom.  This
       will be the default built into the software distributed by EPA.
•      Conversion of continuous data to dichotomous format:  The standard approach to BMD/C
       analysis will be to model the continuous data directly, without conversion to the
       dichotomous format.  Alternatively, a hybrid approach such as that described by Crump
       (1995) can be used. The hybrid approach models the continuous data, then uses the
       resulting distribution of the control data to calculate a probability estimate.
              The conversion of continuous data to dichotomous data and modeling of the
       dichotomous data is not preferred because of the loss of information about the distribution
       of the response that is inherent in the approach, and because of the uncertainties in
       defining a cut-off response level to distinguish responders and nonresponders. Conversion
       to dichotomous data could be considered in the rare cases where much is known about the
       biological consequences of the response and a cutoff can be defined more confidently, or
       in cases where the need for a probabilistic estimate of response outweighs the loss of
       information.
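
       The following is a schematic, illustrative sketch of the 2-step polynomial degree selection
described in the "Degree of the Polynomial" bullet above.  The summary data (group means and
standard errors), the weighted least-squares fit, and the lack-of-fit test are all assumptions made
for the example; they are not prescriptions of this guidance.

       # Illustrative sketch: start at degree k-1 and step down to the lowest degree
       # whose goodness of fit remains adequate (p > 0.05).  Hypothetical data.
       import numpy as np
       from scipy.stats import chi2

       dose = np.array([0.0, 5.0, 15.0, 50.0, 150.0])   # k = 5 dose groups
       mean = np.array([10.0, 10.4, 11.5, 14.2, 21.0])  # observed group means
       sem  = np.array([0.4, 0.4, 0.5, 0.6, 0.9])       # standard errors of the means

       def gof_p(degree):
           """Weighted least-squares polynomial fit and lack-of-fit p-value."""
           if degree + 1 >= len(dose):
               return 1.0                   # saturated model reproduces the means exactly
           coef = np.polyfit(dose, mean, deg=degree, w=1.0 / sem)
           resid = (mean - np.polyval(coef, dose)) / sem
           df = len(dose) - (degree + 1)
           return float(chi2.sf(np.sum(resid**2), df))

       k = len(dose)
       selected = k - 1                      # step 1: default starting degree
       for degree in range(k - 2, 0, -1):    # step 2: stepwise top-down reduction
           if gof_p(degree) > 0.05:
               selected = degree
           else:
               break

       print("selected polynomial degree:", selected)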
4.  Confidence Limit Calculation
       The confidence limits will be calculated using likelihood theory and will be based on the
asymptotic distribution of the likelihood ratio statistic (with the exception of models fit using
GEE methods; see Appendix C). This will be the default built into the software distributed by
EPA. The 95% lower confidence bound on dose will be used as the default confidence limit for
calculating the BMD/C.
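
       One common way to operationalize such a likelihood-ratio-based lower bound (stated here
as an illustration of the idea, not as the Agency's exact algorithm) is by profile likelihood: with
\ell(\hat\theta) the maximized log-likelihood and \tilde\ell(d) the log-likelihood re-maximized
subject to the constraint that the model's BMD equals d, the 95% one-sided lower limit is

       \[ \mathrm{BMD/C} = \min\Bigl\{ d : 2\bigl[\ell(\hat\theta) - \tilde\ell(d)\bigr] \le \chi^2_{1,\,0.90} \approx 2.71 \Bigr\}, \]

where the 0.90 quantile of the chi-square distribution with one degree of freedom yields a
one-sided 95% bound.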
5.  Selection of "Equivalent" Models
       Because each of the available models has some degree of flexibility and is capable of
 describing a range of dose-response patterns, it may be the case that several models seem
 appropriate for the analysis of the data under consideration.  This section describes how to
 evaluate whether these models are equivalent based upon an evaluation of statistical and
biological considerations.
•      All models should be retained that are not rejected at the probability of p>0.05 using the
       goodness-of-fit (GOF) statistic.
•      A graphical representation of the model should also be developed and evaluated for each
       model. Models should be eliminated that do not adequately describe the dose-response in
       the range near the BMR. This is because it is possible that an adequate model fit could be
       obtained based on the GOF criteria alone, but the fit of the data is not adequate at the low
       end of the dose-response curve. Such a case should not be used for the BMD/C
       calculation. This is a subjective judgment and requires that software for dose-response
       analysis provide adequate graphical functions.  No objective criteria for this decision are
       currently available.  It should be noted, however, that most quantal models assume that
       each animal responds independently and has an equal probability of responding. Similarly,
       for continuous data, the responses are assumed to be distributed according to a normal
       probability distribution. When these assumptions are not appropriate, for example in
       studies of developmental toxicity where the responses are correlated within litters,
       alternative model structures may be used. Biological considerations might also be helpful
       to determine adequate model fit. For example, a smooth change of slope may be deemed
       more reasonable for a given response than an abrupt change.
•      If adequate fit is not achieved because of the influence of high dose groups with high
       response rates, the assessor should consider adjusting the data set by eliminating the high
       dose group. This practice carries with it the loss of a degree of freedom, but may be
       useful in cases where the response plateaus or drops off at high doses.  Since the focus of
       the BMD analysis is on the low dose and response region, eliminating high dose groups is
       reasonable.
       At this point, the remaining models should be considered "equivalent" in terms of their
usefulness for BMD/C analysis, especially when there is no biological basis for distinguishing
between the models or for choosing the best model.
6. Selecting the BMD/C
       As the remaining models have met the default statistical criterion for adequacy and visually
fit the data, any of them theoretically could be used for determining the BMD/C. The remaining
criteria for selecting the BMD/C are necessarily somewhat arbitrary, and are adopted as defaults
(a schematic of this selection logic follows the list).
•      If the BMD/C estimates from the remaining models are within a factor of 3, then they are
       considered to show no appreciable model dependence and will be considered
       indistinguishable in the context of the precision of the methods.  Models are ranked based
       on the values of their Akaike Information Criterion (AIC), a measure of the deviance of
       the model fit adjusted for the degrees of freedom, and the model with the lowest AIC is
       used to calculate the BMD/C.
•      If the BMD/C estimates from the remaining models are not within a factor of 3, some
       model dependence of the estimate is assumed. Since there is no clear remaining biological
       or statistical basis on which to choose among them, the lowest BMD/C is selected as a
       reasonable conservative estimate.  If the lowest BMD/C from the available models appears
       to be an outlier compared to the other results (e.g., if the other results are within a factor
       of 3), then additional analysis and discussion would be appropriate. Additional analysis
       might include the use of additional models, the examination of the parameter values for the
       models used, or an evaluation of the MLEs to determine if the same pattern exists as for
       the BMD/Cs.  Discussion of the decision procedure should always be provided.
•      In some cases, relevant data for a given agent are not amenable to modeling and a mixture
       of BMD/Cs and NOAEL/LOAELs results. When this occurs, and the critical effect is
       from a study considered adequate but not amenable to modeling, the NOAEL should be
       used as the point of departure.
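
       The following schematic illustrates the default selection logic described in the first two
bullets above; the model names, BMD/C values, and AIC values are hypothetical and are chosen
only to show the mechanics.

       # Schematic of the default selection logic: agreement within a factor of 3,
       # pick the lowest AIC; otherwise fall back to the lowest BMD/C.
       candidates = {                       # model name: (BMD/C estimate, AIC)
           "log-logistic": (12.0, 101.3),
           "Weibull":      (15.5,  99.8),
           "polynomial":   (18.0, 102.6),
       }

       bmdcs = [bmdc for bmdc, _ in candidates.values()]
       if max(bmdcs) / min(bmdcs) > 3.0:
           # Appreciable model dependence: use the lowest BMD/C as a conservative default.
           chosen = min(candidates, key=lambda m: candidates[m][0])
       else:
           # Agreement within a factor of 3: rank by AIC and use the lowest.
           chosen = min(candidates, key=lambda m: candidates[m][1])

       print("selected model:", chosen, "BMD/C =", candidates[chosen][0])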
 1            IV. USING THE BMD/C IN NONCANCER DOSE-RESPONSE ANALYSIS
 2
   3      A. Introduction                                             , '
   4            Use of the BMD/C approach for noncancer dose-response analysis on a broad scale raises
   5      a number of issues.  A wide variety of endpoints of noncancer health effects must be considered in
   6      the development of exposure limits. Experience with the BMD/C approach varies depending on
   7      the disciplinary area and type of endpoint under consideration.  As indicated earlier in the
   8      document,  a good deal of work has been done for developmental toxicity, less for neurotoxicity,
   9      and relatively little for other types of noncancer health effects to date. Although some BMD/Cs
 10      are based on continuous variables and others on discrete (mostly dichotomous) variables, this is
 11      not a problem using the approach advocated in this document.  Basically, the BMR is set by
 12      selecting levels of response that are considered biologically significant and/or near the limit of
 13      detection for a particular endpoint in a given study design, using much the same basis as has been
 14   ,   done for setting NOAELs.
15            A few RfCs and one RfD included in the IRIS database (EPA, 1996c) have been based on
16     BMD/C calculations. These include: methyl mercury (RfD based on delayed postnatal
17     development in humans), carbon disulfide (Example #1 in Appendix D), HFC-134a (Example #2
18     in Appendix D) and antimony trioxide (RfC based on chronic pulmonary interstitial inflammation
19     in female rats). A few other risk assessments have been developed using the BMD/C approach,
20     including manganese (RfC based on neurotoxicity in humans) (EPA, 1995g), and diesel exhaust
21     (RfC based on lung irritation) (EPA, 1995), and others are under development (e.g., boron -
22     Example #3 in Appendix D; 1,3-butadiene - Example #4 in Appendix D).  All of these cases have
 23     involved a variety of approaches to calculating the BMD/C and to deriving RfDs and/or RfCs. In
 24     addition, the Agency is continuing to conduct  such evaluations as part of ongoing risk
 25     assessments and is aware of a number of similar efforts outside the Agency.
 26
 27     B. Effect of the BMD/C Approach on Use of Uncertainty Factors
28           Uncertainty always accompanies inferences regarding the derivation of RfD/Cs or other
 29     exposure limits. Whether a threshold for noncancer effects is assumed or not,  inferences about

 1     lower human exposure levels usually embody one or more extrapolations which may include:
 2     animal to human, human to sensitive human populations, acute exposure to chronic exposure, and
 3     LOAEL to NOAEL (if a NOAEL is not available). Each of these extrapolations carries attendant
 4     uncertainties. The only change in the use of uncertainty factors when using the BMD/C as the
 5     point of departure is that the LOAEL to NOAEL uncertainty factor is no longer necessary.
 6
 7     C. Dose-Response Characterization
 8            Dose-response analyses based on BMD/Cs provide a point of departure for calculating
 9     MOEs, RfD/Cs, and other exposure estimates. Although the BMD/C is associated with a defined
10     level of risk in the study population from which the BMD/C was calculated, it would be
11     misleading to translate that to a level of risk at the MOE or RfD/C or other reference value. The
12     dose at the MOE or reference value is intended to yield human risks that are lower than the risk to
13     test species at the BMD/C, but the exact degree of protection is unknown, given the attendant
14     uncertainties in extrapolation of low dose nonlinear response relationships.
15            This issue was discussed in some detail at the BMD workshop (Barnes et al., 1995). The
16     participants concluded that the level of effect at the BMD/C in the experimental species cannot be
17     translated to a level of risk at the RfD/C for humans.  They pointed out that typically the average
18     level of effect in the test species at the BMD/C is less than the level of response at the BMR (e.g.,
19     5%, 10%), since the BMD/C is defined  as the lower 95% confidence limit on the dose at the
20     BMR.  They recommended an approach (discussed further below) to communicating the use of
21     the BMD/C in such situations, and that  this information be included in IRIS along with the central
22     estimates (e.g., MLEs), BMD/Cs, MOEs, RfD/Cs or other reference values.
23            The following statement, modified from the BMD workshop report, is an example of what
24     can be used to communicate these values based on BMD/Cs:

              The BMD/C corresponds to a dose level which yields (with 95%
              confidence) a level of effect in the test species of, for example, 10% or
              less for quantal data, or that represents a change from the control mean
              of, for example, 5% for continuous data.  This is about the lowest level
              of effect that can be detected reliably in an experimental study of this
              design.  Alternatively, if the BMR is based on a degree of change that is
              considered biologically significant, e.g., a 10% or greater reduction in
              adult body weight or reduction in birth weight below 2.5 kg, the
              BMD/C represents the lower confidence limit on dose for that degree
              of change. Overall, the BMD/C will be a more consistent point of
              departure than the NOAEL and will not be constrained by the doses
              used in a particular study.
 1      V. USE OF BENCHMARK-STYLE APPROACHES IN CANCER RISK ASSESSMENT
 2
 3            Although use of a benchmark approach in noncancer assessment is relatively new,
 4      benchmark-style approaches have been used in ranking and regulating potential carcinogens for
 5      many years. Historically, cancer risk assessors have not used the term "benchmark dose,"
 6      preferring terms such as "TD50" and "ED10," because these latter terms draw an analogy to the
 7      LD50 and maintain continuity in terminology with the published cancer literature.  The principal
 8      use of these approaches has been in hazard ranking; more recently, a benchmark-style approach is
 9      being proposed for extrapolation below the observed range for tumor incidence in the proposed
10      cancer guidelines (EPA, 1996a).  This section provides guidance for cancer hazard ranking and
11      low-dose extrapolation.
12
 13      A. Use in Hazard Ranking
 14            Hazard ranking, the comparison of quantitative potency estimates across chemicals, has
15      been the principal use of ED10s and similar estimates. This use does not require that low-dose risk
16      be quantified; consequently, rankings can be based on potency estimates from the experimental
17      range, which are mostly independent of choice of model. EPA has used ED10s in regulations and
18      proposals that set priorities for emergency response (U.S. EPA, 1987, 1989a, 1995a) and
19      evaluate health hazards from air pollutants (U.S. EPA, 1994a). This section provides guidance
20     for developing ED10s, cites sources where TD50s and ED10s can be obtained, and reviews
 21     examples where TD50s and ED10s have been adapted for other uses.
22     1.  Guidance for Developing Comparable ED10s
 23            Hazard rankings are based on the notion that potency estimates for different chemicals and
 24     experimental protocols can be made comparable. It is customary to adjust for background tumor
 25     rates, combine fatal and incidental tumors, and correct for early mortality (Peto et al., 1984;
 26     Sawyer et al., 1984). Other calculation conventions include a 2-year standard lifespan for rat and
 27     mouse studies; standard food, water, and air intake factors for each sex and species; standard
 28     absorption rates for each exposure route; use of time-weighted-average doses; and correction for
 29     less-than-lifespan studies (Peto et al., 1984). A common method promotes consistency; for

 1      example, life-table methods give similar, but usually lower, TD50s than summary incidence methods
 2      (Gold et al., 1986a).
 3            These factors are covered in EPA's methodology for developing ED10s and applying them
 4      in hazard rankings (EPA, 1988c, 1994b), incorporated here by reference.  The methodology
 5      specifies detailed approaches for selecting data sets and tumor responses; deriving equivalent
 6      doses across species, dosing regimens, and exposure routes; adjusting for less-than-lifetime dosing
 7      or survival; and modeling in the experimental range. To allow consistent estimates where data are
 8      missing, the methodology provides defaults. With this approach, ED10s for different chemicals
 9      and experimental protocols are made comparable.
 10     2.  Sources of TD50s and ED10s
11            TD50s and ED10s are readily available from the published literature. The first and most
12     comprehensive compilation is the Carcinogenic Potency Database (CPDB), developed and
13     regularly updated by Gold et al. (1984, 1986b, 1987, 1990, 1993, in press). The CPDB provides
14     standardized information on over 4400 carcinogenicity experiments on over 1100 chemicals.
15     Carcinogenic potency is described by the TD50, defined as "the chronic dose rate that will halve
16     the probability of remaining tumor-free throughout the standard life span." TD50s have been
17     used to investigate many questions, including those concerning chemical carcinogenesis,
18     carcinogen identification, cross-species extrapolation, reproducibility of results, and ranking
19     possible carcinogenic hazards. TD50s from the CPDB have been used by California EPA as an
20     alternative method for low-dose extrapolation (Hoover et al., 1995).
21            EPA uses ED10s to rank potential carcinogens found at waste sites (EPA, 1988c, 1989b,
22     1995b) and evaluate health hazards from air pollutants (EPA, 1994b). These references provide
23     standardized cancer potency estimates for more than 100 chemicals found at waste sites and more
24     than 80 air pollutants. ED10s from these references have been used in other comparative risk
25     analyses.  The 10-percent level was chosen to represent the lower end of the experimental range
26     and, thus, be more pertinent to human environmental exposure than TD50s.
27            Judgment is essential when adapting TD50s or ED10s to a particular application. For
28     example, the inclusive nature of the CPDB supports use of TD50s as a midrange response, as many
29     carcinogenicity experiments test only one high dose. In contrast, EPA uses ED10s from the lower

 1     end of the experimental range because of its focus on human environmental exposure.
 2
 3     B. Use in Low-Dose Extrapolation
 4            Low-dose extrapolation goes beyond the experimental information on tumor incidence to
 5     estimate risks at lower doses.  Extrapolations generally have been described by a slope factor,
 6     which is an upper bound on the slope of an assumed linear dose-response curve at low doses.
 7     Slope factors can be multiplied by exposure levels to bound the cancer risk. Recently proposed
 8     cancer guidelines (EPA, 1996a) provide alternative approaches to low-dose extrapolation, with
 9     linear, nonlinear or both as defaults based on an LED10.  This section summarizes the proposed
10     use of LED10s, describes how the proposed use of LED10s would change existing slope factors,
11     and discusses some issues that have appeared in the literature.
12     1. Proposal for low-dose extrapolation under the 1996 Proposed Guidelines for Carcinogen
13     Risk Assessment
14            As described previously, the dose-response assessment under the new guidelines is a two-
15     step process. In the first step, response data are modeled in the range of observation. It should
16     be noted that in addition to modeling tumor data, the proposed guidelines allow for the
17     opportunity to use and model other kinds of responses if they are considered to be important
18     measures of carcinogenic risk. In the second step, extrapolation below the range of observation is
19     accomplished by biologically based or case-specific modeling if there are sufficient data, or by a
20     default procedure using a curve-fitting model. The proposed guidelines indicate a preference for
21     biologically based dose-response models, such as the two-stage model of initiation plus clonal
22     expansion and progression (Moolgavkar and Knudson, 1981; Chen and Farland, 1991; EPA,
23     1995d) for the extrapolation of risk. Because the parameters of these models require extensive
24     data, it is anticipated that the necessary data to support these models will not be available for most
25     chemicals. Therefore, the 1996 proposed guidelines allow the use of several default extrapolation
26     approaches.
27            The default extrapolation approaches are based on "curve-fitting" in the observed range to
 1      determine the lower 95% confidence limit on a dose associated with 10% extra risk (LED10).3
 2      The LED10 is proposed as a standard point of departure. The 10% response is at or near the limit
 3      of sensitivity in most cancer bioassays.  Other points of departure may be appropriate, e.g., if a
 4      response is observed below the 10% level. The point of departure forms the basis for the default
 5      extrapolation approaches described below.
 6      Linear default extrapolation procedure--The LMS procedure of the 1986 guidelines for
 7      extrapolating risk from upper confidence intervals is no longer recommended as the linear default
 8      in the 1996 proposed guidelines. The linear default in the new guidelines is a straight-line
 9      extrapolation to the origin from the point of departure identified by curve fitting in the range of
10      observed data (the slope of this line is 0.10/LED10, which is inversely related to the LED10, i.e.,
11      high potency is indicated by a large slope and a small LED10). It should be noted that the
12      straight-line extrapolation from the LED10 and the LMS procedure produce similar results (Gaylor
13      and Kodell, 1980).
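
       As a purely numerical illustration of the straight-line default (the LED10 and exposure values
here are invented for the example):

       \[ \text{slope factor} = \frac{0.10}{\mathrm{LED}_{10}}; \qquad
          \mathrm{LED}_{10} = 5\ \text{mg/kg-day} \;\Rightarrow\; \text{slope} = 0.02\ (\text{mg/kg-day})^{-1}, \]

so an exposure of 0.001 mg/kg-day would correspond to an upper-bound extra cancer risk of
about 0.02 x 0.001 = 2 x 10^-5 under this default.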
 14            The straight-line/LED10 approach does not imply unfounded sophistication as
 15      extrapolation with the LMS procedure does. The linear default approach would be considered for
 16      agents that directly affect growth control at the DNA level (e.g., carcinogens that directly interact
 17      with DNA and produce mutations).  There might be modes of action other than DNA reactivity
 18      (e.g., certain receptor-based mechanisms) that are better supported by the assumption of linearity.
19      When inadequate or no information exists to explain the mode of action of a carcinogen, the linear
20      default approach would be used as a science policy choice in the interest of public health.
21      Likewise, a linear default would be used if evidence demonstrates the lack of support for linearity
22      (e.g., negative genotoxicity studies), but there is also an absence of sufficient information on
23      another mode of action to explain the induced tumor response.  The latter is also a public health
24      conservative policy choice.                                                         ,
25      Nonlinear default extrapolation procedure—Although the understanding of the mechanisms of
26      induced carcinogenesis likely will never be complete for most agents, there are situations where
              3The 95 percent lower bound on the effective dose associated with a 10 percent increase in
       response; equivalent to the BMD/C. The term LED10 is used in this section because of its use in
       the proposed guidelines for carcinogen risk assessment (EPA, 1996a).
 1     evidence is sufficient to support an assumption of nonlinearity. Because it is experimentally
 2     difficult to distinguish modes of action with true "thresholds" from others with a nonlinear dose-
 3     response relationship, the proposed nonlinear default procedure is considered a practical approach
 4     to use without the necessity of distinguishing sources of nonlinearity. Moreover, the use of
 5     empirical models to approximate a nonlinear dose-response relationship is discouraged, because
 6     different models can give dramatically different results below the observable range of tumor
 7     response, with no basis for choosing among them. Thus, in the 1996 proposed guidelines, the
 8     nonlinear default approach begins at the identified point of departure (LED10) and provides a
 9     margin of exposure (MOE) analysis rather than estimating the probability of effects at low doses.
10            A nonlinear default position must be consistent with the understanding of the agent's
11     mode of action in causing tumors. For example, a nonlinear default approach would be taken for
12     agents causing tumors as a secondary consequence of organ toxicity or induced physiological
13     disturbances. Because there must be a sufficient understanding of the agent's mode of action to
14     take the nonlinear default position, and because the proposed guidelines allow for the opportunity
15     to model not only tumor data but other responses thought to be important precursor events in the
16     carcinogenic process (e.g., DNA adducts, mutation, cellular proliferation, hormonal or
17     physiological disturbances, receptor binding), modeling of key nontumor data is anticipated to
 18     make extrapolation based on the nonlinear default procedures more meaningful by providing
 19     insights into the relationships of exposure and tumor response below the observable range.
20     Nontumor data may actually be used instead of tumor data for determining the point of departure
21      for the MOE analysis.
22            The MOE analysis is used to compare the LED10 with the human exposure levels of
23      interest (an illustrative calculation follows the list below).  The acceptability of an MOE is a
24      matter for risk management; thus, the key objective of the MOE analysis is to describe for the
25      risk manager how rapidly response may decline with dose.  The MOE analysis considers:
•      steepness of the slope of the dose response, including the degree to which the dose-
       response relationship deviates from a straight line.
•      human differences in sensitivity - if this cannot be determined, it is considered to be at
       least 10-fold.
•      interspecies differences - if this cannot be determined, humans can be considered 10-fold
       more sensitive; if evidence shows humans to be less sensitive, a fraction no smaller than
       1/10 can be used. This compares cross-species sensitivity to equivalent doses, which are
       calculated using either toxicokinetic information or an oral default scaling factor based on
       equivalence of mg/kg3/4-d (US EPA, 1992), or an inhalation default using the RfC dosimetry
       approach (US EPA, 1994c).
•      nature of the response being used for the point of departure, i.e., tumor or nontumor
       data - tumor data might support a greater MOE than a more sensitive precursor response
       which can be measured at lower exposures.
•      biopersistence of the agent becomes an important factor in the MOE analysis if
       nontumor precursor response data from less than life-time exposures are used for
       determining the point of departure for extrapolation of risk.
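
       An illustrative MOE calculation (all values are invented for the example; the body-weight
adjustment shown is one standard implementation of the mg/kg3/4-day equivalence cited in the
interspecies bullet above):

       \[ \mathrm{LED}_{10}^{\text{human-equivalent}} \;\approx\; \mathrm{LED}_{10}^{\text{animal}} \times
          \Bigl(\tfrac{BW_{\text{animal}}}{BW_{\text{human}}}\Bigr)^{1/4}
          = 5 \times \Bigl(\tfrac{0.35}{70}\Bigr)^{1/4} \approx 1.3\ \text{mg/kg-day}, \]

       \[ \mathrm{MOE} = \frac{\mathrm{LED}_{10}^{\text{human-equivalent}}}{\text{human exposure of interest}}
          = \frac{1.3}{0.001} \approx 1{,}300. \]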
13     Both linear and nonlinear defaults--There may be situations where it is appropriate to consider
14     both linear and nonlinear default procedures.  For example, an agent may produce tumors at
15     multiple sites by different mechanisms. If it is apparent that an agent is both DNA reactive and
16     highly active as a promoter at higher doses, both linear and nonlinear default procedures may be
17     used to distinguish between the events operative at different portions of the dose-response curve
18     and to consider the contribution of both phenomena.
19            In curve-fitting the data in the observed range to determine a point of departure for the
20     defaults discussed above, the multistage model can be used as a default empirical curve-fitting
21     model; personal computer software has been developed for EPA analyses (Howe et al., 1986).
22     Results from other software packages, even those based on the same model, can be difficult to
23     compare, because they can employ different parameter constraints, dose transformations,
24     background incidence adjustments, confidence interval methods, or goodness of fit statistics. For
25     example, a spreadsheet version of the multistage model (Haas, 1994) uses a logarithmic
26      transformation of dose to implement the nonnegativity constraints in the multistage model; such
27     variants can be statistically sound yet introduce small differences in results compared to other
28     EPA analyses.
29      Slope factors developed from LED10s and alternative points of departure, including ED10s
 1     and ED01s, are stable in the face of minor perturbations in incidence in controls and at the lowest
 2     tumor dose (Gaylor et al., 1994).  This represents an improvement over the maximum likelihood
 3     estimate of slope from the linearized multistage procedure, which is not statistically stable below
 4     the experimental range (Krewski et al., 1990, 1991; Gaylor et al., 1994).
 5     [Note:  Use of nontumor data is currently being discussed, and additional information will be
 6     added later.]
  7     2. Issues in choosing a point of departure
 8            The approach of the proposed cancer guidelines (U.S. EPA, 1996a) reflects an
 9     observation that carcinogenicity bioassay studies can provide information in the experimental
10     range, but cannot determine the nature of the true dose-response relationship below that range;
11     this suggests the separation of analysis in the experimental range from extrapolation to lower
12     doses (Gaylor and Kodell, 1980).  Using a model to extrapolate to a pre-determined risk level,
13     with further extrapolation by straight line, was first proposed by Van Ryzin (1980). This proposal
14     allowed that extrapolation by model could proceed to a risk of 0.01 percent.  Later refinements
15     suggested that extrapolation stop at 1 percent to avoid model dependency (Farmer et al., 1982).
16     For an upper bound on linear extrapolation, a straight line from an upper bound on risk at the
17     lower end of the experimental range had been proposed by Gaylor and Kodell (1980).
18            Judicious choice of a starting point is key to the credibility of a low-dose extrapolation.
19     The high correlations established by Cogliano (1986) and Krewski et al. (1990) led to the idea
20     that ED10s can serve as a common measure for both potency ranking and a starting point for low-
21     dose extrapolation (NRC, 1993). The concerns of Gold et al. (1993), Wartenberg and Gallo
22     (1990), and Hoel (1990) remind the risk assessor that effects at high and low doses can be
23     different, and that the shape of the dose-response curve in the experimental range contains useful
24     information.  The concern about overestimation if the dose-response curve turns upward further
25     highlights differences between high and low doses.  These concerns can be addressed, in part, by
26     using a point of departure near the lower end of the experimental range.
 1      C. Dose-Response Characterization
 2            Cancer risk assessments attempt to describe a dose-response relationship that can be used
   3      to evaluate risks over a range of environmental exposure levels. All relevant experimental
   4      information is considered, with incomplete information addressed by a set of default assumptions
   5      and adjustments.  Some adjustments, for example, the cross-species scaling factor and the time-
   6      weighted-average dose metric, are intended as unbiased defaults not expected to contribute to
   7      health-conservative estimates (EPA, 1992).                       .
 8            The principal health-conservative defaults in cancer risk assessment have been the
 9      presumption that animal results are pertinent to humans, the use of low-dose linear models to
10      extrapolate these results to lower doses, and the use of statistical upper bounds to express slope
11      factors. The proposed cancer guidelines (EPA, 1996a) would not change two of these defaults:
12      animal results would still be considered pertinent to humans, in the absence of convincing
13      evidence to the contrary; and slope factors would retain the use of statistical bounds, though this
  14      would be more explicitly communicated by the "L" in the term "LED10."
  15             In contrast, the proposed availability of both linear and nonlinear defaults, each using
  16      LED10s as a point of departure, signals a shift from routine application of linear defaults toward a
  17      most-plausible-hypothesis approach. When a linear default is used, the LED10 approach should
18      have only a small effect on existing slope factors, as linear extrapolation from TD50s or lower
  19      points gives results similar to other low-dose linear methods (Krewski,  1990). When a nonlinear
20      default is used, however, there will be substantial changes in the way a cancer hazard is
21      characterized.  The risk assessment will characterize how rapidly risk is reduced as exposure is
22      reduced, and this characterization will be both more qualitative and more quantitative in nature. Of
23      increased importance will be a discussion of the factors that affect the magnitude of a cancer risk
24      at lower doses, including slope at the point of departure, nature of the response, persistence in the
25      body, sensitivity of humans compared to experimental animals, and nature and extent of human
26      variability in sensitivity.  The uncertainty inherent in these factors likely will be much more
 27      important than issues in computing LED10s.
28            Because low-dose linear models have been used extensively in cancer risk assessment,
 29      there will be a natural inclination to compare nonlinear methods to linear extrapolation. Even

l     when linear and nonlinear models both fit the experimental information, risk estimates at lower
2     doses from a nonlinear model can be substantially lower than those from a linear model.  This will
3     make it critical that risk assessments using nonlinear methods explain why risk is anticipated to
4     decline more than proportionately with dose, that risk assessments using linear methods explain
5     why risk is anticipated to decline in proportion to dose and, for both methods, identify the major
6 •    determinants of risk at low doses and address the principal sources of uncertainty and human
7     variability.                                         .                   •       ,
                                    VI. FUTURE PLANS
   3     A. Updating of Guidance Document
   4            This guidance document will be updated as new information on the characteristics of
   5     modeling of dose-response data and BMD/C analysis becomes available. Information from the
   6     literature as well as that submitted to the Agency for review will be considered in revisions to this
   7     guidance document.  In particular, as approaches become established for selection of the BMR for
   8     various endpoints or types of endpoints, this information will be added to the guidance.
   9            The guidance in this document is intended to be consistent with various EPA risk
  10     assessment guidelines and with the RfD/C and CRAVE processes (for input into the IRIS
  11     Database). As changes occur in either the guidelines or in the RfD/C or CRAVE process, this
  12     guidance may also need to be revised.                                          .
  13
  14     B. Potential for Use of the BMD/C in Cost-Benefit Analyses
  15     1. Background
  16           As a regulatory agency, the US EPA has responsibility for the implementation in whole or
  17     in part of about a dozen environmental statutes.  Two statutes explicitly require the Agency to
weigh the health and environmental benefits of proposed regulations against their costs: the
Federal Insecticide, Fungicide and Rodenticide Act as amended (P.L. 100-532, Oct. 25, 1988,
102 Stat. 2654), and the Toxic Substances Control Act (P.L. 100-551, Oct. 28, 1988, 102 Stat.
 21      2755). President Clinton's Executive Order 12866 on Regulatory Planning and Review also
 22      directs the Agency to consider the benefits of the regulations it proposes. In the context of the
 23      BMD/C guidance document here, the term "benefits" refers only to the reduced incidence of
 24      adverse health effects that would be expected to occur as a result of implementing a given
 25      regulation. Currently, no consensus exists as to how noncancer health benefits may be
 26      characterized in terms of the expected numbers of cases of some specific disease that are avoided
 27      by reducing the level  of exposure for a given population.  Note that the monetary valuation of
 28      these human health benefits and the costing of regulations aimed at  obtaining them are outside the
 29      scope of risk assessment methodology and are not considered here.
       The goal of a benefits analysis is to determine the benefits to society rather than to the
individual, and thus requires estimates of the incidence of adverse effects in a population, not just
 3     unit risk. The estimation of benefits is a multi-step process that requires assessment of individual
 4     risk, inter-individual variability and population size in order to estimate population risk. First, the
 5     incidence of adverse health effects is estimated for a baseline set of exposure conditions (usually
 6     the current condition).
 7            Second, the incidence of expected adverse health effects is estimated for an alternative set
 8     of conditions, usually reflecting anticipated reductions in exposure in the affected'population after
 9     regulations are in effect. Finally, the benefits of achieving the reduced exposure are calculated
 10     from the difference between the estimated adverse health effects for the baseline conditions and
 11     alternative conditions. While point estimates are often derived for health benefits, an alternate
approach could entail a probabilistic scheme, based on upper and lower bounds on the individual
 13     risk and exposure curves.        -
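As a minimal sketch of these steps (in Python), the fragment below uses a purely hypothetical dose-response function, exposure levels, and population size; none of the values are taken from this document.

def individual_risk(exposure):
    # Hypothetical dose-response relationship returning individual risk of the
    # adverse effect at a given exposure (illustrative only).
    return min(1.0, 0.002 * exposure)

population = 1_000_000              # size of the exposed population (assumed)
baseline_exposure = 0.8             # mg/kg-day under current conditions (assumed)
alternative_exposure = 0.2          # mg/kg-day after the proposed regulation (assumed)

# Step 1: incidence under baseline conditions
baseline_cases = individual_risk(baseline_exposure) * population
# Step 2: incidence under the alternative (post-regulation) conditions
alternative_cases = individual_risk(alternative_exposure) * population
# Step 3: benefit = expected cases avoided
cases_avoided = baseline_cases - alternative_cases
print(f"Expected cases avoided: {cases_avoided:.0f}")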
       In predicting numbers of expected adverse health effects, one needs to consider the weight
of evidence that the critical effect in deriving the BMD/C is the effect of concern in human
populations.  One would have greatest confidence in benefits assessments based on strong human
data; high confidence where the mechanism is known, the critical endpoint is observed in several
animal test species, and other endpoints occur in the same sequence in several species; and less
confidence where a variety of effects are seen in different animal species, appearing in different
sequences.
21     2.  Benefits estimates using BMD/C and RfD/C
22            In the case of carcinogens,  standardized methodologies exist for assessment of individual
23     risk and estimates of the incidence  of cancer in a population (EPA, 1996a). The use of  .
benchmark-style approaches in the assessment of individual cancer risk has been discussed
25      elsewhere in these guidelines (section V). To estimate benefits, the numbers of expected cases of
26      cancer avoided under .different regulatory scenarios are obtained by multiplying individual risk
27      times the size of the exposed population.                   .
28             In noncancer health benefits analyses, the quantification of the number of cases of disease
avoided has been difficult because the RfD/C derivation methodology does not provide estimates
of individual or population risk.  To date, with a few exceptions like lead (EPA, 1991b), the
  2      primary measure of benefits for regulating non-carcinogens has been the estimated size of the
  3      population expected to have reduced exposure to a chemical as the result of regulations
  4      implemented to interdict exposure or to reduce ambient concentration of pollutants.  These
  5      estimates, however, do not directly indicate the extent of morbidity, nor the number of cases of
disease or mortality that would be avoided in the affected populations.  This guidance does not fill
these gaps per se.
  8             The debate regarding the defensibility of approaches to quantifying the benefits associated
  9      with reducing exposure to noncarcinogens is long-standing (NRC, 1983). Efforts to quantify
 10      noncancer risks at low levels of exposure should be based on empirical dose-response data instead
 11      of statistical models to extrapolate to risks at low exposures, and should have a strong biological
 12      basis for the particular form of model chosen.  In the BMD/C approach, there is an explicit
 13      estimate of the dose that corresponds with some level of risk within or not far below the observed
 14      range. These estimates potentially can be used to determine the number of cases expected for
 15      various non-cancer endpoints when exposure levels are in or near those in the experimental data
 16      range.
 17            However, estimation of benefits will usually also require determining risk well below the
 18      experimental range, and determining the level below which observed effects are not of concern.
 19      These determinations include risks and levels that apply to the general population, and those that
20      apply to sensitive subgroups.
21      3. Issues in quantifying noncancer health benefits
22            The benchmark dose approach thus provides a good starting point to develop benefits
23      estimates for non-carcinogens.  Many of the key issues for benefits assessment are addressed
2-i      during BMD/C development, e.g., selecting endpoints relevant to a human disease state,
25      identifying the proper scaling factor to convert animal dose values to human equivalent doses,
26      converting continuous response data to quantal form to support estimates of the probability of
27      expected adverse health effects, and modeling critical health effects data in the observable range
28      using standard mathematical and statistical models aimed at estimating the lower confidence
bound on the dose producing a response.  Two major issues remain that require the development

of new paradigms to quantify population risk at low-dose exposures.  One is the extent to which
there may be sensitive or "super-exposed" subpopulations.  The second issue involves identifying,
in a progression of effects in the same mechanistic pathway, those effects which have such limited
severity as to be of no concern, beyond representing biomarkers of exposure.  Work is continuing
in these areas.
   7     C.  Research and Implementation Needs
   8      -     A number of research needs related to the BMD/C approach were indicated in the
 .  9     background document (EPA, 1995c).  These included the development of dose-response models
  io     and related methods for use with various types of data, guidelines for handling lack of fit,
  11     development of methods for applying pharmacokinetic considerations, guidelines for selecting
appropriate measures of altered response, study of the sensitivity of the BMD/C to choice of
  13     model, particularly in relation to the level of the BMR, and to the confidence limit size, guidelines
  14     for selecting a single BMD/C when more than one is calculated, investigation of uncertainty
  15      factors, comparison of dose-response curves for different types of data and toxic endpoints, and
  16      development of dose-response models for multiple endpoints of toxicity.  Several of these
research needs have been addressed in the present document.  In most cases, however, further
  18      research is needed on these topics:
 19             The most critical data need from the point of view of implementing the BMD/C approach
 20      on a broad scale according to the guidance in this document is the selection of appropriate
measures of altered response.  As discussed in Section III.B, selection of the BMR should be
dependent on one of two factors: the degree of change considered biologically significant, or the
 23      degree of change at the limit of detection.  These determinations will need to be done on an
 24      endpoint by endpoint basis and, thus, will require bringing together experts in various disciplines
 25      to reach consensus. In those situations where biological significance is equated with statistical
 26      significance, the limit of detection must be determined. This will require the evaluation of data
 27      across a number of studies of similar design to determine for each endpoint of concern what is the
 28      limit of detection, on average. Alternatively, studies comparing NOAELs and BMD/Cs for
 29      different BMRs could be conducted if an adequate database exists,  as has been done for
 1     developmental toxicity (Faustman et al., 1994;  Allen et al., 1994a and b; Kavlock et al., 1995).
 2            More experience with modeling data using a variety of models is needed to compare the .
 3     BMD/Cs obtained and the rationale for selection of the most appropriate model and BMD/C.
Sensitivity of the BMD/C to the choice of model, confidence limit size, and the actual software
used needs to be explored with a variety of data sets.  It would be advantageous, eventually, to
have a small suite of models that could be recommended for noncancer health effects, depending
 7     on the type of data.
 8            The development of user-friendly software that is widely available for BMD/C modeling
 9     would be extremely helpful. EPA has an on-going effort to develop such software, and the
10     models to be included at the present time are indicated in Appendix E.
       Training of risk assessors in the BMD/C approach, application of models, and
interpretation of data is needed to implement this approach on an Agency-wide basis as well as
among individuals outside the Agency.  EPA intends to implement such a training program for its
14     employees engaged  in risk assessment, as well as risk managers who need to understand the
15     procedure. EPA can also work with other organizations (e.g., professional societies) in
16     developing training programs for those in other sectors who may be involved in risk  assessments
17     or serve as reviewers of risk assessments of interest to the EPA.
18            Several issues have been considered in the development of this guidance document that do
not pertain directly to the application of the BMD/C approach but do impinge on its application in
the risk assessment arena.  These are intentionally not addressed in this guidance document, but
do need to be considered further in other efforts.  For example, there are several differences in the
way in which dose-response analyses are done for cancer endpoints versus noncancer endpoints.
23     These include the assumption of a threshold for noncancer endpoints versus a low-dose linear
relationship for cancer (although this is beginning to change with the newly proposed cancer risk
assessment guidelines; EPA, 1996a).  In addition, within-human variability is dealt with explicitly
in the use of an uncertainty factor for noncancer endpoints, but implicitly for cancer.  Dose scaling
27     across species is handled differently for cancer (mg/kg3/4-d) versus noncancer (mg/kg/day) health
28     effects. And while linear low-dose extrapolation for cancer to obtain a risk estimate or slope
29     factor includes a risk reduction strategy,  there is no implicit risk reduction factor included for
noncancer risk assessment; rather, uncertainty factors are used to derive a dose that is considered
to be unlikely to cause effects detectable above background.  These and other issues, such as how
to deal with severity of effect, need further consideration by the Agency.
                                   VII. REFERENCES
 4      Alexeeff, G.V.; Lewis, D.C.; Ragle, N.L. (1993) Estimation of potential health effects from acute
 5      exposure to hydrogen fluoride using a 'benchmark dose' approach. Risk Analysis, 13(l):63-69.
 6                                    '                                      '•••'.
Allen, B.C.; Kavlock, R.J.; Kimmel, C.A.; Faustman, E.M. (1994a) Dose-response assessment for
developmental toxicity: II.  Comparison of generic benchmark dose estimates with NOAELs.
Fund. Appl. Toxicol., 23:487-495.

Allen, B.C.; Kavlock, R.J.; Kimmel, C.A.; Faustman, E.M. (1994b) Dose-response assessment for
developmental toxicity: III.  Statistical models.  Fund. Appl. Toxicol., 23:496-509.
13                                 •
14      Allen, B.C., and P.L. Strong, C.J. Price, S.A. Hubbard, and G.P. Daston (1996) Benchmark dose
15      analysis of developmental toxicity in rats exposed to boric acid.  Fund. Appl. Toxicol.. 32: (in
16      press).
17                         '
18      Auton, T.R. Calculation of benchmark doses from teratology data. Regulatory Toxicology and
19      Pharmacology, 1994, in press.
20                                      .                   ,        •
Barnes, D.G.; Daston, G.P.; Evans, J.S.; Jarabek, A.M.; Kavlock, R.J.; Kimmel, C.A.; Park, C.;
Spitzer, H.L. (1995) Benchmark dose workshop: Criteria for use of a benchmark dose to estimate
a reference dose.  Regulatory Toxicol. Pharmacol., 21:296-306.
24
25      Beck, B.D.; Conolly, R.B.; Dourson, M.L.; Guth,  D.; Hattis, D.; Kimmel, C.; Lewis, S.C. (1993)
26      Symposium overview: improvements in quantitative noncancer risk assessment. Fund. Appl.
27      Toxicology 20:1-14.
28
29      California Office of Environmental Health Hazard Assessment. (1993) Safety assessment for non-
30      cancer endpoints: The benchmark dose and other possible approaches. Summary report.
31                          '          '              '                  "       .'          "     •
Catalano, P.J.; Scharfstein, D.O.; Ryan, L.M.; Kimmel, C.A.; Kimmel, G.L. (1993) Statistical
model for fetal death, fetal weight, and malformation in developmental toxicity studies.  Teratology
47:281-290.
35                                                              *  ' "
36      Chen, C.; Farland, W. (1991) Incorporating cell proliferation in quantitative cancer risk
37      assessment: approaches, issues, and uncertainties.  In: Butterworth, B.; Slaga, T.; Farland, W.;
38      McClain, M., eds. Chemical induced cell proliferation: implications for risk assessment. New
39      York: Wiley-Liss, pp. 481-499.              ,          .'....
40                            '                                                   .        •
Chen, J.J.; Kodell, R.L.; Howe, R.B.; Gaylor, D.W. (1991) Analysis of trinomial responses from
reproductive and developmental toxicity experiments.  Biometrics 47:1049-1058.
Clayton, D.; Hills, M. (1993) Statistical Models in Epidemiology.  Oxford University Press,
Oxford.
   3                    -     '       '    .    '•'.''-'•             •    ...'.'...-
   4     Cogliano, V.J. (1986) The U.S. EPA's methodology for adjusting the reportable quantities of
 >  5     potential carcinogens. Proceedings of the 7th National Conference on Management of
   6.   • • Uncontrolled Hazardous Wastes (Superfund '8.6): Washington: Hazardous Materials Control
   7     Research Institute, pp. 182-185,  -                       '
   8    .-.''••''.'                         '"'.  •              •' .'•'         ''   ••
Collins, M.A.; Rusch, G.M.; Sato, F.; Hext, P.M.; Millischer, R.-J. (1995)
1,1,1,2-Tetrafluoroethane: Repeat exposure inhalation toxicity in the rat, developmental toxicity
in the rabbit, and genotoxicity in vitro and in vivo.  Fund. Appl. Toxicol. 25:271-280.

Cox, D.R.; Hinkley, D.V. (1974) Theoretical Statistics, chapter 7.  Chapman and Hall, London.
 14    . .       • i  '  . .    -        .,-.'•    ..  '    .              .               . . •
 15     Crump, K.S. (1984) A new method for determining allowable daily intakes. Fundamental and
 16    Applied Toxicology 4:854-871.               ,
 17        .   ;         '  - ..    '    .   .       '                 .  ,  :  '.        ' ,      '   . '
Crump, K.S. (1995) Calculation of benchmark doses from continuous data.  Risk Analysis 15:
79-89.

Crump, K.S.; Howe, R. (1985) Chapter 9 in Toxicological Risk Assessment.  D.B. Clayson, D.
Krewski, I. Munro, eds.  Boca Raton: CRC Press, Inc.
 23   "          ' •  .       '   ' ''    '      •    •            '•'-.''
Davidian, M.; Giltinan, D.M. (1995) Nonlinear Models for Repeated Measurement Data.
Chapman and Hall, London.

Dourson, M.L.; Hertzberg, R.C.; Hartung, R.; Blackburn, K. (1985) Novel methods for the
estimation of acceptable daily intake.  Toxicology and Industrial Health 1:23-41.

Draper, N.; Smith, H. (1981) Applied Regression Analysis, Second Edition, Chapter 10.  Wiley,
New York.

Farmer, J.H.; Kodell, R.L.; Gaylor, D.W. (1982) Estimation and extrapolation of tumor
probabilities from a mouse bioassay with survival/sacrifice components.  Risk Analysis 2(1):27-34.

Faustman, E.M.; Allen, B.C.; Kavlock, R.J.; Kimmel, C.A. (1994) Dose-response assessment for
developmental toxicity: I.  Characterization of data base and determination of NOAELs.  Fund.
Appl. Toxicol., 23:478-486.

Fleiss, J.L. (1981) Statistical Methods for Rates and Proportions, Second Edition.  Wiley, New
York.


 1     Gaylor, D.-W. (1983) The use of safety factors for controlling risk. Journal of Toxicology and
*2"     Environmental Health. 11:329-336. ••                           '•
 3                                  •.                    '      -.-...  ..        -. -.
 4    • Gaylor, D.W.; Kodell, R.L. (1980) Linear interpolation algorithm for low dose risk assessment of
 5     toxic substances. J. Environ. Pathol: Toxicol. 4:305-312.
 6,                          '          '                     .                            .
 7*    Gaylor, D.; Slikker, W., Jr. (1990) Risk assessment for neurotoxic effects. NeuroToxicology 11:
 8     211-218.                                              ..
 9                   •                                    •
Gaylor, D.W.; Kodell, R.L.; Chen, J.J.; Springer, J.A.; Lorentzen, R.J.; Scheuplein, R.J. (1994)
Point estimates of cancer risk at low doses.  Risk Analysis 14(5):843-850.
12 •
13     Gerrity, T.R.; Henry, C.J., eds. (1990) Summary report of the workshops on principles of route-
14     to-route extrapolation for risk assessment.  In: Principles of route-to-route extrapolation for risk
assessment, proceedings of the workshops; March and July; Hilton Head, SC and Durham, NC.
16     New York, NY: Elsevier Science Publishing Co., Inc.; pp. 1-12.
17                                          .         •••"..'.'
18     Gold, L.S.; Sawyer, C.B.; Magaw, R.; Backman, G.M.; de  Veciana,  M.; Levinson, R.;
19     Hooper, N.K.; Havender, W.R.; Bernstein, L.; Peto, R.; Pike, M.C.;  Ames, B.N. (1984) A
20     carcinogenic potency database of the standardized results of animal bioassays. Environ. Health
21     Perspect. 58:9-319.                                                 .                .  -
22                      '               :            .'..••
23     Gold, L.S.; Bernstein, L.; Kaldor, J.; Backman, G.; Hoel, D. (1986a) An empirical comparison of
24     methods used to estimate carcinogenic potency in long-term animal bioassays: lifetable vs     *
25     summary incidence data. Fund. Appl. Toxicol. 6:263-269.
26
27     Gold, L.S.; de Veciana, M.; Backman, G.M.; Magaw, R.; Lopipero,  P.; Smith, M.;
28     Blumenthal, M.; Levinson, R.; Bernstein, L.; Ames, B.N. (1986b) Chronological supplement to
29   • the carcinogenic potency database: standardized  results of animal bioassays published through
30     December 1982. Environ. Health Perspect. 67:161-200.
31                                          .        .
32     Gold, L.S.; Slone, T.H.; Backman, G.M.; Magaw, R.; Da Costa, M.; Lopipero, P.;
33     Blumenthal, M.; Ames, B.N. (1987) Second chronological  supplement to the carcinogenic
34     potency database: standardized results of animal  bioassays published through December 1984 and
35     by the National Toxicology Program through May 1986. Environ. Health Perspect. 74:237-329.
36
37     Gold, L.S.; Slone, T.H.; Backman, G.M.; Eisenberg, S.; Da Costa, M.; Wong, M.; Manley, N.B.;
38     Rohrbach, L.; Ames, B.N.  (1990) Third chronological supplement to the carcinogenic potency
39     database: standardized results of animal bioassays published through December 1986 and by the
40     National Toxicology Program through June 1987. Environ. Health Perspect. 84:215-286.
41
42     Gold, L.S.; Manley, N.B.;  Slone, T.H.; Garfinkel, G.B.; Rohrbach, L.; Ames, B.N. (1993) The
43     fifth plot of the carcinogenic potency database: results of animal bioassays published in the general


..  1     literature through 1988 and by the National Toxicology Program through 1989 Environ Health
   2     Perspect. 100:65-168.            -.             .  .            •-
   3    '     .  : .      .   •   •   .  •   •'      '   •... ,\ •  •      ., •;        •  •
   4     Gold, L.S.; Manley, N.B.; Slone, T.H.; Garfinkel, G.B.; Ames, B.N. Rohrbach, L.; Stern, B.R.;
   5     Chow, K. (in press) Sixth plot of the carcinogenic potency database: results of animal bioassays
   6     published in the general literature 1989-1990 and by-the National Toxicology Program
   7     1990-1993. Environ. Health Perspect.
   8      .                 .-..-..     .          ,         -",'•."--.'
Guth (19__)
  10 .'••''-    '   '  '      ..•'.:       •            :     .-•.."-..
  11     Haas, C.N.  (1994) Dose-response analysis using spreadsheets. Risk Analysis 14(6): 1097-1100
  12                 '      • '    '        .         . -..     .         "•'.'•'"
  13     Hasselblad, V.; A.M. Jarabek (1995) Dose-response analysis of toxic chemicals. In: Bayesian
  14     biostatistics. D.A. Berry, D.K. Stangl, eds. Marcel Dekker, Inc. New York
  is     '   ;   '  - .            .           •      ';.   •.       •       • . :       ••.       '
Heindel, J.J.; Price, C.J.; Field, E.A.; Marr, M.C.; Myers, C.B.; Morrissey, R.E.; Schwetz, B.A.
(1992) Developmental toxicity of boric acid in mice and rats.  Fund. Appl. Toxicol. 18:266-
  18 ,        .,..          •     '        .' •  .    '      -   \       •     '       -•       '-    -
  19    Hertzberg, R.C. (1989) Fitting a model to categorical response data with application to species
  20    extrapolation of toxicity.  Health Physics 57:405-409.
  21        ..'-••        "          '   r               '          •''•_••
  22 ;    Hertzberg, R.C., Miller, M. (1985) A statistical model for species extrapolation using categorical
  23     response data. Toxicology and Industrial Health 1:43-57
  24             •    '.-      '•''.'••'•-.-   •                   '          '
Hext, P.M.; Parr-Dobrzanski, R.J. (1993) HFC 134a: 2 year inhalation toxicity study in the
rat.  ICI Central Toxicology Laboratory, Alderley Park, Macclesfield, Cheshire, UK.  Report No.
CTL/P/3317.
 28        ••,;".•  ''-•"_'-'      '.'.,'••.; ••'.-      '       ••     •  .;.  •
 29     Hoel(1979)
 30                  •-       '              ''.'.'.'-'"..'•
 31     Hoel, D.G. (1990) Assumptions of the HERP index. Risk Analysis 10(4):623-624
 32   r                 '       '          ,        '•.-;.    '.••-••      .   •  .  •
 33     Hoover, S.M.; Zeise, L.; Pease, W.S.; Lee, L.E.; Hennig, M.P.; Weiss, L.B.; Cranor, C, (1995)
 34     Improving the regulation of carcinogens by expediting cancer potency estimation  Risk Analysis
 35     15(2):267-280.  ,          "   , ,                                              ,
 36          .  "  '  •      ''''.''...      -     ",              ,       ;'.    ,   '  -
Howe, R.B.; Crump, K.S.; Van Landingham, C. (1986) Global 86: a computer program to
extrapolate quantal animal toxicity data to low doses.  Prepared for U.S. EPA under contract
 39     68-01-6826.                       .                                       :
 40     .,.•''      ': '.    '       '  •• •       "'-'•'.        •'  '   •   '       •
Jarabek, A.M.; Hasselblad, V. (1992) Application of a Bayesian statistical approach to response
 42     analysis of noncancer toxic effects.  Toxicologist 12:98.
Johnson, B.L.; Boyd, J.; Burg, J.R.; Lee, S.T.; Xintaras, C.; Albright, B.E. (1983) Effects on the
peripheral nervous system of workers' exposure to carbon disulfide.  Neurotoxicology 4(1):
53-66.
  4                                                                '.''•'•'
  5      Kavlock, R.J., B.C. Allen, C.A. Kimmel, E.M. Faustman. (1995) Dose-response assessment for
  6      developmental toxicity: IV. Benchmark doses for fetal weight changes. Fund. Appl. Toxicol.,
  7      26:211-222.
  8
Kavlock, R.J.; Schmid, J.E.; Setzer, R.W., Jr. (1996) A simulation study of the influence of study
design on the estimation of benchmark doses for developmental toxicity.  Risk Analysis 16:391-
403.

Kimmel, C.A.; Gaylor, D.W. (1988) Issues in qualitative and quantitative risk analysis for
developmental toxicity.  Risk Analysis 8:15-20.

Kimmel, C.A.; Wellington, D.G.; Farland, W.; Rose, P.; Manson, J.M.; Chernoff, N.; Young, J.F.;
Selevan, S.G.; Kaplan, N.; Chen, C.; Chitlik, L.D.; Siegel-Scott, C.L.; Valaoras, G.; Wells, S.
(1989) Overview of a workshop on quantitative models for developmental toxicity risk
assessment.  Environmental Health Perspectives 79:209-215.

Kimmel, C.A.; Siegel, M.; Crisp, T.M.; Chen, C.W. (1996) Benchmark concentration (BMC)
analysis of 1,3-butadiene (BD) reproductive and developmental effects.  Fund. Appl. Toxicol.
(Suppl., no. 1, part 2) 30:146.
 24
 25      Kodell, R.L.; Chen, J.J.; Gaylor, D.W. (1995) Neurotoxicity Modeling for Risk Assessment.
 26      Regulatory Toxicology and Pharmacology 22:24-29.
 27
 28      Krewski, D. (1990) Measuring carcinogenic potency. Risk Analysis 10(4):615-617.
 29
 30      Krewski, D.; Zhu, Y.  (1994)
 31                                                          '        '
 32      Krewski, D.; Zhu, Y.  (1995) A simple data transformation for estimating benchmark doses in
 33      developmental toxicity experiments. Risk Analysis 15:29-39.
 34                   '                        .'              .
Krewski, D.; Szyszkowicz, M.; Rosenkranz, H. (1990) Quantitative factors in chemical
carcinogenesis: variation in carcinogenic potency.  Regul. Toxicol. Pharmacol. 12:13-29.
 37                                                                              ,
 38      Krewski, D.; Gaylor, D.;  Szyszkowicz, M. (1991) A model-free approach to low-dose
 39      extrapolation. Environ. Health Perspect. 90:279-285.
 40                       ,                                         -
Kupper, L.L.; Hafner, K.B. (1989) How appropriate are popular sample size formulas? The
 42      American Statistician 43:101-105.
Lefkopoulou, M.; Moore, D.; Ryan, L. (1989) The analysis of multiple binary outcomes:
Application to rodent teratology experiments.  Journal of the American Statistical Association 84:
810-815.

  Moolgavkar, S.H.; Knudson, A.G. (1981) Mutation and cancer: a model for human
  carcinogenesis. J.Natl. Cancer Inst. 66:1037-1052.

  McCullagh, P.; Nelder, J.A. (1989) Generalized Linear Models, Second Edition.  Chapman and
  Hall, London.                                     .        .

  National Research Council (NRC) (1983)  Risk Assessment in the Federal Government-
  Managing the Process.  Prepared by: Committee on the Institutional Means for Assessment of
  Risks to Public Health, Commission on Life Sciences. Washington, DC.

  National Research Council (NRC) (1993) Issues in risk assessment. Washington- National
  Academy Press, pp. 115-116.                 .             -

  National Research Council (NRC) (1994)  Science and Judgment in Risk Assessment, Committee
  on Risk Assessment of Hazardous Air Pollutants, Board on Environmental Studies and
  Toxicology, Commission on Life Sciences, National Academy Press, Washington, DC.

National Toxicology Program (NTP) (1991) Technical report on the toxicology and
carcinogenesis of 1,3-butadiene (CAS No. 106-99-0) in B6C3F1 mice (inhalation studies).  U.S.
Department of Health and Human Services, Public Health Service, National Institutes of Health,
National Toxicology Program.  NTP TR 434, NIH Publ. No. 92-3165.

  Peto, R.; Pike, M.C.; Bernstein, L.; Gold, L.S.; Ames, B.N. (1984) The TD50: a numerical  ;   :
  description of the carcinogenic potency of chemicals in chronic-exposure animal experiments
•  Environ. Health Perspect. 58:1-8.  .                                ,      ,           '   .-

  Price and Berner, 1995.  A benchmark dose for carbon disulfide: Analysis of nerve conduction
  velocity measurements from the NIOSH exposure database. Report to the Chemical
  Manufacturers Association Carbon Disulfide Panel.

Research Triangle Institute (RTI) (1994) Determination of the no-observable-adverse-effect-level
(NOAEL) for developmental toxicity in Sprague-Dawley (CD) rats exposed to boric acid in feed
on gestational days 0 to 20, and evaluation of postnatal recovery through postnatal day 21.  RTI
Identification Number 65C-5657-200.

 Ryan, L. 1992. Quantitative risk assessment for developmental toxicity. Biometrics 48:163-174.

Sawyer, C.; Peto, R.; Bernstein, L.; Pike, M.C. (1984) Calculation of carcinogenic potency from
 long-term animal carcinogenesis experiments. Biometrics 40:27-40.


Simpson (19__)
 *2    • ,:        ' ,    ^     '       .  -  '•     '   '       '         '  ''       ' ' '      .'.
Setzer, R.W.; Rogers, J.M. (1991) Assessing developmental hazard: the reliability of the A/D
 4     ratio.  Teratology 44:653-665.
 5                       '   .                              .
 6     SRA Symposium (1994)                                                 •'•
 7                      .                                  . •        '     '
 8     U.S. Environmental Protection Agency (EPA) (1986a) Guidelines for carcinogen risk
 9     assessment. Federal Register 51(185):33992-34003.
10
11     U.S. Environmental Protection Agency (EPA) (1986b) Science Advisory Board Comments
12                                               .       ,
13     U.S. Environmental Protection Agency (EPA) (1987) Hazardous substances; reportable quantity
14     adjustments; proposed rules. Federal Register 52(50):8140-8186.
16
U.S. Environmental Protection Agency (EPA) (1988a) Science Advisory Board Comments.
17
18     U.S. Environmental Protection Agency (EPA) (1988b) Science Advisory Board Comments.
19
20     U.S. Environmental Protection Agency (EPA) (1988c) Methodology for evaluating potential
21     carcinogenicity in support of reportable quantity adjustments pursuant to CERCLA section 102.
22     Washington: report no. EPA/600/8-89/053.                   '.".-.
23
24     U.S. Environmental Protection Agency (EPA) (1989a) Reportable quantity adjustments;
25     delisting of ammonium thiosulfate; final rules. Federal Register 54(155):33418-33484.
26
27     U.S. Environmental Protection Agency (EPA) (1989b) Technical background document to
support rulemaking pursuant to CERCLA section 102, volume 3.  Washington: Office of Solid
29     Waste and Emergency Response.
30 '                                                                                        •
31     U.S. Environmental Protection Agency (EPA) (1989c) Science Advisory Board Comments.
32
33     U.S. Environmental Protection Agency (EPA) (1991a) Guidelines for developmental toxicity
34     risk assessment; notice. Fed Regist, 56:63798-63826.
35                '              .              .     '.-.-..,'•
36     US Environmental Protection Agency (EPA) (1991b) Regulatory impact analysis of proposed
37     national primary drinking water regulation for lead and copper. Prepared by Wade Miller
38     Associates, Inc. April.                                   '
39
40     U.S. Environmental Protection Agency (EPA) (1992) Draft report: a cross-species scaling factor
41     for carcinogen risk assessment based on equivalence of mg/kg3/4/day; notice. Federal Register
42     57(109):24152-24173.
  U.S. Environmental Protection Agency (1994a) Ranking of pollutants with respect to hazard to
  human health; proposed rule. Federal Register 59.      ...

  U.S. Environmental Protection Agency (1994b) Technical background document to support
  rulemaking pursuant to the Clean Air Act—section 112(g): ranking of pollutants with respect to
hazard to human health.  Research Triangle Park, NC: report no. EPA-450/3-92-0-10.

U.S. Environmental Protection Agency (1994c) Methods for derivation of inhalation reference
concentrations and application of inhalation dosimetry.  Office of Health and Environmental
Assessment, Environmental Criteria and Assessment Office, Research Triangle Park, NC.
EPA/600/8-90/066F.

U.S. Environmental Protection Agency (1995a) Reportable quantity adjustments; final rule.
Federal Register 60(112):30926-30962.

 U.S. Environmental Protection Agency (1995b) Technical background document to support
 rulemaking pursuant to CERCLA section 102, vol. 7. Washington: Office of Solid Waste and
 Emergency Response.                       ,               .                :

U.S. Environmental Protection Agency (1995c) The use of the benchmark dose approach in
health risk assessment.  Office of Research and Development, Washington, DC: EPA/630/R-
94/007, February.

 U.S. Environmental Protection Agency (1995d) Health assessment document for diesel emissions
 Washington, EPA/600/8-90/057Bb.         .         •

 U.S. Environmental Protection Agency (1995e) Benchmark dose concentration analysis for    '
 carbon disulfide. Internal Report.

U.S. Environmental Protection Agency (EPA) (1995f) Proposed guidelines for neurotoxicity
risk assessment; notice.  Fed Regist, 60:52032-52056.

 U.S. Environmental Protection Agency  (EPA) (1995g) Manganese document

 U.S. Environmental Protection Agency (1996a) Proposed guidelines for carcinogen risk
 assessment. Federal Register 61(79): 17960-18011.

 U.S. Environmental Protection Agency (1996b) Guidelines for Reproductive Toxicity Risk
 Assessment; notice.  Federal Register (draft).

 U.S.,Environmental Protection Agency  (EPA) (1996c)  Integrated Risk Information System
 (IRIS).  Online. National Center for Environmental Assessment, Washington, DC.
Van Ryzin, J. (1980) Quantitative risk assessment.  J. Occup. Med. 22(5):321-326.
*              " *..      *     ',
 2                 .    •     •   '     '•           '      '•        . '         '-.        '
Wartenberg, D.; Gallo, M.A. (1990) The fallacy of ranking possible carcinogen hazards using the
TD50.  Risk Analysis 10(4):609-613.
 5                                  •                                 _.'           '
 6     Zeger, S. L.; Liang, K. Y. (1986) Longitudinal data analysis for discrete and continuous
 7     outcomes. Biometrics 42: 121-130.
 8
 9     Zhu, Y.; Krewski, D.; Ross, W.H.  (1994) Dose-response models for correlated multinomial data
10     from developmental toxicity studies.  Applied Statistics 43:583-598.
                                      APPENDIX A

         ASPECTS OF DESIGN, DATA REPORTING, AND ROUTE EXTRAPOLATION
                           RELEVANT TO BMD/C ANALYSIS
  6     1. Design
  .7            In general, studies with more dose groups and a graded monotonic response with dose
  8     will be more useful for BMD/C analysis. Studies with only a single dose showing a response
  9     different from controls are not appropriate for BMD/C analysis.  Studies in which responses are
 10     only at the same level as background or at or near the maximal response level are not considered
 11     adequate for BMD/C analysis. It is preferable to have studies with one or more doses near the
 12     level of the BMR to give a better estimate of the BMD/C. Studies in which all dose levels show
 13     changes compared with control values (i.e., no NOAEL) are readily useable in BMD/C analyses,
 14     unless the lowest response level is much higher than that at the BMR.
       In a recent simulation study by Kavlock et al. (1996), various aspects of study design
(number of dose groups, dose spacing, dose placement, and sample size per dose group) were
        examined for two endpoints of developmental toxicity (incidence of malformations and reduced
        fetal weight). Of the designs evaluated, the best results were obtained when two dose levels had
response rates above the background level, one of which was near the BMR.  In this study, there
        was virtually no advantage in increasing the sample size from 10 to 20 litters per dose group.
        When neither of the two dose groups with response rates above the background level was near
        the BMR, satisfactory results were also obtained, but the BMDs tended to be lower. When only
        one dose level with a response rate above background was present and near the BMR, reasonable
        results for the maximum likelihood estimate and BMD were obtained, but here there were benefits
        of larger dose group sizes.  The poorest results were obtained when only a single group with an
elevated response rate was present, and the response rate was much greater than the BMR.
        2. Aspects of Data Reporting
       In most cases, the risk assessor relies on published reports of key toxicological studies in
performing a dose-response assessment.  Reports from the peer-reviewed literature may contain
summary information which can vary in completeness vis-a-vis the data requirements of the BMD
method.  The optimal situation is to have information on individual subjects.  It is very common to
have summary information (group-level information, e.g., mean and standard deviation)
concerning the measured effect, especially for continuous response variables, and it must be
determined whether the summary information is adequate for the BMD/C method to proceed.
 6            Dichotomous data are normally reported at the individual level (e.g., 2/10 animals showed
 6     the effect).  Occasionally a dichotomous endpoint will be reported as being observed in a group,
 7     with no mention of the number of animals showing the effect.  This usually occurs when the
 8     incidence of the endpoint reported is ancillary to the focus of the report. For BMD/C modeling of
 9     dichotomous data, both the number showing the response and the total number of .subjects in the
10     group are necessary.
11            Continuous data are reported as a measurement of the effect, such as body weights or
enzyme activity in control and exposed groups.  The response might be reported in several
different ways, including as an actual measurement, or as a contrast (e.g., as absolute change from
control or as relative change from control).  To model continuous data when individual animal
data are not available, the number of subjects, mean of the response variable, and a measure of
variability (e.g., standard deviation, SD; standard error, SE; or variance) are needed for each
group.  The lack of a numerically reported SD or SE precludes the calculation of the BMD/C, unless
partial information is presented (e.g., SD for the control group only) and some assumptions are
made.  For example, an assumption can be made that the variance in the exposed groups is the
same as in the controls, but this introduces uncertainty, since the variance in the individual groups
allows more precise modeling of the data and calculation of the confidence limits.
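The following fragment (Python) is a minimal sketch of this kind of bookkeeping; the group summaries are hypothetical, and carrying the control standard deviation over to a group whose SD was not reported is exactly the kind of assumption, with its added uncertainty, described above.

import math

def sd_from_se(se, n):
    # Recover a group standard deviation from a reported standard error.
    return se * math.sqrt(n)

# Hypothetical group summaries: dose -> (n, mean, SD); the SD of the top dose
# group was not reported, so the control SD is carried over as an assumption.
groups = {
    0.0:   (10, 100.0, 12.0),
    50.0:  (10,  93.0, 14.0),
    200.0: (10,  81.0, None),
}
control_sd = groups[0.0][2]
complete = {dose: (n, mean, sd if sd is not None else control_sd)
            for dose, (n, mean, sd) in groups.items()}
print(complete)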
       Categorical data are defined as a type of quantal data in which there is more than one
defined severity category in addition to the no-effect category, and the responses in the treatment
groups are characterized in terms of the severity of effect (e.g., mild, moderate, or severe
histological change).  Results may be classified by reporting an entire treatment group in terms of
26     category (group level reporting), or by reporting the number of animals from each group in each
27     category (individual level reporting). For example, a report of epithelial degenerative lesions
28     might state that an exposed group showed a mild effect (group level) or that in the exposed group
29     there were 7 animals with a mild effect and 3 with no effect (individual level reporting). In such a

case, the BMD/C can be calculated using the dichotomous model after combining data in severity
categories (e.g., model all animals with an effect, or all with greater than a mild effect).
Dichotomous data can be viewed as a special case in which there is one category and the possible
response is binary (e.g., effect or no effect).  Information may also be treated as categorical in
cases where an endpoint is inherently a dichotomous or continuous variable, but because the
endpoint is reported only descriptively, it cannot be treated quantitatively.  In that case, the
BMD/C approach cannot be applied because the minimum data required for dichotomous models,
number affected and total number exposed, are not reported.
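A minimal sketch (Python) of collapsing individual-level severity counts into the number affected and group size needed by a dichotomous model; the counts and category labels are hypothetical.

# Hypothetical severity counts per dose group (individual-level reporting).
severity_counts = {
    0:   {"none": 10, "mild": 0, "moderate": 0},
    100: {"none": 7,  "mild": 3, "moderate": 0},
    300: {"none": 2,  "mild": 5, "moderate": 3},
}

def dichotomize(counts, adverse=("mild", "moderate")):
    # Collapse severity categories into (number affected, group size).
    affected = sum(counts[c] for c in adverse)
    return affected, sum(counts.values())

incidence = {dose: dichotomize(c) for dose, c in severity_counts.items()}
print(incidence)   # {0: (0, 10), 100: (3, 10), 300: (8, 10)}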
       Modeling approaches have been discussed for categorical data with multiple categories
(Dourson et al., 1985; Hertzberg, 1989; Hertzberg and Miller, 1985) and for group-level
categorical data (Guth; Simpson).  These regression models can also be used to derive a BMD/C,
by estimating the probability of effects of severity defined as adverse.  This approach is analogous
to the BMD/C if the severity categories are defined consistently.  This approach has had
considerably less review than other approaches being used for the BMD/C.
 15      3. Route Extrapolation
       The criteria for determining if route extrapolation is appropriate for risk assessment have
been discussed previously (EPA, 1994c; Gerrity and Henry, 1990), and the same criteria apply
when selecting data for BMD/C analysis.  If it is determined that route extrapolation is
appropriate, the general procedure is to convert from the route of exposure in the study to the
route of exposure of interest in the risk assessment and then to perform the BMD/C analysis.  In
this way, any non-linearity in the route extrapolation model would be incorporated into the
calculation of the doses used as input into the BMD model.
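The order of operations can be sketched as follows (Python); the conversion function, its constants, and the study concentrations are purely illustrative placeholders, not an endorsed route-extrapolation model.

def route_converted_dose(concentration_mg_m3):
    # Placeholder route-extrapolation model; the functional form and constants
    # are illustrative only (a real model may be nonlinear and chemical-specific).
    return 0.5 * concentration_mg_m3 ** 0.9

study_concentrations = [0.0, 10.0, 50.0, 200.0]      # mg/m3, hypothetical inhalation study
converted_doses = [route_converted_dose(c) for c in study_concentrations]
print(converted_doses)
# The converted doses, not the original concentrations, are supplied to the BMD
# model, so any nonlinearity in the conversion is carried into the analysis.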
                                      APPENDIX B

                SELECTING THE BENCHMARK RESPONSE (BMR) LEVEL
  4     1. Biologically Significant Change for Specifying the BMR
       Continuous endpoints differ from dichotomous endpoints in the way adversity is specified
  6     and in the way BMRs can be expressed. Whereas only adverse dichotomous endpoints are
  7     selected for consideration in a dose-response assessment, often the adversity of a continuous
  8     endpoint depends upon the magnitude of the response. This can be manifested in two general
  9     ways. For some continuous endpoints, the NOAEL is defined to be the highest dose at which the
difference between treated and control groups does not exceed the criterion for a biologically
 11      significant change. In essence, smaller changes are not considered adverse.  For example, a
 12      decrease in mean adult body weight usually is not considered adverse unless it is at least 10% of
 13     the control mean.  In such cases, the BMR should correspond to that biologically significant
 14      magnitude of change. Selection of the biologically significant magnitude of change is an issue for
specialists in various fields of toxicology.  Unfortunately, data have seldom been considered in this
 16      manner, so that it is difficult to get an answer to the question, "How much of a change is
 17      biologically significant and should be considered an adverse effect?"  The more usual situation is
 18      that the magnitude of change considered biologically significant is based on statistical significance.
 19            Other continuous endpoints are further classified into  adverse and non-adverse values by
20      specifying a cut-off value that distinguishes the two categories.  For example, human infants
21      weighing less than 2.5 kg at birth have been labeled "low-birth weight," an adverse outcome
useful in epidemiological studies of effects of environmental agents on human birth outcomes.
Thus, while for dichotomous endpoints, BMRs are expressed in terms of a dose-related increase
in the incidence of adverse outcomes, BMRs for continuous outcomes may be expressed either in
terms of a dose-related change in the mean or, as for dichotomous endpoints, an increase in the
incidence over background of the adverse outcome.  This latter approach for continuous
27      endpoints mandates a choice of the cutoff distinguishing adverse from non-adverse as well as the
28      choice of the BMR.                  .                                        .
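As a sketch of how a cutoff converts a continuous endpoint into an incidence (Python), assuming the endpoint is approximately normally distributed within each dose group; the means, standard deviation, and 2.5 kg cutoff below are illustrative only.

import math

def norm_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def incidence_below_cutoff(mean, sd, cutoff):
    # Probability that an individual value falls below the adverse cutoff,
    # assuming approximate normality within the dose group.
    return norm_cdf((cutoff - mean) / sd)

# Hypothetical birth-weight-style example with a 2.5 kg cutoff for the adverse outcome.
background = incidence_below_cutoff(mean=3.4, sd=0.5, cutoff=2.5)
exposed = incidence_below_cutoff(mean=3.1, sd=0.5, cutoff=2.5)
print(round(background, 3), round(exposed, 3), round(exposed - background, 3))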
   l      2.  Limit of Detection for Specifying the BMR
   2             The concept of "limit of detection" for a toxicological bioassay needs to be refined before
it can be used to set a specific BMR.  Consider a continuous sequence of populations, identical to
each other except for the dose of some toxic agent.  At least after a threshold has been exceeded,
and in the absence of masking effects, the incidence of adverse effects (for dichotomous and
dichotomized continuous endpoints) and the mean of continuous endpoints will increase (or
         decrease) as dose increases.  As the difference between the response in the control population and
         in a dosed population increases, so does the probability that a statistical test of that difference,
         carried out on samples from those populations (as in a toxicological dose-response study), will
         indicate a significant difference. The probability that the statistical test will be significant when
         there is a true difference between underlying populations is called the "power"  of the test, and
         depends upon the experimental design. This same concept of power is used to determine sample
         size when designing studies, when an effort is made to set a sample size that will give a relatively
 14     high power (usually 80% - 90%) to detect a given magnitude of effect.  Note that, even in the
 15      absence of a dose-related effect, a statistical test has some probability of indicating a statistically
        significant difference.  This is variously referred to as the size of the test, or the Type I error, or
the "alpha" level of the test (very often, it takes the value of 0.05; i.e., there is a 5% probability of
 18      detecting an effect). A way to quantify what is meant by the limit of detection of a bioassay
        design, then, is by specifying how frequently we would expect to distinguish a treatment group
        with a response that is just detectable from a control group. Thus, for example, we might decide
        that a response was at the limit of detection when we could distinguish from background in 50%
22      of the experiments (power of 0.50) in which we attempt it.
23             To use this concept of limit of detection to specify the BMR for a specific endpoint and
24      species, first a power level is chosen to represent the limit of detection. This should be done in a
25      larger context than just the specific study at hand, and the selection should satisfy the goal for the
26      overall average BMD/C:NOAEL ratio of one.  The next step uses the  sample size and nesting
27.      structure (e.g., litters within darns, repeated measures) for a typical "good4" study of the endpoint
                                                         . .                 -             -  '    -
and species of interest, and historical values of the measurements, such as background incidence or control group variance, that are necessary for power calculation. With these, one finds the magnitude of response just detectable in a two-group design (control and one treatment group) using a one-sided test with a Type I error of 0.05 and the predetermined power (e.g., 0.50). Finally, this change is expressed either as a change in the mean (for continuous endpoints) or as additional or extra risk (for dichotomous endpoints).

       ⁴Using a cadre of standard study designs for a variety of endpoints is one way to reduce the uncertainty that can be introduced when different individuals make decisions from limits of detection. We assume here the use of Agency testing protocols as the basis for standard "good" study designs. In areas where standard testing protocols have not been developed, the Agency encourages activities that can assist in identifying a "most common well-designed protocol" for typically-studied endpoints.
       Additional risk and extra risk are two ways to quantify the increment to the background risk of an adverse outcome for a dichotomous endpoint. Their definitions are:

              Additional Risk at dose d = P(d) - P(0), and

              Extra Risk at dose d = [P(d) - P(0)] / [1 - P(0)],

where P(d) is the proportion of animals, given dose d, that have an adverse response.
Additional risk is the proportion of responders in the exposed group beyond that in the control group, and extra risk is the proportion of animals responding that would not otherwise have responded, under the assumption that the processes that lead to the adverse outcome in unexposed subjects are independent of the processes that lead to the adverse outcome in the exposed subjects (see Figure 2). The greater the background incidence, the greater the difference between extra and additional risk. If there are no responders in the control group [P(0)=0], there is no difference between extra and additional risk. For an effect with an incidence of 50% in the control group and 55% in the exposed group, the additional risk is 5% and the extra risk is 10%. Likewise, for a 90% background and a 1% increase in the exposed group, the additional risk is 1% and the extra risk is 10%. The Agency has used extra risk models in the past for most animal-based
[Figure 2. Dose-response curves for models incorporating different forms of spontaneous background response:
       (a) No background response (P*(d) = P(d)).
       (b) Independent background response (P*(d) = γ + (1 - γ)P(d)); extra risk.
       (c) Additive background response (P*(d) = P(d + δ)).
       (d) Additional background response (P*(d) = γ + P(d)).]
cancer risk assessments and for most work with BMD/C analyses to date. EPA-supported research on developmental toxicity data (Allen et al., 1994a, b; Kavlock et al., 1995) used additional risk, but since the background incidences in the data used were relatively low, the difference between additional and extra risk is likely to be minimal.
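       The two quantities are simple to compute directly. The short Python sketch below (illustrative only; the helper names are ours, not part of any Agency software) restates the definitions above and reproduces the 50%-to-55% and 90%-to-91% examples.

def additional_risk(p0, pd):
    """Additional risk: P(d) - P(0)."""
    return pd - p0

def extra_risk(p0, pd):
    """Extra risk: [P(d) - P(0)] / [1 - P(0)]."""
    return (pd - p0) / (1.0 - p0)

# 50% background, 55% in the exposed group: additional risk ~0.05, extra risk ~0.10
print(additional_risk(0.50, 0.55), extra_risk(0.50, 0.55))
# 90% background, 91% in the exposed group: additional risk ~0.01, extra risk ~0.10
print(additional_risk(0.90, 0.91), extra_risk(0.90, 0.91))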
       It has been proposed that the extra risk approach is consistent with the assumption that independent mechanisms are responsible for the background incidence and the excess incidence of effect in the exposed group. Although the basis for this explanation has not been well described (Hoel, 1979), it could be argued that the effects of exposure would be masked in proportion to the background response, since independent mechanisms would not be assumed to have an additive effect. The true exposure-related response would then be reflected by the extra risk model. There is no correspondingly simple interpretation of additional risk; its use seems primarily to reflect computational convenience.
       There is limited basis for a science policy choice as to whether to use an additional or extra risk approach for BMD/C analyses. The choice of the extra risk model would be conservative in the sense that, for a given increment of risk, say 10%, the BMD/C corresponding to an extra risk of 10% would always be equal to or less than that corresponding to an additional risk of 10%. When the BMR is set based on limit of detection considerations, however, it does not matter which of the two risk formulations is used; the resulting BMD/C will be the same, because the values of the BMR based on extra risk and on additional risk that correspond to the same limit of detection differ accordingly (see the example below). The default approach for BMD/C analysis of dichotomous outcomes is to use extra risk to specify BMRs.
3. Examples
       Suppose the endpoint of interest has a dichotomous response and no nested structure, so that we would normally assume that the response has a binomial distribution. Fleiss (1981) gives an approximate formula for the power of a two-sample test in such a situation, when each group is the same size. This is a function of sample size, the probability of response in each group, and the Type I error of the test. This formula (Fleiss, 1981) can be used for the task at hand by fixing the incidence of the adverse outcome in the control group, the sample size, and the desired Type I error, and then, either by trial and error or by some more efficient search method, solving for the treatment-group response that would be detectable with the specified power under those conditions. Figure 3 shows the result of doing this for a range of values for control incidence, a Type I error of 0.05 in a one-sided test, a sample size of 25 per group, and three powers: 0.2, 0.5, and 0.8. The "barely detectable response" is plotted both as extra risk (top panel) and as additional risk (lower panel). Note that, for a given control incidence, the same treatment group incidence results in a higher value for extra risk than for additional risk, because of the "1 - P(0)" term in the denominator for extra risk.
       As an example of how to read this graph, suppose we are looking at data for an endpoint with a background incidence that is usually around 10%, and which is usually measured in a study with about 25 animals per group. Go to the panel that corresponds to the way you want to express risk ("extra" or "additional"), find 0.1 on the x-axis (there is a vertical dotted line there), and follow it up until it crosses one of the plotted curves. The vertical line crosses the thinnest, bottommost curve at an additional risk of about 0.13 or an extra risk of about 0.15. Since this curve corresponds to a power of 0.2, this means that one in five (20%) of experiments in a situation with a control incidence of 0.1, an additional risk in the treatment group of 0.13 (that is, a treatment response of about 0.23), 25 animals per group, and a one-sided test with a Type I error of 0.05 would indicate that the treatment response was greater than control. To raise the proportion of experiments that indicate that the treatment response was greater than control to one in two (50%), the additional risk for the treatment group would have to be about 0.23, and to raise it to eight out of ten (a typical pre-design-stage sort of power), the additional risk would have to be about 0.35.

[Figure 3. Treatment-group response just detectable with powers of 0.2, 0.5, and 0.8, plotted as extra risk (top panel) and additional risk (bottom panel) against control incidence, for 25 animals per group and a one-sided test with a Type I error of 0.05.]
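       The "barely detectable response" can also be approximated numerically. The sketch below is an assumption-laden illustration, not the method used to produce Figure 3: it estimates power by Monte Carlo simulation of a one-sided two-proportion z-test (no continuity correction) and searches for the smallest treated-group incidence reaching a target power. Because the figure was based on the Fleiss (1981) approximate formula, which includes a continuity correction, the values obtained here will differ somewhat from those read off the figure.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def power_one_sided(p0, p1, n, alpha=0.05, nsim=20000):
    """Monte Carlo power of a one-sided two-proportion z-test (pooled variance)."""
    x0 = rng.binomial(n, p0, nsim)
    x1 = rng.binomial(n, p1, nsim)
    pbar = (x0 + x1) / (2.0 * n)
    se = np.sqrt(2.0 * pbar * (1.0 - pbar) / n)
    z = np.zeros(nsim)
    ok = se > 0
    z[ok] = (x1[ok] - x0[ok]) / (n * se[ok])
    return np.mean(z > norm.ppf(1.0 - alpha))

def detectable_incidence(p0, n, target_power, alpha=0.05):
    """Smallest treated-group incidence (on a 0.01 grid) detected with the target power."""
    for p1 in np.arange(p0 + 0.01, 1.0, 0.01):
        if power_one_sided(p0, p1, n, alpha) >= target_power:
            return p1
    return float("nan")

p1 = detectable_incidence(p0=0.10, n=25, target_power=0.50)
print(f"treated incidence ~{p1:.2f}; additional risk ~{p1 - 0.10:.2f}; extra risk ~{(p1 - 0.10) / 0.90:.2f}")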
       If the magnitude of treatment effect (difference between the mean responses of treatment and control groups) that is deemed biologically significant is the smallest treatment effect that a standard experimental design for that endpoint could detect, the BMR would be based on the limit of resolution of the standard bioassay for looking at that endpoint in the relevant test species. For example, for average litter sizes, the variance of litter means of term fetus weight for CD-1 mice is about 0.008 with a mean of about 1.02 gm (Setzer and Rogers, 1991). A commonly used sample size formula (Kupper and Hafner, 1989) can be rearranged to express the difference between two
means just detectable with a test of a specified Type I error and power:

       δ = (Z(1-α) + Z(1-β)) × σ × √(2/n),

where δ is the desired difference between control and treatment group means, σ is the control group standard deviation, n is the number of animals in a group, α is the Type I error, 1 - β is the desired power, and Z(1-α) is the number from a table of the normal distribution such that the probability that a standard normal random variable is less than Z(1-α) is 1 - α.
       This gives a detectable difference of about 0.04 grams (about 4%) using a test with a Type I error of 0.05 and a power of 0.5 (for powers of 0.2 and 0.8 the values are 0.02 gm and 0.08 gm, respectively). If it were decided that the limit of detectability would be based on a difference detectable with a power of 0.5, then, to reflect the limit of detectability of the conventional assay, the BMR would be set at a difference of 0.04 gm.
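       As a check on the arithmetic, the standard two-sample relationship written above can be evaluated directly. The sketch below is illustrative only: the group size n is an assumed value (the document does not state the group size used), and the Kupper and Hafner (1989) formula cited in the text carries additional correction terms, so this will not reproduce every figure quoted above exactly.

from math import sqrt
from scipy.stats import norm

def detectable_difference(sigma, n, alpha=0.05, power=0.5):
    """Difference in means just detectable by a one-sided two-sample test."""
    return (norm.ppf(1.0 - alpha) + norm.ppf(power)) * sigma * sqrt(2.0 / n)

sigma = sqrt(0.008)    # control SD of litter-mean fetal weight (variance 0.008)
n = 27                 # assumed litters per group (illustrative value only)
for pw in (0.2, 0.5, 0.8):
    print(pw, round(detectable_difference(sigma, n, power=pw), 3))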
4. Selecting the Critical Power Level for the Limit of Detectability
       The critical power level to use for the "limit of detectability" with any particular study design needs to be determined in advance of applying this approach to setting the BMR. This could be determined using data from a large number of studies of similar "standard" design, as was done for developmental toxicity data (Allen et al., 1994a, b; Kavlock et al., 1995), or using simulation techniques based on standard study design and background incidence. Other requirements for setting the BMR using this approach are general agreement on the details of those aspects of endpoint behavior that affect the power calculation: for example, the "typical" study design for evaluating an endpoint in a given species; for dichotomous responses, typical background incidences; and for continuous responses, typical control variance levels and changes considered to be "biologically significant" or cutoff levels distinguishing adverse from non-adverse values of the continuous endpoint.
APPENDIX C

MATHEMATICAL MODELING

1. Introduction
       Dose-response models for toxicology data are usually of the type called "nonlinear" in mathematical terminology. In a linear model, the value the model predicts is a linear combination of the parameters. For example, in a linear regression of a response y on dose, the predicted value is a linear combination of a and b, namely, a×1 + b×dose. Note that even a quadratic or other polynomial is a linear model in this sense: y = a + b×dose + c×dose^2 + d×dose^3 is a third-order polynomial (a cubic) equation, but is still a linear combination of the parameters a, b, c, and d. In contrast, in a nonlinear model, for example the log-logistic with background,

       P(dose) = P0 + (1 - P0) / (1 + e^-[a + b×ln(dose)]),

the response is not a linear combination of the parameters (here, P0, a, and b). The distinction is important because nonlinear models are usually more difficult to fit to data, requiring more complicated calculations, and statistical inference is more typically approximate than with linear models. Note that this definition of "linear" is in contrast to the way the term is used in reference to cancer dose-response assessment, in which the phrase "low-dose linear" refers to models in which the slope is positive at zero dose.
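       To make the distinction concrete, the following Python snippet (illustrative only) writes out the two forms just described: the cubic is linear in its parameters a, b, c, and d, while the log-logistic with background is not linear in P0, a, and b.

import numpy as np

def cubic(dose, a, b, c, d):
    """Linear model (in the parameters): a + b*dose + c*dose^2 + d*dose^3."""
    return a + b * dose + c * dose**2 + d * dose**3

def log_logistic(dose, p0, a, b):
    """Nonlinear model: P(dose) = p0 + (1 - p0) / (1 + exp(-(a + b*ln(dose))))."""
    dose = np.asarray(dose, dtype=float)
    p = np.full_like(dose, p0)
    pos = dose > 0
    p[pos] = p0 + (1.0 - p0) / (1.0 + np.exp(-(a + b * np.log(dose[pos]))))
    return p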
       This section will discuss some aspects of model fitting: initial model selection, approaches to fitting models to data, ways to select among several models fitted to the same data set, and calculation of confidence intervals.
2. Model Selection
       The initial selection of a group of models to fit to the data is governed by the nature of the measurement that represents the endpoint of interest and the experimental design used to generate the data. In addition, certain constraints on the models or their parameter values sometimes need to be observed, and may influence model selection. Finally, it may be desirable to model multiple
endpoints at the same time. The diversity of possible non-cancer endpoints and shapes of their dose-responses for different toxicants precludes specifying a small set of models for further consideration. This will inevitably lead to the use of judgement and occasional ambiguity when selecting a final model to calculate a BMD/C. It is hoped that, as experience using benchmark dose methodology in dose-response assessment accumulates, it will be possible to narrow the number of acceptable models.
       a. Type of endpoint
       The kind of measurement variable that represents the endpoint of interest is an important consideration in selecting mathematical models. Commonly, such variables are either continuous, like liver weight or the specific activity of a given liver enzyme, or discrete, commonly dichotomous, like the presence or absence of abnormal liver status. However, other types are common in biological data; for example: ordered categorical, like a histology score that ranges from 1 (normal) to 5 (extremely abnormal); counts, such as counts of deaths or the numbers of cases of illness per thousand person-years of exposure to a given exposure condition; waiting times, such as the time it takes for an illness to appear after exposure, or age at death; or multiple endpoints considered jointly (see, for example, Krewski and Zhu, 1995; Lefkopoulou et al., 1989). It is beyond the scope of this document to consider all possible kinds of variables that might be encountered, so further discussion will concentrate on dichotomous and continuous variables.
       Dichotomous variables. Data on dichotomous variables are commonly presented as a fraction or percent of individuals that present with the given condition at a given dose or exposure level. For such endpoints, we normally select probability models like the logistic, probit, Weibull, and so forth, whose predictions lie between zero and one for any possible dose, including zero. The natural form of some models, such as the log-logistic model, presumes that the proportion of controls with the abnormal response (an estimate of background) will always be zero. The default approach taken in this document is not to include a background term in the model unless there is evidence of a background response, or if the first two or more dosed groups do not show a monotonic increase in response. This is an area where more work is needed.
       Continuous variables. Data for continuous variables are often presented as means and standard deviations or standard errors, but may also be presented as a percent of control or some other
standard. From a modeling standpoint, the most desirable form for such data is by individual. Unlike the usual situation for dichotomous variables, summarization of continuous variables results in a loss of information about the distribution of those variables.
       The preferred approach to expressing the BMR will determine the approach to modeling continuous data. Two broad categories of approach have been proposed: 1) to express the BMR as a particular change in the mean response, possibly as a fraction of the control mean, a fraction of the standard deviation of the measurement from untreated individuals, or a level of the response that expert opinion holds is adverse; or 2) to decide on a level of the outcome to consider adverse, and treat the proportion of individuals with the adverse outcome much as one would a dichotomous variable (see Appendix B above).
       Typical models to use in the first situation include linear and polynomial models, and power models or other nonlinear models. In the second situation, one approach is to classify each individual as affected or not, and model the resulting variable as dichotomous. An alternative is to use a hybrid approach, such as that described by Gaylor and Slikker (1990), Kodell et al. (1995), and Crump (1995), which fits continuous models to continuous data and, presuming a distribution of the data, calculates a BMD/C in terms of the fraction affected.
       b. Experimental design
       The aspects of experimental design that bear on model selection include the total number of dose groups used and possible clustering of experimental subjects. The number of dose groups has a bearing on the number of parameters that can be estimated: the number of parameters that affect the overall shape of the dose-response curve generally cannot exceed the number of dose groups.
       Clustering of experimental subjects is actually more of an issue for methods of fitting the models than for choice of the model form itself. The most common situation in which clustering occurs is in developmental toxicology experiments, in which the toxicant is applied to the mother, and individual offspring are examined for adverse effects. Another example is for designs in which individuals yield multiple observations (repeated measures). This can happen, for example, when each subject receives both treatment and control (common in studies with human subjects), or when each subject is observed multiple times after treatment (e.g., neurotoxicity studies). The issue
in all these examples is that individual observations cannot be taken as independent of each other. Most methods used for fitting models rely heavily on the assumption that the data are independent, and special fitting methods need to be used for data sets that exhibit more complicated patterns of dependence (see, for example, Ryan, 1992; Davidian and Giltinan, 1995).
       c. Constraints and covariates
       An obvious constraint on models for dichotomous data has already been discussed; probabilities are constrained to be positive numbers no greater than one. However, biological reality may impose other constraints on models. For example, most biological quantities are constrained to be positive, so models should be selected so that their predicted values, at least in the region of application, conform to that constraint. In models in which dose is raised to a power which is a parameter to be estimated (such as a Weibull model), if that parameter is allowed to be less than 1.0, the slope of the dose-response curve becomes infinite at a dose of zero. This is often seen as an undesirable situation, and the default is to constrain these parameters to be greater than or equal to 1.
       It is sometimes desirable to include covariates on individuals when fitting dose-response models. For example, litter size has often been included as a covariate in modeling laboratory animal data in developmental toxicity. Another example is in modeling epidemiology data, when certain covariates (e.g., age, parity) are included that are expected to affect the outcome and might be correlated with exposure. In continuous models, if the covariate has an effect on the response, including it in a model may improve the precision of the overall estimate by accounting for variation that would otherwise end up in the residual variance. In any kind of model, any variable that is correlated with dose, and which affects outcome, would need to be included as a covariate.
3. Model Fitting
       The goal of the fitting process is to find values for all the model parameters so that the resulting fitted model describes those data as well as possible; this is termed "parameter estimation." In practice, this happens when the dose-group means predicted by the model come as close as possible to the data means. One way to achieve this is to write down a function of all the parameters (the objective function) and all the data, with the property that the parameter
values that correspond either to an overall minimum (or, equivalently, an overall maximum) of the function, or that result in function values of zero, give the desired model predictions. The actual fitting process is carried out iteratively. Many models will converge to the right estimates for most data sets from just about any reasonable set of initial parameter values; however, some models, and some data sets, may require multiple guesses at initial values before the model converges. It also happens occasionally that the fitting procedure will converge to different estimates from different initial guesses. Only one of these sets of estimates will be "best". It is always good practice when fitting nonlinear models to try different initial values, just in case.
       There are a few common ways to construct objective functions (estimates): the methods of nonlinear least squares, maximum likelihood, and generalized estimating equations (GEE). The choice of objective function is determined in large part by the nature of the variability of the data around the fitted model. The method of nonlinear least squares, where the objective function is the sum of the squared differences between the observed data values and the model-predicted values, is a common method for continuous variables when observations can be taken as independent. A basic assumption of this method is that the variance of individual observations around the dose-group means is constant across doses. When this assumption is violated (commonly, when the variance of a continuous variable changes as a function of the mean, often proportional to the square of the mean, giving a constant coefficient of variation), a modification of the method may be used in which each term in the sum of squares is weighted by the reciprocal of an estimate of the variance at the corresponding dose. This method is especially appropriate when the data to be fitted can be supposed to be at least approximately normally distributed.
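       As an illustration of the weighted least-squares idea, the sketch below fits a simple power model to hypothetical dose-group summaries (not data from this document) using scipy's curve_fit, weighting each group mean by its standard error through the sigma argument (equivalently, inverse-variance weighting of the squared residuals); the power parameter is constrained to be at least 1, consistent with the default constraint discussed earlier.

import numpy as np
from scipy.optimize import curve_fit

doses   = np.array([0.0, 10.0, 50.0, 150.0])   # hypothetical dose groups
means   = np.array([3.70, 3.55, 3.30, 2.90])    # hypothetical group means
sds     = np.array([0.30, 0.28, 0.30, 0.33])    # group standard deviations
nper    = np.array([20, 20, 20, 20])             # animals per group
se_mean = sds / np.sqrt(nper)                    # standard error of each group mean

def power_model(d, a, b, c):
    """Continuous power model: mean(d) = a + b * d^c."""
    return a + b * np.power(d, c)

params, cov = curve_fit(power_model, doses, means, p0=[3.7, -0.005, 1.2],
                        sigma=se_mean,
                        bounds=([-np.inf, -np.inf, 1.0], [np.inf, np.inf, 18.0]))
print(params)   # fitted a, b, c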
       Maximum likelihood is a general way of deriving an objective function when a reasonable supposition about the distribution of the data can be made. Because estimates derived by maximum likelihood methods have good statistical properties, such as asymptotic normality, maximum likelihood is often a preferred form of estimation when that assumption is reasonably close to the truth. An example of such a situation is the case of individual, independently treated animals (e.g., not clustered in litters) scored for a dichotomous response. Here it is reasonable to suppose that the number of responding animals follows a binomial distribution with the probability of response expressed as a function of dose. Continuous variables, especially means of several observations, are often normal (Gaussian) or log-normal. When variables are normally distributed with a constant variance, minimizing the sum of squares is equivalent to maximizing the likelihood, which explains, in part, why least squares methods are often used for continuous variables. In developmental toxicity data, the distribution of the number of animals with an adverse outcome is often taken to be approximately beta-binomial. This particular likelihood is used to accommodate the lack of independence among littermates.
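       A compact sketch of the simplest case described above, independent animals scored for a dichotomous response, is given below. The dose groups, group sizes, and counts are hypothetical; the model is the log-logistic with background written earlier in this appendix, and the binomial log-likelihood is maximized numerically with scipy.optimize.minimize.

import numpy as np
from scipy.optimize import minimize

doses = np.array([0.0, 25.0, 100.0, 400.0])   # hypothetical dose groups
n     = np.array([50, 50, 50, 50])             # animals per group
cases = np.array([2, 5, 14, 33])               # hypothetical responders

def predicted(params, d):
    """Log-logistic with background: p0 + (1 - p0)/(1 + exp(-(a + b*ln(d))))."""
    p0, a, b = params
    p = np.full_like(d, p0, dtype=float)
    pos = d > 0
    p[pos] = p0 + (1.0 - p0) / (1.0 + np.exp(-(a + b * np.log(d[pos]))))
    return np.clip(p, 1e-12, 1.0 - 1e-12)

def negloglik(params):
    """Negative binomial log-likelihood over the dose groups."""
    p = predicted(params, doses)
    return -np.sum(cases * np.log(p) + (n - cases) * np.log(1.0 - p))

fit = minimize(negloglik, x0=[0.05, -5.0, 1.0],
               bounds=[(1e-6, 0.999), (-20.0, 20.0), (1e-6, 20.0)], method="L-BFGS-B")
print(fit.x, -fit.fun)   # parameter estimates and maximized log-likelihood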
       A third group of approaches to estimating parameters comprises the related quasi-likelihood methods (McCullagh and Nelder, 1989) and the method of GEE (see Zeger and Liang, 1986), which require only that the mean, variance, and correlation structure of the data be specified. GEE methods are similar to maximum likelihood estimation procedures in that they require an iterative solution, provide estimates of standard errors and correlations of the parameter estimates, and yield estimates that are asymptotically normal. Their use so far has primarily been to handle forms of lack of independence, as in litter data, and they would be useful in any of a number of kinds of repeated measures designs, such as occur in clinical studies and repeated neurobehavioral testing.
4. Assessing How Well the Model Describes the Data
       An important criterion is that the selected model should describe the data, especially in the region of the BMR. Most fitting methods will provide a global goodness-of-fit measure, usually associated with a P-value. These measures quantify the degree to which the dose-group means that are predicted by the model differ from the actual dose-group means, relative to how much variation of the dose-group means one might expect. Small P-values (say, P < 0.05) indicate that it would be unlikely to achieve a value of the goodness-of-fit statistic at least this extreme if the data were actually sampled from the model and, consequently, that the model is a poor fit to the data. Larger values cannot be compared from one model to another, since they assume the different models are correct; they can only identify those models that are consistent with the experimental results. When there are other covariates in the models, such as litter size, the idea is the same, just more complicated to calculate. In this case, the range of doses and other covariates is broken up into cells, and the number of observations that fall into each cell is compared to that predicted by the model.
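       For dichotomous data fit at the group level, the global check described above can take the form of a Pearson chi-square comparison of observed and model-predicted counts. The sketch below uses hypothetical counts and predicted probabilities; the degrees of freedom are the number of dose groups minus the number of estimated parameters.

import numpy as np
from scipy.stats import chi2

n        = np.array([50, 50, 50, 50])               # animals per dose group (hypothetical)
observed = np.array([2, 5, 14, 33])                 # observed responders (hypothetical)
p_hat    = np.array([0.045, 0.105, 0.270, 0.655])   # model-predicted probabilities (hypothetical)

expected = n * p_hat
pearson  = np.sum((observed - expected) ** 2 / (n * p_hat * (1.0 - p_hat)))
df       = len(n) - 3                               # e.g., 4 groups minus 3 fitted parameters
p_value  = chi2.sf(pearson, df)
print(round(pearson, 2), df, round(p_value, 3))     # a small p-value (< 0.05) signals lack of fit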
       It can happen that the model is never very far from the data points (so the P-value for the goodness-of-fit statistic is not too small), but is always on one side or the other of the dose-group means. Also, there could be a wide range in the response, and the model predicts the high responses well but misses the low-dose responses. In such cases, the goodness-of-fit statistic might not be significant, but the fit should be treated with caution.
       The best way to detect the form of these deviations from fit is with graphical displays. Plots should always supplement goodness-of-fit testing. It is extremely helpful for plots that include data points to also include a measure of dispersion of those data points, such as confidence limits for the means, if the data points are means.
       In certain cases, the typical models for a standard study design cannot be used with the observed data, as, for example, when the data are nonmonotonic, or when the response rises abruptly after some lower doses that give only the background response. In these cases, adjustments to the data (e.g., a log-transformation of dose) or the model (e.g., adjustments for unrelated deaths) may be necessary. In the absence of a mechanistic understanding of the biological response to a toxic agent, evidence from exposures that give responses much more extreme than the BMR does not really tell us very much about the shape of the response in the region of the BMR. Such exposures, however, may very well have a strong effect on the shape of the fitted model in the region of the BMD/C. Thus, if lack of fit is due to characteristics of the dose-response data for high doses, the data may be adjusted by eliminating the high dose group. The practice carries with it the loss of a degree of freedom, but may be useful in cases where the response plateaus or drops off at high doses. Alternatively, an entirely different model could be fit.
5. Comparing Models
       It will often happen that several models provide an adequate fit to a given data set. These models may be essentially unrelated to each other (for example, a logistic model and a probit model often do about as well at fitting dichotomous data), or they may be related to each other in the sense that they are members of the same family that differ in which parameters are fixed at some default value. For example, one can consider the log-logistic, the log-logistic with non-zero background, and the log-logistic with threshold and non-zero background to all be members of the same family of models. Goodness-of-fit statistics are not designed to compare different models, so alternative approaches to selecting a model to use for BMD/C computation need to be
pursued.
       When other data sets for similar endpoints exist, an external consideration can be applied. It may be possible to compare the results of BMD/C computations across studies of all the data that were fit using the same form of model, presuming that a model can be found that describes all the data sets. Another consideration is the existence of a conventional approach to fitting a kind of data. In this case, communication with specialists in that kind of data is eased when a familiar model is used to fit the data. Neither of these considerations should be seen as justification for using ill-fitting models. Finally, it is generally considered preferable to use models with fewer parameters, when possible.
       Generally, both in the method of least squares and in maximum likelihood methods, the objective function will appear to improve as additional parameters are introduced within a family of models. This apparent improvement can occur simply because the additional parameters give the model more flexibility. Likelihood ratio tests can be used to evaluate whether the improvement in fit afforded by estimating additional parameters is justified. Such tests cannot be applied to compare models from different families, however. Some statistics, notably Akaike's Information Criterion (AIC), can be used to compare models with different numbers of parameters fitted using a similar fitting method (for example, least squares or a binomial maximum likelihood). Although such methods are not exact, they can provide useful guidance in model selection.
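       The two comparisons just described can be written out directly. In the small sketch below, the maximized log-likelihood values are hypothetical and are used only to show the arithmetic of a likelihood ratio test for nested models and of an AIC comparison.

from scipy.stats import chi2

ll_reduced, k_reduced = -102.4, 2   # hypothetical: log-logistic, no background (2 parameters)
ll_full,    k_full    = -100.1, 3   # hypothetical: log-logistic with background (3 parameters)

# Likelihood ratio test (valid only for nested models)
lrt = 2.0 * (ll_full - ll_reduced)
p_value = chi2.sf(lrt, df=k_full - k_reduced)
print("LRT:", round(lrt, 2), "p =", round(p_value, 3))

# AIC = 2k - 2*logL; smaller is preferred, and nesting is not required
for name, ll, k in [("reduced", ll_reduced, k_reduced), ("full", ll_full, k_full)]:
    print(name, "AIC =", round(2 * k - 2 * ll, 1))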
6. Using Confidence Limits to Get a BMD/C
       Determining a BMD/C requires the choice of a model together with a suitable method to calculate a confidence limit, as well as the choice (or prior specification) of the level of confidence and the magnitude of the endpoint. Confidence limits bracket those models which, within a particular model family, are consistent with the data. They do not make statements about the extent to which a model is likely to be the true one (see Cox and Hinkley, 1974).
       The choice of endpoint or its severity is discussed elsewhere in this document. The "level" at which confidence limits are set has tended to be 95% for a variety of applications (e.g., prediction of future monitoring values, human cancer prediction). This level reflects the extent to which outlying values are tolerable for the system and is chosen to cover some reasonable amount of the distribution of the source of the modeled data. This cannot account for or assume any correspondence between the modeled animal data and the human population of concern. Rather, the "confidence" associated with an interval indicates the percent of repeated intervals, based on experiments of the same sort, that are expected to cover (include) the dose associated with the BMR. The choice of confidence level represents tradeoffs in data collection costs and the needed data precision, just as hypothesis-testing levels do. Just as 0.05 is a convenient (but not necessarily good for all data) level for tests, 95% is a convenient choice for most limits. An example of the effects of the choice among 90, 95, and 99% for two different models may be found in the background document (EPA, 1995c).
       The method by which the confidence limit is obtained is typically related to the manner in which the BMD/C is estimated from the model. Historically, most Agency modeling has been likelihood based (see discussion below); mostly this reflects general trends in modeling. How well any method behaves (e.g., how narrow the confidence interval of fixed level is) relative to other methods with the same model can depend on how nearly the effect level for estimation is contained in the exposure range in the source study. Different types of models lend themselves to different types of estimation, depending on such things as the extent to which the model accommodates clustered data and whether the model equation can be restated in terms of the dose that is calculated to produce the specific effect. Software used to determine benchmark doses should identify the estimation procedure used and the available confidence limits.
       Several ways to derive model functions are described earlier in this Appendix. For both likelihood and estimating equation strategies, the most typical approach to constructing confidence limits relies on asymptotic normality to establish confidence sets, although neither strategy requires that the actual endpoint of interest be a continuous variable.
       With regard to likelihood-based methods, confidence intervals (CIs) based on the asymptotic distribution of the likelihood ratio are preferred to those based on the asymptotic distribution of the MLEs, because they can use a commonly tabled distribution function, but both can give problems in ranges where the assumptions needed to use asymptotic theory begin to weaken (e.g., as sample sizes decrease, as interest focuses farther from the experimental doses, as observations become more correlated). In general, however, it is preferred to base CIs for parameters estimated by maximum likelihood across various data contexts on the asymptotic distribution of the likelihood ratio, owing to their tendency to give better coverage behavior⁵ (Crump and Howe, 1985).
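       As an illustration of a likelihood-ratio-based limit, the sketch below (hypothetical data; a deliberately simplified grid search, not the Agency's software) reparameterizes a log-logistic model without background so that the BMD for a 10% extra risk is itself a parameter, profiles the binomial log-likelihood over a grid of BMD values, and takes the one-sided 95% lower limit (BMDL) where the profile drops by half the 90th percentile of a chi-square with one degree of freedom, in the spirit of the likelihood-ratio approach of Crump and Howe (1985). The slope is constrained to be at least 1, consistent with the default constraint discussed earlier.

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

doses = np.array([0.0, 50.0, 100.0, 200.0])   # hypothetical dose groups
n     = np.array([25, 25, 25, 25])             # animals per group
cases = np.array([0, 2, 6, 14])                # hypothetical responders
BMR   = 0.10                                   # benchmark response (extra risk)

def loglik(bmd, b):
    """Binomial log-likelihood with intercept a rewritten as a = logit(BMR) - b*ln(bmd)."""
    a = np.log(BMR / (1.0 - BMR)) - b * np.log(bmd)
    p = np.zeros_like(doses)
    pos = doses > 0
    p[pos] = 1.0 / (1.0 + np.exp(-(a + b * np.log(doses[pos]))))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(np.sum(cases * np.log(p) + (n - cases) * np.log(1.0 - p)))

def profile(bmd):
    """Maximize the log-likelihood over the slope b (constrained >= 1) for a fixed BMD."""
    res = minimize_scalar(lambda b: -loglik(bmd, b), bounds=(1.0, 20.0), method="bounded")
    return -res.fun

grid = np.linspace(5.0, 300.0, 600)
prof = np.array([profile(d) for d in grid])
mle_ll, bmd_mle = prof.max(), grid[prof.argmax()]

# One-sided 95% lower limit: smallest BMD at or below the MLE whose profile log-likelihood
# stays within chi-square(1, 0.90)/2 of the maximum.
cut = chi2.ppf(0.90, df=1) / 2.0
ok = (grid <= bmd_mle) & (prof >= mle_ll - cut)
bmdl = grid[ok].min()
print(f"BMD10 (MLE) about {bmd_mle:.0f}; BMDL10 about {bmdl:.0f}")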
       In the long run, there may be a preference for one type of construction with GEE (e.g., like Ryan, 1992, who uses the delta method and sample estimates). It still has to be demonstrated whether those types are similar in performance (e.g., coverage behavior) for their respective parameters (where GEE were used) to those from the asymptotic distribution of the likelihood ratio in the likelihood context.
       Thus, the BMD/C is determined by 1) selecting an endpoint, 2) identifying a BMR (a predetermined level of change in response relative to controls), 3) establishing, by an appropriate estimation procedure, a model that fits the data adequately, and 4) calculating a confidence limit at the BMR using the model and the same estimation procedure.
       At the time of this draft, some commercial software is available that is designed specifically for carrying out steps 3) and 4) by maximum likelihood methods. EPA is currently developing software for this purpose that will be made widely available to all potential users. GEE solutions, which might be used with nested data, involve iterative fitting after the method of weighted least squares, but standardized routines are available for certain contexts; these are written as commonly used software macros (e.g., in SAS) that must be inserted into user-fashioned programs.

       ⁵While we say there is, for instance, a 95% probability that our interval covers the BMR, the actual probability may be something else, sometimes even as low as 30% or 40%, although more usually in the 50% to 90% range. This can happen when we have not allowed for something like correlated observations in the model. Then it is said that the nominal and actual coverages diverge. It is not clear if anyone has looked at the extent of this divergence for various construction approaches for GEE CIs.
APPENDIX D

EXAMPLES OF BMD/C ANALYSES


Example #1: Carbon Disulfide

Summary of Main Points Illustrated

Determination of the BMR for a continuous variable based on biological significance of response.

Use of individual rather than group exposure and response data.

Summary of Data

Study: Occupational inhalation study (in humans) by NIOSH.
Endpoints modeled: Sural and peroneal nerve conduction velocity and amplitude.
Endpoint used: Peroneal nerve conduction velocity.

       The benchmark concentration for carbon disulfide was derived from nerve conduction velocity changes reported in the occupational study by Johnson et al., 1983. Johnson et al. reported several neurophysiological measurements in workers exposed for an average of 12 years. Endpoints affected by exposure included peroneal nerve conduction velocity and amplitude, and sural nerve conduction velocity. The published report presented exposure groups categorized as low, medium, and high exposure and presented mean and median exposures for each group (Table 1).
Table 1. Group data used in the Benchmark Dose Calculation for Carbon Disulfide, as summarized in Johnson et al., 1983.

                         Exposure Level (ppm)                   Peroneal NCV
Group                  Mean      SD      Median       N       Mean      SD
Comparison group         -        -       0.2        196      45.3      4.4
Low                     1.2      1.0      1.0         44      43.7      5.1
Medium                  5.1      4.1      4.1         56      43.4      4.8
High                   12.6     26.9      7.6         36      41.8      4.5
       The Chemical Manufacturers Association Carbon Disulfide Panel obtained the raw individual data from the Johnson study from NIOSH and performed a benchmark concentration analysis, which is presented in the report by Price and Berner, 1995.

Selection of the BMR

Benchmark response: 10% change (note that this BMR is greater than the response in the highest exposure group).

       The decision of the RfC work group was to use a 10% decrease in nerve conduction velocity as the definition of the BMR. The BMR is therefore at a higher level of response than the highest dose group in the study and was also higher than a statistically significant response level in the study. This decision was aided and supported by extensive discussion with neurotoxicologists from EPA's NHEERL, who attended the work group meeting. There was considerable discussion about the proper choice of the BMR because use of the BMC10 resulted in a BMC that was higher than any of the exposure group responses, i.e., the BMC involved extrapolation above the data points. This approach was supported on the grounds that the response observed in the high concentration group was considered to be, at most, mildly adverse (a peer reviewer's comment was that the effects in the Johnson study should be considered pre-adverse), because a change in conduction velocity of 10% is about where a clinician would begin to be concerned, and because a 10% change is about equal to one standard deviation for this endpoint. The use of the individual subject analysis was preferred because it allowed for age adjustment.

Calculation of the Benchmark Concentration

Model: polynomial model used; data fit with linear term only.

A. Group Level Data
       Analyses were done initially using the group level data based on the arithmetic mean exposures (with no variability estimate) using commercial software (Table 2). These results are presented in the internal EPA report "Benchmark Concentration Analysis for Carbon Bisulfide" (EPA, 1995e). (Also presented in that report are BMC analyses of the developmental studies by Tabacova and colleagues.)

B. Individual Data
       The Chemical Manufacturers Association Carbon Disulfide Panel performed a benchmark concentration analysis on the individual data, which allowed for an age adjustment to be included in the analysis and allowed for exposure variability to be considered because each subject had a unique exposure and effect measurement (Price and Berrier, 1995).

       Further analysis of the raw data by EPA involved correction of some missing or implausible data points and evaluation of the interaction of age and exposure. This analysis found

a greater decline in conduction velocity with age in exposed compared with control subjects, suggesting an interaction of age and exposure, and found cumulative exposure to better explain the response (see figure). The resulting BMCs are shown in Table 2.

Table 2. Peroneal Nerve Conduction Velocity Benchmark Concentration Results

Analysis            BMR      BMC (ppm)     BMC (mg/m3)
Group data          10%         12             37
Individual data     10%         20             62
       The difference in the BMCs between the analysis based on group data and that based on individual data is due mainly to the skewed distribution of concentrations in the defined exposure categories for the group analysis (see Table 1).
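       For orientation, the central calculation can be sketched with the published group means alone. The unweighted ordinary least-squares fit below is illustrative only: the reported BMCs came from weighted fits and from the individual, age-adjusted data, so this will not reproduce the values in Table 2. It shows how a concentration corresponding to a 10% decrease in mean peroneal NCV is read off a fitted line; the BMC itself is taken where the lower confidence limit on the fitted curve, not the central estimate, crosses the BMR (as in Figure 4).

import numpy as np

# Group-level summaries from Table 1; the comparison group's median (0.2 ppm) is used as its exposure
exposure = np.array([0.2, 1.2, 5.1, 12.6])
ncv_mean = np.array([45.3, 43.7, 43.4, 41.8])

# Ordinary least-squares line: NCV = intercept + slope * exposure
slope, intercept = np.polyfit(exposure, ncv_mean, 1)

bmr_level = 0.90 * intercept                     # 10% decrease from the modeled control mean
central_estimate = (bmr_level - intercept) / slope
print(round(central_estimate, 1), "ppm (central estimate only; the BMC uses the lower confidence limit)")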
Explanation of Figure 4:
       The top horizontal solid line is the mean adjusted nerve conduction velocity (NCV) for the "typical" subject (34 years old, 70 inches tall, 177 pounds). The horizontal dashed line and lower solid line show the BMRs of 5 and 10% reduction in peroneal NCV. The dashed sloping line shows the 95% confidence limit on the linear model using cumulative exposure. The vertical line connects the intersection of the lower confidence limit and the BMR of a 10% reduction to the x-axis, showing the BMC. The solid sloping line shows the lower confidence limit on the regression model using the mean exposure index, for comparison.
[Figure 4. Carbon Disulfide Modeling (see text for explanation); x-axis: Mean Exposure Index (ppm).]
Example #2: 1,1,1,2-Tetrafluoroethane (HFC-134a)

Summary of Main Points Illustrated

Setting the BMR based on default and limit of detection approaches for dichotomous data.

Modeling a response with a high background incidence.

Comparison of extra vs. additional risk.

Summary of Data

Study: Chronic rat inhalation.
Endpoints modeled: Leydig cell hyperplasia, incidence.
Endpoint used: same.

       The BMC for HFC-134a was based on the BMC calculated for testicular effects (Table 1) in a chronic study in rats (Hext and Parr-Dobrzanski, 1993; Collins et al., 1995).

Table 1. Data used for Benchmark Dose Analysis of HFC-134a

Concentration (ppm)        N       Incidence
0                          85      27
2,500                      79      25
10,000                     85      31 (NOAEL)
50,000                     85      40 (LOAEL)
Selection of BMR

Default BMR - The benchmark response for Leydig cell hyperplasia was defined as a 10% extra risk of response.

Selection of BMR based on limit of detection - Use of the limit of detection approach significantly changes the BMC for this chemical because of the high background and the very shallow dose-response curve.

       The determination of a "typical" design that includes evaluation of Leydig cell hyperplasia in rats, and of control incidences, is a first step in determining the BMR using the limit of detection approach. Leydig cell hyperplasia would be observed during routine histopathology of the testes
in a subchronic or chronic animal study. The number of animals typically used in such studies is 10-20 for a subchronic study and 50-100 for a chronic study. To determine the appropriate BMR, a "typical" sample size could be determined to be, e.g., n = 25. Alternatively, the sample size from the study could be used, n = 85. There is a provision for the latter approach in the guidelines if the endpoint is not typically examined and there is no historical information on a typical design and background incidence level.
       Secondly, the "typical" background rates must be determined. If an additional risk formulation is used, background is relatively unimportant, but the sample size makes a big difference. Again, the choice is between the use of a "typical" background rate for the endpoint and the use of the background in the study. Because a review of the incidence of Leydig cell hyperplasia has not been done to determine the typical incidence and the variability among studies, the background observed in the study (approximately 30%) was used in the determination of the BMR.
       Finally, the required power level to be used must be determined. For the purpose of this example, power levels of 50% and 80% will be used. The BMRs for various sample sizes, backgrounds, and power levels are shown in Table 2.
Table 2. Benchmark response levels for various sample size, background, and power.

                             Extra Risk                      Additional Risk
 SS    Background      50% Power    80% Power          50% Power    80% Power
 25      .20             0.33         0.47               0.26         0.37
 25      .30             0.39         0.55               0.27         0.38
 25      .40             0.46         0.63               0.28         0.38
 85      .20             0.15         0.23               0.12         0.18
 85      .30             0.19         0.28               0.13         0.20
 85      .40             0.23         0.34               0.14         0.20
Calculation of the Benchmark Concentration

A. Use of Default BMR
Model: Polynomial and Weibull models (with and without threshold term) agreed; commercially available software was used.

       The RfC for HFC-134a was based on the BMC calculated for testicular effects (Table 1) in a chronic study in rats (Hext and Parr-Dobrzanski, 1993; Collins et al., 1995). Because of the high background, a substantial difference in the estimates of benchmark concentration occurs for
extra risk vs. additional risk models (11,000 vs. 15,600 ppm), based on the default use of a BMR of 0.1 (Table 3). An extra risk model was selected as most appropriate, based on the conservative assumption of independence of the mechanisms causing the background response and the treatment-related response. Dose-response models for dichotomous data included a polynomial (multistage) and a Weibull form, each either including or not including a model parameter for a background intercept (sometimes referred to as a threshold parameter). Using any of these four model forms, an excellent model fit was obtained, and the BMC estimates were the same after rounding to two significant figures.

Table 3. Benchmark Concentrations for Leydig Cell Hyperplasia for HFC-134a based on the default BMR of 0.1 (polynomial and Weibull models, no threshold).

Endpoint/Model                              BMR (%)     MLE (ppm)     BMC (ppm)
Polynomial, no threshold, extra               10          19300         11000
Polynomial, no threshold, additional          10          28600         15600
Weibull, no threshold, extra                  10          19300         11000

B. Use of Limit of Detection

       The limit of detection approach used considerably higher BMR levels (0.30 compared to the default of 0.10) because of the high control response level and, consequently, the greater treatment effect required to be statistically significant. BMCs resulting from the BMRs based on the limit of detection, for a background rate of 30%, power of 50% or 80%, and extra or additional risk, are shown in Table 4.

Table 4. Benchmark Concentrations and MLE Estimates for Various Sample Size, Background, and Power. The polynomial model with no threshold term was applied.

                               50% Power                  80% Power
 SS    Background          MLE        BMC            MLE        BMC
                                    Extra Risk
 25       .30             95600      52500          154000      62400
 85       .30             40800      22600           63500      35200
                                  Additional Risk
 25       .30             97500      51000          158000      62600
 85       .30             40900      21600           67100      35300
       The BMC is 5-6 times higher when the BMR is calculated based on the limit of detection approach for n=25, and 2-3 times higher for n=85, and, for a sample size of 25, it is higher than the highest exposure level in the study (see Figure 5, based on extra risk). This difference in the BMCs would be translated directly into the derivation of the RfC or other health exposure limit, because there would be no difference in the application of uncertainty factors for the two approaches to selecting the BMR. It is also noteworthy that the limit of detection approach to determining the BMR eliminates the difference between the BMCs for additional vs. extra risk, because the BMRs corresponding to extra and additional risk are matched to the appropriate models.
[Figure 5. HFC-134a dose-response modeling based on extra risk (see text).]
Example #3: Boron

Summary of Main Points Illustrated

Combination of two experiments
Conversion of continuous data to dichotomous form

Summary of Data Sets

Studies: Two prenatal developmental toxicity studies with boric acid administered in the diet
(Heindel et al., 1992; RTI, 1994).
Endpoints modeled: fetal weight (as continuous and dichotomous), malformations, variations.
Endpoint used: fetal weight (continuous) because changes were seen at the lowest doses.


Table 1. Data used for Benchmark Dose Analysis for Boron - continuous data

Dose*               Fetal Weight         SD          N (fetuses)
(mg/kg/day)

Study A
    0                   3.7             .32             218
   78                   3.45            .25             217
  163                   3.21            .26             205
  330                   2.34            .25             191

Study B
    0                   3.61            .24             211
   19                   3.56            .23             226
   36                   3.53            .28             220
   55                   3.50            .38             221
   76                   3.38            .26             236
  143                   3.16            .31             209

*The doses in the two studies are slightly different because they are based on food consumption
measurements in the two studies.



Table 2. Data used for Benchmark Dose Analysis for Boron - dichotomized continuous data

Dose                    N           Incidence
(mg/kg/day)

Study A
    0                  431              21
   78                  432              51
  163                  408             152
  330                  386             384

Study B
    0                  416              21
   19                  460              24
   36                  437              42
   55                  437              40
   76                  471              70
  143                  411             162
Selection of the BMR

       Benchmark response levels of 5% additional risk for dichotomous data, and the control SD
divided by 2 for continuous data, were selected. The rationale for these choices is provided by
Allen et al. (1996).
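
       As an illustration of how these continuous BMR definitions translate into a benchmark
dose, the sketch below fits a continuous power model to the Study A group means from Table 1
by weighted least squares and solves for the doses at which the predicted mean drops by 5% of
the control mean or by half the control SD. This is a simplified sketch, not the likelihood-based
fitting used by Allen et al. (1996), so its estimates will not necessarily reproduce the values
reported in Table 3 below, and the reported BMDs are lower confidence limits on such estimates.

# Sketch: continuous power model m(d) = a - b*d**k fit to the Study A means from Table 1
# (weighted by the inverse squared standard errors), then solved for the dose at which the
# predicted mean drops by 5% of the control mean or by half the control SD. This is a
# simplified least-squares version of the analysis described in the text.
import numpy as np
from scipy.optimize import curve_fit

dose = np.array([0.0, 78.0, 163.0, 330.0])       # mg/kg/day (Study A)
mean = np.array([3.7, 3.45, 3.21, 2.34])         # mean fetal weight
sd   = np.array([0.32, 0.25, 0.26, 0.25])
n    = np.array([218, 217, 205, 191])
se   = sd / np.sqrt(n)

def power_model(d, a, b, k):
    return a - b * np.power(d, k)

(a, b, k), _ = curve_fit(power_model, dose, mean, p0=[3.7, 0.001, 1.0],
                         sigma=se, bounds=([0.0, 0.0, 0.1], [10.0, 1.0, 5.0]))

def mle_bmd_for_decrease(delta):
    # Solve a - m(BMD) = delta, i.e. b * BMD**k = delta.
    return (delta / b) ** (1.0 / k)

print("5% decrease:   ", mle_bmd_for_decrease(0.05 * a))
print("control SD / 2:", mle_bmd_for_decrease(0.5 * sd[0]))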
Calculation of the Benchmark Dose

Model: continuous power model for continuous data; log-logistic (fetuses nested within litters) for
dichotomous data, additional risk.

       The BMD for boron was derived by Allen et al. (1996). In this analysis, BMDs for several
endpoints from two developmental studies in rats were calculated and compared. In addition, the
analysis combined results from the two studies because they were done by the same laboratory,
and because the second study was done as a follow-up intended specifically to extend the dose
range from the first study in order to clearly define a NOAEL. There are a few general issues
illustrated by this analysis.

1. Combination of studies - Dose-response functions were fit to the individual data sets and these
were compared using a likelihood ratio test. If this test indicated that the responses from the two
studies were consistent with a single dose-response function, the model was fit to the combined
data.
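
       A minimal sketch of this kind of likelihood-ratio comparison is shown below. It assumes
the separate and pooled fits have already been obtained by maximum likelihood (for example,
with code like the earlier Weibull sketch); the function name, the log-likelihood values, and the
significance level are placeholders for illustration only.

# Sketch: likelihood-ratio check of whether two studies can share one dose-response function.
# Fitting the studies separately uses n_params more parameters than the pooled fit, so the
# statistic is referred to a chi-square distribution with that many degrees of freedom.
from scipy.stats import chi2

def combine_studies_ok(ll_study_a, ll_study_b, ll_combined, n_params, alpha=0.05):
    # ll_study_a, ll_study_b: maximized log-likelihoods from fitting each study alone
    # ll_combined:            maximized log-likelihood from fitting the pooled data
    # n_params:               number of parameters in one dose-response model
    lr_stat = 2.0 * ((ll_study_a + ll_study_b) - ll_combined)
    p_value = chi2.sf(lr_stat, df=n_params)
    return p_value > alpha          # fail to reject -> pooling the studies is supported

# Hypothetical log-likelihood values, for illustration only:
print(combine_studies_ok(ll_study_a=-210.4, ll_study_b=-305.7, ll_combined=-518.9, n_params=3))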


2. Fetal weights were analyzed in two ways. In the first, the average litter weights were modeled
as a continuous variable (Figure 6). The BMR was defined as a 5% decrease in fetal weight from
the control mean or as a decrease equal to the control standard deviation divided by 2. In the
second approach, the individual litter weights were converted to dichotomous data using the 5th
percentile in the corresponding control group as the definition of adversely affected pups. The
model was then fit to the litter data expressed as probability of adverse response (Figure 7).
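
       A minimal sketch of the dichotomization step is shown below. Because the individual
weight records are not reproduced in this document, the sketch simulates hypothetical weights
using the Study A control and low-dose means and SDs from Table 1.

# Sketch: score individual weights as "affected" if they fall below the 5th percentile of
# the concurrent control group. The simulated weights are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
control_weights = rng.normal(3.7, 0.32, size=218)    # hypothetical control group
treated_weights = rng.normal(3.45, 0.25, size=217)   # hypothetical 78 mg/kg/day group

cutoff = np.percentile(control_weights, 5)            # control 5th percentile
incidence = int(np.sum(treated_weights < cutoff))     # count of adversely affected individuals
print(cutoff, incidence, incidence / treated_weights.size)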

3. Combination of data on missing or malformed ribs - The analysis explored several approaches
to weighting and combining data on alterations in thoracic rib XIII and in lumbar ribs.
Selection of the most appropriate BMD:

       The fetal weight analysis using the continuous power model was recommended by Allen
et al. (1996) for the following reasons:

       - The BMD was lower than those calculated for rib effects or other malformations.
       - The analysis of fetal weight effects converted to dichotomous data showed that the
         dose-response pattern was not the same in the two studies, so the studies could not be
         combined for this analysis.
       - It is preferable to combine data from these two studies, if possible.
Table 3. Benchmark Dose Results for Boron

Analysis                                             MLE          BMD

Fetal weight - continuous data - BMR = 5% decrease
       Study A                                        80           56
       Study B                                        68           47
       Combined                                       78           59

Fetal weight - continuous data - BMR = decrease to control SD/2
       Study A                                        73           48
       Study B                                        49           31
       Combined                                       65           59

Fetal weight - dichotomized continuous data
       Study A                                       129          115
       Study B                                        47           31
       Combined                                       --           --
       The BMDs from the combined continuous data using either a 5% decrease or a decrease
to control SD/2 were the same, and were similar to the BMDs calculated for Studies A and B
alone, especially those based on a 5% decrease from the control mean. These results indicate that
a BMD could be calculated without running the second study to define the NOAEL.
[Figures 6 and 7 (fits of the continuous and dichotomized fetal weight models referenced above)
appeared here; the graphics could not be recovered from this text extraction.]
Example #4: 1,3-Butadiene

Summary of Main Points Illustrated

BMC analysis with no NOAEL
What to do when the model doesn't fit (drop the high dose level)

Summary of Data

Study: Chronic inhalation exposure study (NTP, 1991) in B6C3F1 mice
Endpoints modeled: Ovarian, testicular and uterine atrophy
Endpoint used: Ovarian atrophy

       Data from a chronic inhalation study were used to evaluate the noncancer health effects.
Ovarian, testicular and uterine atrophy all showed effects related to exposure level, but ovarian
atrophy was seen at all exposure levels, with no NOAEL. The data used are shown in Table 1.


Table 1. Data used in the benchmark concentration analysis for 1,3-butadiene.

Exposure Level         No. Examined        % Affected
    0.00                    49                 8.16
    6.25 ppm                49                38.78
   20.0 ppm                 48                66.67
   62.5 ppm                 50                84.00
  200.0 ppm                 50                86.00
  625.0 ppm                 79                87.34
Selection of the BMR Level

       The BMR was chosen as the default of 10% extra risk above controls (Kimmel et al.,
1996). For comparison with the default approach, the limit of detection was determined based on
the background rate (10%) and the number of animals (50) in this study. Table 2 shows the BMR
calculated on this basis for power levels of 50% and 80%.
Table 2. Determination of the BMR Using the Limit of Detection Approach

Background        N         Power        Extra Risk
   .10            50         50%            .13
   .10            50         80%            .22
Calculation of the Benchmark Concentration

Model: log-logistic

       The ovarian atrophy data could not be fit adequately using the quantal Weibull model, so
a log-logistic model was used. The model produced a poor fit when all 6 exposure groups were
included, due to leveling off of the response at exposures above 62.5 ppm. Although the model
was capable of fitting the first 5 exposure levels, the best fit of the data based on a graphical
display was obtained with exposure groups 1-4 (see Figure 8). However, the BMCs based on the
default BMR of 10% were similar for exposure groups 1-5 and 1-4, and for additional and extra
risk (Table 3). The BMCs based on the limit of detection for each power level also were similar
for exposure groups 1-5 and 1-4, indicating that the model was giving a similar fit to the data in
the range of the BMR in both cases.
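
       The following is a minimal sketch of this refit: a log-logistic model with a background
term is fit by maximum likelihood to exposure groups 1-4, with incidences back-calculated from
the percentages in Table 1, and the fitted curve is solved for the concentration producing 10%
extra risk. The parameterization and starting values are assumptions made for illustration; the
value printed is the maximum-likelihood estimate, whereas the BMCs reported in Table 3 are
lower confidence limits.

# Sketch: log-logistic model P(d) = g + (1 - g) / (1 + exp(-(a + b*ln d))), with P(0) = g,
# fit to exposure groups 1-4 of the 1,3-butadiene data and solved for the concentration
# giving 10% extra risk. Incidences are back-calculated from the Table 1 percentages.
import numpy as np
from scipy.optimize import minimize

conc     = np.array([0.0, 6.25, 20.0, 62.5])                      # ppm, groups 1-4
n_mice   = np.array([49, 49, 48, 50])
affected = np.round(np.array([8.16, 38.78, 66.67, 84.00]) / 100.0 * n_mice)

def prob(params, d):
    g, a, b = params
    p = np.full_like(d, g, dtype=float)
    pos = d > 0
    p[pos] = g + (1.0 - g) / (1.0 + np.exp(-(a + b * np.log(d[pos]))))
    return p

def neg_log_lik(params):
    g, a, b = params
    if not (0.0 <= g < 1.0 and b > 0.0):
        return np.inf
    p = np.clip(prob(params, conc), 1e-10, 1.0 - 1e-10)
    return -np.sum(affected * np.log(p) + (n_mice - affected) * np.log(1.0 - p))

fit = minimize(neg_log_lik, x0=[0.08, -2.5, 1.0], method="Nelder-Mead")
g, a, b = fit.x

# 10% extra risk: (P(d) - g) / (1 - g) = 0.10  =>  1 / (1 + exp(-(a + b*ln d))) = 0.10
bmr = 0.10
mle_bmc10 = np.exp((np.log(bmr / (1.0 - bmr)) - a) / b)
print(mle_bmc10)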
Table 3. BMCs for ovarian atrophy after 1,3-butadiene chronic inhalation exposure

LOAEL        Exposure Groups      BMC10        BMC (based on limit of detection)
             Modeled                            50% Power         80% Power

                               Additional Risk
6.25 ppm          1-5            1.23
                  1-4            1.03

                                  Extra Risk
                  1-5            1.15              1.54              2.91
                  1-4            0.96              1.29              2.43
[Figure 8. 1,3-Butadiene Modeling. Two panels: ovarian atrophy, groups 1-5 (concentration in
ppm, approximately 0-200) and ovarian atrophy, groups 1-4 (concentration in ppm, approximately
0-70); the plotted curves could not be recovered from this text extraction.]
APPENDIX E

LIST OF MODELS PLANNED TO BE INCLUDED IN FIRST RELEASE OF THE EPA SOFTWARE

BINARY REGRESSION MODELS
       Probit Model
       Weibull Model
       Logistic Model
       Gamma Multi-Hit Model
       Quantal Linear Model (One Hit)
       Quantal Quadratic Model
       Multistage Model (Quantal Polynomial)

NESTED BINARY REGRESSION MODELS
       Logistic Model
       Rai and Van Ryzin Model
       NCTR Model

CONTINUOUS REGRESSION MODELS
       Linear Model
       Polynomial Model
       Power Model