DRAFT--DO NOT CITE OR QUOTE                                        EPA/600/P-96/002A
                                                                   August 9, 1996
                                                                   External Review Draft

               Benchmark Dose Technical Guidance Document
                                         NOTICE

       THIS DOCUMENT IS A PRELIMINARY DRAFT. It has not been formally released by the
       U.S. Environmental Protection Agency and should not at this stage be construed to represent
       Agency policy. It is being circulated for comment on its technical accuracy and policy
       implications.

                               Risk Assessment Forum
                        U.S. Environmental Protection Agency
                               Washington, DC 20460

                                   DISCLAIMER

       This document is an external draft for review purposes only and does not constitute U.S.
 Environmental Protection Agency policy. Mention of trade names or commercial products does
 not constitute endorsement or recommendation for use.
                                    CONTENTS

Figures .......................................................................... vi
Preface .......................................................................... vii
Authors and Reviewers ............................................................ ix
I. EXECUTIVE SUMMARY ............................................................. 1
      A. Introduction ............................................................ 1
      B. Decision Process for Carrying Out a Benchmark Dose/Concentration Analysis 3
            1. Data array analysis - endpoint selection .......................... 3
            2. Minimum data set for calculating a BMD/C .......................... 4
            3. Criteria for selecting the benchmark response (BMR) ............... 4
            4. Order of model application ........................................ 5
            5. Determining the model structure ................................... 5
            6. Confidence limit calculation ...................................... 7
            7. Selection of "equivalent" models .................................. 7
            8. Selecting the BMD/C ............................................... 8
            9. Use in Risk Assessment ............................................ 9
II. INTRODUCTION ................................................................. 10
      A. Background .............................................................. 10
      B. Purpose of This Guidance Document ....................................... 16
      C. Definition of the Benchmark Dose or Concentration ....................... 17
III. BENCHMARK DOSE GUIDANCE ..................................................... 18
      A. Data Array Analysis - Endpoint Selection ................................ 18
            1. Selection of Endpoints to be Modeled .............................. 19
            2. Minimum Data Set for Calculating a BMD/C .......................... 19
            3. Combining Data for a BMD/C Calculation ............................ 20
      B. Criteria for Selecting the Benchmark Response Level (BMR) ............... 20
      C. Mathematical Modeling ................................................... 22
            1. Introduction ...................................................... 22
            2. Order of Model Application ........................................ 23
            3. Determining the Model Structure ................................... 23
            4. Confidence Limit Calculation ...................................... 25
            5. Selection of "Equivalent" Models .................................. 25
            6. Selecting the BMD/C ............................................... 26
IV. USING THE BMD/C IN NONCANCER DOSE-RESPONSE ANALYSIS .......................... 28
      A. Introduction ............................................................ 28
      B. Effect of BMD/C Approach on Use of Uncertainty Factors .................. 28
      C. Dose-Response Characterization .......................................... 29
V. USE OF BENCHMARK-STYLE APPROACHES IN CANCER RISK ASSESSMENT ................... 31
      A. Use in Hazard Ranking ................................................... 31
            1. Guidance for Developing Comparable ED10s .......................... 31
            2. Sources of TD50s and ED10s ........................................ 32
      B. Use in Low-Dose Extrapolation ........................................... 33
            1. Proposal for Low-Dose Extrapolation under the 1996 Proposed
               Guidelines for Carcinogen Risk Assessment ......................... 33
            2. Issues in Choosing a Point of Departure ........................... 37
      C. Dose-Response Characterization .......................................... 38
VI. FUTURE PLANS ................................................................. 40
      A. Updating of Guidance Document ........................................... 40
      B. Potential for Use of the BMD/C in Cost-Benefit Analyses ................. 40
            1. Background ........................................................ 40
            2. Benefits estimates using BMD/C and RfD/C .......................... 41
            3. Issues in quantifying noncancer health benefits ................... 42
      C. Research and Implementation Needs ....................................... 43
VII. REFERENCES .................................................................. 46

APPENDICES
A. Aspects of Design, Data Reporting, and Route Extrapolation
   Relevant to BMD/C Analysis .................................................... 55
      1. Design .................................................................. 55
      2. Aspects of Data Reporting ............................................... 55
      3. Route Extrapolation ..................................................... 57
B. Selecting the Benchmark Response (BMR) Level .................................. 58
      1. Biologically Significant Change for Specifying the BMR .................. 58
      2. Limit of Detection for Specifying the BMR ............................... 59
      3. Examples ................................................................ 62
      4. Selecting the Critical Power Level for the Limit of Detectability ....... 65
C. Mathematical Modeling ......................................................... 66
      1. Introduction ............................................................ 66
      2. Model Selection ......................................................... 66
            a. Type of endpoint .................................................. 67
            b. Experimental design ............................................... 68
            c. Constraints and variables ......................................... 69
      3. Model Fitting ........................................................... 69
      4. Assessing How Well the Model Describes the Data ......................... 71
      5. Comparing Models ........................................................ 72
      6. Using Confidence Limits to Get a BMD/C .................................. 73
D. Examples of BMD/C Analyses .................................................... 76
      Example #1: Carbon Disulfide ............................................... 76
      Example #2: 1,1,1,2-Tetrafluoroethane (HFC-134a) ........................... 80
      Example #3: Boron .......................................................... 85
      Example #4: 1,3-Butadiene .................................................. 91
E. List of Models Planned to be Included in the First Release of the EPA Software 94
                                 LIST OF FIGURES

1.    Example of calculation of a BMD ............................................ 13
2.    Dose response curves for models incorporating different forms of spontaneous
      background response ....................................................... 61
3.    Detectable extra risk or additional risk for a sample size of 25 (N) ....... 64
4.    Carbon disulfide modeling .................................................. 79
5.    HFC-134a modeling .......................................................... 84
6.    Boron modeling - continuous data ........................................... 89
7.    Boron modeling - dichotomous data .......................................... 90
8.    1,3-Butadiene modeling ..................................................... 93
                                     PREFACE

       This draft document on guidance for application of the benchmark dose/concentration
(BMD/C) approach in cancer and noncancer dose-response assessment is being developed by a
Technical Panel of the Risk Assessment Forum for use by the U.S. Environmental Protection
Agency. This document is intended to be used together with the background document on use of
the benchmark dose approach published earlier by the Agency (EPA, 1995c). While the major
audience for this document is risk assessors within the Agency, it is expected that this guidance
will be considered and possibly used by other organizations as well.
       This draft guidance document is currently under development, but is being made available
at this stage for a peer consultation workshop to be held September 10-11, 1996, at the Holiday
Inn, Bethesda, MD. Several experts in the areas of toxicology, statistics, and mathematical
modeling have been asked to review the document and provide input at this early stage of
development on several issues for which there is ongoing discussion within the Agency. These
issues include: the appropriate selection of studies and responses for BMD/C analysis, use of
biological significance or limit of detection for selection of the benchmark response (BMR),
model selection and fitting, use of the lower confidence limit as the BMD/C, and the default
decision approach proposed in this document for the BMD/C analysis.
       Our overall goal in developing this document is to have a procedure that is usable, that has
reasonable criteria and defaults to avoid proliferation of analyses and model shopping, and that
promotes consistency among analyses. Ultimately, we are trying to move cancer and noncancer
assessments closer together, using precursor and mode of action data to extend and inform our
understanding of risk in the range of extrapolation. We would like to have in one package
something that is usable for cancer and noncancer assessments when endpoints are relevant to
both.
       We also are asking reviewers to comment on how understandable the document is for the
general toxicologist and risk assessor. In an effort to achieve a readable and usable document, the
primary information on approaches and defaults is contained in the main body of the document,
with more detailed discussion of various steps in the process included in the appendices. Several
examples of application of the BMD/C approach using the procedures recommended in this
guidance are included in an appendix. In a separate but parallel effort, the EPA is developing
user-friendly software for BMD/C analysis to be distributed widely for use by the risk assessment
community, and our goal is to make the software consistent with the guidance and default
procedures developed in this document.
                          AUTHORS AND REVIEWERS

TECHNICAL PANEL

Carole A. Kimmel, Chair*
R. Woodrow Setzer, Jr.*
Dan Guth*
Elizabeth Margosches*
Suzanne Giannini-Spohn*
Linda Teuschler*
Jim Cogliano*
Annie Jarabek*
Jeanette Wiltse
Robert MacPhail
Yogendra Patel
William Sette
Carole Braverman
Rick Hertzberg
John Vandenberg
Hugh Pettigrew


* Authors
                              I. EXECUTIVE SUMMARY

A. Introduction
       The US EPA conducts risk assessments for an array of noncancer health effects as well as
for cancer. The process of risk assessment, based on the National Research Council (NRC)
paradigm (1983), has several steps: hazard characterization, dose-response analysis, exposure
assessment, and risk characterization. Risk assessment begins with a thorough evaluation of all
the available data to identify and characterize potential health hazards. The next stage of the
process, dose-response analysis, involves an analysis of the relationship between exposure to the
chemical and health-related outcomes, and historically has been done very differently for cancer
and noncancer health effects. The common practice under the 1986 Guidelines for Carcinogen
Risk Assessment (EPA, 1986a) was to assume low dose linearity and model tumor incidence data
by applying the linearized multistage (LMS) procedure, which extrapolates risk as the 95% upper
confidence limit. The standard practice for the dose-response analysis of noncancer health effects
has been to determine a lowest-observed-adverse-effect-level (LOAEL) and a no-observed-
adverse-effect-level (NOAEL). The LOAEL is the lowest dose for a given chemical at which
adverse effects have been detected, while the NOAEL is the highest dose at which no adverse
effects have been detected. The NOAEL (or LOAEL, if a NOAEL is not present) is adjusted
downward by uncertainty factors intended to account for limitations in the available data to arrive
at an exposure that is likely to be without adverse effects in humans. The NOAEL can also be
compared with the human exposure estimate to derive a margin of exposure (MOE). The
LOAEL and NOAEL represent an operational definition of quantities that can characterize a
study, and do not necessarily have any consistent association with underlying biological processes
or with thresholds.
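       As a minimal numerical illustration of this adjustment, the sketch below divides an invented
NOAEL by the product of the applicable uncertainty factors; the values and the example factors of
10 for interspecies and intraspecies extrapolation are used only for illustration.

    # Hypothetical illustration of the NOAEL/uncertainty-factor adjustment.
    noael = 10.0                     # mg/kg-day, highest dose with no observed adverse effect
    uncertainty_factors = [10, 10]   # e.g., interspecies and intraspecies extrapolation

    total_uf = 1
    for uf in uncertainty_factors:
        total_uf *= uf

    # The NOAEL is divided by the product of the applicable uncertainty factors to
    # arrive at an exposure likely to be without adverse effects in humans (an RfD).
    rfd = noael / total_uf
    print(f"RfD = {noael} / {total_uf} = {rfd} mg/kg-day")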
       With the recent publication of EPA's Proposed Guidelines for Carcinogen Risk
Assessment (1996a), the dichotomy between quantitative approaches for cancer and noncancer
risks will begin to break down. The proposed guidelines promote the understanding of an agent's
mode of action in producing tumors to reduce the uncertainty in describing the likelihood of harm
and in determining the dose-response(s). The proposed guidelines call for modeling of not only
tumor data but other responses thought to be important precursor events in the carcinogenic
process. The dose-response assessment under these guidelines is a two-step process: (1) response
data are modeled in the range of empirical observation -- modeling in the observed range is done
with biologically-based, case-specific, or with appropriate curve-fitting models; and (2)
extrapolation below the range of observation is accomplished by modeling if there are sufficient
data or by a default procedure (linear, nonlinear, or both). A point of departure for extrapolation
is estimated from this modeling, and low dose extrapolation proceeds from this point of
departure.
       The benchmark dose approach discussed in this document is a way of determining the
point of departure, and can be used in cancer and noncancer risk assessment as the basis for linear
low-dose extrapolation, calculation of an MOE, or application of uncertainty factors for
calculating oral reference doses (RfDs), inhalation reference concentrations (RfCs), or other
exposure estimates.
       Several limitations to the use of the LOAEL and NOAEL have been discussed in the
literature (Crump, 1984; Gaylor, 1983; Kimmel and Gaylor, 1988) and by the EPA's Science
Advisory Board (EPA, 1986b, 1988a, b, 1989c). These include the fact that the NOAEL is
limited to one of the doses in the study and is dependent on study design, in particular on dose
selection and spacing. The NOAEL also does not account for variability in the data, and thus, a
study with a limited number of animals will often result in a higher NOAEL than one which has
more animals. In addition, the slope of the dose-response curve is not taken into account in the
selection of a NOAEL and is not usually considered unless the slope is very steep or very shallow.
Additionally, a LOAEL cannot be used to derive a NOAEL when one does not exist in a study;
rather, an uncertainty factor is applied to account for this limitation.
       In an effort to address some of the limitations of the LOAEL and NOAEL, Crump (1984)
proposed the benchmark dose (BMD) approach as an alternative. Using this approach, the
experimental data are modeled, and an oral benchmark dose (BMD) or inhalation benchmark
concentration (BMC) in the observable range is estimated. The BMD/C (the term used when the
text can apply to values derived from either oral or inhalation exposure data) is not constrained to
be one of the experimental doses, and can be used as a more consistent point of departure than
either the LOAEL or NOAEL. This approach uses the dose-response information inherent in the
data. The BMD/C accounts for the variability in the data since it is defined as the lower
confidence limit on the dose estimated to produce a given level of change in response (termed the
benchmark response, BMR) from controls. The BMD/C approach models all of the data in a
study and the slope of the dose-response curve is integral to the BMD/C estimation. A BMD/C
can be estimated even when all doses in a study are associated with a response (i.e., when there is
no NOAEL). The BMD/C estimate is best when there are doses in the study near the range of the
BMD/C.
B. Decision Process for Carrying Out a Benchmark Dose/Concentration Analysis
       This document describes the proposed approach for carrying out a complete BMD/C
analysis. It is organized in the form of a decision process including the rationale and defaults for
proceeding through the analysis, follows a similar framework to that outlined in the background
document (EPA, 1995c), and is meant to be used in conjunction with that document. The
guidance here imposes some constraints on the BMD/C analysis through decision criteria, and
provides defaults when more than one feasible approach exists. Steps in the guidance are
discussed in the body of the document and in more detail in appendices to this document. The
following guidance is applicable to dose-response analysis of noncancer health effects and may
also be applicable to cancer dose-response analysis under the proposed guidelines (EPA, 1996a).
1. Data array analysis - endpoint selection
       Selection of the appropriate studies and endpoints is discussed in Appendix A and in
various EPA publications (U.S. EPA, 1991a, 1994c, 1995f, 1996a and b). In general, the
selection of endpoints to model should focus on endpoints that are relevant or assumed relevant
to humans and are potentially the "critical" effect (i.e., the most sensitive). Since differences in
slope could result in an endpoint having a lower BMD but a higher LOAEL or NOAEL than
another endpoint, selection of endpoints should not be limited only to the one with the lowest
LOAEL or NOAEL. In general, endpoints should be modeled if their LOAEL is up to 10-fold
above the lowest LOAEL. This will ensure that no endpoints with the potential to have the lowest
BMD are excluded from the analysis on the basis of the value of the LOAEL.
2. Minimum data set for calculating a BMD/C
       Once the critical endpoints have been selected, the data sets are examined for the
appropriateness of a BMD/C analysis. The following constraints on data sets are used:
•      At a minimum, the number of dose groups and subjects should be sufficient to allow
       determination of a LOAEL.
•      There must be more than one exposure group with a response different than controls (this
       could be determined with a pairwise comparison or a trend test). With only one
       responding group, there is inadequate information about the shape of the dose-response
       curve, and mathematical modeling becomes too arbitrary.
•      Dose-response modeling is not appropriate if the responding groups all show responses
       near maximum (e.g., greater than 50% response for quantal data or a clear plateau of
       response for continuous data). In this case, there is inadequate information about the
       shape of the curve in the low dose region near the BMR.
3. Criteria for selecting the benchmark response (BMR)
       The BMD/C approach requires that a level of change in response (the BMR) be specified
in order to calculate the BMD/C. In this proposal, there are two bases for specifying the BMR: a
biologically significant change in response for continuous endpoints, or the limit of detection for
either quantal or continuous data. In most cases, the question concerning how much of a change
in a continuous endpoint is biologically significant has not been addressed, and the level of change
that is considered adverse is based in large part on the detectable level of response or limit of
detection for a particular study design. For quantal data, the number of responders is counted,
and again the limit of detection is used. The limit of detection is based on the background
response rate, sample size, power level, and whether extra or additional risk is used in the model.
Standard study designs for various endpoints should be used to determine the general background
rate for a response level and the number of animals typically used. The following default
decisions will be used in selection of the BMR.
•      Biological significance: If a particular level of response for a continuous endpoint has
       been determined to be "biologically significant," the BMR is based on that degree of
       change from background (e.g., adult body weight reduction >10% from the control value;
       a 10% or more decrease in nerve conduction velocity).
•      Limit of Detection: When the biologically significant level of response has not been
       determined or is equated with a statistically significant response, the limit of detection
       method is used. To find the magnitude of response just detectable (the limit of detection),
       a default power level of 50% (may increase, pending simulation studies) and a one-sided
       test with a Type I error of 0.05 (p<0.05) is used (see the sketch following this list).
•      Defaults: For quantal data, a 10% increase in extra risk will be used as the default
       approach when neither biological significance nor the limit of detection has been
       determined. When the BMR is set on the basis of biological significance, extra risk will be
       used as a default. When the limit of detection approach is used for the BMR, it does not
       matter whether extra or additional risk is used, as the BMR that corresponds to the same
       limit of detection for the risk formulation is determined and used in the model.
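       The sketch below is one minimal way to make the limit-of-detection idea concrete. It uses a
crude normal approximation to a one-sided comparison of a single dosed group against controls; it
is not the procedure developed in Appendix B, and the background rate and group size are invented
for the illustration.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq

    def detectable_additional_risk(p0, n, alpha=0.05, power=0.50):
        # Smallest additional risk (p1 - p0) that a one-sided, two-group comparison
        # of n animals per group would just detect, using a normal approximation
        # to the difference of two proportions.
        z_alpha = norm.ppf(1 - alpha)      # one-sided critical value
        z_power = norm.ppf(power)          # 0.0 at the 50% default power

        def gap(p1):
            se = np.sqrt(p0 * (1 - p0) / n + p1 * (1 - p1) / n)
            return (p1 - p0) - (z_alpha + z_power) * se

        return brentq(gap, p0 + 1e-6, 1 - 1e-6) - p0

    p0, n = 0.05, 20                       # hypothetical background rate and group size
    add_risk = detectable_additional_risk(p0, n)
    extra_risk = add_risk / (1 - p0)       # extra risk rescales by the non-responding fraction
    print(f"detectable additional risk ~ {add_risk:.3f}; as extra risk ~ {extra_risk:.3f}")

Whichever risk formulation is used, the BMR is then set at the response level that this kind of
calculation indicates is just detectable for the standard study design.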
4. Order of model application
       The models should be executed either using software produced and distributed by the
EPA or using software that carries out the curve fitting in a manner similar to the EPA software.
Allowing the use of more than one model at this stage is done because there is not enough
experience with modeling a wide variety of endpoints, and the best-fitting model may be
somewhat endpoint-specific. The following order for running the models should be used:
•      Continuous data: A linear model should be run first. If the fit of a linear model is not
       adequate, the polynomial model should be run, followed by the continuous power model.
       Other models may be applied at the discretion of the risk assessor.
•      Dichotomous data: As there is currently no rationale for selecting one versus another, one
       or more of the following models should be applied: the log-logistic, Weibull, and quantal
       polynomial models (a sketch of fitting one such model follows this list). For developmental
       toxicity data, models with fetuses nested within litters should be used. The nested
       log-logistic model tended to give a better fit more often in studies by Allen et al. (1994b),
       but other models often gave a good fit as well. Other models may be applied at the
       discretion of the risk assessor.
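       As an illustration of what running one of these models involves, the following sketch fits a
quantal Weibull model by maximum likelihood to an invented dichotomous data set and reports the
maximum likelihood estimate of the dose corresponding to 10% extra risk. It is a simplified
stand-in for the EPA software, not a description of it, and it does not impose the parameter
constraint discussed under item 5 below.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical dichotomous data: dose, group size, and responders per group.
    dose = np.array([0.0, 10.0, 30.0, 100.0])
    n    = np.array([20, 20, 20, 20])
    resp = np.array([0, 2, 6, 16])

    def weibull_p(d, b, k):
        # Quantal Weibull model with no background term: P(d) = 1 - exp(-b * d**k).
        return np.where(d > 0, 1.0 - np.exp(-b * d ** k), 0.0)

    def neg_log_lik(theta):
        b, k = np.exp(theta)               # log-scale parameters keep b, k > 0
        p = np.clip(weibull_p(dose, b, k), 1e-10, 1 - 1e-10)
        return -np.sum(resp * np.log(p) + (n - resp) * np.log(1 - p))

    fit = minimize(neg_log_lik, x0=np.log([0.01, 1.0]), method="Nelder-Mead")
    b_hat, k_hat = np.exp(fit.x)

    # With no background term, extra risk equals P(d), so the dose giving the
    # default 10% extra risk (the BMD, as a maximum likelihood estimate) is:
    bmr = 0.10
    bmd_mle = (-np.log(1 - bmr) / b_hat) ** (1 / k_hat)
    print(f"b = {b_hat:.4g}, k = {k_hat:.3g}, BMD (MLE, 10% extra risk) = {bmd_mle:.3g}")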
5. Determining the model structure
       The parameters included in the models most commonly used for dose-response analysis,
and examination of the change in the shape of the dose-response curve as those parameters are
changed, are described in the background document (EPA, 1995c). This section provides
guidance on choosing a model structure appropriate to the data being analyzed:
•      Dose Parameter Restriction: As a default, the exponent on the dose parameter in the
       quantal Weibull model and the coefficient on slope in the log-logistic model will be
       constrained to be greater than or equal to 1. Unconstrained models may be applied if
       necessary to fit certain data. This constraint is typically necessary to avoid unstable
       numerical properties in calculating the confidence interval when the parameter value is less
       than 1.
•      Degree of the Polynomial: A 2-step procedure will be used as a default for determining
       the degree of the exponent in any polynomial model for continuous or quantal data (this is
       a place-holder, pending further discussion and agreement). First, a default of the
       degree equal to k-1, where k is the number of groups, will be used. Second, a stepwise
       top-down reduction in the degree will be performed to select the model with the fewest
       parameters that still achieves adequate fit (based on p>0.05 from the goodness-of-fit
       statistic); a sketch of this stepwise reduction follows this list. This approach will be built
       into software distributed by EPA.
•      Background Parameter: As a default approach, the background parameter will not be
       included in the models. A background term should be included if there is evidence of a
       background response. This would be the case if there is a non-zero response in the
       control group. A background parameter would also be included if the first 2 or more
       dosed groups do not show a monotonic increase in response (e.g., if the first 2 dosed
       groups show the same response or if the second dosed group has a lower response than
       the lowest dosed group). If there is doubt about whether a background parameter is
       needed, it is usually conservative (i.e., will result in a lower BMD) if the background
       parameter is excluded from the model. This is an area where more work is needed.
•      Use of extra risk versus additional risk for quantal data: As in selection of the BMR,
       above.

•      Threshold Parameter: A so-called "threshold" (intercept) term will not be included in the
       models used for BMD/C analysis because it is not a biologically meaningful parameter
       (i.e., not the same as a biological threshold) and because most data sets can be fit
       adequately without this parameter and the associated loss of a degree of freedom. This
       will be the default built into the software distributed by EPA.
•      Conversion of continuous data to dichotomous format: The standard approach to BMD/C
       analysis will be to model the continuous data directly, without conversion to the
       dichotomous format. Alternatively, a hybrid approach such as that described by Crump
       (1995) can be used. The hybrid approach models the continuous data, then uses the
       resulting distribution of the control data to calculate a probability estimate.
              The conversion of continuous data to dichotomous data and modeling of the
       dichotomous data is not preferred because of the loss of information about the distribution
       of the response that is inherent in the approach, and because of the uncertainties in
       defining a cut-off response level to distinguish responders and nonresponders. Conversion
       to dichotomous data could be considered in the rare cases where much is known about the
       biological consequences of the response and a cutoff can be defined more confidently, or
       in cases where the need for a probabilistic estimate of response outweighs the loss of
       information.
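       The following sketch illustrates the stepwise top-down reduction in polynomial degree
described above for a continuous endpoint reported as group means and standard errors. The data,
the use of a weighted least-squares polynomial fit, and the crude chi-square lack-of-fit statistic are
all assumptions made for the example; they are not the EPA software's implementation.

    import numpy as np
    from scipy.stats import chi2

    # Hypothetical continuous summary data: dose, group mean, and standard error.
    dose = np.array([0.0, 5.0, 15.0, 50.0, 150.0])
    mean = np.array([100.0, 98.0, 93.0, 84.0, 60.0])
    sem  = np.array([2.0, 2.0, 2.0, 2.5, 3.0])

    k = len(dose)                          # number of dose groups
    adequate = None
    for degree in range(k - 1, 0, -1):     # start at degree k-1, then step down
        df = k - (degree + 1)
        if df <= 0:
            continue                       # saturated fit; no lack-of-fit test possible
        coefs = np.polyfit(dose, mean, degree, w=1.0 / sem)
        fitted = np.polyval(coefs, dose)
        # Crude goodness-of-fit: weighted residual sum of squares referred to a
        # chi-square with (groups - parameters) degrees of freedom.
        gof = np.sum(((mean - fitted) / sem) ** 2)
        p = chi2.sf(gof, df)
        print(f"degree {degree}: GOF = {gof:.2f}, df = {df}, p = {p:.3f}")
        if p > 0.05:
            adequate = degree              # lowest degree so far with adequate fit
        else:
            break                          # fit became inadequate; stop reducing

    print("selected degree:", adequate)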
6. Confidence limit calculation
       The confidence limits will be calculated using likelihood theory, and they will be based on
the asymptotic distribution of the likelihood ratio statistic (with the exception of models fit using
GEE methods; see Appendix C). This will be the default built into the software distributed by
EPA. The 95% lower confidence bound on dose will be used as the default confidence interval
for calculating the BMD/BMC.
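       A minimal profile-likelihood sketch of this kind of calculation is shown below for the quantal
Weibull fit from the earlier sketch. The grid search, the reparameterization of the model in terms of
the BMD, and the use of the chi-square 90th percentile as the cutoff for a one-sided 95% bound are
assumptions of this illustration, not a specification of the EPA software.

    import numpy as np
    from scipy.optimize import minimize, minimize_scalar
    from scipy.stats import chi2

    # Same invented dichotomous data and quantal Weibull model (no background)
    # as in the model fitting sketch above.
    dose = np.array([0.0, 10.0, 30.0, 100.0])
    n    = np.array([20, 20, 20, 20])
    resp = np.array([0, 2, 6, 16])
    bmr  = 0.10                            # 10% extra risk

    def nll(b, k):
        p = np.where(dose > 0, 1.0 - np.exp(-b * dose ** k), 0.0)
        p = np.clip(p, 1e-10, 1 - 1e-10)
        return -np.sum(resp * np.log(p) + (n - resp) * np.log(1 - p))

    full = minimize(lambda t: nll(np.exp(t[0]), np.exp(t[1])),
                    x0=np.log([0.01, 1.0]), method="Nelder-Mead")
    nll_hat = full.fun
    b_hat, k_hat = np.exp(full.x)
    bmd_mle = (-np.log(1 - bmr) / b_hat) ** (1 / k_hat)

    def profile_nll(bmd):
        # Hold the BMD fixed (b is then determined by requiring 10% extra risk
        # exactly at that dose) and minimize the negative log likelihood over k.
        def nll_k(log_k):
            k = np.exp(log_k)
            b = -np.log(1 - bmr) / bmd ** k
            return nll(b, k)
        return minimize_scalar(nll_k, bounds=(np.log(0.1), np.log(10.0)),
                               method="bounded").fun

    # One-sided 95% lower bound: the smallest dose whose profile deviance
    # 2*(nll - nll_hat) stays within the chi-square(1) 90th percentile.
    cutoff = chi2.ppf(0.90, df=1) / 2.0
    grid = np.linspace(0.1 * bmd_mle, bmd_mle, 200)
    inside = [d for d in grid if profile_nll(d) - nll_hat <= cutoff]
    bmdl = min(inside) if inside else grid[0]
    print(f"BMD (MLE) = {bmd_mle:.3g}, 95% lower confidence bound ~ {bmdl:.3g}")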
7. Selection of "equivalent" models
       Because each of the available models has some degree of flexibility and is capable of
describing a range of dose-response patterns, it may be the case that several models seem
appropriate for the analysis of the data under consideration. This section describes how to
evaluate whether these models are equivalent based upon an evaluation of statistical and
biological considerations.
•      All models should be retained for which the goodness-of-fit (GOF) statistic gives p>0.05,
       i.e., the model is not rejected at the 0.05 level (see the sketch at the end of this section).
•      A graphical representation of the model should also be developed and evaluated for each
       model. Models should be eliminated that do not adequately describe the dose-response in
       the range near the BMR. This is because it is possible that an adequate model fit could be
       obtained based on the GOF criteria alone, but the fit of the data is not adequate at the low
       end of the dose-response curve. Such a case should not be used for the BMD/C
       calculation. This is a subjective judgment and requires that software for dose-response
       analysis provide adequate graphical functions. No objective criteria for this decision are
       currently available. It should be noted, however, that most quantal models assume that
       each animal responds independently and has an equal probability of responding. Similarly,
       for continuous data, the responses are assumed to be distributed according to a normal
       probability distribution. When these assumptions are not appropriate, for example in
       studies of developmental toxicity where the responses are correlated within litters,
       alternative model structures may be used. Biological considerations might also be helpful
       to determine adequate model fit. For example, a smooth change of slope may be deemed
       more reasonable for a given response than an abrupt change.
•      If adequate fit is not achieved because of the influence of high dose groups with high
       response rates, the assessor should consider adjusting the data set by eliminating the high
       dose group. This practice carries with it the loss of a degree of freedom, but may be
       useful in cases where the response plateaus or drops off at high doses. Since the focus of
       the BMD analysis is on the low dose and response region, eliminating high dose groups is
       reasonable.
       At this point, the remaining models should be considered "equivalent" in terms of their
usefulness for BMD/C analysis, especially when there is no biological basis for distinguishing
between the models or for choosing the best model.
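       As a concrete illustration of the statistical retention criterion above, the sketch below
computes a Pearson chi-square goodness-of-fit statistic for one candidate quantal model from its
fitted response probabilities; the observed counts, fitted probabilities, and parameter count are
invented for the example, and the Pearson form is only one of several GOF statistics that could be
used.

    import numpy as np
    from scipy.stats import chi2

    # Hypothetical observed counts and fitted probabilities for one candidate model.
    n      = np.array([20, 20, 20, 20])
    resp   = np.array([0, 2, 6, 16])
    fitted = np.array([0.01, 0.09, 0.32, 0.79])   # model-predicted P(response)
    n_parameters = 2

    # Pearson chi-square goodness-of-fit statistic and its p-value.
    expected = n * fitted
    x2 = np.sum((resp - expected) ** 2 / (expected * (1 - fitted)))
    df = len(n) - n_parameters
    p_value = chi2.sf(x2, df)

    # The model is retained as a candidate only if it is not rejected (p > 0.05).
    print(f"X2 = {x2:.2f}, df = {df}, p = {p_value:.3f}, retained: {p_value > 0.05}")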
8. Selecting the BMD/C
       As the models remaining have met the default statistical criterion for adequacy and visually
fit the data, any of them theoretically could be used for determining the BMD/C. The remaining
criteria for selecting the BMD/C are somewhat arbitrary, and are adopted as defaults (a sketch of
the resulting decision rule follows this list).
•      If the BMD/C estimates from the remaining models are within a factor of 3, then they are
       considered to show no appreciable model dependence and will be considered
       indistinguishable in the context of the precision of the methods. Models are ranked based
       on the values of their Akaike Information Criterion (AIC), a measure of the deviance of
       the model fit adjusted for the degrees of freedom, and the model with the lowest AIC is
       used to calculate the BMD/C.
•      If the BMD/C estimates from the remaining models are not within a factor of 3, some
       model dependence of the estimate is assumed. Since there is no clear remaining biological
       or statistical basis on which to choose among them, the lowest BMD/C is selected as a
       reasonable conservative estimate. If the lowest BMD/C from the available models appears
       to be an outlier compared to the other results (e.g., if the other results are within a factor
       of 3), then additional analysis and discussion would be appropriate. Additional analysis
       might include the use of additional models, the examination of the parameter values for the
       models used, or an evaluation of the MLEs to determine if the same pattern exists as for
       the BMD/Cs. Discussion of the decision procedure should always be provided.
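       A minimal sketch of this default decision rule, using invented BMD/C estimates, log
likelihoods, and parameter counts for three retained models, is shown below.

    # Hypothetical results for three "equivalent" models that survived the fit criteria.
    candidates = {
        #  name          (BMD/C, maximized log likelihood, number of parameters)
        "log-logistic":  (12.4, -31.2, 2),
        "Weibull":       (15.1, -31.5, 2),
        "quantal poly":  (18.8, -30.9, 3),
    }

    bmds = [bmd for bmd, _, _ in candidates.values()]
    spread = max(bmds) / min(bmds)

    if spread <= 3.0:
        # Estimates within a factor of 3: treat them as indistinguishable and take
        # the BMD/C from the model with the lowest AIC = 2*parameters - 2*loglik.
        best = min(candidates, key=lambda m: 2 * candidates[m][2] - 2 * candidates[m][1])
        print(f"spread = {spread:.2f}x; lowest-AIC model is {best}, BMD/C = {candidates[best][0]}")
    else:
        # Spread exceeds a factor of 3: model dependence is assumed, and the lowest
        # BMD/C is taken as a reasonable conservative choice (subject to the outlier
        # check described above).
        print(f"spread = {spread:.2f}x; taking the lowest BMD/C = {min(bmds)}")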
9. Use in Risk Assessment
       The BMD/C derived based on this default approach should be used in risk assessment as
discussed in other guidance documents (U.S. EPA, 1991a, 1994c, 1995f, 1996a and b).
                                II. INTRODUCTION

A. Background
       The US EPA conducts risk assessments for an array of noncancer health effects as well as
for cancer. The process of risk assessment, based on the National Research Council (NRC)
(1983) paradigm, has several steps: hazard characterization, dose-response analysis, exposure
assessment, and risk characterization. Risk assessment begins with a thorough evaluation of all
the available data to identify and characterize potential health hazards. The next stage of the
process, dose-response analysis, involves an analysis of the relationship between exposure to the
chemical and health-related outcomes, and historically has been done very differently for cancer
and noncancer health effects. The common practice under the 1986 Guidelines for Carcinogen
Risk Assessment (EPA, 1986a) was to assume low dose linearity and model tumor incidence data
by applying the linearized multistage (LMS) procedure, which extrapolates risk as the 95% upper
confidence limit. This linear default position is, in part, based on the belief that the process of
cancer induction is similar among chemicals, namely electrophilic reaction of carcinogens with
DNA, causing mutations that are essential elements of the carcinogenic process.
       The standard practice for the dose-response analysis of noncancer health effects has been
to determine a lowest-observed-adverse-effect-level (LOAEL) and a no-observed-adverse-effect-
level (NOAEL). The LOAEL is the lowest dose for a given chemical at which adverse effects
have been detected, while the NOAEL is the highest dose at which no adverse effects have been
detected. The NOAEL (or LOAEL, if a NOAEL is not present) is adjusted downward by
uncertainty factors intended to account for limitations in the available data to arrive at an
exposure that is likely to be without adverse effects in humans. The NOAEL can also be
compared with the human exposure estimate to derive a margin of exposure (MOE). The general
default assumption for noncancer health effects is one of a threshold response, and thus is the
basis for the NOAEL/uncertainty factor or NOAEL/MOE approach. The LOAEL and NOAEL
represent an operational definition of quantities that can characterize a study, and do not
necessarily have any consistent association with underlying biological processes or with
thresholds.
       With the recent publication of EPA's Proposed Guidelines for Carcinogen Risk
Assessment (1996a), the dichotomy between quantitative approaches for cancer and noncancer
risks will begin to break down. The proposed guidelines promote the understanding of an agent's
mode of action in producing tumors to reduce the uncertainty in describing the likelihood of harm
and in determining the dose-response(s). The proposed guidelines call for modeling of not only
tumor data, but other responses thought to be important precursor events in the carcinogenic
process. Thus, the dose-response extrapolation procedure follows conclusions in the hazard
assessment about the agent's carcinogenic mode of action. The dose-response assessment under
these guidelines is a two-step process: (1) response data are modeled in the range of empirical
observation -- modeling in the observed range is done with biologically-based, case-specific, or
with appropriate curve-fitting models; and then (2) extrapolation below the range of observation
is accomplished by modeling if there are sufficient data or by a default procedure (linear,
nonlinear, or both). A point of departure for extrapolation is estimated from this modeling. The
linear default is a straight-line extrapolation to the origin from the point of departure, and the
nonlinear default approach begins at the identified point of departure and provides a margin of
exposure (MOE) analysis rather than estimating the probability of effects at low doses.
       The benchmark dose approach discussed in this document is a way of determining the
point of departure, and can be used as the basis for linear low-dose extrapolation, calculation of
an MOE, or application of uncertainty factors for calculating oral reference doses (RfDs),
inhalation reference concentrations (RfCs), or other exposure estimates.
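       To make the two defaults concrete, the following minimal sketch uses an invented point of
departure and an invented human exposure estimate; the numbers carry no significance beyond the
illustration.

    # Hypothetical point of departure: 10% extra risk estimated at 25 mg/kg-day.
    pod_dose = 25.0        # point of departure (mg/kg-day)
    pod_risk = 0.10        # extra risk at the point of departure

    # Linear default: straight-line extrapolation from the point of departure to
    # the origin, so low-dose extra risk is approximately (pod_risk / pod_dose) * d.
    slope = pod_risk / pod_dose
    low_dose = 0.01        # mg/kg-day
    print(f"extrapolated extra risk at {low_dose} mg/kg-day ~ {slope * low_dose:.1e}")

    # Nonlinear default: no low-dose risk estimate; instead report a margin of
    # exposure comparing the point of departure with a human exposure estimate.
    human_exposure = 0.05  # mg/kg-day, hypothetical
    print(f"MOE = {pod_dose / human_exposure:.0f}")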
       Several limitations to the use of the LOAEL and NOAEL have been discussed in the
literature (Crump, 1984; Gaylor, 1983; Kimmel and Gaylor, 1988) and by the EPA's Science
Advisory Board (EPA, 1986b, 1988a, b, 1989c). These include the fact that the NOAEL is limited
to one of the doses in the study and is dependent on study design, in particular on dose selection
and spacing. The NOAEL also does not account for variability in the data, and thus, a study with
a limited number of animals will often result in a higher NOAEL than one which has more
animals. In addition, the slope of the dose-response curve is not taken into account in the
selection of a NOAEL and is not usually considered unless the slope is very steep or very shallow.
Additionally, a LOAEL cannot be used to derive a NOAEL when one does not exist in a study;
rather, an uncertainty factor is applied to account for this limitation.
       In an effort to address some of the limitations of the LOAEL and NOAEL, Crump (1984)
proposed the benchmark dose (BMD) approach as an alternative. Using this approach, the
experimental data are modeled, and an oral benchmark dose (BMD) or inhalation benchmark
concentration (BMC) in the observable range is estimated (see Fig. 1). The BMD/C is not
constrained to be one of the experimental doses, and can be used as a more consistent point of
departure than either the LOAEL or NOAEL. This approach uses the dose-response information
inherent in the data. The BMD/C accounts for the variability in the data since it is defined as the
lower confidence limit on the dose estimated to produce a given level of change in response
(termed the benchmark response, BMR) from controls. The BMD/C approach models all of the
data in a study and the slope of the dose-response curve is integral to the BMD/C estimation. A
BMD/C can be estimated even when all doses in a study are associated with a response (i.e., when
there is no NOAEL). The BMD/C estimate is best when there are doses in the study near the
range of the BMD/C.
 16      types: quantal and continuous. Quantal data are often dichotomous (yes/no) responses and are
 17      usually presented as counts or incidence of a particular effect or of individuals affected.  Quantal
 18      data are represented by such endpoints as  tumor incidence, mortality,  or malformed offspring. A
 19      specialized case of quantal responses is when data are categorized by severity of effect, e.g., mild,
 20      moderate, or severe. This categorization of data is often used for histopathological lesions. At the
 21      other extreme are continuous data, which  represent a continuum of response and are usually
 22      represented by a measurement.  Body weight, serum liver enzyme activity or nerve conduction
 23     velocity are examples of continuous responses.  Continuous data can also be expressed as quantal
 24      responses by determining some magnitude of change from controls that is considered  significant,
 25      then counting the number of individuals above or below that cutoff level. The level of significance
 26      can be based either on biological significance, or on statistical significance.  For example, a
 27      decrease in adult body weight may not be  considered adverse until it is > 10%, this magnitude of
         change is based on biological significance.
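       A minimal sketch of expressing a continuous response in quantal form is shown below. The
body weights and the 10% cutoff relative to the control mean are invented for the illustration; the
point is simply the counting of individuals beyond a cutoff.

    import numpy as np

    # Hypothetical adult body weights (grams) for a control and a dosed group.
    control = np.array([402, 418, 395, 410, 388, 407, 415, 399, 392, 411])
    dosed   = np.array([371, 340, 385, 352, 366, 330, 379, 345, 360, 338])

    # Biologically significant cutoff: a body weight more than 10% below the
    # control mean counts as an adverse (quantal) response.
    cutoff = 0.90 * control.mean()

    affected = np.sum(dosed < cutoff)
    print(f"cutoff = {cutoff:.0f} g; affected animals = {affected} of {len(dosed)}")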
[Figure 1. Example of calculation of a BMD. The figure plots percent of animals responding
against dose, showing the data points with confidence bars, the best-fitting dose-response model,
the BMR (the target response level used to define the BMD), the BMD with its lower statistical
limit on dose, and the NOAEL.]
       Calculation of a BMD/C requires selection of a benchmark response (BMR) level, i.e., the
level of change in response from controls that is considered to be an adverse effect. Basing BMRs
on the level of biological significance automatically determines the level of the BMR. Basing them
on statistical significance, however, is dependent on the limit of detection for a particular study
design. Selection of the BMR based on the limit of detection is discussed in detail in Appendix B.
       Since the initial proposal of the BMD by Crump (1984), a number of papers have been
published dealing with application of the BMD/C approach. Many of these are included in the
reference list for this document. A background document on the use of the BMD/C in health risk
assessment was published by the EPA's Risk Assessment Forum (EPA, 1995c). This document
provided a framework for the decision points in the process of applying the BMD/C approach.
The background document is to be used as a companion to the current guidance document.
       Several workshops and symposia have been held to discuss the application of the BMD/C
and appropriate methodology (Kimmel et al., 1989; California EPA, 1993; Beck et al., 1993; SRA
Symposium, 1994; Barnes et al., 1995). The most recent workshop was conducted by ILSI Risk
Science Institute and sponsored by the EPA and AIHC (Barnes et al., 1995). The participants at
the workshop generally endorsed the application of the BMD/C for all quantal noncancer
endpoints and particularly for developmental toxicity, where a good deal of research has been
done. Less information was available at the time of the EPA/AIHC/ILSI workshop on the
application of the BMD/C approach to continuous data, and more work was encouraged. A
number of other issues concerning the application of the BMD/C were discussed. The guidance
and default options set forth in the current document are based in part on the outcome of this
workshop, the background document (EPA, 1995c), and on more recent information and
discussions.
       A number of research efforts, many of which have dealt with reproductive and
developmental toxicity data, have provided extremely useful information for application of the
BMD/C approach (e.g., Alexeeff et al., 1993; Catalano et al., 1993; Chen et al., 1991; Krewski
and Zhu, 1994, 1995; Auton, 1994; Crump, 1995). In a series of papers by Faustman et al.
(1994), Allen et al. (1994a and b), and Kavlock et al. (1995), the BMD approach was applied to a
large database of developmental toxicity studies. In brief, the results of these studies showed that
when data were expressed as counts of dichotomous endpoints (i.e., number of litters per dose
group with resorptions or malformations), the NOAEL was approximately 2-3 times higher than
the BMD for a 10% probability of response above control values (approximately 20 animals per
dose group), and 4-6 times higher than the BMD for a 5% probability of response. When the data
were expressed as the proportion of affected fetuses per litter (nested dichotomous data), the
NOAEL was on average 0.7 times the BMD for a 10% probability of response, and was
approximately equal, on average, to the BMD for a 5% probability of response. Expressing the
data as the proportion of affected fetuses per litter is the more appropriate way to analyze
developmental toxicity data. However, the results of the quantal data analysis also may apply to
using the BMD/C approach with other quantal data, and suggest that the NOAEL in these cases
may be at or above the 10% true response level, depending on sample size and background rate.
       Since reduced fetal weight in developmental toxicity studies often shows the lowest
NOAEL among the various endpoints evaluated, the application of the BMD to these continuous
data also was evaluated (Kavlock et al., 1995). A variety of cutoff values was explored for
defining an adverse level of weight reduction below control values. In some cases, data were
analyzed using a continuous power model, and in other cases, the data were transformed to
dichotomous data. Comparisons with the NOAEL showed that several cutoff values could be
used to give values similar to the NOAEL. These analyses suggest ways in which BMD/Cs may
be developed for continuous data from a variety of endpoints.
       In a recent paper, Crump (1995) detailed a new approach to deriving a BMD/C for
continuous data based on a method originally proposed by Gaylor and Slikker (1990). This
approach makes use of the distribution of continuous data, estimates the incidence of individuals
falling above or below a level considered to be adverse, and gives the probability of responses at
specified doses above the control levels. This results in an expression of the data in the same
terms as that derived from analyses of quantal data, and allows more direct comparison of BMDs
derived from continuous and quantal data. This approach has not been applied to many data sets,
as software has not been developed until recently.
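       The sketch below illustrates the general idea behind this kind of hybrid approach under
simplifying assumptions (a normally distributed response with constant standard deviation, a cutoff
set so that 1% of controls are counted as affected, and invented fitted means); it is not the specific
method of Crump (1995) or Gaylor and Slikker (1990).

    import numpy as np
    from scipy.stats import norm

    # Fitted control mean and common standard deviation from a continuous model.
    control_mean, sd = 100.0, 8.0
    p0 = 0.01                              # fraction of controls allowed below the cutoff
    cutoff = norm.ppf(p0, loc=control_mean, scale=sd)

    doses        = np.array([0.0, 10.0, 30.0, 100.0])
    fitted_means = np.array([100.0, 96.0, 90.0, 78.0])   # model-predicted means

    # Probability that an individual falls below the adverse cutoff at each dose,
    # expressed as extra risk over the control probability p0.
    prob_response = norm.cdf(cutoff, loc=fitted_means, scale=sd)
    extra_risk = (prob_response - p0) / (1 - p0)
    for d, er in zip(doses, extra_risk):
        print(f"dose {d:5.1f}: extra risk = {er:.3f}")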
       Another approach to the derivation of BMD/Cs for noncancer health effects is the
multinomial modeling approach. Many noncancer health effects are characterized by multiple
endpoints that are not completely independent of one another. Catalano et al. (1993), Chen et al.
(1991), and Zhu et al. (1994) have worked on this issue using developmental toxicity data, and
have shown that, in general, the BMD derived from a multinomial modeling approach is lower
than that for any individual endpoint. This approach has not been applied to other health effects
data, but should be kept in mind when multiple related outcomes are being considered for a
particular health effect.
       Other types of approaches have been proposed for analysis of noncancer data, e.g.,
categorical regression modeling (Dourson et al., 1985; Hertzberg, 1989; Hertzberg and Miller,
1985; Guth; Simpson), and Bayesian approaches (Hasselblad and Jarabek, 1995), that may be
more appropriate in some cases. There has not been much experience as yet in using these
approaches for deriving BMD/Cs.
       As discussed earlier, most of the effort on development of BMD/C approaches has been
related to noncancer health effects, but recent proposed guidelines for cancer risk assessment
(EPA, 1996a) suggest an approach to dose-response analysis that is similar to use of the BMD/C
for noncancer risk assessment. While terminology and exact procedures are not the same, the
concept is parallel to the BMD/C approach (see Section V for further discussion). Thus,
guidance is provided in this document on the use of the BMD/C as the point of departure for low
dose extrapolation of both cancer and noncancer health effects.

B. Purpose of This Guidance Document
       The purpose of this document is to provide guidance for the Agency and the outside
community on the application of the BMD/C approach in the calculation of RfD/Cs and other
exposure standards and in the estimation of cancer risks for EPA use. The document provides
guidance based on today's knowledge and understanding, and on experience gained in using this
approach. The Agency is actively applying this methodology and evaluating the outcomes for the
purpose of gaining experience in using it with a variety of endpoints. This document is intended
to be updated as new information becomes available that would suggest approaches and default
options alternative or additional to those indicated here. The document should not be viewed as
precluding additional research on modified or alternative approaches that will improve
quantitative risk assessment. In fact, the use of improved scientific understanding and
development of more mechanistically-based approaches to dose-response modeling is strongly
encouraged by the Agency.

C. Definition of the Benchmark Dose or Concentration
       The benchmark dose or concentration (BMD/C) is defined as the statistical lower
       confidence limit on the dose estimated to produce a predetermined level of change
       in response (the benchmark response - BMR) relative to controls.

       The BMD/C is intended to be used as an alternative to the NOAEL in deriving a point of
departure for low dose extrapolations. The BMD/C is a dose corresponding to some change in
the level of response relative to background, and is not dependent on the doses used in the study.
The BMR is based on a biologically significant level of response or on the response level at the
lower detection limit of the observable dose range for a particular endpoint in a standard study
design. The BMD/C approach does not reduce uncertainty inherent in extrapolating from animal
data to humans (except for that in the LOAEL to NOAEL extrapolation), and does not require
that a study identify a NOAEL, only that at least one dose be near the range of the response level
for the BMD/C.
                              III. BENCHMARK DOSE GUIDANCE
 2
 3     ' •      This section describes the proposed approach for carrying out a complete BMD/C
 4      analysis. It is organized in the form of a decision process including the rationale and defaults for
 5      proceeding through the analysis, and follows a similar framework to that outlined in the
 6      background document (EPA, 1995c). The guidance here imposes some constraints on the
 7      BMD/C analysis through decision criteria, and provides defaults when more than one feasible
 8      approach exists. Steps in the guidance are discussed here and in more detail in Appendices A-C.
 9                                                    .             :             •     •   .   .    ,
 10      A. Data Array Analysis - Endpoint Selection
 11             The first step in the process is a complete qualitative review of the literature to identify
 12      and characterize the hazards related to a particular compound or exposure situation. This process
 13      is the same whether a BMD/C analysis or a NOAEL approach is used. Guidance on review of
 14      data for risk assessment can be found in a number of EPA publications (EPA, 1991a, 1994c,
 15      1995f, 1996a and b). Further discussion of design issues, data reporting, and route extrapolation
 16      relevant to the BMD/C analysis can be found in Appendix A.
 17            Following a complete qualitative review of the data, the risk assessor must select the
 18      studies appropriate for benchmark dose analysis. The selection of the appropriate studies is based
 19      on the human exposure situation that is being addressed, the quality of the studies, and the
20      relevance and reporting adequacy of the endpoints.
21            The process of selecting studies for benchmark analysis is intended to identify those
 22      studies for which modeling is feasible, so that BMD/Cs can be calculated and used in risk
 23      assessment. In most cases the selection process will identify a single study or very few studies for
 24      which calculations are relevant, and all studies should be modeled.  However, cases with many
 25      studies, or studies in which many endpoints are reported would require a very large number of
 26      BMD/C calculations. Multivariate analysis of such studies to describe correlations between
 27      effects is a useful tool, but has not been explored in terms of the BMD/C method.
 28            The analysis of a large number of endpoints may result in benchmark calculations which
29      are redundant in the information they convey. In these cases, it is useful to select a subset of the

endpoints as representative of the effects in the target organ or the study.  This selection can be
made on the basis of sensitivity or severity, which may be more easily compared within a single
study in the same target organ than across studies. Within an experiment, an endpoint may be
selected based on how well it represents others for the same target organ and on its dose-response
behavior. It is reasonable to select a representative endpoint that shows smoothly increasing
response with increasing dose in order to obtain a good fit of the dose-response model.
1.  Selection of Endpoints to be Modeled
       Once endpoints have been evaluated with regard to their relevance for BMD/C analysis,
the selection of endpoints to model should focus on endpoints that are relevant or assumed
relevant to humans and potentially the "critical" effect (i.e., the most sensitive).  Since differences
in slope could result in an endpoint having a lower BMD but a higher LOAEL or NOAEL than
another endpoint, selection of endpoints should not be limited only to the one with the lowest
LOAEL.  In general, endpoints should be modeled if their LOAEL is up to 10-fold above the
lowest LOAEL. This will ensure that no endpoints with the potential to have the lowest BMD/C
are excluded from the analysis on the basis of the value of the LOAEL.
2. Minimum Data Set for Calculating a BMD/C
       Once the critical endpoints have been selected, the data sets are examined for the
appropriateness of a BMD/C analysis. The following constraints on data sets are used:
•      At a minimum, the number of dose groups and subjects should be sufficient to allow
       determination of a LOAEL.
•      There must be more than one exposure group with a response different than controls (this
       could be determined with a pairwise comparison or a trend test). With only one
       responding group, there is inadequate information about the shape of the dose-response
       curve, and mathematical modeling becomes too arbitrary.
•      Dose-response modeling is not appropriate if the responding groups all show responses
       near maximum (e.g., greater than 50% response for quantal data or a clear plateau of
       response for continuous data). In this case, there is inadequate information about the
       shape of the curve in the low dose region near the BMR.
 1      3.  Combining Data for a BMD/C Calculation
 2            Data sets that are statistically and biologically compatible may be combined prior to dose-
 3      response modeling and thus generate fewer BMD/C estimates for comparison.  The combining of
 4      appropriate data sets prior to modeling has several advantages. It leads to increased confidence,
 5      both statistical and biological, in the calculated BMD/C. In addition, the use of combined data
 6      sets may encourage further research to be conducted on that compound if the additional data can
 7      affect the BMD/C estimate. Example #3 for boron in Appendix D is a case where data could be
 8      combined for the BMD/C analysis.
 9
10      B.  Criteria for Selecting the Benchmark Response Level (BMR)
11            The Benchmark Dose Workshop (Barnes et al., 1995) recommended that the BMR
12      "should be within or near the experimental range of the doses studied." The Workshop's
13      recommendation in this regard was to use a 5% or 10% increase in response levels above controls
14      as the BMR. However, this recommendation addressed only the case of dichotomous data and
15      did not deal with continuous endpoints.  In addition, the variety of experimental designs used in
16      toxicology and the associated variety of experimental endpoints and their ranges of variability may
17      require a broader range of approaches. This proposal for selecting the BMR attempts to take into
18      account the wide array of toxicological responses.
19            Any toxicological study is limited in its ability to detect an increase in the incidence of
20      adverse responses or a change in the mean of continuous endpoints.  This limitation is determined
21      by aspects of the study design: largely, the number of independent experimental units per dose
22      group, the nature of any nesting or repeated measures in the design, the background incidence of
23      the adverse response (for dichotomous endpoints) or the variance of the control response (for
24      continuous endpoints), as well as the intended form of significance testing. For some endpoints,
25      such as adult weight, serum liver enzyme activities, and certain neurological measurements like
26      nerve conduction velocity, there is also a minimum change from control levels that is considered
27      "biologically significant." For endpoints for which there is no agreed upon biologically significant
28      change, quantifying the concept of "limit of detection" for a toxicological study can simplify
29      specifying the BMR.

 1             The BMD/C approach requires that a level of change in response (the BMR) be specified
 2      in order to calculate the BMD/C. In this proposal, there are two bases for specifying the BMR: a
 3      biologically significant change in response for continuous endpoints, or the limit of detection for
 4      either quantal or continuous data.  In most cases, the question concerning how much of a change
 5      in a continuous endpoint is biologically significant has not been addressed, and the level of change
 6      that is considered adverse is based in large part on the detectable level of response or limit of
 7      detection for a particular study design. For quantal data, the number of responders is counted,
 8      and again the limit of detection is used.  The limit of detection is based on the background
 9      response rate, sample size, power level, and whether extra or additional risk is used in the model.
10      Standard study designs2 for various endpoints should be used to determine the general
11      background rate for a response level and the number of animals typically used.  These concepts
12      are discussed in depth in Appendix B.
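
       The following sketch is purely illustrative (it is not Agency software, and the study-design
values are hypothetical): it approximates the smallest detectable increase in a quantal response
rate, and the corresponding extra risk, for a one-sided pairwise comparison at the 0.05
significance level with 50% power, using a normal approximation to the two-proportion test.

       # Illustrative sketch only: approximate the quantal limit of detection for a
       # one-sided pairwise comparison of proportions (alpha = 0.05, power = 50%).
       # Names and study-design values below are hypothetical.
       from scipy.stats import norm
       from scipy.optimize import brentq

       def detectable_rate(p0, n, alpha=0.05, power=0.50):
           """Treated-group rate p1 just detectable against background p0 with n per group."""
           z_a = norm.ppf(1 - alpha)      # one-sided critical value
           z_b = norm.ppf(power)          # equals 0.0 at 50% power

           def gap(p1):
               se = (p0 * (1 - p0) / n + p1 * (1 - p1) / n) ** 0.5
               return (p1 - p0) - (z_a + z_b) * se

           return brentq(gap, p0 + 1e-9, 1 - 1e-9)

       p0, n = 0.05, 20                        # 5% background, 20 animals per group
       p1 = detectable_rate(p0, n)
       extra_risk = (p1 - p0) / (1 - p0)       # limit of detection expressed as extra risk
       print(f"detectable rate ~{p1:.2f}; extra risk ~{extra_risk:.2f}")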
13             The following default decisions will be used in selection of the BMR.
14      •     Biological significance: If a particular level of response for a continuous endpoint has
15             been determined to be "biologically significant," the BMR is based on that degree of
16             change from background (e.g., adult body weight reduction >10% from the control value;
17             a 10% or more decrease in nerve conduction velocity).
18      •     Limit of Detection: When the biologically significant level of response has not been
19             determined or is equated with a statistically significant response, the limit of detection
20             method is used. To find the magnitude of response just detectable (the limit of detection),
21             a default power level of 50% (may increase, pending simulation studies) and a one-
22             sided test with a Type I error of 0.05 (p<0.05) is used.
23      •     Defaults: For quantal data, a 10% increase in extra risk will be used as the default
24             approach when neither biological significance nor the limit of detection has been
               2Using a cadre of standard study designs for a variety of endpoints is one way to reduce
        the uncertainty that can be introduced when different individuals make decisions for limits of
        detection. We assume here the use of Agency testing protocols as the basis for standard "good"
        study designs. In areas where standard testing protocols have not been developed, the Agency
        encourages activities that can assist in identifying a "most common well-designed protocol" for
        typically-studied endpoints.
 1             determined. When the BMR is set on the basis of biological significance, extra risk will be
 2             used as a default. When the limit of detection approach is used for the BMR, it does not
 3             matter whether extra or additional risk is used, as the BMR that corresponds to the same
 4             limit of detection for the risk formulation is determined and used in the model (the two
               risk formulations are defined immediately below).
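
       For reference, the two quantal risk formulations named above are conventionally defined as
follows (standard usage, stated here for convenience):

       \[ \text{additional risk: } A(d) = P(d) - P(0), \qquad \text{extra risk: } E(d) = \frac{P(d) - P(0)}{1 - P(0)}, \]

where P(d) is the probability of response at dose d.  The two coincide when the background
response P(0) is zero; otherwise extra risk is at least as large as additional risk.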
  6      C. Mathematical Modeling        •       "                                   ,      .
  7      1.  Introduction
 8             The goal of the mathematical modeling in benchmark dose computation is to fit a model to
 9      dose-response data that describes the data set, especially at the lower end of the observable dose-
 10      response range. The fitting must be done in a way that allows the uncertainty inherent in the data
 11      to be quantified and related to the estimate of the dose that would yield the benchmark response.
 12      In practice, this procedure will involve first selecting a family or families of models for further
 13      consideration, based on characteristics of the data and experimental design, and fitting the models
 14      using one of a few established methods.  Subsequently, a lower bound on dose is calculated at the
 15      BMR. This section is too brief to do more than introduce the topic of nonlinear modeling. Some
 16      references for further reading are: Chapter 10 of Draper and Smith (1981), Clayton and Hills
 17      (1993), Davidian and Giltinan (1995), and McCullagh and Nelder (1989).
 18            Dose-response models are expressed as functions of dose, possibly covariates, and a set of
 19      constants, called parameters, that govern the details of the shape of the resulting curve.  They are
 20      fitted to a data set by finding values of the parameters that adjust the predictions of the model for
 21      observed values of dose and covariates to be close to the observed response. At present,
 22      although biological models may often be expressed as nonlinear models (e.g., Michaelis-Menten
 23      processes), nonlinear models do not necessarily have a biological interpretation. Thus, criteria for
 24      final model selection will be based solely on whether various models describe the data,
 25      conventions for the particular endpoint under consideration, and, sometimes, the desire to fit the
 26      same basic model form to multiple data sets.  Since it is preferable to use special purpose
 27      modeling software, EPA is in the process of developing user-friendly software which includes
 28      several models and default processes as described in this document. The models included in the
 29      software are listed in Appendix E.
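
       As a concrete, purely illustrative sketch of the fitting step described above (the model form,
data, and parameter names are assumptions made for the example; in practice the Agency software
mentioned above would be used), the following fits a quantal Weibull model to dichotomous
dose-response data by maximum likelihood and computes the central-estimate BMD at a 10%
extra-risk BMR:

       # Illustrative only: maximum likelihood fit of a quantal Weibull model,
       #   P(d) = g + (1 - g) * (1 - exp(-b * d**a)),
       # to hypothetical dichotomous dose-response data.
       import numpy as np
       from scipy.optimize import minimize

       dose   = np.array([0.0, 10.0, 30.0, 100.0])   # dose levels
       n      = np.array([20, 20, 20, 20])           # animals per group
       events = np.array([1, 3, 8, 17])              # responders per group

       def prob(d, g, a, b):
           return g + (1.0 - g) * (1.0 - np.exp(-b * d**a))

       def negloglik(theta):
           g, a, b = theta
           p = np.clip(prob(dose, g, a, b), 1e-10, 1 - 1e-10)
           return -np.sum(events * np.log(p) + (n - events) * np.log(1.0 - p))

       # Power (exponent) constrained to be >= 1, the default restriction in this
       # guidance; the upper bound is arbitrary, for numerical stability only.
       fit = minimize(negloglik, x0=[0.05, 1.0, 0.01], method="L-BFGS-B",
                      bounds=[(0.0, 0.99), (1.0, 18.0), (1e-8, None)])
       g, a, b = fit.x

       bmr = 0.10                                    # 10% extra risk
       bmd = (-np.log(1.0 - bmr) / b) ** (1.0 / a)   # solves E(BMD) = BMR
       print("parameters:", fit.x, " BMD (MLE):", round(bmd, 1))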

       This section provides guidance on how to go about choosing a model structure
appropriate to the data being analyzed, the order of model application, confidence limit
calculation, selection of "equivalent" models, and selection of the BMD/C to use as the point of
departure. More in-depth discussion of these topics can be found in Appendix C. Examples of
BMD/C modeling of actual data sets can be found in Appendix D.
2.  Order of Model Application
       The models should be executed either using software produced and distributed by the
EPA or using software that carries out the curve fitting in a manner similar to the EPA software.
Allowing the use of more than one model at this stage is done because there is not enough
experience with modeling a wide variety of endpoints, and the best-fitting model may be
somewhat endpoint-specific.  The following order for running the models should be used
(common forms of these default models are sketched after this list):
•      Continuous data: A linear model should be run first. If the fit to a linear model is not
       adequate, the polynomial model should be run, followed by the continuous power model.
       Other models may be applied at the discretion of the risk assessor.
•      Dichotomous data: As there is currently no rationale for selecting one versus another, one
       or more of the following models should be applied: the log-logistic, Weibull, and
       polynomial models. For developmental toxicity data, models with fetuses nested within
       litters should be used. The nested log-logistic model tended to give a better fit more often
       in studies modeled by Allen et al. (1994b), but other models often gave a good fit as well.
       Other models may be applied to dichotomous data at the discretion of the risk assessor.
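
       For orientation only, conventional parameterizations of the models named above are shown
below; the exact forms implemented in the Agency software may differ in detail.  Here d is dose,
\mu(d) the mean of a continuous endpoint, P(d) the probability of response for a quantal
endpoint, and \gamma a background parameter (not included by default; see Section III.C.3):

       \[ \text{linear: } \mu(d) = \beta_0 + \beta_1 d; \qquad
          \text{polynomial: } \mu(d) = \beta_0 + \beta_1 d + \cdots + \beta_k d^k; \qquad
          \text{power: } \mu(d) = \beta_0 + \beta_1 d^{\delta} \]

       \[ \text{quantal Weibull: } P(d) = \gamma + (1-\gamma)\,(1 - e^{-\beta d^{\alpha}}); \qquad
          \text{log-logistic: } P(d) = \gamma + \frac{1-\gamma}{1 + e^{-(\alpha + \beta \ln d)}} \]

       \[ \text{quantal polynomial (multistage-type): } P(d) = \gamma + (1-\gamma)\,\bigl(1 - e^{-(\beta_1 d + \cdots + \beta_k d^k)}\bigr) \]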
 3. Determining the Model Structure
       The parameters included in the models most commonly used for dose-response analysis,
 and examination of the change in the shape of the dose-response curve as those parameters are
 changed, are described in the background document (EPA, 1995c).  This section provides
 guidance on choosing a model structure appropriate to the data being analyzed.
•      Dose Parameter Restriction:  As a default, the exponent on the dose parameter in the
       quantal Weibull model and the slope coefficient in the log-logistic model will be
       constrained to be greater than or equal to 1. Unconstrained models may be applied if
       necessary to fit certain data.  This constraint is typically necessary to avoid unstable
       numerical properties in calculating the confidence interval when the parameter value is less
       than 1.
•      Degree of the Polynomial: A 2-step procedure will be used as a default for determining
       the degree of the polynomial model for both continuous and quantal data
       (this is a place-holder, pending further discussion and agreement). First, a default of
       the degree equal to k-1, where k is the number of groups, will be used. Second, a
       stepwise top-down reduction in the degree will be performed to select the model with the
       fewest parameters that still achieves adequate fit (based on p>0.05 from the goodness-of-
       fit statistic). This approach will be built into software distributed by EPA (a schematic
       sketch of this selection loop appears after this list).
•      Background Parameter: As a default approach, the background parameter will not be
       included in the models.  A background term should be included if there is evidence of a
       background response.  This would be the case if there is a non-zero response in the
       control group.  A background parameter would also be included if the first 2 or more
       dosed groups do not show a monotonic increase in response (e.g., if the first 2 dosed
       groups show the same response or if the second dosed group has a lower response than
       the lowest dosed group).  If there is doubt about whether a background parameter is
       needed, it is usually conservative (i.e., will result in a lower BMD) if the background
       parameter is excluded from the model. This is an area where more work is needed.
•      Use of extra risk versus additional risk for quantal data: As in selection of the BMR,
       above.
•      Threshold parameter:  A so-called "threshold" (intercept) term will not be included in the
       models used for BMD/C analysis because it is not a biologically meaningful parameter
       (i.e., not the same as a biological threshold) and because most data sets can be fit
       adequately without this parameter and the associated loss of a degree of freedom.  This
       will be the default built into the software distributed by EPA.
•      Conversion of continuous data to dichotomous format:  The standard approach to BMD/C
       analysis will be to model the continuous data directly, without conversion to the
       dichotomous format.  Alternatively, a hybrid approach such as that described by Crump
       (1995) can be used. The hybrid approach models the continuous data, then uses the
       resulting distribution of the control data to calculate a probability estimate.
              The conversion of continuous data to dichotomous data and modeling of the
       dichotomous data is not preferred because of the loss of information about the distribution
       of the response that is inherent in the approach, and because of the uncertainties in
       defining a cut-off response level to distinguish responders and nonresponders. Conversion
       to dichotomous data could be considered in the rare cases where much is known about the
       biological consequences of the response and a cutoff can be defined more confidently, or
       in cases where the need for a probabilistic estimate of response outweighs the loss of
       information.
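
       The following is a schematic, illustrative sketch of the 2-step polynomial degree selection
described in the "Degree of the Polynomial" bullet above.  The summary data (group means and
standard errors), the weighted least-squares fit, and the lack-of-fit test are all assumptions made
for the example; they are not prescriptions of this guidance.

       # Illustrative sketch: start at degree k-1 and step down to the lowest degree
       # whose goodness of fit remains adequate (p > 0.05).  Hypothetical data.
       import numpy as np
       from scipy.stats import chi2

       dose = np.array([0.0, 5.0, 15.0, 50.0, 150.0])   # k = 5 dose groups
       mean = np.array([10.0, 10.4, 11.5, 14.2, 21.0])  # observed group means
       sem  = np.array([0.4, 0.4, 0.5, 0.6, 0.9])       # standard errors of the means

       def gof_p(degree):
           """Weighted least-squares polynomial fit and lack-of-fit p-value."""
           if degree + 1 >= len(dose):
               return 1.0                   # saturated model reproduces the means exactly
           coef = np.polyfit(dose, mean, deg=degree, w=1.0 / sem)
           resid = (mean - np.polyval(coef, dose)) / sem
           df = len(dose) - (degree + 1)
           return float(chi2.sf(np.sum(resid**2), df))

       k = len(dose)
       selected = k - 1                      # step 1: default starting degree
       for degree in range(k - 2, 0, -1):    # step 2: stepwise top-down reduction
           if gof_p(degree) > 0.05:
               selected = degree
           else:
               break

       print("selected polynomial degree:", selected)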
4.  Confidence Limit Calculation
       The confidence limits will be calculated using likelihood theory and will be based on the
asymptotic distribution of the likelihood ratio statistic (with the exception of models fit using
GEE methods; see Appendix C). This will be the default built into the software distributed by
EPA. The 95% lower confidence bound on dose will be used as the default confidence limit for
calculating the BMD/C.
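
       One common way to operationalize such a likelihood-ratio-based lower bound (stated here
as an illustration of the idea, not as the Agency's exact algorithm) is by profile likelihood: with
\ell(\hat\theta) the maximized log-likelihood and \tilde\ell(d) the log-likelihood re-maximized
subject to the constraint that the model's BMD equals d, the 95% one-sided lower limit is

       \[ \mathrm{BMD/C} = \min\Bigl\{ d : 2\bigl[\ell(\hat\theta) - \tilde\ell(d)\bigr] \le \chi^2_{1,\,0.90} \approx 2.71 \Bigr\}, \]

where the 0.90 quantile of the chi-square distribution with one degree of freedom yields a
one-sided 95% bound.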
5.  Selection of "Equivalent" Models
       Because each of the available models has some degree of flexibility and is capable of
 describing a range of dose-response patterns, it may be the case that several models seem
 appropriate for the analysis of the data under consideration.  This section describes how to
 evaluate whether these models are equivalent based upon an evaluation of statistical and
biological considerations.
•      All models should be retained that are not rejected at the probability of p>0.05 using the
       goodness-of-fit (GOF) statistic.
•      A graphical representation of the model should also be developed and evaluated for each
       model. Models should be eliminated that do not adequately describe the dose-response in
       the range near the BMR. This is because it is possible that an adequate model fit could be
       obtained based on the GOF criteria alone, but the fit of the data is not adequate at the low
       end of the dose-response curve. Such a case should not be used for the BMD/C
       calculation. This is a subjective judgment and requires that software for dose-response
       analysis provide adequate graphical functions.  No objective criteria for this decision are
       currently available.  It should be noted, however, that most quantal models assume that
       each animal responds independently and has an equal probability of responding. Similarly,
       for continuous data, the responses are assumed to be distributed according to a normal
       probability distribution. When these assumptions are not appropriate, for example in
       studies of developmental toxicity where the responses are correlated within litters,
       alternative model structures may be used. Biological considerations might also be helpful
       to determine adequate model fit. For example, a smooth change of slope may be deemed
       more reasonable for a given response than an abrupt change.
•      If adequate fit is not achieved because of the influence of high dose groups with high
       response rates, the assessor should consider adjusting the data set by eliminating the high
       dose group. This practice carries with it the loss of a degree of freedom, but may be
       useful in cases where the response plateaus or drops off at high doses.  Since the focus of
       the BMD analysis is on the low dose and response region, eliminating high dose groups is
       reasonable.
       At this point, the remaining models should be considered "equivalent" in terms of their
usefulness for BMD/C analysis, especially when there is no biological basis for distinguishing
between the models or for choosing the best model.
6. Selecting the BMD/C
       As the remaining models have met the default statistical criterion for adequacy and visually
fit the data, any of them theoretically could be used for determining the BMD/C. The remaining
criteria for selecting the BMD/C are necessarily somewhat arbitrary, and are adopted as defaults
(a schematic of this selection logic follows the list).
•      If the BMD/C estimates from the remaining models are within a factor of 3, then they are
       considered to show no appreciable model dependence and will be considered
       indistinguishable in the context of the precision of the methods.  Models are ranked based
       on the values of their Akaike Information Criterion (AIC), a measure of the deviance of
       the model fit adjusted for the degrees of freedom, and the model with the lowest AIC is
       used to calculate the BMD/C.
•      If the BMD/C estimates from the remaining models are not within a factor of 3, some
       model dependence of the estimate is assumed. Since there is no clear remaining biological
       or statistical basis on which to choose among them, the lowest BMD/C is selected as a
       reasonable conservative estimate.  If the lowest BMD/C from the available models appears
       to be an outlier compared to the other results (e.g., if the other results are within a factor
       of 3), then additional analysis and discussion would be appropriate. Additional analysis
       might include the use of additional models, the examination of the parameter values for the
       models used, or an evaluation of the MLEs to determine if the same pattern exists as for
       the BMD/Cs.  Discussion of the decision procedure should always be provided.
•      In some cases, relevant data for a given agent are not amenable to modeling and a mixture
       of BMD/Cs and NOAEL/LOAELs results. When this occurs, and the critical effect is
       from a study considered adequate but not amenable to modeling, the NOAEL should be
       used as the point of departure.
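
       The following schematic illustrates the default selection logic described in the first two
bullets above; the model names, BMD/C values, and AIC values are hypothetical and are chosen
only to show the mechanics.

       # Schematic of the default selection logic: agreement within a factor of 3,
       # pick the lowest AIC; otherwise fall back to the lowest BMD/C.
       candidates = {                       # model name: (BMD/C estimate, AIC)
           "log-logistic": (12.0, 101.3),
           "Weibull":      (15.5,  99.8),
           "polynomial":   (18.0, 102.6),
       }

       bmdcs = [bmdc for bmdc, _ in candidates.values()]
       if max(bmdcs) / min(bmdcs) > 3.0:
           # Appreciable model dependence: use the lowest BMD/C as a conservative default.
           chosen = min(candidates, key=lambda m: candidates[m][0])
       else:
           # Agreement within a factor of 3: rank by AIC and use the lowest.
           chosen = min(candidates, key=lambda m: candidates[m][1])

       print("selected model:", chosen, "BMD/C =", candidates[chosen][0])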
 1            IV. USING THE BMD/C IN NONCANCER DOSE-RESPONSE ANALYSIS
 2
   3      A. Introduction                                             , '
   4            Use of the BMD/C approach for noncancer dose-response analysis on a broad scale raises
   5      a number of issues.  A wide variety of endpoints of noncancer health effects must be considered in
   6      the development of exposure limits. Experience with the BMD/C approach varies depending on
   7      the disciplinary area and type of endpoint under consideration.  As indicated earlier in the
   8      document,  a good deal of work has been done for developmental toxicity, less for neurotoxicity,
   9      and relatively little for other types of noncancer health effects to date. Although some BMD/Cs
 10      are based on continuous variables and others on discrete (mostly dichotomous) variables, this is
 11      not a problem using the approach advocated in this document.  Basically, the BMR is set by
 12      selecting levels of response that are considered biologically significant and/or near the limit of
 13      detection for a particular endpoint in a given study design, using much the same basis as has been
 14   ,   done for setting NOAELs.
15            A few RfCs and one RfD included in the IRIS database (EPA, 1996c) have been based on
16     BMD/C calculations. These include: methyl mercury (RfD based on delayed postnatal
17     development in humans), carbon disulfide (Example #1 in Appendix D), HFC-134a (Example #2
18     in Appendix D) and antimony trioxide (RfC based on chronic pulmonary interstitial inflammation
19     in female rats). A few other risk assessments have been developed using the BMD/C approach,
20     including manganese (RfC based on neurotoxicity in humans) (EPA, 1995g), and diesel exhaust
21     (RfC based on lung irritation) (EPA, 1995), and others are under development (e.g., boron -
22     Example #3 in Appendix D; 1,3-butadiene - Example #4 in Appendix D).  All of these cases have
 23     involved a variety of approaches to calculating the BMD/C and to deriving RfDs and/or RfCs. In
 24     addition, the Agency is continuing to conduct  such evaluations as part of ongoing risk
 25     assessments and is aware of a number of similar efforts outside the Agency.
 26
 27     B. Effect of the BMD/C Approach on Use of Uncertainty Factors
28           Uncertainty always accompanies inferences regarding the derivation of RfD/Cs or other
 29     exposure limits. Whether a threshold for noncancer effects is assumed or not,  inferences about

 1     lower human exposure levels usually embody one or more extrapolations which may include:
 2     animal to human, human to sensitive human populations, acute exposure to chronic exposure, and
 3     LOAEL to NOAEL (if a NOAEL is not available). Each of these extrapolations carries attendant
 4     uncertainties. The only change in the use of uncertainty factors when using the BMD/C as the
 5     point of departure is that the LOAEL to NOAEL uncertainty factor is no longer necessary.
 6
 7     C. Dose-Response Characterization
 8            Dose-response analyses based on BMD/Cs provide a point of departure for calculating
 9     MOEs, RfD/Cs, and other exposure estimates. Although the BMD/C is associated with a defined
10     level of risk in the study population from which the BMD/C was calculated, it would be
11     misleading to translate that to a level of risk at the MOE or RfD/C or other reference value. The
12     dose at the MOE or reference value is intended to yield human risks that are lower than the risk to
13     test species at the BMD/C, but the exact degree of protection is unknown, given the attendant
14     uncertainties in extrapolation of low dose nonlinear response relationships.
15            This issue was discussed in some detail at the BMD workshop (Barnes et al., 1995). The
16     participants concluded that the level of effect at the BMD/C in the experimental species cannot be
17     translated to a level of risk at the RfD/C for humans.  They pointed out that typically the average
18     level of effect in the test species at the BMD/C is less than the level of response at the BMR (e.g.,
19     5%, 10%), since the BMD/C is defined  as the lower 95% confidence limit on the dose at the
20     BMR.  They recommended an approach (discussed further below) to communicating the use of
21     the BMD/C in such situations, and that  this information be included in IRIS along with the central
22     estimates (e.g., MLEs), BMD/Cs, MOEs, RfD/Cs or other reference values.
23            The following statement, modified from the BMD workshop report, is an example of what
24     can be used to communicate these values based on BMD/Cs:

              The BMD/C corresponds to a dose level which yields (with 95%
              confidence) a level of effect in the test species of, for example, 10% or
              less for quantal data, or that represents a change from the control mean
              of, for example, 5% for continuous data.  This is about the lowest level
              of effect that can be detected reliably in an experimental study of this
              design.  Alternatively, if the BMR is based on a degree of change that is
              considered biologically significant, e.g., a 10% or greater reduction in
              adult body weight or reduction in birth weight below 2.5 kg, the
              BMD/C represents the lower confidence limit on dose for that degree
              of change. Overall, the BMD/C will be a more consistent point of
              departure than the NOAEL and will not be constrained by the doses
              used in a particular study.
 1      V. USE OF BENCHMARK-STYLE APPROACHES IN CANCER RISK ASSESSMENT
 2
 3            Although use of a benchmark approach in noncancer assessment is relatively new,
 4      benchmark-style approaches have been used in ranking and regulating potential carcinogens for
 5      many years. Historically, cancer risk assessors have not used the term "benchmark dose,"
 6      preferring terms such as "TD50" and "ED10," because these latter terms draw an analogy to the
 7      LD50 and maintain continuity in terminology with the published cancer literature.  The principal
 8      use of these approaches has been in hazard ranking; more recently, a benchmark-style approach is
 9      being proposed for extrapolation below the observed range for tumor incidence in the proposed
10      cancer guidelines (EPA, 1996a).  This section provides guidance for cancer hazard ranking and
11      low-dose extrapolation.
12
 13      A. Use in Hazard Ranking
 14            Hazard ranking, the comparison of quantitative potency estimates across chemicals, has
15      been the principal use of ED10s and similar estimates. This use does not require that low-dose risk
16      be quantified; consequently, rankings can be based on potency estimates from the experimental
17      range, which are mostly independent of choice of model. EPA has used ED10s in regulations and
18      proposals that set priorities for emergency response (U.S. EPA, 1987, 1989a, 1995a) and
19      evaluate health hazards from air pollutants (U.S. EPA, 1994a). This section provides guidance
20     for developing ED10s, cites sources where TD50s and ED10s can be obtained, and reviews
 21     examples where TD50s and ED10s have been adapted for other uses.
22     1.  Guidance for Developing Comparable ED10s
 23            Hazard rankings are based on the notion that potency estimates for different chemicals and
 24     experimental protocols can be made comparable. It is customary to adjust for background tumor
 25     rates, combine fatal and incidental tumors, and correct for early mortality (Peto et al., 1984;
 26     Sawyer et al., 1984). Other calculation conventions include a 2-year standard lifespan for rat and
 27     mouse studies; standard food, water, and air intake factors for each sex and species; standard
 28     absorption rates for each exposure route; use of time-weighted-average doses; and correction for
 29     less-than-lifespan studies (Peto et al., 1984). A common method promotes consistency; for

 1      example, life-table methods give similar, but usually lower, TD50s than summary incidence methods
 2      (Gold et al., 1986a).
 3            These factors are covered in EPA's methodology for developing ED10s and applying them
 4      in hazard rankings (EPA, 1988c, 1994b), incorporated here by reference.  The methodology
 5      specifies detailed approaches for selecting data sets and tumor responses; deriving equivalent
 6      doses across species, dosing regimens, and exposure routes; adjusting for less-than-lifetime dosing
 7      or survival; and modeling in the experimental range. To allow consistent estimates where data are
 8      missing, the methodology provides defaults. With this approach, ED10s for different chemicals
 9      and experimental protocols are made comparable.
 10     2.  Sources of TD50s and ED10s
11            TD50s and ED10s are readily available from the published literature. The first and most
12     comprehensive compilation is the Carcinogenic Potency Database (CPDB), developed and
13     regularly updated by Gold et al. (1984, 1986b, 1987, 1990, 1993, in press). The CPDB provides
14     standardized information on over 4400 carcinogenicity experiments on over 1100 chemicals.
15     Carcinogenic potency is described by the TD50, defined as "the chronic dose rate that will halve
16     the probability of remaining tumor-free throughout the standard life span." TD50s have been
17     used to investigate many questions, including those concerning chemical carcinogenesis,
18     carcinogen identification, cross-species extrapolation, reproducibility of results, and ranking
19     possible carcinogenic hazards. TD50s from the CPDB have been used by California EPA as an
20     alternative method for low-dose extrapolation (Hoover et al., 1995).
21            EPA uses ED10s to rank potential carcinogens found at waste sites (EPA, 1988c, 1989b,
22     1995b) and evaluate health hazards from air pollutants (EPA, 1994b). These references provide
23     standardized cancer potency estimates for more than 100 chemicals found at waste sites and more
24     than 80 air pollutants. ED10s from these references have been used in other comparative risk
25     analyses.  The 10-percent level was chosen to represent the lower end of the experimental range
26     and, thus, be more pertinent to human environmental exposure than TD50s.
27            Judgment is essential when adapting TD50s or ED10s to a particular application. For
28     example, the inclusive nature of the CPDB supports use of TD50s as a midrange response, as many
29     carcinogenicity experiments test only one high dose. In contrast, EPA uses ED10s from the lower

 1     end of the experimental range because of its focus on human environmental exposure.
 2
 3     B. Use in Low-Dose Extrapolation
 4            Low-dose extrapolation goes beyond the experimental information on tumor incidence to
 5     estimate risks at lower doses.  Extrapolations generally have been described by a slope factor,
 6     which is an upper bound on the slope of an assumed linear dose-response curve at low doses.
 7     Slope factors can be multiplied by exposure levels to bound the cancer risk. Recently proposed
 8     cancer guidelines (EPA, 1996a) provide alternative approaches to low-dose extrapolation, with
 9     linear, nonlinear or both as defaults based on an LED10.  This section summarizes the proposed
10     use of LED10s, describes how the proposed use of LED10s would change existing slope factors,
11     and discusses some issues that have appeared in the literature.
12     1. Proposal for low-dose extrapolation under the 1996 Proposed Guidelines for Carcinogen
13     Risk Assessment
14            As described previously, the dose-response assessment under the new guidelines is a two-
15     step process. In the first step, response data are modeled in the range of observation. It should
16     be noted that in addition to modeling tumor data, the proposed guidelines allow for the
17     opportunity to use and model other kinds of responses if they are considered to be important
18     measures of carcinogenic risk. In the second step, extrapolation below the range of observation is
19     accomplished by biologically based or case-specific modeling if there are sufficient data, or by a
20     default procedure using a curve-fitting model. The proposed guidelines indicate a preference for
21     biologically based dose-response models, such as the two-stage model of initiation plus clonal
22     expansion and progression (Moolgavkar and Knudson, 1981; Chen and Farland, 1991; EPA,
23     1995d) for the extrapolation of risk. Because the parameters of these models require extensive
24     data, it is anticipated that the necessary data to support these models will not be available for most
25     chemicals. Therefore, the 1996 proposed guidelines allow the use of several default extrapolation
26     approaches.
27            The default extrapolation approaches are based on "curve-fitting" in the observed range to
 1      determine the lower 95% confidence limit on a dose associated with 10% extra risk (LED10).3
 2      The LED10 is proposed as a standard point of departure. The 10% response is at or near the limit
 3      of sensitivity in most cancer bioassays.  Other points of departure may be appropriate, e.g., if a
 4      response is observed below the 10% level. The point of departure forms the basis for the default
 5      extrapolation approaches described below.
 6      Linear default extrapolation procedure--The LMS procedure of the 1986 guidelines for
 7      extrapolating risk from upper confidence intervals is no longer recommended as the linear default
 8      in the 1996 proposed guidelines. The linear default in the new guidelines is a straight-line
 9      extrapolation to the origin from the point of departure identified by curve fitting in the range of
10      observed data (the slope of this line is 0.10/LED10, which is inversely related to the LED10, i.e.,
11      high potency is indicated by a large slope and a small LED10). It should be noted that the
12      straight-line extrapolation from the LED10 and the LMS procedure produce similar results (Gaylor
13      and Kodell, 1980).
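
       As a purely numerical illustration of the straight-line default (the LED10 and exposure values
here are invented for the example):

       \[ \text{slope factor} = \frac{0.10}{\mathrm{LED}_{10}}; \qquad
          \mathrm{LED}_{10} = 5\ \text{mg/kg-day} \;\Rightarrow\; \text{slope} = 0.02\ (\text{mg/kg-day})^{-1}, \]

so an exposure of 0.001 mg/kg-day would correspond to an upper-bound extra cancer risk of
about 0.02 x 0.001 = 2 x 10^-5 under this default.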
 14            The straight-line/LED10 approach does not imply unfounded sophistication as
 15      extrapolation with the LMS procedure does. The linear default approach would be considered for
 16      agents that directly affect growth control at the DNA level (e.g., carcinogens that directly interact
 17      with DNA and produce mutations).  There might be modes of action other than DNA reactivity
 18      (e.g., certain receptor-based mechanisms) that are better supported by the assumption of linearity.
19      When inadequate or no information exists to explain the mode of action of a carcinogen, the linear
20      default approach would be used as a science policy choice in the interest of public health.
21      Likewise, a linear default would be used if evidence demonstrates the lack of support for linearity
22      (e.g., negative genotoxicity studies), but there is also an absence of sufficient information on
23      another mode of action to explain the induced tumor response.  The latter is also a public health
24      conservative policy choice.                                                         ,
25      Nonlinear default extrapolation procedure—Although the understanding of the mechanisms of
26      induced carcinogenesis likely will never be complete for most agents, there are situations where
              3The 95 percent lower bound on the effective dose associated with a 10 percent increase in
       response; equivalent to the BMD/C. The term LED10 is used in this section because of its use in
       the proposed guidelines for carcinogen risk assessment (EPA, 1996a).
 1     evidence is sufficient to support an assumption of nonlinearity. Because it is experimentally
 2     difficult to distinguish modes of action with true "thresholds" from others with a nonlinear dose-
 3     response relationship, the proposed nonlinear default procedure is considered a practical approach
 4     to use without the necessity of distinguishing sources of nonlinearity. Moreover, the use of
 5     empirical models to approximate a nonlinear dose-response relationship is discouraged, because
 6     different models can give dramatically different results below the observable range of tumor
 7     response, with no basis for choosing among them. Thus, in the 1996 proposed guidelines, the
 8     nonlinear default approach begins at the identified point of departure (LED10) and provides a
 9     margin of exposure (MOE) analysis rather than estimating the probability of effects at low doses.
10            A nonlinear default position must be consistent with the understanding of the agent's
11     mode of action in causing tumors. For example, a nonlinear default approach would be taken for
12     agents causing tumors as a secondary consequence of organ toxicity or induced physiological
13     disturbances. Because there must be a sufficient understanding of the agent's mode of action to
14     take the nonlinear default position, and because the proposed guidelines allow for the opportunity
15     to model not only tumor data but other responses thought to be important precursor events in the
16     carcinogenic process (e.g., DNA adducts, mutation, cellular proliferation, hormonal or
17     physiological disturbances, receptor binding), modeling of key nontumor data is anticipated to
 18     make extrapolation based on the nonlinear default procedures more meaningful by providing
 19     insights into the relationships of exposure and tumor response below the observable range.
20     Nontumor data may actually be used instead of tumor data for determining the point of departure
21      for the MOE analysis.
22            The MOE analysis is used to compare the LED10 with the human exposure levels of
23      interest (an illustrative calculation follows the list below).  The acceptability of an MOE is a
24      matter for risk management; thus, the key objective of the MOE analysis is to describe for the
25      risk manager how rapidly response may decline with dose.  The MOE analysis considers:
•      steepness of the slope of the dose response, including the degree to which the dose-
       response relationship deviates from a straight line.
•      human differences in sensitivity - if this cannot be determined, it is considered to be at
       least 10-fold.
•      interspecies differences - if this cannot be determined, humans can be considered 10-fold
       more sensitive; if evidence shows humans to be less sensitive, a fraction no smaller than
       1/10 can be used. This compares cross-species sensitivity to equivalent doses, which are
       calculated using either toxicokinetic information or an oral default scaling factor based on
       equivalence of mg/kg3/4-d (US EPA, 1992), or an inhalation default using the RfC dosimetry
       approach (US EPA, 1994c).
•      nature of the response being used for the point of departure, i.e., tumor or nontumor
       data - tumor data might support a greater MOE than a more sensitive precursor response
       which can be measured at lower exposures.
•      biopersistence of the agent becomes an important factor in the MOE analysis if
       nontumor precursor response data from less than life-time exposures are used for
       determining the point of departure for extrapolation of risk.
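
       An illustrative MOE calculation (all values are invented for the example; the body-weight
adjustment shown is one standard implementation of the mg/kg3/4-day equivalence cited in the
interspecies bullet above):

       \[ \mathrm{LED}_{10}^{\text{human-equivalent}} \;\approx\; \mathrm{LED}_{10}^{\text{animal}} \times
          \Bigl(\tfrac{BW_{\text{animal}}}{BW_{\text{human}}}\Bigr)^{1/4}
          = 5 \times \Bigl(\tfrac{0.35}{70}\Bigr)^{1/4} \approx 1.3\ \text{mg/kg-day}, \]

       \[ \mathrm{MOE} = \frac{\mathrm{LED}_{10}^{\text{human-equivalent}}}{\text{human exposure of interest}}
          = \frac{1.3}{0.001} \approx 1{,}300. \]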
13     Both linear and nonlinear defaults--There may be situations where it is appropriate to consider
14     both linear and nonlinear default procedures.  For example, an agent may produce tumors at
15     multiple sites by different mechanisms. If it is apparent that an agent is both DNA reactive and
16     highly active as a promoter at higher doses, both linear and nonlinear default procedures may be
17     used to distinguish between the events operative at different portions of the dose-response curve
18     and to consider the contribution of both phenomena.
19            In curve-fitting the data in the observed range to determine a point of departure for the
20     defaults discussed above, the multistage model can be used as a default empirical curve-fitting
21     model; personal computer software has been developed for EPA analyses (Howe et al., 1986).
22     Results from other software packages, even those based on the same model, can be difficult to
23     compare, because they can employ different parameter constraints, dose transformations,
24     background incidence adjustments, confidence interval methods, or goodness of fit statistics. For
25     example, a spreadsheet version of the multistage model (Haas, 1994) uses a logarithmic
26      transformation of dose to implement the nonnegativity constraints in the multistage model; such
27     variants can be statistically sound yet introduce small differences in results compared to other
28     EPA analyses.
29      Slope factors developed from LED10s and alternative points of departure, including ED10s
 1     and ED01s, are stable in the face of minor perturbations in incidence in controls and at the lowest
 2     tumor dose (Gaylor et al., 1994).  This represents an improvement over the maximum likelihood
 3     estimate of slope from the linearized multistage procedure, which is not statistically stable below
 4     the experimental range (Krewski et al., 1990, 1991; Gaylor et al., 1994).
 5     [Note:  Use of nontumor data is currently being discussed, and additional information will be
 6     added later.]
  7     2. Issues in choosing a point of departure
 8            The approach of the proposed cancer guidelines (U.S. EPA, 1996a) reflects an
 9     observation that carcinogenicity bioassay studies can provide information in the experimental
10     range, but cannot determine the nature of the true dose-response relationship below that range;
11     this suggests the separation of analysis in the experimental range from extrapolation to lower
12     doses (Gaylor and Kodell, 1980).  Using a model to extrapolate to a pre-determined risk level,
13     with further extrapolation by straight line, was first proposed by Van Ryzin (1980). This proposal
14     allowed that extrapolation by model could proceed to a risk of 0.01 percent.  Later refinements
15     suggested that extrapolation stop at 1 percent to avoid model dependency (Farmer et al., 1982).
16     For an upper bound on linear extrapolation, a straight line from an upper bound on risk at the
17     lower end of the experimental range had been proposed by Gaylor and Kodell (1980).
18            Judicious choice of a starting point is key to the credibility of a low-dose extrapolation.
19     The high correlations established by Cogliano (1986) and Krewski et al. (1990) led to the idea
20     that ED10s can serve as a common measure for both potency ranking and a starting point for low-
21     dose extrapolation (NRC, 1993). The concerns of Gold et al. (1993), Wartenberg and Gallo
22     (1990), and Hoel (1990) remind the risk assessor that effects at high and low doses can be
23     different, and that the shape of the dose-response curve in the experimental range contains useful
24     information.  The concern about overestimation if the dose-response curve turns upward further
25     highlights differences between high and low doses.  These concerns can be addressed, in part, by
26     using a point of departure near the lower end of the experimental range.
 1      C. Dose-Response Characterization
 2            Cancer risk assessments attempt to describe a dose-response relationship that can be used
   3      to evaluate risks over a range of environmental exposure levels. All relevant experimental
   4      information is considered, with incomplete information addressed by a set of default assumptions
   5      and adjustments.  Some adjustments, for example, the cross-species scaling factor and the time-
   6      weighted-average dose metric, are intended as unbiased defaults not expected to contribute to
   7      health-conservative estimates (EPA, 1992).                       .
 8            The principal health-conservative defaults in cancer risk assessment have been the
 9      presumption that animal results are pertinent to humans, the use of low-dose linear models to
10      extrapolate these results to lower doses, and the use of statistical upper bounds to express slope
11      factors. The proposed cancer guidelines (EPA, 1996a) would not change two of these defaults:
12      animal results would still be considered pertinent to humans, in the absence of convincing
13      evidence to the contrary; and slope factors would retain the use of statistical bounds, though this
  14      would be more explicitly communicated by the "L" in the term "LED10."
  15             In contrast, the proposed availability of both linear and nonlinear defaults, each using
  16      LED10s as a point of departure, signals a shift from routine application of linear defaults toward a
  17      most-plausible-hypothesis approach. When a linear default is used, the LED10 approach should
18      have only a small effect on existing slope factors, as linear extrapolation from TD50s or lower
  19      points gives results similar to other low-dose linear methods (Krewski,  1990). When a nonlinear
20      default is used, however, there will be substantial changes in the way a cancer hazard is
21      characterized.  The risk assessment will characterize how rapidly risk is reduced as exposure is
22      reduced, and this characterization will be both more qualitative and more quantitative in nature. Of
23      increased importance will be a discussion of the factors that affect the magnitude of a cancer risk
24      at lower doses, including slope at the point of departure, nature of the response, persistence in the
25      body, sensitivity of humans compared to experimental animals, and nature and extent of human
26      variability in sensitivity.  The uncertainty inherent in these factors likely will be much more
 27      important than issues in computing LED10s.
28            Because low-dose linear models have been used extensively in cancer risk assessment,
 29      there will be a natural inclination to compare nonlinear methods to linear extrapolation. Even

l     when linear and nonlinear models both fit the experimental information, risk estimates at lower
2     doses from a nonlinear model can be substantially lower than those from a linear model.  This will
3     make it critical that risk assessments using nonlinear methods explain why risk is anticipated to
4     decline more than proportionately with dose, that risk assessments using linear methods explain
5     why risk is anticipated to decline in proportion to dose and, for both methods, identify the major
6 •    determinants of risk at low doses and address the principal sources of uncertainty and human
7     variability.                                         .                   •       ,
                                    VI. FUTURE PLANS
   3     A. Updating of Guidance Document
   4            This guidance document will be updated as new information on the characteristics of
   5     modeling of dose-response data and BMD/C analysis becomes available. Information from the
   6     literature as well as that submitted to the Agency for review will be considered in revisions to this
   7     guidance document.  In particular, as approaches become established for selection of the BMR for
   8     various endpoints or types of endpoints, this information will be added to the guidance.
   9            The guidance in this document is intended to be consistent with various EPA risk
  10     assessment guidelines and with the RfD/C and CRAVE processes (for input into the IRIS
  11     Database). As changes occur in either the guidelines or in the RfD/C or CRAVE process, this
  12     guidance may also need to be revised.                                          .
  13
  14     B. Potential for Use of the BMD/C in Cost-Benefit Analyses
  15     1. Background
  16           As a regulatory agency, the US EPA has responsibility for the implementation in whole or
  17     in part of about a dozen environmental statutes.  Two statutes explicitly require the Agency to
weigh the health and environmental benefits of proposed regulations against their costs: the
Federal Insecticide, Fungicide and Rodenticide Act as amended (P.L. 100-532, Oct. 25, 1988,
102 Stat. 2654), and the Toxic Substances Control Act (P.L. 100-551, Oct. 28, 1988, 102 Stat.
 21      2755). President Clinton's Executive Order 12866 on Regulatory Planning and Review also
 22      directs the Agency to consider the benefits of the regulations it proposes. In the context of the
 23      BMD/C guidance document here, the term "benefits" refers only to the reduced incidence of
 24      adverse health effects that would be expected to occur as a result of implementing a given
 25      regulation. Currently, no consensus exists as to how noncancer health benefits may be
 26      characterized in terms of the expected numbers of cases of some specific disease that are avoided
 27      by reducing the level  of exposure for a given population.  Note that the monetary valuation of
 28      these human health benefits and the costing of regulations aimed at  obtaining them are outside the
 29      scope of risk assessment methodology and are not considered here.
       The goal of a benefits analysis is to determine the benefits to society rather than to the
individual, and thus requires estimates of the incidence of adverse effects in a population, not just
 3     unit risk. The estimation of benefits is a multi-step process that requires assessment of individual
 4     risk, inter-individual variability and population size in order to estimate population risk. First, the
 5     incidence of adverse health effects is estimated for a baseline set of exposure conditions (usually
 6     the current condition).
 7            Second, the incidence of expected adverse health effects is estimated for an alternative set
 8     of conditions, usually reflecting anticipated reductions in exposure in the affected'population after
 9     regulations are in effect. Finally, the benefits of achieving the reduced exposure are calculated
 10     from the difference between the estimated adverse health effects for the baseline conditions and
 11     alternative conditions. While point estimates are often derived for health benefits, an alternate
approach could entail a probabilistic scheme, based on upper and lower bounds on the individual
 13     risk and exposure curves.        -
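As a minimal sketch of these steps (in Python), the fragment below uses a purely hypothetical dose-response function, exposure levels, and population size; none of the values are taken from this document.

def individual_risk(exposure):
    # Hypothetical dose-response relationship returning individual risk of the
    # adverse effect at a given exposure (illustrative only).
    return min(1.0, 0.002 * exposure)

population = 1_000_000              # size of the exposed population (assumed)
baseline_exposure = 0.8             # mg/kg-day under current conditions (assumed)
alternative_exposure = 0.2          # mg/kg-day after the proposed regulation (assumed)

# Step 1: incidence under baseline conditions
baseline_cases = individual_risk(baseline_exposure) * population
# Step 2: incidence under the alternative (post-regulation) conditions
alternative_cases = individual_risk(alternative_exposure) * population
# Step 3: benefit = expected cases avoided
cases_avoided = baseline_cases - alternative_cases
print(f"Expected cases avoided: {cases_avoided:.0f}")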
       In predicting numbers of expected adverse health effects, one needs to consider the weight
of evidence that the critical effect in deriving the BMD/C is the effect of concern in human
populations.  One would have greatest confidence in benefits assessments based on strong human
data; high confidence where the mechanism is known, the critical endpoint is observed in several
animal test species, and other endpoints occur in the same sequence in several species; and less
confidence where a variety of effects are seen in different animal species, appearing in different
sequences.
21     2.  Benefits estimates using BMD/C and RfD/C
22            In the case of carcinogens,  standardized methodologies exist for assessment of individual
23     risk and estimates of the incidence  of cancer in a population (EPA, 1996a). The use of  .
benchmark-style approaches in the assessment of individual cancer risk has been discussed
25      elsewhere in these guidelines (section V). To estimate benefits, the numbers of expected cases of
26      cancer avoided under .different regulatory scenarios are obtained by multiplying individual risk
27      times the size of the exposed population.                   .
28             In noncancer health benefits analyses, the quantification of the number of cases of disease
avoided has been difficult because the RfD/C derivation methodology does not provide estimates
of individual or population risk.  To date, with a few exceptions like lead (EPA, 1991b), the
  2      primary measure of benefits for regulating non-carcinogens has been the estimated size of the
  3      population expected to have reduced exposure to a chemical as the result of regulations
  4      implemented to interdict exposure or to reduce ambient concentration of pollutants.  These
  5      estimates, however, do not directly indicate the extent of morbidity, nor the number of cases of
disease or mortality that would be avoided in the affected populations.  This guidance does not fill
these gaps per se.
  8             The debate regarding the defensibility of approaches to quantifying the benefits associated
  9      with reducing exposure to noncarcinogens is long-standing (NRC, 1983). Efforts to quantify
 10      noncancer risks at low levels of exposure should be based on empirical dose-response data instead
 11      of statistical models to extrapolate to risks at low exposures, and should have a strong biological
 12      basis for the particular form of model chosen.  In the BMD/C approach, there is an explicit
 13      estimate of the dose that corresponds with some level of risk within or not far below the observed
 14      range. These estimates potentially can be used to determine the number of cases expected for
 15      various non-cancer endpoints when exposure levels are in or near those in the experimental data
 16      range.
 17            However, estimation of benefits will usually also require determining risk well below the
 18      experimental range, and determining the level below which observed effects are not of concern.
 19      These determinations include risks and levels that apply to the general population, and those that
20      apply to sensitive subgroups.
21      3. Issues in quantifying noncancer health benefits
22            The benchmark dose approach thus provides a good starting point to develop benefits
23      estimates for non-carcinogens.  Many of the key issues for benefits assessment are addressed
2-i      during BMD/C development, e.g., selecting endpoints relevant to a human disease state,
25      identifying the proper scaling factor to convert animal dose values to human equivalent doses,
26      converting continuous response data to quantal form to support estimates of the probability of
27      expected adverse health effects, and modeling critical health effects data in the observable range
28      using standard mathematical and statistical models aimed at estimating the lower confidence
bound on the dose producing a response.  Two major issues remain that require the development

of new paradigms to quantify population risk at low-dose exposures.  One is the extent to which
there may be sensitive or "super-exposed" subpopulations.  The second issue involves identifying,
in a progression of effects in the same mechanistic pathway, those effects which have such limited
severity as to be of no concern, beyond representing biomarkers of exposure.  Work is continuing
in these areas.
   7     C.  Research and Implementation Needs
   8      -     A number of research needs related to the BMD/C approach were indicated in the
 .  9     background document (EPA, 1995c).  These included the development of dose-response models
  io     and related methods for use with various types of data, guidelines for handling lack of fit,
  11     development of methods for applying pharmacokinetic considerations, guidelines for selecting
appropriate measures of altered response, study of the sensitivity of the BMD/C to choice of
  13     model, particularly in relation to the level of the BMR, and to the confidence limit size, guidelines
  14     for selecting a single BMD/C when more than one is calculated, investigation of uncertainty
  15      factors, comparison of dose-response curves for different types of data and toxic endpoints, and
  16      development of dose-response models for multiple endpoints of toxicity.  Several of these
research needs have been addressed in the present document.  In most cases, however, further
  18      research is needed on these topics:
 19             The most critical data need from the point of view of implementing the BMD/C approach
 20      on a broad scale according to the guidance in this document is the selection of appropriate
measures of altered response.  As discussed in Section III.B, selection of the BMR should be
dependent on one of two factors: the degree of change considered biologically significant, or the
 23      degree of change at the limit of detection.  These determinations will need to be done on an
 24      endpoint by endpoint basis and, thus, will require bringing together experts in various disciplines
 25      to reach consensus. In those situations where biological significance is equated with statistical
 26      significance, the limit of detection must be determined. This will require the evaluation of data
 27      across a number of studies of similar design to determine for each endpoint of concern what is the
 28      limit of detection, on average. Alternatively, studies comparing NOAELs and BMD/Cs for
 29      different BMRs could be conducted if an adequate database exists,  as has been done for
 1     developmental toxicity (Faustman et al., 1994;  Allen et al., 1994a and b; Kavlock et al., 1995).
 2            More experience with modeling data using a variety of models is needed to compare the .
 3     BMD/Cs obtained and the rationale for selection of the most appropriate model and BMD/C.
Sensitivity of the BMD/C to the choice of model, confidence limit size, and the actual software
used needs to be explored with a variety of data sets.  It would be advantageous, eventually, to
have a small suite of models that could be recommended for noncancer health effects, depending
 7     on the type of data.
 8            The development of user-friendly software that is widely available for BMD/C modeling
 9     would be extremely helpful. EPA has an on-going effort to develop such software, and the
10     models to be included at the present time are indicated in Appendix E.
       Training of risk assessors in the BMD/C approach, application of models, and
interpretation of data is needed to implement this approach on an Agency-wide basis as well as
among individuals outside the Agency.  EPA intends to implement such a training program for its
14     employees engaged  in risk assessment, as well as risk managers who need to understand the
15     procedure. EPA can also work with other organizations (e.g., professional societies) in
16     developing training programs for those in other sectors who may be involved in risk  assessments
17     or serve as reviewers of risk assessments of interest to the EPA.
18            Several issues have been considered in the development of this guidance document that do
not pertain directly to the application of the BMD/C approach but do impinge on its application in
the risk assessment arena.  These are intentionally not addressed in this guidance document, but
do need to be considered further in other efforts.  For example, there are several differences in the
way in which dose-response analyses are done for cancer endpoints versus noncancer endpoints.
23     These include the assumption of a threshold for noncancer endpoints versus a low-dose linear
relationship for cancer (although this is beginning to change with the newly proposed cancer risk
assessment guidelines; EPA, 1996a).  In addition, within-human variability is dealt with explicitly
in the use of an uncertainty factor for noncancer endpoints, but implicitly for cancer.  Dose scaling
27     across species is handled differently for cancer (mg/kg3/4-d) versus noncancer (mg/kg/day) health
28     effects. And while linear low-dose extrapolation for cancer to obtain a risk estimate or slope
29     factor includes a risk reduction strategy,  there is no implicit risk reduction factor included for
noncancer risk assessment; rather, uncertainty factors are used to derive a dose that is considered
to be unlikely to cause effects detectable above background.  These and other issues, such as how
to deal with severity of effect, need further consideration by the Agency.
                                   VII. REFERENCES
 4      Alexeeff, G.V.; Lewis, D.C.; Ragle, N.L. (1993) Estimation of potential health effects from acute
 5      exposure to hydrogen fluoride using a 'benchmark dose' approach. Risk Analysis, 13(l):63-69.
 6                                    '                                      '•••'.
Allen, B.C.; Kavlock, R.J.; Kimmel, C.A.; Faustman, E.M. (1994a) Dose-response assessment for
developmental toxicity: II.  Comparison of generic benchmark dose estimates with NOAELs.
Fund. Appl. Toxicol., 23:487-495.

Allen, B.C.; Kavlock, R.J.; Kimmel, C.A.; Faustman, E.M. (1994b) Dose-response assessment for
developmental toxicity: III.  Statistical models.  Fund. Appl. Toxicol., 23:496-509.
13                                 •
14      Allen, B.C., and P.L. Strong, C.J. Price, S.A. Hubbard, and G.P. Daston (1996) Benchmark dose
15      analysis of developmental toxicity in rats exposed to boric acid.  Fund. Appl. Toxicol.. 32: (in
16      press).
17                         '
18      Auton, T.R. Calculation of benchmark doses from teratology data. Regulatory Toxicology and
19      Pharmacology, 1994, in press.
20                                      .                   ,        •
Barnes, D.G.; Daston, G.P.; Evans, J.S.; Jarabek, A.M.; Kavlock, R.J.; Kimmel, C.A.; Park, C.;
Spitzer, H.L. (1995) Benchmark dose workshop: Criteria for use of a benchmark dose to estimate
a reference dose.  Regulatory Toxicol. Pharmacol., 21:296-306.
24
25      Beck, B.D.; Conolly, R.B.; Dourson, M.L.; Guth,  D.; Hattis, D.; Kimmel, C.; Lewis, S.C. (1993)
26      Symposium overview: improvements in quantitative noncancer risk assessment. Fund. Appl.
27      Toxicology 20:1-14.
28
29      California Office of Environmental Health Hazard Assessment. (1993) Safety assessment for non-
30      cancer endpoints: The benchmark dose and other possible approaches. Summary report.
31                          '          '              '                  "       .'          "     •
Catalano, P.J.; Scharfstein, D.O.; Ryan, L.M.; Kimmel, C.A.; Kimmel, G.L. (1993) Statistical
model for fetal death, fetal weight, and malformation in developmental toxicity studies.  Teratology
47:281-290.
35                                                              *  ' "
36      Chen, C.; Farland, W. (1991) Incorporating cell proliferation in quantitative cancer risk
37      assessment: approaches, issues, and uncertainties.  In: Butterworth, B.; Slaga, T.; Farland, W.;
38      McClain, M., eds. Chemical induced cell proliferation: implications for risk assessment. New
39      York: Wiley-Liss, pp. 481-499.              ,          .'....
40                            '                                                   .        •
Chen, J.J.; Kodell, R.L.; Howe, R.B.; Gaylor, D.W. (1991) Analysis of trinomial responses from
reproductive and developmental toxicity experiments.  Biometrics 47:1049-1058.
Clayton, D.; Hills, M. (1993) Statistical Models in Epidemiology.  Oxford University Press,
Oxford.
   3                    -     '       '    .    '•'.''-'•             •    ...'.'...-
   4     Cogliano, V.J. (1986) The U.S. EPA's methodology for adjusting the reportable quantities of
 >  5     potential carcinogens. Proceedings of the 7th National Conference on Management of
   6.   • • Uncontrolled Hazardous Wastes (Superfund '8.6): Washington: Hazardous Materials Control
   7     Research Institute, pp. 182-185,  -                       '
   8    .-.''••''.'                         '"'.  •              •' .'•'         ''   ••
Collins, M.A.; Rusch, G.M.; Sato, F.; Hext, P.M.; Millischer, R.-J. (1995)
1,1,1,2-Tetrafluoroethane: Repeat exposure inhalation toxicity in the rat, developmental toxicity
in the rabbit, and genotoxicity in vitro and in vivo.  Fund. Appl. Toxicol. 25:271-280.

Cox, D.R.; Hinkley, D.V. (1974) Theoretical Statistics, chapter 7.  Chapman and Hall, London.
 14    . .       • i  '  . .    -        .,-.'•    ..  '    .              .               . . •
 15     Crump, K.S. (1984) A new method for determining allowable daily intakes. Fundamental and
 16    Applied Toxicology 4:854-871.               ,
 17        .   ;         '  - ..    '    .   .       '                 .  ,  :  '.        ' ,      '   . '
Crump, K.S. (1995) Calculation of benchmark doses from continuous data.  Risk Analysis 15:
79-89.

Crump, K.S.; Howe, R. (1985) Chapter 9 in Toxicological Risk Assessment.  D.B. Clayson, D.
Krewski, I. Munro, eds.  Boca Raton: CRC Press, Inc.
 23   "          ' •  .       '   ' ''    '      •    •            '•'-.''
Davidian, M.; Giltinan, D.M. (1995) Nonlinear Models for Repeated Measurement Data.
Chapman and Hall, London.

Dourson, M.L.; Hertzberg, R.C.; Hartung, R.; Blackburn, K. (1985) Novel methods for the
estimation of acceptable daily intake.  Toxicology and Industrial Health 1:23-41.

Draper, N.; Smith, H. (1981) Applied Regression Analysis, Second Edition, Chapter 10.  Wiley,
New York.

Farmer, J.H.; Kodell, R.L.; Gaylor, D.W. (1982) Estimation and extrapolation of tumor
probabilities from a mouse bioassay with survival/sacrifice components.  Risk Analysis 2(1):27-34.

Faustman, E.M.; Allen, B.C.; Kavlock, R.J.; Kimmel, C.A. (1994) Dose-response assessment for
developmental toxicity: I.  Characterization of data base and determination of NOAELs.  Fund.
Appl. Toxicol., 23:478-486.

Fleiss, J.L. (1981) Statistical Methods for Rates and Proportions, Second Edition.  Wiley, New
York.


 1     Gaylor, D.-W. (1983) The use of safety factors for controlling risk. Journal of Toxicology and
*2"     Environmental Health. 11:329-336. ••                           '•
 3                                  •.                    '      -.-...  ..        -. -.
 4    • Gaylor, D.W.; Kodell, R.L. (1980) Linear interpolation algorithm for low dose risk assessment of
 5     toxic substances. J. Environ. Pathol: Toxicol. 4:305-312.
 6,                          '          '                     .                            .
 7*    Gaylor, D.; Slikker, W., Jr. (1990) Risk assessment for neurotoxic effects. NeuroToxicology 11:
 8     211-218.                                              ..
 9                   •                                    •
Gaylor, D.W.; Kodell, R.L.; Chen, J.J.; Springer, J.A.; Lorentzen, R.J.; Scheuplein, R.J. (1994)
Point estimates of cancer risk at low doses.  Risk Analysis 14(5):843-850.
12 •
13     Gerrity, T.R.; Henry, C.J., eds. (1990) Summary report of the workshops on principles of route-
14     to-route extrapolation for risk assessment.  In: Principles of route-to-route extrapolation for risk
assessment, proceedings of the workshops; March and July; Hilton Head, SC and Durham, NC.
16     New York, NY: Elsevier Science Publishing Co., Inc.; pp. 1-12.
17                                          .         •••"..'.'
18     Gold, L.S.; Sawyer, C.B.; Magaw, R.; Backman, G.M.; de  Veciana,  M.; Levinson, R.;
19     Hooper, N.K.; Havender, W.R.; Bernstein, L.; Peto, R.; Pike, M.C.;  Ames, B.N. (1984) A
20     carcinogenic potency database of the standardized results of animal bioassays. Environ. Health
21     Perspect. 58:9-319.                                                 .                .  -
22                      '               :            .'..••
23     Gold, L.S.; Bernstein, L.; Kaldor, J.; Backman, G.; Hoel, D. (1986a) An empirical comparison of
24     methods used to estimate carcinogenic potency in long-term animal bioassays: lifetable vs     *
25     summary incidence data. Fund. Appl. Toxicol. 6:263-269.
26
27     Gold, L.S.; de Veciana, M.; Backman, G.M.; Magaw, R.; Lopipero,  P.; Smith, M.;
28     Blumenthal, M.; Levinson, R.; Bernstein, L.; Ames, B.N. (1986b) Chronological supplement to
29   • the carcinogenic potency database: standardized  results of animal bioassays published through
30     December 1982. Environ. Health Perspect. 67:161-200.
31                                          .        .
32     Gold, L.S.; Slone, T.H.; Backman, G.M.; Magaw, R.; Da Costa, M.; Lopipero, P.;
33     Blumenthal, M.; Ames, B.N. (1987) Second chronological  supplement to the carcinogenic
34     potency database: standardized results of animal  bioassays published through December 1984 and
35     by the National Toxicology Program through May 1986. Environ. Health Perspect. 74:237-329.
36
37     Gold, L.S.; Slone, T.H.; Backman, G.M.; Eisenberg, S.; Da Costa, M.; Wong, M.; Manley, N.B.;
38     Rohrbach, L.; Ames, B.N.  (1990) Third chronological supplement to the carcinogenic potency
39     database: standardized results of animal bioassays published through December 1986 and by the
40     National Toxicology Program through June 1987. Environ. Health Perspect. 84:215-286.
41
42     Gold, L.S.; Manley, N.B.;  Slone, T.H.; Garfinkel, G.B.; Rohrbach, L.; Ames, B.N. (1993) The
43     fifth plot of the carcinogenic potency database: results of animal bioassays published in the general


..  1     literature through 1988 and by the National Toxicology Program through 1989 Environ Health
   2     Perspect. 100:65-168.            -.             .  .            •-
   3    '     .  : .      .   •   •   .  •   •'      '   •... ,\ •  •      ., •;        •  •
   4     Gold, L.S.; Manley, N.B.; Slone, T.H.; Garfinkel, G.B.; Ames, B.N. Rohrbach, L.; Stern, B.R.;
   5     Chow, K. (in press) Sixth plot of the carcinogenic potency database: results of animal bioassays
   6     published in the general literature 1989-1990 and by-the National Toxicology Program
   7     1990-1993. Environ. Health Perspect.
   8      .                 .-..-..     .          ,         -",'•."--.'
Guth (19__)
  10 .'••''-    '   '  '      ..•'.:       •            :     .-•.."-..
  11     Haas, C.N.  (1994) Dose-response analysis using spreadsheets. Risk Analysis 14(6): 1097-1100
  12                 '      • '    '        .         . -..     .         "•'.'•'"
  13     Hasselblad, V.; A.M. Jarabek (1995) Dose-response analysis of toxic chemicals. In: Bayesian
  14     biostatistics. D.A. Berry, D.K. Stangl, eds. Marcel Dekker, Inc. New York
  is     '   ;   '  - .            .           •      ';.   •.       •       • . :       ••.       '
Heindel, J.J.; Price, C.J.; Field, E.A.; Marr, M.C.; Myers, C.B.; Morrissey, R.E.; Schwetz, B.A.
(1992) Developmental toxicity of boric acid in mice and rats.  Fund. Appl. Toxicol. 18:266-
  18 ,        .,..          •     '        .' •  .    '      -   \       •     '       -•       '-    -
  19    Hertzberg, R.C. (1989) Fitting a model to categorical response data with application to species
  20    extrapolation of toxicity.  Health Physics 57:405-409.
  21        ..'-••        "          '   r               '          •''•_••
  22 ;    Hertzberg, R.C., Miller, M. (1985) A statistical model for species extrapolation using categorical
  23     response data. Toxicology and Industrial Health 1:43-57
  24             •    '.-      '•''.'••'•-.-   •                   '          '
Hext, P.M.; Parr-Dobrzanski, R.J. (1993) HFC 134a: 2 year inhalation toxicity study in the
rat.  ICI Central Toxicology Laboratory, Alderley Park, Macclesfield, Cheshire, UK.  Report No.
CTL/P/3317.
 28        ••,;".•  ''-•"_'-'      '.'.,'••.; ••'.-      '       ••     •  .;.  •
 29     Hoel(1979)
 30                  •-       '              ''.'.'.'-'"..'•
 31     Hoel, D.G. (1990) Assumptions of the HERP index. Risk Analysis 10(4):623-624
 32   r                 '       '          ,        '•.-;.    '.••-••      .   •  .  •
 33     Hoover, S.M.; Zeise, L.; Pease, W.S.; Lee, L.E.; Hennig, M.P.; Weiss, L.B.; Cranor, C, (1995)
 34     Improving the regulation of carcinogens by expediting cancer potency estimation  Risk Analysis
 35     15(2):267-280.  ,          "   , ,                                              ,
 36          .  "  '  •      ''''.''...      -     ",              ,       ;'.    ,   '  -
Howe, R.B.; Crump, K.S.; Van Landingham, C. (1986) Global 86: a computer program to
extrapolate quantal animal toxicity data to low doses.  Prepared for U.S. EPA under contract
 39     68-01-6826.                       .                                       :
 40     .,.•''      ': '.    '       '  •• •       "'-'•'.        •'  '   •   '       •
Jarabek, A.M.; Hasselblad, V. (1992) Application of a Bayesian statistical approach to response
 42     analysis of noncancer toxic effects.  Toxicologist 12:98.
Johnson, B.L.; Boyd, J.; Burg, J.R.; Lee, S.T.; Xintaras, C.; Albright, B.E. (1983) Effects on the
peripheral nervous system of workers' exposure to carbon disulfide.  Neurotoxicology 4(1):
53-66.
  4                                                                '.''•'•'
  5      Kavlock, R.J., B.C. Allen, C.A. Kimmel, E.M. Faustman. (1995) Dose-response assessment for
  6      developmental toxicity: IV. Benchmark doses for fetal weight changes. Fund. Appl. Toxicol.,
  7      26:211-222.
  8
Kavlock, R.J.; Schmid, J.E.; Setzer, R.W., Jr. (1996) A simulation study of the influence of study
design on the estimation of benchmark doses for developmental toxicity.  Risk Analysis 16:391-
403.

Kimmel, C.A.; Gaylor, D.W. (1988) Issues in qualitative and quantitative risk analysis for
developmental toxicity.  Risk Analysis 8:15-20.

Kimmel, C.A.; Wellington, D.G.; Farland, W.; Rose, P.; Manson, J.M.; Chernoff, N.; Young, J.F.;
Selevan, S.G.; Kaplan, N.; Chen, C.; Chitlik, L.D.; Siegel-Scott, C.L.; Valaoras, G.; Wells, S.
(1989) Overview of a workshop on quantitative models for developmental toxicity risk
assessment.  Environmental Health Perspectives 79:209-215.

Kimmel, C.A.; Siegel, M.; Crisp, T.M.; Chen, C.W. (1996) Benchmark concentration (BMC)
analysis of 1,3-butadiene (BD) reproductive and developmental effects.  Fund. Appl. Toxicol.
(Suppl., no. 1, part 2) 30:146.
 24
 25      Kodell, R.L.; Chen, J.J.; Gaylor, D.W. (1995) Neurotoxicity Modeling for Risk Assessment.
 26      Regulatory Toxicology and Pharmacology 22:24-29.
 27
 28      Krewski, D. (1990) Measuring carcinogenic potency. Risk Analysis 10(4):615-617.
 29
 30      Krewski, D.; Zhu, Y.  (1994)
 31                                                          '        '
 32      Krewski, D.; Zhu, Y.  (1995) A simple data transformation for estimating benchmark doses in
 33      developmental toxicity experiments. Risk Analysis 15:29-39.
 34                   '                        .'              .
Krewski, D.; Szyszkowicz, M.; Rosenkranz, H. (1990) Quantitative factors in chemical
carcinogenesis: variation in carcinogenic potency.  Regul. Toxicol. Pharmacol. 12:13-29.
 37                                                                              ,
 38      Krewski, D.; Gaylor, D.;  Szyszkowicz, M. (1991) A model-free approach to low-dose
 39      extrapolation. Environ. Health Perspect. 90:279-285.
 40                       ,                                         -
Kupper, L.L.; Hafner, K.B. (1989) How appropriate are popular sample size formulas? The
 42      American Statistician 43:101-105.
Lefkopoulou, M.; Moore, D.; Ryan, L. (1989) The analysis of multiple binary outcomes:
Application to rodent teratology experiments.  Journal of the American Statistical Association 84:
810-815.

  Moolgavkar, S.H.; Knudson, A.G. (1981) Mutation and cancer: a model for human
  carcinogenesis. J.Natl. Cancer Inst. 66:1037-1052.

  McCullagh, P.; Nelder, J.A. (1989) Generalized Linear Models, Second Edition.  Chapman and
  Hall, London.                                     .        .

  National Research Council (NRC) (1983)  Risk Assessment in the Federal Government-
  Managing the Process.  Prepared by: Committee on the Institutional Means for Assessment of
  Risks to Public Health, Commission on Life Sciences. Washington, DC.

  National Research Council (NRC) (1993) Issues in risk assessment. Washington- National
  Academy Press, pp. 115-116.                 .             -

  National Research Council (NRC) (1994)  Science and Judgment in Risk Assessment, Committee
  on Risk Assessment of Hazardous Air Pollutants, Board on Environmental Studies and
  Toxicology, Commission on Life Sciences, National Academy Press, Washington, DC.

National Toxicology Program (NTP) (1991) Technical report on the toxicology and
carcinogenesis of 1,3-butadiene (CAS No. 106-99-0) in B6C3F1 mice (inhalation studies).  U.S.
Department of Health and Human Services, Public Health Service, National Institutes of Health,
National Toxicology Program.  NTP TR 434, NIH Publ. No. 92-3165.

  Peto, R.; Pike, M.C.; Bernstein, L.; Gold, L.S.; Ames, B.N. (1984) The TD50: a numerical  ;   :
  description of the carcinogenic potency of chemicals in chronic-exposure animal experiments
•  Environ. Health Perspect. 58:1-8.  .                                ,      ,           '   .-

  Price and Berner, 1995.  A benchmark dose for carbon disulfide: Analysis of nerve conduction
  velocity measurements from the NIOSH exposure database. Report to the Chemical
  Manufacturers Association Carbon Disulfide Panel.

Research Triangle Institute (RTI) (1994) Determination of the no-observable-adverse-effect-level
(NOAEL) for developmental toxicity in Sprague-Dawley (CD) rats exposed to boric acid in feed
on gestational days 0 to 20, and evaluation of postnatal recovery through postnatal day 21.  RTI
Identification Number 65C-5657-200.

 Ryan, L. 1992. Quantitative risk assessment for developmental toxicity. Biometrics 48:163-174.

Sawyer, C.; Peto, R.; Bernstein, L.; Pike, M.C. (1984) Calculation of carcinogenic potency from
 long-term animal carcinogenesis experiments. Biometrics 40:27-40.


Simpson (19__)
 *2    • ,:        ' ,    ^     '       .  -  '•     '   '       '         '  ''       ' ' '      .'.
Setzer, R.W.; Rogers, J.M. (1991) Assessing developmental hazard: the reliability of the A/D
 4     ratio.  Teratology 44:653-665.
 5                       '   .                              .
 6     SRA Symposium (1994)                                                 •'•
 7                      .                                  . •        '     '
 8     U.S. Environmental Protection Agency (EPA) (1986a) Guidelines for carcinogen risk
 9     assessment. Federal Register 51(185):33992-34003.
10
11     U.S. Environmental Protection Agency (EPA) (1986b) Science Advisory Board Comments
12                                               .       ,
13     U.S. Environmental Protection Agency (EPA) (1987) Hazardous substances; reportable quantity
14     adjustments; proposed rules. Federal Register 52(50):8140-8186.
16
U.S. Environmental Protection Agency (EPA) (1988a) Science Advisory Board Comments.
17
18     U.S. Environmental Protection Agency (EPA) (1988b) Science Advisory Board Comments.
19
20     U.S. Environmental Protection Agency (EPA) (1988c) Methodology for evaluating potential
21     carcinogenicity in support of reportable quantity adjustments pursuant to CERCLA section 102.
22     Washington: report no. EPA/600/8-89/053.                   '.".-.
23
24     U.S. Environmental Protection Agency (EPA) (1989a) Reportable quantity adjustments;
25     delisting of ammonium thiosulfate; final rules. Federal Register 54(155):33418-33484.
26
27     U.S. Environmental Protection Agency (EPA) (1989b) Technical background document to
support rulemaking pursuant to CERCLA section 102, volume 3.  Washington: Office of Solid
29     Waste and Emergency Response.
30 '                                                                                        •
31     U.S. Environmental Protection Agency (EPA) (1989c) Science Advisory Board Comments.
32
33     U.S. Environmental Protection Agency (EPA) (1991a) Guidelines for developmental toxicity
34     risk assessment; notice. Fed Regist, 56:63798-63826.
35                '              .              .     '.-.-..,'•
36     US Environmental Protection Agency (EPA) (1991b) Regulatory impact analysis of proposed
37     national primary drinking water regulation for lead and copper. Prepared by Wade Miller
38     Associates, Inc. April.                                   '
39
40     U.S. Environmental Protection Agency (EPA) (1992) Draft report: a cross-species scaling factor
41     for carcinogen risk assessment based on equivalence of mg/kg3/4/day; notice. Federal Register
42     57(109):24152-24173.
  U.S. Environmental Protection Agency (1994a) Ranking of pollutants with respect to hazard to
  human health; proposed rule. Federal Register 59.      ...

  U.S. Environmental Protection Agency (1994b) Technical background document to support
  rulemaking pursuant to the Clean Air Act—section 112(g): ranking of pollutants with respect to
hazard to human health.  Research Triangle Park, NC: report no. EPA-450/3-92-0-10.

U.S. Environmental Protection Agency (1994c) Methods for derivation of inhalation reference
concentrations and application of inhalation dosimetry.  Office of Health and Environmental
Assessment, Environmental Criteria and Assessment Office, Research Triangle Park, NC.
EPA/600/8-90/066F.

U.S. Environmental Protection Agency (1995a) Reportable quantity adjustments; final rule.
Federal Register 60(112):30926-30962.

 U.S. Environmental Protection Agency (1995b) Technical background document to support
 rulemaking pursuant to CERCLA section 102, vol. 7. Washington: Office of Solid Waste and
 Emergency Response.                       ,               .                :

U.S. Environmental Protection Agency (1995c) The use of the benchmark dose approach in
health risk assessment.  Office of Research and Development, Washington, DC: EPA/630/R-
94/007, February.

 U.S. Environmental Protection Agency (1995d) Health assessment document for diesel emissions
 Washington, EPA/600/8-90/057Bb.         .         •

 U.S. Environmental Protection Agency (1995e) Benchmark dose concentration analysis for    '
 carbon disulfide. Internal Report.

U.S. Environmental Protection Agency (EPA) (1995f) Proposed guidelines for neurotoxicity
risk assessment; notice.  Fed Regist, 60:52032-52056.

 U.S. Environmental Protection Agency  (EPA) (1995g) Manganese document

 U.S. Environmental Protection Agency (1996a) Proposed guidelines for carcinogen risk
 assessment. Federal Register 61(79): 17960-18011.

 U.S. Environmental Protection Agency (1996b) Guidelines for Reproductive Toxicity Risk
 Assessment; notice.  Federal Register (draft).

 U.S.,Environmental Protection Agency  (EPA) (1996c)  Integrated Risk Information System
 (IRIS).  Online. National Center for Environmental Assessment, Washington, DC.
Van Ryzin, J. (1980) Quantitative risk assessment.  J. Occup. Med. 22(5):321-326.
*              " *..      *     ',
 2                 .    •     •   '     '•           '      '•        . '         '-.        '
Wartenberg, D.; Gallo, M.A. (1990) The fallacy of ranking possible carcinogen hazards using the
TD50.  Risk Analysis 10(4):609-613.
 5                                  •                                 _.'           '
 6     Zeger, S. L.; Liang, K. Y. (1986) Longitudinal data analysis for discrete and continuous
 7     outcomes. Biometrics 42: 121-130.
 8
 9     Zhu, Y.; Krewski, D.; Ross, W.H.  (1994) Dose-response models for correlated multinomial data
10     from developmental toxicity studies.  Applied Statistics 43:583-598.
                                      APPENDIX A

         ASPECTS OF DESIGN, DATA REPORTING, AND ROUTE EXTRAPOLATION
                           RELEVANT TO BMD/C ANALYSIS
  6     1. Design
  .7            In general, studies with more dose groups and a graded monotonic response with dose
  8     will be more useful for BMD/C analysis. Studies with only a single dose showing a response
  9     different from controls are not appropriate for BMD/C analysis.  Studies in which responses are
 10     only at the same level as background or at or near the maximal response level are not considered
 11     adequate for BMD/C analysis. It is preferable to have studies with one or more doses near the
 12     level of the BMR to give a better estimate of the BMD/C. Studies in which all dose levels show
 13     changes compared with control values (i.e., no NOAEL) are readily useable in BMD/C analyses,
 14     unless the lowest response level is much higher than that at the BMR.
       In a recent simulation study by Kavlock et al. (1996), various aspects of study design
(number of dose groups, dose spacing, dose placement, and sample size per dose group) were
        examined for two endpoints of developmental toxicity (incidence of malformations and reduced
        fetal weight). Of the designs evaluated, the best results were obtained when two dose levels had
response rates above the background level, one of which was near the BMR.  In this study, there
        was virtually no advantage in increasing the sample size from 10 to 20 litters per dose group.
        When neither of the two dose groups with response rates above the background level was near
        the BMR, satisfactory results were also obtained, but the BMDs tended to be lower. When only
        one dose level with a response rate above background was present and near the BMR, reasonable
        results for the maximum likelihood estimate and BMD were obtained, but here there were benefits
        of larger dose group sizes.  The poorest results were obtained when only a single group with an
elevated response rate was present, and the response rate was much greater than the BMR.
        2. Aspects of Data Reporting
       In most cases, the risk assessor relies on published reports of key toxicological studies in
performing a dose-response assessment.  Reports from the peer-reviewed literature may contain
summary information which can vary in completeness vis-a-vis the data requirements of the BMD
method.  The optimal situation is to have information on individual subjects.  It is very common to
have summary information (group-level information, e.g., mean and standard deviation)
concerning the measured effect, especially for continuous response variables, and it must be
determined whether the summary information is adequate for the BMD/C method to proceed.
 6            Dichotomous data are normally reported at the individual level (e.g., 2/10 animals showed
 6     the effect).  Occasionally a dichotomous endpoint will be reported as being observed in a group,
 7     with no mention of the number of animals showing the effect.  This usually occurs when the
 8     incidence of the endpoint reported is ancillary to the focus of the report. For BMD/C modeling of
 9     dichotomous data, both the number showing the response and the total number of .subjects in the
10     group are necessary.
11            Continuous data are reported as a measurement of the effect, such as body weights or
enzyme activity in control and exposed groups.  The response might be reported in several
different ways, including as an actual measurement, or as a contrast (e.g., as absolute change from
control or as relative change from control).  To model continuous data when individual animal
data are not available, the number of subjects, mean of the response variable, and a measure of
variability (e.g., standard deviation, SD; standard error, SE; or variance) are needed for each
group.  The lack of a numerically reported SD or SE precludes the calculation of the BMD/C, unless
partial information is presented (e.g., SD for the control group only) and some assumptions are
made.  For example, an assumption can be made that the variance in the exposed groups is the
same as in the controls, but this introduces uncertainty, since the variance in the individual groups
allows more precise modeling of the data and calculation of the confidence limits.
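The following fragment (Python) is a minimal sketch of this kind of bookkeeping; the group summaries are hypothetical, and carrying the control standard deviation over to a group whose SD was not reported is exactly the kind of assumption, with its added uncertainty, described above.

import math

def sd_from_se(se, n):
    # Recover a group standard deviation from a reported standard error.
    return se * math.sqrt(n)

# Hypothetical group summaries: dose -> (n, mean, SD); the SD of the top dose
# group was not reported, so the control SD is carried over as an assumption.
groups = {
    0.0:   (10, 100.0, 12.0),
    50.0:  (10,  93.0, 14.0),
    200.0: (10,  81.0, None),
}
control_sd = groups[0.0][2]
complete = {dose: (n, mean, sd if sd is not None else control_sd)
            for dose, (n, mean, sd) in groups.items()}
print(complete)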
       Categorical data are defined as a type of quantal data in which there is more than one
defined severity category in addition to the no-effect category, and the responses in the treatment
groups are characterized in terms of the severity of effect (e.g., mild, moderate, or severe
histological change).  Results may be classified by reporting an entire treatment group in terms of
26     category (group level reporting), or by reporting the number of animals from each group in each
27     category (individual level reporting). For example, a report of epithelial degenerative lesions
28     might state that an exposed group showed a mild effect (group level) or that in the exposed group
29     there were 7 animals with a mild effect and 3 with no effect (individual level reporting). In such a

case, the BMD/C can be calculated using the dichotomous model after combining data in severity
categories (e.g., model all animals with an effect, or all with greater than a mild effect).
Dichotomous data can be viewed as a special case in which there is one category and the possible
response is binary (e.g., effect or no effect).  Information may also be treated as categorical in
cases where an endpoint is inherently a dichotomous or continuous variable, but because the
endpoint is reported only descriptively, it cannot be treated quantitatively.  In that case, the
BMD/C approach cannot be applied because the minimum data required for dichotomous models,
number affected and total number exposed, are not reported.
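A minimal sketch (Python) of collapsing individual-level severity counts into the number affected and group size needed by a dichotomous model; the counts and category labels are hypothetical.

# Hypothetical severity counts per dose group (individual-level reporting).
severity_counts = {
    0:   {"none": 10, "mild": 0, "moderate": 0},
    100: {"none": 7,  "mild": 3, "moderate": 0},
    300: {"none": 2,  "mild": 5, "moderate": 3},
}

def dichotomize(counts, adverse=("mild", "moderate")):
    # Collapse severity categories into (number affected, group size).
    affected = sum(counts[c] for c in adverse)
    return affected, sum(counts.values())

incidence = {dose: dichotomize(c) for dose, c in severity_counts.items()}
print(incidence)   # {0: (0, 10), 100: (3, 10), 300: (8, 10)}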
       Modeling approaches have been discussed for categorical data with multiple categories
(Dourson et al., 1985; Hertzberg, 1989; Hertzberg and Miller, 1985) and for group-level
categorical data (Guth; Simpson).  These regression models can also be used to derive a BMD/C,
by estimating the probability of effects of severity defined as adverse.  This approach is analogous
to the BMD/C if the severity categories are defined consistently.  This approach has had
considerably less review than other approaches being used for the BMD/C.
 15      3. Route Extrapolation
       The criteria for determining if route extrapolation is appropriate for risk assessment have
been discussed previously (EPA, 1994c; Gerrity and Henry, 1990), and the same criteria apply
when selecting data for BMD/C analysis.  If it is determined that route extrapolation is
appropriate, the general procedure is to convert from the route of exposure in the study to the
route of exposure of interest in the risk assessment and then to perform the BMD/C analysis.  In
this way, any non-linearity in the route extrapolation model would be incorporated into the
calculation of the doses used as input into the BMD model.
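The order of operations can be sketched as follows (Python); the conversion function, its constants, and the study concentrations are purely illustrative placeholders, not an endorsed route-extrapolation model.

def route_converted_dose(concentration_mg_m3):
    # Placeholder route-extrapolation model; the functional form and constants
    # are illustrative only (a real model may be nonlinear and chemical-specific).
    return 0.5 * concentration_mg_m3 ** 0.9

study_concentrations = [0.0, 10.0, 50.0, 200.0]      # mg/m3, hypothetical inhalation study
converted_doses = [route_converted_dose(c) for c in study_concentrations]
print(converted_doses)
# The converted doses, not the original concentrations, are supplied to the BMD
# model, so any nonlinearity in the conversion is carried into the analysis.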
                                      APPENDIX B

                SELECTING THE BENCHMARK RESPONSE (BMR) LEVEL
  4     1. Biologically Significant Change for Specifying the BMR
       Continuous endpoints differ from dichotomous endpoints in the way adversity is specified
  6     and in the way BMRs can be expressed. Whereas only adverse dichotomous endpoints are
  7     selected for consideration in a dose-response assessment, often the adversity of a continuous
  8     endpoint depends upon the magnitude of the response. This can be manifested in two general
  9     ways. For some continuous endpoints, the NOAEL is defined to be the highest dose at which the
difference between treated and control groups does not exceed the criterion for a biologically
 11      significant change. In essence, smaller changes are not considered adverse.  For example, a
 12      decrease in mean adult body weight usually is not considered adverse unless it is at least 10% of
 13     the control mean.  In such cases, the BMR should correspond to that biologically significant
 14      magnitude of change. Selection of the biologically significant magnitude of change is an issue for
specialists in various fields of toxicology.  Unfortunately, data have seldom been considered in this
 16      manner, so that it is difficult to get an answer to the question, "How much of a change is
 17      biologically significant and should be considered an adverse effect?"  The more usual situation is
 18      that the magnitude of change considered biologically significant is based on statistical significance.
 19            Other continuous endpoints are further classified into  adverse and non-adverse values by
20      specifying a cut-off value that distinguishes the two categories.  For example, human infants
21      weighing less than 2.5 kg at birth have been labeled "low-birth weight," an adverse outcome
useful in epidemiological studies of effects of environmental agents on human birth outcomes.
Thus, while for dichotomous endpoints, BMRs are expressed in terms of a dose-related increase
in the incidence of adverse outcomes, BMRs for continuous outcomes may be expressed either in
terms of a dose-related change in the mean or, as for dichotomous endpoints, an increase in the
incidence over background of the adverse outcome.  This latter approach for continuous
27      endpoints mandates a choice of the cutoff distinguishing adverse from non-adverse as well as the
28      choice of the BMR.                  .                                        .
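As a sketch of how a cutoff converts a continuous endpoint into an incidence (Python), assuming the endpoint is approximately normally distributed within each dose group; the means, standard deviation, and 2.5 kg cutoff below are illustrative only.

import math

def norm_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def incidence_below_cutoff(mean, sd, cutoff):
    # Probability that an individual value falls below the adverse cutoff,
    # assuming approximate normality within the dose group.
    return norm_cdf((cutoff - mean) / sd)

# Hypothetical birth-weight-style example with a 2.5 kg cutoff for the adverse outcome.
background = incidence_below_cutoff(mean=3.4, sd=0.5, cutoff=2.5)
exposed = incidence_below_cutoff(mean=3.1, sd=0.5, cutoff=2.5)
print(round(background, 3), round(exposed, 3), round(exposed - background, 3))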
   l      2.  Limit of Detection for Specifying the BMR
   2             The concept of "limit of detection" for a toxicological bioassay needs to be refined before
it can be used to set a specific BMR.  Consider a continuous sequence of populations, identical to
each other except for the dose of some toxic agent.  At least after a threshold has been exceeded,
and in the absence of masking effects, the incidence of adverse effects (for dichotomous and
dichotomized continuous endpoints) and the mean of continuous endpoints will increase (or
         decrease) as dose increases.  As the difference between the response in the control population and
         in a dosed population increases, so does the probability that a statistical test of that difference,
         carried out on samples from those populations (as in a toxicological dose-response study), will
         indicate a significant difference. The probability that the statistical test will be significant when
         there is a true difference between underlying populations is called the "power"  of the test, and
         depends upon the experimental design. This same concept of power is used to determine sample
         size when designing studies, when an effort is made to set a sample size that will give a relatively
 14     high power (usually 80% - 90%) to detect a given magnitude of effect.  Note that, even in the
 15      absence of a dose-related effect, a statistical test has some probability of indicating a statistically
        significant difference.  This is variously referred to as the size of the test, or the Type I error, or
the "alpha" level of the test (very often, it takes the value of 0.05; i.e., there is a 5% probability of
 18      detecting an effect). A way to quantify what is meant by the limit of detection of a bioassay
        design, then, is by specifying how frequently we would expect to distinguish a treatment group
        with a response that is just detectable from a control group. Thus, for example, we might decide
        that a response was at the limit of detection when we could distinguish from background in 50%
22      of the experiments (power of 0.50) in which we attempt it.
23             To use this concept of limit of detection to specify the BMR for a specific endpoint and
24      species, first a power level is chosen to represent the limit of detection. This should be done in a
25      larger context than just the specific study at hand, and the selection should satisfy the goal for the
26      overall average BMD/C:NOAEL ratio of one.  The next step uses the  sample size and nesting
27.      structure (e.g., litters within darns, repeated measures) for a typical "good4" study of the endpoint
                                                         . .                 -             -  '    -
and species of interest, and historical values of the measurements, such as background incidence or control group variance, that are necessary for power calculation. With these, one finds the magnitude of response just detectable in a two-group design (control and one treatment group) using a one-sided test with a Type I error of 0.05 and the predetermined power (e.g., 0.50). Finally, this change is expressed either as a change in the mean (for continuous endpoints) or as additional or extra risk (for dichotomous endpoints).

       ⁴Using a cadre of standard study designs for a variety of endpoints is one way to reduce the uncertainty that can be introduced when different individuals make decisions from limits of detection. We assume here the use of Agency testing protocols as the basis for standard "good" study designs. In areas where standard testing protocols have not been developed, the Agency encourages activities that can assist in identifying a "most common well-designed protocol" for typically-studied endpoints.
       Additional risk and extra risk are two ways to quantify the increment to the background risk of an adverse outcome for a dichotomous endpoint. Their definitions are:

              Additional Risk at dose d = P(d) - P(0), and

              Extra Risk at dose d = [P(d) - P(0)] / [1 - P(0)],

where P(d) is the proportion of animals, given dose d, that have an adverse response.
Additional risk is the proportion of responders in the exposed group beyond that in the control group, and extra risk is the proportion of animals responding that would not otherwise have responded, under the assumption that the processes that lead to the adverse outcome in unexposed subjects are independent of the processes that lead to the adverse outcome in the exposed subjects (see Figure 2). The greater the background incidence, the greater the difference between extra and additional risk. If there are no responders in the control group [P(0)=0], there is no difference between extra and additional risk. For an effect with an incidence of 50% in the control group and 55% in the exposed group, the additional risk is 5% and the extra risk is 10%. Likewise, for a 90% background and a 1% increase in the exposed group, the additional risk is 1% and the extra risk is 10%. The Agency has used extra risk models in the past for most animal-based
[Figure 2. Dose-response curves for models incorporating different forms of spontaneous background response:
       (a) No background response (P*(d) = P(d)).
       (b) Independent background response (P*(d) = γ + (1 - γ)P(d)); extra risk.
       (c) Additive background response (P*(d) = P(d + δ)).
       (d) Additional background response (P*(d) = γ + P(d)).]
cancer risk assessments and for most work with BMD/C analyses to date. EPA-supported research on developmental toxicity data (Allen et al., 1994a, b; Kavlock et al., 1995) used additional risk, but since the background incidences in the data used were relatively low, the difference between additional and extra risk is likely to be minimal.
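       The two quantities are simple to compute directly. The short Python sketch below (illustrative only; the helper names are ours, not part of any Agency software) restates the definitions above and reproduces the 50%-to-55% and 90%-to-91% examples.

def additional_risk(p0, pd):
    """Additional risk: P(d) - P(0)."""
    return pd - p0

def extra_risk(p0, pd):
    """Extra risk: [P(d) - P(0)] / [1 - P(0)]."""
    return (pd - p0) / (1.0 - p0)

# 50% background, 55% in the exposed group: additional risk ~0.05, extra risk ~0.10
print(additional_risk(0.50, 0.55), extra_risk(0.50, 0.55))
# 90% background, 91% in the exposed group: additional risk ~0.01, extra risk ~0.10
print(additional_risk(0.90, 0.91), extra_risk(0.90, 0.91))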
       It has been proposed that the extra risk approach is consistent with the assumption that independent mechanisms are responsible for the background incidence and the excess incidence of effect in the exposed group. Although the basis for this explanation has not been well described (Hoel, 1979), it could be argued that the effects of exposure would be masked in proportion to the background response, since independent mechanisms would not be assumed to have an additive effect. The true exposure-related response would then be reflected by the extra risk model. There is no correspondingly simple interpretation of additional risk; its use seems primarily to reflect computational convenience.
       There is limited basis for a science policy choice as to whether to use an additional or extra risk approach for BMD/C analyses. The choice of the extra risk model would be conservative in the sense that, for a given increment of risk, say 10%, the BMD/C corresponding to an extra risk of 10% would always be equal to or less than that corresponding to an additional risk of 10%. When the BMR is set based on limit of detection considerations, however, it does not matter which of the two risk formulations is used; the resulting BMD/C will be the same, because the values of the BMR based on extra risk and on additional risk that correspond to the same limit of detection differ accordingly (see the example below). The default approach for BMD/C analysis of dichotomous outcomes is to use extra risk to specify BMRs.
3. Examples
       Suppose the endpoint of interest has a dichotomous response and no nested structure, so that we would normally assume that the response has a binomial distribution. Fleiss (1981) gives an approximate formula for the power of a two-sample test in such a situation, when each group is the same size. This is a function of sample size, the probability of response in each group, and the Type I error of the test. This formula (Fleiss, 1981) can be used for the task at hand by fixing the incidence of the adverse outcome in the control group, the sample size, and the desired Type I error, and then, either by trial and error or by some more efficient search method, solving for the treatment-group response that would be detectable with the specified power under those conditions. Figure 3 shows the result of doing this for a range of values for control incidence, a Type I error of 0.05 in a one-sided test, a sample size of 25 per group, and three powers: 0.2, 0.5, and 0.8. The "barely detectable response" is plotted both as extra risk (top panel) and as additional risk (lower panel). Note that, for a given control incidence, the same treatment group incidence results in a higher value for extra risk than for additional risk, because of the "1 - P(0)" term in the denominator for extra risk.
       As an example of how to read this graph, suppose we are looking at data for an endpoint with a background incidence that is usually around 10%, and which is usually measured in a study with about 25 animals per group. Go to the panel that corresponds to the way you want to express risk ("extra" or "additional"), find 0.1 on the x-axis (there is a vertical dotted line there), and follow it up until it crosses one of the plotted curves. The vertical line crosses the thinnest, bottommost curve at an additional risk of about 0.13 or an extra risk of about 0.15. Since this curve corresponds to a power of 0.2, this means that one in five (20%) of experiments in a situation with a control incidence of 0.1, an additional risk in the treatment group of 0.13 (that is, a treatment response of about 0.23), 25 animals per group, and a one-sided test with a Type I error of 0.05 would indicate that the treatment response was greater than control. To raise the proportion of experiments that indicate that the treatment response was greater than control to one in two (50%), the additional risk for the treatment group would have to be about 0.23, and to raise it to eight out of ten (a typical pre-design-stage sort of power), the additional risk would have to be about 0.35.

[Figure 3. Treatment-group response just detectable with powers of 0.2, 0.5, and 0.8, plotted as extra risk (top panel) and additional risk (bottom panel) against control incidence, for 25 animals per group and a one-sided test with a Type I error of 0.05.]
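       The "barely detectable response" can also be approximated numerically. The sketch below is an assumption-laden illustration, not the method used to produce Figure 3: it estimates power by Monte Carlo simulation of a one-sided two-proportion z-test (no continuity correction) and searches for the smallest treated-group incidence reaching a target power. Because the figure was based on the Fleiss (1981) approximate formula, which includes a continuity correction, the values obtained here will differ somewhat from those read off the figure.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def power_one_sided(p0, p1, n, alpha=0.05, nsim=20000):
    """Monte Carlo power of a one-sided two-proportion z-test (pooled variance)."""
    x0 = rng.binomial(n, p0, nsim)
    x1 = rng.binomial(n, p1, nsim)
    pbar = (x0 + x1) / (2.0 * n)
    se = np.sqrt(2.0 * pbar * (1.0 - pbar) / n)
    z = np.zeros(nsim)
    ok = se > 0
    z[ok] = (x1[ok] - x0[ok]) / (n * se[ok])
    return np.mean(z > norm.ppf(1.0 - alpha))

def detectable_incidence(p0, n, target_power, alpha=0.05):
    """Smallest treated-group incidence (on a 0.01 grid) detected with the target power."""
    for p1 in np.arange(p0 + 0.01, 1.0, 0.01):
        if power_one_sided(p0, p1, n, alpha) >= target_power:
            return p1
    return float("nan")

p1 = detectable_incidence(p0=0.10, n=25, target_power=0.50)
print(f"treated incidence ~{p1:.2f}; additional risk ~{p1 - 0.10:.2f}; extra risk ~{(p1 - 0.10) / 0.90:.2f}")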
       If the magnitude of treatment effect (difference between the mean responses of treatment and control groups) that is deemed biologically significant is the smallest treatment effect that a standard experimental design for that endpoint could detect, the BMR would be based on the limit of resolution of the standard bioassay for looking at that endpoint in the relevant test species. For example, for average litter sizes, the variance of litter means of term fetus weight for CD-1 mice is about 0.008 with a mean of about 1.02 gm (Setzer and Rogers, 1991). A commonly used sample size formula (Kupper and Hafner, 1989) can be rearranged to express the difference between two
means just detectable with a test of a specified Type I error and power:

       δ = (Z(1-α) + Z(1-β)) × σ × √(2/n),

where δ is the desired difference between control and treatment group means, σ is the control group standard deviation, n is the number of animals in a group, α is the Type I error, 1 - β is the desired power, and Z(1-α) is the number from a table of the normal distribution such that the probability that a standard normal random variable is less than Z(1-α) is 1 - α.
       This gives a detectable difference of about 0.04 grams (about 4%) using a test with a Type I error of 0.05 and a power of 0.5 (for powers of 0.2 and 0.8 the values are 0.02 gm and 0.08 gm, respectively). If it were decided that the limit of detectability would be based on a difference detectable with a power of 0.5, then, to reflect the limit of detectability of the conventional assay, the BMR would be set at a difference of 0.04 gm.
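       As a check on the arithmetic, the standard two-sample relationship written above can be evaluated directly. The sketch below is illustrative only: the group size n is an assumed value (the document does not state the group size used), and the Kupper and Hafner (1989) formula cited in the text carries additional correction terms, so this will not reproduce every figure quoted above exactly.

from math import sqrt
from scipy.stats import norm

def detectable_difference(sigma, n, alpha=0.05, power=0.5):
    """Difference in means just detectable by a one-sided two-sample test."""
    return (norm.ppf(1.0 - alpha) + norm.ppf(power)) * sigma * sqrt(2.0 / n)

sigma = sqrt(0.008)    # control SD of litter-mean fetal weight (variance 0.008)
n = 27                 # assumed litters per group (illustrative value only)
for pw in (0.2, 0.5, 0.8):
    print(pw, round(detectable_difference(sigma, n, power=pw), 3))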
4. Selecting the Critical Power Level for the Limit of Detectability
       The critical power level to use for the "limit of detectability" with any particular study design needs to be determined in advance of applying this approach to setting the BMR. This could be determined using data from a large number of studies of similar "standard" design, as was done for developmental toxicity data (Allen et al., 1994a, b; Kavlock et al., 1995), or using simulation techniques based on standard study design and background incidence. Other requirements for setting the BMR using this approach are general agreement on the details of those aspects of endpoint behavior that affect the power calculation: for example, the "typical" study design for evaluating an endpoint in a given species; for dichotomous responses, typical background incidences; and for continuous responses, typical control variance levels and changes considered to be "biologically significant" or cutoff levels distinguishing adverse from non-adverse values of the continuous endpoint.
APPENDIX C

MATHEMATICAL MODELING

1. Introduction
       Dose-response models for toxicology data are usually of the type called "nonlinear" in mathematical terminology. In a linear model, the value the model predicts is a linear combination of the parameters. For example, in a linear regression of a response y on dose, the predicted value is a linear combination of a and b, namely, a×1 + b×dose. Note that even a quadratic or other polynomial is a linear model in this sense: y = a + b×dose + c×dose^2 + d×dose^3 is a third-order polynomial (a cubic) equation, but is still a linear combination of the parameters a, b, c, and d. In contrast, in a nonlinear model, for example the log-logistic with background,

       P(dose) = P0 + (1 - P0) / (1 + e^-[a + b×ln(dose)]),

the response is not a linear combination of the parameters (here, P0, a, and b). The distinction is important because nonlinear models are usually more difficult to fit to data, requiring more complicated calculations, and statistical inference is more typically approximate than with linear models. Note that this definition of "linear" is in contrast to the way the term is used in reference to cancer dose-response assessment, in which the phrase "low-dose linear" refers to models in which the slope is positive at zero dose.
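       To make the distinction concrete, the following Python snippet (illustrative only) writes out the two forms just described: the cubic is linear in its parameters a, b, c, and d, while the log-logistic with background is not linear in P0, a, and b.

import numpy as np

def cubic(dose, a, b, c, d):
    """Linear model (in the parameters): a + b*dose + c*dose^2 + d*dose^3."""
    return a + b * dose + c * dose**2 + d * dose**3

def log_logistic(dose, p0, a, b):
    """Nonlinear model: P(dose) = p0 + (1 - p0) / (1 + exp(-(a + b*ln(dose))))."""
    dose = np.asarray(dose, dtype=float)
    p = np.full_like(dose, p0)
    pos = dose > 0
    p[pos] = p0 + (1.0 - p0) / (1.0 + np.exp(-(a + b * np.log(dose[pos]))))
    return p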
       This section will discuss some aspects of model fitting: initial model selection, approaches to fitting models to data, ways to select among several models fitted to the same data set, and calculation of confidence intervals.
2. Model Selection
       The initial selection of a group of models to fit to the data is governed by the nature of the measurement that represents the endpoint of interest and the experimental design used to generate the data. In addition, certain constraints on the models or their parameter values sometimes need to be observed, and may influence model selection. Finally, it may be desirable to model multiple
endpoints at the same time. The diversity of possible non-cancer endpoints and shapes of their dose-responses for different toxicants precludes specifying a small set of models for further consideration. This will inevitably lead to the use of judgement and occasional ambiguity when selecting a final model to calculate a BMD/C. It is hoped that, as experience using benchmark dose methodology in dose-response assessment accumulates, it will be possible to narrow the number of acceptable models.
       a. Type of endpoint
       The kind of measurement variable that represents the endpoint of interest is an important consideration in selecting mathematical models. Commonly, such variables are either continuous, like liver weight or the specific activity of a given liver enzyme, or discrete, commonly dichotomous, like the presence or absence of abnormal liver status. However, other types are common in biological data; for example: ordered categorical, like a histology score that ranges from 1 (normal) to 5 (extremely abnormal); counts, such as counts of deaths or the numbers of cases of illness per thousand person-years of exposure to a given exposure condition; waiting times, such as the time it takes for an illness to appear after exposure, or age at death; or multiple endpoints considered jointly (see, for example, Krewski and Zhu, 1995; Lefkopoulou et al., 1989). It is beyond the scope of this document to consider all possible kinds of variables that might be encountered, so further discussion will concentrate on dichotomous and continuous variables.
       Dichotomous variables. Data on dichotomous variables are commonly presented as a fraction or percent of individuals that present with the given condition at a given dose or exposure level. For such endpoints, we normally select probability models like the logistic, probit, Weibull, and so forth, whose predictions lie between zero and one for any possible dose, including zero. The natural form of some models, such as the log-logistic model, presumes that the proportion of controls with the abnormal response (an estimate of background) will always be zero. The default approach taken in this document is not to include a background term in the model unless there is evidence of a background response, or if the first two or more dosed groups do not show a monotonic increase in response. This is an area where more work is needed.
       Continuous variables. Data for continuous variables are often presented as means and standard deviations or standard errors, but may also be presented as a percent of control or some other
standard. From a modeling standpoint, the most desirable form for such data is by individual. Unlike the usual situation for dichotomous variables, summarization of continuous variables results in a loss of information about the distribution of those variables.
       The preferred approach to expressing the BMR will determine the approach to modeling continuous data. Two broad categories of approach have been proposed: 1) to express the BMR as a particular change in the mean response, possibly as a fraction of the control mean, a fraction of the standard deviation of the measurement from untreated individuals, or a level of the response that expert opinion holds is adverse; or 2) to decide on a level of the outcome to consider adverse, and treat the proportion of individuals with the adverse outcome much as one would a dichotomous variable (see Appendix B above).
       Typical models to use in the first situation include linear and polynomial models, and power models or other nonlinear models. In the second situation, one approach is to classify each individual as affected or not, and model the resulting variable as dichotomous. An alternative is to use a hybrid approach, such as that described by Gaylor and Slikker (1990), Kodell et al. (1995), and Crump (1995), which fits continuous models to continuous data and, presuming a distribution of the data, calculates a BMD/C in terms of the fraction affected.
       b. Experimental design
       The aspects of experimental design that bear on model selection include the total number of dose groups used and possible clustering of experimental subjects. The number of dose groups has a bearing on the number of parameters that can be estimated: the number of parameters that affect the overall shape of the dose-response curve generally cannot exceed the number of dose groups.
       Clustering of experimental subjects is actually more of an issue for methods of fitting the models than for choice of the model form itself. The most common situation in which clustering occurs is in developmental toxicology experiments, in which the toxicant is applied to the mother, and individual offspring are examined for adverse effects. Another example is for designs in which individuals yield multiple observations (repeated measures). This can happen, for example, when each subject receives both treatment and control (common in studies with human subjects), or when each subject is observed multiple times after treatment (e.g., neurotoxicity studies). The issue
in all these examples is that individual observations cannot be taken as independent of each other. Most methods used for fitting models rely heavily on the assumption that the data are independent, and special fitting methods need to be used for data sets that exhibit more complicated patterns of dependence (see, for example, Ryan, 1992; Davidian and Giltinan, 1995).
       c. Constraints and covariates
       An obvious constraint on models for dichotomous data has already been discussed; probabilities are constrained to be positive numbers no greater than one. However, biological reality may impose other constraints on models. For example, most biological quantities are constrained to be positive, so models should be selected so that their predicted values, at least in the region of application, conform to that constraint. In models in which dose is raised to a power which is a parameter to be estimated (such as a Weibull model), if that parameter is allowed to be less than 1.0, the slope of the dose-response curve becomes infinite at a dose of zero. This is often seen as an undesirable situation, and the default is to constrain these parameters to be greater than or equal to 1.
       It is sometimes desirable to include covariates on individuals when fitting dose-response models. For example, litter size has often been included as a covariate in modeling laboratory animal data in developmental toxicity. Another example is in modeling epidemiology data, when certain covariates (e.g., age, parity) are included that are expected to affect the outcome and might be correlated with exposure. In continuous models, if the covariate has an effect on the response, including it in a model may improve the precision of the overall estimate by accounting for variation that would otherwise end up in the residual variance. In any kind of model, any variable that is correlated with dose, and which affects outcome, would need to be included as a covariate.
3. Model Fitting
       The goal of the fitting process is to find values for all the model parameters so that the resulting fitted model describes those data as well as possible; this is termed "parameter estimation." In practice, this happens when the dose-group means predicted by the model come as close as possible to the data means. One way to achieve this is to write down a function of all the parameters (the objective function) and all the data, with the property that the parameter
values that correspond either to an overall minimum (or, equivalently, an overall maximum) of the function, or that result in function values of zero, give the desired model predictions. The actual fitting process is carried out iteratively. Many models will converge to the right estimates for most data sets from just about any reasonable set of initial parameter values; however, some models, and some data sets, may require multiple guesses at initial values before the model converges. It also happens occasionally that the fitting procedure will converge to different estimates from different initial guesses. Only one of these sets of estimates will be "best". It is always good practice when fitting nonlinear models to try different initial values, just in case.
       There are a few common ways to construct objective functions (estimates): the methods of nonlinear least squares, maximum likelihood, and generalized estimating equations (GEE). The choice of objective function is determined in large part by the nature of the variability of the data around the fitted model. The method of nonlinear least squares, where the objective function is the sum of the squared differences between the observed data values and the model-predicted values, is a common method for continuous variables when observations can be taken as independent. A basic assumption of this method is that the variance of individual observations around the dose-group means is constant across doses. When this assumption is violated (commonly, when the variance of a continuous variable changes as a function of the mean, often proportional to the square of the mean, giving a constant coefficient of variation), a modification of the method may be used in which each term in the sum of squares is weighted by the reciprocal of an estimate of the variance at the corresponding dose. This method is especially appropriate when the data to be fitted can be supposed to be at least approximately normally distributed.
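       As an illustration of the weighted least-squares idea, the sketch below fits a simple power model to hypothetical dose-group summaries (not data from this document) using scipy's curve_fit, weighting each group mean by its standard error through the sigma argument (equivalently, inverse-variance weighting of the squared residuals); the power parameter is constrained to be at least 1, consistent with the default constraint discussed earlier.

import numpy as np
from scipy.optimize import curve_fit

doses   = np.array([0.0, 10.0, 50.0, 150.0])   # hypothetical dose groups
means   = np.array([3.70, 3.55, 3.30, 2.90])    # hypothetical group means
sds     = np.array([0.30, 0.28, 0.30, 0.33])    # group standard deviations
nper    = np.array([20, 20, 20, 20])             # animals per group
se_mean = sds / np.sqrt(nper)                    # standard error of each group mean

def power_model(d, a, b, c):
    """Continuous power model: mean(d) = a + b * d^c."""
    return a + b * np.power(d, c)

params, cov = curve_fit(power_model, doses, means, p0=[3.7, -0.005, 1.2],
                        sigma=se_mean,
                        bounds=([-np.inf, -np.inf, 1.0], [np.inf, np.inf, 18.0]))
print(params)   # fitted a, b, c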
       Maximum likelihood is a general way of deriving an objective function when a reasonable supposition about the distribution of the data can be made. Because estimates derived by maximum likelihood methods have good statistical properties, such as asymptotic normality, maximum likelihood is often a preferred form of estimation when that assumption is reasonably close to the truth. An example of such a situation is the case of individual, independently treated animals (e.g., not clustered in litters) scored for a dichotomous response. Here it is reasonable to suppose that the number of responding animals follows a binomial distribution with the probability of response expressed as a function of dose. Continuous variables, especially means of several observations, are often normal (Gaussian) or log-normal. When variables are normally distributed with a constant variance, minimizing the sum of squares is equivalent to maximizing the likelihood, which explains, in part, why least squares methods are often used for continuous variables. In developmental toxicity data, the distribution of the number of animals with an adverse outcome is often taken to be approximately beta-binomial. This particular likelihood is used to accommodate the lack of independence among littermates.
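       A compact sketch of the simplest case described above, independent animals scored for a dichotomous response, is given below. The dose groups, group sizes, and counts are hypothetical; the model is the log-logistic with background written earlier in this appendix, and the binomial log-likelihood is maximized numerically with scipy.optimize.minimize.

import numpy as np
from scipy.optimize import minimize

doses = np.array([0.0, 25.0, 100.0, 400.0])   # hypothetical dose groups
n     = np.array([50, 50, 50, 50])             # animals per group
cases = np.array([2, 5, 14, 33])               # hypothetical responders

def predicted(params, d):
    """Log-logistic with background: p0 + (1 - p0)/(1 + exp(-(a + b*ln(d))))."""
    p0, a, b = params
    p = np.full_like(d, p0, dtype=float)
    pos = d > 0
    p[pos] = p0 + (1.0 - p0) / (1.0 + np.exp(-(a + b * np.log(d[pos]))))
    return np.clip(p, 1e-12, 1.0 - 1e-12)

def negloglik(params):
    """Negative binomial log-likelihood over the dose groups."""
    p = predicted(params, doses)
    return -np.sum(cases * np.log(p) + (n - cases) * np.log(1.0 - p))

fit = minimize(negloglik, x0=[0.05, -5.0, 1.0],
               bounds=[(1e-6, 0.999), (-20.0, 20.0), (1e-6, 20.0)], method="L-BFGS-B")
print(fit.x, -fit.fun)   # parameter estimates and maximized log-likelihood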
       A third group of approaches to estimating parameters comprises the related quasi-likelihood methods (McCullagh and Nelder, 1989) and the method of GEE (see Zeger and Liang, 1986), which require only that the mean, variance, and correlation structure of the data be specified. GEE methods are similar to maximum likelihood estimation procedures in that they require an iterative solution, provide estimates of standard errors and correlations of the parameter estimates, and yield estimates that are asymptotically normal. Their use so far has primarily been to handle forms of lack of independence, as in litter data, and they would be useful in any of a number of kinds of repeated measures designs, such as occur in clinical studies and repeated neurobehavioral testing.
4. Assessing How Well the Model Describes the Data
       An important criterion is that the selected model should describe the data, especially in the region of the BMR. Most fitting methods will provide a global goodness-of-fit measure, usually associated with a P-value. These measures quantify the degree to which the dose-group means that are predicted by the model differ from the actual dose-group means, relative to how much variation of the dose-group means one might expect. Small P-values (say, P < 0.05) indicate that it would be unlikely to achieve a value of the goodness-of-fit statistic at least this extreme if the data were actually sampled from the model and, consequently, that the model is a poor fit to the data. Larger values cannot be compared from one model to another, since they assume the different models are correct; they can only identify those models that are consistent with the experimental results. When there are other covariates in the models, such as litter size, the idea is the same, just more complicated to calculate. In this case, the range of doses and other covariates is broken up into cells, and the number of observations that fall into each cell is compared to that predicted by the model.
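       For dichotomous data fit at the group level, the global check described above can take the form of a Pearson chi-square comparison of observed and model-predicted counts. The sketch below uses hypothetical counts and predicted probabilities; the degrees of freedom are the number of dose groups minus the number of estimated parameters.

import numpy as np
from scipy.stats import chi2

n        = np.array([50, 50, 50, 50])               # animals per dose group (hypothetical)
observed = np.array([2, 5, 14, 33])                 # observed responders (hypothetical)
p_hat    = np.array([0.045, 0.105, 0.270, 0.655])   # model-predicted probabilities (hypothetical)

expected = n * p_hat
pearson  = np.sum((observed - expected) ** 2 / (n * p_hat * (1.0 - p_hat)))
df       = len(n) - 3                               # e.g., 4 groups minus 3 fitted parameters
p_value  = chi2.sf(pearson, df)
print(round(pearson, 2), df, round(p_value, 3))     # a small p-value (< 0.05) signals lack of fit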
       It can happen that the model is never very far from the data points (so the P-value for the goodness-of-fit statistic is not too small), but is always on one side or the other of the dose-group means. Also, there could be a wide range in the response, and the model predicts the high responses well but misses the low-dose responses. In such cases, the goodness-of-fit statistic might not be significant, but the fit should be treated with caution.
       The best way to detect the form of these deviations from fit is with graphical displays. Plots should always supplement goodness-of-fit testing. It is extremely helpful for plots that include data points to also include a measure of dispersion of those data points, such as confidence limits for the means, if the data points are means.
       In certain cases, the typical models for a standard study design cannot be used with the observed data, as, for example, when the data are nonmonotonic, or when the response rises abruptly after some lower doses that give only the background response. In these cases, adjustments to the data (e.g., a log-transformation of dose) or the model (e.g., adjustments for unrelated deaths) may be necessary. In the absence of a mechanistic understanding of the biological response to a toxic agent, evidence from exposures that give responses much more extreme than the BMR does not really tell us very much about the shape of the response in the region of the BMR. Such exposures, however, may very well have a strong effect on the shape of the fitted model in the region of the BMD/C. Thus, if lack of fit is due to characteristics of the dose-response data for high doses, the data may be adjusted by eliminating the high dose group. The practice carries with it the loss of a degree of freedom, but may be useful in cases where the response plateaus or drops off at high doses. Alternatively, an entirely different model could be fit.
5. Comparing Models
       It will often happen that several models provide an adequate fit to a given data set. These models may be essentially unrelated to each other (for example, a logistic model and a probit model often do about as well at fitting dichotomous data), or they may be related to each other in the sense that they are members of the same family that differ in which parameters are fixed at some default value. For example, one can consider the log-logistic, the log-logistic with non-zero background, and the log-logistic with threshold and non-zero background to all be members of the same family of models. Goodness-of-fit statistics are not designed to compare different models, so alternative approaches to selecting a model to use for BMD/C computation need to be
pursued.
       When other data sets for similar endpoints exist, an external consideration can be applied. It may be possible to compare the results of BMD/C computations across studies of all the data that were fit using the same form of model, presuming that a model can be found that describes all the data sets. Another consideration is the existence of a conventional approach to fitting a kind of data. In this case, communication with specialists in that kind of data is eased when a familiar model is used to fit the data. Neither of these considerations should be seen as justification for using ill-fitting models. Finally, it is generally considered preferable to use models with fewer parameters, when possible.
       Generally, both in the method of least squares and in maximum likelihood methods, the objective function will appear to improve as additional parameters are introduced within a family of models. This apparent improvement can occur simply because the additional parameters give the model more flexibility. Likelihood ratio tests can be used to evaluate whether the improvement in fit afforded by estimating additional parameters is justified. Such tests cannot be applied to compare models from different families, however. Some statistics, notably Akaike's Information Criterion (AIC), can be used to compare models with different numbers of parameters fitted using a similar fitting method (for example, least squares or a binomial maximum likelihood). Although such methods are not exact, they can provide useful guidance in model selection.
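       The two comparisons just described can be written out directly. In the small sketch below, the maximized log-likelihood values are hypothetical and are used only to show the arithmetic of a likelihood ratio test for nested models and of an AIC comparison.

from scipy.stats import chi2

ll_reduced, k_reduced = -102.4, 2   # hypothetical: log-logistic, no background (2 parameters)
ll_full,    k_full    = -100.1, 3   # hypothetical: log-logistic with background (3 parameters)

# Likelihood ratio test (valid only for nested models)
lrt = 2.0 * (ll_full - ll_reduced)
p_value = chi2.sf(lrt, df=k_full - k_reduced)
print("LRT:", round(lrt, 2), "p =", round(p_value, 3))

# AIC = 2k - 2*logL; smaller is preferred, and nesting is not required
for name, ll, k in [("reduced", ll_reduced, k_reduced), ("full", ll_full, k_full)]:
    print(name, "AIC =", round(2 * k - 2 * ll, 1))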
6. Using Confidence Limits to Get a BMD/C
       Determining a BMD/C requires the choice of a model together with a suitable method to calculate a confidence limit, as well as the choice (or prior specification) of the level of confidence and the magnitude of the endpoint. Confidence limits bracket those models which, within a particular model family, are consistent with the data. They do not make statements about the extent to which a model is likely to be the true one (see Cox and Hinkley, 1974).
       The choice of endpoint or its severity is discussed elsewhere in this document. The "level" at which confidence limits are set has tended to be 95% for a variety of applications (e.g., prediction of future monitoring values, human cancer prediction). This level reflects the extent to which outlying values are tolerable for the system and is chosen to cover some reasonable amount of the distribution of the source of the modeled data. This cannot account for or assume any correspondence between the modeled animal data and the human population of concern. Rather, the "confidence" associated with an interval indicates the percent of repeated intervals, based on experiments of the same sort, that are expected to cover (include) the dose associated with the BMR. The choice of confidence level represents tradeoffs in data collection costs and the needed data precision, just as hypothesis-testing levels do. Just as 0.05 is a convenient (but not necessarily good for all data) level for tests, 95% is a convenient choice for most limits. An example of the effects of the choice among 90, 95, and 99% for two different models may be found in the background document (EPA, 1995c).
       The method by which the confidence limit is obtained is typically related to the manner in which the BMD/C is estimated from the model. Historically, most Agency modeling has been likelihood based (see discussion below); mostly this reflects general trends in modeling. How well any method behaves (e.g., how narrow the confidence interval of fixed level is) relative to other methods with the same model can depend on how nearly the effect level for estimation is contained in the exposure range in the source study. Different types of models lend themselves to different types of estimation, depending on such things as the extent to which the model accommodates clustered data and whether the model equation can be restated in terms of the dose that is calculated to produce the specific effect. Software used to determine benchmark doses should identify the estimation procedure used and the available confidence limits.
       Several ways to derive model functions are described earlier in this Appendix. For both likelihood and estimating equation strategies, the most typical approach to constructing confidence limits relies on asymptotic normality to establish confidence sets, although neither strategy requires that the actual endpoint of interest be a continuous variable.
       With regard to likelihood-based methods, confidence intervals (CIs) based on the asymptotic distribution of the likelihood ratio are preferred to those based on the asymptotic distribution of the MLEs, because they can use a commonly tabled distribution function, but both can give problems in ranges where the assumptions needed to use asymptotic theory begin to weaken (e.g., as sample sizes decrease, as interest focuses farther from the experimental doses, as observations become more correlated). In general, however, it is preferred to base CIs for parameters estimated by maximum likelihood across various data contexts on the asymptotic distribution of the likelihood ratio, owing to their tendency to give better coverage behavior⁵ (Crump and Howe, 1985).
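       As an illustration of a likelihood-ratio-based limit, the sketch below (hypothetical data; a deliberately simplified grid search, not the Agency's software) reparameterizes a log-logistic model without background so that the BMD for a 10% extra risk is itself a parameter, profiles the binomial log-likelihood over a grid of BMD values, and takes the one-sided 95% lower limit (BMDL) where the profile drops by half the 90th percentile of a chi-square with one degree of freedom, in the spirit of the likelihood-ratio approach of Crump and Howe (1985). The slope is constrained to be at least 1, consistent with the default constraint discussed earlier.

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

doses = np.array([0.0, 50.0, 100.0, 200.0])   # hypothetical dose groups
n     = np.array([25, 25, 25, 25])             # animals per group
cases = np.array([0, 2, 6, 14])                # hypothetical responders
BMR   = 0.10                                   # benchmark response (extra risk)

def loglik(bmd, b):
    """Binomial log-likelihood with intercept a rewritten as a = logit(BMR) - b*ln(bmd)."""
    a = np.log(BMR / (1.0 - BMR)) - b * np.log(bmd)
    p = np.zeros_like(doses)
    pos = doses > 0
    p[pos] = 1.0 / (1.0 + np.exp(-(a + b * np.log(doses[pos]))))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(np.sum(cases * np.log(p) + (n - cases) * np.log(1.0 - p)))

def profile(bmd):
    """Maximize the log-likelihood over the slope b (constrained >= 1) for a fixed BMD."""
    res = minimize_scalar(lambda b: -loglik(bmd, b), bounds=(1.0, 20.0), method="bounded")
    return -res.fun

grid = np.linspace(5.0, 300.0, 600)
prof = np.array([profile(d) for d in grid])
mle_ll, bmd_mle = prof.max(), grid[prof.argmax()]

# One-sided 95% lower limit: smallest BMD at or below the MLE whose profile log-likelihood
# stays within chi-square(1, 0.90)/2 of the maximum.
cut = chi2.ppf(0.90, df=1) / 2.0
ok = (grid <= bmd_mle) & (prof >= mle_ll - cut)
bmdl = grid[ok].min()
print(f"BMD10 (MLE) about {bmd_mle:.0f}; BMDL10 about {bmdl:.0f}")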
       In the long run, there may be a preference for one type of construction with GEE (e.g., like Ryan, 1992, who uses the delta method and sample estimates). It still has to be demonstrated whether those types are similar in performance (e.g., coverage behavior) for their respective parameters (where GEE were used) to those from the asymptotic distribution of the likelihood ratio in the likelihood context.
       Thus, the BMD/C is determined by 1) selecting an endpoint, 2) identifying a BMR (a predetermined level of change in response relative to controls), 3) establishing, by an appropriate estimation procedure, a model that fits the data adequately, and 4) calculating a confidence limit at the BMR using the model and the same estimation procedure.
       At the time of this draft, some commercial software is available that is designed specifically for carrying out steps 3) and 4) by maximum likelihood methods. EPA is currently developing software for this purpose that will be made widely available to all potential users. GEE solutions, which might be used with nested data, involve iterative fitting after the method of weighted least squares, but standardized routines are available for certain contexts; these are written as commonly used software macros (e.g., in SAS) that must be inserted into user-fashioned programs.

       ⁵While we say there is, for instance, a 95% probability that our interval covers the BMR, the actual probability may be something else, sometimes even as low as 30% or 40%, although more usually in the 50% to 90% range. This can happen when we have not allowed for something like correlated observations in the model. Then it is said that the nominal and actual coverages diverge. It is not clear if anyone has looked at the extent of this divergence for various construction approaches for GEE CIs.
APPENDIX D

EXAMPLES OF BMD/C ANALYSES


Example #1: Carbon Disulfide

Summary of Main Points Illustrated

Determination of the BMR for a continuous variable based on biological significance of response.

Use of individual rather than group exposure and response data.

Summary of Data

Study: Occupational inhalation study (in humans) by NIOSH.
Endpoints modeled: Sural and peroneal nerve conduction velocity and amplitude.
Endpoint used: Peroneal nerve conduction velocity.

       The benchmark concentration for carbon disulfide was derived from nerve conduction velocity changes reported in the occupational study by Johnson et al., 1983. Johnson et al. reported several neurophysiological measurements in workers exposed for an average of 12 years. Endpoints affected by exposure included peroneal nerve conduction velocity and amplitude, and sural nerve conduction velocity. The published report presented exposure groups categorized as low, medium, and high exposure and presented mean and median exposures for each group (Table 1).
Table 1. Group data used in the Benchmark Dose Calculation for Carbon Disulfide, as summarized in Johnson et al., 1983.

                         Exposure Level (ppm)                   Peroneal NCV
Group                  Mean      SD      Median       N       Mean      SD
Comparison group         -        -       0.2        196      45.3      4.4
Low                     1.2      1.0      1.0         44      43.7      5.1
Medium                  5.1      4.1      4.1         56      43.4      4.8
High                   12.6     26.9      7.6         36      41.8      4.5
       The Chemical Manufacturers Association Carbon Disulfide Panel obtained the raw individual data from the Johnson study from NIOSH and performed a benchmark concentration analysis, which is presented in the report by Price and Berner, 1995.

Selection of the BMR

Benchmark response: 10% change (note that this BMR is greater than the response in the highest exposure group).

       The decision of the RfC work group was to use a 10% decrease in nerve conduction velocity as the definition of the BMR. The BMR is therefore at a higher level of response than the highest dose group in the study and was also higher than a statistically significant response level in the study. This decision was aided and supported by extensive discussion with neurotoxicologists from EPA's NHEERL, who attended the work group meeting. There was considerable discussion about the proper choice of the BMR because use of the BMC10 resulted in a BMC that was higher than any of the exposure group responses, i.e., the BMC involved extrapolation above the data points. This approach was supported on the grounds that the response observed in the high concentration group was considered to be, at most, mildly adverse (a peer reviewer's comment was that the effects in the Johnson study should be considered pre-adverse), because a change in conduction velocity of 10% is about where a clinician would begin to be concerned, and because a 10% change is about equal to one standard deviation for this endpoint. The use of the individual subject analysis was preferred because it allowed for age adjustment.

Calculation of the Benchmark Concentration

Model: polynomial model used; data fit with linear term only.

A. Group Level Data
       Analyses were done initially using the group level data based on the arithmetic mean exposures (with no variability estimate) using commercial software (Table 2). These results are presented in the internal EPA report "Benchmark Concentration Analysis for Carbon Bisulfide" (EPA, 1995e). (Also presented in that report are BMC analyses of the developmental studies by Tabacova and colleagues.)

B. Individual Data
       The Chemical Manufacturers Association Carbon Disulfide Panel performed a benchmark concentration analysis on the individual data, which allowed for an age adjustment to be included in the analysis and allowed for exposure variability to be considered because each subject had a unique exposure and effect measurement (Price and Berrier, 1995).

       Further analysis of the raw data by EPA involved correction of some missing or implausible data points and evaluation of the interaction of age and exposure. This analysis found

a greater decline in conduction velocity with age in exposed compared with control subjects, suggesting an interaction of age and exposure, and found cumulative exposure to better explain the response (see figure). The resulting BMCs are shown in Table 2.

Table 2. Peroneal Nerve Conduction Velocity Benchmark Concentration Results

Analysis            BMR      BMC (ppm)     BMC (mg/m3)
Group data          10%         12             37
Individual data     10%         20             62
       The difference in the BMCs between the analysis based on group data and that based on individual data is due mainly to the skewed distribution of concentrations in the defined exposure categories for the group analysis (see Table 1).
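       For orientation, the central calculation can be sketched with the published group means alone. The unweighted ordinary least-squares fit below is illustrative only: the reported BMCs came from weighted fits and from the individual, age-adjusted data, so this will not reproduce the values in Table 2. It shows how a concentration corresponding to a 10% decrease in mean peroneal NCV is read off a fitted line; the BMC itself is taken where the lower confidence limit on the fitted curve, not the central estimate, crosses the BMR (as in Figure 4).

import numpy as np

# Group-level summaries from Table 1; the comparison group's median (0.2 ppm) is used as its exposure
exposure = np.array([0.2, 1.2, 5.1, 12.6])
ncv_mean = np.array([45.3, 43.7, 43.4, 41.8])

# Ordinary least-squares line: NCV = intercept + slope * exposure
slope, intercept = np.polyfit(exposure, ncv_mean, 1)

bmr_level = 0.90 * intercept                     # 10% decrease from the modeled control mean
central_estimate = (bmr_level - intercept) / slope
print(round(central_estimate, 1), "ppm (central estimate only; the BMC uses the lower confidence limit)")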
Explanation of Figure 4:
       The top horizontal solid line is the mean adjusted nerve conduction velocity (NCV) for the "typical" subject (34 years old, 70 inches tall, 177 pounds). The horizontal dashed line and lower solid line show the BMRs of 5 and 10% reduction in peroneal NCV. The dashed sloping line shows the 95% confidence limit on the linear model using cumulative exposure. The vertical line connects the intersection of the lower confidence limit and the BMR of a 10% reduction to the x-axis, showing the BMC. The solid sloping line shows the lower confidence limit on the regression model using the mean exposure index, for comparison.
[Figure 4. Carbon Disulfide Modeling (see text for explanation); x-axis: Mean Exposure Index (ppm).]
Example #2: 1,1,1,2-Tetrafluoroethane (HFC-134a)

Summary of Main Points Illustrated

Setting the BMR based on default and limit of detection approaches for dichotomous data.

Modeling a response with a high background incidence.

Comparison of extra vs. additional risk.

Summary of Data

Study: Chronic rat inhalation.
Endpoints modeled: Leydig cell hyperplasia, incidence.
Endpoint used: same.

       The BMC for HFC-134a was based on the BMC calculated for testicular effects (Table 1) in a chronic study in rats (Hext and Parr-Dobrzanski, 1993; Collins et al., 1995).

Table 1. Data used for Benchmark Dose Analysis of HFC-134a

Concentration (ppm)        N       Incidence
0                          85      27
2,500                      79      25
10,000                     85      31 (NOAEL)
50,000                     85      40 (LOAEL)
Selection of BMR

Default BMR - The benchmark response for Leydig cell hyperplasia was defined as a 10% extra risk of response.

Selection of BMR based on limit of detection - Use of the limit of detection approach significantly changes the BMC for this chemical because of the high background and the very shallow dose-response curve.

       The determination of a "typical" design that includes evaluation of Leydig cell hyperplasia in rats, and of control incidences, is a first step in determining the BMR using the limit of detection approach. Leydig cell hyperplasia would be observed during routine histopathology of the testes
in a subchronic or chronic animal study. The number of animals typically used in such studies is 10-20 for a subchronic study and 50-100 for a chronic study. To determine the appropriate BMR, a "typical" sample size could be determined to be, e.g., n = 25. Alternatively, the sample size from the study could be used, n = 85. There is a provision for the latter approach in the guidelines if the endpoint is not typically examined and there is no historical information on a typical design and background incidence level.
       Secondly, the "typical" background rates must be determined. If an additional risk formulation is used, background is relatively unimportant, but the sample size makes a big difference. Again, the choice is between the use of a "typical" background rate for the endpoint and the use of the background in the study. Because a review of the incidence of Leydig cell hyperplasia has not been done to determine the typical incidence and the variability among studies, the background observed in the study (approximately 30%) was used in the determination of the BMR.
       Finally, the required power level to be used must be determined. For the purpose of this example, power levels of 50% and 80% will be used. The BMRs for various sample sizes, backgrounds, and power levels are shown in Table 2.
Table 2. Benchmark response levels for various sample size, background, and power.

                             Extra Risk                      Additional Risk
 SS    Background      50% Power    80% Power          50% Power    80% Power
 25      .20             0.33         0.47               0.26         0.37
 25      .30             0.39         0.55               0.27         0.38
 25      .40             0.46         0.63               0.28         0.38
 85      .20             0.15         0.23               0.12         0.18
 85      .30             0.19         0.28               0.13         0.20
 85      .40             0.23         0.34               0.14         0.20
Calculation of the Benchmark Concentration

A. Use of Default BMR
Model: Polynomial and Weibull models (with and without threshold term) agreed; commercially available software was used.

       The RfC for HFC-134a was based on the BMC calculated for testicular effects (Table 1) in a chronic study in rats (Hext and Parr-Dobrzanski, 1993; Collins et al., 1995). Because of the high background, a substantial difference in the estimates of benchmark concentration occurs for
extra risk vs. additional risk models (11,000 vs. 15,600 ppm), based on the default use of a BMR of 0.1 (Table 3). An extra risk model was selected as most appropriate, based on the conservative assumption of independence of the mechanisms causing the background response and the treatment-related response. Dose-response models for dichotomous data included a polynomial (multistage) and a Weibull form, each either including or not including a model parameter for a background intercept (sometimes referred to as a threshold parameter). Using any of these four model forms, an excellent model fit was obtained, and the BMC estimates were the same after rounding to two significant figures.

Table 3. Benchmark Concentrations for Leydig Cell Hyperplasia for HFC-134a based on the default BMR of 0.1 (polynomial and Weibull models, no threshold).

Endpoint/Model                              BMR (%)     MLE (ppm)     BMC (ppm)
Polynomial, no threshold, extra               10          19300         11000
Polynomial, no threshold, additional          10          28600         15600
Weibull, no threshold, extra                  10          19300         11000

B. Use of Limit of Detection

       The limit of detection approach used considerably higher BMR levels (0.30 compared to the default of 0.10) because of the high control response level and, consequently, the greater treatment effect required to be statistically significant. BMCs resulting from the BMRs based on the limit of detection, for a background rate of 30%, power of 50% or 80%, and extra or additional risk, are shown in Table 4.

Table 4. Benchmark Concentrations and MLE Estimates for Various Sample Size, Background, and Power. The polynomial model with no threshold term was applied.

                               50% Power                  80% Power
 SS    Background          MLE        BMC            MLE        BMC
                                    Extra Risk
 25       .30             95600      52500          154000      62400
 85       .30             40800      22600           63500      35200
                                  Additional Risk
 25       .30             97500      51000          158000      62600
 85       .30             40900      21600           67100      35300
       The BMC is 5-6 times higher when the BMR is calculated based on the limit of detection approach for n=25, and 2-3 times higher for n=85, and, for a sample size of 25, it is higher than the highest exposure level in the study (see Figure 5, based on extra risk). This difference in the BMCs would be translated directly into the derivation of the RfC or other health exposure limit, because there would be no difference in the application of uncertainty factors for the two approaches to selecting the BMR. It is also noteworthy that the limit of detection approach to determining the BMR eliminates the difference between the BMCs for additional vs. extra risk, because the BMRs corresponding to extra and additional risk are matched to the appropriate models.
[Figure 5. HFC-134a dose-response modeling based on extra risk (see text).]
Example #3: Boron

Summary of Main Points Illustrated

Combination of two experiments
Conversion of continuous data to dichotomous form

Summary of Data Sets

Studies: Two prenatal developmental toxicity studies with boric acid administered in the diet
(Heindel et al., 1992; RTI, 1994).
Endpoints modeled: fetal weight (as continuous and dichotomous), malformations, variations.
Endpoint used: fetal weight (continuous) because changes were seen at the lowest doses.


Table 1. Data used for Benchmark Dose Analysis for Boron - continuous data

Dose*               Fetal Weight         SD          N (fetuses)
(mg/kg/day)

Study A
    0                   3.7             .32             218
   78                   3.45            .25             217
  163                   3.21            .26             205
  330                   2.34            .25             191

Study B
    0                   3.61            .24             211
   19                   3.56            .23             226
   36                   3.53            .28             220
   55                   3.50            .38             221
   76                   3.38            .26             236
  143                   3.16            .31             209

*The doses in the two studies are slightly different because they are based on food consumption
measurements in the two studies.



Table 2. Data used for Benchmark Dose Analysis for Boron - dichotomized continuous data

Dose                    N           Incidence
(mg/kg/day)

Study A
    0                  431              21
   78                  432              51
  163                  408             152
  330                  386             384

Study B
    0                  416              21
   19                  460              24
   36                  437              42
   55                  437              40
   76                  471              70
  143                  411             162
Selection of the BMR

       Benchmark response levels of 5% additional risk for dichotomous data, and the control SD
divided by 2 for continuous data, were selected. The rationale for these choices is provided by
Allen et al. (1996).
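
       As an illustration of how these continuous BMR definitions translate into a benchmark
dose, the sketch below fits a continuous power model to the Study A group means from Table 1
by weighted least squares and solves for the doses at which the predicted mean drops by 5% of
the control mean or by half the control SD. This is a simplified sketch, not the likelihood-based
fitting used by Allen et al. (1996), so its estimates will not necessarily reproduce the values
reported in Table 3 below, and the reported BMDs are lower confidence limits on such estimates.

# Sketch: continuous power model m(d) = a - b*d**k fit to the Study A means from Table 1
# (weighted by the inverse squared standard errors), then solved for the dose at which the
# predicted mean drops by 5% of the control mean or by half the control SD. This is a
# simplified least-squares version of the analysis described in the text.
import numpy as np
from scipy.optimize import curve_fit

dose = np.array([0.0, 78.0, 163.0, 330.0])       # mg/kg/day (Study A)
mean = np.array([3.7, 3.45, 3.21, 2.34])         # mean fetal weight
sd   = np.array([0.32, 0.25, 0.26, 0.25])
n    = np.array([218, 217, 205, 191])
se   = sd / np.sqrt(n)

def power_model(d, a, b, k):
    return a - b * np.power(d, k)

(a, b, k), _ = curve_fit(power_model, dose, mean, p0=[3.7, 0.001, 1.0],
                         sigma=se, bounds=([0.0, 0.0, 0.1], [10.0, 1.0, 5.0]))

def mle_bmd_for_decrease(delta):
    # Solve a - m(BMD) = delta, i.e. b * BMD**k = delta.
    return (delta / b) ** (1.0 / k)

print("5% decrease:   ", mle_bmd_for_decrease(0.05 * a))
print("control SD / 2:", mle_bmd_for_decrease(0.5 * sd[0]))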
Calculation of the Benchmark Dose

Model: continuous power model for continuous data; log-logistic (fetuses nested within litters) for
dichotomous data, additional risk.

       The BMD for boron was derived by Allen et al. (1996). In this analysis, BMDs for several
endpoints from two developmental studies in rats were calculated and compared. In addition, the
analysis combined results from the two studies because they were done by the same laboratory,
and because the second study was done as a follow-up intended specifically to extend the dose
range from the first study in order to clearly define a NOAEL. There are a few general issues
illustrated by this analysis.

1. Combination of studies - Dose-response functions were fit to the individual data sets and these
were compared using a likelihood ratio test. If this test indicated that the responses from the two
studies were consistent with a single dose-response function, the model was fit to the combined
data.
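
       A minimal sketch of this kind of likelihood-ratio comparison is shown below. It assumes
the separate and pooled fits have already been obtained by maximum likelihood (for example,
with code like the earlier Weibull sketch); the function name, the log-likelihood values, and the
significance level are placeholders for illustration only.

# Sketch: likelihood-ratio check of whether two studies can share one dose-response function.
# Fitting the studies separately uses n_params more parameters than the pooled fit, so the
# statistic is referred to a chi-square distribution with that many degrees of freedom.
from scipy.stats import chi2

def combine_studies_ok(ll_study_a, ll_study_b, ll_combined, n_params, alpha=0.05):
    # ll_study_a, ll_study_b: maximized log-likelihoods from fitting each study alone
    # ll_combined:            maximized log-likelihood from fitting the pooled data
    # n_params:               number of parameters in one dose-response model
    lr_stat = 2.0 * ((ll_study_a + ll_study_b) - ll_combined)
    p_value = chi2.sf(lr_stat, df=n_params)
    return p_value > alpha          # fail to reject -> pooling the studies is supported

# Hypothetical log-likelihood values, for illustration only:
print(combine_studies_ok(ll_study_a=-210.4, ll_study_b=-305.7, ll_combined=-518.9, n_params=3))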


2. Fetal weights were analyzed in two ways. In the first, the average litter weights were modeled
as a continuous variable (Figure 6). The BMR was defined as a 5% decrease in fetal weight from
the control mean or as a decrease equal to the control standard deviation divided by 2. In the
second approach, the individual litter weights were converted to dichotomous data using the 5th
percentile in the corresponding control group as the definition of adversely affected pups. The
model was then fit to the litter data expressed as probability of adverse response (Figure 7).
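
       A minimal sketch of the dichotomization step is shown below. Because the individual
weight records are not reproduced in this document, the sketch simulates hypothetical weights
using the Study A control and low-dose means and SDs from Table 1.

# Sketch: score individual weights as "affected" if they fall below the 5th percentile of
# the concurrent control group. The simulated weights are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
control_weights = rng.normal(3.7, 0.32, size=218)    # hypothetical control group
treated_weights = rng.normal(3.45, 0.25, size=217)   # hypothetical 78 mg/kg/day group

cutoff = np.percentile(control_weights, 5)            # control 5th percentile
incidence = int(np.sum(treated_weights < cutoff))     # count of adversely affected individuals
print(cutoff, incidence, incidence / treated_weights.size)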

3. Combination of data on missing or malformed ribs - The analysis explored several approaches
to weighting and combining data on alterations in thoracic rib XIII and in lumbar ribs.
Selection of the most appropriate BMD:

       The fetal weight analysis using the continuous power model was recommended by Allen
et al. (1996) for the following reasons:

       - The BMD was lower than those calculated for rib effects or other malformations.
       - The analysis of fetal weight effects converted to dichotomous data showed that the
         dose-response pattern was not the same in the two studies, so the studies could not be
         combined for this analysis.
       - It is preferable to combine data from these two studies, if possible.
Table 3. Benchmark Dose Results for Boron

Analysis                                             MLE          BMD

Fetal weight - continuous data - BMR = 5% decrease
       Study A                                        80           56
       Study B                                        68           47
       Combined                                       78           59

Fetal weight - continuous data - BMR = decrease to control SD/2
       Study A                                        73           48
       Study B                                        49           31
       Combined                                       65           59

Fetal weight - dichotomized continuous data
       Study A                                       129          115
       Study B                                        47           31
       Combined                                       --           --
       The BMDs from the combined continuous data using either a 5% decrease or a decrease
to control SD/2 were the same, and were similar to the BMDs calculated for Studies A and B
alone, especially those based on a 5% decrease from the control mean. These results indicate that
a BMD could be calculated without running the second study to define the NOAEL.
[Figures 6 and 7 (fits of the continuous and dichotomized fetal weight models referenced above)
appeared here; the graphics could not be recovered from this text extraction.]
Example #4: 1,3-Butadiene

Summary of Main Points Illustrated

BMC analysis with no NOAEL
What to do when the model doesn't fit (drop the high dose level)

Summary of Data

Study: Chronic inhalation exposure study (NTP, 1991) in B6C3F1 mice
Endpoints modeled: Ovarian, testicular and uterine atrophy
Endpoint used: Ovarian atrophy

       Data from a chronic inhalation study were used to evaluate the noncancer health effects.
Ovarian, testicular and uterine atrophy all showed effects related to exposure level, but ovarian
atrophy was seen at all exposure levels, with no NOAEL. The data used are shown in Table 1.


Table 1. Data used in the benchmark concentration analysis for 1,3-butadiene.

Exposure Level         No. Examined        % Affected
    0.00                    49                 8.16
    6.25 ppm                49                38.78
   20.0 ppm                 48                66.67
   62.5 ppm                 50                84.00
  200.0 ppm                 50                86.00
  625.0 ppm                 79                87.34
Selection of the BMR Level

       The BMR was chosen as the default of 10% extra risk above controls (Kimmel et al.,
1996). For comparison with the default approach, the limit of detection was determined based on
the background rate (10%) and the number of animals (50) in this study. Table 2 shows the BMR
calculated on this basis for power levels of 50% and 80%.
Table 2. Determination of the BMR Using the Limit of Detection Approach

Background        N         Power        Extra Risk
   .10            50         50%            .13
   .10            50         80%            .22
Calculation of the Benchmark Concentration

Model: log-logistic

       The ovarian atrophy data could not be fit adequately using the quantal Weibull model, so
a log-logistic model was used. The model produced a poor fit when all 6 exposure groups were
included, due to leveling off of the response at exposures above 62.5 ppm. Although the model
was capable of fitting the first 5 exposure levels, the best fit of the data based on a graphical
display was obtained with exposure groups 1-4 (see Figure 8). However, the BMCs based on the
default BMR of 10% were similar for exposure groups 1-5 and 1-4, and for additional and extra
risk (Table 3). The BMCs based on the limit of detection for each power level also were similar
for exposure groups 1-5 and 1-4, indicating that the model was giving a similar fit to the data in
the range of the BMR in both cases.
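
       The following is a minimal sketch of this refit: a log-logistic model with a background
term is fit by maximum likelihood to exposure groups 1-4, with incidences back-calculated from
the percentages in Table 1, and the fitted curve is solved for the concentration producing 10%
extra risk. The parameterization and starting values are assumptions made for illustration; the
value printed is the maximum-likelihood estimate, whereas the BMCs reported in Table 3 are
lower confidence limits.

# Sketch: log-logistic model P(d) = g + (1 - g) / (1 + exp(-(a + b*ln d))), with P(0) = g,
# fit to exposure groups 1-4 of the 1,3-butadiene data and solved for the concentration
# giving 10% extra risk. Incidences are back-calculated from the Table 1 percentages.
import numpy as np
from scipy.optimize import minimize

conc     = np.array([0.0, 6.25, 20.0, 62.5])                      # ppm, groups 1-4
n_mice   = np.array([49, 49, 48, 50])
affected = np.round(np.array([8.16, 38.78, 66.67, 84.00]) / 100.0 * n_mice)

def prob(params, d):
    g, a, b = params
    p = np.full_like(d, g, dtype=float)
    pos = d > 0
    p[pos] = g + (1.0 - g) / (1.0 + np.exp(-(a + b * np.log(d[pos]))))
    return p

def neg_log_lik(params):
    g, a, b = params
    if not (0.0 <= g < 1.0 and b > 0.0):
        return np.inf
    p = np.clip(prob(params, conc), 1e-10, 1.0 - 1e-10)
    return -np.sum(affected * np.log(p) + (n_mice - affected) * np.log(1.0 - p))

fit = minimize(neg_log_lik, x0=[0.08, -2.5, 1.0], method="Nelder-Mead")
g, a, b = fit.x

# 10% extra risk: (P(d) - g) / (1 - g) = 0.10  =>  1 / (1 + exp(-(a + b*ln d))) = 0.10
bmr = 0.10
mle_bmc10 = np.exp((np.log(bmr / (1.0 - bmr)) - a) / b)
print(mle_bmc10)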
Table 3. BMCs for ovarian atrophy after 1,3-butadiene chronic inhalation exposure

LOAEL        Exposure Groups      BMC10        BMC (based on limit of detection)
             Modeled                            50% Power         80% Power

                               Additional Risk
6.25 ppm          1-5            1.23
                  1-4            1.03

                                  Extra Risk
                  1-5            1.15              1.54              2.91
                  1-4            0.96              1.29              2.43
[Figure 8. 1,3-Butadiene Modeling. Two panels: ovarian atrophy, groups 1-5 (concentration in
ppm, approximately 0-200) and ovarian atrophy, groups 1-4 (concentration in ppm, approximately
0-70); the plotted curves could not be recovered from this text extraction.]
APPENDIX E

LIST OF MODELS PLANNED TO BE INCLUDED IN FIRST RELEASE OF THE EPA SOFTWARE

BINARY REGRESSION MODELS
       Probit Model
       Weibull Model
       Logistic Model
       Gamma Multi-Hit Model
       Quantal Linear Model (One Hit)
       Quantal Quadratic Model
       Multistage Model (Quantal Polynomial)

NESTED BINARY REGRESSION MODELS
       Logistic Model
       Rai and Van Ryzin Model
       NCTR Model

CONTINUOUS REGRESSION MODELS
       Linear Model
       Polynomial Model
       Power Model