DRAFT EPA/630/R-00/001
DO NOT CITE OR QUOTE October 2000
External Review Draft
Benchmark Dose Technical Guidance Document
NOTICE
THIS DOCUMENT IS A PRELIMINARY DRAFT. It has not been formally released by the
U.S. Environmental Protection Agency and should not at this stage be construed to represent
Agency policy. It is being circulated for comment on its technical accuracy and policy
implications.
Risk Assessment Forum
U.S. Environmental Protection Agency
Washington, DC 20460
-------
DISCLAIMER
This document is an external draft for review purposes only and does not constitute U.S.
Environmental Protection Agency policy. Mention of trade names or commercial products does
not constitute endorsement or recommendation for use.
-------
EXECUTIVE SUMMARY v
I. INTRODUCTION 1
A. Purpose of This Guidance Document 1
B. Background 2
C. A Brief Review of Literature Relating to Benchmark Dose %
1. Earlier uses of benchmark modeling in dose-response assessment 8
2. Properties of the Benchmark Dose S
3. Approaches to BMD Computation j_0
4. General Discussions of Standards for the Benchmark Dose 12
II. BENCHMARK DOSE GUIDANCE .13
A. Data Evaluation and Endpoint Selection J_3
1. Data Evaluation 14
a. Design L4
b. Aspects of Data Reporting 14
2. Selection of Studies to be Modeled J6
3. Selection of Endpoints to be Modeled J_6
4. Minimum Data Set for Calculating a BMD j_7
5. Combining Data for a BMD Calculation 18
B. Criteria for Selecting the Benchmark Response Level (BMR) 18
C. Modeling the Data 2J.
1. Introduction 2J_
2. Background for Model Selection 22
a. Selecting the Model 22
i. Type of endpoint 23_
ii. Experimental design 25
iii. Constraints and covariates 25
b. Model Fitting 26
-------
c. Assessing How Well the Model Describes the Data 28
d. Comparing Models 29
e. Using Confidence Limits to Get a BMDL 30
f. Selecting the model to use for POD computation 34
D. Reporting Requirements 35
E. Decision Tree 36
REFERENCES 38
EXAMPLES 53
1. Introduction 53_
2. Quantal Data: Selecting a Model 53
3. Continuous Data: Getting a Good-Fitting Model 6j_
4. Cancer Bioassay Data: Modeling POD for Cancer Slope Factor 68
5. Developmental Toxicity Example 73_
6. Human Data 80
GLOSSARY 81
-------
1 EXECUTIVE SUMMARY
2
3 The US EPA conducts risk assessments for an array of health effects that may result from
4 exposure to environmental agents, and that require an analysis of the relationship between
5 exposure and health-related outcomes. The dose-response assessment is essentially a two-step
6 process, the first being the definition of a point of departure (POD), and the second extrapolation
7 from the POD to low environmentally-relevant exposure levels. The benchmark dose (BMD)
8 approach provides a more quantitative alternative to the first step in the dose-response
9 assessment than the current NOAEL/LOAEL process for noncancer health effects, and is similar
10 to that for determining the POD proposed for cancer endpoints (EPA, 1996). As the Agency
11 moves toward harmonization of approaches for cancer and noncancer risk assessment, the
12 dichotomy between cancer and noncancer health effects is being replaced by consideration of
13 mode of action and whether the effects of concern are likely to be linear or nonlinear at low
14 doses. Thus, the purpose of this document is to provide guidance for the Agency and the outside
15 community on the application of the BMD approach in determining the POD for all types of
16 health effects data, whether a linear or nonlinear low dose extrapolation is used.
17 This guidance document discusses the computation of BMDs and benchmark
18 concentrations (BMCs), their lower confidence limits, data requirements, dose-response analysis,
19 and reporting requirements that are specific to the use of BMDs or BMCs. The following
20 convention for terminology has been adopted in this document: BMD is used generically to refer
21 to the benchmark dose approach; in the more specific cases, BMD and BMC refer to the central
22 estimates, for example the EDx or ECx for dichotomous endpoints (with x referring to some
23 level of response above background, e.g., 5% or 10%). BMDL or BMCL refers to the
24 corresponding lower limit of a one-sided 95% confidence interval on the BMD or BMC,
25 respectively. This is consistent with the terminology introduced by Crump (1995) and with that
26 used in the EPA's BMD software (BMDS) which is freely available on the Internet at
27 http://www.epa.gov/ncea/bmds.htm. This terminology is a change, however, from that used in
28 previous Agency documents (e.g., EPA, 1995), but has been adopted because it more clearly
29 conveys the fact that the BMDL refers to the lower confidence limit on the dose that would result
-------
1 in the required response.
2 As indicated above, the BMD approach is an alternative to the NOAEL/LOAEL approach
3 that has been used for many years in dose-response assessment. The development of this
4 approach has been pursued because of recognized limitations in the NOAEL/LOAEL approach.
5 However, it is likely that there will continue to be endpoints that are not amenable to modeling
6 and for which a NOAEL/LOAEL approach must be used. In some cases, there may be a
7 combination of BMDs and NOAELs to be considered in the assessment of a particular agent, and
8 the most appropriate value to use for dose-response assessment must be made by the risk assessor
9 on the basis of scientific judgment and the modeling results.
10 This document addresses a number of issues that must be resolved in order to apply the
11 BMD approach for dose-response assessment in a consistent manner:
12 1. Determination of appropriate studies and endpoints on which to base BMD calculations;
13 2. Selection of the benchmark response (BMR) value;
14 3. Choice of the model to use in computing the BMD;
15 4. Details surrounding computation of the confidence limit for the BMD (BMDL); and
16 5. Reporting requirements for BMD and BMDL computation.
17 Determination of appropriate studies and endpoints on which to base BMD calculations.
18 Following the hazard characterization and selection of appropriate endpoints to use for the dose-
19 response assessment, the studies appropriate for modeling and BMD analysis can be evaluated.
20 All studies that show a graded monotonic response with dose likely will be useful for BMD
21 analysis, and the minimum data set for calculating a BMD should at least show a significant
22 dose-related trend in the selected endpoint(s). It is preferable to have studies with one or more
23 doses near the level of the BMR to give a better estimate of the BMD, and thus, a shorter
24 confidence interval. Studies in which all the dose levels show changes compared with control
25 .values (i.e., there is no NOAEL) are readily useable in BMD analyses, unless the lowest response
26 level is much higher than the BMR.
27 There are at least three types of endpoint data: dichotomous (quantal), continuous, and
28 categorical. This guidance provides definitions of these three types of data, and what information
29 is needed in order to model the responses. For example, a dichotomous response may be
VI
-------
1 reported as either the presence or absence of an effect, a continuous response may be reported as
2 an actual measurement, or as a contrast (absolute change from control or relative change from
3 control), hi the case of continuous data, when individual data are not available, the number of
4 subjects, mean of the response variable, and a measure of response variability (e.g., standard
5 deviation (SD), standard error (SE), or variance) are needed for each group. For categorical data,
6 the responses in the treatment groups are often characterized in terms of the severity of effect
7 (e.g., mild, moderate, or severe histological change), hi general, endpoints that have been judged
8 by the risk assessor to be appropriate and relevant to the exposure should be modeled if their
9 LOAEL is up to 10-fold above the lowest LOAEL. This will help ensure that no endpoints with
10 the potential to have the lowest BMDL are excluded from the analysis on the basis of the value of
11 the LOAEL or NOAEL. Selected endpoints from different studies that are likely to be used in
12 the dose-response assessment should all be modeled, especially if different uncertainty factors
13 may be used for different studies and endpoints. As indicated above, the selection of the most
14 appropriate BMDs and/or NOAELs (if some endpoints cannot be modeled) to use for
15 determination of the POD must be made by the risk assessor using scientific judgement and
16 principles of risk assessment, as well as the results of the modeling process.
17 Selection of the benchmark response (BMR) value. The calculation of a BMD is directly
18 determined by the selection of the BMR. This guidance provides default criteria to be used for
19 selecting the BMR in the case of quantal data and continuous data. For quantal data, an excess
20 risk of 10% is the default BMR, since the 10% response is at or near the limit of sensitivity in
21 most cancer bioassays and in some noncancer bioassays as well. If a study has greater than usual
22 sensitivity, then a lower BMR can be used, although the ED10 and LED10 should always be
23 presented for comparison purposes.
24 For continuous data, if there is an accepted level of change in the endpoint that is
25 considered to be biologically significant then that amount of change is the BMR. Otherwise, if
26 individual data are available and a decision can be made about what individual levels should be
27 considered adverse, the data can be "dichotomized" based on that cutoff value, and the BMR set
28 as above for quantal data. Alternatively, in the absence of any other idea of what level of
29 response to consider adverse, a change in the mean equal to one control SD from the control
VII
-------
1 mean can be used. The control SD can be computed including historical control data, but the'
2 control mean must be from data concurrent with the treatments being considered. Regardless of
3 which method of defining the BMR is used for a continuous dataset, the effective dose
4 corresponding to one control SD from the control mean response, as would be calculated for the
5 latter definition, should always be presented for comparison purposes.
6 Choice of the model to use in computing the BMD. The goal of the mathematical
7 modeling in BMD computation is to fit a model to dose-response data that describes the data set,
8 especially at the lower end of the observable dose-response range. In practice, this involves first
9 selecting a family or families of models for further consideration, based on characteristics of the
10 data and experimental design, and fitting the models using one of a few established methods.
11 Subsequently, a lower bound on dose is calculated at the BMR. The guidance document
12 introduces the topic of dose-response modeling and provides information on model selection for
13 different types of data. In addition, model fitting, determining goodness-of-fit, and comparing
14 models to decide which one to use for obtaining the POD are discussed. The guidance
15 recommends that oc=0.1 be used to compute the critical value for goodness of fit, instead of the
16 more conventional values of 0.05 or 0.01, and that a graphical display of the model fit be
17 examined as well. For comparison of models and selection of the model to use for BMDL
18 computation, the use of Akaike's Information Criterion (AIC) is recommended.
19 Computation of the confidence limit for the BMD (BMDL). The guidance document
20 discusses the computation of the confidence limit for the BMD, the fact that the method by which
21 the confidence limit is obtained is typically related to the data type, and the manner in which the
22 BMD is estimated from the model. Details for approaches to CI computation specific to
23 particular data types (quantal, clustered, continuous, multiple outcomes) are provided in the
24 document.
25 Reporting requirements from the BMD/BMDL calculations. The guidance document lists
26 a number of reporting requirements for the BMD and BMDL. These are considered important
27 for the risk assessor to judge whether or not the choice of studies and endpoints for modeling has
28 been done appropriately and whether the most appropriate BMD and BMDL have been selected
29 as the POD for low dose extrapolation.
via
-------
1 In summary, the guidance document provides a decision tree that discusses step-by-step
2 the process to be used in evaluating studies and endpoint types that are appropriate for modeling,
3 selecting the BMR level, model fitting and BMD computation, judging the fit of the model, and
4 the calculation of the BMDL. Finally, the document provides several examples of BMD and
5 BMDL derivation using the EPA BMDS software.
IX
-------
1 I. INTRODUCTION
2
3
4 A. Purpose of This Guidance Document
5
6 The purpose of this document is to provide guidance for the Agency and the outside
7 community on the application of the benchmark dose approach to determining the point of
8 departure (POD) for linear or nonlinear extrapolation of health effects data. This guidance
9 discusses computation of benchmark doses and benchmark concentrations (BMDs and BMCs)
10 and their lower confidence limits, data requirements, dose-response analysis, and reporting
11 requirements. The document provides guidance based on today's knowledge and understanding,
12 and on experience gained in using this approach. The Agency is actively applying this
13 methodology and evaluating the outcomes for the purpose of gaining experience in using it with
14 a variety of endpoints. This document is intended to be updated as new information becomes
15 available that would suggest approaches and default options alternative or additional to those
16 indicated here and should not be viewed as precluding additional research on modified or
17 alternative approaches that will improve quantitative risk assessment. In fact, the use of
18 improved scientific understanding and development of more mechanistically-based approaches to
19 dose-response modeling is strongly encouraged by the Agency.
20 Benchmark dose modeling is a highly technical exercise and this guidance is a technical
21 document generally targeted at readers with sufficient background in this area. The document is
22 not intended as a primer on modeling or risk assessment. The availability of software to facilitate
23 the analysis can make the modeling appear deceptively simple, but often interpretation of the
24 results is not trivial. It is recommended that BMD modeling be performed by or in collaboration
25 with a statistician or someone familiar with the potential pitfalls of this type of analysis.
26 Similarly, this document is not intended as a primer on toxicology; the procedures described
27 herein do not replace the expert judgements of toxicologists and others who address the hazard
28 characterization issues in risk assessment. Expert judgements on study quality, toxicological
29 significance of observed effects, etc., are required independent of the use of BMD analysis and
1
-------
1 are beyond the scope of this document. It is likewise beyond the scope of this document to
2 provide guidance for RfC, RfD, or cancer potency computation, which are also more general risk
3 assessment issues.
4 Since the methods for BMD computation require appropriate software, another purpose
5 of this document is to provide enough information about preferred computational algorithms to
6 allow users to make an informed choice in the selection of that software. The document does not
7 advocate use of any particular software package, although it is recommended that software with
8 well documented algorithms, such as the Agency's BMDS package, be used. Nor is this
9 guidance intended to document any particular software package, although it will present
10 examples for illustrative purposes that use the Agency's BMDS package. It is also expected that
11 this guidance will inform the design of studies for the computation of BMDs and dose-response
12 analysis, though this will not be covered explicitly.
13
14 B. Background
15
16 The US EPA conducts risk assessments for an array of health effects that may result from
17 exposure to environmental agents. The process of risk assessment, based on the National
18 Research Council paradigm (NRC, 1983), has several steps: hazard characterization, dose-
19 response assessment, exposure assessment, and risk characterization. Hazard characterization
20 includes a thorough evaluation of all the available data to identify and characterize potential
21 health hazards. Dose-response assessment involves an analysis of the relationship between
22 exposure to the chemical and health-related outcomes, and historically has been done very
23 differently for cancer and noncancer health effects because of perceived differences between the
24 mechanistic underpinnings of cancer and other toxic effects. As our understanding of the
25 underlying biology of toxic effects has grown, however, the apparent differences between cancer
26 and noncancer effects have lessened, to the point where it seems reasonable to develop
27 quantitative methods based on similar considerations for all types of health effects, and to make
28 approaches to dose-response assessment as consistent across health endpoints as our current
29 mechanistic understanding allows. This section provides an overview of EPA's approaches to
-------
1 dose-response assessment for cancer and non-cancer effects, and of the basis for developing more
2 broadly applicable quantitative methods.
3 The primary distinction between characterizing risks of cancer and noncancer effects has
4 been the expectation that cancer induction could result from even a single gene mutation in a
5 single cell, while noncancer effects were generally assumed to occur only if a minimum, but
6 possibly large, amount of damage had occurred. The practice for assessing dose-response for
7 cancer effects has been to fit a statistical model (linearized multistage procedure) to tumor
8 incidence data, and to assume low dose linearity to extrapolate risk at lower doses (USEPA,
9 1986). The modeling addresses variability in the data through an upper 95% bound on the slope
10 of the relationship between exposure and risk at very low risk levels, typically 10"5 to 10"6.
11 In contrast, the standard practice for the dose-response analysis of health effects other
12 than cancer has been to estimate the minimum dose not to be exceeded, by identifying a lowest-
13 observed-adverse-effect-level (LOAEL) and a no-observed-adverse-effect-level (NOAEL) from
14 an appropriate study. The LOAEL is the lowest dose for a given chemical at which adverse
15 effects have been detected, while the NOAEL is the highest dose at which no adverse effects
16 have been detected. The NOAEL (or LOAEL, if a NOAEL is not present) is adjusted downward
17 by uncertainty factors intended to account for limitations and uncertainties in the available data,
18 to arrive at an exposure that is likely to be without an appreciable risk of deleterious effects in
19 humans, that is, the reference dose (RfD) or reference concentration (RfC). Unlike cancer dose-
20 response modeling, variability in the observed responses is not addressed.
21 It has been tempting to use the dose level below which no effects are observed in a study
22 (sometimes called a "practical threshold") as an important point for describing a dose-response
23 curve because of a presumed relationship between such a practical threshold and true thresholds
24 (i.e., true no effect levels) in the dose-response. In fact, the practical threshold is really a
25 consequence of the fact that any finite study has an inherent limit of detection, and is of little
26 practical utility in describing toxicological dose-responses. In other words, the NOAEL does not
27 represent a biological threshold and does not imply that lower exposure levels are without risk.
28 Specific limitations of the NOAEL/LOAEL approach are well known and have been discussed
29 extensively (Crump, 1984; Gaylor, 1983; Kimmel and Gaylor, 1988; Leisenring and Ryan, 1992;
-------
1 EPA, 1986b, 1988a,b, 1989c; 1995c):
2 The NOAEL/LOAEL is highly dependent on dose selection since the NOAEL/LOAEL is
3 limited to being one of the doses included in a study.
4 The NOAEL/LOAEL is highly dependent on sample size. The ability of a bioassay to
5 distinguish a treatment response from a control response decreases as sample size
6 decreases1, so that the NOAEL for a compound (and thus the POD) will tend to be higher
7 in studies with smaller numbers of animals per dose group.
8 More generally, the NOAEL/LOAEL approach does not account for the uncertainty in the
9 estimate of the dose-response which is due to the characteristics of the study design.
10 NOAELs/LOAELs do not correspond to consistent response levels for comparisons
11 across studies/chemicals/endpoints and for use as PODs for the derivation of RfCs.
12 The slope of the dose-response curve is not taken into account in the selection of a
13 NOAEL or LOAEL, and is not usually considered unless the slope is very steep or very
14 shallow.
15 A LOAEL cannot be used to derive a NOAEL when a NOAEL does not exist in a study.
16 Instead, a tenfold uncertainty factor has been routinely applied to the LOAEL to account
17 for this limitation.
18 While the NOAEL has typically been interpreted as a threshold (no-effect level),
19 simulation studies (i.e, Leisenring and Ryan, 1992) and reanalyses of developmental
20 toxicity bioassay data (Allen et al, 1994a) have demonstrated that the rate of response
21 above control at doses fitting the criteria for NOAELs, for a range of study designs, is
22 about 5-20% on average, not 0%.
23
24 In an effort to address some of the limitations of the LOAEL and NOAEL, Crump (1984)
25 proposed the benchmark dose (BMD) approach as an alternative (see section I.C. for more
'Note that for a study utilizing 6 animals per dose group, the 95% upper confidence limit (UCL)
on an observed adverse response rate of 0% is 49%. That is, NOAELs chosen on the basis of no
observed response in 6 animals could be too high a substantial proportion of the time. The 95%
UCLs for groups of 10, 20 and 50 animals are 31% , 17%, and 7%, respectively, underscoring the
importance of adequate sample sizes.
-------
1 details). Benchmark dose modeling makes no particular assumption about the nature of
2 toxicological dose-responses, other than that the change in response generally does not decrease
3 with higher doses. In particular, there is no specific assumption of the relationship between a
4 putative no-effect level in the dose-response and the benchmark dose. The goal of the BMD
5 approach is to define a starting point of departure (POD) for the computation of a reference value
6 (e-g-, the RiD or RfC) or for linear low-dose extrapolation that is more independent of study
7 design.
8 The BMD approach parallels the recommendations in EPA's Proposed Guidelines for
9 Carcinogen Risk Assessment (1996a) regarding modeling tumor data and other (non-cancer)
10 responses thought to be important precursor events in the carcinogenic process. The proposed
11 guidelines promote the understanding of an agent's mode of action in determining the dose-
12 response(s). Moreover, the dose-response extrapolation procedure follows conclusions in the
13 hazard assessment about the agent's carcinogenic mode of action. The dose-response assessment
14 under the proposed guidelines is a two-step process: (1) response data are modeled in the range
15 of empirical observation ~ modeling in the observed range is done with biologically-based, case-
16 specific, or appropriate curve-fitting models; and then (2) extrapolation below the range of
17 observation is accomplished by modeling if there are sufficient data or by a default procedure
18 (linear, nonlinear, or both). For the default procedures, a point of departure (POD) for
19 extrapolation is estimated from this modeling. The linear default is a straight-line extrapolation
20 to the background response level from the POD, while the nonlinear default approach begins at
21 the identified POD and provides either a margin of exposure (MOE) analysis or a reference value
22 such as and RfD or RfC rather than estimating the probability of effects at low doses.
23 In the case of deriving reference values for noncancer effects, the POD is adjusted
24 downward, to account for the uncertainty that is contributed by extrapolation from experimental
25 animals to humans and to account for within human variability, as well as other limitations in the
26 available data. Note that the NOAEL or LOAEL has been used as a default POD for low dose
27 estimation or extrapolation, so that the primary difference between the two approaches is in how
28 the starting point is determined. The POD for BMD modeling is the BMDL, or the lower 95%
29 bound on the dose/exposure associated with the benchmark response, typically 10% above the
-------
1 control response. Using the lower bound accounts for the uncertainty inherent in a given study,
2 and assures (with 95% confidence) that the desired BMR is not exceeded (see section II.B. for a
3 complete discussion of selecting the benchmark response).
4 As detailed above, the BMD approach is generally a preferable alternative to the
5 NOAEL/LOAEL approach. For instance, a BMDL can be estimated even when all doses in a
6 study are associated with a significant adverse response (i.e., when there is no NOAEL). Note,
7 however, that there are some instances in which the NOAEL/LOAEL is the better choice. In
8 particular, the available data may not be amenable to modeling, such as when all individuals in
9 exposed groups respond. In such a case, BMD models may fail to fit the observed data, which
10 provide very little resolution in the region of the benchmark response (usually 10%) anyway
11 (although in such a case, the LOAEL is not very informative, either). Another circumstance may
12 happen when an observed effect is so rare that it is not statistically significantly different from
13 the control response, but may be found to be biologically meaningful (e.g., an increase in a rare
14 malformation).
15 Note that the literature has used the terms BMD and BMDL in a confusing way (Crump,
16 1984,1995). There is frequent need to refer to the central estimate and the lower confidence
17 limit, as well as a more generically-defined point of departure in discussions of dose-response
18 assessment. In this document, when talking in technical detail about the process of deriving
19 benchmark doses, "BMD" or "BMC" will refer to the central estimate of the dose that is
20 expected to yield the BMR, for example, the ED10, or EC10, and "BMDL" or "BMCL" will refer
21 to the lower end of a one-sided confidence interval for that central estimate. "BMD" will be used
22 to refer to the entire process. The POD for low dose extrapolation or for setting the RfD/C will
23 be the BMDL or BMCL. To simplify further discussion in this document, we will use BMD and
24 BMDL genetically to mean oral or inhalation values, unless stated otherwise.
25 Illustrative Example: Using the BMD approach, the experimental data are modeled, and
26 the benchmark dose (BMD) in the observable range is estimated (see Fig. 1). Unlike NOAELs
27 and LOAELs, the BMD is not constrained to be one of the experimental doses, and the BMDL
28 can thus be used as a more consistent POD than either the LOAEL or NOAEL. The BMDL
29 accounts for the uncertainty in the estimate of the dose-response that is due to characteristics of
-------
50
100
Dose
150
200
Figure 1 Sample of a model fit to dichotomous data, with
BMD and BMDL indicated. The fraction of animals affected
in each dose group is indicated by diamonds. The error bars
indicate 95% confidence intervals for the fraction affected.
The BMR for this example is an Extra Risk of 10%. The
dashed curve indicates the BMDL for a range of BMRs. The
dose labeled BMDL corresponds to the lower end of a one-
sided 95% confidence interval for the BMD.
1 the experimental design. The BMD approach models all of the data in a study and the shape of
2 the dose-response curve is integral to the BMDL estimation.
3 Since the benchmark dose procedure is quite general, a number of issues need to be
4 addressed before benchmark doses can be used in a consistent manner for dose-response
5 assessment:
6 1. how to select studies on which to base BMD calculations;
7 2. selectionof endpoints on which to base BMD calculations;
8 3. selection of the benchmark response (BMR) value;
9 4. choice of the model to use in computing the BMD;
10 5. details surrounding computation of the confidence limit for the BMD (BMDL);
11 6. what information from the BMD calculation should be reported
12 These issues will be covered in some detail in the following chapters.
13
-------
1 C. A Brief Review of Literature Relating to Benchmark Dose
2 1. Earlier uses of benchmark modeling in dose-response assessment
3 Benchmark dose-like approaches to dose-response assessment are not new. The
4 procedure of Mantel and Bryan (1961) formerly was used widely for conservative low-dose
5 cancer risk assessment. Their procedure calculated an upper confidence limit on the excess
6 tumor incidence at the lowest experimental dose or an upper confidence limit on the tumor
7 incidence at the dose estimated to produce a 1% tumor incidence, essentially a benchmark dose.
8 Assuming a probit-log dose model, a conservative low-dose slope of one probit per factor of 10
9 reduction in dose below the upper limit on the benchmark dose was used to provide an upper
10 bound estimate of cancer incidence at low doses. Gaylor and Kodell (1980), Van Ryzin (1980),
11 and Farmer et al. (1982) proposed low-dose linear extrapolation to zero excess risk from the
12 upper confidence limit on the excess incidence above background of an adverse effect at the
13 lowest experimental dose or dose corresponding to a 1% incidence, again, a benchmark dose, to
14 provide an upper bound on low-dose risks for convex (sublinear) dose-response curves. Gaylor
15 (1983) and Krewski et al. (1984) compare linear extrapolation and safety factors for controlling
16 low-dose risk. Crump (1984) first coined the term "benchmark dose," although variations of a
17 benchmark dose procedure had been in use since the process developed by Mantel and Bryan
18 (1961).
19 2. Properties of the Benchmark Dose
20 A number of research efforts, many of which have dealt with reproductive and
21 developmental toxicity data, have provided extremely useful information for application of the
22 BMD approach (e.g., Alexeeff et al., 1993; Catalano et al., 1993; Chen et al., 1991; Krewski and
23 Zhu, 1994, 1995; Auton, 1994; Crump, 1995; Fowles, et al., 1999). In a series of papers by
24 Faustman et al. (1994), Allen et al. (1994a and b), and Kavlock et al. (1995), the BMD approach
25 was applied to a large database of developmental toxicity studies. In brief, the results of these
26 studies showed that when the data were expressed as the proportion of affected fetuses per litter
27 (nested dichotomous data), the NOAEL was on average 0.7 times the BMDL for a 10%
28 probability of response, and was approximately equal, on average, to the BMDL for a 5%
29 probability of response. When data were expressed as counts of dichotomous endpoints (i.e.,
-------
1 number of litters per dose group with resorptions or malformations), the NOAEL was
2 approximately 2-3 times higher than the BMDL for a 10% probability of response above control
3 values (approximately 20 animals per dose group), and 4-6 times higher than the BMDL for a 5%
4 probability of response. Expressing the data as the proportion of affected fetuses per litter is the
5 more appropriate way to analyze developmental toxicity data. However, the results of the
6 quantal data analysis also may apply to using the BMDL approach with other quantal data, and
7 suggest that the NOAEL in these cases may be at or above the 10% true response level,
8 depending on sample size and background rate.
9 Since reduced fetal weight in developmental toxicity studies often shows the lowest
10 NOAEL among the various endpoints evaluated, the application of the BMD to these continuous
11 data also was evaluated (Kavlock et al., 1995). A variety of cutoff values was explored for
12 defining an adverse level of weight reduction below control values. In some cases, data were
13 analyzed using a continuous power model, and in other cases, the data were transformed to
14 dichotomous data. Comparisons with the NOAEL showed that several cutoff values could be
15 used to give values similar to the NOAEL. These analyses suggest ways in which BMDLs may
16 be developed for continuous data from a variety of endpoints.
17 Fowles, et al. (1999) examined acute inhalation lethality data, and compared NOAELs to
18 benchmark doses corresponding to 1%, 5%, and 10% response incidences. Sample sizes
19 averaged around 10 - 20 animals per dose group. Similarly to the "quantal" parts of the results of
20 the Allen et al. (1994, a and b) studies, BMDLs based on 10% incidence corresponded
21 approximately to NOAELS. However, because the dose-response for lethality is so steep,
22 BMDLs for 5% and 1% incidences were very close to those for 10% incidence. As a result, the
23 BMDLs for a 1% incidence were on average only about 1.6 or 3.6 times smaller than a NOAEL,
24 depending on whether a log-probit or Weibull model was used.
25 A simulation study by Kavlock et al. (1996) examined various aspects of study design
26 (number of dose groups, dose spacing, dose placement, and sample size per dose group) for two
27 endpoints of developmental toxicity (incidence of malformations and reduced fetal weight). Of
28 the designs evaluated, the best results (that is, those with the shortest confidence intervals) were
29 obtained when two dose levels had response rates above the background level, one of which was
-------
1 near the BMR. In this study, there was virtually no advantage in increasing the sample size from
2 10 to 20 litters per dose group. When neither of the two dose groups with response rates above
3 the background level was near the BMR, satisfactory results were also obtained, but the BMDLs
4 tended to be lower. When only one dose level with a response rate above background was
5 present and near the BMR, reasonable results for the maximum likelihood estimate and BMDL
6 were obtained, but in this case, there were benefits of larger dose group sizes. The poorest
7 results were obtained when only a single group with an elevated response rate was present, and
8 the response rate was much greater than the BMR.
9 3. Approaches to BMD Computation
10 Many noncancer health effects are characterized by multiple endpoints that are not
11 completely independent of one another. Lefkopoulou et al. (1989), Chen et al. (1991), Ryan
12 (1992), Catalano et al. (1993), Zhu et al. (1994), Krewski and Zhu (1995), and Fung et al. (1998)
13 have worked on this issue using developmental toxicity data, and have shown that, in general, the
14 BMDL derived from a multinomial modeling approach is lower than that for any individual
15 endpoint. This approach has not been applied to other health effects data, but should be kept in
16 mind when multiple related outcomes are being considered for a particular health effect.
17 Dose-response modeling for continuous endpoints is made more difficult because there is
18 not a natural probability scale in which to characterize risk. Of course, one approach is to
19 explicitly dichotomize such continuous endpoints, and then model the explicitly dichotomized
20 endpoints as any other quantal endpoint. In separate 1995 papers, Crump and Kodell et al.
21 detailed a new approach to deriving a BMDL for continuous data based on a method originally
22 proposed by Gaylor and Slikker (1990). This approach makes use of the distribution of
23 continuous data, estimates the incidence of individuals falling above or below a level considered
24 to be adverse or at least abnormal, and gives the probability of responses at specified doses above
25 the control levels. This results in an expression of the data in the same terms as that derived
26 from analyses of quantal data, that is, it implicitly dichotomizes the data while retaining the full
27 power of modeling the continuous data while allowing direct comparison of BMDs and BMDLs
28 derived from continuous and quantal data. Gaylor (1996) compared benchmark doses computed
29 for continuous endpoints directly to those computed after first explicitly dichotomizing the data,
10
-------
1 and found that, even for moderate sample sizes, substantial precision was lost upon explicitly
2 dichotomizing the data. West and Kodell (1999) compared such an implicit method for
3 continuous data to the result of modeling explicitly dichotomized endpoints. They found that, for
4 sample sizes in the range of 10 to 20 animals per dose group, the implicit approach gave
5 substantially better results than did the approach of modeling explicitly dichotomized data. Thus,
6 when it is possible to do, it is generally better to derive BMDs and BMDLs for continuous data
7 from models of the continuous data (perhaps using the hybrid approaches described by Gaylor
8 and Slikker, 1990, Crump, 1995 or Kodell et al., 1995).
9 Most approaches to benchmark dose modeling have focused on modeling a single or
10 multiple responses from a single study. Categorical regression modeling (Dourson et al., 1985;
11 Hertzberg, 1989; Hertzberg and Miller, 1985; Guth et al, 1997; Simpson et al, 1996ab) allows the
12 results for multiple endpoints across studies to be used to make an overall assessment of the
13 toxicity of a compound, based on a larger data base. Although so far this method has not been
14 widely used for benchmark dose computation, it shows promise as a way to more quantitatively
15 and rigorously combine information from a rich database.
16 Bayesian approaches to benchmark dose calculation express the uncertainty in the
17 benchmark dose estimate with a probability distribution (in Bayesian parlance, the posterior
18 distribution), in contrast to the confidence limits used by the more commonly used frequentist
19 approach (Hasselblad and Jarabek, 1995). Although the Bayesian approach has not been widely
20 used so far, it has some potentially useful features. It would be relatively easy to combine results
21 from different data sets to provide a more robust estimate, along with an evaluation of the
22 uncertainty in that estimate that would take into account the variability among studies. This
23 would be a clear improvement over the more widely used methods, which only quantify the
24 uncertainty inherent in a single study.
25 Gaylor, et al. (1998) reviewed statistical methods for computing benchmark doses, and
26 Murrel et al. (1998) discussed some consequences of basing the benchmark dose on a confidence
27 limit and suggested an approach for setting benchmark response levels for continuous endpoints.
28
29
11
-------
1 4. General Discussions of Standards for the Benchmark Dose
2 Several workshops and symposia have been held to discuss the application of the BMDL
3 and appropriate methodology (Kimmel et al., 1989; California EPA, 1993; Beck et al., 1993;
4 SRA Symposium, 1994; Barnes et al., 1995). The participants at the EPA/AIHC workshop
5 (Barnes et al., 1995) generally endorsed the application of the BMD approach for all quantal
6 noncancer endpoints and particularly for developmental toxicity, where a good deal of research
7 has been done. Less information was available at the time of the workshop on the application of
8 the BMD approach to continuous data, and more work was encouraged. A number of other
9 issues concerning the application of the BMD approach were discussed. The guidance and
10 default options set forth in the current document are based in part on the outcome of this
11 workshop, the background document (EPA, 1995c), and on more recent information and
12 discussions, including those at a peer consultation workshop on the 1996 draft of this report
13 (USEPA, 1996).
12
-------
1 II. BENCHMARK DOSE GUIDANCE
2
3 This section describes the proposed approach for carrying out a complete BMDL analysis.
4 It is organized in the form of a decision process including the rationale and defaults for
5 proceeding through the analysis, and follows a similar framework to that outlined in the
6 background document (EPA, 1995c). The guidance here imposes some constraints on the
7 BMDL analysis through decision criteria, and provides defaults when more than one feasible
8 approach exists.
9
10 A. Data Evaluation and Endpoint Selection
11
12 The first step in the process of hazard characterization is a complete review of the toxicity
13 data available on an agent to identify and characterize the hazards related to a particular
14 compound or exposure situation. This involves the determination of adverse effects or
15 precursors of adverse effects from all available data and the most appropriate endpoints, the so-
16 called "critical effect(s)," on which to base the NOAEL or BMD. Guidance on review of
17 endpoint data for hazard characterization can be found in a number of EPA publications (EPA,
18 199la, 1994c, 1995f, 1996a and b). This process is essentially the same whether using a BMD or
19 a NOAEL approach. The following discussion summarizes some of the more important issues
20 related to study design and data reporting when using the BMD approach. This guidance does
21 not change the way in which hazard characterization is done, particularly regarding the
22 determination of adversity and selection of endpoints. It does discuss the types of data and study
23 designs most amenable to dose-response modeling, but allows for the possibility that NOAELs
24 will continue to be used for some endpoints, and that in some cases there will be a combination
25 of BMDs and NOAELs to be considered in the assessment of a particular agent.
26
13
-------
1 1. Data Evaluation
2 a. Design
3 In general, studies with more dose groups and a graded monotonic response with dose
4 will be more useful for BMD analysis. Studies with only a single dose showing a response
5 different from controls may not be appropriate for BMD analysis, though if the one elevated
6 response is near the BMR, adequate BMD and BMDL computation may result (see Kavlock, et
7 al, 1996). Studies in which responses are only at the same level as background or at or near the
8 maximal response level are not considered adequate for BMD analysis. It is preferable to have
9 studies with one or more doses near the level of the BMR to give a better estimate of the BMD,
10 and thus, a shorter confidence interval. Studies in which all dose levels show changes compared
11 with control values (i.e., no NOAEL) are readily useable in BMD analyses, unless the lowest
12 response level is much higher than that at the BMR.
13 b. Aspects of Data Reporting
14 In many cases, the risk assessor must rely on published reports of key toxicological
15 studies in performing a dose-response assessment. Reports from the peer-reviewed literature
16 may contain summary information which can vary in completeness vis-a-vis the data
17 requirements of the BMD method. The optimal situation is to have information on individual
18 subjects, but this is unlikely in published reports. It is more common to have summary
19 information (group level information, e.g., mean and standard deviation) concerning the
20 measured effect, especially for continuous response variables, and it must be determined whether
21 the summary information is adequate for the BMD method to proceed.
22 Dichotomous data are normally reported at the individual level (e.g., 11/50 animals
23 showed the effect). Occasionally a dichotomous endpoint will be reported as being observed in a
24 group with no mention of the number of animals showing the effect. This usually occurs when
25 the incidence of the endpoint reported is ancillary to the focus of the report. For BMD modeling
26 of dichotomous data, both the number showing the response and the total number of subjects in
27 the group are necessary.
28 Continuous data are reported as a measurement of the effect, such as body weights or
29 enzyme activity in control and exposed groups. The response might be reported in several
14
-------
1 different ways, e.g., as an actual measurement, or as a contrast (absolute change from control or
2 relative change from control). To model continuous data when individual animal data are not
3 available, the number of subjects, mean of the response variable, and a measure of variability
4 (e.g., standard deviation, SD; standard error, SE; or variance) are needed for each group. The
5 lack of a numerically reported SD or SE precludes the calculation of the BMD. In some cases, a
6 measure of variability is presented for the control group only and this information can be used for
7 modeling by making an assumption, for example, that the variance in the exposed groups is the
8 same as the controls. However, the modeling of data and calculation of the confidence limits
9 will not be as precise as when the variance information is available for individual groups.
10 Categorical data are defined as a type of quantal data in which there is more than one
11 defined category in addition to the no-effect category and the responses in the treatment groups
12 are characterized in terms of the severity of effect (e.g., mild, moderate, or severe histological
13 change). Results may be classified by reporting an entire treatment group in terms of category
14 (group level reporting), or by reporting the number of animals from each group in each category
15 (individual level reporting). For example, a report of epithelial degenerative lesions might state
16 that an exposed group showed a mild effect (group level) or that in the exposed group there were
17 7 animals with a mild effect and 3 with no effect (individual level reporting), hi the latter case,
18 the BMD can be calculated using a quantal model after combining data in severity categories
19 (e.g., model all animals with a particular severity of effect or all with greater than a mild effect).
20 Dichotomous data can be viewed as a special case in which there is one effect category and the
21 possible response is binary (e.g., effect or no effect). Information may also be treated as
22 categorical in cases where an endpoint is inherently a dichotomous or continuous variable, but
23 because the endpoint is reported only descriptively, and the number affected and total number
24 exposed are not reported, it cannot be treated quantitatively. Modeling approaches have been
25 discussed for categorical data with multiple categories (Dourson et al., 1985; Hertzberg, 1989;
26 Hertzberg and Miller, 1985) and for group level categorical data (Guth et al., 1997, Simpson et
27 al., 1996a,b). These regression models can also be used to derive a BMD, by estimating the
28 probability of effects of different levels of severity.
29
15
-------
1 2. Selection of Studies to be Modeled
2 Following a complete review of the toxicity data, the risk assessor must select the studies
3 appropriate for benchmark dose analysis. The selection of the appropriate studies is based on the
4 human exposure situation being addressed, the quality of the studies, and the relevance and
5 reporting adequacy of the endpoints.
6 The process of selecting studies for benchmark dose analysis is intended to identify those
7 studies for which modeling is feasible, so that BMDLs can be calculated and used in dose-
8 response assessment. In most cases, the selection process will identify a single study or very few
9 studies for which calculations are relevant; all studies considered relevant should be modeled.
10 Cases in which there are a number of studies, or studies with a number of endpoints reported may
11 require a large number of BMD calculations. In these cases, it may be possible to select a subset
12 of endpoints as representative of the effects in the target organ or the study. This selection can be
13 made on the basis of sensitivity or severity, which may be more easily compared within a single
14 study in the same target organ than across studies.
15 3. Selection of Endpoints to be Modeled.
16 Once studies have been evaluated with regard to their appropriateness for BMD modeling,
17 the selection of endpoints to model should focus on the dose-response relationships. For example,
18 differences in slope (at the BMR) among endpoints could affect the relative values of the BMDLs
19 to the corresponding LOAELs/NOAELs. Thus, selection of endpoints should not be limited to only
20 the one with the lowest LOAEL. In general, endpoints within a study that have been judged by the
21 risk assessor to be appropriate and relevant to the exposure should be modeled if their LOAEL is up
22 to 10-fold above the lowest LOAEL. This will help ensure that no endpoints with the potential to
23 have the lowest BMDL are excluded from the analysis on the basis of the value of the LOAEL or
24 NOAEL. Selected endpoints from different studies that are likely to be used in determination of the
25 POD should all be modeled, especially if different uncertainty factors may be used for different
26 studies and endpoints. The selection of the most appropriate BMDs to use for determining the POD
27 must be made by the risk assessor using scientific judgement and principles of risk assessment, as
28 well as the results of the modeling process.
29
16
-------
1 4. Minimum Data Set for Calculating a BMD
2 Once the critical endpoints have been selected, data sets are examined for the appropriateness
3 of a BMD analysis. The following constraints on data sets to use for BMD calculations should be
4 applied:
5 4 There must be at least a statistically or biologically significant dose-related trend in the
6 selected endpoint.
7 4 The data set should contain information relevant to dose-response for modeling. A
8 determination of the amount of information about the dose-response that is available need
9 not be quantitative or technical. For example, a data set in which all non-control doses have
10 essentially the same response level provides limited information about the dose-response,
11 since the complete range of response from background to maximum must occur somewhere
12 below the lowest dose: the BMD may be just below the first dose, or orders of magnitude
13 lower. When this situation arises in quantal data, especially if the maximum response is less
14 than 100%, it is tempting to use a model like the Weibull with no restrictions on the power
15 parameter, because such models reach a plateau of less than 100% and most modeling
16 programs do not include other models for quantal data that have this property. This situation
17 can result in seriously distorted BMDs, because the model predictions jump rapidly from
18 background levels to the maximum level. In principle, other models could be found that
19 force the BMD to be anywhere between that extreme and the lowest administered dose. Thus
20 the BMD computed here depends solely on the model selected, and goodness of fit provides
21 no help in selecting among the possibilities, (see the quantal data examples in the appendix
22 for a worked example of this situation). The sad reality in such situations is that the data
23 provide little useful information about dose-response; the ideal solution is to collect further
24 data in the dose-range missed by the studies in hand.
25 When there is a jump between non-control doses between no response and maximal
26 response, there is still limited information about dose-response, but the dose-spacing may
27 ameliorate the situation, since the BMD is effectively bracketed between the two doses that
28 determine the jump.
29
17
-------
1 5. Combining Data for a BMD Calculation
2 Data sets that are statistically and biologically compatible may be combined prior to dose-
3 response modeling, resulting in increased confidence, both statistical and biological, in the calculated
4 BMD. In addition, the use of combined data sets may encourage further studies if the additional data
5 can affect the BMD estimate. Allen et al. (1996) provided an example of a case where data on boron
6 developmental effects could be combined for the BMD analysis. The simplest approach to
7 combining datasets is simply to treat the data as if they were all collected simultaneously. If it is
8 plausible that the multiple datasets represent a homogeneous picture of the dose-response (for
9 example, the responses at doses common to two or more datasets are essentially the same, and
10 statistically undifferentiable), then this is an appropriate approach. More likely, there will be some
11 variability among datasets which will require more elaborate modeling to include properly. There
12 is as yet too little practical as well as theoretical experience with this situation to allow specific
13 guidance in the matter, other than to say that statistically appropriate methods must be used and
14 justified if data sets are combined for modeling. An example of statistically accommodating
15 variability among studies is the model for categorical regression developed by Simpson, et al. (1996,
16 aandb).
17
18 B. Criteria for Selecting the Benchmark Response Level (BMR)
19
20 At the time of this writing, the Agency is developing guidance for the selection of the
21 appropriate response level, or BMR, for use with BMD modeling. In the interim, this document will
22 describe BMR selection as it has typically been done to date.
23 The major aim of benchmark dose modeling is to model the dose-response data for an
24 adverse effect in the observable range (i.e., across the range of doses for which toxicity studies have
25 reasonable power to detect effects) and then select a "benchmark dose" at the low end of the
26 observable range to use as a "point of departure" for deriving quantitative estimates below the range
27 of observation and to use as a basis for comparison of effective doses corresponding to a common
28 response level across chemicals or endpoints. Because different study designs have different
29 sensitivities to observe adverse effects (i.e., limits of detection), the low end of the observable range
18
-------
1 will correspond to different response levels for different study designs. A 10% response level is
2 conventionally used (at least for dichotomous endpoints) to define effective doses (i.e., ED10s and
3 LED10s) for comparing potencies across chemicals or endpoints (e.g., for chemical rankings). This
4 response level is used for such comparisons because it is at the low end of the observable range for
5 many common study designs, although for some designs the limit of detection is above the 10% level
6 and for others it is below. For the POD, on the other hand, it is not critical that a common response
7 level be used for all chemicals or endpoints, and for the purposes of deriving quantitative estimates
8 at doses below the observable range, it may be desirable to use response levels below 10%, if
9 possible, in order to minimize the degree of low-dose extrapolation required. Thus, while it is
10 important to always report ED10s and LEDI0s for comparison purposes, the actual "benchmark dose"
11 used as a POD may correspond to response levels below (or sometimes above) 10%, although for
12 convenience standard levels of 1 %, 5%, or 10% have typically been used rather than a floating level
13 dependent on the actual limit of detection of the relevant study.
14 For continuous data, there are various possibilities for selecting the BMR (see below);
15 however, regardless of which of the options is used, it is recommended that the BMD (and BMDL)
16 corresponding to a change in the mean response equal to one control standard deviation from the
17 control mean always be presented for comparison purposes (see below, third bullet for continuous
18 data). This value would serve as a standardized basis for comparison, akin to the ED10 for
19 dichotomous data.
20 The following describes the criteria conventionally used currently for selecting the BMR.
21 For quantal (dichotomous) data, the conventional approaches are fairly straight forward. For
22 continuous data, on the other hand, there is less historical precedence to draw upon, however some
23 reasonable options are presented. Once a BMR is selected and the dose-response data are modeled,
24 the BMD is explicitly determined.
25 Quantal data:
26 An excess risk of 10% has generally been the default BMR for quantal data. The 10%
27 response is at or near the limit of sensitivity in most cancer bioassays and in some
28, noncancer bioassays as well.
19
-------
1 If a study has greater than usual sensitivity, then a lower BMR can be used, although
2 the ED|0 and LED10 are always presented for comparison purposes. For example,
3 reproductive and developmental studies having nested study designs often have
4 greater sensitivity, and for such studies a BMR of 5% has typically been used.
5 Similarly, epidemiology studies often have greater sensitivities and a BMR of 1 % has
6 typically been used for quantal human data.
7 Continuous data:
8 If there is a minimal level of change in the endpoint that is generally considered to
9 be biologically significant (for example, a change in average adult body weight of
10 10%, or the doubling of average level for some liver enzyme), then that amount of
11 change can be used to define the BMR. (The BMD [and BMDL] corresponding to
12 a change in the mean response equal to one control standard deviation from the
13 control mean should also be presented for comparison purposes [see third bullet].)
14 If individual data are available and a decision can be made about which individual
15 levels can be reasonably considered adverse (perhaps based on a quantile of the
16 control distribution, for example), then the data can be "dichotomized" based on that
17 cutoff value, and the BMR can be set as above for quantal data. (The BMD [and
18 BMDL] corresponding to a change in the mean response equal to one control
19 standard deviation from the control mean should also be presented for comparison
20 purposes [see third bullet].)
21 In the absence of any other idea of what level of response to consider adverse, a
22 change in the mean equal to one control standard deviation from the control mean
23 (see Section II C2e) can be used. The control standard deviation can be computed
24 including historical control data, but the control mean must be from data concurrent
25 with the treatments being considered (Crump, 1995). This gives an excess risk of
26 approximately 10% for the proportion of individuals below the 2nd percentile or
27 above the 98th percentile of controls for normally distributed effects.
28
29
20
-------
1
2 C. Modeling the Data
3
4 1. Introduction
5 The goal of the mathematical modeling in benchmark dose computation is to fit a model to
6 dose-response data that describes the data set, especially at the lower end of the observable dose-
7 response range. The fitting must be done in a way that allows the uncertainty associated with
8 parameter estimates to be quantified and related to the estimate of the dose that would yield the
9 benchmark response. In practice, this procedure will involve first selecting a family or families of
10 models for further consideration, based on characteristics of the data and experimental design, and
11 fitting the models using one of a few established methods. Subsequently, a lower bound on dose is
12 calculated at the BMR. This section is too brief to do more than introduce the topic of modeling.
13 Some references for further reading are: Chapter 10 of Draper and Smith (1981), Gallant (1987),
14 Bates and Watts (1988), McCullagh and Nelder (1989), Seber and Wild (1989), Ross (1990),
15 Clayton and Hills (1993), Davidian and Giltinan (1995).
16 Dose-response models are expressed as functions of dose, possibly covariates, and a set of
17 constants, called parameters, that govern the details of the shape of the resulting curve. They are
18 fitted to a data set by finding values of the parameters that adjust the predictions of the model for
19 observed values of dose and covariates to be close to the observed response. Dose-response models
20 for toxicology data are usually of the type called "nonlinear" in mathematical terminology. In a
21 linear model, the value the model predicts is a linear combination of the parameters. For example,
22 in a linear regression of a response y on dose, the predicted value is a linear combination of a and
23 b, namely, a X 1 + b X dose .Note that, even a quadratic or other polynomial is a linear model, in
24 this sense: y = a + b X dose + c X dose + d X dose is a third degree polynomial (a cubic)
25 equation, but is still a linear combination of the parameters, a, b, c, and d. In contrast, in a nonlinear
26 model, for example the log-logistic with background,
1-P0
27 p = Pn + F-:-7 the response is not a linear combination of the parameters (here, Pn,
" O , . -a+61o(/oje r r \ ' i»
1 i C
21
-------
1 a, and b). The distinction is important, because nonlinear models are usually more difficult to fit to
2 data, requiring more complicated calculations, and statistical inference is more typically approximate
3 than with linear models. Note that this definition of "linear" is in contrast to the way the term is used
4 in reference to cancer dose-response assessment, in which the phrase "low-dose linear" refers to
5 models in which the linear coefficient on dose is positive.
6 At the present, although biological models may often be expressed as nonlinear models (e.g.,
7 Michaelis-Menten curves), nonlinear models do not necessarily have a biological interpretation.
8 Thus, criteria for final model selection will be based solely on whether various models describe the
9 data, conventions for the particular endpoint under consideration, and, sometimes, the desire to fit
10 the same basic model form to multiple data sets. Since it is preferable to use special purpose
11 modeling software, EPA is in the process of developing software which includes several models and
12 default processes as described in this document (http://www.epa.gov/ncea/bmds.htm).
13
14 2. Background for Model Selection
15 This section provides some basic statistical background and guidance on how to go about
16 choosing a model structure appropriate to the data being analyzed, selection of "equivalent" models,
17 and confidence limit calculation to derive the BMDL to use as the point of departure.
18 a. Selecting the Model
19 The initial selection of a group of models to fit to the data is governed by the nature of the
20 measurement that represents the endpoint of interest and the experimental design used to generate
21 the data. In addition, certain constraints on the models or their parameter values sometimes need to
22 be observed, and may influence model selection. Finally, it may be desirable to model multiple
23 endpoints, at the same time. The diversity of possible endpoints and shapes of their dose-responses
24 for different agents precludes specifying a small set of models to use for BMD computation. This
25 will inevitably lead to the need for judgement and occasional ambiguity when selecting the final
26 model and BMDL for dose-response assessment. It is hoped that, as experience using benchmark
27 dose methodology in dose-response assessment accumulates, it will be possible to narrow the
28 number of acceptable models.
29
22
-------
1 /. Type ofendpoint
2 The kind of measurement variable that represents the endpoint of interest is an important
3 consideration in selecting mathematical models. Commonly, such variables are either continuous,
4 like liver weight or the activity of a given liver enzyme, or discrete, commonly dichotomous, like
5 the presence or absence of abnormal liver status. However, other types are common in biological
6 data; for example: ordered categorical, like a histology score that ranges from 1-normal to 5-
7 extremely abnormal; counts, such as counts of deaths or the numbers of cases of illness per thousand
8 person-years of exposure to a given exposure condition; waiting time, such as the time it takes for
9 an illness to appear after exposure, or age at death, or multiple endpoints (such as survival, weight,
10 and malformations in a developmental toxicity study) considered jointly (see, references in section
11 I.C.2). It is beyond the scope of this document to consider all possible kinds of variables that might
12 be encountered, so further discussion will concentrate on dichotomous and continuous variables.
13 Dichotomous variables. Data on dichotomous variables are commonly presented as a
14 fraction or percent of individuals that exhibit the given condition at a given dose or exposure level.
15 For such endpoints, normally we select probability density models like logistic, probit, Weibull, and
16 so forth, whose predictions lie between zero and one for any possible dose, including zero.
17 Continuous variables. Data for continuous variables are often presented as means and
18 standard deviations or standard errors, but may also be presented as a percent of control or some
19 other standard. From a modeling standpoint, the most desirable form for such data is by individual.
20 Unlike the usual situation for dichotomous variables, summarization of continuous variables results
21 in a loss of information about the distribution of those variables.
22 The preferred approach to expressing the BMR will determine the approach to modeling
23 continuous data. Two broad categories of approach have been proposed: 1) to express the BMR as
24 a particular change in the mean response, possibly as a fraction of the control mean, a fraction of the
25 range of the response (when there is a clear maximum response), a fraction of the standard deviation
26 of the measurement from untreated individuals, or a level of the response that expert opinion holds
27 is adverse; or 2) to decide on a level of the outcome to consider adverse, and treat the proportion of
28 individuals with the adverse outcome much as one would a dichotomous variable.
29 Typical models to use in the first situation include linear and polynomial models, and power
23
-------
1 models or other nonlinear models such as Hill models. In the second situation, one approach is to
2 classify each individual as affected or not, and model the resulting variable as dichotomous.
3 An alternative is to use a so-called "hybrid" approach, such as that described by Gaylor and
4 Slikker (1990), Kodell et al. (1995), and Crump (1995), which fits continuous models to continuous
5 data, and, presuming a distribution of the data, calculates a BMD in terms of the fraction affected.
6 Using this approach, the probability (risk) of an individual with an adverse level is estimated directly
7 as a function of dose in four steps (Gaylor and Slikker, 1990). In the first step, the probability
8 distribution among individuals of the continuous measure is established for the control group. Often
9 this distribution may be approximately log-normal, i.e., the logarithm of the values of the biological
10 measure are normally distributed. Since most biological effects do not assume negative values, the
11 log-normal distribution satisfies this condition. If high values are adverse, a large percentile (e.g.,
12 99th percentile) of the distribution may be selected as a cutoff value for normal levels with larger
13 values considered adverse. Conversely, if low values are adverse, a small percentile (e.g., first
14 percentile) may be selected to classify individuals with lower values as adverse.
15 In the second step, a dose-response curve is fit to the data to establish how the average value
16 changes as a function of dose. In the third step, the variability of individuals about the average
17 dose-respose curve is calculated. Often this can be expressed simply by the standard deviation about
18 the dose-response curve. It is common for the standard deviation of biological measurements to be
19 proportional to their average value, i.e., a constant coefficient of variation. Again, this is a property
20 of the log-normal distribution. However, the coefficient of variation may change with dose which
21 leads to a more complicated analysis of the data. In this case, it is often useful to model the variance
22 as proportional to the mean raised to a power. This model includes the case where the coefficient
23 of variation is constant, where the variance is proportional to the square of the mean, and the
24 coefficient of variation is the square root of the constant of proportionality.
25 From the average values estimated from the dose-response curve in step 2 and the variability
26 of values about the curve estimated in step 3, it is possible in the 4th step to estimate the probability,
27 for any dose, that an individual is in the adverse range established in the 1st step. Hence, the BMD
28 can be estimated for a specified BMR. The BMDL can then be calculated for use as a POD for low
29 dose risk assessment.
24
-------
1 ii. Experimental design
2 The aspects of experimental design that bear on model selection include the total number of
3 dose groups used and possible clustering of experimental subjects. The number of dose groups has
4 a bearing on the number of parameters that can be estimated: the number of parameters that affect
5 the overall shape of the dose-response curve generally cannot exceed the number of dose groups.
6 Clustering of experimental subjects is actually more of an issue for methods of fitting the
7 models than for choice of the model form itself. The most common situation in which clustering
8 occurs is in developmental toxicity experiments, in which the agent is applied to the mother, and
9 individual offspring are examined for adverse effects. Another example is for designs in which
10 individuals yield multiple observations (repeated measures). This can happen, for example, when
11 each subject receives both treatment and control (common in studies with human subjects), or each
12 subject is observed multiple times after treatment (e.g., neurotoxicity studies). The issue in all of
13 these examples is that individual observations cannot be taken as independent of each other. Most
14 methods used for fitting models rely heavily on the assumption that the data are independent, and
15 special fitting methods need to be used for data sets that exhibit more complicated patterns of
16 dependence (see, for example, Ryan 1992; Davidian and Giltinan, 1995).
17 Hi. Constraints and covariates
18 An obvious constraint on models for dichotomous data has already been been discussed:
19 probabilities are constrained to be positive numbers no greater than one. However, biological reality
20 may impose other constraints on models. For example, most biological quantities are constrained
21 to be positive, so models should be selected so that their predicted values, at least in the region of
22 application, conform to that constraint. In models in which dose is raised to a power which is a
23 parameter to be estimated (such as a Weibull model), if that parameter is allowed to be less than 1.0,
24 the slope of the dose-response curve becomes infinite at a dose of zero. This often results in
25 numerical problemss in calculating the confidence interval. This is an undesirable situation, and the
26 default is to constrain these parameters to be at least 1.0 (see Example 1).
27 In quantal models, often a background parameter quantifies the probability that the outcome
28 being modeled can occur in the absence of exposure. It may be tempting to reduce the number of
29 parameters to be estimated by fixing the value of the background parameter to be zero. However,
25
-------
1 only when it is clear that an outcome is impossible in the absence of the exposure is it permissible
2 to fix the value of the background to zero.
3 It is preferred that a so-called "threshold" term not be included in the models used for
4 BMD/C analysis because, while it is not an estimate of a biological threshold, it is easily confused
5 with such because of confusing terminology, and because most data sets can be fit adequately
6 without this parameter and the associated loss of a degree of freedom. The software currently
7 distributed by EPA does not currently include this parameter. However, occasionally, it may happen
8 that the increase in a response is so precipitous that including a threshold parameter facilitates the
9 dose-response modeling, and in such cases it is acceptable to include the parameter.
10 It is sometimes desirable to include covariates on individuals when fitting dose-response
11 models. For example, litter size has often been included as a covariate in modeling laboratory
12 animal data in developmental toxicity studies. Another example is in modeling epidemiology data,
13 when certain covariates (e.g., age, parity) are included that are expected to affect the outcome and
14 might be correlated with exposure. In continuous models, if the covariate has an effect on the
15 response, including it in a model may improve the precision of the overall estimate by accounting
16 for variation that would otherwise end up in the residual variance. In any kind of model, any variable
17 that is correlated (non-causally) with dose, and which affects outcome, would need to be included
18 as a covariate.
19 b. Model Fitting
20 The goal of the fitting process is to find values for all the model parameters so that the
21 resulting fitted model describes those data as well as possible; this is termed "parameter estimation."
22 In practice, this happens when the dose-group means predicted by the model come as close as
23 possible to the data means. One way to achieve this is to write down a function (the objective
24 function) of all the parameters and all the data, with the property that the parameter values that
25 correspond either to an overall minimum (or, equivalently, an overall maximum) of the function, or
26 that result in function values of zero, give the desired model predictions.
27 The actual fitting process is carried out iteratively, and starts with an initial guess for the
28 parameter values. This guess is iteratively updated to produce a sequence of estimates that (usually)
29 converge. Many models will converge to the right estimates for most data sets from just about any
26
-------
1 reasonable set of initial parameter values; however, some models, and some data sets, may require
2 multiple guesses at initial values before the model converges. It also happens occasionally that the
3 fitting procedure will converge to different estimates from different initial guesses. Only one of
4 these sets of estimates will be "best". It is always good practice when fitting nonlinear models to try
5 different initial values, just in case.
6 There are a few common ways to construct objective functions: the methods of nonlinear
7 least squares, maximum likelihood, and generalized estimating equations (GEE). The choice, of
8 objective function is determined in large part by the nature of the variability of the data around the
9 fitted model. The method of nonlinear least squares, where the objective function is the sum of the
10 squared differences between the observed data values and the model-predicted values, is a common
11 method for continuous variables when observations can be taken as independent. A basic
12 assumption of this method is that the variance of individual observations around the dose-group
13 means is a constant across doses. When this assumption is violated (commonly, when the variance
14 of a continuous variable changes as a function of the mean, often proportional to the square of the
15 mean, giving a constant coefficient of variation), a modification of the method may be used in which
16 each term in the sum of squares is weighted by the reciprocal of an estimate of the variance at the
17 corresponding dose. This method is especially appropriate when the data to be fitted can be supposed
18 to be at least approximately normally distributed.
19 Maximum likelihood is a general way of deriving an objective function when a reasonable
20 supposition about the distribution of the data can be made. Because estimates derived by maximum
21 likelihood methods have good statistical properties, such as asymptotic normality, maximum
22 likelihood is often a preferred form of estimation when that assumption is reasonably close to the
23 truth. An example of such a situation is the case of individual independently treated animals (e.g.,
24 not clustered in litters) scored for a dichotomous response. Here it is reasonable to suppose that the
25 number of responding animals follows a binomial distribution with the probability of response
26 expressed as a function of dose. Continuous variables, especially means of several observations, are
27 often normal (gaussian) or log-normal. When variables are normally distributed with a constant
28 variance, minimizing the sum of squares is equivalent to maximizing the likelihood, which explains
29 in part why least squares methods are often used for continuous variables. In developmental toxicity
27
-------
1 data, the distribution of the number of animals with an adverse outcome is often taken to be
2 approximately beta-binomial. This particular likelihood is used to accommodate for the lack of
3 independence among littermates.
4 A third group of approaches to estimating parameters are the related quasi-likelihood method
5 (McCullagh and Nelder, 1989) and the method of GEE (see Zeger and Liang, 1986), which require
6 only that the mean, variance, and correlation structure of the data be specified. GEE methods are
7 similar to maximum likelihood estimation procedures in that they require an iterative solution,
8 provide estimates of standard errors and correlations of the parameter estimates, and estimates are
9 asymptotically normal. Their use so far has primarily been to handle forms of lack of independence,
10 as in litter data, and would be useful in any of a number of kinds of repeated measures designs, such
11 as occur in clinical studies and repeated neurobehavioral testing.
12 c. Assessing How Well the Model Describes the Data
13 An important criterion is that the selected model should describe the data, especially in the
14 region of the BMR. Most fitting methods will provide a global goodness-of-fit measure, usually
15 providing a P-value. These measures quantify the degree to which the dose-group means that are
16 predicted by the model differ from the actual dose-group means, relative to how much variation of
17 the dose-group means one might expect. Small P-values indicate that it would be unlikely to achieve
18 a value of the goodness-of-fit statistic at least this extreme if the data were actually sampled from
19 the model, and, consequently, the model is a poor fit to the data. Since it is particularly important
20 that the data be adequately modeled for BMD calculation, it is recommended that cc=0.1 be used to
21 compute the critical value for goodness of fit, instead of the more conventional values of 0.05 or
22 0.01. P-values cannot be compared from one model to another since they assume the different
23 models are correct; they can only identify those models that are consistent with the experimental
24 results. When there are other covariates in the models, such as litter size, the idea is the same, just
25 more complicated to calculate. In this case, the range of doses and other covariates is broken up into
26 cells, and the number of observations that fall into each cell is compared to that predicted by the
27 model.
28 It can happen that the model is never very far from the data points (so the P-value for the
29 goodness-of-fit statistic is not too small), but is always on one side or the other of the dose-group
28
-------
1 means. Also, there could be a wide range in the response, and the model predicts the high responses
2 well, but misses the low dose responses. In such cases, the goodness-of-fit statistic might not be
3 significant, but the fit should be treated with caution. One way to detect such situations is with
4 tables or plots of residuals: measures of the deviation of the response predicted by the model from
5 the actual data. If the residuals are scaled by an estimate of their standard deviation, then residuals
6 that exceed 2.0 in absolute value warrant further examination of the model.
7 Another way to detect the form of these deviations from fit is with graphical displays. Plots
8 should always supplement goodness-of-fit testing. It is extremely helpful that plots that include data
9 points also include a measure of dispersion of those data points, such as confidence limits.
10 In certain cases, the typical models for a standard study design cannot be used with the
11 observed data as, for example, when the data are not monotonic, or when the response rises abruptly
12 after some lower doses that give only the background response. In these cases, adjustments to the
13 data (e.g., a log-transformation of dose) or the model (e.g., adjustments for unrelated deaths) may
14 be necessary. In the absence of a mechanistic understanding of the biological response to a toxic
15 agent, data from exposures that give responses much more extreme than the BMR do not really tell
16 us very much about the shape of the response in the region of the BMR. Such exposures, however,
17 may very well have a strong effect on the shape of the fitted model in the region of the BMD. Thus,
18 if lack of fit is due to characteristics of the dose-response data for high doses, the data may be
19 adjusted by eliminating the high dose group. The practice carries with it the loss of a degree of
20 freedom, but may be useful in cases where the response plateaus or drops off at high doses. Since
21 the focus of the BMD analysis is on the low dose and response region, eliminating high dose groups
22 is reasonable. Alternatively, an entirely different model could be fit.
23 d. Comparing Models
24 It will often happen that several models provide an adequate fit to a given data set. These
25 models may be essentially unrelated to each other (for example a logistic model and a probit model
26 often do about as well at fitting dichotomous data) or they may be related to each other in the sense
27 that they are members of the same family that differ in which parameters are fixed at some default
28 value. For example, one can consider the log-logistic, the log-logistic with non-zero background,
29 and the log-logistic with threshold and non-zero background to all be members of the same family
29
-------
1 of models. Goodness-of fit statistics are not designed to compare different models, so alternative
2 approaches to selecting a model to use for BMDL computation need to be pursued.
3 Generally, within a family of models, as additional parameters are introduced the fit will
4 appear to improve. This general behavior is due solely to the increase in the additional parameters.
5 Likelihood ratio tests can be used to evaluate whether the improvement in fit afforded by estimating
6 additional parameters is justified. Such tests cannot be applied to compare models from different
7 families, however. Some statistics, notably Akaike's Information Criterion (AIC) (Akaike, 1973;
8 Linhart and Zucchini, 1986; Stone, 1998; AIC is -2L + 2p, where L is the log-likelihood at the
9 maximum likelihood estimates for the parameters, andp is the number of model degrees of freedom)
10 can be used to compare models with different numbers of parameters using a similar fitting method
11 (for example, least squares or a binomial maximum likelihood). Although such methods are not
12 exact, they can provide useful guidance in model selection.
13 When other data sets for similar endpoints exist, an external consideration can be applied.
14 It may be possible to compare the result of BMDL computations across studies if all the data were
15 fit using the same form of model, presuming that a model can be found that describes all the data
16 sets. Another consideration is the existence of a conventional approach to fitting a kind of data. In
17 this case, communication with specialists in that type of data is eased when a familiar model is used
18 -to fit the data. Neither of these considerations should be seen as justification for using ill-fitting
19 models. Finally, it is generally considered preferable to use models with fewer parameters, when
20 possible.
21 e. Using Confidence Limits to Get a BMDL
22 Confidence limits express the uncertainty in a parameter estimate that is due to sampling
23 and/or experimental error. The interval between two confidence limits is called a confidence
24 interval. Confidence intervals can be two-sided, that is, localize their corresponding parameter on
25 both sides, or one-sided, that is, localize their corresponding parameter on only one side. It may be
26 convenient to think of a one-sided confidence interval as one limit of a two-sided interval goes to
27 either infinity or minus infinity. For example, a one-sided 95% confidence interval for a parameter
28 would share a limit with the two-sided 90% confidence interval for the parameter, and have plus or
29 minus infinity (or, perhaps, 0, for a parameter such as the BMD that must be non-negative) as its
30
-------
1 second limit. Confidence limits bracket those values which, within a particular model family, are
2 consistent with the data, but do not account for or assume any correspondence between the modeled
3 animal data and the human population of concern. The "confidence" or "coverage" associated with
4 an interval indicates the percent of repeated intervals based on experiments of the same sort that are
5 expected to include the parameter being estimated, for example, the BMD. With rare but important
6 exceptions, calculated confidence intervals are approximations, in the sense that the actual coverage
7 of the interval usually diverges somewhat from the desired. The choice of confidence level
8 represents tradeoffs in data collection costs and the needed data precision. Just as 0.05 is a
9 convenient (but not necessarily good for all data) level for tests, 95% is a convenient choice for most
10 limits and is the default value recommended in this guidance.
11 A lower confidence limit is placed on the BMD to obtain a dose (BMDL) that assures with
12 high confidence (e.g., 95%) that the BMR is not exceeded. This process rewards better experimental
13 design and procedures that provide more precise estimates of the BMD, resulting in tighter
14 confidence intervals and thus higher BMDLs. Some procedures and examples for calculating
15 BMDLs or BMCLs are given by Gaylor et al. (1998).
16 The method by which the confidence limit is obtained is typically related to the manner in
17 which the BMD is estimated from the model. When parameters are estimated using the method of
18 maximum-likelihood, confidence intervals (CIs) may be based on the asymptotic distribution of the
19 likelihood ratio or on the asymptotic distribution of the maximum likelihood estimates (MLEs).
20 While both can give problems in ranges where the assumptions needed to use asymptotic theory
21 begin to weaken (e.g., as sample sizes decrease), in general it is preferred to base CIs for parameters
22 estimated by maximum likelihood on the asymptotic distribution of the likelihood ratio, owing to
23 their tendency to give better coverage behavior (Crump and Howe, 1985).
24 To compute a CI for a model parameter based on the distribution of the likelihood ratio, first
25 compute the maximum likelihood estimate of all the parameters in the model. Next, separate the
26 model parameters into one parameter whose CI is being computed (call it p.) and all the other
27 parameters. Then find the value of n such that, when the other parameters are adjusted to maximise
28 the likelihood, the log-likelihood is reduced from that at the maximum likelihood estimates by
29 exactly x2(i,i-a/2, where X2(i,i-0) represents the quantile of the x2 distribution corresponding to 1 degree
31
-------
1 of freedom and an upper tail probability of a (see, for example, Crump and Howe, 1985; Venzon and
2 Moolgavkar, 1988). When the value of interest cannot be expressed as a model parameter, a similar,
3 but more complicated, approach is used.
4 Details for other approaches to CI computation specific to particular data types follow:
5 Ouantal Data. For quantal data each individual is classified according to whether or not it
6 exhibits a particular adverse effect, e.g., death or cancer. Quantal data provide the simplest case for
7 estimating BMDs. Consider an experiment consisting of animals exposed to several doses of a
8 substance, and suppose that the number of animals exhibiting a particular adverse effect is
9 binomially distributed at each dose level. After a suitable dose-response curve has been fit to the
10 experimental data, the BMDL is defined as a lower confidence limit on the exposure level that
11 corresponds to a specified excess risk (e.g., 10%) above background. The exposure level itself is
12 the effective dose, or the BMD10. There are several ways to calculate a lower confidence limit. One
13 is to apply standard statistical theory (specifically, the delta method, see for example Gart et al.,
14 1986) to approximate the variance of the estimated BMD. This estimated variance can then be used
15 as the basis for constructing a lower confidence limit on the BMD. The logarithm of doses can be
16 used to ensure a positive BMDL. A second approach is to calculate an upper confidence limit on
17 the excess proportion (risk) of animals possessing an adverse effect as a function of dose. The
18 BMDL is the dose where the upper confidence limit for the estimate of risk equals the specified level
19 of risk, e.g., 10%, desired for the BMD (see e.g., Kimmel and Gaylor, 1988).
20 Clustered Data: Reproductive and Developmental Effects The issue of litter effects for
21 reproductive and developmental experiments complicates the calculation of a confidence limit. The
22 pregnant mother is the experimental unit and statistical methods must account for the tendency of
23 littermates to respond similarly. Chen and Kodell (1989) and Williams (1975) have proposed
24 methods based on the assumption that the number of affected individuals in a litter follows a
25 beta-binomial distribution. The probability of an affected individual increases with dose of a toxic
26 agent. To fit this model, maximum likelihood estimates can be obtained from the beta-binomial log
27 likelihood (Chen and Kodell, 1989).
28 One disadvantage of the beta-binomial distribution and other correlated binomial
29 distributions is their computational complexity. A second disadvantage is a lack of robustness if the
32
-------
1 assumed distribution is incorrect. Alternative analyses can be based on quasi-likelihood, or more
2 generally, generalized estimating equations. Liang and Zeger (1986) and Liang (1986) describe a
3 general approach for the analysis of correlated data. This approach is referred to as Generalized
4 Estimating Equations (GEE). Ryan (1992) discusses the use of this approach for developmental
5 toxicity. The GEE approach requires specification of only the mean and variance functions of the
6 data. To estimate dispersion parameters, a separate equation is required. A simple example is the
7 moment estimates. An important addition in the GEE method is the inclusion of an empirical
8 variance "fix-up" that relaxes the distributional assumptions so that the model parameters and their
9 variances will be estimated correctly, even if the variance function is misspecified. There is still
10 incentive to correctly specify the variance function since it improves statistical efficiency (Liang and
11 Zeger, 1986).
12 Continuous Data Different techniques for calculating a BMD are required for continuous
13 measurements. Examples of continuous endpoints are body weights, organ weights, and
14 hematological and clinical chemistry measurements. For such data measured on a continuum, there
15 generally is no sharp demarcation between normal and adverse values. In the absence of a clinical
16 definition of an adverse level, a low or high percentile (e.g., the first and 99th percentile) could be
17 used to define an abnormal observation. For values that are normally distributed, these percentiles
18 are estimated by the mean ±2.33 standard deviations from the control animals. Such extreme values
19 might be considered adverse or, at least, undesirable and can be classified as abnormal.
20 Crump (1995) shows the relationship between a change in the mean response, relative to the
21 standard deviation, and the excess risk. For example, if values beyond the 98th to 99th percentile of
22 control animals are considered abnormal, a dose that causes a shift in the average of one standard
23 deviation results in approximately an excess risk of 10% of the animals in the abnormal range. This
24 provides a very simple method for establishing a BMD associated with a risk of approximately 10%.
25 A lower confidence limit on this BMD can be calculated using standard regression procedures.
26 Multiple Outcomes In addition to the clustering or litter effect, multivariate outcomes are
27 often encountered. This is particularly true of developmental and reproductive toxicity data because
28 exposure to agents can affect many different stages in the reproductive process. Once implantation
29 has occurred, exposures to toxicants can result in early pregnancy loss, malformation, low fetal
33
-------
1 weight, and/or subsequent developmental problems. The BMD can be based on the risk of being
2 abnormal. Abnormality is defined as exhibiting any of several selected abberative endpoints.
3 Several authors have discussed the development of dose-response models for multivariate data (Chen
4 et al, 1991; Ryan et al., 1991; Catalano and Ryan, 1991; Ryan, 1992b; Catalano et al, 1993; Zhu
5 et al, 1994; Krewski and Zhu, 1994, 1995).
6 Thus, the BMDL is determined by 1) selecting an endpoint(s), 2) identifying a BMR (a
7 predetermined level of change in response relative to controls), 3) establishing, by an appropriate
8 estimation procedure, a model that fits the data adequately, and 4) calculating a confidence limit at
9 the BMR using the model and the same estimation procedure.
10 At the time of this writing, commercial software is available that is designed specifically for
11 carrying out steps 3) and 4) by maximum likelihood or GEE methods. EPA has software for this
12 purpose (using maximum likelihood methods) that is widely-available to all potential users.
13 f. Selecting the model to use for POD computation
14 To summarize the preceeding sections, it is recommended that the following steps be
15 followed to select the model(s) to use for computing the POD:
16 Assess goodness-of-fit, using a value of a=0.1 to determine a critical value.
17 Further reject models that apparently do not adequately describe the relevant low-dose
18 portion of the dose-response, examining residuals and graphs of model and data.
19 As the models remaining have met the default statistical criteria for adequacy and visually
20 fit the data, any of them theoretically could be used for determining the BMDL. The
21 remaining criteria for selecting the BMDL are necessarily somewhat arbitrary, and are
22 adopted as defaults.
23 If the BMDL estimates from the remaining models are within a factor of 3, then they are
24 considered to show no appreciable model dependence and will be considered
25 indistinguishable in the context of the precision of the methods. Models are ranked based
26 on the values of their Akaike Information Criterion (AIC), a measure of the deviance of the
27 model fit adjusted for the degrees of freedom, and the model with the lowest AIC is used to
28 calculate the BMDL. If this is not unique, the simple average or geometric mean of the
29 BMDLs with the lowest AIC is used.
34
-------
1 If the BMDL estimates from the remaining models are not within a factor of 3, some model
2 dependence of the estimate is assumed. Since there is no clear remaining biological or
3 statistical basis on which to choose among them, the lowest BMDL is selected as a
4 reasonable conservative estimate. If the lowest BMDL from the available models appears
5 to be an outlier, compared to the other results (e.g., there are several other results, all within
6 a factor of 3), then additional analysis and discussion would be appropriate. Additional
7 analysis might include the use of additional models, the examination of the parameter values
8 for the models used, or an evaluation of the BMDs to determine if the same pattern exists as
9 for the BMDLs. Discussion of the decision procedure should always be provided.
10 In some cases, relevant data for a given agent are not amenable to modeling and a mixture
11 of BMDLs and NOAEL/LOAELs results. When this occurs, and the most biologically
12 relevant effect is from a study considered adequate but not amenable to modeling, the
13 NOAEL should be used as the point of departure.
14
15 D. Reporting Requirements
16
17 Any computation of a BMD or BMDL should include the following elements:
18
19 Study or Studies Selected for BMD Calculation(s)
20 Rationale for study selection
21 Rationale for endpoints (effects)
22 List dose response data used
23 Dose-Response Model(s) Chosen for each Case
24 Rationale
25 Estimation procedure (e.g., maximum likelihood, least squares, generalized
26 estimating equations)
27 Estimates of model parameters with standard errors
28 Goodness-of fit test statistics
29 Standardized residuals (observed minus predicted response/standard deviation)
35
-------
1 Choice of BMR for Each Case
2 Rationale
3 Procedure used if for continuous data
4 Computation of the BMD
5 List the BMD Value.
6 Calculation of the Lower Confidence Limit for the BMD (BMDL) for Each Case
7 Confidence limit procedure (e.g., likelihood profile, delta method, bootstrap)
8 - List BMDL Value(s)
9 Graphics for Each Case
10 Plot of data points with error (standard deviation) bars
11 Plot of fitted dose-response
12 Plot of confidence limits for the fitted curve (optional; if included, the narative
13 should describe the methods used to compute them.)
14 - Identify BMD and BMDL
15 BMDs and BMDLs for Default BMRs
16 For dichotomous data, the BMD and BMDL for an extra risk of 0.10
17 For continuous data, the BMD and BMDL corresponding to a change in the mean
18 response equal to one control standard deviation from the control mean.
19
20 E. Decision Tree
21
22 The following decision tree depicts the general progression of steps in a BMD calculation.
23 A separate BMD calculation should be conducted for each endpoint/study combination that is a
24 reasonable candidate for becoming the basis for a final quantitative risk estimate. Unlike comparing
25 NOAELs or LOAELs across endpoints or studies, the relative values of potential BMDs are not
26 readily transparent until after the modeling has been completed.
27 For each candidate endpoint/study combination:
28 1. Select the appropriate BMR based on the type of data (i.e., quantal vs. continuous),
29 sensitivity of study design, toxicity endpoint, and judgements about the adversity of the
36
-------
1 endpoint if continuous (see Section II.B).
2 2. Model the dose-response data, using appropriate model structures for the type of data (i.e.,
3 quantal vs. continuous, depending on how the BMR is defined) and study design (e.g.,
4 nested) (see Section II.C.2.a). For modeling cancer bioassay data, a specific default
5 algorithm is generally used except for case-specific situations in which an alternate model
6 may be superior (e.g. a time-to-tumor model, a biologically-based model). For other types
7 of experimental animal data, curve-fitting can be attempted with any appropriate models.
8 Human data are modeled in a case-specific way which may need to account for covariates,
9 competing causes of mortality, etc.
10 3. Assess the fit of the models (see Section II.C.2.c). Retain models that are not rejected
11 using a p-value of 0.1. Examine the residuals and plot the data and models; check that the
12 models adequately describe the data, especially in the region of the BMR. (Sometimes it
13 may be necessary to transform the data in some way or to drop the highest exposure group(s)
14 (e-g-, if the behavior at high exposures can be attributed to early mortality or enzyme
15 saturation effects) and repeat the modeling in order to get a good fit.)
16 4. Calculate 95% lower confidence limits on the candidate BMDs (i.e., BMDLs) using the
17 models which adequately fit the data (see Section II.C.2.e).
18 5. Select from among the models which adequately fit the data (see Section II.C.2.f). If the
19 BMDL estimates from these remaining models are within a factor of 3 they are considered
20 indistinguishable, and the model with the lowest AIC can be selected to provide the BMDL.
21 If the BMDL estimates are not within a factor of 3, some model dependence is assumed, and
22 the model with the lowest BMDL estimate should be selected unless it appears to be an
23 outlier, in which case further analysis may be appropriate.
24 6. Document the BMD analysis as outlined in Section II.D. on reporting requirements.
37
-------
1 REFERENCES
2
3
4 Akaike, H (1973) Information theory and an extension of the maximum likelihood principle, in
5 Proceedings of the Second International Symposium on Information Theory, B. N. Petrov and F.
6 Csaki, eds. Akademiai Kiado, Budapest, pp. 267-281.
7
8 Alexeeff, G.V.; Lewis, D.C.; Ragle, N.L. (1993) Estimation of potential health effects from acute
9 exposure to hydrogen fluoride using a 'benchmark dose' approach. Risk Analysis, 13(l):63-69.
10
11 Allen, B.C.; Kavlock, R.J.; Kimmel, C.A.; Faustman, E.M. (1994a) Dose-response assessment for
12 developmental toxicity: II. Comparison of generic benchmark dose estimates with NOAELs. Fund.
13 Appl. Toxicol., 23:487-495.
14
15 Allen, B.C.; Kavlock, R.J.; Kimmel, C.A.; Faustman, E.M. (1994b) Dose-response assessment for
16 development toxicity: HI. Statistical models. Fund. Appl. Toxicol., 23:496-509.
17
18 Allen, B.C., and P.L. Strong, C.J. Price, S.A. Hubbard, and G.P. Daston (1996) Benchmark dose
19 analysis of developmental toxicity in rats exposed to boric acid. Fund. Appl. Toxicol. 32: 194-204.
20
21 Auton, T.R. Calculation of benchmark doses from teratology data (1994). Regulatory Toxicology
22 and Pharmacology: 19: 152-167.
23
24 Barnes, D.G.; Daston, G.P.; Evans, J.S. Jarabek, A.M.; Kavlock, R.J.; Kimmel, C.A.; Park, C.;
25 Spitzer, H.L. (1995) Benchmark dose workshop: Criteria for use of a benchmark dose to estimate
26 a reference dose. Regulatory Toxicol. Pharmacol., 21:296-306.
27
28 Bates, D. M. and Watts, D. G. (1988) Nonlinear Regression Analysis and Its Applications. Wiley.
29 New York.
38
-------
1 Beck, B.D.; Conolly, R.B.; Dourson, M.L.; Guth, D.; Hattis, D.; Kimmel, C.; Lewis, S.C. (1993)
2 Symposium overview: improvements in quantitative noncancer risk assessment. Fund. Appl.
3 Toxicology 20:1-14.
4
5 California Office of Environmental Health Hazard Assessment. (1993) Safety assessment for non-
6 cancer endpoints: The benchmark dose and other possible approaches. Summary report.
7
8 Catalano, P. J., and Ryan, L. M. (1992). Bivariate latent variable models for clustered discrete and
9 continuous outcomes. J. Am. Stat. Assoc. 87, 651-658.
10
11 Catalano, P.J.; Scharfstein, D.O.; Ryan, L.M.; Kimmel, C.A.; Kimmel, G.L. (1993) Statistical model
12 for fetal death fetal weight and malformation in developmental toxicity studies. Teratology 47:281 -
13 290.
14
15 Chen, C.; Farland, W. (1991) Incorporating cell proliferation in quantitative cancer risk assessment:
16 approaches, issues, and uncertainties. In: Butterworth, B.; Slaga, T.; Farland, W.; McClain, M., eds.
17 Chemical induced cell proliferation: implications for risk assessment. New York: Wiley-Liss, pp.
18 481-499.
19
20 Chen, J. J., and Kodell, R. L. (1989). Quantitative risk assessment for teratologic effects. J. Am.
21 Stat. Assoc. S4, 966-911.
22
23 Chen, J.J.; Kodell, R.L.; Howe, R.B.; Gaylor, D.W. (1991) Analysis of trinomial responses from
24 reproductive and developmental toxicity experiments. Biometrics 47:1049-1058.
25
26 Clayton, D.; Hills, M. (1993) Statistical Models in Epidemiology. Oxford University Press, Oxford.
27
28 Cogliano, V.J. (1986) The U.S. EPA's methodology for adjusting the reportable quantities of
29 potential carcinogens. Proceedings of the 7th National Conference on Management of Uncontrolled
39
-------
1 Hazardous Wastes (Superfund '86). Washington: Hazardous Materials Control Research Institute,
2 pp.182-185.
3
4 Collins, M.A., G.M. Rusch, F. Sato, P.M. Hext, and R.-J. Millischer. 1995.
5 1,1,1,2-Tetrafluoroethane: Repeat exposure inhalation toxicity in the rat, developmental toxicity in
6 the rabbit, and genotoxicity in vitro and in vivo. Fund. Appl. Toxicol. 25: 271-280.
7
8 Cox, D.R.; Hinkley, D.V. (1974) Theoretical Statistics, chapter 7, Chapman and Hall, London.
9
10 Crump, K.S. (1984) A new method for determining allowable daily intakes. Fundamental and
11 Applied Toxicology 4:854-871.
12
13 Crump, K. S. (1995) Calculation of benchmark doses from continuous data. Risk Analysis 15:
14 79-89.
15
16 Crump, K. S.; R. Howe. (1985) Chapter 9. in Toxicological Risk Assessment. D.B. Clayson, D.
17 Krewski, I. Munro, eds. Boca Raton: CRC Press, Inc.
18
19 Davidian, M.; Giltinan, D. M. (1995) Nonlinear Models for Repeated Measurement Data. Chapman
20 and Hall, London. Farmer, J.H.; Kodell, R.L.; Gaylor, D.W. (1982) Estimation and extrapolation of
21 tumor probabilities from a mouse bioassay with survival/sacrifice components. Risk Analysis
22 2(l):27-34.
23
24 Dourson, M.L., Hertzberg, R.C., Hartung, R., Blackburn, K. (1985) Novel methods for the
25 estimation of acceptable daily intake. Toxicology and Industrial Health 1:23-41.
26
27 Draper, N.; Smith, H. (1981) Applied Regression Analysis, Second Edition. Chapter 10. Wiley, New
28 York.
29
40
-------
1 Farmer, J.H.;Kodell,R.L.; Gaylor,D.W. (1982) Estimation and extrapolation of tumor probabilities
2 from a mouse bioassay with survival/sacrifice components. Risk Analysis 2(l):27-34.
3
4 Faustman, E.M.; Allen, B.C.; Kavlock, R.J.; Kimmel, C.A. (1994) Dose-response assessment for
5 developmental toxicity: I. Characterization of data base and determination of NOAELs. Fund.
6 Appl. Toxicol., 23:478-486.
7
8 Fleiss, J.L. (1981) Statistical Methods for Rates and Proportions. Second Edition. Wiley. New
9 York.
10
11 Fowles, J. R., Alexeeff, G. V., Dodge, D. (1999) The use of benchmark dose methodology with acute
12 inhalation lethality data. Regulatory Toxicology and Pharmacology 29: 262-278.
13
14 Fung, K. Y., Marro, L., and Krewski, D. (1998) A comparison of methods for estimating the
15 benchmark dose based on overdispersed data from developmental toxicity studies. Risk Analysis
16 18:329-342.
17
18 Gallant, A. R. (1987) Nonlinear Statistical Models. Wiley. New York.
19
20 Gart, J. J., Krewski, D., Lee, P. N., Tarone, R. E., and Wahrendorf, J. (1986). Statistical Methods
21 in Cancer Research, Vol. 3. The Design and Analysis of Long-Term Animal Experiments.
22 International Agency for Research on Cancer, Lyon, France.
23
24 Gaylor, D.W. (1983) The use of safety factors for controlling risk. Journal of Toxicology and
25 Environmental Health 11:329-336.
26
27 Gaylor, D. W. (1996) Quantalization of continuous data for benchmark dose estimation. Regulatory
28 Toxicology and Pharmacology 24: 246-250.
29
41
-------
1 Gaylor, D.W.; Kodell, R.L. (1980) Linear interpolation algorithm for low dose risk assessment of
2 toxic substances. J. Environ. Pathol. Toxicol. 4:305-312.
3
4 Gaylor, D.; Slikker, W., Jr. (1990) Risk assessment for neurotoxic effects. NeuroToxicology 11:
5 211-218.
6
7 Gaylor, D.W.; Kodell, R.L.; Chen, J.J.; Springer, J.A.; Lorentzen, R.J.; Scheuplein, R.J. (1994) Point
8 estimates of cancer risk at low doses. Risk Analysis 14(5):843-850.
9
10 Gaylor, D. W., Ryan, L., Krewski, D., and Zhu, Y. (1998). Procedures for calculating benchmark
11 doses for health risk assessment. Regulatory Toxicol. Pharmacol. 28, 150-164.
12
13 Gerrity, T.R.; Henry, C.J., eds. (1990) Summary report of the workshops on principles of route-to-
14 route extrapolation for risk assessment. In: Principles of route-to-route extrapolation for risk
15 assessment, proceedings of the workshops; March and July; Hilton Head, SC and Durham, NC. New
16 York, NY: Elsevier Science Publishing Co., Inc.; pp. 1-12.
17
18 Gold, L.S.; Sawyer, C.B.; Magaw, R.; Backman, G.M.; de Veciana, M.; Levinson, R.; Hooper, N.K.;
19 Havender, W.R.; Bernstein, L.; Peto, R.; Pike, M.C.; Ames, B.N. (1984) A carcinogenic potency
20 database of the standardized results of animal bioassays. Environ. Health Perspect. 58:9-319.
21
22 Gold, L.S.; Bernstein, L.; Kaldor, J.; Backman, G.; Hoel, D. (1986a) An empirical comparison of
23 methods used to estimate carcinogenic potency in long-term animal bioassays: lifetable vs summary
24 incidence data. Fund. Appl. Toxicol. 6:263-269.
25
26 Gold, L.S.; de Veciana, M.; Backman, G.M.; Magaw, R.; Lopipero, P.; Smith, M.; Blumenthal, M.;
27 Levinson, R.; Bernstein, L.; Ames, B.N. (1986b) Chronological supplement to the carcinogenic
28 potency database: standardized results of animal bioassays published through December 1982.
29 Environ. Health Perspect. 67:161-200.
42
-------
1 Gold, L.S.; Slone, T.H.; Backman, G.M.; Magaw, R.; Da Costa, M.; Lopipero, P.; Blumenthal, M.;
2 Ames, B.N. (1987) Second chronological supplement to the carcinogenic potency database:
3 standardized results of animal bioassays published through December 1984 and by the National
4 Toxicology Program through May 1986. Environ. Health Perspect. 74:237-329.
5
6 Gold, L.S.; Slone, T.H.; Backman, G.M.; Eisenberg, S.; Da Costa, M.; Wong, M.; Manley, N.B.;
7 Rohrbach, L.; Ames, B.N. (1990) Third chronological supplement to the carcinogenic potency
8 database: standardized results of animal bioassays published through December 1986 and by the
9 National Toxicology Program through June 1987. Environ. Health Perspect. 84:215-286.
10
11 Gold, L.S.; Manley, N.B.; Slone, T.H.; Garfmkel, G.B.; Rohrbach, L.; Ames, B.N. (1993) The fifth
12 plot of the carcinogenic potency database: results of animal bioassays published in the general
13 literature through 1988 and by the National Toxicology Program through 1989. Environ. Health
14 Perspect. 100:65-168.
15
16 Gold, L.S.; Manley, N.B.; Slone, T.H.; Garfmkel, G.B.; Ames, B.N. Rohrbach, L.; Stern, B.R.;
17 Chow, K. (1995) Sixth plot of the carcinogenic potency database: results of animal bioassays
18 published in the general literature 1989-1990andbytheNationalToxicologyProgram 1990to 1993.
19 Environ. Health Perspect.103 Suppl 8:3-122.
20
21 Guth, D. J. Carroll, R. J. Simpson, D. G. and Zhou, H. (1997) Categorical regression analysis of
22 acute exposure to tetrachloroethylene. Risk Analysis 17: 321-332.
23
24 Haas, C.N. (1994) Dose-response analysis using spreadsheets. Risk Analysis 14(6):1097-1100.
25
26 Haber, L. T., Allen, B. C., and Kimmel, C. A. (1998) Non-cancer risk assessment for nickel
27 compounds: Issues associated with dose-response modeling of inhalation and oral exposures.
28 Toxicological Sciences 43: 213-229.
29
43
-------
1 Hasselblad, V.; A.M. Jarabek (1995) Dose-response analysis of toxic chemicals. In: Bayesian
2 biostatistics. D.A. Berry, D.K. Stangl, eds. Marcel Dekker, Inc. New York.
3
4 Heindel, J.J., C.J. Price, E.A. Field, M.C. Marr, C.B. Myers, R.E. Morrissey, and B.A. Schwetz
5 (1992). Developmental toxicity of boric acid in mice and rats. Fund. Appl. Toxicol. 18:266-.
6
7 Hertzberg, R.C. (1989) Fitting a model to categorical response data with application to species
8 extrapolation of toxicity. Health Physics 57:405-409.
9
10 Hertzberg, R.C., Miller, M. (1985) A statistical model for species extrapolation using categorical
11 response data. Toxicology and Industrial Health 1:43-57.
12
13 Hext, P.M. and R.J. Parr-Dobrzanski, 1993. HFC 134a: 2 Year inhalation toxicity study in the rat.
14 ICI Central Toxicology Laboratory, Alderley Park, Macclesfield, Cheshire, UK. Report No.
15 CTL/P/3317.
16
17 Hoel, D.G. (1990) Assumptions of the HERP index. Risk Analysis 10(4):623-624.
18
19 Hoover, S.M.; Zeise, L.; Pease, W.S.; Lee, L.E.; Hennig, M.P.; Weiss, L.B.; Cranor, C. (1995)
20 Improving the regulation of carcinogens by expediting cancer potency estimation. Risk Analysis
21 15(2):267-280.
22
23 Howe, R.B.; Crump, K.S.; Van Landingham, C. (1986) Global 86: a computer program to
24 extrapolate quantal animal toxicity data to low doses. Prepared for U.S. EPA under contract
25 68-01-6826.
26
27 Jarabek, A.M., Hasselblad, V. (1992) Application of a Bayesian statistical approach to response
28 analysis of noncancer toxic effects. Toxicologist 12:98.
29
44
-------
1 Johnson B.L., J. Boyd, J.R. Burg, S.T. Lee, C. Xintaras, and B.E. Albright. 1983. Effects on the
2 peripheral nervous system of workers' exposure to carbon disulfide. Neurotoxicology 4(1): 53-66.
3
4 Kavlock, R.J., B.C. Allen, C.A. Kimmel, E.M. Faustman. (1995) Dose-response assessment for
5 developmental toxicity: IV. Benchmark doses for fetal weight changes. Fund. Appl. Toxicol.,
6 26:211-222.
7
8 Kavlock, R.J.; Schmid, J.E.; Setzer, R.W., Jr. (1996) A simulation study of the influence of study
9 design on the estimation of benchmark doses for developmental toxicity. Risk Analysis 16:391-403.
10
11 Kimmel, C.A., Gaylor, D.W. (1988) Issues in qualitative and quantitative risk analysis for
12 developmental toxicity. Risk Analysis 8:15-20.
13
14 Kimmel, C.A.; Wellington, D.G.; Farland, W.; Rose, P.; Manson, J.M.; Chernoff, N.; Young, J.F.;
15 Selevan, S.G.; Kaplan, N.; Chen, C.; Chitlik, L.D.; Siegel-Scott, C.L.; Valaoras, G.; Wells, S. (1989)
16 Overview of a workshop on quantitative models for developmental toxicity risk assessment.
17 Environmental Health Perspectives 79:209-2150.
18
19 Kimmel, C.A., M. Siegel, T.M. Crisp, and C.W. Chen (1996) Benchmark concentration (BMC)
20 analysis of 1,3-butadiene (BD) reproductive and developmental effects. Fund. Appl. Toxicol.
21 (Suppl., no. 1, part 2) 30: 146.
22
23 Kodell, R.L.; Chen, J.J.; Gaylor, D.W. (1995) Neurotoxicity Modeling for Risk Assessment.
24 Regulatory Toxicology and Pharmacology 22:24-29.
25
26 Krewski, D. (1990) Measuring carcinogenic potency. Risk Analysis 10(4):615-617.
27
28 Krewski,D ,Brown,C ,and Murdoch,D. Determining "safe" levels of exposure: safety factors or
29 mathematical models? Fund Appl Toxicol 4: S 383- S 394 ( 1984).
45
-------
1 Krewski, D., and Zhu, Y. (1994). Applications of multinomial dose-response models in
2 developmental toxicity risk assessment. Risk Anal. 14, 613-627.
3
4 Krewski, D.; Zhu, Y. (1995) A simple data transformation for estimating benchmark doses in
5 developmental toxicity experiments. Risk Analysis 15:29-39.
6
7 Krewski, D.; Szyszkowicz, M.; Rosenkranz, H. (1990) Quantitative factors in chemical
8 carcinogenesis: variation in carcinogenic potency. Regul. Toxicol. Pharmacol. 12:13-29.
9
10 Krewski, D.; Gaylor, D.; Szyszkowicz, M. (1991) Amodel-free approach to low-dose extrapolation.
11 Environ. Health Perspect. 90:279-285.
12
13 Kupper, L.L.; Hafner, K.B. (1989) How appropriate are popular sample size formulas? The
14 American Statistician 43:101-105.
15
16 Lefkopoulou, M.; Moore, D.; Ryan, L. (1989) The analysis of multiple binary outcomes:
17 Application to rodent teratology experiments. Journal of the American Statistical Association 84:
18 810-815.
19
20 Leisenring, W. and Ryan, L. (1992) Statistical properties at the NOAEL. Regulatory Toxicology and
21 Pharmacology 15: 161-171.
22
23 Liang, K-Y., and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models.
24 Biometrika 73, 13-22.
25
26 Linhart, H. and Zucchini, W. (1986). Model Selection. Wiley, New York.
27
28 McCullagh, P.; Nelder,J.A. (1989) Generalized Linear Models, Second Edition. Chapman and Hall,
29 London.
46
-------
1 Mantel,N and Bryan,WR. Safety testing of carcinogenic agents. J Natl Cancer Instit 27: 455-470
2 (1961).
3
4 Moolgavkar, S.H.; Knudson, A.G. (1981) Mutation and cancer: a model for human carcinogenesis.
5 J. Natl. Cancer Inst. 66:1037-1052.
6
7 Murrell, J.A., Portier, C. J., and Morris, R. W. (1998) Characterizing dose-response: I: Critical
8 assessment of the benchmark dose concept. Risk Analysis 18: 13-26.
9
10 National Research Council (NRC) (1983) Risk Assessment in the Federal Government: Managing
11 the Process. Prepared by: Committee on the Institutional Means for Assessment of Risks to Public
12 Health, Commission on Life Sciences. Washington, DC.
13
14 National Research Council (NRC) (1993) Issues in risk assessment. Washington: National Academy
15 Press, pp. 115-116.
16
17 National Research Council (NRC) (1994) Science and Judgment in Risk Assessment, Committee
18 on Risk Assessment of Hazardous Air Pollutants, Board on Environmental Studies and Toxicology,
19 Commission on Life Sciences, National Academy Press, Washington, DC.
20
21 National Toxicology Program (NTP) (1991) Technical report on the toxicology and carcinogenesis
22 of 1,3-butadiene (CAS No. 106-99-0) in B6C3F, mice (inhalation studies). U.S. Department of
23 Health and Human Services, Pubic Health Service, National Institutes of Health, National
24 Toxicology Program. NTP TR434, NIH Publ. No. 92-3165.
25
26 Peto, R.; Pike, M.C.; Bernstein, L.; Gold, L.S.; Ames, B.N. (1984) The TD50: a numerical description
27 of the carcinogenic potency of chemicals in chronic-exposure animal experiments. Environ. Health
28 Perspect. 58:1-8.
29
47
-------
1 Price and Berner, 1995. A benchmark dose for carbon disulfide: Analysis of nerve conduction
2 velocity measurements from the NIOSH exposure database. Report to the Chemical Manufacturers
3 Association Carbon Bisulfide Panel.
4
5 Research Triangle Institute (RTI) (1994) Determination of the no-observable-adverse-effect-level
6 (NOAEL) for developmental toxicity in Sprague-Dawley (CD) rats exposed to boric acid in feed on
7 gestational days 0 to 20, and evaluation of postnatal recovery through postnatal day 21. RTI
8 Identification Number 65C-5657-200.
9
10 Ross, G. J. S. (1990) Nonlinear Estimation. Springer-Verlag. New York.
11
12 Ryan, L. M, Catalano, P. J., Kimmel, C., and Kimmel, G. (1991). Relationship between fetal weight
13 and malformation in developmental toxicity studies. Teratology 44, 215-223.
14
15 Ryan, L. M. (1992a). The use of generalized estimating equations for risk assessment in
16 developmental toxicity. Risk Anal. 12, 439-447.
17
18 Ryan, L. 1992. Quantitative risk assessment for developmental toxicity. Biometrics 48:163-174.
19
20 Sawyer, C.; Peto, R.; Bernstein, L.; Pike, M.C. (1984) Calculation of carcinogenic potency from
21 long-term animal carcinogenesis experiments. Biometrics 40:2740.
22
23 Seber, G. A. F. and Wild, C. J. (1989) Nonlinear Regression. Wiley. New York.
24
25 Setzer,R.W.; Rogers, J.M.(1991)Assessingdevelopmental hazard: the reliability of the A/D ratio.
26 Teratology 44:653-665.
27
28 Simpson, D. G., Carroll, R. J., Zhou, H., Guth, D. J. (1996a) Interval censoring and marginal analysis
29 in ordinal regression. J. Agr. Biol. Environ. Statistics 1: 354-376.
48
-------
1 Simpson, D. G., Carroll, R. J., Zhou, H., Guth, D. J. (1996b) Weighted logistic regression and robust
2 analysis of diverse toxicology data. Commun. In Statist.-Meth. 25: 2615-2632,
3
4 Stone, M. (1998) Akaike's Criteria, in Encyclopedia ofBiostatistics, Armitage, P. and Colton, T.,
5 eds. Wiley, New York.
6
7 U.S. Environmental Protection Agency (EPA) (1986a) Guidelines for carcinogen risk assessment.
8 Federal Register 51(185):33992-34003.
9
10 U.S. Environmental Protection Agency (EPA) (1986b) Science Advisory Board Comments
11
12 U.S. Environmental Protection Agency (EPA) (1987) Hazardous substances; reportable quantity
13 adjustments; proposed rules. Federal Register 52(50):8140-8186.
14
15 U.S. Environmental Protection Agency (EPA) (1988a) Science Advisory Board Comments.
16
17 U.S. Environmental Protection Agency (EPA) (1988b) Science Advisory Board Comments.
18
19 U.S. Environmental Protection Agency (EPA) (1988c) Methodology for evaluating potential
20 carcinogenicity in support of reportable quantity adjustments pursuant to CERCLA section 102.
21 Washington: report no. EPA/600/8-89/053.
22
23 U.S. Environmental Protection Agency (EPA) (1989a) Reportable quantity adjustments; delisting
24 of ammonium thiosulfate; final rules. Federal Register 54(155):33418-33484.
25
26 U.S. Environmental Protection Agency (EPA) (1989b) Technical background document to support
27 rulemaking pursuant to CERCLA section 102, volume 3., Washington: Office of Solid Waste and
28 Emergency Response.
29
49
-------
1 U.S. Environmental Protection Agency (EPA) (1989c) Science Advisory Board Comments.
2
3 U.S. Environmental Protection Agency (EPA) (199la) Guidelines for developmental toxicity risk
4 assessment; notice. Fed Regist, 56:63798-63826.
5
6 US Environmental Protection Agency (EPA) (1991b) Regulatory impact analysis of proposed
7 national primary drinking water regulation for lead and copper. Prepared by Wade Miller
8 Associates, Inc. April.
9
10 U.S. Environmental Protection Agency (EPA) (1992) Draft report: a cross-species scaling factor for
11 carcinogen risk assessment based on equivalence of mg/kg3/4/day; notice. Federal Register
12 57(109):24152-24173.
13
14 U.S. Environmental Protection Agency (1994a) Ranking of pollutants with respect to hazard to
15 human health; proposed rule. Federal Register 59.
16
17 U.S. Environmental Protection Agency (1994b) Technical background document to support
18 rulemaking pursuant to the Clean Air Actsection 112(g): ranking of pollutants with respect to
19 hazard to human health. Research Triangle Park, NC: report no. EPA-450/3-92-010.
20
21 U.S. Environmental Protection Agency (1994c) Methods for derivation of inhalation reference dose
22 concentrations and application of inhalation dosimetry. Office of Health and Environmental
23 Assessment, Environmental Criteria and Assessment Office, Research Triangle Park, MC.
24 EPA/600/8-90/066F.
25
26 U.S. Environmental Protection Agency (1995a) Reportable quantity adjustments; final rule. Federal
27 Register 60(112):30926-30962.
28
29 U.S. Environmental Protection Agency (1995b) Technical background document to support
50
-------
1 rulemaking pursuant to CERCLA section 102, vol. 7. Washington: Office of Solid Waste and
2 Emergency Response.
3
4 U.S. Environmental Protection Agency (1995c) The use of the benchmark dose approach in health
5 risk assessment. Office of Research and Development, Washington, DC: EPA/630/R-94/007,
6 February.
7
8 U.S. Environmental Protection Agency (1995d) Health assessment document for diesel emissions.
9 Washington, EPA/600/8-90/057Bb.
10
11 U.S. Environmental Protection Agency (1995e) Benchmark dose concentration analysis for carbon
12 disulfide. Internal Report.
13
14 U.S. Environmental Protection Agency (EPA) (1995f) Proposed guidelines for neurotoxicity risk
15 assessment; notice. Fed Regist, 60:52032-52056
16
17 U.S. Environmental Protection Agency (EPA) (1995g) Manganese document
18
19 U.S. Environmental Protection Agency (1996a) Proposed guidelines for carcinogen risk assessment.
20 Federal Register 61(79):17960-18011.
21
22 U.S. Environmental Protection Agency (1996b) Guidelines for Reproductive Toxicity Risk
23 Assessment; notice. Federal Register (draft).
24
25 U.S. Environmental Protection Agency (EPA) (1996c) Integrated Risk Information System (IRIS).
26 Online. National Center for Environmental Assessment, Washington, DC.
27
28 Van Ryzin, J. (1980) Quantitative risk assessment. J. Occup. Med. 22(5):321-326.
29
51
-------
1 Venzon, D. J. and Moolgavkar, S. H. (1988) A method for computing profile-likelihood-based
2 confidence intervals. Appl. Statist. 37: 87-94.
3
4 Wartenberg, D.; Gallo, M.A. (1990) The fallacy of ranking possible carcinogen hazards using the
5 TD50. Risk Analysis 10(4):609-613.
6
7 West, R. W. and Kodell, R. L. (1999). A comparison of methods of benchmark-dose estimation for
8 continuous response data. Risk Analysis 19: 453 - 459.
9
10 Williams, D. A. (1975). The analysis of binary responses from toxicological experiments involving
11 reproduction and teratogenicity. Biometrics 31, 949-952.
12
13 Zeger, S. L.; Liang, K. Y. (1986) Longitudinal data analysis for discrete and continuous outcomes.
14 Biometrics 42: 121-130.
15
16 Zhu, Y.; Krewski, D.; Ross, W.H. (1994) Dose-response models for correlated multinomial data
17 from developmental toxicity studies. Applied Statistics 43:583-598.
18
19
52
-------
1 EXAMPLES
2
3
4 1. Introduction
5
6 The following examples were selected to illustrate some important aspects of computing
7 BMDs and BMDLs for single data sets and single endpoints. Of course, other decisions, not
8 illustrated here with examples, need to be made before a POD is determined, in particular which
9 endpoints and data sets to model, and how to select a POD from among several BMDLs.
10
11 2. Quantal Data: Selecting a Model
12
13 This example illustrates computing a benchmark dose for a simple quantal data set, using the
14 dose-response models available in BMDS. The main point is to illustrate selecting a benchmark
15 dose, given that the critical data set and benchmark response level have already been selected. In
16 addition, it provides some background into why, in four commonly used models for quantal data,
17 available in EPA's BMDS package (Weibull, log-logistic, log-probit, and gamma), a parameter
18 ("power" or "slope") is often constrained to be no less than 1.0.
19 Consider the following dose-response data:
20
21 Dose Number Affected Fraction Affected Number of Animals
22
23
24
25
26 We want to compute a benchmark dose and BMDL for an extra risk of 0.10 (as suggested
27 by this document), using a one-sided 95% confidence interval. If we define the BMD to correspond
28 to an extra risk of 0.10 (= BMR), then, ifP(BMD) is the proportion of affected animals at the BMD,
53
0
21
60
1
15
20
0.02
0.31
0.44
50
49
45
-------
P(BMD)-P(Q)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
auu r ( i// ib me pi upui iiuii 111 me euiiu
UlgiUU
p, D1V1J\ Ib UC11I1CU
can be rearranged to yield P(5MZ)) = P(0) + [l - P(0)]SMfl
IUUC DM I
i =
. i
ins
). Since we are looking for a BMR of
0. 1 0, that will correspond to a response of 0.02 + (0.98 * 0. 1) = 0. 1 1 8. Notice that
animals were affected in the lowest
Thus the expected response
31% of the tested
non-control dose.
at the
response. We need to be aware that model
First, we fit a number of models to
Results of fitting the models
BMD is substantially lower
than the
lowest observed
choice will have some effect on the BMD calculation.
the data.
, sorted in order of increasing AIC [
is the log-likelihood at the maximum likelihood estimates, and p is the
= -2 x (LL - p), where
LL
degrees of freedom of the
model; generally everything else being equal, lower AIC values are preferred]:
Model
log-logistic (slope si)
log-probit (unconstrained)
Weibull (unconstrained)
log-logistic (unconstrained)
gamma (unconstrained)
Multistage (degree=2)
gamma (power si)
Weibull (power si)
log-probit (slope s 1)
probit
logistic
x2
0.93
0
0
0
0
2.27
2.27
2.27
6.05
7.83
8.30
1 Degrees of freedom are 0, since there are
P-value
0.34
NA1
NA
NA
NA
0.13
0.13
0.13
0.0139
0.0051
0.004
three dose groups
AIC
136.907
137.995
137.995
137.995
137.995
138.17
138.17
138.17
141.692
144.448
145.179
BMD
7.21
2.75
1.71
2.25
1.33
9.29
9.29
9.29
14.82
19.50
20.95
and three estimated
BMDL
4.93
NA
NA
NA
~0
6.92
6.92
6.92
11.53
15.71
16.78
parameters.
Eight of the models have chi-squared values that exceed the recommended cutoff P-value of
0.1 (this includes four models with
perfect
fits, even though
their P-values are undefined because
54
-------
1
2
3
4
5
6
7
8
9
10
11
12
there are no degrees of freedom left to test the chi-square statistic). The model with the best AIC is
the log-logistic model with slope parameter constrained to be no less than 1 . For this model, the
standardized residuals [i.e.,( observed value - expected value)/standard error] are all small:
Dose Est._Prob. Expected Observed Size Residual
0.0000 0.0218 1.091 1 50 -0.0881
21.0000 0.2609 12.784 15 49 - 0.7208
60.0000 0.4917 22.125 20 45 -0.6335
and a visual examination seems OK, since the predicted curve comes well within the confidence
limits for each data point:
0.6
0.5
a 0.4
»
| 0.3
o
o
2 0.2
0.1
0
Log-Logistic Model with 0.95 Confidence Level
.'...' ' r - 1 1
Log-Logistic
^^^^
^^^^^
/^ :
/
BMDL BMD .....
0 10 20 30 40 50 60
dose
16:05 09/08 2000
Figure A-2.1 Example data with 95% confidence limits,
and constrained log-logistic model fit.
13 Four other models have only slightly greater AIC values and perfectly fit the data, the models
14 with unconstrained slope or power parameters. Their AIC values are greater than that for the
15 constrained log logistic only because an extra parameter counts against them: BMDS does not
16 assign a model degree of freedom to parameters that end up on a constraint, so that model has
17 only 2 degrees of freedom, while the models with unconstrained parameters have 3. The BMDs
18 computed from the unconstrained models differ slightly among themselves, but are all quite a bit
19 smaller than that computed from the constrained log-logistic, and, finally, there seems to be a
20 problem with computing a BMDL for those models. Nevertheless, these models also describe
55
-------
1
2
3
the data quite well, as the following graph of the unconstrained log-logistic model fit attests:
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Log-Logistic Model
0.6
0.5
» 0.4
o
I
< 0.3
0.1
Log-Logistic
10
20
30
dose
40
50
60
16:3709/082000
Figure A-2.2 Example data, 95% confidence limits, and
unconstrained log-logistic model fit.
The main difference between the two log-logistic curves is in the region between the control and
the lowest dose, where the unconstrained model curves upward more sharply than does the
constrained model, which accounts for the lower BMDs from these models.
Finally, three models, the second-degree multistage, constrained Weibull, and constrained
gamma, all give exactly the same fit and BMD prediction: in fact, for these data, they are really
the same model. While the P-value for the fit is approaching the recommended cutoff, the AIC is
only slightly worse than that for the unconstrained models. The predicted values and residuals
are summarized in the table below:
Scaled
Dose Est._Prob. Expected Observed Size Residual
0.0000
21.0000
60.0000
0.0251
0.2318
0.5064
1.257
11.356
22.787
1
15
20
50
49
45
-0.232
1.234
-0.831
The fit at the lower two doses is a little worse than it was for the constrained log-logistic,
56
-------
1
2
and this is apparent with close inspection of the graph, shown in Figure A-2.3.
Multistage Model with 0.95 Confidence Level
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
60
17:5909/082000
Figure A-2.3 Example data with 95% confidence limits,
and second degree multistage model fit.
The primary question to be addressed here is, "Which model should I use to compute the
BMD and BMDL". In this case, since the AIC of the constrained log-logistic model is slightly
below those for the other models, the constrained log-logistic model can be considered preferable
to them. However, the three lowest AIC values in the table above are so similar that it might be
tempting to consider the models with the perfect fit to the data, even though this guidance
recommends against using models such as the Weibull or log-logistic without constraining the
power parameter or the slope parameter to be no less than 1.0. The rest of the narrative of this
example is devoted to showing why allowing the slope parameter to be less than 1.0 might not be
such a good idea.
The answer centers on the interpretation of BMDLs and how they are computed. When
BMDLs are computed using the profile likelihood approach, this is particularly easy to visualize.
In this case, conceptually at least, the BMD is treated as a parameter in the dose-response model,
and, for each in a range of BMD values, the other parameters in the model are adjusted to
maximize the log-likelihood while keeping the BMD constant at the selected value. The
resulting curve, plotting the log-likelihood as a function of BMD value, is called a profile
likelihood. This curve has a maximum at the BMD that corresponds to the maximum likelihood
estimates for all the parameters, and drops off for values above and below that point. The BMDL
57
-------
1 for a (1 - a) x 100% confidence interval is the BMD value where the log-likelihood is reduced
2 from the maximum value by (x2idfi2ay2. Since we compute one-sided confidence intervals, we
3 need only consider the shape of the curve below the maximum likelihood estimate for the BMD.
4 The upper left hand panel of Figure A-2.4 shows this half of the profile likelihood for the BMD
5 for the constrained log-logistic model fitted to the example data. The horizontal line indicates
6 the critical value of the log-likelihood for determining a 95% confidence limit.
7 Since each BMD value on the x-axis of the figure has corresponding model parameter
8 estimates, we can examine the plausibility of the dose-response curves we are claiming are
9 consistent with the data. The upper right panel of Figure A-2.4 shows this for some BMD values
10 including the maximum likelihood estimate (lowest dose-response curve) and the lower
11 confidence limit (highest dose-response curve). Although clearly the range of curves does not
12 exhaust the set of plausible dose-response curves one might consider for these data, they are
13 certainly all plausible shapes. So, not only does the maximum likelihood fit of the model to the
14 data represent a plausible dose-response shape, so do all the models between that and the model
15 implied by the lower confidence bound on the BMD.
16 The story is different for the unconstrained log-logistic model, illustrated in the lower two
17 panels of Figure A-2.4. First of all, the profile likelihood is substantially flatter for this model. It
18 never even achieves the necessary drop in log-likelihood for there to be a lower 95% confidence
19 limit, indicated by the horizontal line (the lowest, left-most point on the curve is the limiting
20 value as BMD approaches 0). This explains why there is no BMDL for this model in the table:
21 the confidence limit includes 0! The lower right panel shows the dose-response curves that
22 correspond to the BMD values indicated by X's in the lower left panel. While the maximum
23 likelihood fit may be a plausible fit to the data, the curves become increasing implausible as
24 BMD drops, with the curve shooting up more and more rapidly from the control response as the
25 BMD for the model is reduced. The log-likelihood is never reduced very much, because there is
26 little evidence for trend in the responses at the two non-control doses. Indeed, at the limiting
27 value for the BMD, 0, the curve is discontinuous: the control is fit perfectly, and the non-control
28 responses are fit by a horizontal line, and the log-likelihood is not reduced sufficiently to reject
29 this model as a plausible fit to the data! This situation often occurs when models such as the log-
58
-------
1 logistic (also log-probit) are fit without constraining the slope parameter, and the Weibull,
2 gamma, Hill, or power models are fit without constraining the power parameter. The
3 implausibility of the curves that sometimes result when such models are fit to data is why this
4 document recommends that such models not be used with unconstrained power or slope
5 parameters, or only with great care.
6 In conclusion, for the reasons stated above, the log-logistic model, with the slope
7 constrained to be greater than one, is selected as the preferred model for these data. This gives a
8 BMDof7.2and BMDL of 4.9 for an extra risk of 10% for this dataset.
59
-------
-66.0 H
-0-66.5
o
o
|67.0
I67-5
O
-68.0 H
Q)
"o
I
C
g
"o
co
LL
0.6
0.5
0.4
0.3
0.2
0.1
0.0
I I I I I I
0 20 40 60
BMD
Dose
-66.0 H
-n-66.5
o
o
S-67.0
-68.0 H
)*C
XN
0.0
1.0
2.0
-------
1
2 3. Continuous Data: Getting a Good-Fitting Model
3
4 This example illustrates some of the care required when using non-linear modeling
5 software and some of the data manipulation that may be required to get an adequate model fit for
6 computing a BMD and BMDL. Several points are being made here: (1) convergence of a
7 nonlinear model does not guarantee that maximum likelihood estimates have been achieved;
8 sometimes some common sense and refitting is required to get MLEs; (2) even once maximum
9 likelihood estimates have been achieved, the model may not fit well enough, and other actions
10 may need to be taken to get a better fitting model; (3) one of the BMRs for this example is 5% of
11 the dynamic range of the response (Murrell et al, 1998, suggest that the fraction of the dynamic
12 range of a continuous variable may often be a good quantification of the biological significance
13 of the change); sometimes it may require some common sense and ingenuity to compute the
14 BMD corresponding to such a BMR. NOTE: Some of the behavior of this example depends on
15 the way the April 3, 2000 version of the Hill model from BMDS, selects its initial values. Other
16 software, and even later versions of the Hill model from BMDS, may well behave differently on
17 these data. This does not indicate "bugs" in the software, but rather that, for some datasets, there
18 can be multiple "local maxima" for the likelihood function; software that uses purely local
19 methods for optimization (as does BMDS) can get trapped at a local maximum, and may require
20 experimenting with alternative initial parameter values to assure convergence to a true global
21 maximum of the likelihood function. Software packages differ in the algorithm used to select the
22 starting parameter values for optimization, so may end up in different local maxima.
23 For this example, consider the following data set:
24 Dose subject/group Mean Std. Dev.
25
26
27
28
29
61
0.
0.3
1.
3.
10.
8
8
8
8
8
100.
98.24
111.34
172.16
357.48
30.4
49.8
59.9
58.4
167.5
-------
Dose subject/group
Mean
Std. Dev.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
30.
100.
300.
8
8
8
1695.03
1576.11
1896.22
260.9
169.7
141.7 '
The data represent a biochemical response in rats after dosing. For this example, we will
compute a BMD as the dose where the response has increased over background by an amount of
5% of the range between the background and the maximum response, per the suggestion of
Murrell, et al. (1998), as well as the dose where the mean has been displaced by one control
standard deviation, as this document suggests. As can be seen from Figure A-3.1, the dose-
response is clearly sigmoidal.
1
f.
Mean Re
ŁŁUU
2000
1800
1600
1400
1200
1000
800
600
400
200
0
_e
I
I
-
!!
0 0 50 100 150 200 250 300 3E
dose
16:3109/112000
Figure A-3.1: Mean and 95% confidence intervals for
example data.
It is natural in such data to fit a flexible model that allows a sigmoidal response; the Hill
model is one such model, available in BMDS. Since it is usual in biochemical data for the
variance to be proportional to the square of the mean (approximately), and since it looks as if the
variance is larger in this data set for larger means, in general, for this example, we fit a model to
the data in which the variance is modeled as being proportional to the power of the mean. That
is, our model is:
62
-------
Yd"
k" + d"
2
3 where d represents dose, fi(d) represents the mean response, and c?(d) represents the variance of
4 the observations at dose d. Rough estimates of this model can be read off the graph of the data,
5 and this provides a useful check of the fitting algorithm. When we fit a Hill model to the
6 example data, we would expect A (intercept) to be around 100, since that is about the background
7 level of the response, V should be around 1600, since that is about the increment at the highest
8 doses over the background level, k represents the dose where half the response has occurred, and
9 should be in the range of 10 - 30. Furthermore, based on experience, n should be relatively
10 small, say between 1 and 10, and p ought to fall between 1 and 2, or so, since it is common for
1 1 variances to be proportional to the square of means in such data.
12 If that model is fit to these data using the April 3, 2000 version of the Hill model from
13 BMDS (the current version as of this writing), the fitting algorithm apparently converges on a
14 solution. The parameter estimates from this solution are:
15
16 Variable Estimate Std. Err.
17 alpha 4381.57 2211.67
18 rho 0.266572 0.0668979
19 intercept 105.045 22.8759
20 v 1634.05 51.087
21 n 4.76591 1.62145
22 k 14.256 1.80324
23 Note that all the estimates are in their expected ranges except for the estimate of p (rho), which is
24 0.27, though we said we would have expected a value in the range 1-2.
25
63
-------
1
2
3
A
H
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
The resulting predicted values are:
Dose N Obs Mean Obs Std Dev Est Mean Est Std Dev Chi^2 Res.
0 8 100 30.4 105 123
0.3 8 98.2 49.8 105 123
1 8 111 59.9 105 123
3 8 172 58.4 106 123
10 8 357 168 360 145
30 8 1.7e+003 261 1.69e+003 178
100 8 1.58e+003 170 1.74e-t-003 179
300 8 1.9e+003 142 1.74e+003 179
While the model predicts the mean values and the standard deviations at
-0.115
-0.156
0.138
1.518
-0.059
0.028
-2.570
2.483
the higher doses
pretty well, the standard deviations at the lower doses are overestimated by factors of 2 to 4. For
future reference, the log-likelihood for this model fit is -345.786.
This may be the best this model can do, but it looks suspiciously like the
fitting algorithm
got caught in a local maximum of the likelihood surface, and that, perhaps, if we could get better
initial values for some of the parameters, we could get a better set of estimates.
Since the model
for the mean seems to describe the data pretty well, we will restart the model, selecting the old
estimates as initial values for the parameters of the model for the mean, and get new starting
values for estimating the variance function parameters. These new estimates will come from
regressing the log of the observed variance (that is, the square of the standard deviation), on the
log of the observed mean (that is, log(var) = log(«) + plog(mean) ). The
estimates from this regression are: p=\.0, log(ar)=3.166, so the estimate of aris e
Starting from these new values, the final estimates are:
Variable Estimate Std. Err.
alpha 24.8892 24.5755
rho 1.04671 0.162142
intercept 117.097 10.798
v 1629.2 64.9209
n 4.18855 1.33386
k 14.8385 1.86453
parameter
3-166 = 23.7.
64
-------
1
2
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
and the new
Dose
0
0.3
1
3
10
30
100
300
The!
predicted values:
N Obs Mean
8 100
8 98.2
8 111
8 172
8 357
8 1.76+003
8 1.586+003
8 1.96+003
BMD = 7
BMDL = 5.
log-likelihood for this
Obs Std
30.4
49.8
59.9
58.4
168
261
170
142
.3467
96733
fit is -333.
Dev Est Mean
117
117
117
119
379
1.676+003
1.75e+003
1.75e+003
127, a substantial in
Est Std
60.3
60.3
60.3
60.9
112
242
248
248
iprovement
Dev ChiA2 Res.
-0.797
-0.882
-0.281
2 .462
-0.556
0.351
-1.939
1.711
over the previous
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
fit. Furthermore, now not only do the estimated means accord with those observed, but the
estimated standard deviations are a lot closer to those observed. Most likely, the current
estimates are really the maximum likelihood estimates for this model and this dataset.
However, even though the fit is improved, neither the variance model (see the result of
Test 3, below) nor the model for the mean (result of Test 4, below) fits the data, as the following
excerpt from BMDS output for this example illustrates:
Likelihoods of Interest
Model
Al
A2
A3
fitted
R
Log (likelihood)
-343 .706
-317.77
-324.533
-333.127
-458.043
DF
9
16
10
6
2
DF
9
16
10
6
2
AIC
705.412
667.539
669.065
678.253
920.086
Explanation of Tests
65
-------
1 Test 1: Does response and/or variances differ among Dose levels?
2 (A2 vs. R)
3 Test 2: Are Variances Homogeneous? (Al vs A2)
4 Test 3: Are variances adequately modeled? (A2 vs. A3)
5 Test 4: Does the Model for the Mean Fit? (A3 vs. fitted)
6
7 Tests of Interest
8
9 Test -2*log(Likelihood Ratio) Test df p-value
10
11 Test 1 280.547 14 <.0001
12 Test 2 51.8732 7 <.0001
13 Test 3 13.5263 6 0.0354
14 Test 4 17.1876 4 0.001777
15
16 What is going on? The table of fitted values, above (particularly the column labeled
17 "chiA2 residuals") shows that the current model seriously underpredicts the response at a dose of
18 3, and misses the response at the two highest doses on either side. Furthermore, the model over
19 predicts the standard deviation at the two highest doses (which is probably why the model for the
20 variance is rejected). It is the under prediction at the lower doses that is most important,
21 however, because that is in the region of the BMD, as far as this model can tell.
22 The three highest doses, at 30, 100 and 300, are quite far from the BMD; if we drop those
23 doses, we will be eliminating doses whose responses the model cannot account for very well,
24 and, since they are far from the BMD, we should not be eliminating much information about the
25 actual location of the BMD. Furthermore, since the responses on the plateau have all been
26 dropped, other monotonic dose-response models can be fit to the data. We consider three here:
27 the Hill, a first degree polynomial (adding higher degree terms to the model did not add
28 significantly to the ability of the model's ability to fit the data; the model used is
29 }Ji(d) = /30 + $ d ), and the power model ( }J,(d) = j30 + j3, d r).
30 However, one of the BMRs we want to calculate is based on a change in the mean
31 response equal to 5% of the range of the response (that is 5% of the maximum value minus the
32 minimum value). In the Hill model, the BMD and BMDL corresponding to this change can be
33 computed directly by the software in BMDS, but this is not so for the other models (since those
66
-------
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
models to not allow for a horizontal asymptote). Furthermore, since this reduced data set really
contains no information about the maximum response, even the Hill model's estimate of that is
suspect (the estimate of the maximum value from the model reported in the above table is
ridiculously large: 143289; with a huge standard error: 5.8 x 108, so it is clearly not useful for
setting a BMR). The way around this is to calculate 5% of the observed dynamic range for this
endpoint, and look for the dose that would result in an absolute change of this amount. The
minimum value, based on the variance-weighted mean of the lower two dose groups, is 99.51,
and the maximum value, based on the variance-weighted mean of the upper three dose groups, is
1758.3; 5% of the difference of the two is 82.9.
Model
polynomial
power
Hill
GOF P-value
0.98
0.95
0.76
AIC
5% Dyn.
BMD
375.46 3.23
377.35 3.46
379.35 3.46
Range
BMDL
1 SD
BMD
2.46 1.46
2.47 1.66
2.47 1.70
Change
BMDL
1.11
1.11
1.14
All three models fit the data well, according to both the summary results reported here and a
more detailed examination of the graphs and residuals (not shown here), but the AIC for the
polynomial model is somewhat better than that for the other two, so that is the model to choose
to calculate the BMD and BMDL. That is, the BMD and BMDL based on 5% of the dynamic
range of the response are 3.23 and 2.46; based on a one standard deviation change, 1.46 and 1.11.
This example illustrates three points, none of which is specific to modeling continuous
data: (1) it is important to exercise some judgment when fitting models to data; no software
package can guarantee that the parameters returned are actually maximum likelihood estimates,
and the analyst may have to do some "tweaking" to get an acceptable answer; (2) we want
models that describe the data well in the region of the BMR/BMD, which may involve some
judicious narrowing of the dose range we attempt to model; (3) it may be necessary to exercise
some creativity to compute BMDs for the BMR we want, and what scientific and risk analytic
judgment dictate as desirable answer should not be subservient to what the software can do.
67
-------
1
2 4. Cancer Bioassay Data: Modeling POD for Cancer Slope Factor
3 For cancer response modeling from standard cancer bioassay data, U.S. EPA is
4 developing a specific algorithm which will be included in the BMDS package. The algorithm
5 uses a multistage (polynomial) model with some constraints. As this model is under
6 development at the time of this writing, the standard BMDS version of the multistage model will
7 be used for the purposes of this example. Under EPA's proposed 1996 Guidelines for
8 Carcinogen Risk Assessment, quantitative risk estimates from cancer bioassay data are typically
9 calculated by modeling the data in the observed range to estimate a BMDL for a BMR of 10%
10 extra risk, which is generally at the low end of the observable range for standard cancer bioassay
11 data. This BMDL then serves as the "point of departure" for linear extrapolation or a nonlinear
12 quantitative approach, as warranted by the mode of action of the carcinogen.
13 This example uses the dose-response data presented in EPA's 1988 Health and
14 Environmental Effects Document for Dibromochloromethane for the quantitative estimate of
15 carcinogenic risk from oral exposure. The rationale for study selection and endpoint selection,
16 while important components of any comprehensive write-up of a BMD calculation, are beyond
17 the scope of this quantitative example.
18
19 BMD Modeling for Dibromochloromethane
20
21 tumor type: hepatocellular adenoma or carcinoma
22 test animal: B6C3F1 mouse, female
23 route of exposure: gavage
24 study: NTP, 1985
25
26
68
-------
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
DOSE-RESPONSE DATA
administered human equivalent tumor
dose (mg/kg/day) dose (mg/kg/dav) incidence
0
50
100
0
2.83
5.67
6/50
10/49
19/50
As discussed above, the multistage model was used because it is considered the default model for
cancer bioassay data; although, in the future there will be a specific algorithm for modeling such
cancer data. Similarly, a BMR of 10% extra risk was used, as is typical for standard cancer
bioassay data.
BMR: 10%
model: multistage, extra risk
First, a second-degree (i.e., n-1) multistage model is fit to the data.
model form: background + (1-background) * [!-EXP(-betal*doseAl-beta2*doseA2)]
parameter
background
beta (1)
beta (2)
AIC = 158.688
p-value = 1
Chi2 = 0
residuals = 0
estimate (MLEs)
0.12
0.00930036
0.00925286
std.error
0.132665
0.141898
0.0246904
69
-------
Multistage Model with 0.95 Confidence Level
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Figure A-4.1 Second-degree multistage model.
BMD (ED10) = 2.91 mg/kg/day
BMDL (LED10; 95% confidence limit estimated by likelihood profile) = 1.25 mg/kg/day
The second-degree model provides a good fit. Next, a first-degree multistage model is fit to the
data to see if a more parsimonious model can also provide an adequate fit.
model form: background + (1-background) * [l-EXP(-betal*doseAl)]
parameter
background
beta(l)
estimate (MLEs) std.error
0.111488 0.120556
0.0559807 0.0391492
70
-------
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
AIC=157.
272
p-value = 0.4446
Goodness of Fit:
Dose Est. Prob. Expected Observed Size ChiA2 Residuals
0.0000 0
2.8300 0
5.6700 0
Chi-square
.1115 5.574 6 50 0.086
.2417 11.842 10 49 -0.205
.3531 17.657 19 50 0.118
= 0.57 DF = 1 P-value = 0.4494
BMD (ED10) = 1.88 mg/kg/day
BMDL (LED10; 95% confidence limit estimated by likelihood profile) = 1.20 mg/kg/day
Multistage Model with 0.95 Confidence Level
0.5
"o
§
< 0.3
o
"o
O3
0.2
0.1
Multistage
BMDL
BMD
1
3
dose
Figure A-4.2 First-degree multistage model.
71
-------
1 The AIC is lower for the first-degree model suggesting that this is the preferred model; however,
2 because the multistage model is really a family of k-degree models, a likelihood ratio test can be
3 used to evaluate whether the improvement in fit afforded by estimating additional parameters is
4 justified. In this case, the log likelihood for the second-degree model was -76.3439 and for the
5 first-degree model was -76.6361. Thus twice the absolute difference in the log likelihoods is less
6 than 3.84, i.e., a Chi-square with one degree of freedom (i.e., 2-1), suggesting that the first-
7 degree multistage model is not significantly different from the second-degree model. Under the
8 recommendations of the benchmark dose guidance, the more parsimonious first-degree model
9 would be generally preferred. Final judgement on this may be subject to endpoint-specific
10 guidance.
11
12 References
13
14 NTP (National Toxicology Program). 1985. Toxicology and carcinogenesis
15 studies of chlorodibromomethane (CAS No. 124-48-1) in F344/N rats and B6C3F1
16 mice (gavage studies). NTP Tech. Report Series No. 282. NTIS PB 86-166675.
17
18 U.S. EPA. 1988. Health and Environmental Effects Document for
19 Dibromochloromethane. Prepared by the Office of Health and Environmental
20 Assessment, Environmental Criteria and Assessment Office, Cincinnati, OH. ECAO-CIN-
21 GO40.
22
23
72
-------
1
2 5. Developmental Toxicity Example
3
4 In general, data from developmental toxicity studies in rodents are best modeled using
5 nested models. These models account for any intralitter correlation, or the tendency of
6 littermates to respond similarly to one another relative to the other litters in a dose group. If this
7 correlation (which may vary with dose) is not estimated, variance estimates, and hence the
8 confidence limits on benchmark responses and doses, will generally be misspecified.
9 This example uses dose-response data reported by George et al. (1992), regarding the
10 developmental toxicity of ethylene glycol diethyl ether administered orally to mice. As with the
11 other examples in this guidance, this example illustrates fitting a model to one dose-response
12 pattern. Note that the rationale for study selection and endpoint selection, while important
13 components of any comprehensive BMD calculation write-up, are beyond the scope of this
14 quantitative example.
15 The outcome modeled was prevalence of malformations, a quantal endpoint. The nested
16 logistic model was considered for the purpose of illustrating fitting these quantal, nested data.
17 Elements of the analysis addressing the reporting requirements in Section II.D. are documented
18 in Table A-4.1, including a brief description of the experiment. The model input and model
19 output data are summarized in Table A-4.2.
20 The nested logistic model demonstrated a reasonably good visual fit to the mean
21 responses of the dose groups (not shown), but the goodness of fit p-value was 0.061, less than the
22 value of 0.10 recommended in Section II.E. Since the coefficients which gauge the influence of
23 litter size in predicting the response rate were fairly close to zero (0.0013 and -0.1507,
24 respectively, not shown), suggesting that litter size was not important in this case, the model was
25 re-fit without litter size. The resulting fit yielded a p-value of 0.184 ©, adequate for supporting
26 BMD evaluation. Its AIC (at 450.6) was also slightly lower than the first fit (at 452.5).
27 Another variation on this model was also fitted, setting the intralitter correlations (the
28 coefficients phil - phi5) to zero. This fit was not successful, with a goodness of fit p-value of 0
29 and an AIC of 570.4 (compare to 450.6, above). The intralitter correlations are therefore
30 important for describing the observed variability in this data set.
73
-------
The fitted model and the mean responses by dose group are shown in Figure A-5.1. The
Nested Logistic Model with 0.95 Confidence Level
2
3
4
5
6
7
8
9
10
11
12
200
400 600
dose
800
1000
Figure A-5.1 Developmental Example Model Fit.
results for the selected nested logistic fit (the second fit described above) are provided in Table
A-5.2.
Reference
George JD, Price CJ, Marr MC, Kimmel CA, Schwetz BA, Morrissey RE. (1992). The
Developmental Toxicity of Ethylene Glycol Diethyl Ether in Mice and Rabbits. Fund. App. Tox.
19:15-25.
74
-------
1
2
3
4
5
6
7
8
9
10
11
12
13
Table A-5. 1 : Summary of benchmark dose estimate, and key to Table 4-2
Study
Dose-
Response
Model
Choice of
BMR
Benchmark
Dose
Graphics
Title/Identifier
Rationale for study selection
Rationale for endpoints (effects)
List dose response data used
Form
Rationale
Estimation procedure
Estimates of model parameters with
standard errors
Goodness-of fit test statistics
Standardized residuals
Rationale
Lower Confidence Limit Procedure
HMD
BMDL
Data points
Fitted dose-response model
Confidence limits for fitted curve
Ethylene glycol diethyl ether, administered to mice
by gavage (in mg/kg/day), days 6-15 of gestation
(George et al., 1992)
Selected by developmental toxicologist as an
adequate study
Skeletal malformations - Developmental toxicologist
selected as an important endpoint
See ź in Table 4-2
Nested logistic model in BMDS package, see (D in
Table 4-1
Fits a wide variety of dose-response shapes for nested
data
Maximum likelihood
See ©
See ©, ź, ©
See ©,©
Quantal data, used default 10% extra risk level.
Likelihood profile
485 mg/kg/day (ź)
410 mg/kg/day (ź)
See mean response rates and confidence limits in
Figure 4-1
See Figure 4-1
Not provided
75
-------
Table A-5.2: Output from Model Run (EPA BMDS NLogistic Model. Revision: 2.6, Date: 2000/03/03)
The probability function is:
Prob. = alpha + thetal*Rij + [1 - alpha - thetal*Rij]/ 1+exp(-beta-theta2*Rij-rho*log(Dose))],
where Rij is the litter specific covariate.
Restrict Power rho >= 1.
Total number of observations = 105
Total number of records with missing values = 0
Total number of parameters in model = 10
Total number of specified parameters = 2
Maximum number of iterations = 250
Relative Function Convergence has been set to: le-008
Parameter Convergence has been set to: le-008
User specifies the following parameters:
thetal = 0
theta2 = 0
Default Initial Parameter Values
alpha =
beta =
thetal =
theta2 =
rho =
phil =
phi2 =
phi3 =
phi4 =
phis =
0.0370596
-36.6368
0
0
5.56873
0.64095
0.999996
0.159461
0.284719
0.231641
Specified
Specified
Variable
alpha
beta
rho
phil
phi 2
phi 3
phi4
phi5
Parameter Estimates
Estimate
0.0370596
-36.6368
5.56873
0.64095
0.999996
0.159461
0.284719
0.231641
Std. Err.
0.0142364
0.289861
242.744
0.107174
0.13603
0.130185
0
0
AIC:
450.56
Litter Data
Dose
0,
0
0.
0,
0.
0,
0.
0
0
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
Lit. -Spec.
Cov.
6,
8.
8,
9.
9,
10.
10,
11,
11.
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
Est.
0.
0.
0.
0.
0.
0.
0.
0.
0.
. _Prob .
.037
.037
,037
.037
.037
.037
,037
.037
.037
Litter
Size
6
8
8
9
9
10
10
11
11
chi- squared
Expected Observed Residual
0
0
0
0
0
0
0
0
0
.222
.296
.296
.334
.334
.371
.371
.408
.408
0
0
0
0
0
0
0
0
0
-0
-0
-0
-0
-0
-0
-0
-0
-0
.2343
.2369
.2369
.2378
.2378
.2385
.2385
.2390
.2390
76
-------
Table A-5.2: Output from Model Run (EPA BMDS NLogistic Model. Revision: 2.6, Date: 2000/03/03)
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000
50.0000 -
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000'
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
150.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
500.0000
11.0000
11.0000
11.0000
11.0000
11.0000
11.0000
12.0000
14.0000
14 .0000
14.0000
15.0000
15.0000
15.0000
2.0000
5.0000
9.0000
9.0000
9.0000
10.0000
10.0000
11.0000
12.0000
12.0000
12.0000
12.0000
13.0000
13.0000
13.0000
13.0000
13.0000
14.0000
15.0000
3.0000
10.0000
10.0000
11.0000
11.0000
11.0000
12.0000
12.0000
12.0000
12.0000
12.0000
12.0000
13.0000
13.0000
13.0000
13.0000
13.0000
14.0000
14.0000
15.0000
18.0000
6.0000
8.0000
9.0000
10.0000
10.0000
10.0000
11.0000
11.0000
11.0000
11.0000
11.0000
11.0000
11.0000
12.0000
12.0000
12.0000
12.0000
12.0000
12.0000
13.0000
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.037
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
0.149
11
11
11
11
11
11
12
14
14
14
15
15
15
2
5
9
9
9
10
10
11
12
12
12
12
13
13
13
13
13
14
15
3
10
10
11
11
11
12
12
12
12
12
12
13
13
13
13
13
14
14
15
18
6
8
9
10
10
10
11
11
11
11
11
11
11
12
.12
12
12
12
12
13
0.408
0.408
0.408
0.408
0.408
0.408
0.445
0.519
0.519
0.519
0.556
0.556
0.556
0.074
0.185
0.334
0.334
0.334
0.371
0.371
0.408
0.445
0.445
0.445
0.445
0.482
0.482
0.482
0.482
0.482
0.519
0.556
0.112
0.372
0.372
0.409
0.409
0.409
0.447
0.447
0.447
0.447
0.447
0.447
0.484
0.484
0.484
0.484
0.484
0.521
0.521
0.558
0.670
0.893
1.191
1.340
1.489
1.489
1.489
1.638
1.638
1.638
1.638
1.638
1.638
1.638
1.787
1.787
1.787
1.787
1.787
1.787
1.936
0 -0.2390
0 -0.2390
0 -0.2390
0 -0.2390
0 -0.2390
0 -0.2390
0 -0.2395
0 -0.2403
0 -0.2403
4 1.6122
0 -0.2406
0 -0.2406
0 -0.2406
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.1962
0 -0.2965
1 0.6722
0 -0.3984
5 4.5396
4 3.5507
0 -0.4048
1 0.5086
0 -0.4104
0 -0.4104
0 -0.4104
0 -0.4104
0 -0.4104
1 0.4431
0 -0.4153
0 -0.4153
0 -0.4153
0 -0.4153
0 -0.4196
0 -0.4196
1 0.3352
0 -0.4330
0 -0.6581
0 -0.6839
6 2.4099
2 0.2404
0 -0.7008
0 -0.7008
7 2.3153
4 1.0199
3 0.5881
2 0.1563
1 -0.2755
0 -0.7073
0 -0.7073
4 0.8828
1 -0.3139
0 -0.7128
0 -0.7128
0 -0.7128
1 -0.3139
6 1.5066
77
-------
Table A-5.2: Output from Model Run (EPA BMDS NLogistic Model. Revision: 2.6, Date: 2000/03/03)
1 500.
2 500.
4 1000.
5 1000.
6 1000.
7 1000.
8 1000.
9 1000.
Q 1000.
1 1000.
2 1000.
3 1000.
4 1000.
5 1000.
6 1000.
7 1000.
8 1000.
9 1000.
20 1000.
l\ 1000.
12 1000.
'3 1000.
'4 1000.
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
13.
15.
3.
3.
3.
3.
3.
9.
9.
9.
10.
10.
10.
10.
11.
11.
11.
12.
12.
12.
13.
13.
14.
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000 ,
0000
0000
0000
0000
0000
0000
0000
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
0.
149
149
867
867
867
867
857
867
867
867
867
867
867
867
867
867
867
867
867
867
867
867
867
13
15
3
3
3
3
3
9
9
9
10
10
10
10
11
11
11
12
12
12
13
13
14
1
2
2
2
2
2
2
7
7
7
8
8
8
8
9
9
9
10
10
10
11
11
12
.936
.234
.601
.601
.601
.601
.601
.803
.803
.803
.670
.670
.670
.670
.536
.536
.536
.403
.403
.403
.270
.270
.137
0
0
3
3
3
3
3
9
9
8
10
8
7
5
11
11
5
12
11
7
13
8
13
-0
-0
0
0
0
0
0
0
0
0
0
-0
-0
-1
0
0
-2
0
0
-1
0
-1
0
.7176
.7255
.5609
.5609
.5609
.5609
.5609
.6958
.6958
.1147
.7053
.3550
.8851
.9454
.7135
.7135
.2115
.7204
.2692
.5358
.7265
.3737
.3389
Combine litters with adjacent levels of the litter-specific covariate
within dose groups until the expected count exceeds 3.0, to help improve
the fit of the X*2 statistic to chi-squared.
32
Grouped Data
>5 Mean
>6 Dose Lit. -Spec. Cov.
>8 0
3 0
Q o
1
2 50
3 50
4 50
5
6 150
7 150
8 150
9
>Q 500
M 500
>2 500
)3 500
)4 500
)5 500
,ft
!«
)7 1000
>ft 1000
>9 1000
)Q 1000
)1 1000
)2 1000
}3 1000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
.0000
9
11
14
8
12
14
10
12
14
7
10
11
12
13
15
3
9
10
11
12
13
14
1111
5000
6000
9000
7143
5000
2222
5714
8000
6667
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
0000
chi-squared
Expected Observed Residual
3
3
2
3
3
1
3
3
2
3
4
11
10
3
2
13
23
34
28
31
22
12
.039
.409
.705
.298
.298
.075
.424
.275
.754
.425
.467
.466
.722
.872
.234
.004
.408
.678
.609
.210
.541
.137
0
0
4
0
0
0
11
1
1
6
2
17
6
6
0
15
26
30
27
30
21
13
-0
-0
0
-0
-0
-0
2
-0
-0
0
-0
0
-0
0
-0
1
0
-1
-0
-0
-0
0
.7043
.6744
.2572
.5882
.5187
.2773
.5976
.7591
.5991
.8773
.6704
.9031
.7689
.5579
.7255
.2542
.8696
.2400
.4530
.3153
.4577
.3389
65
Chi-square =
17.35
DF
13
P-value
0.1837
To calculate the BMD and BMDL, the litter specific covariate is fixed
at the mean litter specific covariate of control group: 11.227273
Benchmark Dose Computation
Specified effect = 0.1
Risk Type = Extra risk
78
-------
Table A-5.2: Output from Model Run (EPA HMDS NLogistic Model. Revision: 2.6, Date: 2000/03/03)
\
Confidence level = 0.95
3 HMD = 485.152
4
5 BMDL = 409.019
6
7
79
-------
1
2 6. Human Data
3
4 Opportunities for modeling human toxicological data are limited, and the human studies
5 are less standardized than studies of experimental animals; thus modeling of human data is done
6 on a case-specific basis. For some examples of benchmark dose modeling of human data, please
7 refer to the following references; although it should be noted that these examples precede this
8 benchmark dose modeling guidance and may not strictly adhere to the recommendations
9 described herein. One example presented in EPA's IRIS database is for peripheral nervous
10 system dysfunction induced by carbon disulfide in occupationally exposed workers (U.S.
11 Environmental Protection Agency, 1995a). Another example in IRIS is for developmental
12 neurologic abnormalities in human infants from exposure to methylmercury (U.S. Environmental
13 Protection Agency, 1995b). More recent examples of benchmark dose modeling of
14 methylmercury developmental neurologic effects from different databases are reported by Crump
15 et al. (2000) and Budtz-Jorgensen et al. (2000).
16
17 References
18
19 IRIS (2000)...
20
21 Budtz-Jorgensen E, Grandjean P, Keiding N, White RF, Weihe P (2000) Benchmark dose
22 calculations of methylmercury-associated neurobehavioural deficits. Toxicology Letters 112-
23 113:193-199.
24
25 Crump KS, Van Landingham C, Shamlaye C, Cox C, Davidson PW, et al. (2000) Benchmark
26 concentrations for methylmercury obtained from the Seychelles Child Development Study.
27 Environ Health Perspect 108:257-263.
28
29 U.S. Environmental Protection Agency (EPA). (1995a). Integrated Risk Information System
30 (IRIS): Online substance file for carbon disulfide
31 (http://www.epa.gov/ngispgm3/iris/index.htmiy National Center for Environmental Assessment,
32 Washington, DC.
33
34 U.S. Environmental Protection Agency (EPA). (1995b). Integrated Risk Information System
35 (IRIS): Online substance file for methylmercury (http://www.epa.gov/ngispgm3/iris/index.html).
36 National Center for Environmental Assessment, Washington, DC.
80
-------
1 GLOSSARY
2
3 Akaike Information Criteria (AIC) : A statistical procedure that provides a measure of the
4 goodness-of-fit of a dose-response model to a set of data. AIC = -2 x (LL - p), where LL is the
5 log-likelihood at the maximum likelihood fit, and p is the degrees of freedom of the model
6 (usually, the number of parameters estimated).
7
8 Asymptotic Test: Statistical tests that approach known properties as sample sizes increase.
9
10 Benchmark Concentration (BMC): The concentration of a substance inhaled that is associated
11 with a specified low incidence of risk, generally in the range of 1% to 10%, of a health effect; or
12 the concentration associated with a specified measure or change of a biological effect.
13
14 Benchmark Dose (BMD) : An exposure due to a dose of a substance associated with a specified
15 low incidence of risk, generally in the range of 1% to 10%, of a health effect; or the dose
16 associated with a specified measure or change of a biological effect.
17
18 Benchmark Response (BMR): The response, generally expressed as in excess of background (see
19 for example, Extra Risk), at which a benchmark dose or concentration is desired (see Benchmark
20 Dose, Benchmark Concentration).
21
22 Beta-Binomial Distribution : A statistical distribution of clustered values, e.g., measures on
23 offspring in a litter, where the average proportions of an event for clusters are described by a Beta
24 distribution and the proportions of events in a cluster are described by a binomial distribution.
25
26 Binomial Distribution : The statistical distribution of the probabilities of observing 0,1,2,- - - ,n
27 events in a sample of n independent trials each with the same individual probability that the event
28 occurs.
29
30 BMCL: A lower one-sided confidence limit on the BMC.
31
32 BMDL: A lower one-sided confidence limit on the BMD.
33
34 Bootstrap : A statistical technique based on multiple resampling with replacement of the sample
35 values or resampling of estimated distributions of the sample values that is used to calculate
36 confidence limits or perform statistical tests for complex situations or where the distribution of
37 an estimate or test statistic cannot be assumed.
38
39 Cancer Potency ( Cancer Slope Factor ) : A number that estimates the cancer risk ( incidence )
40 for a lifetime exposure to a substance per unit of dose, dose is generally expressed as mg / kg
41 body wt/day.
42
43 Categorical Data : Results obtained where observations or measurements on individuals or
44 samples are stratified according to degree or severity of an effect, e.g., none, mild, moderate,or
81
-------
1 severe.
2
3 Chi-square Test: A statistical test used to examine the deviation of an observed number of
4 events from an expected number of events.
5
6 Clustered Data : Measurements collected on some grouping of individuals, e.g., litters in
7 reproductive and developmental studies.
8
9 Confidence Interval ( Two-Sided ) : An estimated interval from the lower to upper confidence
10 limit of an estimate of a parameter. This interval is expected to include the true value of the
11 parameter with a specified confidence percentage, e.g., 95% of such intervals are expected to
12 include the true values of the estimated parameters.
13
14 Confidence Interval ( One-Sided ) : An interval below the estimated upper confidence limit, or
15 interval above the estimated lower confidence limit, that is expected to include the true value of
16 an estimated parameter with a specified confidence (percent of the time).
17
18 Confidence Limit: An estimated value below ( or above ) which the true value of an estimated
19 parameter is expected to lie for a specified percentage of such estimated limits.
20
21 Constrained Dose-Response Model: Estimates of one or more parameters of the model are
22 restricted to a specified range, e.g., equal to or greater than zero.
23
24 Continuous Data : Effects Measured on a continuum, e.g., organ weight or enzyme concentration,
25 as opposed to quantal or categorical data where effects are classified by assignment to a class.
26
27 Convergence : Estimates of a parameter approach a single value with increasing sample size or
28 increasing number of computer iterations.
29
30 Convex : Region of a dose-response relationship that curves upward, i.e., the slope becomes
31 steeper with increasing dose.
32
33 Correlated Binomial Distribution : Clustered data where the individual values in a cluster ,e.g., a
34 litter, each have the same probability of expressing an effect.
35
36 Covariate : An independent variable other than dose that may influence the outcome of an effect,
37 e.g., age, body weight, or polymorphism.
38
39 Coverage : See confidence intervals or confidence limits.
40
41 Cubic : An effect is a function of a measure raised to the third power.
42
43 Degrees of Freedom : For dose-response model fitting, the number of data points minus the
44 number of model parameters estimated from the data.
45
82
-------
1 Delta Method : Variance of a function of random variables approximated from the derivatives of
2 the function with respect to the random variables and the variances of the random variables.
3
4 Dichotomous Data : Quantal data where an effect for an individual may be classified by one of
5 two possibilities, e.g., dead or alive, with or without a specific type of tumor.
6
7 Dispersion : Variation ( differences ) from a central ( mean or median ) value.
8
9 Dose-Response Model: A mathematical relationship ( function ) that relates ( predicts ) a
10 measure of an effect to a dose.
11
12 Dose-Response Trend : Relationship between incidence or severity of a biological effect and a
13 function of dose. Simply the slope for a linear dose-response.
14
15 EC_x : Effective exposure concentration associated with a biological effect in x% of the
16 individuals. Often used for inhalation exposures based on the airborne concentration.
17
18 ED_x : Effective dose associated with a biological effect in x% of the individuals. Dose may be
19 the external exposure often expressed in mg per day of the substance per kg body weight raised
20 to a power ( generally 1, 3/4, or 2/3 ) or area under the curve ( AUC ) in blood or target tissue
21 where the substance remains in the body over a period of time.
22
23 Estimate : An empirical value derived from data for a parameter.
24
25 Excess Risk : Proportion of individuals or animals observed or estimated to possess an effect in
26 addition to the spontaneous background risk.
27
28 Extra Risk: [P(d)-P(0)]/[l - P(0)], where P(d) is the risk at a dose = d and P(0) is the background
29 risk at zero dose.
30
31 Gamma Distribution : A unimodal statistical distribution ( relative proportion of responders as a
32 function of some measure ) that is restricted to effects greater than or equal to zero that can
33 describe a wide variety of shapes, e.g., flat, peaked, asymmetrical.
34
35 Gaussian ( Normal) Distribution : A unimodal symmetrical ( bell-shaped ) distribution where the
36 most prevalent value is the mean ( average ) and the spread is measured by the standard
37 deviation. Mathematically, the distribution varies from minus infinity with zero probability to
38 plus infinity with zero probability.
39
40 Generalized Estimating Equation ( GEE ) : A statistical technique used for estimating parameters
41 that requires only specification of the first two moments of the distribution of the estimator as
42 opposed to a complete specification of the distribution.
43
44 Goodness-of-Fit: A statistic that measures the dispersion of data about a dose-response curve in
45 order to provide a test for rejection of a model due to lack of an adequate fit, e.g., a P-value < 0.1.
83
-------
1 Hazard Identification : Detection of an adverse biological effect, or precursor to an adverse
2 effect, as a result of exposure to a substance.
3
4 Hill Equation : A dose-response curve, frequently used for enzyme kinetics, that monotonically
5 approaches an asymptote (maximum value ) as a function of dose raised to a power.
6
7 Hybrid Model : For continuous data establishes abnormal values based on the extremes in
8 controls ( unexposed individuals or animals ) and estimates the risk of abnormal levels as a
9 function of dose.
10
11 Incidence : Proportion or probability of individuals or animals exhibiting an effect, that varies
12 from zero to one, sometimes expressed as a percent from 0% to 100%.
13
14 Independence : The result in one animal or individual does not influence the result in another
15 animal or individual.
16
17 Intercept Term : The estimated value at zero dose or the dose corresponding to a zero effect.
18
19 Least Squares : A statistical procedure that estimates the values of dose-response parameters such
20 that the sum of squares of deviations of data points from their estimated values is minimized, i.e.,
21 minimizes the estimated variance.
22
23 Likelihood Ratio : Ratio of the probability that the observed data arise from a set of model
24 parameters relative to the maximum probability that arises from the set of maximum likelihood
25 estimates.
26
27 Linear Dose-Response Model: The amount of change in a response is proportional to the amount
28 of change in some function of dose.
29
30 Linearized Multistage Model: Dose-response model based on the multistage model of
31 carcinogenesis that is restricted to a form that is approximately linear at low doses.
32
33 Local Maximum : Mathematical solution that maximizes a function in a region that may not be
34 the overall global maximum.
35
36 Likelihood Function : Relative probabilities that various values of population parameters would
37 arise from the sample observations.
38
39 Logistic Model: A sigmoid ( S-shaped ) function that relates the proportion of individuals with a
40 specified characteristic to an independent variable, e.g., dose.
41
42 Log Transformation : Logarithm of raw data.
43
44 Maximum Likelihood Estimate (MLE) : Estimate of a population parameter most likely to have
45 produced the sample observations.
84
-------
1 Michaelis-Menten Equation : A dose-response curve, frequently used for enzyme kinetics, with
2 maximum slope at zero dose that approaches a maximum asymptote at increasing dose.
3
4 Margin of Exposure (MOE) : Ratio of a dose that produces a specified effect, e.g., a benchmark
5 dose, to an expected human dose.
6
7 Moment Estimates : A statistical estimation procedure that equates population moments to
8 sample moments.
9
10 Monotonic Dose-Response : A dose-response that never decreases as dose increases. A
11 monotonic function may be flat (constant) up to a threshold dose or may be flat at high doses if a
12 biological limit, e.g., saturation, is attained.
13
14 Multinomial: Animals or individuals may be classified by more than two (binomial) categories,
15 e.g., in a reproductive study fetuses may be : dead, alive normal, or alive abnormal.
16
17 Nonlinear Dose-Response Model: Mathematical relationship that cannot be expressed simply as
18 the change in response being proportional to the amount of change of some function of dose.
19
20 Objective Function : Choice of function that is optimized for maximum likelihood estimation.
21
22 Ordinal Data : Integers designating the rank, order, or counts.
23
24 P-Value : In testing a hypothesis, the probability of a type I error (false positive) . The
25 probability that the sample (experimental) results are compatible with a specific hypothesis.
26
27 Parameter : A value used to numerically describe a population of values, e.g., the mean and
28 standard deviation; or a value used to describe a dose-response curve, e.g., the intercept and the
29 slope of a linear dose-response.
30
31 Point of Departure (POD) : The point on a dose-response curve established from experimental
32 data, e.g., the benchmark dose, generally corresponding to an estimated low effect level ( e.g.,
33 1% to 10% incidence of an effect). Depending on the mode of action and available data, some
34 form of extrapolation below the POD may be employed for low-dose risk assessment or the POD
35 may be divided by a series of uncertainty factors to arrive at a reference dose.
36
37 Polynomial: A mathematical function of the sum of a constant, linear term, quadratic term, cubic
38 term, etc.
39
40 Probability : The proportion (on a scale of 0 to 1) of cases for which a particular event occurs.
41 Zero indicates the event never occurs and one indicates the event always occurs.
42
43 Probability Distribution : A mathematical description of the relative probabilities of all possible
44 outcomes of a measurement.
45
85
-------
1 Probit Function : Assumes that the relative probabilities of effects as a function of dose are
2 described by a Normal distribution. The cumulative probability as a function of dose has a
3 sigmoid shape.
4
5 Profile Likelihood : A plot of the likelihood function versus the estimated value of a parameter.
6
7 Quadratic Term : A quantity in a mathematical formula that is raised to the second power (
8 squared).
9
10 Quantal Data : Dichotomous ( Binomial) classification where an individual or animal is placed
11 in one of two categories, e.g., dead or alive, with or without a particular type of tumor, normal or
12 abnormal level of a hormone.
13
14 Quantile : Percentile ( cumulative probability ) of a distribution that ranges from zero to the
15 lOOthpercentile.
16
17 Quasi-Likelihood : Likelihood function that is not totally defined and generally based on only an
18 expression including the mean and variance.
19
20 Rectangular Hyperbola : A mathematical function of the form y squared equals x squared plus c
21 squared, where x and y are variables and c is a constant.
22
23 Regression Analysis : A statistical process that produces a mathematical function (regression
24 equation ) that relates a dependent variable ( biological effect) to independent variable, e.g., dose
25 rate, duration of exposure, age.
26
27 Repeated Measures : A biological endpoint is measured for the same individual or animal at
28 different times ( ages ).
29
30 Residual Variance : The variance in experimental measurements remaining after accounting for
31 the variance due to the independent variables, e.g., dose rate, duration of exposure, age.
32 Typically referred to as the inherent unaccountable experimental variation.
33
34 Residuals : The numerical differences between observed and estimated effects.
35
36 Reference Concentration ( RfC ) : An estimate of the concentration of daily exposure to a
37 substance ( with uncertainty spanning perhaps an order of magnitude ) for a human population (
38 including sensitive subgroups ) that is likely to be without an appreciable risk of deleterious
39 effects during a lifetime.
40
41 Reference Dose ( RfD ) : Replace " concentration " by " dose " in the above definition.
42
43 Risk : Probability that an animal or individual exhibits a particular adverse effect for a specified
44 exposure, expressed on a probability scale of 0 to 1. May be expressed as the proportion of a
45 population effected and often converted to the percent effected.
86
-------
1 Risk Characterization : The process of combining dose-response information with exposure
2 information in order to estimate risk.
3
4 S-Plus : Computer software for performing statistical analyses.
5
6 SAS : Computer software for performing statistical analyses.
7
8 Second Degree : A mathematical function that contains a quadratic term.
9
10 Shape Parameter : The exponent on dose in a dose-response function that dictates the curvature
11 of the function.
12
13 Significance ( Statistical Significance ) : See P-value.
14
15 Threshold Dose : Dose below which a specified biological effect does not occur, generally for a
16 particular population. Hence, the threshold dose is for the most sensitive individual in a
17 population.
18
19 Uncertainty : The unknown effects of parameters, variables, or relationships that cannot or have
20 not been verified or estimated by measurement or experimentation.
21
22 Uncertainty Factor : The value ( often a default value of 10 ) used as a divisor of a NOAEL,
23 LOAEL, or benchmark dose to calculate a RfC or RfD. Uncertainty factors are applied as needed
24 for extrapolation of results in experimental animals to humans, interindividual variability
25 including sensitive subgroups, extrapolation from a LOAEL to a NOAEL, extrapolation of
26 results from subchronic exposures to chronic exposures, and database inadequacies.
27
28 Unconstrained Dose-Response Model: No restrictions imposed on the estimates of parameters.
29
30 Upper-Tail Probability : Probability that a variable exceeds a specified value.
31
32 Variability: Observable diversity in biological sensitivity or response, and in exposure parameters
33 (such as breathing rates, food consumption, etc.) These differences can be better understood, but
34 generally not reduced by further research.
35
36 Variance : Measure of variability, standard deviation squared.
37
38 Weibull: Form of a dose-response curve characterized by a relatively shallow slope at low doses
39 that increases sharply as dose increases before leveling off at high doses.
40
41 Weighted Least Squares Estimate : Parameter estimate obtained by minimizing the sum of
42 squares of observed and estimated values weighted by a function, frequently the reciprocal of the
43 variance of an observation.
87
------- |