United States Environmental Protection Agency
Office of Water
Office of Science and Technology
4304
EPA-822-B-00-005
October 2000

Methodology for Deriving Ambient Water Quality Criteria for the Protection of Human Health (2000)
Technical Support Document Volume 1: Risk Assessment
Final

Office of Science and Technology
Office of Water
U.S. Environmental Protection Agency
Washington, DC 20460

LIST OF ACRONYMS

AEL      Adverse-Effect Level
AWQC     Ambient Water Quality Criteria
BAF      Bioaccumulation Factor
BCF      Bioconcentration Factor
BMC      Benchmark Concentration
BMD      Benchmark Dose
BMR      Benchmark Response
BW       Body Weight
CFR      Code of Federal Regulations
CR       Consumption Rate
CWA      Clean Water Act
D        Dose
DI       Drinking Water Intake
DNA      Deoxyribonucleic Acid
ED10     Dose Associated with a 10 Percent Extra Risk
EPA      Environmental Protection Agency
ER       Extra Risk
FEL      Frank Effect Level
FI       Fish Intake
GI       Gastrointestinal
HA       Health Advisory
IARC     International Agency for Research on Cancer
ILSI     International Life Sciences Institute
IRIS     Integrated Risk Information System
kg       kilogram
L        Liter
LC50     Lethal concentration to 50 percent of the population
LD50     Lethal dose to 50 percent of the population
LED10    The Lower 95 Percent Confidence Limit on a Dose Associated with a 10 Percent Extra Risk
LMS      Linear Multistage Model
LOAEL    Lowest Observed Adverse Effect Level
LR       Lifetime Risk
MF       Modifying Factor
mg       Milligrams
ml       Milliliters
MLE      Maximum Likelihood Estimate
MOA      Mode of Action
MOE      Margin of Exposure
NOAEL    No-Observed-Adverse-Effect Level
NOEL     No-Observed-Effect Level
NTIS     National Technical Information Service
OSTP     Office of Science and Technology Policy
PAH      Polycyclic Aromatic Hydrocarbon
PCB      Polychlorinated Biphenyl
POD      Point of Departure
q1*      Cancer Potency Factor
RfC      Reference Concentration
RfD      Reference Dose
RfD_DT   Developmental Toxicity Reference Dose
RPF      Relative Potency Factor
RSC      Relative Source Contribution
RSD      Risk-Specific Dose
SAB      Science Advisory Board
TEF      Toxicity Equivalency Factor
TSD      Technical Support Document
UF       Uncertainty Factor
USEPA    U.S. Environmental Protection Agency

CONTENTS

1. INTRODUCTION
   1.1 Background
   1.2 Need for Revision of the 1980 Human Health Methodology for Deriving AWQC
       1.2.1 Scientific Advances Since 1980
       1.2.2 EPA Risk Assessment Guidelines Development Since 1980
   1.3 Purpose of this Document
   1.4 Criteria Equations
   1.5 References

2. CANCER EFFECTS
   2.1 1986 EPA Guidelines for Carcinogenic Risk Assessment
   2.2 Revisions to EPA's Carcinogen Risk Assessment Guidelines
   2.3 Description of the Methodology for Deriving AWQC Based on the Revised Carcinogen Risk Assessment
       2.3.1 Weight-of-Evidence Narrative
             2.3.1.1 Mode of Action: General Considerations and Framework for Analysis
             2.3.1.2 Framework for Evaluating a Postulated Carcinogenic Mode(s) of Action
       2.3.2 Dose Estimation by the Oral Route
             2.3.2.1 Determining the Human Equivalent Dose
       2.3.3 Dose-Response Analysis
             2.3.3.1 Characterizing Dose-Response Relationships in the Range of Observation
             2.3.3.2 Extrapolation to Low, Environmentally Relevant Doses
       2.3.4 AWQC Calculation
             2.3.4.1 Linear Approach
             2.3.4.2 Nonlinear Approach
       2.3.5 Risk Characterization
       2.3.6 Use of Toxicity Equivalence Factors and Relative Potency Estimates
   2.4 Case Study (Compound Z, a Rodent Bladder Carcinogen)
       2.4.1 Background and Evaluation for Compound Z
       2.4.2 Conclusion and Use of the MOE Approach for Compound Z
             2.4.2.1 Identification of the Point of Departure (POD) for Compound Z
             2.4.2.2 Discussion of the Points Affecting Selection of the UF for Compound Z
             2.4.2.3 AWQC Calculations for Compound Z
       2.4.3 Use of the Default Linear Approach for Compound Z
             2.4.3.1 Computing the Human Equivalent Dose for Compound Z
             2.4.3.2 Calculation of AWQC for Compound Z
       2.4.4 Use of the LMS Approach for Compound Z
       2.4.5 Comparison of Approaches and Results for Compound Z
   2.5 References

3. NONCANCER EFFECTS
   3.1 Introduction
   3.2 Hazard Identification
   3.3 Dose-Response Assessment
   3.4 Selection of Critical Data
       3.4.1 Critical Study
       3.4.2 Critical Data and Endpoint
   3.5 Deriving RfDs Using the NOAEL/LOAEL Approach
       3.5.1 Selection of Uncertainty Factors and Modifying Factors
       3.5.2 Confidence in NOAEL/LOAEL-Based RfD
       3.5.3 Presenting the RfD as a Single Point or as a Range
   3.6 Deriving an RfD Using a Benchmark Dose Approach
       3.6.1 Overview of the Benchmark Dose Approach
       3.6.2 Calculation of the RfD Using the Benchmark Dose Method
             3.6.2.1 Selection of Response Data to Model
             3.6.2.2 Use of Categorical Versus Continuous Data
             3.6.2.3 Choice of Mathematical Model
             3.6.2.4 Handling Model Fit
             3.6.2.5 Measure of Altered Response
             3.6.2.6 Selection of the BMR
             3.6.2.7 Calculating the Confidence Interval
             3.6.2.8 Selection of the BMD as the Basis for the RfD
             3.6.2.9 Use of Uncertainty Factors with BMD Approach
       3.6.3 Limitations of the BMD Approach
       3.6.4 Example of the Application of the BMD Approach
             3.6.4.1 Selection of Data to Model
             3.6.4.2 Choice of Mathematical Model
             3.6.4.3 Results of Information Above
             3.6.4.4 Selection of the BMR
             3.6.4.5 Calculating the Confidence Interval
             3.6.4.6 Selection of the BMD as the Basis for the RfD
             3.6.4.7 Use of Uncertainty Factors with BMD Approach
   3.7 Categorical Regression
       3.7.1 Summary of the Method
       3.7.2 Steps in Applying Categorical Regression
   3.8 Chronic, Practical Nonthreshold Effects
   3.9 Acute, Short-Term Effects
   3.10 Mixtures
   3.11 References

APPENDICES
   Appendix A. Case Study Example - Hazard Evaluation for Compound Z
   Appendix B. Case Study Example - Mode of Action Evaluation: Compound Z (Bladder Tumor)
   Appendix C. Evaluation of the Quality of Data Set(s) for Use in Deriving an RfD

NOTICE

The policies and procedures set forth in this document are intended solely to describe EPA methods for developing or revising ambient water quality criteria to protect human health, pursuant to Section 304(a) of the Clean Water Act, and to serve as guidance to States and authorized Tribes for developing their own water quality criteria. This guidance does not substitute for the Clean Water Act or EPA's regulations; nor is it a regulation itself. Thus, it does not impose legally-binding requirements on EPA, States, Tribes or the regulated community, and may not apply to a particular situation based upon the circumstances.

This document has been reviewed in accordance with U.S. Environmental Protection Agency policy and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

1. INTRODUCTION

This document provides technical support concerning cancer and noncancer risk assessment methods used in the Methodology for Deriving Ambient Water Quality Criteria for the Protection of Human Health (2000) (USEPA, 2000a; hereafter the "2000 Human Health Methodology").

1.1 BACKGROUND

Ambient water quality criteria (AWQC) developed under Section 304(a) of the Clean Water Act (hereafter the "CWA" or "the Act") are based solely on data and scientific judgments on the relationship between pollutant concentrations and environmental and human health effects. The 304(a) criteria do not reflect consideration of economic impacts or the technological feasibility of meeting the chemical concentrations in ambient water.
As discussed below, 304(a) criteria are used by States and authorized Tribes to establish water quality standards, and ultimately provide a basis for controlling discharges or releases of pollutants.

The U.S. Environmental Protection Agency (EPA) announced the availability of AWQC documents for 64 toxic pollutants and pollutant categories identified in Section 307(a) of the CWA in the Federal Register on November 28, 1980 (USEPA, 1980). The November 1980 Federal Register notice (hereafter the "1980 Methodology") also summarized the criteria documents and discussed in detail the methods used to derive the AWQC for those pollutants. The AWQC for those 64 pollutants and pollutant categories were published pursuant to Section 304(a)(1) of the CWA:

    The Administrator, . . . shall develop and publish, . . . , (and from time to time thereafter revise) criteria for water quality accurately reflecting the latest scientific knowledge (A) on the kind and extent of all identifiable effects on health and welfare including, but not limited to, plankton, fish, shellfish, wildlife, plant life, shorelines, beaches, esthetics, and recreation which may be expected from the presence of pollutants in any body of water, including ground water; (B) on the concentration and dispersal of pollutants, or their byproducts, through biological, physical, and chemical processes; and (C) on the effects of pollutants on the biological community diversity, productivity, and stability, including information on the factors affecting rates of eutrophication and rates of organic and inorganic sedimentation for varying types of receiving waters.
The 1980 Methodology provided two essential types of information: (1) discussions of available scientific data on the effects of the pollutants on public health and welfare, aquatic life, and recreation; and (2) quantitative concentrations or qualitative assessments of the levels of pollutants in water which, if not exceeded, will generally ensure adequate water quality for a specified water use.

The 1980 AWQC were derived using guidelines and methodologies developed by the Agency for calculating the impact of waterborne pollutants on aquatic organisms and on human health. Those guidelines and methodologies consisted of systematic procedures for assessing valid and appropriate data concerning a pollutant's acute and chronic adverse effects on aquatic organisms, nonhuman mammals, and humans. The guidelines and methodologies were fully described in Appendix B (for protection of aquatic life and its uses) and Appendix C (for protection of human health) of the November 1980 Federal Register Notice.

1.2 NEED FOR REVISION OF THE 1980 HUMAN HEALTH METHODOLOGY FOR DERIVING AWQC

1.2.1 Scientific Advances Since 1980

Since 1980, EPA risk assessment practices have evolved significantly in the areas of cancer and noncancer risk assessments, exposure assessments, and bioaccumulation assessment.

In cancer risk assessment, there have been advances in the use of mode of action (MOA) information to support both the identification of carcinogens and the selection of procedures to characterize risk at low, environmentally relevant exposure levels. Related to this is the development of new procedures for quantifying cancer risk at low doses to replace the current default linearized multistage (LMS) model.

In noncancer risk assessment, the Agency is moving toward the use of statistical models, such as the benchmark dose approach and categorical regression, to derive reference doses (RfDs) in place of the traditional method based on the no-observed-adverse-effect level (NOAEL).
In exposure analysis, several new studies have addressed water consumption and fish consumption. These exposure studies provide a more current and comprehensive description of national, regional, and special-population consumption patterns; these are reflected in the 2000 Human Health Methodology (USEPA, 2000). In addition, more formalized procedures are now available to account for human exposure from multiple sources when setting health goals that address only one exposure source.

With respect to bioaccumulation, the Agency has moved toward the use of a bioaccumulation factor (BAF) to reflect the uptake of a contaminant by fish from all sources, rather than just from the water column as reflected by the use of a bioconcentration factor (BCF) in the 1980 Methodology. The Agency has also developed detailed procedures and guidelines for estimating BAF values.

1.2.2 EPA Risk Assessment Guidelines Development Since 1980

When the 1980 Methodology was developed, EPA had not yet developed formal cancer or noncancer risk assessment guidelines. Since then, EPA has published several risk assessment guidelines documents. Guidelines for Carcinogen Risk Assessment were published in 1986 (USEPA, 1986a) (hereafter the "1986 cancer guidelines"), as were Guidelines for Mutagenicity Risk Assessment (USEPA, 1986b). In 1996, the Agency published Proposed Guidelines for Carcinogen Risk Assessment (USEPA, 1996a) (hereafter the "1996 proposed cancer guidelines"), which were subsequently revised in July 1999 following extensive external review (USEPA, 1999a, hereafter the "1999 draft revised cancer guidelines"). When final guidelines are published, they will replace the 1986 cancer guidelines.
With respect to noncancer risk assessment, the Agency published Guidelines for Developmental Toxicity Risk Assessment in 1991 (USEPA, 1991) and Guidelines for Reproductive Toxicity Risk Assessment in 1996 (USEPA, 1996b). In 1998, EPA published final Guidelines for Neurotoxicity Risk Assessment (USEPA, 1998), and in 1999, it issued draft Guidance for Conducting Health Risk Assessment of Chemical Mixtures (USEPA, 1999b). In addition, the Agency is developing a framework for cumulative risk assessment, and the Office of Pesticide Programs has developed draft guidance for assessing cumulative risk of common mechanism pesticides and other substances.

1.3 PURPOSE OF THIS DOCUMENT

This Risk Assessment Technical Support Document (TSD) (hereafter the "Risk Assessment TSD") provides additional technical detail on the principles and recommendations presented in the 2000 Human Health Methodology for risk assessments to be used in deriving AWQC. Also included are illustrative examples to explain the thought process behind many of the new risk assessment directions being taken by EPA. For instance, there is an example of how to apply principles of the 1999 draft revised cancer guidelines to a chemical for which the MOA is considered to be a threshold process.[1] For noncancer assessment, an example is included on how to use the benchmark dose (BMD) approach.

The focus of the 2000 Human Health Methodology, which this document accompanies, is the development of AWQC to protect human health. The Agency intends to use the 2000 Human Health Methodology both to develop new AWQC for additional chemicals and to revise existing AWQC. It is important to emphasize that the 2000 Human Health Methodology is also intended to provide States and authorized Tribes flexibility in setting water quality standards by providing scientifically valid options for developing their own water quality criteria that consider local conditions.
States and authorized Tribes are encouraged to use the Methodology to derive their own AWQC. The 2000 Human Health Methodology also defines the default factors EPA will use in evaluating and determining consistency of State and Tribal water quality standards with the requirements of the CWA and the implementing federal regulation (40 CFR 131). These default factors will also be used by the Agency to calculate 304(a) criteria when promulgating water quality standards for a State or Tribe under Section 303(c) of the Act.

[1] Throughout this document, the term "risk level" regarding a cancer assessment using the linear approach refers to an upper-bound estimate of excess lifetime cancer risk.

1.4 CRITERIA EQUATIONS

The following equations for deriving AWQC include toxicological parameters which are derived from scientific analysis, science policy, and risk management decisions. An example of an empirically measured, science-based value is a point of departure (POD) from an animal study [in the form of a lowest-observed-adverse-effect level (LOAEL), a no-observed-adverse-effect level (NOAEL), or a lower 95 percent confidence limit on a dose associated with a 10 percent extra risk (LED10)]. The decision to use animal effects as a surrogate for human effects involves judgment on the part of EPA (and other agencies) as to the best practice to follow when human data are lacking. Such a decision is a matter of science policy. On the other hand, the choice to base AWQC on protection of the 90th percentile of the general population's water consumption rate is a risk management decision. In many cases, the Agency has selected parameters using its best judgment of the overall protection afforded by the resulting AWQC when all parameters are combined.
This issue is discussed further in the 2000 Human Health Methodology document, along with further details on risk characterization as related to this Methodology, with emphasis placed on explaining the uncertainties in the overall risk assessment.

The generalized equations for deriving AWQC based on noncancer and cancer effects are:

Noncancer Effects

$$\mathrm{AWQC} = \mathrm{RfD} \cdot \mathrm{RSC} \cdot \left( \frac{\mathrm{BW}}{\mathrm{DI} + \sum_{i=2}^{4} \left( \mathrm{FI}_i \cdot \mathrm{BAF}_i \right)} \right) \qquad \text{(Equation 1-1)}$$

Cancer Effects: Nonlinear Low-Dose Extrapolation

$$\mathrm{AWQC} = \frac{\mathrm{POD}}{\mathrm{UF}} \cdot \mathrm{RSC} \cdot \left( \frac{\mathrm{BW}}{\mathrm{DI} + \sum_{i=2}^{4} \left( \mathrm{FI}_i \cdot \mathrm{BAF}_i \right)} \right) \qquad \text{(Equation 1-2)}$$

Cancer Effects: Linear Low-Dose Extrapolation

$$\mathrm{AWQC} = \mathrm{RSD} \cdot \left( \frac{\mathrm{BW}}{\mathrm{DI} + \sum_{i=2}^{4} \left( \mathrm{FI}_i \cdot \mathrm{BAF}_i \right)} \right) \qquad \text{(Equation 1-3)}$$

where:

AWQC   = Ambient Water Quality Criterion (mg/L, or milligrams/Liter)
RfD    = Reference dose for noncancer effects (mg/kg-day, or milligram/kilogram-day)
POD    = Point of departure for carcinogens based on a nonlinear low-dose extrapolation (mg/kg-day), usually a LOAEL, NOAEL, or LED10
UF     = Uncertainty Factor for carcinogens based on a nonlinear low-dose extrapolation (unitless)
RSD    = Risk-specific dose for carcinogens based on a linear low-dose extrapolation (mg/kg-day) (dose associated with a target risk, such as 10^-6)
RSC    = Relative source contribution factor to account for non-water sources of exposure (not used for carcinogens based on a linear low-dose extrapolation). May be either a percentage (multiplied) or amount subtracted, depending on whether multiple criteria are relevant to the chemical.
BW     = Human body weight (default = 70 kg for adults)
DI     = Drinking water intake (default = 2 L/day for adults)
FI_i   = Fish intake (defaults = 0.0175 kg/day for general population and sport anglers, and 0.142 kg/day for subsistence fishers)
BAF_i  = Bioaccumulation factor, lipid normalized (L/kg)

1.5 REFERENCES

USEPA (U.S. Environmental Protection Agency). 1980. Guidelines and methodology used in the preparation of health effect assessment chapters of the consent decree water criteria documents. Federal Register 45: 79347.

USEPA (U.S. Environmental Protection Agency). 1986a. Guidelines for carcinogen risk assessment. Federal Register 51: 33992-34003. September 24.

USEPA (U.S. Environmental Protection Agency). 1986b. Guidelines for mutagenicity risk assessment. Federal Register 51: 34006-34012. September 24.

USEPA (U.S. Environmental Protection Agency). 1986c. Guidelines for the health risk assessment of chemical mixtures. Federal Register 51: 33992-34003. September 24.

USEPA (U.S. Environmental Protection Agency). 1991. Guidelines for developmental toxicity risk assessment. Federal Register 56: 63798-63826.

USEPA (U.S. Environmental Protection Agency). 1996. Proposed guidelines for carcinogen risk assessment. Federal Register 61: 17960.

USEPA (U.S. Environmental Protection Agency). 1998. Guidelines for neurotoxicity risk assessment. Federal Register 63: 26926.

USEPA (U.S. Environmental Protection Agency). 1999a. 1999 Guidelines for Carcinogen Risk Assessment. Review Draft. Risk Assessment Forum. Washington, DC. EPA/NCEA-F-0644. July.

USEPA (U.S. Environmental Protection Agency). 1999b. Guidance for conducting health risk assessment of chemical mixtures. Federal Register 64: 23833.

USEPA (U.S. Environmental Protection Agency). 2000. Methodology for Deriving Ambient Water Quality Criteria for the Protection of Human Health (2000). Office of Science and Technology, Office of Water. Washington, DC. EPA-822-B-00-004. August.

2. CANCER EFFECTS

This section discusses the current status of the cancer risk assessment methodology employed by EPA, which is based on recent scientific developments and the Agency's experience in this field. A discussion is provided of:

• Background information on the current cancer risk assessment methods in the 1986 Guidelines for Carcinogen Risk Assessment (USEPA, 1986; hereafter "1986 cancer guidelines"); and

• New principles recommended in the Guidelines for Carcinogen Risk Assessment. Review Draft (USEPA, 1999a; hereafter "1999 draft revised cancer guidelines").[2] When final guidelines are published, they will replace the 1986 cancer guidelines, including their application in the Methodology for deriving AWQC for carcinogens.

[2] This is a revision of the Proposed Guidelines for Carcinogen Risk Assessment published in 1996 (USEPA, 1996).

2.1 1986 EPA GUIDELINES FOR CARCINOGENIC RISK ASSESSMENT

In 1986, EPA published its Guidelines for Carcinogen Risk Assessment (hereafter "1986 cancer guidelines"). These guidelines were based on the publication by the Office of Science and Technology Policy (OSTP, 1985) that provided a summary of the state of knowledge in the field of carcinogenesis and a statement of broad scientific principles of carcinogen risk assessment on behalf of the federal government. The 1986 cancer guidelines established a classification scheme to describe the nature of the cancer database and evidence supporting the carcinogenicity of an agent. This classification system is based on a similar scheme used at the time by the International Agency for Research on Cancer (IARC). This scheme is described briefly below. More detailed information can be obtained from the 1986 cancer guidelines.

The classification scheme utilizes several alpha-numerical groups for classifying chemicals with respect to the evidence available regarding their likely carcinogenic potential for humans:

Group A: Human carcinogen; sufficient evidence from epidemiological studies.
Group B: Probable human carcinogen; sufficient evidence in animals or limited evidence in humans.
Group C: Possible human carcinogen; limited evidence of carcinogenicity in animals in the absence of adequate human data.
Group D: Not classifiable; inadequate data or no data.
Group E: Evidence of noncarcinogenicity for humans; no evidence of carcinogenicity in adequate studies in at least two species or in both epidemiological and animal studies.
Within Group B there are two subgroups: B1 and B2. Group B1 is reserved for agents for which there is limited evidence of carcinogenicity from epidemiologic studies. Group B2 is generally for agents for which there is sufficient evidence from animal studies and for which there is inadequate evidence or no data from epidemiologic studies.

The 1986 cancer guidelines also include guidance on the definition of sufficient or limited evidence. The evidence from human studies is evaluated as "sufficient" when a causal relationship is indicated by the study. Human evidence is considered "limited" when a causal interpretation is credible, but alternative explanations are not sufficiently excluded. When animal studies are used in the evaluation of carcinogenicity, "sufficient" evidence includes agents which have been demonstrated to cause:

• An increased incidence of malignant tumors, or combined malignant and benign tumors: 1) in multiple species or strains; 2) in multiple experiments (e.g., with different routes of administration or using different dose levels); or 3) to an unusual degree in a single experiment with regard to high incidence, unusual site or type of tumor; or

• An early age at onset.

For quantitative cancer risk estimation, the 1986 cancer guidelines recommended the use of the linearized multistage model (LMS) as the default approach, based on the default assumption that chemical carcinogens cause DNA mutations. The 1986 cancer guidelines also stated that low-dose extrapolation models and approaches other than the LMS model might be considered more appropriate based on biological information showing mechanisms of action other than mutagenesis. However, no guidance was given in choosing other approaches; thus, departures from the LMS procedure have been rare in practice.
The 1986 cancer guidelines recommended the use of body weight raised to the two-thirds power (BW^(2/3)) as a dose scaling factor between species, based on the idea that dose would scale as a function of the surface area of the body.

2.2 REVISIONS TO EPA'S CARCINOGEN RISK ASSESSMENT GUIDELINES

In 1996, EPA published Proposed Guidelines for Carcinogen Risk Assessment (USEPA, 1996; hereafter the "1996 proposed cancer guidelines"). EPA developed its 1999 draft revised cancer guidelines in response to the February 1997 and January 1999 USEPA Science Advisory Board (SAB) reviews of the proposal. When final guidelines are published, they will replace the 1986 cancer guidelines. These revisions are designed to ensure that the Agency's cancer risk assessment methods reflect the most current scientific information and advances in risk assessment methodology.

In the meantime, the 1986 cancer guidelines are used and extended with principles discussed in the 1999 draft revised cancer guidelines. These principles arise from scientific discoveries concerning cancer made in the last 15 years and from EPA policy of recent years supporting full characterization of hazard and risk both for the general population and potentially sensitive groups such as children. These principles are incorporated in recent and ongoing assessments such as the reassessment of dioxin, consistent with the 1986 guidelines. Until final guidelines are published, information is presented to describe risk under both the old 1986 guidelines and 1999 draft revisions.

The 1999 draft revised cancer guidelines require the full use of all relevant information to convey the circumstances or conditions under which a particular hazard is expressed (e.g., route, duration, pattern, or magnitude of exposure). The 1999 draft revised cancer guidelines emphasize the understanding of mode of action (MOA) whereby the agent induces tumors.
The MOA underlies the hazard assessment and provides the rationale for dose-response assessments.

The key principles in the 1999 draft revised cancer guidelines include:

a) Hazard assessment is based on the analysis of all biological information rather than just tumor findings.

b) An agent's MOA in causing tumors is emphasized to reduce the uncertainty in describing the likelihood of harm and in determining the dose-response approach(es).

c) The 1999 draft revised cancer guidelines emphasize the conditions under which the hazard may be expressed (e.g., route, pattern, duration and magnitude of exposure). Further, these guidelines require a hazard characterization to integrate the analysis of all relevant studies into a weight-of-evidence narrative, and to develop a working conclusion regarding the agent's mode of action in leading to tumor development.

d) A weight-of-evidence narrative with accompanying descriptors (listed in Section 2.3.1 below) replaces the current alphanumeric classification system. The weight-of-evidence narrative is a summary of the key evidence for carcinogenicity. It describes the agent's MOA, characterizes the conditions of hazard expression including route of exposure and any anticipated disproportionate effects on sensitive subgroups, and recommends appropriate dose-response approach(es). Significant strengths, weaknesses, and uncertainties of contributing evidence are also highlighted.

e) Biologically based extrapolation models are the preferred approach for quantifying risk. These models integrate events in the carcinogenic process throughout the dose-response range from high to low doses. It is anticipated, however, that the necessary data for the parameters used in such models will not be available for most chemicals. The 1999 draft revised cancer guidelines allow for alternative quantitative methods, including several default approaches.
f) Dose-response assessment is a two-step process when a biologically based model is not used. The first step is the assessment of observed data to derive a point of departure (POD), and the second step is the extrapolation below the range of observation. In addition to modeling tumor data, the 1999 draft revised cancer guidelines call for the use and modeling of other kinds of responses if they are considered to be more informed measures of carcinogenic risk that reflect key events in the carcinogenic process (see Section 2.3.3). For the second extrapolation step, three default approaches are provided: linear, nonlinear, or both. The standard POD for animal studies is the effective dose (ED) corresponding to the lower 95 percent limit on a dose associated with 10 percent extra risk [3] (LED10). A lower POD may be used for human studies of large populations. The choice of extrapolation approach is based on conclusions about an agent's MOA as described in Section 2.3.3.2 below.

Linear. The linear default is a straight line extrapolation from the POD to the origin (zero dose, zero extra risk).

Nonlinear. The nonlinear default begins with the identified POD and provides a margin of exposure (MOE) analysis rather than estimating the probability of effects at low doses. The MOE analysis is used to determine the appropriate margin between the POD and the exposure level of interest (in this Methodology, the AWQC). The goal is to provide information about the risk reduction that accompanies lowering of exposure and the adequacy of an MOE. Factors considered for MOE analysis include the nature of the response, slope of the observed dose-response curve, human sensitivity compared with experimental animals, and nature and extent of human variability in sensitivity. For more detail about MOE analysis, see Section 2.3.3.2.

Linear and Nonlinear. Both approaches can be used when different modes of action are thought to be responsible for different tumor or other key event responses.

[3] Two risk measures of increased response for quantal data have been proposed in the literature, additional risk and extra risk (Crump, 1984). Additional risk is defined as P(d) - P(0) and extra risk as [P(d) - P(0)]/[1 - P(0)], where P(d) is the probability of response at dose d, and P(0) is the probability of response at dose 0 (no exposure). Thus, extra risk is additional risk divided by the proportion of individuals that will not respond in the absence of exposure; i.e., additional risk and extra risk differ quantitatively in the way they account for background response. If the spontaneous incidence of a tumor is zero (or close to zero), then the tumor incidence observed reflects the risk of the tumor from exposure to the chemical agent. In this case, the estimates of extra risk and additional risk are the same. If the spontaneous tumor incidence is greater than zero, then the risk of developing a tumor due to exposure to a specific dose of a chemical agent will not be the incidence of the tumor at that dose per se, but will be the incidence of the tumor at that dose corrected for the spontaneous incidence. Additional risk is the proportion of individuals with tumors in the exposed groups beyond that in the control group, and extra risk is the proportion of individuals responding that would not otherwise have responded. This assumes that the processes leading to tumors in the unexposed individuals are independent of the processes that lead to tumors in the exposed animals. The greater the background incidence, the greater the difference between extra and additional risk. If there are no tumors in the control group [P(0) = 0], there is no difference between extra and additional risk. Extra risk provides an expanded measure of the incidence of adverse effects when the background incidence is high, with the effect becoming more marked as the background incidence increases. In effect, it provides a more sensitive measure of tumor response to a chemical agent when the spontaneous incidence of tumors is high.

g) The approach used to calculate an oral human equivalent dose when assessments are based on animal bioassays has been refined and includes a change in the default assumption for interspecies dose scaling. The 1999 draft revised cancer guidelines use body weight raised to the 3/4 power (BW^(3/4)).

EPA modeling approaches for the observed range of cancer and noncancer assessments are being consolidated. The modeling of observed response data to identify the POD in a standard way for both kinds of response will be based on the benchmark dose (BMD) modeling approach described briefly in Section 3.6 below.

Until new cancer guidelines are published, the 1986 guidelines will be used along with principles of the 1999 draft revised cancer guidelines. The 1986 cancer guidelines are the basis for IRIS risk numbers which were used to derive the current AWQC. Each new assessment applying the principles of the 1999 draft revised cancer guidelines will be subject to peer review before being used as the basis of AWQC.

Section 2.3 describes the methodology for deriving numerical AWQC for carcinogens applying the principles of the 1999 draft revised cancer guidelines. This discussion of the revised methodology for carcinogens focuses primarily on the quantitative aspects of deriving numerical AWQC values. It is important to note that the cancer risk assessment process outlined in the 1999 draft revised cancer guidelines is not limited to the quantitative aspects. A numerical AWQC value derived for a carcinogen is to be based on appropriate hazard characterization and accompanied by risk characterization information.
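The two risk measures defined in the footnote on additional and extra risk under item f) can be compared numerically. The following is a minimal sketch, not part of the Methodology: the function names are hypothetical and the incidence values are invented purely for illustration.

```python
def additional_risk(p_dose: float, p_background: float) -> float:
    """Additional risk: P(d) - P(0)."""
    return p_dose - p_background


def extra_risk(p_dose: float, p_background: float) -> float:
    """Extra risk: [P(d) - P(0)] / [1 - P(0)]."""
    if not 0.0 <= p_background < 1.0:
        raise ValueError("background response P(0) must be in [0, 1)")
    return (p_dose - p_background) / (1.0 - p_background)


if __name__ == "__main__":
    # Illustrative incidences only; these are not data from any study.
    for p0, pd in [(0.0, 0.10), (0.20, 0.28), (0.50, 0.55)]:
        print(
            f"P(0)={p0:.2f}  P(d)={pd:.2f}  "
            f"additional={additional_risk(pd, p0):.3f}  "
            f"extra={extra_risk(pd, p0):.3f}"
        )
```

With no background response the two measures coincide; as the background incidence rises, extra risk grows relative to additional risk, which is the behavior the footnote describes.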
2.3 DESCRIPTION OF THE METHODOLOGY FOR DERIVING AWQC BASED ON THE REVISED CARCINOGEN RISK ASSESSMENT

Following the publication of the Draft Water Quality Criteria Methodology: Human Health (USEPA, 1998a) and the accompanying TSD (USEPA, 1998b), EPA received comments from the public. EPA also held an external peer review of the draft Methodology, including the cancer methodology. Both the peer reviewers and the public recommended that EPA incorporate the new cancer risk assessment approaches into the AWQC Methodology.

The 2000 Human Health Methodology for deriving numerical AWQC for carcinogens is consistent with the 1986 cancer guidelines and principles included in the 1999 draft revised cancer guidelines. This discussion of applying the 2000 Human Health Methodology to carcinogens focuses primarily on the quantitative aspects of deriving numerical AWQC values, but also emphasizes the importance of qualitative information as critical to the cancer risk evaluation process.

This section contains a discussion of the weight-of-evidence narrative, describing information relevant to a cancer risk evaluation and characterization. It also includes a discussion of general considerations and a framework of analysis for the MOA. These topics are followed by discussions of the quantitative aspects of deriving numerical AWQC values for carcinogens. It is assumed that data from an appropriately conducted animal bioassay or human epidemiological study provide the underlying basis for deriving the AWQC value. The discussion of quantitative risk estimation focuses on the following topics: dose estimation; characterizing dose-response relationships in the range of observation and at low, environmentally relevant doses; calculating the AWQC value; risk characterization; and use of Toxicity Equivalence Factors (TEFs) and Relative Potency Estimates.
2.3.1 Weight-of-Evidence Narrative

The 1999 draft revised cancer guidelines include a weight-of-evidence narrative that is based on an overall judgment of biological, chemical, and physical considerations. The hazard assessment emphasizes analysis of all relevant information rather than just tumor findings. The weight-of-evidence narrative lays out key evidence and includes a discussion of tumor data, information on the MOA, its implications for human hazard including sensitive subgroups, and dose-response evaluation. The narrative emphasizes route and level of exposure and relevance to humans. In addition, a discussion of the strengths and weaknesses of the database is included.

The weight-of-evidence narrative is written in nontechnical language. It provides the key data with conclusions, as well as the conditions for hazard expression. Conclusions about potential human carcinogenicity are presented by route of exposure. Contained within this narrative are simple likelihood descriptors that essentially distinguish whether there is enough evidence to make a projection about human hazard (i.e., carcinogenic to humans; likely to be carcinogenic to humans; suggestive evidence of carcinogenicity but not sufficient to assess human carcinogenic potential; data are inadequate for an assessment of human carcinogenic potential; and not likely to be carcinogenic to humans). Because one encounters a variety of data sets on agents, these descriptors are not meant to stand alone; rather, the context of the weight-of-evidence narrative is intended to provide a transparent explanation of the biological evidence and how the conclusions were derived. Moreover, these descriptors should not be viewed as classification categories (like the alphanumeric system), which often obscure key scientific differences among chemicals.
The new weight-of-evidence narrative also presents conclusions about how the agent induces tumors and the relevance of the MOA to humans, including sensitive subgroups, and recommends a dose-response approach based on an understanding of the MOA.

2.3.1.1 Mode of Action: General Considerations and Framework for Analysis

An MOA is a description of key events and processes starting with the interaction of an agent with a cell, through operational and anatomical changes, and resulting in cancer formation. "Mode" of action is contrasted with "mechanism" of action, which implies a more detailed, molecular description of events than is meant by MOA.

Mode of action conclusions are used to address the question of human relevance of animal tumor responses, to address differences in anticipated response among humans such as between children and adults or men and women, and as the basis of decisions about the anticipated shape of the dose-response relationship.

Mode of action analysis is based on physical, chemical, and biological information that helps to explain key events [4] in an agent's influence on development of tumors. There are many examples of possible modes of carcinogenic action such as mutagenicity, mitogenesis, inhibition of cell death, cytotoxicity with reparative cell proliferation, and immune suppression. All pertinent studies are reviewed in analyzing an MOA, and an overall weighing of evidence is performed, laying out the strengths, weaknesses, and uncertainties of the case as well as potential alternative positions and rationales. Identifying data gaps and research needs is also part of the assessment.

[4] A "key event" is an empirically observable precursor step that is itself a necessary element of the mode of action, or is a marker for such an element.

2.3.1.2 Framework for Evaluating a Postulated Carcinogenic Mode(s) of Action

The framework is intended to be an analytic tool for judging whether available data support a mode of carcinogenic action postulated for an agent, and includes nine elements:

1. Summary description of postulated MOA
2. Identification of key events
3. Strength, consistency, specificity of association
4. Dose-response relationship
5. Temporal relationship
6. Biological plausibility and coherence
7. Other modes of action
8. Conclusion
9. Human relevance, including subpopulations

In reaching conclusions, the question of "general acceptance" of an MOA will be tested as part of the independent peer review that EPA obtains for its assessment and conclusions.

2.3.2 Dose Estimation by the Oral Route

2.3.2.1 Determining the Human Equivalent Dose

An important objective in the dose-response assessment is to use a measure of internal or delivered dose at the target site when sufficient data are available. This is particularly important in those cases where the carcinogenic response information is being extrapolated to humans from animal studies. Generally, the measure of dose provided in the underlying human studies and animal bioassays is the applied dose, typically given in terms of mg/kg-day. When animal bioassay data are used, it is necessary to adjust the applied oral dose values to account for differences in toxicokinetics between animals and humans that affect the relationship between applied dose and delivered dose at the target organ, and thereby to estimate a human equivalent dose.

In the estimation of a human equivalent dose, the 1999 draft revised cancer guidelines recommend that when toxicokinetic data are available, they be used to convert the doses used in animal studies to equivalent human doses. However, in most cases, there are insufficient data available to compare dose between species. In these cases, the estimate of a human equivalent dose is based on science policy default assumptions. In the past, a standard surface area conversion was used; the surrogate for surface area was body weight raised to the 2/3 power (BW^(2/3)).
To derive an equivalent human dose from animal data, the new default procedure is to scale daily applied oral doses experienced over a lifetime in proportion to BW^(3/4). The BW^(3/4) adjustment factor is used because metabolic rates, as well as most rates of physiological processes that determine the disposition of a dose, scale this way. Thus, the rationale for this factor rests on the empirical observation that rates of physiological processes consistently tend to maintain proportionality with body weight raised to the 3/4 power.

Based on this assumption, the "human equivalent" of the applied oral dose in an animal study is obtained from the following algorithm, where the doses are in mg/kg-day:

$$\text{Human Equivalent Dose} = \text{Animal Dose} \times \left(\frac{\text{Animal BW}}{\text{Animal BW}^{3/4}}\right) \times \left(\frac{\text{Human BW}^{3/4}}{\text{Human BW}}\right) \qquad \text{(Equation 2-1)}$$

This equation can be simplified to:

$$\text{Human Equivalent Dose} = \text{Animal Dose} \times \left(\frac{\text{Animal BW}}{\text{Human BW}}\right)^{1/4} \qquad \text{(Equation 2-2)}$$

A more extensive discussion of the rationale and data supporting the Agency's change in scaling factors from BW^(2/3) to BW^(3/4) is in USEPA (1992b) and the 1999 draft revised cancer guidelines.

2.3.3 Dose-Response Analysis

Dose-response analysis addresses the relationship of dose to the degree of response observed in an animal or human study. Extrapolations are necessary when environmental exposures are outside of the range of study observations. Past observations of response have focused on the observation of tumors. The 1999 draft revised cancer guidelines suggest that responses may also include tumor precursors or other effects related to carcinogenicity. These effects may include: changes in DNA, chromosomes, or other key macromolecules; effects on growth signal transduction; induction of physiological or hormonal changes; effects on cell proliferation; or other effects that play a role in the carcinogenic process. Non-tumor effects are referred to as "precursor data" in the following discussion.
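The default interspecies scaling in Equations 2-1 and 2-2 (Section 2.3.2.1) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the body weights and dose in the example (a 0.35 kg rat, a 70 kg human, 10 mg/kg-day) are assumed values chosen for the arithmetic, not values specified in this section.

```python
def human_equivalent_dose(animal_dose: float,
                          animal_bw_kg: float,
                          human_bw_kg: float) -> float:
    """Simplified form (Equation 2-2): dose x (animal BW / human BW)^(1/4).

    Doses are in mg/kg-day; body weights are in kg.
    """
    if animal_bw_kg <= 0 or human_bw_kg <= 0:
        raise ValueError("body weights must be positive")
    return animal_dose * (animal_bw_kg / human_bw_kg) ** 0.25


def human_equivalent_dose_unsimplified(animal_dose: float,
                                       animal_bw_kg: float,
                                       human_bw_kg: float) -> float:
    """Unsimplified form (Equation 2-1), kept to show the two forms agree."""
    return (animal_dose
            * (animal_bw_kg / animal_bw_kg ** 0.75)
            * (human_bw_kg ** 0.75 / human_bw_kg))


if __name__ == "__main__":
    # Assumed example inputs, for illustration only.
    hed = human_equivalent_dose(10.0, animal_bw_kg=0.35, human_bw_kg=70.0)
    print(f"Human equivalent dose: {hed:.3f} mg/kg-day")
```

Because the animal is much lighter than the human, the scaling factor is less than one, so the human equivalent dose in mg/kg-day is smaller than the applied animal dose.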
Specific guidance regarding the use of animal data, presentation of study results, and selection of the optimal data for use in a dose-response analysis is discussed in detail in the 1999 draft revised cancer guidelines.

2.3.3.1 Characterizing Dose-Response Relationships in the Range of Observation

The first quantitative component in the derivation of AWQC for carcinogens is the dose-response assessment in the range of observation. The objective of this component is to identify a POD for low-dose extrapolation. Two options are available for the assessment in the observed range: development of a biologically based model, or curve-fitting of the tumor or precursor data.

If data are extensive and sufficient to quantitatively relate specific key events in the cancer process to neoplasia and the purpose of the assessment is such as to justify investing the necessary resources, a biologically based model can be used for both the observed tumor and related response data and for extrapolation below the range of observed data in either animal or human studies. Extensive data are required both to build the model and to estimate how well it conforms with observed tumor development specific to the agent. There are not sufficient data to utilize these types of models for most agents.

In the absence of adequate data to generate a biologically based model, dose-response relationships in the observed range can be addressed through curve-fitting procedures for tumor or precursor data. The models should be appropriate to the type of response data in the observed range (see Internet site http://www.epa.gov/ncea/bmds.htm).

The 1999 draft revised cancer guidelines call for modeling not only tumor data in the observable range, but also other responses thought to be important events preceding tumor development (e.g., DNA adducts, cellular proliferation, receptor binding, hormonal changes).
The modeling of those data is intended to better inform the dose-response assessment by providing insights into the relationships of exposure (or dose) below the observable range for tumor response. These non-tumor response data can only play a role in the dose-response assessment if the agent's carcinogenic mode of action is reasonably understood, as well as the role of that precursor event.

The 1999 draft revised cancer guidelines recommend calculating the lower 95 percent confidence limit on a dose associated with an estimated 10 percent increased tumor or relevant non-tumor response (LED10) for quantitative modeling of dose-response relationships in the observed range. The estimate of the LED10 is used as the POD for low-dose extrapolations discussed below. This standard point of departure (LED10) is adopted as a matter of science policy to remain as consistent and comparable from case to case as possible. It is also a convenient comparison point for noncancer endpoints. The rationale supporting use of the LED10 is that a 10 percent response is at or just below the limit of sensitivity for discerning a statistically significant tumor response in most long-term rodent studies and is within the observed range for other toxicity studies. Use of the lower limit takes experimental variability and sample size into account. The ED10 (central estimate) is also presented as a reference for comparison uses, especially for use in relative hazard/potency ranking among agents for priority setting.

For some data sets, a choice of the POD other than the LED10 may be appropriate. The objective is to determine the lowest reliable part of the dose-response curve for the beginning of the second step of the dose-response assessment (determining the extrapolation range). Therefore, if the observed response is below the LED10, then a lower point may be a better choice (e.g., LED5). Human studies more often support a lower POD than animal studies because of greater sample size.
The POD may be a NOAEL when an MOE analysis is the nonlinear dose-response approach. The kinds of data available and the circumstances of the assessment both contribute to deciding to use a NOAEL or LOAEL, which is not as rigorous or as ideal as curve fitting, but can be appropriate. If several data sets for key events and tumor response are available for an agent, and they are a mixture of continuous and incidence data, the most practicable way to assess them together is often through a NOAEL/LOAEL approach.

When a POD is estimated from animal data, it is adjusted to the human equivalent dose using an interspecies dose adjustment or toxicokinetic analysis.

Analysis of human studies in the observed range is designed on a case-by-case basis depending on the type of study and how dose and response are measured in the study. In some cases, the analysis may incorporate consideration of an agent's interactive effects with other agents.

2.3.3.2 Extrapolation to Low, Environmentally Relevant Doses

In most cases, the derivation of an AWQC will require an evaluation of carcinogenic risk at environmental exposure levels substantially lower than those used in the underlying study. Various approaches are used to extrapolate risk outside the range of observed experimental data. In the 1999 draft revised cancer guidelines, the choice of extrapolation method is largely dependent on the mode of action. It should be noted that the term "mode of action" (MOA) is deliberately chosen in the 1999 draft revised cancer guidelines in lieu of the term "mechanism" to indicate using knowledge that is sufficient to draw a reasonable working conclusion without having to know the processes in detail, as the term mechanism might imply. The 1999 draft revised cancer guidelines favor the choice of a biologically based model, if the parameters of such models can be calculated from data sources independent of tumor data.
It is anticipated that the necessary data for such parameters will not be available for most chemicals. Thus, the 1999 draft revised cancer guidelines allow for several default extrapolation approaches (low-dose linear, nonlinear, or both).

A. Biologically Based Modeling Approaches

If a biologically based model has been used to characterize the dose-response relationships in the observed range, and the confidence in the model is high, it may be used to extrapolate the dose-response relationship outside the observed data range. Although biologically based approaches are appropriate both for characterizing observed dose-response relationships and extrapolating to environmentally relevant doses, it is not expected that adequate data will be available to support such approaches for most substances. In the absence of such data, the default linear approach, the nonlinear (or MOE) approach, or both linear and nonlinear approaches are used.

B. Default Linear Extrapolation Approach

The default linear approach replaces the LMS approach that has served as the default for EPA cancer risk assessments. Any of the following conclusions leads to selection of a linear dose-response assessment approach:

• The chemical has direct DNA mutagenic reactivity or other indications of DNA effects that are consistent with linearity.
• Mode of action analysis does not support direct DNA effects, but the dose-response relationship is expected to be linear (e.g., certain receptor-mediated effects).
• Human exposure or body burden is high and near doses associated with key events in the carcinogenic process (e.g., 2,3,7,8-tetrachlorodibenzo-p-dioxin).
• There is an absence of sufficient tumor MOA information.

The procedures for implementing the default linear approach begin with the estimation of a POD as described above. The point of departure, LED10, reflects the interspecies conversion to the human equivalent dose and the other adjustments for less-than-lifetime experimental duration.
In most cases, the extrapolation for estimating response rates at low, environmentally relevant exposures is accomplished by drawing a straight line between the POD and the origin (i.e., zero dose, zero extra risk). This is mathematically represented as:

$$y = mx + b, \quad b = 0 \qquad \text{(Equation 2-3)}$$

where:
y = Response or incidence
m = Slope of the line (cancer potency factor) = $\Delta y / \Delta x$
x = Dose
b = y-intercept

The slope of the line, "m" (i.e., $\Delta y / \Delta x$, the estimated cancer potency factor at low doses), is computed as:

$$m = \frac{0.10}{\text{LED}_{10}} \qquad \text{(Equation 2-4)}$$

When an LED10 is not used, the standard equation for the slope of a line may be used:

$$m = \frac{y_2 - y_1}{x_2 - x_1} \qquad \text{(Equation 2-5)}$$

where:
y2 = Response at the POD
y1 = Response at the origin (zero)
x2 = Dose at the POD
x1 = Dose at the origin (zero)

Because y1 and x1 are both zero at the origin, the equation simplifies to:

$$m = \frac{y_2}{x_2} \qquad \text{(Equation 2-6)}$$

The risk-specific dose (RSD) is then calculated for a specific incremental targeted lifetime cancer risk (in the range of 10^-6 to 10^-4) as:

$$\text{RSD} = \frac{\text{Target Incremental Cancer Risk}}{m} \qquad \text{(Equation 2-7)}$$

where:
RSD = Risk-specific dose (mg/kg-day)
Target Incremental Cancer Risk [7] = Value typically in the range of 10^-6 to 10^-4
m = Cancer potency factor (mg/kg-day)^-1

The use of the RSD to compute the AWQC is described below in Section 2.3.4, AWQC Calculation.

[7] In 1980, the target lifetime cancer risk range was set at 10^-7 to 10^-5. However, both the expert panel for the AWQC workshop (USEPA, 1992a) and SAB recommended that EPA change the risk range to 10^-6 to 10^-4, to be consistent with drinking water.

C. Default Nonlinear Approach

As discussed in the 1999 draft revised cancer guidelines, any of the following conclusions leads to a selection of a nonlinear (MOE) approach to dose-response assessment:

• A tumor MOA supporting nonlinearity applies (e.g., some cytotoxic and hormonal agents such as disrupters of hormonal homeostasis), and the chemical does not demonstrate mutagenic effects consistent with linearity.
• An MOA supporting nonlinearity has been demonstrated, and the chemical has some indication of mutagenic activity, but it is judged not to play a significant role in tumor causation.

A default assumption of nonlinearity is appropriate when there is no evidence for linearity and sufficient evidence to support an assumption of nonlinearity. The MOA may lead to a dose-response relationship that is nonlinear, with response falling much more quickly than linearly with dose or with response being most influenced by individual differences in sensitivity. Alternatively, the MOA may theoretically have a threshold (e.g., the carcinogenicity may be a secondary effect of toxicity or of an induced physiological change that is itself a threshold phenomenon) (see Appendix C, Example 5, or Appendix D, Example 2 in USEPA, 1999a).

The EPA does not generally try to distinguish between modes of action that might imply a "true threshold" from others with a nonlinear dose-response relationship. Except in unusual cases where extensive information is available, it is not possible to distinguish between these empirically.

As a matter of science policy under this analysis, nonlinear probability functions are not fit to the response data to extrapolate quantitative low-dose risk estimates. This is because different models can lead to a very wide range of results, and there is currently no basis, generally, to choose among them. Thus, the default procedure for nonlinear extrapolation is to conduct an MOE analysis to evaluate concern for levels of exposure. An MOE is defined as the POD divided by the environmental exposure of interest. The environmental exposures of interest, for which MOEs are estimated, may be actual or projected exposure levels. An acceptable MOE is estimated. MOE analysis is applicable if data are sufficient to presume a nonlinear dose-response function containing a significant change in slope.
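The MOE definition given above (the POD divided by the environmental exposure of interest) reduces to a single division, sketched below. The function names, the POD, the exposure levels, and the comparison margin are all hypothetical values chosen for illustration; in practice the adequacy of an MOE is judged case by case from the factors discussed in this section, not from a fixed cutoff.

```python
def margin_of_exposure(pod_mg_per_kg_day: float,
                       exposure_mg_per_kg_day: float) -> float:
    """MOE = POD / environmental exposure of interest (same dose units)."""
    if exposure_mg_per_kg_day <= 0:
        raise ValueError("exposure must be positive")
    return pod_mg_per_kg_day / exposure_mg_per_kg_day


def meets_margin(moe: float, required_margin: float) -> bool:
    """Compare an MOE against a margin chosen by the assessor (hypothetical)."""
    return moe >= required_margin


if __name__ == "__main__":
    pod = 5.0                          # assumed POD, mg/kg-day (illustrative)
    exposures = [0.0005, 0.005, 0.05]  # assumed exposure scenarios, mg/kg-day
    required = 1000.0                  # assumed margin, for illustration only
    for exposure in exposures:
        moe = margin_of_exposure(pod, exposure)
        print(f"exposure={exposure} mg/kg-day  MOE={moe:,.0f}  "
              f"meets margin: {meets_margin(moe, required)}")
```

The MOE falls as the exposure of interest approaches the POD, which is why the same POD can correspond to an adequate margin in one exposure scenario and an inadequate one in another.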
An RfD [8] or RfC-like value may be estimated and considered based on a precursor event that is key to the cancer process.

[8] A reference dose (RfD) or reference concentration (RfC) for noncancer toxicity is an estimate with uncertainty spanning perhaps an order of magnitude of daily exposure to the human population (including sensitive subgroups) that is anticipated to be without appreciable deleterious effects during a lifetime. It is arrived at by dividing empirical data on effects by uncertainty factors that consider inter- and intraspecies variability, extent of data on all important chronic exposure toxicity endpoints, and availability of chronic as opposed to subchronic data.

To support a risk manager's consideration of the MOE, all of the pertinent hazard, dose-response, and human exposure information is characterized to provide insights about the scientific community's current understanding of the phenomena that may be occurring as dose (exposure) decreases substantially below the observed data. The goal is to provide as much information as possible about the risk reduction that accompanies lowering of exposure and the adequacy of an MOE based on scientific input.

Operationally, there are two main steps in the MOE approach:

The first step is the selection of a POD that is a "minimum effect dose level." The POD would ideally be the dose where the key events in tumor development would not occur in a heterogeneous human population, thus representing an actual "no-effect level." As noted above, the POD may be the LED10 [9] for tumor incidence or a precursor. In some cases, it may also be appropriate to use a NOAEL or LOAEL value from a precursor. When animal data are used, the POD is a human equivalent dose or concentration arrived at by interspecies dose adjustment (as discussed above) or toxicokinetic analysis.

The second step in using MOE analysis to establish an AWQC is the selection of an appropriate margin or UF to apply to the POD.
This is supported by analysis in the MOE discussion provided in the risk assessment. The Agency will develop more specific guidance on the MOE approach, as recommended by the Agency's SAB in its January 1999 review. The guidance will be peer reviewed and published separately as part of the Agency's implementation activity of these guidelines. The general principles and major elements to be considered in an MOE analysis are listed below.

• The nature of the response used for the dose-response assessment, for instance, whether it is a precursor effect or a tumor response. The latter may support a greater MOE.

• The slope of the observed dose-response relationship at the POD and its uncertainties and implications for risk reduction associated with exposure reduction. A steeper slope implies a greater reduction in risk as exposure decreases. This may support a smaller MOE.

• Human sensitivity compared with that of experimental animals. How sensitive is the human population compared with the tested animals? For this comparison, all doses should have already been converted to equivalent human doses, using either a toxicokinetic model or the default cross-species scaling factor. These dose conversions reflect interspecies differences in toxicokinetics, not toxicodynamics. When information is not sufficient to quantify human sensitivity with regard to the toxicodynamics compared with the tested animals, this uncertainty needs to be taken into account in the discussion of an adequate MOE. As with noncancer assessment, the default assumption is that the most sensitive humans are more sensitive than the test animals. Depending on the data available on the sensitivity of the test species to the agent and the endpoint of concern compared with humans, the MOE decision may need to incorporate more or less conservatism.

• Nature and extent of human variability and sensitivity. Is there information on sensitive individuals that would be part of a heterogeneous human population? Pertinent information would come from human studies, since animal studies, particularly those using homogeneous animal strains, do not provide information about human variability. When information is not sufficient to quantify the extent of human variability in sensitivity, this uncertainty should be reflected in the discussion of an adequate MOE (also see discussion below on human exposure).

• Human exposure. The MOE evaluation also takes into account the magnitude, frequency, and duration of exposure. If the population exposed in a particular scenario is wholly or largely composed of a subpopulation of special concern (e.g., children) for whom evidence indicates a special sensitivity to the agent's MOA, an adequate MOE would be larger than for general population exposure.

[9] The LED10 is adopted as the standard POD for non-tumor key event or toxicity incidence data in order to harmonize curve-fitting procedures between cancer and noncancer toxicity assessments. Because the NOAEL in study protocols for non-tumor toxicity can range from about a 5% to a 30% effect level, adopting the 10% effect level as the standard POD will accommodate most of these data sets without departing from the range of observation. The LED10 can be regarded as an improved and harmonized estimate of the NOAEL (USEPA, 1999a).

Considering the toxicity and other data presented in the weight-of-evidence narrative and the MOE analysis provided in the risk assessment for the chemical, a UF is selected [10] on a case-by-case basis, with full explanation of the rationale. The UF is used to modify the POD in the final equation. This is shown below in Section 2.3.4 on AWQC calculation.

D. Both Linear and Nonlinear Approaches

Any of the following conclusions leads to selection of both a linear and nonlinear approach to dose-response assessment. Relative support for each dose-response method and advice on the use of that information needs to be presented.
In some cases, evidence for one MOA is stronger than for the other, allowing emphasis to be placed on that dose-response approach. In other cases, both modes of action are equally possible, and both dose-response approaches should be emphasized.

• Modes of action for a single tumor type support both linear and nonlinear dose response in different parts of the dose-response curve (e.g., 4,4' methylene chloride).
• A tumor mode of action supports different approaches at high and low doses; e.g., at high dose, nonlinearity, but, at low dose, linearity (e.g., formaldehyde).
• The agent is not DNA-reactive and all plausible modes of action are consistent with nonlinearity, but a key event is not fully established.
• Modes of action for different tumor types support differing approaches, e.g., nonlinear for one tumor type and linear for another due to lack of MOA information (e.g., trichloroethylene).

[10] EPA will develop more specific guidance on the margin of exposure approach, as recommended by the Agency's SAB in 1999. The guidance will be peer reviewed and published separately as part of the Agency's implementation of the Final Revised Cancer Guidelines.

2.3.4 AWQC Calculation

2.3.4.1 Linear Approach

The following equation is used for the calculation of the AWQC for carcinogens where an RSD is obtained from the linear approach:

$$\text{AWQC} = \text{RSD} \times \left(\frac{\text{BW}}{\text{DI} + (\text{FI} \times \text{BAF})}\right) \qquad \text{(Equation 2-8)}$$

where:
AWQC = Ambient water quality criterion (mg/L)
RSD = Risk-specific dose (mg/kg-day)
BW = Human body weight (kg)
DI = Drinking water intake (L/day)
FI = Fish intake (kg/day)
BAF = Bioaccumulation factor (L/kg)

The AWQC calculation shown above is appropriate for water bodies that are used as sources of drinking water (and for other uses).
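The chain from point of departure to criterion under the linear approach (Equations 2-4, 2-7, and 2-8) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and every input value in the example (LED10, target risk, body weight, intake rates, and BAF) is assumed for the sake of the arithmetic rather than taken from this section.

```python
def cancer_potency_from_led10(led10_mg_per_kg_day: float) -> float:
    """Equation 2-4: m = 0.10 / LED10, in (mg/kg-day)^-1."""
    if led10_mg_per_kg_day <= 0:
        raise ValueError("LED10 must be positive")
    return 0.10 / led10_mg_per_kg_day


def risk_specific_dose(target_risk: float, potency: float) -> float:
    """Equation 2-7: RSD = target incremental cancer risk / m, in mg/kg-day."""
    return target_risk / potency


def awqc_linear(rsd: float, bw_kg: float, di_l_per_day: float,
                fi_kg_per_day: float, baf_l_per_kg: float) -> float:
    """Equation 2-8: AWQC = RSD x BW / (DI + FI x BAF), in mg/L."""
    return rsd * bw_kg / (di_l_per_day + fi_kg_per_day * baf_l_per_kg)


if __name__ == "__main__":
    # All inputs below are assumed, illustrative values.
    m = cancer_potency_from_led10(2.0)          # LED10 as a human equivalent dose
    rsd = risk_specific_dose(1e-6, m)           # target risk of one in a million
    awqc = awqc_linear(rsd, bw_kg=70.0, di_l_per_day=2.0,
                       fi_kg_per_day=0.0175, baf_l_per_kg=100.0)
    print(f"m = {m:.3f} (mg/kg-day)^-1")
    print(f"RSD = {rsd:.2e} mg/kg-day")
    print(f"AWQC = {awqc:.2e} mg/L")
```

The units check out term by term: FI x BAF has units of L/day, matching DI, so RSD (mg/kg-day) x BW (kg) divided by L/day yields mg/L.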
2.3.4.2 Nonlinear Approach

In those cases where the nonlinear, MOE approach is used, a similar equation is used to calculate the AWQC:

$$\text{AWQC} = \frac{\text{POD}}{\text{UF}} \times \text{RSC} \times \left(\frac{\text{BW}}{\text{DI} + (\text{FI} \times \text{BAF})}\right) \qquad \text{(Equation 2-9)}$$

where:
AWQC = Ambient water quality criterion (mg/L)
POD = Point of departure (mg/kg-day)
UF = Uncertainty factor (unitless)
BW = Human body weight (kg)
DI = Drinking water intake (L/day)
FI = Fish intake (kg/day)
BAF = Bioaccumulation factor (L/kg)
RSC = Relative source contribution (percentage or subtraction)

As noted above for the linear approach, the AWQC calculation shown above is appropriate for water bodies that are used as sources of drinking water (and for other uses).

A difference between the AWQC values obtained using the linear and nonlinear approaches is that the AWQC value obtained using the default linear approach corresponds to a specific estimated incremental lifetime cancer risk level in the range of 10^-6 to 10^-4. In contrast, the AWQC value obtained using the nonlinear approach does not describe or imply a specific cancer risk.

The actual AWQC chosen is based on a review of all relevant information, including cancer, noncancer, ecological, and other critical data. The AWQC might not utilize the value obtained from the cancer analysis if it is less protective than that derived from the noncancer endpoint.

2.3.5 Risk Characterization

Risk characterization information accompanies the numerical AWQC value and addresses the major strengths and weaknesses of the assessment arising from the availability of data and the current limits of understanding of the process of cancer causation. Key issues relating to the confidence in the hazard assessment and the dose-response analysis (including the low dose extrapolation procedure used) are discussed.
Whenever more than one interpretation of the weight of evidence for carcinogenicity or the dose-response characterization can be supported, and when choosing among them is difficult, the alternative views are provided along with the rationale for the interpretation chosen in the derivation of the AWQC value. Where possible, quantitative uncertainty analyses of the data are provided; at a minimum, a qualitative discussion of the important uncertainties is presented.

Important features of the risk characterization include significant scientific issues, significant science and science policy choices that were made when alternative interpretations of data exist, and the constraints of the data and the state of knowledge. The assessments of hazard, dose-response, and exposure are summarized to generate risk estimates for the exposure scenarios of interest. The 1999 draft revised cancer guidelines contain more detailed guidance regarding the development of risk characterization summaries and analyses.

2.3.6 Use of Toxicity Equivalence Factors and Relative Potency Estimates

The 1999 draft revised cancer guidelines state:

A toxicity equivalence factor (TEF) procedure is one used to derive quantitative dose-response estimates for agents that are members of a category or class of agents. TEFs are based on shared characteristics that can be used to rank or order the class members by carcinogenic potency when cancer bioassay data are inadequate for this purpose. The ordering is by reference to the characteristics and potency of a well-studied member or members of the class. Other class members are indexed to the reference agent(s) by one or more shared characteristics to generate their TEFs.

In addition, the 1999 draft revised cancer guidelines state that TEFs are generated and used for the limited purpose of assessment of agents or mixtures of agents in environmental media when better data are not available.
When better data become available for an agent, the TEF should be replaced or revised. To date, adequate data to support use of TEFs have been found only for dibenzofurans (dioxins) and coplanar polychlorinated biphenyls (PCBs) (USEPA, 1989, 1999b). The uncertainties associated with TEFs must be discussed when this approach is used. This is a default approach to be used when tumor data are not available for individual components in a mixture.

Relative potency factors (RPFs) can be similarly derived and used for agents with carcinogenicity or other supporting data. The RPFs are conceptually similar to TEFs, but do not have the same level of data to support them. TEFs and RPFs are used only when there is no better alternative. When they are used, uncertainties associated with them must be discussed. As of today, there are only three classes of compounds for which relative potency approaches have been examined by EPA: dioxins, PCBs, and polycyclic aromatic hydrocarbons (PAHs). There are limitations to the use of TEF and RPF approaches, and caution should be exercised when using them. More guidance can be found in the Draft Guidance for Conducting Health Risk Assessment of Chemical Mixtures (USEPA, 1999b).

2.4 CASE STUDY (COMPOUND Z, A RODENT BLADDER CARCINOGEN)

This section illustrates an application of the nonlinear method (MOE) for a rodent bladder carcinogen (Compound Z). A brief summary of the data set is provided below with conclusions regarding the weight of evidence: "Likely/Not Likely Human Carcinogen" - Range of Dose Limited, Margin-of-Exposure Extrapolation. For more details on the hazard evaluation and the mode of action evaluation of this chemical, see Appendices A and B, respectively, which are selected from the case studies in the 1999 draft revised cancer guidelines.
The AWQC values obtained using the default linear and LMS approaches are included for purposes of comparison only and would not be used for agents with the characteristics described for Compound Z.

2.4.1 Background and Evaluation for Compound Z

Compound Z is a metal organophosphonate which has been tested in acute, subchronic, chronic, reproductive, mutagenic and carcinogenic assays in multiple species. Tumors were observed only in rat studies. No human data are available. Based on a review of the toxicity, mechanistic, metabolic, and other data summarized below for this agent, it was concluded that a nonlinear approach is most appropriate for establishing AWQC based on carcinogenicity. (See Appendices A and B for more detail.)

Lifetime cancer bioassays of Compound Z identified bladder tumors and hyperplasia in male rats at doses of 1500 mg/kg-day and higher in the diet. These effects were not observed at 100 and 400 mg/kg-day. In a 90-day study designed to evaluate the mechanisms of tumor induction, the following sequence was identified as critical to bladder tumor formation in rats: 1) large doses of Compound Z produce urinary calcium/potassium imbalance followed by 2) diuresis, a sharp drop in urine pH, formation of urinary calculi, and 3) appearance of transitional cell hyperplasia in the renal pelvis, ureter, and urinary bladder. These effects occurred within two weeks of exposure onset, persisted to the end of exposure, and were reversible upon cessation of the 90-day exposure.

The pathological events caused by Compound Z are believed to result from prolonged mechanical irritation of the bladder by calculi that developed in response to the exposure. At high but not lower subchronic doses in the male rat, Compound Z leads to elevated blood phosphorus levels; the body responds by releasing excess calcium into the urine. The calcium and phosphorus combine in the urine and precipitate into multiple stones in the bladder.
The stones are very irritating to the bladder; the bladder lining is eroded, and cell proliferation occurs to compensate for the loss of the lining. This leads to development of hyperplasia, with subsequent tumor formation. A prolonged increase in the rate of proliferation of cells of the urinary bladder has been proposed to be an important step in the induction of urinary bladder tumors (Cohen and Ellwein, 1990, 1991). Thus, the association of cell proliferation, hyperplasia, and subsequent cancer induction as a result of urinary stone formation due to exposure to Compound Z is proposed as one mode of action which may justify, after a review of all relevant data, the use of a nonlinear approach, such as the MOE approach.

Studies of the effects of separated components of this agent (i.e., the metal and the organophosphate components) yield no evidence of carcinogenicity in the bladder. In metabolic studies in animals, the metallic component in isolation from the parent molecule was not absorbed to a significant extent from the gastrointestinal tract.

Compound Z has been assessed via a battery of mutagenicity assays that have yielded negative results, and a review of the chemical structure does not suggest potential genotoxicity. The metabolites of Compound Z have also yielded negative results in mutagenicity assays and yielded no evidence of carcinogenicity. The negative genotoxicity results for Compound Z and structurally related agents provide further support for the use of a nonlinear approach, such as the MOE approach, to establish AWQC.

2.4.2 Conclusion and Use of the MOE Approach for Compound Z

Compound Z, a metal aliphatic phosphonate, is likely to be carcinogenic to humans only under high-exposure conditions following oral and inhalation exposure that lead to bladder stone formation, but is not likely to be carcinogenic under low-exposure conditions.
It is not likely to be a human carcinogen via the dermal route, given that the compound is a metal conjugate that is readily ionized, and its dermal absorption is not anticipated. The weight of evidence is based on: (1) bladder tumors only in male rats at high exposure; (2) the absence of tumors at any other site in rats or mice; (3) the formation of calcium-phosphorus-containing bladder stones in male rats at high, but not low, exposure. The bladder stones erode bladder epithelium and result in profound increases in cell proliferation and cancer; and (4) the absence of carcinogenic structural analogues or mutagenic activity. There is a strong mode of action basis for the requirements of high doses of Compound Z, which leads to excess calcium and increased acidity in the urine, resulting in the precipitation of bladder stones and subsequent increase in cell proliferation and tumor hazard potential. Lower doses fail to perturb urinary constituents, lead to stones, produce toxicity, or give rise to tumors. Therefore, dose-response assessment should assume nonlinearity. A major uncertainty is whether the profound effects of Compound Z may be unique to the rat. Even if Compound Z produced stones in humans, there is only limited evidence that humans with bladder stones develop cancer. Based on the progression of pathology leading to tumors, in which hyperplasia is an early critical step, hyperplasia was selected as the sentinel precursor effect which was used as the basis for the calculation of AWQC using the MOE approach. Hyperplasia incidence data from a lifetime rat study are available for Compound Z. Tumor data from the same lifetime rat study were used to calculate AWQC using the default linear and LMS approaches for purposes of comparison. The data used for all three approaches are summarized in Table 2-1 below. 
2.4.2.1 Identification of the Point of Departure for Compound Z

The POD chosen for the MOE calculations was 400 mg/kg-day, which is the maximum animal dose yielding no observable hyperplastic effects (the NOAEL shown in Table 2-1; see footnote 11). The study found males to be more sensitive than females, and the hyperplasia results in male rats were used for AWQC calculations. The human equivalent dose for the NOAEL, 106.4 mg/kg-day, was calculated using the new scaling factor of body weight raised to the 3/4 power (as shown in Equation 2-1).

Footnote 11: This is based on a dietary conversion factor for rats from ppm to mg/kg-day of 0.05.

Table 2-1. Study Results from a Lifetime Exposure of Male Rats to Compound Z

Animal dose (mg/kg-day) | Scaled human equivalent dose, BW^3/4 (a) | Scaled human equivalent dose, BW^2/3 (b) | Number in group | Number responding: tumors (combined papilloma & carcinoma) | Number responding: hyperplasia
0 | - | - | 73 | 3 | 5
400 | 106.4 | 68.4 | 78 | 2 | 5
1500 | 398.9 | 256.5 | 78 | 21* | 29*

a. The (BW)^3/4 scaling factor is based on the 1999 draft revised cancer guidelines.
b. The (BW)^2/3 scaling factor is based on the 1986 cancer guidelines and is used with the LMS method later in this section for comparative purposes.
* There were statistically significant (p<0.05) increases in both tumor incidence and hyperplasia in the treated group compared with the control group.

2.4.2.2 Discussion of the Points Affecting Selection of the UF for Compound Z

The Nature of the Response. The response used for the dose-response assessment is hyperplasia, which is a precursor effect. Therefore, a smaller UF is needed.

Slope of the Dose-Response Relationship. The data available indicate a steep slope at the point of departure (at 400 mg/kg-day animal dose). This would suggest a rapid reduction in risk with lower doses, or a smaller UF.

Intraspecies Variability.
There is variability within the human population in responses to xenobiotic agents, which may result from a variety of factors including health status, diet, age, and genetic composition. Research on Compound Z did not identify a common health or genetic condition that would yield a subpopulation particularly susceptible to the carcinogenic effects of Compound Z, nor did it indicate an exceptionally high or low level of intraspecies variability.

Interspecies Variability. Animals and humans may vary widely in their responses to agents due to their differing physiology and metabolism. A review of human case studies and epidemiological studies indicates that humans may be significantly less susceptible to the influence of bladder irritation, stone formation, and subsequent tumor formation than male rodents. This would suggest a smaller UF for interspecies variability.

Human Exposure. This exposure scenario is chronic, so there is no need to apply an additional UF.

After considering all the issues together, a decision is made on the margin of safety (MOS) exposure or the UF. The size of the UF is a matter of policy and is selected on a case-by-case basis, considering the weight of evidence and the MOE analysis provided in the risk assessment (see footnote 12).

In summary, an overall UF of 30 is used in the MOE calculation. The selection of the UF is based on a consideration of all the factors discussed above, including intraspecies variability (10) and interspecies variability (3 is used here because a scaling factor of body weight raised to the 3/4 power has already been applied to adjust for toxicokinetic differences). In addition, the database for this chemical is very extensive, as described in detail in Appendix B (selected from the case study of the 1999 draft revised cancer guidelines). Further, the duration of the key study used for quantification is chronic.
Thus, this factor of 30 is considered to be sufficient for human health protection. The risk may decline considerably with doses lower than the POD; the male rat is a very sensitive model (mice do not respond). Physiological phenomena are likely to fall off sharply with dose as shown by the dose-response curve. Further, bladder stone and subsequent tumor formation is not a common phenomenon in humans.

Footnote 12: EPA will provide specific guidance on the margin of exposure approach. The guidance will be peer reviewed and published separately as part of the Agency's implementation activity of the Draft Revised Cancer Guidelines.

2.4.2.3 AWQC Calculations for Compound Z

Equation 2-9 shown in Section 2.3.4.2 was used to calculate the AWQC for Compound Z:

$$\mathrm{AWQC} = \frac{\mathrm{POD}}{\mathrm{UF}} \times \mathrm{RSC} \times \frac{\mathrm{BW}}{\mathrm{DI} + (\mathrm{FI} \times \mathrm{BAF})} \qquad \text{(Equation 2-9)}$$

The following input parameters were used:

POD = Point of departure (106.4 mg/kg-day (NOAEL))
UF = Uncertainty factor of 30
BW = Body weight for adult (70 kg)
DI = Drinking water intake (2 L/day)
FI = Fish intake (0.0175 kg/day)
BAF = Assumed bioaccumulation factor (BAF) (300 L/kg)
RSC = Relative source contribution (20% assumed)

This calculation yields an AWQC of 6.7 mg/L. The body weight, water intake, fish intake, and RSC percentage values used in the above calculation are the current default values for adults. The BAF, which accounts for the accumulation of Compound Z from water through the food chain and into fish tissue, has been arbitrarily chosen for purposes of this case study. The AWQC calculation shown above is appropriate for water bodies that are used as sources of drinking water (and for other uses).

2.4.3 Use of the Default Linear Approach for Compound Z

This section is provided for purposes of illustrating the use of the default linear approach for deriving AWQC based on carcinogenicity and to compare the resulting AWQC to that obtained above using the MOE approach.
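For reference in that comparison, the MOE-based value from Section 2.4.2.3 can be reproduced with a short sketch. It assumes Equation 2-9 has the form (POD/UF) x RSC x BW / (DI + FI x BAF), with the RSC applied as a fraction (0.20); the function and variable names are illustrative and are not part of the Methodology.

```python
def awqc_nonlinear(pod, uf, rsc, bw, di, fi, baf):
    """Sketch of Equation 2-9: AWQC (mg/L) from a point of departure.

    Assumed form: (POD / UF) * RSC * BW / (DI + FI * BAF).
    """
    # (POD / UF) * RSC: dose allotted to water and fish exposure (mg/kg-day)
    allotted_dose = (pod / uf) * rsc
    # DI + FI * BAF: equivalent daily water volume from drinking water and fish (L/day)
    equivalent_intake = di + fi * baf
    return allotted_dose * bw / equivalent_intake


# Input parameters as listed in Section 2.4.2.3 for Compound Z
awqc_moe = awqc_nonlinear(
    pod=106.4,   # mg/kg-day, human equivalent NOAEL
    uf=30,       # unitless
    rsc=0.20,    # 20 percent, expressed as a fraction
    bw=70,       # kg
    di=2,        # L/day
    fi=0.0175,   # kg/day
    baf=300,     # L/kg, arbitrary case-study value
)
print(f"MOE-based AWQC: {awqc_moe:.2f} mg/L")
```

With the inputs exactly as listed, this arithmetic gives about 6.85 mg/L, slightly above the reported 6.7 mg/L. The small gap may reflect rounding of intermediate values in the original calculation; the source does not say.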
As discussed in Section 2.4.1 above, it is important to note that the default linear method would most likely not, in practice, be recommended as an approach for quantifying the risk and deriving the AWQC for Compound Z given the hazard characteristics described for this substance.

2.4.3.1 Computing the Human Equivalent Dose for Compound Z

The doses used in the study were adjusted to obtain a human equivalent dose, as shown in Table 2-1. In the absence of toxicokinetic data, this was done using a scaling factor of $\mathrm{BW}^{3/4}$, with a male rat weight of 0.35 kg and a human weight of 70 kg (as shown in Equation 2-1).

2.4.3.2 Calculation of AWQC for Compound Z

To describe the dose-response of tumor incidence data in the observed range, a curve-fitting model such as the multistage or other approach appropriate for the data can be used. In the case of Compound Z, three data points (at doses of 0, 400, and 1500 mg/kg-day) were used in the multistage model (GLOBAL86) to calculate the LED10 (the 95 percent lower confidence limit on a dose associated with a 10 percent increase in response). The value obtained for the LED10 is 204 mg/kg-day.

The cancer slope factor (m) is calculated by dividing 0.1 by the LED10 using Equation 2-4:

$$m = \frac{0.10}{\mathrm{LED}_{10}} \qquad \text{(Equation 2-4)}$$

This yields an estimated cancer slope factor of $4.9 \times 10^{-4}$ per mg/kg-day. The cancer slope factor is then used in Equation 2-7 with a specified risk level (in this case $10^{-6}$) to calculate an RSD:

$$\mathrm{RSD} = \frac{\text{Target Incremental Cancer Risk}}{m} \qquad \text{(Equation 2-7)}$$

This yields an RSD of $2.0 \times 10^{-3}$ mg/kg-day.
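The steps from animal dose to RSD can be traced numerically. The sketch below assumes that Equation 2-1 (not reproduced in this section) scales a mg/kg-day dose by the ratio of animal to human body weight raised to the 1/4 power, which is what BW^3/4 scaling of total dose implies and which reproduces the scaled doses in Table 2-1. The LED10 is taken as quoted from GLOBAL86 and is not refitted here; all names are illustrative.

```python
RAT_BW_KG = 0.35     # male rat body weight from Section 2.4.3.1
HUMAN_BW_KG = 70.0   # adult human body weight


def human_equivalent_dose(animal_dose, power=0.75):
    """Scale an animal dose (mg/kg-day) to a human equivalent dose.

    Assumption: total dose (mg/day) scales with BW**power, so the
    per-kg dose scales with (BW_animal / BW_human) ** (1 - power).
    """
    return animal_dose * (RAT_BW_KG / HUMAN_BW_KG) ** (1.0 - power)


hed_noael = human_equivalent_dose(400)   # compare with 106.4 in Table 2-1
hed_high = human_equivalent_dose(1500)   # compare with 398.9 in Table 2-1

led10 = 204.0                # mg/kg-day, GLOBAL86 value quoted in the text
slope = 0.10 / led10         # Equation 2-4, per mg/kg-day
target_risk = 1e-6           # specified incremental lifetime cancer risk
rsd = target_risk / slope    # Equation 2-7, mg/kg-day

print(f"Human equivalent doses: {hed_noael:.1f} and {hed_high:.1f} mg/kg-day")
print(f"Slope factor: {slope:.2e} per mg/kg-day")
print(f"RSD: {rsd:.2e} mg/kg-day")
```

Because the risk is assumed to be linear below the LED10, the RSD is simply the LED10 scaled by the ratio of the target risk to 0.10.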
The RSD is used in Equation 2-8 with the same input parameters (body weight, drinking water intake, fish intake, and BAF) as those used for the MOE approach:

$$\mathrm{AWQC} = \mathrm{RSD} \times \frac{\mathrm{BW}}{\mathrm{DI} + (\mathrm{FI} \times \mathrm{BAF})} \qquad \text{(Equation 2-8)}$$

This yields an AWQC of 0.019 mg/L (rounded from 0.0189 mg/L) for a target risk of $10^{-6}$.

2.4.4 Use of the LMS Approach for Compound Z

This section is provided strictly for purposes of comparing the use of the MOE approach with the traditional LMS method for deriving AWQC for carcinogens. As discussed above, the LMS approach would not be used in practice to quantify risk and derive the AWQC for Compound Z given the hazard characteristics described for this substance.

First, the LMS approach was used to fit the male rat tumor data shown in Table 2-1 using the computer program GLOBAL86. This program calculates the 95th percentile upper confidence limit on the linear slope (i.e., the $q_1^*$) in the low dose range. A human equivalent dose was calculated using the $\mathrm{BW}^{2/3}$ interspecies dose scaling factor for purposes of illustrating the results obtained applying the 1980 Methodology. The human equivalent doses obtained using this scaling factor are shown in Table 2-1 above. (The same data set, using differently scaled doses, was employed for both the new linear and LMS approaches.) The $q_1^*$ value obtained using the LMS approach is $6 \times 10^{-4}$ (mg/kg-day)$^{-1}$.

Equation 2-7 was used with a reference incremental cancer risk of $10^{-6}$ to calculate an RSD of $1.7 \times 10^{-3}$ mg/kg-day. Equation 2-8 was then used to calculate the AWQC with the same input parameters (body weight, drinking water intake, fish intake, and BAF) as those used for the MOE approach. The AWQC was calculated to be 0.016 mg/L and was rounded from 0.0157 mg/L.

2.4.5 Comparison of Approaches and Results for Compound Z

The results of the three approaches used for Compound Z are summarized in Table 2-2.
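The two linear values that enter this comparison can be recomputed from the figures quoted in Sections 2.4.3 and 2.4.4. The sketch below assumes Equation 2-8 has the form RSD x BW / (DI + FI x BAF) and uses the same default inputs as the MOE calculation; the function and variable names are illustrative only.

```python
def awqc_linear(rsd, bw=70.0, di=2.0, fi=0.0175, baf=300.0):
    """Sketch of Equation 2-8: AWQC (mg/L) from a risk-specific dose (mg/kg-day)."""
    return rsd * bw / (di + fi * baf)


TARGET_RISK = 1e-6

# Default linear approach: RSD quoted in Section 2.4.3.2
rsd_default_linear = 2.0e-3            # mg/kg-day

# LMS approach: q1* quoted in Section 2.4.4, converted with Equation 2-7
q1_star = 6e-4                         # per mg/kg-day
rsd_lms = TARGET_RISK / q1_star        # mg/kg-day

awqc_default_linear = awqc_linear(rsd_default_linear)
awqc_lms = awqc_linear(rsd_lms)

for name, value in [("Default linear", awqc_default_linear), ("LMS", awqc_lms)]:
    print(f"{name}: {value:.3f} mg/L")
```

With these inputs both results round to the 0.019 and 0.016 mg/L reported in the text, which is more than two orders of magnitude below the MOE-based value of 6.7 mg/L.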
The AWQC calculated using the MOE approach is substantially higher than that obtained using the default linear and LMS approaches. If larger or smaller UFs were used in the MOE calculations, the AWQC obtained using the MOE approach would decrease or increase accordingly. The quantitative relationship between AWQC derived using different methods will vary depending on the nature of the data set and the UFs and POD selected for use in the MOE approach.

Table 2-2. Comparison of AWQC Obtained for Compound Z Using the MOE, Default Linear, and LMS Approaches

Method | AWQC (mg/L)
MOE: Using hyperplasia as a precursor for determining the POD and a UF of 30. | 6.7
Default Linear: Using linear extrapolation (straight line drawn from the LED10 to the origin) with a $10^{-6}$ target risk level and an interspecies scaling factor based on BW^3/4. | 0.019
LMS: Using the linearized multistage approach with a $10^{-6}$ risk level and an interspecies scaling factor based on BW^2/3. | 0.016

2.5 REFERENCES

Barnes, D.G., G.P. Daston, J.S. Evans, A.M. Jarabek, R.J. Kavlock, C.A. Kimmel, C. Park, and H.L. Spitzer. 1995. Benchmark dose workshop: criteria for use of a benchmark dose to estimate a reference dose. Regul. Toxicol. Pharmacol. 21:296-306.

Chen, C.W. and G. Oberdorster. 1996. Selection of models for assessing dose-response relationship for particle-induced lung cancer. Inhalation Toxicol. 8:259-278.

Cohen, S.W. and L.B. Ellwein. 1990. Cell proliferation in carcinogenesis. Science 249:1007-1011.

Cohen, S.W. and L.B. Ellwein. 1991. Genetic errors, cell proliferation and carcinogenesis. Cancer Res. 51:6493-6505.

Crump, K. 1984. A new method for determining allowable daily intakes. Fund. Appl. Toxicol. 4:854-891.

OSTP (Office of Science and Technology Policy). 1985. Chemical carcinogens: Review of the science and its associated principles. Federal Register 50:10372-10442.

USEPA (U.S. Environmental Protection Agency). 1986. Guidelines for carcinogen risk assessment. Federal Register 51:33992-34003.

USEPA (U.S. Environmental Protection Agency). 1989. Interim Procedures for Estimating Risks Associated with Exposures to Mixtures of Chlorinated Dibenzo-p-dioxins and Dibenzofurans (CDDs and CDFs) and 1989 Update. Risk Assessment Forum. Washington, DC. EPA/625/3-89/016.

USEPA (U.S. Environmental Protection Agency). 1991. Workshop Report on Toxicity Equivalency Factors for Polychlorinated Biphenyl Congeners. Risk Assessment Forum. Washington, DC. EPA/625/3-91/020.

USEPA (U.S. Environmental Protection Agency). 1992a. Report of the National Workshop on Revision of the Methods for Deriving National Ambient Water Quality Criteria for the Protection of Human Health. Office of Water. Washington, DC.

USEPA (U.S. Environmental Protection Agency). 1992b. Draft report: A cross-species scaling factor for carcinogen risk assessment based on equivalence of mg/kg^3/4/day. Federal Register 57:24152-24173.

USEPA (U.S. Environmental Protection Agency). 1996. Proposed Guidelines for Carcinogen Risk Assessment. Office of Research and Development. Washington, DC. EPA/600/P-92/003C. (Federal Register 61:17960)

USEPA (U.S. Environmental Protection Agency). 1998a. Draft Water Quality Criteria Methodology: Human Health. Federal Register Notice. Office of Water. Washington, DC. EPA-822-Z-98-001.

USEPA (U.S. Environmental Protection Agency). 1998b. Ambient Water Quality Criteria Derivation Methodology - Human Health. Technical Support Document. Final Draft. Office of Water. Washington, DC. EPA

USEPA (U.S. Environmental Protection Agency). 1999a. Guidelines for Carcinogen Risk Assessment. Review Draft. Risk Assessment Forum. Washington, DC. NCEA-F-0644. July.

USEPA (U.S. Environmental Protection Agency). 1999b. Draft guidance for conducting health risk assessment of chemical mixtures. Federal Register 64:23833-23834.

3. NONCANCER EFFECTS

3.1 INTRODUCTION

The evaluation of risks from noncarcinogenic chemicals traditionally has been based on the assumption that noncarcinogens have a dose or level below which no adverse effects are expected to occur. The risk estimate developed by EPA for noncarcinogens is the reference dose (RfD). The Integrated Risk Information System (IRIS) Background Document entitled Reference Dose (RfD): Description and Use in Health Risk Assessments (USEPA, 1988; hereafter the "1988 RfD background document") defines an RfD as "an estimate (with uncertainty spanning approximately an order of magnitude) of a daily exposure to the human population (including sensitive subgroups) that is likely to be without appreciable risk of deleterious effects over a lifetime."

The RfD is acknowledged to be an estimate and, thus, may not be completely protective of every individual within a highly variable population; conversely, exposures above the RfD are not necessarily unsafe. Some individuals may have better adaptive or protective capacities than others, and responses may vary with age and state of health; thus, individuals respond differently to toxicant exposure (Barnes and Dourson, 1988).

The key step in deriving water quality criteria for the protection of human health from noncancer effects is the determination of the RfD. As described in Section 1, the RfD is used in concert with additional information regarding exposure and the bioaccumulation potential of the substance to derive an AWQC for noncancer effects. The procedures presented in USEPA's 1988 RfD background document for deriving the RfD using an experimentally derived NOAEL/LOAEL approach are incorporated into this chapter.

The Agency is also investigating alternative methods for estimating the RfD. Thus, this guidance document contains information on two alternative methods: BMD and categorical regression approaches.
The Agency continues to conduct research on the utility of both of these methods in the noncancer risk assessment process and recommends their application in circumstances where the data are sufficient. The Agency used the BMD approach to derive an RfD for methylmercury as described in Reference Dose (RfD) for Oral Exposure for Methylmercury (USEPA, 1994a).

This section begins with a discussion of hazard identification and dose-response characterization. This is followed by a description of factors to be considered in the selection of critical data sets for use in the risk assessment evaluation. The procedures for deriving an RfD for a substance using the traditional NOAEL/LOAEL approach are presented as the accepted current risk assessment practice used by EPA. Next, the BMD method for deriving an RfD is discussed, and an example of its application is provided for illustrative purposes. A brief discussion of categorical regression is also included, with references to the relevant literature. The chapter concludes with specific sections on several issues relevant to noncancer risk assessment, including practical nonthreshold effects and risks from short-term exposures and mixtures.

While the intent of this guidance is to provide sufficient information to apply methods for deriving RfDs, this document does not detail all relevant issues and underlying theory associated with these methods. For further information, the reader is referred to the sources cited in the reference list (in particular, USEPA, 1988; Crump et al., 1995; and Hertzberg and Miller, 1985).

3.2 HAZARD IDENTIFICATION

The first step in the risk assessment involves preparing a hazard identification, based on a review of data available to characterize the health effects associated with chemical exposure.
The 1988 RfD background document outlines considerations for choosing data upon which to base a hazard identification for noncancer health effects (see footnote 13). Assessors should prepare a hazard identification document that describes the nature of exposure, the type and severity of effects observed, and the quality and relevance of data to humans.

Well-conducted human studies are considered the best for establishing a link between exposure to an agent and manifestation of an adverse effect. In the absence of adequate human data, the Agency relies primarily on animal studies. In such cases, the principal studies are drawn from experiments conducted on laboratory mammals, most often rat, mouse, rabbit, guinea pig, dog, monkey, or hamster. Well-designed animal studies offer the benefit of controlled chemical exposures and definitive toxicological analysis. Supporting evidence provides additional information for dose-response assessment and may come from a wide variety of sources, such as metabolic and pharmacokinetic studies. In vitro studies seldom provide definitive hazard identification data, but they can often provide insight into the compound's potential for human toxicity.

Important to the hazard identification is consideration of the biological and statistical significance of observed effects. The determination of whether an effect is adverse requires professional judgment. Generally, adverse health effects are considered to be those deleterious effects which are or may become debilitating, harmful, or toxic to the normal functions of an organism, including reproductive and developmental effects. Adverse effects do not include such effects as tissue discoloration without histological or biochemical effects, or the induction of the enzymes involved in the metabolism of the substance. Guidelines for defining the severity of adverse effects have been suggested by Hartung and Durkin (1986).
EPA has also developed guidelines for the ranking of observed effects (USEPA, 1995) and a ranking scheme for slight to severe effects. Distinguishing slight effects such as reversible enzyme induction and reversible subcellular change from more serious effects is critical in distinguishing between a NOAEL and LOAEL.

It is also important to evaluate the reversibility of an effect. Reversibility refers to whether or not a change will return to normal or within normal limits either during the course of or following exposure. However, even a reversible effect may be adverse to an organism. In performing a hazard identification, irreversible effects should be distinguished from less serious, but still adverse, reversible changes.

Footnote 13: The Agency has also developed guidelines that explain the process of hazard identification for developmental (USEPA, 1991a) and reproductive (USEPA, 1994b) effects. Please refer to these EPA documents for guidance in these areas.

The exposure conditions for toxicity tests, including the route (e.g., inhaled versus ingested), source (e.g., water versus food), and duration, should be discussed in the hazard identification.

The hazard identification should also include an evaluation of the quality of studies. Elements that affect the quality of studies include the soundness of the study protocol, the adequacy of data analysis, the characterization of the study compound, the types of species used, the number of individuals per study group, the number of study groups, dose spacing, the types of observations recorded, sex and age of animals, and the route and duration of exposure (USEPA, 1988).

The hazard identification should conclude with a weight-of-evidence discussion. In general, the discussion should review the results of different studies and develop an overall picture of the chemical's toxicity. Evidence for possible toxicity in humans is supported by similar results across species and across investigators.
A plausible mechanism of action for the effect, as well as similar toxic activity in chemicals of similar structure, also adds to the weight of evidence.

3.3 DOSE-RESPONSE ASSESSMENT

The dose-response assessment involves the evaluation of toxicity data to identify doses at which statistically and/or biologically significant effects occur, and to identify NOAEL and/or LOAEL values. The effects data are also evaluated to see if there is a quantitative relationship between dose and the magnitude of the effect. Dose-response relationships can be linear, curvilinear, or U-shaped. The RfD is traditionally estimated by identifying the most appropriate NOAEL for the critical effect. The LOAEL may be used to estimate the RfD if no appropriate NOAELs have been identified.

3.4 SELECTION OF CRITICAL DATA

3.4.1 Critical Study

Ideally, the scientific data for noncancer effects should include sufficient information to characterize quantitatively the incidence and severity of response as dose increases. However, complete data are frequently lacking. Instead, the Agency bases the derivation of the RfD on the NOAEL or LOAEL from a critical study or collection of critical studies. The choice of the critical study or studies to use in the derivation of the chronic RfD requires professional judgment concerning the quality of the studies, the definition of adverse effects and their level of occurrence.

As part of the hazard identification, all relevant toxicity data on a chemical should be evaluated to support the establishment of the RfD. Those studies representing the best quality and most appropriate data should be considered for defining adverse effects and their level of occurrence.

In choosing a study on which to base the RfD, the Agency recommends a hierarchy of acceptable data. Most preferable is a well-conducted epidemiologic study that demonstrates a positive association between a quantifiable exposure to a chemical and human disease.
Use of acceptable human studies avoids the problems of interspecies extrapolation, and thus, confidence in the estimate is often greater. At present, however, human data adequate to serve as a basis for quantitative risk assessment are available for only a few chemicals. Most often, inference of adverse health effects for humans must be drawn from toxicity information gained through animal experiments with human data serving qualitatively as supporting evidence. Under this condition, health effects data must be available from well-conducted animal studies and be relevant to humans based on a defensible biological rationale (e.g., similar metabolic pathways). In the absence of data from a more "relevant" species, data from the most sensitive animal species tested (i.e., the species demonstrating an adverse health effect at the lowest administered dose via a relevant route of exposure), shall generally be used as the critical study. The route of administration must be considered when choosing the critical study from among quality toxicity tests. The vehicle in which the chemical is administered is also relevant. For example, within the oral route of exposure, the bioavailability of a chemical ingested from one source (e.g., food) may differ from when it is ingested from another source (e.g., water). Usually, the toxicity database does not provide data on all possible routes, sources, and/or durations of administration. In general, the preferred exposure route is that which is considered most relevant to environmental exposure. For example, when developing drinking water standards, the Agency has placed greater weight on oral studies in experimental animals, especially those studies in which the contaminant is administered via water. 
However, in the absence of data on the exposure route and/or source of concern, it is the Agency's view that the potential for the toxicity manifested by one route and/or source of exposure may be relevant to other exposure routes and/or sources. EPA's Interim Methods for the Development of Inhalation Reference Doses (USEPA, 1989) discusses specific issues relevant to route-to-route extrapolation. These include issues of portal-of-entry effects, available toxicokinetic data for the routes of interest, measurements of absorption efficiency by each route of interest, comparative excretion data when the associated metabolic pathways are equivalent by each route of interest, and comparative systemic toxicity data when such data indicate equivalent effects by each route of interest.

Preference should be given to studies involving exposure over a significant portion of the animal's lifespan since this is anticipated to reflect the most relevant environmental exposure. Studies with shorter time frames can miss important effects. In selected cases, studies of less than 90 days can be used for quantification, but the study must be of exceptionally high quality. In general, short-term tests should not be used for anything other than interim RfDs or for developmental RfDs. However, developmental effects can sometimes be the critical effect and serve as the basis of an RfD. The duration of a developmental study is generally less than 15 days.

3.4.2 Critical Data and Endpoint

The experimental exposure level representing the highest dosage level tested at which no adverse effects were demonstrated in any of the species evaluated should be used for criteria development. By basing criteria on the critical toxic effect, it is assumed that all toxic effects are prevented (USEPA, 1988). In the absence of such data, the lowest LOAEL dosage may be used for criteria development and an additional uncertainty factor for LOAEL to NOAEL extrapolation is applied.
When two or more studies of equal quality and relevance exist, the geometric mean of the NOAELs or LOAELs may be used.

Often a chemical may elicit multiple effects, each with a different NOAEL and LOAEL. From among these effects, the Agency selects a critical endpoint. The critical endpoint is generally the effect that exhibits the lowest LOAEL (USEPA, 1988).

3.5 DERIVING RFDS USING THE NOAEL/LOAEL APPROACH

The 1988 RfD background document describes methods used to derive an RfD for a given chemical and criteria for selection of the critical NOAEL or LOAEL. Appropriate UFs and modifying factors (MF) are then applied to the selected endpoint to derive the RfD. The general equation for deriving the RfD is (USEPA, 1988):

\[
\text{RfD (mg/kg-day)} = \frac{\text{NOAEL}}{\text{UF} \times \text{MF}} \quad \text{or} \quad \frac{\text{LOAEL}}{\text{UF} \times \text{MF}} \qquad \text{(Equation 3-1)}
\]

where:

NOAEL = An exposure level at which there are no statistically or biologically significant increases in the frequency or severity of observed adverse effects between the exposed population and its appropriate control; some effects may be produced at this level, but they are not considered as adverse, nor precursors to specific adverse effects.

LOAEL = The lowest experimental exposure level at which there are statistically or biologically significant increases in frequency or severity of observed adverse effects between the exposed population and its appropriate control group. The LOAEL may be used if the NOAEL cannot be determined.

UF = An uncertainty factor which reduces the dose to account for several areas of scientific uncertainty inherent in most toxicity databases. Standard UFs are used to account for variation in sensitivity among humans, extrapolation from animal studies to humans, and extrapolation from less than chronic NOAELs to chronic NOAELs. An additional UF may be employed if a LOAEL is used to define the RfD.

MF = A modifying factor, to be determined using professional judgment.
The MF provides for additional uncertainty not explicitly included in UF, such as completeness of the overall database and the number of species tested. (The value for MF must be greater than zero and less than or equal to 10; the default value for the MF is 1.)

The RfD is generally expressed in units of milligrams per kilogram of body weight per day (mg/kg-day).

3.5.1 Selection of Uncertainty Factors and Modifying Factors

The choice of appropriate UFs and MFs must be a case-by-case judgment by experts and should account for each of the applicable areas for uncertainty and nuances in the available data that impact uncertainty. Several reports describe the underlying basis of UFs (Zielhuis and van der Kreek, 1979; Dourson and Stara, 1983) and research into this area (Calabrese, 1985; Hattis et al., 1987; Hartley and Ohanian, 1988; Lewis et al., 1990; Dourson et al., 1992).

The UFs summarized in Table 3-1 account for five areas of scientific uncertainty inherent in most toxicity databases: inter-human variability (H) (to account for variation in sensitivity among the members of the human population); experimental animal-to-human extrapolation (A); subchronic to chronic extrapolation (S) (to account for uncertainty in extrapolating from less-than-chronic NOAELs (or LOAELs) to chronic NOAELs); LOAEL-to-NOAEL extrapolation (L); and database completeness (D) (to account for the inability of any single study to adequately address all possible adverse outcomes). Each of these five areas is generally addressed by the Agency with a factor of 1, 3, or 10. The default value is 10.

Table 3-1. Uncertainty Factors and the Modifying Factor

Uncertainty Factor | Definition
UFH | Use a 1-, 3-, or 10-fold factor when extrapolating from valid data in studies using long-term exposure to average healthy humans. This factor is intended to account for the variation in sensitivity (intraspecies variation) among the members of the human population.
UFA | Use an additional 1-, 3-, or 10-fold factor when extrapolating from valid results of long-term studies on experimental animals when results of studies of human exposure are not available or are inadequate. This factor is intended to account for the uncertainty involved in extrapolating from animal data to humans (interspecies variation).
UFS | Use an additional 1-, 3-, or 10-fold factor when extrapolating from less-than-chronic results on experimental animals when there are no useful long-term human data. This factor is intended to account for the uncertainty involved in extrapolating from less-than-chronic NOAELs to chronic NOAELs.
UFL | Use an additional 3- or 10-fold factor when deriving an RfD from a LOAEL, instead of a NOAEL. This factor is intended to account for the uncertainty involved in extrapolating from LOAELs to NOAELs.
UFD | Use an additional 1-, 3-, or 10-fold factor when deriving an RfD from an "incomplete" database. Missing studies, e.g., reproductive, are often encountered with chemicals. This factor is meant to account for the inability of any study to consider all toxic endpoints. The intermediate factor of 3 (1/2 log unit) is often used when there is a single data gap exclusive of chronic data. It is often designated as UFD.
Modifying Factor (MF) | Use professional judgment to determine the MF, which is an additional uncertainty factor that is greater than zero and less than or equal to 10. The magnitude of the MF depends upon the professional assessment of scientific uncertainties of the study and database not explicitly treated above (e.g., the number of species tested). The default value for the MF is 1.

Note: With each UF or MF assignment, it is recognized that professional scientific judgment must be used.

In addition, an MF may be used to account for areas of uncertainty that are not explicitly considered using the standard UF.
This value of the MF is greater than zero and less than or equal to 10, but it should generally be used on a log10 basis (i.e., 0.3, 1, 3, 10) as are the standard UFs. The default value for this factor is 1.

The Agency's reasoning in its use of the MF is that the areas of scientific uncertainty labeled H, A, S, L, or D do not represent all of the uncertainties in the estimation of an RfD. For example, the fewer the number of animals used in a dosing group, the more likely it is that no adverse effect will be observed at a dose point which may have had an effect in a larger population. Such a case might argue for modifying the usual 10-fold factors: a 100-fold UF might be raised to 250 if too few animals were used in a chronic study. While this increase is scientifically reasonable, it introduces two difficulties: the adjustments applied could differ between risk assessors, and the implied precision of the result might not be justified by the data. For example, a UF of 250 has an implied precision of 2 digits and is not appropriate in relation to the variability of the biological response. The Agency intends to avoid these difficulties through limiting the options for the modifying factor (1, 3, 10).

In practice, the magnitude of the overall UF is dependent on professional judgment as to the total uncertainty in all areas. When uncertainties exist in one, two, or three areas, the Agency generally uses 10-, 100-, and 1,000-fold UFs, respectively. When uncertainties exist in four areas, the Agency generally uses a UF no greater than 3,000. It is the Agency's opinion that toxicity databases that are weaker and would result in UFs in excess of 3,000 are too uncertain as a basis for quantification. In such cases, the Agency does not estimate an RfD, and additional toxicity data are sought or awaited. For a few chemicals, a UF of 10,000 was applied.
However, in such cases, the risk assessment was completed before current policies for the maximum UF were in place.

The Agency occasionally uses a factor of less than 10 or even a factor of 1, if the existing data reduce or obviate the need to account for a particular area of uncertainty. For example, the use of a 1-year rat study as the basis of an RfD may suggest the use of a 3-fold, rather than 10-fold, factor to account for subchronic to chronic extrapolation, since it can be empirically demonstrated that 1-year rat NOAELs are generally closer in magnitude to chronic values than are 3-month NOAELs (Swartout, 1990). Lewis et al. (1990) more fully investigate this concept of variable uncertainty factors through an analysis of expected values.

The modification of UFs from their standard values should follow the general guidelines for composite UFs and the overall precision of one digit for UFs. The composite uncertainty factor to use with a given database is again strictly a case-by-case judgment by experts. It should be flexible enough to account for each of the applicable five areas of uncertainty and any nuances in the available data that might change the magnitude of any factor. The Agency describes its choice for the composite UF and sub-components for individual RfDs on its IRIS. Table 3-2 presents examples of the UFs employed for several chemicals recently accepted into IRIS through the consensus process.

Because of the high degree of judgment involved in the selection of UFs and MFs, the risk assessment justification should include a detailed discussion of the selection of these factors, along with the data to which they are applied.

Table 3-2. Examples of Uncertainty Factors and Modifying Factors from IRIS Risk Assessments

Chemical | Total UF | MF | Rationale
Barium | 3 | | The RfD is based on NOAELs from two human studies that were supported by NOAELs and LOAELs from two well-designed animal studies. The UF of 3 was applied because of inadequate data on differences between adults and children with regard to the critical effect (hypertension) and incomplete data on possible developmental effects.
Beryllium | 300 | | The RfD is based on a BMD10 from a dietary study in dogs. The UF includes 10-fold values for intraspecies and interspecies variability and a 3 to accommodate database deficiencies regarding human effects via the oral route, reproductive/developmental, and immunological effects.
Chromium VI | 300 | 3 | The RfD was based on a NOAEL from an animal study. The UF includes 10-fold values for intraspecies and interspecies variability and a 3 to accommodate the less-than-lifetime exposure in the principal study. A modifying factor of 3 was added because of concerns for acute gastrointestinal effects in humans with reported exposures to about 20 mg/L in drinking water.
Naphthalene | 3000 | | The RfD is based on a duration-adjusted NOAEL from a subchronic animal study. The UF includes 10-fold values for intraspecies and interspecies variability and the less-than-chronic duration of the study. An additional 3-fold factor was added because of the lack of a two-generation reproductive study.

3.5.2 Confidence in NOAEL/LOAEL-Based RfD

As stated previously, when available, adequate data from acceptable human studies should be used as the basis for the RfD. Use of good epidemiology studies generally gives the highest confidence in RfDs. In the absence of such data, RfDs are estimated from studies in experimental animals.

The Agency generally considers a "complete" database for calculating a chronic RfD for noncancer health effects to include the following:

- Two adequate mammalian chronic toxicity studies, by the appropriate route in different species, one of which must be a rodent.
- One adequate mammalian multi-generation reproductive toxicity study by an appropriate route.
- Two adequate mammalian developmental toxicity studies by an appropriate route in different species.
For a "complete" database, the likelihood that additional toxicity data may change the RfD is low. Thus, the Agency usually has confidence in such an RfD because additional toxicity data are not likely to change the value. The Agency considers a NOAEL from a well-conducted, mammalian subchronic (90-day) study by the appropriate route as a minimum database for estimating an RfD. However, for such a database, additional toxicity data may change the RfD. Thus, the Agency generally has less confidence in such an RfD.

For some chemicals, an acute health hazard is the critical effect of concern. These could include neurotoxic, portal of entry, or immunotoxic effects of acute exposures at environmental levels of contaminant. In such cases, longer term studies (subchronic or chronic) that would typically be included in a review of the toxicity literature may not capture the critical endpoint. Under such circumstances, greater emphasis should be placed on characterizing the acute threshold as opposed to the potential chronic effects.

Developmental toxicity data, if they constitute the sole source of information, are not considered an adequate basis for chronic RfD estimation. This is because such data are often generated from short-term chemical exposures, and, thus, are of limited relevance in predicting possible adverse effects from chronic exposures. However, if a developmental toxicity endpoint is the critical effect established from a "complete" database, a chronic RfD can be derived from such data, applying the UFs and MFs normally required.

Developmental data are the basis for developmental reference doses (RfDDT) [14]. The term RfDDT is used to distinguish the developmental value from the chronic RfD, which refers to chronic exposure situations.
Uncertainty factors for developmental toxicity include a 10-fold factor for interspecies variation and a 10-fold factor for intraspecies variation; in general, an uncertainty factor is not applied to account for duration of exposure. In some cases, additional factors may be applied due to a variety of uncertainties that exist in the database. For example, the standard study design for a developmental toxicity study calls for a low dose that demonstrates a NOAEL, but there may be circumstances where a risk assessment must be based on the results of a study in which a NOAEL for developmental toxicity was not identified. For details regarding risk assessment for developmental toxicants, refer to EPA's Final Guidelines for Developmental Toxicity Risk Assessment (USEPA, 1991b).

[Footnote 14: An RfD for developmental toxicity (RfDDT) is discussed in USEPA (1991a).]

3.5.3 Presenting the RfD as a Single Point or as a Range

Although the RfD has traditionally been presented and used as a single point estimate, its definition contains the phrase "... an estimate (with uncertainty spanning perhaps an order of magnitude) ..." (USEPA, 1988). Underlying this concept is the reasoning that during the derivation of the RfD, the selection of the critical effect and of the total uncertainty factor is based on the "best" scientific judgment of the Agency Work Group and that other groups of competent scientists examining the same database would reach a similar conclusion, within an order of magnitude.

Presenting the RfD as a range may be more appropriate than expressing it as a point estimate because rarely are sufficient data available to precisely determine a lifetime threshold for a human. Even when there are good, reliable data, the variability of response in the human population argues for expressing the RfD as a range.
However, although EPA supports the use of a range that spans one order of magnitude for most RfDs, there are a number of potential interpretations of the term "order of magnitude" as described below:

- Range = x to 10x (where the point estimate of the RfD = x). This view is supported by those who believe that the risk assessment process is so inherently conservative that the RfD should be considered to be the lowest estimate, with the range of imprecision all resting above this point estimate.
- Range = 0.3x to 3x. This view is held by many EPA scientists who have developed RfDs. The RfD point estimate, x, is the midpoint of a range that spans an order of magnitude.
- Range = 0.1x to x. This is the view held by many risk managers. Regulatory decisions (e.g., setting of standards or cleanup levels) are made based on the assumption that standards or cleanup levels are protective as long as they do not exceed the RfD.
- Range = 0.1x to 10x. This range represents the assumption that the order of magnitude range could be on either side of the point estimate x.

The Agency is proposing a risk management approach where the upper and lower bounds of the range are correlated to the uncertainty. Because the uncertainty around the dose-response relationship increases as extrapolation below the observed data increases, the use of an alternative point within the range for the RfD may be more appropriate in characterizing risk than the calculated point estimate. Therefore, as a matter of risk management policy, it is proposed that if the product of the UFs and MF used to derive the RfD is 100 or less, there can be no consideration of a range. When greater than 100 but less than 1,000, a range can be established which is one half of a log10 (3-fold) or a number ranging from the point estimate divided by 1.5 to the point estimate multiplied by 1.5.
With a UF of 1,000 and above, the range can span a number ranging from the point estimate divided by 3 to the point estimate multiplied by 3 (a 10-fold range). A risk assessor can then select a single point within the defined range to use as an alternate to the calculated RfD. The use of an alternative value within the range defined by the uncertainty must be justified. As used in this document, justification means that there are scientific data which indicate that some value in the range other than the point estimate may be more appropriate than the point estimate, based on human health or environmental fate considerations. One example of a situation where a point other than the calculated RfD might be applied would be where there is a lower bioavailability of the contaminant in fish than in water. In such an instance, the decreased bioavailability from fish tissues could be used to support selection of an RfD value greater than the calculated value if the critical study were one where the contaminant was administered through drinking water. For example, most inorganic contaminants, particularly divalent cations, have bioavailability values of 20 percent or less from a food matrix, but are much more available (about 80 percent or higher) from drinking water. Accordingly, the external dose necessary to produce a toxic internal dose would likely be higher for a study where the exposure occurred through the diet rather than the drinking water. As a result, the RfD from a dietary study would likely be higher than that for the drinking water study if equivalent external doses had been used. The exposures considered in the derivation of the AWQC include fish (food) and water. Thus, one might be able to justify an alternative value to the RfD estimate that was slightly higher than the RfD estimate in cases where the NOAEL that was the basis for the RfD came from a drinking water study, but slightly lower than the RfD estimate if the NOAEL was from a dietary study. 
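As an illustration of Equation 3-1 and of the range policy proposed above, the following minimal Python sketch computes a point-estimate RfD and the range that may be considered around it. The function names, the hypothetical NOAEL of 9.0 mg/kg-day, and the choice to raise an error when the composite UF exceeds 3,000 are assumptions made for this example only; they are not part of the Agency's methodology. The factors are patterned on the Chromium VI entry in Table 3-2.

```python
from math import prod


def derive_rfd(effect_level, ufs, mf=1.0):
    """Equation 3-1: RfD = NOAEL (or LOAEL) / (UF * MF), in mg/kg-day."""
    if not 0 < mf <= 10:
        raise ValueError("MF must be greater than 0 and no more than 10")
    composite_uf = prod(ufs)
    if composite_uf > 3000:
        # Assumption: mirror the stated practice of not estimating an RfD
        # when the composite UF would exceed 3,000.
        raise ValueError("composite UF exceeds 3,000; database too uncertain")
    return effect_level / (composite_uf * mf)


def rfd_range(rfd, uf_mf_product):
    """Range that may be considered around the point estimate, or None."""
    if uf_mf_product <= 100:
        return None                      # no range is considered
    if uf_mf_product < 1000:
        return (rfd / 1.5, rfd * 1.5)    # point estimate divided/multiplied by 1.5
    return (rfd / 3.0, rfd * 3.0)        # point estimate divided/multiplied by 3


# Hypothetical NOAEL (mg/kg-day); UFs of 10, 10, and 3 with an MF of 3
ufs, mf = [10, 10, 3], 3
rfd = derive_rfd(9.0, ufs, mf)
print("RfD point estimate:", rfd)
print("Range that may be considered:", rfd_range(rfd, prod(ufs) * mf))
```

Selecting a value within the returned range still requires the scientific justification described above; the sketch only reports the bounds.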
Another situation where a point from the lower end of the range could be selected is one where there is a well-defined sensitive population, such as women in the first trimester of pregnancy. In this situation, the presence of the contaminant in both water and fish and average body weights for women of reproductive age that are less than the 70 kg default may justify an alternative value from the low end of the range about the RfD estimate.

Table 3-3 gives examples of some factors to consider when determining whether to use the point estimate of the RfD or values higher or lower than the point estimate. The factors presented in Table 3-3 should be considered in making the decision as to whether or not to use a value other than the point estimate. EPA advocates the use of the point estimate of the RfD as the default to derive the AWQC.

Table 3-3. Some Scientific Factors to Consider When Using the RfD Range

Use point estimate RfD:
- Default position
- Total UF/MF product is 100 or less
- Essential nutrient

Use point from the lower range of RfD:
- Increased bioavailability from the exposure vehicle versus the experimental conditions used in the RfD study
- The seriousness of the effect and whether or not it is reversible
- A shallow dose-response curve in the range of observation
- Exposed group contains a sensitive population (e.g., children or fetuses)

Use point from the upper range of RfD:
- Decreased bioavailability for humans as compared to experimental animals
- RfD based on minimal LOAEL and a UF/MF of 1,000 or greater
- A steep dose-response curve in the range of observation
- No sensitive populations identified

There are many factors that can affect the uncertainty in the RfD, and thereby affect the selection of an alternative value within a range. The completeness of the database plays a major role.
Observing the same effects in several animal species, including humans, can increase confidence in the RfD point estimate and thereby narrow the range of uncertainty. Other factors that can affect the uncertainty are the slope of the dose-response curve, seriousness of the observed effect, spacing of doses, and the route of exposure. For example, a steep dose-response curve indicates that relatively large differences in effect occur with a small change in dose; thus, there will be a greater chance that the data will allow scientists to distinguish clearly (i.e., statistically) between doses that produce an effect and those that do not. For a situation where the RfD is derived from a LOAEL for a serious effect, an additional uncertainty factor is often used in the RfD derivation to protect against less serious effects that could have occurred at lower doses had lower doses been evaluated.

Dose spacing and the size of the study groups used in the experiment can also affect the confidence in the RfD. The "true" NOAEL is not identified by a standard toxicology study. The wider the dose spacing, and the smaller the number of animals studied, the greater the margin of uncertainty about where the "true" NOAEL may fall. Finally, for some RfDs, the route of exposure in the experiment may not match the route of exposure for humans, and interroute extrapolation or toxicokinetic modeling may be considered using assumptions about differences in absorption rates between routes.

There are cases where an alternative value within a range should not be used. For example, the RfD for zinc (USEPA, 1992) is based on consideration of nutritional data, a minimal LOAEL, and a UF of 3. If the factor of 3 were used to bound the RfD for zinc, then the upper-bound level would approach the minimal LOAEL. This situation must be avoided, since it is unacceptable to set a standard at levels that may cause an adverse effect.
The risk manager must be informed of those specific cases when it is not scientifically correct to use the RfD range. Table 3-3 provides managers with guidelines on the scientific basis for using the range.

3.6 DERIVING AN RFD USING A BENCHMARK DOSE APPROACH

A number of issues have been raised regarding the development of the RfD based on the traditional NOAEL/LOAEL approach. These concerns include the following:

- The traditional approach does not incorporate information on the shape of the dose-response curve, but focuses only on a single point (the NOAEL or LOAEL).
- The value of the NOAEL depends on the number of doses and spacing of the doses in the experiment. The possible NOAEL values are limited to the discrete values of the experimental doses. Theoretically, the experimental no adverse effect level could be any value between the experimental NOAEL and the LOAEL, and sometimes the true NOAEL is below the observed NOAEL, especially in studies with a limited number of animals in each dose group.
- Data variability is not directly taken into account. For example, studies based on a larger number of animals may detect effects at lower doses than studies with fewer animals; as a result, the NOAEL from a small study may be higher than the NOAEL from a similar but larger study in the same species. The traditional approach does not have a mechanism to account for such data variability.
- The determination of the NOAEL is dependent on the background incidence of the effect in control animals; therefore, statistically significant differences between the dose groups and the control group are more difficult to detect if background incidence is relatively high, even if biologically significant effects occur.
- In conjunction with exposure data, the NOAEL-based RfD can be used to estimate the size of the population at risk, but not the magnitude of the risk.
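The dependence of the NOAEL on group size noted above can be illustrated with a short Python sketch. It compares two hypothetical studies that observe the same incidences (10 percent in controls, 40 percent at the tested dose) but use different numbers of animals. The use of a one-sided Fisher exact test, the 0.05 significance level, and all counts are assumptions chosen for illustration; the text above does not prescribe a particular statistical test.

```python
from math import comb


def fisher_one_sided(treated_affected, treated_n, control_affected, control_n):
    """One-sided Fisher exact p-value for increased incidence in the treated group."""
    total_n = treated_n + control_n
    total_affected = treated_affected + control_affected
    upper = min(total_affected, treated_n)
    # Hypergeometric tail: probability of seeing at least this many affected
    # animals in the treated group if dose had no effect.
    tail = sum(
        comb(total_affected, k) * comb(total_n - total_affected, treated_n - k)
        for k in range(treated_affected, upper + 1)
    )
    return tail / comb(total_n, treated_n)


# Same observed incidences, different group sizes (hypothetical counts)
p_small = fisher_one_sided(4, 10, 1, 10)    # 10 animals per group
p_large = fisher_one_sided(20, 50, 5, 50)   # 50 animals per group

for label, p in (("10 per group", p_small), ("50 per group", p_large)):
    verdict = "effect detected" if p < 0.05 else "no significant effect"
    print(f"{label}: p = {p:.4f} -> {verdict}")
```

Under these assumptions the smaller study would treat the dose as a no-effect level while the larger study would not, which is the data-variability concern the list describes.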
In response to these concerns, alternative approaches have been developed that attempt to address some of these shortcomings. One such alternative, the BMD approach, has been the subject of extensive research over the past decade (Crump 1984, 1995; Gaylor, 1983, 1989; Dourson et al., 1985; Brown and Erdreich, 1989; Kimmel, 1990; Faustman et al., 1994; Allen et al., 1994a, 1994b). The EPA Risk Assessment Forum is in the process of developing guidance on procedures and models to be used in the calculation of BMDs. The following discussion presents the general methods for calculation of an RfD using the BMD approach; for more extensive discussion, the reader is referred to Crump et al. (1995). To date, the Agency has used the BMD approach for deriving the RfD for methylmercury (USEPA, 1994a) and the RfC for several compounds.

3.6.1 Overview of the Benchmark Dose Approach

A BMD or benchmark concentration (BMC) is defined as a statistical lower confidence limit on the dose producing a predetermined level of change in response (the benchmark response, or BMR) relative to controls. The BMD/BMC is intended to be used as an alternative to the NOAEL in deriving a point of departure for low dose extrapolations. The BMD/BMC is a dose corresponding to some change in the level of response relative to background and is not dependent on the doses used in the study. The BMR is based on a biologically significant level of response or on the response level at the lower end of the observable range for a particular endpoint. The BMD/BMC approach does not reduce uncertainty inherent in extrapolating from animal data to humans (except for that in the LOAEL to NOAEL extrapolation), and does not require that a study identify a NOAEL. The BMD/BMC approach requires only that at least one dose be near the range of the response level for the BMD/BMC.

Modeling of dose and response is central to the BMD approach.
The modeling process is limited to the experimental range and no attempt is made to extrapolate to doses far below the experimental range. Generally, the models used in the BMD approach are statistical rather than biologically-based models; thus, they cannot be reliably used to extrapolate to low doses without incorporating detailed information on the mechanisms through which the toxic agent causes the particular effect being modeled.

Once a mathematical dose-response curve and its corresponding curve of confidence limits are established, the assessor selects a point on the lower confidence dose curve corresponding to the chosen BMR. This point on the lower confidence curve is the lower confidence bound of the effective dose for that BMR (denoted as the BMD) (see Exhibit 3-1). A BMD may be calculated for each endpoint for which there is an adequate database.

[Exhibit 3-1. Derivation of RfD Using BMD Approach. Plot of response (0% to 100%) against dose (0 to 300), with the RfD marked on the dose axis. Dose-response modeling uses animal or human data.]

The BMD approach offers a number of advantages over the traditional approach for deriving the RfD from the NOAEL/LOAEL divided by uncertainty factors. Some of the advantages of the BMD approach are the following: it considers the dose-response curve, including its shape; it better accounts for statistical variability in the data; and it is not overly sensitive to dose spacing and, thus, is not limited to experimental doses for determining the effect level. However, the data requirements for using the BMD approach are more extensive than those for the NOAEL/LOAEL approach.

Studies with small group sizes and evaluation of a limited number of endpoints will tend to yield lower BMD values because the confidence bands will be wider. Therefore, the BMD approach provides an incentive to conduct more robust studies, since better studies give narrower confidence bands.
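The idea of reading a BMD off the lower confidence curve can be sketched in Python as follows. This is only an illustration under several assumptions that do not come from this document: a one-hit dose-response model, a coarse grid search for the maximum likelihood fit, a one-sided 95 percent profile-likelihood limit (chi-square critical value 2.706), a BMR of 10 percent extra risk, hypothetical quantal data, and a hypothetical composite UF of 100.

```python
import math


def log_likelihood(p0, b, doses, group_sizes, affected):
    """Binomial log-likelihood for the one-hit model P(d) = p0 + (1 - p0)(1 - exp(-b d))."""
    ll = 0.0
    for d, n, x in zip(doses, group_sizes, affected):
        p = p0 + (1.0 - p0) * (1.0 - math.exp(-b * d))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += x * math.log(p) + (n - x) * math.log(1.0 - p)
    return ll


def profile(b, data):
    """Maximize the log-likelihood over the background rate p0 for a fixed slope b."""
    return max(log_likelihood(i / 400.0, b, *data) for i in range(201))


def benchmark_dose(data, bmr=0.10, chi2_crit=2.706):
    """Return (central estimate, lower confidence bound) of the dose giving extra risk = bmr."""
    slopes = [j * 2e-5 for j in range(1, 501)]   # coarse grid over the slope b
    profiles = [profile(b, data) for b in slopes]
    ll_max = max(profiles)
    b_hat = slopes[profiles.index(ll_max)]
    # Steepest slope still consistent with the data gives the lowest dose
    b_upper = max(b for b, ll in zip(slopes, profiles) if 2.0 * (ll_max - ll) <= chi2_crit)

    def dose_at(b):
        # For this model, extra risk = 1 - exp(-b d), so d = -ln(1 - bmr) / b
        return -math.log(1.0 - bmr) / b

    return dose_at(b_hat), dose_at(b_upper)


# Hypothetical quantal data: doses, animals per group, animals affected
data = ([0, 50, 100, 200], [50, 50, 50, 50], [1, 5, 11, 20])
central, bmd = benchmark_dose(data)
print(f"central estimate {central:.1f}; BMD (lower bound) {bmd:.1f}; "
      f"RfD with a hypothetical UF of 100: {bmd / 100:.3f}")
```

Shrinking the group sizes in `data` while keeping the same incidences widens the confidence band and lowers the BMD, consistent with the incentive for more robust studies described above.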
3.6.2 Calculation of the RfD Using the Benchmark Dose Method

The determination of an RfD using the BMD approach involves four basic steps. The first step involves the selection of the experiments and responses that will be used for modeling the BMD. Second, BMDs are calculated for the selected responses; BMD values should be calculated for all endpoints that have the potential for yielding the critical BMD. Third, a single BMD is selected from among those calculated. Finally, the RfD is calculated by dividing the chosen BMD by appropriate UFs. The decision points associated with these steps are outlined in Table 3-4. The discussion that follows summarizes the critical issues unique to the BMD approach and is based largely on information from Crump et al. (1995).

Table 3-4. Steps and Decisions Required in the BMD Approach

Step                            Decisions
1. Study/response selection     1. Experiments to include
                                2. Responses to model
2. Model dose-response          1. Format of data
                                2. Mathematical model(s)
                                3. Handling model fit
                                4. Measure of altered response
3. Select BMD(s)                1. Critical BMR
                                2. Confidence limit calculation
4. Calculate RfD                1. Uncertainty factors

Source: Crump et al., 1995

3.6.2.1 Selection of Response Data to Model

The selection of experiments and responses suitable for BMD modeling involves considerations similar to those for identifying the appropriate studies upon which to base a NOAEL. There may be several appropriate studies and relevant health effects that could be modeled for a chemical. Ideally, BMD calculations would be performed for the complete set of relevant effects. However, utilizing all relevant responses for the calculation of BMDs may be resource-intensive. Further, it is difficult to interpret results from a large number of dose-response analyses. When selecting the data to model it is considered appropriate to limit attention to those responses for which there is evidence of a dose-response relationship.
Statistically, such a relationship may be indicated by significant trends (either increasing or decreasing) in the response as dose level increases. Considerations of biological significance may also be warranted. Another alternative is to focus efforts on modeling the most critical effects as seen at the LOAEL. However, limiting the number of responses modeled may potentially misrepresent the minimum BMD.[15]

3.6.2.2 Use of Categorical Versus Continuous Data

A central issue in the selection of data to model concerns the form of the data used. Categorical data, particularly quantal data, are relatively straightforward to use in the BMD approach, since the data are expressed as the number (or percent) of subjects exhibiting a defined response at a given dose. Data may also be of the continuous form, where results are expressed as the measure of a continuous biological endpoint, such as a change in organ weight or serum enzyme level. With continuous data the results are generally presented in terms of means and standard deviations for dose groups but are most valuable when data for individual animals are available. To perform dose-response modeling of such data, the bounds for a normal response as opposed to an adverse response must be decided. Continuous data can be modeled by looking at the mean response for each dose group as a fraction of the mean response of the control group or as the percentage of animals showing an adverse response at each dose level (Gaylor and Slikker, 1990; Crump, 1995). Such approaches take advantage of the continuous nature of the response data, but express the results in terms that are directly comparable to those derived from analysis of categorical data, i.e., in terms of additional or extra risk, rather than in terms of changes in mean response. Crump (1995) provided options for handling continuous data that can be applied to the same models used for analysis of quantal endpoints.
Such developments have enhanced the consistency of results across different endpoints for any particular chemical. In any case, application of the BMD approach to continuous data requires professional judgment in order to determine what level or category of response constitutes an abnormal (adverse) effect. The BMD approach is not recommended for routine use but may be used when data are available and justify the extensive analyses required.

[15] This is due to the fact that an effect seen only at doses above the LOAEL but having a shallow dose-response could produce a lower BMD than an effect seen at the LOAEL, which has a steeper dose-response.

3.6.2.3 Choice of Mathematical Model

Various mathematical approaches have been proposed for determining the BMD. Table 3-5 shows a number of dose-response models that may be used for estimating the BMD with quantal or continuous data. The EPA Benchmark Dose modeling program (Version 1.2) includes options for using gamma, logistic, multistage, probit, quantal-linear, quantal-quadratic and Weibull models for quantal data. Linear, polynomial, power and Hill models are available for use with continuous data. The EPA software for benchmark modeling can be downloaded from http://www.epa.gov/ncea/bmds.htm. The Agency is also developing guidance for use of BMD model results.

Table 3-5. Dose-Response Models Proposed for Estimating BMDs

Model                                            Formula
Quantal data
  Quantal linear regression (QLR)                $P(d) = c + (1-c)\{1 - \exp[-q_1 (d - d_0)]\}$
  Quantal quadratic regression (QQR)             $P(d) = c + (1-c)\{1 - \exp[-q_1 (d - d_0)^2]\}$
  Quantal polynomial regression (QPR)            $P(d) = c + (1-c)\{1 - \exp[-q_1 d - \dots - q_k d^k]\}$
  Quantal Weibull (QW)                           $P(d) = c + (1-c)\{1 - \exp[-q_1 d^k]\}$
  Log-normal (LN)                                $P(d) = c + (1-c) N(a + b \log d)$
Continuous data
  Continuous linear regression (CLR)             $m(d) = c + q_1 (d - d_0)$
  Continuous quadratic regression (CQR)          $m(d) = c + q_1 (d - d_0)^2$
  Continuous linear-quadratic regression (CLQR)  $m(d) = c + q_1 d + q_2 d^2$
  Continuous polynomial regression (CPR)         $m(d) = c + q_1 d + \dots + q_k d^k$
  Continuous power (CP)                          $m(d) = c + q_1 d^k$

Notes: P(d) is the probability of a response at the dose, d; m(d) is the mean response at the dose, d. In all models, $c$, $q_1, \dots, q_k$, and $d_0$ are parameters estimated from the data. For the quantal models, $0 \le c \le 1$ and $q_i \ge 0$. For the CPR model proposed by Crump (1984), all the $q_i$ have the same sign. In the CLQR model discussed by Gaylor and Slikker (1990), $q_1$ and $q_2$ were not constrained to have the same sign. For all models, $d_0 \ge 0$, $k \ge 1$. N(x) denotes the normal cumulative distribution function.
Source: Crump et al., 1995.

Information generally required for application of dose-response models for categorical (including quantal) data includes the experimental doses, the total number of animals in each dose group, and the number of these whose responses are in each of the categories of response. For continuous data, the experimental doses, number of animals in each dose group, mean response in each group, and sample variance of response in each dose group are needed.

The BMD approach should not be applied to data sets with only two experimental groups (a control and one positive dose). In such cases, much of the advantage of the BMD approach with respect to consideration of the dose-response shape will be lost; such data supply little information about the shape of the dose-response curve. The more doses available, especially at lower doses, the greater the expected benefit of the BMD approach as compared to the NOAEL-based approach.

3.6.2.4 Handling Model Fit

Fitting the models to experimental data gives estimates of the parameters that help determine the model which has the best fit to the data. This fitting, usually accomplished through maximum likelihood methods, estimates the probability of response (for quantal data) or the mean response (for continuous data) for each dose level.
Goodness-of-fit tests can be used to determine if a model adequately describes the dose-response data. The experimental data should be plotted against the model projection, thereby providing a visual representation of fit.

In many cases, several models may appear to fit the data well. In these cases, other considerations can be used to select an appropriate model. For example, the statistical assumptions underlying the model should be reasonable for the given data. Quantal results, for example, are assumed to follow a binomial distribution around a dose-dependent expected value. This assumption requires that each subject responds independently and that all have an equal probability of responding. Continuous responses for each dose level are assumed to follow a normal distribution and are also assumed to be independent. When biological factors may be important (e.g., intralitter correlation for developmental toxicity data) they may also be used to select appropriate models. Another biological consideration may be whether or not a threshold is assumed to exist. If a threshold is expected for the given effect, then a model that allows for a threshold dose may be chosen for modeling. The biological plausibility of the dose-response curve shape should always be a consideration in model selection.

Even with these considerations, several different models may often adequately describe the data. In such cases it is important to examine fit about the BMR. Models that have similar fit to the entire data set may differ with respect to their predictions near the BMR. It may be possible to select one model over another on the basis of that more local behavior.

In certain data sets, none of the standard models may provide a reasonable fit to the data. Fit is assessed statistically by comparing the model predictions to the observations. Goodness-of-fit statistics formalize that comparison and provide p-values, ranging between 0 and 1, as a measure of fit.
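As a hedged illustration of such a goodness-of-fit comparison, the sketch below computes a Pearson chi-square statistic for the acrylamide counts given later in Table 3-6, using the rounded quantal Weibull parameters from Table 3-7. Two things are assumptions made here rather than statements from the source: the degrees of freedom are taken as 3 (five dose groups minus two freely estimated parameters), and the p-value uses the closed-form upper tail of the chi-square distribution for 3 degrees of freedom. The function names are illustrative.

```python
import math

# Acrylamide data from Table 3-6 (dose in mg/kg-day).
DOSES = [0.0, 0.01, 0.1, 0.5, 2.0]
AFFECTED = [9, 6, 12, 13, 16]
TESTED = [60, 60, 60, 60, 60]


def quantal_weibull(dose, background, q1, k):
    """P(d) = c + (1 - c) * (1 - exp(-q1 * d**k)), the Weibull form in Table 3-5."""
    return background + (1.0 - background) * (1.0 - math.exp(-q1 * dose ** k))


def fitted_probability(dose):
    """Weibull model with the rounded parameters reported in Table 3-7."""
    return quantal_weibull(dose, background=0.15, q1=0.08, k=1)


def pearson_chi_square(doses, affected, tested, model):
    """Sum over dose groups of (observed - expected)^2 / binomial variance."""
    total = 0.0
    for d, x, n in zip(doses, affected, tested):
        p = model(d)
        total += (x - n * p) ** 2 / (n * p * (1.0 - p))
    return total


def chi_square_p_value_df3(statistic):
    """Upper-tail probability for a chi-square variable with 3 degrees of freedom
    (assumed here: 5 dose groups minus 2 estimated parameters)."""
    return math.erfc(math.sqrt(statistic / 2.0)) + math.sqrt(
        2.0 * statistic / math.pi
    ) * math.exp(-statistic / 2.0)


chi2 = pearson_chi_square(DOSES, AFFECTED, TESTED, fitted_probability)
p_value = chi_square_p_value_df3(chi2)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p-value comes out near 0.47, close to the 0.48 reported in Table 3-7; the small difference is plausibly due to the rounding of the published parameters.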
When using a χ² statistic, larger p-values are indicative of good fit; smaller p-values of poorer fit. Sufficiently small p-values (e.g., less than 0.01 or 0.05) are typically viewed as an indication that the model was not adequate for describing the observed dose-response pattern. Poor fit is often due to reduced responses at higher doses that are inconsistent with the dose-response trend for lower doses, perhaps due to competing toxic processes or saturation of metabolic systems related to the toxic response of interest. Several procedures can be used to adjust the modeling process in these circumstances. For example, responses at the highest doses could be eliminated, since those doses are usually least informative of responses in the lower dose region of interest. In the case of saturated metabolic pathways, toxicokinetic data can be used to estimate delivered dose to the organ of interest. The BMD modeling can then be conducted on the delivered dose (Andersen et al., 1987, 1993; Gehring et al., 1978).

Visual (graphical) examination of the model predictions in relation to the observations is an essential exercise with respect to all of these fit issues. This supplements the formal statistical assessment of fit and may, in fact, be equally or more informative. Biological plausibility is another critical factor to consider when selecting the best BMD from among several options.

3.6.2.5 Measure of Altered Response

Crump (1984) proposed two measures of increased response for quantal data. These are additional risk and extra risk. Additional risk is the probability of response at dose d, P(d), minus the probability of response at zero dose (control response), P(0). It describes the additional proportion of animals that respond in the presence of a dose. Extra risk is additional risk divided by [1 - P(0)].
It describes the additional proportion of animals that respond in the presence of a dose, divided by the proportion of animals that would not respond under control conditions. These measures are distinguished in the way they account for control responses. For example, if a dose increases a response from 0 to 1 percent, both the additional risk and the extra risk are 1 percent. However, if a dose increases risk from 90 to 91 percent, the additional risk is still 1 percent, but the extra risk is 10 percent. The choice of extra risk versus additional risk is based to some extent on assumptions about whether an agent is adding to the background risk. Extra risk is viewed as the default because it is more conservative.

Analogous measures of risk have been proposed for continuous data (Crump, 1984). First, altered response can be expressed as the difference between the mean response to dose d and the mean control response. The second measure is simply the difference between dose and control means divided by (i.e., normalized by) the control mean response. The second measure expresses change as a fraction of the control response rather than as an absolute change.

More recent considerations of BMDs for continuous endpoints have suggested other alternatives. Allen et al. (1994a, 1994b) and Kavlock et al. (1995) determined that normalizing changes in mean responses by a multiple of the background standard deviation produced BMDs that were comparable, on average, to NOAELs. For the developmental endpoints that those investigators studied, the preferred multiple for the standard deviation was 0.5.

It is not clear when measures of risk expressed relative to the background (e.g., extra risk) are preferable to measures expressed as absolute changes. Additional research is required to provide guidance regarding the measure of altered response that is most appropriate in particular circumstances.
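The two quantal measures described in this section can be sketched in a few lines. The function names below are illustrative, not taken from any EPA software; the inputs are the two numerical examples given in the text (0 to 1 percent, and 90 to 91 percent).

```python
import math


def additional_risk(p_dose: float, p_control: float) -> float:
    """Additional risk: P(d) - P(0)."""
    return p_dose - p_control


def extra_risk(p_dose: float, p_control: float) -> float:
    """Extra risk: [P(d) - P(0)] / [1 - P(0)]."""
    if p_control >= 1.0:
        raise ValueError("extra risk is undefined when the control response is 100%")
    return (p_dose - p_control) / (1.0 - p_control)


# The two examples from the text: (P(0), P(d)) pairs.
for p_control, p_dose in ((0.00, 0.01), (0.90, 0.91)):
    print(
        f"P(0)={p_control:.2f}, P(d)={p_dose:.2f}: "
        f"additional={additional_risk(p_dose, p_control):.2f}, "
        f"extra={extra_risk(p_dose, p_control):.2f}"
    )
```

With no background response the two measures coincide; with a 90 percent background the same 1 percent additional risk becomes a 10 percent extra risk, because only 10 percent of animals remain available to respond.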
3.6.2.6 Selection of the BMR

A critical decision for deriving the BMD is the selection of the Benchmark Response (BMR). Since the BMD is used like a NOAEL in the derivation of the RfD, the BMR should be selected near the low end of the range where effects were detected in a study. The dose predicted to cause a 10 percent increase in the incidence of the effect in the test population (ED10) is frequently chosen as the BMR. For some data, it may be possible to adequately estimate the ED05 or ED01, which are closer to a true no-effect dose. However, in many cases the ED10 is the lowest level of risk that can be estimated from standard toxicity studies (Crump, 1984). During a BMD Workshop, sponsored by EPA, participants generally agreed that the appropriate BMR should either be 5 percent or 10 percent, but acknowledged that future research might demonstrate the advisability of selecting one value over another (ILSI, 1993). Research by Allen et al. (1994a, 1994b) and Faustman et al. (1994) indicates that BMDs defined in terms of 10 percent increases in probability of response tend to be, on average, similar to corresponding NOAELs for quantal developmental toxicity studies. For the purposes of water quality criteria derivation, EPA recommends the use of the ED05 or ED10 when deriving a BMD.

3.6.2.7 Calculating the Confidence Interval

The BMD is defined to be the lower confidence bound on the dose corresponding to the selected BMR. A statistical lower confidence limit is used rather than a maximum likelihood estimate (MLE) for several reasons. The use of confidence limits accounts for population variability. Most biological responses are normally distributed within a population. Accordingly, if one were to randomly select two groups of animals from the population to study, the lowest response-level responders from one study group might differ from that for the second group exposed to identical experimental conditions.
Use of the lower bound confidence interval increases the confidence that the results from a study of a small group of animals can be extrapolated to the entire population.

To calculate the upper confidence bound on response and the lower bound on effective dose, one must select a procedure for calculating confidence limits and the size of the confidence limits. The recommended method used to calculate the confidence bounds on the curve relies on maximum likelihood theory. This approach is the same one used by EPA in the computer program for cancer dose-response modeling. The approach can be applied to BMD modeling using the EPA Benchmark software as well as other commercially available benchmark programs. A detailed explanation of theory supporting this approach is found in Crump (1984).

By convention, the size of the statistical confidence limits can range from 90 to 99 percent. The methods of confidence limit calculation and choice of confidence limits are critical. The Agency recommends the use of one-sided 95th percentile confidence limits for BMD modeling. This is consistent with the size of the confidence limits used in cancer dose-response modeling.

3.6.2.8 Selection of the BMD as the Basis for the RfD

An important decision is the choice of the appropriate BMD to use in the RfD calculation when multiple BMDs are calculated. Multiple BMDs can be calculated when different models fit the response data for a single study, when more than one response is modeled in a single study, and when there are different BMDs from different studies. When multiple BMDs are calculated because several models fit a single data set, the analyst may select the smallest BMD or combine BMDs by using the geometric mean.
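The two options just named, taking the smallest BMD or the geometric mean, can be sketched as follows. The inputs are the two 95th percentile BMD10 values reported later in Table 3-8 for the acrylamide example; the helper name is illustrative.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as the exponential of the mean log value."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# 95th percentile BMD10 values (mg/kg-day) from Table 3-8.
bmd_by_model = {"quantal Weibull": 0.64, "quantal quadratic": 1.19}

smallest = min(bmd_by_model.values())
combined = geometric_mean(list(bmd_by_model.values()))

print(f"smallest BMD:       {smallest:.2f} mg/kg-day")
print(f"geometric mean BMD: {combined:.2f} mg/kg-day")
```

The geometric mean always falls between the smallest and largest values, so choosing the smallest BMD is the more conservative of the two options.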
When multiple BMDs are calculated from different responses or different studies that examine the same endpoint, the choice among BMDs may also involve the selection of the "critical effect" and the most appropriate species, sex, or other relevant feature of experimental design. Graphic representations of the model output and experimental data, as well as an understanding of the biological mode of action, may help in the selection of the BMD.

3.6.2.9 Use of Uncertainty Factors with BMD Approach

Once a single or averaged BMD is selected, the RfD can be calculated by dividing the BMD by one or more uncertainty factors. As a default, all applicable uncertainty factors used in the traditional NOAEL-based RfD approach, except for the LOAEL-NOAEL extrapolation factor, should be considered. Other factors, such as the size of the BMR and confidence bounds, biological considerations (such as the possibility of a threshold), severity of the modeled effect, and the slope of the dose-response curve, may affect the choice and magnitude of uncertainty factors (see Crump et al., 1995, for more detailed discussion).

3.6.3 Limitations of the BMD Approach

The BMD approach has been proposed as an alternative procedure that can be used until biologically motivated approaches are available for some or all effects. It provides specific improvements over NOAEL-based approaches, but by no means does it resolve all issues or difficulties associated with noncancer risk assessment. The BMD approach allows for objective extrapolation of animal response data to human exposures across the different study designs encountered in noncancer risk assessment.

3.6.4 Example of the Application of the BMD Approach

The following provides a simple example of the application of the BMD approach to quantal toxicity data. The example given is taken from Crump et al. (1995) for acrylamide.
The purpose of presenting this example is to illustrate the method only; no actual risk value or AWQC for acrylamide is derived.

3.6.4.1 Selection of Data to Model

This example takes the approach of identifying a critical study rather than modeling all endpoints seen in valid studies. For this example, a 2-year drinking water study of chronic effects in rats is used as the critical study for acrylamide (Johnson et al., 1986). The endpoint examined was tibial nerve degeneration in male rats. The researchers recorded the occurrence of nerve degeneration in two categories: none or mild; and moderate or severe. Since mild nerve degeneration occurs spontaneously in older rats, and because mild degeneration showed no dose-response relationship, only moderate and severe degeneration were recorded as responses. The data are presented in quantal form, with no or mild degeneration considered "no response," and moderate to severe degeneration recorded as a response. The dose levels and number of animals responding in each dose group are shown in Table 3-6.

Table 3-6. Rats Experiencing Moderate or Severe Nerve Degeneration in Response to Acrylamide Dose

Dose (mg/kg-day)    0     0.01    0.1    0.5    2.0
Number affected     9     6       12     13     16
Number tested       60    60      60     60     60

3.6.4.2 Choice of Mathematical Model

From Table 3-5, we can select from among the various models available for quantal data. Fitting is accomplished through the use of maximum likelihood estimation to estimate the probability of a response at each dose level. The actual fitting exercise is done through the use of computer software.

3.6.4.3 Results of Information Above

All of the models can be tried to see which achieves the best fit. The following Exhibits illustrate the best-fit modeling of the study data for the Weibull model (Exhibit 3-2) and the quadratic model (Exhibit 3-3). Table 3-7 provides the best-fit model parameters for the two equations.
Note that in the example given here, the measure of altered response is extra risk (ER), which is defined as:

\[ ER(d) = \frac{P(d) - P(0)}{1 - P(0)} \]   (Equation 3.2)

where:
ER = Extra risk
d = Dose
P = Probability

Extra risk is the fraction of animals that respond when exposed to a dose, d, among animals who otherwise would not respond.

Exhibit 3-2. Quantal Weibull Regression - Extra Risk
[Figure: P(d) modeled and P(d) observed plotted against dose, with 90th, 95th, and 99th percentile confidence limit curves. Dose axis labeled µg/kg-d.]

Exhibit 3-3. Quantal Quadratic Regression - Extra Risk
[Figure: P(d) modeled and P(d) observed plotted against dose, with 90th, 95th, and 99th percentile confidence limit curves. Dose axis labeled µg/kg-d.]

Table 3-7. Best-Fit Model Parameters from Modeling of the Acrylamide Data

Model                Background rate    q1       k    Chi-square p value
Quantal Weibull      0.15               0.08     1    0.48
Quantal quadratic    0.16               0.034         0.34

Both models fit the data adequately as shown in Table 3-7. In both cases the chi-squared goodness of fit yields p-values greater than 0.05. Therefore, either model can be used for derivation of the BMD. Neither model, as fitted to this data set, suggests a threshold for this response. However, both models do indicate a background rate in the absence of exposure to acrylamide.

3.6.4.4 Selection of the BMR

For the data set discussed above, the BMDs were calculated using the quantal Weibull and the quantal quadratic models for 1, 5, and 10 percent extra risk (Table 3-8 estimates are in units of mg/kg-day):

Table 3-8. BMD Values Calculated Using Quantal Weibull and Quadratic Models

                         BMD (mg/kg-day) for Confidence Limit:
Model        BMR (%)     90th percentile    95th percentile    99th percentile
Weibull      10          0.73               0.64               0.52
             5           0.35               0.31               0.25
             1           0.07               0.06               0.05
Quadratic    10          1.28               1.19               1.06
             5           0.89               0.83               0.74
             1           0.39               0.37               0.33

The calculated BMDs are about a factor of two apart for the BMD10 values, but are about a factor of six apart for the BMD1 values. This demonstrates the model dependence of the BMD values when low BMR levels are selected.
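To show how the fitted parameters relate to the tabulated values, the sketch below inverts the extra risk expression for each model to obtain central (maximum likelihood) estimates of the dose at 10, 5, and 1 percent extra risk. It assumes the rounded parameters in Table 3-7 and a zero threshold dose for the quadratic model (the text notes that neither fit suggests a threshold); it does not reproduce the confidence limit calculation, so the results are central estimates rather than BMDs. Function names are illustrative.

```python
import math


def weibull_dose_at_extra_risk(extra_risk, q1, k):
    """Invert ER(d) = 1 - exp(-q1 * d**k) for d (the background term cancels)."""
    return (-math.log(1.0 - extra_risk) / q1) ** (1.0 / k)


def quadratic_dose_at_extra_risk(extra_risk, q1, d0=0.0):
    """Invert ER(d) = 1 - exp(-q1 * (d - d0)**2) for d; d0 = 0 is an assumption."""
    return d0 + math.sqrt(-math.log(1.0 - extra_risk) / q1)


# 95th percentile BMDs from Table 3-8 (mg/kg-day), keyed by BMR.
BMD95_WEIBULL = {0.10: 0.64, 0.05: 0.31, 0.01: 0.06}
BMD95_QUADRATIC = {0.10: 1.19, 0.05: 0.83, 0.01: 0.37}

central_weibull = {
    bmr: weibull_dose_at_extra_risk(bmr, q1=0.08, k=1) for bmr in BMD95_WEIBULL
}
central_quadratic = {
    bmr: quadratic_dose_at_extra_risk(bmr, q1=0.034) for bmr in BMD95_QUADRATIC
}

for bmr in (0.10, 0.05, 0.01):
    print(
        f"BMR {bmr:.0%}: Weibull central {central_weibull[bmr]:.2f} "
        f"(BMD {BMD95_WEIBULL[bmr]}), quadratic central "
        f"{central_quadratic[bmr]:.2f} (BMD {BMD95_QUADRATIC[bmr]})"
    )
```

Each central estimate exceeds the corresponding entry in Table 3-8, as expected for a BMD defined as a lower confidence limit on dose.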
3.6.4.5 Calculating the Confidence Interval

As shown in Table 3-8, the BMDs were calculated for 90th, 95th, and 99th percentile confidence limits. The effect of the confidence limit on the estimated BMD was slightly less for the quantal quadratic than for the quantal Weibull. Model results were most comparable for the 90th and 95th percentile confidence limits and least comparable for the 99th percentile confidence limits. These results demonstrate that the BMD tends to be more model-dependent for wider (higher percentile) confidence intervals. For the remainder of the example, the 95th percentile confidence limit estimate is used.

3.6.4.6 Selection of the BMD as the Basis for the RfD

The example above yields different 95th percentile BMD10 values based on the two models. Since there is no basis upon which to eliminate one of the BMDs (i.e., goodness of fit, statistical assumptions and biological considerations), both must be considered. Either the smaller estimate or a geometric average may be used. In this case, the selection of which BMD to use is a risk management decision. In the example, the lower of the two BMDs (0.64 mg/kg-day) was chosen for the RfD calculation since it is the more conservative value.

3.6.4.7 Use of Uncertainty Factors with BMD Approach

Once the BMD is chosen, the RfD is derived by dividing the BMD by UFs. The same UFs applied to a NOAEL are used. In this case, a factor of 10 was selected for interspecies extrapolation and a factor of 10 for human intraspecies variability. Using a total UF of 100 and applying it to the 95th percentile confidence limit BMD for 10 percent response (derived with the quantal Weibull model) yields an RfD of 0.006 mg/kg-day (0.64 mg/kg-day divided by 100 is 0.0064 mg/kg-day, reported to one significant figure).

3.7 CATEGORICAL REGRESSION

3.7.1 Summary of the Method

Categorical regression is another method under investigation to estimate risks associated with systemic toxicity (Dourson et al., 1997; Guth et al., 1997).
In this approach, health effects are grouped into ordered severity categories (ranging from no effect to severe effect). This simplification allows for both quantal and continuous data to be utilized, as well as data that are reported qualitatively rather than quantitatively. Furthermore, information on many health effects can be considered together. Logistic regression analysis techniques are then applied to the data: the cumulative odds of falling into severity categories is the dependent variable, and the independent variables are exposure concentration, exposure duration, and other parameters.

Using the regression results, the RfD is then specified as the dose at which the probability of adverse effects is sufficiently small at some level of confidence, modified, as in the NOAEL and BMD approaches, by appropriate uncertainty factors. For example, the dose of interest, D, might be defined as that dose for which one could conclude with 95 percent certainty that the probability of an adverse effect was less than 0.01. The value D would then be adjusted by uncertainty factors to derive the RfD.[16]

3.7.2 Steps in Applying Categorical Regression

The categorical regression approach begins with a review of the toxicological database for the chemical. For each valid study, the responses observed are assigned to one of several ordered severity categories, based on biological and statistical considerations. For example, responses may be grouped into four categories: (1) no effect; (2) no adverse effect; (3) mild-to-moderate adverse effect; and (4) severe or lethal effect. These correspond to the dose categories used in setting the RfD, namely the No Observed Effect Level (NOEL), NOAEL, LOAEL, and Frank Effect Level (FEL), respectively. Judgment is required to define the types of effects that correspond to the severity categories.
Since all response data are used in categorical regression analysis, there is no need to specify the lowest dose showing "mild-to-moderate" adverse effects. Accordingly, a more general term, adverse-effect level (AEL), is generally used in categorical regression in place of the term LOAEL to describe mild-to-moderate effects.

The probability of observing a response in a category at a given dose level is estimated by dividing the number of responses observed in that severity category by the total number of observations recorded for that dose level. Sufficient numbers of dose groups in each of several categories are required for the categorical regression. The log odds for each dose and severity level is calculated, and then regressed against dose. The resulting regression equation can be used to calculate the probability of an effect of given severity for any dose.

Several model structures (logistic, Weibull, or others) may be used to perform the categorical regression. Logistic regression on the ordered categories (Harrell, 1986; Hertzberg, 1989) allows the dependent variable (e.g., severity parameter) to be categorical and the independent variables to be either categorical or continuous. The goodness of the fit of the model to the data can be judged using several statistical measures: the overall χ² value; model parameter standard errors and their χ² significance levels; concordance statistics and correlation coefficients for the overall model; and the model covariates (Hertzberg and Wymer, 1991). A variety of criteria for judging goodness of fit are currently being investigated.

[16] Note that the logistic regression could be used to estimate the response to exposures greater than the RfD. BMD models could be used similarly, but caution is warranted when doing so in either case.
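The probability and log odds calculation described above can be sketched for a single dose group. The counts below are hypothetical and are not taken from any study, and the choice to compute cumulative odds of being at or above each severity category is an assumption about how the ordered categories are combined; the regression of these log odds against dose is not shown.

```python
import math

# Ordered severity categories, lowest to highest (labels follow the text).
SEVERITY = ["no effect", "no adverse effect", "adverse effect (AEL)", "frank effect (FEL)"]


def cumulative_log_odds(counts):
    """Log odds of a response at or above each category except the lowest.

    counts: observations in each ordered severity category at one dose.
    Returns None where the cumulative proportion is 0 or 1 (log odds undefined).
    """
    total = sum(counts)
    result = []
    for start in range(1, len(counts)):
        p = sum(counts[start:]) / total
        result.append(math.log(p / (1.0 - p)) if 0.0 < p < 1.0 else None)
    return result


# Hypothetical counts per severity category at two dose levels.
hypothetical_counts = {1.0: [6, 3, 1, 0], 10.0: [2, 3, 4, 1]}

for dose, counts in hypothetical_counts.items():
    for label, value in zip(SEVERITY[1:], cumulative_log_odds(counts)):
        shown = "undefined" if value is None else f"{value:.2f}"
        print(f"dose {dose}: log odds of '{label}' or worse = {shown}")
```

Dose groups in which a category is never or always reached yield undefined log odds, which is one reason sufficient numbers of dose groups in each of several categories are needed.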
Some advantages of using categorical regression to derive the RfD are that data on more than one health effect can be incorporated and likely responses to exposures above the RfD can be evaluated. Predictions for responses above the RfD can incorporate effects other than the critical effect, which neither the NOAEL/UF approach nor the BMD approach allows.

3.8 CHRONIC, PRACTICAL NONTHRESHOLD EFFECTS

Noncarcinogenic effects are generally assumed to exhibit a threshold below which adverse effects are unlikely to occur. There are, however, exceptions to this general rule. Of particular concern are teratogenic and reproductive toxicants that may act through genetic mechanisms. EPA has recognized the potential for genotoxic teratogens and germline mutagens and discussed this issue in the 1991 Amendments to Agency Guidelines for Health Assessments of Suspect Developmental Toxicants (USEPA, 1991a) and in the 1986 Guidelines for Mutagenicity Risk Assessment (USEPA, 1986). These risk assessment guidelines raise concern for the potential for future generations inheriting chemically induced germline mutations or suffering from mutational events occurring in utero.

At this time, genotoxic teratogens and germline mutagens should be considered an exception to the threshold assumption. In the absence of adequate data to support a genetic or mutational basis for developmental or reproductive effects, the default becomes a threshold approach. For such chemicals, this guidance recommends the procedures described above for noncarcinogens assumed to have a threshold. A nonthreshold approach should only be applied when there are substantial scientific data supportive of a nonthreshold mechanism of toxicity, as is the situation for the neuro-developmental effects of lead. Ideally, a proposed mode of action would be available and would support the no-threshold hypothesis.
Where evidence for a genetic or mutational basis does exist, a nonthreshold mechanism shall be assumed for genotoxic teratogens and germline mutagens. Since there is no well established mechanism for calculating criteria protective of human health from the effects of these agents, criteria will be established on a case-by-case basis.

3.9 ACUTE, SHORT-TERM EFFECTS

States may choose to derive criteria that correspond to acute or short-term exposures. These criteria should correspond to a level of exposure that is "without appreciable risk of deleterious effects during some relatively short period of time" (USEPA, 1991c). The derivation of such values follows the same general approaches described above for criteria based on chronic effects. The primary difference lies in the type of toxicity data used as the basis for the evaluation. Generally, studies that mimic the exposure pattern and duration of interest will be considered more relevant to the development of acute or short-term criteria. This is especially important where acute or short-term effects are of a substantially different nature than low-level chronic effects.

The Office of Water has established procedures for deriving Health Advisories (HAs) for one-day, ten-day, and longer-term exposures. In general, HAs are developed by using NOAELs or LOAELs from studies with similar duration to the exposure period of concern, though there is some flexibility in this regard. Studies used for HAs should provide information on the critical endpoint. Studies that identify only frank toxic responses should not be used since these levels are far above the protective level targeted by HAs. More information on the derivation of HAs is given in Ware (1988).

Data from short-term studies should not be used when determining the longer-term or lifetime HAs. In instances where the database is inadequate to support a longer-term or lifetime HA no value is calculated.
The Agency does not use data from less-than-90-day studies purely because they are the only available data. Factors such as the toxicokinetics, potential recovery periods, and potential for bioaccumulation should be considered in judging the relevance of the data to the HA derivation.

3.10 MIXTURES

Exposures to multiple contaminants may occur simultaneously. Possible interactions among chemicals in a mixture are usually placed in one of three categories:

  Antagonistic, where the chemical mixture exhibits less toxicity than is suggested by the sum of the toxic effects of the components.

  Synergistic, where the chemical mixture exhibits greater toxicity than is suggested by the sum of the toxic effects of the components.

  Additive, where the toxicity of the chemical mixture is equal to the sum of the toxicities of the components.

Approaches to conducting a risk assessment for a mixture are presented in the 1999 draft Guidelines for Health Risk Assessment of Chemical Mixtures (USEPA, 1999). In only a few instances have the interactive effects of chemical mixtures been specifically studied. Where data on the effects of chemical mixtures exist, they should be used to characterize risk. Using the available data is especially important in cases where the resulting toxic effect from the mixture has been demonstrated to be greater than the sum of the individual effects. Certain categories of contaminants, in particular persistent organic pollutants that share a common mode of action and/or target tissue, are of elevated concern when they co-occur in fish and drinking water.

Where specific data are not available on the interactive effects of particular chemical mixtures or on similar mixtures, the methods described below can be used by states to characterize risks from chemicals in a mixture.
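For noncarcinogens with a similar effect, the hazard index of Equation 3.3 is a sum of exposure-to-RfD ratios. The arithmetic can be sketched in Python as follows. This is only an illustration: the function name hazard_index, the chemical names, and all numeric values are hypothetical and are not taken from this document.

```python
def hazard_index(exposures, rfds):
    """Return the sum of E_m / RfD_m over all chemicals (Equation 3.3).

    exposures and rfds are dicts keyed by chemical name; both must use
    the same dose units (for example mg/kg-day) so each ratio is unitless.
    """
    if exposures.keys() != rfds.keys():
        raise ValueError("exposures and rfds must cover the same chemicals")
    total = 0.0
    for chemical, exposure in exposures.items():
        rfd = rfds[chemical]
        if rfd <= 0:
            raise ValueError(f"RfD for {chemical} must be positive")
        total += exposure / rfd  # fraction of the RfD used by this chemical
    return total


# Hypothetical two-chemical mixture, doses in mg/kg-day (illustrative only).
example_exposures = {"chemical_A": 0.002, "chemical_B": 0.0005}
example_rfds = {"chemical_A": 0.005, "chemical_B": 0.001}

hi = hazard_index(example_exposures, example_rfds)
exceeds_one = hi > 1.0  # an index above one implies increased risk
print(f"hazard index = {hi:.2f}, exceeds one: {exceeds_one}")
```

In this made-up case the two ratios are 0.4 and 0.5, so the index stays below one. The sketch applies only where dose addition is judged appropriate, and the numeric value of the index does not express the magnitude or severity of the risk.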
When risks from multiple chemicals are added, the quality of experimental evidence that supports the assumption of dose addition should be stated clearly (USEPA, 1999) and the approach should only be applied when data on the same or a similar mixture are not available. In cases where the chemicals in the mixture induce the same effect by similar modes of action, contaminants may be assumed to contribute additively to risk (USEPA, 1999), unless specific data indicate otherwise.

To characterize risks from multiple chemical exposure to noncarcinogens, the dose for each chemical with a similar effect first is expressed as a fraction of its RfD. These ratios are added for all chemicals to obtain the chemical mixture hazard index:

$$HI_M = \sum_{m=1}^{n} \frac{E_m}{RfD_m} \qquad \text{(Equation 3.3)}$$

where:

  HI_M = hazard index of the mixture (unitless)
  E_m = the exposure to chemical m
  RfD_m = the reference dose for chemical m
  n = the number of chemicals in the mixture.

A hazard index greater than one implies an increased risk for noncarcinogenic effects from the mixture. However, the numerical value of the hazard index does not indicate the magnitude and severity of the risk (USEPA, 1999). Mode of action is an important consideration. Two chemicals with the same target tissue but totally different modes of action may or may not increase risk in an additive fashion.

Some chemical mixtures may contain chemicals that cause dissimilar health effects. Methods currently do not exist for combining dissimilar health effects to characterize overall health concerns from chemical mixtures. Instead, States should characterize and present the risks from these contaminants separately.

3.11 REFERENCES

Allen, B.C., R.J. Kavlock, C.A. Kimmel and E.M. Faustman. 1994a. Dose response assessments for developmental toxicity: II. Comparison of generic benchmark dose estimates with NOAELs. Fundam. Appl. Toxicol. 23:487-495.

Allen, B.C., R.J. Kavlock, C.A. Kimmel and E.M. Faustman. 1994b.
Dose response assessments for developmental toxicity: III. Statistical models. Fundam. Appl. Toxicol. 23:496-509.

Andersen, M., H. Clewell, M. Gargas, F.A. Smith, and R.H. Ritz. 1987. Physiologically based pharmacokinetics and risk assessment process for methylene chloride. Toxicol. Appl. Pharmacol. 87:185-205.

Andersen, M., J. Mills, M. Gargas, L. Kedderis, L. Birnbaum, D. Neubert, and W. Greenlee. 1993. Modeling receptor-mediated process with dioxin: Implications for pharmacokinetics and risk assessment. Risk Analysis 13:25-26.

Barnes, D.G., and M. Dourson. 1988. Reference Dose (RfD): Description and use in health risk assessments. Regul. Toxicol. Pharmacol. 8:471-486.

Brown, K.G. and L.S. Erdreich. 1989. Statistical uncertainty in the no-observed-adverse-effect level. Fundam. Appl. Toxicol. 13(2):235-244.

Calabrese, E. 1985. Uncertainty factors and interindividual variation. Regul. Toxicol. Pharmacol. 5:190-196.

Crump, K.S. 1984. A new method for determining allowable daily intakes. Fundam. Appl. Toxicol. 4:854-871.

Crump, K. 1995. Calculation of benchmark doses from continuous data. Risk Analysis 15:79-89.

Crump, K.S., B. Allen, and E. Faustman. 1995. The Use of the Benchmark Dose Approach in Health Risk Assessment. Prepared for USEPA Risk Assessment Forum. EPA/630/R-94/007.

Dourson, M.L. and J. Stara. 1983. Regulatory history and experimental support of uncertainty (safety) factors. Regul. Toxicol. Pharmacol. 3:224-239.

Dourson, M.L., R.C. Hertzberg, R. Hartung and K. Blackburn. 1985. Novel approaches for the estimation of acceptable daily intake. Toxicol. Ind. Health 1:23-41.

Dourson, M.L., L.A. Knauf, and J.C. Swartout. 1992. On reference dose (RfD) and its underlying toxicity database. Toxicol. Ind. Health 8(3):171-189.

Dourson, M.L., L.K. Teuschler, P.R. Durkin, and W.M. Stiteler. 1997. Categorical regression of toxicity data, a case study using Aldicarb. Regul. Toxicol. Pharmacol. 25:121-129.

Faustman, E.M., B.C. Allen, R.J.
Kavlock, and C.A. Kimmel. 1994. Dose response assessment for developmental toxicity: I. Characterization of database and determination of NOAELs. Fundam. Appl. Toxicol. 23:478-486.

Gaylor, D.W. 1983. The use of safety factors for controlling risk. J. Toxicol. Environ. Health 11:329-336.

Gaylor, D.W. 1989. Quantitative risk analysis for quantal reproductive and developmental effects. Environ. Health Perspect. 79:243-246.

Gaylor, D.W. and W. Slikker, Jr. 1990. Risk assessment for neurotoxic effects. Neurotoxicology 11:211-218.

Gehring, P.J., P.G. Watanabe, and C.N. Park. 1978. Resolution of dose-response toxicity data for chemicals requiring metabolic activation: Example vinyl chloride. Toxicol. Appl. Pharmacol. 44:581-591.

Guth, D.J., R.J. Carroll, D.G. Simpson, and H. Zhou. 1997. Categorical regression analysis of acute exposure to tetrachloroethylene. Risk Analysis 17(3):321-332.

Harrell, F. 1986. The LOGIST procedure. SUGI Supplemental Library User's Guide, Version 5 ed. SAS Institute. Cary, NC.

Hartley, W.R. and E.V. Ohanian. 1988. The use of short-term toxicity data for prediction of long-term health effects. In: Trace Substances in Environmental Health - XXII. D.D. Hemphil (ed). University of Missouri. May 23-26. Pp. 3-12.

Hartung, R. and P.R. Durkin. 1986. Ranking the severity of toxic effects: Potential applications to risk assessment. Comments on Toxicology 1:49-63.

Hattis, D., L. Erdreich and M. Ballew. 1987. Human variability in susceptibility to toxic chemicals: A preliminary analysis of pharmacokinetics data from normal volunteers. Risk Analysis 7(4):415-426.

Hertzberg, R.C. 1989. Fitting a model to categorical response data with applications to species extrapolation of toxicity. Health Physics 57:405-409.

Hertzberg, R.C. and M.E. Miller. 1985. A statistical model for species extrapolation using categorical response data. Toxicol. Ind. Health 1(4):43-63.

Hertzberg, R.C. and L. Wymer. 1991. Modeling the severity of toxic effects.
Presentation at the 84th Annual Meeting of the Air and Waste Management Association. June 16-21, 1991.

ILSI (International Life Sciences Institute). 1993. Report on the Benchmark Dose Workshop. International Life Sciences Institute, Risk Science Institute. Washington, DC.

Johnson, K.A., S.J. Gorzinski, K.M. Bodner, R.A. Campbell, C.H. Wolf, M.A. Friedman, and R.W. Mast. 1986. Chronic toxicity and oncogenicity study on acrylamide incorporated in the drinking water of Fischer 344 rats. Toxicol. Appl. Pharmacol. 85:154-168.

Kavlock, R.J., B.C. Allen, E.M. Faustman, and C.A. Kimmel. 1995. Dose response assessment for developmental toxicity: IV. Benchmark doses for fetal weight changes. Fundam. Appl. Toxicol. 26:211-222.

Kimmel, C.A. 1990. Quantitative approaches to human risk assessment for noncancer health effects. Neurotoxicology 11:189-198.

Lewis, S.C., J.R. Lynch, and A.I. Nikiforov. 1990. A new approach for deriving community exposure guidelines from no-observed-adverse-effect levels. Regul. Toxicol. Pharmacol. 11:314-330.

Swartout. 1990. Personal Communication to M.L. Dourson of the Office of Technology Transfer and Regulatory Support on January 12. Washington, DC.

USEPA (U.S. Environmental Protection Agency). 1986. Guidelines for mutagenicity risk assessment. Federal Register 51:34006-34012.

USEPA (U.S. Environmental Protection Agency). 1988. Reference dose (RfD): Description and use in health risk assessments. Integrated Risk Information System (IRIS). Online. Intra-Agency Reference Dose (RfD) Work Group. Office of Health and Environmental Assessment, Environmental Criteria and Assessment Office. Cincinnati, OH. February.

USEPA (U.S. Environmental Protection Agency). 1989. Interim Methods for Development of Inhalation Reference Doses. Office of Health and Environmental Assessment. Washington, DC. EPA/600/8-88-066F.

USEPA (U.S. Environmental Protection Agency). 1991a.
Amendments to agency guidelines for health assessments of suspect developmental toxicants. Federal Register 56:63798-63826. December 5.

USEPA (U.S. Environmental Protection Agency). 1991b. Final guidelines for developmental toxicity risk assessment. Federal Register 56:63798-63826. December 5.

USEPA (U.S. Environmental Protection Agency). 1991c. General Quantitative Risk Assessment Guidelines for Noncancer Health Effects. Second External Review Draft. Technical Panel for Development of Risk Assessment Guidelines for Noncancer Health Effects. Cincinnati, OH. ECAO-CIN-538.

USEPA (U.S. Environmental Protection Agency). 1992. Reference dose (RfD) for oral exposure for inorganic zinc. Integrated Risk Information System (IRIS). Online. (Verification date 10/1/92). Office of Health and Environmental Assessment, Environmental Criteria and Assessment Office. Cincinnati, OH.

USEPA (U.S. Environmental Protection Agency). 1993. Reference dose (RfD) for oral exposure for inorganic arsenic. Integrated Risk Information System (IRIS). Online. (Verification date 2/01/93). Office of Health and Environmental Assessment, Environmental Criteria and Assessment Office. Cincinnati, OH.

USEPA (U.S. Environmental Protection Agency). 1994a. Reference dose (RfD) for oral exposure for methylmercury. Integrated Risk Information System (IRIS). Online. (Verification date 11/23/94). Office of Health and Environmental Assessment, Environmental Criteria and Assessment Office. Cincinnati, OH.

USEPA (U.S. Environmental Protection Agency). 1994b. Guidelines for Reproductive Toxicity Risk Assessment. External Review Draft. Risk Assessment Forum. Washington, DC. EPA/600/AP-94/001. February.

USEPA (U.S. Environmental Protection Agency). 1995. RQ Document for Solid Waste. Report on the Benchmark Dose Peer Consultation Workshop: Risk Assessment Forum. Office of Research and Development. Washington, DC. EPA/630/R-96/011. November.

USEPA (U.S. Environmental Protection Agency). 1999.
Guidelines for the Health Risk Assessment of Chemical Mixtures. External Peer Review Draft. Risk Assessment Forum. Washington, DC. NCEA-C-0148. April.

Ware, G.W. (ed). 1988. Reviews of Environmental Contamination and Toxicology: U.S. Environmental Protection Agency Office of Drinking Water Health Advisories. Vol. 104. Springer-Verlag, Inc. New York, NY.

Zielhuis, R.L. and F.W. van der Kreek. 1979. The use of a safety factor in setting health based permissible levels for occupational exposure. Int. Arch. Occup. Environ. Health 42:191-201.

APPENDIX A

CASE STUDY EXAMPLE
HAZARD EVALUATION FOR COMPOUND Z

A.1 HUMAN DATA

Compound Z is a metal-conjugated phosphonate. No human tumor or toxicity data exist on this chemical.

A.2 ANIMAL DATA

Compound Z caused a statistically significant increase in the incidence of urinary bladder tumors in male, but not female, rats at 30,000 ppm (3%, 1500 mg/kg/day) in the diet in a long-term study. Some of these animals had accompanying urinary tract stones and toxicity. No bladder tumors or adverse urinary tract effects were seen in two lower dose groups (2,000 and 8,000 ppm, equivalent to 100 and 400 mg/kg/day) in the same study. A chronic dietary study in mice at doses comparable to those in the rat study showed no tumor response or urinary tract effects. A 2-year study in dogs at doses up to 40,000 ppm showed no adverse urinary tract effects.

A.3 OTHER KEY DATA

Subchronic dosing of rats confirmed that there was profound development of stones in the male bladder at doses comparable to those causing cancer in the chronic study, but not at lower doses. Sloughing of the epithelium of the urinary tract accompanied the stones.

There was a lack of mutagenicity relevant to carcinogenicity. In addition, there is nothing about the chemical structure of Compound Z to indicate DNA reactivity or carcinogenicity.

Compound Z is composed of a metal, an ethanol, and a simple phosphorus-oxygen-containing component.
The metal is not absorbed from the gut, whereas the other two components are absorbed. At high doses, ethanol is metabolized to carbon dioxide, which makes the urine more acidic; the phosphorus level in the blood and calcium in the urine are increased. Chronic testing of the phosphorus-oxygen-containing component alone in rats did not show any tumors or adverse effects on the urinary tract. Because Compound Z is a metal complex, it is not likely to be readily absorbed from the skin.

A.4 EVALUATION

Compound Z produced cancer of the bladder and urinary tract toxicity in male rats, but not in female rats or mice, and dogs also failed to show the toxicity noted in male rats. The mode of action developed from the other key data to account for the toxicity and tumors in the male rats is the production of bladder stones. At high, but not lower, subchronic doses in the male rat, Compound Z leads to elevated blood phosphorus levels; the body responds by releasing excess calcium into the urine. The calcium and phosphorus combine in the urine and precipitate into multiple stones in the bladder. The stones are very irritating to the bladder; the bladder lining is eroded and cell proliferation occurs to compensate for the loss of the lining. Cell layers pile up, and finally, tumors develop. Stone formation does not involve the chemical per se but is secondary to the effects of its constituents on the blood and, ultimately, the urine. Bladder stones, regardless of their cause, commonly produce bladder tumors in rodents, especially the male rat.

A.5 CONCLUSION

Compound Z: "Likely/Not Likely Human Carcinogen"
Range of Dose Limited, Margin-of-Exposure Extrapolation

Compound Z, a metal aliphatic phosphonate, is likely to be carcinogenic to humans only under high-exposure conditions following oral and inhalation exposure that lead to bladder stone formation, but is not likely to be carcinogenic under low-exposure conditions.
It is not likely to be a human carcinogen via the dermal route, given that the compound is a metal conjugate that is readily ionized and its dermal absorption is not anticipated. The weight of evidence is based on (a) bladder tumors only in male rats; (b) the absence of tumors at any other site in rats or mice; (c) the formation of calcium-phosphorus-containing bladder stones in male rats at high, but not low, exposures that erode bladder epithelium and result in profound increases in cell proliferation and cancer; and (d) the absence of structural alerts or mutagenic activity.

There is a strong mode-of-action basis for the requirements of (a) high doses of Compound Z, (b) which lead to excess calcium and increased acidity in the urine, (c) which result in the precipitation of stones, and (d) the necessity of stones for toxic effects and tumor hazard potential. Lower doses fail to perturb urinary constituents, lead to stones, produce toxicity, or give rise to tumors. Therefore, dose-response assessment should assume nonlinearity.

A major uncertainty is whether the profound effects of Compound Z may be unique to the rat. Even if Compound Z produced stones in humans, there is only limited evidence that humans with bladder stones develop cancer. Most often human bladder stones are either passed in the urine or lead to symptoms resulting in their removal. However, since one cannot totally dismiss the male rat findings, some hazard potential may exist in humans following intense exposures. Additional research would be needed to reduce this uncertainty.

APPENDIX B

CASE STUDY EXAMPLE
MODE OF ACTION EVALUATION: COMPOUND Z (BLADDER TUMOR)

B.1 HAZARD DATA SUMMARY

B.1.1 Data Availability

Data include a rat chronic/carcinogenicity feeding study, an 18-month CD-1 mouse carcinogenicity study, a three-generation reproduction study in the rat, and a 2-year feeding study in dogs. There are no data on the effects in humans of exposure to compound Z.
A 13-week feeding study in rats included interim sacrifices at 2, 4, and 8 weeks and establishment of 16-week recovery groups at 8 weeks and a 21-week recovery group at 13 weeks.

B.1.2 Tumor Observations

B.1.2.1 Tumor Response

Rats. Administration of compound Z in the diet to male Sprague-Dawley rats at dose levels of 30,000 ppm or more for 2 years resulted in an increase in bladder urothelial tumors. Statistically significant increases (p < 0.05) were noted at the high dose only (40,000/30,000 ppm) in the incidences of transitional cell papillomas, carcinomas, combined papillomas and carcinomas, and hyperplasia in the 2-year SD rat bioassay (Table B-1). Bladder calculi were observed in some animals, but correlation between stones and tumors was not evident at final sacrifice.

Mice. No increase in tumor incidences was observed in an 18-month bioassay with mice.

Dogs. When administered to dogs at dose levels up to 40,000 ppm in the diet for up to 2 years, the compound produced no tumors.

B.1.3 Mutagenicity

Compound Z has not shown mutagenic activity in Salmonella sp. or micronucleus assays. No evidence exists that the chemical produces effects on DNA synthesis, nor does it appear to be clastogenic. There are no structural attributes that suggest mutagenic potential for the chemical.

Table B-1. Incidence of Transitional Cell Lesions and Stones in the Bladder of Males from a 2-Year Sprague-Dawley Rat Study

                        ---------------- Lesion ----------------
Dose (ppm)       N    Papilloma  Carcinoma  Combined  Hyperplasia  Stones
0                73   1          2          3         5            0
2000             75   1          2          3         7            0
8000             78   1          1          2         5            0
40,000/30,000    78   5          16         21        29           5

B.1.4 Toxicity, Uroliths, and Hyperplasia

There was a strong association among disruptions in urinary physiology, toxicity, uroliths, and hyperplasia in the 13-week study in mid-dose and high-dose animals (30,000 and 50,000 ppm, respectively; p < 0.05). In the control and 8,000 ppm groups, no animals had stones and no animals had hyperplasia (see Table B-2).
B.1.4.1 Thirteen-Week Study

Disruptions in urinary physiology and urothelial toxicity appeared early in the study. Early changes in urinary physiology (decreased pH and increased cation concentration) were observed following 2 weeks of treatment and persisted throughout the duration of the study. Urothelial toxicity was expressed as edema, cystitis, and hyperplasia; hyperplasia (simple and papillary transitional cell combined) increased in overall incidence with continued treatment. It was present in 70% of mid-dose (30,000 ppm) animals and 80% of high-dose (50,000 ppm) animals following 2 weeks of exposure, and in 70% of the mid-dose group and 100% of the high-dose group at 13 weeks. There was some indication of a decrease in severity of hyperplasia at 13 weeks when compared with earlier time periods, as there was an apparent shift from the incidence of papillary hyperplasia to simple hyperplasia and a decrease in the combined incidence of hyperplasia in the 30,000 ppm group of animals.

Uroliths were found to be present as early as 2 weeks (0%, 0%, 30%, and 40% in the four dose groups, respectively) and the incidence increased over the period of the study. The incidence of uroliths at termination of the 13-week study was 0%, 0%, 70%, and 100% in the four dose groups, respectively, but there was a decrease in size and number of stones per animal at 13 weeks.

Table B-2. Incidence of Bladder Hyperplasia and Stones in Male Sprague-Dawley Rats Treated up to 13 Weeks

Time      Dose(a)  N    Papillary hyperplasia  Simple hyperplasia  Stones
2 weeks   1        10   0                                          0
          2        10   0                                          0
          3        10   7                                          3
          4        10   8                                          4
8 weeks   1        10   0                                          0
          2        10   0                                          0
          3        10   9                                          9
          4        9    7                                          8
13 weeks  1        10   0                      0                   0
          2        10   0                      0                   0
          3        10   5                      2                   7
          4        6    6                      0                   6

(a) Dose (ppm): 1 = control, 2 = 8000, 3 = 30,000, 4 = 50,000.

B.1.4.2 Three-Generation Reproduction Study in Rats

High dose levels (>20,000 ppm in the diet) led to formation of lesions in the urinary tract of males and females of the F1, F2, and F3 generations.
The lesions included hemorrhage of the bladder wall, increased pelvic dilation, and papillary necrosis. In the F3 generation, additional effects noted in renal tissue were hyperplasia of the transitional epithelium and desquamation of cells in the lumen of the urinary tract. The changes were associated with crystalline or calcareous deposits.

B.1.5 Reversibility of Effects

There was strong evidence of reversibility of bladder stones and bladder hyperplasia. When animals that had been treated for 8 weeks were returned to basal diet for 16 weeks, uroliths were found in 30% of 30,000 ppm animals and 25% of high-dose animals. Bladder hyperplasia (papillary and transitional cell combined) was reduced to 30% and 25%, respectively, in these two dose groups (Table B-3). An analysis of individual animal data revealed a strong correlation between the incidence of uroliths and hyperplasia at the termination of the recovery period.

Table B-3. Reversal of Incidence of Bladder Hyperplasia and Stones Following 8 Weeks Treatment and 16 Weeks Recovery

Dose (ppm)  N    Papillary hyperplasia  Simple hyperplasia  Stones
0           10   0                      0                   0
8000        10   0                      0                   0
30,000      10   2                      1                   3
50,000      8    1                      1                   2

B.1.6 Blood and Urine Chemistry

Compound Z administration resulted in increases in blood phosphorus and carbon dioxide (data not shown). Urinalyses (Table B-4) showed elevated calcium levels, reduced urinary phosphorus, and a profound lowering of urinary pH (5.0), which began at 2 weeks and persisted throughout the 13-week study in the 30,000 and 50,000 ppm groups of rats. These changes occurred in the presence of bladder stones, which were reported to consist of 33% calcium and 23% phosphorus.

Table B-4.
Clinical Chemistry Values (Urine) in Male Sprague-Dawley Rats Treated up to 13 Weeks

Time      Dose(a)  N    Calcium (g/dL)  Phosphorus (mg/dL)  pH       Stones
2 weeks   1        10   6               90                  7        0
          2        10   11              62                  6.5      0
          3        10   56(b)           2(b)                5(b)     3
          4        10   36(c)           13(c)               5(b)     4
8 weeks   1        10   11              109                 7.4      0
          2        10   11              90                  6.9      0
          3        10   18              19                  5.8(b)   9
          4        9    65(b)           1(b)                5.0(b)   8
13 weeks  1        10   5               57                  7.2      0
          2        10   7               67                  6.7      0
          3        10   14(b)           26                  6.0(b)   7
          4        6    58(b)           1(b)                5.0(b)   6

(a) Dose (ppm): 1 = control, 2 = 8000, 3 = 30,000, 4 = 50,000.
(b) p < 0.01; (c) p < 0.05

B.1.7 Metabolism

Upon ingestion by rats, the ethyl moiety of compound Z is rapidly absorbed, hydrolyzed to a phosphite, and oxidized via acetaldehyde and acetate to carbon dioxide and water. Absorption of the phosphite moiety leads to increased blood phosphorus levels. There is also an increase in blood calcium load, which leads to increased excretion of calcium via the urine. Ethyl phosphite moieties and carbon dioxide are also eliminated via the urine. A marked depression of urinary pH (5.0) results from acidification of the urine by carbon dioxide. An aluminum moiety of the parent chemical is poorly absorbed, and most is eliminated in the feces.

The phosphite metabolite, the major urinary metabolite, was not shown to express carcinogenic potential when administered to Sprague-Dawley rats at dose levels up to 32,000 ppm. It also does not express any mutagenic potential and does not have any structural alerts.

B.1.8 Structure-Activity Relationships

There are no data on structurally related chemicals.

B.2 MODE OF ACTION ANALYSIS

B.2.1 Summary Description of Postulated Mode of Action

Compound Z produces transitional cell tumors in male Sprague-Dawley rats. The mode of action includes disruption in urinary physiology, including precipitation of calcium and phosphorus and formation of bladder calculi. The stones irritate the urothelium of the bladder, followed by transitional cell hyperplasia and bladder tumor formation.
Disruption of urinary physiology is a consequence of a metabolic sequence involving (1) absorption and metabolism of the ethyl moiety to carbon dioxide, resulting in a reduction in urinary pH; and (2) absorption of the phosphite moiety, which leads to increased blood phosphorus levels and increased release of calcium into the urine. Increases in water consumption followed by increased urinary volume may contribute to bladder toxicity, but a precise role of increased urinary volume has not been established.

The mode of action for compound Z is consistent with other data demonstrating that solid masses in the rodent bladder, regardless of their origin (insertion of solid materials, including inert pellets; precipitation of administered chemicals, e.g., melamine; or disruption of urinary physiology, e.g., diethylene glycol), lead to urothelial toxicity and the formation of tumors.

B.2.2 Key Events

The key precursor events associated with bladder tumor formation following administration of compound Z to rats include increased blood phosphorus and carbon dioxide, elevated urinary calcium and volume, decreased urinary pH and phosphorus, formation of bladder stones, and irritation and hyperplasia of the urothelium.

B.2.3 Strength, Consistency, and Specificity of Association of Tumor Response with Key Events

The only tumor response seen in animal studies is bladder tumors in male Sprague-Dawley rats. Studies in dogs and mice showed no effect on the bladder. The rat tumor response was seen only at high doses that lead to key precursor effects: altered urinary physiology (volume, calcium, pH) results in stones and produces toxicity and hyperplasia of the urothelium. The high-dose changes were noted in a rat chronic study, a rat subchronic study, and a three-generation reproduction study in rats. The key events, including hyperplasia, were observed to be reversible in subchronic stop/recovery studies.
Administration of the major metabolite of compound Z, monosodium phosphite, fails to reduce urinary pH, increase urinary volume, or produce nonneoplastic or neoplastic lesions of the bladder.

The database on compound Z is sufficient to evaluate the proposed mode of action despite the absence of more complete information on the composition of the stones and questions regarding the absence of toxicity following the administration of monosodium phosphite. There is a high degree of confidence that the findings accurately reflect the effects associated with administration of the chemical. No data gaps were identified that would substantially alter the evaluation of the proposed mode of action.

B.2.4 Dose-Response Relationships

The 2-year bioassay showed urothelial hyperplasia, transitional cell papillomas, transitional cell carcinomas, and a few bladder stones at 40,000/30,000 ppm. Of 78 high-dose animals, 37% showed bladder tumors. Tumors, hyperplasia, and stones were not increased at 8000 ppm. A special 13-week feeding study demonstrated that key events (increased urinary calcium levels, decreased urinary phosphorus levels, decreased pH, bladder stones, irritation, edema, and hyperplasia) occurred consistently only at dose levels of 30,000 ppm or greater. A strong dose-response correlation was shown between calculus formation and hypercalciuria, acidic urine, and bladder hyperplasia. In a rat reproduction study, bladder effects were noted at 24,000 ppm but not at 12,000 ppm.

B.2.5 Temporal Association

A subchronic rat study with serial sacrifices at 2, 4, 8, and 13 weeks, including evaluation of 16-week recovery groups after 8 weeks and a 21-week recovery group after 13 weeks, was performed. By 2 weeks of administration, compound Z produced stones that filled the bladder and resulted in advanced papillary hyperplasia. The number and size of stones were greatest at 2 weeks, and there was a progressive decrease over the 13-week period.
Early changes in urinary physiology (decreased urinary pH, increased calcium concentration, and decreased phosphorus concentration) were observed following 2 weeks of treatment and persisted throughout the duration of the study. Observation of the 8-week treatment/16-week recovery groups showed that incidence of both stones and hyperplasia significantly decreased as compared with incidence in animals sacrificed at 8 weeks. Also, upon cessation of dosing at 13 weeks, the incidence of animals with stones, the incidence of papillary hyperplasia, and the severity of hyperplasia decreased significantly by the end of a 21-week recovery period (data not shown). The changes noted within 2 weeks of dosing appear to have set in motion a series of events beginning with increased urinary calcium concentrations, followed or accompanied by stone formation, irritation of the bladder urothelium, hyperplasia and, eventually, neoplasia.

B.2.6 Biological Plausibility and Coherence of the Database

Long-term and subchronic studies with compound Z have demonstrated a dose correlation between development of stones and bladder tumor formation in male rats. Data from the 13-week study indicate a rapid onset of effects (changes in urinary parameters, formation of stones, and hyperplasia within 2 weeks of dosing) and adaptation of treated animals to compound Z exposure by 13 weeks (decreased numbers and size of stones per animal, decreased severity of hyperplasia). Tumors were observed only at doses at which key events were observed.

Additional bioassay data provide support for the association of tumors in rats with the key events in rats and the absence of both tumors and similar key events in other species treated with compound Z. Treatment of rats in a three-generation reproduction study at high dose levels (>20,000 ppm in the diet) led to formation of lesions in the urinary tract of males and females.
When administered to dogs at dose levels up to 40,000 ppm in the diet for up to 2 years, the chemical produced minimal toxic effects overall, no effects on the urinary tract, and no tumors. Compound Z produced no effects in mice when administered up to a dose level of 20,000/30,000 ppm in the diet for 2 years.

Observations with compound Z are in keeping with those observed in many other experimental settings. Stones, regardless of their chemical makeup, are irritating to the rodent bladder, causing irritation, hyperplasia, and eventually neoplasia.

There are some uncertainties regarding the role of certain findings following compound Z administration. Generally, an increase in urinary pH is associated with the precipitation of calcium- and phosphorus-containing stones in rats. However, stones are formed in the presence of a low urinary pH in rats administered compound Z. It is also unclear whether or not the acidic environment of the urine (most likely a consequence of the conversion of the ethyl moiety to carbon dioxide in the blood) contributes to or enhances any effects noted in bladder tissue in rats.

There was a paucity of stones in high-dose animals at termination of the 2-year study but a higher incidence of bladder tumors, which suggests that bladder stones may not be the causative factor involved in bladder tumor formation. Other considerations discount this presumption. First, a number of the high-dose animals showed hydronephrosis or dilation of the ureters, presumptive indications of past urinary tract obstruction. Second, the 13-week study provided evidence that bladder calculi develop rapidly (within 2 weeks), but then decrease in frequency and size. The decrease in size and number of bladder calculi was accompanied by a decrease in severity of bladder hyperplasia in animals treated with 30,000 ppm of compound Z.
Third, it is recognized that a constant ppm of an agent in the diet results in a reduction in dose per unit body weight as an animal grows. Finally, the increased urinary volume or decreased urinary pH may have led to a dissolution of stones over time.

The absence of bladder stones and urothelial toxicity following administration of the major metabolite, monosodium phosphite, is puzzling, as one might expect that its administration to rats would lead to bladder effects similar to those seen with compound Z. However, the metabolite, when administered to rats, leads to an increase in blood levels of phosphorus but does not alter urinary volume or pH as would be expected with an increase in sodium consumption. Considering the high dose level of metabolite administered to rats (32,000 ppm), it is unlikely that an additional bioassay using higher dose levels would provide useful information.

B.2.7 Other Modes of Action

Compound Z is not mutagenic in short-term tests and it does not have a structure suggesting biological reactivity. No other modes of action, apart from that postulated, are in evidence. The fact that bladder tumors were the sole tumors seen in rats and that no other species showed tumors or other toxicities like those in the rat makes it less likely that the agent has another generalized mode of action.

B.2.8 Conclusion

The available bioassay data on compound Z are sufficient to support the postulated mode of action that the chemical, which lacks mutagenic potential, leads to bladder tumor formation in male rats through a sequence of key events involving perturbations in urinary physiology, especially increased calcium concentration, calculus formation, urothelial irritation, hyperplasia, and neoplasia.

B.3 RELEVANCE OF THE MODE OF ACTION TO HUMANS

Bacterial infection, urinary stones or a combination of the two may be risk factors for human urinary tract cancer (Burin et al., 1995; Davis et al., 1984; Gonzalez et al., 1991; Kawai et al., 1994; Hiatt et al., 1982).
Infection of the bladder with Schistosoma haematobium leads to bladder tumors, and part of its action may be associated with stone formation (IARC, 1994). A significant relationship has also been shown between spinal cord injury and bladder cancer; chronic infection and stones are found in individuals so affected (Bickel et al., 1991; Broecker et al., 1981; Dolin et al., 1994; El-Masri and Fellows, 1981; Stonehill et al., 1996). Case-control epidemiologic studies (relative risks less than three) suggest associations between bladder cancer and urinary tract stones (Burin et al., 1995; Gonzalez et al., 1991). A large cohort study supports the association shown between bladder stones and bladder cancer (Chow et al., 1997). Taken as a whole, stones may play some role (particularly, along with infection) in bladder cancer formation.

Bladder cancer is a disease of advancing age, with about two-thirds of all cases occurring among persons aged 65 years or older (Hankey et al., 1993). Stones occur much more frequently in the upper urinary tract than in the bladder of humans (about 10% of urinary stones are found in the bladder), presumably because the upright posture of humans predisposes them to expelling stones through the urethra once a stone passes from the kidney to the bladder (Hiatt et al., 1982; Johnson et al., 1979; DeSesso, 1995). This characteristic, together with the pain that accompanies such stones and leads to their surgical removal, limits how long stones remain in the human bladder. Stones in the rodent bladder, by contrast, tend to be retained because of the animal's horizontal posture. These findings suggest that humans may be less susceptible than rodents to the development of urinary tract tumors associated with stones.

Precipitation of chemicals in the urinary tract with the formation of stones is a common finding, with about 12% of males and 5% of females having at least one stone over a lifetime (Johnson et al., 1979).
Compared to adults, urinary stone formation in children is an uncommon occurrence except in individuals with a predisposing condition, such as various inborn errors of metabolism (e.g., cystinuria) and congenital malformations (Gearhart et al., 1991). The prevalence of urinary stones in children is about 1 case per 20,000 per year (0.005%) (Khoory et al., 1998). Only about 5% of stones are initially manifest during the first 20 years of life (Johnson et al., 1979). Causes of urinary stones in children are remarkably similar to those of adults (Khoory et al., 1998; Stapleton, 1996). As with adults, the urine of children varies in pH and osmolality, particularly in response to diet and physiologic stressors (e.g., exercise, heat). Urinary excretion of chemicals occurs throughout life, although there may be quantitative differences associated with a number of factors including disease states and nutritional status. Stones used to be more common in children in developed countries than they are now, largely due to malnutrition, which is still a problem in developing nations today (Trinchieri, 1996).

Compound Z is converted to metabolic derivatives through simple hydrolysis, a chemical conversion that does not depend on enzymatic activity. It is therefore not plausible that differences in levels of enzymatic activity, such as detoxification via hepatic metabolism or metabolism in other tissues, will qualitatively alter responses in population subgroups such as the aged, the infirm, or infants and children who may be exposed to Compound Z.

In summary, a potential human carcinogenic hazard cannot be dismissed for Compound Z. Compound Z poses a carcinogenic hazard to humans only under conditions that would lead to the formation of bladder stones. It is reasonable to conclude that the mode of action involving stone formation for Compound Z that has been developed for adult animals may be applicable to young animals and to children.
Information suggests that effects in the young may not be any greater than in adults and, in fact, the young may be less susceptible unless there are rare extenuating factors.

B.4 REFERENCES

Bickel, A., Culkin, D.J., Wheeler, J.S. 1991. Bladder cancer in spinal cord injury patients. J. Urol. 146:1240-1242.

Broecker, B.H., Klein, F.A., Hackler, R.H. 1981. Cancer of the bladder in spinal cord injury patients. J. Urol. 125:196-197.

Burin, G.J., Gibb, H.J., Hill, R.N. 1995. Human bladder cancer: Evidence for a potential irritation-induced mechanism. Food Chem. Toxicol. 33:785-795.

Chow, W-H., Lindblad, P., Gridley, G., Nyren, O., McLaughlin, J.K., Linet, M.S., Pennello, G.A., Adami, H-O., Fraumeni, J.F. Jr. 1997. Risk of urinary tract cancers following kidney or ureter stones. J. Natl. Cancer Inst. 89:1453-1457.

Davis, C.P., Cohen, M.S., Gruber, M.B., Anderson, M.D., Warren, M.M. 1984. Urothelial hyperplasia and neoplasia: A response to chronic urinary tract infections in rats. J. Urol. 132:1025-1031.

DeSesso, J.M. 1995. Anatomical relationships of urinary bladders compared: Their potential role in the development of bladder tumours in humans and rats. Food Chem. Toxicol. 33:705-714.

Dolin, P.J., Darby, S.C., Beral, V. 1994. Paraplegia and squamous cell carcinoma of the bladder in young women: Findings from a case-control study. Br. J. Cancer 70:167-168.

El-Masri, W.S., Fellows, G. 1981. Bladder cancer after spinal cord injury. Paraplegia 19:265-270.

Gearhart, J.P., Herzberg, G.Z., Jeffs, R.D. 1991. Childhood urolithiasis: Experiences and advances. Pediatrics 87:445-450.

Gonzalez, C.A., Errezola, M., Izarzugaza, I., Lopez-Abente, G., Escolar, A., Nebot, M., Riboli, E. 1991. Urinary infection, renal lithiasis and bladder cancer in Spain. Eur. J. Cancer 27:498-500.

Hankey, B.F., Silverman, D.T., Kaplan, R. 1993. Urinary bladder. In Miller, B.A., Ries, L.A.G., Hankey, B.F. et al., eds. SEER Cancer Statistics Review: 1973-1990. NIH Pub. No. 93-789.
Bethesda, MD: National Cancer Institute: XXXVI.1-17.

Hiatt, R.A., Dales, L.G., Friedman, G.D., Hunkeler, E.M. 1982. Frequency of urolithiasis in a prepaid medical program. Amer. J. Epidemiol. 115:255-265.

IARC. 1994. Some Industrial Chemicals. In IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. Lyon, France, 60:13-33.

Johnson, C.M., Wilson, D.M., O'Fallon, W.M., Malek, R.S., Kurland, L.T. 1979. Renal stone epidemiology: A 25-year study in Rochester, Minnesota. Kidney Int. 16:624-631.

Kawai, K., Kawamata, H., Kameyama, S., Rademaker, A., Oyasu, R. 1994. Persistence of carcinogen-altered cell population in rat urothelium which can be promoted to tumors by chronic inflammatory stimulus. Cancer Res. 54:2630-2632.

Khoory, B.J., Pedrolli, A., Vecchini, S., Benini, D., Fanos, V. 1998. Renal calculosis in pediatrics. Pediatr. Med. Chir. 20:367-376.

Stapleton, F.B. 1996. Clinical approach to children with urolithiasis. Semin. Nephrol. 16:389-397.

Stonehill, W.H., Dmochowski, R.R., Patterson, A.L., Cox, C.E. 1996. Risk factors for bladder tumors in spinal cord injury patients. J. Urol. 155:1248-1250.

Trinchieri, A. 1996. Epidemiology of urolithiasis. Arch. Ital. Urol. Androl. 68:203-249.

APPENDIX C
EVALUATION OF THE QUALITY OF DATA SET(S) FOR USE IN DERIVING AN RfD

The derivation of RfDs begins with a thorough review and assessment of the toxicological database to identify the type and magnitude of possible adverse health effects associated with a chemical. This evaluation should include an examination of the full range of possible health effects, including acute, short-term (14 to 28 days), subchronic, reproductive/developmental, and chronic effects. To be useful for supporting the derivation of an RfD, a study must meet certain standards with regard to experimental design, conduct and data reporting. This appendix provides general guidance on criteria for appropriate study design for a variety of types of toxicity studies.
These guidelines provide the assessor with a means to evaluate the quality and adequacy of data. Appropriate studies are used both for the evaluation of potential hazard of the chemical and for the derivation of the RfD.

C.1 ACUTE TOXICITY DETERMINATION

Studies of acute exposure (a single dose, or multiple doses given within a short time, e.g., less than 24 hours) are widely available for many chemicals. Determination of acute toxicity [often expressed in terms of the lethal dose (or concentration) to 50 percent of the population (LD50 or LC50)] is usually the initial step in experimental assessment and evaluation of a chemical's toxic characteristics. Such studies are used in establishing a dosage regimen in subchronic and other studies and may provide initial information on the mode of toxic action of a substance. Because LD50 or LC50 studies are of short duration, inexpensive and easy to conduct, they are commonly used in hazard classification systems.

Acute lethality studies are of limited use, however, in the derivation of chronic criteria, since the establishment of chronic criteria should never be based on exposures that approach acutely lethal levels. Nevertheless, the data from such studies do provide information on health hazards likely to arise from individual short-term exposures. Such studies provide high-dose effects data from which to evaluate potential effects from exposures that may temporarily exceed the acceptable chronic exposure level. An evaluation of the data should include the incidence and severity of all abnormalities, the reversibility of abnormalities observed other than lethality, gross lesions, body weight changes, effects on mortality, and any other toxic effects.

In recent years guidelines have been established to improve quality and provide uniformity in test conditions.
Unfortunately, many published LD50 or LC50 tests were not conducted in accordance with current EPA or Organization for Economic Cooperation and Development (OECD) guidelines (USEPA, 1985; OECD, 1987) since they were conducted prior to establishment of those guidelines. For this reason, it becomes necessary to examine each test or study to determine if the study was conducted in an adequate manner. The following is a list of ideal conditions, compiled from various testing guidelines, which may be used for determination of adequacy of acute toxicity data. Many published studies do not report details of test conditions, making such determinations difficult. However, test condition guidelines that might be considered ideal may include:

General:
• Animal age and species identified.
• Minimum of 5 animals per sex per dose group (both sexes should be used).
• 14-day or longer observation period following dosing.
• Minimum of 3 dose levels appropriately spaced (most statistical methods require at least 3 dose levels).
• Identification of purity or grade of test material used (particularly important in older studies).
• If a vehicle is used, the selected vehicle is known to be non-toxic.
• Gross necropsy results for test animals.
• Acclimation period for test animals before initiating study.

Specific conditions for oral LD50:
• Dosing by gavage or capsule.
• Total volume of vehicle plus test material remains constant for all dose levels.
• Animals were fasted before dosing.

Specific conditions for dermal LD50:
• Exposure on intact, clipped skin, involving approximately 10 percent of body surface.
• Animals prevented from oral access to test material by restraining or covering test site.

Specific conditions for inhalation LC50:
• Duration of exposure at least 4 hours.
• If an aerosol (mist or particulate), the particle size (median diameter and deviation) should be reported.
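As an illustration only, the quantitative items in the general list above can be expressed as a simple screening check. The sketch below is not part of the EPA or OECD guidelines; the names `AcuteStudy` and `unmet_ideal_conditions` are hypothetical, and only a subset of the listed conditions is encoded. Because an adequately conducted study need not satisfy every ideal condition, the output is meant to flag items for reviewer judgment, not to accept or reject a study.

```python
from dataclasses import dataclass


@dataclass
class AcuteStudy:
    """Hypothetical record of reported acute-study conditions (illustrative only)."""
    animals_per_sex_per_dose: int
    both_sexes_used: bool
    observation_days: int
    dose_levels: int
    purity_reported: bool


def unmet_ideal_conditions(study: AcuteStudy) -> list[str]:
    """Return the ideal general conditions (from the list above) the study does not meet."""
    checks = [
        (study.animals_per_sex_per_dose >= 5,
         "fewer than 5 animals per sex per dose group"),
        (study.both_sexes_used,
         "only one sex tested"),
        (study.observation_days >= 14,
         "observation period shorter than 14 days"),
        (study.dose_levels >= 3,
         "fewer than 3 dose levels"),
        (study.purity_reported,
         "purity or grade of test material not reported"),
    ]
    # Keep only the messages whose condition failed.
    return [message for passed, message in checks if not passed]


# Example: an older study with a 7-day observation period and no purity data.
example = AcuteStudy(
    animals_per_sex_per_dose=5,
    both_sexes_used=True,
    observation_days=7,
    dose_levels=3,
    purity_reported=False,
)
flags = unmet_ideal_conditions(example)
```

A reviewer would weigh each flagged item against the rest of the study report before deciding whether the data remain usable.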
Although the conditions listed above would be included in an ideally conducted study, not all of them need to be met for a study to be considered adequately conducted. Therefore, some discretion is required on the part of the individual reviewing these studies (USEPA, 1985; OECD, 1987).

C.2 SHORT-TERM TOXICITY STUDIES (14-DAY OR 28-DAY REPEATED DOSE TOXICITY)

Short-term exposure generally refers to multiple or continuous exposure usually occurring over a 14-day to 28-day time period. The purpose of short-term repeated dose studies is to provide information on possible adverse health effects from repeated exposures over a limited time period. The following guidelines were derived using the OECD Guidelines for Testing of Chemicals (OECD, 1987) for determining the design and quality of a repeated dose short-term toxicity study:

• Minimum of 3 dose levels administered and an adequate control group used.
• Minimum of 10 animals per sex, per dose group (both sexes should be used).
• The highest dose level should ideally elicit some signs of toxicity without inducing excessive lethality and the lowest dose should ideally produce no signs of toxicity.
• Ideal dosing regimens include 7 days per week for a period of 14 days or 28 days.
• All animals should be dosed by the same method during the entire experimental period.
• Animals should be observed daily for signs of toxicity during the treatment period (i.e., 14 or 28 days).
• Animals that die during the study are necropsied and all survivors in the treatment groups are sacrificed and necropsied at the end of the study period.
• All observed results, quantitative and incidental, should be evaluated by an appropriate statistical method.
• Clinical examinations should include hematology and clinical biochemistry; urinalysis may be required when expected to provide an indication of toxicity.
• Pathological examination should include gross necropsy and histopathology.
The findings of short-term repeated dose toxicity studies should be considered in terms of the observed toxic effects and the necropsy and histopathological findings. The evaluation will include the incidence and severity of abnormalities, gross lesions, body weight changes, effects on mortality, and other general or specific toxic effects (OECD, 1987).

These guidelines represent ideal conditions, and studies will not be expected to meet all standards in order to be considered adequate. For example, the National Toxicology Program's cancer bioassay program has generated a substantial database of short-term repeated dose studies. The study periods for these range from 14 days to 20 days, with 12 to 15 doses administered, generally for 5 dose levels and a control. Because the quality of these data is good, it is desirable to consider these study results even though they do not always follow the protocol exactly.

C.3 SUBCHRONIC AND CHRONIC TOXICITY

Studies involving subchronic exposure (occurring usually over 3 months) and chronic exposure (those involving an extended period of time, or a significant fraction of the subject's lifetime) are designed to permit a determination of no-observed-effect levels (NOEL) and toxic effects associated with continuous or repeated exposure to a chemical. Subchronic studies provide information on health hazards likely to arise from repeated exposure over a limited period of time. They provide information on target organs and the possibilities of accumulation and, with the appropriate uncertainty factors, may be used in establishing water quality criteria for human health. Chronic studies provide information on potential effects following prolonged and repeated exposure. Such effects might require a long latency period or be cumulative in nature before disease becomes manifest. The design and conduct of such tests should allow for detection of general
toxic effects including neurological, physiological, biochemical, and hematological effects and exposure-related pathological effects.

The following guidelines were derived using the EPA Health Effects Testing Guidelines (USEPA, 1985) for determining the quality of a subchronic or chronic (long-term) study. Additional detailed guidance may be found in that document. These guidelines represent ideal conditions, and studies will not be expected to meet all standards in order to be considered for use as the basis for RfD derivation. Ideally, a subchronic/chronic study should include:

• Minimum of 3 dose levels administered and an adequate control group used.
• Minimum of 10 animals for subchronic studies and 20 animals for chronic studies, per sex, per dose group (both sexes should be used).
• The highest dose level should elicit some signs of toxicity without inducing excessive lethality and the lowest dose should ideally produce no signs of toxicity.
• Ideal dosing regimens include dosing for 5-7 days per week for at least 13 weeks (90 days) for subchronic studies, and for at least 12 months for chronic studies, in rodents. For other species, repeated dosing should ideally occur over at least 10 percent of the animal's lifespan for subchronic studies and at least 50 percent of the animal's lifespan for chronic studies.
• All animals should be dosed by the same method during the entire experimental period.
• Animals should be observed daily during the treatment period (i.e., 90 days or greater).
• Animals that die during the study are necropsied and, at the conclusion of the study, surviving animals are sacrificed and necropsied and appropriate histopathological examinations carried out.
• Results should be evaluated by an appropriate statistical method selected during experimental design.
Such toxicity tests should evaluate the relationship between the dose of the test substance and the presence, incidence and severity of abnormalities (including behavioral and clinical abnormalities), gross lesions, identified target organs, body weight changes, effects on mortality, and any other toxic effects noted in USEPA (1985).

C.4 DEVELOPMENTAL TOXICITY

Guidelines for reproductive and developmental toxicity studies have been developed by EPA (USEPA, 1985; OECD, 1987). Developmental toxicity can be evaluated via a relatively short-term study in which the compound is administered during the period of organogenesis. Based on the EPA Health Effects Testing Guidelines (USEPA, 1985), ideal studies should include:

• Minimum of 20 young adult pregnant rats, mice, or hamsters, or 12 young adult pregnant rabbits, recommended per dose group.
• Minimum of 3 dose levels with an adequate control group used.
• The highest dose should induce some slight maternal toxicity but no more than 10 percent mortality. The lowest dose should not produce grossly observable effects in dams or fetuses. The middle dose level, in an ideal situation, will produce minimal observable toxic effects.
• Dose period should cover the major period of organogenesis (days 6 to 15 of gestation for rat and mouse, 6 to 14 for hamster, and 6 to 18 for rabbit).
• Dams should be observed daily; weekly food consumption and body weight measurements should be taken.
• Necropsy should include both gross and microscopic examination of the dams; the uterus should be examined so that the number of embryonic or fetal deaths and the number of viable fetuses can be counted; fetuses should be weighed.
• One-third to one-half of each litter should be prepared and examined for skeletal anomalies and the remaining animals prepared and examined for soft tissue anomalies.

As with any other type of study, the appropriate statistical analyses must be performed on the data for a study to qualify as a good quality study.
In addition, developmental studies are unique in the sense that they yield two potential experimental units for statistical analysis, the litter and the individual fetus. The EPA testing guidelines do not provide any recommendation on which unit to use, but the Guidelines for Developmental Toxicity Risk Assessment (USEPA, 1991) state that "since the litter is generally considered the experimental unit in most developmental toxicity studies..., the statistical analyses should be designed to analyze the relevant data based on incidence per litter or on the number of litters with a particular endpoint." Others have also identified the litter as the preferred experimental unit (Palmer, 1981; Madson et al., 1982).

Information on maternal toxicity is very important when evaluating developmental effects because it helps determine if differential susceptibility exists for the offspring and mothers. Since the conceptus relies on its mother for certain physiological processes, interruption of maternal homeostasis could result in abnormal prenatal development. Substances that affect prenatal development without compromising the dam are considered to be a greater developmental hazard than chemicals that cause developmental effects only at maternally toxic doses. Unfortunately, maternal toxicity information was not routinely presented in earlier studies; reporting it has become standard practice only recently. In an attempt to use whatever data are available, maternal toxicity information may not be required if developmental effects are serious enough to warrant consideration regardless of the presence of maternal toxicity.

C.5 REPRODUCTIVE TOXICITY

The EPA Health Effects Testing Guidelines (USEPA, 1985) include guidelines for both reproduction and fertility studies and developmental studies. These EPA guidelines can serve as the ideal experimental situation with which to compare study quality.
Studies being evaluated do not need to match precisely but rather should be similar enough that one can be assured that the chemical was adequately tested and that the results are a reliable estimate of the true reproductive or developmental toxicity of the chemical.

These guidelines also recommend a two-generation reproduction study to provide information on the ability of a chemical to impact gonadal function, conception, parturition and the growth and development of the offspring. Additional information concerning the effects of a test compound on neonatal morbidity, mortality, and developmental toxicity may also be provided. The recommendations for reproductive testing are lengthy and quite detailed and may be reviewed further in the EPA Health Effects Testing Guidelines. In general, the test compound is administered to the parental (P) animals (at least 20 males and enough females to yield 20 pregnant females) beginning at least 10 weeks before mating and continuing through the resulting pregnancies and through weaning of their offspring (F1 or first generation). The compound is then administered to the F1 generation similarly through the production of the second generation (or F2) offspring until weaning.

Recommendations for numbers of dose groups and dose levels are similar to those reported for developmental studies. Details should also be provided on mating procedures, standardization of litter sizes (if possible, 4 males and 4 females from each litter are randomly selected), observation, gross necropsy and histopathology. Full histopathology is recommended on the following organs of all high-dose and control P and F1 animals used in mating: vagina, uterus, testes, epididymides, seminal vesicles, prostate, pituitary gland, and target organs. Organs of animals from other dose groups should be examined when pathology has been demonstrated in high-dose animals (USEPA, 1985).

C.6 REFERENCES

Madson, J.M. et al. 1982. Teratology test methods for laboratory animals.
In: Principles and Methods of Toxicology. Hayes, A.W. (ed). Raven Press. New York, NY.

OECD (Organization for Economic Cooperation and Development). 1987. Guidelines for Testing of Chemicals. Paris, France.

Palmer, A.K. 1981. Regulatory requirements for reproductive toxicology: Theory and practice. In: Developmental Toxicology. Kimmel, C.A. and J. Buelke-Sam (eds). Raven Press. New York, NY.

USEPA (U.S. Environmental Protection Agency). 1985. Health Effects Testing Guidelines. 40 CFR Part 798.

USEPA (U.S. Environmental Protection Agency). 1991. Final guidelines for developmental toxicity risk assessment. Federal Register 56:63798-63826. December 5.