EPA/635/R-00/005F
September 2000

EPA Summary Report

Characterization of Data Variability and Uncertainty: Health Effects Assessments in the Integrated Risk Information System (IRIS)

In response to Congress, HR 106-379

National Center for Environmental Assessment
Office of Research and Development
US Environmental Protection Agency
Washington, DC

DISCLAIMER

This document has been reviewed in accordance with U.S. Environmental Protection Agency policy. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

TABLE OF CONTENTS

EXECUTIVE SUMMARY
1. INTRODUCTION AND PURPOSE
2. BACKGROUND
2.1. Hazard and Dose-Response Assessment
2.2. IRIS Program and Data Base
2.3. Uncertainty and Variability
3. EVALUATION APPROACH
3.1. Protocol Development
3.2. Screening Evaluation
3.3. In-Depth Evaluation
4. SUMMARY OF RESULTS
4.1. Screening Evaluation
4.2. In-Depth Evaluation
5. DISCUSSION
6. CONCLUSIONS
7. REFERENCES

ATTACHMENTS
• EPA Screening Evaluation Report: Presentation and Discussion of Uncertainty and Variability in IRIS Assessments
• Versar Report: Characterization of Data Uncertainty and Variability in IRIS Assessments, Pre-Pilot vs. Pilot/post-Pilot
• Appendix A of Versar Report: Individual reports of experts assembled by Versar, Inc.

EXECUTIVE SUMMARY

In response to a Congressional directive contained in HR 106-379 regarding EPA's appropriations for FY2000, EPA has undertaken an evaluation of the characterization of data variability and uncertainty in its Integrated Risk Information System (IRIS) health effects information database. Through consultation with EPA's Science Advisory Board, EPA developed and implemented a systematic plan to select a representative sample of chemical assessments in IRIS to be evaluated in-depth by an independent panel of experts for the extent to which EPA has documented uncertainty and variability.
EPA conducted a screening evaluation on 10 percent of the IRIS summaries of chemical assessments completed during the period of 1988-1994 (52 of 522 pre-Pilot assessments) and all 15 Pilot/post-Pilot IRIS summaries and Toxicological Reviews (completed after 1995) for overall documentation of data variability and uncertainty. An EPA contractor then selected 16 assessments (IRIS summaries and support documents) for in-depth examination from the screening sample (8 of 52 pre-Pilot and 8 of 15 Pilot/post-Pilot). The contractor selected six independent experts (outside EPA) in the field of human health risk assessment, who performed this in-depth review.

In general, the outside experts concluded that the characterization of data variability and uncertainty varied across the assessments they reviewed. While the documentation of data variability and uncertainty has generally improved since the IRIS Pilot's introduction of Toxicological Reviews to substantiate IRIS summaries, the reviewers found that the quality of the characterization of data variability and uncertainty varied among the Pilot/post-Pilot assessments. The reviewers also suggested ways to describe uncertainty and variability, and a number of scientific improvements, especially the need to update older assessments with more recent scientific data and risk assessment methods. This study supports EPA's goal to make the scientific bases for risk assessment conclusions more transparent. EPA will continue to look into ways to improve the characterization and documentation of data variability and uncertainty in future IRIS assessments.

Note: This report reflects the review and comments of the Environmental Health Committee of EPA's Science Advisory Board, as discussed publicly August 30, 2000, and documented in their final report to the EPA Administrator, dated September 26, 2000. (See http://www.epa.gov/sciencel/drrep.htm.)

1.
INTRODUCTION AND PURPOSE

The Integrated Risk Information System (IRIS) data base contains EPA's consensus scientific positions on potential adverse human health effects that may result from chronic exposure to specific chemical substances in the environment. As of January 31, 2000, the IRIS data base contained 537 chemical-specific assessments. IRIS is widely used by regulatory programs and risk assessors at all levels of government and by the public. First publicly available in 1988, these assessments provide the summary results of EPA deliberations culminating in consensus hazard and dose-response conclusions for cancer and noncancer health effects. Since 1995 (when the "IRIS Pilot" program was undertaken), EPA has taken several steps to ensure that the best available scientific information is included in chemical assessments made available on IRIS, including improvements in the documentation of scientific decisions and external peer reviews of all subsequent assessments.

Regarding IRIS, Congress issued the following directive, which was contained in the October 1999 report from Congress (HR 106-379) regarding EPA's appropriations for FY2000:

"The conferees are concerned about the accuracy of information contained in the Integrated Risk Information system [IRIS] data base which contains health effects information on more than 500 chemicals. The conferees direct the Agency to consult with the Science Advisory Board (SAB) on the design of a study that will a) examine a representative sample of IRIS health assessments completed before the IRIS Pilot Project, as well as a representative sample of assessments completed under the project and b) assess the extent to which these assessments document the range of uncertainty and variability of the data. The results of that study will be reviewed by the SAB and a copy of the study and the SAB's report on the study sent to the Congress within one year of enactment of this Act."
In response to the Congressional directive, EPA has undertaken an evaluation of the characterization of data variability and uncertainty in IRIS assessments. This report addresses Congress's directive. Section 2 of the report provides background information about EPA's approaches to health hazard and dose-response assessments, and describes the IRIS program and the kinds of health information available in IRIS. It also discusses the sources of scientific uncertainty and variability related to the risk assessment process, and defines these terms in the context of the purpose of this EPA study, i.e., characterization of data variability and uncertainty of chemical assessments in IRIS. Section 3 describes the study protocol, and the summary findings of the study are provided in section 4. Details of the study protocol and results can be found in the three attachments. Discussion of study results, study conclusions, and references are provided in sections 5, 6, and 7, respectively.

2. BACKGROUND

Risk assessment is the process EPA uses to identify and characterize environmentally related human health problems. As defined by the National Academy of Sciences (NAS, 1983), risk assessment entails the evaluation of all pertinent scientific information to describe the likelihood, nature, and extent of harm to human health as a result of exposure to environmental contaminants. EPA has used the basic NAS paradigm as a foundation for its published risk assessment guidance, and as an organizing system for many individual environmental chemical assessments. There are four components to every complete risk assessment: hazard assessment, dose-response assessment, exposure assessment, and risk characterization. Hazard assessment describes qualitatively the likelihood that an environmental agent can produce adverse health effects under certain environmental exposure conditions.
Dose-response assessment quantitatively estimates the relationship between the magnitude of exposure and the degree and/or probability of occurrence of a particular health effect. Exposure assessment determines the extent of human exposure. Risk characterization integrates the findings of the first three components to describe the nature and magnitude of health risk associated with environmental exposure to a chemical substance or a mixture of substances.

There are many uncertainties associated with environmental risk assessments due to the complexity of the exposure-dose-effect relationship, and the lack of, or incomplete, knowledge about the physical, chemical, and biological processes that connect human exposure to an environmental substance(s) with health effects. Major sources of uncertainty include the use of a wide range of data from many different disciplines (e.g., epidemiology, toxicology, biology, chemistry, statistics); the use of many different predictive models and methods in lieu of actual measured data; and the use of many scientific assumptions and science policy choices, i.e., scientific positions assumed in lieu of scientific data, in order to bridge the information and knowledge gaps in the environmental risk assessment process. These diverse elements, along with varying interpretations of the scientific information, can produce divergent results in the risk assessment process, an outcome that leads to risk assessment controversies. Thus, EPA risk assessment guidelines stress the importance of identifying uncertainties and variability and presenting them as part of risk characterization.

Over the years, EPA has conducted health hazard and dose-response assessments for many environmental chemical contaminants. The summary findings and outcomes of these assessments, which represent scientific consensus positions across the Agency, are made available in the IRIS data base.
Information on IRIS can be used with an exposure assessment for a specific exposure scenario to perform a complete risk assessment. The following sections provide an overview of EPA's historical and current approaches to health hazard and dose-response assessments, describe EPA's IRIS program and the kinds of information available in IRIS, and define variability and uncertainty in the context of hazard and dose-response assessments and available information in IRIS.

2.1. Hazard and Dose-Response Assessment

In general, chemicals often affect more than one organ or system of the body (e.g., liver, kidney, nervous system) and can produce a variety of health endpoints (e.g., cancer, respiratory allergies, infertility), depending on the conditions of exposure such as the amount, frequency, duration, and route of exposure (i.e., ingestion, inhalation, dermal contact). For most environmental chemicals, available health effects information is generally limited to high exposures in studies of humans (e.g., occupational studies of workers) or laboratory animals. Thus, evaluation of potential health effects associated with the low levels of exposure generally encountered in the environment involves inferences based on the understanding of the mechanisms of chemical-induced toxicities. Mechanism of action is defined as the complete sequence of biological events that must occur to produce an adverse effect. In cases where only partial information is available, the term mode of action is used to describe only the major (but not all) biological events that are judged sufficient to inform about the shape of the dose-response curve beyond the range of observation. For effects that involve the alteration of genetic material (e.g., most cancers, heritable mutations), there are theoretical reasons to believe that such a mode of action would not show a threshold, or dose below which there are no effects.
On the other hand, a threshold is widely accepted for most other health effects, based on considerations of compensatory homeostasis and adaptive mechanisms. The threshold concept presumes that a range of exposures from zero to some finite value can be tolerated by an individual without adverse effects. Accordingly, different approaches have traditionally been used to evaluate potential carcinogenic effects and health effects other than cancer, referred to as "noncancer" effects.

Carcinogenic Effects

Cancer hazard assessment involves a qualitative weight-of-evidence evaluation of potential human carcinogenicity. This evaluation is a synthesis of all pertinent information addressing the question of "How likely an agent is to be a human carcinogen." The EPA's 1986 Guidelines for Carcinogen Risk Assessment (USEPA, 1986) provide a classification system for the characterization of the overall weight of evidence for potential human carcinogenicity based on human evidence, animal evidence, and other supportive data. The EPA's 1996 Proposed Guidelines for Carcinogen Risk Assessment (USEPA, 1996a) and the subsequent revised external review draft (USEPA, 1999) emphasize the need for characterizing cancer hazard in addition to hazard identification. Accordingly, the question to be addressed in hazard characterization is expanded to "How likely an agent is to be a human carcinogen, and under what exposure conditions a cancer hazard may be expressed." In addition, the revised guidelines stress the importance of considering information on the agent's mode(s) of action when making an inference about potential cancer hazard beyond the range of observation. To express the weight of evidence for potential human carcinogenicity, the EPA's proposed revised guidelines emphasize using a hazard narrative in place of the classification system.
However, in order to provide some measure of consistency, standard hazard descriptors are used as part of the hazard narrative to express the conclusion regarding the weight of evidence for potential human carcinogenicity.

Dose-response assessment for carcinogenic effects usually involves the use of a linear extrapolation model(s) to estimate an upper bound on cancer risk at a given low level of exposure. The linear low-dose extrapolation approach is considered appropriate for cases where there is insufficient understanding of the mode of action, or when available data indicate a linear dose-response curve at low doses but there are not enough data to allow the development of biologically based dose-response models. This risk estimate is known as the cancer unit risk for inhalation exposure and the slope factor for oral exposure. It is recognized that such an estimate may not give a realistic prediction of risk, and the true value of the risk may be as low as zero. However, the use of such models puts a ceiling on what the risk might be. When there is sufficient evidence for a nonlinear mode of action, but not enough data to construct a biologically based model for the relationship, EPA's proposed revised cancer guidelines (USEPA, 1996a) call for the use of a margin of exposure analysis as a default procedure. A margin of exposure analysis compares the point of departure (i.e., the lower 95% confidence limit of the dose or exposure associated with a 10% risk of cancer or precursor effects) with the dose associated with the environmental exposure(s) of interest, and determines whether or not the exposure margins are adequate. Both default approaches may be used for a specific cancer assessment if it is mediated by multiple modes of action, which may include linear and nonlinear modes of action.
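As an illustration only, the two default quantitative approaches described above can be reduced to simple arithmetic: linear extrapolation multiplies a unit risk (or slope factor) by the exposure level, while a margin of exposure analysis divides the point of departure by the environmental dose. The sketch below uses entirely hypothetical numbers; the function names and values are the author's illustrative assumptions and are not drawn from any IRIS assessment.

```python
# Illustrative sketch of the two default dose-response approaches.
# All numeric values are hypothetical, chosen only to show the arithmetic.

def linear_upper_bound_risk(unit_risk_per_ug_m3: float,
                            air_conc_ug_m3: float) -> float:
    """Linear low-dose extrapolation: upper-bound excess cancer risk
    is the inhalation unit risk times the air concentration."""
    return unit_risk_per_ug_m3 * air_conc_ug_m3

def margin_of_exposure(point_of_departure: float,
                       environmental_dose: float) -> float:
    """Margin of exposure: ratio of the point of departure (e.g., the
    lower confidence limit on the dose for a 10% response) to the dose
    from the environmental exposure of interest."""
    return point_of_departure / environmental_dose

# A hypothetical unit risk of 2e-6 per ug/m3 at an air concentration
# of 0.5 ug/m3 gives an upper-bound risk of 1e-6 (one in a million).
risk = linear_upper_bound_risk(2e-6, 0.5)

# A hypothetical point of departure of 10 mg/kg-day against an
# environmental dose of 0.001 mg/kg-day gives a margin of 10,000.
moe = margin_of_exposure(10.0, 0.001)
```

Whether a computed margin is "adequate" is a separate judgment, made case by case in light of the mode of action and the uncertainties involved, and is not captured by the ratio itself.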
Noncancer Effects

The Agency has published several guidelines for assessing specific noncancer health endpoints, including developmental toxicity, reproductive toxicity, and neurotoxicity (USEPA, 1991, 1996b, and 1998, respectively). Like the cancer guidelines, these guidelines set forth principles and procedures to guide EPA scientists in the interpretation of epidemiologic, toxicologic, and mechanistic studies to make inferences about the potential hazard of these specific health endpoints. Following a review and evaluation of the spectrum of potential health effects associated with the chemical of interest (i.e., hazard identification), a dose-response assessment is then performed on the "critical effect" (i.e., the adverse effect, or its known precursor, that occurs at the lowest dose) to derive a chronic reference dose (RfD) or reference concentration (RfC) for oral and inhalation exposure, respectively. The RfD/RfC is defined as "an estimate (with uncertainty spanning perhaps an order of magnitude) of a continuous oral/inhalation exposure to the human population (including sensitive subgroups) that is likely to be without an appreciable risk of deleterious noncancer effects during a lifetime" (Barnes and Dourson, 1988; USEPA, 1994a). The RfD/RfC approach assumes that if exposure can be limited so that the critical effect does not occur, then no other noncancer effects will occur. Thus, this approach fulfills the needs of EPA's various regulatory programs for defining an exposure level(s) below which there is negligible risk of adverse noncancer health effects.

2.2. IRIS Program and Data Base

The IRIS database was created in 1986 as a mechanism for developing consistent intra-Agency consensus positions on potential health effects of chemical substances.
EPA Program Offices and Regions were regulating some of the same substances, and determined that in many cases the Agency needed to use consistent scientific judgments on potential health effects in risk-based decisions. Chemical assessments prepared by Program and Regional Offices were peer reviewed by three intra-Agency workgroups (i.e., the RfD, RfC, and Carcinogen Risk Assessment Verification Endeavor, or CRAVE, workgroups) comprising health scientists from across the Agency. Summary results of these consensus assessments were collected and made available on IRIS. Combined with site-specific or national exposure information, the summary health information in IRIS could then be used by risk assessors and other staff to evaluate potential public health risks from environmental contaminants.

Summary information in IRIS consists of three components: the derivation of the oral chronic RfD and inhalation chronic RfC for noncancer critical effects; the cancer classification (and a cancer hazard narrative for the more recent assessments); and quantitative cancer risk estimates. IRIS summaries were originally written for an internal EPA audience. For this reason, IRIS information has focused on the documentation of toxicity values (i.e., RfD, RfC, cancer unit risk, and slope factor) and cancer classification. The bases for these numerical values and evaluative outcomes are provided in an abbreviated and succinct manner. Details of the scientific rationale can be found in supporting documents, and references for these assessment documents and key studies are provided in the bibliography sections. Moreover, it was not considered necessary to articulate every default assumption used in individual chemical assessments, as these assumptions have been explicitly discussed and supported in the Agency's published risk assessment guidance.
It is also important to note that the three components of IRIS information (RfD, RfC, and cancer evaluation) were added to the database at different times, depending on regulatory needs, without an explanation of why other endpoints were not assessed. As external interest in the information on IRIS grew, EPA made the IRIS database publicly available in 1988 via the National Library of Medicine's TOXNET system.

In 1995, EPA undertook the IRIS Pilot Program to evaluate and implement a number of improvements in the documentation of summary information in IRIS and in the scientific peer review process. Individual chemical hazard and dose-response assessments for cancer and noncancer health effects are now provided in a single supporting document known as the IRIS "Toxicological Review" (or an equivalent support document). This procedure was subsequently adopted in response to the need for a more integrated health assessment as harmonized dose-response approaches become available for cancer and noncancer effects. In addition, there has been an increased demand for more transparency in the default assumptions and methods used in these chemical assessments, in response to the Agency policy on risk characterization (USEPA, 1995), as well as for developing and documenting the scientific bases for moving away from default methods (e.g., use of chemical-specific data to replace default values of uncertainty factors). In order to make the scientific quality of the assessments more uniform, an external peer review step was incorporated into the Pilot program for the preparation of each chemical assessment, in response to EPA's Peer Review Policy (USEPA, 1994b). Since 1997, IRIS summaries and accompanying support documents, including a summary of and response to external peer review comments, have been publicly available in full text on the IRIS web site at http://www.epa.gov/iris. The Internet site is now EPA's primary repository for IRIS.
Together they comprise the "IRIS assessment" for a given chemical substance.

The information currently on IRIS represents the state-of-the-science and state-of-the-practice in risk assessment as it existed when each assessment was prepared, often 10 or more years ago. When EPA reassesses older IRIS entries, an opportunity exists to update the science and apply more current methodologies. EPA uses an annual priority-driven approach to determine which chemical substances are most in need of assessment or reassessment. The Office of Research and Development's National Center for Environmental Assessment (NCEA) coordinates the Agency-wide IRIS priority-setting process as part of its broader role of managing the IRIS program. The criteria that drive EPA's priorities are usually Program Offices' and Regions' statutory, regulatory, and programmatic needs. Availability of new scientific information to perform reassessments is also a strong criterion. The determination of the annual IRIS agenda is further modified by the availability of EPA scientific staff with appropriate expertise and other resources in the various IRIS-sponsoring Offices to develop and manage individual assessments. NCEA's IRIS Staff therefore works with other parts of the Agency to refine the compilation of priority needs with consideration of the resources available to accomplish the work. The resulting annual IRIS agenda, published in the Federal Register each winter, therefore reflects both the Agency's priority chemicals for assessment or reassessment and internal commitments to lead the work. Much work will be needed over the coming years in order to update even the highest priority substances. In an effort to improve the pace of the assessment process and leverage resources, EPA is currently evaluating ways to work cooperatively with external parties on assessment development.
Five cooperative efforts are currently in progress, three with private organizations and two with other federal agencies. Others are under consideration. Under a cooperative arrangement, an external party may submit an assessment for EPA's consideration in developing an EPA IRIS document; however, EPA's consensus position must be documented separately. EPA is continuing to look for opportunities to improve the IRIS process and the pace of data base updates.

2.3. Uncertainty and Variability

Because the Congressional language was to address "uncertainty and variability of the data," this report uses an expansive definition of the term "variability." As used in this report, "variability" encompasses any aspect of the risk assessment process that can have varying results, including the potential interpretations of the available data, the availability of different data sets collected under different experimental protocols, and the availability of different models and methods. Several of these would be considered sources of uncertainty under the definitions of variability and uncertainty used by the NRC (1994) and EPA (1992, 1997). These stricter definitions use "variability" to refer to differences attributable to diversity in biological sensitivity or exposure parameters; these differences can be better understood, but not reduced, by further research. "Uncertainty" refers to lack of knowledge about specific factors, parameters, or models, and generally can be reduced through further study. This section summarizes key uncertainties and data variability generally encountered in hazard and dose-response evaluations for cancer and noncancer effects.
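The distinction drawn by the stricter NRC/EPA definitions can be illustrated with a small simulation: variability shows up as a spread in true responses across individuals that persists no matter how much data are collected, while uncertainty shows up as imprecision in an estimated parameter that shrinks as the study grows. The sketch below is purely illustrative; the distribution and all numbers are hypothetical assumptions chosen for the demonstration, not values from any assessment.

```python
# Illustrative contrast between variability and uncertainty, in the
# stricter NRC/EPA sense described above. All numbers are hypothetical.
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Variability: individuals differ in sensitivity. Simulate a population
# whose individual response thresholds are spread around a mean of 100
# with a standard deviation of 15. This spread is a property of the
# population; collecting more data characterizes it better but cannot
# shrink it.
population = [random.gauss(100.0, 15.0) for _ in range(100_000)]
spread = statistics.stdev(population)  # remains near 15

# Uncertainty: lack of knowledge about the population mean. The standard
# error of the estimated mean falls roughly as 1/sqrt(n), so a larger
# study reduces it, while the population spread above stays the same.
se_small = statistics.stdev(population[:10]) / 10 ** 0.5
se_large = statistics.stdev(population) / 100_000 ** 0.5
```

Here further "research" (a larger n) drives `se_large` well below `se_small`, whereas `spread` is essentially unchanged, mirroring the statement that variability can be better understood but not reduced by further study.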
Hazard Assessment

For most chemical substances, for which there are insufficient data in humans, a major uncertainty in the evaluation of potential health effects to humans is the reliance on animal studies at high exposure to predict human response at lower exposure, particularly in the absence of an understanding of how an agent causes the observed toxicologic effects in the animals, and in the face of the varying results frequently obtained with different animal species under different exposure conditions. Even when there are human data, there is uncertainty about the average response at lower exposures, and there is variability in individual response around this average. Therefore, EPA has adopted a number of scientific assumptions as science policy choices in the face of data and knowledge gaps. Major assumptions used in hazard assessment (unless there are data to the contrary) include the following: (a) effects observed in one human population are predictive of other human populations; (b) in the absence of human data, effects seen in laboratory animals are assumed to be relevant to humans, and humans may respond similarly (although not identically) to the most sensitive animal species; and (c) effects seen at high exposure are relevant for evaluation of potential effects at low exposure. These scientific assumptions or science policies have also been articulated further in EPA's peer-reviewed risk assessment guidance documents, as discussed above.

Reference Values for Noncancer Effects

To derive an RfD/RfC for a noncancer critical effect, the common practice is to apply standard "uncertainty factors" (UFs) to the no-observed-adverse-effect level (NOAEL), lowest-observed-adverse-effect level (LOAEL), or benchmark dose/concentration (BMCLx)¹ (USEPA, 1995c). These UFs are used to account for the extrapolation uncertainties (e.g., inter-individual variation, interspecies differences, duration of exposure) and database adequacy.
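The common practice just described amounts to a single division: the reference value is the point of departure divided by the product of the applicable uncertainty factors. The sketch below is a simplified illustration with hypothetical values; the function name and the particular factors are the author's assumptions, not figures from any IRIS assessment.

```python
# Simplified sketch of the standard RfD/RfC derivation described above:
# divide the point of departure (NOAEL, LOAEL, or BMCL) by the product
# of the applicable uncertainty factors. All values are hypothetical.
from math import prod

def reference_dose(point_of_departure_mg_kg_day: float,
                   uncertainty_factors: list[float]) -> float:
    """RfD = point of departure / product of uncertainty factors."""
    return point_of_departure_mg_kg_day / prod(uncertainty_factors)

# A hypothetical NOAEL of 50 mg/kg-day, with a 10-fold factor for
# interspecies extrapolation and a 10-fold factor for inter-individual
# (human) variation, yields an RfD of 0.5 mg/kg-day.
rfd = reference_dose(50.0, [10.0, 10.0])
```

Additional factors (for example, for extrapolating from a LOAEL or from a subchronic study, or for an incomplete database) would simply be appended to the list, each further reducing the reference value.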
A modifying factor (MF) is also used as a judgment factor to account for the confidence in the critical study (or studies) used in the derivation of the RfD/RfC. Replacements for default UFs are used when chemical-specific data are available to modify these standard values; this is known as the "data-derived" approach. Moreover, the use of pharmacokinetic or dosimetry models can obviate the need for a UF to account for differences in toxicokinetics across species.

A number of related factors can lead to significant uncertainty in the RfD/RfC. Among these is the selection of different observed effects as the critical effect, which may vary within and across the available studies. Also significant are the choice of different data sets for the identification of the NOAEL or LOAEL or for benchmark dose analysis, the use of different values for the various UFs, and the additional judgments that affect the MF.

Cancer Risk Estimates

Cancer dose-response assessment generally involves many scientific judgments regarding the selection of different data sets (benign and malignant tumors or their precursor responses) for extrapolation, the choice of the low-dose extrapolation approach based on the interpretation and assessment of the mode of action for the selected tumorigenic response(s), the choice of extrapolation models, the methods used to account for differences in dose across species, and the selection of the point of departure for low-dose extrapolation. Given that many judgments need to be made in the many steps of the assessment process in the face of data variability, along with the use of different science policy choices and default procedures and methods to bridge data and knowledge gaps, it is generally recognized that uncertainty exists in cancer risk estimates.

¹BMCLx is defined as the lower 95% confidence limit of the dose that will result in a level of "x"% response (e.g., BMCL10 is the lower 95% confidence limit of a dose for a 10% increase in a particular response).

3.
EVALUATION APPROACH

The following sections describe the overall approach for this evaluative study and the study protocols for the screening step and the in-depth evaluation of the documentation of data variability and uncertainty of the health information available in IRIS. Details of the study protocols can be found in the attachments (EPA Screening Evaluation Report and Versar In-Depth Report).

3.1. Protocol Development

Following the Congressional directive, EPA consulted with the Executive Committee of EPA's Science Advisory Board (SAB) about a proposed approach to this study. The agreed-upon approach involved assembling a team of independent, qualified individuals, external to EPA, to evaluate a representative set of IRIS assessments for the extent of documentation of variability and uncertainty. The use of external experts would avoid internal bias and the appearance that the IRIS program was "reviewing itself." The assessments would be reviewed simultaneously by multiple evaluators, in order to obtain a range of opinions from experts with a variety of relevant backgrounds. In order to address Congress's point concerning pre-Pilot and Pilot assessments, half of the sample would be drawn from the set of pre-Pilot assessments (completed before 1995) and half from the later assessments.

The SAB supported EPA's overall approach, and recommended a number of enhancements. First, they recommended a tiered approach to selecting a representative sample of assessments, in which a sample of at least 10% of the available assessments would first be screened for their treatment of variability and uncertainty. This screening was to consider broad categories of documentation, and be verified by an independent reviewer. A smaller set of assessments would then be chosen from the screening sample for in-depth review. The SAB also encouraged examining as large a set of assessments in depth as possible.
They felt that three reviews per assessment would provide a sufficient range of opinions, given an adequate range of subject area expertise among the evaluators. This decision made it possible to target a sample of 16 assessments, to be reviewed by a total of six independent evaluators.

3.2. Screening Evaluation

An EPA scientist (IRIS Program Staff) carried out the screening evaluation, which is detailed in the attached EPA report. As recommended by the SAB, a 10% sample of pre-Pilot IRIS assessments (52 of 522) was identified. These, and the 15 Pilot/post-Pilot IRIS assessments completed by January 31, 2000, a total of 67 assessments, were classified into three broad categories of overall documentation: none/minimal, some/moderate, or extensive (see Table 2 of the attached EPA Screening Report). The purpose of the preliminary screening was to survey broadly the extent of documentation of uncertainty and variability of health effects information in IRIS, in order to facilitate an in-depth evaluation of a smaller but representative set of chemical assessments in IRIS. Due to the large volume of pre-Pilot assessment materials (52 sets of an IRIS summary plus supporting EPA Source Document(s)), only the IRIS summaries were examined. For the later IRIS assessments, the IRIS summary and the Toxicological Review were examined. Consequently, this screening addressed only the overall approach to providing information concerning variability and uncertainty in the on-line assessments, not the completeness of the summarized information, nor the cited scientific literature available at the time of each assessment.

The first category, "None/Minimal," describes assessments that presented conclusions, with overall uncertainty and confidence statements, but no incidence rates or other quantitative health effect levels for the available studies (such as percent weight loss), nor any rationale for the confidence statements.
Assessments with "Some or Moderate" documentation contained quantitative effect levels and some discussion of variability of effects, including variability across dose groups. In addition, these assessments contained some discussion of the reasons for overall confidence in the assessment. Assessments with "Extensive" documentation contained quantitative information (such as confidence intervals), some comparison of results across related studies, discussion of sources of uncertainty, comparison of uncertainties across available studies, and rationales for confidence in the available studies and the conclusions drawn in the assessment. A listing of the categorized assessments was provided to the contractor to facilitate choosing the random sample for in-depth evaluation of the treatment of variability and uncertainty. As recommended by the SAB Executive Committee, a second reviewer (an EPA health scientist without routine involvement in preparing or reviewing IRIS assessments) repeated the above evaluative step, without any knowledge of the results of the first round of review. The details of this second evaluation are also provided in the attached EPA Screening Report. 3.3. In-depth evaluation The in-depth evaluation then focused on 16 IRIS assessments, half (8) from the pre-Pilot assessments and the other half from the Pilot/post-Pilot assessments. Within these two subsets, the assessments were randomly selected from the "some/moderate" and "extensive" documentation categories as evenly as possible. The assessments in the "none/minimal" category were not included in this part of the evaluation; it was not clear whether it would be a good use of the experts' effort to review these assessments, as they likely contained limited characterization of uncertainty and variability, at least based on the summary information. EPA's contractor (Versar, Inc.) selected the sample of 16 assessments for in-depth evaluation.
The materials for in-depth review of the pre-Pilot assessments included the IRIS summaries and the supporting EPA Source Document(s) identified in each summary. For the Pilot/post-Pilot assessments, the materials were the IRIS summary and Toxicological Review. The selection process and assessments chosen are provided in the attached Versar report. EPA's contractor assembled and coordinated a set of six independent experts to carry out the review. These experts were selected on the basis of their in-depth knowledge of EPA's human health risk assessment methodologies, familiarity with IRIS, knowledge of current practices for evaluating and documenting uncertainty and variability in data used in health assessments, and expertise in how these factors relate to sensitive subpopulations, including children. They represented a range of professional affiliations and health science backgrounds spanning cancer and noncancer toxic endpoints. The experts evaluated the documentation of uncertainty and variability in the assessments on the basis of the data available at the time each assessment was conducted, focusing on the presentation of the available data and its variability and on the discussion of confidence and uncertainty, including any uncertainty factors applied. The evaluators self-certified that they had not been involved in the development or peer review of the assessments under review for the study, and that they could perform independently, free of conflict of interest. Each evaluator was assigned eight assessments to review, generally evenly divided between pre-Pilot and Pilot/post-Pilot assessments. Each chemical assessment was independently reviewed by three evaluators. The evaluators and their assigned assessments are listed in Table 2-6 of the attached Versar report.
The evaluators were asked to answer the following questions: • Considering the data available at the time each assessment was performed, and the EPA guidelines and methodologies operative at the time of the assessment, did EPA characterize to an appropriate extent the uncertainty and variability in data used to develop these IRIS health assessments? How does this compare between pre-Pilot and Pilot/post-Pilot assessments? • Did EPA appropriately address the strengths and weaknesses of the scientific evidence from available studies, and sources of variability in the data used in each assessment? • Did EPA appropriately address the uncertainties in the underlying data, and uncertainties in the qualitative and quantitative judgments given in each assessment? The evaluators were also encouraged to raise other relevant observations or comments. ------- 4. SUMMARY OF RESULTS The summary findings of the screening and in-depth evaluations are provided below. Details of review results can be found in the attached EPA report (screening evaluation) and Versar report (overall summary of in-depth review and Appendix A, containing individual reviewers' findings). 4.1. Screening Evaluation The results of the screening evaluation of the 52 pre-Pilot IRIS summaries by the first EPA reviewer were as follows: 3 of 52 had extensive, 16 of 52 some or moderate, and 33 of 52 none or minimal presentation or discussion of variability and uncertainty. Nearly all of the Pilot/post-Pilot assessments (14/15) showed extensive documentation of variability and uncertainty in the IRIS summary and Toxicological Review. It should be noted that a proper comparison between the two groups of assessments (pre-Pilot versus Pilot/post-Pilot) cannot be made, as it would require an evaluation of a comparable set of assessment documentation (the source documents for the pre-Pilot assessments were not evaluated in the screening phase).
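Because the screening categories are ordinal (1 = none/minimal, 2 = some/moderate, 3 = extensive), agreement between two reviewers rating the same assessments can be summarized with a Spearman rank correlation, the statistic reported for the verification step discussed below. The following pure-Python sketch shows how such a coefficient is computed; the reviewer ratings are invented for illustration and are not the actual screening data.

```python
# Hypothetical illustration: Spearman rank correlation between two reviewers'
# ordinal ratings of the same assessments (1 = none/minimal, 2 = some/moderate,
# 3 = extensive). The ratings below are invented, not the actual screening data.

def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    indexed = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(indexed):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(indexed) and values[indexed[j + 1]] == values[indexed[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[indexed[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

reviewer_1 = [1, 1, 2, 3, 2, 1, 3, 2, 1, 2]  # hypothetical category codes
reviewer_2 = [1, 2, 2, 3, 1, 1, 3, 2, 1, 3]
print(round(spearman(reviewer_1, reviewer_2), 2))
```

A coefficient near 1 indicates that the two reviewers ordered the assessments nearly identically; with only three categories, ties are pervasive, which is why tied values receive the average of their rank positions.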
The independent verification of the screening evaluation by a second EPA reviewer produced similar results (see attached EPA Screening Report, Table 5), with a Spearman rank correlation coefficient of 0.82. For 15 assessments, the two reviewers' ratings differed by one category. Given the valuable input from the verification step, it is reasonable to consider the results of the two rankings together. Among the 52 pre-Pilot summaries, then, approximately two-thirds (63-79%) contained none to minimal documentation of variability and uncertainty information. Almost all (93-100%) of the assessments carried out after 1995 demonstrated extensive documentation of variability and uncertainty information. 4.2. In-Depth Evaluation The report of the in-depth evaluation (attached Versar Report) summarizes the collective findings and conclusions of the six evaluators in responding to EPA's questions. The evaluators' individual reports are provided in Appendix A of the Versar report. The primary conclusions for each question are summarized below. Considering the data available at the time each assessment was performed, and the EPA guidelines and methodologies operative at the time of the assessment, did EPA characterize to an appropriate extent the uncertainty and variability in data used to develop these IRIS health assessments? How does this compare between pre-Pilot and Pilot/post-Pilot assessments? As described above, six independent evaluators examined, in-depth, a sample of 16 IRIS assessments which had been found to have either a "some/moderate" or "extensive" degree of documentation of variability and uncertainty in the screening evaluation. Each chemical assessment, consisting of an IRIS summary and any supporting document(s), was reviewed by three independent evaluators. There was a range of opinions among the reviewers concerning the adequacy of documentation of data variability and uncertainty for the individual assessments.
This range extended from two assessments (pre-Pilot assessments from 1988 and 1990) considered by all three reviewers to have been inadequately characterized, to one assessment (a post-Pilot assessment from 1998) unanimously considered to demonstrate thoroughly adequate documentation. The evaluations for each of the other 13 assessments were not unanimous but were still informative (see Versar report, Table 3-2). These evaluations are discussed further below. The evaluators generally concluded that the pre-Pilot IRIS summaries provided limited information on uncertainty and variability, although this was consistent with the practice at the time. Further, a number of evaluators felt that pre-Pilot assessments often did not utilize existing human data to interpret the relevance of toxic effects in animals to humans, even when the human data seemed to support the consideration of other toxic endpoints. Some noted that route-to-route extrapolation, for both cancer and noncancer effects, was routinely carried out without any apparent scientific justification. Despite these shortcomings, the evaluators did point out that two (1,2-dibromo-3-chloropropane and manganese) of the eight pre-Pilot summaries were especially well characterized regarding uncertainty and variability (see Versar report, Section 4), when judged according to practices standard at the time. The evaluators noted that the Pilot/post-Pilot IRIS summaries typically presented more information than the pre-Pilot summaries, but at the same time varied in quality. More specifically, they concluded that some Pilot/post-Pilot summaries contained little discussion of variability and uncertainty, while others were distinctly more comprehensive than the pre-Pilot assessments.
The more comprehensive assessments included more description and better discussion of data gaps and of endpoints such as reproductive/developmental or neurological effects, as well as physicochemical information relevant to pharmacokinetics and toxicity and more complete synopses of the conclusions for each supporting study. The best Pilot/post-Pilot assessments contained a more comprehensive discussion of the mechanism of action, the relevance of the critical effect to humans, or the impact of pharmacokinetic or metabolic information on interspecies variability. Two of these better assessments (ethylene glycol monobutyl ether and methyl methacrylate) were highlighted for using this additional information to adjust uncertainty factors away from the default values. The evaluators appreciated the availability of the Toxicological Review documents that accompany the IRIS summaries on the IRIS website. Did EPA appropriately address the strengths and weaknesses of the scientific evidence from available studies, and sources of variability in the data used in the assessment? The evaluators concluded that the strengths and weaknesses of the scientific evidence from available studies were not as thoroughly addressed in the earlier IRIS assessments as in the later assessments. Only one of the eight pre-Pilot assessments was found to have appropriately addressed all of the substantive studies available at the time of the assessment. On the other hand, the evaluators found that six of the eight Pilot/post-Pilot assessments appropriately addressed the strengths and weaknesses of the substantive studies available at the time of the assessments (see Versar report, section 3). Did EPA appropriately address the uncertainties in the underlying data, and uncertainties in the qualitative and quantitative judgments given in the assessment?
In addition to verifying whether the standard uncertainty factors of the time were applied appropriately to develop the RfD/RfC provided, the evaluators determined whether additional issues contributing to variability and uncertainty had been considered, such as mechanism of action, variations in species susceptibility, the potential existence of sensitive subpopulations, the relevance of the dosing regimen to likely human exposure pathways, and the relevance of the critical effect to humans. The evaluators found that these latter issues tended not to be addressed in the pre-Pilot summaries, with the exception of two (1,2-dibromo-3-chloropropane and manganese). The evaluators raised similar concerns about the Pilot/post-Pilot summaries with respect to these issues. Except for one assessment (methyl methacrylate), for which there was full agreement that the uncertainties of the assessment had been adequately addressed, there was a range of opinions on the other seven Pilot/post-Pilot IRIS summaries. That is, there was usually at least one evaluator who was dissatisfied with these summaries, on the basis of the lack of coverage of these more advanced scientific issues. Reviewers' Recommendations In addition to responding to the three questions above, there were some general themes in the evaluators' individual recommendations for improving IRIS assessments. First, the reviewers recommended development of a standardized approach to handling variability and uncertainty in IRIS assessments. It was also recommended that data quality issues be clarified in IRIS assessments. Specifically, toxicological experiments carried out before the advent of Good Laboratory Practices (GLPs) should be earmarked as such, since there could be more uncertainty attached to data generated before this standardization was implemented. Also, data from unpublished or non-peer-reviewed sources could carry similar uncertainties.
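The RfD/RfC derivations the evaluators examined follow a simple arithmetic scheme: a point of departure such as a NOAEL is divided by the product of the applicable uncertainty factors. The sketch below illustrates that scheme only; the NOAEL and factor values are hypothetical and are not drawn from any IRIS assessment.

```python
# Illustrative sketch (not an actual IRIS derivation): an oral RfD is
# conventionally computed as NOAEL / (product of uncertainty factors),
# where each factor addresses one source of uncertainty. All numeric
# values below are hypothetical.

from functools import reduce

def reference_dose(noael_mg_kg_day, uncertainty_factors):
    """Divide the NOAEL by the composite (multiplied-out) uncertainty factor."""
    composite_uf = reduce(lambda a, b: a * b, uncertainty_factors, 1)
    return noael_mg_kg_day / composite_uf

# Hypothetical example: NOAEL of 50 mg/kg-day from a chronic animal study,
# with default 10x factors for animal-to-human extrapolation and for
# variability among humans (sensitive subpopulations).
ufs = [10,  # interspecies (animal-to-human) extrapolation
       10]  # intraspecies (human variability)
rfd = reference_dose(50, ufs)
print(f"RfD = {rfd} mg/kg-day")  # 50 / 100 = 0.5
```

Replacing one of the default factors with a smaller data-derived value, as highlighted for the ethylene glycol monobutyl ether and methyl methacrylate assessments, simply changes one multiplicand in the composite factor.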
The evaluators also emphasized that there did not appear to be enough consideration of the relevance of specific findings in animals to humans, both in the choice of critical effects and in exposure conditions. They also felt that the presumption that humans are more sensitive to environmental toxicants required more justification and discussion in most assessments. In their individual reports (Appendix A of the Versar report), the evaluators made specific recommendations for improving the assessments they reviewed. These recommendations generally addressed inclusion of more recent scientific information (such as mode of action or discussion of concordance of animal and human health endpoints) and pointed out instances where these data might support the use of more recently developed risk assessment methods (e.g., benchmark dose, quantitative uncertainty analysis). 5. DISCUSSION The characterization of the extent of documentation of variability and uncertainty in chemical assessments in IRIS was accomplished using a tiered strategy: first by screening for the degree of this documentation in broad terms in a random sample, then in-depth in a smaller, targeted subsample. The representativeness of the in-depth evaluations for characterizing the rest of the database, first for the pre-Pilot IRIS assessments and then for the later IRIS assessments, is discussed below. The screening evaluation of 10% of the pre-Pilot IRIS database provided a baseline for characterizing the IRIS database. Recall that about two-thirds (63-79%) of the sample of pre-Pilot IRIS summaries were found to have none to minimal documentation of variability and uncertainty (see section 4.1 above). Given the subjective nature of this evaluation, the additional review and consensus-building necessary to narrow this estimate did not appear warranted.
Thus, it was concluded that approximately one-third (21-37%) of the pre-Pilot IRIS summaries demonstrated at least some documentation of the variability and uncertainties involved in deriving the toxicity values provided. There was reasonable concordance for the pre-Pilot assessments between the screening evaluation and the in-depth review, given the different purposes of the two steps of the overall evaluation. In particular, two assessments (hexachlorobenzene and prochloraz) were considered by the evaluators in their in-depth review to have inadequate documentation (see Versar report, section 3). These assessments were also judged to have minimal rather than moderate documentation in the independent verification stage of the screening evaluation (EPA Screening Report, Appendix B). At the other end of the scale, the two assessments highlighted as the most thoroughly documented of the pre-Pilot in-depth sample (1,2-dibromo-3-chloropropane and manganese) were also considered to be extensively documented in the screening evaluation. One apparent outlier was an assessment determined in the screening evaluation to have moderate documentation, yet considered unanimously by the in-depth evaluators to have inadequate documentation of uncertainties (4-methylphenol; see Versar report, section 3.1). While the degree of discussion in the summary was more detailed than was otherwise typical at the time (1990), the evaluators concluded that important aspects of uncertainty had been overlooked, e.g., incomplete use of data available at the time and uncritical use of data from structural analogues that were not clearly relevant. The screening evaluation and the in-depth evaluation were also complementary for the Pilot/post-Pilot assessments. It was found in the screening evaluation that the IRIS summary and Toxicological Review for the Pilot/post-Pilot assessments generally contained extensive documentation of variability and uncertainty.
In the in-depth evaluation, the reviewers further examined the completeness of the discussions provided. While they concluded that the quality of the discussions varied, it was also not always clear whether these remarks were addressed to the IRIS summary alone, the Toxicological Review alone, or to both. In conclusion, the statistical sampling approach taken in choosing the assessments to review allows some generalization of the results of the screening evaluation and the in-depth evaluation to the rest of the IRIS database. That is, based on a 10% sample, approximately two-thirds of the pre-Pilot IRIS summaries can be expected to contain minimal discussion of the variability and uncertainty inherent in the available toxicity values. The remaining third of the pre-Pilot IRIS summaries can be expected to contain at least moderate documentation of variability and uncertainty. In their in-depth review of assessments with at least moderate documentation, the evaluators found that coverage of relevant uncertainty and variability issues was uneven, with two of the eight pre-Pilot assessments noticeably more comprehensive than the others. Among the Pilot/post-Pilot assessments, all but one demonstrated extensive documentation of variability and uncertainty, partly through the ready availability of the accompanying Toxicological Reviews. The evaluators' in-depth reviews of eight of these assessments noted a range in the quality of the discussion of relevant uncertainties in these assessments as well. One Pilot/post-Pilot assessment was highlighted as being more comprehensive than all of the other assessments examined in-depth. The independent evaluators also made several recommendations for improving IRIS assessments, including the need to update assessments.
EPA recognizes that many assessments in the IRIS database have not been updated and therefore may not reflect the latest scientific findings or current risk assessment methods. With respect to current risk assessment methods, EPA has been applying the revised cancer guidelines in all assessments underway since they were proposed, as noted in Section 2 above, but acknowledges that some unevenness in documentation exists while the Agency gains experience in applying them. Concerning "data-derived" uncertainty factors, it should be noted that EPA-published risk assessment guidelines support the use of relevant data to replace these defaults. Limitations in developing data-derived factors are mostly due to the unavailability of useful data to justify departure from the defaults. EPA is developing guidance for risk assessors in the application of the "data-derived" approach, to facilitate the maximum use of scientific data in replacing default uncertainty factors. Moreover, EPA acknowledges that the discussion of many of these underlying uncertainties in IRIS assessments can be improved. One of the more recent risk assessment methods encouraged by several reviewers was quantitative uncertainty analysis. The goal of a quantitative uncertainty analysis is to clarify the overall degree of variability and uncertainty, and the confidence that can be placed in the analysis and its findings. It does so through a systematic approach that accounts for relationships among the inputs or assumptions contributing to a risk decision (in the case of risk assessment for IRIS, all of the data choices and uncertainty decisions discussed above that contribute to a toxicity value). Quantitative choices must be made for each input, even for qualitative decisions.
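Quantitative uncertainty analysis of this kind is commonly implemented as a Monte Carlo simulation: each input is sampled from a distribution and the calculation is repeated many times, yielding a distribution for the toxicity value rather than a single number. The sketch below illustrates only the mechanics; the lognormal distributions, the NOAEL, and the treatment of uncertainty factors as distributions rather than fixed constants are all hypothetical choices for illustration, not EPA methodology.

```python
import random

# Minimal Monte Carlo sketch of quantitative uncertainty analysis for a
# toxicity value. All distributions below are hypothetical, chosen only to
# illustrate the mechanics of propagating input uncertainty.

random.seed(42)  # fixed seed so the draws are reproducible
N = 10_000

rfds = []
for _ in range(N):
    # Point of departure: lognormal centered on a NOAEL of 50 mg/kg-day
    # (ln 50 ~= 3.912), representing study-to-study variability.
    noael = random.lognormvariate(3.912, 0.25)
    # Uncertainty factors sampled as lognormal with median ~10
    # (ln 10 ~= 2.303) instead of fixed 10x constants.
    uf_inter = random.lognormvariate(2.303, 0.4)  # animal-to-human
    uf_intra = random.lognormvariate(2.303, 0.4)  # human variability
    rfds.append(noael / (uf_inter * uf_intra))

rfds.sort()
p05, p50, p95 = rfds[N // 20], rfds[N // 2], rfds[(19 * N) // 20]
print(f"RfD (mg/kg-day): 5th pct ~ {p05:.3f}, median ~ {p50:.3f}, 95th pct ~ {p95:.3f}")
```

Rather than the single point value a deterministic derivation would give (here, 50 divided by a composite factor of 100), the output is a distribution whose spread conveys how much the uncertainty in the inputs propagates into the final value.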
A number of recent documents have emphasized the importance of adequately characterizing variability and uncertainty in risk assessments and discuss quantitative uncertainty analysis in more detail (US EPA, 1992, 1995, 1997a,b; National Research Council, 1994). EPA's current Policy for Use of Probabilistic Analysis in Risk Assessment (1997b) provides that, "For human health risk assessments, the application of Monte Carlo and other probabilistic techniques has been limited to exposure assessments in the majority of cases. The current policy, Conditions for Acceptance and associated guiding principles are not intended to apply to dose response evaluations for human health risk assessment until this application of probabilistic analysis has been studied further." Since it is the function of IRIS to implement Agency-approved published methodologies and Agency-wide policies, implementation of newer risk assessment methods in IRIS awaits an Agency-level mandate. In the meantime, EPA agrees that a thorough description of the available data and its related uncertainties can provide the IRIS user with a level of confidence in a particular assessment, and can lay the groundwork for later uncertainty analysis, should it be considered practical. 6. CONCLUSIONS The results of the screening evaluation indicated that about a third of the IRIS summaries for pre-Pilot chemical assessments had at least some documentation of data variability and uncertainty, while a large majority of Pilot/post-Pilot assessments (consisting of both IRIS summaries and Toxicological Reviews) had extensive documentation. While the documentation in assessments has improved overall since the IRIS Pilot's introduction of Toxicological Reviews to substantiate IRIS summaries, the results of the in-depth evaluation indicate that the quality of the characterization of data variability and uncertainty varies among the Pilot/post-Pilot assessments.
This study supports EPA's commitment to provide more transparent scientific bases for risk assessment conclusions. EPA will continue to look into ways to improve the documentation of variability and uncertainty issues in future Toxicological Reviews, and to recapitulate this information in IRIS summaries. ------- 7. REFERENCES Barnes, DG, and Dourson, M. (1988) Reference dose (RfD): description and use in health risk assessments. Regulatory Toxicology and Pharmacology 8(4):471-486. National Academy of Sciences (1983) Risk Assessment in the Federal Government: Managing the Process. Washington, DC. National Research Council (1994) Science and Judgment in Risk Assessment. National Academy Press: Washington, DC. U.S. Environmental Protection Agency (1986) Guidelines for Carcinogen Risk Assessment. Federal Register 51(185):33992-34003. U.S. Environmental Protection Agency (1991) Guidelines for Developmental Toxicity Risk Assessment, dated December 5, 1991. Federal Register 56(234):63798-63826. U.S. Environmental Protection Agency (1992) Guidelines for Exposure Assessment. Federal Register 57:22888-22938. EPA/600/Z-92/001. U.S. Environmental Protection Agency (1994a) Methods for Derivation of Inhalation Reference Concentrations and Application of Inhalation Dosimetry. Office of Health and Environmental Assessment, National Center for Environmental Assessment, RTP, NC. EPA/600/8-90/066F. U.S. Environmental Protection Agency (1994b) Peer Review and Peer Involvement at the U.S. EPA. Memorandum of the Administrator, Carol M. Browner, June 7. U.S. Environmental Protection Agency (1995) EPA Guidance on Risk Characterization. Memorandum of the Administrator, Carol M. Browner, March 21. U.S. Environmental Protection Agency (1995c) Use of the Benchmark Dose Approach in Health Risk Assessment. Office of Research and Development. EPA/630/R-94/007, February 1995. U.S. Environmental Protection Agency (1996a) Proposed Guidelines for Carcinogen Risk Assessment; Notice. Federal Register 61(79):17960-18011. U.S.
Environmental Protection Agency (1996b) Guidelines for Reproductive Toxicity Risk Assessment, October 31. Federal Register 61(212):56274-56322. U.S. Environmental Protection Agency (1997a) Guiding Principles for Monte Carlo Analysis. Office of Research and Development. EPA/630/R-97/001. U.S. Environmental Protection Agency (1997b) Policy for Use of Probabilistic Analysis in Risk Assessment. Memorandum of the Deputy Administrator, Fred Hansen, May 15. U.S. Environmental Protection Agency (1998) Guidelines for Neurotoxicity Risk Assessment. Federal Register 63(93):26926-26954. U.S. Environmental Protection Agency (1999) Proposed Guidelines for Carcinogen Risk Assessment. External Review Draft.