DO NOT CITE OR QUOTE
EPA/635/R-00/005A
August 2000
SAB Review Draft

EPA Summary Report

Characterization of Data Variability and Uncertainty: Health Effects Assessments in the Integrated Risk Information System (IRIS)

In response to Congress, HR 106-379

National Center for Environmental Assessment
Office of Research and Development
U.S. Environmental Protection Agency
Washington, DC

DISCLAIMER

This document has been reviewed in accordance with U.S. Environmental Protection Agency policy. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

TABLE OF CONTENTS

EXECUTIVE SUMMARY
1. INTRODUCTION AND PURPOSE
2. BACKGROUND
   2.1 Hazard and Dose-Response Assessment
   2.2 IRIS Program and Data Base
   2.3 Uncertainty and Variability
3. EVALUATION APPROACH
   3.1 Protocol Development
   3.2 Screening Evaluation
   3.3 In-Depth Evaluation
4. SUMMARY OF RESULTS
   4.1 Screening Evaluation
   4.2 In-Depth Evaluation
5. DISCUSSION
6. CONCLUSIONS
7. REFERENCES
ATTACHMENTS
• EPA Screening Evaluation Report: Presentation and Discussion of Uncertainty and Variability in IRIS Assessments
• Versar Report: Characterization of Data Uncertainty and Variability in IRIS Assessments, Pre-Pilot vs. Pilot/post-Pilot
• Appendix A of Versar Report: Individual reports of experts assembled by Versar, Inc.

EXECUTIVE SUMMARY

In response to a Congressional directive contained in HR 106-379 regarding EPA's appropriations for FY2000, EPA has undertaken an evaluation of the characterization of data variability and uncertainty in its Integrated Risk Information System (IRIS) health effects information database. Through consultation with EPA's Science Advisory Board, EPA developed and implemented a systematic plan to select a representative sample of chemical assessments in IRIS for in-depth evaluation by an independent panel of experts of the extent to which EPA has documented uncertainty and variability. EPA conducted a screening evaluation of 10 percent of the IRIS summaries of chemical assessments completed during the period 1988-1994 (52 of 522 pre-Pilot assessments), and of all 15 Pilot/post-Pilot IRIS summaries and Toxicological Reviews (completed after 1995), for overall documentation of data variability and uncertainty. An EPA contractor then selected 16 assessments (IRIS summaries and support documents) from the screening sample for in-depth examination (8 of 52 pre-Pilot and 8 of 15 Pilot/post-Pilot). The contractor selected six independent experts (outside EPA) in the field of human health risk assessment, who performed this in-depth review.

In general, the outside experts concluded that the characterization of data variability and uncertainty varied across the assessments they reviewed. While the documentation of data variability and uncertainty has generally improved since the IRIS Pilot's introduction of Toxicological Reviews to substantiate IRIS summaries, the reviewers found that the quality of the characterization of data variability and uncertainty varied among the Pilot/post-Pilot assessments. The reviewers also suggested ways to describe uncertainty and variability, as well as a number of scientific improvements, especially the need to update older assessments with more recent scientific data and risk assessment methods.

This study supports EPA's goal to make the scientific bases for risk assessment conclusions more transparent.
EPA will continue to look into ways to improve the characterization and documentation of data variability and uncertainty in future IRIS assessments.

1. INTRODUCTION AND PURPOSE

The Integrated Risk Information System (IRIS) data base contains EPA's consensus scientific positions on potential adverse human health effects that may result from chronic exposure to specific chemical substances in the environment. As of January 31, 2000, the IRIS data base contained 537 chemical-specific assessments. IRIS is widely used by regulatory programs and risk assessors at all levels of government and by the public. First publicly available in 1988, these assessments provide the summary results of EPA deliberations culminating in consensus hazard and dose-response conclusions for cancer and non-cancer health effects. Since 1995 (when the "IRIS Pilot" program was undertaken), EPA has taken several steps to ensure that the best available scientific information is included in chemical assessments made available on IRIS, including improvements in the documentation of scientific decisions and external peer reviews of all subsequent assessments.

Regarding IRIS, Congress issued the following directive, which was contained in the October 1999 report from Congress (HR 106-379) regarding EPA's appropriations for FY2000:

"The conferees are concerned about the accuracy of information contained in the Integrated Risk Information System [IRIS] data base which contains health effects information on more than 500 chemicals. The conferees direct the Agency to consult with the Science Advisory Board (SAB) on the design of a study that will a) examine a representative sample of IRIS health assessments completed before the IRIS Pilot Project, as well as a representative sample of assessments completed under the project and b) assess the extent to which these assessments document the range of uncertainty and variability of the data. The results of that study will be reviewed by the SAB and a copy of the study and the SAB's report on the study sent to the Congress within one year of enactment of this Act."

In response to the Congressional directive, EPA has undertaken an evaluation of the characterization of data variability and uncertainty in IRIS assessments. This report addresses Congress's directive. Section 2 of the report provides background information about EPA's approaches to health hazard and dose-response assessments, and describes the IRIS program and the kinds of health information available in IRIS. It also discusses the sources of scientific uncertainty and variability related to the risk assessment process, and defines these terms in the context of the purpose of this EPA study, i.e., the characterization of data variability and uncertainty of chemical assessments in IRIS. Section 3 describes the study protocol, and the summary findings of the study are provided in Section 4. Details of the study protocol and results can be found in the three attachments. Discussion of study results, study conclusions, and references are provided in Sections 5, 6, and 7, respectively.

2. BACKGROUND

Risk assessment is the process EPA uses to identify and characterize environmentally related human health problems.
As defined by the National Academy of Sciences (NAS, 1983), risk assessment entails the evaluation of all pertinent scientific information to describe the likelihood, nature, and extent of harm to human health as a result of exposure to environmental contaminants. EPA has used the basic NAS paradigm as a foundation for its published risk assessment guidance, and as an organizing system for many individual environmental chemical assessments. There are four components to every complete risk assessment: hazard assessment, dose-response assessment, exposure assessment, and risk characterization. Hazard assessment describes qualitatively the likelihood that an environmental agent can produce adverse health effects under certain environmental exposure conditions. Dose-response assessment quantitatively estimates the relationship between the magnitude of exposure and the degree and/or probability of occurrence of a particular health effect. Exposure assessment determines the extent of human exposure. Risk characterization integrates the findings of the first three components to describe the nature and magnitude of health risk associated with environmental exposure to a chemical substance or a mixture of substances.

There are many uncertainties associated with environmental risk assessments, due to the complexity of the exposure-dose-effect relationship and the lack of, or incomplete, knowledge and information about the physical, chemical, and biological processes linking human exposure to an environmental substance(s) and health effects. Major sources of uncertainty include the use of a wide range of data from many different disciplines (e.g., epidemiology, toxicology, biology, chemistry, statistics), the use of many different predictive models and methods in lieu of actual measured data, and the use of many scientific assumptions and science policy choices (i.e., scientific positions assumed in lieu of scientific data) in order to bridge the information and knowledge gaps in the environmental risk assessment process. These diverse elements, along with varying interpretations of the scientific information, can produce divergent results in the risk assessment process, an outcome that leads to risk assessment controversies. Thus, EPA risk assessment guidelines stress the importance of identifying uncertainties and variability and presenting them as part of risk characterization.

Over the years, EPA has conducted health hazard and dose-response assessments for many environmental chemical contaminants. The summary findings and outcomes of these assessments, which represent scientific consensus positions across the Agency, are made available in the IRIS data base. Information on IRIS can be used with an exposure assessment for a specific exposure scenario to perform a complete risk assessment. The following sections provide an overview of EPA's historical and current approaches to health hazard and dose-response assessments, describe EPA's IRIS program and the kinds of information available in IRIS, and define variability and uncertainty in the context of hazard and dose-response assessments and available information in IRIS.
2.1 Hazard and Dose-Response Assessment

Chemicals often affect more than one organ or system of the body (e.g., liver, kidney, nervous system) and can produce a variety of health endpoints (e.g., cancer, respiratory allergies, infertility), depending on the conditions of exposure such as the amount, frequency, duration, and route of exposure (i.e., ingestion, inhalation, dermal contact). For most environmental chemicals, available health effects information is generally limited to high exposures in studies of humans (e.g., occupational studies of workers) or laboratory animals. Thus, evaluation of potential health effects associated with the low levels of exposure generally encountered in the environment involves inferences based on the understanding of the mechanisms of chemical-induced toxicities. Mechanism of action is defined as the complete sequence of biological events that must occur to produce an adverse effect. In cases where only partial information is available, the term mode of action is used to describe only the major (but not all) biological events that are judged to be sufficient to inform about the shape of the dose-response curve beyond the range of observation.

For effects that involve the alteration of genetic material (e.g., most cancers, heritable mutations), there are theoretical reasons to believe that such a mode of action would not show a threshold, or dose below which there are no effects. On the other hand, a threshold is widely accepted for most other health effects, based on considerations of compensatory homeostasis and adaptive mechanisms. The threshold concept presumes that a range of exposures from zero to some finite value can be tolerated by an individual without adverse effects. Accordingly, different approaches have traditionally been used to evaluate potential carcinogenic effects and health effects other than cancer, referred to as "non-cancer" effects.

Carcinogenic Effects. Cancer hazard assessment involves a qualitative weight-of-evidence evaluation of potential human carcinogenicity. This evaluation is a synthesis of all pertinent information in addressing the question "How likely is an agent to be a human carcinogen?" EPA's 1986 Guidelines for Carcinogen Risk Assessment (USEPA, 1986) provide a classification system for the characterization of the overall weight-of-evidence for potential human carcinogenicity based on human evidence, animal evidence, and other supportive data. EPA's 1996 Proposed Guidelines for Carcinogen Risk Assessment (USEPA, 1996a) and the subsequent revised external review draft (USEPA, 1999) emphasize the need for characterizing cancer hazard in addition to hazard identification. Accordingly, the question to be addressed in hazard characterization is expanded to "How likely is an agent to be a human carcinogen, and under what exposure conditions may a cancer hazard be expressed?" In addition, the revised guidelines stress the importance of considering the mode(s) of action of the agent when making an inference about potential cancer hazard beyond the range of observation. To express the weight-of-evidence for potential human carcinogenicity, EPA's proposed revised guidelines emphasize using a hazard narrative in place of the classification system.
However, in order to provide some measure of consistency, standard hazard descriptors are used as part of the hazard narrative to express the conclusion regarding the weight-of-evidence for potential human carcinogenicity.

Dose-response assessment for carcinogenic effects usually involves the use of a linear extrapolation model(s) to estimate an upper bound on cancer risk at a given low level of exposure. The linear low-dose extrapolation approach is considered appropriate for cases where there is insufficient understanding of the mode of action, or when available data indicate a linear dose-response curve at low dose but there are not enough data to allow the development of biologically based dose-response models. This risk estimate is known as the cancer unit risk for inhalation exposure and the slope factor for oral exposure. It is recognized that such an estimate may not give a realistic prediction of risk, and the true value of the risk may be as low as zero; the use of such models, however, puts a ceiling on what the risk might be. When there is sufficient evidence for a nonlinear mode of action, but not enough data to construct a biologically based model for the relationship, EPA's proposed revised cancer guidelines (USEPA, 1996a) call for the use of a margin of exposure analysis as a default procedure. A margin of exposure analysis compares the point of departure (i.e., the lower 95% confidence limit on the dose or exposure associated with a 10% risk of cancer or precursor effects) with the dose associated with the environmental exposure(s) of interest, and determines whether or not the exposure margins are adequate. Both default approaches may be used for a specific cancer assessment if the cancer is mediated by multiple modes of action, which may include linear and nonlinear modes of action.

Non-Cancer Effects. The Agency has published several guidelines for assessing specific non-cancer health endpoints, including developmental toxicity, reproductive toxicity, and neurotoxicity (USEPA, 1991, 1996b, and 1998, respectively). Like the cancer guidelines, these guidelines set forth principles and procedures to guide EPA scientists in the interpretation of epidemiologic, toxicologic, and mechanistic studies to make inferences about the potential hazard for these specific health endpoints. Following a review and evaluation of the spectrum of potential health effects associated with the chemical of interest (i.e., hazard identification), a dose-response assessment is performed on the "critical effect" (i.e., the adverse effect or its known precursor that occurs at the lowest dose) to derive a chronic reference dose (RfD) or reference concentration (RfC) for oral and inhalation exposure, respectively. The RfD/RfC is defined as "an estimate (with uncertainty spanning perhaps an order of magnitude) of a continuous oral/inhalation exposure to the human population (including sensitive subgroups) that is likely to be without an appreciable risk of deleterious non-cancer effects during a lifetime" (Barnes and Dourson, 1988; USEPA, 1994a). The RfD/RfC approach assumes that if exposure can be limited so that the critical effect does not occur, then no other non-cancer effects will occur. Thus, this approach fulfills the needs of EPA's various regulatory programs for defining an exposure level(s) below which there is negligible risk of adverse non-cancer health effects.
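To make the arithmetic of the two default cancer dose-response approaches described above concrete, the following is a minimal sketch in Python of a linear low-dose extrapolation and a margin of exposure comparison. The point of departure and exposure values are hypothetical, chosen only to illustrate the calculations; they are not drawn from any IRIS assessment.

```python
# Illustrative only: hypothetical point of departure and exposure,
# not values from any IRIS assessment.

led10 = 5.0       # LED10: lower 95% confidence limit on the dose (mg/kg-day)
                  # associated with a 10% risk of cancer or precursor effects
exposure = 0.002  # environmental exposure of interest (mg/kg-day)

# Linear low-dose extrapolation: a straight line from the point of
# departure to the origin gives an upper bound on risk at low dose.
slope_factor = 0.10 / led10              # risk per mg/kg-day
upper_bound_risk = slope_factor * exposure
print(f"slope factor: {slope_factor:.3g} per mg/kg-day")
print(f"upper-bound risk at exposure: {upper_bound_risk:.1e}")

# Margin of exposure analysis for a nonlinear mode of action: compare
# the point of departure with the environmental exposure. Whether the
# resulting margin is adequate is a separate scientific judgment.
margin_of_exposure = led10 / exposure
print(f"margin of exposure: {margin_of_exposure:,.0f}")
```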
2.2 IRIS Program and Data Base

The IRIS data base was created in 1986 as a mechanism for developing consistent intra-Agency consensus positions on potential health effects of chemical substances. EPA Program Offices and Regions were regulating some of the same substances, and determined that in many cases the Agency needed to use consistent scientific judgments on potential health effects in risk-based decisions. Chemical assessments prepared by Program and Regional Offices were peer reviewed by three intra-Agency workgroups (i.e., the RfD, RfC, and Carcinogen Risk Assessment Verification Endeavor, or CRAVE, workgroups) comprising health scientists from across the Agency. Summary results of these consensus assessments were collected and made available on IRIS. Combined with site-specific or national exposure information, the summary health information in IRIS could then be used by risk assessors and other staff to evaluate potential public health risks from environmental contaminants. Summary information in IRIS consists of three components: derivation of the oral chronic RfD and inhalation chronic RfC for non-cancer critical effects, cancer classification (and a cancer hazard narrative for the more recent assessments), and quantitative cancer risk estimates.

IRIS summaries were originally written for an internal EPA audience. For this reason, IRIS information has focused on the documentation of toxicity values (i.e., RfD, RfC, cancer unit risk, and slope factor) and cancer classification. The bases for these numerical values and evaluative outcomes are provided in an abbreviated and succinct manner. Details of the scientific rationale can be found in supporting documents, and references for these assessment documents and key studies are provided in the bibliography sections. Moreover, it was not considered necessary to articulate every default assumption used in individual chemical assessments, as these assumptions have been explicitly discussed and supported in the Agency's published risk assessment guidance. It is also important to note that the three components of IRIS information (RfD, RfC, and cancer evaluation) were added to the database at different times, depending on regulatory needs, without an explanation of why other endpoints were not assessed.

As external interest in the information on IRIS grew, EPA made the IRIS data base publicly available in 1988 via the National Library of Medicine's TOXNET system. In 1995, EPA undertook the IRIS Pilot Program to evaluate and implement a number of improvements in the documentation of summary information in IRIS and in the scientific peer review process. Individual chemical hazard and dose-response assessments for cancer and non-cancer health effects are now provided in a single supporting document known as the IRIS "Toxicological Review" (or an equivalent support document). This procedure was adopted in response to the need for a more integrated health assessment as harmonized dose-response approaches become available for cancer and non-cancer effects.
In addition, there has been an increased demand for more transparency in the default assumptions and methods used in these chemical assessments, in response to the Agency policy on risk characterization (USEPA, 1995), as well as for developing and documenting the scientific bases for moving away from default methods (e.g., use of chemical-specific data to replace default values of uncertainty factors). In order to make the scientific quality of the assessments more uniform, an external peer review step was incorporated into the preparation of each chemical assessment under the Pilot program, in response to EPA's Peer Review Policy (USEPA, 1994b). Since 1997, IRIS summaries and accompanying support documents, including a summary of and response to external peer review comments, have been publicly available in full text on the IRIS web site at http://www.epa.gov/iris. The Internet site is now EPA's primary repository for IRIS. Together, the summary and support documents comprise the "IRIS assessment" for a given chemical substance.

The information currently on IRIS represents the state-of-the-science and state-of-the-practice in risk assessment as it existed when each assessment was prepared, often 10 or more years ago. When EPA reassesses older IRIS entries, an opportunity exists to update the science and apply more current methodologies. EPA uses an annual priority-driven approach to determine which chemical substances are most in need of assessment or reassessment. The criteria that drive EPA's priorities are usually Program Offices' and Regions' statutory, regulatory, and programmatic needs. Availability of new scientific information to perform reassessments is also a strong criterion. In this manner, EPA directs its resources to the highest priorities first. However, much work will be needed over the coming years in order to update even the highest priority substances. In an effort to improve the pace of the assessment process and leverage resources, EPA is currently evaluating ways to work cooperatively with external parties on assessment development. Five cooperative efforts are currently in progress, three with private organizations and two with other federal agencies. Others are under consideration. Under a cooperative arrangement, an external party may submit an assessment for EPA's consideration in developing an EPA IRIS document; however, EPA's consensus position must be documented separately. EPA is continuing to look for opportunities to improve the IRIS process and the pace of data base updates.

2.3 Uncertainty and Variability

Because the Congressional language was to address "uncertainty and variability of the data," this report uses an expansive definition of the term "variability." As used in this report, "variability" encompasses any aspect of the risk assessment process that can have varying results, including the potential interpretations of the available data, the availability of different data sets collected under different experimental protocols, and the availability of different models and methods. Several of these would be considered sources of uncertainty under the definitions of variability and uncertainty used by the NRC (1994) and EPA (1992, 1997a,b). These stricter definitions use "variability" to refer to differences attributable to diversity in biological sensitivity or exposure parameters; these differences can be better understood, but not reduced, by further research.
"Uncertainty" refers to lack of knowledge about specific factors, parameters, or 23 models, and generally can be reduced through further study. This section summarizes key 24 uncertainties and data variability generally encountered in hazard and dose-response evaluations 25 for cancer and non-cancer effects. 26 Hazard Assessment For most chemical substances for which there are insufficient data in 27 humans, a major uncertainty in the evaluation of potential health effects to humans is the reliance 28 on animal studies of high exposure to predict human response at lower exposure, particularly in 29 the absence of an understanding of how an agent causes the observed toxicologic effects in the ------- 1 animals, and in the face of the varying results frequently obtained with different animal species 2 under different exposure conditions. Even when there are human data, there is uncertainty about 3 average response at lower exposures and there is variability in individual response around this 4 average. Therefore, EPA has adopted a number of scientific assumptions as science policy 5 choices in the face of data and knowledge gaps. 6 Major assumptions used in hazard assessment (unless there are data to the contrary) 7 include the following: (a) effects observed in one human population are predictive of other human 8 populations, including sensitive subpopulations; (b) in the absence of human data, effects seen in 9 laboratory animals are assumed to be relevant to humans, and humans may respond similarly 10 (although not identically) to the most sensitive animal species; and (c) effects seen at high 11 exposure are relevant for evaluation of potential effects at low exposure. These scientific 12 assumptions or science policies have also been articulated further in EPA's peer- reviewed risk 13 assessment guidance documents, as discussed above. 14 Reference Values for Non-Cancer Effects To derive a RfD/RfC for a non-cancer 15 critical effect, the common practice is to apply standard "uncertainty factors" (UFs) to the no- 16 observed adverse effect level (NOAEL), lowest-observed adverse effect level (LOAEL) or 17 benchmark dose/concentration (BMCLJ1 (US EPA, 1995c). These UFs are used to account for 18 the extrapolation uncertainties (e.g., inter-individual variation, interspecies differences, duration of 19 exposure) and adequacy of database. A modifying factor (MF) is also used as a judgment factor 20 to account for the confidence in the critical study (or studies) used in the derivation of the 21 RfD/RfC. Replacements for default UFs are used when chemical-specific data are available to 22 modify these standard values. This is known as the "data-derived" approach. Moreover, the use 23 of pharmacokinetic or dosimetry models can obviate the need for an UF to account for differences 24 in toxicokinetics across species. 25 A number of related factors can lead to significant uncertainty of the RfD/RfC. Among 26 these is the selection of different observed effects as a critical effect, which may vary within and JBMCLx is defined as the lower 95% confidence limit of the dose that will result in a level of "x" response (e.g., BMCL10 is the lower 95% confidence limit of a dose for a 10% increase in a particular response). 10 ------- 1 across available studies. Also significant are the choice of different data sets for the identification 2 of the NOAEL, LOAEL, or bench mark dose analysis, the use of different values for the various 3 UFs, and additional judgments which impact the MF. 
Cancer Risk Estimates. Cancer dose-response assessment generally involves many scientific judgments regarding the selection of different data sets (benign and malignant tumors or their precursor responses) for extrapolation, the choice of low-dose extrapolation approach based on the interpretation and assessment of the mode of action for the selected tumorigenic response(s), the choice of extrapolation models, the methods used to account for differences in dose across species, and the selection of the point of departure for low-dose extrapolation. Given that many judgments must be made in the many steps of the assessment process in the face of data variability, along with the use of different science policy choices and default procedures and methods to bridge data and knowledge gaps, it is generally recognized that uncertainty exists in cancer risk estimates.

3. EVALUATION APPROACH

The following sections describe the overall approach for this evaluative study and the study protocols for the screening step and the in-depth evaluation of the documentation of data variability and uncertainty of available health information in IRIS. Details of the study protocols can be found in the attachments (EPA Screening Evaluation Report and Versar In-Depth Report).

3.1 Protocol Development

Following the Congressional directive, EPA consulted with the Executive Committee of EPA's Science Advisory Board (SAB) about a proposed approach to this study. The agreed-upon approach involved assembling a team of independent, qualified individuals, external to EPA, to evaluate a representative set of IRIS assessments for the extent of documentation of variability and uncertainty. The use of external experts would avoid internal bias and the appearance that the IRIS program was "reviewing itself." The assessments would be reviewed simultaneously by multiple evaluators, in order to obtain a range of opinions from experts with a variety of relevant backgrounds. In order to address Congress's point concerning pre-Pilot and Pilot assessments, half of the sample would be from the set of pre-Pilot assessments (completed before 1995) and half from the later assessments.

The SAB supported EPA's overall approach and recommended a number of enhancements. First, they recommended a tiered approach to selecting a representative sample of assessments, in which a sample of at least 10% of the available assessments would first be screened for their treatment of variability and uncertainty. This screening was to consider broad categories of documentation, and to be verified by an independent reviewer. A smaller set of assessments would then be chosen from the screening sample for in-depth review.

The SAB also encouraged examining as large a set of assessments in-depth as possible. They felt that three reviews per assessment would provide a sufficient range of opinions, given an adequate range of subject area expertise among the evaluators. This decision made it possible to target a sample of 16 assessments, to be reviewed by a total of six independent evaluators.

3.2 Screening Evaluation

An EPA scientist (IRIS Program Staff) carried out the screening evaluation, which is detailed in the attached EPA report. As recommended by the SAB, a 10% sample of pre-Pilot IRIS assessments (52 of 522) was identified.
These, and the 15 Pilot/post-Pilot IRIS assessments completed by January 31, 2000, a total of 67 assessments, were classified into three broad categories of overall documentation: none/minimal, some/moderate, or extensive (see Table 2 of the attached EPA Screening Report). The purpose of the preliminary screening was to survey broadly the extent of documentation of uncertainty and variability of health effects information in IRIS, in order to facilitate an in-depth evaluation of a smaller but representative set of chemical assessments in IRIS. Due to the large volume of pre-Pilot assessment materials (52 sets of an IRIS summary plus supporting EPA Source Document(s)), only the IRIS summaries were examined. For the later IRIS assessments, the IRIS summary and the Toxicological Review were examined. Consequently, this screening addressed only the overall approach to providing information concerning variability and uncertainty in the on-line assessments, not the completeness of the summarized information, nor the cited scientific literature available at the time of each assessment.

The first category, "None/Minimal," describes assessments that presented conclusions, with overall uncertainty and confidence statements, but no incidence rates or other quantitative health effect levels for the available studies (such as percent weight loss), nor any rationale for the confidence statements. Assessments with "Some or Moderate" documentation contained quantitative effect levels and some discussion of variability of effects, including variability across dose groups. In addition, these assessments contained some discussion of the reasons for overall confidence in the assessment. Assessments with "Extensive" documentation contained quantitative information (such as confidence intervals), some comparison of results across related studies, discussion of sources of uncertainty, comparison of uncertainties across available studies, and rationales for confidence in the available studies and the conclusions drawn in the assessment. A listing of the categorized assessments was provided to the contractor to facilitate choosing the random sample for in-depth evaluation of the treatment of variability and uncertainty.

As recommended by the SAB Executive Committee, a second reviewer (an EPA health scientist without routine involvement in preparing or reviewing IRIS assessments) repeated the above evaluative step without any knowledge of the results of the first round of review. The details of this second evaluation are also provided in the attached EPA Screening Report.

3.3 In-Depth Evaluation

The in-depth evaluation then focused on 16 IRIS assessments, half (8) from the pre-Pilot assessments and the other half from the Pilot/post-Pilot assessments. Within these two subsets, the assessments were randomly selected from the "some/moderate" and "extensive" documentation categories as evenly as possible. The assessments in the "none/minimal" category were not included in this part of the evaluation; it was not clear whether it would be a good use of the experts' effort to review these assessments, as they likely contained limited characterization of uncertainty and variability, at least based on the summary information. EPA's contractor (Versar, Inc.) selected the sample of 16 assessments for in-depth evaluation.
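As an illustration of the stratified random draw described above, the sketch below selects eight assessments from each era, splitting each draw as evenly as the two eligible screening categories allow. The stratum sizes loosely echo the screening tallies reported in Section 4.1, and the identifiers are placeholders rather than the actual study sample.

```python
# Illustrative stratified draw; identifiers are placeholders, and the
# stratum sizes loosely echo the screening tallies in Section 4.1.
import random

screened = {
    ("pre-Pilot", "some/moderate"):        [f"pre-mod-{i}" for i in range(16)],
    ("pre-Pilot", "extensive"):            [f"pre-ext-{i}" for i in range(3)],
    ("Pilot/post-Pilot", "some/moderate"): [f"post-mod-{i}" for i in range(1)],
    ("Pilot/post-Pilot", "extensive"):     [f"post-ext-{i}" for i in range(14)],
}

def draw_eight(group, total=8):
    """Draw `total` assessments from one era, as evenly as the two
    eligible documentation categories allow."""
    moderate = screened[(group, "some/moderate")]
    extensive = screened[(group, "extensive")]
    n_ext = min(total - min(total // 2, len(moderate)), len(extensive))
    n_mod = total - n_ext
    return random.sample(moderate, n_mod) + random.sample(extensive, n_ext)

sample = draw_eight("pre-Pilot") + draw_eight("Pilot/post-Pilot")
print(sample)  # 16 selected assessments, 8 from each era
```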
The materials for in-depth review of the pre-Pilot assessments included the IRIS summaries and the supporting EPA Source Document(s) identified in each summary. For the Pilot/post-Pilot assessments, the materials were the IRIS summary and the Toxicological Review. The selection process and the assessments chosen are described in the attached Versar report.

EPA's contractor assembled and coordinated a set of six independent experts to carry out the review. These experts were selected on the basis of their in-depth knowledge of EPA's human health risk assessment methodologies, familiarity with IRIS, knowledge of current practices for evaluating and documenting uncertainty and variability in data used in health assessments, and expertise in how these factors relate to sensitive subpopulations, including children. They represented a range of professional affiliations and of health science backgrounds among cancer and non-cancer toxic endpoints. The experts evaluated the documentation of uncertainty and variability in the assessments on the basis of the data available at the time each assessment was conducted, focusing on the presentation of the available data and the variability in those data, and on the discussion of confidence and uncertainty, including any uncertainty factors applied. The evaluators self-certified that they had not been involved in the development or peer review of the assessments under review for the study, and that they could perform independently, free of conflict of interest. Each evaluator was assigned eight assessments to review, generally evenly divided between pre-Pilot and Pilot/post-Pilot assessments. Each chemical assessment was independently reviewed by three evaluators. The evaluators and their assigned assessments are listed in Table 2-6 of the attached Versar report.

The evaluators were asked to answer the following questions:

• Considering the data available at the time each assessment was performed, and the EPA guidelines and methodologies operative at the time of the assessment, did EPA characterize to an appropriate extent the uncertainty and variability in data used to develop these IRIS health assessments? How does this compare between pre-Pilot and Pilot/post-Pilot assessments?

• Did EPA appropriately address the strengths and weaknesses of the scientific evidence from available studies, and sources of variability in the data used in each assessment?

• Did EPA appropriately address the uncertainties in the underlying data, and uncertainties in the qualitative and quantitative judgments given in each assessment?

The evaluators were also encouraged to raise other relevant observations or comments.

4. SUMMARY OF RESULTS

The summary findings of the screening and in-depth evaluations are provided below. Details of the review results can be found in the attached EPA report (screening evaluation) and Versar report (overall summary of the in-depth review, with Appendix A containing individual reviewers' findings).

4.1 Screening Evaluation

The results of the screening evaluation of the 52 pre-Pilot IRIS summaries by the first EPA reviewer were as follows: 3/52 had extensive, 16/52 some or moderate, and 33/52 none or minimal presentation or discussion of variability and uncertainty. Nearly all of the Pilot/post-Pilot assessments (14/15) showed extensive documentation of variability and uncertainty in the IRIS summary and Toxicological Review. It should be noted that a proper comparison between the two groups of assessments (pre-Pilot versus Pilot/post-Pilot) cannot be made, as it would require an evaluation of a comparable set of assessment documentation (the source documents for the pre-Pilot assessments were not evaluated in the screening phase). The independent verification of the screening evaluation by a second EPA reviewer produced similar results (see attached EPA Screening Report, Table 5), with a Spearman rank correlation coefficient of 0.82 between the two reviewers' ratings. For 15 assessments, the ratings of the two reviewers differed by one category.

Given the valuable input from the verification step, it is reasonable to consider the results of the two rankings together. Among the 52 pre-Pilot summaries, then, approximately two-thirds (63-79%) contained none to minimal documentation of variability and uncertainty information. Almost all (93-100%) of the assessments carried out after 1995 demonstrated extensive documentation of variability and uncertainty information.
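For readers unfamiliar with the agreement measure used in the verification step, the sketch below computes a Spearman rank correlation between two reviewers' ordinal category ratings. The ratings shown are invented for illustration and do not reproduce the study data, which yielded a coefficient of 0.82.

```python
# Hypothetical inter-reviewer agreement check; the ratings are invented.
from scipy.stats import spearmanr

scale = {"none/minimal": 0, "some/moderate": 1, "extensive": 2}

reviewer_1 = ["none/minimal", "some/moderate", "extensive", "none/minimal",
              "some/moderate", "none/minimal", "extensive", "some/moderate"]
reviewer_2 = ["none/minimal", "some/moderate", "some/moderate", "none/minimal",
              "extensive", "none/minimal", "extensive", "none/minimal"]

# Encode the three-level categories on an ordinal scale and correlate.
rho, p_value = spearmanr([scale[r] for r in reviewer_1],
                         [scale[r] for r in reviewer_2])
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
```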
4.2 In-Depth Evaluation

The report of the in-depth evaluation (attached Versar Report) summarizes the collective findings and conclusions of the six evaluators in responding to EPA's questions. The evaluators' individual reports are provided in Appendix A of the Versar report. The primary conclusions for each question are summarized below.

Considering the data available at the time each assessment was performed, and the EPA guidelines and methodologies operative at the time of the assessment, did EPA characterize to an appropriate extent the uncertainty and variability in data used to develop these IRIS health assessments? How does this compare between pre-Pilot and Pilot/post-Pilot assessments?

As described above, six independent evaluators examined in-depth a sample of 16 IRIS assessments that had been found in the screening evaluation to have either a "some/moderate" or an "extensive" degree of documentation of variability and uncertainty. Each chemical assessment, consisting of an IRIS summary and any supporting document(s), was reviewed by three independent evaluators. There was a range of opinions among the reviewers concerning the adequacy of documentation of data variability and uncertainty for the individual assessments. This range extended from two assessments (pre-Pilot assessments from 1988 and 1990) considered by all three reviewers to have been inadequately characterized, to one assessment (a post-Pilot assessment from 1998) unanimously considered to demonstrate thoroughly adequate documentation. The evaluations for each of the other 13 assessments were not unanimous but were still informative (see Versar report, Table 3-2). These evaluations are discussed further below.

The evaluators generally concluded that the pre-Pilot IRIS summaries provided limited information on uncertainty and variability, although this was consistent with the practice at the time. Further, a number of evaluators felt that pre-Pilot assessments often did not utilize existing human data to interpret the relevance of toxic effects in animals to humans, even when the human data seemed to support the consideration of other toxic endpoints. Some noted that route-to-route extrapolation, for both cancer and non-cancer effects, was routinely carried out without any apparent scientific justification.
Despite these shortcomings, the evaluators did point out that two (1,2-dibromo-3-chloropropane and manganese) of the eight pre-Pilot summaries were especially well characterized regarding uncertainty and variability (see Versar report, Section 4), when judged according to practices standard at the time.

The evaluators noted that the Pilot/post-Pilot IRIS summaries typically presented more information than the pre-Pilot summaries, but at the same time varied in quality. More specifically, they concluded that some Pilot/post-Pilot summaries contained little discussion of variability and uncertainty, while others were distinctly more comprehensive than the pre-Pilot assessments. The more comprehensive assessments included more description and better discussion of data gaps and endpoints such as reproductive/developmental or neurological effects, as well as physicochemical information relevant to pharmacokinetics and toxicity and more complete synopses of conclusions for each supporting study. The best Pilot/post-Pilot assessments contained a more comprehensive discussion of the mechanism of action, the relevance of the critical effect to humans, or the impact of pharmacokinetic or metabolic information on interspecies variability. Two of these better assessments (ethylene glycol monobutyl ether and methyl methacrylate) were highlighted for using this additional information to adjust uncertainty factors away from the default values.

The evaluators appreciated the availability of the Toxicological Review documents that accompany the IRIS summaries on the IRIS website.

Did EPA appropriately address the strengths and weaknesses of the scientific evidence from available studies, and sources of variability in the data used in the assessment?

The evaluators concluded that the strengths and weaknesses of the scientific evidence from available studies were not as thoroughly addressed in the earlier IRIS assessments as in the later assessments. Only one of the eight pre-Pilot assessments was found to have appropriately addressed all of the substantive studies available at the time of the assessment. On the other hand, the evaluators considered that six of the eight Pilot/post-Pilot assessments appropriately addressed the strengths and weaknesses of the substantive studies available at the time of the assessments (see Versar report, Section 3).

Did EPA appropriately address the uncertainties in the underlying data, and uncertainties in the qualitative and quantitative judgments given in the assessment?

In addition to verifying whether the standard uncertainty factors of the time were applied appropriately to develop the provided RfD/RfC, the evaluators determined whether additional issues contributing to variability and uncertainty had been considered, such as mechanism of action, variations in species susceptibility, the potential existence of sensitive subpopulations, the relevance of the dosing regimen to likely human exposure pathways, and the relevance of the critical effect to humans. The evaluators found that these latter issues tended not to be addressed in the pre-Pilot summaries, with the exception of two (1,2-dibromo-3-chloropropane and manganese).

The evaluators raised similar concerns about the Pilot/post-Pilot summaries with respect to these issues.
Except for one assessment (methyl methacrylate), for which there was full agreement that the uncertainties of the assessment had been adequately addressed, there was a range of opinions on the other seven Pilot/post-Pilot IRIS summaries. That is, there was usually at least one evaluator who was dissatisfied with these summaries, on the basis of the lack of coverage of these more advanced scientific issues.

Reviewers' Recommendations. In addition to responding to the three questions above, there were some general themes in the evaluators' individual recommendations for improving IRIS assessments. First, the reviewers recommended development of a standardized approach to handling variability and uncertainty in IRIS assessments. It was also recommended that data quality issues be clarified in IRIS assessments. Specifically, toxicological experiments carried out before the advent of Good Laboratory Practices (GLPs) should be earmarked as such, since more uncertainty could attach to data generated before this standardization was implemented. Also, data from unpublished or non-peer-reviewed sources could carry similar uncertainties. The evaluators also emphasized that there did not appear to be enough consideration of the relevance of specific findings in animals to humans, both in the choice of critical effects and in exposure conditions. They also felt that the presumption that humans are more sensitive to environmental toxicants required more justification and discussion in most assessments.

In their individual reports (Appendix A of the Versar report), the evaluators made specific recommendations for improving the assessments they reviewed. These recommendations generally addressed the inclusion of more recent scientific information (such as mode of action, or discussion of the concordance of animal and human health endpoints) and pointed out instances where these data might support the use of more recently developed risk assessment methods (e.g., benchmark dose, quantitative uncertainty analysis).

5. DISCUSSION

The characterization of the extent of documentation of variability and uncertainty in chemical assessments in IRIS was accomplished using a tiered strategy: first by screening for the degree of this documentation in broad terms in a random sample, then in-depth in a smaller, targeted subsample. The representativeness of the in-depth evaluations for characterizing the rest of the database, first for the pre-Pilot IRIS assessments and then for the later IRIS assessments, is discussed below.

The screening evaluation of 10% of the pre-Pilot IRIS data base provided a baseline for characterizing the IRIS database. Recall that about two-thirds (63-79%) of the sample of pre-Pilot IRIS summaries were found to have none to minimal documentation of variability and uncertainty (see Section 4.1 above). Given the subjective nature of this evaluation, the additional review and consensus-building necessary to narrow this estimate did not appear warranted. Thus, it was concluded that approximately one-third (21-37%) of the pre-Pilot IRIS summaries demonstrated at least some documentation of the variability and uncertainties in deriving the toxicity values provided.

There was reasonable concordance for the pre-Pilot assessments between the screening evaluation and the in-depth review, given the different purposes of the two steps of the overall evaluation.
In particular, two assessments (hexachlorobenzene and prochloraz) were considered by the evaluators in their in-depth review to have inadequate documentation (see Versar report, Section 3). These assessments were also judged to have minimal rather than moderate documentation in the independent verification stage of the screening evaluation (EPA Screening Report, Appendix B). At the other end of the scale, the two assessments highlighted as the most thoroughly documented of the pre-Pilot in-depth sample (1,2-dibromo-3-chloropropane and manganese) were also considered to be extensively documented in the screening evaluation.

One apparent outlier involved an assessment determined in the screening evaluation to have moderate documentation, yet considered unanimously by the in-depth evaluators to have inadequate documentation of uncertainties (4-methylphenol; see Versar report, Section 3.1). While the degree of discussion in the summary was more detailed than was otherwise typical at the time (1990), the evaluators concluded that important aspects of uncertainty had been overlooked, e.g., incomplete use of data available at the time, and uncritical use of data from structural analogues that were not clearly relevant.

The correspondence of the screening evaluation and the in-depth evaluation for the Pilot/post-Pilot assessments was also complementary. It was found in the screening evaluation that the IRIS summary and Toxicological Review for the Pilot/post-Pilot assessments generally contained extensive documentation of variability and uncertainty. In the in-depth evaluation, the reviewers further examined the completeness of the discussions provided. While they concluded that the quality of the discussions varied, it was not always clear whether these remarks were addressed to the IRIS summary alone, the Toxicological Review alone, or to both.

In conclusion, the statistical sampling approach taken in choosing the assessments to review allows some generalization of the results of the screening evaluation and the in-depth evaluation to the rest of the IRIS data base. That is, based on a 10% sample, approximately two-thirds of the pre-Pilot IRIS summaries can be expected to contain minimal discussion of the variability and uncertainty inherent in the available toxicity values. The remaining third of the pre-Pilot IRIS summaries can be expected to contain at least moderate documentation of variability and uncertainty. Among assessments with at least moderate documentation of variability and uncertainty, the evaluators found in their in-depth review that coverage of relevant uncertainty and variability issues was uneven across the assessments they reviewed, with two of the eight assessments noticeably more comprehensive than the other pre-Pilot assessments. Among the Pilot/post-Pilot assessments, all but one demonstrated extensive documentation of variability and uncertainty, partly through the ready availability of the accompanying Toxicological Reviews. The evaluators' in-depth reviews of eight of these assessments noted a range in the quality of the discussion of relevant uncertainties in these assessments as well. One Pilot/post-Pilot assessment was highlighted as being more comprehensive than all of the other assessments examined in-depth.

The independent evaluators also made several recommendations for improving IRIS assessments, including the need for updating assessments.
EPA recognizes that many assessments in the IRIS database have not been updated and therefore may not reflect the latest scientific findings or current risk assessment methods. With respect to current risk assessment methods, EPA has been applying the revised cancer guidelines in all assessments underway since the guidelines were proposed, as noted in Section 2 above, but acknowledges that some unevenness in documentation exists while the Agency gains experience in applying them. Concerning "data-derived" uncertainty factors, it should be noted that EPA-published risk assessment guidelines support the use of relevant data to replace these defaults. Limitations in developing data-derived factors are mostly due to the unavailability of useful data to justify departure from defaults. EPA is developing guidance for risk assessors in the application of the "data-derived" approach to facilitate the maximum use of scientific data in replacing default UFs. Moreover, EPA acknowledges that the discussion of many of these underlying uncertainties in IRIS assessments can be improved.

One of the more recent risk assessment methods encouraged by several reviewers was quantitative uncertainty analysis. The goal of a quantitative uncertainty analysis is to clarify the overall degree of variability and uncertainty, and the confidence that can be placed in the analysis and its findings, through a systematic approach that accounts for relationships among the inputs or assumptions (in the case of risk assessment for IRIS, all of the data choices and uncertainty decisions discussed above) that contribute to a risk decision (in this case, a toxicity value). Quantitative choices must be made for each input, even for qualitative decisions. A number of recent documents have emphasized the importance of adequately characterizing variability and uncertainty in risk assessments and discuss quantitative uncertainty analysis in more detail (USEPA, 1992, 1995, 1997a,b; NRC, 1994). EPA's current Policy for Use of Probabilistic Analysis in Risk Assessment (USEPA, 1997b) provides that:

For human health risk assessments, the application of Monte Carlo and other probabilistic techniques has been limited to exposure assessments in the majority of cases. The current policy, Conditions for Acceptance and associated guiding principles are not intended to apply to dose response evaluations for human health risk assessment until this application of probabilistic analysis has been studied further.

Since it is the function of IRIS to implement Agency-approved published methodologies and Agency-wide policies, implementation of newer risk assessment methods in IRIS awaits an Agency-level mandate. In the meantime, EPA agrees that a thorough description of the available data and their related uncertainties can provide the IRIS user with a level of confidence in a particular assessment, and can lay the groundwork for later uncertainty analysis, should it be considered practical.
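Purely to illustrate what such an analysis involves, the sketch below propagates assumed distributions for a NOAEL and two default uncertainty factors through an RfD calculation by Monte Carlo simulation. Every distributional choice here is an assumption made for illustration, not an Agency-endorsed method, and, as the policy quoted above notes, probabilistic dose-response analysis is not current IRIS practice.

```python
# Illustrative Monte Carlo propagation of uncertainty through an RfD
# calculation. All distributions are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000

# Assumed lognormal uncertainty around a NOAEL of 10 mg/kg-day.
noael = rng.lognormal(mean=np.log(10.0), sigma=0.4, size=n)

# Treat each 10-fold default factor as spanning roughly 1 to 10 on a
# log scale, rather than as a fixed constant.
uf_human = 10.0 ** rng.uniform(0.0, 1.0, size=n)   # interhuman variability
uf_animal = 10.0 ** rng.uniform(0.0, 1.0, size=n)  # interspecies differences

rfd = noael / (uf_human * uf_animal)
p5, p50, p95 = np.percentile(rfd, [5, 50, 95])
print(f"RfD percentiles (mg/kg-day): 5th={p5:.3g}, "
      f"median={p50:.3g}, 95th={p95:.3g}")
```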
6. CONCLUSIONS

The results of the screening evaluation indicated that about a third of the IRIS summaries for pre-Pilot chemical assessments had at least some documentation of data variability and uncertainty, while a large majority of the Pilot/post-Pilot assessments (consisting of both IRIS summaries and Toxicological Reviews) had extensive documentation. While the documentation in assessments has improved overall since the IRIS Pilot's introduction of Toxicological Reviews to substantiate IRIS summaries, the results of the in-depth evaluation indicate that the quality of the characterization of data variability and uncertainty varies among the Pilot/post-Pilot assessments.

This study supports EPA's commitment to provide more transparent scientific bases for risk assessment conclusions. EPA will continue to look into ways to improve the documentation of variability and uncertainty issues in future Toxicological Reviews, and to recapitulate this information in IRIS summaries.

7. REFERENCES

Barnes, DG; Dourson, M. (1988) Reference dose (RfD): description and use in health risk assessments. Regulatory Toxicology and Pharmacology 8:471-486.

National Academy of Sciences (1983) Risk Assessment in the Federal Government: Managing the Process. National Academy Press: Washington, DC.

National Research Council (1994) Science and Judgment in Risk Assessment. National Academy Press: Washington, DC.

U.S. Environmental Protection Agency (1986) Guidelines for Carcinogen Risk Assessment. Federal Register 51(185):33992-34003.

U.S. Environmental Protection Agency (1991) Guidelines for Developmental Toxicity Risk Assessment. Federal Register 56(234):63798-63826, December 5, 1991.

U.S. Environmental Protection Agency (1992) Guidelines for Exposure Assessment. Federal Register 57:22888-22938. EPA/600/Z-92/001.

U.S. Environmental Protection Agency (1994a) Methods for Derivation of Inhalation Reference Concentrations and Application of Inhalation Dosimetry. Office of Health and Environmental Assessment, National Center for Environmental Assessment, Research Triangle Park, NC. EPA/600/8-90/066F.

U.S. Environmental Protection Agency (1994b) Peer Review and Peer Involvement at the U.S. EPA. Memorandum of the Administrator, Carol M. Browner, June 7.

U.S. Environmental Protection Agency (1995) EPA Guidance on Risk Characterization. Memorandum of the Administrator, Carol M. Browner, March 21.

U.S. Environmental Protection Agency (1995c) Use of the Benchmark Dose Approach in Health Risk Assessment. Office of Research and Development. EPA/630/R-94/007, February 1995.

U.S. Environmental Protection Agency (1996a) Proposed Guidelines for Carcinogen Risk Assessment; Notice. Federal Register 61(79):17960-18011.

U.S. Environmental Protection Agency (1996b) Guidelines for Reproductive Toxicity Risk Assessment. Federal Register 61(212):56274-56322, October 31, 1996.

U.S. Environmental Protection Agency (1997a) Guiding Principles for Monte Carlo Analysis. Office of Research and Development. EPA/630/R-97/001.

U.S. Environmental Protection Agency (1997b) Policy for Use of Probabilistic Analysis in Risk Assessment. Memorandum of the Deputy Administrator, Fred Hansen, May 15.

U.S. Environmental Protection Agency (1998) Guidelines for Neurotoxicity Risk Assessment. Federal Register 63(93):26926-26954.

U.S. Environmental Protection Agency (1999) Proposed Guidelines for Carcinogen Risk Assessment. External Review Draft.