EPA/625/3-90/017
September 1989

Workshop Report on EPA Guidelines for Carcinogen Risk Assessment: Use of Human Evidence

Assembled by: Eastern Research Group, Inc., 6 Whittemore Street, Arlington, MA 02174
EPA Contract No. 68-02-4404

for the Risk Assessment Forum Technical Panel on Carcinogen Guidelines
U.S. Environmental Protection Agency, Washington, DC 20460

NOTICE

Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

This workshop was organized by Eastern Research Group, Inc., Arlington, Massachusetts, for the EPA Risk Assessment Forum. ERG also assembled and produced this workshop report. Sections from individual contributors were edited somewhat for clarity, but contributors were not asked to follow a single format. Relevant portions were reviewed by each workshop chairperson and speaker. Their time and contributions are gratefully acknowledged. The views presented are those of each contributor, not the U.S. Environmental Protection Agency.

CONTENTS

INTRODUCTION ... 1
MEETING AGENDA ... 5
COLLECTED WORKSHOP MATERIALS ... 7
  Study Design and Interpretation ... 9
    Chair Summary ... 15
  EPA Classification System for Categorizing Weight of Evidence for Carcinogenicity from Human Studies ... 23
    Chair Summary ... 27
  Dose-Response Assessment ... 39
    Chair Summary ... 42

APPENDICES
  Appendix A: EPA Risk Assessment Forum Technical Panel on Carcinogen Guidelines and Associates ... 47
  Appendix B: List of Participants ... 51
  Appendix C: List of Observers ... 57
  Appendix D: Introductory Plenary Session Comments (Drs. Philip Enterline, Raymond R. Neutra, and Gerald Ott) ... 61
  Appendix E: 1986 Guidelines for Carcinogen Risk Assessment ... 81

WORKSHOP REPORT ON EPA GUIDELINES FOR CARCINOGEN RISK ASSESSMENT: USE OF HUMAN EVIDENCE
June 26-27, 1989
Washington, DC

INTRODUCTION

1. Guidelines Development Program

On September 24, 1986, the U.S.
Environmental Protection Agency (EPA) issued guidelines for assessing human risk from exposure to environmental carcinogens (51 Federal Register 33992-34003). The guidelines set forth principles and procedures to guide EPA scientists in the conduct of Agency risk assessments, to promote high scientific quality and Agency-wide consistency, and to inform Agency decision-makers and the public about these scientific procedures. In publishing this guidance, EPA emphasized that one purpose of the guidelines was to "encourage research and analysis that will lead to new risk assessment methods and data," which in turn would be used to revise and improve the guidelines. Thus, the guidelines were developed and published with the understanding that risk assessment is an evolving scientific undertaking and that continued study would lead to changes.

As expected, new information and thinking in several areas of carcinogen risk assessment, as well as accumulated experience in using the guidelines, have led to an EPA review to assess the need for revisions in the guidelines. On August 26, 1988, EPA asked the public to provide information to assist this review (53 Federal Register 52656-52658). In addition, EPA conducted two workshops to collect further information. The first workshop for analysis and review of these issues was held in Virginia Beach, Virginia, on January 11-13, 1989 (53 Federal Register 49919-20). That workshop brought together experts in various areas of carcinogen risk assessment to study and comment on the use of animal evidence in considering both qualitative issues in classifying potential carcinogens and quantitative issues in dose-response and extrapolation. The report from this workshop was made available to the public on April 24, 1989 (54 Federal Register 16403).
On June 16, 1989, the Agency announced that a workshop for the study and review of the use of human evidence in risk assessment would be held in Washington, D.C., on June 26 and 27, 1989 (54 Federal Register 25619). This report is a compilation of the discussions and presentations from that meeting. As with the Virginia Beach meeting, the Agency's intention was not to achieve consensus on or resolution of all issues. It was hoped instead that these workshops would provide a scientific forum for objective discussion and analysis.

These workshops are part of a three-stage process for reviewing and, as appropriate, revising EPA's cancer risk assessment guidelines. The first stage began with several information-gathering activities to identify and define scientific issues relating to the guidelines. For example, EPA scientists and program offices were invited to comment on their experiences with the 1986 cancer guidelines. Also, the August 1988 Federal Register notice asked for public comment on the use of these guidelines. Other information was obtained in meetings with individual scientists who regularly use the guidelines. Information from the workshops and these other sources will be used to decide when and how the guidelines should be revised.

In the second stage of the guidelines review process, EPA is analyzing the information described above to make decisions about changing the guidelines, to determine the nature of any such changes, and, if appropriate, to develop a formal proposal for peer review and public comment. EPA's analysis of the information collected so far suggests several possible outcomes, ranging from no changes at this time to substantial changes for certain aspects of the guidelines.

In the third stage of this Agency review, any proposed changes would be submitted to scientific experts for preliminary peer review, and then to the general public, other federal agencies, and EPA's Science Advisory Board for comment.
All of these comments would be evaluated in developing final guidance.

2. Human Evidence Workshop

On June 26 and 27, 1989, epidemiologists and others met in Washington, D.C., to study and comment on the scientific foundation for possible changes in the human evidence sections of the 1986 carcinogen guidelines. Although these guidelines emphasize that reliable human evidence takes precedence over animal data, guidance on the use of human evidence is considerably less detailed than that for animal data. Thus, workshop discussions focused on the possibilities of expanding and clarifying the guidelines by adding new language for (1) study design and interpretation, (2) quantification of human data, and (3) weight-of-evidence analyses for human data.

The workshop participants met both in plenary sessions and in separate work groups to consider "strawman" language for potential inclusion in revised guidelines for carcinogen risk assessment. They also addressed related questions posed by the EPA Technical Panel. The work group on study design issues was chaired by Dr. Marilyn Fingerhut, Chief, Industrywide Studies Branch, at the National Institute for Occupational Safety and Health (NIOSH). Dr. Philip Enterline, Emeritus Professor of Biostatistics at the University of Pittsburgh School of Public Health, chaired the work group on dose-response issues. The work group on weight-of-evidence classification issues was chaired by Dr. Raymond Neutra, Chief of the Epidemiologic Studies Section of the California Department of Health Services. Dr. Enterline was the overall chair for the workshop.

The strawman language and questions were developed by a subcommittee of the EPA Technical Panel. These documents were intended to initiate and guide work group discussions rather than to formally propose specific language or policy. Members of the EPA Technical Panel also participated in each work group.
Other EPA scientific staff and the public attended the workshop as observers.

As a scientific forum for objective discussion and analysis among the invited panelists, the workshop was designed to assist EPA epidemiologists and scientists in developing the scientific foundation for proposed guidance on the use of human evidence in risk assessment. Broader policy issues will become important later in the process, when the public is invited to review any proposed changes in the guidelines.

EPA CANCER GUIDELINES REVIEW WORKSHOP ON HUMAN EVIDENCE
June 26-27, 1989

AGENDA AND WORKGROUP ASSIGNMENTS
Chairman: Dr. Philip Enterline

Monday, June 26

7:30 a.m.          Registration/Check-in (All)
8:30 a.m.          Welcome (Dr. Patton)
8:35 a.m.          Opening Comments (Dr. Enterline)
8:50 a.m.          Public Interest Views (Dr. Neutra and Panelists)
9:20 a.m.          Private Sector Views (Dr. Ott and Panelists)
9:50 a.m.          Administrative Announcements (Ms. Schalk)
10:00-10:20 a.m.   COFFEE BREAK
10:20 a.m.         Observer Comments
11:30 a.m.         Charge to Work Groups (Dr. Farland)
12:00-1:15 p.m.    LUNCH
1:15 p.m.          Workgroups: A: Study Design and Interpretation (Dr. Fingerhut, Chair); B: Weight of Evidence (Dr. Neutra, Chair); C: Dose Response (Dr. Enterline, Chair)
3:15-3:30 p.m.     COFFEE BREAK
3:30 p.m.          Workgroups A & B (Drs. Neutra & Fingerhut); C (Dr. Enterline)
5:30 p.m.          Adjourn
5:30-7:00 p.m.     Cash Bar Reception

Tuesday, June 27

8:00 a.m.          Workgroup Reports and Discussion (Workgroup Chairs: Drs. Enterline, Fingerhut & Neutra; Panelists)
10:15-10:30 a.m.   BREAK
10:30 a.m.         Observer Comments and Discussion
11:30 a.m.         Workgroup Recommendations (Drs. Fingerhut, Neutra, Enterline)
12:15 p.m.         Wrap-up (Dr. Enterline)
12:30 p.m.         ADJOURNMENT

WORKGROUP ASSIGNMENTS

Workgroup A: Study Design and Interpretation
Chair: Dr. Fingerhut
Members: Drs. Matanoski, Hulka, Buffler, Cantor, Friedlander, D. Hill, Halperin, Hogan, Koppikar

Workgroup B: Weight of Evidence
Chair: Dr. Neutra
Members: Drs.
Cole, Ott, Blair, Falk, Infante, M. Chu, Bayliss, Blondell, Margosches

Workgroup C: Dose-Response
Chair: Dr. Enterline
Members: Drs. Crump, Checkoway, Gibb, Raabe, Smith, Krewski, Chen, Nelson, K. Chu, Farland, Scott

EPA Technical Panel: Drs. Farland, R. Hill, Patton, M. Chu, Rhomberg, Wiltse, Rees, Gibb, Bayliss, Blondell, Chen, D. Hill, Hogan, Margosches, Nelson, Scott

Risk Assessment Forum Staff: Drs. Patton, Rees

COLLECTED WORKSHOP MATERIALS

Study Design and Interpretation
  Strawman Language and Related Questions
  Chair Summary of Work Group Session

EPA Classification System for Categorizing Weight of Evidence for Carcinogenicity from Human Studies
  Strawman Language and Related Questions
  Chair Summary of Work Group Session

Dose-Response Assessment
  Strawman Language and Related Questions
  Chair Summary of Work Group Session

STUDY DESIGN AND INTERPRETATION

Strawman Language and Related Questions

Introduction and Study Types[1]

Epidemiologic studies provide unique information about the response of humans who have been exposed to suspect carcinogens. These studies allow the possible evaluation of the consequences of an environmental exposure in the precise manner in which it occurs and will continue to occur in human populations (OSTP, 1985). The various types of studies, or study designs, are well described and defined in textbooks and other documents (e.g., Breslow et al., 1980, 1987; Kelsey et al., 1986; Lilienfeld et al., 1979; Mausner et al., 1985; Rothman, 1986). The more common types are described below.

A variety of study designs are considered to be hypothesis-generating. In general, these studies utilize already existing collections of data (e.g., vital statistics, census data), but produce only indirect associations, because they are based on broadly defined group or population characteristics.
Studies depending on case reports typically are also considered hypothesis-generating, because the relatively limited numbers of cases and the absence of comparison groups generally do not permit causal inferences. Cross-sectional studies are also generally hypothesis-generating. Sometimes a population may be well enough followed or restricted that bias is unlikely to arise from migration, mortality, or similar removal from the observed group; a cross-sectional or prevalence study may then offer some sort of risk estimate.

Epidemiologic studies designed to test a specific hypothesis, such as case-control and cohort studies, are more useful in assessing risks to exposed humans. These studies examine the characteristics of individuals within a population. Case-control studies can provide reasonable estimates of population-based risk when controls are properly chosen, while cohort designs have the best capability to provide accurate estimates of population-based risk. Under certain circumstances, case-report studies may support causal associations, and prevalence studies may provide population-based risks.

[1] Editor's Note: Unless otherwise noted, paragraph numbers refer to the Human Studies section of the 1986 Guidelines for Carcinogen Risk Assessment in Appendix E of this document.

Issues: We have noted that use of the "descriptive/analytical" characterization of the array of study designs may provoke classification disagreements, detracting from the desired focus on which designs have what utility for use in risk assessment. For that reason, two sentences previously in the Guidelines have been omitted. Does the Panel believe they should be restored or supplied in some other fashion, or does the current text provide sufficient discussion? Should PMR studies or clusters be specifically addressed? Where do they fit in? Should the guidelines point out that studies designed specifically to test hypotheses can also generate other hypotheses, while maintaining the distinction between these types of information?

Adequacy (to replace current paragraph 2)

Criteria for the adequacy of epidemiologic studies for risk assessment purposes include, but are not limited to, factors which depend on the study design and conduct:

1. The proper selection and characterization of study and comparison cases or groups.
2. The adequacy of response rates and methodology for handling missing data.
3. Clear and appropriate methodology for data collection and analysis.
4. The proper identification and characterization of confounding factors and bias.
5. The appropriate consideration of latency effects.
6. The valid ascertainment of the causes of morbidity and death.
7. Complete and clear documentation of results.

For studies claiming to show no evidence of human carcinogenicity associated with an exposure, the statistical power to detect an appropriate outcome should be included in the assessment, if it can be calculated. It should be noted that sufficient statistical power alone does not determine the adequacy of a study.

Although not unique to human studies, it is important to reiterate that sufficient and thorough evaluation of suspect carcinogens requires that evidence be available in a form, quality, and quantity suitable for assessment. In some cases, the availability of and access to raw data may be important. Guidelines for reporting epidemiological research results have been previously published (IRLG, 1981; others?).

Issues:
a. Application of the various criteria depends on study type; most listed here usually apply to case-control and cohort studies. Is this a problem?
b. Should we address rationales for combining sites and tumor types in this section, as done in the animal section?
c.
Given the discussion in the weight-of-evidence section, do we need much more here on what constitutes an adequate study?
d. Any suggestions for appropriate citations?
e. Is there too much emphasis on statistical power?

Criteria for Causality (paragraphs 3 and 4 deleted: new text)

Epidemiologic data are often used to infer causal relationships. Many forms of cancer are stated as causally related to exposure to agents for which there is no direct biological evidence, most notably cigarette smoking and lung cancer. Because insufficient knowledge about the biological basis for disease in humans makes it difficult to classify exposure to an agent as causal, epidemiologists and biologists have provided a set of criteria that define relationships about data. A causal interpretation is enhanced for studies that meet the following criteria. None of these criteria actually proves causality; actual proof is rarely attainable when dealing with environmental carcinogens. The absence of any one or even several of these criteria does not prevent a causal interpretation; none of these criteria should be considered either necessary or sufficient in itself.

Criteria for causality are:

1. Consistency: Several independent studies of the same exposure in different populations, all demonstrating an association which persists despite differing circumstances, usually constitute strong evidence for a causal interpretation (assuming the same bias or confounding is not also duplicated across studies). This criterion also applies if the association occurs consistently for different subgroups in the same study.

Issue: Diverse responses from similar populations (races or species) lend weight to human conclusions but seem to detract from animal ones. How should we address this inconsistency?

2. Strength (magnitude) of association: The greater the estimate of the risk of cancer due to exposure to the agent, the more credible will be a causal interpretation.
It is less likely that nonrandom error (e.g., bias or some confounding variable) or chance can explain a strong association, because these factors would themselves have to be highly associated with the disease. A weak association might be more readily explained by the presence of chance or bias.

Issues:
a. Should we provide a guideline value, such as a relative risk of 5.0, since magnitude of association can also depend on, for instance, the variety and range of magnitudes of exposure present and the rarity of the cancer?
b. Should the example of smoking-alcohol-esophageal cancer be considered here?

3. Temporal relationship: The disease occurs within a reasonable enough time frame after the initial exposure to account for the health effect. Cancer requires a latent period during which transformation of neoplasia into malignancy occurs and a period of time passes before discovery. While latency periods vary, existence of the period is acknowledged. Since the time of transformation is seldom known, however, the initial period of exposure to the agent is the accepted starting point in most epidemiologic studies.

4. Dose-response or biologic gradient: An increase in the measure of effect is correlated positively with an increase in the exposure or estimated dose. A strong dose-response relationship across several categories of exposure can be considered to be evidence for causality if confounding effects are unlikely to be correlated with exposure levels. The absence of a dose-response gradient, however, may mean only that the maximum effect had already occurred at the lowest dose, or perhaps that all gradients of exposure were too low to produce a measurable effect. The absence of a dose-response relationship should not be construed as evidence of a lack of a causal relationship.

5.
Specificity of the association: If a single, clearly defined exposure is associated with an excess risk of one or more site-specific cancers, while other sites show no association, it increases the likelihood of a causal interpretation. Different agents, however, may be responsible for more than one site-specific cancer. Replication of the specific association(s) in different population groups (cf. consistency) would then be needed to provide strong support for a causal interpretation.

Issue: Shall we retain this last sentence? A comment has been made that specific locations (i.e., microenvironments) influence expression of cancer.

In some cases, conclusions regarding an association may be based on a mixture of chemicals rather than the specific chemical in question. In these cases, judgment on the causal relationship to the specific chemical will depend on such other information as the pharmacokinetics of the chemical or other biologic or epidemiologic data.

Issue: Shall we include: In some instances, it may be concluded that only the mixture can be held culpable, e.g., in the process to produce benzyl chloride.

6. Biological plausibility: The association makes sense in terms of what is known about the biologic mechanisms of the disease or other epidemiologic knowledge. It is not inconsistent with biological knowledge about how the exposure under study could produce the cancer.

7. Collateral evidence: A cause-and-effect interpretation is consistent with what is known about the natural history and biology of the disease. A proposed association that conflicted with existing knowledge would have to be examined with particular care.

References

Breslow, N.E., and Day, N.E. Statistical Methods in Cancer Research, Vol. 1: The Analysis of Case-Control Data. 1980.

Breslow, N.E., and Day, N.E. Statistical Methods in Cancer Research, Vol. 2: The Design and Analysis of Cohort Studies. 1987.

Interagency Regulatory Liaison Group (IRLG).
Guidelines for documentation of epidemiologic studies. American Journal of Epidemiology 114:609-613. 1981.

Kelsey, J.L., Thompson, W.D., and Evans, A.S. Methods in Observational Epidemiology. 1986.

Lilienfeld, A.M., and Lilienfeld, D. Foundations of Epidemiology, 2nd ed. 1979.

Mausner, J.S., and Kramer, S. Epidemiology, 2nd ed. 1985.

Office of Science and Technology Policy (OSTP). Chemical carcinogens: review of the science and its associated principles. Federal Register 50:10372-10442. 1985.

Rothman, K.J. Modern Epidemiology. 1986.

Chair Summary of Work Group Session on Study Design and Interpretation

Chair: Dr. Marilyn Fingerhut

Introduction

The Study Design Work Group focused on three questions contained in the strawman language suggested by the EPA staff as a preliminary revision of Section II.B.7: 1) What types of epidemiologic studies are acceptable to the EPA for risk assessment? 2) What characteristics are desirable in a study to be used for risk assessment? and 3) What criteria strengthen the view that an epidemiologic association may reflect a causal relationship?

The Study Design Work Group and the Weight-of-Evidence Work Group met together to consider the first question and to agree upon the types of studies to be discussed in each Work Group. The members of the Study Design Group discussed Questions 2 and 3. The sections below briefly describe the discussions pertaining to the three questions and identify the recommendations and suggestions made to EPA. The sections contain revised strawman language for Section II.B.7, which reflects the ideas suggested by the Study Design Work Group. This chair's summary presents the work group's views within the context of the strawman document.

Question 1: What types of human studies are acceptable to the EPA for purposes of risk assessment?
The members of both the Study Design and Weight-of-Evidence groups discussed this question in some detail and concluded that all valid epidemiologic studies can contribute information to an EPA risk assessment. Consequently, the panelists rejected suggestions by a few members to weight certain study types more heavily than others. There was general agreement that the various types of epidemiologic studies, properly conducted, could be useful. These include cohort, case-control, cross-sectional, proportional mortality (incidence) ratio, cluster, clinical trial, and correlational studies. Each type has strengths and limitations. It was agreed that "case reports" do not constitute studies, but that some of these should be reviewed by EPA during a risk assessment effort, because series of case reports have provided key information about human risk for several chemicals. Vinyl chloride was one example cited.

Both groups strongly recommended that EPA obtain additional experienced epidemiologists to evaluate epidemiologic data and to assist in risk assessments, because professionally sophisticated judgments are required when evaluating the studies.

The Study Design Group reviewed the EPA strawman language suggested as a replacement for the current paragraph 1 of Section II.B.7, Human Studies. The group generally agreed that the proposed revision should not be used. The group suggested a brief replacement paragraph:

Introduction and Study Types (to replace the current paragraph 1)

Epidemiologic studies with various study designs can provide unique information about the response of humans who have been exposed to suspect carcinogens. Each study must be evaluated for its individual strengths and limitations. Conclusions about causal associations usually also include consideration of the entire body of literature, including toxicology and biologic mechanisms.

The Study Design Work Group suggested that guidelines be written for use by experienced epidemiologists.
Therefore, the following responses were given to the questions posed by the EPA on page 2 of the strawman text (p. 9 of this document). There is no need to distinguish studies as analytical vs. descriptive, hypothesis-generating vs. hypothesis-testing, or complete vs. incomplete (as suggested by one member of the group), because experienced epidemiologists, who are aware of the strengths and limitations of the various study designs, will judge studies by their inherent validity and applicability to the particular risk assessment. For this reason, there is no need to specifically address proportional mortality ratios (PMRs) or clusters, or to address the distinction between hypothesis-generating and hypothesis-testing in the guidelines.

The Study Design Work Group recognized that it may be desirable to provide in the guidelines an overview of epidemiologic principles and study types. The information would be useful for professionals trained in other disciplines. The information could also explain to the public how the EPA uses human studies in risk assessment. The group suggested that this overview of epidemiology might be placed in an appendix.

The group suggested that EPA continue to provide epidemiologic training to nonepidemiologists in the Agency who are involved with risk assessment activities. However, the key judgments on epidemiologic studies should be made by experienced epidemiologists. Upon learning that EPA has very few epidemiologists on staff, the group recommended expanding this expertise in the Agency.

A few of the members of the combined Study Design and Weight-of-Evidence Groups suggested that EPA might wish to consider using, for risk assessment, the approach of the International Agency for Research on Cancer (IARC), in which the entire evaluation of human data is conducted by expert epidemiologists and is thus free from political interference. Other participants observed that political concerns may influence any such group.
They suggested that the regular use of EPA staff provides an objective approach to risk assessment. There was some discussion but no agreement in the groups about this point.

Question 2: What characteristics are desirable in a study used for risk assessment?

The EPA had provided strawman language to replace paragraph 2 of Section II.B.7, Human Studies, which focused on the question of "criteria for adequacy of epidemiologic studies for risk assessment purposes." Because the members of the group generally agreed with the view that all valid epidemiologic studies may contribute information to a risk assessment, discussion by the Study Design Work Group led to substitution of a different question: "What characteristics are desirable in a study used for risk assessment?"

Since each type of study has particular characteristics, strengths, and limitations, the group suggested revising the EPA strawman language for paragraph 2 to describe characteristics desirable (rather than required) for risk assessment. Several new characteristics were added to those identified by the EPA version. The following suggested revision is a restatement of ideas from the Group and should not be considered a polished or finished revision.

Adequacy (to replace current paragraph 2)

Criteria for the adequacy of epidemiologic studies are well recognized. Considerations made for risk assessment should recognize the characteristics, strengths, and limitations of the various epidemiologic study designs. Characteristics which are desirable in the epidemiologic studies are listed here.

1. Relevance
- The study deals with the exposure-response relationship central to the risk assessment.

2. Adequate Exposure Assessment
- Study subjects have exposure.
- Analysis deals with time-related measures as far as study type permits, e.g., duration, intensity, age at first exposure, etc.

3.
Proper Selection and Characterization of Study and Comparison Groups
- Selection and characterization are carefully described.
- Source population is appropriate.
- Results are generalizable to populations to be protected by the risk assessment.

4. Identification of A Priori Hypotheses

5. Adequate Sample Size

6. Adequate Response Rates and Methodology for Handling Missing Data

7. Clear and Appropriate Methodology for Data Collection and Analysis

8. Proper Identification and Characterization of Confounding and Bias

9. Appropriate Consideration of Latency Effects

10. Valid Ascertainment of Causes of Morbidity and Death

11. Complete and Clear Documentation of Results

The panelists recommended that EPA continue to actively seek available unpublished studies, if an unpublished report (or the documentation for a published report) might contribute to the risk assessment process.

Question 3: What criteria strengthen the view that an epidemiologic association may reflect a causal relationship?

Strawman language had been provided by EPA to substitute for paragraphs 3 and 4 of Section II.B.7, Human Studies. The Study Design Work Group suggested that the EPA staff consider rewriting this text to express an historical approach, indicating that Koch's postulates were modified by Bradford Hill for use in environmental studies, and that his criteria have been modified by EPA for considerations relevant to risk assessment.

The panelists were in general agreement on most points that are contained in the suggested text below. Some members suggested deleting "specificity." They viewed it as misleading or incorrect, based upon the view that most agents are observed to cause several effects. However, all agreed that, as expressed below, it is a useful criterion when it is present. The panelists agreed that only one criterion (temporal relationship) was essential for causality.
The presence of other criteria may increase the credibility of a causal association, but their absence does not prevent a causal interpretation. The panelists viewed all but specificity and coherence as applicable to an individual study. The panelists' ideas for a suggested revision follow:

Criteria for Causality (paragraphs 3 and 4 deleted: new text)

Epidemiologic data are often used to infer causal relationships. A causal interpretation is enhanced for studies to the extent that they meet the criteria described below. None of these actually establishes causality; actual proof is rarely attainable when dealing with environmental carcinogens. The absence of any one or even several of the others does not prevent a causal interpretation. Only the first criterion (temporal relationship) is essential to a causal relationship; with that exception, none of the criteria should be considered either necessary or sufficient in itself. The first six criteria apply to an individual study. The last criterion (coherence) applies to a consideration of all evidence in the entire body of knowledge.

1. Temporal relationship: This is the single absolute requirement, which itself does not prove causality, but which must be present if causality is to be considered. The disease occurs within a biologically reasonable time frame after the initial exposure to account for the specific health effect. Cancers require certain latency periods. While latency periods vary, existence of the period is acknowledged. The initial period of exposure to the agent is the accepted starting point in most epidemiologic studies.

2. Consistency: When compared to several independent studies of a similar exposure in different populations, the study in question demonstrates a similar association which persists despite differing circumstances. This usually constitutes strong evidence for a causal interpretation (assuming that the same bias or confounding is not also duplicated across studies).
This criterion also applies if the association occurs consistently for different subgroups in the same study.

3. Strength (magnitude) of association: The greater the estimate of risk and the more precise the estimate (narrow confidence limits), the more credible the causal association.

4. Dose-response or biologic gradient: An increase in the measure of effect is correlated positively with an increase in the exposure or estimated dose. A strong dose-response relationship across several categories of exposure, latency, and duration is supportive, although not conclusive, of causality, assuming confounding effects are unlikely to be correlated with exposure levels. The absence of a dose-response gradient, however, may be explained in many ways. For example, it may mean only that the maximum effect had already occurred at the lowest dose, or perhaps all gradients of exposure were too low to produce a measurable effect. If present, this characteristic should be weighted heavily in considering causality. However, the absence of a dose-response relationship should not be construed by itself as evidence of a lack of a causal relationship.

5. Specificity of the association: In the study in question, if a single exposure is associated with an excess risk of one or more cancers also found in other studies, it increases the likelihood of a causal interpretation. Most known agents, however, are responsible for more than one site-specific cancer. Therefore, if this characteristic is present, it is useful; its absence, however, is uninformative.

6. Biological plausibility: The association makes sense in terms of biological knowledge. Information from toxicology, pharmacokinetics, genotoxicity, and in vitro studies should be considered.

7. Coherence: This characteristic is used to evaluate the entire body of knowledge about the chemical in question.
Coherence exists when a cause-and-effect interpretation is in logical agreement with what is known about the natural history and biology of the disease. A proposed association that conflicts with existing knowledge would have to be examined with particular care.

In a joint session of the Study Design and Weight-of-Evidence Groups at the end of the meeting, some panelists noted the desirability of having epidemiologic data available for use in risk assessment at the time that animal studies are completed by the National Toxicology Program (NTP). Some suggested that EPA consider undertaking an effort to assess the feasibility of conducting a human epidemiologic study at the same time the Agency recommends that NTP undertake an animal study. There was only limited discussion of this point. Some panelists objected to it, mainly because of logistic difficulties.

EPA CLASSIFICATION SYSTEM FOR CATEGORIZING WEIGHT OF EVIDENCE FOR CARCINOGENICITY FROM HUMAN STUDIES

Strawman Language and Related Questions

Assessment of Weight of Evidence for Carcinogenicity from Studies in Humans

There are a variety of sources of human data. When the totality of human evidence is considered, the conditions under which the information has been collected are important in defining the limits of its inference. These limits are particularly critical for studies in which no positive results have been seen, although they contribute to conclusions in all circumstances. In the evaluation of carcinogenicity based on epidemiologic studies, it is necessary to consider the roles of extraneous factors such as bias and other nonrandom error and of chance (random error), and how they might affect evaluation and estimates of an agent's effects. Some extraneous factors of concern are selection bias, information bias, and confounding. Five classifications of human evidence are established in this section.
The following discussion includes some interpretation and illustration of their use.

1. The category of sufficient implies the existence of a causal relationship between the exposure in question and an elevation of cancer risk. Most if not all of the criteria for causality as defined in Section II.B.7 should be met. Most agents or mixtures falling into this category would require at least one methodologically sound epidemiologic study meeting most of the criteria for causality and whose results cannot be explained by chance, bias, or confounding.

Issues:
a. If one such study is available, would others be needed as confirmation?
b. Is it necessary to specify what "most criteria" are?

Sometimes a case series will present data that drive a causal conclusion.

Issues: Are supporting studies needed? Language that might serve is: One or more supporting epidemiologic studies that also demonstrate a relationship between the exposure and cancer should be available. The latter studies need not be definitive by themselves, although the stronger they are in terms of their validity, the more credible will be a "sufficient" categorization of the epidemiologic data.

Sometimes studied populations will differ only in cumulative dose or in dose rate. Should a conclusion of carcinogenicity be limited to circumstances of exposure? Corollary: Shall we include discussion of the evaluation of a body of studies where some show effects at one site and some show them at another, or where some are of different ethnic or geographic groups, or where there are age and sex differences?

The Agency receives studies, some by statute, from a variety of sources, that may not have appeared in the open or peer-reviewed literature. Should comment be made regarding our intent to use such studies?

2.
The category of limited implies that a causal interpretation is more credible than nonrandom error, although nonrandom error cannot be entirely ruled out as an explanation for the statistically significant positive association found in at least one epidemiologic study. Such studies would typically include a vigorous effort by the author, or be carefully reviewed by the Agency, to explain why nonrandom error (confounding, information bias, etc.) is unlikely to account for the association. Also included in the limited category are agents for which the evidence consists of some number of independent studies exhibiting statistically significant positive associations between the exposure and the same site-specific cancer, but for which nonrandom error could not be ruled out entirely as the explanation for the association in each study. This category may also include substances for which a series of epidemiologic studies (some number of which must be considered valid) exhibit apparent but not significant positive associations for the same site-specific cancer, without any series of valid studies showing an apparent lack of association to counter the observed association.

Issues:
a. How many studies would support each conclusion?
b. Is it necessary for responses in a series of studies to be site-specific in order to fall into the limited category?

3. The category of inadequate implies that the data, although perhaps suggestive, do not meet the criteria for a limited categorization of the evidence. This would include studies that demonstrate statistically significant positive associations that could be explained by the presence of nonrandom error and that are not site-specific. Also included in this category are studies deemed of insufficient quality or statistical power, and for which there is no confidence in any particular interpretation.
For example, results may be consistent with a chance effect, or exposure may not clearly be tied to the agent in question. Alternatively, a study may be reported in a way that renders it incapable of being evaluated owing to insufficient documentation.

Issue: Is it proper to modify or downgrade a category by such language as "Studies showing no positive results can be used to lower the classification from limited to inadequate . . ."?

Possible choices to complete this statement might be:

"only if the exact same conditions (including sensitivity) have been replicated in a statistically significantly positive study of the kind described under this category. The results would thus be contradictory, and the net effect of the latter study would be to negate the findings of the former."

or

"if they are at least as likely to detect an effect as an already-completed study providing limited evidence."

Corollary: How explicit should we be?

4. The category of no data indicates that no data are available directly regarding humans.

5. The category of evidence of not being a carcinogen in humans is reserved for circumstances in which the body of evidence indicates that no association exists between the suspected agent and an increased cancer risk. It should be recognized that alterations in the conditions under which a study is done may lead to statistically significant risk estimates where they did not exist before. Studies of uncertain quality with no positive results should not be used to reduce the weight of evidence.

Issues:
a. We are reluctant to use the word "negative" because it has come to mean a variety of things, including (1) a study with no cases of cancer, (2) a study judged statistically to have no excess cases of cancer, and (3) a study that leads readers to believe there is no need for concern about carcinogenicity. We do not wish to perpetuate the misuse of the term "negative" when referring to certain epidemiologic studies.
Have we adequately described the circumstances under which we would conclude that the body of epidemiologic evidence suggests an agent is not a carcinogen?
b. Do we need this category? Will it ever be used?

Chair Summary of Work Group Session on Classification System for Categorizing Weight of Evidence for Carcinogenicity from Human Studies

Chair: Dr. Raymond R. Neutra

I. INTRODUCTORY DISCUSSIONS

Weighing evidence refers to the act of reviewing and summarizing human evidence ranging from case studies to randomized trials. Evidence that suggests positive, null, or even protective carcinogenic effects is considered, while taking note of the quality of each piece of information. The group seemed to advocate a procedure that considers all study results regardless of direction, rather than only positive studies; in other words, a "weight of evidence" approach rather than a "strength of evidence" approach.

One workshop participant pointed out that those who review human evidence assign some informal prior probability to the hypothesis that the substance under investigation causes cancer in humans. Without advocating formal Bayesian statistical procedures, it should be noted that this "prior probability" is influenced by the nature of the substance, information on its metabolism, and its behavior in short-term tests. The group decided that results of animal bioassays or subchronic tests should not influence judgments on the prior probability or the interpretation of the human studies, since a separate process in EPA deals with the weight of animal evidence. In a subsequent process, the two streams of evidence will be combined by scientists to give a final "posterior" characterization of the evidence.

There was a discussion of nomenclature. It was agreed that the adjective "negative" should be avoided, as it is ambiguous. It has been used to mean "bad," "protective effect," "no effect," or "absent."
For the purposes of the work group it was agreed that "null" would be used for a study that had a relative risk close to 1.0 with confidence limits that included 1.0, and that "inverse association" is the appropriate terminology for a study that showed a relative risk less than 1.0 with confidence limits that did not include 1.0. A positive study is one with a relative risk greater than 1.0 and confidence limits that do not include 1.0.

The group recognized that the evaluation of a body of evidence depends on, but is different from, the process of evaluating individual studies. The latter process was discussed by the Study Design and Evaluation work group, which listed a series of study characteristics and criteria for likely causality that should be considered in characterizing a single study. In the process of evaluating the evidence from a single study, one of the following categories might be designated: clear, some, equivocal, or no evidence of human carcinogenicity. The study design group felt that it was not possible or desirable to have a rigid algorithm for making this categorical determination on a particular study. For the purposes of discussion in the weight-of-evidence work group, these terms were used even though definitions were not developed by that group.

Similarly, the weight-of-evidence group was not in favor of a rigid algorithm for combining evidence among individual studies. Instead, it was suggested that a review of all the studies by a qualified group could lead to an ordinal classification. It was agreed that assigning a ratio-scale numerical score of evidentiary sufficiency would not be helpful, since the Agency would need to categorize the score anyway for action purposes. An artificial numerical score might well complicate rather than simplify the regulatory process.
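The nomenclature agreed on above can be made concrete with a small sketch. This is an illustration only, assuming a cohort study summarized by a rate ratio with a Wald confidence interval on the log scale; the function name and all counts are hypothetical and do not come from the workshop.

```python
import math

def classify_study(cases_exposed, py_exposed, cases_unexposed, py_unexposed, z=1.96):
    """Rate ratio from person-time data with an approximate 95% Wald CI,
    labeled with the work group's terms: "null" if the interval includes 1.0,
    "positive" if it lies wholly above 1.0, "inverse association" if wholly below."""
    rr = (cases_exposed / py_exposed) / (cases_unexposed / py_unexposed)
    # standard error of log(rate ratio) for person-time data
    se = math.sqrt(1 / cases_exposed + 1 / cases_unexposed)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    if lo > 1.0:
        label = "positive"
    elif hi < 1.0:
        label = "inverse association"
    else:
        label = "null"
    return rr, (lo, hi), label

# hypothetical cohort: 60 cases in 10,000 exposed person-years vs.
# 30 cases in 10,000 unexposed person-years
rr, ci, label = classify_study(60, 10_000, 30, 10_000)
print(f"RR = {rr:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}) -> {label}")
```

Note that, consistent with the group's view, the label depends on the confidence limits rather than on the point estimate alone: the same rate ratio with sparser data would widen the interval across 1.0 and yield "null."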
The general consensus of the workshop was to avoid too narrowly defined guidelines, e.g., the use of specific numerical standards in defining "tight confidence limits." It was further noted that overly specific guidelines had a number of drawbacks. First, they could never capture all conceivable contingencies and thus would be a Procrustean bed. Second, a cookbook could be used inappropriately. This is a particular problem in an agency such as EPA, where the prevalence of epidemiologists is below 1/1,000 (15 total in this organization of 16,000 employees). The group encouraged EPA to increase the number of epidemiologists on its staff and to provide continuing education for epidemiologists and others to foster interdisciplinary work. One participant urged that an IARC-like process using external experts be employed for weighing evidence; a number of drawbacks were pointed out.

There was discussion about the methods that EPA should employ in summarizing bodies of evidence. For example, should there be an appendix presenting techniques for meta-analysis and its graphical presentation? No consensus emerged because of concern that these techniques could be misused.

II. PROPOSED MODIFICATION OF THE STRAWMAN CATEGORIES FOR WEIGHED EVIDENCE

The strawman language suggested the categories of Sufficient, Limited, Inadequate, No Human Data, and Evidence of Not Being a Carcinogen in Humans. The work group suggested the categories: Sufficient Evidence for Human Carcinogenicity, Limited Evidence for Human Carcinogenicity, Inconclusive Evidence for Human Carcinogenicity, and No Human Data. There was no consensus and considerable argument about two additional possible categories: Human Evidence Not Suggestive of Carcinogenicity and Sufficient Evidence for Lack of Human Carcinogenicity. The work group's understanding of the categories and its responses to specific strawman issues raised by EPA staff in each respective section are dealt with below.

III.
SPECIFIC COMMENTS ON EACH EVIDENTIARY CATEGORY

1. Sufficient Evidence for Human Carcinogenicity

Issues A and B. Required Number and Quality of Studies

In some circumstances where the informal "prior probability" was high and the study was particularly strong, the work group felt that a single study with clear evidence could provide sufficient evidence for a substance. In most cases, there would probably be more than one study with clear evidence. The group did not want to provide a cookbook to define the criteria for sufficiency.

Issues A, B, and C. Need for Supportive Information and Peer Review

The work group felt that only in the rarest circumstances would a case series provide sufficient evidence for human carcinogenicity (e.g., vinyl chloride). Members were reluctant, however, to provide a rigid algorithm that always requires supporting studies. The work group did not wish to limit hazard identification to the dose scenario covered in the epidemiological study; that is dealt with during dose-response assessment. It also advised that a series of studies showing an increased risk of cancer at a particular site should be given more weight than a series showing an increased risk at various sites. In the latter case, the mechanism of causation should enter into the weighing process. Also, hazard identification should not be limited to the particular races, sexes, or age groups covered by the epidemiological studies.

Most work group members thought that unpublished studies should be considered in weighing evidence. Two strong caveats were voiced. There should be deadlines for submission to prevent a continual stream of last-minute submissions that delay the regulatory process indefinitely. There should be regulatory peer review, and perhaps a requirement that any journal acceptance or rejection correspondence be submitted to the Agency.

2. Limited Evidence for Human Carcinogenicity

Issues A and B.
Number of Required Studies and Site Specificity

One or more studies providing "some evidence," even if there are some "null" studies, will qualify for this classification. Alternatively, a series of studies with equivocal positive evidence, in the absence of any null studies, would qualify as well. The work group did not have suggestions for an algorithm to deal with these issues.

3. Inconclusive Evidence for Human Carcinogenicity

The work group preferred the term "inconclusive" to the term "inadequate" because the latter implies poor-quality evidence, when in fact the studies may be of good quality but contradictory. The evidence may gain this characterization under three contingencies: (1) the evidence is a mixture of equivocal and null studies; (2) the evidence is a mixture of imprecise null studies that do not add up to a precise null study; or (3) the evidence does not meet the criteria for the other categories.

Issue A. Modifying and Downgrading Categories

The work group did not discuss exact wording to cover the situation in which a positive study is followed by a null study, so that a substance would fall from the limited category into the inconclusive category.

4. No Human Data

The category of No Data indicates that no data are available directly regarding humans. The subcommittee had no comments on this self-evident category.

5. No Evidence for Human Carcinogenicity (Human Evidence Not Suggestive of Carcinogenicity)

There was considerable discussion about the concept of this and the following category and of the names that should properly apply to them. It should be kept in mind that this epidemiological categorization was to be based on human studies and interpreted in the light of short-term studies and mechanistic insights, but not subchronic or chronic cancer animal bioassays. The ultimate classification would weigh the two streams of evidence.
The sensitivity of this and the following category has to do with the weight required for human studies to overcome sufficient animal evidence. The weight of evidence for a substance would fall into this category rather than the "inconclusive" category if all the human studies had been null studies, yet animal risk assessment would have predicted null studies at the human dose delivered to the population size exposed, and there were possible mechanisms of action that would predict a nonthreshold dose-response curve. In this case, a series of good null studies is simply not good enough to definitively cancel out sufficient animal evidence. Yet the evidence of a series of null studies with individually tight confidence intervals, or tight intervals when taken together, somehow warrants more than an "inconclusive" label. The weight of evidence regarding EDB is an example of this situation. The substance is genotoxic, and animal risk extrapolations would have predicted that the worker studies carried out could not have detected the fairly small relative risks expected from the doses received.

There was some discussion about what it meant for studies to be "taken together." Subjecting a series of small studies to a Mantel-Haenszel procedure was suggested. There were some technical objections, but it was agreed that consensus might be found for some analogous procedure.

6. Sufficient Evidence for Noncarcinogenicity in Humans

There was no consensus on this classification. Many members felt that a series of strong null studies with tight confidence limits would qualify if coupled with a widely accepted mechanistic understanding suggesting that the agent should not cause cancer at doses to which humans could be accidentally exposed at work or in the environment. This kind of mechanistic and epidemiological evidence could cancel out a series of positive animal bioassays for the purpose of hazard identification.
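The Mantel-Haenszel pooling suggested above for combining a series of small studies might be sketched as follows. The counts are invented for illustration, and the workshop endorsed only "some analogous procedure," not this exact implementation.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across a list of 2x2 tables.
    Each table is (a, b, c, d) = (exposed cases, exposed noncases,
    unexposed cases, unexposed noncases) from one study."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# three hypothetical small studies, each individually imprecise
studies = [(12, 88, 10, 90), (8, 92, 7, 93), (15, 185, 11, 189)]
print(f"Pooled Mantel-Haenszel odds ratio: {mantel_haenszel_or(studies):.2f}")
```

The point of the pooling is precision: several imprecise null studies, taken together, can yield a single estimate with a tighter confidence interval than any study alone, which is what the "tight intervals when taken together" language contemplates.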
A few others in the workshop pointed out that null studies could be used to determine whether humans were substantially less sensitive than animals in the dose-response stage of risk assessment. In the hazard identification stage, however, only a much more stringent criterion was appropriate: the confidence limits around the null value needed to be so tight that they excluded the possibility of an added risk of public health and regulatory concern. One proposal, supported by two of the workshop participants, was that the "risk of regulatory concern" be quantified.

The majority who disagreed with this proposal seemed to have two views. First, it was unwise to tie the Agency's hand with a number that might change and that was an issue of risk management. Second, it was perceived that no study would be able to rule out all risks of potential regulatory concern. Such power is not practically achievable, and even if it were, one would not trust epidemiology's ability to control confounding sufficiently to accurately assess relative risks so close to the null. This stringent requirement might also create disincentives for government and corporate sponsors who fund epidemiological studies in the hope that they would give the candidate chemical a "clean regulatory bill of health" in the hazard identification phase. Everyone recognized that it is not possible to prove an absolute zero risk. The advocates of the more stringent definition (e.g., requiring exclusion of all risks of regulatory concern) responded that this was exactly the point. Sponsors should give up the vain hope that null epidemiological evidence could be used in the hazard identification process to get their substances "off the list"1 when there is sufficient evidence from animal studies.
Although epidemiology can rarely get a substance off the hazard identification list once an animal study has put it there, a series of good-quality null studies would put the substance in the Human Evidence Not Suggestive of Carcinogenicity category. If the human response was considerably lower than that predicted from animal studies, this may lead to higher tolerated industrial emissions of the substance, which in turn may, in some cases, have important economic implications. Thus, incentives exist for carrying out epidemiological investigations even if these studies can rarely be used to justify delisting a substance.

Saccharin was cited as an example of a substance that should not get a "clean bill of health." A series of strong null human studies was still easily compatible with the animal predictions of 800 extra cases per year in the United States. Although this is equivalent to a relative risk of 1.01, small by epidemiological standards, one is hard pressed to exonerate a substance with a study that does not have the power to exclude the very number predicted by animal risk assessment. It is for this reason that some members of the subcommittee demanded a null study with the power to exclude the low added lifetime risks of interest to regulatory agencies. The wording used by IARC for a similar category was proposed, with the addition of a sentence dealing with the need for power to exclude risks of regulatory interest. This line of argument was unfamiliar and even irritating to many of the epidemiologists present.

1 Editor's note: The concept of a list arose during the work group discussion and was not previously referred to in the strawman language.

The proponents of the category Sufficient Evidence for Noncarcinogenicity in Humans responded that saccharin was a good candidate for the category, since there was experimental and mechanistic evidence that bladder cancer in rats should occur only at high doses and that downward extrapolation of risk to dietary levels in humans was not warranted. There was a question as to whether there was scientific consensus on this. Although most of the discussants did not question the use of widely accepted mechanistic arguments separating man from animal, there were a few concerns that there could always be other carcinogenic mechanisms, shared between humans and rodents, that might still operate. This is something that needs to be examined carefully.

A consensus did seem to emerge against the more permissive strawman language, which suggested that a series of unopposed null studies constituted Sufficient Evidence for Noncarcinogenicity in Humans. The work group agreed with EPA staff about not using the word "negative" because of the many different interpretations that can be given to this word. The discussion in the strawman document about how to interpret null studies was not adequate and prompted the arguments outlined above. The work group did not come to a consensus about categories that dealt with null studies.

IV. MISCELLANEOUS OBSERVATIONS

1. Control of Smoking

When a series of studies shows an effect but smoking has not been controlled for, this should not automatically disqualify the studies from consideration. It should not always be assumed that controlling for smoking would weaken an observed chemical effect; highly exposed individuals may smoke less.

2. Multiple Exposures

There was a discussion of the problem of concomitant exposure to other chemicals in a series of studies. One should determine whether all of the studies were characterized by exposure to the same set of chemicals.
If not, some of the other chemicals may be ruled out as confounders. If so, the participants recommended following the IARC policy of implicating the process as a whole.

3. Carcinogenic Metabolites

Sometimes a substance produces by metabolism another substance that has achieved some degree of evidentiary sufficiency even though the parent compound has not been studied. If the target organ of main exposure is the same, the workshop members agreed that the parent compound should be classified similarly to the metabolite. As circumstances deviate from this paradigm, more judgment will be needed in the classification process.

4. Proper Use of Power Calculations

The work group came to a consensus that it makes no sense to calculate the power of a study after the fact. Instead, one should inspect the confidence limits to see whether they include the expected effect.

5. Epidemiologic Research Needs

A number of participants suggested that EPA fund an epidemiological study every time a request was made to the National Toxicology Program (NTP) for an animal study. Others felt that this would be wasteful of scarce epidemiological resources and that one should wait for positive animal results before embarking on an epidemiological study. Still others felt that we could be missing human carcinogens that happened to have negative animal results. There seemed to be some consensus that a search for an exposed cohort be initiated, and an exposure assessment carried out, in parallel with each NTP study.

DOSE-RESPONSE ASSESSMENT

Strawman Language and Related Questions1

1. Selection of data: As indicated in Section II.D., guidance needs to be given by the individuals doing the qualitative assessment (epidemiologists, toxicologists, pathologists, pharmacologists, etc.) to those doing the quantitative assessment as to the appropriate data to be used in the dose-response assessment.
This is determined by the quality of the data, its relevance to the likely human modes of exposure, and other technical details.

A. Human studies. Estimates based on adequate human epidemiologic data are preferred over estimates based on animal data. Interindividual differences, including age- and sex-related differences, should be considered where possible. If adequate exposure data exist in a well-designed and well-conducted epidemiologic study that has shown no positive results for any relevant endpoints, it may be possible to obtain an upper-bound estimate of risk from that study. Animal-based estimates, if available, also should be presented when such upper-bound estimates are calculated. More carefully executed dose-response assessments benefit from the availability of data that permit the ages of exposure and onset of disease and the level and duration of exposure to be incorporated in the assessment.

Issues:
a. Should an upper-bound risk estimate be made from a nonpositive human study if it is the only risk estimate that can be made?

1 The first two paragraphs of the strawman language provided are intended to be parallel to the first two paragraphs of Section III.A.1 of the current guidelines (see Appendix E). The second three paragraphs of the strawman language are intended to be parallel to the first three paragraphs of Section III.A.2 of the current guidelines.

b. Should we describe what is minimal and what is preferred data? A discussion of "preferred data" could become so extensive that description in the guidelines would be cumbersome; such data may never be obtainable, and the discussion in the guidelines would probably never be exhaustive. Alternatively, would it be possible to specify levels of preferred data?
c. In the absence of dose-rate information, is the use of cumulative dose an appropriate default position? Should that be specified in the guidelines?
d.
Should the guidelines be made to reflect possible differences in dose-response between children and adults because of differences in tissue growth, metabolism, food and fluid intake, etc.?
e. Should person-years of observation be counted from the beginning or the end of exposure for dose-response assessment? Should this be discussed in the guidelines?

2. Choice of mathematical extrapolation model: Since risks at low exposure levels cannot be measured directly either by animal experiments or by epidemiologic studies, a number of mathematical models have been developed to extrapolate from high to low dose. Models should make optimal use of biologic data where possible. Different extrapolation . . . (The language here would be the same as that in the current guidelines.) . . . A rationale will be included to justify the use of the chosen model.

A. Human data. Dose-response assessments with human data should consider absolute as well as relative risk models when the data are available. Where possible, results from both models should be presented. If one model is selected over another, the rationale should be described. In the absence of information to the contrary, a dose-response model that is linear at low doses will be employed with human data. A point estimate from the model may be used to estimate risk at doses below the observable range.

B. Animal data. For animal data, the linearized multistage procedure will be employed in the absence of information to the contrary. The linearized multistage model is a curve-fitting procedure; it does not model what is believed to be a multistage process of tumor development. It is appropriate as a default procedure, however, in that it is linear at low doses. Where appropriate, the results of different extrapolation models may be presented for comparison with the linearized multistage procedure. When longitudinal data . . . (Continue discussion in current guidelines.)

Issues:
a.
Point estimates from models of human data have been used in the Agency in the past for dose-response assessment. This differs from risk estimates made from animal data in that the estimates from animal data are statistical upper bounds. The rationale for using a point estimate with human data is that (1) there is no cross-species extrapolation with the human data; (2) exposures to humans in the epidemiologic data sets used for modeling (usually occupational studies) are much closer than the doses used in animal studies to the environmental exposures of concern; and (3) the point estimate, though not a statistical upper bound, provides an upper bound in the sense that the response at lower doses is likely to be less than that predicted by a model with low-dose linearity. Should statistical upper-bound dose-response estimates be used with human data for consistency with the dose-response estimates from animal data?

b. The linearized multistage model is recognized as a curve-fitting procedure. It does not model stages of cancer. Is it appropriate to recommend the linearized multistage procedure in the absence of information to the contrary for the dose-response assessment of animal data? Would it be more appropriate to simply recommend a model that is linear at low doses in the absence of information to the contrary?

c. Are there examples of agents which have a supralinear dose response for humans at low doses? If so, a model with low-dose linearity may not be protective of public health.

-41- -------

Chair Summary of Work Group Session on Dose-Response Assessment

Chair: Dr. Philip Enterline

The Dose-Response Work Group discussed two major questions posed by EPA in the strawman language. These were:

(1) How should the most appropriate data be selected for use in dose-response assessment?

(2) How should the most appropriate extrapolation model be selected for estimating risks from human data sets, and should these models be consistent with those used for animal data?
The sections below apply to Sections III.A.1 and 2 of the current guidelines (see Appendix E). Some suggestions are given for changes to the strawman language.

1. Selection of Data

While estimates based on adequate human epidemiologic data are preferred over estimates based on animal data, many issues need to be considered so that these estimates will be scientifically sound. The following paragraphs address some of EPA's critical issues in choosing data for dose-response assessment.

Issue A. Estimating Risk from Nonpositive Studies

The group felt that when no positive evidence (either animal or human) is available, an upper-bound risk estimate should not be made from a nonpositive human study. In the presence of a good positive animal study, however, it was felt that a human study could be used and that, under appropriate conditions, the upper bound from the human study could be used rather than the upper bound based on animal studies.

-42- -------

Issue B. Acceptable Quality of Data

With regard to the kind of epidemiologic data needed, the committee felt that the new strawman language proposed, which appears on page 1 (p. 39 of this document), is adequate: "more carefully executed dose-response assessments benefit from the availability of data that permit the ages of exposure and onset of disease and the level of exposure and duration of exposure to be incorporated in the assessment."

Issue C. Default Position for Dose-Rate Information

The committee felt that cumulative dose is an appropriate default position. It is assumed that the wording that now appears in the first full paragraph on page 91 of this document (Appendix E) applies to epidemiologic information. Clearly, the use of daily average or cumulative dose is not always ideal, and dose-rate information by time would be desirable. Extrapolation from occupational studies that deal with only part of a lifetime to lifetime risk may not always be appropriate.
The committee felt that using lifetime daily averages will probably not grossly understate risk but in some cases might cause an overstatement.

Issue D. Adjustments for Children as Compared to Adults

It was felt that the assumption that there is no difference between children and adults is a default position. After taking dosimetry into account, other factors such as remaining lifetime, tissue growth, metabolism, food and fluid intake, etc., should be taken into consideration wherever possible.

Issue E. Options for Counting Person-Years of Observation

The committee could not comment directly on the issue of whether person-years of observation should be counted from the beginning or the end of exposure for dose-response assessment. The committee did feel, however, that

-43- -------

latency should be considered in calculating dose for the purpose of examining dose-response relationships. This might take the form of lagging (5 years, 10 years, etc.) or of weighting dose by a time-to-tumor distribution with little weight given to times distant from some average value.

The workshop group suggested the following changes in the strawman language:

Section IIIA. Dose-Response Assessment, Paragraph 1. Selection of Data. On page 1 of the strawman text (pp. 39 and 89 of this document) delete "by the individuals doing the qualitative assessment (epidemiologists, toxicologists, pathologists, pharmacologists, etc.)."

Section IIIA. Dose-Response Assessment, Paragraph 2. Selection of Data: Human studies: On page 1 of the strawman text (pp. 39 and 89 of this document) add to the first line the word "positive" so that the first sentence reads, "Estimates based on adequate positive epidemiologic data are preferred over estimates based on animal data."

2. Choice of Mathematical Extrapolation Model

While mathematical models must be relied upon, since risks at low exposure levels cannot be measured directly, a range of choices exists regarding the type of model and its assumptions.
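As one concrete reading of the lagging approach described under Issue E above, exposure accrued within the lag interval before case ascertainment is simply excluded from the cumulative dose. The exposure history, the 10-year lag, and the function name below are hypothetical illustrations, not values or methods taken from the workshop:

```python
# Hypothetical annual exposure history: calendar year -> average intensity (ppm).
exposure_history = {1950: 2.0, 1951: 2.0, 1952: 1.5, 1960: 0.5, 1961: 0.5}

def lagged_cumulative_dose(history, as_of_year, lag_years=10):
    """Cumulative dose (ppm-years) counting only exposure accrued more than
    `lag_years` before `as_of_year`; recent exposure is presumed not yet to
    have contributed to an observed tumor (a simple allowance for latency)."""
    cutoff = as_of_year - lag_years
    return sum(intensity for year, intensity in history.items() if year < cutoff)

# Dose credited to a case ascertained in 1965 under a 10-year lag: only the
# 1950-1952 exposure (2.0 + 2.0 + 1.5 = 5.5 ppm-years) counts.
dose_1965 = lagged_cumulative_dose(exposure_history, as_of_year=1965)
```

Weighting dose by a time-to-tumor distribution, the committee's other suggestion, would replace the sharp cutoff with a weight applied to each year's intensity.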
The following paragraphs provide guidance based on the work group discussions for the critical issues identified by EPA in the strawman document.

Issue A. Use of Statistical Upper-Bound Dose-Response Estimates

The committee felt that when dose-response estimates are made from positive human data, statistical upper bounds should be used so as to be consistent with dose-response estimates from animal data. While it is true that there is no cross-species extrapolation with the human data and that exposures are closer to those actually experienced by humans in risk

-44- -------

assessment calculations, it was felt that this might be offset by the fact that the general population to which risk assessments apply may be more heterogeneous in terms of susceptibility than the data sets (often based on occupationally exposed groups) from which risk is estimated. Moreover, the committee was not certain that the true dose-response relationship for humans was always concave upward, and thus linear extrapolation may not always provide a margin of safety. In addition to upper-bound estimates, the committee felt that point estimates as presently calculated by the EPA should be shown.

Issue B. Use of the Linearized Multistage Model

The committee discussed the appropriateness of the linearized multistage model as compared with simple linear models. It was felt that the question of appropriate models was a subject that might be better dealt with at a workshop where this was the main focus and in a context where other models could be presented and discussed.

Issue C. Modeling Nonlinear Dose Response

The committee felt that where dose is environmentally determined, there are agents (e.g., radiation, arsenic) where response is concave downward. Under these conditions an assumption of low-dose linearity would not be protective of the public. This was considered in the decision to recommend the calculation of statistical upper bounds from human data.
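For orientation, the linearized multistage procedure discussed under Issue B fits the multistage form published widely in the risk assessment literature (a sketch of the standard form, not of the Agency's exact implementation):

```latex
P(d) \;=\; 1 - \exp\!\left[-\left(q_0 + q_1 d + q_2 d^2 + \cdots + q_k d^k\right)\right],
\qquad q_i \ge 0,
```

so that extra risk over background is approximately $q_1 d$ at small doses $d$. The "linearized" step replaces the fitted $q_1$ with its statistical upper confidence limit, usually written $q_1^*$, which is why the procedure is linear at low doses regardless of the curve shape fitted in the observable range.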
The workshop group suggested the following change in the strawman language:

Section IIIA. Dose-Response Assessment. Choice of Mathematical Extrapolation Model: Human data. On page 2 of the strawman text (p. 40 of this document) delete the following sentence: "A point estimate from the model may be used to estimate risk at doses below the observable range."

-45- ------- -------

APPENDIX A

EPA RISK ASSESSMENT FORUM TECHNICAL PANEL AND SUBCOMMITTEE ON EPIDEMIOLOGY

-47- -------

WORKSHOP ON CARCINOGEN RISK ASSESSMENT

EPA RISK ASSESSMENT FORUM TECHNICAL PANEL AND ASSOCIATES

Richard Hill, William Farland, Co-Chairs
Margaret Chu
Lorenz Rhomberg
Jeanette Wiltse
Dorothy Patton, Chair, Risk Assessment Forum
Cooper Rees, Science Coordinator, Risk Assessment Forum

Subcommittee on Epidemiology

David Bayliss
Jerry Blondell
Chao Chen
Herman Gibb, Subcommittee Chair
Doreen Hill
Karen Hogan
Aparna Koppikar
Elizabeth Margosches
Neal Nelson
Cheryl Siegel Scott

WORKSHOP PARTICIPANTS:

Study Design and Interpretation

Patricia Buffler
Kenneth Cantor
Marilyn Fingerhut, Chair
Barry Friedlander
William Halperin
Doreen Hill
Karen Hogan
Barbara Hulka
Renata Kimbrough
Aparna Koppikar
Genevieve Matanoski

Weight of Evidence

David Bayliss
Aaron Blair
Jerry Blondell
Margaret Chu
Philip Cole
Henry Falk
Elizabeth Margosches
Raymond Neutra, Chair
Gerald Ott

-48- -------

Dose Response

Harvey Checkoway
Chao Chen
Kenneth Chu
Kenneth Crump
Philip Enterline, Chair
William Farland
Herman Gibb
Daniel Krewski
Neal Nelson
Gerhard Raabe
Lorenz Rhomberg
Cheryl Siegel Scott
Allan Smith

CONTRACTOR ASSOCIATES:

Kate Schalk, Conference Services, Eastern Research Group
Trisha Hasch, Conference Services, Eastern Research Group
Elaine Krueger, Environmental Health Research, Eastern Research Group
Norbert Page, Scientific Consultant, Eastern Research Group

-49- ------- -------

APPENDIX B

LIST OF PARTICIPANTS

-51- -------

U.S. 
Environmental Protection Agency
Cancer Risk Assessment Guidelines Human Evidence Workshop
June 26-27, 1989, Washington, DC

FINAL LIST OF ATTENDEES

Mr. David Bayliss, Office of Health and Environmental Assessment (RD-689), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-5726

Dr. Aaron Blair, National Cancer Institute, Executive Plaza North, Room 418, 6130 Executive Blvd., Rockville, MD 20892, (301) 496-9093

Mr. Jerry Blondell, Hazard Evaluation Division (TS-769C), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 557-2564

Dr. Patricia Buffler, University of Texas Health Science Center at Houston, School of Public Health, Epidemiology Research Unit, P.O. Box 20186, Houston, TX 77225, (713) 792-7458

Dr. Kenneth Cantor, National Cancer Institute, Environmental Studies Section, 6130 Executive Blvd., Rockville, MD 20892, (301) 496-1691

Dr. Harvey Checkoway, Department of Environmental Health SC-34, University of Washington, Seattle, WA 98195, (206) 543-4383

Dr. Chao Chen, Office of Health and Environmental Assessment (RD-689), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-5719

Dr. Kenneth Chu, National Cancer Institute, 9000 Rockville Pike, Bethesda, MD 20892, (301) 496-8544

Dr. Margaret Chu, Office of Health and Environmental Assessment (RD-689), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-7335

Dr. Philip Cole, School of Public Health, University of Alabama, 203 TH UAB Station, Birmingham, AL 35294, (205) 934-6707

-52- -------

Dr. Kenneth Crump, Clement Associates, 1201 Gaines Street, Ruston, LA 71270, (318) 255-4800

Dr. Philip Enterline, University of Pittsburgh, School of Public Health, Room A410, 130 DeSoto Street, Pittsburgh, PA 15261, (412) 624-1559, (412) 624-3032

Dr. Henry Falk, Centers for Disease Control, EHHC/CEHIC, Mailstop F-28, 1600 Clifton Road, N.E., Atlanta, GA 30333, (404) 488-4772

Dr. William Farland
Office of Health and Environmental Assessment (RD-689), Office of Research and Development, U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-7315

Dr. Marilyn Fingerhut, National Institute for Occupational Safety and Health, 4676 Columbia Parkway (R-13), Cincinnati, OH 45226, (513) 841-4203

Dr. Barry Friedlander, Monsanto Company, 800 North Lindbergh, A3NA, St. Louis, MO 63167, (314) 694-1000

Dr. Herman Gibb, Human Health Assessment Group (RD-689), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-5720

Dr. Bill Halperin, 51 Jackson Street, Newton Center, MA 02159, (617) 732-1260

Dr. Doreen Hill, Analysis and Support Division (ANR-461), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 475-9640

Ms. Karen Hogan, Exposure Evaluation Division (TS-798), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-3895

Dr. Barbara Hulka, Department of Epidemiology, Rosenau Hall CB 7400, University of North Carolina, Chapel Hill, NC 27514, (919) 966-5734

Dr. Peter Infante, Health Standards Programs, N3718 OSHA/DOL, 200 Constitution Avenue, N.W., Washington, DC 20210, (301) 523-7111

-53- -------

Dr. Renata Kimbrough, Associate Administrator for Regional Operations (A101), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-4727

Dr. Aparna Koppikar, Office of Health and Environmental Assessment (RD-689), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 475-6765

Dr. Daniel Krewski, Health and Welfare, Environmental Health Center, Room 117, Ottawa, Ontario, CANADA K1A 0L2, (613) 954-0164

Dr. Elizabeth Margosches, Exposure Evaluation Division (TS-798), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-3511

Dr. Genevieve Matanoski, Johns Hopkins School of Hygiene and Public Health, 615 North Wolfe Street, Baltimore, MD 21205, (301) 955-8183, (301) 955-3483 (main office)

Dr.
Neal Nelson, Analysis and Support Division (ANR-461), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 475-9640

Dr. Raymond Neutra, California Department of Health Services, 2151 Berkeley Way, Berkeley, CA 94704, (415) 540-2669

Dr. Gerald Ott, Arthur D. Little, 25 Acorn Park, Cambridge, MA 02140, (617) 864-5770 (ext. 3136)

Dr. Dorothy Patton, Risk Assessment Forum (RD-689), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 475-6743

Dr. Gerhard Raabe, Mobil Corporation, 150 E. 42nd Street, New York, NY 10017, (212) 883-5368

Dr. David Cooper Rees, Risk Assessment Forum (RD-689), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 475-6743

Dr. Lorenz Rhomberg, Office of Health and Environmental Assessment (RD-689), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-5723

Ms. Cheryl Siegel Scott, Exposure Evaluation Division (TS-798), U.S. Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460, (202) 382-3511

-54- -------

Dr. Allan Smith, University of California at Berkeley, 315 Warren Hall, Berkeley, CA 94720, (415) 642-1517 (office); (415) 843-1736 (Health Risk Associates)

-55- ------- -------

APPENDIX C

LIST OF OBSERVERS

-57- -------

U.S. Environmental Protection Agency
Cancer Risk Assessment Guidelines Human Evidence Workshop
June 26-27, 1989, Washington, DC

LIST OF OBSERVERS

Steven Bayard, U.S. EPA (RD-689), 401 M Street, S.W., Washington, DC 20460, (202) 382-5722

Judith Bellin, 383 'O' Street, S.W., Washington, DC 20024, (202) 479-0664

Greg Beumel, Combustion Engineering, C.E. Environmental, 1400 16th Street, N.W., Suite 720, Washington, DC 20036, (202) 797-6407

Karen Creedon, Chemical Manufacturers Assoc., 2501 M Street, N.W., Washington, DC 20037, (202) 881-1384

Maggie Dean, Georgia-Pacific Corp., 1875 Eye Street, Suite 775, Washington, DC 20006, (202) 659-3600

R.J. Dutton, Risk Science Institute, 1126 16th Street, N.W.,
Washington, DC 20036, (202) 659-3306

Joel Fisher, International Joint Commission, 2001 S Street, N.W., Room 208, Washington, DC 20440, (202) 673-6222

Robert Gouph, Toxic Material News, 951 Pershing Drive, Silver Spring, MD 20910, (301) 587-6300

Stanley Gross, U.S. EPA (H7509C), 401 M Street, S.W., Washington, DC 20460, (202) 557-4382

Cheryl Hogue, Chemical Regulation Reporter, 1231 25th Street, N.W., Washington, DC 20037, (202) 452-4584

Allan Katz, Technical Assessment Systems, 1000 Potomac Street, N.W., Washington, DC 20007, (202) 337-2625

Bob Ku, Syntex Corporation, 3401 Hillview Avenue, Palo Alto, CA 94301, (415) 852-1981

-58- -------

Susan LeFevre, Grocery Manufacturers of America, 1010 Wisconsin Avenue, N.W., Washington, DC 20007, (202) 337-9400

Lisa Lefferts, Center for Science in the Public Interest, 1501 16th Street, N.W., Washington, DC 20036, (202) 332-9110

George Lin, Xerox Corporation, Building 843-16S, 800 Salt Road, Webster, NY 14580, (716) 422-2081

Bertram Litt, Litt Associates, 3612 Veasey Street, N.W., Washington, DC 20008, (202) 686-0191

Donna Martin, Putnam Environmental Services, 2525 Meridian Parkway, P.O. Box 12763, Research Triangle Park, NC 27709, (919) 361-4657

Ray McAllister, Madison Building, Suite 900, 1155 15th Street, N.W., Washington, DC 20005, (202) 296-1585

Robert E. McGaughy, U.S. EPA (RD-689), 401 M Street, S.W., Washington, DC 20460, (202) 382-5898

Mark Morrel, Front Royal Group, 7900 W. Park Drive, Suite A 300, McLean, VA 22102, (703) 893-0900

Nancy Nickell, Right-to-Know News, 1725 K Street, N.W., Suite 200, Washington, DC 20006, (202) 872-1766

Jacqueline Prater, Beveridge & Diamond, 1350 Eye Street, N.W., Suite 700, Washington, DC 20005, (202) 789-6113

Charles Ris, U.S. EPA (RD-689), 401 M Street, S.W., Washington, DC 20460, (202) 382-5898

Robert Schnatter, Exxon Biomedical Sciences, Mettlers Road, CN-2350, East Millstone, NJ 08875, (201) 873-6016

Melanie Scott, Business Publishers, Inc., 951 Pershing Drive, Silver Spring, MD 20910, (301) 587-6300

Sherry Selevan, U.S. EPA (RD-689), 401 M Street, S.W.,
Washington, DC 20460, (202) 382-2604

Tomiko Shimada, Shin Nippon Biomedical Laboratory, P.O. Box 856, Frederick, MD 21701, (301) 662-1023

Betsy Shirley, Styrene Information & Research Center, 1275 K Street, N.W., Washington, DC 20005, (202) 371-5314

-59- -------

Arthur Stock, Shea & Gardner, 1800 Massachusetts Ave., N.W., Washington, DC 20036, (202) 828-2147

Jane Teta, Union Carbide Corporation, Health, Safety and Environmental Affairs, 39 Old Ridgebury Road, Danbury, CT 06817-0001, (203) 794-5884

Sandra Tirey, Chemical Manufacturers Association, 2501 M Street, N.W., Washington, DC 20037, (202) 887-1274

Keith Vanderveen, U.S. EPA (TS-798), 401 M Street, S.W., Washington, DC 20460, (202) 382-6383

Frank Vincent, James River Corp., P.O. Box 899, Neenah, WI 54976, (414) 729-8152

-60- -------

APPENDIX D

INTRODUCTORY PLENARY SESSION

Opening Comments, Dr. Philip Enterline
Public Interest Views, Dr. Raymond Neutra
Private Sector Views, Dr. Gerald Ott

-61- -------

OPENING REMARKS: EPA CANCER GUIDELINES REVIEW WORKSHOP ON HUMAN EVIDENCE

Philip Enterline, Ph.D.
Professor Emeritus of Biostatistics
University of Pittsburgh School of Public Health

I was pleased to learn of EPA's decision to expand and clarify its guidelines for the use of human evidence in quantitative risk assessment. As perhaps many of you are aware, of the fairly large number of risk assessments that have thus far been made, only a handful are based upon human evidence. Most are based on extrapolations from animal experimental data.

In principle, there is no difference between epidemiologic evidence and experimental evidence. The problem lies in the design and analysis of these studies. When I first became interested in epidemiology, it was not considered to be a hard science, and many of the "best" scientists were quite undecided as to how much faith to place in epidemiologic observations.
One of the doubters was Bradford Hill, a then well-known British medical statistician, who suggested that perhaps epidemiology could be useful if in designing these studies "the experimental approach was kept firmly in mind." I think that is truly the key to good epidemiologic investigations. Somehow we must conduct epidemiologic studies so as to approach the conditions of an experiment as closely as possible. We have made much progress here with a great boost from advances in statistical methodology and computers.

Perhaps the major problem with epidemiologic studies as a tool in quantitative risk assessment is a lack of firm environmental data, although as Allan Smith has pointed out, it is difficult to imagine how the environmental data could be more in error than animal-to-human extrapolations. I also feel that producers of epidemiologic data need more guidance from consumers as to what kind of data is needed. Most of us are primarily concerned with answering the question, "Is there a disease excess?" rather than "What is the potency of the agent?"

-62- -------

It is my feeling that all well-designed epidemiologic studies involving defined exposures provide some information that can be useful in risk assessment. This is true even if positive findings are not statistically significant. For a number of years I taught a course in Introductory Biostatistics, and in that course we covered measurements and tests of significance, with the latter being particularly difficult for many of my students. As part of my final examination, I sometimes ask the following question: "Suppose your grandmother has a cancer and your parents, wanting to take full advantage of your place in the medical science field, ask you to see what kind of treatment is currently in vogue. You search the literature and find two treatments that are being viewed favorably.
Results from a large recent clinical trial show treatment A to give a 60 percent five-year survival and treatment B to give a 75 percent five-year survival. Numbers of subjects studied were about the same in each of the two treatment groups, and the difference in survival rates is not statistically significant. Which treatment would you select for your grandmother?"

Perhaps not surprisingly, most of my students conclude that since the difference in treatment was not statistically significant, there was no difference. If pressed, they would simply toss a coin to decide on a treatment for their grandmother. There are, however, a few students who would notice that one of the treatments actually gave better results than the other.

Epidemiologic studies are not different from clinical trials. All contain some information. Some studies are more positive or some more negative than others, and this fact alone may be important. A relative risk of 1.2, even if not statistically significant, may mean more in a particular setting than a relative risk of 0.8. Perhaps the former might be called nonpositive and the latter called negative.

EPA clearly recognizes the usefulness of such [nonpositive] epidemiologic data when they evaluate animal data. A very typical situation is one in which there is a positive animal study and a nonpositive or negative human study. While EPA might dismiss the human study because of a belief that there is no such thing as negative epidemiology, they do use the upper confidence interval of the human data to set an upper limit

-63- -------

of risk calculated from the animal study. I think that is a fair way to view human evidence, since the confidence interval is a function both of the power of the study, that is, the sample size, and of the actual results of the study.
Incidentally, I don't think it is proper to calculate power after the study has been completed, since doing so ignores what was found in the study, and the situation is clearly different from what it was before the study was ever undertaken.

Some people seem to feel that the doses of toxic agents received by humans are too small to cause disease excesses large enough to be detectable by epidemiologic studies. In fact, my students often comment to me that it must have been great in the good old days when there were so many things to discover. I would point out here, however, that for the most part, in the "good old days," there was little to guide us in terms of what to look for or where to look. I recall that when the first U.S. epidemiologic study of cigarette smoking and cancer was reported in the early 1950s, there was a great deal of debate as to whether this could in fact be true. Why did it take us so long to find such a grand relationship? Even Bill Hueper of NIH, who was probably our greatest prophet as to environmental causes of cancer, had missed this relationship, attributing only cancer of the tongue and cheek to the use of tobacco. Of course, it was the human evidence that led to what is now perhaps our greatest effort in the field of preventive medicine: the anti-smoking campaign.

I feel that there are many discoveries yet to be made from epidemiologic studies. Some of these involve simply a careful review of existing literature, while others will require some new investigations guided perhaps by observations made in animal experiments as well as the new field of structure-activity relationships (SAR). In studies of working populations, we really need to take a hard look at studies that show large numbers of statistically significant deficits in various diseases. Can these all be attributed to worker selection, or is it possible that something in the design or execution of these types of studies is systematically causing understated risks?
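The use of an upper confidence limit from a nonpositive study, described above, can be made concrete with a small sketch. Assuming a cohort with an observed and an expected number of cancer cases (all numbers hypothetical), an exact one-sided Poisson bound on the standardized mortality ratio might be computed as follows:

```python
import math

def poisson_upper_bound(observed, alpha=0.05):
    """Exact one-sided (1 - alpha) upper confidence bound on a Poisson mean,
    found by bisection on mu such that P(X <= observed; mu) = alpha."""
    def lower_tail(mu):
        return sum(math.exp(-mu) * mu ** k / math.factorial(k)
                   for k in range(observed + 1))
    lo, hi = 0.0, observed + 50.0        # generous bracket for small counts
    for _ in range(100):                 # bisection: lower_tail decreases in mu
        mid = (lo + hi) / 2.0
        if lower_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical nonpositive cohort: 4 cases observed where 5.0 were expected
# (point estimate SMR = 0.8, a deficit), yet the upper bound exceeds 1.
observed, expected = 4, 5.0
smr_upper = poisson_upper_bound(observed) / expected   # about 1.8
```

Because the bound reflects both the study's size and its actual result, a large null study caps risk more tightly than a small one, which is the sense in which a nonpositive study still carries quantitative information.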
-64- -------

In closing, let me assure you, based on a couple of years' experience as a member of EPA's Science Advisory Board, that EPA is one federal agency that listens to its consultants. Your work during the next day and a half could have an important impact on the quality of EPA's risk assessment activity in the future.

-65- -------

EPIDEMIOLOGICAL RISK ASSESSMENT: SOME OBSERVATIONS FROM THE PUBLIC SECTOR

Raymond Richard Neutra, M.D., Dr.P.H.
Chief - Epidemiological Studies Section
California Department of Health Services

I. Should We Be Regulating Substance by Substance?

It should be noted that there are 60,000 chemicals in commercial use and that our regulatory scheme has been to regulate them one by one after years of scientific debate. This is analogous to regulating fecal pathogens one by one instead of simply separating people from chemicals the way we have separated them from feces. This is worth pondering before we plunge into the difficulties which this general approach presents us.

II. Epidemiologically Non-Detectable Risks May Be of Societal Concern.

Figure 1 gives a schematized example of diseases according to their baseline, lifetime, cumulative rates, and the relative risks conveyed by the hypothetical carcinogens in each case. Common cancers, whose baseline risks are multiplied many times by a carcinogen, are easy to detect and are of societal concern. Rare cancers affected by carcinogens that convey small relative risks are neither important nor detectable. But what of moderately rare cancers exposed to agents that convey relative risks less than two? They may convey lifetime risks greater than one in a million or one in a hundred thousand, and not be detectable with epidemiology. Saccharin is an example of a widely used agent for which the animal risk assessment suggested an added burden of 800 bladder cancers a year. Yet this was a relative risk of only 1.01!
Even the enormous case control studies that were done could not rule out this added risk. Epidemiologists said this study showed there was no risk of public health concern, but eight hundred cases a year (if identifiable) would attract public and legal attention (compare it to the number of Guillain-Barré (GB) cases in swine flu vaccine, or the number of rabies cases a year). The public is not calmed by the fact that only a small percentage of all GB cases were attributable to the vaccine, nor would they be calmed by a similar claim for diet drinks.

-66- -------

[Figure 1. Possible Environmental Risk Scenarios. A grid of lifetime risk in the unexposed (from about 10^-6 to 10^-3) against the rate ratio conveyed by the toxicant (small to large), with each cell classified by social importance and epidemiologic detectability; most EPA environmental risks fall in the socially important but not detectable cells.]

-67- -------

The null Hoover study was useful in ruling out some of the outlier risk assessments, and in reassuring us that humans are not dramatically more sensitive than animals. Ultimately, mechanistic evidence may lay the issue to rest. The point here is that just because an epidemiologist can't see it doesn't mean it is unimportant. Hence, epidemiology will rarely, if ever, by itself be able to give a clean bill of health during the Hazard Identification phase of risk assessment.

III. Keep the Four Kinds of Evaluation Conceptually Separate.

1. One evaluates individual studies for Hazard Identification (Clear Evidence, Some Evidence, Equivocal Evidence, No Evidence).

2. One weighs a body of evidence for Hazard Identification (Sufficient, Limited, Inconclusive, Evidence Not Suggestive of Carcinogenicity, Sufficient Evidence for Noncarcinogenicity).

3. One evaluates individual studies for their usefulness in Dose-Response Assessment (Very Useful, Somewhat Useful, Not Useful).

4. One combines useful studies for the purposes of dose-response assessment. 
(This has no nomenclature since a summary number comes out of the combined dose-response assessment.)

The strawman document was sometimes unclear about the distinction between these various activities.

-68- -------

IV. Systematically Anticipate What to Do When Dose-Response Assessment Is Based on Conflicting Human and Animal Data

California health department staff have found unforeseen scenarios in which animal and human dose-response assessment may or may not agree. Figure 2 shows a simplified flow diagram spelling out the possible combinations. This diagram could serve as a guide to generate "scenario queries" to the participants, e.g., "How would you handle this one?" What do you do if you have "sufficient animal evidence" but human evidence is "inconclusive" because the only human studies gave null results? According to this flow diagram, one would choose the "best" null study to see if the upper confidence limit on risk was lower than the risk predicted from animals. If so (as was the case with cadmium), the human upper confidence limit on risk would be used.

-69- -------

[Figure 2. Flow Diagram of Possible Combinations of Associations between Human and Animal Data. The diagram branches on whether animal evidence is sufficient and whether epidemiologic evidence is inadequate, sufficient, or shows a positive association with or without dose data, leading to assessments that may or may not agree with the animal-based assessment; EDB and ETO are given as examples.]

-70- -------

USE OF HUMAN EVIDENCE IN RISK ASSESSMENT - A PRIVATE SECTOR PERSPECTIVE

Gerald Ott, Ph.D.
Senior Consultant, Epidemiology
Arthur D. Little, Inc.

Background

The views which I present today undoubtedly reflect my experience as an occupational and environmental epidemiologist working in the private sector; however, they are my own views and not necessarily those of any specific organization.
Our standard of living in the United States has been achieved through the individual and collective efforts of people to convert resources, some renewable and others not, into useful products. This production of goods and services almost inevitably leads to the generation of waste byproducts that may be released to the environment. Wastes are any materials that are deemed to be of no discernible value and to have no utility to individuals, institutions, or society in general. Because of the costs of recovery, the unintended release of valued products to the environment may also render these products classifiable as waste.

In an increasingly congested world, there is ample reason to be concerned about the release of hazardous materials to the work and general environments. Human health and environmental quality have been adversely affected by hazardous wastes in the past, even when the production of goods was at a lower level and people were not residing in such close proximity to one another. To minimize adverse impacts on human health and the environment, waste control problems must be recognized and addressed using all of the scientific knowledge and appropriate resources available to us.

The impact of wastes may be controlled through:

Decreased production of goods and services.

-71- -------

Waste minimization (e.g., continuous production in enclosed systems versus batch production in open systems).

Recycling (extracting greater value from materials otherwise viewed as useless).

Confinement in on-site and off-site disposal areas.

Incineration (with subsequent dispersion and/or confinement of residual materials).

Intentional dilution or dispersion in various environmental media.

None of these approaches to waste management is free of risks. With each approach, there may be impacted populations that do not share proportionately in the costs and benefits of the enterprises producing the products.
In Table 1, various waste control approaches are listed together with the populations that may be impacted. These include employee populations, community populations, ecologic populations, and global populations. By ecologic populations, I mean the interacting biologic species that exist within an impacted ecosystem. Clearly, there are tradeoffs in control strategies that could differentially impact the various populations. For example, venting used to reduce the likelihood of employee exposure may subject the community population to greater exposure opportunities.

The United States Congress and various governmental agencies have recognized the need to assure that employees and communities are informed of the potential risks and are afforded an opportunity to participate in risk management decisions related to hazardous wastes. This recognition has been reflected in recent employee and community right-to-know laws and regulations and in regulations related to the siting of hazardous waste facilities. Attendant with right-to-know is an obligation to inform people of both the potential health effects of substances to which they may be exposed and the risks projected to result from those exposures. Quantitative risk assessment has become an important tool for informing persons about the projected risks associated with hazard control decisions.

Epidemiology and Quantitative Risk Assessment

The epidemiologic approach to assessing health risks of environmental factors relies on inductive reasoning, that is, reasoning from a particular set of facts to general principles. Consequently, epidemiology is data-driven. The epidemiologic approach requires (1) a characterization of both exposures and health outcomes in the selected human population of interest, and (2) analyses of the relationships between exposures and the health outcomes in that population. There are, of course, both strengths and weaknesses in the epidemiologic approach.
Among the strengths is its direct relevance to the subject at hand, namely, determining the effects of exposures on human health. Major limitations are:

- The size of the population available for study may be too small to allow detection of low level but important health risks.
- The observation period may not have been sufficiently long for chronic health effects to have occurred.
- The study design may not address all alternative explanations for the observed health findings.
- The occurrence of real adverse health effects can only be demonstrated after the fact.

This latter limitation suggests that the toxicity endpoints evaluated should emphasize early indicators of reversible adverse effects. An additional, frequently cited, limitation is the lack of quantitative exposure assessments in support of epidemiologic studies. However, with more extensive use of modeling techniques to estimate exposures and with increasingly precise measurement procedures, it may be possible to minimize the practical importance of this limitation in both occupational and environmental settings.

Quantitative risk assessment has emerged as the major scientific tool for deductively determining the likelihood that harm will come to people as a consequence of predictable exposure to hazardous substances. The four steps of the quantitative risk assessment process are hazard identification, toxicity or dose-response assessment, exposure assessment, and risk characterization. Since quantitative risk assessment utilizes external toxicity information to define a hazard profile for the environmental agents of concern, it is appropriate to include the results of epidemiologic research as well as animal bioassays and other toxicity tests in identifying specific hazards (e.g., establishing the carcinogenicity of a particular substance), in describing exposures and identifying sensitive populations, and in assessing the dose-response relationship.
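The four-step process just described can be sketched in miniature. Assuming a linear no-threshold dose-response model (one common low-dose assumption, not the only one; the slope factor and dose below are purely hypothetical illustrations), risk characterization combines the dose-response slope with the exposure estimate:

```python
def lifetime_excess_risk(slope_factor: float, dose_mg_per_kg_day: float) -> float:
    """Risk characterization under a hypothetical linear low-dose model:
    excess lifetime cancer risk = slope factor * lifetime average daily dose."""
    return slope_factor * dose_mg_per_kg_day

def virtually_safe_dose(slope_factor: float, target_risk: float = 1e-6) -> float:
    """Invert the same linear model to find the dose corresponding to a
    chosen target risk level (here, one in a million)."""
    return target_risk / slope_factor

# Hypothetical inputs: a slope factor of 0.5 per (mg/kg-day) from the
# dose-response assessment, and a lifetime average daily dose of
# 2e-5 mg/kg-day from the exposure assessment.
risk = lifetime_excess_risk(0.5, 2e-5)  # 1e-5 excess lifetime risk
vsd = virtually_safe_dose(0.5)          # 2e-6 mg/kg-day
```

The point of the sketch is only the arithmetic of the final step; in practice the slope factor comes from model fitting and extrapolation, with all the attendant assumptions.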
There are several notable strengths of the quantitative risk assessment approach. First, a quantitative risk assessment establishes what effects could take place in the absence of intervention measures. Thus, there may be opportunities to initiate corrective actions before injury to health has occurred. Second, the quantitative risk assessment approach is "highly risk sensitive." This stems from the use of models to predict risks that are far below the risk levels that could be detected in a health study of the subject population. Through the use of across-species and low-dose extrapolations, acceptable exposure concentrations can be calculated which would yield virtually safe doses, provided the assumptions of the risk assessment are valid.

For the remainder of this presentation, I would like to discuss the role of epidemiologic evidence in several specific aspects of the quantitative risk assessment process. These are (1) the selection of relevant epidemiologic studies to be included in the evaluation of risks, (2) the methods by which evidence for or against a particular effect is combined across studies, and (3) the use of epidemiologic evidence as a final check of the dose-response assessment.

The Selection of Relevant Epidemiologic Studies

Evaluating the consequences of exposure to an agent requires a critical review of the available toxicologic and epidemiologic data for that substance and other interrelated substances. The purpose of the review is to determine appropriate toxicity endpoints and, in particular, to determine the evidence for and against carcinogenicity. In evaluating the available epidemiologic data, two important decisions need to be made. The first decision is whether or not a particular study is relevant (admissible) to the evaluation process. The second decision relates to how the evidence is to be combined across the relevant epidemiologic and toxicologic studies to assess the overall evidence for human carcinogenicity.
In addressing the first decision, it is necessary to identify and characterize each candidate study on the basis of both relevance and methodological strengths. Studies under consideration may range from case reports to cohort studies which specifically examine the exposure of interest. To be admissible, each study should address a relevant biologic outcome; there should be a reasonable basis for ascribing exposure to the agent of interest; and the research should be methodologically sound within the context of its intended purpose. An assessment of the internal evidence for or against a causal relationship should not be part of the admissibility criteria. In other words, studies should not be selected based on their outcomes or conclusions. From a methodologic viewpoint, the guidelines for evaluating a study should be consistent with "good laboratory practices" and with guidelines developed by the National Academy of Sciences and other professional organizations for judging the quality of epidemiologic research.

Based on these guidelines and relevancy criteria, studies can be classified as (1) not relevant, (2) relevant but methodologically unsound, or (3) admissible by virtue of both relevancy and soundness of methodology. The decision that a study has utilized sound methodology strengthens the basis of its admissibility. However, studies that have marginal or even important methodologic deficiencies should not be excluded at this point except where other clearly superior studies are available.

While it is essential that peer review takes place, the available studies should not be restricted to those appearing only in the peer-reviewed literature. This is important for two reasons. First, highly relevant and admissible studies may otherwise be excluded from consideration while awaiting publication. This would seem a harsh penalty to exact because of time constraints.
Second, there are indications that publication bias may result in a shift of the published literature toward positive findings, thus making it difficult to combine evidence across studies without aggregating the bias component. The effort to review critically those few relevant studies not already published would appear to be effort well spent.

Combining Evidence Across Studies

A variety of approaches have been proposed to assist the risk assessor in combining evidence across studies. These include expert or judgment-based approaches, categorical accept-reject analysis, classical statistical approaches, and meta-analysis. These approaches to decision-making are conceptually similar in assigning weights to each component study and combining the weighted evidence in some fashion to arrive at an overall judgment regarding causality. They may differ considerably in the methods for determining how much weight to assign to each study and how explicit to be in assigning weights. The system used by the International Agency for Research on Cancer in evaluating the weight of evidence for human carcinogenicity is well known and relies primarily on expert judgment to combine evidence across studies.

Methods for aggregating biological evidence may differ fundamentally from those used to combine statistical evidence. For example, the statistical power to detect a difference can be increased by combining results across comparably conducted parallel studies and yield a different conclusion than any of the studies viewed in isolation. It is certainly plausible that two studies, neither of which provides statistical evidence of an effect alone, could demonstrate statistical significance when combined. Biologic support for a hypothesis may come from dissimilar studies that demonstrate a connection between different aspects of a particular disease process.
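One classical way to combine statistical evidence across parallel studies is Fisher's method for pooling p-values, offered here only as an illustrative sketch (the p-values below are hypothetical). It shows how two studies, neither significant alone at the 0.05 level, can reach significance when combined:

```python
from math import exp, log

def fisher_combined_p(p_values: list[float]) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k
    degrees of freedom under the joint null hypothesis.  For even
    degrees of freedom the chi-square survival function has the
    closed form exp(-x/2) * sum((x/2)**i / i!, i = 0..k-1).
    """
    k = len(p_values)
    x = -2.0 * sum(log(p) for p in p_values)
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return exp(-half) * total

# Two hypothetical studies, neither significant at the 0.05 level:
p_combined = fisher_combined_p([0.10, 0.08])  # ~0.047, below 0.05
```

As the text notes, this kind of parallel statistical aggregation is valid only for comparably conducted studies; it does not substitute for the biologic aggregation discussed next.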
For example, a study demonstrating that methylene chloride is metabolized to carbon monoxide in the liver and a separate study on the cardiovascular effects of carbon monoxide may convincingly link methylene chloride exposure to adverse cardiovascular effects. While statistical attributes of a study may be judged on the basis of that study alone, biological attributes include elements that are frequently external to the study. This is evidenced by the common organizational structure of most research reports. When reporting findings, researchers typically provide a summary of their own study results followed by an interpretation of their results in the light of existing studies and accepted biologic knowledge. This suggests that evidence for a causal relationship is primarily aggregated on a biologic plane. The biologic evidence may be combined either in serial or parallel fashion, whereas statistical evidence is only aggregated in parallel.

Various explanations may be invoked to describe a viable argument for causation, with the highest status accorded to the most direct argument, that is, the one consistent with biologic knowledge and observation and requiring the fewest assumptions. Statistical evidence secures our confidence in the validity of component statements within the various arguments but does not make the case for causality in and of itself. Broad statements that arise from ecologic or correlational studies generally provide weak arguments since they typically require many assumptions about component statements in the argument.

Evaluating the strengths of various causal arguments may represent an alternative approach to performing weight-of-evidence determinations for human carcinogenicity. In other words, competing arguments would be put forward and then evaluated to determine the most plausible arguments. The plausible arguments themselves would then be subjected to a weight-of-evidence determination.
One potential advantage of this approach is that future research may be more readily directed to address the weakest links in existing causality arguments.

A Final Check of the Dose-Response Assessment

In addition to their other contributions, epidemiologic studies may also serve as an overall validation check on the final proposed dose-response model for a given substance. The field of observation potentially open to epidemiologic assessment includes chemical process employees in direct contact with the substance of interest; employees whose assignments involve intermittent direct contact with the substance; employees assigned to the same production site but whose assignments result in indirect exposure to the substance; members of communities surrounding the production sites; and perhaps customers who purchase the substance for subsequent use or who purchase products contaminated by the substance. Quantitative exposure estimates would be required to carry out this exercise; however, continued improvements in exposure modeling suggest that it is feasible to develop appropriate exposure estimates.

Presumably, the most sensitive validation test would be one which compares the total projected excess of cases throughout the period of observation and across all exposed populations with the observed excess of cases developed through epidemiologic followup of the impacted populations. An additional consistency test might be one which examines only persons in the upper portions of the exposure distribution with sufficient latency to allow for manifestation of a carcinogenic response. From the standpoint of determining that a dose-response model overestimates the risks, this exercise is only viable for substances that were in commercial use prior to the mid-1950s. For evaluating the possibility that risks have been underestimated by the model, the approach has merit under a broader range of circumstances.

TABLE I.
HAZARDOUS WASTE CONTROL APPROACHES AND IMPACTED POPULATIONS

CONTROL OF HAZARDOUS WASTES            IMPACTED POPULATIONS

ON-SITE CONTROLS
  Waste Minimization                   Employee Populations
  Recycling of Wastes                    Informed/Uninformed
  Effective On-site Confinement          Voluntary/Involuntary

OFF-SITE CONTROLS
  Decreased Production of Goods        Employee Populations
    and Services                       Community Populations
  Effective Off-site Confinement       Ecologic Populations
  Ineffective Confinement              Global Populations
  Recycling of Wastes (off-site)         Informed/Uninformed
  Intentional Dilution                   Voluntary/Involuntary
  Intentional Dispersion

APPENDIX E

1986 GUIDELINES FOR CARCINOGEN RISK ASSESSMENT

51 FR 33992

GUIDELINES FOR CARCINOGEN RISK ASSESSMENT

SUMMARY: On September 24, 1986, the U.S. Environmental Protection Agency issued the following five guidelines for assessing the health risks of environmental pollutants:

- Guidelines for Carcinogen Risk Assessment
- Guidelines for Estimating Exposures
- Guidelines for Mutagenicity Risk Assessment
- Guidelines for the Health Assessment of Suspect Developmental Toxicants
- Guidelines for the Health Risk Assessment of Chemical Mixtures

This section contains the Guidelines for Carcinogen Risk Assessment. The Guidelines for Carcinogen Risk Assessment (hereafter "Guidelines") are intended to guide Agency evaluation of suspect carcinogens in line with the policies and procedures established in the statutes administered by the EPA. These Guidelines were developed as part of an interoffice guidelines development program under the auspices of the Office of Health and Environmental Assessment (OHEA) in the Agency's Office of Research and Development. They reflect Agency consideration of public and Science Advisory Board (SAB) comments on the Proposed Guidelines for Carcinogen Risk Assessment published November 23, 1984 (49 FR 46294). This publication completes the first round of risk assessment guidelines development.
These Guidelines will be revised, and new guidelines will be developed, as appropriate.

FOR FURTHER INFORMATION CONTACT:
Dr. Robert E. McGaughy
Carcinogen Assessment Group
Office of Health and Environmental Assessment (RD-689)
401 M Street, S.W.
Washington, DC 20460
202-382-5898

SUPPLEMENTARY INFORMATION: In 1983, the National Academy of Sciences (NAS) published its book entitled Risk Assessment in the Federal Government: Managing the Process. In that book, the NAS recommended that Federal regulatory agencies establish "inference guidelines" to ensure consistency and technical quality in risk assessments and to ensure that the risk assessment process was maintained as a scientific effort separate from risk management. A task force within EPA accepted that recommendation and requested that Agency scientists begin to develop such guidelines.

General

The guidelines are products of a two-year Agencywide effort, which has included many scientists from the larger scientific community. These guidelines set forth principles and procedures to guide EPA scientists in the conduct of Agency risk assessments, and to inform Agency decision makers and the public about these procedures. In particular, the guidelines emphasize that risk assessments will be conducted on a case-by-case basis, giving full consideration to all relevant scientific information. This case-by-case approach means that Agency experts review the scientific information on each agent and use the most scientifically appropriate interpretation to assess risk. The guidelines also stress that this information will be fully presented in Agency risk assessment documents, and that Agency scientists will identify the strengths and weaknesses of each assessment by describing uncertainties, assumptions, and limitations, as well as the scientific basis and rationale for each assessment. Finally, the guidelines are formulated in part to bridge gaps in risk assessment methodology and data.
By identifying these gaps and the importance of the missing information to the risk assessment process, EPA wishes to encourage research and analysis that will lead to new risk assessment methods and data.

Guidelines for Carcinogen Risk Assessment

Work on the Guidelines for Carcinogen Risk Assessment began in January 1984. Draft guidelines were developed by Agency work groups composed of expert scientists from throughout the Agency. The drafts were peer-reviewed by expert scientists in the field of carcinogenesis from universities, environmental groups, industry, labor, and other governmental agencies. They were then proposed for public comment in the FEDERAL REGISTER (49 FR 46294). On November 9, 1984, the Administrator directed that Agency offices use the proposed guidelines in performing risk assessments until final guidelines become available.

After the close of the public comment period, Agency staff prepared summaries of the comments, analyses of the major issues presented by the commentors, and proposed changes in the language of the guidelines to deal with the issues raised. These analyses were presented to review panels of the SAB on March 4 and April 22-23, 1985, and to the Executive Committee of the SAB on April 25-26, 1985. The SAB meetings were announced in the FEDERAL REGISTER as follows: February 12, 1985 (50 FR 5811) and April 4, 1985 (50 FR 13420 and 13421). In a letter to the Administrator dated June 19, 1985, the Executive Committee generally concurred on all five of the guidelines, but recommended certain revisions, and requested that any revised guidelines be submitted to the appropriate SAB review panel chairman for review and concurrence on behalf of the Executive Committee.
As described in the responses to comments (see Part B: Response to the Public and Science Advisory Board Comments), each guidelines document was revised, where appropriate, consistent with the SAB recommendations, and revised draft guidelines were submitted to the panel chairmen. Revised draft Guidelines for Carcinogen Risk Assessment were concurred on in a letter dated February 7, 1986. Copies of the letters are available at the Public Information Reference Unit, EPA Headquarters Library, as indicated elsewhere in this section.

Following this Preamble are two parts: Part A contains the Guidelines and Part B, the Response to the Public and Science Advisory Board Comments (a summary of the major public comments, SAB comments, and Agency responses to those comments). The Agency is continuing to study the risk assessment issues raised in the guidelines and will revise these Guidelines in line with new information as appropriate.

References, supporting documents, and comments received on the proposed guidelines, as well as copies of the final guidelines, are available for inspection and copying at the Public Information Reference Unit (202-382-5926), EPA Headquarters Library, 401 M Street, S.W., Washington, DC, between the hours of 8:00 a.m. and 4:30 p.m.

I certify that these Guidelines are not major rules as defined by Executive Order 12291, because they are nonbinding policy statements and have no direct effect on the regulated community. Therefore, they will have no effect on costs or prices, and they will [51 FR 33993] have no other significant adverse effects on the economy. These Guidelines were reviewed by the Office of Management and Budget under Executive Order 12291.

August 22, 1986
Lee M. Thomas, Administrator

CONTENTS

Part A: Guidelines for Carcinogen Risk Assessment

I. Introduction
II. Hazard Identification
   A. Overview
   B. Elements of Hazard Identification
      1. Physical-Chemical Properties and Routes and Patterns of Exposure
      2. Structure-Activity Relationships
      3. Metabolic and Pharmacokinetic Properties
      4. Toxicologic Effects
      5. Short-Term Tests
      6. Long-Term Animal Studies
      7. Human Studies
   C. Weight of Evidence
   D. Guidance for Dose-Response Assessment
   E. Summary and Conclusion
III. Dose-Response Assessment, Exposure Assessment, and Risk Characterization
   A. Dose-Response Assessment
      1. Selection of Data
      2. Choice of Mathematical Extrapolation Model
      3. Equivalent Exposure Units Among Species
   B. Exposure Assessment
   C. Risk Characterization
      1. Options for Numerical Risk Estimates
      2. Concurrent Exposure
      3. Summary of Risk Characterization
IV. EPA Classification System for Categorizing Weight of Evidence for Carcinogenicity from Human and Animal Studies (Adapted from IARC)
   A. Assessment of Weight of Evidence for Carcinogenicity from Studies in Humans
   B. Assessment of Weight of Evidence for Carcinogenicity from Studies in Experimental Animals
   C. Categorization of Overall Weight of Evidence for Human Carcinogenicity
V. References

Part B: Response to Public and Science Advisory Board Comments

I. Introduction
II. Office of Science and Technology Policy Report on Chemical Carcinogens
III. Inference Guidelines
IV. Evaluation of Benign Tumors
V. Transplacental and Multigenerational Animal Bioassays
VI. Maximum Tolerated Dose
VII. Mouse Liver Tumors
VIII. Weight-of-Evidence Categories
IX. Quantitative Estimates of Risk

Part A: Guidelines for Carcinogen Risk Assessment

I. Introduction

This is the first revision of the 1976 Interim Procedures and Guidelines for Health Risk Assessments of Suspected Carcinogens (U.S. EPA, 1976; Albert et al., 1977). The impetus for this revision is the need to incorporate into these Guidelines the concepts and approaches to carcinogen risk assessment that have been developed during the last ten years.
The purpose of these Guidelines is to promote quality and consistency of carcinogen risk assessments within the EPA and to inform those outside the EPA about its approach to carcinogen risk assessment. These Guidelines emphasize the broad but essential aspects of risk assessment that are needed by experts in the various disciplines required (e.g., toxicology, pathology, pharmacology, and statistics) for carcinogen risk assessment. Guidance is given in general terms, since the science of carcinogenesis is in a state of rapid advancement and overly specific approaches may rapidly become obsolete.

These Guidelines describe the general framework to be followed in developing an analysis of carcinogenic risk and some salient principles to be used in evaluating the quality of data and in formulating judgments concerning the nature and magnitude of the cancer hazard from suspect carcinogens. It is the intent of these Guidelines to permit sufficient flexibility to accommodate new knowledge and new assessment methods as they emerge. It is also recognized that there is a need for new methodology that has not been addressed in this document in a number of areas, e.g., the characterization of uncertainty. As this knowledge and assessment methodology are developed, these Guidelines will be revised whenever appropriate.

A summary of the current state of knowledge in the field of carcinogenesis and a statement of broad scientific principles of carcinogen risk assessment, which was developed by the Office of Science and Technology Policy (OSTP, 1985), forms an important basis for these Guidelines; the format of these Guidelines is similar to that proposed by the National Research Council (NRC) of the National Academy of Sciences in a book entitled Risk Assessment in the Federal Government: Managing the Process (NRC, 1983). These Guidelines are to be used within the policy framework already provided by applicable EPA statutes and do not alter such policies.
These Guidelines provide general directions for analyzing and organizing available data. They do not imply that one kind of data or another is prerequisite for regulatory action to control, prohibit, or allow the use of a carcinogen.

Regulatory decision making involves two components: risk assessment and risk management. Risk assessment defines the adverse health consequences of exposure to toxic agents. The risk assessments will be carried out independently from considerations of the consequences of regulatory action. Risk management combines the risk assessment with the directives of regulatory legislation, together with socioeconomic, technical, political, and other considerations, to reach a decision as to whether or how much to control future exposure to the suspected toxic agents.

Risk assessment includes one or more of the following components: hazard identification, dose-response assessment, exposure assessment, and risk characterization (NRC, 1983). Hazard identification is a qualitative risk assessment, dealing with the process of determining whether exposure to an agent has the potential to increase the incidence of cancer. For purposes of these Guidelines, both malignant and benign tumors are used in the evaluation of the carcinogenic hazard. The hazard identification component qualitatively answers the question of how likely an agent is to be a human carcinogen.

Traditionally, quantitative risk assessment has been used as an inclusive term to describe all or parts of dose-response assessment, exposure assessment, and risk characterization. Quantitative risk assessment can be a useful general term in some circumstances, but the more explicit terminology developed by the NRC (1983) is usually preferred. The dose-response assessment defines the relationship between the dose of an agent and the probability of induction of a carcinogenic effect.
This component usually entails an extrapolation from the generally high doses administered to experimental animals or exposures noted in epidemiologic studies to the exposure levels expected from human contact with the agent in the environment; it also includes considerations of the validity of these extrapolations. The exposure assessment identifies populations exposed to the agent, describes their composition and size, and presents the types, magnitudes, frequencies, and durations of exposure to the agents. [51 FR 33994] In risk characterization, the results of the exposure assessment and the dose-response assessment are combined to estimate quantitatively the carcinogenic risk. As part of risk characterization, a summary of the strengths and weaknesses in the hazard identification, dose-response assessment, exposure assessment, and the public health risk estimates is presented. Major assumptions, scientific judgments, and, to the extent possible, estimates of the uncertainties embodied in the assessment are also presented, distinguishing clearly between fact, assumption, and science policy.

The National Research Council (NRC, 1983) pointed out that there are many questions encountered in the risk assessment process that are unanswerable given current scientific knowledge. To bridge the uncertainty that exists in these areas where there is no scientific consensus, inferences must be made to ensure that progress continues in the assessment process. The OSTP (1985) reaffirmed this position, and generally left to the regulatory agencies the job of articulating these inferences. Accordingly, the Guidelines incorporate judgmental positions (science policies) based on evaluation of the presently available information and on the regulatory mission of the Agency. The Guidelines are consistent with the principles developed by the OSTP (1985), although in many instances they are necessarily more specific.

II. Hazard Identification

A.
Overview

The qualitative assessment or hazard identification part of risk assessment contains a review of the relevant biological and chemical information bearing on whether or not an agent may pose a carcinogenic hazard. Since chemical agents seldom occur in a pure state and are often transformed in the body, the review should include available information on contaminants, degradation products, and metabolites.

Studies are evaluated according to sound biological and statistical considerations and procedures. These have been described in several publications (Interagency Regulatory Liaison Group, 1979; OSTP, 1985; Peto et al., 1980; Mantel, 1980; Mantel and Haenszel, 1959; Interdisciplinary Panel on Carcinogenicity, 1984; National Center for Toxicological Research, 1981; National Toxicology Program, 1984; U.S. EPA, 1983a, 1983b, 1983c; Haseman, 1984). Results and conclusions concerning the agent, derived from different types of information, whether indicating positive or negative responses, are melded together into a weight-of-evidence determination. The strength of the evidence supporting a potential human carcinogenicity judgment is developed in a weight-of-evidence stratification scheme.

B. Elements of Hazard Identification

Hazard identification should include a review of the following information to the extent that it is available.

1. Physical-Chemical Properties and Routes and Patterns of Exposure. Parameters relevant to carcinogenesis, including physical state, physical-chemical properties, and exposure pathways in the environment, should be described where possible.

2. Structure-Activity Relationships. This section should summarize relevant structure-activity correlations that support or argue against the prediction of potential carcinogenicity.

3. Metabolic and Pharmacokinetic Properties. This section should summarize relevant metabolic information.
Information such as whether the agent is direct-acting or requires conversion to a reactive carcinogenic (e.g., an electrophilic) species, metabolic pathways for such conversions, macromolecular interactions, and fate (e.g., transport, storage, and excretion), as well as species differences, should be discussed and critically evaluated. Pharmacokinetic properties determine the biologically effective dose and may be relevant to hazard identification and other components of risk assessment.

4. Toxicologic Effects. Toxicologic effects other than carcinogenicity (e.g., suppression of the immune system, endocrine disturbances, organ damage) that are relevant to the evaluation of carcinogenicity should be summarized. Interactions with other chemicals or agents and with lifestyle factors should be discussed. Prechronic and chronic toxicity evaluations, as well as other test results, may yield information on target organ effects, pathophysiological reactions, and preneoplastic lesions that bear on the evaluation of carcinogenicity. Dose-response and time-to-response analyses of these reactions may also be helpful.

5. Short-Term Tests. Tests for point mutations, numerical and structural chromosome aberrations, DNA damage/repair, and in vitro transformation provide supportive evidence of carcinogenicity and may give information on potential carcinogenic mechanisms. A range of tests from each of the above end points helps to characterize an agent's response spectrum. Short-term in vivo and in vitro tests that can give an indication of initiation and promotion activity may also provide supportive evidence for carcinogenicity. Lack of positive results in short-term tests for genetic toxicity does not provide a basis for discounting positive results in long-term animal studies.

6. Long-Term Animal Studies. Criteria for the technical adequacy of animal carcinogenicity studies have been published (e.g., U.S.
Food and Drug Administration, 1982; Interagency Regulatory Liaison Group, 1979; National Toxicology Program, 1984; OSTP, 1985; U.S. EPA, 1983a, 1983b, 1983c; Feron et al., 1980; Mantel, 1980) and should be used to judge the acceptability of individual studies. Transplacental and multigenerational carcinogenesis studies, in addition to more conventional long-term animal studies, can yield useful information about the carcinogenicity of agents.

It is recognized that chemicals that induce benign tumors frequently also induce malignant tumors, and that benign tumors often progress to malignant tumors (Interdisciplinary Panel on Carcinogenicity, 1984). The incidence of benign and malignant tumors will be combined when scientifically defensible (OSTP, 1985; Principle 8). For example, the Agency will, in general, consider the combination of benign and malignant tumors to be scientifically defensible unless the benign tumors are not considered to have the potential to progress to the associated malignancies of the same histogenic origin. If an increased incidence of benign tumors is observed in the absence of malignant tumors, in most cases the evidence will be considered as limited evidence of carcinogenicity.

The weight of evidence that an agent is potentially carcinogenic for humans increases (1) with the increase in number of tissue sites affected by the agent; (2) with the increase in number of animal species, strains, sexes, and number of experiments and doses showing a carcinogenic response; (3) with the occurrence of clear-cut dose-response relationships as well as a high level of statistical significance of the increased tumor incidence in treated compared to control groups; (4) when there is a dose-related shortening of the time-to-tumor occurrence or time to death with tumor; and (5) when there is a dose-related increase in the proportion of tumors that are malignant.
Long-term animal studies at or near the maximum tolerated dose level (MTD) are used to ensure an adequate power for the detection of carcinogenic [51 FR 33995] activity (NTP, 1984; IARC, 1982). Negative long-term animal studies at exposure levels above the MTD may not be acceptable if animal survival is so impaired that the sensitivity of the study is significantly reduced below that of a conventional chronic animal study at the MTD. The OSTP (1985; Principle 4) has stated that,

The carcinogenic effects of agents may be influenced by non-physiological responses (such as extensive organ damage, radical disruption of hormonal function, saturation of metabolic pathways, formation of stones in the urinary tract, saturation of DNA repair with a functional loss of the system) induced in the model systems. Testing regimes inducing these responses should be evaluated for their relevance to the human response to an agent and evidence from such a study, whether positive or negative, must be carefully reviewed.

Positive studies at levels above the MTD should be carefully reviewed to ensure that the responses are not due to factors which do not operate at exposure levels below the MTD. Evidence indicating that high exposures alter tumor responses by indirect mechanisms that may be unrelated to effects at lower exposures should be dealt with on an individual basis. As noted by the OSTP (1985), "Normal metabolic activation of carcinogens may possibly also be altered and carcinogenic potential reduced as a consequence [of high-dose testing]." Carcinogenic responses under conditions of the experiment should be reviewed carefully as they relate to the relevance of the evidence to human carcinogenic risks (e.g., the occurrence of bladder tumors in the presence of bladder stones and implantation site sarcomas).
Interpretation of animal studies is aided by the review of target organ toxicity and other effects (e.g., changes in the immune and endocrine systems) that may be noted in prechronic or other toxicological studies. Time and dose-related changes in the incidence of preneoplastic lesions may also be helpful in interpreting animal studies. Agents that are positive in long-term animal experiments and also show evidence of promoting or cocarcinogenic activity in specialized tests should be considered as complete carcinogens unless there is evidence to the contrary because it is, at present, difficult to determine whether an agent is only a promoting or cocarcinogenic agent. Agents that show positive results in special tests for initiation, promotion, or cocarcinogenicity and no indication of tumor response in well-conducted and well-designed long-term animal studies should be dealt with on an individual basis. To evaluate carcinogenicity, the primary comparison is tumor response in dosed animals as compared with that in contemporary matched control animals. Historical control data are often valuable, however, and could be used along with concurrent control data in the evaluation of carcinogenic responses (Haseman et al., 1984). For the evaluation of rare tumors, even small tumor responses may be significant compared to historical data. The review of tumor data at sites with high spontaneous background requires special consideration (OSTP, 1985; Principle 9). For instance, a response that is significant with respect to the experimental control group may become questionable if the historical control data indicate that the experimental control group had an unusually low background incidence (NTP, 1984). 
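The primary comparison described above, tumor response in dosed animals versus contemporary matched controls, is commonly evaluated with a one-sided Fisher exact test. The following sketch uses only the Python standard library; the tumor counts are hypothetical, chosen purely for illustration:

```python
from math import comb

def fisher_exact_one_sided(tumor_dosed, n_dosed, tumor_ctrl, n_ctrl):
    """One-sided Fisher exact test: probability, under the null
    hypothesis of no treatment effect, that the dosed group carries
    at least the observed number of tumor-bearing animals."""
    total_tumors = tumor_dosed + tumor_ctrl
    total_animals = n_dosed + n_ctrl
    denom = comb(total_animals, total_tumors)
    p = 0.0
    # Sum the hypergeometric upper tail over possible dosed-group counts.
    for k in range(tumor_dosed, min(total_tumors, n_dosed) + 1):
        p += comb(n_dosed, k) * comb(n_ctrl, total_tumors - k) / denom
    return p

# Hypothetical bioassay: 12/50 tumor-bearing dosed animals vs. 2/50 controls.
p = fisher_exact_one_sided(12, 50, 2, 50)
```

With a rare tumor, even a small excess in the dosed group can yield a small p-value, which is why the guidance above notes that small responses may be significant relative to historical control rates.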
For a number of reasons, there are widely diverging scientific views (OSTP, 1985; Ward et al., 1979a, b; Tomatis, 1977; Nutrition Foundation, 1983) about the validity of mouse liver tumors as an indication of potential carcinogenicity in humans when such tumors occur in strains with high spontaneous background incidence and when they constitute the only tumor response to an agent. These Guidelines take the position that when the only tumor response is in the mouse liver and when other conditions for a classification of "sufficient" evidence in animal studies are met (e.g., replicate studies, malignancy; see section IV), the data should be considered as "sufficient" evidence of carcinogenicity. It is understood that this classification could be changed on a case-by-case basis to "limited," if warranted, when factors such as the following are observed: an increased incidence of tumors only in the highest dose group and/or only at the end of the study; no substantial dose-related increase in the proportion of tumors that are malignant; the occurrence of tumors that are predominantly benign; no dose-related shortening of the time to the appearance of tumors; negative or inconclusive results from a spectrum of short-term tests for mutagenic activity; the occurrence of excess tumors only in a single sex.

Data from all long-term animal studies are to be considered in the evaluation of carcinogenicity. A positive carcinogenic response in one species/strain/sex is not generally negated by negative results in other species/strain/sex. Replicate negative studies that are essentially identical in all other respects to a positive study may indicate that the positive results are spurious. Evidence for carcinogenic action should be based on the observation of statistically significant tumor responses in specific organs or tissues.
Appropriate statistical analysis should be performed on data from long-term studies to help determine whether the effects are treatment-related or possibly due to chance. These should at least include a statistical test for trend, including appropriate correction for differences in survival. The weight to be given to the level of statistical significance (the p-value) and to other available pieces of information is a matter of overall scientific judgment. A statistically significant excess of tumors of all types in the aggregate, in the absence of a statistically significant increase of any individual tumor type, should be regarded as minimal evidence of carcinogenic action unless there are persuasive reasons to the contrary.

7. Human Studies. Epidemiologic studies provide unique information about the response of humans who have been exposed to suspect carcinogens. Descriptive epidemiologic studies are useful in generating hypotheses and providing supporting data, but can rarely be used to make a causal inference. Analytical epidemiologic studies of the case-control or cohort variety, on the other hand, are especially useful in assessing risks to exposed humans. Criteria for the adequacy of epidemiologic studies are well recognized. They include factors such as the proper selection and characterization of exposed and control groups, the adequacy of duration and quality of follow-up, the proper identification and characterization of confounding factors and bias, the appropriate consideration of latency effects, the valid ascertainment of the causes of morbidity and death, and the ability to detect specific effects. Where it can be calculated, the statistical power to detect an appropriate outcome should be included in the assessment. The strength of the epidemiologic evidence for carcinogenicity depends, among other things, on the type of analysis and on the magnitude and specificity of the response.
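The statistical test for trend called for above is commonly a Cochran-Armitage test on tumor incidence across dose groups. The sketch below uses a normal approximation and omits the survival correction the Guidelines recommend; all counts and doses are hypothetical:

```python
from math import sqrt, erfc

def cochran_armitage_trend(tumors, animals, doses):
    """One-sided Cochran-Armitage test for a dose-related trend in
    tumor incidence (normal approximation, no survival adjustment).
    Returns the z statistic and the one-sided p-value."""
    N = sum(animals)
    p_bar = sum(tumors) / N
    # Numerator: dose-weighted excess of observed over expected tumor counts.
    num = sum(d * (t - n * p_bar) for d, t, n in zip(doses, tumors, animals))
    d_bar = sum(d * n for d, n in zip(doses, animals)) / N
    var = p_bar * (1 - p_bar) * sum(n * (d - d_bar) ** 2
                                    for d, n in zip(doses, animals))
    z = num / sqrt(var)
    return z, 0.5 * erfc(z / sqrt(2))  # upper-tail p-value

# Hypothetical study: control, low-, and high-dose groups of 50 animals each.
z, p = cochran_armitage_trend(tumors=[1, 5, 12],
                              animals=[50, 50, 50],
                              doses=[0.0, 1.0, 2.0])
```

In practice, survival-adjusted variants (e.g., the Peto et al., 1980, methods cited above) would replace this crude form when intercurrent mortality differs across groups.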
The weight of evidence increases rapidly with the number of adequate studies that show comparable results on populations exposed to the same agent under different conditions. It should be recognized that epidemiologic studies are inherently capable of detecting only comparatively large increases in the relative risk of [51 FR 33996] cancer. Negative results from such studies cannot prove the absence of carcinogenic action; however, negative results from a well-designed and well-conducted epidemiologic study that contains usable exposure data can serve to define upper limits of risk; these are useful if animal evidence indicates that the agent is potentially carcinogenic in humans.

C. Weight of Evidence

Evidence of possible carcinogenicity in humans comes primarily from two sources: long-term animal tests and epidemiologic investigations. Results from these studies are supplemented with available information from short-term tests, pharmacokinetic studies, comparative metabolism studies, structure-activity relationships, and other relevant toxicologic studies. The question of how likely an agent is to be a human carcinogen should be answered in the framework of a weight-of-evidence judgment. Judgments about the weight of evidence involve considerations of the quality and adequacy of the data and the kinds and consistency of responses induced by a suspect carcinogen.

There are three major steps to characterizing the weight of evidence for carcinogenicity in humans: (1) characterization of the evidence from human studies and from animal studies individually, (2) combination of the characterizations of these two types of data into an indication of the overall weight of evidence for human carcinogenicity, and (3) evaluation of all supporting information to determine if the overall weight of evidence should be modified. EPA has developed a system for stratifying the weight of evidence (see section IV).
This classification is not meant to be applied rigidly or mechanically. At various points in the above discussion, EPA has emphasized the need for an overall, balanced judgment of the totality of the available evidence. Particularly for well-studied substances, the scientific data base will have a complexity that cannot be captured by any classification scheme. Therefore, the hazard identification section should include a narrative summary of the strengths and weaknesses of the evidence as well as its categorization in the EPA scheme.

The EPA classification system is, in general, an adaptation of the International Agency for Research on Cancer (IARC, 1982) approach for classifying the weight of evidence for human data and animal data. The EPA classification system for the characterization of the overall weight of evidence for carcinogenicity (animal, human, and other supportive data) includes: Group A -- Carcinogenic to Humans; Group B -- Probably Carcinogenic to Humans; Group C -- Possibly Carcinogenic to Humans; Group D -- Not Classifiable as to Human Carcinogenicity; and Group E -- Evidence of Non-Carcinogenicity for Humans.

The following modifications of the IARC approach have been made for classifying human and animal studies. For human studies: (1) The observation of a statistically significant association between an agent and life-threatening benign tumors in humans is included in the evaluation of risks to humans. (2) A "no data available" classification is added. (3) A "no evidence of carcinogenicity" classification is added. This classification indicates that no association was found between exposure and increased risk of cancer in well-conducted, well-designed, independent analytical epidemiologic studies.
For animal studies: (1) An increased incidence of combined benign and malignant tumors will be considered to provide sufficient evidence of carcinogenicity if the other criteria defining the "sufficient" classification of evidence are met (e.g., replicate studies, malignancy; see section IV). Benign and malignant tumors will be combined when scientifically defensible. (2) An increased incidence of benign tumors alone generally constitutes "limited" evidence of carcinogenicity. (3) An increased incidence of neoplasms that occur with high spontaneous background incidence (e.g., mouse liver tumors and rat pituitary tumors in certain strains) generally constitutes "sufficient" evidence of carcinogenicity, but may be changed to "limited" when warranted by the specific information available on the agent. (4) A "no data available" classification has been added. (5) A "no evidence of carcinogenicity" classification is also added. This operational classification would include substances for which there is no increased incidence of neoplasms in at least two well-designed and well-conducted animal studies of adequate power and dose in different species. D. Guidance for Dose-Response Assessment The qualitative evidence for carcinogenesis should be discussed for purposes of guiding the dose- response assessment. The guidance should be given in terms of the appropriateness and limitations of specific studies as well as pharmacokinetic considerations that should be factored into the dose- response assessment. The appropriate method of extrapolation should be factored in when the experimental route of exposure differs from that occurring in humans. Agents that are judged to be in the EPA weight- of-evidence stratification Groups A and B would be regarded as suitable for quantitative risk assessments. Agents that are judged to be in Group C will generally be regarded as suitable for quantitative risk assessment, but judgments in this regard may be made on a case-by-case basis. 
Agents that are judged to be in Groups D and E would not have quantitative risk assessments.

E. Summary and Conclusion

The summary should present all of the key findings in all of the sections of the qualitative assessment and the interpretive rationale that forms the basis for the conclusion. Assumptions, uncertainties in the evidence, and other factors that may affect the relevance of the evidence to humans should be discussed. The conclusion should present both the weight-of-evidence ranking and a description that brings out the more subtle aspects of the evidence that may not be evident from the ranking alone.

III. Dose-Response Assessment, Exposure Assessment, and Risk Characterization

After data concerning the carcinogenic properties of a substance have been collected, evaluated, and categorized, it is frequently desirable to estimate the likely range of excess cancer risk associated with given levels and conditions of human exposure. The first step of the analysis needed to make such estimations is the development of the likely relationship between dose and response (cancer incidence) in the region of human exposure. This information on dose-response relationships is coupled with information on the nature and magnitude of human exposure to yield an estimate of human risk. The risk-characterization step also includes an interpretation of these estimates in light of the biological, statistical, and exposure assumptions and uncertainties that have arisen throughout the process of assessing risk. The elements of dose-response assessment are described in section III.A. Guidance on human exposure assessment is provided in another EPA [51 FR 33997] document (U.S. EPA, 1986); however, section III.B. of these Guidelines includes a brief description of the specific type of exposure information that is useful for carcinogen risk assessment. Finally, in section III.C.
on risk characterization, there is a description of the manner in which risk estimates should be presented so as to be most informative.

It should be emphasized that calculation of quantitative estimates of cancer risk does not require that an agent be carcinogenic in humans. The likelihood that an agent is a human carcinogen is a function of the weight of evidence, as this has been described in the hazard identification section of these Guidelines. It is nevertheless important to present quantitative estimates, appropriately qualified and interpreted, in those circumstances in which there is a reasonable possibility, based on human and animal data, that the agent is carcinogenic in humans.

It should be emphasized in every quantitative risk estimation that the results are uncertain. Uncertainties due to experimental and epidemiologic variability as well as uncertainty in the exposure assessment can be important. There are major uncertainties in extrapolating both from animals to humans and from high to low doses. There are important species differences in uptake, metabolism, and organ distribution of carcinogens, as well as species and strain differences in target-site susceptibility. Human populations are variable with respect to genetic constitution, diet, occupational and home environment, activity patterns, and other cultural factors. Risk estimates should be presented together with the associated hazard assessment (section III.C.3.) to ensure that there is an appreciation of the weight of evidence for carcinogenicity that underlies the quantitative risk estimates.

A. Dose-Response Assessment

1. Selection of Data. As indicated in section II.D., guidance needs to be given by the individuals doing the qualitative assessment (toxicologists, pathologists, pharmacologists, etc.) to those doing the quantitative assessment as to the appropriate data to be used in the dose-response assessment.
This is determined by the quality of the data, its relevance to human modes of exposure, and other technical details. If available, estimates based on adequate human epidemiologic data are preferred over estimates based on animal data. If adequate exposure data exist in a well-designed and well-conducted negative epidemiologic study, it may be possible to obtain an upper-bound estimate of risk from that study. Animal-based estimates, if available, also should be presented. In the absence of appropriate human studies, data from a species that responds most like humans should be used, if information to this effect exists. Where, for a given agent, several studies are available, which may involve different animal species, strains, and sexes at several doses and by different routes of exposure, the following approach to selecting the data sets is used: (1) The tumor incidence data are separated according to organ site and tumor type. (2) All biologically and statistically acceptable data sets are presented. (3) The range of the risk estimates is presented with due regard to biological relevance (particularly in the case of animal studies) and appropriateness of route of exposure. (4) Because it is possible that human sensitivity is as high as the most sensitive responding animal species, in the absence of evidence to the contrary, the biologically acceptable data set from long-term animal studies showing the greatest sensitivity should generally be given the greatest emphasis, again with due regard to biological and statistical considerations. When the exposure route in the species from which the dose-response information is obtained differs from the route occurring in environmental exposures, the considerations used in making the route-to-route extrapolation must be carefully described. All assumptions should be presented along with a discussion of the uncertainties in the extrapolation. 
Whatever procedure is adopted in a given case, it must be consistent with the existing metabolic and pharmacokinetic information on the chemical (e.g., absorption efficiency via the gut and lung, target organ doses, and changes in placental transport throughout gestation for transplacental carcinogens). Where two or more significantly elevated tumor sites or types are observed in the same study, extrapolations may be conducted on selected sites or types. These selections will be made on biological grounds. To obtain a total estimate of carcinogenic risk, animals with one or more tumor sites or types showing significantly elevated tumor incidence should be pooled and used for extrapolation. The pooled estimates will generally be used in preference to risk estimates based on single sites or types. Quantitative risk extrapolations will generally not be done on the basis of totals that include tumor sites without statistically significant elevations. Benign tumors should generally be combined with malignant tumors for risk estimates unless the benign tumors are not considered to have the potential to progress to the associated malignancies of the same histogenic origin. The contribution of the benign tumors, however, to the total risk should be indicated. 2. Choice of Mathematical Extrapolation Model. Since risks at low exposure levels cannot be measured directly either by animal experiments or by epidemiologic studies, a number of mathematical models have been developed to extrapolate from high to low dose. Different extrapolation models, however, may fit the observed data reasonably well but may lead to large differences in the projected risk at low doses. As was pointed out by OSTP (1985; Principle 26), No single mathematical procedure is recognized as the most appropriate for low-dose extrapolation in carcinogenesis. 
When relevant biological evidence on mechanism of action exists (e.g., pharmacokinetics, target organ dose), the models or procedures employed should be consistent with the evidence. When data and information are limited, however, and when much uncertainty exists regarding the mechanism of carcinogenic action, models or procedures which incorporate low-dose linearity are preferred when compatible with the limited information.

At present, mechanisms of the carcinogenesis process are largely unknown and data are generally limited. If a carcinogenic agent acts by accelerating the same carcinogenic process that leads to the background occurrence of cancer, the added effect of the carcinogen at low doses is expected to be virtually linear (Crump et al., 1976). The Agency will review each assessment as to the evidence on carcinogenesis mechanisms and other biological or statistical evidence that indicates the suitability of a particular extrapolation model. Goodness-of-fit to the experimental observations is not an effective means of discriminating among models (OSTP, 1985). A rationale will be included to justify the use of the chosen model. In the absence of adequate information to the contrary, the linearized multistage procedure will be employed. Where appropriate, the results of using various extrapolation models may be useful for comparison with the linearized multistage procedure. When longitudinal data on tumor development are available, time-to-tumor models may be used.

It should be emphasized that the linearized multistage procedure leads to [51 FR 33998] a plausible upper limit to the risk that is consistent with some proposed mechanisms of carcinogenesis. Such an estimate, however, does not necessarily give a realistic prediction of the risk. The true value of the risk is unknown, and may be as low as zero.
The range of risks, defined by the upper limit given by the chosen model and the lower limit which may be as low as zero, should be explicitly stated. An established procedure does not yet exist for making "most likely" or "best" estimates of risk within the range of uncertainty defined by the upper and lower limit estimates. If data and procedures become available, the Agency will also provide "most likely" or "best" estimates of risk. This will be most feasible when human data are available and when exposures are in the dose range of the data. In certain cases, the linearized multistage procedure cannot be used with the observed data as, for example, when the data are nonmonotonic or flatten out at high doses. In these cases, it may be necessary to make adjustments to achieve low-dose linearity. When pharmacokinetic or metabolism data are available, or when other substantial evidence on the mechanistic aspects of the carcinogenesis process exists, a low-dose extrapolation model other than the linearized multistage procedure might be considered more appropriate on biological grounds. When a different model is chosen, the risk assessment should clearly discuss the nature and weight of evidence that led to the choice. Considerable uncertainty will remain concerning response at low doses; therefore, in most cases an upper-limit risk estimate using the linearized multistage procedure should also be presented. 3. Equivalent Exposure Units Among Species. Low-dose risk estimates derived from laboratory animal data extrapolated to humans are complicated by a variety of factors that differ among species and potentially affect the response to carcinogens. Included among these factors are differences between humans and experimental test animals with respect to life span, body size, genetic variability, population homogeneity, existence of concurrent disease, pharmacokinetic effects such as metabolism and excretion patterns, and the exposure regimen. 
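At low doses, the upper-limit extra risk from the linearized multistage procedure is governed by the upper confidence bound on the linear coefficient of the fitted multistage polynomial (often written q1*). A minimal sketch of the low-dose step only, assuming q1* has already been estimated from the bioassay fit (the numerical slope below is hypothetical):

```python
from math import exp

def low_dose_extra_risk(q1_star, dose):
    """Upper-bound extra lifetime risk at a low dose under the
    linearized multistage procedure: A(d) = 1 - exp(-q1* * d),
    which is approximately q1* * d when q1* * d is small."""
    return 1.0 - exp(-q1_star * dose)

# Hypothetical upper-bound slope q1* = 0.05 (mg/kg/day)^-1
# evaluated at a lifetime average dose of 0.001 mg/kg/day.
risk = low_dose_extra_risk(0.05, 0.001)  # close to 5e-5
```

Because 1 - exp(-x) < x, the exact form never exceeds the simple linear product q1* x dose, consistent with the upper-limit interpretation stated above: the true risk may lie anywhere between this bound and zero.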
The usual approach for making interspecies comparisons has been to use standardized scaling factors. Commonly employed standardized dosage scales include mg per kg body weight per day, ppm in the diet or water, mg per m2 body surface area per day, and mg per kg body weight per lifetime. In the absence of comparative toxicological, physiological, metabolic, and pharmacokinetic data for a given suspect carcinogen, the Agency takes the position that the extrapolation on the basis of surface area is considered to be appropriate because certain pharmacological effects commonly scale according to surface area (Dedrick, 1973; Freireich et al., 1966; Pinkel, 1958). B. Exposure Assessment In order to obtain a quantitative estimate of the risk, the results of the dose-response assessment must be combined with an estimate of the exposures to which the populations of interest are likely to be subject. While the reader is referred to the Guidelines for Estimating Exposures (U.S. EPA, 1986) for specific details, it is important to convey an appreciation of the impact of the strengths and weaknesses of exposure assessment on the overall cancer risk assessment process. At present there is no single approach to exposure assessment that is appropriate for all cases. On a case-by-case basis, appropriate methods are selected to match the data on hand and the level of sophistication required. The assumptions, approximations, and uncertainties need to be clearly stated because, in some instances, these will have a major effect on the risk assessment. In general, the magnitude, duration, and frequency of exposure provide fundamental information for estimating the concentration of the carcinogen to which the organism is exposed. These data are generated from monitoring information, modeling results, and/or reasoned estimates. 
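The surface-area scaling described above is usually applied by taking surface area proportional to body weight to the 2/3 power, so a dose expressed in mg/kg/day scales between species by the cube root of the body-weight ratio. A sketch, with illustrative body weights (the 70-kg human default is a common assumption, not specified in this passage):

```python
def human_equivalent_dose(animal_dose_mg_per_kg_day, bw_animal_kg,
                          bw_human_kg=70.0):
    """Scale an animal mg/kg/day dose to a human-equivalent dose on a
    body-surface-area basis. With surface area ~ bw^(2/3), equal doses
    per unit surface area imply the per-kg dose scales by
    (bw_animal / bw_human)^(1/3)."""
    return animal_dose_mg_per_kg_day * (bw_animal_kg / bw_human_kg) ** (1.0 / 3.0)

# A 10 mg/kg/day dose in a 0.03-kg mouse as a human-equivalent dose.
hed = human_equivalent_dose(10.0, 0.03)  # roughly 0.75 mg/kg/day
```

Note that when comparative pharmacokinetic data are available, they take precedence over this default scaling, as the passage above indicates.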
An appropriate treatment of exposure should consider the potential for exposure via ingestion, inhalation, and dermal penetration from relevant sources of exposures, including multiple avenues of intake from the same source. Special problems arise when the human exposure situation of concern suggests exposure regimens (e.g., route and dosing schedule) that are substantially different from those used in the relevant animal studies.

Unless there is evidence to the contrary in a particular case, the cumulative dose received over a lifetime, expressed as average daily exposure prorated over a lifetime, is recommended as an appropriate measure of exposure to a carcinogen. That is, the assumption is made that a high dose of a carcinogen received over a short period of time is equivalent to a corresponding low dose spread over a lifetime. This approach becomes more problematical as the exposures in question become more intense but less frequent, especially when there is evidence that the agent has shown dose-rate effects.

An attempt should be made to assess the level of uncertainty associated with the exposure assessment which is to be used in a cancer risk assessment. This measure of uncertainty should be included in the risk characterization (section III.C.) in order to provide the decision-maker with a clear understanding of the impact of this uncertainty on any final quantitative risk estimate. Subpopulations with heightened susceptibility (either because of exposure or predisposition) should, when possible, be identified.

C. Risk Characterization

Risk characterization is composed of two parts. One is a presentation of the numerical estimates of risk; the other is a framework to help judge the significance of the risk. Risk characterization includes the exposure assessment and dose-response assessment; these are used in the estimation of carcinogenic risk.
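The lifetime proration described above (cumulative dose expressed as an average daily exposure over a full lifetime) is often called a lifetime average daily dose. A minimal sketch, assuming a 70-year lifetime of 25,550 days, a conventional default rather than a value fixed by this passage:

```python
def lifetime_average_daily_dose(dose_mg_per_kg_day, days_exposed,
                                lifetime_days=25550):
    """Prorate an intermittent exposure over a lifetime (default:
    70 years = 25,550 days) to a lifetime average daily dose (LADD)."""
    return dose_mg_per_kg_day * days_exposed / lifetime_days

# Hypothetical: 1 mg/kg/day sustained for 10 years, prorated over a lifetime.
ladd = lifetime_average_daily_dose(1.0, 10 * 365)  # 1/7 mg/kg/day
```

This proration embodies exactly the equivalence assumption the passage flags as problematical for intense, infrequent exposures: a short high dose and a long low dose with the same cumulative total yield the same LADD.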
It may also consist of a unit-risk estimate which can be combined elsewhere with the exposure assessment for the purposes of estimating cancer risk. Hazard identification and dose-response assessment are covered in sections II. and III.A., and a detailed discussion of exposure assessment is contained in EPA's Guidelines for Estimating Exposures (U.S. EPA, 1986). This section deals with the numerical risk estimates and the approach to summarizing risk characterization.

1. Options for Numerical Risk Estimates. Depending on the needs of the individual program offices, numerical estimates can be presented in one or more of the following three ways.

a. Unit Risk -- Under an assumption of low-dose linearity, the unit cancer risk is the excess lifetime risk due to a continuous constant lifetime exposure of one unit of carcinogen concentration. Typical exposure units include ppm or ppb in food or water, mg/kg/day by ingestion, or ppm or µg/m3 in air.

b. Dose Corresponding to a Given Level of Risk -- This approach can be useful, particularly when using nonlinear extrapolation models where the unit risk would differ at different dose levels.

c. Individual and Population Risks -- Risks may be characterized either in terms of the excess individual lifetime risks, the excess number of cancers [51 FR 33999] produced per year in the exposed population, or both.

Irrespective of the options chosen, the degree of precision and accuracy in the numerical risk estimates currently does not permit more than one significant figure to be presented.

2. Concurrent Exposure. In characterizing the risk due to concurrent exposure to several carcinogens, the risks are combined on the basis of additivity unless there is specific information to the contrary. Interactions of cocarcinogens, promoters, and initiators with known carcinogens should be considered on a case-by-case basis.

3. Summary of Risk Characterization.
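The unit-risk option and the default additivity rule for concurrent exposure can be sketched together. Under low-dose linearity, excess risk is the unit risk times the lifetime exposure concentration, and risks from several agents are summed absent evidence of interaction; all numbers below are hypothetical:

```python
def risk_from_unit_risk(unit_risk, concentration):
    """Excess lifetime risk from a unit risk (risk per unit of
    concentration) and a continuous lifetime exposure concentration,
    assuming low-dose linearity."""
    return unit_risk * concentration

def combined_risk(individual_risks):
    """Combine risks from concurrent carcinogen exposures by simple
    additivity, the default absent evidence of interaction."""
    return sum(individual_risks)

# Hypothetical: agent 1 with unit risk 2e-6 per µg/m3 at 5 µg/m3,
# agent 2 with unit risk 1e-6 per µg/m3 at 3 µg/m3.
total = combined_risk([risk_from_unit_risk(2e-6, 5.0),
                       risk_from_unit_risk(1e-6, 3.0)])
```

Note that additivity is the stated default, not a finding; interactions among cocarcinogens, promoters, and initiators are handled case by case, as the passage directs.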
Whichever method of presentation is chosen, it is critical that the numerical estimates not be allowed to stand alone, separated from the various assumptions and uncertainties upon which they are based. The risk characterization should contain a discussion and interpretation of the numerical estimates that affords the risk manager some insight into the degree to which the quantitative estimates are likely to reflect the true magnitude of human risk, which generally cannot be known with the degree of quantitative accuracy reflected in the numerical estimates. The final risk estimate will generally be rounded to one significant figure and will be coupled with the EPA classification of the qualitative weight of evidence. For example, a lifetime individual risk of 2X10-4 resulting from exposure to a "probable human carcinogen" (Group B2) should be designated as 2X10-4 [B2]. This bracketed designation of the qualitative weight of evidence should be included with all numerical risk estimates (i.e., unit risks, risks at a specified concentration, or concentrations corresponding to a given risk). Agency statements, such as FEDERAL REGISTER notices, briefings, and action memoranda, frequently include numerical estimates of carcinogenic risk. It is recommended that whenever these numerical estimates are used, the qualitative weight-of-evidence classification should also be included.

The section on risk characterization should summarize the hazard identification, dose-response assessment, exposure assessment, and the public health risk estimates. Major assumptions, scientific judgments, and, to the extent possible, estimates of the uncertainties embodied in the assessment are presented.

IV. EPA Classification System for Categorizing Weight of Evidence for Carcinogenicity from Human and Animal Studies (Adapted from IARC)

A.
Assessment of Weight of Evidence for Carcinogenicity from Studies in Humans

Evidence of carcinogenicity from human studies comes from three main sources:

1. Case reports of individual cancer patients who were exposed to the agent(s).

2. Descriptive epidemiologic studies in which the incidence of cancer in human populations was found to vary in space or time with exposure to the agent(s).

3. Analytical epidemiologic (case-control and cohort) studies in which individual exposure to the agent(s) was found to be associated with an increased risk of cancer.

Three criteria must be met before a causal association can be inferred between exposure and cancer in humans:

1. There is no identified bias that could explain the association.

2. The possibility of confounding has been considered and ruled out as explaining the association.

3. The association is unlikely to be due to chance.

In general, although a single study may be indicative of a cause-effect relationship, confidence in inferring a causal association is increased when several independent studies are concordant in showing the association, when the association is strong, when there is a dose-response relationship, or when a reduction in exposure is followed by a reduction in the incidence of cancer.

The weight of evidence for carcinogenicity1 from studies in humans is classified as:

1. Sufficient evidence of carcinogenicity, which indicates that there is a causal relationship between the agent and human cancer.

2. Limited evidence of carcinogenicity, which indicates that a causal interpretation is credible, but that alternative explanations, such as chance, bias, or confounding, could not adequately be excluded.

1 For purposes of public health protection, agents associated with life-threatening benign tumors in humans are included in the evaluation.
2 An increased incidence of neoplasms that occur with high spontaneous background incidence (e.g., mouse liver tumors and rat pituitary tumors in certain strains) generally constitutes "sufficient" evidence of carcinogenicity, but may be changed to "limited" when warranted by the specific information available on the agent.

3 Benign and malignant tumors will be combined unless the benign tumors are not considered to have the potential to progress to the associated malignancies of the same histogenic origin.

3. Inadequate evidence, which indicates that one of two conditions prevailed: (a) there were few pertinent data, or (b) the available studies, while showing evidence of association, did not exclude chance, bias, or confounding, and therefore a causal interpretation is not credible.

4. No data, which indicates that data are not available.

5. No evidence, which indicates that no association was found between exposure and an increased risk of cancer in well-designed and well-conducted independent analytical epidemiologic studies.

B. Assessment of Weight of Evidence for Carcinogenicity from Studies in Experimental Animals

These assessments are classified into five groups:

1. Sufficient evidence2 of carcinogenicity, which indicates that there is an increased incidence of malignant tumors or combined malignant and benign tumors:3 (a) in multiple species or strains; or (b) in multiple experiments (e.g., with different routes of administration or using different dose levels); or (c) to an unusual degree in a single experiment with regard to high incidence, unusual site or type of tumor, or early age at onset. Additional evidence may be provided by data on dose-response effects, as well as information from short-term tests or on chemical structure.

2.
Limited evidence of carcinogenicity, which means that the data suggest a carcinogenic effect but are limited because: (a) the studies involve a single species, strain, or experiment and do not meet criteria for sufficient evidence (see section IV.B.1.c.); (b) the experiments are restricted by inadequate dosage levels, inadequate duration of exposure to the agent, inadequate period of follow-up, poor survival, too few animals, or inadequate reporting; or (c) an increase in the incidence of benign tumors only.

3. Inadequate evidence, which indicates that because of major qualitative or quantitative limitations, the studies cannot be interpreted as showing either the presence or absence of a carcinogenic effect.

4. No data, which indicates that data are not available.

5. No evidence, which indicates that there is no increased incidence of neoplasms in at least two well-designed [51 FR 34000] and well-conducted animal studies in different species.

The classifications "sufficient evidence" and "limited evidence" refer only to the weight of the experimental evidence that these agents are carcinogenic and not to the potency of their carcinogenic action.

C. Categorization of Overall Weight of Evidence for Human Carcinogenicity

The overall scheme for categorization of the weight of evidence of carcinogenicity of a chemical for humans uses a three-step process: (1) the weight of evidence in human studies or animal studies is summarized; (2) these lines of information are combined to yield a tentative assignment to a category (see Table 1); and (3) all relevant supportive information is evaluated to see if the designation of the overall weight of evidence needs to be modified.
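Steps (1) and (2) of this process can be sketched as a simple lookup of the illustrative assignments in Table 1. This is a hypothetical helper for illustration only; step (3), adjustment in light of supportive information, is a case-by-case scientific judgment and is not captured in code.

```python
# Tentative group assignment per the illustrative assignments of Table 1.
# TABLE_1[animal_evidence][human_evidence] -> tentative overall group.
# Step (3) -- ancillary-evidence adjustment -- is deliberately omitted.

TABLE_1 = {
    "sufficient":  {"sufficient": "A", "limited": "B1", "inadequate": "B2", "no data": "B2", "no evidence": "B2"},
    "limited":     {"sufficient": "A", "limited": "B1", "inadequate": "C",  "no data": "C",  "no evidence": "C"},
    "inadequate":  {"sufficient": "A", "limited": "B1", "inadequate": "D",  "no data": "D",  "no evidence": "D"},
    "no data":     {"sufficient": "A", "limited": "B1", "inadequate": "D",  "no data": "D",  "no evidence": "D"},
    "no evidence": {"sufficient": "A", "limited": "B1", "inadequate": "D",  "no data": "E",  "no evidence": "E"},
}

def tentative_group(human_evidence, animal_evidence):
    """Combine the two weight-of-evidence summaries into a tentative group."""
    return TABLE_1[animal_evidence][human_evidence]

# "Sufficient" animal evidence with "inadequate" human evidence -> Group B2:
assert tentative_group("inadequate", "sufficient") == "B2"
```

Note that, as the table footnote states, these assignments are tentative; a final categorization may differ once ancillary evidence is weighed.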
Relevant factors to be included along with the tumor information from human and animal studies include structure-activity relationships; short-term test findings; results of appropriate physiological, biochemical, and toxicological observations; and comparative metabolism and pharmacokinetic studies. The nature of these findings may cause one to adjust the overall categorization of the weight of evidence.

The agents are categorized into five groups as follows:

Group A -- Human Carcinogen

This group is used only when there is sufficient evidence from epidemiologic studies to support a causal association between exposure to the agents and cancer.

Group B -- Probable Human Carcinogen

This group includes agents for which the weight of evidence of human carcinogenicity based on epidemiologic studies is "limited" and also includes agents for which the weight of evidence of carcinogenicity based on animal studies is "sufficient." The group is divided into two subgroups. Usually, Group B1 is reserved for agents for which there is limited evidence of carcinogenicity from epidemiologic studies. It is reasonable, for practical purposes, to regard an agent for which there is "sufficient" evidence of carcinogenicity in animals as if it presented a carcinogenic risk to humans. Therefore, agents for which there is "sufficient" evidence from animal studies and for which there is "inadequate evidence" or "no data" from epidemiologic studies would usually be categorized under Group B2.
Group C -- Possible Human Carcinogen

This group is used for agents with limited evidence of carcinogenicity in animals in the absence of human data. It includes a wide variety of evidence, e.g., (a) a malignant tumor response in a single well-conducted experiment that does not meet conditions for sufficient evidence, (b) tumor responses of marginal statistical significance in studies having inadequate design or reporting, (c) benign but not malignant tumors with an agent showing no response in a variety of short-term tests for mutagenicity, and (d) responses of marginal statistical significance in a tissue known to have a high or variable background rate.

Group D -- Not Classifiable as to Human Carcinogenicity

This group is generally used for agents with inadequate human and animal evidence of carcinogenicity or for which no data are available.

Group E -- Evidence of Non-Carcinogenicity for Humans

This group is used for agents that show no evidence for carcinogenicity in at least two adequate animal tests in different species or in both adequate epidemiologic and animal studies. The designation of an agent as being in Group E is based on the available evidence and should not be interpreted as a definitive conclusion that the agent will not be a carcinogen under any circumstances.

V. References

Albert, R.E., Train, R.E., and Anderson, E. 1977. Rationale developed by the Environmental Protection Agency for the assessment of carcinogenic risks. J. Natl. Cancer Inst. 58:1537-1541.

Crump, K.S., Hoel, D.G., Langley, C.H., and Peto, R. 1976. Fundamental carcinogenic processes and their implications for low dose risk assessment. Cancer Res. 36:2973-2979.

Dedrick, R.L. 1973. Animal scale up. J. Pharmacokinet. Biopharm. 1:435-461.
Feron, V.J., Grice, H.C., Griesemer, R., Peto, R., Agthe, C., Althoff, J., Arnold, D.L., Blumenthal, H., Cabral, J.R.P., Della Porta, G., Ito, N., Kimmerle, G., Kroes, R., Mohr, U., Napalkov, N.P., Odashima, S., Page, N.P., Schramm, T., Steinhoff, D., Sugar, J., Tomatis, L., Uehleke, H., and Vouk, V. 1980. Basic requirements for long-term assays for carcinogenicity. In: Long-term and short-term screening assays for carcinogens: a critical appraisal. IARC Monographs, Supplement 2. Lyon, France: International Agency for Research on Cancer, pp. 21-83.

Freireich, E.J., Gehan, E.A., Rall, D.P., Schmidt, L.H., and Skipper, H.E. 1966. Quantitative comparison of toxicity of anticancer agents in mouse, rat, hamster, dog, monkey and man. Cancer Chemother. Rep. 50:219-244.

Haseman, J.K. 1984. Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies. Environ. Health Perspect. 58:385-392.

Haseman, J.K., Huff, J., and Boorman, G.A. 1984. Use of historical control data in carcinogenicity studies in rodents. Toxicol. Pathol. 12:126-135.

Interagency Regulatory Liaison Group (IRLG). 1979. Scientific basis for identification of potential carcinogens and estimation of risks. J. Natl. Cancer Inst. 63:245-267.

Interdisciplinary Panel on Carcinogenicity. 1984. Criteria for evidence of chemical carcinogenicity. Science 225:682-687.

International Agency for Research on Cancer (IARC). 1982. IARC Monographs on the [51 FR 34001] Evaluation of the Carcinogenic Risk of Chemicals to Humans, Supplement 4. Lyon, France: International Agency for Research on Cancer.

Mantel, N. 1980. Assessing laboratory evidence for neoplastic activity. Biometrics 36:381-399.

Mantel, N., and Haenszel, W. 1959. Statistical aspects of the analysis of data from retrospective studies of disease. J. Natl. Cancer Inst. 22:719-748.

National Center for Toxicological Research (NCTR). 1981. Guidelines for statistical tests for carcinogenicity in chronic bioassays.
NCTR Biometry Technical Report 81-001. Available from: National Center for Toxicological Research.

TABLE 1. -- ILLUSTRATIVE CATEGORIZATION OF EVIDENCE BASED ON ANIMAL AND HUMAN DATA1

                                 Human evidence
Animal evidence    Sufficient   Limited   Inadequate   No data   No evidence
Sufficient             A          B1          B2          B2          B2
Limited                A          B1          C           C           C
Inadequate             A          B1          D           D           D
No data                A          B1          D           D           D
No evidence            A          B1          D           E           E

1 The above assignments are presented for illustrative purposes. There may be nuances in the classification of both animal and human data indicating that different categorizations than those given in the table should be assigned. Furthermore, these assignments are tentative and may be modified by ancillary evidence. In this regard all relevant information should be evaluated to determine if the designation of the overall weight of evidence needs to be modified. Relevant factors to be included along with the tumor data from human and animal studies include structure-activity relationships, short-term test findings, results of appropriate physiological, biochemical, and toxicological observations, and comparative metabolism and pharmacokinetic studies. The nature of these findings may cause an adjustment of the overall categorization of the weight of evidence.

National Research Council (NRC). 1983. Risk assessment in the Federal government: managing the process. Washington, D.C.: National Academy Press.

National Toxicology Program. 1984. Report of the Ad Hoc Panel on Chemical Carcinogenesis Testing and Evaluation of the National Toxicology Program, Board of Scientific Counselors. Available from: U.S. Government Printing Office, Washington, D.C. 1984-421-132:4726.

Nutrition Foundation. 1983. The relevance of mouse liver hepatoma to human carcinogenic risk: a report of the International Expert Advisory Committee to the Nutrition Foundation. Available from: Nutrition Foundation. ISBN 0-935368-37-X.

Office of Science and Technology Policy (OSTP). 1985.
Chemical carcinogens: review of the science and its associated principles. Federal Register 50:10372-10442.

Peto, R., Pike, M., Day, N., Gray, R., Lee, P., Parish, S., Peto, J., Richards, S., and Wahrendorf, J. 1980. Guidelines for simple, sensitive, significant tests for carcinogenic effects in long-term animal experiments. In: Long-term and short-term screening assays for carcinogens: a critical appraisal. IARC Monographs, Supplement 2. Lyon, France: International Agency for Research on Cancer, pp. 311-426.

Pinkel, D. 1958. The use of body surface area as a criterion of drug dosage in cancer chemotherapy. Cancer Res. 18:853-856.

Tomatis, L. 1977. The value of long-term testing for the implementation of primary prevention. In: Origins of human cancer. Hiatt, H.H., Watson, J.D., and Winsten, J.A., eds. Cold Spring Harbor Laboratory, pp. 1339-1357.

U.S. Environmental Protection Agency (U.S. EPA). 1976. Interim procedures and guidelines for health risk and economic impact assessments of suspected carcinogens. Federal Register 41:21402-21405.

U.S. Environmental Protection Agency (U.S. EPA). 1980. Water quality criteria documents; availability. Federal Register 45:79318-79379.

U.S. Environmental Protection Agency (U.S. EPA). 1983a. Good laboratory practices standards -- toxicology testing. Federal Register 48:53922.

U.S. Environmental Protection Agency (U.S. EPA). 1983b. Hazard evaluations: humans and domestic animals. Subdivision F. Available from: NTIS, Springfield, VA. PB 83-153916.

U.S. Environmental Protection Agency (U.S. EPA). 1983c. Health effects test guidelines. Available from: NTIS, Springfield, VA. PB 83-232984.

U.S. Environmental Protection Agency (U.S. EPA). 1986, Sept. 24. Guidelines for estimating exposures. Federal Register 51(185):34042-34054.

U.S. Food and Drug Administration (U.S. FDA). 1982. Toxicological principles for the safety assessment of direct food additives and color additives used in food.
Available from: Bureau of Foods, U.S. Food and Drug Administration.

Ward, J.M., Griesemer, R.A., and Weisburger, E.K. 1979a. The mouse liver tumor as an endpoint in carcinogenesis tests. Toxicol. Appl. Pharmacol. 51:389-397.

Ward, J.M., Goodman, D.G., Squire, R.A., Chu, K.C., and Linhart, M.S. 1979b. Neoplastic and nonneoplastic lesions in aging mice. J. Natl. Cancer Inst. 63:849-854.

Part B: Response to Public and Science Advisory Board Comments

I. Introduction

This section summarizes the major issues raised during both the public comment period on the Proposed Guidelines for Carcinogen Risk Assessment published on November 23, 1984 (49 FR 46294), and also during the April 22-23, 1985, meeting of the Carcinogen Risk Assessment Guidelines Panel of the Science Advisory Board (SAB). In order to respond to these issues the Agency modified the proposed guidelines in two stages. First, changes resulting from consideration of the public comments were made in a draft sent to the SAB review panel prior to their April meeting. Second, the guidelines were further modified in response to the panel's recommendations.

The Agency received 62 sets of comments during the public comment period, including 28 from corporations, 9 from professional or trade associations, and 4 from academic institutions. In general, the comments were favorable. The commentors welcomed the update of the 1976 guidelines and felt that the proposed guidelines of 1985 reflected some of the progress that has occurred in understanding the mechanisms of carcinogenesis. Many commentors, however, felt that additional changes were warranted. The SAB concluded that the guidelines are "reasonably complete in their conceptual framework and are sound in their overall interpretation of the scientific issues" (Report by the SAB Carcinogenicity Guidelines Review Group, June 19, 1985).
The SAB suggested various editorial changes and raised some issues regarding the content of the proposed guidelines, which are discussed below. Based on these recommendations, the Agency has modified the draft guidelines.

II. Office of Science and Technology Policy Report on Chemical Carcinogens

Many commentors requested that the final guidelines not be issued until after publication of the report of the Office of Science and Technology Policy (OSTP) on chemical carcinogens. They further requested that this report be incorporated into the final Guidelines for Carcinogen Risk Assessment. The final OSTP report was published in 1985 (50 FR 10372). In its deliberations, the Agency reviewed the final OSTP report and feels that the Agency's guidelines are consistent with the principles established by the OSTP. In its review, the SAB agreed that the Agency guidelines are generally consistent with the OSTP report. To emphasize this consistency, the OSTP principles have been incorporated into the guidelines when controversial issues are discussed.

III. Inference Guidelines

Many commentors felt that the proposed guidelines did not provide a sufficient distinction between scientific fact and policy decisions. Others felt that EPA should not attempt to propose firm guidelines in the absence of scientific consensus. The SAB report also indicated the need to "distinguish recommendations based on scientific evidence from those based on science policy decisions." The Agency agrees with the recommendation that policy, judgmental, or inferential decisions should be clearly identified. In its revision of the proposed guidelines, the Agency has included phrases (e.g., "the Agency takes the position that") to more clearly distinguish policy decisions. The Agency also recognizes the need to establish procedures for action on important issues in the absence of complete scientific knowledge or consensus.
This need was acknowledged in both the National Academy of Sciences book entitled Risk Assessment in the Federal Government: Managing the Process and the OSTP report on chemical carcinogens. As the NAS report states, "Risk assessment is an analytic process that is firmly based on scientific considerations, but it also requires judgments to be made when the available information is incomplete. These judgments inevitably draw on both scientific and policy considerations." [51 FR 34002]

The judgments of the Agency have been based on currently available scientific information and on the combined experience of Agency experts. These judgments, and the resulting guidance, rely on inference; however, the positions taken in these inference guidelines are felt to be reasonable and scientifically defensible. While all of the guidance is, to some degree, based on inference, the guidelines have attempted to distinguish those issues that depended more on judgment. In these cases, the Agency has stated a position but has also retained flexibility to accommodate new data or specific circumstances that demonstrate that the proposed position is inaccurate. The Agency recognizes that scientific opinion will be divided on these issues. Knowledge about carcinogens and carcinogenesis is progressing at a rapid rate. While these guidelines are considered a best effort at the present time, the Agency has attempted to incorporate flexibility into the current guidelines and also recommends that the guidelines be revised as often as warranted by advances in the field.

IV. Evaluation of Benign Tumors

Several commentors discussed the appropriate interpretation of an increased incidence of benign tumors, alone or with an increased incidence of malignant tumors, as part of the evaluation of the carcinogenicity of an agent.
Some comments were supportive of the position in the proposed guidelines, i.e., under certain circumstances, the incidence of benign and malignant tumors would be combined, and an increased incidence of benign tumors alone would be considered an indication, albeit limited, of carcinogenic potential. Other commentors raised concerns about the criteria that would be used to decide which tumors should be combined. Only a few commentors felt that benign tumors should never be considered in evaluating carcinogenic potential.

The Agency believes that current information supports the use of benign tumors. The guidelines have been modified to incorporate the language of the OSTP report, i.e., benign tumors will be combined with malignant tumors when scientifically defensible. This position allows flexibility in evaluating the data base for each agent. The guidelines have also been modified to indicate that, whenever benign and malignant tumors have been combined, and the agent is considered a candidate for quantitative risk extrapolation, the contribution of benign tumors to the estimation of risk will be indicated.

V. Transplacental and Multigenerational Animal Bioassays

As one of its two proposals for additions to the guidelines, the SAB recommended a discussion of transplacental and multigenerational animal bioassays for carcinogenicity. The Agency agrees that such data, when available, can provide useful information in the evaluation of a chemical's potential carcinogenicity and has stated this in the final guidelines. The Agency has also revised the guidelines to indicate that such studies may provide additional information on the metabolic and pharmacokinetic properties of the chemical. More guidance on the specific use of these studies will be considered in future revisions of these guidelines.

VI. Maximum Tolerated Dose

The proposed guidelines discussed the implications of using a maximum tolerated dose (MTD) in bioassays for carcinogenicity.
Many commentors requested that EPA define MTD. The tone of the comments suggested that the commentors were concerned about the uses and interpretations of high-dose testing. The Agency recognizes that controversy currently surrounds these issues. The appropriate text from the OSTP report, which suggests that the consequences of high-dose testing be evaluated on a case-by-case basis, has been incorporated into the final guidelines.

VII. Mouse Liver Tumors

A large number of commentors expressed opinions about the assessment of bioassays in which the only increase in tumor incidence was liver tumors in the mouse. Many felt that mouse liver tumors were afforded too much credence, especially given existing information that indicates that they might arise by a different mechanism, e.g., tissue damage followed by regeneration. Others felt that mouse liver tumors were but one case of a high background incidence of one particular type of tumor and that all such tumors should be treated in the same fashion. The Agency has reviewed these comments and the OSTP principle regarding this issue. The OSTP report does not reach conclusions as to the treatment of tumors with a high spontaneous background rate, but states, as is now included in the text of the guidelines, that these data require special consideration. Although questions have been raised regarding the validity of mouse liver tumors in general, the Agency feels that mouse liver tumors cannot be ignored as an indicator of carcinogenicity. Thus, the position in the proposed guidelines has not been changed: an increased incidence of only mouse liver tumors will be regarded as "sufficient" evidence of carcinogenicity if all other criteria, e.g., replication and malignancy, are met, with the understanding that this classification could be changed to "limited" if warranted. The factors that may cause this re-evaluation are indicated in the guidelines.

VIII.
Weight-of-Evidence Categories

The Agency was praised by both the public and the SAB for incorporating a weight-of-evidence scheme into its evaluation of carcinogenic risk. Certain specific aspects of the scheme, however, were criticized.

1. Several commentors noted that while the text of the proposed guidelines clearly states that EPA will use all available data in its categorization of the weight of the evidence that a chemical is a carcinogen, the classification system in Part A, section IV did not indicate the manner in which EPA will use information other than data from humans and long-term animal studies in assigning a weight-of-evidence classification. The Agency has added a discussion to Part A, section IV.C. dealing with the characterization of overall evidence for human carcinogenicity. This discussion clarifies EPA's use of supportive information to adjust, as warranted, the designation that would have been made solely on the basis of human and long-term animal studies.

2. The Agency agrees with the SAB and those commentors who felt that a simple classification of the weight of evidence, e.g., a single letter or even a descriptive title, is inadequate to describe fully the weight of evidence for each individual chemical. The final guidelines propose that a paragraph summarizing the data should accompany the numerical estimate and weight-of-evidence classification whenever possible.

3. Several commentors objected to the descriptive title E (No Evidence of Carcinogenicity for Humans) because they felt the title would be confusing to people inexperienced with the classification system. The title for Group E, No Evidence of Carcinogenicity for Humans, was thought by these commentors to suggest the absence of data. This group, however, is intended to be reserved for agents for which there exist credible data demonstrating that the agent is not carcinogenic. Based on these comments and further discussion, the Agency has changed the
[51 FR 34003] title of Group E to "Evidence of Non-Carcinogenicity for Humans."

4. Several commentors felt that the title for Group C, Possible Human Carcinogen, was not sufficiently distinctive from Group B, Probable Human Carcinogen. Other commentors felt that those agents that minimally qualified for Group C would lack sufficient data for such a label. The Agency recognizes that Group C covers a range of chemicals and has considered whether to subdivide Group C. The consensus of the Agency's Carcinogen Risk Assessment Committee, however, is that the current groups, which are based on the IARC categories, are a reasonable stratification and should be retained at present. The structure of the groups will be reconsidered when the guidelines are reviewed in the future. The Agency also feels that the descriptive title it originally selected best conveys the meaning of the classification within the context of EPA's past and current activities.

5. Some commentors indicated a concern about the distinction between B1 and B2 on the basis of epidemiologic evidence only. This issue has been under discussion in the Agency and may be revised in future versions of the guidelines.

6. Comments were also received about the possibility of keeping the groups for animal and human data separate without reaching a combined classification. The Agency feels that a combined classification is useful; thus, the combined classification was retained in the final guidelines. The SAB suggested that a table be added to Part A, section IV to indicate the manner in which human and animal data would be combined to obtain an overall weight-of-evidence category. The Agency realizes that a table that would present all permutations of potentially available data would be complex and possibly impossible to construct, since numerous combinations of ancillary data (e.g., genetic toxicity, pharmacokinetics) could be used to raise or lower the weight-of-evidence classification.
Nevertheless, the Agency decided to include a table to illustrate the most probable weight-of-evidence classification that would be assigned on the basis of standard animal and human data without consideration of the ancillary data. While it is hoped that this table will clarify the weight-of-evidence classifications, it is also important to recognize that an agent may be assigned to a final categorization different from the category which would appear appropriate from the table and still conform to the guidelines.

IX. Quantitative Estimates of Risk

The method for quantitative estimation of carcinogenic risk in the proposed guidelines received substantial comments from the public. Five issues were discussed by the Agency and have resulted in modifications of the guidelines.

1. The major criticism was the perception that EPA would use only one method for the extrapolation of carcinogenic risk and would, therefore, obtain one estimate of risk. Even commentors who concurred with the procedure usually followed by EPA felt that some indication of the uncertainty of the risk estimate should be included with the risk estimate. The Agency feels that the proposed guidelines were not intended to suggest that EPA would perform quantitative risk estimates in a rote or mechanical fashion. As indicated by the OSTP report and paraphrased in the proposed guidelines, no single mathematical procedure has been determined to be the most appropriate method for risk extrapolation. The final guidelines quote rather than paraphrase the OSTP principle. The guidelines have been revised to stress the importance of considering all available data in the risk assessment and now state, "The Agency will review each assessment as to the evidence on carcinogenic mechanisms and other biological or statistical evidence that indicates the suitability of a particular extrapolation model."
Two issues are emphasized: First, the text now indicates the potential for pharmacokinetic information to contribute to the assessment of carcinogenic risk. Second, the final guidelines state that time-to-tumor risk extrapolation models may be used when longitudinal data on tumor development are available.

2. A number of commentors noted that the proposed guidelines did not indicate how the uncertainties of risk characterization would be presented. The Agency has revised the proposed guidelines to indicate that major assumptions, scientific judgments, and, to the extent possible, estimates of the uncertainties embodied in the risk assessment will be presented along with the estimation of risk.

3. The proposed guidelines stated that the appropriateness of quantifying risks for chemicals in Group C (Possible Human Carcinogen), specifically those agents that were on the boundary of Groups C and D (Not Classifiable as to Human Carcinogenicity), would be judged on a case-by-case basis. Some commentors felt that quantitative risk assessment should not be performed on any agent in Group C. Group C includes a wide range of agents, including some for which there are positive results in one species in one good bioassay. Thus, the Agency feels that many agents in Group C will be suitable for quantitative risk assessment, but that judgments in this regard will be made on a case-by-case basis.

4. A few commentors felt that EPA intended to perform quantitative risk estimates on aggregate tumor incidence. While EPA will consider an increase in total aggregate tumors as suggestive of potential carcinogenicity, EPA does not generally intend to make quantitative estimates of carcinogenic risk based on total aggregate tumor incidence.

5. The proposed choice of body surface area as an interspecies scaling factor was criticized by several commentors who felt that body weight was also appropriate and that both methods should be used.
The OSTP report recognizes that both scaling factors are in common use. The Agency feels that the choice of the body surface area scaling factor can be justified from the data on effects of drugs in various species. Thus, EPA will continue to use this scaling factor unless data on a specific agent suggest that a different scaling factor is justified. The uncertainty engendered by the choice of scaling factor will be included in the summary of uncertainties associated with the assessment of risk mentioned in point 1, above.

In the second of its two proposals for additions to the proposed guidelines, the SAB suggested that a sensitivity analysis be included in EPA's quantitative estimate of a chemical's carcinogenic potency. The Agency agrees that an analysis of the assumptions and uncertainties inherent in an assessment of carcinogenic risk must be accurately portrayed. Sections of the final guidelines that deal with this issue have been strengthened to reflect the concerns of the SAB and the Agency. In particular, the last paragraph of the guidelines states that "major assumptions, scientific judgments, and, to the extent possible, estimates of the uncertainties embodied in the assessment" should be presented in the summary characterizing the risk. Since the assumptions and uncertainties will vary for each assessment, the Agency feels that a formal requirement for a particular type of sensitivity analysis would be less useful than a case-by-case evaluation of the particular assumptions and uncertainties most significant for a particular risk assessment.