UNITED STATES ENVIRONMENTAL PROTECTION AGENCY WASHINGTON D.C. 20460 OFFICE OF THE ADMINISTRATOR SCIENCE ADVISORY BOARD January 21, 2010 EPA-SAB-10-003 The Honorable Lisa P. Jackson Administrator U.S. Environmental Protection Agency 1200 Pennsylvania Avenue, N.W. Washington, D.C. 20460 Subject: Review of EPA's Draft Expert Elicitation Task Force White Paper Dear Administrator Jackson: EPA's Office of the Science Advisor requested that the Science Advisory Board (SAB) review a white paper on expert elicitation (EE) prepared by a task force of the Agency's Science Policy Council. EPA's draft white paper defines expert elicitation as "a formal process by which expert judgment is obtained to quantify or probabilistically encode uncertainty about some uncertain quantity, relationship, parameter, or event of decision relevance." In response to the Agency's request, an SAB panel conducted a peer review of the draft white paper. The enclosed advisory report responds to the charge questions posed by the Agency. The SAB commends the task force for preparing a comprehensive and thoughtful white paper on the potential use of expert elicitation at the Agency. The white paper was commissioned by EPA's Science Policy Council "to initiate a dialogue within the Agency about the conduct and use of EE and then to facilitate future development and appropriate use of EE methods." The SAB judges that the white paper succeeds in providing much information needed for the proposed dialogue and to facilitate future development and appropriate use of EE. The white paper provides a good introduction to EE for readers who may be unfamiliar with it and careful discussion of many of the issues that must be faced if the Agency is to use EE in the future. ------- The SAB offers some recommendations to improve the white paper: 1. Describe the strengths and weaknesses of EE in comparison with those of other approaches for aggregating information and quantifying uncertainty from multiple sources. Other methods for aggregating information include meta-analysis and expert committees. This discussion should consider when EE should be used as a complement or substitute for other methods. 2. Maintain and emphasize the distinction between issues that are particular to EE and issues that arise in any analysis of environmental policy or in any method to incorporate expert judgment. Because EE is a transparent method, it can highlight issues such as selection of experts, cognitive biases, and problem structuring that are also important for other approaches. 3. Address methods for evaluating and ensuring the quality of the elicited judgments, including tests of coherence (e.g., consistency among judgments of mutually dependent quantities) and performance (e.g., calibration, defined as consistency of elicited probability distributions with true values of quantities, which can only be evaluated for quantities whose values become known). 4. Expand the discussion about combining judgments across experts to consider: (a) how the decision about whether and how to combine depends on the objective of the study; (b) the level of the analysis at which to combine (e.g., combine judgments about a model input or combine model outputs derived by running a model using each expert's judgment about the input); and (c) performance-based methods for combination. 5. More carefully delineate the types of quantities suitable for EE. The SAB recommends that the quantities being elicited be measurable (at least in principle, if not in practice). 
Models used in environmental assessment are, of course, simplifications of the real world and often include parameters that do not correspond to any measurable feature of the real world (e.g., transfer coefficients in a compartmental fate-and-transfer model; dispersion coefficients in an atmospheric model). Model-dependent parameters should be elicited only when they can be unambiguously translated into or inferred from measurable quantities. 6. Give greater attention to the need to be explicit about the values of other quantities that are relevant to the quantity being elicited. This is important for two reasons. First, an expert's judgment about the value of a quantity will depend on whether other quantities are fixed, and if so at what values. (If not fixed, the expert must incorporate uncertainty about the values of these other quantities and their effects on the value of the elicited quantity into his judgment.) Second, when multiple quantities are elicited, the values of some of them may be mutually dependent (e.g., the value of one quantity may depend on the value of another, or some common factor may influence the values of both quantities). If the quantities are used as inputs to a model, it may be important to incorporate the dependence among them in order to accurately characterize uncertainty about the model output. Influence diagrams can be helpful for maintaining consistency about the values at which quantities are fixed. ------- 7. Emphasize the need for flexibility in EE implementation. The SAB suggests that the EPA be careful not to stifle innovation in EE methods by prescribing "checklist" or "cookbook" approaches. Rather, EE guidance should be in the form of goals and criteria for evaluating success that can be met by multiple approaches. Finally, the SAB encourages EPA to continue to explore the use of EE, to support research on the performance of EE and alternative approaches, and to conduct additional EE studies to gain experience and understanding of the advantages and disadvantages of EE and other methods in diverse applications. Thank you for the opportunity to provide advice on this important and timely topic. The SAB looks forward to receiving your response to this advisory. Sincerely yours, /Signed/ Dr. Deborah L. Swackhamer, Chair, Science Advisory Board /Signed/ Dr. James K. Hammitt, Chair, Science Advisory Board Expert Elicitation Advisory Panel Enclosures ------- NOTICE This report has been written as part of the activities of the EPA Science Advisory Board (SAB), a public advisory group providing extramural scientific information and advice to the Administrator and other officials of the Environmental Protection Agency. The SAB is structured to provide balanced, expert assessment of scientific matters related to problems facing the Agency. This report has not been reviewed for approval by the Agency and, hence, the contents of this report do not necessarily represent the views and policies of the Environmental Protection Agency, nor of other agencies in the Executive Branch of the Federal government, nor does mention of trade names of commercial products constitute a recommendation for use. Reports of the SAB are posted on the EPA website at http://www.epa.gov/sab. ------- Enclosure A U.S. Environmental Protection Agency Science Advisory Board Expert Elicitation Advisory Panel CHAIR Dr. James K. Hammitt, Professor, Center for Risk Analysis, Harvard University, Boston, MA MEMBERS Dr. William Louis Ascher, Donald C.
McKenna Professor of Government and Economics, Claremont McKenna College, Claremont, CA Dr. John Bailar, Scholar in Residence, The National Academies, Washington, DC Dr. Mark Borsuk, Assistant Professor, Engineering Sciences, Thayer School of Engineering, Dartmouth College, Hanover, NH Dr. Wandi Bruine de Bruin, Research Faculty, Department of Social & Decision Sciences, Carnegie Mellon University, Pittsburgh, PA Dr. Roger Cooke, Professor of Mathematics, Delft University of Technology, and Chauncey Starr Senior Fellow for Risk Analysis, Resources for the Future, Washington, DC Dr. John Evans, Senior Lecturer on Environmental Science, Harvard University, Portsmouth, NH Dr. Scott Ferson, Senior Scientist, Applied Biomathematics, Setauket, NY Dr. Paul Fischbeck, Professor, Engineering and Public Policy and Social and Decision Sciences, Carnegie Mellon University, Pittsburgh, PA Dr. H. Christopher Frey, Professor, Department of Civil, Construction and Environmental Engineering, College of Engineering, North Carolina State University, Raleigh, NC Dr. Max Henrion, CEO and Associate Professor, Lumina Decision Systems, Inc., Los Gatos, CA Dr. Alan J. Krupnick, Senior Fellow and Director, Quality of the Environment Division, Resources for the Future, Washington, DC ii ------- Dr. Mitchell J. Small, The H. John Heinz III Professor of Environmental Engineering, Department of Civil & Environmental Engineering and Engineering & Public Policy, Carnegie Mellon University, Pittsburgh, PA Dr. Katherine Walker, Senior Staff Scientist, Health Effects Institute, Boston, MA Dr. Thomas S. Wallsten, Professor and Chair, Department of Psychology, University of Maryland, College Park, MD SCIENCE ADVISORY BOARD STAFF Dr. Angela Nugent, Designated Federal Officer, Washington, DC iii ------- Enclosure B U.S. Environmental Protection Agency Science Advisory Board Fiscal Year 2009 CHAIR Dr. Deborah L. Swackhamer, Professor of Environmental Health Sciences and Co-Director, Water Resources Center, University of Minnesota, St. Paul, MN SAB MEMBERS Dr. David T. Allen, Professor, Department of Chemical Engineering, University of Texas, Austin, TX Dr. John Balbus, Adjunct Associate Professor, George Washington University, School of Public Health and Health Services, Washington, DC Dr. Gregory Biddinger, Coordinator, Natural Land Management Programs, Toxicology and Environmental Sciences, ExxonMobil Biomedical Sciences, Inc., Houston, TX Dr. Timothy Buckley, Associate Professor and Chair, Division of Environmental Health Sciences, School of Public Health, The Ohio State University, Columbus, OH Dr. Thomas Burke, Professor, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD Dr. James Bus, Director of External Technology, Toxicology and Environmental Research and Consulting, The Dow Chemical Company, Midland, MI Dr. Deborah Cory-Slechta, Professor, Department of Environmental Medicine, School of Medicine and Dentistry, University of Rochester, Rochester, NY Dr. Terry Daniel, Professor of Psychology and Natural Resources, Department of Psychology, School of Natural Resources, University of Arizona, Tucson, AZ Dr. Otto C. Doering III, Professor, Department of Agricultural Economics, Purdue University, W. Lafayette, IN Dr. David A. Dzombak, Walter J. Blenko Sr.
Professor of Environmental Engineering, Department of Civil and Environmental Engineering, College of Engineering, Carnegie Mellon University, Pittsburgh, PA Dr. T. Taylor Eighmy, Interim Vice President for Research, Office of the Vice President for Research, University of New Hampshire, Durham, NH IV ------- Dr. Baruch Fischhoff, Howard Heinz University Professor, Department of Social and Decision Sciences, Department of Engineering and Public Policy, Carnegie Mellon University, Pittsburgh, PA Dr. James Galloway, Professor, Department of Environmental Sciences, University of Virginia, Charlottesville, VA Dr. John P. Giesy, Professor, Department of Zoology, Michigan State University, East Lansing, MI Dr. James K. Hammitt, Professor, Center for Risk Analysis, Harvard University, Boston, MA Dr. Rogene Henderson, Senior Scientist Emeritus, Lovelace Respiratory Research Institute, Albuquerque, NM Dr. James H. Johnson, Professor and Dean, College of Engineering, Architecture & Computer Sciences, Howard University, Washington, DC Dr. Bernd Kahn, Professor Emeritus and Director, Environmental Radiation Center, Nuclear and Radiological Engineering Program, Georgia Institute of Technology, Atlanta, GA Dr. Agnes Kane, Professor and Chair, Department of Pathology and Laboratory Medicine, Brown University, Providence, RI Dr. Meryl Karol, Professor Emerita, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA Dr. Catherine Kling, Professor, Department of Economics, Iowa State University, Ames, IA Dr. George Lambert, Associate Professor of Pediatrics, Director, Center for Childhood Neurotoxicology, Robert Wood Johnson Medical School-UMDNJ, Belle Mead, NJ Dr. Jill Lipoti, Director, Division of Environmental Safety and Health, New Jersey Department of Environmental Protection, Trenton, NJ Dr. Lee D. McMullen, Water Resources Practice Leader, Snyder & Associates, Inc., Ankeny, IA Dr. Judith L. Meyer, Distinguished Research Professor Emeritus, Odum School of Ecology, University of Georgia, Athens, GA Dr. Jana Milford, Professor, Department of Mechanical Engineering, University of Colorado, Boulder, CO ------- Dr. M. Granger Morgan, Lord Chair Professor in Engineering, Department of Engineering and Public Policy, Carnegie Mellon University, Pittsburgh, PA Dr. Christine Moe, Eugene J. Gangarosa Professor, Hubert Department of Global Health, Rollins School of Public Health, Emory University, Atlanta, GA Dr. Duncan Patten, Research Professor, Department of Land Resources and Environmental Sciences, Montana State University, Bozeman, MT, USA Mr. David Rejeski, Director, Foresight and Governance Project, Woodrow Wilson International Center for Scholars, Washington, DC Dr. Stephen M. Roberts, Professor, Department of Physiological Sciences, Director, Center for Environmental and Human Toxicology, University of Florida, Gainesville, FL Dr. Joan B. Rose, Professor and Homer Nowlin Chair for Water Research, Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI Dr. Jonathan M. Samet, Professor and Chair , Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD Dr. James Sanders, Director and Professor, Skidaway Institute of Oceanography, Savannah, GA Dr. Jerald Schnoor, Allen S. Henry Chair Professor, Department of Civil and Environmental Engineering, Co-Director, Center for Global and Regional Environmental Research, University of Iowa, Iowa City, IA Dr. 
Kathleen Segerson, Professor, Department of Economics, University of Connecticut, Storrs, CT Dr. Kristin Shrader-Frechette, O'Neil Professor of Philosophy, Department of Biological Sciences and Philosophy Department, University of Notre Dame, Notre Dame, IN Dr. V. Kerry Smith, W.P. Carey Professor of Economics, Department of Economics, W.P. Carey School of Business, Arizona State University, Tempe, AZ Dr. Thomas L. Theis, Director, Institute for Environmental Science and Policy, University of Illinois at Chicago, Chicago, IL Dr. Valerie Thomas, Anderson Interface Associate Professor, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA Dr. Barton H. (Buzz) Thompson, Jr., Robert E. Paradise Professor of Natural vi ------- Resources Law at the Stanford Law School and Perry L. McCarty Director, Woods Institute for the Environment, Stanford University, Stanford, CA Dr. Robert Twiss, Professor Emeritus, University of California-Berkeley, Ross, CA Dr. Thomas S. Wallsten, Professor, Department of Psychology, University of Maryland, College Park, MD Dr. Lauren Zeise, Chief, Reproductive and Cancer Hazard Assessment Branch, Office of Environmental Health Hazard Assessment, California Environmental Protection Agency, Oakland, CA SCIENCE ADVISORY BOARD STAFF Mr. Thomas Miller, Designated Federal Officer, U.S. Environmental Protection Agency, Washington, DC vii ------- Review of EPA's Draft Expert Elicitation Task Force White Paper EPA's Office of the Science Advisor requested that the Science Advisory Board (SAB) review a draft white paper on expert elicitation (EE) prepared by a task force of the Agency's Science Policy Council. As described in the white paper, EE "is a formal, systematic process of obtaining and quantifying expert judgment on the probabilities of events, relationships, or parameters... It can enable quantitative estimation of uncertain values and can provide uncertainty distributions where data are unavailable or inadequate. In addition, EE may be valuable for questions that are not necessarily quantitative, such as model conceptualization or design of observational systems." The white paper describes EPA's experience with EE in the Office of Air and Radiation, Office of Air Quality Planning and Standards, and the use or recommendation of the approach for such different EPA applications as assessing the magnitude of sea level rise associated with climate change and ecological model development. The draft white paper was intended "to initiate a dialogue within the Agency about the conduct and use of EE and then to facilitate future development and appropriate use of EE methods." The white paper discussed the potential utility of using expert elicitation to support EPA regulatory and non-regulatory analyses and decision-making. It provided recommendations for expert elicitation "good practices" and described steps for a broader application across EPA. RESPONSE TO AGENCY CHARGE QUESTIONS Charge question A - background and definition of expert elicitation Does the white paper provide a comprehensive accounting of the potential strengths, limitations, and uses of EE? Please provide comments that would help to further elucidate these potential strengths, limitations, and uses. Please identify others (especially EPA uses) that merit discussion. The white paper provides a comprehensive overview of EE, its strengths and limitations, and issues relevant to its use by EPA. We offer some suggestions for possible improvement. 1.
Include a more focused discussion of when to use EE that compares it with other approaches that might be used as alternatives, or complements, in particular cases. EE is a method to characterize what is known about the value of a quantity of interest. For example, EPA may be concerned about the shape and slope of an exposure-response function (e.g., when analyzing the consequences of policies to control exposure to a pollutant). EE is a structured method for synthesizing existing data, models, and understanding by eliciting subjective probability distributions from subject-matter experts. Other methods for characterizing such quantities by synthesizing existing information include, inter alia, (a) unstructured expert judgment of EPA or other analysts, perhaps complemented by literature review, (b) meta-analysis of empirical studies, (c) unstructured expert committees (e.g., SAB, National Research Council), and (d) structured group processes (e.g., Delphi). Another method ------- to estimate a quantity of interest is to collect additional primary data. Primary data collection can provide more data, and data that are more relevant to the problem that motivates EPA's interest. EE can be employed as a substitute for, or a complement to, other approaches. In some cases, results of a single empirical study or a meta-analysis of multiple studies may provide an appropriate characterization of what is known about a quantity. In others, it may be appropriate to conduct a meta-analysis as input to EE. In still other cases, it may be appropriate to conduct EE without any meta-analysis. Even when additional primary data are collected, it may still be appropriate to conduct an EE to interpret the implications of these data for the problem of interest to EPA. EE may be particularly useful in cases where it is necessary to extrapolate some distance from available data (e.g., from data on laboratory animals to humans, or from epidemiological data on an occupationally exposed human population to an environmentally exposed population). EE studies can be integrated into research planning if they elicit information on how an expert's judgments would be influenced by possible outcomes of a research study. For example, experts can be queried about their probability distributions for relationships given alternative outcomes of a study (Kadane and Wolfson, 1998), or the likelihood function for a proposed experiment can be elicited directly (Small, 2008). With these assessments, the EE results can be used as part of value-of-information studies to identify research priorities and may be updated in an adaptive manner as new research results are obtained. In summary, EE is a useful way to organize and understand what is known about a quantity and to identify what remains to be studied.
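To make the adaptive updating described above concrete, the minimal sketch below combines a hypothetical elicited distribution for an exposure-response slope with the result of a notional new study, using a standard conjugate normal update. All numbers, including the exceedance threshold, are illustrative assumptions rather than values drawn from the white paper or the studies cited above.

import math

# Hypothetical elicited judgment about an exposure-response slope,
# approximated as a normal distribution (units are illustrative).
prior_mean, prior_sd = 1.0, 0.5

# Notional new study: estimate 0.4 with standard error 0.3.
study_est, study_se = 0.4, 0.3

# Conjugate normal-normal update with known variances:
# the posterior mean is a precision-weighted average.
prior_prec, study_prec = prior_sd**-2, study_se**-2
post_prec = prior_prec + study_prec
post_mean = (prior_mean * prior_prec + study_est * study_prec) / post_prec
post_sd = post_prec**-0.5

def prob_exceeds(threshold, mean, sd):
    """P(slope > threshold) for a normal distribution."""
    return 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2)))

print(f"posterior: mean {post_mean:.2f}, sd {post_sd:.2f}")
print(f"P(slope > 0.8) before the study: {prob_exceeds(0.8, prior_mean, prior_sd):.2f}")
print(f"P(slope > 0.8) after the study:  {prob_exceeds(0.8, post_mean, post_sd):.2f}")

The change in such an exceedance probability, weighed against the cost of the study that produces it, is the kind of comparison that a value-of-information analysis formalizes.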
2. Include a fuller discussion contrasting subjective (Bayesian) and objective (frequentist) probabilities. Frequentist probabilities describe the (objective) chance of an outcome conditional on a hypothesis (e.g., the probability that an individual with specified exposure will develop cancer conditional on a linear no-threshold dose-response model with specified slope); subjective probabilities characterize an individual's degree of belief that a particular event will occur (e.g., that an individual with specified exposure will develop cancer). Recognition of the relevance of subjective probabilities has several implications. First, EPA is generally interested in the probabilities of specific environmental, health, and economic outcomes, not in whether a particular scientific model is "correct." In an oft-quoted remark of George Box, "all models are wrong, but some are useful." In evaluating the outcomes of alternative policies, EPA should (and sometimes does) incorporate uncertainty about which of several models provides the best approximation. Second, the objective when using EE should be to elicit judgments about quantities about which people could know the truth, if the appropriate research were conducted. The white paper describes the goal of EE as characterization of experts' beliefs "about relationships, quantities, events, or parameters of interest" (p. 22). Quantities and events, if potentially measurable, are appropriate objects for elicitation. In contrast, elicitation of relationships or parameters that cannot be measured, even in principle, can be dangerous. Experts who do not work with the ------- specific model in which a parameter is defined may have little knowledge about the value of the parameter. Moreover, the relationship between the parameter value and outcomes that are potentially measurable may depend on the choice among several alternative models, some or all of which the expert may reject. Consider an example from Jones et al. (2001). The spread of a radioactive plume from a power plant is often modeled as a power-law function of distance, i.e., a(x) = P x^Q, where a(x) is the lateral plume spread and x is the downwind distance from the source. P and Q are parameters whose values depend on atmospheric stability at the time of release. This model is not derived from physical laws but provides a useful description when its parameters are estimated using results of tracer experiments. Experts have experience with values of a(x) measured in tracer experiments, and values of lateral spread at multiple distances from the source can be elicited. However, the problem of "probabilistic inversion," i.e., identifying probability distributions on P and Q that, when propagated through the model, produce the elicited distributions for lateral spread, is difficult; indeed, there may not be any solution, or the solution may not be unique (Jones et al., 2001; Cooke and Kraan, 2000). It is unreasonable to expect an expert to be able to perform this probabilistic inversion in the context of an EE. (Note that the problem of probabilistic inversion also exists when the distributions of lateral spread are obtained from measurements rather than from EE.) Other examples of model parameters that may not be suitable quantities for elicitation abound. These include the transfer coefficients in compartmental models describing environmental fate and transport or pharmacokinetics in the human body, and the parameters of the multistage dose-response model often used for carcinogenic chemicals.
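To make the probabilistic inversion problem concrete, the minimal sketch below uses purely illustrative distributions for P and Q; they are assumptions for this example, not values from Jones et al. (2001) or the white paper. The forward direction, from judgments about P and Q to lateral spread, is straightforward; the elicitation runs the other way.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) judgments about the power-law parameters.
P = rng.lognormal(mean=np.log(0.3), sigma=0.2, size=100_000)
Q = rng.normal(loc=0.85, scale=0.05, size=100_000)

# Forward problem: propagate (P, Q) to lateral spread a(x) = P * x**Q.
for x in (1_000.0, 10_000.0):  # downwind distance in metres
    a = P * x**Q
    lo, hi = np.percentile(a, [5, 95])
    print(f"x = {x:>7.0f} m: 90% interval for a(x) is {lo:.0f} to {hi:.0f} m")

# The inverse problem faced in elicitation -- finding a joint distribution for
# (P, Q) that reproduces expert-supplied distributions of a(x) at several
# distances -- is a constrained fitting exercise ("probabilistic inversion")
# with no guarantee of an exact or unique solution.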
Third, since subjective probabilities measure an individual's degree of belief, different experts may legitimately attach different probabilities to the same event. There may be no "correct" probability and, in general, no unique or well-accepted method for choosing among probabilities held by well-qualified experts. EE is a method for eliciting an expert's judgments about a quantity, integrating them into a coherent expression, and characterizing the expert's knowledge using probability. 3. Distinguish issues that are specific to EE from those that are common to any method of eliciting judgments or to any method for assessing consequences of environmental policies. Perhaps because it is a relatively transparent process, EE highlights many issues that are common to other methods that can be used to obtain judgments from domain experts or other individuals (as recognized in the white paper). For example, selection of experts is likely to be critical to EE, expert committees (e.g., SAB, National Research Council), Delphi methods, surveys, and peer review. Structuring the analysis and defining the quantities of interest are critical even when values will be obtained by literature review, measurement, or other methods that do not require explicit participation by experts. Judgments are inherent in many decisions made by analysts regarding the choice and interpretation of data, models, metrics, and results. ------- 4. The white paper could be informed by and reference more recent literature. A list of suggested references appears in Appendix A. Charge question B - transparency Transparency is important for analyses that support Agency scientific assessments and for characterization of uncertainties that inform Agency decision making. Please comment on whether the white paper presents adequate mechanisms for ensuring transparency when 1) considering the use of EE (chapter 4), 2) selecting experts (chapter 5); and 3) presenting and using EE results (chapter 6). Please identify any additional strategies that could improve transparency. Overall, the white paper is sensitive to issues of transparency. However, the extent to which "mechanisms for ensuring transparency" are described varies by topic. The white paper does present adequate mechanisms for ensuring transparency with regard to selecting experts and presenting and using EE results, but does not present such mechanisms when considering the use of EE. Although chapter 4 discusses a wide range of factors that should be considered when determining whether to conduct an EE study, it does not appear to describe any mechanisms for ensuring transparency about this decision. The question of whether to use EE in a particular instance should be viewed as part of the larger question of which analytic methods to use, and any mechanisms for ensuring transparency about the choice of methods should be applicable to consideration of whether to use EE. Transparency regarding the choice of methods is perhaps best ensured by including a discussion of methods whenever results of an analysis are presented. This discussion can include a description of the rationale for the particular methods chosen and a discussion of the comparative strengths and weaknesses of alternative methods that were not adopted. In general, EE is at least as transparent as most alternative methods for obtaining expert judgments. Unlike committee processes, each expert provides a set of judgments about the quantities that are elicited, and so the degree of overlap or disagreement among experts can be made readily apparent. Although it can be argued that transparency would be further enhanced by associating each distribution with the expert who provided it, the panel concludes that the disadvantages of identification (e.g., implicit pressure to provide a distribution consistent with an institutional position) more than offset the advantages in most cases.
To enhance transparency, it is important to characterize the expertise of the experts (individually and jointly) and to identify the experts' rationales for their quantitative judgments (for credibility and to decide when new understanding renders the results obsolete). Some of the benefits of enhanced transparency include the ability to: 1) evaluate strengths and weaknesses of the study in the future; 2) evaluate and enhance credibility by demonstrating that the approach was applied rigorously; and 3) withstand litigation and other challenges. In determining what should be transparent, it is useful to distinguish between process and ------- results. Aspects of the process that should be transparent include the methods used to select experts, their identities and relevant characteristics (e.g., scientific discipline), the questions used to elicit judgments and the methods used to ensure that the questions are clear to the experts and elicitors, and the interactions between experts and elicitors. Aspects of the results that should be transparent include the problem framing, definitions of the quantities elicited and characterization of other quantities on which the elicited quantities are conditioned, the experts' judgments, and their rationales for their judgments (e.g., key empirical studies, suspected biases of existing data). The white paper could provide further discussion about how to capture each expert's assumptions and the basis for his or her judgments, acknowledging the tradeoffs associated with deepening the interactions between elicitor and expert. The extended interaction between expert and elicitor that is often employed is intended to produce a more carefully considered judgment, i.e., one that better reflects each expert's understanding of a topic. However, this interaction can influence the results as compared with a more restricted interaction, e.g., in a remotely conducted Delphi or survey. The extent of interaction has implications for the resources required to conduct and document a study. The interaction between expert and elicitor and the rationale for the expert's judgment may be documented through an interview transcript, a written description of the rationale that the expert drafts or approves, a brief note, or other means. Charge question C.1 - selecting experts Section 5.2 considers the process of selecting experts. a) Although it is agreed that this process should seek a balanced group of experts who possess all appropriate expertise, there are multiple criteria that can be used to achieve these objectives. Does this white paper adequately address the different criteria and strategies that may be used for nominating and selecting experts? b) Are there additional technical aspects about this topic that should be included? Section 5.2 provides a good description of criteria and strategies for selecting experts. As noted, the problem of expert selection is common to any effort to use expert judgment in support of the development of regulatory policy - whether informal or formal, structured or unstructured. Hence the guidance offered below applies to other methods of incorporating expert judgment as well. For an EE study to succeed, the experts selected must be credible, the set of experts must be acceptable to stakeholders, and the process for selection should be clearly documented and replicable. To enhance the transparency and credibility of the study, experts should articulate the basis for their judgments. When quantitative judgments are to be obtained, whether through EE or alternative methods, the study will be better if experts have the ability to characterize their beliefs in terms of probability distributions that are well-calibrated and informative (i.e., relatively sharp). Typically, it is impossible to assess the calibration of experts' judgments for the quantities that are the subject of the study, because the true values will not become known in a relevant time period. There are exceptions, however: Hawkins and Evans (1989) and Walker et al. (2003) evaluated individual experts' judgments about subsequently measured human exposure to hazardous air pollutants. Calibration on seed variables (i.e., other quantities in the ------- expert's field, the values of which become known in a timely manner) can be assessed. A test for whether assessing calibration on seed variables is useful is to ask whether the perceived quality of the experts' judgments on the quantities of interest is affected by their performance (collectively or individually) on the seed variables. Assessing experts' calibration on almanac questions (e.g., the length of the Nile River) is not useful when such questions are not within their domain of expertise and not relevant to the quantities that are of interest.
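A minimal sketch of one simple calibration check on seed variables, using invented experts, intervals, and realizations: count how often the realized values fall inside each expert's elicited 90 percent intervals. This is only a rough surrogate for formal calibration scoring (e.g., the classical model of Cooke, 1991), but it illustrates the tension between calibration and informativeness noted above.

# Hypothetical elicited 90% intervals (5th and 95th percentiles) for five seed
# variables, and the values later observed for those variables.
experts = {
    "Expert A": [(2, 9), (10, 40), (0.1, 0.8), (100, 300), (5, 15)],   # wide intervals
    "Expert B": [(4, 8), (20, 25), (0.15, 0.4), (150, 180), (9, 11)],  # sharp intervals
}
observed = [7, 33, 0.2, 240, 8]

for name, intervals in experts.items():
    hits = sum(lo <= obs <= hi for (lo, hi), obs in zip(intervals, observed))
    print(f"{name}: {hits} of {len(observed)} realized seed values fall inside "
          f"the stated 90% intervals")

# A well-calibrated expert's 90% intervals should capture roughly 90% of the
# realized values; here the sharper (more informative) Expert B captures far
# fewer of them, illustrating why calibration and informativeness must be
# assessed together.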
The white paper suggests that expert selection may depend on whether the purpose of the study is to elicit the range of reasonable judgments or to provide a central estimate of the scientific community's judgment (pp. 69, 72). The panel offers two cautions. First, it may be difficult to select experts to represent the range of reasonable judgments, because their judgments may not be known before the elicitation and it may be difficult to determine what judgments are "reasonable." Second, scientific truth is not determined by majority vote, and so the frequency with which a view is held is not necessarily a good indicator of its validity. Moreover, estimates of any central tendency from an EE study may be sensitive to the exact set of experts selected, because of the small number of experts included. In addition, it is difficult to recruit a valid probability sample of experts because of difficulties in (a) defining the universe from which a sample should be drawn and (b) overcoming selection biases associated with experts' availability and willingness to participate in what can be a time-consuming and challenging process. Charge question C.2 - multi-expert aggregation Sections 5.4 and 6.7 present multi-expert aggregation. a) Among prominent EE practitioners there are varied opinions on the validity and approaches to aggregating the judgments obtained from multiple experts. Does this white paper capture sufficiently the range of important views on this topic? b) Are there additional technical aspects about this topic that should be included? As noted in the white paper, there is disagreement among EE scholars about the extent to which multi-expert aggregation is desirable and about the most appropriate methods for aggregation when it is conducted. The extent to which aggregation is appropriate may depend on the purpose of the study (e.g., to estimate consequences of a policy change or to characterize current understanding of some relationship). Aggregation of experts' judgments can be considered part of a more general question about when to aggregate across sources of information.
One aspect of this question is: how much should analysts aggregate across information sources when presenting estimates of policy consequences to a policy maker (and to other interested parties)? Information sources can include not only individual experts but also alternative models (e.g., dose-response models with or without a threshold), data used to estimate model parameters (e.g., different epidemiological cohorts), and others. One possibility is to aggregate as many relevant information sources as possible and to present the results in the form of a probability distribution or other summary of the likely magnitude of effects for relevant endpoints. Another possibility is to present multiple estimates of the magnitude of effects based on alternative information sources so that the policy maker (and others) can aggregate these multiple estimates judgmentally or using some other approach. Clearly, some ------- aggregation is virtually always required to yield a manageable number of alternative estimates for the decision maker to consider (e.g., even if there are only three parameters and three information sources for each, there are 27 alternative estimates). However, some indication of how the estimates depend on critical choices among information sources is also useful. A second aspect of the question is: at what stage of analysis to aggregate? With a non- linear model, the output when running the model using parameters based on an aggregation of information sources will generally differ from an aggregation of the outputs obtained when running the model using parameters based on each information source alone. The white paper would be improved by including a fuller discussion of performance- based combination methods (Cooke, 1991). Note that it is possible to empirically evaluate the quality of alternative methods for combining distributions when the values of the quantities that are elicited become known. For example, Cooke and Goossens (2008) compared the performance of alternative methods of combining experts' distributions for seed variables (see Clemen, 2008, and Cooke, 2008 for discussion), and one could evaluate the quality of alternative combinations of expert judgments in cases where the values of the target quantities become known (e.g., Walker et al., 2003; Hawkins and Evans, 1989). Whether experts' judgments are combined or not, the panel agrees with the recommendation that each judgment be reported individually (p. 83). This allows readers to see the individual judgments, to evaluate their similarities and differences, and potentially to aggregate them using alternative approaches. When the effects on model outputs of differences among experts' judgments about input values are not obvious, it may be useful to also report how model outputs depend on differences among the experts' judgments. Charge question C.3 - problem structure Section 5.2.2 discusses how the problem of an EE assessment is structured and decomposed using an "aggregated" or "disaggregated" approach. a) The preferred approach may be influenced by the experts available and the analyst's judgment. Does this discussion address the appropriate factors to consider when developing the structure for questions to be used in an EE assessment? b) Are there additional technical aspects about this topic that should be included? The panel agrees that the problem structure must be acceptable to the experts, specifically that it accords with their knowledge. 
It urges that the quantities for which judgments are elicited be quantities that are measurable (at least in principle, if not necessarily in practice). To the extent that experts use a common model that permits unambiguous translation between a model parameter and a quantity that is measurable (in principle), elicitation of judgments about the parameter may be more convenient (see related discussion and examples in response to charge question A). The white paper should give more attention to dependence among quantities. Dependence is important for at least two reasons. First, for experts to provide judgments about ------- the value of some quantity, they must be told the values of other quantities on which that quantity is being conditioned. Second, when experts are asked to provide judgments about multiple quantities, it may be important to elicit their judgments about dependencies among these quantities as well. Regarding the first point, if the quantity being elicited is dependent on the values of other quantities, then the expert must be told which of those quantities should be considered known (or held constant) and which should be considered unknown (or left unspecified). For the quantities considered to be known, the values must be specified so that the expert can take into account their influence on the elicited quantity. The influence of quantities left unspecified must be folded into the expert's uncertainty distribution. The "clairvoyance test," which requires "that an omniscient being with complete knowledge of the past, present, and future could definitively answer the question" (p. 12, fn. 4) attempts to capture the first issue (of dependence on other quantities) but is inadequately articulated. A better approach is to describe the measurement that one would make to determine the value of the quantity, including which of the other factors would be controlled. To illustrate, consider the elicitation of an expert's judgment about the maximum hourly ozone concentration in Los Angeles next summer. Maximum hourly ozone depends on temperature, wind speed and direction, precipitation, motor-vehicle emissions, and other factors. Depending on the purpose of the elicitation, the distribution of some of these may be specified. A clairvoyant would know the actual values of all these factors, but the expert cannot. Uncertainty about the values of the factors that are not specified must be folded into the expert's distribution. If experts are also asked their judgment about PM concentrations, the conditionalization on factors affecting PM concentrations should be consistent with that for the ozone question. Regarding the second point, when experts are asked to provide judgments about multiple quantities, dependencies among these quantities may be important. For example, using independent marginal distributions (ignoring correlation) for multiple uncertain parameters in a model can produce misleading outputs. Elicitation of mutually dependent quantities is complex and there is as yet no accepted best method. Evans et al. (1994) illustrate one approach, in which dependencies among multiple factors relating to the toxicity of chloroform were illustrated as a detailed tree and judgments about each factor were conditioned on the values of other factors in the tree. Jones et al. 
(2001) elicited marginal distributions for continuous variables, then characterized dependence by asking experts to report the probability that one variable would exceed its subjective median conditional on another variable exceeding its subjective median. Clemen et al. (2000) report experimental tests of different methods; more recent methods are discussed by Kurowicka and Cooke (2006). Maintaining a consistent "conditionalization" (i.e., a set of assumptions about which quantities are fixed at what levels or follow what probability distribution) across a large study is critical. Problem structuring and consistent conditionalization can be facilitated by the use of an influence diagram that depicts the variables of interest and the causal relationships or dependencies among them. The panel recommends replacing the diagram in Figure 6.1 with one formatted as an influence diagram showing relationships among variables. ------- The white paper identifies four categories of uncertainty (parameter, model, scenario, and decision-rule) and suggests that EE may be used to address each of them (pp. 50-51). The panel suggests that scenario and decision-rule uncertainty are not suitable objects for EE. Scenario uncertainty involves questions of designing scenarios that provide useful information about how the outputs of a model depend on various assumptions about input values. This question is distinct from a question about the magnitude of a potentially measurable quantity, such as a model input. Hence EE is not an appropriate tool for obtaining expert judgment about how best to design scenarios (although expert judgments about the values of input quantities, the relative importance of multiple factors to the value of an endpoint, or other issues can be a relevant input to scenario design). Decision-rule uncertainty concerns the principles that will be used to make a policy decision. The choice of principles is one to be made by policy makers subject to statute, guidance, and other applicable criteria, not by expert judgment about what principles will (or should) be applied. The white paper distinguishes scientific information from social value judgments and preferences and suggests that EE should not be used to provide values and preferences (pp. 11, 110). The panel acknowledges the distinctions between consequences, values, and preferences but notes that characterization of public preferences that may be used as inputs to economic evaluation (such as willingness to pay for a specified reduction in health risk) is a scientific question that may be legitimately addressed using EE. Description of public preferences is distinct from the question of the role of these preferences in policy making. Analogously, whether the dose-response function for a toxicant has a threshold and the level of the threshold are scientific questions that are distinct from the questions of whether and how these quantities should be used in policy making. Charge question C.4 & 5 - findings and recommendations 4) Sections 7.1 and 7.2 present the Task Force's findings and recommendations regarding: 1) selecting EE as a method of analysis, 2) planning and conducting EE, and 3) presenting and using results of an EE assessment. Are these findings and recommendations supported by the document? 5) Please identify any additional findings and recommendations that should be considered. Overall, the findings and recommendations are supported by the white paper. The panel suggests that these sections should include a discussion of the strengths and weaknesses of EE as compared with other approaches (e.g., meta-analysis, expert committees). An important topic that receives little attention in the white paper is the coherence of an expert's judgments. When an expert provides probability distributions to characterize personal knowledge about each of several quantities, the expert is providing information about a multivariate probability distribution. When there are dependencies among variables, it can be very easy to report distributions that do not satisfy basic properties of multivariate distributions (e.g., that the covariance matrix is positive semidefinite). Elicitation protocols should be structured to help an expert provide a coherent multivariate distribution that is consistent with ------- his or her knowledge, for example by eliciting distributions of one variable conditional on several alternative levels of another variable on which it is dependent, rather than eliciting a correlation coefficient between the two variables. Elicitation protocols can also include consistency checks, both to test for coherence of probability distributions and to confirm that the judgments are consistent with the expert's information.
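As an illustration of how easily such incoherence can arise, the minimal sketch below (with invented correlations) checks whether a set of elicited pairwise correlations for three model inputs could come from any valid joint distribution, i.e., whether the implied correlation matrix is positive semidefinite.

import numpy as np

# Hypothetical pairwise correlations elicited from one expert for three model
# inputs.  Each pair may look plausible on its own, yet no joint distribution
# has this correlation matrix.
corr = np.array([
    [1.0,  0.9,  0.9],
    [0.9,  1.0, -0.5],
    [0.9, -0.5,  1.0],
])

smallest = np.linalg.eigvalsh(corr).min()
print(f"smallest eigenvalue: {smallest:.3f}")
if smallest < 0:
    print("Not positive semidefinite: the three judgments are mutually "
          "incoherent and should be revisited with the expert.")

Conditional elicitation of the kind recommended above (distributions of one variable at several fixed levels of another) reduces the chance of such incoherence arising in the first place.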
The literature on cognitive biases is richer than is indicated in the white paper. In addition to estimation biases such as the anchoring and availability heuristics that are discussed, there are biases relating to uncertainty perception, such as probability misperception, the conjunction fallacy, pseudocertainty, the base-rate fallacy, and neglect of probability, all of which may distort experts' perceptions (Tucker et al., 2008). Strategies for overcoming these cognitive illusions and biases to ensure accurate and honest assessments should be discussed. The white paper reports, accurately, that EEs conducted in the manner it describes require substantial resources - they are neither quick nor inexpensive. The quantity of resources needed for an EE depends on the complexity of the question, including the need to structure the problem so that the quantities are sufficiently well-defined to be appropriate for elicitation, the number of experts, the need for pre- or post-elicitation workshops, the extent to which the elicitation interview and the rationale for specific judgments are documented, and other factors. Some studies have been conducted at lower cost; for example, of the 45 studies conducted by the group at Technical University Delft, most required between one and three person-months (Cooke and Goossens, 2008), although others have required one person-year and up to a week of time from each expert (Goossens et al., 2008). It would be useful to clarify the tradeoff between the cost of an EE study and the quality of its results and to understand how this tradeoff varies with study design. The panel suggests that the white paper could be made more accessible to the wide audience for which it is intended by including additional key terms, with practical definitions, in the white paper glossary. Some suggested terms are listed in Appendix B. Charge question D - development of future guidance As EPA considers the future development of guidance beyond this white paper, what additional specific technical areas should be addressed? What potential implications of having such guidance should be considered?
Do the topics and suggestions covered in the white paper regarding selection, conduct, and use of this technique provide a constructive foundation for developing "best practices "for EE methods? The topics and suggestions covered in the white paper regarding selection, conduct, and use of EE provide a constructive foundation for developing a description of "best practices" for EE, but some parts of the white paper should be revised to incorporate newer literature than is currently included (e.g., cognitive biases and elicitation of quantities, methods for assessing performance of experts, and aggregation of judgments across experts). In considering the development of guidance, the panel counsels EPA to be careful not to 10 ------- stifle innovation in EE methods and to encourage research on the performance of EE and alternative methods for characterizing uncertainty. As noted in the white paper, considerable experience with structured expert judgment exists in other fields, including nuclear, aerospace, volcanology, health, and finance. The challenge is to bring this experience to bear on the specific problem areas within EPA's mandate. It may be useful for EPA to conduct several EE studies on issues that are not critical to current policy decisions, employing different methods and evaluating results. Different teams could employ different methods to a common quantity to facilitate comparison of results. The panel encourages the development of guidance characterized as a set of goals and criteria for evaluating success that can be met by multiple approaches rather than something that will be used as a checklist or "cookbook." 11 ------- References Clemen, R. T. 2008. Comment on Cooke's Classical Method. Reliability Engineering and System Safety 93: 760-765. Clemen, R. T., Fischer, G. W., and Winkler, R. L. (2000). Assessing dependence: Some experimental results. Management Science 46, 1100-1115. Cooke, R.M. 1991. Experts in Uncertainty: Opinion and Subjective Probability in Science, Oxford. Cooke, R.M. 2008. Response to Comments. Reliability Engineering and System Safety 93:775-777. Cooke, R.M., and B. Kraan, 2000. Uncertainty in Compartmental Models for Hazardous Materials - A Case Study. Journal of Hazardous Materials 71: 253-268. Cooke, R.M., and L.J.H. Goossens. 2000. Procedures Guide for Structured Expert Judgment, European Commission Directorate-General for Research, EUR 18820. Cooke, R.M., and L.J.H. Goossens. 2008. TU Delft Expert Judgment Data Base. Reliability Engineering and System Safety 93: 657-674. Goossens, L.J.H., R.M. Cooke, AR. Hale, and Lj. Rodic-Wiersma. 2008. Fifteen Years of Expert Judgment at TU Delft. Safety Science 46: 234-244. Hawkins, N.C., and J.S. Evans. 1989. Subjective Estimation of Toluene Exposures: A Calibration Study of Industrial Hygienists. Applied Industrial Hygiene, 4: 61-68. Jones, J.A. et al., 2001. Probabilistic Accident Consequence Uncertainty Assessment using COSYMA: Methodology and Processing Techniques, EUR 18827, European Communities. Kadane, J.B. and L.J. Wolfson. 1998. Experiences in elicitation (with discussion). The Statistician 47: 1-20. Kurowicka, D., and R.M. Cooke. 2006. Uncertainty Analysis with High Dimensional Dependence Modeling, Wiley. Small, M.J. 2008. Methods for assessing uncertainty in fundamental assumptions and associated models for cancer risk assessment. Risk Analysis 28(5): 1289-1307. Tucker, W.T., S. Person, A. Finkel, and D. Slavin (eds.) 2008. Strategies for Risk Communication: Evolution, Evidence, Experience. 
Annals of the New York Academy of Sciences, Volume 1128, Blackwell Publishing, Boston. 12 ------- Appendix A Suggested additional references for inclusion in a revised White Paper Ariely, D. 2008. Predictably Irrational: The Hidden Forces that Shape our Decisions, Harper Collins Publishers, NY Ariely, D., Au, W-T, Bender, R. H., Budescu, D. U., Dietz, C. B., Gu, H., Wallsten, T.S., and Zauberman, G. 2000. The effects of averaging probability estimates between and within j udges. Journal of Experimental Psychology: Applied 6, 130-147. Bruine de Bruin, W., Fischbeck, P.S., Stiber, N.A. & Fischhoff, B. 2002. What number is "fifty-fifty"? Redistributing excess 50% responses in risk perception studies. Risk Analysis 22, 725-735. Bruine de Bruin, W., Fischhoff, B., Brilliant, L., & Caruso, D. 2006. Expert judgments of pandemic influenza risks. Global Public Health 1, 178-193. Bruine de Bruin, W., Fischhoff, B., Millstein, S.G. & Halpern-Felsher, B.L. 2000. Verbal and numerical expressions of probability: "It's a fifty-fifty chance." Organizational Behavior and Human Decision Processes 81, 115-131. Bruine de Bruin, W., Parker, A.M., & Fischhoff, B. (2007). Individual differences in Adult Decision-Making Competence. Journal of Personality and Social Psychology 92, 938-956. Clemen, RT. 2008. A Comment on Cooke's Classical Method. Reliability Engineering and System Safety 2008; 93 (5): 760-765. Cooke, RM, Goossens LHJ. TU Delft expert judgment database. Reliability Engineering and System Safety 2008; 93(5): 657-674. Fischhoff, B. & Bruine de Bruin, W. 1999. Fifty-fifty=50%? Journal of Behavioral Decision Making 72, 149-163. Fischhoff, B. 1994. What forecasts (seem to) mean. International Journal of Forecasting 10, 387-403. Gilovich, Thomas, Dale Griffin, and Daniel Kahneman, eds. 2002. Heuristics and biases: the psychology of intuitive judgment. Cambridge: Cambridge University Press. Glimcher, P.W. 2003. Decisions, Uncertainty, and the Brain: The Science of Neuroeconomics. MIT Press/Bradford Press. Kahneman, Daniel, and Amos Tversky, eds. 2000. Choices, values, and frames. Cambridge: Cambridge University Press. Kahneman, Slovic and Tversky eds. 1982. Judgment Under Uncertainty: Heuristics and Biases., Cambridge University Press, New York. Karlin, S. andW. J. Studden. 1966. Tchebyshev Systems: With Applications in Analysis and Statistics. Interscience, New York. Morgan, M.G., Fischhoff, B., Bostrom, A., & Atman, C. 2001. Risk communication: The mental models approach. New York: Cambridge University Press. Morgan, M.G., H. Dowlatabadi, M. Henrion, D. Keith, R. Lempert, S. McBrid, M. Small, T. Wilbanks (eds.), Best Practice Approaches for Characterizing, Communicating, and Incorporating Scientific Uncertainty in Decisionmaking, Final Report, Synthesis and Assessment Product 5.2, CCSP, National Oceanic and Atmospheric Administration, Washington D.C., 2009. available at http://www.climatescience.gov/Library/sap/sap5-2/final-report/default.htm. 13 ------- O'Hagan, A, Buck, C, Daneshkhah, A, Eiser, JR, Garthwaite, PH, Jenkinson, DJ, Oakley, JE, Rakow, T 2006. Uncertain Judgements; Eliciting Experts' Probabilities. John Wiley & Sons Ltd. Chichester, England. Tuomisto, J.T., A. Wilson, J.S. Evans, M. Tainio. 2008. Uncertainty in mortality response to airborne fine particulate matter: Combining European air pollution experts, Reliability Engineering and System Safety 93(5): 732-744. Schwarz, N. (1996). Cognition and communication: Judgmental biases, research methods and the logic of conversation. 
Hillsdale, NJ: Erlbaum. Smith, J.E. 1990. Moment Methods for Decision Analysis. Ph.D. Dissertation, Stanford University, Stanford, California. Wallsten, T.S., & Diederich, A. 2001. Understanding Pooled Subjective Probability Estimates. Mathematical Social Sciences 41, 1-18. Winkler, R.L., and R.T. Clemen. 2004. Multiple Experts vs. Multiple Methods: Combining Correlation Assessments. Decision Analysis 1(3): 167-176. Woloshin, S., & Schwartz, L.M. (2002). Press releases: Translating research into news. Journal of the American Medical Association 287, 2856-2858. In addition, many useful documents are available at the following websites: NUREG/EU probabilistic accident consequence uncertainty analysis: http://www.osti.gov/bridge/basicsearch.jsp EU probabilistic accident consequence uncertainty assessment using COSYMA: http://cordis.europa.eu/fp5-euratom/src/lib_docs.htm RFF expert judgment workshop: http://www.rff.org/rff/Events/Expert-Judgment-Workshop.cfm Radiation Protection Dosimetry 90 (2000): http://rpd.oxfordjournals.org/content/vol90/issue3/index.dtl TU Delft web site: http://dutiosc.twi.tudelft.nl/~risk/ 14 ------- Appendix B Suggested terms to add to the glossary in the White Paper and to use consistently throughout the document Accurate Aggregation Assumption Availability Averaging Bias Cognitive illusion Conditionalization Conditional probability Data gap Data quality Decision options Dependence Domain expert Elicitation Elicitor Encoding Estimates Event Extrapolation Heuristics Input Model Model choice Objective Overconfidence Paradigm Parameter Precision Quality Quantity Relationship Representativeness Robust Seed variable Subjective Subjective probability Weighting 15 -------