UNITED STATES ENVIRONMENTAL PROTECTION AGENCY
                                WASHINGTON D.C. 20460
                                                         OFFICE OF THE ADMINISTRATOR
                                                           SCIENCE ADVISORY BOARD
                                  July 11, 2008
EPA-COUNCIL-08-002
The Honorable Stephen L. Johnson
Administrator
U.S. Environmental Protection Agency
1200 Pennsylvania Avenue, N.W.
Washington, D.C. 20460

             Subject: Characterizing Uncertainty in Particulate Matter Benefits Using
                      Expert Elicitation

Dear Administrator Johnson:

       Prior to issuing the 2006 National Ambient Air Quality Standard (NAAQS) for
particulate matter (PM2.5), EPA's Office of Air Quality Planning and Standards (OAQPS)
within the Office of Air and Radiation completed a multi-year effort to characterize the
estimated benefits of reduced premature mortalities associated with exposures to PM2.5.
EPA used expert elicitation to quantitatively assess the relationship between exposures to
PM2.5 and the incidence of mortality, thus complementing and expanding the
epidemiological literature on this subject and incorporating probabilistic uncertainty
analysis. Completed in 2006, the PM2.5-Mortality Expert Elicitation addressed the
concentration-response function between PM2.5 and mortality and provided probabilistic
characterizations of uncertainty from 12 independent experts. The PM2.5-Mortality
Expert Elicitation received very favorable peer reviews (RTI International, Peer Review
of Expert Elicitation, September 2006) and became the basis for assessing monetized
benefits of the 2006 PM2.5 NAAQS.  EPA then asked the Science Advisory Board (SAB)
Staff Office to convene an expert panel to review the application of the PM2.5-Mortality
Expert Elicitation results to the benefits assessment for PM2.5. In particular, EPA asked
for guidance on the interpretation of expert elicitation results and their presentation in the
Executive Summary, Press Release, and Benefits Analysis chapter of EPA's Regulatory
Impact Analysis associated with the 2006 PM2.5 standard.

       To conduct this review, the Advisory Council on Clean Air Compliance Analysis
(Council) was augmented with noted experts in the health effects of air pollution and
expert elicitation (see enclosed roster).  The Council and invited experts met on May 8,
2008, to discuss charge questions from EPA. The Council's detailed advice and
recommendations are provided in the enclosed Advisory with highlights below.

-------
       The Council endorses EPA's application of the expert elicitation results.  The
Council finds that EPA accurately characterized each expert's concentration-response
function and expressed the uncertainty surrounding these functions in a technically sound
manner.  The probability distributions propagated for the experts' concentration-response
functions were appropriately constructed and applied to estimate benefits. EPA
thoroughly captured and expressed the breadth and diversity of opinion among experts
and clearly differentiated between estimates based on empirical  data (i.e., individual
epidemiological studies) and those based on expert judgments (that are informed by
epidemiological studies). The Council was asked whether EPA's benefits assessment
responded to the National Research Council (NRC) recommendation to "move the
assessment of uncertainties from its ancillary analyses into the primary analysis by
conducting probabilistic, multiple-source uncertainty analyses." (NRC, Estimating the
Public Health Benefits of Proposed Air Pollution Regulations, 2002). Our
answer is yes.

       The Council was asked whether the Agency should move toward presenting a
central estimate with uncertainty bounds or continue to provide  separate estimates for
each expert.  The Council believes the  answer to this question depends on the context of
the expert elicitation and its results.  On issues where experts have a  wide range of
opinions, it is important to provide separate estimates for each expert (or cluster of
experts sharing similar views), thus emphasizing the uncertainty associated with the
issue.  But where experts largely agree, it would be appropriate to collapse the various
estimates into a single distribution (or point estimate with uncertainty bounds) while still
providing the individual estimates elsewhere, perhaps in an Appendix or website. In
future analyses, the decision about aggregation must be made in the context of each
analysis and its purpose.

       On the critical  side, the Council believes there is room for improvement in
conveying the differences in assumptions (including the influence of key empirical
studies) that drive the differences among experts'  concentration-response functions. It
would  be useful to know why the experts agree on some things and disagree on others.
The benefits chapter could be improved by devoting less space to the experts'
quantitative judgments in exchange for more discussion to characterize their rationales.
The text could better elucidate the relative importance of various sources of uncertainty:
both those that were quantified and those that were not quantified.  These issues could be
addressed in the chapter and brought forward into the Executive Summary.

       The Council has concerns about the Executive Summary and Press Release. The
PM2.5-Mortality Expert Elicitation showed a strong consensus among scientists; however,
the Executive Summary and Press Release failed to show this central mass of expert
opinion.  Instead, the Press Release presented the tails of the distribution, showing a
range of $8 billion to $76 billion in net benefits. Presented with this range, the casual
reader  could easily infer substantial differences in scientific opinion when, in fact, there
was a pronounced central cluster of views on PM2.5 mortality. To communicate with a
wider audience, the Executive Summary and Press Release should have clearly stated that
scientific differences existed only with respect to the magnitude of the effect of PM2.5 on
mortality, not whether such an effect existed. The Executive Summary would have
benefited from a short description of the PM2.5-Mortality Expert Elicitation and the
rationale for its use in the context of the PM2.5 regulatory process.  Additional efforts are
needed to identify the most effective means of communicating both methods and results
to different kinds of readers. This is of particular importance for the Executive Summary
and Press Release, which are much more likely to be read in their entirety.  The Council
suggests that alternative and less complex graphics would provide much more useful
information than the tables that are included in the Executive Summary.

       Detailed recommendations are included in the enclosed Advisory.  On behalf of
the entire Council, we appreciate this opportunity to provide timely advice to the Agency.
We hope these comments are helpful to EPA as it proceeds with this important work.

                                  Sincerely,

                                         /Signed/

                                  James K. Hammitt, Chair
                                  Advisory Council on Clean Air Compliance
                                         Analysis
Enclosures

-------
                  U.S. Environmental Protection Agency
  Advisory Council on Clean Air Compliance Analysis Augmented for
       Benefits of Reduced PM-Mortality using Expert Elicitation
CHAIR

Dr. James K. Hammitt, Professor, Center for Risk Analysis, Harvard University,
Boston, MA.
COUNCIL MEMBERS

Dr. David T. Allen, Gertz Regents Professor in Chemical Engineering, Department of
Chemical Engineering, University of Texas, Austin, TX.

Dr. Dallas Burtraw, Senior Fellow, Resources for the Future, Washington, DC.

Dr. Shelby Gerking, Professor of Economics, Department of Economics, College of
Business Administration, University of Central Florida, Orlando, FL.

Dr. Wayne Gray, Professor, Department of Economics, Clark University, Worcester,
MA.

Dr. F. Reed Johnson, Senior Fellow and Principal Economist, RTI Health Solutions,
Research  Triangle Institute, NC.

Dr. Katherine Kiel, Associate Professor, Department of Economics, College of the Holy
Cross, One College Street, Worcester, MA.

Dr. Virginia McConnell, Senior Fellow and Professor of Economics, Resources for the
Future, Washington, DC.

Dr. David Popp, Associate Professor of Public Administration, Center for Policy
Research, The Maxwell School, Syracuse University, Syracuse, NY.

Dr. Chris Walcek, Senior Research Scientist, Atmospheric Sciences Research Center,
State University  of New York, Albany, NY.
HEALTH EFFECTS SUBCOMMITTEE MEMBERS

Mr. J.  Fintan Hurley, Scientific Director, Institute of Occupational Medicine (IOM),
Edinburgh, Scotland, United Kingdom.

Dr. Michael T. Kleinman, Professor, Department of Community & Environmental
Medicine, University of California, Irvine, CA.

Dr. Rebecca Parkin, Professor of Environmental and Occupational Health and Associate
Dean for Research and Public Health Practice, School of Public Health and Health
Services, George Washington University Medical Center, Washington, DC.
INVITED EXPERTS

Dr. Aaron Cohen, Principal Scientist, Health Effects Institute, Charlestown Navy Yard,
Boston, MA.

Dr. John Evans, Senior Lecturer on Environmental Science, Harvard University, Kuwait
Public Health Project, Portsmouth, NH.

Dr. H. Christopher Frey, Professor, Civil, Construction and Environmental
Engineering, College of Engineering, North Carolina State University, Raleigh, NC.

Dr. Ronald Wyzga, Technical Executive, Air Quality Health and Risk, Electric Power
Research Institute, Palo Alto, CA.
SCIENCE ADVISORY BOARD STAFF

Dr. Holly Stallworth, Designated Federal Officer, Science Advisory Board Staff Office,
Environmental Protection Agency, Washington, DC.

-------
                                    NOTICE
This report has been written as part of the activities of the U.S. Environmental Protection
Agency's Advisory Council on Clean Air Compliance Analysis (Council), a federal
advisory committee administratively located under the EPA Science Advisory Board
(SAB) Staff Office. The Council is chartered to provide extramural scientific information
and advice to the Administrator and other officials of the EPA. The Council is structured
to provide balanced, expert assessment of scientific matters related to issues and problems
facing the Agency. This report has not been reviewed for approval by the Agency and,
hence, the contents of this report do not necessarily represent the views and policies of
the EPA, nor of other agencies in the Executive Branch of the Federal government, nor
does mention of trade names or commercial products constitute a recommendation for
use. Council reports  are posted on the SAB Web site at: http://www.epa.gov/sab.

-------
           Advisory Council on Clean Air Compliance Analysis
 Advisory on Characterizing Uncertainty in Particulate Matter Benefits
                           Using Expert Elicitation
1.  In the PM NAAQS benefits chapter, has EPA accurately characterized each expert's
   concentration-response function as expressed in the PM-Mortality Expert Elicitation
   report and conveyed the differences in assumptions (including the influence of key
   empirical studies) that drive the differences among the concentration-response
   functions?

In the benefits chapter (Chapter 5) of the particulate matter (PM) National Ambient Air
Quality Standard (NAAQS) regulatory impact analysis (RIA), EPA has accurately
described the experts' concentration-response (C-R) functions in general terms and has
clearly summarized the implications of each expert's C-R function for the expected
reduction in fatalities (and their monetary valuation) in Figures 5-10 through 5-13.  The benefits
chapter does not report each expert's C-R function nor describe the factors (such as
differences in assumptions and reliance on particular studies) that drive the differences
among the C-R functions. The individual experts' C-R functions, as well as their
perspectives, rationales, and reliance on empirical studies in formulating their
judgments, are described in detail in the original reports of the expert
elicitation study, including The Expanded Expert Judgment Assessment of the
Concentration-Response Relationship between PM2.5 Exposure and Mortality (Industrial
Economics, 2006) and its technical support documents. The Council believes that it
would be useful if Chapter 5 provided some discussion of the primary studies on which
the experts relied and of the factors that drive differences among their responses, though
we are sensitive to concerns that the regulatory analysis should not be overly long and
complex.
2.  In applying the PM-Mortality Expert Elicitation results in EPA's benefit analysis, is
   our mathematical treatment of concepts such as the probability of causality,
   thresholds, and shape of the function technically sound, as well as transparent?

The mathematical treatment of concepts such as the probability of causality, thresholds,
and shape of the function is technically sound and transparent. For each expert, EPA
combined the expert's quantitative assessment of his beliefs about annual average PM2.5
and mortality hazard into an unconditional distribution of that expert's views of how
mortality hazards change per unit change in annual average PM2.5, at different baseline
levels of that annual average in the range 4-30 µg/m3. For experts who expressed a non-
zero probability of a threshold, EPA made appropriate assumptions about how that
probability was distributed within ranges of annual average PM2.5 concentrations.
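
A minimal numerical sketch of this kind of derivation, for a single hypothetical expert, is
shown below (in Python). The probability of causality, the conditional threshold probability
and range, and the slope percentiles are invented values used only to illustrate how the
conditional judgments combine into an unconditional distribution; they are not taken from
the elicitation report.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical elicited quantities for a single expert (illustration only):
    p_causal = 0.95          # probability that the PM2.5-mortality association is causal
    p_threshold = 0.30       # probability, given causality, that a threshold exists
    threshold_low, threshold_high = 5.0, 10.0        # assumed threshold range (ug/m3)
    slope_p05, slope_p50, slope_p95 = 0.3, 1.0, 2.0  # % change in hazard per 1 ug/m3

    def unconditional_slope(baseline_pm, n=100_000):
        """Combine the conditional judgments into an unconditional slope distribution."""
        causal = rng.random(n) < p_causal
        has_threshold = rng.random(n) < p_threshold
        threshold = rng.uniform(threshold_low, threshold_high, size=n)
        slope = rng.triangular(slope_p05, slope_p50, slope_p95, size=n)
        # The slope is zero when the association is judged non-causal, or when a
        # threshold exists and the baseline concentration lies below it.
        active = causal & ~(has_threshold & (baseline_pm < threshold))
        return np.where(active, slope, 0.0)

    draws = unconditional_slope(baseline_pm=12.0)
    print("Unconditional 5th/50th/95th percentiles:", np.percentile(draws, [5, 50, 95]))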

Some Council members expressed concern about how the derived unconditional
distribution for each expert is used to produce estimates of mortality impacts. The PM
NAAQS benefits chapter uses the C-R functions to estimate annual "attributable deaths,"
using an apparently simple static methodology. It is increasingly recognized (i) that there
are difficulties underlying this concept, including that the estimated annual deaths do not
reproduce year-on-year; and (ii) that these difficulties can be overcome by use of life
tables, which also allow benefits to be expressed as gains in life expectancy (see, e.g.,
Rabl 2003, 2006).  The Council encourages EPA to identify this conceptual issue in the
use of both the expert elicitation and the cohort study results for benefits analysis. We
believe both types of C-R functions could be used in life-table calculations. (Indeed, such
calculations were apparently reported in an Appendix to the RIA, though not in Chapter
5). (We note that EPA suggests that measures of the gain in life expectancy may provide
a "theoretically preferred" method to value changes in mortality risk (p. 5-56), but it
does not discuss the assumptions used to estimate attributable deaths or attempt to
quantify the uncertainties in impacts and valuation estimates that result.)
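
The life-table idea can be illustrated with the following minimal sketch, which uses
invented mortality hazards rather than actual life-table data; it shows how a proportional
reduction in the mortality hazard translates into a gain in life expectancy rather than a
count of attributable deaths.

    import numpy as np

    # Illustrative annual mortality hazards by single year of age (invented values
    # rising roughly exponentially with age); a real analysis would use published
    # life tables and age-specific hazard reductions.
    ages = np.arange(0, 101)
    baseline_hazard = 0.0005 * np.exp(0.085 * ages)

    def life_expectancy(hazard):
        """Approximate period life expectancy at birth from annual mortality hazards."""
        survival = np.cumprod(1.0 - np.clip(hazard, 0.0, 1.0))
        return survival.sum()   # crude approximation: one person-year per year survived

    # Suppose a PM2.5 reduction lowers the mortality hazard by 1% at ages 30 and above.
    reduced_hazard = baseline_hazard.copy()
    reduced_hazard[30:] *= 0.99

    gain_years = life_expectancy(reduced_hazard) - life_expectancy(baseline_hazard)
    print(f"Gain in life expectancy at birth: {gain_years * 365:.0f} days")
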
3.  Do the tables, text, conclusions, and Executive Summary adequately distinguish the
   benefit estimates based on data-derived components of the uncertainty assessment
   from those based on expert judgment? How should the mortality estimates based on
   the elicitation be compared to those derived from the empirical studies of the PM-
   mortality association?

The tables, text, and conclusions of Chapter 5 clearly distinguish the benefit estimates
based on direct application of C-R functions from epidemiological studies (so-called
"data-derived components") from those based on the C-R functions elicited from the
experts (that are, of course, informed by epidemiological studies and other data). Overall,
the Council agrees that separately identifying the estimates based on their sources, and
reporting estimates based on multiple relevant epidemiological studies and on each
expert's C-R function, is a useful and appropriate method for accurately portraying the
uncertainty about the effects of PM2.5 on mortality (whether reported as number of
premature deaths  averted or as a monetary value).

It should be noted that the estimates derived from primary epidemiological studies and
from expert elicitation are not fully comparable. The epidemiological studies cited are
cohort studies used to estimate the longer-term influences on mortality; the expert
elicitation addressed total mortality changes that could be associated "with a reduction in
annual average PM2.5 including both changes in short-term (e.g., 24 hour) and long-term
exposures to PM2.5."

In addition, the two epidemiological studies cited are among several that were considered
by the experts; it would be useful to present the range of estimates from several of the
other epidemiological studies considered (see Exhibit 3-3 in the Report) to see whether a
more comprehensive consideration of these studies yields as much variation in results as
the expert opinions. This could provide greater insight into the variation in expert
elicitation results.

The Council believes that graphical representations, such as the box-and-whiskers plots
in Figs. 5-12 and 5-13 in Chapter 5, provide a clear and concise method to represent this
information. These figures convey information about the likelihood of different ranges of
values as predicted from each C-R function (not simply a single range) and about the
degree of clustering and overlap among the different C-R functions (i.e., from individual
epidemiological studies and experts' judgments). The distribution functions presented in
Figs. 5-14 and 5-15 provide slightly more information, but most Council members find
them less informative, perhaps because the functions tend to stack on top of each other.
The Council was enthusiastic about the graphic shown below, which presents cumulative
distributions of benefits calculated using two epidemiological studies and selected
fractiles of each expert's C-R function, all clearly distinguished by using distinctive
symbols. This graphic was presented to the Council but not included in the RIA benefits
chapter itself.  In contrast, the tables included in the chapter and Executive Summary
permit only an impoverished representation of the degrees of certainty and uncertainty.

[Figure: "PM NAAQS RIA - Valuation of Benefits" - cumulative probability plotted against
monetized benefits in billions of dollars, showing distributions based on two
epidemiological studies and selected fractiles of each expert's C-R function.]

Graphic taken from "Characterizing the Uncertainty in Estimated Benefits of Reduced PM-
Mortality Using Expert Elicitation," presentation by Lisa Conner, Bryan Hubbell and Harvey
Richmond, Office of Air Quality Planning and Standards, Advisory Council on Clean Air
Compliance Analysis meeting, May 8, 2008.

Recognizing that standard box-and-whiskers plots such as Figs. 5-12 and 5-13 are
probably more complex than appropriate for the Executive Summary, the Council
suggests that alternative and less complex graphics, such as the figure above, would still
provide much more useful information than the tables that are included in the Executive
Summary.  Another option is a simplified box-and-whiskers plot including only the mean
or median and the 5th and 95th percentile values for each epidemiological study and each
expert.
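
Such a simplified interval plot could be generated along the following lines; the study and
expert labels and the numerical values are placeholders, not estimates from the RIA.

    import matplotlib.pyplot as plt

    # Placeholder values (5th percentile, mean, 95th percentile; $ billions), not RIA estimates.
    labels = ["ACS study", "Six Cities study", "Expert A", "Expert B", "Expert C"]
    p05   = [10, 25,  2, 12, 20]
    means = [25, 60, 15, 30, 45]
    p95   = [45, 95, 40, 55, 80]

    fig, ax = plt.subplots(figsize=(6, 3))
    for i, (lo, m, hi) in enumerate(zip(p05, means, p95)):
        ax.plot([lo, hi], [i, i], color="gray")   # 5th-95th percentile interval
        ax.plot([m], [i], "ko")                   # mean (or median)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("Monetized benefits ($ billions)")
    ax.set_title("Mean and 5th-95th percentile range, by study and expert")
    fig.tight_layout()
    plt.show()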

-------
In attempting to summarize the rich information about uncertainty it has developed, EPA
evidently had difficulty choosing terminology. Table 5-1 and Table ES-3 present the
same material, but the concepts that are labeled "Lower Bound Expert Result" and
"Upper Bound Expert Result" in Table 5-1 are labeled "Low Mean" and "High Mean" in
Table ES-3. These concepts are not defined or explained in either location. Moreover, the
ranges for these concepts are not properly described. In Table ES-3, they are not even
labeled. In Table 5-1, they are improperly labeled as "confidence intervals" rather than
"credibility intervals." The concepts of confidence and credibility intervals are distinct
and have different interpretations. A 90 percent confidence interval is a statistic (i.e., a
random variable) constructed from data using a procedure such that the probability that
the interval includes the true value is 90 percent (conditional on the model assumptions).
A 90 percent credibility interval is an interval chosen by an expert who believes there is a
90 percent chance that the true value is in the interval (conditional on whatever
assumptions he may specify).
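
The distinction can be made concrete with a small numerical sketch using invented values:
the confidence interval is computed from sample data, while the credibility interval is
simply read off the expert's elicited distribution.

    import numpy as np

    rng = np.random.default_rng(1)

    # 90% confidence interval: a statistic constructed from data.
    # Hypothetical sample of 50 effect estimates (illustration only).
    sample = rng.normal(loc=1.0, scale=0.5, size=50)
    mean = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    ci_90 = (mean - 1.645 * se, mean + 1.645 * se)   # normal approximation

    # 90% credibility interval: read directly from the expert's elicited distribution.
    elicited_p05, elicited_p95 = 0.3, 2.0   # hypothetical elicited percentiles
    credible_90 = (elicited_p05, elicited_p95)

    print("90% confidence interval (from data):   ", ci_90)
    print("90% credibility interval (from expert):", credible_90)
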
4.  Does the EPA's present effort to incorporate uncertainty analyses and discussions
    into the primary analysis, as exemplified in the PM NAAQS RIA chapter, adequately
   address the NRC's request to move the assessment of uncertainties into its primary
   analyses? If not, what more could the EPA do to satisfy this request?

The short answer to the first question is "yes." To understand why this is so, it is
important to reflect on the way uncertainty was being addressed by the EPA at the time of
the 2002 NRC report. At that time, benefits estimates for particulate air pollution typically
presented a base-case analysis relying on the Pope et al. American Cancer Society (ACS)
study, along with a sensitivity analysis providing upper estimates of effect drawn from the
Dockery et al. Six Cities study and lower estimates from the time-series literature. These
were accompanied by a qualitative discussion of the issues related to drawing causal
inferences from this literature, especially the cohort studies.

The NRC was unsatisfied with this  form of presentation because it left unresolved the
important question of determining how much weight to assign to these various alternative
estimates. If the cohort studies did not reflect causation, the effect estimates would need
to be based on the results of time-series studies, which are roughly a factor of 10 smaller
than those from the ACS study. And if they did reflect causal associations, then the relative
plausibility of the coefficients from the Six Cities Study (that were about 3 times larger
than those of the American Cancer  Society Study) was left unspecified.

The NRC committee was of the view that most users of EPA regulatory analyses (i.e.,
regulators, Congressmen, the general public) were not in as good a position to evaluate
these questions as scientific experts in epidemiology and toxicology would be and so it
recommended that EPA explore the possibility of eliciting scientific opinion, using
formal methods for probabilistic expert judgment, as a means of addressing this concern.

EPA's current effort reflects a careful attempt to do just that. EPA's analysis of
uncertainty (in dose-response coefficients for PM2.5) is integrative, quantitative, and
central. The analysis is integrative in the sense that it deals with all sources of uncertainty
- both aleatory and epistemic - affecting PM dose-response functions. Aleatory
uncertainty is the inherent variation associated with the physical system or the
environment, sometimes referred to as stochastic uncertainty or irreducible uncertainty.
Epistemic uncertainty stems from a lack of knowledge of quantities or processes of the
system or the environment, sometimes referred to as model uncertainty or reducible
uncertainty. In this way it differs from meta-analysis, which would be valuable if the
only questions were about the magnitude of these relationships and not about the strength
of evidence for causal interpretation of the epidemiologic studies. The analysis is
quantitative in the  sense that it provides probabilistic statements about the relative
plausibility of alternative interpretations of the evidence - both about the relative
strengths of various studies and also about the likelihood that these study results are
artifacts of confounding or have little biological support. In addition, because the analysis
presents separately the quantitative interpretations of 12 experts, it provides the user with
a sense of the extent of scientific consensus among these experts. The analysis is central
to the EPA's document in that these probabilistic characterizations of uncertainty are
presented in the body of the RIA and not relegated  to technical appendices or supporting
documents.

While the Council  commends EPA for this work that clearly responds to the NRC
recommendations,  we believe there are ways in which future efforts could be
strengthened.  The Council understands that there are limitations to any approach,
including formal elicitation of expert judgment, to quantitatively characterize the nature
and strength of scientific understanding of quantities (such as concentration-response
slopes) relevant for environmental decision making. Among these are:

       •   Selection of experts - The first question in any effort to interpret ambiguous
           or conflicting scientific  information is to determine which scientists to
           consult. The EPA's analysis relies on the views of 12 experts in epidemiology
           and toxicology. The fact that the EPA is open and transparent about who these
           experts are and how they were selected invites questions about whether the
           group was representative, whether the sample was a probability sample,
          whether the group was balanced (with regard to discipline, institutional
           affiliation, or other factors), and so on. These are certainly important
           questions. But it is necessary to recognize that any effort to resolve questions
           about the extent of epistemic uncertainty (which often is the dominant source
           of uncertainty) must rely on the interpretations of scientists and therefore
           involve these same issues of which scientists, how chosen, whether
           representative, how balanced, and so on. Thus, the question could have been
           asked of previous EPA regulatory analyses that also relied on professional
          judgment, but without the transparency  of a  formal expert elicitation.

       •   Aggregation of expert opinion - The EPA has chosen to first present
           separately the views of each of the 12 scientists who participated in their
           expert elicitation. This is entirely consistent  with "best practices" in the field.
           But because of concerns about the scientific legitimacy of any approach for
          aggregating expert opinion, the EPA has said that it declined to aggregate and
          it does not present any aggregate estimate of the central tendency of expert
          opinions. However, EPA does present a range bounded by the mean estimates
          of the experts with the lowest and highest mean estimates. The Council notes
          that this is, in fact, a form of aggregation that assigns positive weight to the
          most extreme judgments and zero weight to all the others (or perhaps suggests
          a uniform distribution between these extreme values). The Council feels this is
          not the best aggregation and recommends that the EPA consider other forms
          of weighting, e.g., assigning equal weight to each expert's distribution or
          assigning weights based on other approaches, such as peer weighting, self
          weighting, or performance on calibration questions.  See the Council's
          discussion in Question 6b.

       •  Limits of rationality - All judgments are subject to well-known cognitive
          anomalies such as sensitivity to framing, anchoring, probability weighting,
          etc. Even trained experts such as scientists and physicians are not immune to
          such effects. To protect against these difficulties, this well-designed expert
           elicitation includes checks on consistency and logic and provides experts with
          an opportunity to reconsider and revise their evaluations.

       •  Costs - The EPA did not provide the Council with estimates of the costs  of
          conducting this expert elicitation, but there is a sense that the costs were
          "high" (without giving a magnitude or any real comparison). This is a clear
          case, however, where the benefits of the expert elicitation for understanding
          and resolving the large differences in  estimated PM regulatory benefits were
          even larger. There seems to be a consensus that the study provided benefits
          well beyond its costs for analysis of the effects of the PM regulation.
          However, it may be that similar expert elicitation efforts would not be
          appropriate for all RIAs. Estimates of the cost of this study, and any lessons
          learned about ways to reduce costs of future expert elicitation studies, would
          be useful for future regulatory analysis.

The Council recommends that EPA develop criteria for determining when systematic
polling of scientific judgments would enhance the regulatory analysis, usefully inform
decision making, and justify the associated analytical costs. The Council understands that
the Agency is developing guidance on the use of expert judgment and encourages EPA to
consider this topic if it is not already doing so.

The primary area in  which the EPA's effort has not been responsive to the NRC report is
in its explanation of the rationale behind the experts' judgments. Understanding why
experts disagree about the implications of the available evidence may be as important as
their specific judgments of the probability distribution for the PM2.5 C-R function  itself.
Much information of this type was developed and reported in the final report (IEc, 2006)
and other documents describing the expert judgment study, but it is not evident how, if at
all, this information  was reflected in the PM NAAQS benefits analysis.

-------
Expert judgments involve intuitive weighting and interpretation of existing evidence.
Thus in some ways expert elicitation is similar to meta-analysis, which also serves to
integrate and synthesize a body of evidence. Identifying the specific data, decision
weights, adjustments, and interpretations each expert employed could help  inform
discussions among a wider group of experts and stakeholders, foster explicit evaluation
of the merits of experts' subjective criteria, and suggest opportunities for aggregating or
clustering judgments across experts.

Some Council members believe it would be useful to analyze how, if at all, experts'
judgments are correlated with factors such as: field of expertise (epidemiologists vs.
toxicologists/clinicians); authorship of primary epidemiological studies (do authors put
more emphasis on their own work?); and institution where the expert resides (the
elicitation includes three experts from one institution and two from another).

The Council recommends that EPA evaluate the qualitative information collected during
the elicitation, the post-elicitation workshop, and other interactions with the experts. This
analysis could help decision makers understand the divergence of expert opinion, identify
particular gaps or deficiencies in the evidence that experts believe contribute to
uncertainty, and identify fruitful avenues for future research.

In summary, EPA's use of expert elicitation satisfies the NRC's request and represents a
state-of-the-art example of expert elicitation methods. The benefits chapter serves as an
excellent proof of concept for quantifying uncertainty in regulatory analysis. In this
particular instance, the results serve to increase decision makers'  and the public's
confidence that the health benefits of PM2.5 controls exceed costs by a comfortable
margin. This is largely because, as a group, the experts have great confidence that the
epidemiological studies upon which the EPA has relied reflect causal relationships
between exposure and mortality (i.e., the experts place little weight on non-causal
interpretations) and because they emphasize the relevance and validity of the cohort
studies for answering the questions  of interest to the EPA.

Despite our strong support of this analysis, the Council urges EPA to anticipate
challenges to expert elicitation when it is used in more controversial applications. It is
reasonable to expect that EPA will be required to defend the process used for expert
selection. But as noted above, this challenge should apply to any effort to use expert
opinion, whether through formal elicitation or informal  consultation, in support of
regulation.
5.  Has the EPA adequately communicated the uncertainty information associated with
    the PM premature mortality estimate to the audiences that the RIA addresses,
    including: scientists, policy analysts, decision makers, and the public?

Not yet. The Council appreciates that the Executive Summary, and especially the benefits
chapter, present the quantitative results in detail using diverse tabular and especially
graphical approaches. However, we raise general concerns related to: 1) methods of
presentation most appropriate for the RIA's diverse readership; and 2) the proper
metric(s) for characterizing the results of the elicitation with regard to the distribution of
the experts'  subjective probabilities and their effect on the health impact and economic
valuation estimates.

The Council stresses that addressing these concerns will have important benefits for all
forms of communication about the expert elicitation including the RIA, the Executive
Summary, and the Press Release.

          a.  Considering the examples provided by the EPA, are  there other methods
              the EPA should use, instead of or in addition to those employed, to
              summarize and communicate the results of the PM-Mortality Expert
              Elicitation in the benefits chapter and the Executive Summary for
              communication to technical and non-technical audiences?

          Yes. The Council notes that the intended readership of the RIA is diverse, and
          we appreciate that EPA explored a range of approaches to presenting the
          results of the elicitation. Nonetheless, we believe that additional efforts are
          needed to identify the most effective means of communicating both methods
          and results to different kinds of readers. This is of particular importance for
          the Executive Summary and  Press Release, which are much more likely to be
          read in their entirety by most readers. Specific suggestions include:

                   •   Provide in the Executive Summary  a description of the elicitation
                      with regard to the rationale for its use, what it comprised, and
                      how it was conducted. Figure 5-1 could be useful in this regard.

                   •   Make more extensive use of graphical displays in the Executive
                      Summary, rather than (or in addition to) tables.

                   •   Add some indication of the "bottom line" in the Press Release.
                      For example, language could be added to state that
                      disagreements among experts are limited to the magnitude of
                      health benefits associated with PM2.5 reductions, not whether
                      those benefits exist. See also the Council's comments on
                      reflecting central tendency in Questions 5b, 5c and 6b.

          b.  To what extent do the types of statements made in the Executive Summary
               of the PM NAAQS RIA successfully communicate the extent of uncertainty
               (and/or the certainty) in the estimate of PM premature mortality to those
              who are not familiar with the PM-Mortality Expert Elicitation?

          As discussed above (in response to Question 3), the Council questions the use
          of ranges to characterize the  uncertainty in impact and valuation estimates in
          the Executive Summary and  its tables. Panelists note that the range appears to
           imply that any value within it enjoys an equal degree of support from the
experts while more detailed descriptions of the results show that the experts'
judgments are more clustered. Some indication of the clustering in the
elicitation results should be considered (see also Question 5c, point 1  below).

The Council notes two sources of uncertainty not explicitly addressed in either
the chapter or the Executive Summary:

    •   The  methods and criteria used to select the experts are neither
       presented in detail nor critically evaluated. Panelists note that  this
       stage of the elicitation process is critical to ensuring that the panel of
       experts adequately represents expert opinion about the effect of PM on
       mortality. (As noted in Question 4, expert selection is also critical to
       alternative approaches such as consensus panels, but  may be perceived
       as more salient for expert elicitation, perhaps because individual
        experts' distributions are reported.)

    •   As noted above (Question 2), estimation from cohort study data of
       annual numbers of attributable deaths, as opposed to  measures of
       longevity (e.g., years  of life lost), is problematic.

c.  Are there additional summary statements that are important to deduce
    from the results of the PM NAAQS benefits chapter to the Executive
    Summary?

Yes. Panelists noted that the chapter lacks:

    •   A comprehensive statement of the "bottom line" with regard to the
       expert elicitation results,  e.g., that it supports the conclusion that the
       benefits of PM2.5 control  are very likely to be substantial; and

    •   An integrated discussion of the relative importance of various sources
       of uncertainty: those that were not quantified (e.g., relative toxicity of
       PM sources/constituents) versus those that were, as well as the relative
       importance of the various uncertainties that were quantified (e.g.,
       uncertainties in the C-R function vs. the valuation).

Table 5-5 identifies seven primary sources of uncertainty that are included in
the RIA; however, this recognition of multiple uncertainties does not permeate
the rest of the chapter. It would be useful to acknowledge uncertainties at each
stage of the analytic process and thus show where the range of possible values
increases. It might be helpful to have a chart that outlines each step, assesses
the degree of uncertainty at that step, and reports how it is handled. This
would allow the reader to understand how the final range of numbers  reported
depends  on the various steps  in the analysis, and to more easily see which
uncertainties contribute most to the overall uncertainty. This information
should also be summarized in the Executive Summary.

-------
6.  Has the EPA adequately summarized the results of the PM-Mortality Expert
    Elicitation across the experts in the PM NAAQS RIA benefits chapter and Executive
    Summary?

   The results are presented in summary form in terms of mean values and 90 percent
   confidence intervals. (As noted in response to Question 3, the latter should be referred
   to as 90 percent credibility intervals since they represent a judgment about uncertainty
   and not an inference from the sampling distribution of a statistic.)

   The Executive Summary is adequate in terms of conveying the central tendency and
   range of estimates based on the experts' C-R functions. However, it could be made
   clearer that the results shown are not the actual judgments of the experts - i.e., the
   experts did not make judgments regarding the avoided premature mortality. Rather,
   the avoided premature mortality was estimated using the C-R function elicited from
   each expert. Similarly, the benefits assessment should not be attributed to the experts,
   but it should be made clear that the expert judgment was simply one of many inputs
   to the benefits assessment. This could also be made clearer in many of the tables (e.g.,
   Table 5-32).

   What is not apparent is why the expert judgments differ. For example, Figure 5-10
   illustrates substantial inter-expert variability in results. Yet, the significance of this
   variability to the conclusions of the benefits assessment seems not to be addressed.
   Moreover, it appears that the experts could be grouped into clusters, such as a low
   cluster (Experts G, K), central cluster, and high cluster (Experts A, E, and perhaps B
   and C). It would be useful to know more about why the experts agree within clusters,
   and why they disagree between clusters. For example, do experts within a cluster tend
   to rely more heavily on a particular study  than do experts in other clusters? If so, why
   do members of different clusters put more weight on different studies? Are there
   comments from the post-elicitation interviews that shed light on why the experts
   continue to disagree even after seeing each other's judgments? For example, do the
   experts differ with respect to which data sets they deem to be most representative or
   useful or regarding inference methods (e.g., biological plausibility, statistical power
   of empirically-based models)?
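
    One simple way to explore such groupings is to apply a standard clustering method to a
    summary statistic of each expert's elicited distribution; the sketch below uses invented
    median estimates (not the experts' actual values) purely to illustrate the approach.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        # Invented median estimates of avoided deaths implied by each expert's C-R
        # function (not the experts' actual values; for illustration only).
        experts = list("ABCDEFGHIJKL")
        medians = np.array([9000., 8500., 8200., 6000., 9500., 5800.,
                            2500., 5500., 6200., 5900., 2800., 6100.])

        # One-dimensional hierarchical clustering into three groups (low/central/high).
        Z = linkage(medians.reshape(-1, 1), method="average")
        groups = fcluster(Z, t=3, criterion="maxclust")
        for g in sorted(set(groups)):
            members = [e for e, grp in zip(experts, groups) if grp == g]
            print(f"Cluster {g}: experts {members}")
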
    In the PM NAAQS benefits chapter, the EPA presents the mortality results based on
    each of the twelve individual experts' responses along with results based on
   concentration-response functions derived from empirical studies. The EPA has also
   considered employing methods to aggregate results based on the elicitation into a
   single combined estimate. In particular, the EPA considered calculating a simple
   average of estimates across experts after the concentration-response functions of
   each expert had been applied in the benefits model (i.e., the average of the resulting
   estimation of the change in mortality incidence). Other options for summarizing the
   results include: a weighted average of the resulting change in incidence, a trimmed
means approach, and a fitted distribution to the overall set of concentration-response
functions.

       a.  Should the EPA continue to present the results of the individual experts in
           future benefits analyses as was done in the PM NAAQS RIA? Should the
           EPA develop metrics that aggregate across the individual experts? If
           aggregate measures are considered appropriate, should the EPA present
           these in addition to or instead of the individual estimates?

       The Council recommends that EPA continue to present the results of the
       individual experts in future benefits analyses, whether or not an aggregate or
       combined distribution is presented.

       Results from an expert elicitation can be used for several purposes, and
       whether aggregation or combination across experts is useful depends on the
       intended purpose. If the goal is to characterize the uncertainty about an
       outcome (e.g., the benefits of controlling PM2.5), a presentation that shows the
       distribution of estimated benefits conditional on each expert's C-R function
        (such as Figures 5-10 and 5-11) is extremely useful, as it shows the implications of
       each expert's uncertainty and the diversity of judgments among experts.
       However, such  a rich presentation of uncertainty is excessive for many
       purposes, and will inevitably be collapsed by some sort of aggregation
       method, whether by EPA or by others (e.g., news media).

       The Council considered a number of approaches to aggregation but judges
       that none are ideal inasmuch as the appropriate form of aggregation may
       depend on the purposes of the elicitation and its results. For accurately
       portraying the range of expert opinion, reporting results using each expert's
       judgments may be most useful. For evaluating alternative regulatory options, a
       decision-analytic  perspective suggests it may be most useful to combine the
       experts' distributions into a single distribution, though the best method for
       doing so is unclear.  When the results  suggest that many of the experts'
       judgments fall into one or a few clusters, it may be important to identify and
       describe those clusters and to separately describe any significant outliers.
       Alternatively, if the experts' judgments are approximately uniformly
       distributed across a  range, a statement of this result may be most useful.
       b.  If a combination (aggregation) of results is considered appropriate, what
           technique for aggregation would you recommend?

       The basic goal is to explain both the range and distribution of judgments
       within it. There does not seem to be  a perfect or single best technique, as this
       will depend on the purpose of the aggregation and the data. A virtue of formal
       decision analysis is that it provides a rigorous  and theoretically justified
       method for mathematically combining information about uncertainty and
preferences. Alternative approaches that rely on a decision maker to
holistically weigh multiple factors in his or her head are susceptible to
cognitive limitations such as the heuristics and biases identified by Tversky
and Kahneman (1974) and the tendency to overweight those factors that
appear especially salient while neglecting others.

EPA could consider developing an operational approach to describing the
distribution of experts' judgments, while acknowledging that there is no single
best approach and that the choice of approach  is a matter of judgment and
context. The Council considered several examples of possible approaches (a brief
numerical sketch of Approaches 1 and 4 follows the list):

        •   Approach 1: Present a range of experts' median or mean values.
            The range could be defined by fractiles (e.g., the interquartile
            range) or as a trimmed estimate by excluding judgments of the p
            highest and p lowest values. This has the advantage of being easy
            to explain and does not strongly imply a probabilistic
            interpretation, which is appropriate since the experts are not a
            probability sample. A limitation is that it does not take into
            account the range of uncertainty elicited from each expert or
            information from the experts with the highest and lowest central
            values.
        •   Approach 2:  Similar to Approach 1 but instead of using a range
            of central values, use a range enclosed by ranking the 5th
            percentiles of the experts with the lowest such values and the
            95th percentiles of the experts with the highest such values, and
            reporting some summary of these ranked extreme values (e.g.,
            the  mean or median 5th percentile; the smallest 5th percentile).
            This approach incorporates some information about the variation
            in both the locations and widths of the experts' distributions. It
            emphasizes the range of opinion but provides no information
            about any  clustering within it.
        •   Approach 3: If there are multiple clusters, EPA  could describe
            each cluster, perhaps using Approach 1 or 2. The identification
            of clusters may take into account  qualitative information
            regarding similarities in the basis of the judgments of multiple
            experts in  a cluster; it requires judgment.
        •   Approach 4: A combined distribution can be produced by
            aggregating the probability associated with each  value across
            experts. This is equivalent to estimating a distribution by Monte
            Carlo simulation in which experts are sampled (with equal or
            unequal probability) and values drawn from the selected expert's
            distribution. Unlike the previously described approaches, this
            method has the advantage of using all of the information
            provided by all of the experts, is reasonably easy to explain, and
            has been shown to perform reasonably well in characterizing
            uncertainty (Clemen and Winkler, 1999). However, this method
            has the disadvantage of suggesting that the experts constitute a
            probability sample of relevant opinion. If expert weights are
            used, these can be based on several sources,  e.g., ranking of
            experts by each other or experts' performance in providing
            probability distributions in response to calibration questions.
            (Calibration questions ask about variables whose value is
            unknown to the experts at the time of elicitation but known  to the
            analyst at the time of combining the experts' distributions;
            Cooke, 1991).  Unequal weighting of experts' judgments can lead
            to superior performance if the weights (or calibration questions
            on which they  are based) are well selected, but is often viewed
            skeptically because it appears to give inadequate respect to
            experts whose  judgments are given little weight and/or may
            discourage experts from participating in a time-consuming
            elicitation process if their judgments may ultimately get little
            weight. Unequal weighting methods are more complicated to
            conduct and explain than some of the other approaches.
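
As a concrete illustration of Approaches 1 and 4, the following sketch uses invented expert
distributions (each summarized as a lognormal around a central value); a real application
would work from each expert's full elicited distribution.

    import numpy as np

    rng = np.random.default_rng(2)

    # Invented expert distributions: each expert's uncertainty about avoided deaths is
    # summarized here as a lognormal around a central value (illustration only).
    expert_medians = np.array([9000., 8500., 8200., 6000., 9500., 5800.,
                               2500., 5500., 6200., 5900., 2800., 6100.])
    geometric_sd = 1.6   # assumed common spread, for illustration

    # Approach 1: a trimmed range of the experts' central values
    # (here, dropping the single lowest and highest medians).
    trimmed = np.sort(expert_medians)[1:-1]
    print("Approach 1 trimmed range of medians:", trimmed.min(), "to", trimmed.max())

    # Approach 4: equal-weight pooling - sample an expert at random, then draw a value
    # from that expert's distribution (a Monte Carlo mixture of the expert distributions).
    n = 100_000
    chosen = rng.integers(0, len(expert_medians), size=n)
    draws = rng.lognormal(mean=np.log(expert_medians[chosen]), sigma=np.log(geometric_sd))
    print("Approach 4 pooled 5th/50th/95th percentiles:",
          np.percentile(draws, [5, 50, 95]).round())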

As is evident from these example approaches, none is perfect. Thus, EPA may
wish to consider selecting a relatively simple approach that conveys the range
and central tendency, while  acknowledging that a variety  of approaches are
possible, explaining the advantages and disadvantages of candidate
approaches, and  explaining  why the selected approach was chosen. It is
possible to use multiple approaches in combination (e.g., Approaches 1 and
4), as the situation warrants.

c.  If a combined estimate is considered appropriate, what interpretation
   should be applied to the percentiles of the uncertainty distribution derived
   from the elicitation (e.g., the mean estimate of a combined elicitation
    function, or the 5th-95th percentiles)?

The appropriate interpretation depends on the combination method, and is
perhaps best characterized by explicit description of the method (e.g., for
Approach 1 above the interquartile range of the experts' mean estimates).
Because it is typically not appropriate to characterize the experts as a
probability sample of some  relevant population of expert opinion, it does not
seem appropriate to characterize a combined distribution as a probabilistic
distribution of expert opinion.

d.  If a combined distribution is not appropriate, how should the EPA
   characterize  the estimates of the PM premature mortality effect? One
    option employed in the Executive Summary of the PM NAAQS RIA is to
   present the estimates as a range from the average value associated with
   the steepest concentration-response function to the average value
   associated with the flattest concentration-response function. Is this the
   best approach? What other options would you recommend?

-------
As described above, characterizing effects as a simple range provides a very
limited summary of the rich information about uncertainty provided by the
expert elicitation. The Council finds that graphical displays (such as Figures 5-10
and 5-11 and a graphic like that illustrated in response to Question 3) that
portray the uncertainty about premature mortality conditional on each expert's
judgment, and the variability in estimates among experts, provide a
comprehensive summary. In attempting to compress this information into a
simpler format, the Council suggests that the appropriate summary will
depend on the data and encourages EPA to attempt to characterize the extent
to which experts' judgments are congruent and overlapping or broadly
distributed across an overall range.

-------
References

Clemen, R.T., and Winkler, R.L., "Combining Probability Distributions from Experts in
   Risk Analysis" Risk Analysis, 19: 187-203, 1999.
Cooke, R.M., Experts in Uncertainty: Opinion and Subjective Probability in Science,
   New York: Oxford University Press, 1991.
Industrial Economics, Inc., The Expanded Expert Judgment Assessment of the
   Concentration-Response Relationship between PM2.5 Exposure and Mortality, Final
   Report prepared for EPA Office of Air Quality Planning and Standards.  Available at
   http://www.epa.gov/ttn/ecas/regdata/Uncertainty/pm_ee_report.pdf, 2006.
National Research Council, Estimating the Public Health Benefits of Proposed Air
   Pollution Regulations, Washington, D.C.:  National Academies Press, 2002.
Rabl, A., "Interpretation of Air Pollution Mortality: Number of Deaths or Years of Life
   Lost?" Journal of the Air and Waste Management Association 53: 41-50, 2003.
Rabl, A., "Analysis of Air Pollution Mortality in Terms of Life Expectancy Changes:
   Relation between Time Series, Intervention, and Cohort Studies," Environmental
   Health: A Global Access Science Source 5: 1-11, 2006.
Tversky, A., and Kahneman, D., "Judgment Under Uncertainty: Heuristics and Biases,"
   Science 185(4157):  1124-1131, 1974.