United States
Environmental Protection
Agency
Office of Air Quality
Planning and Standards
Research Triangle Park NC 27711
EPA-450/3-79-032
May 1979
Air
End Use of Solvents
Containing Volatile
Organic Compounds
-------
EPA-450/3-79-032
End Use of Solvents Containing
Volatile Organic Compounds
by
Ned Ostojic
The Research Corporation of New England
125 Silas Deane Highway
Wethersfield, Connecticut 06109
Contract No. 68-02-2615
Task No. 8
EPA Project Officer: Reid E. Iversen
Prepared for
U.S. ENVIRONMENTAL PROTECTION AGENCY
Office of Air, Noise, and Radiation
Office of Air Quality Planning and Standards
Research Triangle Park, North Carolina 27711
May 1979
-------
DISCLAIMER
This report has been reviewed by the Office of Air Quality Planning
and Standards, U.S. Environmental Protection Agency, and approved for
publication. Approval does not signify that the contents necessarily
reflect the views and policies of the U.S. Environmental Protection
Agency, nor does mention of trade names or commercial products constitute
endorsement or recommendation for use.
ii
-------
ABSTRACT
Currently there are no standardized guidelines for evaluating the
performance of air quality simulation models. In this report we develop
a conceptual framework for objectively evaluating model performance. We
define five attributes of a well-behaving model: accuracy of the peak
prediction, absence of systematic bias, lack of gross error, temporal cor-
relation, and spatial alignment. The relative importance of these attri-
butes is shown to depend on the issue being addressed and the pollutant
being considered. Acceptability of model behavior is determined by cal-
culating several performance "measures" and comparing their values with
specific "standards." Failure to demonstrate a particular attribute may
or may not cause a model to be rejected, depending on the issue and pollutant.
Comprehensive background material is presented on the elements of the
performance evaluation problem: the types of issues to be addressed, the
classes of models to be used along with the applications for which they are
suited, and the categories of performance measures available for considera-
tion. Also, specific rationales are developed on which performance standards
could be based. Guidance on the interpretation of performance measure values
is provided by means of an example using a large, grid-based air quality
model.
iii
-------
ACKNOWLEDGMENTS
A number of persons have generously provided their assistance and sup-
port to this project. Special thanks is due Philip Roth, whose fore-
sight and leadership made this project possible. His perceptive advice
and guidance contributed immeasurably to the results of this work.
Steven Reynolds and Martin Hillyer made many significant, insightful
comments, which were greatly appreciated.
For their patience and diligence, grateful thanks is also due the
members of the SAI support staff, particularly Marie Davis, Sue Bennett,
Chris Smith, and Linda Hill.
iv
-------
CONTENTS
DISCLAIMER ii
ABSTRACT iii
ACKNOWLEDGMENTS iv
LIST OF ILLUSTRATIONS vii
LIST OF TABLES xi
LIST OF EXHIBITS xiv
I INTRODUCTION 1-1
A. Overview of the Problem 1-2
B. Structure of the Report 1-5
II SUMMARY II-1
A. Main Results II-1
B. Detailed Summary II-2
1. Summary of Chapter III (Issues) II-2
2. Summary of Chapter IV (Models) II-3
3. Summary of Chapter V (Performance Measures) II-4
4. Summary of Chapter VI (Performance Standards) II-14
III ISSUES REQUIRING MODEL APPLICATION III-1
A. A Perspective on the Issues III-1
1. Federal Air Pollution Law III-2
2. The Code of Federal Regulations III-3
B. Generic Issue Categories III-7
1. The Issues: Their Classification III-8
2. The Issues: Some Practical Examples and
Their Implications for Air Pollution Modeling III-10
3. The Issues: A Prologue to the Next Chapter III-13
-------
IV AIR QUALITY MODELS IV-1
A. Generic Model Categories IV-2
1. Rollback Category IV-2
2. Isopleth Category IV-4
3. Physico-Chemical Category IV-5
B. Generic Issue/Model Combinations IV-16
C. Model/Application Combinations IV-22
D. Some Specific Air Quality Models IV-22
E. Air Quality Models: A Summary IV-25
V MODEL PERFORMANCE MEASURES V-1
A. The Comparison of Prediction with Observation V-2
B. Generic Performance Measure Categories V-4
1. The Generic Measures V-5
2. Some Types of Variations Among Performance
Measures V-10
3. Several Practical Considerations V-10
C. A Basic Distinction: Regional Versus Source-Specific
Performance Measures V-15
D. Some Specific Performance Measures V-22
E. Matching Performance Measures to Issues and Models V-27
1. Performance Measures and Air Quality Issues V-27
2. Performance Measures and Air Quality Models V-33
F. Performance Measures: A Summary V-36
VI MODEL PERFORMANCE STANDARDS VI-1
A. Performance Standards: A Conceptual Overview VI-2
B. Performance Standards: Some Practical
Considerations VI-4
1. Data Limitations VI-5
2. Time/Resource Constraints VI-6
3. Variability of Analysis Requirements VI-6
C. Model Performance Attributes VI-7
D. Recommended Measures and Standards VI-12
1. Recommended Performance Measures VI-14
2. Recommended Performance Standards VI-23
3. Summary Table of Recommended Measures and
Standards VI-30
4. Formulas for Calculating Performance Measures
and Standards VI-32
vi
-------
VI MODEL PERFORMANCE STANDARDS (Continued)
E. A Sample Case: The SAI Denver Experience VI-39
1. The Denver Modeling Problem VI-39
2. Values of the Performance Measures VI-40
3. Interpreting the Performance Measure Values VI-45
F. Suggested Framework for a Draft Standard VI-53
VII RECOMMENDATIONS FOR FUTURE WORK VII-1
A. Areas for Technical Development VII-2
1. Further Evaluation of Performance Measures VII-2
2. Identification and Specification of Prototypical
Point Source "Test Bed" Data Bases VII-2
3. Examination of Performance Evaluation Procedure in
Sparse-Data Point Source Applications VII-3
4. Further Development of Rationales for Setting
Performance Standards VII-4
B. Assessment of Institutional Implications VII-5
C. Documents To Be Compiled VII-5
APPENDICES
A IMPORTANT PARTS OF THE CODE OF FEDERAL REGULATIONS
CONCERNING AIR PROGRAMS A-1
B SOME SPECIFIC AIR QUALITY MODELS B-1
C SOME SPECIFIC MODEL PERFORMANCE MEASURES C-1
D SEVERAL RATIONALES FOR SETTING MODEL PERFORMANCE
STANDARDS D-1
REFERENCES R-1
vii
-------
ILLUSTRATIONS
II-1 Various Levels of Knowledge About Regional Concentrations II-9
II-2 Various Levels of Knowledge About Specific-Source
Concentrations II-9
V-1 Various Levels of Knowledge About Regional Concentrations V-6
V-2 Various Levels of Knowledge About Specific-Source
Concentrations V-7
V-3 Sample Regional Isopleth Diagram Illustrating Ozone
Concentrations in Denver on 29 July 1975 for
Hour 1200-1300 MST V-17
V-4 Sample Specific-Source Isopleth Diagram Illustrating
Concentrations Downwind of a Steady-State Gaussian
Point Source V-18
V-5 Concentration Isopleth Patterns for Various Source Types V-20
V-6 Schematic of a Point Source Measurement Network V-21
V-7 Locus of Possible Footprint Locations for an Elevated
Point Source V-21
VI-1 Orientation and Scaling of CAVE and d* A*65 on
a Prediction-Observation Correlogram VI-37
VI-2 Locations of Monitoring Stations in the Denver
Metropolitan Region VI-41
VI-3 Predicted and Observed Ozone Concentrations at Each
Monitoring Station During the Day (Denver, 28 July 1976) VI-42
VI-4 Correlogram of Ozone Observation-Prediction Pairs
for Sample Case (Denver, 28 July 1976) VI-46
VI-5 Normalized Deviations About the Perfect Correlation Line
as a Function of Ozone Concentration (Denver, 28 July 1976) VI-47
VI-6 Non-Normalized Ozone Deviations About the Perfect Correlation
Line Compared with Instrument Errors (Data for 14 Hours and
8 Stations, Denver, 28 July 1976) VI-48
VI-7 Non-Normalized Ozone Absolute Deviation About the Perfect
Correlation Line Compared with Instrument Error (Data for
14 Hours and 8 Stations, Denver, 28 July 1976) VI-49
viii
-------
VI-8 Ground-Traces of the Predicted and Observed Peak Ozone
Concentrations (Denver, Hours 1100-1200 to 1400-1500
Local Standard Time, 28 July 1976) VI-52
VI-9 Possible Relationships Between the Model Performance
Standards and a Guidelines Document VI-54
C-1 Locations and Values of Predicted Maximum One-Hour-Average
Ozone Concentrations for Each Hour from 8 a.m. to 6 p.m. C-7
C-2 Concentration Histories Revealing Time Lag or Spatial Offset C-14
C-3 Estimate of Bias in Model Predictions as a Function of
Ozone Concentration C-15
C-4 Time Variation of Differences Between Means of Observed
and Predicted Ozone Concentrations C-17
C-5 Probabilities of Ozone Concentration Exceedance C-18
C-6 Model Predictions Correlated with Instrument Observations
of Ozone (Data for 3 Days, 9 Stations, Daylight Hours) C-19
C-7 Model Predictions Compared with Estimates of Instrument Errors
for Ozone (Data for 3 Days, 9 Stations, Daylight Hours) C-21
C-8 Map of Denver Air Quality Modeling Region Showing Air
Quality Monitoring Stations C-23
C-9 Time History of Predicted and Observed Concentrations at
Monitoring Sites C-24
C-10 Variations over All Stations of Observed and Predicted
Average Ozone Concentrations C-25
C-11 Plots of Residuals and Forcing Variable C-26
C-12 Distribution of Area Fraction Exposed to Greater
Than a Given Concentration Value C-30
C-13 Isopleths of Ozone Concentrations (pphm) on 29 July 1975 C-35
C-14 Size of Area in Which Predicted Ozone Concentrations Exceed
Given Values for Years 1976, 1985, and 2000 C-40
C-15 Typical Residuals Isopleth Plot for Annual Average NO2 C-42
C-16 Estimated Exposure to Ozone as a Function of Ozone
Concentration for 3 August 1976 Meteorology C-48
C-17 General Shape of the Exposure Cumulative Distribution
and Density Functions C-49
ix
-------
C-18 Shape of ψ(C), the Approximation to the Delta Function C-52
C-19 Cumulative Ozone Dosage as a Function of the Time of Day
for 3 August 1976 Meteorology C-54
C-20 Cumulative Exposure (in 10 Person-Hours) to Ozone
Concentrations Above Given Level in One-Square-Mile Grid
Cells Between 500 and 1800 Hours for 3 August 1976
Meteorology and 1976 Emissions C-55
C-21 Cumulative Ozone Dosages (in 10 pphm-Person-Hours) in the
One-Square-Mile Grid Cells from 500 to 1800 Hours (MST) for
3 August 1976 Meteorology and Emissions in 1976 C-58
C-22 Orientation with Respect to Measurement Station of Nearest
Point at Which Prediction Equals Station Observation C-59
C-23 Space-Time Trace of Location of Nearest Point Predicting
a Concentration Equal to the Station Measured Value C-60
D-1 Possible Health Effects Curves D-4
D-2 Representation of Spatial and Concentration Dependent
Population Functions D-6
D-3 Population Distribution as a Function of Concentrations D-10
D-4 Idealized Concentration Isopleths D-11
D-5 Typical Radial Concentration Distributions About
the Peak D-13
D-6 Predicted Population Distribution as a Function
of Concentration D-16
D-7 Shifts in w(C) Caused by Nonuniform Population Distributions D-17
D-8 Expected Shape of Health Effects Function D-20
D-9 Minimum Allowable Ratio of Predicted to Measured Peak
Concentration Value D-23
D-10 Prototypical Isopleth Diagram D-28
D-11 The Isopleth Diagram Replotted D-29
D-12 Total Regional Control Cost as a Function of the Level
of Control Required D-32
D-13 Uncertainty Distribution for a Conservative Model D-35
D-14 Uncertainty Distribution for a Nonconservative Model D-35
-------
TABLES
II-1 Air Quality Issues Commonly Addressed,
by Generic Model Type II-5
II-2 Model/Application Combinations II-6
II-3 Some Air Quality Models II-7
II-4 Generic Performance Measure Information Requirements II-10
II-5 Types of Variations Among Generic Performance
Measure Categories II-12
II-6 Performance Measures Commonly Associated with
Specific Issues II-13
II-7 Performance Measures That Can Be Calculated by
Each Model Type II-13
II-8 Performance Measure Objectives II-15
II-9 Importance of Performance Attributes by Issue II-16
II-10 Importance of Performance Attributes by Pollutant and
Averaging Time II-18
II-11 Measures Recommended for Use in Setting Model
Performance Standards II-19
II-12 Possible Rationales for Setting Model Performance
Standards II-21
II-13 Performance Attributes Addressable Using Performance
Standard Rationales II-22
II-14 Association of Rationales with Generic Issues II-22
II-15 Recommended Rationales for Setting Standards II-26
II-16 Summary of Recommended Performance Measures and Standards II-26
IV-1 Air Quality Issues Commonly Addressed,
by Generic Model Type IV-18
IV-2 Possible Designations of Application Attributes IV-23
IV-3 Model/Application Combinations IV-24
IV-4 Some Air Quality Models IV-26
xi
-------
V-1 Generic Performance Measure Information Requirements V-8
V-2 Types of Variations Among Generic Performance
Measure Categories V-11
V-3 Some Peak Performance Measures V-23
V-4 Some Station Performance Measures V-24
V-5 Some Area Performance Measures V-26
V-6 Some Exposure/Dosage Performance Measures V-28
V-7 Performance Measures Associated with Specific Issues V-34
V-8 Performance Measures That Can Be Calculated by
Each Model Type V-37
VI-1 Performance Measure Objectives VI-10
VI-2 Importance of Performance Attributes by Issue VI-10
VI-3 Importance of Performance Attributes by Pollutant and
Averaging Time VI-13
VI-4 Candidate Station Performance Measures VI-16
VI-5 Useful Hybrid Performance Measures VI-20
VI-6 Measures Recommended for Use in Setting Model
Performance Standards VI-21
VI-7 Possible Rationales for Setting Model Performance Standards VI-24
VI-8 Performance Attributes Addressable Using Performance
Standard Rationales VI-26
VI-9 Association of Rationales with Generic Issues VI-27
VI-10 Recommended Rationales for Setting Standards VI-29
VI-11 Summary of Recommended Performance Measures and Standards VI-31
VI-12 Sample Values for Model Performance Standards
(Denver Example) VI-43
VI-13 Importance of Performance Attributes by Issue VI-56
VI-14 Importance of Performance Attributes by Pollutant
and Averaging Time VI-56
VI-15 Model Performance Measures and Standards VI-57
xii
-------
B-1 Some Specific Air Quality Models B-3
C-1 Some Peak Performance Measures C-3
C-2 Several Peak Measure Combinations of Interest and
Some Possible Interpretations C-4
C-3 Some Station Performance Measures C-8
C-4 Occurrence of Correspondence Levels of Predicted and
Observed Ozone Concentrations C-20
C-5 Some Area Performance Measures C-29
C-6 Some Exposure/Dosage Performance Measures C-45
D-1 Selected Parameter Values in Denver Test Case D-15
-------
EXHIBITS
III-1 Formal Organization of CFR Title 40—Protection
of Environment III-4
IV-1 General Model Categories IV-3
xiv
-------
I INTRODUCTION
In this report a candidate framework is suggested within which an
objective evaluation of air quality simulation model (AQSM) performance
may be carried out, along with an assessment of the relative applicability
of models to specific problems. Quantitative procedures are identified
that could facilitate assessment of the relative accuracy and usability
of an AQSM.
The subject addressed in this report is a broad and complex one. Sel-
dom can a rule for judging model performance be stated that does not have
several plausible exceptions to it. Consequently, we view the establish-
ment of model performance standards to be a pragmatic and evolutionary
exercise. As we gain experience in evaluating model performance, we will
need to modify both our choice of performance measures and the range of
acceptable values we insist on. Nevertheless, the process must begin some-
where. The recommendations contained in this report represent such a
beginning.
Model performance evaluation should not be viewed as a mechanistic
process, to be performed in a "cookbook" fashion. Performance measures
may be defined to be specific quantities whose value in some way character-
izes the difference between predicted and observed concentrations. No set
of performance measures, however well designed, can fully characterize
model behavior. Judgment is required of the model user. Predictions can
be compared with measurement data in a variety of ways. Some comparisons
involve the calculation of specific quantities and are thus suited for
having specific standards set. (An example might be the difference between
the predicted and the observed concentration peak.) Other comparisons are
more qualitative, better used in an advisory sense to facilitate "pattern
1-1
-------
recognition." (Concentration isopleth maps and time profiles of predicted
and observed concentrations are examples of this type of qualitative com-
parison.) Although we recommend a set of performance measures and standards
in this report, in no way does this recommendation suggest that computation
of measures be limited to this set. For this reason, we catalogue many
different types of performance measures, only a small subset of which have
explicit, formal standards.
The measures and standards we suggest for use will almost certainly
change as experience improves our "collective judgment" about what consti-
tutes model acceptability and what does not. Perhaps the number of measures
will increase to provide richer insight into model performance, or perhaps
the number will shrink without any loss of "information content." Regard-
less of the list of measures and standards that ultimately emerges for
use, it is the conceptual structuring of the performance evaluation itself
that seems to be most important at this point. We must identify clearly the
desirable model attributes whose presence we are most interested in detecting,
and we need to understand how we assess their relative importance, depending
on the issue we are addressing and the pollutant species we are considering.
This report offers a conceptual structure for "folding in" all these concerns
and suggests candidate measures and standards.
A. OVERVIEW OF THE PROBLEM
Air quality simulation models (AQSMs) are widely used as predictive
tools, estimating the impact on future air quality of alternative public
decisions. Their predictions, however, are inherently nonverifiable. Only
after the proposed action has been taken and the required implementation
time elapsed will measurement data confirm or refute the model's predictive
ability.
Herein lies the dilemma faced by users of air quality models: If a
model's predictions at some future time cannot be verified, on what basis
can we rely on that model to decide among policy alternatives? In resolving
this dilemma, most users have adopted a pragmatic approach: If a model can
1-2
-------
demonstrate its ability to reproduce a set of "known" results for a similar
type of application, then it is judged an acceptable predictive tool. It
is on this basis that model "verification" has become an essential prelude
to most modeling exercises.
Several investigators (Calder, 1974, and Johnson, 1972) have objected
to this approach, arguing that it amounts to little more than "crude cali-
bration." They suggest that true model validation can only be accomplished
by evaluating each component sub-model--emissions, transport, or chemistry,
for example. While this may be a scientifically sound approach, there are
so many models available that it is difficult to complete such efforts for
them all. Worse, the demand for a model, truly validated or not, often
forces such concerns to be swept aside. We take a highly pragmatic position
in this report, one that is also consistent with recommendations recently
made to and by the U.S. Environmental Protection Agency [EPA] (Roth, 1977,
and EPA, 1977). Because verification is so often performed at the "output
end" (that is, only model results are examined, comparing them with "true"
data), a systematic and objective procedure is needed in assessing model
performance on that same basis.
A further difficulty exists. What constitutes a set of "known" results?
This is not a problem easily solved. For "answers" to be known exactly, the
"test" problem must be simple enough to be solved analytically. Few problems
involving atmospheric dynamics are so simple. Most are complex and nonlinear.
For those, the analytic test problem is an unacceptable one. Another, more
practical alternative often is employed. For regional, multiple-source
applications, the "known" results are taken to be the station measurements
of concentrations actually recorded on a "test" date.
For source-specific applications, the source of interest may not yet
exist, permission for its construction being the principal issue at hand.
For these applications, it is often necessary to verify a model using the
most appropriate of several prototypical "test cases." Though not existing
currently, these could be assembled from measurements taken at existing
sources, the variety of source size, type and location spanning the range of
values found in applications of interest.
1-3
-------
The term "known" is used imprecisely when referring to a set of measure-
ment data. Station observations are subject to instrumentation error. The
locations of fixed monitoring sites may not be sufficiently well distri-
buted spatially to record data fully characterizing the concentration
field and its peak value. Nevertheless, despite those shortcomings,
"observed" data often are regarded as "true" data for the purpose of
model verification.
In evaluating model performance, we must decide which performance
attributes we most wish the model to possess. Having assembled two sets
of data, one "known" and the other "predicted," we can assess model perfor-
mance by comparing one with the other. Prediction and observation, however,
can be compared in many ways. We must select the quantities (performance
measures) that can most effectively test for the presence of those attributes.
Once we have decided on the performance measures best suited to our
needs (and most feasible computationally), we can calculate these values.
Having done so, however, we must ask a central question: How close must
prediction be to observation in order for us to judge model performance
as acceptable? If we are to answer "how good is good," performance stand-
ards for these measures must be set, with allowable tolerances (predicted
values minus observed ones) derived from a reasonable rationale (health
effects or pollution control cost considerations, for instance).
By setting these standards explicitly, certain benefits may be gained.
Among these are the following:
> A degree of uniformity is introduced in assessing model
reliability.
> A rational and objective basis is provided for comparing
alternative models.
> The impact of limitations in both data gathering proce-
dures and measurement network design can be made more
explicit, facilitating any review of them that may be
required.
1-4
-------
> The performance expected of a model is stated clearly,
in advance of the expenditure of substantial analysis
funds, allowing model selection to be a more straight-
forward and less "risky" process.
> The needs for additional research can be identified clearly,
with such efforts more directed in purpose.
B. STRUCTURE OF THE REPORT
The central purpose of this report is to suggest means for setting per-
formance standards for air quality dispersion models. In doing so, our dis-
cussion proceeds in two phases, the first exploring key elements of the over-
all problem, as well as their interactions, and the second synthesizing all
into a conceptual framework for model performance evaluation.
We recognize three key elements of the performance assessment problem,
all of which are interrelated: the classes of issues addressed by AQSMs (air
quality maintenance planning or prevention of significant deterioration, for
example), the types of AQSMs available for use (grid-based, trajectory, or
Gaussian models, for instance) with the applications for which they are suit-
able, and the classes of performance measures that are candidates for our
use (two of which are station and exposure/dosage measures).
We consider each of these three elements in Chapters III, IV, and V,
providing supporting material in Appendices A, B, and C. In Chapter III,
we identify from current federal law and regulations seven distinctly dif-
ferent types of air quality issues, each of which may be addressed using an
AQSM. In Chapter IV, we assess major model classes, examining their capabil-
ities and limitations as well as their suitability for use in addressing
each of the generic classes of issues. In Chapter V, we discuss model per-
formance measures, identifying four major types, which we then assess for
computational feasibility and suitability for use.
We provide supplementary detail for these three chapters in the first
three appendices. In Appendix A, we outline important portions of the Code
1-5
-------
of Federal Regulations. In Appendix B, we describe in summary form a number of
specific air quality models. In Appendix C, we examine at length a variety
of specific model performance measures, discussing their computation and pro-
viding illustrative examples of their calculation.
Having identified issues (Chapter III), issue/model combinations
(Chapter IV), and issue/model/measure associations (Chapter V), we reach
the synthesis phase in Chapter VI. Here we first identify five desirable
attributes of model performance. Then we recommend a set of performance
measures suitable for use in determining the presence or absence of each
attribute. Each measure is chosen based on two criteria: First, it is
an accurate indicator of the presence of a problem type and second, it is
quantitative (that is, amenable to having specific standards set).
Having selected the performance measures for use, we then offer several
possible rationales for determining the range of their acceptable values.
We examine four rationales, discussing each in detail in Appendix D. Having
done so, we recommend standards for use.
We also consider the way in which the relative importance of the five
model performance attributes varies with the issue being addressed and the
pollutant being considered. We recommend a means for ranking problem types
that is dependent on these factors, using it as a way to decide from among
procedural alternatives when a model fails to display a particular attribute.
To illustrate how to interpret the values of the recommended perfor-
mance measures, we discuss a sample case. The sample case history is based
on the use of the grid-based SAI Airshed Model in modeling the Denver Met-
ropolitan region. Supplementary means for gaining insight into model
behavior are also shown.
Finally, a conceptual framework is suggested for a draft model perfor-
mance standard. The elements it should contain are discussed, as well as
its relationship to a supplementary guidelines document.
1-6
-------
With this final discussion, our presentation is complete, though the
subject itself is by no means exhausted. Considerable additional effort
is warranted, given the importance of this complex and difficult topic.
We suggest in Chapter VII several areas in which we feel such work would
prove fruitful.
1-7
-------
II SUMMARY
In this chapter we summarize the results of this study. First,
we state them in overall terms. Then, we summarize detailed results
on a chapter-by-chapter basis.
A. MAIN RESULTS
Several main tasks are accomplished in this report. These represent
the chief results of the study. We summarize them as follows:
> A conceptual framework is set for objective evaluation of
dispersion model performance (Chapter VI).
> An outline for a draft model performance standards document is
suggested (Chapter VI).
> Specific measures are recommended for use (Chapter VI).
> Specific rationales on which standards could be based are
developed, several of which represent research that is
original with this study (Chapter VI and Appendix D).
> Comprehensive background material is presented on key elements
of the performance evaluation problem: the types of issues to
be addressed (Chapter III and Appendix A), the classes of
models to be used along with the applications for which they
are suited (Chapter IV and Appendix B), and the categories of
performance measures available for consideration (Chapter V
and Appendix C).
> Guidance on the interpretation of performance measure values
is provided by means of an illustrative sample case (Chapter VI).
II-l
-------
B. DETAILED SUMMARY
Discussion in this report proceeds in two phases. In the first of these,
we present a comprehensive examination of key elements of the performance
evaluation problem. This background phase consists of the in-depth
analysis in Chapters III, IV and V, supported by material in Appendices
A, B and C.
We intend the background phase of this report to be regarded not as a
supplement but rather as an essential prelude to the second, or synthesis,
phase. The second phase, contained in Chapter VI and Appendix D, draws
from the background material to identify a set of performance criteria
that is both useful and computationally feasible.
In this section we present detailed summaries of the important
results of the report. We do so on a chapter-by-chapter basis.
1. Summary of Chapter III (Issues)
This chapter provides an issues framework within which the
application of air pollution models can be viewed. First, an overview
is provided, highlighting important aspects of federal air pollution
law (also see Appendix A). By means of this discussion, seven generic
classes of issues are identified. These issues are examined and
their implications for model applications explored.
The seven issue classes, divided into multiple-source and single-
source categories, are described as follows:
> Multiple-Source Issues
- SIP/C (State Implementation Plan/Compliance). The attainment
of regional compliance with NAAQS, as considered in the SIP.
- AQMP (Air Quality Maintenance Planning). Regional main-
tenance of compliance with the NAAQS, as considered in
the SIP.
11-2
-------
> Single-Source Issues
- PSD (Prevention of Significant Deterioration). Limitation
of the amount by which the air quality may be degraded in
areas in attainment of the NAAQS; this is considered in
each SIP.
- NSR (New Source Review). Permit process by which applicants
proposing new or modified stationary sources must demonstrate
that both directly and indirectly caused emissions are
within certain limits and that the pollution control to
be employed is performed with the best available tech-
nology; this is considered in each SIP.
- OSR (Offset Rules). Interpretive decision by which all
new or modified stationary sources in urban areas currently
in noncompliance with the NAAQS are judged unacceptable
unless the applicant can demonstrate a plan for reducing
emissions in an existing source by an amount greater
than the emissions from the proposed new sources; this
decision has a strong impact on the stationary source
permit process.
- EIS/R (Environmental Impact Statement/Report). A state-
ment of impact required for major projects undertaken by
the federal government or financed by federal funds
(EIS), or a report of project impact required of public
or private agencies by state or local statutes (EIR).
- LIT (Litigation). Court suits brought to resolve disagree-
ment over any of the issues mentioned above or to secure
variances waiving federal, state or local requirements.
2. Summary of Chapter IV (Models)
In Chapter III, we identified a set of generic air quality issues.
In this chapter, we define a set of generic model types. Having done so,
we match the two, identifying in generic terms those issues for which
each model may be a suitable analysis tool. We also describe the technical
formulations and underlying assumptions employed in each generic model
II-3
-------
type, indicating some key limitations. Through this presentation, we
specify the relationship between generic issues, models, and the appli-
cations for which they are suitable.
The generic classes of dispersion models that we consider are:
> Rollback
> Isopleth
> Physico-chemical
- Grid
• Region Oriented
• Specific Source Oriented
- Trajectory
• Region Oriented
• Specific Source Oriented
- Gaussian
• Long-Term Averaging
• Short-Term Averaging
- Box
In Table II-l we associate generic model types with air quality issues
for which their use is most appropriate. In Table II-2 we present model/
application combinations of interest, characterizing applications by five
attributes: number of sources, area type, pollutant, terrain complexity,
and required resolution. The table lists the values of the attributes that
can be accommodated by each model type.
In Table II-3 we relate some specific air quality models to the generic
model categories in which they may be classified. Each of these models is
described in detailed summary form in Appendix B.
3. Summary of Chapter V (Performance Measures)
In this chapter we discuss the types of performance measures available
for use, examining their relationship with both the issues
II-4
-------
TABLE II-1. AIR QUALITY ISSUES COMMONLY ADDRESSED, BY GENERIC MODEL TYPE

                                        Issue Category
Generic Model Type                      SIP/C  AQMP  PSD  NSR  OSR  EIS/R  LIT

Refined Usage
1. Grid (1)
   a. Region Oriented
   b. Specific Source Oriented
2. Trajectory (1)
   a. Region Oriented
   b. Specific Source Oriented
3. Gaussian
   a. Short-Term Averaging
      i)  Multiple Source
      ii) Single Source
   b. Long-Term Averaging
Refined/Screening Usage
4. Isopleth (1, 5)
Screening Usage
5. Rollback
6. Box

Notes:
1. Only short-term time scales can be considered (less than several days).
2. Regional impact of new sources can be assessed but not near-source, or microscale, effects.
3. Only non-reactive pollutants can be considered.
4. Only pollutants having long-term standards can be considered (SO2, TSP, and NO2).
5. Only photochemically active pollutants can be considered.
II-5
-------
TABLE II-2. MODEL/APPLICATION COMBINATIONS

Generic Model Type: Number of Sources; Area Type; Pollutant; Terrain Complexity; Required Resolution

REFINED USAGE
Grid
  a. Region Oriented: Multiple-Source; Urban, Rural; O3, HC, CO, NO2 (1-hour), SO2
     (3- and 24-hour), TSP; Simple, Complex (Limited); Temporal, Spatial
  b. Specific Source Oriented: Single-Source; Rural; O3, HC, CO, NO2 (1-hour), SO2
     (3- and 24-hour), TSP; Simple, Complex (Limited); Temporal
Trajectory
  a. Region Oriented: Multiple-Source; Urban; O3, HC, CO, NO2 (1-hour), SO2
     (3- and 24-hour), TSP; Simple; Temporal, Spatial (Limited)
  b. Specific Source Oriented: Single-Source; Urban, Rural; O3, HC, CO, NO2 (1-hour), SO2
     (3- and 24-hour), TSP; Simple, Complex (Limited); Temporal, Spatial (Limited)
Gaussian
  a. Long-Term Averaging: Multiple-Source, Single-Source; Urban, Rural; SO2 (annual), TSP,
     NO2 (annual)*; Simple, Complex (Limited); Spatial
  b. Short-Term Averaging: Multiple-Source, Single-Source; Urban, Rural; SO2 (3- and 24-hour),
     CO, TSP, NO2 (1-hour)*; Simple, Complex (Limited); Temporal, Spatial

REFINED/SCREENING USAGE
Isopleth: Multiple-Source; Urban; O3 (1-hour); Simple, Complex (Limited); Temporal (Limited)

SCREENING USAGE
Rollback: Multiple-Source, Single-Source; Urban, Rural; O3, HC, NO2, SO2, CO, TSP;
  Simple, Complex (Limited)
Box: Multiple-Source; Urban; O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP;
  Simple, Complex (Limited); Temporal

* Only if NO2 is taken to be total NOx.
11-6
-------
TABLE II-3. SOME AIR QUALITY MODELS
Generic Model Type
Refined Usage
Grid
a. Region Oriented
b. Specific Source Oriented
Trajectory
a. Region Oriented
b. Specific Source Oriented
Gaussian
a. Long-term Averaging
b. Short-term Averaging
Refined/Screening Usage
Isopleth
Screening Usage
Rollback
Box
Specific Model Name
SAI
LIRAQ
PICK
EGAMA
DEPICT
DIFKIN
REM
ARTSIM
RPM
LAPS
AQDM
CDM
CDMQC
TCM
ERTAQ*
CRSTER*
VALLEY*
TAPAS*
APRAC-1A
CRSTER*
HANNA-GIFFORD
HIWAY
PTMTP
PTDIS
PTMAX
RAM
VALLEY*
TEM
TAPAS*
AQSTM
CALINE-2
ERTAQ*
EKMA
WHITTEN
LINEAR ROLLBACK
MODIFIED ROLLBACK
APPENDIX 0
ATDL
* These models can be used for both long-term and short-term
averaging.
-------
and the models we identified in Chapters III and IV. Our discussion
proceeds as follows: We first identify generic types of performance
measures; we then catalogue some specific performance measures
(describing them in detail in Appendix C); and finally we match
generic performance measures to the issue/model/application combin-
ations presented in earlier chapters.
We consider four generic performance measure categories: peak,
station, area, and exposure/dosage. The first category contains
those measures deriving from the differences between the predicted and
observed concentration peak, its level, location and timing. The second
category includes measures based on concentration differences between
prediction and observation at specific measurement stations. Within the
third category are contained those measures based on concentration
field differences throughout a specified area. The fourth category
includes measures derived from differences in population exposure and
dosage within a specified area.
Each of these generic performance measure categories requires
successively greater knowledge of the spatial and temporal distribution
of concentrations. We show in Figure II-1 a schematic representation of
several distinct levels of knowledge about regional concentrations. A
similar schematic illustration appropriate for source-specific situations
is shown in Figure II-2. Listed in Table II-4 are the information require-
ments for the four categories. We also consider the relative likelihoods
that reliable information will be available supporting calculation of measures
from each of the four categories.
Three types of variations are recognized among performance measures:
scalar, statistical, and pattern recognition. Those measures of the
first type are based on a comparison of the predicted and observed
values of a specific quantity: the peak concentration level, for
instance. Those of the second type compare the statistical behavior
(the mean, variance, and correlation, for example) of the differences
between the predicted and observed values for the quantities of interest.
11-8
-------
FIGURE II-1. VARIOUS LEVELS OF KNOWLEDGE ABOUT REGIONAL CONCENTRATIONS
FIGURE II-2. VARIOUS LEVELS OF KNOWLEDGE ABOUT SPECIFIC-SOURCE CONCENTRATIONS
II-9
-------
TABLE II-4. GENERIC PERFORMANCE MEASURE INFORMATION REQUIREMENTS

Generic
Performance
Measure Type      Information Required

Peak              Predicted and measured concentration peak (level,
                  location, and time)

Station           Predicted and measured concentrations at specific
                  stations (temporal history), for stations i = 1, ..., M

Area              Predicted and measured concentration field within
                  a specified area (spatial and temporal history), i.e.,
                  C(x,y,t) predicted and C(x,y,t) measured

Exposure/dosage   Both the predicted and measured concentration
                  field and the predicted and actual population
                  distribution within a specified area (spatial
                  and temporal history)
11-10
-------
Measures of the final type are useful in triggering "pattern recognition,"
that is, providing qualitative insight into model behavior, transforming
concentration "residuals" (the differences between predicted and observed
values) into forms that highlight certain aspects of model performance.
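As a minimal illustration of these three types of variation (the particular
measures, station data, and concentration values below are hypothetical
assumptions, not quantities taken from this report), a scalar measure, a
statistical summary, and a simple pattern-recognition display could be
computed from one station's hourly prediction-observation pairs as follows:

    import numpy as np

    # Hourly concentrations (pphm) at one station; values are hypothetical.
    predicted = np.array([4.0, 7.5, 11.0, 13.5, 12.0, 8.0])
    observed  = np.array([5.0, 8.0, 12.5, 12.0, 10.5, 7.0])

    residuals = predicted - observed            # "predicted minus observed"

    # Scalar variation: a single number, e.g., the residual at the observed peak hour.
    peak_hour = int(np.argmax(observed))
    scalar_measure = residuals[peak_hour]

    # Statistical variation: summary statistics of the residuals.
    mean_residual = residuals.mean()
    residual_variance = residuals.var(ddof=1)
    correlation = np.corrcoef(predicted, observed)[0, 1]

    # Pattern-recognition variation: a display intended for qualitative inspection,
    # here a crude text profile of predicted and observed concentrations by hour.
    for hour, (p, o) in enumerate(zip(predicted, observed)):
        print(f"hour {hour:2d}  pred {p:5.1f}  obs {o:5.1f}  resid {p - o:+5.1f}")

    print(f"residual at peak hour: {scalar_measure:+.1f} pphm")
    print(f"mean residual {mean_residual:+.2f}, variance {residual_variance:.2f}, r = {correlation:.2f}")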
To illustrate the types of variations found in each generic
performance measure category, we present Table II-5. Some typical
examples are included for each category/variation combination. In
Section D of this chapter, a number of specific performance measures
are listed. Examined in detail in Appendix C, they are classified
according to the scheme presented here.
For reasons we examine in this chapter, performance measures may
be associated with the issue classes. We match issue with measure in
Table II-6, indicating where their calculation might be of use. Note
that NSR and PSD are both part of the preconstruction review process
for a new source.
Also, we may match measures to model type, as is shown in Table II-7.
This we do based on differences among model types in their ability to cal-
culate each of the measure types. Isopleth, rollback and box models, for
instance, provide insufficient spatial resolution for calculation of station,
area or exposure/dosage measures. Likewise, long-term averaging Gaussian
models lack sufficient temporal resolution to permit calculation of exposure/
dosage measures.
Several important conclusions are reached in this chapter about the
suitability for use of each of the four measure types:
> Performance measures relying on a comparison of the
predicted and "true" peak concentrations may not be
reliable in all circumstances, since measurement networks
can provide only the concentration at the station record-
ing the highest value, not necessarily the value at the
"true" peak.
11-11
-------
TABLE II-5. TYPES OF VARIATIONS AMONG GENERIC PERFORMANCE MEASURE CATEGORIES

Generic Performance   Type of
Measure Category      Variation     Typical Example

Peak                  Scalar        Concentration residual* at the peak.
                      Pattern       Map showing locations and values of maximum
                      Recognition   one-hour-average concentrations for each hour.

Station               Scalar        Concentration residual at the station measuring
                                    the highest value.
                      Statistical   Expected value, variance, and correlation coef-
                                    ficient of the residuals for the modeling day
                                    at a particular measurement station.
                      Pattern       At the time of the peak (event-related), the
                      Recognition   ratio of the residual at the station having
                                    the highest value to the average of the resi-
                                    duals at the other station sites (this can
                                    indicate whether the model performs better near
                                    the peak than it does throughout the rest of
                                    the modeled region).

Area                  Scalar        Difference in the fraction of the modeled area
                                    in which the NAAQS are exceeded.
                      Statistical   At the time of the peak, differences in the
                                    area/concentration frequency distribution.
                      Pattern       For each modeled hour, isopleth plots of the
                      Recognition   ground-level residual field.

Exposure/dosage       Scalar        Differences in the number of person-hours of
                                    exposure to concentrations greater than the
                                    NAAQS.
                      Statistical   Differences in the exposure/concentration fre-
                                    quency distribution.
                      Pattern       For the entire modeled day, an isopleth plot
                      Recognition   of the ground-level dosage residuals.

* Residual: The difference between "predicted" and "observed."
11-12
-------
TABLE II-6.
PERFORMANCE MEASURES COMMONLY
ASSOCIATED WITH SPECIFIC ISSUES
Performance Measure Type
Issue
Multiple-source
SIP/C
AQMP
Specific-source
PSD
NSR
OSR
EIS/R
LIT
Peak
X
X
X
X
X
X
Station
X
X
X
X
X
X
X
Area
X
X
X
X
X
X
X
Exposure/Dosage
X
X
TABLE II-7.
PERFORMANCE MEASURES THAT CAN BE
CALCULATED BY EACH MODEL TYPE
Model
Refined usage
Grid
Region oriented
Specific source oriented
Trajectory
Region oriented
Specific source oriented
Gaussian
Long-term averaging
Short-term averaging
Refined/screening usage
Isopleth
Screening usage
Rollback
Box
Performance Measure Type
Exposure/
Peak Station Area Dosage
11-13
-------
> Performance measures relying on a comparison of the
predicted and "true" concentration fields may not be
computationally feasible since neither predicted nor
"true" concentration fields are always resolvable,
spatially or temporally.
> Performance measures based upon a comparison of predicted
and "true" exposure/dosage, though they are appealing
because of their ability to serve as surrogates for the
health effects experienced by the populace, may not be
computationally feasible because of the difficulty in
measuring the "true" population distribution and the
"true" concentration field. (We do suggest in Chapter VI
and Appendix D, however, one means by which health effects
considerations can be accounted for implicitly.)
> Performance measures based upon a comparison of the
predicted and observed concentrations at station sites
in the measurement network may be of the greatest practical
value.
4. Summary of Chapter VI (Performance Standards)
The central purpose of this report is to suggest means for setting
performance standards for air quality dispersion models. In this
chapter we reach this goal. Our discussion proceeds as follows: First
we identify five key attributes of desirable model performance, evaluating
how their relative importance varies depending on the issue addressed and
the pollutant/averaging time considered; then we propose specific perfor-
mance measures appropriate for use in testing for the presence of these
attributes; and finally we suggest rationales on which to base the setting
of formal standards. Having recommended for use a list of performance mea-
sures and standards, we deal with two additional issues: interpretation
of the values of the measures, which we illustrate by means of a sample
case study, and promulgation of formal performance criteria, which we
explore by proposing an outline of a draft standard.
11-14
-------
The five attributes of desirable model performance are defined as
follows: accuracy of the peak prediction, absence of systematic bias,
lack of gross error, temporal correlation, and spatial alignment. Though
they are interrelated, each of the five performance attributes is distinct.
Consequently, we must employ different kinds of performance measures to
determine the presence or absence of each. We list in Table II-8 the
objectives of each type of performance measure.
TABLE II-8. PERFORMANCE MEASURE OBJECTIVES

Performance
Attribute             Objective of Performance Measures

Accuracy of the       Assess the model's ability to predict the concentra-
peak prediction       tion peak (its level, timing and location)

Absence of            Reveal any systematic bias in model predictions
systematic bias

Lack of gross         Characterize the error in model predictions both at
error                 specific monitoring stations and overall

Temporal              Determine differences between predicted and observed
correlation           temporal behavior

Spatial alignment     Uncover spatial misalignment between the predicted
                      and observed concentration fields
We clarify the difference between bias and error by means of the
following example. Suppose when we compare a set of model predictions
with station observations, we find several large positive residuals (pre-
dicted minus observed concentrations) balanced by several equally large
negative residuals. If we were testing for bias, we would allow the
oppositely signed residuals to cancel. A conclusion that the model dis-
played no systematic bias therefore might be a justifiable one. On the
other hand, were we testing for gross error, the signs of the residuals
would not be considered with oppositely signed residuals no longer allowed
to cancel. Because the absolute value of the residuals is large in our
example, we might well conclude that the model predictions are subject to
significant gross error.
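A minimal sketch of this distinction, using hypothetical residuals (the
values below are illustrative only), contrasts a bias test, in which
oppositely signed residuals cancel, with a gross-error test, in which they
do not:

    import numpy as np

    # Predicted-minus-observed residuals (pphm); values are hypothetical.
    residuals = np.array([+3.0, -3.0, +2.5, -2.5, +0.5, -0.5])

    # Bias test: oppositely signed residuals are allowed to cancel.
    mean_bias = residuals.mean()               # 0.0 here, suggesting no systematic bias

    # Gross-error test: signs are ignored, so large residuals cannot cancel.
    mean_abs_error = np.abs(residuals).mean()  # 2.0 here, indicating substantial error

    print(f"mean bias      = {mean_bias:+.2f} pphm")
    print(f"mean abs error = {mean_abs_error:.2f} pphm")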
11-15
-------
Which of these performance attributes, however, is most important?
This question has no unique answer, the relative importance of each
attribute depending on the type of issue the model is being used to address
and the type of pollutant under consideration. In order to relate attri-
bute importance to application issue in a more convenient manner, we pre-
sent in Table II-9 a matrix of generic issues (as defined earlier in this
report) and problem type. For each combination we indicate an "importance
category." We define the three categories based on how strongly we insist
that model performance be judged acceptable for the given problem type.
For Category 1, we require that the performance attribute must be present
(the problem type is of prime importance). For Category 2, the attribute
should be present but, if it is not, some leeway ought to be allowed, per-
haps at the discretion of a reviewer (although the attribute is of consider-
able importance, some degree of "mismatch" may be tolerable). For Category 3,
we are not insistent that the performance attribute be present, though we
state that as being a desirable objective (the attribute is not of central
importance). The reasoning behind the entries in this table is complex.
For this reason, we urge the reader to consult the detailed discussion in
Chapter VI, Section C.
TABLE II-9. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY ISSUE

                              Importance of Performance Attribute*
Performance Attribute       SIP/C  AQMP  PSD  NSR  OSR  EIS/R  LIT

Accuracy of the peak          1     1     1    1    2     1     1
prediction
Absence of systematic         1     1     1    1    1     1     1
bias
Lack of gross error           1     1     1    1    1     1     1
Temporal correlation          2     2     3    3    3     3     3
Spatial alignment             2     2     1    3    3     3     3

* Category 1 - Performance standard must always be satisfied.
  Category 2 - Performance standard should be satisfied, but some leeway
               may be allowed at the discretion of a reviewer.
  Category 3 - Meeting the performance standard is desirable but failure
               is not sufficient to reject the model; measures dealing
               with this problem should be regarded as "informational."
H-16
-------
The relative importance of each performance attribute also is dependent
on the type of pollutant being considered and the averaging time required
by the NAAQS. If a species is subject to a short-term standard, for
instance, model peak accuracy and temporal correlation might be of con-
siderable concern, depending on the issue being addressed. However, if
the species is subject to a long-term standard, neither of these are of
appropriate form. We indicate in Table II-10 a matrix of the problem types
and pollutant species. We rank each combination by the same importance
categories we used earlier in Table II-9.
Conceivably, a conflict might exist between the ranking indicated
by the issue and the pollutant matrices in Tables II-9 and 11-10. We
would resolve the conflict in favor of the less stringent of the two
rankings.
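The resolution rule can be stated compactly: the governing importance
category for an attribute is the less stringent (higher-numbered) of the
category drawn from the issue matrix and the category drawn from the
pollutant/averaging-time matrix. The sketch below illustrates this rule; the
few lookup entries shown are illustrative placeholders, not a transcription
of Tables II-9 and II-10.

    # Importance categories: 1 (must be satisfied) ... 3 (desirable only).
    # Entries below are illustrative placeholders, not the report's tables.
    issue_category = {
        ("SIP/C", "temporal correlation"): 2,
        ("PSD",   "spatial alignment"):    1,
    }
    pollutant_category = {
        ("O3 (1 hour)",  "temporal correlation"): 1,
        ("SO2 (3 hour)", "spatial alignment"):    2,
    }

    def governing_category(issue, pollutant, attribute):
        """Resolve a conflict in favor of the less stringent (larger) category."""
        return max(issue_category[(issue, attribute)],
                   pollutant_category[(pollutant, attribute)])

    print(governing_category("SIP/C", "O3 (1 hour)", "temporal correlation"))   # -> 2
    print(governing_category("PSD", "SO2 (3 hour)", "spatial alignment"))       # -> 2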
Having identified the problem types of interest, we then suggest
specific performance measures for use. Our recommended choice of perfor-
mance measures is based upon the following criteria:
> The measure is an accurate indicator of the presence of a
given problem type.
> The measure is of the "absolute" kind, that is, specific
standards can be set.
> Only station measures should be considered for use in
setting standards.* (This is more an "unavoidable" choice
than a "preferred" one.)
Based on these criteria, we recommend the set of measures described
in Table II-11. The use of ratios (Cp/Cm, for example) can intro-
duce difficulties: They can become unstable at low concentrations, and the
statistics of a ratio of two random variables can become troublesome. Never-
theless, when used properly their advantages can be offsetting. For example,
the use of Cp/Cm instead of (Cp - Cm) permits a health effects rationale to be
used in recommending a performance standard (see a later discussion).
*Note the caveat on pages VI-18 and VI-19, with respect to point source applications.
11-17
-------
We draw a distinction between those measures that are of general
use in examining model performance and the much smaller subset of measures
that are most amenable to the establishment of explicit standards. Many
measures can provide rich insight into model behavior, but the informa-
tion is conveyed in a qualitative way not suitable for quantitative
characterization (a requisite for use in setting performance standards).
TABLE II-10. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY POLLUTANT AND AVERAGING TIME

                        Importance of Performance Attribute*

Pollutant (averaging time‡):  O3 (1 hour), CO** (1 hour), HC (3 hour), SO2 (3 hour),
                              CO (8 hour), TSP** (24 hour), SO2 (24 hour),
                              NO2 (1 year)†, TSP (1 year), SO2 (1 year)

Performance attributes:  Accuracy of the peak prediction; Absence of systematic bias;
                         Lack of gross error; Temporal correlation; Spatial alignment

* Category 1 - Performance standard must be satisfied.
  Category 2 - Performance standard should be satisfied, but some leeway may be allowed
               at the discretion of a reviewer.
  Category 3 - Meeting the performance standard is desirable but failure is not
               sufficient to reject the model.
† No short-term NO2 standard currently exists.
‡ Averaging times required by the NAAQS are in parentheses.
** Primary standards.
†† N/A: the performance attribute is not applicable.
11-18
-------
TABLE II-11. MEASURES RECOMMENDED FOR USE IN SETTING MODEL PERFORMANCE STANDARDS†

Performance
Attribute           Performance Measure

Accuracy of the     Ratio of the predicted station peak to the measured station peak
peak prediction     (could be at different stations and times)

                    Difference in timing of occurrence of station peak*

Absence of          Average value and standard deviation of the mean deviation
systematic bias     about the perfect correlation line normalized by the average
                    of the predicted and observed concentrations, calculated for
                    all stations during those hours when either the predicted or
                    the observed values exceed some appropriate minimum value
                    (possibly the NAAQS)

Lack of gross       Average value and standard deviation of the absolute devia-
error               tion about the perfect correlation line normalized by the
                    average of the predicted and observed concentrations, calcu-
                    lated for all stations during those hours when either the
                    predicted or the observed values exceed some appropriate
                    minimum value (possibly the NAAQS)

Temporal cor-       Temporal correlation coefficients at each monitoring station
relation*           for the entire modeling period and an overall coefficient
                    averaged over all stations, for 1 <= i <= M monitoring
                    stations

Spatial alignment   Spatial correlation coefficients calculated for each modeling
                    hour considering all monitoring stations, as well as an over-
                    all coefficient averaged for the entire day, for 1 <= j <= N
                    modeling hours

* These measures are appropriate when the chosen model is used to consider questions
  involving photochemically reactive pollutants subject to short-term standards.
† There is deliberate redundancy in the performance measures. For example, in testing
  for systematic bias, both the mean normalized deviation and its standard deviation
  are calculated. The latter quantity is a measure of "scatter" about the perfect
  correlation line. This is also an indicator of gross error and could be used in
  conjunction with the mean and standard deviation of the absolute normalized deviation.
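The defining formulas for these measures are given in Chapter VI (Section
D.4). As a rough illustration of how the station measures in Table II-11
might be computed, the sketch below uses one plausible reading of the
definitions (deviations about the perfect correlation line normalized by the
average of the predicted and observed values, restricted to station-hours
above a cutoff); the concentration arrays, the cutoff, and the exact formulas
are assumptions made for illustration, not the report's prescribed equations.

    import numpy as np

    # Hourly concentrations (pphm), stations x hours; values are hypothetical.
    pred = np.array([[6.0, 10.0, 14.0, 12.0],
                     [5.0,  9.0, 13.0, 11.0],
                     [4.0,  8.0,  9.0,  7.0]])
    obs  = np.array([[7.0, 11.0, 12.5, 10.0],
                     [6.0,  8.5, 14.0, 12.0],
                     [5.0,  7.5, 10.0,  8.0]])
    cutoff = 8.0   # "appropriate minimum value" (possibly the NAAQS)

    # Accuracy of the peak prediction: ratio of the predicted station peak to the
    # measured station peak (the peaks may occur at different stations and times).
    peak_ratio = pred.max() / obs.max()

    # Consider only station-hours where either the prediction or the observation
    # exceeds the cutoff.
    mask = (pred > cutoff) | (obs > cutoff)
    norm_dev = (pred[mask] - obs[mask]) / ((pred[mask] + obs[mask]) / 2.0)

    bias_mean, bias_std = norm_dev.mean(), norm_dev.std(ddof=1)                  # systematic bias
    err_mean, err_std = np.abs(norm_dev).mean(), np.abs(norm_dev).std(ddof=1)    # gross error

    # Temporal correlation: one coefficient per station (across hours).
    r_time = [np.corrcoef(pred[i], obs[i])[0, 1] for i in range(pred.shape[0])]

    # Spatial correlation: one coefficient per modeled hour (across stations).
    r_space = [np.corrcoef(pred[:, j], obs[:, j])[0, 1] for j in range(pred.shape[1])]

    print(f"peak ratio              = {peak_ratio:.2f}")
    print(f"normalized bias         = {bias_mean:+.3f} (s.d. {bias_std:.3f})")
    print(f"normalized gross error  = {err_mean:.3f} (s.d. {err_std:.3f})")
    print(f"temporal r (overall)    = {np.mean(r_time):.2f}")
    print(f"spatial r (overall)     = {np.mean(r_space):.2f}")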
11-19
-------
These "measures," often involving graphical display, really are tools
for use in "pattern recognition." They display model behavior in
suggestive ways, highlighting "patterns" whose presence reveals much
about model performance. Several examples of such "measures" are
isopleth contour maps of predicted concentrations and estimates of
"observed" ones, isopleth contour maps of the differences between the
two, and time histories of predicted and observed concentrations at
specific monitoring stations.
Although we focus on station measures for use in setting model per-
formance standards, we do not suggest that the calculation of performance
measures be limited to such measures. Many other measures should be used
where appropriate. The data should be viewed in as many, varied ways as
possible in order to enrich insight into model behavior. We suggest a
number of useful measures both in Chapter V and Appendix C.
Having identified specific measures for use, we consider four rationales
for setting appropriate standards. The rationales, along with a statement
of their guiding principles, are shown in Table II-12. We discuss each in
detail in Appendix D.
The four rationales differ in their ability to consider each of the
five problem types. Shown in Table II-13 are the types of problems
addressable by measures whose standards are set by each of the rationales.
Only the Pragmatic/Historic rationale is of use in addressing all problem
types; the other three are of use principally in defining the level of
performance required in predicting values at or near the concentration
peak. In Table II-14 we associate each rationale with those issues
for which its use is appropriate.
We select from among the alternative rationales in the following ways.
Hoping to avoid introducing a procedural bias, we first eliminate the
Guaranteed Compliance rationale from further consideration. Then,
because the Health Effects rationale is better suited for use in setting
TABLE II-12. POSSIBLE RATIONALES FOR SETTING MODEL PERFORMANCE STANDARDS

Rationale: Health Effects
  Guiding Principle: The metric of concern is the area-integrated cumulative
  health effects due to pollutant exposure; the ratio of the metric's value
  based on prediction to its value based on observation must be kept to
  within a prescribed tolerance of unity.

Rationale: Control Level Uncertainty
  Guiding Principle: Uncertainty in estimates of the percentages of emissions
  control required must be kept within certain allowable bounds.

Rationale: Guaranteed Compliance
  Guiding Principle: Compliance with the NAAQS must be "guaranteed"; all
  uncertainty must be on the conservative side, even if this approach means
  introducing a systematic bias.

Rationale: Pragmatic/Historic
  Guiding Principle: In each new application, a model should perform at least
  as well as the "best" previous performance of a model in its generic class
  in a similar application; until such a historical data base is complete,
  other more heuristic approaches may be applied.
TABLE II-13. PERFORMANCE ATTRIBUTES ADDRESSABLE USING
             PERFORMANCE STANDARD RATIONALES

Performance Attribute             Health    Control Level   Guaranteed   Pragmatic/
                                  Effects*  Uncertainty*    Compliance   Historic
Accuracy of the peak prediction      X           X              X            X
Absence of systematic bias                                                   X
Lack of gross error                                                          X
Temporal correlation                                                         X
Spatial alignment                                                            X

* These are most suited for photochemically reactive pollutants subject
  to short-term standards.
TABLE II-14. ASSOCIATION OF RATIONALES WITH GENERIC ISSUES

Rationales: Health Effects, Control Level Uncertainty, Guaranteed Compliance,
and Pragmatic/Historic
Issue categories: Multiple-Source (SIP/C, AQMP, PSD) and
Specific-Source (NSR, OSR, EIS/R, LIT)
standards for peak measures, we choose to use it only in that way. As is
clear from Table II-13, we presently have no alternative but to apply
the Pragmatic/Historic rationale for those measures designed to test
for systematic bias and gross error as well as to evaluate temporal
correlation and spatial alignment.
Where we invoke the Pragmatic/Historic rationale as justification
for selecting specific standards, we also state the specific guiding
principles we follow. We summarize those here:
> When the pollutant being considered is subject to a short-
term standard, the timing of the concentration peak may be
an important quantity for a model to predict. This is parti-
cularly true when the pollutant is also photochemically
reactive. We state as a guiding principle: "For photochem-
ically reactive pollutants, the model must reproduce reason-
ably well the phasing of the peak." For ozone an acceptable
tolerance for peak timing might be ±1 hour.
> The model should not exhibit, at concentrations at or above some
appropriate minimum value (possibly the NAAQS), any systematic bias
greater than the maximum bias resulting from EPA-allowable calibration
error in the air quality monitors. We would consider in our calculations
any prediction-observation pair in which either of the values exceeds
the pollutant standard.
> Error (as measured by its mean and standard deviation) should not
be significantly different from the distribution of differences
resulting from the comparison of an EPA-acceptable monitor
with an EPA reference monitor. The EPA has set maximum
allowable limits on the amount by which a monitoring technique
may differ from a reference method (40 CFR § 53.20). An "EPA-
acceptable monitor" is defined here to be one that differs from
a reference monitor by up to the maximum allowable amount.
> Predictions and observations should appear to be highly correlated
at a 95 percent confidence level, both when compared temporally and
spatially. We can estimate the minimum allowable value for the
respective correlation coefficient by using a t-statistic at the
appropriate percentage level and having the degrees of freedom
appropriate for the number of prediction-observation pairs (a sketch
of this conversion follows the list).
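For the last of these principles, one common conversion, offered here only as
a sketch and assuming a one-sided test on Pearson's r with n - 2 degrees of
freedom, is the following.

```python
from math import sqrt
from scipy import stats

def minimum_significant_r(n_pairs, confidence=0.95):
    """Smallest correlation coefficient judged significant at the given
    one-sided confidence level for n_pairs prediction-observation pairs."""
    df = n_pairs - 2
    t_crit = stats.t.ppf(confidence, df)
    # From t = r * sqrt(df) / sqrt(1 - r**2), solved for r.
    return t_crit / sqrt(t_crit ** 2 + df)
```

For example, with ten pairs the threshold works out to about 0.55.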
The guiding principles noted above are plausible ones, though in
some cases they are arbitrary. As a "verification data base" of
experience is assembled, historically achieved performance levels may
be better indicators of the expected level of model performance.
Standards derived on this more pragmatic basis may supplant those
deriving from the "guiding principles" followed in this report.
Our recommended choice for use, when possible, in establishing peak-
accuracy standards is a composite one, combining the Health Effects and
Control Level Uncertainty rationales. Were a model to overpredict the
peak, a control strategy based on its prediction might be expected
to abate the health impact actually occurring, though with more control
than actually needed. If the model underpredicted, however, the control
strategy might be "underdesigned," with the risk existing that some of the
health impact might remain unabated even after control implementation.
The penalty, in a health sense, is incurred only when the model underpre-
dicts. The Health Effects rationale then is one-sided, helping us set
performance standards only on the "low side."
On the other hand, the Control Level Uncertainty rationale is
bounded "above" and "below", that is, its use provides a tolerance
interval about the value of the measured peak concentration. For a
model to be judged acceptable under this criterion, its prediction of
the peak concentration would have to fall within this interval. Model
underprediction could lead to control levels lower than required and thus
to residual health risks. Overprediction, on the other hand, could lead
to abatement strategies posing little or no health risk but incurring
control costs greater than required.
For the above reasons, we suggest that the Control Level Uncer-
tainty rationale be used to establish an upper bound (overprediction)
on the acceptable difference between the predicted and observed peak.
We would choose the lower bound (underprediction) to be the interval
that is the minimum of that suggested by the Health Effects and
Control Level Uncertainty rationales.
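A hedged sketch of that composite rule follows; the function and the numerical
tolerances in the usage line are illustrative assumptions, not values taken
from the report.

```python
def peak_ratio_interval(he_lower, clu_lower, clu_upper):
    """Acceptance interval for the ratio of predicted to observed peak.
    The upper bound comes from Control Level Uncertainty alone; the lower
    bound is the more restrictive (larger) of the Health Effects and
    Control Level Uncertainty lower bounds."""
    return max(he_lower, clu_lower), clu_upper

# Illustrative tolerances only:
low, high = peak_ratio_interval(he_lower=0.80, clu_lower=0.70, clu_upper=1.50)
```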
We list our recommendations in Table II-15, noting the possibility
that the recommended rationales may not be appropriate in all applications
for all pollutants. Whether health effects would be an appropriate con-
sideration for TSP, for instance, is unclear. The Health
Effects rationale, as defined in Appendix D, is best suited for use in
urban applications involving short-term, reactive pollutants. In those
circumstances when the HE or CLU rationales are not suitable, we suggest
the Pragmatic/Historic rationale.
We summarize in Table II-16 our list of recommended performance
measures and standards. In it, we associate performance attribute
and standard. To further describe the standard, we state the type of
rationale used and the guiding principle followed, as well as providing
sample values that are appropriate for the sample case we consider
in this chapter.
We also discuss two supplementary subjects. First, we illustrate
how performance measure values may be interpreted by describing a
sample case based on use of the SAI Airshed Model in simulating the
Denver metropolitan region. Then, we consider means by which model
performance criteria may be promulgated, suggesting an outline for a
draft standard.
Thus we conclude this chapter and the report. We note in closing
that the subject of model performance evaluation is by no means exhausted. Many
areas remain to be explored in greater detail, all warranting considerable
additional effort.
TABLE II-15. RECOMMENDED RATIONALES FOR SETTING STANDARDS

Performance Attribute          Recommended Rationale
Accuracy of peak prediction    Health Effects* (lower side/underprediction);
                               Control Level Uncertainty* (upper side/overprediction)
Absence of systematic bias     Pragmatic/Historic
Lack of gross error            Pragmatic/Historic
Temporal correlation           Pragmatic/Historic
Spatial alignment              Pragmatic/Historic

* These may not be appropriate for all regulated pollutants in all applica-
  tions. When they are not, the Pragmatic/Historic rationale should be
  employed. They are most applicable for photochemically reactive pol-
  lutants subject to a short-term standard (O3 and NO2, if a 1-hour
  standard is set).
TABLE II-16. SUMMARY OF RECOMMENDED PERFORMANCE MEASURES AND STANDARDS

Performance Attribute: Accuracy of the peak prediction
  Performance Measure: Ratio of the predicted station peak to the measured
  station peak (could be at different stations and times)
  Type of Rationale: Health Effects† (lower side) combined with Control Level
  Uncertainty† (upper side)
  Guiding Principle: Limitation on uncertainty in aggregate health impact and
  pollution abatement costs
  Sample Value (Denver Example): 80 percent ≤ Cp/Cm ≤ 150 percent

  Performance Measure: Difference in timing of occurrence of station peak*
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: Model must reproduce reasonably well the phasing of the
  peak (to within about 1 hour)
  Sample Value (Denver Example): ±1 hour

Performance Attribute: Absence of systematic bias
  Performance Measure: Average value and standard deviation of the mean
  deviation about the perfect correlation line, normalized by the average of
  the predicted and observed concentrations, calculated for all stations
  during those hours when either predicted or observed values exceed some
  appropriate minimum value (possibly the NAAQS)
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: No or very little systematic bias at concentrations
  (predictions or observations) at or above some appropriate minimum value
  (possibly the NAAQS); the bias should not be worse than the maximum bias
  resulting from EPA-allowable monitor calibration error (about 8 percent is
  a representative value for ozone); the standard deviation should be less
  than or equal to that of the difference distribution of an EPA-acceptable
  monitor** compared with a reference monitor (3 pphm is representative for
  ozone at the 95 percent confidence level)
  Sample Value (Denver Example): No apparent bias at ozone concentrations
  above 0.06 ppm (see Table VI-12 and Figures VI-5 and VI-6 for further
  details)

Performance Attribute: Lack of gross error
  Performance Measure: Average value and standard deviation of the absolute
  deviation about the perfect correlation line, normalized by the average of
  the predicted and observed concentrations, calculated for all stations
  during those hours when either predicted or observed values exceed some
  appropriate minimum value (possibly the NAAQS)
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: For concentrations at or above some appropriate minimum
  value (possibly the NAAQS), the error (as measured by the overall values of
  the mean absolute deviation and its standard deviation) should be
  indistinguishable from the difference resulting from comparison of an
  EPA-acceptable monitor** with a reference monitor
  Sample Value (Denver Example): No excessive gross error (see Table VI-12
  and Figures VI-5 and VI-6 for further details)

Performance Attribute: Temporal correlation*
  Performance Measure: Temporal correlation coefficients at each monitoring
  station for the entire modeling period and an overall coefficient for all
  stations, r_t,i and r_t,OVERALL, for 1 ≤ i ≤ M monitoring stations
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: At a 95 percent confidence level, the temporal profile
  of predicted and observed concentrations should appear to be in phase (in
  the absence of better information, a confidence interval may be converted
  into a minimum allowable correlation coefficient by using an appropriate
  t-statistic)
  Sample Value (Denver Example): For each monitoring station,
  0.69 ≤ r_t,i ≤ 0.97; overall, r_t,OVERALL = 0.88. In this example a value
  of r ≥ 0.53 is significant at the 95 percent confidence level.

Performance Attribute: Spatial alignment
  Performance Measure: Spatial correlation coefficients calculated for each
  modeling hour considering all monitoring stations, as well as an overall
  coefficient for the entire day, r_x,j and r_x,OVERALL, for 1 ≤ j ≤ N
  modeling hours
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: At a 95 percent confidence level, the spatial
  distribution of predicted and observed concentrations should appear to be
  correlated
  Sample Value (Denver Example): For each hour, -0.43 ≤ r_x,j ≤ 0.66;
  overall, r_x,OVERALL = 0.17. In this example a value of r ≥ 0.71 is
  significant at the 95 percent confidence level.

* These measures are appropriate when the chosen model is used to consider
  questions involving photochemically reactive pollutants subject to
  short-term standards.
† These may not be appropriate for all regulated pollutants in all
  applications. When they are not, the Pragmatic/Historic rationale should
  be employed.
** The EPA has set maximum allowable limits on the amount by which a
  monitoring technique may differ from a reference method. An
  "EPA-acceptable monitor" is defined here to be one that differs from a
  reference monitor by up to the maximum allowable amount.
Ill ISSUES REQUIRING MODEL APPLICATION
Air pollution models have been developed over a period of years, not
always in response to specific needs. While convenience and availability
(rather than strict suitability) often motivated their use in particular
applications, certain classes of models have come to be associated with
certain classes of applications. For this reason, it is helpful to view
the setting of model performance measures and standards within that issue-
specific context. This chapter is intended to provide an issues framework
within which the application of air pollution models can be viewed. First,
an overview is provided, highlighting important aspects of air pollution
law. By means of this discussion, generic issues are identified. Then,
these issues are examined and their implications for model applications
explored.
A. A PERSPECTIVE ON THE ISSUES
Basic air pollution law in this country has been enacted at the fed-
eral level, although many important legal variants exist among states and
localities. The passage of legislation, however, is often just a first
step. Usually, only broad authority is granted in the original law. It
remains to the federal agency thus chartered by the Congress to set the
specific regulations implementing the law. These are then promulgated,
becoming an additional part of the Code of Federal Regulations (CFR).
Notice is provided of such an action by publication in the Federal Register
(FR). When disagreements exist over the degree to which the promulgated
regulations mirror the intent of the original law, civil suits may be brought
in court to resolve disputes. Judgments in such suits can and have had
important effects on the CFR. In the remainder of this section we will
explore briefly the body of air pollution law, from enabling legislation to
promulgation of regulations in the CFR.
1. Federal Air Pollution Law
Basic federal law is contained in the United States Code (USC). It
is divided into "Titles" which are themselves divided into "Sections."
Groups of sections form "Chapters." Title 42 of the USC (usually denoted
as 42 USC) is entitled "The Public Health and Welfare." It contains the
basic law pertaining to air pollution: Chapter 15B entitled "Air Pollution
Control" and Chapter 55 entitled "National Environmental Policy."
The Clean Air Act is contained in Section 1857 of Title 42 (within
Chapter 15B) and is referenced by the notation 42 USC §1857. Originally
enacted in 1963, it has since been amended a number of times. The most
notable changes occurred with the passage of the Clean Air Act Amendments
of 1970 and 1977, the former of which, among other things, created the
Environmental Protection Agency (EPA), authorized the setting of national
ambient air quality standards (NAAQS) and required the development of state
implementation plans (SIPs) for the attainment of compliance with the NAAQS.
After passage by the Congress and signature by the President, a bill con-
taining such amendments or providing for new portions of the USC becomes
a part of the public law and is referred to both by the Congressional ses-
sion and a passage sequence number. The 1970 Amendments, for example, are
referred to as Public Law 91-604. For reference, the 91st Congress convened
for the two years from January 1969 to January 1971.
The other legislation most heavily affecting air pollution law is the
National Environmental Policy Act (NEPA) of 1969 (Public Law 91-190), which
amended Chapter 55 (National Environmental Policy) of Title 42. In its
primary features, the act created the Council on Environmental Quality
reporting to the President and mandated the preparation of environmental
impact statements (EISs) for "major Federal actions significantly affecting
the quality of the human environment." These are required for federal agency
actions and for projects supported "in whole or in part" with federal finan-
cing. The NEPA is found in 42 USC §4321, 4331 to 4335, and 4341 to 4347.
2. The Code of Federal Regulations
Implementation of federal law is accomplished by promulgation of
specific regulations, the body of which is contained in the Code of Fed-
eral Regulations. The CFR is divided into "Titles" (not the same as those
in the USC), which are themselves subdivided into "Chapters," "Subchapters,"
and "Parts." All federal regulations pertaining to air pollution are con-
tained in Title 40 which is called "Protection of the Environment." The
formal organization of 40 CFR is shown in Exhibit III-l. Note that Title 40
contains no Chapters II and III.
Subchapter C, "Air Programs," is expanded in that exhibit to include
"Part" subheadings as is Chapter V, "Council on Environmental Quality."
The following parts within Chapter I are of particular importance. In Part
50 the primary and secondary NAAQS are set for sulfur dioxide, particulate
matter, carbon monoxide, photochemical oxidants, hydrocarbons, and nitrogen
dioxide. In Part 51 requirements are stated for the development of SIPs.
All State plans, whether approved or disapproved, are published in Part 52.
In Part 60 the emissions standards are set for new and modified stationary
sources. Further breakdown of these parts by section heading is provided
in Appendix A.
As originally conceived, SIPs were blueprints for achieving compliance
with the NAAQS. As the regulations have evolved, however, they now require
that SIPs provide for air quality maintenance (AQM) once compliance
has been achieved. SIPs are currently being revised according to the man-
dates of the 7 August 1977 Clean Air Act Amendments and are required to
be reassessed periodically as to their ability to attain and maintain the
NAAQS.
EXHIBIT III-1. FORMAL ORGANIZATION OF CFR TITLE 40—
PROTECTION OF ENVIRONMENT
Chapter 1. Environmental Protection Agency
Subchapter A - General (Parts 0-21)
Subchapter B - Grants and Other Federal Assistance (Parts 30-49)
Subchapter C - Air Programs (Parts 50-89)
Part 50. National primary and secondary ambient air quality
standards
Part 51. Requirements for preparation, adoption, and
submittal of implementation plans
Part 52. Approval and promulgation of implementation plans
Part 53. Ambient air monitoring reference and equivalent
methods
Part 54. Prior notice of citizen suits
Part 55. Energy related authority
Part 60. Standards of performance for new stationary sources
Part 61. National emission standards for hazardous air
pollutants
Part 79. Registration of fuels and fuel additives
Part 80. Regulation of fuels and fuel additives
Part 81. Air quality control regions, criteria, and control
techniques
Part 85. Control of air pollution from new motor vehicles and
new motor vehicle engines
Part 86. Control of air pollution from new motor vehicles and
new motor vehicle engines: certification and test
procedures
Part 87. Control of air pollution from aircraft and aircraft
engines
Part 88-89. [Reserved]
Subchapter D - Water Programs (Parts 100-149)
Subchapter E - Pesticide Programs (Parts 162-180)
Subchapter F - [Reserved]
Subchapter G - Noise Abatement Programs (Parts 201-210)
Subchapter H - Ocean Dumping (Parts 220-230).
Subchapter I - Solid Wastes (Parts 240-399)
Subchapter N - Effluent Guidelines and Standards (Parts 401-460)
Subchapter Q - Energy Policy (Part 600)
Chapter IV. Low Emissions Vehicle Certification Board (Part 1400)
Chapter V. Council on Environmental Quality (Parts 1500-1510)
Part 1500. Preparation of environmental impact statement:
Guidelines
Part 1510. National oil and hazardous substances pollution
contingency plan
Contained within SIPs are procedures for controlling emissions from both
mobile and stationary sources. Because of the size and age of the vehicle
fleet, control of emissions from mobile sources is currently an important
part of those SIP segments dealing with NAAQS compliance. As stricter auto-
motive emissions standards are achieved and older cars are removed from high-
ways through age attrition, stationary sources will contribute an increasing
fraction of the total emissions inventory. Their importance thus increases
in the AQM segment of the SIPs.
The portion of 40 CFR relating to the review of applications for
new or modified stationary sources is Section 51.18. There it is stated
that "no approval to construct or modify will be granted unless the appli-
cant shows to the satisfaction of the Administrator that the source will
not prevent or interfere with attainment or maintenance of any national
standard." The quote Is a paraphrase of §51.18(a), as written in the
California SIP [40 CFR §52.233(g)(3)]. Several issues of practical impor-
tance derive from this section of 40 CFR. New source review (NSR) proce-
dures are thus required, with such stationary sources directed to meet
new source performance standards (NSPS) where stated in 40 CFR §60 or as
determined by the appropriate reviewing agency and to install appropriate
pollution control equipment. Also, an important consequence of 40 CFR
§51.18 derives from its interpretation in urban areas currently in noncom-
pliance with the NAAQS. In most instances, the addition of a single, modestly
sized, stationary source would be unlikely to affect regional peak pollutant
concentration. Considered separately, an argument could be made that few new
stationary sources violate the letter of §51.18. Taken in the aggregate,
however, emissions from several new sources together could have serious ad-
verse effects on regional pollutant concentrations. To overcome this inter-
pretive difficulty, the EPA has employed the so-called offset rules (OSR).
All new stationary sources in noncompliant urban areas are considered to be
in violation of §51.18 unless the applicant can demonstrate that a reduction
in emissions from other sources and a reduction in the air quality impact of
those emissions has been achieved to offset those produced by the proposed
new source.
Another issue of importance in SIP development is the prevention of
significant deterioration (PSD) of the air quality in areas currently in
attainment of the NAAQS. Originally, 40 CFR contained no provision for
consideration of PSD. A court suit, however, brought about a judgment
that SIPs must address this issue. As a consequence, subsequent to May 31,
1972, the EPA Administrator disapproved all SIPs not considering PSD.
Standards for PSD were promulgated in §52.21, entitled "Significant Deteriora-
tion of Air Quality."
In addition to SIPs, environmental impact statements and reports
(EIS/R) represent the other major class of planning documents formally
required to address air quality issues. In Chapter V of 40 CFR, guidelines
are provided for drafting EISs for major federal actions. They are required
not only for projects undertaken solely by the federal government, but also
for any major projects supported "in whole or in part" by federal financing.
EISs were submitted to the CEQ for review. They are now, however, received
and reviewed by the EPA. State and local agencies can also require for in-
dividual projects a formal statement of environmental impact. In California,
for instance, such a statement is called an "Environmental Impact Report"
(EIR) and is filed pursuant to the California Environmental Quality Act (CEQA).
Running throughout air pollution law is the basic right of legal appeal.
Court suits have played an important part in shaping the body of the law.
Portions of the authorizing statutes, the CFR, and many individual EIS/Rs
have come under legal challenge. As a result, litigation (LIT) also re-
presents an important class of issues addressed by air pollution modelers.
B. GENERIC ISSUE CATEGORIES
In the previous section we have outlined many of the important features
of air pollution law. A number of generic issues thereby have been ident-
ified. In this section we will summarize these generic issues, discuss each
briefly, and then examine their implications for air pollution modeling.
In the next chapter we will match these issue categories with a number of
existing models, comparing application requirements with model capabilities.
1. The Issues: Their Classification
The air pollution burden in a geographical area is the result of the
complex interaction of emissions from all sources as they mix and disperse
in the atmosphere, subject to prevailing influences of meteorology, solar
irradiation, and terrain. The total pollutant concentrations experienced
are a function of the effects of emissions from each of the mobile and
stationary emitting sources, though that function is generally not a
linearly additive one. Because the NAAQS are expressed in terms of total
allowable concentration levels and are applicable at any location to which
the public has access, implementation plans are inherently regional in
perspective. There is a certain duality of focus in SIPs, however: While
they detail plans for regional NAAQS compliance and maintenance, they do so
through curtailment of emissions from individual sources and source cate-
gories. Thus, while the focus is ultimately on regional effects, the environ-
mental impact of individual sources also must be considered. This is an
explicit issue with new source review (NSR), for instance. As the number of
sources to be considered decreases, the two perspectives--regional and
single source-specific--merge together. A case in point is the examination
of the impacts of a few sources located in a rural area, where prevention
of significant deterioration (PSD) is an issue.
From the discussion of air pollution law presented earlier, we have
isolated several specific issues, each falling into one of two distinct
generic issue categories. The chief distinction between the two is not
simply the difference between regional and source-specific perspective, for
each individual source has both a regional and a localized downwind impact.
Rather, the clearest distinction lies in the number of sources considered.
Questions of regional NAAQS compliance and maintenance are multi-source
issues. NSR, on the other hand, primarily concerns a single source. Using
such a distinction, the principal issues addressed by air quality planners
are as follows:
Multiple-Source Issues
- SIP/C (State Implementation Plan/Compliance). The
attainment of regional compliance with the NAAQS,
as considered in the SIP.
- AQMP (Air Quality Maintenance Planning). Regional
maintenance of compliance with the NAAQS, as con-
sidered in the SIP.
Single-Source Issues
- PSD (Prevention of Significant Deterioration). Limita-
tion of the amount by which the air quality can be de-
graded in areas currently in attainment of the NAAQS;
this is considered in each SIP.
- NSR (New Source Review). Permit process by which appli-
cants proposing new or modified stationary sources must
demonstrate that both directly and indirectly caused
emissions are within certain limits and that the pollu-
tion control to be employed uses the
appropriate technology; this is considered in each SIP.
- OSR (Offset Rules). Interpretive decision by which all
new or modified stationary sources in urban areas cur-
rently in noncompliance with the NAAQS are judged unac-
ceptable unless the applicant can demonstrate a plan for
reducing emissions in existing sources and that a reduc-
tion in the air quality impact of these emissions has
been achieved to offset those produced by the proposed
new source; this decision has a strong impact on the
stationary source permit process.
- EIS/R (Environmental Impact Statement/Report). A state-
ment of impact required for major projects undertaken
by the federal government or financed by federal funds (EIS),
or a report of project impact required by state or local
statutes (EIR).
- LIT (Litigation). Court suits brought to resolve disagree-
ment over any of the issues mentioned above or to secure
variances waiving federal, state or local requirements.
The above seven issues are classified according to their most fre-
quently encountered form. We note that actual cases do not always conform
to the bounds of the generic issue categories as shown. An EIS, for
instance, can have a regional perspective, as with the Denver Overview EIS
recently completed for Region VIII of the EPA. Also, LIT can occasionally
have effects on regional NAAQS compliance and maintenance. For example, PSD
and AQMP resulted from court suits.
2. The Issues: Some Practical Examples and Their Implications
for Air Pollution Modeling
Many practical examples can be found in which the issues identified
above play an important role in planning. At this point, we will discuss
some of the more important applications in which they are likely to be
encountered. Modeling requirements can thus be identified. This discus-
sion will serve as a prelude to the examination of air pollution models
presented in the next chapter.
First, we consider the nature of multiple-source (M/S) issue appli-
cations. SIP/C and AQMP can focus both on urban areas as well as on large
rural sources. Here we concentrate on the most frequently encountered
applications, those in urban areas. Encountered in such regions are both
reactive pollutants [ozone (O3), hydrocarbons (HC), and nitrogen dioxide
(NO2)] and relatively nonreactive pollutants [carbon monoxide (CO), sulfur
dioxide (SO2),* and total suspended particulates (TSP)]. There are a
variety of different source types: point sources (power plants, refin-
eries, and large industrial plants, such as steel, chemical and manufac-
turing companies), line sources (highway, railroads, shipping lanes, and
airport runways), and area sources (home heating, light industrial users
of volatile chemicals, street sanding, gasoline distribution facilities,
and shipping ports). Mobile sources (cars, trucks, and buses) almost
invariably can be aggregated into highway line sources. While a few
cities with air pollution problems are located in complex terrain (Pitts-
burgh, for example), most are situated in relatively flat or gently rolling
terrain. Geographical features can play an important part in regional air
pollution (for instance, the ocean near Los Angeles, the lake near Chicago,
and the mountains near Denver).
* Sulfur dioxide is slowly reactive: SO2 is converted to sulfate (SO4=) aerosol.
Air pollution modeling in such circumstances has been used for several
principal purposes. It has been useful in estimating the total amount of
emissions cutback required to reach compliance with the NAAQS. Individual
control strategies also have been assessed, both for SIP/C and AQMP. In-
sights from regional modeling have been useful in modifying and improving
pollutant measurement network design. In Denver, for instance, use of the
SAI Urban Airshed Model indicated for a particular model day the presence
of an ozone (O3) peak in a then-unmonitored area. Subsequent location of a
temporary monitoring station at that site led to the observation of O3
readings in excess of any previously measured. Also, models have had an
influence on transportation network design (the balance of freeways, arterials,
and feeders) and modal split (the mix between personal and mass transit).
Through the EIS/R process, individual projects (for example, the Interstate
470 freeway and the construction of wastewater treatment facilities, both
in Denver) have been examined using models to estimate air quality impact.
Second, we consider the nature of stationary single-source (S/S)
issues. Important applications occur in both urban and rural areas. These
focus on the following: (1) SIP/C and the permit approval process for new
or modified stationary sources and (2) the variance process for existing
facilities. As for the first of these, SIP/C and the permit approval pro-
cess, all new or modified major S/Ss, urban and rural, are subject to NSR
and must meet NSPS and use the best available pollution control equipment.
Also, both direct and indirect impact on air quality must be considered.
In urban areas, major S/Ss might include proposed refineries, power
plants, and industrial facilities, as well as shopping, employment, and
recreational/sports centers. With the last of these, indirect effects are
particularly important. Each draws appreciable numbers of automobiles,
adding to local vehicle miles traveled (VMT) and increasing congestion and
thus pollutant emissions. Also, automobile hot soak and some cold start
emissions are concentrated in accompanying parking lots.
Urban S/Ss are dealt with in the SIP/C and permit application process
differently than are rural S/Ss. In urban areas in noncompliance with the NAAQS,
OSR must be considered. The air pollution modeler must be able not only to
represent the regional and localized downwind impact of the new S/S but also
to estimate the subtractive effect of reducing emissions from one or more
existing sources.
Another difference between urban and rural areas has important signif-
icance for the modeler. In rural areas, the relatively nonreactive pollu-
tants (SO2 and TSP) are often of greater interest than are the more reactive
ones. Although the NOx emissions also produced at some point could gener-
ate, with the addition of HC, photochemically reactive pollutants, they are
usually not of primary concern. In urban areas, the reactive pollutants
(O3, NO2, and HC) must also be modeled. When the incremental effect of a
S/S is being considered in an urban area (OSR, as well), this distinction
can have a strong effect on model choice. This is particularly true when an
S/S emits O3 precursors such as NOx, which power plants do, or HC, which
refineries do.
In rural areas, applications centering on energy development have been
prominent in recent years, particularly in the northern and central Great
Plains. The direct air pollution impact of these S/Ss would be produced by
coal extraction (strip mining), conversion to natural gas, transport to
energy production facilities if they are not on site (via unit train or
slurry pipelines), or coal combustion in large power plants. Indirect impact
would result from the construction of the above-mentioned facilities (new
highways, provision for temporary construction crews) and the growth of nearby
"boom" towns (housing for families of workers and the additional population
increase required to provide commercial and public services to workers).
A complicating factor not confronted in nonattainment regions arises in
attainment areas: PSD must be considered. No S/S or combination of them is
permitted to degrade significantly the air quality in nonpolluted rural areas.
In each SIP such areas are identified. The modeler must be able to assess the
likelihood that an S/S will impinge on such areas to an unacceptable degree.
Also, because pollutants from rural sources are either inert or slow in
reacting and because surface deposition, rainout, and washout often proceed
at slow rates (depending on synoptic meteorology), atmospheric residence times
are long for some pollutants such as the derivative products of SO2. Trans-
port distances on the order of a thousand kilometers may not be unusual. The
modeler must be able to account for pollutant transport and transformation
on this temporal and spatial scale, if required.
In both urban and rural areas, the owner of a S/S has the right to seek
a variance temporarily excusing the source from provisions of the law, but
not such as to cause a violation of the NAAQS. A number of reasons could
motivate such a request. For a power plant, petroleum shortages could result
in a need to burn high-sulfur fuel. For a refinery, petroleum storage and
shipping needs might result in a variance request. Other reasons might include
a need for an extension of the time required to comply with SIP control
strategy requirements or for periodic pollution control equipment maintenance
or replacement.
3. The Issues: A Prologue to the Next Chapter
In this chapter, we have examined the body of air pollution law and
identified two generic issue categories: multiple-source issues and single-
source issues. Seven separate (though interrelated) types of issues were
classified within that structure: SIP/C, AQMP, PSD, NSR, OSR, EIS/R, and
LIT.
We have examined some practical examples illustrating particular
features of these issues as they manifest themselves in both urban and rural
areas. We have also discussed some key implications that these issues have
for air pollution modeling. This serves as an important prologue to the
discussion of specific models undertaken in the next chapter. In that
chapter we will match application requirements to model capabilities. The
issues identified here will serve as the framework within which that dis-
cussion is carried out.
IV AIR QUALITY MODELS
In the last chapter, we identified generic types of air quality issues.
In this chapter, we define generic classes of models. Having done so,
we match the two, identifying those issues for which each model may be a
suitable analytical tool. We also describe the technical formulations
and underlying assumptions employed in each generic model class, indicating
some key limitations.
The final choice of a model for use in addressing a particular issue
can be made only by considering the characteristics of the proposed applica-
tion. To facilitate the comparison between model capabilities and applica-
tions requirements, we define a set of applications attributes. We then
match the two, identifying for each generic model the combinations of
application attributes for which it is suited. A related means for match-
ing model to application is described in EPA (1978a).
In this chapter we attempt to specify the relationship between issues,
models, and applications. Having done so, we then develop in Chapter V
model performance measures appropriate to each issue/model combination of
practical interest. This will set the stage for a discussion of requisite
model performance standards in Chapter VI.
In order to preserve generality, our emphasis in this chapter centers
primarily on generic model categories rather than on specific air quality
models. Certain benefits may be achieved thereby: General conclusions
appropriate to an entire class of models may be stated without reference
to any particular model, and extensive discussions of any observed differ-
ences between intended capabilities and technically achieved ones need not
be conducted for each specific model.
Our central purpose in this report is to discuss means for setting
model performance standards. While not central to this, however, we do
recognize a need to associate some specific models with our generic
model categories. To assist in doing so, we examine in Appendix B a number
of air quality models. Though the list is not a complete one, a number of
available models are examined in detail and tabulated according to several
attributes. Among these are the following: level of intended usage
(screening or refined), type of pollutant (reactivity, averaging time),
degree of resolution (spatial and temporal), and certain site specifics
(terrain, geography, as well as source type and geometry).
We summarize at the end of this chapter that part of Appendix B needed
to associate specific models with our generic categories. No attempt is
made in this chapter or in Appendix B to screen models for technical accept-
ability nor is any attempt made to be all-inclusive. Models are classified
according to their intended capabilities rather than their technically achieved
ones. Among the references we have drawn upon in gathering this information
are the following: Argonne (1977), EPA (1978b), and Roth et al. (1976), as
well as several program users' manuals.
A. GENERIC MODEL CATEGORIES
In this chapter air quality models and prediction methods are class-
ified into generic model categories. Here we describe the structure of
the classification scheme employed, the full form of which is shown in
Exhibit IV-1. Though many such schemes have been proposed (Roth et al.,
1976, and Rosen, 1977, for example), we identify three broad divisions:
rollback, isopleth, and physico-chemical. We describe here each of these
categories, mentioning technical formulation, general capabilities, and
major limitations. In doing so, we draw upon material in Roth et al. (1976).
1. Rollback Category
Included in the first of these are all those prediction methods in
which ambient pollutant concentrations are assumed to be directly (though
not necessarily linearly) proportional to emissions, according to some
simple relationship. Emissions control requirements are presumed propor-
tional to the amount by which the peak pollutant concentration exceeds
the NAAQS. Linear rollback and Appendix J are examples of such methods.
I. Rollback
II. Isopleth
III. Physico-Chemical
A. Grid
1. Region Oriented
2. Specific Source Oriented
B. Trajectory
1. Region Oriented
2. Specific Source Oriented
C. Gaussian
1. Long-Term Averaging
2. Short-Term Averaging
D. Box
EXHIBIT IV-1. GENERAL MODEL CATEGORIES
Because atmospheric processes are generally complex and nonlinear,
the fundamental proportionality assumption invoked in rollback methods
is frequently violated in actual application. For this reason, rollback
methods are usually regarded as screening techniques, whose results give
at best only a general indication of the amount of emissions control
required. They are most often used when insufficient data are available
to perform an analysis that is more technically justifiable. Even then,
results obtained with them are appropriate only as a crude indication of
the need for more extensive data gathering and analysis. Because rollback
methods lack spatial resolution, they are most suitable for addressing
regional, multiple-source* issues. Also, their use is more appropriate
for applications involving relatively nonreactive pollutants (SO2, CO, and TSP).
* In this report, "multiple-source" refers to many, well-distributed
sources of all types and sizes. It does not include, for instance,
a single complex having multiple stacks.
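As a minimal sketch of the proportionality assumption behind linear rollback
(the background term and the numbers in the usage line are assumptions added
for illustration), the calculation can be written as follows.

```python
def linear_rollback(peak_conc, standard, background=0.0):
    """Fractional region-wide emissions reduction implied by linear rollback,
    which assumes ambient concentration above background is proportional to
    total emissions."""
    return (peak_conc - standard) / (peak_conc - background)

# e.g., rolling a 0.18 ppm oxidant peak back to a 0.08 ppm level with an
# assumed 0.04 ppm background implies roughly a 71 percent reduction.
reduction = linear_rollback(0.18, 0.08, background=0.04)
```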
2. Isopleth Category
Within the second generic model category are included those methods
relying on isopleth diagrams to relate precursor concentrations of primary
emissions (usually oxides of nitrogen and nonmethane hydrocarbon) to the
level of secondary pollutant (usually ozone) resulting from such a mixture.
As is true with the EPA EKMA method (see EPA, 1977), these diagrams are usually
constructed from computer simulations using theoretically and chamber derived
chemical kinetic mechanisms. They invoke assumptions about a number of
parameters such as regional ventilation and solar insolation, as well as
pollutant entrainment, carryover from the previous day, and transport from
upwind. The accuracy of the postulated chain of chemical reactions is
evaluated using smog-chamber data. The types of information required to con-
struct an isopleth diagram are roughly equivalent to those required to employ
a box model, and we note that the two methods are conceptually similar in
many regards. We maintain a distinction between the two, however, because
of the view prevailing in the user community that they are separate classes
of models. Also, not all box models are photochemical, as are isopleth-
based methods.
Entry into an isopleth diagram requires an estimate of the peak con-
centration actually occurring during the day on some initial base date.
Given an assumption about the relative proportion of precursor species
control (HC versus NOx), the degree of emissions cutback required to achieve
the NAAQS can be estimated directly.
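A hedged sketch of that estimation step is given below; the isopleth surface
is supplied as a stand-in function, and the assumption that predicted ozone
falls monotonically along the chosen control path is ours for illustration
(real isopleth diagrams need not behave this simply, particularly with respect
to NOx control).

```python
def required_cutback(o3_isopleth, hc0, nox0, o3_standard, nox_share=1.0, tol=1e-4):
    """Fractional HC cutback (with NOx cut by nox_share times the HC fraction)
    needed to bring the isopleth-predicted peak ozone down to the standard.
    o3_isopleth(hc, nox) stands in for reading the isopleth diagram."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        f = 0.5 * (lo + hi)
        o3 = o3_isopleth(hc0 * (1.0 - f), nox0 * (1.0 - nox_share * f))
        if o3 > o3_standard:
            lo = f      # not yet enough control
        else:
            hi = f
    return 0.5 * (lo + hi)
```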
Isopleth methods lack spatial resolution. They are thus capable of
addressing only regional, multiple-source issues. By their nature, isopleth
methods are useful only for applications involving photochemically reactive
pollutants. Because of the level of approximation involved in constructing
the isopleth diagram itself, in entering it using measured ambient data,
and in accounting for the effect of transport from upwind, such methods are
more appropriate for use as screening tools. In this capacity, they can
be helpful in assessing the need for further, more refined analysis. How-
ever, in some limited applications where the assumptions invoked in the
formulation of the isopleth methods are generally satisfied, estimates of the
required degree of emissions control obtained using such a method can be
regarded as acceptably accurate.
3. Physico-Chemical Category
The third category contains models based upon physical and chemical
principles as embodied in the atmospheric equations of state. It is divided
into four main subcategories: grid, trajectory, Gaussian, and box. We
discuss here each subcategory.
a. Grid Subcategory
Grid models employ a fixed Cartesian reference system within which
to describe atmospheric dynamics. The region to be modeled is bounded
on the bottom by the ground, on the top usually by the inversion base
(or some other maximum height), and on the sides by the desired east-west
and north-south boundaries. This space is then subdivided into a two- or
three-dimensional array of grid cells. Horizontal dimensions of each cell
measure on the order of several kilometers, while vertical dimensions can
vary, depending on the number of vertical layers and the spatially and
temporally varying inversion base height. Some grid models assume only a
single, well-mixed cell extending from the ground to the inversion base;
others subdivide the modeled region into a number of vertical layers.
Ideally, the coupled atmospheric equations of state, expressing con-
servation of mass, momentum, and energy, would be solved systematically
within each grid cell, with a chemical kinetic mechanism used to describe
the evolution of pollutant species. Several major difficulties arise
in practice. Computing limitations are rapidly encountered. A region
fifty kilometers on a side and subdivided into five vertical layers requires
12,500 separate grid cells if grid cells are one kilometer on a side.
Maintaining a sufficient number of species to allow the functioning of a
chemical kinetic mechanism compounds the storage problem. For a ten-
species mechanism, storage of the concentrations for each species in each
grid cell in our example would alone require 125,000 storage locations.
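The arithmetic behind those figures is simply the following (a worked check,
with the region size, cell size, layer count, and species count taken from the
example above):

```python
# A 50 km x 50 km region, 1 km grid cells, 5 vertical layers, 10 species.
region_km, cell_km, layers, species = 50, 1, 5, 10
cells = (region_km // cell_km) ** 2 * layers    # 12,500 grid cells
storage_locations = cells * species             # 125,000 concentration values
```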
To avoid these and other computing or numerical problems, most grid
models solve only one atmospheric state equation—the conservation of
mass, or continuity, equation, decoupling the other two. The momentum
equation is replaced by meteorological data supplied to the model in the
form of spatially and temporally varying wind fields. The energy equation
is supplanted by externally supplied vertical temperature profile data,
from which inversion heights are also calculated.
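Schematically, and only schematically (no particular model's numerics are
implied; the one-dimensional geometry, upwind differencing, and first-order
loss term are simplifying assumptions), a single explicit step of the species
continuity equation with an externally supplied wind looks like this:

```python
import numpy as np

def advect_step(conc, wind, emissions, dx, dt, k_loss=0.0):
    """One explicit upwind step of dc/dt + u dc/dx = E - k c, with the wind u
    supplied externally (the momentum equation is not solved). Stable only
    when abs(u) * dt / dx <= 1; boundary cells are held fixed here."""
    c = np.asarray(conc, dtype=float)
    new = c.copy()
    for i in range(1, len(c) - 1):
        u = wind[i]
        # first-order upwind difference for the advection term
        dcdx = (c[i] - c[i - 1]) / dx if u >= 0 else (c[i + 1] - c[i]) / dx
        new[i] = c[i] + dt * (-u * dcdx + emissions[i] - k_loss * c[i])
    return new
```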
Other problems are encountered in solving the mass continuity equation,
a principal such problem being the atmospheric viscosity terms. Turbulence,
which is a randomly varying quantity, can be described only in statistical
terms. Species concentrations, as a result, can be found only as values
averaged over some time interval. Also, the continuity equation can be
solved only if turbulence effects are decoupled through a series of approxi-
mations involving turbulence gust eddy sizes and strengths.
Grid models require the specification of time-varying boundary condi-
tions on the outer sides and the top of the modeled region, the initial
conditions (species concentrations) in each grid cell at the start of a
simulation, and spatially and temporally varying emissions for each pri-
mary pollutant species. The first two of these are derived from station
measurement data, and the last is obtained from an appropriate emissions
inventory for the modeled region.
Grid models are capable of considering both reactive and relatively
nonreactive pollutant species. Models considering reactive species,
because of their limited time scale (less than several days), are
appropriate tools only for addressing questions involving pollutants
having short-term standards (O3, CO, HC, and SO2) and for medium-range
pollutant transport (an urban plume, for example). Some grid models
are designed to model large spatial regions (such as the Northern Great
Plains—see Liu and Durran, 1977) and thus can address long-range transport
questions. At their present state of development, these models are appropriate
tools only for examining questions involving relatively nonreactive pollutants
(principally long-term SO2 and TSP).
There are two major classes among grid models: region oriented and
specific source oriented. In the first class, two basic variants exist:
urban scale and regional scale models. The first of these attempts to
model the urban environment, considering emissions from a number of dif-
ferent sources and simulating both reactive and relatively non-reactive
pollutant species over a spatial scale on the order of tens of kilometers
through a temporal scale of 8 to 36 hours. Regional-scale models, on the
other hand, represent an attempt to model long-range pollutant transport
over a spatial scale of hundreds of kilometers through a temporal scale
of several days. Emissions are assumed to come from a few widely dispersed,
usually rural, sources; the pollutants considered are relatively nonreac-
tive (or, more precisely, slowly reactive) ones such as SO2. (Though SO2 is
converted to sulfate, it does so on a time scale much longer than that of
reactions involving the more reactive species.) One such model was developed by SAI for
use in assessing the air quality impact of large-scale energy development in
the Northern Great Plains (Liu and Durran, 1977).
Because of their spatial extent, regional oriented grid models are
appropriate tools for addressing regional (multiple-source) issues, such
as SIP/C and AQMP. Because of their spatial resolution, certain regional
questions about single-source issues can also be addressed. The regional
effect of a new source can be assessed. The subtractive regional effect
of removing an existing source also can be estimated, an essential cap-
ability for addressing OSR questions. However, only grid models specifi-
cally designed to consider a single source have sufficient spatial reso-
lution to assess near-source, or microscale effects.
Specific source oriented models represent the second major class of
grid models. Some specific examples of such models are listed later in
this chapter. Models of this type are particularly useful in two types
of applications: examining the behavior of a plume containing reactive
constituents, and accounting for the effects of complex terrain on a
point source plume. Because of their formulation, these models can con-
sider the effects of plume interaction with ambient reactive pollutants.
This is of interest in addressing single-source issues in urban areas with
significant levels of reactive pollutants. Often urban-scale grid models
are used to predict the ambient conditions with which the plume interacts.
Those models designed for applications in complex terrain can be used
when it is necessary to describe explicitly the wind fields and inversion
characteristics encountered by a dispersing pollutant. Although simpler
models exist, they are often inadequate when applied in situations in
which terrain is particularly complex or when photochemical reactivity
is important.
b. Trajectory Subcategory
Trajectory models employ a reference coordinate system that is allowed
to move with the particular air parcel of interest. A hypothetical column
of air is defined, bounded on the bottom by the ground and on the top by
the inversion base (if one exists), which varies with time. Given a speci-
fied starting point, the column moves under the influence of prevailing
winds. As it does so, it passes over emissions sources, which inject pri-
mary pollutant species into the column. Chemical reactions are simulated
in the column, driven by a photochemical kinetic mechanism. Some trajectory
models allow the column to be partitioned vertically into several layers,
or cells. Emissions in such models undergo vertical mixing upward from
lower cells. Other trajectory models allow only a single layer; in these,
vertical mixing is assumed to be uniform and instantaneous.
The formulation employed by trajectory models to describe atmospheric
dynamics represents an attempt to solve the mass continuity equation
in a moving coordinate system. The remaining state equations—conservation
of momentum and energy—are not solved explicitly. As is done in grid models,
solution of the momentum equation is avoided by specification of a spatially
and temporally varying wind field, while solution of the energy equation
is sidestepped by externally supplying temperature and inversion base height
information.
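A single-layer variant can be sketched in a few lines; the instantaneous
vertical mixing, the constant mixing height, and the first-order chemistry
term are simplifying assumptions adopted here only for illustration.

```python
def trajectory_column(path, mixing_height, emission_flux, dt, c0=0.0, k_chem=0.0):
    """Well-mixed column following a prescribed path of (x, y) cells.
    emission_flux(x, y) returns the area emission rate (mass per area per
    time) injected into the column as it passes over each cell."""
    c = c0
    history = []
    for (x, y) in path:
        c += emission_flux(x, y) * dt / mixing_height   # injection, instant mixing
        c -= k_chem * c * dt                            # simple first-order chemistry
        history.append(c)
    return history
```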
Several basic assumptions are invoked in the formulation of trajectory
models. Since only a single air column is considered, the effects of
neighboring air parcels cannot be included. For this reason, horizontal
diffusion of pollutants into the column along its sides must be neglected.
This may not seriously impair model results so long as sources are suffic-
iently well distributed that emissions can be idealized as uniform, or
nearly so, over the region of interest. However, if the space-time track
of the air column passes near but not over large emissions sources, neglect
of the effect of the horizontally diffusing material from those sources
might cause model results to be deficient. In general, problems occur
whenever there are significant concentration gradients perpendicular to
the trajectory path.
Also, the column is assumed to retain its vertical shape as it is
advected by prevailing winds. This requires that actual winds be ideal-
ized by means of a mean wind velocity assumed constant with height. Because
of the earth's rotation and frictional effects at ground level, winds aloft
usually blow at greater speeds than do surface winds, and in different directions.
This produces an effect known as wind shear, which is neglected in trajec-
tory models. If emissions are evenly distributed in amount and type over
the region of interest and winds are also uniform, this may not represent
a serious deficiency. In such a case, material blown out of the column
by wind shear effects would be replaced by similar material blown into
it, with the net effect on model results expected to be small. However,
if a significant fraction of the emissions inventory is contributed by
large point sources or if wind patterns display significant spatial vari-
ation, neglect of wind shear can seriously impair the reliability of
trajectory model results.
Additionally, many trajectory models assume that the horizontal dim-
ensions of the air column remain constant and unaffected by convergence
and divergence of the wind field. Where winds are relatively uniform,
this may not be of serious consequence. Where winds have significant
spatial variation, as could be the case in even mildly complex terrain,
however, this assumption could lead to deficient results. In the San
Francisco Bay region, for example, wind flow convergence during the day
causes the merging of several air parcels. Peak pollutant concentrations
subsequently occur in this merged "super-parcel." A trajectory model
would be an inadequate tool for addressing problems in such a region.
In general, trajectory models require as inputs much the same types
of data required to exercise a grid model. Emissions are required along
the space-time track of the air column. Wind speed and direction must be
provided to determine its movement. Vertical temperature soundings must
also be input in order to determine the height of the column (the height
of the inversion base). Although these data need be prepared only for
the corridor encompassing the trajectory path, general application of the
model to an entire urban area requires that data be prepared for a signi-
ficant portion of the region.
Two major classes exist among trajectory models: region oriented and
specific source oriented. The first of these classes includes those models
designed to address multiple-source, regional issues, usually in urban
areas. The second class contains so-called reactive plume models. For
reasons noted above, the use of trajectory models is appropriate on an
urban scale only in certain circumstances. Careful screening is required
of the emission and meteorological characteristics in a proposed appli-
cation region to insure the appropriateness of trajectory model usage.
The second class of trajectory models includes those designed to
evaluate the air quality impact downwind of a specific source. Because
of the underlying equation formulation, these models are more appropriate
for use in areas having relatively simple terrain. However, because
they are capable of simulating photochemical reaction, they can be used
in addressing issues involving reactive pollutants. Often, region ori-
ented models are used to generate the ambient conditions with which the
reactive plume downwind of the source must interact. For all trajectory
models considering reactive pollutants, the time scales remain short (less
than several days). Consequently, they are inappropriate for consideration
of problems involving pollutants subject to long-term standards.
c. Gaussian Subcategory
In the formulation of Gaussian models, the atmosphere is assumed to
consist of many diffusing pollutant "puffs," all moving on individual
trajectories determined by prevailing winds. The concentration at any
point is assumed due to the superimposed effect of all puffs passing over
the point at the time of observation. Rather than keeping track of the
path of each puff, their motion (both advection and diffusion) is described
in terms of conditional state transition probabilities. Given an initial
location at a particular time, this state transition probability describes
the likelihood that the puff will arrive at another specified point a
given time interval later. With an entire field specified at some refer-
ence time, the net expected effect at a particular point and time is calcu-
lated by determining the integral sum of the separate expected effects of
each puff in the field.
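Formally, the superposition just described can be written as an integral over
all puff release points and times (a schematic statement in our notation rather
than a quotation from any particular model):

    c(\mathbf{x},t) \;=\; \int_{-\infty}^{t}\!\int_{V} p(\mathbf{x},t \mid \mathbf{x}',t')\, S(\mathbf{x}',t')\, d\mathbf{x}'\, dt'

where S(x',t') is the rate at which pollutant mass is released at location x'
and time t', and p is the state transition probability density that material
released there is found at x at time t.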
Central to this type of formulation is a knowledge of the time-varying
state transition probabilities for the entire concentration field. In
practice, turbulence nonuniformities and terrain-specific effects combine
to render it unlikely that such probabilities can be determined. To over-
come this difficulty, traditional Gaussian models (among others, those
recommended by the EPA) invoke several assumptions. First, the turbulence
field is assumed to be stationary and homogeneous, which implies it has
two important qualities: first, the statistics of the state transition
probabilities can be assumed dependent only on spatial displacement, thus
removing their time-dependency; and second, the probabilities are not
dependent on puff location in the field, thus removing spatial variability.
These are satisfactory approximations so long as significant differences
do not exist between turbulence characteristics of the atmosphere in dif-
ferent portions of the region to be modeled. For applications in complex
terrain, for instance, such an assumption might not be justified.
Once turbulence field stationarity and homogeneity have been assumed,
it still remains to specify the functional form of the state transition
probability. Gaussian models derive their name from their assumption that
this probability function is Gaussian in form. Given this assumption, the
concentration field can be determined analytically by evaluating the integral
expressing the summation of separate effects from all pollutant puffs affect-
ing the region of interest. In order to isolate the effect of an individual
source, only puffs containing pollutants emitted from that source are
considered.
Concentrations about the plume centerline are assumed to be distri-
buted according to a Gaussian relationship, whose vertical and horizontal
cross-sectional shape is a function of downwind distance from the source
and atmospheric stability class. Analytic forms can be determined express-
ing the form of the downwind concentration field for several different
types of emissions regimes: instantaneous "puff," continuous point source
emission (steady-state), continuous emissions from an area source, and
continuous emissions along a line source.
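For the continuous point source in steady state, one common statement of the
resulting relationship is the Gaussian plume equation (written here in our
notation, with a single ground-reflection term; further reflection terms are
added when an inversion lid is treated, as discussed below):

    C(x,y,z) \;=\; \frac{Q}{2\pi u \sigma_y \sigma_z}\,
    \exp\!\left(-\frac{y^2}{2\sigma_y^2}\right)
    \left[\exp\!\left(-\frac{(z-H)^2}{2\sigma_z^2}\right)
    + \exp\!\left(-\frac{(z+H)^2}{2\sigma_z^2}\right)\right]

where Q is the emission rate, u the mean wind speed at release height, H the
effective release height (stack height plus plume rise), and \sigma_y(x),
\sigma_z(x) the horizontal and vertical dispersion coefficients described in
the next paragraph.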
Several other assumptions are invoked in Gaussian steady-state models.
The vertical and horizontal spread of the plume is assumed characterized
by dispersion coefficients, whose values are dependent on the distance
downwind of the source. They are assumed to be functions of atmospheric
stability and are thus characterized by stability class. Specific values
are obtained from standard workbooks, such as that developed by Turner, or
from evaluation of data measured downwind of actual sources.
In many models, plume interaction with the ground and the inversion is
considered. Usually, perfect or near-perfect reflection is assumed to
occur. Multiple reflections are often modeled, although some models assume
that beyond a certain downwind distance mixing is uniform between the ground
and the inversion base.
Consideration of plume rise is made in Gaussian point source models.
Depending upon ambient atmospheric conditions, such as temperature and humi-
dity, hot gases from an emitting stack may rise, sink or remain at the same
height. Simplifying thermodynamic equilibrium relationships, such as that
developed by Briggs, are often used to estimate the magnitude of plume rise.
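As an illustration, one widely quoted Briggs relationship for the transitional
rise of a buoyant plume is (our rendering, not necessarily the form adopted by
any specific model):

    \Delta h \;\approx\; \frac{1.6\, F^{1/3} x^{2/3}}{u},
    \qquad F \;=\; g\, w_s r_s^2\, \frac{T_s - T_a}{T_s}

where F is the buoyancy flux, w_s and r_s are the stack exit velocity and
radius, T_s and T_a the stack gas and ambient temperatures, u the wind speed,
and x the downwind distance.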
Two major classes of Gaussian models exist: long-term averaging and
short-term averaging. Though both invoke the basic Gaussian assumptions,
major differences exist in formulation. Long-term models divide the region
surrounding each source into azimuthal sectors. The long-term variation
of the wind at the source must then be specified by wind speed and direction
(by sector) classes, along with the frequency of occurrence for each combin-
ation. This information usually is conveyed in the form of a "wind rose."
Data describing the frequency of occurrence of the various atmospheric
stability categories must also be specified. The probability of occurrence of
each stability category/wind vector (speed and direction) combination is then
used to weight the downwind concentrations resulting from it. The weighted
sum represents the expected value of the long-term averaged pollutant con-
centration. Models employing this so-called "climatological" formulation
are appropriate tools for addressing problems involving pollutants for
which long-term (annual) standards are specified (SO2, TSP, and NO2).
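In symbols (our notation), the climatological estimate at a receptor r is
simply the frequency-weighted sum

    \bar{C}(\mathbf{r}) \;=\; \sum_{k} f_k\, \chi(\mathbf{r};\, u_k, s_k, \theta_k)

where the index k runs over all wind-speed class/stability class/direction
sector combinations, f_k is the joint frequency of combination k taken from the
wind rose and stability data, and \chi is the sector-averaged Gaussian
concentration computed for that combination.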
The second class of Gaussian models includes those designed for short-
term analysis. Prevailing wind direction and speed, as well as emissions
characteristics, are assumed to persist long enough that steady-state con-
ditions are established. The downwind concentration field resulting from
source emissions can then be evaluated analytically. Some models allow a
limited form of temporal variability by dividing the modeling day into
segments (perhaps one hour long), during each of which conditions are assumed
to be in steady state. Source strengths and prevailing wind speed at the
height of emissions release are required for each segment, as are sufficient
vertical temperature profile data to calculate inversion base height, if
one exists, and atmospheric stability class. The last of these is required
in order to determine vertical and horizontal dispersion coefficients.
Because wind data frequently are not available at the height of emission
release, surface wind measurements are extrapolated. Wind speed is assumed
to vary vertically according to a power law, the exponent of which is given
as a function of stability class. Determination of stability class is made
by one of several appropriate methods, each of which is also dependent on
surface observations.
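A minimal sketch of that extrapolation step follows; the exponent values are
illustrative placeholders (published tables differ by reference and by urban
versus rural setting), and the function name is ours.

    # Power-law extrapolation of a surface wind measurement to the
    # emission release height.  Exponents are keyed to Pasquill
    # stability classes A-F and are illustrative only.
    STABILITY_EXPONENT = {"A": 0.10, "B": 0.15, "C": 0.20,
                          "D": 0.25, "E": 0.30, "F": 0.30}

    def wind_at_height(u_ref, z, z_ref=10.0, stability="D"):
        """Wind speed at height z from a reference measurement u_ref at z_ref."""
        return u_ref * (z / z_ref) ** STABILITY_EXPONENT[stability]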
Both Gaussian classes contain models that can be used to estimate the
impact of single or multiple sources. Some models are designed to consider
only a single point source; others can model many different sources simul-
taneously. Consequently, the first group of these is appropriate only for
addressing single-source issues; the second group can be used to consider
multiple-source issues as well. Most models in this second group, though
able to account for many sources, can also simulate as few as one. They
can thus be used to consider both single and multiple source issues.
Full consideration of regional-scale issues (SIP/C and AQMP) requires
of a model the ability to simulate all types of sources: point, area, and
line. Not all multiple-source Gaussian models are capable of doing so.
Some are used to consider only point and area sources; others are used to
consider line sources only. These latter are usually intended for use in
addressing traffic related questions; they might be used, for instance,
to estimate the impact of emissions from a full highway network on regional
CO distribution and level. Consequent to the above, consideration of all
source types in a region may require the joint use of more than one model--
one considering point and area sources and another simulating line sources.
An important restriction exists on the type of pollutant species
that can be simulated using Gaussian models. Because the formulation
cannot accommodate explicit kinetic mechanisms, only relatively nonreactive
pollutants can be modeled (CO, TSP, and SO2).* However, some models incor-
porate first-order, exponential decay to account for pollutant removal
processes and limited species chemical conversion. Multiple-source Gaussian
models assume that the combined effect of many emitters can be calculated
by linearly superimposing the effects from each individual source. Such
an assumption would be an erroneous one if questions involving reactive
species were being considered.
Some Gaussian models have been designed to simulate the effects of
point source emissions in complex terrain. Various assumptions are made
about the behavior of the plume and the variation in height of the inver-
sion base as an obstacle is approached. Usually the plume is allowed to
impinge on the obstacle without any sophisticated means to account for flow
alteration, although some models allow for flow convergence and divergence
in the wind field. Also, the base of the inversion is sometimes assumed to
be at constant height above the source; in other models it is assumed to be
a fixed distance above the terrain, thus varying with it. However, the
Gaussian formulation depends on the assumption of turbulence field station-
arity and homogeneity. This is a simplification that may not be justified
in many applications in complex terrain.
* Long-term Gaussian models are also used to model annual NO2, a reactive
species, for which no short-term standard currently is set. This
usually is accomplished by combining NO and NO2 as NOx, the "species"
modeled. NOx exhibits less variability during the day than NO2 taken
separately.
d. Box Subcategory
Box models are the simplest of the physico-chemical models. The region
to be modeled is treated as a single cell or box, bounded by the ground
on the bottom, the inversion base on the top, and the east-west and north-
south boundaries on the sides. The box may enclose an area on the order
of several hundred square kilometers. Primary pollutants are emitted into
the box by the various sources located within the modeled region, under-
going uniform and instantaneous mixing. Concentrations of secondary pol-
lutants are calculated through the use of a chemical kinetic mechanism.
The ventilation characteristics of the modeled region are represented,
though only grossly, by specification of a characteristic wind speed.
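A minimal single-cell sketch of this formulation is given below (our notation;
the chemical kinetic terms for secondary pollutants are omitted, and the
characteristic ventilation length L is an assumed input):

    # One explicit time step of a single-box model: emissions are mixed
    # uniformly and instantaneously into a cell of area A and mixing
    # height H, which is flushed by a characteristic wind speed u acting
    # over a length L.
    def box_model_step(c, dt, Q, A, H, u, L, c_background):
        """Advance the box-average concentration c (mass/volume) by dt."""
        emission = Q / (A * H)                      # dilution of emissions Q into the box
        ventilation = (u / L) * (c_background - c)  # exchange with upwind (background) air
        return c + dt * (emission + ventilation)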
Because of their formulation, box models can predict, at best, only
the temporal variation of the average regional concentration for each
pollutant species. Consequently, they are capable of addressing only multi-
ple source, regional issues. Furthermore, such models are useful only in
regions having relatively uniform emissions. In those areas where point
sources contribute significantly to the emissions inventory (in number and
amount), the assumption of emissions uniformity may be an unsatisfactory one.
Box models require only limited data. Emissions can be specified on
a regional basis, eliminating any need for determining their spatial
variation. Only simple meteorological data need be supplied as input. For
these reasons, box models can be used when little information is available.
They are more appropriately used as screening tools, helping to identify
those situations requiring more extensive data collection and modeling
analysis.
B. GENERIC ISSUE/MODEL COMBINATIONS
The discussion in the previous section outlined the characteristics
of generic classes of air quality models. In this section we associate
generic model type with generic issue category. In so doing, we indicate
the gross suitability of a generic model type as a tool in addressing a
particular issue. As noted earlier, each generic model (GM) has associated
with it a set of limitations on its use. In Section C we summarize the
effects of these limitations. We first classify types of actual applications
according to several key attributes and then indicate those which each GM
is capable of considering. The result is an enumeration of possible model/
application combinations.
In order to match model to issue, we present in Table IV-1 a matrix of
model/issue combinations. For each GM, an indication is provided of its
usefulness in addressing each of the seven generic issues identified in
the previous chapter. Even where a GM is indicated as suitable, however,
its inherent limitations (some of which are noted in the table) may prevent
its use in certain applications. Consequently, further examination is
required in order to make a final GM selection.
Summarizing the basic features of Table IV-1, we note the following:
> Grid Models
- Region Oriented Models. Urban scale models are able to
address multiple-source issues (SIP/C, AQMP) involving
both reactive and nonreactive pollutants. Their short-
term temporal scale (< 36 hours), however, restricts
them to problems involving pollutants with short-term
standards (O3, HC, CO, and secondary SO2). Their spatial
resolution (on the order of tens of kilometers) allows
them to address some single-source issues (OSR, EIS/R, LIT).
Regional scale models, as opposed to urban scale ones, are
more oriented towards application in rural areas (few sources)
involving nonreactive (or, rather, slowly reactive) pollu-
tants, such as SO2, TSP, CO, and NO2 (which is slowly reactive
in nonurban areas because of limited ambient HC).
short-term temporal scale (on the order of a week or less),
often a practical restriction due to computing requirements,
limits their use in predicting long-term pollutant concen-
trations (SO2, TSP, NO2). They are suited for addressing
questions involving single-source issues (PSD, NSR, EIS/R,
LIT) in isolated rural areas.
TABLE IV-1. AIR QUALITY ISSUES COMMONLY ADDRESSED
BY GENERIC MODEL TYPE

Generic Model Type                    Issue Categories Addressed

Refined Usage
1. Grid(1)
   a. Region Oriented                 SIP/C, AQMP, OSR(2), EIS/R(2), LIT
   b. Specific Source Oriented        PSD, NSR, OSR, EIS/R, LIT
2. Trajectory(1)
   a. Region Oriented                 SIP/C, AQMP, OSR(2), EIS/R, LIT
   b. Specific Source Oriented        PSD, NSR, OSR, EIS/R, LIT
3. Gaussian(3)
   a. Short-Term Averaging
      1) Multiple Source              SIP/C, AQMP, PSD, NSR, OSR, EIS/R, LIT
      2) Single Source                PSD, NSR, OSR, EIS/R, LIT
   b. Long-Term Averaging(4)          SIP/C, AQMP, PSD, OSR, EIS/R, LIT

Refined/Screening Usage
4. Isopleth(1,5)                      SIP/C, AQMP

Screening Usage
5. Rollback                           SIP/C, AQMP
6. Box                                SIP/C, AQMP

Notes:
1. Only short-term time scales can be considered (less than several days).
2. Regional impact of new sources can be assessed but not near-source, or microscale, effects.
3. Only nonreactive pollutants can be considered.
4. Only pollutants having long-term standards can be considered (SO2, TSP, and NO2).
5. Only photochemically active pollutants can be considered.
- Specific Source Oriented Models. These models are used
primarily for addressing single-source issues (PSD, NSR,
OSR, EIS/R, LIT). This class contains the so-called
reactive plume models. Their ability to consider reactive
pollutants makes them suitable for urban applications or
rural applications where plume reactivity is important.
However, because OSR (a primarily urban issue) requires an
estimate of the subtractive effect of removing an existing
source, only questions involving pollutants for which linear
superposition is approximately valid, i.e., nonreactive
pollutants, can be addressed in an urban area with a specific-
source model. These models are also suitable for use in
applications where terrain complexity is important.
Trajectory Models
- Region Oriented Models. With some important restrictions,
these models can be suitable for use in addressing multi-
ple-source issues (SIP/C and AQMP) and, in limited circum-
stances, some single-source issues (OSR, EIS/R, LIT). Among
the most important of such restrictions are the following:
Emissions must be approximately uniform over the modeling
region; air flow cannot be complex enough to cause merging
of air parcels, i.e., flow convergence or divergence should
not be important; and horizontal diffusion effects should
not have significant nonuniformities, e.g., large point
sources near but not within the space-time track of the
advected air parcel being modeled. Because chemical kinetic
mechanisms can be included in their formulation, these models
are capable of considering reactive as well as nonreactive
species. Their temporal scale is so short, however, that
no estimates of long-term concentration averages can be
computed.
- Specific Source Oriented Models. Subject to the same restric-
tions mentioned above, these models can be appropriate tools
for use in considering single-source issues (PSD, NSR, OSR,
EIS/R, LIT). Because they can consider reactive pollutant
species, they can be used in applications involving reactive
plumes. Limited terrain complexity can also be simulated,
so long as the abovementioned restrictions are not violated.
Gaussian Models
Long-Term Averaging Models. These models can be used to
address both multiple-source issues (SIP/C, AQMP) and some
single-source issues (PSD, OSR, EIS/R, LIT). Because of
the Gaussian formulation they cannot consider chemistry or
surface removal effects beyond first order, i.e., exponential
decay. Thus, they are appropriate tools only for addressing
questions involving nonreactive (slowly reactive) pollutants.
Their temporal scale is such that only pollutants having
long-term (annual) standards can be considered (SO2 primary
standard, TSP, NO2, where NO2 is taken as NO + NO2, i.e.,
NOx). As currently configured, these models are appropriate
for use in both urban and rural settings, although
the terrain in such applications should be relatively
simple.
- Short-Term Averaging Models. Two variants exist among
these models: multiple-source and single-source. The
types of issues they may be used to address divide
similarly. Some multiple-source models, however, do
not consider all types of sources: Some consider only
point and area sources; others consider only line
sources. The latter group is useful for examining the
effects of traffic-related pollutants (particularly CO)
resulting from highway network emissions. Consequently,
if regional questions are to be addressed, the concur-
rent use of more than one model may be required. Only
relatively nonreactive pollutants may be examined
using this type of model. Because of their short-term
temporal scale, these models are best suited for
addressing questions involving pollutants having short-
term standards (CO, S02 secondary standard).
Rollback Models
Because rollback models lack spatial resolution, they
are appropriate only for considering questions involving
multiple-source issues (SIP/C, AQMP). Their use is
generally confined to urban areas located in simple
terrain. Their assumption that emissions are directly
proportional to peak pollutant values is a technically
limiting one. Consequently, they should be viewed as
screening tools to evaluate the need for more extensive
analysis and data gathering.
Isopleth Models
Lacking spatial resolution, isopleth models are appro-
priate only for use in addressing multiple-source
issues (SIP/C, AQMP). Employing ozone isopleth dia-
grams derived through the use of a photochemical
kinetic mechanism, these models are designed to examine
questions involving reactive pollutants (O3, HC, short-
term NO2). Their use is most appropriate for applications in
urban areas located in simple terrain. Because the isopleth
diagram is constructed using regional ventilation, emissions,
and background/transport assumptions, it is similar to
the box models, which are described below. Like the
box model, its technical limitations, except under
exceptional circumstances, render it more useful and
reliable as a screening tool to evaluate the need for
more extensive analysis.
Box Models
Because they lack spatial resolution, box models are
appropriate only for use in considering multiple-source
issues (SIP/C, AQMP). They assume spatially uniform
emissions. For this reason, their use is more suited
to areas that are urban or semi-urban. They are best
used in modeling areas located in simple terrain but have
also been used in applications in complex terrain. An
example of the latter type of application might be the
modeling of a mountain valley containing several ski
resorts and related developments. Technical limitations
render the box models more suitable as screening tools.
C. MODEL/APPLICATION COMBINATIONS
In the previous section we discussed the relationship between generic
models and generic issues. In this section we associate those generic
models and the specific applications in which they may be used. We first
classify applications by means of several key attributes. We then com-
pare the possible values of these with model capabilities. For each generic
model type, we are thereby able to identify the range of applications for
which the model is suited.
Applications are characterized here by five attributes: number of
sources, area type, pollutant, terrain complexity, and required resolution.
In Table IV-2 we list the possible designations these attributes may assume.
Against these we match generic model capabilities, identifying the list of
designations for which each is suitable. A chart of the resulting model/
application combinations is presented in Table IV-3. While exceptions may
occur, the list of attribute designations shown is chosen based upon con-
siderations presented earlier in this chapter.
D. SOME SPECIFIC AIR QUALITY MODELS
Our central purpose in this report is to discuss means for setting
suitable standards for model performance. As prologue to this, both
air quality issues and the models used to address them needed to be
examined. We have done so in general terms to this point. Throughout
this discussion we have referred to air quality models only in generic
terms. By doing so, several advantages were achieved: General conclu-
sions appropriate to an entire class of models could be stated without
reference to any specific model, and extensive discussions of any observed
differences between intended capabilities and technically achieved ones
were not necessary for each particular model.
TABLE IV-2. POSSIBLE DESIGNATIONS OF APPLICATION ATTRIBUTES

Attribute                 Possible Designations

Number of Sources         Multiple-Source
                          Single-Source

Area Type                 Urban
                          Rural

Pollutant                 Ozone (O3)
                          Hydrocarbon (HC)
                          Nitrogen Dioxide (NO2)
                          Sulfur Dioxide (SO2)
                          Carbon Monoxide (CO)
                          Total Suspended Particulates (TSP)

Terrain Complexity        Simple
                          Complex

Required Resolution       Temporal
                          Spatial
TABLE IV-3. MODEL/APPLICATION COMBINATIONS

REFINED USAGE

Grid
  a. Region Oriented
     Number of Sources:    Multiple-Source
     Area Type:            Urban, Rural
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal, Spatial
  b. Specific Source Oriented
     Number of Sources:    Single-Source
     Area Type:            Rural
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal

Trajectory
  a. Region Oriented
     Number of Sources:    Multiple-Source
     Area Type:            Urban
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP
     Terrain Complexity:   Simple
     Required Resolution:  Temporal; Spatial (limited)
  b. Specific Source Oriented
     Number of Sources:    Single-Source
     Area Type:            Urban, Rural
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal; Spatial (limited)

Gaussian
  a. Long-Term Averaging
     Number of Sources:    Multiple-Source, Single-Source
     Area Type:            Urban, Rural
     Pollutant:            SO2 (annual), TSP, NO2 (annual)*
     Terrain Complexity:   Simple
     Required Resolution:  Spatial
  b. Short-Term Averaging
     Number of Sources:    Multiple-Source, Single-Source
     Area Type:            Urban, Rural
     Pollutant:            SO2 (3- and 24-hour), CO, TSP, NO2 (1-hour)*
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal, Spatial

REFINED/SCREENING USAGE

Isopleth
     Number of Sources:    Multiple-Source
     Area Type:            Urban
     Pollutant:            O3, HC, NO2 (1-hour)
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal (limited)

SCREENING USAGE

Rollback
     Number of Sources:    Multiple-Source, Single-Source
     Area Type:            Urban, Rural
     Pollutant:            O3, HC, NO2, SO2, CO, TSP
     Terrain Complexity:   Simple; Complex (limited)

Box
     Number of Sources:    Multiple-Source
     Area Type:            Urban
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour)
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal

* Only if NO2 is taken to be total NOx.
Having made our general points in previous sections, however, we
associate here some specific models with our generic model categories.
Though this is not central to our discussion of model performance
standards, it may be helpful in linking specific models to the issues
and applications for which they are most suited.
In Table IV-4 we associate a number of specific models with the generic
model types identified earlier. We included many of the models with
which we were familiar. Because the list is intended only to be a
representative one, we did not seek to make it fully complete. Many
other models, particularly Gaussian ones, certainly exist and would
be appropriate for use in the proper circumstances.
For the models listed in Table IV-4, a detailed summary of their
characteristics is provided in Appendix B. Among the information
contained there is the following: model developer, EPA recommendation
status, technical description, and model capabilities. The last of these
is further subdivided into source type/number, pollutant type, terrain
complexity, and spatial/temporal resolution.
E. AIR QUALITY MODELS: A SUMMARY
In Chapter III we identified generic classes of air quality issues.
In this chapter we defined generic types of models. Having done so, we
associated the two, identifying those issues for which each model was a
potentially suitable analysis tool. We also described the technical formula-
tions employed in each generic type of model, indicating some key limitations.
As noted in Table IV-1, several generic model types may be of potential
use in addressing the same generic class of issue. Only by considering the
characteristics of a proposed application can a final choice of model be
TABLE IV-4. SOME AIR QUALITY MODELS
Generic Model Type
Refined Usage
Grid
a. Region Oriented
b. Specific Source Oriented
Trajectory
a. Region Oriented
b. Specific Source Oriented
Gaussian
a. Long-term Averaging
b. Short-term Averaging
Refined/Screening Usage
Isopleth
Screening Usage
Rollback
Box
Specific Model Name
SAI
LIRAQ
PICK
EGAMA
DEPICT
DIFKIN
REM
ARTSIM
RPM
LAPS
AQDM
COM
CDMQC
TCM
ERTAQ*
CRSTER*
VALLEY*
TAPAS*
APRAC-1A
CRSTER*
HANNA-GIFFORD
HIWAY
PTMTP
PTDIS
PTMAX
W
VALLEY*
TEM
TAPAS*
AQSTM
CALINE-2
ERTAQ*
EKMA
WHITTEN
LINEAR ROLLBACK
MODIFIED ROLLBACK
APPENDIX J
ATDL
* These models can be used for both long-term and short-term
averaging.
made. To facilitate the comparison between model capabilities and appli-
cation requirements, we defined a set of application attributes. We then
matched the two, identifying for each generic model type the combinations of
application attributes for which it was suited.
In this chapter we defined the interface between issue, model, and
application. In addition, we mentioned some specific air quality models
within each model category, giving additional detail on each in Appendix B.
With the completion of this chapter, we are ready to consider model
performance measures. In the next chapter, we identify performance measures
appropriate for the consideration of each air quality issue. Having done
so, we examine the interface of performance measure and model category.
Finally, in Chapter VI, we discuss several alternative rationales and
formats for setting model performance standards. These are designed to be
consistent with the performance measures defined in Chapter V.
V MODEL PERFORMANCE MEASURES
The central purpose of this report is to identify means for
setting standards for air quality model performance. As prologue to
doing so, we identified generic types of air quality issues in Chapter III
and generic classes of air quality models in Chapter IV, exploring their
interrelationships. Now it remains to discuss the model performance mea-
sures for which performance standards must be set. Several rationales for
setting these standards are presented in Chapter VI.
In this chapter our discussion proceeds as follows: We first
identify generic types of performance measures; we then suggest some
specific performance measures (describing them in detail in Appendix
C); and finally we match generic performance measures to the issue/
model/application combinations presented in earlier chapters. Before
beginning, however, the notion of a model "performance measure" needs
to be defined in more detail.
Typically, air quality models are used in the following context:
a problem is posed, a model is chosen that is suitable for use in
addressing the issue/application, existing data are assembled for in-
put and additional data are gathered (if needed), and a simulation is
conducted. Results often are expressed in the form of spatially and
temporally varying concentration predictions for one or many pollutant
species. Since most problems are hypothetical ones posing "what-if"
questions (e.g., what if a new power plant is built, or what if
population growth and development proceeds as forecast), model results
in such situations are inherently nonverifiable. Consequently, before
its results can be accepted, the reliability of the chosen model must be
demonstrated. Most frequently, "validation" is accomplished by using
the model to simulate pollutant concentrations in a test situation
which is similar to the hypothetical one and for which measurement
data are available. A region-oriented model (urban or regional scale)
may be required to predict region-wide concentrations resulting from
conditions existing on some past date. A specific-source model may
have to reproduce the downwind concentrations resulting from emissions
from an existing source having size and siting characteristics similar
to the proposed one. If its predictions are judged to be in sufficient
agreement with observed data, the model is then accepted as a satis-
factory tool for use in addressing the hypothetical problem.
However, what do we mean by "satisfactory" agreement between predic-
tion and observation? What are the quantities most appropriate for use
in characterizing differences between the two? Within what range of
values must these quantities remain? The values for how many different
quantities must be "satisfactory" before we judge model predictions to
be acceptably near test case observations?
In this chapter, we explore the second of these questions. In doing
so, we identify a set of model performance measures, surrogate quantities
whose values serve to characterize the comparison between prediction and
observation. We match these performance measures with the generic types of
air quality issues identified in Chapter III and the generic classes of air
quality models listed in Chapter IV. We defer until Chapter VI the next and
final step: the specification of model performance standards against which
to compare for acceptability the values of the model performance measures.
A. THE COMPARISON OF PREDICTION WITH OBSERVATION
Before accepting a model for use in addressing hypothetical air
quality questions, the user must validate it. This is often done by
demonstrating its ability to reproduce a set of test results, usually
consisting of observational concentration data recorded at a number of
measurement stations for several hours during the day. In comparing
predictions with observation, several questions should be asked. Among
these are the following:
> What are the differences? How much does prediction
differ from observation at the location of the peak
concentration level and at each of the monitoring sta-
tions? What is the spatial and temporal distribution
of the residuals (the difference between prediction and
observation)? Do these differences correlate with diur-
nal changes in atmospheric characteristics (mixing
height, wind speed, or solar irradiation, for instance)?
If more than one species is being considered, are there
differences in performance between each species?
> How serious are the differences? Are peak concentration
levels widely different? Are the estimates of the area
in violation of the NAAQS in substantial disagreement?
How near to agreement are the estimates of the area ex-
posed to concentrations within 10 percent of the peak
value? Are differences in the timing and spatial dis-
tribution of concentrations such that the expected
health impacts on the population (exposure/dosage) are
of different magnitude? Do the predicted and observed
patterns and levels of concentrations lead to seriously
different conclusions about the required amount and cost
of emissions control? Are policy decisions deriving
from prediction and observation different (such as a
"build-no build" decision on a power plant based on PSD
considerations)?
> Are there straightforward reasons for the differences? Are
the locations and timing of the concentration peaks slightly
different between prediction and observation? (If con-
centration gradients within the pollutant cloud are
steep, even a slight difference in cloud location can
produce large discrepancies at set monitoring sites.
Such a problem could occur if there were only slight
errors in the wind speed or direction input to the model.
In such an instance, model performance might otherwise
be perfectly adequate.) Are wide fluctuations in ground-
level concentrations and thus station measurements produced
by relatively small discrepancies between the modeled and
the actual atmospheric characteristics? [This "multiplier
effect" can occur downwind of an elevated point source,
for example. Because the emissions plume from a point
source has dimensions much greater downwind than crosswind,
slight changes in the atmospheric profile (stability
category), having an effect on plume rise and dispersion,
have a more than proportionate effect both on the downwind
distance, at which the ground-level peak concentration
occurs and on the amount of area exposed to a given con-
centration level.]
In the remainder of this chapter we discuss, first in generic
terms and then in specific ones, several different types of model
performance measures. While each type and variant is designed to high-
light different aspects of the comparison between prediction and obser-
vation, they all address the general questions noted above. Those
questions, and others like them, are the fundamental ones from which
the notion of performance measures and standards derive.
B. GENERIC PERFORMANCE MEASURE CATEGORIES
In this section, we define several generic model performance
measure categories, distinguishing among them on the basis of their
general characteristics and the amount of information required to
compute them. We also note three variants found among measures in each
category. We then introduce some practical considerations which can
limit the choice of performance measure. In Section C we list some of
the specific measures included in the generic categories, beginning
with a discussion of the fundamental differences between those designed
to measure performance on a regional scale and those characterizing it
on a specific-source scale. Details of these specific measures are
provided in Appendix C.
1. The Generic Measures
We consider here four generic performance measure categories:
peak, station, area, and exposure/dosage. The first category contains
those measures related to the differences between the predicted and
observed concentration peak, its level, location and timing. The second
category includes measures based upon concentration differences between
prediction and observation at specific measurement stations. Within the
third category are contained those measures based upon concentration
field differences throughout a specified area. The fourth category in-
cludes measures derived from differences in population exposure and
dosage within a specified area.
Each of these generic performance measure categories requires
successively greater knowledge of the spatial and temporal distribution
of concentrations. We show in Figure V-l a schematic representation
illustrating several distinct levels of knowledge about regional con-
centrations. A similar schematic appropriate for source-specific
situations is shown in Figure V-2. Listed in Table V-l are the infor-
mation requirements for the four categories. These range from an
estimate of a simple scalar quantity, concentration at the peak, all
the way to full knowledge of the spatially and temporally resolved
concentration field and population distribution. For peak measures,
the concentration residuals (the difference between predicted and
observed values) are required at a single point and time. For station
measures, the temporal variations of the residuals are required at
several points. For both area and exposure/dosage measures, the full
residual field is required, both spatially and temporally resolved.
The latter type of measure requires, in addition, the spatial and
temporal history of population movement within the area of interest.
As the information content increases, the ability of the performance
measure to characterize the comparison between prediction and observation
also can increase. However, measures from different categories tend to
emphasize different aspects of the comparison.
FIGURE V-1. VARIOUS LEVELS OF KNOWLEDGE ABOUT REGIONAL CONCENTRATIONS
(Schematic labels: concentration peak; measurement station concentrations;
concentration field C(x,y,t); boundaries of modeling region.)
FIGURE V-2. VARIOUS LEVELS OF KNOWLEDGE ABOUT SPECIFIC-SOURCE CONCENTRATIONS
(Schematic labels: point source; prevailing wind; measurement station
concentrations Ci(xi,yi,t); ground-level concentration peak; concentration
field C(x,y,t).)
TABLE V-1. GENERIC PERFORMANCE MEASURE
INFORMATION REQUIREMENTS

Generic
Performance
Measure Type        Information Required

Peak                Predicted and measured concentration peak (level,
                    location, and time), i.e.,
                    Cmax(x,y,t) | pred., meas.

Station             Predicted and measured concentrations at specific
                    stations (temporal history), i.e.,
                    Ci(xi,yi,t) | pred., meas., for all stations i

Area                Predicted and measured concentration field within
                    a specified area (spatial and temporal history),
                    i.e.,
                    C(x,y,t) | pred., meas.

Exposure/dosage     Both the predicted and measured concentration
                    field and the predicted and actual population
                    distribution within a specified area (spatial
                    and temporal history), i.e.,
                    C(x,y,t) | pred., meas. and P(x,y,t) | pred., actual
For this reason, several types of performance measures are usually required in order to fully
characterize a model's ability to reproduce observationally obtained
data.
That a model predicts well the observed concentration peak, for
instance, does not necessarily mean its predictions can reproduce the
spatially distributed concentration field. A comparison of the temporal
history of concentration values at several specific stations might give
a better indication of spatial model behavior. Even this might not
prove conclusive. The prevailing direction of the winds input to the
model might have been slightly in error. This may have little impact
on concentration levels, resulting only in a pollutant cloud slightly
displaced from its actual location. If concentration gradients are
steep within the cloud, station predictions might not agree well with
the values observed, even though the model might not be significantly
deficient. In such a circumstance, area measures might provide a better
means for assessing model performance. For instance, the areas in
excess of a specified concentration value could be compared for several
values ranging between the peak and background values.
Even employing the above measures, the degree of seriousness of
the disagreement between prediction and observation might not be
obvious. Since health effects result from both the pollutant level and
length of exposure, measures expressing differences in exposure/dosage
might give an indication of a model's ability to estimate the inter-
action of population with pollutant. This might be helpful in a number
of circumstances. For example, suppose prevailing winds on "worst" epi-
sode days carry the pollutant cloud containing ozone and its precursors
into adjacent rural areas before the early-afternoon peak occurs. If few
people live in the affected area, exposure/dosage measures may indicate
that the model's failure to accurately predict peak concentrations is of
little practical consequence.
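A minimal sketch of such an exposure calculation is given below (our own
illustrative construction; conc[t][k] and pop[t][k] denote hourly
concentrations and population counts in subarea k and are not quantities
defined in this report). The exposure/dosage performance measure would then
compare this quantity as computed from the predicted and from the observed
concentration fields.

    # Person-hours of exposure to concentrations above a threshold
    # (e.g., the NAAQS), summed over hours and subareas.
    def person_hours_above(conc, pop, threshold):
        total = 0.0
        for hour, cells in enumerate(conc):
            for k, c in enumerate(cells):
                if c > threshold:
                    total += pop[hour][k]   # each qualifying entry adds one hour
        return total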
2. Some Types of Variations Among Performance Measures
Three types of variations are found among performance measures:
scalar, statistical, and "pattern recognition." Those measures
of the first type are based upon a comparison of the predicted and
observed values of a specific quantity: the peak concentration level,
for instance. Those of the second type compare the statistical behavior
(the mean, variance and correlation, for example) of the differences
between the predicted and observed values for the quantities of interest.
Measures of the final type are useful in providing qualitative insight
into model behavior, transforming concentration "residuals" (the differ-
ences between predicted and observed values) into forms that highlight
certain aspects of model performance and thus triggering "pattern
recognition."
In order to illustrate the types of variations found in each
generic performance measure category, we present Table V-2. Some
typical examples are included for each category/variation combination.
In section D of this chapter, a number of specific performance measures
are listed. Examined in detail in Appendix C, they are classified
according to the scheme presented here.
3. Several Practical Considerations
Several practical considerations have a strong impact on the choice
of model performance measures. Each of these derives from limitations on
the degree of spatial resolution attainable with most models and measure-
ment networks.
Ideally, in assessing the performance of a model, one might want to
examine for several hours during the day the agreement between prediction
and observation throughout the concentration field (the spatial distribu-
tion of concentrations). Differences between the predicted and observed
values of the following could be uncovered thereby: the location, timing,
and level of the concentration peak; the area exposed to a concentration
in excess of a given value (e.g., the NAAQS); and the concentration values
at stations within a measurement network.
TABLE V-2. TYPES OF VARIATIONS AMONG GENERIC
PERFORMANCE MEASURE CATEGORIES

Generic Performance    Types of
Measure Category       Variations      Typical Example

Peak                   Scalar          Concentration residual* at the peak.

                       Pattern         Map showing locations and values of
                       recognition     maximum one-hour-average concentrations
                                       for each hour.

Station                Scalar          Concentration residual at the station
                                       measuring the highest value.

                       Statistical     Expected value, variance, and correlation
                                       coefficient of the residuals for the
                                       modeling day at a particular measurement
                                       station.

                       Pattern         At the time of the peak (event-related),
                       recognition     the ratio of the residual at the station
                                       having the highest value to the average
                                       of the residuals at the other station
                                       sites (this can indicate whether the
                                       model performs better near the peak than
                                       it does throughout the rest of the
                                       modeled region).

Area                   Scalar          Difference in the fraction of the modeled
                                       area in which the NAAQS are exceeded.

                       Statistical     At the time of the peak, differences in
                                       the area/concentration frequency
                                       distribution.

                       Pattern         For each modeled hour, isopleth plots of
                       recognition     the ground-level residual field.

Exposure/dosage        Scalar          Differences in the number of person-hours
                                       of exposure to concentrations greater
                                       than the NAAQS.

                       Statistical     Differences in the exposure/concentration
                                       frequency distribution.

                       Pattern         For the entire modeled day, an isopleth
                       recognition     plot of the ground-level dosage residuals.

* Residual: The difference between "predicted" and "observed."
Difficulties hindering such an examination arise from two sources:
the limited spatial resolution of the model and the sparsity of the
measurement network. While some models, such as the Gaussian ones, are
analytic and thus able to resolve the concentration field, many cannot
do so completely. Grid models, for example, predict a single average
concentration value for each cell. For this reason, they cannot resolve
the concentration field on a spatial scale any finer than the intergrid
spacing (usually on the order of one or two kilometers for urban scale
grid models). Trajectory models are similarly limited: They can resolve
the concentration field only as finely as the dimensions of the air
parcel being simulated. Further, predictions are computed only for a
particular space-time track, and not for the entire concentration field.
The relatively small number of stations in most measurement networks
limits the ability to reconstruct completely the concentration field
actually occurring on the modeled day. While stations are well-placed in
some networks, in others they are not. Thus, not only are stations
often 3-10 kilometers apart, but their placement does not always guarantee
the observation of peak or near-peak concentrations. Further, even in
extended urban areas, seldom does the number of stations exceed 10 to 20.
For these reasons, concentration fields generally are not known
with precision, from either model predictions or observational data.
Estimates of the spatial distribution of concentrations can be obtained
only by inference from "sparse" data. The use of numerical processes,
such as interpolation and extrapolation, to extend that data introduces
additional uncertainty into the comparison of predictions with observations.
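As an example of such a numerical process, an inverse-distance-weighted
interpolation of sparse station observations might look as follows (the
weighting scheme is our choice for illustration, not one prescribed in this
report).

    # Inverse-distance-weighted estimate of the concentration at (x, y)
    # from a list of station observations given as (xi, yi, ci) tuples.
    def idw(x, y, stations, power=2.0):
        num = den = 0.0
        for xi, yi, ci in stations:
            d2 = (x - xi) ** 2 + (y - yi) ** 2
            if d2 == 0.0:
                return ci                  # the point coincides with a station
            w = 1.0 / d2 ** (power / 2.0)
            num += w * ci
            den += w
        return num / den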
Another consequence results from the limited resolution of measure-
ment networks: The value of the concentration peak actually occurring on
the day of observation may not be known. Measurement networks usually
consist of fixed stations arranged in a set pattern. Unless the air
parcel containing the peak drifts over or near one of the stations, the
maximum concentration value sensed by the network will be less (sometimes
substantially so) than the value of the actual maximum. When prevailing
winds and pollutant chemistry are highly predictable for the days of
worst episode conditions, station placement can be designed so as to
maximize the likelihood of sensing the true peak. When conditions are
not so predictable, a measurement network with a modest number of
stations has little chance of "seeing" the true peak. For instance,
suppose the cloud containing the peak and all concentrations within 20
percent of it covers an area of 25 square kilometers in an urban area
having a total area of 1000 square kilometers. If the cloud has an
equal likelihood of being above any point in the urban region at the
time of the peak, by dividing the area of the cloud into the total
urban area, we can make a crude estimate of the number of stations
required to guarantee a measurement within 20 percent of the peak:
40 stations evenly spaced about 5 kilometers apart throughout the
urban region would be required. Even if the probable location of
the cloud were known to be within an area equal to one-quarter of the
urban area, 10 stations would be required just within that small area.
This degree of station density is high and may not be found in many
circumstances.
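The arithmetic behind these figures is simply (using the areas quoted above)

    N \;\approx\; \frac{A_{\text{urban}}}{A_{\text{cloud}}}
      \;=\; \frac{1000\ \text{km}^2}{25\ \text{km}^2} \;=\; 40,
    \qquad
    \text{spacing} \;\approx\; \sqrt{\frac{A_{\text{urban}}}{N}}
      \;=\; \sqrt{\frac{1000}{40}}\ \text{km} \;=\; 5\ \text{km}

and restricting the cloud to one-quarter of the urban area gives N of about
250/25 = 10 stations within that subarea.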
The above example is a simplistic one. The design of actual
station placement can be a far more complex process than indicated here.
However, the example serves to underscore the main point: a measurement
network, though satisfying EPA regulations,* may still be unable to
guarantee an observation "close" to the actual concentration peak, i.e.,
within 10 to 20 percent.
The points raised in the above discussion have some practical
implications for the choice of a model performance measure. Among
these are the following:
* Source: 40 CFR §51.17 (1975).
> Performance measures relying on a comparison of the
predicted and "true" peak concentrations may not be
reliable in all circumstances since measurement networks
can provide only the concentration at the station re-
cording the highest value, not necessarily the value at
the "true" peak.
> Performance measures relying on a comparison of the
predicted and "true" concentration fields may not be
computationally feasible since neither predicted nor "true"
concentration fields are always resolvable, spatially or
temporally, at the scales required for comparison.
> Performance measures based upon a comparison of predicted
and "true" exposure/dosage, though they are appealing
because of their ability to serve as surrogates for the
health effects experienced by the populace, may not be
computationally feasible because of the difficulty in
measuring the "true" population distribution and the
"true" concentration field. (We do suggest in Chapter
VI, however, one means by which health effects considera-
tions can be accounted for implicitly.)
> Performance measures based upon a comparison of the
predicted and observed concentrations at station sites
in the measurement network may be of the greatest practical
value.*
While the above points are general ones, exceptions to them do
occur in specific applications. Also, certain performance measures,
though not fully reliable on their own, can be useful in a qualitative
sense when used in conjunction with other measures.
C. A BASIC DISTINCTION: REGIONAL VERSUS SOURCE-SPECIFIC
PERFORMANCE MEASURES
Some models are used to address multiple-source, region-oriented
issues; others are applied to consider single-source issues. The
*Note caveat on pages VI-18 and VI-19, with respect to point source applications.
performance measures appropriate for each differ. We consider here the
distinction between regional and source-specific performance measures.
The distinction is drawn not so much between the type of performance
measure used (peak, station, area, or exposure/dosage), but rather between
the spatial scales over which it is applied. To address urban or regional
scale issues (SIP/C, AQMP), we must consider a region hundreds of square
kilometers in area, with the spatial and temporal distribution of
concentrations the result of emissions from many sources. The quantities
of interest are: the regional peak concentration (its level, location
and timing) and for each hour during the day (particularly at the time
of the peak), the spatial distribution of the pollutant concentrations,
by species. This information is frequently conveyed in the form of a
concentration isopleth diagram, an example of which is shown in Figure
V-3. The diagram shown was produced by the SAI Urban Airshed Model,
*
illustrating its ozone predictions for the Denver Metropolitan region
at Hour 1200-1300 MST on 29 July 1975.
To address single-source issues, on the other hand, we consider
only the region downwind of the specific source being modeled. While
emissions from it contribute to the overall pattern and level of
regional pollutant concentrations, it is usually the incremental impact
of those emissions that are of concern. The principal quantities of
interest are: the peak incremental ground-level concentration downwind
of the source and the spatial distribution of the incremental concen-
trations within the downwind ground-level "footprint." Specific para-
meters describing the latter are: the area within which concentrations
exceed a certain value and the shape of the concentration isopleths, usu-
ally conveyed in the form of a diagram such as the one shown in Figure V-4.
This diagram was constructed using a Gaussian formulation for a continu-
ously emitting elevated point source. Conditions are in steady-state and
"perfect" reflection from the ground is assumed. No inversion layer exists.
It should be noted that winds are unlikely to persist long enough for
actual conditions ever to resemble these isopleths beyond 20 to 25 km
(about 6 to 10 hours).
FIGURE V-3. SAMPLE REGIONAL CONCENTRATION ISOPLETH DIAGRAM: OZONE PREDICTIONS
OF THE SAI URBAN AIRSHED MODEL FOR THE DENVER METROPOLITAN REGION,
HOUR 1200-1300 MST, 29 JULY 1975
(North is toward the top of the diagram.)
FIGURE V-4. SAMPLE SPECIFIC-SOURCE ISOPLETH DIAGRAM ILLUSTRATING CONCEN-
TRATIONS DOWNWIND OF A STEADY-STATE GAUSSIAN POINT SOURCE
(Figure notes: source strength = 1000 lbs/hr; effective stack height = 250 ft;
perfect ground reflection; no inversion; E stability class (slightly stable);
10-hour travel time indicated; abscissa is downwind distance in kilometers.)
Other types of sources produce different downwind isopleth
patterns. In Figure V-5 we show qualitatively the downwind concentra-
tion patterns resulting from emissions from each of the three prin-
cipal source types: point, line, and area. These are only represen-
tations; the actual location, level, and shape of the isopleth lines
are heavily dependent on wind speed, source strength, and atmospheric
stability class. The figure does indicate, however, the general shape
of the downwind area within which the source impact is felt.
The type of source provides information in two areas: It identifies
the modeling region within which the peak, station, area, and exposure/
dosage performance measures are to be applied; and it provides insight
for monitoring network design. The observational data against which
model performance is to be judged are gathered at the measurement stations
within that network. To measure properly the impact produced by a
specific source, the measurement network should be deployed in a
pattern consistent with the concentration field shapes shown in Figure
V-5. The station designed to measure the ground-level peak concentra-
tion should be located downwind from the source, several kilometers
distant for an elevated point source and immediately adjacent for
either a line or an area source. Located farther downwind are those
stations designed to resolve the concentration field and to determine
the concentration value most representative of the regional incremental
impact of the source. A schematic of such a measurement network for a
point source is presented in Figure V-6, showing one possible configu-
ration for the stations.
Several difficulties arise in practice: Wind direction is change-
able, and the location of ground-level footprints is very sensitive to
atmospheric stability. These problems are particularly acute when the
emitter being considered is an elevated point source. To illustrate, we
show in Figure V-7 the locus of the downwind footprint if all wind direc-
tions are considered equally likely to occur. If we idealize the concen-
tration isopleths as being elliptical in shape, we can determine an
FIGURE V-5. QUALITATIVE DOWNWIND CONCENTRATION PATTERNS FOR THE THREE
PRINCIPAL SOURCE TYPES: (a) POINT SOURCE (E.G., POWER PLANT); (b) LINE SOURCE;
(c) AREA SOURCE
(Schematic labels: prevailing wind; downwind extent of impact ranging from
several kilometers to tens of kilometers.)
FIGURE V-6. SCHEMATIC OF A POINT SOURCE MEASUREMENT NETWORK
(Schematic labels: point source; prevailing wind; measurement stations;
station sensing the peak.)

FIGURE V-7. LOCUS OF POSSIBLE FOOTPRINT LOCATIONS FOR AN
ELEVATED POINT SOURCE. All wind directions
are considered equally likely.
(Schematic labels: maximum concentration; concentration within a certain
amount of the peak; minimum concentration of interest.)
expression for the ratio of the area within a given isopleth to the
area of the annulus, as shown in Figure V-7. Doing so, we can evaluate a
sample problem. Referring once again to Figure V-4, let the minimum
concentration value of interest be 300 μg/m³. Then, obtaining from
the figure the appropriate values, we can calculate that the isopleth
contains only 1.2 percent of the total area of the annulus. A monitor
placed at random within the annulus would have only a 1.2 percent chance
of observing a concentration greater than the minimum value of interest.
This problem is compounded if we consider variations in the inner and
outer radii due to the varying dispersive power of the wind.
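A minimal sketch of the area ratio just mentioned, assuming the isopleth of the minimum
concentration of interest is idealized as an ellipse with downwind and crosswind semi-axes
a and b, and the annulus is bounded by inner and outer radii r_1 and r_2 (these symbols are
introduced here for illustration; the report does not state its expression explicitly):

\[
f \;=\; \frac{\pi a b}{\pi\left(r_2^{\,2} - r_1^{\,2}\right)} \;=\; \frac{a b}{r_2^{\,2} - r_1^{\,2}}
\]

With the semi-axes and radii read from a diagram such as Figure V-4, f is the probability that
a randomly placed monitor within the annulus observes a concentration above the minimum
value of interest (about 1.2 percent in the example above).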
The message of all this is clear: When winds are variable, fixed
monitoring stations have little chance of characterizing the concen-
tration field downwind of an elevated point source. Several specific
implications result for the gathering of measurement data for computing
point source performance measures. Among these are the following:
> Measurement data may have to be gathered using mobile
monitoring stations. Plume cross sectional sampling
could be done then based on the wind speed/direction
and atmospheric stability observed in "real time."
> The annulus (or sector, if winds are more predictable)
containing the locus of peak concentrations is much
smaller in area than that containing the minimum
concentration of interest and is much closer to the
source (usually ranging from 1-5 km distant).
D. SOME SPECIFIC PERFORMANCE MEASURES
Having discussed model performance measures in generic terms, we
now present some specific examples. We provide in Appendix C a detailed
discussion of each specific measure. To summarize here, we provide a
list for each of the four generic types of performance measures: peak
(Table V-3), station (Table V-4), area (Table V-5), and exposure/dosage
TABLE V-3. SOME PEAK PERFORMANCE MEASURES
Type Performance Measure
Scalar a. Difference* in the peak ground-level
concentration values.
b. Difference in the spatial location of
the peak.
c. Difference in the time at which the
peak occurs.
d. Difference in the peak concentration
levels at the time of the observed
peak.
e. Difference in the spatial location of
the peak at the time of the observed
peak.
Pattern Map showing the locations and values of the
recognition predicted maximum one-hour-average concen-
trations for each hour.
*
"Difference" as used here usually refers to "prediction minus
observation."
TABLE V-4. SOME STATION PERFORMANCE MEASURES
Type Performance Measure
Scalar Concentration residual at the station measuring
the highest concentration (event-specific time
and fixed-time comparisons).
Difference in the spatial locations of the pre-
dicted peak and the observed maximum (event-
specific time and fixed-time comparisons).
Difference in the times of the predicted peak
and the observed maximum.
Statistical For each monitoring station separately, the
following concentration residuals statistics
are of interest for the entire day:
1) Average deviation
2) Average absolute deviation
3) Average relative absolute deviation
4) Standard deviation
5) Correlation coefficient
6) Offset-correlation coefficient.
For all monitoring stations considered together,
the following residuals statistics are of
interest:
1) Average deviation
2) Average absolute deviation
3) Average relative absolute deviation
4) Standard deviation
5) Correlation coefficient
6) Estimate of bias as a function of
concentration
7) Comparison of the probabilities of concen-
tration exceedances as a function of
concentration
Scatter plots of all predicted and observed
concentrations with a line of best fit deter-
mined in a least squares sense.
Plot of the deviations of the predicted versus
observed points from the perfect correlation
line compared with estimates of instrumentation
errors.
TABLE V-4 (Concluded)
Type Performance Measure
Pattern a. Time history for the modeling day of the pre-
recognition dieted and observed concentrations at each site.
b. Time history of the variations over all stations
of the predicted and observed average concentra-
tions.
c. At the time of the peak (event-related), the ratio
of the normalized residual at the station having
the highest value to the average of the normal-
ized residuals at the other stations.
TABLE V-5. SOME AREA PERFORMANCE MEASURES
Type Performance Measure
Scalar a. Difference in the fraction of the area in which
the NAAQS are exceeded.
b. Nearest distance at which the observed concen-
tration is predicted.
c. Difference in the fraction of the area in which
concentrations are within 10 percent of the
peak value.
Statistical a. At the time of the peak, differences in the
fraction of the area experiencing greater than
a certain concentration; differences in the
following are of interest:
1) Cumulative distribution function
2) Density function
3) Expected value of concentration
4) Standard deviation of density function
b. For the entire residual field, the following
statistics are of interest:
1) Average deviation
2) Average absolute deviation
3) Average relative absolute deviation
4) Standard deviation
5) Correlation coefficient
6) Estimate of bias as a function of
concentration
7) Comparison of the probabilities of concen-
tration exceedances as a function of con-
centration
c. Scatter plots of prediction-observation concen-
tration pairs with a line of best fit determined
in a least squares sense.
Pattern a. Isopleth plots showing lines of constant pollu-
recognition tant concentration for each hour during the
modeling day.
b. Time history of the size of the area in which
concentrations exceed a certain value.
c. Isopleth plots showing lines of constant residual
values for each hour during the day ("subtract"
prediction and observed isopleths).
d. Isopleth plots showing lines of constant residuals
normalized to selected forcing variables (inver-
sion height, for instance).
e. Peak-to-overall performance-indicator, computed
by taking the ratio of the mean residual in the
area of the peak (e.g., where concentrations are
within 10 percent of the peak) to the mean
residual in the overall region.
(Table V-6). We include scalar, statistical, and qualitative/composite
pattern recognition variants.
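To make the statistical station measures concrete, the following sketch shows one way the
residual statistics listed in Table V-4 might be computed from paired hourly predictions and
observations at a single station; the function and variable names are illustrative and do not
appear in the report, and observations are assumed to be positive.

    import math

    def station_residual_statistics(predicted, observed):
        """Residual statistics for one station (residual = prediction minus observation)."""
        n = len(predicted)
        residuals = [p - o for p, o in zip(predicted, observed)]

        avg_dev = sum(residuals) / n                              # average deviation
        avg_abs_dev = sum(abs(r) for r in residuals) / n          # average absolute deviation
        avg_rel_abs_dev = sum(abs(r) / o for r, o in
                              zip(residuals, observed)) / n       # average relative absolute deviation
        std_dev = math.sqrt(sum((r - avg_dev) ** 2 for r in residuals) / (n - 1))

        # Pearson correlation coefficient between predictions and observations.
        mean_p, mean_o = sum(predicted) / n, sum(observed) / n
        cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
        var_p = sum((p - mean_p) ** 2 for p in predicted)
        var_o = sum((o - mean_o) ** 2 for o in observed)
        correlation = cov / math.sqrt(var_p * var_o)

        return {"average deviation": avg_dev,
                "average absolute deviation": avg_abs_dev,
                "average relative absolute deviation": avg_rel_abs_dev,
                "standard deviation": std_dev,
                "correlation coefficient": correlation}

    # Hypothetical hourly concentrations (pphm) at one station:
    print(station_residual_statistics([8, 12, 15, 11, 7], [9, 10, 16, 13, 6]))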
E. MATCHING PERFORMANCE MEASURES TO ISSUES AND MODELS
To this point we have identified several performance measures
categories, discussed their general attributes and data requirements,
and associated with them a number of specific performance measures.
Two tasks remain in this chapter: We first indicate for each of the
generic types of issues the performance measures most appropriate for
use; we then discuss the capability of each generic class of model to
calculate those measures.
1. Performance Measures and Air Quality Issues
In Chapter III we identified seven generic types of air quality
issues, dividing them into two broad categories. Within the first of
these multiple-source issues, we included: State Implementation Plan/
Compliance (SIP/C) and Air Quality Maintenance Planning (AQMP). The
second category, source-specific issues, was defined to contain the
following: Prevention of Significant Deterioration (PSD), New Source
Review (NSR), Offset Rules (OSR), Environmental Impact Statements/
Reports (EIS/R), and Litigation (LIT). For each of these issues we now
consider some important distinctions that bear on the selection of the
most appropriate model performance measures (PMs).
> Multiple-Source Issues
- SIP/C. The compliance portion of a SIP details
plans for achieving ambient pollutant levels at
or below the NAAQS in Air Quality Control Regions
(AQCRs) currently in noncompliance. Because it
is the peak concentration level that is of primary
concern, a model should demonstrate its ability
to predict that peak. For a day chosen as the one
TABLE V-6. SOME EXPOSURE/DOSAGE PERFORMANCE MEASURES
Type Performance Measure
Scalar a. Difference for the modeling day in the number of
person-hours of exposure to concentrations:
1) Greater than the NAAQS
2) Within 10 percent of the peak.
b. Difference for the modeling day in the total
pollutant dosage.
Statistical a. Differences in the exposure/concentration fre-
quency distribution function; differences in the
following are of interest:
1) Cumulative distribution function
2) Density function
3) Expected value of concentration
4) Standard deviation of density function
b. Cumulative dosage distribution function as a
function of time during the modeled day.
Pattern For each hour during the modeled day, an isopleth
recognition plot of the following (both for predictions and
observations):
1) Dosage
2) Exposure
to be used for model verification, peak performance
measures should be computed. Also contained within
SIPs are emissions control strategies. To assess
the effects of controlling specific sources, a model
must be capable of spatially resolving its concen-
tration predictions. Area PMs should be calculated,
if possible, to evaluate a model's ability to do so.
Station PMs are another means to evaluate model
spatial resolution, although pollutant cloud offset
can account sometimes for apparent large discrep-
ancies. Because SIP/C is most frequently an issue
in densely populated urban areas, large differences in
health effect impact can exist between prediction and
observation. Exposure/dosage PMs should be calcu-
lated, if possible, in order to evaluate the ac-
ceptability of a model's performance.
- AQMP. Detailed within the maintenance portion of
a SIP are procedures for insuring, once compliance
has been achieved, that ambient pollutant concen-
trations do not again rise above the NAAQS. Because
violation of the NAAQS is an issue, peak PM's are
important measures of model performance. However,
because pollutant levels are low (relative to the
values before compliance), small errors in model
performance might not produce a large uncertainty
in expected health impact. Consequently, the use
of exposure/dosage PMs may not be necessary. Also,
emissions control strategies may not be as global.
Retrofit of control devices on existing sources will
have been accomplished. Automotive emissions will
have been controlled (presumably) such that point
sources will contribute a large fraction of the
emissions inventory. While incremental growth and
development will alter the spatial and temporal
distribution of pollutants, the need for modeling
spatial resolution may not be so crucial as it was
with SIP/C. Agreement between prediction and observa-
tion as measured by area and station PMs, while desir-
able, may not always be required within the same
tolerance as for SIP/C issues.
> Specific-Source Issues
- PSD. Individual sources are not permitted to cause
more than small incremental increases in concentra-
tions in areas currently in attainment of the NAAQS.
Since these so-called "Class I" regions (often state
or national parks) are generally some distance from
the polluting source (>10 kilometers), a model must
be able to predict accurately ground-level concentra-
tions some distance downwind from the source. If the
source being modeled is by itself likely to produce
near-stack ground-level concentrations in excess of
the NAAQS or increments greater than Class II allow-
able increments, peak measures are of particular
interest. Otherwise, "far-field" concentration predic-
tions are more important than estimates of the peak
value. Downwind station PMs are often the measures
most suitable for evaluating model predictions for
PSD Class I. Also, plumes from point sources are very
narrow; that is, their cross-wind dimensions are much
smaller than their downwind ones. Consequently, the
incidence of a Class I violation may be quite sensi-
tive to model performance, as measured by area PMs.
However, exposure/dosage PMs are not likely to be of
interest because of the sparsity of population in areas
where PSD is an issue and the relatively low concentra-
tions occurring there.
- NSR. New source review is an important issue in both
urban and nonurban regions. With the density of popula-
tion in urban areas, many persons may live within a short
distance (<5 kilometers) of a source. The ground-
level peak concentration, then, may be an important
indicator of near-source health impact. Prediction
of that peak, as measured by a peak PM, may be an
important model performance requirement. However,
because ground-level concentrations fall off rapidly
farther downwind and because of the "narrowness" of
the plume, differences in exposure and dosage between
prediction and observation may not be of substantial
consequence. Close agreement, as measured by area
and exposure/dosage PMs, may not be required. Also,
in order to assess the impact of a new or modified
source, it is necessary to know its incremental effect
on regional air quality. This is best represented by
an "average" concentration value (including background)
well downwind of the source (>10 kilometers). Thus, a
model should demonstrate its ability to reproduce mea-
surement data at that downwind range. The use of
station PMs is indicated.
- OSR. In order to construct a new source or modify
an existing one in a region experiencing concentra-
tions in excess of the NAAQS, the owner of the source
must arrange for the removal of existing sources.
An amount greater than the emissions from the proposed
new source must be removed from the regional inven-
tory. Currently, these "offsets" are made on the
basis of emissions rather than as a result of their
impact on ambient concentrations. In such a case,
no air quality predictions are required (unless a
region-wide violation is attributable to the source
being removed or cleaned up). Only an accurate
emissions inventory is necessary. However, if off-
sets were "negotiated" at the level of ambient concen-
trations, the predictions of air quality models would
assume significance. The "far" downwind concentration
value, representative of its regional incremental
impact, would be the quantity of greatest interest,
since it would describe the source's offset "potential."
Station PMs then would be of use in evaluating
model performance.
- EIS/R. Projects having a significant adverse impact
on air quality usually are presented for public
review by means of an EIS or an EIR. Such projects
generally consist of one or a few distinct sources,
although some consist of a greater number. An
example of the latter is the Denver Metropolitan
Wastewater Overview EIS recently completed by
Region VIII of the EPA. Federal funding for
twenty-two separate sewerage treatment facilities
was conditioned upon favorable review of the EIS,
which examined their combined regional impact. If
the sources are widely distributed throughout the
modeling region, spatial resolution may be an im-
portant model requirement. In such a case, area
and station PMs would provide a useful means to
verify model acceptability. If the combined
emissions from the proposed sources are relatively
low, or if they are localized to a narrow downwind
plume, their incremental health impact may be
small, and exposure/dosage PMs may not need to be
applied to assess model performance. However, if,
as in Denver, the potential impact is more serious
and widespread, this latter type of PM can be useful.
- LIT. Court challenges can arise to the basic air
pollution laws themselves, to their implementation in
federal regulations, or to decisions regarding
specific sources (requests for variances and
applications for construction/modification approval,
for example). While challenges of the first two
types can have and have had important consequences, we
identify the third type as the principal variant
included in LIT. When the specific source in question
is to be located in an urban area, the model used to
estimate its effects should be expected to predict
both its near-source, ground-level concentration peak
and its far-field "average" value. Peak and station
PMs should be used. If the source is to be constructed
in a rural area, PSD may be an issue in arriving at a
build/no-build decision. If so, accuracy of spatial
resolution could be important. The use of area PMs
could be of assistance.
We summarize in Table V-7 many of the points mentioned above. In it
issues are associated with the generic categories of performance measures
most commonly required for use in assessing model performance. However,
exceptions do occur. For this reason, the final choice of performance
measures should be dictated by the character of the specific application.
2. Performance Measures and Air Quality Models
In the previous section we associated performance measures with gen-
eric types of issues. We now discuss the ability of generic classes of
models to generate predictions in a form suitable for calculation of those
measures. All model types produce estimates of the concentration peak.
Some can predict station concentrations. Fewer can spatially resolve
the concentration field. Fewer still are able to determine an estimate
of exposure/dosage. For each generic model category, we outline here
their general capabilities.
> Grid. The formulation of grid models permits the esti-
mation of concentrations averaged for each grid cell.
Consequently, the concentration field can be resolved
spatially as finely as the dimensions of the grid cell.
The peak is estimated to be the maximum ground-level
grid cell concentration occurring during the modeling day.
The location of the peak is predicted only as closely as
TABLE V-7. PERFORMANCE MEASURES ASSOCIATED
WITH SPECIFIC ISSUES
                        Performance Measure Type
Issue              Peak   Station   Area   Exposure/Dosage
Multiple-source
  SIP/C             X        X        X          X
  AQMP              X        X        X
Specific-source
  PSD               X        X        X
  NSR               X        X        X
  OSR               X        X
  EIS/R             X        X        X          X
  LIT               X        X        X
a single grid cell dimension. The value at the peak is
predicted only as an area average in the vicinity of the
peak (within one grid cell). Because of its spatial and
temporal resolution, predictions suitable for calculation
of station, area and exposure/dosage performance measures
also can be generated.
> Trajectory. Because a single air "column" is simulated,
only concentrations along the space-time track followed
by the advecting air parcel can be estimated. Such
models, as a consequence, can predict station concentra-
tions only for those over which they pass. If several
adjoining parcels are modeled, predictions at other
stations can be determined. The spatial location of the
peak can be estimated only as closely as the dimensions
of the air column. The peak level is estimated to be
the greatest column-averaged concentration occurring
during the modeling day. Averaging can take place over
the entire vertical region from the ground to the inver-
sion base, or over the lowest of several vertical column-
layers. Because of their limited spatial resolution,
regional trajectory models do not generate predictions
in a form suitable for the calculation of area or
exposure/dosage PMs. Specific-source trajectory models,
on the other hand, may do so. Concentrations are pre-
dicted as a function of downwind distance from the source.
Though lateral resolution is limited, concentration esti-
mates can be put in a form appropriate for calculation of
station, area and exposure/dosage PMs.
> Gaussian. Concentration field predictions are expressed
analytically. Thus, subject to the steady-state limita-
tions of their formulation, the short-term averaging
versions of these models can provide their estimates in a
form that is suitable for the calculation of all performance
measure types. The long-term averaging versions, however,
predict regional or sector-averaged estimates of annual
concentrations. Estimates of exposure/dosage (except
crudely on the basis of an annual concentration level) are
difficult to derive. Predictions of annual station averages,
though, can be obtained for regional models of this type.
> Isopleth. Estimates in no other form than the regional
peak concentration can be obtained with this method. This
can be done only when the isopleth diagrams can be inter-
preted in an absolute sense. This is the case only when
the isopleth diagram has been derived for ambient condi-
tions similar to the ones in the area being modeled. In
addition, a prediction of the peak can be verified only
if a historical data base exists that is sufficient to
determine a peak concentration in a previous base year and
a record of the emissions cutbacks occurring since then.
> Rollback. The only prediction obtainable from rollback
is an estimate of the regional peak concentration. This
is determinable only if an historical data base exists
such as that described for the isopleth method.
> Box. A prediction of the regional peak concentration
can be determined using this method. No other estimates
requiring finer spatial resolution can be computed.
Diurnal variation in the estimates of regional average
concentration, however, can be made.
We summarize in Table V-8 many of the points mentioned above. In
it, we indicate for each generic model the type of performance measure
that may be calculated, given the capabilities and limitations of each
formulation.
F. PERFORMANCE MEASURES: A SUMMARY
In this chapter we identified generic performance measure categories,
listed some specific performance measures, and then associated the
TABLE V-8. PERFORMANCE MEASURES THAT CAN BE
CALCULATED BY EACH MODEL TYPE
                                      Performance Measure Type
                                                             Exposure/
Model                           Peak   Station   Area        Dosage
Refined usage
  Grid
    Region oriented               X       X        X            X
    Specific source oriented      X       X        X            X
  Trajectory
    Region oriented               X       X
    Specific source oriented      X       X        X            X
  Gaussian
    Long-term averaging           X       X
    Short-term averaging          X       X        X            X
Refined/screening usage
  Isopleth                        X
Screening usage
  Rollback                        X
  Box                             X
generic measures with generic issues, noting for each model type the PMs
it is capable of calculating. Having done so, we are now ready to
proceed with the final objective of this report: the discussion of
model performance standards. The presentation in Chapter VI will be
based upon the points raised in this chapter. The following are of
crucial importance:
> Measurement networks often do not sense the "true"
concentration peak.
> Only performance measures based upon station measure-
ment data may be computationally feasible.
> Model predictions are often resolvable on a finer
scale than measured concentrations; even though
strict comparison of prediction with observation
through some computed measure may not be fruitful,
the model predictions themselves may still offer
valuable insight.
VI MODEL PERFORMANCE STANDARDS
The central purpose of this report is to suggest means for setting
performance standards for air quality dispersion models. Toward that end
our discussion has proceeded as follows: Issues were identified (Chapter
III); issue/model combinations were presented (Chapter IV); and alternative
issue/model/performance measure associations were discussed (Chapter V).
We are now at the final step: the setting of standards. To place this
in the proper framework, we first identify five attributes of desirable
model performance, showing how their relative importance depends on the
issue being addressed and the pollutant being considered. Then we recom-
mend specific performance measures whose values reveal the presence or
absence of each performance attribute. We detail several rationales for
establishing standards for those measures. To illustrate the use of these
measures in assessing model performance, we present a sample case. It is
based upon SAI experience in using a grid-based photochemical model in the
Denver metropolitan region. Finally, we detail possible forms the actual
standard might assume, suggesting a sample draft outline and format.
The subject addressed in this report is a broad and complex one.
Seldom can a rule for judging model performance be stated that does not
have several plausible exceptions to it. Consequently, we view the estab-
lishment of model performance standards to be a pragmatic and evolutionary
exercise. As we gain experience in evaluating model performance, we will
need to modify both our choice of performance measures and the range of
acceptable values we insist on. Nevertheless, the process must begin
somewhere. The recommendations contained in this chapter represent such
a beginning.
We feel the measures and standards we suggest for use here will almost
certainly change as experience improves our "collective judgment" about
what constitutes model acceptability and what does not. Perhaps the
number of measures will increase to provide richer insight into model
performance, or perhaps the number will shrink without any loss of "informa-
tion content." Regardless of the list of measures and their standards that
ultimately emerges for use, it is the conceptual structuring of the per-
formance evaluation itself that seems to be most important at this point.
We must identify the attributes of a well-performing model, and we need to
understand how we assess their relative importance, depending on the issue
we are addressing and the pollutant species we are considering. The dis-
cussion in this chapter offers a conceptual structure for "folding in" all
these concerns and suggests candidate measures and standards.
A. PERFORMANCE STANDARDS: A CONCEPTUAL OVERVIEW
The chief value of air quality models lies in their predictive ability.
Only through their use can the consequences of pollution abatement alter-
natives be assessed and compared. Only by means of model predictions can
the impact of emissions from newly proposed sources be estimated and evalua-
ted for acceptability. However, because the questions typically asked of
models are hypothetical ones, their predictions are inherently nonverifiable.
Only after the proposed action has been taken and the required implementation
time elapsed will measurement data confirm or refute the model's predictive
ability.
Herein lies the dilemma faced by users of air quality models: If
a model's predictions at some future time cannot be verified in advance,
on what basis can we rely on that model to decide among policy alternatives?
In resolving this, most users have adopted a pragmatic approach: If a
model can demonstrate its ability to reproduce for a similar type of appli-
cation a set of "known" results, then it is judged an acceptable predictive
tool. It is on this basis that model "verification" has become an essential
prelude to most modeling exercises.
A further difficulty exists. What constitutes a set of "known" results?
This is not a problem easily solved. For "answers" to be known exactly, the
"test" problem must be simple enough to be solved analytically. Few problems
involving atmospheric dynamics are so simple. Most are complex and nonlinear.
For these, the analytic test problem is an unacceptable one. Another, more
practical alternative often is employed. For regional, multiple-source
applications, the "known" results are taken to be the station measurements
of concentrations actually recorded on a "test" date. For pollutants having
a short-term standard, the duration of measurement is a day or less. For
those subject to a long-term (annual) standard, the duration is a year or more.
For source-specific applications, the source of interest may not yet
exist, permission for its construction being the principal issue at hand.
For these applications, it is often necessary to verify a model using the
most appropriate of several prototypical "test cases." These could be assembled
from measurements taken at existing sources, the variety of source size,
type and location spanning the range of values found in applications of interest.
The term "known" is used imprecisely when referring to a set of measure-
ment data. Station observations are subject to instrumentation error. The
locations of fixed monitoring sites may not be sufficiently well distributed
spatially to record data fully characterizing the concentration field and its
peak value. Nevertheless, despite those shortcomings, "observed" data often
are regarded as "true" data for the purposes of model verification.
Having assembled two sets of data, one "known" and the other "predicted,"
we can assess model performance by comparing one with the other. Predic-
tion and observation, however, can be compared in many ways. We must select
the quantities that can best characterize the distribution of pollutants in
the ambient air, for it is through comparison of their predicted and observed
("known") values that we specify model performance. We catalogued a number
of useful performance measures in Chapter V, as well as in Appendix C.
Later in this chapter we indicate the subset we view as having the
greatest practical usefulness.
Once we have decided on the performance measures best suited to our
issue/application (and most feasible computationally), we can calculate
these values. Having done so, however, we must ask a central question: How
close must prediction be to observation in order for us to judge model per-
formance as acceptable? In order for us to answer "how good is good," per-
formance standards for these measures must be set, with allowable tolerances
(predicted values minus observed ones) derived based upon a reasonable
rationale (health effects or pollution control cost considerations, for
instance).
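As a simple illustration of how such a tolerance might be applied once a rationale has fixed
it, consider the following sketch; the measure, the bounds, and the example values are
hypothetical and are not taken from the report:

    def meets_standard(measure_value, lower_bound, upper_bound):
        """True if a computed performance measure lies within its allowed tolerance band."""
        return lower_bound <= measure_value <= upper_bound

    # Hypothetical example: require a predicted-to-observed peak ratio to lie within 20 percent of unity.
    acceptable = meets_standard(measure_value=1.15, lower_bound=0.80, upper_bound=1.20)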
By setting these standards explicitly, certain benefits may be gained.
Among these are the following:
> A degree of uniformity is introduced in assessing model
reliability.
> The impact of limitations in both data gathering proce-
dures and measurement network design can be made more
explicit, facilitating any review of them that may be
required.
> The performance expected of a model is stated clearly,
in advance of the expenditure of substantial analysis
funds, allowing model selection to be a more straight-
forward and less "risky" process.
> The needs for additional research can be identified clearly,
with such efforts more directed in purpose.
B. PERFORMANCE STANDARDS: SOME PRACTICAL CONSIDERATIONS
Before continuing, we point out several practical considerations that
can have a direct impact on model verification. Among the most important
of these are the following: data limitations (due to their form, quantity,
quality, and availability); time/resource constraints; and variability in
the level and timing of analysis requirements. We discuss each of these
in turn.
1. Data Limitations
For a modeling simulation to be conducted, data must be gathered charac-
terizing both the "driving forces" (emissions, meteorology, and vertical
temperature profile, for example) and the "resulting effects" (pollutant
concentrations). To do so requires an extensive and coordinated effort.
Consequently, complete data sets usually are assembled for only a few sample
days. The dates on which these data are gathered are chosen as ones likely
to be typical of "worst" episode conditions. However, unanticipated shifts
in meteorology (frontal passage, for example) can occur, confounding attempts
to measure ambient conditions on high-concentration days. Consequently, the
data available for model verification may not be representative of conditions
on the day when the "second highest" concentration occurs, i.e., the worst
NAAQS violation.
Confronted with such a situation, the modeler must decide the following:
Even if model performance proves acceptable for non-episode conditions, can
it be considered "verified" as a predictive tool for higher-concentration
days? This question is part of a still more general one: Should a model
be verified for more than one day, each of these days experiencing a dif-
ferent peak concentration? If such a procedure were followed, model perfor-
mance could be evaluated for concentrations ranging from the current peak
value to ones nearer the NAAQS. But, the meteorology occurring on days
experiencing low peak concentrations is not typical of that occurring on
high peak days. Should not the model, when used as a predictive tool,
employ maximum-episode meteorology? We do not answer these questions here
but note their importance as questions remaining to be resolved. We observe,
however, that limitations on data quantity and availability can constrain us,
limiting our flexibility in dealing with these questions.
Another difficulty can arise because of spatial limitations in the
data. As we noted in the last chapter, measurement networks provide
concentration data only at a few fixed sites. In general, these networks
cannot guarantee observation of the "true" peak, nor are they sufficiently
well-spaced to assure that the "true" concentration field can be reconstructed
from the station measurements. As a practical matter, however, these station
data must form the basis for the comparison of prediction and observation.
Station-type performance measures, as defined in Chapter V, therefore must
be the "preferred" (or rather the "unavoidable") measures of interest. We
detail some of these later in Section D.
2. Time/Resource Constraints
Both the amount and quality of the data collected as well as the level
of modeling analysis performed are all strongly influenced by time dead-
lines and resource constraints. This has several consequences among which
are the following: Because it is difficult, expensive, and time consuming
to mount special data gathering efforts, heavy reliance is placed on previously
gathered data, even with its recognized deficiencies; also, model selection
occasionally is made more on the basis of the form and extent of existing
data and financial budgetary considerations than on grounds more technically
justifiable. In such cases a conscious choice has been made, trading model
performance for other considerations.
The combined effect of inadequate data and inappropriate model choice
can reduce in value any assessment of model performance. In this report,
however, we take the following view: The level of performance required of
a model is determined not by exogenous considerations but by the nature of
the issue and the specific modeling application.
3. Variability of Analysis Requirements
Modeling analysis requirements differ from one application to another.
There is an important question to ask in every modeling situation: How
much analysis is justified? In the Los Angeles Basin, for instance, attain-
ment of the NAAQS for ozone cannot be achieved without widespread and
extensive hydrocarbon (HC) emissions control. Ambient HC levels are currently
so high that more HC radicals are available than are "needed" by the chain
of photochemical reactions that results in the O3 peak. Consequently, reduc-
tions in HC emissions must be sizable before any appreciable reduction in
peak O3 can be achieved. The result of this is the following: Estimates of
the percentage HC emissions control required to reach NAAQS compliance in
Los Angeles are so high (75 to 80 percent) that they are not strongly sensi-
tive to uncertainties in the value of the O3 peak, either measured or predicted.
If the only questions to be answered depended on the general region-
wide level of HC emissions control required (a SIP/C-related problem), then
a fair amount of uncertainty could be tolerated in model predictions of the
O3 peak. Use of a less sophisticated model might be acceptable. Were a
different issue/question addressed, however, a model providing more detailed
predictions might be required.
C. MODEL PERFORMANCE ATTRIBUTES
Model predictions are subject to a number of sources of uncertainty. Some
of these are data related, while others are inherent in the model theoretical
formulations. Regardless of their source, however, errors manifest themselves
in similar ways. They may affect a model's ability to predict peak concen-
trations, as well as introduce systematic bias or gross error into its pre-
dictions. They may limit a model's ability to reproduce temporal variation
or affect the spatial distribution of the concentration field.
What are the attributes of desirable model performance? Ideally, we
would ask that a model have five major attributes, the strength of our insis-
tence depending on the circumstance of our application and the pollutant we
are considering. The five model performance attributes are: accuracy of the
peak prediction, absence of systematic bias, lack of gross error, temporal correlation,
and spatial alignment. The first of these concerns the model's
ability to predict accurately the level, timing, and location of the concen-
tration peak. The second attribute is the absence of systematic bias, where
predictions are shown not to differ from observations in any consistent and
unexplained way. The third attribute concerns the lack of gross error, or
rather the absolute amount by which predictions differ from observations.
We illustrate the difference between bias and error by means of the
following example. Suppose when we compare a set of model predictions with
station observations, we find several large positive residuals (predicted
minus observed concentrations) balanced by several equally large negative
residuals. If we were testing for bias, we would allow the oppositely
signed residuals to cancel. A conclusion that the model displayed no syste-
matic bias therefore might be a justifiable one. On the other hand, were
we testing for gross error, the signs of the residuals would not be considered,
with oppositely signed residuals no longer allowed to cancel. Because the
absolute value of the residuals is large in our example, we might well con-
clude that the model predictions are subject to significant gross error.
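The distinction can be made concrete with a few hypothetical residuals; the two quantities
below correspond to the average deviation (bias) and the average absolute deviation (gross
error) discussed elsewhere in this report:

    # Prediction-minus-observation residuals at four stations (hypothetical values).
    residuals = [4.0, -4.0, 3.0, -3.0]

    bias = sum(residuals) / len(residuals)                          # signed values cancel: 0.0
    gross_error = sum(abs(r) for r in residuals) / len(residuals)   # magnitudes do not cancel: 3.5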
The fourth of the desirable performance attributes is that of temporal
correlation. When this is important, can the model reproduce the temporal
variation displayed by the observational data? A model might be judged as
being capable of doing so if its predictions varied in phase with observa-
tion, that is, if they were "correlated." The fifth desirable attribute is
that of spatial alignment. At each time of interest, does the model pre-
dict a concentration field that is distributed spatially like the observed
one? To determine this, correlation of prediction with observation could
be assessed at several points in the concentration field, e.g., monitoring
stations.
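One way temporal correlation, and the "offset" variant listed in Table V-4, might be checked
is sketched below: the predicted and observed hourly series at a station are correlated at
several time lags. The function names and the lag range are illustrative only.

    def pearson(x, y):
        """Pearson correlation coefficient between two equal-length series."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    def offset_correlations(predicted, observed, max_lag=3):
        """Correlation when the predicted series is shifted by 'lag' hours relative to the observed."""
        results = {}
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                p, o = predicted[lag:], observed[:len(observed) - lag]
            else:
                p, o = predicted[:len(predicted) + lag], observed[-lag:]
            results[lag] = pearson(p, o)
        return results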
The five performance attributes are interrelated. Suppose, for instance,
that our model does not reproduce well the photochemistry of ozone formation
in the atmosphere. Not only could its estimates of the concentration peak
be in error, but also its temporal correlation and spatial alignment might
be poor. Even if the model predicted the peak properly, problems might still
exist. If the chemistry were "fast," the peak, though correct, might be pre-
dicted to occur sooner than that actually observed. Even if atmospheric
transport were properly modeled, performance measures might then "detect"
temporal and spatial problems.
By treating each performance attribute separately, we may run the risk
of rejecting a model on several grounds where only a single reason actually
exists. For example, slight errors in the wind field input to the model
might result in predictions apparently wrong both spatially and temporally.
Yet, only a single defect exists, in this case not due to the model at all.
Nevertheless, we adopt a conservative viewpoint. We suggest evaluating
the model separately for the presence of each attribute, even though they
themselves may be interrelated. Redundancy should not result in a satis-
factory model being unfairly rejected. If model predictions are good, they
will be acceptable both spatially and temporally. If they are poor, they
will probably be rejected, both for temporal and spatial reasons.
If model performance is mixed, showing, for example, good temporal cor-
relation but poor spatial alignment, two possibilities exist. Either the
model performance may not be particularly poor or the performance measure
used to detect one or the other performance attribute is deficient (too
stringent or too lenient). In either case, however, forcing model perfor-
mance to be reassessed makes sense. On balance, while requiring a model to
"jump the hoop" twice may be redundant in looking for the same problem, it
should provide us a measure of safety in the "double-check" it provides, pre-
suming each attribute assumes the same importance (see the discussion below).
Although they are interrelated, the five model performance attri-
butes are distinct. Consequently, we must employ different kinds of per-
formance measures to determine the presence of each attribute. While we
defer to Section D a statement of specific measures we recommend using, we
list in Table VI-1 their objectives.
We have identified five model performance attributes. Which of these,
however, is most important? This question has no unique answer, the rela-
tive importance in each problem depending on the type of issue the model
is being used to address and the type of pollutant under consideration.
In order to relate attribute importance to application issue in a more con-
venient manner, we present in Table VI-2 a matrix of generic issue class
(as defined earlier in this report) and problem type. For each combination
TABLE VI-1. PERFORMANCE MEASURE OBJECTIVES
Performance Attribute            Objective of Performance Measures
Accuracy of the peak prediction  Assess the model's ability to predict the concentra-
                                 tion peak (its level, timing, and location)
Absence of systematic bias       Reveal any systematic bias in model predictions
Lack of gross error              Characterize the error in model predictions both at
                                 specific monitoring stations and overall
Temporal correlation             Determine differences between predicted and observed
                                 temporal behavior
Spatial alignment                Uncover spatial misalignment between the predicted
                                 and observed concentration fields
TABLE VI-2. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY ISSUE
Performance Attribute
Accuracy of the peak
prediction
Absence of systematic
bias
Lack of gross error
Temporal correlation
Spatial alignment
Importance of Performance Attribute*
SIP/C AQMP PSD NSR OSR EIS/R LIT
1111211
1
1
1
1
1
1
1
1
2
2
1
2
2
1
3
1
1
3
3
1
3
3
1
3
3
1
3
3
* Category 1 - Performance standard must always be satisfied.
Category 2 - Performance standard should be satisfied, but some leeway
may be allowed at the discretion of a reviewer.
Category 3 - Meeting the performance standard is desirable but failure
is not sufficient to reject the model; measures dealing
with this problem should be regarded as "informational."
we indicate an "importance category." We define the three categories based
upon how strongly we insist our model demonstrate the presence of a given
attribute. For Category 1, we require that performance standards always
be satisfied (the problem type is of prime importance). For Category 2,
we state that the standard should be satisfied but some leeway ought to
be allowed, perhaps at the discretion of a reviewer (while the problem type
is of considerable importance, some degree of "mismatch" may be tolerable).
For Category 3, we are not insistent that standards be met, though we state
that as being a desirable objective (the problem type is not of central
importance).
A number of assumptions are embedded in Table VI-2. Among the more
significant are the following:
> Both peak and "far-field" concentrations are of interest
in considering PSD and NSR questions.
> Specific-source issues (PSD, NSR, OSR, EIS/R and LIT) most
often deal with sources assumed to be continuously emitting
at a constant level (or nearly so); consequently, performance
measures considering time variations between prediction
and observation are not the principal measures of interest.
> Spatial agreement between prediction and observation is par-
ticularly important in applications where PSD is an issue;
this is so because source impact on pristine areas (Class I)
and elevated terrain (Class II) often occurs well downwind
of the source, with the magnitude and incidence of impact
highly directional and spatially dependent.
> Specific-source impact generally occurs in a narrow downwind
plume; thus, the monitoring network set up to provide measure-
ment data often consists of only a few stations; as a result,
the calculation of all-station performance measures may not
prove meaningful.
> Error is less important in considering regional issues than is
the presence of a systematic bias.
> To achieve and maintain compliance with the NAAQS (SIP/C, AQMP),
alternate control strategies must be developed and evaluated.
For this to be done properly, some degree of spatial resolution
should be attained by the model and verified.
The relative importance of each performance attribute is dependent
on the type of pollutant being considered and the averaging time required
by the NAAQS. If a species is subject to a short-term standard, for
instance, accuracy of the peak prediction and temporal correlation might
be of considerable concern, depending on the issue being addressed. How-
ever, if the species is subject to a long-term standard, neither of these
problem types are of appropriate form. We indicate in Table VI-3 a matrix
of the performance attributes and pollutant species. We rank each combina-
tion by the same importance categories we used earlier in Table VI-2.
Conceivably, a conflict might exist between the ranking indicated by the
issue and the pollutant matrices in Tables VI-2 and VI-3. We suggest resolving
the conflict in favor of the less stringent of the two rankings. For example,
suppose the issue being addressed was SIP/C and the pollutant being considered
was CO. According to Table VI-2, the accuracy of the peak prediction should
be regarded as Category 1 (the standard must always be satisfied). However,
according to Table VI-3, it should be considered as Category 2 (the standard
should be satisfied but some leeway may be allowed). The conflict should
be resolved by allowing the combined issue/pollutant ranking to be Category 2.
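A minimal sketch of this resolution rule, treating the numerically larger category as the less
stringent one; the two dictionary entries shown are just the values quoted in the example
above, not a transcription of the full tables:

    # Importance categories: 1 = standard must be satisfied, 2 = some leeway allowed, 3 = informational.
    issue_category = {("SIP/C", "accuracy of the peak prediction"): 1}      # from Table VI-2 (example entry)
    pollutant_category = {("CO", "accuracy of the peak prediction"): 2}     # value quoted in the text's example

    def combined_category(issue, pollutant, attribute):
        # Resolve conflicts in favor of the less stringent (larger) category number.
        return max(issue_category[(issue, attribute)],
                   pollutant_category[(pollutant, attribute)])

    assert combined_category("SIP/C", "CO", "accuracy of the peak prediction") == 2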
D. RECOMMENDED MEASURES AND STANDARDS
In this section we reach a major goal of this report: We identify a
recommended set of performance measures and propose rationales for setting
standards for each. Our discussion in this section unfolds as follows.
First, we isolate a candidate list of performance measures from which we
select the recommended set. Then, we detail several rationales on which to
base standards for our "preferred" measures. Using these we identify
specific "guiding principles" from which standards may be set. In a final
TABLE VI-3. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY
POLLUTANT AND AVERAGING TIME‡

Pollutants with Short-term Standards
Performance                  O3        CO**      NMHC*     SO2       NO2       CO
Attribute                    (1 hour)  (1 hour)  (3 hour)  (3 hour)  (†)       (8 hour)
Accuracy of the
peak prediction                 1         1         1         1         1         1
Absence of
systematic bias                 1         1         1         1         1         1
Lack of gross
error                           1         1         1         1         1         1
Temporal
correlation                     1         2         2         2         1         2
Spatial
alignment                       1         2         2         2         1         2

Pollutants with Long-term Standards
Performance                  TSP**      SO2**      NO2       TSP       SO2
Attribute                    (24 hour)  (24 hour)  (1 year)  (1 year)  (1 year)
Accuracy of the
peak prediction                 1          1          3         3         3
Absence of
systematic bias                 1          1          1         1         1
Lack of gross
error                           1          1          1         1         1
Temporal
correlation                     3          3         N/A††     N/A       N/A
Spatial
alignment                       2          2          2         2         2

* Category 2 - Performance standard should be satisfied, but some leeway may be allowed at the discretion of a reviewer.
  Category 3 - Meeting the performance standard is desirable but failure is not sufficient to reject the model.
† No short-term NO2 standard currently exists.
‡ Averaging times required by the NAAQS are in parentheses.
** Primary standards.
†† The performance attribute is not applicable.
synthesis, we present a summary table listing, for each performance attri-
bute, the recommended measures and a means for setting standards for them,
along with a sample value for the standard (ones listed are appropriate
for the Denver case study described in Section E of this chapter).
1. Recommended Performance Measures
Of the many performance measures considered in Chapter V (and in more
detail in Appendix C), which of these are most suitable for use in establishing
standards for model performance? The answer to this is constrained in two
major ways, the first conceptual and the second practical. First, the con-
ceptual constraint is imposed by the types of performance attributes we are
concerned with: The measures must adequately assess the presence or absence
of each of the five attributes. Second, the practical constraint is imposed
by the "sparseness" of the observational data: Since station observations
constitute the only data available for characterizing "true" ambient con-
ditions, we have little choice but to employ station performance measures
in determining model acceptability.
We draw a distinction between those measures that are of general use
in examining model performance and the much smaller subset of them that is
most amenable to the establishment of explicit standards. Many measures
can provide rich insight into model behavior but the information is conveyed
in a qualitative way not suitable for quantitative characterization (a
requisite for use in setting performance standards). These "measures,"
often involving graphical display, really are tools for use in "pattern
recognition." They display model behavior in suggestive ways, highlighting
"patterns" whose presence reveals much about model performance. Several
examples of such "measures" are isopleth contour maps of predicted concen-
trations and estimated "observed" ones, isopleth contour maps of the dif-
ferences between the two, and time histories of predicted and observed con-
centrations at specific monitoring stations.
Though we focus on station measures for use in setting model performance
standards, we do not suggest the calculation of performance measures be
limited to them. Many others, where each is appropriate, should be used.
The data should be viewed in as many, varied ways as possible in order to
enrich insight into model behavior. We suggest a number of useful measures
both in Chapter V and Appendix C.
Given that station measures are our "preferred" (rather, our "unavoid-
able") choice, we now consider the list of candidate measures. From these
we select our final recommended set. We present the candidate station per-
formance measures in Table VI-4. We group them by the number of stations
compared noting the performance attribute and generic issue class they are
most suited for addressing. We identify four types of comparisons:
> Event Specific Values. Predicted and observed concentra-
tions are compared at the time a specific event occurs.
For instance, the peak station prediction can be compared
with the peak station observation, even though these may
occur at different stations and times.
> Comparative Values. Predicted and observed concentrations
are compared at the same monitoring station.
> Average Values. Predicted and observed concentrations are
compared as averages over all monitoring stations.
> Offset Values. Observed concentrations at a given station
are compared with predicted values offset by a small amount
spatially (values at near-by stations) and/or temporally (values
at other times, either earlier or later).
Performance measures are of two different kinds: "absolute" and
"informational." The first type includes those measures for which we can
set specific, absolute standards. Measures of the second type are more
informational in nature, providing qualitative insight into model performance.
Their values are to be considered as "advisory," having associated with them
no specific standard.
TABLE VI-4. CANDIDATE STATION PERFORMANCE MEASURES

Issue categories: multiple-source (SIP/C, AQMP) and specific-source (PSD, NSR, OSR*, EIS/R, LIT).
Each of the measures below is marked as applicable to both multiple-source issues.

Peak Stations (Event-Specific Values)
  Accuracy of the peak prediction (concentration level):
    1. Difference between or ratio of peak station concentrations (could be at different
       measurement stations) (Absolute)
    2. Difference between or ratio of predicted and observed concentrations at the station
       recording the maximum measured value (Absolute)
  Accuracy of the peak prediction (location of peak):
    3. Spatial displacement between predicted and observed peak stations (Informational)
  Accuracy of the peak prediction (timing of peak):
    4. Timing difference between occurrence of predicted and observed peak (Absolute)

Each Station Separately (Comparative Values)
  Absence of systematic bias:
    5. Average relative deviation (Absolute)
  Lack of gross error:
    6. Average absolute relative deviation (Absolute)
    7. Standard deviation of deviations (Absolute)
  Temporal correlation/spatial alignment:
    8. Correlation coefficient (Absolute)
  Temporal correlation:
    9. Temporal offset correlation coefficient (Informational)
    10. Plots of comparative time histories (Informational)

All Stations Together (Average Values)
  Absence of systematic bias:
    11. Average relative deviation (Absolute)
  Lack of gross error:
    12. Average absolute relative deviation (Absolute)
    13. Standard deviation of deviations (Absolute)
  Temporal correlation/spatial alignment:
    14. Correlogram of prediction-observation pairs (Informational)
    15. Ratio of peak to average deviation (Informational)
    16. Correlation coefficient (Absolute)
TABLE VI-4 (Concluded)

Nearby Stations (Offset Values)
  Temporal correlation; spatial alignment:
    17. Temporal offset correlation coefficient (Informational)
    18. Plot of comparative time histories (Informational)
    19. Spatial offset correlation coefficient (comparison at the same time) (Informational)
    20. Spatial/temporal offset correlation coefficient (comparison at different times) (Informational)

* These measures are appropriate if offsets are considered at the level of ambient
  concentrations rather than primary emissions.
Often in practice modeling predictions are known with greater spatial
resolution than measurement data. The predicted concentration field, for
instance, can be resolved at intervals of several kilometers or less by
various types of models, including grid and Gaussian ones. To retain the
information contained in concentration field predictions, several "hybrid"
performance measures can be employed. With these, concentration field
predictions are compared with station measurements. We list in Table VI-5
several of these hybrid measures. When predictions are available in this
more detailed form, these measures may be calculated to supplement those in
Table VI-4.
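One way such a hybrid comparison might be set up is sketched below: a gridded prediction
field is sampled at each monitoring station location (here by nearest grid cell) before station
residuals are formed. The grid spacing, coordinates, and names are purely illustrative.

    def sample_field_at_stations(field, x0, y0, dx, dy, stations):
        """field[j][i] holds the predicted concentration in grid cell (i, j);
        (x0, y0) is the grid origin and dx, dy are the cell dimensions in km;
        stations maps a station name to its (x, y) coordinates in km."""
        sampled = {}
        for name, (x, y) in stations.items():
            i = int(round((x - x0) / dx))      # nearest-cell sampling
            j = int(round((y - y0) / dy))
            sampled[name] = field[j][i]
        return sampled

    # Hypothetical 3 x 3 field (pphm) on a 2 km grid, with two monitoring stations.
    field = [[3.0, 4.0, 5.0],
             [4.0, 6.0, 7.0],
             [5.0, 7.0, 9.0]]
    predicted_at_stations = sample_field_at_stations(field, 0.0, 0.0, 2.0, 2.0,
                                                     {"A": (1.9, 0.2), "B": (3.8, 4.1)})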
Our recommended choice of performance measures is based upon the
following criteria:
> The measure is an accurate indicator of the presence of a
given performance attribute.
> The measure is of the "absolute" kind, that is, specific
standards can be set.
> Only station measures should be considered for use in
setting standards. (This is more an unavoidable choice
than a preferred one.)
Based on these criteria, we have selected the set of measures described
in Table VI-6. The use of ratios (the ratio of the predicted to the observed
station peak concentration, for example) can introduce
difficulties: They can become unstable at low concentrations, and the sta-
tistics of a ratio of two random variables can become troublesome. Neverthe-
less, when used properly, their advantages can be offsetting. For example,
the use of the peak ratio instead of the peak difference (predicted minus
observed) permits a health effects rationale to
be used in recommending a performance standard (see a later discussion of the
effects rationale).
Before continuing, however, we insert an important caveat. For calcu-
lation of these measures to be statistically meaningful, a certain minimum
level of spatial and temporal "richness" must be available from monitoring
data. Often, this criterion is met for multiple-source, urban applications.
However, for isolated point source applications, it may not be. For such
cases, data inadequacies may be overcome by using prototypical "test bed"
data bases for the purposes of model verification. Selection of the
proper "test bed" could be accomplished by choosing the prototypical data
base that describes an application most nearly like the proposed one.
VI-18
-------
These data bases, where they do not already exist, could be assembled
through special measurement efforts at existing large point sources. Mon-
itoring could be extensive enough to insure adequate data "richness."
As a practical matter, however, such "test beds" are not currently
available. Verification instead must be conducted using whatever data are
at hand. These may be provided by tracer experiments. Alternatively,
where a source already exists (for instance, where retrofit of pollution
control equipment is the issue or where construction of a new source is
to occur on the site of an existing source), some site-specific data already
may be available.
Considerable care should be exercised when using such data to calcu-
late the performance measures listed in Table VI-6. If the data are too
"sparse," in either a spatial or a temporal sense, these measures may be
of little value, or worse yet, may actually be misleading. Additional
work needs to be conducted to identify, if possible, supplementary perfor-
mance measures for use when the available data are inadequate for reliable
use of the recommended measures.
Having stated the above caveat, we continue. A number of key assump-
tions are embedded in the choice of the specific measures shown in Table
VI-6. We state several of them:
> Concentration gradients within a pollutant cloud can be
"steep". Thus a slight spatial misalignment of the cloud,
perhaps an unconsequential problem on its own, can sometimes
result in the predicted peak occurring at a different
monitoring station than the measured peak. Estimating the
value of the concentration peak, however, is often of
much greater importance than predicting its exact location.
VI-19
-------
TABLE VI-5. USEFUL HYBRID PERFORMANCE MEASURES

Peak Station (Event-Specific Values)
  Accuracy of the peak prediction (concentration level):
    1. Difference between or ratio of predicted peak concentration and highest
       station value (Absolute)
  Accuracy of the peak prediction (location of peak):
    2. Spatial displacement between the predicted peak and the station measuring
       the highest value (Informational)
  Accuracy of the peak prediction (timing of peak):
    3. Timing difference between occurrence of the predicted peak and the maximum
       station measurement (Informational)

Each Station Separately (Comparative Values)
  Spatial alignment:
    4. Plot showing for each hour during the day the distance and direction from
       the measurement station to the nearest point at which a predicted
       concentration occurs equal to the station measured value (Informational)

All Stations Together (Average Values)
  Lack of gross error:
    5. Difference for each hour between the average predicted concentration
       (averaged over the entire field) and the average station measurement
       (averaged over all stations) (Informational)
    6. Difference for each hour between the standard deviations of the predicted
       concentrations and the station measured values (Informational)

(Issue categories: Multiple-Source: SIP/C, AQMP; Specific-Source: PSD, NSR, OSR, EIS/R, LIT.)
VI-20
-------
TABLE VI-6. MEASURES RECOMMENDED FOR USE IN SETTING MODEL PERFORMANCE STANDARDS†

Accuracy of the peak prediction
  Ratio of the predicted station peak to the measured station peak
  (could be at different stations and times): Cpp/Cpm
  Difference in timing of occurrence of station peak*: Δtp

Absence of systematic bias
  Average value and standard deviation of the mean deviation about the perfect
  correlation line normalized by the average of the predicted and observed
  concentrations, calculated for all stations during those hours when either the
  predicted or the observed values exceed some appropriate minimum value
  (possibly the NAAQS): (μ_d, σ_d)_OVERALL

Lack of gross error
  Average value and standard deviation of the absolute deviation about the
  perfect correlation line normalized by the average of the predicted and
  observed concentrations, calculated for all stations during those hours when
  either the predicted or the observed values exceed some appropriate minimum
  value (possibly the NAAQS): (μ_|d|, σ_|d|)_OVERALL

Temporal correlation*
  Temporal correlation coefficients at each monitoring station for the entire
  modeling period and an overall coefficient averaged for all stations:
  r_t,i and r_t,OVERALL for 1 ≤ i ≤ M monitoring stations

Spatial alignment
  Spatial correlation coefficients calculated for each modeling hour considering
  all monitoring stations, as well as an overall coefficient averaged for the
  entire day: r_x,j and r_x,OVERALL for 1 ≤ j ≤ N modeling hours

* These measures are appropriate when the chosen model is used to consider questions
  involving photochemically reactive pollutants subject to short-term standards.
† There is deliberate redundancy in the performance measures. For example, in
  testing for systematic bias, μ_d and σ_d are calculated. The latter quantity
  is a measure of "scatter" about the perfect correlation line. This is also an
  indicator of gross error and could be used in conjunction with μ_|d| and σ_|d|.
VI-21
-------
Consequently, we suggest, when this seems reasonable (judg-
ment is necessary here), comparing the peak station pre-
diction with the peak station measurement, regardless of
when or where they both occur.
> In addressing questions involving pollutants subject to short-term
standards, diurnal variation occurs in concentration levels. It is
reasonable to insist that short-term predictions emulate that pattern.
Differences in the timing of the peak should be considered (particularly
for photochemically reactive pollutants) and temporal correlation
should be evaluated.
> In many circumstances, percentage differences between predicted
and observed concentrations seem better indicators of model
performance than gross differences. For instance, a difference
of 0.04 ppm of ozone might be regarded as serious if ambient
levels were 0.10 ppm where it might not be if those levels were
0.24 ppm. The use of such measures can cause some problems:
Ratios can become unstable at low concentrations, and the statistics
of a ratio of two random variables can be complex. Neverthe-
less, percentage differences should be calculated (possibly
along with gross differences). Further, we suggest that residuals
(prediction minus observation) be taken about the perfect correla-
tion line (prediction equals observation), since we have no a
priori reason to regard observation as any more accurate than
prediction. This was pointed out by Anderson et al. (1977). We
also suggest normalizing the residuals by the arithmetic
average of the predicted and observed concentration.
> The concentrations of greatest interest are often the higher
values, that is, those that exceed some appropriate minimum
value (possibly the NAAQS, though this may differ from one
situation to another). We may be less interested in model
reliability below those levels. We suggest that performance
measures include only those prediction-observation "pairs" where
one or the other value exceeds the chosen minimum value. (Possibly
"stratification" may be of interest, that is, repeating the calcu-
lation of measures using different minimum values).
VI-22
-------
This should not be done, however, if it results in the
number of pairs being reduced below the number required
for statistical significance.
> Measurement stations usually are widely spaced. We assumed
this spacing to be so great that the use of spatial/temporal
offset correlation coefficients would be of uncertain value.
Consequently, we did not include them among the list of
measures recommended for use.
> Redundancy should be built into the calculation of per-
formance measures. This provides an internal means for
double-checking results. For example, in testing for
systematic bias, μ_d and σ_d are calculated. The latter quan-
tity is a measure of "scatter" about the perfect correla-
tion line. This is also an indicator of gross error and
should be used in conjunction with μ_|d| and σ_|d|.
2. Recommended Performance Standards
Having identified the performance measures requiring a specific
standard, we now consider four alternative rationales for setting those
standards. We designate the four as follows:
> Health Effects
> Control Level Uncertainty
> Guaranteed Compliance
> Pragmatic/Historic
The guiding principles for each of these rationales are stated in
Table VI-7.
We describe in detail each rationale in Appendix D, deferring their
technical description in order not to interrupt the flow of this chapter.
However, to offer insight into their general nature, we present here a
brief outline of each.
VI-23
-------
TABLE VI-7. POSSIBLE RATIONALES FOR SETTING MODEL PERFORMANCE STANDARDS

Rationale: Health Effects
Guiding Principle: The metric of concern is the area-integrated cumulative
health effects due to pollutant exposure; the ratio of the metric's value
based on prediction to its value based on observation must be kept to within
a prescribed tolerance of unity.

Rationale: Control Level Uncertainty
Guiding Principle: Uncertainty in the percentage of emissions control required
must be kept within certain allowable bounds.

Rationale: Guaranteed Compliance
Guiding Principle: Compliance with the NAAQS must be "guaranteed;" all
uncertainty must be on the conservative side even if it means introducing a
systematic bias.

Rationale: Pragmatic/Historic
Guiding Principle: In each new application a model should perform at least as
well as the "best" previous performance of a model in its generic class in a
similar application; until such a historical data base is complete, other more
heuristic approaches may be applied.
> Health Effects. The most fundamental reason for setting
air quality standards is to limit the adverse health impact
the regulated pollutants (and their products) produce.
Thus, founding a model performance standard on a health
effects basis has strong intuitive appeal. To do so, we
assume an analytic form for urban population distribution
and an exposure/dosage health effects functional, both
of which require as inputs only easily derived data. Using
these, we determine in analytic form a new health-based
metric: the area-integrated cumulative health effects. We
estimate through this metric the total health burden experi-
enced by the population during the day. The model is required
to predict concentrations that do not differ from observations
to the point that an unacceptable difference is seen in the
health metric. While the data used are application-specific,
the method itself is general. The assumptions made in deriving
VI-24
-------
this rationale, while extensive, seem plausible. A sample case
was conducted for ozone exposure in the Denver Metropolitan
region, with promising corroboration of the rationale in several
key regards. The sample case is described in detail in
Appendix D.
> Control Level Uncertainty. With this rationale we set perfor-
mance standards to ensure that uncertainty in estimates of the
amount of pollution control required be kept within acceptable
bounds. These limits may be determined in a number of ways,
but we consider limits on uncertainty in control cost as a
promising means for doing so. If we can assume that pollutant
production and evolution over the modeled region can be approxi-
mated by some simple surrogate, such as an isopleth diagram
for ozone, then control uncertainty limits can be directly and
easily related to equivalent bounds in uncertainty in the pol-
lutant peak, the quantity to which control strategies are often
designed.
> Guaranteed Compliance. The NAAQS are written in quite
specific terms and must ultimately be complied with. An
argument can be made that to "guarantee" such compliance,
uncertainty in model predictions must be on the "conser-
vative" side. That is, the probability must be accept-
ably small that a control strategy designed based on model
predictions will not actually achieve compliance. We con-
sider this rationale here and in Appendix D primarily for
completeness. While the rationale has some potential
usefulness, it implies the introduction of a systematic
bias into modeling results, something we would hope to
avoid in a final choice of a performance standard.
> Pragmatic/Historic. Standards for all performance measures
cannot be derived based on the rationales mentioned above,
something we will discuss later in this chapter. Until
additional research expands our options by providing insight
into other rationales, we adopt a pragmatic approach. We
may proceed in either of two ways. If we are able to state
VI-25
-------
heuristically a specific guiding principle for setting a
standard for a particular measure, we invoke it. Otherwise,
we simply require the following: In each new application
a model should perform at least as well as the "best" pre-
vious performance of a model in its generic class in a
similar application. In addition to being pragmatic, this
last approach is also evolutionary, requiring a continually
expanding and updated model/application data base.
The four rationales differ in their usefulness vis-a-vis the five
performance attributes. Shown in Table VI-8 are the attributes addressable
by measures whose standards are set by each of the rationales. Only the
Pragmatic/Historic rationale is of use in addressing all attributes;
the other three are of use principally in defining the level of performance
required in predicting values at or near the concentration peak. The Health
Effects and Guaranteed Compliance rationales also may have some application
to problems involving concentration field error.
TABLE VI-8. PERFORMANCE ATTRIBUTES ADDRESSABLE USING PERFORMANCE STANDARD RATIONALES

Performance Attribute             Health*    Control Level*   Guaranteed    Pragmatic/
                                  Effects    Uncertainty      Compliance    Historic
Accuracy of the peak prediction      X             X               X            X
Absence of systematic bias                                                      X
Lack of gross error                  X                             X            X
Temporal correlation                                                            X
Spatial alignment                                                               X

* These are most suited for photochemically reactive pollutants subject
  to short-term standards.
VI-26
-------
One conclusion seems clear. Unless more comprehensive rationales are
developed in subsequent research work, several must be used simultaneously
to completely define standards of performance. Any one of the four can be
used to specify allowable bounds on model performance in predicting peak
concentrations. Either the Health Effects or the Pragmatic/Historic ration-
ales can be helpful in setting standards for error measures. Only the latter
of these two rationales is of use for addressing attributes of the other types.
We associate in Table VI-9 each rationale with those generic issues
for which its use is appropriate. Several assumptions are embedded in
that table. Among them are the following:
> Health effects are not of overriding concern in PSD and OSR
issues, for reasons noted earlier. (Even though we indicate
such a rationale may be used in addressing other specific-
source issues, we observe that plume "narrowness" can limit
downwind health impact).
> Near-source peak concentrations are not of primary interest
in OSR, but rather "far-field" average values.
> The Guaranteed Compliance rationale is of use in addressing
questions involving PSD as long as the air quality standards
being used are the PSD class increments.
TABLE VI-9. ASSOCIATION OF RATIONALES WITH GENERIC ISSUES

                               Issue Category
                        Multiple-Source            Specific-Source
Rationale               SIP/C    AQMP      PSD    NSR    OSR    EIS/R    LIT
Health Effects            X        X               X              X       X
Control Level             X        X        X      X      X       X       X
  Uncertainty
Guaranteed                X        X        X      X              X
  Compliance
Pragmatic/                X        X        X      X      X       X       X
  Historic

VI-27
-------
Having outlined the rationales we consider in this report, it remains
to match them with the set of performance measures we recommended earlier
in this chapter. As is clear from Table VI-8, we have no alternative but
to apply the Pragmatic/Historic rationale for those measures designed to
test for systematic bias or to evaluate temporal behavior and spatial align-
ment. However, several alternatives exist for measures dealing with peak
performance and gross error.
We select in the following ways from among the alternatives. Hoping to
avoid introducing a procedural bias, we first eliminate the Guaranteed Com-
pliance rationale from further consideration. Then, because the Health
Effects rationale is better suited for use in setting standards for peak-
accuracy measures, we choose to use it only in that way.
Our recommended choice for use in establishing standards for peak-
accuracy measures is a composite one, combining the Health Effects and Control
Level Uncertainty rationales. Were a model to overpredict the peak, a
control strategy designed based on its prediction might be expected to abate
the health impact actually occurring. If the model underpredicted, however,
the control strategy might be "underdesigned," with the risk existing that
some of the health impact might remain unabated even after control implemen-
tation. The penalty, in a health sense, is incurred only when the model
underpredicts. The Health Effects rationale then is one-sided, helping us
set performance standards only on the "low side."
On the other hand, the Control Level Uncertainty rationale is bounded
"above" and "below", that is, its use provides a tolerance interval about the
value of the measured peak concentration. For a model to be judged accept-
able under this criterion, its prediction of the peak concentration would
have to fall within this interval. Model underprediction could lead to
control levels lower than required, but residual health risks. Overpre-
diction, on the other hand, could lead to abatement strategies posing little
or no health risk but incurring control costs greater than required.
VI-28
-------
For the above reasons, we suggest that the Control Level Uncertainty
rationale be used to establish an upper bound (overprediction) on the
acceptable difference between the predicted and observed peak. We would
choose the lower bound (underprediction) to be the interval that is the
minimum of that suggested by the Health Effects and Control Level Uncertainty
rationales.
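A minimal sketch of how such a composite acceptance interval for the peak ratio might be assembled is given below. The bound values are hypothetical (the 80 to 150 percent interval echoes the Denver example discussed later), and reading "the minimum of the two intervals" as the more restrictive of the two lower bounds is an assumption of this sketch, not a statement of the report's procedure.

    def peak_acceptance_interval(he_lower, clu_lower, clu_upper):
        """Upper bound (overprediction) from Control Level Uncertainty;
        lower bound (underprediction) taken as the more restrictive
        (larger) of the Health Effects and Control Level Uncertainty
        lower bounds."""
        return max(he_lower, clu_lower), clu_upper

    # Hypothetical bounds, expressed as fractions of the measured peak.
    alpha, beta = peak_acceptance_interval(he_lower=0.80, clu_lower=0.70,
                                           clu_upper=1.50)

    def peak_ratio_acceptable(c_pp, c_pm):
        """Test whether the peak ratio Cpp/Cpm falls in [alpha, beta]."""
        ratio = c_pp / c_pm
        return alpha <= ratio <= beta

    print(peak_ratio_acceptable(c_pp=9.9, c_pm=10.0))   # True: ratio = 0.99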
We list our recommendations in Table VI-10, noting the possibility for
peak-accuracy measures that the recommended rationales may not be appropriate
in all applications for all pollutants. Whether health effects would be an
appropriate consideration when considering TSP, for instance, is unclear.
The Health Effects rationale is best suited for use in urban applications
involving short-term, reactive pollutants. In those circumstances when the
HE or CLU rationales are not suitable, we suggest the Pragmatic/Historic
rationale.
TABLE VI-10. RECOMMENDED RATIONALES FOR SETTING STANDARDS

Performance Attribute           Recommended Rationale
Accuracy of peak prediction     Health Effects* (lower side/underprediction)
                                Control Level Uncertainty* (upper side/overprediction)
Absence of systematic bias      Pragmatic/Historic
Lack of gross error             Pragmatic/Historic
Temporal correlation            Pragmatic/Historic
Spatial alignment               Pragmatic/Historic

* These may not be appropriate for all regulated pollutants in all applica-
  tions. When they are not, the Pragmatic/Historic rationale should be
  employed. They are most applicable for photochemically reactive pollu-
  tants subject to a short-term standard (O3 and NO2, if a 1-hour standard
  is set).
VI-29
-------
3. Summary Table of Recommended Measures and Standards
Until now, our discussion has remained general when relating performance
measures and standards. Here we become specific. In Table VI-11, we sum-
marize, for each of the five problem types whose presence we are testing for,
the performance measures we recommend and the standards we suggest. Since
the actual value of the standard may vary from one application to another
or between pollutant types, we present sample values calculated based on a
sample case. The example is appropriate for consideration of SIP/C in the
Denver Metropolitan region and is described in a case study fashion in Section
E of this chapter.
Where we invoke the Pragmatic/Historic rationale as justification for
selecting specific standards, we also state the specific guiding principle
we followed. We summarize those here:
> When the pollutant being considered is subject to a short-
term standard, the timing of the concentration peak may be an
important quantity for a model to predict. This is parti-
cularly true when the pollutant is also photochemically
reactive. We state as a guiding principle: "For photochem-
ically reactive pollutants, the model must reproduce reason-
ably well the phasing of the peak." For ozone an acceptable
tolerance for peak timing might be ± 1 hour.
> The model should not exhibit, at concentrations at or above
some appropriate minimum value (possibly the NAAQS), any
systematic bias greater than the maximum resulting from EPA-
allowable calibration error. We would consider in our calcu-
lations any prediction-observation pair in which either of
the values exceeds the pollutant standard. Error (as
measured by its mean and standard deviation) should be
indistinguishable from the distribution of differences
resulting from the comparison of an EPA-acceptable monitor
with an EPA reference monitor. The EPA has set maximum
allowable limits on the amount by which a monitoring technique
may differ from a reference method (40 CFR §53.20). An
VI-30
-------
TABLE VI-11. SUMMARY OF RECOMMENDED PERFORMANCE MEASURES AND STANDARDS

Performance Attribute: Accuracy of the peak prediction
  Performance Measure: Ratio of the predicted station peak to the measured
    station peak (could be at different stations and times), Cpp/Cpm
  Type of Rationale: Health Effects (lower side) combined with Control Level
    Uncertainty (upper side)
  Guiding Principle: Limitation on uncertainty in aggregate health impact and
    pollution abatement costs†
  Sample Value (Denver Example): 80 ≤ Cpp/Cpm ≤ 150 percent

  Performance Measure: Difference in timing of occurrence of station peak*, Δtp
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: Model must reproduce reasonably well the phasing of the
    peak, say, ±1 hour
  Sample Value (Denver Example): ±1 hour

Performance Attribute: Absence of systematic bias
  Performance Measure: Average value and standard deviation of the mean
    deviation about the perfect correlation line, normalized by the average of
    the predicted and observed concentrations, calculated for all stations
    during those hours when either predicted or observed values exceed some
    appropriate minimum value (possibly the NAAQS): (μ_d, σ_d)_OVERALL
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: No or very little systematic bias at concentrations
    (predictions or observations) at or above some appropriate minimum value
    (possibly the NAAQS); the bias should not be worse than the maximum bias
    resulting from EPA-allowable monitor calibration error (-8 percent is a
    representative value for ozone); the standard deviation should be less than
    or equal to that of the difference distribution of an EPA-acceptable
    monitor** compared with a reference monitor (3 pphm is representative for
    ozone at the 95 percent confidence level)
  Sample Value (Denver Example): No apparent bias at ozone concentrations above
    0.06 ppm (see Table VI-12 and Figures VI-5 and VI-6 for further details)

Performance Attribute: Lack of gross error
  Performance Measure: Average value and standard deviation of the absolute
    mean deviation about the perfect correlation line, normalized by the
    average of the predicted and observed concentrations, calculated for all
    stations during those hours when either predicted or observed values exceed
    some appropriate minimum value (possibly the NAAQS): (μ_|d|, σ_|d|)_OVERALL
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: For concentrations at or above some appropriate minimum
    value (possibly the NAAQS), the error (as measured by the overall values of
    μ_|d| and σ_|d|) should be indistinguishable from the difference resulting
    from comparison of an EPA-acceptable monitor with a reference monitor
  Sample Value (Denver Example): No excessive gross error (see Table VI-12 and
    Figures VI-5 and VI-6 for further details)

Performance Attribute: Temporal correlation*
  Performance Measure: Temporal correlation coefficients at each monitoring
    station for the entire modeling period and an overall coefficient for all
    stations: r_t,i and r_t,OVERALL for 1 ≤ i ≤ M monitoring stations
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: At a 95 percent confidence level, the temporal profile of
    predicted and observed concentrations should appear to be in phase (in the
    absence of better information, a confidence interval may be converted into
    a minimum allowable correlation coefficient by using an appropriate
    t-statistic)
  Sample Value (Denver Example): For each monitoring station,
    0.69 ≤ r_t ≤ 0.97; overall, r_t,OVERALL = 0.88. In this example a value of
    r ≥ 0.53 is significant at the 95 percent confidence level

Performance Attribute: Spatial alignment
  Performance Measure: Spatial correlation coefficients calculated for each
    modeling hour considering all monitoring stations, as well as an overall
    coefficient for the entire day: r_x,j and r_x,OVERALL for 1 ≤ j ≤ N
    modeling hours
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: At a 95 percent confidence level, the spatial distribution
    of predicted and observed concentrations should appear to be correlated
  Sample Value (Denver Example): For each hour, -0.43 ≤ r_x ≤ 0.66; overall,
    r_x,OVERALL = 0.17. In this example a value of r ≥ 0.71 is significant at
    the 95 percent confidence level

* These measures are appropriate when the chosen model is used to consider questions
  involving photochemically reactive pollutants subject to short-term standards.
† These may not be appropriate for all regulated pollutants in all applications. When
  they are not, the Pragmatic/Historic rationale should be employed.
** The EPA has set maximum allowable limits on the amount by which a monitoring
  technique may differ from a reference method. An "EPA-acceptable monitor" is
  defined here to be one that differs from a reference monitor by up to the maximum
  allowable amount.
VI-31
-------
"EPA-acceptable monitor" is defined here to be one that
differs from a reference monitor by up to the maximum
allowable amount.
> Prediction and observation should appear to be correlated
at a 95 percent confidence level, both when compared
temporally and spatially. We can estimate the minimum
allowable value for the respective correlation coef-
ficient by using a t-statistic at the appropriate per-
centage level and having the degrees of freedom required
by the number of prediction-observation pairs.
The guiding principles noted above are plausible ones, though in some
cases they are arbitrary. As a "verification data base" of experience is
assembled, historically achieved performance levels may be better indicators
of the expected level of model performance. Standards derived on this more
pragmatic basis may supplant those deriving from the "guiding principles"
followed in this report.
4. Formulas for Calculating Performance Measures and Standards
A number of performance measures are recommended in Table VI-6. Here
we state explicitly the equations used for their calculation and the forms
assumed by the standards. We include, where appropriate, brief theoretical
justifications for these relationships.
The definitions are self-explanatory for measures testing the accuracy
of the peak model prediction. Specifically,
    α ≤ Cpp/Cpm ≤ β ,    (VI-1)

where Cpp is the peak station prediction, Cpm is the peak station measurement,
α is the lower bound on the ratio of the peaks, and β is the upper bound.
The bounds may be determined either from Pragmatic/Historic considerations
VI-32
-------
or, where possible, by means of the Health Effects/Control Level Uncertainty
rationales described in Appendix D. The latter of these two approaches may
prove feasible only when considering photochemically reactive pollutants
(particularly ozone) subject to a short-term standard. Also, for such
reactive species,
    |Δtp| < δ ,    (VI-2)

where |Δtp| is the absolute value of the difference between the predicted
and observed times of the station peak, and δ is the maximum allowable dif-
ference, say, one hour (this is an arbitrarily set value).
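The following sketch illustrates how these two peak-accuracy quantities might be computed from arrays of station predictions and observations. The peak is taken over all stations and hours, consistent with the suggestion that the prediction and measurement peaks need not coincide in space or time; the bound values alpha, beta, and delta_hours are illustrative placeholders only.

    import numpy as np

    def peak_accuracy(predicted, observed, alpha=0.8, beta=1.5, delta_hours=1):
        """predicted and observed are arrays of shape (stations, hours).
        Returns the peak ratio Cpp/Cpm, the timing difference in hours,
        and whether both fall within the illustrative bounds."""
        c_pp = predicted.max()                               # peak station prediction
        c_pm = observed.max()                                # peak station measurement
        t_p = np.unravel_index(predicted.argmax(), predicted.shape)[1]
        t_m = np.unravel_index(observed.argmax(), observed.shape)[1]
        ratio = c_pp / c_pm                                  # quantity bounded in Eq. VI-1
        dt = abs(int(t_p) - int(t_m))                        # quantity bounded in Eq. VI-2
        return ratio, dt, (alpha <= ratio <= beta) and (dt <= delta_hours)

    # Hypothetical 3-station, 5-hour example (concentrations in pphm).
    pred = np.array([[4., 6., 9., 8., 5.],
                     [3., 5., 7., 6., 4.],
                     [2., 4., 6., 5., 3.]])
    obs = np.array([[4., 7., 10., 8., 5.],
                    [3., 5.,  8., 6., 4.],
                    [2., 4.,  6., 5., 3.]])
    print(peak_accuracy(pred, obs))   # ratio 0.9, timing difference 0 hours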
Underlying our definitions of bias and error is the following assump-
tion: A priori, we have no reason to prefer either prediction or observa-
tion as a better measure of reality. Both, in fact, can be subject to sig-
nificant uncertainty. It follows from this assumption that residuals (pre-
dicted concentrations minus observed ones) should be taken perpendicularly
about the perfect correlation line.
We emphasize an important point: The residual for a given prediction-
observation pair is not the geometric distance from the perfect correlation
line, as displayed in a correlogram (such as the one shown later in
Figure VI-4). Rather, the geometric distance must be scaled downward by a
factor of √2. That this is so follows from the discussion presented below.
It is based on our requirement that prediction and observation differ by no
more than the maximum amount by which an EPA-acceptable monitoring technique
may differ from the accepted reference technique.
Uncertainty in monitoring results can be introduced from many sources.
Three principal source categories are the calibration method, the agreement
with the reference monitoring technique, and the actual instrument error.
The last of these categories includes instrument noise and precision, mea-
surement drift, and interference from other contaminants. In defining the
characteristics of the EPA-acceptable monitor we wish to use as a standard,
VI-33
-------
we have chosen to include only the first two error source categories. We
thus eliminate the need to consider performance characteristics of specific
monitoring instruments. Also, in comparing a monitor with an instrument
using the EPA-accepted reference monitoring technique, it is not unreason-
able to assume that both are subject to the same instrument error.
We may define an acceptance standard for a model insofar as error
and bias are concerned: The distribution of differences between prediction
and observation must be indistinguishable from that resulting from the com-
parison of an EPA-acceptable monitor with the accepted reference monitor.
Specifically, we define "indistinguishable" to mean

    |μ_d| ≤ ξ ,    (VI-3)

    σ_d ≤ ε ,    (VI-4)

where ξ and ε can be determined from federal regulations (40 CFR §53.20)
for instrument performance, and μ_d and σ_d are defined below.
We may confirm a model's acceptability by hypothesizing that the
acceptance standard for bias and error is satisfied and checking to deter-
mine whether this hypothesis is violated. Consistent with this approach,
we may assume that each prediction and observation pair are random samples
drawn from the same distribution, the one that describes the behavior of
an EPA-acceptable monitor with respect to a reference monitor. The stan-
dard deviation (S.D.) of a random variable whose value is the difference of
two other random variables, each having the same S.D. σ, may be expressed as

    σ_diff = √2 σ .    (VI-5)

The geometric distance from the perfect correlation line, d_i, may
be written as

    d_i = (P_i - M_i)/√2 ,    (VI-6)
VI-34
-------
where P_i and M_i are the i-th prediction-observation pair. We are search-
ing for a test variable σ_d to compare with σ. Therefore, referring to
Equation VI-5, we see that we must divide d_i by √2 to obtain the properly
scaled mean deviation from the perfect correlation line, d*_i, that is,

    d*_i = (P_i - M_i)/2 .    (VI-7)

Thus, the average and standard deviation of the mean deviation may be
expressed as

    μ_d = (1/N) Σ_{i=1..N} d*_i ,    (VI-8)

    σ_d = [ (1/(N-1)) Σ_{i=1..N} (d*_i - μ_d)² ]^(1/2) .    (VI-9)
These quantities may be compared with those characterizing the distri-
bution of differences between an EPA-acceptable monitor and a reference
instrument. Those values may be derived from 40 CFR §53.20. As an example,
(see Burton, et al., 1976) an EPA-acceptable monitor for ozone/oxidants
could have a -8 percent bias and a 95 percent confidence interval of
±3 pphm (a σ of 1.53 pphm). If an EPA-acceptable monitor were defined to
be subject to instrument error as well, the -8 percent bias would remain
because it is assumed due to calibration, but the 95 percent confidence
interval would increase to ±7 pphm (a σ of 3.57 pphm).
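The conversion from a 95 percent confidence-interval half-width to a standard deviation quoted above can be sketched as follows, assuming a normal difference distribution; the routine reproduces the 1.53 pphm and 3.57 pphm values.

    from scipy.stats import norm

    def sigma_from_ci_halfwidth(halfwidth, confidence=0.95):
        """Convert a two-sided confidence-interval half-width into the
        standard deviation of an assumed normal difference distribution."""
        z = norm.ppf(0.5 + confidence / 2.0)   # about 1.96 for a 95 percent interval
        return halfwidth / z

    print(round(sigma_from_ci_halfwidth(3.0), 2))   # ~1.53 pphm
    print(round(sigma_from_ci_halfwidth(7.0), 2))   # ~3.57 pphm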
We noted earlier that the "seriousness" of the magnitude of a given
residual depends on the ambient concentration of the pollutant being con-
sidered. For instance, a value for d* of 2 pphm might be considered of
less importance when ambient concentrations are on the order of 30 pphm
than when they are 10 pphm. In consideration of this effect, we suggest
VI-35
-------
normalizing residuals by the arithmetic average of the predicted and
observed concentrations for a given pair. This is consistent with our
earlier statement that, a priori, we have no reason to prefer observa-
tion over prediction as an inherently better indicator of reality.
Defining the average concentration for a given pair to be

    C_AVE,i = (P_i + M_i)/2 ,    (VI-10)

we may write expressions for the normalized average and standard deviation
of the mean deviation about the perfect correlation line:

    μ_d = (1/N) Σ_{i=1..N} (d*_i / C_AVE,i) ,    (VI-11)

    σ_d = [ (1/(N-1)) Σ_{i=1..N} (d*_i/C_AVE,i - μ_d)² ]^(1/2) .    (VI-12)
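A brief sketch of these bias measures (scaled deviations, pair averages, and their normalized mean and standard deviation, screened by a minimum concentration) follows. The prediction-observation pairs and the 8 pphm screening value used in the example are hypothetical.

    import numpy as np

    def bias_measures(predicted, observed, minimum=None):
        """Normalized mean deviation about the perfect correlation line.
        Pairs are kept only if either member is at or above `minimum`
        (for example, the NAAQS)."""
        p = np.asarray(predicted, dtype=float)
        m = np.asarray(observed, dtype=float)
        if minimum is not None:
            keep = (p >= minimum) | (m >= minimum)
            p, m = p[keep], m[keep]
        d_star = (p - m) / 2.0          # scaled deviation, as in Eq. VI-7
        c_ave = (p + m) / 2.0           # pair average, as in Eq. VI-10
        r = d_star / c_ave              # normalized deviation
        return r.mean(), r.std(ddof=1), len(r)

    # Hypothetical hourly prediction-observation pairs (pphm), screened at 8 pphm.
    pred = [9.5, 7.0, 12.0, 3.0, 8.5]
    obs = [9.0, 8.0, 11.0, 4.0, 9.5]
    mu_d, sigma_d, n_pairs = bias_measures(pred, obs, minimum=8.0)
    print(n_pairs, round(mu_d, 3), round(sigma_d, 3))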
A deliberate redundancy has been built into the list of suggested per-
formance measures. Both μ_d and σ_d are measures of "scatter" about the
perfect correlation line. Thus, they are also indicators of gross error
and may be used in conjunction with those measures explicitly listed in
Table VI-6 for use in investigating gross error. These measures consider
absolute rather than signed residuals. Specifically, the normalized
average value and standard deviation of the absolute deviation about the
perfect correlation line may be written

    μ_|d| = (1/N) Σ_{i=1..N} |d*_i| / C_AVE,i ,    (VI-13)

    σ_|d| = [ (1/(N-1)) Σ_{i=1..N} (|d*_i|/C_AVE,i - μ_|d|)² ]^(1/2) .    (VI-14)

Their values may be compared with standards such that

    μ_|d| ≤ Λ ,    (VI-15)

    σ_|d| ≤ Υ ,    (VI-16)

where the values of Λ and Υ may be derived from instrument performance
specifications in federal regulations.
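A companion sketch for the gross-error measures follows; it applies the same minimum-value screening and compares the results against bound values, which here are arbitrary placeholders rather than values derived from 40 CFR §53.20.

    import numpy as np

    def gross_error_measures(predicted, observed, minimum=None):
        """Normalized average and standard deviation of the absolute
        deviation about the perfect correlation line."""
        p = np.asarray(predicted, dtype=float)
        m = np.asarray(observed, dtype=float)
        if minimum is not None:
            keep = (p >= minimum) | (m >= minimum)
            p, m = p[keep], m[keep]
        abs_dev = (np.abs(p - m) / 2.0) / ((p + m) / 2.0)   # |d*_i| / C_AVE,i
        return abs_dev.mean(), abs_dev.std(ddof=1)

    def meets_standard(mu_abs, sigma_abs, mu_bound, sigma_bound):
        """Compare the error measures against bounding values; the bounds
        are application-specific and here purely illustrative."""
        return mu_abs <= mu_bound and sigma_abs <= sigma_bound

    mu_abs, sigma_abs = gross_error_measures([9.5, 7.0, 12.0], [9.0, 8.0, 11.0],
                                             minimum=8.0)
    print(meets_standard(mu_abs, sigma_abs, mu_bound=0.20, sigma_bound=0.20))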
It may be helpful to visualize the definitions of d*_i and C_AVE geomet-
rically on a correlogram. Figure VI-1 is a schematic, showing the orien-
tation of the d*-C_AVE axes with respect to the P-M axes of the correlogram.
The C_AVE axis is aligned with the perfect correlation line, and both the
d* and C_AVE axes are scaled downward by a factor of √2 from the P and
M axes.

FIGURE VI-1. ORIENTATION AND SCALING OF C_AVE AND d* AXES
             ON A PREDICTION-OBSERVATION CORRELOGRAM
VI-37
-------
Finally, we consider measures suitable for use in testing for tem-
poral correlation and spatial alignment. The former of these is of con-
cern when the chosen model is used to consider questions involving photo-
chemically reactive pollutants subject to a short-term standard. We sug-
gest the use of temporal correlation coefficients, whose values are
defined to be
    r_t,i = [ (1/N) Σ_{j=1..N} (P_ij - μ_P,i)(M_ij - μ_M,i) ] / (σ_P,i σ_M,i) ,

    r_t,OVERALL = (1/K) Σ_{i=1..K} r_t,i ,    (VI-17)

where r_t,i is the temporal correlation coefficient at the i-th station for
the N divisions of the modeling period, and r_t,OVERALL is the average correla-
tion coefficient for all the K monitoring stations. Also, μ_P,i and σ_P,i
are the mean and standard deviation of the predictions for the N hours at the
i-th station. Similarly, μ_M,i and σ_M,i are the mean and standard deviation
of the measured concentrations at the i-th station.
In testing for spatial alignment, we recommend using the following
spatial correlation coefficients:
    r_x,j = [ (1/K) Σ_{i=1..K} (P_ij - μ_P,j)(M_ij - μ_M,j) ] / (σ_P,j σ_M,j) ,

    r_x,OVERALL = (1/N) Σ_{j=1..N} r_x,j ,    (VI-18)
VI-38
-------
where r_x,j is the spatial correlation coefficient at the j-th hour for the
K monitoring stations, and r_x,OVERALL is the average correlation coefficient
for all the N modeling period divisions (e.g., hours). Also, μ_P,j and σ_P,j
are the mean and standard deviation of the predictions for the K stations at
the j-th hour. Similarly, μ_M,j and σ_M,j are the mean and standard deviation
of the measured concentrations at the j-th hour.
As for the form of the standard, we would require that

    r_t, r_x ≥ r_min ,

where r_min is defined at the 95 percent confidence level, perhaps using
a t-statistic if no better method is apparent.
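The following sketch computes the temporal and spatial correlation coefficients for a station-by-hour array and a minimum significant coefficient from a t-statistic. Under that approach, 14 hourly pairs give a threshold near 0.53 and 8 station pairs give one near 0.71, consistent with the Denver sample values quoted in Tables VI-11 and VI-12; the array shapes and data are otherwise hypothetical.

    import numpy as np
    from scipy.stats import t

    def r_min(n_pairs, confidence=0.95):
        """Minimum correlation coefficient significant at the given
        (two-sided) confidence level, from a t-statistic with n - 2
        degrees of freedom."""
        df = n_pairs - 2
        t_crit = t.ppf(0.5 + confidence / 2.0, df)
        return t_crit / np.sqrt(t_crit**2 + df)

    def correlation_coefficients(predicted, observed):
        """Temporal coefficients r_t,i (one per station, across hours) and
        spatial coefficients r_x,j (one per hour, across stations) for
        arrays of shape (stations, hours), plus the overall averages."""
        r_t = np.array([np.corrcoef(predicted[i], observed[i])[0, 1]
                        for i in range(predicted.shape[0])])
        r_x = np.array([np.corrcoef(predicted[:, j], observed[:, j])[0, 1]
                        for j in range(predicted.shape[1])])
        return r_t, r_t.mean(), r_x, r_x.mean()

    # The Denver example uses 14 modeling hours per station and 8 stations
    # per hour.
    print(round(r_min(14), 2))   # ~0.53, the temporal threshold
    print(round(r_min(8), 2))    # ~0.71, the spatial threshold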
E. A SAMPLE CASE: THE SAI DENVER EXPERIENCE
In Section D we recommended a set of measures and standards for use in
evaluating model performance. Here we illustrate how these measures might
actually be used in practice. To do so, we draw on SAI experience in model-
ing the Denver metropolitan region (Anderson et al . , 1977) using the grid-
based SAI Airshed Model (Ames et al., 1978). We first show for the sample
case the values we calculate for the performance measures; then we discuss
how to interpret their meaning.
1. The Denver Modeling Problem
Over the past several years, Region VIII of the EPA has prepared an
Overview EIS assessing the impact on the Denver metropolitan region of the
proposed construction of twenty-two separate wastewater treatment projects.
Adopting a regional approach, they assessed the projected impact of the
facilities in several key ways, among which was their effect on air quality.
They contracted with SAI in late 1976 to conduct that portion of the
assessment. SAI employed several air quality models, one a long-term
climatological model (CDM) and the other a short-term photochemical model
(the SAI Airshed Model). We consider the latter of these in our sample
case.
VI-39
-------
The grid-based Airshed Model is fully three-dimensional and capable
of simulating concentrations of up to 13 chemical species, including ozone,
nitrogen dioxide and several types of reactive hydrocarbons. The modeling
grid chosen for overlaying the Denver Metropolitan region was 30 miles by
30 miles, subdivided horizontally into grid cells two miles on a side.
In cooperation with local agencies, SAI assembled meteorological
information (spatial and temporal profiles of temperature and inversion
height, as well as wind speeds and directions) characterizing atmospheric
conditions on several summertime test days, 29 July 1975, 28 July 1976,
and 3 August 1976. Also, gridded emissions inventories were compiled
(hourly by species) for those days as were estimates for the years 1985
and 2000. Simulations were then conducted, with projections also made
of air quality in those two future years.
2. Values of the Performance Measures
We compare in this sample case the predicted and observed concentra-
tions of ozone at each monitoring station in the regional measurement net-
work. The issues we address are SIP/C and AQMP. On the test date we have
chosen, 28 July 1976, eight monitoring stations provided ozone concentra-
tion data. Their locations are shown in Figure VI-2. Of the nine sta-
tions, all but CAMP provided usable ozone measurements. Data were
recorded as hourly averages for each hour throughout the day.
The Airshed Model generates its predictions as grid cell-averaged
hourly concentrations. Through interpolation, these values may then be
used to estimate station predictions (concentrations at fixed points
rather than grid cell averages). Plotted in Figure VI-3 are the predicted
and observed ozone concentrations at each of the eight stations reporting
on the modeled day (Anderson, et al., 1977). From the station predictions
and observations, we can calculate performance measure values. We present
the values of these measures in Table VI-12. We indicate in the table
how these values might be interpreted in evaluating model performance,
considering each in more detail below.
VI-40
-------
KEY
NG - Northglenn                                NJ - National Jewish Hospital
WE - Welby                                     GM - Green Mountain
AR - Arvada                                    OV - Overland
CR - C.A.R.I.H.                                PR - Parker Road
CM - Continuous Air Monitoring Program [CAMP]
FIGURE VI-2. LOCATIONS OF MONITORING STATIONS IN THE DENVER METROPOLITAN REGION
VI-41
-------
(Curves of observed and predicted ozone concentration versus time of day,
by hourly interval, for each station.)
FIGURE VI-3.
PREDICTED AND OBSERVED OZONE CONCENTRATIONS
AT EACH MONITORING STATION DURING THE DAY
(DENVER, 28 JULY 1976)
.VI-42
-------
TABLE VI-12. SAMPLE VALUES FOR MODEL PERFORMANCE STANDARDS (DENVER EXAMPLE)

Performance Attribute: Accuracy of the peak prediction
  Composite Importance Category*: 1
  Performance Measures: Ratio of predicted to measured station peaks, Cpp/Cpm;
    timing of the peak†, Δtp
  Performance Standard: 80 ≤ Cpp/Cpm ≤ 150 percent; ±1 hour
  Calculated Value: 99 percent; +1 hour
  Interpretation: Peak performance of the model is satisfactory. The timing of
    the peak is satisfactory; since the model provides only hourly averages,
    this is as finely as it can be determined.

Performance Attribute: Absence of systematic bias
  Composite Importance Category*: 1
  Performance Measure: Average value and standard deviation of the mean
    deviation about the perfect correlation line, normalized by the average of
    the predicted and observed concentrations
  Performance Standard: For concentrations (predicted or observed) at or above
    the NAAQS, the bias should not be greater than the maximum bias resulting
    from EPA-allowable monitor calibration error. A -8 percent bias (not
    normalized) is representative, which for this case is μ = -0.4 pphm and
    σ = 1.53 pphm for an EPA-acceptable monitor‡ (see Burton, et al., 1976)
    when all concentrations are considered. An EPA-acceptable monitor can have
    an uncertainty with respect to a reference monitor of as much as ±3 pphm
    for ozone at a 95 percent confidence level.
  Calculated Value: For concentrations greater than the NAAQS (8.0 pphm),
    μ = 4.1% and σ = 19.4%. For all concentrations, μ = -23.4% and σ = 33.5%.
    In a form suitable for comparison with non-normalized instrument bias,
    μ = -0.52 pphm and σ = 1.22 pphm when all concentrations are considered.
  Interpretation: For concentrations at or above the NAAQS, a slight positive
    bias exists, though within acceptable bounds. When all concentrations are
    considered, a larger negative bias seems to exist. Put in a form suitable
    for comparison with an EPA-allowable monitor,‡ however, the bias appears to
    be indistinguishable from that resulting from maximum allowable calibration
    error. Overall, no conclusion of unacceptably high bias would seem
    justified.

Performance Attribute: Lack of gross error
  Composite Importance Category*: 2
  Performance Measure: Average value and standard deviation of the absolute
    mean deviation about the perfect correlation line, normalized by the
    average of the predicted and observed concentrations
  Performance Standard: For concentrations at or above the NAAQS, the error
    should be indistinguishable from the distribution of error resulting from
    comparison of an EPA-acceptable monitor‡ with a reference monitor.
    Representative values for an EPA-acceptable monitor (-8 percent bias;
    ±3 pphm at a 95 percent confidence level) might be estimated to be
    μ_|d| = 1.22 pphm and σ_|d| = 0.95 pphm. Note that these values are based
    on non-normalized deviations.
  Calculated Value: For concentrations greater than the NAAQS (8.0 pphm),
    μ_|d| = 15.7% and σ_|d| = 19.4%. For all concentrations, μ_|d| = 31.5% and
    σ_|d| = 33.5%. In a form suitable for comparison with non-normalized
    instrument error, μ_|d| = 1.12 pphm and σ_|d| = 0.72 pphm.
  Interpretation: For concentrations at or above the NAAQS, the error seems to
    be about half of what is seen if all concentrations are considered. The
    model thus appears to be subject to less error at the higher concentration
    range. We can determine the acceptability of this error level by converting
    to a non-normalized form for comparison with an estimate of that resulting
    from use of an EPA-acceptable monitor.‡ Even when all concentrations are
    considered, the error in model predictions appears to be less than that
    resulting from monitoring technique differences. We conclude that the model
    performance is acceptably good insofar as error is concerned.

Performance Attribute: Temporal correlation†
  Composite Importance Category*: 2
  Performance Measure: Temporal correlation coefficients at each monitoring
    station and an overall coefficient (the all-station average): r_t,i and
    r_t,OVERALL for 1 ≤ i ≤ M monitoring stations
  Performance Standard: At a 95 percent confidence level, predicted and
    observed concentrations should appear to be correlated. Using a t-statistic
    to estimate the minimum acceptable correlation coefficient, in this example
    we find r_t,min = 0.53.
  Calculated Value: For each monitoring station, 0.69 ≤ r_t ≤ 0.97. Overall,
    r_t,OVERALL = 0.88.
  Interpretation: For all stations and overall, predicted and observed
    concentrations appear to be correlated. The model performance appears to be
    within acceptable bounds.

Performance Attribute: Spatial alignment
  Composite Importance Category*: 2
  Performance Measure: Spatial correlation coefficients for each modeling hour
    and an overall coefficient for the entire day (the all-hours average):
    r_x,j and r_x,OVERALL for 1 ≤ j ≤ N modeling hours
  Performance Standard: At a 95 percent confidence level, predicted and
    observed concentrations should appear to be correlated. Using a t-statistic
    to estimate the minimum acceptable correlation coefficient, in this example
    we find r_x,min = 0.71.
  Calculated Value: For each modeling hour, -0.44 ≤ r_x ≤ 0.66. Overall,
    r_x,OVERALL = 0.17.
  Interpretation: During none of the hours considered (all daylight hours) do
    prediction and observation appear to be correlated at the 95 percent
    confidence level. Model predictions appear to be spatially misaligned,
    although the presence of temporal correlation suggests that the
    misalignment may not be a serious problem. (Another interpretation may be
    correct: either r_x is too stringent a measure of spatial alignment or r_t
    is too lenient a measure of temporal behavior. Only by additional research,
    however, will we be able to confirm or refute this.)

* The composite importance category is determined by consulting Tables VI-2 and
  VI-3 for the appropriate issue and pollutant/averaging time (in this example,
  SIP/C and ozone/one-hour averaging time). The composite category is the less
  stringent of the two importance rankings.
† These measures are appropriate when the chosen model is used to consider
  questions involving photochemically reactive pollutants subject to short-term
  standards.
‡ An "EPA-acceptable monitor" is defined here to be one that differs from a
  monitor using the EPA reference technique by up to the maximum allowable
  amount.
-------
3. Interpreting the Performance Measure Values
Briefly, we summarize the conclusions suggested by the model perfor-
mance measures. First, even though the predicted and observed concentra-
tion peaks occur at different monitoring stations and times (North Glenn
at 2-3 p.m. versus Welby at 1-2 p.m.), their values agree quite closely,
well within the acceptable tolerance.
Second, systematic bias appears to remain within acceptable limits.
We can demonstrate this graphically, first by plotting prediction-
observation pairs in a correlogram (see Figure VI-4) and then by plotting the
normalized mean deviation about the perfect correlation line as is done
in Figure VI-5. From this latter figure (suggested by Anderson, et al.,
1977) we see that the Airshed Model, while systematically underpredicting
at concentration levels below 4.5 pphm, does not appear subject to such
bias at concentrations above that level. Incidentally, recent internal
studies at SAI have indicated that the Denver region may be subject to
background concentrations as high as 4 pphm (Anderson, 1978), values
substantially higher than those supplied as input to the Airshed Model.
Also, we may compare the deviations about the perfect correlation line
to those that we would expect from comparison of an EPA-acceptable
monitor with a monitor using the EPA reference technique (normally
distributed, -8 percent bias, ± 3 pphm at the 95 percent confidence level —
see Burton, et al., 1976). This comparison is shown in Figure VI-6. To
aid in presenting this graphical comparison, we have converted deviations
to the non-normalized form. We observed that the means (a measure of syste-
matic bias) of both are nearly the same and that the standard deviation of
prediction-observation deviations is somewhat less than that of the monitor-
ing error distribution.
Third, consistent with our conclusions about systematic bias, gross
error also appears to be within tolerable bounds. We show in Figure VI-7
the distribution of non-normalized error, that is, the absolute deviation
of predictions and observations from the perfect correlation line. For
reference we also" estimate the corresponding distribution resulting from
VI-45
-------
(Scatter of observation-prediction pairs; horizontal axis: P = Predicted O3
Concentration (pphm), with the NAAQS marked; vertical axis: observed O3
concentration (pphm).)
FIGURE VI-4. CORRELOGRAM OF OZONE OBSERVATION-PREDICTION
PAIRS FOR SAMPLE CASE (DENVER, 28 JULY 1976)
VI-46
-------
(Normalized deviation about the perfect correlation line plotted against
Average Ozone Concentration (pphm), (Predicted + Observed)/2.)
FIGURE VI-5.
NORMALIZED DEVIATIONS ABOUT THE PERFECT CORRELATION LINE AS A FUNCTION
OF OZONE CONCENTRATION (DENVER, 28 JULY 1976)
-------
(Distributions shown: deviation of predicted versus observed points from the
perfect correlation line (111 one-hour-averaged data points; mean marked,
std. dev. = 1.22 pphm); (true-instrumental) EPA-acceptable monitor (mean
bias = -8 percent; ±3 pphm at 95 percent confidence level); and
(true-instrumental) maximum probable error (mean bias = -8 percent; ±7 pphm
at 95 percent confidence level). Horizontal axis: non-normalized deviation
(pphm).)
FIGURE VI-6.
NON-NORMALIZED OZONE DEVIATIONS ABOUT THE PERFECT CORRELATION LINE
COMPARED WITH INSTRUMENT ERRORS (DATA FOR 14 HOURS AND 8 STATIONS,
DENVER, 28 JULY 1976)
-------
(Distributions shown: absolute deviation of predicted and observed points from
the perfect correlation line (111 one-hour-averaged data points; mean marked,
std. dev. = 0.72 pphm); and (true-instrumental) EPA-acceptable monitor (mean
bias = -8 percent; ±3 pphm at 95 percent confidence level). Horizontal axis:
non-normalized error (pphm).)
FIGURE VI-7.
NON-NORMALIZED OZONE ABSOLUTE DEVIATIONS ABOUT THE PERFECT CORRELATION LINE
COMPARED WITH INSTRUMENT ERROR (DATA FOR 14 HOURS AND 8 STATIONS, DENVER,
28 JULY 1976)
-------
comparison of an EPA-acceptable monitor with an EPA reference instrument. We
see that the mean value and standard deviation of the prediction-observation
"error" are both somewhat less than those resulting from instrument differ-
ences. The conclusion suggests itself that gross error is within acceptable
bounds, though we caution that the shape of the instrument difference curve
is an estimate and needs to be analyzed in further detail.
Fourth, temporal behavior at each monitoring station seems satisfac-
tory, appearing correlated to better than the requisite 95 percent con-
fidence level. We note that the correlation we have observed provides
information only about the "shape" of the concentration profiles (shown
in Figure VI-3), not their absolute level. In general, predicted concen-
trations rise and fall when observed values do, though the concentration
values might be quite different. Only by examining bias and error per-
formance measures can we draw conclusions about concentration levels.
Fifth, spatial alignment does not appear to be acceptably good.
During none of the 14 hours considered do the spatial patterns of pre-
dictions and observations appear to be correlated at the 95 percent
confidence level. In fact, for a number of hours, the correlation seems
quite poor. Two possible explanations exist. Either the spatial cor-
relation coefficient is too "stringent" or the predicted concentration
field in fact is misaligned. Since temporal correlation appears strong,
the lack of corresponding spatial correlation is somewhat surprising,
though countervailing errors responsible for this conceivably could be
present. It is also possible that the temporal correlation coefficient
either is too "lenient" or it should not be computed including concentra-
tions at all daylight hours. Presently, we do not know which of these
explanations is correct, noting only that it is a subject for future
investigation. Conceivably, measurement data errors could also be contri-
buting to the problem.
In this example, we can examine model predictions for spatial mis-
alignment. To do so, we conducted an informal experiment among several
of our staff. In general, reconstructing the "true" concentration
VI-50
-------
field from a "sparse" set of observational data is a difficult and uncer-
tain process. Nevertheless, we attempted, using only station measurement
data, to draw isopleth maps showing contours of constant concentration
values. The process, of course, is a highly subjective one, requiring the
person doing the drawing to make a number of judgmental and often arbi-
trary decisions. In this case, a useful result was achieved.
None of the participants in the experiment were able to draw unam-
biguous isopleth maps for those hours when overall concentrations were low
(before 11 in the morning and after 3 in the afternoon). However, while
they varied widely in their estimates during the four "peak hours" of the
configurations for lower outlying concentration isopleths, each agreed
reasonably well on their estimates of the location of the peak. We com-
pare in Figure VI-8 a "ground-trace" of their composite estimates with
the peak locations predicted by the Airshed Model.
We observe that the ground-traces of the predicted and observed peaks
differ, both in direction and speed of drift. This suggests that either
the model has had some difficulty in simulating atmospheric dispersion
or it is being driven by inputs that imperfectly characterize ambient
conditions on the modeling day. Based on a generally favorable model
performance rating, as judged by the other four types of measures, we
feel the latter of these two explanations is more likely.
The model input most likely to have caused the alignment problem
is the temporally and spatially varying wind field. By comparing the
ground-trace of the predicted peak with the directions and speeds of pre-
vailing winds that we input to the Airshed model, we confirmed that the
wind field did indeed appear to be "forcing" the predicted pollutant
cloud in just the direction noted in Figure VI-8.
We emphasize that this does not confirm that "errors" in the input
wind field were responsible for the spatial misalignment, but the evi-
dence is suggestive. Final confirmation or refutation would come by
VI-51
-------
(Map of the region showing the predicted and measured peak ground-traces,
labeled by time of day, e.g., 1200-1300.)
FIGURE VI-8.
GROUND-TRACES OF THE PREDICTED AND OBSERVED PEAK OZONE
CONCENTRATIONS (DENVER, HOURS 1100-1200 TO 1400-1500
LOCAL STANDARD TIME, 28 JULY 1976)
VI-52
-------
rerunning the Airshed Model using a wind field "adjusted" to better
mirror our updated estimates of the meteorology on the modeling day. If
agreement, as evaluated by the five types of performance measures, were
"better," then we might conclude that wind field imperfections were
responsible for our misalignment problems.
F. SUGGESTED FRAMEWORK FOR A DRAFT STANDARD
We have now completed our central objective in this report: the
identification and specification of model performance measures and stan-
dards. In doing so, however, we have not solved the problem but rather
only begun a discussion that will be a continually evolving one. Almost
certainly, the specific measures and standards employed to evaluate
model performance will change as our insight and experience expands.
On balance, the most enduring benefit from this study will be the con-
ceptual structure it sets.
With that structure in mind, we discuss one final subject: a frame-
work for a draft model performance standard. We view the promulgation
of the standard as having two distinct parts: the text of the standard
itself and an accompanying guidelines document. Whereas the standard
should be quite specific about selecting and applying the performance mea-
sures to be used, there needs to be a guidelines document in which sup-
plementary discussion and examples are provided. While a full examina-
tion of the interrelationships between the two documents is beyond the
scope of the current study, we illustrate in Figure VI-9 one possible
configuration.
We focus in this discussion on suggested elements of a draft per-
formance standard. We state several of the functional sections it
should contain:
> Goals and Objectives. The reasons for insisting on model
validation should be stated, as well as a summary of
expected costs and benefits. Our objectives in conduct-
ing performance evaluation should be clearly presented.
VI-53
-------
STANDARD
- Goals and Objectives
- Overall Modeling Acceptance Criteria (e.g., "modeling must be done for
  'worst case' episode conditions")
- Determination of Performance Measures
- Specification of Performance Standards
- Calculation of Measures
- Evaluation of Model Acceptability
- Determination of Required Action

GUIDELINES
- Rationale for goals and objectives
- Guidance on checking whether the modeling effort conforms to overall
  acceptance criteria
- Supplementary guidance on proper selection and ranking of performance
  measures
- Background and statement of rationales for standards
- Additional guidance on the calculation of measures
- Guidance on interpretation of the values of the measures; case studies
- Supplementary discussion of procedural alternatives
FIGURE VI-9. POSSIBLE RELATIONSHIPS BETWEEN THE MODEL PERFORMANCE
STANDARDS AND A GUIDELINES DOCUMENT
VI-54
-------
> Overall Modeling Acceptance Criteria. Important criteria
for judging a modeling effort in an overall sense should
be clearly stated, along with the action required if any
of the criteria are not satisfied. Among possible criteria are
the following: The verification must be done for modeling
days typical of "worst case" conditions, the measurement
network must meet certain stated minimum standards
(numbers, types and configurations of the monitoring
stations), and point source models must be verified using
the appropriate prototypical data base (one representative of
an application similar to the one proposed).
Without these and perhaps other overall criteria being sat-
isfied, model evaluation would be premature.
> Determination of Performance Measures. The procedure must be
stated for determining the performance measures to be used
for model evaluation. Instructions must also be provided
for matching the importance ranking of each of the model
performance attributes to the type of issue being
addressed and the pollutant/averaging time being considered.
We might do so using the importance tables presented
earlier in this chapter and repeated for convenience as
Tables VI-13 and VI-14.
> Specification of Performance Standards. The standards must
be clearly stated for each of the performance measures to
be used. We present in Table VI-15 one format for doing
so, presenting the standards in the form of general prin-
ciples. In each instance, the actual numerical standard is
dependent on the characteristics of the specific application.
Guidance must be provided on how to determine the proper
numerical values.
> Calculation of Measures. Each measure should be defined
mathematically, accompanied by directions on precisely how
the measures are to be calculated.
VI-55
-------
TABLE VI-13. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY ISSUE

                                    Importance of Performance Attribute*
Performance Attribute            SIP/C   SIP/R   PSD   NSR   OSR   EIS/R   Lit
Accuracy of the peak prediction    1       1      1     1     2      1      1
Absence of systematic bias         1       1      1     1     1      1      1
Lack of gross error                2       2      1     1     2      1      1
Temporal correlation               2       2      3     3     3      3      3
Spatial alignment                  2       2      1     3     3      3      3

* Category 1 - Performance standard must always be satisfied.
  Category 2 - Performance standard should be satisfied, but some leeway may be
               allowed at the discretion of a reviewer.
  Category 3 - Meeting the performance standard is desirable, but failure is not
               sufficient to reject the model; measures dealing with this problem
               should be regarded as "informational."
TABLE VI-14. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY POLLUTANT AND AVERAGING TIME

                                       Importance of Performance Attribute*
Pollutant                 Accuracy of the  Absence of        Lack of      Temporal     Spatial
(averaging time)‡         peak prediction  systematic bias   gross error  correlation  alignment

Pollutants with short-term standards
  Ox (1 hour)                    1                1               1            1           1
  CO** (1 hour)                  1                1               1            2           2
  NMHC (3 hour)                  1                1               1            2           2
  SO2 (3 hour)                   1                1               1            2           2
  NO2† (1 hour)                  1                1               1            1           1
  CO (8 hour)                    1                1               1            2           2
  TSP** (24 hour)                1                1               1            3           2
  SO2 (24 hour)                  1                1               1            3           2

Pollutants with long-term standards
  NO2 (1 year)                   3                1               1          N/A††         2
  TSP (1 year)                   3                1               1          N/A††         2
  SO2 (1 year)                   3                1               1          N/A††         2

 * Category 1 - Performance standard must be satisfied.
   Category 2 - Performance standard should be satisfied, but some leeway may be allowed
                at the discretion of a reviewer.
   Category 3 - Meeting the performance standard is desirable, but failure is not
                sufficient to reject the model.
 † No short-term NO2 standard currently exists.
 ‡ Averaging times required by the NAAQS are in parentheses.
** Primary standards.
†† The performance attribute is not applicable.
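The rankings in Tables VI-13 and VI-14 lend themselves to a simple tabular lookup. The following minimal sketch (in Python) is illustrative only and is not part of the procedure suggested in this chapter; the function name, the abbreviated issue labels, and the rule for combining the two tables are assumptions made for the example, and only a few of the table entries are transcribed.

# Illustrative sketch only: encoding a few entries of Tables VI-13 and VI-14
# so that the importance category of each performance attribute can be looked
# up for a given issue and pollutant/averaging time.
# Category 1 = must be satisfied; 2 = reviewer discretion; 3 = informational.

ATTRIBUTES = ["peak accuracy", "absence of bias", "lack of gross error",
              "temporal correlation", "spatial alignment"]

BY_ISSUE = {            # partial transcription of Table VI-13
    "SIP/C": [1, 1, 2, 2, 2],
    "PSD":   [1, 1, 1, 3, 1],
    "Lit":   [1, 1, 1, 3, 3],
}

BY_POLLUTANT = {        # partial transcription of Table VI-14
    ("Ox", "1 hour"):   [1, 1, 1, 1, 1],
    ("CO", "8 hour"):   [1, 1, 1, 2, 2],
    ("TSP", "24 hour"): [1, 1, 1, 3, 2],
}

def importance_ranking(issue, pollutant, averaging_time):
    """Return the importance category of each attribute.  The report does not
    prescribe a rule for combining the two tables; taking the stricter
    (numerically smaller) category is an assumption made here."""
    combined = {}
    for attr, a, b in zip(ATTRIBUTES, BY_ISSUE[issue],
                          BY_POLLUTANT[(pollutant, averaging_time)]):
        combined[attr] = min(a, b)
    return combined

print(importance_ranking("SIP/C", "TSP", "24 hour"))
# {'peak accuracy': 1, 'absence of bias': 1, 'lack of gross error': 1,
#  'temporal correlation': 2, 'spatial alignment': 2}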
VI-56
-------
TABLE VI-15. MODEL PERFORMANCE MEASURES AND STANDARDS*

Accuracy of the peak prediction
  Performance measure:  Ratio of the predicted station peak to the measured station peak
                        (the two peaks could occur at different stations), Cpp/Cmp;
                        difference in timing of occurrence of the station peak.†
  Performance standard: Limitation on uncertainty in aggregate health impact and pollution
                        abatement costs; the model must reproduce reasonably well the
                        phasing of the peak--say, within ±1 hour.

Absence of systematic bias‡
  Performance measure:  Average value and standard deviation of the mean deviation about
                        the perfect correlation line, normalized by the average of the
                        predicted and observed concentrations, calculated for all stations
                        during those hours when either the predicted or the observed values
                        exceed some appropriate minimum value (possibly the NAAQS):
                        u and σ_u (overall).
  Performance standard: No or very little systematic bias at concentrations (predictions
                        or observations) at or above some appropriate minimum value
                        (possibly the NAAQS); the bias should not be worse than the maximum
                        bias resulting from EPA-allowable calibration error (~8 percent is
                        a representative value for ozone); also, the standard deviation
                        should be less than or equal to that of the difference distribution
                        between an EPA-acceptable monitor** and an EPA reference monitor
                        (3 pphm is representative for ozone at the 95 percent confidence
                        level).

Lack of gross error‡
  Performance measure:  Average value and standard deviation of the absolute mean deviation
                        about the perfect correlation line, normalized by the average of
                        the predicted and observed concentrations, calculated for all
                        stations during those hours when either the predicted or the
                        observed values exceed some appropriate minimum value (possibly the
                        NAAQS): |u| and σ_|u| (overall).
  Performance standard: For concentrations at or above some appropriate minimum value
                        (possibly the NAAQS), the error (as measured by the overall values
                        of |u| and σ_|u|) should not be worse than the error resulting from
                        the use of an EPA-acceptable monitor.**

Temporal correlation‡
  Performance measure:  Temporal correlation coefficients at each monitoring station for
                        the entire modeling period and an overall coefficient averaged over
                        all stations: r_ti and r_overall, for 1 <= i <= M monitoring
                        stations.
  Performance standard: At a 95 percent confidence level, the temporal profiles of
                        predicted and observed concentrations should appear to be in phase
                        (in the absence of better information, a confidence interval may be
                        converted into a minimum allowable correlation coefficient by using
                        an appropriate t-statistic).

Spatial alignment
  Performance measure:  Spatial correlation coefficients calculated for each modeling hour
                        considering all monitoring stations, as well as an overall
                        coefficient averaged over the entire day: r_xj and r_overall, for
                        1 <= j <= N modeling hours.
  Performance standard: At a 95 percent confidence level, the spatial distributions of
                        predicted and observed concentrations should appear to be
                        correlated.

 * There is deliberate redundancy in the performance measures. For example, in testing for
   systematic bias, u and σ_u are calculated. The latter quantity is a measure of "scatter"
   about the perfect correlation line. This is also an indicator of gross error and should
   be used in conjunction with |u| and σ_|u|.
 ‡ These measures are appropriate when the chosen model is used to consider questions
   involving photochemically reactive pollutants subject to short-term standards.
 † These may not be appropriate for all regulated pollutants in all applications. When they
   are not, standards derived from pragmatic/historic experience should be employed.
** By "EPA-acceptable monitor" we mean a monitor that satisfies the requirements of
   40 CFR §53.20.
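To make the definitions in Table VI-15 concrete, the sketch below (in Python) shows one way the station-based measures might be computed from paired hourly predictions and observations; it is illustrative only. The array layout, the function names, and the particular normalization (deviation divided by the mean of the paired values) are assumptions of the example, since the table states the measures as general principles rather than as a fixed algorithm.

import numpy as np
from scipy.stats import t as t_dist

def peak_ratio(pred, obs):
    """Ratio of the predicted station peak to the measured station peak
    (the two peaks may occur at different stations and hours)."""
    return pred.max() / obs.max()

def bias_and_gross_error(pred, obs, cutoff):
    """Normalized deviations about the perfect correlation line, using only
    those station-hours where either value exceeds `cutoff` (e.g., the NAAQS).
    `pred` and `obs` have shape (n_stations, n_hours)."""
    mask = (pred >= cutoff) | (obs >= cutoff)
    d = 2.0 * (pred[mask] - obs[mask]) / (pred[mask] + obs[mask])
    return {"mean_bias": d.mean(),
            "sd_bias": d.std(ddof=1),
            "mean_abs_error": np.abs(d).mean(),
            "sd_abs_error": np.abs(d).std(ddof=1)}

def temporal_correlations(pred, obs):
    """Correlation over time at each station, plus the all-station average."""
    r = np.array([np.corrcoef(p, o)[0, 1] for p, o in zip(pred, obs)])
    return r, r.mean()

def spatial_correlations(pred, obs):
    """Correlation across stations for each hour, plus the all-hour average."""
    r = np.array([np.corrcoef(pred[:, j], obs[:, j])[0, 1]
                  for j in range(pred.shape[1])])
    return r, r.mean()

def minimum_allowable_r(n, confidence=0.95):
    """Convert a confidence level into a minimum allowable correlation
    coefficient using the t-statistic for testing r against zero,
    t = r*sqrt(n-2)/sqrt(1-r**2), as suggested in Table VI-15."""
    tc = t_dist.ppf(confidence, df=n - 2)
    return tc / np.sqrt(n - 2 + tc ** 2)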
VI-57
-------
> Evaluation of Model Acceptability. The rating procedure
to be used in evaluating model performance must be stated.
Guidance should be supplied on the way in which problem
importance ranking is "folded in" with the performance
rating for each of the measures.
> Determination of Required Action. The alternative actions
required of the model user, depending on the model evalua-
tion, must be stated. Among the possible alternative out-
comes of the model evaluation are the following: The model
is rated acceptable, the model requires a waiver from an
outside reviewer before acceptance can be granted (that is,
the model is deficient in some Category 2-importance problem
area), or the model is unacceptable (the model is deficient
in some Category 1-importance problem area).
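The two items above amount to a simple decision rule. The short sketch below (in Python) illustrates it; the function and attribute names are ours, and the dictionary contents are an invented example.

def required_action(results):
    """`results` maps each performance attribute to a pair
    (importance_category, standard_satisfied)."""
    if any(cat == 1 and not ok for cat, ok in results.values()):
        return "unacceptable"                                 # Category 1 deficiency
    if any(cat == 2 and not ok for cat, ok in results.values()):
        return "waiver from an outside reviewer required"     # Category 2 deficiency
    return "acceptable"

example = {"peak accuracy":        (1, True),
           "absence of bias":      (1, True),
           "lack of gross error":  (2, False),   # Category 2 problem area
           "temporal correlation": (3, False),   # informational only
           "spatial alignment":    (2, True)}
print(required_action(example))   # -> waiver from an outside reviewer required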
We end our discussion of a suitable structure for a draft performance
standard by noting that this has been only a brief encounter with an
important and complex subject. We recommend that it be examined in far
greater detail in subsequent work.
VI-58
-------
VII RECOMMENDATIONS FOR FUTURE WORK
In this study we have suggested a conceptual framework within which model
performance may be objectively evaluated. We have identified key attributes
of a well-performing model and selected performance measures for use in detect-
ing the presence or absence of each attribute. For the measures chosen for
use, we have developed explicit standards that specify the range of their
acceptable values.
Throughout, we have maintained the point of view that measures and stan-
dards of performance for models should be determined as independently as possible
of considerations about model-specific limitations and data inadequacies.
Remembering this perspective may be important when evaluating the practical
utility of the procedure suggested in this report in certain point source appli-
cations. This is particularly true when the available measurement data are
"sparse." Where data quantity and resolution (temporal and spatial) are insuf-
ficient to permit meaningful calculation of the performance measures, we view
this more as a data inadequacy that must be overcome than as a deficiency in
the model evaluation framework suggested here.
The development of a performance evaluation procedure for models is an
evolutionary process. We have advanced in this study a conceptual structure
and a first-generation procedure for conducting such an evaluation. We now
recommend ways in which development may proceed, moving from the conceptual
framework provided in this study to the realm of practical application of per-
formance evaluation procedures.
We recommend that the work begun in this study continue in several key
areas. In this chapter we outline briefly our specific recommendations, group-
ing them into three categories: areas for technical development, assessment of
institutional implications, and documents to be compiled. We consider each
category in turn.
VII-1
-------
A. AREAS FOR TECHNICAL DEVELOPMENT
A number of important technical areas remain that would benefit from
additional developmental work. We consider four key areas here.
1. Further Evaluation of Performance Measures
In this study, a sample case has been considered that permits us to
evaluate in a practical situation the utility of the recommended performance
measures in detecting the presence or absence of desirable model attributes.
However, the suitability for use of each of these measures needs further evalu-
ation over a range of circumstances. Specifically, we recommend the following:
> Additional case studies need to be considered, with perfor-
mance measures calculated for each. The choice of case studies
should be made in order to "stress" the evaluation procedure,
that is, to make any limitations apparent. The range of
case studies should include both multiple-source and specific-
source applications.
> The behavior of the suggested performance measures needs
to be assessed over a range of conditions. Alternate or supple-
mentary performance measures should be identified, if required,
so as to further extend the range of applicability of the evalua-
tion procedure suggested in this study.
> A performance measure evaluation analysis should be conducted.
Two concentration fields, initially aligned spatially and
temporally, could be progressively "degraded," that is, offset
in space or time. By observing the corresponding changes in the
values of the performance measures and the conclusions that derive
therefrom, insight could be gained into their overall suitability for use; a simple illustration of such an analysis is sketched below.
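As an illustration of the kind of analysis proposed in the last item, the sketch below (in Python) offsets an invented, perfectly aligned "prediction" series in time and records how the temporal correlation coefficient degrades; the synthetic profile and the choice of measure are assumptions of the example, and the same idea extends to spatial offsets and to the other measures.

import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24)
# An invented single-station diurnal concentration profile (arbitrary units).
observed = (8.0 + 6.0 * np.exp(-0.5 * ((hours - 14.0) / 3.0) ** 2)
            + rng.normal(0.0, 0.2, hours.size))

for offset in range(5):                    # shift the "predictions" by 0-4 hours
    predicted = np.roll(observed, offset)
    r = np.corrcoef(predicted, observed)[0, 1]
    print(f"temporal offset = {offset} h, correlation = {r:.3f}")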
2. Identification and Specification of Prototypical Point Source
"Test Bed" Data Bases
For the purposes of model evaluation in the many specific-source appli-
cations where site-specific data are either inadequate or nonexistent, a
VII-2
-------
"test-bed," or surrogate, data base is required. This data base must provide
concentration data of sufficient spatial extent and temporal frequency to
permit the calculation of meaningful values for the model performance measures.
Selection of a particular data base could be made by determining, from among
several prototypical "test beds," which derives from conditions most like those
in the proposed application. We recommend that the following work be under-
taken:
> A comprehensive list of prototypical point source situa-
tions should be compiled.
> For each prototypical situation, a "test bed" data base
should be specified and assembled.
3. Examination of Performance Evaluation Procedure in Sparse-Data
Point Source Applications
We have identified in this study several key attributes of a well-
performing model, the presence or absence of each of which may be detected
by calculating certain performance measures. However, for the values of
these measures to assume statistical significance, a certain minimum level
is required for the spatial extent and the temporal frequency of the measure-
ment data. Often, in multiple-source applications, such a minimum level is
attained, particularly in urban areas with well-developed monitoring networks.
In specific-source applications, though, a minimum acceptable level of data
may not be attained. To overcome this problem, we have suggested that proto-
typical point source data bases be assembled for the purposes of model evalua-
tion. These data bases would provide sufficiently well-conditioned data for
calculation of the performance measures to be useful.
As a practical matter, however, such data bases are not presently available
to the modeling community. In lieu of their use, other sources of data may be
used for the purpose of model evaluation, despite the deficiencies in such data.
For example, a limited amount of tracer data may be gathered. If the situation
to be modeled involves either construction at a site where another source already
VII-3
-------
exists or retrofit of pollution control equipment, then some limited site-
specific monitoring data may be available. Such data may not be sufficiently
"well-conditioned" to permit meaningful calculation of the performance measures
suggested for use. What can be done? Should calculation of the performance
measures be allowed using the possibly deficient, sparse data available, or
should the model evaluation process be halted until more "robust" data are
acquired? We suggest that the implications of both these alternatives be assessed,
searching for those limited circumstances where a "middle ground" may be found,
with alternative measures and standards identified for use that are less
"demanding" in their measurement data requirements. The implications of allow-
ing the use of such supplementary measurements also need to be examined.
Also, a related issue may be important in point source modeling appli-
cations: relative versus absolute model performance. Are there circumstances
in which a model may be better able to predict relative, incremental changes
in concentration than absolute ground-level values? It should be determined
whether or not such situations occur in practice. If they do, relative vali-
dation of a model may become a consideration. This could be of concern, for
example, when using a Gaussian model to assess the impact of control equipment
that is retrofitted to an existing source. If relative performance is deemed im-
portant in some circumstances, then additional performance measures and stan-
dards should be identified which allow the modeler to make such an assessment.
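A small numerical illustration of the distinction raised in this paragraph may help; the concentrations below are invented, and the 25 percent overprediction is assumed purely for the example.

# Invented example: a model that overpredicts absolute ground-level
# concentrations by 25 percent can still predict the relative (incremental)
# change produced by a retrofit exactly.
obs_before, obs_after = 120.0, 84.0         # observed concentrations (ug/m3)
pred_before, pred_after = 150.0, 105.0      # predictions, uniformly 25% high

absolute_residual = pred_after - obs_after                    # 21 ug/m3 too high
observed_change = (obs_after - obs_before) / obs_before       # -30 percent
predicted_change = (pred_after - pred_before) / pred_before   # -30 percent
print(absolute_residual, observed_change, predicted_change)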
4. Further Development of Rationales for Setting Performance Standards
Several rationales for setting performance standards have been examined
in this study. Some of these merit further technical development and assess-
ment of the range of their applicability. Also, additional rationales should
be identified where possible. Towards these ends, we recommend the following:
> Additional developmental work should continue on the Health
Effects (HE) and Control Level Uncertainty (CLU) rationales.
> The use of the HE/CLU rationales in setting a standard for
the ratio of predicted and observed peak station concentra-
VII-4
-------
tions should be exposed to peer review. A journal article
on the subject should be prepared and submitted for publi-
cation.
> Explicit error and bias standards should be calculated for
all regulated pollutants. This may be done using monitoring
specifications in federal regulations. In this study, only
bias and error standards for ozone were calculated numerically.
B. ASSESSMENT OF INSTITUTIONAL IMPLICATIONS
A number of institutional requirements are implied by any decision to
promulgate standards for model performance, or even by a decision to publish
formal guidelines for model performance evaluation. We recommend that these
implications and their attendant procedural and resource requirements be
assessed. Among the many questions to be resolved are the following:
> Regulatory Responsibility
- How should formal performance standards be promulgated—
or should they be promulgated at all?
- If standards are stated or recommended, how will they
be updated?
- Who will accumulate information about historically
achieved model performance? (This information would
be required when setting a standard invoking the Pragmatic/
Historic rationale.)
> Custodial Responsibility
- Who will identify and assemble the prototypical "test
bed" data bases for use in point source applications?
- Who will maintain, store, and distribute the "test bed"
data bases?
> Review Responsibility
- Who should review the adequacy of model performance in
a specific application?
VII-5
-------
- Does a model need to be repeatedly evaluated using a
"test bed" data base? If not, who decides when a
model/data base combination has been sufficiently
examined?
> Advisory Responsibility
- What advisory documents should be provided to the model
user community?
- Who will provide guidance to model users and how should
that support be funded?
These are simply a few of the many procedural and institutional questions
that arise. Answers to these and other key questions should be sought at
an early date.
C. DOCUMENTS TO BE COMPILED
Specific documents will have to be drafted that describe suggested or
mandated model performance standards. Two documents seem appropriate for
publication (though conceivably they could be combined into a single guide-
lines document). These documents are the following:
> Formally promulgated model performance standards along with
specific procedures for evaluating performance. These could
be presented in guideline form rather than as mandated stan-
dards. The latter of these two approaches may be preferable,
given the complexities of modeling and its attendant uncertain-
ties.
> Advisory/informative model performance guidelines document.
This may provide the advice and information necessary to con-
duct a meaningful model performance evaluation. It could
play the role, with respect to the performance standards,
that is indicated in Figure VI-9.
VII-6
-------
APPENDIX A
IMPORTANT PARTS OF THE CODE OF FEDERAL
REGULATIONS CONCERNING AIR PROGRAMS
A-l
-------
APPENDIX A
IMPORTANT PARTS OF THE CODE OF FEDERAL
REGULATIONS CONCERNING AIR PROGRAMS
PART 50. NATIONAL PRIMARY AND SECONDARY
AMBIENT AIR QUALITY STANDARDS
Section
50.1 Definitions.
50.2 Scope.
50.3 Reference Conditions.
50.4 National primary ambient air quality standards for
sulfur oxides (sulfur dioxide).
50.5 National secondary ambient air quality standards for
sulfur oxides (sulfur dioxide).
50.6 National primary AAQS for particulate matter.
50.7 National secondary AAQS for particulate matter.
50.8 National primary and secondary AAQS for carbon monoxide.
50.9 National primary and secondary AAQS for photochemical oxidants,
50.10 National primary and secondary AAQS for hydrocarbons.
50.11 National primary and secondary AAQS for nitrogen dioxide.
Appendix A—Reference Method for the Determination of Sulfur Dioxide in
the Atmosphere (Pararosaniline Method).
Appendix B—Reference Method for the Determination of Suspended
Particulates in the Atmosphere (High Volume Method).
Appendix C—Measurement Principle and Calibration Procedure for the
Continuous Measurement of Carbon Monoxide in the Atmosphere
(Non-Dispersive Infrared Spectrometry).
Appendix D—Measurement Principle and Calibration Procedure for the
Measurement of Photochemical Oxidants Corrected for Inter-
ferences due to Nitrogen Oxides and Sulfur Dioxide.
Appendix E—Reference Method for the Determination of Hydrocarbons
Corrected for Methane.
Appendix F—Reference Method for the Determination of Nitrogen Dioxide
(24-Hour Sampling Method)
Authority: The provisions of this Part 50 issued under Sec. 4, Public
Law 91-604, 84 Stat. 1679 (42 U.S.C. 1857c-4).
Source: The provisions of this Part 50 appear at 36 F.R. 22384,
November 25, 1971, unless otherwise noted in the CFR.
A-2
-------
PART 51. REQUIREMENTS FOR PREPARATION, ADOPTION,
AND SUBMITTAL OF IMPLEMENTATION PLANS
Section
Subpart A—General Provisions
51.1 Definitions.
51.2 Stipulations.
51.3 Classification of regions.
51.4 Public hearings.
51.5 Submittal of plans; preliminary review of plans.
51.6 Revisions.
51.7 Reports.
51.8 Approval of plans.
Subpart B—Plan Content and Requirements
51.10 General requirements.
51.11 Legal authority.
51.12 Control strategy: General.
51.13 Control strategy: Sulfur oxides and particulate matter.
51.14 Control strategy: Carbon monoxide, hydrocarbons, photo-
chemical oxidants, and nitrogen dioxide.
51.15 Compliance schedules.
51.16 Prevention of air pollution emergency episodes.
51.17 Air quality surveillance.
51.17a Air quality monitoring methods.
51.18 Review of new sources and modifications.
51.19 Source surveillance.
51.20 Resources.
51.21 Intergovernmental cooperation.
51.22 Rules and regulations.
51.23 Exceptions.
A-3
-------
Part 51 (continued)
Subpart C—Extensions
51.30 Requests for 2-year extension.
51.31 Requests for 18-month extension.
51.32 Requests for 1-year postponement.
51.33 Hearings and appeals relating to requests for one year
postponement.
51.34 Variances.
Subpart D--Maintenance of National Standards
51.40 Scope.
AQMA Analysis
51.41 Submittal date.
51.42 Analysis period.
51.43 Guidelines.
51.44 Projection of emissions.
51.45 Allocation of emissions.
51.46 Projection of air quality concentrations.
51.47 Description of data sources.
51.48 Data bases.
51.49 Techniques description.
51.50 Accuracy factors.
51.51 Submittal of calculations.
AQMA Plan
51.52 General
51.53 Demonstration of adequacy.
51.54 Strategies.
51.55 Legal authority.
51.56 Future strategies.
51.57 Future legal authority.
51.58 Intergovernmental cooperation.
51.59 Surveillance.
A-4
-------
Part 51 (continued)
51.60 Resources.
51.61 Submittal format.
51.62 Data availability.
51.63 Alternative procedures.
Appendix A—Air Quality Estimation.
Appendix B—Examples of Emission Limitations Attainable with Reasonably
Available Technology.
Appendix C—Major Pollutant Sources.
Appendix D—Emissions Inventory Summary (Example Regions).
Appendix E—Point Source Data.
Appendix F—Area Source Data.
Appendix G--Emissions Inventory Summary (other Regions).
Appendix H--Air Quality Data Summary.
Appendix J--Required Hydrocarbon Emission Control as a Function of
Photochemical Oxidant Concentrations.
Appendix K—Control Agency Functions.
Appendix L—Example Regulations for Prevention of Air Pollution
Emergency Episodes.
Appendix M—Transportation Control Supporting Data Summary.
Appendix N--Emissions Reductions Achievable Through Inspection,
Maintenance and Retrofit of Light Duty Vehicles.
Appendix 0—[No title—but related to §51.18]
Appendix P—Minimum Emission Monitoring Requirements.
Appendix Q--[Reserved]
Appendix R—Agency Functions for Air Quality Maintenance Area Plans
for the AQMA in the State of
for the year .
Authority: Part 51 issued under Section 301(a) of the Clean Air Act
[42 U.S.C. 1857(a)], as amended by Section 15(c)(2) of
Public Law 91-604, 84 Stat. 1713, unless otherwise noted.
Source: Part 51 appears at 36 F.R. 22398, November 25, 1971, unless
otherwise noted. AQMA considerations arose from 41 F.R. 18388,
May 3, 1976, unless otherwise noted in the CFR. NSR seems to
be required by §51.18, with Appendix 0 intended to assist in
developing regulations. Standards are in Part 60.
A-5
-------
PART 52. APPROVAL AND PROMULGATION
OF IMPLEMENTATION PLANS
Section
Subpart A—General Provisions
52.01 Definitions.
52.02 Introduction.
52.03 Extensions.
52.04 Classification of regions.
52.05 Public availability of emission data.
52.06 Legal authority.
52.07 Control strategies.
52.08 Rules and regulations.
52.09 Compliance schedules.
52.10 Review of new source and modification.
52.11 Prevention of air pollution emergency episodes.
52.12 Source surveillance.
52.13 Air quality surveillance; resources; intergovernmental
cooperation.
52.14 State ambient air quality standards.
52.15 Public availability of plans.
52.16 Submission to administrator.
52.17 Severability of provisions.
52.18 Abbreviations.
52.19 Revision of plans by Administrator.
52.20 Attainment dates for national standards.
52.21 Significant deterioration of air quality.
52.22 Maintenance of national standards.
52.23 Violation and enforcement.
Subparts B through DDD—SIPs for States and Territories
A-6
-------
Part 52 (concluded)
Subpart EEE--Approval and Promulgation of Plans
Appendix A—Interpretive rulings for §52.22(b)—Regulation for review
of new or modified indirect sources.
Appendix B-C—[Reserved]
Appendix D—Determination of sulfur dioxide emissions from stationary
sources by continuous monitors.
Appendix E—Performance specifications and specification test procedures
for monitoring systems for effluent stream gas volumetric
flow rate.
Authority: 40 U.S.C. 1857c-5, 42 U.S.C. 1857c-5 and 6; 1857g(a); 1859(g)
Source: For Subpart A, 37 FR 10846, May 31, 1972, unless otherwise
noted.
A-7
-------
PART 60. STANDARDS OF PERFORMANCE FOR
NEW STATIONARY SOURCES
Subpart A—General Provisions
Subpart B—Adoption and Submittal of State Plans for Designated Facilities
Subpart C—[Reserved]
Subpart D—Standards of Performance for Fossil-Fuel-Fired Steam Generators
Subpart E—SOP for Incinerators
Subpart F—SOP for Portland Cement Plants
Subpart G—SOP for Nitric Acid Plants
Subpart H--SOP for Sulfuric Acid Plants
Subpart I—SOP for Asphalt Concrete Plants
Subpart J--SOP for Petroleum Refineries
Subpart K--SOP for Storage Vessels for Petroleum Liquids
Subpart L--SOP for Secondary Lead Smelters
Subpart M--SOP for Brass and Bronze Ingot Production Plants
Subpart N—SOP for Iron and Steel Plants
Subpart O—SOP for Sewage Treatment Plants
Subpart P—SOP for Primary Copper Smelters
Subpart Q--SOP for Primary Zinc Smelters
Subpart R—SOP for Primary Lead Smelters
Subpart S—SOP for Primary Aluminum Reduction Plants
Subpart T—SOP for the Phosphate Fertilizer Industry: Wet Process
Phosphoric Acid Plants
Subpart U—SOP for the Phosphate Fertilizer Industry: Superphosphoric
Acid Plants
Subpart V—SOP for the Phosphate Fertilizer Industry: Diammonium
Phosphate Plants
Subpart W—SOP for the Phosphate Fertilizer Industry: Triple
Superphosphate Plants
Subpart X--SOP for the Phosphate Fertilizer Industry: Granular Triple
Superphosphate Storage Facilities
Subpart Y—SOP for Coal Preparation Plants
Subpart Z—SOP for Ferroalloy Production Facilities
Subpart AA—SOP for Steel Plants: Electric Arc Furnaces
A-8
-------
Part 60 (concluded)
Appendix A—Reference Methods.
Appendix B—Performance Specifications.
Appendix C—Determination of Emission Rate Change.
Appendix D—Required Emission Inventory Information.
Authority: Sections 111 and 114 of the Clean Air Act, as amended by
Section 4(a) of Public Law 91-604, 84 Stat. 1678
(42 U.S.C. 1857c-6, 1857c-9).
Source: 36 FR 24877, December 23, 1971, unless otherwise noted
in the CFR.
A-9
-------
APPENDIX B
SOME SPECIFIC AIR QUALITY MODELS
B-l
-------
APPENDIX B
SOME SPECIFIC AIR QUALITY MODELS
In Chapter IV of this report we subdivided air quality simulation
models into the following generic categories:
> Rollback
> Isopleth
> Physico-Chemical
- Grid
- Trajectory
- Gaussian
- Box
In this appendix we associate with each of these generic types a number of
specific models. We include many of the models with which we are familiar.
Because the list is intended only to be a representative one, we do not
enumerate all available models. Many others, particularly Gaussian models,
certainly exist and would be appropriate for use in the proper circumstances.
In compiling this list, we have drawn heavily from material in Argonne (1977),
EPA (1977a), and Roth et al. (1976), as well as various program users'
manuals. Also we have made no attempt to screen the models for technical
acceptability.
Among the information contained in the accompanying table is the fol-
lowing: model developer, EPA recommendation status, technical description,
and model capabilities. The last of these is further subdivided into
source type/number, pollutant type, terrain complexity, and spatial/
temporal resolution.
B-2
-------
TABLE B-1. SOME SPECIFIC AIR QUALITY MODELS

[Portions of this table are only partly legible; the recoverable information is summarized below.]

ROLLBACK

EPA Linear Rollback (accepted by EPA for reactive and nonreactive pollutants; nonverifiable; developer: EPA)
  Description: A linear relationship is assumed between emissions and the peak pollutant
    level. No treatment of individual sources; regional scale only (1-hour averaging implied).
  Output and problems addressed: Maximum percentage cutback required in emissions; regional
    problems for reactive and nonreactive pollutants.

ISOPLETH

EPA (EKMA) Isopleth Method (not yet recommended, but under active EPA interest; developer: EPA)
  Description: Isopleths of constant peak O3 on a plot of NOx versus NMHC are constructed
    using a chemical kinetic mechanism tuned to fit smog chamber data for the isopleth
    asymptotes. The diagram incorporates diurnal variation in solar radiation and the
    insolation, dilution, and inversion behavior typical of a stagnant, mid-summer day in
    Los Angeles; entry to the diagram is made with 6-9 a.m. values. No treatment of
    individual sources; regional scale only (1-hour averaging implied).
  Output and problems addressed: Percentage cutback required in NMHC emissions; regional
    oxidant.

A second isopleth variant
  Description: The isopleth diagram is similar to the one used in the EPA method except for
    a constant, rather than diurnally varying, photolytic input; entry parameters also
    differ, with absolute NMHC and NOx concentrations used to relate the actual airshed to
    the modeled one.
  Output and problems addressed: Percentage cutback required in NMHC emissions; regional
    oxidant.

PHYSICO-CHEMICAL: GRID, REGION ORIENTED

SAI Airshed Model (developer: SAI)
  Pollutants: O3, NO, NO2, CO, total aerosols, and four hydrocarbon categories (single bond,
    slow double bond, fast double bond, carbonyl bond), plus intermediate radical species.
  Terrain and resolution: Horizontal features can be handled through the wind field and
    vertical features through the cell vertical dimension; surface roughness coefficients;
    as fine as the input data (temporal) and the grid cell (spatial); time scale up to
    24 hours.
  Output: Spatial concentration maps for each hour and each pollutant of interest; vertical
    concentration profiles; peak predictions at monitoring stations; concentration isopleths.
  Problems addressed: Regional-scale problems; evaluation studies have been carried out for
    Los Angeles, Las Vegas, and Denver, with Sacramento and St. Louis soon to follow.

A second grid model (applied to the San Francisco Bay Area)
  Description: Similar in overall structure to the SAI model, with a chemical mechanism
    dividing hydrocarbons into "olefins and reactive aromatics," "paraffins and less
    reactive aromatics," and "aldehydes and ketones."
  Output and problems addressed: Roughly the same as for the SAI model; regional-scale
    problems; an evaluation study has been conducted for the San Francisco Bay Area.
B-3
-------
TABLE B-1 (Continued)

A grid, region-oriented model of the particle-in-cell type
  Description: Pollutant mass is carried by discrete particles, so information on the
    location and mass of each particle must be maintained. Time-varying emissions and winds
    are input, with a simplified chemical mechanism.
  Pollutants: O3, NO, NO2, SO2, and others.
  Output and problems addressed: Regional concentration fields; an evaluation study has been
    conducted, with agreement against observations not as good as that reported for the
    SAI model.

GRID, SPECIFIC SOURCE ORIENTED

A model developed by Environmental Research and Technology (no recommendation status)
  Description: Simulates near-field dispersion from major individual sources (for example,
    highway sources and fumigation near point sources), with time-varying emissions and
    winds and a treatment of plume rise.
  Resolution and output: As fine as the input data (temporal) and the grid cell (spatial);
    concentration fields at grid locations; single-source and near-field problems, including
    examination of plume impact in complex terrain.

TRAJECTORY, REGION ORIENTED

Several moving-air-column models (e.g., DIFKIN)
  Description: A vertically resolved column of air is advected by a two-dimensional wind
    field; hourly emissions are injected as the column passes over sources; vertical
    diffusivity is specified as a function of stability; photochemistry is treated with
    simplified mechanisms that lump hydrocarbons into reactivity classes.
  Developers: Several organizations, including General Research Corporation (Santa Barbara,
    California) and Pacific Environmental Services, Inc. (Santa Monica, California).
  Output and problems addressed: Temporal concentration histories along the trajectory;
    regional oxidant problems.
-------
TABLE B-1 (Continued)

A regional trajectory model prepared for the California Air Resources Board (no recommendation status)
  Description: The model is trajectory oriented and intended to be used for regional
    application. It appears to be similar to DIFKIN in that the air column allows up to
    10 vertical layers. Features: hourly emissions and horizontal 2-D winds are input;
    simulated species include four HC classes (alkenes, alkanes, aromatics, and aldehydes)
    as well as oxidants, SO2, and sulfate; a 54-step mechanism is employed; no horizontal
    diffusion; vertical diffusivity specified at up to 10 vertical levels with time
    variation.
  Sources and pollutants: Any number of point, area, and line sources; O3, NO, NOx, SO2,
    sulfate, and the four HC groups.
  Terrain and resolution: Terrain not explicit, but horizontal features can be handled
    through the wind field; as fine as the input data (temporal); only along the trajectory
    track (spatial), though several trajectories could be run side by side.
  Output and problems addressed: Temporal concentration history in the air parcel; regional
    oxidant; applied to Las Vegas, Tucson, and the SF Bay Area, as well as the LA Basin
    (Eschenroeder and Martinez).

TRAJECTORY, SPECIFIC SOURCE ORIENTED

A reactive plume model (no recommendation status)
  Description: Designed to estimate concentrations of reactive species downwind of a single
    point or areal source, based on a Lagrangian (moving-with-air-parcel) version of the
    mass conservation equation, allowing for background entrainment. The air parcel
    containing the emitted pollutants is allowed to drift downwind; the parcel expands from
    the plume height according to measured plume width and depth as functions of downwind
    distance, or according to the Pasquill-Gifford methods. Features a modified mechanism
    for HC-NOx-SO2 chemistry; 2-D wind field; plume rise input.
  Sources and pollutants: Single source; O3, NO, NO2, SO2, sulfate; no terrain interaction
    currently.
  Resolution: As fine as the input data (temporal); resolution all the way to the source
    (near-, medium-, and far-field), including long-range transport.
  Output and problems addressed: Temporal concentration history in the downwind direction;
    single-source problems (e.g., refineries, power plants), fumigation, and trapping;
    applied to several power plants, including Moss Landing (Monterey, CA) and Los Alamitos
    (Los Angeles, CA).

A second plume trajectory model (no recommendation status)
  Description: Designed to calculate concentration fields downwind of single or multiple
    concentrated sources. The air parcel is allowed to drift downwind, dispersing laterally
    and vertically. Features: equilibrium coupling of NO, NO2, and O3; first-order
    conversion of SO2 to sulfate; eddy diffusivities; 2-D wind field; Briggs plume rise; up
    to 7 species can be specified.
  Sources and pollutants: Up to 10 point sources and separate areal sources; O3, NO, NO2,
    SO2, sulfate; no terrain interaction currently.
  Output and problems addressed: Vertical and ground-level concentration maps and contours;
    concentration versus distance; single- or few-source problems; only analytical problems
    (e.g., steady-state Gaussian plumes) attempted so far.

LONG-TERM AVERAGING

AQDM--Air Quality Display Model (recommended by EPA in guideline; developer: TRW, for the Public Health Service)
  Description: A climatological steady-state Gaussian plume model that estimates the annual
    arithmetic average SO2 and particulate concentrations at ground level. A statistical
    model based on Larsen is used to transform the average concentration data from a limited
    number of receptors into expected geometric mean and maximum concentration values for
    several averaging times. Features: treats one or two pollutants simultaneously; Holland
    (1953) plume rise; no plume rise for areal sources; no temporal variation in sources;
    16 wind directions; several wind speed classes; 5 stability classes (Turner);
    Pasquill-Gifford stability coefficients; no chemical mechanism; perfect reflection at
    the ground; no effect of the mixing height until sigma_z = 0.47 H, with uniform mixing
    thereafter; no variation in wind speed with height; linear superposition of sources;
    sigma_z(x) = ax^b + c; does not treat fumigation or downwash; the Larsen procedure
    assumes a log-normal concentration distribution and a power-law dependence of median and
    maximum concentrations on averaging time.
  Sources and pollutants: Many user-specified receptor locations; point, areal, and elevated
    sources; SO2 and TSP (could be used for NOx, with NO2 obtained through use of an
    appropriate factor).
  Terrain and resolution: Relatively flat terrain; no height difference allowed between
    source and receptors; steady state; averaging time of 1 month to 1 year (the Larsen
    procedure can be used to transform to 1-24 hour averages); regional scale.
  Output and problems addressed: 1-month to 1-year averaged concentrations; individual point
    and area source culpability list for each receptor; regional long-term averages for
    relatively inert pollutants, primarily in urban areas.
B-5
-------
TABLE B-1 (Continued)

CDM and CDMQC--Climatological Display Model (recommended by EPA in guideline)
  Description: A climatological steady-state Gaussian plume model for determining long-term
    (seasonal or annual) arithmetic average concentrations.
  Terrain and resolution: Relatively flat terrain; no height difference allowed between
    sources and receptors; steady state; averaging time of 1 month to 1 year (the Larsen
    procedure can be used to transform to 1-24 hour averages); regional scale.
  Output and problems addressed: 1-month to 1-year averaged concentrations; source-receptor
    culpability list (CDMQC only); regional long-term averages for relatively inert
    pollutants, primarily in urban areas.

TCM--Texas Climatological Model (no recommendation status; developer: Texas Air Control Board)
  Description: A climatological steady-state Gaussian plume model similar to CDM but
    incorporating design features reducing run time by as much as two orders of magnitude.
    Features: downwash and fumigation not considered; all sources have a single average
    emissions rate for the averaging period (i.e., month, season, year);
    Pasquill-Gifford-Turner stability classes; mixing height not a factor because it has no
    effect for typical climatology.
  Sources and pollutants: Unlimited sources with arbitrary receptor locations; point, line,
    areal, and elevated sources; SO2, TSP, CO, NOx; relatively flat terrain.
  Resolution: Steady state; averaging time of 1 month to 1 year; the Larsen procedure can be
    used to transform to shorter averages; regional scale.
  Output and problems addressed: Concentrations at grid points (up to 50 x 50); a listing of
    the highest contributors to the concentration at each grid point; regional long-term
    averages for relatively inert pollutants, primarily in urban areas.

A sector-averaged Gaussian plume model (no recommendation status; can also be used for short-term averaging)
  Description: A steady-state sector-averaged Gaussian plume model that calculates
    concentrations of up to six pollutants from an unlimited number of point, line, and
    areal sources. The model can be operated either in a "climatological" mode or in a
    "sequential" mode for short-term averaging times. Features: the crosswind dispersion
    function may be sector-averaged over 22.5 degrees; for the "sequential" mode and "tall
    stacks," the crosswind dispersion function is given by the expected value within the
    22.5-degree sector for receptors within the downwind sector, and for receptors adjacent
    to the downwind sector a formulation is used which avoids centerline one-hour values
    when accumulating concentration estimates for multiple-hour averages; Briggs plume rise;
    stack tip downwash (Gifford) for tall stacks; wind speed power law; half-life decay
    factors for species (chemistry not treated directly); perfect reflection at the ground
    and the mixing layer; a unique emissions rate for each source that may be varied
    diurnally, weekly, or monthly; 5 stability classes.
  Sources and terrain: Unlimited point, line, and areal sources, with "tall stack" sources
    in the short-term "sequential" mode; flat and hilly terrain; a tall-stack terrain
    correction is available for the "sequential" mode but not the "climatological" mode; a
    unique elevation can be specified for receptors; plume and mixing depth respond to
    terrain obstacles.
  Resolution and output: Steady state; short-term (1, 3, 8, and 24 hour) and long-term
    (1 month, seasonal, 1 year) averaging; regional scale; concentrations at each receptor,
    averaged as specified, for relatively inert pollutants.
B-6
-------
TABLE B-1 (Continued)

SHORT-TERM AVERAGING

APRAC-1A (recommended by EPA in guidelines; developed for EPA by Stanford Research Institute)
  Description: A model which calculates hourly average CO concentrations for urban areas.
    Contributions from dispersion on three scales are calculated: extraurban, mainly from
    sources upwind of the city; intraurban; and local, from street canyon effects. Features:
    no plume rise, fumigation, or downwash; helical circulation in street canyons; hourly
    varying traffic emissions and a 2-D wind field; sigma_z(x) = ax^b; link emissions are
    aggregated into area sources; no wind power law; 6 stability classes (Turner);
    dispersion coefficients from McElroy and Pooler, modified using Leighton and Dittmar;
    no chemistry; perfect reflection at the surface and the inversion (the latter is ignored
    until the concentration equals that calculated using a box model, which is used
    thereafter).
  Sources and pollutants: Many sources (an extensive traffic inventory is used); receptors
    are defined on each street where street canyon effects are considered; CO, TSP.
  Resolution and output: 1-hour, 8-hour, and 24-hour averages; regional scale; hourly
    concentration values at receptors; regional problems involving inert pollutants in urban
    areas.

A single-source Gaussian plume model for uneven terrain (recommended by EPA in guidelines; can also be used for annual averaging)
  Description: A steady-state Gaussian plume model applicable in uneven terrain. Features:
    stability classes after Turner and Pasquill; dispersion coefficients from Turner; no
    chemistry; Briggs plume rise; no fumigation or downwash; perfect reflection at the
    surface.
  Sources and pollutants: A single source with up to 19 stacks (all assumed at the same
    location); CO, SO2, NOx, TSP.

A Gaussian model for urban area sources
  Description: Used to calculate dispersion from urban area sources by analytic integration
    of the area sources; all sources upwind of each receptor are summed. It is most
    applicable in areas where no point source information is available. Features: perfect
    reflection at the ground; mixing-height reflection not considered; hourly emissions and
    winds; sigma_z(x) = ax^b; narrow plume approximation (no horizontal dispersion); no
    plume rise; no chemistry.
  Sources and pollutants: Many sources; SO2, TSP.

A line-source Gaussian plume model for roadways (developer: EPA)
  Description: A steady-state Gaussian plume model that computes the hourly concentrations
    of nonreactive pollutants downwind of roadways, based on analytic integration of a line
    source applied to each lane of traffic. Features: no chemistry; perfect reflection at
    the surface and the inversion; one road or highway segment per run; 6 stability classes
    (Turner); dispersion coefficients from Turner, with coefficients from Zimmerman and
    Thompson (1975) for distances under 100 m; no wind power law; hourly emissions and a
    2-D wind.
  Sources and pollutants: Up to 24 line sources (arbitrary receptor and release heights);
    CO, TSP; level terrain; hourly (1-24 hour averages); near to medium field downwind.
  Output and problems addressed: One-hour average concentrations at each receptor;
    regional- or highway-specific problems for nonreactive pollutants.

A multiple-point-source Gaussian plume model (developer: EPA)
  Description: A steady-state Gaussian plume model that considers multiple point sources,
    based on linear additivity of individual source effects. Features: hourly emissions and
    winds; Briggs plume rise; no fumigation or downwash; no wind power law; Turner stability
    classes and dispersion coefficients (horizontal and vertical); no chemistry; multiple
    reflection.
  Sources and pollutants: Up to 25 elevated point sources (up to 30 receptors); SO2, TSP;
    flat terrain; hourly (1-24 hour averages); regional scale.
  Output and problems addressed: Hourly concentrations; a source contribution list at each
    receptor; average concentrations; regional and single-source problems in urban areas.
B-7
-------
TABLE B-1 (Continued)

PTDIS and companion point-source models (recommended by EPA; developer: EPA)
  Description: Steady-state Gaussian plume models for single point sources. One estimates
    short-term centerline concentrations directly downwind of a point source; a companion
    model estimates the maximum short-term concentrations from a single point source as a
    function of stability and wind speed. Features: uniform mixing is assumed above a
    specified fraction of the mixing height; the mixing height is determined from twice-
    daily temperature soundings (as is the stability class).

A multiple-source Gaussian plume model (recommended by EPA; developer: EPA)
  Sources and pollutants: Many point and areal sources (receptors all at the same height);
    SO2, TSP; flat terrain; hourly averages and averages of up to 24 hours; regional (urban)
    and rural applications.
  Output and problems addressed: Hourly and average concentrations at receptors, itemized as
    a source contribution list; cumulative frequency distribution data; regional problems
    for nonreactive pollutants, primarily urban.

VALLEY (recommended by EPA in guidelines; developer: EPA; can also be used for annual averaging in a climatological mode)
  Description: A steady-state Gaussian plume model for calculating annual and maximum
    24-hour average SO2 and TSP concentrations from single point sources in complex terrain.
    Features: climatological and short-term modes; 16 wind directions and 6 wind speed
    categories; Briggs plume rise (1971, 1972); 6 stability classes (Turner) for urban
    applications and 6 stability classes for rural applications; dispersion from Pasquill
    (1961) and Gifford (1961); no wind power law; exponential decay for chemistry and
    removal.
  Sources and pollutants: Point and areal sources (areal sources treated as points); SO2,
    TSP; up to 50 sources and 112 receptors on a radial grid, which can be at different
    topographical heights; complex terrain; short- and long-term averages (24-hour and
    annual); regional (urban and rural).
  Output: Short-term mode: highest 24-hour concentration and source contribution list;
    long-term mode: arithmetic means and source contribution list.

TEM--Texas Episodic Model (no recommendation status; developer: Texas Air Control Board)
  Description: A steady-state Gaussian plume model for predicting short-term concentrations
    (10 minutes to 24 hours) from multiple point and area sources. Calculations are
    performed for up to 24 scenarios (meteorology, averaging time, and mixing height).
    Features: Briggs plume rise; mixing height penetration factor; up to 5 pollutants; no
    chemistry, but exponential decay; no downwash or fumigation; wind power law;
    Pasquill-Gifford-Turner stability classes; dispersion coefficients from Turner; perfect
    reflection from the surface and the inversion until sigma_z = 0.47 H.
  Sources and pollutants: Multiple point and areal sources (up to about 200 areal sources);
    SO2, TSP; flat terrain; short-term averaging (1, 3, and 24 hours); regional (urban).
  Output and problems addressed: Mean concentrations at each grid point (10 minutes,
    30 minutes, 1 hour, 3 hours, and 24 hours) on up to a 50 x 50 grid; printed plot;
    culpability list; regional problems for nonreactive pollutants, primarily urban areas.
B-8
-------
TABLE B-1 (Concluded)

TAPAS--Topographic Air Pollution System (no recommendation status; developer: USDA Forest Service)
  Description: This model combines a simulation of the wind field over mountainous terrain
    with a Gaussian-derived diffusion model. It provides an estimate of the total allowable
    emissions within each of a number of grid cells (ranging from 0.25 km2 upward) to
    maintain a preselected level of air quality. The diffusion model is employed in each
    grid cell to provide an estimate of the mixing conditions within these cells. These
    conditions are combined with the Pollutant Standards Index such that a maximum allowable
    emission is calculated. Features: wind model (Cressman objective analysis, potential
    flow over topography, influences of surface temperature and roughness); Gaussian model
    (sigma_y and sigma_z from Turner, effects of mass flow divergence included, stability
    classes from Turner, no upper bound on diffusion although the wind is calculated
    assuming a lid at a specified height above the topography); the calculated wind follows
    the terrain and thus gives a vertical wind component; no chemistry; no explicit
    treatment of plume behavior.
  Sources and pollutants: Many sources; point sources (no distinction made between point,
    line, and areal sources); SO2, TSP, CO; complex terrain; both short-term and long-term
    estimates; limited regional scale.
  Output and problems addressed: Allowable emissions in each grid cell for each pollutant of
    interest; limited regional impact problems in complex terrain; nonreactive pollutants.

AQSTM--Air Quality Short Term Model (no recommendation status; developer: Illinois Environmental Protection Agency)
  Description: A steady-state Gaussian plume model for estimating short-term concentration
    averages from multiple point sources in level or complex terrain. It can simulate late
    inversion break-up fumigation, lake shore fumigation, and atmospheric trapping.
    Features: one or two pollutants simultaneously; no chemistry; Briggs plume rise; no
    downwash; wind power law; user-supplied stability classes; dispersion coefficients from
    Turner (1969); perfect reflection at the ground and the mixing height.
  Sources and pollutants: Up to 200 elevated point sources, with receptors located on a
    uniform rectangular grid and a unique topographic elevation for each; SO2, TSP; mostly
    flat terrain, with some corrections for complex terrain; short-term averaging (1, 3,
    and 24 hours); regional scale.
  Output and problems addressed: Average concentrations at receptors; source contributions
    at receptors; regional point source problems for nonreactive pollutants; urban areas;
    shorelines.

CALINE-2 (no recommendation status; developer: California Air Resources Board--CARB)
  Description: A steady-state Gaussian line source model for traffic impact assessment.
    Features: no chemistry; perfect ground reflection; Pasquill stability classes; hourly
    emissions; some accounting for depressed highways.
  Sources and pollutants: Many line sources (an extensive traffic inventory is required);
    CO; relatively flat terrain; short-term averaging.
  Output and problems addressed: Hourly concentrations at receptors; regional CO problems
    from traffic sources.

BOX

A single-box model (no recommendation status; developer: Atmospheric Turbulence and Diffusion Laboratory--ATDL, Oak Ridge, Tenn.)
  Description: The region of interest is assumed to be encompassed by a single cell or box,
    bounded by the inversion above and the terrain below. All concentrations are assumed to
    be in steady state. Features: for a given time, constant emissions rate and simple
    winds; seven-step chemical mechanism proposed by Friedlander and Seinfeld (1969);
    uniform and constant wind and constant mixing depth.
  Sources and pollutants: All sources emit into a single box; O3, NO, NO2, NMHC; terrain not
    explicit; temporal resolution can be obtained by varying initial conditions to match a
    temporal pattern; no spatial resolution.
  Output and problems addressed: Concentration values at the time considered; regional
    oxidant; it was applied to the LA Basin (30 September 1969 data), and the ozone
    predictions were low.
B-9
-------
APPENDIX C
SOME SPECIFIC MODEL PERFORMANCE MEASURES
C-1
-------
APPENDIX C
SOME SPECIFIC MODEL PERFORMANCE MEASURES
Having discussed model performance measures in generic terms in
Chapter V, we now present some specific examples. We discuss each of the
four generic types of performance measures: peak, station, area, and
exposure/dosage. We include scalar, statistical, and "pattern recogni-
tion" variants.
1. PEAK PERFORMANCE MEASURES
The use of a performance measure of this type requires the modeler to
know information about both the predicted and the "true" concentration peak.
The measurement network must be so situated as to insure a high probability
of sensing the "true" peak concentration or a value near to it. There are
three characterizing parameters of interest: peak concentration level,
spatial location, and time of occurrence. The predicted and observed values
of some or all of these may be available for comparison. Differences in
their predicted and observed values represent the performance measures of
interest. These peak measures are summarized in Table C-l.
Each measure conveys separate but related information about model
behavior in predicting the concentration peak. Their values should be
examined in combinations. Several combinations of interest and some of
their possible interpretations are shown in Table C-2. The table is not
intended to include all combinations and interpretations. Rather, it
illustrates by example how inferences can be made about model performance
through the joint use of performance measures.
C-2
-------
TABLE C-1. SOME PEAK PERFORMANCE MEASURES

Type                  Performance Measure

Scalar                a. Difference* in the peak ground-level concentration values.
                      b. Difference in the spatial location of the peak.
                      c. Difference in the time at which the peak occurs.
                      d. Difference in the peak concentration levels at the time of
                         the observed peak.
                      e. Difference in the spatial location of the peak at the time
                         of the observed peak.

Pattern recognition   Map showing the locations and values of the predicted maximum
                      one-hour-average concentrations for each hour.

* "Difference" as used here usually refers to "prediction minus observation."
Several points are contained in Table C-2. While a large difference
in peak concentration levels might in itself be sufficient reason to question
a model's performance, a simple difference in peak location might not. If
the concentration residual (the difference between predicted and observed
values) at the peak is small (good agreement) and yet there is a difference
in the spatial location of the peak, this may be due mostly to slight errors
in the wind field input to the model. The slight offset in the location of
the peak might cause predicted and measured concentrations to disagree at
specific monitoring stations, particularly if concentration gradients within
the pollutant cloud are "steep." However, a small displacement in the con-
centration field, unless it resulted in a large change in population exposure
and dosage, may not be a serious problem. Model performance might be otherwise
acceptable.
C-3
-------
TABLE C-2. SEVERAL PEAK MEASURE COMBINATIONS OF INTEREST
           AND SOME POSSIBLE INTERPRETATIONS

Residual Values
Concentration
Level         Location    Timing      Some Possible Interpretations

Event-Related*
Small         Small       Small       Model performance in predicting the concentration
                                      peak is acceptable.
Small         Large       Small       Model performance is still good in predicting the
                                      peak concentration level; there is a possible error
                                      in the wind field input.
Small         Large       Large       Concentration level prediction is good; there is a
                                      possible error in the wind field input; there is a
                                      possible error in the chemistry package or emissions
                                      input.
Large         Any value   Any value   Model performance is probably unacceptable.

Fixed-Time†
Large         Large          --       Model performance may or may not be acceptable;
                                      event-related (peak) residuals must be examined to
                                      make a final judgment.
Large         Small          --       Model performance is probably unacceptable; pollutant
                                      transport is handled acceptably well; there is a
                                      possible error in the chemistry package, the
                                      emissions input, or the inversion height time and
                                      spatial history.

* Residual values are calculated at the time an event occurs (the peak).
† Residual values are calculated at a fixed time (the time of the observed peak).
C-4
-------
On the other hand, if the spatial offset of the location of the peak
is accompanied by a significant difference between the predicted and observed
times at which the peak occurs, more serious problems might be suspected.
Not only might there be a wind field problem, but the chemical kinetic
mechanism may be giving erroneous results (if the pollutant species of
interest is a reactive one). Alternatively (or additionally), one might
suspect that the emissions supplied as input to the model were not the same
as those injected into the actual atmosphere. Another possibility also
exists. Slight differences between the modeled and actual wind field
might result in the air parcel in which the peak occurs following a space-
time track having sufficiently different emissions to account for differences
in peak concentration values.
Additional clarity of interpretation can be achieved in another way.
We can compare concentration level, location and timing, not just at the
time a specific event occurs (the peak, for instance) but also at a fixed
time (the time at which the observed peak occurs, for example). Suppose
that the concentration level residual at that fixed time (the difference
between maximum predicted concentration and the observed peak value) is
large but the spatial one is not. In this case, one could conclude that
the model reproduced the pollutant transport process but was unable to
predict concentration levels. This could result from many causes, among
which are errors in the chemical kinetic mechanism, the emissions input,
or the inversion height space/time profile. Whatever the cause, however,
the conclusion remains the same: Model performance is probably inadequate.
Alternatively, if the fixed-time concentration level and location
residuals are both large, a firm conclusion about model acceptability may
be premature. Performance may or may not be satisfactory. A comparison
with the event-related peak performance measures is necessary before a
final judgment is made.
C-5
-------
If the model being used is capable of sufficient spatial and temporal
resolution, a "pattern recognition" performance measure may be of some use:
a map showing the locations and values of the predicted maximum concentrations
at several times during the day. Such a map is shown in Figure C-1. It
was produced using the SAI Urban Airshed Model simulating conditions in
the Denver Metropolitan region.
2. STATION PERFORMANCE MEASURES
The use of a station performance measure requires the modeler to
know, usually at each hour during the daylight hours, the values of both
the predicted and observed concentrations at each monitoring station. From
the two concentration time histories at each site, a number of performance
measures can be computed; they are listed in Table C-3, divided into three
categories: scalar, statistical, and "pattern recognition."
Station measures are the performance measures whose use is most
feasible in practice. Their calculation is based upon the comparison
of model predictions with observational data in the form that it is most
often available—a set of station measurements. By contrast, peak
measures require the observation of the "true" peak. If this peak value
is not the same as the value recorded at that station in the monitoring
network measuring the highest level, if the location of the peak is
somewhere other than at that station, and if its time of occurrence is
different than the time of the peak observation, then the calculation of
peak performance measures may not be feasible. Although one can sometimes
use numerical methods to infer from station data the level, location and
timing of the peak, results are subject to uncertainty.
Similarly, area and exposure/dosage measures require knowledge of the
"true" spatially and temporally varying concentration field. However,
unless circumstances are simple and the monitoring network is exceptionally
extensive and well-designed, the "true" concentration field will not be
known. The only data available will consist of station measurements. Infer-
ence of the concentration field from such data can often be an uncertain
and error prone process.
C-6
-------
SOUTH
Meteorology of 3 August 1976
FIGURE C-1. LOCATIONS AND VALUES OF PREDICTED MAXIMUM ONE-HOUR-
AVERAGE OZONE CONCENTRATIONS FOR EACH HOUR
FROM 8 a.m. TO 6 p.m.
C-7
-------
TABLE C-3. SOME STATION PERFORMANCE MEASURES
Type: Scalar
  Concentration residual at the station measuring the highest concentration
  (event-specific time and fixed-time comparisons).
  Difference in the spatial locations of the predicted peak and the observed
  maximum (event-specific time and fixed-time comparisons).
  Difference in the times of the predicted peak and the observed maximum.

Type: Statistical
  For each monitoring station separately, the following concentration residual
  statistics are of interest for the entire day:
    1) Average deviation
    2) Average absolute deviation
    3) Average relative absolute deviation
    4) Standard deviation
    5) Correlation coefficient
    6) Offset-correlation coefficient.
  For all monitoring stations considered together, the following residual
  statistics are of interest:
    1) Average deviation
    2) Average absolute deviation
    3) Average relative absolute deviation
    4) Standard deviation
    5) Correlation coefficient
    6) Estimate of bias as a function of concentration
    7) Comparison of the probabilities of concentration exceedances as a
       function of concentration.
  Scatter plots of all predicted and observed concentrations with a line of
  best fit determined in a least squares sense.
  Plot of the deviations of the predicted versus observed points from the
  perfect correlation line compared with estimates of instrumentation errors.

Type: Pattern recognition
  Time history for the modeling day of the predicted and observed
  concentrations at each site.
  Time history of the variations over all stations of the predicted and
  observed average concentrations.
  At the time of the peak (event-related), the ratio of the normalized residual
  at the station having the highest value to the average of the normalized
  residuals at the other stations.
C-8
-------
a. Scalar Station Performance Measures
Since the "true" concentration peak is not always known with confidence,
a surrogate is needed for determining model performance in predicting the
concentration peak. Such a measure is often based upon a comparison of
the predicted and observed concentrations at the station measuring the
highest value during the day. The comparison can be done at an event-related
time (the peak) or a fixed time. Since the values of the measures may
differ at the two times, the implications of those differences should be
considered carefully.
b. Statistical Station Performance Measures
Many statistical station performance measures are of use. Sometimes
the behavior of the concentration residuals at a single station is considered.
At other times, the overall behavior of the residuals averaged over all
stations is the focus of interest. In either case, however, several of
the statistical performance measures remain the same. We define them here
(the tilde ~ denotes "predicted," while m is the pollutant species, n
is the hour of the day, k is the station index, K is the number of stations
being considered, and N is the number of hours being compared):
> Average Deviation

  \bar{d}_m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( \tilde{C}_k^{m,n} - C_k^{m,n} \right)    (C-1)

> Average Absolute Deviation

  \overline{|d|}_m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} \left| \tilde{C}_k^{m,n} - C_k^{m,n} \right|    (C-2)

> Average Relative Absolute Deviation

  \overline{|d_r|}_m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{\left| \tilde{C}_k^{m,n} - C_k^{m,n} \right|}{C_k^{m,n}}    (C-3)
C-9
-------
> Standard Deviation

  \sigma_m = \left\{ \frac{1}{KN-1} \sum_{n=1}^{N} \sum_{k=1}^{K} \left[ \left( \tilde{C}_k^{m,n} - C_k^{m,n} \right) - \bar{d}_m \right]^2 \right\}^{1/2}    (C-4)
or, alternatively,
  \sigma_m^2 = \frac{1}{KN-1} \left[ \sum_{n=1}^{N} \sum_{k=1}^{K} \left( \tilde{C}_k^{m,n} - C_k^{m,n} \right)^2 - KN\,\bar{d}_m^{\,2} \right]    (C-5)
The first three of these relations are designed to measure the mean
difference between predicted and observed concentration, either at a
particular station (K = 1) or averaged over all of them (K = total number
of stations). The average deviation expresses the mean value of the
residuals through the day. A non-zero value is an indication of a system-
atic bias. Because large positive residuals can cancel with large negative
values, a low value of average deviation does not always guarantee close
agreement between prediction and observation. By computing the average
absolute deviation, however, one can assess whether such a "cancellation"
problem is occurring. A large value is an indication of appreciable con-
centration differences, providing such information even if the average
deviation is small. Since a small number of large residuals can dominate
in the computation of the previous measures, a large value for either of
them does not necessarily indicate consistently large disagreement between
prediction and observation. Residuals can be normalized to balance the
effect of large and small residuals; the average relative absolute
deviation is computed in this way.
The standard deviation, as expressed in Eq. (C-4), is a measure of
the shape of the frequency distribution of the residuals. A large value
indicates that residual values vary throughout a large range. Correspond-
ingly, a small value suggests that they cluster closely about their mean
value, as expressed in Eq. (C-1).
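As a purely illustrative sketch (Python is not part of this report; the array
and function names are hypothetical), Eqs. (C-1) through (C-4) might be
evaluated for hourly station data as follows; a single station corresponds to
the case K = 1.

import numpy as np

def residual_statistics(predicted, observed):
    # Arrays shaped (N hours, K stations); a single station is the case K = 1.
    residuals = predicted - observed
    avg_dev = residuals.mean()                                # Eq. (C-1)
    avg_abs_dev = np.abs(residuals).mean()                    # Eq. (C-2)
    avg_rel_abs_dev = (np.abs(residuals) / observed).mean()   # Eq. (C-3)
    std_dev = np.sqrt(((residuals - avg_dev) ** 2).sum()
                      / (residuals.size - 1))                 # Eq. (C-4)
    return avg_dev, avg_abs_dev, avg_rel_abs_dev, std_dev

# Illustrative call with random placeholder data: 12 daylight hours, 9 stations.
rng = np.random.default_rng(0)
obs = rng.uniform(2.0, 20.0, size=(12, 9))          # "observed" ozone, pphm
pred = obs + rng.normal(0.0, 2.0, size=obs.shape)   # "predictions" with random error
print(residual_statistics(pred, obs))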
C-10
-------
Another statistical measure is of interest. The correlation coeffi-
cient, as expressed below, provides an indication of the extent to which
variations in observed station concentrations are matched by variations in
the predicted station values. A close match is indicated by a value near
to one (the value for "perfect" correlation).
> Correlation Coefficient

  r_m = \frac{\frac{1}{KN-1} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( \tilde{C}_k^{m,n} - \bar{\tilde{C}}^m \right) \left( C_k^{m,n} - \bar{C}^m \right)}{\sigma_{\tilde{C}}^m \, \sigma_C^m}    (C-6)

where

  \bar{\tilde{C}}^m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} \tilde{C}_k^{m,n}    (C-7)

  \bar{C}^m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} C_k^{m,n}    (C-8)

  \sigma_{\tilde{C}}^m = \left[ \frac{1}{KN-1} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( \tilde{C}_k^{m,n} - \bar{\tilde{C}}^m \right)^2 \right]^{1/2}    (C-9)

  \sigma_C^m = \left[ \frac{1}{KN-1} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( C_k^{m,n} - \bar{C}^m \right)^2 \right]^{1/2}    (C-10)
If the value of the correlation coefficient is not close to one,
this may or may not be an indication that model performance is deficient.
For instance, suppose slight errors were embedded in the wind field
supplied to the model. Possibly, the only effect of this could be a
slight offset between the predicted and the "true" pollutant cloud location.
The concentration level and its distribution within the cloud might be
C-ll
-------
well predicted otherwise. However, the correlation coefficients computed
at individual stations (K = 1) might not demonstrate agreement between
prediction and observation, indicating instead the opposite. Conceivably,
this also might be the case even if the correlation coefficient is computed
using concentration values averaged for all stations (K = total number of
stations).
Another statistical measure is useful in overcoming this difficulty
when sampling stations are not too "sparsely" sited. This measure is the
offset correlation coefficient and is designed to compare predictions at
one station and time against observations at another station and/or time.
It is defined as follows:
> Offset Correlation Coefficient

  r_{kj}^{m}(\Delta n) = \frac{\frac{1}{N-1} \sum_{n=1}^{N} \left( \tilde{C}_k^{m,n} - \bar{\tilde{C}}_k^m \right) \left( C_j^{m,n+\Delta n} - \bar{C}_j^m \right)}{\sigma_{\tilde{C}_k}^m \, \sigma_{C_j}^m}    (C-11)

where k is the index of the measurement station at which concentrations are
predicted, j is the index of the station at which they are measured, and Δn
is the time offset between prediction and observation; the single-station
means and standard deviations are defined as in Eqs. (C-7) through (C-10)
with K = 1, for example

  \bar{\tilde{C}}_k^m = \frac{1}{N} \sum_{n=1}^{N} \tilde{C}_k^{m,n}    (C-12)
C-12
-------
Many reasons can account for differences between prediction and
observation. The offset correlation coefficient itself cannot be used
to isolate specific reasons, but it can detect time lags or spatial offsets
between comparative concentration histories. A time lag might occur
because of slight differences between modeled and actual wind speed, diurnal
inversion height history, emissions, or atmospheric chemistry, as well as
any of a number of other reasons. These differences could manifest them-
selves at a particular monitoring station as a simple time lag, an example
of which is shown in Figure C-2(a). Also, for the reasons mentioned above,
as well as differences in modeled and actual wind direction, a spatial
offset can occur which could result in the actual and predicted pollutant
clouds passing over different but adjacent stations. A comparison of the
concentration profiles at these two stations, such as those shown in
Figure C-2(b), can reveal the offset. Good agreement could be inferred if
the value of the offset correlation coefficient between the concentrations
at the two stations, at the same time, assumed a value near one ("perfect"
correlation).
In using station data as a basis for comparing prediction with obser-
vation, the offset correlation coefficient should be computed as a matter
of course. For the station of interest (perhaps the one recording the highest
concentration value), computation of the following offset correlation coeffi-
cients might be revealing: first, at the same hour, with all adjacent sta-
tions (unless none are nearby); then, at the same station, for adjacent hours
(for example, one and two hours lag and lead); and finally, with all adjacent
stations and hours (to reveal the joint presence of spatial offset and time lag).
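As an illustrative sketch only (Python; the function and variable names are
hypothetical, and the data below are random placeholders), the screening of
adjacent stations and lagged hours described above might be carried out as
follows:

import numpy as np

def offset_correlation(pred_k, obs_j, lag):
    # Correlation between predictions at station k and observations at
    # station j offset by `lag` hours (a sketch in the spirit of Eq. C-11).
    if lag > 0:
        p, o = pred_k[:-lag], obs_j[lag:]
    elif lag < 0:
        p, o = pred_k[-lag:], obs_j[:lag]
    else:
        p, o = pred_k, obs_j
    if len(p) < 2:
        return np.nan
    return np.corrcoef(p, o)[0, 1]

# Screen neighboring stations and lags of one and two hours for the station
# of interest (here station index 0; indices and data are illustrative only).
rng = np.random.default_rng(1)
obs = rng.uniform(2.0, 20.0, size=(12, 4))                             # (hours, stations)
pred = np.roll(obs, 1, axis=0) + rng.normal(0.0, 1.0, size=obs.shape)  # built-in 1-hour lag
for j in range(obs.shape[1]):
    for lag in (-2, -1, 0, 1, 2):
        r = offset_correlation(pred[:, 0], obs[:, j], lag)
        print(f"station {j}, lag {lag:+d} h: r = {r:.2f}")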
C-13
-------
Δn
Hour of Day
(a) Time Lag (Predicted and Measured Concentrations
are for the same monitoring station)
Hour of Day
(b) Spatial effect (Predicted and Measured Concentrations
are for Different but Adjacent Monitoring Stations)
FIGURE C-2. CONCENTRATION HISTORIES REVEALING
TIME LAG OR SPATIAL OFFSET
C-14
-------
For all the monitoring stations considered together, several other
statistics are of interest. For instance, the variation of bias in model
predictions with the level of pollutant concentration can be plotted as
shown in Figure C-3 . In this particular example, based upon simulations
of the Denver Metropolitan region performed using the SAI Urban Airshed
Model, the fractional mean deviation from perfect agreement between predic-
tion and observation appears to vary randomly at the higher ozone concen-
trations. Aside from an apparent systematic bias at very low concentrations,
no conclusion of significant bias seems demonstrable.
[x-axis: Root Mean Square Ozone Concentration (pphm), computed from the
(Observed)² and (Predicted)² values.]
FIGURE C-3. ESTIMATE OF BIAS IN MODEL PREDICTIONS AS A FUNCTION OF
OZONE CONCENTRATION. This figure is based upon predic-
tions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
C-15
-------
Residuals can vary in sign and magnitude during the modeling day.
It is often helpful to plot their diurnal variation. An example is
shown in Figure C-4, based upon predictions of the SAI Urban Airshed Model
for three modeling days in Denver. A discernable pattern might be sympto-
matic of basic model inadequacies. In this example, however, no simple
pattern seems apparent.
For each set of observations or predictions (for all stations and
times), there exists a cumulative concentration frequency distribution.
This describes the probability of occurrence of a concentration in excess of
a certain value for the range of possible concentration values. An example
based upon the modeling effort noted earlier is shown in Figure C-5. A con-
clusion might be drawn from this figure: Although background ozone concen-
trations are not well-determined (low background concentrations are difficult
to measure accurately), higher concentrations are more predictably distributed.
By plotting observed concentrations against predicted ones (at each
station for each hour), a graphic record of their correlation can be obtained.
The degree of clustering of observation-prediction pairs about the perfect
correlation line provides an indication of the degree of their agreement.
An example is presented in Figure C-6. For each particular combination of
observation and prediction, the number of occasions on which that
combination occurred is shown.
Superimposed on the figure are the standard deviation bands (1σ) for
both the EPA standard and maximum acceptable instrumentation error. These
bands portray the extent to which station measurements are accurate indi-
cators of "true" concentrations. To conclude that a model is unable to
reproduce a set of "true" concentrations, one must know the value of those
concentrations. Measurements, however, are imperfect surrogates. If
concentration residuals are within instrumentation limits, differences could
be explained solely by measurement errors. In such a case, no further
conclusions could be reached about model predictive ability.
C-16
-------
[Plot residue removed. Legend: mean of all stations for each of the three
modeling days and the average of the 3 days; x-axis: Time of Day by Hourly
Averaging Period.]
FIGURE C-4. TIME VARIATION OF DIFFERENCES BETWEEN MEANS OF OBSERVED AND
PREDICTED OZONE CONCENTRATIONS. This figure is based upon
predictions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
-------
[Plot residue removed. Curves: Observed and Predicted; 279 data pairs from
3 days and 9 stations; x-axis: Probability of Exceedance of Given Ozone
Concentration.]
FIGURE C-5. PROBABILITIES OF OZONE CONCENTRATION EXCEEDANCE. This figure is based
upon predictions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
-------
P=Predicted
FIGURE C-6.
MODEL PREDICTIONS CORRELATED WITH INSTRUMENT OBSERVATIONS
OF OZONE (DATA FOR 3 DAYS, 9 STATIONS, DAYLIGHT HOURS).
This figure is based on predictions of the SAI Urban
Airshed Model for the Denver Metropolitan region.
C-19
-------
Some of the information contained in Figure C-6 is summarized in
Table C-4. The percent of prediction/observation pairs meeting certain
correspondence levels is indicated for this example. The extent to
which concentration residuals compare with instrumentation error is
shown in Figure C-7. These same plots can be constructed for most
modeling applications for which station predictions are known.
TABLE C-4. OCCURRENCE OF CORRESPONDENCE LEVELS OF PREDICTED
AND OBSERVED OZONE CONCENTRATIONS
Correspondence Level Between Predicted (P) and Observed (O) Pairs, with the
percent of comparisons meeting each level given first for all comparisons and
then for pairs in which both the predicted and observed concentrations
exceed 8 pphm:

1) Factor of two (2P > O > P/2): 80 / 94
2) Computed value is within ± twice the S.D. of the maximum probable
   instrument error (95% level) of the observed value: 100 / 100
3) Computed value is within ± the S.D. of the maximum probable
   instrument error (95% level) of the observed value: 93 / 90
4) Computed value is within ± twice the S.D. of instrument errors by the
   EPA standard (95% level) of the observed value: 89 / 77
5) Computed value is within ± the S.D. of instrument errors by the
   EPA standard (95% level) of the observed value: ~60 / 37
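The following fragment is a rough sketch (Python; not taken from the report)
of how correspondence-level percentages of the kind tabulated above can be
tallied; the instrument-error half-width and the 8 pphm threshold are
placeholders to be replaced by the applicable values.

import numpy as np

def correspondence_fractions(pred, obs, inst_sd_pphm=3.0, threshold=8.0):
    # Fraction of prediction/observation pairs meeting correspondence levels
    # like those in Table C-4.  inst_sd_pphm is a placeholder half-width.
    pred, obs = np.asarray(pred).ravel(), np.asarray(obs).ravel()
    within_factor_two = (pred <= 2 * obs) & (pred >= obs / 2)
    within_one_sd = np.abs(pred - obs) <= inst_sd_pphm
    within_two_sd = np.abs(pred - obs) <= 2 * inst_sd_pphm
    both_high = (pred > threshold) & (obs > threshold)
    return {
        "factor of two, all pairs": within_factor_two.mean(),
        "within 1 s.d. of inst. error, all pairs": within_one_sd.mean(),
        "within 2 s.d. of inst. error, all pairs": within_two_sd.mean(),
        "factor of two, both above threshold": within_factor_two[both_high].mean()
        if both_high.any() else float("nan"),
    }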
c. "Pattern Recognition" Station Performance Measures
Several qualitative/composite model performance measures are useful
in comparing station predictions with observations. At each monitoring
site, for instance, the time history through the modeling day of the pre-
dicted concentrations can be plotted directly with the time history of
C-20
-------
[Plot residue removed. Shown: deviations of predicted versus observed points
from the perfect correlation line (281 one-hour-average data points), compared
with the EPA acceptable monitor error band and the maximum probable instrument
error band; x-axis: Difference (pphm).]
FIGURE C-7. MODEL PREDICTIONS COMPARED WITH ESTIMATES OF INSTRUMENT ERRORS FOR OZONE (DATA
FOR 3 DAYS, 9 STATIONS, DAYLIGHT HOURS)
-------
the measurement data. This is done in Figure C-9 for one of the days
(3 August 1976) in the Denver modeling example employed earlier.
Preceding this figure is a map in Figure C-8, which shows the names and
locations of the air quality monitoring stations in the Denver Metropol-
itan region.
For each hour during the day, the predicted and observed concentrations
each can be averaged for all measurement stations. The diurnal variation
of this all-station average can also be of interest. An example of such
a time history is shown in Figure C-10.
At the time the concentration peak occurs, the performance of the
model in predicting that peak is of interest as is its ability to predict
the lower concentration values at monitoring stations distant from the
peak. An indication of the relative prediction-observation agreement at
the peak versus the agreement at outlying stations can be found by com-
puting a composite performance measure. The ratio can be found of the
normalized residual at the station measuring the highest concentration
value to the average of the normalized residuals at the other stations.
If this ratio is large, better performance at the outlying stations than
near the peak can be inferred. If the value is small, the reverse is true.
If the ratio is near unity, agreement is much the same throughout the
modeled region.
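A minimal sketch of this composite ratio (Python; names are hypothetical, and
normalizing each residual by its observed concentration is an assumption)
follows:

import numpy as np

def peak_to_outlying_ratio(pred, obs):
    # At the hour of the observed peak, ratio of the normalized residual at
    # the station reporting the highest value to the mean normalized residual
    # at the other stations.  Arrays are shaped (hours, stations).
    pred, obs = np.asarray(pred), np.asarray(obs)
    hour, station = np.unravel_index(np.argmax(obs), obs.shape)
    norm_resid = np.abs(pred[hour] - obs[hour]) / obs[hour]
    others = np.delete(norm_resid, station)
    return norm_resid[station] / others.mean()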
The value of a concentration residual at a station changes during
the modeling day. If these changes can be tied to corresponding changes
in atmospheric characteristics (the height of the inversion base, for
instance), we can sometimes draw valuable inference about model performance
as a function of the value of these atmospheric "forcing variables." Some
of these variables include: wind speed, inversion height, ventilation (com-
bining the previous two variables into a product of their values), solar
insolation, and a particular category of emissions (automotive, for example).
C-22
-------
KEY
NG - Northglenn              NJ - National Jewish Hospital
WE - Welby                   GM - Green Mountain
AR - Arvada                  OV - Overland
CR - C.A.R.I.H.              PR - Parker Road
CM - Continuous Air Monitoring Program [CAMP]
NORTH
SOUTH
FIGURE C-8. MAP OF DENVER AIR QUALITY MODELING REGION SHOWING
AIR QUALITY MONITORING STATIONS
C-23
-------
[Plot residue removed. x-axis: Time of Day, by Hourly Interval; legend:
Observed, Predicted.]
FIGURE C-9. TIME HISTORY OF PREDICTED AND OBSERVED CONCENTRATIONS
AT MONITORING SITES. This figure is based on the pre-
dictions of the SAI Urban Airshed Model in Denver
for 3 August 1976.
C-24
-------
Time of Day By Hourly Averaging Period
FIGURE C-10.
VARIATIONS OVER ALL STATIONS OF OBSERVED AND PREDICTED AVERAGE OZONE CONCENTRATIONS.
This figure is based on the predictions of the SAI Urban Airshed Model in Denver.
-------
To examine residual values for cause-and-effect relationships, we can
plot on the same figure the time history of both the residual and the
forcing variable. Alternatively we can plot the residual directly
with the forcing variables. Examples of both of these are presented in
Figure C-11.
-------
model performance measures. In practice, however, we are seldom able to
resolve fully the "true" concentration field, even if the model we use is
capable of doing so for the predicted field. This difficulty derives from
the limited sampling of measurement data generally available: Only measure-
ments at several scattered monitoring stations are recorded. Unless ambient
conditions are highly predictable and the monitoring network is extensive
and exceptionally well-designed, reconstruction of the "observed" concen-
tration field from discrete station measurements can be an uncertain and
error prone process.
Nevertheless, the observed concentration field can be inferred with
accuracy in some circumstances. In addition, models frequently can provide
spatially resolved predictions. Grid models, for instance, predict average
concentrations in a number of grid cells. Resolution is then provided as
finely as the horizontal grid-cell dimensions (on the order of one to sev-
eral kilometers). Trajectory model predictions can be used to calculate
concentrations along the space-time track followed by the air parcel being
modeled. Gaussian models are analytic and can resolve fully their predictions.
Thus, even if the observed concentration field is known only imperfectly,
the predicted field, because it is often much better resolved, can still
provide qualitative information about model performance. Further, the
shape of the predicted concentration field can suggest ways to extract
information for comparison with station measurements. We discuss "hybrid"
performance measures later in this Appendix.
In this section we present several area performance measures. When
predicted and observed concentration fields are known, they can provide
considerable insight into model performance. These performance measures
are based upon taking the difference between the predicted and observed
values of certain quantities. Even when the observed values of these
quantities are not known with accuracy, computation of their predicted values
can provide a systematic means for characterizing model predictions.
C-27
-------
The performance measures presented here can be divided into three
types: scalar, statistical, and "pattern recognition." We discuss each
in turn. In Table C-5, we list some of these measures.
a. Scalar Area Performance Measures
The seriousness of a pollutant problem is a function not only of the
concentration level itself but also of the spatial extent of the pollutant
cloud. Several scalar area performance measures are designed with this in
mind. Even if a model predicts the peak concentration well, it may not
necessarily predict the extent of the area exposed to concentrations near
to that value. This might not be a serious defect if the pollutant cloud
passed over uninhabited terrain. However, if the cloud were to drift
over a densely populated urban area, a considerable difference in the
health effects experienced could exist between a cloud one mile across and
another five miles across. This could correspondingly affect our willing-
ness to accept for use a model whose predictions of cloud dimensions
differed considerably from observed dimensions.
Two performance measures of interest are the following: the differences,
between prediction and observation, in the fraction of the area of interest
within which concentrations exceed the NAAQS and in the fraction experiencing
concentrations within 10 percent of the peak value. The first of these is a measure of the
general ability of the model to predict the spatial extent of concentra-
tions in the range of interest. The second estimates the performance of
the model in the higher concentration ranges at which, presumably, health
effects are more pronounced.
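As an illustration only (Python; the gridded arrays and the function name are
hypothetical), these two area-fraction differences might be computed as
follows:

import numpy as np

def area_fraction_measures(pred_field, obs_field, naaqs):
    # Differences in the fraction of the modeled area (i) exceeding the NAAQS
    # and (ii) within 10 percent of the peak, for gridded fields of equal shape.
    def fractions(field):
        field = np.asarray(field, dtype=float)
        exceed = (field > naaqs).mean()
        near_peak = (field >= 0.9 * field.max()).mean()
        return exceed, near_peak
    p_ex, p_pk = fractions(pred_field)
    o_ex, o_pk = fractions(obs_field)
    return {"exceedance fraction difference": p_ex - o_ex,
            "near-peak fraction difference": p_pk - o_pk}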
A third measure is of interest. At each measurement station a set of
concentration readings is recorded. It is interesting to compute from
the predicted concentration field the nearest distance at which there occurs
a value equal to the observed value, as well as the azimuthal direction
from the station to the nearest such point. This direction lies along the
concentration gradient of the predicted field. The magnitude of the distance
is a measure of the spatial offset between the predicted and observed concen-
tration fields in the vicinity of the monitoring station. The direction is
a measure of the orientation of the offset.
C-28
-------
TABLE C-5. SOME AREA PERFORMANCE MEASURES
Type: Scalar
  a. Difference in the fraction of the area of interest in which the NAAQS
     are exceeded.
  b. Nearest distance at which the observed concentration is predicted.
  c. Difference in the fraction of the area of interest in which concentrations
     are within 10 percent of the peak value.

Type: Statistical
  a. At the time of the peak, differences in the fraction of the area
     experiencing greater than a certain concentration; differences in the
     following are of interest:
       1) Cumulative distribution function
       2) Density function
       3) Expected value of concentration
       4) Standard deviation of density function
  b. For the entire residual field, the following statistics are of interest:
       1) Average deviation
       2) Average absolute deviation
       3) Average relative absolute deviation
       4) Standard deviation
       5) Correlation coefficient
       6) Estimate of bias as a function of concentration
       7) Comparison of the probabilities of concentration exceedances as a
          function of concentration
  Scatter plots of prediction-observation concentration pairs with a line of
  best fit determined in a least squares sense.

Type: Pattern recognition
  Isopleth plots showing lines of constant pollutant concentration for each
  hour during the modeling day.
  Time history of the size of the area in which concentrations exceed a
  certain value.
  Isopleth plots showing lines of constant residual values for each hour
  during the day ("subtract" predicted and observed isopleths).
  Isopleth plots showing lines of constant residuals normalized to selected
  forcing variables (inversion height, for instance).
  Peak-to-overall performance indicator, computed by taking the ratio of the
  mean residual in the area of the peak (e.g., where concentrations are within
  10 percent of the peak) to the mean residual in the overall region.
C-29
-------
b. Statistical Area Performance Measures
A number of statistical area performance measures are of use. They are
generally computed either at a fixed time or at the time of a fixed event
(the peak, for instance). Before they can be computed, however, both the
predicted and observed concentration fields must be transformed into a
compatible, discrete form. The scales of resolution must be made the same,
though kept as fine as possible. For example, if a grid model provided
average concentrations every two kilometers in a lattice-work pattern
spanning the region of interest, then the observed concentration field
inferred from station measurements must also be resolved at two-kilometer
intervals with concentrations obtained at each point in the lattice-work.
If resolution cannot be obtained so finely, then the predicted concentration
field must be adjusted to be comparable with the observed one. The field
having the coarsest resolution is the limiting one.
Once the fields have been resolved into a compatible form, several
performance measures can be computed. We can characterize a concentration
field by indicating for each concentration value the fraction of the area
experiencing a concentration greater than that value. By so doing, we define
a cumulative distribution function (CDF) such as that shown in Figure C-12.
The CDF is the integral of its density function (f), also shown in the figure.
[Figure legend: cumulative distribution function (CDF) and density function
(f), predicted and observed, plotted against concentration.]
FIGURE C-12.
DISTRIBUTION OF AREA FRACTION EXPOSED TO GREATER
THAN A GIVEN CONCENTRATION VALUE
C-30
-------
For the predicted and observed concentration fields, the CDF's may
differ. The following statistics can be compared in order to characterize
the difference: the CDF itself, the mean expected concentration in the
modeled region, and the standard deviation of the area density function.
If the CDF and f were continuous functions, the following expressions would
give the form of these measures:
> Cumulative Distribution Function

  \mathrm{CDF}(C^m \ge \kappa) = \int_{\kappa}^{C_P} f(c)\,dc    (C-16)

> Expected Concentration

  \mu_A = \int_{C_B}^{C_P} c\,f(c)\,dc    (C-17)

where C_P is the peak and C_B is the background concentration.

> Standard Deviation

  \sigma_A^2 = \int_{C_B}^{C_P} \left( c - \mu_A \right)^2 f(c)\,dc    (C-18)
However, the CDF and f are not available in practice as continuous
functions: They are expressed discretely, derived from concentra-
tions at the nodal points of a ground-level grid having dimensions
I by J. The above measures have the following discrete form:
> Discrete Cumulative Distribution Function

  \mathrm{CDF}(C^m \ge \kappa) = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} u\!\left( C_{ij}^{m} - \kappa \right)    (C-19)

where m is the pollutant species and u is a unit step function whose value is

  u(x) = 1 for x > 0, and u(x) = 0 for x < 0    (C-20)

> Discrete Expected Concentration

  \mu_A = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} C_{ij}^{m}    (C-21)

> Discrete Standard Deviation

  \sigma_A^2 = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} \left( C_{ij}^{m} - \mu_A \right)^2    (C-22)
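A short sketch (Python; illustrative only, with hypothetical names) of
evaluating the discrete measures in Eqs. (C-19), (C-21), and (C-22) for a
gridded field is given below:

import numpy as np

def area_distribution_statistics(field, levels):
    # Discrete CDF, expected concentration, and standard deviation for a
    # gridded ground-level concentration field (Eqs. C-19, C-21, C-22).
    field = np.asarray(field, dtype=float)
    cdf = {level: (field > level).mean() for level in levels}  # area fraction above each level
    mu_a = field.mean()                                        # Eq. (C-21)
    sigma_a = np.sqrt(((field - mu_a) ** 2).mean())            # Eq. (C-22)
    return cdf, mu_a, sigma_a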
The predicted and observed concentration fields can be differenced, with
the result being a spatially distributed residual field at the fixed time or
event of interest. The statistics of this residual field are essentially the
same as those described earlier in Eqs. (C-1) to (C-10) for the set of station
residuals. They are as follows (the tilde ~ denotes "predicted," while m is
the pollutant species and I, J are the number of nodes in the concentration
field grid):
> Average Deviation

  \bar{d}_m = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} \left( \tilde{C}_{ij}^{m} - C_{ij}^{m} \right)    (C-23)

> Average Absolute Deviation

  \overline{|d|}_m = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} \left| \tilde{C}_{ij}^{m} - C_{ij}^{m} \right|    (C-24)

> Average Relative Absolute Deviation

  \overline{|d_r|}_m = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{\left| \tilde{C}_{ij}^{m} - C_{ij}^{m} \right|}{C_{ij}^{m}}    (C-25)

> Standard Deviation

  \sigma_m = \left\{ \frac{1}{IJ-1} \sum_{j=1}^{J} \sum_{i=1}^{I} \left[ \left( \tilde{C}_{ij}^{m} - C_{ij}^{m} \right) - \bar{d}_m \right]^2 \right\}^{1/2}    (C-26)

> Correlation Coefficient

  r_m = \frac{\frac{1}{IJ-1} \sum_{j=1}^{J} \sum_{i=1}^{I} \left( \tilde{C}_{ij}^{m} - \bar{\tilde{C}}^m \right) \left( C_{ij}^{m} - \bar{C}^m \right)}{\sigma_{\tilde{C}}^m \, \sigma_C^m}    (C-27)
Calculation of the above statistics can be extended through the model-
ing day by including residual values not just at a specific time or event
but for each hour during the day. Also, a graphical representation of the
correlation between prediction and observation can be developed by plotting
prediction-observation concentration pairs on a scatter plot, much as was
done for station values in Figure C-6.
C-33
-------
c. "Pattern Recognition" Area Performance Measures
Considerable information about model performance often can be found
through the use of "pattern recognition" area performance measures. Even
if a comparison between prediction and observation is difficult due to the
sparsity of the latter data, insight can still be gained through the use
of the measures described here.
The spatial and temporal development of the pollutant cloud is of con-
siderable interest. Frequently, differences between prediction and obser-
vation can be spotted quickly by comparing isopleth plots showing contours
of constant pollutant concentrations. The development of the cloud can be
portrayed graphically in a series of hourly isopleth plots. Shown in
Figures C-13(a) through (e) is a series of hourly isopleth plots. These
represent predictions for ozone generated by the SAI Urban Airshed Model
for the Denver Metropolitan region on 29 July 1975. The locations of the
measurement stations are also shown, as they were in Figure C-8.
The example illustrated in Figure C-13 is typical of applications involv-
ing multiple-source, region-oriented issues (SIP/C, AQMP). However, for
specific-source issues, the downwind isopleth contours are approximately
elliptical. An example of a specific-source isopleth, or "footprint,"
plot was presented earlier in Figure V-4 in Chapter V.
Model performance can also be characterized by comparing against
observation the time histories of the size of the area in which concentrations
exceed a certain value. Such a comparison would provide insight into the
temporal variation of prediction-observation differences. An example of
such a history is presented in Figure C-14 for ozone in the Denver Metro-
politan region. A meteorology the same as that observed on 28 July 1976
was employed by the SAI Urban Airshed Model, along with emissions for that
date and projected emissions for 1985 and 2000, to predict the spatial and
temporal distribution of ozone for each year. Lines of constant concentra-
tion values are also shown.
C-34
-------
NORTH
SOUTH
FIGURE C-13.
(a) Hour 0800-0900 MST
ISOPLETHS OF OZONE CONCENTRATIONS (pphm) ON 29 JULY 1975.
Isopleth interval 1 pphm. This figure is based on pre-
dictions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
C-35
-------
NORTH
(b) Hour 1000-1100 MST
FIGURE C-13 (Continued)
C-36
-------
NORTH
SOUTH
(c) Hour 1200-1300 MST
FIGURE C-13 (Continued)
C-37
-------
SOUTH
(d) Hour 1400-1500 MST
FIGURE C-13 (Continued)
C-38
-------
NORTH
SOUTH
(e) Hour 1600-1700 MST
FIGURE C-13 (Concluded)
C-39
-------
Year 1976 Emissions
Year 1985 Emissions
Year 2000 Emissions
Time of Day By Hourly Interval
Meteorology for 28 July 1976 Assumed
FIGURE C-14. SIZE OF AREA IN WHICH PREDICTED OZONE CONCENTRATIONS EXCEED GIVEN VALUES FOR YEARS 1976, 1985,
AND 2000. This figure is based on predictions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
-------
If both the predicted and observed concentration fields are resolved
compatibly to the same scale, the two can be differenced and the residuals
plotted directly as isopleth contour plots. This may be done either at a
fixed time/event or hourly. The example shown in Figure C-15 is typical
of such a plot, although it was not derived from observational data. This
particular figure was calculated by differencing the annual NO2 concentra-
tions predicted by the EPA's Climatological Dispersion Model (CDM) for two
emissions regions: one a base case and the other a 17.5 percent reduction
in emissions in downtown Denver. Since the magnitude of the residuals may
be strongly a function of certain atmospheric forcing variables (wind
speed or inversion height, for instance), it can be helpful to normalize
residuals to the forcing variable values.
Several model performance problems can be spotted qualitatively using
residual isopleth plots. Some of those that might be apparent are:
> Good peak/poor spatial agreement.
> Bad peak/good spatial agreement.
> Different peak location.
A composite measure can also be useful in assessing the relative peak/
spatial performance of a model. The peak-to-overall indicator can be calculated
at the time of the peak as the ratio of the mean residual in the vicinity of the
peak (where concentrations are within 10 percent of the peak, for example) to
the mean residual in the overall region.
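A minimal sketch of the peak-to-overall indicator (Python; hypothetical names,
and the use of absolute residuals is an assumption) is:

import numpy as np

def peak_to_overall_indicator(pred_field, obs_field):
    # Ratio of the mean absolute residual where observed concentrations are
    # within 10 percent of the observed peak to the mean absolute residual
    # over the whole region.
    pred_field, obs_field = np.asarray(pred_field), np.asarray(obs_field)
    resid = np.abs(pred_field - obs_field)
    near_peak = obs_field >= 0.9 * obs_field.max()
    return resid[near_peak].mean() / resid.mean()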
4. EXPOSURE/DOSAGE PERFORMANCE MEASURES
The health effects experienced by an individual in a polluted region
seem to be a function of both the concentration level and the duration of
exposure. The aggregate impact experienced by the total populace would be
expressed by the sum of the effects impacting each individual. The serious-
ness of the pollutant problem would be related not just to the spatial and
temporal development of the pollutant alone but also to the spatial and temporal
distribution of the population living beneath it. Several performance measures
attempt to gauge model performance on this basis.
C-41
-------
NORTH
SOUTH
FIGURE C-15.
TYPICAL RESIDUALS ISOPLETH PLOT FOR ANNUAL AVERAGE NO2.
Units are in
C-42
-------
In this section we present some of these performance measures, acknow-
ledging at the outset the difficulty of their computation in practice. Whether
the spatial scale is urban/regional or source-specific, the problem is essen-
tially the same. Not only must the predicted and observed concentration field
be known, but also the population distribution. All are temporally and spatially
varying. Conceivably, the observed concentration field may be estimable from
station measurements. Recording actual population movements during the modeling
day, however, seems a nearly insurmountable task. In reconciling these problems,
several options seem available; among these are the following two:
> If the observed concentration field can be estimated
acceptably well, both it and the predicted field can
be used with the predicted population distribution to
compute exposure dosage measures for comparisons. Such a
predicted distribution is frequently available when multiple-
source, region-oriented issues are being considered. To
characterize diurnal variations in emissions, particularly
mobile automotive ones, one must estimate the diurnal
patterns of population movement. Having done so, one can
infer the hourly spatial distribution of population. How-
ever, for specific-source issues, population distribution
is seldom considered. Since only the emissions from the
individual source are of interest, those of the same species
resulting from nearby population-related activities need not
be explicitly considered, except to compute a background con-
centration over which the specific-source emissions are super-
imposed. Unless additional information can be gathered
(from a traffic planning agency perhaps), population distri-
bution may not be available, even as a prediction.
> If the observed concentration field is not known acceptably
well, computation of the observed exposure/dosage measures
cannot be accomplished. However, these quantities often can be
C-43
-------
calculated for model predictions (presuming a predicted
population distribution history is available). Even though
these cannot be compared against their observed values,
they can help characterize model predictions. A model
sensitivity analysis can be conducted to estimate the effect
of population distribution on exposure/dosage calculations.
If the calculations prove sensitive, the gathering of additional observational data
might be warranted, as would an expanded effort in predicting
population movement.
The exposure/dosage performance measures considered here fall into
three types: scalar, statistical, and "pattern recognition." We
present in Table C-6 some specific measures.
a. Scalar Exposure/Dosage Performance Measures
Several performance measures are defined in terms of concentration
exposure and dosage. The exposure is defined to be the product of the
number of persons experiencing a concentration in excess of a certain value
and the time duration over which the value is exceeded. It is expressed
analytically as follows:
  E^m(x,y,\eta) = \int_{t_1}^{t_2} P(x,y,t)\, u\!\left[ C^m(x,y,t) - \eta \right] dt    (C-28)

where E^m(x,y,η) is the exposure at a point (x,y) to a concentration C^m(x,y,t)
of species m in excess of a given level, η (the NAAQS, for example);
P(x,y,t) is the population level at (x,y) at time t; u is the unit step
function such that

  u(z) = 1 for z > 0, and u(z) = 0 for z < 0
C-44
-------
TABLE C-6. SOME EXPOSURE/DOSAGE PERFORMANCE MEASURES
Type: Scalar
  a. Difference for the modeling day in the number of person-hours of
     exposure to concentrations:
       1) Greater than the NAAQS
       2) Within 10 percent of the peak.
  b. Difference for the modeling day in the total pollutant dosage.

Type: Statistical
  a. Differences in the exposure/concentration frequency distribution
     function; differences in the following are of interest:
       1) Cumulative distribution function
       2) Density function
       3) Expected value of concentration
       4) Standard deviation of density function
  b. Cumulative dosage distribution function as a function of time during
     the modeled day.

Type: Pattern recognition
  For each hour during the modeled day, an isopleth plot of the following
  (both for predictions and observations):
    1) Dosage
    2) Exposure
C-45
-------
and Δt = t_2 - t_1 is the duration of exposure. The total exposure between
t_1 and t_2 over a region measuring X by Y can be written as

  E_T^m(\eta) = \int_0^Y \int_0^X E^m(x,y,\eta)\, dx\, dy    (C-30)

Since in practice the predicted and observed concentration fields are
known only at discrete points on a ground-level grid, it follows that the
population function P(x,y,t) must be resolved into a compatible, discrete
form. Once this is done, the discrete forms of Eqs. (C-28) and (C-30) can
be written as follows:

  E_{ij}^m(\eta) = \sum_{n=N_1}^{N_2} P_{ij}^n\, u\!\left[ C_{ij}^{m,n} - \eta \right]    (C-31)

  E_T^m(\eta) = \sum_{j=1}^{J} \sum_{i=1}^{I} E_{ij}^m(\eta)    (C-32)

where I and J are the X and Y dimensions of the grid while N_1 and N_2 are
the starting and ending hours of the summation.
Dosage is defined as the product of the population at a given point,
the pollutant concentration to which that population is exposed, and the
length of time for which the exposure to that concentration persists. The
dosage provides a measure of the total amount of pollutant present in the
total volume of air inhaled by people over the time period of interest. This
may be illustrated as follows. Let the dosage, D, be in units of ppm-person-
hour. If the volume of air inhaled is V cubic meters per person-hour, the
quantity of pollutant, Q, present in the air may be estimated as
  Q = D V \times 10^{-6} \ \text{cubic meters}    (C-33)

If V is assumed to be a constant, then Q is proportional to D and the dosage
D provides a measure of Q. It may be noted that the dosage provides no
C-46
-------
information as to the amount of pollutant inhaled per person. The dosage
at a point (x,y) may be expressed as

  D^m(x,y) = \int_{t_1}^{t_2} P(x,y,t)\, C^m(x,y,t)\, dt    (C-34)

while the total dosage within an area X by Y is

  D_T^m = \int_0^Y \int_0^X D^m(x,y)\, dx\, dy    (C-35)

Expressed in discrete terms, these two equations can be written as

  D_{ij}^m = \sum_{n=N_1}^{N_2} P_{ij}^n\, C_{ij}^{m,n}    (C-36)

  D_T^m = \sum_{j=1}^{J} \sum_{i=1}^{I} D_{ij}^m    (C-37)
Using Eqs. (C-31) and (C-32) we can calculate two measures of interest:
We can determine for the predicted and observed concentrations the number
of person-hours of exposure to concentrations (1) greater than the NAAQS
and (2) near the peak (within 10 percent, for example). Using Eqs. (C-36)
and (C-37), we can determine for the modeling day the total predicted and
observed pollutant dosage. By comparison of the predicted and observed
values, the seriousness of any differences between the two can be estimated
in a way that relates, though crudely, to pollutant health impact.
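As an illustrative sketch (Python; not part of the report), the discrete
exposure and dosage of Eqs. (C-31), (C-32), (C-36), and (C-37) might be
computed as follows, assuming one-hour time steps so that each exceedance
hour contributes one person-hour:

import numpy as np

def exposure_and_dosage(conc, population, threshold):
    # conc and population are arrays shaped (hours, J, I); one-hour time
    # steps are assumed.
    conc, population = np.asarray(conc), np.asarray(population)
    exposure_cells = (population * (conc > threshold)).sum(axis=0)  # person-hours per cell (C-31)
    total_exposure = exposure_cells.sum()                           # Eq. (C-32)
    dosage_cells = (population * conc).sum(axis=0)                  # concentration-person-hours per cell (C-36)
    total_dosage = dosage_cells.sum()                               # Eq. (C-37)
    return total_exposure, total_dosage

# The same call applied to the predicted field and to the inferred observed
# field gives the scalar differences listed in Table C-6.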
b. Statistical Exposure/Dosage Performance Measures
Exposure/dosage performance measures have several useful statistical
variants. One of these is the difference between the predicted and observed
exposure/concentration distribution function. An example of such a function
is shown in Figure C-16, calculated for ozone in the Denver Metropolitan
C-47
-------
[Plot residue removed. x-axis: Ozone Concentration (pphm).]
FIGURE C-16.
ESTIMATED EXPOSURE TO OZONE AS A FUNCTION OF OZONE
CONCENTRATION FOR 3 AUGUST 1976 METEOROLOGY. This
figure is based on predictions of the SAI Urban
Airshed Model for the Denver Metropolitan region.
C-48
-------
region. The figure is based on predictions made by the SAI Urban Airshed
Model using actual emissions and meteorology for 3 August 1976, as well
as projected emissions for 1985 and 2000.
Certain statistics of the exposure distribution are useful: the
cumulative distribution function (CDF) itself, the density function (fE),
the expected value of the pollutant concentration, and the standard devia-
tion of the density function. We show in Figure C-17 a representation of
the general shapes taken by the CDF_E and the f_E.
FIGURE C-17.
GENERAL SHAPE OF THE EXPOSURE CUMULATIVE
DISTRIBUTION AND DENSITY FUNCTIONS
Incorporated in this figure are two important assumptions: None of the
population is exposed to concentrations above the peak value, Cp, while
all are exposed to concentrations at least as high as the background value,
C_B.
The first of these is certainly a valid assumption. The second may
not be accurate in all circumstances. Those persons spending their days
C-49
-------
indoors within environmentally controlled buildings may experience lesser
concentrations than the background value. Noting this possible limitation,
however, we proceed.
The CDF_E can be derived from the exposure function defined in Eq. (C-30)
and illustrated with the example in Figure C-16. It can be expressed as

  \mathrm{CDF}_E(C) = 1 - \frac{E_T^m(C)}{E_T}    (C-38)

The density function, f_E, can be derived from this relation as follows:

  f_E(C) = \frac{d}{dC}\left[ \mathrm{CDF}_E(C) \right]    (C-39)

Combining Eqs. (C-28) and (C-30), we can write

  E_T^m(C) = \int_Y \int_X \int_{\Delta t} P(x,y,t)\, u\!\left[ C^m(x,y,t) - C \right] dt\, dx\, dy    (C-40)

From this, we can express its derivative as

  \frac{d}{dC}\left[ E_T^m(C) \right] = -\int_Y \int_X \int_{\Delta t} P(x,y,t)\, \delta\!\left[ C^m(x,y,t) - C \right] dt\, dx\, dy    (C-41)
C-50
-------
where δ is the Dirac delta function, which vanishes for all z ≠ 0 and whose
integral over z is unity. The density function can thus be written as

  f_E(C) = \frac{1}{E_T} \int_Y \int_X \int_{\Delta t} P(x,y,t)\, \delta\!\left[ C^m(x,y,t) - C \right] dt\, dx\, dy    (C-42)
The expected value, μ_E, and the standard deviation, σ_E, are defined as follows:

  \mu_E = \int_{C_B}^{C_P} C\, f_E(C)\, dC    (C-43)

  \sigma_E^2 = \int_{C_B}^{C_P} \left( C - \mu_E \right)^2 f_E(C)\, dC    (C-44)
-------
This function has the form shown in Figure C-18.
[Figure residue removed: a pulse of width ΔC on the concentration axis.]
FIGURE C-18. SHAPE OF δ̂(C), THE APPROXIMATION TO
THE DELTA FUNCTION
Using Eq. (C-45), the discrete form of the density function can be written
in the following form:

  f_E(C_k) = \frac{1}{E_T} \sum_{n=N_1}^{N_2} \sum_{j=1}^{J} \sum_{i=1}^{I} P_{ij}^n\, \hat{\delta}\!\left[ C_{ij}^{m,n} - C_k \right]    (C-46)

The expected value and standard deviation then can be expressed as

  \mu_E = \sum_{k=1}^{K} C_k\, f_E(C_k)\, \Delta C    (C-47)

  \sigma_E^2 = \sum_{k=1}^{K} \left( C_k - \mu_E \right)^2 f_E(C_k)\, \Delta C    (C-48)

where K is the number of equally spaced intervals, ΔC, spanning the concen-
tration range from C_B to C_P.
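As a sketch only (Python; hypothetical names, and the binning scheme is an
assumption), the discrete exposure density and its moments might be estimated
by accumulating person-hours in equally spaced concentration intervals:

import numpy as np

def exposure_density(conc, population, c_background, c_peak, n_bins=20):
    # Person-hours are accumulated in the bin containing each cell's hourly
    # concentration, then normalized so the density integrates to one over
    # the interval [C_B, C_P].
    conc = np.asarray(conc, dtype=float).ravel()
    population = np.asarray(population, dtype=float).ravel()
    edges = np.linspace(c_background, c_peak, n_bins + 1)
    person_hours, _ = np.histogram(conc, bins=edges, weights=population)
    dc = edges[1] - edges[0]
    f_e = person_hours / (person_hours.sum() * dc)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu_e = (centers * f_e * dc).sum()                               # like Eq. (C-47)
    sigma_e = np.sqrt((((centers - mu_e) ** 2) * f_e * dc).sum())   # like Eq. (C-48)
    return centers, f_e, mu_e, sigma_e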
C-52
-------
The quantities described above (the CDF_E, f_E, μ_E, and σ_E) form the
basis for a comparison between prediction and observation. Differences
in the shape of the CDF_E can be characterized by differences in μ_E and
σ_E², as well as being revealed by differences in the qualitative shapes
of the f_E. If these differences are large, model performance may be
judged unacceptable.
The variation of the cumulative dosage function during the modeling
day is another means for comparing prediction with observation. An example
of such a dosage function is shown in Figure C-19, calculated for ozone in
the Denver Metropolitan region. The figure is based on predictions made
by the SAI Urban Airshed Model.
c. "Pattern Recognition" Exposure/Dosage Performance Measures
The performance of a model in predicting exposure and dosage can be
judged qualitatively by comparing isopleth plots of predicted values with a
similar plot showing observed ones. We present in Figures C-20 and C-21 the
ozone exposure and dosage contours, respectively, predicted by the SAI Urban
Airshed Model for Denver on 3 August 1976. The population distribution
assumed in each was based on data supplied by the Denver Regional Council
of Governments. Residential population figures were corrected temporally
to account for daytime employment patterns. No attempt was made, however,
to adjust for other shifts during the day.
In Figure C-20, the cumulative exposure at one-mile intervals is shown.
Isopleths of exposure to concentrations greater than a certain value are
included for three different levels. In Figure C-21, the cumulative dosages
are shown for each point on the same one-mile spaced grid. In both figures,
the interval of time considered was 13 hours, from 500 to 1800 (MST).
5. "HYBRID" PERFORMANCE MEASURES
As noted earlier, model predictions often are more finely resolved
spatially than measurement data. A consequence of this is the following:
C-53
-------
[Plot residue removed. x-axis: Time of Day (MST).]
FIGURE C-19.
CUMULATIVE OZONE DOSAGE AS A FUNCTION OF TIME OF DAY
FOR 3 AUGUST 1976 METEOROLOGY. This figure is based
on the predictions of the SAI Urban Airshed Model for
the Denver Metropolitan region.
C-54
-------
(a) Concentration Greater than 8 pphm; Year 1976 Emissions
FIGURE C-20. CUMULATIVE EXPOSURE (IN 10³ PERSON-HOURS) TO OZONE CONCENTRATIONS ABOVE GIVEN
LEVEL IN ONE-SQUARE-MILE GRID CELLS BETWEEN 500 AND 1800 HOURS FOR 3 AUGUST
1976 METEOROLOGY AND 1976 EMISSIONS. Grid numbers are listed on left side and
top of figure. This plot is based on predictions of the SAI Urban Airshed
Model for the Denver Metropolitan region.
-------
(b) Concentration Greater than 16 pphm; Year 1976 Emissions
FIGURE C-20 (Continued)
-------
(c) Concentration Greater than 24 pphm; Year 1976 Emissions
FIGURE C-20 (Concluded)
-------
FIGURE C-21.
CUMULATIVE OZONE DOSAGES (IN 10⁶ PPHM-PERSON-HOURS) IN ONE-SQUARE-MILE GRID
CELLS FROM 500 TO 1800 HOURS (MST) for 3 AUGUST 1976 METEOROLOGY AND EMISSIONS
IN 1976. This figure is based on predictions of the SAI Urban Airshed Model
for the Denver Metropolitan region.
-------
model performance sometimes must be evaluated using performance measures
requiring different classes of data "completeness." For instance, the
observed concentration field may not be inferred reliably from station
data even though the predicted field can be well described. In such a
case, concentration isopleth plots for both could not be constructed and
compared directly. Still, we would not wish to rely solely on station
performance measures. To do so, we would sacrifice some of the information
content available on the prediction side of the comparison.
Several performance measures are "hybrid" ones. They are designed
for use when a different level of concentration information is available
for prediction than for observation. We discuss here such a measure, the
basis for which is shown in Figure C-22.
MEASUREMENT
STATION
PREDICTED CONCENTRATION FIELD
NEAREST POINT AT WHICH
PREDICTION EQUALS STATION
OBSERVATION
ACTUAL CONCENTRATION FIELD
FIGURE C-22. ORIENTATION WITH RESPECT TO MEASUREMENT STATION OF NEAREST
POINT AT WHICH PREDICTION EQUALS STATION OBSERVATION
C-59
-------
In the figure, isopleths are shown for the predicted and actual concentration
fields. Only at the measurement station, however, is data available describing
the actual field. The offset between the two fields nevertheless can be
characterized by determining the vector (distance, azimuthal orientation)
from the station to the nearest point at which the predicted concentration
equals the measured value. This can be done for several hours, producing
a time history of the distance and orientation of that point. A plot of
this can be constructed, as shown in Figure C-23.
[Figure residue removed. The trace is plotted on north-south and east-west
axes, with the nearest matching point marked for several hours of the day
(6-7 a.m. through 5-6 p.m.).]
FIGURE C-23.
SPACE-TIME TRACE OF LOCATION OF NEAREST POINT
PREDICTING A CONCENTRATION EQUAL TO THE
STATION MEASURED VALUE
The space-time trace shown in the figure is centered at the measurement
station. Similar traces could be constructed for each station. Space-time
correlations could be made to infer the amount and orientation of the
displacement of the two concentration fields.
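A minimal sketch of this hybrid measure (Python; hypothetical names, with an
assumed relative tolerance for "equality" and an assumed grid orientation in
which row index increases northward and column index increases eastward) is:

import numpy as np

def nearest_matching_point(pred_field, station_ij, observed_value, cell_km, tol=0.05):
    # Distance and bearing from a monitoring station to the nearest grid cell
    # whose predicted concentration matches the station observation to within
    # a relative tolerance.
    pred_field = np.asarray(pred_field, dtype=float)
    match = np.abs(pred_field - observed_value) <= tol * observed_value
    if not match.any():
        return None
    jj, ii = np.nonzero(match)
    dx = (ii - station_ij[1]) * cell_km          # east-west offset, km
    dy = (jj - station_ij[0]) * cell_km          # north-south offset, km
    dist = np.hypot(dx, dy)
    k = dist.argmin()
    bearing = np.degrees(np.arctan2(dx[k], dy[k])) % 360.0  # degrees clockwise from north
    return dist[k], bearing

# Applying this for each hour of the modeling day yields the kind of
# space-time trace sketched in Figure C-23.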
C-60
-------
APPENDIX D
SEVERAL RATIONALES FOR SETTING
MODEL PERFORMANCE STANDARDS
D-l
-------
APPENDIX D
SEVERAL RATIONALES FOR SETTING MODEL PERFORMANCE STANDARDS
In Chapter VI of this report, we identify a "preferred" set of model
performance measures, the values of which are helpful in assessing the degree
to which model predictions agree with observations. It remains for us to
decide how "close" these must be in order to judge model performance to be
acceptably good. In this appendix, we present four alternate rationales
for making such decisions: Health Effects, Control Level Uncertainty,
Guaranteed Compliance, and Pragmatic Historic. To maintain perspective
about each rationale and the problems for which their use may be appropriate,
we recommend Section D of Chapter VI be read prior to considering this appendix.
1. Health Effects Rationale
Ambient pollutant concentrations are not themselves our most funda-
mental concern but rather the adverse health effects they produce. The
NAAQS are chosen to serve as measurable, enforceable surrogates for the
"acceptable" levels of health impact they imply. Because health effects
are of such basic importance, it makes sense to define model performance
in such terms. However, quantifying the health effects resulting from
exposure to a specified pollutant level can be a difficult and controver-
sial task. Toxicological studies in laboratories by necessity are performed
at high concentrations, often at levels and dosages seldom occurring even
in the most polluted urban areas. Experiments are conducted on animals
whose response patterns may not serve as perfect analogues for human behavior.
Epidemiological studies are confounded by the variety of effects occurring
simultaneously in a complex urban environment. Consequently, isolation
of a "cause-and-effect" relationship between health effect and pollutant
level becomes statistically very difficult.
Nevertheless, in this discussion we indicate one means whereby health
effects can be used as a basis for evaluating the acceptability of model
performance. We postulate the existence of a health effects functional, Φ,
dependent both on concentration level and on the health effects experienced by all exposed
0-2
-------
persons in the polluted region. This quantity (the area-integrated cumu-
lative health effect) we use as the metric of interest. If the ratio of
the predicted value of Φ to its observed value remains within a certain
tolerance of unity, model performance is judged acceptable.
Several features of this approach have appeal. Among these are:
> The health effects functional need not be known precisely,
only its general shape.
> The use of area-integrated cumulative health effects as a
metric has strong intuitive appeal; it is less sensitive
than dosage to concentrations not near the peak value.
> A transformation of variables reduces the spatial sensitivity
of the metric, Φ, with more than one spatially distributed
region mapped into the same value of Φ; this can result
in an increase in generality of application.
> Simplifying assumptions can be invoked to allow computation
of specific numerical values.
a. Area Cumulative Health Effects as a Concept
"Total area dosage" is frequently used as a surrogate for "total area
health effects." Mathematically, total area dosage, DT can be expressed as
DT(trt2) = J J J 2P(x,y,t)C(x,y,t)dt
dx (D_i)
X Y 11
where the duration of exposure is At (=t2-t,); P(x,y,t) and C(x,y,t)"are
the population and concentration at (x,y) at time t; and X and Y represent
the spatial limits of the polluted region.
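On gridded model output, Eq. D-1 reduces to a sum over cells and hours. The short sketch
below is our own illustration of that approximation; the array shapes, units, and values
are hypothetical.

import numpy as np

def total_area_dosage(P, C, dt_hours=1.0):
    """Approximate D_T as the sum over cells and hours of P*C*dt.
    P, C: arrays of shape (n_hours, ny, nx); population in persons per cell,
    concentration in pphm. Result is in pphm-person-hours."""
    return float(np.sum(P * C) * dt_hours)

# Illustrative values: 5 hours on a 10 x 10 grid.
rng = np.random.default_rng(0)
P = rng.uniform(0, 5000, size=(5, 10, 10))      # persons per cell
C = rng.uniform(0, 20, size=(5, 10, 10))        # pphm
print(f"D_T = {total_area_dosage(P, C):.3e} pphm-person-hours")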
However, the concentration C(x,y,t) in this relation and the time
duration of exposure really combine to approximate health effects. Suppose
that a health effects function exists such that
D-3
-------
HE = HE(C, \Delta t)    (D-2)
Such a function could behave as shown in Figure D-l, with HE disappearing
only when concentrations approach zero. Alternatively, a threshold concen-
tration might exist below which specific effects are either indistinguishable
from a background level or below the threshold of perception.
FIGURE D-1. POSSIBLE HEALTH EFFECTS CURVES
We define a new metric: the area-integrated cumulative health effects
functional, Φ. It can be written as follows:

\Phi(\Delta t) = \int_X \int_Y \int_{\Delta t} P(x,y,t)\, HE[C(x,y,t),\, t - t_1]\, dt\, dy\, dx    (D-3)
If this function could be evaluated for predicted and "true" values of
P(x,y,t) and C(x,y,t), we could formulate the performance standard such
that their ratio, r, was required to remain within a fixed tolerance of
unity, i.e.,
r = \frac{\Phi_{predicted}}{\Phi_{observed}} \geq 1 - a    (D-4)

where a is some small value (10 percent, for instance)
D-4
-------
chosen to represent a maximum acceptable level of uncertainty in
aggregate health impact. It may be noted with this standard that
model acceptability is called into doubt only if the predicted value
of Φ is less than the "observed" value. This makes sense for the
following reason: Considering only a perspective based on health
effects, we are concerned that the model predict conditions leading
to health impact at least (or nearly so) as large as actually occurs.
To bound the model on the "upper" side, another rationale must be used
(control level uncertainty, perhaps).
The expression in Eq. D-3, however, is of only academic interest
unless it can be made more tractable. Several of its key limitations
are as follows:
> It is a spatial integral. The values of P(x,y,t) and C(x,y,t)
change for each new application locale. Thus it is diffi-
cult to extend results obtained in one situation to those
expected in any new one.
> The health effects function, HE, is dependent on concen-
tration and cannot be expressed directly without being
"mapped" through the concentration field.
However, through a transformation of variables, some difficulties
can be overcome. We will replace the double spatial integration in Eq. D-3
by a single concentration integration taken over the range
of ambient values (background, C_B, to the current peak, C_P). The total
population within the modeling region at time t, P_T(t), can be written as

\int_X \int_Y P(x,y,t)\, dy\, dx = P_T(t) = \int_{C_B}^{C_P(t)} w(C,t)\, dC    (D-5)

where w(C,t) is the population exposed to a concentration C at time t.
(By definition, no one is exposed to concentrations lower than the
D-5
-------
background value, C_B.) A pictorial representation of the population
function P(x,y,t) and w(C,t) is shown in Figure D-2.
[Figure D-2 shows two adjacent isopleths, C and C + dC, within the modeling region;
P(x,y,t) is the population at a point, and w(C,t)dC is the population within the
band between the two isopleths.]
FIGURE D-2. REPRESENTATION OF SPATIAL AND CONCENTRATION
DEPENDENT POPULATION FUNCTIONS
The equivalence expressed in Eq. D-5 holds without qualification
providing the modeling region is chosen large enough to contain the
background (C_B) isopleth for every hour during the day. However, this
requirement can be relaxed under the following condition: No or very
few persons live or work in the area outside the modeling region but
within the C_B isopleth. In such a case the modeling region need only
be large enough to enclose within it the population of interest.
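In practice, the transformation in Eq. D-5 amounts to binning population by the
concentration it experiences. The sketch below is our own construction (the bin width,
grid, and field values are illustrative assumptions); it tabulates w(C) dC from gridded
population and concentration fields in the manner of Figure D-3.

import numpy as np

def population_by_concentration(pop, conc, c_background, c_peak, dc=0.02):
    """Return bin edges and w(C)*dC: the population exposed within each
    concentration interval [C, C+dC) between background and peak."""
    edges = np.arange(c_background, c_peak + dc, dc)
    exposed, _ = np.histogram(conc.ravel(), bins=edges, weights=pop.ravel())
    return edges, exposed

# Synthetic example: a peak-centered concentration field over a uniform population.
y, x = np.mgrid[0:40, 0:40]
conc = 0.24 - 0.015 * np.hypot(x - 20, y - 20)        # ppm, falls off radially
conc = np.clip(conc, 0.04, None)                      # background floor
pop = np.full(conc.shape, 2405.0)                     # persons per cell (assumed)
edges, w_dC = population_by_concentration(pop, conc, 0.04, 0.24, dc=0.02)
for lo, hi, n in zip(edges[:-1], edges[1:], w_dC):
    print(f"{lo:.2f}-{hi:.2f} ppm: {n:,.0f} persons")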
An important observation can now be made: The health effects func-
tion, HE, can be introduced into both sides of Eq. D-5 without disturb-
ing the equality. Doing so and integrating with respect to time, the
area integrated cumulative health effects (CHE) functional can be trans-
formed into
D-6
-------
\Phi(\Delta t) = \int_{\Delta t} \int_{C_B}^{C_P(t)} w(C,t)\, HE[C,\, t - t_1]\, dC\, dt    (D-6)
It is this equation with which we deal in the remainder of this section.
b. Components of the Cumulative Health Effects Functional
We now examine each of the two major components of the CHE functional:
the population distribution and health effects function. For Eq. D-6
to be of any use to us, it must be made analytic in a way that has a degree
of generality from one application locale to another. Consequently, we
are guided by three principal objectives: Both w(C,t) and HE(C,Δt) must
be analytic, integrable, and based upon simple, easily understood assump-
tions. To accomplish this, important simplifications are invoked. The
degree to which they limit the generality of the results is discussed,
although additional research beyond the scope of this study seems desirable.
Population Distribution Function
The function w(C,t) represents the distribution of population with
respect to both concentration level and time of day. As a first approx-
imation, we assume it is separable, i.e.,
w(C,t) = w(C)\, f_w(t)    (D-7)

where w(C) is the distribution of daytime (workday) population with
respect to concentration level alone at a particular fixed time (the time
of the concentration peak, for example), and f_w(t) is a weighting function
chosen to reflect the diurnal variation in that distribution (residential
vs. commute vs. work hours).
D-7
-------
Within a pollutant cloud, concentrations tend to be distributed as
follows: A distinct peak value occurs, with concentration falling off
as a function of radial distance from that peak. Contours of constant
concentration (isopleth lines) surround the peak concentrically, with
concentration diminishing to background levels. This radial distribution
of concentration level is suggestive. If population is distributed about
the peak such that
P(C) = \int_0^{2\pi} \int_0^{r(C)} \rho(r^*,\theta)\, r^*\, dr^*\, d\theta    (D-8)
-------
cloud may have drifted some distance (10-30 km) from the densest
population centers. However, our approach here is highly pragmatic.
To render Eq. D-9 soluble, we must invoke simplifying assumptions.
Having done so, comparison of our results with actual data offers us a
measure of our success.
Such data has been obtained from ozone exposure/dosage studies
done for the Denver Metropolitan region using the grid-based SAI
Urban Airshed Model. Shown in Figure D-3 is the population density
function predicted on 3 August 1976 for the hour from 1300-1400 (1 to
2 p.m.)—the time of the predicted ozone peak (0.24 ppm). The concen-
tration field predicted by the model was used. A coarse population
distribution was derived based upon data supplied by the Denver
Regional Council of Governments (DRCOG) and was adjusted to approximate
employment shifts. Since the analysis supplied exposure estimates only
above 0.08 ppm which were expressed no more finely than in 0.02 ppm
increments, an uncertainty band, as shown, exists about each point.
Several key observations can be made. The value of w(C) seems to
become very small at the peak concentration, i.e., while concentration
levels may be high near the peak (within 90% of it), the area (and
population) affected is small. Also, an apparent anomaly occurs between
0.18 and 0.20 ppm. This may be due to any of several causes. Population
density non-uniformities, however, appear to be the most likely of these.
Using the data contained in Figure D-3 as a standard for comparison,
we may proceed in developing a simplified, analytic form for w(C). We
make two key assumptions in doing so. First, we assume a shape for the
radial concentration distribution, C(r), which we invert to give us r(C).
Then we make a simplifying assumption about the population density
distribution, ρ(r,θ).
To estimate C(r), we may idealize isopleth contours as a series of
concentric circles, as shown in Figure D-4. Further, we may assume
D-9
-------
[Figure D-3 plots the population distribution, w(C), against concentration
(0 to 0.30 ppm), with an uncertainty band about each point.
NOTES: 1) DATE: 3 AUGUST 1976. 2) TIME OF DAY: 1300-1400 HOURS.
3) POPULATION CORRECTED TO ACCOUNT FOR EMPLOYMENT.]
FIGURE D-3. POPULATION DISTRIBUTION AS A FUNCTION OF CONCENTRATIONS. Based on predictions
of the SAI Urban Airshed Model for the Denver metropolitan region.
-------
there to be N isopleths between the peak concentration, C_P, and the
background value, C_B.
FIGURE D-4. IDEALIZED CONCENTRATION ISOPLETHS
If we assume that for isopleths separated by a constant concentra-
tion decrement, ΔC, the interisopleth distance grows exponentially (that
is, the isopleths are separated by a steadily growing distance), then
we may write an expression for the n-th radius such that

r_n = \Delta r_1 \sum_{i=0}^{n-1} e^{bi} = \Delta r_1\, \frac{1 - e^{bn}}{1 - e^{b}}    (D-10)
D-ll
-------
Since

C_n = C_P - \frac{n\,(C_P - C_B)}{N}    (D-11)

we can solve for n, substitute this into Eq. D-10, and then generalize
to yield the following:

C(r) = C_P - \frac{\Delta C}{b} \ln\left[1 - \frac{r}{\Delta r_1}\left(1 - e^{b}\right)\right]    (D-12)

where ΔC is the interisopleth concentration decrement and b is chosen
so that r(C_B) equals the radius of the pollutant cloud (here assumed to be
the urban radius). Several typical such concentration distributions
are shown in Figure D-5.
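As a check on the form of Eq. D-12 as reconstructed above, the short sketch below
(our own, not from the report) evaluates C(r) with parameter values representative of
the Denver example (Table D-1). With those values it recovers the 0.20 ppm first
isopleth at 1 mile and the 0.04 ppm background at roughly 13 miles, consistent with
the parameter definitions; treat the numbers as illustrative only.

import math

C_P, C_B = 0.24, 0.04        # peak and background ozone, ppm
dC, b, dr1 = 0.04, 0.4, 1.0  # isopleth decrement (ppm), growth exponent, first-isopleth radius (mi)

def conc_at_radius(r):
    """C(r) = C_P - (dC/b) * ln[1 - (r/dr1)*(1 - e^b)], clipped at background."""
    c = C_P - (dC / b) * math.log(1.0 - (r / dr1) * (1.0 - math.exp(b)))
    return max(c, C_B)

for r in range(0, 15, 2):
    print(f"r = {r:2d} mi  C = {conc_at_radius(r):.3f} ppm")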
We can now invert this relation to estimate r(C). Doing so, we can
write

r(C) = \Delta r_1\, \frac{1 - e^{\,b(C_P - C)/\Delta C}}{1 - e^{b}}

Substituting this and its derivative into Eq. D-9, we get an expression
for w(C) such that

w(C) = r(C)\left|\frac{dr}{dC}\right| \int_0^{2\pi} \rho\big(r(C),\theta\big)\, d\theta    (D-13)
D-12
-------
[Figure D-5 plots concentration (ppm), from the 0.24 ppm peak down to the 0.04 ppm
background, against radius from the peak, r(C), in miles (0 to 16).
NOTES: 1) Δr_1 IS THE DISTANCE FROM THE PEAK TO THE FIRST ISOPLETH, I.E.,
C_P - ΔC (THE 0.20 ppm ISOPLETH). 2) PEAK CONCENTRATION OF OZONE IS 0.24 ppm.
3) BACKGROUND CONCENTRATION IS 0.04 ppm. 4) POLLUTANT CLOUD RADIUS IS 13 MILES.]
FIGURE D-5. TYPICAL RADIAL CONCENTRATION DISTRIBUTIONS ABOUT THE PEAK. Parameters
are chosen to be representative of the Denver metropolitan region.
-------
We now make another key simplifying assumption. We approximate the value
of the integral by assuming a uniform radial population density, i.e.,

\int_0^{2\pi} \rho(r,\theta)\, d\theta = 2\pi \bar{D}    (D-14)

Substituting this into Eq. D-13, we arrive at the final form for w(C):

w(C) = K_1 \left(1 - K_2 e^{-K_3 C}\right) e^{-K_3 C}    (D-15)

where the constants K_1, K_2, and K_3 (Eqs. D-16 through D-18) are determined by
Δr_1, b, ΔC, C_P, and D̄ (with K_3 = b/ΔC), and D̄ is chosen such that the integral
of w(C) between C_B and C_P equals the total population within the modeled area.
We have made thus far a number of significant assumptions. To test
their adequacy, we can select parameter values appropriate for the Denver
example, calculate w(C), and compare the results against the data shown
in Figure D-3. The parameter values selected are shown in Table D-l.
In Figure D-6 we show the population distribution predicted by
Eq. D-15. Several observations can be made about its agreement with the
test data.
D-14
-------
TABLE D-1. SELECTED PARAMETER VALUES IN DENVER TEST CASE

Symbol   Description                                              Value
C_P      Peak concentration (ozone).                              0.24 ppm
C_B      Background concentration.                                0.04 ppm
ΔC       Concentration decrement between isopleth lines
         (N=5 isopleths).                                         0.04 ppm
b        Exponent by which interisopleth distance grows,
         selected such that C(r) equals C_B at r=13 miles
         from the peak (at the approximate urban radius).         0.4
Δr_1     Radius from peak to the first isopleth
         (the 0.20 ppm contour).                                  1 mile
D̄        Uniform population density chosen such that the
         integral of w(C) between C_B and C_P equals the
         total population (1.275 million).                        2405 persons/sq. mi.
D-l 5
-------
[Figure D-6 plots the population distribution w(C) predicted by Eq. D-15 against
concentration; vertical scale 0 to about 250.]
-------
> Qualitatively, the shapes seem to agree.
> The analytic form of w(C) seems to underpredict the
distribution of population at higher concentration
levels.
> The anomaly occurring in the data at 0.19 ppm remains
unaccounted for in the analytic form.
Despite the seeming limitations imposed by our assumptions, however,
agreement with the test data seems surprisingly good. It remains to be
seen in further investigation (beyond the scope of this study) whether
this result is typical or merely fortuitous. We emphasize that results
obtained thus far, while encouraging, should be regarded as preliminary.
In deriving Eq. D-15, we assumed a uniform population distribution.
We can estimate qualitatively from our results the change in w(C) re-
sulting from variations in this assumption. The shifts expected in w(C)
for a nonuniform population density are illustrated in Figure D-7. In
all cases the integral of w(C) is assumed to equal the total regional
population.
[Figure D-7 sketches w(C) versus concentration (from C_B upward) for a uniform
population density, for a peak occurring in a lower density region, and for a
peak occurring in a higher density region.]
FIGURE D-7.
SHIFTS IN w(C) CAUSED BY NONUNIFORM
POPULATION DISTRIBUTIONS
D-17
-------
We now consider the variation of w(C,t) with time. Temporal
changes in the function are caused by two principal effects:
> Evolution of the Concentration Field
- The peak concentration occurring at a time t, Cp(t),
increases during the morning, usually reaches a
diurnal peak in the early afternoon, and then de-
creases slightly by late afternoon.
- The overall radius of the pollutant cloud, r(C_B),
increases up to the time of the peak.
- As the day progresses, near-peak concentrations
"spread out," that is, the percentage of the total
cloud area having concentrations near the current-
hour peak (say, within 20 percent of it) increases during
the day.
> Population Shifts
- Urban areas have two distinct patterns of popula-
tion distribution during the day: residential
(non-work) and employment (workday). These are
separated by two peak-traffic commute periods.
- A percentage of the population during the day is
mobile, traveling from one point to another.
We have assumed here that the total impact of these effects can be
approximated by a separable weighting function, fw(t), applied to the
function w(C). The extent to which this is valid needs to be verified
by additional investigation. Yet, as a first approximation it has
some plausibility, and it allows us to proceed to an analytic result
for model performance standards—our principal objective.
Health Effects Function
Health effects resulting from exposure to polluted air manifest
themselves in many ways, each varying in the symptom it produces and
D-18
-------
the seriousness of its impact. Among such effects are the following:
bronchial irritation, reduced lung function, enzyme damage, eye irri-
tation, dizziness, and coughing. Some of these manifest themselves as
noticeable but low-level discomfort; others produce more serious impact
such as aggravation of respiratory illness. Equating each effect on an
absolute scale and relating their aggregate weighted impact directly to
ambient pollutant levels, however, is a formidable task. Efforts at doing
so have been subject to uncertainty and controversy. To overcome these
difficulties, we resort to several conceptual simplifications. Rather
than differentiating between individual health effects, we collapse them
together into a single function, whose "seriousness" is dependent on
concentration level, C, and duration of exposure, At. We represent
this by the following:

HE = HE(C, \Delta t)    (D-19)
We now make an intuitive appeal. While we may not know the value
of HE in an absolute sense, we observe that its value increases, that
is, the HE gets "worse," as concentration levels rise and the duration
of exposure increases. Further, because health effects at higher con-
centrations and durations are more serious, we expect HE to grow faster
than linearly with increasing C (and probably At). We also can expect
HE to exist even at very low values of C, though these effects may be
small, perhaps below the threshold of human perception. Qualitatively,
the shape of HE might look as shown in Figure D-8.
Based on the reasons noted above, we can make a useful approximation.
We assume that HE is separable, one part dependent on C and the other on Δt, and that
it can be described by the following simple relation:

HE(C, \Delta t) = A\, C^{\gamma} f_{HE}(\Delta t)    (D-20)

where A is a scaling constant (whose value we need not know, as we shall
observe later); γ is a "shaping" parameter whose value is likely to be
D-19
-------
[Panels: (a) Variation With Concentration, C, with a possible threshold indicated;
(b) Variation With Exposure Time, Δt.]
FIGURE D-8. EXPECTED SHAPE OF HEALTH EFFECTS FUNCTION
greater than one (i.e., HE grows faster than linearly); and f_HE(Δt) is a weighting function
dependent solely on exposure time.
c. Analytic Solution of the Cumulative Health Effects Functional
Having now specified analytic forms for the population distribution
function, w(C,t), and the health effects function, HE(C,Δt), we may
proceed to evaluate the area-integrated cumulative health effects func-
tional, Φ, as it was defined in Eq. D-6. We may rewrite Φ as follows:

\Phi(\Delta t) = \left[\int_{\Delta t} f_w(t)\, f_{HE}(t - t_1)\, dt\right]\left[A \int_{C_B}^{C_P} w(C)\, C^{\gamma}\, dC\right] = F(\Delta t)\, \Phi(C_P)    (D-21)
where Cp is the peak concentration experienced during the day.
Using relations developed previously, we may evaluate Φ. Its
value is
D-20
-------
\Phi(C_P) = A \int_{C_B}^{C_P} w(C)\, C^{\gamma}\, dC = A \int_{C_B}^{C_P} K_1\left(1 - K_2 e^{-K_3 C}\right) e^{-K_3 C}\, C^{\gamma}\, dC    (D-22)

Though no completely general solution exists to this equation, the
integral may be evaluated in closed form for each integer value of γ,
the health effects function shaping parameter. A point-wise analytic
solution to Eq. D-22 thus exists.
d. Calculation of Minimum Allowable Predicted Peak
As noted in Eq. D-4, the model performance standard could be
specified in terms of a minimum allowable ratio of the "predicted"
to "measured" values of Φ:

r = \frac{\Phi(C_{pp})}{\Phi(C_{pm})} \geq 1 - a    (D-23)
-------
where C_pp is the predicted peak concentration and C_pm is the measured
peak value. By writing the standard in this form, an important simpli-
fication results: Two parameters, being constant, appear outside the
integrals in the numerator and denominator of Eq. D-23. Since their
values in both are equal, they cancel. By this means, we eliminate the
need for "knowing" the health effects function scaling coefficient, A,
and the population distribution scaling constant, K_1. With the
rationale we present here, uncertainty associated with both, while
appreciable, thus does not affect the setting of performance standards.
We can invert Eq. D-23 to solve for the minimum allowable ratio
of predicted to measured peak concentration value. We do so for the
Denver example discussed earlier, presenting the results in Figure
D-9. We show results for several representative values of γ and r.
If health effects varied linearly with concentration and r equaled
0.90, for instance, any predicted peak higher than
64 percent of the measured peak value would be acceptable. Similarly, if health effects
were a cubic function of concentration and r=0.90, the predicted peak
would have to exceed 80 percent of the measured value.
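The inversion of Eq. D-23 can also be carried out numerically. The sketch below relies
on our reconstruction of w(C) (Eq. D-15 with K_3 = b/ΔC and the Denver parameter values),
so its results are illustrative rather than authoritative; under those assumptions it
yields minimum ratios of roughly 0.64 for γ = 1 and 0.80 for γ = 3, close to the values
quoted above.

import numpy as np

C_P, C_B = 0.24, 0.04      # measured peak and background, ppm
k = 0.4 / 0.04             # assumed K_3 = b / dC, per ppm

def w(C):
    # Reconstructed (unnormalized) population distribution; scaling constants
    # cancel in the ratio of Eq. D-23.
    return (np.exp(k * (C_P - C)) - 1.0) * np.exp(k * (C_P - C))

def phi(c_peak, gamma, n=4000):
    # Numerical evaluation of A * integral of w(C) * C^gamma from C_B to c_peak
    # (trapezoid rule; the constant A cancels in the ratio).
    C = np.linspace(C_B, c_peak, n)
    f = w(C) * C**gamma
    return float(np.sum((f[1:] + f[:-1]) * np.diff(C)) / 2.0)

def min_peak_ratio(gamma, r=0.90):
    # Bisection on the predicted peak such that phi(Cpp)/phi(Cpm) = r.
    denom = phi(C_P, gamma)
    lo, hi = C_B, C_P
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if phi(mid, gamma) / denom < r:
            lo = mid
        else:
            hi = mid
    return hi / C_P

for gamma in (1, 2, 3, 4):
    print(f"gamma = {gamma}: Cpp/Cpm >= {min_peak_ratio(gamma):.2f}")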
Several decisions must be made in determining a final value for a
performance standard based upon this health effects rationale: A
minimum acceptable value must be chosen for r, the ratio of predicted
to measured area-integrated cumulative health effects; and a judgment
must be made about the maximum likely value of γ, the exponent of
concentration in the health effects function. Possible values for use
might be r and γ of 0.90 and 3 or 4, respectively. For reference, we
note that for γ = 10, the minimum allowable ratio of predicted to
measured peak is 94 percent.
e. The Health Effects Rationale: A Summary
A model performance standard based upon pollutant health effects
has intuitive appeal. For this reason the rationale presented in this
D-22
-------
[Figure D-9 plots the minimum allowable ratio of predicted to measured peak
(0 to 1.0) against the exponent of the health effects function, γ.]
FIGURE D-9. MINIMUM ALLOWABLE RATIO OF PREDICTED TO MEASURED PEAK CONCENTRATION VALUE
-------
section is of interest. Among the advantages it offers are the
following:
> It is general enough to be applied in many different
locales and applications; while parameters of the method
are application-dependent, the method itself is much less so.
> It is analytic and based upon easily derived parameter
values.
> The test for model acceptability is based upon a
simple comparison of predicted and measured peak
concentration values.
> Many of the sources of uncertainty in the method drop
out of its final formulation.
> Results can be condensed into a single figure such as
that shown in Figure D-9.
Similarly, the rationale has several limitations:
> Only a lower bound on the allowable difference between
predicted and measured peak is provided; a prediction
in excess of the measured peak (even by a great deal)
is not sufficient to reject a model on health effects
grounds since the model predicts effects at least as
great as those actually existing.
> The method does not evaluate explicitly a model's
spatial or temporal behavior.
The rationale presented here should be regarded as a preliminary
method. While meriting additional consideration, the method and many
of its assumptions need to be examined critically. Among the funda-
mental questions for which answers need to be sought are the following:
> On what basis do we select the minimum allowable ratio
of area-integrated cumulative health effects?
D-24
-------
> What value of health effects exponent is most appropriate?
> Does the population distribution, w(C), always repro-
duce the data as well as indicated in Figure D-6?
Does it need to?
> Is w(C,t) really a separable function, as assumed?
What about HE(C,At)?
> Are health effects really related to peak concentra-
tion and exposure time in the fashion assumed here?
What about those who work in environmentally controlled
buildings and may thus be isolated from full exposure
to ambient concentration levels?
We feel the rationale presented here has a number of advantages.
We also feel it requires a careful review and some additional examina-
tion, particularly as regards the questions noted above.
2. Control Level Uncertainty Rationale
In order to reduce peak ambient concentrations in an airshed from a
particular level to one at or below the NAAQS, reduction of emissions into
that airshed is required. The degree of that reduction, however, is
dependent on the amount by which the current peak level exceeds the
standard. Uncertainty in our knowledge of the current peak concentration
(due either to measurement or modeling limitations) translates into cor-
responding uncertainty in the amount of emissions control we must require.
This direct relationship, though generally a highly nonlinear one, forms
the basis for another rationale for setting model performance standards.
Its guiding principle is as follows: Uncertainty in the percentage of
emissions control required (PCR) must be kept to within certain allowable
bounds.
In this section we discuss this Control Level Uncertainty (CLU)
rationale. We first indicate for a specific pollutant (ozone) how one
may proceed from PCR bounds to equivalent allowable tolerances on the
difference between the predicted and measured peak concentration. We then
D-25
-------
present one means whereby the PCR bounds can be determined from the
economies of pollution control costs. Several benefits derive from
use of the CLU rationale, among which are the following:
> It makes explicit the relationship between model per-
formance limits and the maximum acceptable level of
uncertainty in estimates of regional emissions
control.
> It provides a structure whereby model performance limits
also can be related to equivalent uncertainty bounds
on the total regional cost of pollution control equipment.
The rationale presented here is a useful complement to the Health
Effects (HE) rationale presented earlier. We noted in discussion of that
rationale that it could not provide an upper bound on the maximum
allowable difference between predicted and observed peak concentration
levels. It merely required that the predicted peak be greater than a
fraction (near unity) of the measured peak, i.e., C_pp ≥ B·C_pm, where B is
near unity (e.g., 0.9). Were C_pp to be larger than C_pm, no health effect
penalty would be incurred by designing a control strategy based upon C_pp.
Rather, the principal penalty would be an economic one: The cost of control
would be greater than that actually required. It is in setting the upper
bound on the allowable value of C_pp - C_pm that the CLU rationale has its
greatest value, since it addresses directly the cost of control.
We can generalize this point as follows: The greatest cost of under-
prediction of the peak concentration lies in the underestimation of
health impact, while the greatest consequence of overprediction is the
extra economic cost associated with unnecessarily imposed control.
Health Effects and CLU, then, are compatible rationales. If the predicted
peak is required to satisfy K_1 ≤ C_pp - C_pm ≤ K_2, then it seems reason-
able that K_2 be selected based upon the CLU rationale, with K_1 chosen to
be the lesser of the values determined by the HE and CLU rationales.
D-26
-------
a. The Relationship Between CLU and the Concentration Peak
In most cases a highly nonlinear relationship exists between primary
emissions and the ambient concentrations that result from them. The
dynamic behavior of the atmosphere is complex, as are the chemical changes
undergone by dispersing pollutants carried by it. Simplifying assump-
tions, however, can sometimes be made. We consider here one example in
which this can be done.
For urban regions in which certain specific criteria are met (Hayes, 1977),
the ozone production resulting from various mixtures of the precursors,
nonmethane hydrocarbons (NMHC) and oxides of nitrogen (NOx), can be represented by means
of an ozone isopleth diagram such as the one shown in Figure D-10 (EPA,
1976). Whether the use of such a diagram is justified in a given region
depends heavily on a number of factors, among which are the prevailing
meteorology, solar insolation, emissions type/timing/geometry, terrain type/
complexity, and the presence of large upwind pollutant sources.
If a region meets the criteria, however, an isopleth diagram may be
used as an approximation relating regional emissions to consequent peak
ozone levels. The region-wide cutback in emissions of precursor HC and
NOx necessary to reach the NAAQS from a given starting point can then be
calculated, given a background ozone value (usually about 0.04 ppm) and
a control mix (NMHC versus NOx cutback). Usually, in urban areas the
emphasis has been on NMHC reduction. The starting point often is defined
in one of two ways: It is specified by a peak O3 measurement and either an
NMHC/NOx ratio typical of ambient conditions prevailing in the early
morning (6-9 a.m.) or specific concentrations of either of the precursors.
Most frequently, it is the first of these methods that is used.
Because the chief value of the isopleth diagram is in its use in
estimating regional emissions cutback, it is helpful to replot the
isopleth diagram as shown in Figure D-ll (Hayes, 1977). In doing so,
D-27
-------
[Figure D-10 shows ozone/oxidant isopleths (0.08 to 0.36 ppm) plotted on axes of
NMHC (0 to 2.0 ppmC) and NOx (ppm). Source: EPA (1976b).]
FIGURE D-10. PROTOTYPICAL ISOPLETH DIAGRAM
-------
[Figure D-11 plots percentage hydrocarbon control required (0 to 100 percent) against
the 6-9 a.m. NMHC/NOx ratio (up to 16), with separate curves for peak O3 values from
0.20 to 0.36 ppm. Note: No change in NOx level and no O3 background concentration
were assumed.]
FIGURE D-ll. THE ISOPLETH DIAGRAM REPLOTTED
-------
percentage control required (PCR) can be highlighted explicitly. While
in principle any mix of NMHC and NOx control could be considered, the
example shown assumes that only HC control is employed. That is, per-
centage control required (PCR) is equivalent to percentage hydrocarbon
control required (PHCR).
The PHCR diagram in Figure D-11 may be used in the following way
to deduce model performance standards. First, the measured peak ozone
concentration and the appropriate 6-9 a.m. NMHC to NOx ratio together
define a unique point on the PHCR diagram. The nominal PHCR is thus
identified. Then, by defining an allowable band about the nominal PHCR
(say ±a, where a is some small value), we can identify directly an
equivalent band about the measured peak ozone value. A model predicting
an ozone peak within that allowable band would be judged as acceptable
under this rationale.
We can illustrate the technique by means of an example. Suppose the
measured peak ozone was 0.16 ppm and the 6-9 a.m. NMHC/NOx ratio was estimated
to be 9.5. This point is denoted on the figure as A. From Figure D-11,
we see that the PHCR is about 70 percent. If we allow an uncertainty in
the PHCR of ± 10 percent, we see that the value based upon model predic-
tions of the peak must lie between 60 and 80 percent. The corresponding
values of peak ozone are determined from points C and B, respectively, on
the PHCR diagram. For a model to be judged as acceptable, it must
predict an ozone peak value, C_pp, such that 0.122 ≤ C_pp ≤ 0.24 ppm, or
76 ≤ C_pp/C_pm ≤ 150 percent.
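The band can be read off a tabulated PHCR curve by simple interpolation. In the sketch
below the tabulated points are hypothetical; they are merely chosen to be consistent with
the worked example (60, 70, and 80 percent control at 0.122, 0.16, and 0.24 ppm for a
6-9 a.m. NMHC/NOx ratio of 9.5). An actual application would digitize the appropriate
curve from Figure D-11.

import numpy as np

peak_o3 = np.array([0.10, 0.122, 0.16, 0.20, 0.24, 0.30])   # ppm (hypothetical)
phcr    = np.array([52.0, 60.0, 70.0, 75.5, 80.0, 85.0])    # percent (hypothetical)

measured_peak = 0.16
phcr_nominal = np.interp(measured_peak, peak_o3, phcr)      # about 70 percent
band = 10.0                                                 # allowable +/- PCR uncertainty

low_peak  = np.interp(phcr_nominal - band, phcr, peak_o3)   # peak at PHCR - 10
high_peak = np.interp(phcr_nominal + band, phcr, peak_o3)   # peak at PHCR + 10
print(f"nominal PHCR = {phcr_nominal:.0f}%")
print(f"acceptable predicted peak: {low_peak:.3f} to {high_peak:.3f} ppm "
      f"({100*low_peak/measured_peak:.0f} to {100*high_peak/measured_peak:.0f} percent of measured)")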
Several general observations may be made about the above results,
though we caution that they are particular to ozone as a pollutant.
Among the observations are the following:
> Because of the characteristic shape of ozone PHCR diagrams,
the upper value of the allowable tolerance band is less
restrictive than the lower one. This is illustrated clearly
in the example.
D-30
-------
> The allowable band for C_pp is always bounded on the upper
and lower side (as contrasted with the HE rationale, which
calculates only a lower bound).
> In those cities for which use of the ozone isopleth diagram shown
in Figure D-11 is appropriate and where the 6-9 a.m.
NMHC/NOx ratio is greater than about 5 or 6, the width of the
allowable band for C_pp is not strongly sensitive to the
value of NMHC/NOx.
b. The Relationship Between CLU and Control Cost
While the allowable uncertainty in control level (±a in the above
example) may be set in many ways, we examine here one important means to
do so: the explicit use of regional pollution control costs, if these can
be specified unambiguously. We might, for instance, choose as our guiding
principle the following: The uncertainty in the total cost of regional
pollution control should not be greater than a certain value δ. We may
restate this in terms of model performance. The level of control deriving
from the predicted peak, C_pp, should not differ in cost by more than a
certain amount from that level determined based upon the measured peak, C_pm.
To proceed we must define the total regional cost of pollution control,
TC. Depending on the level of control required, alternative regional
control strategies can be designed. The cost of each generally can be
specified, at least in approximate terms. By plotting the cost of a series
of "preferred" strategies against the level of control they achieve, TC
can be determined, as shown in Figure D-12.
Several aspects of the TC curve should be noted. While TC is zero
for a PCR of zero, any non-zero value of PCR has associated with it a
minimum, non-zero cost. Thus, the TC curve really "begins" with a step
function at PCR = 0. TC rises quickly at first as many fixed costs
of control are incurred. The cost then increases more slowly as fixed
costs are spread over greater values of PCR. Finally, at high levels of
PCR each additional amount of control becomes more difficult (and more
expensive) to achieve. The TC function, consequently, rises rapidly.
D-31
-------
[Figure D-12 sketches total regional control cost, TC, against percentage control
required (0 to 100 percent).]
FIGURE D-12.
TOTAL REGIONAL CONTROL COST AS A FUNCTION
OF THE LEVEL OF CONTROL REQUIRED
Once the total cost function has been defined, the allowable band for
the predicted ozone peak can be found in the following way:
> Step 1. The nominal control level, PCR_0, can be deter-
mined using a PHCR diagram such as that in Figure D-11.
With all-NMHC control as considered in deriving
that figure, PCR_0 is identical to PHCR_0.
> Step 2. The nominal control cost, TC_0, can be found using
a TC diagram similar to the one in Figure D-12.
> Step 3. The maximum and minimum allowable TC values then
can be calculated and the corresponding bounds
on PCR determined.
> Step 4. Using the PHCR diagram once again, the allowable
bounds on predicted peak ozone can be found by
employing the PCR bounds found in Step 3 (see the
sketch following this list).
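A sketch of these four steps follows, again using hypothetical, monotone tabulations of
the PHCR curve and the total-cost curve; none of the numerical values below are from the
report, and all control is assumed to be NMHC control.

import numpy as np

peak_o3 = np.array([0.10, 0.122, 0.16, 0.20, 0.24, 0.30])    # ppm (hypothetical)
phcr    = np.array([52.0, 60.0, 70.0, 75.5, 80.0, 85.0])     # percent control required
pcr_pts = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 95.0])      # percent control
tc_pts  = np.array([0.0, 150.0, 230.0, 320.0, 520.0, 900.0]) # total cost, $ million (hypothetical)

measured_peak = 0.16
delta = 60.0                                                 # allowable cost uncertainty, $ million

pcr0 = np.interp(measured_peak, peak_o3, phcr)               # Step 1: nominal control level
tc0 = np.interp(pcr0, pcr_pts, tc_pts)                       # Step 2: nominal control cost
pcr_lo = np.interp(tc0 - delta, tc_pts, pcr_pts)             # Step 3: PCR bounds from cost bounds
pcr_hi = np.interp(tc0 + delta, tc_pts, pcr_pts)
peak_lo = np.interp(pcr_lo, phcr, peak_o3)                   # Step 4: peak bounds from PCR bounds
peak_hi = np.interp(pcr_hi, phcr, peak_o3)
print(f"PCR0 = {pcr0:.0f}%, TC0 = ${tc0:.0f}M, allowable peak: {peak_lo:.3f}-{peak_hi:.3f} ppm")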
D-32
-------
The above procedure is a straightforward one creating a
structure in which control cost uncertainty can be considered explicitly.
The example presented, however, is appropriate only for considering ozone
in those regions having ambient conditions simple enough to be represented
by an isopleth diagram. Extension of the procedure to other pollutants
and into regions of greater atmospheric complexity requires that additional
research be conducted beyond the scope of the current effort.
3. Guaranteed Compliance Rationale
As formulated in the federal regulations, the NAAQS are explicit,
with maximum pollutant levels specified that must not be exceeded with
greater than a certain frequency. Peak one-hour concentrations of ozone,
for instance, must not exceed 0.08 ppm more often than once per year.
With the standards written in such an absolute fashion, it may be argued
that little room exists for uncertainty about achieving compliance. Under
such circumstances, a model's performance should be constrained to
"guarantee" that its use will not lead to underestimating the degree of
emissions control required.
Model behavior can affect significantly the likelihood of meeting
the NAAQS. In those regions currently in noncompliance, the effective-
ness of candidate control strategies can be assessed only by means of
model predictions of the peak concentrations resulting from each. If a
model systematically underpredicts the peak value for concentrations
near the NAAQS, the adequacy of controls might be overestimated. Similarly,
if the model overpredicts the peak, controls designed using it might be
excessive.
a. Description of the GC Rationale
With the above in mind, we examine the Guaranteed Compliance (GC)
rationale for setting model performance standards. We state its guiding
principle as follows: Compliance with the NAAQS must be "guaranteed,"
D-33
-------
with all model uncertainty on the conservative side even if it means
introducing a systematic bias into model predictions. The term
"guaranteed" should be taken here in a limited sense. We intend it
to mean that "the probability is very small" that a model will predict
a peak value less than the standard when its actual value is greater.
We illustrate this principle using the diagrams in Figures D-13
and D-14. In these figures we illustrate two models, one "conservative"
(Figure D-13) and the other "nonconservative" (Figure D-14). For each,
we show two cases: an actual peak concentration, C_A, higher than the
NAAQS, C_S, and one near the standard. We represent the probability density
function of the model as f(C) and the expected value of the predicted peak
as C̄. Two types of uncertainty affect a model's performance. The first
includes error in model inputs and uncertainty in the values of the model
parameters themselves. These affect the shape of f(C). Uncertainty of
the second type is due to the inability of the model formulation to re-
present reality fully. The difference between the expected model predic-
tion, C̄, and the actual value, C_A, of the peak concentration is a measure
of the effect of formulation errors. As we define it here, a "conserva-
tive" model is one for which the value of C̄ exceeds C_A, while for a "non-
conservative" model the reverse is true. In both figures, the shaded area
A represents the probability that the model will predict a peak concentra-
tion less than the standard at the same time the actual value is greater.
With the GC rationale, we want to ensure that A remains acceptably
small. In mathematical terms, we insist that

A = \int_0^{C_S} f(C)\, dC \leq \varepsilon    (D-25)

where ε is some suitably small number. From the figures we see that A
can be kept small only if C̄ exceeds C_A. Under the requirements of the
GC rationale, only a model having these characteristics would be judged
acceptable.
D-34
-------
[Figure D-13 plots f(C) against peak concentration for (a) a peak higher than the
NAAQS (C_A > C_S) and (b) a peak near the NAAQS; the shaded area A lies below the
standard, C_S.]
FIGURE D-13.
UNCERTAINTY DISTRIBUTION FOR
A CONSERVATIVE MODEL
[Figure D-14 shows the corresponding plots of f(C) for (a) a peak higher than the
NAAQS and (b) a peak near the NAAQS, with C̄ falling below C_A.]
FIGURE D-14. UNCERTAINTY DISTRIBUTION FOR
A NONCONSERVATIVE MODEL
D-35
-------
A practical consideration now becomes important. For peaks near
the NAAQS, we have no way of knowing the actual peak, C_A, whose value
we are trying to predict. This is clearly so. Until emissions control
has been implemented and ambient conditions "improve," we cannot estimate
C_A with measurement data. Our strategy using the GC rationale is as
follows:
> Step 1. We assume C_A = C_S and estimate the amount by
which C̄ must exceed C_A in order that A ≤ ε.
> Step 2. We then use the model to predict the peak under
current (uncontrolled) conditions, C̄*, for which
we have measurement data to estimate the current
peak, C_A*.
> Step 3. To judge acceptability, we require the model
prediction, C̄*, to exceed C_A* by as much as C̄
exceeded C_A when C_A = C_S. Actually, this is a
bit more complicated. Since C_A* is based upon
measurements, it is subject to instrumentation
error. We know C_A* only in terms of a measured
value and its probability density function. There-
fore, we must consider the comparison of C_A* and
C̄* statistically, requiring the probability that
C̄* exceeds C_A* by C̄ - C_A to be greater than some
large value (near 1.0).
We have invoked several important assumptions here, whose general
validity would require further verification if the GC rationale were to
be applied in judging model performance. Among them are the following:
> C̄ maintains the same relationship to C_A for ambient condi-
tions ranging from current ones to those characterizing
compliance with the NAAQS.
D-36
-------
> The probability density function, f(C), is known or
can be determined, as can C̄.
> Instrumentation uncertainty can be characterized,
allowing Step 3 to be accomplished.
There are several difficulties associated with the GC rationale
approach, however, some of which are conceptual and some practical.
Among the most important of the conceptual difficulties is the intro-
duction of a conservative bias into model predictions. By insisting
that the model "overpredict" peak concentrations, almost certainly
we will select abatement strategies requiring more control than needed.
Difficulties of the practical kind also can be significant. For most
models, determination of f(C) is a difficult (and usually impractical)
process. The uncertainty in predicting the peak is partially due to
uncertainty in the data input to the model. Since the model results are
related to inputs only in a complex and nonlinear way, estimating the
output uncertainty distribution in terms of the input error distributions
seldom can be done directly. While a Monte Carlo-type of analysis in
principle can be conducted, the number of model runs required and the
amount of computing resources consumed are so considerable as to render
such an analysis impractical.
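For a sufficiently cheap surrogate of the model, however, the idea is easy to state. The
toy sketch below is our own illustration (the response function and input error
distributions are invented); it propagates input uncertainty by Monte Carlo sampling and
estimates the probability A that the predicted peak falls below the standard.

import numpy as np

rng = np.random.default_rng(1)
C_STANDARD = 35.0                            # ppm (one-hour CO standard)

def predicted_peak(wind_speed, emission_rate):
    """Hypothetical surrogate: peak scales with emissions and inversely with wind."""
    return 70.0 * emission_rate / wind_speed

n = 100_000
u = rng.normal(2.0, 0.5, n).clip(0.5, None)   # wind speed, m/s
q = rng.normal(1.0, 0.15, n).clip(0.1, None)  # normalized emission rate
peaks = predicted_peak(u, q)
A = np.mean(peaks < C_STANDARD)               # fraction of runs predicting compliance
print(f"expected peak = {peaks.mean():.1f} ppm, P(prediction < standard) = {A:.3f}")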
b. A Possible Simplification
Short of doing a Monte Carlo analysis, is there anything useful that
can be determined? In certain simple circumstances, there is. We may
infer, when appropriate, some limited information about f(C), C̄, and C_A.
To do so, we first recall the modified form of Tchebycheff's inequality,

P\{|x - \mu| \geq k\sigma\} \leq \frac{4}{9k^2}    (D-26)

where P is the probability that x deviates from μ by at least kσ, x is a random
variable, μ is its expected value, and σ is its standard deviation. This
D-37
-------
relationship holds for all probability distributions. We can adapt it
to the present problem by rewriting it in the following way:
P\{C \leq C_S\} \leq \frac{2\sigma_C^2}{9(\bar{C} - C_S)^2}    (D-27)

where C is a random variable whose value is the peak concentration pre-
dicted by the model, C̄ is its expected value, and σ_C is its standard
deviation. C_S is the standard (NAAQS).
The relation in Eq. D-27 is a useful one. The area A in Figures D-13
and D-14 represents the same probability as that on the left-hand side of
Eq. D-27. Using Eq. D-25, we may now write

\frac{2\sigma_C^2}{9(\bar{C} - C_S)^2} \leq \varepsilon    (D-28)

where ε is the maximum allowable value of A. From this, we may infer the
minimum allowable value of (C̄ - C_S)/σ_C. Its value is

\frac{\bar{C} - C_S}{\sigma_C} \geq \sqrt{\frac{2}{9\varepsilon}}    (D-29)

Still, we need an independent approximation of σ_C in order to solve
Eq. D-29 for the minimum value of C̄ - C_S. To do so, we estimate the
maximum value σ_C is likely to assume, that is, the σ_C* such that
D-38
-------
\sigma_C \leq \sigma_C^*    (D-30)

If we then use σ_C* in Eq. D-29, we can determine (C̄ - C_S)_min.
Suppose we represent model behavior with a system response function,
φ, that transforms model inputs into the model-predicted concentration
peak, i.e.,

C = \phi(\underline{e})    (D-31)

where C is the predicted peak and e is the vector of model inputs. Suppose
further that we know the probability distributions of each of the input
errors and that we can identify their one-sigma variations, σ_e_i. If so,
we can determine the maximum change in the predicted peak that would occur
if all error sources varied simultaneously by a standard deviation from
their nominal values. We note that increases in some inputs lower C and
others raise it. Thus, to bound the value of ΔC, we consider the root-mean-
square of the changes in C as each input is varied separately. This max-
imum ΔC can be written as

\Delta C = \left\{ \sum_{i=1}^{N} \left[ \phi(e_1, \ldots, e_i + \sigma_{e_i}, \ldots, e_N) - \phi(\underline{e}) \right]^2 \right\}^{1/2}    (D-32)

where each e_i (1 ≤ i ≤ N) is varied separately and the corresponding change
in peak concentration is represented by the quantity in the brackets. If
we assume that ΔC is a suitable estimate of σ_C*, we can write (using
Eq. D-29)

(\bar{C} - C_S)_{min} = \sqrt{\frac{2}{9\varepsilon}}\, \Delta C    (D-33)
D-39
-------
which provides an indication of the amount of "overprediction" the model
must provide.
We now present an example. Suppose we consider a simple Gaussian
model (no reflection, continuously emitting source), whose only source
of error is the wind speed, U. We assume the following: σ_U = 0.5 m/sec,
U = 2 m/sec, and C_S = 35 ppm (the one-hour federal standard for CO). Using
Eq. D-32, we determine that ΔC = 7 ppm. Then, using Eq. D-33 and assuming
that ε = 0.05, we estimate that (C̄ - C_S)_min = 14.7 ppm. Using the GC rationale,
we would require, when modeling current ambient conditions, that the model
overpredict the peak by this same amount (assuming that there was no error
associated with the measurement).
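The arithmetic of this example is summarized in the sketch below. It relies on our
reconstruction of Eqs. D-32 and D-33 (in particular the sqrt(2/(9ε)) factor) and on a
surrogate response function that simply captures the 1/U dependence of a Gaussian plume;
with those assumptions it reproduces ΔC = 7 ppm and a required overprediction of about
14.7 ppm.

import math

def phi(wind_speed, c_ref=35.0, u_ref=2.0):
    """Surrogate: the peak of a continuously emitting Gaussian source varies as 1/U."""
    return c_ref * u_ref / wind_speed

u, sigma_u, eps = 2.0, 0.5, 0.05
delta_c = abs(phi(u + sigma_u) - phi(u))                  # single error source, Eq. D-32
overprediction = math.sqrt(2.0 / (9.0 * eps)) * delta_c   # Eq. D-33 as reconstructed
print(f"delta_C = {delta_c:.1f} ppm, required overprediction = {overprediction:.1f} ppm")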
c. The GC Rationale: An Assessment
We have included the GC rationale in our discussion primarily for the
sake of completeness. While the guiding principle underlying it--
"guaranteeing" that an adequate abatement strategy will be designed—
has its virtues, the method as conceived here has significant problems
associated with its use. It is cumbersome and impractical, except in the
most limited of circumstances. Also, it may be excessively conservative,
introducing a systematic bias into model evaluation.
Unless the major problems noted here can be solved somehow, the other
rationales considered in this chapter appear to have greater promise. We
do not recommend that this rationale be pursued extensively in any additional
work.
4. Pragmatic/Historic Rationale
Experience is growing in the use of air quality simulation models. They
have been applied to a variety of problems in a number of different situa-
tions. As familiarity grows with both their capabilities and limitations,
we become more able to foresee their behavior in new applications. Taking
D-40
-------
advantage of our growing expertise, we may find it reasonable to set per-
formance standards for models based upon the following principle: In
each new application a model must perform at least as well as the "best"
previous performance of a model in its generic class in a similar application.
This approach is a pragmatic one, forced upon us by some very practical
considerations: our limited ability to derive theoretically justifiable
values for the standards and the number of different measures required to
characterize fully model performance. Five major problem areas exist in
characterizing the agreement of model predictions with field observations.
The model may be judged on its ability to predict the concentration peak,
to avoid systematic bias, to limit absolute error, to maintain spatial
alignment, and to reproduce temporal behavior of concentrations. To assess
a model's performance in these five areas, we recommended earlier in this
chapter the use of a number of different performance measures. Our chief
difficulty is as follows: There are as yet few theoretical means to assign
appropriate values for these measures. We have identified in this report
several promising candidates for judging the prediction of peak concentrations.
Additional work is required, however, to determine appropriate standards
for many of the other measures.
While such additional work is proceeding, what must we do? Many issues
of great practical interest are pending, each of which requires the eval-
uation of model performance. Revisions to State Implementation Plans, for
instance, must be reviewed. Model performance studies now being conducted
by the EPA must continue.
We recommend that the Pragmatic/Historic rationale be used to set
acceptable bounds for performance measures for which no other better method
exists. As research provides greater insight into "better" rationales, we
recommend appropriate updates to the standards.
D-41
-------
To employ this rationale the following steps might be followed:
> Step 1. The proposed application is categorized, identifying
the group of previous studies with which its per-
formance must be compared. The criteria by which
this might be done could include pollutant type,
prevailing meteorology, source geometry, and terrain
irregularity.
> Step 2. Performance measures appropriate to the applications
category are calculated.
> Step 3. Calculated values are compared with the "best" values
previously attained in a similar application (a schematic
sketch of this bookkeeping follows).
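The bookkeeping this implies might look like the following sketch; the category keys,
measure names, and threshold values are entirely hypothetical.

BEST_HISTORICAL = {
    ("ozone", "urban", "flat-terrain"): {"peak_ratio_error": 0.15, "rmse_pphm": 3.0},
}

def meets_historical_standard(category, new_values):
    """Return a dict of pass/fail flags, one per performance measure."""
    best = BEST_HISTORICAL[category]
    return {m: new_values[m] <= best[m] for m in best}

new_study = {"peak_ratio_error": 0.12, "rmse_pphm": 3.4}
print(meets_historical_standard(("ozone", "urban", "flat-terrain"), new_study))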
For the Pragmatic/Historic rationale to be of use, the EPA would
have to accomplish the following steps. A scheme for classifying appli-
cations into "similar" categories needs to be developed. Then, data on
previous modeling efforts needs to be assembled and appropriate perfor-
mance measure values calculated. Finally, a mechanism for updating the
"performance data base" needs to be established. Such a mechanism would
require the EPA to assume a custodial role over the data base, amending
it as results of new modeling studies become available.
D-42
-------
REFERENCES
Ames, J., et al. (1978), "The User's Manual for the SAI Airshed Model,"
EM78-89, Systems Applications, Incorporated, San Rafael, California.
Anderson, G. E. (1978), private communication.
Anderson, G. E., et al. (1977), "Air Quality in the Denver Metropolitan
Region 1974-2000," EF77-22, Systems Applications, Incorporated,
San Rafael, California.
Argonne (1977), "Report to the U.S. EPA of the Specialists' Conference on
the EPA Modeling Guideline," 22-24 February 1977, Argonne National
Laboratory, Argonne, Illinois.
Burton, C. S., et al. (1976), "Oxidant/Ozone Ambient Measurement Methods,"
EF76-111R, Systems Applications, Incorporated, San Rafael, California.
Calder, K. L. (1974), "Miscellaneous Questions Relating to the Use of Air
Quality Simulation Models," Proc. of the Fifth Meeting of the Expert
Panel on Air Pollution Modeling, Chapter 6, NATO/CCMS.
Code of Federal Regulations [CFR] (1975), Title 40 (Office of the Federal
Register, U.S. Government Printing Office, Washington, D.C.).
EPA (1977), "Uses, Limitations and Technical Bases of Procedures for
Quantifying Relationships Between Photochemical Oxidants and Precur-
sors," EPA-450/2-77-021a, Office of Air Quality Planning and Standards,
Environmental Protection Agency, Research Triangle Park, North Carolina.
(1978a), "Workbook for the Comparison of Air Quality Models,"
EPA-450/2-78-028a,b, Office of Air Quality Planning and Standards,
U.S. Environmental Protection Agency, Research Triangle Park, North
Carolina.
(1978b), "Guidelines on Air Quality Models," EPA 450/2-78-027,
Office of Air Quality Planning and Standards, Environmental Protec-
tion Agency, Research Triangle Park, North Carolina.
Johnson, W. B. (1972), "Validation of Air Quality Simulation Models," Proc.
of the Third Meeting of the Expert Panel on Air Pollution Modeling,
Chapter VI, NATO/CCMS.
Liu, M. K., and D. R. Durran (1977), "The Development of a Regional Air
Pollution Model and Its Application to the Northern Great Plains,"
EPA-908/1-77-001, Office of Energy Activities, U.S. Environmental
Protection Agency, Denver, Colorado.
R-l
-------
Rosen, L. C. (1977), "A Review of Air Quality Modeling Techniques," UCID-
17382, Lawrence Livermore Laboratory, Livermore, California.
Roth, P. M., moderator (1977), "Report of the Validation and Calibration
Group (II-5)," in "Report to the U.S. EPA of the Specialists' Conference
on the EPA Modeling Guideline," pp. 111-120, 22-24 February 1977,
Argonne National Laboratory, Argonne, Illinois.
Roth, P. M., et al. (1976), "An Evaluation of Methodologies for Assessing the
Impact of Oxidant Control Strategies," EF76-112R, Systems Applications,
Incorporated, San Rafael, California.
R-2
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1. REPORT NO.
EPA-450/4-79-032
2.
3. RECIPIENT'S ACCESSION-NO.
4. TITLE AND SUBTITLE
Performance Measures and Standards for Air Quality Simulation Models
5. REPORT DATE
October 1979
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
S. R. Hayes
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Systems Applications, Incorporated
950 Northgate Drive
San Rafael, California 94903
10. PROGRAM ELEMENT NO.
11. CONTRACT/GRANT NO.
68-02-2593
12. SPONSORING AGENCY NAME AND ADDRESS
Office of Air Quality Planning and Standards
U. S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
13. TYPE OF REPORT AND PERIOD COVERED
Final Report
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES
16. ABSTRACT
Currently there are no standardized guidelines for evaluating the performance of
air quality simulation models. In this report we develop a conceptual framework for
objectively evaluating model performance. We define five attributes of a well-
behaving model: accuracy of the peak prediction, absence of systematic bias, lack of
gross error, temporal correlation, and spatial alignment. The relative importance of
these attributes is shown to depend on the issue being addressed and the pollutant
being considered. Acceptability of model behavior is determined by calculating
several performance "measures" and comparing their values with specific "standards."
Failure to demonstrate a particular attribute may or may not cause a model to be
rejected, depending on the issue and pollutant.
Comprehensive background material is presented on the elements of the performance
evaluation problem: the types of issues to be addressed, the classes of models to be
used along with the applications for which they are suited, and the categories of
performance measures available for consideration. Also, specific rationales are
developed on which performance standards could be based. Guidance on the inter-
pretation of performance measure values is provided by means of an example using a
large, grid-based air quality model.
17.
KEY WORDS AND DOCUMENT ANALYSIS
DESCRIPTORS
b. IDENTIFIERS/OPEN ENDED TERMS
c. COSATl Field/Group
Air Pollution
Turbulent Diffusion
Mathematical Models
Computer Models
Atmospheric Models
Dispersion
Air Quality Simulation
Model
Model Validation
Model Evaluation
18. DISTRIBUTION STATEMENT
Release Unlimited
19. SECURITY CLASS (This Report)
None
20. SECURITY CLASS (This page)
None
21. NO. OF PAGES
311
22. PRICE
EPA Form 2220-1 {9-73}
------- |