United States
Environmental Protection
Agency
Office of Air Quality
Planning and Standards
Research Triangle Park NC 27711
EPA-450/3-79-032
May 1979
Air
End Use of Solvents
Containing Volatile
Organic Compounds
-------
EPA-450/3-79-032
End Use of Solvents Containing
Volatile Organic Compounds
by
Ned Ostojic
The Research Corporation of New England
125 Silas Deane Highway
Wethersfield, Connecticut 06109
Contract No. 68-02-2615
Task No. 8
EPA Project Officer: Reid E. Iversen
Prepared for
U.S. ENVIRONMENTAL PROTECTION AGENCY
Office of Air, Noise, and Radiation
Office of Air Quality Planning and Standards
Research Triangle Park, North Carolina 27711
May 1979
-------
DISCLAIMER
This report has been reviewed by the Office of Air Quality Planning
and Standards, U.S. Environmental Protection Agency, and approved for
publication. Approval does not signify that the contents necessarily
reflect the views and policies of the U.S. Environmental Protection
Agency, nor does mention of trade names or commercial products constitute
endorsement or recommendation for use.
ii
-------
ABSTRACT
Currently there are no standardized guidelines for evaluating the
performance of air quality simulation models. In this report we develop
a conceptual framework for objectively evaluating model performance. We
define five attributes of a well-behaving model: accuracy of the peak
prediction, absence of systematic bias, lack of gross error, temporal cor-
relation, and spatial alignment. The relative importance of these attri-
butes is shown to depend on the issue being addressed and the pollutant
being considered. Acceptability of model behavior is determined by cal-
culating several performance "measures" and comparing their values with
specific "standards." Failure to demonstrate a particular attribute may
or may not cause a model to be rejected, depending on the issue and pollutant.
Comprehensive background material is presented on the elements of the
performance evaluation problem: the types of issues to be addressed, the
classes of models to be used along with the applications for which they are
suited, and the categories of performance measures available for considera-
tion. Also, specific rationales are developed on which performance standards
could be based. Guidance on the interpretation of performance measure values
is provided by means of an example using a large, grid-based air quality
model.
iii
-------
ACKNOWLEDGMENTS
A number of persons have generously provided their assistance and sup-
port to this project. Special thanks is due Philip Roth, whose fore-
sight and leadership made this project possible. His perceptive advice
and guidance contributed immeasurably to the results of this work.
Steven Reynolds and Martin Hillyer made many significant, insightful
comments, which were greatly appreciated.
For their patience and diligence, grateful thanks is also due the
members of the SAI support staff, particularly Marie Davis, Sue Bennett,
Chris Smith, and Linda Hill.
iv
-------
CONTENTS
DISCLAIMER ii
ABSTRACT iii
ACKNOWLEDGMENTS iv
LIST OF ILLUSTRATIONS vii
LIST OF TABLES xi
LIST OF EXHIBITS xiv
I INTRODUCTION 1-1
A. Overview of the Problem 1-2
B. Structure of the Report 1-5
II SUMMARY II-1
A. Main Results II-1
B. Detailed Summary II-2
1. Summary of Chapter III (Issues) II-2
2. Summary of Chapter IV (Models) II-3
3. Summary of Chapter V (Performance Measures) II-4
4. Summary of Chapter VI (Performance Standards) II-14
III ISSUES REQUIRING MODEL APPLICATION III-1
A. A Perspective on the Issues III-1
1. Federal Air Pollution Law III-2
2. The Code of Federal Regulations III-3
B. Generic Issue Categories III-7
1. The Issues: Their Classification III-8
2. The Issues: Some Practical Examples and
Their Implications for Air Pollution Modeling III-10
3. The Issues: A Prologue to the Next Chapter III-13
-------
IV AIR QUALITY MODELS IV-1
A. Generic Model Categories IV-2
1. Rollback Category IV-2
2. Isopleth Category IV-4
3. Physico-Chemical Category IV-5
B. Generic Issue/Model Combinations IV-16
C. Model/Application Combinations IV-22
D. Some Specific Air Quality Models IV-22
E. Air Quality Models: A Summary IV-25
V MODEL PERFORMANCE MEASURES V-1
A. The Comparison of Prediction with Observation V-2
B. Generic Performance Measure Categories V-4
1. The Generic Measures V-5
2. Some Types of Variations Among Performance
Measures V-10
3. Several Practical Considerations V-10
C. A Basic Distinction: Regional Versus Source-Specific
Performance Measures V-15
D. Some Specific Performance Measures V-22
E. Matching Performance Measures to Issues and Models V-27
1. Performance Measures and Air Quality Issues V-27
2. Performance Measures and Air Quality Models V-33
F. Performance Measures: A Summary V-36
VI MODEL PERFORMANCE STANDARDS VI-1
A. Performance Standards: A Conceptual Overview VI-2
B. Performance Standards: Some Practical
Considerations VI-4
1. Data Limitations VI-5
2. Time/Resource Constraints VI-6
3. Variability of Analysis Requirements VI-6
C. Model Performance Attributes VI-7
D. Recommended Measures and Standards VI-12
1. Recommended Performance Measures VI-14
2. Recommended Performance Standards VI-23
3. Summary Table of Recommended Measures and
Standards VI-30
4. Formulas for Calculating Performance Measures
and Standards VI-32
vi
-------
VI MODEL PERFORMANCE STANDARDS (Continued)
E. A Sample Case: The SAI Denver Experience VI-39
1. The Denver Modeling Problem VI-39
2. Values of the Performance Measures VI-40
3. Interpreting the Performance Measure Values VI-45
F. Suggested Framework for a Draft Standard VI-53
VII RECOMMENDATIONS FOR FUTURE WORK VII-1
A. Areas for Technical Development VII-2
1. Further Evaluation of Performance Measures VII-2
2. Identification and Specification of Prototypical
Point Source "Test Bed" Data Bases VII-2
3. Examination of Performance Evaluation Procedure in
Sparse-Data Point Source Applications VII-3
4. Further Development of Rationales for Setting
Performance Standards VII-4
B. Assessment of Institutional Implications VII-5
C. Documents To Be Compiled VII-5
APPENDICES
A IMPORTANT PARTS OF THE CODE OF FEDERAL REGULATIONS
CONCERNING AIR PROGRAMS A-1
B SOME SPECIFIC AIR QUALITY MODELS B-1
C SOME SPECIFIC MODEL PERFORMANCE MEASURES C-1
D SEVERAL RATIONALES FOR SETTING MODEL PERFORMANCE
STANDARDS D-1
REFERENCES R-1
vii
-------
ILLUSTRATIONS
II-1 Various Levels of Knowledge About Regional Concentrations II-9
II-2 Various Levels of Knowledge About Specific-Source
Concentrations II-9
V-1 Various Levels of Knowledge About Regional Concentrations V-6
V-2 Various Levels of Knowledge About Specific-Source
Concentrations V-7
V-3 Sample Regional Isopleth Diagram Illustrating Ozone
Concentrations in Denver on 29 July 1975 for
Hour 1200-1300 MST V-17
V-4 Sample Specific-Source Isopleth Diagram Illustrating
Concentrations Downwind of a Steady-State Gaussian
Point Source V-18
V-5 Concentration Isopleth Patterns for Various Source Types V-20
V-6 Schematic of a Point Source Measurement Network V-21
V-7 Locus of Possible Footprint Locations for an Elevated
Point Source V-21
VI-1 Orientation and Scaling of CAVE and d* A*65 on
a Prediction-Observation Correlogram VI-37
VI-2 Locations of Monitoring Stations in the Denver
Metropolitan Region VI-41
VI-3 Predicted and Observed Ozone Concentrations at Each
Monitoring Station During the Day (Denver, 28 July 1976) VI-42
VI-4 Correlogram of Ozone Observation-Prediction Pairs
for Sample Case (Denver, 28 July 1976) VI-46
VI-5 Normalized Deviations About the Perfect Correlation Line
as a Function of Ozone Concentration (Denver, 28 July 1976) VI-47
VI-6 Non-Normalized Ozone Deviations About the Perfect Correlation
Line Compared with Instrument Errors (Data for 14 Hours and
8 Stations, Denver, 28 July 1976) VI-48
VI-7 Non-Normalized Ozone Absolute Deviation About the Perfect
Correlation Line Compared with Instrument Error (Data for
14 Hours and 8 Stations, Denver, 28 July 1976) VI-49
viii
-------
VI-8 Ground-Traces of the Predicted and Observed Peak Ozone
Concentrations (Denver, Hours 1100-1200 to 1400-1500
Local Standard Time, 28 July 1976) VI-52
VI-9 Possible Relationships Between the Model Performance
Standards and a Guidelines Document VI-54
C-1 Locations and Values of Predicted Maximum One-Hour-Average
Ozone Concentrations for Each Hour from 8 a.m. to 6 p.m. C-7
C-2 Concentration Histories Revealing Time Lag or Spatial Offset C-14
C-3 Estimate of Bias in Model Predictions as a Function of
Ozone Concentration C-15
C-4 Time Variation of Differences Between Means of Observed
and Predicted Ozone Concentrations C-17
C-5 Probabilities of Ozone Concentration Exceedance C-18
C-6 Model Predictions Correlated with Instrument Observations
of Ozone (Data for 3 Days, 9 Stations, Daylight Hours) C-19
C-7 Model Predictions Compared with Estimates of Instrument Errors
for Ozone (Data for 3 Days, 9 Stations, Daylight Hours) C-21
C-8 Map of Denver Air Quality Modeling Region Showing Air
Quality Monitoring Stations C-23
C-9 Time History of Predicted and Observed Concentrations at
Monitoring Sites C-24
C-10 Variations over All Stations of Observed and Predicted
Average Ozone Concentrations C-25
C-11 Plots of Residuals and Forcing Variable C-26
C-12 Distribution of Area Fraction Exposed to Greater
Than a Given Concentration Value C-30
C-13 Isopleths of Ozone Concentrations (pphm) on 29 July 1975 C-35
C-14 Size of Area in Which Predicted Ozone Concentrations Exceed
Given Values for Years 1976, 1985, and 2000 C-40
C-15 Typical Residuals Isopleth Plot for Annual Average NO2 C-42
C-16 Estimated Exposure to Ozone as a Function of Ozone
Concentration for 3 August 1976 Meteorology C-48
C-17 General Shape of the Exposure Cumulative Distribution
and Density Functions C-49
ix
-------
C-18 Shape of ψ(C), the Approximation to the Delta Function C-52
C-19 Cumulative Ozone Dosage as a Function of the Time of Day
for 3 August 1976 Meteorology C-54
C-20 Cumulative Exposure (in 10 Person-Hours) to Ozone
Concentrations Above Given Level in One-Square-Mile Grid
Cells Between 500 and 1800 Hours for 3 August 1976
Meteorology and 1976 Emissions C-55
C-21 Cumulative Ozone Dosages (in 10 pphm-Person-Hours) in the
One-Square-Mile Grid Cells from 500 to 1800 Hours (MST) for
3 August 1976 Meteorology and Emissions in 1976 C-58
C-22 Orientation with Respect to Measurement Station of Nearest
Point at Which Prediction Equals Station Observation C-59
C-23 Space-Time Trace of Location of Nearest Point Predicting
a Concentration Equal to the Station Measured Value C-60
D-1 Possible Health Effects Curves D-4
D-2 Representation of Spatial and Concentration Dependent
Population Functions D-6
D-3 Population Distribution as a Function of Concentrations D-10
D-4 Idealized Concentration Isopleths D-11
D-5 Typical Radial Concentration Distributions About
the Peak D-13
D-6 Predicted Population Distribution as a Function
of Concentration D-16
D-7 Shifts in w(C) Caused by Nonuniform Population Distributions D-17
D-8 Expected Shape of Health Effects Function D-20
D-9 Minimum Allowable Ratio of Predicted to Measured Peak
Concentration Value D-23
D-10 Prototypical Isopleth Diagram D-28
D-11 The Isopleth Diagram Replotted D-29
D-12 Total Regional Control Cost as a Function of the Level
of Control Required D-32
D-13 Uncertainty Distribution for a Conservative Model D-35
D-14 Uncertainty Distribution for a Nonconservative Model D-35
-------
TABLES
II-1 Air Quality Issues Commonly Addressed,
by Generic Model Type II-5
II-2 Model/Application Combinations II-6
II-3 Some Air Quality Models II-7
II-4 Generic Performance Measure Information Requirements II-10
II-5 Types of Variations Among Generic Performance
Measure Categories II-12
II-6 Performance Measures Commonly Associated with
Specific Issues II-13
II-7 Performance Measures That Can Be Calculated by
Each Model Type II-13
II-8 Performance Measure Objectives II-15
II-9 Importance of Performance Attributes by Issue II-16
II-10 Importance of Performance Attributes by Pollutant and
Averaging Time II-18
II-11 Measures Recommended for Use in Setting Model
Performance Standards II-19
II-12 Possible Rationales for Setting Model Performance
Standards II-21
II-13 Performance Attributes Addressable Using Performance
Standard Rationales II-22
II-14 Association of Rationales with Generic Issues II-22
II-15 Recommended Rationales for Setting Standards II-26
II-16 Summary of Recommended Performance Measures and Standards II-26
IV-1 Air Quality Issues Commonly Addressed,
by Generic Model Type IV-18
IV-2 Possible Designations of Application Attributes IV-23
IV-3 Model/Application Combinations IV-24
IV-4 Some Air Quality Models IV-26
xi
-------
V-1 Generic Performance Measure Information Requirements V-8
V-2 Types of Variations Among Generic Performance
Measure Categories V-11
V-3 Some Peak Performance Measures V-23
V-4 Some Station Performance Measures V-24
V-5 Some Area Performance Measures V-26
V-6 Some Exposure/Dosage Performance Measures V-28
V-7 Performance Measures Associated with Specific Issues V-34
V-8 Performance Measures That Can Be Calculated by
Each Model Type V-37
VI-1 Performance Measure Objectives VI-10
VI-2 Importance of Performance Attributes by Issue VI-10
VI-3 Importance of Performance Attributes by Pollutant and
Averaging Time VI-13
VI-4 Candidate Station Performance Measures VI-16
VI-5 Useful Hybrid Performance Measures VI-20
VI-6 Measures Recommended for Use in Setting Model
Performance Standards VI-21
VI-7 Possible Rationales for Setting Model Performance Standards VI-24
VI-8 Performance Attributes Addressable Using Performance
Standard Rationales VI-26
VI-9 Association of Rationales with Generic Issues VI-27
VI-10 Recommended Rationales for Setting Standards VI-29
VI-11 Summary of Recommended Performance Measures and Standards VI-31
VI-12 Sample Values for Model Performance Standards
(Denver Example) VI-43
VI-13 Importance of Performance Attributes by Issue VI-56
VI-14 Importance of Performance Attributes by Pollutant
and Averaging Time VI-56
VI-15 Model Performance Measures and Standards VI-57
xii
-------
B-1 Some Specific Air Quality Models B-3
C-1 Some Peak Performance Measures C-3
C-2 Several Peak Measure Combinations of Interest and
Some Possible Interpretations C-4
C-3 Some Station Performance Measures C-8
C-4 Occurrence of Correspondence Levels of Predicted and
Observed Ozone Concentrations C-20
C-5 Some Area Performance Measures C-29
C-6 Some Exposure/Dosage Performance Measures C-45
D-1 Selected Parameter Values in Denver Test Case D-15
-------
EXHIBITS
III-1 Formal Organization of CFR Title 40—Protection
of Environment III-4
IV-1 General Model Categories IV-3
xiv
-------
I INTRODUCTION
In this report a candidate framework is suggested within which an
objective evaluation of air quality simulation model (AQSM) performance
may be carried out, along with an assessment of the relative applicability
of models to specific problems. Quantitative procedures are identified
that could facilitate assessment of the relative accuracy and usability
of an AQSM.
The subject addressed in this report is a broad and complex one. Sel-
dom can a rule for judging model performance be stated that does not have
several plausible exceptions to it. Consequently, we view the establish-
ment of model performance standards to be a pragmatic and evolutionary
exercise. As we gain experience in evaluating model performance, we will
need to modify both our choice of performance measures and the range of
acceptable values we insist on. Nevertheless, the process must begin some-
where. The recommendations contained in this report represent such a
beginning.
Model performance evaluation should not be viewed as a mechanistic
process, to be performed in a "cookbook" fashion. Performance measures
may be defined to be specific quantities whose value in some way character-
izes the difference between predicted and observed concentrations. No set
of performance measures, however well designed, can fully characterize
model behavior. Judgment is required of the model user. Predictions can
be compared with measurement data in a variety of ways. Some comparisons
involve the calculation of specific quantities and are thus suited for
having specific standards set. (An example might be the difference between
the predicted and the observed concentration peak.) Other comparisons are
more qualitative, better used in an advisory sense to facilitate "pattern
1-1
-------
recognition." (Concentration isopleth maps and time profiles of predicted
and observed concentrations are examples of this type of qualitative com-
parison.) Although we recommend a set of performance measures and standards
in this report, in no way does this recommendation suggest that computation
of measures be limited to this set. For this reason, we catalogue many
different types of performance measures, only a small subset of which have
explicit, formal standards.
The measures and standards we suggest for use will almost certainly
change as experience improves our "collective judgment" about what consti-
tutes model acceptability and what does not. Perhaps the number of measures
will increase to provide richer insight into model performance, or perhaps
the number will shrink without any loss of "information content." Regard-
less of the list of measures and standards that ultimately emerges for
use, it is the conceptual structuring of the performance evaluation itself
that seems to be most important at this point. We must identify clearly the
desirable model attributes whose presence we are most interested in detecting,
and we need to understand how we assess their relative importance, depending
on the issue we are addressing and the pollutant species we are considering.
This report offers a conceptual structure for "folding in" all these concerns
and suggests candidate measures and standards.
A. OVERVIEW OF THE PROBLEM
Air quality simulation models (AQSMs) are widely used as predictive
tools, estimating the impact on future air quality of alternative public
decisions. Their predictions, however, are inherently nonverifiable. Only
after the proposed action has been taken and the required implementation
time elapsed will measurement data confirm or refute the model's predictive
ability.
Herein lies the dilemma faced by users of air quality models: If a
model's predictions at some future time cannot be verified, on what basis
can we rely on that model to decide among policy alternatives? In resolving
this dilemma, most users have adopted a pragmatic approach: If a model can
1-2
-------
demonstrate its ability to reproduce a set of "known" results for a similar
type of application, then it is judged an acceptable predictive tool. It
is on this basis that model "verification" has become an essential prelude
to most modeling exercises.
Several investigators (Calder, 1974, and Johnson, 1972) have objected
to this approach, arguing that it amounts to little more than "crude cali-
bration." They suggest that true model validation can only be accomplished
by evaluating each component sub-model--emissions, transport, or chemistry,
for example. While this may be a scientifically sound approach, there are
so many models available that it is difficult to complete such efforts for
them all. Worse, the demand for a model, truly validated or not, often
forces such concerns to be swept aside. We take a highly pragmatic position
in this report, one that is also consistent with recommendations recently
made to and by the U.S. Environmental Protection Agency [EPA] (Roth, 1977,
and EPA, 1977). Because verification is so often performed at the "output
end" (that is, only model results are examined, comparing them with "true"
data), a systematic and objective procedure is needed in assessing model
performance on that same basis.
A further difficulty exists. What constitutes a set of "known" results?
This is not a problem easily solved. For "answers" to be known exactly, the
"test" problem must be simple enough to be solved analytically. Few problems
involving atmospheric dynamics are so simple. Most are complex and nonlinear.
For those, the analytic test problem is an unacceptable one. Another, more
practical alternative often is employed. For regional, multiple-source
applications, the "known" results are taken to be the station measurements
of concentrations actually recorded on a "test" date.
For source-specific applications, the source of interest may not yet
exist, permission for its construction being the principal issue at hand.
For these applications, it is often necessary to verify a model using the
most appropriate of several prototypical "test cases." Though not existing
currently, these could be assembled from measurements taken at existing
sources, the variety of source size, type and location spanning the range of
values found in applications of interest.
1-3
-------
The term "known" is used imprecisely when referring to a set of measure-
ment data. Station observations are subject to instrumentation error. The
locations of fixed monitoring sites may not be sufficiently well distri-
buted spatially to record data fully characterizing the concentration
field and its peak value. Nevertheless, despite those shortcomings,
"observed" data often are regarded as "true" data for the purpose of
model verification.
In evaluating model performance, we must decide which performance
attributes we most wish the model to possess. Having assembled two sets
of data, one "known" and the other "predicted," we can assess model perfor-
mance by comparing one with the other. Prediction and observation, however,
can be compared in many ways. We must select the quantities (performance
measures) that can most effectively test for the presence of those attributes.
Once we have decided on the performance measures best suited to our
needs (and most feasible computationally), we can calculate these values.
Having done so, however, we must ask a central question: How close must
prediction be to observation in order for us to judge model performance
as acceptable? If we are to answer "how good is good," performance stand-
ards for these measures must be set, with allowable tolerances (predicted
values minus observed ones) derived from a reasonable rationale (health
effects or pollution control cost considerations, for instance).
By setting these standards explicitly, certain benefits may be gained.
Among these are the following:
> A degree of uniformity is introduced in assessing model
reliability.
> A rational and objective basis is provided for comparing
alternative models.
> The impact of limitations in both data gathering proce-
dures and measurement network design can be made more
explicit, facilitating any review of them that may be
required.
1-4
-------
> The performance expected of a model is stated clearly,
in advance of the expenditure of substantial analysis
funds, allowing model selection to be a more straight-
forward and less "risky" process.
> The needs for additional research can be identified clearly,
with such efforts more directed in purpose.
B. STRUCTURE OF THE REPORT
The central purpose of this report is to suggest means for setting per-
formance standards for air quality dispersion models. In doing so, our dis-
cussion proceeds in two phases, the first exploring key elements of the over-
all problem, as well as their interactions, and the second synthesizing all
into a conceptual framework for model performance evaluation.
We recognize three key elements of the performance assessment problem,
all of which are interrelated: the classes of issues addressed by AQSMs (air
quality maintenance planning or prevention of significant deterioration, for
example), the types of AQSMs available for use (grid-based, trajectory, or
Gaussian models, for instance) with the applications for which they are suit-
able, and the classes of performance measures that are candidates for our
use (two of which are station and exposure/dosage measures).
We consider each of these three elements in Chapters III, IV, and V,
providing supporting material in Appendices A, B, and C. In Chapter III,
we identify from current federal law and regulations seven distinctly dif-
ferent types of air quality issues, each of which may be addressed using an
AQSM. In Chapter IV, we assess major model classes, examining their capabil-
ities and limitations as well as their suitability for use in addressing
each of the generic classes of issues. In Chapter V, we discuss model per-
formance measures, identifying four major types, which we then assess for
computational feasibility and suitability for use.
We provide supplementary detail for these three chapters in the first
three appendices. In Appendix A, we outline important portions of the Code
1-5
-------
of Federal Regulations. In Appendix B, we describe in summary form a number of
specific air quality models. In Appendix C, we examine at length a variety
of specific model performance measures, discussing their computation and pro-
viding illustrative examples of their calculation.
Having identified issues (Chapter III), issue/model combinations
(Chapter IV), and issue/model/measure associations (Chapter V), we reach
the synthesis phase in Chapter VI. Here we first identify five desirable
attributes of model performance. Then we recommend a set of performance
measures suitable for use in determining the presence or absence of each
attribute. Each measure is chosen based on two criteria: First, it is
an accurate indicator of the presence of a problem type and second, it is
quantitative (that is, amenable to having specific standards set).
Having selected the performance measures for use, we then offer several
possible rationales for determining the range of their acceptable values.
We examine four rationales, discussing each in detail in Appendix D. Having
done so, we recommend standards for use.
We also consider the way in which the relative importance of the five
model performance attributes varies with the issue being addressed and the
pollutant being considered. We recommend a means for ranking problem types
that is dependent on these factors, using it as a way to decide from among
procedural alternatives when a model fails to display a particular attribute.
To illustrate how to interpret the values of the recommended perfor-
mance measures, we discuss a sample case. The sample case history is based
on the use of the grid-based SAI Airshed Model in modeling the Denver Met-
ropolitan region. Supplementary means for gaining insight into model
behavior are also shown.
Finally, a conceptual framework is suggested for a draft model perfor-
mance standard. The elements it should contain are discussed, as well as
its relationship to a supplementary guidelines document.
1-6
-------
With this final discussion, our presentation is complete, though the
subject itself is by no means exhausted. Considerable additional effort
is warranted, given the importance of this complex and difficult topic.
We suggest in Chapter VII several areas in which we feel such work would
prove fruitful.
1-7
-------
II SUMMARY
In this chapter we summarize the results of this study. First,
we state them in overall terms. Then, we summarize detailed results
on a chapter-by-chapter basis.
A. MAIN RESULTS
Several main tasks are accomplished in this report. These represent
the chief results of the study. We summarize them as follows:
> A conceptual framework is set for objective evaluation of
dispersion model performance (Chapter VI).
> An outline for a draft model performance standards document is
suggested (Chapter VI).
> Specific measures are recommended for use (Chapter VI).
> Specific rationales on which standards could be based are
developed, several of which represent research that is
original with this study (Chapter VI and Appendix D).
> Comprehensive background material is presented on key elements
of the performance evaluation problem: the types of issues to
be addressed (Chapter III and Appendix A), the classes of
models to be used along with the applications for which they
are suited (Chapter IV and Appendix B), and the categories of
performance measures available for consideration (Chapter V
and Appendix C).
> Guidance on the interpretation of performance measure values
is provided by means of an illustrative sample case (Chapter VI).
II-l
-------
B. DETAILED SUMMARY
Discussion in this report proceeds in two phases. In the first of these,
we present a comprehensive examination of key elements of the performance
evaluation problem. This background phase consists of the in-depth
analysis in Chapters III, IV and V, supported by material in Appendices
A, B and C.
We intend the background phase of this report to be regarded not as a
supplement but rather as an essential prelude to the second, or synthesis,
phase. The second phase, contained in Chapter VI and Appendix D, draws
from the background material to identify a set of performance criteria
that is both useful and computationally feasible.
In this section we present detailed summaries of the important
results of the report. We do so on a chapter-by-chapter basis.
1. Summary of Chapter III (Issues)
This chapter provides an issues framework within which the
application of air pollution models can be viewed. First, an overview
is provided, highlighting important aspects of federal air pollution
law (also see Appendix A). By means of this discussion, seven generic
classes of issues are identified. These issues are examined and
their implications for model applications explored.
The seven issue classes, divided into multiple-source and single-
source categories, are described as follows:
> Multiple-Source Issues
- SIP/C (State Implementation Plan/Compliance). The attainment
of regional compliance with NAAQS, as considered in the SIP.
- AQMP (Air Quality Maintenance Planning). Regional main-
tenance of compliance with the NAAQS, as considered in
the SIP.
11-2
-------
> Single-Source Issues
- PSD (Prevention of Significant Deterioration). Limitation
of the amount by which the air quality may be degraded in
areas in attainment of the NAAQS; this is considered in
each SIP.
- NSR (New Source Review). Permit process by which applicants
proposing new or modified stationary sources must demonstrate
that both directly and indirectly caused emissions are
within certain limits and that the pollution control to
be employed is performed with the best available tech-
nology; this is considered in each SIP.
- OSR (Offset Rules). Interpretive decision by which all
new or modified stationary sources in urban areas currently
in noncompliance with the NAAQS are judged unacceptable
unless the applicant can demonstrate a plan for reducing
emissions in an existing source by an amount greater
than the emissions from the proposed new sources; this
decision has a strong impact on the stationary source
permit process.
- EIS/R (Environmental Impact Statement/Report). A state-
ment of impact required for major projects undertaken by
the federal government or financed by federal funds
(EIS), or a report of project impact required of public
or private agencies by state or local statutes (EIR).
- LIT (Litigation). Court suits brought to resolve disagree-
ment over any of the issues mentioned above or to secure
variances waiving federal, state or local requirements.
2. Summary of Chapter IV (Models)
In Chapter III, we identified a set of generic air quality issues.
In this chapter, we define a set of generic model types. Having done so,
we match the two, identifying in generic terms those issues for which
each model may be a suitable analysis tool. We also describe the technical
formulations and underlying assumptions employed in each generic model
II-3
-------
type, indicating some key limitations. Through this presentation, we
specify the relationship between generic issues, models, and the appli-
cations for which they are suitable.
The generic classes of dispersion models that we consider are:
> Rollback
> Isopleth
> Physico-chemical
- Grid
• Region Oriented
• Specific Source Oriented
- Trajectory
• Region Oriented
• Specific Source Oriented
- Gaussian
• Long-Term Averaging
• Short-Term Averaging
- Box
In Table II-l we associate generic model types with air quality issues
for which their use is most appropriate. In Table II-2 we present model/
application combinations of interest, characterizing applications by five
attributes: number of sources, area type, pollutant, terrain complexity,
and required resolution. The table lists the values of the attributes that
can be accommodated by each model type.
In Table II-3 we relate some specific air quality models to the generic
model categories in which they may be classified. Each of these models is
described in detailed summary form in Appendix B.
3. Summary of Chapter V (Performance Measures)
In this chapter we discuss the types of performance measures available
for use, examining their relationship with both the issues
II-4
-------
TABLE II-1. AIR QUALITY ISSUES COMMONLY ADDRESSED, BY GENERIC MODEL TYPE

                                        Issue Category
Generic Model Type                      SIP/C  AQMP  PSD  NSR  OSR  EIS/R  LIT

Refined Usage
1. Grid (1)
   a. Region Oriented
   b. Specific Source Oriented
2. Trajectory (1)
   a. Region Oriented
   b. Specific Source Oriented
3. Gaussian
   a. Short-Term Averaging
      i)  Multiple Source
      ii) Single Source
   b. Long-Term Averaging
Refined/Screening Usage
4. Isopleth (1, 5)
Screening Usage
5. Rollback
6. Box

Notes:
1. Only short-term time scales can be considered (less than several days).
2. Regional impact of new sources can be assessed but not near-source, or microscale, effects.
3. Only non-reactive pollutants can be considered.
4. Only pollutants having long-term standards can be considered (SO2, TSP, and NO2).
5. Only photochemically active pollutants can be considered.
II-5
-------
TABLE II-2. MODEL/APPLICATION COMBINATIONS

Generic Model Type: Number of Sources; Area Type; Pollutant; Terrain Complexity; Required Resolution

REFINED USAGE
Grid
  a. Region Oriented: Multiple-Source; Urban, Rural; O3, HC, CO, NO2 (1-hour), SO2
     (3- and 24-hour), TSP; Simple, Complex (Limited); Temporal, Spatial
  b. Specific Source Oriented: Single-Source; Rural; O3, HC, CO, NO2 (1-hour), SO2
     (3- and 24-hour), TSP; Simple, Complex (Limited); Temporal
Trajectory
  a. Region Oriented: Multiple-Source; Urban; O3, HC, CO, NO2 (1-hour), SO2
     (3- and 24-hour), TSP; Simple; Temporal, Spatial (Limited)
  b. Specific Source Oriented: Single-Source; Urban, Rural; O3, HC, CO, NO2 (1-hour), SO2
     (3- and 24-hour), TSP; Simple, Complex (Limited); Temporal, Spatial (Limited)
Gaussian
  a. Long-Term Averaging: Multiple-Source, Single-Source; Urban, Rural; SO2 (annual), TSP,
     NO2 (annual)*; Simple, Complex (Limited); Spatial
  b. Short-Term Averaging: Multiple-Source, Single-Source; Urban, Rural; SO2 (3- and 24-hour),
     CO, TSP, NO2 (1-hour)*; Simple, Complex (Limited); Temporal, Spatial

REFINED/SCREENING USAGE
Isopleth: Multiple-Source; Urban; O3 (1-hour); Simple, Complex (Limited); Temporal (Limited)

SCREENING USAGE
Rollback: Multiple-Source, Single-Source; Urban, Rural; O3, HC, NO2, SO2, CO, TSP;
  Simple, Complex (Limited)
Box: Multiple-Source; Urban; O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP;
  Simple, Complex (Limited); Temporal

* Only if NO2 is taken to be total NOx.
11-6
-------
TABLE II-3. SOME AIR QUALITY MODELS
Generic Model Type
Refined Usage
Grid
a. Region Oriented
b. Specific Source Oriented
Trajectory
a. Region Oriented
b. Specific Source Oriented
Gaussian
a. Long-term Averaging
b. Short-term Averaging
Refined/Screening Usage
Isopleth
Screening Usage
Rollback
Box
Specific Model Name
SAI
LIRAQ
PICK
EGAMA
DEPICT
DIFKIN
REM
ARTSIM
RPM
LAPS
AQDM
CDM
CDMQC
TCM
ERTAQ*
CRSTER*
VALLEY*
TAPAS*
APRAC-1A
CRSTER*
HANNA-GIFFORD
HIWAY
PTMTP
PTDIS
PTMAX
RAM
VALLEY*
TEM
TAPAS*
AQSTM
CALINE-2
ERTAQ*
EKMA
WHITTEN
LINEAR ROLLBACK
MODIFIED ROLLBACK
APPENDIX 0
ATDL
* These models can be used for both long-term and short-term
averaging.
-------
and the models we identified in Chapters III and IV. Our discussion
proceeds as follows: We first identify generic types of performance
measures; we then catalogue some specific performance measures
(describing them in detail in Appendix C); and finally we match
generic performance measures to the issue/model/application combin-
ations presented in earlier chapters.
We consider four generic performance measure categories: peak,
station, area, and exposure/dosage. The first category contains
those measures deriving from the differences between the predicted and
observed concentration peak, its level, location and timing. The second
category includes measures based on concentration differences between
prediction and observation at specific measurement stations. Within the
third category are contained those measures based on concentration
field differences throughout a specified area. The fourth category
includes measures derived from differences in population exposure and
dosage within a specified area.
Each of these generic performance measure categories requires
successively greater knowledge of the spatial and temporal distribution
of concentrations. We show in Figure II-1 a schematic representation of
several distinct levels of knowledge about regional concentrations. A
similar schematic illustration appropriate for source-specific situations
is shown in Figure II-2. Listed in Table II-4 are the information require-
ments for the four categories. We also consider the relative likelihoods
that reliable information will be available supporting calculation of measures
from each of the four categories.
Three types of variations are recognized among performance measures:
scalar, statistical, and pattern recognition. Those measures of the
first type are based on a comparison of the predicted and observed
values of a specific quantity: the peak concentration level, for
instance. Those of the second type compare the statistical behavior
(the mean, variance, and correlation, for example) of the differences
between the predicted and observed values for the quantities of interest.
11-8
-------
FIGURE II-1. VARIOUS LEVELS OF KNOWLEDGE ABOUT REGIONAL CONCENTRATIONS
FIGURE II-2. VARIOUS LEVELS OF KNOWLEDGE ABOUT SPECIFIC-SOURCE CONCENTRATIONS
II-9
-------
TABLE II-4. GENERIC PERFORMANCE MEASURE INFORMATION REQUIREMENTS

Generic
Performance
Measure Type      Information Required

Peak              Predicted and measured concentration peak (level,
                  location, and time)

Station           Predicted and measured concentrations at specific
                  stations (temporal history), for stations i = 1, ..., M

Area              Predicted and measured concentration field within
                  a specified area (spatial and temporal history), i.e.,
                  C(x,y,t) predicted and C(x,y,t) measured

Exposure/dosage   Both the predicted and measured concentration
                  field and the predicted and actual population
                  distribution within a specified area (spatial
                  and temporal history)
11-10
-------
Measures of the final type are useful in triggering "pattern recognition,"
that is, providing qualitative insight into model behavior, transforming
concentration "residuals" (the differences between predicted and observed
values) into forms that highlight certain aspects of model performance.
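As a minimal illustration of these three types of variation (the particular
measures, station data, and concentration values below are hypothetical
assumptions, not quantities taken from this report), a scalar measure, a
statistical summary, and a simple pattern-recognition display could be
computed from one station's hourly prediction-observation pairs as follows:

    import numpy as np

    # Hourly concentrations (pphm) at one station; values are hypothetical.
    predicted = np.array([4.0, 7.5, 11.0, 13.5, 12.0, 8.0])
    observed  = np.array([5.0, 8.0, 12.5, 12.0, 10.5, 7.0])

    residuals = predicted - observed            # "predicted minus observed"

    # Scalar variation: a single number, e.g., the residual at the observed peak hour.
    peak_hour = int(np.argmax(observed))
    scalar_measure = residuals[peak_hour]

    # Statistical variation: summary statistics of the residuals.
    mean_residual = residuals.mean()
    residual_variance = residuals.var(ddof=1)
    correlation = np.corrcoef(predicted, observed)[0, 1]

    # Pattern-recognition variation: a display intended for qualitative inspection,
    # here a crude text profile of predicted and observed concentrations by hour.
    for hour, (p, o) in enumerate(zip(predicted, observed)):
        print(f"hour {hour:2d}  pred {p:5.1f}  obs {o:5.1f}  resid {p - o:+5.1f}")

    print(f"residual at peak hour: {scalar_measure:+.1f} pphm")
    print(f"mean residual {mean_residual:+.2f}, variance {residual_variance:.2f}, r = {correlation:.2f}")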
To illustrate the types of variations found in each generic
performance measure category, we present Table II-5. Some typical
examples are included for each category/variation combination. In
Section D of this chapter, a number of specific performance measures
are listed. Examined in detail in Appendix C, they are classified
according to the scheme presented here.
For reasons we examine in this chapter, performance measures may
be associated with the issue classes. We match issue with measure in
Table II-6, indicating where their calculation might be of use. Note
that NSR and PSD are both part of the preconstruction review process
for a new source.
Also, we may match measures to model type, as is shown in Table II-7.
This we do based on differences among model types in their ability to cal-
culate each of the measure types. Isopleth, rollback and box models, for
instance, provide insufficient spatial resolution for calculation of station,
area or exposure/dosage measures. Likewise, long-term averaging Gaussian
models lack sufficient temporal resolution to permit calculation of exposure/
dosage measures.
Several important conclusions are reached in this chapter about the
suitability for use of each of the four measure types:
> Performance measures relying on a comparison of the
predicted and "true" peak concentrations may not be
reliable in all circumstances, since measurement networks
can provide only the concentration at the station record-
ing the highest value, not necessarily the value at the
"true" peak.
11-11
-------
TABLE II-5. TYPES OF VARIATIONS AMONG GENERIC PERFORMANCE MEASURE CATEGORIES

Generic Performance   Type of
Measure Category      Variation     Typical Example

Peak                  Scalar        Concentration residual* at the peak.
                      Pattern       Map showing locations and values of maximum
                      Recognition   one-hour-average concentrations for each hour.

Station               Scalar        Concentration residual at the station measuring
                                    the highest value.
                      Statistical   Expected value, variance, and correlation coef-
                                    ficient of the residuals for the modeling day
                                    at a particular measurement station.
                      Pattern       At the time of the peak (event-related), the
                      Recognition   ratio of the residual at the station having
                                    the highest value to the average of the resi-
                                    duals at the other station sites (this can
                                    indicate whether the model performs better near
                                    the peak than it does throughout the rest of
                                    the modeled region).

Area                  Scalar        Difference in the fraction of the modeled area
                                    in which the NAAQS are exceeded.
                      Statistical   At the time of the peak, differences in the
                                    area/concentration frequency distribution.
                      Pattern       For each modeled hour, isopleth plots of the
                      Recognition   ground-level residual field.

Exposure/dosage       Scalar        Differences in the number of person-hours of
                                    exposure to concentrations greater than the
                                    NAAQS.
                      Statistical   Differences in the exposure/concentration fre-
                                    quency distribution.
                      Pattern       For the entire modeled day, an isopleth plot
                      Recognition   of the ground-level dosage residuals.

* Residual: The difference between "predicted" and "observed."
11-12
-------
TABLE II-6.
PERFORMANCE MEASURES COMMONLY
ASSOCIATED WITH SPECIFIC ISSUES
Performance Measure Type
Issue
Multiple-source
SIP/C
AQMP
Specific-source
PSD
NSR
OSR
EIS/R
LIT
Peak
X
X
X
X
X
X
Station
X
X
X
X
X
X
X
Area
X
X
X
X
X
X
X
Exposure/Dosage
X
X
TABLE II-7.
PERFORMANCE MEASURES THAT CAN BE
CALCULATED BY EACH MODEL TYPE
Model
Refined usage
Grid
Region oriented
Specific source oriented
Trajectory
Region oriented
Specific source oriented
Gaussian
Long-term averaging
Short-term averaging
Refined/screening usage
Isopleth
Screening usage
Rollback
Box
Performance Measure Type
Exposure/
Peak Station Area Dosage
11-13
-------
> Performance measures relying on a comparison of the
predicted and "true" concentration fields may not be
computationally feasible since neither predicted nor
"true" concentration fields are always resolvable,
spatially or temporally.
> Performance measures based upon a comparison of predicted
and "true" exposure/dosage, though they are appealing
because of their ability to serve as surrogates for the
health effects experienced by the populace, may not be
computationally feasible because of the difficulty in
measuring the "true" population distribution and the
"true" concentration field. (We do suggest in Chapter VI
and Appendix D, however, one means by which health effects
considerations can be accounted for implicitly.)
> Performance measures based upon a comparison of the
predicted and observed concentrations at station sites
in the measurement network may be of the greatest practical
value.
4. Summary of Chapter VI (Performance Standards)
The central purpose of this report is to suggest means for setting
performance standards for air quality dispersion models. In this
chapter we reach this goal. Our discussion proceeds as follows: First
we identify five key attributes of desirable model performance, evaluating
how their relative importance varies depending on the issue addressed and
the pollutant/averaging time considered; then we propose specific perfor-
mance measures appropriate for use in testing for the presence of these
attributes; and finally we suggest rationales on which to base the setting
of formal standards. Having recommended for use a list of performance mea-
sures and standards, we deal with two additional issues: interpretation
of the values of the measures, which we illustrate by means of a sample
case study, and promulgation of formal performance criteria, which we
explore by proposing an outline of a draft standard.
11-14
-------
The five attributes of desirable model performance are defined as
follows: accuracy of the peak prediction, absence of systematic bias,
lack of gross error, temporal correlation, and spatial alignment. Though
they are interrelated, each of the five performance attributes is distinct.
Consequently, we must employ different kinds of performance measures to
determine the presence or absence of each. We list in Table II-8 the
objectives of each type of performance measure.
TABLE II-8. PERFORMANCE MEASURE OBJECTIVES

Performance
Attribute             Objective of Performance Measures

Accuracy of the       Assess the model's ability to predict the concentra-
peak prediction       tion peak (its level, timing and location)

Absence of            Reveal any systematic bias in model predictions
systematic bias

Lack of gross         Characterize the error in model predictions both at
error                 specific monitoring stations and overall

Temporal              Determine differences between predicted and observed
correlation           temporal behavior

Spatial alignment     Uncover spatial misalignment between the predicted
                      and observed concentration fields
We clarify the difference between bias and error by means of the
following example. Suppose when we compare a set of model predictions
with station observations, we find several large positive residuals (pre-
dicted minus observed concentrations) balanced by several equally large
negative residuals. If we were testing for bias, we would allow the
oppositely signed residuals to cancel. A conclusion that the model dis-
played no systematic bias therefore might be a justifiable one. On the
other hand, were we testing for gross error, the signs of the residuals
would not be considered with oppositely signed residuals no longer allowed
to cancel. Because the absolute value of the residuals is large in our
example, we might well conclude that the model predictions are subject to
significant gross error.
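A minimal sketch of this distinction, using hypothetical residuals (the
values below are illustrative only), contrasts a bias test, in which
oppositely signed residuals cancel, with a gross-error test, in which they
do not:

    import numpy as np

    # Predicted-minus-observed residuals (pphm); values are hypothetical.
    residuals = np.array([+3.0, -3.0, +2.5, -2.5, +0.5, -0.5])

    # Bias test: oppositely signed residuals are allowed to cancel.
    mean_bias = residuals.mean()               # 0.0 here, suggesting no systematic bias

    # Gross-error test: signs are ignored, so large residuals cannot cancel.
    mean_abs_error = np.abs(residuals).mean()  # 2.0 here, indicating substantial error

    print(f"mean bias      = {mean_bias:+.2f} pphm")
    print(f"mean abs error = {mean_abs_error:.2f} pphm")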
11-15
-------
Which of these performance attributes, however, is most important?
This question has no unique answer, the relative importance of each
attribute depending on the type of issue the model is being used to address
and the type of pollutant under consideration. In order to relate attri-
bute importance to application issue in a more convenient manner, we pre-
sent in Table II-9 a matrix of generic issues (as defined earlier in this
report) and problem type. For each combination we indicate an "importance
category." We define the three categories based on how strongly we insist
that model performance be judged acceptable for the given problem type.
For Category 1, we require that the performance attribute must be present
(the problem type is of prime importance). For Category 2, the attribute
should be present but, if it is not, some leeway ought to be allowed, per-
haps at the discretion of a reviewer (although the attribute is of consider-
able importance, some degree of "mismatch" may be tolerable). For Category 3,
we are not insistent that the performance attribute be present, though we
state that as being a desirable objective (the attribute is not of central
importance). The reasoning behind the entries in this table is complex.
For this reason, we urge the reader to consult the detailed discussion in
Chapter VI, Section C.
TABLE II-9. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY ISSUE

                              Importance of Performance Attribute*
Performance Attribute       SIP/C  AQMP  PSD  NSR  OSR  EIS/R  LIT

Accuracy of the peak          1     1     1    1    2     1     1
prediction
Absence of systematic         1     1     1    1    1     1     1
bias
Lack of gross error           1     1     1    1    1     1     1
Temporal correlation          2     2     3    3    3     3     3
Spatial alignment             2     2     1    3    3     3     3

* Category 1 - Performance standard must always be satisfied.
  Category 2 - Performance standard should be satisfied, but some leeway
               may be allowed at the discretion of a reviewer.
  Category 3 - Meeting the performance standard is desirable but failure
               is not sufficient to reject the model; measures dealing
               with this problem should be regarded as "informational."
H-16
-------
The relative importance of each performance attribute also is dependent
on the type of pollutant being considered and the averaging time required
by the NAAQS. If a species is subject to a short-term standard, for
instance, model peak accuracy and temporal correlation might be of con-
siderable concern, depending on the issue being addressed. However, if
the species is subject to a long-term standard, neither of these are of
appropriate form. We indicate in Table II-10 a matrix of the problem types
and pollutant species. We rank each combination by the same importance
categories we used earlier in Table II-9.
Conceivably, a conflict might exist between the ranking indicated
by the issue and the pollutant matrices in Tables II-9 and 11-10. We
would resolve the conflict in favor of the less stringent of the two
rankings.
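The resolution rule can be stated compactly: the governing importance
category for an attribute is the less stringent (higher-numbered) of the
category drawn from the issue matrix and the category drawn from the
pollutant/averaging-time matrix. The sketch below illustrates this rule; the
few lookup entries shown are illustrative placeholders, not a transcription
of Tables II-9 and II-10.

    # Importance categories: 1 (must be satisfied) ... 3 (desirable only).
    # Entries below are illustrative placeholders, not the report's tables.
    issue_category = {
        ("SIP/C", "temporal correlation"): 2,
        ("PSD",   "spatial alignment"):    1,
    }
    pollutant_category = {
        ("O3 (1 hour)",  "temporal correlation"): 1,
        ("SO2 (3 hour)", "spatial alignment"):    2,
    }

    def governing_category(issue, pollutant, attribute):
        """Resolve a conflict in favor of the less stringent (larger) category."""
        return max(issue_category[(issue, attribute)],
                   pollutant_category[(pollutant, attribute)])

    print(governing_category("SIP/C", "O3 (1 hour)", "temporal correlation"))   # -> 2
    print(governing_category("PSD", "SO2 (3 hour)", "spatial alignment"))       # -> 2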
Having identified the problem types of interest, we then suggest
specific performance measures for use. Our recommended choice of perfor-
mance measures is based upon the following criteria:
> The measure is an accurate indicator of the presence of a
given problem type.
> The measure is of the "absolute" kind, that is, specific
standards can be set.
> Only station measures should be considered for use in
setting standards.* (This is more an "unavoidable" choice
than a "preferred" one.)
Based on these criteria, we recommend the set of measures described
in Table II-11. The use of ratios (Cp/Cm, for example) can intro-
duce difficulties: They can become unstable at low concentrations, and the
statistics of a ratio of two random variables can become troublesome. Never-
theless, when used properly their advantages can be offsetting. For example,
the use of Cp/Cm instead of (Cp - Cm) permits a health effects rationale to be
used in recommending a performance standard (see a later discussion).
*Note the caveat on pages VI-18 and VI-19, with respect to point source applications.
11-17
-------
We draw a distinction between those measures that are of general
use in examining model performance and the much smaller subset of measures
that are most amenable to the establishment of explicit standards. Many
measures can provide rich insight into model behavior, but the informa-
tion is conveyed in a qualitative way not suitable for quantitative
characterization (a requisite for use in setting performance standards).
TABLE II-10. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY POLLUTANT AND AVERAGING TIME

                        Importance of Performance Attribute*

Pollutant (averaging time‡):  O3 (1 hour), CO** (1 hour), HC (3 hour), SO2 (3 hour),
                              CO (8 hour), TSP** (24 hour), SO2 (24 hour),
                              NO2 (1 year)†, TSP (1 year), SO2 (1 year)

Performance attributes:  Accuracy of the peak prediction; Absence of systematic bias;
                         Lack of gross error; Temporal correlation; Spatial alignment

* Category 1 - Performance standard must be satisfied.
  Category 2 - Performance standard should be satisfied, but some leeway may be allowed
               at the discretion of a reviewer.
  Category 3 - Meeting the performance standard is desirable but failure is not
               sufficient to reject the model.
† No short-term NO2 standard currently exists.
‡ Averaging times required by the NAAQS are in parentheses.
** Primary standards.
†† N/A: the performance attribute is not applicable.
11-18
-------
TABLE II-11. MEASURES RECOMMENDED FOR USE IN SETTING MODEL PERFORMANCE STANDARDS†

Performance
Attribute           Performance Measure

Accuracy of the     Ratio of the predicted station peak to the measured station peak
peak prediction     (could be at different stations and times)

                    Difference in timing of occurrence of station peak*

Absence of          Average value and standard deviation of the mean deviation
systematic bias     about the perfect correlation line normalized by the average
                    of the predicted and observed concentrations, calculated for
                    all stations during those hours when either the predicted or
                    the observed values exceed some appropriate minimum value
                    (possibly the NAAQS)

Lack of gross       Average value and standard deviation of the absolute devia-
error               tion about the perfect correlation line normalized by the
                    average of the predicted and observed concentrations, calcu-
                    lated for all stations during those hours when either the
                    predicted or the observed values exceed some appropriate
                    minimum value (possibly the NAAQS)

Temporal cor-       Temporal correlation coefficients at each monitoring station
relation*           for the entire modeling period and an overall coefficient
                    averaged over all stations, for 1 <= i <= M monitoring
                    stations

Spatial alignment   Spatial correlation coefficients calculated for each modeling
                    hour considering all monitoring stations, as well as an over-
                    all coefficient averaged for the entire day, for 1 <= j <= N
                    modeling hours

* These measures are appropriate when the chosen model is used to consider questions
  involving photochemically reactive pollutants subject to short-term standards.
† There is deliberate redundancy in the performance measures. For example, in testing
  for systematic bias, both the mean normalized deviation and its standard deviation
  are calculated. The latter quantity is a measure of "scatter" about the perfect
  correlation line. This is also an indicator of gross error and could be used in
  conjunction with the mean and standard deviation of the absolute normalized deviation.
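The defining formulas for these measures are given in Chapter VI (Section
D.4). As a rough illustration of how the station measures in Table II-11
might be computed, the sketch below uses one plausible reading of the
definitions (deviations about the perfect correlation line normalized by the
average of the predicted and observed values, restricted to station-hours
above a cutoff); the concentration arrays, the cutoff, and the exact formulas
are assumptions made for illustration, not the report's prescribed equations.

    import numpy as np

    # Hourly concentrations (pphm), stations x hours; values are hypothetical.
    pred = np.array([[6.0, 10.0, 14.0, 12.0],
                     [5.0,  9.0, 13.0, 11.0],
                     [4.0,  8.0,  9.0,  7.0]])
    obs  = np.array([[7.0, 11.0, 12.5, 10.0],
                     [6.0,  8.5, 14.0, 12.0],
                     [5.0,  7.5, 10.0,  8.0]])
    cutoff = 8.0   # "appropriate minimum value" (possibly the NAAQS)

    # Accuracy of the peak prediction: ratio of the predicted station peak to the
    # measured station peak (the peaks may occur at different stations and times).
    peak_ratio = pred.max() / obs.max()

    # Consider only station-hours where either the prediction or the observation
    # exceeds the cutoff.
    mask = (pred > cutoff) | (obs > cutoff)
    norm_dev = (pred[mask] - obs[mask]) / ((pred[mask] + obs[mask]) / 2.0)

    bias_mean, bias_std = norm_dev.mean(), norm_dev.std(ddof=1)                  # systematic bias
    err_mean, err_std = np.abs(norm_dev).mean(), np.abs(norm_dev).std(ddof=1)    # gross error

    # Temporal correlation: one coefficient per station (across hours).
    r_time = [np.corrcoef(pred[i], obs[i])[0, 1] for i in range(pred.shape[0])]

    # Spatial correlation: one coefficient per modeled hour (across stations).
    r_space = [np.corrcoef(pred[:, j], obs[:, j])[0, 1] for j in range(pred.shape[1])]

    print(f"peak ratio              = {peak_ratio:.2f}")
    print(f"normalized bias         = {bias_mean:+.3f} (s.d. {bias_std:.3f})")
    print(f"normalized gross error  = {err_mean:.3f} (s.d. {err_std:.3f})")
    print(f"temporal r (overall)    = {np.mean(r_time):.2f}")
    print(f"spatial r (overall)     = {np.mean(r_space):.2f}")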
11-19
-------
These "measures," often involving graphical display, really are tools
for use in "pattern recognition." They display model behavior in
suggestive ways, highlighting "patterns" whose presence reveals much
about model performance. Several examples of such "measures" are
isopleth contour maps of predicted concentrations and estimates of
"observed" ones, isopleth contour maps of the differences between the
two, and time histories of predicted and observed concentrations at
specific monitoring stations.
Although we focus on station measures for use in setting model per-
formance standards, we do not suggest that the calculation of performance
measures be limited to such measures. Many other measures should be used
where appropriate. The data should be viewed in as many, varied ways as
possible in order to enrich insight into model behavior. We suggest a
number of useful measures both in Chapter V and Appendix C.
Having identified specific measures for use, we consider four rationales
for setting appropriate standards. The rationales, along with a statement
of their guiding principles, are shown in Table II-12. We discuss each in
detail in Appendix D.
The four rationales differ in their ability to consider each of the
five problem types. Shown in Table II-13 are the types of problems
addressable by measures whose standards are set by each of the rationales.
Only the Pragmatic/Historic rationale is of use in addressing all problem
types; the other three are of use principally in defining the level of
performance required in predicting values at or near the concentration
peak. In Table II-14 we associate each rationale with those issues
for which its use is appropriate.
We select from among the alternative rationales in the following ways.
Hoping to avoid introducing a procedural bias, we first eliminate the
Guaranteed Compliance rationale from further consideration. Then,
because the Health Effects rationale is better suited for use in setting
TABLE II-12. POSSIBLE RATIONALES FOR SETTING MODEL PERFORMANCE STANDARDS

Rationale: Health Effects
  Guiding Principle: The metric of concern is the area-integrated cumulative
  health effects due to pollutant exposure; the ratio of the metric's value
  based on prediction to its value based on observation must be kept to
  within a prescribed tolerance of unity.

Rationale: Control Level Uncertainty
  Guiding Principle: Uncertainty in estimates of the percentages of emissions
  control required must be kept within certain allowable bounds.

Rationale: Guaranteed Compliance
  Guiding Principle: Compliance with the NAAQS must be "guaranteed"; all
  uncertainty must be on the conservative side, even if this approach means
  introducing a systematic bias.

Rationale: Pragmatic/Historic
  Guiding Principle: In each new application, a model should perform at least
  as well as the "best" previous performance of a model in its generic class
  in a similar application; until such a historical data base is complete,
  other more heuristic approaches may be applied.
TABLE II-13. PERFORMANCE ATTRIBUTES ADDRESSABLE USING
             PERFORMANCE STANDARD RATIONALES

Performance Attribute             Health    Control Level   Guaranteed   Pragmatic/
                                  Effects*  Uncertainty*    Compliance   Historic
Accuracy of the peak prediction      X           X              X            X
Absence of systematic bias                                                   X
Lack of gross error                                                          X
Temporal correlation                                                         X
Spatial alignment                                                            X

* These are most suited for photochemically reactive pollutants subject
  to short-term standards.
TABLE II-14. ASSOCIATION OF RATIONALES WITH GENERIC ISSUES

Rationales: Health Effects, Control Level Uncertainty, Guaranteed Compliance,
and Pragmatic/Historic
Issue categories: Multiple-Source (SIP/C, AQMP, PSD) and
Specific-Source (NSR, OSR, EIS/R, LIT)
standards for peak measures, we choose to use it only in that way. As is
clear from Table II-13, we presently have no alternative but to apply
the Pragmatic/Historic rationale for those measures designed to test
for systematic bias and gross error as well as to evaluate temporal
correlation and spatial alignment.
Where we invoke the Pragmatic/Historic rationale as justification
for selecting specific standards, we also state the specific guiding
principles we follow. We summarize those here:
> When the pollutant being considered is subject to a short-
term standard, the timing of the concentration peak may be
an important quantity for a model to predict. This is parti-
cularly true when the pollutant is also photochemically
reactive. We state as a guiding principle: "For photochem-
ically reactive pollutants, the model must reproduce reason-
ably well the phasing of the peak." For ozone an acceptable
tolerance for peak timing might be ±1 hour.
> The model should not exhibit, at concentrations at or above some
appropriate minimum value (possibly the NAAQS), any systematic bias
greater than the maximum bias resulting from EPA-allowable calibration
error in the air quality monitors. We would consider in our calculations
any prediction-observation pair in which either of the values exceeds
the pollutant standard.
> Error (as measured by its mean and standard deviation) should not
be significantly different from the distribution of differences
resulting from the comparison of an EPA-acceptable monitor
with an EPA reference monitor. The EPA has set maximum
allowable limits on the amount by which a monitoring technique
may differ from a reference method (40 CFR § 53.20). An "EPA-
acceptable monitor" is defined here to be one that differs from
a reference monitor by up to the maximum allowable amount.
> Predictions and observations should appear to be highly correlated
at a 95 percent confidence level, both when compared temporally and
spatially. We can estimate the minimum allowable value for the
respective correlation coefficient by using a t-statistic at the
appropriate percentage level and having the degrees of freedom
appropriate for the number of prediction-observation pairs (a sketch
of this conversion follows the list).
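For the last of these principles, one common conversion, offered here only as
a sketch and assuming a one-sided test on Pearson's r with n - 2 degrees of
freedom, is the following.

```python
from math import sqrt
from scipy import stats

def minimum_significant_r(n_pairs, confidence=0.95):
    """Smallest correlation coefficient judged significant at the given
    one-sided confidence level for n_pairs prediction-observation pairs."""
    df = n_pairs - 2
    t_crit = stats.t.ppf(confidence, df)
    # From t = r * sqrt(df) / sqrt(1 - r**2), solved for r.
    return t_crit / sqrt(t_crit ** 2 + df)
```

For example, with ten pairs the threshold works out to about 0.55.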
The guiding principles noted above are plausible ones, though in
some cases they are arbitrary. As a "verification data base" of
experience is assembled, historically achieved performance levels may
be better indicators of the expected level of model performance.
Standards derived on this more pragmatic basis may supplant those
deriving from the "guiding principles" followed in this report.
Our recommended choice for use, when possible, in establishing peak-
accuracy standards is a composite one, combining the Health Effects and
Control Level Uncertainty rationales. Were a model to overpredict the
peak, a control strategy based on its prediction might be expected
to abate the health impact actually occurring, though with more control
than actually needed. If the model underpredicted, however, the control
strategy might be "underdesigned," with the risk existing that some of the
health impact might remain unabated even after control implementation.
The penalty, in a health sense, is incurred only when the model underpre-
dicts. The Health Effects rationale then is one-sided, helping us set
performance standards only on the "low side."
On the other hand, the Control Level Uncertainty rationale is
bounded "above" and "below", that is, its use provides a tolerance
interval about the value of the measured peak concentration. For a
model to be judged acceptable under this criterion, its prediction of
the peak concentration would have to fall within this interval. Model
underprediction could lead to control levels lower than required and thus
to residual health risks. Overprediction, on the other hand, could lead
to abatement strategies posing little or no health risk but incurring
control costs greater than required.
For the above reasons, we suggest that the Control Level Uncer-
tainty rationale be used to establish an upper bound (overprediction)
on the acceptable difference between the predicted and observed peak.
We would choose the lower bound (underprediction) to be the interval
that is the minimum of that suggested by the Health Effects and
Control Level Uncertainty rationales.
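A hedged sketch of that composite rule follows; the function and the numerical
tolerances in the usage line are illustrative assumptions, not values taken
from the report.

```python
def peak_ratio_interval(he_lower, clu_lower, clu_upper):
    """Acceptance interval for the ratio of predicted to observed peak.
    The upper bound comes from Control Level Uncertainty alone; the lower
    bound is the more restrictive (larger) of the Health Effects and
    Control Level Uncertainty lower bounds."""
    return max(he_lower, clu_lower), clu_upper

# Illustrative tolerances only:
low, high = peak_ratio_interval(he_lower=0.80, clu_lower=0.70, clu_upper=1.50)
```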
We list our recommendations in Table II-15, noting the possibility
that the recommended rationales may not be appropriate in all applications
for all pollutants. Whether health effects would be an appropriate con-
sideration for TSP, for instance, is unclear. The Health
Effects rationale, as defined in Appendix D, is best suited for use in
urban applications involving short-term, reactive pollutants. In those
circumstances when the HE or CLU rationales are not suitable, we suggest
the Pragmatic/Historic rationale.
We summarize in Table II-16 our list of recommended performance
measures and standards. In it, we associate performance attribute
and standard. To further describe the standard, we state the type of
rationale used and the guiding principle followed, as well as providing
sample values that are appropriate for the sample case we consider
in this chapter.
We also discuss two supplementary subjects. First, we illustrate
how performance measure values may be interpreted by describing a
sample case based on use of the SAI Airshed Model in simulating the
Denver metropolitan region. Then, we consider means by which model
performance criteria may be promulgated, suggesting an outline for a
draft standard.
Thus we conclude this chapter and the report. We note in closing
that the subject of model performance evaluation is by no means exhausted. Many
areas remain to be explored in greater detail, all warranting considerable
additional effort.
TABLE II-15. RECOMMENDED RATIONALES FOR SETTING STANDARDS

Performance Attribute          Recommended Rationale
Accuracy of peak prediction    Health Effects* (lower side/underprediction);
                               Control Level Uncertainty* (upper side/overprediction)
Absence of systematic bias     Pragmatic/Historic
Lack of gross error            Pragmatic/Historic
Temporal correlation           Pragmatic/Historic
Spatial alignment              Pragmatic/Historic

* These may not be appropriate for all regulated pollutants in all applica-
  tions. When they are not, the Pragmatic/Historic rationale should be
  employed. They are most applicable for photochemically reactive pol-
  lutants subject to a short-term standard (O3 and NO2, if a 1-hour
  standard is set).
TABLE II-16. SUMMARY OF RECOMMENDED PERFORMANCE MEASURES AND STANDARDS

Performance Attribute: Accuracy of the peak prediction
  Performance Measure: Ratio of the predicted station peak to the measured
  station peak (could be at different stations and times)
  Type of Rationale: Health Effects† (lower side) combined with Control Level
  Uncertainty† (upper side)
  Guiding Principle: Limitation on uncertainty in aggregate health impact and
  pollution abatement costs
  Sample Value (Denver Example): 80 percent ≤ Cp/Cm ≤ 150 percent

  Performance Measure: Difference in timing of occurrence of station peak*
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: Model must reproduce reasonably well the phasing of the
  peak (to within about 1 hour)
  Sample Value (Denver Example): ±1 hour

Performance Attribute: Absence of systematic bias
  Performance Measure: Average value and standard deviation of the mean
  deviation about the perfect correlation line, normalized by the average of
  the predicted and observed concentrations, calculated for all stations
  during those hours when either predicted or observed values exceed some
  appropriate minimum value (possibly the NAAQS)
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: No or very little systematic bias at concentrations
  (predictions or observations) at or above some appropriate minimum value
  (possibly the NAAQS); the bias should not be worse than the maximum bias
  resulting from EPA-allowable monitor calibration error (about 8 percent is
  a representative value for ozone); the standard deviation should be less
  than or equal to that of the difference distribution of an EPA-acceptable
  monitor** compared with a reference monitor (3 pphm is representative for
  ozone at the 95 percent confidence level)
  Sample Value (Denver Example): No apparent bias at ozone concentrations
  above 0.06 ppm (see Table VI-12 and Figures VI-5 and VI-6 for further
  details)

Performance Attribute: Lack of gross error
  Performance Measure: Average value and standard deviation of the absolute
  deviation about the perfect correlation line, normalized by the average of
  the predicted and observed concentrations, calculated for all stations
  during those hours when either predicted or observed values exceed some
  appropriate minimum value (possibly the NAAQS)
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: For concentrations at or above some appropriate minimum
  value (possibly the NAAQS), the error (as measured by the overall values of
  the mean absolute deviation and its standard deviation) should be
  indistinguishable from the difference resulting from comparison of an
  EPA-acceptable monitor** with a reference monitor
  Sample Value (Denver Example): No excessive gross error (see Table VI-12
  and Figures VI-5 and VI-6 for further details)

Performance Attribute: Temporal correlation*
  Performance Measure: Temporal correlation coefficients at each monitoring
  station for the entire modeling period and an overall coefficient for all
  stations, r_t,i and r_t,OVERALL, for 1 ≤ i ≤ M monitoring stations
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: At a 95 percent confidence level, the temporal profile
  of predicted and observed concentrations should appear to be in phase (in
  the absence of better information, a confidence interval may be converted
  into a minimum allowable correlation coefficient by using an appropriate
  t-statistic)
  Sample Value (Denver Example): For each monitoring station,
  0.69 ≤ r_t,i ≤ 0.97; overall, r_t,OVERALL = 0.88. In this example a value
  of r ≥ 0.53 is significant at the 95 percent confidence level.

Performance Attribute: Spatial alignment
  Performance Measure: Spatial correlation coefficients calculated for each
  modeling hour considering all monitoring stations, as well as an overall
  coefficient for the entire day, r_x,j and r_x,OVERALL, for 1 ≤ j ≤ N
  modeling hours
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: At a 95 percent confidence level, the spatial
  distribution of predicted and observed concentrations should appear to be
  correlated
  Sample Value (Denver Example): For each hour, -0.43 ≤ r_x,j ≤ 0.66;
  overall, r_x,OVERALL = 0.17. In this example a value of r ≥ 0.71 is
  significant at the 95 percent confidence level.

* These measures are appropriate when the chosen model is used to consider
  questions involving photochemically reactive pollutants subject to
  short-term standards.
† These may not be appropriate for all regulated pollutants in all
  applications. When they are not, the Pragmatic/Historic rationale should
  be employed.
** The EPA has set maximum allowable limits on the amount by which a
  monitoring technique may differ from a reference method. An
  "EPA-acceptable monitor" is defined here to be one that differs from a
  reference monitor by up to the maximum allowable amount.
Ill ISSUES REQUIRING MODEL APPLICATION
Air pollution models have been developed over a period of years, not
always in response to specific needs. While convenience and availability
(rather than strict suitability) often motivated their use in particular
applications, certain classes of models have come to be associated with
certain classes of applications. For this reason, it is helpful to view
the setting of model performance measures and standards within that issue-
specific context. This chapter is intended to provide an issues framework
within which the application of air pollution models can be viewed. First,
an overview is provided, highlighting important aspects of air pollution
law. By means of this discussion, generic issues are identified. Then,
these issues are examined and their implications for model applications
explored.
A. A PERSPECTIVE ON THE ISSUES
Basic air pollution law in this country has been enacted at the fed-
eral level, although many important legal variants exist among states and
localities. The passage of legislation, however, is often just a first
step. Usually, only broad authority is granted in the original law. It
remains to the federal agency thus chartered by the Congress to set the
specific regulations implementing the law. These are then promulgated,
becoming an additional part of the Code of Federal Regulations (CFR).
Notice is provided of such an action by publication in the Federal Register
(FR). When disagreements exist over the degree to which the promulgated
regulations mirror the intent of the original law, civil suits may be brought
in court to resolve disputes. Judgments in such suits can and have had
important effects on the CFR. In the remainder of this section we will
explore briefly the body of air pollution law, from enabling legislation to
promulgation of regulations in the CFR.
1. Federal Air Pollution Law
Basic federal law is contained in the United States Code (USC). It
is divided into "Titles" which are themselves divided into "Sections."
Groups of sections form "Chapters." Title 42 of the USC (usually denoted
as 42 USC) is entitled "The Public Health and Welfare." It contains the
basic law pertaining to air pollution: Chapter 15B entitled "Air Pollution
Control" and Chapter 55 entitled "National Environmental Policy."
The Clean Air Act is contained in Section 1857 of Title 42 (within
Chapter 15B) and is referenced by the notation 42 USC §1857. Originally
enacted in 1963, it has since been amended a number of times. The most
notable changes occurred with the passage of the Clean Air Act Amendments
of 1970 and 1977, the former of which, among other things, created the
Environmental Protection Agency (EPA), authorized the setting of national
ambient air quality standards (NAAQS) and required the development of state
implementation plans (SIPs) for the attainment of compliance with the NAAQS.
After passage by the Congress and signature by the President, a bill con-
taining such amendments or providing for new portions of the USC becomes
a part of the public law and is referred to both by the Congressional ses-
sion and a passage sequence number. The 1970 Amendments, for example, are
referred to as Public Law 91-604. For reference, the 91st Congress convened
for the two years from January 1969 to January 1971.
The other legislation most heavily affecting air pollution law is the
National Environmental Policy Act (NEPA) of 1969 (Public Law 91-190), which
amended Chapter 55 (National Environmental Policy) of Title 42. In its
primary features, the act created the Council on Environmental Quality
reporting to the President and mandated the preparation of environmental
impact statements (EISs) for "major Federal actions significantly affecting
the quality of the human environment." These are required for federal agency
actions and for projects supported "in whole or in part" with federal finan-
cing. The NEPA is found in 42 USC §4321, 4331 to 4335, and 4341 to 4347.
2. The Code of Federal Regulations
Implementation of federal law is accomplished by promulgation of
specific regulations, the body of which is contained in the Code of Fed-
eral Regulations. The CFR is divided into "Titles" (not the same as those
in the USC), which are themselves subdivided into "Chapters," "Subchapters,"
and "Parts." All federal regulations pertaining to air pollution are con-
tained in Title 40 which is called "Protection of the Environment." The
formal organization of 40 CFR is shown in Exhibit III-l. Note that Title 40
contains no Chapters II and III.
Subchapter C, "Air Programs," is expanded in that exhibit to include
"Part" subheadings as is Chapter V, "Council on Environmental Quality."
The following parts within Chapter I are of particular importance. In Part
50 the primary and secondary NAAQS are set for sulfur dioxide, particulate
matter, carbon monoxide, photochemical oxidants, hydrocarbons, and nitrogen
dioxide. In Part 51 requirements are stated for the development of SIPs.
All State plans, whether approved or disapproved, are published in Part 52.
In Part 60 the emissions standards are set for new and modified stationary
sources. Further breakdown of these parts by section heading is provided
in Appendix A.
As originally conceived, SIPs were blueprints for achieving compliance
with the NAAQS. As the regulations have evolved, however, they now require
that SIPs provide for air quality maintenance (AQM) once compliance
has been achieved. SIPs are currently being revised according to the man-
dates of the 7 August 1977 Clean Air Act Amendments and are required to
be reassessed periodically as to their ability to attain and maintain the
NAAQS.
EXHIBIT III-1. FORMAL ORGANIZATION OF CFR TITLE 40—
PROTECTION OF ENVIRONMENT
Chapter 1. Environmental Protection Agency
Subchapter A - General (Parts 0-21)
Subchapter B - Grants and Other Federal Assistance (Parts 30-49)
Subchapter C - Air Programs (Parts 50-89)
Part 50. National primary and secondary ambient air quality
standards
Part 51. Requirements for preparation, adoption, and
submittal of implementation plans
Part 52. Approval and promulgation of implementation plans
Part 53. Ambient air monitoring reference and equivalent
methods
Part 54. Prior notice of citizen suits
Part 55. Energy related authority
Part 60. Standards of performance for new stationary sources
Part 61. National emission standards for hazardous air
pollutants
Part 79. Registration of fuels and fuel additives
Part 80. Regulation of fuels and fuel additives
Part 81. Air quality control regions, criteria, and control
techniques
Part 85. Control of air pollution from new motor vehicles and
new motor vehicle engines
Part 86. Control of air pollution from new motor vehicles and
new motor vehicle engines: certification and test
procedures
Part 87. Control of air pollution from aircraft and aircraft
engines
Part 88-89. [Reserved]
Subchapter D - Water Programs (Parts 100-149)
Subchapter E - Pesticide Programs (Parts 162-180)
Subchapter F - [Reserved]
Subchapter G - Noise Abatement Programs (Parts 201-210)
Subchapter H - Ocean Dumping (Parts 220-230).
Subchapter I - Solid Wastes (Parts 240-399)
Subchapter N - Effluent Guidelines and Standards (Parts 401-460)
Subchapter Q - Energy Policy (Part 600)
Chapter IV. Low Emissions Vehicle Certification Board (Part 1400)
Chapter V. Council on Environmental Quality (Parts 1500-1510)
Part 1500. Preparation of environmental impact statement:
Guidelines
Part 1510. National oil and hazardous substances pollution
contingency plan
Contained within SIPs are procedures for controlling emissions from both
mobile and stationary sources. Because of the size and age of the vehicle
fleet, control of emissions from mobile sources is currently an important
part of those SIP segments dealing with NAAQS compliance. As stricter auto-
motive emissions standards are achieved and older cars are removed from high-
ways through age attrition, stationary sources will contribute an increasing
fraction of the total emissions inventory. Their importance thus increases
in the AQM segment of the SIPs.
The portion of 40 CFR relating to the review of applications for
new or modified stationary sources is Section 51.18. There it is stated
that "no approval to construct or modify will be granted unless the appli-
cant shows to the satisfaction of the Administrator that the source will
not prevent or interfere with attainment or maintenance of any national
standard." The quote Is a paraphrase of §51.18(a), as written in the
California SIP [40 CFR §52.233(g)(3)]. Several issues of practical impor-
tance derive from this section of 40 CFR. New source review (NSR) proce-
dures are thus required, with such stationary sources directed to meet
new source performance standards (NSPS) where stated in 40 CFR §60 or as
determined by the appropriate reviewing agency and to install appropriate
pollution control equipment. Also, an important consequence of 40 CFR
§51.18 derives from its interpretation in urban areas currently in noncom-
pliance with the NAAQS. In most instances, the addition of a single, modestly
sized, stationary source would be unlikely to affect regional peak pollutant
concentration. Considered separately, an argument could be made that few new
stationary sources violate the letter of §51.18. Taken in the aggregate,
however, emissions from several new sources together could have serious ad-
verse effects on regional pollutant concentrations. To overcome this inter-
pretive difficulty, the EPA has employed the so-called offset rules (OSR).
All new stationary sources in noncompliant urban areas are considered to be
in violation of §51.18 unless the applicant can demonstrate that a reduction
in emissions from other sources and a reduction in the air quality impact of
those emissions has been achieved to offset those produced by the proposed
new source.
Another issue of importance in SIP development is the prevention of
significant deterioration (PSD) of the air quality in areas currently in
attainment of the NAAQS. Originally, 40 CFR contained no provision for
consideration of PSD. A court suit, however, brought about a judgment
that SIPs must address this issue. As a consequence, subsequent to May 31,
1972, the EPA Administrator disapproved all SIPs not considering PSD.
Standards for PSD were promulgated in §52.21, entitled "Significant Deteriora-
tion of Air Quality."
In addition to SIPs, environmental impact statements and reports
(EIS/R) represent the other major class of planning documents formally
required to address air quality issues. In Chapter V of 40 CFR, guidelines
are provided for drafting EISs for major federal actions. They are required
not only for projects undertaken solely by the federal government, but also
for any major projects supported "in whole or in part" by federal financing.
EISs were submitted to the CEQ for review. They are now, however, received
and reviewed by the EPA. State and local agencies can also require for in-
dividual projects a formal statement of environmental impact. In California,
for instance, such a statement is called an "Environmental Impact Report"
(EIR) and is filed pursuant to the California Environmental Quality Act (CEQA).
Running throughout air pollution law is the basic right of legal appeal.
Court suits have played an important part in shaping the body of the law.
Portions of the authorizing statutes, the CFR, and many individual EIS/Rs
have come under legal challenge. As a result, litigation (LIT) also re-
presents an important class of issues addressed by air pollution modelers.
B. GENERIC ISSUE CATEGORIES
In the previous section we have outlined many of the important features
of air pollution law. A number of generic issues thereby have been ident-
ified. In this section we will summarize these generic issues, discuss each
briefly, and then examine their implications for air pollution modeling.
In the next chapter we will match these issue categories with a number of
existing models, comparing application requirements with model capabilities.
1. The Issues: Their Classification
The air pollution burden in a geographical area is the result of the
complex interaction of emissions from all sources as they mix and disperse
in the atmosphere, subject to prevailing influences of meteorology, solar
irradiation, and terrain. The total pollutant concentrations experienced
are a function of the effects of emissions from each of the mobile and
stationary emitting sources, though that function is generally not a
linearly additive one. Because the NAAQS are expressed in terms of total
allowable concentration levels and are applicable at any location to which
the public has access, implementation plans are inherently regional in
perspective. There is a certain duality of focus in SIPs, however: While
they detail plans for regional NAAQS compliance and maintenance, they do so
through curtailment of emissions from individual sources and source cate-
gories. Thus, while the focus is ultimately on regional effects, the environ-
mental impact of individual sources also must be considered. This is an
explicit issue with new source review (NSR), for instance. As the number of
sources to be considered decreases, the two perspectives--regional and
single source-specific--merge together. A case in point is the examination
of the impacts of a few sources located in a rural area, where prevention
of significant deterioration (PSD) is an issue.
From the discussion of air pollution law presented earlier, we have
isolated several specific issues, each falling into one of two distinct
generic issue categories. The chief distinction between the two is not
simply the difference between regional and source-specific perspective, for
each individual source has both a regional and a localized downwind impact.
Rather, the clearest distinction lies in the number of sources considered.
Questions of regional NAAQS compliance and maintenance are multi-source
issues. NSR, on the other hand, primarily concerns a single source. Using
such a distinction, the principal issues addressed by air quality planners
are as follows:
Multiple-Source Issues
- SIP/C (State Implementation Plan/Compliance). The
attainment of regional compliance with the NAAQS,
as considered in the SIP.
- AQMP (Air Quality Maintenance Planning). Regional
maintenance of compliance with the NAAQS, as con-
sidered in the SIP.
Single-Source Issues
- PSD (Prevention of Significant Deterioration). Limita-
tion of the amount by which the air quality can be de-
graded in areas currently in attainment of the NAAQS;
this is considered in each SIP.
- NSR (New Source Review). Permit process by which appli-
cants proposing new or modified stationary sources must
demonstrate that both directly and indirectly caused
emissions are within certain limits and that the pollu-
tion control to be employed uses the
appropriate technology; this is considered in each SIP.
- OSR (Offset Rules). Interpretive decision by which all
new or modified stationary sources in urban areas cur-
rently in noncompliance with the NAAQS are judged unac-
ceptable unless the applicant can demonstrate a plan for
reducing emissions in existing sources and that a reduc-
tion in the air quality impact of these emissions has
been achieved to offset those produced by the proposed
new source; this decision has a strong impact on the
stationary source permit process.
- EIS/R (Environmental Impact Statement/Report). A state-
ment of impact required for major projects undertaken
by the federal government or financed by federal funds (EIS),
or a report of project impact required by state or local
statutes (EIR).
- LIT (Litigation). Court suits brought to resolve disagree-
ment over any of the issues mentioned above or to secure
variances waiving federal, state or local requirements.
The above seven issues are classified according to their most fre-
quently encountered form. We note that actual cases do not always conform
to the bounds of the generic issue categories as shown. An EIS, for
instance, can have a regional perspective, as with the Denver Overview EIS
recently completed for Region VIII of the EPA. Also, LIT can occasionally
have effects on regional NAAQS compliance and maintenance. For example, PSD
and AQMP resulted from court suits.
2. The Issues: Some Practical Examples and Their Implications
for Air Pollution Modeling
Many practical examples can be found in which the issues identified
above play an important role in planning. At this point, we will discuss
some of the more important applications in which they are likely to be
encountered. Modeling requirements can thus be identified. This discus-
sion will serve as a prelude to the examination of air pollution models
presented in the next chapter.
First, we consider the nature of multiple-source (M/S) issue appli-
cations. SIP/C and AQMP can focus both on urban areas as well as on large
rural sources. Here we concentrate on the most frequently encountered
applications, those in urban areas. Encountered in such regions are both
reactive pollutants [ozone (O3), hydrocarbons (HC), and nitrogen dioxide
(NO2)] and relatively nonreactive pollutants [carbon monoxide (CO), sulfur
dioxide (SO2),* and total suspended particulates (TSP)]. There are a
variety of different source types: point sources (power plants, refin-
eries, and large industrial plants, such as steel, chemical and manufac-
turing companies), line sources (highway, railroads, shipping lanes, and
airport runways), and area sources (home heating, light industrial users
of volatile chemicals, street sanding, gasoline distribution facilities,
and shipping ports). Mobile sources (cars, trucks, and buses) almost
invariably can be aggregated into highway line sources. While a few
cities with air pollution problems are located in complex terrain (Pitts-
burgh, for example), most are situated in relatively flat or gently rolling
terrain. Geographical features can play an important part in regional air
pollution (for instance, the ocean near Los Angeles, the lake near Chicago,
and the mountains near Denver).
* Sulfur dioxide is slowly reactive: SO2 is converted to sulfate (SO4=) aerosol.
Air pollution modeling in such circumstances has been used for several
principal purposes. It has been useful in estimating the total amount of
emissions cutback required to reach compliance with the NAAQS. Individual
control strategies also have been assessed, both for SIP/C and AQMP. In-
sights from regional modeling have been useful in modifying and improving
pollutant measurement network design. In Denver, for instance, use of the
SAI Urban Airshed Model indicated for a particular model day the presence
of an ozone (O3) peak in a then-unmonitored area. Subsequent location of a
temporary monitoring station at that site led to the observation of O3
readings in excess of any previously measured. Also, models have had an
influence on transportation network design (the balance of freeways, arterials,
and feeders) and modal split (the mix between personal and mass transit).
Through the EIS/R process, individual projects (for example, the Interstate
470 freeway and the construction of wastewater treatment facilities, both
in Denver) have been examined using models to estimate air quality impact.
Second, we consider the nature of stationary single-source (S/S)
issues. Important applications occur in both urban and rural areas. These
focus on the following: (1) SIP/C and the permit approval process for new
or modified stationary sources and (2) the variance process for existing
facilities. As for the first of these, SIP/C and the permit approval pro-
cess, all new or modified major S/Ss, urban and rural, are subject to NSR
and must meet NSPS and use the best available pollution control equipment.
Also, both direct and indirect impact on air quality must be considered.
In urban areas, major S/Ss might include proposed refineries, power
plants, and industrial facilities, as well as shopping, employment, and
recreational/sports centers. With the last of these, indirect effects are
particularly important. Each draws appreciable numbers of automobiles,
adding to local vehicle miles traveled (VMT) and increasing congestion and
thus pollutant emissions. Also, automobile hot soak and some cold start
emissions are concentrated in accompanying parking lots.
Urban S/Ss are dealt with in the SIP/C and permit application process
differently than are rural S/Ss. In urban areas in noncompliance with the NAAQS,
OSR must be considered. The air pollution modeler must be able not only to
represent the regional and localized downwind impact of the new S/S but also
to estimate the subtractive effect of reducing emissions from one or more
existing sources.
Another difference between urban and rural areas has important signif-
icance for the modeler. In rural areas, the relatively nonreactive pollu-
tants (SO2 and TSP) are often of greater interest than are the more reactive
ones. Although the NOx emissions also produced at some point could gener-
ate, with the addition of HC, photochemically reactive pollutants, they are
usually not of primary concern. In urban areas, the reactive pollutants
(O3, NO2, and HC) must also be modeled. When the incremental effect of a
S/S is being considered in an urban area (OSR, as well), this distinction
can have a strong effect on model choice. This is particularly true when an
S/S emits O3 precursors such as NOx, which power plants do, or HC, which
refineries do.
In rural areas, applications centering on energy development have been
prominent in recent years, particularly in the northern and central Great
Plains. The direct air pollution impact of these S/Ss would be produced by
coal extraction (strip mining), conversion to natural gas, transport to
energy production facilities if they are not on site (via unit train or
slurry pipelines), or coal combustion in large power plants. Indirect impact
would result from the construction of the above-mentioned facilities (new
highways, provision for temporary construction crews) and the growth of nearby
"boom" towns (housing for families of workers and the additional population
increase required to provide commercial and public services to workers).
A complicating factor not confronted in nonattainment regions arises in
attainment areas: PSD must be considered. No S/S or combination of them is
permitted to degrade significantly the air quality in nonpolluted rural areas.
In each SIP such areas are identified. The modeler must be able to assess the
likelihood that an S/S will impinge on such areas to an unacceptable degree.
Also, because pollutants from rural sources are either inert or slow in
reacting and because surface deposition, rainout, and washout often proceed
at slow rates (depending on synoptic meteorology), atmospheric residence times
are long for some pollutants such as the derivative products of SO2. Trans-
port distances on the order of a thousand kilometers may not be unusual. The
modeler must be able to account for pollutant transport and transformation
on this temporal and spatial scale, if required.
In both urban and rural areas, the owner of a S/S has the right to seek
a variance temporarily excusing the source from provisions of the law, but
not such as to cause a violation of the NAAQS. A number of reasons could
motivate such a request. For a power plant, petroleum shortages could result
in a need to burn high-sulfur fuel. For a refinery, petroleum storage and
shipping needs might result in a variance request. Other reasons might include
a need for an extension of the time required to comply with SIP control
strategy requirements or for periodic pollution control equipment maintenance
or replacement.
3. The Issues: A Prologue to the Next Chapter
In this chapter, we have examined the body of air pollution law and
identified two generic issue categories: multiple-source issues and single-
source issues. Seven separate (though interrelated) types of issues were
classified within that structure: SIP/C, AQMP, PSD, NSR, OSR, EIS/R, and
LIT.
We have examined some practical examples illustrating particular
features of these issues as they manifest themselves in both urban and rural
areas. We have also discussed some key implications that these issues have
for air pollution modeling. This serves as an important prologue to the
discussion of specific models undertaken in the next chapter. In that
chapter we will match application requirements to model capabilities. The
issues identified here will serve as the framework within which that dis-
cussion is carried out.
IV AIR QUALITY MODELS
In the last chapter, we identified generic types of air quality issues.
In this chapter, we define generic classes of models. Having done so,
we match the two, identifying those issues for which each model may be a
suitable analytical tool. We also describe the technical formulations
and underlying assumptions employed in each generic model class, indicating
some key limitations.
The final choice of a model for use in addressing a particular issue
can be made only by considering the characteristics of the proposed applica-
tion. To facilitate the comparison between model capabilities and applica-
tions requirements, we define a set of applications attributes. We then
match the two, identifying for each generic model the combinations of
application attributes for which it is suited. A related means for match-
ing model to application is described in EPA (1978a).
In this chapter we attempt to specify the relationship between issues,
models, and applications. Having done so, we then develop in Chapter V
model performance measures appropriate to each issue/model combination of
practical interest. This will set the stage for a discussion of requisite
model performance standards in Chapter VI.
In order to preserve generality, our emphasis in this chapter centers
primarily on generic model categories rather than on specific air quality
models. Certain benefits may be achieved thereby: General conclusions
appropriate to an entire class of models may be stated without reference
to any particular model, and extensive discussions of any observed differ-
ences between intended capabilities and technically achieved ones need not
be conducted for each specific model.
Our central purpose in this report is to discuss means for setting
model performance standards. While not central to this, however, we do
recognize a need to associate some specific models with our generic
model categories. To assist in doing so, we examine in Appendix B a number
of air quality models. Though the list is not a complete one, a number of
available models are examined in detail and tabulated according to several
attributes. Among these are the following: level of intended usage
(screening or refined), type of pollutant (reactivity, averaging time),
degree of resolution (spatial and temporal), and certain site specifics
(terrain, geography, as well as source type and geometry).
We summarize at the end of this chapter that part of Appendix B needed
to associate specific models with our generic categories. No attempt is
made in this chapter or in Appendix B to screen models for technical accept-
ability nor is any attempt made to be all-inclusive. Models are classified
according to their intended capabilities rather than their technically achieved
ones. Among the references we have drawn upon in gathering this information
are the following: Argonne (1977), EPA (1978b), and Roth et al. (1976), as
well as several program users' manuals.
A. GENERIC MODEL CATEGORIES
In this chapter air quality models and prediction methods are class-
ified into generic model categories. Here we describe the structure of
the classification scheme employed, the full form of which is shown in
Exhibit IV-1. Though many such schemes have been proposed (Roth et al.,
1976, and Rosen, 1977, for example), we identify three broad divisions:
rollback, isopleth, and physico-chemical. We describe here each of these
categories, mentioning technical formulation, general capabilities, and
major limitations. In doing so, we draw upon material in Roth et al. (1976).
1. Rollback Category
Included in the first of these are all those prediction methods in
which ambient pollutant concentrations are assumed to be directly (though
not necessarily linearly) proportional to emissions, according to some
simple relationship. Emissions control requirements are presumed propor-
tional to the amount by which the peak pollutant concentration exceeds
the NAAQS. Linear rollback and Appendix J are examples of such methods.
I. Rollback
II. Isopleth
III. Physico-Chemical
A. Grid
1. Region Oriented
2. Specific Source Oriented
B. Trajectory
1. Region Oriented
2. Specific Source Oriented
C. Gaussian
1. Long-Term Averaging
2. Short-Term Averaging
D. Box
EXHIBIT IV-1. GENERAL MODEL CATEGORIES
Because atmospheric processes are generally complex and nonlinear,
the fundamental proportionality assumption invoked in rollback methods
is frequently violated in actual application. For this reason, rollback
methods are usually regarded as screening techniques, whose results give
at best only a general indication of the amount of emissions control
required. They are most often used when insufficient data are available
to perform an analysis that is more technically justifiable. Even then,
results obtained with them are appropriate only as a crude indication of
the need for more extensive data gathering and analysis. Because rollback
methods lack spatial resolution, they are most suitable for addressing
regional, multiple-source* issues. Also, their use is more appropriate
for applications involving relatively nonreactive pollutants (SO2, CO, and TSP).
* In this report, "multiple-source" refers to many, well-distributed
sources of all types and sizes. It does not include, for instance,
a single complex having multiple stacks.
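As a minimal sketch of the proportionality assumption behind linear rollback
(the background term and the numbers in the usage line are assumptions added
for illustration), the calculation can be written as follows.

```python
def linear_rollback(peak_conc, standard, background=0.0):
    """Fractional region-wide emissions reduction implied by linear rollback,
    which assumes ambient concentration above background is proportional to
    total emissions."""
    return (peak_conc - standard) / (peak_conc - background)

# e.g., rolling a 0.18 ppm oxidant peak back to a 0.08 ppm level with an
# assumed 0.04 ppm background implies roughly a 71 percent reduction.
reduction = linear_rollback(0.18, 0.08, background=0.04)
```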
2. Isopleth Category
Within the second generic model category are included those methods
relying on isopleth diagrams to relate precursor concentrations of primary
emissions (usually oxides of nitrogen and nonmethane hydrocarbon) to the
level of secondary pollutant (usually ozone) resulting from such a mixture.
As is true with the EPA EKMA method (see EPA, 1977), these diagrams are usually
constructed from computer simulations using theoretically and chamber derived
chemical kinetic mechanisms. They invoke assumptions about a number of
parameters such as regional ventilation and solar insolation, as well as
pollutant entrainment, carryover from the previous day, and transport from
upwind. The accuracy of the postulated chain of chemical reactions is
evaluated using smog-chamber data. The types of information required to con-
struct an isopleth diagram are roughly equivalent to those required to employ
a box model, and we note that the two methods are conceptually similar in
many regards. We maintain a distinction between the two, however, because
of the view prevailing in the user community that they are separate classes
of models. Also, not all box models are photochemical, as are isopleth-
based methods.
Entry into an isopleth diagram requires an estimate of the peak con-
centration actually occurring during the day on some initial base date.
Given an assumption about the relative proportion of precursor species
control (HC versus NOx), the degree of emissions cutback required to achieve
the NAAQS can be estimated directly.
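A hedged sketch of that estimation step is given below; the isopleth surface
is supplied as a stand-in function, and the assumption that predicted ozone
falls monotonically along the chosen control path is ours for illustration
(real isopleth diagrams need not behave this simply, particularly with respect
to NOx control).

```python
def required_cutback(o3_isopleth, hc0, nox0, o3_standard, nox_share=1.0, tol=1e-4):
    """Fractional HC cutback (with NOx cut by nox_share times the HC fraction)
    needed to bring the isopleth-predicted peak ozone down to the standard.
    o3_isopleth(hc, nox) stands in for reading the isopleth diagram."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        f = 0.5 * (lo + hi)
        o3 = o3_isopleth(hc0 * (1.0 - f), nox0 * (1.0 - nox_share * f))
        if o3 > o3_standard:
            lo = f      # not yet enough control
        else:
            hi = f
    return 0.5 * (lo + hi)
```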
Isopleth methods lack spatial resolution. They are thus capable of
addressing only regional, multiple-source issues. By their nature, isopleth
methods are useful only for applications involving photochemically reactive
pollutants. Because of the level of approximation involved in constructing
the isopleth diagram itself, in entering it using measured ambient data,
and in accounting for the effect of transport from upwind, such methods are
more appropriate for use as screening tools. In this capacity, they can
be helpful in assessing the need for further, more refined analysis. How-
ever, in some limited applications where the assumptions invoked in the
formulation of the isopleth methods are generally satisfied, estimates of the
required degree of emissions control obtained using such a method can be
regarded as acceptably accurate.
3. Physico-Chemical Category
The third category contains models based upon physical and chemical
principles as embodied in the atmospheric equations of state. It is divided
into four main subcategories: grid, trajectory, Gaussian, and box. We
discuss here each subcategory.
a. Grid Subcategory
Grid models employ a fixed Cartesian reference system within which
to describe atmospheric dynamics. The region to be modeled is bounded
on the bottom by the ground, on the top usually by the inversion base
(or some other maximum height), and on the sides by the desired east-west
and north-south boundaries. This space is then subdivided into a two- or
three-dimensional array of grid cells. Horizontal dimensions of each cell
measure on the order of several kilometers, while vertical dimensions can
vary, depending on the number of vertical layers and the spatially and
temporally varying inversion base height. Some grid models assume only a
single, well-mixed cell extending from the ground to the inversion base;
others subdivide the modeled region into a number of vertical layers.
Ideally, the coupled atmospheric equations of state, expressing con-
servation of mass, momentum, and energy, would be solved systematically
within each grid cell, with a chemical kinetic mechanism used to describe
the evolution of pollutant species. Several major difficulties arise
in practice. Computing limitations are rapidly encountered. A region
fifty kilometers on a side and subdivided into five vertical layers requires
12,500 separate grid cells if grid cells are one kilometer on a side.
Maintaining a sufficient number of species to allow the functioning of a
chemical kinetic mechanism compounds the storage problem. For a ten-
species mechanism, storage of the concentrations for each species in each
grid cell in our example would alone require 125,000 storage locations.
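The arithmetic behind those figures is simply the following (a worked check,
with the region size, cell size, layer count, and species count taken from the
example above):

```python
# A 50 km x 50 km region, 1 km grid cells, 5 vertical layers, 10 species.
region_km, cell_km, layers, species = 50, 1, 5, 10
cells = (region_km // cell_km) ** 2 * layers    # 12,500 grid cells
storage_locations = cells * species             # 125,000 concentration values
```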
To avoid these and other computing or numerical problems, most grid
models solve only one atmospheric state equation—the conservation of
mass, or continuity, equation, decoupling the other two. The momentum
equation is replaced by meteorological data supplied to the model in the
form of spatially and temporally varying wind fields. The energy equation
is supplanted by externally supplied vertical temperature profile data,
from which inversion heights are also calculated.
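Schematically, and only schematically (no particular model's numerics are
implied; the one-dimensional geometry, upwind differencing, and first-order
loss term are simplifying assumptions), a single explicit step of the species
continuity equation with an externally supplied wind looks like this:

```python
import numpy as np

def advect_step(conc, wind, emissions, dx, dt, k_loss=0.0):
    """One explicit upwind step of dc/dt + u dc/dx = E - k c, with the wind u
    supplied externally (the momentum equation is not solved). Stable only
    when abs(u) * dt / dx <= 1; boundary cells are held fixed here."""
    c = np.asarray(conc, dtype=float)
    new = c.copy()
    for i in range(1, len(c) - 1):
        u = wind[i]
        # first-order upwind difference for the advection term
        dcdx = (c[i] - c[i - 1]) / dx if u >= 0 else (c[i + 1] - c[i]) / dx
        new[i] = c[i] + dt * (-u * dcdx + emissions[i] - k_loss * c[i])
    return new
```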
Other problems are encountered in solving the mass continuity equation,
a principal such problem being the atmospheric viscosity terms. Turbulence,
which is a randomly varying quantity, can be described only in statistical
terms. Species concentrations, as a result, can be found only as values
averaged over some time interval. Also, the continuity equation can be
solved only if turbulence effects are decoupled through a series of approxi-
mations involving turbulence gust eddy sizes and strengths.
Grid models require the specification of time-varying boundary condi-
tions on the outer sides and the top of the modeled region, the initial
conditions (species concentrations) in each grid cell at the start of a
simulation, and spatially and temporally varying emissions for each pri-
mary pollutant species. The first two of these are derived from station
measurement data, and the last is obtained from an appropriate emissions
inventory for the modeled region.
Grid models are capable of considering both reactive and relatively
nonreactive pollutant species. Models considering reactive species,
because of their limited time scale (less than several days), are
appropriate tools only for addressing questions involving pollutants
having short-term standards (O3, CO, HC, and SO2) and for medium-range
pollutant transport (an urban plume, for example). Some grid models
are designed to model large spatial regions (such as the Northern Great
Plains—see Liu and Durran, 1977) and thus can address long-range transport
questions. At their present state of development, these models are appropriate
tools only for examining questions involving relatively nonreactive pollutants
(principally long-term SO2 and TSP).
There are two major classes among grid models: region oriented and
specific source oriented. In the first class, two basic variants exist:
urban scale and regional scale models. The first of these attempts to
model the urban environment, considering emissions from a number of dif-
ferent sources and simulating both reactive and relatively non-reactive
pollutant species over a spatial scale on the order of tens of kilometers
through a temporal scale of 8 to 36 hours. Regional-scale models, on the
other hand, represent an attempt to model long-range pollutant transport
over a spatial scale of hundreds of kilometers through a temporal scale
of several days. Emissions are assumed to come from a few widely dispersed,
usually rural, sources; the pollutants considered are relatively nonreac-
tive (or, more precisely, slowly reactive) ones such as SO2. (Though SO2 is
converted to sulfate, it does so on a time scale much longer than that of
reactions involving the more reactive species.) One such model was developed by SAI for
use in assessing the air quality impact of large-scale energy development in
the Northern Great Plains (Liu and Durran, 1977).
Because of their spatial extent, regional oriented grid models are
appropriate tools for addressing regional (multiple-source) issues, such
as SIP/C and AQMP. Because of their spatial resolution, certain regional
questions about single-source issues can also be addressed. The regional
effect of a new source can be assessed. The subtractive regional effect
of removing an existing source also can be estimated, an essential cap-
ability for addressing OSR questions. However, only grid models specifi-
cally designed to consider a single source have sufficient spatial reso-
lution to assess near-source, or microscale effects.
Specific source oriented models represent the second major class of
grid models. Some specific examples of such models are listed later in
this chapter. Models of this type are particularly useful in two types
of applications: examining the behavior of a plume containing reactive
constituents, and accounting for the effects of complex terrain on a
point source plume. Because of their formulation, these models can con-
sider the effects of plume interaction with ambient reactive pollutants.
This is of interest in addressing single-source issues in urban areas with
significant levels of reactive pollutants. Often urban-scale grid models
are used to predict the ambient conditions with which the plume interacts.
Those models designed for applications in complex terrain can be used
when it is necessary to describe explicitly the wind fields and inversion
characteristics encountered by a dispersing pollutant. Although simpler
models exist, they are often inadequate when applied in situations in
which terrain is particularly complex or when photochemical reactivity
is important.
b. Trajectory Subcategory
Trajectory models employ a reference coordinate system that is allowed
to move with the particular air parcel of interest. A hypothetical column
of air is defined, bounded on the bottom by the ground and on the top by
the inversion base (if one exists), which varies with time. Given a speci-
fied starting point, the column moves under the influence of prevailing
winds. As it does so, it passes over emissions sources, which inject pri-
mary pollutant species into the column. Chemical reactions are simulated
in the column, driven by a photochemical kinetic mechanism. Some trajectory
models allow the column to be partitioned vertically into several layers,
or cells. Emissions in such models undergo vertical mixing upward from
lower cells. Other trajectory models allow only a single layer; in these,
vertical mixing is assumed to be uniform and instantaneous.
The formulation employed by trajectory models to describe atmospheric
dynamics represents an attempt to solve the mass continuity equation
in a moving coordinate system. The remaining state equations—conservation
of momentum and energy—are not solved explicitly. As is done in grid models,
solution of the momentum equation is avoided by specification of a spatially
and temporally varying wind field, while solution of the energy equation
is sidestepped by externally supplying temperature and inversion base height
information.
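A single-layer variant can be sketched in a few lines; the instantaneous
vertical mixing, the constant mixing height, and the first-order chemistry
term are simplifying assumptions adopted here only for illustration.

```python
def trajectory_column(path, mixing_height, emission_flux, dt, c0=0.0, k_chem=0.0):
    """Well-mixed column following a prescribed path of (x, y) cells.
    emission_flux(x, y) returns the area emission rate (mass per area per
    time) injected into the column as it passes over each cell."""
    c = c0
    history = []
    for (x, y) in path:
        c += emission_flux(x, y) * dt / mixing_height   # injection, instant mixing
        c -= k_chem * c * dt                            # simple first-order chemistry
        history.append(c)
    return history
```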
Several basic assumptions are invoked in the formulation of trajectory
models. Since only a single air column is considered, the effects of
neighboring air parcels cannot be included. For this reason, horizontal
diffusion of pollutants into the column along its sides must be neglected.
This may not seriously impair model results so long as sources are suffic-
iently well distributed that emissions can be idealized as uniform, or
nearly so, over the region of interest. However, if the space-time track
of the air column passes near but not over large emissions sources, neglect
of the effect of the horizontally diffusing material from those sources
might cause model results to be deficient. In general, problems occur
whenever there are significant concentration gradients perpendicular to
the trajectory path.
Also, the column is assumed to retain its vertical shape as it is
advected by prevailing winds. This requires that actual winds be ideal-
ized by means of a mean wind velocity assumed constant with height. Because
of the earth's rotation and frictional effects at ground level, winds aloft
usually blow at greater speeds than do surface winds, and in different directions.
This produces an effect known as wind shear, which is neglected in trajec-
tory models. If emissions are evenly distributed in amount and type over
the region of interest and winds are also uniform, this may not represent
a serious deficiency. In such a case, material blown out of the column
by wind shear effects would be replaced by similar material blown into
it, with the net effect on model results expected to be small. However,
if a significant fraction of the emissions inventory is contributed by
large point sources or if wind patterns display significant spatial vari-
ation, neglect of wind shear can seriously impair the reliability of
trajectory model results.
Additionally, many trajectory models assume that the horizontal dim-
ensions of the air column remain constant and unaffected by convergence
and divergence of the wind field. Where winds are relatively uniform,
this may not be of serious consequence. Where winds have significant
spatial variation, as could be the case in even mildly complex terrain,
however, this assumption could lead to deficient results. In the San
Francisco Bay region, for example, wind flow convergence during the day
causes the merging of several air parcels. Peak pollutant concentrations
subsequently occur in this merged "super-parcel." A trajectory model
would be an inadequate tool for addressing problems in such a region.
In general, trajectory models require as inputs much the same types
of data required to exercise a grid model. Emissions are required along
the space-time track of the air column. Wind speed and direction must be
provided to determine its movement. Vertical temperature soundings must
also be input in order to determine the height of the column (the height
of the inversion base). Although these data need be prepared only for
the corridor encompassing the trajectory path, general application of the
model to an entire urban area requires that data be prepared for a signi-
ficant portion of the region.
Two major classes exist among trajectory models: region oriented and
specific source oriented. The first of these classes includes those models
designed to address multiple-source, regional issues, usually in urban
areas. The second class contains so-called reactive plume models. For
reasons noted above, the use of trajectory models is appropriate on an
urban scale only in certain circumstances. Careful screening is required
of the emission and meteorological characteristics in a proposed appli-
cation region to insure the appropriateness of trajectory model usage.
The second class of trajectory models includes those designed to
evaluate the air quality impact downwind of a specific source. Because
of the underlying equation formulation, these models are more appropriate
for use in areas having relatively simple terrain. However, because
they are capable of simulating photochemical reaction, they can be used
in addressing issues involving reactive pollutants. Often, region ori-
ented models are used to generate the ambient conditions with which the
reactive plume downwind of the source must interact. For all trajectory
models considering reactive pollutants, the time scales remain short (less
than several days). Consequently, they are inappropriate for consideration
of problems involving pollutants subject to long-term standards.
c. Gaussian Subcategory
In the formulation of Gaussian models, the atmosphere is assumed to
consist of many diffusing pollutant "puffs," all moving on individual
trajectories determined by prevailing winds. The concentration at any
point is assumed due to the superimposed effect of all puffs passing over
the point at the time of observation. Rather than keeping track of the
path of each puff, their motion (both advection and diffusion) is described
in terms of conditional state transition probabilities. Given an initial
location at a particular time, this state transition probability describes
the likelihood that the puff will arrive at another specified point a
given time interval later. With an entire field specified at some refer-
ence time, the net expected effect at a particular point and time is calcu-
lated by determining the integral sum of the separate expected effects of
each puff in the field.
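Formally, the superposition just described can be written as an integral over
all puff release points and times (a schematic statement in our notation rather
than a quotation from any particular model):

    c(\mathbf{x},t) \;=\; \int_{-\infty}^{t}\!\int_{V} p(\mathbf{x},t \mid \mathbf{x}',t')\, S(\mathbf{x}',t')\, d\mathbf{x}'\, dt'

where S(x',t') is the rate at which pollutant mass is released at location x'
and time t', and p is the state transition probability density that material
released there is found at x at time t.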
Central to this type of formulation is a knowledge of the time-varying
state transition probabilities for the entire concentration field. In
practice, turbulence nonuniformities and terrain-specific effects combine
to render it unlikely that such probabilities can be determined. To over-
come this difficulty, traditional Gaussian models (among others, those
recommended by the EPA) invoke several assumptions. First, the turbulence
field is assumed to be stationary and homogeneous, which implies it has
two important qualities: first, the statistics of the state transition
probabilities can be assumed dependent only on spatial displacement, thus
removing their time-dependency; and second, the probabilities are not
dependent on puff location in the field, thus removing spatial variability.
These are satisfactory approximations so long as significant differences
do not exist between turbulence characteristics of the atmosphere in dif-
ferent portions of the region to be modeled. For applications in complex
terrain, for instance, such an assumption might not be justified.
Once turbulence field stationarity and homogeneity have been assumed,
it still remains to specify the functional form of the state transition
probability. Gaussian models derive their name from their assumption that
this probability function is Gaussian in form. Given this assumption, the
concentration field can be determined analytically by evaluating the integral
expressing the summation of separate effects from all pollutant puffs affect-
ing the region of interest. In order to isolate the effect of an individual
source, only puffs containing pollutants emitted from that source are
considered.
Concentrations about the plume centerline are assumed to be distri-
buted according to a Gaussian relationship, whose vertical and horizontal
cross-sectional shape is a function of downwind distance from the source
and atmospheric stability class. Analytic forms can be determined express-
ing the form of the downwind concentration field for several different
types of emissions regimes: instantaneous "puff," continuous point source
emission (steady-state), continuous emissions from an area source, and
continuous emissions along a line source.
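For the continuous point source in steady state, one common statement of the
resulting relationship is the Gaussian plume equation (written here in our
notation, with a single ground-reflection term; further reflection terms are
added when an inversion lid is treated, as discussed below):

    C(x,y,z) \;=\; \frac{Q}{2\pi u \sigma_y \sigma_z}\,
    \exp\!\left(-\frac{y^2}{2\sigma_y^2}\right)
    \left[\exp\!\left(-\frac{(z-H)^2}{2\sigma_z^2}\right)
    + \exp\!\left(-\frac{(z+H)^2}{2\sigma_z^2}\right)\right]

where Q is the emission rate, u the mean wind speed at release height, H the
effective release height (stack height plus plume rise), and \sigma_y(x),
\sigma_z(x) the horizontal and vertical dispersion coefficients described in
the next paragraph.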
Several other assumptions are invoked in Gaussian steady-state models.
The vertical and horizontal spread of the plume is assumed characterized
by dispersion coefficients, whose values are dependent on the distance
downwind of the source. They are assumed to be functions of atmospheric
stability and are thus characterized by stability class. Specific values
are obtained from standard workbooks, such as that developed by Turner, or
from evaluation of data measured downwind of actual sources.
In many models, plume interaction with the ground and the inversion is
considered. Usually, perfect or near-perfect reflection is assumed to
occur. Multiple reflections are often modeled, although some models assume
that beyond a certain downwind distance mixing is uniform between the ground
and the inversion base.
Consideration of plume rise is made in Gaussian point source models.
Depending upon ambient atmospheric conditions, such as temperature and humi-
dity, hot gases from an emitting stack may rise, sink or remain at the same
height. Simplifying thermodynamic equilibrium relationships, such as that
developed by Briggs, are often used to estimate the magnitude of plume rise.
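As an illustration, one widely quoted Briggs relationship for the transitional
rise of a buoyant plume is (our rendering, not necessarily the form adopted by
any specific model):

    \Delta h \;\approx\; \frac{1.6\, F^{1/3} x^{2/3}}{u},
    \qquad F \;=\; g\, w_s r_s^2\, \frac{T_s - T_a}{T_s}

where F is the buoyancy flux, w_s and r_s are the stack exit velocity and
radius, T_s and T_a the stack gas and ambient temperatures, u the wind speed,
and x the downwind distance.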
Two major classes of Gaussian models exist: long-term averaging and
short-term averaging. Though both invoke the basic Gaussian assumptions,
major differences exist in formulation. Long-term models divide the region
surrounding each source into azimuthal sectors. The long-term variation
of the wind at the source must then be specified by wind speed and direction
(by sector) classes, along with the frequency of occurrence for each combin-
ation. This information usually is conveyed in the form of a "wind rose."
Data describing the frequency of occurrence of the various atmospheric
stability categories must also be specified. The probability of occurrence of
each stability category/wind vector (speed and direction) combination is then
used to weight the downwind concentrations resulting from it. The weighted
sum represents the expected value of the long-term averaged pollutant con-
centration. Models employing this so-called "climatological" formulation
are appropriate tools for addressing problems involving pollutants for
which long-term (annual) standards are specified (SO2, TSP, and NO2).
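In symbols (our notation), the climatological estimate at a receptor r is
simply the frequency-weighted sum

    \bar{C}(\mathbf{r}) \;=\; \sum_{k} f_k\, \chi(\mathbf{r};\, u_k, s_k, \theta_k)

where the index k runs over all wind-speed class/stability class/direction
sector combinations, f_k is the joint frequency of combination k taken from the
wind rose and stability data, and \chi is the sector-averaged Gaussian
concentration computed for that combination.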
The second class of Gaussian models includes those designed for short-
term analysis. Prevailing wind direction and speed, as well as emissions
characteristics, are assumed to persist long enough that steady-state con-
ditions are established. The downwind concentration field resulting from
source emissions can then be evaluated analytically. Some models allow a
limited form of temporal variability by dividing the modeling day into
segments (perhaps one hour long), during each of which conditions are assumed
to be in steady state. Source strengths and prevailing wind speed at the
height of emissions release are required for each segment, as are sufficient
vertical temperature profile data to calculate inversion base height, if
one exists, and atmospheric stability class. The last of these is required
in order to determine vertical and horizontal dispersion coefficients.
Because wind data frequently are not available at the height of emission
release, surface wind measurements are extrapolated. Wind speed is assumed
to vary vertically according to a power law, the exponent of which is given
as a function of stability class. Determination of stability class is made
by one of several appropriate methods, each of which is also dependent on
surface observations.
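A minimal sketch of that extrapolation step follows; the exponent values are
illustrative placeholders (published tables differ by reference and by urban
versus rural setting), and the function name is ours.

    # Power-law extrapolation of a surface wind measurement to the
    # emission release height.  Exponents are keyed to Pasquill
    # stability classes A-F and are illustrative only.
    STABILITY_EXPONENT = {"A": 0.10, "B": 0.15, "C": 0.20,
                          "D": 0.25, "E": 0.30, "F": 0.30}

    def wind_at_height(u_ref, z, z_ref=10.0, stability="D"):
        """Wind speed at height z from a reference measurement u_ref at z_ref."""
        return u_ref * (z / z_ref) ** STABILITY_EXPONENT[stability]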
Both Gaussian classes contain models that can be used to estimate the
impact of single or multiple sources. Some models are designed to consider
only a single point source; others can model many different sources simul-
taneously. Consequently, the first group of these is appropriate only for
addressing single-source issues; the second group can be used to consider
multiple-source issues as well. Most models in this second group, though
able to account for many sources, can also simulate as few as one. They
can thus be used to consider both single and multiple source issues.
Full consideration of regional-scale issues (SIP/C and AQMP) requires
of a model the ability to simulate all types of sources: point, area, and
line. Not all multiple-source Gaussian models are capable of doing so.
Some are used to consider only point and area sources; others are used to
consider line sources only. These latter are usually intended for use in
addressing traffic related questions; they might be used, for instance,
to estimate the impact of emissions from a full highway network on regional
CO distribution and level. Consequent to the above, consideration of all
source types in a region may require the joint use of more than one model--
one considering point and area sources and another simulating line sources.
An important restriction exists on the type of pollutant species
that can be simulated using Gaussian models. Because the formulation
cannot accommodate explicit kinetic mechanisms, only relatively nonreactive
pollutants can be modeled (CO, TSP, and SO2).* However, some models incor-
porate first-order, exponential decay to account for pollutant removal
processes and limited species chemical conversion. Multiple-source Gaussian
models assume that the combined effect of many emitters can be calculated
by linearly superimposing the effects from each individual source. Such
an assumption would be an erroneous one if questions involving reactive
species were being considered.
Some Gaussian models have been designed to simulate the effects of
point source emissions in complex terrain. Various assumptions are made
about the behavior of the plume and the variation in height of the inver-
sion base as an obstacle is approached. Usually the plume is allowed to
impinge on the obstacle without any sophisticated means to account for flow
alteration, although some models allow for flow convergence and divergence
in the wind field. Also, the base of the inversion is sometimes assumed to
be at constant height above the source; in other models it is assumed to be
a fixed distance above the terrain, thus varying with it. However, the
Gaussian formulation depends on the assumption of turbulence field station-
arity and homogeneity. This is a simplification that may not be justified
in many applications in complex terrain.
* Long-term Gaussian models are also used to model annual NO2, a reactive
species, for which no short-term standard currently is set. This
usually is accomplished by combining NO and NO2 as NOx, the "species"
modeled. NOx exhibits less variability during the day than NO2 taken
separately.
d. Box Subcategory
Box models are the simplest of the physico-chemical models. The region
to be modeled is treated as a single cell or box, bounded by the ground
on the bottom, the inversion base on the top, and the east-west and north-
south boundaries on the sides. The box may enclose an area on the order
of several hundred square kilometers. Primary pollutants are emitted into
the box by the various sources located within the modeled region, under-
going uniform and instantaneous mixing. Concentrations of secondary pol-
lutants are calculated through the use of a chemical kinetic mechanism.
The ventilation characteristics of the modeled region are represented,
though only grossly, by specification of a characteristic wind speed.
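A minimal single-cell sketch of this formulation is given below (our notation;
the chemical kinetic terms for secondary pollutants are omitted, and the
characteristic ventilation length L is an assumed input):

    # One explicit time step of a single-box model: emissions are mixed
    # uniformly and instantaneously into a cell of area A and mixing
    # height H, which is flushed by a characteristic wind speed u acting
    # over a length L.
    def box_model_step(c, dt, Q, A, H, u, L, c_background):
        """Advance the box-average concentration c (mass/volume) by dt."""
        emission = Q / (A * H)                      # dilution of emissions Q into the box
        ventilation = (u / L) * (c_background - c)  # exchange with upwind (background) air
        return c + dt * (emission + ventilation)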
Because of their formulation, box models can predict, at best, only
the temporal variation of the average regional concentration for each
pollutant species. Consequently, they are capable of addressing only multi-
ple source, regional issues. Furthermore, such models are useful only in
regions having relatively uniform emissions. In those areas where point
sources contribute significantly to the emissions inventory (in number and
amount), the assumption of emissions uniformity may be an unsatisfactory one.
Box models require only limited data. Emissions can be specified on
a regional basis, eliminating any need for determining their spatial
variation. Only simple meteorological data need be supplied as input. For
these reasons, box models can be used when little information is available.
They are more appropriately used as screening tools, helping to identify
those situations requiring more extensive data collection and modeling
analysis.
B. GENERIC ISSUE/MODEL COMBINATIONS
The discussion in the previous section outlined the characteristics
of generic classes of air quality models. In this section we associate
generic model type with generic issue category. In so doing, we indicate
the gross suitability of a generic model type as a tool in addressing a
particular issue. As noted earlier, each generic model (GM) has associated
with it a set of limitations on its use. In Section C we summarize the
effects of these limitations. We first classify types of actual applications
according to several key attributes and then indicate those which each GM
is capable of considering. The result is an enumeration of possible model/
application combinations.
In order to match model to issue, we present in Table IV-1 a matrix of
model/issue combinations. For each GM, an indication is provided of its
usefulness in addressing each of the seven generic issues identified in
the previous chapter. Even where a GM is indicated as suitable, however,
its inherent limitations (some of which are noted in the table) may prevent
its use in certain applications. Consequently, further examination is
required in order to make a final GM selection.
Summarizing the basic features of Table IV-1, we note the following:
> Grid Models
- Region Oriented Models. Urban scale models are able to
address multiple-source issues (SIP/C, AQMP) involving
both reactive and nonreactive pollutants. Their short-
term temporal scale (< 36 hours), however, restricts
them to problems involving pollutants with short-term
standards (O3, HC, CO, and secondary SO2). Their spatial
resolution (on the order of tens of kilometers) allows
them to address some single-source issues (OSR, EIS/R, LIT).
Regional scale models, as opposed to urban scale ones, are
more oriented towards application in rural areas (few sources)
involving nonreactive (or, rather, slowly reactive) pollu-
tants, such as SO2, TSP, CO, and NO2 (which is slowly reactive
in nonurban areas because of limited ambient HC).
short-term temporal scale (on the order of a week or less),
often a practical restriction due to computing requirements,
limits their use in predicting long-term pollutant concen-
trations (SO2, TSP, NO2). They are suited for addressing
questions involving single-source issues (PSD, NSR, EIS/R,
LIT) in isolated rural areas.
TABLE IV-1. AIR QUALITY ISSUES COMMONLY ADDRESSED
BY GENERIC MODEL TYPE

Generic Model Type                    Issue Categories Addressed

Refined Usage
1. Grid(1)
   a. Region Oriented                 SIP/C, AQMP, OSR(2), EIS/R(2), LIT
   b. Specific Source Oriented        PSD, NSR, OSR, EIS/R, LIT
2. Trajectory(1)
   a. Region Oriented                 SIP/C, AQMP, OSR(2), EIS/R, LIT
   b. Specific Source Oriented        PSD, NSR, OSR, EIS/R, LIT
3. Gaussian(3)
   a. Short-Term Averaging
      1) Multiple Source              SIP/C, AQMP, PSD, NSR, OSR, EIS/R, LIT
      2) Single Source                PSD, NSR, OSR, EIS/R, LIT
   b. Long-Term Averaging(4)          SIP/C, AQMP, PSD, OSR, EIS/R, LIT

Refined/Screening Usage
4. Isopleth(1,5)                      SIP/C, AQMP

Screening Usage
5. Rollback                           SIP/C, AQMP
6. Box                                SIP/C, AQMP

Notes:
1. Only short-term time scales can be considered (less than several days).
2. Regional impact of new sources can be assessed but not near-source, or microscale, effects.
3. Only nonreactive pollutants can be considered.
4. Only pollutants having long-term standards can be considered (SO2, TSP, and NO2).
5. Only photochemically active pollutants can be considered.
- Specific Source Oriented Models. These models are used
primarily for addressing single-source issues (PSD, NSR,
OSR, EIS/R, LIT). This class contains the so-called
reactive plume models. Their ability to consider reactive
pollutants makes them suitable for urban applications or
rural applications where plume reactivity is important.
However, because OSR (a primarily urban issue) requires an
estimate of the subtractive effect of removing an existing
source, only questions involving pollutants for which linear
superposition is approximately valid, i.e., nonreactive
pollutants, can be addressed in an urban area with a specific-
source model. These models are also suitable for use in
applications where terrain complexity is important.
Trajectory Models
- Region Oriented Models. With some important restrictions,
these models can be suitable for use in addressing multi-
ple-source issues (SIP/C and AQMP) and, in limited circum-
stances, some single-source issues (OSR, EIS/R, LIT). Among
the most important of such restrictions are the following:
Emissions must be approximately uniform over the modeling
region; air flow cannot be complex enough to cause merging
of air parcels, i.e., flow convergence or divergence should
not be important; and horizontal diffusion effects should
not have significant nonuniformities, e.g., large point
sources near but not within the space-time track of the
advected air parcel being modeled. Because chemical kinetic
mechanisms can be included in their formulation, these models
are capable of considering reactive as well as nonreactive
species. Their temporal scale is so short, however, that
no estimates of long-term concentration averages can be
computed.
- Specific Source Oriented Models. Subject to the same restric-
tions mentioned above, these models can be appropriate tools
for use in considering single-source issues (PSD, NSR, OSR,
EIS/R, LIT). Because they can consider reactive pollutant
species, they can be used in applications involving reactive
plumes. Limited terrain complexity can also be simulated,
so long as the abovementioned restrictions are not violated.
Gaussian Models
Long-Term Averaging Models. These models can be used to
address both multiple-source issues (SIP/C, AQMP) and some
single-source issues (PSD, OSR, EIS/R, LIT). Because of
the Gaussian formulation they cannot consider chemistry or
surface removal effects beyond first order, i.e., exponential
decay. Thus, they are appropriate tools only for addressing
questions involving nonreactive (slowly reactive) pollutants.
Their temporal scale is such that only pollutants having
long-term (annual) standards can be considered (SO2 primary
standard, TSP, NO2, where NO2 is taken as NO + NO2, i.e.,
NOx). As currently configured, these models are appropriate
for use in both urban and rural settings, although
the terrain in such applications should be relatively
simple.
- Short-Term Averaging Models. Two variants exist among
these models: multiple-source and single-source. The
types of issues they may be used to address divide
similarly. Some multiple-source models, however, do
not consider all types of sources: Some consider only
point and area sources; others consider only line
sources. The latter group is useful for examining the
effects of traffic-related pollutants (particularly CO)
resulting from highway network emissions. Consequently,
if regional questions are to be addressed, the concur-
rent use of more than one model may be required. Only
relatively nonreactive pollutants may be examined
using this type of model. Because of their short-term
temporal scale, these models are best suited for
addressing questions involving pollutants having short-
term standards (CO, S02 secondary standard).
Rollback Models
Because rollback models lack spatial resolution, they
are appropriate only for considering questions involving
multiple-source issues (SIP/C, AQMP). Their use is
generally confined to urban areas located in simple
terrain. Their assumption that emissions are directly
proportional to peak pollutant values is a technically
limiting one. Consequently, they should be viewed as
screening tools to evaluate the need for more extensive
analysis and data gathering.
Isopleth Models
Lacking spatial resolution, isopleth models are appro-
priate only for use in addressing multiple-source
issues (SIP/C, AQMP). Employing ozone isopleth dia-
grams derived through the use of a photochemical
kinetic mechanism, these models are designed to examine
questions involving reactive pollutants (O3, HC, short-
term NO2). Their use is most appropriate for applications in
urban areas located in simple terrain. Because the isopleth
diagram is constructed using regional ventilation, emissions,
and background/transport assumptions, it is similar to
the box models, which are described below. Like the
box model, its technical limitations, except under
exceptional circumstances, render it more useful and
reliable as a screening tool to evaluate the need for
more extensive analysis.
Box Models
Because they lack spatial resolution, box models are
appropriate only for use in considering multiple-source
issues (SIP/C, AQMP). They assume spatially uniform
emissions. For this reason, their use is more suited
to areas that are urban or semi-urban. They are best
used in modeling areas located in simple terrain but have
also been used in applications in complex terrain. An
example of the latter type of application might be the
modeling of a mountain valley containing several ski
resorts and related developments. Technical limitations
render the box models more suitable as screening tools.
C. MODEL/APPLICATION COMBINATIONS
In the previous section we discussed the relationship between generic
models and generic issues. In this section we associate those generic
models and the specific applications in which they may be used. We first
classify applications by means of several key attributes. We then com-
pare the possible values of these with model capabilities. For each generic
model type, we are thereby able to identify the range of applications for
which the model is suited.
Applications are characterized here by five attributes: number of
sources, area type, pollutant, terrain complexity, and required resolution.
In Table IV-2 we list the possible designations these attributes may assume.
Against these we match generic model capabilities, identifying the list of
designations for which each is suitable. A chart of the resulting model/
application combinations is presented in Table IV-3. While exceptions may
occur, the list of attribute designations shown is chosen based upon con-
siderations presented earlier in this chapter.
D. SOME SPECIFIC AIR QUALITY MODELS
Our central purpose in this report is to discuss means for setting
suitable standards for model performance. As prologue to this, both
air quality issues and the models used to address them needed to be
examined. We have done so in general terms to this point. Throughout
this discussion we have referred to air quality models only in generic
terms. By doing so, several advantages were achieved: General conclu-
sions appropriate to an entire class of models could be stated without
reference to any specific model, and extensive discussions of any observed
differences between intended capabilities and technically achieved ones
were not necessary for each particular model.
TABLE IV-2. POSSIBLE DESIGNATIONS OF APPLICATION ATTRIBUTES

Attribute                 Possible Designations

Number of Sources         Multiple-Source
                          Single-Source

Area Type                 Urban
                          Rural

Pollutant                 Ozone (O3)
                          Hydrocarbon (HC)
                          Nitrogen Dioxide (NO2)
                          Sulfur Dioxide (SO2)
                          Carbon Monoxide (CO)
                          Total Suspended Particulates (TSP)

Terrain Complexity        Simple
                          Complex

Required Resolution       Temporal
                          Spatial
TABLE IV-3. MODEL/APPLICATION COMBINATIONS

REFINED USAGE

Grid
  a. Region Oriented
     Number of Sources:    Multiple-Source
     Area Type:            Urban, Rural
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal, Spatial
  b. Specific Source Oriented
     Number of Sources:    Single-Source
     Area Type:            Rural
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal

Trajectory
  a. Region Oriented
     Number of Sources:    Multiple-Source
     Area Type:            Urban
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP
     Terrain Complexity:   Simple
     Required Resolution:  Temporal; Spatial (limited)
  b. Specific Source Oriented
     Number of Sources:    Single-Source
     Area Type:            Urban, Rural
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour), TSP
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal; Spatial (limited)

Gaussian
  a. Long-Term Averaging
     Number of Sources:    Multiple-Source, Single-Source
     Area Type:            Urban, Rural
     Pollutant:            SO2 (annual), TSP, NO2 (annual)*
     Terrain Complexity:   Simple
     Required Resolution:  Spatial
  b. Short-Term Averaging
     Number of Sources:    Multiple-Source, Single-Source
     Area Type:            Urban, Rural
     Pollutant:            SO2 (3- and 24-hour), CO, TSP, NO2 (1-hour)*
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal, Spatial

REFINED/SCREENING USAGE

Isopleth
     Number of Sources:    Multiple-Source
     Area Type:            Urban
     Pollutant:            O3, HC, NO2 (1-hour)
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal (limited)

SCREENING USAGE

Rollback
     Number of Sources:    Multiple-Source, Single-Source
     Area Type:            Urban, Rural
     Pollutant:            O3, HC, NO2, SO2, CO, TSP
     Terrain Complexity:   Simple; Complex (limited)

Box
     Number of Sources:    Multiple-Source
     Area Type:            Urban
     Pollutant:            O3, HC, CO, NO2 (1-hour), SO2 (3- and 24-hour)
     Terrain Complexity:   Simple; Complex (limited)
     Required Resolution:  Temporal

* Only if NO2 is taken to be total NOx.
Having made our general points in previous sections, however, we
associate here some specific models with our generic model categories.
Though this is not central to our discussion of model performance
standards, it may be helpful in linking specific models to the issues
and applications for which they are most suited.
In Table IV-4 we associate a number of specific models with the generic
model types identified earlier. We included many of the models with
which we were familiar. Because the list is intended only to be a
representative one, we did not seek to make it fully complete. Many
other models, particularly Gaussian ones, certainly exist and would
be appropriate for use in the proper circumstances.
For the models listed in Table IV-4, a detailed summary of their
characteristics is provided in Appendix B. Among the information
contained there is the following: model developer, EPA recommendation
status, technical description, and model capabilities. The last of these
is further subdivided into source type/number, pollutant type, terrain
complexity, and spatial/temporal resolution.
E. AIR QUALITY MODELS: A SUMMARY
In Chapter III we identified generic classes of air quality issues.
In this chapter we defined generic types of models. Having done so, we
associated the two, identifying those issues for which each model was a
potentially suitable analysis tool. We also described the technical formula-
tions employed in each generic type of model, indicating some key limitations.
As noted in Table IV-1, several generic model types may be of potential
use in addressing the same generic class of issue. Only by considering the
characteristics of a proposed application can a final choice of model be
TABLE IV-4. SOME AIR QUALITY MODELS
Generic Model Type
Refined Usage
Grid
a. Region Oriented
b. Specific Source Oriented
Trajectory
a. Region Oriented
b. Specific Source Oriented
Gaussian
a. Long-term Averaging
b. Short-term Averaging
Refined/Screening Usage
Isopleth
Screening Usage
Rollback
Box
Specific Model Name
SAI
LIRAQ
PICK
EGAMA
DEPICT
DIFKIN
REM
ARTSIM
RPM
LAPS
AQDM
COM
CDMQC
TCM
ERTAQ*
CRSTER*
VALLEY*
TAPAS*
APRAC-1A
CRSTER*
HANNA-GIFFORD
HIWAY
PTMTP
PTDIS
PTMAX
W
VALLEY*
TEM
TAPAS*
AQSTM
CALINE-2
ERTAQ*
EKMA
WHITTEN
LINEAR ROLLBACK
MODIFIED ROLLBACK
APPENDIX J
ATDL
* These models can be used for both long-term and short-term
averaging.
made. To facilitate the comparison between model capabilities and appli-
cation requirements, we defined a set of application attributes. We then
matched the two, identifying for each generic model type the combinations of
application attributes for which it was suited.
In this chapter we defined the interface between issue, model, and
application. In addition, we mentioned some specific air quality models
within each model category, giving additional detail on each in Appendix B.
With the completion of this chapter, we are ready to consider model
performance measures. In the next chapter, we identify performance measures
appropriate for the consideration of each air quality issue. Having done
so, we examine the interface of performance measure and model category.
Finally, in Chapter VI, we discuss several alternative rationales and
formats for setting model performance standards. These are designed to be
consistent with the performance measures defined in Chapter V.
V MODEL PERFORMANCE MEASURES
The central purpose of this report is to identify means for
setting standards for air quality model performance. As prologue to
doing so, we identified generic types of air quality issues in Chapter III
and generic classes of air quality models in Chapter IV, exploring their
interrelationships. Now it remains to discuss the model performance mea-
sures for which performance standards must be set. Several rationales for
setting these standards are presented in Chapter VI.
In this chapter our discussion proceeds as follows: We first
identify generic types of performance measures; we then suggest some
specific performance measures (describing them in detail in Appendix
C); and finally we match generic performance measures to the issue/
model/application combinations presented in earlier chapters. Before
beginning, however, the notion of a model "performance measure" needs
to be defined in more detail.
Typically, air quality models are used in the following context:
a problem is posed, a model is chosen that is suitable for use in
addressing the issue/application, existing data are assembled for in-
put and additional data are gathered (if needed), and a simulation is
conducted. Results often are expressed in the form of spatially and
temporally varying concentration predictions for one or many pollutant
species. Since most problems are hypothetical ones posing "what-if"
questions (e.g., what if a new power plant is built, or what if
population growth and development proceeds as forecast), model results
in such situations are inherently nonverifiable. Consequently, before
its results can be accepted, the reliability of the chosen model must be
demonstrated. Most frequently, "validation" is accomplished by using
the model to simulate pollutant concentrations in a test situation
which is similar to the hypothetical one and for which measurement
data are available. A region-oriented model (urban or regional scale)
may be required to predict region-wide concentrations resulting from
conditions existing on some past date. A specific-source model may
have to reproduce the downwind concentrations resulting from emissions
from an existing source having size and siting characteristics similar
to the proposed one. If its predictions are judged to be in sufficient
agreement with observed data, the model is then accepted as a satis-
factory tool for use in addressing the hypothetical problem.
However, what do we mean by "satisfactory" agreement between predic-
tion and observation? What are the quantities most appropriate for use
in characterizing differences between the two? Within what range of
values must these quantities remain? The values for how many different
quantities must be "satisfactory" before we judge model predictions to
be acceptably near test case observations?
In this chapter, we explore the second of these questions. In doing
so, we identify a set of model performance measures, surrogate quantities
whose values serve to characterize the comparison between prediction and
observation. We match these performance measures with the generic types of
air quality issues identified in Chapter III and the generic classes of air
quality models listed in Chapter IV. We defer until Chapter VI the next and
final step: the specification of model performance standards against which
to compare for acceptability the values of the model performance measures.
A. THE COMPARISON OF PREDICTION WITH OBSERVATION
Before accepting a model for use in addressing hypothetical air
quality questions, the user must validate it. This is often done by
demonstrating its ability to reproduce a set of test results, usually
consisting of observational concentration data recorded at a number of
measurement stations for several hours during the day. In comparing
predictions with observation, several questions should be asked. Among
these are the following:
> What are the differences? How much does prediction
differ from observation at the location of the peak
concentration level and at each of the monitoring sta-
tions? What is the spatial and temporal distribution
of the residuals (the difference between prediction and
observation)? Do these differences correlate with diur-
nal changes in atmospheric characteristics (mixing
height, wind speed, or solar irradiation, for instance)?
If more than one species is being considered, are there
differences in performance between each species?
> How serious are the differences? Are peak concentration
levels widely different? Are the estimates of the area
in violation of the NAAQS in substantial disagreement?
How near to agreement are the estimates of the area ex-
posed to concentrations within 10 percent of the peak
value? Are differences in the timing and spatial dis-
tribution of concentrations such that the expected
health impacts on the population (exposure/dosage) are
of different magnitude? Do the predicted and observed
patterns and levels of concentrations lead to seriously
different conclusions about the required amount and cost
of emissions control? Are policy decisions deriving
from prediction and observation different (such as a
"build-no build" decision on a power plant based on PSD
considerations)?
> Are there straightforward reasons for the differences? Are
the locations and timing of the concentration peaks slightly
different between prediction and observation? (If con-
centration gradients within the pollutant cloud are
steep, even a slight difference in cloud location can
produce large discrepancies at set monitoring sites.
Such a problem could occur if there were only slight
errors in the wind speed or direction input to the model.
In such an instance, model performance might otherwise
be perfectly adequate.) Are wide fluctuations in ground-
level concentrations and thus station measurements produced
by relatively small discrepancies between the modeled and
the actual atmospheric characteristics? [This "multiplier
effect" can occur downwind of an elevated point source,
for example. Because the emissions plume from a point
source has dimensions much greater downwind than crosswind,
slight changes in the atmospheric profile (stability
category), having an effect on plume rise and dispersion,
have a more than proportionate effect both on the downwind
distance, at which the ground-level peak concentration
occurs and on the amount of area exposed to a given con-
centration level.]
In the remainder of this chapter we discuss, first in generic
terms and then in specific ones, several different types of model
performance measures. While each type and variant is designed to high-
light different aspects of the comparison between prediction and obser-
vation, they all address the general questions noted above. Those
questions, and others like them, are the fundamental ones from which
the notion of performance measures and standards derive.
B. GENERIC PERFORMANCE MEASURE CATEGORIES
In this section, we define several generic model performance
measure categories, distinguishing among them on the basis of their
general characteristics and the amount of information required to
compute them. We also note three variants found among measures in each
category. We then introduce some practical considerations which can
limit the choice of performance measure. In Section C we list some of
the specific measures included in the generic categories, beginning
with a discussion of the fundamental differences between those designed
to measure performance on a regional scale and those characterizing it
on a specific-source scale. Details of these specific measures are
provided in Appendix C.
1. The Generic Measures
We consider here four generic performance measure categories:
peak, station, area, and exposure/dosage. The first category contains
those measures related to the differences between the predicted and
observed concentration peak, its level, location and timing. The second
category includes measures based upon concentration differences between
prediction and observation at specific measurement stations. Within the
third category are contained those measures based upon concentration
field differences throughout a specified area. The fourth category in-
cludes measures derived from differences in population exposure and
dosage within a specified area.
Each of these generic performance measure categories requires
successively greater knowledge of the spatial and temporal distribution
of concentrations. We show in Figure V-l a schematic representation
illustrating several distinct levels of knowledge about regional con-
centrations. A similar schematic appropriate for source-specific
situations is shown in Figure V-2. Listed in Table V-l are the infor-
mation requirements for the four categories. These range from an
estimate of a simple scalar quantity, concentration at the peak, all
the way to full knowledge of the spatially and temporally resolved
concentration field and population distribution. For peak measures,
the concentration residuals (the difference between predicted and
observed values) are required at a single point and time. For station
measures, the temporal variations of the residuals are required at
several points. For both area and exposure/dosage measures, the full
residual field is required, both spatially and temporally resolved.
The latter type of measure requires, in addition, the spatial and
temporal history of population movement within the area of interest.
As the information content increases, the ability of the performance
measure to characterize the comparison between prediction and observation
also can increase. However, measures from different categories tend to
emphasize different aspects of the comparison.
FIGURE V-1. VARIOUS LEVELS OF KNOWLEDGE ABOUT REGIONAL CONCENTRATIONS
(Schematic labels: concentration peak; measurement station concentrations;
concentration field C(x,y,t); boundaries of modeling region.)
FIGURE V-2. VARIOUS LEVELS OF KNOWLEDGE ABOUT SPECIFIC-SOURCE CONCENTRATIONS
(Schematic labels: point source; prevailing wind; measurement station
concentrations Ci(xi,yi,t); ground-level concentration peak; concentration
field C(x,y,t).)
TABLE V-1. GENERIC PERFORMANCE MEASURE
INFORMATION REQUIREMENTS

Generic
Performance
Measure Type        Information Required

Peak                Predicted and measured concentration peak (level,
                    location, and time), i.e.,
                    Cmax(x,y,t) | pred., meas.

Station             Predicted and measured concentrations at specific
                    stations (temporal history), i.e.,
                    Ci(xi,yi,t) | pred., meas., for all stations i

Area                Predicted and measured concentration field within
                    a specified area (spatial and temporal history),
                    i.e.,
                    C(x,y,t) | pred., meas.

Exposure/dosage     Both the predicted and measured concentration
                    field and the predicted and actual population
                    distribution within a specified area (spatial
                    and temporal history), i.e.,
                    C(x,y,t) | pred., meas. and P(x,y,t) | pred., actual
For this reason, several types of performance measures are usually required in order to fully
characterize a model's ability to reproduce observationally obtained
data.
That a model predicts well the observed concentration peak, for
instance, does not necessarily mean its predictions can reproduce the
spatially distributed concentration field. A comparison of the temporal
history of concentration values at several specific stations might give
a better indication of spatial model behavior. Even this might not
prove conclusive. The prevailing direction of the winds input to the
model might have been slightly in error. This may have little impact
on concentration levels, resulting only in a pollutant cloud slightly
displaced from its actual location. If concentration gradients are
steep within the cloud, station predictions might not agree well with
the values observed, even though the model might not be significantly
deficient. In such a circumstance, area measures might provide a better
means for assessing model performance. For instance, the areas in
excess of a specified concentration value could be compared for several
values ranging between the peak and background values.
Even employing the above measures, the degree of seriousness of
the disagreement between prediction and observation might not be
obvious. Since health effects result from both the pollutant level and
length of exposure, measures expressing differences in exposure/dosage
might give an indication of a model's ability to estimate the inter-
action of population with pollutant. This might be helpful in a number
of circumstances. For example, suppose prevailing winds on "worst" epi-
sode days carry the pollutant cloud containing ozone and its precursors
into adjacent rural areas before the early-afternoon peak occurs. If few
people live in the affected area, exposure/dosage measures may indicate
that the model's failure to accurately predict peak concentrations is of
little practical consequence.
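A minimal sketch of such an exposure calculation is given below (our own
illustrative construction; conc[t][k] and pop[t][k] denote hourly
concentrations and population counts in subarea k and are not quantities
defined in this report). The exposure/dosage performance measure would then
compare this quantity as computed from the predicted and from the observed
concentration fields.

    # Person-hours of exposure to concentrations above a threshold
    # (e.g., the NAAQS), summed over hours and subareas.
    def person_hours_above(conc, pop, threshold):
        total = 0.0
        for hour, cells in enumerate(conc):
            for k, c in enumerate(cells):
                if c > threshold:
                    total += pop[hour][k]   # each qualifying entry adds one hour
        return total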
2. Some Types of Variations Among Performance Measures
Three types of variations are found among performance measures:
scalar, statistical, and "pattern recognition." Those measures
of the first type are based upon a comparison of the predicted and
observed values of a specific quantity: the peak concentration level,
for instance. Those of the second type compare the statistical behavior
(the mean, variance and correlation, for example) of the differences
between the predicted and observed values for the quantities of interest.
Measures of the final type are useful in providing qualitative insight
into model behavior, transforming concentration "residuals" (the differ-
ences between predicted and observed values) into forms that highlight
certain aspects of model performance and thus triggering "pattern
recognition."
In order to illustrate the types of variations found in each
generic performance measure category, we present Table V-2. Some
typical examples are included for each category/variation combination.
In section D of this chapter, a number of specific performance measures
are listed. Examined in detail in Appendix C, they are classified
according to the scheme presented here.
3. Several Practical Considerations
Several practical considerations have a strong impact on the choice
of model performance measures. Each of these derives from limitations on
the degree of spatial resolution attainable with most models and measure-
ment networks.
Ideally, in assessing the performance of a model, one might want to
examine for several hours during the day the agreement between prediction
and observation throughout the concentration field (the spatial distribu-
tion of concentrations). Differences between the predicted and observed
values of the following could be uncovered thereby: the location, timing,
and level of the concentration peak; the area exposed to a concentration
in excess of a given value (e.g., the NAAQS); and the concentration values
at stations within a measurement network.
TABLE V-2. TYPES OF VARIATIONS AMONG GENERIC
PERFORMANCE MEASURE CATEGORIES

Generic Performance    Types of
Measure Category       Variations      Typical Example

Peak                   Scalar          Concentration residual* at the peak.

                       Pattern         Map showing locations and values of
                       recognition     maximum one-hour-average concentrations
                                       for each hour.

Station                Scalar          Concentration residual at the station
                                       measuring the highest value.

                       Statistical     Expected value, variance, and correlation
                                       coefficient of the residuals for the
                                       modeling day at a particular measurement
                                       station.

                       Pattern         At the time of the peak (event-related),
                       recognition     the ratio of the residual at the station
                                       having the highest value to the average
                                       of the residuals at the other station
                                       sites (this can indicate whether the
                                       model performs better near the peak than
                                       it does throughout the rest of the
                                       modeled region).

Area                   Scalar          Difference in the fraction of the modeled
                                       area in which the NAAQS are exceeded.

                       Statistical     At the time of the peak, differences in
                                       the area/concentration frequency
                                       distribution.

                       Pattern         For each modeled hour, isopleth plots of
                       recognition     the ground-level residual field.

Exposure/dosage        Scalar          Differences in the number of person-hours
                                       of exposure to concentrations greater
                                       than the NAAQS.

                       Statistical     Differences in the exposure/concentration
                                       frequency distribution.

                       Pattern         For the entire modeled day, an isopleth
                       recognition     plot of the ground-level dosage residuals.

* Residual: The difference between "predicted" and "observed."
Difficulties hindering such an examination arise from two sources:
the limited spatial resolution of the model and the sparsity of the
measurement network. While some models, such as the Gaussian ones, are
analytic and thus able to resolve the concentration field, many cannot
do so completely. Grid models, for example, predict a single average
concentration value for each cell. For this reason, they cannot resolve
the concentration field on a spatial scale any finer than the intergrid
spacing (usually on the order of one or two kilometers for urban scale
grid models). Trajectory models are similarly limited: They can resolve
the concentration field only as finely as the dimensions of the air
parcel being simulated. Further, predictions are computed only for a
particular space-time track, and not for the entire concentration field.
The relatively small number of stations in most measurement networks
limits the ability to reconstruct completely the concentration field
actually occurring on the modeled day. While stations are well-placed in
some networks, in others they are not. Thus, not only are stations
often 3-10 kilometers apart, but their placement does not always guarantee
the observation of peak or near-peak concentrations. Further, even in
extended urban areas, seldom does the number of stations exceed 10 to 20.
For these reasons, concentration fields generally are not known
with precision, from either model predictions or observational data.
Estimates of the spatial distribution of concentrations can be obtained
only by inference from "sparse" data. The use of numerical processes,
such as interpolation and extrapolation, to extend that data introduces
additional uncertainty into the comparison of predictions with observations.
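As an example of such a numerical process, an inverse-distance-weighted
interpolation of sparse station observations might look as follows (the
weighting scheme is our choice for illustration, not one prescribed in this
report).

    # Inverse-distance-weighted estimate of the concentration at (x, y)
    # from a list of station observations given as (xi, yi, ci) tuples.
    def idw(x, y, stations, power=2.0):
        num = den = 0.0
        for xi, yi, ci in stations:
            d2 = (x - xi) ** 2 + (y - yi) ** 2
            if d2 == 0.0:
                return ci                  # the point coincides with a station
            w = 1.0 / d2 ** (power / 2.0)
            num += w * ci
            den += w
        return num / den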
Another consequence results from the limited resolution of measure-
ment networks: The value of the concentration peak actually occurring on
the day of observation may not be known. Measurement networks usually
consist of fixed stations arranged in a set pattern. Unless the air
parcel containing the peak drifts over or near one of the stations, the
maximum concentration value sensed by the network will be less (sometimes
substantially so) than the value of the actual maximum. When prevailing
winds and pollutant chemistry are highly predictable for the days of
worst episode conditions, station placement can be designed so as to
maximize the likelihood of sensing the true peak. When conditions are
not so predictable, a measurement network with a modest number of
stations has little chance of "seeing" the true peak. For instance,
suppose the cloud containing the peak and all concentrations within 20
percent of it covers an area of 25 square kilometers in an urban area
having a total area of 1000 square kilometers. If the cloud has an
equal likelihood of being above any point in the urban region at the
time of the peak, by dividing the area of the cloud into the total
urban area, we can make a crude estimate of the number of stations
required to guarantee a measurement within 20 percent of the peak:
40 stations evenly spaced about 5 kilometers apart throughout the
urban region would be required. Even if the probable location of
the cloud were known to be within an area equal to one-quarter of the
urban area, 10 stations would be required just within that small area.
This degree of station density is high and may not be found in many
circumstances.
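The arithmetic behind these figures is simply (using the areas quoted above)

    N \;\approx\; \frac{A_{\text{urban}}}{A_{\text{cloud}}}
      \;=\; \frac{1000\ \text{km}^2}{25\ \text{km}^2} \;=\; 40,
    \qquad
    \text{spacing} \;\approx\; \sqrt{\frac{A_{\text{urban}}}{N}}
      \;=\; \sqrt{\frac{1000}{40}}\ \text{km} \;=\; 5\ \text{km}

and restricting the cloud to one-quarter of the urban area gives N of about
250/25 = 10 stations within that subarea.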
The above example is a simplistic one. The design of actual
station placement can be a far more complex process than indicated here.
However, the example serves to underscore the main point: a measurement
network, though satisfying EPA regulations,* may still be unable to
guarantee an observation "close" to the actual concentration peak, i.e.,
within 10 to 20 percent.
The points raised in the above discussion have some practical
implications for the choice of a model performance measure. Among
these are the following:
* Source: 40 CFR §51.17 (1975).
> Performance measures relying on a comparison of the
predicted and "true" peak concentrations may not be
reliable in all circumstances since measurement networks
can provide only the concentration at the station re-
cording the highest value, not necessarily the value at
the "true" peak.
> Performance measures relying on a comparison of the
predicted and "true" concentration fields may not be
computationally feasible since neither predicted nor "true"
concentration fields are always resolvable, spatially or
temporally, at the scales required for comparison.
> Performance measures based upon a comparison of predicted
and "true" exposure/dosage, though they are appealing
because of their ability to serve as surrogates for the
health effects experienced by the populace, may not be
computationally feasible because of the difficulty in
measuring the "true" population distribution and the
"true" concentration field. (We do suggest in Chapter
VI, however, one means by which health effects considera-
tions can be accounted for implicitly.)
> Performance measures based upon a comparison of the
predicted and observed concentrations at station sites
in the measurement network may be of the greatest practical
value.*
While the above points are general ones, exceptions to them do
occur in specific applications. Also, certain performance measures,
though not fully reliable on their own, can be useful in a qualitative
sense when used in conjunction with other measures.
C. A BASIC DISTINCTION: REGIONAL VERSUS SOURCE-SPECIFIC
PERFORMANCE MEASURES
Some models are used to address multiple-source, region-oriented
issues; others are applied to consider single-source issues. The
*Note caveat on pages VI-18 and VI-19, with respect to point source applications.
performance measures appropriate for each differ. We consider here the
distinction between regional and source-specific performance measures.
The distinction is drawn not so much between the type of performance
measure used (peak, station, area, or exposure/dosage), but rather between
the spatial scales over which it is applied. To address urban or regional
scale issues (SIP/C, AQMP), we must consider a region hundreds of square
kilometers in area, with the spatial and temporal distribution of
concentrations the result of emissions from many sources. The quantities
of interest are: the regional peak concentration (its level, location
and timing) and for each hour during the day (particularly at the time
of the peak), the spatial distribution of the pollutant concentrations,
by species. This information is frequently conveyed in the form of a
concentration isopleth diagram, an example of which is shown in Figure
V-3. The diagram shown was produced by the SAI Urban Airshed Model,
*
illustrating its ozone predictions for the Denver Metropolitan region
at Hour 1200-1300 MST on 29 July 1975.
To address single-source issues, on the other hand, we consider
only the region downwind of the specific source being modeled. While
emissions from it contribute to the overall pattern and level of
regional pollutant concentrations, it is usually the incremental impact
of those emissions that are of concern. The principal quantities of
interest are: the peak incremental ground-level concentration downwind
of the source and the spatial distribution of the incremental concen-
trations within the downwind ground-level "footprint." Specific para-
meters describing the latter are: the area within which concentrations
exceed a certain value and the shape of the concentration isopleths, usu-
ally conveyed in the form of a diagram such as the one shown in Figure V-4.
This diagram was constructed using a Gaussian formulation for a continu-
ously emitting elevated point source. Conditions are in steady-state and
"perfect" reflection from the ground is assumed. No inversion layer exists.
It should be noted that winds are unlikely to persist long enough for
actual conditions ever to resemble these isopleths beyond 20 to 25 km
(about 6 to 10 hours).
FIGURE V-3. SAMPLE REGIONAL CONCENTRATION ISOPLETH DIAGRAM: OZONE PREDICTIONS
OF THE SAI URBAN AIRSHED MODEL FOR THE DENVER METROPOLITAN REGION,
HOUR 1200-1300 MST, 29 JULY 1975
(North is toward the top of the diagram.)
FIGURE V-4. SAMPLE SPECIFIC-SOURCE ISOPLETH DIAGRAM ILLUSTRATING CONCEN-
TRATIONS DOWNWIND OF A STEADY-STATE GAUSSIAN POINT SOURCE
(Figure notes: source strength = 1000 lbs/hr; effective stack height = 250 ft;
perfect ground reflection; no inversion; E stability class (slightly stable);
10-hour travel time indicated; abscissa is downwind distance in kilometers.)
Other types of sources produce different downwind isopleth
patterns. In Figure V-5 we show qualitatively the downwind concentra-
tion patterns resulting from emissions from each of the three prin-
cipal source types: point, line, and area. These are only represen-
tations; the actual location, level, and shape of the isopleth lines
are heavily dependent on wind speed, source strength, and atmospheric
stability class. The figure does indicate, however, the general shape
of the downwind area within which the source impact is felt.
The type of source provides information in two areas: It identifies
the modeling region within which the peak, station, area, and exposure/
dosage performance measures are to be applied; and it provides insight
for monitoring network design. The observational data against which
model performance is to be judged are gathered at the measurement stations
within that network. To measure properly the impact produced by a
specific source, the measurement network should be deployed in a
pattern consistent with the concentration field shapes shown in Figure
V-5. The station designed to measure the ground-level peak concentra-
tion should be located downwind from the source, several kilometers
distant for an elevated point source and immediately adjacent for
either a line or an area source. Located farther downwind are those
stations designed to resolve the concentration field and to determine
the concentration value most representative of the regional incremental
impact of the source. A schematic of such a measurement network for a
point source is presented in Figure V-6, showing one possible configu-
ration for the stations.
Several difficulties arise in practice: Wind direction is change-
able, and the location of ground-level footprints is very sensitive to
atmospheric stability. These problems are particularly acute when the
emitter being considered is an elevated point source. To illustrate, we
show in Figure V-7 the locus of the downwind footprint if all wind direc-
tions are considered equally likely to occur. If we idealize the concen-
tration isopleths as being elliptical in shape, we can determine an
FIGURE V-5. QUALITATIVE DOWNWIND CONCENTRATION PATTERNS FOR THE THREE
PRINCIPAL SOURCE TYPES: (a) POINT SOURCE (E.G., POWER PLANT); (b) LINE SOURCE;
(c) AREA SOURCE
(Schematic labels: prevailing wind; downwind extent of impact ranging from
several kilometers to tens of kilometers.)
FIGURE V-6. SCHEMATIC OF A POINT SOURCE MEASUREMENT NETWORK
(Schematic labels: point source; prevailing wind; measurement stations;
station sensing the peak.)

FIGURE V-7. LOCUS OF POSSIBLE FOOTPRINT LOCATIONS FOR AN
ELEVATED POINT SOURCE. All wind directions
are considered equally likely.
(Schematic labels: maximum concentration; concentration within a certain
amount of the peak; minimum concentration of interest.)
expression for the ratio of the area within a given isopleth to the
area of the annulus, as shown in Figure V-7. Doing so, we can evaluate a
sample problem. Referring once again to Figure V-4, let the minimum
concentration value of interest be 300 μg/m³. Then, obtaining from
the figure the appropriate values, we can calculate that the isopleth
contains only 1.2 percent of the total area of the annulus. A monitor
placed at random within the annulus would have only a 1.2 percent chance
of observing a concentration greater than the minimum value of interest.
This problem is compounded if we consider variations in the inner and
outer radii due to the varying dispersive power of the wind.
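A minimal sketch of the area ratio just mentioned, assuming the isopleth of the minimum
concentration of interest is idealized as an ellipse with downwind and crosswind semi-axes
a and b, and the annulus is bounded by inner and outer radii r_1 and r_2 (these symbols are
introduced here for illustration; the report does not state its expression explicitly):

\[
f \;=\; \frac{\pi a b}{\pi\left(r_2^{\,2} - r_1^{\,2}\right)} \;=\; \frac{a b}{r_2^{\,2} - r_1^{\,2}}
\]

With the semi-axes and radii read from a diagram such as Figure V-4, f is the probability that
a randomly placed monitor within the annulus observes a concentration above the minimum
value of interest (about 1.2 percent in the example above).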
The message of all this is clear: When winds are variable, fixed
monitoring stations have little chance of characterizing the concen-
tration field downwind of an elevated point source. Several specific
implications result for the gathering of measurement data for computing
point source performance measures. Among these are the following:
> Measurement data may have to be gathered using mobile
monitoring stations. Plume cross sectional sampling
could be done then based on the wind speed/direction
and atmospheric stability observed in "real time."
> The annulus (or sector, if winds are more predictable)
containing the locus of peak concentrations is much
smaller in area than that containing the minimum
concentration of interest and is much closer to the
source (usually ranging from 1-5 km distant).
D. SOME SPECIFIC PERFORMANCE MEASURES
Having discussed model performance measures in generic terms, we
now present some specific examples. We provide in Appendix C a detailed
discussion of each specific measure. To summarize here, we provide a
list for each of the four generic types of performance measures: peak
(Table V-3), station (Table V-4), area (Table V-5), and exposure/dosage
TABLE V-3. SOME PEAK PERFORMANCE MEASURES
Type Performance Measure
Scalar a. Difference* in the peak ground-level
concentration values.
b. Difference in the spatial location of
the peak.
c. Difference in the time at which the
peak occurs.
d. Difference in the peak concentration
levels at the time of the observed
peak.
e. Difference in the spatial location of
the peak at the time of the observed
peak.
Pattern Map showing the locations and values of the
recognition predicted maximum one-hour-average concen-
trations for each hour.
*
"Difference" as used here usually refers to "prediction minus
observation."
TABLE V-4. SOME STATION PERFORMANCE MEASURES
Type Performance Measure
Scalar Concentration residual at the station measuring
the highest concentration (event-specific time
and fixed-time comparisons).
Difference in the spatial locations of the pre-
dicted peak and the observed maximum (event-
specific time and fixed-time comparisons).
Difference in the times of the predicted peak
and the observed maximum.
Statistical For each monitoring station separately, the
following concentration residuals statistics
are of interest for the entire day:
1) Average deviation
2) Average absolute deviation
3) Average relative absolute deviation
4) Standard deviation
5) Correlation coefficient
6) Offset-correlation coefficient.
For all monitoring stations considered together,
the following residuals statistics are of
interest:
1) Average deviation
2) Average absolute deviation
3) Average relative absolute deviation
4) Standard deviation
5) Correlation coefficient
6) Estimate of bias as a function of
concentration
7) Comparison of the probabilities of concen-
tration exceedances as a function of
concentration
Scatter plots of all predicted and observed
concentrations with a line of best fit deter-
mined in a least squares sense.
Plot of the deviations of the predicted versus
observed points from the perfect correlation
line compared with estimates of instrumentation
errors.
TABLE V-4 (Concluded)
Type Performance Measure
Pattern a. Time history for the modeling day of the pre-
recognition dieted and observed concentrations at each site.
b. Time history of the variations over all stations
of the predicted and observed average concentra-
tions.
c. At the time of the peak (event-related), the ratio
of the normalized residual at the station having
the highest value to the average of the normal-
ized residuals at the other stations.
TABLE V-5. SOME AREA PERFORMANCE MEASURES
Type Performance Measure
Scalar a. Difference in the fraction of the area in which
the NAAQS are exceeded.
b. Nearest distance at which the observed concen-
tration is predicted.
c. Difference in the fraction of the area in which
concentrations are within 10 percent of the
peak value.
Statistical a. At the time of the peak, differences in the
fraction of the area experiencing greater than
a certain concentration; differences in the
following are of interest:
1) Cumulative distribution function
2) Density function
3) Expected value of concentration
4) Standard deviation of density function
b. For the entire residual field, the following
statistics are of interest:
1) Average deviation
2) Average absolute deviation
3) Average relative absolute deviation
4) Standard deviation
5) Correlation coefficient
6) Estimate of bias as a function of
concentration
7) Comparison of the probabilities of concen-
tration exceedances as a function of con-
centration
c. Scatter plots of prediction-observation concen-
tration pairs with a line of best fit determined
in a least squares sense.
Pattern a. Isopleth plots showing lines of constant pollu-
recognition tant concentration for each hour during the
modeling day.
b. Time history of the size of the area in which
concentrations exceed a certain value.
c. Isopleth plots showing lines of constant residual
values for each hour during the day ("subtract"
prediction and observed isopleths).
d. Isopleth plots showing lines of constant residuals
normalized to selected forcing variables (inver-
sion height, for instance).
e. Peak-to-overall performance-indicator, computed
by taking the ratio of the mean residual in the
area of the peak (e.g., where concentrations are
within 10 percent of the peak) to the mean
residual in the overall region.
(Table V-6). We include scalar, statistical, and qualitative/composite
pattern recognition variants.
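To make the statistical station measures concrete, the following sketch shows one way the
residual statistics listed in Table V-4 might be computed from paired hourly predictions and
observations at a single station; the function and variable names are illustrative and do not
appear in the report, and observations are assumed to be positive.

    import math

    def station_residual_statistics(predicted, observed):
        """Residual statistics for one station (residual = prediction minus observation)."""
        n = len(predicted)
        residuals = [p - o for p, o in zip(predicted, observed)]

        avg_dev = sum(residuals) / n                              # average deviation
        avg_abs_dev = sum(abs(r) for r in residuals) / n          # average absolute deviation
        avg_rel_abs_dev = sum(abs(r) / o for r, o in
                              zip(residuals, observed)) / n       # average relative absolute deviation
        std_dev = math.sqrt(sum((r - avg_dev) ** 2 for r in residuals) / (n - 1))

        # Pearson correlation coefficient between predictions and observations.
        mean_p, mean_o = sum(predicted) / n, sum(observed) / n
        cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
        var_p = sum((p - mean_p) ** 2 for p in predicted)
        var_o = sum((o - mean_o) ** 2 for o in observed)
        correlation = cov / math.sqrt(var_p * var_o)

        return {"average deviation": avg_dev,
                "average absolute deviation": avg_abs_dev,
                "average relative absolute deviation": avg_rel_abs_dev,
                "standard deviation": std_dev,
                "correlation coefficient": correlation}

    # Hypothetical hourly concentrations (pphm) at one station:
    print(station_residual_statistics([8, 12, 15, 11, 7], [9, 10, 16, 13, 6]))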
E. MATCHING PERFORMANCE MEASURES TO ISSUES AND MODELS
To this point we have identified several performance measures
categories, discussed their general attributes and data requirements,
and associated with them a number of specific performance measures.
Two tasks remain in this chapter: We first indicate for each of the
generic types of issues the performance measures most appropriate for
use; we then discuss the capability of each generic class of model to
calculate those measures.
1. Performance Measures and Air Quality Issues
In Chapter III we identified seven generic types of air quality
issues, dividing them into two broad categories. Within the first of
these multiple-source issues, we included: State Implementation Plan/
Compliance (SIP/C) and Air Quality Maintenance Planning (AQMP). The
second category, source-specific issues, was defined to contain the
following: Prevention of Significant Deterioration (PSD), New Source
Review (NSR), Offset Rules (OSR), Environmental Impact Statements/
Reports (EIS/R), and Litigation (LIT). For each of these issues we now
consider some important distinctions that bear on the selection of the
most appropriate model performance measures (PMs).
> Multiple-Source Issues
- SIP/C. The compliance portion of a SIP details
plans for achieving ambient pollutant levels at
or below the NAAQS in Air Quality Control Regions
(AQCRs) currently in noncompliance. Because it
is the peak concentration level that is of primary
concern, a model should demonstrate its ability
to predict that peak. For a day chosen as the one
TABLE V-6. SOME EXPOSURE/DOSAGE PERFORMANCE MEASURES
Type Performance Measure
Scalar a. Difference for the modeling day in the number of
person-hours of exposure to concentrations:
1) Greater than the NAAQS
2) Within 10 percent of the peak.
b. Difference for the modeling day in the total
pollutant dosage.
Statistical a. Differences in the exposure/concentration fre-
quency distribution function; differences in the
following are of interest:
1) Cumulative distribution function
2) Density function
3) Expected value of concentration
4) Standard deviation of density function
b. Cumulative dosage distribution function as a
function of time during the modeled day.
Pattern For each hour during the modeled day, an isopleth
recognition plot of the following (both for predictions and
observations):
1) Dosage
2) Exposure
to be used for model verification, peak performance
measures should be computed. Also contained within
SIPs are emissions control strategies. To assess
the effects of controlling specific sources, a model
must be capable of spatially resolving its concen-
tration predictions. Area PMs should be calculated,
if possible, to evaluate a model's ability to do so.
Station PMs are another means to evaluate model
spatial resolution, although pollutant cloud offset
can account sometimes for apparent large discrep-
ancies. Because SIP/C is most frequently an issue
in densely populated urban areas, large differences in
health effect impact can exist between prediction and
observation. Exposure/dosage PMs should be calcu-
lated, if possible, in order to evaluate the ac-
ceptability of a model's performance.
- AQMP. Detailed within the maintenance portion of
a SIP are procedures for insuring, once compliance
has been achieved, that ambient pollutant concen-
trations do not again rise above the NAAQS. Because
violation of the NAAQS is an issue, peak PM's are
important measures of model performance. However,
because pollutant levels are low (relative to the
values before compliance), small errors in model
performance might not produce a large uncertainty
in expected health impact. Consequently, the use
of exposure/dosage PMs may not be necessary. Also,
emissions control strategies may not be as global.
Retrofit of control devices on existing sources will
have been accomplished. Automotive emissions will
have been controlled (presumably) such that point
sources will contribute a large fraction of the
emissions inventory. While incremental growth and
development will alter the spatial and temporal
distribution of pollutants, the need for modeling
spatial resolution may not be so crucial as it was
with SIP/C. Agreement between prediction and observa-
tion as measured by area and station PMs, while desir-
able, may not always be required within the same
tolerance as for SIP/C issues.
> Specific-Source Issues
- PSD. Individual sources are not permitted to cause
more than small incremental increases in concentra-
tions in areas currently in attainment of the NAAQS.
Since these so-called "Class I" regions (often state
or national parks) are generally some distance from
the polluting source (>10 kilometers), a model must
be able to predict accurately ground-level concentra-
tions some distance downwind from the source. If the
source being modeled is by itself likely to produce
near-stack ground-level concentrations in excess of
the NAAQS or increments greater than Class II allow-
able increments, peak measures are of particular
interest. Otherwise, "far-field" concentration predic-
tions are more important than estimates of the peak
value. Downwind station PMs are often the measures
most suitable for evaluating model predictions for
PSD Class I. Also, plumes from point sources are very
narrow; that is, their cross-wind dimensions are much
smaller than their downwind ones. Consequently, the
incidence of a Class I violation may be quite sensi-
tive to model performance, as measured by area PMs.
However, exposure/dosage PMs are not likely to be of
interest because of the sparsity of population in areas
where PSD is an issue and the relatively low concentra-
tions occurring there.
- NSR. New source review is an important issue in both
urban and nonurban regions. With the density of popula-
tion in urban areas, many persons may live within a short
distance (<5 kilometers) of a source. The ground-
level peak concentration, then, may be an important
indicator of near-source health impact. Prediction
of that peak, as measured by a peak PM, may be an
important model performance requirement. However,
because ground-level concentrations fall off rapidly
farther downwind and because of the "narrowness" of
the plume, differences in exposure and dosage between
prediction and observation may not be of substantial
consequence. Close agreement, as measured by area
and exposure/dosage PMs, may not be required. Also,
in order to assess the impact of a new or modified
source, it is necessary to know its incremental effect
on regional air quality. This is best represented by
an "average" concentration value (including background)
well downwind of the source (>10 kilometers). Thus, a
model should demonstrate its ability to reproduce mea-
surement data at that downwind range. The use of
station PMs is indicated.
- OSR. In order to construct a new source or modify
an existing one in a region experiencing concentra-
tions in excess of the NAAQS, the owner of the source
must arrange for the removal of existing sources.
An amount greater than the emissions from the proposed
new source must be removed from the regional inven-
tory. Currently, these "offsets" are made on the
basis of emissions rather than as a result of their
impact on ambient concentrations. In such a case,
no air quality predictions are required (unless a
region-wide violation is attributable to the source
being removed or cleaned up). Only an accurate
emissions inventory is necessary. However, if off-
sets were "negotiated" at the level of ambient concen-
trations, the predictions of air quality models would
assume significance. The "far" downwind concentration
value, representative of its regional incremental
impact, would be the quantity of greatest interest,
since it would describe the source's offset "potential."
Station PMs then would be of use in evaluating
model performance.
- EIS/R. Projects having a significant adverse impact
on air quality usually are presented for public
review by means of an EIS or an EIR. Such projects
generally consist of one or a few distinct sources,
although some consist of a greater number. An
example of the latter is the Denver Metropolitan
Wastewater Overview EIS recently completed by
Region VIII of the EPA. Federal funding for
twenty-two separate sewerage treatment facilities
was conditioned upon favorable review of the EIS,
which examined their combined regional impact. If
the sources are widely distributed throughout the
modeling region, spatial resolution may be an im-
portant model requirement. In such a case, area
and station PMs would provide a useful means to
verify model acceptability. If the combined
emissions from the proposed sources are relatively
low, or if they are localized to a narrow downwind
plume, their incremental health impact may be
small, and exposure/dosage PMs may not need to be
applied to assess model performance. However, if,
as in Denver, the potential impact is more serious
and widespread, this latter type of PM can be useful.
- LIT. Court challenges can arise to the basic air
pollution laws themselves, to their implementation in
federal regulations, or to decisions regarding
specific sources (requests for variances and
applications for construction/modification approval,
for example). While challenges of the first two
types can have and have had important consequences, we
identify the third type as the principal variant
included in LIT. When the specific source in question
is to be located in an urban area, the model used to
estimate its effects should be expected to predict
both its near-source, ground-level concentration peak
and its far-field "average" value. Peak and station
PMs should be used. If the source is to be constructed
in a rural area, PSD may be an issue in arriving at a
build/no-build decision. If so, accuracy of spatial
resolution could be important. The use of area PMs
could be of assistance.
We summarize in Table V-7 many of the points mentioned above. In it
issues are associated with the generic categories of performance measures
most commonly required for use in assessing model performance. However,
exceptions do occur. For this reason, the final choice of performance
measures should be dictated by the character of the specific application.
2. Performance Measures and Air Quality Models
In the previous section we associated performance measures with gen-
eric types of issues. We now discuss the ability of generic classes of
models to generate predictions in a form suitable for calculation of those
measures. All model types produce estimates of the concentration peak.
Some can predict station concentrations. Fewer can spatially resolve
the concentration field. Fewer still are able to determine an estimate
of exposure/dosage. For each generic model category, we outline here
their general capabilities.
> Grid. The formulation of grid models permits the esti-
mation of concentrations averaged for each grid cell.
Consequently, the concentration field can be resolved
spatially as finely as the dimensions of the grid cell.
The peak is estimated to be the maximum ground-level
grid cell concentration occurring during the modeling day.
The location of the peak is predicted only as closely as
TABLE V-7. PERFORMANCE MEASURES ASSOCIATED
WITH SPECIFIC ISSUES
                        Performance Measure Type
Issue              Peak   Station   Area   Exposure/Dosage
Multiple-source
  SIP/C             X        X        X          X
  AQMP              X        X        X
Specific-source
  PSD               X        X        X
  NSR               X        X        X
  OSR               X        X
  EIS/R             X        X        X          X
  LIT               X        X        X
a single grid cell dimension. The value at the peak is
predicted only as an area average in the vicinity of the
peak (within one grid cell). Because of its spatial and
temporal resolution, predictions suitable for calculation
of station, area and exposure/dosage performance measures
also can be generated.
> Trajectory. Because a single air "column" is simulated,
only concentrations along the space-time track followed
by the advecting air parcel can be estimated. Such
models, as a consequence, can predict station concentra-
tions only for those over which they pass. If several
adjoining parcels are modeled, predictions at other
stations can be determined. The spatial location of the
peak can be estimated only as closely as the dimensions
of the air column. The peak level is estimated to be
the greatest column-averaged concentration occurring
during the modeling day. Averaging can take place over
the entire vertical region from the ground to the inver-
sion base, or over the lowest of several vertical column-
layers. Because of their limited spatial resolution,
regional trajectory models do not generate predictions
in a form suitable for the calculation of area or
exposure/dosage PMs. Specific-source trajectory models,
on the other hand, may do so. Concentrations are pre-
dicted as a function of downwind distance from the source.
Though lateral resolution is limited, concentration esti-
mates can be put in a form appropriate for calculation of
station, area and exposure/dosage PMs.
> Gaussian. Concentration field predictions are expressed
analytically. Thus, subject to the steady-state limita-
tions of their formulation, the short-term averaging
versions of these models can provide their estimates in a
form that is suitable for the calculation of all performance
measure types. The long-term averaging versions, however,
predict regional or sector-averaged estimates of annual
concentrations. Estimates of exposure/dosage (except
crudely on the basis of an annual concentration level) are
difficult to derive. Predictions of annual station averages,
though, can be obtained for regional models of this type.
> Isopleth. Estimates in no other form than the regional
peak concentration can be obtained with this method. This
can be done only when the isopleth diagrams can be inter-
preted in an absolute sense. This is the case only when
the isopleth diagram has been derived for ambient condi-
tions similar to the ones in the area being modeled. In
addition, a prediction of the peak can be verified only
if a historical data base exists that is sufficient to
determine a peak concentration in a previous base year and
a record of the emissions cutbacks occurring since then.
> Rollback. The only prediction obtainable from rollback
is an estimate of the regional peak concentration. This
is determinable only if an historical data base exists
such as that described for the isopleth method.
> Box. A prediction of the regional peak concentration
can be determined using this method. No other estimates
requiring finer spatial resolution can be computed.
Diurnal variation in the estimates of regional average
concentration, however, can be made.
We summarize in Table V-8 many of the points mentioned above. In
it, we indicate for each generic model the type of performance measure
that may be calculated, given the capabilities and limitations of each
formulation.
F. PERFORMANCE MEASURES: A SUMMARY
In this chapter we identified generic performance measure categories,
listed some specific performance measures, and then associated the
TABLE V-8. PERFORMANCE MEASURES THAT CAN BE
CALCULATED BY EACH MODEL TYPE
                                      Performance Measure Type
                                                             Exposure/
Model                           Peak   Station   Area        Dosage
Refined usage
  Grid
    Region oriented               X       X        X            X
    Specific source oriented      X       X        X            X
  Trajectory
    Region oriented               X       X
    Specific source oriented      X       X        X            X
  Gaussian
    Long-term averaging           X       X
    Short-term averaging          X       X        X            X
Refined/screening usage
  Isopleth                        X
Screening usage
  Rollback                        X
  Box                             X
generic measures with generic issues, noting for each model type the PMs
it is capable of calculating. Having done so, we are now ready to
proceed with the final objective of this report: the discussion of
model performance standards. The presentation in Chapter VI will be
based upon the points raised in this chapter. The following are of
crucial importance:
> Measurement networks often do not sense the "true"
concentration peak.
> Only performance measures based upon station measure-
ment data may be computationally feasible.
> Model predictions are often resolvable on a finer
scale than measured concentrations; even though
strict comparison of prediction with observation
through some computed measure may not be fruitful,
the model predictions themselves may still offer
valuable insight.
VI MODEL PERFORMANCE STANDARDS
The central purpose of this report is to suggest means for setting
performance standards for air quality dispersion models. Toward that end
our discussion has proceeded as follows: Issues were identified (Chapter
III); issue/model combinations were presented (Chapter IV); and alternative
issue/model/performance measure associations were discussed (Chapter V).
We are now at the final step: the setting of standards. To place this
in the proper framework, we first identify five attributes of desirable
model performance, showing how their relative importance depends on the
issue being addressed and the pollutant being considered. Then we recom-
mend specific performance measures whose values reveal the presence or
absence of each performance attribute. We detail several rationales for
establishing standards for those measures. To illustrate the use of these
measures in assessing model performance, we present a sample case. It is
based upon SAI experience in using a grid-based photochemical model in the
Denver metropolitan region. Finally, we detail possible forms the actual
standard might assume, suggesting a sample draft outline and format.
The subject addressed in this report is a broad and complex one.
Seldom can a rule for judging model performance be stated that does not
have several plausible exceptions to it. Consequently, we view the estab-
lishment of model performance standards to be a pragmatic and evolutionary
exercise. As we gain experience in evaluating model performance, we will
need to modify both our choice of performance measures and the range of
acceptable values we insist on. Nevertheless, the process must begin
somewhere. The recommendations contained in this chapter represent such
a beginning.
We feel the measures and standards we suggest for use here will almost
certainly change as experience improves our "collective judgment" about
what constitutes model acceptability and what does not. Perhaps the
number of measures will increase to provide richer insight into model
performance, or perhaps the number will shrink without any loss of "informa-
tion content." Regardless of the list of measures and their standards that
ultimately emerges for use, it is the conceptual structuring of the per-
formance evaluation itself that seems to be most important at this point.
We must identify the attributes of a well-performing model, and we need to
understand how we assess their relative importance, depending on the issue
we are addressing and the pollutant species we are considering. The dis-
cussion in this chapter offers a conceptual structure for "folding in" all
these concerns and suggests candidate measures and standards.
A. PERFORMANCE STANDARDS: A CONCEPTUAL OVERVIEW
The chief value of air quality models lies in their predictive ability.
Only through their use can the consequences of pollution abatement alter-
natives be assessed and compared. Only by means of model predictions can
the impact of emissions from newly proposed sources be estimated and evalua-
ted for acceptability. However, because the questions typically asked of
models are hypothetical ones, their predictions are inherently nonverifiable.
Only after the proposed action has been taken and the required implementation
time elapsed will measurement data confirm or refute the model's predictive
ability.
Herein lies the dilemma faced by users of air quality models: If
a model's predictions at some future time cannot be verified in advance,
on what basis can we rely on that model to decide among policy alternatives?
In resolving this, most users have adopted a pragmatic approach: If a
model can demonstrate its ability to reproduce for a similar type of appli-
cation a set of "known" results, then it is judged an acceptable predictive
tool. It is on this basis that model "verification" has become an essential
prelude to most modeling exercises.
A further difficulty exists. What constitutes a set of "known" results?
This is not a problem easily solved. For "answers" to be known exactly, the
"test" problem must be simple enough to be solved analytically. Few problems
involving atmospheric dynamics are so simple. Most are complex and nonlinear.
For these, the analytic test problem is an unacceptable one. Another, more
practical alternative often is employed. For regional, multiple-source
applications, the "known" results are taken to be the station measurements
of concentrations actually recorded on a "test" date. For pollutants having
a short-term standard, the duration of measurement is a day or less. For
those subject to a long-term (annual) standard, the duration is a year or more.
For source-specific applications, the source of interest may not yet
exist, permission for its construction being the principal issue at hand.
For these applications, it is often necessary to verify a model using the
most appropriate of several prototypical "test cases." These could be assembled
from measurements taken at existing sources, the variety of source size,
type and location spanning the range of values found in applications of interest.
The term "known" is used imprecisely when referring to a set of measure-
ment data. Station observations are subject to instrumentation error. The
locations of fixed monitoring sites may not be sufficiently well distributed
spatially to record data fully characterizing the concentration field and its
peak value. Nevertheless, despite those shortcomings, "observed" data often
are regarded as "true" data for the purposes of model verification.
Having assembled two sets of data, one "known" and the other "predicted,"
we can assess model performance by comparing one with the other. Predic-
tion and observation, however, can be compared in many ways. We must select
the quantities that can best characterize the distribution of pollutants in
the ambient air, for it is through comparison of their predicted and observed
("known") values that we specify model performance. We catalogued a number
of useful performance measures in Chapter V, as well as in Appendix C.
Later in this chapter we indicate the subset we view as having the
greatest practical usefulness.
Once we have decided on the performance measures best suited to our
issue/application (and most feasible computationally), we can calculate
these values. Having done so, however, we must ask a central question: How
close must prediction be to observation in order for us to judge model per-
formance as acceptable? In order for us to answer "how good is good," per-
formance standards for these measures must be set, with allowable tolerances
(predicted values minus observed ones) derived based upon a reasonable
rationale (health effects or pollution control cost considerations, for
instance).
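As a simple illustration of how such a tolerance might be applied once a rationale has fixed
it, consider the following sketch; the measure, the bounds, and the example values are
hypothetical and are not taken from the report:

    def meets_standard(measure_value, lower_bound, upper_bound):
        """True if a computed performance measure lies within its allowed tolerance band."""
        return lower_bound <= measure_value <= upper_bound

    # Hypothetical example: require a predicted-to-observed peak ratio to lie within 20 percent of unity.
    acceptable = meets_standard(measure_value=1.15, lower_bound=0.80, upper_bound=1.20)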
By setting these standards explicitly, certain benefits may be gained.
Among these are the following:
> A degree of uniformity is introduced in assessing model
reliability.
> The impact of limitations in both data gathering proce-
dures and measurement network design can be made more
explicit, facilitating any review of them that may be
required.
> The performance expected of a model is stated clearly,
in advance of the expenditure of substantial analysis
funds, allowing model selection to be a more straight-
forward and less "risky" process.
> The needs for additional research can be identified clearly,
with such efforts more directed in purpose.
B. PERFORMANCE STANDARDS: SOME PRACTICAL CONSIDERATIONS
Before continuing, we point out several practical considerations that
can have a direct impact on model verification. Among the most important
of these are the following: data limitations (due to their form, quantity,
quality, and availability); time/resource constraints; and variability in
the level and timing of analysis requirements. We discuss each of these
in turn.
1. Data Limitations
For a modeling simulation to be conducted, data must be gathered charac-
terizing both the "driving forces" (emissions, meteorology, and vertical
temperature profile, for example) and the "resulting effects" (pollutant
concentrations). To do so requires an extensive and coordinated effort.
Consequently, complete data sets usually are assembled for only a few sample
days. The dates on which these data are gathered are chosen as ones likely
to be typical of "worst" episode conditions. However, unanticipated shifts
in meteorology (frontal passage, for example) can occur, confounding attempts
to measure ambient conditions on high-concentration days. Consequently, the
data available for model verification may not be representative of conditions
on the day when the "second highest" concentration occurs, i.e., the worst
NAAQS violation.
Confronted with such a situation, the modeler must decide the following:
Even if model performance proves acceptable for non-episode conditions, can
it be considered "verified" as a predictive tool for higher-concentration
days? This question is part of a still more general one: Should a model
be verified for more than one day, each of these days experiencing a dif-
ferent peak concentration? If such a procedure were followed, model perfor-
mance could be evaluated for concentrations ranging from the current peak
value to ones nearer the NAAQS. But, the meteorology occurring on days
experiencing low peak concentrations is not typical of that occurring on
high peak days. Should not the model, when used as a predictive tool,
employ maximum-episode meteorology? We do not answer these questions here
but note their importance as questions remaining to be resolved. We observe,
however, that limitations on data quantity and availability can constrain us,
limiting our flexibility in dealing with these questions.
Another difficulty can arise because of spatial limitations in the
data. As we noted in the last chapter, measurement networks provide
concentration data only at a few fixed sites. In general, these networks
cannot guarantee observation of the "true" peak, nor are they sufficiently
well-spaced to assure that the "true" concentration field can be reconstructed
from the station measurements. As a practical matter, however, these station
data must form the basis for the comparison of prediction and observation.
Station-type performance measures, as defined in Chapter V, therefore must
be the "preferred" (or rather the "unavoidable") measures of interest. We
detail some of these later in Section D.
2. Time/Resource Constraints
Both the amount and quality of the data collected as well as the level
of modeling analysis performed are all strongly influenced by time dead-
lines and resource constraints. This has several consequences among which
are the following: Because it is difficult, expensive, and time consuming
to mount special data gathering efforts, heavy reliance is placed on previously
gathered data, even with its recognized deficiencies; also, model selection
occasionally is made more on the basis of the form and extent of existing
data and financial budgetary considerations than on grounds more technically
justifiable. In such cases a conscious choice has been made, trading model
performance for other considerations.
The combined effect of inadequate data and inappropriate model choice
can reduce in value any assessment of model performance. In this report,
however, we take the following view: The level of performance required of
a model is determined not by exogenous considerations but by the nature of
the issue and the specific modeling application.
3. Variability of Analysis Requirements
Modeling analysis requirements differ from one application to another.
There is an important question to ask in every modeling situation: How
much analysis is justified? In the Los Angeles Basin, for instance, attain-
ment of the NAAQS for ozone cannot be achieved without widespread and
extensive hydrocarbon (HC) emissions control. Ambient HC levels are currently
so high that more HC radicals are available than are "needed" by the chain
of photochemical reactions that results in the O3 peak. Consequently, reduc-
tions in HC emissions must be sizable before any appreciable reduction in
peak O3 can be achieved. The result of this is the following: Estimates of
the percentage HC emissions control required to reach NAAQS compliance in
Los Angeles are so high (75 to 80 percent) that they are not strongly sensi-
tive to uncertainties in the value of the O3 peak, either measured or predicted.
If the only questions to be answered depended on the general region-
wide level of HC emissions control required (a SIP/C-related problem), then
a fair amount of uncertainty could be tolerated in model predictions of the
O3 peak. Use of a less sophisticated model might be acceptable. Were a
different issue/question addressed, however, a model providing more detailed
predictions might be required.
C. MODEL PERFORMANCE ATTRIBUTES
Model predictions are subject to a number of sources of uncertainty. Some
of these are data related, while others are inherent in the model theoretical
formulations. Regardless of their source, however, errors manifest themselves
in similar ways. They may affect a model's ability to predict peak concen-
trations, as well as introduce systematic bias or gross error into its pre-
dictions. They may limit a model's ability to reproduce temporal variation
or affect the spatial distribution of the concentration field.
What are the attributes of desirable model performance? Ideally, we
would ask that a model have five major attributes, the strength of our insis-
tence depending on the circumstance of our application and the pollutant we
are considering. The five model performance attributes are: accuracy of the
peak prediction, absence of systematic bias, lack of gross error, temporal correlation,
and spatial alignment. The first of these concerns the model's
ability to predict accurately the level, timing, and location of the concen-
tration peak. The second attribute is the absence of systematic bias, where
predictions are shown not to differ from observations in any consistent and
unexplained way. The third attribute concerns the lack of gross error, or
rather the absolute amount by which predictions differ from observations.
We illustrate the difference between bias and error by means of the
following example. Suppose when we compare a set of model predictions with
station observations, we find several large positive residuals (predicted
minus observed concentrations) balanced by several equally large negative
residuals. If we were testing for bias, we would allow the oppositely
signed residuals to cancel. A conclusion that the model displayed no syste-
matic bias therefore might be a justifiable one. On the other hand, were
we testing for gross error, the signs of the residuals would not be considered,
with oppositely signed residuals no longer allowed to cancel. Because the
absolute value of the residuals is large in our example, we might well con-
clude that the model predictions are subject to significant gross error.
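The distinction can be made concrete with a few hypothetical residuals; the two quantities
below correspond to the average deviation (bias) and the average absolute deviation (gross
error) discussed elsewhere in this report:

    # Prediction-minus-observation residuals at four stations (hypothetical values).
    residuals = [4.0, -4.0, 3.0, -3.0]

    bias = sum(residuals) / len(residuals)                          # signed values cancel: 0.0
    gross_error = sum(abs(r) for r in residuals) / len(residuals)   # magnitudes do not cancel: 3.5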
The fourth of the desirable performance attributes is that of temporal
correlation. When this is important, can the model reproduce the temporal
variation displayed by the observational data? A model might be judged as
being capable of doing so if its predictions varied in phase with observa-
tion, that is, if they were "correlated." The fifth desirable attribute is
that of spatial alignment. At each time of interest, does the model pre-
dict a concentration field that is distributed spatially like the observed
one? To determine this, correlation of prediction with observation could
be assessed at several points in the concentration field, e.g., monitoring
stations.
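One way temporal correlation, and the "offset" variant listed in Table V-4, might be checked
is sketched below: the predicted and observed hourly series at a station are correlated at
several time lags. The function names and the lag range are illustrative only.

    def pearson(x, y):
        """Pearson correlation coefficient between two equal-length series."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    def offset_correlations(predicted, observed, max_lag=3):
        """Correlation when the predicted series is shifted by 'lag' hours relative to the observed."""
        results = {}
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                p, o = predicted[lag:], observed[:len(observed) - lag]
            else:
                p, o = predicted[:len(predicted) + lag], observed[-lag:]
            results[lag] = pearson(p, o)
        return results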
The five performance attributes are interrelated. Suppose, for instance,
that our model does not reproduce well the photochemistry of ozone formation
in the atmosphere. Not only could its estimates of the concentration peak
be in error, but also its temporal correlation and spatial alignment might
be poor. Even if the model predicted the peak properly, problems might still
exist. If the chemistry were "fast," the peak, though correct, might be pre-
dicted to occur sooner than that actually observed. Even if atmospheric
transport were properly modeled, performance measures might then "detect"
temporal and spatial problems.
By treating each performance attribute separately, we may run the risk
of rejecting a model on several grounds where only a single reason actually
exists. For example, slight errors in the wind field input to the model
might result in predictions apparently wrong both spatially and temporally.
Yet, only a single defect exists, in this case not due to the model at all.
Nevertheless, we adopt a conservative viewpoint. We suggest evaluating
the model separately for the presence of each attribute, even though they
themselves may be interrelated. Redundancy should not result in a satis-
factory model being unfairly rejected. If model predictions are good, they
will be acceptable both spatially and temporally. If they are poor, they
will probably be rejected, both for temporal and spatial reasons.
If model performance is mixed, showing, for example, good temporal cor-
relation but poor spatial alignment, two possibilities exist. Either the
model performance may not be particularly poor or the performance measure
used to detect one or the other performance attribute is deficient (too
stringent or too lenient). In either case, however, forcing model perfor-
mance to be reassessed makes sense. On balance, while requiring a model to
"jump the hoop" twice may be redundant in looking for the same problem, it
should provide us a measure of safety in the "double-check" it provides, pre-
suming each attribute assumes the same importance (see the discussion below).
Although they are interrelated, the five model performance attri-
butes are distinct. Consequently, we must employ different kinds of per-
formance measures to determine the presence of each attribute. While we
defer to Section D a statement of specific measures we recommend using, we
list in Table VI-1 their objectives.
We have identified five model performance attributes. Which of these,
however, is most important? This question has no unique answer, the rela-
tive importance in each problem depending on the type of issue the model
is being used to address and the type of pollutant under consideration.
In order to relate attribute importance to application issue in a more con-
venient manner, we present in Table VI-2 a matrix of generic issue class
(as defined earlier in this report) and problem type. For each combination
TABLE VI-1. PERFORMANCE MEASURE OBJECTIVES
Performance Attribute            Objective of Performance Measures
Accuracy of the peak prediction  Assess the model's ability to predict the concentra-
                                 tion peak (its level, timing, and location)
Absence of systematic bias       Reveal any systematic bias in model predictions
Lack of gross error              Characterize the error in model predictions both at
                                 specific monitoring stations and overall
Temporal correlation             Determine differences between predicted and observed
                                 temporal behavior
Spatial alignment                Uncover spatial misalignment between the predicted
                                 and observed concentration fields
TABLE VI-2. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY ISSUE
Performance Attribute
Accuracy of the peak
prediction
Absence of systematic
bias
Lack of gross error
Temporal correlation
Spatial alignment
Importance of Performance Attribute*
SIP/C AQMP PSD NSR OSR EIS/R LIT
1111211
1
1
1
1
1
1
1
1
2
2
1
2
2
1
3
1
1
3
3
1
3
3
1
3
3
1
3
3
* Category 1 - Performance standard must always be satisfied.
Category 2 - Performance standard should be satisfied, but some leeway
may be allowed at the discretion of a reviewer.
Category 3 - Meeting the performance standard is desirable but failure
is not sufficient to reject the model; measures dealing
with this problem should be regarded as "informational."
we indicate an "importance category." We define the three categories based
upon how strongly we insist our model demonstrate the presence of a given
attribute. For Category 1, we require that performance standards always
be satisfied (the problem type is of prime importance). For Category 2,
we state that the standard should be satisfied but some leeway ought to
be allowed, perhaps at the discretion of a reviewer (while the problem type
is of considerable importance, some degree of "mismatch" may be tolerable).
For Category 3, we are not insistent that standards be met, though we state
that as being a desirable objective (the problem type is not of central
importance).
A number of assumptions are embedded in Table VI-2. Among the more
significant are the following:
> Both peak and "far-field" concentrations are of interest
in considering PSD and NSR questions.
> Specific-source issues (PSD, NSR, OSR, EIS/R and LIT) most
often deal with sources assumed to be continuously emitting
at a constant level (or nearly so); consequently, performance
measures considering time variations between prediction
and observation are not the principal measures of interest.
> Spatial agreement between prediction and observation is par-
ticularly important in applications where PSD is an issue;
this is so because source impact on pristine areas (Class I)
and elevated terrain (Class II) often occurs well downwind
of the source, with the magnitude and incidence of impact
highly directional and spatially dependent.
> Specific-source impact generally occurs in a narrow downwind
plume; thus, the monitoring network set up to provide measure-
ment data often consists of only a few stations; as a result,
the calculation of all-station performance measures may not
prove meaningful.
> Error is less important in considering regional issues than is
the presence of a systematic bias.
> To achieve and maintain compliance with the NAAQS (SIP/C, AQMP),
alternate control strategies must be developed and evaluated.
For this to be done properly, some degree of spatial resolution
should be attained by the model and verified.
The relative importance of each performance attribute is dependent
on the type of pollutant being considered and the averaging time required
by the NAAQS. If a species is subject to a short-term standard, for
instance, accuracy of the peak prediction and temporal correlation might
be of considerable concern, depending on the issue being addressed. How-
ever, if the species is subject to a long-term standard, neither of these
problem types are of appropriate form. We indicate in Table VI-3 a matrix
of the performance attributes and pollutant species. We rank each combina-
tion by the same importance categories we used earlier in Table VI-2.
Conceivably, a conflict might exist between the ranking indicated by the
issue and the pollutant matrices in Tables VI-2 and VI-3. We suggest resolving
the conflict in favor of the less stringent of the two rankings. For example,
suppose the issue being addressed was SIP/C and the pollutant being considered
was CO. According to Table VI-2, the accuracy of the peak prediction should
be regarded as Category 1 (the standard must always be satisfied). However,
according to Table VI-3, it should be considered as Category 2 (the standard
should be satisfied but some leeway may be allowed). The conflict should
be resolved by allowing the combined issue/pollutant ranking to be Category 2.
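A minimal sketch of this resolution rule, treating the numerically larger category as the less
stringent one; the two dictionary entries shown are just the values quoted in the example
above, not a transcription of the full tables:

    # Importance categories: 1 = standard must be satisfied, 2 = some leeway allowed, 3 = informational.
    issue_category = {("SIP/C", "accuracy of the peak prediction"): 1}      # from Table VI-2 (example entry)
    pollutant_category = {("CO", "accuracy of the peak prediction"): 2}     # value quoted in the text's example

    def combined_category(issue, pollutant, attribute):
        # Resolve conflicts in favor of the less stringent (larger) category number.
        return max(issue_category[(issue, attribute)],
                   pollutant_category[(pollutant, attribute)])

    assert combined_category("SIP/C", "CO", "accuracy of the peak prediction") == 2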
D. RECOMMENDED MEASURES AND STANDARDS
In this section we reach a major goal of this report: We identify a
recommended set of performance measures and propose rationales for setting
standards for each. Our discussion in this section unfolds as follows.
First, we isolate a candidate list of performance measures from which we
select the recommended set. Then, we detail several rationales on which to
base standards for our "preferred" measures. Using these we identify
specific "guiding principles" from which standards may be set. In a final
TABLE VI-3. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY
POLLUTANT AND AVERAGING TIME‡

Pollutants with Short-term Standards
Performance                  O3        CO**      NMHC*     SO2       NO2       CO
Attribute                    (1 hour)  (1 hour)  (3 hour)  (3 hour)  (†)       (8 hour)
Accuracy of the
peak prediction                 1         1         1         1         1         1
Absence of
systematic bias                 1         1         1         1         1         1
Lack of gross
error                           1         1         1         1         1         1
Temporal
correlation                     1         2         2         2         1         2
Spatial
alignment                       1         2         2         2         1         2

Pollutants with Long-term Standards
Performance                  TSP**      SO2**      NO2       TSP       SO2
Attribute                    (24 hour)  (24 hour)  (1 year)  (1 year)  (1 year)
Accuracy of the
peak prediction                 1          1          3         3         3
Absence of
systematic bias                 1          1          1         1         1
Lack of gross
error                           1          1          1         1         1
Temporal
correlation                     3          3         N/A††     N/A       N/A
Spatial
alignment                       2          2          2         2         2

* Category 2 - Performance standard should be satisfied, but some leeway may be allowed at the discretion of a reviewer.
  Category 3 - Meeting the performance standard is desirable but failure is not sufficient to reject the model.
† No short-term NO2 standard currently exists.
‡ Averaging times required by the NAAQS are in parentheses.
** Primary standards.
†† The performance attribute is not applicable.
synthesis, we present a summary table listing, for each performance attri-
bute, the recommended measures and a means for setting standards for them,
along with a sample value for the standard (ones listed are appropriate
for the Denver case study described in Section E of this chapter).
1. Recommended Performance Measures
Of the many performance measures considered in Chapter V (and in more
detail in Appendix C), which of these are most suitable for use in establishing
standards for model performance? The answer to this is constrained in two
major ways, the first conceptual and the second practical. First, the con-
ceptual constraint is imposed by the types of performance attributes we are
concerned with: The measures must adequately assess the presence or absence
of each of the five attributes. Second, the practical constraint is imposed
by the "sparseness" of the observational data: Since station observations
constitute the only data available for characterizing "true" ambient con-
ditions, we have little choice but to employ station performance measures
in determining model acceptability.
We draw a distinction between those measures that are of general use
in examining model performance and the much smaller subset of them that is
most amenable to the establishment of explicit standards. Many measures
can provide rich insight into model behavior but the information is conveyed
in a qualitative way not suitable for quantitative characterization (a
requisite for use in setting performance standards). These "measures,"
often involving graphical display, really are tools for use in "pattern
recognition." They display model behavior in suggestive ways, highlighting
"patterns" whose presence reveals much about model performance. Several
examples of such "measures" are isopleth contour maps of predicted concen-
trations and estimated "observed" ones, isopleth contour maps of the dif-
ferences between the two, and time histories of predicted and observed con-
centrations at specific monitoring stations.
Though we focus on station measures for use in setting model performance
standards, we do not suggest the calculation of performance measures be
limited to them. Many others, where each is appropriate, should be used.
The data should be viewed in as many, varied ways as possible in order to
enrich insight into model behavior. We suggest a number of useful measures
both in Chapter V and Appendix C.
Given that station measures are our "preferred" (rather, our "unavoid-
able") choice, we now consider the list of candidate measures. From these
we select our final recommended set. We present the candidate station per-
formance measures in Table VI-4. We group them by the number of stations
compared noting the performance attribute and generic issue class they are
most suited for addressing. We identify four types of comparisons:
> Event Specific Values. Predicted and observed concentra-
tions are compared at the time a specific event occurs.
For instance, the peak station prediction can be compared
with the peak station observation, even though these may
occur at different stations and times.
> Comparative Values. Predicted and observed concentrations
are compared at the same monitoring station.
> Average Values. Predicted and observed concentrations are
compared as averages over all monitoring stations.
> Offset Values. Observed concentrations at a given station
are compared with predicted values offset by a small amount
spatially (values at near-by stations) and/or temporally (values
at other times, either earlier or later).
Performance measures are of two different kinds: "absolute" and
"informational." The first type includes those measures for which we can
set specific, absolute standards. Measures of the second type are more
informational in nature, providing qualitative insight into model performance.
Their values are to be considered as "advisory," having associated with them
no specific standard.
TABLE VI-4. CANDIDATE STATION PERFORMANCE MEASURES

Issue categories: multiple-source (SIP/C, AQMP) and specific-source (PSD, NSR, OSR*, EIS/R, LIT).
Each of the measures below is marked as applicable to both multiple-source issues.

Peak Stations (Event-Specific Values)
  Accuracy of the peak prediction (concentration level):
    1. Difference between or ratio of peak station concentrations (could be at different
       measurement stations) (Absolute)
    2. Difference between or ratio of predicted and observed concentrations at the station
       recording the maximum measured value (Absolute)
  Accuracy of the peak prediction (location of peak):
    3. Spatial displacement between predicted and observed peak stations (Informational)
  Accuracy of the peak prediction (timing of peak):
    4. Timing difference between occurrence of predicted and observed peak (Absolute)

Each Station Separately (Comparative Values)
  Absence of systematic bias:
    5. Average relative deviation (Absolute)
  Lack of gross error:
    6. Average absolute relative deviation (Absolute)
    7. Standard deviation of deviations (Absolute)
  Temporal correlation/spatial alignment:
    8. Correlation coefficient (Absolute)
  Temporal correlation:
    9. Temporal offset correlation coefficient (Informational)
    10. Plots of comparative time histories (Informational)

All Stations Together (Average Values)
  Absence of systematic bias:
    11. Average relative deviation (Absolute)
  Lack of gross error:
    12. Average absolute relative deviation (Absolute)
    13. Standard deviation of deviations (Absolute)
  Temporal correlation/spatial alignment:
    14. Correlogram of prediction-observation pairs (Informational)
    15. Ratio of peak to average deviation (Informational)
    16. Correlation coefficient (Absolute)
TABLE VI-4 (Concluded)

Nearby Stations (Offset Values)
  Temporal correlation; spatial alignment:
    17. Temporal offset correlation coefficient (Informational)
    18. Plot of comparative time histories (Informational)
    19. Spatial offset correlation coefficient (comparison at the same time) (Informational)
    20. Spatial/temporal offset correlation coefficient (comparison at different times) (Informational)

* These measures are appropriate if offsets are considered at the level of ambient
  concentrations rather than primary emissions.
Often in practice modeling predictions are known with greater spatial
resolution than measurement data. The predicted concentration field, for
instance, can be resolved at intervals of several kilometers or less by
various types of models, including grid and Gaussian ones. To retain the
information contained in concentration field predictions, several "hybrid"
performance measures can be employed. With these, concentration field
predictions are compared with station measurements. We list in Table VI-5
several of these hybrid measures. When predictions are available in this
more detailed form, these measures may be calculated to supplement those in
Table VI-4.
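One way such a hybrid comparison might be set up is sketched below: a gridded prediction
field is sampled at each monitoring station location (here by nearest grid cell) before station
residuals are formed. The grid spacing, coordinates, and names are purely illustrative.

    def sample_field_at_stations(field, x0, y0, dx, dy, stations):
        """field[j][i] holds the predicted concentration in grid cell (i, j);
        (x0, y0) is the grid origin and dx, dy are the cell dimensions in km;
        stations maps a station name to its (x, y) coordinates in km."""
        sampled = {}
        for name, (x, y) in stations.items():
            i = int(round((x - x0) / dx))      # nearest-cell sampling
            j = int(round((y - y0) / dy))
            sampled[name] = field[j][i]
        return sampled

    # Hypothetical 3 x 3 field (pphm) on a 2 km grid, with two monitoring stations.
    field = [[3.0, 4.0, 5.0],
             [4.0, 6.0, 7.0],
             [5.0, 7.0, 9.0]]
    predicted_at_stations = sample_field_at_stations(field, 0.0, 0.0, 2.0, 2.0,
                                                     {"A": (1.9, 0.2), "B": (3.8, 4.1)})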
Our recommended choice of performance measures is based upon the
following criteria:
> The measure is an accurate indicator of the presence of a
given performance attribute.
> The measure is of the "absolute" kind, that is, specific
standards can be set.
> Only station measures should be considered for use in
setting standards. (This is more an unavoidable choice
than a preferred one.)
Based on these criteria, we have selected the set of measures described
in Table VI-6. The use of ratios (the ratio of the predicted to the observed
station peak concentration, for example) can introduce
difficulties: They can become unstable at low concentrations, and the sta-
tistics of a ratio of two random variables can become troublesome. Neverthe-
less, when used properly, their advantages can be offsetting. For example,
the use of the peak ratio instead of the peak difference (predicted minus
observed) permits a health effects rationale to
be used in recommending a performance standard (see a later discussion of the
effects rationale).
Before continuing, however, we insert an important caveat. For calcu-
lation of these measures to be statistically meaningful, a certain minimum
level of spatial and temporal "richness" must be available from monitoring
data. Often, this criterion is met for multiple-source, urban applications.
However, for isolated point source applications, it may not be. For such
cases, data inadequacies may be overcome by using prototypical "test bed"
data bases for the purposes of model verification. Selection of the
proper "test bed" could be accomplished by choosing the prototypical data
base that describes an application most nearly like the proposed one.
VI-18
-------
These data bases, where they do not already exist, could be assembled
through special measurement efforts at existing large point sources. Mon-
itoring could be extensive enough to insure adequate data "richness."
As a practical matter, however, such "test beds" are not currently
available. Verification instead must be conducted using whatever data are
at hand. These may be provided by tracer experiments. Alternatively,
where a source already exists (for instance, where retrofit of pollution
control equipment is the issue or where construction of a new source is
to occur on the site of an existing source), some site-specific data already
may be available.
Considerable care should be exercised when using such data to calcu-
late the performance measures listed in Table VI-6. If the data are too
"sparse," in either a spatial or a temporal sense, these measures may be
of little value, or worse yet, may actually be misleading. Additional
work needs to be conducted to identify, if possible, supplementary perfor-
mance measures for use when the available data are inadequate for reliable
use of the recommended measures.
Having stated the above caveat, we continue. A number of key assump-
tions are embedded in the choice of the specific measures shown in Table
VI-6. We state several of them:
> Concentration gradients within a pollutant cloud can be
"steep". Thus a slight spatial misalignment of the cloud,
perhaps an unconsequential problem on its own, can sometimes
result in the predicted peak occurring at a different
monitoring station than the measured peak. Estimating the
value of the concentration peak, however, is often of
much greater importance than predicting its exact location.
VI-19
-------
TABLE VI-5. USEFUL HYBRID PERFORMANCE MEASURES

Peak Station (Event-Specific Values)
  Accuracy of the peak prediction (concentration level):
    1. Difference between or ratio of predicted peak concentration and highest
       station value (Absolute)
  Accuracy of the peak prediction (location of peak):
    2. Spatial displacement between the predicted peak and the station measuring
       the highest value (Informational)
  Accuracy of the peak prediction (timing of peak):
    3. Timing difference between occurrence of the predicted peak and the maximum
       station measurement (Informational)

Each Station Separately (Comparative Values)
  Spatial alignment:
    4. Plot showing for each hour during the day the distance and direction from
       the measurement station to the nearest point at which a predicted
       concentration occurs equal to the station measured value (Informational)

All Stations Together (Average Values)
  Lack of gross error:
    5. Difference for each hour between the average predicted concentration
       (averaged over the entire field) and the average station measurement
       (averaged over all stations) (Informational)
    6. Difference for each hour between the standard deviations of the predicted
       concentrations and the station measured values (Informational)

(Issue categories: Multiple-Source: SIP/C, AQMP; Specific-Source: PSD, NSR, OSR, EIS/R, LIT.)
VI-20
-------
TABLE VI-6. MEASURES RECOMMENDED FOR USE IN SETTING MODEL PERFORMANCE STANDARDS†

Accuracy of the peak prediction
  Ratio of the predicted station peak to the measured station peak
  (could be at different stations and times): Cpp/Cpm
  Difference in timing of occurrence of station peak*: Δtp

Absence of systematic bias
  Average value and standard deviation of the mean deviation about the perfect
  correlation line normalized by the average of the predicted and observed
  concentrations, calculated for all stations during those hours when either the
  predicted or the observed values exceed some appropriate minimum value
  (possibly the NAAQS): (μ_d, σ_d)_OVERALL

Lack of gross error
  Average value and standard deviation of the absolute deviation about the
  perfect correlation line normalized by the average of the predicted and
  observed concentrations, calculated for all stations during those hours when
  either the predicted or the observed values exceed some appropriate minimum
  value (possibly the NAAQS): (μ_|d|, σ_|d|)_OVERALL

Temporal correlation*
  Temporal correlation coefficients at each monitoring station for the entire
  modeling period and an overall coefficient averaged for all stations:
  r_t,i and r_t,OVERALL for 1 ≤ i ≤ M monitoring stations

Spatial alignment
  Spatial correlation coefficients calculated for each modeling hour considering
  all monitoring stations, as well as an overall coefficient averaged for the
  entire day: r_x,j and r_x,OVERALL for 1 ≤ j ≤ N modeling hours

* These measures are appropriate when the chosen model is used to consider questions
  involving photochemically reactive pollutants subject to short-term standards.
† There is deliberate redundancy in the performance measures. For example, in
  testing for systematic bias, μ_d and σ_d are calculated. The latter quantity
  is a measure of "scatter" about the perfect correlation line. This is also an
  indicator of gross error and could be used in conjunction with μ_|d| and σ_|d|.
VI-21
-------
Consequently, we suggest, when this seems reasonable (judg-
ment is necessary here), comparing the peak station pre-
diction with the peak station measurement, regardless of
when or where they both occur.
> In addressing questions involving pollutants subject to short-term
standards, diurnal variation occurs in concentration levels. It is
reasonable to insist that short-term predictions emulate that pattern.
Differences in the timing of the peak should be considered (particularly
for photochemically reactive pollutants) and temporal correlation
should be evaluated.
> In many circumstances, percentage differences between predicted
and observed concentrations seem better indicators of model
performance than gross differences. For instance, a difference
of 0.04 ppm of ozone might be regarded as serious if ambient
levels were 0.10 ppm where it might not be if those levels were
0.24 ppm. The use of such measures can cause some problems:
Ratios can become unstable at low concentrations, and the statistics
of a ratio of two random variables can be complex. Neverthe-
less, percentage differences should be calculated (possibly
along with gross differences). Further, we suggest that residuals
(prediction minus observation) be taken about the perfect correla-
tion line (prediction equals observation), since we have no a
priori reason to regard observation as any more accurate than
prediction. This was pointed out by Anderson et al. (1977). We
also suggest normalizing the residuals by the arithmetic
average of the predicted and observed concentration.
> The concentrations of greatest interest are often the higher
values, that is, those that exceed some appropriate minimum
value (possibly the NAAQS, though this may differ from one
situation to another). We may be less interested in model
reliability below those levels. We suggest that performance
measures include only those prediction-observation "pairs" where
one or the other value exceeds the chosen minimum value. (Possibly
"stratification" may be of interest, that is, repeating the calcu-
lation of measures using different minimum values).
VI-22
-------
This should not be done, however, if it results in the
number of pairs being reduced below the number required
for statistical significance.
> Measurement stations usually are widely spaced. We assumed
this spacing to be so great that the use of spatial/temporal
offset correlation coefficients would be of uncertain value.
Consequently, we did not include them among the list of
measures recommended for use.
> Redundancy should be built into the calculation of per-
formance measures. This provides an internal means for
double-checking results. For example, in testing for
systematic bias, μ_d and σ_d are calculated. The latter quan-
tity is a measure of "scatter" about the perfect correla-
tion line. This is also an indicator of gross error and
should be used in conjunction with μ_|d| and σ_|d|.
2. Recommended Performance Standards
Having identified the performance measures requiring a specific
standard, we now consider four alternative rationales for setting those
standards. We designate the four as follows:
> Health Effects
> Control Level Uncertainty
> Guaranteed Compliance
> Pragmatic/Historic
The guiding principles for each of these rationales are stated in
Table VI-7.
We describe in detail each rationale in Appendix D, deferring their
technical description in order not to interrupt the flow of this chapter.
However, to offer insight into their general nature, we present here a
brief outline of each.
VI-23
-------
TABLE VI-7. POSSIBLE RATIONALES FOR SETTING MODEL PERFORMANCE STANDARDS

Rationale: Health Effects
Guiding Principle: The metric of concern is the area-integrated cumulative
health effects due to pollutant exposure; the ratio of the metric's value
based on prediction to its value based on observation must be kept to within
a prescribed tolerance of unity.

Rationale: Control Level Uncertainty
Guiding Principle: Uncertainty in the percentage of emissions control required
must be kept within certain allowable bounds.

Rationale: Guaranteed Compliance
Guiding Principle: Compliance with the NAAQS must be "guaranteed;" all
uncertainty must be on the conservative side even if it means introducing a
systematic bias.

Rationale: Pragmatic/Historic
Guiding Principle: In each new application a model should perform at least as
well as the "best" previous performance of a model in its generic class in a
similar application; until such a historical data base is complete, other more
heuristic approaches may be applied.
> Health Effects. The most fundamental reason for setting
air quality standards is to limit the adverse health impact
the regulated pollutants (and their products) produce.
Thus, founding a model performance standard on a health
effects basis has strong intuitive appeal. To do so, we
assume an analytic form for urban population distribution
and an exposure/dosage health effects functional, both
of which require as inputs only easily derived data. Using
these, we determine in analytic form a new health-based
metric: the area-integrated cumulative health effects. We
estimate through this metric the total health burden experi-
enced by the population during the day. The model is required
to predict concentrations that do not differ from observations
to the point that an unacceptable difference is seen in the
health metric. While the data used are application-specific,
the method itself is general. The assumptions made in deriving
VI-24
-------
this rationale, while extensive, seem plausible. A sample case
was conducted for ozone exposure in the Denver Metropolitan
region, with promising corroboration of the rationale in several
key regards. The sample case is described in detail in
Appendix D.
> Control Level Uncertainty. With this rationale we set perfor-
mance standards to ensure that uncertainty in estimates of the
amount of pollution control required be kept within acceptable
bounds. These limits may be determined in a number of ways,
but we consider limits on uncertainty in control cost as a
promising means for doing so. If we can assume that pollutant
production and evolution over the modeled region can be approxi-
mated by some simple surrogate, such as an isopleth diagram
for ozone, then control uncertainty limits can be directly and
easily related to equivalent bounds in uncertainty in the pol-
lutant peak, the quantity to which control strategies are often
designed.
> Guaranteed Compliance. The NAAQS are written in quite
specific terms and must ultimately be complied with. An
argument can be made that to "guarantee" such compliance,
uncertainty in model predictions must be on the "conser-
vative" side. That is, the probability must be accept-
ably small that a control strategy designed based on model
predictions will not actually achieve compliance. We con-
sider this rationale here and in Appendix D primarily for
completeness. While the rationale has some potential
usefulness, it implies the introduction of a systematic
bias into modeling results, something we would hope to
avoid in a final choice of a performance standard.
> Pragmatic/Historic. Standards for all performance measures
cannot be derived based on the rationales mentioned above,
something we will discuss later in this chapter. Until
additional research expands our options by providing insight
into other rationales, we adopt a pragmatic approach. We
may proceed in either of two ways. If we are able to state
VI-25
-------
heuristically a specific guiding principle for setting a
standard for a particular measure, we invoke it. Otherwise,
we simply require the following: In each new application
a model should perform at least as well as the "best" pre-
vious performance of a model in its generic class in a
similar application. In addition to being pragmatic, this
last approach is also evolutionary, requiring a continually
expanding and updated model/application data base.
The four rationales differ in their usefulness vis-a-vis the five
performance attributes. Shown in Table VI-8 are the attributes addressable
by measures whose standards are set by each of the rationales. Only the
Pragmatic/Historic rationale is of use in addressing all attributes;
the other three are of use principally in defining the level of performance
required in predicting values at or near the concentration peak. The Health
Effects and Guaranteed Compliance rationales also may have some application
to problems involving concentration field error.
TABLE VI-8. PERFORMANCE ATTRIBUTES ADDRESSABLE USING PERFORMANCE STANDARD RATIONALES

Performance Attribute             Health*    Control Level*   Guaranteed    Pragmatic/
                                  Effects    Uncertainty      Compliance    Historic
Accuracy of the peak prediction      X             X               X            X
Absence of systematic bias                                                      X
Lack of gross error                  X                             X            X
Temporal correlation                                                            X
Spatial alignment                                                               X

* These are most suited for photochemically reactive pollutants subject
  to short-term standards.
VI-26
-------
One conclusion seems clear. Unless more comprehensive rationales are
developed in subsequent research work, several must be used simultaneously
to completely define standards of performance. Any one of the four can be
used to specify allowable bounds on model performance in predicting peak
concentrations. Either the Health Effects or the Pragmatic/Historic ration-
ales can be helpful in setting standards for error measures. Only the latter
of these two rationales is of use for addressing attributes of the other types.
We associate in Table VI-9 each rationale with those generic issues
for which its use is appropriate. Several assumptions are embedded in
that table. Among them are the following:
> Health effects are not of overriding concern in PSD and OSR
issues, for reasons noted earlier. (Even though we indicate
such a rationale may be used in addressing other specific-
source issues, we observe that plume "narrowness" can limit
downwind health impact).
> Near-source peak concentrations are not of primary interest
in OSR, but rather "far-field" average values.
> The Guaranteed Compliance rationale is of use in addressing
questions involving PSD as long as the air quality standards
being used are the PSD class increments.
TABLE VI-9. ASSOCIATION OF RATIONALES WITH GENERIC ISSUES

                               Issue Category
                        Multiple-Source            Specific-Source
Rationale               SIP/C    AQMP      PSD    NSR    OSR    EIS/R    LIT
Health Effects            X        X               X              X       X
Control Level             X        X        X      X      X       X       X
  Uncertainty
Guaranteed                X        X        X      X              X
  Compliance
Pragmatic/                X        X        X      X      X       X       X
  Historic

VI-27
-------
Having outlined the rationales we consider in this report, it remains
to match them with the set of performance measures we recommended earlier
in this chapter. As is clear from Table VI-8, we have no alternative but
to apply the Pragmatic/Historic rationale for those measures designed to
test for systematic bias or to evaluate temporal behavior and spatial align-
ment. However, several alternatives exist for measures dealing with peak
performance and gross error.
We select in the following ways from among the alternatives. Hoping to
avoid introducing a procedural bias, we first eliminate the Guaranteed Com-
pliance rationale from further consideration. Then, because the Health
Effects rationale is better suited for use in setting standards for peak-
accuracy measures, we choose to use it only in that way.
Our recommended choice for use in establishing standards for peak-
accuracy measures is a composite one, combining the Health Effects and Control
Level Uncertainty rationales. Were a model to overpredict the peak, a
control strategy designed based on its prediction might be expected to abate
the health impact actually occurring. If the model underpredicted, however,
the control strategy might be "underdesigned," with the risk existing that
some of the health impact might remain unabated even after control implemen-
tation. The penalty, in a health sense, is incurred only when the model
underpredicts. The Health Effects rationale then is one-sided, helping us
set performance standards only on the "low side."
On the other hand, the Control Level Uncertainty rationale is bounded
"above" and "below", that is, its use provides a tolerance interval about the
value of the measured peak concentration. For a model to be judged accept-
able under this criterion, its prediction of the peak concentration would
have to fall within this interval. Model underprediction could lead to
control levels lower than required, but residual health risks. Overpre-
diction, on the other hand, could lead to abatement strategies posing little
or no health risk but incurring control costs greater than required.
VI-28
-------
For the above reasons, we suggest that the Control Level Uncertainty
rationale be used to establish an upper bound (overprediction) on the
acceptable difference between the predicted and observed peak. We would
choose the lower bound (underprediction) to be the interval that is the
minimum of that suggested by the Health Effects and Control Level Uncertainty
rationales.
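A minimal sketch of how such a composite acceptance interval for the peak ratio might be assembled is given below. The bound values are hypothetical (the 80 to 150 percent interval echoes the Denver example discussed later), and reading "the minimum of the two intervals" as the more restrictive of the two lower bounds is an assumption of this sketch, not a statement of the report's procedure.

    def peak_acceptance_interval(he_lower, clu_lower, clu_upper):
        """Upper bound (overprediction) from Control Level Uncertainty;
        lower bound (underprediction) taken as the more restrictive
        (larger) of the Health Effects and Control Level Uncertainty
        lower bounds."""
        return max(he_lower, clu_lower), clu_upper

    # Hypothetical bounds, expressed as fractions of the measured peak.
    alpha, beta = peak_acceptance_interval(he_lower=0.80, clu_lower=0.70,
                                           clu_upper=1.50)

    def peak_ratio_acceptable(c_pp, c_pm):
        """Test whether the peak ratio Cpp/Cpm falls in [alpha, beta]."""
        ratio = c_pp / c_pm
        return alpha <= ratio <= beta

    print(peak_ratio_acceptable(c_pp=9.9, c_pm=10.0))   # True: ratio = 0.99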
We list our recommendations in Table VI-10, noting the possibility for
peak-accuracy measures that the recommended rationales may not be appropriate
in all applications for all pollutants. Whether health effects would be an
appropriate consideration when considering TSP, for instance, is unclear.
The Health Effects rationale is best suited for use in urban applications
involving short-term, reactive pollutants. In those circumstances when the
HE or CLU rationales are not suitable, we suggest the Pragmatic/Historic
rationale.
TABLE VI-10. RECOMMENDED RATIONALES FOR SETTING STANDARDS

Performance Attribute           Recommended Rationale
Accuracy of peak prediction     Health Effects* (lower side/underprediction)
                                Control Level Uncertainty* (upper side/overprediction)
Absence of systematic bias      Pragmatic/Historic
Lack of gross error             Pragmatic/Historic
Temporal correlation            Pragmatic/Historic
Spatial alignment               Pragmatic/Historic

* These may not be appropriate for all regulated pollutants in all applica-
  tions. When they are not, the Pragmatic/Historic rationale should be
  employed. They are most applicable for photochemically reactive pollu-
  tants subject to a short-term standard (O3 and NO2, if a 1-hour standard
  is set).
VI-29
-------
3. Summary Table of Recommended Measures and Standards
Until now, our discussion has remained general when relating performance
measures and standards. Here we become specific. In Table VI-11, we sum-
marize, for each of the five problem types whose presence we are testing for,
the performance measures we recommend and the standards we suggest. Since
the actual value of the standard may vary from one application to another
or between pollutant types, we present sample values calculated based on a
sample case. The example is appropriate for consideration of SIP/C in the
Denver Metropolitan region and is described in a case study fashion in Section
E of this chapter.
Where we invoke the Pragmatic/Historic rationale as justification for
selecting specific standards, we also state the specific guiding principle
we followed. We summarize those here:
> When the pollutant being considered is subject to a short-
term standard, the timing of the concentration peak may be an
important quantity for a model to predict. This is parti-
cularly true when the pollutant is also photochemically
reactive. We state as a guiding principle: "For photochem-
ically reactive pollutants, the model must reproduce reason-
ably well the phasing of the peak." For ozone an acceptable
tolerance for peak timing might be ± 1 hour.
> The model should not exhibit, at concentrations at or above
some appropriate minimum value (possibly the NAAQS), any
systematic bias greater than the maximum resulting from EPA-
allowable calibration error. We would consider in our calcu-
lations any prediction-observation pair in which either of
the values exceeds the pollutant standard. Error (as
measured by its mean and standard deviation) should be
indistinguishable from the distribution of differences
resulting from the comparison of an EPA-acceptable monitor
with an EPA reference monitor. The EPA has set maximum
allowable limits on the amount by which a monitoring technique
may differ from a reference method (40 CFR §53.20). An
VI-30
-------
TABLE VI-11. SUMMARY OF RECOMMENDED PERFORMANCE MEASURES AND STANDARDS

Performance Attribute: Accuracy of the peak prediction
  Performance Measure: Ratio of the predicted station peak to the measured
    station peak (could be at different stations and times), Cpp/Cpm
  Type of Rationale: Health Effects (lower side) combined with Control Level
    Uncertainty (upper side)
  Guiding Principle: Limitation on uncertainty in aggregate health impact and
    pollution abatement costs†
  Sample Value (Denver Example): 80 ≤ Cpp/Cpm ≤ 150 percent

  Performance Measure: Difference in timing of occurrence of station peak*, Δtp
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: Model must reproduce reasonably well the phasing of the
    peak, say, ±1 hour
  Sample Value (Denver Example): ±1 hour

Performance Attribute: Absence of systematic bias
  Performance Measure: Average value and standard deviation of the mean
    deviation about the perfect correlation line, normalized by the average of
    the predicted and observed concentrations, calculated for all stations
    during those hours when either predicted or observed values exceed some
    appropriate minimum value (possibly the NAAQS): (μ_d, σ_d)_OVERALL
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: No or very little systematic bias at concentrations
    (predictions or observations) at or above some appropriate minimum value
    (possibly the NAAQS); the bias should not be worse than the maximum bias
    resulting from EPA-allowable monitor calibration error (-8 percent is a
    representative value for ozone); the standard deviation should be less than
    or equal to that of the difference distribution of an EPA-acceptable
    monitor** compared with a reference monitor (3 pphm is representative for
    ozone at the 95 percent confidence level)
  Sample Value (Denver Example): No apparent bias at ozone concentrations above
    0.06 ppm (see Table VI-12 and Figures VI-5 and VI-6 for further details)

Performance Attribute: Lack of gross error
  Performance Measure: Average value and standard deviation of the absolute
    mean deviation about the perfect correlation line, normalized by the
    average of the predicted and observed concentrations, calculated for all
    stations during those hours when either predicted or observed values exceed
    some appropriate minimum value (possibly the NAAQS): (μ_|d|, σ_|d|)_OVERALL
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: For concentrations at or above some appropriate minimum
    value (possibly the NAAQS), the error (as measured by the overall values of
    μ_|d| and σ_|d|) should be indistinguishable from the difference resulting
    from comparison of an EPA-acceptable monitor with a reference monitor
  Sample Value (Denver Example): No excessive gross error (see Table VI-12 and
    Figures VI-5 and VI-6 for further details)

Performance Attribute: Temporal correlation*
  Performance Measure: Temporal correlation coefficients at each monitoring
    station for the entire modeling period and an overall coefficient for all
    stations: r_t,i and r_t,OVERALL for 1 ≤ i ≤ M monitoring stations
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: At a 95 percent confidence level, the temporal profile of
    predicted and observed concentrations should appear to be in phase (in the
    absence of better information, a confidence interval may be converted into
    a minimum allowable correlation coefficient by using an appropriate
    t-statistic)
  Sample Value (Denver Example): For each monitoring station,
    0.69 ≤ r_t ≤ 0.97; overall, r_t,OVERALL = 0.88. In this example a value of
    r ≥ 0.53 is significant at the 95 percent confidence level

Performance Attribute: Spatial alignment
  Performance Measure: Spatial correlation coefficients calculated for each
    modeling hour considering all monitoring stations, as well as an overall
    coefficient for the entire day: r_x,j and r_x,OVERALL for 1 ≤ j ≤ N
    modeling hours
  Type of Rationale: Pragmatic/Historic
  Guiding Principle: At a 95 percent confidence level, the spatial distribution
    of predicted and observed concentrations should appear to be correlated
  Sample Value (Denver Example): For each hour, -0.43 ≤ r_x ≤ 0.66; overall,
    r_x,OVERALL = 0.17. In this example a value of r ≥ 0.71 is significant at
    the 95 percent confidence level

* These measures are appropriate when the chosen model is used to consider questions
  involving photochemically reactive pollutants subject to short-term standards.
† These may not be appropriate for all regulated pollutants in all applications. When
  they are not, the Pragmatic/Historic rationale should be employed.
** The EPA has set maximum allowable limits on the amount by which a monitoring
  technique may differ from a reference method. An "EPA-acceptable monitor" is
  defined here to be one that differs from a reference monitor by up to the maximum
  allowable amount.
VI-31
-------
"EPA-acceptable monitor" is defined here to be one that
differs from a reference monitor by up to the maximum
allowable amount.
> Prediction and observation should appear to be correlated
at a 95 percent confidence level, both when compared
temporally and spatially. We can estimate the minimum
allowable value for the respective correlation coef-
ficient by using a t-statistic at the appropriate per-
centage level and having the degrees of freedom required
by the number of prediction-observation pairs.
The guiding principles noted above are plausible ones, though in some
cases they are arbitrary. As a "verification data base" of experience is
assembled, historically achieved performance levels may be better indicators
of the expected level of model performance. Standards derived on this more
pragmatic basis may supplant those deriving from the "guiding principles"
followed in this report.
4. Formulas for Calculating Performance Measures and Standards
A number of performance measures are recommended in Table VI-6. Here
we state explicitly the equations used for their calculation and the forms
assumed by the standards. We include, where appropriate, brief theoretical
justifications for these relationships.
The definitions are self-explanatory for measures testing the accuracy
of the peak model prediction. Specifically,
    α ≤ Cpp/Cpm ≤ β ,    (VI-1)

where Cpp is the peak station prediction, Cpm is the peak station measurement,
α is the lower bound on the ratio of the peaks, and β is the upper bound.
The bounds may be determined either from Pragmatic/Historic considerations
VI-32
-------
or, where possible, by means of the Health Effects/Control Level Uncertainty
rationales described in Appendix D. The latter of these two approaches may
prove feasible only when considering photochemically reactive pollutants
(particularly ozone) subject to a short-term standard. Also, for such
reactive species,
    |Δtp| < δ ,    (VI-2)

where |Δtp| is the absolute value of the difference between the predicted
and observed times of the station peak, and δ is the maximum allowable dif-
ference, say, one hour (this is an arbitrarily set value).
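The following sketch illustrates how these two peak-accuracy quantities might be computed from arrays of station predictions and observations. The peak is taken over all stations and hours, consistent with the suggestion that the prediction and measurement peaks need not coincide in space or time; the bound values alpha, beta, and delta_hours are illustrative placeholders only.

    import numpy as np

    def peak_accuracy(predicted, observed, alpha=0.8, beta=1.5, delta_hours=1):
        """predicted and observed are arrays of shape (stations, hours).
        Returns the peak ratio Cpp/Cpm, the timing difference in hours,
        and whether both fall within the illustrative bounds."""
        c_pp = predicted.max()                               # peak station prediction
        c_pm = observed.max()                                # peak station measurement
        t_p = np.unravel_index(predicted.argmax(), predicted.shape)[1]
        t_m = np.unravel_index(observed.argmax(), observed.shape)[1]
        ratio = c_pp / c_pm                                  # quantity bounded in Eq. VI-1
        dt = abs(int(t_p) - int(t_m))                        # quantity bounded in Eq. VI-2
        return ratio, dt, (alpha <= ratio <= beta) and (dt <= delta_hours)

    # Hypothetical 3-station, 5-hour example (concentrations in pphm).
    pred = np.array([[4., 6., 9., 8., 5.],
                     [3., 5., 7., 6., 4.],
                     [2., 4., 6., 5., 3.]])
    obs = np.array([[4., 7., 10., 8., 5.],
                    [3., 5.,  8., 6., 4.],
                    [2., 4.,  6., 5., 3.]])
    print(peak_accuracy(pred, obs))   # ratio 0.9, timing difference 0 hours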
Underlying our definitions of bias and error is the following assump-
tion: A priori, we have no reason to prefer either prediction or observa-
tion as a better measure of reality. Both, in fact, can be subject to sig-
nificant uncertainty. It follows from this assumption that residuals (pre-
dicted concentrations minus observed ones) should be taken perpendicularly
about the perfect correlation line.
We emphasize an important point: The residual for a given prediction-
observation pair is not the geometric distance from the perfect correlation
line, as displayed in a correlogram (such as the one shown later in
Figure VI-4). Rather, the geometric distance must be scaled downward by a
factor of √2. That this is so follows from the discussion presented below.
It is based on our requirement that prediction and observation differ by no
more than the maximum amount by which an EPA-acceptable monitoring technique
may differ from the accepted reference technique.
Uncertainty in monitoring results can be introduced from many sources.
Three principal source categories are the calibration method, the agreement
with the reference monitoring technique, and the actual instrument error.
The last of these categories includes instrument noise and precision, mea-
surement drift, and interference from other contaminants. In defining the
characteristics of the EPA-acceptable monitor we wish to use as a standard,
VI-33
-------
we have chosen to include only the first two error source categories. We
thus eliminate the need to consider performance characteristics of specific
monitoring instruments. Also, in comparing a monitor with an instrument
using the EPA-accepted reference monitoring technique, it is not unreason-
able to assume that both are subject to the same instrument error.
We may define an acceptance standard for a model insofar as error
and bias are concerned: The distribution of differences between prediction
and observation must be indistinguishable from that resulting from the com-
parison of an EPA-acceptable monitor with the accepted reference monitor.
Specifically, we define "indistinguishable" to mean

    |μ_d| ≤ ξ ,    (VI-3)

    σ_d ≤ ε ,    (VI-4)

where ξ and ε can be determined from federal regulations (40 CFR §53.20)
for instrument performance, and μ_d and σ_d are defined below.
We may confirm a model's acceptability by hypothesizing that the
acceptance standard for bias and error is satisfied and checking to deter-
mine whether this hypothesis is violated. Consistent with this approach,
we may assume that each prediction and observation pair are random samples
drawn from the same distribution, the one that describes the behavior of
an EPA-acceptable monitor with respect to a reference monitor. The stan-
dard deviation (S.D.) of a random variable whose value is the difference of
two other random variables, each having the same S.D. σ, may be expressed as

    σ_diff = √2 σ .    (VI-5)

The geometric distance from the perfect correlation line, d_i, may
be written as

    d_i = (P_i - M_i)/√2 ,    (VI-6)
VI-34
-------
where P_i and M_i are the i-th prediction-observation pair. We are search-
ing for a test variable σ_d to compare with σ. Therefore, referring to
Equation VI-5, we see that we must divide d_i by √2 to obtain the properly
scaled mean deviation from the perfect correlation line, d*_i, that is,

    d*_i = (P_i - M_i)/2 .    (VI-7)

Thus, the average and standard deviation of the mean deviation may be
expressed as

    μ_d = (1/N) Σ_{i=1..N} d*_i ,    (VI-8)

    σ_d = [ (1/(N-1)) Σ_{i=1..N} (d*_i - μ_d)² ]^(1/2) .    (VI-9)
These quantities may be compared with those characterizing the distri-
bution of differences between an EPA-acceptable monitor and a reference
instrument. Those values may be derived from 40 CFR §53.20. As an example,
(see Burton, et al., 1976) an EPA-acceptable monitor for ozone/oxidants
could have a -8 percent bias and a 95 percent confidence interval of
±3 pphm (a σ of 1.53 pphm). If an EPA-acceptable monitor were defined to
be subject to instrument error as well, the -8 percent bias would remain
because it is assumed due to calibration, but the 95 percent confidence
interval would increase to ±7 pphm (a σ of 3.57 pphm).
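The conversion from a 95 percent confidence-interval half-width to a standard deviation quoted above can be sketched as follows, assuming a normal difference distribution; the routine reproduces the 1.53 pphm and 3.57 pphm values.

    from scipy.stats import norm

    def sigma_from_ci_halfwidth(halfwidth, confidence=0.95):
        """Convert a two-sided confidence-interval half-width into the
        standard deviation of an assumed normal difference distribution."""
        z = norm.ppf(0.5 + confidence / 2.0)   # about 1.96 for a 95 percent interval
        return halfwidth / z

    print(round(sigma_from_ci_halfwidth(3.0), 2))   # ~1.53 pphm
    print(round(sigma_from_ci_halfwidth(7.0), 2))   # ~3.57 pphm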
We noted earlier that the "seriousness" of the magnitude of a given
residual depends on the ambient concentration of the pollutant being con-
sidered. For instance, a value for d* of 2 pphm might be considered of
less importance when ambient concentrations are on the order of 30 pphm
than when they are 10 pphm. In consideration of this effect, we suggest
VI-35
-------
normalizing residuals by the arithmetic average of the predicted and
observed concentrations for a given pair. This is consistent with our
earlier statement that, a priori, we have no reason to prefer observa-
tion over prediction as an inherently better indicator of reality.
Defining the average concentration for a given pair to be

    C_AVE,i = (P_i + M_i)/2 ,    (VI-10)

we may write expressions for the normalized average and standard deviation
of the mean deviation about the perfect correlation line:

    μ_d = (1/N) Σ_{i=1..N} (d*_i / C_AVE,i) ,    (VI-11)

    σ_d = [ (1/(N-1)) Σ_{i=1..N} (d*_i/C_AVE,i - μ_d)² ]^(1/2) .    (VI-12)
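A brief sketch of these bias measures (scaled deviations, pair averages, and their normalized mean and standard deviation, screened by a minimum concentration) follows. The prediction-observation pairs and the 8 pphm screening value used in the example are hypothetical.

    import numpy as np

    def bias_measures(predicted, observed, minimum=None):
        """Normalized mean deviation about the perfect correlation line.
        Pairs are kept only if either member is at or above `minimum`
        (for example, the NAAQS)."""
        p = np.asarray(predicted, dtype=float)
        m = np.asarray(observed, dtype=float)
        if minimum is not None:
            keep = (p >= minimum) | (m >= minimum)
            p, m = p[keep], m[keep]
        d_star = (p - m) / 2.0          # scaled deviation, as in Eq. VI-7
        c_ave = (p + m) / 2.0           # pair average, as in Eq. VI-10
        r = d_star / c_ave              # normalized deviation
        return r.mean(), r.std(ddof=1), len(r)

    # Hypothetical hourly prediction-observation pairs (pphm), screened at 8 pphm.
    pred = [9.5, 7.0, 12.0, 3.0, 8.5]
    obs = [9.0, 8.0, 11.0, 4.0, 9.5]
    mu_d, sigma_d, n_pairs = bias_measures(pred, obs, minimum=8.0)
    print(n_pairs, round(mu_d, 3), round(sigma_d, 3))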
A deliberate redundancy has been built into the list of suggested per-
formance measures. Both μ_d and σ_d are measures of "scatter" about the
perfect correlation line. Thus, they are also indicators of gross error
and may be used in conjunction with those measures explicitly listed in
Table VI-6 for use in investigating gross error. These measures consider
absolute rather than signed residuals. Specifically, the normalized
average value and standard deviation of the absolute deviation about the
perfect correlation line may be written

    μ_|d| = (1/N) Σ_{i=1..N} |d*_i| / C_AVE,i ,    (VI-13)

    σ_|d| = [ (1/(N-1)) Σ_{i=1..N} (|d*_i|/C_AVE,i - μ_|d|)² ]^(1/2) .    (VI-14)

Their values may be compared with standards such that

    μ_|d| ≤ Λ ,    (VI-15)

    σ_|d| ≤ Υ ,    (VI-16)

where the values of Λ and Υ may be derived from instrument performance
specifications in federal regulations.
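A companion sketch for the gross-error measures follows; it applies the same minimum-value screening and compares the results against bound values, which here are arbitrary placeholders rather than values derived from 40 CFR §53.20.

    import numpy as np

    def gross_error_measures(predicted, observed, minimum=None):
        """Normalized average and standard deviation of the absolute
        deviation about the perfect correlation line."""
        p = np.asarray(predicted, dtype=float)
        m = np.asarray(observed, dtype=float)
        if minimum is not None:
            keep = (p >= minimum) | (m >= minimum)
            p, m = p[keep], m[keep]
        abs_dev = (np.abs(p - m) / 2.0) / ((p + m) / 2.0)   # |d*_i| / C_AVE,i
        return abs_dev.mean(), abs_dev.std(ddof=1)

    def meets_standard(mu_abs, sigma_abs, mu_bound, sigma_bound):
        """Compare the error measures against bounding values; the bounds
        are application-specific and here purely illustrative."""
        return mu_abs <= mu_bound and sigma_abs <= sigma_bound

    mu_abs, sigma_abs = gross_error_measures([9.5, 7.0, 12.0], [9.0, 8.0, 11.0],
                                             minimum=8.0)
    print(meets_standard(mu_abs, sigma_abs, mu_bound=0.20, sigma_bound=0.20))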
It may be helpful to visualize the definitions of d*_i and C_AVE geomet-
rically on a correlogram. Figure VI-1 is a schematic, showing the orien-
tation of the d*-C_AVE axes with respect to the P-M axes of the correlogram.
The C_AVE axis is aligned with the perfect correlation line, and both the
d* and C_AVE axes are scaled downward by a factor of √2 from the P and
M axes.

FIGURE VI-1. ORIENTATION AND SCALING OF C_AVE AND d* AXES
             ON A PREDICTION-OBSERVATION CORRELOGRAM
VI-37
-------
Finally, we consider measures suitable for use in testing for tem-
poral correlation and spatial alignment. The former of these is of con-
cern when the chosen model is used to consider questions involving photo-
chemically reactive pollutants subject to a short-term standard. We sug-
gest the use of temporal correlation coefficients, whose values are
defined to be
    r_t,i = [ (1/N) Σ_{j=1..N} (P_ij - μ_P,i)(M_ij - μ_M,i) ] / (σ_P,i σ_M,i) ,

    r_t,OVERALL = (1/K) Σ_{i=1..K} r_t,i ,    (VI-17)

where r_t,i is the temporal correlation coefficient at the i-th station for
the N divisions of the modeling period, and r_t,OVERALL is the average correla-
tion coefficient for all the K monitoring stations. Also, μ_P,i and σ_P,i
are the mean and standard deviation of the predictions for the N hours at the
i-th station. Similarly, μ_M,i and σ_M,i are the mean and standard deviation
of the measured concentrations at the i-th station.
In testing for spatial alignment, we recommend using the following
spatial correlation coefficients:
    r_x,j = [ (1/K) Σ_{i=1..K} (P_ij - μ_P,j)(M_ij - μ_M,j) ] / (σ_P,j σ_M,j) ,

    r_x,OVERALL = (1/N) Σ_{j=1..N} r_x,j ,    (VI-18)
VI-38
-------
where r_x,j is the spatial correlation coefficient at the j-th hour for the
K monitoring stations, and r_x,OVERALL is the average correlation coefficient
for all the N modeling period divisions (e.g., hours). Also, μ_P,j and σ_P,j
are the mean and standard deviation of the predictions for the K stations at
the j-th hour. Similarly, μ_M,j and σ_M,j are the mean and standard deviation
of the measured concentrations at the j-th hour.
As for the form of the standard, we would require that

    r_t, r_x ≥ r_min ,

where r_min is defined at the 95 percent confidence level, perhaps using
a t-statistic if no better method is apparent.
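The following sketch computes the temporal and spatial correlation coefficients for a station-by-hour array and a minimum significant coefficient from a t-statistic. Under that approach, 14 hourly pairs give a threshold near 0.53 and 8 station pairs give one near 0.71, consistent with the Denver sample values quoted in Tables VI-11 and VI-12; the array shapes and data are otherwise hypothetical.

    import numpy as np
    from scipy.stats import t

    def r_min(n_pairs, confidence=0.95):
        """Minimum correlation coefficient significant at the given
        (two-sided) confidence level, from a t-statistic with n - 2
        degrees of freedom."""
        df = n_pairs - 2
        t_crit = t.ppf(0.5 + confidence / 2.0, df)
        return t_crit / np.sqrt(t_crit**2 + df)

    def correlation_coefficients(predicted, observed):
        """Temporal coefficients r_t,i (one per station, across hours) and
        spatial coefficients r_x,j (one per hour, across stations) for
        arrays of shape (stations, hours), plus the overall averages."""
        r_t = np.array([np.corrcoef(predicted[i], observed[i])[0, 1]
                        for i in range(predicted.shape[0])])
        r_x = np.array([np.corrcoef(predicted[:, j], observed[:, j])[0, 1]
                        for j in range(predicted.shape[1])])
        return r_t, r_t.mean(), r_x, r_x.mean()

    # The Denver example uses 14 modeling hours per station and 8 stations
    # per hour.
    print(round(r_min(14), 2))   # ~0.53, the temporal threshold
    print(round(r_min(8), 2))    # ~0.71, the spatial threshold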
E. A SAMPLE CASE: THE SAI DENVER EXPERIENCE
In Section D we recommended a set of measures and standards for use in
evaluating model performance. Here we illustrate how these measures might
actually be used in practice. To do so, we draw on SAI experience in model-
ing the Denver metropolitan region (Anderson et al . , 1977) using the grid-
based SAI Airshed Model (Ames et al., 1978). We first show for the sample
case the values we calculate for the performance measures; then we discuss
how to interpret their meaning.
1. The Denver Modeling Problem
Over the past several years, Region VIII of the EPA has prepared an
Overview EIS assessing the impact on the Denver metropolitan region of the
proposed construction of twenty-two separate wastewater treatment projects.
Adopting a regional approach, they assessed the projected impact of the
facilities in several key ways, among which was their effect on air quality.
They contracted with SAI in late 1976 to conduct that portion of the
assessment. SAI employed several air quality models, one a long-term
climatological model (CDM) and the other a short-term photochemical model
(the SAI Airshed Model). We consider the latter of these in our sample
case.
VI-39
-------
The grid-based Airshed Model is fully three-dimensional and capable
of simulating concentrations of up to 13 chemical species, including ozone,
nitrogen dioxide and several types of reactive hydrocarbons. The modeling
grid chosen for overlaying the Denver Metropolitan region was 30 miles by
30 miles, subdivided horizontally into grid cells two miles on a side.
In cooperation with local agencies, SAI assembled meteorological
information (spatial and temporal profiles of temperature and inversion
height, as well as wind speeds and directions) characterizing atmospheric
conditions on several summertime test days, 29 July 1975, 28 July 1976,
and 3 August 1976. Also, gridded emissions inventories were compiled
(hourly by species) for those days as were estimates for the years 1985
and 2000. Simulations were then conducted, with projections also made
of air quality in those two future years.
2. Values of the Performance Measures
We compare in this sample case the predicted and observed concentra-
tions of ozone at each monitoring station in the regional measurement net-
work. The issues we address are SIP/C and AQMP. On the test date we have
chosen, 28 July 1976, eight monitoring stations provided ozone concentra-
tion data. Their locations are shown in Figure VI-2. Of the nine sta-
tions, all but CAMP provided usable ozone measurements. Data were
recorded as hourly averages for each hour throughout the day.
The Airshed Model generates its predictions as grid cell-averaged
hourly concentrations. Through interpolation, these values may then be
used to estimate station predictions (concentrations at fixed points
rather than grid cell averages). Plotted in Figure VI-3 are the predicted
and observed ozone concentrations at each of the eight stations reporting
on the modeled day (Anderson, et al., 1977). From the station predictions
and observations, we can calculate performance measure values. We present
the values of these measures in Table VI-12. We indicate in the table
how these values might be interpreted in evaluating model performance,
considering each in more detail below.
VI-40
-------
KEY
NG - Northglenn                                NJ - National Jewish Hospital
WE - Welby                                     GM - Green Mountain
AR - Arvada                                    OV - Overland
CR - C.A.R.I.H.                                PR - Parker Road
CM - Continuous Air Monitoring Program [CAMP]
FIGURE VI-2. LOCATIONS OF MONITORING STATIONS IN THE DENVER METROPOLITAN REGION
VI-41
-------
(Curves of observed and predicted ozone concentration versus time of day,
by hourly interval, for each station.)
FIGURE VI-3.
PREDICTED AND OBSERVED OZONE CONCENTRATIONS
AT EACH MONITORING STATION DURING THE DAY
(DENVER, 28 JULY 1976)
.VI-42
-------
TABLE VI-12. SAMPLE VALUES FOR MODEL PERFORMANCE STANDARDS (DENVER EXAMPLE)

Performance Attribute: Accuracy of the peak prediction
  Composite Importance Category*: 1
  Performance Measures: Ratio of predicted to measured station peaks, Cpp/Cpm;
    timing of the peak†, Δtp
  Performance Standard: 80 ≤ Cpp/Cpm ≤ 150 percent; ±1 hour
  Calculated Value: 99 percent; +1 hour
  Interpretation: Peak performance of the model is satisfactory. The timing of
    the peak is satisfactory; since the model provides only hourly averages,
    this is as finely as it can be determined.

Performance Attribute: Absence of systematic bias
  Composite Importance Category*: 1
  Performance Measure: Average value and standard deviation of the mean
    deviation about the perfect correlation line, normalized by the average of
    the predicted and observed concentrations
  Performance Standard: For concentrations (predicted or observed) at or above
    the NAAQS, the bias should not be greater than the maximum bias resulting
    from EPA-allowable monitor calibration error. A -8 percent bias (not
    normalized) is representative, which for this case is μ = -0.4 pphm and
    σ = 1.53 pphm for an EPA-acceptable monitor‡ (see Burton, et al., 1976)
    when all concentrations are considered. An EPA-acceptable monitor can have
    an uncertainty with respect to a reference monitor of as much as ±3 pphm
    for ozone at a 95 percent confidence level.
  Calculated Value: For concentrations greater than the NAAQS (8.0 pphm),
    μ = 4.1% and σ = 19.4%. For all concentrations, μ = -23.4% and σ = 33.5%.
    In a form suitable for comparison with non-normalized instrument bias,
    μ = -0.52 pphm and σ = 1.22 pphm when all concentrations are considered.
  Interpretation: For concentrations at or above the NAAQS, a slight positive
    bias exists, though within acceptable bounds. When all concentrations are
    considered, a larger negative bias seems to exist. Put in a form suitable
    for comparison with an EPA-allowable monitor,‡ however, the bias appears to
    be indistinguishable from that resulting from maximum allowable calibration
    error. Overall, no conclusion of unacceptably high bias would seem
    justified.

Performance Attribute: Lack of gross error
  Composite Importance Category*: 2
  Performance Measure: Average value and standard deviation of the absolute
    mean deviation about the perfect correlation line, normalized by the
    average of the predicted and observed concentrations
  Performance Standard: For concentrations at or above the NAAQS, the error
    should be indistinguishable from the distribution of error resulting from
    comparison of an EPA-acceptable monitor‡ with a reference monitor.
    Representative values for an EPA-acceptable monitor (-8 percent bias;
    ±3 pphm at a 95 percent confidence level) might be estimated to be
    μ_|d| = 1.22 pphm and σ_|d| = 0.95 pphm. Note that these values are based
    on non-normalized deviations.
  Calculated Value: For concentrations greater than the NAAQS (8.0 pphm),
    μ_|d| = 15.7% and σ_|d| = 19.4%. For all concentrations, μ_|d| = 31.5% and
    σ_|d| = 33.5%. In a form suitable for comparison with non-normalized
    instrument error, μ_|d| = 1.12 pphm and σ_|d| = 0.72 pphm.
  Interpretation: For concentrations at or above the NAAQS, the error seems to
    be about half of what is seen if all concentrations are considered. The
    model thus appears to be subject to less error at the higher concentration
    range. We can determine the acceptability of this error level by converting
    to a non-normalized form for comparison with an estimate of that resulting
    from use of an EPA-acceptable monitor.‡ Even when all concentrations are
    considered, the error in model predictions appears to be less than that
    resulting from monitoring technique differences. We conclude that the model
    performance is acceptably good insofar as error is concerned.

Performance Attribute: Temporal correlation†
  Composite Importance Category*: 2
  Performance Measure: Temporal correlation coefficients at each monitoring
    station and an overall coefficient (the all-station average): r_t,i and
    r_t,OVERALL for 1 ≤ i ≤ M monitoring stations
  Performance Standard: At a 95 percent confidence level, predicted and
    observed concentrations should appear to be correlated. Using a t-statistic
    to estimate the minimum acceptable correlation coefficient, in this example
    we find r_t,min = 0.53.
  Calculated Value: For each monitoring station, 0.69 ≤ r_t ≤ 0.97. Overall,
    r_t,OVERALL = 0.88.
  Interpretation: For all stations and overall, predicted and observed
    concentrations appear to be correlated. The model performance appears to be
    within acceptable bounds.

Performance Attribute: Spatial alignment
  Composite Importance Category*: 2
  Performance Measure: Spatial correlation coefficients for each modeling hour
    and an overall coefficient for the entire day (the all-hours average):
    r_x,j and r_x,OVERALL for 1 ≤ j ≤ N modeling hours
  Performance Standard: At a 95 percent confidence level, predicted and
    observed concentrations should appear to be correlated. Using a t-statistic
    to estimate the minimum acceptable correlation coefficient, in this example
    we find r_x,min = 0.71.
  Calculated Value: For each modeling hour, -0.44 ≤ r_x ≤ 0.66. Overall,
    r_x,OVERALL = 0.17.
  Interpretation: During none of the hours considered (all daylight hours) do
    prediction and observation appear to be correlated at the 95 percent
    confidence level. Model predictions appear to be spatially misaligned,
    although the presence of temporal correlation suggests that the
    misalignment may not be a serious problem. (Another interpretation may be
    correct: either r_x is too stringent a measure of spatial alignment or r_t
    is too lenient a measure of temporal behavior. Only by additional research,
    however, will we be able to confirm or refute this.)

* The composite importance category is determined by consulting Tables VI-2 and
  VI-3 for the appropriate issue and pollutant/averaging time (in this example,
  SIP/C and ozone/one-hour averaging time). The composite category is the less
  stringent of the two importance rankings.
† These measures are appropriate when the chosen model is used to consider
  questions involving photochemically reactive pollutants subject to short-term
  standards.
‡ An "EPA-acceptable monitor" is defined here to be one that differs from a
  monitor using the EPA reference technique by up to the maximum allowable
  amount.
-------
3. Interpreting the Performance Measure Values
Briefly, we summarize the conclusions suggested by the model perfor-
mance measures. First, even though the predicted and observed concentra-
tion peaks occur at different monitoring stations and times (North Glenn
at 2-3 p.m. versus Welby at 1-2 p.m.), their values agree quite closely,
well within the acceptable tolerance.
Second, systematic bias appears to remain within acceptable limits.
We can demonstrate this graphically, first by plotting prediction-
observation pairs in a correlogram (see Figure VI-4) and then by plotting the
normalized mean deviation about the perfect correlation line as is done
in Figure VI-5. From this latter figure (suggested by Anderson, et al.,
1977) we see that the Airshed Model, while systematically underpredicting
at concentration levels below 4.5 pphm, does not appear subject to such
bias at concentrations above that level. Incidentally, recent internal
studies at SAI have indicated that the Denver region may be subject to
background concentrations as high as 4 pphm (Anderson, 1978), values
substantially higher than those supplied as input to the Airshed Model.
Also, we may compare the deviations about the perfect correlation line
to those that we would expect from comparison of an EPA-acceptable
monitor with a monitor using the EPA reference technique (normally
distributed, -8 percent bias, ± 3 pphm at the 95 percent confidence level —
see Burton, et al., 1976). This comparison is shown in Figure VI-6. To
aid in presenting this graphical comparison, we have converted deviations
to the non-normalized form. We observed that the means (a measure of syste-
matic bias) of both are nearly the same and that the standard deviation of
prediction-observation deviations is somewhat less than that of the monitor-
ing error distribution.
Third, consistent with our conclusions about systematic bias, gross
error also appears to be within tolerable bounds. We show in Figure VI-7
the distribution of non-normalized error, that is, the absolute deviation
of predictions and observations from the perfect correlation line. For
reference we also" estimate the corresponding distribution resulting from
VI-45
-------
(Scatter of observation-prediction pairs; horizontal axis: P = Predicted O3
Concentration (pphm), with the NAAQS marked; vertical axis: observed O3
concentration (pphm).)
FIGURE VI-4. CORRELOGRAM OF OZONE OBSERVATION-PREDICTION
PAIRS FOR SAMPLE CASE (DENVER, 28 JULY 1976)
VI-46
-------
(Normalized deviation about the perfect correlation line plotted against
Average Ozone Concentration (pphm), (Predicted + Observed)/2.)
FIGURE VI-5.
NORMALIZED DEVIATIONS ABOUT THE PERFECT CORRELATION LINE AS A FUNCTION
OF OZONE CONCENTRATION (DENVER, 28 JULY 1976)
-------
(Distributions shown: deviation of predicted versus observed points from the
perfect correlation line (111 one-hour-averaged data points; mean marked,
std. dev. = 1.22 pphm); (true-instrumental) EPA-acceptable monitor (mean
bias = -8 percent; ±3 pphm at 95 percent confidence level); and
(true-instrumental) maximum probable error (mean bias = -8 percent; ±7 pphm
at 95 percent confidence level). Horizontal axis: non-normalized deviation
(pphm).)
FIGURE VI-6.
NON-NORMALIZED OZONE DEVIATIONS ABOUT THE PERFECT CORRELATION LINE
COMPARED WITH INSTRUMENT ERRORS (DATA FOR 14 HOURS AND 8 STATIONS,
DENVER, 28 JULY 1976)
-------
(Distributions shown: absolute deviation of predicted and observed points from
the perfect correlation line (111 one-hour-averaged data points; mean marked,
std. dev. = 0.72 pphm); and (true-instrumental) EPA-acceptable monitor (mean
bias = -8 percent; ±3 pphm at 95 percent confidence level). Horizontal axis:
non-normalized error (pphm).)
FIGURE VI-7.
NON-NORMALIZED OZONE ABSOLUTE DEVIATIONS ABOUT THE PERFECT CORRELATION LINE
COMPARED WITH INSTRUMENT ERROR (DATA FOR 14 HOURS AND 8 STATIONS, DENVER,
28 JULY 1976)
-------
comparison of an EPA-acceptable monitor with an EPA reference instrument. We
see that the mean value and standard deviation of the prediction-observation
"error" are both somewhat less than those resulting from instrument differ-
ences. The conclusion suggests itself that gross error is within acceptable
bounds, though we caution that the shape of the instrument difference curve
is an estimate and needs to be analyzed in further detail.
Fourth, temporal behavior at each monitoring station seems satisfac-
tory, appearing correlated to better than the requisite 95 percent con-
fidence level. We note that the correlation we have observed provides
information only about the "shape" of the concentration profiles (shown
in Figure VI-3), not their absolute level. In general, predicted concen-
trations rise and fall when observed values do, though the concentration
values might be quite different. Only by examining bias and error per-
formance measures can we draw conclusions about concentration levels.
Fifth, spatial alignment does not appear to be acceptably good.
During none of the 14 hours considered do the spatial patterns of pre-
dictions and observations appear to be correlated at the 95 percent
confidence level. In fact, for a number of hours, the correlation seems
quite poor. Two possible explanations exist. Either the spatial cor-
relation coefficient is too "stringent" or the predicted concentration
field in fact is misaligned. Since temporal correlation appears strong,
the lack of corresponding spatial correlation is somewhat surprising,
though countervailing errors responsible for this conceivably could be
present. It is also possible that the temporal correlation coefficient
either is too "lenient" or it should not be computed including concentra-
tions at all daylight hours. Presently, we do not know which of these
explanations is correct, noting only that it is a subject for future
investigation. Conceivably, measurement data errors could also be contri-
buting to the problem.
In this example, we can examine model predictions for spatial mis-
alignment. To do so, we conducted an informal experiment among several
of our staff. In general, reconstructing the "true" concentration
VI-50
-------
field from a "sparse" set of observational data is a difficult and uncer-
tain process. Nevertheless, we attempted, using only station measurement
data, to draw isopleth maps showing contours of constant concentration
values. The process, of course, is a highly subjective one, requiring the
person doing the drawing to make a number of judgmental and often arbi-
trary decisions. In this case, a useful result was achieved.
None of the participants in the experiment were able to draw unam-
biguous isopleth maps for those hours when overall concentrations were low
(before 11 in the morning and after 3 in the afternoon). However, while
they varied widely in their estimates during the four "peak hours" of the
configurations for lower outlying concentration isopleths, each agreed
reasonably well on their estimates of the location of the peak. We com-
pare in Figure VI-8 a "ground-trace" of their composite estimates with
the peak locations predicted by the Airshed Model.
We observe that the ground-traces of the predicted and observed peaks
differ, both in direction and speed of drift. This suggests that either
the model has had some difficulty in simulating atmospheric dispersion
or it is being driven by inputs that imperfectly characterize ambient
conditions on the modeling day. Based on a generally favorable model
performance rating, as judged by the other four types of measures, we
feel the latter of these two explanations is more likely.
The model input most likely to have caused the alignment problem
is the temporally and spatially varying wind field. By comparing the
ground-trace of the predicted peak with the directions and speeds of pre-
vailing winds that we input to the Airshed model, we confirmed that the
wind field did indeed appear to be "forcing" the predicted pollutant
cloud in just the direction noted in Figure VI-8.
We emphasize that this does not confirm that "errors" in the input
wind field were responsible for the spatial misalignment, but the evi-
dence is suggestive. Final confirmation or refutation would come by
VI-51
-------
(Map of the region showing the predicted and measured peak ground-traces,
labeled by time of day, e.g., 1200-1300.)
FIGURE VI-8.
GROUND-TRACES OF THE PREDICTED AND OBSERVED PEAK OZONE
CONCENTRATIONS (DENVER, HOURS 1100-1200 TO 1400-1500
LOCAL STANDARD TIME, 28 JULY 1976)
VI-52
-------
rerunning the Airshed Model using a wind field "adjusted" to better
mirror our updated estimates of the meteorology on the modeling day. If
agreement, as evaluated by the five types of performance measures, were
"better," then we might conclude that wind field imperfections were
responsible for our misalignment problems.
F. SUGGESTED FRAMEWORK FOR A DRAFT STANDARD
We have now completed our central objective in this report: the
identification and specification of model performance measures and stan-
dards. In doing so, however, we have not solved the problem but rather
only begun a discussion that will be a continually evolving one. Almost
certainly, the specific measures and standards employed to evaluate
model performance will change as our insight and experience expands.
On balance, the most enduring benefit from this study will be the con-
ceptual structure it sets.
With that structure in mind, we discuss one final subject: a frame-
work for a draft model performance standard. We view the promulgation
of the standard as having two distinct parts: the text of the standard
itself and an accompanying guidelines document. Whereas the standard
should be quite specific about selecting and applying the performance mea-
sures to be used, there needs to be a guidelines document in which sup-
plementary discussion and examples are provided. While a full examina-
tion of the interrelationships between the two documents is beyond the
scope of the current study, we illustrate in Figure VI-9 one possible
configuration.
We focus in this discussion on suggested elements of a draft per-
formance standard. We state several of the functional sections it
should contain:
> Goals and Objectives. The reasons for insisting on model
validation should be stated, as well as a summary of
expected costs and benefits. Our objectives in conduct-
ing performance evaluation should be clearly presented.
VI-53
-------
STANDARD
- Goals and Objectives
- Overall Modeling Acceptance Criteria (e.g., "modeling must be done for
  'worst case' episode conditions")
- Determination of Performance Measures
- Specification of Performance Standards
- Calculation of Measures
- Evaluation of Model Acceptability
- Determination of Required Action

GUIDELINES
- Rationale for goals and objectives
- Guidance on checking whether the modeling effort conforms to overall
  acceptance criteria
- Supplementary guidance on proper selection and ranking of performance
  measures
- Background and statement of rationales for standards
- Additional guidance on the calculation of measures
- Guidance on interpretation of the values of the measures; case studies
- Supplementary discussion of procedural alternatives
FIGURE VI-9. POSSIBLE RELATIONSHIPS BETWEEN THE MODEL PERFORMANCE
STANDARDS AND A GUIDELINES DOCUMENT
VI-54
-------
> Overall Modeling Acceptance Criteria. Important criteria
for judging a modeling effort in an overall sense should
be clearly stated, along with the action required if any
of the criteria are not satisfied. Among possible criteria are
the following: The verification must be done for modeling
days typical of "worst case" conditions, the measurement
network must meet certain stated minimum standards
(numbers, types and configurations of the monitoring
stations), and point source models must be verified using
the appropriate prototypical data base (one representative of
an application similar to the one proposed).
Without these and perhaps other overall criteria being sat-
isfied, model evaluation would be premature.
> Determination of Performance Measures. The procedure must be
stated for determining the performance measures to be used
for model evaluation. Instructions must also be provided
for matching the importance ranking of each of the model
performance attributes to the type of issue being
addressed and the pollutant/averaging time being considered.
We might do so using the importance tables presented
earlier in this chapter and repeated for convenience as
Tables VI-13 and VI-14.
> Specification of Performance Standards. The standards must
be clearly stated for each of the performance measures to
be used. We present in Table VI-15 one format for doing
so, presenting the standards in the form of general prin-
ciples. In each instance, the actual numerical standard is
dependent on the characteristics of the specific application.
Guidance must be provided on how to determine the proper
numerical values.
> Calculation of Measures. Each measure should be defined
mathematically, accompanied by directions on precisely how
the measures are to be calculated.
VI-55
-------
TABLE VI-13. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY ISSUE

                                    Importance of Performance Attribute*
Performance Attribute            SIP/C   SIP/R   PSD   NSR   OSR   EIS/R   Lit
Accuracy of the peak prediction    1       1      1     1     2      1      1
Absence of systematic bias         1       1      1     1     1      1      1
Lack of gross error                2       2      1     1     2      1      1
Temporal correlation               2       2      3     3     3      3      3
Spatial alignment                  2       2      1     3     3      3      3

* Category 1 - Performance standard must always be satisfied.
  Category 2 - Performance standard should be satisfied, but some leeway may be
               allowed at the discretion of a reviewer.
  Category 3 - Meeting the performance standard is desirable, but failure is not
               sufficient to reject the model; measures dealing with this problem
               should be regarded as "informational."
TABLE VI-14. IMPORTANCE OF PERFORMANCE ATTRIBUTES BY POLLUTANT AND AVERAGING TIME

                                       Importance of Performance Attribute*
Pollutant                 Accuracy of the  Absence of        Lack of      Temporal     Spatial
(averaging time)‡         peak prediction  systematic bias   gross error  correlation  alignment

Pollutants with short-term standards
  Ox (1 hour)                    1                1               1            1           1
  CO** (1 hour)                  1                1               1            2           2
  NMHC (3 hour)                  1                1               1            2           2
  SO2 (3 hour)                   1                1               1            2           2
  NO2† (1 hour)                  1                1               1            1           1
  CO (8 hour)                    1                1               1            2           2
  TSP** (24 hour)                1                1               1            3           2
  SO2 (24 hour)                  1                1               1            3           2

Pollutants with long-term standards
  NO2 (1 year)                   3                1               1          N/A††         2
  TSP (1 year)                   3                1               1          N/A††         2
  SO2 (1 year)                   3                1               1          N/A††         2

 * Category 1 - Performance standard must be satisfied.
   Category 2 - Performance standard should be satisfied, but some leeway may be allowed
                at the discretion of a reviewer.
   Category 3 - Meeting the performance standard is desirable, but failure is not
                sufficient to reject the model.
 † No short-term NO2 standard currently exists.
 ‡ Averaging times required by the NAAQS are in parentheses.
** Primary standards.
†† The performance attribute is not applicable.
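The rankings in Tables VI-13 and VI-14 lend themselves to a simple tabular lookup. The following minimal sketch (in Python) is illustrative only and is not part of the procedure suggested in this chapter; the function name, the abbreviated issue labels, and the rule for combining the two tables are assumptions made for the example, and only a few of the table entries are transcribed.

# Illustrative sketch only: encoding a few entries of Tables VI-13 and VI-14
# so that the importance category of each performance attribute can be looked
# up for a given issue and pollutant/averaging time.
# Category 1 = must be satisfied; 2 = reviewer discretion; 3 = informational.

ATTRIBUTES = ["peak accuracy", "absence of bias", "lack of gross error",
              "temporal correlation", "spatial alignment"]

BY_ISSUE = {            # partial transcription of Table VI-13
    "SIP/C": [1, 1, 2, 2, 2],
    "PSD":   [1, 1, 1, 3, 1],
    "Lit":   [1, 1, 1, 3, 3],
}

BY_POLLUTANT = {        # partial transcription of Table VI-14
    ("Ox", "1 hour"):   [1, 1, 1, 1, 1],
    ("CO", "8 hour"):   [1, 1, 1, 2, 2],
    ("TSP", "24 hour"): [1, 1, 1, 3, 2],
}

def importance_ranking(issue, pollutant, averaging_time):
    """Return the importance category of each attribute.  The report does not
    prescribe a rule for combining the two tables; taking the stricter
    (numerically smaller) category is an assumption made here."""
    combined = {}
    for attr, a, b in zip(ATTRIBUTES, BY_ISSUE[issue],
                          BY_POLLUTANT[(pollutant, averaging_time)]):
        combined[attr] = min(a, b)
    return combined

print(importance_ranking("SIP/C", "TSP", "24 hour"))
# {'peak accuracy': 1, 'absence of bias': 1, 'lack of gross error': 1,
#  'temporal correlation': 2, 'spatial alignment': 2}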
VI-56
-------
TABLE VI-15. MODEL PERFORMANCE MEASURES AND STANDARDS*

Accuracy of the peak prediction
  Performance measure:  Ratio of the predicted station peak to the measured station peak
                        (the two peaks could occur at different stations), Cpp/Cmp;
                        difference in timing of occurrence of the station peak.†
  Performance standard: Limitation on uncertainty in aggregate health impact and pollution
                        abatement costs; the model must reproduce reasonably well the
                        phasing of the peak--say, within ±1 hour.

Absence of systematic bias‡
  Performance measure:  Average value and standard deviation of the mean deviation about
                        the perfect correlation line, normalized by the average of the
                        predicted and observed concentrations, calculated for all stations
                        during those hours when either the predicted or the observed values
                        exceed some appropriate minimum value (possibly the NAAQS):
                        u and σ_u (overall).
  Performance standard: No or very little systematic bias at concentrations (predictions
                        or observations) at or above some appropriate minimum value
                        (possibly the NAAQS); the bias should not be worse than the maximum
                        bias resulting from EPA-allowable calibration error (~8 percent is
                        a representative value for ozone); also, the standard deviation
                        should be less than or equal to that of the difference distribution
                        between an EPA-acceptable monitor** and an EPA reference monitor
                        (3 pphm is representative for ozone at the 95 percent confidence
                        level).

Lack of gross error‡
  Performance measure:  Average value and standard deviation of the absolute mean deviation
                        about the perfect correlation line, normalized by the average of
                        the predicted and observed concentrations, calculated for all
                        stations during those hours when either the predicted or the
                        observed values exceed some appropriate minimum value (possibly the
                        NAAQS): |u| and σ_|u| (overall).
  Performance standard: For concentrations at or above some appropriate minimum value
                        (possibly the NAAQS), the error (as measured by the overall values
                        of |u| and σ_|u|) should not be worse than the error resulting from
                        the use of an EPA-acceptable monitor.**

Temporal correlation‡
  Performance measure:  Temporal correlation coefficients at each monitoring station for
                        the entire modeling period and an overall coefficient averaged over
                        all stations: r_ti and r_overall, for 1 <= i <= M monitoring
                        stations.
  Performance standard: At a 95 percent confidence level, the temporal profiles of
                        predicted and observed concentrations should appear to be in phase
                        (in the absence of better information, a confidence interval may be
                        converted into a minimum allowable correlation coefficient by using
                        an appropriate t-statistic).

Spatial alignment
  Performance measure:  Spatial correlation coefficients calculated for each modeling hour
                        considering all monitoring stations, as well as an overall
                        coefficient averaged over the entire day: r_xj and r_overall, for
                        1 <= j <= N modeling hours.
  Performance standard: At a 95 percent confidence level, the spatial distributions of
                        predicted and observed concentrations should appear to be
                        correlated.

 * There is deliberate redundancy in the performance measures. For example, in testing for
   systematic bias, u and σ_u are calculated. The latter quantity is a measure of "scatter"
   about the perfect correlation line. This is also an indicator of gross error and should
   be used in conjunction with |u| and σ_|u|.
 ‡ These measures are appropriate when the chosen model is used to consider questions
   involving photochemically reactive pollutants subject to short-term standards.
 † These may not be appropriate for all regulated pollutants in all applications. When they
   are not, standards derived from pragmatic/historic experience should be employed.
** By "EPA-acceptable monitor" we mean a monitor that satisfies the requirements of
   40 CFR §53.20.
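To make the definitions in Table VI-15 concrete, the sketch below (in Python) shows one way the station-based measures might be computed from paired hourly predictions and observations; it is illustrative only. The array layout, the function names, and the particular normalization (deviation divided by the mean of the paired values) are assumptions of the example, since the table states the measures as general principles rather than as a fixed algorithm.

import numpy as np
from scipy.stats import t as t_dist

def peak_ratio(pred, obs):
    """Ratio of the predicted station peak to the measured station peak
    (the two peaks may occur at different stations and hours)."""
    return pred.max() / obs.max()

def bias_and_gross_error(pred, obs, cutoff):
    """Normalized deviations about the perfect correlation line, using only
    those station-hours where either value exceeds `cutoff` (e.g., the NAAQS).
    `pred` and `obs` have shape (n_stations, n_hours)."""
    mask = (pred >= cutoff) | (obs >= cutoff)
    d = 2.0 * (pred[mask] - obs[mask]) / (pred[mask] + obs[mask])
    return {"mean_bias": d.mean(),
            "sd_bias": d.std(ddof=1),
            "mean_abs_error": np.abs(d).mean(),
            "sd_abs_error": np.abs(d).std(ddof=1)}

def temporal_correlations(pred, obs):
    """Correlation over time at each station, plus the all-station average."""
    r = np.array([np.corrcoef(p, o)[0, 1] for p, o in zip(pred, obs)])
    return r, r.mean()

def spatial_correlations(pred, obs):
    """Correlation across stations for each hour, plus the all-hour average."""
    r = np.array([np.corrcoef(pred[:, j], obs[:, j])[0, 1]
                  for j in range(pred.shape[1])])
    return r, r.mean()

def minimum_allowable_r(n, confidence=0.95):
    """Convert a confidence level into a minimum allowable correlation
    coefficient using the t-statistic for testing r against zero,
    t = r*sqrt(n-2)/sqrt(1-r**2), as suggested in Table VI-15."""
    tc = t_dist.ppf(confidence, df=n - 2)
    return tc / np.sqrt(n - 2 + tc ** 2)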
VI-57
-------
> Evaluation of Model Acceptability. The rating procedure
to be used in evaluating model performance must be stated.
Guidance should be supplied on the way in which problem
importance ranking is "folded in" with the performance
rating for each of the measures.
> Determination of Required Action. The alternative actions
required of the model user, depending on the model evalua-
tion, must be stated. Among the possible alternative out-
comes of the model evaluation are the following: The model
is rated acceptable, the model requires a waiver from an
outside reviewer before acceptance can be granted (that is,
the model is deficient in some Category 2-importance problem
area), or the model is unacceptable (the model is deficient
in some Category 1-importance problem area).
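The two items above amount to a simple decision rule. The short sketch below (in Python) illustrates it; the function and attribute names are ours, and the dictionary contents are an invented example.

def required_action(results):
    """`results` maps each performance attribute to a pair
    (importance_category, standard_satisfied)."""
    if any(cat == 1 and not ok for cat, ok in results.values()):
        return "unacceptable"                                 # Category 1 deficiency
    if any(cat == 2 and not ok for cat, ok in results.values()):
        return "waiver from an outside reviewer required"     # Category 2 deficiency
    return "acceptable"

example = {"peak accuracy":        (1, True),
           "absence of bias":      (1, True),
           "lack of gross error":  (2, False),   # Category 2 problem area
           "temporal correlation": (3, False),   # informational only
           "spatial alignment":    (2, True)}
print(required_action(example))   # -> waiver from an outside reviewer required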
We end our discussion of a suitable structure for a draft performance
standard by noting that this has been only a brief encounter with an
important and complex subject. We recommend that it be examined in far
greater detail in subsequent work.
VI-58
-------
VII RECOMMENDATIONS FOR FUTURE WORK
In this study we have suggested a conceptual framework within which model
performance may be objectively evaluated. We have identified key attributes
of a well-performing model and selected performance measures for use in detect-
ing the presence or absence of each attribute. For the measures chosen for
use, we have developed explicit standards that specify the range of their
acceptable values.
Throughout, we have maintained the point of view that measures and stan-
dards of performance for models should be determined as independently as possible
of considerations about model-specific limitations and data inadequacies.
Remembering this perspective may be important when evaluating the practical
utility of the procedure suggested in this report in certain point source appli-
cations. This is particularly true when the available measurement data are
"sparse." Where data quantity and resolution (temporal and spatial) are insuf-
ficient to permit meaningful calculation of the performance measures, we view
this more as a data inadequacy that must be overcome than as a deficiency in
the model evaluation framework suggested here.
The development of a performance evaluation procedure for models is an
evolutionary process. We have advanced in this study a conceptual structure
and a first-generation procedure for conducting such an evaluation. We now
recommend ways in which development may proceed, moving from the conceptual
framework provided in this study to the realm of practical application of per-
formance evaluation procedures.
We recommend that the work begun in this study continue in several key
areas. In this chapter we outline briefly our specific recommendations, group-
ing them into three categories: areas for technical development, assessment of
institutional implications, and documents to be compiled. We consider each
category in turn.
VII-1
-------
A. AREAS FOR TECHNICAL DEVELOPMENT
A number of important technical areas remain that would benefit from
additional developmental work. We consider four key areas here.
1. Further Evaluation of Performance Measures
In this study, a sample case has been considered that permits us to
evaluate in a practical situation the utility of the recommended performance
measures in detecting the presence or absence of desirable model attributes.
However, the suitability for use of each of these measures needs further evalu-
ation over a range of circumstances. Specifically, we recommend the following:
> Additional case studies need to be considered, with perfor-
mance measures calculated for each. The choice of case studies
should be made in order to "stress" the evaluation procedure,
that is, to make any limitations apparent. The range of
case studies should include both multiple-source and specific-
source applications.
> The behavior of the suggested performance measures needs
to be assessed over a range of conditions. Alternate or supple-
mentary performance measures should be identified, if required,
so as to further extend the range of applicability of the evalua-
tion procedure suggested in this study.
> A performance measure evaluation analysis should be conducted.
Two concentration fields, initially aligned spatially and
temporally, could be progressively "degraded," that is, offset
in space or time. By observing the corresponding changes in the
values of the performance measures and the conclusions that derive
therefrom, insight could be gained into their overall suitability for use; a simple illustration of such an analysis is sketched below.
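As an illustration of the kind of analysis proposed in the last item, the sketch below (in Python) offsets an invented, perfectly aligned "prediction" series in time and records how the temporal correlation coefficient degrades; the synthetic profile and the choice of measure are assumptions of the example, and the same idea extends to spatial offsets and to the other measures.

import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24)
# An invented single-station diurnal concentration profile (arbitrary units).
observed = (8.0 + 6.0 * np.exp(-0.5 * ((hours - 14.0) / 3.0) ** 2)
            + rng.normal(0.0, 0.2, hours.size))

for offset in range(5):                    # shift the "predictions" by 0-4 hours
    predicted = np.roll(observed, offset)
    r = np.corrcoef(predicted, observed)[0, 1]
    print(f"temporal offset = {offset} h, correlation = {r:.3f}")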
2. Identification and Specification of Prototypical Point Source
"Test Bed" Data Bases
For the purposes of model evaluation in the many specific-source appli-
cations where site-specific data are either inadequate or nonexistent, a
VII-2
-------
"test-bed," or surrogate, data base is required. This data base must provide
concentration data of sufficient spatial extent and temporal frequency to
permit the calculation of meaningful values for the model performance measures.
Selection of a particular data base could be made by determining, from among
several prototypical "test beds," which derives from conditions most like those
in the proposed application. We recommend that the following work be under-
taken:
> A comprehensive list of prototypical point source situa-
tions should be compiled.
> For each prototypical situation, a "test bed" data base
should be specified and assembled.
3. Examination of Performance Evaluation Procedure in Sparse-Data
Point Source Applications
We have identified in this study several key attributes of a well-
performing model, the presence or absence of each of which may be detected
by calculating certain performance measures. However, for the values of
these measures to assume statistical significance, a certain minimum level
is required for the spatial extent and the temporal frequency of the measure-
ment data. Often, in multiple-source applications, such a minimum level is
attained, particularly in urban areas with well-developed monitoring networks.
In specific-source applications, though, a minimum acceptable level of data
may not be attained. To overcome this problem, we have suggested that proto-
typical point source data bases be assembled for the purposes of model evalua-
tion. These data bases would provide sufficiently well-conditioned data for
calculation of the performance measures to be useful.
As a practical matter, however, such data bases are not presently available
to the modeling community. In lieu of their use, other sources of data may be
used for the purpose of model evaluation, despite the deficiencies in such data.
For example, a limited amount of tracer data may be gathered. If the situation
to be modeled involves either construction at a site where another source already
VII-3
-------
exists or retrofit of pollution control equipment, then some limited site-
specific monitoring data may be available. Such data may not be sufficiently
"well-conditioned" to permit meaningful calculation of the performance measures
suggested for use. What can be done? Should calculation of the performance
measures be allowed using the possibly deficient, sparse data available, or
should the model evaluation process be halted until more "robust" data are
acquired? We suggest that the implications of both these alternatives be assessed,
searching for those limited circumstances where a "middle ground" may be found,
with alternative measures and standards identified for use that are less
"demanding" in their measurement data requirements. The implications of allow-
ing the use of such supplementary measurements also need to be examined.
Also, a related issue may be important in point source modeling appli-
cations: relative versus absolute model performance. Are there circumstances
in which a model may be better able to predict relative, incremental changes
in concentration than absolute ground-level values? It should be determined
whether or not such situations occur in practice. If they do, relative vali-
dation of a model may become a consideration. This could be of concern, for
example, when using a Gaussian model to assess the impact of control equipment
that is retrofitted to an existing source. If relative performance is deemed im-
portant in some circumstances, then additional performance measures and stan-
dards should be identified which allow the modeler to make such an assessment.
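A small numerical illustration of the distinction raised in this paragraph may help; the concentrations below are invented, and the 25 percent overprediction is assumed purely for the example.

# Invented example: a model that overpredicts absolute ground-level
# concentrations by 25 percent can still predict the relative (incremental)
# change produced by a retrofit exactly.
obs_before, obs_after = 120.0, 84.0         # observed concentrations (ug/m3)
pred_before, pred_after = 150.0, 105.0      # predictions, uniformly 25% high

absolute_residual = pred_after - obs_after                    # 21 ug/m3 too high
observed_change = (obs_after - obs_before) / obs_before       # -30 percent
predicted_change = (pred_after - pred_before) / pred_before   # -30 percent
print(absolute_residual, observed_change, predicted_change)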
4. Further Development of Rationales for Setting Performance Standards
Several rationales for setting performance standards have been examined
in this study. Some of these merit further technical development and assess-
ment of the range of their applicability. Also, additional rationales should
be identified where possible. Towards these ends, we recommend the following:
> Additional developmental work should continue on the Health
Effects (HE) and Control Level Uncertainty (CLU) rationales.
> The use of the HE/CLU rationales in setting a standard for
the ratio of predicted and observed peak station concentra-
VII-4
-------
tions should be exposed to peer review. A journal article
on the subject should be prepared and submitted for publi-
cation.
> Explicit error and bias standards should be calculated for
all regulated pollutants. This may be done using monitoring
specifications in federal regulations. In this study, only
bias and error standards for ozone were calculated numerically.
B. ASSESSMENT OF INSTITUTIONAL IMPLICATIONS
A number of institutional requirements are implied by any decision to
promulgate standards for model performance, or even by a decision to publish
formal guidelines for model performance evaluation. We recommend that these
implications and their attendant procedural and resource requirements be
assessed. Among the many questions to be resolved are the following:
> Regulatory Responsibility
- How should formal performance standards be promulgated—
or should they be promulgated at all?
- If standards are stated or recommended, how will they
be updated?
- Who will accumulate information about historically
achieved model performance? (This information would
be required when setting a standard invoking the Pragmatic/
Historic rationale.)
> Custodial Responsibility
- Who will identify and assemble the prototypical "test
bed" data bases for use in point source applications?
- Who will maintain, store, and distribute the "test bed"
data bases?
> Review Responsibility
- Who should review the adequacy of model performance in
a specific application?
VII-5
-------
- Does a model need to be repeatedly evaluated using a
"test bed" data base? If not, who decides when a
model/data base combination has been sufficiently
examined?
> Advisory Responsibility
- What advisory documents should be provided to the model
user community?
- Who will provide guidance to model users and how should
that support be funded?
These are simply a few of the many procedural and institutional questions
that arise. Answers to these and other key questions should be sought at
an early date.
C. DOCUMENTS TO BE COMPILED
Specific documents will have to be drafted that describe suggested or
mandated model performance standards. Two documents seem appropriate for
publication (though conceivably they could be combined into a single guide-
lines document). These documents are the following:
> Formally promulgated model performance standards along with
specific procedures for evaluating performance. These could
be presented in guideline form rather than as mandated stan-
dards. The latter of these two approaches may be preferable,
given the complexities of modeling and its attendant uncertain-
ties.
> Advisory/informative model performance guidelines document.
This may provide the advice and information necessary to con-
duct a meaningful model performance evaluation. It could
play the role, with respect to the performance standards,
that is indicated in Figure VI-9.
VII-6
-------
APPENDIX A
IMPORTANT PARTS OF THE CODE OF FEDERAL
REGULATIONS CONCERNING AIR PROGRAMS
A-l
-------
APPENDIX A
IMPORTANT PARTS OF THE CODE OF FEDERAL
REGULATIONS CONCERNING AIR PROGRAMS
PART 50. NATIONAL PRIMARY AND SECONDARY
AMBIENT AIR QUALITY STANDARDS
Section
50.1 Definitions.
50.2 Scope.
50.3 Reference Conditions.
50.4 National primary ambient air quality standards for
sulfur oxides (sulfur dioxide).
50.5 National secondary ambient air quality standards for
sulfur oxides (sulfur dioxide).
50.6 National primary AAQS for particulate matter.
50.7 National secondary AAQS for particulate matter.
50.8 National primary and secondary AAQS for carbon monoxide.
50.9 National primary and secondary AAQS for photochemical oxidants,
50.10 National primary and secondary AAQS for hydrocarbons.
50.11 National primary and secondary AAQS for nitrogen dioxide.
Appendix A—Reference Method for the Determination of Sulfur Dioxide in
the Atmosphere (Pararosaniline Method).
Appendix B—Reference Method for the Determination of Suspended
Particulates in the Atmosphere (High Volume Method).
Appendix C—Measurement Principle and Calibration Procedure for the
Continuous Measurement of Carbon Monoxide in the Atmosphere
(Non-Dispersive Infrared Spectrometry).
Appendix D—Measurement Principle and Calibration Procedure for the
Measurement of Photochemical Oxidants Corrected for Inter-
ferences due to Nitrogen Oxides and Sulfur Dioxide.
Appendix E—Reference Method for the Determination of Hydrocarbons
Corrected for Methane.
Appendix F—Reference Method for the Determination of Nitrogen Dioxide
(24-Hour Sampling Method)
Authority: The provisions of this Part 50 issued under Sec. 4, Public
Law 91-604, 84 Stat. 1679 (42 U.S.C. 1857c-4).
Source: The provisions of this Part 50 appear at 36 F.R. 22384,
November 25, 1971, unless otherwise noted in the CFR.
A-2
-------
PART 51. REQUIREMENTS FOR PREPARATION, ADOPTION,
AND SUBMITTAL OF IMPLEMENTATION PLANS
Section
Subpart A—General Provisions
51.1 Definitions.
51.2 Stipulations.
51.3 Classification of regions.
51.4 Public hearings.
51.5 Submittal of plans; preliminary review of plans.
51.6 Revisions.
51.7 Reports.
51.8 Approval of plans.
Subpart B—Plan Content and Requirements
51.10 General requirements.
51.11 Legal authority.
51.12 Control strategy: General.
51.13 Control strategy: Sulfur oxides and particulate matter.
51.14 Control strategy: Carbon monoxide, hydrocarbons, photo-
chemical oxidants, and nitrogen dioxide.
51.15 Compliance schedules.
51.16 Prevention of air pollution emergency episodes.
51.17 Air quality surveillance.
51.17a Air quality monitoring methods.
51.18 Review of new sources and modifications.
51.19 Source surveillance.
51.20 Resources.
51.21 Intergovernmental cooperation.
51.22 Rules and regulations.
51.23 Exceptions.
A-3
-------
Part 51 (continued)
Subpart C—Extensions
51.30 Requests for 2-year extension.
51.31 Requests for 18-month extension.
51.32 Requests for 1-year postponement.
51.33 Hearings and appeals relating to requests for one year
postponement.
51.34 Variances.
Subpart D--Maintenance of National Standards
51.40 Scope.
AQMA Analysis
51.41 Submittal date.
51.42 Analysis period.
51.43 Guidelines.
51.44 Projection of emissions.
51.45 Allocation of emissions.
51.46 Projection of air quality concentrations.
51.47 Description of data sources.
51.48 Data bases.
51.49 Techniques description.
51.50 Accuracy factors.
51.51 Submittal of calculations.
AQMA Plan
51.52 General
51.53 Demonstration of adequacy.
51.54 Strategies.
51.55 Legal authority.
51.56 Future strategies.
51.57 Future legal authority.
51.58 Intergovernmental cooperation.
51.59 Surveillance.
A-4
-------
Part 51 (continued)
51.60 Resources.
51.61 Submittal format.
51.62 Data availability.
51.63 Alternative procedures.
Appendix A—Air Quality Estimation.
Appendix B—Examples of Emission Limitations Attainable with Reasonably
Available Technology.
Appendix C—Major Pollutant Sources.
Appendix D—Emissions Inventory Summary (Example Regions).
Appendix E—Point Source Data.
Appendix F—Area Source Data.
Appendix G--Emissions Inventory Summary (other Regions).
Appendix H--Air Quality Data Summary.
Appendix J--Required Hydrocarbon Emission Control as a Function of
Photochemical Oxidant Concentrations.
Appendix K—Control Agency Functions.
Appendix L—Example Regulations for Prevention of Air Pollution
Emergency Episodes.
Appendix M—Transportation Control Supporting Data Summary.
Appendix N--Emissions Reductions Achievable Through Inspection,
Maintenance and Retrofit of Light Duty Vehicles.
Appendix 0—[No title—but related to §51.18]
Appendix P—Minimum Emission Monitoring Requirements.
Appendix Q--[Reserved]
Appendix R—Agency Functions for Air Quality Maintenance Area Plans
for the AQMA in the State of
for the year .
Authority: Part 51 issued under Section 301(a) of the Clean Air Act
[42 U.S.C. 1857(a)], as amended by Section 15(c)(2) of
Public Law 91-604, 84 Stat. 1713, unless otherwise noted.
Source: Part 51 appears at 36 F.R. 22398, November 25, 1971, unless
otherwise noted. AQMA considerations arose from 41 F.R. 18388,
May 3, 1976, unless otherwise noted in the CFR. NSR seems to
be required by §51.18, with Appendix 0 intended to assist in
developing regulations. Standards are in Part 60.
A-5
-------
PART 52. APPROVAL AND PROMULGATION
OF IMPLEMENTATION PLANS
Section
Subpart A—General Provisions
52.01 Definitions.
52.02 Introduction.
52.03 Extensions.
52.04 Classification of regions.
52.05 Public availability of emission data.
52.06 Legal authority.
52.07 Control strategies.
52.08 Rules and regulations.
52.09 Compliance schedules.
52.10 Review of new source and modification.
52.11 Prevention of air pollution emergency episodes.
52.12 Source surveillance.
52.13 Air quality surveillance; resources; intergovernmental
cooperation.
52.14 State ambient air quality standards.
52.15 Public availability of plans.
52.16 Submission to administrator.
52.17 Severability of provisions.
52.18 Abbreviations.
52.19 Revision of plans by Administrator.
52.20 Attainment dates for national standards.
52.21 Significant deterioration of air quality.
52.22 Maintenance of national standards.
52.23 Violation and enforcement.
Subparts B through DDD—SIPs for States and Territories
A-6
-------
Part 52 (concluded)
Subpart EEE--Approval and Promulgation of Plans
Appendix A—Interpretive rulings for §52.22(b)—Regulation for review
of new or modified indirect sources.
Appendix B-C—[Reserved]
Appendix D—Determination of sulfur dioxide emissions from stationary
sources by continuous monitors.
Appendix E—Performance specifications and specification test procedures
for monitoring systems for effluent stream gas volumetric
flow rate.
Authority: 40 U.S.C. 1857c-5, 42 U.S.C. 1857c-5 and 6; 1857g(a); 1859(g)
Source: For Subpart A, 37 FR 10846, May 31, 1972, unless otherwise
noted.
A-7
-------
PART 60. STANDARDS OF PERFORMANCE FOR
NEW STATIONARY SOURCES
Subpart A—General Provisions
Subpart B—Adoption and Submittal of State Plans for Designated Facilities
Subpart C—[Reserved]
Subpart D—Standards of Performance for Fossil-Fuel-Fired Steam Generators
Subpart E—SOP for Incinerators
Subpart F—SOP for Portland Cement Plants
Subpart G—SOP for Nitric Acid Plants
Subpart H--SOP for Sulfuric Acid Plants
Subpart I—SOP for Asphalt Concrete Plants
Subpart J--SOP for Petroleum Refineries
Subpart K--SOP for Storage Vessels for Petroleum Liquids
Subpart L--SOP for Secondary Lead Smelters
Subpart M--SOP for Brass and Bronze Ingot Production Plants
Subpart N—SOP for Iron and Steel Plants
Subpart O—SOP for Sewage Treatment Plants
Subpart P—SOP for Primary Copper Smelters
Subpart Q--SOP for Primary Zinc Smelters
Subpart R—SOP for Primary Lead Smelters
Subpart S—SOP for Primary Aluminum Reduction Plants
Subpart T—SOP for the Phosphate Fertilizer Industry: Wet Process
Phosphoric Acid Plants
Subpart U—SOP for the Phosphate Fertilizer Industry: Superphosphoric
Acid Plants
Subpart V—SOP for the Phosphate Fertilizer Industry: Diammonium
Phosphate Plants
Subpart W—SOP for the Phosphate Fertilizer Industry: Triple
Superphosphate Plants
Subpart X--SOP for the Phosphate Fertilizer Industry: Granular Triple
Superphosphate Storage Facilities
Subpart Y—SOP for Coal Preparation Plants
Subpart Z—SOP for Ferroalloy Production Facilities
Subpart AA—SOP for Steel Plants: Electric Arc Furnaces
A-8
-------
Part 60 (concluded)
Appendix A—Reference Methods.
Appendix B—Performance Specifications.
Appendix C—Determination of Emission Rate Change.
Appendix D—Required Emission Inventory Information.
Authority: Sections 111 and 114 of the Clean Air Act, as amended by
Section 4(a) of Public Law 91-604, 84 Stat. 1678
(42 U.S.C. 1857c-6, 1857c-9).
Source: 36 FR 24877, December 23, 1971, unless otherwise noted
in the CFR.
A-9
-------
APPENDIX B
SOME SPECIFIC AIR QUALITY MODELS
B-l
-------
APPENDIX B
SOME SPECIFIC AIR QUALITY MODELS
In Chapter IV of this report we subdivided air quality simulation
models into the following generic categories:
> Rollback
> Isopleth
> Physico-Chemical
- Grid
- Trajectory
- Gaussian
- Box
In this appendix we associate with each of these generic types a number of
specific models. We include many of the models with which we are familiar.
Because the list is intended only to be a representative one, we do not
enumerate all available models. Many others, particularly Gaussian models,
certainly exist and would be appropriate for use in the proper circumstances.
In compiling this list, we have drawn heavily from material in Argonne (1977),
EPA (1977a), and Roth et al. (1976), as well as various program users'
manuals. Also we have made no attempt to screen the models for technical
acceptability.
Among the information contained in the accompanying table is the fol-
lowing: model developer, EPA recommendation status, technical description,
and model capabilities. The last of these is further subdivided into
source type/number, pollutant type, terrain complexity, and spatial/
temporal resolution.
B-2
-------
TABLE B-1. SOME SPECIFIC AIR QUALITY MODELS

[Portions of this table are only partly legible; the recoverable information is summarized below.]

ROLLBACK

EPA Linear Rollback (accepted by EPA for reactive and nonreactive pollutants; nonverifiable; developer: EPA)
  Description: A linear relationship is assumed between emissions and the peak pollutant
    level. No treatment of individual sources; regional scale only (1-hour averaging implied).
  Output and problems addressed: Maximum percentage cutback required in emissions; regional
    problems for reactive and nonreactive pollutants.

ISOPLETH

EPA (EKMA) Isopleth Method (not yet recommended, but under active EPA interest; developer: EPA)
  Description: Isopleths of constant peak O3 on a plot of NOx versus NMHC are constructed
    using a chemical kinetic mechanism tuned to fit smog chamber data for the isopleth
    asymptotes. The diagram incorporates diurnal variation in solar radiation and the
    insolation, dilution, and inversion behavior typical of a stagnant, mid-summer day in
    Los Angeles; entry to the diagram is made with 6-9 a.m. values. No treatment of
    individual sources; regional scale only (1-hour averaging implied).
  Output and problems addressed: Percentage cutback required in NMHC emissions; regional
    oxidant.

A second isopleth variant
  Description: The isopleth diagram is similar to the one used in the EPA method except for
    a constant, rather than diurnally varying, photolytic input; entry parameters also
    differ, with absolute NMHC and NOx concentrations used to relate the actual airshed to
    the modeled one.
  Output and problems addressed: Percentage cutback required in NMHC emissions; regional
    oxidant.

PHYSICO-CHEMICAL: GRID, REGION ORIENTED

SAI Airshed Model (developer: SAI)
  Pollutants: O3, NO, NO2, CO, total aerosols, and four hydrocarbon categories (single bond,
    slow double bond, fast double bond, carbonyl bond), plus intermediate radical species.
  Terrain and resolution: Horizontal features can be handled through the wind field and
    vertical features through the cell vertical dimension; surface roughness coefficients;
    as fine as the input data (temporal) and the grid cell (spatial); time scale up to
    24 hours.
  Output: Spatial concentration maps for each hour and each pollutant of interest; vertical
    concentration profiles; peak predictions at monitoring stations; concentration isopleths.
  Problems addressed: Regional-scale problems; evaluation studies have been carried out for
    Los Angeles, Las Vegas, and Denver, with Sacramento and St. Louis soon to follow.

A second grid model (applied to the San Francisco Bay Area)
  Description: Similar in overall structure to the SAI model, with a chemical mechanism
    dividing hydrocarbons into "olefins and reactive aromatics," "paraffins and less
    reactive aromatics," and "aldehydes and ketones."
  Output and problems addressed: Roughly the same as for the SAI model; regional-scale
    problems; an evaluation study has been conducted for the San Francisco Bay Area.
B-3
-------
TABLE B-1 (Continued)

A grid, region-oriented model of the particle-in-cell type
  Description: Pollutant mass is carried by discrete particles, so information on the
    location and mass of each particle must be maintained. Time-varying emissions and winds
    are input, with a simplified chemical mechanism.
  Pollutants: O3, NO, NO2, SO2, and others.
  Output and problems addressed: Regional concentration fields; an evaluation study has been
    conducted, with agreement against observations not as good as that reported for the
    SAI model.

GRID, SPECIFIC SOURCE ORIENTED

A model developed by Environmental Research and Technology (no recommendation status)
  Description: Simulates near-field dispersion from major individual sources (for example,
    highway sources and fumigation near point sources), with time-varying emissions and
    winds and a treatment of plume rise.
  Resolution and output: As fine as the input data (temporal) and the grid cell (spatial);
    concentration fields at grid locations; single-source and near-field problems, including
    examination of plume impact in complex terrain.

TRAJECTORY, REGION ORIENTED

Several moving-air-column models (e.g., DIFKIN)
  Description: A vertically resolved column of air is advected by a two-dimensional wind
    field; hourly emissions are injected as the column passes over sources; vertical
    diffusivity is specified as a function of stability; photochemistry is treated with
    simplified mechanisms that lump hydrocarbons into reactivity classes.
  Developers: Several organizations, including General Research Corporation (Santa Barbara,
    California) and Pacific Environmental Services, Inc. (Santa Monica, California).
  Output and problems addressed: Temporal concentration histories along the trajectory;
    regional oxidant problems.
-------
TABLE B-1 (Continued)

A regional trajectory model prepared for the California Air Resources Board (no recommendation status)
  Description: The model is trajectory oriented and intended to be used for regional
    application. It appears to be similar to DIFKIN in that the air column allows up to
    10 vertical layers. Features: hourly emissions and horizontal 2-D winds are input;
    simulated species include four HC classes (alkenes, alkanes, aromatics, and aldehydes)
    as well as oxidants, SO2, and sulfate; a 54-step mechanism is employed; no horizontal
    diffusion; vertical diffusivity specified at up to 10 vertical levels with time
    variation.
  Sources and pollutants: Any number of point, area, and line sources; O3, NO, NOx, SO2,
    sulfate, and the four HC groups.
  Terrain and resolution: Terrain not explicit, but horizontal features can be handled
    through the wind field; as fine as the input data (temporal); only along the trajectory
    track (spatial), though several trajectories could be run side by side.
  Output and problems addressed: Temporal concentration history in the air parcel; regional
    oxidant; applied to Las Vegas, Tucson, and the SF Bay Area, as well as the LA Basin
    (Eschenroeder and Martinez).

TRAJECTORY, SPECIFIC SOURCE ORIENTED

A reactive plume model (no recommendation status)
  Description: Designed to estimate concentrations of reactive species downwind of a single
    point or areal source, based on a Lagrangian (moving-with-air-parcel) version of the
    mass conservation equation, allowing for background entrainment. The air parcel
    containing the emitted pollutants is allowed to drift downwind; the parcel expands from
    the plume height according to measured plume width and depth as functions of downwind
    distance, or according to the Pasquill-Gifford methods. Features a modified mechanism
    for HC-NOx-SO2 chemistry; 2-D wind field; plume rise input.
  Sources and pollutants: Single source; O3, NO, NO2, SO2, sulfate; no terrain interaction
    currently.
  Resolution: As fine as the input data (temporal); resolution all the way to the source
    (near-, medium-, and far-field), including long-range transport.
  Output and problems addressed: Temporal concentration history in the downwind direction;
    single-source problems (e.g., refineries, power plants), fumigation, and trapping;
    applied to several power plants, including Moss Landing (Monterey, CA) and Los Alamitos
    (Los Angeles, CA).

A second plume trajectory model (no recommendation status)
  Description: Designed to calculate concentration fields downwind of single or multiple
    concentrated sources. The air parcel is allowed to drift downwind, dispersing laterally
    and vertically. Features: equilibrium coupling of NO, NO2, and O3; first-order
    conversion of SO2 to sulfate; eddy diffusivities; 2-D wind field; Briggs plume rise; up
    to 7 species can be specified.
  Sources and pollutants: Up to 10 point sources and separate areal sources; O3, NO, NO2,
    SO2, sulfate; no terrain interaction currently.
  Output and problems addressed: Vertical and ground-level concentration maps and contours;
    concentration versus distance; single- or few-source problems; only analytical problems
    (e.g., steady-state Gaussian plumes) attempted so far.

LONG-TERM AVERAGING

AQDM--Air Quality Display Model (recommended by EPA in guideline; developer: TRW, for the Public Health Service)
  Description: A climatological steady-state Gaussian plume model that estimates the annual
    arithmetic average SO2 and particulate concentrations at ground level. A statistical
    model based on Larsen is used to transform the average concentration data from a limited
    number of receptors into expected geometric mean and maximum concentration values for
    several averaging times. Features: treats one or two pollutants simultaneously; Holland
    (1953) plume rise; no plume rise for areal sources; no temporal variation in sources;
    16 wind directions; several wind speed classes; 5 stability classes (Turner);
    Pasquill-Gifford stability coefficients; no chemical mechanism; perfect reflection at
    the ground; no effect of the mixing height until sigma_z = 0.47 H, with uniform mixing
    thereafter; no variation in wind speed with height; linear superposition of sources;
    sigma_z(x) = ax^b + c; does not treat fumigation or downwash; the Larsen procedure
    assumes a log-normal concentration distribution and a power-law dependence of median and
    maximum concentrations on averaging time.
  Sources and pollutants: Many user-specified receptor locations; point, areal, and elevated
    sources; SO2 and TSP (could be used for NOx, with NO2 obtained through use of an
    appropriate factor).
  Terrain and resolution: Relatively flat terrain; no height difference allowed between
    source and receptors; steady state; averaging time of 1 month to 1 year (the Larsen
    procedure can be used to transform to 1-24 hour averages); regional scale.
  Output and problems addressed: 1-month to 1-year averaged concentrations; individual point
    and area source culpability list for each receptor; regional long-term averages for
    relatively inert pollutants, primarily in urban areas.
B-5
-------
TABLE B-1 (Continued)

CDM and CDMQC--Climatological Display Model (recommended by EPA in guideline)
  Description: A climatological steady-state Gaussian plume model for determining long-term
    (seasonal or annual) arithmetic average concentrations.
  Terrain and resolution: Relatively flat terrain; no height difference allowed between
    sources and receptors; steady state; averaging time of 1 month to 1 year (the Larsen
    procedure can be used to transform to 1-24 hour averages); regional scale.
  Output and problems addressed: 1-month to 1-year averaged concentrations; source-receptor
    culpability list (CDMQC only); regional long-term averages for relatively inert
    pollutants, primarily in urban areas.

TCM--Texas Climatological Model (no recommendation status; developer: Texas Air Control Board)
  Description: A climatological steady-state Gaussian plume model similar to CDM but
    incorporating design features reducing run time by as much as two orders of magnitude.
    Features: downwash and fumigation not considered; all sources have a single average
    emissions rate for the averaging period (i.e., month, season, year);
    Pasquill-Gifford-Turner stability classes; mixing height not a factor because it has no
    effect for typical climatology.
  Sources and pollutants: Unlimited sources with arbitrary receptor locations; point, line,
    areal, and elevated sources; SO2, TSP, CO, NOx; relatively flat terrain.
  Resolution: Steady state; averaging time of 1 month to 1 year; the Larsen procedure can be
    used to transform to shorter averages; regional scale.
  Output and problems addressed: Concentrations at grid points (up to 50 x 50); a listing of
    the highest contributors to the concentration at each grid point; regional long-term
    averages for relatively inert pollutants, primarily in urban areas.

A sector-averaged Gaussian plume model (no recommendation status; can also be used for short-term averaging)
  Description: A steady-state sector-averaged Gaussian plume model that calculates
    concentrations of up to six pollutants from an unlimited number of point, line, and
    areal sources. The model can be operated either in a "climatological" mode or in a
    "sequential" mode for short-term averaging times. Features: the crosswind dispersion
    function may be sector-averaged over 22.5 degrees; for the "sequential" mode and "tall
    stacks," the crosswind dispersion function is given by the expected value within the
    22.5-degree sector for receptors within the downwind sector, and for receptors adjacent
    to the downwind sector a formulation is used which avoids centerline one-hour values
    when accumulating concentration estimates for multiple-hour averages; Briggs plume rise;
    stack tip downwash (Gifford) for tall stacks; wind speed power law; half-life decay
    factors for species (chemistry not treated directly); perfect reflection at the ground
    and the mixing layer; a unique emissions rate for each source that may be varied
    diurnally, weekly, or monthly; 5 stability classes.
  Sources and terrain: Unlimited point, line, and areal sources, with "tall stack" sources
    in the short-term "sequential" mode; flat and hilly terrain; a tall-stack terrain
    correction is available for the "sequential" mode but not the "climatological" mode; a
    unique elevation can be specified for receptors; plume and mixing depth respond to
    terrain obstacles.
  Resolution and output: Steady state; short-term (1, 3, 8, and 24 hour) and long-term
    (1 month, seasonal, 1 year) averaging; regional scale; concentrations at each receptor,
    averaged as specified, for relatively inert pollutants.
B-6
-------
TABLE B-1 (Continued)

SHORT-TERM AVERAGING

APRAC-1A (recommended by EPA in guidelines; developed for EPA by Stanford Research Institute)
  Description: A model which calculates hourly average CO concentrations for urban areas.
    Contributions from dispersion on three scales are calculated: extraurban, mainly from
    sources upwind of the city; intraurban; and local, from street canyon effects. Features:
    no plume rise, fumigation, or downwash; helical circulation in street canyons; hourly
    varying traffic emissions and a 2-D wind field; sigma_z(x) = ax^b; link emissions are
    aggregated into area sources; no wind power law; 6 stability classes (Turner);
    dispersion coefficients from McElroy and Pooler, modified using Leighton and Dittmar;
    no chemistry; perfect reflection at the surface and the inversion (the latter is ignored
    until the concentration equals that calculated using a box model, which is used
    thereafter).
  Sources and pollutants: Many sources (an extensive traffic inventory is used); receptors
    are defined on each street where street canyon effects are considered; CO, TSP.
  Resolution and output: 1-hour, 8-hour, and 24-hour averages; regional scale; hourly
    concentration values at receptors; regional problems involving inert pollutants in urban
    areas.

A single-source Gaussian plume model for uneven terrain (recommended by EPA in guidelines; can also be used for annual averaging)
  Description: A steady-state Gaussian plume model applicable in uneven terrain. Features:
    stability classes after Turner and Pasquill; dispersion coefficients from Turner; no
    chemistry; Briggs plume rise; no fumigation or downwash; perfect reflection at the
    surface.
  Sources and pollutants: A single source with up to 19 stacks (all assumed at the same
    location); CO, SO2, NOx, TSP.

A Gaussian model for urban area sources
  Description: Used to calculate dispersion from urban area sources by analytic integration
    of the area sources; all sources upwind of each receptor are summed. It is most
    applicable in areas where no point source information is available. Features: perfect
    reflection at the ground; mixing-height reflection not considered; hourly emissions and
    winds; sigma_z(x) = ax^b; narrow plume approximation (no horizontal dispersion); no
    plume rise; no chemistry.
  Sources and pollutants: Many sources; SO2, TSP.

A line-source Gaussian plume model for roadways (developer: EPA)
  Description: A steady-state Gaussian plume model that computes the hourly concentrations
    of nonreactive pollutants downwind of roadways, based on analytic integration of a line
    source applied to each lane of traffic. Features: no chemistry; perfect reflection at
    the surface and the inversion; one road or highway segment per run; 6 stability classes
    (Turner); dispersion coefficients from Turner, with coefficients from Zimmerman and
    Thompson (1975) for distances under 100 m; no wind power law; hourly emissions and a
    2-D wind.
  Sources and pollutants: Up to 24 line sources (arbitrary receptor and release heights);
    CO, TSP; level terrain; hourly (1-24 hour averages); near to medium field downwind.
  Output and problems addressed: One-hour average concentrations at each receptor;
    regional- or highway-specific problems for nonreactive pollutants.

A multiple-point-source Gaussian plume model (developer: EPA)
  Description: A steady-state Gaussian plume model that considers multiple point sources,
    based on linear additivity of individual source effects. Features: hourly emissions and
    winds; Briggs plume rise; no fumigation or downwash; no wind power law; Turner stability
    classes and dispersion coefficients (horizontal and vertical); no chemistry; multiple
    reflection.
  Sources and pollutants: Up to 25 elevated point sources (up to 30 receptors); SO2, TSP;
    flat terrain; hourly (1-24 hour averages); regional scale.
  Output and problems addressed: Hourly concentrations; a source contribution list at each
    receptor; average concentrations; regional and single-source problems in urban areas.
B-7
-------
TABLE B-1 (Continued)

PTDIS and companion point-source models (recommended by EPA; developer: EPA)
  Description: Steady-state Gaussian plume models for single point sources. One estimates
    short-term centerline concentrations directly downwind of a point source; a companion
    model estimates the maximum short-term concentrations from a single point source as a
    function of stability and wind speed. Features: uniform mixing is assumed above a
    specified fraction of the mixing height; the mixing height is determined from twice-
    daily temperature soundings (as is the stability class).

A multiple-source Gaussian plume model (recommended by EPA; developer: EPA)
  Sources and pollutants: Many point and areal sources (receptors all at the same height);
    SO2, TSP; flat terrain; hourly averages and averages of up to 24 hours; regional (urban)
    and rural applications.
  Output and problems addressed: Hourly and average concentrations at receptors, itemized as
    a source contribution list; cumulative frequency distribution data; regional problems
    for nonreactive pollutants, primarily urban.

VALLEY (recommended by EPA in guidelines; developer: EPA; can also be used for annual averaging in a climatological mode)
  Description: A steady-state Gaussian plume model for calculating annual and maximum
    24-hour average SO2 and TSP concentrations from single point sources in complex terrain.
    Features: climatological and short-term modes; 16 wind directions and 6 wind speed
    categories; Briggs plume rise (1971, 1972); 6 stability classes (Turner) for urban
    applications and 6 stability classes for rural applications; dispersion from Pasquill
    (1961) and Gifford (1961); no wind power law; exponential decay for chemistry and
    removal.
  Sources and pollutants: Point and areal sources (areal sources treated as points); SO2,
    TSP; up to 50 sources and 112 receptors on a radial grid, which can be at different
    topographical heights; complex terrain; short- and long-term averages (24-hour and
    annual); regional (urban and rural).
  Output: Short-term mode: highest 24-hour concentration and source contribution list;
    long-term mode: arithmetic means and source contribution list.

TEM--Texas Episodic Model (no recommendation status; developer: Texas Air Control Board)
  Description: A steady-state Gaussian plume model for predicting short-term concentrations
    (10 minutes to 24 hours) from multiple point and area sources. Calculations are
    performed for up to 24 scenarios (meteorology, averaging time, and mixing height).
    Features: Briggs plume rise; mixing height penetration factor; up to 5 pollutants; no
    chemistry, but exponential decay; no downwash or fumigation; wind power law;
    Pasquill-Gifford-Turner stability classes; dispersion coefficients from Turner; perfect
    reflection from the surface and the inversion until sigma_z = 0.47 H.
  Sources and pollutants: Multiple point and areal sources (up to about 200 areal sources);
    SO2, TSP; flat terrain; short-term averaging (1, 3, and 24 hours); regional (urban).
  Output and problems addressed: Mean concentrations at each grid point (10 minutes,
    30 minutes, 1 hour, 3 hours, and 24 hours) on up to a 50 x 50 grid; printed plot;
    culpability list; regional problems for nonreactive pollutants, primarily urban areas.
B-8
-------
TABLE B-1 (Concluded)

TAPAS--Topographic Air Pollution System (no recommendation status; developer: USDA Forest Service)
  Description: This model combines a simulation of the wind field over mountainous terrain
    with a Gaussian-derived diffusion model. It provides an estimate of the total allowable
    emissions within each of a number of grid cells (ranging from 0.25 km2 upward) to
    maintain a preselected level of air quality. The diffusion model is employed in each
    grid cell to provide an estimate of the mixing conditions within these cells. These
    conditions are combined with the Pollutant Standards Index such that a maximum allowable
    emission is calculated. Features: wind model (Cressman objective analysis, potential
    flow over topography, influences of surface temperature and roughness); Gaussian model
    (sigma_y and sigma_z from Turner, effects of mass flow divergence included, stability
    classes from Turner, no upper bound on diffusion although the wind is calculated
    assuming a lid at a specified height above the topography); the calculated wind follows
    the terrain and thus gives a vertical wind component; no chemistry; no explicit
    treatment of plume behavior.
  Sources and pollutants: Many sources; point sources (no distinction made between point,
    line, and areal sources); SO2, TSP, CO; complex terrain; both short-term and long-term
    estimates; limited regional scale.
  Output and problems addressed: Allowable emissions in each grid cell for each pollutant of
    interest; limited regional impact problems in complex terrain; nonreactive pollutants.

AQSTM--Air Quality Short Term Model (no recommendation status; developer: Illinois Environmental Protection Agency)
  Description: A steady-state Gaussian plume model for estimating short-term concentration
    averages from multiple point sources in level or complex terrain. It can simulate late
    inversion break-up fumigation, lake shore fumigation, and atmospheric trapping.
    Features: one or two pollutants simultaneously; no chemistry; Briggs plume rise; no
    downwash; wind power law; user-supplied stability classes; dispersion coefficients from
    Turner (1969); perfect reflection at the ground and the mixing height.
  Sources and pollutants: Up to 200 elevated point sources, with receptors located on a
    uniform rectangular grid and a unique topographic elevation for each; SO2, TSP; mostly
    flat terrain, with some corrections for complex terrain; short-term averaging (1, 3,
    and 24 hours); regional scale.
  Output and problems addressed: Average concentrations at receptors; source contributions
    at receptors; regional point source problems for nonreactive pollutants; urban areas;
    shorelines.

CALINE-2 (no recommendation status; developer: California Air Resources Board--CARB)
  Description: A steady-state Gaussian line source model for traffic impact assessment.
    Features: no chemistry; perfect ground reflection; Pasquill stability classes; hourly
    emissions; some accounting for depressed highways.
  Sources and pollutants: Many line sources (an extensive traffic inventory is required);
    CO; relatively flat terrain; short-term averaging.
  Output and problems addressed: Hourly concentrations at receptors; regional CO problems
    from traffic sources.

BOX

A single-box model (no recommendation status; developer: Atmospheric Turbulence and Diffusion Laboratory--ATDL, Oak Ridge, Tenn.)
  Description: The region of interest is assumed to be encompassed by a single cell or box,
    bounded by the inversion above and the terrain below. All concentrations are assumed to
    be in steady state. Features: for a given time, constant emissions rate and simple
    winds; seven-step chemical mechanism proposed by Friedlander and Seinfeld (1969);
    uniform and constant wind and constant mixing depth.
  Sources and pollutants: All sources emit into a single box; O3, NO, NO2, NMHC; terrain not
    explicit; temporal resolution can be obtained by varying initial conditions to match a
    temporal pattern; no spatial resolution.
  Output and problems addressed: Concentration values at the time considered; regional
    oxidant; it was applied to the LA Basin (30 September 1969 data), and the ozone
    predictions were low.
B-9
-------
APPENDIX C
SOME SPECIFIC MODEL PERFORMANCE MEASURES
C-1
-------
APPENDIX C
SOME SPECIFIC MODEL PERFORMANCE MEASURES
Having discussed model performance measures in generic terms in
Chapter V, we now present some specific examples. We discuss each of the
four generic types of performance measures: peak, station, area, and
exposure/dosage. We include scalar, statistical, and "pattern recogni-
tion" variants.
1. PEAK PERFORMANCE MEASURES
The use of a performance measure of this type requires the modeler to
know information about both the predicted and the "true" concentration peak.
The measurement network must be so situated as to insure a high probability
of sensing the "true" peak concentration or a value near to it. There are
three characterizing parameters of interest: peak concentration level,
spatial location, and time of occurrence. The predicted and observed values
of some or all of these may be available for comparison. Differences in
their predicted and observed values represent the performance measures of
interest. These peak measures are summarized in Table C-l.
Each measure conveys separate but related information about model
behavior in predicting the concentration peak. Their values should be
examined in combinations. Several combinations of interest and some of
their possible interpretations are shown in Table C-2. The table is not
intended to include all combinations and interpretations. Rather, it
illustrates by example how inferences can be made about model performance
through the joint use of performance measures.
C-2
-------
TABLE C-1. SOME PEAK PERFORMANCE MEASURES

Type                  Performance Measure

Scalar                a. Difference* in the peak ground-level concentration values.
                      b. Difference in the spatial location of the peak.
                      c. Difference in the time at which the peak occurs.
                      d. Difference in the peak concentration levels at the time of
                         the observed peak.
                      e. Difference in the spatial location of the peak at the time
                         of the observed peak.

Pattern recognition   Map showing the locations and values of the predicted maximum
                      one-hour-average concentrations for each hour.

* "Difference" as used here usually refers to "prediction minus observation."
Several points are contained in Table C-2. While a large difference
in peak concentration levels might in itself be sufficient reason to question
a model's performance, a simple difference in peak location might not. If
the concentration residual (the difference between predicted and observed
values) at the peak is small (good agreement) and yet there is a difference
in the spatial location of the peak, this may be due mostly to slight errors
in the wind field input to the model. The slight offset in the location of
the peak might cause predicted and measured concentrations to disagree at
specific monitoring stations, particularly if concentration gradients within
the pollutant cloud are "steep." However, a small displacement in the con-
centration field, unless it resulted in a large change in population exposure
and dosage, may not be a serious problem. Model performance might be otherwise
acceptable.
C-3
-------
TABLE C-2. SEVERAL PEAK MEASURE COMBINATIONS OF INTEREST
           AND SOME POSSIBLE INTERPRETATIONS

Residual Values
Concentration
Level         Location    Timing      Some Possible Interpretations

Event-Related*
Small         Small       Small       Model performance in predicting the concentration
                                      peak is acceptable.
Small         Large       Small       Model performance is still good in predicting the
                                      peak concentration level; there is a possible error
                                      in the wind field input.
Small         Large       Large       Concentration level prediction is good; there is a
                                      possible error in the wind field input; there is a
                                      possible error in the chemistry package or emissions
                                      input.
Large         Any value   Any value   Model performance is probably unacceptable.

Fixed-Time†
Large         Large          --       Model performance may or may not be acceptable;
                                      event-related (peak) residuals must be examined to
                                      make a final judgment.
Large         Small          --       Model performance is probably unacceptable; pollutant
                                      transport is handled acceptably well; there is a
                                      possible error in the chemistry package, the
                                      emissions input, or the inversion height time and
                                      spatial history.

* Residual values are calculated at the time an event occurs (the peak).
† Residual values are calculated at a fixed time (the time of the observed peak).
C-4
-------
On the other hand, if the spatial offset of the location of the peak
is accompanied by a significant difference between the predicted and observed
times at which the peak occurs, more serious problems might be suspected.
Not only might there be a wind field problem, but the chemical kinetic
mechanism may be giving erroneous results (if the pollutant species of
interest is a reactive one). Alternatively (or additionally), one might
suspect that the emissions supplied as input to the model were not the same
as those injected into the actual atmosphere. Another possibility also
exists. Slight differences between the modeled and actual wind field
might result in the air parcel in which the peak occurs following a space-
time track having sufficiently different emissions to account for differences
in peak concentration values.
Additional clarity of interpretation can be achieved in another way.
We can compare concentration level, location and timing, not just at the
time a specific event occurs (the peak, for instance) but also at a fixed
time (the time at which the observed peak occurs, for example). Suppose
that the concentration level residual at that fixed time (the difference
between maximum predicted concentration and the observed peak value) is
large but the spatial one is not. In this case, one could conclude that
the model reproduced the pollutant transport process but was unable to
predict concentration levels. This could result from many causes, among
which are errors in the chemical kinetic mechanism, the emissions input,
or the inversion height space/time profile. Whatever the cause, however,
the conclusion remains the same: Model performance is probably inadequate.
Alternatively, if the fixed-time concentration level and location
residuals are both large, a firm conclusion about model acceptability may
be premature. Performance may or may not be satisfactory. A comparison
with the event-related peak performance measures is necessary before a
final judgment is made.
C-5
-------
If the model being used is capable of sufficient spatial and temporal
resolution, a "pattern recognition" performance measure may be of some use:
a map showing the locations and values of the predicted maximum concentrations
at several times during the day. Such a map is shown in Figure C-1. It
was produced using the SAI Urban Airshed Model simulating conditions in
the Denver Metropolitan region.
2. STATION PERFORMANCE MEASURES
The use of a station performance measure requires the modeler to
know, usually at each hour during the daylight hours, the values of both
the predicted and observed concentrations at each monitoring station. From
the two concentration time histories at each site, a number of performance
measures can be computed; they are listed in Table C-3, divided into three
categories: scalar, statistical, and "pattern recognition."
Station measures are the performance measures whose use is most
feasible in practice. Their calculation is based upon the comparison
of model predictions with observational data in the form that it is most
often available—a set of station measurements. By contrast, peak
measures require the observation of the "true" peak. If this peak value
is not the same as the value recorded at that station in the monitoring
network measuring the highest level, if the location of the peak is
somewhere other than at that station, and if its time of occurrence is
different than the time of the peak observation, then the calculation of
peak performance measures may not be feasible. Although one can sometimes
use numerical methods to infer from station data the level, location and
timing of the peak, results are subject to uncertainty.
Similarly, area and exposure/dosage measures require knowledge of the
"true" spatially and temporally varying concentration field. However,
unless circumstances are simple and the monitoring network is exceptionally
extensive and well-designed, the "true" concentration field will not be
known. The only data available will consist of station measurements. Infer-
ence of the concentration field from such data can often be an uncertain
and error prone process.
C-6
-------
SOUTH
Meteorology of 3 August 1976
FIGURE C-1. LOCATIONS AND VALUES OF PREDICTED MAXIMUM ONE-HOUR-
AVERAGE OZONE CONCENTRATIONS FOR EACH HOUR
FROM 8 a.m. TO 6 p.m.
C-7
-------
TABLE C-3. SOME STATION PERFORMANCE MEASURES
Type: Scalar
  Concentration residual at the station measuring the highest concentration
  (event-specific time and fixed-time comparisons).
  Difference in the spatial locations of the predicted peak and the observed
  maximum (event-specific time and fixed-time comparisons).
  Difference in the times of the predicted peak and the observed maximum.

Type: Statistical
  For each monitoring station separately, the following concentration residual
  statistics are of interest for the entire day:
    1) Average deviation
    2) Average absolute deviation
    3) Average relative absolute deviation
    4) Standard deviation
    5) Correlation coefficient
    6) Offset-correlation coefficient.
  For all monitoring stations considered together, the following residual
  statistics are of interest:
    1) Average deviation
    2) Average absolute deviation
    3) Average relative absolute deviation
    4) Standard deviation
    5) Correlation coefficient
    6) Estimate of bias as a function of concentration
    7) Comparison of the probabilities of concentration exceedances as a
       function of concentration.
  Scatter plots of all predicted and observed concentrations with a line of
  best fit determined in a least squares sense.
  Plot of the deviations of the predicted versus observed points from the
  perfect correlation line compared with estimates of instrumentation errors.

Type: Pattern recognition
  Time history for the modeling day of the predicted and observed
  concentrations at each site.
  Time history of the variations over all stations of the predicted and
  observed average concentrations.
  At the time of the peak (event-related), the ratio of the normalized residual
  at the station having the highest value to the average of the normalized
  residuals at the other stations.
C-8
-------
a. Scalar Station Performance Measures
Since the "true" concentration peak is not always known with confidence,
a surrogate is needed for determining model performance in predicting the
concentration peak. Such a measure is often based upon a comparison of
the predicted and observed concentrations at the station measuring the
highest value during the day. The comparison can be done at an event-related
time (the peak) or a fixed time. Since the values of the measures may
differ at the two times, the implications of those differences should be
considered carefully.
b. Statistical Station Performance Measures
Many statistical station performance measures are of use. Sometimes
the behavior of the concentration residuals at a single station is considered.
At other times, the overall behavior of the residuals averaged over all
stations is the focus of interest. In either case, however, several of
the statistical performance measures remain the same. We define them here
(the tilde ~ denotes "predicted," while m is the pollutant species, n
is the hour of the day, k is the station index, K is the number of stations
being considered, and N is the number of hours being compared):
> Average Deviation

  \bar{d}_m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( \tilde{C}_k^{m,n} - C_k^{m,n} \right)    (C-1)

> Average Absolute Deviation

  \overline{|d|}_m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} \left| \tilde{C}_k^{m,n} - C_k^{m,n} \right|    (C-2)

> Average Relative Absolute Deviation

  \overline{|d_r|}_m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{\left| \tilde{C}_k^{m,n} - C_k^{m,n} \right|}{C_k^{m,n}}    (C-3)
C-9
-------
> Standard Deviation

  \sigma_m = \left\{ \frac{1}{KN-1} \sum_{n=1}^{N} \sum_{k=1}^{K} \left[ \left( \tilde{C}_k^{m,n} - C_k^{m,n} \right) - \bar{d}_m \right]^2 \right\}^{1/2}    (C-4)
or, alternatively,
  \sigma_m^2 = \frac{1}{KN-1} \left[ \sum_{n=1}^{N} \sum_{k=1}^{K} \left( \tilde{C}_k^{m,n} - C_k^{m,n} \right)^2 - KN\,\bar{d}_m^{\,2} \right]    (C-5)
The first three of these relations are designed to measure the mean
difference between predicted and observed concentration, either at a
particular station (K = 1) or averaged over all of them (K = total number
of stations). The average deviation expresses the mean value of the
residuals through the day. A non-zero value is an indication of a system-
atic bias. Because large positive residuals can cancel with large negative
values, a low value of average deviation does not always guarantee close
agreement between prediction and observation. By computing the average
absolute deviation, however, one can assess whether such a "cancellation"
problem is occurring. A large value is an indication of appreciable con-
centration differences, providing such information even if the average
deviation is small. Since a small number of large residuals can dominate
in the computation of the previous measures, a large value for either of
them does not necessarily indicate consistently large disagreement between
prediction and observation. Residuals can be normalized to balance the
effect of large and small residuals; the average relative absolute
deviation is computed in this way.
The standard deviation, as expressed in Eq. (C-4), is a measure of
the shape of the frequency distribution of the residuals. A large value
indicates that residual values vary throughout a large range. Correspond-
ingly, a small value suggests that they cluster closely about their mean
value, as expressed in Eq. (C-1).
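As a purely illustrative sketch (Python is not part of this report; the array
and function names are hypothetical), Eqs. (C-1) through (C-4) might be
evaluated for hourly station data as follows; a single station corresponds to
the case K = 1.

import numpy as np

def residual_statistics(predicted, observed):
    # Arrays shaped (N hours, K stations); a single station is the case K = 1.
    residuals = predicted - observed
    avg_dev = residuals.mean()                                # Eq. (C-1)
    avg_abs_dev = np.abs(residuals).mean()                    # Eq. (C-2)
    avg_rel_abs_dev = (np.abs(residuals) / observed).mean()   # Eq. (C-3)
    std_dev = np.sqrt(((residuals - avg_dev) ** 2).sum()
                      / (residuals.size - 1))                 # Eq. (C-4)
    return avg_dev, avg_abs_dev, avg_rel_abs_dev, std_dev

# Illustrative call with random placeholder data: 12 daylight hours, 9 stations.
rng = np.random.default_rng(0)
obs = rng.uniform(2.0, 20.0, size=(12, 9))          # "observed" ozone, pphm
pred = obs + rng.normal(0.0, 2.0, size=obs.shape)   # "predictions" with random error
print(residual_statistics(pred, obs))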
C-10
-------
Another statistical measure is of interest. The correlation coeffi-
cient, as expressed below, provides an indication of the extent to which
variations in observed station concentrations are matched by variations in
the predicted station values. A close match is indicated by a value near
to one (the value for "perfect" correlation).
> Correlation Coefficient

  r_m = \frac{\frac{1}{KN-1} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( \tilde{C}_k^{m,n} - \bar{\tilde{C}}^m \right) \left( C_k^{m,n} - \bar{C}^m \right)}{\sigma_{\tilde{C}}^m \, \sigma_C^m}    (C-6)

where

  \bar{\tilde{C}}^m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} \tilde{C}_k^{m,n}    (C-7)

  \bar{C}^m = \frac{1}{KN} \sum_{n=1}^{N} \sum_{k=1}^{K} C_k^{m,n}    (C-8)

  \sigma_{\tilde{C}}^m = \left[ \frac{1}{KN-1} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( \tilde{C}_k^{m,n} - \bar{\tilde{C}}^m \right)^2 \right]^{1/2}    (C-9)

  \sigma_C^m = \left[ \frac{1}{KN-1} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( C_k^{m,n} - \bar{C}^m \right)^2 \right]^{1/2}    (C-10)
If the value of the correlation coefficient is not close to one,
this may or may not be an indication that model performance is deficient.
For instance, suppose slight errors were embedded in the wind field
supplied to the model. Possibly, the only effect of this could be a
slight offset between the predicted and the "true" pollutant cloud location.
The concentration level and its distribution within the cloud might be
C-ll
-------
well predicted otherwise. However, the correlation coefficients computed
at individual stations (K = 1) might not demonstrate agreement between
prediction and observation, indicating instead the opposite. Conceivably,
this also might be the case even if the correlation coefficient is computed
using concentration values averaged for all stations (K = total number of
stations).
Another statistical measure is useful in overcoming this difficulty
when sampling stations are not too "sparsely" sited. This measure is the
offset correlation coefficient and is designed to compare predictions at
one station and time against observations at another station and/or time.
It is defined as follows:
> Offset Correlation Coefficient

  r_{kj}^{m}(\Delta n) = \frac{\frac{1}{N-1} \sum_{n=1}^{N} \left( \tilde{C}_k^{m,n} - \bar{\tilde{C}}_k^m \right) \left( C_j^{m,n+\Delta n} - \bar{C}_j^m \right)}{\sigma_{\tilde{C}_k}^m \, \sigma_{C_j}^m}    (C-11)

where k is the index of the measurement station at which concentrations are
predicted, j is the index of the station at which they are measured, and Δn
is the time offset between prediction and observation; the single-station
means and standard deviations are defined as in Eqs. (C-7) through (C-10)
with K = 1, for example

  \bar{\tilde{C}}_k^m = \frac{1}{N} \sum_{n=1}^{N} \tilde{C}_k^{m,n}    (C-12)
C-12
-------
Many reasons can account for differences between prediction and
observation. The offset correlation coefficient itself cannot be used
to isolate specific reasons, but it can detect time lags or spatial offsets
between comparative concentration histories. A time lag might occur
because of slight differences between modeled and actual wind speed, diurnal
inversion height history, emissions, or atmospheric chemistry, as well as
any of a number of other reasons. These differences could manifest them-
selves at a particular monitoring station as a simple time lag, an example
of which is shown in Figure C-2(a). Also, for the reasons mentioned above,
as well as differences in modeled and actual wind direction, a spatial
offset can occur which could result in the actual and predicted pollutant
clouds passing over different but adjacent stations. A comparison of the
concentration profiles at these two stations, such as those shown in
Figure C-2(b), can reveal the offset. Good agreement could be inferred if
the value of the offset correlation coefficient between the concentrations
at the two stations, at the same time, assumed a value near one ("perfect"
correlation).
In using station data as a basis for comparing prediction with obser-
vation, the offset correlation coefficient should be computed as a matter
of course. For the station of interest (perhaps the one recording the highest
concentration value), computation of the following offset correlation coeffi-
cients might be revealing: first, at the same hour, with all adjacent sta-
tions (unless none are nearby); then, at the same station, for adjacent hours
(for example, one and two hours lag and lead); and finally, with all adjacent
stations and hours (to reveal the joint presence of spatial offset and time lag).
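As an illustrative sketch only (Python; the function and variable names are
hypothetical, and the data below are random placeholders), the screening of
adjacent stations and lagged hours described above might be carried out as
follows:

import numpy as np

def offset_correlation(pred_k, obs_j, lag):
    # Correlation between predictions at station k and observations at
    # station j offset by `lag` hours (a sketch in the spirit of Eq. C-11).
    if lag > 0:
        p, o = pred_k[:-lag], obs_j[lag:]
    elif lag < 0:
        p, o = pred_k[-lag:], obs_j[:lag]
    else:
        p, o = pred_k, obs_j
    if len(p) < 2:
        return np.nan
    return np.corrcoef(p, o)[0, 1]

# Screen neighboring stations and lags of one and two hours for the station
# of interest (here station index 0; indices and data are illustrative only).
rng = np.random.default_rng(1)
obs = rng.uniform(2.0, 20.0, size=(12, 4))                             # (hours, stations)
pred = np.roll(obs, 1, axis=0) + rng.normal(0.0, 1.0, size=obs.shape)  # built-in 1-hour lag
for j in range(obs.shape[1]):
    for lag in (-2, -1, 0, 1, 2):
        r = offset_correlation(pred[:, 0], obs[:, j], lag)
        print(f"station {j}, lag {lag:+d} h: r = {r:.2f}")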
C-13
-------
Δn
Hour of Day
(a) Time Lag (Predicted and Measured Concentrations
are for the same monitoring station)
Hour of Day
(b) Spatial effect (Predicted and Measured Concentrations
are for Different but Adjacent Monitoring Stations)
FIGURE C-2. CONCENTRATION HISTORIES REVEALING
TIME LAG OR SPATIAL OFFSET
C-14
-------
For all the monitoring stations considered together, several other
statistics are of interest. For instance, the variation of bias in model
predictions with the level of pollutant concentration can be plotted as
shown in Figure C-3 . In this particular example, based upon simulations
of the Denver Metropolitan region performed using the SAI Urban Airshed
Model, the fractional mean deviation from perfect agreement between predic-
tion and observation appears to vary randomly at the higher ozone concen-
trations. Aside from an apparent systematic bias at very low concentrations,
no conclusion of significant bias seems demonstrable.
[x-axis: Root Mean Square Ozone Concentration (pphm), computed from the
(Observed)² and (Predicted)² values.]
FIGURE C-3. ESTIMATE OF BIAS IN MODEL PREDICTIONS AS A FUNCTION OF
OZONE CONCENTRATION. This figure is based upon predic-
tions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
C-15
-------
Residuals can vary in sign and magnitude during the modeling day.
It is often helpful to plot their diurnal variation. An example is
shown in Figure C-4, based upon predictions of the SAI Urban Airshed Model
for three modeling days in Denver. A discernable pattern might be sympto-
matic of basic model inadequacies. In this example, however, no simple
pattern seems apparent.
For each set of observations or predictions (for all stations and
times), there exists a cumulative concentration frequency distribution.
This describes the probability of occurrence of a concentration in excess of
a certain value for the range of possible concentration values. An example
based upon the modeling effort noted earlier is shown in Figure C-5. A con-
clusion might be drawn from this figure: Although background ozone concen-
trations are not well-determined (low background concentrations are difficult
to measure accurately), higher concentrations are more predictably distributed.
By plotting observed concentrations against predicted ones (at each
station for each hour), a graphic record of their correlation can be obtained.
The degree of clustering of observation-prediction pairs about the perfect
correlation line provides an indication of the degree of their agreement.
An example is presented in Figure C-6. For each particular combination of
observation and prediction, the number of occasions on which that
combination occurred is shown.
Superimposed on the figure are the standard deviation bands (1σ) for
both the EPA standard and maximum acceptable instrumentation error. These
bands portray the extent to which station measurements are accurate indi-
cators of "true" concentrations. To conclude that a model is unable to
reproduce a set of "true" concentrations, one must know the value of those
concentrations. Measurements, however, are imperfect surrogates. If
concentration residuals are within instrumentation limits, differences could
be explained solely by measurement errors. In such a case, no further
conclusions could be reached about model predictive ability.
C-16
-------
[Plot residue removed. Legend: mean of all stations for each of the three
modeling days and the average of the 3 days; x-axis: Time of Day by Hourly
Averaging Period.]
FIGURE C-4. TIME VARIATION OF DIFFERENCES BETWEEN MEANS OF OBSERVED AND
PREDICTED OZONE CONCENTRATIONS. This figure is based upon
predictions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
-------
[Plot residue removed. Curves: Observed and Predicted; 279 data pairs from
3 days and 9 stations; x-axis: Probability of Exceedance of Given Ozone
Concentration.]
FIGURE C-5. PROBABILITIES OF OZONE CONCENTRATION EXCEEDANCE. This figure is based
upon predictions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
-------
P=Predicted
FIGURE C-6.
MODEL PREDICTIONS CORRELATED WITH INSTRUMENT OBSERVATIONS
OF OZONE (DATA FOR 3 DAYS, 9 STATIONS, DAYLIGHT HOURS).
This figure is based on predictions of the SAI Urban
Airshed Model for the Denver Metropolitan region.
C-19
-------
Some of the information contained in Figure C-6 is summarized in
Table C-4. The percent of prediction/observation pairs meeting certain
correspondence levels is indicated for this example. The extent to
which concentration residuals compare with instrumentation error is
shown in Figure C-7. These same plots can be constructed for most
modeling applications for which station predictions are known.
TABLE C-4. OCCURRENCE OF CORRESPONDENCE LEVELS OF PREDICTED
AND OBSERVED OZONE CONCENTRATIONS
Correspondence Level Between Predicted (P) and Observed (O) Pairs, with the
percent of comparisons meeting each level given first for all comparisons and
then for pairs in which both the predicted and observed concentrations
exceed 8 pphm:

1) Factor of two (2P > O > P/2): 80 / 94
2) Computed value is within ± twice the S.D. of the maximum probable
   instrument error (95% level) of the observed value: 100 / 100
3) Computed value is within ± the S.D. of the maximum probable
   instrument error (95% level) of the observed value: 93 / 90
4) Computed value is within ± twice the S.D. of instrument errors by the
   EPA standard (95% level) of the observed value: 89 / 77
5) Computed value is within ± the S.D. of instrument errors by the
   EPA standard (95% level) of the observed value: ~60 / 37
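The following fragment is a rough sketch (Python; not taken from the report)
of how correspondence-level percentages of the kind tabulated above can be
tallied; the instrument-error half-width and the 8 pphm threshold are
placeholders to be replaced by the applicable values.

import numpy as np

def correspondence_fractions(pred, obs, inst_sd_pphm=3.0, threshold=8.0):
    # Fraction of prediction/observation pairs meeting correspondence levels
    # like those in Table C-4.  inst_sd_pphm is a placeholder half-width.
    pred, obs = np.asarray(pred).ravel(), np.asarray(obs).ravel()
    within_factor_two = (pred <= 2 * obs) & (pred >= obs / 2)
    within_one_sd = np.abs(pred - obs) <= inst_sd_pphm
    within_two_sd = np.abs(pred - obs) <= 2 * inst_sd_pphm
    both_high = (pred > threshold) & (obs > threshold)
    return {
        "factor of two, all pairs": within_factor_two.mean(),
        "within 1 s.d. of inst. error, all pairs": within_one_sd.mean(),
        "within 2 s.d. of inst. error, all pairs": within_two_sd.mean(),
        "factor of two, both above threshold": within_factor_two[both_high].mean()
        if both_high.any() else float("nan"),
    }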
c. "Pattern Recognition" Station Performance Measures
Several qualitative/composite model performance measures are useful
in comparing station predictions with observations. At each monitoring
site, for instance, the time history through the modeling day of the pre-
dicted concentrations can be plotted directly with the time history of
C-20
-------
[Plot residue removed. Shown: deviations of predicted versus observed points
from the perfect correlation line (281 one-hour-average data points), compared
with the EPA acceptable monitor error band and the maximum probable instrument
error band; x-axis: Difference (pphm).]
FIGURE C-7. MODEL PREDICTIONS COMPARED WITH ESTIMATES OF INSTRUMENT ERRORS FOR OZONE (DATA
FOR 3 DAYS, 9 STATIONS, DAYLIGHT HOURS)
-------
the measurement data. This is done in Figure C-9 for one of the days
(3 August 1976) in the Denver modeling example employed earlier.
Preceding this figure is a map in Figure C-8, which shows the names and
locations of the air quality monitoring stations in the Denver Metropol-
itan region.
For each hour during the day, the predicted and observed concentrations
each can be averaged for all measurement stations. The diurnal variation
of this all-station average can also be of interest. An example of such
a time history is shown in Figure C-10.
At the time the concentration peak occurs, the performance of the
model in predicting that peak is of interest as is its ability to predict
the lower concentration values at monitoring stations distant from the
peak. An indication of the relative prediction-observation agreement at
the peak versus the agreement at outlying stations can be found by com-
puting a composite performance measure. The ratio can be found of the
normalized residual at the station measuring the highest concentration
value to the average of the normalized residuals at the other stations.
If this ratio is large, better performance at the outlying stations than
near the peak can be inferred. If the value is small, the reverse is true.
If the ratio is near unity, agreement is much the same throughout the
modeled region.
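A minimal sketch of this composite ratio (Python; names are hypothetical, and
normalizing each residual by its observed concentration is an assumption)
follows:

import numpy as np

def peak_to_outlying_ratio(pred, obs):
    # At the hour of the observed peak, ratio of the normalized residual at
    # the station reporting the highest value to the mean normalized residual
    # at the other stations.  Arrays are shaped (hours, stations).
    pred, obs = np.asarray(pred), np.asarray(obs)
    hour, station = np.unravel_index(np.argmax(obs), obs.shape)
    norm_resid = np.abs(pred[hour] - obs[hour]) / obs[hour]
    others = np.delete(norm_resid, station)
    return norm_resid[station] / others.mean()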
The value of a concentration residual at a station changes during
the modeling day. If these changes can be tied to corresponding changes
in atmospheric characteristics (the height of the inversion base, for
instance), we can sometimes draw valuable inference about model performance
as a function of the value of these atmospheric "forcing variables." Some
of these variables include: wind speed, inversion height, ventilation (com-
bining the previous two variables into a product of their values), solar
insolation, and a particular category of emissions (automotive, for example).
C-22
-------
KEY
NG - Northglenn              NJ - National Jewish Hospital
WE - Welby                   GM - Green Mountain
AR - Arvada                  OV - Overland
CR - C.A.R.I.H.              PR - Parker Road
CM - Continuous Air Monitoring Program [CAMP]
NORTH
SOUTH
FIGURE C-8. MAP OF DENVER AIR QUALITY MODELING REGION SHOWING
AIR QUALITY MONITORING STATIONS
C-23
-------
[Plot residue removed. x-axis: Time of Day, by Hourly Interval; legend:
Observed, Predicted.]
FIGURE C-9. TIME HISTORY OF PREDICTED AND OBSERVED CONCENTRATIONS
AT MONITORING SITES. This figure is based on the pre-
dictions of the SAI Urban Airshed Model in Denver
for 3 August 1976.
C-24
-------
Time of Day By Hourly Averaging Period
FIGURE C-10.
VARIATIONS OVER ALL STATIONS OF OBSERVED AND PREDICTED AVERAGE OZONE CONCENTRATIONS.
This figure is based on the predictions of the SAI Urban Airshed Model in Denver.
-------
To examine residual values for cause-and-effect relationships, we can
plot on the same figure the time history of both the residual and the
forcing variable. Alternatively we can plot the residual directly
with the forcing variables. Examples of both of these are presented in
Figure C-11.
-------
model performance measures. In practice, however, we are seldom able to
resolve fully the "true" concentration field, even if the model we use is
capable of doing so for the predicted field. This difficulty derives from
the limited sampling of measurement data generally available: Only measure-
ments at several scattered monitoring stations are recorded. Unless ambient
conditions are highly predictable and the monitoring network is extensive
and exceptionally well-designed, reconstruction of the "observed" concen-
tration field from discrete station measurements can be an uncertain and
error prone process.
Nevertheless, the observed concentration field can be inferred with
accuracy in some circumstances. In addition, models frequently can provide
spatially resolved predictions. Grid models, for instance, predict average
concentrations in a number of grid cells. Resolution is then provided as
finely as the horizontal grid-cell dimensions (on the order of one to sev-
eral kilometers). Trajectory model predictions can be used to calculate
concentrations along the space-time track followed by the air parcel being
modeled. Gaussian models are analytic and can resolve fully their predictions.
Thus, even if the observed concentration field is known only imperfectly,
the predicted field, because it is often much better resolved, can still
provide qualitative information about model performance. Further, the
shape of the predicted concentration field can suggest ways to extract
information for comparison with station measurements. We discuss "hybrid"
performance measures later in this Appendix.
In this section we present several area performance measures. When
predicted and observed concentration fields are known, they can provide
considerable insight into model performance. These performance measures
are based upon taking the difference between the predicted and observed
values of certain quantities. Even when the observed values of these
quantities are not known with accuracy, computation of their predicted values
can provide a systematic means for characterizing model predictions.
C-27
-------
The performance measures presented here can be divided into three
types: scalar, statistical, and "pattern recognition." We discuss each
in turn. In Table C-5, we list some of these measures.
a. Scalar Area Performance Measures
The seriousness of a pollutant problem is a function not only of the
concentration level itself but also of the spatial extent of the pollutant
cloud. Several scalar area performance measures are designed with this in
mind. Even if a model predicts the peak concentration well, it may not
necessarily predict the extent of the area exposed to concentrations near
to that value. This might not be a serious defect if the pollutant cloud
passed over uninhabited terrain. However, if the cloud were to drift
over a densely populated urban area, a considerable difference in the
health effects experienced could exist between a cloud one mile across and
another five miles across. This could correspondingly affect our willing-
ness to accept for use a model whose predictions of cloud dimensions
differed considerably from observed dimensions.
Two performance measures of interest are the following: the differences,
between prediction and observation, in the fraction of the area of interest
within which concentrations exceed the NAAQS and in the fraction experiencing
concentrations within 10 percent of the peak value. The first of these is a measure of the
general ability of the model to predict the spatial extent of concentra-
tions in the range of interest. The second estimates the performance of
the model in the higher concentration ranges at which, presumably, health
effects are more pronounced.
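As an illustration only (Python; the gridded arrays and the function name are
hypothetical), these two area-fraction differences might be computed as
follows:

import numpy as np

def area_fraction_measures(pred_field, obs_field, naaqs):
    # Differences in the fraction of the modeled area (i) exceeding the NAAQS
    # and (ii) within 10 percent of the peak, for gridded fields of equal shape.
    def fractions(field):
        field = np.asarray(field, dtype=float)
        exceed = (field > naaqs).mean()
        near_peak = (field >= 0.9 * field.max()).mean()
        return exceed, near_peak
    p_ex, p_pk = fractions(pred_field)
    o_ex, o_pk = fractions(obs_field)
    return {"exceedance fraction difference": p_ex - o_ex,
            "near-peak fraction difference": p_pk - o_pk}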
A third measure is of interest. At each measurement station a set of
concentration readings is recorded. It is interesting to compute from
the predicted concentration field the nearest distance at which there occurs
a value equal to the observed value, as well as the azimuthal direction
from the station to the nearest such point. This direction lies along the
concentration gradient of the predicted field. The magnitude of the distance
is a measure of the spatial offset between the predicted and observed concen-
tration fields in the vicinity of the monitoring station. The direction is
a measure of the orientation of the offset.
C-28
-------
TABLE C-5. SOME AREA PERFORMANCE MEASURES
Type: Scalar
  a. Difference in the fraction of the area of interest in which the NAAQS
     are exceeded.
  b. Nearest distance at which the observed concentration is predicted.
  c. Difference in the fraction of the area of interest in which concentrations
     are within 10 percent of the peak value.

Type: Statistical
  a. At the time of the peak, differences in the fraction of the area
     experiencing greater than a certain concentration; differences in the
     following are of interest:
       1) Cumulative distribution function
       2) Density function
       3) Expected value of concentration
       4) Standard deviation of density function
  b. For the entire residual field, the following statistics are of interest:
       1) Average deviation
       2) Average absolute deviation
       3) Average relative absolute deviation
       4) Standard deviation
       5) Correlation coefficient
       6) Estimate of bias as a function of concentration
       7) Comparison of the probabilities of concentration exceedances as a
          function of concentration
  Scatter plots of prediction-observation concentration pairs with a line of
  best fit determined in a least squares sense.

Type: Pattern recognition
  Isopleth plots showing lines of constant pollutant concentration for each
  hour during the modeling day.
  Time history of the size of the area in which concentrations exceed a
  certain value.
  Isopleth plots showing lines of constant residual values for each hour
  during the day ("subtract" predicted and observed isopleths).
  Isopleth plots showing lines of constant residuals normalized to selected
  forcing variables (inversion height, for instance).
  Peak-to-overall performance indicator, computed by taking the ratio of the
  mean residual in the area of the peak (e.g., where concentrations are within
  10 percent of the peak) to the mean residual in the overall region.
C-29
-------
b. Statistical Area Performance Measures
A number of statistical area performance measures are of use. They are
generally computed either at a fixed time or at the time of a fixed event
(the peak, for instance). Before they can be computed, however, both the
predicted and observed concentration fields must be transformed into a
compatible, discrete form. The scales of resolution must be made the same,
though kept as fine as possible. For example, if a grid model provided
average concentrations every two kilometers in a lattice-work pattern
spanning the region of interest, then the observed concentration field
inferred from station measurements must also be resolved at two-kilometer
intervals with concentrations obtained at each point in the lattice-work.
If resolution cannot be obtained so finely, then the predicted concentration
field must be adjusted to be comparable with the observed one. The field
having the coarsest resolution is the limiting one.
Once the fields have been resolved into a compatible form, several
performance measures can be computed. We can characterize a concentration
field by indicating for each concentration value the fraction of the area
experiencing a concentration greater than that value. By so doing, we define
a cumulative distribution function (CDF) such as that shown in Figure C-12.
The CDF is the integral of its density function (f), also shown in the figure.
[Figure legend: cumulative distribution function (CDF) and density function
(f), predicted and observed, plotted against concentration.]
FIGURE C-12.
DISTRIBUTION OF AREA FRACTION EXPOSED TO GREATER
THAN A GIVEN CONCENTRATION VALUE
C-30
-------
For the predicted and observed concentration fields, the CDF's may
differ. The following statistics can be compared in order to characterize
the difference: the CDF itself, the mean expected concentration in the
modeled region, and the standard deviation of the area density function.
If the CDF and f were continuous functions, the following expressions would
give the form of these measures:
> Cumulative Distribution Function

  \mathrm{CDF}(C^m \ge \kappa) = \int_{\kappa}^{C_P} f(c)\,dc    (C-16)

> Expected Concentration

  \mu_A = \int_{C_B}^{C_P} c\,f(c)\,dc    (C-17)

where C_P is the peak and C_B is the background concentration.

> Standard Deviation

  \sigma_A^2 = \int_{C_B}^{C_P} \left( c - \mu_A \right)^2 f(c)\,dc    (C-18)
However, the CDF and f are not available in practice as continuous
functions: They are expressed discretely, derived from concentra-
tions at the nodal points of a ground-level grid having dimensions
I by J. The above measures have the following discrete form:
> Discrete Cumulative Distribution Function

  \mathrm{CDF}(C^m \ge \kappa) = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} u\!\left( C_{ij}^{m} - \kappa \right)    (C-19)

where m is the pollutant species and u is a unit step function whose value is

  u(x) = 1 for x > 0, and u(x) = 0 for x < 0    (C-20)

> Discrete Expected Concentration

  \mu_A = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} C_{ij}^{m}    (C-21)

> Discrete Standard Deviation

  \sigma_A^2 = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} \left( C_{ij}^{m} - \mu_A \right)^2    (C-22)
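A short sketch (Python; illustrative only, with hypothetical names) of
evaluating the discrete measures in Eqs. (C-19), (C-21), and (C-22) for a
gridded field is given below:

import numpy as np

def area_distribution_statistics(field, levels):
    # Discrete CDF, expected concentration, and standard deviation for a
    # gridded ground-level concentration field (Eqs. C-19, C-21, C-22).
    field = np.asarray(field, dtype=float)
    cdf = {level: (field > level).mean() for level in levels}  # area fraction above each level
    mu_a = field.mean()                                        # Eq. (C-21)
    sigma_a = np.sqrt(((field - mu_a) ** 2).mean())            # Eq. (C-22)
    return cdf, mu_a, sigma_a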
The predicted and observed concentration fields can be differenced, with
the result being a spatially distributed residual field at the fixed time or
event of interest. The statistics of this residual field are essentially the
same as those described earlier in Eqs. (C-1) to (C-10) for the set of station
residuals. They are as follows (the tilde ~ denotes "predicted," while m is
the pollutant species and I, J are the number of nodes in the concentration
field grid):
> Average Deviation

  \bar{d}_m = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} \left( \tilde{C}_{ij}^{m} - C_{ij}^{m} \right)    (C-23)

> Average Absolute Deviation

  \overline{|d|}_m = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} \left| \tilde{C}_{ij}^{m} - C_{ij}^{m} \right|    (C-24)

> Average Relative Absolute Deviation

  \overline{|d_r|}_m = \frac{1}{IJ} \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{\left| \tilde{C}_{ij}^{m} - C_{ij}^{m} \right|}{C_{ij}^{m}}    (C-25)

> Standard Deviation

  \sigma_m = \left\{ \frac{1}{IJ-1} \sum_{j=1}^{J} \sum_{i=1}^{I} \left[ \left( \tilde{C}_{ij}^{m} - C_{ij}^{m} \right) - \bar{d}_m \right]^2 \right\}^{1/2}    (C-26)

> Correlation Coefficient

  r_m = \frac{\frac{1}{IJ-1} \sum_{j=1}^{J} \sum_{i=1}^{I} \left( \tilde{C}_{ij}^{m} - \bar{\tilde{C}}^m \right) \left( C_{ij}^{m} - \bar{C}^m \right)}{\sigma_{\tilde{C}}^m \, \sigma_C^m}    (C-27)
Calculation of the above statistics can be extended through the model-
ing day by including residual values not just at a specific time or event
but for each hour during the day. Also, a graphical representation of the
correlation between prediction and observation can be developed by plotting
prediction-observation concentration pairs on a scatter plot, much as was
done for station values in Figure C-6.
C-33
-------
c. "Pattern Recognition" Area Performance Measures
Considerable information about model performance often can be found
through the use of "pattern recognition" area performance measures. Even
if a comparison between prediction and observation is difficult due to the
sparsity of the latter data, insight can still be gained through the use
of the measures described here.
The spatial and temporal development of the pollutant cloud is of con-
siderable interest. Frequently, differences between prediction and obser-
vation can be spotted quickly by comparing isopleth plots showing contours
of constant pollutant concentrations. The development of the cloud can be
portrayed graphically in a series of hourly isopleth plots. Shown in
Figures C-13(a) through (e) is a series of hourly isopleth plots. These
represent predictions for ozone generated by the SAI Urban Airshed Model
for the Denver Metropolitan region on 29 July 1975. The locations of the
measurement stations are also shown, as they were in Figure C-8.
The example illustrated in Figure C-13 is typical of applications involv-
ing multiple-source, region-oriented issues (SIP/C, AQMP). However, for
specific-source issues, the downwind isopleth contours are approximately
elliptical. An example of a specific-source isopleth, or "footprint,"
plot was presented earlier in Figure V-4 in Chapter V.
Model performance can also be characterized by comparing against
observation the time histories of the size of the area in which concentrations
exceed a certain value. Such a comparison would provide insight into the
temporal variation of prediction-observation differences. An example of
such a history is presented in Figure C-14 for ozone in the Denver Metro-
politan region. A meteorology the same as that observed on 28 July 1976
was employed by the SAI Urban Airshed Model, along with emissions for that
date and projected emissions for 1985 and 2000, to predict the spatial and
temporal distribution of ozone for each year. Lines of constant concentra-
tion values are also shown.
C-34
-------
NORTH
SOUTH
FIGURE C-13.
(a) Hour 0800-0900 MST
ISOPLETHS OF OZONE CONCENTRATIONS (pphm) ON 29 JULY 1975.
Isopleth interval 1 pphm. This figure is based on pre-
dictions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
C-35
-------
NORTH
(b) Hour 1000-1100 MST
FIGURE C-13 (Continued)
C-36
-------
NORTH
SOUTH
(c) Hour 1200-1300 MST
FIGURE C-13 (Continued)
C-37
-------
SOUTH
(d) Hour 1400-1500 MST
FIGURE C-13 (Continued)
C-38
-------
NORTH
SOUTH
(e) Hour 1600-1700 MST
FIGURE C-13 (Concluded)
C-39
-------
Year 1976 Emissions
Year 1985 Emissions
Year 2000 Emissions
Time of Day By Hourly Interval
Meteorology for 28 July 1976 Assumed
FIGURE C-14. SIZE OF AREA IN WHICH PREDICTED OZONE CONCENTRATIONS EXCEED GIVEN VALUES FOR YEARS 1976, 1985,
AND 2000. This figure is based on predictions of the SAI Urban Airshed Model for the Denver
Metropolitan region.
-------
If both the predicted and observed concentration fields are resolved
compatibly to the same scale, the two can be differenced and the residuals
plotted directly as isopleth contour plots. This may be done either at a
fixed time/event or hourly. The example shown in Figure C-15 is typical
of such a plot, although it was not derived from observational data. This
particular figure was calculated by differencing the annual NO2 concentra-
tions predicted by the EPA's Climatological Dispersion Model (CDM) for two
emissions regions: one a base case and the other a 17.5 percent reduction
in emissions in downtown Denver. Since the magnitude of the residuals may
be strongly a function of certain atmospheric forcing variables (wind
speed or inversion height, for instance), it can be helpful to normalize
residuals to the forcing variable values.
Several model performance problems can be spotted qualitatively using
residual isopleth plots. Some of those that might be apparent are:
> Good peak/poor spatial agreement.
> Bad peak/good spatial agreement.
> Different peak location.
A composite measure can also be useful in assessing the relative peak/
spatial performance of a model. The peak-to-overall indicator can be calculated
at the time of the peak as the ratio of the mean residual in the vicinity of the
peak (where concentrations are within 10 percent of the peak, for example) to
the mean residual in the overall region.
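A minimal sketch of the peak-to-overall indicator (Python; hypothetical names,
and the use of absolute residuals is an assumption) is:

import numpy as np

def peak_to_overall_indicator(pred_field, obs_field):
    # Ratio of the mean absolute residual where observed concentrations are
    # within 10 percent of the observed peak to the mean absolute residual
    # over the whole region.
    pred_field, obs_field = np.asarray(pred_field), np.asarray(obs_field)
    resid = np.abs(pred_field - obs_field)
    near_peak = obs_field >= 0.9 * obs_field.max()
    return resid[near_peak].mean() / resid.mean()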
4. EXPOSURE/DOSAGE PERFORMANCE MEASURES
The health effects experienced by an individual in a polluted region
seem to be a function of both the concentration level and the duration of
exposure. The aggregate impact experienced by the total populace would be
expressed by the sum of the effects impacting each individual. The serious-
ness of the pollutant problem would be related not just to the spatial and
temporal development of the pollutant alone but also to the spatial and temporal
distribution of the population living beneath it. Several performance measures
attempt to gauge model performance on this basis.
C-41
-------
NORTH
SOUTH
FIGURE C-15.
TYPICAL RESIDUALS ISOPLETH PLOT FOR ANNUAL AVERAGE NO2.
Units are in
C-42
-------
In this section we present some of these performance measures, acknow-
ledging at the outset the difficulty of their computation in practice. Whether
the spatial scale is urban/regional or source-specific, the problem is essen-
tially the same. Not only must the predicted and observed concentration field
be known, but also the population distribution. All are temporally and spatially
varying. Conceivably, the observed concentration field may be estimable from
station measurements. Recording actual population movements during the modeling
day, however, seems a nearly insurmountable task. In reconciling these problems,
several options seem available; among these are the following two:
> If the observed concentration field can be estimated
acceptably well, both it and the predicted field can
be used with the predicted population distribution to
compute exposure dosage measures for comparisons. Such a
predicted distribution is frequently available when multiple-
source, region-oriented issues are being considered. To
characterize diurnal variations in emissions, particularly
mobile automotive ones, one must estimate the diurnal
patterns of population movement. Having done so, one can
infer the hourly spatial distribution of population. How-
ever, for specific-source issues, population distribution
is seldom considered. Since only the emissions from the
individual source are of interest, those of the same species
resulting from nearby population-related activities need not
be explicitly considered, except to compute a background con-
centration over which the specific-source emissions are super-
imposed. Unless additional information can be gathered
(from a traffic planning agency perhaps), population distri-
bution may not be available, even as a prediction.
> If the observed concentration field is not known acceptably
well, computation of the observed exposure/dosage measures
cannot be accomplished. However, these quantities often can be
C-43
-------
calculated for model predictions (presuming a predicted
population distribution history is available). Even though
these cannot be compared against their observed values,
they can help characterize model predictions. A model
sensitivity analysis can be conducted to estimate the effect
of population distribution on exposure/dosage calculations.
If the calculations prove sensitive, the gathering of additional observational data
might be warranted, as would an expanded effort in predicting
population movement.
The exposure/dosage performance measures considered here fall into
three types: scalar, statistical, and "pattern recognition." We
present in Table C-6 some specific measures.
a. Scalar Exposure/Dosage Performance Measures
Several performance measures are defined in terms of concentration
exposure and dosage. The exposure is defined to be the product of the
number of persons experiencing a concentration in excess of a certain value
and the time duration over which the value is exceeded. It is expressed
analytically as follows:
  E^m(x,y,\eta) = \int_{t_1}^{t_2} P(x,y,t)\, u\!\left[ C^m(x,y,t) - \eta \right] dt    (C-28)

where E^m(x,y,η) is the exposure at a point (x,y) to a concentration C^m(x,y,t)
of species m in excess of a given level, η (the NAAQS, for example);
P(x,y,t) is the population level at (x,y) at time t; u is the unit step
function such that

  u(z) = 1 for z > 0, and u(z) = 0 for z < 0
C-44
-------
TABLE C-6. SOME EXPOSURE/DOSAGE PERFORMANCE MEASURES
Type: Scalar
  a. Difference for the modeling day in the number of person-hours of
     exposure to concentrations:
       1) Greater than the NAAQS
       2) Within 10 percent of the peak.
  b. Difference for the modeling day in the total pollutant dosage.

Type: Statistical
  a. Differences in the exposure/concentration frequency distribution
     function; differences in the following are of interest:
       1) Cumulative distribution function
       2) Density function
       3) Expected value of concentration
       4) Standard deviation of density function
  b. Cumulative dosage distribution function as a function of time during
     the modeled day.

Type: Pattern recognition
  For each hour during the modeled day, an isopleth plot of the following
  (both for predictions and observations):
    1) Dosage
    2) Exposure
C-45
-------
and Δt = t_2 - t_1 is the duration of exposure. The total exposure between
t_1 and t_2 over a region measuring X by Y can be written as

  E_T^m(\eta) = \int_0^Y \int_0^X E^m(x,y,\eta)\, dx\, dy    (C-30)

Since in practice the predicted and observed concentration fields are
known only at discrete points on a ground-level grid, it follows that the
population function P(x,y,t) must be resolved into a compatible, discrete
form. Once this is done, the discrete forms of Eqs. (C-28) and (C-30) can
be written as follows:

  E_{ij}^m(\eta) = \sum_{n=N_1}^{N_2} P_{ij}^n\, u\!\left[ C_{ij}^{m,n} - \eta \right]    (C-31)

  E_T^m(\eta) = \sum_{j=1}^{J} \sum_{i=1}^{I} E_{ij}^m(\eta)    (C-32)

where I and J are the X and Y dimensions of the grid while N_1 and N_2 are
the starting and ending hours of the summation.
Dosage is defined as the product of the population at a given point,
the pollutant concentration to which that population is exposed, and the
length of time for which the exposure to that concentration persists. The
dosage provides a measure of the total amount of pollutant present in the
total volume of air inhaled by people over the time period of interest. This
may be illustrated as follows. Let the dosage, D, be in units of ppm-person-
hour. If the volume of air inhaled is V cubic meters per person-hour, the
quantity of pollutant, Q, present in the air may be estimated as
  Q = D V \times 10^{-6} \ \text{cubic meters}    (C-33)

If V is assumed to be a constant, then Q is proportional to D and the dosage
D provides a measure of Q. It may be noted that the dosage provides no
C-46
-------
information as to the amount of pollutant inhaled per person. The dosage
at a point (x,y) may be expressed as

  D^m(x,y) = \int_{t_1}^{t_2} P(x,y,t)\, C^m(x,y,t)\, dt    (C-34)

while the total dosage within an area X by Y is

  D_T^m = \int_0^Y \int_0^X D^m(x,y)\, dx\, dy    (C-35)

Expressed in discrete terms, these two equations can be written as

  D_{ij}^m = \sum_{n=N_1}^{N_2} P_{ij}^n\, C_{ij}^{m,n}    (C-36)

  D_T^m = \sum_{j=1}^{J} \sum_{i=1}^{I} D_{ij}^m    (C-37)
Using Eqs. (C-31) and (C-32) we can calculate two measures of interest:
We can determine for the predicted and observed concentrations the number
of person-hours of exposure to concentrations (1) greater than the NAAQS
and (2) near the peak (within 10 percent, for example). Using Eqs. (C-36)
and (C-37), we can determine for the modeling day the total predicted and
observed pollutant dosage. By comparison of the predicted and observed
values, the seriousness of any differences between the two can be estimated
in a way that relates, though crudely, to pollutant health impact.
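As an illustrative sketch (Python; not part of the report), the discrete
exposure and dosage of Eqs. (C-31), (C-32), (C-36), and (C-37) might be
computed as follows, assuming one-hour time steps so that each exceedance
hour contributes one person-hour:

import numpy as np

def exposure_and_dosage(conc, population, threshold):
    # conc and population are arrays shaped (hours, J, I); one-hour time
    # steps are assumed.
    conc, population = np.asarray(conc), np.asarray(population)
    exposure_cells = (population * (conc > threshold)).sum(axis=0)  # person-hours per cell (C-31)
    total_exposure = exposure_cells.sum()                           # Eq. (C-32)
    dosage_cells = (population * conc).sum(axis=0)                  # concentration-person-hours per cell (C-36)
    total_dosage = dosage_cells.sum()                               # Eq. (C-37)
    return total_exposure, total_dosage

# The same call applied to the predicted field and to the inferred observed
# field gives the scalar differences listed in Table C-6.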
b. Statistical Exposure/Dosage Performance Measures
Exposure/dosage performance measures have several useful statistical
variants. One of these is the difference between the predicted and observed
exposure/concentration distribution function. An example of such a function
is shown in Figure C-16, calculated for ozone in the Denver Metropolitan
C-47
-------
[Plot residue removed. x-axis: Ozone Concentration (pphm).]
FIGURE C-16.
ESTIMATED EXPOSURE TO OZONE AS A FUNCTION OF OZONE
CONCENTRATION FOR 3 AUGUST 1976 METEOROLOGY. This
figure is based on predictions of the SAI Urban
Airshed Model for the Denver Metropolitan region.
C-48
-------
region. The figure is based on predictions made by the SAI Urban Airshed
Model using actual emissions and meteorology for 3 August 1976, as well
as projected emissions for 1985 and 2000.
Certain statistics of the exposure distribution are useful: the
cumulative distribution function (CDF) itself, the density function (fE),
the expected value of the pollutant concentration, and the standard devia-
tion of the density function. We show in Figure C-17 a representation of
the general shapes taken by the CDF_E and the f_E.
FIGURE C-17.
GENERAL SHAPE OF THE EXPOSURE CUMULATIVE
DISTRIBUTION AND DENSITY FUNCTIONS
Incorporated in this figure are two important assumptions: None of the
population is exposed to concentrations above the peak value, Cp, while
all are exposed to concentrations at least as high as the background value,
C_B.
The first of these is certainly a valid assumption. The second may
not be accurate in all circumstances. Those persons spending their days
C-49
-------
indoors within environmentally controlled buildings may experience lesser
concentrations than the background value. Noting this possible limitation,
however, we proceed.
The CDF_E can be derived from the exposure function defined in Eq. (C-30)
and illustrated with the example in Figure C-16. It can be expressed as

  \mathrm{CDF}_E(C) = 1 - \frac{E_T^m(C)}{E_T}    (C-38)

The density function, f_E, can be derived from this relation as follows:

  f_E(C) = \frac{d}{dC}\left[ \mathrm{CDF}_E(C) \right]    (C-39)

Combining Eqs. (C-28) and (C-30), we can write

  E_T^m(C) = \int_Y \int_X \int_{\Delta t} P(x,y,t)\, u\!\left[ C^m(x,y,t) - C \right] dt\, dx\, dy    (C-40)

From this, we can express its derivative as

  \frac{d}{dC}\left[ E_T^m(C) \right] = -\int_Y \int_X \int_{\Delta t} P(x,y,t)\, \delta\!\left[ C^m(x,y,t) - C \right] dt\, dx\, dy    (C-41)
C-50
-------
where δ is the Dirac delta function, which vanishes for all z ≠ 0 and whose
integral over z is unity. The density function can thus be written as

  f_E(C) = \frac{1}{E_T} \int_Y \int_X \int_{\Delta t} P(x,y,t)\, \delta\!\left[ C^m(x,y,t) - C \right] dt\, dx\, dy    (C-42)
The expected value, μ_E, and the standard deviation, σ_E, are defined as follows:

  \mu_E = \int_{C_B}^{C_P} C\, f_E(C)\, dC    (C-43)

  \sigma_E^2 = \int_{C_B}^{C_P} \left( C - \mu_E \right)^2 f_E(C)\, dC    (C-44)
-------
This function has the form shown in Figure C-18.
[Figure residue removed: a pulse of width ΔC on the concentration axis.]
FIGURE C-18. SHAPE OF δ̂(C), THE APPROXIMATION TO
THE DELTA FUNCTION
Using Eq. (C-45), the discrete form of the density function can be written
in the following form:

  f_E(C_k) = \frac{1}{E_T} \sum_{n=N_1}^{N_2} \sum_{j=1}^{J} \sum_{i=1}^{I} P_{ij}^n\, \hat{\delta}\!\left[ C_{ij}^{m,n} - C_k \right]    (C-46)

The expected value and standard deviation then can be expressed as

  \mu_E = \sum_{k=1}^{K} C_k\, f_E(C_k)\, \Delta C    (C-47)

  \sigma_E^2 = \sum_{k=1}^{K} \left( C_k - \mu_E \right)^2 f_E(C_k)\, \Delta C    (C-48)

where K is the number of equally spaced intervals, ΔC, spanning the concen-
tration range from C_B to C_P.
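As a sketch only (Python; hypothetical names, and the binning scheme is an
assumption), the discrete exposure density and its moments might be estimated
by accumulating person-hours in equally spaced concentration intervals:

import numpy as np

def exposure_density(conc, population, c_background, c_peak, n_bins=20):
    # Person-hours are accumulated in the bin containing each cell's hourly
    # concentration, then normalized so the density integrates to one over
    # the interval [C_B, C_P].
    conc = np.asarray(conc, dtype=float).ravel()
    population = np.asarray(population, dtype=float).ravel()
    edges = np.linspace(c_background, c_peak, n_bins + 1)
    person_hours, _ = np.histogram(conc, bins=edges, weights=population)
    dc = edges[1] - edges[0]
    f_e = person_hours / (person_hours.sum() * dc)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu_e = (centers * f_e * dc).sum()                               # like Eq. (C-47)
    sigma_e = np.sqrt((((centers - mu_e) ** 2) * f_e * dc).sum())   # like Eq. (C-48)
    return centers, f_e, mu_e, sigma_e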
C-52
-------
The quantities described above (the CDF_E, f_E, μ_E, and σ_E) form the
basis for a comparison between prediction and observation. Differences
in the shape of the CDF_E can be characterized by differences in μ_E and
σ_E², as well as being revealed by differences in the qualitative shapes
of the f_E. If these differences are large, model performance may be
judged unacceptable.
The variation of the cumulative dosage function during the modeling
day is another means for comparing prediction with observation. An example
of such a dosage function is shown in Figure C-19, calculated for ozone in
the Denver Metropolitan region. The figure is based on predictions made
by the SAI Urban Airshed Model.
c. "Pattern Recognition" Exposure/Dosage Performance Measures
The performance of a model in predicting exposure and dosage can be
judged qualitatively by comparing isopleth plots of predicted values with a
similar plot showing observed ones. We present in Figures C-20 and C-21 the
ozone exposure and dosage contours, respectively, predicted by the SAI Urban
Airshed Model for Denver on 3 August 1976. The population distribution
assumed in each was based on data supplied by the Denver Regional Council
of Governments. Residential population figures were corrected temporally
to account for daytime employment patterns. No attempt was made, however,
to adjust for other shifts during the day.
In Figure C-20, the cumulative exposure at one-mile intervals is shown.
Isopleths of exposure to concentrations greater than a certain value are
included for three different levels. In Figure C-21, the cumulative dosages
are shown for each point on the same one-mile spaced grid. In both figures,
the interval of time considered was 13 hours, from 500 to 1800 (MST).
5. "HYBRID" PERFORMANCE MEASURES
As noted earlier, model predictions often are more finely resolved
spatially than measurement data. A consequence of this is the following:
C-53
-------
[Plot residue removed. x-axis: Time of Day (MST).]
FIGURE C-19.
CUMULATIVE OZONE DOSAGE AS A FUNCTION OF TIME OF DAY
FOR 3 AUGUST 1976 METEOROLOGY. This figure is based
on the predictions of the SAI Urban Airshed Model for
the Denver Metropolitan region.
C-54
-------
(a) Concentration Greater than 8 pphm; Year 1976 Emissions
FIGURE C-20. CUMULATIVE EXPOSURE (IN 10³ PERSON-HOURS) TO OZONE CONCENTRATIONS ABOVE GIVEN
LEVEL IN ONE-SQUARE-MILE GRID CELLS BETWEEN 500 AND 1800 HOURS FOR 3 AUGUST
1976 METEOROLOGY AND 1976 EMISSIONS. Grid numbers are listed on left side and
top of figure. This plot is based on predictions of the SAI Urban Airshed
Model for the Denver Metropolitan region.
-------
(b) Concentration Greater than 16 pphm; Year 1976 Emissions
FIGURE C-20 (Continued)
-------
(c) Concentration Greater than 24 pphm; Year 1976 Emissions
FIGURE C-20 (Concluded)
-------
FIGURE C-21.
CUMULATIVE OZONE DOSAGES (IN 10⁶ PPHM-PERSON-HOURS) IN ONE-SQUARE-MILE GRID
CELLS FROM 500 TO 1800 HOURS (MST) for 3 AUGUST 1976 METEOROLOGY AND EMISSIONS
IN 1976. This figure is based on predictions of the SAI Urban Airshed Model
for the Denver Metropolitan region.
-------
model performance sometimes must be evaluated using performance measures
requiring different classes of data "completeness." For instance, the
observed concentration field may not be inferred reliably from station
data even though the predicted field can be well described. In such a
case, concentration isopleth plots for both could not be constructed and
compared directly. Still, we would not wish to rely solely on station
performance measures. To do so, we would sacrifice some of the information
content available on the prediction side of the comparison.
Several performance measures are "hybrid" ones. They are designed
for use when a different level of concentration information is available
for prediction than for observation. We discuss here such a measure, the
basis for which is shown in Figure C-22.
MEASUREMENT
STATION
PREDICTED CONCENTRATION FIELD
NEAREST POINT AT WHICH
PREDICTION EQUALS STATION
OBSERVATION
ACTUAL CONCENTRATION FIELD
FIGURE C-22. ORIENTATION WITH RESPECT TO MEASUREMENT STATION OF NEAREST
POINT AT WHICH PREDICTION EQUALS STATION OBSERVATION
C-59
-------
In the figure, isopleths are shown for the predicted and actual concentration
fields. Only at the measurement station, however, is data available describing
the actual field. The offset between the two fields nevertheless can be
characterized by determining the vector (distance, azimuthal orientation)
from the station to the nearest point at which the predicted concentration
equals the measured value. This can be done for several hours, producing
a time history of the distance and orientation of that point. A plot of
this can be constructed, as shown in Figure C-23.
[Figure residue removed. The trace is plotted on north-south and east-west
axes, with the nearest matching point marked for several hours of the day
(6-7 a.m. through 5-6 p.m.).]
FIGURE C-23.
SPACE-TIME TRACE OF LOCATION OF NEAREST POINT
PREDICTING A CONCENTRATION EQUAL TO THE
STATION MEASURED VALUE
The space-time trace shown in the figure is centered at the measurement
station. Similar traces could be constructed for each station. Space-time
correlations could be made to infer the amount and orientation of the
displacement of the two concentration fields.
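A minimal sketch of this hybrid measure (Python; hypothetical names, with an
assumed relative tolerance for "equality" and an assumed grid orientation in
which row index increases northward and column index increases eastward) is:

import numpy as np

def nearest_matching_point(pred_field, station_ij, observed_value, cell_km, tol=0.05):
    # Distance and bearing from a monitoring station to the nearest grid cell
    # whose predicted concentration matches the station observation to within
    # a relative tolerance.
    pred_field = np.asarray(pred_field, dtype=float)
    match = np.abs(pred_field - observed_value) <= tol * observed_value
    if not match.any():
        return None
    jj, ii = np.nonzero(match)
    dx = (ii - station_ij[1]) * cell_km          # east-west offset, km
    dy = (jj - station_ij[0]) * cell_km          # north-south offset, km
    dist = np.hypot(dx, dy)
    k = dist.argmin()
    bearing = np.degrees(np.arctan2(dx[k], dy[k])) % 360.0  # degrees clockwise from north
    return dist[k], bearing

# Applying this for each hour of the modeling day yields the kind of
# space-time trace sketched in Figure C-23.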
C-60
-------
APPENDIX D
SEVERAL RATIONALES FOR SETTING
MODEL PERFORMANCE STANDARDS
D-l
-------
APPENDIX D
SEVERAL RATIONALES FOR SETTING MODEL PERFORMANCE STANDARDS
In Chapter VI of this report, we identify a "preferred" set of model
performance measures, the values of which are helpful in assessing the degree
to which model predictions agree with observations. It remains for us to
decide how "close" these must be in order to judge model performance to be
acceptably good. In this appendix, we present four alternate rationales
for making such decisions: Health Effects, Control Level Uncertainty,
Guaranteed Compliance, and Pragmatic Historic. To maintain perspective
about each rationale and the problems for which their use may be appropriate,
we recommend Section D of Chapter VI be read prior to considering this appendix.
1. Health Effects Rationale
Ambient pollutant concentrations are not themselves our most funda-
mental concern but rather the adverse health effects they produce. The
NAAQS are chosen to serve as measurable, enforceable surrogates for the
"acceptable" levels of health impact they imply. Because health effects
are of such basic importance, it makes sense to define model performance
in such terms. However, quantifying the health effects resulting from
exposure to a specified pollutant level can be a difficult and controver-
sial task. Toxicological studies in laboratories by necessity are performed
at high concentrations, often at levels and dosages seldom occurring even
in the most polluted urban areas. Experiments are conducted on animals
whose response patterns may not serve as perfect analogues for human behavior.
Epidemiological studies are confounded by the variety of effects occurring
simultaneously in a complex urban environment. Consequently, isolation
of a "cause-and-effect" relationship between health effect and pollutant
level becomes statistically very difficult.
Nevertheless, in this discussion we indicate one means whereby health
effects can be used as a basis for evaluating the acceptability of model
performance. We postulate the existence of a health effects functional, Φ,
dependent both on concentration level and on the health effects experienced by all exposed
0-2
-------
persons in the polluted region. This quantity (the area-integrated cumu-
lative health effect) we use as the metric of interest. If the ratio of
the predicted value of Φ to its observed value remains within a certain
tolerance of unity, model performance is judged acceptable.
Several features of this approach have appeal. Among these are:
> The health effects functional need not be known precisely,
only its general shape.
> The use of area-integrated cumulative health effects as a
metric has strong intuitive appeal; it is less sensitive
than dosage to concentrations not near the peak value.
> A transformation of variables reduces the spatial sensitivity
of the metric, Φ, with more than one spatially distributed
region mapped into the same value of Φ; this can result
in an increase in generality of application.
> Simplifying assumptions can be invoked to allow computation
of specific numerical values.
a. Area Cumulative Health Effects as a Concept
"Total area dosage" is frequently used as a surrogate for "total area
health effects." Mathematically, total area dosage, DT can be expressed as
DT(trt2) = J J J 2P(x,y,t)C(x,y,t)dt
dx (D_i)
X Y 11
where the duration of exposure is At (=t2-t,); P(x,y,t) and C(x,y,t)"are
the population and concentration at (x,y) at time t; and X and Y represent
the spatial limits of the polluted region.
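On gridded model output, Eq. D-1 reduces to a sum over cells and hours. The short sketch
below is our own illustration of that approximation; the array shapes, units, and values
are hypothetical.

import numpy as np

def total_area_dosage(P, C, dt_hours=1.0):
    """Approximate D_T as the sum over cells and hours of P*C*dt.
    P, C: arrays of shape (n_hours, ny, nx); population in persons per cell,
    concentration in pphm. Result is in pphm-person-hours."""
    return float(np.sum(P * C) * dt_hours)

# Illustrative values: 5 hours on a 10 x 10 grid.
rng = np.random.default_rng(0)
P = rng.uniform(0, 5000, size=(5, 10, 10))      # persons per cell
C = rng.uniform(0, 20, size=(5, 10, 10))        # pphm
print(f"D_T = {total_area_dosage(P, C):.3e} pphm-person-hours")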
However, the concentration C(x,y,t) in this relation and the time
duration of exposure really combine to approximate health effects. Suppose
that a health effects function exists such that
D-3
-------
HE = HE(C, \Delta t)    (D-2)
Such a function could behave as shown in Figure D-l, with HE disappearing
only when concentrations approach zero. Alternatively, a threshold concen-
tration might exist below which specific effects are either indistinguishable
from a background level or below the threshold of perception.
FIGURE D-1. POSSIBLE HEALTH EFFECTS CURVES
We define a new metric: the area-integrated cumulative health effects
functional, Φ. It can be written as follows:

\Phi(\Delta t) = \int_X \int_Y \int_{\Delta t} P(x,y,t)\, HE[C(x,y,t),\, t - t_1]\, dt\, dy\, dx    (D-3)
If this function could be evaluated for predicted and "true" values of
P(x,y,t) and C(x,y,t), we could formulate the performance standard such
that their ratio, r, was required to remain within a fixed tolerance of
unity, i.e.,
r = \frac{\Phi_{predicted}}{\Phi_{observed}} \geq 1 - a    (D-4)

where a is some small value (10 percent, for instance)
D-4
-------
chosen to represent a maximum acceptable level of uncertainty in
aggregate health impact. It may be noted with this standard that
model acceptability is called into doubt only if the predicted value
of Φ is less than the "observed" value. This makes sense for the
following reason: Considering only a perspective based on health
effects, we are concerned that the model predict conditions leading
to health impact at least (or nearly so) as large as actually occurs.
To bound the model on the "upper" side, another rationale must be used
(control level uncertainty, perhaps).
The expression in Eq. D-3, however, is of only academic interest
unless it can be made more tractable. Several of its key limitations
are as follows:
> It is a spatial integral. The values of P(x,y,t) and C(x,y,t)
change for each new application locale. Thus it is diffi-
cult to extend results obtained in one situation to those
expected in any new one.
> The health effects function, HE, is dependent on concen-
tration and cannot be expressed directly without being
"mapped" through the concentration field.
However, through a transformation of variables, some difficulties
can be overcome. We will replace the double spatial integration in Eq. D-3
by a single concentration integration taken over the range
of ambient values (background, C_B, to the current peak, C_P). The total
population within the modeling region at time t, P_T(t), can be written as

\int_X \int_Y P(x,y,t)\, dy\, dx = P_T(t) = \int_{C_B}^{C_P(t)} w(C,t)\, dC    (D-5)

where w(C,t) is the population exposed to a concentration C at time t.
(By definition, no one is exposed to concentrations lower than the
D-5
-------
background value, C_B.) A pictorial representation of the population
function P(x,y,t) and w(C,t) is shown in Figure D-2.
[Figure D-2 shows two adjacent isopleths, C and C + dC, within the modeling region;
P(x,y,t) is the population at a point, and w(C,t)dC is the population within the
band between the two isopleths.]
FIGURE D-2. REPRESENTATION OF SPATIAL AND CONCENTRATION
DEPENDENT POPULATION FUNCTIONS
The equivalence expressed in Eq. D-5 holds without qualification
providing the modeling region is chosen large enough to contain the
background (C_B) isopleth for every hour during the day. However, this
requirement can be relaxed under the following condition: No or very
few persons live or work in the area outside the modeling region but
within the C_B isopleth. In such a case the modeling region need only
be large enough to enclose within it the population of interest.
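In practice, the transformation in Eq. D-5 amounts to binning population by the
concentration it experiences. The sketch below is our own construction (the bin width,
grid, and field values are illustrative assumptions); it tabulates w(C) dC from gridded
population and concentration fields in the manner of Figure D-3.

import numpy as np

def population_by_concentration(pop, conc, c_background, c_peak, dc=0.02):
    """Return bin edges and w(C)*dC: the population exposed within each
    concentration interval [C, C+dC) between background and peak."""
    edges = np.arange(c_background, c_peak + dc, dc)
    exposed, _ = np.histogram(conc.ravel(), bins=edges, weights=pop.ravel())
    return edges, exposed

# Synthetic example: a peak-centered concentration field over a uniform population.
y, x = np.mgrid[0:40, 0:40]
conc = 0.24 - 0.015 * np.hypot(x - 20, y - 20)        # ppm, falls off radially
conc = np.clip(conc, 0.04, None)                      # background floor
pop = np.full(conc.shape, 2405.0)                     # persons per cell (assumed)
edges, w_dC = population_by_concentration(pop, conc, 0.04, 0.24, dc=0.02)
for lo, hi, n in zip(edges[:-1], edges[1:], w_dC):
    print(f"{lo:.2f}-{hi:.2f} ppm: {n:,.0f} persons")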
An important observation can now be made: The health effects func-
tion, HE, can be introduced into both sides of Eq. D-5 without disturb-
ing the equality. Doing so and integrating with respect to time, the
area integrated cumulative health effects (CHE) functional can be trans-
formed into
D-6
-------
\Phi(\Delta t) = \int_{\Delta t} \int_{C_B}^{C_P(t)} w(C,t)\, HE[C,\, t - t_1]\, dC\, dt    (D-6)
It is this equation with which we deal in the remainder of this section.
b. Components of the Cumulative Health Effects Functional
We now examine each of the two major components of the CHE functional:
the population distribution and health effects function. For Eq. D-6
to be of any use to us, it must be made analytic in a way that has a degree
of generality from one application locale to another. Consequently, we
are guided by three principal objectives: Both w(C,t) and HE(C,Δt) must
be analytic, integrable, and based upon simple, easily understood assump-
tions. To accomplish this, important simplifications are invoked. The
degree to which they limit the generality of the results is discussed,
although additional research beyond the scope of this study seems desirable.
Population Distribution Function
The function w(C,t) represents the distribution of population with
respect to both concentration level and time of day. As a first approx-
imation, we assume it is separable, i.e.,
w(C,t) = w(C)\, f_w(t)    (D-7)

where w(C) is the distribution of daytime (workday) population with
respect to concentration level alone at a particular fixed time (the time
of the concentration peak, for example), and f_w(t) is a weighting function
chosen to reflect the diurnal variation in that distribution (residential
vs. commute vs. work hours).
D-7
-------
Within a pollutant cloud, concentrations tend to be distributed as
follows: A distinct peak value occurs, with concentration falling off
as a function of radial distance from that peak. Contours of constant
concentration (isopleth lines) surround the peak concentrically, with
concentration diminishing to background levels. This radial distribution
of concentration level is suggestive. If population is distributed about
the peak such that
P(C) = \int_0^{2\pi} \int_0^{r(C)} \rho(r^*,\theta)\, r^*\, dr^*\, d\theta    (D-8)
-------
cloud may have drifted some distance (10-30 km) from the densest
population centers. However, our approach here is highly pragmatic.
To render Eq. D-9 soluble, we must invoke simplifying assumptions.
Having done so, comparison of our results with actual data offers us a
measure of our success.
Such data has been obtained from ozone exposure/dosage studies
done for the Denver Metropolitan region using the grid-based SAI
Urban Airshed Model. Shown in Figure D-3 is the population density
function predicted on 3 August 1976 for the hour from 1300-1400 (1 to
2 p.m.)—the time of the predicted ozone peak (0.24 ppm). The concen-
tration field predicted by the model was used. A coarse population
distribution was derived based upon data supplied by the Denver
Regional Council of Governments (DRCOG) and was adjusted to approximate
employment shifts. Since the analysis supplied exposure estimates only
above 0.08 ppm which were expressed no more finely than in 0.02 ppm
increments, an uncertainty band, as shown, exists about each point.
Several key observations can be made. The value of w(C) seems to
become very small at the peak concentration, i.e., while concentration
levels may be high near the peak (within 90% of it), the area (and
population) affected is small. Also, an apparent anomaly occurs between
0.18 and 0.20 ppm. This may be due to any of several causes. Population
density non-uniformities, however, appear to be the most likely of these.
Using the data contained in Figure D-3 as a standard for comparison,
we may proceed in developing a simplified, analytic form for w(C). We
make two key assumptions in doing so. First, we assume a shape for the
radial concentration distribution, C(r), which we invert to give us r(C).
Then we make a simplifying assumption about the population density
distribution, ρ(r,θ).
To estimate C(r), we may idealize isopleth contours as a series of
concentric circles, as shown in Figure D-4. Further, we may assume
D-9
-------
[Figure D-3 plots the population distribution, w(C), against concentration
(0 to 0.30 ppm), with an uncertainty band about each point.
NOTES: 1) DATE: 3 AUGUST 1976. 2) TIME OF DAY: 1300-1400 HOURS.
3) POPULATION CORRECTED TO ACCOUNT FOR EMPLOYMENT.]
FIGURE D-3. POPULATION DISTRIBUTION AS A FUNCTION OF CONCENTRATIONS. Based on predictions
of the SAI Urban Airshed Model for the Denver metropolitan region.
-------
there to be N isopleths between the peak concentration, C_P, and the
background value, C_B.
FIGURE D-4. IDEALIZED CONCENTRATION ISOPLETHS
If we assume that for isopleths separated by a constant concentra-
tion decrement, ΔC, the interisopleth distance grows exponentially (that
is, the isopleths are separated by a steadily growing distance), then
we may write an expression for the n-th radius such that

r_n = \Delta r_1 \sum_{i=0}^{n-1} e^{bi} = \Delta r_1\, \frac{1 - e^{bn}}{1 - e^{b}}    (D-10)
D-ll
-------
Since

C_n = C_P - \frac{n\,(C_P - C_B)}{N}    (D-11)

we can solve for n, substitute this into Eq. D-10, and then generalize
to yield the following:

C(r) = C_P - \frac{\Delta C}{b} \ln\left[1 - \frac{r}{\Delta r_1}\left(1 - e^{b}\right)\right]    (D-12)

where ΔC is the interisopleth concentration decrement and b is chosen
so that r(C_B) equals the radius of the pollutant cloud (here assumed to be
the urban radius). Several typical such concentration distributions
are shown in Figure D-5.
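As a check on the form of Eq. D-12 as reconstructed above, the short sketch below
(our own, not from the report) evaluates C(r) with parameter values representative of
the Denver example (Table D-1). With those values it recovers the 0.20 ppm first
isopleth at 1 mile and the 0.04 ppm background at roughly 13 miles, consistent with
the parameter definitions; treat the numbers as illustrative only.

import math

C_P, C_B = 0.24, 0.04        # peak and background ozone, ppm
dC, b, dr1 = 0.04, 0.4, 1.0  # isopleth decrement (ppm), growth exponent, first-isopleth radius (mi)

def conc_at_radius(r):
    """C(r) = C_P - (dC/b) * ln[1 - (r/dr1)*(1 - e^b)], clipped at background."""
    c = C_P - (dC / b) * math.log(1.0 - (r / dr1) * (1.0 - math.exp(b)))
    return max(c, C_B)

for r in range(0, 15, 2):
    print(f"r = {r:2d} mi  C = {conc_at_radius(r):.3f} ppm")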
We can now invert this relation to estimate r(C). Doing so, we can
write

r(C) = \Delta r_1\, \frac{1 - e^{\,b(C_P - C)/\Delta C}}{1 - e^{b}}

Substituting this and its derivative into Eq. D-9, we get an expression
for w(C) such that

w(C) = r(C)\left|\frac{dr}{dC}\right| \int_0^{2\pi} \rho\big(r(C),\theta\big)\, d\theta    (D-13)
D-12
-------
[Figure D-5 plots concentration (ppm), from the 0.24 ppm peak down to the 0.04 ppm
background, against radius from the peak, r(C), in miles (0 to 16).
NOTES: 1) Δr_1 IS THE DISTANCE FROM THE PEAK TO THE FIRST ISOPLETH, I.E.,
C_P - ΔC (THE 0.20 ppm ISOPLETH). 2) PEAK CONCENTRATION OF OZONE IS 0.24 ppm.
3) BACKGROUND CONCENTRATION IS 0.04 ppm. 4) POLLUTANT CLOUD RADIUS IS 13 MILES.]
FIGURE D-5. TYPICAL RADIAL CONCENTRATION DISTRIBUTIONS ABOUT THE PEAK. Parameters
are chosen to be representative of the Denver metropolitan region.
-------
We now make another key simplifying assumption. We approximate the value
of the integral by assuming a uniform radial population density, i.e.,

\int_0^{2\pi} \rho(r,\theta)\, d\theta = 2\pi \bar{D}    (D-14)

Substituting this into Eq. D-13, we arrive at the final form for w(C):

w(C) = K_1 \left(1 - K_2 e^{-K_3 C}\right) e^{-K_3 C}    (D-15)

where the constants K_1, K_2, and K_3 (Eqs. D-16 through D-18) are determined by
Δr_1, b, ΔC, C_P, and D̄ (with K_3 = b/ΔC), and D̄ is chosen such that the integral
of w(C) between C_B and C_P equals the total population within the modeled area.
We have made thus far a number of significant assumptions. To test
their adequacy, we can select parameter values appropriate for the Denver
example, calculate w(C), and compare the results against the data shown
in Figure D-3. The parameter values selected are shown in Table D-l.
In Figure D-6 we show the population distribution predicted by
Eq. D-15. Several observations can be made about its agreement with the
test data.
D-14
-------
TABLE D-1. SELECTED PARAMETER VALUES IN DENVER TEST CASE

Symbol   Description                                              Value
C_P      Peak concentration (ozone).                              0.24 ppm
C_B      Background concentration.                                0.04 ppm
ΔC       Concentration decrement between isopleth lines
         (N=5 isopleths).                                         0.04 ppm
b        Exponent by which interisopleth distance grows,
         selected such that C(r) equals C_B at r=13 miles
         from the peak (at the approximate urban radius).         0.4
Δr_1     Radius from peak to the first isopleth
         (the 0.20 ppm contour).                                  1 mile
D̄        Uniform population density chosen such that the
         integral of w(C) between C_B and C_P equals the
         total population (1.275 million).                        2405 persons/sq. mi.
D-l 5
-------
[Figure D-6 plots the population distribution w(C) predicted by Eq. D-15 against
concentration; vertical scale 0 to about 250.]
-------
> Qualitatively, the shapes seem to agree.
> The analytic form of w(C) seems to underpredict the
distribution of population at higher concentration
levels.
> The anomaly occurring in the data at 0.19 ppm remains
unaccounted for in the analytic form.
Despite the seeming limitations imposed by our assumptions, however,
agreement with the test data seems surprisingly good. It remains to be
seen in further investigation (beyond the scope of this study) whether
this result is typical or merely fortuitous. We emphasize that results
obtained thus far, while encouraging, should be regarded as preliminary.
In deriving Eq. D-15, we assumed a uniform population distribution.
We can estimate qualitatively from our results the change in w(C) re-
sulting from variations in this assumption. The shifts expected in w(C)
for a nonuniform population density are illustrated in Figure D-7. In
all cases the integral of w(C) is assumed to equal the total regional
population.
[Figure D-7 sketches w(C) versus concentration (from C_B upward) for a uniform
population density, for a peak occurring in a lower density region, and for a
peak occurring in a higher density region.]
FIGURE D-7.
SHIFTS IN w(C) CAUSED BY NONUNIFORM
POPULATION DISTRIBUTIONS
D-17
-------
We now consider the variation of w(C,t) with time. Temporal
changes in the function are caused by two principal effects:
> Evolution of the Concentration Field
- The peak concentration occurring at a time t, Cp(t),
increases during the morning, usually reaches a
diurnal peak in the early afternoon, and then de-
creases slightly by late afternoon.
- The overall radius of the pollutant cloud, r(C_B),
increases up to the time of the peak.
- As the day progresses, near-peak concentrations
"spread out," that is, the percentage of the total
cloud area having concentrations near the current-
hour peak (say, within 20 percent of it) increases during
the day.
> Population Shifts
- Urban areas have two distinct patterns of popula-
tion distribution during the day: residential
(non-work) and employment (workday). These are
separated by two peak-traffic commute periods.
- A percentage of the population during the day is
mobile, traveling from one point to another.
We have assumed here that the total impact of these effects can be
approximated by a separable weighting function, fw(t), applied to the
function w(C). The extent to which this is valid needs to be verified
by additional investigation. Yet, as a first approximation it has
some plausibility, and it allows us to proceed to an analytic result
for model performance standards—our principal objective.
Health Effects Function
Health effects resulting from exposure to polluted air manifest
themselves in many ways, each varying in the symptom it produces and
D-18
-------
the seriousness of its impact. Among such effects are the following:
bronchial irritation, reduced lung function, enzyme damage, eye irri-
tation, dizziness, and coughing. Some of these manifest themselves as
noticeable but low-level discomfort; others produce more serious impact
such as aggravation of respiratory illness. Equating each effect on an
absolute scale and relating their aggregate weighted impact directly to
ambient pollutant levels, however, is a formidable task. Efforts at doing
so have been subject to uncertainty and controversy. To overcome these
difficulties, we resort to several conceptual simplifications. Rather
than differentiating between individual health effects, we collapse them
together into a single function, whose "seriousness" is dependent on
concentration level, C, and duration of exposure, At. We represent
this by the following:

HE = HE(C, \Delta t)    (D-19)
We now make an intuitive appeal. While we may not know the value
of HE in an absolute sense, we observe that its value increases, that
is, the HE gets "worse," as concentration levels rise and the duration
of exposure increases. Further, because health effects at higher con-
centrations and durations are more serious, we expect HE to grow faster
than linearly with increasing C (and probably At). We also can expect
HE to exist even at very low values of C, though these effects may be
small, perhaps below the threshold of human perception. Qualitatively,
the shape of HE might look as shown in Figure D-8.
Based on the reasons noted above, we can make a useful approximation.
We assume that HE is separable, one part dependent on C and the other on Δt, and that
it can be described by the following simple relation:

HE(C, \Delta t) = A\, C^{\gamma} f_{HE}(\Delta t)    (D-20)

where A is a scaling constant (whose value we need not know, as we shall
observe later); γ is a "shaping" parameter whose value is likely to be
D-19
-------
[Panels: (a) Variation With Concentration, C, with a possible threshold indicated;
(b) Variation With Exposure Time, Δt.]
FIGURE D-8. EXPECTED SHAPE OF HEALTH EFFECTS FUNCTION
greater than one (i.e., HE grows faster than linearly); and f_HE(Δt) is a weighting function
dependent solely on exposure time.
c. Analytic Solution of the Cumulative Health Effects Functional
Having now specified analytic forms for the population distribution
function, w(C,t), and the health effects function, HE(C,Δt), we may
proceed to evaluate the area-integrated cumulative health effects func-
tional, Φ, as it was defined in Eq. D-6. We may rewrite Φ as follows:

\Phi(\Delta t) = \left[\int_{\Delta t} f_w(t)\, f_{HE}(t - t_1)\, dt\right]\left[A \int_{C_B}^{C_P} w(C)\, C^{\gamma}\, dC\right] = F(\Delta t)\, \Phi(C_P)    (D-21)
where Cp is the peak concentration experienced during the day.
Using relations developed previously, we may evaluate Φ. Its
value is
D-20
-------
\Phi(C_P) = A \int_{C_B}^{C_P} w(C)\, C^{\gamma}\, dC = A \int_{C_B}^{C_P} K_1\left(1 - K_2 e^{-K_3 C}\right) e^{-K_3 C}\, C^{\gamma}\, dC    (D-22)

Though no completely general solution exists to this equation, the
integral may be evaluated in closed form for each integer value of γ,
the health effects function shaping parameter. A point-wise analytic
solution to Eq. D-22 thus exists.
d. Calculation of Minimum Allowable Predicted Peak
As noted in Eq. D-4, the model performance standard could be
specified in terms of a minimum allowable ratio of the "predicted"
to "measured" values of Φ:

r = \frac{\Phi(C_{pp})}{\Phi(C_{pm})} \geq 1 - a    (D-23)
-------
where C_pp is the predicted peak concentration and C_pm is the measured
peak value. By writing the standard in this form, an important simpli-
fication results: Two parameters, being constant, appear outside the
integrals in the numerator and denominator of Eq. D-23. Since their
values in both are equal, they cancel. By this means, we eliminate the
need for "knowing" the health effects function scaling coefficient, A,
and the population distribution scaling constant, K_1. With the
rationale we present here, uncertainty associated with both, while
appreciable, thus does not affect the setting of performance standards.
We can invert Eq. D-23 to solve for the minimum allowable ratio
of predicted to measured peak concentration value. We do so for the
Denver example discussed earlier, presenting the results in Figure
D-9. We show results for several representative values of γ and r.
If health effects varied linearly with concentration and r equaled
0.90, for instance, any predicted peak higher than
64 percent of the measured peak value would be acceptable. Similarly, if health effects
were a cubic function of concentration and r=0.90, the predicted peak
would have to exceed 80 percent of the measured value.
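The inversion of Eq. D-23 can also be carried out numerically. The sketch below relies
on our reconstruction of w(C) (Eq. D-15 with K_3 = b/ΔC and the Denver parameter values),
so its results are illustrative rather than authoritative; under those assumptions it
yields minimum ratios of roughly 0.64 for γ = 1 and 0.80 for γ = 3, close to the values
quoted above.

import numpy as np

C_P, C_B = 0.24, 0.04      # measured peak and background, ppm
k = 0.4 / 0.04             # assumed K_3 = b / dC, per ppm

def w(C):
    # Reconstructed (unnormalized) population distribution; scaling constants
    # cancel in the ratio of Eq. D-23.
    return (np.exp(k * (C_P - C)) - 1.0) * np.exp(k * (C_P - C))

def phi(c_peak, gamma, n=4000):
    # Numerical evaluation of A * integral of w(C) * C^gamma from C_B to c_peak
    # (trapezoid rule; the constant A cancels in the ratio).
    C = np.linspace(C_B, c_peak, n)
    f = w(C) * C**gamma
    return float(np.sum((f[1:] + f[:-1]) * np.diff(C)) / 2.0)

def min_peak_ratio(gamma, r=0.90):
    # Bisection on the predicted peak such that phi(Cpp)/phi(Cpm) = r.
    denom = phi(C_P, gamma)
    lo, hi = C_B, C_P
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if phi(mid, gamma) / denom < r:
            lo = mid
        else:
            hi = mid
    return hi / C_P

for gamma in (1, 2, 3, 4):
    print(f"gamma = {gamma}: Cpp/Cpm >= {min_peak_ratio(gamma):.2f}")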
Several decisions must be made in determining a final value for a
performance standard based upon this health effects rationale: A
minimum acceptable value must be chosen for r, the ratio of predicted
to measured area-integrated cumulative health effects; and a judgment
must be made about the maximum likely value of γ, the exponent of
concentration in the health effects function. Possible values for use
might be r and γ of 0.90 and 3 or 4, respectively. For reference, we
note that for γ = 10, the minimum allowable ratio of predicted to
measured peak is 94 percent.
e. The Health Effects Rationale: A Summary
A model performance standard based upon pollutant health effects
has intuitive appeal. For this reason the rationale presented in this
D-22
-------
[Figure D-9 plots the minimum allowable ratio of predicted to measured peak
(0 to 1.0) against the exponent of the health effects function, γ.]
FIGURE D-9. MINIMUM ALLOWABLE RATIO OF PREDICTED TO MEASURED PEAK CONCENTRATION VALUE
-------
section is of interest. Among the advantages it offers are the
following:
> It is general enough to be applied in many different
locales and applications; while parameters of the method
are application-dependent, the method itself is much less so.
> It is analytic and based upon easily derived parameter
values.
> The test for model acceptability is based upon a
simple comparison of predicted and measured peak
concentration values.
> Many of the sources of uncertainty in the method drop
out of its final formulation.
> Results can be condensed into a single figure such as
that shown in Figure D-9.
Similarly, the rationale has several limitations:
> Only a lower bound on the allowable difference between
predicted and measured peak is provided; a prediction
in excess of the measured peak (even by a great deal)
is not sufficient to reject a model on health effects
grounds since the model predicts effects at least as
great as those actually existing.
> The method does not evaluate explicitly a model's
spatial or temporal behavior.
The rationale presented here should be regarded as a preliminary
method. While meriting additional consideration, the method and many
of its assumptions need to be examined critically. Among the funda-
mental questions for which answers need to be sought are the following:
> On what basis do we select the minimum allowable ratio
of area-integrated cumulative health effects?
D-24
-------
> What value of health effects exponent is most appropriate?
> Does the population distribution, w(C), always repro-
duce the data as well as indicated in Figure D-6?
Does it need to?
> Is w(C,t) really a separable function, as assumed?
What about HE(C,At)?
> Are health effects really related to peak concentra-
tion and exposure time in the fashion assumed here?
What about those who work in environmentally controlled
buildings and may thus be isolated from full exposure
to ambient concentration levels?
We feel the rationale presented here has a number of advantages.
We also feel it requires a careful review and some additional examina-
tion, particularly as regards the questions noted above.
2. Control Level Uncertainty Rationale
In order to reduce peak ambient concentrations in an airshed from a
particular level to one at or below the NAAQS, reduction of emissions into
that airshed is required. The degree of that reduction, however, is
dependent on the amount by which the current peak level exceeds the
standard. Uncertainty in our knowledge of the current peak concentration
(due either to measurement or modeling limitations) translates into cor-
responding uncertainty in the amount of emissions control we must require.
This direct relationship, though generally a highly nonlinear one, forms
the basis for another rationale for setting model performance standards.
Its guiding principle is as follows: Uncertainty in the percentage of
emissions control required (PCR) must be kept to within certain allowable
bounds.
In this section we discuss this Control Level Uncertainty (CLU)
rationale. We first indicate for a specific pollutant (ozone) how one
may proceed from PCR bounds to equivalent allowable tolerances on the
difference between the predicted and measured peak concentration. We then
D-25
-------
present one means whereby the PCR bounds can be determined from the
economies of pollution control costs. Several benefits derive from
use of the CLU rationale, among which are the following:
> It makes explicit the relationship between model per-
formance limits and the maximum acceptable level of
uncertainty in estimates of regional emissions
control.
> It provides a structure whereby model performance limits
also can be related to equivalent uncertainty bounds
on the total regional cost of pollution control equipment.
The rationale presented here is a useful complement to the Health
Effects (HE) rationale presented earlier. We noted in discussion of that
rationale that it could not provide an upper bound on the maximum
allowable difference between predicted and observed peak concentration
levels. It merely required that the predicted peak be greater than a
fraction (near unity) of the measured peak, i.e., C_pp ≥ B·C_pm, where B is
near unity (e.g., 0.9). Were C_pp to be larger than C_pm, no health effect
penalty would be incurred by designing a control strategy based upon C_pp.
Rather, the principal penalty would be an economic one: The cost of control
would be greater than that actually required. It is in setting the upper
bound on the allowable value of C_pp - C_pm that the CLU rationale has its
greatest value, since it addresses directly the cost of control.
We can generalize this point as follows: The greatest cost of under-
prediction of the peak concentration lies in the underestimation of
health impact, while the greatest consequence of overprediction is the
extra economic cost associated with unnecessarily imposed control.
Health Effects and CLU, then, are compatible rationales. If the predicted
peak is required to satisfy K_1 ≤ C_pp - C_pm ≤ K_2, then it seems reason-
able that K_2 be selected based upon the CLU rationale, with K_1 chosen to
be the lesser of the values determined by the HE and CLU rationales.
D-26
-------
a. The Relationship Between CLU and the Concentration Peak
In most cases a highly nonlinear relationship exists between primary
emissions and the ambient concentrations that result from them. The
dynamic behavior of the atmosphere is complex, as are the chemical changes
undergone by dispersing pollutants carried by it. Simplifying assump-
tions, however, can sometimes be made. We consider here one example in
which this can be done.
For urban regions in which certain specific criteria are met (Hayes, 1977),
the ozone production resulting from various mixtures of the precursors,
nonmethane hydrocarbons (NMHC) and oxides of nitrogen (NOx), can be represented by means
of an ozone isopleth diagram such as the one shown in Figure D-10 (EPA,
1976). Whether the use of such a diagram is justified in a given region
depends heavily on a number of factors, among which are the prevailing
meteorology, solar insolation, emissions type/timing/geometry, terrain type/
complexity, and the presence of large upwind pollutant sources.
If a region meets the criteria, however, an isopleth diagram may be
used as an approximation relating regional emissions to consequent peak
ozone levels. The region-wide cutback in emissions of precursor HC and
NOx necessary to reach the NAAQS from a given starting point can then be
calculated, given a background ozone value (usually about 0.04 ppm) and
a control mix (NMHC versus NOx cutback). Usually, in urban areas the
emphasis has been on NMHC reduction. The starting point often is defined
in one of two ways: It is specified by a peak O3 measurement and either an
NMHC/NOx ratio typical of ambient conditions prevailing in the early
morning (6-9 a.m.) or specific concentrations of either of the precursors.
Most frequently, it is the first of these methods that is used.
Because the chief value of the isopleth diagram is in its use in
estimating regional emissions cutback, it is helpful to replot the
isopleth diagram as shown in Figure D-ll (Hayes, 1977). In doing so,
D-27
-------
[Figure D-10 shows ozone/oxidant isopleths (0.08 to 0.36 ppm) plotted on axes of
NMHC (0 to 2.0 ppmC) and NOx (ppm). Source: EPA (1976b).]
FIGURE D-10. PROTOTYPICAL ISOPLETH DIAGRAM
-------
[Figure D-11 plots percentage hydrocarbon control required (0 to 100 percent) against
the 6-9 a.m. NMHC/NOx ratio (up to 16), with separate curves for peak O3 values from
0.20 to 0.36 ppm. Note: No change in NOx level and no O3 background concentration
were assumed.]
FIGURE D-ll. THE ISOPLETH DIAGRAM REPLOTTED
-------
percentage control required (PCR) can be highlighted explicitly. While
in principle any mix of NMHC and NOx control could be considered, the
example shown assumes that only HC control is employed. That is, per-
centage control required (PCR) is equivalent to percentage hydrocarbon
control required (PHCR).
The PHCR diagram in Figure D-11 may be used in the following way
to deduce model performance standards. First, the measured peak ozone
concentration and the appropriate 6-9 a.m. NMHC to NOx ratio together
define a unique point on the PHCR diagram. The nominal PHCR is thus
identified. Then, by defining an allowable band about the nominal PHCR
(say ±a, where a is some small value), we can identify directly an
equivalent band about the measured peak ozone value. A model predicting
an ozone peak within that allowable band would be judged as acceptable
under this rationale.
We can illustrate the technique by means of an example. Suppose the
measured peak ozone was 0.16 ppm and the 6-9 a.m. NMHC/NOx ratio was estimated
to be 9.5. This point is denoted on the figure as A. From Figure D-11,
we see that the PHCR is about 70 percent. If we allow an uncertainty in
the PHCR of ± 10 percent, we see that the value based upon model predic-
tions of the peak must lie between 60 and 80 percent. The corresponding
values of peak ozone are determined from points C and B, respectively, on
the PHCR diagram. For a model to be judged as acceptable, it must
predict an ozone peak value, C_pp, such that 0.122 ≤ C_pp ≤ 0.24 ppm, or
76 ≤ C_pp/C_pm ≤ 150 percent.
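The band can be read off a tabulated PHCR curve by simple interpolation. In the sketch
below the tabulated points are hypothetical; they are merely chosen to be consistent with
the worked example (60, 70, and 80 percent control at 0.122, 0.16, and 0.24 ppm for a
6-9 a.m. NMHC/NOx ratio of 9.5). An actual application would digitize the appropriate
curve from Figure D-11.

import numpy as np

peak_o3 = np.array([0.10, 0.122, 0.16, 0.20, 0.24, 0.30])   # ppm (hypothetical)
phcr    = np.array([52.0, 60.0, 70.0, 75.5, 80.0, 85.0])    # percent (hypothetical)

measured_peak = 0.16
phcr_nominal = np.interp(measured_peak, peak_o3, phcr)      # about 70 percent
band = 10.0                                                 # allowable +/- PCR uncertainty

low_peak  = np.interp(phcr_nominal - band, phcr, peak_o3)   # peak at PHCR - 10
high_peak = np.interp(phcr_nominal + band, phcr, peak_o3)   # peak at PHCR + 10
print(f"nominal PHCR = {phcr_nominal:.0f}%")
print(f"acceptable predicted peak: {low_peak:.3f} to {high_peak:.3f} ppm "
      f"({100*low_peak/measured_peak:.0f} to {100*high_peak/measured_peak:.0f} percent of measured)")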
Several general observations may be made about the above results,
though we caution that they are particular to ozone as a pollutant.
Among the observations are the following:
> Because of the characteristic shape of ozone PHCR diagrams,
the upper value of the allowable tolerance band is less
restrictive than the lower one. This is illustrated clearly
in the example.
D-30
-------
> The allowable band for C_pp is always bounded on the upper
and lower side (as contrasted with the HE rationale, which
calculates only a lower bound).
> In those cities for which use of the ozone isopleth diagram shown
in Figure D-11 is appropriate and where the 6-9 a.m.
NMHC/NOx ratio is greater than about 5 or 6, the width of the
allowable band for C_pp is not strongly sensitive to the
value of NMHC/NOx.
b. The Relationship Between CLU and Control Cost
While the allowable uncertainty in control level (±a in the above
example) may be set in many ways, we examine here one important means to
do so: the explicit use of regional pollution control costs, if these can
be specified unambiguously. We might, for instance, choose as our guiding
principle the following: The uncertainty in the total cost of regional
pollution control should not be greater than a certain value δ. We may
restate this in terms of model performance. The level of control deriving
from the predicted peak, C_pp, should not differ in cost by more than a
certain amount from that level determined based upon the measured peak, C_pm.
To proceed we must define the total regional cost of pollution control,
TC. Depending on the level of control required, alternative regional
control strategies can be designed. The cost of each generally can be
specified, at least in approximate terms. By plotting the cost of a series
of "preferred" strategies against the level of control they achieve, TC
can be determined, as shown in Figure D-12.
Several aspects of the TC curve should be noted. While TC is zero
for a PCR of zero, any non-zero value of PCR has associated with it a
minimum, non-zero cost. Thus, the TC curve really "begins" with a step
function at PCR = 0. TC rises quickly at first as many fixed costs
of control are incurred. The cost then increases more slowly as fixed
costs are spread over greater values of PCR. Finally, at high levels of
PCR each additional amount of control becomes more difficult (and more
expensive) to achieve. The TC function, consequently, rises rapidly.
D-31
-------
[Figure D-12 sketches total regional control cost, TC, against percentage control
required (0 to 100 percent).]
FIGURE D-12.
TOTAL REGIONAL CONTROL COST AS A FUNCTION
OF THE LEVEL OF CONTROL REQUIRED
Once the total cost function has been defined, the allowable band for
the predicted ozone peak can be found in the following way:
> Step 1. The nominal control level, PCR_0, can be deter-
mined using a PHCR diagram such as that in Figure D-11.
With all-NMHC control as considered in deriving
that figure, PCR_0 is identical to PHCR_0.
> Step 2. The nominal control cost, TC_0, can be found using
a TC diagram similar to the one in Figure D-12.
> Step 3. The maximum and minimum allowable TC values then
can be calculated and the corresponding bounds
on PCR determined.
> Step 4. Using the PHCR diagram once again, the allowable
bounds on predicted peak ozone can be found by
employing the PCR bounds found in Step 3 (see the
sketch following this list).
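A sketch of these four steps follows, again using hypothetical, monotone tabulations of
the PHCR curve and the total-cost curve; none of the numerical values below are from the
report, and all control is assumed to be NMHC control.

import numpy as np

peak_o3 = np.array([0.10, 0.122, 0.16, 0.20, 0.24, 0.30])    # ppm (hypothetical)
phcr    = np.array([52.0, 60.0, 70.0, 75.5, 80.0, 85.0])     # percent control required
pcr_pts = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 95.0])      # percent control
tc_pts  = np.array([0.0, 150.0, 230.0, 320.0, 520.0, 900.0]) # total cost, $ million (hypothetical)

measured_peak = 0.16
delta = 60.0                                                 # allowable cost uncertainty, $ million

pcr0 = np.interp(measured_peak, peak_o3, phcr)               # Step 1: nominal control level
tc0 = np.interp(pcr0, pcr_pts, tc_pts)                       # Step 2: nominal control cost
pcr_lo = np.interp(tc0 - delta, tc_pts, pcr_pts)             # Step 3: PCR bounds from cost bounds
pcr_hi = np.interp(tc0 + delta, tc_pts, pcr_pts)
peak_lo = np.interp(pcr_lo, phcr, peak_o3)                   # Step 4: peak bounds from PCR bounds
peak_hi = np.interp(pcr_hi, phcr, peak_o3)
print(f"PCR0 = {pcr0:.0f}%, TC0 = ${tc0:.0f}M, allowable peak: {peak_lo:.3f}-{peak_hi:.3f} ppm")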
D-32
-------
The above procedure is a straightforward one creating a
structure in which control cost uncertainty can be considered explicitly.
The example presented, however, is appropriate only for considering ozone
in those regions having ambient conditions simple enough to be represented
by an isopleth diagram. Extension of the procedure to other pollutants
and into regions of greater atmospheric complexity requires that additional
research be conducted beyond the scope of the current effort.
3. Guaranteed Compliance Rationale
As formulated in the federal regulations, the NAAQS are explicit,
with maximum pollutant levels specified that must not be exceeded with
greater than a certain frequency. Peak one-hour concentrations of ozone,
for instance, must not exceed 0.08 ppm more often than once per year.
With the standards written in such an absolute fashion, it may be argued
that little room exists for uncertainty about achieving compliance. Under
such circumstances, a model's performance should be constrained to
"guarantee" that its use will not lead to underestimating the degree of
emissions control required.
Model behavior can affect significantly the likelihood of meeting
the NAAQS. In those regions currently in noncompliance, the effective-
ness of candidate control strategies can be assessed only by means of
model predictions of the peak concentrations resulting from each. If a
model systematically underpredicts the peak value for concentrations
near the NAAQS, the adequacy of controls might be overestimated. Similarly,
if the model overpredicts the peak, controls designed using it might be
excessive.
a. Description of the GC Rationale
With the above in mind, we examine the Guaranteed Compliance (GC)
rationale for setting model performance standards. We state its guiding
principle as follows: Compliance with the NAAQS must be "guaranteed,"
D-33
-------
with all model uncertainty on the conservative side even if it means
introducing a systematic bias into model predictions. The term
"guaranteed" should be taken here in a limited sense. We intend it
to mean that "the probability is very small" that a model will predict
a peak value less than the standard when its actual value is greater.
We illustrate this principle using the diagrams in Figures D-13
and D-14. In these figures we illustrate two models, one "conservative"
(Figure D-13) and the other "nonconservative" (Figure D-14). For each,
we show two cases: an actual peak concentration, C_A, higher than the
NAAQS, C_S, and one near the standard. We represent the probability density
function of the model as f(C) and the expected value of the predicted peak
as C̄. Two types of uncertainty affect a model's performance. The first
includes error in model inputs and uncertainty in the values of the model
parameters themselves. These affect the shape of f(C). Uncertainty of
the second type is due to the inability of the model formulation to re-
present reality fully. The difference between the expected model predic-
tion, C̄, and the actual value, C_A, of the peak concentration is a measure
of the effect of formulation errors. As we define it here, a "conserva-
tive" model is one for which the value of C̄ exceeds C_A, while for a "non-
conservative" model the reverse is true. In both figures, the shaded area
A represents the probability that the model will predict a peak concentra-
tion less than the standard at the same time the actual value is greater.
With the GC rationale, we want to ensure that A remains acceptably
small. In mathematical terms, we insist that

A = \int_0^{C_S} f(C)\, dC \leq \varepsilon    (D-25)

where ε is some suitably small number. From the figures we see that A
can be kept small only if C̄ exceeds C_A. Under the requirements of the
GC rationale, only a model having these characteristics would be judged
acceptable.
D-34
-------
[Figure D-13 plots f(C) against peak concentration for (a) a peak higher than the
NAAQS (C_A > C_S) and (b) a peak near the NAAQS; the shaded area A lies below the
standard, C_S.]
FIGURE D-13.
UNCERTAINTY DISTRIBUTION FOR
A CONSERVATIVE MODEL
[Figure D-14 shows the corresponding plots of f(C) for (a) a peak higher than the
NAAQS and (b) a peak near the NAAQS, with C̄ falling below C_A.]
FIGURE D-14. UNCERTAINTY DISTRIBUTION FOR
A NONCONSERVATIVE MODEL
D-35
-------
A practical consideration now becomes important. For peaks near
the NAAQS, we have no way of knowing the actual peak, C_A, whose value
we are trying to predict. This is clearly so. Until emissions control
has been implemented and ambient conditions "improve," we cannot estimate
C_A with measurement data. Our strategy using the GC rationale is as
follows:
> Step 1. We assume C_A = C_S and estimate the amount by
which C̄ must exceed C_A in order that A ≤ ε.
> Step 2. We then use the model to predict the peak under
current (uncontrolled) conditions, C̄*, for which
we have measurement data to estimate the current
peak, C_A*.
> Step 3. To judge acceptability, we require the model
prediction, C̄*, to exceed C_A* by as much as C̄
exceeded C_A when C_A = C_S. Actually, this is a
bit more complicated. Since C_A* is based upon
measurements, it is subject to instrumentation
error. We know C_A* only in terms of a measured
value and its probability density function. There-
fore, we must consider the comparison of C_A* and
C̄* statistically, requiring the probability that
C̄* exceeds C_A* by C̄ - C_A to be greater than some
large value (near 1.0).
We have invoked several important assumptions here, whose general
validity would require further verification if the GC rationale were to
be applied in judging model performance. Among them are the following:
> C̄ maintains the same relationship to C_A for ambient condi-
tions ranging from current ones to those characterizing
compliance with the NAAQS.
D-36
-------
> The probability density function, f(C), is known or
can be determined, as can C̄.
> Instrumentation uncertainty can be characterized,
allowing Step 3 to be accomplished.
There are several difficulties associated with the GC rationale
approach, however, some of which are conceptual and some practical.
Among the most important of the conceptual difficulties is the intro-
duction of a conservative bias into model predictions. By insisting
that the model "overpredict" peak concentrations, almost certainly
we will select abatement strategies requiring more control than needed.
Difficulties of the practical kind also can be significant. For most
models, determination of f(C) is a difficult (and usually impractical)
process. The uncertainty in predicting the peak is partially due to
uncertainty in the data input to the model. Since the model results are
related to inputs only in a complex and nonlinear way, estimating the
output uncertainty distribution in terms of the input error distributions
seldom can be done directly. While a Monte Carlo-type of analysis in
principle can be conducted, the number of model runs required and the
amount of computing resources consumed are so considerable as to render
such an analysis impractical.
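For a sufficiently cheap surrogate of the model, however, the idea is easy to state. The
toy sketch below is our own illustration (the response function and input error
distributions are invented); it propagates input uncertainty by Monte Carlo sampling and
estimates the probability A that the predicted peak falls below the standard.

import numpy as np

rng = np.random.default_rng(1)
C_STANDARD = 35.0                            # ppm (one-hour CO standard)

def predicted_peak(wind_speed, emission_rate):
    """Hypothetical surrogate: peak scales with emissions and inversely with wind."""
    return 70.0 * emission_rate / wind_speed

n = 100_000
u = rng.normal(2.0, 0.5, n).clip(0.5, None)   # wind speed, m/s
q = rng.normal(1.0, 0.15, n).clip(0.1, None)  # normalized emission rate
peaks = predicted_peak(u, q)
A = np.mean(peaks < C_STANDARD)               # fraction of runs predicting compliance
print(f"expected peak = {peaks.mean():.1f} ppm, P(prediction < standard) = {A:.3f}")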
b. A Possible Simplification
Short of doing a Monte Carlo analysis, is there anything useful that
can be determined? In certain simple circumstances, there is. We may
infer, when appropriate, some limited information about f(C), C̄, and C_A.
To do so, we first recall the modified form of Tchebycheff's inequality,

P\{|x - \mu| \geq k\sigma\} \leq \frac{4}{9k^2}    (D-26)

where P is the probability that x deviates from μ by at least kσ, x is a random
variable, μ is its expected value, and σ is its standard deviation. This
D-37
-------
relationship holds for all probability distributions. We can adapt it
to the present problem by rewriting it in the following way:
P\{C \leq C_S\} \leq \frac{2\sigma_C^2}{9(\bar{C} - C_S)^2}    (D-27)

where C is a random variable whose value is the peak concentration pre-
dicted by the model, C̄ is its expected value, and σ_C is its standard
deviation. C_S is the standard (NAAQS).
The relation in Eq. D-27 is a useful one. The area A in Figures D-13
and D-14 represents the same probability as that on the left-hand side of
Eq. D-27. Using Eq. D-25, we may now write

\frac{2\sigma_C^2}{9(\bar{C} - C_S)^2} \leq \varepsilon    (D-28)

where ε is the maximum allowable value of A. From this, we may infer the
minimum allowable value of (C̄ - C_S)/σ_C. Its value is

\frac{\bar{C} - C_S}{\sigma_C} \geq \sqrt{\frac{2}{9\varepsilon}}    (D-29)

Still, we need an independent approximation of σ_C in order to solve
Eq. D-29 for the minimum value of C̄ - C_S. To do so, we estimate the
maximum value σ_C is likely to assume, that is, the σ_C* such that
D-38
-------
\sigma_C \leq \sigma_C^*    (D-30)

If we then use σ_C* in Eq. D-29, we can determine (C̄ - C_S)_min.
Suppose we represent model behavior with a system response function,
φ, that transforms model inputs into the model-predicted concentration
peak, i.e.,

C = \phi(\underline{e})    (D-31)

where C is the predicted peak and e is the vector of model inputs. Suppose
further that we know the probability distributions of each of the input
errors and that we can identify their one-sigma variations, σ_e_i. If so,
we can determine the maximum change in the predicted peak that would occur
if all error sources varied simultaneously by a standard deviation from
their nominal values. We note that increases in some inputs lower C and
others raise it. Thus, to bound the value of ΔC, we consider the root-mean-
square of the changes in C as each input is varied separately. This max-
imum ΔC can be written as

\Delta C = \left\{ \sum_{i=1}^{N} \left[ \phi(e_1, \ldots, e_i + \sigma_{e_i}, \ldots, e_N) - \phi(\underline{e}) \right]^2 \right\}^{1/2}    (D-32)

where each e_i (1 ≤ i ≤ N) is varied separately and the corresponding change
in peak concentration is represented by the quantity in the brackets. If
we assume that ΔC is a suitable estimate of σ_C*, we can write (using
Eq. D-29)

(\bar{C} - C_S)_{min} = \sqrt{\frac{2}{9\varepsilon}}\, \Delta C    (D-33)
D-39
-------
which provides an indication of the amount of "overprediction" the model
must provide.
We now present an example. Suppose we consider a simple Gaussian
model (no reflection, continuously emitting source), whose only source
of error is the wind speed, U. We assume the following: σ_U = 0.5 m/sec,
U = 2 m/sec, and C_S = 35 ppm (the one-hour federal standard for CO). Using
Eq. D-32, we determine that ΔC = 7 ppm. Then, using Eq. D-33 and assuming
that ε = 0.05, we estimate that (C̄ - C_S)_min = 14.7 ppm. Using the GC rationale,
we would require, when modeling current ambient conditions, that the model
overpredict the peak by this same amount (assuming that there was no error
associated with the measurement).
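The arithmetic of this example is summarized in the sketch below. It relies on our
reconstruction of Eqs. D-32 and D-33 (in particular the sqrt(2/(9ε)) factor) and on a
surrogate response function that simply captures the 1/U dependence of a Gaussian plume;
with those assumptions it reproduces ΔC = 7 ppm and a required overprediction of about
14.7 ppm.

import math

def phi(wind_speed, c_ref=35.0, u_ref=2.0):
    """Surrogate: the peak of a continuously emitting Gaussian source varies as 1/U."""
    return c_ref * u_ref / wind_speed

u, sigma_u, eps = 2.0, 0.5, 0.05
delta_c = abs(phi(u + sigma_u) - phi(u))                  # single error source, Eq. D-32
overprediction = math.sqrt(2.0 / (9.0 * eps)) * delta_c   # Eq. D-33 as reconstructed
print(f"delta_C = {delta_c:.1f} ppm, required overprediction = {overprediction:.1f} ppm")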
c. The GC Rationale: An Assessment
We have included the GC rationale in our discussion primarily for the
sake of completeness. While the guiding principle underlying it--
"guaranteeing" that an adequate abatement strategy will be designed—
has its virtues, the method as conceived here has significant problems
associated with its use. It is cumbersome and impractical, except in the
most limited of circumstances. Also, it may be excessively conservative,
introducing a systematic bias into model evaluation.
Unless the major problems noted here can be solved somehow, the other
rationales considered in this chapter appear to have greater promise. We
do not recommend that this rationale be pursued extensively in any additional
work.
4. Pragmatic/Historic Rationale
Experience is growing in the use of air quality simulation models. They
have been applied to a variety of problems in a number of different situa-
tions. As familiarity grows with both their capabilities and limitations,
we become more able to foresee their behavior in new applications. Taking
D-40
-------
advantage of our growing expertise, we may find it reasonable to set per-
formance standards for models based upon the following principle: In
each new application a model must perform at least as well as the "best"
previous performance of a model in its generic class in a similar application.
This approach is a pragmatic one, forced upon us by some very practical
considerations: our limited ability to derive theoretically justifiable
values for the standards and the number of different measures required to
characterize fully model performance. Five major problem areas exist in
characterizing the agreement of model predictions with field observations.
The model may be judged on its ability to predict the concentration peak,
to avoid systematic bias, to limit absolute error, to maintain spatial
alignment, and to reproduce temporal behavior of concentrations. To assess
a model's performance in these five areas, we recommended earlier in this
chapter the use of a number of different performance measures. Our chief
difficulty is as follows: There are as yet few theoretical means to assign
appropriate values for these measures. We have identified in this report
several promising candidates for judging the prediction of peak concentrations.
Additional work is required, however, to determine appropriate standards
for many of the other measures.
While such additional work is proceeding, what must we do? Many issues
of great practical interest are pending, each of which requires the eval-
uation of model performance. Revisions to State Implementation Plans, for
instance, must be reviewed. Model performance studies now being conducted
by the EPA must continue.
We recommend that the Pragmatic/Historic rationale be used to set
acceptable bounds for performance measures for which no other better method
exists. As research provides greater insight into "better" rationales, we
recommend appropriate updates to the standards.
D-41
-------
To employ this rationale the following steps might be followed:
> Step 1. The proposed application is categorized, identifying
the group of previous studies with which its per-
formance must be compared. The criteria by which
this might be done could include pollutant type,
prevailing meteorology, source geometry, and terrain
irregularity.
> Step 2. Performance measures appropriate to the applications
category are calculated.
> Step 3. Calculated values are compared with the "best" values
previously attained in a similar application (a schematic
sketch of this bookkeeping follows).
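The bookkeeping this implies might look like the following sketch; the category keys,
measure names, and threshold values are entirely hypothetical.

BEST_HISTORICAL = {
    ("ozone", "urban", "flat-terrain"): {"peak_ratio_error": 0.15, "rmse_pphm": 3.0},
}

def meets_historical_standard(category, new_values):
    """Return a dict of pass/fail flags, one per performance measure."""
    best = BEST_HISTORICAL[category]
    return {m: new_values[m] <= best[m] for m in best}

new_study = {"peak_ratio_error": 0.12, "rmse_pphm": 3.4}
print(meets_historical_standard(("ozone", "urban", "flat-terrain"), new_study))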
For the Pragmatic/Historic rationale to be of use, the EPA would
have to accomplish the following steps. A scheme for classifying appli-
cations into "similar" categories needs to be developed. Then, data on
previous modeling efforts needs to be assembled and appropriate perfor-
mance measure values calculated. Finally, a mechanism for updating the
"performance data base" needs to be established. Such a mechanism would
require the EPA to assume a custodial role over the data base, amending
it as results of new modeling studies become available.
D-42
-------
REFERENCES
Ames, J., et al. (1978), "The User's Manual for the SAI Airshed Model,"
EM78-89, Systems Applications, Incorporated, San Rafael, California.
Anderson, G. E. (1978), private communication.
Anderson, G. E., et al. (1977), "Air Quality in the Denver Metropolitan
Region 1974-2000," EF77-22, Systems Applications, Incorporated,
San Rafael, California.
Argonne (1977), "Report to the U.S. EPA of the Specialists' Conference on
the EPA Modeling Guideline," 22-24 February 1977, Argonne National
Laboratory, Argonne, Illinois.
Burton, C. S., et al. (1976), "Oxidant/Ozone Ambient Measurement Methods,"
EF76-111R, Systems Applications, Incorporated, San Rafael, California.
Calder, K. L. (1974), "Miscellaneous Questions Relating to the Use of Air
Quality Simulation Models," Proc. of the Fifth Meeting of the Expert
Panel on Air Pollution Modeling, Chapter 6, NATO/CCMS.
Code of Federal Regulations [CFR] (1975), Title 40 (Office of the Federal
Register, U.S. Government Printing Office, Washington, D.C.).
EPA (1977), "Uses, Limitations and Technical Bases of Procedures for
Quantifying Relationships Between Photochemical Oxidants and Precur-
sors," EPA-450/2-77-021a, Office of Air Quality Planning and Standards,
Environmental Protection Agency, Research Triangle Park, North Carolina.
(1978a), "Workbook for the Comparison of Air Quality Models,"
EPA-450/2-78-028a,b, Office of Air Quality Planning and Standards,
U.S. Environmental Protection Agency, Research Triangle Park, North
Carolina.
(1978b), "Guidelines on Air Quality Models," EPA 450/2-78-027,
Office of Air Quality Planning and Standards, Environmental Protec-
tion Agency, Research Triangle Park, North Carolina.
Johnson, W. B. (1972), "Validation of Air Quality Simulation Models," Proc.
of the Third Meeting of the Expert Panel on Air Pollution Modeling,
Chapter VI, NATO/CCMS.
Liu, M. K., and D. R. Durran (1977), "The Development of a Regional Air
Pollution Model and Its Application to the Northern Great Plains,"
EPA-908/1-77-001, Office of Energy Activities, U.S. Environmental
Protection Agency, Denver, Colorado.
R-l
-------
Rosen, L. C. (1977), "A Review of Air Quality Modeling Techniques," UCID-
17382, Lawrence Livermore Laboratory, Livermore, California.
Roth, P. M., moderator (1977), "Report of the Validation and Calibration
Group (II-5)," in "Report to the U.S. EPA of the Specialists' Conference
on the EPA Modeling Guideline," pp. 111-120, 22-24 February 1977,
Argonne National Laboratory, Argonne, Illinois.
Roth, P. M., et al. (1976), "An Evaluation of Methodologies for Assessing the
Impact of Oxidant Control Strategies," EF76-112R, Systems Applications,
Incorporated, San Rafael, California.
R-2
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1. REPORT NO.
EPA-450/4-79-032
2.
3. RECIPIENT'S ACCESSION-NO.
4. TITLE AND SUBTITLE
Performance Measures and Standards for Air Quality Simulation Models
5. REPORT DATE
October 1979
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
S. R. Hayes
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Systems Applications, Incorporated
950 Northgate Drive
San Rafael, California 94903
10. PROGRAM ELEMENT NO.
11. CONTRACT/GRANT NO.
68-02-2593
12. SPONSORING AGENCY NAME AND ADDRESS
Office of Air Quality Planning and Standards
U. S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
13. TYPE OF REPORT AND PERIOD COVERED
Final Report
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES
16. ABSTRACT
Currently there are no standardized guidelines for evaluating the performance of
air quality simulation models. In this report we develop a conceptual framework for
objectively evaluating model performance. We define five attributes of a well-
behaving model: accuracy of the peak prediction, absence of systematic bias, lack of
gross error, temporal correlation, and spatial alignment. The relative importance of
these attributes is shown to depend on the issue being addressed and the pollutant
being considered. Acceptability of model behavior is determined by calculating
several performance "measures" and comparing their values with specific "standards."
Failure to demonstrate a particular attribute may or may not cause a model to be
rejected, depending on the issue and pollutant.
Comprehensive background material is presented on the elements of the performance
evaluation problem: the types of issues to be addressed, the classes of models to be
used along with the applications for which they are suited, and the categories of
performance measures available for consideration. Also, specific rationales are
developed on which performance standards could be based. Guidance on the inter-
pretation of performance measure values is provided by means of an example using a
large, grid-based air quality model.
17.
KEY WORDS AND DOCUMENT ANALYSIS
DESCRIPTORS
b. IDENTIFIERS/OPEN ENDED TERMS
c. COSATl Field/Group
Air Pollution
Turbulent Diffusion
Mathematical Models
Computer Models
Atmospheric Models
Dispersion
Air Quality Simulation
Model
Model Validation
Model Evaluation
18. DISTRIBUTION STATEMENT
Release Unlimited
19. SECURITY CLASS (This Report)
None
20. SECURITY CLASS (This page)
None
21. NO. OF PAGES
311
22. PRICE
EPA Form 2220-1 {9-73}
------- |