United States
Environmental Protection
Agency
Office of Air Quality
Planning and Standards
Research Triangle Park NC 27711
EPA-450/4-79-033
October 1979
Air

Procedures for Evaluating
the Performance of Air
Quality Simulation Models

-------
                                             EPA-450/4-79-033
Procedures for Evaluating  the  Performance
       of Air Quality Simulation Models
                                by

                     M.J. Hillyer, S.D. Reynolds, and P.M. Roth

                      Systems Applications, Incorporated
                          950 Northgate Drive
                       San Rafael, California 94903
                         Contract No. 68-02-2593
                      EPA Project Officer: Russell F. Lee
                             Prepared for

                  U.S. ENVIRONMENTAL PROTECTION AGENCY
                      Office of Air, Noise, and Radiation
                   Office of Air Quality Planning and Standards
                  Research Triangle Park, North Carolina 27711

                            October 1979

-------
This report is issued by the Environmental Protection Agency to report
technical data of interest to a limited number of readers. Copies are
available free of charge to Federal employees,  current contractors and
grantees, and nonprofit organizations - in limited quantities - from the
Library Services Office (MD-35), U.S. Environmental Protection Agency,
Research Triangle Park, North Carolina 27711; or for a nominal fee,
from the National Technical Information Service, 5285 Port Royal Road,
Springfield, Virginia 22161.
This report was furnished to the Environmental Protection Agency by
Systems Applications, Inc., 950 Northgate Drive, San Rafael, CA 94903,
in fulfillment of Contract No. 68-02-2593. The contents of this report
are reproduced herein as received from Systems Applications, Inc.
The opinions, findings, and conclusions expressed are those of the
author and not necessarily  those of the Environmental Protection Agency.
Mention of company or product names is not to be considered as an endorse-
ment by the Environmental  Protection Agency.
                   Publication No. EPA-450/4-79-033

-------
                          ACKNOWLEDGMENT
     The authors would like to express their appreciation to Stan Hayes
for many helpful discussions and useful comments during the preparation
of this report.

-------
                               CONTENTS



DISCLAIMER	     ii

ACKNOWLEDGMENT	    iii

LIST OF ILLUSTRATIONS	    vii

LIST OF TABLES	     ix

LIST OF EXHIBITS	     xi

  I  INTRODUCTION	      1

     A.  Purpose of This Study	      2

     B.  Structure of This Report	      3

 II  REVIEW OF PREVIOUS MODEL EVALUATION EFFORTS  	      5

III  KEY ISSUES IN THE EVALUATION OF AIR QUALITY MODEL PERFORMANCE	      9

 IV  A PROCEDURE FOR EVALUATING MODEL PERFORMANCE	     17

     A.  Problem Specification and Model Selection  	     20

         1.  Setting the Context of Model  Usage	     20
         2.  Selection of a Model	     22

     B.  Planning the Model Evaluation Study 	     23

         1.  Determination of the Need for Model  Evaluation.  ...     23
         2.  Development of a Conceptual Evaluation  Plan  	     26
         3.  Examination of Existing Data  Bases	     37
         4.  Assessment of Data Needs	     43
         5.  Assessment of the Need To Collect Additional
             Data	     64
         6.  Specification of Performance  Standards  and
             Measures	     65

     C.  Identification of the Scope and Requirements  of
         Model  Evaluation	     65

     D.  Performance of the Model Evaluation 	     67

         1.  Adapting the Model for Use in the  Study	     67
         2.  Gathering, Assembling, and Formatting  the
             Required Data	     71
         3.  Exercising the Model	     72

-------
         4.   Analyzing the Results of the Evaluation	    72
         5.   Assessing the Need To Perform Further
             Evaluation	    75
         6.   Evaluating the Adequacy of the Model	    75
     E.   Evaluation for Screening Applications  	    76
     F.   Perspective	    77
  V  RECOMMENDATIONS 	    79
     A.   Institutional Needs 	    79
     B.   Areas for Technical Development 	    81
     C.   Documents To Be Compiled	    82
     D.   Summary	    83
APPENDIX:  SUMMARY OF PREVIOUS EVALUATION STUDIES	    85
REFERENCES	   141

-------
                            ILLUSTRATIONS
IV-1  Simplified Flow Diagram of Tasks in the Model
      Evaluation Process ......................      18

IV-2  Comparison of Calculated Ground Trajectory with
      Observed Tetroon Trajectory for the Los Angeles Basin .....      49

IV-3  Surface CO Concentrations Calculated for the
      San Francisco Bay Area at 1400 on 26 July 1973 Using
      the LIRAQ Air Quality Model at Different Grid Sizes ......      69

 A-1  A Typical Cumulative Frequency Distribution of
      24-Hour-Average SO2 Concentrations Measured and
      Calculated Using the CRSTER Model ...............    104

 A-2  J. M. Stuart Plant Cumulative Frequency Distribution
      for One-Hour-Average SO2 Concentrations at All Stations ....    105

 A-3  Measured SF6 Concentrations and Predictions Using PTMTP
      for Stability Categories B Through F .............    108

 A-4  Scatter Diagram Comparing Observed and Calculated Con-
      centrations Obtained from the Standard Gaussian Plume
      Model Under Slightly Unstable Conditions (C Stability)
      for the High Source at the Western Kraft Corporation .....    109

 A-5  Comparison of CO Measurements and Predictions from Four
      Models Examined by Maldonado and Bullin ............    114

 A-6  Time Variations over All Stations of Observed One-
      Hour-Average Ozone Concentrations and the Corres-
      ponding Predictions Obtained from the SAI Airshed Model ....    120

 A-7  Predictions of the SAI Airshed Model Compared with
      Estimates of Instrument Errors for Ozone (Data for
      3 Days, 9 Stations, Daylight Hours) ..............    121

-------
A-8   Comparison of Measured and Predicted 25-Day-Averaged
      SO2 Concentrations (1-26 February 1965)	    128

A-9   Comparison of Predictions and Measurements of SO2 Con-
      centrations in St. Louis Reported by Shir and Shieh	    130

A-10  Comparison of Predicted and Measured Two-Hour-Averaged
      SO2 Concentrations at Each Monitoring Station in
      St. Louis	    131

A-11  Comparison of Predicted and Measured Two-Hour-Averaged
      Frequency Distributions of SO2 Concentrations According
      to Wind Sectors	    132

A-12  Comparison of Frequency Distributions of Predicted
      and Measured Two-Hour-Average SO2 Concentrations According
      to Ambient Temperature Range 	    133

A-13  Comparison of Frequency Distributions of Predicted and
      Measured Two-Hour-Average SO2 Concentrations for
      Three Stations 	    134

A-14  Statistical Quantities Calculated for Nine Air Quality
      Simulation Models	    138

-------
                               TABLES



 IV-1    Possible Data Requirements of Air Quality Models ....      30

 IV-2    Possible Hardware Requirements for Data Collection ...      36

 IV-3    Number and Type of Daily Meteorological Measurements
         in 15 U.S. Cities in 1977 ................      41

 IV-4    Methods Used for the Preparation of Emissions
         Inventories in 15 U.S. Cities in 1977 ..........      42

 IV-5    Number of Stations Performing Routine Air Quality
         Sampling in 15 Major Cities in the United States ....      45

 IV-6    Estimated Meteorological Data Requirements for
         Evaluation of a Large Air Quality Simulation
         Model in a Hypothetical Urban Area ...........      51

 IV-7    Levels of Detail in Data Used as Input to Photochemical
         Air Quality Simulation Models ..............      58

 IV-8    Values and Sources of Errors in Meteorological
         Measurements ......................      61

 IV-9    Uncertainties in Measurements of Pollutant Con-
         centrations .......................      62

IV-10    Model Performance Measures and Standards ........      66

  A-1    Models Considered in Model Evaluation Studies
         Described in This Appendix ...............      87

  A-2    Statistical Summary of Observed Two-Hour SO2
         Concentrations (in µg/m³) for St. Louis Stations
         and Concentrations Calculated Using a Gaussian
         Plume Model .......................      91

  A-3    Statistical Summary of Observed One-Hour SO2
         Concentrations (in µg/m³) for Chicago Stations
         and Concentrations Calculated Using a Gaussian
         Plume Model .......................      92

  A-4    Comparison of Error Distributions for Two-Hourly
         St. Louis and Hourly Chicago Validation Calculations ..      93

-------
 A-5   Observed, Predicted, and Observed Minus Pre-
       dicted Concentrations by Wind Speed Class for
       St. Louis Data	      94

 A-6   Observed, Predicted, and Observed Minus Pre-
       dicted Concentrations by Wind Speed Class for
       Chicago Data	      94

 A-7   Statistics for Long Term Average Predictions
       of SO2	      95

 A-8   Summary of Accuracy of Sampling Intervals for
       Estimating Distribution of Predicted Concentra-
       tions over a Season	      96

 A-9   Comparisons of Measured and Predicted One-Hour-
       Average SO2 Concentrations in New York City	      99

A-10   Model Comparisons for Annual Mean SO2 Concen-
       trations Using New York City Data	     100

A-11   Model Comparisons for Annual Mean Particulate
       Concentrations Using New York Data 	     101

A-12   Correlation Between Observed SF6 Concentrations
       and Concentrations Calculated Using PTMTP, by
       Stability Class	     108

A-13   Parameter Combinations Employed in the Prahm and
       Christensen Study and a Summary of the Results 	     111

A-14   Statistical Results from the Maldonado and
       Bullin Study 	     113

A-15   Occurrence of Corresponding Levels of Predicted
       and Observed Ozone	     122

A-16   Statistical Measures Used in Verification of
       the LIRAQ Photochemical Model   	     124

A-17   Statistics for the LIRAQ Model Evaluation Study	     125

-------
                               EXHIBITS
IV-1   Air Quality Monitoring Required by the EPA for
       Various Pollutants 	     44

IV-2   Performance Specifications for Automated
       Measurement Methods	     63

-------
                          I    INTRODUCTION


     Over the past several  years there has been a significant increase in
the use of air quality models.  This increase has stemmed primarily from
governmental  regulations, notably the Clean Air Act of 1967 and amendments
to it in 1970 and 1977.  Air  quality models are potentially applicable to
the evaluation of State Implementation Plans, Prevention of Significant
Deterioration, New Source Reviews, and Indirect Source Reviews.  In addi-
tion, they can be used in the preparation of Environmental Impact Statements
and in support of various litigation issues.  In these applications the
pollutants of main interest are those for which air quality standards have
been developed, including particulates, SO2, CO, oxidants, and NO2.  Other
pollutants, such as hydrocarbons and NO, must be considered in some appli-
cations because they are involved in the formation of pollutants regulated
by standards.  The issues and the pollutants mentioned above pose many dif-
ferent requirements for air quality modeling, and the standards themselves
add further requirements.   They necessitate pollutant concentration predic-
tions over averaging periods  ranging from one hour to one year.  Many models
have been developed to deal with the different issues, pollutants, and stand-
ards in different physical  settings, but no single model is suitable for all
applications.

     Given this variety of  models, there is a need for an adequate under-
standing of the performance of an air quality model, both relative to that
of other models and in an absolute sense in a particular application.  This
understanding is required to  enable the user to choose an appropriate model
and to ensure that it performs adequately in the intended application.  In
the past, model performance was usually evaluated by the model developers
in the course of an application or a special evaluation study.  As model
evaluation becomes an integral part of air quality studies and as the number
of these studies increases, there will be an increased need for uniform
procedures for conducting model evaluation studies.   At the present  time,
uniform procedures do not exist.

A.   PURPOSE OF THIS STUDY

     Stimulated by the need for an objective framework for conducting
evaluations of model performance, as expressed in the "Report to the U.S.
EPA of the Specialists' Conference on the EPA Modeling Guideline" (Argonne,
1976), the Environmental Protection Agency (EPA) requested that Systems
Applications, Incorporated carry out a study under Work Assignment 1 of
Contract 68-02-2593.  This work assignment calls for the development of
evaluation procedures for both short-term and long-term models, for  both
area-wide and site-specific applications.  In a companion study under  the
same work assignment, model performance measures and standards are being
examined to aid in assessing the relative accuracy and usability of  any
model for various applications (Hayes, 1979).

     The objective of this study is to lay the groundwork for a set  of
guidelines that can be implemented to aid in assessing performance
characteristics of air quality models.   In carrying out this work,
we have:

     >  Reviewed previous model evaluation studies.
     >  Developed a general procedural framework for performing
        an evaluation study.
     >  Provided specific guidance, to the extent possible, with
        respect to the work required in each step of the perfor-
        mance evaluation procedure.
     >  Examined the role appropriate organizations should play in
        overseeing model  evaluation studies.
     >  Identified gaps in present knowledge that limited our ability
        to provide more detailed guidance in this report, and
        presented recommendations for further work that will help to
        fill those gaps.

-------
Because model evaluation has received relatively little systematic attention
to date, we were able to identify several  areas  ripe for future investigation.
The performance of these suggested studies will  be essential  to the success
of the guidelines presented herein.

     Some of the terminology used to describe the model evaluation process
may require clarification.   The phrases "model validation" and "model
verification" are frequently used to designate the process of comparing model
predictions with suitable observations.  In this report, however, we avoid the
terms "validation" and "verification" when referring to the evaluation of
a model.  Instead, we take  a more general  view and use the phrase "model
performance evaluation," which we believe  to be  more indicative of the
process that this report describes.   The "validity"  of a model  is taken  to
be a concept defining how well model predictions would agree  with the
appropriate observations given a  perfect specification of model  inputs.   That
is, validity relates to the inherent quality of  the  model  formulation.  The
term "verification" is reserved to describe a successful  (or  positive) out-
come of the model evaluation process.

B.   STRUCTURE OF THIS REPORT

     Chapter II gives a brief overview of  previous model  performance evalua-
tion studies;  detailed descriptions  of the results of these studies are
given in the appendix.  The objective of Chapter II  is to provide some per-
spective on how model performance has been assessed  in the past.   In partic-
ular, we attempt to identify some of the weaknesses  and inadequacies of
previous efforts so that such shortcomings can be avoided in  the  proposed
guidelines.  The models employed  in  these  investigations range  from rela-
tively simple  Gaussian plume models  to comprehensive numerical  models suit-
able for studying photochemical air  pollution in urban areas.

     Based on  our review of previous studies and further consideration of
the model  evaluation process,  we  identify  in Chapter III some of the key
questions  and issues that need to be addressed in the model evaluation guide-
lines.  This discussion is  intended  to set the stage for the  more detailed
exposition of the guidelines given in Chapter IV.

-------
     Chapter IV is devoted to the development of a step-by-step procedure
for carrying out a complete model evaluation study.   This procedure is
segmented into four sequential phases:  problem definition and model  selec-
tion; preliminary planning efforts; formal definition of the evaluation
plan; and implementation of the evaluation study.   To the extent possible,
we discuss the elements of each step in each phase,  giving special  consider-
ation to the influence that a particular model or type of application might
have on how each step is performed.

     Chapter V concludes this report with a set of recommendations  for
future conduct of model evaluation studies.  Specifically, we identify:

     >  A set of institutional recommendations toward a possible
        organization for ensuring proper evaluation procedures in
        modeling work.
     >  A series of recommended research studies necessary to
        assemble vital background information for model evalua-
        tion.
     >  A list of guideline documents that should be available
        to those model users who have a need to evaluate a model.

     It is our belief that implementation of these recommendations  in con-
junction with the guidelines presented in Chapter IV would yield a  complete
protocol for structuring a model performance evaluation study.

-------
         II    REVIEW OF PREVIOUS  MODEL  EVALUATION  EFFORTS

     The need for evaluating the  performance of air quality models is well
recognized by model  developers, and evaluation studies are often undertaken
as a part of model development.   In such an evaluation, the goal is to
determine whether the given  model  adequately reproduces representative ambient
air pollutant concentrations for  the  set(s) of conditions for which it was
designed.  "Representative"  refers to concentration measurements that are
commensurate with the model's predictions in terms of both temporal and
spatial scales.   The basic data requirements for a model evaluation study
are twofold:  the input  data necessary to exercise the model and the monitor-
ing data appropriate for checking the model's output.  In this chapter, we
review previous  model  evaluation  efforts, concentrating on the methods used.
The appendix describes the results of these studies in detail.

     Two aspects of air  pollutant concentration variations should be accounted
for by an air quality model.   First,  it should correctly predict the amounts
of pollutants that are formed (as  indicated by pollutant concentrations).
Second, it should account satisfactorily, as far as its formulation permits,
for both spatial and temporal  trends  in the pollutant concentration fields.
The accuracy with which  these trends  should be reflected depends somewhat
on the contemplated application of the model.  If it is to be used for asses-
sing population  exposure to  peak  pollutant impacts, accurate location and
timing of the predicted  peak would be desirable to take diurnal population
shifts into account.  In contrast, the model may be used to compute the
peak impact from a point source or in an urban airshed to judge compliance
with an air quality standard.   In that case, some offset in space and time
may be tolerable provided that the concentration itself is accurately
predicted.
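
     The distinction drawn above between peak magnitude and peak timing
can be made concrete with a short sketch.  The function and field names
below are hypothetical and do not come from this report; the sketch assumes
two equal-length series of hourly concentrations for one station and one
day, and it reports the ratio of the peaks and the timing offset separately
so that each can be weighed according to the intended application.

```python
from dataclasses import dataclass


@dataclass
class PeakComparison:
    magnitude_ratio: float   # predicted peak divided by observed peak
    time_offset_hours: int   # predicted peak hour minus observed peak hour


def compare_peaks(observed, predicted):
    """Compare the peaks of two equal-length hourly concentration series.

    Illustrative only: the names and the choice of a simple ratio are
    assumptions, not measures prescribed by the report.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    # Index (hour) of the first occurrence of the maximum in each series.
    obs_hour = max(range(len(observed)), key=observed.__getitem__)
    pred_hour = max(range(len(predicted)), key=predicted.__getitem__)
    if observed[obs_hour] <= 0:
        raise ValueError("observed peak must be positive to form a ratio")
    return PeakComparison(
        magnitude_ratio=predicted[pred_hour] / observed[obs_hour],
        time_offset_hours=pred_hour - obs_hour,
    )
```

For a compliance calculation, the magnitude ratio would carry most of the
weight; for a population exposure calculation, the timing offset would
matter as well.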

-------
     Discrepancies between observed pollutant concentrations and those
predicted by a model can arise from several sources:

     >   Inaccuracy of  the data input to the model.
     >   Inadequate or  insufficient information to define or
         specify model  inputs  properly.
     >   Inadequacies and simplifying assumptions made in the
         model's formulation.
     >   Inaccuracy (due to both  measurement error and nonrep-
         resentativeness) of air  quality data used to evaluate
         model predictions.
     >   Inaccuracies introduced by numerical solution procedures.

In a well-planned evaluation study, all data inadequacies are minimized so
that model inadequacies can be properly identified.

     In  the studies considered in this review, the models fell into two
basic categories:  Gaussian and numerical.  Evaluation procedures  were limited
to comparisons between measured and calculated pollutant concentrations.
In some  studies, these comparisons were made qualitatively using visual tech-
niques such as scatter plots; in others comparisons  were quantitative using
mathematical or statistical techniques to develop numerical figures of merit
for model evaluation.
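
     As an illustration of what a numerical figure of merit might look
like, the following sketch computes three commonly used paired statistics:
mean bias, root-mean-square error, and the correlation coefficient.  These
particular measures are our choice for the example and are not a list taken
from the studies reviewed; the function name and the dictionary keys are
hypothetical.

```python
import math


def paired_statistics(observed, predicted):
    """Return mean bias, RMSE, and Pearson correlation for paired values.

    Bias is defined here as predicted minus observed (an assumption; some
    studies use the opposite sign convention).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need at least two paired values")
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_bias = sum(residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Correlation is undefined when either series is constant.
    if var_o == 0 or var_p == 0:
        correlation = float("nan")
    else:
        correlation = cov / math.sqrt(var_o * var_p)
    return {"mean_bias": mean_bias, "rmse": rmse, "correlation": correlation}
```

Note that a high correlation can coexist with a large bias, which is one
reason a single statistic is rarely sufficient to characterize performance.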

     In  our review, we did not attempt to include all of the evaluation
studies  carried out to date; rather, we examined some representative examples
and the  methods used by different workers to evaluate air quality  models.   The
results  of these previous efforts can be used to guide our development of a
comprehensive and logically constructed model performance evaluation procedure.

     Numerous air quality models are based on the Gaussian formulation.  These
models have been developed to cover a wide range of applications,  from single
sources  to complete urban areas, and for averaging-times from one  hour to
one year.  Such models are widely used, particularly for assessing pollutant
impacts  from point sources.
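
     For readers unfamiliar with the Gaussian formulation, the sketch
below evaluates the standard textbook expression for a continuous elevated
point source with total reflection at the ground.  The expression is not
reproduced in this report, so the form used here, the function name, and
all parameter names are assumptions made for illustration.  The dispersion
parameters sigma_y and sigma_z are supplied by the caller for the downwind
distance of interest; no particular stability scheme is implied.

```python
import math


def gaussian_plume(q, u, sigma_y, sigma_z, y, z, h):
    """Concentration from a continuous point source (textbook form).

    q        emission rate (mass per second)
    u        wind speed at release height (m/s), must be positive
    sigma_y  crosswind dispersion parameter at the receptor distance (m)
    sigma_z  vertical dispersion parameter at the receptor distance (m)
    y, z     crosswind offset and height of the receptor (m)
    h        effective release height, stack height plus plume rise (m)
    """
    if u <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("u, sigma_y, and sigma_z must be positive")
    crosswind = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # The second exponential is the image source that models ground reflection.
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical
```

Because the effective release height h enters the vertical term directly,
two models that share this expression can still differ appreciably if they
estimate plume rise differently.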

-------
     Since all Gaussian models are based on the same dispersion theory,
they should be equally accurate in terms of that aspect.  However, other
model components, such as the treatment of plume rise, introduce differ-
ences among models.  A comparison of Gaussian model performance, classified
by model type (e.g., single-source models, multisource models, climato-
logical models), would provide valuable assistance in model  selection.
Unfortunately, no such comparison is available because no protocol exists
for performing it.  The evaluations of Gaussian models that  have been car-
ried out reflect the particular needs of the evaluators rather than the
general needs of model users.  The evaluations have varied widely in
effectiveness, ranging from comparative studies of three or four models to brief
studies of a single model under a restricted set of conditions.   Both
empirical and quantitative statistical  methods have been used, but the  lack
of standardized methods precludes comparisons of study results.   The model
comparisons that can be derived from studies of more than one  model  are lim-
ited by the data base as  well as by the performance measures used; generally
that data base has been limited to available data.   As we discuss later in
this report, available data are usually unsatisfactory for model  evaluation.
Thus, data bases collected for the specific purpose of the given study  are
highly desirable.

     Evaluation studies of numerical  models generally suffer the same prob-
lems as the Gaussian model studies:  noncomparable  methodologies and inade-
quate data bases.  Again, no absolute indicators of the performance of
particular models exist because of the  lack of appropriate performance
standards against which to measure them.   As in the Gaussian model case,
most evaluation studies were undertaken using a preexisting  data base that
is inadequate for model evaluation.

     Although many models have been subject to evaluative studies, no air
quality model has been adequately verified.  The main reason for this failure
is the lack of standards against which to measure model performance.  Moreover,
no systematic procedures have been developed for model evaluation or
interpretation of evaluation results.  Consequently, studies cannot be
compared with one another or assessed in absolute terms.

     What little work has been carried out to investigate model evaluation
methodologies has centered on assessing the statistical aspects of model
performance measures.

     Examination of the model evaluation studies considered in this chapter
and in the appendix reveals that many of these efforts are undertaken by
the model developer.  Although model evaluation is an important and proper
adjunct to model development, independent evaluation of air quality models
may provide a more objective basis for discrimination.  Also, it seems
reasonable to require that the validity of a model be established according
to some standard guidelines before the model is used.  Proper verification
of an air quality model will provide confidence that the model predictions
are a reasonable representation of the possible effect of a particular
future emissions pattern.

     Later in this report, we discuss the contents of such a set of guidelines,
dealing first with the general issues that must be addressed in a
model evaluation effort and then with more specific recommendations regarding
the components of such a study.  An indispensable corollary to the formula-
tion of guidelines for model evaluation studies is the establishment of
model performance measures and standards.  This problem is being addressed
in a concurrent study and is the subject of a companion volume (Hayes, 1979).

-------
                III   KEY ISSUES IN  THE  EVALUATION  OF
                    AIR  QUALITY MODEL PERFORMANCE


     Our review of previous studies, discussed in Chapter II, identified
a number of issues that should be addressed in a model performance
evaluation study.  The purpose of this chapter is to delineate these issues
and to provide the reader with an introductory perspective regarding their
relevance.  These issues form the basis for the step-by-step model
evaluation procedure presented in the next chapter.

     Model performance is influenced by two  factors:  First, the model
treatments of various physical and  chemical  phenomena usually involve
a number  of approximations;  second, the information provided as inputs
is often  subject to considerable  uncertainty.  To illustrate the first
point, we consider photochemical air quality simulation models.   Seinfeld
(1977) shows that the governing equation for such models can be written:
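\[
\frac{\partial \langle \bar{c}_i \rangle}{\partial t}
+ \bar{u}\,\frac{\partial \langle \bar{c}_i \rangle}{\partial x}
+ \bar{v}\,\frac{\partial \langle \bar{c}_i \rangle}{\partial y}
+ \bar{w}\,\frac{\partial \langle \bar{c}_i \rangle}{\partial z}
= \frac{\partial}{\partial x}\left( K_H \frac{\partial \langle \bar{c}_i \rangle}{\partial x} \right)
+ \frac{\partial}{\partial y}\left( K_H \frac{\partial \langle \bar{c}_i \rangle}{\partial y} \right)
+ \frac{\partial}{\partial z}\left( K_V \frac{\partial \langle \bar{c}_i \rangle}{\partial z} \right)
+ R_i + S_i
\tag{1}
\]

where

          $c_i$ = concentration of pollutant i,
     $K_H$, $K_V$ = horizontal and vertical diffusivities, respectively,
     $u$, $v$, $w$ = wind velocity components in the x, y, and z directions,
               respectively,
          $R_i$ = net rate of production of species i by chemical reactions,
          $S_i$ = net rate of emissions of species i.

The overbar indicates the result of averaging the conservation of mass
equation over an ensemble of flows, and the angle brackets indicate the result
of averaging over a volume corresponding to the size of a grid cell (as
needed to obtain a numerical solution).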

     To derive Eq.  (1) from the more general  mass conservation relationship,
the following assumptions are necessary:

     >  Molecular diffusion is negligible when compared with
        advective transport.
     >  Turbulent transport can be represented by the eddy-
        diffusivity concept.
     >  The influence of turbulent and subgrid-scale concentra-
        tion fluctuations on chemical reaction rates can be
        neglected.

The neglect of molecular diffusion is generally acknowledged to be satis-
factory in atmospheric applications (Seinfeld, 1977).  Thus, the primary
sources of inherent invalidity in Eq. (1) are the treatments (or neglect)
of turbulence effects on transport and on reaction phenomena.
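
     To give a feel for how an equation of this type is advanced
numerically, the following sketch takes one explicit time step of a
one-dimensional, non-reacting, source-free reduction of the governing
equation (constant wind u and constant eddy diffusivity k).  This is a
deliberately simplified illustration under our own assumptions (periodic
domain, first-order upwind advection, central-difference diffusion); it is
not the scheme of any model discussed in this report, and the function
name is hypothetical.

```python
def step_advection_diffusion(c, u, k, dx, dt):
    """Advance dc/dt + u dc/dx = k d2c/dx2 by one explicit time step.

    c   list of cell concentrations on a periodic 1-D grid
    u   wind speed (must be >= 0 for the upwind difference used here)
    k   eddy diffusivity (>= 0)
    dx  grid spacing, dt time step (both > 0)
    """
    if u < 0 or k < 0 or dx <= 0 or dt <= 0:
        raise ValueError("require u >= 0, k >= 0, dx > 0, dt > 0")
    # Sufficient condition for this scheme to stay stable and non-negative.
    if u * dt / dx + 2 * k * dt / dx ** 2 > 1:
        raise ValueError("time step too large for this explicit scheme")
    n = len(c)
    new = []
    for i in range(n):
        left = c[i - 1]            # index -1 wraps around (periodic domain)
        right = c[(i + 1) % n]
        advection = -u * (c[i] - left) / dx
        diffusion = k * (right - 2 * c[i] + left) / dx ** 2
        new.append(c[i] + dt * (advection + diffusion))
    return new
```

Even with k set to zero, the upwind difference spreads an initially sharp
pulse over neighboring cells.  This artificial smearing is one example of
an error introduced by the numerical solution procedure rather than by the
model formulation or its inputs.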

     Model performance  is also affected  by the error associated with
each of the model parameterizations and  inputs.   In photochemical models
based  on  Eq.  (1), the sources of  these  uncertainties include
(Seinfeld, 1977):

     >  Wind velocity components.
        -  Uncertainties in available wind speed and direction
           measurements.
        -  Inadequate number or nonrepresentative locations of
           measurement sites (especially aloft).
        -  Approximations associated with wind field analysis
           techniques used to prepare model inputs.
     >  Source emissions function.
        -  Inaccurate specification of source location.
        -  Inaccurate estimation of plume rise.

        -  Errors in emissions factors for stationary sources.
        -  Inadequate representation of actual driving charac-
           teristics in the federal mobile emissions test
           procedures.
        -  Errors in emission factors and vehicle miles traveled
           for mobile sources.
        -  Inaccurate characterization of temporal  variations.
     >  Chemical reaction mechanism.
        -  Omission or inadequate characterization  of chemical
           reaction steps.
        -  Uncertainties in measurement or specification of
           reaction rate constants.
        -  Inaccurate characterization of temperature effects.
        -  Uncertainties associated with categorizing species  into
           reactive groups (such as paraffins, olefins, aromatics,
           etc.).
     >  Initial  and boundary conditions.
        -  Inadequate spatial  characterization of the concentra-
           tion  field on the upwind boundary of the region.
        -  Inadequate characterization of concentrations aloft.
     >  Numerical solution methodology.
        -  Computational errors associated with the use of finite
           difference methods.
        -  Computational errors associated with other numerical
           techniques.

     In addition, we note that the air quality data employed to  judge model
performance are  subject to error.  Instrumental errors can occur, or the
spatial character of the measurement may not be directly comparable to that
of the model  predictions (e.g., a point measurement may be compared to a
spatially averaged model prediction).   Thus, there  are a number  of sources of
error in even the most sophisticated air quality models.  Under nonideal
conditions, the  errors in predictions produced by relatively simple models
(e.g., Gaussian) can be greater than those generated by more sophisticated
techniques that  attempt to provide a more adequate representation of the
nonideal conditions or processes.  The purpose of the model  evaluation
study is to test the entire model, including formulation, available  aero-
metric and emissions data, and model input preparation procedures, to obtain
some quantitative measure of model performance.  Hilst (1978)  suggests  in
a recent report the additional need to evaluate the performance of specific
components of a model.  For example, special studies could be  performed
to assess the plume rise algorithm included in a Gaussian point source  model.
These types of investigations will be especially important in  the initial
applications of a model because they may aid in the identification and
rectification of inadequate treatments of atmospheric processes.

     A model evaluation study consists of several tasks,  many  of which  are
interrelated.  As the project team undertakes these tasks, a number  of
questions will arise.  Some of these issues can be resolved by establishing
performance evaluation guidelines, but others will have to be  addressed by
the user in light of the particular situation.  Among the most important
questions that may arise in an evaluative study are the following:

     >  How does the user select a suitable model?
     >  Under what circumstances must the user undertake a model
        evaluation study?
     >  What are the important elements of an evaluative study and
        how should it be planned?
     >  What are the resources available to the study team?
     >  What sources of model evaluation expertise are available
        to the user?
     >  What are the aerometric and emissions data requirements
        of the model?
     >  What are the appropriate measures and standards of model
        performance?
     >  What are the appropriate analyses of model results?
     >  When must the user mount a supplemental field measure-
        ment program?
     >  What should the user do if the performance of the model is
        found to be unsatisfactory?

 In the remainder of this chapter we identify additional  issues associated
 with these questions.

     The identification of a suitable model  for calculating air quality
 levels will be one of the first tasks to be  performed by a model  evalu-
 ation study team.  To aid the inexperienced  user,  appropriate model
 selection guidelines that address the various types of models and their
 range of applicability (e.g., pollutants, averaging times, and chemical
 effects) should be published.  The user will need to give careful
 consideration to model selection to avoid, as far  as possible, having  to
 make extensive modifications to the model or even  having to select
 another model.

     To facilitate cost-effective and timely use of models, consideration
 should be given to identifying situations requiring model  performance  eval-
 uation.   For example, must the user always provide  direct  evidence of  model
 performance?  What is the value of previous  evaluative experience, and when
 can it be cited by the user in lieu  of conducting a new  study? What assis-
 tance can the EPA or other model evaluation  experts provide to the user in
making these important decisions?

     Once a decision to proceed with the evaluative effort has been reached,
the project team must develop a comprehensive plan  for the study.  The guide-
lines should identify the various elements of such  a study and detail  what
is required in each step.

     Three types of resources will  be required by the study team:  financial,
hardware (both aerometric measuring instruments and computers), and manpower.
Key issues include the amount of funding required for the  evaluative study
and possible sources of financial assistance, the number of air quality and
meteorological instruments and other associated hardware (in addition  to
that available from the routine monitoring network) required in a  supplemental
field monitoring program, the computer hardware needed to  carry out the model
calculations, and the types of technical expertise  that  the study  team should
include.  It may be important to identify compromises that can be made to
reconcile the desired resources with those actually available.

     To estimate the required resources, the investigating team must have
a clear understanding of the model's data requirements and their relation
to the various influences on model performance, such as region size, terrain
characteristics, emissions patterns, meteorological conditions, etc.  If the
available aerometric and emissions data base is not adequate to characterize
some of the model inputs, the user will need some guidance in identifying
alternative means for specifying these parameters.

     Perhaps the best means for reconciling the data needs of a model with
the available data base is to undertake a special field measurement program
to provide aerometric and emissions data specifically for the model evalua-
tion and subsequent application.  Because the costs of such a program can be
considerable, the study team should have sufficient information to enable
them to effectively plan and implement such an effort.  To assess model
performance and results, measures and standards should be established that
take into account the types of available models and their application.  These
measures and standards should be readily computed from model results.  Their
publication will help users to identify the pertinent measures and standards
and to tailor the evaluation program to ensure that the standards are met.
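
     As an illustration of measures that are "readily computed from model
results," the following minimal sketch compares paired predicted and observed
concentrations.  The three measures shown (mean bias, root-mean-square error,
and ratio of peak values), the sample concentrations, and the numerical
standards are all assumptions chosen for illustration; this chapter does not
prescribe any of them.

```python
import math

def performance_measures(predicted, observed):
    """Return simple comparison measures for two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(predicted)
    residuals = [p - o for p, o in zip(predicted, observed)]
    mean_bias = sum(residuals) / n                        # positive: overprediction
    rmse = math.sqrt(sum(r * r for r in residuals) / n)   # overall size of the error
    peak_ratio = max(predicted) / max(observed)           # assumes a nonzero observed peak
    return {"mean_bias": mean_bias, "rmse": rmse, "peak_ratio": peak_ratio}

# Hypothetical hourly concentrations (ppm) at one monitoring station.
predicted = [0.04, 0.08, 0.12, 0.10]
observed = [0.05, 0.07, 0.10, 0.10]
measures = performance_measures(predicted, observed)

# Hypothetical performance standards (ppm); real values would come from guidelines.
standards = {"mean_bias": 0.02, "rmse": 0.03}
meets_standards = all(abs(measures[name]) <= limit
                      for name, limit in standards.items())
print(measures, meets_standards)
```

Keeping the measures in one small function makes it straightforward to apply
the same test to every station and every simulated day.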

     Given a set of model predictions, the user must perform various analyses
to assess model performance.  An important element of the model evaluation
guideline should be the identification of analyses that will help the user
to characterize model performance as well as to uncover any significant
biases or flaws in either the model formulation or its inputs.

     The achievement of satisfactory results from the analyses cited above
will enable the user to proceed with the intended applications of the model.
However, if the performance is found to be unacceptable, the study team will
need guidance in rectifying the situation.  For example, they might collect
 additional  supplemental  field data  with which  to prepare revised inputs,
 they might attempt to improve one or more of the technical components of
 the model,  or they might even select a  different model  and  assess its
 performance.   An important  consideration that  should be addressed in the
 guidelines  is the circumstances  (if any) in  which the user  may proceed
 with the  model applications even  though the  performance does not  meet  the
 standards.

     The final important problem we  consider here is whether the institu-
tional mechanisms exist that are needed  to  ensure that the many users  of air
quality models carry out adequate performance evaluation studies.  Problems
of particular concern include selecting  an  organization to oversee model
evaluation efforts and determining what  services should be provided to the
model user community.  As the number of  model performance evaluations
increases, especially those undertaken by relatively inexperienced model
users, many problems and questions will  undoubtedly arise.  There will  be
a need for a group of air quality modeling  specialists to provide the
requisite assistance for the users.

      In this  chapter we  have  raised  some of  the more important issues  and
 questions of  model performance evaluation.   The next chapter discusses
 these points  in more detail.

      IV   A  PROCEDURE FOR EVALUATING MODEL PERFORMANCE


     Whenever an air quality model  is to be used in a specific application,
the questions of the model's suitability and adequacy arise.  Therefore, an
important part of using an air quality model is an evaluation of that model's
ability to provide the desired results.  We discussed the general issues
involved in model evaluation in the previous chapter; in this chapter we
address these issues in more specific terms.  This discussion will indicate
how the various elements of the model evaluation process might vary depending
on the particular type of model and its intended application.

     Because proper evaluation of an air quality model involves many indivi-
dual steps, we begin this chapter with an overview of the process.  This
overview describes the relationships between the steps, and it identifies
the major steps and the subtasks involved.  Following the overview, we dis-
cuss in detail  the entire procedure.

     Figure IV-1  presents a simplified flow diagram for the tasks included
in a model evaluation.   As this figure indicates, we divide the total effort
into phases.   Before the model  evaluation proper is started, the problem
requiring the use of an air quality model must be defined.  The characteris-
tics of the problem are then examined to guide in choosing a particular
model.  Model selection is not part of an evaluation study per se, but it
does affect the planning of the evaluation study.  The complete model evaluation
study thus consists of four phases.  The first phase is model selection, which
is followed by a series of planning tasks to identify the features of the
study.  Following these activities is a definition phase, in which all of
the study requirements and attributes are formally assembled into a plan for
conducting the model performance evaluation.  Finally, the evaluation itself
is carried out.  This fourth phase of the total study includes gathering data,
making computer runs, and evaluating the results.

          FIGURE IV-1.   SIMPLIFIED FLOW DIAGRAM OF TASKS IN THE MODEL EVALUATION PROCESS

     The elements of a model  evaluation,  then,  can  be divided into the
following four tasks:

     >  Specify the problem and choose a  model  (not a part  of the
        evaluation study proper).
     >  Plan the model evaluation  study.
     >  Identify the scope and requirements  of  model  evaluation.
     >  Perform the model  evaluation  (i.e.,  execute the model  and
        assess the results).

     Each of those four tasks  can  be  further divided  into subtasks.   Problem
specification consists of subtasks that are  primarily involved with assess-
ment of the problem posed and  selection of a model  to aid in  its solution.
These subtasks are:

     >  Set the context of model usage
     >  Choose a model.

Further information on this topic  is  contained  in the EPA (1978a)  report.

     Once the problem has been defined and the  model  to be  used  has been
chosen, one must determine if a formal model evaluation is  necessary.  If
it is, the elements of the evaluation study  must be identified,  the data
requirements defined, and appropriate model  performance standards  and
measures chosen.  Much of that work can proceed concurrently.  For instance,
existing data can be gathered at the same time  that the complete data
requirements are being assessed.   Of course, the need for additional  data
collection can be determined only  after data requirements and availability
are known.  The necessary subtasks in this phase of the evaluation study are:

     >  Decide if model evaluation is necessary
     >  Develop a conceptual  evaluation plan
     >  Examine existing data bases
     >  Assess data needs
     >  Assess the need to collect more data
     >  Specify performance standards and measures.
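
     The remark that some planning subtasks can proceed concurrently while
others must wait can be pictured as a small dependency graph.  In the sketch
below, only the rule that the need for more data follows the examination of
existing data bases and the assessment of data needs is taken from the text;
every other dependency is an assumption made for illustration, and the short
task names are ours.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each subtask maps to the subtasks that must be finished before it starts.
# Only the prerequisites of "assess need for more data" come from the text;
# the remaining links are assumptions made for this sketch.
planning = {
    "decide if evaluation is necessary": set(),
    "develop conceptual plan": {"decide if evaluation is necessary"},
    "examine existing data bases": {"develop conceptual plan"},
    "assess data needs": {"develop conceptual plan"},
    "assess need for more data": {"examine existing data bases",
                                  "assess data needs"},
    "specify standards and measures": {"develop conceptual plan"},
}

def concurrent_stages(graph):
    """Group subtasks into stages; subtasks in one stage may run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = concurrent_stages(planning)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

Under these assumed dependencies, examining existing data and assessing data
needs fall in the same stage, and the decision about further data collection
falls in the stage after them.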

      When  these  subtasks  have  been  completed, the scope and requirements
 of the  evaluation  study should be formally  laid out.  In that third task
 no new  information is  developed, but  all of the considerations identified
 earlier are  assembled  into  a plan for the evaluation.  The plan is then
 put into effect.   Required  data are gathered, the model is adapted for
 usage in the study area,  and pollutant concentrations are predicted and
 evaluated  in light of  air quality data and  performance standards.  The sub-
 tasks included in  this final task are:

      >   Modify the model  to represent emissions and atmospheric
         phenomena  adequately in the study area.
      >   Gather required data.
      >   Assemble and format the data.
      >   Exercise the model.
      >   Analyze  the results.
      >   Assess the need to  perform  further  evaluation studies.
      >   Evaluate the adequacy  of the  model.

 Again,  some  of these activities may proceed concurrently; model adaptation
 can be  carried out at  the same time that data are being collected, for
 example.

      We now  proceed to discuss each of these topics individually.

 A.    PROBLEM SPECIFICATION  AND MODEL  SELECTION

 1.    Setting the Context  of Model Usage

      Setting the context  consists of  defining the basic problem that necessi-
 tates the  use of an air quality model.  An  air quality model is generally
 used when  there  is a requirement for  more detailed or extensive knowledge
 than can be  obtained from direct analysis of air quality data.  Alternatively,
 a model  may  be employed to  generate information about possible ground-level
 concentrations of  pollutants due to emissions from a source not yet built.
 In the latter case, an alternative to mathematical modeling* might be to
 find a similar facility in like surroundings for which an adequate air
 monitoring data base exists.  A study team would be quite fortunate to
 identify such a situation.

* This discussion is limited to mathematical models.  Physical models, such
  as wind tunnels and towing tanks, have yet to be employed in routine air
  pollution applications.

     The first requirement in  defining the context of model usage is  to
delineate the issues to be addressed by the use of the  model.  Such issues
as the formulation and approval  of State Implementation Plans  require the
use of models that can simulate impacts from  many sources  spread over a wide
area.  Calculation of impacts  from single sources or new projects can be
required to satisfy Prevention of Significant Deterioration regulations,
New Source Review, and Offset  Rules.   The preparation of Environmental
Impact Statements or evidence  for use  in litigation can require the use of
different types of models, depending on the specific configuration  of sources
and receptors under review.  For a detailed discussion  of  the  various types
and applications of air quality models, see the companion  report (Hayes,
1979).  However, we point out  that a primary  reason for using  an air  quality
model is the need to quantify  air quality impacts of various emissions pat-
terns in order to assess the results of possible scenarios.

      When the issues requiring model  application have  been defined,  the
characteristics of the situation that  guide model selection must be laid
out.  Such considerations as the pollutants of interest, the applicable
air quality standards, the spatial and temporal  range of the problem, and
the important physical and chemical atmospheric  processes  to be modeled
need to be considered.  The pollutants of concern and the  proposed  model
determine the complete set of  pollutants that must be modeled. Inert
pollutants are regulated in the same  chemical form as they are emitted,
and thus no chemical reaction  processes need  be  included in the model.
Reactive pollutants, by contrast, are  subject to chemical  transformation
in the atmosphere, which the model must take into account.  Thus, if O3 is
to be modeled, it is necessary to model its precursors, namely organic
compounds, NO, and NO2.  The applicable air quality standard obviously depends
on the pollutant considered.   Because  averaging  times in air quality stan-
dards for different pollutants range  from one hour to one  year, the temporal
 range  required  of  the model  is  to  a  large extent  determined by the pollutant
 being  modeled.   The  spatial  range  of the model  application is essentially
 the  domain  of "significant  influence" of all emissions sources in the study
 area.  These ranges  are  dependent  on meteorological and terrain features
 of the area as  well  as on the type of pollutant.  The concentrations of
 primary  (inert)  pollutants  are  generally greatest close to their source,
 and  so a large  modeling  region  would be required  only if many widely distri-
 buted  sources were being modeled or  if  the  contaminants were  likely to
 be transported  over  a relatively large distance.  Secondary pollutants
 (formed  by  chemical  reactions in the atmosphere)  reach peak concentrations
 at some  distance from a  source  and often necessitate the use of a large
 modeling region.   Processes  that may need to be treated in a model  include
 pollutant emissions, transport  and dispersion,  chemical transformations,
 and  removal processes such  as rainout, washout, and surface uptake.   The
 relative importance  of these processes and  the  requisite degree of sophisti-
 cation depend on the specifics  of  the model application.
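
      The link noted above between the averaging time of a standard and the
 temporal range required of the model can be illustrated with a short sketch.
 It reduces a series of hourly predictions to averages over the averaging
 time of a standard and shows that a simulation shorter than the averaging
 time yields nothing to compare with that standard.  The hourly values and
 the function name are hypothetical.

```python
def block_averages(hourly, averaging_hours):
    """Average consecutive, non-overlapping blocks of hourly values."""
    if averaging_hours <= 0:
        raise ValueError("averaging_hours must be positive")
    n_blocks = len(hourly) // averaging_hours  # an incomplete final block is dropped
    return [
        sum(hourly[i * averaging_hours:(i + 1) * averaging_hours]) / averaging_hours
        for i in range(n_blocks)
    ]

# Hypothetical hourly predictions for one simulated day (arbitrary units).
hourly = [float(hour % 6) for hour in range(24)]

eight_hour = block_averages(hourly, 8)       # three values for the day
daily = block_averages(hourly, 24)           # one value for the day
too_short = block_averages(hourly[:12], 24)  # empty: 12 hours cannot give a 24-hour average
print(eight_hour, daily, too_short)
```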

 2.   Selection  of  a  Model
       In  principle,  the air quality model  chosen  is  the  one  that  shows
  the best match of capabilities  to the characteristics of  the  problem as
  defined  above.   Other considerations  can  also affect the  choice  of a
  model.   Previous experience with a model  and  the resulting  familiarity
  with exercising it, for example, will usually result in its being chosen
  over another comparable model with which  the  user has little  or  no
  experience,  provided that both  models have similar  capabilities  and
  accuracy in  performance.   Computer access is  an  obvious prerequisite to
  the use  of some models, and the cost  of computer runs to  exercise a
  model  can also be a factor in model choice.   We  note that the EPA (1978a)
  document provides some guidance in model  selection.  Efforts  to  publish
  comprehensive guidelines  that will enable the identification  of  appropri-
  ate air  quality models for use  in particular  applications should continue.

      When the problem has been  fully  defined and an appropriate air quality
model has been selected, evaluation of the model in light of the intended
use follows.  The details of the procedures to be followed in evaluating
models depend on both the model   type and the application.  These specific
points are addressed below.

 B.    PLANNING THE MODEL EVALUATION STUDY

 1.    Determination of the Need for Model Evaluation

      Before  a model  evaluation is planned, the necessity of such an eval-
 uation should be assessed according to  the principle that the user must
 provide adequate assurance that model performance is satisfactory.  In
 general,  we  suggest  that all modeling efforts include a performance eval-
 uation study.  Of course, the extent of each study would depend on the
 specific  model and its  intended application.  Situations requiring minimal
 evaluative efforts might occur when:

       >  The model is of a type whose performance cannot be evaluated.
       >  The application is a New Source Review of a completely new
          facility where the source has  not yet been constructed.
       >   The model's performance can be adequately characterized
         by  past experience.

      Considering the first case,  because of features inherent in model
 formulation, the performance characteristics of some models cannot be
 readily determined by the user.   Examples  of such models  include roll-
 back  (as applied to photochemical  oxidants)  and the EKMA.   Such models
 are not verifiable in the sense  addressed  in this report,  because their
 predictions, as noted in Section  E, cannot be checked.   Preliminary
 studies are under way to develop  means for assessing the  performance
 characteristics of such models;  it would seem that ultimately they will
 be subjected to a general evaluation.  One of the results of the evaluation
 would h
-------
case, which includes all Gaussian models, the steady state assumption
results in a basic inconsistency between the model  and the atmosphere,
and the conditions under which data are gathered for an evaluation study
might not match those for which the model is appropriate.   This problem
should be noted in an evaluation study.

     Moreover, Gaussian models that compute long-term (annual) average  pol-
lutant concentrations invoke a further assumption,  namely, that it is
appropriate to average over a long period the concentrations calculated using
the steady-state assumption.  The validity of this  assumption needs some
attention when evaluating such models.  Data collection requirements for long-
term models are addressed in a later section.
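
     The additional averaging assumption made by long-term Gaussian models
can be sketched as a frequency-weighted sum of steady-state concentrations.
The example below uses the standard ground-level centerline form of the
Gaussian plume equation with full ground reflection.  The power-law dispersion
coefficients, the three meteorological classes, and the source parameters are
hypothetical, and the sketch assumes the receptor lies on the plume centerline
in every class, which a real climatological model would not do.

```python
import math

def sigma(x_m, a, b):
    """Hypothetical power-law dispersion coefficient (m) at downwind distance x_m."""
    return a * x_m ** b

def centerline_concentration(q_g_s, u_m_s, h_m, x_m, coeffs):
    """Steady-state ground-level centerline concentration (g/m^3) for a
    Gaussian plume with full ground reflection."""
    (ay, by), (az, bz) = coeffs
    sy, sz = sigma(x_m, ay, by), sigma(x_m, az, bz)
    return (q_g_s / (math.pi * u_m_s * sy * sz)
            * math.exp(-h_m ** 2 / (2.0 * sz ** 2)))

# Hypothetical classes: (frequency of occurrence, wind speed in m/s,
# ((a_y, b_y), (a_z, b_z)) dispersion coefficients).  Illustrative values only.
classes = [
    (0.25, 2.0, ((0.22, 0.90), (0.20, 0.90))),
    (0.50, 5.0, ((0.16, 0.90), (0.12, 0.90))),
    (0.25, 3.0, ((0.08, 0.90), (0.06, 0.85))),
]

def long_term_average(q_g_s, h_m, x_m, classes):
    """Weight each steady-state concentration by its class frequency."""
    total_frequency = sum(freq for freq, _, _ in classes)
    if not math.isclose(total_frequency, 1.0):
        raise ValueError("class frequencies must sum to 1")
    return sum(freq * centerline_concentration(q_g_s, u, h_m, x_m, coeffs)
               for freq, u, coeffs in classes)

# Hypothetical source: 100 g/s released at 50 m, receptor 1000 m downwind.
annual = long_term_average(100.0, 50.0, 1000.0, classes)
print(annual)
```

Whether such a weighted sum of steady-state results represents the true
long-term average is exactly the assumption that an evaluation of a long-term
model needs to examine.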

      In the case of a model application to a source not yet constructed, a
full evaluation of the predicted impacts is obviously not possible at that
site.  At best, the study team could carry out limited tracer experiments.   Thus,
before a model is used in such an application, it should be verified in a more
general setting, e.g., for an existing source of a similar type under com-
parable meteorological conditions and topographic influences.  In this  way
some confidence could be gained in the use of the model.  In addition,  such
an evaluation would indicate what input data are necessary to permit the
model to produce acceptable results.  These input data requirements would
then need to be satisfied in the new application of the model.

      The third circumstance in which model evaluation is not necessary is
when the model's performance can be ascertained on  the basis of past experi-
ence.  For example, a model may have been previously verified for identical
circumstances.  Such a situation could arise when a model, previously used
to evaluate the impact of a new highway through an  urban area, is employed
to evaluate the results of imposing automotive emission control measures in
the same area.  Another example would be a requirement to evaluate a new
emissions control  strategy with a model previously  used to evaluate a
different strategy.   In both of these cases the pollutants modeled and  the
area of application  would be identical, and thus the study team could cite
the pertinent evaluation effort in lieu of performing further verification work.

     Care  must  be exercised when evaluating the transferability of a pre-
vious evaluation.  We suggest that criteria be developed to judge applica-
bility.  At  a minimum,  it appears that all of the following criteria would
be applicable:

     >  Similarity of model application.  A previous evaluation for
        the  same area is clearly transferable to a new study,  but
        whether a model application in another study area can  rely
        on previous evaluative work is unclear.  Ideally, the
        model should be evaluated each time it is used in a new
        area.   However, with further modeling experience, it
        may  be  possible to identify a set of model input require-
        ments that, if  satisfied, would ensure adequate model  per-
        formance.  Thus, the user would be responsible for collecting
        the  requisite input data but would not have to mount,  say,
        an additional air monitoring effort for model  evaluation
        purposes.
     >  Similarity of pollutants modeled.  In general, as long as the
        pollutants are  the same, it would appear that the evaluation
        is transferable.  If a model is found to perform satisfactorily
        for  one set of  pollutants in some area and if there is an
        interest in using it to simulate some other set of pollutants,
        we would recommend that the model be evaluated for the new
        pollutants.
     >  Similarity of emissions and meteorology.   Even if a model was
        previously verified for a particular area, significant alter-
        ations  to the gasoline emissions  inventory might necessitate
        reevaluation of the model.   In addition, a verification based
        on one  set of meteorological  conditions (e.g., adverse summer
        episodic conditions) might  not be sufficient to ensure adequate
        performance under a different set of conditions (adverse winter
        conditions).  Again, criteria could be developed to guide the
        user in judging whether  further evaluation of the model is
        necessary.

      In  summary, whenever  the  use  of  an  air quality model is contemplated,
 it  should  be  a  verified  model,  except for  a few  restricted cases.  The
 use of an  unverified  model  for any situation  involving air quality
 regulations  should be subject  to prior approval  by the group responsible
 for overseeing  model  applications.  The circumstances under which the
 requirement  for model evaluation can be waived should be  a subject for
 consideration.   It is our view that such waivers should  be granted only
 when exceptional circumstances prevail.

       Once the  decision  has been made  to carry out a model evaluation,
with  the object of verifying an  air quality model for a particular use,
the process of  planning  the study  can  move forward.   We now discuss  each
of  the subtasks identified earlier as  belonging to the planning phase.
Note  that  they  are not necessarily intended to be carried out sequentially,
but rather concurrently, as convenience and necessary interaction between
them  dictate.

2.    Development of a Conceptual Evaluation Plan

      This  initial  planning work  can be divided into three parts:

      >  Defining the  extent of evaluation  required.
      >  Outlining  the implementation of the evaluation study.
      >  Considering actions to take at the conclusion of the
        study.

The first part  consists of establishing the characteristics  and scope of  the
proposed study; the second part  considers the implementation  of the  effort in
light of the available resources;  and  the third part essentially comprises a
contingency plan to anticipate all possible outcomes  of the  study and what
will be done as a  result of each.  Definition of the extent  of  evaluation
requires that the  following issues be addressed:

     >  Assessment of model availability.  When the  study is
        initiated,  it should be certain that the latest  version
        of the model is available for use.   The potential  user
        should assure himself,  by checking with the  model  developer
   or the EPA if necessary, that all  known  errors  in  his
   version of the model  have been rectified.   If the  model
   is implemented on a computer, a related  question that  is
   relevant at this stage is whether  the  codes  are compatible
   with the user's computer.  The magnitude of  this problem,
   particularly for large computer models with  extensive  file
   manipulations and overlays, should not be  underestimated.
   Consultation with the model developer may  be  necessary to
   effect a transfer.

>  Definition of the size and boundaries of the  modeling  region.
   The appropriate modeling region is  a function of the model
   type and application.   For a Gaussian model  applied to a
   single source, the  study has to cover the  region where
   impacts from the source are expected for the  meteorological
   conditions to be modeled.   For a multiple  source Gaussian
   climatological model, the modeling  region  usually covers a
   complete urban area.   With trajectory models, the simulation
   is constrained by the distances over which the basic model
   concept is valid.  In the case of  grid models, a constraint
   that often comes into play is the  number of  grid cells that
   can be accommodated in the available computer core storage.
   This number of cells  has to be judiciously arranged so as to
   cover the area of interest, or the  region  segmented, so that
   only part is dealt with at one time.

>  Definition of the resolution of the model.   For Gaussian
   models, which assume steady-state conditions and yield
   analytical solutions for concentration fields, temporal and
   spatial resolution is not an  issue.  (Because of the  steady-
   state assumption the models yield time-invariant concentra-
   tions, and because the solution is analytic  in spatial coor-
   dinates a concentration can be calculated for any desired
   location.)  Other types of models may have  particular temporal
   and spatial characteristics.   It should be  verified that the
   temporal resolution of the model is compatible with the
   averaging time of the air quality standard being addressed.
   For short-term computer simulation models, compatibility is
   not usually a problem with averaging times from 1 hour to
   24 hours since time steps in these models are less than 1
   hour.  However, these models have not been applied to problems
   involving pollutants subject to annual standards because of
   the inordinate amount of computing required to simulate such
   a period.  The spatial resolution of the model should be
   appropriate to the problem being considered.   For instance,  if
   the model is being used to establish broad regional trends,  a
   relatively coarsely resolved result might suffice.  Studies
   of mixed control strategies, however, would require that the
   model be capable of distinguishing the effects of controlling,
   say, automobiles, power plants, and oil refineries.  Control
   strategy evaluation would necessitate the ability to include
   the important spatial and temporal characteristics of the emis-
   sions patterns associated with these sources, as well as the
   spatial and temporal characteristics of their respective air
   quality impacts.

   In conclusion, we reiterate that the useful resolution of
   the model predictions is limited by the characteristics of
   the available input data.  Model results cannot be resolved
   to a scale finer than that of the input data.  For instance,
   if the diurnal variation in traffic emissions is not properly
   accounted for, no useful conclusions can be drawn about con-
   trol strategies that attempt to reduce only peak-hour traffic.
   If large scale synoptic flows are used as input, no informa-
   tion on local effects of specific terrain features will be
   available.

>  Pollutants to be included.  In the case of primary, non-
   reactive pollutants, only those particular species under
   study need concern the model user.  However,  where reactive
   pollutants are concerned, all species that participate in
   relevant chemical reactions must be treated.   For instance,
   to operate the SAI Airshed Model to calculate O3 concentrations,
   the user must supply gridded emissions data for hydrocarbons,
   NO, NO2, and CO.   All of these pollutants affect
   the predicted O3 concentrations through reactions in the
   model's kinetic mechanism.

>  Time period to be simulated.   This aspect of model  operation
   is the most dependent on the  intended  application.  If com-
   pliance for a pollutant subject to an  annual  average  standard
   is the issue, then a model that can incorporate  a year's  data
   will be used.  These data are often used in  summary form, as
   an annual stability/wind rose,  annual  average temperature, and
   so on.  For shorter term studies, such as 1-hour-average  con-
   centrations, the period simulated generally  varies  from 1  hour
   up to 24 hours or possibly longer.   With a model  that does not
   include any temporal variation, only a steady-state concentra-
   tion is calculated.  For photochemical models that  include
   temporally varying inputs, the  period  generally  simulated
   is the daylight hours, when photochemical  reactions are impor-
   tant.  Thus, the variations of emissions and meteorological
   conditions over this time  must  be included.   Sometimes, pollutant
   carry-over from the previous  day  is important and must be treated
   through the performance of a  multiple-day simulation.

>  Listing of the kinds of data  that will be needed.   As  a
   preliminary task in determining the scope of the  required
   evaluation, the types of data required should be  identified.
   At this stage, detailed data  requirements are not listed,  but
   knowledge of the types of  data  required  will  enable a  proper
   assessment of current availability of  useful  data and the
   probability that a suitable data  collection  effort  can be
   initiated.  Table IV-1 presents a list of types  of  data that
   may be required by an air  quality model.   The level of detail
   required by any individual model  is specific to  that  model.

-------
 TABLE IV-1.  POSSIBLE DATA REQUIREMENTS OF AIR QUALITY MODELS

        Data Category           Examples
        Meteorological          Wind speed and direction
                                Temperature
                                Atmospheric stability
                                Mixing depth
                                Insolation
                                Humidity
                                Cloud cover
                                Atmospheric pressure
        Emissions               Mobile sources
                                Stationary sources
                                Natural sources
        Air quality             Surface measurements
                                Vertical pollutant concentration soundings


   For instance, a single source  model  usually requires wind
   speed and direction inputs  only at the point of emissions.
   For a long-term average model, a wind rose covering the
   appropriate length of time  is  needed.  A grid-based photo-
   chemical model  requires a  three-dimensional, temporally vary-
   ing wind field.  Thus  it is useful to list the types of data
   needed for the  model  so that the extent of data gathering
   required can be appreciated.

>  Choice of meteorological conditions  for study.  Since meteoro-
   logical conditions play a significant role in determining pollu-
   tant concentrations,  the choice of conditions for a modeling
   study is very important.  Generally, episodic conditions leading
   to  high pollutant concentrations are desirable.  These can be
   identified,  for locations that have  the necessary data, by
examining the concentration records for a period in the past
and observing which meteorological conditions lead to the
highest observed concentrations.   Of course, this approach
works only if there are emissions sources in the area:   It is
not applicable to a hypothetical  emissions source in an area
with clean air.  If air quality records are not available, an
analysis of the meteorological  data must be carried out to
identify adverse conditions.   Conditions judged most likely to
lead to significant impacts should be identified (e.g., low
mixing depth and stagnant wind),  and their historical frequencies
of occurrence should be determined.  For a short-term simulation,
data for one or more days in  the  past will need to be collected
and used in the model evaluation  effort.

>  Determination of the number of meteorological regimes to
study.  The basic considerations  in selecting the number of
regimes to be included in an  evaluation study are (1) specifica-
tion of a sufficient number to enable characterization  of
model performance, and (2) selection of regimes representative
of those required in the actual applications studies.   While
little definitive guidance can be provided at the present time,
we do offer the following observations.  With regard to long-
term average models, Rubin (1974) found that year-to-year
variations in annual stability/wind rose had a significant
effect on pollutant concentrations calculated by the Air Quality
Display Model.  He concluded  that such year-to-year variations
would have an important effect on model validation if a rose
were used from a year other than  that for which pollutant con-
centrations were recorded.  Thus, roses from several years
should be used so that variations are accounted for. For model
evaluation, pollutant concentrations for several years  would
have to be available, and normally they are not.

Ideally, for short-term average models, all frequently  occurring
adverse meteorological regimes would be examined.  However,
        it should be necessary for the user to evaluate the model
        subject only to those regimes that will actually be employed
        in subsequent applications.  For example, if a model is to
        be used to estimate O3 concentrations that occur under adverse
        conditions during the summer, it would not be necessary to
        evaluate the model for conditions that occur during the winter.
        We note that photochemical model usage generally entails the
        selection of conditions leading to high O3 concentrations.
        For these models, it would also be advisable to select condi-
        tions resulting in lower maximum O3 levels (e.g., from 0.12
        to 0.20 ppm) to test the model's performance at concentrations
        close to the standard (0.12 ppm).  We also note that the number
        of conditions to be studied must be reconciled with financial
        resources.
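
     The regime-selection guidance above can be illustrated with a short
sketch.  Assuming a record of observed daily maximum 1-hour O3 concentra-
tions is available, the hypothetical function below sorts candidate days
into a high-episode group and a group near the standard, using the 0.12 ppm
standard and the 0.12 to 0.20 ppm example range quoted in the text.  The
function name, the group labels, and the sample values are illustrative
assumptions, not part of any EPA procedure.

```python
O3_STANDARD_PPM = 0.12        # standard cited in the text
NEAR_STANDARD_MAX_PPM = 0.20  # upper end of the example range in the text


def classify_days(daily_max_o3):
    """Group days by observed maximum 1-hour O3 concentration (ppm)."""
    groups = {"high_episode": [], "near_standard": [], "below_standard": []}
    for day, peak in sorted(daily_max_o3.items()):
        if peak > NEAR_STANDARD_MAX_PPM:
            groups["high_episode"].append(day)
        elif peak >= O3_STANDARD_PPM:
            groups["near_standard"].append(day)
        else:
            groups["below_standard"].append(day)
    return groups


# Hypothetical daily maxima, for illustration only.
sample = {"day_a": 0.31, "day_b": 0.14, "day_c": 0.08, "day_d": 0.19}
groups = classify_days(sample)
```

Days from both the high-episode and near-standard groups would then be
candidates for the evaluation, subject to the available resources.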

The next step  in developing the conceptual evaluation plan is to lay out
the resources  available to implement the plan.  These resources are dis-
cussed below.

a.   Financial Resources and Temporal Constraints

     At this stage in the evaluation study the complete requirements have
not been identified, and so the dollar needs of the study cannot be speci-
fied accurately.  A rough estimate can be made, however, on the basis of
the extent of  the study (discussed above) and previous similar model evalua-
tion efforts.  Funds for the evaluation can come from both public and pri-
vate sources:  Federal, state, and local governmental agencies may fund a
model evaluation study as part of an effort like the development of a State
Implementation Plan, while private sources might fund evaluation efforts
in conjunction with a New Source Review.

     The responsibility for financial support of model evaluation efforts is
a matter deserving some consideration.  In general, funding should be pro-
vided for studies aimed at collecting requisite data bases and carrying out
general evaluations of EPA-recommended models.  Consideration should also
be given to setting aside additional funds for model verification, which
would be made available to public agencies  about to  use an  air quality model
(new or established) in a new application.   We  discuss  this point further in
the next chapter.

     An issue related to the economic  resources available for  model  evaluation
is the amount of time available  to complete the study.   In  many cases  the
model application must be completed by a  definite date.  For example,  a
provision in a law may require certain actions  by certain dates,  or  a
commitment must be honored to commence construction  of  a project by  a
particular date.  It will  be important for  the  study team to carry out
the evaluation planning task in  a timely  fashion to  allow adequate time
to actually evaluate the model's performance.   Moreover, they  should be
realistic with respect to the total time  required for the study.

     Any problems associated with time or money limitations must be  resolved
in order that an adequate plan can be  formulated.  However, this  resolution
should take place at the formal  plan definition stage,  when the plan for the
study has been completed.   If the amount  of work required by the  plan  is
greater than the resources available to carry it out, then  it  must be  expli-
citly recognized that the study  will  be inadequate and  the  impact of the
deficiencies on the study's conclusions should  be estimated.   Priorities
should be assigned to the various elements  of the plan  to ensure that  the
most important activities are carried  out.
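
     One simple way to apply priorities under a fixed budget is sketched
below.  It assumes each plan element carries a priority rank (1 = most
important) and a cost estimate, and it funds elements in rank order until
the money runs out.  The greedy rule, the element names, and the cost
figures are hypothetical; the text does not prescribe a particular alloca-
tion method.

```python
def select_elements(elements, budget):
    """Fund plan elements in priority order (1 = most important).

    elements: list of (name, priority, cost) tuples.
    Returns (funded names, dropped names, unspent budget).
    """
    funded, dropped = [], []
    remaining = budget
    for name, _priority, cost in sorted(elements, key=lambda e: e[1]):
        if cost <= remaining:
            funded.append(name)
            remaining -= cost
        else:
            dropped.append(name)
    return funded, dropped, remaining


# Hypothetical plan elements and costs (arbitrary units).
plan = [
    ("upper-air soundings", 1, 40),
    ("extra surface monitors", 2, 50),
    ("aircraft sampling", 3, 60),
    ("sensitivity runs", 4, 20),
]
funded, dropped, left = select_elements(plan, budget=120)
```

The elements returned as dropped are the deficiencies whose impact on the
study's conclusions would then need to be estimated.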

     As an example, if only one  vertical  temperature sounding  is  available
for an area, the consequences of having so  little information  for character-
izing mixing depths and winds aloft must  be established.  For  models that
do not use detailed spatial and  temporal  information, the number of  temper-
ature soundings may be of little consequence, but for a complex model  the
number could have a substantial  impact on the model's prediction. We  note
that the effect of data limitations on the  adequacy  of  predictions of  photo-
chemical grid models is a subject of current research studies  (Tesche,
1978).

-------
b.   Manpower Resources

     To carry out a model evaluation study, people with many different
skills are required to undertake many different types of tasks.   Personnel
are needed with experience in the fields of:

      >  Meteorology
      >  Analytical chemistry
      >  Computer programming
      >  Statistical analysis
      >  Air quality analysis.

These people are needed to:

      >  Select an appropriate air quality model
      >  Plan and set up the evaluation study
      >  Collect data
      >  Specify and prepare data inputs to the model
      >  Run the model
      >  Analyze the study results.

In addition, supervisory personnel are needed to assume responsibility  for
proper execution of all of these activities.

      In view of the likelihood that a relatively large number of model
evaluation efforts may be initiated in the future and that there will be
a need for personnel with a high level of expertise to design and implement
these studies, some form of central model evaluation group should be estab-
lished.  The function of this group would be to help model users who need
to set up a model evaluation study.  The group would include
experts in the fields listed above and would be available to help plan  or
review studies and aid in their implementation.  Thus there would be
experts available for advice on model  selection and usage, air quality mea-
surements, meteorological measurements, assembly of emissions inventories,
setting up and running air quality models on the computer, and analysis of
the results.  In addition, the group could serve a custodial  function in
collecting a set of properly evaluated air quality models and a number of
data bases to use in future evaluations.   The availability of such a group
would relieve model users of the responsibility to assemble a comparable
group of experts (presumably from outside contractors)  and, at the same time,
would build up a fund of knowledge of and expertise in  model  evaluation.
Accumulation of this knowledge would also result in improvements in model
evaluation methods over a period of time.

c.   Hardware Resources

     Hardware will be needed for data  collection (to the  extent that neces-
sary data are unavailable), exercise of the model,  and  analysis of results.
Table IV-2 lists the types of hardware that can be required to collect the
data necessary to exercise the model.  For running the model and analyzing
the results, access to computing facilities is often required.   A portion
of the measurement hardware may be available as part of a currently operat-
ing data-gathering network.  All available aerometric monitoring hardware
should be inventoried, with special consideration given to the identifica-
tion of those instruments that can be  deployed at the discretion of the
model evaluation group.  In the early stages of the model evaluation effort,
the exact extent of the data-gathering effort will not be known, but a survey
of the currently available instrumentation will indicate  the  size of the
effort that can be mounted without purchasing or leasing  equipment.

TABLE IV-2.  POSSIBLE HARDWARE REQUIREMENTS FOR DATA COLLECTION

      Data Category          Hardware
      Meteorological         Anemometers (wind speed)
                             Wind vanes (wind direction)
                             Thermometers (temperature)
                             Instrumented balloons (upper air
                             measurements of winds and
                             temperatures)
                             Radiometers (insolation)
                             Hygrometers (relative humidity)
                             Barometers (atmospheric pressure)
      Emissions              Traffic counters
                             Source monitors
      Air quality            Samplers
                             Analytical instrumentation
                             Calibration samples
                             Recorders
                             Instrumented aircraft

     Once the extent of the evaluation effort has been defined and the
available resources are known, the final part of the plan for the evalua-
tion study can be undertaken, which is to lay out actions to be taken at the
conclusion of the study.  The product of the performance evaluation con-
sists of values for one or more performance measures that are to be compared
with appropriate standards.  In the event that satisfactory performance is
achieved, the results of the study can be reported and the model can be
considered to be verified.  If the model fails to achieve the appropriate performance
standard, the possible reasons for failure must be examined.   These reasons
can include:

     >  Inadequate data base.
        -  Data too sparse.
        -  Data too imprecise.
     >  Inadequate model inputs.
        -  Uncertain interpolation of data.
        -  Uncertain extrapolation of data.
     >  Inadequate model formulation.
        -  Poor treatment of emissions.
        -  Inadequate transport and dispersion algorithms.
        -  Inadequate chemical mechanism for atmospheric
           reactions.
        -  Poor treatment of removal  processes.
     >  Inadequate air quality data.
        -  Too few data to calculate performance measure properly.
        -  Data uncertainties too great.

We discuss these possibilities in detail  in  a later section.   At this  stage
the implications of an unsuccessful evaluation study should  be considered.
It would seem appropriate to set aside some  time and financial  resources
for possible use at the end of the study to  enable the performance  of  some
additional work to achieve a  set of satisfactory model results.

3.   Examination of Existing Data Bases

     In this task, existing data bases are surveyed with a view to  their
use for the model evaluation.  To the extent that suitable data are already
available, they will not need to be collected as part of a special  field
program.  We anticipate that many model  evaluations can make significant
use of existing data.  Data required for the evaluation are of three basic
types:

     >  Meteorological data
     >  Emissions data
     >  Air quality data.

They may be available from three sources:

     >  Federal government agencies
     >  State and local government agencies
     >  Private organizations.

Data from governmental sources should be readily available.  Data from the
private sector may be harder to obtain and, in addition, may be of limited
utility, since they may have been collected for some special purpose.

     The characteristics of the data bases to be evaluated are:

     >  Coverage of the modeling region.
     >  Coverage of the desired range of meteorological conditions.
     >  Coverage of the required time period.
     >  Coverage of the required model variables.
     >  Compatibility with the model's spatial and temporal resolution.

These characteristics are all related to the features of the study that
were identified in developing the conceptual plan for the evaluation.
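
     As a minimal sketch, the five characteristics above can be recorded
as a checklist for each candidate data base.  The field names and the
sample entry are hypothetical, and a real screening would rest on judgment
rather than simple yes/no flags.

```python
# One flag per characteristic listed in the text (names are hypothetical).
CRITERIA = (
    "covers_region",
    "covers_met_conditions",
    "covers_time_period",
    "covers_model_variables",
    "matches_resolution",
)


def screen_data_base(description):
    """Return the criteria that a candidate data base fails to meet."""
    return [c for c in CRITERIA if not description.get(c, False)]


# Hypothetical candidate; an unstated criterion is treated as unmet.
candidate = {
    "covers_region": True,
    "covers_met_conditions": True,
    "covers_time_period": False,
    "covers_model_variables": True,
}
gaps = screen_data_base(candidate)
```

Any criterion returned in the list marks a need that would have to be met
by supplementary data collection.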

     Coverage of the modeling region is judged by the adequacy with which
the data base collection system covers all data of interest.  For a study
involving a single point source,  this area would  include the location of the
source's maximum impacts under the various meteorological regimes of interest
(and the location of the source itself).  For a regional scale study,  the
area of interest might extend beyond the actual region modeled to include
outside sources that impact on the region.  The boundary of the region of
interest is also influenced by terrain features, such as mountain barriers.
If pollutants do not traverse such features,  there is  usually no  interest  in
modeling the opposite side.  Also, if pollutants are carried outside the region
during one day and brought back by wind the next day, there would be a need
to locate monitors to gather the relevant data.


-------
     The existing data base should also be judged for its  coverage  of the
desired range of meteorological  conditions.   In the course of defining the
extent of the evaluation study,  one should select the meteorological  con-
ditions most appropriate in light of the intended model  application.
Meteorological and air quality data should be assembled  for these conditions.

     Another feature of the evaluation  study  that was defined in the  plan-
ning stage is the period of time over which the data  should be collected.
For models that simulate a day or less, available data bases are likely to
be able to provide data, although adequacy of areal coverage or availability
of data for a specific day may be a problem.   For situations requiring the
estimation of annual average concentrations,  care should be taken that the
data do not include systematic gaps (e.g., all summer data missing).
Seasonal influences on emissions may have  to  be taken into account  in pre-
paring an emissions inventory.  Furthermore,  for a satisfactory evaluation,
a sufficient number of years of  data should be available to constitute a
representative sample.
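
     A check for systematic gaps of the kind mentioned above (e.g., all
summer data missing) might be sketched as follows.  The season defini-
tions, the function name, and the sample months are assumptions made for
illustration only.

```python
from collections import Counter

# Assumed assignment of calendar months to seasons.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}


def find_seasonal_gaps(months_observed, min_count=1):
    """Return seasons with fewer than min_count valid observations.

    months_observed: one month number (1-12) per valid observation.
    """
    counts = Counter(SEASONS[m] for m in months_observed)
    return [s for s in ("winter", "spring", "summer", "fall")
            if counts[s] < min_count]


# Hypothetical record with no summer observations.
months = [1, 1, 2, 3, 4, 4, 5, 10, 11, 12]
gaps = find_seasonal_gaps(months)
```

Raising min_count makes the check stricter, so that thinly sampled seasons
are flagged as well as empty ones.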

     The variables for which data are required and the temporal and spatial
resolution of those data should also be identified and listed at the plan-
ning stage.  Existing data bases must be examined for comprehensiveness and
compatibility with model requirements.   Data  requirements  for which exist-
ing data are not suitable or comprehensive enough must be  fulfilled in the
course of a supplementary field  measurement program.   However, only certain
types of supplementary data collection  are possible.   If some measurements
on a particular day were missed, it is  impossible to  collect them subse-
quently.  Additional days of data might, however, be  collected if necessary.
If existing data are used anyway, the consequences of their use must  be
recognized.  The effect of degradation  of model inputs on  model output should
be determined for each EPA-recommended  model  through  the performance  of
appropriate sensitivity studies.  For a simply formulated  model, determination
of this sensitivity to data degradation may be straightforward, but for a
complex model the work involved  will be substantial.   Plans for carrying out
one such effort for a complex model are described by  Tesche (1978).

-------
     The largest single source of meteorological  data is  the National
Weather Service, which supervises the collection  of data  at a network  of
stations, mostly located at airports, around the  country.   Other sources of
meteorological information are air pollution agencies, which often  record
surface wind and temperature information.  However, routine data collection
activities are likely to be sparse, as illustrated by Table IV-3, which
shows the number and type of daily meteorological measurements made in 15
U.S. cities (Tesche, 1978).  The level of monitoring varies widely  from city
to city, and it will not be possible to say how many data are available for
any given area without surveying all possible sources.

     The major repositories of emissions data are, of course, the air  pollu-
tion agencies.  The EPA maintains a system, called the Aerometric and
Emissions Reporting System (AEROS), that contains both emissions and air
quality information reported by pollution control agencies throughout  the
United States.  The emissions data are stored in  the National Emissions
Data System (NEDS), which has the capability of storing and retrieving
source and emissions-related data for particulates, SOx, NOx, CO, and hydro-
carbons.  A problem with this centralized system is that the data can never
be fully up to date.  To obtain the most recent data, one must consult
appropriate local or state authorities.  Table IV-4 summarizes
the types of emissions inventories available for the  15 cities listed
in Table IV-3.  Note that the level of detail available can vary from  dis-
aggregation on a 0.1 km grid down to aggregation on a  county-wide  scale.
The latter scale would clearly be inadequate for some models, and a supple-
mentary emissions preparation effort would need to be considered.  This
topic is covered later.

     Air quality data are required for the model  evaluation not only to
supply input to the model, but also to provide the basis for comparison of
model results with performance standards.  Input air quality data are  used
to supply background pollutant concentrations for many models and to specify
initial and boundary conditions for complex models.  The Storage and
Retrieval of Aerometric Data (SAROAD) system is a primary source of air

-------
TABLE IV-3.  NUMBER AND TYPE OF DAILY METEOROLOGICAL MEASUREMENTS
             IN 15 U.S. CITIES IN 1977

                     Surface   Surface                    Upper Level
                     Wind      Temper-     Atmospheric    Wind         Solar
  City               Velocity  ature       Stability      Velocity     Insolation  Humidity
  Albuquerque, NM    7         7           RU,            RW,          1           1
  Chicago, IL        10        10          RU             RW-          3           3
  Denver, CO         25        2           RW1            RW1          1           1
  Houston, TX        3         3           0              P,           1           1
  Las Vegas, NV      8         1           AC2. AS,       0            1           1
  Los Angeles, CA    44        9           RDB. AS,       RD8          2           e
  New York, NY       10        10                         Pl           3           3
  Philadelphia, PA   2         2                          Pl           1           1
  Phoenix, AZ        8         8           0              0            1           1
  Portland, OR       9         9           RU,            RWrP,        1           1
  Sacramento, CA     12        4           AS3            P.           1           2
  San Diego, CA
  San Francisco, CA  17        17          RU,            RW,          17          17
  St. Louis, MO      25        25          MB             P96          6           20
  Washington, D.C.   25        25          RU,            RU,          2           2

  AC = acoustic sounder.
  AS = aircraft spiral.
  RD = radiosonde.
  RW = rawinsonde.
   P = pibal.

  Note: Subscripts refer to the number of measurements taken each day.  A zero entry indicates
        that a particular measurement is not taken; a blank indicates uncertainty as to whether
        or to what extent the measurement is taken.
  Source:  Tesche (1978).

-------
TABLE IV-4.  METHODS USED FOR THE PREPARATION OF EMISSIONS
             INVENTORIES IN 15 U.S. CITIES IN 1977
City
Albuquerque, NM
Chicago, IL
Denver, CO
Houston, TX
Las Vegas, NV
Los Angeles, CA
New York, NY
Philadelphia, PA
Phoenix, AZ
Portland, OR
Sacramento, CA
San Diego, CA
San Francisco, CA
St. Louis, MO
Washington, D.C.
Format
Link-node:
VMT
Gridded
Gridded
Link-node:
VMT
Gridded
Gridded
VMT
Gridded
Gridded
Gridded
Gridded

Gridded
Variable
size grid
Gridded
Species Grid Size
N. H. C N/A
N. H. C 50 x 50:
2 ml
N. H. S. 30 i 30:
P. C 1 mi

N. H. C 30 « 40:
1 km
N, H. S. C 100 x 50:
2 mi
Borough by
borough
H. C. N 48 x 48:
2 mi
N. H, C 1 mi
S. P. C 20 * 30:
2 tor.
N, H, S. 25 x 25:
P. C 2 km

N, H. S. 120 x 60:
P. C 1 km
N. H. S. 150 x 200:
P. C 1-10 km
N. H. S, 4 mi
P. C
Hot/Cold
Start
Area -wide
temporal
resolution

Area -wide
temporal
distribu-
tion

Area -wide
temporal
distribu-
tion
Area -wide
temporal
distribu-
tion




Area -wide
temporal
resolu-
tion

Area-wide
temporal
distribu-
tion
Hot/cold
distribu-
tions
applied
to each
grid cell

Format
NEDS
Gridded
Gridded
By counties
Gridded
Gridded

NEDS
NEDS
By dis-
trict
Gridded

Gridded
Gridded
Gridded
Species Spatial
*. N. S. Area-wide
P. C
N, H, C 2 »n
N. H. S. P 1 ni
H, S. H. P County-
wide
N 1 lor.
N. H, S. C 2 mi
S. P
N. H. S. Area-wide
P. C
N. H. S. Area-wide
D. C
H. N Depends on
size of
districts
N. H. S. 2 km
P. C

N. H. S. 1 km
P. C
K. H. S. 1-10 tor.
P. C:
hydrocarbon
speciation
N. H. S, 4 mi
P. C
Temporal
Annual
average

6 or ?4
hour, plus
seasonal


Hourly

Annual
average
Annual
average

Annual
average

Hourly
Hourly

  N = nitrogen oxides.
  H = hydrocarbons.
  S = sulfur oxides.
  P = particulates.
  C = carbon monoxide.
 VMT = vehicle miles traveled.
NEDS = National Emissions Data System.

  Source:   Tesche (1978)

-------
quality data for use in modeling studies.   SAROAD is maintained by the EPA
as part of AEROS, which is used to store and report air quality and emissions
data.  The data in SAROAD may not be sufficient,  however, because in many
areas the reporting network is based on the minimum number of stations re-
quired by the EPA (40 CFR §51.17, 1975) (see Exhibit IV-1).   Unfortunately,
there are currently no criteria for judging the adequacy of these monitor-
ing networks for use in evaluating model performance.   In fact, it is likely
that the monitoring objective of an existing network is something other than
to provide information to the model evaluation process.  As a result, loca-
tions of the monitors may make those measurements of limited usefulness in
an air quality modeling study.  In particular, existing monitoring facil-
ities will likely be concentrated in areas where  air quality violations
occur, which tend to be in the centers of regions.   Thus, data that are
currently available often do not give information on boundary or background
concentrations.  We discuss placement of monitors in a later section.
Table IV-5 presents the extent of routine air quality monitoring in the
15 cities listed in Tables IV-3 and IV-4.   Again, there is a noticeable
variation in the extent of monitoring from city to city, with the amounts
generally corresponding to the severity of air pollution problems.

     In summary, there are many sources of relevant meteorological,  emissions,
and air quality data, but unless the data  are obtained from a special  inten-
sive monitoring activity whose objective is to provide model input informa-
tion, they may not be comprehensive enough to satisfy  the complete needs  of
a model  evaluation effort.   A relevant matter not covered in this  section
is the problem of data quality.  We discuss this  issue more  fully in the  section
on assessing data needs, but we point out  here that data quality is  a very
important element in judging the acceptability of a data base.

4.   Assessment of Data Needs

     The data needs  for a model evaluation  study  are dependent  on:

     >  The  model.
     >  The  nature of the evaluation (general  or  site-specific).
     >  The  relationship between amount and quality of data  and
        model  performance.

-------
EXHIBIT IV-1.  MINIMUM AIR QUALITY MONITORING REQUIREMENTS (40 CFR §51.17)

  Classifi-
  cation of                            Measurement Method          Minimum Frequency
  Region     Pollutant                 or Principle                of Sampling
  I          Suspended particulates    High volume sampler         One 24-hour sample every 6 days (a)
                                       Tape sampler                One sample every 2 hours
             Sulfur dioxide            Pararosaniline or           One 24-hour sample every 6 days
                                       equivalent                  (gas bubbler) (a)
                                                                   Continuous
             Carbon monoxide           Nondispersive infrared      Continuous
                                       or equivalent
             Photochemical oxidants    Gas phase chemilumi-        Continuous
                                       nescence or equivalent
             Nitrogen dioxide          24-hour sampling method     One 24-hour sample every 14 days
                                       (Jacobs-Hochheiser          (gas bubbler) (b)
                                       method)
  II         Suspended particulates    High volume sampler         One 24-hour sample every 6 days (a)
                                       Tape sampler                One sample every 2 hours
             Sulfur dioxide            Pararosaniline or           One 24-hour sample every 6 days
                                       equivalent                  (gas bubbler) (a)
                                                                   Continuous
  III        Suspended particulates    High volume sampler         One 24-hour sample every 6 days (a)
             Sulfur dioxide            Pararosaniline or           One 24-hour sample every 6 days
                                       equivalent                  (gas bubbler) (a)

  For each entry the exhibit also gives the minimum number of air quality
  monitoring sites as a function of region population (c).

  (a) Equivalent to 61 random samples per year.
  (b) Equivalent to 26 random samples per year.
  (c) Total population of a region.  When the required number of samplers
      includes a fraction, round off to the nearest whole number.
-------
TABLE IV-5.  NUMBER OF STATIONS PERFORMING ROUTINE AIR QUALITY SAMPLING
             IN 15 MAJOR CITIES IN THE UNITED STATES

                                                                 Upper Air  Hydro-
                                                  Particu-       Measure-   carbon
  City               Oxidant  Si   CO   Hi   RHC  lates     !3   ments      Species
  Albuquerque, NM    4        3    5    3    0    13        0    0          0
  Chicago, IL        4
  Denver, CO         9        3    9    1    2    0         0    R
  Houston, TX        ?        3    3    3    3    3
  Las Vegas, NV      3        2    24   0    4    3         0    0          0
  Los Angeles, CA    39       27   23   17   11             8    R          S
  New York, NY       7        7
  Philadelphia, PA   8        S    8    8    3    8
  Phoenix, AZ        5        2    8         3
  Portland, OR       3        3    15
  Sacramento, CA     8        8    8         4    1         0    0
  San Diego, CA                                                  0
  San Francisco, CA  26       16   16   21   16   17        17   0
  St. Louis, MO      25       25   25   11   25   20        10   S          S
  Washington, D.C.   10       10   10   10   10   10

      S = special studies.
      R = rarely.

      Note:  A zero entry indicates that a particular measurement is not taken; a blank indicates
             uncertainty as to whether or to what extent the measurement is taken.
      Source:  Tesche (1978).

-------
The assessment of what data need be collected should be carried out concur-
rently with the examination of existing data bases.  These two tasks should
be executed with a high degree of interaction.  The need for particular data
should initiate a search for those data among existing data bases.   Conversely,
the total lack of a particular type of data and the lack of any prospect of
obtaining them could modify the assessment of data needs.  For instance,
lack of vertical temperature soundings at a sufficient number of locations
might be compensated for by using a single temperature sounding in conjunc-
tion with data from several surface temperature observation sites to pre-
scribe the spatial and temporal characteristics of the mixing depth.

     The question of the number and location of monitoring sites required
for air quality model evaluation is at present unresolved.  The number of
stations employed in evaluative studies at the present time is generally
determined by the available resources.  Obviously, there is a need for an
effort to determine more accurately how many measurements are needed to
satisfy the various requirements of the model evaluation.  The number will,
of course, depend on the model and the application.

     There is also a need for an effort to develop an optimal siting metho-
dology for pollutant monitoring stations, despite many recent studies dealing
with this subject.  Seinfeld (1972), for example, developed a siting algo-
rithm based on the premise that stations should be located so that the con-
centration data are as sensitive as possible to changes in emissions from
major sources.  Hougland and Stevens (1976) developed a site location model
based on maximizing the sum of coverage factors for each source.  Ott's
(1977) standardized system of site selection is designed to improve the com-
parability of data from different stations.  Pooler (1974) described the
rationale behind the selection of sites for the St. Louis RAPS study and its
relation to monitoring objectives.  Liu et al. (1977) developed a methodology
for designing monitoring networks based on a figure of merit that was related
to the probability of detection of concentration peaks.  Although there have
been many studies of network designs, using many different algorithms, a
practical methodology suitable for siting stations for model evaluation
purposes has yet to be developed.


-------
     As mentioned earlier, all models require data of three basic types:

     >  Meteorological data
     >  Emission data
     >  Air quality data.

In general, the more complex a model  is,  the  more  flexibility  it  allows in  the
preparation of the input data.  We now discuss the data  requirements  of dif-
ferent models for each of the three categories listed above.   Our discussion
is general, with specific comments made about individual model  types  as
appropriate.  We then discuss data quality control.

a.   Meteorological Data

     One of the principal mechanisms  for  distribution of pollutants is
transport by the wind.   Therefore,  wind speeds are required inputs
for many air quality models.  In the  simplest case, that of a  short term
Gaussian model, the wind speed at the emissions  source is  required.
Since such a model  is commonly used to compute concentrations  resulting
from emissions from elevated stacks,  the  wind speed aloft  is the  relevant
parameter.  However, the vast majority of wind speed  measurements are made
at or near ground level.  Extrapolation of  surface measurements to upper
levels is commonly  done by using an empirical power law  correlation,  which
can introduce errors on the order of  20 percent  at 2000  feet (Roth et al.,
1975).  In the case of Gaussian models that simulate  longer term  (up  to
annual) average concentrations, winds are needed over correspondingly longer
periods.  They are  supplied as a stability/wind  rose, which is a  joint
frequency function  for wind speed, wind direction, and atmospheric stability.
Since a single rose is used for the complete  modeling area, the questions
arise of where the  measurements are made  and  how typical they  are. The
importance of wind  speed in the Gaussian  formula argues  for its careful
measurement in a model  evaluation study.  Enough measurements  of  wind speeds
at ground level and aloft should be made  to check  any correlation
used for extrapolation.   In addition, measurements of wind roses  at a
number of locations in the modeling area  should be made  to ascertain  the

-------
potential variations that can occur and to estimate the  potential  errors
from this source.  The exact number of locations needed  will  have  to  be
determined from experience gained in a few studies.  If  it is not  possible
to make wind rose measurements for the needed length of  time  (up to a year),
then they should be made for as long as possible to obtain at least a lower
bound on the potential error.
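
     The extrapolation of surface wind speeds to stack height discussed
above can be sketched as follows.  This is a minimal illustration of a
power-law profile, not the specific correlation used in any study cited
here.  The function names and the default exponent of 0.2 are assumptions
made for the example; in practice the exponent varies with site and
atmospheric stability and should be checked against measurements aloft.

```python
def extrapolate_wind_speed(u_ref, z_ref, z, exponent=0.2):
    """Power-law estimate of wind speed at height z.

    u_ref    : wind speed measured at height z_ref
    z_ref, z : heights above ground in the same unit, both positive
    exponent : empirical power-law exponent (assumed value, not from the report)
    """
    if z_ref <= 0 or z <= 0:
        raise ValueError("heights must be positive")
    return u_ref * (z / z_ref) ** exponent


def relative_error(estimated, measured):
    """Fractional error of an extrapolated speed against a measurement aloft."""
    return (estimated - measured) / measured


# Example: 3.0 m/s measured at 10 m, extrapolated to a 100 m release height.
u_stack = extrapolate_wind_speed(3.0, 10.0, 100.0, exponent=0.2)
```

Comparing such estimates with simultaneous measurements aloft, using
something like relative_error, is one way to check the correlation as the
text recommends.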

     For more complex models, such as grid and trajectory models,  more detail
yet is required in the wind measurements.  Trajectory models, which track
a moving parcel of air and neglect the variation of wind speed with height,
must have some means of converting measurements into a characteristic
wind speed and direction for the parcel as a function of time.  Use of
ground-based measurements can lead to large errors in the computed tra-
jectory, as illustrated in Figure IV-2.  This potential  for error  suggests
that an element of evaluating a trajectory model might include the com-
parison of computed trajectories with suitable observations,  such  as
tetroon releases, and an accounting for any discrepancies found.   Another
type of trajectory model is the Reactive Plume Model (RPM) (Liu, Stewart,
and Roth, 1978).  In this model no spatial variation of the wind is assumed,
just a temporal variation.  However, the winds should be defined at the
height of the source, and so the previous remarks about wind speed varia-
tions with height apply.
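
     As a rough illustration of how a trajectory is built from a time
series of characteristic winds, the sketch below advances an air parcel
through successive (speed, direction) pairs and measures the separation
between two trajectories, for instance a computed one and a tetroon track.
The function names, the one-hour step, and the flat x-y coordinate system
in meters are assumptions made for the example.

```python
import math


def wind_to_uv(speed, direction_deg):
    """Convert speed and meteorological direction (degrees the wind blows
    FROM, clockwise from north) to eastward (u) and northward (v) components."""
    theta = math.radians(direction_deg)
    return -speed * math.sin(theta), -speed * math.cos(theta)


def compute_trajectory(start_xy, winds, dt_seconds=3600.0):
    """Advance a parcel through a sequence of (speed m/s, direction deg)
    pairs, each held constant for dt_seconds. Returns a list of (x, y) in m."""
    x, y = start_xy
    path = [(x, y)]
    for speed, direction in winds:
        u, v = wind_to_uv(speed, direction)
        x += u * dt_seconds
        y += v * dt_seconds
        path.append((x, y))
    return path


def separation(path_a, path_b):
    """Distance between two trajectories at each common time step."""
    return [math.hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in zip(path_a, path_b)]
```

A growing separation between a trajectory computed from ground-based
winds and an observed one would be the kind of discrepancy that needs to
be accounted for.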

     Grid models utilize a wind field defined hourly on a two- or three-
dimensional grid over the time period of the simulation.  Definition  of
such a wind field requires both surface and upper air observations.   When
wind data aloft are not available, theoretical wind shear relationships must
be employed.  As noted above, no definitive guidance is  available  as  to the
appropriate number and location of monitors.  For general guidance, Hayes,
Reynolds, and Roth (1977) recommended a minimum of 8 to  12 monitors and a
desirable number of 12 to 20, making continuous measurements, for  charac-
terizing ground-level wind speeds and directions.  For vertical  wind  sound-
ings, they recommended a minimum of one location measuring four times per
day and a desirable number of three locations making measurements  six to

-------
   [Figure: map of the Los Angeles basin showing the measured tetroon
   trajectory and the calculated ground trajectory for September 30, 1969,
   with time labels (e.g., 1030, 1230) along the paths.  Stations shown:
   Burbank, Commerce, Downtown Los Angeles, Hollywood, La Canada,
   Los Angeles Int'l Airport, Lennox, Mission Hills, Pasadena, Venice,
   West Los Angeles.]

  Source:  Eschenroeder, Martinez, and Nordsieck (1972).

FIGURE IV-2.  COMPARISON OF CALCULATED GROUND TRAJECTORY WITH OBSERVED
              TETROON TRAJECTORY FOR THE LOS ANGELES BASIN

-------
eight times per day.  These recommendations were for a verification study
in a hypothetical urban area with a population of 1 million people and an
area of about 2500 km².  The features of a particular area that may
affect the number of monitors required are:

     >  Topography.  Where there is significant influence on the
        wind flow by terrain features, extra monitors may be
        required near those features to define the flow fields
        adequately.
     >  Emissions patterns.  Where there is high spatial variability
        in the emissions pattern, we can expect comparable
        variability in the pollutant concentration patterns.
        As a result, careful definition of the wind and air quality
        monitoring locations will be required.  Where emissions
        are evenly spread over a region, concentration patterns will
        be smoother and less sensitive to small fluctuations in wind
        speeds and directions.
     >  City size.  Cities covering large areas should have more
        monitors to treat adequately possible complexities in the
        pollutant concentration field.
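
     To show why the number and placement of wind monitors matter for a
grid model, the sketch below fills a grid with wind components
interpolated from station observations.  Inverse-distance weighting is
used here purely as an assumed, simple interpolation scheme; the report
does not prescribe a method, and the function names and station tuple
layout are hypothetical.  Wind components (u, v) are interpolated rather
than speed and direction, to avoid problems at the 0/360 degree
wraparound.

```python
import math


def idw_wind(stations, x, y, power=2.0):
    """Inverse-distance-weighted (u, v) at point (x, y).

    stations : list of (xs, ys, u, v) tuples, one per monitoring site
    power    : distance weighting exponent (assumed value)
    """
    num_u = num_v = denom = 0.0
    for xs, ys, u, v in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return u, v  # grid point coincides with a station
        w = 1.0 / d ** power
        num_u += w * u
        num_v += w * v
        denom += w
    return num_u / denom, num_v / denom


def wind_field(stations, x_coords, y_coords, power=2.0):
    """Two-dimensional field of (u, v), indexed as field[row_y][col_x]."""
    return [[idw_wind(stations, x, y, power) for x in x_coords]
            for y in y_coords]
```

With few stations, large parts of the field are determined by distant
observations, which is where terrain effects and emissions gradients are
most likely to be misrepresented.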

     Other meteorological data required for large, computer-based air quality
simulation models are temperature soundings (used to obtain mixing depth and
atmospheric stability information), insolation (solar radiation intensity),
and cloud cover.  Much the same considerations apply to these variables as
to wind measurements, but in general fewer measurements are required to
define their input values adequately.  Table IV-6 shows the meteorological
data requirements recommended by Hayes, Reynolds, and Roth (1977).  Satis-
factory mixing depth data are important because computed concentrations are
often nearly proportional to this variable.  Measurements of insolation and
cloud cover are required when photochemically reactive pollutants are to be
modeled or when they are needed to estimate atmospheric mixing characteristics.

-------
  TABLE IV-6.    ESTIMATED METEOROLOGICAL DATA REQUIREMENTS FOR EVALUATION
                 OF A LARGE AIR QUALITY SIMULATION MODEL IN A HYPOTHETICAL
                 URBAN AREA

                                                      Frequency of Measurements
                             Number of Stations        (observations per day)
       Parameter             Minimal    Desirable     Minimal        Desirable

Wind speed and direction      8-12       12-20        Continuous*    Continuous*
  (ground-based)
Vertical wind soundings        1           3              4             6-8
Temperature soundings          2           4              4             6-8
Insolation                     1           3          Continuous*    Continuous*
Cloud cover                    1           3              8              24

* Continuous indicates hourly averaged data.
Source:   Hayes, Reynolds, and Roth (1977).

-------
b.   Emissions  Data

     To calculate atmospheric pollutant concentrations, one must first have
full information on  the quantities of pollutants and precursors being emit-
ted.  Such  information is obtained in the form of an emissions inventory.
A  complete  inventory has data on emissions from stationary sources
(including  large point sources, such as power plants, and distributed area
sources such  as home heating furnaces), mobile sources (principally auto-
mobiles,  trucks, and aircraft), and natural sources (such as vegetation).

     The extent and degree of detail needed for an emissions inventory depend
on model requirements and the application.  Data on the emissions of a single
point source  should  be readily obtainable.  However, the assembly of a
detailed  emissions inventory for a number of pollutants over a large urban
area can  be an  expensive and time-consuming task, involving as much if not
more effort than the modeling exercise itself; general guidelines for pre-
paring an emissions  inventory are given in the reports by EPA (1973).  The
ease with which an emissions inventory can be assembled depends on what kind
of inventory  is already available.  If it is extensive as the result of on-
going surveillance activities, relatively little extra effort may be needed,
but if little information is available, a trade-off will  need to be made
between the effort involved to assemble a full inventory and the degradation
of results associated with an incomplete inventory.

     When data on emissions from a single large point source or a small
group of  sources are required, they can be obtained directly through stack
measurements or indirectly from knowledge of the throughput or operating
characteristics of the specific equipment.  If a large number of plants
are being modeled or specific information is not available, emissions can
be  calculated for general classes of sources through the use of the EPA's
"Compilation of Air  Pollutant Emission Factors" (EPA, 1972),  which gives
estimates of emissions for many types of equipment as a function of some
activity level such  as fuel  consumption.   These factors are averages estimated
with varying degrees of accuracy; in any case they would not be expected to be
as accurate as specific information on a particular piece of equipment.
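
     The emission factor approach described above amounts to multiplying
an activity level by a factor for the relevant class of equipment.  A
minimal sketch follows; the source names, activity levels, and factor
values are placeholders invented for the example and are not taken from
the EPA compilation.

```python
def estimate_emissions(sources):
    """Estimate emissions as activity level times emission factor.

    sources : iterable of (name, activity_level, emission_factor), where the
              factor is mass of pollutant per unit of activity
    Returns (per-source dict, total).
    """
    per_source = {name: activity * factor
                  for name, activity, factor in sources}
    return per_source, sum(per_source.values())


# Hypothetical inventory: activity in tons of fuel burned per year,
# factors in kg of pollutant per ton of fuel (placeholder numbers).
sources = [("boiler_a", 1000.0, 2.5), ("boiler_b", 400.0, 4.0)]
per_source, total = estimate_emissions(sources)
```

Because the factors are class averages, an estimate of this kind is
best replaced by stack measurements or equipment-specific data wherever
those are available.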

-------
     In construction of emissions inventories, small point sources are generally
aggregated and treated as area sources because individually each
one is too small  to have a significant impact,  but a  large  number of  such
sources spread over a few square kilometers will  contribute an  appreciable
amount of pollutants.  In some applications, where short-term-average con-
centrations are required, the temporal  variations in  emissions  from these
sources are important (for instance, residential  space heating  has  pro-
nounced seasonal  and diurnal  variations).  If an  annual average concentration
is sought, only larger-scale  temporal  fluctuations (e.g., seasonal) may be
required for the  model inputs.

     Mobile source emissions  are obtained  by combining traffic  and  vehicle
driving characteristics with emissions rates for individual types of vehicles
(EPA, 1978b).  Different levels of detail  are possible,  ranging from  an over-
all estimate of vehicle miles traveled (VMT) combined with  characteristics
of an assumed vehicle population,  to detailed characterization  of traffic
density and speeds for all major streets by time of day, together with emissions
data from a representative sample of the actual vehicle population.
Traffic and vehicle data are  often available from local highway departments
that collected them while planning transportation system  improvements.
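
     The simplest level of detail mentioned above, an overall VMT
estimate combined with an assumed vehicle population, can be sketched as
follows.  The vehicle classes, fleet fractions, and gram-per-mile rates
are hypothetical values chosen for the example, and the function name is
an assumption.

```python
def mobile_emissions(vmt_total, fleet_mix, emission_rates):
    """Total mobile source emissions in grams.

    vmt_total      : vehicle miles traveled in the period of interest
    fleet_mix      : dict of vehicle class -> fraction of VMT (sums to 1)
    emission_rates : dict of vehicle class -> grams emitted per mile
    """
    if abs(sum(fleet_mix.values()) - 1.0) > 1e-6:
        raise ValueError("fleet mix fractions must sum to 1")
    return sum(vmt_total * fraction * emission_rates[vehicle_class]
               for vehicle_class, fraction in fleet_mix.items())


# Hypothetical fleet and rates, for illustration only.
fleet_mix = {"car": 0.75, "truck": 0.25}
rates_g_per_mile = {"car": 4.0, "truck": 12.0}
daily_grams = mobile_emissions(1000.0, fleet_mix, rates_g_per_mile)
```

A more detailed inventory would apply the same calculation street by
street and hour by hour, with speed-dependent rates.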

     A relatively low level of detail in a mobile source emissions inventory
would be required if the contribution of traffic to annual average particulate
concentrations is to be studied.  In some models, merely an annual
average, not diurnal and seasonal variations in traffic densities, would be
required.   By contrast, if the formation of photochemical pollutants  is to
be followed over a wide area  on an hourly  basis,  an emissions inventory with
substantial spatial and temporal detail is required.   In  the absence  of
specific information, many assumptions must be  invoked, with the concomitant
possibility of degradation in the  quality  of model results. For example,
vehicular emissions are dependent  on vehicle operating characteristics.
Therefore, to account properly for photochemical  precursors emitted during
the morning rush hour, the traffic distribution and average speed  patterns
on surface streets and freeways during those hours are needed.   Note that
collection of such detailed data where they are not  previously  available can
be a time-consuming and expensive  task.

-------
     As pointed out above, any model application requires full  information
on emissions of all pollutants and precursors of interest.   This  specification
may be relatively simple in the case of a model used for one inert pollutant,
but for a complex model incorporating a comprehensive photochemical  reaction
mechanism, complete data may not be available.  For instance, the SAI  Airshed
Model currently requires data on hydrocarbon emissions divided into five
categories:

     >  Paraffins
     >  Ethylene
     >  Olefins excluding ethylene
     >  Aromatics
     >  Aldehydes and ketones.

Other photochemical models might require somewhat different divisions.
When such data are not available for all source types, either a special
study must be undertaken or estimates must be made based on surveys in
other areas, such as that reported by Trijonis and Arledge (1975).  However,
it would be fruitful to review the sensitivity of the model predictions
to the level of detail in these inventories before embarking on a
potentially costly data collection program.
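
     When only total hydrocarbon emissions are known, splitting them
into the five categories listed above requires a set of assumed
fractions.  The sketch below shows the bookkeeping; the fractions are
invented for the example and do not come from any survey, and the
category keys are simply the list above written as identifiers.

```python
HC_CLASSES = (
    "paraffins",
    "ethylene",
    "olefins_excluding_ethylene",
    "aromatics",
    "aldehydes_and_ketones",
)


def split_hydrocarbons(total_hc, fractions):
    """Split total hydrocarbon emissions into the five model categories.

    fractions : dict of category -> mass fraction; must cover every
                category and sum to 1
    """
    missing = set(HC_CLASSES) - set(fractions)
    if missing:
        raise ValueError(f"missing fractions for: {sorted(missing)}")
    if abs(sum(fractions[c] for c in HC_CLASSES) - 1.0) > 1e-6:
        raise ValueError("fractions must sum to 1")
    return {c: total_hc * fractions[c] for c in HC_CLASSES}


# Hypothetical split, for illustration only.
assumed_fractions = {
    "paraffins": 0.50,
    "ethylene": 0.05,
    "olefins_excluding_ethylene": 0.15,
    "aromatics": 0.20,
    "aldehydes_and_ketones": 0.10,
}
split = split_hydrocarbons(100.0, assumed_fractions)
```

Rerunning a model with several plausible sets of fractions is one way
to carry out the sensitivity review suggested above.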

c.   Air Quality Data

     Air quality data are required in a model evaluation study for two
purposes:  (1) to define background or upwind pollutant concentrations*
to which emissions from the sources of interest are added, and (2) to
provide the basis for comparison with model predictions.  Unfortunately,
data obtained for the latter purpose are of limited use for the former.
The air quality measurements most valuable for comparison with model pre-
dictions are generally those taken in the areas of highest pollutant con-
centrations in the modeling region, but these measurements are of little
value in determining background concentrations.
* In some models, it is also necessary to specify initial  pollutant
  concentration conditions throughout the modeling region.

-------
     Thus, the air quality data needs of the evaluation program are dependent
not only on the model being evaluated, which specifies the background or
boundary and initial conditions data needed, but also on the performance measure
to be used, which dictates the type and amount of air quality data needed
for comparison with predictions.   Information on required data for various
performance measures is contained in the companion report by Hayes (1979).

     Data to aid in the specification of background,  initial, or boundary
conditions are needed for all types of models.   For a Gaussian single source
model used for an inert pollutant, only information on the background con-
centration is required.  If the facility being modeled currently exists,
background measurements should be made in such a manner that they are not
influenced by emissions from that facility.   Such measurements should be
made under all conditions to which the model will  be  applied.   In the case
of a multiple-source model, the same principles apply, but additional monitoring
may be necessary to characterize the upwind background concentrations
adequately.  The number of monitors required to obtain this information is
related to the spatial variability of the upwind pollutant concentration field.

     Grid models, which need initial conditions for every cell and boundary
conditions throughout the period being modeled, require much more air
quality data than do multiple-source Gaussian models.  For the specification
of initial conditions,  an array of surface observations  is required.   Ver-
tical pollutant soundings are desirable to characterize  concentrations
aloft.   When these data cannot be collected, then  concentrations aloft must
be estimated based on the results of other pertinent  studies or inferred
from the ground-level observations.   Hayes,  Reynolds, and Roth (1977)  sug-
gested that up to ten stations making continuous measurements  was a desirable
level of monitoring for ground-based observations of air quality, while at
least three and up to eight vertical sounding sites making up to eight
measurements per day would be needed to characterize  concentrations aloft
adequately.  Of course, these inputs can be specified using fewer observations,
with possible attendant increases in their uncertainties.  These
numbers, suggested for an urban area of 1 million people and a 2500 km²
area, were based on experience with the SAI Airshed Model.

-------
d.   Data Quality

     Under this heading we consider all issues that relate to the suitability
of the data employed in the model evaluation effort.   We divide these con-
siderations into three broad categories:

     >  Resolution of the data
     >  Area and time period covered by the data
     >  Precision and accuracy of the data.

1)   Resolution of the Data

     Ideally, data should be spatially and temporally resolved to the degree
required by the model.  For techniques that utilize relatively simple alge-
braic expressions to calculate concentrations, such as Gaussian models,
point measurements like those obtained using a monitoring network are appro-
priate.  That is, the Gaussian formula can be used to estimate the concen-
tration at the exact point where the monitoring instrument is located.  Care
should be taken, however, that the pollutant concentrations measured for
comparison with predictions are due solely to the emissions source being
modeled.  Because Gaussian models are steady-state models, they do not pre-
dict any temporal concentration fluctuation; therefore, the time-averaging
of air quality data should be commensurate with that of the meteorological
inputs.
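
     For reference, a point estimate of the kind described above can be
sketched with the commonly used steady-state Gaussian plume expression
with ground reflection.  This is the standard textbook form, shown under
the assumption that the dispersion parameters sigma_y and sigma_z at the
receptor's downwind distance are supplied by the user; the report does
not give a specific formula, and the function name and argument names
are chosen for the example.

```python
import math


def gaussian_plume(q, u, sigma_y, sigma_z, y, z, h):
    """Steady-state concentration at a receptor (mass per unit volume).

    q       : emission rate (e.g., g/s)
    u       : wind speed at source height (m/s)
    sigma_y : horizontal dispersion parameter at the receptor distance (m)
    sigma_z : vertical dispersion parameter at the receptor distance (m)
    y       : crosswind offset of the receptor from the plume axis (m)
    z       : receptor height above ground (m)
    h       : effective source height (m)
    """
    if u <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("u, sigma_y and sigma_z must be positive")
    crosswind = math.exp(-0.5 * (y / sigma_y) ** 2)
    # The second exponential is the image source that models ground reflection.
    vertical = (math.exp(-0.5 * ((z - h) / sigma_z) ** 2)
                + math.exp(-0.5 * ((z + h) / sigma_z) ** 2))
    return q / (2.0 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical
```

The wind speed appears in the denominator, which is why errors in the
wind speed at source height carry directly into the computed
concentration.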

     For models that compute a concentration that is representative of the
spatial average over a cell, such as grid models, there is always a question
about the degree to which pollutant concentrations measured at a point in
the cell represent the average concentration in the entire cell.  If the
measurement is not representative, it cannot be readily compared with the
computed value, and thus the model's performance cannot be properly evalu-
ated.  Specific criteria governing monitor placement to ensure representa-
tiveness are not available, but as a general principle it is undesirable

-------
to locate the monitor in an area where the pollutant concentration gradient
may be large.  For example, it would be advisable to avoid placing a monitor
near a significant source of any  pollutant of  interest.  The averaging
time relevant to pollutants simulated by this  type of model is usually one
hour, and since the time steps used in  model calculations are of the order
of a few minutes, computed concentrations can  be time-averaged and com-
pared with the observations.
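
     The time-averaging step mentioned above can be sketched as
follows: concentrations computed at short model time steps are averaged
into one-hour means so that they match the averaging time of the
observations.  The function name and the choice to drop an incomplete
final hour are assumptions made for the example.

```python
def hourly_averages(concentrations, step_minutes):
    """Average consecutive model time steps into one-hour means.

    concentrations : sequence of computed values, one per model time step
    step_minutes   : model time step in whole minutes; must divide 60
    Any incomplete trailing hour is dropped.
    """
    if step_minutes <= 0 or 60 % step_minutes != 0:
        raise ValueError("step_minutes must be a positive divisor of 60")
    per_hour = 60 // step_minutes
    n_hours = len(concentrations) // per_hour
    return [sum(concentrations[i * per_hour:(i + 1) * per_hour]) / per_hour
            for i in range(n_hours)]
```

For a 5-minute time step, twelve consecutive values contribute to each
hourly mean.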

2)   Area and Time Period Covered by  the Data

     One aspect of data  quality that  must be given attention in the appli-
cation of models with large data  requirements  is the adequacy of coverage
in a spatial  and temporal sense.   In  most applications it seems inevitable
that some compromises in the amount of  data collected will need to be made,
because of lack of resources,  or  time,  or both.  As was pointed out above,
it is not possible with  current methods to design  objectively a data-
gathering network.   Thus, it is usually necessary  to rely in part on exper-
ience and judgment to specify the number and locations of monitors.  Gen-
erally, data  collection  for model  evaluation must  be more extensive than
for model exercise (Hayes, Reynolds,  and Roth,  1977).

     In view  of the probability that  the input  data base will prove to be less
comprehensive than is ideal,  the  sensitivity of the model to degradation
of data should be known.   This sensitivity can  be  studied as part of model
development activities,  or it can be  determined by making model computations
with various  levels of detail  in  an existing comprehensive data base, such
as that collected in the RAPS program for St.  Louis.  For instance, Table
IV-7 shows three possible levels  of detail for the data base for  input to
the SAI Airshed Model (Tesche, 1978).   Evaluation of model results using
these different data bases should enable determination of the kinds of
uncertainties that could be introduced in model predictions using less than
ideal data bases.

-------
TABLE IV-7.  LEVELS OF DETAIL IN DATA USED AS INPUT TO PHOTOCHEMICAL
             AIR QUALITY SIMULATION MODELS

Input:  Atmospheric stability

   Maximum Practical Level
     >  Continuous monitoring of mixing depths with acoustic sounder at
        one or more locations
     >  Several (5-8) vertical temperature soundings throughout the day
        at various locations within the modeling region
     >  Numerous surface temperature measurements recorded hourly at
        various locations throughout the modeling region
     >  One or more instrumented towers providing continuous measurements
        of the mixed layer thermal structure

   Commonly Used Level
     >  A few (3-5) temperature soundings at different times of the day
        at one or two locations
     >  Several surface temperature measurements recorded at various
        locations throughout the modeling region

   Minimum Acceptable Level*
     >  Twice daily temperature soundings at an airport within or nearby
        the region being modeled
     >  A few (1-3) surface temperature measurements with which to
        estimate temporal variation
     >  Limited spatial resolution or none at all

Input:  Wind fields

   Maximum Practical Level
     >  Numerous ground-based monitoring stations reporting hourly
        average values
     >  Frequent upper air soundings at several locations throughout the
        modeling region
     >  Continuous upper level measurements on one or a few elevated
        towers
     >  Wind, inversion, temperature, and terrain data used as input to
        the 3-D numerical model yielding the mass conserving 3-D wind
        field

   Commonly Used Level
     >  Interpolations from ground-based monitoring network and limited
        (3-5) number of upper level soundings at one or two locations
     >  Resultant wind field rendered mass consistent by divergence-free
        algorithm

   Minimum Acceptable Level*
     >  Interpolations from limited (3-5 stations) routine surface wind
        data; theoretically derived vertical profile assumed

Input:  Solar radiation

   Maximum Practical Level
     >  Several (3-5) UV pyranometers located in the region, continuously
        recording UV radiation levels
     >  Vertical attenuation of radiation at a few locations several
        times daily determined by aircraft observations
     >  Spatial (3-D) insolation fields determined by interpolation of
        measurements

   Commonly Used Level
     >  A single, ground-based net radiometer; insolation assumed
        constant over the region
     >  Vertical attenuation estimated empirically as a function of
        aerosol mass

   Minimum Acceptable Level*
     >  No radiation measurements available; estimated theoretical values
        based on the solar zenith angle
     >  Attenuation not accounted for

Input:  Boundary and initial conditions

   Maximum Practical Level
     >  Hourly species concentrations extrapolated and interpolated
        throughout the region using data from the extensive ground-based
        monitoring network; airborne data also available; hydrocarbon mix
        obtained from gas chromatographic analyses at several times
        during the day
     >  Sulfate concentrations available on an hourly basis at several
        locations

   Commonly Used Level
     >  Hourly concentrations extrapolated and interpolated using data
        from several ground-based stations; hydrocarbon mix obtained from
        gas chromatographic analysis at one or two stations one or a few
        times during the day
     >  Sulfate concentrations based on a daily average and diurnal
        ozone curve

   Minimum Acceptable Level*
     >  Hourly concentrations extrapolated and interpolated from a
        minimal routine monitoring network; either hydrocarbon mix
        assumed or average value obtained from a compilation of available
        data taken in a similar area
     >  No data on concentration variations aloft
     >  Sulfate measurements inferred from values obtained in similar
        areas

-------
TABLE IV-7 (Concluded)

Input:  Stationary source emissions

   Maximum Practical Level
     >  Separate gridded inventories for point and area stationary
        sources; characterization of organic composition, and NO/NO2 and
        SO2/SO4 emissions rates for major sources; diurnal and seasonal
        variations in nominal emissions rates for each major source type

   Commonly Used Level
     >  Lumped, gridded inventory for stationary sources; NO species
        fractionation; seasonal and diurnal variation in regional
        emissions for each pollutant

   Minimum Acceptable Level*
     >  Lumped stationary source emissions inventory for the region as a
        whole; limited information on the percentage of each source type;
        no temporal variation

Input:  Hydrocarbon species distribution

   Maximum Practical Level
     >  Mix obtained from gas chromatographic analysis of samples
        collected throughout the region, particularly near large sources
     >  Cold start factors applied grid by grid when calculating mobile
        source emissions

   Commonly Used Level
     >  Mix obtained from standard emissions factors (AP-42) together
        with a detailed source inventory, supplemented with one or two
        gas chromatographic analyses

   Minimum Acceptable Level*
     >  Mix assumed or obtained from available data compilation, either
        for the city of interest or some similar area

Input:  Mobile source emissions factors

   Maximum Practical Level
     >  AP-42 (latest supplement) emissions factors used in conjunction
        with local vehicle age distribution; corridor-by-corridor VMT,
        including peak and off-peak speed distributions, vehicle mix, and
        traffic data for intrazonal trips

   Commonly Used Level
     >  AP-42 emissions factors, assumed vehicle mix, and intrazonal VMT;
        estimated peak and off-peak speeds, fewer traffic counts
        available for verification, VMT available for fewer major
        arterials

   Minimum Acceptable Level*
     >  Gridded VMT, emissions factors estimated from 49 state mix, and
        average (FDC) driving profile; assumed regional speed
        distribution

Input:  Vehicular cold start distribution

   Maximum Practical Level
     >  Spatial and temporal distributions of cold starts inferred from
        actual traffic and demographic data

   Commonly Used Level
     >  Cold starts temporally resolved using traffic distribution; no
        spatial resolution or spatial resolution only from estimates of
        driving patterns

   Minimum Acceptable Level*
     >  Cold starts as a fixed percentage of all driving; traffic data
        are not detailed enough for spatial resolution of cold starts;
        cold starts estimated from demographic data

Input:  Model verification data

   Maximum Practical Level
     >  Hourly averaged species concentrations for NO, NO2, O3, SO2,
        NMHC, sulfate, CO, and particulates from an extensive
        ground-based monitoring network

   Commonly Used Level
     >  Hourly averaged concentrations of NO, NO2, O3, SO2, NMHC, CO, and
        particulates from several ground-based stations
     >  Daily averaged sulfate measurements available from a limited
        (3-5) number of stations

   Minimum Acceptable Level*
     >  Hourly averaged concentrations of NOx, O3, THC, SO2, and CO from
        a minimal routine monitoring network

* Using data at this level of detail necessitates numerous assumptions.
Source:   Tesche (1978).

-------
     In this context, we believe that it would be desirable for the  EPA
to consider maintaining a set of data bases suitable for evaluating  air
quality models.  These data bases could be used for general evaluation
of models and for studying model sensitivity.  They could be assembled  for
many typical model applications, such as to urban areas (e.g.,  St. Louis)
and to isolated point sources in flat terrain and complex terrain.   The
data bases could be updated as new and better information became available.

3)   Precision and Accuracy of the Data

     The third type of data quality issue is the precision and  accuracy
of the data used in model evaluation.  By "precision" we mean the uncertainty
of a datum about its stated value, and by "accuracy" we mean the bias in the
stated value—the difference between it and the true value.  We do not
provide an extended discussion of these topics here since they  are amply
covered in the standard statistical literature.  Meteorological, emissions,
and air quality data are all subject to error.  Table IV-8 shows the accu-
racy of meteorological measurement devices, and Table IV-9 and Exhibit
IV-2 present the published precision required of federal reference methods
for analyzing air pollutants.  More specific information on measurement
method precision can be found in many sources (e.g., Lawrence Berkeley
Laboratory, 1976; Burton et al., 1976; Roth et al., 1975).
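
     The distinction drawn above between precision and accuracy can be
illustrated with a short calculation.  The following is a minimal sketch,
assuming that repeated readings of a reference quantity with a known true
value are available; the function name and the sample readings are
hypothetical and are not taken from any of the cited sources.

```python
import statistics


def bias_and_scatter(readings, true_value):
    """Summarize repeated measurements of a quantity whose true value is known.

    Returns (bias, scatter):
      bias    -- mean reading minus the true value ("accuracy" in the text)
      scatter -- sample standard deviation of the readings ("precision")
    """
    mean_reading = statistics.mean(readings)
    bias = mean_reading - true_value
    scatter = statistics.stdev(readings)  # needs at least two readings
    return bias, scatter


# Hypothetical calibration check: five readings of a 10.0 pphm test gas.
readings = [10.2, 10.4, 10.3, 10.5, 10.1]
bias, scatter = bias_and_scatter(readings, true_value=10.0)
print(f"bias = {bias:+.2f} pphm, scatter = {scatter:.2f} pphm")
```

An instrument can show small scatter and still carry a large bias, which is
why the two quantities are reported separately.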

     The use of an acceptable and sufficiently precise measurement tech-
nique does not guarantee the collection of high quality data.  Proper data
quality control procedures are also required, including the following
activities:

     >  Procurement of adequate auxiliary equipment and supplies
     >  Calibration procedures
     >  Sampling and analysis procedures
     >  Data collection and reporting
     >  Calculation and data processing
     >  Preventive maintenance
     >  Data auditing procedures.

-------
TABLE IV-8.  VALUES AND SOURCES OF ERRORS IN METEOROLOGICAL MEASUREMENTS

                                           Scale Limitation                                    Approximate
Variable           Measurement Method      (wavelength)            Source of Error             Accuracy

Wind direction     Vane                    Micro-meso scale        Overshoot, system           ±5°
at ground level                            λ ~ 10 ft               resonance, calibration

                   Two or three            Wind tunnel turbulence  Electronics calibration     ±3°
                   orthogonal hot wires    scale                   and response time, system
                   or films                λ ~ 10^-4 ft            reactance, ambient
                                                                   temperature drift

                   Two or three sonic      Micro-scale             Electronics calibration,    ±8°
                   anemometers             λ ~ 10^-1 ft            pressure and temperature
                                                                   drift

Wind speed         Three-cup anemometer    Micro-meso scale        Systems inertia, spin-down  ±0.5 mph
at ground level    or propeller            λ ~ 10 ft               time (distance constant),
                                                                   calibration

                   Hot wire or hot film    Wind tunnel turbulence  Electronics calibration     ±1.0 mph
                                           scale                   and response time, system
                                           λ ~ 10^-4 ft            reactance, ambient
                                                                   temperature drift

                   Sonic anemometer        Micro-scale             Electronics calibration,    ±1.0 mph
                                           λ ~ 10^-1 ft            pressure and temperature
                                                                   drift

Temperature        Electronic (sonic,      Microscale-             Calibration, response lag   ±3°C
at ground level    thermistor, etc.)       micro-mesoscale
                                           λ ~ 10^-1 - 10 ft

Mixing or          Radiosonde              Mesoscale               Tracking errors (±0.28°)    ±50 ft
inversion height                           λ ~ 10^4 ft

Vertical           Radiosonde              Mesoscale               Sensing error               ±0.5°C
temperature                                λ ~ 10^4 ft
structure

Pressure           Radiosonde              Mesoscale               Sensing error               ±1.5 mb
                                           λ ~ 10^4 ft

Vertical wind      Radiosonde              Mesoscale               Tracking error (±0.28°)     ±4 mph,
structure                                  λ ~ 10^4 ft                                         ±0.7°

   Sources:  Data on scale limitation from Mazzarella (1972); data on source of error from MacCready and Jex (1967).
             Approximate accuracies taken from those references and also MRI (1975), Wtll (1974), and U.S. Army
             (1960).  Data on radiosonde measurements from Lenhard (1973).

-------
TABLE IV-9.  UNCERTAINTIES IN MEASUREMENTS OF POLLUTANT CONCENTRATIONS
             BY FEDERAL REFERENCE METHODS

Pollutant             Reference Method      Precision and Accuracy

SO2                   Pararosaniline        4.6% relative standard deviation
                                            at 95% confidence level

Particulates          High volume           Repeatability 3.0%, reproducibility
                      sampler               3.7% (based on collaborative
                                            testing).  Accuracy:  Error may be
                                            as high as ±50% of the measured
                                            concentration

Carbon monoxide       Nondispersive         Not given (analyzer must meet
                      infrared              specifications given in Exhibit IV-2)
                      spectroscopy

Photochemical         Ethylene              Not given (analyzer must meet
oxidants              chemiluminescence     specifications given in Exhibit IV-2)

Hydrocarbons          Flame ionization      Precision:  0.5% of full scale
(corrected for        detector              Accuracy:  1% full scale for higher
methane)                                    ranges; 2% full scale for lower
                                            ranges

Nitrogen dioxide      Colorimetric          Relative standard deviations:
                                            14.4% at 140 µg/m3 NO2; 21.5% at
                                            200 µg/m3 NO2

Source:  40 CFR §50, Appendices A-F (1975).

-------

                                                        Sulfur              Carbon    Definitions and
Performance parameter               Units*              dioxide   Oxidants  monoxide  test procedures

 1. Range                           Parts per million   0-0.5     0-0.5     0-50      §53.23(a)
 2. Noise                           Parts per million   0.005     0.005     0.50      §53.23(b)
 3. Lower detectable limit          Parts per million   0.01      0.01      1.0       §53.23(c)
 4. Interference equivalent                                                           §53.23(d)
    Each interferant                Parts per million   ±0.02     ±0.02     ±1.0
    Total interferant               Parts per million   0.06      0.06      1.5
 5. Zero drift, 12 and 24 hour      Parts per million   ±0.02     ±0.02     ±1.0      §53.23(e)
 6. Span drift, 24 hour
    20 percent of upper range limit Percent             ±20.0     ±20.0     ±10.0     §53.23(e)
    80 percent of upper range limit Percent             ±5.0      ±5.0      ±2.5      §53.23(e)
 7. Lag time                        Minutes             20        20        10        §53.23(e)
 8. Rise time                       Minutes             15        15        5         §53.23(e)
 9. Fall time                       Minutes             15        15        5         §53.23(e)
10. Precision                                                                         §53.23(e)
    20 percent of upper range limit Parts per million   0.01      0.01      0.5
    80 percent of upper range limit Parts per million   0.015     0.01      0.5

 * To convert from parts per million to micrograms per cubic meter at 25°C and 760 mm Hg,
   multiply by M/0.02447, where M is the molecular weight of the gas.
     Source:  40 CFR §53.20 (1975).
EXHIBIT IV-2.    PERFORMANCE SPECIFICATIONS  FOR AUTOMATED MEASUREMENT METHODS
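
     The conversion stated in the footnote to the exhibit (multiply parts
per million by M/0.02447 to obtain micrograms per cubic meter at 25°C and
760 mm Hg) can be written as a one-line function.  The following is a
minimal sketch; the function and constant names are our own, and the
0.12 ppm concentration is only an example value.

```python
# Molar volume of an ideal gas at 25 deg C and 760 mm Hg, in cubic meters
# per mole; this is the 0.02447 that appears in the exhibit footnote.
MOLAR_VOLUME_M3_PER_MOL = 0.02447

# Molecular weights (grams per mole) of the three gases in the exhibit.
MOLECULAR_WEIGHT = {"SO2": 64.06, "O3": 48.00, "CO": 28.01}


def ppm_to_ug_per_m3(ppm, molecular_weight):
    """Convert a concentration from ppm (by volume) to micrograms per m3."""
    return ppm * molecular_weight / MOLAR_VOLUME_M3_PER_MOL


for gas, weight in MOLECULAR_WEIGHT.items():
    print(gas, round(ppm_to_ug_per_m3(0.12, weight), 1), "ug/m3 at 0.12 ppm")
```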

-------
These items are discussed in the EPA's "Quality Assurance Handbook for
Air Pollution Measurements" (EPA, 1976).  All data collection should, at
a minimum, fulfill the requirements of this guidance.

5.   Assessment of the Need to Collect Additional Data

     As a result of the previous activities, the group responsible for model
evaluation now knows which pertinent existing data are available for the
study and what specific data are still needed.  In principle, data to meet
any identified requirement that cannot be satisfied from previous work will
need to be collected.

     It should be noted that although much of the data in the emissions
inventory is independent of meteorological conditions, the same is not
true of air quality data.  However, those elements of the emissions inven-
tory that are dependent on weather (e.g., space heating and electrical
power generation emissions depend on ambient temperature) should be
obtained for conditions corresponding to those used for the modeling study.
Most diurnal and seasonal variations in emissions, except diurnal traffic
patterns, are neglected in practice.  Air quality observations, of course,
depend on meteorological conditions in a very direct way, and thus the
air quality data and meteorological data should be obtained for the same
day.

     This stage of the model evaluation study is thus concerned with delin-
eating additional information required for the emissions inventory and
ascertaining whether sufficient corresponding meteorological and air quality
data exist to evaluate the model properly.  It will probably be found that,
for most areas, the emissions, meteorological, and air quality data require-
ments will be partially fulfilled by the existing data base.  The prime
concern will be to delineate any supplementary data-gathering efforts needed
to assemble a minimally acceptable data base for model evaluation.  Of course,
as pointed out above, only certain types of data can be added in a supple-
mentary effort.

-------
6.   Specification of Performance  Standards  and  Measures

     The rationale behind and the  process  of specifying performance  stan-
dards and measures are fully described  in  the companion volume  (Hayes,  1979).
Table IV-10 presents  the model  performance measures and standards developed
in that work,  which is one of the  most  systematic considerations of  measuring
model performance carried out to date.   While four generic types of  per-
formance measures were identified—peak measures, station measures,  area
measures, and  exposure/dosage measures—the  difficulty of measurement and
the unreliability of all but station measures leave station measures as the
only practical candidate for measuring  model  performance.  The  importance
of station measures in evaluating  model  performance emphasizes  the necessity
of choosing locations of monitoring stations  carefully.
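
     To make the station measures of Table IV-10 concrete, the following
is a minimal sketch of how they might be computed from paired hourly
predictions and observations.  It is not the procedure of Hayes (1979):
we assume here that the "deviation about the perfect correlation line" is
the simple difference between predicted and observed values, and the
function names, station names, and data are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def station_measures(predicted, observed, standard):
    """Compute station-based performance measures.

    predicted, observed -- dict: station name -> list of hourly concentrations
    standard            -- air quality standard, in the same units
    """
    # Station peak (value, hour) over all stations; the predicted and
    # observed peaks may occur at different stations.
    pred_peak, pred_hour = max(
        (value, hour) for series in predicted.values()
        for hour, value in enumerate(series))
    obs_peak, obs_hour = max(
        (value, hour) for series in observed.values()
        for hour, value in enumerate(series))

    # Normalized deviations for hours when either value exceeds the standard.
    deviations = []
    for station in predicted:
        for p, o in zip(predicted[station], observed[station]):
            if p > standard or o > standard:
                deviations.append((p - o) / ((p + o) / 2.0))
    absolute = [abs(d) for d in deviations]

    return {
        "peak_ratio": pred_peak / obs_peak,
        "peak_timing_difference_h": pred_hour - obs_hour,
        "bias_mean": statistics.mean(deviations),
        "bias_sd": statistics.stdev(deviations),
        "gross_error_mean": statistics.mean(absolute),
        "gross_error_sd": statistics.stdev(absolute),
        "temporal_r": {s: pearson(predicted[s], observed[s]) for s in predicted},
    }


# Hypothetical four-hour record at two stations (units: pphm).
predicted = {"A": [4, 10, 14, 8], "B": [5, 9, 12, 6]}
observed = {"A": [5, 12, 13, 7], "B": [4, 8, 10, 7]}
result = station_measures(predicted, observed, standard=10)
for name, value in result.items():
    print(name, value)
```

Because every quantity is computed only at monitoring stations, the result
is only as representative as the station network itself.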

C.   IDENTIFICATION OF THE SCOPE AND REQUIREMENTS OF MODEL EVALUATION

     At this point in the model evaluation study, all of the planning work
has been completed.  Before the execution  of the study, a detailed descrip-
tion of the required  tasks should  be assembled.  Such a description  will
ensure that nothing has been left  out,  that  the  various efforts are  coordi-
nated, and that necessary resources are allocated.  Items that  should be
included for this definition phase are:

     >  Delineation of required model characteristics.
     >  Description of any necessary model modifications.
     >  Listing of available resources.
        - Money.
        - Manpower.
        - Equipment.
        - Data.
     >  Description of requirements for further  data collection.
     >  Description of the analyses to  be  conducted with the
        model  results.
     >  Description of the performance  standards and measures  to
        be used.

-------
TABLE IV-10.  MODEL PERFORMANCE MEASURES AND STANDARDS

Performance Attribute Performance Measure*                            Performance Standard

Accuracy of the       Ratio of the predicted station peak to the      Limitation on uncertainty in aggregate health
peak prediction       measured station peak (could be at different    impact and pollution abatement costs§
                      stations)

                      Difference in timing of occurrence of station   Model must reproduce reasonably well the
                      peak†                                           phasing of the peak--say, ±1 hour

Absence of systematic Average value and standard deviation of the     No or very little systematic bias at
bias§                 mean deviation about the perfect correlation    concentrations (predictions or observations)
                      line, normalized by the average of the          at or above the NAAQS; the bias should not be
                      predicted and observed concentrations,          worse than the maximum bias resulting from
                      calculated for all stations during those        EPA-allowable calibration error (~8 percent is
                      hours when either the predicted or the          a representative value for ozone); also, the
                      observed values exceed the NAAQS                standard deviation should be less than or
                                                                      equal to that of the difference distribution
                                                                      between an EPA-acceptable monitor** and an EPA
                                                                      reference monitor (3 pphm is representative
                                                                      for ozone at the 95 percent confidence level)

Lack of gross error   Average value and standard deviation of the     For concentrations at or above the NAAQS, the
                      absolute mean deviation about the perfect       error (as measured by the overall values of
                      correlation line, normalized by the average     the average and standard deviation of the
                      of the predicted and observed                   absolute mean normalized deviation about the
                      concentrations, calculated for all stations     perfect correlation line) should not be worse
                      during those hours when either the predicted    than the error resulting from the use of an
                      or the observed values exceed the NAAQS         EPA-acceptable monitor**

Temporal correlation† Temporal correlation coefficients at each       At a 95 percent confidence level, the temporal
                      monitoring station for the entire modeling      profile of predicted and observed
                      period and an overall coefficient averaged      concentrations should appear to be in phase
                      for all stations                                (in the absence of better information, a
                                                                      confidence interval may be converted into a
                                                                      minimum allowable correlation coefficient by
                                                                      using an appropriate t-statistic)

Spatial alignment     Spatial correlation coefficients calculated     At a 95 percent confidence level, the spatial
                      for each modeling hour considering all          distribution of predicted and observed
                      monitoring stations, as well as an overall      concentrations should appear to be correlated
                      coefficient average for the entire day

 * There is deliberate redundancy in the performance measures.  For example, in testing for systematic bias, the
   mean and standard deviation of the mean (signed) deviation are calculated.  The latter quantity is a measure
   of "scatter" about the perfect correlation line.  This is also an indicator of gross error and should be used in
   conjunction with the estimates of the mean and standard deviation of the absolute mean deviation about the perfect
   correlation line.
 † These measures are appropriate when the chosen model is used to consider questions involving photochemically
   reactive pollutants subject to short-term standards.
 § These may not be appropriate for all regulated pollutants in all applications.  When they are not, standards
   derived based on pragmatic/historic experience should be employed.
** By "EPA-acceptable monitor" we mean a monitor that satisfies the requirements of 40 CFR §53.20.

 Source:  Hayes (1979).

-------
This task represents the transition between the planning and  the execution
of the model evaluation study.   The formal  description  of the complete
model evaluation study will  permit the identification of possible flaws
in the modeling approach and compromises  to be  made  in  evaluating model
performance.

D.   PERFORMANCE OF THE MODEL EVALUATION

1.   Adapting the Model for Use in the Study

     This task consists of those operations necessary to  prepare  the model
so that the data can be input to the model  and  translated into a  set of
concentration predictions for the area of interest.  The  operations involved
are:

     >  Installation of the  model  on a computer
     >  Checking of model operation using test  cases
     >  Selection of optional model features
     >  Adaptation of algorithms to specific area and application.

We orient our discussion to  air quality models  that  require a computer
because all models except some of the simplest exist in the form of com-
puter programs.   The first operation,  that  of installing  the  model program
on the computer, can be easy if a version of the model  that is compatible with
the particular machine exists.   If it does  not, advice  from computer experts
should be sought to estimate the time and effort required to  convert the
program.   These estimates should be compared with the cost of acquiring time
on a computer of a more suitable type, so that  a decision can be  made as
to where to run the program. Model developers  should attempt to  make their
computer codes as portable as possible to ensure that a  reasonable choice
of computer hardware options is available to the user.

     After the program is installed and running, its operation with actual
data should be verified if possible.  If the program has been used before,
 input data from a previous successful  run  can  be  used  for check-out.   It
 is  important at this stage to examine  all  parts of the  program operation,
 so  that one can be confident that the  codes  are properly installed  in  the
 host computer.

      Another part of adapting a model  for  use  in  the contemplated study is
 to  modify the model to cover the area  of interest adequately and to treat the
 physical and chemical phenomena of importance  in  the study  area.  For
 instance, the coordinates of the boundaries  of the modeling region  must be
 included in the input for a model that predicts a concentration  field  over
 a given area.  For the Climatological Dispersion Model (CDM), a long-term
 Gaussian model, this amounts to specifying the coordinates  of a  set of
 receptors at whose position the pollutant  concentrations will be computed.
 For a grid model, the boundaries of the modeling  region must be  chosen, as
 well as the grid size.  Smaller grid sizes give more resolution  of  output
 but take more computer time.  (Of course,  the  resolution of the  output is
 also dependent on suitable resolution  of the input data.)

      To illustrate the influence of grid size  on  model  output, we present
 in  Figure IV-3 surface concentration isopleths calculated by the LIRAQ
 Model (MacCracken et al., 1977) for CO using a 1-kilometer  grid  and
 a 5-kilometer grid, respectively.  There is  much  more detail  in  the
 1-kilometer grid results than in the 5-kilometer  results.   Comparison  of the
 two figures shows the same broad trends, though calculation using the
 coarser grid produces lower predictions of peak concentrations.* The  size
 of  the grid used should be compatible  with the resolution of the available
 input data as well as the desired resolution of the output.   In  general, a
 grid model is capable of resolving only features  in the concentration  field
 that have spatial scales of at least two grid cell dimensions.
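
      The lowering of peak values on a coarser grid can be illustrated
 without running a model.  The following is a minimal sketch that simply
 averages a fine-grid field over larger cells; it is not a LIRAQ
 calculation, a coarse-grid model run would also differ in its inputs, and
 the field values and function names are hypothetical.

```python
def block_average(field, factor):
    """Average a 2-D list of cell values over factor-by-factor blocks."""
    n_rows, n_cols = len(field), len(field[0])
    coarse = []
    for r0 in range(0, n_rows, factor):
        coarse_row = []
        for c0 in range(0, n_cols, factor):
            block = [field[r][c]
                     for r in range(r0, min(r0 + factor, n_rows))
                     for c in range(c0, min(c0 + factor, n_cols))]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def peak(field):
    """Largest cell value in a 2-D list."""
    return max(max(row) for row in field)


# Hypothetical 10 km x 10 km area on a 1-km grid: a uniform background of
# 1 ppm CO with a single 9 ppm cell near a strong source.
fine = [[1.0] * 10 for _ in range(10)]
fine[2][3] = 9.0
coarse = block_average(fine, 5)  # the same area on a 5-km grid
print("1-km peak:", peak(fine), " 5-km peak:", peak(coarse))
```

 The single high cell is diluted over 25 cells on the coarser grid, so the
 coarse-grid peak lies much closer to the background value.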

 * We note that the lower values of the peak predicted concentrations are
   expected because the model predictions represent an average over a much
   larger area when the 5-kilometer grid is employed.  This has the effect
   of averaging out the peak values.

[FIGURE IV-3.  SURFACE CO CONCENTRATIONS CALCULATED FOR THE SAN FRANCISCO
BAY AREA AT 1400 ON 26 JULY 1973 USING THE LIRAQ AIR QUALITY MODEL AT
DIFFERENT GRID SIZES:  (a) 1-Kilometer Grid (plot title: LIRAQ-1 JULY
REGION 6 VERIFICATION); (b) 5-Kilometer Grid (plot title: LIRAQ-1 JULY
5 KM REGION 6 RUN).  Source:  MacCracken et al. (1977).]

     Finally, it is necessary to adapt model algorithms for the application.
For instance, any model that deals with point sources has embedded in it a
provision for incorporating a plume rise algorithm.  Many plume rise algo-
rithms have been proposed in the technical literature.  Because there is
 some latitude in the selection of a plume rise algorithm for a given appli-
 cation, the choice should involve comparison of values computed by different
 algorithms with observations.  Other algorithms that may need adaptation for
 a particular situation are:

     >  The technique used to construct wind inputs given the
        available meteorological observations.
     >  The method for estimating mixing depths.
     >  The turbulent dispersion or diffusivity algorithm.
     >  The kinetic mechanism employed to treat chemical  reactions.
     >  The algorithms for generating emissions inputs.

 In general, the user should examine all algorithms in the model  in light
 of the current understanding of physical and chemical processes  known to
 be important in the study area.
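
      As an illustration of comparing plume rise algorithms with
 observations, the following is a minimal sketch.  This report does not
 prescribe particular algorithms; we assume here two formulas commonly
 cited in the literature (the Holland formula and the Briggs "two-thirds
 law" for buoyant rise), and the stack parameters and the observed rise
 are hypothetical.

```python
G = 9.81  # gravitational acceleration, m/s^2


def holland_rise(v_s, d, u, t_s, t_a, p_mb):
    """Holland formula: plume rise (m) from stack exit velocity v_s (m/s),
    stack diameter d (m), wind speed u (m/s), stack gas and ambient
    temperatures t_s and t_a (K), and pressure p_mb (millibars)."""
    return (v_s * d / u) * (1.5 + 2.68e-3 * p_mb * (t_s - t_a) / t_s * d)


def briggs_rise(v_s, d, u, t_s, t_a, x):
    """Briggs "two-thirds law": buoyant plume rise (m) at downwind distance
    x (m), before the plume reaches its final height."""
    buoyancy_flux = G * v_s * (d / 2.0) ** 2 * (t_s - t_a) / t_s  # m^4/s^3
    return 1.6 * buoyancy_flux ** (1.0 / 3.0) * x ** (2.0 / 3.0) / u


def closest_to_observation(estimates, observed_rise):
    """Return the name of the estimate nearest the observed plume rise."""
    return min(estimates, key=lambda name: abs(estimates[name] - observed_rise))


# Hypothetical stack and a hypothetical observed rise of 70 m at 500 m downwind.
stack = dict(v_s=10.0, d=3.0, u=5.0, t_s=400.0, t_a=290.0)
estimates = {
    "holland": holland_rise(p_mb=1000.0, **stack),
    "briggs": briggs_rise(x=500.0, **stack),
}
print(estimates, closest_to_observation(estimates, observed_rise=70.0))
```

 In practice the comparison would be repeated over many observed cases and
 meteorological conditions before one algorithm is selected.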

 2.   Gathering, Assembling, and Formatting the Required Data

     We discuss gathering, assembling, and formatting data together, since
 they are obviously highly interrelated.

     As the data base is  assembled,  it is codified and formatted  for use by
the  computer program.  The first part of this task consists  of collecting
existing data from emissions inventories and meteorological  and air  quality
data bases, as discussed in Section IV.B.3.  Then additional data, as dis-
cussed in Section IV.B.5, may need to be gathered.  For the emissions inven-
tory, the requisite traffic surveys should be undertaken, and necessary
data pertaining to industrial  activities and land  use should be  collected.
For  the meteorological  and air quality data collection,  supplemental monitor-
ing  should be carried out for the necessary periods of time.

-------
      Data assembly and formatting include the following activities:

      >  Evaluate and verify the  data
      >  Load the data on  the computer
      >  Prepare the data  bases for model  use.

 Data evaluation and verification involve examination of the collected data
 to ensure that adequate quality  control  procedures  were applied  in their
 collection.   The data validation procedures discussed  in the  EPA's Quality
 Assurance Handbook (EPA,  1976)  should be followed.  For simple models such
 as the Gaussian, loading the data and preparing them for the  model can entail
 just punching a few computer cards.   In  the case of a  large,  complex model,
 the data are entered into the computer and then used as input to a set of
 data preparation programs, which prepare the necessary data files for use
 with the simulation program.  These  data files  can  be  checked for errors
 and consistency before the main  program  is exercised.   At the conclusion
 of this phase of the evaluation, a complete data base  has been assembled
 for exercising the model  under the appropriate  conditions.

 3.   Exercising the Model

      In this task, the computer  runs  necessary  to evaluate the model are
 carried out.   If all of the preceding tasks have been  executed properly,
 no further difficulties should be encountered.   All results of the model
 runs should  be saved on computer files for later access.

      At this point, selected performance measures will  be calculated (as
 discussed in Section IV.B.6).  Although  these measures  will be the means
 whereby the  model  will  be judged to be adequate  or not,  all other results
 produced by  the  model  runs must  be saved  since  they will  be needed for
 further analysis  should model performance  be unsatisfactory.

 4.    Analyzing the Results of the Evaluation

     The first step  in analyzing  the results of the model evaluation is to
compare each computed performance measure with the appropriate performance
standard.   If the measure meets the standard,  the  model  is  considered ver-
ified, and one can conclude that model  performance is satisfactory for the
application.  Thus, the user can proceed with  some confidence that the
verified model will produce reliable results  for the intended application.

     If a performance measure fails to  meet a  performance standard in some
respect, the model is not considered verified  and  further analysis is indi-
cated.  (Even if the measure meets the  performance standard,  further anal-
ysis as outlined below will give much useful  information about model
behavior and is worthwhile to carry out if possible.)
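
     The comparison of computed measures with their standards can be
sketched as a simple range check.  In the following minimal sketch the
measure names and the acceptable ranges are hypothetical; the ±1 hour and
8 percent figures echo the representative values in Table IV-10 but are
not proposed here as standards.

```python
def check_against_standards(measures, standards):
    """Compare each computed performance measure with its standard.

    measures  -- dict: measure name -> computed value
    standards -- dict: measure name -> (lowest, highest) acceptable value
    Returns (verified, failures); the model counts as verified only when
    every measure falls inside its acceptable range.
    """
    failures = [
        name
        for name, (lowest, highest) in standards.items()
        if not lowest <= measures[name] <= highest
    ]
    return not failures, failures


# Hypothetical standards and computed measures.
standards = {
    "peak_timing_difference_h": (-1, 1),
    "normalized_bias": (-0.08, 0.08),
}
measures = {"peak_timing_difference_h": 2, "normalized_bias": 0.03}
verified, failures = check_against_standards(measures, standards)
print("verified:", verified, "failed measures:", failures)
```

The list of failed measures indicates where the further analysis of
residuals should begin.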

     The analysis of the model  evaluation results  should center on the dif-
ferences between computed and observed  pollutant concentrations, i.e., the
residuals.  These residuals can arise from three sources:

     >  Errors in the input data (emissions data,  meteorological
        data, initial and boundary conditions,  and background
        concentrations).
     >  Errors in the formulation of the model  (approximations
        made in modeling pollutant dispersion,  transport,  or
        chemical transformations).
     >  Errors in the air quality measurements  used for  comparison.

The magnitudes of the first and third sources  of error can  be estimated
from the characteristics of the methods by which they are measured,  using
standard statistical techniques.  In addition to measurement errors that
result from statistical fluctuations in measurement devices, nonzero resi-
duals can result from the extrapolations and interpolations necessary
to generate a complete model input data set from insufficient data.  The
errors introduced by these extrapolations and  interpolations  are more dif-
ficult to quantify than instrumental errors.   They can be investigated by
carrying out an evaluation of the interpolation algorithm similar in struc-
ture to that described earlier for the  full air quality  model.

-------
     Discrepancies introduced by shortcomings in the model's  formulation
are difficult to evaluate because we have no "true" model  of  the  relevant
atmospheric processes with which to compare it.   Also, in  light of the
uncertainties of the input and comparison data,  error due  to  model  formu-
lation cannot be isolated from the total modeling process.

     The air quality data used for comparison with model outputs  introduce
an uncertainty that can be quantified.  Because this uncertainty  stems  from
the measurement process, it can be determined by replication  of measurements
and standard statistical analysis.

     Careful analysis of the residuals can yield much useful  information
about the model, even if quantitative statements about sources of error
cannot be made.  Several different ways of analyzing residuals can be
informative [see, for example, Koch and Thayer (1971)]. Plots of residuals
against time of day can reveal systematic biases, which might result from
one of a number of assignable causes, such as an inadequate kinetic mechanism
in a photochemical model.  Dependence of the magnitude of  residuals on  con-
centration might indicate that a monitoring location is poorly placed to
detect a large area-wide concentration level, or that the wind field gen-
erated by the model mislocates the plume from a particular source.  Differences
between residual dependencies for primary and secondary pollutants can  be
used to infer deficiencies in the kinetic mechanism or dispersion processes.
The examples given here are only a sampling of possible situations; this
type of analysis should be guided by the particular situation and a knowledge
of the model's characteristics.
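
     A tabulation of residuals against time of day, the numerical
counterpart of the plot described above, can be sketched as follows.
This is a minimal sketch: residuals are taken as computed minus observed
concentrations, and the record layout and data are hypothetical.

```python
from collections import defaultdict
import statistics


def mean_residual_by_hour(records):
    """Average the residual (computed minus observed) for each hour of day.

    records -- iterable of (hour, computed, observed) tuples, pooled over
               all stations and days of interest
    """
    by_hour = defaultdict(list)
    for hour, computed, observed in records:
        by_hour[hour].append(computed - observed)
    return {hour: statistics.mean(values)
            for hour, values in sorted(by_hour.items())}


# Hypothetical records (hour, computed, observed), concentrations in pphm.
records = [(8, 5, 4), (8, 6, 4), (14, 10, 13), (14, 11, 12)]
for hour, residual in mean_residual_by_hour(records).items():
    print(f"hour {hour:02d}: mean residual {residual:+.1f} pphm")
```

A mean residual that keeps the same sign at the same hours across many
days suggests a systematic bias rather than random scatter.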

     Other statistically based analysis methods that can be used  to study
the evaluation results are:

     >  Plots of residuals against exogenous variables.
     >  Scatter plots of observed and computed concentrations.
     >  Correlation between observed and computed concentrations.
     >  Nonparametric tests of location to indicate possible  bias
        of computed concentrations relative to observations.
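
     Two of the methods listed above lend themselves to short
calculations.  The following is a minimal sketch, assuming a two-sided
sign test as the nonparametric test of location and the usual relation
between a t-statistic and a correlation coefficient, r = t/sqrt(t^2 + n - 2),
for n data pairs; the function names and data are hypothetical, and the
critical value of t must be supplied from a standard table.

```python
import math


def sign_test_p_value(residuals):
    """Two-sided sign test of the hypothesis that positive and negative
    residuals (computed minus observed) are equally likely."""
    positives = sum(1 for r in residuals if r > 0)
    negatives = sum(1 for r in residuals if r < 0)
    n = positives + negatives  # zero residuals are dropped
    k = min(positives, negatives)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def minimum_correlation(t_critical, n_pairs):
    """Smallest correlation coefficient that is significant for n_pairs
    data pairs, given the critical t value for n_pairs - 2 degrees of
    freedom at the chosen confidence level."""
    return t_critical / math.sqrt(t_critical ** 2 + n_pairs - 2)


# Hypothetical residuals: nine overpredictions and one underprediction.
residuals = [0.5] * 9 + [-0.2]
print("sign test p-value:", round(sign_test_p_value(residuals), 4))

# 24 hourly pairs; 1.717 is the one-sided 95 percent critical value of t
# for 22 degrees of freedom.
print("minimum r:", round(minimum_correlation(1.717, 24), 3))
```

A small p-value from the sign test indicates that the computed
concentrations are biased relative to the observations.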

-------
 5.   Assessing the Need To Perform Further Evaluation

     Upon completion of the analysis of the results of the evaluation, some
 indications of possible inadequacies in the model should be apparent.  This
 diagnosis can result in various actions:

     >  Reformulate model.  Instead of switching to a completely
        different model, consideration can be given to reformulat-
        ing parts of the chosen model.  For instance, the plume
        rise algorithm can be changed, or an interpolation algo-
        rithm for input data can be altered.
     >  Carry out additional verification.  If a performance  measure
        upon evaluation does not seem appropriate,  alternative
        choices could be identified and evaluated.   In addition,
        consideration could be given to examining more cases.
     >  Collect more data.  If deficiencies in the  input data are
        identified, supplementary data collection should be insti-
        tuted and the model should be reevaluated.   Further data
        collection can also augment air quality data used for com-
        parison purposes and to evaluate the performance measures.
     >  Reexamine the choice of model.  If it is shown that the model
        being considered is inadequate for the purpose and no other
        recourse is possible, consideration could be given to using
        a different model.

 It is likely that the data already collected would  support a  subsequent
 evaluation study of a new  or reformulated model.  Thus, the time and effort
 that went into the original evaluation study would  by no means  be  wasted
 even if the initially chosen model is found to perform inadequately.

 6.   Evaluating the Adequacy of the Model

     Basically, there are two possible outcomes of the evaluation:  The
model is satisfactorily verified or it is not.  In  the former case, the
model can be used for the application with some confidence that it will
 produce satisfactory results.  In addition, if the analysis of the evalua-
 tion study results was done thoroughly, much information was gained as to
 the operation  of the  model and  the situations for which it might not give
 acceptable  results.

      In  the case where  the model was  not  verified, the preferable action
 is to reconsider the  use of  the model  in  the form tested.  Action to rectify
 deficiencies should be  taken before the model is applied in further studies.
 However, circumstances  may dictate that,  even though unverified, the model
 must  be  used anyway.  For example, the analysis may be subject to deadlines
 imposed  by  governmental regulations.   Perhaps the model represents the cur-
 rent  state  of the art and no improvements are possible without a major
 research effort.  Whatever the  reason, if the model is used, the previous
 analyses may pinpoint some of the  deficiencies in the model that caused
 it to fail  to meet the  required performance  standards.  These deficiencies
 should be fully detailed in  the account of its use.  To provide some
 assistance  and guidance to the  model  user under  such circumstances,
 we suggest  that a central model evaluation group be formed.  The model
 user could  seek help from this  group should  the  model evaluation be
 unsuccessful.

 E.    EVALUATION FOR SCREENING APPLICATIONS

      Air quality model  usage can be segmented into screening and refined
 applications.   As an  example of the former,  a relatively simple model might
 be used  to  identify the potential  existence  of an air quality problem.  If
 a  more detailed or refined analysis is contemplated, then a more sophisti-
 cated model  would be  employed to achieve  the requisite accuracy in the
 computed results.

     Although we have discussed model evaluation in the context of refined
usage, this does not imply that screening applications do not merit or
require some appropriate type of evaluation.  The performance of some pos-
sible screening models, such as rollback and the Empirical Kinetic Modeling
Approach (EKMA), cannot be readily evaluated owing to the extreme paucity
(if not unavailability) of the requisite data.*  For other screening models,
the evaluation process should be limited to establishing that model perform-
ance is adequate for screening purposes.  We suggest that, to the extent
possible, all screening models be subjected to a comprehensive evaluation
prior to their adoption for such usage.  This evaluation could generally
follow the guidelines described in this chapter.  In addition to establishing
the performance characteristics of these models, it will be necessary to
identify appropriate input data requirements to ensure adequate performance.

     Air quality models are made  up  of many  component parts.  Occasionally
there could be a need to evaluate particular parts, or  "modules,"  of  a
model.  The recommendations in  this  chapter  have  been formulated with evalua-
tion of the complete model in mind.   However,  there is  no reason why  the
methods should not be used to evaluate particular modules,  if the  necessary
data bases can be assembled. The principles of such  a  module-by-module
evaluation are similar  to those laid out  here.

F.   PERSPECTIVE

     As we have shown in this chapter, many issues must be properly
taken into account in carrying out an air quality model evaluation.
To aid the reader in placing the material in this chapter in perspec-
tive, we close with a brief retrospective on the context of model evaluation
studies.

     It seems likely that in the  future air  quality model usage will  be  wide-
spread.  For example, the need  to demonstrate  compliance  with schedules  for
improving air quality will be one problem that will require the use of model-
ing.   Extensive model use will  create a need for  adequately verified  models
of all types.  At present, there  are no formal procedures specified for  model
evaluation, though individual groups have carried out numerous  evaluation
studies.  These past studies have lacked  common bases and goals; thus, they
do not yield much information of a general nature.
* Plans for a study to assess the feasibility of employing relatively sophis-
  ticated air quality models to determine the performance characteristics of
  simpler procedures such as EKMA are described by Tesche (1978).
                                   77

-------
     In the discussion in this chapter, we have attempted to lay out in
some detail a conceptual  framework for model performance evaluation and to
identify the component parts of such a study.  The current needs for model
evaluation are:  (1) to specify in detail all of the steps of an evaluation
study, (2) to develop formal procedures and techniques where they do not
now exist, (3) to prepare a guidelines document, and (4) to implement those
procedures.  In the final chapter we discuss some needs in the areas of
institutional requirements, technical development, and documentation.
                                   78

-------
                         V    RECOMMENDATIONS
     In this report we have outlined the relevant issues  in  air quality
model evaluation, setting out a comprehensive structure for  carrying  out
such an evaluation.  Inasmuch as this work represents  a first  effort  to
systematically address all of the relevant issues,  it  has provided  an
opportunity to discover that there are significant  gaps in the background
information required for successful model evaluation.  In this chapter we
list items that we have identified as necessary  adjuncts  to  establishing
model evaluation as a routine element of air quality model use.

     The recommendations are divided into three  sections: institutional,
technical development, and documentation.   In the first section, we
describe a set of functions for a model  evaluation  group  that  might be
established by an appropriate private or governmental  body.  The second
section contains suggestions for technical development work  that might
be undertaken to clarify those aspects of model  evaluation in  which,
currently, judgment must be substituted for applicable study results.
In the final section we list guideline documents that we believe should
be available to the future model user who wishes to evaluate properly
the model he intends to use before applying it.

A.   INSTITUTIONAL NEEDS

     We believe that it would be highly  desirable to set  up  a  group of
experts who would be responsible for providing assistance in air quality
model evaluation within the model  development and user communities.  This
group, which could consist of experts in air quality and  many  different
fields, could offer assistance in carrying out model evaluations.  In addition,
they could call on other outside experts when necessary.  The functions
of such a group might include:

     >  Establishing guidelines and practices for all  aspects of
        model performance evaluation.
     >  Supplying expertise in setting up air quality model eval-
        uation studies and assisting in such studies.
     >  Coordinating (and possibly supplying) funds and equipment
        necessary for model evaluation work.
     >  Maintaining a central information exchange on the status
        of air quality model evaluation results.
     >  Developing, evaluating, and maintaining adequate and diver-
        sified data bases for future model evaluations.
     >  Making judgments on the need for evaluative studies in
        specific cases.
     >  Assigning responsibilities for model evaluations.
     >  Compiling, maintaining, and updating a set of guidelines
        documents covering all aspects of model evaluation.
     >  Developing guidelines for model selection based on evalua-
        tion experience.
     >  Determining the disposition of cases in which performance
        evaluation is unsatisfactory.

To carry out the above tasks, this model evaluation group will need to have
access to air quality experts with specialized knowledge in the following
fields (cf. Chapter IV, Section B.2.b):

     >  Meteorology
     >  Analytical chemistry
     >  Computer programming
     >  Statistical analysis
     >  Air quality analysis.
                                   80

-------
The group should be staffed at a  level  commensurate  with  the  amount  of ongoing
modeling and model  development activity.   Some  extra effort may  be desirable
to get initial  studies  underway;  however,  once  some  experience is gained,  it
may be possible to  decrease the commitment of resources.

B.   AREAS FOR  TECHNICAL  DEVELOPMENT

     Several  areas  pertinent to model evaluation studies  currently lack a
sound and complete  technical  basis.  Consequently, evaluation study  design
drawing on these areas  necessarily  relies  heavily on the  judgment of experi-
enced scientists rather than on pertinent  results.   We list here those areas
that we have identified,  in developing  our model evaluation procedures, as
meriting further study.

     >  Determination of  the circumstances under which model
        evaluation  is not mandatory.  As pointed out in Chapter  IV,
        a model that is to be applied in a situation for  which
        a successful evaluation has previously  been  carried out
        might not require further evaluation.   The specific cir-
        cumstances  under  which a  previous  evaluation would be
        transferable should be investigated.
     >  Determination of  the amount and quality of data needed
        to carry out a  model  evaluation.   In  general, the amount
        of data needed  to evaluate  a model  is greater than that
        necessary for a routine study.  The relationship  between
        model performance standards and measures on  one hand, and
        adequacy of data  to calculate them on the other,  should
        be investigated.
     >  Definition  of an  appropriate period of  record for sto-
        chastically varying quantities  such as  meteorology.   This
        would be applicable to long-term average models.
     >  Development of  uniform programming standards for  the  com-
        puter programs, to ensure orderly  transfer of air quality
        models  between  different  user groups.

-------
     >  Study of the number of monitors required for a model
        evaluation and their optimum placement.  The results of
        this investigation would have wide applicability.
     >  Study of the costs of data collection and model simula-
        tions.  Although collection cost studies have been carried
        out in the past, these estimates should be updated
        periodically.
     >  Study of the degree to which air quality and meteorolog-
        ical measurements are representative of their environment.
        This work would clarify the problem of comparability
        between observed and computed concentrations.
     >  Development of a set of data bases suitable for model
        evaluation work.  A study designed to develop such a set
        of data bases would consider appropriately generalized
        scenarios and develop meteorological, emissions, and air
        quality information necessary for evaluating the perfor-
        mance of a wide variety of models.

C.   DOCUMENTS TO BE COMPILED

     There is a clear need for guidance in many of the necessary tasks to
ensure proper evaluation of air quality models.  We list here a series of
guideline documents that could be developed to aid the proposed model
evaluation group:

     >  Air quality model performance evaluation.
     >  Selection of model performance  standards
        and measures.
     >  Air quality monitoring for model  perfor-
        mance evaluation.
     >  Model selection and applications.
     >  Air quality model input data base
        preparation.
                                     82

-------
D.    SUMMARY

     In  the above discussion,  we  have  outlined  many  areas  in which  further
research is desirable.   The  number of  areas  is  an  indication of the current
state of model  evaluation.   Until  now, model  evaluation  has typically  been
carried  out on  a  more or less  ad  hoc basis by model  developers.  There is  a
clear need  for  a  formal  framework for  model  evaluation.  This  report repre-
sents a  first step towards the development of that framework.
                                   83

-------
               APPENDIX
SUMMARY OF PREVIOUS EVALUATION STUDIES
                  85

-------
                               APPENDIX
             SUMMARY OF PREVIOUS  EVALUATION  STUDIES
     In Chapter II we discussed the results of some previous model evalua-
tion studies.  This appendix gives detailed results from those evaluations,
together with comments on the methods employed.  We have not attempted
a comprehensive survey of evaluation studies, but rather have chosen some
representative studies to illustrate the methods used.  Table A-1 lists
the models that are covered by the studies included.

     Our review of air quality model evaluation studies has revealed that
emphasis so far has been on matching pollutant concentrations predicted by
a model with point measurements at  monitoring  stations.  Measures used for
comparison include:

     >  Correlation coefficients.
     >  Differences between observed and predicted values--
        either mean or root-mean-square differences, and
        either absolute or relative differences.
     >  The ratio of observed to predicted concentrations.
     >  Frequency distributions of  pollutant concentrations.
     >  Regression statistics.
     >  Qualitative comparisons.
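
     The quantitative measures in the list above can be sketched in a few
lines of Python.  This is a minimal illustration, not the procedure used in
any of the studies reviewed; the function name, the dictionary keys, and the
sample numbers are hypothetical, and the "ratio" is taken here as the ratio
of the observed mean to the predicted mean, which is only one of several
possible definitions.

```python
import math

def comparison_measures(observed, predicted):
    """Return common observed-versus-predicted agreement statistics."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    diffs = [o - p for o, p in zip(observed, predicted)]
    # Sums of squares and cross-products about the means
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    slope = s_op / s_pp  # regression of observed on predicted
    return {
        "mean_difference": sum(diffs) / n,
        "mean_absolute_difference": sum(abs(d) for d in diffs) / n,
        "rms_difference": math.sqrt(sum(d * d for d in diffs) / n),
        "ratio_of_means": mean_o / mean_p,
        "correlation": s_op / math.sqrt(s_oo * s_pp),
        "slope": slope,
        "intercept": mean_o - slope * mean_p,
    }

# Made-up concentrations, for illustration only
stats = comparison_measures([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
```

A perfect model would give a correlation and slope of one and an intercept
and RMS difference of zero; frequency distributions and qualitative
comparisons are not covered by this sketch.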

     Evaluation studies have concentrated on statistical measures of agree-
ment with station observations.  Although most model developers have con-
sidered evaluation to be a part of the development process, they have in
general used existing data instead of collecting new data designed specif-
ically for evaluation purposes.  No studies to date have attempted to match
model results not expressed as concentrations, such as areas in violation
of air quality standards or exposure/dosage information, because of the
lack of appropriate observational data.

TABLE A-1.    MODELS CONSIDERED IN MODEL EVALUATION
              STUDIES DESCRIBED IN THIS APPENDIX

   Model Type      Model Name         Page

   Gaussian        CRSTER             100
                   PTMTP              101
                   CDM                94, 107
                   AQDM               94
                   SCIM               94
                   TRAPS              110
                   CALINE-2           110
                   HIWAY              110
                   AIRPOL-4           110
                   APRAC-1A           110
   Box             Gifford-Hanna
   Plume           RPM                124, 125
   Grid            SAI                116, 118
                   LIRAQ              121
                   Shir and Shieh     126
   Trajectory      DIFKIN             116
                   REM                116

     Results from the studies reviewed show performance of air quality
models varying over a wide range.  However, from available evaluative
results, it is difficult to draw general conclusions about relative model
performance that could serve as a basis for selecting a model for a parti-
cular situation.  Even definite conclusions about the validity of each
model considered cannot be deduced.  Thus, a set of standardized guidelines
for  carrying out a model evaluation effort is needed.  Such a set of pro-
cedures will permit reasonably definite statements to be made about model
validity.  Evaluation studies should produce evidence of model performance
in both an absolute and relative sense.  Moreover, such studies should
cover a variety of conditions to test model features thoroughly.

     In the review  that follows, we sometimes criticize the way in which
a model evaluation was carried out.  This criticism, however, does not
reflect on the investigators, for there were no general guidelines for model
evaluation when these studies were carried out; thus, each worker of neces-
sity designed his study to suit his own particular needs.

1.   GAUSSIAN MODELS

     Gaussian models are "generally considered to be the state-of-the-art
techniques for estimating the impact of non-reactive pollutants" (EPA,
1977).  Although a great variety of Gaussian models have been developed,
they all use the same basic formulation, which assumes that steady-state
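pollutant concentrations downwind of a source are described by the expression:

$$\chi = \frac{Q}{\pi\,\sigma_y(x)\,\sigma_z(x)\,u}
         \exp\left[-\frac{y^2}{2\sigma_y^2(x)}\right]
         \exp\left[-\frac{h^2}{2\sigma_z^2(x)}\right]   \tag{A-1}$$
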
where
             χ = concentration at the receptor,
             Q = pollutant emissions rate at the source,
             u = wind speed,
             h = effective height of the source,
             y = crosswind distance between source and receptor,
             x = downwind distance from source to receptor,
        σ_y(x) = horizontal diffusion parameter,
        σ_z(x) = vertical diffusion parameter.
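
     As an illustration of how the quantities defined above combine, the
sketch below evaluates the textbook ground-level Gaussian plume expression,
chi = Q / (pi u sigma_y sigma_z) exp(-y^2 / 2 sigma_y^2) exp(-h^2 / 2 sigma_z^2).
It assumes a ground-level receptor and total reflection of the plume at the
ground; the function name and the numerical inputs are hypothetical, and the
diffusion parameters are supplied directly rather than computed from the
downwind distance x and the stability class.

```python
import math

def ground_level_concentration(Q, u, h, y, sigma_y, sigma_z):
    """Ground-level concentration from a continuous point source.

    Q: emissions rate (g/s); u: wind speed (m/s); h: effective source
    height (m); y: crosswind distance (m); sigma_y, sigma_z: diffusion
    parameters (m), already evaluated at the receptor's downwind distance x.
    Returns the concentration in g/m^3.
    """
    if u <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("u, sigma_y and sigma_z must be positive")
    crosswind = math.exp(-0.5 * (y / sigma_y) ** 2)
    vertical = math.exp(-0.5 * (h / sigma_z) ** 2)
    return Q / (math.pi * u * sigma_y * sigma_z) * crosswind * vertical

# Made-up inputs: 100 g/s source, 5 m/s wind, 50 m effective height
chi = ground_level_concentration(Q=100.0, u=5.0, h=50.0, y=0.0,
                                 sigma_y=80.0, sigma_z=40.0)
```

The inverse dependence on u in this expression is the reason the calm or
light-wind case needs separate treatment: the computed concentration grows
without bound as the wind speed approaches zero.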

     With this formula as a basis, many different types of models have been
developed.  Some are formulated to calculate the short-term average concen-
tration downwind of a source or group of sources (point, area, or line sources).
Others, the so-called climatological models, integrate Eq. (A-1) over a long-
term distribution of wind speeds, wind directions, and atmospheric stabil-
ities, to yield long-term average concentrations over a region.  Thus, Gaussian-
type models are available for single sources or whole cities, and for long- or
short-term average concentrations.

a.   Study by Koch and Thayer (1971)

     We first discuss two evaluation studies of Gaussian models reported
by Koch and co-workers (Koch and Thayer, 1971; Koch and Fisher, 1973).  In
the first study, the objective was "to evaluate critically the predictive
accuracy of the urban diffusion model based on the Gaussian plume concept."
Both short-term (one- and two-hour average) and long-term (one-month and
three-month average) SO2 concentrations in St. Louis and Chicago were used
for comparison with model predictions.  The short-term average concentra-
tions were calculated using a multiple source steady-state Gaussian plume
model, whereas the long-term averages were evaluated using a statistically
selected sample of one-hour-average concentrations.

     The input data were obtained from available meteorological and air
quality data.  Wind speeds were obtained by averaging observations at
several stations.  Vertical wind profiles were obtained from a single tower
in St. Louis and calculated assuming a power law wind profile for Chicago.
Stability class was characterized by wind speed and radiation index, and
mixing depths were interpolated from measurements 100 to 200 miles away.

     The overall results for short-term average concentrations are given
in Tables A-2 and A-3.  For the St. Louis data, the authors commented
that the mean observed and predicted concentrations at individual monitoring
stations were in good agreement.  However, the agreement was not as good for
individual values, as can be seen by examining the standard deviations and
mean absolute differences of observed minus predicted concentrations.  The
authors concluded that the data in Table A-2 constitute evidence of the
model's ability to predict long-term rather than short-term average concen-
trations.  Table A-2 also shows that, in general, correlation coefficients
for two-hour-average concentrations were low, and the slopes of the regres-
sions of observed on predicted values were substantially different from
unity, both indicative of the generally poor agreement between observations
and predictions.  Similar results were observed for the Chicago data
(Table A-3).
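
     The low correlations and the slopes far from unity are two views of
the same result.  For a least-squares regression of observed on predicted
values, the slope equals the correlation coefficient times the ratio of the
observed to the predicted standard deviation, and the intercept follows from
the two means.  The sketch below applies these identities to the "All" row
of Table A-2; the function name is hypothetical, and small differences from
the tabulated slope and intercept are to be expected because the inputs are
rounded.

```python
def regression_from_summary(r, sd_obs, sd_pred, mean_obs, mean_pred):
    """Slope and intercept of the regression of observed on predicted values,
    recovered from summary statistics alone."""
    slope = r * sd_obs / sd_pred
    intercept = mean_obs - slope * mean_pred
    return slope, intercept

# "All" row of Table A-2 (St. Louis, two-hour averages)
slope, intercept = regression_from_summary(
    r=0.347, sd_obs=159.0, sd_pred=179.0, mean_obs=154.0, mean_pred=151.0
)
# Tabulated values for comparison: slope 0.3085, intercept 107.9
```

Because the observed and predicted standard deviations are of similar size,
a correlation of about 0.35 necessarily gives a slope of about 0.3.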

     Table A-4 shows the percentages of comparisons for the two cities
that were within various error limits.  The results are generally similar:
about 75 percent of the predictions differ from the corresponding observa-
tions by no more than the observed mean (76 percent within ±150 μg/m³ for
St. Louis, where the observed mean is 154 μg/m³, and 73 percent within
±100 μg/m³ for Chicago, where it is 96 μg/m³).  The only factor found to
have a consistent effect on predictions was the wind speed.  Tables A-5 and
A-6 show that agreement between observed and predicted results was highly
dependent on wind speed.  The authors attributed this effect to inadequate
estimation of diffusion parameters and/or inadequate accounting for the
possible effect of wind speeds on emissions rates (e.g., fuel consumption
for space heating is affected by wind speed, and the model inputs did not
take this into account).
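
     An error distribution of the kind summarized in Table A-4 can be
tabulated as sketched below.  The function name and the sample values are
hypothetical; the sketch simply counts the comparisons whose predicted minus
observed difference falls within each symmetric limit.

```python
def percent_within(observed, predicted, limits):
    """Percentage of predicted-minus-observed differences within +/- each limit."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty series of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    return {
        limit: 100.0 * sum(abs(e) <= limit for e in errors) / len(errors)
        for limit in limits
    }

# Made-up concentrations (ug/m3), for illustration only
table = percent_within(
    observed=[100.0, 100.0, 100.0, 100.0],
    predicted=[104.0, 90.0, 150.0, 300.0],
    limits=[5, 10, 50],
)
```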
                                    90

-------
TABLE A-2.    STATISTICAL SUMMARY OF OBSERVED TWO-HOUR SO2 CONCENTRATIONS (in μg/m³) FOR ST. LOUIS
              STATIONS AND CONCENTRATIONS CALCULATED USING A GAUSSIAN PLUME MODEL

Station      Mean       Mean   Obs. Mean      S.D.       S.D.     S.D. of    Mean Abs.  Regression Regression   Number   Corr.
 Number  Observed  Predicted       Minus  Observed  Predicted  Obs. Minus  Diff., Obs.       Slope  Intercept       of  Coeff.
                              Pred. Mean                            Pred.  Minus Pred.                          Values

      3       156        196         -40       145        180         207          130      0.1637      123.9     1037   0.203
      4       175        142         +33       157        195         212          116      0.2354      141.4      872   0.292
     10       335        207        +128       237        165         255          201      0.3373      265.2      975   0.235
     12       179        211         -31       136        214         194          121      0.2891      118.4      980   0.455
     15       137        118         +19       132        119         133           87      0.4964       78.6      900   0.448
     17       211        181         +31       124        161         161          114      0.2973      157.7     1031   0.386
     23        90        191        -101       106        241         238          142      0.1085       69.2      963   0.247
     28        87         94          -7       117        149         152           80      0.2849       60.1      788   0.363
     33        73         61         +11        88         99         103           53      0.3517       51.1      922   0.396
     36        80         88          -8        78        134         122           64      0.2542       57.2      952   0.437
    All       154        151          +3       159        179         194          112      0.3085      107.9     9420   0.347

Regression slope and intercept are for the regression of observed on predicted values.

Source:   Koch and Thayer (1971).

-------
TABLE A-3.    STATISTICAL SUMMARY OF OBSERVED ONE-HOUR SO2 CONCENTRATIONS (in μg/m³)
              FOR CHICAGO STATIONS AND CONCENTRATIONS CALCULATED USING A GAUSSIAN
              PLUME MODEL

   TAM*
Station      Mean       Mean   Obs. Mean      S.D.       S.D.     S.D. of    Mean Abs.  Regression Regression   Number   Corr.
 Number  Observed  Predicted       Minus  Observed  Predicted  Obs. Minus  Diff., Obs.       Slope  Intercept       of  Coeff.
                              Pred. Mean                            Pred.  Minus Pred.                          Values

      1        33         47         -14        56        111          98           39      0.2349       21.6      723   0.466
      2       114         99         +15        87        108         128           87      0.1188      102.7      602   0.148
      3       312        379         -67       152        416         397          221      0.1106      269.7      606   0.303
      4       123        315        -192        89        294         274          201      0.1119       88.0      614   0.370
      5        62        128         -66        47        140         135           83      0.0936       50.2      722   0.279
      6        23         58         -35        32         98          97           45      0.0595       19.5      703   0.182
      7       102        158         -55        95        159         157          100      0.1905       72.2      711   0.319
      8        43         36          +7        39         76          83           45      0.0366       41.8      726   0.071
    All        96        145         -49       117        232         201           99      0.2493       60.2     5407   0.494

Regression slope and intercept are for the regression of observed on predicted values.

* TAM = Telemetered air monitoring.

Source:  Koch and Thayer (1971).

-------
     TABLE A-4.    COMPARISON OF ERROR DISTRIBUTIONS FOR TWO-HOURLY
                   ST. LOUIS AND HOURLY CHICAGO VALIDATION CALCULATIONS

                                      % of Comparisons Within Error Limits
   Range of Predicted Minus      St. Louis                     Chicago
   Observed Concentration        (Mean Observed                (Mean Observed
   (μg/m³)                       Concentration = 154 μg/m³)    Concentration = 96 μg/m³)

          ±  5                          8                             8
          ± 10                         15                            17
          ± 20                         25                            30
          ± 50                         46                            53
          ±100                         65                            73
          ±150                         76                            82

 Source:  Koch and Thayer (1971).

     When the observed and predicted SO2 concentrations for the different
stations were averaged over three months for St. Louis and one month for
Chicago, improved agreement was obtained.  Those results are shown in
Table A-7.  Averaging over all of the data at each station was intended
to provide an indication of the performance of the model in predicting longer
term average concentrations.  However, the low values for correlation coef-
ficients (corresponding to 46 percent of the variance explained for the
St. Louis data and 76 percent for Chicago), and the slope of 0.63 for the
Chicago data, indicate that agreement between observed and predicted
concentrations is still not high.  Koch and Thayer (1971) quoted a
combined root-mean-square error (RMSE) of 68 μg/m³, compared with an over-
all mean of 128 μg/m³, indicating that, for this study, overall RMSE
was about one-half of the overall mean.  In addition, it was stated that
the long-term mean was overpredicted at 11 stations and underpredicted
at 7 stations, indicating a tendency of the model to overpredict observed
concentrations more often than to underpredict.  If the model actually
had no tendency to overpredict or underpredict, however, the above result
would be obtained with a probability of 12 percent.  Consequently, that
result is not a strong indicator of bias in the model.
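
     The figures quoted in this paragraph can be checked with a short
calculation.  The sketch below assumes that, for an unbiased model, each of
the 18 stations is equally likely to be overpredicted or underpredicted,
independently of the others; the function names are hypothetical.  Under
that assumption the quoted 12 percent corresponds to the probability of
exactly 11 overpredictions out of 18, and the probability of 11 or more is
roughly twice as large, which supports the same conclusion.

```python
from math import comb

def prob_exactly(k, n):
    """Probability of exactly k overpredictions in n stations when p = 1/2."""
    return comb(n, k) / 2 ** n

def prob_at_least(k, n):
    """Probability of k or more overpredictions in n stations when p = 1/2."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

p_exact = prob_exactly(11, 18)    # about 0.12
p_tail = prob_at_least(11, 18)    # about 0.24

# Fraction of variance explained is the square of the correlation coefficient
explained_st_louis = 0.675 ** 2   # about 0.46
explained_chicago = 0.873 ** 2    # about 0.76
```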
                                   93

-------
TABLE A-5.   OBSERVED, PREDICTED, AND OBSERVED MINUS  PREDICTED
             CONCENTRATIONS BY WIND SPEED CLASS FOR ST.  LOUIS
             DATA
Wind Speed
Class (m/sec)
1.5
1.5 < u < 2.0
2.0 < u < 2.5
2.5 < u < 3.0
3.0 < u < 4.0
4.0 < u < 5.0
5.0 < u < 6.0
6.0
-------
        TABLE A-7.  STATISTICS FOR LONG-TERM AVERAGE PREDICTIONS OF SO2

                                               Regression of Observed
              Number    Overall                 on Predicted Values
                of       Mean       RMSE                                Correlation
    City     Stations   (μg/m³)    (μg/m³)      Slope     Intercept     Coefficient

  St. Louis     10        154         56         0.98       -0.56          0.675
  Chicago        8         96         78         0.63        4.9           0.873

  Source:   Koch and Thayer (1971).


       When long-term averages were calculated  using a sample of the one-
  hour-average concentrations, the mean  predicted concentrations did not
  change substantially, but the RMSE  increased  relative to the mean that
  was  calculated using all of the data,  as  expected.  This effect is shown
  in Table A-8.  The authors concluded that 1 hour sampled out of 24 is
  as small a sample as should be used for calculating seasonal averages.
   TABLE A-8.    SUMMARY OF ACCURACY OF SAMPLING INTERVALS FOR ESTIMATING
                 DISTRIBUTION OF PREDICTED CONCENTRATIONS OVER A SEASON

      Sampling           Root Mean Square
      Interval,          Error (RMSE),(a)         Mean,
      Hours              μg/m³                    μg/m³

          1                  --                    150
          2                 2.43                   150
          4                 3.70                   152
          6                 3.58                   148
          8                 7.69                   151
         12                 7.94                   150
         24                16.43                   158

  (a)  $$\mathrm{RMSE}_j = \left[\frac{1}{N}\sum_{i=1}^{N}
             \left(\bar{\chi}_{j,i} - \bar{\chi}_{1,i}\right)^2\right]^{1/2}$$

       where
             N = No. of Stations (10)
             j = Sampling Interval
             \bar{\chi}_{1,i} = Seasonal Mean Concentration for ith Station with
                                Sampling Every Hour
             \bar{\chi}_{j,i} = Seasonal Mean Concentration for ith Station with
                                Sampling Every j Hours

Source:  Koch and Thayer (1971).
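
     The effect of the sampling interval can be sketched as follows.  This
is an illustration of the idea only, under stated assumptions: the function
names and data are hypothetical, sampling is assumed to start with the first
hour of the record, and the error is taken as the root-mean-square
difference over stations between the mean from every jth hour and the mean
from every hour.

```python
import math

def sampled_mean(hourly, j):
    """Mean of every j-th hourly value, starting with the first hour."""
    sample = hourly[::j]
    return sum(sample) / len(sample)

def sampling_rmse(hourly_by_station, j):
    """RMSE over stations between the j-hour sampled mean and the full mean."""
    n_stations = len(hourly_by_station)
    squared = sum(
        (sampled_mean(series, j) - sampled_mean(series, 1)) ** 2
        for series in hourly_by_station
    )
    return math.sqrt(squared / n_stations)

# Two made-up stations with four hourly values each
stations = [[1.0, 3.0, 1.0, 3.0], [2.0, 2.0, 2.0, 2.0]]
rmse_every_2_hours = sampling_rmse(stations, 2)
```

A sampling interval that happens to coincide with a regular cycle in the
data, as in the first made-up station above, biases the sampled mean; a
statistically selected sample of hours avoids this.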
                                       95

-------
     Koch and Thayer (1971) drew the following conclusions about model
performance and adequacy from their study:

     >  Predicted long-term (monthly or seasonal) concentrations
        averaged over several locations are in good agreement
        with observed concentrations.
     >  Predicted long-term concentrations at individual loca-
        tions show a root-mean-square error equal to about
        half the mean and indicate a slight tendency to over-
        predict observed concentrations.
     >  Predicted short-term (one- or two-hour average) concentra-
        tions at individual stations show larger deviations from
        observed concentrations than do the long-term predictions.
        However, over a period of a month or a season, the overall
        distribution of predicted short-term concentrations closely
        approximates the distribution of observed concentrations.
     >  The calm, or light wind, case is not adequately treated
        by the Gaussian plume type of urban diffusion model.
        Further study of procedures for applying the model to
        this type of situation is needed.

b.   Study of SCIM, CDM, AQDM, and Gifford-Hanna Models

     In a later study, Koch and Fisher (1973) carried out an evaluation
of three Gaussian models, namely, the Sampled Chronological Input Model
(SCIM), the Climatological Dispersion Model (CDM), and the Air Quality
Display Model (AQDM), and a fourth, box, model, the simplified Gifford-
Hanna Model (GHM).  The SCIM generates a long-term average concentration
by calculating one-hour-average concentrations for a limited number of
selected hours from the required long period (e.g., one year).
These one-hour-average concentrations are then averaged to obtain the
long-term mean.  CDM and AQDM calculate a long-term mean concentration
by utilizing a distribution of meteorological conditions for the period
of interest.  Concentrations are calculated for various meteorological condi-
tions and weighted by their relative frequency of occurrence.  The GHM
merely assumes that pollutant concentrations are directly propor-
tional to source strengths and inversely proportional to wind speed.
To ascertain the consequences of options in preparing model inputs, Koch
and Fisher studied several variations of the models.  In the preparation
of data inputs, the emissions rates, stability conditions, and wind speed
were assumed either to be constant throughout the data period or to vary
from hour to hour.  In addition, for the calculations with GHM, the con-
centrations originating from point sources were either added or not added
in the calculations.  In all, ten model variations were studied as follows:

       (1)  SCIM--variable area source emissions rates, atmospheric
                 stability, and height of the mixing layer.
       (2)  SCIM--constant area source emissions rates, variable
                 atmospheric stability and height of the mixing
                 layer.
       (3)  SCIM--constant area source emissions rates, atmospheric
                 stability, and height of the mixing layer.
       (4)  GHM--constant area source emissions rates and wind speed,
                 without point sources.
       (5)  GHM--variable area source emissions rates and wind speed,
                 without point sources.
       (6)  GHM--constant area source emissions rates and wind speed,
                 with point sources.
       (7)  GHM--variable area source emissions rates and wind speed,
                 with point sources.
       (8)  CDM--constant atmospheric stability and height of the
                 mixing layer.
       (9)  CDM--variable atmospheric stability and height of the
                 mixing layer.
      (10)  AQDM.
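
     Two of the calculations described above are simple enough to sketch
directly: the proportionality assumed by the GHM, and the frequency
weighting used by climatological models such as CDM and AQDM.  The function
names, the proportionality constant c, and the numbers are hypothetical; the
sketch shows only the structure of the calculations, not the formulations
actually coded in those models.

```python
def ghm_concentration(source_strength, wind_speed, c=1.0):
    """Box-model form: concentration proportional to source strength and
    inversely proportional to wind speed (c is an assumed constant)."""
    if wind_speed <= 0:
        raise ValueError("wind speed must be positive")
    return c * source_strength / wind_speed

def frequency_weighted_mean(concentrations, frequencies):
    """Long-term mean: the concentration computed for each meteorological
    condition, weighted by that condition's relative frequency of occurrence."""
    if len(concentrations) != len(frequencies):
        raise ValueError("one frequency is needed per condition")
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    return sum(c * f for c, f in zip(concentrations, frequencies)) / total

# Made-up example: two meteorological conditions occurring 25% and 75% of the time
long_term = frequency_weighted_mean([100.0, 50.0], [0.25, 0.75])
```

The "constant" and "variable" options in the list above correspond to
supplying either a single period-mean value or an hour-by-hour value for
inputs such as the source strength and the wind speed.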

     Two types of air quality data were used:  one-hour-average SO2 concentra-
tions at 10 locations and annual mean SO2 and particulate concentrations
at 127 locations in the vicinity of New York City.  Meteorological data
were obtained from La Guardia and Kennedy Airports.
                                    97

-------
     The one-hour data were used to compare SCIM and GHM.   Results  are
given in Table A-9.  They are similar to the results obtained by Koch
and Thayer (1971) for short-term predictions:  Correlation coefficients
are < 0.45, slopes of regression of observed on predicted  concentrations
are different from unity, and the root-mean-square error for the indivi-
dual differences is of the same order as the mean values.

     As expected, the models performed best when used for predicting
annual mean concentrations.  These results are shown in Tables A-10
and A-11.  Comparison with the data in Table A-9 shows that root-mean-
square errors are lower and correlation coefficients are higher for the
annual averages than for the one-hour averages.  No single model stands
out as being better than all of the others in every respect.  Koch and
Fisher (1973) reached the following conclusions based on this study:

     >  The use of variable emissions rates for SCIM and GHM
        does not result in any conclusive improvement in
        model performance over the use of mean emissions rates.
        It is inferred that this result is due to the failure
        to properly treat other causes of variance, such as
        those associated with atmospheric stability.
     >  Based on the results for New York City, the Clima-
        tological Dispersion Model (CDM) and SCIM versions
        of the multiple-source Gaussian plume model produce
        a smaller station-to-station root-mean-square error
        than does the Air Quality Display Model (AQDM) version
        (i.e., RMSEs of 52 and 59, respectively, compared with
        92, with an overall mean of 135 μg/m³ for SO2; RMSEs
        of 22 and 22 compared with 36, with an overall mean of 82
        μg/m³ for particulates).
     >  Although the New York City evaluation statistics for
        GHM, CDM, and SCIM are similar for SO2, GHM results for
        particulates have a much higher station-to-station root-
        mean-square error than do CDM and SCIM (i.e., RMSE
        of 60 compared with 22, with an overall mean of 82 μg/m³).

-------
TABLE A-9.   COMPARISONS OF MEASURED AND PREDICTED ONE-HOUR-AVERAGE
             SO2 CONCENTRATIONS IN NEW YORK CITY

[Table body illegible in the source scan.  Legible row labels: Measured;
SCIM (Variable Q, S, H); SCIM (Mean Q, Var. S, H); SCIM (Mean Q, S, H);
GHM (without points); GHM (with points).]

-------
TABLE A-10.  MODEL COMPARISONS FOR ANNUAL MEAN SO2 CONCENTRATIONS USING NEW YORK CITY DATA

Statistic                                 SCIM      SCIM      SCIM       GHM       GHM       GHM       GHM      AQDM       CDM       CDM
                                                     (Q)  (Q,S,Hm)                         (Q,U)
                                                                                        + Points  + Points
Number of Comparisons                       75        75        75        71        71        71        71        75        75        75
Mean Measured (µg/m³)                      135       135       135       140       140       140       140       135       135       135
Mean Calculated (µg/m³)                    163       162        88        78        94       107       123       211       138       206
Mean Error (µg/m³)                          28        27       -47       -62       -46       -33       -17        76         3        71
Root-Mean-Square Error (µg/m³)              59        59        65        78        70        58        59       121        52       124
Mean Absolute Error (µg/m³)                 46        47        50        67        59        46        44        92        37        89
Largest Negative Error (µg/m³)            -112      -106      -149      -171      -170      -139      -133       -87      -118      -112
Largest Positive Error (µg/m³)             169       162        50       104       162       151       209       310       166       332
Error Range (µg/m³)                        281       268       200       274       332       290       342       397       284       444
Correlation Coefficient                   0.84      0.83      0.82      0.82      0.82      0.83      0.83      0.89      0.84      0.84
Reduction of Variance (%)                   71        69        68        67        67        70        69        79        70        71
Slope of Regression Line                  0.70      0.70      0.98      0.85      0.70      0.76      0.64      0.45      0.66      0.41
Intercept of Regression Line                20        21        48        74        74        59        61        31        35        40
Maximum Measured (µg/m³)                   385       385       385       385       385       385       385       350       350       350
Error for Maximum Measured (µg/m³)         -47       -58      -149       -75       -11       -10        53       112      -101        13

       Source:  Koch and Fisher (1973).

-------
TABLE A-11.  MODEL COMPARISONS FOR ANNUAL MEAN PARTICULATE CONCENTRATIONS USING NEW YORK DATA

Statistic                                 SCIM      SCIM      SCIM       GHM       GHM       GHM       GHM      AQDM       CDM       CDM
                                                     (Q)  (Q,S,Hm)     (Q,U)               (Q,U)                                  (S,Hm)
                                                                                        + Points  + Points
Number of Comparisons                      114       114       114       112       112       112       112       113       113       113
Mean Measured (µg/m³)                       81        81        81        82        82        82        82        82        82        82
Mean Calculated (µg/m³)                     69        69        58        92       104       101       113       102        74        88
Mean Error (µg/m³)                         -12       -13       -24        11        22        19        31        20        -8         6
Root-Mean-Square Error (µg/m³)              22        22        55        60        77        64        82        36        22        28
Mean Absolute Error (µg/m³)                 16        16        33        36        44        38        47        28        16        21
Largest Negative Error (µg/m³)             -68       -66       -83       -66       -63       -61       -57       -51       -63       -60
Largest Positive Error (µg/m³)              43        39       463       325       405       338       419       115        68        98
Error Range (µg/m³)                        110       106       546       391       468       400       475       166       131       158
Correlation Coefficient                   0.68      0.68      0.30      0.66      0.66      0.67      0.67      0.62      0.61      0.64
Reduction of Variance (%)                   46        46         9        43        43        45        45        39        37        41
Slope of Regression Line                  0.78      0.80      0.13      0.21      0.18      0.21      0.17      0.38      0.63      0.42
Intercept of Regression Line                28        26        74        62        63        61        62        43        35        45
Maximum Measured (µg/m³)                   169       169       169       169       169       169       169       169       169       169
Error for Maximum Measured (µg/m³)         -54       -52       -83       150       208       161       219         5       -48        -6

Source:  Koch and Fisher (1973).

-------
     Proper design and interpretation of a model  evaluation  study can aid
in improving model performance.   The results of these two  studies show
that the validity of an air quality model cannot necessarily be  judged by
a single statistic:  A comprehensive program is necessary  if valid con-
clusions are to be reached.  Model evaluation statistics can vary from one
station to another and from one averaging time to another, and one figure
of merit cannot necessarily provide a basis for choosing  one model over
another.  In addition, comparison of results obtained using  different types
of inputs was used to infer possible problems with model formulation.  More-
over, in the latter study of several models, which was carried out so that
direct model comparisons could be made, conclusions were drawn as to the
relative performance of the various models.  It is clear that if model evalua-
tions were carried out according to standardized procedures, studies of
different models by different workers would be directly comparable.

     These studies had a shortcoming that is common for evaluative studies
to date:  The data used for the evaluation were not collected specifically
for that purpose.  The large amount of pollutant concentration data avail-
able, however, did allow some conclusions to be drawn about  model performance.

c.   Studies of CRSTER Model

     Two studies were carried out in 1975 to examine the performance of
the EPA single source model CRSTER (Mills and Record, 1975;  Mills and
Stern, 1975).  The first of these studies was carried out  for the Canal
Power Plant, located near Cape Cod.  The second studied the  Stuart,
Muskingum, and Philo power plants in Ohio.

     Comparisons between predicted and observed results were made by
constructing frequency distributions.  In the Canal study, the model
consistently underpredicted measured concentrations (less  so at  higher
concentrations), which was attributed to an overestimation of plume
height by the model and a consequent underprediction of pollutant levels
at the monitoring stations.  In addition, Mills and Record pointed out
that the calculated plume spread was not large enough to affect  more than
one model  receptor location,  which resulted  in  few  predicted  low  and
medium concentrations.   A typical  set of frequency  distributions  is shown
in Figure  A-l.   No attempt was  made to quantify statistically the
agreement between calculated and measured distributions (e.g., by a
Kolmogorov-Smirnov or χ² test).

     In the second study (Mills  and Stern, 1975), much better agreement
between measured and calculated  concentrations  was  found.   Figure A-2
shows one  of the frequency distributions for the Stuart  plant.  The better
agreement  between measured and  calculated results at  higher concentrations
was attributed  to the reduced influence of uncertainties associated with
the determination of background  concentrations.  The  method of determining
background concentrations, which was to average the concentrations at
all stations upwind from the  plant using the wind direction measured at the
plant, sometimes resulted in  downwind concentrations  less  than background,
producing  negative net measured  concentrations.  Better  agreement at low
levels was obtained when background was added to predicted concentrations
(Figure A-2),  so that all concentrations (observed  and predicted)  were
nonnegative.

     In both of these studies comparison was made  through  the use of
frequency distributions.  Such  comparison methods  fail  to  associate  a
particular observed concentration with the corresponding computed value.
Scatter plots are useful visual  devices for these  comparisons.
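
The limitation of frequency-distribution comparisons can be illustrated with a small sketch.  The helper names and the data below are hypothetical; the two-sample Kolmogorov-Smirnov statistic is used only as one possible measure of the gap between two cumulative distributions.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    d = 0.0
    for v in list(a) + list(b):
        cdf_a = sum(1 for x in a if x <= v) / len(a)
        cdf_b = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def pearson_r(x, y):
    """Pearson correlation of paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hourly series: the model produces the right values at the wrong times.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [5.0, 4.0, 3.0, 2.0, 1.0]

ks = ks_statistic(observed, predicted)
r = pearson_r(observed, predicted)
print(ks, r)
```

Here the two frequency distributions are identical, so the distribution-based statistic reports perfect agreement, while the paired correlation shows that every individual prediction is mismatched in time.
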
d.   Study of PTMTP Model

     Another Gaussian model for which an evaluation study was carried out
is the EPA's PTMTP Model (Guzewich and Pringle, 1977).  PTMTP is used to
calculate one-hour-average concentrations at a number of receptors result-
ing from the emissions of up to 25 point sources.  The effluent character-
istics of the sources (i.e., parameters employed to estimate plume rise)
as well as hourly meteorological data are input to the model.  In this
study an inert tracer (SF6) was injected into a spray dryer stack during


-------
[Figure A-1 plot not reproduced.  Panel title: cumulative frequency
distribution for 24-hour SO2 concentrations at Station 3.  Axes: percentage
of 24-hour concentrations greater than (top) and less than (bottom) the
indicated value.  Curves: measured; measured minus background; calculated.]

  Source:  Mills and Record (1975).

FIGURE A-1.  A TYPICAL CUMULATIVE FREQUENCY DISTRIBUTION OF 24-HOUR-
             AVERAGE SO2 CONCENTRATIONS MEASURED AND CALCULATED
             USING THE CRSTER MODEL

-------
[Figure A-2 plot not reproduced.  Panel title: J. M. Stuart Plant cumulative
frequency distribution for 1-hour SO2 concentrations at all stations.  Axes:
percentage of concentrations greater than (top) and less than (bottom) the
indicated value.  Curves: measured minus background; calculated.]

FIGURE A-2.  J. M. STUART PLANT CUMULATIVE FREQUENCY DISTRIBUTION FOR
             ONE-HOUR-AVERAGE SO2 CONCENTRATIONS AT ALL STATIONS.
             Number of measured concentrations = 45,512; number
             of calculated concentrations = 61,320.

-------
[Figure A-2 plot (second panel) not reproduced.  Panel title: J. M. Stuart
Plant cumulative frequency distribution for 1-hour SO2 concentrations at all
stations.  Axes: percentage of concentrations greater than (top) and less
than (bottom) the indicated value.  Legible curve label: predicted plus
background.]

Source:  Mills and Stern (1975).

                   FIGURE A-2 (Concluded)

-------
various meteorological  conditions,  and downwind  tracer  concentrations were
monitored.   The results were assessed  by  calculating  correlation coefficients
and by plotting the data.   The findings are  shown  in  Table  A-12  and
Figure A-3.

     The data presented in Table  A-12  show that  good  correlation was
obtained for stability  categories C and D, whereas very poor correlation
was observed for stability categories  E and  F.   The poor correlations
were attributed to the  paucity of the  data (60 percent of the observations
were below the detection limit of the SF6 monitoring instrumentation) and
to the resultant small data sets.  Another potential source of error is the
identification of the stability category.  The value used for the vertical
stability coefficient, σz, depends on the stability category, and Bohac
et al. (1974) pointed out that concentrations predicted by the Gaussian
plume model can be very sensitive to σz.

     Figure A-3 shows the results of regressing  the predicted on the observed
data.  The  authors pointed out that "every predicted  concentration was within
a factor of 10 of the value measured and  89  percent of all  predicted values
were within a factor of three of  the measured concentration."  It may be
seen from Figure A-3 that while the regression line is fairly close to the
ideal, the  95 percent confidence  interval on the dependent  variable is of
the same order of magnitude as the  total  range of  its values.
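
A "within a factor of k" summary of the kind quoted above can be computed as in the following sketch.  The function name, the decision to skip nonpositive values (for which a ratio is undefined), and the sample concentrations are assumptions made for illustration; they are not taken from Guzewich and Pringle (1977).

```python
def fraction_within_factor(observed, predicted, factor):
    """Fraction of pairs whose predicted/observed ratio lies in [1/factor, factor]."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    # Assumption: pairs with a zero or negative value are excluded.
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > 0 and p > 0]
    if not pairs:
        raise ValueError("no positive pairs to compare")
    hits = sum(1 for o, p in pairs if 1.0 / factor <= p / o <= factor)
    return hits / len(pairs)


# Hypothetical tracer concentrations (ppm).
observed = [0.010, 0.020, 0.030, 0.040]
predicted = [0.012, 0.050, 0.004, 0.130]

within_3 = fraction_within_factor(observed, predicted, 3.0)
within_10 = fraction_within_factor(observed, predicted, 10.0)
print(within_3, within_10)
```

Such a summary is sensitive to scale, unlike the correlation coefficient, but it says nothing about whether the errors are systematic.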

     The correlation coefficient  is a  statistic  commonly used in evalua-
tion work.   However, although it  is a  useful indicator  of a relationship
between two data sets,  it has some  drawbacks, most notably  that  it is
scale invariant.  (This implies that if all  numbers in  one  data  set were
doubled, the correlation coefficient would not change.)  Thus it should
be used in conjunction  with another performance  measure that can indicate
the correspondence in  scale between observed and predicted  results.   The
use of regression analysis in this  context has been discussed by Brier
(1973, 1975), as mentioned later  in this  appendix.
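
The scale invariance of the correlation coefficient can be checked numerically.  In the sketch below the helper names and the concentrations are hypothetical; root-mean-square error stands in for "another performance measure" that responds to scale.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(observed, predicted):
    """Root-mean-square difference between paired values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
doubled = [2.0 * p for p in predicted]   # every prediction doubled

r_original = pearson_r(observed, predicted)
r_doubled = pearson_r(observed, doubled)
print(r_original, r_doubled)                              # unchanged by the doubling
print(rmse(observed, predicted), rmse(observed, doubled))  # grows sharply
```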

     The Gaussian plume formulation was also tested by Shum et al. (1975),
who found that, for C and D stabilities,  72 percent of measured concen-
trations were within a factor of two of model  predictions, and for B


-------
TABLE A-12.  CORRELATION BETWEEN OBSERVED SF6 CONCENTRATIONS AND
             CONCENTRATIONS CALCULATED USING PTMTP, BY
             STABILITY CLASS

                                      Stability Class
Correlation               A       B       C       D       E       F     ALL
Excluding zero           --    0.29    0.79    0.54   -0.58      --    0.77
concentrations          (0)*   (19)    (33)    (45)     (8)     (2)   (107)
Including zero           --    0.36    0.79    0.63    -0.1    0.46    0.81
concentrations          (0)    (20)    (33)    (55)    (20)     (5)   (133)

     * Figures in parentheses are numbers of observations.
     Source:  Guzewich and Pringle (1977).

[Figure A-3 plot not reproduced.  Horizontal axis: measured SF6 (ppm).]

Source:  Guzewich and Pringle (1977).
 FIGURE A-3.  MEASURED SF6 CONCENTRATIONS AND PREDICTIONS USING PTMTP
              FOR STABILITY CATEGORIES B THROUGH F.  The solid line
              shows the least-squares regression line that best fits
              the data; the dotted lines indicate the 95 percent con-
              fidence interval; the dashed line shows predicted con-
              centration equals measured concentration.

-------
stability, 63 percent were within a factor of two.  Figure A-4 shows their
results for C stability.  It can be seen that in this case also there is
a wide uncertainty interval around each point.

[Figure A-4 plot not reproduced.  Horizontal axis: calculated concentration
(scale factor and units partly illegible in the source scan).]

    Source:  Shum et al. (1975).

FIGURE A-4.  SCATTER DIAGRAM COMPARING OBSERVED AND CALCULATED
             CONCENTRATIONS OBTAINED FROM THE STANDARD
             GAUSSIAN PLUME MODEL UNDER SLIGHTLY UNSTABLE
             CONDITIONS (C STABILITY) FOR THE HIGH SOURCE
             AT THE WESTERN KRAFT CORPORATION

     The evidently large uncertainty in the observed concentration measure-
ments argues for some indication of the uncertainty in the parameters of the
regression line (intercept, slope) to be given.  Moreover, there are theo-
retical problems associated with the use of regression analysis in this
context, as pointed out by Brier (1973).  In particular, the assumption of
independence of data values is clearly violated by air quality observations.

e.   Study of CDM Model

     A more recent study of the Gaussian model CDM in the Copenhagen area
was reported by Prahm and Christensen (1977).  Predicted three-month-
average SO2 concentrations were compared with measurements at 24 stations
covering a "flat" urban area of 500 km².  Different combinations of para-
meters were used in the model runs, as follows:

      (1)  σL, dispersion parameter for area (low) sources:
           (a)  original Pasquill parameters.
           (b)  original Pasquill parameters, shifted one
                stability class toward the unstable region.
           (c)  parameters due to McElroy (1969).

      (2)  σH, dispersion parameter for point (high) sources:
           (a)  Hogstrom parameters (see Brummage, 1968).
           (b)  Singer and Smith parameters (see Brummage, 1968).
           (c)  McElroy parameters.

      (3)  σ0, initial dispersion at a source:
           (a)  20 meters
           (b)  30 meters
           (c)  40 meters.

      (4)  T1/2, half-life of SO2:
           (a)  1 hour
           (b)  3 hours
           (c)  x hours.

The results are shown in Table A-13.  For the 24 stations, the squared
correlation coefficient (r²) varied between 0.6 and 0.7.  However, data
from two stations appeared as outliers in all cases.  When data from
these stations were excluded (they were in "questionable surroundings"),
the squared correlation coefficients varied between 0.82 and 0.85.  The
authors stated, "thus the model explains more than 80 percent of the
spatial variation of the SO2 concentrations in the urban area."  The
correlation coefficient was not sensitive to the parameter combinations
used, a result attributed by the authors to the large number of sources
distributed at various distances from the receptor points.  When measured
concentrations were regressed on calculated concentrations, slopes closest
to 1.0 were obtained for parameter values corresponding to rapid dilution
and reaction.

-------
TABLE A-13.  PARAMETER COMBINATIONS EMPLOYED IN THE PRAHM AND CHRISTENSEN STUDY
             AND A SUMMARY OF THE RESULTS

Parameter options:
     σL:    Pasquill a; Pasquill b; McElroy
     σH:    Hogstrom; Singer Smith; McElroy
     σ0:    20 m; 30 m; 40 m
     T1/2:  1 hour; 3 hours; x hours
[The X marks assigning these options to each test are not legible in the
source scan.]

Test Number           1a    1b    2a    2b    2c    3a    3b    3c    4a    4b    5a    5b
r² (24 stations)    0.64  0.63  0.64  0.63  0.62  0.64  0.63  0.61  0.66  0.65  0.66  0.65
r² (22 stations)    0.84  0.84  0.83  0.83  0.82  0.83  0.82  0.82  0.85  0.84  0.84  0.84
r  (22 stations)    0.92  0.92  0.91  0.91  0.91  0.91  0.91  0.91  0.92  0.92  0.92  0.92
α  (slope)          0.74  0.79  0.82  0.84  0.88  0.92  0.93  0.98  0.87  0.92  0.96   1.2
β  (cut-off)          40    43    35    38    40    32    36    39    33    37    31    35

Test Number           6a    6b    7a    7b    8a    8b    9a    9b
r² (24 stations)    0.64  0.62  0.66  0.65  0.66  0.65  0.64  0.62
r² (22 stations)    0.83  0.83  0.84  0.84  0.83  0.83  0.82  0.83
r  (22 stations)    0.91  0.91  0.92  0.92  0.91  0.91  0.91  0.91
α  (slope)          0.87  0.93  0.54  0.57  0.60  0.64  0.72  0.76
β  (cut-off)          39    42    39    42    37    40    37    40

Source:  Prahm and Christensen (1977).

-------
     This study is notable for its use of model evaluation to study the
structure of the model and look at many possible parameter combinations
for input.  However, because of the problems with use of regression
analysis, the significance of the differences in slopes and intercepts
for the various cases is not clear.

f.   Study of TRAPS, CALINE-2, HIWAY, and AIRPOL-4 Models

     A study of four line-source models incorporating the Gaussian disper-
sion formula was carried out by Maldonado and Bullin (1977), using CO data
from five experimental programs.  This study was a part of the development
activity for a new model, the Texas Roadway Air Pollution Simulator
(TRAPS).  Results are given in Table A-14 and Figure A-5.

     When TRAPS was compared with the other three models over the com-
plete data set, it was found to be superior overall.  It had lower
average error (an indication of bias), lower mean-squared error (an
indication of general agreement with measured concentrations), and higher
percentages of results within ±1 ppm and ±2 ppm of measured concentra-
tions.  The regression lines in Figure A-5 indicate that the TRAPS Model
most closely approaches the ideal relationship, although the authors give
no details of the regression statistics.  These regressions show graphically
that the HIWAY Model exhibited the worst fit of the predicted to the
measured concentrations.  HIWAY also has the highest mean-squared error for
all of the data sets except one.  This effect apparently was due to a small
number of extremely large errors, both positive and negative, since neither
the average error nor the percentages of results within ±1 ppm and ±2 ppm
were consistently different for HIWAY relative to the other three models.
Again, we note the use of regression analysis in model evaluation.
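
The four summary measures used in this comparison (average error, mean-squared error, and the percentages of results within ±1 ppm and ±2 ppm) can be computed as in the following sketch.  The function name, the sign convention (predicted minus measured), and the CO values are assumptions made for illustration; they are not taken from Maldonado and Bullin (1977).

```python
def error_summary(measured, predicted):
    """Average error, mean-squared error, and percent of results within 1 and 2 ppm."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two equal-length, non-empty series")
    # Assumed sign convention: error = predicted - measured.
    errors = [p - m for m, p in zip(measured, predicted)]
    n = len(errors)
    return {
        "n": n,
        "average_error": sum(errors) / n,
        "average_squared_error": sum(e * e for e in errors) / n,
        "pct_within_1ppm": 100.0 * sum(1 for e in errors if abs(e) <= 1.0) / n,
        "pct_within_2ppm": 100.0 * sum(1 for e in errors if abs(e) <= 2.0) / n,
    }


# Hypothetical CO concentrations (ppm); one prediction is badly wrong.
measured = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.5, 4.5, 15.0]
summary = error_summary(measured, predicted)
print(summary)
```

In this illustrative case three of four predictions fall within ±1 ppm, yet the single large error dominates the mean-squared error, which is the kind of behavior suggested above for HIWAY.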

g.   Study of APRAC-1A Model

     A validation study of the APRAC-1A Model was carried out by Dabberdt
et al. (1973).  APRAC-1A is an urban diffusion model intended for pre-
dicting concentrations of inert, vehicle-generated pollutants.  The model
was evaluated using observations from St. Louis.  Meteorological data


-------
TABLE A-14.   STATISTICAL RESULTS FROM THE MALDONADO AND BULLIN STUDY

Data set                Statistic                 TRAPS  CALINE-2     HIWAY  AIRPOL-4
Tennessee               No. of data pts             499       459       503       503
                        Av error, ppm               0.4       1.6       1.1       1.9
                        Av sqd error, ppm²          1.1       4.0       4.0       4.7
                        % within ±1 ppm              74        33        43        27
                        % within ±2 ppm              95        62        76        66

North Carolina          No. of data pts             274       274       274       274
                        Av error, ppm               0.0       0.2      -1.1       1.0
                        Av sqd error, ppm²          2.9       4.7      11.5       4.2
                        % within ±1 ppm              48        49        45        61
                        % within ±2 ppm              65        74        85        74

Virginia                No. of data pts             170       186       186       186
                        Av error, ppm              -0.5      -0.1      -0.9       0.0
                        Av sqd error, ppm²          1.1       1.0       8.5       0.9
                        % within ±1 ppm              74        75        67        64
                        % within ±2 ppm              95        95        85        95

Illinois                No. of data pts             132       132       132       132
                        Av error, ppm               0.6       1.2      -1.9       0.9
                        Av sqd error, ppm²          1.3       2.4      10.5       1.3
                        % within ±1 ppm              67        41        46        60
                        % within ±2 ppm              92        83        70        82

California              No. of data pts             211       211       211        74
                        Av error, ppm              -0.2       0.2      -1.0       0.3
                        Av sqd error, ppm²          3.1       2.7      39.9       3.5
                        % within ±1 ppm              55        62        47        56
                        % within ±2 ppm              84        84        73        80

Cumulative comparison   No. of data pts            1168      1262      1306      1161
                        Av error, ppm               0.1       0.8      -0.3       1.2
                        Av sqd error, ppm²          1.6       3.3      12.7       3.5
                        % within ±1 ppm              65        48        48        47
                        % within ±2 ppm              91        76        74        73

            Source:  Maldonado and Bullin (1977).

-------
 Source:  Maldonado and Bullin (1977).
FIGURE A-5.   COMPARISON OF CO MEASUREMENTS  AND PREDICTIONS
              FROM FOUR MODELS EXAMINED BY MALDONADO AND
              BULLIN

-------
were obtained from the St. Louis Airport and the NWS station at Salem,
Illinois.  Traffic data were obtained from the Missouri Highway Depart-
ment, except for average vehicle speeds and diurnal traffic cycles
measured in the downtown area.  It was felt that using data that were
not specially collected represented the way the model would be applied
by a user and therefore would be a realistic test of the applicability
and accuracy of the model.

     The results are shown in Figures A-6 and A-7, which give the diurnal
variations in measured and predicted CO concentrations for two weeks
and the measured and predicted cumulative frequency distributions of
CO concentrations.  The root-mean-square difference between measured
and predicted concentrations ranged from 2.6 to 3.9 ppm.  Calibration
of the model reduced this difference to 1.6 to 3.3 ppm.  Correlation
coefficients for the different locations were in the range from 0.4 to 0.7.

     Of the possible sources of error in the model, Dabberdt et al.  cited
two as being most likely to account for the discrepancies  between measure-
ments and predictions.   First, the minimum transport speed was assumed
to be 1 m/s, which, it was suggested, is too low for urban areas.  This
error would result in a tendency to overestimate, particularly the higher
concentrations.   Second, the emissions submodel, which relates emissions
rate to average vehicle speed over a composite test route, was probably
inadequate to specify the microscale distribution of emissions.  The model
would probably predict average concentrations over an area better than
concentrations at a particular point.

-------
2.   NUMERICAL MODELS

     Having discussed several  evaluation studies  of  Gaussian models, we
now turn to studies of models  that incorporate  numerical  solutions of the
atmospheric diffusion equation.   As noted earlier, these  models are based
on the equations of conservation of mass for each pollutant species.  The
focus of this section, therefore, is on models  based on the solution of the
following equations [see, for example,  Reynolds et al.  (1973)]:
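
$$
\frac{\partial c_i}{\partial t}
+ \underbrace{\frac{\partial (u c_i)}{\partial x}
+ \frac{\partial (v c_i)}{\partial y}
+ \frac{\partial (w c_i)}{\partial z}}_{\text{Advection Terms}}
= \underbrace{\frac{\partial}{\partial x}\left(K_H \frac{\partial c_i}{\partial x}\right)
+ \frac{\partial}{\partial y}\left(K_H \frac{\partial c_i}{\partial y}\right)
+ \frac{\partial}{\partial z}\left(K_V \frac{\partial c_i}{\partial z}\right)}_{\text{Turbulent Diffusion Terms}}
+ R_i(c_1, \ldots, c_n, T) + S_i(x, y, z, t) - W_i,
\qquad i = 1, 2, \ldots, n
$$

where

               c_i = time-averaged concentration of species i,
           x, y, z = Cartesian coordinates, with z the vertical coordinate,
           u, v, w = components of the wind vector in the x, y, and z
                     directions, respectively,
                 t = time,
                 n = number of species,
          K_H, K_V = horizontal and vertical turbulent diffusivities,
                     respectively,
                 T = temperature,
               R_i = volume rate of production of species i through chemical
                     reactions,
               S_i = rate of emission of species i from volume sources,
               W_i = rate of removal of species i through scavenging
                     mechanisms.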

-------
These equations form the basis  of all the numerical models considered in
this section.   Solution of the  equations  in  the  form given can be extremely
complex; gathering, preparing,  and supplying the requisite input data can
also be tedious and time-consuming.   Thus, simplifying assumptions are
frequently made, leading to more easily solvable forms of the equation  or
to reduced  requirements  for  input information.  To some extent,  there  is a
trade-off between the degree of simplification on the one hand and the
expected accuracy and reliability of prediction  on the other.  Since these
models are formulated in a fundamentally  different manner from Gaussian
models, they may present different problems  in evaluation.
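
To make the idea of a numerical solution concrete, the following sketch advances a heavily simplified form of the conservation equation: one inert species, one spatial dimension, constant wind speed and diffusivity, and no chemistry, sources, or removal.  The explicit upwind scheme, the periodic boundaries, and all parameter values are assumptions chosen for brevity; this is not the numerical method of any model discussed in this appendix.

```python
def step(c, u, k, dx, dt):
    """One explicit time step: upwind advection (u >= 0) plus central diffusion.

    Periodic boundaries are used so that total mass is conserved.
    """
    n = len(c)
    new = []
    for j in range(n):
        left = c[j - 1]              # c[-1] wraps around for j == 0
        right = c[(j + 1) % n]
        advection = -u * (c[j] - left) / dx
        diffusion = k * (right - 2.0 * c[j] + left) / dx ** 2
        new.append(c[j] + dt * (advection + diffusion))
    return new


# Hypothetical setup: 20 cells of 1 km, 100 s steps, 2 m/s wind, K = 50 m2/s.
dx, dt, u, k = 1000.0, 100.0, 2.0, 50.0
# The scheme is stable and non-negative when u*dt/dx + 2*k*dt/dx**2 <= 1.
assert u * dt / dx + 2.0 * k * dt / dx ** 2 <= 1.0

c = [0.0] * 20
c[5] = 100.0                         # initial puff in cell 5
for _ in range(50):
    c = step(c, u, k, dx, dt)

print(sum(c), c.index(max(c)))       # total mass and location of the peak
```

Even this reduced case shows the trade-off noted above: the scheme is easy to solve, but the upwind differencing spreads the puff more than the physical diffusivity alone would.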

     The numerical models described herein fall  into three categories:  grid,
trajectory, and box models.  In formulations of  grid models, the region of
interest is divided into a three-dimensional array of "cells," each perhaps
1 to 4 kilometers on a side and on the order of 10 to several hundred meters
high.  In the  trajectory approach, a hypothetical column of air advected by
the wind is followed through the modeling region. The box model, which is
conceptually the simplest, treats the entire region of interest as a well-
mixed cell.

     Since grid models predict  pollutant  concentrations averaged over a
complete cell, the problem of representativeness mentioned earlier becomes
an issue.  Care should be exercised in  the placing of pollutant monitors
for collecting comparison data  so that  the point measurement  taken by the
monitor is representative of average concentration over an area the  size
of a grid cell.  This problem is of importance for both primary and
secondary pollutants.  For example, CO  and NO concentrations  in the  imme-
diate vicinity of a source can  deviate  significantly from  the  spatially
averaged values.  For secondary pollutants such as NO2 and O3, some time
must elapse between the release of their precursors and their ultimate
formation through photochemical reactions, and this time allows for the
pollutant cloud to become more spatially homogeneous.  Nevertheless, micro-
scale phenomena affect the concentrations of these pollutants as, for example,
in the depletion of O3 in the vicinity of a roadway by fresh emissions of NO.

-------
     Liu et al.  (1976a)  reported three previous  evaluation  studies:

     >  "Further Development and Evaluation of a Simulation
        Model for Estimating Ground Level  Concentrations  of
        Photochemical Pollutants," R73-19, Systems  Applications,
        Incorporated, Beverly Hills (now in San  Rafael),  Califor-
        nia (February 1973).
     >  "Evaluation of a Diffusion Model of Photochemical Smog
        Simulation," EPA-R4-73-012, Volume A (CR-1-273),
        General  Research Corporation, Santa Barbara, California
        (October 1972).
     >  "Controlled Evaluation of the Reactive Environmental
        Simulation Model (REM)," EPA-R4-73-013a, Volume I,
        Pacific Environmental Services, Incorporated, Santa
        Monica, California (February 1973).

The models included were the SAI Airshed Model,  the GRC model  DIFKIN,  and
the PES model REM.  The SAI model is a grid model;  the other  two are
trajectory models.  Data for four pollutants from six measurement  stations
in the Los Angeles basin for six smoggy days in  late summer and early
fall of 1969 were used  for comparison.  The performance measures used
to evaluate the models were:

     >  Correlation coefficients.
     >  Root-mean-square deviation between measurements and
        predictions.
     >  χ² test on the residuals, comparing them with a normal
        distribution.
     >  Scatter plots of predictions versus measurements.
     >  Plots of residuals against
        -  Time of day
        -  Predicted concentrations
        -  Measured concentrations.
     >  Histograms of residuals.

-------
The above set of evaluation methods was chosen to detect both random and
systematic failure of the models to account for the observed results.
However, Liu et al. commented, "the results of statistical  tests are
relatively insensitive indicators of model performance because of the
limited quantity of data, the varying conditions and assumptions,  the
nondistributional character of the data, and the complexity of the
potential source of error.  One should not substitute statistical  analysis
results for an examination of the plots."

     The correlation coefficients were mostly between 0.5 and 0.9, which
is generally higher than results achieved by Gaussian models for one-hour
averages.  Correlations were higher for CO and NO, both primary pollutants,
than they were for NO2 and O3, which are secondary pollutants.

     The main conclusion from the study was that none of the three models
could be said to have been adequately validated, mainly because of the
sparseness and unrepresentativeness of the data base.  This illustrates the
point that evaluation studies utilizing existing data are less likely to
reach satisfactory conclusions than are studies for which the data-gathering
effort is an integral part of the evaluation effort.  In addition, we note
that these efforts represent an early stage in the development of photo-
chemical models; a considerable amount of developmental work has been
carried out subsequent to these studies.

     A validation study of a later version of the SAI Airshed Model was
carried out by Anderson et al. (1977).  Predictions of ozone concentra-
tions were compared with measurements for three days in 1975-1976 at
nine Denver, Colorado, monitoring stations.  The average time variations in
predicted and measured ozone concentrations are shown in Figure A-6.  Note
that the predictions follow measurements fairly closely.  It was also
shown in this study that the residuals compared very closely in both mean
and standard deviation with the expected error distribution of a measuring
instrument (Figure A-7).  The authors concluded that the observational
evidence is not precise enough to establish confidence limits on the
model predictions.  Overall, the correspondence of predictions to measure-
ments is very close, both for concentrations above the NAAQS for ozone and
for all  of the data (see Table A-15).

-------
[Figure A-6: plot not reproduced.  Horizontal axis: time of day by hourly
averaging period; vertical axis: ozone concentration (pphm).  Observed and
predicted curves are shown for 29 July 1975, ?8 July 1976 (first digit
illegible), 3 August 1976, and the mean of the 3 days.]
          Source:  Anderson et al. (1977).
            FIGURE A-6.  TIME VARIATIONS OVER ALL STATIONS OF OBSERVED ONE-HOUR-AVERAGE OZONE CONCENTRATIONS
                         AND THE CORRESPONDING PREDICTIONS OBTAINED FROM THE SAI AIRSHED MODEL

-------
[Figure A-7: plot not reproduced.  Horizontal axis: difference (pphm).
Curves shown: (1) deviation of predicted versus observed points from the
perfect correlation line (281 one-hour-average data points); (2) (true -
instrumental) error for an EPA acceptable monitor (mean bias = -8 percent;
±3 pphm at the 95 percent confidence level); (3) (true - instrumental)
maximum probable error (mean bias = -8 percent; ±7 pphm at the 95 percent
confidence level).]
           Source:   Anderson et al. (1977).
               FIGURE A-7.   PREDICTIONS OF THE SAI AIRSHED MODEL COMPARED WITH ESTIMATES OF INSTRUMENT
                              ERRORS FOR OZONE (DATA FOR 3 DAYS, 9 STATIONS, DAYLIGHT HOURS)

-------
         TABLE A-15.  OCCURRENCE OF CORRESPONDENCE LEVELS OF
                      PREDICTED AND OBSERVED OZONE
     (percentage of comparisons meeting correspondence level)

                                                          Both Predicted
                                                           and Observed
        Correspondence Level                  All         Concentrations
Between Predicted and Observed Pairs      Comparisons        >8 pphm

Factor of 2 (2P > O > P/2)                    80%              94%

Computed value is within ± twice              100              100
standard deviation of maximum probable
instrument error (95% level) of
observed value

Computed value is within ± standard            93               90
deviation of maximum probable
instrument error (95% level) of
observed value

Computed value is within ± twice               89               77
standard deviation of instrument
errors by EPA standard (95% level)
of observed value

Computed value is within ± standard            60               37
deviation of instrument errors by
EPA standard (95% level) of observed
value

Source:  Anderson et al. (1977).

-------
     CO, NO, and NO2 concentrations were predicted less well.  Discrepancies
in the CO results were attributed to microscale effects not included in
the model's formulation.  Observational data for NO and NO2 were
insufficient to evaluate the model.

     Anderson et al. (1977) drew the following  conclusions from their
study of the SAI Airshed Model:

     >  The model is a very good predictor of one-hour-average
        ozone concentrations in  grid cells in the Denver  region.
     >  The model's ozone predictions at  any  given  station
        probably have at least as narrow  an error distribution
        as do measurements at the same station.
     >  If predictions and measurements are equally accurate,
        model predictions can be expected to  be  within a  factor
        of 2 of true concentrations  80 percent  of the time,
        and its predictions of exceedances of the NAAQS for  ozone
        at that time (i.e., more than 8 pphm) could be expected
        to be within that factor 94  percent of  the  time.
     >  The accuracy of model predictions of  regional maximum
        ozone concentrations should  exceed the  accuracy of model
        predictions of concentrations at  specific stations.
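
The factor-of-2 correspondence measure quoted above (2P > O > P/2, as in Table A-15) can be sketched as follows. The function name and the optional `minimum` threshold argument are assumptions of this sketch; the source does not describe how Anderson et al. computed the percentages.

```python
def fraction_within_factor(predicted, observed, factor=2.0, minimum=None):
    """Fraction of pairs satisfying factor*P > O > P/factor.

    If `minimum` is given, only pairs in which both the predicted and the
    observed value exceed it are counted (e.g., 8 pphm for ozone).
    """
    pairs = [(p, o) for p, o in zip(predicted, observed) if p > 0 and o > 0]
    if minimum is not None:
        pairs = [(p, o) for p, o in pairs if p > minimum and o > minimum]
    if not pairs:
        raise ValueError("no usable prediction/observation pairs")
    hits = sum(1 for p, o in pairs if p / factor < o < p * factor)
    return hits / len(pairs)
```

For example, with predictions of 10, 10, 10, 10 pphm and observations of 6, 19, 9, 4 pphm, three of the four pairs agree within a factor of 2; restricting the comparison to pairs above 8 pphm leaves two pairs, both of which agree.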

     A verification of the Livermore Region Air Quality Model (LIRAQ)
(MacCracken et al., 1977) was carried out by MacCracken and his coworkers
(MacCracken and Sauter, 1975).  LIRAQ is a two-dimensional grid model
(i.e., it has no vertical resolution).  This was a very comprehensive
study, covering data for CO, HC, NO2, NO, and O3 for two days in 1973.
Many different evaluation statistics were computed, as listed in Table
A-16.  Results from the two study days are shown in Table A-17.  Generally,
the temporal correlation coefficients are higher than the station
correlations, which indicates that the model follows the temporal trends
in pollutant concentrations better than the spatial trends.  As pointed
out above, this effect is to be expected in a grid model, since predicted
concentrations are averaged over relatively large grid squares and the

-------
TABLE A-16.  STATISTICAL MEASURES USED IN VERIFICATION OF THE
             LIRAQ PHOTOCHEMICAL MODEL

Statistical Measure    Definition

Mean R_time            Mean correlation coefficient in time, given by
                       [equation illegible in source], where R_i is the
                       correlation coefficient for measurements and
                       predictions based on one-hour-averaged station
                       records at the ith station, and n_i is the number
                       of one-hour-average records at the ith station.

Median R_time          Median of the correlation coefficients developed
                       for each of the stations.

R_station means        Correlation coefficient for the measured and
                       predicted mean concentrations at the stations,
                       where the averaging is over only those hours
                       during the air quality simulation for which
                       measurements exist.

R_station maxima       Correlation coefficient for the measured and
                       predicted maximum concentrations at the various
                       stations.

<P>/<O>                Ratio of the average predicted (P) to the average
                       measured (O) concentration, where the average is
                       over all one-hour station periods within the
                       simulation period for which measurements exist.

<P_max>/<O_max>        Ratio of the average of the predicted maximum
                       hourly concentration at each of the stations to
                       the average of the measured maximum one-hour-
                       average concentrations at the stations.

RMS                    Root-mean-square deviation between predicted and
                       measured one-hour-average station records, based
                       on all of the observed data.

R                      Correlation coefficient between predictions and
                       measurements based on all of the one-hour-average
                       concentration measurements.  (Of all the
                       correlation coefficients calculated, only R is
                       based on a sample size large enough to be used
                       without a substantial correction for the degrees
                       of freedom.)

Source:   MacCracken and Sauter (1975).

-------
TABLE A-17.   STATISTICS FOR THE LIRAQ MODEL EVALUATION STUDY

                                               Value
Pollutant   Statistic            26 July 1973     20 August 1973

CO          Mean R_time              0.67              0.35
            Median R_time            0.70              0.47
            R_station means          0.58              0.59
            R_station maxima         0.69              0.86
            <P>/<O>                  0.76              1.0
            <P_max>/<O_max>          0.86              0.81
            RMS                      1.5               0.77
            R                        0.56              0.58

HC          Mean R_time              0.62              0.54
            Median R_time            0.74              0.56
            R_station means          0.43              0.62
            R_station maxima         0.25              0.74
            <P>/<O>                  0.80              0.79
            <P_max>/<O_max>          0.89              0.78
            RMS                      1.4               1.00
            R                        0.54              0.60

NO2         Mean R_time              0.44              0.33
            Median R_time            0.79              0.50
            R_station means          0.48              0.33
            R_station maxima         0.28              0.30
            <P>/<O>                  0.88              1.31
            <P_max>/<O_max>          1.04              1.12
            RMS                      3.13              1.76
            R                        0.54              0.34

NO          Mean R_time              0.71              0.64
            Median R_time            0.80              0.53
            R_station means          0.85              0.37
            R_station maxima         0.46              0.86
            <P>/<O>                  1.82              0.91
            <P_max>/<O_max>          1.65              0.87
            RMS                      6.1               1.95
            R                        0.72              0.62

O3          Mean R_time              0.79              0.74
            Median R_time            0.84              0.86
            R_station means          0.68              0.62
            R_station maxima         0.79              0.64
            <P>/<O>                  0.85              1.56
            <P_max>/<O_max>          0.80              1.4
            RMS                      2.4               1.94
            R                        0.78              0.73

Source:  MacCracken and Sauter (1975).

-------
measurements with which they are compared are taken at particular points
within grid squares.  The RMS errors vary widely, but in the absence of
any indication of the accuracy or precision of the instruments used to
obtain the measurements, it is not possible to assess the significance
of these figures.

     The authors concluded that for 26 July 1973 the model  gave a fair
overall fit to most measurements and a very good fit to the O3 data.
The model also gave a good representation of temporal trends for that
day.  It was also concluded that, though NO was overpredicted, other
species were slightly underpredicted, a result that the authors said was
expected considering the 5 km resolution of the model.

     The August 1973 day studied by MacCracken and Sauter (1975) was
characterized by strong winds and low pollutant concentrations.  It was
concluded that the spatial and temporal correlations for this day were
quite good for all species except NO2, which was overpredicted quite
markedly.  This overprediction was attributed to too high a boundary
concentration above the mixing layer.

     MacCracken and Sauter (1975) pointed out that statistical verification
measures can be heavily influenced by outlying points.  For instance, the
statistic <P>/<O>* can be substantially diminished by one large measurement
(particularly if few results are being compared).  They pointed out
that examination of concentration patterns as well as statistics can
alleviate this problem.  In addition, the use of statistics that are relatively
insensitive to outlying data values can help to minimize their effects
(see, for example, the study by Cleveland et al., 1976).  Care must be taken,
however, that the data values whose importance is being diminished are
truly spurious, and not actual high concentrations that the model is failing
to predict correctly.
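
The effect of a single outlying measurement on a ratio of averages can be illustrated with a short sketch. The data values are invented for illustration, and the ratio of medians is used here only as one example of an outlier-resistant statistic; it is an assumption of this sketch, not the method of Cleveland et al. (1976).

```python
from statistics import mean, median


def ratio_of_means(predicted, observed):
    """Average predicted divided by average measured concentration."""
    return mean(predicted) / mean(observed)


def ratio_of_medians(predicted, observed):
    """Outlier-resistant counterpart (illustrative choice only)."""
    return median(predicted) / median(observed)


# Hypothetical data: identical except for one large measurement.
predicted = [2.0, 2.0, 2.0, 2.0]
clean_obs = [2.0, 2.0, 2.0, 2.0]
outlier_obs = [2.0, 2.0, 2.0, 18.0]

print(ratio_of_means(predicted, clean_obs))      # ratio with no outlier
print(ratio_of_means(predicted, outlier_obs))    # pulled down by one value
print(ratio_of_medians(predicted, outlier_obs))  # unaffected by that value
```

With only four pairs, the one large measurement reduces the ratio of means from 1 to about one third, while the ratio of medians is unchanged.  As cautioned above, whether the large value is spurious must be settled before such a statistic is preferred.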

     The SAI Reactive Plume Model (RPM) was evaluated in the course of a
study to determine the feasibility of ozone formation in power plant plumes
(Tesche et al., 1976).  The results from 16 field experiments were used
to assess the performance of RPM.  Data obtained from aircraft flights

* The ratio of the average predicted to the average measured concentration
  (see Table A-16).

-------
operated by Meteorology Research,  Incorporated,  and  the University of Wash-
ington provided the data base for  initial  conditions,  and,  farther downwind,
concentration data for comparison  with model  predictions.

      The dilution  scheme used by RPM was examined using data from a
previous study with an inert tracer (SF6) (Liu et al., 1976b).  Good
agreement  between  RPM predictions and tracer measurements was observed, as
evaluated  visually from data plots, suggesting that the transport and dis-
persion portion of RPM performs quite satisfactorily.

      Comparisons of measured and predicted concentrations of reactive
species were made for NOx, O3, and SO2.  In the case of NOx, there were
discrepancies between the measurements by the two aircraft, and the model
predictions agreed more closely with the MRI results.  For both NOx and
O3, the model tended to overpredict.  The discrepancies were attributed
to uncertainty in the hydrocarbon measurements needed for the chemical
reaction calculations.  Later study showed that some hydrocarbon concentrations
used as input to RPM were too high, which would have had the
effect of inflating the predicted concentrations of both NO2 and O3.  Moreover,
a sensitivity study showed that the variable to which RPM predictions
were most sensitive was the ambient reactive hydrocarbon concentration.
Another source of uncertainty in the model predictions was the
background pollutant concentrations, which were obtained from the aircraft
traverses of the plume.  This information was used to prescribe the concentrations
of ambient pollutants entrained into the plume.  The aircraft measurements
were taken at particular locations and times, however, and cannot
completely describe the possible temporal and spatial variations in the
background reactive hydrocarbon concentration field.

      The overall conclusions of the study, based on graphical comparisons
of measured and predicted concentrations, were:

      >  The dilution of conservative pollutants and tracer
        material is described well by RPM.
      >  NOx and O3 were generally overpredicted in the study,
        probably in large part because of excessively high
        estimates of entrained hydrocarbons.


-------
      >   The accuracy of the model for ozone predictions at large
         downwind distances is of the order of ±25 percent.
         This accuracy is commensurate with the accuracy of the
         field data used.

      A  numerical  grid model developed by Shir and Shieh (1974) was evaluated
as  part of its development, with data for a 25-day period in St. Louis
during February 1965.  The model was developed to calculate SO2 concentrations,
and only first-order chemical reaction effects were considered.  The
air quality data  base employed in the study consisted of observations from
10  stations that  reported two-hour-averaged concentrations.  In addition,
there were instruments at those stations that recorded the 24-hour-average
concentrations.

     Figure A-8 shows  a  comparison of measured and  predicted  concentra-
tions averaged over the 25 days. Correlation coefficients of 0.873 for the
24-hour-average data and 0.899 for the  two-hour-average data were obtained.
No explanation was  offered by Shir and  Shieh (1974) as to why the correla-
tion  coefficient  for two-hour averages  was higher than that for 24-hour
averages.   Presumably this discrepancy  is an artifact of the different instru-
ments used  for the  measurements.
[Figure A-8: scatter plot not reproduced.  Legend: 24-hour-average data,
r = 0.873; 2-hour-average data, r = 0.899; a third legend entry is
illegible in the source.]
      Source:  Shir and Shieh (1974).
        FIGURE A-8.  COMPARISON OF MEASURED AND PREDICTED 25-DAY-
                     AVERAGED SO2 CONCENTRATIONS (1-26 FEBRUARY 1965)

-------
     Figure A-9 shows the comparison between measured and predicted 24-
hour-average concentrations; Figure A-9 (a) shows the temporal  variations
of the daily values, and Figure A-9 (b) illustrates the results in a set of
scatter plots.  Discrepancies were ascribed to (1) underestimated emissions
rates from the emissions inventory model, and (2) microscale effects that
cannot be modeled on the grid scale used.  The scatter plots show good agree-
ment for most stations, and the overall correlation coefficients between
measured and predicted values are 0.806 for the logarithms of the concen-
trations and 0.654 for the concentrations themselves.

     Figure A-10 shows the comparison between observed and predicted two-
hour-average concentrations.  The Shir and Shieh model predicts these
shorter-term averages less well, as may be seen by comparing Figures A-9
and A-10.  The correlations for the results in Figure A-10 were 0.706 for
the logarithms of the concentrations and 0.531 for the concentrations them-
selves.  We note that these correlation coefficients are lower than those
calculated for the 24-hour-average concentrations, in contrast to the station-
averaged data presented above.   Again, no explanation was advanced by the
authors for this discrepancy.
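
The difference between a correlation computed on concentrations and one computed on their logarithms can be sketched as follows. The helper names are hypothetical and nothing here reproduces the calculation of Shir and Shieh; the point is only that the logarithmic form reduces the weight of the few highest values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_and_log_correlation(predicted, observed):
    """Return (r for concentrations, r for log concentrations).

    All values must be positive for the logarithm to be defined.
    """
    r_linear = pearson(predicted, observed)
    r_log = pearson([math.log(p) for p in predicted],
                    [math.log(o) for o in observed])
    return r_linear, r_log
```

Applied to data spanning several orders of magnitude, the two coefficients can differ noticeably, which is why both are reported in the study discussed above.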

     The frequency distributions for the two-hour-average concentrations were
then classified according to different variables and compared.  As illustrated
in Figure A-11, pollutant frequency distributions were constructed based
upon the prevailing wind direction.

     Comparisons for four different wind sectors are shown.  The agreement
is best for the NE and SE wind directions.  For NW winds, concentrations
were overpredicted, and for SW winds, they were underpredicted.  The authors
hypothesized that since these winds often carry cold air and warm air,
respectively, the discrepancies may be due to the lagged response of emissions
rates to changing air temperatures.  This hypothesis was checked by constructing
the frequency distributions for different temperature ranges, shown in
Figure A-12.  Although the agreement is good for temperatures in the range
-3 < T < 3°C, concentrations are overpredicted for temperatures  below
-3°C and underpredicted for temperatures above 3°C.


-------
[Figure A-9: plots not reproduced.  Legend entries include 2-hour observed
data and 24-hour observed data.]
   (a)  24-Hour-Average Variations of SO2 Concentrations for a
        25-Day Period at Each Monitoring Station
   (b)  24-Hour-Average SO2 Concentrations at Each Station
  Source:  Shir and Shieh (1974).
   FIGURE A-9.   COMPARISON OF PREDICTIONS AND MEASUREMENTS OF SO2
                 CONCENTRATIONS IN ST. LOUIS REPORTED BY SHIR AND SHIEH

-------
[Figure A-10: plots of 2-hour data not reproduced.  Panels are labeled for
Stations #10, #12, #28, and #36.]
Source:  Shir and Shieh (1974).
   FIGURE A-10.  COMPARISON OF PREDICTED AND MEASURED TWO-HOUR-AVERAGED
                 SO2 CONCENTRATIONS AT EACH MONITORING STATION IN ST. LOUIS.
                 Dots represent measurements; lines represent predictions.

-------
[Figure A-11: frequency distribution plots not reproduced.  Legible panel
labels include S-W wind and S-E wind.]
                     Source:   Shir and Shieh (1974).
FIGURE A-11.  COMPARISON OF PREDICTED AND MEASURED TWO-HOUR-
              AVERAGED FREQUENCY DISTRIBUTION OF SO2 CONCENTRATION
              ACCORDING TO WIND SECTORS.
              Combined data from nine stations.

-------
[Figure A-12: frequency distribution plots not reproduced; panel labels
are illegible in the source.]
               Source:   Shir and Shieh (1974).
        FIGURE A-12.  COMPARISON OF FREQUENCY DISTRIBUTIONS OF PREDICTED
                     AND MEASURED TWO-HOUR-AVERAGE SO2 CONCENTRATIONS
                     ACCORDING TO AMBIENT TEMPERATURE RANGE

     In Figure A-13, three station frequency distributions are shown.
Good agreement is found for two of the stations, but at Station 10, which
is near point sources, concentrations are underpredicted.
     In conclusion, the authors cited the following advantages of the
model:
     >  Consistent performance under different conditions,
        particularly in both strong and light winds, and
        for sudden shifts in wind.
     >  Flexibility in handling arbitrary distributions of
        sources with different emissions rates, spatial and
        temporal variations in the wind field, eddy diffusion
        coefficients, stability, and mixing height.
     >  Ability to deal with surface roughness, topographical
        features, and chemical reactions (although only a
        limited treatment of chemistry was included in the model
        tested).

-------
[Figure A-13: frequency distribution plots not reproduced.  Legend:
computed and observed; legible station labels include #10 and #12.]
            Source:   Shir and Shieh (1974).
   FIGURE A-13.   COMPARISON OF FREQUENCY DISTRIBUTIONS OF PREDICTED
                  AND MEASURED TWO-HOUR-AVERAGE SO2 CONCENTRATIONS
                  FOR THREE STATIONS

The following disadvantages were  pointed  out:

     >  Neglect of microscale effects.
     >  Possible problems with  the  numerical  integration
        methods; these could be solved  by using  more
        accurate methods.
     >  Neglect of the turbulence attributable to  the effect
        of the urban area.
     >  Inaccurate representation of  temperature dependence
        of emissions.
     >  Lack of representativeness  of the data from monitor-
        ing stations.

-------
     The studies cited here indicate that evaluation  studies  of  numeri-
cal models have, in general, been more thorough  and comprehensive  than
studies of Gaussian models.  This is probably  a  result  of  their  compre-
hensive nature as models; these models are applied to multiple pollutants
with chemical reactions and cover a complete urban area.   Thus,  a  numeri-
cal model uses comprehensive inputs and produces much output, requiring
a fairly comprehensive study to evaluate its various  aspects.

3.  MODEL EVALUATION METHODOLOGY

     In the above discussion, we reviewed many model  evaluation  studies.
Obviously much effort has been directed towards  evaluating air quality
models.  However, relatively little work has addressed  the considerations
to be taken into account when designing such a study—the  appropriate level
and methods of data collection and suitable performance measures and
standards.  We now discuss three studies that  deal with the methodology
of model evaluation.   Brier (1973) examined the often-used practice of
"calibrating" models with observed data.  Calibration consists of  calcula-
ting a linear regression of observed on calculated values  for a  given set
of station measurements and then using this derived relationship to con-
vert future computed values to "true" concentrations.  Brier  concluded
that calibration "is not a statistically valid procedure when used to make
predictions of air quality for a distribution  of emissions differing from
that under which the calibration was actually  established."  Brier based  this
conclusion on two considerations:

     >  The data sample for which the calibration  relationship
        is derived does not correspond to the  situation to
        which it is later applied.  In statistical terms,  the
        population from which the relationship is  derived  is  not
        the same as the one to which it is applied.
     >  Use of the regression procedure entails  the  assumption
        that the data points are statistically independent.
        The data used in calibration do not satisfy  this  assump-
        tion because of spatial and temporal correlations of
        concentrations.
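
The calibration procedure that Brier examined, a linear regression of observed on calculated values that is then applied to later computed values, can be sketched as follows. The function names are hypothetical, and ordinary least squares is assumed; the sketch shows the mechanics only and does not address the two objections listed above.

```python
def fit_calibration(calculated, observed):
    """Least-squares fit of observed = intercept + slope * calculated."""
    n = len(calculated)
    mean_c = sum(calculated) / n
    mean_o = sum(observed) / n
    sxx = sum((c - mean_c) ** 2 for c in calculated)
    sxy = sum((c - mean_c) * (o - mean_o)
              for c, o in zip(calculated, observed))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_c
    return slope, intercept


def calibrate(value, slope, intercept):
    """Convert a later computed value to a 'calibrated' concentration."""
    return intercept + slope * value
```

The fitted slope and intercept describe only the sample from which they were derived; applying them to a different emissions distribution is the step Brier found statistically invalid.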

-------
 Brier  suggested  that,  if  no alternative  to calibration were available, the
 procedure might  be made more meaningful  by:

      >  Improvement  of measurement and sampling  techniques.
      >  Making separate calibrations  for different  sets of
         conditions,  e.g., for  different  meteorological
         situations.
      >  Use of different  statistical  models.

      In a later study, Brier (1975) examined some statistical questions
 relating to comparisons between predicted and measured values.  He discussed
 the relationships among errors in model inputs, errors due to
 imperfection in the model, and errors in the observations used for validation.
 Suggestions were made as to estimating separately the input and
 output data uncertainties and the errors introduced by model inadequacies.
 However, the methods required some assumptions about independence of error
 sources, and Brier concluded that more work was necessary to see if a solution
 to this problem could be found.  Finally, recommendations on model evaluation
 procedures were made.  First, Brier stated that "it does not seem
 desirable to specify a fixed set of rules to be followed blindly under all
 conditions.  However, certain general guidelines and suggestions can be
 provided that should be applicable in most cases to give assistance in
 planning and executing a validation study."  A sensitivity analysis of the
 model under study was suggested as an important requirement for verification.
 The purposes of the sensitivity analysis in this context would be:

     >  To  reveal internal inconsistencies in the model.
     >  To  identify  the parameters that  dominate the model's
         operation.
     >  To  provide guidance for data  collection.
     >  To  investigate error propagation  through the model.
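
One simple form of sensitivity analysis, perturbing one input at a time and recording the relative change in output, can be sketched as follows. The toy model, its parameter names, and the one percent step are all hypothetical and are used only for illustration; Brier does not prescribe a particular procedure.

```python
def toy_model(emission_rate, wind_speed, mixing_height):
    """Hypothetical box-model-like relation, for illustration only."""
    return emission_rate / (wind_speed * mixing_height)


def normalized_sensitivities(model, inputs, rel_step=0.01):
    """Central-difference estimate of (dC/C) / (dx/x) for each input."""
    base = model(**inputs)
    result = {}
    for name, value in inputs.items():
        up = dict(inputs, **{name: value * (1 + rel_step)})
        down = dict(inputs, **{name: value * (1 - rel_step)})
        result[name] = (model(**up) - model(**down)) / (2 * rel_step * base)
    return result
```

A coefficient near +1 or -1 means that a given percentage error in that input produces a comparable percentage error in the predicted concentration, which helps identify the dominant parameters and guide data collection.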

After  the sensitivity  analysis  is  complete, further analysis would center
on comparisons between observed  and predicted  concentrations.  Statistics
such as  correlation  coefficients,  variances of observed and predicted results,
mean square errors,  mean absolute  differences, statistics of the regression
of observed on predicted concentrations,  and  characteristics of the error

-------
distribution were suggested for evaluation.   Attention should be given to
departures from normal distributions in evaluating some of these statistics.
Brier suggested that all of these results should be considered in evaluat-
ing the model, since they can all provide information on different aspects
of the comparison between observed and predicted concentrations.
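
Several of the comparison statistics listed above can be computed together, as in the following sketch. The function name and dictionary keys are hypothetical, and population (rather than sample) variances are an assumption of the sketch.

```python
from statistics import mean, pvariance


def comparison_statistics(predicted, observed):
    """Summary statistics for paired predicted and observed values."""
    errors = [p - o for p, o in zip(predicted, observed)]
    return {
        "mean_error": mean(errors),
        "mean_square_error": mean([e * e for e in errors]),
        "mean_absolute_difference": mean([abs(e) for e in errors]),
        "variance_predicted": pvariance(predicted),
        "variance_observed": pvariance(observed),
    }
```

Reporting these quantities side by side follows Brier's suggestion that no single statistic be relied upon, since each describes a different aspect of the comparison.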

     Nappo (1974) also studied methods for evaluating air quality
models.  He pointed out that judging the relative merits of different models
that embody both temporal and spatial resolution on the basis of temporal
correlation coefficients at various stations is inadequate.  He also pointed
out that such models should be judged not only on how well they follow
temporal trends, but also on the accuracy of their spatial predictions.  He
defined the quantities R(t)s, the average over all monitoring stations of
the temporal correlation coefficients, and R(s)t, the time average of the
spatial correlation coefficients.  Ideally, both quantities should be equal
to 1, and the model's overall quality is to be judged by how closely this goal
is approached.  Figure A-14(a) shows a plot of these quantities for nine air
quality simulation models.  Eight of these models show better temporal
correlations than spatial correlations.  Thus, conclusions drawn on the basis
of one of these criteria are not necessarily valid when the other criterion
is considered.  In addition to the model's ability to follow trends in the
data, measured by the correlation coefficient, Nappo pointed out that an
important feature of performance is the model's ability to predict the correct
quantities of pollutants formed.  This ability is measured by the terms r(t)s, the
space average of the time-averaged ratios of predicted to measured concentrations,
and r(s)t, the time average of the space-averaged ratios.  Since these
quantities both represent the grand average of all ratios, they are equal, but
their standard deviations are not.  Denoting the respective standard deviations
by σ(t)s and σ(s)t, he derived the plot shown in Figure A-14(b).  Examination
of the data in that figure shows that the models considered vary widely in
the certainty with which they predict the amounts of pollutant at a particular
station as opposed to at a particular time.  Nappo contended that only by
considering all aspects of a model's performance can a true picture of its
utility be obtained and can different models be adequately compared.
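
The two averaged correlation coefficients described above can be sketched as follows, assuming the data are arranged as one list of hourly values per station. The function names and data layout are assumptions of this sketch and are not taken from Nappo (1974).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def averaged_correlations(predicted, observed):
    """Return (station average of temporal r, time average of spatial r).

    predicted[s][t] and observed[s][t] hold the value at station s, time t.
    """
    n_stations = len(observed)
    n_times = len(observed[0])
    temporal = [pearson(predicted[s], observed[s]) for s in range(n_stations)]
    spatial = [pearson([predicted[s][t] for s in range(n_stations)],
                       [observed[s][t] for s in range(n_stations)])
               for t in range(n_times)]
    return sum(temporal) / n_stations, sum(spatial) / n_times
```

A model can score well on the first quantity and poorly on the second, for example when every station's time trend is reproduced but the high and low stations are interchanged, which is the situation Nappo's two-criterion comparison is meant to expose.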

-------
[Figure A-14: plots not reproduced.  Panel labels (a) and (c) are legible
in the source.  Key to air quality models (publication years omitted where
illegible): Roth et al. (1971); Reynolds et al.; Hanna (1973); Pandolfo
and Jacobs (1973); Sklarew et al. (1972); Lamb and Neiburger; MacCracken
et al.; 24-hr persistence.]
                Source:   Nappo (1974).

                               FIGURE A-14.  STATISTICAL QUANTITIES CALCULATED FOR NINE AIR QUALITY
                                              SIMULATION MODELS.  Quantities defined in text.

-------
     Although the work reviewed here addresses some statistical  aspects
of measuring model performance, it is evident that there has been a lack
of studies directed toward the development of air quality model  evaluation
methodology.  The material presented in the main body of this report
represents a first step to rectify this situation.

-------
                            REFERENCES


Anderson, G.E., et al. (1977),  "Air Quality  in  the Denver Metropolitan
     Region, 1974-2000," EPA-908/1-77-002, Environmental Protection Agency
     Region VIII, Denver, Colorado.

Argonne National Laboratory (1976), "Report  to  U.S.  EPA of the Specialists
     Conference on the EPA Modeling Guideline," Energy and Environmental
     Systems Division, Argonne, Illinois.

Bohac, R. L., et al.  (1974), "Sensitivity of the Gaussian Plume Model,"
     Atmos. Environ., Vol. 8, p. 291.

Brier, G. W. (1973), "Validity of the Air Quality Display Model Calibration,"
     EPA-R4-73-017, Environmental Protection Agency, Research
     Triangle Park, North Carolina.

Brummage, K. G. (1968), "The Calculation of  Atmospheric Dispersion from
     a Stack," Atmos. Environ., Vol. 2, pp.  197-224.

Burton, S. C., et al. (1976), "Oxidant/Ozone Ambient Measurement Methods: An
     Assessment and Evaluation,"  EF76-111R,  Systems  Applications,
     Incorporated, San Rafael, California.

Cleveland, W. S., et al. (1976), "Robust Statistical Methods and Photochemical
     Air Pollution Data," J. Air Pollut. Control Assoc., Vol. 26,
     pp. 36-38.

Dabberdt, W. F., et al. (1973), "Validation and Application of an Urban
     Diffusion Model  for Vehicular Pollutants," Atmos.  Environ., Vol. 7,
     p. 603.

Environmental Protection Agency [EPA] (1978a), "Guideline on Air Quality
     Models," EPA-450/2-78-027, Research Triangle Park, North Carolina.

________ (1978b), "Guidelines for Air Quality Maintenance Planning and
     Analysis, Volume 9 (Revised): Evaluating Indirect Sources," EPA-450/4-78-001,
     Research Triangle Park, North Carolina.

________ (1976), "Quality Assurance Handbook for Air Pollution Measurement,"
     EPA-600/9-76-005, Research Triangle Park, North Carolina.

________ (1973), "Guide for Compiling a Comprehensive Emission Inventory
     (Revised)," Research Triangle Park, North Carolina.

________ (1972), "Compilation of Air Pollutant Emission Factors," AP-42,
     Research Triangle Park, North Carolina.

Eschenroeder, A. Q., J. R. Martinez, and R. A. Nordsieck (1972), "Evaluation
      of a Diffusion Model for Photochemical Smog Simulation," CR-1-273, EPA-
      R4-73-012a, General  Research Corporation, Santa Barbara, California.

-------
Guzewich, D. C., and W.J.B. Pringle (1977), "Validation of the EPA-PTMTP
     Short-Term Gaussian Dispersion Model," J. Air Pollut. Control  Assoc.,
     Vol. 27, p. 540.

Hanna, S. R. (1973), "Urban Air Pollution Models—Why?"  ATDL Contribution
     File No. 83, Atmospheric Turbulence and Diffusion Laboratory,  Oak
     Ridge, Tennessee.

Hayes, S. R. (1979), "Performance Measures and Standards for Air Quality
     Simulation Models," EF78-93R, Systems Applications, Incorporated,
     San Rafael, California.

Hayes, S. R., S. D. Reynolds, and P. M. Roth (1977), "A Commentary on the
     Analysis of Control Measures Required to Achieve Compliance with the
     National Ambient Air Quality Standards:  The Selection of Models and
     the Specification of Data Requirements," Systems Applications, Incor-
     porated, San Rafael, California.

Hilst, G. R. (1978), "Plume Model Validation," EA-917-SY, Workshop WS-78-99,
     Electric Power Research Institute, Palo Alto, California.

Hougland, E. S. and N. T. Stephens (1976), "Air Pollutant Monitor Siting
     by Analytical Techniques," J. Air Pollut. Control Assoc., Vol. 26, p. 51.

 Koch,  R.  C., and  G.  E.  Fisher  (1973),  "Evaluation of the  Multiple  Source
      Gaussian  Plume  Diffusion  Model—Phase I,"  EPA-650/4-75-018a,  Environ-
      mental  Protection  Agency,  Research Triangle Park,  North  Carolina.

 Koch,  R.  C., and  S.  D.  Thayer  (1971),  "Validation and  Sensitivity  Analysis
      of  the  Gaussian  Plume  Multiple-Source Urban Diffusion Model,"
      EF-60,  GEOMET,  Incorporated, Gaithersburg, Maryland.

 Lamb,  R.  G., and  M. Neiburger  (1971),  "An  Interim Version of  a Generalized
      Urban Air  Pollution Model," Atmos. Environ., Vol.  5, pp. 239-264.

 Lawrence  Berkeley Laboratory  (1976),  "Instrumentation  for Environmental
     Monitoring," LBL-1, University of California, Berkeley, California.

Lenhard, R. W. (1970), "Accuracy of Radiosonde Temperature and Pressure-Height
     Determination," Bull. Am. Meteorol. Soc., Vol. 51, pp. 842-846.

Liu, M. K., D. A. Stewart, and P. M. Roth (1978), "An Improved Version of
     the Reactive Plume Model (RPM-II)," 9th International Technical Meeting
     on Air Pollution Modeling and its Application, 28-31 August 1978,
     Toronto, Canada.

-------
Liu, M. K., et al. (1977), "Development of a Methodology for Designing
     Carbon Monoxide Monitoring Networks," EPA-600/4-77-019,  Systems  Applications,
     Incorporated, San Rafael, California.

________ (1976a), "Continued Research in Mesoscale Air Pollution Simula-
     tion Modeling:  Volume I—Assessment of prior  Model  Evaluation  Studies
     and Analysis of Model Validity and Sensitivity,"  EPA-600/4-76-016a,
     Systems Applications, Incorporated, San Rafael, California.

________ (1976b), "The Chemistry, Dispersion, and Transport of Air Pollutants
     Emitted from Fossil Fuel Power Plants in California:  Data Analysis and
     Emission Impact Model,"  EF76-18, Systems Applications,  Incorporated,
     San Rafael, California.

MacCracken, M. C., and G. D.  Sauter, eds. (1975),  "Development of an Air
     Pollution Model for the  San Francisco Bay Area,"  Appendix 12-3,
     UCRL-51920, Lawrence Livermore Laboratory,  University of California,
     Livermore, California.

MacCracken, M. C., et al. (1977), "The Livermore Regional Air Quality Model:
     I.  Concept and Development," UCRL-77475 Pt.  1, Rev.  2, Lawrence
     Livermore Laboratory, University of California, Livermore, California.

________ (1971), "Development of a Multibox Air Pollution Model and Initial
     Verification for the San Francisco Bay Area," UCRL-73348, Lawrence
     Radiation Laboratory, Livermore, California.

MacCready, P. B. and H. R. Jex  (1963), "Response Characteristics  and Applica-
     tion Techniques of Some Meteorological Sensors,"  Meteorology Research,
     Inc., Altadena, California, and Systems Technology,  Inc., Inglewood,
     California.

Maldonado, C., and J. A. Bullin (1977),  "Modeling Carbon Monoxide Dispersion
     from Roadways," Environ. Sci. Technol., Vol. 11, p. 1071.

Mazarella, D.A.  (1972), "An  Inventory of Specifications for Wind Measuring
     Instruments," Bull. Am.  Meteorol. Soc., Vol. 53,  No. 9.

McElroy, J. L.  (1969),  "A Comparative Study of Urban and Rural Dispersion,"
     J. Appl. Meteorol., Vol. 8, pp. 19-31.

Meteorology Research Inc.  (1975), Technical  information courtesy of T. B.
     Smith.

 Mills, M.  T., and F. A. Record (1975),  "Comprehensive Analysis of Time-
      Concentration Relationships  and Validation  of a  Single-Source
      Dispersion Model," EPA-450/3-75-083, Environmental Protection Agency,
      Research Triangle Park, North Carolina.

-------
Mills, M. T., and R. W. Stern (1975), "Model Validation and Time-Concentration
     Analysis of Three Power Plants," EPA-450/3-76-002, Environmental  Pro-
     tection Agency, Research Triangle Park, North Carolina.

Nappo, C. J., Jr. (1974), "A Method for Evaluating the Accuracy of Air
     Pollution Prediction Models," in Preprints of the Symposium on
     Atmospheric Diffusion and Air Pollution, 9-13 September 1974, Santa
     Barbara, California (sponsored by American Meteorological Society,
     Boston, Massachusetts).

Pandolfo, J. P., and C. A. Jacobs (1973), "Tests of an Urban Meteorological
     Pollutant Model Using CO Validation Data in the Los Angeles Metropolitan
     Area," Vol. 1, EPA-R4-73-025a, The Center for the Environment and Man,
     Incorporated, Hartford, Connecticut.

Pooler, F., Jr. (1974), "Network Requirements for the St. Louis Regional Air
     Pollution Study," J. Air Pollut. Control Assoc., Vol. 24, p. 228.

Prahm, L. P., and M. Christensen (1977), "Validation of a Multiple Source
     Gaussian Air Quality Model," Atmos. Environ., Vol. 11, pp. 791-795.

Reynolds, S. D., et al. (1973), "Further Development and Evaluation of a
     Simulation  Model for Estimating Ground Level Concentrations of
     Photochemical Pollutants," EPA-68-02-0339, Systems Applications,  Incor-
     porated, San Rafael, California.

Roth, P. M., et  al. (1975),  "An Examination of the Accuracy and Adequacy of
     Air Quality Models and  Monitoring Data for Use in Assessing the Impact
     of EPA Significant Deterioration Regulations on Energy Developments,"
     EF75-58R, Systems Applications, Incorporated, San Rafael, California.

________ (1971), "Development of a Simulation Model for Estimating Ground
     Level Concentrations of Photochemical Pollutants," 71-SAI-21, Systems
     Applications,  Incorporated, San Rafael, California.

Rubin, E. S. (1974), "The Influence of Annual Meteorological Variations on
     Regional Air Pollution Modeling:  A Case Study of Allegheny County,
     Pennsylvania," J. Air Pollut. Control Assoc., Vol. 24, p. 349.

Seinfeld, J. H. (1972), "Optimal Location of Pollutant Monitoring Stations
     in an Airshed," Atmos. Environ., Vol. 6, p. 847.

Seinfeld, J. H.  (1977), "Current Air Quality Simulation Model Utility,"
     Department  of Chemical  Engineering, California Institute of Technology,
     Pasadena, California.

Shir, C. C., and L. J. Shieh (1974), "A Generalized Urban Air Pollution Model
     and Its Application to the Study of SO2 Distributions in the St. Louis
     Metropolitan Area," J. Appl. Meteorol., Vol. 13, p. 185.

-------
Shum, Y. S., et al. (1975), "The Use of Artificial Activable Trace Elements
     to Monitor Pollutant Source Strengths and Dispersal Patterns,"
     J. Air Pollut. Control Assoc., Vol. 25, p. 1123.

Sklarew, R. C., et al. (1971), "A Particle-in-Cell  Method for Numerical
     Solution of the Atmospheric Diffusion Equation,  and  Applications to
     Air Pollution Problems,"  Final  Report 3SR-844, Systems, Science and
     Software, La Jolla, California.

Tesche, T. W. (1978),  "Evaluating Simple Oxidant  Prediction Methods Using
     Complex Photochemical Models,"  Monthly Technical  Progress Narrative No. 1,
     EM78-14, Systems  Applications,  Incorporated,  San Rafael, California.

Tesche, T. W., et al.  (1976),  "Determination of the Feasibility of Ozone
     Formation in Power Plant  Plumes," EA-307, Electric Power Research
     Institute, Palo Alto, California.

Trijonis, J. C., and K. W. Arledge (1975), "Utility of Reactivity Criteria in
     Organic Emission Control  Strategies for Los  Angeles," TRW Environmental
     Services, Redondo Beach,  California.

U.S. Army Signal Missile Support Agency (1960), "A Comparison between the
     Double-Theodolite and Single-Theodolite Wind Measuring Systems,"
     Progress Report NR-11, Wind Effect on the Aerobee, White Sands Missile
     Range, White Sands, New Mexico.

Weather Measure Corporation (1974), Technical Information, P.O. Box 41257,
     Sacramento, California  95841.

-------
                                   TECHNICAL REPORT DATA
                    (Please read Instructions on the reverse before completing)

1. REPORT NO.:  EPA-450/4-79-033
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE:  Procedures for Evaluating the Performance of Air Quality
   Simulation Models
5. REPORT DATE:  October 1979
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S):  M. J. Hillyer, S. D. Reynolds, P. M. Roth
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Systems Applications, Incorporated
   950 Northgate Drive
   San Rafael, California  94903
10. PROGRAM ELEMENT NO.:
11. CONTRACT/GRANT NO.:  68-02-2593
12. SPONSORING AGENCY NAME AND ADDRESS:
   Office of Air Quality Planning and Standards
   U.S. Environmental Protection Agency
   Research Triangle Park, North Carolina  27711
13. TYPE OF REPORT AND PERIOD COVERED:
14. SPONSORING AGENCY CODE:
15. SUPPLEMENTARY NOTES:
16. ABSTRACT
     Currently there are no standardized guidelines for evaluating the performance of
air quality simulation models.  In this report, we develop a procedural framework for
objectively evaluating model performance.  In carrying out this work, we have:

        Reviewed previous model evaluation studies.
        Developed a general procedural framework for performing an evaluation study.
        Provided specific guidance, to the extent possible, with respect to the work
          required in each step of the performance evaluation procedure.
        Identified gaps in present knowledge that limited our ability to provide more
          detailed guidance in this report, and presented recommendations for further
          work that will help to fill those gaps.

Because model evaluation has received relatively little systematic attention to date,
we were able to identify several areas ripe for future investigation.  The performance
of these suggested studies will be essential to the success of the guidelines presented
herein.
17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS    b. IDENTIFIERS/OPEN ENDED TERMS    c. COSATI Field/Group
    Air Pollution
    Turbulent Diffusion
    Mathematical Models
    Computer Models
    Atmospheric Models
    Dispersion
    Air Quality Simulation Models
    Model Validation
    Model Evaluation
18. DISTRIBUTION STATEMENT:  Release unlimited
19. SECURITY CLASS (This Report):  None
20. SECURITY CLASS (This page):  None
21. NO. OF PAGES:  159
22. PRICE:
EPA Form 2220-1 (9-73)

-------