United States       Office of Air Quality        EPA-450/4-84-023
Environmental Protection  Planning and Standards      September 1984
Agency         Research Triangle Park NC 27711
Air
Interim Procedures
For Evaluating Air
Quality Models
(Revised)

-------

-------
                                   Disclaimer
         This report has been reviewed by the Office of Air Quality
         Planning and Standards, U.S. Environmental Protection
         Agency, and has been approved for publication.  Mention of
         trade names or commercial products is not intended to
         constitute endorsement or recommendation for use.

-------
                                     EPA-450/4-84-023
Interim Procedures for  Evaluating Air
         Quality Models (Revised)
             U.S. ENVIRONMENTAL PROTECTION AGENCY
               Monitoring and Data Analysis Division
              Office of Air Quality Planning and Standards
             Research Triangle Park, North Carolina 27711

                      September 1984

-------
                                Preface







     The quantitative evaluation and comparison of models for application




to specific air pollution problems is a relatively new problem area for




the modeling community.  Although considerable experience has been gained




in applying the procedures contained in an earlier version of this document,




it is expected that there will continue to be a number of problems in




carrying out the procedures described herein.  Thus, procedures discussed




in this document should continue to be considered interim.




     EPA Regional Offices and State air pollution control agencies are




encouraged to use this document to judge the appropriateness of a proposed




model for a specific application.  However, they must exercise judgment




where individual recommendations are not of practical value.  After a




period of time during which further experience is gained, problem areas




will become better defined and will be addressed in additional revisions




to this document.




     The procedures described herein are specifically tailored to




operational evaluation, as opposed to scientific evaluation.  The main




goal of operational evaluation as applied here is to determine whether




a proposed model is that which is most reliable for use in a specific




regulatory action.  The ability of various sub-models (plume rise, etc.)




to accurately reproduce reality or to add basic knowledge, as assessed by




scientific evaluation, is not specifically addressed by these procedures.




     An example illustrating the procedures described in this document




has been prepared, and is attached as Appendix B.  As noted in the preface




to Appendix B,  the primary utility of the example is to illustrate some

-------
considerations in designing the performance  evaluation protocol.   The




example is not intended to be a "model"  to be  followed in an individual




application of these procedures.

-------
                           Table of Contents
Preface 	iii

Table of Contents 	   v

List of Tables 	 vii

List of Figures 	  ix

Summary 	  xi

1.0  INTRODUCTION 	   1

     1.1  Need for Model Evaluation Procedures 	   2
     1.2  Basis for Evaluation of Models 	   4
     1.3  Coordination with Control Agency	   5

2.0  PRELIMINARY ANALYSIS 	   7

     2.1  Regulatory Aspects of the Application 	   7
     2.2  Source and Source Environment 	   8
     2.3  Reference Model	  10
     2.4  Proposed Model	  11
     2.5  Preliminary Estimates 	  12
     2.6  Technical Comparison with the Reference Model 	  13
     2.7  Technical Evaluation When No Reference Model Is Used 	  14
     2.8  Technical Summary 	  16

3.0  PROTOCOL FOR PERFORMANCE EVALUATION 	  19

     3.1  Performance Measures 	  20

          3.1.1  Model Bias 	  21
          3.1.2  Model Precision  	  23
          3.1.3  Correlation Analysis 	  24

     3.2  Data Organization 	  25

     3.3  Protocol Requirements 	  27

          3.3.1  Performance Evaluation Objectives 	  29
          3.3.2  Selecting Data Sets and Performance Measures 	  31
          3.3.3  Weighting the Performance Measures 	  34
          3.3.4  Determining Scores for Model Performance 	  36
          3.3.5  Format for the Model Comparison Protocol 	  38

     3.4  Protocol When No Reference Model Is Available 	  42

-------
4.0  DATA BASES FOR THE PERFORMANCE EVALUATION 	  45

     4.1  On-Site Data 	  46

          4.1.1  Air Quality Data 	  46
          4.1.2  Meteorological and Emissions Data 	  50

     4.2  Tracer Studies 	  51
     4.3  Off-Site Data 	  53

5.0  MODEL ACCEPTANCE	  57

     5.1  Execution of the Model Performance Protocol 	  57
     5.2  Overall Acceptability of the Proposed Model	  59
     5.3  Model Application	  60

6.0  REFERENCES	  63

APPENDIX A.  Reviewer's Checklist 	 A-1

APPENDIX B.  Narrative Example 	 B-1

APPENDIX C.  Procedure for Calculating Non-Overlapping Confidence
             Intervals 	 C-1

-------
                             List of Tables


Number                           Title                                Page


 3.1    Statistical Estimators and Basis for Confidence
        Limits on Performance Measures 	  22

 3.2    Summary of Candidate Data Sets for Model
        Evaluation 	  28

 3.3    Summary of Data Sets and Performance Statistics for
        Various Performance Evaluation Objectives 	  32

 3.4    Suggested Format for the Model Comparison Protocol 	  39

 5.1    Suggested Format for Scoring the Model Comparison 	  58

-------
                            List of Figures


Number                          Title                                Page
   1     Decision Flow Diagram for Evaluating a Proposed
         Air Quality Model 	 xii

 3.1     Observed and Predicted Concentration Pairings Used
         in Model Performance Evaluations 	  26

-------
                                Summary




     This document describes interim procedures for use in accepting, for




a specific application, a model that is not recommended in the Guideline




on Air Quality Models1.  The primary basis for the model evaluation




assumes the existence of a reference model which has some pre-existing




status and to which the proposed nonguideline model can be compared from




a number of perspectives.  However for some applications it may not be




possible to identify an appropriate reference model, in which case specific




requirements for model acceptance must be identified.  Figure 1 provides




an outline of the procedures described in this document.







     After analysis of the intended application, or the problem to be




modeled, a decision must be made on the reference model to which the




proposed model can be compared.  If an appropriate reference model can be




identified, then the relative acceptability of the two models is determined




as follows.  The model is first compared on a technical basis to the




reference model to determine if it can be expected to more accurately




estimate the true concentrations.   Next a protocol for model performance




comparison is written and agreed to by the applicant and the appropriate




regulatory agency.  This protocol  describes how an appropriate set of




field data will be used to judge the relative performance of the proposed




and the reference model.  Performance measures recommended by the American




Meteorological Society2 will be used in describing the comparative perfor-




mance of the two models in an objective scheme.  That scheme should




consider the relative importance to the problem of various modeling




objectives and the degree to which the individual performance measures




support those objectives.  Once the plan for performance evaluation is

-------
     [Figure 1, a decision flow diagram, is not legibly reproduced in this
copy.  It shows the sequence of steps:  write the technical description of
the proposed model (2.4); determine whether a reference model is available;
perform the technical analysis (2.7) or the technical comparison of models
(2.6); write the performance evaluation protocol (3.3/3.4); collect the
performance evaluation data (4.0); conduct the performance evaluation (5.1);
and make the final determination of acceptability (5.2).  Models found
unsound or not applicable are rejected without further evaluation.]

Figure 1.   Decision Flow Diagram for Evaluating a Proposed Air Quality Model
(Applicable Sections of the Document are indicated in Parentheses.)

-------
written and the data to be used are collected/assembled, the performance




measure statistics are calculated and the weighting scheme described in




the protocol is executed.  Execution of the decision scheme will lead to




a determination that the proposed model performs better, worse or about




the same as the reference model for the given applications.  The final




determination on the acceptability of the proposed model should be primarily




based on the outcome of the comparative performance evaluation.  However,




it may also be based, if so specified in the protocol, on results of the




technical evaluation, the ability of the proposed model to meet minimum




standards of performance, and/or other specified criteria.




     If no appropriate reference model is identified, the proposed model




is evaluated as follows.  First the proposed model is evaluated from a




technical standpoint to determine if it is well founded in theory, and is




applicable to the situation.  This involves a careful analysis of the




model features and intended usage in comparison with the source configura-




tion, terrain and other aspects of the intended application.  Secondly,




if the model is considered applicable to the problem, it is examined to




see if the basic formulations and assumptions are sound and appropriate




to the problem.  (If the model is clearly not applicable or cannot be




technically supported,  it is recommended that no further evaluation of




the model be conducted and that the exercise be terminated.)  Next, a




performance evaluation protocol is prepared that specifies what data




collection and performance criteria will be used in determining whether




the model is acceptable or unacceptable.  Finally, results from the




performance evaluation should be considered together with the results of




the technical  evaluation to determine acceptability.

-------
                         INTERIM PROCEDURES FOR




                     EVALUATING AIR QUALITY MODELS






1.  INTRODUCTION




    This document describes interim procedures that can be used in judging




whether a model, not specifically recommended for use in the Guideline on




Air Quality Models1, is acceptable for a given regulatory action.  It




identifies the documentation, model evaluation and data analyses desirable




for establishing the appropriateness of a proposed model.




     This document is only intended to assist in determining the accepta-




bility of a proposed model for a specific application (on a case-by-case




basis).  It is not intended for use in determining which of several




models and/or model options (similar to an optimization procedure) are




best for application to a given situation, nor does it address procedures




to be used in model development and "validation."  It is not for use in




determining whether a new model could be acceptable for general use




and/or should be included in the Guideline on Air Quality Models.  This




document also does not address criteria for determining the adequacy of




alternative data bases to be used in models, except in the case where a




nonguideline model requires the use of a unique data base.  The criteria




or procedures generally applicable to the review of fluid modeling procedures




are contained elsewhere.3,4,5




     The remainder of Section 1 describes the need for a consistent set of




evaluation procedures,  provides the basis for performing the evaluation,




and suggests how the task of model evaluation should be coordinated




between the applicant and the control agency.  Section 2 describes the




preliminary technical analysis needed to define the regulatory problem,

-------
the choice of the reference and proposed models, and the regulatory




consequences of applying these models.  Section 2 also contains a suggested




method of analysis to determine the applicability of the proposed model




to the situation.  Section 3 discusses the protocol to be used in judging




the performance of the proposed model.  Section 4 describes the design of




the data base for the performance evaluation.  Section 5 describes the




execution of the performance evaluation and provides guidance for combining




these results with other criteria to judge the overall acceptability of




the proposed model.  Appendix A provides a reviewer's checklist which can




be used by the appropriate control agency in determining the acceptability




of the applicant's evaluation.  Appendix B provides an example illustrating




the use of the procedures.  Appendix C describes a procedure for calculating




non-overlapping confidence intervals.







     1.1  Need for Model Evaluation Procedures




          The Guideline on Air Quality Models makes specific recommenda-




tions concerning air quality models and the data bases to be used with




these models.  The recommended models should be used in all evaluations




relative to State Implementation Plans (SIPs) and Prevention of Signifi-




cant Deterioration (PSD) unless it is found that the recommended model is




inappropriate for a particular application and/or a more appropriate




model or analytical procedure is available.  However, for some applications




the guideline does not recommend specific models and the appropriate model




must be chosen on a case-by-case basis.  Similarly, the recommended data




bases should be used unless such data bases are unavailable or inappropriate.




In these cases, the guideline states that other models and/or data bases




deemed appropriate by the EPA Regional Administrator may be used.

-------
          Models are used to determine the air quality impact of both new




and existing sources.  The majority of cases where nonguideline models




have been proposed in recent years have involved the review of new sources,




especially in connection with PSD permit applications.  However, most




Regional Offices have also received proposals to use nonguideline models




for SIP relaxations and for general area-wide control strategies.




          Many of the proposals to use nonguideline models involve modeling




of point sources in complex terrain and/or a shoreline environment.




Other applications have included modeling point sources of photochemical




pollutants, modeling in extreme environments (arctic/tropics/deserts),




modeling of fugitive emissions and modeling of burning where smoke manage-




ment (a form of intermittent control) is practiced.  For these applications




a refined approach is not identified in the Guideline on Air Quality Models.




Also a relatively small number of proposals have involved applications




where a recommended model is appropriate, but another model is considered




preferable.




          The types of nonguideline models proposed have included:  (1)




minor modification of computer codes to allow a different configuration/




number of sources and receptors that essentially do not change the estimates




from those of the basic model; (2) modifications of basic components in




recommended models, e.g., different dispersion coefficients (measured or




estimated), wind profiles, averaging times, etc; and (3) completely new




models that involve non-Gaussian approaches and/or phenomenological




modeling, e.g.  temporal/spatial modeling of the wind flow field.




          The Guideline on Air Quality Models, while allowing for the use




of alternative  models in specific situations, does not provide a technical





basis for deciding on the acceptability of such techniques.  To assure a

-------
more equitable approach in dealing with sources of pollution in all




sections of the country it is important that both the regulatory agencies




and the entire modeling community strive toward a consistent approach in




judging the adequacy of techniques used to estimate concentrations in the




ambient air.  The Clean Air Act6 recognizes this goal and states that the




"Administrator shall specify with reasonable particularity each air quality




model or models to be used under specified sets of conditions ..."




          The use of a consistent set of procedures to determine the accep-




tability of nonguideline models should also serve to better ensure that




the state-of-the-science is reflected.  A properly constructed set of




evaluation criteria should not only serve to promote consistency, but




should better serve to ensure that the best technique is applied.  It




should be noted that a proposed model cannot be proprietary since it may




be subject to public examination and could be the focus of a public




hearing or other legal proceeding.




     1.2  Basis for Evaluation of Models




          The basis for accepting a proposed model for a specific appli-




cation, as described in this document, involves a comparison of performance




between the proposed model and an applicable reference model.  The proposed




model would be acceptable for regulatory application if its performance




is better than that of the reference model.  It should not be applied to




the problem if its performance were inferior to that of the reference




model.  This model should also meet other criteria that may be specified




in the protocol.




          A second basis for accepting or rejecting a proposed model




could involve the use of performance criteria written specifically for





the intended application.  While this procedure is limited by a lack of







-------
experience in writing such criteria and the necessity of considerable




subjectivity, it is recognized that in some situations it may not be




possible  to specify an appropriate reference model.  Such a scheme should




ensure  that the proposed model is technically sound and applicable to the




problem.  Further, the model should pass certain performance requirements




that are  acceptable to all parties involved.  Marginal performance together




with a marginal determination on technical acceptability would suggest




that the  model should not be used.




          At the present time one cannot set down a complete set of




objective evaluation criteria and standards for acceptance of models




using these concepts.  Bases for such objective criteria are lacking in




a number  of areas, including a consistent set of standards for model




performance, scientific consensus on the nature of certain flow phenomena




such as interactions with complex terrain, etc.  However, this document




provides a framework for inclusion of future technical criteria, as well




as currently available criteria.







     1.3  Coordination with Control Agency




          The general philosophy of this document is that the applicant




or the developer of the model should perform the analysis.  The reviewing




agency should review this analysis, perform checks, and/or perform an




independent analysis.  The reviewing agency must have access to all of




the basic information that went into the applicant's analysis (model




computer code,  all input data, all air quality data) so that an indepen-




dent judgment is possible.




          To avoid costly and time-consuming delays in execution of the




model evaluation,  the applicant is strongly urged to maintain close

-------
liaison with the reviewing agency(s),  both at  the beginning and throughout

the project.  A minimum* of two reports should be submitted to the control

agency for review and subsequent negotiation.   The first report should

contain the preliminary analysis,  the  protocol for the performance evalua-

tion and the design of the data base network.   Before any monitors are

deployed or data collection begins,  it is important that the control

agency concur on all aspects of the  planned evaluation,  including choice

of the reference model, design of  the  performance evaluation protocol and

the design of the data base network.  The second  report  would be submitted

at the conclusion of the study.  It  should describe the  data base, the

results of executing the protocol, and the model chosen for application.
 *As a mechanism to maintain close liaison between the source and
  control agency, the submission of other periodic progress reports is
  encouraged.

-------
2.0  PRELIMINARY ANALYSIS




     As a prerequisite to design of the performance evaluation and the




data base network, it is necessary to develop and document a complete




understanding of all regulatory and technical aspects of the model appli-




cation.  This preliminary analysis establishes the regulatory requirements




of the application and describes the source and its surroundings.  Based




on these factors, the analysis identifies and describes a reference model




or historically based regulatory model which would normally be applied to




the source(s).  The preliminary analysis includes concentration estimates




from the reference and proposed models, based on existing data and appro-




priate emission rates.  If the protocol specifies that the technical




analysis is to be considered in the final decision (see Section 3) then




the application-specific technical aspects of the two models are compared




using techniques described in the Workbook for Comparison of Air Quality




Models.7  This workbook is used to develop a judgment on the scientific




credibility of the models for the regulatory application.




     The outcome and primary purpose of the preliminary analysis is to




provide a focus for the performance evaluation (Section 3) and for identi-




fication of the requisite data bases (Section 4).  A secondary purpose is




to provide a technical basis for judging the model, in the event that the




performance evaluation is inconclusive.  The preliminary analysis require-




ments are detailed in the following subsections.







     2.1  Regulatory Aspects of the Application




          The preliminary analysis should establish the pollutant or




pollutants to be modeled, the averaging times, e.g. 3-hour, 24-hour, etc.




for these pollutants,  and the limiting ambient criteria (standards, PSD

-------
increments, etc.).  The current regulatory classification, e.g., attain-




ment, nonattainment,  PSD Class I, should be documented.   The regulatory




boundaries of the area for which concentration estimates apply should also




be established.  Existing emission limits, if any,  should be identified.




          For example, there may be a question whether the SO2 emission




limits from several sources in an attainment area can be relaxed, and if




so, by how much.  In this case, the 3-hour, 24-hour and annual SO2 ambient




air standards apply,  as do the Class II Increments  for these averaging




times.  There may also be a distant Class I PSD area for which any emission




relaxation could result in increment consumption; as such, incremental




concentration estimates corresponding to the three  averaging times would




be required in that area.  The allowable time frame for  regulatory action




should also be identified since the evaluation of model  performance




involves a significant amount of  time and expenditure of resources.




Allowable emission rates during the period of model evaluation should be




specified.







     2.2  Source and  Source Environment




          To define the important source-receptor relationships involved




in a regulatory modeling problem, it is necessary to assemble a complete




description of the source and its surroundings.  Information on the source




or sources involved includes the configuration of the sources, location




and heights of stacks, stack parameters (flow rates and  gas temperature)




and location of any fugitive emissions to be included.  Existing and




proposed emission rates should be identified for each averaging time that




corresponds to an ambient air quality standard applicable to pollutants




under consideration.   In the case of complex industrial  sources it is

-------
also generally necessary to obtain a plant layout including dimensions




of plant buildings and other nearby buildings/obstacles.  Sources should




be characterized in as much detail as possible, i.e. commensurate with




the input requirements of the models (See Sections 2.3 and 2.4).  For




example, source emissions should be assembled as mobile and area line




source segments, grid squares, etc.




          Information on the source surroundings is usually best identi-




fied on a topographic map or maps that cover the modeling area.  The




areal coverage is sometimes predetermined by political jurisdiction




boundaries, i.e., an air quality region.  More often, however, modeling




is confined to the region where any significant threat to the standards




or PSD increments is likely to exist.  The locations of major existing




sources (for the pollutants in question), urban areas, PSD Class I areas,




and existing meteorological and air quality data should be identified on




the maps.




          A determination should be made whether the source(s) in question




are located in an urban or rural setting.  The recommended procedure for




making this determination utilizes the techniques of Auer8 where the land




use within a 3 km radius of the source is classified.  Other techniques,




based on population and judgmental considerations may be used if they can




be shown to be more appropriate.
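     A minimal sketch of such a land use determination follows.  It is not
part of the original procedure; it assumes the commonly applied rule that a
setting is treated as urban when industrial, commercial, and compact
residential land use types make up half or more of the area within 3 km,
and the category codes and area fractions shown are hypothetical.

     # Hypothetical land use fractions within a 3 km radius of the source,
     # keyed by assumed Auer-type category codes (consult the Auer reference).
     land_use = {"I1": 0.10, "I2": 0.05, "C1": 0.15, "R2": 0.10, "R3": 0.05,
                 "R1": 0.25, "A1": 0.30}

     urban_types = {"I1", "I2", "C1", "R2", "R3"}   # assumed "urban" categories
     urban_fraction = sum(f for code, f in land_use.items() if code in urban_types)
     setting = "urban" if urban_fraction >= 0.5 else "rural"

     print(setting)   # "rural" for the fractions assumed above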




          The method to be used in establishing the ambient concentration




due to all other existing sources should be established.  If nearby sources




are to be modeled, then their emissions and source characterization need




to be specified.  Applicable background concentrations and the method




used to estimate them should be documented.

-------
     2.3  Reference Model

          The reference model is the model that would normally be used

by the regulatory agency in setting emission limits for the source.  The

choice of reference model should be made by the appropriate regulatory

agency and follow from guidance provided in the Guideline on Air Quality

Models.

          However, not all modeling situations are covered by recommended

models.  For example, models for point sources of reactive pollutants or

shoreline fumigation problems are not included.  In other cases the model

normally used by the regulatory agency might be a screening technique

that does not lend itself easily to performance evaluations.  In these

circumstances the applicant and the reviewing agency should attempt to

agree on an appropriate and technically defensible reference model, which

provides for hour-by-hour estimates based on the current technical literature

and on past experience.  Major considerations are that the reference

model is applicable to the type of problem in question, has been described

in published reports or the open literature, and is capable of producing

concentration estimates for all averaging times for which a performance

measure statistic must be calculated (usually 1-hour and the averaging

times associated with the standards/increments).  This latter requirement

usually* precludes the use of screening techniques which rely on assumed

meteorological conditions for a worst case.

          Where it is clearly not possible to specify a reference model,

the proposed model must "stand alone" in the evaluation.  In such cases
*Some screening techniques do contain provisions for hour-by-hour
estimates and as such they may be used.


-------
the technical justification and the performance evaluation necessary to




determine acceptability should be more substantial.  Section 2.7 discusses




a rationale for determining if the model is technically justified for use




in the application.  Section 3.4 discusses some considerations in designing




the performance evaluation protocol when no reference model comparison is




involved.







     2.4  Proposed Model




          The model proposed for use in the intended application must be




capable of estimating concentrations corresponding to the regulatory




requirements of the problem as identified in Section 2.1.  In order to




conduct the performance evaluation, the model should be capable of sequen-




tially estimating hourly concentrations based on meteorological and




emission inputs.




          A complete technical description of the model is needed for the




analysis in Sections 2.6 or 2.7.  This technical description should




include a discussion of the features of the proposed model, the types of




modeling problems for which the model is applicable, the mathematical




relationships involved and their basis, and the limitations of the model.




The model description should take the form of a report or user's manual




that completely describes its operation.  Published articles which describe




the model are also useful.  If the model has been applied to other problems,




a review of these applications should be documented.  For models designed




to handle complex terrain, land/water interfaces and/or other special




situations, the technical description should focus on how the model




treats these cases.  To the maximum extent possible, evidence for the




validity of the methodologies should be included.

-------
     2.5  Preliminary Estimates

          Once the reference and proposed models are identified,  it is

essential that, at least in a preliminary sense, the consequences of

applying each of these models to the regulatory problem be established.

The questions to be answered are:   (1)  What are the  preliminary concentra-

tion estimates for each model that would be used to  establish emission

limits?  (2) Where are the locations of such critical concentrations?   and

(3) What are the differences between estimates at these locations?   The

preliminary estimates should utilize the appropriate emission rates for

the regulatory problem and whatever representative meteorological data

are available before the evaluation.*  In those infrequent cases  where  no

representative meteorological data can  be identified, it may be necessary

to collect on-site data before making preliminary estimates.
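     Purely as an illustration of these three questions (and not part of
the original procedure), the comparison of the two sets of preliminary
estimates might be organized as in the following sketch; the receptor-grid
arrays and variable names are hypothetical.

     import numpy as np

     rng = np.random.default_rng(2)
     ref_field  = rng.lognormal(4.0, 0.6, size=(40, 40))            # reference model estimates
     prop_field = ref_field * rng.uniform(0.6, 1.0, size=(40, 40))  # proposed model estimates

     # (1) Controlling concentration estimate from each model.
     ref_max, prop_max = ref_field.max(), prop_field.max()

     # (2) Receptor locations of those critical concentrations.
     ref_loc  = np.unravel_index(ref_field.argmax(),  ref_field.shape)
     prop_loc = np.unravel_index(prop_field.argmax(), prop_field.shape)

     # (3) Differences between the two sets of estimates at those locations.
     diff_at_ref_loc  = ref_field[ref_loc]  - prop_field[ref_loc]
     diff_at_prop_loc = ref_field[prop_loc] - prop_field[prop_loc]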

          It is recommended that two or three separate preliminary estimates

of the concentration field be made.  The first set of estimates should  be

made with the screening techniques mentioned or referenced in the Guideline

on Air Quality Models.  The second set of estimates should be done with

the proposed model and the third set with the reference model.   Estimates

for all applicable averaging times should be calculated.  The three sets

of estimates serve to define the modeling domain and critical receptors.

They also aid in determining the applicability of the proposed model (Sec-

tions 2.6 and 2.7), the development of  a performance evaluation protocol,

and the design of requisite data networks (Sections  3 and 4).
*A final set of model estimates,  to be used in decision making,  may
 utilize additional data collected during the performance evaluation as
 input to the appropriate model.

-------
     2.6  Technical Comparison with the Reference Model




          When an appropriate reference model can be identified it may




prove useful to compare the proposed model with the reference model.




Emphasis should be on dispersion conditions and subareas of the modeling




domain that are most germane to the regulatory and technical aspects of




the problem (Sections 2.1 and 2.2).  The procedures described in the Work-




book for Comparison of Air Quality Models are appropriate for this compari-




son.  This Workbook contains a procedure whereby a proposed model is




qualitatively compared, on technical grounds, to the reference model, and




the intended use of the two models and the specific application are taken




into account.




          The Workbook procedure is application-specific; that is, the




results depend upon the specific situation to be modeled.  The reference




model serves as a standard of comparison against which the user gauges the




proposed model.  The way in which the proposed model treats twelve aspects




of atmospheric dispersion, called "application elements," is determined.




These application elements represent physical and chemical phenomena that




govern atmospheric pollutant concentrations and include such aspects as




horizontal and vertical dispersion, emission rate, and chemical reactions.




The importance of each element to the application is defined in terms of




an "importance rating."  Tables giving the importance ratings for each




element are provided in the Workbook, although they may be modified under




some circumstances.   The heart of the procedure involves an element-by-




element comparison of the way in which each element is treated by the two




models.  These individual comparisons, together with the importance




ratings for each element in the given application, form the basis upon





which the final comparative evaluation of the two models is made.







-------
          It is especially important that the user understand the physi-




cal phenomena involved, because the comparison of two models with respect




to the way that they treat these phenomena is basic to the procedure.




Sufficient information is provided in the Workbook to permit these compari-




sons.  Expert advice may be required in some circumstances.   If alternate




procedures are used to complete the technical comparison of  models, they




should be discussed with the reviewing agency.  The results  of the compari-




son may be used in the overall model evaluation in Section 5.
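     Purely as an illustration, and not as a substitute for the Workbook
procedure itself, an element-by-element comparison weighted by importance
ratings might be tallied as in the sketch below; the elements, weights, and
judgments shown are hypothetical.

     # Hypothetical importance ratings (weights) and element-by-element
     # judgments:  +1 if the proposed model treats the element better,
     # 0 if about the same, -1 if the reference model treats it better.
     elements = {
         "horizontal dispersion": (3, +1),
         "vertical dispersion":   (3,  0),
         "plume rise":            (2, +1),
         "chemical reactions":    (1, -1),
     }
     net_score = sum(weight * judgment for weight, judgment in elements.values())
     # In this illustrative scheme a positive net score favors the proposed model.
     print(net_score)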







     2.7  Technical Evaluation When No Reference Model Is Used




          If it is not possible to identify an appropriate reference model




(Section 2.3), then the procedures of Section 2.6 cannot be  used and the




proposed model must be technically evaluated on its own merits.  The




technical analysis of the proposed model should attempt to qualitatively




answer the following questions:




          1.  Are the formulations and internal constructs of the model




well founded in theory?




          2.  Does the theory fit the practical aspects and  constraints




of the problem?




          To determine whether or not the underlying assumptions have




been correctly and completely stated requires an examination of the basic




theory employed by the model.  The technical description of  the model




discussed in Section 2.4 should provide the primary basis for this exami-




nation.  The examination of the model should be divided into several




subparts that address various aspects of the formulation.  For example,




for some models it might be logical to separately examine the methodologies




used to characterize the emissions, the transport, the diffusion, the

-------
plume rise, and the chemistry.  For each of these model elements it should




be determined whether the formulations are based on sound scientific,




engineering and meteorological principles and whether all aspects of each




element are considered.  Unsound or incomplete specification of assumptions




should be  flagged for consideration of their importance to the actual




modeling problem.




           For some models, e.g., those that entail a modification to a




model recommended in the Guideline on Air Quality Models or to the reference




model, the entire model would not need to be examined for scientific credi-




bility.  In such cases only the submodel or modification should be examined.




Where the  phenomenological formulations are familiar and have been used




before, support for their scientific credibility can be cited from the




literature.




          For models that are relatively new or utilize a novel approach




to some of the phenomenological formulations, an in-depth examination of




the theory should be undertaken.  The scientific support for such models




should be established and reviewed by those individuals who have broad




expertise in the modeling science and who have some familiarity with the




approach and phenomena to be modeled.




          To determine how well the model fits the specific application,




the assumptions involved in the methodologies proposed to handle each




phenomenon should be examined to see if they are reasonable for the given




situation.  To determine whether the assumptions are germane to the




situation, particular attention should be paid to assumptions that are




marginally valid from a basic standpoint or those that are implicit and




unstated.   For assumptions that are not met,  it should be established





that these deficiencies do not cause significant differences in the







-------
estimated concentrations.  The most desirable approach takes the form of




sensitivity testing by the applicant in which variations are made on these




assumptions to determine whether they are indeed critical.  Such an
exercise should be conducted,




if possible, and should involve estimates that reflect alternate assumptions




before and after modification of  formulas or data.  However, in many




cases this exercise may be too resource consumptive and the proof of model




validity should still rest with the performance evaluation described in




Section 3.




          Execution of the procedures in this section should lead to a




judgment on whether the proposed  model is applicable to the problem  and




can be scientifically supported.   If these criteria are met, the model




can be designated as appropriate  and should be applied if its field  perfor-




mance (Section 5) is acceptable.   When a model cannot be supported for




use based on this technical evaluation, it should be rejected.  When it




is found that the model could be  appropriate, but there are questionable




assumptions, then the model may be designated as marginal and carried




forward through the performance evaluation.






     2.8  Technical Summary




          The final step in the technical analysis is to combine the




results of Sections 2.1 through 2.6/2.7 into a technical summary. This




summary should serve to define (1) the scope of the issues to be resolved




by the performance evaluation, (2) the areal and temporal extent of  the




differences in estimates between  the proposed and the reference models,




and (3) the reasons why the two models produce different estimates and/or




different concentration patterns.

-------
          The technical summary provides a focus for the performance




evaluation and the design of the requisite data base network.   The  results




of the technical summary are used in Section 3  to establish criteria for




the performance evaluation protocol and in Section 4 to  define  the  requisite




data base.

-------
3.0  PROTOCOL FOR PERFORMANCE EVALUATION

     The goal of the model performance evaluation is to determine whether

the proposed model provides better estimates of concentrations germane

to the regulatory aspects of the problem than does the reference model.

To achieve this goal, model concentration estimates are compared with

observed concentrations in a variety of ways.*  The primary methods of

comparison produce statistical information and constitute a statistical

performance evaluation.

     This section describes a procedure for evaluating the performance of

the proposed model and for determining whether that performance is adequate

for the specific application.  The procedure requires that a protocol be

prepared for comparing the performance of the reference and proposed

models.  The protocol must be agreed upon by the applicant and appropriate

regulatory agencies prior to collection of the requisite data bases.  The

description of the protocol includes a scheme to (1)  weight the relative

importance of various performance measures to the regulatory goals of the

evaluation and (2) objectively discriminate between the relative performance

of the proposed and reference models.  Some guidance is also provided on

how to write a protocol and evaluate model performance when comparison

with a reference model is not possible.

     Before going into the details of the protocol, it is important to

review briefly some of the statistical performance measures that are commonly

used to assess the performance of a model against measured data.  It is al-

so useful to consider how the ambient data base is commonly broken down into
*Concentration and meteorological data needed for the performance evaluation
 are discussed in Section 4.   The data base network design/requirements are
 partially determined by the  nature of and amount of performance statistics
 defined in the protocol.


-------
data subsets which are operated on by the statistical  measures.   Section




3.1 describes these performance measures  and Section 3.2  briefly describes




some of the commonly used data subsets.




     Model performance should be evaluated for each of the  averaging  times




specified in the appropriate regulation(s).   In addition, performance for




models whose basic averaging time is shorter than the  regulatory averaging




time should also be evaluated for that shorter period, provided, of course,




that the measurements are available for  shorter averaging periods.   For




example, a model may calculate sequential 1-hour concentrations for SO2




from which concentrations for longer averaging periods can  be computed.




Performance of this model can thus be evaluated separately  for 1-,  3-,




and 24-hour averages and, if appropriate, for the annual  mean.  It  should




be noted that although frequency distribution statistics  are indicated  in




Table 3.1, they may be considered somewhat redundant when performance




measures of both bias and precision are  used.  For this reason,  graphical




displays of the cumulative frequency distribution of observed and predicted




values may be useful as supplementary aids in the evaluation process.
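     As an illustration of this aggregation (not part of the original
procedure), the sketch below shows one way sequential 1-hour estimates
might be combined into 3-hour and 24-hour averages and an annual mean;
the hourly values and variable names are hypothetical.

     import numpy as np

     hourly = np.random.lognormal(3.0, 1.0, size=8760)   # hypothetical year of 1-hour values

     def block_averages(values, hours):
         # Average consecutive, non-overlapping blocks of `hours` 1-hour values.
         usable = len(values) - len(values) % hours
         return values[:usable].reshape(-1, hours).mean(axis=1)

     avg_3hr  = block_averages(hourly, 3)     # 3-hour averages
     avg_24hr = block_averages(hourly, 24)    # 24-hour averages
     annual_mean = hourly.mean()              # annual mean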






     3.1  Performance Measures




          The basic tools used in determining how well a  model performs




in any given situation are performance measures.  Performance measures  can




be thought of as surrogate quantities whose  values serve  to characterize




the discrepancy between predictions and  observations.   Values obtained




from applying the performance measures to a  given data base are  most




often statistical in nature; however, certain performance measures (e.g.,




frequency distributions) may be more qualitative than quantitative in




nature.

-------
          Performance measures may be classified as magnitude of differ-




ence measures and correlation or association measures.  Magnitude of




difference measures present a quantitative estimate of the discrepancy




between measured concentrations and concentrations estimated by a model




at the monitoring sites.  Correlation measures quantitatively delineate




the degree of association between estimations and observations.




          Table 3.1 lists a number of the more commonly used and recom-




mended performance statistics for model evaluation purposes.  These




statistics and the corresponding nomenclature are taken from Fox2 and are




based primarily on the recommendations of an AMS Workshop on Dispersion




Model Performance held in 1980.  Since the statistics and basis for




confidence limits are described extensively in most statistical texts,




only a brief description of how these measures apply to model performance




is presented below.  Although each of the statistics provide a quantitative




measure of model performance, they are somewhat easier to interpret when




accompanied by graphical techniques such as histograms, isopleth analyses




and scatter diagrams.







          3.1.1  Model Bias




                 Many of the performance statistics serve to characterize,




in a variety of ways, the behavior of the model residual, defined as the




observed concentration minus the estimated concentration.  For example,




model bias is determined by the value of the model residual averaged over




an appropriate range of values.  Large over- and underestimations may




cancel in computing this average.  Supplementary information concerning




the distribution of residuals should therefore be supplied.  This supple-




mentary information consists of confidence intervals about the mean




value, and histograms or frequency distributions of model residuals.






-------


































     [Table 3.1, Statistical Estimators and Basis for Confidence Limits on
Performance Measures, is not legibly reproduced in this copy.  It lists the
recommended performance measures (bias, noise, gross variability, and the
correlation measures) together with the statistical estimator and the basis
for confidence limits for each.]
-------
                  For certain applications, especially cases in which the




 proposed model  is designed to simulate concentrations occurring during




 important meteorological processes, it is important to estimate model




 bias under different meteorological conditions.  The degree of data dis-




 aggregation is  a  compromise between the desired goals of defining a large




 enough number of  meteorological categories to cover a wide range of con-




 ditions and having a sufficient number of observations in each category




 to calculate statistically meaningful values.  For example, it may be




 appropriate to  stratify data by lumped stability classes, unstable (A-C),




 neutral (D) and stable (E-F) rather than by individual classes A, B, C,




 D, E and F.  The use of wind speed classes may also be appropriate.
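     A brief computational sketch of the bias calculation follows.  It is
an illustration only, not part of the original procedures; the paired
values, stability classes, and variable names are hypothetical.

     import numpy as np
     from scipy import stats

     observed  = np.array([310., 150.,  95., 480., 260., 130.,  75., 220.])
     predicted = np.array([280., 180.,  60., 410., 300., 100.,  90., 250.])
     stability = np.array(["unstable", "neutral", "stable", "unstable",
                           "neutral", "stable", "neutral", "unstable"])

     residual = observed - predicted      # model residual, observed minus estimated
     bias = residual.mean()               # mean residual = model bias

     # 95 percent confidence interval about the mean residual (Student's t).
     n = residual.size
     half_width = stats.t.ppf(0.975, n - 1) * residual.std(ddof=1) / np.sqrt(n)

     # Bias recomputed within each lumped stability class.
     class_bias = {c: residual[stability == c].mean() for c in np.unique(stability)}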







          3.1.2  Model Precision




                 Model precision refers to the average amount by which




 estimated and observed concentrations differ as measured by a different




 type of residual than that used for bias, that is, one lacking an




 algebraic sign.  While large positive and negative residuals can cancel




 when model bias is calculated, the unsigned residuals comprising the




 precision measures do not cancel.  Thus,  they provide an estimate of the




 error scatter about some reference point.  This reference point can be




 the mean error or zero error.   Two types  of precision measures are the




 noise,  which delineates the error scatter about the mean error, and the




gross variability, which delineates the error scatter about zero error.




                 The performance measure  for noise is either the variance




of the residuals, s_d^2, or the standard deviation of the residuals, s_d.





The performance measure for gross variability is the mean square error,




or the  root-mean-square-error.  An alternate performance measure for





the gross variability is the mean absolute residual, |d|.  The mean






-------
absolute residual is statistically more robust  than the  root-mean-square-

error; that is, it is less affected by removal  of  a few  extreme  values.

                 Supplementary analyses for model  precision should include

confidence limits, as appropriate, and computation of  these measures  for

selected meteorological categories as  discussed in Section 3.1.1.
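     The noise and gross variability measures could be computed as in the
sketch below (an illustration only; the residuals shown are hypothetical).

     import numpy as np

     residual = np.array([30., -30., 35., 70., -40., 30., -15., -30.])

     noise_variance = residual.var(ddof=1)      # s_d^2, scatter about the mean error
     noise_std      = residual.std(ddof=1)      # s_d
     mse            = np.mean(residual ** 2)    # mean square error, scatter about zero error
     rmse           = np.sqrt(mse)              # root-mean-square error
     mean_abs_resid = np.abs(residual).mean()   # mean absolute residual, |d|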

          3.1.3  Correlation Analyses

                 Correlation analyses  involve parameters calculated from

linear least squares regression and associated  graphical analyses.  The

numerical results constitute quantitative  measures of  the association

between estimated and observed concentrations.   The graphical  analyses

constitute supplementary qualitative measures of the same information.

There are three types of correlation analyses:  coupled space-time, spatial,

and temporal analyses.

                 Coupled space-time correlation analysis involves  computing

the Pearson's correlation coefficient, r,  or an equivalent nonparametric

coefficient such as Spearman's ρ or Kendall's τ.  The parameters a and b

of the linear least squares regression equation should be included.  A

scattergram of the observed and predicted data pairs is supplementary information

which should be presented.

                 Spatial correlation analysis involves calculating the

spatial correlation coefficient and presenting  isopleth  analyses of the

predicted and observed concentrations  for  particular periods of  interest.

The spatial coefficient measures the degree of  spatial alignment between

the estimated and observed concentrations.  The method of calculation

involves computing the correlation coefficient  for each  time period and

determining an average over all time periods.   Estimates of the  spatial

-------
correlation coefficient for single source models are most reliable for




calculations based on data intensive networks such as those contained in




a  tracer  study.  Isopleths of the distributions of estimated and observed




concentrations for periods of interest should be presented and discussed.




                 Temporal correlation analysis involves calculating the




temporal  correlation coefficient and presenting time series of observed




and  estimated concentrations or of the model residual for each monitoring




location.  The temporal correlation coefficient measures the degree of




temporal  alignment between observed (Co) and predicted (Cp) concentrations.




The  method of calculation is similar to that for the spatial correlation




coefficient.  Time series of Co and Cp or of model residuals should be




presented and discussed for each monitoring location.
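     The three correlation analyses might be computed as in the sketch
below.  This is an illustration only; the station-by-period matrices are
hypothetical, and regressing observed on predicted values is an assumption,
since the document does not specify the form of the regression.

     import numpy as np
     from scipy import stats

     rng = np.random.default_rng(0)
     obs  = rng.lognormal(3.0, 0.5, size=(5, 24))            # hypothetical (stations x periods)
     pred = obs * rng.uniform(0.5, 1.5, size=(5, 24))        # hypothetical model estimates

     # Coupled space-time correlation:  all station/period pairs pooled.
     r_coupled, _ = stats.pearsonr(obs.ravel(), pred.ravel())
     slope, intercept, *_ = stats.linregress(pred.ravel(), obs.ravel())

     # Spatial correlation:  correlate across stations for each period, then average.
     r_spatial = np.mean([stats.pearsonr(obs[:, t], pred[:, t])[0]
                          for t in range(obs.shape[1])])

     # Temporal correlation:  one coefficient for each monitoring location.
     r_temporal = [stats.pearsonr(obs[i], pred[i])[0] for i in range(obs.shape[0])]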







     3.2  Data Organization




          The performance measures described above may be applied to various




combinations of observed and predicted values depending on the objectives




of the evaluation and the nature of the regulatory problem (i. e., the




intended application).  For example, when "once per year" ambient stan-




dards are of primary concern, observed and predicted maximum (or near




maximum) concentrations should be compared.  Since complete space and/or




time pairing is often not important from a regulatory point of view, the




appropriate data combination need not be restricted to only concentration




pairs having the same hour or location.




          There are many possible combinations of observed and predicted




concentrations that may be chosen for evaluation.  Thus, it is useful to




organize, at least conceptually, the complete data set into a matrix of




observed and predicted values as exhibited in Figure 3.1.  Entries in the




center of the figure are completely paired in time and space.  Entries





-------














     [Figure 3.1, Observed and Predicted Concentration Pairings Used in
Model Performance Evaluations, is not legibly reproduced in this copy.  It
depicts a matrix of observed and predicted concentrations arranged by
monitoring station and by time period, with the corresponding maxima shown
along the margins.]
-------
 shown in the bottom two rows and last two columns represent, respectively,




 pairs of  maximum concentration paired in space only and time only.




          Since the figure permits illustration of only a few data combina-




 tions which may be of interest from a regulatory viewpoint, a more complete




 tabulation of data combinations (data sets) is shown in Table 3.2.  The




 first type of data combination refers to "peak concentration" which by defi-




 nition excludes the low concentration comparisons.  Except for combination




 A-3 (completely paired peak residuals), all data sets involve some degree




 of spatial and/or temporal unpairing between observed and predicted




 values.   The second type of data combination refers to "all concentrations"




 which comprise complete time and space pairing for all predicted and




 observed  values within a defined category.  For example, data set B-l




 refers to the set of all observed and predicted values at a given station,




 paired in time.  Since each station is evaluated separately, the total




 number of data combinations is equal to the total number of stations.
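          As an illustration of this organization, the matrix of Figure 3.1 can
be held as arrays indexed by time period and station, from which data combinations
such as A-1 and B-1 of Table 3.2 follow directly (hypothetical values shown):

    # Illustrative sketch: extracting two candidate data sets from the
    # observed/predicted matrix (rows = time periods, columns = stations).
    import numpy as np

    co = np.random.rand(8760, 10) * 100.0    # observed, hypothetical hourly year
    cp = np.random.rand(8760, 10) * 100.0    # predicted, hypothetical

    # Data set A-1: highest observed value for each event paired with the
    # highest prediction for the same event (paired in time, not location).
    a1_obs, a1_pred = co.max(axis=1), cp.max(axis=1)

    # Data set B-1: all observed and predicted values at a given station,
    # paired in time (one such data set per station); station 0 shown.
    b1_obs, b1_pred = co[:, 0], cp[:, 0]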




          The rationale for selecting particular data combinations and




 statistics to evaluate various aspects of model performance is simplified




 by first  establishing major objectives to be accomplished by the performance




 evaluation.  The procedure for establishing these objectives and for assign-




 ing levels of importance to each objective is discussed in the following




 section.







     3.3  Protocol Requirements




          Because of the variety of statistical measures and data combi-




 nations that might be considered for evaluation purposes, it is essential




 that a written protocol be prepared and agreed to by the applicant and




appropriate control agency before the data collection and evaluation
                                 27

-------
  Table 3.2.   Summary of  Candidate  Data Sets  for  Model  Evaluation
   A.   Peak Concentration
       Comparisons
(A-l)  Compare highest observed
       value for each event  with
       highest prediction for
       same event (paired in
       time, not location).

(A-2)  Compare highest observed
       value for the year at
       each monitoring station
       with the highest prediction
       for the year at the same
       station (paired in location,
       not time).

(A-3a) Compare maximum observed
       value for the year with
       highest predicted values
       representing different  time
       or space pairing (fully
       unpaired, paired in location,
       paired in time, paired  in
       space and time).

(A-3b) Compare maximum predicted
       value for the year with
       highest observed values for
       various pairings, as  in (A-3a)

(A-4a) Compare highest N (=25)
       observed and highest  N
       predicted values, regardless
       of time or location.

(A-4b) Compare highest N (=25)
       observed and highest  N
       predicted values, regardless
       of time, for a given  monitor-
       ing location.  (A data  set
       for each station.)

(A-5)  Same as (A-4a), but for sub-
       sets of events by meteoro-
       logical conditions (stability
       and wind speed) and by  time
       of day.
   B.  All-Concentrations
  	Comparisons	

(B-l)  Compare observed and
       predicted values at a
       given station, paired
       in time.  (A data set
       for each station.)

(B-2)  Compare observed and
       predicted values for
       a given time period,
       paired in space (not
       appropriate for data
       sets 'with few moni-
       toring sites).

(B-3)  Compare observed and
       predicted values at all
       stations, paired in
       time and location (one
       data set) and by time
       of day.

(B-4)  Same as (B-3), but for
       subsets of events by
       meteorological condi-
       tions (stability and
       wind speed) and by time
       of day.
                                 28

-------
process is initiated.  Conceptually, the protocol describes how various




performance measures will be used to compare the relative performance of




the proposed and reference models in a manner that is most relevant to




the regulatory need (the intended application as described in Section




2.1).  To organize this concept, it is suggested that the protocol contain




four major components as follows: (1) a definition of the performance




evaluation objectives to be accomplished in terms of their relevance to




the regulatory application; (2) a compilation of specific data sets and




performance measures that will be applied under each performance objective;




(3) an objective scheme for assigning weights to each performance measure




and data set combination; and (4) an objective scheme for scoring the




performance of the proposed model relative to the reference model.




          This section discusses the factors to be considered in estab-




lishing such a protocol for an individual performance evaluation.  Although




some experience has been gained in applying the techniques to actual




regulatory situations, it remains clear that the procedures described




below must remain general enough to adequately cover all types of regulatory




problems.







          3.3.1  Performance Evaluation Objectives




                 The first step in developing the model performance protocol




is to translate the regulatory purposes associated with the intended model-




ing application into performance evaluation objectives, which, in turn,




can be linked to specific performance measures and data sets.  This step




is important since each intended modeling application is unique with




respect to source configuration, the critical source-receptor relationships,




the types of ambient levels to be protected (e.g., NAAQS vs. PSD), averaging
                                   29

-------
times of concern (e.g., 1-hr,  3-hr,  24-hr,  etc.),  and  the  form of  the




ambient standard (e.g., not to be exceeded  more  than once  per  year vs.




annual).




                 In most applications,  the  primary regulatory  purposes




can be stated clearly in terms that  relate  directly to certain performance




measures and data sets.  For example, if the primary regulatory purpose




is to prevent violations of short-term  ambient  standards which might  be




threatened by construction of a large isolated SO2 source, then the ability




of the models to accurately predict  highest 3-hour and 24-hour concentrations




is critical.  In this example  situation, the primary performance objective




might be stated as: "Determine the accuracy of  peak estimates  in the




vicinity of the proposed plant."




                 While "peak accuracy"  is the first order  objective in




this example, other performance objectives  can  be  stated that  relate  to




the ability of the selected models to perform over a variety of concen-




tration levels and conditions.  For example, additional confidence can




be placed in a model if it is  also accurate in  estimating  the  magnitude




of lower concentrations at specific  stations and for specific  meteoro-




logical events.  Thus, a second order objective  might  be stated as "Deter-




mine the accuracy of estimates of concentrations over  a range  of concentra-




tions, time periods, and stations."




                 A third performance evaluation objective  which can be




derived from this example regulatory application is related to measures




of spatial and temporal correlation. While a model may adequately predict




peaks and average levels at given stations, a measure  of additional




confidence can be gained if the model also  traces  the  time sequence of
                                   30

-------
concentrations reasonably well.  Thus, a third order performance evalua-




tion objective might be stated as "Determine the degree of correlation




between predicted and measured values in time and space."  While corre-




lation is a reasonably stringent performance measure (time and/or space




pairing is required), it is ranked below the previous two performance




objectives.  Even good correlation can be obtained in cases where the




magnitude of peak levels is poorly predicted and for which a large




overall bias exists.




                 It should be noted that the generic formulation and




number of performance objectives for any given application may differ




substantially from those illustrated here.  In other words, the specific




regulatory purpose should be the guide for the selection of those perfor-




mance objectives that are most directly relevant to the intended application.







          3.3.2  Selecting Data Sets and Performance Measures




                 Once the performance evaluation objectives are established,




it is necessary to choose among the various data combinations and performance




statistics listed in Tables 3.1 and 3.2.   These are used to characterize




the ability of the models to meet the evaluation objective.  Table 3.3




summarizes the more important data sets and performance statistics relevant




to each generic objective described above.  These objectives have been




arbitrarily numbered in relative order of importance as they might pertain




to the hypothetical SO2 regulatory application described above.  For an




actual application, any of the three generic objectives (or some other




derived objective) could have a higher level of importance depending on




the nature of the regulatory problem.




                Table 3.3 shows,  for each performance evaluation objective,





a  suggested list  of the most relevant  data sets and performance measures




                                   31

-------
            Table 3.3.   Summary of Data Sets and Performance
                         Statistics for Various Performance Evaluation
                         Objectives

 Performance               Data Sets          Performance        Supplementary
 Evaluation                (Table 3.2)        Statistics         Graphics
 Objectives                                   (Table 3.1)

 1. Determine Model        A-3a, A-3b         Single Valued      None
    Accuracy for                              Residuals
    Peak Values            A-4a, A-4b, A-5    s2Co/s2Cp, d       Freq Dist of
                                                                 Top 25
                           A-1, A-2           s2d, d             Freq Dist of
                                                                 All Values

 2. Determine Model        B-1, B-2,          s2d, d             Selected
    Accuracy Over          B-3, B-4                              Isopleths and
    Entire Concentration                                         Time Series
    Domain                                                       Plots

 3. Spatial and            B-1, B-2,          r, Regression      Scattergrams
    Temporal               B-3, B-4           Statistics
    Correlation            A-1, A-2           r, Regression      Scattergrams
                                              Statistics

 Notes:  (1)  If particular site(s) are crucial (i.e., PSD), then analyses
              should be confined to a site or a subset of important sites.

         (2)  For reactive pollutants, performance measures should be
              developed for each of a number of selected days.
                                   32

-------
along with supplementary graphical displays that may prove useful in the




evaluation process.  For example, the first order objective shown as




"Accuracy of Peak Values" has three data sets.  Except for selected cases,




these data sets correspond to the peak concentration category shown in




Table 3.2.  While each data set offers some measure of information regarding




accuracy of peak estimates, the focus is on different aspects of peak




levels  that may be of greater importance in some applications.  Data sets




A-3a and A-3b relate most directly to short-term ambient standards.




However, they suffer by being statistically non-robust compared to data




sets A-4 and A-5 which involve a greater number of highest values in the




computation of the performance measures.  Data set A-l,  since it consists




of a large number of values (one pair for each time period), is subject




to the  least statistical variability but suffers by including many events




that may be below the concentration range of primary concern.  Thus, the




tradeoff is between the degree of confidence desired and the degree of




regulatory relevance associated with each candidate data set/performance




statistic.




                 The performance statistics are directly tied to the




nature  of the performance evaluation objective and the degree of natural




pairing between the measured and predicted values.  Since the first and




second  objectives both relate to accuracy, measures of bias and precision




(noise  and variability) are indicated.  The third performance evaluation




objective, by virtue of its definition,  involves correlation and hence




the correlation coefficient,  r,  is indicated.   Note that whenever perfor-




mance measures are applied to paired data (e.g.,  A-l,  A-2), the measure




of precision is the noise, s2d, while for unpaired data (e.g., A-4a, A-4b,




A-5), the ratio of variances  is  indicated.
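                 A minimal sketch of these accuracy measures, with the residual
defined here as d = Cp - Co (the precise definitions are those of Table 3.1),
might be:

    # Illustrative sketch: average residual (bias), noise for paired data,
    # and the variance ratio used for unpaired data sets.
    import numpy as np

    def bias_and_noise(co, cp):
        d = cp - co                          # residuals (sign convention assumed)
        return d.mean(), d.var(ddof=1)       # bias and noise s2d

    def variance_ratio(co, cp):
        return co.var(ddof=1) / cp.var(ddof=1)   # s2Co / s2Cp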






                                  33

-------
                 A precise procedure for choosing data sets and perform-




ance statistics for each objective cannot be illustrated here.   It must




be determined by consideration of the nature of the regulatory  purpose(s),




the degree of confidence desired in the final result and the resources




available for the evaluation.   In specific applications, some of the




statistics and/or data sets may be omitted depending upon the degree of




redundancy or relevance to the regulatory problem.   For example, data set




B-3 uses all available pairs of data but requires that only one set of




statistics be calculated.  This contrasts with data set B-l which also




makes use of all data pairs but requires a separate calculation for each




station.   The decision regarding the use of both data sets (i.e., B-3




and B-l) depends to some extent on the need to know how well the models




perform at specific station locations over all concentration levels.







          3.3.3  Weighting the Performance Measures




                 Once the appropriate performance evaluation objectives,




data sets and performance measures are specified, it is necessary to




establish the relative importance each performance measure should hold




in the final decision scheme.   It is suggested that the relative impor-




tance of the performance measures be objectively established by assigning




weights to the performance evaluation objectives and also to each per-




formance measure according to  how well that measure characterizes the




objective.  The assignment of  weights in any given situation is somewhat




judgmental and may differ slightly among trained analysts.  Thus, it is




important in the protocol to document the rationale used to establish




the relative weights.  It is suggested that, in order to keep the problem




simple, weights be established on the basis of a percentage or





fraction of a total 100 points.




                                   34

-------
                 Generally the first order objective would be weighted




most heavily while less important objectives would be weighted less




heavily depending upon their importance in the application.  As an example,




the first order objectives might be weighted 50 percent, the second order




objectives 30 percent, and the third order objectives 20 percent.  For




each performance objective, each combination of performance measures and




data sets must also be given a weight.  Again the determination of the




appropriate weight for each performance measure is judgmental and should




be accompanied by a rationale.  Some of the judgments involved, for




example are:  (1) Is model bias a more important factor than gross varia-




bility? (2) Is accurate prediction of the magnitude of the peak more




important than accurate prediction of the location of that peak?  Answers




to these questions vary with the application and will result in different




assignment of weights accordingly.  Those measures of performance which




best characterize the ability of either model to more accurately estimate




the concentrations that are critical to decision-making should carry the




most weight.  If the estimated maximum concentration controls the emission




limit for the source(s), then more weight should be given to performance




measures that assess the models' ability to accurately estimate the maximum




concentration.
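                 A minimal sketch of such a weighting scheme, using the 50/30/20
split described below and hypothetical per-measure weights, might be:

    # Illustrative sketch: weights expressed as fractions of a total 100 points;
    # the individual measure weights shown here are hypothetical.
    weights = {
        "1. peak accuracy":              {"peak bias": 30, "peak noise": 20},
        "2. all-concentration accuracy": {"bias": 15, "noise": 15},
        "3. space/time correlation":     {"temporal r": 10, "spatial r": 10},
    }
    assert sum(sum(m.values()) for m in weights.values()) == 100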




                 The magnitude of the weights should also take into




consideration the degree of confidence that can reasonably be assigned




to the performance statistic to be calculated,  i.e., only minimal confi-




dence can be placed in single-valued residuals  since these values are




non-robust and sensitive to unusual conditions.  Generally, there will be




some trade-off between degree of confidence and relevance of the particular




performance measure.   This means that the most  relevant performance







                                   35

-------
measures may be given less weight than otherwise might be assigned were




confidence in the result not a critical factor.







          3.3.4  Determining Scores for Model Performance




                 The final step in writing the protocol is to  establish




how each performance statistic (calculated by applying a performance meas-




ure to a given data set) can be translated into  a performance  evaluation




score.  Such a scheme involves definition of the rationale to  be used in




determining the degree to which each pair of performance measure statistics




supports the advantage of one model over the other.   Stated differently,




it is necessary to have a measure of the degree  to which better  performance




of one model over the other can be established for each performance measure.




It seems apparent that the more confidence one has that one model is per-




forming better than the other, the higher that model  should score in the




final decision on the appropriateness of using that model.  Clearly this




is important when at least one of the models is  performing moderately




well.  For example, if only one model appears to be unbiased,  the degree




to which the other is biased can be a factor in  quantifying the  relative




advantage of the apparently unbiased model.




                 Qualitatively, the problem of determining which model is




performing better is straightforward.  Clearly,  the model with the smaller




residuals, the smaller bias, the smaller noise and the higher  correlation




coefficient is better.  The difficulty, which is not  straightforward, is




how to meaningfully quantify the comparative advantage that one  model has




over the other.  There are several approaches that can be used.




                 In one approach, a "score"  is derived for each  pair (one




for each model) of performance statistics.  The  number of points which




are awarded is based on the degree of statistical significance attached




                                   36

-------
to the difference in each model's ability to reproduce the observed data.




The level of significance could be determined by the degree to which con-




fidence limits on performance measures of each model overlap or, alterna-




tively, on an hypothesis-testing method in which a specified confidence




level is assigned.  A procedure for awarding points using confidence limits




is outlined in Appendix B.  In the "example problem" positive points




are awarded for each performance statistic if the proposed model performs




better than the reference model and negative points if the reference




model performs better.  The (absolute) magnitude of the score is dependent




on the relative difference in the performance of each model but




is limited to the "maximum score" established for each measure.  Such a




maximum score is directly proportional (or perhaps equivalent) to the




weight for each measure.
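                 A minimal sketch of this first approach, using a simple
hypothetical rule that awards points in proportion to the relative difference
between the two models' statistics and caps them at the maximum score, might be:

    # Illustrative sketch: signed score for the proposed model on one measure.
    # Smaller statistics (e.g., |bias| or noise) are taken as better here.
    def score_measure(stat_proposed, stat_reference, max_score):
        worst = max(abs(stat_proposed), abs(stat_reference))
        if worst == 0.0:
            return 0.0
        rel_diff = (abs(stat_reference) - abs(stat_proposed)) / worst
        return max(-max_score, min(max_score, rel_diff * max_score))

    # e.g., proposed bias 2.0 vs. reference bias 5.0 with a 20-point maximum
    print(score_measure(2.0, 5.0, 20))   # positive score favors the proposed model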




                 The reader is cautioned that the actual level of statis-




tical significance is based to a varying degree on the assumption that




model residuals are independent of one another, an assumption that is




clearly not true.  For example, model residuals from adjacent time periods




(e.g., hour-to-hour) are known to be positively correlated.  Also, the




proposed and reference model residuals for a given time period are related




since the residual for each model involves the same observed concen-




tration for a given data pairing.  However, if such statistical limitations




are recognized, this approach can be useful as a quantitative indicator




for determining which model is performing better in a particular situation.




                 A second approach for assigning points is to assign




points separately to each model for each performance measure; then, by




difference,  derive a net point total for the proposed model;  the point
                                   37

-------
total can be positive or negative,  as discussed below.  Various schemes,




both statistical and nonstatistical,  have been proposed for assigning




points based on the numerical difference between measured and predicted




levels (i.e., the performance measures).  A predetermined function of the




performance measure could be used to  award points for each model.   The




number of possible points could range from zero when the model performs




unacceptably (e.g., the bias exceeds  the observed average by more  than 50




percent) up to a maximum when the model performs perfectly (e.g.,  the




bias is zero).  The net number of points assigned to the proposed  model




would then be the number of points  awarded to the proposed model minus




the number of points awarded to the reference model.  A positive difference




favors the proposed model while a negative difference favors the reference




model.  In essence, this second approach involves a subjective decision




as to what constitutes acceptable performance.  Although this suggests a




"de facto" performance standard, the  result may be informative since the




total accumulated points for each model would serve as an indicator of




how poorly or how well the models are doing overall in terms of the




particular application.
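                 A minimal sketch of this second approach, using a hypothetical
linear point function for bias of the kind described above, might be:

    # Illustrative sketch: score each model separately, then take the difference.
    def points_from_bias(bias, observed_mean, max_points):
        # zero points when |bias| exceeds 50 percent of the observed average;
        # the maximum when the bias is zero (hypothetical acceptance function)
        frac = abs(bias) / (0.5 * observed_mean)
        return max_points * max(0.0, 1.0 - frac)

    net = points_from_bias(3.0, 40.0, 20) - points_from_bias(8.0, 40.0, 20)
    # a positive net favors the proposed model; a negative net, the reference model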






          3.3.5  Format for the Model Comparison Protocol




                 A suggested format for the model comparison protocol,




based on the weighting and scoring scheme discussed above, is provided by




Table 3.4.  The example format in Table 3.4 is for the first order perfor-




mance objectives.  A similar format would be used for second order objec-




tives, third order objectives, etc.  Examples of "filled out" tables




using this format are provided in Tables 1-4 of Appendix B.




                 In the first column  of Table 3.4 the various data sets





or subsets which will be used to generate statistical or other information





                                  38

-------
[Table 3.4.  Suggested format for the model comparison protocol, shown for
the first order performance objectives; the columns are described in the
text that follows.  (Original table not reproduced.)]

                                   39

-------
are listed.  The second column specifies the various combinations of time




and space pairing between estimates and measurements.   The third column




lists the performance measures to be employed on each data set and time/




space pairing.  The fourth column contains the numerical  scheme that will




be used to determine the points to be awarded to the proposed  model.  The




fifth column lists the averaging times for which statistical or other




information that will be obtained and the sixth column lists the maximum




points or "weighting" for each statistic (or other objective quantity).




In the last column a rationale is to be provided for the  choices made in




the preceding columns.




                 This format is intended to provide a quick visual summary




of the overall scheme for scoring the relative performance of  the models




and for use in establishing criteria for selecting the best model.  The




actual scoring should proceed in a straightforward manner once the perfor-




mance statistics have been calculated and used to allocate points for




each indicated data set.  A total score can be derived by simply summing




the individual scores which will result in a net positive score if the




proposed model scores higher and a net negative score if  the reference




model scores higher.




                 Although it is tempting to choose the higher  scoring




model for use in the regulatory application, two additional criteria may




be considered in arriving at the final determination.   First,  it may be




desirable to establish, a priori, standards of performance that must be




met before either model may be selected.  For example, a  limit (positive




or negative) on peak bias could be set that, if exceeded, would be suffi-




cient for rejecting the proposed model.
                                   40

-------
                 A second selection criterion that may be considered is




establishment of a scoring point range that serves to separate outcomes




that clearly favor the proposed or reference model and outcomes that do




not clearly favor either model.  When the score falls within the scoring




ranges or "window" where neither model is clearly favored, the final




rejection or acceptance of the proposed model could be decided by the




outcome of the technical evaluation (Section 2.6 or 2.7).  Under this




scheme a marginal outcome of the performance evaluation coupled with a




marginal or unfavorable outcome of the technical evaluation would suggest




that the model not be accepted.  Conversely, if the proposed model is




clearly technically well founded or superior to the reference model but




its performance score falls in the window, it probably should be accepted.




Several factors might influence the width of such a scoring margin including




the representativeness and completeness of the data base and the need to




choose a model having a clear performance edge.




                 If any or all of the above suggested additional criteria




are to be considered, then these criteria and their objective use in the




decision process need to be specified in the protocol.  This requirement




is in concert with the basic philosophy of this document that the entire




decision-making scheme is specified "up-front" before any data are collected or




analyzed which might provide insight into the possible outcome.




                 After the model selection process is completed it is still




desirable to ensure that the chosen model will not underpredict measured




concentrations to the extent that the emission limit inferred from appli-




cation of the model would likely result in violations of the NAAQS or PSD




increments.   This could occur in those cases where one model outscores the





other, and thus is judged to be the better performer, yet it still underpredicts







                                    41

-------
the highest concentrations.  To cover such an eventuality it may be




desirable to include criteria in the protocol that allow the emission




limits or the model to be adjusted to such an extent that attainment of




the ambient criteria will be ensured.







     3.4  Protocol When No Reference Model Is Available




          When no reference model is available,  it is necessary to write




a different type of protocol based on case-specific criteria for the




model performance.  However, at the present time,  there is a lack of




scientific understanding and consensus of experts  necessary to  provide a




supportable basis for establishing such criteria for models.  Thus the




guidance provided in this subsection is quite general in nature.  It is




based primarily on the presumption that the applicant and the regulatory




agency can agree to certain performance attributes which, if met,  would




indicate within an acceptable level of uncertainty that the model predic-




tions could be used in decision-making.




          A set of procedures should be established based on objective




criteria that, when executed, will result in a decision on the  accept-




ability of the model from a performance standpoint.  As was the case for




the model comparison protocol, it is suggested that the relative importance




of the various performance measures be established.  Table 3.3  may serve




as a guide.  However, the performance score for  each measure should be




based on statistics of d, or the deviation of the  model estimates from




the true concentration, as indicated by the measured concentrations.  For




each performance measure, criteria should be written in terms of a quanti-




tative statement.  For example, it might be stated that the average model




bias should not be greater than plus or minus X  at the Y percent signifi-





cance level.  Some considerations in writing such  criteria are:





                                   42

-------
                 (1)  Conservatism.  This involves the introduction of  a




purposeful bias that is protective of the ambient standards or increments,




i.e., overprediction may be more desirable than underprediction.




                 (2)  Risk.  It might be useful to establish maximum or




average deviation from the measured concentrations that could be  allowed.




                 (3)  Experience in the performance of models.  Several




references in the literature9,10,11,12 describe the performance of various




models.  These references can serve as a guide in determining the performance




that can be expected from the proposed model,  given that an analogy with




the proposed model and application can be drawn.




          As was the case for the  model comparison protocol,  a decision




format or table analogous to Table 3.4 should  be  established.  Execution




of the procedures in the table  may lead to a conclusion that  the  performance




is acceptable, unacceptable or  marginal.
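          A minimal sketch of a quantitative criterion of the form described
above (testing whether the average bias lies within plus or minus X at the Y
percent significance level, under the simplifying assumption of independent
residuals noted earlier) might be:

    # Illustrative sketch: confidence-interval test on the mean residual.
    import numpy as np
    from scipy import stats

    def bias_within_limit(residuals, limit_x, significance_y=0.05):
        d = np.asarray(residuals, dtype=float)
        half_width = stats.t.ppf(1.0 - significance_y / 2.0, len(d) - 1) * stats.sem(d)
        return (d.mean() - half_width >= -limit_x) and (d.mean() + half_width <= limit_x)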
                                   43

-------
4.0  DATA BASES FOR THE PERFORMANCE EVALUATION




     This section describes interim procedures for choosing, collecting




and analyzing field data to be used in the performance evaluation.  In




general there must be sufficient accurate field data available to adequately




judge  the performance of the model in estimating all the concentrations




of interest for the given application.




     Three types of data can be used to evaluate the performance of a




proposed model.  The preferred approach is to utilize meteorological and




air quality data from a specially designed network of monitors and instru-




ments in the vicinity of the source(s) to be modeled (on-site data).  In




some cases, especially for new sources, it is advantageous to use on-site




tracer data from a specifically designed experiment to augment or be used




in lieu of long-term continuous data.  Occasionally, where an appropriate




analogy to the modeling problem can be identified, it may be possible to




utilize off-site data to evaluate the performance of the model.




     As a general reference for this section the criteria and requirements




contained in the Ambient Monitoring Guidelines for Prevention of Signifi-




cant Deterioration (PSD)13 should be used.  Much of the information con-




tained in the PSD monitoring guideline deals with acquiring information




on ambient conditions in the vicinity of a proposed source, but such data




may not entirely fulfill the input needs for model evaluation.




     All data used as input to the air quality model and its evaluation




should meet standard requirements or commonly accepted criteria for




quality assurance.  New site-specific data should be subjected to a




quality assurance program.   Quality assurance requirements for criteria




pollutant measurements  are  given in Section 4 of the PSD monitoring





guideline.   Section 7  of the PSD monitoring guideline describes quality






                                   45

-------
assurance requirements for meteorological data.   For any time periods




involving missing data, it should be specified how such time periods will be




handled, e.g., by data substitution.







     4.1  On-Site Data




          The preferable approach of performance evaluation is to collect




an on-site data base consisting of concurrent measurements  of emissions,




stack gas parameters,  meteorological data and air quality data.   Given an




adequate sample of these data, an on-site data base designed to  evaluate




the proposed model relevant to its intended application should lead to a




definitive conclusion on its applicability.  The most important  goal of




the data collection network is to ensure adequate spatial and temporal




coverage of model input and air quality data.







          4.1.1  Air Quality Data




                 The analysis performed in Section 2 serves to define the




requisite areal and temporal coverage of the data base and  the range of




meteorological conditions over which data must be acquired.  Once the




scope of the data base is established the remaining problem is to define




the density of the monitoring network, the specific locations of ambient




monitors and the period of time for which data are to be recorded.  In




general it can be said that the type and quantity of data to be  collected




must be sufficient to meet the needs of the protocol developed from the




guidance provided in Section 3.  This determination is a judgment that




must be made in advance of the network design; some more specific con-




siderations are now provided.
                                   46

-------
                 The number of monitors needed to adequately conduct a




performance evaluation is often the subject of considerable controversy.




It has been argued that one monitor located at the point of maximum con-




centration for each averaging time corresponding to the standards or




increments should be sufficient.  However, the points of maximum concen-




tration are not known but are estimated using the models that are them-




selves the subject of the performance evaluation, which of course unaccept-




ably compromises the evaluation.  It is possible that the use of data




from one or two monitors in a performance evaluation may actually be worse




than no evaluation at all since no meaningful statistics can be generated.




Attempts to rationalize this problem may lead to erroneous conclusions on




the suitability of the models.  When the data field is sparse, confidence




bands on the residuals for the two models will be broad.  As a consequence,




the probability of statistically distinguishing the difference between




the performance of the two models may be unacceptably low.




                 At the other extreme is a large number of monitors, perhaps




40 or more.  The monitors may cover the entire modeling domain or area




where significant concentrations, above a small cutoff, can be reasonably




expected.  The monitors may be sufficiently dense that the entire concen-




tration field (isopleths) is established.  Such a concentration field allows




the calculation of the needed performance statistics and, given adequate temporal




coverage, as discussed below, would likely result in narrow confidence bands




on the model residuals.   With these narrow confidence bands it is easier




to distinguish between the relative capabilities of the proposed model vs.




the reference model.  However, costs associated with such a network would




likely be large.
                                  47

-------
                 Thus, the number of monitors needed to conduct a statistically




meaningful performance evaluation should be judged in advance.




Some other factors that should be considered are:




                 1.  Models or submodels that are designed to handle special




phenomena should only be evaluated over the spatial domain where those




phenomena would result in significant concentrations.  Thus, the monitor-




ing network should be concentrated in that area, perhaps with a few out-




lying monitors for a safety factor.




                 2.  In areas where the concentration gradient is expected




to be high (based on preliminary estimates) a high density of monitors




should be considered, while in areas of low concentration gradient a less




dense network is often adequate.




                 3.  If historical on-site air quality and/or meteorological




data are available, these data should also be used to define the locations




and coverage of monitors.




                 In the temporal sense, some of the above rationale is also




appropriate.  A short-term study may lead to low or no confidence in the




ability of the models (proposed and reference) to reproduce reality.  A




multi-year effort will yield several samples and model estimates of the




second-highest short-term concentrations, thus providing some basis for a




statistically significant comparison of models for this frequently critical




estimate.  Realistically, multi-year efforts usually have prohibitive




costs and one has to rely on somewhat circumstantial evidence, e.g. the




upper end of the frequency distribution, to establish confidence in the




models' capabilities to reproduce the second-highest concentration.




                 In general, the data collected should cover a period of re-





cord that is truly representative of the site in question, taking into account







                                   48

-------
variations in meteorological conditions, variations in emissions and




expected frequency of phenomena leading to high concentrations.  One year




of data is normally the minimum, although short-term studies are sometimes




acceptable if the results are representative and the appropriate critical




concentrations can be determined from the data base.  Thus short-term




studies are adequate if it can be shown that "worst case conditions" are




limited to a specific period of the year and that the study covers that




period.  Examples might be ozone problems (summer months), shoreline




fumigation (summer months) and certain episode phenomena.




                 Models designed to handle special phenomena need only




have enough temporal coverage to provide an adequate sample of those




phenomena, i.e., one that yields statistically significant results.  For example, a downwash




algorithm might be evaluated on the basis of 50 or so observations in the




critical wind speed range.




                 It is important that the data used in model development




or model selection be independent of those data used in the performance




evaluation.  In most cases, this is not a problem because the model is




either based on general scientific principles or is based on air quality




data from an analogous situation.  However, in some semi-empirical approaches




where site-specific levels of pollutants are either an integral part of




the model or are used to select certain model options, an independent set




of data must be used for performance evaluation.  Such an independent




data set may be collected at the same site as the one used in model




development, but the data set should be separated in time, e.g. use one




year of data for model development/tuning and a second year for performance




evaluation purposes.

-------
                 For air quality measurements used in the performance




evaluation, it is necessary to distinguish between (1) the contribution




of sources that are included in the model and (2) the contribution attri-




butable to background (or baseline levels).  The Guideline on Air Quality




Models discusses some methods for estimating background.   Considerable




care should be taken in estimating background so as not to bias the




performance evaluation.







          4.1.2  Meteorological and Emissions Data




                 Requisite supporting data such as meteorological and




emissions data should be collected concurrently with the  ambient data.




The degree of temporal resolution of such data should be  comparable to




that of the ambient data (usually 1-hour) or shorter if model input needs




so require.  The location and type of meteorological sensors are generally




defined by the model input requirements.   The more accurately one can pin-




point the location of the plume(s) the less noise that will occur in the




model residuals.  This can be done by increasing the spatial density and




degree of sophistication in meteorological input data, for models that




are capable of accepting such data.  Continuous collection of representative




meteorological input data is important.  If multiple (redundant) sensors




are to be deployed, a statement should be included in the protocol as to




how these data will be used in the performance evaluation.




                 Accurate data on emissions and stack gas parameters, over




the period of record, diminish the noise in the temporal statistics.  The




more accurate the emissions data are, the less noise in the residuals.




Although data contained in a standard emissions inventory can sometimes




be used, it is generally necessary to obtain, and explicitly model with,
                                 50

-------
real-time emissions data (concurrent with the air quality data used in the performance




evaluation).  "In-stack" monitoring is highly recommended




to ensure the use of emission rates and stack gas parameter data comparable




in time to measured ground-level concentrations.







     4.2  Tracer Studies




          The use of on-site tracer material to simulate transport and dis-




persion in the vicinity of a point or line source has received increasing




attention in recent years as a methodology for evaluating the performance




of air quality simulation models.  This technique is attractive from a




number of standpoints:




          1.  It allows the impacts from an individual source to be




isolated from those of other nearby sources which may be emitting the




same pollutants;




          2.  It is generally possible to have a reasonably dense net-




work of receptors in areas not easily accessible for placement of a




permanent monitor;




          3.  It allows a precise definition of the emission rate;




          4.  It allows for the emissions from a proposed source to be




simulated.




          There are some serious difficulties in using tracers to demon-




strate the validity of a proposed model application.  The execution of




the field study is quite resource intensive, especially in terms of man-




power.  Samplers need to be manually placed and retrieved after each test




and the samples need to be analyzed in a laboratory.  Careful attention




must be paid to quality control of data and documentation of meteorological




conditions.   As a result most tracer studies are conducted as a short-term
                                   51

-------
(a few days to a few weeks)  intensive campaign where large amounts of




data are collected.   If conducted carefully,  such studies provide a




considerable amount  of useful data for evaluating the performance of the




model.  However, the performance evaluation is limited to those meteorolo-




gical conditions that occur  during the campaign.   Thus,  while a tracer




study may allow for  excellent spatial coverage of pollutant concentrations,




it provides a limited sample, biased in the temporal sense, and leaves an




unanswered question as to the validity of the model for all points on the




annual frequency distribution of pollutants at each receptor.




          Another problem with tracer studies is  that the plume rise




phenomena may not be properly simulated unless the tracer material can be




injected into the gas stream from an existing stack.  Thus, for new




sources where the material is released from some  kind of platform, the




effects of any plume rise submodel cannot be  evaluated.




          Given these problems, the following criteria should be considered




in determining the acceptability of tracer tests:




          1.  The tracer samples should be easily related to the averaging




time of the standards in question;




          2.  The tracer data should be representative of "worst case




meteorological conditions";




          3.  The number and location of the  samplers should be sufficient




to ensure measurement of maximum concentrations;




          4.  Tracer releases should represent plume rise under varying




meteorological conditions;




          5.  Quality assurance procedures should be in accordance with




those specified or referenced in the PSD monitoring guideline, as well as





other commonly accepted procedures for tracer data;






                                   52

-------
          6.  The on-site meteorological data base should be adequate;




          7.  All sampling and meteorological instruments should be




properly maintained;




          8.  Provisions should be made for analyzing tracer samples at




remote locations and for maintaining continuous operations during adverse




weather conditions, where necessary.




          Of these criteria, items 1 and 2 are the most difficult to




satisfy because the cost of the study precludes collection of data over




an annual period.  Because of this problem it is generally necessary to




augment the tracer study by collecting data from strategically placed




monitors that are operated over a full year.  The data are used to establish




the validity of the model in estimating the second-highest short term and




the annual mean concentration.  Although it is preferable to collect these




data "on-site," this is usually not possible where a new plant is proposed.




It may be possible to use data collected at a similar site in a model




evaluation as discussed in the next subsection.




          As with performance evaluations using routine air quality data,




sufficient meteorological data must be collected during the tracer study




to characterize transport and dispersion input requirements of the model.




Since tracer study data are difficult to interpret, it is suggested that




the data and methodologies used to collect the data be reviewed by indivi-




duals who have experience with such studies.







     4.3  Off-Site Data




          Infrequently, data collected in another location may be sufficiently




representative of a new site so that additional meteorological and air




quality data need not be collected.   The acceptability of such data rests
                                   53

-------
on a demonstration of the similarity of the two  sites.   The existing

monitoring network should meet minimum requirements  for  a  network required

at the new site.  The source parameters at the two  sites should be similar.

The source variables that should be considered are  stack height,  stack gas

characteristics and the correlation between load and climatological con-

ditions.

          A comparison should be made of the terrain surrounding each source.

The following factors should be considered:

          1.  The two sites should fall into the same generic category of

terrain:

              a.  flat terrain;

              b.  shoreline conditions;

              c.  complex terrain;

                  (1)  three-dimensional terrain elements,  e.g.,  isolated
                       hill,
                  (2)  simple valley,
                  (3)  two-dimensional terrain elements, e.g., ridge, and
                  (4)  complex valley.

          2.  In complex terrain the following factors  should be  considered

in determining the similarity of the two sites:

              a.  aspect ratio of terrain, i.e., ratio  of
                  (1)  height of valley walls to width  of  valley,
                  (2)  height of ridge to length of  ridge,  and
                  (3)  height of isolated hill to width of  hill base;

              b.  slope of terrain;

              c.  ratio of terrain height to stack/plume height;

              d.  distance of source from terrain,  i.e., how close to
                  valley wall, ridge, isolated hill;

              e.  correlation of terrain feature with prevailing  winds;

              f.  the relative size (length, height, depth) of the terrain
                  features.
                                   54

-------
          It is very difficult to secure data sets with the above emission




configuration/terrain similarities.  Nevertheless, such similarities are




of considerable importance in establishing confidence in the representa-




tiveness of the performance statistics.  The degree to which the sites




and emission configuration are dissimilar is a measure of the degree to




which the performance evaluation is compromised.




          More confidence can be placed in a performance evaluation which




uses data collected off-site if such data are augmented by an on-site




tracer study (See Section 4.2).  In this case the considerations for




terrain similarities still hold, but more weight  is given to the compara-




bility of the two sets of observed concentrations.  On-site tracer




data can be used to test the ability of the model to spatially define the




concentration patterns if a variety of meteorological conditions are




observed during the tracer tests.  Off-site data  must be adequate to test




the validity of the model in estimating maximum concentrations.
                                   55

-------
5.0  MODEL ACCEPTANCE




     This section describes interim criteria which can be used to judge




the acceptability of the proposed model for the specific regulatory appli-




cation.  This involves execution of the performance protocol which will




lead to a determination as to whether the proposed model performs better




than the reference model.  Or, when no reference model is available, the




proposed model may be found to perform acceptably, marginally, or unaccept-




ably in relation to established site-specific criteria.  Depending on the




results of the performance evaluation, the overall decision on the accepta-




bility of the model might also consider the results of the technical




evaluation of Section 2.







     5.1  Execution of the Model Performance Protocol




          Execution of the model performance protocol involves:  (1) collect-




ing the performance data to be used (Section 4); (2) calculation and/or




analysis of the model performance measures (Section 3.1); and (3) combining




the results in the objective manner described in the protocol (Section




3.3 or Section 3.4) to arrive at a decision on the relative performance




of the two models.




          Table 5.1 provides a format which may be used to accommodate the




results of the model comparison protocol described in Section 3.3.5.  If




a different protocol format is prepared, it should have the same goal,




i.e., to arrive at a decision on how the proposed model performs relative




to the reference model.




          The first column lists the performance objectives.   The next




three columns in Table 5.1 are analogous to the first three columns in




Table 3.4.   The fifth column contains the actual score for each modeling
                                   57

-------
[Table 5.1.  Suggested format for summarizing the statistical analysis and
scores supporting the model comparison; the columns are described in the
text that follows.  (Original table not reproduced.)]
-------
objective as well as the sub-scores for each supporting performance measure.




The scores in this column cannot exceed the maximum scores allowed in the




protocol.  The last column is for the statistics, graphs, analyses and cal-




culations that determine the score for each performance measure, although




most of this information would probably be in the form of attachments.







     5.2  Overall Acceptability of the Proposed Model




          Until more objective techniques are available, it is suggested




that the final decision on the acceptability of the proposed model be based




primarily on the results of the performance evaluation.  The rationale is




that the overall state of modeling science has many uncertainties regard-




less of what model is used, and that the most weight should be given to




actual proven performance.  Thus when a proposed model is found to perform




better than the reference model, it should be accepted for use in the




regulatory application.  If the model performance is clearly worse than




that of the reference model, it should not be used.  Similarly, if the




performance evaluation is not based on comparison with a reference model,




acceptable performance should imply that the model be accepted, while




unacceptable performance would indicate that it is inappropriate.
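     Stated as a simple decision rule (a purely illustrative Python sketch; the
outcome labels are taken from the paragraph above):

    # Illustrative mapping of a performance-evaluation outcome to an acceptance
    # decision; the labels follow the text, not any prescribed coding.
    def acceptance_decision(outcome, reference_model_used):
        if reference_model_used:
            if outcome == "better than reference":
                return "accept the proposed model for the regulatory application"
            if outcome == "worse than reference":
                return "do not use the proposed model"
        else:
            if outcome == "acceptable":
                return "accept the proposed model"
            if outcome == "unacceptable":
                return "the proposed model is inappropriate"
        return "marginal or inconclusive: consider the technical evaluation (Section 2)"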




          As mentioned at the end of Section 3.3.5, the protocol may contain




other criteria, beyond the simple consideration of the score, to determine




whether a proposed model is acceptable.  For example, the protocol might




specify that when the results of the performance evaluation are marginal




or inconclusive, the results of the technical evaluation discussed in




Section 2 should  be used as an aid to deciding on the overall acceptability.




In this case,  a favorable (better than the reference model) technical




review would suggest that the model be used, while a marginal or worse
                                    59

-------
determination would indicate that the model offers no improvement over




existing reference techniques.  If Section 2.7 were used to determine




technical acceptability, a marginal or inconclusive determination on




scientific supportability combined with a marginal performance evaluation




would suggest that the model not be applied to the regulatory problem.




          Also, as mentioned in Section 3.3.5 the protocol might also




specify standards of performance or provisions to guard against underpre-




diction of critical concentrations.  If so, these additional criteria must




be compared against the performance of the model (in the manner specified




in the protocol) before a final decision on the acceptability of the




model can be made.







     5.3  Model Application




          If, as a result of execution of the procedures described in this




document, the proposed model is found to be acceptable, then the model




should be appropriately applied to the intended application.  The data




base requirements, the requirements for concentration estimates and other




applicable regulatory constraints described in the Guideline on Air




Quality Models should be considered.




          Much of the data collected  during the performance evaluation




may also be used during the application phase.  For example, the meteoro-




logical data records may be used as model input.  However, in order to




ensure that temporal variations of critical meteorological conditions are




adequately accounted for, it may be necessary to include a longer period




of record.  Source characterization data collected during the performance




evaluation can be used to the extent  that they reflect operating conditions




corresponding to the proposed emission limits.
                                   60

-------
          The "proven"  model is only applicable  for the source-receptor




relationship for which  the performance evaluation was carried out.   Any




new application, even for a similar source-receptor relationship, in a




different location would generally require a new evaluation.   Significant




differences in the source configuration,  e.g., doubling the stack height




from that which existed during the model technical test, may necessitate




a new evaluation.
                                  61

-------
6.0  REFERENCES

1.  Environmental Protection Agency.   "Guideline on Air Quality Models,"
EPA-450/2-78-027, Office of Air Quality Planning and Standards, Research
Triangle Park, N.C., April  1978.

2.  Fox, D. G.  "Judging Air Quality Model  Performance," Bull. Am. Meteor.
Soc. 62, 599-609, May 1981.

3.  Environmental Protection Agency.   "Guideline for Use of Fluid Modeling
to Determine Good Engineering Practice Stack Height," EPA 450/4-81-003,
Office of Air Quality Planning and Standards, Research Triangle Park,
N.C., July 1981.

4.  Environmental Protection Agency.   "Guideline for Fluid Modeling of
Atmospheric Diffusion," EPA 600/8-81-008, Environmental Sciences Research
Laboratory, Research Triangle Park, N.C., April 1981.

5.  Environmental Protection Agency.   "Guideline for Determination of
Good Engineering Practice Stack Height (Technical Support Document for
Stack Height Regulations)," EPA 450/4-80-023, Office of Air Quality
Planning and Standards, Research Triangle Park, N.C., July 1981.

6.  U. S. Congress.  "Clean Air Act Amendments of 1977," Public Law 95-95,
Government Printing Office, Washington, D.C., August 1977.

7.  Environmental Protection Agency.   "Workbook for Comparison of Air
Quality Models," EPA 450/2-78-028a, EPA 450/2-78-028b, Office of Air
Quality Planning and Standards, Research Triangle Park, N.C., May 1978.

8.  Auer, A. H., "Correlation of Land  Use and Cover with Meteorological
Anomalies," J. Appl. Meteor. 17, 636-643, May 1978.

9.  Bowne, N. E.  "Preliminary Results from the EPRI Plume Model Validation
Project—Plains Site."  EPRI EA-1788-SY, Project 1616, Summary Report, TRC
Environmental Consultants Inc., Wethersfield, Connecticut, April 1981.

10.  Lee, R. F., et al.  "Validation of a Single Source Dispersion Model,"
Proceedings of the Sixth International Technical Meeting on Air Pollution
Modeling and Its Application, NATO/CCMS, September 1975.

11.  Mills, M. T., et al.  "Evaluation of Point Source Dispersion Models,"
EPA 450/4-81-032, Teknekron Research, Inc., September 1981.

12.  Londergan, R. J., et al.  "Study Performed for the American Petroleum
Institute—An Evaluation of Short-Term Air Quality Models Using Tracer
Study Data," Submitted by TRC Environmental Consultants, Inc. to API,
October 1980.

13.  Environmental Protection Agency.  "Ambient Monitoring Guideline for
Prevention of Significant Deterioration (PSD)," EPA 450/4-80-012, Office
of Air Quality Planning and Standards, Research Triangle Park, N.C.,
November 1980.
                                   63

-------
     Appendix A




Reviewer's Checklist
        A-l

-------
                                  Preface







     Each proposal to apply a nonguideline model to a specific situation




needs to be reviewed by the appropriate control agency which has jurisdiction




in the matter.  The reviewing agency must make a judgment on whether the




proposed model is appropriate to use and should justify this judgment with




a critique of the applicant's analysis or with an independent analysis.




This critique or analysis normally becomes part of the record in the case.




It should be made available to the public hearing process used to justify




SIP revisions or used in support of other proceedings.







     The following checklist serves as a guide for writing this critique or




analysis.  It essentially follows the rationale in this document and is




designed to ensure that all of the required elements in the analysis are




addressed.  Although it is not necessary that the review follow the format




of the checklist, it is important that each item be addressed and that the




basis or rationale for the determination on each item is indicated.
                                  A-3

-------
                 CHECKLIST FOR REVIEW OF MODEL EVALUATIONS







I.  Technical Evaluation




    A.  Is all of the information necessary to understand the intended




application available?




        1.  Complete listing of sources to be modeled including source




parameters and location?




        2.  Maps showing the physiography of the surrounding area?




        3.  Preliminary meteorological and climatological data?




        4.  Preliminary estimates of air quality sufficient to (a) determine




the areas of probable maximum concentrations, (b) identify the probable




issues regarding the proposed model's estimates of ambient concentrations




and (c) form a partial basis for design of the performance evaluation data




base?




    B.  Is the reference model appropriate?




    C.  Is enough information available on the proposed model to understand




its structure and assumptions?




    D.  Was a technical comparison of the proposed and reference models




conducted?




        1.  Were procedures contained in the Workbook for Comparison of Air




Quality Models followed?  Are deviations from these procedures supportable




or desirable?




        2.  Are the comparisons for each application element complete and




supportable?




        3.  Do the results of the comparison for each application element




support the overall determination of better, same or worse?
                                  A-5

-------
    E.  For cases where a reference model is not used, is the proposed




model shown to be applicable and scientifically supportable?







II.  Model Performance Protocol




     A.  Are all the performance measures recommended in the document to be




used?  For those performance measures that are not to be used, are valid




reasons provided?




     B.  Is the relative importance of performance measures stated?




         1.  Have performance evaluation objectives that best characterize




the regulatory problem been properly chosen and objectively ranked?




         2.  Are the performance measures that characterize each objective




appropriate?  Is the relative weighting among the performance measures




supportable?




     C.  How are the performance measure statistics for the proposed and




the reference model to be compared?




         1.  Are significance criteria used to discriminate between the




performance of the two models established for each performance measure?




         2.  Is the rationale to be used in scoring the significance criteria




supportable?




         3.  Is the proposed "scoreboard" associated with marginal model




performance supported?




         4.  Are there appropriate performance limits or absolute criteria




which must be met before the model could be accepted?




     D.  How is performance to be judged when no reference model is used?




         1.  Has an objective performance protocol been written?




         2.  Does this protocol establish appropriate site-specific performance




criteria and objective techniques for determining model performance relative





to these criteria?





                                   A-6

-------
         3.  Are the performance criteria in keeping with experience, with




the expectations of the model and with the acceptable levels of uncertainty




for application of the model?







III.  Data Bases




      A.  Are monitors located in areas of expected maximum concentration




and other critical receptor sites?




      B.  Is there a long enough period of record in the field data to




judge the performance of the model under transport/dispersion conditions




associated with the maximum or critical concentrations?




      C.  Are the field data completely independent of the model development




data?




      D.  Where off-site data are used, is the situation sufficiently




analogous to the application to justify the use of the data in the model




performance evaluation?




      E.  Will enough data be available to allow calculation of the various




performance measures defined in the protocol?  Will sufficient data be




available to reasonably expect that the performance of the model relative




to the reference model or to site-specific criteria can be established?







IV.  Is the Model Acceptable?




     A.  Was execution of the performance protocol carried out as planned?




     B.  Is the model acceptable considering the results of the performance




evaluation and the technical evaluation?
                                  A-7

-------
    Appendix B







Narrative Example
      B-l

-------
                                 Preface




     This narrative example was developed to illustrate the use of the




Interim Procedures for Evaluating Air Quality Models.  Although the




example substantially abbreviates many of the tasks involved in a real




model comparison problem and recommended in the interim procedures, it




does illustrate the task with which users are most unfamiliar, i.e., the




development and execution of the performance evaluation protocol.  The




following comments/caveats are offered to help the reader better understand and




utilize the example:







     1.  The preliminary technical/regulatory analysis of the intended




model application, while included in the example, is significantly fore-




shortened from that which would normally be needed for an actual problem.







     2.  The example was specifically designed to illustrate in a very




general way the components of the decision making process and the protocol




for performance evaluation.  As such, the protocol incorporates a broad




spectrum of performance statistics with associated weights.  The number of




statistics contained in this example is probably overly broad for most per-




formance evaluations and perhaps even for the problem illustrated.  Thus




its use is not intended to be a "model" for actual situations encountered.




For an individual performance evaluation it is recommended that a subset




of statistics be used, tailored to the performance evaluation objectives




of the problem.  The statistical performance measures and associated weight-




ing scheme should be kept as simple (and as understandable) as possible.




Complexity implies more precision than exists in the performance measures
                                   B-3

-------
and weighting schemes and does not reflect the  current  level  of  knowledge




and experience in conducting performance  evaluations.







     3.  Similarly, the method used to assign scores  to each  performance




statistic (non-over-lapping confidence intervals)  is  not intended  to  be a




"model" to be followed but should be viewed as  only one of  several possible




techniques to accomplish the same goal.







     4.  The example does not illustrate  the design of  the  field measurement




program required to obtain model evaluation data.







     The original narrative example was developed  in  1982 by  TRC Inc.,




under contract to EPA.  This revised example was adapted from the  TRC




contract report to reflect the revisions  made in the  Interim  Procedures




in September 1984.
                                  B-4

-------
                            Table of Contents



     Preface ...................................................  B-3

     Table of Contents .........................................  B-5

     List of Tables ............................................  B-7

     List of Figures ...........................................  B-9

1.0  Introduction ..............................................  B-11

2.0  Preliminary Analysis ......................................  B-13

3.0  Model Evaluation Protocol .................................  B-19

     3.1  NAAQS Attainment .....................................  B-19

     3.2  PSD Analysis .........................................  B-31

4.0  Field Measurements ........................................  B-35

5.0  Performance Evaluation Results and Model Selection ........  B-37

     5.1  Results for Model Performance Comparison in the NAAQS
          Analysis .............................................  B-46

     5.2  Results for Model Performance Comparison in the PSD
          Analysis .............................................  B-47

6.0  Summary ...................................................  B-49

7.0  References ................................................  B-51
                                   B-5

-------
                              List of Tables
Number

  1    Model Comparison Protocol for NAAQS Analysis.
       First-Order Objective:  Predicted Highest
       Concentrations 	 B-21

  2    Model Comparison Protocol for NAAQS Analysis.
       Second-Order Objective:  Predict the Domain of
       Concentrations 	 B-24

  3    Model Comparison Protocol for NAAQS Analysis.
       Third-Order Objective:  Predict the Pattern
       (Spatial and Temporal) of Concentrations 	 B-26

  4    Model Comparison Protocol for PSD Analysis.
       First-Order Objective:  Predict Highest
       Concentrations in PSD Area 	 B-32

  5    Model Comparison Results for NAAQS Analysis.
       First-Order Objective:  Predict Highest
       Concentrations 	 B-38

  6    Model Comparison Results for NAAQS Analysis.
       Second-Order Objective:  Predict the Domain of
       Concentrations 	 B-41

  7    Model Comparison Results for NAAQS Analysis.
       Third-Order Objective:  Predict the Pattern
       (Spatial and Temporal) of Concentrations 	 B-44

  8    Model Comparison Results for PSD Analysis.
       First-Order Objectives:  Predict Highest
       Concentrations in PSD Areas 	 B-45
                                   B-7

-------
                             List of Figures
Number                                                            Page

  1    Field Monitoring Network Near the
       Clifty Creek Power Plant 	  B-15

  2    Example of Overlapping 95% Confidence
       Intervals on Bias for Two Models 	  B-30

  3    Example of "Tightened" Confidence
       Intervals to Result in Non-Overlapping Biases.	  B-30
                                   B-9

-------
B-10

-------
1.0  Introduction




     The Interim Procedures for Evaluating Air Quality Models1 provide a




methodology to judge whether a proposed model, not specifically recommended




for use by the Guideline on Air Quality Models,2 is acceptable for a




particular regulatory application.  This example model evaluation illustrates




the methodology set forth in the Interim Procedures.




     The Interim Procedures provide a basis for objectively selecting




between the proposed model and a reference model that is either recommended




in the Guideline on Air Quality Models or is otherwise agreed to be




acceptable for the particular application.  To judge which model is more




acceptable, the technical features of the two models are compared and




then a site-specific performance evaluation of both models is carried




out.  (For certain regulatory applications, EPA does not designate a




reference model.  In these cases the Interim Procedures provide a method




for assessing the suitability of the proposed model,  based on a technical




review of the model's applicability and a model performance evaluation).




     This example application illustrates the use of  the Interim Procedures




to select between a proposed model and the reference  model for a specific




regulatory application.  The proposed model is AQ40,  a hypothetical air




quality dispersion model defined for the purpose of this narrative example.




The reference model will be selected as a step in applying the Interim




Procedures.  The regulatory issue of interest is the  short-term air




quality impact from a coal-fired power plant in relation to maintaining




the National Ambient Air Quality Standards (NAAQS)  and the Prevention of




Significant Deterioration (PSD) requirements.  The  Interim Procedures




specify the following major steps:
                                   B-ll

-------
1.  Perform a preliminary technical/regulatory analysis of  intended model




application.  This includes a definition of the regulatory  issues  of con-




cern, a description of the source and physical situation being modeled,




identification of the appropriate reference model, identification and tech-




nical description of the proposed model, preliminary estimates of  air quality




impacts of the two models and an application specific technical comparison




of the proposed and reference models.







2.  Prepare a model performance evaluation protocol which specifies the




statistical performance comparisons for selecting  the appropriate  model.







3.  Describe the proposed field measurements program required  to obtain




model evaluation data.







4.  Carry out the field measurements program,  conduct the performance




evaluation of the proposed and reference models with the data  collected




in the field measurements program using the statistical performance mea-




sures specified in the protocol and, based upon an  objective comparison




of performance results, select/reject the proposed  model.







     Although each of these steps is discussed in  sequence  in  the  narrative




example, resources precluded a rigorous illustration of Steps  1 and 3.




Thus the primary utility of the example is the detailed illustration of




Steps 2 and 4, the design and execution of the performance  evaluation.
                                   B-12

-------
2.0  Preliminary Analysis




     The first step in applying the Interim Procedures for Evaluating Air




Quality Models is to analyze the regulatory issues, physical setting and




pollutant source to which the proposed and reference models are to be




applied.  The regulatory requirements dictate the impact region and the




averaging periods of interest for the model applications.  The physical




setting and source characteristics are the basis for selecting the appro-




priate reference model for the comparative model evaluations.  Additionally,




preliminary modeling estimates of the expected air quality impacts are




made at this time.  These estimates are used subsequently in designing




the performance evaluation data network and the statistical performance




evaluation methodology for the field program.




     The regulatory issue addressed in this example evaluation is the




short-term (3- and 24-hour average) air quality impact from a coal-fired




power plant located in the Midwest.  The power plant used for this example




is the Clifty Creek generating station located in southern Indiana and




operated by the Indiana-Kentucky Electric Corporation.  Compliance with




NAAQS and PSD Class I requirements for 3- and 24-hour average sulfur




dioxide (SO2) impacts is the specific regulatory concern to be addressed.




No actual PSD Class I region exists in the vicinity of the Clifty Creek




station; therefore, a hypothetical Class I region 15 kilometers northeast




of the plant is assumed for this example.  To assess compliance with NAAQS,




model prediction of the highest, second-highest concentration per year




within 50 kilometers of the Clifty Creek station is required.  PSD regula-




tions are based on the predicted highest, second-highest impact per year




of the source within the Class I region.
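     For readers unfamiliar with the statistic, the sketch below (Python with NumPy,
illustrative only) computes a highest, second-highest value from an array of 1-hour
concentrations.  The array layout and the use of non-overlapping block averages for
the longer averaging periods are assumptions made for the illustration.

    import numpy as np

    def highest_second_highest(conc, hours_per_period):
        """conc: array of shape (n_receptors, n_hours) of 1-hour averages.
        Returns the highest, over all receptors, of each receptor's
        second-highest block average for the given period (e.g. 3 or 24 hours)."""
        n_rec, n_hr = conc.shape
        n_blocks = n_hr // hours_per_period
        second_highest = np.empty(n_rec)
        for i in range(n_rec):
            blocks = conc[i, :n_blocks * hours_per_period].reshape(n_blocks, hours_per_period)
            averages = blocks.mean(axis=1)
            second_highest[i] = np.sort(averages)[-2]
        return second_highest.max()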
                                   B-13

-------
     The physical setting for this example case is the region surrounding




the Clifty Creek generating station.  The plant is located in the Ohio




River Valley in southern Indiana.  The Clifty Creek station is a baseload




facility and has three 208-meter stacks, with combined average emissions




of 8600 g/s SO2.  The average exit temperature is approximately 445°K and




the exit velocity ranges from approximately 25 to 50 m/s,  depending on




the load.  The plant is on a flood plain located on a bend in the river.




Bluffs rise approximately 60 meters along the Ohio River near the plant.




The terrain beyond the bluffs from south-southwest clockwise to north-




northeast is quite flat.  In the other directions are several streams




cutting down to the Ohio River which have created dendritic drainage




valleys.  The maximum relief in the area (plant grade to the highest




monitor, located on a ridge) is about 130 meters and is associated with a




ridge between stream cuts.  The terrain surrounding Clifty Creek is not




"ideally flat"; however, terrain is well below stack height.  The site-




specific monitoring program includes a 60-meter meteorological tower to




measure winds and vertical temperature gradients, and six SO2 stations 3




to 16 kilometers from the stack.  A map of the monitoring network is




included as Figure 1.




     Selection of the reference model for this application is based upon




the recommendations of the Guideline on Air Quality Models.  The Guideline




recommends the CRSTER model as appropriate for point sources with collocated




stacks located in regions where the terrain does not exceed stack height.




For this example evaluation the CRSTER-equivalent model, MPTER, cited in




the Guideline on Air Quality Models, will serve as the reference model.




(Unlike CRSTER, MPTER permits the user to specify receptor locations exactly.)
                                   B-14

-------
[Map not reproducible in this copy.  Key:  surface wind measurement; SO2 monitor;
meteorological tower; scale in kilometers.  The figure lists, for each monitoring
site, the distance from the plant (km), elevation MSL (m), and azimuth from north of
the plant (degrees):
     1. Bacon Ridge
     2. Rykers Ridge
     3. North Madison
     4. Hebron Church
     5. Liberty Ridge
     6. Canip Creek
It also lists the Clifty Creek plant elevation and the elevation at the top of the
stacks.]
Figure 1.   Field monitoring network near the Clifty Creek Power Plant.
                              B-15

-------
     The proposed model for this narrative example of the Interim




Procedures is AQ40,  a hypothetical dispersion model.   The computer code




for AQ40 embodies the features of several  publicly available Gaussian




dispersion models.  Because of resource constraints for  preparing this




example, the proposed model description and the technical model comparisons




have been foreshortened.  For this example, familiarity  with the features




of the MPTER model is assumed, and the technical comparison presents only




the key technical differences between AQ40 and MPTER.  For an actual




application of the Interim Procedures, a complete technical description




of the proposed model should be prepared.




     Preliminary estimates of the SO2 impact of the Clifty Creek plant were




obtained using EPA screening techniques as recommended in the Interim Pro-




cedures.  These estimates indicate that maximum concentrations occur within




10 kilometers of the Clifty Creek generating station. Refined modeling




using the proposed and reference models AQ40 and MPTER,  respectively,




with 1975 hourly meteorological data has also been done.   On the basis of




the AQ40 modeling results, the 3- and 24-hour average maximum SO2 concen-




trations would be expected to occur approximately 3 kilometers south of




the plant.  MPTER predicts that both the 3- and 24-hour average maximum




SO2 impacts will occur approximately 7 kilometers northeast of the Clifty




Creek station.  Results of this preliminary modeling are to be used in




designing an appropriate performance evaluation data network by indicating




potential maximum impact areas and are useful in designing the statistical




model comparison methodology required by the Interim Procedures.  Once the




preliminary ambient  estimates have been made, the next step in applying




the Interim Procedures for Evaluating Air Quality Models is to perform a





technical comparison of the proposed and reference models.  The technical






                                   B-16

-------
comparison of the proposed and reference models should then be performed




following the methodology set forth in the Workbook for Comparison of Air




Quality Models.3  The purpose of the technical model comparison is to




determine which model would be expected to predict more accurately concen-




trations for the source being considered.  If results of the statistical




performance comparisons, carried out in a subsequent step, are inconclusive,




the results of the technical model comparison can serve as the basis for




determining the acceptability of the proposed model.




     The important technical differences between AQ40 and MPTER are:




     (a)  Terrain considerations.  MPTER simulates the effect of terrain




by subtracting the full terrain height from the effective plume height.




AQ40 uses full terrain subtraction from the effective plume height for




stable atmospheric mixing conditions and half terrain height subtraction




for neutral and unstable meteorology.




     (b)  Dispersion coefficients.  MPTER uses the Pasquill-Gifford hori-




zontal and vertical dispersion coefficients and six stability classes.




AQ40 uses the rural ASME4 horizontal and vertical dispersion coefficients




and five stability classes (one stable class).




     (c)  Stack tip downwash.  MPTER, as run for this example evaluation,




does not invoke this option.  AQ40 does simulate this phenomenon.




     (d)  Plume rise.  MPTER uses the final Briggs'  plume rise approximation.




AQ40 uses the transitional or distance-dependent Briggs'  plume rise




formulation.




     (e)  Buoyancy induced dispersion.  MPTER does not enhance dispersion




due to buoyantly rising plumes,  but AQ40 does employ this option.




     (f)  Wind profile.  MPTER and AQ40 both use a power  law for adjusting





wind speed with height, but  use  different coefficients,  as presented in








                                   B-17

-------
the following table.  AQ40 uses the predicted wind speed at final plume




height in the denominator of the Gaussian dispersion equation.  MPTER




uses the predicted wind speed at stack height in the Gaussian equation.




           Stability     1      2      3      4      5      6

             MPTER      .1     .15    .2     .25    .3     .3

             AQ40       .10    .11    .12    .15    .20    none







     (g)  Mixing height.  With both MPTER and AQ40, the mixing height rises




and falls to maintain a constant height above local terrain.  For MPTER,




however, plumes rising above the mixing height have no ground-level impact




and plumes below the mixing height are fully reflected.  With AQ40, on




the other hand, unlimited mixing heights are used for stable atmospheric




conditions, while a partial plume penetration algorithm is employed for




nonstable conditions.
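     To make items (a), (d), and (f) more concrete, the sketch below (Python) codes
simplified versions of those three treatments.  The power-law exponents are the ones
tabulated under item (f); the plume-rise expressions are the commonly published Briggs
formulas; nothing here is taken from the actual AQ40 or MPTER source code, and the
routines are illustrative only.

    # Illustrative simplifications of items (a), (d) and (f); not model code.
    EXPONENTS = {"MPTER": [0.10, 0.15, 0.20, 0.25, 0.30, 0.30],
                 "AQ40":  [0.10, 0.11, 0.12, 0.15, 0.20, None]}   # AQ40 has no class 6

    def wind_at_height(u_ref, z_ref, z, stability, model):
        """Power-law adjustment u(z) = u_ref * (z / z_ref) ** p (item f)."""
        p = EXPONENTS[model][stability - 1]
        return u_ref if p is None else u_ref * (z / z_ref) ** p

    def effective_height(stack_ht, plume_rise, terrain_ht, stable, model):
        """Terrain treatment (item a): MPTER subtracts the full terrain height;
        AQ40 subtracts the full height only in stable conditions, half otherwise."""
        fraction = 1.0 if (model == "MPTER" or stable) else 0.5
        return stack_ht + plume_rise - fraction * terrain_ht

    def briggs_rise(F, u, x=None):
        """Buoyant plume rise (item d) using standard Briggs expressions:
        F is the buoyancy flux (m**4/s**3), u the wind speed (m/s).  With a
        downwind distance x (m) the transitional rise is returned (as in AQ40);
        with x omitted, the final rise (as in MPTER)."""
        x_star = 14.0 * F ** 0.625 if F < 55.0 else 34.0 * F ** 0.4
        x_eff = 3.5 * x_star if x is None else min(x, 3.5 * x_star)
        return 1.6 * F ** (1.0 / 3.0) * x_eff ** (2.0 / 3.0) / u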




     In an actual model evaluation, a complete technical model comparison




using the "Workbook" procedures would be carried out and submitted to the




control agency for review and agreement that both the proposed and reference




models are appropriate for the regulatory application at hand.
                                   B-18

-------
3.0  Model Evaluation Protocol




     As previously stated, the two principal regulatory purposes  that




this evaluation protocol addresses are the following:




     o  Compliance with National Ambient Air Quality Standards (NAAQS)




for sulfur dioxide (SO2) for 3-hour and 24-hour averaging times.




     o  Assessment of plant SO2 impact on a hypothetical Class I Prevention




of Significant Deterioration (PSD) region located 15 kilometers northeast




of the plant (3-hour and 24-hour averaging times in the vicinity of the




Bacon Ridge Site.)




     The performance of the proposed and reference models, AQ40 and MPTER,




respectively, will be compared based upon each model's ability to simulate




air quality impacts measured on a monitoring network of six SO2 stations




in the vicinity of the Clifty Creek generating station.  The period of




record for the concurrent air quality, meteorological and source data




proposed for these evaluations is January 1 through December 31, 1976.




     Since the projected impact areas are different for each regulatory




purpose, the performance of the models for NAAQS and PSD applications




will be assessed independently.  The performance of the models for NAAQS




will be judged based upon the data from the entire six station network




for 1-, 3- and 24-hour averaging times.  Model performance for the PSD




application will emphasize data from the Bacon Ridge Station within the




hypothetical Class I region.  It is possible that different models may be




selected as being most appropriate for each of the above issues.







     3.1  NAAQS Attainment




          Three performance evaluation objectives have  been established




which are important with respect to this  primary regulatory purpose.
                                  B-19

-------
The first-order objective is to test the ability of the models to predict




successfully the highest concentrations for use in the regulatory decision-




making process.  It is recognized that the single-point prediction of the




highest, second-highest concentration is statistically unmeaningful;




therefore, performance measures in this group also include analysis of




the uppermost predicted and observed concentrations for the data period




of record.




     The second-order objective is to test the ability of the models to




predict successfully the entire domain of concentrations.




     The third-order objective is to test the ability of the models to




predict successfully the spatial and temporal patterns of concentrations.




Tables 1 through 3 summarize the model comparison protocol for the NAAQS




analysis.  The tables describe the evaluation data sets, the performance




measures, bases for calculating confidence intervals, averaging times to




which the performance measures will be applied, and the point assignments




for each measure that will be used to score and compare the predictive




abilities of the two models.




     The performance measures listed in Tables 1 through 3 were selected




to reflect the spirit of the American Meteorological Society (AMS)5




recommendations.  The listed performance measures are specifically those




required to test the ability of the models to meet the model performance




objectives stated above.  The AMS recommendations define statistical




procedures for comparing model predictions with observed concentration




values.  In this example, two models are being compared based on how each




performs against the same set of observations.  This three-way comparison




(proposed model vs. reference model vs. observations) poses a formidable
                                   B-20

-------
























[Tables 1 through 3 are not legible in this copy.  Their titles, from the List of
Tables, are:

     Table 1.  Model Comparison Protocol for NAAQS Analysis.  First-Order Objective:
               Predict Highest Concentrations.

     Table 2.  Model Comparison Protocol for NAAQS Analysis.  Second-Order Objective:
               Predict the Domain of Concentrations.

     Table 3.  Model Comparison Protocol for NAAQS Analysis.  Third-Order Objective:
               Predict the Pattern (Spatial and Temporal) of Concentrations.

As described in the accompanying text, each table lists the evaluation data sets, the
performance measures, the bases for calculating confidence intervals, the averaging
times, the point assignments used to score and compare the two models, and comments
on the rationale for each data subset and group of measures.]

-------
problem for which appropriate statistical methods have not yet been




devised.  The procedures described below for comparing models provide a




decision making framework based upon standard statistical measures.




     The first three columns in Tables 1 through 3 describe the data sets




being used.  The letter and number codes in parentheses in the first column




are included for cross reference to a numbering system recently prepared by




EPA (see Table 3.2 of the Interim Procedures for Evaluating Air Quality Models1).




The fourth and fifth columns specify the performance measure being addres-




sed and the statistical method that will be used to assign the 95 percent




confidence band about each performance measure.  The sixth column lists the




averaging times to which each performance measure will be applied.  The




seventh column lists the points assigned to each performance measure for




scoring model performance.  The final column briefly discusses each of the




performance measures, providing a rationale for using each data subset and




group of performance measures.
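     As a small illustration of that structure, one row of such a protocol table
might be represented as follows (Python); the field values in the example are
invented, and the authoritative entries are those of Tables 1 through 3 themselves.

    from dataclasses import dataclass

    @dataclass
    class ProtocolEntry:                  # one row of Tables 1-3 (illustrative)
        data_set: str                     # evaluation data set, with its (letter, number) code
        performance_measure: str          # e.g. bias, variance, goodness of fit, correlation
        confidence_basis: str             # statistical method for the 95 percent confidence band
        averaging_times: tuple            # averaging times (hours) the measure applies to
        points: int                       # points assigned for scoring model performance
        comment: str                      # rationale for the data subset and measure

    example = ProtocolEntry(
        data_set="all stations, all hours (hypothetical entry)",
        performance_measure="absolute value of the bias",
        confidence_basis="t-test",
        averaging_times=(1, 3, 24),
        points=15,
        comment="hypothetical; supports the objective of predicting the highest concentrations")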




     Tables 1 through 3 contain 67 performance measures designed to test




the relative abilities of AQ40 and MPTER to meet the three evaluation




objectives.  In assigning points to each performance measure, an attempt




was made to balance the regulatory importance, statistical significance




and scientific value of each performance measure.  A total of 1,000




possible points has been divided among the three model evaluation




objectives.  In recognition of the regulatory importance of the first-




order model evaluation objective, one-half of the total available points




(500) have been assigned to the set of performance measures grouped under




that objective, that is, the ability of the models to predict the highest




concentration values.  The second performance objective, prediction of
                                   B-27

-------
the domain of concentration, has been assigned 300 points, and the third




performance objective,  prediction of the pattern of concentrations, has




been allotted the remaining 200 points.




     Four types of performance measures and associated statistical tests




are being used to judge model performance.  The performance measures (and




associated statistical  tests) are the absolute value of the bias (t-test),




variance (F-test and chi-square test), goodness of fit (Kolmogorov-Smirnov test), and




correlation coefficient (Fisher-z test).  Errors of magnitude of prediction




are considered to be more critical than errors of scatter of prediction;




therefore, measures of bias and goodness of fit (which test magnitude errors)




have been allotted more points than measures of variance and correlation.




Since the basic prediction time step of both MPTER and AQ40 is 1 hour,  the




1-hour averaging time measures have received more points than the regulatory




averaging times (3- and 24-hours).  This is done following the recommendations




of the AMS Workshop on  Dispersion Model Performance.
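     The sketch below (Python, using scipy.stats) shows one way the four kinds of
measures and their associated tests could be computed from paired observed and
predicted concentrations.  It is an illustration only; the exact formulations,
pairings, and confidence-interval constructions are those specified in the protocol
tables, and the chi-square variant of the variance test is omitted here.

    import numpy as np
    from scipy import stats

    def performance_measures(obs, pred):
        obs, pred = np.asarray(obs, float), np.asarray(pred, float)
        d = pred - obs

        # Bias: absolute value of the mean residual, with a one-sample t-test.
        bias = abs(d.mean())
        t_res = stats.ttest_1samp(d, 0.0)

        # Variance: F ratio of predicted to observed sample variances.
        f_ratio = pred.var(ddof=1) / obs.var(ddof=1)
        f_tail = stats.f.sf(f_ratio, len(pred) - 1, len(obs) - 1)
        f_p = 2.0 * min(f_tail, 1.0 - f_tail)

        # Goodness of fit: Kolmogorov-Smirnov comparison of the two distributions.
        ks_res = stats.ks_2samp(obs, pred)

        # Correlation coefficient, with Fisher's z transform for interval estimates.
        r = np.corrcoef(obs, pred)[0, 1]
        fisher_z = np.arctanh(r)

        return {"bias": bias, "bias_p": t_res.pvalue,
                "variance_ratio": f_ratio, "variance_p": f_p,
                "ks_statistic": ks_res.statistic, "ks_p": ks_res.pvalue,
                "correlation": r, "fisher_z": fisher_z}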




     Performance measures  and confidence intervals for each performance




measure will be calculated for both MPTER and AQ40 for the averaging




times indicated in Tables  1 through 3.  The performance of the models




will be compared, performance measure by performance measure and averaging




time by averaging time.  If the performance of the two models is signifi-




cantly different statistically (that is, the 95 percent confidence interval




for either model does not  include the value of the performance measure




for the other model), the points indicated in Tables 1 through 3 will be




awarded to the model that calculates closest to the observed value.




Positive points are accumulated for each performance measure if the




proposed model performs better; negative points are accumulated if the
                                   B-28

-------
reference model shows superior performance.   For  goodness  of  fit  all  the




points (plus or minus) will be awarded based upon which model  has better




statistical performance.




     If, for the two models, the 95 percent  confidence  intervals  for  the




absolute value of the bias, variance,  or correlation measures  do  contain




the values of the performance measures for both models,  the non-overlapping




confidence intervals for those measures will be calculated, and the




corresponding percentage of the maximum available points will  be  assigned.




For example, if the 95 percent confidence intervals for  the bias  of the




two models overlap each other's mean bias (see Figure 2),  the  confidence




intervals of measures for both models  will  be "tightened"  (See Appendix C)




until the two biases are mutually different  statistically  at  some level




of significance, as in Figure 3.  To illustrate,  assume  that bias is  a




10-point measure and assume the biases become statistically distinct  at




the 90 percent confidence level; then  nine  (90 percent)  of the possible




10 points would be awarded to the model that better predicts  the  bias.




Only integer points will be awarded, as fractions will be rounded.  (Although




it is recognized that this methodology may not be ideal  in the strictest




statistical sense, it is acceptable for example purposes and  is easy  to




apply.  Others may wish to propose another methodology  for scoring.)
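     The "tightening" idea can be sketched as follows (Python, illustrative; the
formal procedure is the one given in Appendix C, which is not reproduced here).  Each
model's measure is assumed to be summarized by a point estimate and a standard error
with a t-based confidence interval.

    import numpy as np
    from scipy import stats

    def interval(estimate, std_err, dof, confidence):
        half = stats.t.ppf(0.5 + confidence / 2.0, dof) * std_err
        return estimate - half, estimate + half

    def tightened_score(est_a, se_a, est_b, se_b, dof, max_points, truth=0.0):
        """Points awarded to model A (negative when model B is closer to truth)."""
        winner = 1 if abs(est_a - truth) < abs(est_b - truth) else -1
        for conf in np.arange(0.95, 0.0, -0.01):      # tighten downward from 95 percent
            lo_a, hi_a = interval(est_a, se_a, dof, conf)
            lo_b, hi_b = interval(est_b, se_b, dof, conf)
            if not (lo_a <= est_b <= hi_a) and not (lo_b <= est_a <= hi_b):
                # Neither interval now contains the other model's value, so the two
                # measures are "mutually different" at this confidence level.
                return winner * int(round(conf * max_points))
        return 0      # indistinguishable even at very low confidence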




     Following the completion of all the performance measure comparisons,




the points awarded will be totalled.  If the grand total is ≥ +100 points,




the proposed model will be deemed more suitable for assessing  the plant




impact for the appropriate NAAQS averaging time.   If the grand total  is




between -100 and +100 points, no decisive conclusion may be reached




regarding the superiority of either model and further analysis would  be





considered (for example, technical comparisons or further  evaluations
                                  B-29

-------
[Figure 2 (schematic not reproducible in this copy):  Example of Overlapping 95%
Confidence Intervals on Bias for Two Models.  The axis runs from -Bias through 0 Bias
to +Bias; each model's mean bias and 95% confidence interval are marked.

Figure 3 (schematic not reproducible in this copy):  Example of "Tightened" Confidence
Intervals to Result in Non-Overlapping Biases.  The axis runs from -Bias to 0 Bias;
each model's mean bias and tightened confidence interval are marked.]
                             B-30

-------
with more data - see Section 2.)  If the grand total is ≤ -100 points, the




reference model will be judged to have the better performance.
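     The grand-total rule above amounts to the following simple decision function
(Python, illustrative only):

    def naaqs_model_selection(grand_total):
        if grand_total >= 100:
            return "proposed model deemed more suitable"
        if grand_total <= -100:
            return "reference model judged to have the better performance"
        return "inconclusive: further analysis (e.g. technical comparison or more data)"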







     3.2  PSD Analysis




          As with the NAAQS analysis, three performance objectives have




been established to assess the performance of the two models.   The major




difference between the two analyses results from the fact that only one




station is available in the Class I PSD area, which reduces the number  of




data sets used in the PSD portion of the model evaluation.




          Table 4 summarizes the first-order objectives and associated




performance measures designed to assess the ability of the models to




predict concentrations as required for a Class I PSD analysis.  That is,




the data subsets and performance measures in Table 4 evaluate  the ability




of the models to predict highest impacts within the hypothetical Class  I




area described previously.  The second- and third-order objectives and




performance measures for evaluating the models in this PSD application  are




identical to those presented previously in Tables 2 and 3 for  NAAQS analysis.




Again, one-half the points have been allotted to the group of  performance




measures included under the first-order objective (500 points), that is,




testing the ability of the models to predict highest concentrations at  the




station located within the PSD Class I area.  The remaining points are




assigned to the second- and third-order objectives in the same manner as




was done for the NAAQS model evaluation.  The performance of the models for




the PSD application will be compared after summing the points  that each model




scores in Tables 2,  3,  and 4.   Awarding of points and identification of the




model with superior performance are accomplished in a manner identical  to




the method used for the NAAQS  analysis.
                                  B-31

-------






















[Table 4 is not legible in this copy.  Its title, from the List of Tables, is:  Model
Comparison Protocol for PSD Analysis.  First-Order Objective:  Predict Highest
Concentrations in PSD Area.  As described in the text, it lists the data subsets and
performance measures used to evaluate the ability of the models to predict the highest
impacts within the hypothetical Class I area (the Bacon Ridge station).]

-------
          The above methodology will result in the objective selection of




the reference or proposed model for each of the regulatory situations of




concern.
                                   B-33

-------
B-34

-------
4.0  Field Measurements




     Following EPA concurrence that the protocol for model performance




evaluation is technically sound, the field measurements program is




undertaken.  The purpose of the field measurements program is to generate




a data base to be used for the comparative model performance evaluations.




The field program design is based upon the results of the preliminary




modeling, as discussed in Section 2, and the requirements of the protocol,




as discussed in Section 3.  For this example, resource constraints pre-




cluded the design of a data acquisition network and collection of the




requisite field data.  Instead, a historical data base is used and a




hypothetical regulatory problem is constructed around that data base with




the primary goal of illustrating the use of the statistical performance




evaluation methodology described in the Interim Procedures.  A real




regulatory problem would require an in-depth analysis of the data require-




ments for a comparative model evaluation.
                                  B-35

-------
B-36

-------
5.0  Performance Evaluation Results and Model Selection




     After completion of the field measurements program, the data collected




are used for the comparative model performance evaluation.  The performance




evaluation follows the plan presented in the protocol.  Once the results




of the performance evaluation are compiled, the decision is made whether




to accept or reject the proposed model based upon the objective scoring




scheme presented in the protocol.  A report containing the results of the




evaluations, the results of the comparative model scoring, and the decision




whether or not to accept the proposed model is submitted to the control




agency for review.  It is essential to calculate the statistical performance




measures and apply the decision criteria exactly as specified in the




preplanned protocol.  Adherence to the protocol ensures that the decision




is completely objective.




     For this example evaluation, it is assumed that the control agency




approved the performance measures and scoring scheme proposed in the




protocol as presented in Section 3.  Tables 5 through 8 present the




results of the example model evaluations called for in Tables 1 through 4.




The first three columns of each table list the data sets being compared




and indicate whether or not the observed and predicted concentrations are




paired in time or space.  The next two columns list the statistical




performance measures and the statistical bases for calculating confidence




intervals for use in scoring performance.   The sixth column gives the




averaging time for which each comparison has been made.  The next three




columns list the actual performance of MPTER and AQ40, and, where




applicable, the level of significance at which the models demonstrate




statistically different performance.   Where the variance ratio is utilized
                                   B-37

-------
     [Table 5.  Model comparison results for the first-order objective in
      the NAAQS analysis:  ability of the models to predict the highest
      concentrations]
B-38

-------
     [Table 5 (continued).  Model comparison results for the first-order
      objective:  ability of the models to predict the highest concentrations]
                                         B-39

-------





















     [Table 5 (continued).  Model comparison results for the first-order
      objective:  ability of the models to predict the highest concentrations]
B-40

-------





























     [Table 6.  Model comparison results for the second-order objective:
      ability of the models to predict the domain of observed and predicted
      concentrations]
B-41

-------



















     [Table 6 (continued)]

                                  B-42
-------















     [Table 6 (continued)]

                                  B-43
-------




















     [Table 7.  Model comparison results for the third-order objective:
      ability of the models to reproduce the spatial and temporal patterns
      of concentration; the grand total points for the NAAQS comparison are
      given at the end of the table]
B-44

-------










































     [Table 8.  Model comparison results for the first-order objective in
      the PSD analysis:  ability of the models to predict the highest
      concentrations within the PSD Class I area; the grand total points
      for the PSD comparison are given at the end of the table]
B-45

-------
as a performance measure, the variance of the observed concentrations is




also provided to aid in data interpretation.  The last two columns list




the maximum points that may be awarded for each performance measure/averaging




time combination, and the points that actually were awarded.  At the end




of each table the subtotal of points awarded is indicated.  At the end of




Tables 7 and 8, the grand total points for the model performance comparisons




are presented for the NAAQS application and the PSD application,




respectively.
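
     Because the comparison tables reproduce poorly in this copy, the column
layout described above can be summarized as a simple record.  The sketch
below is purely descriptive; the field names are invented here and do not
appear in the Interim Procedures.

     from dataclasses import dataclass
     from typing import Optional

     @dataclass
     class ComparisonRow:
         # One row of Tables 5 through 8, following the column order
         # described in the text above.
         data_set: str                        # data subset being compared
         paired_in_time: bool                 # observed/predicted pairing in time
         paired_in_space: bool                # observed/predicted pairing in space
         performance_measure: str             # e.g., bias, variance ratio, correlation
         confidence_basis: str                # statistical basis for confidence intervals
         averaging_time: str                  # e.g., "3-hour" or "24-hour"
         mpter_result: float                  # performance of the reference model
         aq40_result: float                   # performance of the proposed model
         significance_level: Optional[float]  # where applicable
         observed_variance: Optional[float]   # reported when the variance ratio is used
         max_points: int                      # maximum points for this comparison
         awarded_points: int                  # signed points actually awarded
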







     5.1  Results for Model Performance Comparisons in the NAAQS Analysis




          The grand total points awarded for the model performance




comparison in the NAAQS analysis is +100 points out of a possible +1000




points.  As set forth in the protocol, a score from +100 to +1000 points




results in the acceptance of AQ40,  the proposed model, over MPTER, the




reference model, for NAAQS regulatory applications at the Clifty Creek




station.  With a score of +100, AQ40 has attained the minimum  score




required for acceptance over the reference model.
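
     The acceptance rule quoted above can be written out compactly.  The
sketch below assumes only the score bands stated in this section and in
Section 5.2 (acceptance for +100 to +1000 points, rejection for -100 to
-1000 points); how a score strictly between -100 and +100 would be treated
is not addressed in this excerpt, and the function name is illustrative only.

     def decision(grand_total):
         # Apply the protocol's score bands to the grand total points.
         if 100 <= grand_total <= 1000:
             return "accept the proposed model (AQ40) over the reference model (MPTER)"
         if -1000 <= grand_total <= -100:
             return "reject the proposed model (AQ40)"
         return "not resolved by the bands quoted here"

     print(decision(+100))   # NAAQS comparison:  minimum passing score, AQ40 accepted
     print(decision(-416))   # PSD comparison:  AQ40 rejected (Section 5.2)
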




          Inspection of Table 5 reveals that in the NAAQS application




AQ40 better predicted the highest, second-highest observed SO2 concentra-




tions for both 3- and 24-hour averaging times and showed slightly better




performance overall  for the first performance evaluation objective,




predicting the highest concentrations (+52 points out of a possible +500




points).  AQ40 also outperformed MPTER on the second objective, predicting




the domain of concentrations, by scoring +103 out of a possible +300




points (see Table 6).  MPTER, however, scored better (with -55 points out




of +200 possible points) for the third objective which tests the ability




of the models to match spatial and  temporal patterns of concentration
                                  B-46

-------
(see Table 7).  Based upon the grand total points awarded,  AQ40 is accepted




as suitable for the regulatory analyses of 3- and 24-hour average SO2




impacts from the Clifty Creek generating station in relation to NAAQS




requirements.







     5.2  Results for Model Performance Comparisons in the PSD Analysis




          The grand total points awarded for the model performance




comparison in the PSD analysis is -416 points out of a possible +1000




points, as shown in Table 8.  As set forth in the protocol,  a score from




-100 to -1000 points results in the rejection of the proposed model.   The




score of -416 for the PSD comparative performance evaluation indicates  a




decisive margin in favor of the reference model MPTER over  the proposed




model AQ40.




          Inspection of Table 8 shows that MPTER outperformed AQ40 for  the




first-order performance evaluation objective of predicting  the highest




concentrations within the PSD region.  MPTER scored -464 points out of  a




possible +500 points for this first-order objective, and more accurately




predicted both the 3- and 24-hour average highest,  second-highest concen-




trations as monitored within the PSD region.




          As stipulated by the protocol, the model  performance measures




and scoring scheme used for the second- and third-order PSD  evaluation




objectives, predicting the domain of concentrations and matching the spatial
and temporal patterns of concentration, respectively, are




identical  to the performance measures and scoring scheme used for the




NAAQS model comparisons.  The performance results and points awarded  for




these comparisons were presented previously in Tables 6 and 7.  Recall




that AQ40 scored +103 points out of +300 possible points for the second-




order objective, and that MPTER scored -55 points out of +200 possible points
                                  B-47

-------
for the third-order objective.  Based upon the grand total points awarded,




AQ40 is rejected as being suitable for the regulatory analyses of 3- and




24-hour average SO2 impacts from the Clifty Creek generating station within




the hypothetical PSD Class I area.
                                   B-48

-------
6.0  Summary




     This narrative example of the  Interim Procedures For Evaluating Air




Quality Models illustrates the analytical steps necessary to judge the




acceptability of a proposed, non-Guideline model for a specific regulatory




application.  The statistical performance measures and scoring methodology




used in the narrative example have been selected for this hypothetical




application and set forth in the preplanned protocol.  In actual applica-




tions of the Interim Procedures, the performance evaluation methodology




must be designed to meet the specific objectives of the intended regulatory




use of the proposed model.  It is especially important that close liaison




be maintained with the control agency throughout the model evaluation




process to ensure agreement on the objectivity of the model comparison




results.
                                   B-49

-------
7.0  References

     1.  Environmental Protection Agency.  "Interim Procedures For
         Evaluating Air Quality Models."  Office of Air Quality Planning
         and Standards, Research Triangle Park, North Carolina, August 1981.

     2.  Environmental Protection Agency.  "Guideline On Air Quality
         Models."  EPA 450/2-78-027.  Office of Air Quality Planning and
         Standards, Research Triangle Park, North Carolina, April 1981.

     3.  Environmental Protection Agency.  "Workbook For Comparison Of
         Air Quality Models."  EPA 450/2-78-028a and EPA 450/2-78-028b.
         Office of Air Quality Planning and Standards, Research Triangle
         Park, North Carolina, May 1978.

     4.  American Society of Mechanical Engineers.  "Recommended Guide
         for the Prediction of Dispersion of Airborne Effluents."  M. Smith
         (editor).  American Society of Mechanical Engineers, New York, New
         York, 1968.

     5.  Fox, D. G.  "Judging Air Quality Model Performance - A Summary
         of the AMS Workshop on Dispersion Model Performance, Woods Hole,
         Massachusetts, September 8-11, 1980."  Bulletin of the American
         Meteorological Society, 62:599-609, May 1981.

-------
                           Appendix C




Procedure For Calculating Non-Overlapping Confidence Intervals
                             C-l

-------
     This Appendix illustrates the procedure used to calculate non-




overlapping confidence intervals as discussed in Section 4.3 of the




narrative example.  This procedure is used when the 95 percent confidence




intervals of the performance measure (absolute value of bias, variance or




correlation) contain the value of the performance measure for both




models, as illustrated in Figure 2 of Appendix B.  The following example




demonstrates this procedure.





     Suppose that for Model A the value of the bias performance measure





is 105 μg/m3, the standard error is 20 μg/m3, and the sample size is 600;





and for Model B, these values are
-------
20 percent, until the non-overlapping level was identified.  For the example




in Table C-1, the 90 percent confidence intervals are 72 μg/m3 to 138 μg/m3




for Model A and 34 μg/m3 to 116 μg/m3 for Model B.  Again, both confidence
                                TABLE C-1

          EXAMPLE CONFIDENCE INTERVALS AT FOUR CONFIDENCE LEVELS

                            Model A                        Model B
                        Bias = 105 μg/m3               Bias = 75 μg/m3

  Confidence     Lower Bound    Upper Bound     Lower Bound    Upper Bound
    Level          (μg/m3)        (μg/m3)         (μg/m3)        (μg/m3)

     95%              66             144             26             124
     90%              72             138             34             116
     80%              79             131             43             107
     60%              88             122             54              96
intervals include the value of the bias  for  the other model,  and  therefore




the confidence level must be decreased by another step.  The 80 percent




confidence interval for Model A (79 μg/m3 to 131 μg/m3) does not include




the value of the bias for Model B (75 μg/m3).  However, since the 80 percent




confidence interval for Model B (43 μg/m3 to 107 μg/m3) does include




the value of the bias for Model A (105 μg/m3), the confidence level must




be decreased by another step.  The 60 percent  confidence intervals are 88




μg/m3 to 122 μg/m3 for Model A and 54 μg/m3 to 96 μg/m3 for Model B.




Since neither interval includes the value of the bias for the other model,




the non-overlapping confidence level has been identified as 60




percent.  Thus, in the scoring scheme, 60 percent  of the total possible




points would be awarded to Model B in the case  of  this  example,  the model




with the lower bias.  For the case when  the  20  percent  level  fails to produce




non-overlapping confidence intervals, neither model is  awarded any points.
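
     The stepwise search just described can be sketched in a few lines of
Python.  This sketch is illustrative only and is not part of the Interim
Procedures; the function names are invented here, the standard error of
25 ug/m3 assumed for Model B is inferred from the bounds in Table C-1 rather
than stated in the text, and the exact ladder of confidence levels between
60 and 20 percent is assumed.

     from statistics import NormalDist

     def confidence_interval(estimate, std_error, level):
         # Two-sided normal confidence interval at the given level (e.g., 0.90).
         z = NormalDist().inv_cdf(0.5 + level / 2.0)
         return estimate - z * std_error, estimate + z * std_error

     def non_overlapping_level(value_a, se_a, value_b, se_b,
                               levels=(0.95, 0.90, 0.80, 0.60, 0.40, 0.20)):
         # Step down through the confidence levels until neither model's
         # interval contains the other model's value of the performance
         # measure; return that level, or None if even the lowest level
         # still overlaps (in which case neither model is awarded points).
         for level in levels:
             lo_a, hi_a = confidence_interval(value_a, se_a, level)
             lo_b, hi_b = confidence_interval(value_b, se_b, level)
             if not (lo_a <= value_b <= hi_a) and not (lo_b <= value_a <= hi_b):
                 return level
         return None

     # Worked example from Table C-1:  Model A bias 105 ug/m3 (standard
     # error 20), Model B bias 75 ug/m3 (standard error 25, inferred).
     print(non_overlapping_level(105.0, 20.0, 75.0, 25.0))   # prints 0.6

With these inputs the function returns 0.6, matching the 60 percent level
found above, so 60 percent of the points for this performance measure would
be awarded to Model B, the model with the lower bias.
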
                                  C-4

-------
                                   TECHNICAL REPORT DATA
                            (Please read Instructions on the reverse bejore completingl
 1. REPORT NO.
  EPA 450/4-84-023
4. TITLE AND SUBTITLE
  Interim Procedures for Evaluating  Air Quality
    Models  (Revised)
7. AUTHOR(S)
9. PERFORMING ORGANIZATION NAME AND ADDRESS
  Monitoring and Data Analysis Division
  Office of Air Quality Planning and Standards
  U.S. Environmental Protection Agency
  Research Triangle Park, N.C.  27711
 12. SPONSORING AGENCY NAME AND ADDRESS
  Monitoring and Data Analysis  Division
  Office of Air Quality  Planning and Standards
  U.S. Environmental Protection Agency
  Research Triangle Park,  N.C.   27711	
                                                           3. RECIPIENT'S ACCESSION NO.
                                                           5. REPORT DATE
                                                              September 1984
                                                           6. PERFORMING ORGANIZATION CODE
                                                           8. PERFORMING ORGANIZATION REPORT NO.
                                                           10. PROGRAM ELEMENT NO.
                                                           11 CONTRACT/GRANT NO
                                                            13. TYPE OF REPORT AND PERIOD COVERED
                                                           14 SPONSORING AGENCY CODE
                                                              EPA-450/4-84-023
15. SUPPLEMENTARY NOTES
16 ABSTRACT
        This document describes interim procedures for use in accepting, for a specific
   regulatory application, a model that is not recommended in the Guideline on Air Quality
   Models.  The procedure involves a technical evaluation and a performance evaluation,
   utilizing measured ambient data, of the proposed nonguideline model.  The primary
   basis for accepting the proposed model is a demonstration that it performs better
   (better agreement with measured data) than the Guideline model or the model that EPA
   would normally use in the given situation.  The acceptance procedure may also consider
   the technical merits of the proposed model and, especially in cases where an EPA
   recommended model cannot be identified, the performance of the model in comparison to
   a set of specially designed performance standards.  A major component of the proce-
   dure is the development of a protocol which describes exactly how the performance
   evaluation will be conducted and what the specific basis for accepting or rejecting
   the proposed model will be.
17.
                                KEY WORDS AND DOCUMENT ANALYSIS
                  DESCRIPTORS
 Air Pollution
 Meteorology
 Mathematical Models
 Performance Evaluation
 Performance Standards
 Statistics
                                              b.IDENTIFIERS/OPEN ENDED TERMS
                                               Performance Measures
                                               Technical  Evaluation
                                                                            COSATI Field/Group
 4B
12A
18. DISTRIBUTION STATEMENT


  Release Unlimited
                                               19. SECURITY CLASS (This Report)

                                                Unclassified	
                                                                         21 NO. OF PAGES
                                               20. SECURITY CLASS (This page)

                                                Unclassified	
                                                                         22 PRICE
EPA Form 2220-1 (Rev. 4-77)   PREVIOUS EDITION is OBSOLETE

-------