United States                  Office of Air Quality               EPA-450/4-85-006
Environmental Protection       Planning and Standards              July 1985
Agency                         Research Triangle Park NC 27711

EPA          Interim Procedures
             For Evaluating Air
             Quality Models:
             Experience with
             Implementation

                                          EPA-450/4-85-006
        Interim Procedures for Evaluating Air
Quality Models:  Experience with  Implementation
                    U.S. ENVIRONMENTAL PROTECTION AGENCY
                      Monitoring and Data Analysis Division
                    Office of Air Quality Planning and Standards
                    Research Triangle Park, North Carolina 27711

                             July 1985

                                               Disclaimer

This report has been reviewed by the Office of Air Quality Planning and Standards, U.S. Environmental Protection
Agency, and has been approved for publication.  Mention of trade names or commercial products is not intended to
constitute endorsement or recommendation for use.

                                 Preface

     In August 1981, EPA developed and distributed to its Regional Offices an
in-house document, "Interim Procedures for Evaluating Air Quality Models."
The Regional Offices were encouraged to use the guidance contained in the
document as an aid to determining whether a proposed model, not recommended in
the Guideline on Air Quality Models[1], could be applied to a specific
regulatory situation.  Subsequently, as a result of experience gained in
several applications of these procedures, EPA revised and published the
"Interim Procedures for Evaluating Air Quality Models (Revised)"[2] in
September 1984.

     The material contained in this report summarizes the experience gained
from the first several applications of the original guidance.  Potential
users of the revised Interim Procedures are encouraged to read this report
so that they might benefit from the experience of others and thus be able to
better design their own applications.  The user should pay particular attention
to the Findings and Recommendations (Section 4) so as to know and better
understand particular aspects of the revised procedures on which EPA will
place emphasis in future applications.
                              Acknowledgements

     This report was prepared by Dean Wilson with contributions from
Joseph Tikvart, James Dicke and William Cox, all of the Source Receptor
Analysis Branch, Monitoring and Data Analysis Division.

     Appreciation is extended to Michael Koerber, Region V, Alan Cimorelli,
Region III and Francis Gombar, Region II for their helpful comments during
the review process.  The patience of Linda Johnson as she typed this report
is appreciated.
                            Table of Contents

                                                                     Page

Preface ..........................................................    iii

Acknowledgements .................................................     iv

Table of Contents ................................................      v

List of Tables ...................................................    vii

List of Figures ..................................................     ix

List of Symbols ..................................................     xi

Summary ..........................................................   xiii

1.0  INTRODUCTION ................................................      1

     1.1  Scope and Contents .....................................      2

     1.2  Basic Principles Employed in the Interim Procedures ....      2

     1.3  Summary of the Interim Procedures ......................      3

2.0  APPLICATIONS OF THE INTERIM PROCEDURES TO REGULATORY PROBLEMS     7

     2.1  Baldwin Power Plant ....................................      8

          2.1.1  Background ......................................      8
          2.1.2  Preliminary Analysis ............................     10
          2.1.3  Protocol for the Performance Evaluation .........     11
          2.1.4  Data Bases for the Performance Evaluation .......     12
          2.1.5  Results of the Performance Evaluation and Model
                 Acceptance ......................................     13

     2.2  Westvaco Luke Mill .....................................     13

          2.2.1  Background ......................................     13
          2.2.2  Preliminary Analysis ............................     14
          2.2.3  Protocol for the Performance Evaluation .........     15
          2.2.4  Data Bases for the Performance Evaluation .......     17
          2.2.5  Results of the Performance Evaluation and Model
                 Acceptance ......................................     18

     2.3  Warren Power Plant .....................................     19

          2.3.1  Background ......................................     19
          2.3.2  Preliminary Analysis ............................     21
          2.3.3  Protocol for the Performance Evaluation .........     22
          2.3.4  Data Bases for the Performance Evaluation .......     25

     2.4  Lovett Power Plant .....................................     25

          2.4.1  Background ......................................     26
          2.4.2  Preliminary Analysis ............................     26
          2.4.3  Protocol for the Performance Evaluation .........     28
          2.4.4  Data Bases for the Performance Evaluation .......     30

     2.5  Guayanilla Basin .......................................     30

          2.5.1  Background ......................................     32
          2.5.2  Preliminary Analysis ............................     33
          2.5.3  Protocol for the Performance Evaluation .........     34
          2.5.4  Data Bases for the Performance Evaluation .......     37

     2.6  Other Protocols ........................................     38

          2.6.1  Example Problem .................................     38
          2.6.2  Gibson Power Plant ..............................     39
          2.6.3  Homer City Area .................................     40

3.0  INTERCOMPARISON OF APPLICATIONS .............................     43

     3.1  Preliminary Analysis ...................................     43

          3.1.1  Regulatory Aspects ..............................     44
          3.1.2  Source Characteristics and Source Environment ...     44
          3.1.3  Proposed and Reference Models ...................     46
          3.1.4  Preliminary Concentration Estimates .............     48

     3.2  Protocol for the Performance Evaluation ................     48

          3.2.1  Performance Evaluation Objectives ...............     49
          3.2.2  Data Sets, Averaging Times and Pairing ..........     50
          3.2.3  Performance Measures ............................     53
          3.2.4  Model Performance Scoring .......................     55

     3.3  Data Bases for the Performance Evaluation ..............     57

     3.4  Negotiation of the Procedures to be Followed ...........     60

4.0  FINDINGS AND CONCLUSIONS ....................................     63

5.0  REFERENCES ..................................................     69

Appendix A.  Protocol and Performance Evaluation Results for
             Baldwin Power Plant .................................    A-1

Appendix B.  Protocol and Performance Evaluation Results for
             Westvaco Luke Mill ..................................    B-1

Appendix C.  Protocol for Warren Power Plant .....................    C-1

Appendix D.  Protocol for Lovett Power Plant .....................    D-1

Appendix E.  Protocol for Guayanilla Basin .......................    E-1
                              List of Tables

Number                            Title                               Page

 3-1               Source and Source Environment .................     45

 3-2               Proposed and Reference Models .................     47

 3-3               Weighting of Maximum Possible Points by
                   Data Set, Averaging Time and Degree of
                   Pairing .......................................     51

 3-4               Performance Measures Used in the Protocols ....     54

 3-5               Data Bases for Performance Evaluations ........     58

 3-6               Issues Involved in Negotiations ...............     62
                             List of Figures

Number                            Title                               Page

 1-1           Decision flow diagram for evaluating a
               proposed air quality model ........................      4

 2-1           Map of air quality monitoring stations and the
               meteorological tower in the vicinity of the Baldwin
               power plant, April 1982-March 1983 ................      9

 2-2           Topographic map of the area surrounding the
               Westvaco Luke Mill ................................     14

 2-3           Map of seven air quality monitoring stations and
               the meteorological station in the Warren area .....     20

 2-4           Map of air quality monitoring stations and the
               primary meteorological tower in the vicinity of
               the Lovett power plant ............................     26

 2-5           Map of existing air quality monitoring network
               and expanded air quality monitoring network in the
               Guayanilla area ...................................     31
                               List of Symbols

Co      = Observed Concentration

Cp      = Predicted Concentration

d       = Residual = Co - Cp

Mc      = Number of Observed/Predicted Meteorological Events in Common

R       = Pearson's Correlation Coefficient

RMSEd   = Root-Mean-Square Error of the Residuals

Sd      = Standard Deviation of the Residuals

So^2    = Variance of the Observed Concentrations

Sp^2    = Variance of the Predicted Concentrations
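
As an aid to the reader, the sketch below (Python; an editorial illustration,
not part of the original report or any protocol) shows one way to compute the
statistics defined above from matched lists of observed (co) and predicted (cp)
concentrations.  The function name and dictionary keys are illustrative only.

    # Illustrative sketch: computing the basic performance statistics
    # defined in the List of Symbols from matched hourly lists of
    # observed (co) and predicted (cp) concentrations.
    import math

    def performance_statistics(co, cp):
        n = len(co)
        d = [o - p for o, p in zip(co, cp)]            # residuals, d = Co - Cp
        bias = sum(d) / n                              # average residual
        rmse_d = math.sqrt(sum(x * x for x in d) / n)  # RMSEd
        s_d = math.sqrt(sum((x - bias) ** 2 for x in d) / (n - 1))  # Sd
        mean_o = sum(co) / n
        mean_p = sum(cp) / n
        var_o = sum((o - mean_o) ** 2 for o in co) / (n - 1)   # So^2
        var_p = sum((p - mean_p) ** 2 for p in cp) / (n - 1)   # Sp^2
        # Pearson's correlation coefficient, R
        cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(co, cp)) / (n - 1)
        r = cov / math.sqrt(var_o * var_p)
        return {"bias": bias, "RMSEd": rmse_d, "Sd": s_d,
                "So2": var_o, "Sp2": var_p, "R": r}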

                               Summary

     This report summarizes and intercompares the details of five major
regulatory cases for which the guidance provided in the "Interim Procedures for
Evaluating Air Quality Models"* was implemented in evaluating candidate models.
In two of the cases the evaluations have been completed and the appropriate
model has been determined.  In three cases the data base collection and/or
the final analysis has not yet been completed.

     Due to the unique source-receptor relationships in each case, however,
the procedures, data bases and number of monitors described here are not
necessarily applicable to other situations.  These cases are presented only as
examples of how the 1981 Interim Procedures document has been applied to some
real world situations.

     Each of the five cases involves major point sources of SO2.  In all
cases the major regulatory concern is to determine the emission limit that
would result in attainment of the National Ambient Air Quality Standards
(NAAQS) within a few kilometers of the plants.  Most of the cases involve
power plants and/or industrial facilities located in complex terrain where
short-term impact on nearby terrain is the critical source-receptor
relationship.

     Although the scope of model problems is limited, it seems clear that the
basic principles and framework underlying this guidance are sound and workable
in application.  For example, the concept of using the results from a
pre-negotiated protocol for the performance evaluation has been shown to be an
appropriate and workable primary basis for objectively deciding on the best
model.  Similarly, "up-front" negotiation on what constitutes an acceptable
data base network, while often difficult to accomplish because of conflicting
viewpoints, has been established as an acceptable way of promoting objectivity
in the evaluation.

     In earlier evaluations there was some laxity on the part of the reviewing
agencies in requiring a detailed preliminary evaluation/documentation of the
critical source-receptor relationships.  In more recent evaluations, fulfilling
the requirement for preliminary estimates has led to better understanding of
the source-receptor relationships and provided a better linkage between these
relationships and the contents of the performance evaluation protocol.  These
preliminary estimates also seem to better define the requisite data base
network.  As a consequence of this experience, it is recommended that in future
protocols more emphasis be placed on the preliminary analysis; the results
of this analysis should be linked to the protocol and the requisite data
base through the development of detailed performance evaluation objectives.

     Experience has also pointed up the need to build in some "safeguards" in
the application of the chosen model, should that model be shown to underpredict
concentrations.  This is particularly a problem if an emission limit derived
from the model application might result in violations of the NAAQS.  The methods
used in more recent regulatory cases generally involve the use of "adjustment
factors" to correct for possible underprediction.  This technique is not
particularly appealing and the development of more innovative and
scientifically defensible schemes is recommended.

     Finally, based on this experience, it should be emphasized that the
credibility of the performance evaluation is greatly enhanced by the
availability of continuous on-site measurements of the requisite model input
data.  This includes the measurement of meteorological parameters, as well as
pre-specified backup data sources for missing data periods.  Also included is
the need for continuous in-stack measurement of emissions and accurate stack
parameter data.

-----
*1981 EPA internal document
1.0  INTRODUCTION

     In 1981 a document, "Interim Procedures for Evaluating Air Quality
Models," was prepared in-house by EPA and distributed to the ten
Regional Offices.  This document identified the documentation, model
evaluation and data analyses desirable for establishing the appropriateness
of a proposed model.  The Regional Offices were encouraged to use the
procedures when judging whether a model not specifically recommended for
use in the "Guideline on Air Quality Models"[1] was acceptable for a given
regulatory action.  These procedures, which involved the quantitative
evaluation and comparison of models for application to specific air
pollution problems, addressed a relatively new problem area for the
modeling community.  It was recognized that experience with their use would
provide better insight to the model evaluation problem and its limitations.
During the 1981-1984 time period, several projects which entailed the use
of the procedures were undertaken.  Based on this experience, the procedures
were revised and published as "Interim Procedures for Evaluating Air
Quality Models (Revised)"[2].

     It was clear from the experience gained in application of these 1981
procedures that the basic principles contained therein were sound
and appropriate to apply to regulatory model evaluation problems.  However,
the state of the science did not suggest a single prescription detailing
their application.  In fact, each application of the procedures differed
considerably in detail.  Nevertheless, while the individual merits of each
application could be scientifically debated, each case reflected an
acceptable interpretation of the interim guidance.
     1.1  Scope and Contents

          The purpose of this document is to provide potential users of the
revised Interim Procedures with a description and analysis of several
applications that have taken place.  With this information in mind the user
should be able to:  (1) more effectively implement the procedures, since
some of the pitfalls experienced by the initial pioneers can now be
avoided; and (2) design innovative technical criteria and statistical
techniques that will advance the state of the science of model evaluation.

          Remaining sections of this report are as follows.  Section 1.2
reviews the basic principles underlying the Interim Procedures.  Section
1.3 is a summary of the Interim Procedures, to be used as a point of
reference in reading this report.  Section 2 contains summaries of each
of five major regulatory cases where the Interim Procedures were applied,
as well as brief summaries of three other incomplete cases.  Section 3
intercompares the technical details of each of the five cases.  Section 4
lists the findings and recommendations resulting from the analyses in
Sections 2 and 3.  Appendices A-E contain details of the protocols for
each of the five cases.  Appendices A and B also contain the final scores
for two of the performance evaluations.

     1.2  Basic Principles Employed in the Interim Procedures

          The Interim Procedures for Evaluating Air Quality Models is built
around a framework of basic principles whereby the details of the decision
process to be used in the model evaluation should be established and documented
up-front.  The performance evaluation protocol should be established before
data are available that would allow either the applicant or the control
agency(s) to determine, in advance, the outcome of the evaluation.  These
principles are:
          o  Up-front negotiations/agreements between the user and the
regulatory agencies are vital;

          o  All relevant technical data/analyses and regulatory constraints
are documented;

          o  A protocol for performance evaluation is written before any
data bases are in hand;

          o  A data base network is established that will meet the needs of
both the technical/regulatory requirements and the performance evaluation
protocol;

          o  The performance evaluation is carried out and the decision on
the appropriate model must be made as prescribed in the protocol.

          The material in Sections 2 and 3 is an analysis, among other things,
of how well these principles were adhered to for five cases.  The findings
in Section 4 include specific statements to this effect.

     1.3  Summary of the Interim Procedures

          The document Interim Procedures for Evaluating Air Quality
Models (Revised) describes procedures for use in accepting, for a specific
application, a model that is not recommended in the Guideline on Air Quality
Models.  One requirement is for an evaluation of model performance.  The
primary basis for the model evaluation assumes the existence of a reference
model which has some pre-existing status and to which the proposed nonguideline
model can be compared from a number of perspectives.  However, for some
applications it may not be possible to identify an appropriate reference model,
in which case specific requirements for model acceptance must be identified.
Figure 1-1 provides an outline of the procedures described in the document.
[Figure 1-1: flow diagram.  Two parallel tracks are shown, for the cases with
and without a reference model: write technical description of the proposed
model, conduct a technical comparison, write the performance evaluation
protocol, collect the performance evaluation data, conduct the performance
evaluation, and judge the results against the protocol criteria.]

Figure 1-1.  Decision flow diagram for evaluating a proposed air quality model.
          After analysis of the intended application, or the problem to be
modeled, a decision must be made on the reference model to which the proposed
model can be compared.  If an appropriate reference model can be identified,
then the relative acceptability of the two models is determined as follows.
The proposed model is first compared on a technical basis to the reference
model to determine if it can be expected to more accurately estimate the true
concentrations.  This technical comparison should include preliminary
concentration estimates with both models for the intended application.  Next,
a protocol for model performance comparison is written and agreed to by
the applicant and the appropriate regulatory agency.  This protocol
describes how an appropriate set of field data will be used to judge the
relative performance of the proposed and the reference model.  Performance
measures recommended by the American Meteorological Society[3] are used in
describing the comparative performance of the two models in an objective
scheme.  That scheme should consider the relative importance to the problem
of various modeling objectives and the degree to which the individual
performance measures support those objectives.  Once the plan for performance
evaluation is written and the data to be used are collected/assembled,
the performance measure statistics are calculated and the weighting scheme
described in the protocol is executed.  Execution of the decision scheme
will lead to a determination that the proposed model performs better,
worse or about the same as the reference model for the given application.
The final determination of the acceptability of the proposed model should
be based primarily on the outcome of the comparative performance evaluation.
However, if so specified in the protocol, the decision may also be based
on results of the technical evaluation, the ability of the proposed model
to meet minimum standards of performance, and/or other specified criteria.
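
In schematic terms, such a protocol reduces each model to a single weighted
score.  The sketch below (Python; an editorial illustration, with hypothetical
measure names, weights and scoring functions, since each actual protocol
negotiated its own) shows the general form of the scheme:

    # Illustrative sketch of a protocol-driven weighted scoring scheme.
    # The measures, maximum points and scoring functions are hypothetical.

    def score_model(stats, protocol):
        """stats: {measure_name: value}; protocol: list of
        (measure_name, max_points, scoring_function)."""
        total = 0.0
        for name, max_points, scorer in protocol:
            fraction = scorer(stats[name])     # 0.0 - 1.0 of possible points
            total += max_points * fraction
        return total

    # Example: compare a proposed and a reference model under one protocol.
    protocol = [
        ("bias_high25", 55, lambda b: max(0.0, 1.0 - abs(b) / 100.0)),
        ("rmse_high25", 35, lambda r: max(0.0, 1.0 - r / 200.0)),
        ("freq_dist",   10, lambda f: f),
    ]
    proposed_score  = score_model({"bias_high25": 20.0, "rmse_high25": 80.0,
                                   "freq_dist": 0.7}, protocol)
    reference_score = score_model({"bias_high25": -45.0, "rmse_high25": 95.0,
                                   "freq_dist": 0.5}, protocol)
    # The model with the higher total score is judged the better performer.
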
          If no appropriate reference model is identified, the proposed model
is evaluated as follows.  First, the proposed model is evaluated from a
technical standpoint to determine if it is well founded in theory and is
applicable to the situation.  Preliminary concentration estimates for the
proposed application should be included.  This involves a careful analysis
of the model features and use in comparison with the source configuration,
terrain and other aspects of the intended application.  Secondly, if the
model is considered applicable to the problem, it is examined to see if the
basic formulations and assumptions are sound and appropriate to the problem.
(If the model is clearly not applicable or cannot be technically supported,
it is recommended that no further evaluation of the model be conducted and
that the exercise be terminated.)  Next, a performance evaluation protocol
is prepared that specifies what data collection and performance criteria
will be used in determining whether the model is acceptable or unacceptable.
Finally, results from the performance evaluation should be considered
together with the results of the technical evaluation to determine
acceptability.
2.0  APPLICATIONS OF THE INTERIM PROCEDURES TO REGULATORY PROBLEMS

     This section describes five major regulatory cases, covering the
period 1982-1984, where the techniques described in the Interim Procedures
are being applied to establish the appropriate model for setting emission
limits.  Although protocols for the comparative performance evaluation of
competing models have been prepared for all five cases, in only two cases
has the execution of the protocol been completed; these results are
presented.

     Sections 2.1 through 2.5 are arranged roughly chronologically, i.e.,
in the order in time when a final performance evaluation protocol was
established.  Section 2.6 contains brief summaries of other applications
of the Interim Procedures of which EPA is aware; however, for a variety of
reasons, the chosen models have not been used in regulatory decision-making.

     The history of negotiation over appropriate models, data bases, emission
limits, etc., for the sources included in these specific applications dates
back several years.  The development and execution of an agreed upon procedure
for the comparative performance evaluation of competing models is, or is
designed to be, the basis for resolution of these issues.  No attempt is
made in the following subsections to describe the complete history of issues/
negotiations.  Instead, only a brief definition of the issues to be resolved
by the performance evaluation is provided.

     Each of Sections 2.1 through 2.5 contains separate subsections dealing
with the background (history), the preliminary analysis, the protocol for
the performance evaluation and the data bases to be used in the performance
evaluation.  In addition, Sections 2.1 and 2.2 include a subsection which
summarizes the results of the performance evaluation.
     2.1  Baldwin Power Plant

          The Baldwin power plant, located in Randolph County, Illinois,
about 60 km southeast of St. Louis, Missouri, is composed of three steam/
electric generating units with a combined design generating capacity of
1,826 megawatts.  Each of the boilers is vented through an individual
605-foot (184m) stack.  A map of the area is provided in Figure 2-1.

          2.1.1  Background

                 In late 1981 the State-approved SO2 emission rate was
101,588 lb/hour.  Illinois Power Company (IP) requested that this rate be
established as the EPA-approved SIP limit adequate to protect both primary
and secondary National Ambient Air Quality Standards (NAAQS).  The basis
for this proposal was estimates by the MPSDM model indicating compliance
with the standards.  The company claimed that the use of MPSDM was supported
by data from an 11-station monitoring network in the vicinity of the plant.
Estimates using the EPA CRSTER model indicated compliance with the primary
NAAQS but violations of the 3-hour secondary NAAQS.

                 Potential problems were:

                 1.  locations of the monitors were not adequate to conduct
a performance evaluation for MPSDM and CRSTER;

                 2.  adequacy of the IP model performance evaluation was in
question since the available monitoring data were used to select a "best
fit" option of MPSDM, i.e., an independent performance evaluation was not
conducted; and

                 3.  available monitoring data (summarized as block data)
indicated exceedances (no violations) of both the 3-hour and 24-hour standards
at a previously operated monitor not included in the 11-station network.
[Figure 2-1: map.]

Figure 2-1.  Map of air quality monitoring stations and the meteorological
tower in the vicinity of the Baldwin power plant, April 1982-March 1983.
                 Based on this information EPA decided that the proposed
emission limit was adequate to attain the primary SO2 NAAQS; however, the
secondary NAAQS demonstration should be re-evaluated by the State of Illinois.
Guidance contained in the Interim Procedures for Evaluating Air Quality
Models should be used in the re-evaluation.

                 In response to this suggestion, IP, in February 1982, prepared
the "Proposed Procedures for Model Evaluation and Emission Limit Determination
for the Baldwin Power Plant."  Negotiations then took place between the
Illinois Environmental Protection Agency (IEPA) and IP on the contents of the
document.  The end result of these negotiations was a final protocol[4] issued
by IEPA in June 1982.  The four major differences between the IEPA document
and the IP protocol were:  (1) IEPA eliminated one performance measure that
involved case studies of the 10 episodes with highest measured concentrations;
(2) more weight was given to the comparison of the second-high, single-valued
residuals in the IEPA protocol (and less weight to some of the other
measures); (3) IEPA eliminated the use of 1-hour statistics; and (4) IEPA
eliminated performance tests involving comparison of monitored data with
predictions for a 180-receptor grid.  (Instead, only predictions at the
monitor sites were to be used.)

          2.1.2  Preliminary Analysis

                 The preliminary analysis of the proposed application,
submitted by IP to IEPA, included a definition of the regulatory aspects
of the problem and a description of the source and its surroundings.  The
analysis established that only the 3-hour concentration estimates were at
issue.  IP proposed to use MPSDM in lieu of CRSTER to estimate 3-hour
concentrations, pending the outcome of a comparative performance evaluation.
A technical description of MPSDM and a user's manual for the model were
provided to IEPA.  IP also provided a technical comparison between MPSDM
and CRSTER following the procedures outlined in the "Workbook for Comparison
of Air Quality Models"[5].  IP's "workbook" comparison concluded that MPSDM
was technically comparable to CRSTER for most application elements but
was technically better for two of the elements; thus MPSDM was judged by
IP to be technically superior to CRSTER for the proposed application.

                 Preliminary concentration estimates were made with both
CRSTER and MPSDM although the details of these estimates were not documented.
From other information available it was evident that MPSDM would yield lower
3-hour estimates than CRSTER at locations within 2 km under very unstable
meteorological conditions (A-stability).  These estimates would be controlling,
i.e., the estimates that would be used to set the emission limit for the power
plant.
          2.1.3  Protocol for the Performance Evaluation

                 The IEPA protocol for the comparative performance evaluation
of MPSDM and CRSTER, which is detailed in Appendix A, strongly emphasized
accurate prediction of the peak (highest, second-highest) estimate.  Fifty-five
(55) percent of the weighting in the protocol involved the calculation of
performance statistics that characterize each model's ability to reproduce
the measured second-high concentrations at the various monitors.  Thirty-five
(35) percent of the weighting was assigned to performance statistics
that characterize the models' ability to reproduce the measured concentrations
in the upper end of the observed frequency distribution, namely the high-25
observed and predicted concentrations.  In addition, the protocol included
performance measures designed to determine how well the models perform for
specific meteorological conditions (5%) and performance statistics that compare
the upper end of the frequency distribution of measured/predicted values (5%).
                 The primary performance measures used in the evaluation were
the residual (observed minus predicted concentration) and the bias (average
residual for the high-25 data set).  Performance measures were calculated
from data paired in space and time and completely unpaired, with the major
weighting on the unpaired data.  Other performance measures in the protocol
included the standard deviation of the residual and the root-mean-square error
of the residual.
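
As an illustration of the unpaired comparison (a sketch, not taken from the
protocol itself; the function name is illustrative), the high-25 observed and
predicted values are ranked independently before residuals are formed:

    # Illustrative sketch of an unpaired "high-25" comparison: the top 25
    # observed and top 25 predicted concentrations are ranked independently
    # (not matched by time or monitor) before residuals are formed.

    def high25_bias(observed, predicted, n=25):
        top_obs  = sorted(observed, reverse=True)[:n]
        top_pred = sorted(predicted, reverse=True)[:n]
        residuals = [o - p for o, p in zip(top_obs, top_pred)]  # d = Co - Cp
        return sum(residuals) / len(residuals)   # average residual (bias)
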




                 The scoring scheme used for most performance statistics
consisted of a percentage of maximum possible points within specified cutoff
values.  If the performance statistic fell outside of the cutoff values,
no points were awarded to the model.  Within the acceptable range, the percent
of possible points was linearly related to the value of the performance
statistic.  The sign (+ or -) of the residual and bias statistics was not
considered in the scoring process, i.e., overprediction and underprediction
were weighted equally.  The scoring schemes for the meteorological cases and
for the frequency distributions were more complicated; refer to Appendix A
for details.

                 The decision criterion by which the better model was chosen
was simply which model attained the better score.
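
The linear scoring described above can be sketched as follows (an editorial
illustration; the cutoff and point values here are hypothetical, not those
negotiated for Baldwin):

    # Illustrative sketch of linear scoring: full credit at a perfect
    # statistic, zero credit at or beyond the cutoff, linear in between.

    def linear_score(statistic, cutoff, max_points):
        magnitude = abs(statistic)     # sign ignored: over- and under-
                                       # prediction weighted equally
        if magnitude >= cutoff:
            return 0.0                 # outside the acceptable range
        return max_points * (1.0 - magnitude / cutoff)

    # e.g., a bias of 40 ug/m3 against a hypothetical 100 ug/m3 cutoff,
    # worth up to 10 points:
    points = linear_score(40.0, cutoff=100.0, max_points=10.0)   # -> 6.0
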






          2.1.4  Data Bases for the Performance Evaluation

                 As mentioned earlier, the data base for the performance
evaluation ultimately consisted of a network of monitors and a meteorological
station specifically designed to fit the needs of the application (see Figure
2-1).  Data obtained from previously operated networks were used in designing
this data base network.  This data base consisted of 10 SO2 monitors and a
single meteorological tower instrumented to collect wind speed, wind direction
and turbulence intensity (for use in MPSDM) data.  Off-site meteorological
data used in the evaluation consisted of mixing height data derived from
National Weather Service (NWS) soundings from Salem, IL and Pasquill-Gifford
stability data derived from surface observations at Scott Air Force Base,
IL (CRSTER only).  Hourly emission data and stack gas parameters were
derived from records of plant load level and daily coal samples.

          2.1.5  Results of the Performance Evaluation and Model Acceptance

                 The data base for this evaluation has been collected and the
performance evaluation has been carried out according to terms specified in
the protocol[4].  The overall result was that MPSDM scored 51.3 points and
CRSTER scored 41.7 points out of a possible 100 points.  Thus MPSDM was
selected as the appropriate model to be used to determine the emission limit
necessary to attain the secondary 3-hour NAAQS.  Details of the performance
evaluation results are provided in Appendix A.
     2.2  Westvaco Luke Mill

          The Westvaco Luke mill in the town of Luke in Allegany County,
Maryland, is located 970 feet (296m) above mean sea level (msl) in a deep
valley on the north branch of the Potomac River.  The region surrounding
the mill is mountainous and generally forested.  Figure 2-2 is a topographic
map of the area surrounding the Westvaco Luke Mill.  The © symbol shows the
location of the 623-foot (190m) main stack which serves the facility.

          2.2.1  Background

                 In response to a consent decree, the company operated an
ambient monitoring and meteorological data collection network from December
1979 through November 1981.  The • symbols in Figure 2-2 show the location
of continuous SO2 monitors and the ▲ symbols show the locations of the
100-meter Meteorological Tower No. 1, the 30-meter Meteorological Tower No. 2
(the Luke Hill Tower) and the 100-meter Beryl Meteorological Tower.  Continuous
SO2 monitors were collocated with Tower No. 1 and Tower No. 2 and an acoustic
sounder was collocated with Tower No. 2.  As shown by Figure 2-2, there were
eleven SO2 monitors of which eight were located on a ridge southeast of the
Main Stack.  SO2 emissions during the two-year monitoring period were limited
to 49 tons per day.
[Figure 2-2: topographic map; the Bloomington monitor is labeled.]

Figure 2-2.  Topographic map of the area surrounding the Westvaco Luke Mill.
Elevations are in feet above mean sea level and the contour interval is 500
feet.  The • symbols represent SO2 monitoring sites.  The ▲ symbols represent
meteorological monitoring sites.  Sites 1 and 2 are also SO2 monitoring sites.
                 The company developed a site-specific dispersion model,
LUMM, which they claimed was applicable to the problem and should be accepted
as the basis for setting a new emission limit of 89 tons per day.  The
company's basis for this claim was described in a March 1982 report[7] in which
estimates from the LUMM model were compared to ambient measurements from the
11-station network.  EPA reviewed the report and found a number of technical
problems with the model, including the use of ambient data to "tune" the
model, i.e., no independent performance evaluation was undertaken.

                 In order to resolve these problems, EPA developed under
contract, in mid-1982, a protocol for conducting a performance evaluation of
models applicable to the Westvaco site.  The company was then asked to compare
their model with the SHORTZ model, using procedures like those suggested in
this protocol.  As a result of these negotiations a final protocol[8] was
agreed upon in late 1982 and subsequently executed by the company, utilizing
the second year of the two-year data base.

          2.2.2  Preliminary Analysis

                 There is little written material on the Westvaco case which
would suggest that an up-front, in-depth preliminary analysis of regulatory
and technical aspects of the problem was undertaken.  However, based on the
above two references, various Federal Register actions and numerous meetings,
both the source and the control agencies apparently had at least tacit
understandings of the regulatory and technical issues involved.  For example,
the regulatory agencies were concerned about attainment of the short-term
ambient standards at elevated receptors near (within a few kilometers of) the
source.  It was also apparent that SHORTZ would yield higher concentration
estimates, and thus a tighter emission limit, than LUMM.

                 References 7 and 8 contain technical descriptions of the two
competing models but no user's manuals.  The SHORTZ model was modified for
use at Westvaco and no user's manual exists for this version.  The references
do not describe any preliminary estimates using the two models nor do they
contain an in-depth technical comparison of the two models.  No analysis
using the Workbook for Comparison of Air Quality Models was undertaken.

          2.2.3  Protocol for the Performance Evaluation

                 The final agreed upon protocol for the comparative performance
evaluation of LUMM and SHORTZ, which is detailed in Appendix B, emphasized
accurate estimates of the peak concentrations and the upper end of the
frequency distributions.  Forty-three (43) percent of the weighting in the
protocol involved the calculation of performance statistics that characterize
each model's ability to reproduce the measured maximum and second-high
concentrations at the various monitors.  Fifty-seven (57) percent of the
weighting was assigned to performance statistics that characterize the
models' ability to reproduce measured concentrations in the upper end of
the observed frequency distribution, namely the high-25 observed and predicted
concentrations.  No "all data" statistics were calculated, i.e., the protocol
assumed that the only relevant data were the top-25 estimated and observed
concentrations.
                 The protocol specified three basic performance measures to be
used in the evaluation:  the absolute residual for single-valued comparisons,
the bias for the top-25 concentrations and the ratios of the observed and
predicted variances for the top-25 concentrations.  Various time and
space pairings were specified with most of the weighting (61 percent) on
data paired in space but not time.

                 The scoring scheme used for each performance statistic was
specified by somewhat complicated formulae and the reader is referred to
Appendix B for details.  Basically, the scheme involved computing ratios
of performance measures between the two competing models and bias ratios
or variance ratios for each model.  These ratios were then combined in
various ways to produce a percentage of maximum possible points for each
performance statistic.  This result was then multiplied by the maximum
possible points for that performance statistic to yield a subscore.
Subscores were then totalled for each model to yield a composite score.
The model with the highest total score was deemed to be most appropriate
to apply to the source.
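
A subscore of this general type can be sketched as follows (an editorial
illustration only; the protocol's actual formulae are given in Appendix B,
and the scoring function, point values and data here are hypothetical):

    # Illustrative sketch of a ratio-based subscore of the kind described
    # above, using the ratio of predicted to observed variances.

    def variance_ratio_subscore(var_obs, var_pred, max_points):
        # A ratio near 1.0 means the model reproduces the observed
        # variability of the top-25 concentrations.
        ratio = var_pred / var_obs
        if ratio > 1.0:
            ratio = 1.0 / ratio        # fold ratios > 1 so that 1.0 is best
        return max_points * ratio      # fraction of possible points

    subscore_a = variance_ratio_subscore(2500.0, 1900.0, max_points=20)
    subscore_b = variance_ratio_subscore(2500.0, 9800.0, max_points=20)
    # Subscores over all statistics are totalled to give each model's
    # composite score; the higher composite is deemed the better model.
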






          2.2.4  Data Bases for the Performance Evaluation

                 The data base used in the performance evaluation was the
second year of the historical two-year data base described above.  The
locations of the ten monitors and the two meteorological towers for use in
the evaluation are shown in Figure 2-2.  Data from the Beryl tower and the
Bloomington monitor were not to be used, although Bloomington data were
used to help establish background values.  Each tower was instrumented at a
number of levels; thus there were often a number of possible values for the
meteorological inputs to each model to choose from.  To promote objectivity
in the evaluation, the primary source of data for each meteorological
parameter, as well as ranked "default" data sources to be used in the event
of missing data, were specified in the protocol.  No off-site meteorological
data were used in the models; however, default values for mixing height and
some turbulence intensities were specified.  Hourly emission data and stack
gas parameters were derived from continuous in-stack measurements.

                 The data base for the model evaluation already existed.
The network was designed in 1978-1979 to determine if there were any NAAQS
violations in the vicinity of the plant and possibly for use in conducting
a performance evaluation.  Most of the monitors were densely clustered on
the hillside south of the plant, the area where maximum concentrations were
expected.  However, a decision was made that the definition of ambient air
did not apply to this property, i.e., the NAAQS did not apply there.  This
fact, together with an opinion of the control agencies that the LUMM model
was partially based on the same data that would be used in the performance
evaluation, raised many questions on the objectivity of the evaluation.
Detailed records on the negotiations between the company and control agencies
to resolve this concern are lacking.  In the end it was apparently decided
that the objectivity in the performance evaluation was not sufficiently
compromised to warrant the redesign of the network and collection of an
additional year's data.  The performance evaluation protocol contained one
mitigating measure in this regard.  In an apparent attempt to compensate
for the lack of sufficient "offsite" monitors, several of the performance
statistics for the single offsite monitor (No. 10 in Figure 2-2) were weighted
by a factor of four over those same statistics for the other eight monitors.

          2.2.5  Results of the Performance Evaluation and Model Acceptance

                 The data base for this evaluation has been collected and
the performance evaluation has been carried out according to terms specified
in the protocol[8].  The overall result was that LUMM scored 363 points and
SHORTZ scored 168 points out of a possible 602 points.  Thus, LUMM was
selected as the appropriate model to be used to determine the emission
limit necessary to attain the NAAQS.  Details of the performance evaluation
results are provided in Appendix B.

     2.3  Warren Power Plant

          The 90 megawatt Warren power plant, operated by the Pennsylvania
Electric Company (Penelec), is located in Warren County in northern
Pennsylvania, about 80 km southeast of Erie.  The plant has a single 200-foot
(61m) stack which emits about 2420 lb/hour of SO2 at maximum capacity.  The
modeling region near Warren is characterized by irregular mountainous terrain,
with peak terrain elevations substantially above the top of the power plant
stack (see Figure 2-3).

          2.3.1  Background

                 As a result of earlier modeling, the area was designated
as nonattainment in the late 1970's.  Penelec was directed by the State of
Pennsylvania to establish, through monitoring and modeling, an emission
limit that would ensure attainment of the NAAQS.

                 Penelec believed that the LAPPES model was appropriate to
use for purposes of setting the emission limits.  In March 1984 Penelec
proposed to the State of Pennsylvania Department of Environmental Resources
(DER) an analysis and a performance evaluation protocol, patterned after
the Interim Procedures, to establish whether LAPPES would be more appropriate
than EPA's Complex I model.  A series of negotiations between DER,
EPA and Penelec followed.  A number of additions and changes were made to
[Figure 2-3: map.  Contour interval 20 feet; datum is mean sea level.  The
Starbrick station and the Warren power plant are labeled.]

Figure 2-3.  Map of seven air quality monitoring stations (•) and the
meteorological station (▲) in the Warren area.
the analysis and the protocol, and a final agreed upon analysis and protocol
was written in November 1984[10].  Data collection requisite to executing the
protocol is currently underway.

          2.3.2  Preliminary Analysis

                 The protocol document contains a definition of the regulatory
aspects of the problem and a description of the source and surroundings.
The analysis establishes that the 3-hour and the 24-hour concentration
estimates are at issue.  Penelec proposes to use LAPPES in lieu of Complex I to
estimate concentrations for all averaging times pending the outcome of a
comparative performance evaluation.  Penelec has also submitted a technical
description of LAPPES.  Although a user's manual for LAPPES exists, it is
not clear that the manual is "current" with the version of LAPPES used in
this application.  Penelec has not provided a rigorous technical comparison
of LAPPES and Complex I following the procedures outlined in the Workbook
for Comparison of Air Quality Models.

                 Based on one year of meteorological data, preliminary
concentration estimates have been made with both LAPPES and Complex I and the
details of these estimates, including isopleth maps, are provided in the
protocol document.  These estimates show that maximum concentrations for all
averaging times occur on elevated terrain to the north of the plant.  The
preliminary analysis also identifies another significant SO2 source located
approximately 4 km east of the Warren power plant.  This source is close
enough that short-term impacts could overlap.  Since monitoring data would not
always distinguish between these sources, both sources are included in the
model comparison study.  The hourly average background SO2 concentration is to
be the lowest concentration observed by any station in the monitoring network.
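
This background convention can be sketched as follows (an editorial
illustration, not from the protocol; the station identifiers and values
are hypothetical):

    # Illustrative sketch of the background convention described above:
    # for each hour, the background SO2 concentration is taken as the
    # lowest concentration observed at any station in the network.

    def hourly_background(network_obs):
        """network_obs: {station_id: concentration} for one hour."""
        return min(network_obs.values())

    background = hourly_background({"S1": 45.0, "S2": 12.0, "S3": 30.0})  # -> 12.0
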

          2.3.3  Protocol for the Performance Evaluation

                 The protocol for the comparative performance evaluation of
LAPPES and Complex I, which is detailed in Appendix C, emphasizes accurate
prediction of the peak concentration.  Forty-three (43) percent of the
weighting in the protocol involves the calculation of performance statistics
that characterize each model's ability to reproduce the measured high and
second-high concentrations at the various monitors.  An additional forty-three
(43) percent of the weighting is assigned to performance statistics that
characterize the models' ability to reproduce the measured concentrations in
the upper end of the observed frequency distribution, namely the high-25
concentrations.  These analyses of the high-25 data set include certain
statistics that break out performance by stability category.  In addition, the
protocol assigns a weight of fourteen (14) percent to performance statistics
on the entire range (all data) of measured/predicted values.

                 A variety of performance measures are used in the Warren
protocol; see Appendix C.  Although the bias is weighted heavily in all of
the data sets, the specific performance measures used to characterize bias
vary.  For the maximum single-valued comparisons, the average residual and
the ratio of the absolute residual to the observed concentration (both
paired in space but not time) are used to characterize the bias.  For other
data sets, including the second-high single-valued comparisons, extensive
use is made of the ratio of the predicted to observed concentrations as a
measure of bias.  Other performance measures used in the protocol include
correlation measures and ratios of predicted to observed variances.
Performance statistics are to be calculated for 1-hour, 3-hour, 24-hour and
annual averaging times.  Each averaging time carries considerable weighting.
Sixty (60) percent of the weighting is assigned to unpaired data comparisons
and forty (40) percent to data paired in space but not time.

                 The scoring scheme used for most performance statistics
consists of a percentage of maximum possible points within specified cutoff
values.  If the performance statistics fall outside of the cutoff values,
no points are to be awarded to the model.  Within the acceptable range, the
percent of possible points is specified in tabular form (discrete values
for specified ranges of performance).  The tabular values for the bias
statistics slightly favor the model that overpredicts, if one model
overpredicts to the same extent that the other model underpredicts.  The
scoring schemes for other performance measures are more complicated; refer to
Appendix C for details.  Subscores for each performance statistic are
totaled to obtain a final score for each model.
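
Tabular scoring of this kind can be sketched as follows (an editorial
illustration; the bins and point fractions are hypothetical, not those in
Appendix C, but they show the asymmetry that slightly favors overprediction):

    # Illustrative sketch of tabular (discrete-bin) scoring of a bias
    # statistic expressed as a predicted/observed ratio.

    def tabular_bias_score(ratio, max_points):
        table = [                     # (low, high, fraction of points)
            (0.90, 1.10, 1.00),
            (1.10, 1.30, 0.80),       # overprediction bin
            (0.75, 0.90, 0.70),       # matching underprediction scores less
            (1.30, 1.60, 0.50),
            (0.55, 0.75, 0.40),
        ]
        for low, high, fraction in table:
            if low <= ratio < high:
                return max_points * fraction
        return 0.0                    # outside all cutoffs: no points

    score = tabular_bias_score(1.2, max_points=10)   # -> 8.0
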




                 Initially, the model with the highest score is deemed to
be most appropriate to apply for regulatory purposes.  However, the protocol
contains some additional procedures to be employed if the LAPPES model
attains the highest score but is shown to underpredict the highest
concentrations.  For the 3-hour and 24-hour averaging periods, the average of
the 10 highest concentrations predicted by LAPPES will be compared with the
average of the 10 highest observed values.  If the ratio of the observed to
predicted average is greater than 1, then this ratio will be used to adjust
LAPPES model predictions for the regulatory analyses.  This "safety factor" is
intended to compensate for any systematic model underprediction.  If the
ratio is less than 1, no adjustment will be made.  Note that a different
ratio will be used for each averaging time.  For annual average concentrations,
the averages of observed and predicted values at the seven monitoring
stations will be compared.  If the average of observed annual values is
larger than predicted, then model predictions will be adjusted by the ratio
of the observed to predicted average.
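
The "safety factor" arithmetic can be sketched as follows (an editorial
illustration; the concentration values are hypothetical):

    # Illustrative sketch of the "safety factor" described above: for a
    # given averaging time, predictions are scaled up by the ratio of the
    # averages of the N highest observed and predicted values, but only
    # when that ratio exceeds 1.

    def safety_factor(obs, pred, n=10):
        top_obs  = sorted(obs, reverse=True)[:n]
        top_pred = sorted(pred, reverse=True)[:n]
        ratio = (sum(top_obs) / n) / (sum(top_pred) / n)
        return ratio if ratio > 1.0 else 1.0     # no adjustment if <= 1

    # usage with illustrative 3-hour concentration lists:
    obs_3hr  = [820.0, 760.0, 700.0]
    pred_3hr = [640.0, 600.0, 580.0]
    factor = safety_factor(obs_3hr, pred_3hr, n=3)   # -> about 1.25
    adjusted = [p * factor for p in pred_3hr]
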
          2.3.4  Data Bases for the Performance Evaluation

                 The data base for the performance evaluation consists of a
network of monitors and meteorological stations specifically designed to cover
the area of maximum predicted concentration and to fit the needs of the
protocol.  This data base consists of seven monitors, six of which are in the
area north of the plant, where preliminary estimates indicated that high
concentrations would occur (see Figure 2-3).  The seventh monitor, located
south of the plant, would most often be used to determine background.  Two
meteorological towers are included in the network but data from the Starbrick
tower would be used exclusively unless such data are missing.  For missing
data periods, a hierarchy of default data sources is specified in the
protocol, including data from the Preston tower and off-site data.  Wind
fluctuation (sigma theta) data are used to determine stability in accordance
with the scheme defined in the "Regional Workshops on Air Quality Modeling:
A Summary Report."  Morning and afternoon mixing heights are primarily from
Pittsburgh National Weather Service data.  Hourly emission data and stack gas
parameters are to be derived from records of plant load level and coal
sample data.






     2.4  Lovett Power Plant

          The Lovett power plant is located in the Hudson River Valley of
New York State and is owned by Orange and Rockland Utilities, Inc.  The
plant generates 495 megawatts of electricity and is currently burning 0.37
percent sulfur oil.  Major terrain features in the vicinity of the plant
include the Hudson River Valley, which generally runs from north to south,
and several nearby mountains.  Dunderberg Mountain, with a maximum elevation
of approximately 1100 feet (335m), is located 1-2 km to the north.  Other
significant topographic elevations include Buckberg Mountain, about 1.3 km to
the west, with a peak of 787 feet (240m).  An area of high terrain extends
from west-northwest through north within 5 km of the plant.  A map of the
region is presented in Figure 2-4.
          2.4.1  Background

                 The company requested to convert the plant to low sulfur
(0.6-0.7 percent) coal with a new emission limit of 1.0 lbs SO2/mm Btu.  An
actual increase in SO2 emissions of approximately 12,000 tons per year
would result.

                 In April 1984, the EPA Administrator agreed, in principle, to
allow the company to construct a new 475-foot (145m) stack and convert the
plant to coal.  One provision of the agreement was that the company develop
a protocol for a performance evaluation which was acceptable to EPA and execute
this protocol once the new stack was erected and the conversion to coal was
completed.  The company drafted a protocol for the comparative performance
evaluation of three models:  the NYSDEC model, a modified version of the NYSDEC
model (the company's model of choice) and EPA's Complex I model.  A series of
negotiations then took place between the company, the State of New York and
EPA where the details of the protocol and the proposed monitoring network
were changed several times.  A final protocol[11,12] was agreed upon by all
parties in September 1984.*  The data base collection phase is not yet
underway.  It should be completed by 1988.

          2.4.2  Preliminary Analysis

                 The preliminary analysis of the proposed application, contained
in the protocol documents, provides a complete description of the existing

-----
*Although an appropriate protocol was agreed upon by the source and the
 control agencies, the construction of the 475-foot stack and conversion of
 the plant to coal has not yet begun, pending the outcome (final Federal
 Register approval or disapproval) of the proposed SIP revision.
[Figure 2-4: map.]

Figure 2-4.  Map of air quality monitoring stations (•) and the primary
meteorological tower (▲) in the vicinity of the Lovett power plant.  Ten-
meter meteorological towers are also located at Sites 75, 100, 119 and 6.
and proposed source and the surroundings.  The regulatory constraints had




been established earlier, namely attainment of the S02 NAAQS, primarily the




short-term NAAQS, on nearby elevated terrain above stack height.  The proto-




col document identifies Complex I as the reference model and NYSDEC and
Modified NYSDEC as the two proposed models.  The technical features of the
two proposed models are described, but no user's manuals are provided.  The




preliminary analysis does not include a formal technical comparison of the




proposed and reference models following the procedures outlined in the




Workbook for Comparison of Air Quality Models.




                 Preliminary estimates of 3- and 24-hour SO2 concentrations




have been made with Complex I and the Modified NYSDEC model, using one year of




meteorological data from a tower located at the nearby Bowline power plant.




Modeling has been performed for both maximum and average load conditions.




The protocol document contains a fairly comprehensive analysis of the




results including isopleth maps of maximum short-term concentrations and




tables listing the magnitude and locations of the "high-50" estimates.  The




analysis shows that maximum concentrations for both models would be expected




on Dunderberg Mountain to the north of the plant.  Complex I estimates are




as much as an order of magnitude higher than the Modified NYSDEC model




estimates.  Secondary maxima are estimated to occur on other more distant




terrain features in several directions but these estimates are much lower




than those on Dunderberg Mountain.




                 The protocol document identifies the Bowline power plant, 6 km




to the south, as another significant source of SO2, the plume from which could




simultaneously (with Lovett) impact Dunderberg Mountain.  The contribution from




this plant will be quantified, as a function of meteorological conditions, through
                                     27

-------
utilization of data from the monitoring network obtained prior to the




Lovett plant conversion.






          2.4.3  Protocol for the Performance Evaluation




                 The protocol for the comparative performance evaluation of




the three competing models, which is detailed in Appendix D, emphasizes




accurate prediction of the peak concentrations and the upper end of the




frequency distribution.  Twenty (20) percent of the weighting in the pro-




tocol involves the calculation of performance statistics that characterize




each model's ability to reproduce the measured second-high concentrations




at the various monitors.  Fifty-eight (58) percent of the weighting is




assigned to performance statistics that characterize the models' ability




to reproduce the measured concentration in the upper end of the observed




frequency distribution, namely the high-25 concentrations.  In addition,




the protocol assigns twenty-two (22) percent of the weighting to performance




measures designed to determine how well the models perform for the entire




range (all data) of measured/predicted values, broken out into stable and




unstable conditions.




                 The primary performance evaluation measures are the ratios




of observed to predicted concentrations, ratios of the observed to predicted




variances and the inverse of these ratios.  Seventy-eight (78) percent of




the weighting is associated with statistics based on the values of these




ratios.  These statistics are to be calculated for all combinations of data




pairings but most often the unpaired data sets and the data sets paired in




space only are used.  The analysis of the "all data" data set includes




statistics that break out performance by stability category.  The other




twenty-two (22) percent of the weighting is associated with performance
                                   28

-------
measures designed to characterize correlation, gross variability and the




ability of the models to accurately predict observed concentrations during




observed meteorological conditions.




                 The scoring scheme used for most performance statistics is




specified by somewhat complicated formulae and the reader is referred to




Appendix D for details.  The scheme is similar to that used in the Westvaco




protocol.  Basically, it involves computing ratios of performance measures




between the three competing models and bias ratios or variance ratios for




each model.  These ratios are then combined in various ways to produce a




percentage of maximum possible points for each performance statistic.




This result is then multiplied by the maximum possible points for that




performance statistic to yield a subscore.  Subscores are then summed




for each model to yield a total score.
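                 To make this arithmetic concrete, a minimal sketch in Python
follows.  It is an illustration only; the protocol's actual formulae are given
in Appendix D, and the names used here are hypothetical.

    def total_score(statistics):
        # statistics: list of (performance_factor, max_points) pairs, where
        # performance_factor is the fraction (0 to 1) of maximum possible
        # points earned on one performance statistic.
        subscores = [factor * max_points for factor, max_points in statistics]
        return sum(subscores)

    # Example: three statistics worth 20, 58 and 22 points, respectively:
    # total_score([(0.8, 20), (0.5, 58), (1.0, 22)])  ->  67.0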




                 Initially, the model with the highest score is deemed to be




most appropriate to apply for regulatory purposes.  However, the protocol




contains some additional procedures to be employed if the chosen model is




shown to underpredict the highest concentrations.  The procedure, which is




based on the unpaired in time and space comparisons, is as follows:




                 (1)  If the average of the highest ten predicted 3- or 24-hour




average concentrations is less than the average of the highest ten observed




3- or 24-hour average concentrations, or




                 (2)  If the highest, second-highest predicted 3- or 24-hour




average concentration is less than ninety (90) percent of the highest,




second-highest observed 3- or 24-hour average concentration,




then the model predictions will be linearly adjusted to correct this regula-




tory problem.  The adjustment factors will be calculated as the minimum




needed to eliminate the two conditions of underprediction listed above.
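                 The adjustment logic can be sketched as follows.  This is a
minimal Python illustration assuming simple unpaired lists of predicted and
observed values (at least ten of each); it is not the protocol's own computer
code.

    def adjustment_factor(predicted, observed):
        # Minimum multiplicative factor that removes both conditions of
        # underprediction; comparisons are unpaired in time and space.
        pred = sorted(predicted, reverse=True)
        obs = sorted(observed, reverse=True)
        # Condition (1): mean of the ten highest values.
        f1 = (sum(obs[:10]) / 10.0) / (sum(pred[:10]) / 10.0)
        # Condition (2): the second-highest predicted value must be at
        # least 90 percent of the second-highest observed value.
        f2 = 0.9 * obs[1] / pred[1]
        # Never adjust downward.
        return max(1.0, f1, f2)

The factor would presumably be computed separately for the 3- and 24-hour
averaging times.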
                                   29

-------
          2.4.4  Data Bases for the Performance Evaluation




                 The data base for the performance evaluation will consist




of a network of monitors/meteorological stations specially designed to cover




the area of maximum predicted concentration and to fit the needs of the




protocol.  This data base will consist of eleven monitors, nine of which




are to be in the area north of the plant where preliminary estimates indicate




that high concentrations would occur (See Figure 2-4).  Monitor #38, located




south of the plant, will most often be used to determine background.  A




100-meter meteorological tower, instrumented at three levels,  will be




located at the plant site.  Ten-meter meteorological towers are included in




the network at sites 6, 119, 100 and 75 but data from the 100-meter tower




will be used exclusively.  For missing data periods, a hierarchy of default




data sources is specified in the protocol.   These primarily consist of data




from other levels on the 100-meter tower. Wind fluctuation (sigma theta)




data from the 10-meter height are used to determine stability inputs to




Complex I and the NYSDEC model in accordance with the scheme defined in the




Regional Workshops on Air Quality Modeling:  A Summary Report.   Sigma theta




data from the 100-meter level are used as direct input to the Modified NYSDEC




model.  Morning and afternoon mixing heights will be primarily derived from




the Albany National Weather Service data.  Hourly emission data and stack




gas parameters will be derived from continuous in-stack measurements.
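                 The substitution hierarchy described above amounts to an
ordered fallback scan.  The Python sketch below is purely illustrative; the
actual ordering of substitute levels is specified in the protocol (Appendix D),
and the source labels here are hypothetical.

    def select_observation(hour, sources):
        # sources is an ordered list of (label, series) fallbacks, e.g. the
        # 100-meter level first, then other levels on the same tower; each
        # series maps an hour to a value, or to None when data are missing.
        for label, series in sources:
            value = series.get(hour)
            if value is not None:
                return label, value
        return None, None  # no source reported for this hour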







     2.5  Guayanilla Basin




          The Guayanilla Basin is located on the southern coast of the island




of Puerto Rico.  The area is characterized by coastal plains with hills




rising abruptly from the plains (See Figure 2-5).  Historically, several




industrial sources of SO2 have operated in the area but most have shut down.




The only currently operating sources, and the only ones relevant to this analysis,





                                    30

-------
[Figure 2-5.  Map of the Guayanilla Basin showing the locations of the SO2
monitoring stations in the vicinity of the PREPA and UCCI plants.]

                                     31
-------
are the Puerto Rico Electric Power Authority (PREPA) power plant and the




Union Carbide (UCCI) facility, both located near Tallaboa Poniente.  The oil-
fired PREPA plant has stacks ranging in height from 23 feet (7m) to 250
feet (76m) and a combined nominal SO2 emission rate of 16,545 lb/hour.  The




UCCI plant has five stacks ranging in height from 38 feet (12m) to 160 feet




(49m) with a combined nominal SO2 emission rate of 1568 lb/hour.  Nominal




plant grade for both facilities is ten feet (3m) above mean sea level.







          2.5.1  Background




                 The major regulatory concern with these plants has been the




attainment of the short-term SO2 NAAQS on elevated terrain to the north and




northwest of the sources.  Modeling with EPA's Complex I model indicated that




there would be NAAQS violations on the terrain.  Industrial interests and




the Puerto Rico Environmental Quality Board (PREQB) maintained for several




years that emission limits should be based on estimates from the Puerto Rico




Air Quality Model (PRAQM), which generally predicts lower concentrations




than Complex I.  In 1979 Environmental Research and Technology, Inc. (ERT)




prepared a report for PREQB entitled "Validation of the Puerto Rico Air




Quality Model for the Guayanilla Basin"^ .  The report compared, in various




ways, model estimates with historical ambient air quality data from eight




monitors (four on elevated terrain) in the area.  EPA expressed concerns




about the technical aspects of the model and the underestimation of the




observed concentrations at some monitors.




                 In response to these concerns, it was decided in early 1984




that a comparative performance evaluation between the PRAQM and Complex I




should be undertaken.  Hence EPA developed an analysis and draft protocol for




this performance evaluation.  The protocol and design of the monitoring
                                   32

-------
network were then negotiated with PREQB and the industrial interests.  A




final agreed-upon protocol was issued in December 1984^.







          2.5.2  Preliminary Analysis




                 The protocol document contains a definition of the regulatory




aspects of the problem and a description of the sources and their surroundings.




The document states that only the short-term SO2 concentration estimates are




at issue and that the PREQB proposes to use PRAQM in lieu of Complex I to




estimate these concentrations, pending the outcome of a comparative performance




evaluation.  A technical description of PRAQM is contained in the protocol.




Apparently no user's manual for PRAQM exists.  No formal technical comparison




of PRAQM and Complex I, following the procedures outlined in the Workbook




for Comparison of Air Quality Models, was performed.




                 Some preliminary concentration estimates have been made with




the PRAQM, Complex I and also the SHORTZ model.  The details of these




estimates are not provided in the protocol document; however, all parties




are privy to the results.   The results indicate the following:




                 1.  Maximum concentration estimates occur on elevated terrain




to the north and northwest of the plants; however, Complex I, PRAQM and




SHORTZ all produce different results in terms of magnitude, specific location,




and time of the maximum concentrations.




                 2.  Maximum 3-hour and 24-hour concentrations frequently




occur both at the monitored locations and in areas that are not monitored.




                 3.  In terms of magnitude, SHORTZ seems to yield the highest




concentrations, significantly higher than either Complex I or PRAQM.  The




PRAQM yields the lowest concentration estimates.
                                   33

-------
                 4.  The meteorological data indicate a predominance of




neutral/unstable conditions associated with the daytime southeast winds.




Such conditions generally carry the plumes over the terrain to the northwest




of the sources.  However, there are occasional hours, during periods of wind




shifts, when stable plumes traveling over terrain could have a significant




short-term air quality impact.




                 Based on these results,  it has been decided that Complex I is




the appropriate reference model and PRAQM the proposed model for the performance




evaluation; SHORTZ has been dropped from further consideration.   It has also




been established that while the existing monitoring network is acceptable




for a preliminary performance evaluation, some data from a more detailed




network will be necessary to confirm/refute the results of this evaluation.




The specifics on how to use the existing network data as well as the design




and use of the augmented network and data are discussed below.







          2.5.3  Protocol for the Performance Evaluation




                 The protocol for the comparative performance evaluation of




PRAQM and Complex I, which is detailed in Appendix E, specifies that the




performance evaluation will be divided into two phases.  Phase I is an evalua-




tion for the period January 1983 through December 1984 using monitored data




collected at the four existing monitoring sites.  Using the selection criteria




contained in the protocol a model of choice will be selected in this phase.




Phase II of the evaluation is designed to confirm the conclusions reached




as a result of the Phase I evaluation.  Phase II will be based on six months




of air quality data from all eight sites (beginning around September 1984).




The specifics of the protocol for each phase are identical, except for a




minor stipulation involving the weighting of performance statistics by




monitor.




                                     34

-------
                 The protocol emphasizes accurate prediction of the peak




concentration.  Thirty-two (32) percent of the weighting in the protocol




involves the calculation of performance statistics that characterize each




model's ability to reproduce the measured maximum and second-high concentra-




tions at the various monitors.  Sixty-eight (68) percent of the weighting




is assigned to performance statistics that characterize the models'  ability




to reproduce measured concentration in the upper end of the observed frequency




distribution, namely the high-25 observed and predicted concentrations.




                 The primary performance measures are the ratio of the predicted




to the observed concentration (average predicted to average observed for the




high-25 data set) and the ratio of the variance of predicted concentrations




to variance of observed concentrations.  Seventy-seven (77) percent  of the




weighting is on data paired in space but not in time and twenty-three (23)




percent on unpaired data.  Most performance statistics are to be calculated




for 1-, 3-, and 24-hour averaging times.  The ratio measures are supplemented




by case study statistics, based on the number of cases in common between




predicted and observed concentrations (stratified by stability class, for




the upper five percent of the 1-hour values).
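                 As a concrete illustration, the two primary ratio measures
for the high-25 data set could be computed as below.  This is a minimal Python
sketch assuming simple lists of the 25 highest observed and predicted
concentrations; it is not the protocol's own code.

    from statistics import mean, pvariance

    def ratio_measures(pred_high25, obs_high25):
        # Ratio of average predicted to average observed concentration,
        # and ratio of predicted to observed variance.
        bias_ratio = mean(pred_high25) / mean(obs_high25)
        variance_ratio = pvariance(pred_high25) / pvariance(obs_high25)
        return bias_ratio, variance_ratio

A ratio near one on both measures indicates a model that reproduces both the
level and the spread of the upper end of the frequency distribution.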




                 The Guayanilla protocol specifies that certain performance




measures are weighted according to the magnitude of the observed concentra-




tions.  The performance statistics for the monitor with a higher observed
concentration are given proportionally more weight than those of the next




lower ranked monitor.  The monitor with the lowest reading receives  the




least weight.




                 The scoring scheme used for most performance statistics




consists of a percentage of maximum possible points within specified cutoff




values.   If the performance statistic falls outside of the cutoff  values,
                                   35

-------
no points would be awarded to the model.   Within the acceptable range,  the




percent of possible points is specified in tabular form (discrete values for




specified ranges of performance).  The tabular values for the bias statistics




favor the model that overpredicts, if one model overpredicts  to the same




extent that the other model underpredicts.
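                 The flavor of such a tabular scheme is sketched below for a
bias ratio (predicted over observed).  The cutoffs, ranges and percentages
here are invented for illustration only; the actual tables appear in Appendix
E.  Note how the brackets can be skewed so that a given degree of over-
prediction earns more points than the same degree of underprediction.

    def percent_of_points(bias_ratio):
        # Hypothetical lookup: percent of maximum possible points awarded
        # for a bias ratio Cp/Co; values outside the cutoffs score zero.
        table = [
            (0.50, 0.70,  25),   # strong underprediction
            (0.70, 0.90,  60),
            (0.90, 1.10, 100),   # near-unbiased
            (1.10, 1.50,  80),   # overprediction penalized less severely
            (1.50, 2.00,  40),
        ]
        for low, high, percent in table:
            if low <= bias_ratio < high:
                return percent
        return 0  # outside the cutoff values: no points awarded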




                 Scores for each model for Phase I and Phase  II are determined




by totalling the subscores for each performance statistic.   For each Phase,




the PRAQM is deemed to be the better performer if its score exceeds the score




obtained for Complex I by 10 percent.  If Phase II leads  to a selection of




the same model as Phase I, this will be the model for future  regulatory use




in Guayanilla.  If Phase II leads to a selection of a different model,  air




quality data will be collected for an additional six month  period at the




eight monitoring sites.




                 If for both Phases I and II the PRAQM model  has a point score




at least 10 percent higher than Complex I, it will be considered the preferred




model for use in the Guayanilla Basin.




                 Concentration estimates  from the model with  the highest score




are to be adjusted upward if the highest  observed concentrations are signifi-




cantly underpredicted.  The procedure, which is based on the  unpaired in




time and space comparisons, is as follows:




                 (1)  If the average of the highest ten predicted 3- and




24-hour average concentrations is less than the average of  the highest  ten




observed 3- and 24-hour average concentrations, or




                 (2)  If the highest, second-highest predicted 3- or 24-hour




average concentration is less than ninety (90) percent of the highest,  second-




highest observed 3- or 24-hour average concentration,
                                   36

-------
then the model predictions will be linearly adjusted to correct for this




regulatory problem.  The adjustment factors will be calculated as the




minimum needed to eliminate the two conditions of underprediction listed




above.  If Phase II of the evaluation confirms the selection of the model




determined by Phase I, but there is a difference in terms of whether an




adjustment is warranted or different adjustments are indicated, the adjust-




ment that is most conservative (leads to the most stringent emission limit)




will be selected.






          2.5.4  Data Bases for the Performance Evaluation




                 The data base for the Phase I performance evaluation consists




of two years of data from an existing 4-station monitoring network and an




on-site meteorological tower.  The data base for Phase II consists of six




months of data from an 8-station network including the original four monitors




plus four additional monitors situated to better cover the area of predicted




maximum concentration and to fit the requirements of the protocol.  The




locations of the monitors are indicated in Figure 2-5.  Data from the same




meteorological tower are used in Phase II.




                 Sensors are mounted on a meteorological tower, located




near the PREPA plant, to collect wind speed, wind direction and temperature




data at 10 and 76 meters.  Wind data from 76 meters will be scaled to plume




height with the 10-meter data used as backup.  Wind fluctuation (sigma




theta) data collected at 10 meters will be used to determine Pasquill-Gifford




stability class for both models according to the scheme described in the




Regional Workshops on Air Quality Modeling:  A Summary Report.




Periods of missing data will be eliminated from the performance evaluation.




Climatological average daily maximum and minimum mixing heights will be
                                   37

-------
used.  Hourly emission data and stack gas parameters are to be generated from




load levels, fuel consumption rates, fuel sampling and other surrogate




parameters that are technically defensible.
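                 For readers unfamiliar with the sigma-theta approach, the
general shape of such a classification is sketched below in Python.  The
breakpoints shown are the daytime criteria commonly associated with the
Regional Workshops guidance, but they are an assumption here; the protocol's
own scheme governs.

    def pasquill_gifford_class(sigma_theta_deg):
        # Map a 10-meter sigma-theta value (degrees) to a Pasquill-Gifford
        # stability class; breakpoints assumed, not quoted from the protocol.
        breakpoints = [
            (22.5, "A"),   # extremely unstable
            (17.5, "B"),
            (12.5, "C"),
            (7.5,  "D"),   # neutral
            (3.75, "E"),
        ]
        for threshold, category in breakpoints:
            if sigma_theta_deg >= threshold:
                return category
        return "F"         # stable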




                 At the present time data collection from Phase II is still




underway and no results from either the Phase I or Phase II performance




evaluation are available.







     2.6  Other Protocols




          In addition to the five major performance evaluation analyses and




protocols discussed above,  EPA is aware of three other analyses/protocols




written to assess the acceptability of proposed models for specific sources.




For one reason or another these efforts never reached fruition, i.e., no
decisions were made, or are intended to be made, on emission limits based on




the chosen model.  Brief descriptions of these three efforts are provided




below.







          2.6.1  Example Problem




                 One such  effort is the example problem which illustrates




the use of the Interim Procedures for Evaluating Air Quality Models (Revised)




and is included as Appendix B to that document.  This narrative example was




based on 1976 emissions data from the Clifty Creek power plant in Indiana




and 1976 S02 ambient data  from a 7-station network in the vicinity of the




plant.




                 The narrative example was specifically designed to illustrate




in a very general way the  components of the decision making process and the




protocol for performance evaluation.  As such, the preliminary technical/




regulatory analysis of the  intended model application, while included in




the example, was significantly foreshortened from that which would normally




be needed for an actual case.  Also, since the evaluation was carried out




                                    38

-------
on an existing data base, the example did not illustrate the design of the




field measurement program required to obtain model evaluation data.




                 The example problem protocol incorporated a broad spectrum




of performance statistics with associated weights.  The number of statistics




contained in the example was overly broad for most performance evaluations




and perhaps even for the problem illustrated.  Thus its use was not intended




to be a "model" for actual situations.  For an individual performance evalua-




tion it was recommended that a subset of statistics be used, tailored to the




performance evaluation objectives of the problem.  Similarly, the method




used to assign scores to each performance statistic (non-overlapping confi-




dence intervals) was not intended to be a rigid "model" but only an illustration




of one of several possible techniques to accomplish the goal.






          2.6.2  Gibson Power Plant




                 In May 1981, Public Service Company of Indiana (PSI)




submitted to the Indiana Air Pollution Control Division (IAPCD) a report^




which outlined proposed procedures for conducting a comparative performance




evaluation of models applicable to setting the SO2 emissions limit for the




Gibson power plant.  PSI proposed to establish a monitoring network (actually




augment an existing network), the data from which would be used to establish




whether either of two versions of the MPSDM model would be more appropriate




to apply to the plant than EPA's CRSTER model.  The report contained an in-




complete performance evaluation protocol that would be used in the evaluation.




                 Following submittal of this report, a series of negotiations




on the protocol and the monitoring network took place between PSI and IAPCD.




Some of these negotiations involved EPA.  In July 1981, IAPCD accepted the




PSI plan, but EPA continued to express major concerns about the technical




aspects of the proposed models, the monitoring network and the



                                   39

-------
protocol.  These concerns were not resolved and in June and August 1982




EPA sent letters to PSI^ cautioning them that the Agency could not accept




the results of the performance evaluation, if the company chose to proceed.




                 Apparently PSI proceeded with the evaluation and collected




the one year of data from the network.  The outcome of the evaluation is




unknown at the present time.







          2.6.3  Homer City Area




                 In November 1982, the Pennsylvania Electric Company (Penelec)




submitted to the State of Pennsylvania Department of Environmental Resources




(DER) a report, "Protocol for the Comparative Performance Evaluation of the




LAPPES and Complex I Dispersion Models Using the Penelec Data Set."19  The




company's intent was to execute the protocol and demonstrate the acceptabil-




ity of the LAPPES model in the Homer City, Pennsylvania area so that this




model could be used to revise SO2 regulations for four area power plants.




The plants, which have varying stack heights, are located in moderately




complex terrain with receptors of concern located both above and below the




heights of the stacks.




                 The protocol was reviewed by DER and by EPA and a number of




comments/suggestions were provided to Penelec.  The most significant comment




involved the choice of Complex I as the only reference model.  An examination




of the topography in relationship to stack heights in the area revealed




that many of the monitors (and most of the terrain) were below most of the




physical stack heights.  In fact, when expected plume rise was considered,




only the Seward plant, because of its relatively short stack, exhibited a




real risk of direct, stable plume impaction on terrain; the Conemaugh plant





was somewhat marginal in this regard.  From an overall performance evaluation




standpoint, this resulted in a dilemma.  Some of the monitors were considered




                                   40

-------
"flat terrain" receptors for which CRSTER was the appropriate reference




model while some were complex terrain sites where Complex I might be appro-




priate.  An added complexity was that, because of varying stack heights,




some monitors might be both flat terrain and complex terrain receptors




depending on which power plant was being modeled.  Thus, Complex I was not




the appropriate model for all monitors, as proposed in the protocol, and it




will likely underestimate concentrations at receptors that are below stack




height.




                 Although the protocol has been executed,20 the issue regarding




the choice of an appropriate reference model(s) has apparently never been




resolved.
                                   41

-------
42

-------
3.0  INTERCOMPARISON OF APPLICATIONS




     In this section, the details of the five major applications of the




Interim Procedures are intercompared.  Each subsection below corresponds




roughly, and in the same order, to Sections 2, 3 and 4 (and their sub-




sections) of the Interim Procedures for Evaluating Air Quality Models




(Revised).  It is also possible to identify the subsections below with




sequential blocks in the flow chart for the Interim Procedures provided




in Figure 1-1 above.  In this way it is possible to analyze the five




applications according to subject matter as it appears in the Interim




Procedures.




     In the subsections below the common and differing features among the




five major applications are described. Where appropriate, these features




are compared to recommendations/suggestions contained in the corresponding




section/subsection of the Interim Procedures and similarities/differences




are noted.




     The material contained in this section is intended to be factual,




i.e. additional interpretation or opinion is generally avoided.  Any inter-




pretations and/or opinions that are provided are only intended to reflect the




views, or apparent views, contained in the individual protocols and related




documents.






     3.1  Preliminary Analysis




          The Interim Procedures recommend that before any performance




evaluation protocol is written or any performance data are identified/




collected, the applicant should conduct a thorough preliminary analysis




of the situation.  This analysis serves to describe the source and its




environment, the regulatory constraints, the proposed and reference
                                   43

-------
models, the relative technical superiority of the proposed model, and the
ambient consequences of applying it to the regulatory problems.

          In each of the five applications such a preliminary analysis was
conducted, although the level of detail and the adherence to the recommended
procedures varied considerably, as discussed in Sections 3.1.1 - 3.1.4 below.






          3.1.1  Regulatory Aspects




                 The Interim Procedures suggest that the applicant first
identify the regulatory aspects of the problem, i.e. the pollutants,
averaging times and applicable regulations.

                 In each of the five applications this portion of the
analysis was quite thoroughly covered.  In all cases the sources were SO2
emitters and the NAAQS were identified as the controlling regulations,
i.e. PSD increments or other regulations were not at issue, nor was
compliance with the annual standard.  The exception is the Baldwin
power plant where it was established that the proposed model, if determined
to be acceptable, would only be used to evaluate the 3-hour standard.






          3.1.2  Source Characteristics




                 The Interim Procedures recommend that the preliminary
analysis be accompanied by a complete description of the source and its
environment.

                 Table 3-1 compares the source characteristics for the five
protocols.  The Table shows that power plants are the sources of
-------
[Table 3-1.  Source Characteristics for the Five Evaluations]
-------
emissions with the exception of Westvaco.  Most evaluations involve,




effectively, a single tall stack.  The Guayanilla evaluation is the only




evaluation involving a true multiple stack situation.  In all cases except




Baldwin, complex terrain is a major consideration with nearby terrain well




above stack height(s).  All of the sources are in a rural environment and




most are isolated from any neighbors, i.e. the contribution from nearby




sources is not considered to be significant.  The exceptions are Warren,




where a nearby plant is to be explicitly modeled, and Lovett, where the con-




tribution from the nearby Bowline power plant will be determined from




monitoring data.






          3.1.3  Proposed and Reference Models




                 The Interim Procedures state that for each evaluation it




is highly desirable to choose a proposed and a reference model applicable




to the situation.  (For cases where no reference model can be identified,




the Interim Procedures suggest an alternative approach that can be used to




determine acceptability of the proposed model.)  It is further recommended




that each model be well documented, by a user's manual if possible.  The




technical features of the competing models should be intercompared, preferably




using techniques described in the Workbook for Comparison of Air Quality Models,




                 Table 3-2 lists the proposed and reference models for each




of the five evaluations and the degree to which these models are documented




and intercompared.  The Table shows that each evaluation involves a different




proposed model.  In- the case of Lovett there are two proposed models.  The




Complex I model is most often used as the reference model.  All of the




preliminary analyses contain technical descriptions of the models to be




evaluated as well as technical/descriptive comparisons of the relevant

-------
[Table 3-2.  Proposed and Reference Models for the Five Evaluations,
indicating whether each model is documented by a user's manual and whether
the models' technical features were formally intercompared.]

                                                                       47

-------
features of the models.  In only one case was the "Workbook" comparison




rigorously applied.  Explicit, up-to-date user's manuals were most often




not available.  In some cases such manuals did exist but were not up-to-date




with the version of the models to be used in the evaluation.






          3.1.4  Preliminary Concentration Estimates




                 The Interim Procedures suggest that preliminary concentration




estimates be obtained from both the proposed and the reference models, as




an aid to writing the protocol and designing the requisite data bases.




                 In the three most recent protocols (Warren, Lovett and




Guayanilla) such estimates were made, although they are not well documented




for Guayanilla.  In the Baldwin and Westvaco evaluations, it is not evident




that any formal estimates were made although both the source and the control




agencies had a good idea of the consequences (location and magnitude of




high estimates) of applying the models.






     3.2  Protocol for the Performance Evaluation




          The Interim Procedures require that a protocol be prepared for




comparing the performance of the reference model and proposed model.  The




protocol must be agreed upon by the applicant and the appropriate regulatory




agencies prior to collection of the requisite data bases.




          In each of the five cases such a protocol was written, negotiated




with the control agencies, and a final protocol to be used in the evaluation




was established.  The relative details of the various protocols are compared




in the following subsections, 3.2.1 - 3.2.5.  The degree to which the




negotiating parties were in full agreement that the final established




protocol was optimum is discussed in Section 3.4.

-------
          3.2.1  Performance Evaluation Objectives




                 The Interim Procedures suggest that the first step to




developing a model performance protocol is to translate the regulatory pur-




poses associated with the intended model application into performance evalua-




tion objectives which, in turn can be linked to specific data sets and




performance measures.  Ranked-order performance objectives are suggested with




the primary objective focussing on what is perceived to be (from the preliminary




analysis) the critical source-receptor relationship, i.e. the averaging time,




the receptor locations, the set(s) of meteorological conditions and the source




configuration that are most likely associated with the design concentration.




Lower-order objectives, e.g. second-order, third-order, etc., would focus on




other source-receptor relationships which must be addressed when the chosen




model is ultimately applied to the situation, but are not perceived to be of




prime importance (not as likely to be associated with a design concentration)




when the chosen model is applied.




                 In the five protocols, specific sets of ranked-order




objectives were not stated, at least in the sense described above.  However,




it is apparent from the choices of data sets and performance measures, the




weighting of data sets/performance measures, the sometimes-used differential




weighting of individual monitor data, and the scoring schemes employed in




the protocols, that the writers had such ranked-order objectives implicitly




in mind.  Most of the protocols explicitly stated a single broad objective




which focuses on an accurate prediction of peak short-term concentrations.




These statements were generally not narrowed down to include specific recep-




tor locations, the importance of time pairing or critical meteorological




conditions.  However, as mentioned above, it is evident from the protocols'




contents that these single broad performance objective statements really did




implicitly contain sets of ranked-order specific objectives.



                                   49

-------
          3.2.2  Data Sets, Averaging Times and Pairing

                 The Interim Procedures mention a number of possible data

sets which can be considered but make no specific recommendation as to the

choice of data sets for an individual situation.

                 Table 3-3 compares the data sets contained in each of the

five major protocols and the weighting (percent of maximum possible points)

of each data set.  The protocols are arranged roughly chronologically

across the top of the table, in the order in time when each was finalized,

to see if there are any trends in the choices of data sets or weighting.

No obvious pattern is apparent.  It is clear from Table 3-3 that each of

the protocols focuses on the common broad performance evaluation objective

of accurate prediction of peak short-term concentration.  However, it is

obvious from the choice and weighting of data sets that the protocol

writers had different ideas on how to best meet that objective.  Three of

the five protocols examined the highest observed/predicted concentration

data set as well as the second-highest data set.  All of the protocols tested

the competing models against the second-highs and the high-25 set, although

there were considerable differences in the weighting among the protocols.

Two protocols specify that some performance statistics will be calculated

for all data but the weighting of this data set is lower than the peak/high-25

data sets.

               The Interim Procedures suggest that performance of models whose

basic averaging time is shorter than the regulatory averaging time should

be evaluated for that shorter period as well as averaging times corresponding

to the regulatory constraints.*  Since all five cases involved SO2 models
*Most models compute sequential concentrations at each receptor over a short
 time average, e.g., 1-hour.  Average concentrations for longer periods, e.g.,
 3-hour, 24-hour, are arrived at by summing the sequential short-term averages.

-------
Table 3-3  Weighting (%) of Maximum Possible Points by Data Set, Averaging
Time and Degree of Pairing

                    BALDWIN   WESTVACO   WARREN   LOVETT   GUAYANILLA
DATA SET
  MAXIMUM               0         21        15        0         12
  SECOND HIGH          55         22        28       20         15
  HIGH-25              45         57        43       58         74
  ALL DATA              0          0        14       22          0
AVERAGING TIME
  1-HOUR                0         19        20       36         34
  3-HOUR              100         37        30       36         27
  24-HOUR               0         37        36       26         39
  ANNUAL                0          7        14        2          0
PAIRING
  UNPAIRED             70         32        60       44         23
  SPACE ONLY            5         62        40       34         77
  TIME ONLY             0          3         0       21          0
  SPACE AND TIME       25          3         0        1          0

                                     51

-------
whose basic averaging time is one hour, this would suggest that 1-hour




statistics be calculated as well as 3-hour, 24-hour, etc.  Table 3-3 shows




that all of the protocols except Baldwin specify that performance statistics




should be calculated for 1-, 3- and 24-hour averaging times.   For Baldwin




it was established up-front that the proposed model, if selected, would only




be applied for the 3-hour averaging time.  This may be the reason why statis-




tics are not to be calculated for other averaging times, including 1-hour.




Computation of the annual concentration is not a significant  issue in any




of the cases.  This is apparently the reason that low or no weight is given




to statistics for that averaging time.
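               The preceding footnote's point about averaging can be made
concrete with a small sketch: longer averaging periods are formed as block
averages of the sequential 1-hour values.  The Python fragment below is
illustrative only and is not drawn from any of the protocols.

    def block_averages(hourly, period):
        # Non-overlapping block averages (e.g., period = 3 or 24) computed
        # from a sequence of 1-hour concentrations.
        averages = []
        for start in range(0, len(hourly) - period + 1, period):
            block = hourly[start:start + period]
            averages.append(sum(block) / period)
        return averages

    # Example: 24 hourly values yield eight 3-hour averages and one
    # 24-hour average.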




               Weighting may also be distributed according to performance




statistics calculated for data paired in space, time, both space and time




or completely unpaired.  The Interim Procedures discuss the various possible




degrees of pairing associated with each data set but make no specific recom-




mendation as to which to choose or the weighting distribution.  Instead, the




Interim Procedures suggest that through the development of performance evalua-




tion objectives, pairing can be identified.




               Table 3-3 also shows the weighting of maximum  possible points




according to the degree of pairing specified in each of the five protocols.




Since detailed performance evaluation objectives are generally lacking for




these protocols, it is difficult to establish a rationale for the seemingly




significant variation of weighting among the protocols.  In each evaluation




a relatively isolated point source of SO2 controlled the short-term ambient
SO2 levels in its vicinity.  Thus it is not very important that the models




predict the concentration in time and space accurately; only  the magnitude




is of importance.  This suggests that completely unpaired performance statis-




tics would be of prime importance.  Table 3-3 shows that unpaired statistics
                                   52

-------
were important in all five protocols but the weighting and degree of importance

vary significantly.  In fact, in the Westvaco and Guayanilla protocols, data

paired in space only seem to be regarded as the most pertinent.  Although

specific rationales are generally lacking, it appears that the protocol writers

were concerned with model credibility.  Credibility in model performance can

be linked to the ability of the models to reproduce measured concentrations

at the right place, right time and perhaps both.  This explains (perhaps)

the varying degree of pairing.
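               To illustrate what the pairing choices mean operationally, the
sketch below assembles the same predictions and observations three ways.  It
is a hypothetical Python fragment; each protocol defines its own pairings.

    def pairings(pred, obs):
        # pred and obs map a monitor id to a list of hourly concentrations.
        # Unpaired: ranked predictions vs. ranked observations, pooled over
        # all monitors and hours.
        unpaired = (
            sorted((v for s in pred.values() for v in s), reverse=True),
            sorted((v for s in obs.values() for v in s), reverse=True),
        )
        # Paired in space only: ranked predictions vs. ranked observations,
        # monitor by monitor.
        by_space = {
            m: (sorted(pred[m], reverse=True), sorted(obs[m], reverse=True))
            for m in pred
        }
        # Paired in space and time: matched monitor by monitor and hour by
        # hour.
        by_space_time = {m: list(zip(pred[m], obs[m])) for m in pred}
        return unpaired, by_space, by_space_time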

         3.2.3   Performance Measures

                 The Interim Procedures state that the basic tools used

in determining how well a model performs in a given situation are performance

measures.  These performance measures are viewed as surrogate quantities

whose values/statistics serve to characterize the discrepancy between

predictions and observations.

                 Table 3-4 lists, by data set, the various performance

measures used in the protocols for characterizing performance for that data

set.  From an overall perspective the Table seems to indicate that,  while

there are some similarities, there is also a wide variety of combinations

of performance measures used among the protocols.  Each protocol seems to

contain a more or less unique combination of measures used to characterize

performance and this combination often differs from those suggested in the

Interim Procedures.  Some of the protocols contain certain performance

measures not mentioned in the Interim Procedures.  For example, three of

the protocols contain a performance measure, Mc*, designed to test the models'
*MC is not a unique performance measure but refers to schemes for quantifying
 this type of performance which differ among the various protocols.   See
 Appendices B, D and E for details.
                                   53

-------
Table 3-4  Performance Measures Used in the Protocols

DATA SET      BALDWIN      WESTVACO  WARREN           LOVETT             GUAYANILLA
PEAK VALUES   d            d         d                Cp/Co, Co/Cp       Cp/Co
(HIGH &
SECOND HIGH)
HIGH-25       d, Sd,       …         d, |d|/Co, R,    Mc, Cp/Co, Co/Cp   Cp/Co,
              RMSEd, Mc              Sd/Co                               S²p/S²o, Mc
ALL DATA      F                      By stability:    By stability:
                                     S²p/S²o, Cp/Co,  Cp/Co, Co/Cp,
                                     d, |d|/Co, R     S²p/S²o, S²o/S²p

      d = Residual
      Sd = Standard deviation of residual
      S²p = Variance of predicted concentration
      S²o = Variance of observed concentration
      F = Frequency distribution
      R = Correlation coefficient
      Mc = Meteorological cases in common
      RMSEd = Root-mean-square-error of the residual

                                            54

-------
capability to reproduce observed concentrations during observed meteorological




conditions.   Note also that certain performance measures such as the correla-




tion coefficient, the variance of the residuals and statistics on the frequency




distribution were not widely used in the protocols.  Where they were used,




they were not weighted heavily.




                 One specific point revealed by Table 3-4 concerns the use




of performance measures that characterize the model bias.  The Interim




Procedures suggest that model bias is an important quantity in performance




evaluations and that the model residual is an appropriate measure to charac-




terize the bias.  In the earlier protocols, Baldwin and Westvaco, the




model residual was used exclusively in this regard.  However, in time, as




indicated by the more recent protocols on the right side of Table 3-4,




the residual is used less frequently or not at all.  Instead, various




combinations of the ratio of the predicted to observed concentration are




used to characterize the bias.  No clues/rationale are contained in these




recent protocols that suggest a reason for using ratios instead of




residuals.  At the time that these protocols were negotiated with the




control agency, no significant objections were apparently raised over the




use of ratios in lieu of residuals.
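                 The distinction between the two styles of bias characteri-
zation is simple to state in code.  The following Python fragment, a
hypothetical sketch rather than any protocol's definition, computes both a
mean residual and a bias ratio for a set of paired values.

    from statistics import mean

    def bias_measures(pred, obs):
        # Mean residual (here taken as observed minus predicted), used in
        # the earlier protocols; zero indicates no bias.
        residuals = [o - p for o, p in zip(obs, pred)]
        mean_residual = mean(residuals)
        # Ratio of means (Cp/Co), used in the more recent protocols; one
        # indicates no bias.
        bias_ratio = mean(pred) / mean(obs)
        return mean_residual, bias_ratio

One practical difference is that the residual carries concentration units
while the ratio is dimensionless, which may partly account for the later
preference for ratios, though, as noted above, the protocols themselves
offer no rationale.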






          3.2.4  Model Performance Scoring




                 One of the more difficult aspects of writing a performance




evaluation protocol is devising a scheme which, for each performance measure




or other surrogate measure, objectively quantifies the degree to which the




model reproduces measured concentrations.  The specification of the details




of this concept, which is called "scoring" in the Interim Procedures, lacks




a clear technical basis or a basis in past experience.  The Interim Procedures
                                   55

-------
recognize this lack of guidance and invite the use of innovative schemes,




although the use of confidence intervals is mentioned as  one  such possible




scheme.




                 The lack of guidance in this area is well reflected  by  the




wide variety of scoring schemes that are specified in the various protocols.




In fact, each protocol generally contains several different schemes in




itself.  No attempt is made here to intercompare the details  of  the various




scoring schemes employed; the reader is referred to Appendices A-E and the




specific protocols in the Section 5.0 References for these details.




                 In general, most of the schemes ultimately generate what




might be termed a performance factor.  The performance factor is either  a




measure or an indicator of how well the model performs in relation to




measured data.  The method (usually formulae) used to arrive  at  the perfor-




mance factor depends on the specific measure of performance (residual, ratio,




correlation coefficient, etc.) and varies widely among the protocols.   The




performance factor, once obtained, is either multiplied by the maximum




possible points to obtain a subscore for that performance measure, or  a




table is entered which provides point subscores for specific ranges of the




performance factor.  The tables most often have "cutoff values"  above  or




below which a zero subscore is specified.  For measures of the bias, the




table is sometimes skewed in favor of overprediction, i.e. a given amount




of overprediction is awarded more points than the same degree of underpre-




diction.




                 Once the various subscores are obtained, they are, in each




protocol, summed to obtain a total score for each model.   In some cases,




the performance factor mentioned above involves performance statistics from




both models.  Thus in these cases the scores obtained for each model are

-------
not truly independent indicators of how well the model performs relative to




measured data but contain some elements of relative performance between




models.  In any event, the scores for each model are then compared to obtain




a preliminary indication of which model is the better performer.




                 At this point the Interim Procedures suggest that, among




other things, it might be desirable to define a "window" of marginal perfor-




mance.  If the apparently better performer falls in the window then the




results of the technical evaluation could be used to arrive at a final




decision.  In only one of the protocols, Guayanilla, is the window concept




used and in that case it is merely stated that the proposed model, if it




receives a higher score, will not be chosen unless that score exceeds that




of the reference model by ten percent.




                 The Interim Procedures also suggest that it might be




undesirable to apply the chosen model should it be shown to underpredict




critical high concentrations.  In this case it is suggested that the chosen




model be "corrected" or "adjusted" to the degree which it apparently under-




predicts.  In the three most recent protocols (Warren, Lovett, and Guayanilla)




this concept is employed, although the details on the criteria for and the




method of correcting the model estimates vary.






     3.3  Data Bases for Performance Evaluations




          The Interim Procedures suggest that three types of data bases can




be used for performance evaluation purposes, data from an on-site specially




designed network, data from an on-site tracer experiment and, rarely, data




from an off-site network.  The five performance evaluations utilize data




from an on-site network of SO2 monitors and other instruments.  Table 3-5




shows that three of those networks were specially designed for the performance
                                     57

-------
Table 3-5  Data Bases For Performance Evaluations

               BALDWIN        WESTVACO     WARREN          LOVETT      GUAYANILLA
TYPE OF        SPECIAL        EXISTING     SPECIAL         SPECIAL     PHASE I - EXISTING
NETWORK        DESIGN                      DESIGN          DESIGN      PHASE II - SPECIAL
                                                                       DESIGN
NO. OF         10             9            7               11          PHASE I - 4
MONITORS                                                               PHASE II - 8
LENGTH OF      1 YEAR         1 YEAR       1 YEAR          1 YEAR      PHASE I - 2 YEARS
DATA RECORD                                                            PHASE II - 6 MONTHS
NO. OF ON-     1              2            2               5           1
SITE MET.
TOWERS*
ON-SITE MET.   WD,WS,WF,T     WD,WS,WF,T   WD,WS,WF,T      WD,WS,WF,T  WD,WS,WF,T
DATA
OFF-SITE       NWS: MXHT &    NONE         NWS: MXHT &     NWS: MXHT   CLIMATOLOGICAL
MET. DATA      STABILITY                   MISSING DATA                MXHT
                                           PERIODS
EMISSIONS      LOAD LEVEL/    IN-STACK     LOAD LEVEL/     IN-STACK    LOAD LEVEL/
DATA           FUEL SAMPLES   DATA         FUEL SAMPLES    DATA        FUEL SAMPLES

    WD = Wind Direction
    WS = Wind Speed
    WF = Wind Fluctuation or Turbulence Intensity
    T = Temperature
    NWS = National Weather Service
    MXHT = Mixing Height

    * Data from only one primary tower are used, except for data substitutions
      when the primary data source is not operating

-------
evaluation.  The Westvaco protocol utilizes data from a network that was




originally designed to monitor compliance of the source with the NAAQS.  As




pointed out in Section 2.2, this network was judged to be acceptable for




performance evaluation purposes.  In the Guayanilla protocol, the existing




network of four monitors was judged to be only marginally adequate for




performance evaluation purposes.  Thus there will be a Phase II performance




evaluation, where six months of data from an augmented, specially designed




network are to be utilized.




          The Interim Procedures recognize that the number and spatial coverage
of monitors involve a tradeoff between the scientific desire for wide coverage




with a dense array and the practical constraints of cost and logistics.  In




any event the requisite network must have sufficient spatial coverage/density




to address important source-receptor relationships identified in the preliminary




analysis and to meet the needs of the protocol.  Table 3-5 shows that the




networks to be utilized for performance evaluations contain about the same




number of ambient monitors, i.e. ranging from 7 to 11.  Further investigation




of these networks reveals that in each of them nearly all of the monitors




are fairly densely clustered in the area of expected maximum concentration




with one or two monitors, generally to be used for assessing background,




located well outside of this area.




          The Interim Procedures suggest that a 1-year data collection period




is normally the minimum in order to calculate performance statistics that




are related to the NAAQS, i.e. the high second-high concentration.  Table




3-5 shows that lengths of record ranging from one to 2-1/2 years will be used




for performance evaluation purposes.




          In all of the performance evaluations the primary source of meteoro-




logical data is from an on-site network.  Although some of the networks con-
                                     59

-------
tain multiple towers (See Table 3-5), none of the models to be considered




in the evaluations is capable of utilizing spatially divergent meteorolo-




gical data inputs.  Thus, meteorological data inputs to the models are




pre-specified to be from a single tower, with other stations used as backup




for missing data periods.  In most cases, on-site wind fluctuation data




(sigma-theta) are to be used either as direct input to the models or as a




means to categorize stability.  Mixing heights are usually derived from




off-site National Weather Service temperature sounding data.  On-site




temperature data are sometimes used to interpolate hourly values of the




mixing height.




          The Interim Procedures recommend that in-stack instrumentation is




the preferable data source to be used in deriving hourly emissions and




values of stack gas parameters.  Table 3-5 shows that such in-stack instru-




mentation is or will be in place at Westvaco and Lovett..  The other three




performance evaluations derive emissions and stack data from surrogate




measures such as fuel analyses and documented load level information.






     3.4  Negotiation of the Procedures to be Followed




          The Interim Procedures strongly recommend that the applicant (source)
maintain close liaison with the reviewing agency at the beginning of and
throughout the project.  It is especially important that the protocol and
design of the monitoring network be negotiated and agreed upon before any data
are in hand.  In each of the five cases, such negotiations took place.  These
negotiations generally occurred at two points in time: (1) in advance of any
work on the project itself, when the need to do a comparative model evaluation
was identified as an acceptable way to resolve differences of opinion on model
acceptability, and (2) after a draft protocol was written and the proposed
network was designed or identified.
          Table 3-6 lists the major components of the model evaluation process
as identified in the Interim Procedures and as discussed in Sections 3.1-3.3
above.  For each of the five cases the Table indicates whether that component
was a significant or minor issue in the negotiation process.  A significant
issue is defined as one where there was a significant difference of opinion
between the source and the control agency or, in some cases, between control
agencies.  A minor issue is one where the source did not strongly object to
the control agency's request for changes or additions to the analyses,
protocol or data base collection plans.  (A minor issue may have resulted in
a significant amount of additional analysis.)  If no entry is made in the
Table it indicates that there was no issue or that the component was
apparently not discussed in the review process.

          The Table shows that regulatory aspects and the design of the data
base network were significant issues common to all of the projects.  The
resolution of these issues was, in fact, the decision to go ahead with the
model evaluation.  The network design issues generally reflect Agency concerns
that monitors be located in areas of expected maximum concentration.  It is
interesting to note that, in spite of the wide variation in the details of
the protocols, discussed in Section 2.2, there was apparently not much debate
on these details.
Table 3-6  Issues Involved in Negotiations

                                       BALDWIN  WESTVACO  WARREN  LOVETT  GUAYANILLA

PRELIMINARY ANALYSIS
  REGULATORY ASPECTS                      S         S        S       S        S
  SOURCE & SOURCE ENVIRONMENT             -                  -       -
  CHOICE OF PROPOSED MODEL                -         -        -       M        M
  DOCUMENTATION OF PROPOSED MODEL         -         -        M       -
  CHOICE OF REFERENCE MODEL               -         -                -        S
  PRELIMINARY ESTIMATES                             S        M       M        M

PROTOCOL
  PERFORMANCE EVALUATION OBJECTIVES       S         -        -       -        -
  CHOICE OF DATA SETS                     S         -        -       -        -
  CHOICE OF AVERAGING TIME                          -        -       -        -
  DEGREE OF PAIRING                       S         -        -       -        -
  CHOICE OF PERFORMANCE MEASURES          S         -        -       -        -
  WEIGHTING (DISTRIBUTION)                -         -        M       -        -
  WEIGHTING OF MONITORS                   -         S        -       -        -
  SCORING                                           -        -       -        -
  ADDITIONAL CRITERIA(1)                            -        M       M        M

DATA BASES
  NETWORK DESIGN                          S         S        S       S        S
  NUMBER OF MONITORS                      -         -                         S
  CHOICE OF METEOROLOGICAL INPUTS         -         S        M       M        S

     M = Minor difference of opinion
     S = Significant difference of opinion
     - = No difference of opinion stated

     Footnote
     1.  Includes criteria to guard against underprediction of critical
         concentrations.

4.0  FINDINGS AND RECOMMENDATIONS

     The summaries and analyses of several major cases, which utilize
guidance contained in the Interim Procedures for Evaluating Air Quality
Models, lead to the following general findings.  These findings parallel the
basic principles of the Interim Procedures listed in Section 1.2.

     Finding 1.  Up-front negotiations between the applicant and the
regulatory agencies on the nature of the protocol and design/utilization of
the data base network took place in each case.  Up-front discussions on the
preliminary analysis did not always take place.  This lack of early
communication sometimes led to backtracking to fill in needed analyses.

     Recommendation

          In the interest of expediency, the applicant should initiate
frequent discussions with all of the control agencies that will ultimately be
involved in reviewing/approving the evaluation.  Based on experience, it is
especially important that discussion take place before the preliminary
analysis is conducted so that the applicant can provide all the relevant
information required for the case.




     Finding 2.  For each case a detailed protocol for performance evaluation
was written.

     Recommendation

          Establishing an up-front protocol has worked very well as the
central mechanism for decision-making on the appropriate model.  This should
be continued.

     Finding 3.  For each case an on-site data base network was established or
identified as meeting the needs of the protocol and the technical/regulatory
requirements.  In three of the evaluations a network was specially designed
to meet these needs.  In one evaluation the existing network was augmented to
meet these needs.  In one evaluation the existing network was judged to be
adequate without any modification.

     Recommendation

          It is clear from experience that it is highly important to establish
the design of the data base network before any data are collected, or at least
before any data are available to the user.  This practice should be continued.




     Finding 4.  Details of the protocol and network design were well
documented in each case.  However, details of the preliminary analysis and
the negotiation process were not always well documented.

     Recommendation

          It has become increasingly obvious that the preliminary analysis,
especially the preliminary concentration estimates, plays an important role
in the design of the protocol and the data base network.  In the interest of
avoiding misunderstanding, complete documentation of this preliminary
analysis is strongly recommended.

     Finding 5.  For the two cases where the evaluations have been completed,
the decision on the appropriate model was made as prescribed in the protocol.

     Recommendation

          The execution of an established protocol leading to a rationalized
decision on the more appropriate regulatory model is a basic premise of the
Interim Procedures.  This practice should be continued.

     Other more specific findings and recommendations are:

     Finding 6.  Each of the five protocols involved large point sources of
SO2 where attainment of the short-term NAAQS was at issue.  Four of the
sources were located in complex terrain.
     Recommendation

          The use of the Interim Procedures over a broader range of model
evaluation problems is encouraged.

     Finding 7.  In each of the five cases a technical description of the
proposed model was provided.  However, a rigorous technical comparison of the
proposed and reference models, according to procedures outlined in the
Workbook for Comparison of Air Quality Models, was not generally performed.
Also, user's manuals on both the proposed and reference models were generally
not available.

     Recommendation

          The results of rigorous application of the "workbook" procedures
have not been used as decision criteria for any of the cases covered in this
report.  However, it is important that the technical features of the competing
models be compared, and the workbook provides a good framework for making such
comparisons.  Thus its continued use, at least in the latter regard, is
encouraged.

     Either a self-documenting code or a user's manual should be provided for
each model under consideration.  All data bases used in the evaluation should
be provided.




     Finding 8.  Preliminary estimates of expected concentration levels were
made in some cases; these results were not always well documented.

     Recommendation

          Preliminary estimates should be submitted in all future applications
of the Interim Procedures, and the results of these estimates should be
documented in the form of isopleth maps and tables as well as descriptive
material that interprets the results.
     Finding 9.  Detailed performance evaluation objectives were generally
not established before writing the protocols.

     Recommendation

          It is believed that the development and submission of detailed
performance evaluation objectives should lead to logical and perhaps more
uniform choices of performance measures, averaging times, pairing and
weighting in the protocol.  Then the rationale for the choices will be
explicit to the reviewer, and this should facilitate any negotiation.  Thus
the use of detailed performance evaluation objectives is encouraged.

     Finding 10.  A wide variation in the choice of data sets, averaging
times, pairing, performance measures, and weighting is evident among the
protocols.

     Recommendation

          While EPA is not necessarily concerned about these wide variations
at this time, it is important that the rationale for the choices be
documented; the recommendation regarding performance evaluation objectives,
above, is one way to establish this rationale.

     Finding 11.  Similarly, a wide variety in the schemes used for
objectively determining the degree to which the models reproduce the measured
concentrations (scoring) is evident.

     Recommendation

          Same as Item (10) above.

     Finding 12.  More recent protocols contain stipulations for adjusting
estimates from the chosen model, should that model be shown to underestimate
critical concentrations.
     Recommendation

          The use of model "adjustment factors" to take care of model
underestimates was a result of EPA's concerns.  While the "adjustment factor"
approach is acceptable for the time being, the development of more innovative
and more scientifically defensible schemes to address underestimates is
encouraged.

     Finding 13.  The data bases to be used in the performance evaluations
consist of networks of 7 to 11 monitors, primarily clustered in the area of
expected maximum concentration.  Meteorological data from on-site towers are
generally to be used in the evaluations.

     Recommendation

          These limited monitoring networks were acceptable because the areal
and temporal extent of the critical source-receptor relationships in the five
protocols was very limited.  In many cases it may not be possible to
establish a priori these critical source-receptor relationships.  In such
cases more monitors might be required.

          The need for representative meteorological data is critical to the
performance of the models.  To ensure that this need is met, the practice of
collecting on-site meteorological data, commensurate with the model's input
requirements, is encouraged.

     Finding 14.  Emissions data are either derived from in-stack
instrumentation (two cases) or from surrogate measures such as fuel samples,
load levels, etc. (three cases).

     Recommendation

          The use of surrogate data such as fuel sampling, load levels, etc.
leads to considerable uncertainty in emissions, especially when coal-fired
boilers or industrial process emissions are involved.  The use of continuous
in-stack instrumentation is encouraged.
5.0  REFERENCES

1.  Environmental Protection Agency.  "Guideline on Air Quality Models,"
EPA-450/2-78-027, Office of Air Quality Planning and Standards, Research
Triangle Park, NC 27711, April 1978.

2.  Environmental Protection Agency.  "Interim Procedures for Evaluating
Air Quality Models (Revised)," EPA-450/4-84-023, Office of Air Quality
Planning and Standards, Research Triangle Park, NC 27711, September 1984.

3.  Fox, D. G.  "Judging Air Quality Model Performance," Bull. Am. Meteor.
Soc. 62, 599-609, May 1981.

4.  Illinois Power and the Illinois Environmental Protection Agency.
"Procedures for Model Evaluation and Emission Limit Determination for the
Baldwin Power Plant," June 1982.

5.  Environmental Protection Agency.  "Workbook for Comparison of Air
Quality Models," EPA-450/2-78-028a,b, Office of Air Quality Planning and
Standards, Research Triangle Park, NC 27711, May 1978.

6.  Environmental Research & Technology, Inc.  "Evaluation of MPSDM and CRSTER
using the Illinois EPA-approved Protocol and the Subsequent Emission Lim-
itation Study for the Baldwin Power Plant," Documents P-B881-100, P-B881-200,
Prepared for Illinois Power Company, July 1983, July 1984.

7.  Hanna, S., C. Vaudo, A. Curreri, J. Beebe, B. Egan, and J. Mahoney.
"Diffusion Model Development and Evaluation and Emission Limitations at
the Westvaco Luke Mill," Document PA439, Prepared for the Westvaco
Corporation by Environmental Research & Technology, Inc., 696 Virginia Road,
Concord, MA 01742, March 1982.

8.  Bowers, J. F., H. E. Cramer, W. R. Hargraves and A. J. Anderson.  "Westvaco
Luke, Maryland Monitoring Program:  Data Analysis and Dispersion Model Val-
idation," Final Report prepared for U.S. Environmental Protection Agency,
Region III by H. E. Cramer Company, Inc., University of Utah Research Park,
Post Office Box 8049, Salt Lake City, UT 84108, June 1983.

9.  Hanna, Steven B., Bruce A. Egan, Cosmos J. Vaudo and Anthony J.
Curreri.  "An Evaluation of the LUMM and SHORTZ Dispersion Models Using
the Westvaco Data Set," Document PA-439, Prepared for the Westvaco
Corporation by Environmental Research & Technology, Inc., 696 Virginia
Road, Concord, MA 01742, November 1982.

10.  Londergan, Richard J.  "Protocol for the Comparative Performance
Evaluation of the LAPPES and Complex I Dispersion Models Using the Warren
Data Set," TRC Environmental Consultants, Inc., 800 Connecticut Blvd.,
East Hartford, CT 06108, November 1984.

11.  Environmental Protection Agency.  "Regional Workshop on Air Quality
Modeling:  A Summary Report," EPA-450/4-82-015, Office of Air Quality Planning
and Standards, Research Triangle Park, NC 27711, April 1981.

12.  Environmental Research & Technology, Inc.  "Protocol for the Evaluation
and Comparison of Air Quality Models for Lovett Generating Station," Docu-
ment P-B636-100, Prepared for Orange & Rockland Utilities, Inc., July 1984.

13.  Environmental Protection Agency.  Letter to Mr. Frank E. Fischer, Vice
President, Engineering, Orange & Rockland Utilities, Inc., EPA, Region II,
26 Federal Plaza, New York, NY 10278, August 30, 1984.

14.  Environmental Research & Technology, Inc.  "Validation of the Puerto
Rico Air Quality Model for the Guayanilla Basin," Document P-9050, Pre-
pared for Environmental Quality Board of Puerto Rico, November 1979.

15.  Environmental Protection Agency.  "Protocol for the Comparative Performance
Evaluation of the PRAQM and Complex I Dispersion Models in the Guayanilla
Basin," EPA Region II, 26 Federal Plaza, New York, NY 10278, August 1984.

16.  Public Service Company of Indiana.  "Plan for Field Study Leading to
Model Evaluation and Emission Limit Determination for the Gibson Generating
Station," Document P-A892, Environmental Research & Technology, Inc., 696
Virginia Road, Concord, MA 01742, May 1981.

17.  Environmental Protection Agency.  Letter to Mr. S. A. Ali, Public Service
Indiana from Environmental Protection Agency, Region V, 230 South Dearborn
Street, Chicago, IL 60604, August 10, 1982.

18.  Environmental Protection Agency.  Letter to Mr. S. A. Ali, Public Service
Indiana from Environmental Protection Agency, Region V, 230 South Dearborn
Street, Chicago, IL 60604, June 10, 1982.

19.  Burkhart, Richard P.  "Protocol for the Comparative Performance Eval-
uation of the LAPPES and Complex I Dispersion Models Using the Penelec
Data Set," Pennsylvania Electric Company, 1001 Broad Street, Johnstown, PA
15907, November 15, 1982.

20.  Burkhart, Richard P., Richard J. Londergan, Richard A. Rothstein and
Herbert S. Borenstein.  "Comparative Performance Evaluation of Two Complex
Terrain Dispersion Models," Preprint Paper No. 83-47.4, 76th Annual Meeting
of the Air Pollution Control Association, June 19-24, 1983.
                 APPENDIX A

Protocol and Performance Evaluation Results
                    for
           Baldwin Power Plant
        PERFORMANCE EVALUATION PROTOCOL AND FINAL SCORES FOR BALDWIN POWER PLANT

            PAIRING                       AVERAGING  MAXIMUM  SCORING  WEIGHTING       SCORES
DATA SET   SPACE  TIME  PERFORMANCE       TIMES      POINTS   SCHEME   INDIV-  DATA   MPSDM  CRSTER
                        MEASURES                              (CODE)*  IDUAL   SET

Second-    Yes    Yes   d                 3-hour      15       a        15      55     0.0    0.0
Highest    No     No    d                 3-hour      40       a        40            17.7   14.0

25-        Yes    Yes   d                 3-hour       5       b         5      45     0.1    0.2
Highest    No     No    d                 3-hour      15       b        15            13.5    9.0
           Yes    Yes   Sd                3-hour       2.5     c         2.5           1.7    1.8
           No     No    Sd                3-hour       5       c         5             4.4    4.0
           Yes    Yes   RMSEd             3-hour       2.5     c         2.5           1.0    1.1
           No     No    RMSEd             3-hour       5       c         5             4.4    3.6
           No     No    No. of Cases      3-hour       5       d         5             4.0    4.0
                        in Common
           Yes    No    Cumulative        3-hour       5       e         5             4.5    4.0
                        Frequency
                        Distribution

                        TOTAL                        100               100     100    51.3   41.7

*Letters in this column refer to the specific scoring scheme to be used.
 See subsequent page(s).
                              SCORING SCHEME

     a.  Second-highest data set:  Single-valued residuals (d), paired and
         unpaired

     A match between observed and predicted concentration is awarded the
maximum skill score, while a residual (observed minus predicted concentration)
that is more than 1/2 the observed highest, second-highest concentration
in magnitude is assigned a score of zero.  Regardless of the sign of the
residual, the points awarded vary linearly between 0 and 100% of the maximum
possible as the model error varies in magnitude between 1/2 the observed
highest, second-highest 3-hour average and zero.

     b.  25-highest data set:  Bias (d), paired and unpaired

     A scoring scheme for the bias that is the same as that used for
the second-high values is used.  A zero skill level is assigned to a bias
equal to 1/2 of the average observed value for the highest-25 3-hour SO2
concentrations.  The total number of points awarded to a model varies between
0 and the maximum value as the magnitude of the average residual varies
between 1/2 the average observed 3-hour concentration and zero.

     c.  25-highest data set:  Noise and gross variability (Sd, RMSEd),
         paired and unpaired

     The scoring scheme for the noise and gross variability tests involves
the ratio of the model precision measure to the average value about which it
is being computed.  For the noise test, this ratio is the standard deviation
divided by the average modeled value.  For the gross variability test, the
ratio is the root-mean-square-error divided by the average observed value;
both ratios are analogous to the coefficient of variation (standard deviation
divided by the mean) often used in statistical testing.  A score of 0 points
is awarded for a ratio of 1.0, linearly increasing to the maximum score as
the ratio goes to zero.  That is, the score will be:

          SCORE = (1.0 - computed ratio) x (the maximum possible points)

     d.  25-highest data set:  Meteorological cases in common, unpaired

          For the meteorological conditions comparison, 4 general weather
categories are used:

          1.  Unstable (Classes A-C), with the 10-meter wind speed less
than 5 m/sec;

          2.  Neutral (Class D), with the 10-meter wind speed less than
5 m/sec;

          3.  Stable (Classes E-G), with the 10-meter wind speed less than
5 m/sec;

          4.  Any case with the 10-meter wind speed greater than 5 m/sec.

          The number of cases for each weather category is totaled for the
top 25 observed and modeled 3-hour cases.  The number of unpaired cases
"in common" between observed and predicted 3-hour events is totaled to
determine the score for this test for each model:

          Score = [No. of Cases in Common / 25] [Maximum Points]

     e.  25-highest data set:  Cumulative frequency distribution, paired in
space

          For each individual monitor, the Kolmogorov-Smirnov (K-S) test
is used to determine whether the cumulative frequency distributions of the
top 25 observed and predicted 3-hour values are significantly different (at
the 5% significance level).  Points are awarded for each monitor for which
there is not a significant difference in the cumulative frequency
distributions:

          Score = [No. of monitors where frequency distributions are not
                   significantly different / Total no. of monitors]
                  [Maximum Points]
                 APPENDIX B

Protocol and Performance Evaluation Results
                    for
            Westvaco Luke Mill
          PERFORMANCE EVALUATION PROTOCOL AND FINAL SCORES FOR WESTVACO LUKE MILL

            PAIRING                           AVERAGING  MAXIMUM  SCORING  WEIGHTING       SCORES
DATA SET   SPACE  TIME  PERFORMANCE           TIMES      POINTS   SCHEME   INDIV-  DATA   LUMM  SHORTZ
                        MEASURES                                  (CODE)*  IDUAL   SET

Maximum    No     No    |d|                   3-hour      20       a        3.3    21.2    12      0
           No     No    |d|                   24-hour     20       a        3.3            19      0
           Yes    No    |d| for 8 monitors    3-hour      16(1)    a        2.7            10      4
           Yes    No    |d| for 8 monitors    24-hour     16(1)    a        2.7            13      3
           Yes    No    |d| for monitor #10   3-hour       8       a        1.3             0      8
           Yes    No    |d| for monitor #10   24-hour      8       a        1.3             1      8
           No     No    |d|                   Annual      20       a        3.3            13      1
           Yes    No    |d|                   Annual      20       a        3.3             9      8

Second-    No     No    |d|                   3-hour      30       a        5.0    22.0    27      0
Highest    No     No    |d|                   24-hour     30       a        5.0            26      0
           Yes    No    |d| for 8 monitors    3-hour      24(2)    a        4.0            17      5
           Yes    No    |d| for 8 monitors    24-hour     24(2)    a        4.0             4     10
           Yes    No    |d| for monitor #10   3-hour      12       a        2.0            16      6
           Yes    No    |d| for monitor #10   24-hour     12       a        2.0             3     11

25-        No     No    |d|                   1-hour      25       b        4.2    56.7    17      0
Highest    No     No    |d|                   3-hour      25       b        4.2            23      0
           No     No    |d|                   24-hour     25       b        4.2            23      0
           No     No    Sp2/So2, So2/Sp2      1-hour       5       c        0.8             0      0
           No     No    Sp2/So2, So2/Sp2      3-hour       5       c        0.8             1      0
           No     No    Sp2/So2, So2/Sp2      24-hour      5       c        0.8             4      0
           Yes    No    |d| for 8 monitors    1-hour      40(3)    b        6.6            30      5
           Yes    No    |d| for 8 monitors    3-hour      40(3)    b        6.6            30      8
           Yes    No    |d| for 8 monitors    24-hour     40(3)    b        6.6            26      9
           Yes    No    Sp2/So2, So2/Sp2      1-hour      16(4)    c        2.7             6      2
                        for 8 monitors
           Yes    No    Sp2/So2, So2/Sp2      3-hour      16(4)    c        2.7             5      4
                        for 8 monitors
           Yes    No    Sp2/So2, So2/Sp2      24-hour     16(4)    c        2.7             7      4
                        for 8 monitors
           Yes    No    |d| for monitor #10   1-hour      20       b        3.3             1     19
           Yes    No    |d| for monitor #10   3-hour      20       b        3.3             9     16
           Yes    No    |d| for monitor #10   24-hour     20       b        3.3             4     19
           Yes    No    Sp2/So2, So2/Sp2      1-hour       8       c        1.3             2      2
                        for monitor #10
           Yes    No    Sp2/So2, So2/Sp2      3-hour       8       c        1.3             2      8
                        for monitor #10
           Yes    No    Sp2/So2, So2/Sp2      24-hour      8       c        1.3             3      8
                        for monitor #10

                        TOTAL                            602               99.3(5) 99.9   363    168

*Letters in this column refer to the specific scoring scheme used.
 See subsequent page(s).

 Footnotes:

     (1)  2 points per monitor
     (2)  3 points per monitor
     (3)  5 points per monitor
     (4)  2 points per monitor
     (5)  Do not add to 100% because of rounding

                               SCORING SCHEME

a.  Maximum and second-highest data sets:  Residual (|d|), various pairings

         Score = [|d|min/|d|i] [min(Cp,i/Co, Co/Cp,i)] [max points]

              where i = 1,2 = Model 1 or Model 2

b.  25-highest data set:  Bias (|d|), unpaired, paired in space

         Score = [|d|min/|d|i] [min(Cp,i/Co, Co/Cp,i)] [max points]

              where i = 1,2 = Model 1 or Model 2

c.  25-highest data set:  Variance (Sp2/So2, So2/Sp2), unpaired, paired in space

         Score = [min(Sp,i2/So2, So2/Sp,i2)] [max points]

              where i = 1,2 = Model 1 or Model 2
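     A brief sketch of this relative scoring follows.  It is illustrative
only; the function and variable names are not from the protocol documents,
and the example numbers are fabricated.

def residual_score(abs_d, abs_d_other, cp_bar, co_bar, max_points):
    """Schemes a and b: the model with the smaller mean absolute residual
    anchors the ratio, and the score is damped by the bias ratio."""
    best = min(abs_d, abs_d_other)
    bias_factor = min(cp_bar / co_bar, co_bar / cp_bar)
    return (best / abs_d) * bias_factor * max_points

def variance_score(sp2, so2, max_points):
    """Scheme c: minimum of the predicted/observed variance ratio and its
    inverse, times the available points."""
    return min(sp2 / so2, so2 / sp2) * max_points

# Fabricated example comparing two models on one test:
lumm_score = residual_score(12.0, 18.0, 95.0, 110.0, max_points=20)
shortz_score = residual_score(18.0, 12.0, 130.0, 110.0, max_points=20)
print(round(lumm_score, 1), round(shortz_score, 1))   # 17.3 and 11.3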

                 APPENDIX C

                  Protocol
                    for
            Warren Power Plant
             PERFORMANCE EVALUATION PROTOCOL FOR WARREN POWER PLANT

            PAIRING                        AVERAGING  MAXIMUM  SCORING  WEIGHTING
DATA SET   SPACE  TIME  PERFORMANCE        TIMES      POINTS   SCHEME   INDIV-  DATA
                        MEASURES                               (CODE)*  IDUAL   SET

Maximum    Yes    No    d                  1-hour       2.0     a        1.4    14.9
           Yes    No    d                  3-hour       2.7     a        1.9
           Yes    No    d                  24-hour      3.6     a        2.6
           Yes    No    |d|/Co             1-hour       2.0     b        1.4
           Yes    No    |d|/Co             3-hour       2.7     b        1.9
           Yes    No    |d|/Co             24-hour      3.6     b        2.6
           Yes    No    R                  1-hour       1.0     c        0.7
           Yes    No    R                  3-hour       1.6     c        1.1
           Yes    No    R                  24-hour      1.8     c        1.3

Second-    No     No    Cp/Co              3-hour       7.0     d        5.0    28.4
Highest    No     No    Cp/Co              24-hour      9.0     d        6.4
           Yes    No    Cp/Co (1)          3-hour      12.0     e        8.5
           Yes    No    Cp/Co (1)          24-hour     12.0     e        8.5

25-        No     No    Cp/Co              1-hour       8.0     f        5.7    42.6
Highest    No     No    Cp/Co              3-hour      10.0     f        7.1
           No     No    Cp/Co              24-hour     13.0     f        9.2
           No     No    Sp2/So2            1-hour       4.0     g        2.8
           No     No    Sp2/So2            3-hour       6.0     g        4.3
           No     No    Sp2/So2            24-hour      7.0     g        5.0
           No     No    Cp/Co (2)          1-hour       8.0     h        5.7
           No     No    Sp2/So2 (2)        1-hour       4.0     i        2.8

All        No     No    Cp/Co              Annual       8.0     j        5.7    14.1
Data       Yes    No    d                  Annual       4.0     k        2.8
           Yes    No    |d|/Co             Annual       4.0     l        2.8
           Yes    No    R                  Annual       4.0     m        2.8

                        TOTAL                         141.0            100.0   100.0

*Letters refer to the specific scoring scheme to be used.  See subsequent page(s).

 Footnotes:
 (1)  For stations with the 3 highest observed and 3 highest estimated values -- see
      scoring scheme below.
 (2)  Stratified by stability -- see scoring scheme below.
                              SCORING SCHEME

a.  Maximum data set:  Average difference (d), paired in space

    Confidence intervals for 50 percent, 80 percent, 95 percent confidence
    levels from t-test.

                                                           Point Score
                                                  (1-Hour)  (3-Hour)  (24-Hour)

    50 percent confidence interval (C.I.)
    contains zero (observed = predicted)             2.0       2.7       3.6

    80 percent C.I. contains zero (but 50
    percent does not)                                1.33      1.8       2.4

    95 percent C.I. contains zero (but 80
    percent does not)                                0.67      0.9       1.2

    95 percent C.I. does not contain zero            0         0         0
b.  Maximum data set:  Average absolute difference (AAD), paired in space

    Compute the ratio of AAD to the average observed value.

                                      Point Score
                          (1-Hour)  (3-Hour)  (24-Hour)

          ratio < 0.2       2         2.7       3.6
    0.2 < ratio < 0.4       1.33      1.8       2.4
    0.4 < ratio < 0.8       0.67      0.9       1.2
    0.8 < ratio             0         0         0

c.  Maximum data set:  Pearson's correlation coefficient (R), paired in space

                                      Point Score
                          (1-Hour)  (3-Hour)  (24-Hour)

           R > 0.8          1         1.6       1.8
    0.8 >= R > 0.6          0.5       0.8       0.9
    0.6 >= R                0         0         0
d.  Second-highest data set:  Highest, second-highest value, unpaired

    Cp/Co = the ratio of the predicted to the observed highest,
    second-high value.

                                   Point Score
                               3-hour     24-hour

    0.5  > Cp/Co                 0          0
    0.67 > Cp/Co >= 0.5          2.6        3.4
    0.83 > Cp/Co >= 0.67         4.4        5.6
    1.2  > Cp/Co >= 0.83         7          9
    1.5  > Cp/Co >= 1.2          4.4        5.6
    2.0  > Cp/Co >= 1.5          2.6        3.4
           Cp/Co >= 2.0          0          0

e.  Second-highest data set:  Second-highest observed and predicted value, by
    stations with the highest, second-highest, and third-highest values (12
    points possible), paired in space

    Cp/Co = ratio of predicted to observed second-highest value at the
    same station.

                                              Point Score
                               Station w/      Second-highest  Third-highest
                               highest value   station         station

    0.5  > Cp/Co                     0               0               0
    0.67 > Cp/Co >= 0.5              1               1               0
    0.83 > Cp/Co >= 0.67             2               1               1
    1.2  > Cp/Co >= 0.83             3               2               1
    1.5  > Cp/Co >= 1.2              2               1               1
    2.0  > Cp/Co >= 1.5              1               1               0
           Cp/Co >= 2.0              0               0               0

f.  25-highest data set:  Bias (Cp/Co), unpaired

    Cp/Co = ratio of predicted to observed average value.

                                     Point Score
                           (1-hour)   (3-hour)   (24-hour)

    0.67 > Cp/Co              0          0           0
    0.83 > Cp/Co >= 0.67      2.5        3           4
    0.91 > Cp/Co >= 0.83      5          6           8
    1.1  > Cp/Co >= 0.91      8         10          13
    1.2  > Cp/Co >= 1.1       5          6           8
    1.5  > Cp/Co >= 1.2       2.5        3           4
           Cp/Co >= 1.5       0          0           0

g.  25-highest data set:  Variance ratio (Sp2/So2), unpaired

    Sp2/So2 = ratio of predicted to observed variance.

                                      Point Score
                            (1-hour)   (3-hour)   (24-hour)

    0.25 > Sp2/So2             0          0           0
    0.50 > Sp2/So2 >= 0.25     1.33       2           2.4
    0.75 > Sp2/So2 >= 0.50     2.67       4           4.8
    1.33 > Sp2/So2 >= 0.75     4          6           7
    2.0  > Sp2/So2 >= 1.33     2.67       4           4.8
    4.0  > Sp2/So2 >= 2.0      1.33       2           2.4
           Sp2/So2 >= 4.0      0          0           0

h.  25-highest data set:  Bias (Cp/Co), by stability category, unpaired

    For the stability category with the highest observed concentrations,
    compare the 25-highest observed and 25-highest predicted values
    (unpaired in time or location).  Repeat for the stability category
    with the highest predicted concentrations.  (1-hour average only.)

    Cp/Co = ratio of predicted to observed average value.

                                Point Score

    0.67 > Cp/Co                    0
    0.83 > Cp/Co >= 0.67            1.25
    0.91 > Cp/Co >= 0.83            2.5
    1.1  > Cp/Co >= 0.91            4
    1.2  > Cp/Co >= 1.1             2.5
    1.5  > Cp/Co >= 1.2             1.25
           Cp/Co >= 1.5             0

i.  25-highest data set:  Variance ratio (Sp2/So2), by stability category,
    unpaired

    For the stability category with the highest observed concentrations,
    compare the 25 highest observed and 25 highest predicted values (un-
    paired in time or location).  Repeat for the stability category with the
    highest predicted concentrations.  (1-hour average only.)

    Sp2/So2 = ratio of predicted to observed variance.

                                Point Score

    0.25 > Sp2/So2                  0
    0.50 > Sp2/So2 >= 0.25          0.67
    0.75 > Sp2/So2 >= 0.50          1.33
    1.33 > Sp2/So2 >= 0.75          2
    2.0  > Sp2/So2 >= 1.33          1.33
    4.0  > Sp2/So2 >= 2.0           0.67
           Sp2/So2 >= 4.0           0

j.  All data set:  Bias (Cp/Co), unpaired

    Cp/Co = ratio of predicted to observed highest value.

                                Point Score

    0.75 > Cp/Co                    0
    0.83 > Cp/Co >= 0.75            2
    0.91 > Cp/Co >= 0.83            4
    0.95 > Cp/Co >= 0.91            6
    1.05 > Cp/Co >= 0.95            8
    1.1  > Cp/Co >= 1.05            6
    1.2  > Cp/Co >= 1.1             4
    1.33 > Cp/Co >= 1.2             2
           Cp/Co >= 1.33            0

k.  All data set:  Average residual (d), paired in space

    Use confidence intervals as in a.

                                                          Point Score

    50 percent confidence interval contains zero              4
    80 percent C.I. contains zero (but 50% does not)          2
    95 percent C.I. contains zero (but 80% does not)          1
    95 percent C.I. does not contain zero                     0

l.  All data set:  Ratio of average absolute difference to average observed
    value, paired in space

                                Point Score

          |d|/Co < 0.1              4
    0.1 < |d|/Co < 0.2              2
    0.2 < |d|/Co < 0.3              1
    0.3 < |d|/Co                    0

m.  All data set:  Pearson's correlation coefficient (R), paired in space

                                Point Score

           R >= 0.9                 4
    0.9 >  R >= 0.8                 3
    0.8 >  R >= 0.7                 2
    0.7 >  R >= 0.6                 1
    0.6 >  R                        0
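     Because every ratio-based point table above (schemes d through j, l
and m) is a step function of a single statistic, a generic bin-lookup
routine can drive all of them.  The sketch below is illustrative only; the
encoded bins reproduce scheme f's 3-hour column, and the helper name is
hypothetical.

import bisect

def table_score(value, edges, points):
    # bisect_right puts a value equal to an edge into the upper interval,
    # matching the ">=" lower bounds used in the tables above.
    return points[bisect.bisect_right(edges, value)]

# Scheme f, 3-hour column (bins and points transcribed from the table):
SCHEME_F_3HR = ([0.67, 0.83, 0.91, 1.1, 1.2, 1.5],
                [0.0, 3.0, 6.0, 10.0, 6.0, 3.0, 0.0])

print(table_score(1.05, *SCHEME_F_3HR))   # 10.0 (0.91 <= ratio < 1.1)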

                 APPENDIX D

                  Protocol
                    for
            Lovett Power Plant
             PERFORMANCE EVALUATION PROTOCOL FOR LOVETT POWER PLANT

              PAIRING                            AVERAGING  MAXIMUM  SCORING  WEIGHTING
DATA SET     SPACE  TIME  PERFORMANCE            TIMES      POINTS   SCHEME   INDIV-  DATA
                          MEASURES                                   (CODE)*  IDUAL   SET

Second-      No     No    Cp/Co, Co/Cp           3-hour       5.0     a         5.0    20.0
Highest      No     No    Cp/Co, Co/Cp           24-hour      5.0     a         5.0
             Yes    No    Cp/Co, Co/Cp           3-hour       5.0     b         5.0
             Yes    No    Cp/Co, Co/Cp           24-hour      5.0     b         5.0

25-Highest   No     No    Cp/Co, Co/Cp           1-hour       4.0     c         4.0    58.0
             No     No    Cp/Co, Co/Cp           3-hour       4.0     c         4.0
             No     No    Cp/Co, Co/Cp           24-hour      4.0     c         4.0
             Yes    No    Cp/Co, Co/Cp           1-hour       4.0     d         4.0
             Yes    No    Cp/Co, Co/Cp           3-hour       4.0     d         4.0
             Yes    No    Cp/Co, Co/Cp           24-hour      4.0     d         4.0
             No     No    Sp2/So2, So2/Sp2       1-hour       4.0     e         4.0
             No     No    Sp2/So2, So2/Sp2       3-hour       4.0     e         4.0
             No     No    Sp2/So2, So2/Sp2       24-hour      4.0     e         4.0
             Yes    No    Sp2/So2, So2/Sp2       1-hour       4.0     f         4.0
             Yes    No    Sp2/So2, So2/Sp2       3-hour       4.0     f         4.0
             Yes    No    Sp2/So2, So2/Sp2       24-hour      4.0     f         4.0
             No     No    No. of cases           1-hour      10.0     g        10.0
                          in common

All          No     No    Cp/Co, Co/Cp           Annual       1.0     h         1.0    22.0
Data         Yes    No    Cp/Co, Co/Cp           Annual       1.0     i         1.0
             No     Yes   R (1)                  1-hour       1.0     j         1.0
             No     Yes   R (1)                  3-hour       1.0     j         1.0
             No     Yes   Cp/Co, Co/Cp (1)       1-hour       1.0     k         1.0
             No     Yes   Cp/Co, Co/Cp (1)       3-hour       1.0     k         1.0
             No     Yes   Sp2/So2, So2/Sp2 (1)   1-hour       1.0     l         1.0
             No     Yes   Sp2/So2, So2/Sp2 (1)   3-hour       1.0     l         1.0
             No     Yes   d2 (1)                 1-hour       2.0     m         2.0
             No     Yes   d2 (1)                 3-hour       2.0     m         2.0
             No     Yes   R (2)                  1-hour       1.0     n         1.0
             No     Yes   R (2)                  3-hour       1.0     n         1.0
             No     Yes   Cp/Co, Co/Cp (2)       1-hour       1.0     o         1.0
             No     Yes   Cp/Co, Co/Cp (2)       3-hour       1.0     o         1.0
             No     Yes   Sp2/So2, So2/Sp2 (2)   1-hour       1.0     p         1.0
             No     Yes   Sp2/So2, So2/Sp2 (2)   3-hour       1.0     p         1.0
             No     Yes   d2 (2)                 1-hour       2.0     q         2.0
             No     Yes   d2 (2)                 3-hour       2.0     q         2.0

                          TOTAL                             100.0             100.0   100.0

*Letters refer to specific scoring scheme to be used.  See subsequent page(s).

 Footnotes:  (1)  Stable conditions
             (2)  Nonstable conditions
                                SCORING SCHEME

a.  Second-highest data set:  Ratios of concentrations (Cp/Co, Co/Cp), unpaired

         Score = [min(Cp/Co, Co/Cp)] [max points]

b.  Second-highest data set:  Ratios of concentrations (Cp/Co, Co/Cp), paired in
    space

         Score = [min(Cp/Co, Co/Cp)] [max points]

c.  25-highest data set:  Bias (Cp/Co, Co/Cp), unpaired

         Score = [min(Cp/Co, Co/Cp)] [max points]

d.  25-highest data set:  Bias (Cp/Co, Co/Cp), paired in space

         Score = [min(Cp/Co, Co/Cp)] [max points]

e.  25-highest data set:  Variance ratios (Sp2/So2, So2/Sp2), unpaired

         Score = [min(Sp2/So2, So2/Sp2)] [max points]

f.  25-highest data set:  Variance ratios (Sp2/So2, So2/Sp2), paired in space

         Score = [min(Sp2/So2, So2/Sp2)] [max points]

g.  25-highest data set:  Number of cases in common, unpaired

h.  All data set:  Ratios of annual concentrations (Cp/Co, Co/Cp), unpaired

         Score = [min(Cp/Co, Co/Cp)] [max points]

i.  All data set:  Ratios of annual concentrations (Cp/Co, Co/Cp), paired in
    space and time

         Score = [min(Cp/Co, Co/Cp)] [max points]

j.  All data set:  Pearson's correlation coefficient (R), stable conditions
    only, paired in time

         Score = [R2] [max points]

k.  All data set:  Bias (Cp/Co, Co/Cp), stable conditions only, paired in time

         Score = [min(Cp/Co, Co/Cp)] [max points]

l.  All data set:  Variance ratios (Sp2/So2, So2/Sp2), stable conditions only,
    paired in time

         Score = [min(Sp2/So2, So2/Sp2)] [max points]

m.  All data set:  Gross variability (d2), stable conditions only, paired
    in time

         Score = [(Sum d2)min / (Sum d2)] [max points]

              where (Sum d2)min = Sum d2 for the best performing model

n.  All data set:  Pearson's correlation coefficient (R), nonstable conditions,
    paired in time

         Score = [R2] [max points]

o.  All data set:  Bias (Cp/Co, Co/Cp), nonstable conditions, paired in time

         Score = [min(Cp/Co, Co/Cp)] [max points]

p.  All data set:  Variance ratios (Sp2/So2, So2/Sp2), nonstable conditions,
    paired in time

         Score = [min(Sp2/So2, So2/Sp2)] [max points]

q.  All data set:  Gross variability (d2), nonstable conditions, paired in time

         Score = [(Sum d2)min / (Sum d2)] [max points]

              where (Sum d2)min = Sum d2 for the best performing model
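     The gross variability scoring (schemes m and q) can be sketched as
follows; the data structures and model names are hypothetical, and only the
formula above is taken from the protocol.

import numpy as np

def gross_variability_scores(residuals_by_model, max_points):
    """residuals_by_model: dict of model -> paired-in-time residual array.
    Score_i = (sum d^2)_min / (sum d^2)_i * max_points."""
    ssq = {m: float(np.sum(np.square(r)))
           for m, r in residuals_by_model.items()}
    best = min(ssq.values())   # best performing model sets the reference
    return {m: (best / s) * max_points for m, s in ssq.items()}

# Fabricated stable-hours residuals for two candidate models:
scores = gross_variability_scores(
    {"model_A": np.array([5.0, -12.0, 8.0]),
     "model_B": np.array([15.0, -20.0, 11.0])},
    max_points=2.0)
print(scores)   # model_A: 2.0 (full points); model_B: about 0.62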
                 APPENDIX E

                  Protocol
                    for
             Guayanilla Basin
             PERFORMANCE EVALUATION PROTOCOL FOR GUAYANILLA BASIN

              PAIRING                      AVERAGING  MAXIMUM  SCORING  WEIGHTING
DATA SET     SPACE  TIME  PERFORMANCE      TIMES      POINTS   SCHEME   INDIV-  DATA
                          MEASURES                             (CODE)*  IDUAL   SET

Maximum      No     No    Cp/Co            3-hour       4       a        1.6    11.8
             No     No    Cp/Co            24-hour      5       a        2.0
             Yes    No    Cp/Co            3-hour       9       b        3.5
             Yes    No    Cp/Co            24-hour     12       b        4.7

Second-      No     No    Cp/Co            3-hour       5       c        2.0    14.9
Highest      No     No    Cp/Co            24-hour      6       c        2.3
             Yes    No    Cp/Co            3-hour      12       d        4.7
             Yes    No    Cp/Co            24-hour     15       d        5.9

25-Highest   No     No    Cp/Co            1-hour       6       e        2.3    50.2
             No     No    Cp/Co            3-hour       8       e        3.1
             No     No    Cp/Co            24-hour     12       e        4.7
             No     No    Sp2/So2          1-hour       3       f        1.2
             No     No    Sp2/So2          3-hour       4       f        1.6
             No     No    Sp2/So2          24-hour      6       f        2.3
             Yes    No    Cp/Co            1-hour      12       g        4.7
             Yes    No    Cp/Co            3-hour      18       g        7.0
             Yes    No    Cp/Co            24-hour     30       g       11.7
             Yes    No    Sp2/So2          1-hour       6       h        2.3
             Yes    No    Sp2/So2          3-hour       9       h        3.5
             Yes    No    Sp2/So2          24-hour     15       h        5.8

Upper 5% of  Yes    No    No. of Cases     1-hour      60       i       23.4    23.4
Observed &                in Common
Predicted

                          TOTAL                       257              100.3(1) 100.3(1)

*Letters refer to specific scoring scheme to be used.  See subsequent page(s).

 Footnote:
 (1)  Do not add to 100% because of rounding.
                                SCORING SCHEME

a.  Maximum data set:  Concentration ratio (Cp/Co), unpaired

                                     Point Score
                                  3-hr      24-hr

    0.67 > Cp/Co                   0.0       0.0
    0.80 > Cp/Co >= 0.67           0.5       1.0
    0.91 > Cp/Co >= 0.80           2.0       2.5
    1.20 > Cp/Co >= 0.91           4.0       5.0
    1.50 > Cp/Co >= 1.20           2.5       3.5
    2.50 > Cp/Co >= 1.50           1.5       2.0
           Cp/Co >= 2.50           0.0       0.0

b.  Maximum data set:  Concentration ratio (Cp/Co), paired in space

    A weighting factor is to be applied to the scores for the tests at
    each monitor.  The weighting factor will be based on the relative rank
    of the observed data for each averaging period to be examined.  The
    following weights will be assigned and should be applied to the table
    below.

           Phase I               Phase II
    Rank    Weight         Rank    Weight
      1      1.0            1,2     0.50
      2      0.8            3,4     0.40
      3      0.7            5,6     0.35
      4      0.5            7,8     0.25

                                     Point Score
                                  3-hr      24-hr

    0.67 > Cp/Co                   0.0       0.0
    0.80 > Cp/Co >= 0.67           0.0       0.5
    0.91 > Cp/Co >= 0.80           1.0       2.0
    1.20 > Cp/Co >= 0.91           3.0       4.0
    1.50 > Cp/Co >= 1.20           2.0       2.5
    2.50 > Cp/Co >= 1.50           1.0       1.5
           Cp/Co >= 2.50           0.0       0.0
c.  Second-highest data set:  Concentration ratio (Cp/Co), unpaired

                                     Point Score
                                  3-hr      24-hr

    0.67 > Cp/Co                   0.0       0.0
    0.80 > Cp/Co >= 0.67           1.0       1.5
    0.91 > Cp/Co >= 0.80           2.5       3.0
    1.20 > Cp/Co >= 0.91           5.0       6.0
    1.50 > Cp/Co >= 1.20           3.5       4.0
    2.50 > Cp/Co >= 1.50           2.0       2.5
           Cp/Co >= 2.50           0.0       0.0

d.  Second-highest data set:  Concentration ratio (Cp/Co), paired in space

    A weighting factor is to be applied to the scores for the tests at each
    monitor.  The weighting factor will be based on the relative rank of the
    observed data for each averaging period to be examined.  The following
    weights will be assigned and should be applied to the table below.

           Phase I               Phase II
    Rank    Weight         Rank    Weight
      1      1.0            1,2     0.50
      2      0.8            3,4     0.40
      3      0.7            5,6     0.35
      4      0.5            7,8     0.25

                                     Point Score
                                  3-hr      24-hr

    0.67 > Cp/Co                   0.0       0.0
    0.80 > Cp/Co >= 0.67           0.5       1.0
    0.91 > Cp/Co >= 0.80           2.0       2.5
    1.20 > Cp/Co >= 0.91           4.0       5.0
    1.50 > Cp/Co >= 1.20           2.5       3.5
    2.50 > Cp/Co >= 1.50           1.5       2.0
           Cp/Co >= 2.50           0.0       0.0
e.  25-highest data set:  Bias (Cp/Co), unpaired

                                     Point Score
                               1-hr      3-hr     24-hr

    0.67 > Cp/Co                0.0       0.0      0.0
    0.80 > Cp/Co >= 0.67        1.5       2.0      3.0
    0.91 > Cp/Co >= 0.80        3.0       4.0      6.0
    1.20 > Cp/Co >= 0.91        6.0       8.0     12.0
    1.50 > Cp/Co >= 1.20        4.0       5.5      8.0
    2.50 > Cp/Co >= 1.50        2.5       3.0      4.0
           Cp/Co >= 2.50        0.0       0.0      0.0

f.  25-highest data set:  Variance ratio (Sp2/So2), unpaired

                                       Point Score
                                 1-hr      3-hr     24-hr

           Sp2/So2 <= 0.25        0.0       0.0      0.0
    0.25 < Sp2/So2 <= 0.50        1.0       1.5      2.0
    0.50 < Sp2/So2 <= 0.75        2.0       3.0      4.0
    0.75 < Sp2/So2 <= 1.33        3.0       4.0      6.0
    1.33 < Sp2/So2 <= 2.00        2.0       3.0      4.0
    2.00 < Sp2/So2 <= 4.00        1.0       1.5      2.0
    4.00 < Sp2/So2                0.0       0.0      0.0
g.  25-highest data set:  Bias (Cp/Co), paired in space

    A weighting factor is to be applied to the scores for the tests at each
    monitor.  The weighting factor will be based on the relative rank of the
    observed data for each averaging period to be examined.  The following
    weights will be assigned and should be applied to the table below.

           Phase I               Phase II
    Rank    Weight         Rank    Weight
      1      1.0            1,2     0.50
      2      0.8            3,4     0.40
      3      0.7            5,6     0.35
      4      0.5            7,8     0.25

                                     Point Score
                               1-hr      3-hr     24-hr

    0.67 > Cp/Co                0.0       0.0      0.0
    0.80 > Cp/Co >= 0.67        0.5       1.5      2.5
    0.91 > Cp/Co >= 0.80        2.0       3.0      5.0
    1.20 > Cp/Co >= 0.91        4.0       6.0     10.0
    1.50 > Cp/Co >= 1.20        2.5       4.0      6.5
    2.50 > Cp/Co >= 1.50        1.5       2.5      3.5
           Cp/Co >= 2.50        0.0       0.0      0.0

h.  25-highest data set:  Variance ratio (Sp2/So2), paired in space

    A weighting factor is to be applied to the scores for the tests at each
    monitor.  The weighting factor will be based on the relative rank of the
    observed data for each averaging period to be examined.  The following
    weights will be assigned and should be applied to the table below.

           Phase I               Phase II
    Rank    Weight         Rank    Weight
      1      1.0            1,2     0.50
      2      0.8            3,4     0.40
      3      0.7            5,6     0.35
      4      0.5            7,8     0.25

                                       Point Score
                                 1-hr      3-hr     24-hr

           Sp2/So2 <= 0.25        0.0       0.0      0.0
    0.25 < Sp2/So2 <= 0.50        0.5       1.0      2.0
    0.50 < Sp2/So2 <= 0.75        1.0       2.0      3.5
    0.75 < Sp2/So2 <= 1.33        2.0       3.0      5.0
    1.33 < Sp2/So2 <= 2.00        1.0       2.0      3.5
    2.00 < Sp2/So2 <= 4.00        0.5       1.0      2.0
    4.00 < Sp2/So2                0.0       0.0      0.0
i.  Upper 5% of frequency distribution data set:  Number of cases in common

    At each monitor, unpaired in time, stratify the upper 5% of the 1-hour
    predicted and observed concentrations according to the following categories:

          I.    Unstable (Classes A, B, C)
          II.   Neutral (Class D)
          III.  Stable (Classes E, F)

    The number of unpaired cases "in common" between observed and predicted
    1-hour events will be used to determine a skill factor for each category,
    defined as:

          Rsf = 2 x (Number in Common) / (Number Predicted + Number Observed)

    The total number of points for each Phase I monitor is 15 points and the
    total for each Phase II monitor is 7.5 points, apportioned as follows
    (predominance determined from the highest 25 observed concentrations):

                                        Phase I     Phase II

          Most predominant category:     8 pts.       4 pts.
          Next predominant category:     4 pts.       2 pts.
          Least predominant category:    3 pts.     1.5 pts.

    The total score is given by

          Score =    Sum     (Rsf)(max points)
                  categories
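     As an illustration, the skill factor computation in scheme i can be
sketched as follows.  This fragment is not part of the protocol; the category
counts in the example are invented.

def skill_factor(n_common, n_predicted, n_observed):
    """Rsf = 2 * (number in common) / (number predicted + number observed)."""
    return 2.0 * n_common / (n_predicted + n_observed)

def scheme_i_score(categories, points_by_rank):
    """categories: per-category (n_common, n_predicted, n_observed) tuples,
    ordered most- to least-predominant; points_by_rank: e.g. (8, 4, 3) for
    a Phase I monitor or (4, 2, 1.5) for Phase II."""
    return sum(skill_factor(*c) * p
               for c, p in zip(categories, points_by_rank))

# One hypothetical Phase I monitor, categories ordered by predominance:
cases = [(10, 14, 12), (4, 6, 8), (1, 5, 5)]
print(round(scheme_i_score(cases, (8, 4, 3)), 2))   # 9.04 of 15 points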

                              TECHNICAL REPORT DATA
                (Please read instructions on the reverse before completing)

1.  REPORT NO.:  EPA-450/4-85-006

4.  TITLE AND SUBTITLE:  Interim Procedures for Evaluating Air Quality Models:
    Experience with Implementation

5.  REPORT DATE:  July 1985

9.  PERFORMING ORGANIZATION NAME AND ADDRESS:
    Monitoring and Data Analysis Division
    Office of Air Quality Planning and Standards
    U.S. Environmental Protection Agency
    Research Triangle Park, NC 27711

16. ABSTRACT

     This report summarizes and intercompares the details of five major
regulatory cases for which guidance provided in the "Interim Procedures for
Evaluating Air Quality Models" was implemented in evaluating candidate models.
In two of the cases the evaluations have been completed and the appropriate
model has been determined.  In three cases the data base collection and/or the
final analysis has not yet been completed.  The purpose of the report is to
provide potential users of the Interim Procedures with a description and
analysis of several applications that have taken place.  With this information
in mind the user should be able to:  (1) more effectively implement the
procedures, since some of the pitfalls experienced by the initial pioneers can
now be avoided; and (2) design innovative technical criteria and statistical
techniques that will advance the state of the science of model evaluation.

     The analyses show that the basic principles or framework underlying the
Interim Procedures are sound and workable in application.  The concept of
using the results from a prenegotiated protocol for the performance evaluation
has been shown to be an appropriate and workable primary basis for objectively
deciding on the best model.  Similarly, "up-front" negotiation on what
constitutes an acceptable data base network has been established as an
acceptable way of promoting objectivity in the evaluation.  Preliminary
concentration estimates and the need for accurate continuous on-site
measurements of the requisite model input data are also important.

17. KEY WORDS AND DOCUMENT ANALYSIS
    DESCRIPTORS:  Air Pollution; Meteorology; Mathematical Models;
    Performance Evaluation; Statistics
    IDENTIFIERS/OPEN ENDED TERMS:  Performance Measures; Technical Evaluation
    COSATI Field/Group:  4B, 12A

18. DISTRIBUTION STATEMENT:  Unlimited

19. SECURITY CLASS (This Report):  Unclassified

20. SECURITY CLASS (This Page):  Unclassified