Summary Of Complex Terrain Model Evaluation


                                                JUNE 1985
SUMMARY OF COMPLEX TERRAIN MODEL EVALUATION
  ATMOSPHERIC SCIENCES RESEARCH LABORATORY
     OFFICE OF RESEARCH AND DEVELOPMENT
    U.S. ENVIRONMENTAL PROTECTION AGENCY
RESEARCH TRIANGLE PARK, NORTH CAROLINA 27711

-------
       SUMMARY OF COMPLEX TERRAIN MODEL EVALUATION
                            by
                  Fred D. White, Editor
             American Meteorological  Society
                     45 Beacon Street
               Boston, Massachusetts  02108
                           and

Jason K. S. Ching, Robin L. Dennis, and William H. Snyder
           Meteorology and Assessment Division
         Atmospheric Sciences Research Laboratory
       Research Triangle Park, North Carolina 27711
                     Project Officer

                  Francis A. Schiermeier
           Meteorology and Assessment Division
         Atmospheric Sciences Research Laboratory
       Research Triangle Park, North Carolina 27711
         ATMOSPHERIC SCIENCES RESEARCH LABORATORY
            OFFICE OF RESEARCH AND DEVELOPMENT
           U.S. ENVIRONMENTAL PROTECTION AGENCY
       RESEARCH TRIANGLE PARK, NORTH CAROLINA 27711

-------
                      NOTICE
The information  in  this  document  has  been  funded  by
the United States Environmental Protection Agency under
Cooperative Agreement 810297  to the  American  Meteoro-
logical Society.  It  has  been  subject  to  the  Agency's
peer and administrative  review,  and it has been  approved
for publication as an EPA document.

Mention of trade names  or  commercial  products  does  not
constitute endorsement or recommendation for  use.

-------
                                    ABSTRACT
     The Environmental Protection Agency conducted a scientific review of
a set of eight complex terrain dispersion models.  TRC Environmental  Con-
sultants, Inc. calculated and tabulated a uniform set of performance
statistics for the models using the Cinder Cone Butte and Westvaco Luke
Mill data bases.  Three members of the EPA Meteorology and Assessment
Division reviewed the performance statistics and presented objective
analyses of the models and their performance.  An American Meteorological
Society Steering Committee summarized the reviews and formulated three
conclusions:  (1) none of the models can be regarded as up-to-date scien-
tifically; (2) one model  exhibited much better performance statistics
than did the others; and (3) overprediction was the most common problem
with the models.  This report consists of the AMS summary and copies  of
three independent reviews conducted to evaluate the model performance.

-------
                                    CONTENTS

Abstract

  1.  Summary of Complex Terrain Model Evaluation
      AMS Steering Committee

  2.  Review of Complex Terrain Model Performance
      Jason K. S. Ching

  3.  Review of Complex Terrain Model Performance
      Robin L. Dennis

  4.  Review of Complex Terrain Model Performance
      William H. Snyder

-------
          SUMMARY OF COMPLEX TERRAIN MODEL EVALUATION

I.     SUMMARY

       The complex terrain model evaluation is the third in the
       series conducted under the cooperative agreement between
       the American Meteorological Society (AMS) and the U.S.
       Environmental Protection Agency (EPA).  Like the earlier
       reviews of rural and urban models, this evaluation in-
       cluded three distinct tasks, a performance evaluation in
       which model predictions and field data were compared, a
       set of independent peer reviews, and a brief summary of
       the evaluation developed by the AMS members of the
       Committee.

       This evaluation differed from the previous projects in
       two respects.  First, the peer reviewers consisted of
       three EPA staff members, working independently.  Second,
       the performance evaluation included two data sets in-
       stead of one.

       All of the reviewers were on assignment from NOAA to the
       Meteorology and Assessment Division, Atmospheric
       Sciences Research Laboratory:

                       Jason K.S. Ching
                        Robin L. Dennis
                       William H. Snyder

       In our judgment, the reviewers presented valuable,
       objective studies of the models and their performance.

-------
They mirrored the reactions of the reviewers of  the
rural and urban models and the AMS Committee members  in
being disappointed in the technical quality of the mod-
els, and in concluding that a massive statistical analy-
sis is not the best way to analyze model performance.
The difficulty with the latter is that one obtains much
numerical information about model performance on limited
data sets, but very little about the strengths and
weaknesses of the models as they deal with specific
meteorological problems.  This group of reviewers and
the Committee members believe that future evaluations
should stress understanding of the physics of the models
and study of their performance in specific case  studies,
with less emphasis on massive statistical efforts.

It is important to keep in mind that the current evalua-
tion developed data on rather limited aspects of trans-
port and diffusion in complex terrain.  The models were
originally intended to deal with flow and dispersion
around isolated obstacles, in particular on the  windward
side of such obstacles.  Thus, the performance data
addressed only the source-receptor relationships on the
upslope and crest portions of nearby terrain obstacles.
This information is of course important in regulatory
applications, but it represents a very limited aspect  of
complex terrain problems.  It tells one nothing  about
up- and down-valley flow, slope flow, or effects beyond
the immediate terrain obstacles.

However, within these limitations, three conclusions
about the models and their performance are evident:

-------
• The Models are Scientifically Outdated

  None of the models can be described as up-to-date
  scientifically.  Seven of the eight models still util-
  ize the Gaussian diffusion kernel with simple modifi-
  cations to account for the terrain effects.  The key
  modification is the reduction of the effective stack
  height of the plume by some fraction of the terrain
  height to simulate the passage of the plume closer to
  the terrain.  The eighth model develops a wind field
  and treats diffusion differently, but it is difficult
  to evaluate from the information presented in the
  user's guide.

• RTDM a Clear Choice

  This performance evaluation was unlike either the
  rural or the urban reviews in that one model, RTDM,
  exhibited much better performance statistics than the
  others.  In both data sets, this model predicted con-
  centrations that were quite close to the observed, on
  average, although there were indeed large deviations
  between individual observations and predictions.  In
  contrast, the other models exhibited large deviations
  even in the average ratio of predicted-to-observed
  concentrations.

  RTDM is also one of the more up-to-date models, in-
  cluding improvements that are missing in most of the
  others.  Like COMPLEX/PFM, it determines the dividing
  streamline height, Hcrit, during stably stratified
  flow conditions, treating diffusion above and below
  this elevation differently.  It also prevents the
  unrealistic increase of the crosswind integrated

-------
       concentration with distance that would violate the
       second law of thermodynamics.  The model accounts for
       wind shear and utilizes onsite meteorological data  in
       detail.

     • Overprediction is the Rule

       One can generalize in saying that overprediction is  the
       most common problem with the models, and frequently  the
       overprediction is sufficiently gross as to make regula-
       tory application meaningless.  This overprediction  seems
       to be associated largely with plume impingement on  ter-
       rain being predicted to occur where it does not, or  in
       calculating excessive concentrations when it does,  but
       it is dangerous to generalize about the reasons for  poor
       performance.

II.    MODELS

       Eight models were included in the evaluation:

                           COMPLEX I
                           COMPLEX II
                           4141
                           PLUME 5
                           RTDM
                           SHORTZ
                           COMPLEX/PFM
                           IMPACT

       Five of the models listed are basically simple Gaussian
       models, using minor adaptations to deal with some as-
       pects of complex terrain flow and dispersion.  These

-------
       adaptations include modified terrain-receptor  height
       adjustments to account for vertical plume motion  over
       obstacles, enhanced lateral dispersion  under certain
       circumstances, adjustments to account for wind  direction
       shear, and limitation of plume approach height on impaction
       to some minimum value.  RTDM and COMPLEX/PFM are
       more contemporary in a scientific sense, the latter
       using potential flow adjustments in the wind field  for
       part of the calculations, but none of these seven
       develops or simulates the three-dimensional transport
       and diffusion field that is actually present in such
       situations.  Of the group evaluated, only IMPACT treats
       the problem with such a technique, and  it is difficult
       in a review such as this one to determine how  faithful  a
       depiction this may be.
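
The terrain adjustment described above for the simple Gaussian models can be illustrated with a minimal sketch. It assumes the standard Gaussian centerline formula with full ground reflection; the function names, the terrain fraction of 0.5, and the 10 m minimum approach height are hypothetical values chosen for illustration and are not taken from any of the eight models.

```python
import math


def adjusted_plume_height(effective_height_m, terrain_height_m,
                          terrain_fraction=0.5, min_height_m=10.0):
    """Height of the plume centerline above the ground at a terrain receptor.

    The effective stack height is lowered by a fraction of the terrain
    height and is not allowed to fall below a minimum approach height.
    Both default values are illustrative assumptions.
    """
    lowered = effective_height_m - terrain_fraction * terrain_height_m
    return max(lowered, min_height_m)


def centerline_ground_concentration(emission_g_s, wind_speed_m_s,
                                    sigma_y_m, sigma_z_m, plume_height_m):
    """Ground-level concentration (g/m^3) directly beneath the plume axis,
    using the standard Gaussian kernel with full reflection at the ground."""
    dilution = math.pi * wind_speed_m_s * sigma_y_m * sigma_z_m
    return emission_g_s / dilution * math.exp(-0.5 * (plume_height_m / sigma_z_m) ** 2)


# Hypothetical case: 200 m effective height, receptor on a 150 m slope.
flat = centerline_ground_concentration(100.0, 5.0, 80.0, 40.0, 200.0)
height_on_slope = adjusted_plume_height(200.0, 150.0)
slope = centerline_ground_concentration(100.0, 5.0, 80.0, 40.0, height_on_slope)
print(height_on_slope, slope / flat)
```

Because the plume height enters through the exponential term, a modest change in the assumed terrain fraction changes the computed concentration sharply, which is one reason the models differ so widely from one another.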

       Inasmuch as the technical details of the models have
       been well summarized by TRC, we did not consider  it
       necessary to repeat the description here.  Similarly,
       since the reviews themselves are contained in  this
       document, there is no point in providing a digested
       version of them.

III.   DATA SETS

       Two data sets were used for the model evaluation,  repre-
       senting a departure from the rural  and  urban  reviews
       that were completed earlier.  This  is of  course  commend-
       able, since it does not leave the evaluation  at  the
       mercy of a single batch of data.  However,  although the
       two data sets are probably the best  that  could have been
       used for the validation study, they  leave much to  be
       desired in terms of completeness and representativeness.

-------
       The Cinder Cone Butte data were the product of an excel-
       lent research project.  The key terrain feature is a
       nearly perfect cone rising from a nearly flat plain, and
       both the concentration data and meteorological measure-
       ments were as detailed as one could possibly hope to
       achieve.  The sources, however, consisted of passive
       releases of tracer gases and smoke which, while they
       provided very reliable concentration data, did not
       represent typical industrial sources in which large
       buoyant plumes are common.  Thus, this set of data is
       unrepresentative in a very important way.

       The second data set, obtained at the Westvaco plant
       site between 1979 and 1981, represented a much more
       typical industrial problem, but the concentration and
       meteorological data obtained were limited.  Further-
       more, one could hardly describe this particular site
       as being ideal from the standpoint of a validation
       study.  The plant is situated in a rather winding
       valley, a configuration which distorted both the flow
       trajectories and the diffusion from an idealized
       complex terrain problem.

       It was necessary to condense both data sets somewhat
       to make them digestible and consistent for this evalu-
       ation.  It was also necessary to limit the study of
       one of the models, IMPACT, to considerably less than
       the full year of Westvaco data, presumably because of
       computer costs.

IV.    PERFORMANCE

       As we have noted earlier, the models tended to over-
       predict very seriously in most applications.  One

-------
       could choose any one of a number of statistical  sum-
       maries to illustrate the point, but the following
       provides the flavor of the results:

             Average Ratios of the 25 Highest One-Hour
            Predictions to the 25 Highest Observations
                Without Regard to Time or Location

                                       Ratios
             Model          WESTVACO SO2     CINDER CONE TRACER
          COMPLEX I              9.2                1.6
          COMPLEX II            19.6                3.8
          4141                   6.2                3.0
          RTDM                   1.7                1.0
          PLUME 5                7.4                2.7
          COMPLEX/PFM            7.7                2.5
          SHORTZ                 6.9                2.1
          IMPACT*               10.0                0.5
          (*IMPACT was run only on selected portions of the
          WESTVACO data.)

       Clearly, gross overprediction is the rule for all of  the
       models other than RTDM on the industrial SO2 data, and
       overprediction still appears in the passive tracer data,
       except for IMPACT, which underpredicted.  Curiously,
       IMPACT was a serious overpredictor in the SO2 evalua-
       tion.
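
A statistic of the kind tabulated above can be sketched as follows. The sketch assumes one plausible reading of the table heading, namely the mean of the 25 highest predictions divided by the mean of the 25 highest observations, with the two lists ranked independently; the exact TRC procedure is not given here, and the function name is hypothetical.

```python
def top_n_ratio(predicted, observed, n=25):
    """Ratio of the mean of the n highest predictions to the mean of the
    n highest observations, unpaired in time or location."""
    top_predicted = sorted(predicted, reverse=True)[:n]
    top_observed = sorted(observed, reverse=True)[:n]
    if not top_predicted or not top_observed:
        raise ValueError("both series must contain at least one value")
    mean_observed = sum(top_observed) / len(top_observed)
    if mean_observed == 0:
        raise ValueError("mean of the highest observations is zero")
    mean_predicted = sum(top_predicted) / len(top_predicted)
    return mean_predicted / mean_observed


# Hypothetical hourly values: the model predicts exactly twice what is observed.
observed_values = [float(v) for v in range(1, 101)]
predicted_values = [2.0 * v for v in observed_values]
print(top_n_ratio(predicted_values, observed_values))
```

A ratio near 1.0 under this measure says only that the upper ends of the two distributions agree; it says nothing about whether the high values occurred at the same hours or receptors.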

V.     CONCLUSIONS

       The AMS members of the Committee feel that the  following
       are the key findings of the evaluation.  These  points
       are not a synthesis of the findings of the reviewers,
       although each of them concurs in several if not all of
       the conclusions.

-------
The models and the data sets against which they were
compared provide a very limited representation of  the
full variety and range of the flow patterns and
diffusion situations actually occurring in complex
terrain.  The evaluation has provided limited data  on
predicted and observed concentrations that might be
expected on nearby slopes and crests, nothing more.
Problems involving internal valley circulations,
diffusion in meandering valleys, slope flow, and
terrain more distant than the immediate obstacles
cannot be assessed with the models discussed in this
study.

The voluminous performance measures developed in this
study, like those in the rural and urban reviews be-
fore it, are difficult to digest.  Both the reviewers
and the Committee would have been happier if the com-
parison against the data had provided more information
about what the models were doing in a limited number
of specific situations.

It is highly unlikely that complex terrain problems
can ever be treated intelligently with a set of
"cookbook" models.  Flow and diffusion in complex
terrain is, and will continue to be, complex.

-------
                 REVIEW OF COMPLEX TERRAIN MODELS
           Prepared for the AMS-EPA Steering Committee
                                by

                         Jason K.S.  Ching *
               Meteorology and Assessment Division
             Atmospheric Sciences Research Laboratory
               U.S.  Environmental Protection Agency
                Research Triangle Park, NC  27711
                             April 1985
* On assignment from the National Oceanic and Atmospheric Administration,
  U.S. Department of Commerce.

-------
                                 ABSTRACT

     A two-stage review of eight complex terrain dispersion models was
performed.  First, the scientific bases for these models, both separately
and as an aggregate, were examined.  Second, the operational performance of
these models against two complex terrain model evaluation data bases, as
prepared by TRC Environmental Consultants, Inc., was reviewed.  The models
are COMPLEX I, COMPLEX II, COMPLEX/PFM, 4141, PLUME5, RTDM, SHORTZ and
IMPACT; references to the documentation of each model are found in the
reference list of the TRC draft report "Evaluation of Complex Terrain Air
Quality Models", Wackter and Londergan (1984).  The first seven models
listed are Gaussian type models; the eighth, IMPACT, is a deterministic
finite difference numerical grid diffusion model.

     The approach taken by each model to parameterize plume rise,
dispersion, transport, terrain adjustment, and mixed layer heights was
examined.  Each model's treatment of terrain features, the stability
dependency of flow over terrain, and its potential importance to complex
terrain modeling were studied in greatest detail.  Terrain adjustment and
transport were found to be treated quite differently by all the models in
this set; in combination with other model components, large differences in
model predictions are to be expected.  The incorporation of potential flow
concepts and Froude number scaling and the use of a critical dividing
streamline height by some models are clearly an important improvement.

     The performance analyses, evaluation and conclusions drawn by TRC, as
reported in Wackter and Londergan, were useful but did not provide
sensitivity diagnostics.  Additional model comparisons are included to
complement the TRC study, comparing the different model predictions with
respect to the methodology adopted by individual models in handling the more
important and sensitive model components.  It was shown that model
prediction differences and errors can be as much a function of inadequate
model methodologies as of the formulation of the basic model components.

-------
TECHNICAL EVALUATION

Introduction

     Air quality simulation models used for assessing the air quality impact
of point sources are based on solving the standard diffusion equation.
Typically, model approaches include numerical simulations on a prescribed
grid or the adoption of Gaussian solutions with ad hoc approaches to
specific circumstances.  The first approach permits a high level of realism,
but computational requirements and model complexities increase with the
level of realism and accuracy required.  Gaussian models are based on
simplistic solutions to the diffusion equation.  The Gaussian approach is a
highly parameterized scheme based on a few empirical studies of simple flows
over uncomplicated surface conditions, and it is clear that most
applications quickly overextend its physical and empirical bases.
[Improvements in the state-of-the-art Gaussian models are possible with
(1) improvements in the parameterizations by extending their theoretical or
empirical bases and (2) more general and consistent ad hoc engineering
modeling approaches to the input data, to boundary conditions and to the
handling of unique conditions.]  The Gaussian modeling approach is highly
practical and relatively inexpensive to use as an assessment tool.
Unfortunately, in most instances, the initial application of a model (even a
sophisticated one) to a specific problem will yield potentially large
errors, many of which may arise out of inconsistencies between the
theoretical limitations and the unique, ad hoc approaches specific to each
model.  A recent article by M. Smith (1984) provides a synthesis of seven
independent reviewers' evaluations of Rural Air Quality Simulation Models.
In that cooperative study between the U.S. EPA and the American
Meteorological Society (AMS), 10 models were studied and evaluated against a
common data base, much as was done in the current study.  In particular,
COMPLEX I and II, PLUME5 and 4141, currently being evaluated, were also
evaluated in that study.  Additionally, a similar evaluation study was
conducted for the EPA/AMS on a comparable set of Urban Air Quality
Simulation Models.  In general, both studies reveal the scientific bases of
these models to be out of date.  Recent improvements in our knowledge

-------
of the structure and evolution of the planetary boundary layer (PBL),
especially during the convective and also during the nocturnal periods,
have not, for example, been incorporated into these models.  Further,
stagnation conditions cannot be handled by these Gaussian schemes, and the
stability and dispersion parameterization classified according to
Pasquill-Gifford-Turner, developed from surface data, may not be appropriate
for tall stacks.  I fully support Smith's (1984) viewpoint and the bases for
his summary, including a statement regarding inherent uncertainties in
predictive schemes and the inadequacies of the input and receptor data bases.

     The technical and scientific bases of the Gaussian models in Smith's
1984 article have not been appreciably improved in the present set of models.
Therefore, relatively speaking, the scientific bases are considerably more
limiting for model application in complex terrain, where each of the processes
modeled is affected in some way by the underlying terrain.  There are
recent theoretical concepts that provide the first real breakthroughs in
extending the Gaussian modeling approach from flat to complex terrain.
First, a basis for neutral flow over simple two- and three-dimensional
obstacles has been postulated based on potential flow theory and fluid
modeling tank experiments.  Second, in stable conditions, a critical
dividing streamline height, below which the flow cannot surmount the
obstacle, can be computed based on Froude number scaling.  This is examined
in greater detail below.
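
The Froude number scaling mentioned above can be illustrated with a minimal sketch. It assumes the simple relation often quoted for uniform wind and constant stratification, Hc = h(1 - Fr) with Fr = U/(N h), where h is the obstacle height, U the approach wind speed, and N the Brunt-Vaisala frequency. The function names are hypothetical, and the models reviewed here may compute the critical height differently.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def brunt_vaisala_frequency(potential_temp_k, dtheta_dz_k_per_m):
    """Buoyancy frequency N (1/s); zero for neutral or unstable layers."""
    if dtheta_dz_k_per_m <= 0.0:
        return 0.0
    return math.sqrt(G / potential_temp_k * dtheta_dz_k_per_m)


def dividing_streamline_height(wind_speed_m_s, hill_height_m, n_per_s):
    """Critical dividing streamline height (m), assuming uniform wind and
    constant stratification so that Hc = h * (1 - Fr), Fr = U / (N * h)."""
    if n_per_s <= 0.0:
        return 0.0  # neutral or unstable: all of the flow goes over the hill
    froude = wind_speed_m_s / (n_per_s * hill_height_m)
    return max(0.0, hill_height_m * (1.0 - froude))


# Hypothetical stable case: 100 m hill, 2 m/s wind, 0.02 K/m gradient at 290 K.
n = brunt_vaisala_frequency(290.0, 0.02)
print(dividing_streamline_height(2.0, 100.0, n))
```

In this form the critical height falls to zero as the stratification weakens or the wind strengthens, so that a plume released at a fixed height may pass over the obstacle in one hour and impinge on it in the next.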

     All the models in this study compute one-, three- and twenty-four-hour
short-term averages from which long-term predictions such as annual averages
can be calculated.  Additionally, annual averages are calculable directly
by some models.  Each model handles multiple sources and large numbers of
receptors.  Input meteorology, emissions and terrain data were treated
somewhat differently by each model.  Default procedures for handling
special limitations such as missing data, stagnation conditions, etc.,
varied between models.  No two models were exactly alike.  For one, each
Gaussian model adaptation to complex topography was different.  Consequently,
these models all varied in their predictive skills against the two common

-------
data bases used in the TRC evaluation.  The most sophisticated models
(RTDM and COMPLEX/PFM) recognized Froude number scaling and incorporated a
critical height for flow separation.  In RTDM, bounds were established which
limit the contribution of reflection to the surface concentration so as to
satisfy the constraint of the second law of thermodynamics, wherein the
surface concentration cannot increase with downwind distance past the first
maximum.  The plume centerline is adjusted for terrain differently by each
model.  Specific model components for each model are discussed below.

Specific Model Components

     Some of the major processes in complex terrain to be modeled include
plume rise, horizontal and vertical dispersion, transport,  transformation
and deposition which are all influenced by the variations in the underlying
terrain.  Table 1 lists some of the more important aspects  of each of these
processes and the extent to which they are included in the  eight models.
In several instances, the treatment of some particular aspect was either  not
ascertainable or ambiguous from the available documentation, and is so
indicated by an X.  The available model documentation varied considerably in
the degree of specificity for different model components.

Plume Rise

     Plume rise is an important dispersion process for buoyant point sources.
Effluents may experience downwash beyond the stack but typically rise to
a final equilibrium height (Ha) where no excess buoyancy remains.  A
finite distance is traversed during plume rise, and environmental air is
entrained into the plume during this transition period.  Plume growth after
final plume rise depends on the magnitude of the turbulent intensity
of the environment.  Plumes during this transition phase do not affect
ground level concentrations unless downwind receptors are located above
stack top, as might be the case for complex terrain.  The theory and
formulations for transition and final plume rise by Briggs (1975 and 1984)
are generally the accepted standards and, with the exception of SHORTZ and
PLUME5, have been adopted by all the models in this set.  The presence of

-------

[Table 1.  Complex terrain model component features for COMPLEX I,
COMPLEX II, COMPLEX/PFM, 4141, PLUME5, SHORTZ, RTDM and IMPACT.  Table
entries are not recoverable.]
-------
ground-based and elevated stable layers will strongly control plume rise
and dispersion. Portions of plumes that penetrate elevated inversions at
height Zi will not impact the near ground level unless receptors are located
at or above this level. Thus, the extent to which plumes penetrate elevated
stable layers greatly influences the subsequent ground level concentration.
In effect, this process acts as a source depletion term and must be
modeled accurately. Most Gaussian plume models have adopted an
all-or-nothing approach whereby only those plumes whose plume rise is less
than the mixed layer height can impact the surface. Such models are burdened
with the need to model Ha and Zi very accurately in order to correctly
determine the sign of Ha - Zi, but both parameters are known to display large
variability, and accurate modeling of these parameters and their difference
is apt to be unsuccessful. This methodology is so critical that extremely
large overpredictions or underpredictions are to be expected.

COMPLEX/PFM (hereinafter PFM), RTDM, and IMPACT are the only models in
this set that address the plume penetration issue by incorporating a
layered plume rise calculation scheme. The other schemes are apt to exhibit
very large variances in their predictions because of their all-or-nothing
treatment of plume rise into elevated stable layers. PFM calculates
plume rise in layers determined from hourly wind and temperature profiles
interpolated from the twice daily radiosondes. The reduction of buoyancy
flux is calculated layer by layer until the final plume rise. In this
fashion, PFM accounts for elevated inversions and vertical wind shears.
However, this calculation is still an all-or-nothing calculation within
any of these stable layers aloft. RTDM and IMPACT calculate the fractional
amount of plume imbedded in the elevated stable layers according to the
Briggs (1975) rectangular plume model. Documentation for PLUME5 was
unclear on this point. Perhaps the most acceptable means of modeling
plume penetration into elevated stable layers are to be found in Briggs
(1984) and Weil and Brower (1982). Briggs (1984) assumes the mixed layer
stability near the inversion base to be the same as that of the thermal
gradient in the elevated stable layer. Weil and Brower (1982) suggest a
-------
more realistic value in the mixed layer. None of the models reviewed utilizes
either approach.
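
The difference between the all-or-nothing and fractional treatments of plume penetration can be shown with a minimal sketch. It assumes one common reading of the rectangular plume idea, in which plume material is spread uniformly between 0.5 and 1.5 times the plume rise above stack top; the function names and this depth assumption are illustrative and are not taken from the RTDM or IMPACT code.

```python
def all_or_nothing_fraction(plume_height_m, mixing_height_m):
    """Fraction of the plume left in the mixed layer: all of it or none."""
    return 1.0 if plume_height_m <= mixing_height_m else 0.0


def rectangular_plume_fraction(stack_height_m, plume_rise_m, mixing_height_m):
    """Fraction of a vertically uniform plume lying below the mixing height.

    The plume is assumed to occupy the layer from 0.5 to 1.5 times the
    plume rise above stack top (an illustrative assumption).
    """
    if plume_rise_m <= 0.0:
        return all_or_nothing_fraction(stack_height_m, mixing_height_m)
    bottom = stack_height_m + 0.5 * plume_rise_m
    top = stack_height_m + 1.5 * plume_rise_m
    fraction = (mixing_height_m - bottom) / (top - bottom)
    return min(1.0, max(0.0, fraction))


# Hypothetical case: 100 m stack, 200 m rise, so Ha = 300 m.
for zi in (290.0, 310.0):
    print(zi, all_or_nothing_fraction(300.0, zi),
          rectangular_plume_fraction(100.0, 200.0, zi))
```

A 20 m change in the mixing height switches the all-or-nothing result between zero and one, whereas the fractional result changes only slightly, which illustrates why the former is so sensitive to errors in Ha and Zi.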

Generally, plume rise is predominantly buoyancy driven. However, in
some instances, it would be more accurate and consistent to include the
contribution by momentum. Currently, RTDM and SHORTZ exclude this feature;
it was indeterminate for 4141, PLUME5 and IMPACT. The inclusion of stack
tip downwash was generally available as an option for most of these models.
These last two features are not expected to be critical to plume
rise as it affects complex terrain model predictions unless the emissions
are predominantly non-buoyant.
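
As a point of reference for the buoyancy-driven plume rise discussed in this section, the following minimal sketch uses the widely quoted Briggs forms: a transitional rise of 1.6 F^(1/3) x^(2/3) / u and a final rise in stable air of 2.6 (F / (u s))^(1/3), where F is the buoyancy flux and s the stability parameter. Momentum rise and downwash are omitted, the function names are hypothetical, and the individual models differ in the details of their implementations.

```python
G = 9.81  # gravitational acceleration, m/s^2


def buoyancy_flux(exit_velocity_m_s, stack_diameter_m, stack_temp_k, ambient_temp_k):
    """Briggs buoyancy flux parameter F (m^4/s^3)."""
    radius = stack_diameter_m / 2.0
    return (G * exit_velocity_m_s * radius ** 2
            * (stack_temp_k - ambient_temp_k) / stack_temp_k)


def transitional_rise(flux, wind_speed_m_s, distance_m):
    """Rise (m) at downwind distance x during the transition phase."""
    return 1.6 * flux ** (1.0 / 3.0) * distance_m ** (2.0 / 3.0) / wind_speed_m_s


def final_stable_rise(flux, wind_speed_m_s, stability_per_s2):
    """Final rise (m) in stable air; s = (g / theta) * dtheta/dz."""
    return 2.6 * (flux / (wind_speed_m_s * stability_per_s2)) ** (1.0 / 3.0)


def plume_rise(flux, wind_speed_m_s, distance_m, stability_per_s2):
    """Transitional rise, capped by the final stable rise."""
    return min(transitional_rise(flux, wind_speed_m_s, distance_m),
               final_stable_rise(flux, wind_speed_m_s, stability_per_s2))


# Hypothetical stack: 10 m/s exit velocity, 2 m diameter, 400 K gas, 300 K air.
flux = buoyancy_flux(10.0, 2.0, 400.0, 300.0)
near = plume_rise(flux, 5.0, 100.0, 0.001)
far = plume_rise(flux, 5.0, 10000.0, 0.001)
print(flux, near, far)
```

The distinction between the two regimes matters in complex terrain because a receptor on elevated ground close to the source may be reached while the plume is still rising.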

Dispersion

After initial buoyancy-induced plume spread, Gaussian plume models
utilize bivariate parameters for plume spread in the vertical and lateral
directions. This spread is prescribed as a function of atmospheric
stability, turbulent intensity and travel distance.
Default values rely on the use of empirical sigma curves according to
stability classes attributed to Pasquill, Gifford and Turner. The
experimental bases for these functions are relatively flat terrain and
near surface, nonbuoyant plumes, and the general applicability of these
curves to complex terrain and to tall stack, buoyant plumes is yet to be
established. With this in mind, the operational complex terrain model that
appropriately incorporates on-site turbulence data (intensity or sigma theta,
sigma phi) for hourly averages is preferred by this reviewer. Only RTDM and
SHORTZ provide a user option to compute σy, σz from input, on-site
turbulence data. In the meantime, studies to derive generalized dispersion
parameterization schemes applicable to complex terrain are highly recommended.
Currently, several major field study programs are underway in complex
terrain which should provide for improvement in dispersion parameterization
(ASCOT, see Gudiksen and Dickerson, 1983; EPA Complex Terrain Model
Development Study, Lavery et al., 1982, 1983, 1984; and Plume Model Validation
Study, Hilst, 1978).
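
The use of on-site turbulence data in place of stability-class sigma curves can be sketched as follows. The sketch assumes the near-source approximation in which plume spread grows linearly with distance, σy = σθ x and σz = σφ x with the angular standard deviations in radians. The optional factors f_y and f_z stand for the distance-dependent reduction functions that operational schemes apply; they and the function name are hypothetical and do not reproduce the RTDM or SHORTZ formulations.

```python
import math


def sigmas_from_turbulence(sigma_theta_deg, sigma_phi_deg, distance_m,
                           f_y=1.0, f_z=1.0):
    """Lateral and vertical plume spread (m) from measured standard
    deviations of horizontal (theta) and vertical (phi) wind angle.

    f_y and f_z are placeholders for distance-dependent reduction
    factors; the default of 1.0 gives the near-source linear limit.
    """
    sigma_y = math.radians(sigma_theta_deg) * distance_m * f_y
    sigma_z = math.radians(sigma_phi_deg) * distance_m * f_z
    return sigma_y, sigma_z


# Hypothetical hourly measurements: 10 degrees lateral, 5 degrees vertical.
sigma_y, sigma_z = sigmas_from_turbulence(10.0, 5.0, 1000.0)
print(sigma_y, sigma_z)
```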
-------
With the exception of COMPLEX I, all the Gaussian models of plume
dispersion in this set were bivariate. COMPLEX I treated the lateral
spread as a constant 22 1/2°, and thus independent of stability. IMPACT
computes dispersion using eddy diffusivities rather than sigma specifica-
tions. In the current version, IMPACT calls a subroutine (DEPICT) which
computes KZ=K u a^i where u=wind speed, a
-------
With the exception of IMPACT, each model included an adjustment to
plume spread arising out of buoyancy-induced entrainment of environmental
air while undergoing plume rise. Each considers this spread as Δh/a
where a is a proportionality constant and Δh is plume rise during transition.
Table 1 lists values of a for each of the models. Differences are as large
as 36%, yet relatively insignificant with respect to uncertainties associated
with other parameters. An additional enhancement of plume spread occurs as a
result of vertical wind shear acting over plume rise. RTDM, SHORTZ and
PLUME5 include this enhancement, which depends on wind shear through plume
rise and distance traveled. Baroclinic conditions and nocturnal wind shear, both
quite prevalent climatologically, suggest that this feature attains considerable
importance in complex terrain for conditions where receptors are at altitudes
in the vicinity of effective stack heights. Pasquill's suggested relationship
of 0.17·Δθ·x can be comparable to dispersion induced by atmospheric turbulence,
especially for the more stable dispersion classes.
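
The relative size of these two enhancements can be illustrated with a short numerical sketch. It is only a sketch under stated assumptions: the text gives the buoyancy term as Δh/a and the shear term as 0.17·Δθ·x, but the value a = 3.5, the use of radians for Δθ, the quadrature combination with the ambient sigma, and all function names are assumptions made here for illustration and are not taken from any model's documentation.

```python
import math

def buoyancy_spread(delta_h, a=3.5):
    """Spread (m) from buoyancy-induced entrainment, delta_h / a.
    a = 3.5 is an assumed illustrative value; Table 1 lists the models' values."""
    return delta_h / a

def shear_spread(delta_theta_rad, x):
    """Lateral spread (m) from directional wind shear, 0.17 * delta_theta * x.
    delta_theta is assumed to be in radians, x in meters."""
    return 0.17 * delta_theta_rad * x

def total_sigma(sigma_turb, *extras):
    """Combine contributions in quadrature (an assumption, not from the text)."""
    return math.sqrt(sigma_turb ** 2 + sum(e ** 2 for e in extras))

sigma_b = buoyancy_spread(delta_h=70.0)                # 70 m of rise
sigma_s = shear_spread(math.radians(10.0), x=5000.0)   # 10 degrees of shear at 5 km
sigma_y = total_sigma(50.0, sigma_b, sigma_s)          # 50 m turbulent sigma (hypothetical)
```

With these hypothetical inputs the shear term is several times larger than the buoyancy term, which is consistent with the remark that it can rival turbulent dispersion in stable conditions.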

There is evidence to suggest that fumigation after sunrise and
incidences of high ground level pollutant concentration are correlated,
both in simple and in complex terrain (Frank, Blagun and Slater, 1981), yet
this process is ignored by all the Gaussian models. It would be of interest
to determine whether IMPACT simulates this process; the current documentation
does not permit this examination. The TRC model evaluation did not provide
analyses sufficiently detailed to determine how critical the process is in
complex terrain, but it is suggested that both data sets could provide
a limited basis for such an analysis.

Transport

Recent theories by Hunt and Mulhearn (1973), Hunt, Puttock and Snyder
(1979), and Hunt and Snyder (1980) provide a basis for dispersion modeling
applicable to non-flat surfaces. Utilization of potential flow theory
permits the generation of streamlines over a variety of surfaces for neutral
to slightly stable conditions. The adjustment of the flow field over the
obstacles causes distortion in plume spread due to compression and expansion
of the streamlines. For flows that are stable or for large obstacle
sizes, a critical dividing streamline height, HC, exists above the obstacle
base for which plumes whose center lines are below HC do not flow over
the obstacles (i.e., ground impaction), while plumes above HC rise over the
impediment. This critical height reduces to zero in neutral to unstable
stability and the plume then flows over the obstacle at all elevations.
There is no comparable theoretical or physical basis for the unstable or
convective boundary layer. One intuitively expects some undulation of the
plume relative to terrain; this undulation can be as large as the underlying
terrain or only slightly distorted. A half-height correction is recommended
by the Specialist Conference on EPA modeling guidelines (Chicago, 1977) as a
suitable and conservative terrain adjustment procedure for screening purposes,
until the advent of more fundamental empirical bases. Airborne lidar
studies (Uthe, 1984) show examples of plumes over terrain which depict
undulations almost parallel to the underlying terrain.
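
As a rough illustration of the dividing streamline idea, the sketch below uses the simple relation HC = h(1 - Fr) often quoted for a hill of height h in uniform flow with constant stratification, where Fr = U/(N h). This relation, the function names, and the sample numbers are assumptions for illustration only; the models reviewed here may compute HC differently.

```python
def froude_number(u, n, hill_height):
    """Hill Froude number Fr = U / (N h); u in m/s, n (Brunt-Vaisala) in 1/s."""
    return u / (n * hill_height)

def dividing_streamline_height(u, n, hill_height):
    """HC = h (1 - Fr), set to zero when Fr >= 1 or when the
    flow is neutral or unstable (n <= 0)."""
    if n <= 0.0:
        return 0.0
    fr = froude_number(u, n, hill_height)
    return max(0.0, hill_height * (1.0 - fr))

def plume_path(plume_height, hc):
    """Plumes below HC impinge on the obstacle; plumes above rise over it."""
    return "impinges" if plume_height < hc else "rises over"

# Hypothetical case: 100 m hill, 2 m/s wind, N = 0.04 1/s, so Fr = 0.5.
hc = dividing_streamline_height(u=2.0, n=0.04, hill_height=100.0)
```

In this hypothetical case HC is about half the hill height, so a plume at 30 m would impinge while one at 80 m would pass over the crest.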

Given this background, we see that only RTDM and PFM utilize the potential
flow approximation and the dividing streamline concept. Without a doubt the
most physically consistent and sophisticated Gaussian model in this set is
PFM. It is by far the most detailed and complete in introducing two and
three dimensional terrain features and some engineering approaches to
incorporate oblique wind direction angle of attack to the major terrain axis.
Further, the relationship for plume spread σy and σz over flat terrain
is calculated as modified by potential flow theory. PFM, however, uses a
rather awkward scheme to compute concentrations; under stable conditions,
concentrations for receptors below the height of the dividing streamline HC
are calculated using COMPLEX I, the monovariate dispersion model and the
concentration above HC utilizes the PFM modified flow fields. PFM defers to
COMPLEX II for its concentration computation under unstable conditions.
PFM doesn't provide for partial reflections to limit concentrations from
increasing with distance as might be possible due to influence of large
terrain slope on multiple reflection calculation scheme.
-------
The RTDM recognizes Froude number scaling and the dividing streamline
concept. This model surveys the gridded terrain contours for each 10 degree
sector and tabulates the distance from the source to the successive predeter-
mined contour intervals, stopping at the highest point within this sector.
RTDM ignores receptors at successive terrain features farther downwind, a
somewhat artificial restriction, but consistent for screening considera-
tions. Note that RTDM makes no adjustment to the sigma curves over terrain
as PFM does. RTDM is the only Gaussian model in this set to limit concen-
trations from increasing with downwind distance to avoid violating the
principle of the 2nd law of thermodynamics from a multiple reflection calcu-
lation scheme applied to a steep terrain feature.

RTDM and SHORTZ are the only models to extract their wind profile power
laws from on-site data and, along with the IMPACT model, consider the transport
wind to be the average between stack top and final plume rise (where is
eq. 17 in the PLUME5 documentation?).

None of the models include surface uptake by dry deposition (i.e.,
source depletion), and capability for photochemistry was limited to PLUME5
and IMPACT.

Terrain adjustment

Believing that methodology for treatment of plume and flow adjustment
to underlying terrain is critical in complex terrain models, this aspect was
examined in some detail. I thought it useful to provide the details of my
comparative analyses (all figures in this review are taken from the documen-
tation provided). I concluded that none of the models (except for COMPLEX
I and II) utilize the same modeling procedures for adjustment of plume
height over terrain. It is suggested that large variance in prediction may
be caused inadvertently by a variety of rather artificial modeling procedures.
This will become evident when comparing model output against the Westvaco
data set as discussed later. The clarity and quality of the model procedures
for terrain adjustment varied widely between model documentation. COMPLEX
I and II handle terrain adjustment as indicated in Figure 1a. Thus,
-------
Figure 1a. COMPLEX I, II (PFM), 4141.
Figure 1b. (COMPLEX PFM).
Figure 2. (RTDM)
-------
Ha = H - (1 - FT) ΔE

where Ha is the adjusted effective plume height

H is the equilibrium plume rise above flat terrain
ΔE is ZR - ZS, where
ZR is receptor height and
ZS is elevation of base of source
FT is terrain adjustment factor

For unstable to neutral conditions, FT is set to 0.5; it is zero for stable
flows. Terrain adjustments are limited to receptors lower than the top of
the lowest stacks modeled. Thus,

Ha = H - 0.5 ΔE    unstable
Ha = H - ΔE        stable

There is no provision for calculating the height of the dividing streamline,
or HC in these two models.

The terrain factor is zero for stable flows; thus the plume is horizontal
and the plume can impact the surface for all ΔE ≥ H. In this case, COMPLEX
I and II perform VALLEY-like impingement calculations. These two models
do not recognize a dividing streamline height; plume rise above the mixed
layer height is ignored.
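
The COMPLEX I and II adjustment described above can be sketched in a few lines. This is a minimal illustration, not the models' code: the function name and arguments are hypothetical, the clamp of negative heights to zero is an assumption standing in for the impingement calculation, and the restriction to receptors below the lowest stack top is deliberately omitted.

```python
def complex_adjusted_height(h, z_receptor, z_source, stable):
    """Adjusted plume height Ha = H - (1 - FT) * dE.

    h          : equilibrium plume height above flat terrain (m)
    z_receptor : receptor elevation ZR (m)
    z_source   : elevation of the base of the source ZS (m)
    stable     : True for stable flow (FT = 0), False otherwise (FT = 0.5)
    """
    ft = 0.0 if stable else 0.5
    delta_e = z_receptor - z_source
    ha = h - (1.0 - ft) * delta_e
    # Assumption: a non-positive result is treated as impaction (Ha = 0).
    return max(ha, 0.0)

# Hypothetical plume at 200 m with a receptor 150 m above stack base.
ha_unstable = complex_adjusted_height(200.0, 150.0, 0.0, stable=False)
ha_stable = complex_adjusted_height(200.0, 150.0, 0.0, stable=True)
```

For the same receptor, the stable case leaves the plume much closer to the terrain than the half-height case, which is why the stable treatment is the more conservative of the two.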

In PFM, for stability classes D, E and F, for Ha > HC
Ha = (H - HC) HFAC               for ZR > HC
and
Ha = (H - HC) HFAC + (HC - ZR)   for ZR < HC

HFAC is determined from application of potential flow theory and the plume
path shown in Figure 1b to be along a streamline at plume rise, H. In this
situation, concentrations for receptors at heights less than HC are zero.

For plumes below HC, the plume centerline is not adjusted for terrain and
COMPLEX I is invoked to perform the impingement calculations, which is
known to give a conservative prediction.

When HC is zero, as in neutral to unstable conditions, it can be shown
that HFAC = 1 - 0.5 ΔE/H for FT = 1/2 as in COMPLEX I and II. For these flows,
PFM utilizes COMPLEX II to compute the concentration fields, after plume
rise and mixed layer depths have already been computed using the PFM approach.

Model 4141, also shown in Figure 1a, utilizes a modified CRSTER approach, where

Ha = H - (1 - FT) ΔE
FT = 0.5 for A, B, C and D stability and
FT = 0.25 for E, F stability.

This differs from COMPLEX I and II in the stable classes and is less
conservative. ΔE is limited to positive values, i.e., receptors are assumed
to be no lower than stack base.

If Ha is less than zero, the effective stack height is set to zero and the
plume impacts the surface. Moreover, terrain adjustment is limited to those
receptors at heights less than the top of the lowest stack considered. This
stipulation is the same for COMPLEX I and II.

PLUME5 (see Figure 3) considers plume rise relative to the location of
the stable layer. When Ha is above the top of the stable layer, χ/Q = 0 for
receptors below ZT and also for receptors outside a stable layer in which Ha
is imbedded. In the case of Ha below elevated stable layers over terrain,
FT = 0 for receptors located below Ha/2, and Ha = H - ΔE (i.e., plume impaction
possible). For receptors above Ha/2 the height of the terrain
relative to plume rise is

Ha = (Ha/2) [ (ΔE - 0.5 Ha) / (0.9 ZB - 0.5 Ha) + 1 ]

This approach is quite different in form from the previous models (there is
no simple form for FT, for example), and depends on ZB, the height of the
base of the elevated inversion. This elaborate technique was not justified
on any physical principle.

RTDM recognizes a critical height HC for which only the air above this
level can flow over the terrain; plumes below will impinge onto the terrain.
For neutral to unstable conditions, HC = 0. Using HC as the reference level,
RTDM then utilizes a plume path coefficient, C, which is exactly the same
as FT, with a default value of 0.5 for all stabilities (as was used in the
TRC model performance). Flows immediately above HC have little but sufficient
energy to crest the terrain, so

Ha = FT Hpc

where Hpc = height of plume above HC

(FT = 0 if Hpc < 0; plume below HC implies terrain impaction)

For plume height at or above terrain crest,

Ha = Hpc - (1 - FT) HTc

where HTc is the height of terrain above HC.

This methodology is illustrated in Figure 2.
-------
(a), (b): Mixing Height. TERRAIN TREATMENT WITHIN MODEL
Note: R1-R5 are receptor points at 5 ring distances.
Figure 3. (PLUME5).
Legend: EPA (STABLE CONDITIONS), CRAMER (ALL CONDITIONS), NOAA (STABLE CONDITIONS) WHERE CLOSEST APPROACH TO RECEPTOR; EPA AND NOAA (NEUTRAL AND UNSTABLE CONDITIONS); ERT (ALL CONDITIONS)
Figure 4. (IMPACT).
-------
When the flow is neutral to unstable, Hpc = H

and Ha = FT H               for H < ΔE

    Ha = H - (1 - FT) ΔE    for H > ΔE

Then, since FT = 0.5, this formulation is the same as for COMPLEX I and
II for H > ΔE for those stability groups, but without the limitation on a
receptor to be lower than the top of the lowest stack. Appropriate
adjustments are made to ZT to avoid inconsistencies.
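
The RTDM plume path rules summarized in this review can be collected into one function. This is a hedged sketch of the logic as described here, not RTDM's actual code: the function name and argument names are hypothetical, and clamping the terrain height above HC at zero is an assumption for receptors lying below HC.

```python
def rtdm_adjusted_height(h, hc, terrain_height, ft=0.5):
    """Adjusted plume height following the rules described in the text.

    h              : plume height above stack base (m)
    hc             : critical dividing streamline height HC (m); 0 if neutral/unstable
    terrain_height : receptor terrain elevation above stack base (m)
    ft             : plume path coefficient (default 0.5)
    """
    hpc = h - hc                              # plume height above HC
    if hpc < 0.0:
        return 0.0                            # plume below HC: terrain impaction
    htc = max(terrain_height - hc, 0.0)       # terrain height above HC (assumed >= 0)
    if hpc < htc:
        return ft * hpc                       # plume below the terrain crest
    return hpc - (1.0 - ft) * htc             # plume at or above the crest

# Neutral case (HC = 0): reduces to the half-height form H - 0.5 * dE.
ha_neutral = rtdm_adjusted_height(200.0, 0.0, 100.0)
```

The two branches agree when the plume is exactly at crest height, so the adjusted height varies continuously with terrain elevation.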

The method utilized in SHORTZ is illustrated in Figure 5a and b. SHORTZ
restricts ground level concentration calculations to receptors below Hm
where Hm is the actual mixed layer height above the source and is not
modified by terrain (Figure 5a). Thereafter, for computational purposes, an
effective mixed layer height (Figure 5b) is utilized, restricted to be no
less than Hm. Plumes above Hm do not contribute to ground level concentrations.
Plume centerlines below Hm are presumed to be unaltered by the presence
of terrain (i.e., FT = 0). The SHORTZ approach is thus an ad hoc engineering
method without a sound physical basis. Statistics that are heavily weighted
by receptors generally located above Hm will show underprediction; receptors
below Hm will, on average, be overpredicted. Thus, model performance statistics
are complicated by opposing tendencies for overprediction due to lack of
terrain adjustment to plumes, and the assignment of zero concentrations to
receptors located above the mixed layer, the latter expected to be more
pronounced during the night under stable conditions when Hm is likely to be
low. Their use of the term "surface mixing layer" is a bit archaic and can
be misleading since they really intend the "mixed layer". (Section 2.1.1.1,
alluding to a definition of mixing depth by use of turbulent intensity rather
than by thermal stratification alone, is missing in their documentation).

IMPACT is unique among this set. It is a deterministic finite difference
numerical grid model which predicts the concentration fields in three
dimensions based on the mass conservation laws for species. The flow field
is nondivergent throughout and terrain following at the surfaces. Dispersion
from point sources into the flow field is accomplished by means of
diffusivity prescribed as a function of stability. The accuracy is determined
to some extent by the grid resolution, which must be chosen carefully to
balance both practical computational requirements and accuracy. Terrain
considerations are thus handled implicitly as part of the numerical model
rather than explicitly as in the Gaussian models.

Figure 5. (SHORTZ).
(a) Mixing depth H*{zs} used to determine whether the stabilized plume is contained within
the surface mixing layer. (Labels: Top of Mixing Layer; Mixing Depth Measured at Airport;
Assigned Source; no calculations made for grid points with terrain elevations above top of
mixing layer (MSL) at airport.)
(b) Effective mixing depth H'{z} assigned to receptors for the concentration calculations.
(Labels: Effective Top of Mixing Layer; Effective Mixing Depth; Mixing Depth Measured at
Airport Equals Minimum Depth; Assigned to Receptor; no calculations made for grid points
with terrain elevations above top of mixing layer (MSL) at airport.)

Mixed layer height (Hm)

The mixed layer height is an important dispersion parameter in Gaussian
models. Certainly, this term appears explicitly as a reflection condition.
Its criticality is most pronounced, however, with respect to criteria for
partial plume penetration. Hm remains a reflecting surface for plumes
until plume rise is larger relative to Hm. Then Hm may be modeled as an
insulating surface. In complex terrain, receptors may lie above, at or
below Hm, and thus the model's predicted concentration will be subject to the
accuracy of predicting Hm and Ha as well as the modeling strategy. It can
be shown that enormous prediction errors are possible due to improper modeling
strategy using the Gaussian model. Numerous mixed layer growth theories
and parameterization approaches now exist, especially for the convective
boundary layer. None of these models apply such approaches, however.
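
The role of Hm as a reflection condition can be sketched with the textbook image-source form of the Gaussian vertical term. The sketch is illustrative only: it is the generic form rather than the code of any model reviewed here, the all-or-nothing treatment of a plume above Hm is a simplification of the insulating-surface idea, and the function name and default number of image terms are assumptions.

```python
import math

def vertical_term(z, he, sigma_z, hm=None, n_images=5):
    """Vertical factor of a Gaussian plume with reflection at the ground
    and, if hm is given, at the mixed layer top.

    z, he   : receptor height and effective plume height (m)
    sigma_z : vertical dispersion parameter (m)
    hm      : mixed layer height (m), or None for no lid
    """
    def g(d):
        return math.exp(-0.5 * (d / sigma_z) ** 2)

    if hm is None:
        return g(z - he) + g(z + he)          # ground reflection only
    if he > hm:
        return 0.0                            # plume above Hm: insulated from the surface
    total = 0.0
    for n in range(-n_images, n_images + 1):  # image sources spaced 2*Hm apart
        total += g(z - he + 2 * n * hm) + g(z + he + 2 * n * hm)
    return total

no_lid = vertical_term(0.0, 100.0, 50.0)
with_lid = vertical_term(0.0, 100.0, 50.0, hm=150.0)
above_lid = vertical_term(0.0, 160.0, 50.0, hm=150.0)
```

The jump from `with_lid` to `above_lid` for a small change in plume height shows why errors in Ha relative to Hm can translate into very large prediction errors.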

Typically, Hm exhibits large diurnal changes which are modulated by
synoptic scale conditions and by season and geography. In the present case
of complex terrain modeling, topography will certainly influence the spatial
distribution of Hm, as well as Ha as discussed earlier. For this latter
reason, the accuracy of Gaussian models is truly sensitive, even critically
dependent on model methodology for plume rise H, or Ha relative to Hm.

COMPLEX I and II are similar. Mixed layer heights are preprocessed
from twice daily radiosonde soundings and linearly interpolated for hourly
values (see Figure 6). Hm is linearly interpolated from the previous day's
maximum value to the present day maximum unless the surface is stable at
sunrise at which time the interpolation proceeds linearly from about sunrise
using the morning minimum mixed layer values and the standard Holzworth (1972)
technique. A plume with final rise above Hm is considered insulated from the
surface. This interpolation scheme for Hm is rather crude and lacks temporal
resolution to deal with the rapid rise in the morning transition period.
Plume rise, discussed earlier, has a 1/2 height terrain adjustment and thus,
if Hm is level, the assumption of insularity will cause the average predicted
concentration at receptors above Hm to be relatively low, and those at or just
below to be maximum (i.e. near plume centerline). Thus large variations are
expected using those models, and especially if receptors are distributed
such that they are above the stable inversions at night and below during
the day. Note also that terrain factors are limited to those receptors
below stack top. Since plume rise relative to mixed layer heights is so
critical, model predictions for a hypothetical set of receptors lower than
stack top will be quite different from those receptors above stack top.
This restriction seems unnecessary, arbitrary and defeating and should be
considered for removal.

The PFM methodology for determining the temporal variation of Hm is
considered the superior approach in this set of Gaussian models that uti-
lizes twice daily soundings. It is an improved Benkley-Schulman (1979)
approach providing better vertical resolution of the temperature advection.
Unfortunately, no allowance is made to alter mixed layer height as a function
of topography, even though its terrain adjustment procedure was probably
the most sophisticated and soundly based theoretically. Thus, it too, as
with other similar models is expected to suffer large predictive variances.
Layered plume rise certainly provides improvements over the COMPLEX models.

Model 4141 (see Figure 8) utilizes procedures for the MPTER model in
developing hourly mixing heights. This procedure is identical to COMPLEX I
and II.

Figure 8. Mixing height as used by 4141. (Time axis marked at sunrise, 1400 and sunset.)
Figure 9. Mixing height algorithms used in PLUME5: (a) urban, (b) rural. (Time axis marked
at sunset, midnight, sunrise and 1400 LST for yesterday, today and tomorrow; labeled values
MXHT and MNHT; line styles for neutral, stable and both.)
-------
PLUME5 (see Figure 9) utilizes a similar procedure for determining mixed
layer heights as COMPLEX I and II with the following difference. If the
surface layer is stable between sunset and midnight and from midnight to
sunrise, the model uses a minimum value of Hm determined from the morning
sounding; otherwise Hm is obtained by interpolation between the previous and
current day's maximum Hm. If the hour before sunrise is stable, the mixed
layer height is obtained by linear interpolation between the minimum Hm of the
morning sounding and the maximum Hm of the afternoon sounding. This is a significant
difference since Hm doesn't drop to a minimum value for stable conditions at
night in COMPLEX I and II and 4141, but it does for PLUME5. The ramifications
for concentration predictions will probably be large differences in model
outcome between those two approaches as discussed above.
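
The difference between the two nighttime treatments can be made concrete with a small sketch. It is a simplification under stated assumptions: only the nighttime branch is shown, the sunrise rule is omitted, and the function names, the scheme labels, and the sample heights are hypothetical rather than taken from the model codes.

```python
def lerp(start, end, frac):
    """Linear interpolation; frac runs from 0 (start) to 1 (end)."""
    return start + (end - start) * frac

def night_hm(scheme, stable, frac, prev_max, today_max, morning_min):
    """Mixed layer height for a night hour between two afternoon maxima.

    scheme      : "COMPLEX" (COMPLEX I, II and 4141 style) or "PLUME5"
    stable      : True if the surface layer is stable for this hour
    frac        : elapsed fraction of the interval between the two maxima
    prev_max    : previous day's maximum Hm (m)
    today_max   : present day's maximum Hm (m)
    morning_min : minimum Hm from the morning sounding (m)
    """
    if scheme == "PLUME5" and stable:
        return morning_min                    # drops to the morning minimum
    return lerp(prev_max, today_max, frac)    # stays on the max-to-max line

# Hypothetical stable night hour, halfway between the two maxima.
hm_complex = night_hm("COMPLEX", True, 0.5, 1500.0, 1000.0, 200.0)
hm_plume5 = night_hm("PLUME5", True, 0.5, 1500.0, 1000.0, 200.0)
```

With these illustrative numbers the two schemes differ by more than a factor of six in Hm for the same hour, enough to place a given plume or receptor on opposite sides of the mixed layer top.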

RTDM doesn't specify any modification to procedures used in the CRSTER
model for computing its mixed layer heights. In this regard RTDM methodology
is identical to COMPLEX I, II and 4141. But terrain adjustments are consider-
ably different as discussed earlier.

The SHORTZ model strongly recommends the option of user input mixed layer
data. However, it defaults to a CRSTER-like calculation but substitutes
2.5 times the significant roughness element height, Z0, for the minimum Hm in
stable conditions. This procedure is totally arbitrary, however. For one,
the user has no guidance for a proper choice of appropriate Z0. Once
determined, the Hm value is used to eliminate receptors from consideration
in concentration prediction for the hour when the receptors are above Hm.
Then the effective Hm for purposes of the Gaussian model is considered
terrain following, but level above any terrain depression below stack base
(see Figure 5).

In terms of predictive errors attributable to model uncertainties in the
relative values of Ha and Hm, the nocturnal period is likely to be the critical
period. It is apparent that the current set of model formulations for
nocturnal Hm will need to be improved. It is clear that the study of nocturnal
mixed layer evolution is an active research area and any formulation and
even definition is apt to be controversial. Nevertheless, an effort to
incorporate state-of-the-art models for Hm through a literature review and
assessment program is highly recommended. The use of acoustic sounders to
obtain nocturnal mixed layer depths for direct input into model schemes may
provide an attractive alternative approach.

IMPACT is a numerical grid model coded in modular form to treat separately
the transport wind field, diffusivity, plume rise, stability, and
chemistry. The current version of IMPACT utilizes WEST, an objective analysis,
terrain dependent, three dimensional and divergence free wind field
model. Input wind observations are projected upward by power law extrapolation,
unless sounding data are available. Atmospheric stability data are
interpolated with an r^-2 weighting, which in turn controls the transparencies
of the horizontal or vertical grids. Initial horizontal wind fields
are obtained using a 1/r^2 weighting; then the interpolated winds are vertically
shifted to clear the terrain. Utilizing the gridded transparencies,
the wind fields are then made divergence free. At this stage, we have no
clear discussion on the sensitivity and general applicability of the relationship
of stability to transparencies. Apparently, the table lookups for
transparency values have been predetermined on the basis of prior simulations
on idealized problems and are therefore only qualitatively correct
for general applications. The generality of this method is thus not
addressed. The discussion on the subject of transparencies is ambiguous in
its presentation and its technical merit therefore could not be adequately
judged. However, its sensitivity to stability is quite pronounced. Plume
rise is computed by a separate module, and the current version utilizes
Briggs' 1975 formulations. The module presently recognizes the first
inversion layer, ignoring others, a potential source of error. Plume
transport utilizes a Crowley second order flux corrected scheme which
apparently is sensitive to the choice of the direction of the flux correction
relative to the actual direction of the flow field. Typically, numerical
diffusion is minimal, and in general, overpredictions are prevalent. Alternative
diffusivity schemes are available by choice of module.
The one adopted here uses a K diffusion dependent on wind speed, σe, the
standard deviation of the vertical wind and A, a turbulent length scale,
each dependent on stability in some manner. Aside from possible individual
module parametric limitations and requirements for a practical code, the
model user must find the appropriate compromise between desired accuracy
which is a function of grid resolution and the high cost of running the
model. It is clear that finer resolution exacts a very large cost increase
to derive a solution. Also, the modelers report large model inaccuracies
in the near source region, potentially affecting its application for sources
in deep valleys and canyons adjacent to the sources. The wind field is
computed elegantly and physically consistent with mass conservation, but is
not valid for situations where winds are transient and nonuniform such as
associated with a nocturnal jet above terrain. A power law extrapolation
would fail in such a situation. (This would apply equally to those Gaussian
models that utilize similar vertical extrapolation procedures). One clear
advantage of IMPACT is the removal of terrain adjustment requirement since
the plume dispersion is solved as part of the solution. However, once
again, the accuracy of WEST will depend greatly on the quality, resolution
and frequency of the input wind and stability data. IMPACT does not require
an explicit specification of Hm.
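
The 1/r^2 weighting used to initialize the horizontal wind field can be illustrated with a generic inverse-distance interpolation. This is a sketch of the general technique only, not the WEST code: the function name, the station tuple layout, and the handling of a grid point that coincides with a station are assumptions.

```python
def idw_wind(stations, x, y, power=2.0):
    """Inverse-distance-weighted wind at (x, y).

    stations : list of (xs, ys, u, v) tuples, one per wind observation
    power    : exponent of the distance weighting (2 gives 1/r^2)
    Returns the interpolated (u, v) components.
    """
    wsum = usum = vsum = 0.0
    for xs, ys, u, v in stations:
        r2 = (x - xs) ** 2 + (y - ys) ** 2
        if r2 == 0.0:
            return (u, v)                 # grid point sits on a station
        w = r2 ** (-power / 2.0)          # weight = 1 / r**power
        wsum += w
        usum += w * u
        vsum += w * v
    return (usum / wsum, vsum / wsum)

# Two hypothetical stations 10 km apart reporting 2 and 4 m/s westerly flow.
obs = [(0.0, 0.0, 2.0, 0.0), (10.0, 0.0, 4.0, 0.0)]
u_mid, v_mid = idw_wind(obs, 5.0, 0.0)
u_near, v_near = idw_wind(obs, 2.0, 0.0)
```

The interpolated field depends entirely on the density and placement of the observations, so a nocturnal jet lying between stations or above them would not be represented.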

Summary - Technical Evaluation of Complex Terrain Models

This set of Gaussian models exhibits wide variations in model adaptation
to complex terrain. The degree of realism is of course quite limited; however,
PFM and RTDM introduce Froude number scaling and the use of a critical
dividing streamline height which greatly enhance their scientific bases. For
this reason, both models show great promise. However, both have numerous
identifiable technical and modeling deficiencies that must be addressed to
become more consistent and hopefully more accurate. Technically, PFM is
clearly the superior model in its handling and methodology of incorporating
potential flow theory into a Gaussian framework.

The Gaussian models by necessity utilize various ad hoc engineering
approaches to modeling plume rise, transport winds, mixed layer heights,
stability, dispersion, boundary layer structure, terrain adjustment, partial
plume penetration, limits on reflections (as mathematical artifact) as well
as handling input meteorological, emissions and terrain data, all of
which can potentially introduce extremely large errors in predicted concentrations.
With the exception of plume rise formulations, the treatments of the other model
components are generally archaic at best and/or scientifically
deficient. These were discussed above. The IMPACT model was the only
deterministic numerical grid, non Gaussian type model in this set. An
attractive feature of IMPACT is its construction in modular form, which
therefore permits relatively easy upgrading to permit some maintenance of a
state-of-the-art status. Greater realism and sounder physical bases are
potentially possible with this type of model. This model however requires
some ambiguous and rather arbitrary weighting schemes (method of transpar-
encies based on stability classes) to complete the wind field analysis, and
as such this was not clearly and convincingly argued. This model is relative-
ly expensive to run and its accuracy in point source dispersion prediction
depends in part on fine space and time scale grid resolution which must be
at least as small as or smaller than the plume.

Besides quick fixes to ad hoc modeling approaches, especially for the
Gaussian models, there is need for such models to be able to handle fumiga-
tion, down slope flow and flow channeling effects. Additionally, improved
input data from remote sensors such as acoustic sounders and lidars for
mixed layer depth, plume height and range resolved turbulence intensity,
and wind and wind shear is now technically possible and commercially avail-
able. Properly deployed, these improved input data and upgraded-improved
models will potentially be quite general and yield far more accurate model
predictions than currently available. Some of the present models provide
options for user input data such as these. Such options are highly desirable.

In the next section, a limited analysis is conducted using TRC's
model evaluation data. It will be shown that model errors are as much a
function of inadequate modeling methodologies as of parameterization formula-
tions for the basic physical processes.
-------
MODEL PERFORMANCE EVALUATION

Review of the Model Evaluation Study

Performance statistics were generated and used to evaluate eight
complex terrain air quality models. This study, performed by TRC, utilized
data bases from two different sites, Cinder Cone and Westvaco, each of
different terrain characteristics. The results of their evaluation and
tables of performance statistics are reported in the TRC report by Wackter
and Londergan (1984).

Cinder Cone is a rather symmetrical and simple shaped 3-D hill; data
collection was limited to non buoyant plumes in periods of neutral to
stable stratification, of relatively short duration but provides high
resolution receptor density. The one year of data at the Westvaco site is
for a tall stack buoyant plume in rugged and complicated terrain with
limited spatial resolution on observations over the complete range of
atmospheric stability. TRC prepared and ran these models using a common
set of input data agreed to by the modelers and subsequently generated a
set of extensive model evaluation statistics against the two evaluation
data bases. Their analysis plan followed the recommendations of the 1980
AMS/EPA workshop on model performance evaluation to study bias, scatter,
correlation and frequency distributions (Fox, 1981). TRC's study included
both paired and unpaired comparisons, on highest, second highest and one,
three, and 24 hour averages. The result is a very extensive matrix of
comparisons.
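
The kinds of statistics involved can be sketched as follows. This is an illustration of paired bias, scatter and correlation and of an unpaired comparison of ranked values, not TRC's procedure: the sign convention for bias (predicted minus observed), the use of the standard deviation of residuals as the scatter measure, and the function names are assumptions.

```python
import math

def paired_statistics(observed, predicted):
    """Bias, scatter and correlation for values paired in time and space."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    bias = sum(residuals) / n                       # mean of predicted - observed
    scatter = math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {"bias": bias, "scatter": scatter,
            "correlation": cov / math.sqrt(var_o * var_p)}

def unpaired_highest(observed, predicted, rank=1):
    """Compare the rank-th highest observed and predicted values,
    regardless of when or where each occurred."""
    obs_sorted = sorted(observed, reverse=True)
    pred_sorted = sorted(predicted, reverse=True)
    return obs_sorted[rank - 1], pred_sorted[rank - 1]

# Hypothetical concentrations; the model overpredicts by a factor of two.
stats = paired_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
second_highest = unpaired_highest([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], rank=2)
```

A model can show perfect correlation and still carry a large bias, as in this hypothetical case, which is one reason the full matrix of paired and unpaired measures is needed.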

TRC's documentation on scope, rationale, means of performance and data
interpretation was well done. There is little quarrel with their general
interpretation, discussion and conclusions. One finds this analysis a
useful start towards an evaluation of these complex terrain models, and
moreover, one has the published results with which to conduct further
comparative analysis. Overall, this was an important study which led to the
result that the performance of this set of complex terrain models against these two
data bases ranges from a qualified good to very poor. In general, TRC
finds this set of models to overpredict the Westvaco data by factors ranging
from 2 to 20! The analyses suggest the importance of properly modeling
source characteristics and terrain configuration because the model performance
exhibited large differences between the Westvaco and the Cinder Cone evalua-
tion statistics, with respect to overall or gross errors. Each of the
models varied in their relative performance depending on the nature of the
test, whether it be with different data bases, different averaging periods,
with stability, wind speed, highest, second highest and so on. In considera-
tion of the technical deficiencies discussed earlier, I would recommend
against a premature ranking and judgment of these models based upon the
TRC study. In my opinion, at this point, one has little confidence in
predicting the outcome of an extension of these models to different terrain
setting for different sources, for different geographic locations.

Rather, I would like to suggest that the TRC study provides a very
useful and important beginning towards identifying inadequacies in each of
these models. It is instructive and highly recommended to use these data
bases and model runs in conducting detailed sensitivity analyses. It is
conceivable that orders of magnitude improvements are possible with simple
trouble shooting exercises. The technical evaluation discussed in Part I
anticipated potentially large errors; the performance study confirms this.

Other Diagnostics

I started by performing some additional analyses of the published
data to highlight potential areas of critical model sensitivity (Tables 2
through 5). It was clear, however, that the interpretation and the conclu-
sions that could be drawn were still rather limited and that further
computation and analyses will be required for adequate sensitivity tests.
These will be discussed in a set of recommendations to follow.
-------
Table 2 provides a summary for the performance category of "25 Highest",
the "Highest" (paired in time and paired by station), and "All Events"
paired by both time and station using only hourly averages, for both Cinder
Cone and Westvaco, and for runs with and without IMPACT in terms of over or
under prediction (Second Highest class was not included in this summary
table, but should be similar to that of the Highest class). Values for
overprediction refer to the ratio of predicted to observed while conversely,
values listed as under predictions refer to the ratio of observed to predicted.
At a glance, a general pattern of model overprediction is apparent. Individual
models vary greatly in the accuracy of prediction, but large errors are
exhibited by most of the models, especially for Westvaco. On the other
hand, the numbers in parentheses represent prediction falling within a
factor of two; when considering the potential modeling pitfalls, such
success is rather remarkable.
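
The ratio convention described above can be written out as a short sketch. The function names below are hypothetical and are not taken from the TRC report; the only rules assumed are the ones stated in the text (predicted/observed for overprediction, observed/predicted for underprediction, parentheses for values within a factor of two).

```python
def prediction_ratio(predicted, observed):
    """Return (direction, ratio, within_factor_of_two) for one comparison.

    Overprediction is reported as predicted/observed and underprediction
    as observed/predicted, so the ratio is always >= 1.
    """
    if predicted <= 0 or observed <= 0:
        raise ValueError("both values must be positive for a finite ratio")
    if predicted >= observed:
        direction, ratio = "over", predicted / observed
    else:
        direction, ratio = "under", observed / predicted
    return direction, ratio, ratio <= 2.0


def format_entry(ratio, within_factor_of_two):
    """Format a table entry; factor-of-two successes go in parentheses."""
    text = f"{ratio:.2f}"
    return f"({text})" if within_factor_of_two else text
```

Under this convention a model that predicts 80 units against an observed 100 would be listed in the underprediction rows as (1.25).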

In the 25 Highest category, only IMPACT underpredicted, and that was
limited to Cinder Cone. RTDM was consistently the most successful for the
three test categories. In general, almost all models were more successful
with Cinder Cone than with Westvaco. This should be explored further. IMPACT
exhibited the largest inconsistency between the two data sets.
COMPLEX II was the least successful. These findings are consistent with
those reported in the TRC report. We notice a fairly large improvement, of
order 50%, in model performance using the Westvaco data when comparing
results obtained with the full data set to those from the more limited set
applicable to the IMPACT runs. In this case, the data set was limited to 20
randomly selected days out of those sets of days with the six highest observed
concentrations at each of the 10 monitors. This analysis is therefore
biased, and as a result the overprediction was expected to be, and in fact
was, smaller. COMPLEX I was unexpectedly more successful than COMPLEX II in
just about every category throughout all the tables, since σy was restricted
to a constant value of 22.5°, independent of stability. Other performance
measures for COMPLEX I and II were similar. One suspects that the large
overpredictions for COMPLEX I and II arise out of the "VALLEY-like"
computations where plumes are assumed to impinge onto the surface. Interestingly,
PFM does not show much better success than the other Gaussian models in
general. This may be due in part to invoking the COMPLEX I and/or II
schemes for flow below HC and for unstable stability class respectively.

Comparison of the Highest paired in time shows much improved results
over the 25 High. RTDM still performs well, showing skill at Cinder Cone,
but underpredicts at Westvaco. When paired in time, IMPACT also underpredicts
for Cinder Cone. When paired by station, RTDM is the most successful. All
models overpredict; IMPACT was the least skillful, overpredicting by a factor
of more than 20 for Westvaco. Further, when paired by station, the models
tended to overpredict to a greater extent than in the paired-in-time
category.
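
To make the pairing categories concrete, the sketch below shows one plausible reading of them for arrays of hourly concentrations with shape (hours, stations). The exact TRC definitions are not reproduced in this section, so the function names and the interpretation of each category are assumptions for illustration only.

```python
import numpy as np


def top_n_unpaired(obs, pred, n=25):
    """N highest observed and N highest predicted, ignoring time and place."""
    o = np.sort(np.asarray(obs, float).ravel())[::-1][:n]
    p = np.sort(np.asarray(pred, float).ravel())[::-1][:n]
    return o, p


def highest_paired_in_time(obs, pred):
    """Per hour: highest value over all stations (location unpaired)."""
    return np.max(obs, axis=1), np.max(pred, axis=1)


def highest_paired_by_station(obs, pred):
    """Per station: highest value over all hours (time unpaired)."""
    return np.max(obs, axis=0), np.max(pred, axis=0)


def all_events_paired(obs, pred):
    """Every hour/station pair, matched in both time and location."""
    return np.asarray(obs, float).ravel(), np.asarray(pred, float).ravel()


def mean_ratio(obs_values, pred_values):
    """Ratio of mean predicted to mean observed (> 1 means overprediction)."""
    return float(np.mean(pred_values) / np.mean(obs_values))
```

The less the pairing constraint, the more a model is credited for producing the right magnitude at the wrong hour or receptor, which is why the fully paired All Events comparison is the most demanding.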

The most stringent test is the comparison of All Events, paired by time
and by station. In the mean, the predictions in this category appear to be
relatively more successful than those for the 25 High or Highest category.
For example, more than half of the models' predictions were within a factor
of two on average. This good news is offset to a large extent by the
practically negligible correlation between prediction and observation for the
Westvaco data, as shown in Table 2b. This poor correlation is of course not unexpected,

Table 3 shows in detail what TRC suggested: that the degree of success of
the models varied with the location of the receptors. Following the
suggestion in the TRC report, and using their Figure 3-3, the receptors
were classified into five distinct categories for the reasons discussed
below:

(I) Stations (1, 3, 4 & 6) are near the stack (on average 0.8 km downwind
from the stack), on rising terrain at an elevation slightly above stack
top, and impacted by winds from the NW.

(II) Stations (5, 7, 8 & 9) are farther downwind (average distance of 1.25
km from the stack) and higher than Class I, but impacted by N
to NNW flow. Receptors in Classes I and II are positioned along a
ridge sloping upward from the NNE to SSW and thus present a broadside
pattern to the plume.

(III) Station 2 is about the same distance from the stack as Class I (about
0.9 km) and is just above stack top, but presents a convex face to plumes
from the SSE direction.

(IV) Station 10 is the most distant receptor at 3.5 km impacted by flow
from the SW with potential channeling due to orientation of the ridge
at its location. This receptor is also located at about stack top.

(V) Station 11 is 1.5 km to the NW of the stack, but situated at stack
base elevation.

While this is a rather arbitrary categorization, the groupings are unique
and some rather interesting results are observed. Additionally, the published
data permitted analysis of the standard deviations of residuals for All
Events and of the ratio of observed to predicted variance for the 25 Highest;
these are presented in Table 3b.
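
The two variability measures named above can be sketched as follows. This is an illustration only: the sample (n-1) normalization, the sign convention for residuals, and the function names are assumptions, since this section does not state how TRC computed them.

```python
import numpy as np


def residual_std(obs, pred):
    """Standard deviation of residuals (observed minus predicted), all events."""
    residuals = np.asarray(obs, float) - np.asarray(pred, float)
    return float(np.std(residuals, ddof=1))  # ddof=1 is an assumed choice


def variance_ratio_top_n(obs, pred, n=25):
    """Variance of the n highest observed over variance of the n highest predicted."""
    o = np.sort(np.asarray(obs, float))[::-1][:n]
    p = np.sort(np.asarray(pred, float))[::-1][:n]
    return float(np.var(o, ddof=1) / np.var(p, ddof=1))
```

A variance ratio well below 1 indicates that the model's highest values scatter far more than the observed ones; a ratio well above 1 indicates the opposite.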

Class I. All the models overpredicted the 25 Highest; most also overpredicted
for All Events. One notes that the option for gradual plume rise was not
used in COMPLEX I, II and PFM, and one wonders if this factor contributed to
the poor prediction skill for these receptors, which are relatively close to
the stacks. Class I overprediction for the COMPLEX I and II models is much
larger than for Class II. One wonders if the Class II receptors, being
higher in elevation than Class I, are subject to the default concentration
of zero for some of the receptors in the plume penetration region, thus
reducing the average prediction values. Note that PFM is much more skillful
than COMPLEX I and II in Class I, which suggests the modeling approach is
much improved using potential flow theory. Reference to Table 3b shows the
-------
Table 3a. Model Comparison by Site Classes Using WESTVACO Data

A: 25 Highest Category, Unpaired Time or Station
B: All Event Paired in Time and Station
Model
OVERPREDICT
COMPLEX I
COMPLEX II
COMPLEX PFM
4141
PLUME 5
RTDM
SHORTZ
UNDERPREDICT
COMPLEX I
COMPLEX II
COMPLEX PFM
4141
PLUME 5
RTDM
SHORTZ
Performance
Group
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
I
11.49
5.36
23.62
5.39
6.20
2.18
5.29
(1.25)
(1.53)
6.73
3.03
-
_
(1.07)
(1.79)
-
(1.88)
-
II
4.65
5.84
2.38
4.66
(1.54)
-
(1.67)
(1.11)
(1.14)
2.68
(1.36)
3.61
(1.34)
4.77
(1.88)
Site Classes
III
-
-
(1.00)
-
-
_
(1.20)
7.40
52.00
8.15
52.00
(1.00)
13.00
9.64
52.00
3.97
26.00
(1.76)
12.50
3.27
IV
(1.26)
5.08
2.27
-
-
-
(1.21)
(1.43)
(1.29)
2.50
4.84
28.00
(1.04)
4.91
(1.29)
2.72
(1.24)
V
-
—
(1.34)
-
-
-
-
(1.65)
18.67
2.25
18.67
5.00
3.03
18.67
5.33
56.00
2.28
9.67
(1.47)
4.00
-------
Table 3b. Complex Terrain Model Variability Using WESTVACO
Data by Site Category

A: σ (Residuals) All Events
B: Ratio of Variance (OBS/PRED) 25 Highest

                  Performance                Site Category
Model             Statistic         I        II       III        IV         V
COMPLEX I             A          3012       998        44       128        46
                      B          0.26      0.18     11.97     30.64      0.75
COMPLEX II            A          4477      1369        44       239        45
                      B          0.01      0.01     34.06      0.37      1.97
COMPLEX PFM           A          1210       703        76       116        67
                      B          0.03      0.05      0.17      0.11      0.19
4141                  A           594       974        44        51        45
                      B         45.26      0.02     44.17     42.52      5.48
PLUME 5               A          1262       622        43        69        44
                      B          0.03      0.12      5.90      0.52     21.93
RTDM                  A           317       246        49        68        46
                      B          0.34      2.24      9.36      3.69      2.39
SHORTZ                A          1157       395        71        91        58
                      B          0.03      0.38      4.76      1.08      2.23
-------
standard deviations of the residuals to be the largest for Class I, and
the ratio of the variances shows the predicted variances to be much greater
than the observed ones (with the exception of 4141). Thus, the prediction
errors are greatest for near-source receptors.

Class II. Every model in this set (except RTDM) overpredicted for the 25
Highest category, and with the exception of 4141, the skill increased
relative to the results for Class I. Interestingly, all the models
underpredicted in the category of All Events. A closer examination and further
analysis of model performance to explain this dichotomy is recommended.
The residuals in Table 3b showed much improvement in the run-to-run errors
compared to Class I. Various differences between models were noted when
comparing the variance of observed vs. predicted. The behavior of 4141
seemed anomalous when comparing Class I and II performances.

Class III. With the exception of being impacted from a different direction,
the receptor in Class III is similar to those of Class I in terms of
proximity and height relative to the stack. Yet there is a large
difference between the model performances. In general, the skill of PFM,
RTDM and SHORTZ is good; the remaining models largely underpredict the
concentrations, both for the 25 Highest and for the All Event category.
The data, as published, do not provide a sufficient basis to explain the
difference between Classes I and III. The results as published do not permit
a more detailed sensitivity analysis, such as of a possible correlation of NW
flow with stronger winds, lower mixing heights, different stability
classes, etc., versus perhaps opposite results for southerly winds. Further,
the terrain at site 2 (Class III) presents a convex face to the plume, while
the Class I and II terrain presents more of a concave to broadside face to
the plume. The ratio of observed concentration between Class I and Class III
is 4 and 2 for the 25 High and All Event categories, respectively. In
contrast, the ratios of the predicted concentration between Class I and
Class III are much larger, greater than two orders of magnitude for COMPLEX I
and II, for example. PFM, RTDM and SHORTZ were comparatively similar, i.e.,
within about a factor of 5 of the observed. Table 3b lists the standard
deviations of differences between observations and model predictions for the
All Event category, and also the ratio of the variance of the observations to
the variance of the model predictions, according to receptor position. The
highest value for each of the four stations in Class I and II is listed.
We see very small standard deviations of the residuals for
Class III compared to Class I (and Class II). We also see that model
variances are very small relative to observed variances for Class III, but
this is reversed for Class I (and II), where observed variances are very much
smaller than predicted ones. (The small range of the residuals for Class
III is somewhat remarkable!)

Class IV. With the obvious exception of being farther away, the topographic
features of Class IV are similar to those of Class I. Yet the model
performance skill was greatly improved for Class IV. All but three of the
models overpredicted for the 25 High, and all underpredicted the All Event
category. Thus, one tentatively concludes that the close proximity of
receptors magnifies the impact of inadequate treatment of model components
(such as relative terrain to plume rise height, wind speed, h,
-------
provides a crude but revealing disparity in model performance skills. The
Westvaco data are underutilized in terms of sensitivity analyses.

Table 4 compares model performance for Westvaco in terms of wind
speed and stability, using TRC published data. (The published Westvaco data
did not include results of IMPACT performance in these categories.) All
models overpredicted in the 25 Highest category with respect to wind speed.
The performance in the All Event category was better, but some models
underpredicted. Model skill seems to be unrelated to wind speed class as a
general rule. RTDM performed quite well for the 25 High, but its performance
for the All Event category was not as good. For Westvaco stability classes E
and F, the 25 High category was overpredicted by all models, as was the
All Event category with the exception of PLUME5 and RTDM. It is not clear
whether this overprediction is due to the large overprediction at the
close-in receptor stations 1, 3, 4 and 6, to the SSE of the stack (Class I). A
diagnostic sensitivity analysis would provide insight. Predictions
for the neutral and unstable classes were mixed. Interestingly, COMPLEX I and
II greatly underpredicted for these stabilities, and greatly overpredicted
for the stable classes. Notably, 4141 exhibited behavior similar to that of
the COMPLEX set. In contrast, PLUME5, which is similar to 4141 and the
COMPLEX set, overpredicted the 25 High throughout the range of stability, as
did PFM and SHORTZ. (One concludes that models with seemingly similar
characteristics can behave extremely differently.) The variance about the
predicted means for the various stability ranges was much greater than 3
orders of magnitude for COMPLEX I, II, and 4141. Moreover, note that an
extreme example of model differences was observed in the neutral stability
class. COMPLEX I and II and 4141 predicted near-zero concentrations (25 High
and All Events), considerably different from those observed, but PLUME5
predicted values 4 orders of magnitude higher than COMPLEX I, II and 4141.
PFM was relatively successful compared to COMPLEX I and II in the neutral case.

Note also that the range of the observations varied by only about 50%
of the observed means for any of the stability classes.
-------
Table 4. Model Performance Using WESTVACO Data
A: 25 Highest B: All Events
w- (<2.5 m sec⁻¹)
w (2.5 - 5.0 m sec⁻¹)
w+ (>5.0 m sec⁻¹)
Model
OVERPREDICT
COMPLEX I
COMPLEX II
COMPLEX PFM
4141
PLUME 5
RTDM
SHORTZ
UNDERPREDICT
COMPLEX I
COMPLEX II
COMPLEX PFM
4141
PLUME 5
RTDM
SHORTZ
Statistics
Class
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
w-
11.53
2.29
23.68
2.44
9.04
5.22
6.23
(1.59)
5.33
(1.56)
-
-
(1.27)
2.50
(1.92)
3.32
-
Wind Speed
w
7.76
5.03
18.13
5.17
5.66
5.15
6.29
(1.42)
7.20
2.43
-
-
(1.30)
(1.25)
(1.26)
(1.95)
-
w+
(1.02)
(1.28)
27.85
(1.49)
4.80
3.49
18.72
(1.68)
-
4.46
(1.39)
-
-
5.38
4.37
-
3.78
-
A-C
-
-
2.33
-
3.98
-
3.37
2.61
11.75
3.96
13.29
(1.58)
5.35
15.50
(1.72)
(1.81)
3.20
(1.10)
Stability
D E
10.61
10.65
19.42
11.11
4.75 8.25
(1.58)
2.86
(1.16)
9.06 4.07
(1.45) -
(1.40) (1.21)
5.22 5.82
(1.92) 2.62
189.63 -
∞ -
151.00 -
∞ -
4.61 -
758.00 -
∞ -
- (1.56)
3.04 2.27
-
F
10.90
8.12
23.18
8.54
7.62
(1.06)
7.32
(1.50)
(1.51)
(1.45)
7.42
2.52
-
-
-
_
5.29
(1.71)
-
-------
Another interesting situation involves PFM, which, for the unstable
class, uses COMPLEX II to compute ground level concentration. However, for
this type of stability, PFM overpredicted while COMPLEX II underpredicted in
the 25 High, and while both underpredicted in the All Event category, PFM
skill was much better than that of COMPLEX II. The expected similarity was
not observed. Since the procedure for computing plume rise, H, was the major
difference, the difference in prediction for the unstable class might be due
to differences in computing H and Hm. Again, a diagnostic sensitivity
analysis is needed.

Table 5 is like Table 4 but for Cinder Cone data. The Cinder Cone results
show considerably better model performance than those for Westvaco. With
respect to wind speed, the models show a general trend towards diminishing
model prediction errors with increasing wind speed, as one might expect
intuitively.

The correlation between observation and model prediction exhibited a
rather unusual behavior with respect to the wind speed classes. The
typical correlation for the low and high wind speed classes ranged from
about 0.4 to 0.8. The correlation for the intermediate wind speed class
was nearly zero for all but PLUME5, RTDM, and SHORTZ. This result does not
have any immediately obvious explanation.
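
A stratified correlation of this kind could be reproduced along the following lines. The sketch borrows the class limits printed in the Table 4 legend (2.5 and 5.0 m/sec); whether the same limits were used for Cinder Cone is an assumption, as are the function names and the minimum sample size of three.

```python
import numpy as np


def wind_class(speed):
    """Assign a wind speed (m/sec) to the w-, w, or w+ class."""
    if speed < 2.5:
        return "w-"
    if speed <= 5.0:
        return "w"
    return "w+"


def correlation_by_wind_class(speed, obs, pred):
    """Pearson correlation of observed vs. predicted within each wind class."""
    speed, obs, pred = (np.asarray(a, float) for a in (speed, obs, pred))
    labels = np.array([wind_class(s) for s in speed])
    result = {}
    for name in ("w-", "w", "w+"):
        mask = labels == name
        # Too few points, or no spread, leaves the correlation undefined.
        if mask.sum() < 3 or np.std(obs[mask]) == 0 or np.std(pred[mask]) == 0:
            result[name] = float("nan")
        else:
            result[name] = float(np.corrcoef(obs[mask], pred[mask])[0, 1])
    return result
```

Plotting the pairs within the intermediate class, rather than relying on the coefficient alone, would show whether the near-zero value reflects scatter or a few outlying hours.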

In terms of stability, the models overpredicted the stable classes
(with the exception of IMPACT and RTDM). The performance for classes C and D
was exceptionally good (except for IMPACT), in striking contrast to the
similar comparison using Westvaco data. One can conjecture that model
performance degrades rapidly when terrain becomes more complex and pronounced,
and when source emission character differs.
-------
Table 5. Model Performance Using Cinder Cone Data
A: All Events B: Highest
Model
OVERPREDICT
COMPLEX I
COMPLEX II
COMPLEX PFM
4141
PLUME 5
RTDM
SHORTZ
IMPACT
UNDERPREDICT
COMPLEX I
COMPLEX II
COMPLEX PFM
4141
PLUME 5
RTDM
SHORTZ
IMPACT
Statistics
Class
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
w-
2.33
(1.93)
2.50
4.34
2.33
2.90
3.00
3.41
(1.83)
2.90
(1.00)
(1.10)
(1.17)
(1.48)
-
-
-
-
-
-
(1.00)
-
2.00
(1.61)
Wind Speed
w w+
(1.50)
(1.05)
(1.60)
2.36
(1.00)
(1.18)
(1.60)
(1.59)
(1.40)
(1.91)
-
(1.20)
(1.77)
-
-
_
-
-
-
(1.25)
(1.16)
-
2.50
3.14
(1.20)
(1.20)
(1.68)
(1.00)
(1.60)
(1.41)
(1.20)
(1.55)
-
(1.20)
3.23
-
(1.38)
-
(1.20)
-
_
(1.25)
(1.57)
-
2.50
3.14
C-D
-
(1.29)
(1.19)
(1.00)
(1.13)
(1.65)
_
(1.10)
-
(1.14)
(1.14)
(1.14)
(1.14)
(1.03)
-
(1.33)
(1.07)
(1.67)
2.67
3.10
Stability
E
2.80
2.00
3.00
4.73
2.60
2.42
3.20
2.85
2.40
3.73
(1.00)
(1.20)
(1.50)
-
-
_
_
-
-
(1.20)
(1.04)
-
(1.67)
(1.73)
F
3.25
(1.95)
3.25
4.91
2.75
3.41
4.50
4.82
(1.75)
2.05
(1.50)
(1.09)
2.00
2.91
-
-
-
_
_
-
-
-
2.00
(1.46)
-------
SUMMARY AND RECOMMENDATIONS

Technical Evaluation

In my opinion, the introduction and explicit use of both Froude number
scaling and potential flow theory, complemented with fluid modeling studies
for simple two- and three-dimensional obstacles, provide an important
extension of the physical basis for Gaussian model adaptation to terrain of
varying complexity. In this respect, PFM and to a lesser extent RTDM
are potentially the most advanced and general of the Gaussian models in
this set. In its current state, PFM "patches in" the computational
procedures of COMPLEX I or II in certain instances, which is a limitation of
the PFM model. This limitation can and should be corrected.
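
This section does not spell out the Froude number relation it has in mind. As a minimal sketch, assuming the common definition Fr = U/(N h) and the dividing-streamline estimate Hc = h(1 - Fr) for uniform wind and linear stable stratification (the form associated with the fluid modeling work of Hunt and Snyder, 1980, listed in the references), the critical height could be estimated as below. The function names are hypothetical, and the relation is not claimed to be the one coded in PFM or RTDM.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def brunt_vaisala(theta_k, dtheta_dz):
    """Brunt-Vaisala frequency N (1/s) for a stably stratified layer."""
    if dtheta_dz <= 0:
        raise ValueError("stable stratification requires dtheta/dz > 0")
    return math.sqrt(G / theta_k * dtheta_dz)


def froude_number(wind_speed, n, hill_height):
    """Fr = U / (N h), with U in m/s, N in 1/s, and h in m."""
    return wind_speed / (n * hill_height)


def dividing_streamline_height(hill_height, fr):
    """Hc = h (1 - Fr); zero when Fr >= 1, where the flow can go over the top."""
    return hill_height * max(0.0, 1.0 - fr)
```

Under these assumptions, flow released below Hc tends to pass around the hill and flow above Hc tends to pass over it, which is the distinction drawn when the text refers to flow below HC.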

RTDM does not incorporate potential flow parameters explicitly. Instead,
RTDM's terrain adjustment procedures are simplistic engineering
approaches that only crudely approximate the terrain adjustment predicted
by a potential flow model. With the exception of terrain adjustment, RTDM
is the most multi-featured, detailed, operational Gaussian model
for complex terrain in this set, as shown in Table 1, and its relatively
superior performance with Westvaco and Cinder Cone reflects this. (It was
the only model, for example, that limited the amount of reflected
concentrations to that imposed by the second law of thermodynamics.)

I would judge the merits of SHORTZ, 4141 and PLUME5 to be technically
comparable, but deficient for general application to complex terrain.
Their terrain adjustment procedures, while clever, are without scientific
merit when compared to PFM and RTDM. The relative performance of these
three models would depend on the application: the source, meteorology, and
type of terrain and geography. Different but potentially large prediction
errors would arise out of a combination of unique, subtle but significant,
and different model inconsistencies, as discussed earlier. COMPLEX I and II
have purposefully been designed for screening applications and are therefore
characteristically conservative. Thus, their technical bases for plume rise,
mixing height, and dispersion are not in general dissimilar from those of the
SHORTZ, 4141 and PLUME5 set, but their operational characteristics were
definitely biased. Even here, the variation in the prediction error, while
large, would depend on the application. For example, the performance for
near-source receptors at low levels will differ from that for receptors at
very high altitudes when plume rise and mixed layer heights are low. This
difference is not entirely due to the inadequacy of modeling the various
parameters, but to operational model assumptions such as the artificial
off-on switch for receptors at elevations above plume rise or Hm. Thus, large
errors and variances may appear in the model outcome.

Several major technical deficiencies felt to be critical were evident.
Provisions for partial plume penetration into elevated stable layers were
missing in all models with the exception of RTDM. The current RTDM model
version, however, requires updating to reflect current knowledge about this
process. Universally, the Gaussian models ignore the potentially important
process of fumigation. All the models suffered from not incorporating
recent advancements in dispersion parameterization and mixed layer growth
model formulations for convective conditions. Further, all formulations of
nocturnal mixing heights are crude, and their current usage probably
contributes to the great uncertainties in predictive skill. Those models
that utilized on-site wind, turbulence, and mixed layer height data are
considered significant advancements in comparison to models limited to
extrapolated values.

The IMPACT model shows promise. The modularity of this modeling system
will permit IMPACT to attain state-of-the-art status with relative ease as
the scientific and technological bases for parameterizing processes in
plume dispersion improve. There is concern, however, that the current
scientific basis for plume dispersion in convective conditions is not
state-of-the-art. Further, the wind flow computation scheme, which uses an
ill-defined "Method of Transparency", was not adequately justified.
Additionally, model accuracy for complex terrain applications comes at a
heavy expense in computation costs.
-------
This review did not examine the model performance for stagnation con-
ditions. It is anticipated that such scenarios can contribute to larger
near source impact; the knowledge of such conditions is presently in-
sufficient to permit a technical evaluation. Diagnostic studies, however,
can be developed to assess the importance of stagnation conditions on air
quality, and even with reference to the EPA standards.

The skill of any dispersion model will be affected by the quality of the
input meteorological data. However, it was outside the scope of this review
to perform an analysis of model sensitivities to input meteorological data
for this set of models. I would recommend that such analyses be performed
on these candidate models for completeness, since it is anticipated that
the predictive accuracy of different models will vary for a given uncertainty
in input data. For example, it can be shown that the spatial correlation
between predicted and observed maximum ground level concentration degrades
rapidly for small wind direction uncertainties. If this is so for level and
rather uncomplicated terrain, the spatial correlation for complex terrain
conditions is expected to be even more sensitive to wind direction
uncertainties.
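
The flat-terrain part of this argument can be illustrated with the crosswind term of the standard Gaussian plume formula. The sketch below is an illustration under simplifying assumptions, not a calculation from any of the reviewed models: it keeps the lateral spread sigma_y fixed, ignores the small change in downwind distance, and the function name and sample values are hypothetical.

```python
import math


def crosswind_fraction(distance_m, sigma_y_m, direction_error_deg):
    """Fraction of the plume centerline concentration left at a receptor
    when the plume axis is rotated about the source by the given angle.

    Only the Gaussian crosswind factor exp(-y^2 / (2 sigma_y^2)) is used,
    with y = x sin(error); sigma_y is treated as a given constant.
    """
    if sigma_y_m <= 0:
        raise ValueError("sigma_y_m must be positive")
    offset = distance_m * math.sin(math.radians(direction_error_deg))
    return math.exp(-0.5 * (offset / sigma_y_m) ** 2)
```

For a receptor 1 km downwind of a narrow stable plume with an assumed sigma_y of 50 m, a 5 degree direction error leaves only about a fifth of the centerline value, so hour-by-hour pairing at fixed receptors can fail even when the plume itself is modeled well.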

Performance Evaluation

The performance evaluation conducted by TRC was extensive in the type
of performance statistics generated. Their documentation of the model
prediction runs was clear, and the runs were in general fair to each model.
(It was unfortunate that transition plume rise was removed for COMPLEX I, II,
and PFM, and that the partial plume penetration options for RTDM were excluded
in the model runs.) The results were mixed between the comparisons for the
two different data bases. Further, the results were mixed for different
evaluation classes such as 25 High, Highest paired in time, etc. Therefore,
the conclusions that could be drawn from the present evaluation effort were
limited, but were aptly discussed by TRC.

In my opinion, it would be neither appropriate nor constructive to pass
judgment on the acceptability or ranking of any of these models
without additional sensitivity tests.
-------
Using TRC published data, it became quite evident that model skill
varied widely and in a generally inexplicable fashion when surveyed as a
function of stability, wind speed, height, range and unique topography of
receptors, and for buoyant and nonbuoyant sources. Model performance
changed from overprediction to underprediction or vice versa when evaluated
against different criteria classes such as the 25 High, Highest, and All
Event categories. Thus, the ranking of each model according to skill
varied with the performance measure. It should be mentioned, however,
that the reviewer conducted such a ranking exercise, and did find RTDM to be
the most consistently successful performer for most of the categories
surveyed. However, this is not intended to imply acceptance of the RTDM
approach, since RTDM exhibited some erratic performance and has some technical
limitations, discussed earlier. Rather, it optimistically points out that
there is clearly the possibility for the other models to improve their skill,
even within the framework of the Gaussian dispersion formulation.

The wide disparity between the rankings of each model for different
comparisons supports this view. I believe it would be far better to extend
this study to include sensitivity analyses. I suspect, for example, that even
small changes in model operating procedure can drastically change the relative
performance. Such sensitivity analyses, even if limited
to the Westvaco and Cinder Cone data bases, could still provide important
diagnostic information or clues to "troubleshoot" each model. Among the
studies that come to mind are to display the observed and predicted data in
terms of season and time of day for the Highest and the 25 High, and then to
stratify the All Event category, also by time of day, and by station.
Parameters to be displayed should include Hm, HC, H, and stability class as
well as the observed and predicted concentrations. Separate scatter plots
and analyses of the observed values, the predicted values, the differences
between observed and predicted, and the ratio of overprediction or
underprediction as dependent variables against stability, wind speed, Hm, HC,
H, Hm-H, H-HC, time of day, and distance are possible diagnostic tools.
Perusing Tables 2-5 and examining data on correlation, positive residuals,
etc. showed erratic model performance and raised many unanswered questions
which cannot be resolved with the currently available published tables. These
studies should attempt to separate errors due to model operations from those
due to technical or scientific limitations. It is strongly recommended that
modelers contemplate the introduction of Froude number scaling and potential
flow adjustments for complex terrain applications.
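
A minimal sketch of how the diagnostic variables listed above might be assembled per event is given here. The record layout, field names, and grouping function are hypothetical; the physical meanings attached to Hm, HC and H in the comments are assumptions based on how the symbols are used in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Event:
    stability: str    # stability class label, e.g. "D" or "E"
    hm: float         # Hm (assumed: mixing height, m)
    hc: float         # HC (assumed: critical height, m)
    h: float          # H (plume height after rise, m)
    observed: float   # observed concentration
    predicted: float  # predicted concentration


def diagnostic_row(e):
    """Derived quantities to plot against stability, wind speed, etc."""
    return {
        "stability": e.stability,
        "Hm-H": e.hm - e.h,
        "H-HC": e.h - e.hc,
        "residual": e.observed - e.predicted,
        "pred_over_obs": e.predicted / e.observed,
    }


def mean_ratio_by_stability(events):
    """Mean predicted/observed ratio within each stability class."""
    groups = defaultdict(list)
    for e in events:
        groups[e.stability].append(e.predicted / e.observed)
    return {cls: mean(vals) for cls, vals in groups.items()}
```

Rows of this kind, one per hour and station, are what the suggested scatter plots would draw from; events with negative Hm-H or H-HC are the ones most likely to expose the on-off switches discussed earlier.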

Future Prospects and Recommendations

It is recommended that:

1. Models be upgraded to include those aspects of potential flow theory
and Froude number scaling such as described in the PFM model.

2. The dispersion parameterization system be improved by a) upgrading to
state-of-the-art convective scaling methodology and b) using on-site
turbulence data. For example, the use of remote sensing systems
such as Doppler sodars may be promising. In this regard, the turbulence
intensities will be more representative of the dispersion potential at
plume rise.

3. The transport wind data be improved with the requirement for on-site mean
vector wind profiles. The local transport wind direction and shear
may be totally unrelated to any extrapolations from alternate and
distant measurements, such as airport observations.

4. The methodology for determining mixed layer depths be improved,
especially during nocturnal periods.

5. Provisions be included for partial plume penetration into elevated stable
layers using the theory of Briggs (1984) or Weil and Brower (1983).

Tools to develop improved, more general, theoretically sound and
accurate Gaussian models are currently available. These include remote
sensing radars, sodars and lidars to provide high resolution, accurate,
unambiguous data on
a) wind speed, wind direction, shears
b) mixed layer heights
c) plume characteristics
d) ambient turbulent wind fluctuation statistics, and
turbulent intensities
e) multiple layers

Carefully utilized, these sensors can yield information leading to bet-
ter terrain adjustment procedures for example. Further, differential absorp-
tion lidars are available for some pollutants from which range resolved con-
centration data can be obtained for research and model validation purposes.

Obviously, if very high concentrations can occur due to fumigation pro-
cesses, some research and model development effort will be necessary to in-
corporate this process in the computation schemes. Quick fixes may be
possible to the Gaussian model through a source enhancement technique
(analogous to a source depletion concept), but the truly improved model
will require substantial effort.

Active field and fluid modeling programs are underway (ASCOT, the EPA
Complex Terrain Study, the EPRI Plume Model Validation and Development Studies,
and other less comprehensive ones) to address issues in complex terrain
modeling such as plume impaction and terrain adjustment, flow channeling,
dispersion, and data resolution requirements. Advancement in complex
terrain modeling will certainly benefit from a strong effort to follow
those basic studies through to the point of model development and validation.
-------
REFERENCES

Benkley, C.W., and L.L. Schulman. Estimating hourly mixing depths from
historical meteorological data. J. Appl. Met. 18: 772-780, 1979.

Briggs, G.A. Plume rise predictions. In: Lectures on air pollution and
environmental impact analyses. AMS, Boston, MA, pp 59-111, 1975.

Briggs, G.A. Plume rise and buoyancy effects. In: Chapter 8 in Atmospheric
Science and Power Production, DE 84005177 (DOE/TIC-27601), NTIS, U.S.
Dept. of Commerce, Springfield, VA 22161, pp 327-361, 1984.

Fox, Douglas G. Judging air quality and performance. Bulletin of the Am.
Meteor. Society. 62(5): 599-609, 1981.

Frank, N.H., B.E. Rlagan, and y. Slater. Diurnal patterns of sulfur dioxide.
Presentation at 74th Annual Meeting, APCA, Philadelphia, PA, 1981.

Gudliksen, P.M., and M.H. Dickerson. Executive Summary: Atmospheric Studies
in Complex Terrain, Technical Progress Report, FY 79-83. UCID-18878-83.
Summary, ASCOT 84-2, 1983.

Hilst, G.R. Plume Model Validation. EPRI EA-917-59, Electric Power Research
Institute, Palo Alto, CA, 1978.

Holzworth, G.C. Mixing heights, wind speeds and potential for urban air
pollution throughout the contiguous United States. AP-101, EPA.
Research Triangle Park, NC. 118 pp, 1972.

Hunt, J.C.R., and R.J. Mulhearn. Turbulence dispersion from sources near
two-dimensional obstacles. J. Fluid. Mech. 61: 245-274, 1973.
-------
Hunt, J.C.R., Puttock, J.S., and W.H. Snyder. Turbulent diffusion from a
point source in stratified and neutral flows around a three-dimensional
hill. Part I - Diffusion Equation analyses. Atm. Environ.
13: 1227-1239, 1979.

Hunt, J.C.R., and W.H. Snyder. Experiments on stably and neutrally strati-
fied flow over a model three-dimensional hill. J. Fluid Mech. 96:
671-704, 1980.

Lamb, R.G. A numerical simulation of dispersion from an elevated point
source in the convective planetary boundary layer, Atm. Environ. 12:
1297-1304, 1978.

Lamb, R.G. The effect of release height on natural dispersion in the
convective planetary boundary layer. In: Proceedings of the Fourth
Symposium on Turbulence, Diffusion and Air Pollution, Reno, NV., AMS,
Boston, MA, 1979, pp 27-33.

Lamb, R.G. Diffusion in the convective boundary layer. In: Atmospheric
Turbulence and air pollution modeling, F.T.M. Nieuwstadt and H. Van Dop,
eds. D. Reidel Publishing Co., Dordrecht, Holland, 1984, pp 159-230.

Lavery, T.F., A. Bass, D.G. Strimaitis, A. Venkatram, B.R. Greene, P.J.
Drivas, and B.A. Egan. EPA Complex Terrain model development program.
First milestone Report, 1981. EPA-600/3-82-036, U.S. Environmental
Protection Agency, Research Triangle Park, NC, 1982, 305 pp.

Lavery, T.F., D.G. Strimaitis, A. Venkatram, B.R. Greene, D.C. DiCristofaro,
and B.A. Egan. EPA Complex Terrain Model Development, Third Milestone
Report - 1983. U.S. Environmental Protection Agency, Research Triangle
Park, NC. 1983, 271 pp.

Report to the U.S. EPA of the Specialist Conference on the EPA Modeling
Guidelines, Feb. 22-24, 1977, Chicago, Ill.

Smith, M.E. Review of the Attributes and Performance of 10 Rural Diffusion
Models. Bulletin of the American Meteorological Society 65(6): 554-558,
1984.

Strimaitis, D.G., A. Venkatram, B.R. Greene, S. Hanna, S. Heisler, T.F.
Lavery, A. Bass, and B.A. Egan. EPA Complex Terrain Model Development,
Second Milestone Report - 1982. EPA-600/3-83-015, U.S. Environmental
Protection Agency, Research Triangle Park, NC. 1983, 375 pp.

Strimaitis, D.G., T.F. Lavery, A. Venkatram, D.C. DiCristofaro, B.R. Greene,
and B.A. Egan. EPA Complex Terrain Model Development, Fourth Milestone
Report - 1984. U.S. Environmental Protection Agency, Research Triangle
Park, NC, 1984, 319 pp.

Uthe, E.E. Cooling tower plume rise analyses by airborne lidar. Atm.
Environ., 18(1): 107-119, 1984.

Wackter, D.J., and R.J. Londergan. Draft: Evaluation of Complex Terrain
Air Quality Models. TRC Project 2465-R81, TRC Env. Cons. Inc., E.
Hartford, CT. Contract No. 68-02-3514. U.S. Environmental Protection
Agency. Research Triangle Park, NC 27711.

Weil, J.C. Applicability of stability classification schemes and associated
parameters to dispersion of tall stack plumes in Maryland. Atm.
Environ. 13: 819-831, 1979.

Weil, J.C., and R.P. Brower. The Maryland PPSP dispersion model for tall
stacks. Prepared by Environmental Center, Martin Marietta Corporation
for Maryland Department of Natural Resources (Ref. No. PPSP-MP-36), 1982.

Weil, J.C. and R.P. Brower. Estimating convective boundary layer parameters
for diffusion applications. Prepared by Env. Center, Martin Marietta
Corp. for Md. Dept. of Natural Resources. (Ref. No. PPSP-MP-48),
1983.

Willis, G.E., and J.W. Deardorff. A laboratory study of dispersion from an
elevated source within a modeled convective boundary layer. Atm.
Environ., 12: 1305-1311, 1978.

Willis, G.E., and J.W. Deardorff. On plume rise within a convective boundary
layer. Atm. Environ., 17: 2435-2447, 1983.
-------
REVIEW OF COMPLEX TERRAIN MODEL PERFORMANCE
Prepared for the AMS-EPA Steering Committee
by

Robin L. Dennis*
Meteorology and Assessment Division
Atmospheric Sciences Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
March 1985
*On assignment from the National Oceanic and Atmospheric Administration,
U.S. Department of Commerce.
-------
INTRODUCTION

This review is divided into three main sections: Perspective on the
Models; Model Intercomparison; and Review of the Evaluation Framework. The
Perspective on the Models section presents a brief discussion of general
knowledge as it relates to the eight models included in the performance
evaluation. The intent is to provide some indication of the "currentness"
of the assumptions used in the eight models. The Model Intercomparison
section examines how the models seem to compare; can the differences be
explained? An indicative evaluation is presented that examines the quality
of the answers from the models. Suggestions for a needed improvement in
the measures used in an intercomparison are discussed. Preliminary judg-
ments are presented with respect to the performance of the models and the
relation of performance to currentness of the assumptions used in the
models. In addition, several apparent problems with the models or with how
they were run are noted. The Review of the Evaluation Framework examines
the design of the TRC evaluation (Wackter and Londergan, 1984) with respect
to stated and unstated goals for such evaluations. The evaluation measures
used are critiqued and suggestions for additional, needed measures are
given. Issues involved with transferability of the results to other complex
terrain problem domains are discussed.

PERSPECTIVE ON THE MODELS

Interestingly, the parameterizations embodied in the eight complex ter-
rain models cover a fairly broad range of approximations of the physical
process involved in point source dispersion. There is a range in the
sophistication of the approximations to flow over or around a terrain
feature. There is a range in methods to estimate turbulence parameter
magnitudes and in the number of modifications to them that are included.
The use of local turbulence data as the basis for estimation is an important
facet. Most of the models seem to be blithely run in violation of the
second law of thermodynamics. Plume-rise algorithms are similar across the
models, but implemented differently for the short downwind distances involved.

Complex Terrain Flow

The complex terrain features that are considered in this evaluation
are (1) a sloping ridge and (2) a hill. These features are two generic and
"simple" types of complex terrain. The pollutant source is a single elevated
point source, fairly close to the terrain feature. For complex terrain
features, such as these, it is known that the mean path of the plume of
pollutants does not follow the mean streamline (Hunt and Mulhearn, 1973;
Egan, 1975). Two models, however, IMPACT and SHORTZ, ignore this element of
the physics.

One has the distinct impression from reading the documentation that
the developers of IMPACT do not believe that pollutant plumes necessarily
follow the mean streamline. It appears as if the influence of the terrain
features is to simply constrain the wind not to blow through the terrain
surface in the calculation of a non-divergent wind field. The entire link
to inhomogeneous flow and the resulting effect on plume displacement and on
lateral and vertical diffusion rates caused by obstacles seems to be lacking.
Evidently a wind model, MATHEW, closely related to the one used in IMPACT
(WEST) has extremely unrealistic and undesirable behavior at the leading
edge and at the top of terrain features where there is compression. This
behavior results because continuity of mass flow along the streamline is
not checked. Making the wind field divergence-free does not take care of
the problem (Graeme Lorimer, Australia, personal communication). One
suspects the same problems could show up in WEST. There also seems to be
no consideration of the well-known problems that come into play when using
a multiple-box, K-theory model to model point sources (e.g. Lamb and Durran,
1978; Deardorff, 1978).

SHORTZ also seems to ignore the impact of an obstacle on streamlines
and on the mean path of the plume. Plume impaction is not allowed if the
turbulence intensity is less than 0.01, but such low intensities rarely
occur in these data. Thus, the plume is essentially always assumed to be
in the mixed layer. The documentation states that when in the mixed layer
the plume height remains constant with respect to sea level, once it has
reached final rise, regardless of terrain height. If terrain height is
higher than the stabilized plume height, then plume height is fixed at
zero. Thus, lift of the plume over an obstacle seems not to be considered.
Citing the second law of thermodynamics and avoidance of "unrealistic
compression of the plume," the effective mixing depth is terrain-following
for obstacles. This latter treatment of mixing depth does seem consistent
with the more recent observation by Hunt, Puttock and Snyder (1979) that,
qualitatively, the variation of concentration on a 3-dimensional hill is
primarily determined by displacement of the streamline, rather than by
convergence or divergence of the streamlines. It is not apparent the model
developers had this in mind, however.

The simplest approximation to the physics underlying the above observa-
tion that plumes do not follow the mean streamline is more than ten years
old (Briggs, personal communication). In neutral and stable conditions,
the plume loses part of its effective stack height relative to the surface
of the terrain feature. This simplest approximation is to assume that the
loss in height of the plume is half the full height that would have been
calculated had not the feature been present (half-height correction). In
stable conditions, the approximation is that the plume will maintain a
constant elevation, irrespective of the terrain feature. If the terrain
feature is high enough, the plume will impact. Three models, COMPLEX I,
COMPLEX II, and PLUME5, treat the flow over or around the terrain feature
in this basic manner.
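
As an illustration only, the two treatments just described can be written as a single height adjustment governed by one coefficient. The function name and the `path_coeff` parameter are hypothetical and are not taken from the documentation of COMPLEX I, COMPLEX II, or PLUME5. The sketch also assumes that the height lost is a fixed fraction of the local terrain height, which is one common reading of the half-height correction.

```python
def height_above_terrain(plume_height, terrain_height, path_coeff):
    """Plume centerline height above local ground, in the units of the inputs.

    plume_height   -- effective plume height above stack-base elevation
    terrain_height -- terrain elevation above stack base at the receptor
    path_coeff     -- hypothetical coefficient (an assumption of this sketch):
                      0.0 = plume holds a constant elevation (stable case),
                      0.5 = half-height correction
    """
    # Height lost relative to the surface as the terrain rises under the plume.
    loss = (1.0 - path_coeff) * terrain_height
    # A result of zero means the plume centerline impacts the surface.
    return max(plume_height - loss, 0.0)


for label, coeff in [("constant elevation", 0.0), ("half-height", 0.5)]:
    print(label, height_above_terrain(200.0, 150.0, coeff))
```

With a 200 m plume and 150 m of terrain, the constant-elevation treatment leaves 50 m of clearance and the half-height treatment leaves 125 m; once the terrain exceeds the plume height, the constant-elevation case returns zero, which is the impaction case.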

It has been recognized during the last 10 years that this simplest
approximation is terrain-feature dependent (ANL, 1977). The half-height
correction is most appropriate for terrain objects with roughly equal
horizontal and vertical dimensions. For two-dimensional ridges, due to
distortion effects, it appeared that a half-height correction would be too
conservative. It was suggested that a terrain-following trajectory might be
more appropriate (Egan, 1975).

The next level of sophistication is use of the dividing-streamline
concept to determine whether a plume will rise over the top of or impact
the surface of the obstacle in neutral and stable conditions (Hunt, Puttock
and Snyder, 1979; Snyder, Britter and Hunt, 1980; Snyder, 1983). The use
of the Froude number to define a critical height is an important advance in
approximating the physics of plume behavior in complex terrain, even though
it is representative of simplified cases. Two models, RTDM and COMPLEX/PFM,
use the critical height concept. The application is more "exact" for
neutral stability (based on potential flow trajectory solutions) and more
empirically based for stable conditions (based on the results of experiments
at the EPA Fluid Modeling Facility).
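
A minimal sketch of the critical-height idea follows. It assumes the commonly quoted relation for a uniformly stratified approach flow, Hc = h(1 - Fr) with Fr = U/(N h), which is not spelled out in this review. The function names are illustrative, and this is not the RTDM or COMPLEX/PFM code.

```python
def critical_height(hill_height, wind_speed, brunt_vaisala):
    """Return (Hc, Fr) for a hill of height h in uniformly stratified flow.

    Assumed relation (not stated in the review): Hc = h * (1 - Fr),
    Fr = U / (N * h), with Hc = 0 when Fr >= 1.
    """
    froude = wind_speed / (brunt_vaisala * hill_height)
    return hill_height * max(1.0 - froude, 0.0), froude


def plume_fate(plume_height, hill_height, wind_speed, brunt_vaisala):
    """Classify a plume by comparing its height with the critical height."""
    hc, _ = critical_height(hill_height, wind_speed, brunt_vaisala)
    if plume_height < hc:
        return "below Hc: impacts or passes around the hill"
    return "above Hc: passes over the hill"


# Illustrative values only: 100 m hill, 1 m/s wind, N = 0.02 per second.
print(critical_height(100.0, 1.0, 0.02))
print(plume_fate(30.0, 100.0, 1.0, 0.02))
```

In neutral flow N approaches zero, Fr becomes large, and Hc collapses to zero, so every plume is treated as passing over the obstacle.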

Giving the model developers the benefit of the doubt, it appears that
one model, 4141, attempts to "simulate" the physics of the dividing-stream-
line concept by using a 1/4-height correction to the plume height in stable
conditions, rather than determining whether or not the plume will go over
the hill or impact it. The documentation, however, does not imply that the
4141 model developers had such a "sophisticated" rationale in mind.

A further increment of sophistication is to actually account for, in a
single model, the observation that the shape of the terrain feature is an
important determinant of the plume height above the surface when the plume
has enough kinetic energy to surmount the obstacle (Hunt, Britter and
Puttock, 1979). That is, the half-height correction, even used with the
Froude number, is too simplistic; it does not reflect the sensitivity of
plume height to terrain geometry and meteorological variations. COMPLEX/PFM
incorporates a first-generation, first-order approximation to account for
this sensitivity. Thus, the PFM component of COMPLEX/PFM represents the
most complete operationalization of current understanding of plume behavior
in interaction with "simple" terrain features for "straight-forward" meteo-
rology. This would appear to be no small achievement.

There is still a lot of "physics" that the models included in this
evaluation are not able to consider, such as being able to account for the
fact that plumes can behave differently for the same Froude number or the
fact that upwind boundary conditions can greatly affect the gross flow
features of the wind field upon which the local terrain influence is super-
imposed, possibly dominating or suppressing local terrain influences. The
evaluation sites seem to be representative of the more simple cases of
complex terrain and wind field flows, however. This is clear for Cinder
Cone Butte, but less clear for the case of Westvaco. Thus, there does not
seem to be an inordinate disparity between the simplified real world against
which the models are being compared and the simplified model assumptions.
Even though more complex situations, which are closer to many of the real
world applications, are not being examined and tested in this evaluation,
the range of physics represented by the assumptions in the different models
should be very informative, both from a scientific and a regulatory point
of view.

Dispersion Parameters

Essentially three sets of dispersion estimates are used in the models.
The majority use the Pasquill-Gifford-Turner (PGT) curves (COMPLEX I,
COMPLEX II, COMPLEX/PFM, 4141, PLUME5). However, in 4141 the lateral
dispersion coefficients are increased by a factor of 1.82 over the PGT
values, citing "considerations of sampling time." Apparently this was done
for all classes of stability. For two models, RTDM and SHORTZ, on-site
turbulence data are used to estimate the turbulence parameters. Inclusion
of on-site turbulence data is expected to be a real advance. IMPACT uses
an empirical set of estimates, which assigns lateral turbulence values to
exogenously determined classes of stability in order to calculate both
horizontal and vertical dispersion parameter values. This approach seems
backwards, but similar in spirit to the use of PGT curves.
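
The source does not say how the factor of 1.82 was obtained. Purely as an illustration of what a sampling-time consideration can look like, the sketch below assumes a power-law adjustment with an exponent of 0.2 and assumes that the PGT curves represent roughly 3-minute averages. Both are assumptions of this example, not statements from the 4141 documentation.

```python
def sampling_time_factor(t_long_min, t_short_min, exponent=0.2):
    """Ratio of lateral spread for a longer versus a shorter averaging time.

    The power-law form and the default exponent are assumptions of this
    sketch; they are not taken from the 4141 model documentation.
    """
    return (t_long_min / t_short_min) ** exponent


# 60-minute averages relative to an assumed 3-minute basis for the PGT curves.
print(round(sampling_time_factor(60.0, 3.0), 2))
```

Under those assumptions a one-hour average gives a factor close to the quoted value, and the same factor would apply to every stability class, consistent with the observation above.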

An investigation of the differences between the use of the PGT curves
versus the local turbulence data, using a small non-random sample of Cinder
Cone data (20% of the data), indicates the following: (1) There is essentially
no difference between the estimates of vertical dispersion for stable
conditions at a one-kilometer distance using either PGT curves or on-site
turbulence data (and RTDM's algorithm). These estimates agree well with
values presented in recent literature (Irwin, 1983). (2) There is a minor
difference (15-20%) between the estimates of horizontal dispersion of
Cramer (SHORTZ) and RTDM at one kilometer for neutral and stable conditions
and these estimates are in good agreement with the most recent literature
(Irwin, 1983). However, the estimates of horizontal dispersion, based on
local turbulence data, are a factor of 2 to 4 larger than the estimates
based on PGT curves. The difference is roughly a factor of 2 for neutral
conditions. The difference is roughly a factor of 3-4 for very stable
conditions. Thus the differences in the estimation of the lateral dispersion
parameters should have a large influence on the relative predictions of the
models.

All of the Gaussian-based models include plume buoyancy corrections to
account for the fact that the dispersion is enhanced for strongly buoyant
plumes. Thus, all of the models take into account what seems to have been
an earlier concern that models were not including corrections for buoyancy
(ANL, 1977). Somewhat different approaches are used in SHORTZ and 4141
from the "standard" approach used for the other models based on guidance
from Pasquill (1976). The difference seems minor in the case of 4141.

Three of the models, RTDM, PLUME5, and SHORTZ, include enhancements of
the lateral diffusion due to wind shear. The suggested form from Pasquill
(1976) was followed.

Second Law of Thermodynamics

If a flow conserves mass, then concentrations in that flow cannot
increase with increasing travel time (a simple statement of the second law
of thermodynamics). Using the conventional reflection algorithm of the
older Gaussian models doubles the plume center-line concentrations for
cases of plume impaction on a terrain surface (a violation of the second
law). In essence, the model calculation, because it is steady state,
assumes that ground-level has been at the height of the plume center-line
the full distance from the source. This assumption of full reflection of
the plume is rather unrealistic for complex terrain models, a point that
has been made both mathematically and empirically (Hunt, Puttock and Snyder,
1979; Hunt, Britter and Puttock, 1979; Snyder, Britter and Hunt, 1980).
The difference between what seems to be a more reasonable concentration
estimate and an estimate based on full reflection is roughly a factor of
two.
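
The factor of two can be seen directly in the standard Gaussian plume expression with an image source. The sketch below is generic and is not the code of any of the evaluated models; the function and parameter names are illustrative. Setting the plume height above ground to zero mimics the impaction case.

```python
import math


def axis_concentration(q, u, sigma_y, sigma_z, z, plume_h, reflect=True):
    """Gaussian plume concentration on the plume axis (y = 0) at height z.

    q       -- emission rate
    u       -- wind speed
    plume_h -- plume centerline height above local ground
    reflect -- include the image-source (full reflection) term
    """
    norm = q / (2.0 * math.pi * u * sigma_y * sigma_z)
    direct = math.exp(-0.5 * ((z - plume_h) / sigma_z) ** 2)
    image = math.exp(-0.5 * ((z + plume_h) / sigma_z) ** 2) if reflect else 0.0
    return norm * (direct + image)


# Impaction: plume centerline at the surface (plume_h = 0), receptor at z = 0.
with_reflection = axis_concentration(1.0, 2.0, 50.0, 20.0, 0.0, 0.0, True)
without_reflection = axis_concentration(1.0, 2.0, 50.0, 20.0, 0.0, 0.0, False)
print(with_reflection / without_reflection)
```

The ratio does not depend on the emission rate, wind speed, or dispersion parameters chosen, which is why the effect appears as a systematic multiplicative difference.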

Only one model, RTDM, did not apply the full reflection assumption. A
method is employed in the RTDM that produces a simplified, conservative
estimate of the center-line concentration at the time of impaction. That
is, the "real world" answer is expected to be lower than their estimate due
to their not considering dispersion that still can occur and their not
considering both wind meander effects and the build-up of eddies that keep
the plume from fully impacting the surface.

One would, therefore, expect the other six Gaussian models to system-
atically overpredict concentrations associated with impaction. They should
also systematically predict higher concentrations than RTDM for the extremes.
While it is nice to establish this "known" fact for regulatory purposes, it
is not particularly illuminating when one is trying to understand how well
the models could predict.

Three models, COMPLEX I, COMPLEX II, and COMPLEX/PFM, could have been
run in a mode that established limits on the center-line concentration,
similar to those used in RTDM, by setting user option IOPT(25)=5. They
were run in the regulatory mode, however, employing full reflection, setting
IOPT(25)=1, due to regulatory needs. There is no question that it is very
useful to examine the performance of these models as they are used in a
regulatory, screening mode. One obtains a clearer understanding of the
margin of safety that is built into the predictions produced by these
models for the screening mode. However, to evaluate the science in the
models, especially COMPLEX/PFM, the models should also have been run with
IOPT(25)=5, because a simple "correction" of the results is not possible
without considerably more information than that included in the evaluation.
It is unfortunate that these models were not run both ways.

Plume Rise

The models use plume-rise formulations that are from or consistent
with the 1975 set of Briggs' formulas (Briggs, 1984). There is a noteworthy
aspect in this evaluation associated with the plume-rise calculations.
While the models all cluster around a few formulas of plume-rise, a major
difference is whether the models use final rise or a gradual rise. A back-
of-the-envelope calculation implies that the difference could be important.

The Westvaco stack height is 183 meters. The monitors vary in
distance from the stack, ranging from 800 meters to 1500 meters from the
stack along the ridge. These distances are all less than 10 stack-heights
from the source. At 1 kilometer the difference between a final rise and a
gradual rise plume height is 50% for neutral stability, assuming final rise
is achieved at a distance of 1.8 km. This effect should not affect the
computation of maximum concentrations, because one would assume that plume
impaction is the cause of the maximum. It will affect predictions of
concentrations which are lower on the concentration distribution, however,
and would be expected to produce a bias toward underprediction, especially
for the lowest concentrations. This bias will also be monitor (distance)
specific for those models that use final-rise plume height. Several models
had options for gradual rise, but were run using final rise. It is not
clear from the TRC documentation why this choice was made.
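
The back-of-the-envelope figure can be reproduced under one assumption that the text does not state explicitly: that gradual rise follows a two-thirds power of downwind distance until final rise is reached. The sketch below applies that assumption to the rise term alone (stack height is not included), and the function name is illustrative.

```python
def rise_fraction(x, x_final):
    """Fraction of final plume rise reached at downwind distance x.

    Assumes rise grows as x**(2/3) up to x_final and is constant beyond it.
    """
    if x >= x_final:
        return 1.0
    return (x / x_final) ** (2.0 / 3.0)


fraction = rise_fraction(1000.0, 1800.0)
excess = 1.0 / fraction - 1.0
print(f"gradual/final = {fraction:.2f}; final exceeds gradual by {excess:.0%}")
```

At 1 kilometer, with final rise reached at 1.8 km, the final-rise value exceeds the gradual-rise value by a little under 50 percent, in line with the estimate quoted above.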
-------
MODEL EVALUATION AND INTERCOMPARISON

One problem that perennially faces model evaluations is that the data
sets on which the evaluations are based are always too limited and there
are never enough data sets to answer all of the relevant questions. There-
fore, the evaluations must be used to the hilt. In this case, diagnostic
evaluation becomes an important part of any overall evaluation, because
reasoned inference will be required to assess how the models could be
expected to perform and compare in situations other than those represented
by the specific evaluation site data set being used. One would assume
that an evaluation oriented towards regulatory application of the models
would want to know whether or not a good or bad showing of a particular
model on the evaluation data set is a fluke and whether such a showing is
expected to carry over to other cases.

To provide a basis for reasoned inference, a regulatory evaluation
must go beyond the simple comparison of the predicted and observed maxima.
This is because, from a simple comparison one does not really know what one
has, except the 25 largest numbers, for example, from several black boxes.
There is no way of telling if these 25 numbers mean the same thing. One
does not know how to evaluate the comparisons without going further into
some diagnostic work on the numbers.

Importantly, the diagnostic work relates not only to the results from
the black boxes, but also to the information contained in the data set of
observed values. That is, the types of measures used, how they are specif-
ically defined and used, and the subdivisions of the data (breakdowns) used
must be developed in interaction with the type of results the models are
producing and the type of behavior evidenced by the real system. Reasonably
specific guidance can be established on how to begin an evaluation for
particular types of pollution/meteorological systems, given their
idiosyncrasies. However, it is just that, only a beginning. Conducting an
evaluation is an iterative process. The second and succeeding steps in the
evaluation depend on the first. The first step can be the most mechanical
of all, but it is only the first step. What is presented in the TRC docu-
ment, therefore, is only the (partial) first step towards an evaluation.
An actual evaluation is far from being realized.

Examination of Distributions/Populations

To begin with, it is clear that in many instances different, possibly
non-comparable populations are being compared, especially for comparisons
based on the Westvaco data set. For one model, the population of the top
25 apparently corresponds to F stability and station number 1; for another
model the top 25 seems to represent a population from D stability and
station 6; for another model the top 25 represents a population from F
stability and stations 9 and 5; and for yet another model, the top 25
represent a population from D and F stabilities over several stations. The
question is, do these populations have anything in common, are they even
comparable?

The second question is, what do these populations of model predictions
have in common with "reality?" For the observed data, the top 25 represent
a population from D and F stabilities over several stations (most likely
4). Thus for "reality" there are 16 subpopulations of stability and station.
To obtain the 25 highest, one is, at most, picking only the top two from
each subpopulation. This population of extremes, made up of several sub-
populations, is then being compared with a population made up of the top 25
from a single subpopulation.

There is no reason to expect the population of the predicted "top 25"
to be the same as the observed "top 25" nor for their distribution of
concentrations to be the same. One only has to remember that a distribution
of concentrations which is the sum of two log-normal distributions is not
itself log-normal. However, the TRC evaluation report assumes that all of
the "top 25" populations are similar across models, and can validly be
compared, rather than establishing that fact for the data set in hand. In
fact, one might expect RTDM to produce a flatter cumulative distribution of
this "top 25" compared to the cumulative distributions produced by the
other models. That is, the range between the highs and the lows of the top
25 could be less for RTDM, because several combinations of meteorological
conditions (subpopulations) contribute to its top 25 predictions. Thus,
for RTDM, a flatter distribution than those produced by the other models
could be a "correct" answer, instead of expecting the models to produce the
same distribution. One can try to crudely estimate the relative spread of
the distributions coming from the different models for Westvaco using the
eight stations on the ridge (one can't do it properly, given the information
available in the TRC report) and then for Cinder Cone (again in a very
crude manner). To do this, a ratio of average concentration using a small
number of observations to an average using a large number was computed.
The following comparisons result:

Table 1. Ratio of Maximum Values for Different n

Westvaco: n=8/n=200 (mean of station maxima/mean of station
top 25, across 8 stations)
Cinder Cone: n=25/n=104

Model         Westvaco  Cinder Cone
COMPLEX I         1.50          2.5
COMPLEX II        1.92          2.7
4141              2.59          2.7
RTDM              2.34          2.7
PLUME5            3.46          2.7
COMPLEX/PFM       3.48          2.8
SHORTZ            2.16          3.0
IMPACT              --          2.6

Observed          1.88          2.5

As expected, there are a variety of ratios for Westvaco. It has still
to be established how comparable one should expect the slopes (ratios) to
be. It is noteworthy that the RTDM slope is not the lowest (i.e. not the
flattest cumulative distribution) and it is also higher than the slope
for the population of observed values. (Whether it is significantly differ-
ent should be established.) One would have expected the observed and
RTDM populations to be the most similar of all, if RTDM is predicting well.
For the Cinder Cone results, the ratios are much more comparable across
the models. It is quite possible that the populations represented in the
Cinder Cone data are fairly comparable in a relative sense. If so, then
all the models, except SHORTZ, do a very good job of representing the
relative spread of the distribution of high concentrations in the Cinder
Cone data. The degree of agreement on the Cinder Cone data is rather
phenomenal. One might suspect that differences across models, as evidenced
by the differences in bias, are due to systematic, multiplicative factors
associated with parameterization of physical process. (A more adequate
treatment of Westvaco data appears later in the review due to "finding"
some data late in the review process.)
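
The ratio used in Table 1 can be sketched as follows. This is an assumed reading of the Cinder Cone form of the calculation (mean of the highest 25 values divided by the mean of the highest 104); the function names and the synthetic data are illustrative and are not the TRC values.

```python
def top_mean(values, n):
    """Mean of the n largest values."""
    top = sorted(values, reverse=True)[:n]
    return sum(top) / len(top)


def spread_ratio(values, n_small, n_large):
    """Mean of the top n_small values divided by the mean of the top n_large."""
    return top_mean(values, n_small) / top_mean(values, n_large)


# Synthetic concentrations, and the same set with a factor-of-two bias.
concs = [float(i) for i in range(1, 201)]
scaled = [2.0 * c for c in concs]
print(spread_ratio(concs, 25, 104), spread_ratio(scaled, 25, 104))
```

Because a constant multiplicative bias cancels in the ratio, the measure isolates the relative spread of the high end of the distribution, which is the property relied on in the comparison above.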

For the Westvaco data, one could have a model that underpredicts
drastically at every station but one, and for every stability but one.
For that one station and one stability category, the bias of the "top 25"
is only a factor of two greater than the observed "top 25." Is that model
as good as RTDM because it has the same bias? Given the present level of
information contained in this evaluation, one has no "objective" basis
for making a judgment in answer to that question that could ever extend
beyond the very specific, local population of the "top 25 from all
stabilities and all stations 800 and 1600 meters from the stack on a ridge
exactly as Westvaco's for the year's meteorology represented by the given
data set."

Thus, to compare the model predictions, with the "real world" and
across models, one should use populations that are as similar as possible.
This means that different breakdowns of the data must be developed, based
in part on how the models tend to predict. Exploration and display of the
different sub-populations could be achieved by the use of histograms and
box plots. The box plots have the advantage of containing more information
than standard deviations; box plots can present the median, the skewness,
the width of the distribution and odd-ball values, all displayed at once.
Only with such information can relevant subsets of data be constructed that
will produce well-defined and interpretable comparisons. Of course, this
assumes that the actual data are available in order to generate all of these
possible plots, tables and diagrams.
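
The quantities a box plot displays can be computed with a few lines. The sketch below is one conventional definition (quartiles plus 1.5 times the interquartile range for flagging odd-ball values), chosen here for illustration; the function name is hypothetical and the data are invented. It relies on `statistics.quantiles`, available in Python 3.8 and later.

```python
import statistics


def box_summary(values, whisker=1.5):
    """Quartiles, width, and outlying values as a box plot would show them."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1  # width of the central half of the distribution
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "outliers": outliers}


# Invented concentrations for one station/stability subpopulation.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Applied separately to each station and stability subpopulation, summaries of this kind would show at a glance whether two "top 25" sets are drawn from comparable populations.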

Once this diagnostic work has been done, one can then go back and
construct a careful and meaningful set of results that meets the needs of a
regulatory evaluation. The TRC report represents a good first iteration
of the many iterations required to develop a good evaluation of these
models.

Following a different vein, one should have noticed that the above
results present a totally different evaluation of the ability of the models
to reproduce the relative spread of the concentration distribution from
that contained in the TRC evaluation report. The model comparisons look
terrific for Cinder Cone and not all that bad for Westvaco (see also the
section on Exploratory Analysis of Top 25). There is no possible manner
in which the same conclusions can be developed from the values presented
in the TRC document. That is because the measures TRC uses to compare
standard deviations of residuals and frequency distributions are heavily
influenced by the amount of bias in the model predictions.

For these models, the influence of the bias is so great as to make
the interpretation of these measures, as used by TRC, totally misleading.
In the TRC report the comparisons of standard deviations of the residuals
and the frequency distributions are essentially meaningless for these
particular model comparisons. To have any meaning, the computation of
the measures used by TRC must include a correction for bias. One suggestion
is the use of a statistical test that can account for bias, such as the
Siegel-Tukey test (Gibbons, 1976). But the shape is best measured by
parameters of slope and spread. Why not "measure" these directly, rather
than use the indirect manner of the TRC report? Thus, better measures
could be included in the performance evaluation. In addition, one of the
best ways to display this information is through the use of graphs. This
evaluation is remiss in how the information is displayed.
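
One simple way to measure spread directly, offered here as an illustration and not as the Siegel-Tukey test or the TRC procedure, is the standard deviation of the logarithms of the concentrations. A constant multiplicative bias shifts the logarithms but leaves their spread unchanged. The function name and data are invented for the example.

```python
import math
import statistics


def log_spread(concs):
    """Standard deviation of ln(concentration)."""
    return statistics.pstdev([math.log(c) for c in concs])


observed = [22.0, 26.0, 31.0, 40.0, 55.0]
predicted = [3.0 * c for c in observed]  # same shape, factor-of-three bias

# Raw standard deviations differ by the bias factor ...
print(statistics.pstdev(predicted) / statistics.pstdev(observed))
# ... while the spread of the logarithms is the same for both sets.
print(log_spread(observed), log_spread(predicted))
```

A comparison of raw standard deviations would report these two sets as very different, although their shapes are identical; this is the sense in which a measure dominated by bias can mislead.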

Breakdown by Stability

One of the more useful breakdowns of the data is by stability class in
order to try and understand how the models are performing. As an example
of this, Table 2 presents the average of the highest concentrations within
each stability class. Unfortunately, the information of this table is not
directly available; one has to calculate the values from bias and observed
values.

Table 2. Means of Highest Concentrations
Associated With Each Stability Class

Westvaco (micrograms/m**3)

Model              A-C         D         E         F
                  n=25      n=25      n=25      n=25
COMPLEX I          418         8    11,859    18,169
COMPLEX II         276        10    21,767    38,638
4141               204         2     3,200    12,205
RTDM             1,048     2,121     1,353     2,415
PLUME5           4,346    13,745     4,555     2,520
COMPLEX/PFM      2,547     7,205     9,219    12,790
SHORTZ           3,678     7,912     6,513    12,363

Observed         1,092     1,517     1,118     1,667

Cinder Cone (micrograms/m**3)

Model              C-D         E         F
                  n=30      n=38      n=36
COMPLEX I           27        51        43
COMPLEX II          40       123       108
4141                30        74       106
RTDM                29        24        24
PLUME5              51        97        45
COMPLEX/PFM         37        63        75
SHORTZ              34        39        64
IMPACT              10        15        15

Observed            31        26        22

It is unfortunate that for Cinder Cone TRC combined C and D stability,
because some of the models use very different algorithms to develop their
predictions for each of these stabilities. I suspect that TRC did this in
order to obtain a larger sample size, n. But I would rather have been
given more information about the individual samples and had them disaggre-
gated fully in a manner that makes sense in terms of the parameterization
of physical processes that influence the predictions made by the various
models. If the sample size, n, is too small, so be it. Relevancy is more
important in this instance than rigidly following some statistical rule.
One would also like to have more information about each of the distributions
to be able to adjust the means for their different sample sizes for comparisons
across stability classes.

This reviewer attempted to understand the differences between the
models and the differences in their predictions, listed in Table 2. This
was done in the spirit of the goal of the TRC evaluation: "... a systematic
evaluation of these models to decide in an objective manner which models
should be included in the guidelines and what recommendations should be
made concerning the use of these dispersion models for regulatory application."
The second goal was also kept in mind: "The principal objective of this
project is to produce performance statistics so that EPA and a group of
reviewers may judge the relative merits of different models."

Several conclusions besides the obvious ones about RTDM's performance
relative to the other models surfaced. One is that there is not sufficient
information collected and presented as part of the evaluation data base to
carry out an adequate evaluation of the models. What is done is too simplistic.
Too many factors are influencing each set of numbers and these influences
cannot be uncovered with the presently available information. Yet the
diagnosis of what is "going on" in the numbers is important to the evaluation
of the models' performance.

Another conclusion is that components of the models not directly
related to their handling of complex terrain seem to have as much or even
greater influence on the predictions of the models as do the assumptions
about how to treat complex terrain. This is a judgment that must be based
on insufficient information and relies heavily on noting consistent patterns
of behavior between models. The result is that it is difficult to establish
the value of the different levels of sophistication with which complex
terrain is treated in the different models. The three most obvious components
that strongly influence the predictions of the models, in addition to the
component reflecting assumptions about complex terrain, seem to be the treatments
of plume rise, buoyancy-induced dispersion, and eddy diffusion.

The differences between the models on the Westvaco data for D stability,
shown in Table 2, could be an indication of the importance of plume rise on
the model predictions, because many use the same half-height correction and
only SHORTZ should allow plume impaction. It is impossible, given the
information available, to more precisely know what effect plume rise is
having other than to infer that it is an important effect. For example,
predictions of COMPLEX II and COMPLEX/PFM are similar in the Cinder Cone
comparison, but vastly different for the Westvaco comparison. For the
Westvaco data, SHORTZ may predict higher concentrations than RTDM for
neutral stability, even though both models have similar estimates of lateral
diffusion, because plume height above sea level is held constant in SHORTZ,
whereas RTDM uses a half-height correction. Yet, why is this same pattern
repeated for the Cinder Cone data set where no plume rise is involved?

One would want to be able to establish, more quantitatively, what relative
influence is coming from what component of the model. One suggestion would
be to include as part of the evaluation data set the center-line prediction
at the distance of the receptor for each hour. This would help to develop
some diagnosis of the model's predictions and help assess the importance of
plume rise, buoyancy-induced dispersion, and eddy diffusion on the actual
prediction at the receptor. This dissection of a predicted concentration
is important for an assessment of the question, "Is the better science in
the model having a positive effect on the predictions?" (Postscript: an
evaluation in this spirit was carried out by Dennis and Irwin (1985), which
provided extremely important illumination of model behavior.)

The apparent differences in lateral dispersion parameters based on on-
site turbulence data compared to the PGT curves seem to be important. The
differences appear to be greater for the Westvaco data set than for Cinder
Cone. Thus differences seem to be a function of data set. Time and data
availability did not permit the quantitative establishment of this
inference.

Much of the difference between COMPLEX II and RTDM for stable
conditions can be explained by expected differences in the horizontal
diffusion coefficients and the reflection of the plume when the plume is
expected to impact (a factor of 6-8 compared to an observed factor of 10).
Differences in lateral dispersion parameters could explain a major portion
of the difference between COMPLEX/PFM and RTDM for the neutral case for
both Westvaco and Cinder Cone. Without knowing more precisely the differ-
ence in lateral turbulence values (PGT versus on-site), one cannot truly
evaluate COMPLEX/PFM's predictions, which account for terrain shape,
against RTDM's predictions, which do not account for terrain shape. The
turbulence estimates used by each model at the distance of the receptor
should be output and made available as part of the evaluation data set.

Differences between COMPLEX I and COMPLEX II for impaction are
consistent with the difference expected as a result of the 22.5 degree
sector averaging compared to a bivariate Gaussian point estimate at a one-
kilometer distance. It is interesting to note that the difference is
expected to increase to a factor of five at a distance of five kilometers.
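The size of the sector-averaging effect can be sketched from the textbook Gaussian plume expressions. The snippet below is only an illustration, not code from COMPLEX I or COMPLEX II: it assumes that both estimates share the same emission rate, wind speed, and vertical term, and that the receptor sits on the plume centerline. The function name and the sigma-y value in the usage line are hypothetical.

```python
import math

def centerline_to_sector_ratio(distance_m, sigma_y_m, n_sectors=16):
    """Ratio of a Gaussian centerline concentration to a uniform sector average.

    Both estimates start from the same crosswind-integrated concentration.
    The centerline value divides it by sqrt(2*pi) * sigma_y; the sector
    average divides it by the arc length of one sector at this distance
    (2*pi*x / n_sectors, i.e. a 22.5 degree sector when n_sectors is 16).
    """
    sector_arc_m = 2.0 * math.pi * distance_m / n_sectors
    return sector_arc_m / (math.sqrt(2.0 * math.pi) * sigma_y_m)

# Hypothetical lateral spread of 50 m at a distance of 1 km.
ratio = centerline_to_sector_ratio(1000.0, 50.0)
print(round(ratio, 2))  # 3.13
```

Because sigma-y usually grows more slowly than the distance itself, the ratio computed this way increases with distance, which is the direction of the trend noted above.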

It is not clear why SHORTZ and COMPLEX/PFM give such similar predic-
tions, both for the breakdowns by stability and by monitoring station
(Westvaco). Having a grasp of the reasons for their similarity on these
two data sets, which are rather special, would seem to be very important
to a judgment about using either model under different circumstances for
regulatory purposes.

The "top 25" observed concentrations seem to be very uniform in space
and across stabilities for Westvaco and have a rather simple trend across
stabilities for Cinder Cone. None of the models can reproduce this space
and stability class behavior of the "real world," although RTDM comes by
far the closest of any model. The question that should be raised is, do
the models have to be able to reproduce the spatial and stability class
behavior of the real world in order to come up with useful predictions for
regulatory purposes? This is where some insight into the sample "popula-
tions" for the different space and stability classes is needed, as discussed
in an earlier section.

Spatial Behavior

The evaluation does not really address the spatial and stability class
aspects of model behavior, even though TRC thinks it evaluates the spatial
behavior by using the Pearson correlation coefficient. The Pearson
correlation coefficients (Spearman, too) give very little information
(Westvaco data) because they are anchored on the lower
end by station numbers 2, 10, and 11. These stations have very low values,
more than a factor of ten lower for predicted values and a factor of four
lower for observed values.

This very large gap between the two clusters of points will, of course,
produce a fairly large coefficient of correlation, regardless of whether or
not the values are correlated within their own clusters. (As long as one
includes babies, IQ scores correlate very well with height.) To this
reviewer's mind, Stations 2, 10, and 11 should not have been included in
the evaluation of model predictions for the stations on the ridge. Their
domination of the correlation coefficient imparts no physical meaning to its
interpretation. On the contrary, their inclusion precludes any coherent
interpretation of the results.
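The anchoring effect described above is easy to reproduce with made-up numbers. In the sketch below, all station values are hypothetical (they are not the Westvaco data): eight "ridge" stations have essentially unrelated observed and predicted values, and three low stations are then added.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical station means: eight ridge stations with no real relationship
# between observed and predicted values, plus three much lower stations.
ridge_obs = [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700]
ridge_pred = [9000, 7000, 9500, 7500, 8000, 9200, 7200, 8600]
low_obs = [250, 300, 350]
low_pred = [600, 700, 650]

r_ridge = pearson(ridge_obs, ridge_pred)
r_all = pearson(ridge_obs + low_obs, ridge_pred + low_pred)
print(f"ridge stations only: r = {r_ridge:.2f}")
print(f"all eleven stations: r = {r_all:.2f}")
```

With these invented values the ridge-only correlation is close to zero, while adding the three low stations produces a correlation near 0.9, even though nothing about the ridge stations has changed.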

Using the averages of the "top 25" across the stations, one calculates
correlations similar to those produced by TRC. When one removes the three
low stations and recalculates the correlation, one finds that the corre-
lations across all of the models are not significantly different from zero,
ranging from 0.26 to 0.66. For n=8, the correlation coefficient must
be larger than .666 to be significantly different from zero at the 95%
confidence level. Five sets of random numbers produced correlations with
the station concentrations that ranged from 0.15 to 0.73. The anchoring of
the correlations by the 3 low-concentration stations would have been immediately
obvious if some graphs had been used. One thing that is not pointed
out in the report is that, when n=11, a correlation coefficient that is less
than .576 is not significantly different from zero at the 95% confidence
level.
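A quick way to check thresholds of this kind is the Fisher z approximation sketched below. This is not the calculation used in this review; it is a common large-sample approximation, and exact values from tables based on the t distribution differ somewhat. The function name is hypothetical.

```python
import math

def approx_critical_r(n, z_crit=1.96):
    """Approximate two-sided critical |r| at the 95% level (Fisher z transform).

    Under the null hypothesis of zero correlation, atanh(r) is roughly normal
    with standard deviation 1 / sqrt(n - 3), so the critical value of r is
    tanh(z_crit / sqrt(n - 3)).
    """
    if n <= 3:
        raise ValueError("the approximation needs n > 3")
    return math.tanh(z_crit / math.sqrt(n - 3))

for n in (8, 11):
    print(n, round(approx_critical_r(n), 2))
```

The approximation gives thresholds of roughly 0.70 for n=8 and 0.60 for n=11, in the same neighborhood as the values quoted above. Most of the recomputed correlations, which range from 0.26 to 0.66, fall below these thresholds.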

Exploratory Analysis of Top 25 at Westvaco

The purpose of this particular section is to present initial elements
of the first step of an evaluation in order to draw attention to the limita-
tions of the TRC report. This reviewer considers the type of exploration
of the data presented in this section very necessary, not only as an aid to
the interpretation of the evaluation statistics, but also for its inclusion
as part of the evaluation itself. The data used are not available in the
TRC report and had to be obtained as "outside" information. Although the
review ostensibly is to be based on the TRC report, the goals of the overall
project stated in the TRC report are better served by bringing in "outside"
information and including it in the review, as was done with on-site turbu-
lence data.

Table 3 shows a stem-and-leaf diagram for the top 25 concentrations at
Westvaco, depicting and comparing observed concentrations and predictions
from the eight models. The stems are to the left of the vertical line and
are the left-most 1 or 2 significant digits of the concentration values.
The very next digit, rounded, to the right in the concentration value
becomes the leaf, the number on the right of the vertical line. Each
number to the right of the vertical line represents a different concentration
value. Thus the stem-and-leaf display is like a histogram, but it retains
numerical information about the concentrations. The diagram was set up so
that the number of stems between the median (n=13) and the lowest value (n=25)
is approximately equal for each of the distributions. The distributions
are, therefore, "normalized," although the normalization is not a cardinal
one but rather with respect to spread.

[Table 3: Stem-and-leaf display of the top 25 concentrations at Westvaco, observed and model predictions. The display itself is not recoverable from the source text.]
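The construction of a stem-and-leaf display as described above can be sketched in a few lines. The sample values and the choice of leaf unit below are hypothetical, and the function names are illustrative only; this is not the procedure used to prepare Table 3.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit):
    """Return {stem: [leaves]}, with each value rounded to the leaf digit.

    `leaf_unit` is the size of one leaf step (for example 100 when the leaf
    is the hundreds digit). Values are assumed to be positive.
    """
    rows = defaultdict(list)
    for v in sorted(values, reverse=True):
        scaled = int(v / leaf_unit + 0.5)  # round half up
        stem, leaf = divmod(scaled, 10)
        rows[stem].append(leaf)
    return dict(rows)

def render(rows):
    """Format the display with the largest stem at the top."""
    lines = []
    for stem in sorted(rows, reverse=True):
        leaves = "".join(str(leaf) for leaf in rows[stem])
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

# Hypothetical concentrations (micrograms/m**3).
sample = [2530, 2410, 2380, 1960, 1940, 1870, 1520]
print(render(stem_and_leaf(sample, leaf_unit=100)))
```

Each leaf still carries one digit of the original value, which is the sense in which the display retains numerical information that a histogram discards. Note that rounding can move a value to the next stem: 1960 appears here as stem 2, leaf 0.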

There are several items of note. First, four of the models have
problems with "outliers" when their normalized distributions are compared
to that of the observations. Second, except for IMPACT, removing two of
the extreme values brings the spread across bins of five of the models very
close to each other. The spreads of SHORTZ and COMPLEX/PFM remain somewhat
larger than the rest, but are still very similar to them. One would
expect this to occur if the distributions are similar in shape, given the
type of normalization. IMPACT stands out as having severe problems. The
4141 model has a noticeably shorter distribution of bins compared to the
other models. Third, and importantly, several of the normalized distri-
butions look very similar to the observed data (COMPLEX I, COMPLEX II, and
RTDM). Overall the differences are much less than expected, based on the
TRC information (e.g., the maximum frequency difference computations), and,
except for IMPACT, seem to be mostly differences in "magnification" rather
than some gross underlying difference in shape between the distributions.
This is a very different conclusion from that presented by TRC.

While the stem-and-leaf display gives a good overview of a particular
nature, more insight can be obtained with empirical quantile-quantile (EQQ)
plots. Empirical quantile-quantile plots are shown for five of the models
in Figures 1-5. In EQQ plots one can also discover the existence of outliers,
as for example in Figures 3 and 5 for IMPACT and SHORTZ. One can also establish
that the distributions of the predicted concentrations are rather similar
to the distribution of the observed concentrations, because the EQQ plots
are not too far from being straight lines for the lowest ranked 20 points.
This is important to establish. However, one now notices that COMPLEX I
has an odd "hump" in the middle of its distribution (Figure 1), and that
the distribution from SHORTZ is piece-wise linear with two segments (Figure
5). Thus these two models exhibit odd behavior that is difficult, if not
impossible, to notice in the stem-and-leaf display. The severe problems of
IMPACT are noticeable in both types of data display.

[Figure 1: Empirical Quantile-Quantile Plot, Top 25 Values, Westvaco Data. OBSERVED versus COMPLEX I. Horizontal axis: Observed Concentrations (x1000).]
[Figure 2: Empirical Quantile-Quantile Plot, Top 25 Values, Westvaco Data. OBSERVED versus COMPLEX-PFM. Horizontal axis: Observed Concentrations (x1000).]
[Figure 3: Empirical Quantile-Quantile Plot, Top 25 Values, Westvaco Data. OBSERVED versus IMPACT. Horizontal axis: Observed Concentrations (x1000).]
[Figure 4: Empirical Quantile-Quantile Plot, Top 25 Values, Westvaco Data. OBSERVED versus RTDM. Horizontal axis: Observed Concentrations (x1000).]
[Figure 5: Empirical Quantile-Quantile Plot, Top 25 Values, Westvaco Data. OBSERVED versus SHORTZ. Horizontal axis: Observed Concentrations (x1000).]
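For samples of equal size, the points of an EQQ plot are simply the two samples sorted independently and paired rank by rank. The sketch below shows this with hypothetical values; the straight-line fit is included only as one possible way to summarize how close the points are to a line, and is not a step taken in this review.

```python
def eqq_pairs(observed, predicted):
    """Pair the k-th smallest observed value with the k-th smallest prediction."""
    if len(observed) != len(predicted):
        raise ValueError("equal sample sizes are assumed in this sketch")
    return list(zip(sorted(observed), sorted(predicted)))

def least_squares_line(pairs):
    """Ordinary least-squares slope and intercept through the EQQ points."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical case of pure "magnification": the predicted distribution is
# three times the observed one, but the hours are not paired in time.
observed = [1200, 1500, 1100, 1900, 1300]
predicted = [4500, 3300, 5700, 3600, 3900]

pairs = eqq_pairs(observed, predicted)
slope, intercept = least_squares_line(pairs)
print(pairs)
print(slope, intercept)
```

In this constructed case the EQQ points fall exactly on a line of slope 3 through the origin, even though the unsorted pairs look unrelated. Humps, breaks in slope, and outliers show up as departures from such a line.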

The EQQ plots establish that the central part of the distribution from
each model is reasonably close to being linear with respect to the observed
concentrations. Thus one can easily make a quantitative comparison of the
spread of the distributions and their "slope" by using the upper and lower
quartiles and be satisfied that such comparisons are consistent across all
the data sets. Let us again note the need to scale or normalize the distri-
butions for comparison purposes and define a relative spread as the upper
quartile minus the lower quartile divided by the median. (We effect a more
usual cardinal normalization.) The "slope" we will define as the ratio of
the upper quartile to the lower quartile. The results for the models are
shown in Table 4.

Table 4. Comparison of the Relative Spread
of the Concentration Distributions

Model                       Relative Spread    "Slope"

COMPLEX I                        .200            1.22
COMPLEX II                       .154            1.17
4141                             .444            1.55
RTDM                             .402            1.46
PLUME5                           .223            1.24
COMPLEX/PFM                      .391            1.43
SHORTZ                           .337            1.36

Observed                         .264            1.29

IMPACT                          2.33             4.37
Observed (IMPACT subset)         .387            1.46

The "slope" for SHORTZ is a bit overestimated due to SHORTZ's odd distribution
and might be a bit underestimated for PLUME5. But in general, IMPACT
is the only model that is clearly out of line with the observations. An
uncertainty analysis and a statistical test are needed to establish whether or
not the spread of a model such as RTDM is significantly different from the
observed spread. RTDM and COMPLEX/PFM are rather similar on these measures
of relative spread and "slope."
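The two measures defined above can be computed directly from a set of top-25 values. The sketch below is an illustration under one stated assumption: the review does not say which quartile convention was used, so the "inclusive" method of Python's standard library is adopted here. The sample values are hypothetical.

```python
import statistics

def relative_spread_and_slope(values):
    """Return ((Q3 - Q1) / median, Q3 / Q1) for a sample.

    Quartiles use the 'inclusive' convention of statistics.quantiles,
    which is an assumption of this sketch.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / median, q3 / q1

# Hypothetical top-25 sample: 1000, 1100, ..., 3400.
sample = list(range(1000, 3500, 100))
spread, slope = relative_spread_and_slope(sample)
print(round(spread, 3), round(slope, 2))
```

Both quantities are ratios, so multiplying every value in the sample by a constant "magnification" factor leaves them unchanged. That is what makes them suitable for comparing the shapes of distributions whose magnitudes differ greatly.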

Although some models show reasonable patterns of the "top 25" at West-
vaco, ignoring the outliers and the problem of extreme bias, one still
would like to know if there is any match with real world conditions associ-
ated with the maxima, because the models are going to have to be used
elsewhere than Westvaco. We still have to use complete unpairing in time
and space, because the models just don't produce maxima on the same days as
maxima are observed. The next best match is for the models to reproduce
the mix of stability classes, the mix of wind speeds and the mix of stations
associated with the "top 25." A first examination of this can be performed
with the use of histograms. Figure 6 shows histograms for a breakdown of
the top 25 by stability category for five models and the observed. Figure
7 shows histograms of the data broken down into wind speed ranges for five
models and the observed. Figure 8 shows histograms for the top 25 data
broken down by monitoring sites on the ridge plus station 2 on the other
side of the valley.

These three figures show univariate patterns which can be compared
with the pattern in the observations. Whereas the range of wind speeds in
the top 25 is too narrow for COMPLEX I, it is too broad for RTDM. RTDM and
COMPLEX/PFM look the best on stability class comparisons and PLUME5 and
4141 look worst. Most models exhibit the same patchiness in hitting the
monitoring stations as demonstrated by the behavior of RTDM. COMPLEX/PFM
is the only Gaussian model that moves away from this patchiness. IMPACT
would look very good, were it not for the fact that IMPACT predicts a high
concentration for station 2, a prediction which is totally out of line.

[Figure 6: Histogram of Top 25 Values of Westvaco Data: Stability Class. Panels: Observed, COMPLEX-PFM, PLUME5, M4141, SHORTZ, RTDM.]
[Figure 7: Histogram of Top 25 Values of Westvaco Data: Wind Speed. Panels: Observed, M4141, COMPLEX I, PLUME5, COMPLEX-PFM, RTDM. Horizontal axis: Wind Speed (m/s).]
[Figure 8: Histogram of Top 25 Values of Westvaco Data: Station Number.]
[Figure 9: Bivariate Histogram of Top 25, Westvaco Data, for Combinations of Stability and Wind Speed. Panels: Observed, COMPLEX I, COMPLEX-PFM, SHORTZ.]

Table 5: Counts for the Bivariate Comparison Matrix of Wind Speed
and Stability Class.
[Rows: stability class 1-3, 4, 5, 6-7. Columns: wind speed (m/s) in bins 0-1, 1-2, 2-3, 3-4, 4-5 (and 5-6 for SHORTZ). Pattern labels: Observed [diagonal], COMPLEX I [clump], COMPLEX/PFM [diagonal], SHORTZ [triangle]. The cell positions of the individual counts are not recoverable from the source text.]

A more stringent type of match is a bivariate one. Figure 9 shows a
bivariate histogram of stability class matched with wind speed versus
count. The counts are given in Table 5. Three models are shown. The
observations exhibit a diagonal pattern. COMPLEX I exhibits a clumped
pattern; COMPLEX/PFM exhibits a diagonal pattern; and SHORTZ exhibits a
triangular pattern. This information, the unpaired comparison of model
behavior to real world behavior, appears to be useful in helping to discrim-
inate between models.
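A bivariate count matrix of this kind is a tally of (stability class, wind speed bin) pairs over the hours in the top 25. The sketch below uses hypothetical hours. The grouping of stability classes follows the row labels of Table 5, while the treatment of bin edges (a speed of exactly 1.0 m/s falls in the 1-2 bin) is an assumption of this sketch.

```python
from collections import Counter

# Row labels as in Table 5: classes 1-3 and 6-7 are grouped.
STABILITY_GROUPS = {1: "1-3", 2: "1-3", 3: "1-3", 4: "4", 5: "5", 6: "6-7", 7: "6-7"}

def wind_bin(speed_ms):
    """Label the 1 m/s wide bin containing a wind speed (lower edge inclusive)."""
    low = int(speed_ms)
    return f"{low}-{low + 1}"

def bivariate_counts(hours):
    """Count hours by (stability group, wind speed bin).

    `hours` is an iterable of (stability_class, wind_speed_ms) pairs.
    """
    return Counter((STABILITY_GROUPS[stab], wind_bin(speed)) for stab, speed in hours)

# Hypothetical hours, not taken from the Westvaco data.
hours = [(6, 1.4), (6, 1.9), (4, 3.2), (2, 0.5)]
for cell, count in sorted(bivariate_counts(hours).items()):
    print(cell, count)
```

Comparing the occupied cells of a model's matrix with those of the observations gives the unpaired pattern comparison described above (diagonal, clumped, or triangular).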

It should be clear by now that a detailed analysis such as the one
begun here is necessary to understand what the models are "doing," and,
hence, understand what is contained in the overall statistical measures.
Without this understanding one cannot make an informed evaluation of the
models on this data set, much less make any informed judgments with respect
to the regulatory use of these models. Much more needs to be done. Yet,
the TRC document is incapable of providing the necessary information,
because what has been used in this subsection is the actual ordered hourly
data, not the summaries given in the TRC document.

Some Assessments of Model Behavior (TRC Document)

Almost all of the influences that come from "current" thinking about
complex terrain models move the predictions of the models towards lower
values compared to the predictions of models based on older concepts and
approximations. Getting rid of the surface reflection at impact, using
onsite turbulence data, and accounting for terrain shape for ridge-like
features would reduce the predictions in most models. Accounting for
streamline response to hill-like shapes is the only change that would
increase the predictions of the models (according to potential flow theory).

Because all of the models, except IMPACT on Cinder Cone data, suffer
from problems of over-prediction, inclusion of "better" science should mean
"better" model predictions, according to this evaluation. However, evalua-
tions at two sites, especially in the manner they have been presented, do
not make a strong case for such a sweeping conclusion, given all of the
uncertainty. Inclusion of a lot more diagnostic information in the evalua-
tion is necessary. A more thorough evaluation might or might not help to
bolster the suggestion that better science means better predictions. It is
difficult to tell from the information available in the TRC report which
way it would go.

One conclusion one might draw from the information presented in the
TRC report is that the model that does best, RTDM, has nearly all of its
components identified with the more current thinking about how to model
complex terrain. But one is not completely comfortable with that conclusion.
COMPLEX/PFM, 4141, PLUME5 and SHORTZ are fairly similar in their
"top 25" predictions, unpaired in space and time. This seems to be due to
very large predictions for a few specific cases of location and stability.
How would fixing up the individual components improve these models? One
has the "feeling" that they still would not do as well as RTDM.

RTDM seems to be better than the sum of its parts. Why? The TRC
evaluation, to this reviewer's mind, is incapable, as it stands, of
establishing why. It is not clear whether the needed information from the
model predictions can be recovered from what TRC archived to adequately
address the above question. But establishing the reasons for RTDM's better
performance would be very helpful to the prescreening of other models for
evaluation and to the interpretation of their evaluations on the same two
data sets as well as other data sets. One consistency test of the reasons
underlying RTDM's good performance would be to up-date the two or three
major components in COMPLEX/PFM that obviously need it, i.e., use of on-
site turbulence data and using IOPT(25) = 5 (or something similar in
intent) and rerun the evaluation for this improved version.

Three of the models, PLUME5, 4141, and COMPLEX/PFM, seem to have odd
behavior that deserves mention. The behavior of PLUME5 does not seem to be
consistent with its description. For example, one would expect that for
stability classes E and F the predictions from PLUME5 would be larger than
those from SHORTZ. They are not. One would also expect that for stabili-
ties D, E, and F, the predictions from PLUME5 should resemble those from
COMPLEX II, because the plume rise equations and the diffusion parameters
are the same. Again, they do not. There is no reason from the description
of PLUME5 to expect that, by a large margin, the maximum concentrations
would occur with neutral stability for the Westvaco data. Impaction of the
plume is supposedly allowed and expected under stable conditions and a
modified half-height correction for neutral stabilities precludes impaction
for that case.

The behavior of 4141 does not seem to be very consistent on the West-
vaco data. It too has markedly larger predictions for a single stability
class. However, it is 4141's spatial behavior that is more unusual. This
can best be seen by looking at the relative rankings of the "top 25" model
predictions as a function of distance from the source, using 1-hour concen-
trations as shown in Table 6.

Table 6. Ranking of the Models Based on the
Highest 25 Values by Station Number

                       Monitoring Station
RANK         1            4            5            9
          (800m)       (900m)      (1100m)      (1500m)

1         CPLX II      CPLX II      CPLX II      4141
2         CPLX I       CPLX I       CPLX I       CPLX II
3         CPLX/PFM     SHORTZ       4141         CPLX I
4         SHORTZ       CPLX/PFM     CPLX/PFM     CPLX/PFM
5         PLUME5       4141         RTDM         RTDM
6         4141         RTDM         PLUME5       SHORTZ
7         RTDM         PLUME5       SHORTZ       PLUME5

Whereas the models by-and-large maintain their rank ordering as a function
of space, 4141 does not.

COMPLEX/PFM does not seem to be operating in accordance with its
description. For unstable conditions the predictions from COMPLEX/PFM
should be the same as COMPLEX II. They are not. The code indicates that
COMPLEX II is being called for the correct stability classes. For neutral
stability one would expect the predictions from COMPLEX/PFM to be less
than those from COMPLEX II for a ridge and greater for a hill. In the
Westvaco data there is a tremendous difference. One can only surmise that
this difference is due to differences in plume rise or that something is
drastically wrong. For Cinder Cone, the predictions from the two models
(top 30) are essentially equal. This comparison is marred by the fact that
C stability is mixed in and the comparison is not for a pure neutral case.
For cases with a stable atmosphere, one would expect predictions from
COMPLEX/PFM to at least be less than or equal to predictions from COMPLEX
I. For Cinder Cone, where plume rise is not a complicating factor, this is
not the case.

One model, IMPACT, cannot truly be evaluated because there is no
equivalent information to that in Table 5-2 of the TRC report for the data
set that includes its predictions. This is very unfortunate and should be
rectified. Otherwise a tremendous amount of effort by TRC is wasted and a
considerable amount of useful information is lost. The behavior of IMPACT
does appear to be inconsistent. It had the lowest predictions for Cinder
Cone and some of the highest for Westvaco. In fact, on the 10-station
average of the highest concentrations predicted, IMPACT predicted concen-
trations more than twice those of COMPLEX II (45,000 versus 19,000). Thus
IMPACT reverses its ranking across the two data sets, as indicated below.

Table 7. Ranking of Models Based on the 25 Highest Values
(unpaired in time and space)

RANK      Westvaco        Cinder Cone

1         COMPLEX II      COMPLEX II
2         IMPACT          4141
3         COMPLEX I       PLUME5
4         COMPLEX/PFM     COMPLEX/PFM
5         SHORTZ          SHORTZ
6         PLUME5          COMPLEX I
7         RTDM            RTDM
8         4141            IMPACT

One notes that, in fact, IMPACT changes places with 4141 and PLUME5
changes places with COMPLEX I. Although based on admittedly weak evidence,
one slowly begins to build an impression that the predictions from IMPACT,
4141, PLUME5 and possibly COMPLEX I should not be trusted.

Assessments Revisited

Did the information contained in the more detailed exploration of the
top 25, based on the actual data, make a significant contribution to an
assessment of the models? Clearly, the answer is yes. The EQQ plots
showed that for the top 25 at Westvaco the differences in sample populations
of the predictions do not seem to present a problem for the comparisons.
The EQQ plots showed that SHORTZ is the only model that has a problem in
this regard. This gives notice that something is different in the pre-
dictions of SHORTZ. Except for SHORTZ and except for extreme outliers,
the distributions from the Gaussian models were differing mostly by a
factor of "magnification," which was severely distorting the interpretation
of the statistical measures used by TRC. IMPACT has severe problems with
outliers.

A much worse problem that this exploration pointed out seems to be
one of outliers. The outliers do affect the statistics in important ways.
For example, trimming the top three outliers changes the mean of the top 25
predictions from IMPACT from 17,829 to 9,845, a substantial change. The
models do not exhibit similar behavior with respect to outliers: some have
them, others do not; of the models that had outliers, some had a few and one
had many. The stem-and-leaf display and the EQQ plots are important tools
to use in defining procedures to deal with this problem of outliers, such
as trimming the distribution.
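The sensitivity of a mean to a few extreme values is easy to demonstrate. The sketch below trims a chosen number of the largest values before averaging. The sample is hypothetical and does not reproduce the IMPACT figures quoted above, and the function name is illustrative only.

```python
def mean_after_trimming_top(values, n_trim):
    """Mean of the sample after discarding its n_trim largest values."""
    if not 0 <= n_trim < len(values):
        raise ValueError("n_trim must leave at least one value")
    kept = sorted(values)[: len(values) - n_trim]
    return sum(kept) / len(kept)

# Hypothetical predictions with two extreme values.
sample = [10, 12, 11, 13, 200, 150]
print(mean_after_trimming_top(sample, 0))  # 66.0
print(mean_after_trimming_top(sample, 2))  # 11.5
```

How many values to trim is a case-specific judgment, which is why displays such as the stem-and-leaf diagram and the EQQ plot are needed to identify the outliers first.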

The EQQ plots established the fact that the upper and lower quartiles
could be effectively used to quantitatively compare the spreads and the
"slopes" of the distributions from the different models. With appropriate
trimming, which will be case specific, one can now go back and compute
statistics that could be used for hypothesis testing of the differences or
perform robust regression on the distributions to compare slopes. The
comparisons of the distributions based on the information in the TRC document
were totally inadequate as well as being misleading.

Importantly, the new information changed this reviewer's mind about the
ability of better science to improve the predictions for regulatory applica-
tions of the models. Some worrisome, odd behavior of SHORTZ began to sur-
face, whereas, before, its predictions seemed not that different from
COMPLEX/PFM's. COMPLEX/PFM is doing better than the other models in more
consistently having a pattern of prediction somewhat like the real world
pattern. The older models are more consistently locked-in to categories of
wind speed and stability class that did not match well with the observations,
engendering low confidence in their predictions for other situations.
COMPLEX/PFM has many similarities with RTDM. That is encouraging. Thus,
maybe even RTDM could be improved by putting in better science. It certainly
looks worthwhile to upgrade COMPLEX/PFM and get it away from the older
model formulations, especially away from COMPLEX I, and get it away from
using VALLEY-like computations when there is impaction of the plume. The
station comparison for IMPACT indicates that possibly there are advantages
to working with three-dimensional wind fields, if the other problems can be
corrected.

Thus, to this reviewer's mind, the use of graphs and pattern comparisons
and other techniques of exploratory data analysis is very important to the
interpretation of model performance and the interpretation of the aggregate
statistical measures, as used by TRC. More should obviously be done than
presented in this review, because this paper is a review, not an evaluation.
One should be able to generate such analyses; however, one cannot from the
information presented in the TRC document. In a sense, the document is
remiss in meeting the stated objectives. The document, as it stands, can-
not support a performance evaluation of the models. At a minimum, the raw
data need to be provided for each class of breakdown of the data.

REVIEW OF THE EVALUATION FRAMEWORK AND PRESENTATION

This section will address three sets of issues: First, there are
issues with respect to the larger question of the representativeness of an
evaluation based on the two data sets. Second, there are issues that need
to be addressed with respect to how the measures are developed and used.
Third, there are the more minor issues of useful and complete presentation
of the information.

Questions of Representativeness

The stated goal of the TRC evaluation is: "The principal objective of
this project is to produce performance statistics so that EPA and a group
of reviewers may judge the relative merits of different models." The judg-
ments are desired for the purposes of "... a systematic evaluation of these
models to decide in an objective manner which models should be included in
the guidelines and what recommendations should be made concerning the use
of these dispersion models for regulatory application."

No evaluation data set can meet every need of an evaluation. That
is obvious. The point is how to make the best use of the data. An important
step in that direction is to clearly define the representativeness (or limi-
tations) of the data set with respect to the goals of the evaluation. This
is true, whether or not prior thought has gone into structuring and designing
the data set from the point of view of a model evaluation. Clearly a
lot of thought went into the Cinder Cone experiment.

Both data sets test the "close-in" behavior of models. For the Westvaco
case, plume-rise calculations seem to have a very important influence. Thus
the Westvaco data set could be identified as much with tests of transient
(close-in) plume-rise behavior as with complex terrain behavior. In this
evaluation, thought does not seem to have been given to dealing with the
effect of plume rise on the evaluation results, and trying to more systema-
tically account for it.
-------
Close-in behavior also means short travel times, on the order of min-
utes, even for 1 m/s winds. Both data sets test the value of using on-site
turbulence data for a regime that is similar to that used for the development
of the PGT curves. This is excellent. But the spatial regime of one kilo-
meter is quite limited compared to the full range of distances to which the
models are expected to be applied.

Cinder Cone was obviously an excellent choice to represent a simple
hill feature. Westvaco is not the best representative of a two-dimensional
terrain feature, but the history behind the choice of Westvaco is not
known.

What about applications of the models for longer distances? Many
applications are cases in which the receptors are at a much greater dis-
tance from the source than for this evaluation; past the point of final
plume rise, beyond a 1-hour travel time from the source, and influenced by
upwind topographic relief. How important is it to use on-site turbulence
measurements for these cases? Over what distances are these measurements
valid? Insights from the Westvaco and Cinder Cone evaluations will not
be directly applicable to answer such questions.

Also, model behavior can be a function of distance. For example, for
stable conditions and impaction of the plume, COMPLEX I predicts values
that are one-half those of COMPLEX II at a one-kilometer distance. At five-
kilometers distance, COMPLEX I predictions are one-fifth those of COMPLEX
II. Assuming the other models do not greatly change their performance
relative to each other and RTDM doesn't do something odd, then at five
kilometers, COMPLEX I could possibly be in second place, right ahead of
COMPLEX/PFM, as far as the top 25 values are concerned (assuming over-
prediction is still a problem at five kilometers). Both COMPLEX I and
COMPLEX/PFM will produce better predictions than the other models (except
RTDM), but not because they do a better job of treating complex terrain.
At some distance they may do a "better job" than RTDM. This points out
that one should be suspicious of taking the results of a performance
evaluation at one distance and "blindly" using those results as the basis
from which to judge the behavior of the models at all distances.

Thus one must think about the issues central to the evaluation of a
complex terrain model and central to the operational use such a model will
be put to. From a detailed understanding of these issues, one can develop
a number of criteria that can help guide the establishment of a number of
evaluation data sets. Sensitivity tests of the models could also be
required as part of the evaluation. The present evaluation can serve an
important function with respect to learning how to perform better evaluations
of complex terrain models. There is no substitute for going through
an actual evaluation. A good start has been made, but more work remains
in defining what is needed of performance evaluations to adequately address
the broad goals stated in the TRC document. One obtains the impression from
this evaluation that the issues have not been thought through thoroughly
enough.

Questions About the Approach to the Measures

One has the distinct impression that the measures used in this evaluation
have been implemented in a very mechanical fashion. There is no argument
with the issues the measures are supposed to address; the issues are good
ones. The argument is that, while there may have been a lot of thought
devoted to the development of a potentially useful measure, very little
time or thought seems to have gone into making sure the measures are actually
giving us relevant information, doing the job they are supposed to, once
they are actually applied to a real "live" evaluation. Two examples of
severe problems with the interpretability of measures were presented above:
the measures comparing frequency distributions and those of spatial correlation
(Pearson and Spearman correlation coefficients). It seems clear that these
measures were mechanically computed without spending much time to think
about what was being computed and what, if anything, could be influencing
the answers and/or severely distorting them.
-------
Another example along the same line might also be instructive. This
example has to do with sensitivity to extremes. One of the issues of model
evaluation has been that the accuracy of highest or second highest estimates
from a model is expected to be poor. Guidance from the scientific community
has been that evaluations applied to an upper percentile of the predicted
values would be more informative about overall model performance than
those applied only to the extremes (Fox, 1981).

First, from an examination of the numbers in the TRC report, one
suspects that even though the authors followed the recommendations in Fox
(1981) and used the "top 25" (the 2.8th percentile), the top 25 for some
models, particularly IMPACT and COMPLEX/PFM, appear still to be affected by
a few very extremely high predicted values. One must recompute the average
of the maxima for the stations at Westvaco on the basis of the stations on
the ridge (n=8) to see this more easily. The problem with outliers is
immediately obvious from the stem-and-leaf and EQQ plots, however. Thus
the intent of the recommendation about extreme estimates is most likely not
being met in this evaluation. (The problem with the predictions from
IMPACT may be an extreme case.) No check of the behavior of the extremes
seems to have been done to look for values that are truly "wild."

The point is that other approaches exist which could easily address
the problem just mentioned. It seems to be quite valid to apply the concept
of "trimmed means" to these evaluations (Mosteller and Tukey, 1977). That
is, the sample is trimmed of its possibly straggling tails by setting aside
some fraction of the values from each tail of the sample. Considering our
special case, the trimming could be asymmetric, only the upper side of the
distribution would be trimmed. This would result in a mean that would be
less sensitive to the extremes, yet would well (possibly better) characterize
the behavior of the upper distribution. The other approach is to use the
median of the population, rather than the mean. Each approach has its ad-
vantages and disadvantages, but both seem to be better than what is presently
done in this evaluation. Both of these suggested approaches would produce
answers which are closer in spirit to the intended AMS recommendation.
Median differences were reported by TRC, but it seems that they were not
integrated into the reporting of results.
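
The upper-trimmed mean and the median suggested above can be sketched as follows. This is only an illustration, not the procedure used by TRC; the function name, the trimming fraction, and the sample values are assumptions made for the example. The values that are set aside are returned as well, so that they can be examined separately.

```python
import statistics

def upper_trimmed_mean(values, trim_fraction=0.1):
    """Mean after setting aside the largest trim_fraction of the values.

    Returns the trimmed mean and the list of values that were set aside.
    """
    if not 0.0 <= trim_fraction < 1.0:
        raise ValueError("trim_fraction must be in [0, 1)")
    ordered = sorted(values)
    n_trim = int(len(ordered) * trim_fraction)   # number of upper values set aside
    cut = len(ordered) - n_trim
    return statistics.mean(ordered[:cut]), ordered[cut:]

# Hypothetical set of ten high predictions, one of them "wild".
top_values = [12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 400.0]

trimmed_mean, set_aside = upper_trimmed_mean(top_values, trim_fraction=0.1)
plain_mean = statistics.mean(top_values)
median = statistics.median(top_values)
```

With these invented numbers the single wild value pulls the plain mean far above both the trimmed mean and the median, which stay close to each other.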

Since the extreme estimates of the models are still going to be used
in regulatory practice, it seems important that the behavior of the models
for the extremes be explicitly examined as well. The extremes should not
be ignored or hidden in the upper percentile just because they are difficult
to predict. Thus it would be useful to evaluate both the trimmed means and
the extremes that were trimmed. This raises some important issues about the
potential difference between an evaluation and an application. On what
basis is one to judge the model?

Second, for the different models, the "top 25" means different things,
that is, this set of values is comprised of different sample populations.
For some models, the "top 25" means a sample of the top 25 of a particular
population. However, for other models, the "top 25" means a sample of the
top 2 to 4 from several populations. Thus some "top 25" sets have more extreme
values (top 2-5 from several distributions) than others. As a result, some
models are being evaluated with a large weight on their extreme predictions,
whereas other models are not having such a large weight put on their extreme
predictions. This would seem to lead to some inconsistencies and problems
with the comparison between observations and prediction and with cross-
comparisons of the model predictions. The question that needs to be addressed
is, does the existence of this inconsistency make a difference? It turns
out that, except for SHORTZ, the inconsistency probably does not make a
difference for the Westvaco data. This was empirically demonstrated with
the EQQ plots. What about Cinder Cone? Not enough attention has been given
in this evaluation to the issues involved in unpairing in time and space
and what haphazard unpairing is doing to the evaluation. This area would
seem to deserve some attention.
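
The difference between the two kinds of "top 25" can be made concrete with a small sketch. The station labels and concentrations below are invented, and the two functions only mimic the two compositions described above; they are not a statement of how TRC actually selected its values.

```python
import heapq

def top_n_pooled(populations, n):
    """Top n values after pooling every population together."""
    pooled = [v for values in populations.values() for v in values]
    return heapq.nlargest(n, pooled)

def top_k_per_population(populations, k):
    """Top k values from each population, combined and sorted descending."""
    picked = []
    for values in populations.values():
        picked.extend(heapq.nlargest(k, values))
    return sorted(picked, reverse=True)

# Hypothetical concentrations at three stations.
stations = {
    "A": [50, 48, 45, 44, 43, 42],
    "B": [20, 19, 18, 17, 16, 15],
    "C": [10, 9, 8, 7, 6, 5],
}

pooled = top_n_pooled(stations, 6)            # all six come from station A
per_station = top_k_per_population(stations, 2)  # two extremes from each station
```

Both lists hold six values, but the first reaches well down into one distribution while the second consists only of the extremes of each distribution, so the two samples do not weight extreme predictions in the same way.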

"Informative graphic techniques should be included in any performance
evaluation." (Fox, 1981, p.603). Graphics are noticeable by their total
absence in this evaluation. That is remiss. Some of the issues about
distributions raised above can be addressed very effectively through the
use of graphs, as we have seen in this review. Graedel and Kleiner (1983)
discuss several types of graphs that could be useful in performance evalua-
tions, including the empirical quantile-quantile plots. The usefulness of
this evaluation has been limited by the exclusion of graphs.

General Presentational Details

I have a few specific comments concerning the TRC report, mostly with
respect to fairly obvious suggestions about the text or the tables. The
documentation of the models is not considered to be precise enough,
although on the whole TRC did a very reasonable job. For example, from
the TRC description one has no idea that in 4141 the lateral diffusion
parameters of PGT are multiplied by a factor of 1.82. Also, stating that
a model uses a modified half-height plume correction does not provide
enough information. It would be useful to also state in which direction
the modification affects the answer compared to an unmodified half-height
correction.

Stations 10 and 11 of the Westvaco study are not defined, either in
the text or on Figure 3-3 of the TRC report. The definition of the geometry
of the monitoring stations relative to the release heights would be very
helpful. The labeling of Table 5-2 is not precisely correct for all of its
subparts.

In Table 5-5 it would be much superior to give the average predicted
value and put the average observed in as a footnote, since the observed is
essentially the same for every model (only one exception). In many of the
tables only bias is given; the predicted values should also be presented.
In all of the tables listing predicted and observed averages, the standard
deviations should also be presented. If the data are skewed, as one might
expect, then it would be better to report the median and the upper and
lower quartiles. That would give some indication of skewness as well as
give the magnitude of the interquartile separation, which would be useful
for intercomparisons. In tables such as Table 5-7 it would be very useful
to include the average predicted value. This is simple and easy to do and
can contain useful information. These numbers would have provided some
helpful cross-checks in this evaluation.
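
A table entry of the kind recommended above (mean and standard deviation, plus median and quartiles for skewed data) could be produced along the following lines. This is a minimal sketch; the function name, the dictionary layout, the choice of the "inclusive" quantile method, and the sample values are all assumptions made for illustration.

```python
import statistics

def skew_aware_summary(values):
    """Mean and standard deviation together with median and quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,   # interquartile separation
    }

# Hypothetical skewed sample of predicted concentrations.
summary = skew_aware_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

For this invented sample the mean lies above the upper quartile, which is exactly the kind of skewness that a mean and standard deviation alone would hide.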

There were several instances where the Pearson Correlation Coefficient
was not significantly different from zero at the 95% confidence level, for
example, in Table 5-5. These cases should always be noted. This author
tends to prefer gross error over root mean square error, because the
latter emphasises the extremes. I would rather have a measure that
provides one with a better sense of the central tendency, as does gross
error.
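
The preference for gross error can be illustrated with a short sketch. Gross error is taken here to mean the average absolute difference between predicted and observed values; that reading of the term, the function names, and the numbers are assumptions made for the example.

```python
import math

def gross_error(predicted, observed):
    """Average absolute difference between paired predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def rms_error(predicted, observed):
    """Root mean square difference between paired predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )

# Hypothetical pairs: three small errors and one extreme overprediction.
observed = [10.0, 10.0, 10.0, 10.0]
predicted = [11.0, 9.0, 11.0, 30.0]

ge = gross_error(predicted, observed)
rmse = rms_error(predicted, observed)
```

The one extreme pair raises the root mean square error much more than it raises the gross error, because the differences are squared before they are averaged.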

It would be useful to have more summary tables of the model predictions
in the report. These would give the reader a better overview. Two examples
of summary tables are given early in this review. Other summary tables
would be ones for each of the breakdowns used, such as wind speed or moni-
toring station for the Westvaco study.

Finally, a reiteration of points made earlier. One recommendation is
that the measures of variance comparison and maximum frequency difference
be corrected for bias. The Pearson and Spearman correlation coefficients,
as calculated by TRC, are probably not meaningful measures and could be
omitted, at least for the Westvaco data. Graphs would be better. Values
for C stability should be separated from those of D stability for the
Cinder Cone data. It would be very useful to rerun COMPLEX II and COMPLEX/
PFM with IOPT(25)=5. A very strong recommendation is that the actual
hourly data with all of the various conditions of windspeed, stability
class, station number, etc., for each of the breakdowns (subgroupings
such as the top-25 or the top-25 by stability class) in the tables developed
by TRC should be made available as an appendix to the report. Thought
should be given to doing the same for the 3-hourly and the 24-hour values
as well.
-------
REFERENCES

Argonne National Laboratory, J.J. Roberts, ed., Report to the U.S. EPA of
The Specialists' Conference on the EPA Modeling Guidelines, Environmental
Protection Agency, Research Triangle Park, NC, 1977.

Briggs, G.A., Plume Rise and Buoyancy Effects in: Atmospheric Science and
Power Production, Darryl Randerson, ed. DOE/TIC-27601 (DE84005177),
U.S. Department of Energy, Technical Information Center, Oak Ridge,
Tenn., 1984. pp. 327-366.

Deardorff, J.W., "Different approaches toward predicting pollutant disper-
sion in the boundary layer, and their advantages and disadvantages,"
WMO Symposium on Boundary Layer Physics Applied to Specific Problems of
Air Pollution, Norrkoping, June 19-23, 1978, WMO-No. 510, Geneva,
Switzerland, 1978, pp. I.1-1.8.

Dennis, R.L., and J.S. Irwin, Current Views of Model Performance Evaluation,
in: Proceedings of the DOE/AMS Model Evaluation Workshop (Oct. 23-26,
1984 at Kiawah, S.C.), Vol. I: Participants and Invited Speakers Papers,
A.M. Weber and A.J. Garret, eds, E.I. DuPont de Nemours and Co., Savannah
River Laboratory, Aiken, S.C., 29808, 1985.

Egan, B.A., Turbulent diffusion in complex terrain, in: Lectures on Air
Pollution and Environmental Impact Analyses, D. Haugen, ed. American
Meteorological Society, Boston, MA, 1975, pp. 112-135.

Fox, D.G., "Judging air quality model performance," Bulletin American
Meteorological Society, 62: 599-609, 1981.
-------
Gibbons, J.D., Nonparametric Methods for Quantitative Analysis, Holt,
Rinehart and Winston, New York, 1976. 463 pp.

Graedel, T.E. and B. Kleiner, Exploratory analysis of atmospheric data,
in: Probability, Statistics, and Decision Making in the Atmospheric
Sciences, A.H. Murphy and R.W. Katz, eds., Westview Press, Boulder,
CO, 1983.

Hunt, J.C.R. and P.J. Mulhearn, Turbulent dispersion from sources near
two-dimensional obstacles, J. Fluid Mech., 61: 245-274, 1973.

Hunt, J.C.R., J.S. Puttock and W.H. Snyder, Turbulent diffusion from a
point source in stratified and neutral flows around a three-dimensional
hill--Part I. Diffusion equation analysis, Atmospheric Environment,
13: 1227-1239, 1979.

Hunt, J.C.R., R.E. Britter and J.S. Puttock, Mathematical models of
dispersion of air pollution around buildings and hills, in: IMA Conference
on Mathematical Modelling of Turbulent Diffusion in the Environment,
C.J. Harris, ed. Academic Press, New York, 1979. pp 145-200.

Irwin, J.S., Estimating plume dispersion—a comparison of several sigma
schemes, J. of Climate and Applied Meteor., 22: 92-114, 1983.

Lamb, R.G. and D.L. Durran, Eddy diffusivities derived from a numerical
model of the convective boundary layer, Il Nuovo Cimento, 1: 1-17,
1978.

Mosteller, F. and J.W. Tukey, Data Analysis and Regression, Addison-Wesley
Publishing Co., Reading, MA, 1977. 588 pp.
-------
Pasquill, F., Atmospheric Dispersion Parameters in Gaussian Plume Modeling
Part II, U.S. EPA, EPA-600/4-76-030b, Env. Monitoring Series, U.S. EPA,
Research Triangle Park, NC, 1976. 44 pp.

Snyder, W.H., R. Britter and J.C.R. Hunt, "A fluid modeling study of the
flow structure and plume impingement on a three-dimensional hill in
stably stratified flow," in: J.E. Cermak, ed. Wind Engineering,
pp. 319-329. Pergamon Press, New York, 1980.

Snyder, W.H., Fluid modeling of terrain aerodynamics and plume dispersion,
paper in the Sixth Symposium on Turbulence and Diffusion, Preprint
volume, American Meteorological Society, Boston MA, 1983. pp. 317-320.

Wackter, D.J. and R.J. Londergan, Evaluation of Complex Terrain Air
Quality Simulation Models, EPA-450/4-84-017, Office of Air Quality
Planning and Standards, EPA, Research Triangle Park, NC 27711, 1984.
243 pp.
-------
REVIEW OF COMPLEX TERRAIN MODELS
Prepared for the AMS-EPA Steering Committee
by

William H. Snyder*
Meteorology and Assessment Division
Environmental Sciences Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711
July 1984
*On assignment from the National Oceanic and Atmospheric Administration,
U.S. Department of Commerce.
-------
REVIEW OF COMPLEX TERRAIN MODELS
INTRODUCTION

Eight complex terrain models are reviewed relative to their
scientific merits and their performance measures. Seven of the
models are based on Gaussian plume assumptions and one is a numer-
ical grid model. This review will attempt to assess the scientific
merits of the various models and explain their performance in terms
of those merits.

ASSESSING THE SCIENTIFIC FOUNDATIONS OF THE MODELS

Description of the Models

The seven Gaussian models have basically similar model compo-
nents. Five models use the wind speed at release height as input
whereas two (RTDM and SHORTZ) use a wind speed extrapolated from
release height to plume height. Four models use the Turner stabi-
lity categories to obtain the Pasquill-Gifford sigmas (except for
COMPLEX I, which uses 22.5° sector averaging for horizontal dispersion,
and COMPLEX/PFM, which uses different combinations of Pasquill-Gifford-Turner
(PGT) sigmas and sector averaging for different
stability categories). Two (RTDM and SHORTZ) use onsite turbulence
data to determine dispersion coefficients. One (PLUME5) obtains
stability categories from horizontal turbulence intensities and
time of day, then uses Pasquill-Gifford sigmas. All seven models
include buoyancy-induced vertical dispersion.
-------
Limits to vertical mixing are identical in five models and very
similar in a sixth model (SHORTZ). RTDM, however, uses a reflection
factor for terrain that is a function of the slope of the terrain;
for flat terrain, this method defaults to full reflection as is
customarily assumed, but for sloping terrain, it permits partial
plume reflection.

Larger differences appear in the Gaussian models in their calculation
of plume rise. COMPLEX I and II use Briggs' (1975) final plume
rise, including momentum. PLUME5 uses Briggs' final rise with a
determination of stable layer penetration. SHORTZ uses a modification
of Briggs' (1971, 1972) final rise with hourly temperature gradients
for plumes rising under stable conditions. 4141 uses Briggs'
(1975) transitional rise. RTDM also uses Briggs' transitional
rise, but with hourly temperature gradients in stable conditions.
Finally, COMPLEX/PFM uses a modification of Briggs' layered rise
with allowances for reductions in plume height due to streamline
deformations. Five of the models allow for stack tip downwash,
while two (4141 and PLUME5) do not.

Major differences in the models appear in the manner in
which terrain impaction is treated. COMPLEX I and II use terrain
adjustment factors of 0.5 (plume half-height assumption) for neutral
and unstable flows and zero (with a standoff distance of 10 m) for
stable flows. COMPLEX/PFM performs the same as COMPLEX I and II
except in the narrow range of stable conditions when the plume
is above the dividing-streamline height. Under these conditions,
the plume centerline follows calculated streamlines with allowances
being made for stability (Froude number) and hill shape (crosswind
aspect ratio). 4141 uses the half-height assumption for neutral and
unstable conditions and a quarter-height assumption for stable
conditions. PLUME5 uses a variant of the half-height assumption.
SHORTZ assumes plume impingement within the mixed layer. RTDM
uses the half-height assumption for all neutral and unstable
conditions and also for stable conditions when the plume height
exceeds the dividing-streamline height. It assumes impingement
when the plume is below the dividing-streamline height.

The three-dimensional grid model, IMPACT, interpolates and
extrapolates wind measurements at multiple sites to create a
divergence-free wind field. It uses diffusivities derived from
Smith and Howard's (1972) empirical formulations to obtain finite
difference solutions to the diffusion equation. The plume path is
determined through the computed wind fields, with Briggs' layered
plume rise including penetration of stable layers.

Evaluation of the Models

Probably the most crucial question concerning prediction of
concentrations in complex terrain is whether the plume impinges
on the terrain or surmounts the hill top. Plumes are allowed to
impinge on terrain in all models except 4141 and PLUME5. However,
the conditions under which plume impaction is allowed to occur are
generally much too broad and, under impingement conditions, it
would appear that most of the treatments overestimate surface
concentrations.

From laboratory studies (Hunt and Snyder, 1980; Snyder et al.,
1984) and field studies (Lavery et al., 1982), we know that plume
impaction can and does occur. These studies have shown that plume
impingement occurs only under strongly stable conditions. The
criterion for impingement should be based on a Froude number based
upon the approach flow wind speed, the temperature gradient from
the base to the top of the hill, and the hill height; more appro-
priately, the criterion should be based upon the dividing-stream-
line height (Snyder et al., 1985), which allows for arbitrary
shapes of wind and temperature profiles. The assumption of plume
impingement under all stable conditions is simply incorrect.
Hence, COMPLEX I and II can be expected to predict plume impingement
much too often.
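
The two forms of the impingement criterion discussed above can be sketched as follows. The bulk form assumes a uniform approach wind U and a constant buoyancy frequency N, for which the dividing-streamline height is commonly written as Hc = h(1 - Fr) with Fr = U/(Nh). The profile form uses the energy balance 0.5 U(Hc)^2 = integral from Hc to h of N^2(z)(h - z) dz. Both expressions are given here as they are usually quoted for stratified flow over hills and should be checked against the cited papers before use; the function names, the 293 K reference temperature, the one-meter grid, and the sample case are assumptions made for illustration.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def brunt_vaisala(dtheta_dz, theta=293.0):
    """Buoyancy frequency N (1/s) from a potential temperature gradient (K/m).

    Returns 0.0 for neutral or unstable stratification.
    """
    if dtheta_dz <= 0.0:
        return 0.0
    return math.sqrt(G / theta * dtheta_dz)

def hc_bulk(wind_speed, n, hill_height):
    """Bulk estimate: Hc = h * (1 - Fr), with Fr = U / (N * h)."""
    if n <= 0.0:
        return 0.0  # no stable stratification: all flow surmounts the hill
    froude = wind_speed / (n * hill_height)
    return max(0.0, hill_height * (1.0 - froude))

def hc_integral(z, u, n2, hill_height):
    """Profile estimate on a grid z[0] < ... < z[-1] == hill_height.

    Scans downward from the hilltop and returns the lowest grid level at
    which 0.5 * u**2 still covers the integral of N^2 * (h - z') from that
    level to the top (trapezoid rule). Resolution is one grid interval.
    """
    potential = 0.0
    for i in range(len(z) - 1, 0, -1):
        upper = n2[i] * (hill_height - z[i])
        lower = n2[i - 1] * (hill_height - z[i - 1])
        potential += 0.5 * (upper + lower) * (z[i] - z[i - 1])
        if 0.5 * u[i - 1] ** 2 < potential:
            return z[i]
    return 0.0

def plume_impinges(plume_height, hc):
    """Impingement is expected only for plumes below the dividing streamline."""
    return plume_height < hc

# Hypothetical case: 100 m hill, uniform 2 m/s wind, 0.02 K/m gradient.
h = 100.0
n = brunt_vaisala(0.02)
levels = [float(k) for k in range(0, 101)]
hc_b = hc_bulk(2.0, n, h)
hc_i = hc_integral(levels, [2.0] * len(levels), [n * n] * len(levels), h)
```

For uniform profiles the two estimates agree to within the grid spacing, which is a useful check before the profile form is applied to measured, non-uniform wind and temperature profiles.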

Under impingement conditions, COMPLEX II simply calculates
surface concentrations with a bivariate Gaussian distribution and a
10 m standoff distance, thus allowing for no terrain effects on
plume diffusion and no wind meander. Hence, we may expect COMPLEX
II to grossly overestimate surface concentrations because (1) it
predicts impingement conditions much too often and (2) under impingement,
it does not allow for plume meander, deformation, or
increased diffusion.

COMPLEX I will, in general, predict lower concentrations than
does COMPLEX II because it performs a 22.5° sector averaging for
horizontal dispersion. However, laboratory studies (Snyder and
Lawson, 1981) and field experiments (Strimaitis et al., 1983) sug-
gest that such sector averaging is inappropriate because of (2)
above. Hence, COMPLEX I may also be expected to grossly overestimate
surface concentrations.
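
The reduction that sector averaging produces relative to a centerline Gaussian value can be estimated with the textbook forms of the two crosswind terms: 1/(sqrt(2 pi) sigma_y) on the plume centerline, and 1/(delta_theta x) when the plume is spread uniformly over a sector of angular width delta_theta at distance x. The text does not state that COMPLEX I and COMPLEX II use exactly these expressions, so the sketch below is an assumption-laden illustration; the function name and the sigma_y value are placeholders.

```python
import math

SECTOR_WIDTH_RAD = math.radians(22.5)  # 22.5 degree sector

def sector_to_centerline_ratio(sigma_y, distance, sector_width=SECTOR_WIDTH_RAD):
    """Ratio of a sector-averaged concentration to the Gaussian centerline value.

    sigma_y and distance are in the same length units.
    """
    if sigma_y <= 0.0 or distance <= 0.0:
        raise ValueError("sigma_y and distance must be positive")
    return math.sqrt(2.0 * math.pi) * sigma_y / (sector_width * distance)

# Placeholder lateral spread of 100 m at a distance of 1000 m.
ratio = sector_to_centerline_ratio(100.0, 1000.0)
```

The ratio is below one whenever the plume is narrow compared with the sector arc, and it changes with distance because sigma_y does not grow in proportion to x.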

COMPLEX/PFM moves one step in the right direction by determin-
ing the hill Froude number and thus is more likely to predict the
proper conditions for plume impingement. However, under impinge-
ment conditions, COMPLEX/PFM calculates concentrations in the same
manner as does COMPLEX I, i.e., 22.5° sector horizontal averaging.
Hence, COMPLEX/PFM may also be expected to overestimate concentra-
tions under impingement conditions because of (2) above.

SHORTZ treats plume impaction from an entirely different point
of view than do the other models. In SHORTZ, a plume must be
contained within a surface mixing layer if it is to cause signifi-
cant ground-level concentrations at any point. Hence, plume impac-
tion, in the SHORTZ definition, may occur under any stability
condition but, in the now customary definition of plume impaction,
will not occur under stable conditions unless a surface mixing
layer exists that is deep enough to include the plume. As a prac-
tical matter, however, the depth of the surface mixing layer is
defined as the height at which the vertical turbulence intensity
drops below 0.01, so that plumes in complex terrain are almost
always within the surface mixing layer (see later discussion).
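
The mixing-layer rule just described can be written as a short sketch. Only the 0.01 threshold comes from the text; the function names, the treatment of a profile that never drops below the threshold, and the sample profile are assumptions made for illustration and are not taken from the SHORTZ code.

```python
def surface_mixing_depth(heights, vertical_intensity, threshold=0.01):
    """Lowest height at which the vertical turbulence intensity is below threshold.

    Returns None when the intensity stays at or above the threshold over the
    whole measured profile.
    """
    for z, intensity in sorted(zip(heights, vertical_intensity)):
        if intensity < threshold:
            return z
    return None

def plume_in_mixing_layer(plume_height, depth):
    """True when the plume lies below the mixing depth (or no top was found)."""
    return depth is None or plume_height < depth

# Hypothetical tower profile: heights in meters, intensities dimensionless.
heights = [10.0, 50.0, 100.0, 150.0]
intensity = [0.08, 0.05, 0.02, 0.008]

depth = surface_mixing_depth(heights, intensity)
```

With a threshold as low as 0.01, a profile has to be very quiet before a plume is placed above the mixing layer.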
-------
SHORTZ uses onsite turbulence data to obtain dispersion
coefficients, which is certainly an improvement over the PGT
scheme, and so SHORTZ may be expected to perform better relative
to, say, COMPLEX I or II. Also, because the model has been tested
and developed with the use of numerous data sets, we may expect its
performance overall to be relatively better than the COMPLEX (I,II,
PFM) models.

RTDM uses the dividing-streamline concept to predict plume
impaction, although it uses a bulk parameterization (Froude number)
as opposed to the more refined integral formula of Snyder et al.
(1984). It also uses onsite turbulence data to obtain dispersion
coefficients. Thus, we may expect this model to perform better
than any of the other models under stable plume impaction conditions.

IMPACT assigns "transparencies" to the horizontal and vertical
cell faces in order to accommodate the effects of atmospheric stability.
These transparencies were developed on the basis of simulations
of idealized problems. It is not known whether or not such
assignments are realistic. However, the assignments are based
on the PGT classification system which, as mentioned previously,
is much too broad a classification scheme to use in predicting plume
impaction. The diffusivities used in IMPACT are also determined
through the PGT scheme and not through onsite turbulence data.

One class of flows that needs to be dealt with separately is
the stable condition where the plume surmounts the hill top. This
includes strongly stable flows where the plume is released above
the dividing-streamline height as well as moderately stable flows
where all the flow surmounts the hill top (dividing-streamline height
is zero). This class of flows is not dealt with at all by COMPLEX I
or II. COMPLEX/PFM makes the most noble attempt of all the models
for treating this type of flow; it allows for a deformation of
streamlines (closeness of approach of the plume to the terrain)
as a function of stability (Froude number) and terrain shape (cross-
wind aspect ratio). It is the only Gaussian model that allows for
differences in terrain shape, which is known to have a strong
influence on the plume trajectory and hence surface concentrations.
In principle, COMPLEX/PFM should perform better than the other
Gaussian models in this class of flows, although the algorithms
for dealing with the adjustments for terrain shape and stability
could certainly be improved.

4141 uses a "quarter-height plume" assumption for all stable
conditions, irrespective of the plume height or terrain shape. RTDM
uses a "half-height plume" assumption for all stable conditions
where the plume height exceeds the dividing-streamline height.
Indeed, when the dividing-streamline height HQ is not zero, RTDM
treats the flow as if the ground surface were located at the dividing-
streamline height, and the "half-height" is calculated with refer-
ence to this pseudo ground surface. This treatment appears to be
physically sound (Snyder and Hunt, 1984; Snyder and Lawson, 1984).
PLUME5 uses a conservative modification to the half-height assumption.
Whereas the RTDM and PLUME5 methods are certainly improvements
over the 4141 method, they are clearly not as physically
sound as the COMPLEX/PFM method.

EVALUATION OF MODEL PERFORMANCE

The statistical performance measures are quite illuminating
and do, in general, illustrate that the soundness of the physics
improves model performance.

Many performance measures are presented in the TRC report
(Wackter and Londergan, 1984) and it is difficult to know which
measures are most important. A broad-brush look at all of the
measures for the various types of comparisons suggests that the
models should be ranked in the following order (best to worst) for
the Westvaco comparison:
RTDM
4141
SHORTZ
PLUME5
COMPLEX/PFM
COMPLEX I
COMPLEX II
IMPACT

And for the Cinder Cone Butte comparison:

RTDM
IMPACT
COMPLEX I
SHORTZ
COMPLEX/PFM
PLUME5
4141
COMPLEX II

It should be pointed out that these rankings are not hard and
fast. None of the models scored at the same rank for all measures
and for all subsets of data. However, generally speaking, rankings
based on one measure were surprisingly close to those based on other
measures, e.g., if the average difference was small, then the rms
error was also small, and the correlation coefficient was generally
higher (relatively speaking). Hence, the weighting of the relative
importance of the various measures was not highly significant.
These performance rankings are essentially independent of the
averaging times, as may be expected.

COMPLEX II shows the most consistent and pronounced tendency
to overpredict concentrations, for essentially all data sets and all
types of comparisons. This presumably results from the overprediction
of impingement conditions and overprediction of concentrations
under impingement conditions. This is borne out by the
tables of Appendix B, where concentrations are grossly overpredict-
ed under stability conditions E and F.

It is not clear why the IMPACT model performed much better
(relative to the other models) on the Cinder Cone Butte (CCB)
comparison than on the Westvaco comparison. One possible reason is
the plume rise algorithm, since plume rise was not a factor in the
CCB data but was in the Westvaco data. Much more likely, however,
is that the grid covering the Westvaco terrain was much too small
to adequately resolve the flow field in this "sea of mountains".
Whereas the grid for the CCB terrain allowed a sufficient area (of
flat land) surrounding the hill, the Westvaco grid was only 2.6 x 3
km in area and could not allow for the effects of the surrounding
high terrain. Also not clear is why 4141 and PLUME5 performed well
on the Westvaco comparison, but poorly on the CCB comparison. The
most likely reason is that the CCB data are primarily for stable
conditions whereas the Westvaco data cover all stability conditions.
As pointed out previously, neither of these models allows for terrain
impaction, so they might be expected to underpredict concentrations
under strong stability. The tables in Appendices B and C, however,
do not support this notion.

The fact that COMPLEX/PFM performed better than COMPLEX I and
II for the Westvaco comparison suggests that the adjustments for
streamline deformations when the plume is above the dividing-
streamline height under neutral and stable conditions were worth-
while. The much poorer performance of COMPLEX/PFM compared with
COMPLEX I for the CCB comparison, however, suggests that those
algorithms for streamline deformations as functions of terrain
shape and stability need additional work.
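
The dividing-streamline height mentioned above is not defined in this review. As an illustration only, the sketch below uses the simplified form for a uniform approach flow with constant stratification, H_c = h (1 - Fr) with Fr = U / (N h), which is the idealized case treated by Hunt and Snyder (1980). The function names and the input values are hypothetical, and none of the evaluated models is claimed to compute the height this way.

```python
def dividing_streamline_height(hill_height_m, wind_speed_ms, brunt_vaisala_per_s):
    """Simplified dividing-streamline height H_c = h * (1 - Fr).

    Assumes a uniform approach wind speed U and a constant Brunt-Vaisala
    frequency N, so that Fr = U / (N * h). This is an idealization; real
    profiles generally require an integral energy balance instead.
    """
    if hill_height_m <= 0:
        raise ValueError("hill height must be positive")
    if brunt_vaisala_per_s <= 0:
        # Neutral or unstable flow: no stratification, so no dividing streamline.
        return 0.0
    froude = wind_speed_ms / (brunt_vaisala_per_s * hill_height_m)
    # For Fr >= 1 all of the approach flow has enough energy to surmount the hill.
    return max(0.0, hill_height_m * (1.0 - froude))


def plume_position(release_height_m, h_c_m):
    """Classify a release relative to the dividing-streamline height."""
    return "above" if release_height_m > h_c_m else "below"


# Hypothetical values, chosen only to exercise the functions.
h_c = dividing_streamline_height(100.0, 2.0, 0.04)
print(h_c, plume_position(30.0, h_c), plume_position(70.0, h_c))
```

A release classified as "above" is the case to which the streamline-deformation adjustments discussed above apply; a release classified as "below" is the case in which the plume may impact on the hill.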

With regard to the predictions of SHORTZ, the user's manual
leads the reader to believe that with a ground-based inversion
(small mixing depth) and with the plume above the mixing depth,
plumes do not contribute significantly to ground-level concentra-
tions at any receptor. The model thus appears to allow impaction
under neutral and even unstable conditions (level plume within
mixed layer), but not under the most strongly stable conditions.
This algorithm is diametrically opposite to current understanding
of flows in complex terrain, where plume impaction occurs only
under strongly stable conditions. However, the surface mixing layer
in this model is defined as the height where the vertical turbulence
intensity drops below 0.01. Study of the Modeler's Data Archive
from Cinder Cone Butte showed that the vertical turbulence intensity
at plume elevation was less than 0.01 in only 1 hour of the total
111 hours. Perusal of one year of Westvaco data also showed that,
at typical plume elevations, the vertical turbulence intensity was
less than 0.01 only 0.75% of the time (Cimorelli, private
communication). In practical terms, therefore, the plumes are "always"
within the surface mixing layer and are allowed to impact on the
hills. This is supported by the tables in the appendices, which show
that SHORTZ overpredicts most strongly under stability class F. The
end result is that, for stable conditions, SHORTZ uses level-plume
trajectory assumptions similar to those of COMPLEX I and II. The fact
that SHORTZ performs better than COMPLEX I and II suggests that the
use of on-site turbulence data for the computation of dispersion
coefficients is helpful.
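
The check described above, counting how often the vertical turbulence intensity at plume elevation falls below the 0.01 threshold, can be sketched as follows. This is a minimal illustration and not SHORTZ code: the function name is hypothetical and the hourly values are synthetic, chosen only to reproduce the 1-in-111 count quoted for Cinder Cone Butte.

```python
def fraction_below_threshold(turbulence_intensities, threshold=0.01):
    """Fraction of hours in which the vertical turbulence intensity at
    plume elevation is below the threshold, i.e. hours in which the plume
    would be treated as lying above the surface mixing layer."""
    values = list(turbulence_intensities)
    if not values:
        raise ValueError("at least one hourly value is required")
    hours_below = sum(1 for value in values if value < threshold)
    return hours_below / len(values)


# Synthetic series: 1 hour below the threshold out of 111 hours.
hourly_values = [0.005] + [0.05] * 110
fraction = fraction_below_threshold(hourly_values)
print(f"{100.0 * fraction:.2f}% of hours below threshold")
```

With a fraction this small, the plume is inside the model's mixing layer in nearly every hour, which is the basis for the statement that impaction is "always" allowed.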

The comparisons between the predicted and observed concentrations
paired in time and location showed significantly smaller discrepancies
and significantly higher correlations for the Cinder Cone Butte data
set than for the Westvaco data set. I believe this is due to the more
refined meteorological data collected at Cinder Cone Butte and to the
better control maintained during the experiment. For example, wind
directions contained in the Modeler's Data Archive were derived from
interpolations of tower wind measurements (to plume elevation) as well
as lidar and photographic observations of plume position. Similarly,
the single high tower at CCB provided a much better characterization
of the relatively homogeneous approach flow there as compared with the
one or two measurements per tower at each of three towers scattered
over the Westvaco site. Also, plume rise was known (zero) at CCB and
unknown (estimated by the models) at Westvaco. These remarks are not
intended in any way to denigrate the measurement program at Westvaco,
but rather to make the point that more accurate and comprehensive
input data do indeed significantly improve model performance.
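
For readers unfamiliar with comparisons paired in time and location, the sketch below computes three generic measures for such pairs: mean bias, root-mean-square error, and the Pearson correlation coefficient. These are assumed here as representative measures; the exact statistics tabulated for this evaluation are not reproduced in this section. The function name and the sample concentrations are hypothetical.

```python
import math


def paired_statistics(predicted, observed):
    """Bias, RMSE and Pearson correlation for concentrations paired in
    time and location (one predicted value per observed value)."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    n = len(predicted)
    differences = [p - o for p, o in zip(predicted, observed)]
    bias = sum(differences) / n  # positive bias means overprediction
    rmse = math.sqrt(sum(d * d for d in differences) / n)

    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    # Correlation is undefined when either series is constant.
    if var_p == 0 or var_o == 0:
        correlation = float("nan")
    else:
        correlation = cov / math.sqrt(var_p * var_o)
    return {"bias": bias, "rmse": rmse, "correlation": correlation}


# Hypothetical paired concentrations (arbitrary units).
stats = paired_statistics([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(stats)
```

In this synthetic case the predictions are exactly twice the observations, so the correlation is perfect even though the bias is large. This is one reason the discrepancy and correlation measures need to be read together.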

I wish finally to comment that the wealth of statistics generated
is useful, but so voluminous as to be difficult to digest. Neverthe-
less, it would be desirable to see even more detailed depictions of
the results through various types of graphical displays such as scatter
plots in order to isolate the causes of poor model performance. The
statistics by themselves only suggest possible causes. Also, the sub-
groupings are not divided finely enough to isolate particular causes.
On the other hand, as pointed out by Wackter and Londergan (1984), it is
difficult to select meaningful graphical and tabular displays with a
limited report space. But even a modest amount of additional informa-
tion could further overwhelm a reviewer. I therefore strongly advocate
specific case studies as suggested by Irwin and Smith (1984), to eval-
uate specific strengths and weaknesses in both the data and modeling
assumptions. I believe such case studies would be a valuable supple-
ment to the purely statistical approach employed here.
-------
REFERENCES

Briggs, G.A., 1971: Some Recent Analyses of Plume Rise Observations.
In: Proceedings of the Second International Clean Air Congress, Academic
Press, New York.

Briggs, G.A., 1972: Chimney Plumes in Neutral and Stable Surroundings,
Atmos. Envir., v. 6, p. 507-510.

Briggs, G.A., 1975: Plume Rise Predictions. In: Lectures on Air
Pollution and Environmental Impact Analyses, Amer. Meteorol. Soc.,
Boston, MA.

Hunt, J.C.R. and Snyder, W.H., 1980: Experiments on Stably and Neut-
rally Stratified Flow over a Model Three-Dimensional Hill, J. Fluid
Mech., v. 96, p. 671-704.

Irwin, J. and Smith, M., 1984: Potentially Useful Additions to the
Rural Model Performance Evaluation, Bull. Amer. Meteorol. Soc., v.
65, p. 559-568.

Smith, F.B., and Howard, S.M., 1972: Methodology for Treating
Diffusivity, Meteorology Research, Inc. (MRI) Publication FR-1020.

Snyder, W.H. and Hunt, J.C.R., 1984: Turbulent Diffusion from a Point
Source in Stratified and Neutral Flows around a Three-Dimensional
Hill; Part II: Laboratory Measurements of Surface Concentrations,
Atmos. Envir., v. 18, p. 1969-2002.

Snyder, W.H. and Lawson, R.E. Jr., 1981: Laboratory Simulation of
Stable Plume Dispersion over Cinder Cone Butte: Comparison with
Field Data, Appendix, EPA Complex Terrain Model Development First
Milestone Report - 1981, Rpt. No. EPA-600/3-82-036, Envir. Prot.
Agcy., Res. Tri. Pk., NC, p. 250-304.

Snyder, W.H. and Lawson, R.E., Jr., 1984: Stable Plume Dispersion
over an Isolated Hill: Releases above the Dividing-Streamline Height,
Appendix, EPA Complex Terrain Model Development Fourth Milestone
Report - 1984 (in review), Envir. Prot. Agcy., Res. Tri. Pk., NC.

Snyder, W.H., Thompson, R.S., Eskridge, R.E., Lawson, R.E., Jr.,
Castro, I.P., Lee, J.T., Hunt, J.C.R. and Ogawa, Y., 1985: The
Structure of Strongly Stratified Flow over Hills: Dividing-Stream-
line Concept, J. Fluid Mech. (to appear).

Strimaitis, D.G., Venkatram, A., Greene, B.R., Hanna, S., Heisler,
S., Lavery, T.F., Bass, A., and Egan, B.A., 1983: EPA Complex
Terrain Model Development Second Milestone Report - 1982, Rpt. No.
EPA-600/3-83-015, Envir. Prot. Agcy., Res. Tri. Pk., NC, 375p.

Wackter, D.J. and Londergan, R.J., 1984: Evaluation of Complex
Terrain Air Quality Models, Rpt. to Envir. Prot. Agcy. under Con-
tract No. 68-02-3514, Res. Tri. Pk., NC, 233p.
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)

1. REPORT NO.
2.
3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE
   SUMMARY OF COMPLEX TERRAIN MODEL EVALUATION
5. REPORT DATE
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
   Fred D. White, Jason K. S. Ching, Robin L. Dennis,
   and William H. Snyder
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
   American Meteorological Society, Boston, MA
   Meteorology and Assessment Division
   Research Triangle Park, NC 27711
10. PROGRAM ELEMENT NO.
   CDWA1A/02 0279 (FY-85)
11. CONTRACT/GRANT NO.
   CR 810297 and Inhouse
12. SPONSORING AGENCY NAME AND ADDRESS
   Atmospheric Sciences Research Laboratory - RTP, NC
   Office of Research and Development
   U.S. Environmental Protection Agency
   Research Triangle Park, NC 27711
13. TYPE OF REPORT AND PERIOD COVERED
   Interim (FY-84/85)
14. SPONSORING AGENCY CODE
   EPA/600/09
15. SUPPLEMENTARY NOTES
16. ABSTRACT
   The Environmental Protection Agency conducted a scientific review of a set of
   eight complex terrain dispersion models. TRC Environmental Consultants, Inc.
   calculated and tabulated a uniform set of performance statistics for the models
   using the Cinder Cone Butte and Westvaco Luke Mill data bases. Three members of
   the EPA Meteorology and Assessment Division reviewed the performance statistics
   and presented objective analyses of the models and their performance. An American
   Meteorological Society Steering Committee summarized the reviews and formulated
   three conclusions: (1) none of the models can be regarded as up-to-date
   scientifically; (2) one model exhibited much better performance statistics than
   did the others; and (3) overprediction was the most common problem with the
   models. This report consists of the AMS summary and copies of three independent
   reviews conducted to evaluate the model performance.
17. KEY WORDS AND DOCUMENT ANALYSIS
   a. DESCRIPTORS
   b. IDENTIFIERS/OPEN ENDED TERMS
   c. COSATI Field/Group
18. DISTRIBUTION STATEMENT
   RELEASE TO PUBLIC
19. SECURITY CLASS (This Report)
   UNCLASSIFIED
20. SECURITY CLASS (This page)
   UNCLASSIFIED
21. NO. OF PAGES
22. PRICE

EPA Form 2220-1 (Rev. 4-77) PREVIOUS EDITION IS OBSOLETE
-------