7.4
AN AGGREGATION AND EPISODE SELECTION SCHEME
              FOR EPA'S MODELS-3 CMAQ
             Brian K. Eder***, Richard D. Cohn, Sharon K. LeDuc** and Robin L. Dennis*1*1

                                   *Air Resources Laboratory
                         National Oceanic and Atmospheric Administration
                              Research Triangle Park, North Carolina

                                   " Analytical Sciences, Inc.
                                    Durham, North Carolina
1. INTRODUCTION

     In support of studies mandated by the 1990
Clean   Air  Act  Amendments,  the  Models-3
Community Multiscale Air Quality  (CMAQ) model
was developed by the U.S. Environmental Protection
Agency's Atmospheric Modeling Division (Byun and
Ching 1999). CMAQ, which is a powerful, new "third
generation"  model,  is  used  to simulate  air
concentrations  and  deposition  of tropospheric
ozone, acidic deposition, visibility  and particulate
matter associated with specified levels of emissions.
These simulations are then  used by EPA Program
Offices and research laboratories  to support both
regulatory assessment  and scientific studies on a
myriad of scales ranging from urban to continental.
  These assessment studies often require CMAQ-
based simulations on seasonal and even annual
time frames.   Unfortunately, it is  computationally
expensive to execute CMAQ, which is designed for
episode simulation,  over such long time  frames.
Therefore, in practice, CMAQ must be executed for
a finite number of episodes or events that are
selected to represent a wide variety of
meteorological classes or clusters. A statistical
procedure called aggregation must then be applied
to the outputs from CMAQ in order to derive the
requisite seasonal and annual average estimates.
  This paper  describes  the development of the
aggregation  and episode selection scheme and
provides an evaluation of its effectiveness using the
light-extinction coefficient (bext), which is  used to
characterize visibility.

1+1 On assignment to the National Exposure Research
Laboratory, United States Environmental Protection
Agency, RTP, NC 27711

*  Corresponding author address: Brian K. Eder,
AMD/ARL/NOAA, Mail Drop 80, RTP, NC 27711; e-mail:
eder@hpcc.epa.gov.
2. SCHEME DEVELOPMENT

2.1 Meteorological classification

  The first step in the aggregation and episode
selection scheme is the identification and
characterization of representative meteorological
clusters. Following the work of Brook et al. (1995),
the 700 mb wind field, as defined by the zonal (u) and
meridional (v) wind components, was clustered using
Ward's method of cluster analysis. Ward's method
was chosen because it minimizes within-cluster
sums of squares in a process that is agglomerative
(i.e., moving from many clusters toward fewer
clusters) and hierarchical (i.e., once clusters are
joined they cannot be separated). The 700 mb level was
selected (as opposed to 850 mb, which may be
more representative of the boundary layer) in
consideration of the high mountainous terrain of the
western United States. The data, which cover the
nine-year period 1984-1992, were obtained
from the National Centers for Environmental
Prediction/National Center for Atmospheric
Research reanalysis project (Kalnay et al. 1996). To
conform to normal CMAQ runs, which typically
simulate five-day episodes, the wind fields were
grouped into running five-day periods prior to being
clustered. Also, to accommodate the continental
domain while achieving adequate spatial resolution,
336 grid nodes with 2.5° spatial resolution were
used in the clustering.
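The clustering step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function names `running_windows` and `ward_cluster` are hypothetical, the toy wind values are made up, and joining the five days of a period into one long vector is an assumption (the text says only that the fields were "grouped"). In the study each day would carry u and v at 336 grid nodes rather than at the single node used here.

```python
def running_windows(daily_fields, length=5):
    """Concatenate each run of `length` consecutive daily fields into one vector."""
    windows = []
    for start in range(len(daily_fields) - length + 1):
        vec = []
        for day in daily_fields[start:start + length]:
            vec.extend(day)
        windows.append(vec)
    return windows


def ward_cluster(points, n_clusters):
    """Naive agglomerative clustering using Ward's minimum-variance criterion.

    Returns a list of clusters, each a sorted list of indices into `points`.
    """
    # Track every cluster as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Growth in the within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, centroid)  # merged clusters are never split again
        del clusters[b]
    return [sorted(members) for members, _ in clusters]


# Toy data (made up): 7 days at one grid node, each day stored as [u, v].
daily = [[10.0, 0.0], [11.0, 1.0], [10.0, -1.0], [9.0, 0.0],
         [10.0, 1.0], [0.0, 10.0], [1.0, 11.0]]
windows = running_windows(daily)    # three overlapping five-day vectors
groups = ward_cluster(windows, 2)   # index groups of similar windows
print(len(windows), groups)
```

The pairwise search is cubic in the number of periods, so it suits only small illustrations; a production analysis over nine years of data would normally rely on an optimized library routine.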
  Numerous alternative schemes were compared
using statistical and meteorological considerations
until an optimal scheme was developed that included
20 clusters, assigned five per season (Cohn et al.
1999). An example of the wind regime associated
with one particular cluster (4) is shown in Fig. 1.
This cluster, which accounted for 25.7% of all five-day
winter events, depicts a typical high-amplitude flow
with a western ridge and eastern trough.

-------
           Fig. 1. Mean wind vectors for seasonally defined (winter) cluster 4 (of 20).
2.2 Episode selection and aggregation

   From  these  20  homogeneous  meteorological
clusters, a stratified sample of 40 events was then
randomly selected  using a systematic sampling
technique (without replacement) to ensure adequate
representation over the entire nine year period.
These 40 events can then be aggregated into the
desired  seasonal  and  annual time frames. The
sample of 40 recommended simulation events
includes representation of every month (ranging
from one event in February to six events in January)
and every year during the period (ranging from three
events in 1984 and 1988 to six events in 1990).
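A stratified, systematic draw of this kind might look like the following sketch. It is an illustration under stated assumptions rather than the procedure actually used: the names `systematic_sample` and `stratified_sample`, the cluster labels, the event identifiers and the per-cluster allocation are all hypothetical, since the paper does not specify how the 40 events were allotted among the 20 clusters.

```python
import random


def systematic_sample(events, n, rng):
    """Draw n distinct items from the ordered list `events`.

    A random start is chosen within the first sampling interval and items are
    then taken at a fixed spacing, which spreads the draws over the whole record.
    """
    if not 0 < n <= len(events):
        raise ValueError("n must be between 1 and len(events)")
    step = len(events) / n
    start = rng.random() * step
    last = len(events) - 1
    return [events[min(int(start + k * step), last)] for k in range(n)]


def stratified_sample(events_by_cluster, allocation, seed=0):
    """Apply systematic sampling separately within each cluster (stratum)."""
    rng = random.Random(seed)
    return {
        cluster: systematic_sample(sorted(events), allocation[cluster], rng)
        for cluster, events in events_by_cluster.items()
    }


# Hypothetical strata: event identifiers are start-day indices in the record.
events_by_cluster = {"winter-1": list(range(0, 60)),
                     "winter-2": list(range(100, 130))}
allocation = {"winter-1": 3, "winter-2": 1}  # assumed, not from the paper
sample = stratified_sample(events_by_cluster, allocation, seed=42)
print(sample)
```

Sorting the events chronologically before the draw is what lets the fixed spacing translate into coverage of the full nine-year period.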
   In practice, the aggregation calculations will be
applied to model-based estimates obtained for each
sampled  event  in order  to  achieve unbiased
estimates for seasonal and annual means within
each model grid cell. In essence, the aggregation
calculations simply provide  weighted means from
our sample of 40 events.
   As an illustration of the aggregation approach,
consider the estimation of a mean annual
concentration using model output for the 40 events
selected above. These events represent the 20
meteorological clusters. Let $f_i$ denote the frequency
of occurrence associated with cluster $i$ (i.e., the
total number of events belonging to the cluster
during the period 1984-92). For an individual grid
cell, also let $\bar{c}_i^{MODEL}$ represent the mean
model-based concentration associated with all
sampled events from cluster $i$.
Thus, for clusters with a single sampled event, it is
just the event mean concentration in the grid cell.
For clusters with two or three sampled events, it is
the mean concentration for all of those events. Then
an unbiased estimate of the annual mean
concentration is given by:

$$ \bar{c} = \frac{\sum_{i=1}^{20} f_i \, \bar{c}_i^{MODEL}}{\sum_{i=1}^{20} f_i} \qquad (1) $$

Estimates for other parameters (e.g., dry deposition)
and other summary statistics are calculated using
similar techniques.
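For a single grid cell, the frequency-weighted mean of Equation (1) reduces to a few lines of Python. The sketch below is illustrative only: the function names `cluster_mean` and `aggregate_mean` are hypothetical, and the frequencies and concentrations are made-up numbers for three clusters instead of the 20 used in the scheme.

```python
def cluster_mean(event_means):
    """Mean concentration over the sampled events of one cluster."""
    return sum(event_means) / len(event_means)


def aggregate_mean(frequencies, cluster_means):
    """Frequency-weighted mean of the cluster means, as in Equation (1)."""
    if len(frequencies) != len(cluster_means):
        raise ValueError("need one frequency per cluster mean")
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    weighted = sum(f * c for f, c in zip(frequencies, cluster_means))
    return weighted / total


# Made-up example: three clusters with one, two and three sampled events.
frequencies = [120, 60, 20]  # events per cluster over the full record
means = [cluster_mean([4.0]),
         cluster_mean([6.0, 8.0]),
         cluster_mean([9.0, 10.0, 11.0])]
print(aggregate_mean(frequencies, means))
```

Weighting by the cluster frequency rather than by the number of sampled events is what keeps rarely occurring regimes from being over-represented in the seasonal or annual mean.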

-------
3. EVALUATION

  In order to illustrate and evaluate the aggregation
and episode selection technique, comparisons were
made between the actual mean extinction coefficient
(bext) observed at over 200 stations nationwide for
the period 1984-1992 and the aggregated estimates
of that mean using the stratified sample of events
discussed in Section 2.2. The bext (units of km-1)
was selected as the evaluative parameter for several
reasons. First, it serves well as a surrogate for fine
particulate matter (PM2.5), for which little
observational data exist. Second, of all of the air
quality parameters simulated by CMAQ, the bext
has one of the most spatially and temporally
comprehensive observational data sets available.
And finally, the visual range $V_r$ (km) can be
estimated from bext by using the Koschmieder
equation:

$$ V_r = \frac{3.91}{b_{ext}} \qquad (2) $$

Observations with precipitation or a relative humidity
greater than 90% were omitted.
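The Koschmieder relation in Equation (2) is simple enough to check by hand; a short Python sketch follows. The function name `visual_range_km` and the sample extinction values are illustrative assumptions, not part of the original analysis.

```python
def visual_range_km(b_ext):
    """Visual range (km) from the light-extinction coefficient (km^-1)."""
    if b_ext <= 0:
        raise ValueError("b_ext must be positive")
    return 3.91 / b_ext


# Illustrative values only: larger extinction means shorter visual range.
for b_ext in (0.05, 0.1, 0.5):
    print(b_ext, round(visual_range_km(b_ext), 1))
```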
  As part of the evaluation, the percent deviations in
aggregated estimates of the mean bext (where the
deviations are relative to the observed mean) were
calculated over the period 1984-1992 and are
presented in Fig. 2.
   For the most part, the evaluation revealed very
good representation, in that a majority of the 201
stations recorded mean aggregated bext estimates
falling within ±10% of the mean observed bext for
the entire nine-year period. There is, however, some
spatial bias, as reflected in the tendency for areas of
under- and over-prediction to be clustered together
geographically.
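The deviation statistic and the ±10% summary can be expressed as follows. This is a sketch with hypothetical function names (`percent_deviation`, `share_within`) and made-up station values; it does not reproduce the 201-station data set.

```python
def percent_deviation(aggregated, observed):
    """Deviation (%) of the aggregated mean relative to the observed mean."""
    if observed == 0:
        raise ValueError("observed mean must be nonzero")
    return 100.0 * (aggregated - observed) / observed


def share_within(deviations, tolerance=10.0):
    """Fraction of stations whose absolute deviation is within the tolerance."""
    return sum(1 for d in deviations if abs(d) <= tolerance) / len(deviations)


# Made-up (aggregated, observed) mean bext pairs in km^-1 for four stations.
stations = [(0.105, 0.100), (0.092, 0.100), (0.140, 0.120), (0.070, 0.085)]
deviations = [percent_deviation(a, o) for a, o in stations]
print([round(d, 1) for d in deviations], share_within(deviations))
```

Positive values correspond to over-prediction by the aggregated estimate and negative values to under-prediction, matching the sign convention of Fig. 2.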
   Reasons for this bias are not currently
understood and will be the focus of future research,
which will also investigate the ability of the scheme
to replicate bext on finer temporal and spatial scales
in order to accommodate various applications of
CMAQ. For instance, will aggregated estimates of
bext for a particularly anomalous year (such as 1988)
still fall within ±10% of the observed means, or
will the estimates deteriorate? Likewise, will the
application of this approach, which was developed
on a continental scale, exacerbate the spatial
dependencies seen in Fig. 2 when performed on
various regional scales? These concerns will be
addressed as specific CMAQ simulations are
planned and performed.

4. SUMMARY

   The objective of this research was to develop a
new aggregation approach and set of events to
support seasonal and annual CMAQ-based
distributional estimates of air concentrations and
deposition of tropospheric ozone, acidic deposition,
visibility and particulate matter over the continental
domain. The primary strategy involved categorizing
many years of meteorological patterns into a few
homogeneous classes or clusters. A basic
aggregation technique was also illustrated for the
selected sample of events, and revealed aggregated
estimates of the bext falling generally within ±10% of
the observed mean bext for the period 1984-92.

Acknowledgments.  This work was supported by the
U.S. Environmental Protection Agency under GSA
contract GS-35F-4750G.

Disclaimer. This document has been reviewed and
approved  by the  U.S.  Environmental  Protection
Agency for publication. Mention of trade names or
commercial  products   does  not   constitute
endorsement or recommendation for use.

5. REFERENCES

Brook, J. R., P. J. Samson, and S. Sillman, 1995:
       Aggregation of selected three-day periods to
       estimate   annual  and   seasonal   wet
       deposition totals for  sulfate, nitrate,  and
       acidity.  Part I: A synoptic  and chemical
       climatology for eastern North America. J.
       Appl. Meteor., 34, 297-325.

Byun, D.,  and J. K. S. Ching,  1999:    Science
       Algorithms of the EPA Models-3 Community
       Multiscale Air Quality (CMAQ) Modeling
       System,  EPA Tech.  Rep.  EPA-600/R-
       99/030.

Cohn, R. D., B. K. Eder, and S. K. LeDuc, 1999: An
       aggregation and episode selection scheme
       designed  to support Models-3  CMAQ.
       Science Algorithms of the  EPA Models-3
       Community Multiscale Air Quality (CMAQ)
       Modeling System, EPA Tech. Rep. EPA-
       600/R-99/030.

Kalnay, E., and Coauthors, 1996: The NCEP/NCAR
       40-year reanalysis project. Bull. Amer.
       Meteor. Soc., 77, 437-471.

-------
Fig. 2. Spatial variation of the bias of the aggregated estimates of the mean bext (km-1) for the
period 1984-1992. Top figure indicates sites with positive bias, bottom figure sites with negative
bias. (Deviations (%) are relative to the observed mean: (aggregate - observed)/observed.)

-------
HERL-RTP-AMD-00-158
TECHNICAL REPORT DATA

  1. REPORT NO.: EPA/600/A-OQ/(H?-
  4. TITLE AND SUBTITLE: An Aggregation and Episode Selection Scheme for EPA's Models-3 CMAQ
  7. AUTHOR(S): Brian Eder, Rich Cohn, Sharon LeDuc and Robin Dennis
  9. PERFORMING ORGANIZATION NAME AND ADDRESS: Same as Block 12
  12. SPONSORING AGENCY NAME AND ADDRESS: National Exposure Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711
  13. TYPE OF REPORT AND PERIOD COVERED: Conference Reprint
  14. SPONSORING AGENCY CODE: EPA/600/9
  16. ABSTRACT
  The development of an episode selection and aggregation approach, designed to support distributional estimation for use with
  the Models-3 Community Multiscale Air Quality (CMAQ) model, is described. The approach utilized cluster analysis of the
  700 hPa u and v wind field components over the time period 1984-92 to define homogeneous meteorological clusters.
  Alternative schemes were compared using relative efficiencies and meteorological considerations. An optimal scheme was
  defined to include 20 clusters (five per season), and a stratified sample of 40 events was selected from the 20 clusters using a
  systematic sampling technique. The light-extinction coefficient, which provides a measure of visibility, was selected as the
  primary evaluative parameter for two reasons. First, this parameter can serve as a surrogate for PM-2.5, for which little
  observational data exist. Second, of the air quality parameters simulated by CMAQ, this visibility parameter has one of the
  most spatially and temporally comprehensive observational data sets. Results suggest that the approach reasonably
  characterizes synoptic-scale flow patterns and leads to strata that explain the variation in extinction coefficient and other
  parameters (temperature and relative humidity) used in this analysis, and therefore can be used to achieve improved estimates
  of these parameters relative to estimates obtained using other methods. Moreover, defining seasonally based clusters further
  improves the ability of the clusters to explain the variation in these parameters.
  18. DISTRIBUTION STATEMENT: RELEASE TO PUBLIC
  19. SECURITY CLASS (This Report): UNCLASSIFIED
  20. SECURITY CLASS (This Page): UNCLASSIFIED

-------