7.4
AN AGGREGATION AND EPISODE SELECTION SCHEME
FOR EPA'S MODELS-3 CMAQ
Brian K. Eder*++, Richard D. Cohn**, Sharon K. LeDuc*++ and Robin L. Dennis*++
*Air Resources Laboratory
National Oceanic and Atmospheric Administration
Research Triangle Park, North Carolina
**Analytical Sciences, Inc.
Durham, North Carolina
1. INTRODUCTION
In support of studies mandated by the 1990
Clean Air Act Amendments, the Models-3
Community Multiscale Air Quality (CMAQ) model
was developed by the U.S. Environmental Protection
Agency's Atmospheric Modeling Division (Byun and
Ching 1999). CMAQ, which is a powerful, new "third
generation" model, is used to simulate air
concentrations and deposition of tropospheric
ozone, acidic deposition, visibility and particulate
matter associated with specified levels of emissions.
These simulations are then used by EPA Program
Offices and research laboratories to support both
regulatory assessment and scientific studies on a
myriad of scales ranging from urban to continental.
These assessment studies often require CMAQ-
based simulations on seasonal and even annual
time frames. Unfortunately, it is computationally
expensive to execute CMAQ, which is designed for
episode simulation, over such long time frames.
Therefore, in practice, CMAQ must be executed for
a finite number of episodes or events that are
selected to represent a wide variety of
meteorological classes or clusters. A statistical
procedure called aggregation must then be applied
to the outputs from CMAQ in order to derive the
requisite seasonal and annual average estimates.
This paper describes the development of the
aggregation and episode selection scheme and
provides an evaluation of its effectiveness using the
light-extinction coefficient ($b_{ext}$), which is used to
characterize visibility.
++ On assignment to the National Exposure Research
Laboratory, United States Environmental Protection
Agency, RTP, NC 27711
* Corresponding author address: Brian K. Eder,
AMD/ARL/NOAA, Mail Drop 80, RTP, NC 27711; e-mail:
eder@hpcc.epa.gov.
2. SCHEME DEVELOPMENT
2.1 Meteorological classification
The first step in the aggregation and episode
selection scheme is identification and
characterization of representative meteorological
clusters. Following the work of Brook et al. (1995),
the 700 mb wind field, as defined by the zonal (u) and
meridional (v) wind components, was clustered using
Ward's method of cluster analysis. Ward's method
was chosen because it minimizes within-cluster
sums of squares in an agglomerative (i.e., moving
from many clusters toward fewer clusters),
hierarchical (i.e., once clusters are joined
they cannot be separated) process. The 700 mb level was
selected (as opposed to 850 mb, which may be
more representative of the boundary layer) in
consideration of the high mountainous terrain of the
western United States. The data, which cover the
nine-year period 1984-1992, were obtained
from the National Centers for Environmental
Prediction/National Center for Atmospheric
Research reanalysis project (Kalnay et al. 1996). To
conform to normal CMAQ runs, which typically
simulate five-day episodes, the wind fields were
grouped into running five-day periods prior to being
clustered. Also, to accommodate the continental
domain, while achieving adequate spatial resolution,
336 grid nodes with 2.5° spatial resolution were
used in the clustering.
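The clustering step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each day is represented by a flat list of u and v values, stacks running five-day windows into feature vectors, and merges clusters with Ward's minimum-variance criterion. The function names `running_periods` and `ward_cluster` and the toy single-node data are hypothetical.

```python
from itertools import combinations


def running_periods(daily_fields, length=5):
    """Stack each run of `length` consecutive daily wind vectors into one feature vector."""
    return [
        [x for day in daily_fields[start:start + length] for x in day]
        for start in range(len(daily_fields) - length + 1)
    ]


def ward_cluster(points, n_clusters):
    """Naive agglomerative clustering using Ward's criterion.

    At each step the pair of clusters whose merger gives the smallest
    increase in the within-cluster sum of squares is joined; merged
    clusters are never split again (hierarchical).
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            na, nb = len(ma), len(mb)
            dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            cost = na * nb / (na + nb) * dist2  # increase in within-cluster SS
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((ma + mb, centroid))
    return [sorted(members) for members, _ in clusters]


# Toy data (assumed): (u, v) at a single grid node for eight days.
daily = [[10.0, 0.0], [11.0, 1.0], [9.0, -1.0], [10.0, 0.5],
         [10.5, 0.0], [0.0, 9.0], [1.0, 10.0], [0.5, 11.0]]
periods = running_periods(daily, length=5)
print(ward_cluster(periods, n_clusters=2))
```

The naive search is cubic in the number of periods, so for nine years of running five-day periods over 336 grid nodes a library routine (for example SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"`) would be the practical choice.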
Numerous alternative schemes were compared
using statistical and meteorological considerations
until an optimal scheme was developed that included
20 clusters, assigned five per season (Cohn et al.
1999). An example of the wind regime associated
with one particular cluster (4) is shown in Fig. 1.
This cluster, which accounted for 25.7% of all five-day
winter events, depicts a typical high-amplitude flow
with a western ridge and eastern trough.
-------
Fig. 1. Mean wind vectors for seasonally defined (winter) cluster 4 (of 20).
2.2 Episode selection and aggregation
From these 20 homogeneous meteorological
clusters, a stratified sample of 40 events was then
randomly selected using a systematic sampling
technique (without replacement) to ensure adequate
representation over the entire nine-year period.
These 40 events can then be aggregated into the
desired seasonal and annual time frames. The
sample of 40 recommended simulation events
includes representation of every month (ranging
from 1 (February) to 6 (January) events per month)
and every year during the period (ranging from 3
(1984, 1988) to 6 (1990) events per year).
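One way to picture the stratified, systematic selection is sketched below. It assumes that the events in each cluster are listed in chronological order and that the number of events to draw from each cluster is already known; the allocation rule itself is not given here, so the `allocation` argument, the function names and the toy event labels are all hypothetical.

```python
import random


def systematic_sample(events, k, rng):
    """Pick k chronologically ordered events at even spacing from a random start."""
    if not 0 < k <= len(events):
        raise ValueError("k must be between 1 and the number of events")
    step = len(events) / k           # spacing between picks (>= 1)
    start = rng.random() * step      # random start within the first interval
    last = len(events) - 1
    # Spacing of at least 1 guarantees distinct indices (no replacement).
    return [events[min(int(start + j * step), last)] for j in range(k)]


def stratified_sample(events_by_cluster, allocation, seed=0):
    """Draw a systematic sample within every cluster (stratum)."""
    rng = random.Random(seed)
    return {
        cid: systematic_sample(events, allocation[cid], rng)
        for cid, events in events_by_cluster.items()
    }


# Toy strata (assumed): event labels are simply positions in time.
events_by_cluster = {"winter-4": list(range(30)), "summer-2": list(range(12))}
allocation = {"winter-4": 3, "summer-2": 1}
print(stratified_sample(events_by_cluster, allocation, seed=42))
```

Spacing the picks evenly through each cluster's chronological list is what spreads the sample across the whole period instead of concentrating it in a few years.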
In practice, the aggregation calculations will be
applied to model-based estimates obtained for each
sampled event in order to achieve unbiased
estimates for seasonal and annual means within
each model grid cell. In essence, the aggregation
calculations simply provide weighted means from
our sample of 40 events.
As an illustration of the aggregation approach,
consider the estimation of a mean annual
concentration using model output for the 40 events
selected above. These events represent the 20
meteorological clusters. Let $f_i$ denote the frequency
of occurrence associated with cluster $i$ (i.e., the
total number of events belonging to the cluster
during the period 1984-92). For an individual grid
cell, also let $\bar{c}_{i,\mathrm{MODEL}}$
represent the mean model-based concentration
associated with all sampled events from cluster $i$.
Thus, for clusters with a single sampled event, it is
just the event mean concentration in the grid cell.
For clusters with two or three sampled events, it is
the mean concentration for all of those events. Then
an unbiased estimate of the annual mean
concentration is given by:
$$\bar{c} = \frac{\sum_{i=1}^{20} f_i \, \bar{c}_{i,\mathrm{MODEL}}}{\sum_{i=1}^{20} f_i} \qquad (1)$$
Estimates for other parameters (e.g., dry deposition)
and other summary statistics are calculated using
similar techniques.
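The weighted mean in Eq. (1) can be illustrated for a single grid cell as follows. This is a sketch under stated assumptions: the function name `aggregate_mean`, the dictionary layout and the three-cluster toy numbers are hypothetical, and a real application would loop over all 20 clusters and every grid cell.

```python
def aggregate_mean(frequencies, sampled_event_means):
    """Frequency-weighted mean over clusters for one grid cell, as in Eq. (1).

    frequencies: cluster id -> number of events in that cluster (f_i)
    sampled_event_means: cluster id -> list of event-mean concentrations
        from the sampled events in that cluster
    """
    numerator = 0.0
    denominator = 0.0
    for cid, f in frequencies.items():
        events = sampled_event_means[cid]
        cluster_mean = sum(events) / len(events)  # mean over sampled events
        numerator += f * cluster_mean
        denominator += f
    return numerator / denominator


# Toy example (assumed values): three clusters, one to two sampled events each.
frequencies = {1: 50, 2: 30, 3: 20}
sampled_event_means = {1: [2.0, 4.0], 2: [5.0], 3: [10.0]}
annual_mean = aggregate_mean(frequencies, sampled_event_means)
print(annual_mean)
```

Restricting `frequencies` and `sampled_event_means` to the five clusters of one season gives the corresponding seasonal mean with the same function.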
-------
3. EVALUATION
In order to illustrate and evaluate the aggregation
and episode selection technique, comparisons were
made between the actual mean extinction coefficient
($b_{ext}$) observed at over 200 stations nationwide for
the period 1984-1992 and the aggregated estimates
of that mean using the stratified sample of events
discussed in Section 2.2. The $b_{ext}$ (units of km$^{-1}$)
was selected as the evaluative parameter for several
reasons. First, it serves well as a surrogate for fine
particulate matter (PM$_{2.5}$), for which little
observational data exist. Second, of all of the air
quality parameters simulated by CMAQ, $b_{ext}$
has one of the most spatially and temporally
comprehensive observational data sets available.
And finally, the visual range $V_r$ (km) can be
estimated from $b_{ext}$ by using the Koschmieder
equation:
$$V_r = \frac{3.91}{b_{ext}} \qquad (2)$$
Observations with precipitation or a relative humidity
greater than 90% were omitted.
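As a small worked illustration of the screening rule and the Koschmieder relation, the sketch below drops observations with precipitation or relative humidity above 90% and converts the resulting mean extinction coefficient to a visual range. The record layout (keys `b_ext`, `rh`, `precip`), the function names and the sample values are assumptions made for this example only.

```python
def visual_range_km(b_ext):
    """Koschmieder relation, Eq. (2): visual range (km) from b_ext (km^-1)."""
    if b_ext <= 0:
        raise ValueError("b_ext must be positive")
    return 3.91 / b_ext


def usable(obs, rh_limit=90.0):
    """Keep observations with no precipitation and relative humidity not above the limit."""
    return not obs["precip"] and obs["rh"] <= rh_limit


# Toy station record (assumed values).
observations = [
    {"b_ext": 0.10, "rh": 60.0, "precip": False},
    {"b_ext": 0.45, "rh": 95.0, "precip": False},  # dropped: RH above 90%
    {"b_ext": 0.30, "rh": 80.0, "precip": True},   # dropped: precipitation
    {"b_ext": 0.20, "rh": 70.0, "precip": False},
]
kept = [o for o in observations if usable(o)]
mean_b_ext = sum(o["b_ext"] for o in kept) / len(kept)
print(round(mean_b_ext, 3), round(visual_range_km(mean_b_ext), 1))
```

For instance, an extinction coefficient of 0.1 km^-1 corresponds to a visual range of about 39 km.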
As part of the evaluation, the percent deviations in
aggregated estimates of the mean $b_{ext}$ (where the
deviations are relative to the observed mean) were
calculated over the period 1984-1992 and are
presented in Fig. 2.
For the most part, the evaluation revealed very
good representation in that a majority of the 201
stations recorded mean aggregated $b_{ext}$ estimates
falling within ±10% of the mean observed $b_{ext}$ for
the entire nine-year period. There is, however, some
spatial bias as reflected in the tendency for areas of
under- and over-prediction to be clustered together
geographically.
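The station-by-station comparison can be sketched as below, using the deviation defined in the Fig. 2 caption, (aggregate - observed)/observed, expressed in percent. The station names and values are invented for illustration and are not results from this study; the function names are hypothetical.

```python
def percent_deviation(aggregated, observed):
    """Deviation of the aggregated estimate relative to the observed mean, in percent."""
    return 100.0 * (aggregated - observed) / observed


def fraction_within(deviations, tolerance=10.0):
    """Share of stations whose absolute percent deviation is within the tolerance."""
    return sum(1 for d in deviations if abs(d) <= tolerance) / len(deviations)


# Toy stations (assumed values): (aggregated mean b_ext, observed mean b_ext) in km^-1.
stations = {
    "A": (0.105, 0.100),  # over-prediction
    "B": (0.080, 0.100),  # under-prediction beyond 10%
    "C": (0.120, 0.125),  # slight under-prediction
}
deviations = {name: percent_deviation(agg, obs) for name, (agg, obs) in stations.items()}
print(deviations)
print(fraction_within(list(deviations.values())))
```

Keeping the sign of each deviation, as done here, is what allows areas of under- and over-prediction to be mapped separately.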
Reasons for this bias are not currently
understood and will be the focus of future research,
which will also investigate the ability of the scheme
to replicate $b_{ext}$ on finer temporal and spatial scales
in order to accommodate various applications of
CMAQ. For instance, will aggregated estimates of
$b_{ext}$ for a particularly anomalous year (such as 1988)
still fall within ±10% of the observed means or
will the estimates deteriorate? Likewise, will the
application of this approach, which was developed
on a continental scale, exacerbate the spatial
dependencies seen in Fig. 2 when performed on
various regional scales? These concerns will be
addressed as specific CMAQ simulations are
planned and performed.
4. SUMMARY
The objective of this research was to develop a
new aggregation approach and set of events to
support seasonal and annual CMAQ-based
distributional estimates of air concentrations and
deposition of tropospheric ozone, acidic deposition,
visibility and particulate matter over the continental
domain. The primary strategy involved categorizing
many years of meteorological patterns into a few
homogeneous classes or clusters. A basic
aggregation technique was also illustrated for the
selected sample of events, and revealed aggregated
estimates of $b_{ext}$ falling generally within ±10% of
the observed mean $b_{ext}$ for the period 1984-92.
Acknowledgments. This work was supported by the
U.S. Environmental Protection Agency under GSA
contract GS-35F-4750G.
Disclaimer. This document has been reviewed and
approved by the U.S. Environmental Protection
Agency for publication. Mention of trade names or
commercial products does not constitute
endorsement or recommendation for use.
5. REFERENCES
Brook, J. R., P. J. Samson, and S. Sillman, 1995:
Aggregation of selected three-day periods to
estimate annual and seasonal wet
deposition totals for sulfate, nitrate, and
acidity. Part I: A synoptic and chemical
climatology for eastern North America. J.
Appl. Meteor., 34, 297-325.
Byun, D., and J. K. S. Ching, 1999: Science
Algorithms of the EPA Models-3 Community
Multiscale Air Quality (CMAQ) Modeling
System, EPA Tech. Rep. EPA-600/R-
99/030.
Cohn, R. D., B. K. Eder, and S. K. LeDuc, 1999: An
aggregation and episode selection scheme
designed to support Models-3 CMAQ.
Science Algorithms of the EPA Models-3
Community Multiscale Air Quality (CMAQ)
Modeling System, EPA Tech. Rep. EPA-
600/R-99/030.
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR
40-year reanalysis project. Bull. Amer.
Meteor. Soc., 77, 437-471.
-------
Fig. 2. Spatial variation of the bias of the aggregated estimates of the mean $b_{ext}$ (km$^{-1}$) for the
period 1984-1992. Top figure indicates sites with positive bias, bottom figure sites with negative
bias. Deviations (%) are relative to the observed mean: (aggregate - observed)/observed.
-------
HERL-RTP-AMD-00-158
TECHNICAL REPORT DATA
1. REPORT NO. EPA/600/A-OQ/(H?-
4. TITLE AND SUBTITLE
An Aggregation and Episode Selection Scheme for
EPA's Models-3 CMAQ
7. AUTHOR(S)
Brian Eder, Rich Cohn, Sharon LeDuc and Robin Dennis
8.PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANISATION NAME AND ADDRESS
Same as Block 12
12. SPONSORING AGENCY NAME AND ADDRESS
National Exposure Research Laboratory
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711
13.TYPE OF REPORT AND PERIOD COVERED
Conference Reprint
14. SPONSORING AGENCY CODE
EPA/600/9
15. SUPPLEMENTARY NOTES
16. ABSTRACT
The development of an episode selection and aggregation approach, designed to support distributional estimation for use with
the Models-3 Community Multiscale Air Quality (CMAQ) model, is described. The approach utilized cluster analysis of the
700 hPa u and v wind field components over the time period 1984-92 to define homogeneous meteorological clusters.
Alternative schemes were compared using relative efficiencies and meteorological considerations. An optimal scheme was
defined to include 20 clusters (five per season), and a stratified sample of 40 events was selected from the 20 clusters using a
systematic sampling technique. The light-extinction coefficient, which provides a measure of visibility, was selected as the
primary evaluative parameter for two reasons. First, this parameter can serve as a surrogate for PM-2.5, for which little
observational data exist. Second, of the air quality parameters simulated by CMAQ, this visibility parameter has one of the
most spatially and temporally comprehensive observational data sets. Results suggest that the approach reasonably
characterizes synoptic-scale flow patterns and leads to strata that explain the variation in extinction coefficient and other
parameters (temperature and relative humidity) used in this analysis, and therefore can be used to achieve improved estimates
of these parameters relative to estimates obtained using other methods. Moreover, defining seasonally based clusters further
improves the ability of the clusters to explain the variation in these parameters.
18. DISTRIBUTION STATEMENT
RELEASE TO PUBLIC
19. SECURITY CLASS (This
Report)
UNCLASSIFIED
21.NO. OF PAGES
20. SECURITY CLASS (This
Page)
UNCLASSIFIED