Recommendations for Analysis of PAMS Data


Final Report

RECOMMENDATIONS FOR ANALYSIS
OF PAMS DATA

SYSAPP94-94/01 lrl

28 February 1994

Prepared for

Terence Fitz-Simons
Monitoring and Reports Branch
Office of Air Quality Planning and Standards
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711

EPA Contract No. 68-D3-0019
Work Assignment 4-94

Prepared by

Till E. Stoeckenius
Mary P. Ligocki
Jonathan P. Cohen
Arlene S. Rosenbaum
Sharon G. Douglas

Systems Applications International
101 Lucas Valley Road
San Rafael, California 94903
(415) 507-7100

09501 9401 lrl

-------
Contents

Preface

1	INTRODUCTION

2	PAMS DATA ANALYSIS OBJECTIVES

3	ANALYSIS PROCEDURES

General

NAAQS Attainment Strategy Development

SIP Control Strategy Evaluation

Emissions Tracking

Ambient Trends

Exposure Assessment

References

Appendix: STATISTICAL TREND DETECTION AND ANALYSIS METHODS

9401 lrl.01

-------
Preface

With the implementation of PAMS networks in ozone nonattainment areas, a wealth of new
data on ozone precursors will be made available on a routine basis. Previously, detailed
aerometric data of this type has been collected only as part of special studies lasting just a few
days or months, most often conducted in support of photochemical modeling applications.
The availability of routinely collected data in all serious, severe, and extreme ozone
nonattainment areas will, in addition to providing information in support of photochemical
modeling, make possible a range of analyses which have not been possible until now. Results
of these analyses will provide air quality managers with valuable information on:

Relative impact of ozone and precursors transported into an air basin from upwind sources.

Efficacy of alternative control strategies.

Possible errors in emission inventories.

Recent trends in emissions, including trends in specific source categories.

Recent trends in ozone and ozone precursor concentrations.

Population exposures to ozone, NO2, and air toxics.

Insightful analysis of PAMS data requires a clear understanding of the goals of the analysis (a
precise statement of the questions to be answered by the investigation), familiarity with the
range of possible analysis techniques and their individual advantages and limitations, and the
ability to properly interpret the results. This document provides supporting information
needed to meet these requirements. In particular, it suggests analyses designed to achieve the
objectives of the PAMS monitoring program as specified in 40 CFR 58. These objectives and
the associated data uses are listed in the accompanying table together with a series of
appropriate analysis procedures. The analysis procedures and their application to PAMS data
are discussed in Section 3. It is assumed that the reader is familiar with air quality
management issues and basic aerometric data analysis procedures and has a working
knowledge of the underlying statistical techniques. Detailed directions for the application of
each technique are not included; additional information can be obtained by consulting the
References.


-------
Monitoring objectives and associated data analysis procedures.

Monitoring Objective
	Data Uses
		Analysis Procedures

NAAQS Attainment and Control Strategy Development
	Contribution of transport
		Trajectory analysis
		Flux-plane calculations
		Photochemical modeling
	Photochemical grid modeling: initial and boundary conditions
		Spatial interpolation
	Photochemical grid modeling: domain selection
		Spatial interpolation
	Photochemical grid modeling: episode selection
		Episode classification
	Photochemical grid modeling: performance evaluation
		Hypothesis testing
		Graphical analysis
		Analysis of residuals

SIP Control Strategy Evaluation
	Track effectiveness, mid-course corrections
		Comparison of ozone and precursor trends
	Identification of most effective/least-cost control strategies
		Evaluation of VOC/NOx ratio
		Estimation of biogenic contribution
		Cost/effectiveness evaluation

Emissions Tracking
	Corroborate inventories
		Comparison of NMOC/NOx, NMOC/CO ratios with inventories
		Comparison of ambient VOC speciation profiles with inventories
		Indicator species analysis
		Receptor modeling
	Track emission reductions (e.g., RFP requirements)
		Comparison of trends in ambient NMOC, NOx and NMOC/NOx ratios with trends in emissions
		Comparison of trends in VOC indicator species with trends in emissions for specific source categories
	Identification of toxic "hot spots"
		Comparison of toxics concentrations with typical urban concentration levels

Tracking Emissions-Related Air Quality Trends
	Identify and evaluate air quality trends
		Linear regression
		Analysis of variance
		General linear models
		Multi-site trends analysis
		Estimation of trend detection probability
		Time series models
		Analysis of trends in extreme values

Exposure Assessment
	Estimate exposures to O3, NO2, air toxics
		Spatial interpolation
		Calculation of exposure indices


-------
1 INTRODUCTION

The Clean Air Act Amendments of 1990 required EPA to develop regulations for the
establishment and maintenance of Photochemical Assessment Monitoring Stations (PAMS) in
certain ozone nonattainment areas. These new monitoring stations are primarily intended to
provide improved characterizations of the temporal and spatial distribution of ozone and
precursor concentrations and the composition of precursor species in support of EPA and
State efforts to bring areas into attainment of the national ambient air quality standard for
ozone (Hunt and Gerald, 1991). Final PAMS monitoring rules were incorporated into 40
CFR 58 on February 12, 1993. The PAMS network will join the State and Local Air
Monitoring Station (SLAMS) and National Air Quality Monitoring Station (NAMS) networks
as EPA's principal source of routine air quality data in populated areas.

Data to be collected by PAMS include (see footnote 1):

Air Quality Concentration Data for:

	target nonmethane hydrocarbons (Table 1-1)
	target carbonyl compounds (formaldehyde, acetaldehyde, and acetone)
	NO, NO2, NOx
	O3

Meteorological parameters (10 m tower):

	wind speed, direction
	temperature
	relative humidity
	barometric pressure
	solar radiation

Upper air measurements (pressure, temperature, dew point, winds) are also required at a
location representative of each urban monitoring network. Specific information on monitoring
schedules, site location criteria, and areas where monitoring is required can be found in 40
CFR 58 and the references cited therein.

Large amounts of data will be collected as part of the PAMS program and there exists a need
for the development and dissemination of information on how the data can best be analyzed to
achieve the objectives of the PAMS program.

1 Not all variables will be measured at all sites (see 40 CFR 58, Appendix D, Section 4). In this report,
the term "VOC species" will be used to refer to the combined set of target nonmethane hydrocarbons listed
in Table 1-1 and the target carbonyls.


-------
This report addresses a broad range of analyses which one may wish to consider for
application to PAMS data. In many cases, details of the analysis procedure are not included
but may be found by consulting the references. It is assumed that the reader is familiar with
basic aerometric data analysis procedures and has a working knowledge of the associated
statistical methods.

As the PAMS networks are just now beginning to be deployed, there is currently no PAMS
data available with which to illustrate the analyses described in this report. However, where
previous investigations of similar data relevant to the data uses listed above have been
conducted, these are referenced in the text.


-------
2 PAMS DATA ANALYSIS OBJECTIVES

A detailed discussion of the objectives of the PAMS monitoring program can be found in 40
CFR 58. Implementation of the PAMS is intended to provide improved characterization of
ozone precursor concentrations for use in developing strategies for attaining the ozone
NAAQS. The principal objectives of the PAMS program can be summarized as follows:

NAAQS Attainment and Control Strategy Development, including attainment/
nonattainment determinations, assessment of the relative contributions of local and
upwind sources, boundary conditions for photochemical modeling, episode selection,
and model evaluation.

SIP Control Strategy Evaluation, including evaluation of the effectiveness of various
control strategies in terms of the VOC species most heavily targeted by a given
strategy.

Emissions Tracking, including corroboration of NOx and VOC inventories and trends,
corroboration of VOC species source profiles, and analysis of air toxics.

Ambient Trends, including indicator selection and adjustments for variations in
meteorological conditions.

Exposure Assessment, including estimation of annual mean concentrations and the size
of affected populations.

Suggestions for analyses of PAMS data which address each of these objectives are presented
in Section 3. Criteria for location of PAMS specified in Appendix D of 40 CFR 58 identify
four different types of sites, each of which is intended to provide a different perspective on the
above objectives.

Type (1) stations are located at the upwind edge of the urbanized area, based on the
predominant morning wind direction on high ozone days. These sites are designed to
provide an urban-scale indication of the amount of ozone and ozone precursors
transported into the area from upwind source regions.

Type (2) stations are located at the downwind edge of the central area of greatest
emissions density. These sites are designed to provide a neighborhood-scale
characterization of maximum precursor concentration magnitudes, information on
exposure to toxics and ozone, and corroboration of emission inventories.


-------
Type (3) stations are located downwind of the major emissions source region where
the highest ozone concentrations in the nonattainment area occur most frequently.

Type (4) stations are located near the downwind boundary of the urban area. These
sites are intended to characterize the air mass leaving the urban area before impacting
any downwind areas.

As indicated in Section 3, each data analysis procedure is intended to be applied to one or
more site types. It is the analyst's responsibility to ensure that data from the site type(s)
appropriate to the intended purpose of the analysis are used.


-------
3 ANALYSIS PROCEDURES

Recommendations for data analysis procedures designed to achieve the PAMS monitoring
objectives are presented in this section. Following a discussion of general data handling
procedures, recommended analyses are presented in a series of subsections corresponding to
the principal monitoring objectives listed in the previous section.

GENERAL

Most data analysis studies will have several steps in common, including data checking,
handling missing data, handling measurements that fall below instrument detection limits, and
a preliminary exploratory analysis phase. Suggestions for proceeding with these steps are
presented in this subsection.

Data Checking

Although data that is ready for analysis has presumably passed through the appropriate quality
assurance/quality control procedures, it is always a good idea to double check data, especially
if it has been received from an outside source. Examination of univariate statistics such as the
mean, median, high and low quantiles, and extreme upper and lower values is easy to do,
facilitates familiarity with the data, and can provide clues to at least some types of potential
problems. Commonly encountered problems include incorrect units, misread formats, etc.
Some types of outliers can be detected via simple inspection of the univariate statistics. More
sophisticated procedures for detecting outliers in multivariate data sets are available via
principal component analysis (Preisendorfer, 1988) and cluster analysis (Anderberg, 1973).
Other potential data problems (or unusual features) can be identified by scanning time series
for periods of unrealistic or unusual constant values (plateaus) or jumps (unusual or
unrealistically large changes from one value to the next).
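
The screening steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the minimum plateau length, the jump threshold, and the sample ozone values are all hypothetical and would need to be chosen for the pollutant and instrument at hand. Missing values are assumed to have been removed beforehand.

```python
from statistics import mean, median, quantiles

def summarize(values):
    """Univariate summary: mean, median, 5th/95th percentiles, extremes."""
    pct = quantiles(values, n=20)  # cut points at 5%, 10%, ..., 95%
    return {"mean": mean(values), "median": median(values),
            "p05": pct[0], "p95": pct[-1],
            "min": min(values), "max": max(values)}

def find_plateaus(values, min_run=4):
    """Return (start, end) index pairs (end exclusive) for runs of
    identical consecutive values that are at least min_run long."""
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs

def find_jumps(values, max_step):
    """Return indices i where |values[i] - values[i-1]| exceeds max_step."""
    return [i for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) > max_step]

# Made-up hourly ozone values (ppb) containing one plateau and one spike.
hourly_o3 = [31, 35, 40, 40, 40, 40, 40, 52, 180, 58, 61, 55]
summary = summarize(hourly_o3)
plateaus = find_plateaus(hourly_o3)        # runs of repeated values
jumps = find_jumps(hourly_o3, max_step=50) # suspiciously large hour-to-hour changes
```

Flagged periods are candidates for review against the quality assurance documentation, not automatic deletions.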

It is recommended that the data analyst review quality assurance reports and any other related
documentation to determine the appropriate interpretation to be applied to the data received.
This includes identification of data validity flags (if any), missing value codes, and data
handling procedures. This is particularly important in the analysis of speciated VOC data
which is obtained via relatively complex analytical methods and for which minimum detection
limit issues are especially important. Quality assurance reports and related documentation
should also be consulted to determine the uncertainty associated with the measurements to be
analyzed, and any instances in which the predefined data quality objectives have not been met
should be noted. Uncertainty estimates are important inputs for
some of the data analysis procedures described below.

Missing Data

Missing data is a fact of life in the analysis of environmental data sets and must always be
properly taken into account. Minimum data completeness requirements should be specified
for each analysis. In the computation of mean values, it is common practice to apply a 75%
data completeness requirement. This requirement can also be applied to the calculation of
maximum values (e.g., daily maxima) so long as periodicities in the data are accounted for.
For example, EPA's requirement for a valid daily maximum ozone concentration applies the
75% criterion only to hourly values between 9 am and 9 pm LST, since nighttime readings do
not generally provide any information about the maximum value (40 CFR 50, Appendix H).
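
A minimal sketch of such a completeness check follows. It assumes a day is represented as a list of 24 hourly values indexed by the hour beginning (LST), with None marking a missing hour; the function name and this representation are hypothetical, and the exact treatment of the window endpoints and of days whose available maximum already exceeds the standard should be taken from 40 CFR 50, Appendix H rather than from this example.

```python
def valid_daily_max(hourly, first_hour=9, last_hour=21, min_fraction=0.75):
    """Return the daily maximum, or None if fewer than min_fraction of the
    hours in [first_hour, last_hour) have a reported value.

    hourly: list of 24 values indexed by hour (LST); None marks missing data.
    """
    window = hourly[first_hour:last_hour]
    present = [v for v in window if v is not None]
    if len(present) < min_fraction * len(window):
        return None  # too few daytime hours to trust the maximum
    # The maximum itself is taken over all reported hours of the day.
    return max(v for v in hourly if v is not None)
```

With a 12-hour window, at least 9 reported hours are needed for the day to count as valid.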

Most statistical data analysis software is designed to operate only on cases with complete
data; cases with missing data are ignored. This is not always appropriate, however (e.g., when
examining trends in the number of exceedances or performing time series analysis), and in these
situations procedures for estimating missing values must be used. For example, estimates of
the number of times per year that a concentration threshold is exceeded based on incomplete
data records can be made by assuming that the probability of exceedance on missing days is
the same as on non-missing days (see 40 CFR 50, Appendix H for an example application to
ozone NAAQS exceedances). For data with known periodicities, improved estimates can be
obtained by making separate estimates for each period (e.g., seasons). More general types of
serial dependence in the data can be accounted for by using the time series modeling
techniques described in the Appendix to fill in the missing values. For example, improved
estimates of annual mean concentrations can be obtained in this way (assuming the time series
model fits the data well). A similar approach in the spatial domain can be used to fill in missing
station values when computing spatial averages.
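
The exceedance adjustment described above amounts to scaling the observed count by the fraction of days sampled. The sketch below is a simplified illustration: the function names and the example day counts are hypothetical, and every missing day is assumed to have the same exceedance probability as the valid days in the same period. The full Appendix H procedure includes additional provisions that are not reproduced here.

```python
def estimated_exceedances(n_exceed, n_valid, n_required):
    """Observed exceedances plus the expected number on missing days,
    assuming missing days exceed at the same rate as valid days."""
    if n_valid == 0:
        raise ValueError("no valid days in the period")
    n_missing = n_required - n_valid
    return n_exceed + (n_exceed / n_valid) * n_missing

def estimated_exceedances_by_period(periods):
    """Sum separate estimates for each period (e.g., each season).

    periods: iterable of (n_exceed, n_valid, n_required) tuples.
    """
    return sum(estimated_exceedances(*p) for p in periods)

# Hypothetical ozone season: 3 exceedances on 150 valid days out of 183.
whole_season = estimated_exceedances(3, 150, 183)
# Hypothetical split into two periods with different exceedance rates.
by_period = estimated_exceedances_by_period([(1, 80, 92), (4, 70, 91)])
```

Making the estimate period by period matters when missing days are concentrated in the part of the season with the higher exceedance rate.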

Detection Limits

Concentrations of VOC species in ambient samples are often found to be at or below the
detection limit of the analytical method used to estimate them. It is anticipated that method
detection limits (MDLs) will be estimated for each species and reported along with the
concentration data from PAMS networks. It is unknown at this time if values below the MDL
(possibly including negative values) will be reported as is or set to some constant value (e.g.,
zero) or code. In either case, care must be taken in analyzing these data to account for the
uncertainty in sub-MDL values. Mean values calculated under the assumption that all sub-
MDL values are equal to some constant between zero and the MDL may be biased. Methods
for estimating unbiased mean values have been developed (Dempster et al., 1977; Gilbert,
1987) and applied (Stoeckenius et al., 1993). These methods are based on the assumption
that all concentrations represent random samples from a parametric family of distributions and
seek to estimate the parameters of the distribution using only values falling above the MDL
(i.e., a censored sample).
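
As an illustration of the censored-sample idea, the sketch below applies an EM iteration (the general algorithm introduced by Dempster et al., 1977) to a left-censored sample. It is not taken from the cited applications: the lognormal assumption, the single common MDL, the function name, and the fixed iteration count are all assumptions made for this example. The returned mean is a maximum-likelihood plug-in estimate, which is only approximately unbiased in large samples.

```python
from math import exp, log, sqrt
from statistics import NormalDist, fmean

def censored_lognormal_mean(detected, n_censored, mdl, n_iter=500):
    """Estimate the mean concentration from values at or above the MDL plus
    a count of sub-MDL samples, assuming lognormal concentrations.

    Returns (mean_estimate, mu, sigma), with mu and sigma on the log scale.
    """
    std = NormalDist()
    y = [log(v) for v in detected]
    c = log(mdl)
    n = len(y) + n_censored
    sum_y, sum_y2 = sum(y), sum(v * v for v in y)
    # Start from the moments of the detected values alone.
    mu = fmean(y)
    sigma = sqrt(max(sum_y2 / len(y) - mu * mu, 1e-12))
    for _ in range(n_iter):
        a = (c - mu) / sigma
        lam = std.pdf(a) / max(std.cdf(a), 1e-300)
        # E-step: expected first and second moments of a value below c.
        e1 = mu - sigma * lam
        e2 = mu * mu + sigma * sigma - sigma * lam * (c + mu)
        # M-step: update parameters using the completed moments.
        mu_new = (sum_y + n_censored * e1) / n
        var_new = (sum_y2 + n_censored * e2) / n - mu_new * mu_new
        mu, sigma = mu_new, sqrt(max(var_new, 1e-12))
    return exp(mu + 0.5 * sigma * sigma), mu, sigma
```

Substituting a constant such as zero or MDL/2 for each sub-MDL value would instead shift the estimated mean by an amount that depends on the fraction of censored samples.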

Detection limits must also be considered when interpreting small NO, NO2, and NOx
concentrations. Generally speaking, for monitoring methods currently in widespread use, NO2
concentrations below 5 or 10 ppb are not reliable (see footnote 2). More sensitive monitoring techniques are
available and may be used in PAMS networks. Ozone concentrations at PAMS sites are likely
to be above detection limits nearly all the time so special data handling procedures are
generally not needed.

Exploratory Analyses

Most analyses of PAMS data will include a preliminary, exploratory phase designed to
evaluate the basic distributions of the measurements and the relationships between them.
Results of the exploratory analyses provide useful summaries of the data and can be used to
formulate working hypotheses regarding the data features of interest which are then used to
guide further work. Some of the more common and useful exploratory data analysis
techniques include:

Scatter plot matrices: Can be used to evaluate the relationship between small groups
of variables such as ambient NOx and VOC concentrations and meteorological
variables.

Box plots: Provide a convenient graphical summary of the key features of the
distribution of a variable. By positioning several box plots side-by-side, one can
visually compare differences between distributions.

Quantile-quantile plots: Two distributions can be compared by plotting their
corresponding quantiles (i.e., the data values corresponding to a given percentile) on
an x-y graph. For example, daily acetone concentrations from PAMS in two
different urban areas could be compared in this way. If the distributions are identical,
the plotted points will fall along a straight line with zero intercept and slope equal to
one. Differences in the distributions can be diagnosed by examining the intercept,
slope, and deviations from linearity as described by Chambers et al. (1983).
Quantile-quantile plots are also useful for comparing data with a given family of
parametric distributions. When the data are plotted against the quantiles of a standard
normal distribution, the intercept and slope of a straight line fit to the points can be
used as estimates of the mean and standard deviation, and the correlation of the x and y
data points is a measure of the normality of the data. Departures from linearity indicate
departures from the expected normal distribution.

2 Interpretation of NO2 and NOx measurements made using catalytic conversion of NO2 to NO must
take into account potential interference by peroxyacetyl nitrate (PAN) and nitric acid (HNO3). See
Purdue et al. (1991).

Other useful exploratory data analysis techniques are described by Tukey (1977) and
Chambers et al. (1983).
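
The quantile-quantile comparison of two samples described above can be sketched as follows. The function names, the choice of 19 matched quantiles, and the synthetic data are assumptions for illustration; in practice the pairs would be plotted and inspected rather than only summarized by a fitted line.

```python
from statistics import quantiles

def qq_pairs(x, y, n=20):
    """Matched quantile pairs of two samples (cut points at 1/n ... (n-1)/n)."""
    return list(zip(quantiles(x, n=n), quantiles(y, n=n)))

def fit_line(pairs):
    """Ordinary least-squares (intercept, slope) through (x, y) pairs."""
    m = len(pairs)
    mx = sum(p[0] for p in pairs) / m
    my = sum(p[1] for p in pairs) / m
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Synthetic example: the second sample is a shifted, stretched copy of the first.
city_a = [float(v) for v in range(1, 101)]
city_b = [2.0 * v + 3.0 for v in city_a]
intercept, slope = fit_line(qq_pairs(city_a, city_b))
```

An intercept near zero and a slope near one would indicate similar distributions; here the fitted line recovers the shift and the stretch applied to the second sample.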

NAAQS ATTAINMENT STRATEGY DEVELOPMENT

Photochemical grid models will be used to develop strategies for attaining the ozone NAAQS
in areas where PAMS networks are being established. Information on ozone and precursor
concentrations provided by the PAMS can be used to evaluate the influence of transport on
ozone and precursor concentrations and to assist in selecting episodes to be modeled, defining
the modeling domain, and estimating initial and boundary conditions. PAMS data can also be
used in the evaluation of model predictions.

Analysis of Ozone and Precursor Transport

For those nonattainment areas in which the measured ozone concentrations are potentially
influenced by the transport of ozone and ozone precursor pollutants from outside the
nonattainment area, the PAMS data may be used to assist in the qualitative and quantitative
investigation of such transport. Qualitatively, examination of the wind and pollutant
concentration data for the upwind (Type 1) monitoring site for the nonattainment area and the
urban and downwind (Type 2, 3, and 4) monitoring sites located in the potential source or
upwind area may, for a given episode, support or bring into question the hypothesis for
pollutant transport. Quantitatively, several methods of varying complexity can be used to
assess the potential for transport. These include (but are not limited to) trajectory analysis,
flux-plane calculations, and photochemical modeling. Because each of these methods relies
upon measured data for input, the use of PAMS data will enhance the reliability of each of
these approaches.

Trajectory analysis typically involves the calculation of two- or three-dimensional backward
particle paths (the difference is that vertical motion is accounted for in the three-dimensional
particle-path calculations) for which the destination locations and times correspond to the
monitoring sites and times at which high ozone concentrations were measured (typically at the
Type 3 monitor). Surface and upper-air wind data from PAMS and other sources, as
appropriate, are used in calculating the two- or three-dimensional particle paths. This type of
analysis coupled with the analysis of ambient air quality measurements along the estimated
transport pathways will provide a quantitative assessment of the potential for transport.
Examples of this type of analysis are provided by Roberts et al. (1993).
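
A two-dimensional backward trajectory of the kind described above can be sketched as a simple backward integration of the wind. This example makes strong simplifying assumptions that are not part of the report: a single wind vector per time step applies along the whole path (no spatial interpolation between stations), there is no vertical motion, and positions are in kilometers on a flat grid. The function name and sample winds are hypothetical.

```python
def back_trajectory(x_km, y_km, winds, dt_hours=1.0):
    """Step backward in time from a receptor located at (x_km, y_km).

    winds: list of (u, v) in km/h, ordered from the arrival hour backward;
           u is the eastward component and v the northward component.
    Returns the list of (x, y) positions, starting at the receptor.
    """
    path = [(x_km, y_km)]
    for u, v in winds:
        # Moving backward in time means moving against the wind.
        x_km -= u * dt_hours
        y_km -= v * dt_hours
        path.append((x_km, y_km))
    return path

# Hypothetical winds for the three hours preceding a peak at a Type 3 site.
path = back_trajectory(0.0, 0.0, [(10.0, 0.0), (10.0, 5.0), (0.0, 5.0)])
```

The air parcel in this example is traced to a point 20 km west and 10 km south of the monitor three hours before the peak.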

Calculation of the flux of ozone and precursors across a boundary is a more rigorous way to
quantitatively estimate the potential for pollutant transport. Simple flux calculations,
performed using wind and pollutant concentration measurements at multiple locations along
the boundary, may provide some indication of the magnitude of pollutant transport. The
PAMS monitoring network itself can provide only a rough approximation of the flux due to its
limited spatial coverage. However, PAMS data may be used to supplement a routine or
special study monitoring network so that flux-plane calculations utilizing surface and upper-air
wind and pollutant concentration measurements may be carried out.
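
A simple flux estimate of this kind sums, over segments of the boundary, the product of concentration, wind component normal to the boundary, segment length, and layer depth. The sketch below assumes a single well-mixed layer of constant depth and uniform conditions within each segment; the function name, units, and sample values are hypothetical.

```python
def flux_across_plane(segments, depth_m):
    """Net pollutant mass flux through a vertical plane, in g/s.

    segments: list of (concentration_ug_m3, normal_wind_m_s, length_m),
              one entry per boundary segment; the sign of the normal wind
              sets the sign of that segment's contribution.
    depth_m:  assumed depth of the well-mixed layer, in meters.
    """
    total_ug_s = sum(c * un * length * depth_m for c, un, length in segments)
    return total_ug_s / 1e6  # micrograms to grams

# Two hypothetical 10 km segments under a 1000 m mixed layer.
flux = flux_across_plane([(100.0, 2.0, 10_000.0), (80.0, 3.0, 10_000.0)], 1000.0)
```

Adding upper-air wind and concentration measurements would allow the single layer to be replaced by a sum over several vertical layers.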

Photochemical modeling of multi-urban areas is the most sophisticated approach to examining
ozone and precursor transport. Modeling will be enhanced through the use of the PAMS
data. As discussed in the modeling subsection below, PAMS meteorological data will assist in
the identification and representation of inflow boundaries both at the surface and aloft; PAMS
ozone, NOx, and especially the speciated hydrocarbon data will aid in the specification of
inflow boundary conditions for the model.

In summary, the PAMS data will provide valuable information regarding the transport of
ozone and precursor pollutants from one nonattainment area to another. However, transport
analysis results will be subject to numerous uncertainties due to the limited spatial (horizontal
and vertical) extent of the monitored data. Because the upwind (Type 1) and downwind
(Type 4) monitors will be sited based on an analysis of the airflow patterns that are most
frequently associated with ozone episodes in the nonattainment area, the usefulness of the data
in examining transport for a particular episode will depend very much on the existence of
"typical" airflow patterns. Additionally, transport conditions may occur prior to an ozone
episode and under a different set of meteorological conditions. Since the PAMS network
configuration by itself may not be well suited to the assessment of precursor transport during
such pre-episodic transport conditions, additional data from other sources may have to be
collected.

Modeling

PAMS data will enhance photochemical modeling efforts in several ways including selection
of the modeling domain, selection of the modeling episodes, specification of the initial and
boundary conditions, and evaluation of model performance.

Modeling domain selection for photochemical grid modeling involves consideration of the
distribution of major emissions sources, the locations of the meteorological and air-quality
monitoring sites, and the typical wind directions associated with ozone episodes in the
nonattainment area. While the modeling domain is typically centered on the nonattainment
area, it is important that the domain allow for the resolution of ozone and precursor advection
both upwind and downwind of the area of interest. For a given episode, the PAMS
meteorological data may provide an indication of the important inflow boundaries. Based on
this information, the inflow boundaries should be located sufficiently far away from the urban
area so as to limit the influence of the boundary conditions on the simulation of pollutant
concentrations within the area of interest. Pollutant concentration data from the downwind
(Type 3 and 4) PAMS monitoring sites will be especially useful in determining how far beyond
the urban area the modeling domain should extend to ensure that all exceedances of the
National Ambient Air Quality Standard (NAAQS) for ozone (and possibly other pollutants) that
are a result of the emissions
from the urban area can be simulated within the selected modeling domain.

By providing an improved understanding of the spatial and temporal characteristics of ozone
episodes as well as the meteorological conditions associated with the episodes, the PAMS
data will allow a more informed selection of episodes for modeling. According to EPA
guidance (EPA, 1991a), episodes for photochemical modeling should be selected such that
those meteorological regimes most frequently associated with ozone episodes in the
nonattainment area are represented in the modeling analysis. The magnitude of the observed
ozone concentrations, as well as the geographic extent and duration of the high ozone
concentrations are also important considerations.

Photochemical models generally require gridded estimates of the concentrations of the
simulated chemical species at the initial simulation time as well as hourly estimates of the
pollutant concentrations along the outer (lateral and top) boundaries of the modeling domain.
Initial concentration fields are generally prepared by interpolating air quality data collected
within the modeling domain. The PAMS data will contribute to estimation of the initial
surface-layer pollutant concentrations (including precursor concentrations) and will provide
useful information regarding the concentration differences and gradients between the upwind,
urban, and downwind sites.
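
The report does not prescribe an interpolation scheme for preparing initial concentration fields. As one simple possibility, the sketch below uses inverse-distance weighting; the function name, the weighting exponent, and the station values are assumptions for illustration only.

```python
def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at the point (x, y).

    stations: list of (station_x, station_y, value) tuples.
    """
    num = den = 0.0
    for sx, sy, val in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return val  # the point coincides with a monitor
        w = d2 ** (-power / 2.0)
        num += w * val
        den += w
    return num / den

# Two hypothetical monitors 10 km apart reporting 40 and 80 ppb.
midpoint = idw(5.0, 0.0, [(0.0, 0.0, 40.0), (10.0, 0.0, 80.0)])
```

Evaluating the function at each grid-cell center gives a surface-layer field; with only a few PAMS sites, the result is largely determined by the nearest monitor.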

Hourly boundary conditions along the lateral inflow boundaries of a modeling domain are
generally estimated using observed air quality data at monitors near the inflow boundaries,
including those just outside the boundaries. For those modeling episodes for which the inflow
boundary corresponds to the location of the upwind PAMS monitoring site (i.e., typical
meteorological conditions), the PAMS data will provide detailed information regarding the
surface-layer inflow boundary conditions. The NOx and speciated hydrocarbon data will be
especially useful in specifying the inflow boundary conditions for these pollutants.

Finally, the PAMS data will contribute significantly to the evaluation of photochemical model
performance. Before a photochemical model can be used reliably to examine pollutant
transport issues or evaluate ozone attainment strategies, it must be shown to be able to
reasonably replicate the spatial and temporal pollutant concentration patterns associated with
one or more historical ozone episodes. The determination of model performance is
accomplished through model performance evaluation - a procedure which generally includes
the application of a variety of graphical and statistical analysis techniques as described in detail
by EPA (1991a).
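
Statistical measures commonly used in such evaluations include the normalized bias, the normalized gross error, and the accuracy of the unpaired peak prediction. The sketch below is an illustration only: the definitions, the observation cutoff, and the function name are assumptions, and the measures and acceptance ranges actually recommended should be taken from EPA (1991a).

```python
def performance_stats(observed, predicted, cutoff=60.0):
    """Simple paired and unpaired model performance measures.

    observed, predicted: equal-length lists of hourly concentrations (ppb),
    paired in time and space. Pairs with observations below cutoff are
    excluded from the normalized statistics.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o >= cutoff]
    if not pairs:
        raise ValueError("no observations at or above the cutoff")
    n = len(pairs)
    bias = sum((p - o) / o for o, p in pairs) / n
    error = sum(abs(p - o) / o for o, p in pairs) / n
    # Unpaired peak: highest prediction vs. highest observation anywhere.
    peak = (max(predicted) - max(observed)) / max(observed)
    return {"normalized_bias": bias,
            "normalized_gross_error": error,
            "unpaired_peak_accuracy": peak}

stats = performance_stats([50.0, 100.0, 120.0], [70.0, 90.0, 132.0])
```

In this made-up case the over- and under-predictions cancel in the bias but not in the gross error, which is why both measures are usually reported together.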

The additional data provided by the PAMS will supplement the graphical and statistical
analysis of the simulation results and will be useful in determining whether the model is able to
properly simulate the magnitude and timing of the ozone concentrations upwind of, within,
and downwind of the urban area. The availability of NOx and speciated hydrocarbon data will
enable examination of how well the model is able to simulate non-ozone precursor species.
Comparison of measured and simulated species ratios, such as the hydrocarbon-to-NOx ratio,
may allow evaluation of the way in which the photochemical processes are simulated. The
NOx and speciated hydrocarbon data, especially those from the urban (Type 2) monitoring
sites, may also provide a gross check on the emission inventory used to drive the
photochemical model in terms of emission levels, hydrocarbon-to-NOx ratios, and the assumed
hydrocarbon speciation used in the modeling analysis (see discussion on Emissions Tracking
below).

SIP CONTROL STRATEGY EVALUATION

PAMS networks will provide a greatly expanded pool of data on ozone precursor
concentrations which can be used to assist in the selection of alternative future control
strategies. PAMS data can be used to evaluate the effectiveness of emission control measures
which have been put in place since the introduction of the PAMS network by comparing
ozone and precursor trends. PAMS data can also be used to assist in the identification of the
most effective/least cost control strategies.

Comparison of Ozone and Precursor Trends

Routine measurements of ozone and precursor concentrations provided by the PAMS
network can be used to assess the effectiveness of emission control strategies by comparing
recent changes in precursor concentrations with resulting changes in ozone concentrations.
Procedures for detecting and evaluating trends in ambient concentrations are described in the
"Trends" section below. Total NMOC and NOx trends can be evaluated and compared with
ozone trends. Ozone trends during periods of contrasting NMOC and NOx trends can be
evaluated to determine the relative influence of VOC vs. NOx controls. It is also constructive
to compare trends in NMOC/NOx ratios with ozone trends. Regression analysis can be used
to develop and test models that relate changes in precursors to changes in ozone. Issues to be
concerned with when preparing ambient data for these analyses and analyzing results are
similar to those described in the "Emissions Tracking" section below.

Identification of Most Effective/Least-Cost Control Strategies

Assessment of the most-effective/least-cost control strategy requires a number of pieces of
information which can be supplied at least in part by data from PAMS networks. These
include the overall ratio of VOC to NOx in the atmosphere, the contribution of anthropogenic
sources to VOC concentrations, and the costs of various emission control options for each
precursor.

VOC to NOx Ratio

Modeling studies using both the Empirical Kinetic Modeling Approach (EKMA) and grid-
based models have shown that NOx reductions can result in either increases or decreases in
peak ozone concentrations depending on the ratio of VOC to NOx, among other factors. A
study by Chang et al. (1989) used EKMA to evaluate the sensitivity of ozone to VOC
reductions in 20 cities. The results showed that at VOC/NOx ratios of less than 8.5, VOC
reduction alone was more effective in reducing peak ozone concentrations than VOC
reduction with proportional NOx reduction. Conversely, where the ratio is much greater than
8.5 (i.e., in NOx-limited regimes), NOx reductions may be particularly effective and VOC
reductions ineffective. A recent National Research Council report also examined this issue
(NRC, 1991).

Besides determining the relative influence of VOC vs. NOx reductions, the VOC/NOx ratio has
been shown to influence the relative reactivity, or ozone-forming potential, of various species
of VOCs (Dodge, 1984). Specifically, when the ratio is high, the difference in reactivity
among species is greater than when the ratio is low. Thus, the VOC/NOx ratio may indicate
whether or not VOC control would be more effectively targeted at particular species.

The above discussion indicates that the VOC/NOx ratio may be used as a general guide to the
effectiveness of control of VOC and NOx, and whether VOC control should be targeted to
particular species. Of course, the actual emission control options available and their
associated costs (discussed below) must also be considered in constructing a cost-effective
control strategy.

Specification of what values of the VOC/NOx ratio are generally considered "low" or "high"
may be assisted by consulting previous studies, such as Chang et al. (1989). In comparing
ratios from such studies, it is important to consider whether or not the ratios are from
emissions estimates or ambient data (i.e., NMOC/NOx ratios) and, if from ambient data, the
manner in which the data was processed to arrive at the ratio. As indicated in the "Emissions
Tracking" section below, significantly different ratios can be obtained depending on how the
data was processed; some suggestions are presented in the following discussion.

The NMOC/NOx ratio will vary diurnally and geographically. The ratio of most interest for
evaluating emission control strategies for ozone attainment would be the morning ratio
corresponding to the conditions that result in the basinwide maximum ozone concentration
later in the day. Therefore, the appropriate ratio to use would be that observed at the type 2
PAMS monitor that is downwind of the area of maximum precursor emissions on the peak
ozone day being analyzed. PAMS meteorological data could be combined with other wind
data to perform a back trajectory calculation from the time of the ozone maximum to
determine the appropriate type 2 monitor and the appropriate time periods for calculating the
ratio.

Better estimates of the relative effectiveness of emission reductions focused primarily on
VOCs, primarily on NOx, or on both precursors may be made on the basis of site-specific
modeling. Besides considering local conditions, a city-specific EKMA isopleth diagram gives
information on potential ozone reductions associated with emission reductions, accounting for
the fact that the NMOC/NOx ratio will change as controls are put into place. Because EKMA
results are known to be quite sensitive to the ambient NMOC/NOx ratio, the use of PAMS
data for this important piece of input data can significantly reduce the uncertainty of EKMA
estimates.

EKMA modeling focuses only on the basinwide maximum ozone concentration; it may not
capture important geographic and temporal variations in the NMOC/NOx ratio, or geographic
variations in resulting ozone changes. The more detailed analysis possible with a grid-based
model may also be enhanced with PAMS data for initial and boundary conditions of ozone
and its precursors. Multiple simulations with a grid-based model may be used to construct
ozone isopleth diagrams, similar to those produced by EKMA, for a variety of ozone air
quality indicators, including the basinwide maximum ozone concentrations, the maximum
concentration in each basin subregion, and population exposure statistics.

Estimation of the Contribution of Biogenic Emissions to VOCs

As noted above, the actual options for emissions controls, as well as their associated costs,
must also be considered in the construction of a cost-effective emission control strategy. The
greater the contribution of presumably uncontrollable biogenic emissions to VOC
concentrations, the greater will be the reduction in anthropogenic VOC emissions required to
achieve a given reduction in overall VOC emissions.

An estimate of the contribution of biogenic sources of VOC emissions to the overall level of
NMOC concentrations in the nonattainment area may be made from an analysis of the relative
abundances of biogenic and anthropogenic VOC species using PAMS speciated data sets.
The three most prevalent biogenic hydrocarbons, isoprene and α- and β-pinene, will be
measured in the PAMS program. Together, these three species comprise a majority of
biogenic emissions. However, all three are much more reactive than most anthropogenic
hydrocarbons. Assessments of the relative importance of anthropogenic and biogenic
emissions should take this reactivity into account. Differences in spatial and temporal patterns
of biogenic emissions also complicate the use of PAMS data to assess the relative importance
of biogenic and anthropogenic emissions. The use of PAMS data to verify biogenic emission
inventories is discussed further in the "Emissions Tracking" section.

Evaluation of Costs of Emission Reduction Options

As noted above, the costs of the control options must be considered in conjunction with their
effectiveness in constructing cost-effective emission control strategies. In situations where the
number of emission control options is limited, it may be possible to systematically evaluate the
cost and effectiveness of all the various potential combinations of measures that are estimated
to lead to attainment. Where a multitude of possibilities exist, a software system currently
under development, the Attainment Strategies Assessment Package (ASAP; Rosenbaum,
1994), may be utilized. ASAP combines Urban Airshed Modeling (UAM) and control cost
data with statistical and mathematical optimization techniques to estimate the least-cost
combination of emission control options that will achieve a set of user-determined air quality
goals. Possible goals include attainment of the ozone NAAQS, no significant deterioration in
any sub-region, and achievement of various population exposure statistics. PAMS data can
provide valuable information for the UAM applications required by ASAP as discussed in the
"NAAQS Attainment Strategy Development" section above.

EMISSIONS TRACKING

Speciated VOC data to be obtained at PAMS sites will allow more detailed reconciliation of
ambient measurements and emission inventories than has been possible to date in most cities.
Measurements taken at PAMS sites during the first few years of the program can be used as a
"top-down" verification of the current-year emission inventories. After the PAMS databases
have accumulated over several years, they can be used as a check on emission reduction
trends, including checks on compliance with reasonable further progress (RFP) requirements.
The next few years will see a number of new emission regulations go into effect, including
reformulated gasoline and toxics regulations. The PAMS data will be useful in tracking the
ambient impacts of these programs.

Among the types of PAMS sites described in Section 4 of Appendix D, type 2 sites will be
most important for tracking emissions originating within the nonattainment area. These
sites are located in the central portion of each urban area, where the highest ambient
concentrations of ozone precursors are expected. Type 1 sites will be useful for tracking
trends in upwind emission sources so long as chemical transformations of primary emissions
between release and impact at the site are accounted for.

Verification of Current Inventories

The use of ambient data to check the validity of current emission inventories relies on the
construction of characteristic ratios, such as NMOC/NOx, NMOC/CO, or the ratios of
individual VOCs to NMOC. As such, this type of verification cannot determine whether the
overall inventory is overestimated or underestimated, only whether the relative proportions
of various components of the inventory are correct. When used in conjunction with
photochemical modeling, however, a more complete picture of inventory accuracy can be
obtained by examining model performance for NMOC and NOx.

Inventory verification should include checks on both the total NMOC inventory and the
individual components of the inventory. For the total NMOC inventory, this process involves
the use of two characteristic ratios: NMOC/NOx and NMOC/CO. This check assesses the
validity of the VOC inventory as a whole, relative to the NOx and CO inventories. The
validity of the base VOC inventory is important to emissions planning and RFP
demonstrations. The check on the individual VOCs assesses the validity of emission
speciation data. This is important for assessing the accuracy of emission estimates for toxics,
and for assessing the reactivity of the VOC inventory as it is used for photochemical
modeling.

NMOC/NOx and NMOC/CO Ratios

The NMOC/NOx ratio has been used as the primary tool to understand the relative importance
of NMOC and NOx precursors to ozone formation. In a simplified system such as a box
model, when NMOC/NOx is low (well below 10), the atmosphere is considered to be
"hydrocarbon-limited", and a VOC-based emission control strategy is necessary for ozone
reductions. In some such cases, NOx emission reductions will actually increase ozone.
Conversely, when NMOC/NOx ratios are well above 10, the atmosphere is "NOx-limited", and
NOx emission controls will be more effective than VOC emission controls in reducing ozone.
In the actual atmosphere, however, wide variations in the measured NMOC/NOx ratio occur,
making interpretation less straightforward than it would seem from this simplistic picture.

Measured NMOC/NOx ratios have been used to evaluate the VOC emission inventory in the
Los Angeles area (Fujita et al., 1992). Starting with the assumption that the NOx emission
inventory was relatively accurate, comparison of measured and inventory NMOC/NOx ratios
was used to suggest that motor vehicle VOC emissions in the inventory were underestimated.
PAMS data will be instrumental in determining whether this finding is universal, or is limited
to particular geographic locations or meteorological conditions.

Measured NMOC/CO ratios may be less useful for emissions tracking than the NMOC/NOx
ratios, since CO inventories are suspected to contain the same types of underestimates as the
VOC inventories. However, CO is the best tracer for motor vehicle exhaust of all the criteria
pollutants. Observation of ambient NMOC/CO ratios near the values predicted by motor
vehicle emission models and high correlation between measured NMOC and CO will indicate
that motor vehicle exhaust is a major component of ambient concentrations at a given
monitoring site.

Calculation of NMOC/NOx and NMOC/CO Ratios

PAMS NMOC data will be obtained as eight 3-hour averages on the days sampled. Therefore,
ambient NMOC/NOx and NMOC/CO ratios should be constructed as 3-hour averages, where
3-hour NOx and CO concentrations are obtained from the 1-hour averages at the same site.
The ratios should be constructed with NMOC expressed in units of ppbC and NOx and CO
expressed in units of ppb.
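The averaging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the hourly values are assumed to run from hour 0 to hour 23 so that the eight 3-hour NMOC periods align with hours 0-2, 3-5, and so on, and a period with any missing hourly value is simply dropped rather than filled.

```python
def three_hour_means(hourly):
    """Average 24 hourly values (hour 0..23) into eight 3-hour means.

    Missing hours are passed as None. A 3-hour block that contains any
    missing hour yields None (assumed handling, not a PAMS requirement).
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    means = []
    for start in range(0, 24, 3):
        block = hourly[start:start + 3]
        if any(value is None for value in block):
            means.append(None)
        else:
            means.append(sum(block) / 3.0)
    return means


def three_hour_ratios(nmoc_ppbc_3h, denominator_ppb_hourly):
    """Ratio of 3-hour NMOC (ppbC) to 3-hour mean NOx or CO (ppb)."""
    if len(nmoc_ppbc_3h) != 8:
        raise ValueError("expected eight 3-hour NMOC values")
    denominator_3h = three_hour_means(denominator_ppb_hourly)
    ratios = []
    for nmoc, denom in zip(nmoc_ppbc_3h, denominator_3h):
        if nmoc is None or denom is None or denom <= 0:
            ratios.append(None)  # ratio undefined for this period
        else:
            ratios.append(nmoc / denom)
    return ratios
```

The same function serves for NMOC/NOx and NMOC/CO; only the hourly series passed as the denominator changes.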

In order to construct NMOC/NOx and NMOC/CO ratios for the emission inventory, the
inventory totals must be converted from a mass basis (e.g., tons per day) to a molar basis.
This is accomplished by dividing by the molecular weight. By convention, a molecular weight
of 46 is used for NOx, expressing all NOx as NO2. The molecular weight of CO is 28. For
NMOC, an average molecular weight per carbon atom is needed for the conversion. This
average molecular weight will vary from inventory to inventory, but is generally on the order
of 14. Gridded emission inventories prepared for input into the Urban Airshed Model (UAM)
have reactive organic emissions expressed as methane. For these inventories, a molecular
weight per carbon atom of 16 should be used.
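As a worked illustration of the mass-to-molar conversion, the following Python sketch computes an inventory ratio comparable to an ambient ppbC/ppb ratio. The function name and the example tonnages are hypothetical; the molecular weights (46 for NOx as NO2, 28 for CO, roughly 14 per carbon for NMOC, 16 for UAM inventories expressed as methane) are the values given in the text.

```python
MW_NOX_AS_NO2 = 46.0   # all NOx expressed as NO2
MW_CO = 28.0
MW_PER_CARBON_DEFAULT = 14.0   # typical average; varies by inventory
MW_PER_CARBON_UAM = 16.0       # UAM inventories expressed as methane


def inventory_molar_ratio(nmoc_mass, other_mass, other_mw,
                          nmoc_mw_per_carbon=MW_PER_CARBON_DEFAULT):
    """Moles of NMOC carbon per mole of NOx or CO.

    nmoc_mass and other_mass must be in the same mass units over the same
    time period (e.g., tons per day); the units cancel in the ratio.
    """
    if other_mass <= 0:
        raise ValueError("denominator emissions must be positive")
    nmoc_moles_carbon = nmoc_mass / nmoc_mw_per_carbon
    other_moles = other_mass / other_mw
    return nmoc_moles_carbon / other_moles


# Hypothetical totals: 100 tons/day NMOC and 50 tons/day NOx.
example_ratio = inventory_molar_ratio(100.0, 50.0, MW_NOX_AS_NO2)
```

Note that a mass ratio of 2 in this hypothetical case corresponds to a molar ratio of about 6.6, which is why inventory mass totals cannot be compared directly with ambient ratios.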

The NMOC/NOx and NMOC/CO ratios for the emission inventory must be developed for
each 3-hour period, since diurnal patterns of emissions can vary significantly from source to
source. Source-specific diurnal profiles should be used for this purpose. If a gridded hourly
emission inventory is available, 3-hour averages can be developed from the hourly values.
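Where only daily totals by source category are available, the allocation to 3-hour periods might look like the sketch below. The data layout (a dictionary of daily totals and a dictionary of 24 hourly fractions per source) is an assumption made for illustration, not a format prescribed by any inventory system.

```python
def inventory_three_hour_totals(daily_totals, diurnal_profiles):
    """Allocate daily emissions to eight 3-hour periods.

    daily_totals: {source: daily emissions}, any consistent unit
    diurnal_profiles: {source: list of 24 hourly fractions summing to 1}
    Returns a list of eight period totals summed over all sources.
    """
    periods = [0.0] * 8
    for source, total in daily_totals.items():
        profile = diurnal_profiles[source]
        if len(profile) != 24 or abs(sum(profile) - 1.0) > 1e-6:
            raise ValueError(f"invalid diurnal profile for {source}")
        for hour, fraction in enumerate(profile):
            periods[hour // 3] += total * fraction
    return periods
```

Running this separately for NMOC, NOx, and CO and then converting each period total to a molar basis gives inventory ratios for each 3-hour period.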

In making a comparison between ambient and inventory NMOC, it is important to recognize
the varying operational definitions of NMOC and how they may differ in ambient
measurements and emission inventories. According to Purdue et al. (1991), total NMOC is
calculated in a GC/FID-based measurement system by summing the concentrations associated
with all identified and unidentified peaks within the elution time window of the analysis. This
restricts the species included in total NMOC to those with boiling points below a certain point
that respond in a flame ionization detector (FID). In general, this includes all hydrocarbons
(i.e., organic compounds containing only carbon and hydrogen) with 2-10 carbon atoms.

Many moderately polar oxygen- and nitrogen-containing organic compounds with 2-7 carbon
atoms will be included in NMOC totals; however, their response factors in the FID are lower
than those of the hydrocarbons and thus their contribution will be underestimated. Highly
polar organic compounds will not elute from the GC column and hence will not be included
although they may be included in the VOC emission inventory. Alternative analytical methods
could be used to alleviate some of these potential biases. The specific analytical method used
should be considered when analyzing the ambient data.

Therefore, in order to make an accurate comparison, the inventory NMOC totals should be
adjusted to exclude hydrocarbons with more than 10 carbons, moderately polar oxygen-
containing compounds with more than 7 carbons, and all highly polar compounds. In practice,
this type of detailed adjustment may not be feasible. However, analysts should be aware that
this is a potential source of bias that would tend to result in high NMOC/NOx and NMOC/CO
ratios for the emission inventory relative to the ambient measurements. The magnitude of this
adjustment has been reported to be 5 to 7 percent in one study (Fujita et al., 1992).

Diurnal Variations

Comparison of measured NMOC/NOx ratios to those from emission inventories has
historically focused on the 6:00 - 9:00 a.m. time period. There are several reasons for this.
First, this time period represents the morning commute during which large quantities of fresh
emissions are introduced to the atmosphere. Second, atmospheric mixing heights are low
during this time period, keeping ground-level concentrations high and preventing the fresh
emissions from mixing with aged air aloft. Finally, levels of photochemical activity are low in
the early morning, minimizing the effects of atmospheric transformation on VOC and NOx
concentrations. In a more practical sense, the 6:00 - 9:00 a.m. ratios are needed as input to
the EKMA model. For these reasons, NMOC concentrations have historically been measured
only during the 6:00 - 9:00 a.m. period.

The 6:00 - 9:00 a.m. NMOC/NOx ratios tend to emphasize the importance of motor vehicle
exhaust, since this source peaks during the 6:00 - 9:00 a.m. time period. Other VOC sources
such as biogenic hydrocarbons, non-road mobile sources, and evaporative sources of all types,
peak at midday or in the afternoon. In addition, elevated NOx sources do not always
contribute to 6:00 - 9:00 a.m. NOx concentrations at ground level but may mix down later in
the day. There is a growing awareness that these processes may contribute significantly to
ozone formation.

Part of the purpose in collecting 3-hour NMOC samples throughout the day in the PAMS
program is to move away from complete reliance on 6:00 - 9:00 a.m. NMOC/NOx ratios.
However, comparing NMOC/NOx ratios at other times of day increases the potential for
artifacts arising from the relative reactivities of NMOC and NOx. NOx reacts more rapidly
than most VOCs, so the NMOC/NOx ratio in aged emissions will be higher than the ratio in
fresh emissions. This effect is most pronounced during the hours of 9:00 a.m. to 3:00 p.m.,
when photolytic activity is maximized. Thus, measured NMOC/NOx ratios taken during this
time period may be biased high relative to the inventory ratio.

For sites located close to emission sources, such as the PAMS type-2 sites, this bias will be
minimized because fresh emissions are likely to dominate the local concentrations throughout
the day. Meteorological data obtained on-site can be used to identify days on which stagnation
or changes in wind direction had the potential to cause significant buildup of emissions, or
transport of aged emissions, at that site. Midday NMOC/NOx ratios obtained on such days
should not be used for emissions tracking.

One way to mitigate the effects of NOx reactivity is to calculate the ratio of NMOC to total
reactive nitrogen, NOy, rather than NOx. In addition to NOx, NOy includes all major products
of NOx oxidation, such as nitric acid, aerosol nitrate, and PAN. Measurements of NOy are not
required at PAMS sites, however, and are difficult to make.

The reactivity of CO is much lower than that of most VOCs. Thus, NMOC/CO ratios in aged
emissions will be lower than those in fresh emissions. The direction of potential bias in
midday ambient NMOC/CO ratios is thus opposite to that for NMOC/NOx.
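The opposite directions of these two biases can be illustrated with a deliberately simplified calculation. The sketch assumes that the numerator and denominator of a ratio each decay by lumped first-order loss with no fresh emissions added; the function name is hypothetical and the loss rates must be supplied by the analyst.

```python
from math import exp


def aged_ratio(fresh_ratio, k_numerator, k_denominator, hours):
    """Ratio after `hours` of aging under first-order loss.

    k_numerator and k_denominator are lumped loss rates (per hour) for
    the numerator and denominator species. The ratio rises when the
    denominator is lost faster and falls when the numerator is lost
    faster.
    """
    return fresh_ratio * exp(-(k_numerator - k_denominator) * hours)
```

With the denominator decaying faster than NMOC (the NOx case) the aged ratio exceeds the fresh ratio; with the denominator decaying more slowly (the CO case) the aged ratio falls below it.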

Day-to-Day Variations

PAMS measurements represent conditions at a single location, and may not be representative
of the air basin as a whole. On any given day, ambient concentrations at a particular site may
be influenced by particular nearby sources. A seasonally averaged NMOC/NOx ratio at a
given site will be more representative of the overall mix of emissions in the vicinity of the site.
Thus, emissions tracking comparisons should rely on seasonal averages of ambient
NMOC/NOx rather than values from a single day or a single ozone episode.[3] The variability of
NMOC/NOx ratios for a given 3-hour time period at a given site will provide an estimate of
the uncertainty in the seasonal average ambient NMOC/NOx ratio. Possible
weekday/weekend effects should be explored by comparing weekday and weekend averages.
Most emission inventories are designed to be representative only of weekday emissions;
weekend emission characteristics may be markedly different.

[3] This applies to tracking of seasonal inventories. Inventories used in photochemical modeling of
specific episodes may differ from seasonal inventories due to day-specific factors such as temperature
and point source activity. Such inventories are difficult to verify with ambient data.

An exception to this general rule may arise in the case of specific episodes to be used for
photochemical modeling. In this case, day-specific, gridded hourly emissions will be available,
and it may be of interest to perform site- and day-specific NMOC/NOx comparisons.

Spatial Variations

The PAMS regulations call for one or two PAMS type-2 sites to be located in each urban
area. Thus, there will be minimal opportunity to assess spatial variations of the ambient
NMOC/NOx ratio in emission-dominated sites. Nonattainment areas with multiple urbanized
areas will be required to establish at least one type-2 site in each urbanized area. For these
regions, the spatial variation in seasonal-average NMOC/NOx for each 3-hour time period will
provide another measure of the uncertainty in the measured ratio.

Assessment of the Agreement between Ambient and Inventory NMOC/NOx

As noted above, ambient NMOC/NOx ratios can be calculated in a number of different ways
for comparison with emissions estimates. For most methods, a mean ratio is calculated from a
series of observations. Assuming the ratios are approximately normally distributed, a suitable
90 or 95% confidence interval can be constructed for the mean and compared with the ratio
obtained from the inventory. If the confidence interval does not include the inventory ratio,
then it is highly likely that a real discrepancy exists. Otherwise, one can conclude that the
ratios are in agreement to within the uncertainty of the ambient data.
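A minimal sketch of this comparison follows. It substitutes a normal (z) quantile for the Student's t quantile, which is an assumption that is reasonable only for a seasonal sample of roughly 30 or more observations; for smaller samples the t quantile should be used. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ratio_interval(ratios, confidence=0.95):
    """Two-sided confidence interval for the mean ambient ratio.

    Uses a normal quantile (large-sample approximation, assumed here).
    """
    n = len(ratios)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(ratios)
    standard_error = stdev(ratios) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return centre - z * standard_error, centre + z * standard_error


def inventory_agrees(ratios, inventory_ratio, confidence=0.95):
    """True if the inventory ratio lies inside the confidence interval."""
    low, high = mean_ratio_interval(ratios, confidence)
    return low <= inventory_ratio <= high
```

If the ratios are visibly skewed, applying the same calculation to their logarithms is a common alternative, although the report does not prescribe one.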

VOC Composition

The PAMS speciated VOC measurements will allow tracking of individual components of the
VOC inventory. The primary purpose of emissions tracking for individual VOCs at PAMS
sites is to validate VOC composition used as input to photochemical models. The importance
of VOC composition arises from the fact that individual VOCs exhibit widely ranging
potentials for ozone formation. Even if current inventories accurately represent the total
tonnage of VOC emissions, they may not allocate the correct overall reactivity to those VOC
emissions. This has important ramifications for photochemical modeling, since it will impact
model performance and modeled response to control strategies.

The PAMS VOC data will also assist in verification of toxic emission inventories and
identification of potential problem areas and unreported sources for toxics. Such
measurements also have the potential to be used in receptor modeling analyses for VOC
source apportionment.

Calculation of Ambient Speciation Profiles

In order to compare ambient speciated VOC measurements with speciated emissions, both
must be expressed as a percent of total NMOC. Emissions speciation data are reported in
terms of weight percent. To convert ambient VOC data from ppbC to weight percent of
NMOC, the molecular weight (MW) and number of carbons (#C) must be known for each
compound. The conversion formula is:

$$
\text{weight percent of NMOC} = \frac{\text{conc (ppbC)} \times (\text{MW}/\#\text{C}) \times 100}{\text{NMOC (ppbC)} \times 14}
$$

Again, the value of 14 in the denominator is an approximation to the average MW/#C for the
inventory as a whole and may vary from inventory to inventory.

An acceptable alternative to this rather tedious exercise is available for hydrocarbons. Since
MW/#C is always between 13 and 15 for VOCs that contain only carbon and hydrogen, the
weight fraction can be approximated by the ratio of the ppbC concentration to NMOC.
However, if oxygen-containing species are included, the error in this approximation is
appreciable and the full calculation should be performed (the ratio of MW to #C is 30 for
formaldehyde).
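The conversion and its hydrocarbon shortcut can be compared numerically with a short sketch. The function names and the concentrations used below are hypothetical; the default of 14 for the average MW/#C is the approximation given in the text and should be replaced with an inventory-specific value where one is known.

```python
def weight_percent_of_nmoc(conc_ppbc, mw, n_carbons, nmoc_ppbc,
                           avg_mw_per_carbon=14.0):
    """Full conversion of a species concentration (ppbC) to weight
    percent of NMOC, following the formula in the text."""
    return conc_ppbc * (mw / n_carbons) * 100.0 / (nmoc_ppbc * avg_mw_per_carbon)


def approx_weight_percent(conc_ppbc, nmoc_ppbc):
    """Shortcut for pure hydrocarbons: ratio of ppbC to NMOC ppbC."""
    return conc_ppbc * 100.0 / nmoc_ppbc


# Hypothetical sample: 500 ppbC total NMOC.
# Toluene (C7H8, MW about 92.14): shortcut is close to the full result.
toluene_full = weight_percent_of_nmoc(35.0, 92.14, 7, 500.0)
toluene_approx = approx_weight_percent(35.0, 500.0)

# Formaldehyde (CH2O, MW about 30.03): shortcut is off by about a factor of 2.
hcho_full = weight_percent_of_nmoc(10.0, 30.03, 1, 500.0)
hcho_approx = approx_weight_percent(10.0, 500.0)
```

For toluene the two results differ by well under one percentage point, while for formaldehyde the shortcut understates the weight percent by roughly half, which is the error the text warns about.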

To obtain a speciated VOC inventory, source-specific speciation profiles are required. These
are available through EPA's SPECIATE database (EPA, 1991b), as well as other sources
(e.g., Scheff et al., 1989, 1992). The SPECIATE database provides cross-references of
profiles to inventory SCC codes. It is important to note that SPECIATE profiles are
expressed as weight percent of TOG rather than NMOC. In order to convert to NMOC, the
methane weight percent in the SPECIATE profiles should be subtracted and the remaining
species re-normalized to total 100 percent. In some cases, default speciation profiles may not
be appropriate and it may be advisable to supplement the SPECIATE profiles with locally-
obtained data. This is especially true of industrial sources such as chemical plants and
refineries.
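The methane subtraction and re-normalization can be sketched as follows. The dictionary layout and the species key "methane" are assumptions made for illustration and do not reflect the actual SPECIATE file format.

```python
def tog_to_nmoc_profile(tog_profile, methane_key="methane"):
    """Convert a weight-percent-of-TOG profile to weight percent of NMOC.

    tog_profile: {species name: weight percent of TOG}
    Methane is dropped and the remaining species are re-normalized so
    that they total 100 percent.
    """
    non_methane = {name: pct for name, pct in tog_profile.items()
                   if name != methane_key}
    total = sum(non_methane.values())
    if total <= 0:
        raise ValueError("profile contains no non-methane species")
    return {name: 100.0 * pct / total for name, pct in non_methane.items()}
```

For a hypothetical profile of 20 percent methane, 30 percent butane, and 50 percent toluene, the NMOC-basis profile becomes 37.5 percent butane and 62.5 percent toluene.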

As with the NMOC/NOx ratios, the ambient speciation profiles should be developed as 3-hour
averages throughout the day. Seasonal averages should be calculated for each 3-hour period
for each site, and these seasonal averages should be the primary basis for comparison with the
emission inventory.

When gridded inventories are prepared for input into the UAM, VOCs are reported according
to their speciation in the Carbon Bond Mechanism (CBM). When the primary purpose of
speciated VOC comparisons is to assess potential effects on photochemical modeling results,
it may be of interest to convert measured speciated VOCs to their corresponding CBM class.
This comparison allows for the possibility of compensating errors in the inventory and focuses
on the accuracy of the representation of overall reactivity of VOC emissions. Information on
CBM classes can be found in EPA (1992b).

Indicator Species

Some chemical species are uniquely or nearly uniquely associated with a particular source, and
can be used as a "tracer of opportunity" for that source. For example, tetrachloroethylene is a
tracer for dry cleaning operations. If indicator species can be identified for major sources
within an urban area, considerable information on source contributions can be obtained
without the need for sophisticated receptor modeling approaches.

The properties of a good indicator species, besides being uniquely associated with a particular
source, are that it should be among the target compounds routinely measured at PAMS sites, that
it should be present at high enough concentration to be reliably detected, and that it should be
fairly slow-reacting. Using these criteria, butane emerges as a good candidate for a tracer for
gasoline evaporative emissions. Butane constitutes roughly 35 percent of motor vehicle
evaporative emissions. Using the assumption that butane has no other major sources, a
reasonable upper bound estimate on the contribution of gasoline evaporative emissions can be
obtained from the ambient butane percent of NMOC.
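The upper-bound calculation is a single division, sketched below. The function name is hypothetical, and the default of 0.35 is the approximate butane fraction of evaporative emissions quoted in the text; a locally measured fraction should be preferred where available.

```python
def evaporative_upper_bound(butane_wt_pct_of_nmoc,
                            butane_fraction_of_evaporative=0.35):
    """Upper-bound estimate of the gasoline evaporative contribution to
    ambient NMOC, in weight percent.

    Assumes all ambient butane comes from evaporative emissions, so any
    other butane source makes the true contribution smaller.
    """
    if not 0.0 < butane_fraction_of_evaporative <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    estimate = butane_wt_pct_of_nmoc / butane_fraction_of_evaporative
    return min(100.0, estimate)
```

For example, if butane makes up 7 weight percent of ambient NMOC, evaporative emissions would account for at most about 20 percent of NMOC under these assumptions.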

Tracers for motor vehicle exhaust are more difficult to come by. Acetylene has been used in
the past as a tracer for exhaust (Whitby and Altwicker, 1978). However, measurement
difficulties for C2 hydrocarbons, including low recovery and co-elution with ethene and
ethane, are commonly encountered and thus reliable measurements of acetylene are not
guaranteed unless proper analytical methods, such as a dual column GC, are used. In
addition, the acetylene content of auto exhaust has decreased in recent years. Benzene has
also been proposed as a tracer for motor vehicle exhaust. Since the toxic potential of benzene
has been realized, most uses of benzene in solvents and industrial processes have been
eliminated. However, benzene is still present in gasoline and is a relatively large component of
motor vehicle exhaust. Benzene is present in evaporative emissions as well as exhaust, and is
present in emissions from numerous combustion processes in addition to motor vehicles.
Nevertheless, it might be useful in developing an upper bound estimate of the contribution of
motor vehicle exhaust.

The aromatic compounds benzene, toluene, and xylenes are present in a characteristic ratio in
motor vehicle exhaust. Excess toluene beyond the amount expected for motor vehicle exhaust
is an indicator for surface coating processes such as the application of solvent-based paints. Propane
can be an indicator for liquefied petroleum gas (LPG) in areas where LPG is used, and is also
an indicator for refinery emissions and oil and gas production. Ethane is an indicator for
natural gas leakage (but is hard to measure as noted above). Isobutane is the propellant that
replaced chlorofluorocarbons in most consumer aerosol products, and may be a tracer for
consumer product emissions.

Non-road mobile sources have recently been identified as a major source category for VOCs
(EPA, 1991a). VOC speciation profiles for nonroad sources are not currently available.

Based on the limited data currently available, it is likely that emissions from 4-stroke gasoline
engines resemble those from non-catalyst motor vehicles, while the emissions from 2-stroke
gasoline engines resemble a mixture of non-catalyst vehicle exhaust and pure gasoline (Hare
and White, 1991). Thus, source attributions for motor vehicle exhaust are likely to include
the contribution of 4-stroke nonroad engines. Source attributions for unburnt gasoline or
gasoline spillage may include the contribution of 2-stroke gasoline engines. Evaporative
emissions from nonroad sources are expected to be negligible (EPA, 1991a).

Isoprene and α- and β-pinene are tracers for biogenic emissions, although the pinenes are also
present in consumer products such as air fresheners. All three of these compounds violate one
of the guidelines for a good indicator species in that they are all highly reactive. However, no
non-reactive tracers for biogenic emissions have been identified to date, so there is little
choice but to use these. Because of their reactivity, source attributions based on ambient
concentrations are likely to underestimate the contribution of biogenic emissions.

The high reactivity of biogenic hydrocarbons has two effects. First, the extremely short
atmospheric lifetimes (on the order of minutes for isoprene) will reduce midday concentrations
by comparison with fresh emissions. Second, the high efficiency of these species in producing
ozone means that their contribution to ozone production may be much greater than their
contribution to total NMOC. To adjust for this second effect, the Maximum Incremental
Reactivity / Maximum Ozone Reactivity (MIR/MOR) approach developed by Carter can be
used. Each species concentration is weighted by a factor corresponding to its reactivity
towards ozone formation. In this way, the contributions of biogenic and anthropogenic
sources to ozone formation can be assessed.
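The weighting step can be sketched as below. The reactivity factors must come from a published reactivity scale and be on a basis consistent with the concentration units supplied; the numbers in the example are arbitrary placeholders chosen only to show the arithmetic, not actual MIR or MOR values.

```python
def reactivity_weighted_shares(concentrations, reactivity_factors):
    """Share of total reactivity-weighted concentration by species.

    concentrations: {species: ambient concentration}
    reactivity_factors: {species: ozone-forming reactivity factor}
    Both must use a mutually consistent basis (assumed, not checked).
    """
    weighted = {name: conc * reactivity_factors[name]
                for name, conc in concentrations.items()}
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("total weighted concentration must be positive")
    return {name: value / total for name, value in weighted.items()}


# Placeholder inputs only: a reactive species at 2 percent of NMOC with a
# factor five times that of the remaining mixture.
shares = reactivity_weighted_shares(
    {"reactive_species": 2.0, "other_nmoc": 98.0},
    {"reactive_species": 5.0, "other_nmoc": 1.0},
)
```

In this placeholder case the reactive species accounts for 2 percent of the concentration but about 9 percent of the weighted total, illustrating how a small concentration share can understate a species' role in ozone formation.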

Other issues besides reactivity complicate emissions tracking for biogenic emissions as well.
The PAMS type-2 sites, designed to be located in the area of maximum anthropogenic
emissions, will not be located in the areas of maximum biogenic emissions (although type 1 or
type 4 sites may be). Therefore, spatially-resolved emissions, rather than emission totals for
an entire area, are needed to make meaningful comparisons.

This type of source attribution for VOCs is relatively new, and reliable indicator species have
not been identified or verified for most source categories. The examples cited above represent
species that have been used as tracers, or that common sense says ought to be tracers.
However, their universal applicability has not been demonstrated.

Receptor Modeling

Receptor modeling has been used for source apportionment of VOCs and emission inventory
verification (Nelson et al., 1983; Harley et al., 1992; Lewis et al., 1993; Kenski et al., 1993).
In order to conduct a receptor modeling analysis, it may be advisable to remove highly
reactive species, and secondary species such as formaldehyde, from the ambient profile. If
overnight samples or early morning samples are used, this may not be necessary. Highly
reactive species include all olefins, especially the 2-butenes, 2-pentenes, and 2-hexenes. As
pointed out above, isoprene and the pinenes also fall into this category. Other highly reactive
species are the xylenes, and especially the trimethylbenzenes.

The ratios of benzene to toluene and of ethylbenzene to m+p-xylene have been suggested as
estimates of the degree of reaction that has occurred in an ambient sample (Nelson and
Quigley, 1983; Singh et al., 1985). In both cases, the pair of species exhibit a relatively
constant ratio in emissions, but decay at markedly different rates. If the ratio in fresh
emissions is known, the age of an ambient sample can be calculated from the ratio in the
sample (Roberts et al., 1993). In this manner, using the known reaction rates of all the species
in the profile, an estimate of the ambient profile at the time of emission could be obtained.
Although this approach has not been used to date, it might overcome the difficulties with
attempting to conduct source allocations for biogenic emissions and other reactive categories.
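One way such a calculation might be set up is sketched below. It rests on strong assumptions that are not stated in the report: reaction with OH is the only loss process, both species of the pair share the same source and dilution history, and no fresh emissions mix into the aged air. The function names are hypothetical, and rate constants must be taken from the kinetics literature.

```python
from math import exp, log


def oh_exposure(ratio_fresh, ratio_sample, k_numerator, k_denominator):
    """Time-integrated OH exposure implied by a change in a species ratio.

    ratio = [numerator species] / [denominator species].
    With first-order OH loss, ratio(t) = ratio_fresh *
    exp(-(k_numerator - k_denominator) * exposure), so the exposure
    follows by inverting that expression. Units of exposure are the
    reciprocal of the rate-constant units.
    """
    if k_numerator == k_denominator:
        raise ValueError("species must react at different rates")
    return log(ratio_fresh / ratio_sample) / (k_numerator - k_denominator)


def back_correct(conc_sample, k_species, exposure):
    """Estimate a species concentration at the time of emission,
    relative to the other species in the same sample."""
    return conc_sample * exp(k_species * exposure)
```

Applying `back_correct` to every species in a measured profile, using the exposure derived from one reference pair, gives an estimate of the profile at the time of emission that can then be passed to a receptor model.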

Verification of Emissions Trends

An important component of emissions tracking is the assessment of the effectiveness of
emission control measures. All ozone nonattainment areas are required by the CAAA to
adopt measures sufficient to reduce VOC emissions by 15 percent between 1990 and 1996
and by 3 percent each year thereafter to demonstrate Reasonable Further Progress (RFP)
towards attainment. The PAMS data provide the opportunity for verifying that these emission
changes are actually taking place. Trends in VOC composition are also expected, especially in
areas where reformulated gasolines will be used.

A complete discussion of the detection of trends in ambient data is presented in the "Trends"
section below. The following discussion focuses on the comparison of observed trends with
expected trends arising from changes in emissions.

RFP Tracking

Net changes in emissions from year to year represent the cumulative effects of a number of
different processes. Emission decreases arising from the application of controls may be offset
by emission increases due to growth. Motor vehicle VOC emissions are decreasing every year
because fleet turnover causes replacement of older, high emitting vehicles with new, cleaner
vehicles. Fleet turnover alone is expected to result in motor vehicle VOC emission reductions
on the order of 50 percent or more between 1990 and 1996. However, emission reductions
due to fleet turnover cannot be credited towards RFP reductions. Thus, RFP emission
reductions are only one component of overall VOC emission trends.

One approach to RFP tracking is simply to compare overall trends in the VOC inventory with
trends in measured total NMOC and NMOC/NOx ratios. Good agreement between the
emission and ambient trends would provide some degree of verification of RFP reductions.
However, if the agreement is poor, this approach cannot distinguish between discrepancies
caused by failure to meet RFP targets and those caused by other factors.

The RFP requirement for a 15 percent reduction between 1990 and 1996 may be met through
gradual reductions over a period of years, or by an abrupt set of reductions occurring within a
single year or two. Since the earliest PAMS sites began operation in 1993 and others will be
phased in over a period of several years, the phase-in schedule of the RFP controls for each
urban area must be considered when interpreting ambient trends. In many areas, the bulk of
the RFP reductions are likely to occur during 1995 and 1996. In these cases, the RFP
reductions may represent a significant portion of the overall VOC emission trends over the
years 1994-96 and may be relatively easy to detect. In order to detect these trends, PAMS
type-2 sites would need to be operational by the summer of 1994.

Another approach to RFP tracking is to use information on the source categories targeted for
reduction to identify characteristic components associated with those sources. Since the
strategies used to meet RFP targets may vary from city to city, this approach may be more
successful in some areas than in others. For instance, if reductions in surface coating
emissions are part of the RFP strategy, toluene may be useful as an indicator. However, if
motor vehicle control programs such as enhanced I/M are a key component of RFP strategies,
it may be difficult to distinguish effects of RFP programs from the effects of fleet turnover.

The fact that PAMS data will be obtained throughout the day will be helpful in tracking
source category-specific emission trends. Expected trends in motor vehicle emissions can be
verified using 6:00 - 9:00 a.m. PAMS data. Expected trends associated with surface coating
or reduction in evaporative emissions can be assessed using the midday PAMS samples.

Changes in Ambient VOCs Due to Introduction of RFG

Reformulated gasoline usage is mandated by the CAAA for nine urban areas with the most
severe ozone air quality problems. Other areas may "opt in" to the Federal RFG program as
part of their attainment strategies for ozone and CO. In addition, the State of California has
developed its own, more stringent, RFG program. Phase I of the Federal program begins in
1995, when a 15 percent reduction in VOC and toxic emissions is required. The California
program begins in 1996, and is expected to produce a 30 percent or greater reduction in VOC
emissions. Both RFG programs will result in significant changes in the composition of the
exhaust and evaporative emissions from gasoline-powered vehicles. Because the RFG
programs are ambitious and expensive, and because the contribution of motor vehicle
emissions to overall VOC emissions is large but uncertain, it is of interest to assess the effects
of RFG on ambient VOC composition.

Trends in VOC composition can be determined as absolute trends in individual species
concentrations, as trends in the fraction of total NMOC, and as trends in ratios of species.
Trends in absolute concentrations are of interest, but can be obscured by year-to-year
meteorological variations. Normalizing individual concentrations as a percentage of NMOC
reduces the importance of meteorology as a confounding variable, but may also reduce the
signal associated with RFG because overall reductions in NMOC are lost in the normalization.
For instance, if RFG results in a 25 percent decrease in the emissions of a given VOC species
and overall NMOC is reduced 15 percent, the reduction in the percent of NMOC associated
with that species will be only 12 percent.
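
The arithmetic behind this example can be checked with a short calculation. The sketch below
assumes, as the example does, that ambient concentrations change in proportion to emissions; the
function name is hypothetical.

```python
def change_in_nmoc_fraction(species_reduction, nmoc_reduction):
    """Fractional reduction in a species' share of total NMOC.

    Both arguments are fractional reductions (0.25 means a 25 percent decrease).
    """
    new_share_relative_to_old = (1.0 - species_reduction) / (1.0 - nmoc_reduction)
    return 1.0 - new_share_relative_to_old


# Example from the text: 25 percent species reduction, 15 percent NMOC reduction.
reduction = change_in_nmoc_fraction(0.25, 0.15)
print(f"{100 * reduction:.1f} percent")  # prints "11.8 percent"
```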

Indicator species for RFG should include those for which the largest changes are expected,
those for which motor vehicle emissions are a dominant source, and those which have the
greatest impact on ozone formation. Data on the composition of exhaust emissions for
vehicles operated on RFG are readily available (EPA, 1993). Preliminary VOC speciation
profiles for exhaust and evaporative emissions for Federal Phase I RFG are included in the
1993 update of the SPECIATE database. However, EPA is currently in the process of
updating these profiles, and continued updates are likely in the future as actual RFG
composition becomes more certain.

One important indicator for RFG is MTBE, an oxygenated additive which will be a
component of nearly all RFGs. MTBE will be an essentially unique tracer for gasoline-related
emissions. However, MTBE is not one of the target analytes for the PAMS program, and
measurements may not be available. Emissions of two of the PAMS target analytes, i-butene
and formaldehyde, are expected to increase after introduction of RFG (Hoeckman, 1992).
Emissions of many other target analytes are expected to decrease. All RFGs are required to
have reduced volatility, leading to lower butane content. All will have reduced benzene and
aromatic content. Beyond these fairly universal changes in fuel composition, some flexibility
is allowed in the RFG composition. RFG composition, therefore, may differ from one
nonattainment area to the next and may change from year to year as refiners determine the
most cost-effective ways to reach the mandated VOC and toxic emission reductions.

Benzene is one important RFG indicator that will be measured at all PAMS sites. Benzene
emissions from motor vehicles will be reduced by an estimated 30-40 percent after the
introduction of RFG (EPA, 1992a). Since mobile sources (including nonroad mobile sources)
account for a major portion of overall benzene emissions, this decrease should be detectable in
ambient benzene data as a relatively abrupt change occurring between the summers of 1994
and 1995. Since benzene is of concern as an air toxic, observation of a decreasing trend in
ambient benzene concentrations will provide an important verification of the toxics reduction
provision of the Federal RFG program.

Benzene is not an important ozone precursor. An air toxic that is also an important ozone
precursor is formaldehyde. Ambient formaldehyde is a mixture of primary emissions and
secondary formaldehyde produced in the atmospheric oxidation of other hydrocarbons. In
summer, secondary formaldehyde is the dominant component. Although primary emissions of
formaldehyde increase with RFG, photochemical modeling studies suggest that introduction of
RFG will result in a decrease in secondary formaldehyde production (Ligocki et al., 1992).

The availability of diurnally-resolved ambient formaldehyde measurements in the PAMS
program should allow the verification of both of these trends. The morning samples can be
used to verify the increase in formaldehyde primary emissions, whereas the afternoon samples
can be used to assess the change in secondary production.

Toxic Hotspots

The CAAA identify 189 air toxics, most of which are VOCs. A subset of these will be
measured in the PAMS program, including benzene, toluene, xylenes, ethylbenzene, styrene,
hexane, formaldehyde, and acetaldehyde. Typical ambient concentrations of these species in
urban areas are available from several EPA-sponsored studies such as the Urban Air Toxics
Monitoring Program (e.g., McAllister et al., 1991) which measures 24-hour average
concentrations and the 3-hour air toxics monitoring program (e.g., O'Hara et al., 1992) which
measures 6:00 - 9:00 a.m. toxics concentrations. Concentrations of these toxics measured in
the PAMS network can be compared to typical urban values obtained from these studies. The
presence of any of these species at concentrations much higher than the typical urban values
will signify the presence of major sources in the vicinity of that site. In many cases, the cause
for the high concentrations may be well known already, such as a nearby major industrial
source. The PAMS data may identify toxic hotspots that were previously unknown, and may
also identify specific time periods where major releases from nearby sources occurred.

The PAMS data will also provide diurnal profiles of ambient toxic concentrations, which are
not available from any of the existing toxics databases. These diurnal profiles will aid in
identification of major sources of toxics, and in toxic exposure assessments.

AMBIENT TRENDS4

One of the most important uses of PAMS data will be for the identification and evaluation of
trends in ambient concentrations of ozone, ozone precursors, and air toxics. Trends analyses
typically consist of several steps: selection of the air quality indicators (summary statistics) to
be studied, possible adjustment of the indicators to remove unwanted meteorological
influences, application of statistical procedures for detecting any upward or downward trend,
and (if a trend is found), evaluation of the trend for direction, rate of change, etc. These steps
are discussed in the following subsections.

Selection of Summary Statistics

As noted below, most methods for detecting and evaluating trends in ambient concentrations
are applicable to annual summary statistics (e.g., the number of days exceeding the ozone
NAAQS, the mean 6:00 - 9:00 a.m. total NMHC concentration over the monitoring season,
etc.). A wide variety of such statistics may be constructed, each measuring a different aspect
of the ambient concentration distribution. A number of such statistics are reviewed by Hunt
(1991).

4 Some of the material in this section is adapted from Cohen and Stoeckenius (1992a).

In selecting an indicator for analysis, several criteria should be considered, including:

Relationship of the indicator to exposure events of concern and relevance to
regulatory considerations (i.e., how does the indicator relate to the NAAQS?)

Stability in the absence of changes in emissions (sometimes referred to as "native
variability"). Some indicators will be more sensitive to fluctuations in meteorological
conditions than others, making detection of underlying trends associated with changes
in emissions difficult.

Sensitivity to changes in emissions. Some indicators respond more readily to a given
change in emissions than others, making it easier to detect the air quality impact of the
emission changes.

In most cases, a balance must be struck between the need for stable indicators which minimize
native variability, and the need for indicators that are sensitive to the expected emission
changes. Larsen et al. (1990) provide comparative estimates of sensitivity and stability for a
variety of ozone air quality indicators. In general, indicators of peak concentration events
(e.g., maximum values, 99th percentiles) exhibit more variability than indicators of average
concentrations (e.g., medians, means). On the other hand, peak indicators are generally more
sensitive to changes in emission levels. Larsen found that the average of the highest 30
maximum daily ozone concentrations exhibited a favorable ratio of native variability to
sensitivity. Potential users of this statistic should note, however, that it is subject to biases
due to missing data. If a significant number of values are missing, an alternative indicator
such as the average of the top 10% of values should be considered.
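
The missing-data bias can be illustrated with a small constructed example. The sketch below is
not taken from Larsen et al. (1990); the season of daily maxima is hypothetical and evenly spread,
and the function names are assumptions made for illustration.

```python
def top_n_mean(values, n=30):
    """Mean of the n highest values (all values if fewer than n are available)."""
    top = sorted(values, reverse=True)[:n]
    return sum(top) / len(top)


def top_percent_mean(values, percent=10):
    """Mean of the highest `percent` percent of the available values."""
    k = max(1, len(values) * percent // 100)
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


# Hypothetical season of 150 daily maximum ozone values (ppb), evenly spread.
full_season = [40.0 + 0.5 * day for day in range(150)]
# The same season with every other day missing.
half_missing = full_season[::2]

bias_top30 = top_n_mean(half_missing) - top_n_mean(full_season)
bias_top10pct = top_percent_mean(half_missing) - top_percent_mean(full_season)
```

In this constructed case the top-30 average falls by 7.75 ppb when half of the days are missing,
because the 30 highest remaining values reach further down the distribution, while the average of
the top 10 percent is unchanged.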

As in any data analysis effort, it is important to examine the extent to which results and
conclusions are dependent on the particular choice of indicator. Therefore, the indicator
selection process should be performed with the goal of identifying a family of indicators rather
than any single indicator; each selected indicator will provide a different perspective on
changes in air quality. Usually, it is desirable to examine indicators of both peak and
"average" concentrations. Simple exposure indicators, such as the number of ppm-hours
above a selected threshold concentration may also be of interest from a public health
perspective, although Larsen et al. (1990) found that ozone indicators of this type were
relatively sensitive to changes in meteorological conditions but insensitive to changes in
emissions.

Adjustments for Variations in Meteorological Conditions

As noted above, a problem frequently encountered in the analysis of air quality trends is the
confounding influence of meteorological conditions which can obscure any underlying trend
indicative of long-term changes in air quality associated with changes in emissions. Much
work has gone into the development of methods for adjusting indicators to remove the
unwanted meteorological influences. To date, this work has focussed almost exclusively on
ozone trends, although the general methodologies used can in principle be applied to any
species. Stoeckenius (1993a) presents a comprehensive review of these methods and provides
references which may be consulted for details.

For most meteorological adjustment methods, successful implementation depends on proper
identification of the key meteorological factors responsible for fluctuations in the pollutant
species of interest. Studies reviewed by Stoeckenius (1993a) include information on
meteorological factors that have been found to influence ozone concentrations. Information
on the meteorological factors influencing ozone precursor concentrations is much more
limited due to the lack of reliable precursor monitoring data heretofore available. Stoeckenius
et al. (1993) provide a brief analysis of meteorological influences on 6:00 - 9:00 a.m. total
NMHC concentrations. The only consistent, significant relationship identified in this study
was with wind speed: the highest concentrations are associated with the lowest wind speeds
which are indicative of stagnant, limited dispersion conditions. Wind direction effects were
found to be significant at some sites, presumably those influenced by inhomogeneous spatial
source distributions. No consistent significant temperature effects were noted in this study,
despite the fact that evaporative emissions are likely to increase with increasing temperature.
However, Cohen and Stoeckenius (1992a) demonstrated that the relative contribution of
isoprene, which is primarily of biogenic origin, to total NMHC is sensitive to variations in
temperature. Additional studies of the effect of temperature on VOC species measured at
various times of the day are needed. Wind speed effects similar to those identified for total
NMHC by Stoeckenius et al. (1993) are likely to be found for any primary precursor species
(VOCs, NOx). Meteorological adjustment of precursor trends found in PAMS data will
require additional analysis of the meteorological influences on precursor concentrations for the
specific PAMS sites used in the analysis.

Trend Detection and Evaluation Procedures

Types of Procedures

Statistical procedures appropriate to the detection and evaluation of trends in ambient air
quality data collected at PAMS networks can be classified into two broad categories:

procedures for annual summary statistics such as means or extreme values, and

procedures based on the time series of the raw measurements.

The first category of statistical analyses comprises procedures that could be applied to annual
summary statistics at multiple sites. Appropriate methods include fitting a simple linear
regression model (straight line trend), analysis of variance methods, and some non-parametric
methods. These methods are described in the Appendix. The biggest difficulty in applying
some of these methods may be that in the first few years of the PAMS program there will only
be a few annual summary statistics for each PAMS site making it unlikely that a trend could
be detected at a specific site or even within a small region (unless the trend is relatively large).
An exception to this might be for trends in ozone and NO2 summary statistics where the
PAMS data can be combined with other NAMS/SLAMS data.
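
As an illustration of the first category, the following sketch fits a straight-line trend to
annual summary statistics by ordinary least squares and reports the standard error of the slope.
The data are hypothetical, and the function name is an assumption; the methods actually
recommended are those described in the Appendix.

```python
import math


def linear_trend(years, values):
    """Ordinary least squares fit of values = intercept + slope * year.

    Returns (slope, intercept, standard error of the slope).
    """
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope_se


# Hypothetical annual mean 6:00 - 9:00 a.m. NMOC concentrations (ppbC) at one site.
years = [1994, 1995, 1996, 1997, 1998]
nmoc = [410.0, 395.0, 402.0, 380.0, 371.0]

slope, intercept, slope_se = linear_trend(years, nmoc)
# Compare with Student's t on n - 2 = 3 degrees of freedom (critical value not computed here).
t_statistic = slope / slope_se
```

With only a few annual values the degrees of freedom are small, so the critical value of the t
statistic is large and only a relatively steep trend will be declared significant.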

The second category describes more sophisticated procedures applied to the time series of
data collected at each PAMS site. At a cost of increased complexity for the statistical
analysis, time series techniques offer much more powerful trend detection tests (i.e., tests with
greater probabilities of detecting a trend that is actually present). These methods fit explicit
statistical models to the time series and therefore are more likely to distinguish a true trend
from serial dependence, seasonal and other temporal factors, meteorological factors, or other
factors unrelated to the trend. Details are provided in the Appendix under "Time Series
Models."

For NOx and VOC species, trends of particular interest are trends in the annual mean of the
daily means or of the means for particular 3-hour periods (e.g., 6:00 - 9:00 a.m.). For these
analyses, most of the proposed methods in either category described above can be applied.
Since sampling frequencies may vary by pollutant and PAMS monitor (sampling is required every
day, every third day, or every sixth day, and the number of samples per day can be
either one 24-hour, eight 3-hour, or twenty-four 1-hour samples), the calculations from the
raw data will differ between these compounds. However, all of the proposed methods use a
single daily summary statistic (such as the early morning or daily mean) and so essentially the
same methods can be applied to any of these compounds.

As noted above, trends in daily and annual extreme concentrations are of primary interest for
ozone and this calls for a slightly different approach. To track trends in the average daily
maximum concentration (averaged across days, sites, or both), most of the methods in the first
and second categories can be used assuming that a log transformation applied to the daily
maxima will result in data that are approximately normally distributed (see Cohen and
Stoeckenius, 1992a). To track trends in annual extreme values (e.g., the annual maximum
hourly ozone concentration, or the kth highest daily maximum hourly concentration), most of
the procedures in the first and second categories are not appropriate because these annual
summary statistics have a distribution that is known to deviate significantly from normality and
from log-normality. A more appropriate set of distributions is based on the two or three
parameter extreme value distributions, which arise as theoretical limiting distributions for the
highest values of a long sequence of approximately independent measurements (Roberts,
1979a,b; Galambos, 1978).

Three different types of approach are recommended for the analyses of trends in ozone
summary statistics based on extremes of the daily maximum hourly ozone concentrations.

First, most of the non-parametric methods described under the first category of procedures
(for annual summary statistics) are directly applicable, although the statistical power of these
methods will usually be unacceptably low. Second, some fairly complex approaches based on
fitting extreme value distributions to the daily or annual maximum concentrations can be
applied. Third, the analyses can be somewhat simplified by considering trends in the number
of exceedances per year, rather than trends in the concentration levels; an exceedance day is a
day for which the daily peak hourly concentration exceeds the ozone NAAQS (or, more
generally, some preset high threshold.) Both the second and third types of analyses are
discussed under "Procedures Based on Extreme Value Theory or Exceedances" in the
Appendix.
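
The annual exceedance count used in the third approach can be computed as sketched below. The
data and the function name are hypothetical. The sketch counts a day only when its peak is
strictly above the threshold, and it does not reproduce regulatory conventions for rounding or
for adjusting the count for missing days.

```python
def exceedance_days(daily_peaks_by_year, threshold=0.12):
    """Count, for each year, the days whose peak hourly value exceeds the threshold."""
    return {
        year: sum(1 for peak in peaks if peak > threshold)
        for year, peaks in daily_peaks_by_year.items()
    }


# Hypothetical daily peak hourly ozone concentrations (ppm); a real season has many more days.
daily_peaks = {
    1994: [0.09, 0.13, 0.15, 0.11, 0.14],
    1995: [0.10, 0.12, 0.13, 0.08, 0.11],
}
counts = exceedance_days(daily_peaks)
```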

Types of Trends

In this discussion we regard a trend as a long term change (usually a decrease) in the mean
ambient concentration over several years. Thus a trend is quite different from day to day or
hour to hour variation (which can be thought of as random variation about the overall trend),
and also quite different from seasonal (or monthly) factors that represent the variability of the
mean concentration during the annual PAMS monitoring season.

The statistical procedure selected depends upon what types of trend are considered. The
simplest type, often assumed by air quality managers, is a linear trend, which means that the
annual summary statistic increases or decreases on average by the same amount every year. In
this case the annual summary statistic is a linear function of the calendar year. In some cases
higher order polynomial trend functions for the annual mean can be used, such as a quadratic
or cubic function. These results are often displayed by simply plotting the observed annual
summary statistic against the year and superimposing the estimated trend function (a straight
line for linear trends).

The trend analysis will also depend on the transformation used. For example, if the annual
summary statistic decreases by the same absolute amount (e.g., 1 pphm daily peak ozone) per
year, then it is appropriate to analyze the raw data. If the annual summary statistic decreases
by the same percentage amount every year, then it may be more convenient to analyze the
logarithm of the ambient concentrations, since a constant percentage decrease in the
concentration implies a constant absolute decrease in the logarithm of the concentration.
However, unless the expected percentage trend is quite large it usually makes very little
difference which approach is taken because the decrease will still be almost constant from year
to year over a small number of years.
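
The relationship between percentage and absolute decreases can be seen in a short numerical
sketch. The series is hypothetical: an annual summary statistic that starts at 100 and falls by
3 percent each year.

```python
import math

# Hypothetical annual summary statistic that falls 3 percent per year from 100.
values = [100.0 * 0.97 ** year for year in range(6)]

# Year-to-year changes on the raw scale and on the logarithmic scale.
raw_changes = [b - a for a, b in zip(values, values[1:])]
log_changes = [math.log(b) - math.log(a) for a, b in zip(values, values[1:])]
```

The changes in the logarithm are all equal to ln(0.97), about -0.0305, while the raw changes
shrink only from 3.00 to about 2.66 over five years, which is why the choice of scale matters
little for small percentage trends over short periods.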

A plot of the annual summary statistics may suggest that the year to year variation is not a
simple polynomial or other simple function of the calendar year. If there is no obvious pattern
then a fairly general approach would be to simply compare the summary statistics for two
years of interest and ignore the intervening years where the values may be higher or lower.

A very common situation that is expected to be encountered with the PAMS data is that there
will be gradual long term changes in the mean in addition to some abrupt step changes. For
example, the introduction of certain types of emission control measures may cause a large
decrease in the mean ambient NMOC concentration between the years before and after the
control measure is implemented. If the annual mean has been decreasing at a fairly steady rate
due to other more gradual effects of control measures then it is likely that the annual mean
will continue to decrease at a similar rate for the second and later years after the control
measure was implemented. Air quality managers may wish to regard such a trend as the
combined effect of the linear trend and the step change. In practice the inherent variability in
ambient concentration data and the variability of the effects of a control measure at different
sites means that unless the emissions control measure has a very large and abrupt effect, it
may be quite difficult to distinguish the step changes from the year to year variability and from
more gradual trends.
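
One way to represent a combined linear trend and step change is a regression with a year term
and a 0/1 indicator for years at or after the control measure. The sketch below is an assumed
model form for illustration, not a procedure prescribed in this report; the data are synthetic
and noise-free, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trend_with_step(years, values, step_year):
    """Least squares fit of intercept + slope * (year - first year) + step * indicator."""
    rows = [[1.0, float(y - years[0]), 1.0 if y >= step_year else 0.0] for y in years]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    return solve(xtx, xty)  # [intercept, slope per year, step change]


# Synthetic annual means: a decline of 10 per year plus a drop of 60 starting in 1996.
years = list(range(1993, 2001))
values = [500.0 - 10.0 * (y - 1993) - (60.0 if y >= 1996 else 0.0) for y in years]
intercept, slope, step = fit_trend_with_step(years, values, step_year=1996)
```

With real data the year-to-year variability enters the fit, and the step estimate will be
poorly determined unless the step is large relative to that variability, as noted above.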

The trend analysis methods discussed above ignore the possibility of year-to-year serial
dependence; the mean values (or other annual summary statistics) for two consecutive years
are assumed to be independent. A general problem is that it is very difficult to distinguish
between a trend (not necessarily linear) and this type of serial dependence since both
phenomena can lead to very similar concentration series. However, for analyses based on data
for the peak ozone season this assumption of statistical independence between data from
different years is quite reasonable since there will be many months separating the
measurements from one year to the next.

EXPOSURE ASSESSMENT

Human exposure to air pollution may be characterized as contact by a person with air
containing a specific pollutant concentration for a specific period of time. There are a number
of exposure assessment methods currently in use including personal monitoring and analysis of
biomarkers. Another common method of exposure assessment is the indirect estimation of
exposure by combining concentration data (either monitored at fixed sites or estimated with
models) with time-activity data (Hunt et al., 1984).

This section will discuss the use of fixed site monitored air quality data, such as will be
available from the PAMS network, for exposure assessment. Because air concentrations of
pollutants vary from one location to another and from one time period to another, geographic
coverage and averaging time are important considerations in the analysis of monitoring data
for inhalation exposure assessment.

Spatial Variability

When ambient exposure concentrations are based on monitored data from fixed sites, one
common approach to exposure assessment involves dividing the geographic area into a set of
exhaustive, non-overlapping exposure districts, one corresponding to each ambient monitor.
The ambient concentration is assumed to be uniform throughout the exposure district. The
borders of the exposure districts may be defined simply from monitor proximity or may
include considerations of local terrain features and air flow patterns. If this approach is taken
it is important for monitors to be sited at locations with concentrations representative of broad
areas and the number of monitors should give sufficient geographical resolution to distinguish
areas that are likely to have varying ambient concentrations. Alternatively, fixed site
monitored data may be used as reference points for ambient concentration interpolation. With
this approach each geographical population grouping, such as a U.S. Census block or block
group, may be defined as a separate exposure district. Ambient concentrations are then
interpolated to the geographical centroid of each district. If this approach is taken, monitors
should be located to sample maximum and minimum concentrations with additional monitors
at intermediate sites.
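
The two approaches can be contrasted in a brief sketch. The interpolation method is not specified
in this report; inverse-distance weighting is used here purely as an assumed example, and the
monitor locations, concentrations, and function names are hypothetical.

```python
import math


def nearest_monitor(monitors, point):
    """First approach: assign the point to the district of the closest monitor."""
    return min(monitors, key=lambda m: math.hypot(m[0] - point[0], m[1] - point[1]))


def idw_concentration(monitors, point, power=2.0):
    """Second approach: inverse-distance-weighted interpolation to the point."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, conc in monitors:
        distance = math.hypot(x - point[0], y - point[1])
        if distance == 0.0:
            return conc  # the point coincides with a monitor
        weight = distance ** -power
        weighted_sum += weight * conc
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical monitors as (x km, y km, ozone ppm) and one district centroid.
monitors = [(0.0, 0.0, 0.08), (10.0, 0.0, 0.12), (0.0, 10.0, 0.10)]
centroid = (2.0, 1.0)

district_value = nearest_monitor(monitors, centroid)[2]
interpolated_value = idw_concentration(monitors, centroid)
```

The interpolated value always lies between the lowest and highest monitored concentrations,
which is why the second approach benefits from monitors sited at the expected maximum and
minimum.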

Minimum PAMS network design criteria specify at most five monitoring sites for the most
heavily populated urban areas with carbonyl concentration measurements required at two of
the sites. Some, but not necessarily all, of these sites will supplement the existing ozone and
NO2 monitoring sites and thus provide additional spatial coverage for these criteria pollutants.
Ozone, since it is a secondary pollutant, exhibits relatively smooth horizontal concentration
gradients and the enhanced network may be sufficient to provide a reasonable estimate of total
exposure to ozone within the nonattainment area. Concentrations of many air toxics,
however, are likely to have sharp spatial gradients. Since the only routine measurements of
toxics are likely to be those obtained from the limited number of PAMS sites, PAMS data can
only be used to provide exposure estimates for a limited area around each monitoring site.

Averaging Time

The choice of averaging time should reflect the duration of exposure that is expected to lead
to health effects. In addition, if the exposure assessment includes consideration of people's
movements among microenvironments between which pollutant concentrations may vary (see
discussion of exposure models below), the averaging time should reflect the time scale of
these movements to avoid bias. For example, if the ambient concentration of a pollutant is
highest at night, when few people are outdoors, a 24-hour averaging time for ambient
concentrations would overstate the outdoor exposure concentration for most of the
population.
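
The size of this bias can be seen in a simple numerical sketch. The diurnal profile and the
hours during which people are assumed to be outdoors are hypothetical.

```python
# Hypothetical diurnal profile (arbitrary concentration units); the index is the hour of day.
# Concentrations are high at night and low from 07:00 to 19:00.
hourly = [30.0] * 7 + [10.0] * 12 + [30.0] * 5

mean_24_hour = sum(hourly) / len(hourly)

# Assume most people are outdoors only between 07:00 and 19:00.
outdoor_hours = range(7, 19)
mean_outdoor = sum(hourly[h] for h in outdoor_hours) / len(outdoor_hours)
```

Here the 24-hour average (20 units) is twice the concentration actually encountered outdoors
during the day (10 units).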

Formulation of Exposure Indices

Population exposure to air pollutants may be characterized in many different ways. The most
general way to characterize population exposure is through a frequency distribution of people
exposed or person-hours of exposure to concentrations exceeding a series of threshold values.
This information makes possible the construction of a number of summary measures, or
exposure indices. Exposure indices can be tracked over time to determine trends in exposure
(Hunt et al., 1985).

In some cases there is a threshold concentration and averaging time of interest, either due to
health impact considerations or regulatory considerations. For ozone the operative
specifications are often the NAAQS reference concentration of 0.12 ppm and 1-hour
averaging time. In that case population exposure may be indicated by such exposure indices
as:

The number of people exposed one or more hours per ozone season to concentrations
exceeding 0.12 ppm; or

The number of person-hours of exposure per ozone season to concentrations
exceeding 0.12 ppm; or

The maximum potential exposure to ozone.

Variations on these measures may include other averaging times of interest, or particular
conditions of exposure, such as during heavy exercise, defined according to ventilation rate.
Other variations are designed to account for the extent to which the exposure concentration
exceeds the threshold of interest, by weighting each person-hour according to the exposure
concentration. The weighting factor might be the exposure concentration itself or may be
related to the expected health impact of the exposure concentration, which may be non-linear
(i.e., from an exposure-response relationship). Indices constructed with such weighting
factors are more closely related to the prevalence of potential health impacts.
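
The first two indices and a concentration-weighted variant can be computed as sketched below.
This is a minimal illustration that assumes every resident of an exposure district experiences
the district's ambient concentration in every hour (no time-activity information); the district
populations, concentrations, and function name are hypothetical.

```python
def exposure_indices(districts, threshold=0.12):
    """Return (people exposed, person-hours, concentration-weighted person-hours).

    districts: list of (population, hourly concentrations in ppm) pairs.
    """
    people_exposed = 0
    person_hours = 0
    weighted_person_hours = 0.0
    for population, hourly in districts:
        over = [conc for conc in hourly if conc > threshold]
        if over:
            people_exposed += population  # exposed for one or more hours
        person_hours += population * len(over)
        # Weight each person-hour by the exposure concentration itself.
        weighted_person_hours += population * sum(over)
    return people_exposed, person_hours, weighted_person_hours


# Hypothetical districts; a real ozone season would contain many more hours.
districts = [
    (50000, [0.10, 0.13, 0.14, 0.11]),
    (20000, [0.09, 0.10, 0.11, 0.12]),
]
people, hours, weighted = exposure_indices(districts)
```

A non-linear exposure-response weighting could be substituted for the concentration itself in
the weighted sum.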

For pollutants whose health risk is assumed to be related to total lifetime exposure without a
threshold (e.g., carcinogens), the averaging time of interest will be the annual average
(assuming population activity is not considered in the exposure assessment). For carcinogenic
health risk assessment, these lifetime average exposure concentrations throughout the
population are combined with unit risk factors, defined in units of [µg/m3]^-1, to estimate the
expected number of excess lifetime cancer cases. Similarly, the maxima of such
concentrations are combined with unit risk factors to estimate the health risk to the maximally
exposed individual (Hunt et al., 1984).
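
The risk calculation described above reduces to a sum of products, as sketched below. The unit
risk factor, populations, and concentrations are hypothetical values chosen only to show the
arithmetic, and the function names are assumptions.

```python
def excess_cancer_cases(districts, unit_risk):
    """Expected excess lifetime cancer cases summed over exposure districts.

    districts: list of (population, lifetime average concentration in ug/m3) pairs.
    unit_risk: lifetime risk per ug/m3 of lifetime average exposure.
    """
    return sum(population * conc * unit_risk for population, conc in districts)


def max_individual_risk(districts, unit_risk):
    """Lifetime risk to the maximally exposed individual."""
    return max(conc for _, conc in districts) * unit_risk


# Hypothetical inputs: two districts and an assumed unit risk factor of 1e-5 per ug/m3.
districts = [(100000, 2.0), (250000, 5.0)]
unit_risk = 1.0e-5

cases = excess_cancer_cases(districts, unit_risk)
mei_risk = max_individual_risk(districts, unit_risk)
```

For these inputs the expected number of excess lifetime cases is 14.5 and the risk to the
maximally exposed individual is 5 in 100,000.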

Appendix

STATISTICAL TREND DETECTION AND ANALYSIS METHODS5

TRENDS IN ANNUAL SUMMARY STATISTICS—PARAMETRIC METHODS

This section discusses some proposed classical methods for evaluating trends in annual
summary statistics. Many analyses require that either the consecutive concentrations or the
corresponding annual summary statistics are approximately normally distributed and
independent. Various tests of these assumptions are applied to the estimated residuals: the
observed concentration or annual summary statistic minus the estimated mean (predicted by
the statistical model). Since means are derived from data, residuals are not exactly
independent even if the model exactly describes the underlying distributions, but the proposed
tests should work reasonably well. Since these tests use general linear models and assume
normality of the concentrations or annual summary statistics, it follows that residuals will be
normally distributed if the statistical model is correct.

Testing Independence

We describe two statistical methods for testing independence of residuals. The first, a
nonparametric test, requires no further major assumptions about the data (in particular, it is
applicable even if residuals are not normally distributed). The second requires that residuals
be normally distributed. In general, the normality assumption is less important than the
independence assumption as most of the proposed procedures give reasonable estimates of the
trend even if the normality assumption is violated provided (a) the true distribution is not
excessively skewed and (b) the independence assumption holds. The trend estimates will be
quite poor if the independence assumption is too severely violated.

Before applying these procedures, the first step is to examine a time series plot of the
residuals; the presence of clear patterns in the plot indicates significant departures from
independence. For example, the residuals are not independent if the series tends to alternate
between clusters of large positive residuals and large negative residuals. Unfortunately, this
can also occur if the regression model (the assumed relation between the mean concentration
or annual summary statistic and the regression terms such as site and year) is incorrect. It is
not always possible to distinguish which features defining the statistical model are
representative of real world conditions and which are not.

⁵ Some of the material in this appendix has been adapted from Cohen and Stoeckenius (1992).

-------
Runs Test of Independence

The runs test of independence is a nonparametric method used to test the assumption of
independence for data from a single site. It is not applicable when evaluating combined data
from more than one site as the test statistic measures temporal rather than spatial dependence.
Its advantage over the autocorrelation test described next is that specific parametric
assumptions are not needed when applying the test.

Consecutive residuals are given plus or minus signs according to whether or not they exceed
the median. The runs test is then based on the number of changes of sign in the series. A very
small number of sign changes indicates a strong positive dependence between consecutive
residuals; that is, both values tend to be high or low. A very large number of sign changes
indicates a strong negative dependence; that is, high values tend to be followed by low values
and vice versa. Thus if the number of sign changes is extreme in either direction, there is
strong statistical evidence of dependence.

Standard texts give the theoretical mean M and standard deviation S of the number of runs
calculated under the null hypothesis that consecutive observations are independent. It can
also be shown that for large N the distribution of the number of runs is approximately normal.
Thus the p-value of a given number of runs is approximately given by the tail area of a
standard normal distribution that exceeds the standardized test statistic, |observed runs -
M|/S. Equivalently, the null hypothesis of independence is rejected at the 5 percent level if
this standardized test statistic exceeds z_0.025 (the value exceeded with a probability of 0.025 by
a standard normal distribution). Such a case is "statistical evidence of dependence." (More
exact probability calculations are available for small samples but are harder to implement.)
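
The runs test described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name `runs_test` is hypothetical, residuals exactly equal to the median are dropped (an assumption, since the text does not say how to treat them), and M and S are taken from the standard large-sample formulas for the number of runs given the counts of plus and minus signs.

```python
import math
from statistics import NormalDist, median


def runs_test(residuals):
    """Runs test about the median; returns (runs, z, two_sided_p)."""
    med = median(residuals)
    # Assumption: residuals exactly equal to the median are dropped.
    signs = [r > med for r in residuals if r != med]
    n_plus = sum(signs)
    n_minus = len(signs) - n_plus
    n = n_plus + n_minus
    if n_plus == 0 or n_minus == 0 or n < 3:
        raise ValueError("need residuals on both sides of the median")
    # Number of runs = number of sign changes + 1.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    two_np = 2.0 * n_plus * n_minus
    mean_runs = two_np / n + 1.0                            # M under independence
    var_runs = two_np * (two_np - n) / (n * n * (n - 1.0))  # S squared
    z = (runs - mean_runs) / math.sqrt(var_runs)
    # Two-sided p-value from the normal approximation (large N only).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return runs, z, p
```

A strongly negative z (too few runs) points to positive dependence between consecutive residuals; a strongly positive z (too many runs) points to negative dependence.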

Autocorrelation Test of Independence

As an alternative to the runs test of independence, we also propose applying the autocorrelation
test to consecutive residuals from the same site. This test is based on values of the correlation
coefficients calculated from all pairs of data points separated by k time periods, where k, the
lag time, is between 1 and 5. Such correlation coefficients between values of the same variable
at different times are called autocorrelation coefficients. Following standard practice, we
shall use the following definition of ρ(k), the estimated autocorrelation coefficient between
values k time periods apart:

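    \rho(k) = \frac{\sum_{t=1}^{N-k} (X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{N} (X_t - \bar{X})^2}

where X_t is the t-th of the N consecutive observations (here, residuals) and \bar{X} is their
sample mean. Large values of |ρ(k)| are evidence of dependence.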

The statistical test of independence uses the standardized autocorrelation coefficient, defined
as the autocorrelation coefficient ρ(k) divided by an estimate of its standard deviation. We
propose the use of the estimate of standard deviation given by the square root of

    \mathrm{Variance}[\rho(k)] = \frac{N - k}{N(N + 2)}

This formula gives the exact variance of ρ(k) in the case where the X observations are
independent and normally distributed, and where the true mean replaces the sample mean, X̄,
in the formula for ρ(k). The approximate statistical significance of the standardized
autocorrelation coefficient is found from standard normal tables, assuming normally distributed
data.
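
As an illustration, the standardized autocorrelation coefficient can be computed as follows. The sketch assumes the usual sample autocorrelation (lagged cross-products about the sample mean divided by the total sum of squares) and the variance (N - k)/(N(N + 2)); the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def autocorrelation(x, k):
    """Sample autocorrelation of the sequence x at lag k."""
    n = len(x)
    xbar = fmean(x)
    num = sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k))
    den = sum((v - xbar) ** 2 for v in x)
    return num / den


def standardized_autocorrelation(x, k):
    """Return (rho, z, two_sided_p) for the lag-k autocorrelation test."""
    n = len(x)
    rho = autocorrelation(x, k)
    # Standard deviation under independence: sqrt((N - k) / (N (N + 2))).
    se = math.sqrt((n - k) / (n * (n + 2.0)))
    z = rho / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return rho, z, p
```

In practice the function would be called for each lag k from 1 to 5 on the residuals from a single site.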

Testing for Normality

If the above procedures show that the residuals are approximately independent then various
procedures can be used to test for normality. It is extremely important to apply these tests to
the residuals and not the concentrations or annual summary statistics, because any trend in the
mean would invalidate these procedures. These methods assume that all the data (i.e.
residuals) come from a common population and therefore have a constant mean and variance.
If the overall statistical model assumes that a transformation of the raw data (e.g. a
logarithmic transformation) is required to convert them to normality, then it will be necessary
to calculate the residuals from the transformed data.

The simplest approach and an important first step is to prepare a histogram of the residuals
that gives the frequencies of each possible value, grouped into class intervals of equal sizes. If
the residuals are normally distributed then the histogram should approximately follow the
classic bell-shaped normal curve. More quantitative procedures are described below: the
Kolmogorov-Smirnov test of normality and the Shapiro-Wilk test. Note that if the normality
test indicates a departure from normality then it may be that the statistical model is correct
except that the error distribution is not normal, or that the statistical model and normality
assumption are correct but the data are not independent, or that the statistical model
(regression terms) is wrong. If only the normality of the error distribution is wrong, then it
may be desirable to use the proposed non-parametric procedures. If only the independence
assumption is wrong (see previous section) then it will be appropriate to apply the time series
methods discussed below.

Kolmogorov-Smirnov Test of Normality

For each residual v in the data set, the distribution function, F(v), is defined as the fraction of
values less than or equal to v:

    F(v) = (Number of residuals ≤ v)/(Number of residuals)

Correspondingly, using statistical tables, we can also calculate the distribution function G(v)
for an infinite sample from the hypothetical normal distribution (with the same mean and
variance as in the data).

The Kolmogorov-Smirnov test is based on the maximum difference between F(v) and G(v) as
v varies across all possible values. The greater the observed maximum difference, the greater
the evidence against the normality assumption. The critical value of the test statistic
(exceeded with a 5 percent probability if normality holds) can be approximated using standard
asymptotically valid formulae.
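
The test statistic itself is straightforward to compute. The following sketch (function name hypothetical) evaluates the maximum difference between the empirical distribution function F and a normal distribution function G fitted with the sample mean and standard deviation. It deliberately stops short of a critical value: because the mean and variance are estimated from the same residuals, the critical values differ from those for a fully specified normal distribution, so the appropriate table or formula should be checked before drawing conclusions.

```python
from statistics import NormalDist, fmean, stdev


def ks_normal_statistic(residuals):
    """Maximum |F(v) - G(v)| with G fitted to the sample mean and variance."""
    n = len(residuals)
    fitted = NormalDist(fmean(residuals), stdev(residuals))
    d = 0.0
    for i, v in enumerate(sorted(residuals), start=1):
        g = fitted.cdf(v)
        # F jumps from (i - 1)/n to i/n at v, so check both sides of the step.
        d = max(d, i / n - g, g - (i - 1) / n)
    return d
```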

Shapiro-Wilk Test of Normality

Another standard test of normality is the Shapiro-Wilk test, described in many standard
statistical texts and in the SAS software manuals (SAS procedure UNIVARIATE). This test
is based on the ratio of an optimal estimate of the variance, obtained by squaring a linear
combination of the order statistics, to the usual sample variance. For large samples the test is
asymptotically equivalent to plotting the data on a normal probability plot and calculating the
Pearson correlation coefficient.
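
The large-sample equivalent mentioned above, the correlation coefficient of a normal probability plot, can be sketched as follows. This is not the Shapiro-Wilk statistic itself. The plotting positions (i - 0.375)/(n + 0.25) are one common convention and are an assumption here, as are the function names.

```python
import math
from statistics import NormalDist, fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def normal_plot_correlation(residuals):
    """Correlation between ordered residuals and standard normal quantiles."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Assumed plotting positions: (i - 0.375) / (n + 0.25).
    quantiles = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    return pearson(ordered, quantiles)
```

Values close to 1 are consistent with normality; values well below 1 indicate a departure from the straight line expected on a normal probability plot.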

Simple Linear Regression

One of the simplest methods of trend detection and estimation is simple linear regression
(Chock et al., 1982; Kumar and Chock, 1984; Wackter and Bayly, 1987; SCAQMD, 1991;
EPA, 1974). In this method the expected annual summary statistic is assumed to be linear in
the year, so the estimated means plot as a straight line. This statistical model is usually fitted
by least squares, which is defined as finding the straight line such that the squared errors about
that line (squared differences between the observed annual summary statistic and the estimate
from the straight line trend) are minimized. That method is most appropriate when the annual
summary statistics are normally distributed with a constant variance. For the set of annual
summary statistics from a single site rather than averages across multiple sites and/or multiple
days, the assumption of normality may be less tenable than an assumption of log-normality, so
one can apply simple linear regression to the logged concentrations and then transform from
the straight line in log space back to an exponential curve for the mean concentrations (Chock
et al., 1982, Kumar and Chock, 1984).
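
A minimal sketch of the log-space fit follows, assuming ordinary least squares on the logged annual summary statistics. The function names are hypothetical, and the back-transformed curve is exp(intercept + slope × year).

```python
import math
from statistics import fmean


def fit_line(x, y):
    """Least-squares straight line; returns (intercept, slope)."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def fit_log_linear_trend(years, concentrations):
    """Fit a straight line to log concentrations.

    Returns (intercept, slope, percent_change_per_year), where the last
    value is the annual percent change implied by the exponential curve.
    """
    logs = [math.log(c) for c in concentrations]
    intercept, slope = fit_line(years, logs)
    return intercept, slope, 100.0 * (math.exp(slope) - 1.0)
```

A slope of log(0.97) in log space, for example, corresponds to a three percent decrease per year on the original scale.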

General Linear Regression

The simplest approach uses the set of annual summary statistics from a single site and fits a
linear trend using simple linear regression, as described above. To estimate polynomial or
other trend curves at a single site, other terms are used in the linear regression model. For
example, instead of expressing the annual mean as a straight line function of the calendar year,
higher powers of the calendar year can be added to the regression equation and hence a
quadratic or higher order polynomial trend curve can be estimated.

Since the above method only uses the annual summary statistics it is not a very powerful
method for detecting the trend at a single site (i.e., there is a high probability that the slope or
trend curve will not be found to be significantly different from the case of no trend - the same
mean every year).

Use of Raw Concentration Data

If the summary statistic is an annual mean of the consecutive concentrations, and certain
assumptions are made, then much more powerful statistical techniques can be used to estimate
the trend at a single site. The simplest analysis assumes that the consecutive measurements
within an ozone season (or PAMS monitoring season) are approximately independent and
have the same mean and variance; under those assumptions the trend in the annual mean can
be evaluated by fitting the regression model to the raw data instead of the annual means.
Analyses taking into account the possible dependencies between consecutive measurements at
the same site are discussed in the section on time series analyses.

Multiple Sites

Various extensions of the simple and general linear regression approach can be used to
evaluate overall trends at multiple sites. Since emissions control measures will generally have
different effects on ambient measurements in different locations (depending on where the
emissions reductions occur as well as meteorological factors), it would not be expected that
trends in VOC, VOC species, air toxics, and ozone would be exactly the same at every PAMS
site but they might be reasonably similar. If data from several sites can be combined into one
calculation of the estimated trend (assumed consistent across those sites), then the estimate
will be much more accurate than a trend estimate calculated from data at a single site. This
type of analysis will be extremely useful in the early years of PAMS data collection since there
will not be enough data at each site to get very reliable site-specific trend estimates.

The simplest method averages the annual summary statistics across all the sites and estimates
the trends in these spatial averages using linear regression with one value for each year. Since
the method is simple and requires few additional assumptions, we recommend using this
approach to get a quick and useful overall estimate of the trend. The method does not require
assumptions about the variation of the mean or trend from site to site. The method also has
the advantage of being quite robust, in the sense that the mean across a large number of sites
can be expected to follow the normal distribution fairly closely even if the site means do not;
this follows from the central limit theorem.

-------
A slightly more complicated method applies the same regression model to the data set
consisting of the annual statistic for each site and year. Since a site effect is not assumed, this
method effectively assumes that the underlying mean is the same at every site. This
assumption is less tenable than the assumption of a consistent trend across sites and therefore
this regression approach is not recommended except in certain situations. The approach is
reasonable for the analysis of nearby sites known to have similar ambient concentration levels.
The approach also may be useful in the early years of the PAMS program, where there may be
insufficient data to accurately estimate separate means for each site. Usually a better
approach is the two-way analysis of variance method described next.

Two-Way Analysis of Variance

A very reasonable approach to the analysis of data from several sites assumes that the annual
summary statistic for a given site and year varies both with the site and year. For example, if
annual summary statistics are available for every site, two-way analysis of variance can be
used (Pollack and Stocking, 1989; Cohen and Pollack, 1991; Capel et al., 1983; Pollack et al.,
1984; Pollack and Hunt, 1985; EPA, 1984-1991). The two-way analysis of variance method
assumes that the annual summary statistic for a given site and year is the sum of a site effect, a
year effect, and a normally distributed error term. In particular, this approach does not
assume a specific trend curve since the form of the year effect is not specified. The errors are
assumed to be independent, with mean zero and a constant variance. The main advantage of
this method is that it accounts for the dependence between annual composite site averages
caused by site effects. A significant trend determined from the two-way analysis of variance
corresponds to a statistically significant year effect.
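
For a complete table (one annual summary statistic for every site and year), the additive model can be fitted directly from row and column means. The sketch below uses a hypothetical function name, returns the F statistic for the year effect together with its degrees of freedom, and leaves the comparison with tabulated F values to the analyst.

```python
from statistics import fmean


def two_way_anova(table):
    """table[s][y] = annual summary statistic for site s in year y (no gaps).

    Returns (site_effects, year_effects, f_year, df_year, df_error).
    """
    n_sites, n_years = len(table), len(table[0])
    grand = fmean(v for row in table for v in row)
    site_eff = [fmean(row) - grand for row in table]
    year_eff = [fmean(table[s][y] for s in range(n_sites)) - grand
                for y in range(n_years)]
    # Sum of squares explained by the year effect.
    ss_year = n_sites * sum(e * e for e in year_eff)
    # Residual sum of squares after removing site and year effects.
    ss_err = sum((table[s][y] - grand - site_eff[s] - year_eff[y]) ** 2
                 for s in range(n_sites) for y in range(n_years))
    df_year = n_years - 1
    df_err = (n_sites - 1) * (n_years - 1)
    f_year = (ss_year / df_year) / (ss_err / df_err)
    return site_eff, year_eff, f_year, df_year, df_err
```

A large F statistic relative to the F distribution with (df_year, df_error) degrees of freedom indicates a statistically significant year effect.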

The two-way analysis of variance method can also be applied in cases where some annual
summary statistics are missing for some sites (often because the available data are insufficient
to satisfy the validity criteria for the annual summary statistic). In this case the approach is
equivalent to using the general linear model on the available data to estimate the site and year
effects, and then estimating any missing value as the sum of the estimated site and year effects
corresponding to the particular missing site and year (Pollack and Stocking, 1989). One
disadvantage of this method is that data from all sites are used to fill in the missing values,
rather than data from local sites. Cohen and Pollack (1991) provide an extension of the two-
way analysis of variance approach that deals with this problem by allowing the year effects to
depend on the region. Regions are defined by combining nearby sites in such a way that the
mean squared error in the fitted general linear model is minimized.

In the two-way analysis of variance approach, a trend is indicated by statistically significant
differences between the mean annual summary statistics for a pair of years. An often useful
plot is given by graphing the annual means with simultaneous confidence intervals defined
such that two means are significantly different if the corresponding confidence intervals do not
overlap (Pollack and Stocking, 1989; Pollack et al., 1984; Pollack and Hunt, 1985; EPA,
1984-1991). For example, Figure A-1 (from Pollack et al., 1984) illustrates simultaneous
confidence intervals for four years of data. Since the plotted confidence intervals overlap for
years 1 and 2 but not for years 1 and 3, years 1 and 2 are not significantly different, but years
1 and 3 are significantly different.

-------
The Tukey studentized range technique used to derive the simultaneous confidence intervals
in Figure A-1 was described by Pollack et al. (1984) and Pollack and Hunt (1985). Since the
number of possible simultaneous comparisons is k(k-1)/2, where k is the often large number of
years of data analyzed, testing each pair at the usual 5 percent significance level would lead to
a high probability that at least one difference would be declared significant when in fact there
are no real trends (since each comparison would then have a 5 percent probability of being
declared statistically significant). To treat this problem, the confidence intervals were
computed in such a way that if there is no real trend, the probability that every pair of annual
confidence intervals overlaps is 95 percent. Thus the probability of erroneously determining
one or more significant differences is 5 percent. The same Tukey studentized range technique
is applicable in other cases where a general linear model is used to estimate the year effects.
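
The size of the multiple-comparison problem can be illustrated with a short calculation. The sketch below counts the k(k-1)/2 pairwise comparisons and, purely for illustration, computes the chance of at least one false positive if each comparison were tested at the 5 percent level and the comparisons were independent. That independence is an assumption made only for this arithmetic; the actual pairwise comparisons share data and are not independent. The function names are hypothetical.

```python
def pairwise_comparisons(k):
    """Number of distinct pairs among k years."""
    return k * (k - 1) // 2


def naive_familywise_error(k, alpha=0.05):
    """Chance of one or more false positives, ASSUMING independent tests."""
    m = pairwise_comparisons(k)
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for k in (4, 10):
        print(k, pairwise_comparisons(k), round(naive_familywise_error(k), 3))
```

With ten years of data there are 45 pairs, and under the stated assumption the chance of at least one spurious "significant" difference is roughly 90 percent, which is why the simultaneous intervals are needed.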

One important problem with the two-way analysis of variance method is that significant year
effects may be due almost entirely to differences in meteorological conditions for different
years. Thus these year effects may not be due to long term emissions trends. One way of
dealing with this problem is to adjust the annual summary statistics to account for possible
annual meteorological differences before fitting the linear model. Such an adjustment can be
based on a statistical regression analysis simply by adding additional terms representing
meteorological factors to the site and year effects; for example a term giving the annual mean
summer temperature might be very useful for an analysis of ozone data.

The two-way analysis of variance approach can be modified to allow for specific trend
functions by assuming that the year effects are certain functions of the year, but the site effects
are arbitrary. The case of a linear trend but arbitrary site effects is used by Capel and others
(1983). One major advantage of this modified approach is that the year effects will then
probably not be confounded with annual meteorological conditions, since any long term trends
in meteorology are likely to be negligible compared to trends in emissions.

Other General Linear Model Approaches

The above set of analyses are important examples of the general linear model approach to
estimating and testing for trends. These analyses can be extended in various ways that are far
too numerous to be described here. In a similar manner to the time series approaches
described below, the regression models fitted to the annual summary statistics can be
enhanced by the addition of various explanatory variables, such as annual summary statistics
for meteorology or emissions, or terms to represent abrupt step changes in the annual
summary statistics. Other enhanced analyses use multivariate methods that combine the set of
measurements at different sites, or for different pollutants, into a single vector and thus
explicitly model the correlations between the concentrations at different sites or for different
pollutants.

-------
If the consecutive measurements at a given site within the same PAMS measurement season
can be assumed to be approximately independent then more powerful statistical methods can
be used to estimate the trends by fitting a general linear model to these raw concentrations
rather than the annual means. The general linear model can then explicitly take into account
the effects of the variation in the mean during the week (for CO, for example) or from month
to month during the ozone season by adding appropriate terms to the fitted general linear
model. Another possibility is to assume that the mean concentration on the ith day of year y is
equal to a multiple of 365y + i; this formula recognizes the fact that if there is a linear trend in
the annual mean, then it is plausible that the mean at the beginning of the PAMS season will
be different from the mean at the end of the season and will slowly decrease during the
measurement period (after adjusting for the effects of the month and the day of the week).

Trend Detection Probabilities

An important issue for air quality managers is the probability that a trend in the annual
summary statistics can be detected using an appropriate statistical analysis of the PAMS data,
since this gives the managers some measure of the effectiveness of these data. The calculated
probability of detecting a trend depends upon the assumptions that are made about the
statistical distributions of the ambient concentration data and the amount of valid data that are
collected each year. Such a calculation was made using speciation data collected in the
Summer 1990 Atlanta Ozone Study (Cohen and Stoeckenius, 1992) where it was found that a
three-percent annual trend in early morning mean total VOC could be detected with a 60
percent probability assuming 90 days of valid data per year at 4 sites. Similar results were
found for specific VOC species (e.g. benzene and acetylene) and for daily means. Note that
these calculations considered the question of detecting ANY non-zero linear trend in regions
with similar ambient concentration distributions to Atlanta assuming that the true means
decreased by three percent per year. The probability of detecting a specific non-zero trend
(e.g. two percent or greater) will be even lower.

NONPARAMETRIC METHODS

The assumptions of independent normally distributed errors with the same constant variance
used in several of the trend analyses described above may not be applicable. In order to test
for and/or estimate the trend without making such distributional assumptions, various
nonparametric methods can be used. Nonparametric methods proposed and applied in the
literature for trend analysis include the use of Spearman's rho (Kolaz and Swinford, 1988,
1989; Sweitzer and Kolaz, 1984; Lettenmaier, 1976; EPA, 1974), and Kendall's tau (Hirsch et
al., 1982; Hirsch and Slack, 1984; Freas and Sieurin, 1977) for trend detection, and the use of
the Theil/Sen slope estimator for linear trend estimation (Hirsch et al., 1982; Freas and
Sieurin, 1977). Since only very general distributional assumptions are made (concerning the
dependence structure), the results are valid under very general conditions but the methods will
have lower power (trend detection probability) compared to parametric tests in cases where
the parametric model assumptions are reasonable approximations. The nonparametric tests
can have much greater power than parametric tests when the distributional requirements of the
parametric test are violated (Lettenmaier, 1976).

The main advantages of the non-parametric procedures over parametric alternatives are that
the procedures can be used without making too many assumptions about the underlying
concentration distributions and, in many cases, their relative simplicity. One disadvantage of
these approaches is the relatively low power (i.e. a low probability of detecting a trend) in
cases where the assumptions for a corresponding parametric test are reasonable. Another
disadvantage for some of these procedures is that the non-parametric test may only be able to
determine whether a statistically significant trend exists (trend detection) and cannot
determine the size of the trend. Obviously an air quality manager will usually prefer to have
an estimate of the trend. For parametric tests the results can usually be expressed as a
confidence interval for the trend. If the confidence interval does not contain zero then a non-
zero trend has been detected. Several nonparametric methods (such as the Spearman's rho
and chi-square tests described below and in EPA, 1974) are used for trend detection but
cannot be used for trend estimation.

In this description we include some of the more standard non-parametric approaches that have
been used in the past for trend analysis and could easily be adapted to the PAMS data. These
analyses use data from a single site and do not combine results from different sites. A wide
variety of other more general non-parametric tests applicable to data from multiple sites could
be derived. A general approach that often produces reasonable non-parametric analyses from
a parametric analysis is to apply a standard test based on a fitted general linear model but
apply it to the ranks instead of the raw data. Thus any of the general linear model approaches
described above based on annual summary statistics could be converted into non-parametric
procedures by replacing the highest annual summary statistic by 1, the next highest by 2, and
so on. A similar approach could be used for the general linear models applied to the daily
summary statistics.

Spearman's Rho Test of Trend

The Spearman's rho test of trend (Kolaz and Swinford, 1988, 1989; Sweitzer and Kolaz,
1984; Lettenmaier, 1976; EPA, 1974) is based on Spearman's rho statistic, which is the
standard Pearson correlation coefficient between the rank of the annual summary statistics and
the year. The rank is 1 for the highest summary statistic, 2 for the second highest, and so on.
If there is no trend and all observations are independent, then all rank orderings are equally
likely. This fact is used to calculate the statistical significance of the Spearman's rho statistic;
a value significantly different from zero implies a significant trend. If ties in the annual
summary statistics are present, then the significance level has to be adjusted to account for the
number of ties. In Lettenmaier (1976) a comparison of the power (trend detection probability) of
Spearman's rho with the power of simple linear regression shows that the nonparametric test
can be almost as efficient as the simple linear regression t test even when the normality
assumption holds. The linear regression power calculations in Lettenmaier (1976) are based
on formulae that are incorrect for small samples but approximately correct for large samples
(see formula 3b in Lettenmaier, 1976). Thus the reported results in that paper may be
inaccurate for small samples and should be used with caution.
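
The Spearman's rho statistic can be sketched as follows. Ranks follow the convention in the text (1 for the highest summary statistic), tied values receive average ranks (an assumption), and the z value uses the large-sample approximation that rho times the square root of (n - 1) is roughly standard normal under the null hypothesis. For short series an exact or permutation calculation would be preferable. All function names are hypothetical.

```python
import math
from statistics import fmean


def descending_ranks(values):
    """Rank 1 for the highest value; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def spearman_trend(years, values):
    """Return (rho, z): Pearson correlation of descending ranks with year."""
    ranks = descending_ranks(values)
    ybar, rbar = fmean(years), fmean(ranks)
    sxy = sum((y - ybar) * (r - rbar) for y, r in zip(years, ranks))
    sxx = sum((y - ybar) ** 2 for y in years)
    syy = sum((r - rbar) ** 2 for r in ranks)
    rho = sxy / math.sqrt(sxx * syy)
    # Large-sample approximation only; not reliable for very short series.
    return rho, rho * math.sqrt(len(values) - 1)
```

Because rank 1 is assigned to the highest value, a positive rho means that ranks increase with year, that is, the summary statistic is falling over time.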

-------
Kendall's Tau Test of Trend

Kendall's tau is an alternative nonparametric statistic that can be used to test for trend (Hirsch
et al., 1982; Hirsch and Slack, 1984; Freas and Sieurin, 1977). This statistic can be calculated
as the number of possible pairs of years for which the ordering of the years is the same as the
ordering of the annual summary statistics (the lower annual statistic occurs in the earlier year)
less the number of possible pairs of years with the reverse ordering. If there is no trend and all
observations are independent, then all rank orderings of the annual statistics are equally likely;
this result is used to compute the statistical significance of the tau statistic. Adjustments for
tied annual summary statistics are described in the cited articles.

Adjustments of Kendall's tau for seasonality (Hirsch et al., 1982) and serial dependence
(Hirsch and Slack, 1984) have been proposed and investigated in the context of water quality
data analysis. A seasonally adjusted Kendall's tau (Hirsch et al., 1982) allows for different
annual means and trends in different calendar months by adding up the 12 Kendall's tau
statistics from each month. In Hirsch et al. (1982) the null distribution of this statistic (when
there are no trends) is calculated assuming values from different calendar months are
independent. In Hirsch and Slack (1984) the null distribution is calculated assuming values in
different months can be correlated. Both papers include calculations of the power of these tests
for simulated data. The power of the Kendall's tau adjusted for both seasonality and serial
dependence is greater than the power of the simpler seasonally adjusted Kendall's tau if there is
serial dependence, but is less in the independent case.

Kendall's tau test of trend is related to the Theil/Sen non-parametric slope estimator (Freas
and Sieurin, 1977), which gives an estimate of the assumed linear trend. This estimator is the
median of all possible ratios of the change in the annual summary statistic from one year to a
later year divided by the number of years separating the two values. If the trends differ by
calendar month, then the same calculation can be applied to the monthly summary statistics by
only considering ratios for values in the same month, i.e., that differ by an exact multiple of 12
months (Hirsch et al., 1982).
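
The slope estimator for annual summary statistics can be sketched directly from its definition as the median of all pairwise slopes. The function name is hypothetical, and the monthly variant described above is not shown.

```python
from statistics import median


def theil_sen_slope(years, values):
    """Median of (change in value) / (years apart) over all pairs of years."""
    n = len(values)
    slopes = [(values[j] - values[i]) / (years[j] - years[i])
              for i in range(n - 1)
              for j in range(i + 1, n)
              if years[j] != years[i]]
    return median(slopes)
```

Because the estimator is a median, a single anomalous year has little influence on the result, in contrast to a least-squares slope.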

TIME SERIES MODELS

Most of the above procedures require that consecutive concentrations or annual summary
statistics are approximately independent. This assumption can be tested using the procedures
described above under "Testing Independence." Usually this issue is most important when
the raw concentration data are analyzed, since measurements separated by shorter time
intervals are generally more likely to be dependent. Note that we propose to analyze daily
summary statistics calculated from the hourly or three-hourly PAMS data, rather than dealing
with the raw hourly or three-hourly data. This approach avoids technical difficulties
associated with the fact that the PAMS monitoring scheme allows for multiple samples on a
given day but sampling need not be every day (it can be every third or every sixth day).

One simple approach to treating the serial dependence between consecutive measurements is
to decrease the sampling frequency by using only every second, third, fourth, ... measurement.
This simple approach has the disadvantage of throwing away valuable information, but if the
autocorrelation is very high then the amount of additional information in the dropped
measurements will be small. A useful rule of thumb to determine how much data to drop is
found using the autocorrelation coefficients calculated in the autocorrelation test of
independence described above. If the lag k correlation is the first non-significant correlation
then one might use every kth sample value and assume those values are approximately
independent.
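
This rule of thumb can be sketched as follows, assuming the usual sample autocorrelation, the variance (N - k)/(N(N + 2)) under independence, and a two-sided 5 percent criterion (critical value 1.96). The function names and the default maximum lag of 5 are illustrative choices.

```python
import math
from statistics import fmean


def autocorrelation(x, k):
    """Sample autocorrelation of the sequence x at lag k."""
    xbar = fmean(x)
    num = sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(len(x) - k))
    return num / sum((v - xbar) ** 2 for v in x)


def first_nonsignificant_lag(x, max_lag=5, z_crit=1.96):
    """Smallest lag whose autocorrelation is not significant, else None."""
    n = len(x)
    for k in range(1, max_lag + 1):
        se = math.sqrt((n - k) / (n * (n + 2.0)))
        if abs(autocorrelation(x, k)) < z_crit * se:
            return k
    return None


def thin(x, k):
    """Keep every kth value, starting with the first."""
    return x[::k]
```

If no lag up to the maximum is non-significant, the function returns None, which suggests that simple thinning is not adequate and that a time series model should be considered instead.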

A technically much better approach, but one that requires significantly more sophisticated
analysis and computation, is to fit a time series model that explicitly takes the dependence into
account. In this discussion we mainly present models based on normally distributed data
although the statistical literature does include methods for dealing with more general error
distributions.

A wide variety of different statistical models can be used for these analyses, allowing for
different annual trend functions, different dependencies between consecutive measurements,
the inclusion of day of the week, seasonal, and/or meteorological factors, the inclusion of
spatial dependencies (for data at different monitoring sites), and the inclusion of intervention
terms to account for relatively abrupt step changes in the ozone and ozone precursor
concentrations. In fact most of the models described above in the subsection "Other general
linear model approaches" could be analyzed using a time series approach by adding to the
model the error auto-correlations (the general linear model approach assumes independent,
and hence, uncorrelated errors). The biggest limitations are a) the availability of software for
the proposed analysis, and b) the need to consider and compare a large number of possible
time series models.

For most trend analyses the time domain approach to time series analysis is more appropriate
than a frequency domain approach, which would analyze the series by examining the inherent
periodicities. The auto-regressive integrated moving average (ARIMA) modeling approach is
a general model that is likely to encompass the dependencies in the PAMS data. The simplest
case, conceptually, is an autoregressive process with regression terms. Each daily summary
statistic is assumed to be the sum of regression terms plus an autoregressive error term. The
regression terms can represent the annual trend, the month to month variation within a year,
variation within a week, site effects, meteorological measurements, and similar explanatory
variables. The autoregressive error term is generated from a statistical model which assumes
that each value is an error term plus a constant times the previous value, another constant
times the second previous value, and so on up to a certain lag. The error terms are assumed
to be independently drawn from a normal distribution, and are often referred to as white noise.
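
The structure just described can be written compactly. The notation below is introduced here for illustration only and does not appear in the original report: y_t is the daily summary statistic on day t, x_{jt} are the explanatory variables (trend, month, day of week, site, meteorology), beta_j are the regression coefficients, u_t is the autoregressive error, phi_1 through phi_p are the autoregressive constants up to lag p, and epsilon_t is the white noise term.

```latex
\[
y_t = \sum_{j=1}^{m} \beta_j x_{jt} + u_t, \qquad
u_t = \phi_1 u_{t-1} + \phi_2 u_{t-2} + \cdots + \phi_p u_{t-p} + \varepsilon_t, \qquad
\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2).
\]
```

Setting all phi_k to zero recovers the general linear model with independent errors, which is why the time series formulation can be viewed as an extension of that approach.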

More sophisticated ARIMA models allow for moving average terms, differencing, and
seasonal differencing. A moving average term can be added to an autoregressive model by
replacing the independent error terms with sums of innovation terms; the ith error term is the
sum of the ith innovation plus a multiple of the (i-1)th innovation, plus a multiple of the (i-2)th
innovation, and so on; the innovations are assumed to be white noise. Differencing expands
the set of possible models by assuming that the ARIMA model applies to the difference
between one daily value and the following daily value. Seasonal differencing expands the
process further by using differences between values a fixed number of periods apart (e.g.,
12-month seasonal differences could be used to take monthly effects into account).

Differencing will affect the trend estimation. For example, if there is an assumed linear trend
in the annual means, then the mean of the differences between concentrations 365 days apart
will be the trend rate (slope of the trend line in the annual means).
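
The relationship between seasonal differencing and the trend rate can be checked numerically. The following is a minimal sketch on synthetic data, not PAMS measurements; the function names, the sinusoidal seasonal cycle, and the parameter values are all assumptions made for illustration, and leap years are ignored.

```python
import math


def lagged_differences(series, lag):
    """Return series[t + lag] - series[t] for every t where both values exist."""
    return [series[t + lag] - series[t] for t in range(len(series) - lag)]


def synthetic_daily_series(n_years, intercept, slope_per_year, seasonal_amplitude):
    """Linear trend plus a seasonal cycle that repeats exactly every 365 days."""
    values = []
    for day in range(365 * n_years):
        trend = intercept + slope_per_year * (day / 365.0)
        season = seasonal_amplitude * math.sin(2.0 * math.pi * (day % 365) / 365.0)
        values.append(trend + season)
    return values


# Hypothetical series: 5 years, downward trend of 0.004 ppm per year.
series = synthetic_daily_series(n_years=5, intercept=0.10,
                                slope_per_year=-0.004, seasonal_amplitude=0.03)
diffs = lagged_differences(series, lag=365)
mean_seasonal_difference = sum(diffs) / len(diffs)
print(mean_seasonal_difference)  # close to the assumed slope of -0.004
```

Because the seasonal cycle cancels exactly in each 365-day difference, only the trend contribution remains. With real data the differences would also contain noise, so the mean difference is an estimate of the slope rather than an exact value.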

The number of possible time series models is immense and it can require considerable
expertise to select the best model, or even to select a good fitting model. The methods
suggested above for testing normality and independence can be applied to appropriately
defined residuals. A useful definition is to calculate the residual as the difference between the
observed daily summary statistic and the best model prediction based on all observed data up
to the previous day. For application of the statistical tests for independence and normality it
will first be necessary to divide these residuals by their estimated standard deviations (since the
variances of consecutive residuals are not equal). A commonly used test for time series analysis
is the portmanteau test, which is based on the sum, up to lag k, of the squared autocorrelations
of the residuals. This statistic is compared with a chi-square distribution to determine
significance (large values imply a poor fit).
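
As an illustration of the portmanteau idea, the sketch below computes the Box-Pierce form of the statistic, in which the sum of squared residual autocorrelations is multiplied by the number of residuals. The choice of the Box-Pierce form (rather than, say, the Ljung-Box variant) and the function names are assumptions; the report does not say which version is intended. The residuals used here are an artificial alternating sequence, chosen only because its lag-1 autocorrelation is easy to verify by hand.

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((r - mean) ** 2 for r in residuals)
    ck = sum((residuals[t] - mean) * (residuals[t - lag] - mean)
             for t in range(lag, n))
    return ck / c0


def box_pierce_statistic(residuals, max_lag):
    """n times the sum of squared autocorrelations for lags 1..max_lag."""
    n = len(residuals)
    return n * sum(autocorrelation(residuals, k) ** 2
                   for k in range(1, max_lag + 1))


# Artificial, strongly dependent residuals: +1, -1, +1, -1, ...
alternating = [1.0, -1.0] * 10
q_stat = box_pierce_statistic(alternating, max_lag=1)
print(q_stat)
```

For this sequence the lag-1 autocorrelation is -0.95, so the statistic is far above 3.84, the 5 percent critical value of a chi-square distribution with one degree of freedom, and independence would be rejected. When the residuals come from a fitted ARIMA model, the degrees of freedom are usually reduced by the number of fitted autoregressive and moving average parameters.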

The fitting of these models to the raw data can be performed using modern time series
software. A crucial requirement for the analysis of PAMS data is that the approach allows for
model fitting even if there are substantial blocks of missing values; PAMS data will be
collected during summer months only at many sites. One very useful software package that
allows for almost any pattern of missing values (except for values at the beginning of the
series) is the Splus package (a product of Statistical Sciences, Inc.). This software can fit any
type of univariate ARIMA model including regression terms and seasonal differencing, taking
into account missing values. The method is a version of the maximum likelihood method
using a Kalman filter and a state space representation.

The extreme value model of Smith (1989) described in the next subsection is an example of a
time series model that includes trends and serial dependence and is based on extreme value
distributions rather than normal distributions. Software for such complex statistical analyses is
not directly available in commercial software packages and so these analyses can require a
substantial programming effort.

PROCEDURES BASED ON EXTREME VALUES AND EXCEEDANCES

As discussed above, the proposed methods in this section are designed for the analysis of
trends in ozone rather than other PAMS species. We shall discuss methods based on fitting
extreme value distributions to the daily maxima, and also discuss methods based on the
analysis of annual exceedance rates, defined as the number of days per year that the daily
maximum exceeds the NAAQS.

The theory of extreme values can be used to estimate the distribution of the annual maximum
hourly concentration and/or the second up to the kth highest daily maximum hourly
concentration, and to estimate the distribution of the number of days for which the daily
maximum exceeds a high threshold (that may or may not be the ozone NAAQS). This section
considers some trend analyses based on these approximations. We begin with a simple, but
not very powerful, non-parametric technique, based on the chi-square distribution, to compare
exceedance rates in different years. We then present the use of the Poisson process
approximation for the daily exceedances. This approach is now being routinely applied for the
EPA Trends reports (EPA, 1984-1991). Other applications of extreme value theory that have
been used in the past are also presented. These other methods are potentially very useful in
analyzing ozone trends but may be too complex for routine application.

Chi-Square Test of Trend

The Chi-square test of trend (EPA, 1974) is a simple test primarily applicable to ozone data to
compare exceedance rates (exceedances per year) for two different years. A simple two-by-
two table is created giving the number of NAAQS exceedance days and NAAQS non-
exceedance days for each year. If there were no trend, then the proportions of exceedance
days per year would be equal for both years. The differences between the observed numbers
of exceedance days and the expected numbers in the case of no trend can be used to compute
a chi-squared statistic. Because of the minimal amount of information used to compute this
trend test statistic, the test has the disadvantage of having a very low trend detection
probability which in most cases outweighs the advantage of simplicity.
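
The two-by-two calculation can be sketched as follows. The exceedance counts are invented for illustration, the function name is hypothetical, and no continuity correction is applied (the report does not say whether one is used).

```python
def chi_square_two_years(exceed_1, days_1, exceed_2, days_2):
    """Pearson chi-square statistic for a 2x2 table of exceedance and
    non-exceedance days in two years (no continuity correction)."""
    observed = [[exceed_1, days_1 - exceed_1],
                [exceed_2, days_2 - exceed_2]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if both years share the same exceedance proportion.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


CRITICAL_5_PERCENT_1_DF = 3.841  # chi-square critical value, 1 degree of freedom

# Hypothetical data: 20 exceedance days in year 1, 10 in year 2, 365 days each.
stat = chi_square_two_years(20, 365, 10, 365)
trend_detected = stat > CRITICAL_5_PERCENT_1_DF
print(round(stat, 3), trend_detected)
```

In this made-up example the number of exceedance days is halved, yet the statistic (about 3.48) falls short of the 5 percent critical value, which illustrates the low detection probability noted above.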

Poisson Process Approximation

The simplest approach that uses extreme value theory is based on the result that exceedance
days will follow a Poisson process in the limit, provided that the dependence between daily
concentrations separated by a given number of days decreases sufficiently fast as the
separation increases. Assuming that the numbers of exceedances for different sites are
approximately independent, it follows that the total number of exceedance days for a given
year (summed over the sites) will approximately have a Poisson distribution. The Poisson
distribution has a variance equal to the mean, and the maximum likelihood estimate of this
parameter will be the observed number of exceedances. If annual exceedances are averaged
across a large number of sites, then the annual average number of exceedances per site will be
approximately normally distributed, with a mean estimated by the annual average number of
exceedance days per site and a variance estimated by the annual average number of
exceedance days per site divided by the number of sites.

The Poisson distribution model for the exceedance rates can be used to calculate simultaneous
confidence intervals for the annual mean number of exceedances per site using the Bonferroni
method. (The Tukey studentized range method is not applicable in this case because the
variance varies from year to year.) This approach is derived in Pollack et al. (1984) and has
been applied in several of the annual EPA Trends Reports (EPA, 1984-1991). The Bonferroni
approach is used in these situations to compute confidence intervals such that the probability
of erroneously determining one or more significant differences is 5 percent or less. In general
these Bonferroni intervals are wider than the unknown width needed to exactly attain an
overall 5 percent error probability, i.e., the Bonferroni intervals are an upper bound
approximation to the exact 95 percent simultaneous confidence intervals.
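
A minimal sketch of the interval calculation follows. It assumes the normal approximation described above (variance of the site-averaged count equal to the mean divided by the number of sites) and splits the overall 5 percent error probability equally across the years being compared. How the Bonferroni adjustment is allocated in Pollack et al. (1984) is not stated in this report, so the equal split across years, the function names, and the input values are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def mean_exceedances_per_site(site_counts):
    """Average number of exceedance days per site for one year."""
    return sum(site_counts) / len(site_counts)


def bonferroni_intervals(yearly_means, n_sites, overall_alpha=0.05):
    """Simultaneous intervals for the yearly mean exceedances per site.

    Each year gets a two-sided interval at level overall_alpha / n_years,
    using the Poisson-based variance estimate mean / n_sites.
    """
    n_years = len(yearly_means)
    z = NormalDist().inv_cdf(1.0 - overall_alpha / (2.0 * n_years))
    intervals = []
    for mean in yearly_means:
        half_width = z * sqrt(mean / n_sites)
        intervals.append((mean - half_width, mean + half_width))
    return intervals


# Hypothetical yearly means (exceedance days per site) over 100 sites.
yearly_means = [4.0, 3.0, 5.0]
for year_mean, (low, high) in zip(yearly_means,
                                  bonferroni_intervals(yearly_means, n_sites=100)):
    print(year_mean, round(low, 3), round(high, 3))
```

Because the variance estimate depends on each year's mean, the interval widths differ from year to year, which is the reason given above for not using the Tukey studentized range method.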

Advanced Methods

More powerful but much more complex procedures based on extreme value theory fit detailed
statistical models to the process of exceedances (i.e. the sequence of records that scores each
day as an exceedance or not an exceedance) or to the daily maxima themselves. We shall
describe here some specific statistical papers by Smith and Shively describing the results of
these approaches. These methods are potentially very useful but are probably too complex for
routine use.

Shively (1991) used an approach similar to the Poisson exceedance count model to estimate
the long-term trend in ozone exceedance rates for Houston daily maxima. The sequence of
daily exceedances of the selected high ozone threshold was modeled as a non-homogeneous
Poisson process. Thus the exceedances were assumed to follow a Poisson process with a rate
that was not constant. The logarithm of the exceedance rate for a given day is the sum of
multiples of certain meteorological measurements for that day and of a multiple of the
calendar year. The calendar year multiple gives the estimated trend.
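
In symbols, the rate model described above can be sketched as follows. The notation is introduced here for illustration: lambda(t) is the exceedance rate on day t, w_j(t) are the meteorological measurements for that day, and year(t) is the calendar year. The intercept beta_0 is an assumption added for completeness; the description above does not mention one explicitly.

```latex
\[
\log \lambda(t) = \beta_0 + \sum_{j=1}^{m} \beta_j \, w_j(t) + \gamma \, \mathrm{year}(t)
\]
```

Under this form, exp(gamma) - 1 is the fractional change in the exceedance rate from one year to the next with the meteorological variables held fixed, so a negative gamma corresponds to a declining exceedance rate.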

In another paper, Shively (1990) used the limiting joint extreme value distribution for the k
highest daily maximum hourly ozone concentrations for each of several years. This limiting
distribution assumes that all daily maxima are approximately independent, and that for each
year the daily maxima have the same distribution. The location parameter (a parameter
related to the mean of the limiting distribution) was assumed to change linearly with the year.
The maximum likelihood method was used to estimate the parameters, but a bootstrap method
was used to determine the statistical significance of the trend, since the amount of data used in
the analysis was too small to apply asymptotic theory for the significance test.

The methods of Smith (1989) use the latest advances in extreme value theory to derive a very
complete description of the sequence of daily maxima that incorporates the most general
limiting extreme value distribution for the upper tail, the possible clustering of exceedances,
seasonal trends (within year), and annual trends (across years). Since exceedance days often
cluster together in cases of strong serial dependence, Smith fitted the trend model to all hourly
ozone concentrations greater than a high threshold separated in time by more than a cluster
interval; if more than one hourly exceedance of the threshold occurred within the cluster
interval, only the highest of the cluster exceedances was used to fit the model. To fit ozone
data from Houston, Texas (1973-1986) various thresholds (0.08, 0.10, 0.12, 0.16, 0.20, 0.26,
0.28, and 0.30 ppm) and two alternative cluster intervals (24 and 72 hours) were used with
somewhat different results.
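
The cluster selection step can be sketched as below. This is one plausible reading of the procedure described above, not Smith's actual implementation: consecutive threshold exceedances are placed in the same cluster when they are separated by no more than the cluster interval, and only the highest value in each cluster is kept. The function name and the hourly values are hypothetical.

```python
def decluster_peaks(hourly, threshold, cluster_interval_hours):
    """Return one (hour, concentration) peak per cluster of exceedances.

    hourly: list of (hour_index, concentration) pairs sorted by hour_index.
    Exceedances no more than cluster_interval_hours after the previous
    exceedance are treated as part of the same cluster.
    """
    peaks = []
    current_peak = None
    last_exceedance_hour = None
    for hour, value in hourly:
        if value <= threshold:
            continue  # not an exceedance
        same_cluster = (last_exceedance_hour is not None and
                        hour - last_exceedance_hour <= cluster_interval_hours)
        if same_cluster:
            if value > current_peak[1]:
                current_peak = (hour, value)
        else:
            if current_peak is not None:
                peaks.append(current_peak)
            current_peak = (hour, value)
        last_exceedance_hour = hour
    if current_peak is not None:
        peaks.append(current_peak)
    return peaks


# Hypothetical hourly ozone values (ppm), indexed by hour since start of record.
hourly = [(0, 0.13), (5, 0.15), (10, 0.14), (60, 0.16), (61, 0.12), (300, 0.13)]
peaks_24 = decluster_peaks(hourly, threshold=0.12, cluster_interval_hours=24)
peaks_72 = decluster_peaks(hourly, threshold=0.12, cluster_interval_hours=72)
print(peaks_24)
print(peaks_72)
```

With the 24-hour interval the example yields three cluster peaks; with the 72-hour interval the first two clusters merge and only two peaks remain. This shows how the choice of cluster interval changes the data entering the fit, consistent with the differing results noted above.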

According to the limiting extreme value theory model, the cluster exceedances occur
according to a Poisson process, and the distribution of the cluster maximum concentration is
the tail generalized Pareto distribution (GPD). The tail GPD has a location, scale, and shape
parameter. To treat seasonality, which is variation within the year, the scale and shape
parameters differ by the calendar month (or pair of months), but are the same for every year.
To treat trend, which is variation from year to year, the location parameter was assumed to be
an intercept plus a slope parameter multiplied by the calendar year; both intercept and slope
vary by calendar month (or pair of months). This complex model was fitted by the maximum
likelihood method.
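
The parameter structure described above can be summarized in symbols. The notation is introduced here for illustration: m indexes the calendar month (or pair of months), T is the calendar year, and mu, sigma, and xi are the location, scale, and shape parameters. The distribution function shown is one common three-parameter form of the generalized Pareto distribution; the exact parameterization used in Smith (1989) may differ.

```latex
\[
G_m(y \mid T) = 1 - \left[ 1 + \xi_m \, \frac{y - \mu_m(T)}{\sigma_m} \right]^{-1/\xi_m},
\qquad
\mu_m(T) = a_m + b_m T ,
\]
\[
\text{for } y \ge \mu_m(T), \quad 1 + \xi_m \, \frac{y - \mu_m(T)}{\sigma_m} > 0, \quad \xi_m \ne 0 .
\]
```

Here sigma_m and xi_m carry the seasonality (they depend on the month but not on the year), while the slopes b_m carry the year-to-year trend.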

The extreme value theory model in Smith (1989) is the most realistic application of extreme
value theory to ozone data since it incorporates serial dependence, seasonal dependence,
annual trends, and the most general limiting extreme value distribution. The routine
application of Smith's model by air quality managers is difficult because of the computational
difficulties in fitting the model and the very difficult problem of selecting reasonable choices
for the threshold and cluster interval; Smith's selections for Houston are likely to be
inappropriate for many other cities. An even more complete analysis would incorporate terms
representing meteorological effects into the extreme value theory model and allow for
nonlinear trend functions.

References

Bloomfield, P., G. Oehlert, M. L. Thompson, and S. Zeger. 1983. A frequency domain
analysis of trends in Dobson total ozone records. J. Geophysical Res.,
88(C13):8512-8522.

Capel, J., T. R. Johnson, and T. McCurdy. 1983. "Analysis of Ozone Trends for Selected
Indices of Daily Maximum Air Quality Data." Air Pollution Control Association
Annual Meeting, Atlanta, Georgia (June 19-24, 1983).

Chock, D. P., S. Kumar, and R. W. Herrmann. 1982. An analysis of trends in oxidant air
quality in the South Coast Air Basin of California. Atmos. Environ., 16(11):2615-2624.

Cohen, J. P., and A. K. Pollack. 1991. "General Linear Models Approach to Estimating
National Air Quality Trends Assuming Different Regional Trends." Systems
Applications International, San Rafael, California (SYSAPP-91/035).

EPA. 1973. The National Air Monitoring Program: Air Quality and Emissions Trends -
Annual Report. U.S. Environmental Protection Agency, Office of Air Quality
Planning and Standards, Research Triangle Park, North Carolina (450/1-73-001a and b).

EPA. 1974. Guideline for the Evaluation of Air Quality Trends. Office of Air Quality
Planning and Standards, U.S. Environmental Protection Agency.

Freas, W. A., and E. Sieurin. 1977. "A Nonparametric Calibration Procedure for
Multi-Source Urban Air Pollution Dispersion Models." Fifth Conference on Probability and
Statistics in Atmospheric Sciences, American Meteorological Society, Las Vegas,
Nevada.

Hirsch, R. M., and J. R. Slack. 1984. A nonparametric trend test for seasonal data with serial
dependence. Water Resources Res., 20(6):727-732.

Hirsch, R. M., J. R. Slack, and R. A. Smith. 1982. Techniques of trend analysis for monthly
water quality data. Water Resources Res., 18(1):107-121.

Kolaz, D. J., and R. L. Swinford. 1988. "Ozone Air Quality: How Does Chicago Rate?"
81st Annual Meeting of the Air Pollution Control Association, Dallas, Texas (June
1988).

Kolaz, D. J., and R. L. Swinford. 1989. "Ozone Trends in the Greater Chicago Area."
Ozone Conference on "Federal Controls for Ozone Around Lake Michigan," Lake
Michigan States' Section and Wisconsin Chapter of the Air and Waste Management
Association (October 12-13, 1989).

Kumar, S., and D. P. Chock. 1984. An update on oxidant trends in the South Coast Air
Basin of California. Atmos. Environ., 18(10):2131-2134.

Lettenmaier, D. P. 1976. Detection of trends in water quality data from records with
dependent observations. Water Resources Res., 12(5):1037-1046.

Pollack, A. K., and W. F. Hunt. 1985. "Analysis of Trends and Variability in Extreme and
Annual Average Sulfur Dioxide Concentrations." Air Pollution Control Association
Specialty Conference on "Quality Assurance in Air Pollution Measurements," Boulder,
Colorado.

Pollack, A. K., W. F. Hunt, Jr., and T. C. Curran. 1984. "Analysis of Variance Applied to
National Ozone Air Quality Trends." 77th Annual Meeting of the Air Pollution
Control Association, San Francisco, California (June 24-29, 1984).

Pollack, A. K., and T. S. Stocking. 1989. "General Linear Models Approach to Estimating
National Air Quality Trends." Systems Applications, Inc., San Rafael, California
(SYSAPP-89/098).

Reinsel, G., G. C. Tiao, M. N. Wang, R. Lewis, and D. Nychka. 1981. Statistical analysis of
stratospheric ozone data for the detection of trends. Atmos. Environ., 15(9):1569-1577.

SCAQMD. 1991. "Final Air Quality Management Plan 1991 Revision. Final Appendix II-B:
Air Quality Trends in California's South Coast and Southeast Desert Air Basins, 1976-
1990." South Coast Air Quality Management District.

Shively, T. S. 1990. An analysis of the long-term trend in ozone data from two Houston,
Texas monitoring sites. Atmos. Environ., 24B(2):293-301.

Shively, T. S. 1991. An analysis of the trend in ground-level ozone using nonhomogeneous
Poisson processes. Atmos. Environ., 25B(3):387-395.

Smith, R. L. 1989. Extreme value analysis of environmental time series: An application to
trend detection in ground-level ozone. Statistical Science, 4:367-393.

Sweitzer, T. A., and D. J. Kolaz. 1984. "An Assessment of the Influence of Meteorology on
the Trend of Ozone Concentrations in the Chicago Area." Air Pollution Control
Association Specialty Conference on "Quality Assurance in Air Pollution
Measurements," Boulder, Colorado (October 14-18, 1984).

Wackter, D. J., and P. V. Bayly. 1987. "The Effectiveness of Connecticut's SIP on Reducing
Ozone Levels from 1976 through 1987." Air Pollution Control Association Specialty
Conference on "The Scientific and Technical Issues Facing Post-1987 Ozone Control
Strategies," Hartford, Connecticut (November 1987).

References

Anderberg, M. R. 1973. Cluster Analysis for Applications. Academic Press, New York.

Anderson, G. E. 1983. Human Exposure to Atmospheric Concentrations of Selected
Chemicals, Volume 1. U.S. Environmental Protection Agency (NTIS PB84-102540).

Austin, B. S., A. S. Rosenbaum, and S. R. Hayes. 1988. "User's Guide to the NEM/SAI
Exposure Model." Systems Applications International, San Rafael, California
(SYSAPP-88/051).

Chambers, J. M., W. S. Cleveland, B. Kleiner, and P. A. Tukey. 1983. Graphical Methods
for Data Analysis. Wadsworth, Belmont, California and Duxbury Press, Boston.

Chang, T. Y., S. J. Rudy, G. Kuntasal, and R. A. Gorse, Jr. 1989. Impact of methanol
vehicles on ozone air quality. Atmos. Environ., 23:1629-1644.

Cohen, J. P., and T. E. Stoeckenius. 1992a. "Analysis of Sources of Variability in the Atlanta
1990 Ozone and Ozone Precursor Study Data." Systems Applications International,
San Rafael, California (SYSAPP-92/094).

Cohen, J. P., and T. E. Stoeckenius. 1992b. "Literature Reviews on Ozone Design Values."
Systems Applications International, San Rafael, California (SYSAPP-92/098b).

Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood estimation from
incomplete data via the EM algorithm. J. Royal Statis. Soc., Ser. B, 39:1-38.

Dodge, M. C. 1984. Combined effects of organic reactivity and nonmethane hydrocarbon to
nitrogen dioxide ratio on photochemical oxidant formation: A modeling study.
Atmos. Environ., 18:1657-1666.

EPA. 1991a. Nonroad Engine and Vehicle Emission Study--Report. U.S. Environmental
Protection Agency, Office of Air and Radiation, Washington, D.C. (ANR-443;
21A-2001).

EPA. 1991b. VOC/PM Speciation Data System. Electronic Format. U.S. Environmental
Protection Agency.

EPA. 1992a. Regulation of fuels and fuel additives: Standards for reformulated gasoline;
Proposed Rule (Supplementary). Federal Register, 57(74):13416-13495.

EPA. 1992b. User's Guide for the Urban Airshed Model, Volume IV: User's Manual for
the Emissions Preprocessor System 2.0. Part A: Core FORTRAN System. U.S.
Environmental Protection Agency, Office of Air Quality Planning and Standards
(EPA-450/4-90-007D(R)).

EPA. 1993. Complex Model Database. Final version released 7-93. U.S. Environmental
Protection Agency, Office of Mobile Sources, Ann Arbor, Michigan.

Fujita, E. M., B. E. Croes, C. L. Bennett, D. R. Lawson, F. W. Lurmann, and H. H. Main.
1992. Comparison of emission inventory and ambient concentration ratios of CO,
NMOG, and NOx in California's South Coast Air Basin. J. Air Waste Manage. Assoc.,
42:264-276.

Galambos, J. 1978. The Asymptotic Theory of Extreme Order Statistics. Wiley, New York.

Gilbert, R. O. 1987. Statistical Methods for Environmental Pollution Monitoring. Van
Nostrand Reinhold, New York.

Hare, C. T., and J. J. White. 1991. "Toward the Environmentally-Friendly Small Engine:
Fuel, Lubricant, and Emission Measurement Issues" (paper 911222). Presented at
1991 Small Engine Technology Conference, Yokohama, Japan (October 21-25, 1991).

Harley, R. A., M. P. Hannigan, and G. R. Case. 1992. Respeciation of organic gas emissions
and the detection of excess unburned gasoline in the atmosphere. Environ. Sci.
Technol., 26(12):2395-2408.

Hoeckman, S. K. 1992. Speciated measurements and calculated reactivities of vehicle
exhaust emissions from conventional and reformulated gasolines. Environ. Sci.
Technol., 26:1206-1216.

Hunt, W. 1991. "An Examination of Alternative Air Quality Indicators." The 3rd
International Conference on Environmetrics, Madison, Wisconsin.

Hunt, W., R. Faoro, T. Curran, and J. Munty. 1984. "Estimated Cancer Incidence Rates
from Air Toxic Pollution." U.S. Environmental Protection Agency, Office of Air
Quality Planning and Standards, Research Triangle Park, North Carolina.

Hunt, W., R. Faoro, and T. Curran. 1985. "Estimation of Cancer Incidence Cases and Rates
for Selected Toxic Air Pollutants Using Ambient Air Pollution Data, 1970 Versus
1980." U.S. Environmental Protection Agency, Office of Air Quality Planning and
Standards, Research Triangle Park, North Carolina.

Hunt, W., and O. Gerald. 1991. "The Enhanced Ozone Monitoring Network Required by the
New Clean Air Act Amendments." Presented at the Air and Waste Management
Association Meeting, Vancouver, British Columbia, Canada.

Johnson, T. R., and R. A. Paul. 1981. "The NAAQS Exposure Model (NEM) and Its
Application to Nitrogen Dioxide." PEDCo Environmental, Durham, North Carolina.

Johnson, T. R., and R. A. Paul. 1982. "The NAAQS Exposure Model (NEM) Applied to
Carbon Monoxide." PEDCo Environmental, Durham, North Carolina.

Johnson, T. R., J. E. Capel, and M. McCoy. 1993a. "Estimation of Ozone Exposures
Experienced by Urban Residents using a Probabilistic Version of NEM and 1990
Population Data." International Technology Air Quality Services, Durham, North
Carolina.

Johnson, T. R., M. McCoy, J. E. Capel, M. Alberts, and B. Morrison. 1993b. "Estimation of
Incremental Benzene Exposures and Associated Cancer Risks Attributable to a
Petroleum Refinery Waste Stream Using the Hazardous Air Pollutant Exposure Model
(HAPEM)." Air and Waste Management Association Annual Meeting and Exhibition,
Denver, June 13-18, 1993.

Kenski, D. M., R. A. Wadden, P. A. Scheff, and W. A. Lonneman. 1993. "A Receptor
Modeling Approach to VOC Emission Inventory Validation in Five U.S. Cities." Air
and Waste Management Association 86th Annual Meeting and Exhibition, Denver,
Colorado (June 13-18, 1993).

Larsen, L. C., R. A. Bradley, and G. L. Honcoop. 1990. "A New Method of Characterizing
the Variability of Ozone Air Quality-Related Indicators." Transactions Tropospheric
Ozone and the Environment, Air and Waste Management Association International
Conference, Pittsburgh, Pennsylvania.

Lewis, C. W., T. L. Conner, R. K. Stevens, J. F. Collins, and R. C. Henry. 1993. "Receptor
Modeling of Volatile Hydrocarbons Measured in the 1990 Atlanta Ozone Precursor
Study." Air and Waste Management Association 86th Annual Meeting and Exhibition,
Denver, Colorado (June 13-18, 1993).

Ligocki, M. P., R. R. Schulhof, R. E. Jackson, M. M. Jimenez, G. Z. Whitten, G. M. Wilson,
T. C. Myers, and J. L. Fieber. 1992. "Modeling the Effects of Reformulated
Gasolines on Ozone and Toxics Concentrations in the Baltimore and Houston Areas."
Systems Applications International, San Rafael, California (SYSAPP-92/127).

McAllister, R., E. Bowles, J. DeGarmo, J. Rice, R. F. Jongleux, R. G. Merrill, Jr., and J.
Bursey. 1991. "1990 Urban Air Toxics Monitoring Program." U.S. Environmental
Protection Agency and Radian Corporation (EPA-450/4-91-024).

National Research Council. 1991. Rethinking the Ozone Problem in Urban and Regional
Air Pollution. National Academy Press, Washington, D.C.

Nelson, P. F., and S. M. Quigley. 1983. The m, p-xylenes: ethyl benzene ratio: A technique
for estimating hydrocarbon age in ambient atmospheres. Atmos. Environ., 17(3):659-662.

Nelson, P. F., S. M. Quigley, and M. Y. Smith. 1983. Sources of atmospheric hydrocarbons
in Sydney: A quantitative determination using a source reconciliation technique.
Atmos. Environ., 17(3):439-449.

O'Hara, P. L., R. A. McAllister, D.-P. Dayton, J. E. Robbins, R. F. Jongleux, R. G. Merrill,
Jr., J. Rice, J. E. McCartney, T. L. Sampson, and J. Y. Martin. 1992. "1991
Nonmethane Organic Compound, Speciated Nonmethane Organic Compound, and
Three-Hour Air Toxics Monitoring Program." U.S. Environmental Protection Agency
and Radian Corporation (EPA-454-R-92-010).

Preisendorfer, R. W., and C. D. Mobley. 1988. Principal Component Analysis in
Meteorology and Oceanography. Elsevier, New York.

Purdue, L. J., D.-P. Dayton, J. Rice, and J. Bursey. 1991. "Technical Assistance Document
for Sampling and Analysis of Ozone Precursors." U.S. Environmental Protection
Agency and Radian Corporation (EPA-600/8-91/215; PB92-122795).

Roberts, E. M. 1979a. Review of statistics of extreme values with applications to air quality
data. Part I. Review. J. Air Pollut. Control Assoc., 29(6):632-637.

Roberts, E. M. 1979b. Review of statistics of extreme values with applications to air quality
data. Part II. Applications. J. Air Pollut. Control Assoc., 29(7):733-740.

Roberts, P. T., H. H. Main, L. R. Chinkin, S. F. Musarra, and T. E. Stoeckenius. 1993.
"Methods Development for Quantification of Ozone and Ozone Precursor Transport
in California." Sonoma Technology Inc., Santa Rosa, California and Systems
Applications International, San Rafael, California (STI-90100-1233-DFR-2).

Rosenbaum, A. S. 1994. Personal communication.

Scheff, P. A., R. A. Wadden, B. A. Bates, and P. F. Aronian. 1989. Source fingerprints for
receptor modeling of volatile organics. JAPCA, 39:469-478.

Scheff, P. A., R. A. Wadden, C. B. Keil, J. Graf-Teterycz, and J.-Y. Jeng. 1992.
"Composition of Volatile Compound Emissions from Spark Ignition and Diesel
Vehicles, Coke Ovens, Wastewater Treatment Plants and Wood Combustion."
Presented at the 85th Annual Meeting & Exhibition, Air and Waste Management
Association, Kansas City, Missouri (June 21-26, 1992).

Singh, H. B., L. J. Salas, B. K. Cantrell, and R. M. Redmond. 1985. Distribution of aromatic
hydrocarbons in the ambient air. Atmos. Environ., 19(11):1911-1919.

Stoeckenius, T. E. 1993a. "Meteorological Influences on Ozone Concentrations and Trends
Analysis: A Literature Review." Systems Applications International, San Rafael,
California (SYSAPP-93/007).

Stoeckenius, T. E. 1993b. "Results of Spatial Variability and Ozone Indicator Analyses for
the Ozone Design Value Study; Work Assignment 2-1, Contract No. 68D00096."
Memorandum to Warren Freas, Office of Air Quality Planning and Standards, U.S.
Environmental Protection Agency, dated 29 June 1993.

Stoeckenius, T. E., S. B. Shepard, and R. K. Iwamiya. 1993. "Subcontract AQMD-93-01,
Work Assignment 11." Memorandum to Richard Rehm, Pacific Environmental
Services, dated 30 December 1993.

Tiao, G. C. 1982. Statistical analysis of the effect of car inspection and maintenance
programs on the ambient CO concentrations in Oregon. Environ. Sci. Technol., 16(6).

Tukey, J. W. 1977. Exploratory Data Analysis. Addison-Wesley, Reading, Massachusetts.

Whitby, R. A., and E. R. Altwicker. 1978. Acetylene in the Atmosphere: sources,
representative ambient concentrations and ratios to other hydrocarbons. Atmos.
Environ., 12:1289-1296.
