Technical Basis for the EPA's Development of
the Significant Impact Thresholds for PM2.5 and
Ozone
EPA-454/R-18-001
April 2018
U.S. Environmental Protection Agency
Office of Air Quality Planning and Standards
Air Quality Assessment Division
Research Triangle Park, NC
Contents
1.0 Introduction
2.0 Background on Air Quality Variability Approach
2.1 U.S. Ambient Monitoring Data
2.1.1 Ozone Monitoring Network
2.1.2 PM2.5 Monitoring Network
2.1.3 Monitoring Network Design
2.1.4 Air Quality System (AQS) Database
2.2 Statistical Methods and Assessing Significance Using Confidence Intervals
2.2.1 General Overview of Statistical Methods
2.2.2 Characterizing Air Quality Variability
2.2.3 Bootstrapping Method
3.0 Results of the Air Quality Variability Approach
3.1 Ozone Results
3.2 PM2.5 Results (Annual and 24-hr)
3.2.1 Analysis of PM2.5 Spatial Variability
3.2.2 Analysis of the Influence of PM2.5 Monitor Sampling Frequency
4.0 Application of Air Quality Variability to Calculate SILs for the PSD Program
4.1 PSD Air Quality Analyses and Statistical Significance
4.1.1 Confidence Interval
4.1.2 Adjustment to the Level of the NAAQS
4.1.3 Selection of a Geographic Scale
4.1.4 Selection of the Three Most Recent Design Value Years
4.2 Analysis for Ozone
4.2.1 Ozone Temporal Trends
4.3 Analysis for PM2.5
4.3.1 PM2.5 Temporal Trends
5.0 Additional Information
1.0 Introduction
In order to understand the nature of air quality, the EPA statistically estimates the distribution of
pollutants contributing to ambient air quality and the variation in that air quality. The statistical
methods and analysis detailed in this report focus on using the conceptual framework of
statistical significance to calculate levels of change in air quality concentrations that have a
"significant impact" or an "insignificant impact" on air quality degradation. Statistical
significance is a well-established concept with a basis in commonly accepted scientific and
mathematical theory. This analysis examines statistical significance for a range of values
measured by air quality monitors. The statistical methods and data reflected in this analysis may
be applicable for multiple regulatory applications where EPA and state agencies seek to quantify
a level of impact on air quality that they consider to be either "significant" or "not significant."
Note: We have adopted the following convention throughout the document: a "significant
impact" (in quotes) refers to a level of air quality change that can be used in the permit analysis
of the ambient impacts from a facility to determine if it "causes, or contributes to" a violation of
the applicable National Ambient Air Quality Standards (NAAQS) or Prevention of Significant
Deterioration (PSD) increment, whereas we use significant (italicized) to refer to a mathematical
assessment of probabilistic properties.
While this technical analysis may have utility in several contexts, the primary purpose of this
document is to quantify the degree of air quality impacts corresponding to different confidence
intervals (related to the statistical analysis presented here) that can be used in determining what
is an "insignificant impact" when considering an application for a permit under the PSD
program. In order to obtain a preconstruction permit under the PSD program, an applicant must
demonstrate that the increased emissions from its proposed modification or construction will not
"cause or contribute to" a violation of any NAAQS or PSD increment.1 One way that this
criterion can be met is by showing that the increased emissions from a proposed source will not
have a significant impact on ambient air quality at any location, including locations where an
exceedance of the NAAQS or PSD increment is occurring or may be projected to occur.2 For the
purposes of a PSD permit, the EPA has promulgated analytical methods involving air quality
modeling and monitoring for conducting these compliance demonstrations.3 More generally
(e.g., for purposes of designating areas as attainment or nonattainment), compliance with the
NAAQS is determined by comparing the measured "design value" (DV) at an air quality
monitor to the level of the NAAQS for the relevant pollutant.4 A DV is a statistic or summary
metric based on the most recent one or three years (depending on the specific standard) of
1 40 Code of Federal Regulations (CFR) 51.166 and 52.21.
2 Memorandum from Peter Tsirigotis, EPA Office of Air Quality Planning and Standards, Guidance on Significant
Impact Levels for Ozone and Fine Particles in the Prevention of Significant Deterioration Permitting Program,
April 17, 2018.
3 40 CFR, part 51, Appendix W, 82 FR 5182 (January 17, 2017), Revisions to the Guideline on Air Quality Models:
Enhancements to the AERMOD Dispersion Modeling System and Incorporation of Approaches to Address Ozone
and Fine Particulate Matter.
4 A design value is a statistic that describes the air quality status of a given location relative to the level of the
NAAQS. More information may be found at: http://www3.epa.gov/airtrends/values.html.
monitored data that describes the air quality status of a given location relative to the level of the
NAAQS.
The EPA has decided that an "insignificant impact" level of change in ambient air quality can be
characterized by the observed variability of ambient air quality levels. Since the cause or
contribute test is applied to the NAAQS in the PSD program, this analysis has been designed to
take into account the ambient data used to determine DVs and the form of the relevant NAAQS.
The EPA's technical approach, referred to as the "Air Quality Variability" approach, relies upon
the fact that there is inherent variability in the observed ambient data, which is in part due to the
intrinsic variability of the emissions and meteorology controlling transport and formation of
pollutants, and uses statistical theory and methods to model that intrinsic variability in order to
facilitate identification of a level of change in DVs that is acceptably similar to the original DV,
thereby representing a change in air quality that is not significant.5 The DVs and background
ambient concentrations that are used in the PSD compliance demonstrations are obtained through
the U.S. ambient monitoring network with measured data being archived for analysis in the
EPA's Air Quality System (AQS).6
Based on these observed ambient data, the EPA has estimated the variability of the air quality
levels of ozone and PM2.5 through applying a well-established statistical approach known as
bootstrapping. Bootstrapping is a method that allows one to construct measures to quantify the
uncertainty of sample statistics (e.g., mean, percentiles) for a population of data.7,8 The bootstrap
approach applied here uses a non-parametric, random resampling with replacement on the
sample dataset (in this case, the ambient air quality concentration data underlying the DVs),
resulting in many resampled datasets. This approach allows measures of uncertainty for sample
statistics when the underlying distribution of the sample statistic is unknown and/or the
derivation of the corresponding estimates is computationally unfeasible or intractable.7
Bootstrapping is also commonly utilized to overcome issues that can occur when quantifying
uncertainty in samples with correlated measurements. Bootstrapping has been used across a
variety of scientific disciplines and in a wide range of applications within the environmental
sciences.9,10,11,12 For example, bootstrapping has been used to evaluate the economic value of
5 This approach is applied here strictly for the purpose of section 165(a)(3) and no other parts of the Clean Air Act.
6 The AQS contains ambient air pollution data collected by EPA, state, local, and tribal air pollution control agencies
from thousands of monitors. These data are used to assess air quality, assist in attainment/nonattainment
designations, evaluate State Implementation Plans for nonattainment areas, perform modeling for permit review
analysis, and other air quality management functions. More information may be found at: http://www.epa.gov/aqs.
7 Efron, B. (1979); Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7(1): 1-26.
doi:10.1214/aos/1176344552.
8 Efron, B. (2003); Second Thoughts on the Bootstrap. Stat. Sci., 18, 135-140.
9 Schuenemeyer, J., Drew, L. (2010); Statistics for Earth and Environmental Scientists, John Wiley & Sons, Inc.
http://dx.doi.org/10.1002/9780470650707.ch3.
10 Park, Lek, Baehr, Jorgensen, eds. (2015); Advanced Modelling Techniques Studying Global Changes in
Environmental Sciences, 1st Edition, Elsevier. ISBN 9780444635365.
11 Chandler, R., Scott, M. (2011); Statistical Methods for Trend Detection and Analysis in the Environmental
Sciences, John Wiley & Sons, Inc. ISBN: 978-0-470-01543-8.
12 Mudelsee, M. & Alkio, M. (2007); Quantifying effects in two-sample environmental experiments using bootstrap
confidence intervals, Env. Mod. & Software, 22, 84-96.
clinical health analyses13 and environmental policies,14 in evaluations of environmental
monitoring programs,15 and in determining uncertainty in emissions inventories.16 Additionally,
the EPA has used bootstrapping techniques as a key component in evaluating air quality model
performance for use in our nation's air quality management system.17,18
The bootstrap technique, as applied in this analysis, quantifies the degree of air quality variability
at an ambient monitoring site and allows one to determine confidence intervals (CIs), i.e.,
statistical measures of the variability associated with the monitor-based DVs, to inform the
degree of air quality change that can be considered an "insignificant impact" for PSD
applications. This approach is fundamentally based on the idea that an anthropogenic
perturbation of air quality that is within a specified range may be considered indistinguishable
from the inherent variability in the measured atmospheric concentrations and is, from a statistical
standpoint, not significant at the given confidence level. Specifically, the analysis uses 17 years
(2000-2016) of nationwide ambient ozone and PM2.5 measurement data from the AQS database
to generate a large number of resampled datasets for ozone and PM2.5 DVs at each monitor from
which the appropriate design values are calculated. The DVs from the resampled datasets are
used to determine CIs that provide a measure of the inherent variability in air quality at the
monitor location. This variability may be driven by the frequency of various types of
meteorological and/or emissions conditions impacting a particular location. The analysis
estimates a range of CIs for each monitor. As discussed in Section 4.1.1 of this document and in
the Policy Document,2 the 50% CI was chosen to quantify the bounds of a change in air quality
that can be considered an "insignificant impact" for the purposes of meeting requirements under
the PSD program.
This technical basis document explains the analysis design and results that provide the EPA's rational
basis to recommend Significant Impact Levels (SILs) values that can be applied as a tool for
making the PSD compliance demonstration required by the Clean Air Act (CAA) and PSD
regulations. The second section of this document provides an overview of EPA's Air Quality
Variability approach, including details on the ambient monitoring network, the ambient ozone
and PM2.5 data from AQS that are used to derive monitor-specific DVs, a general review of
statistical significance and confidence intervals, and a description of the bootstrap technique as
applied to characterize air quality variability. The third section presents the measures of air
quality variability determined from applying the bootstrap technique to the AQS data for ozone
and PM2.5. The last section provides an analysis of confidence intervals for the ozone and PM2.5
DVs and the implications of the geographical analysis performed in response to peer reviewer
13 Campbell, M., & Torgerson, D. (1999); Bootstrapping: Estimating Confidence Intervals for Cost-effectiveness
Ratios, Q. J. of Med., 92, 177-182.
14 Kochi, I., Hubbell, B., & Kramer, R. (2006); An Empirical Bayes Approach to Combining and Comparing
Estimates of the Value of a Statistical Life for Environmental Policy Analysis, Env. & Resource Econ., 34, 385-406.
15 Levine, C., et al (2014); Evaluating the efficiency of environmental monitoring programs, Ecol. Ind., 39, 94-101.
16 Tong, L., et al (2012); Quantifying uncertainty of emission estimates in National Greenhouse Gas Inventories
using bootstrap confidence intervals, Atm. Env., 56, 80-87.
17 Hanna, S. (1989); Confidence limits for air quality model evaluations, as estimated by bootstrap and jackknife
resampling methods, Atm. Env., 6, 1385-1398.
18 Cox, W. & J. Tikvart (1980); A statistical procedure for determining the best performing air quality simulation
model, Atm. Env., 9, 2387-2395.
comments. The resulting values chosen by the EPA can serve as SIL levels for the ozone
NAAQS and the annual and 24-hour PM2.5 NAAQS.
2.0 Background on Air Quality Variability Approach
For the 24-hr PM2.5 NAAQS, the DV is the 3-year average of the annual 98th percentile
24-hr average PM2.5 mass concentration. A monitor is in compliance with the 24-hr PM2.5
standard if the DV is less than or equal to 35 μg/m3.
2.1.1 Ozone Monitoring Network
The ozone monitoring network consists of only one type of monitor, Federal Equivalent Method
(FEM) monitors.23 The FEM for ozone uses ultraviolet (UV) light to determine ozone
concentrations at high temporal resolutions, on the order of seconds to minutes, although only
hourly averages are typically recorded. Unlike PM2.5 monitors, most ozone monitors are not
required to operate year-round, and are instead required to operate only during the "ozone
season." The ozone season is the time of year that high ozone concentrations (which may
potentially exceed the NAAQS) can be expected at a particular location. The ozone season varies
widely by location, but is generally focused on the summer months, with a typical season
spanning March through October. During the period of 2000 through 2016, a total of 1,708
ozone monitors reported data, with the locations of the ozone monitors shown in Figure 1 along
with the average number of days sampled each year that the monitor was active.
Figure 1 - Location and average number of monitored ozone days each year from the ozone
sampling network for the years 2000-2016.
23 FEM monitors are approved on an individual basis. The list of approved monitors and the accompanying CFR
references can be found at http://www3.epa.gov/ttn/amtic/criteria.html.
2.1.2 PM2.5 Monitoring Network
The PM2.5 monitoring network consists of two types of monitors: Federal Reference Method
(FRM)24 and FEM23 monitors. FRM monitors use a filter-based system, passing a low volume of
air through a filter over a period of 24 hours (midnight to midnight) to determine 24-hr average
concentrations. All monitors operate year-round, but not all monitors operate every day
throughout the year. Although some FRM sites operate every day (i.e., 1:1 monitors), most
operate every third day (1:3 monitors), while a smaller number of monitors operate only every
sixth day (1:6 monitors), according to a common schedule provided by the EPA. Newer FEM
monitors are "continuous" monitors that can provide hourly (or shorter) PM2.5 measurements and
have undergone testing to demonstrate conformance (including linear regression, slope/intercept,
time series, and mean concentration ratios) with the FRM monitors.25 FEM monitors operate on a
1:1 schedule and daily averages from FEM monitors are determined by averaging the 24 hourly
measurements collected throughout the day. FEM monitors are slowly replacing FRM monitors,
so monitoring sites with a long data record may have data derived from either an FEM, FRM, or
combination of both types of monitors. Although the FRM and FEM monitors have small
differences in their performance, the largest impact to the bootstrap technique of this transition
from all FRM monitors to a mix of FRM and FEM monitors is the gradual increase in the
frequency of PM2.5 measurements over time. During the period of 2000 through 2016, a total of
1,773 PM2.5 monitors reported data, with the locations of the PM2.5 monitors shown in Figure 2
along with the average number of days sampled each year that the monitor was active.
24 Appendix B to Part 50: Reference Method for the Determination of Suspended Particulate Matter in the
Atmosphere (High-Volume Method).
25 Noble, C. A. et al (2001); Federal Reference and Equivalent Methods for Measuring Fine Particulate Matter,
Aerosol Sci. & Tech, 34:5, 457-464.
Figure 2 - Location and average number of monitored PM2.5 days each year from the PM2.5
sampling network for the years 2000-2016.
2.1.3 Monitoring Network Design
The ambient air monitoring network is designed to support several objectives. In consideration of
the location and measurement taken, each monitor is assigned a spatial scale. Spatial scales are
generally associated with the size of the area that a pollutant monitor represents. The monitor
spatial scales are defined in 40 CFR part 58, Appendix D as:
1. Microscale: Defines the concentrations in air volumes associated with area dimensions
ranging from several meters up to about 100 meters.
2. Middle scale: Defines the concentration typical of areas up to several city blocks in size
with dimensions ranging from about 100 meters to 0.5 kilometer.
3. Neighborhood scale: Defines concentrations within some extended area of the city that
has relatively uniform land use with dimensions in the 0.5 to 4.0 kilometers range. The
neighborhood and urban scales listed below have the potential to overlap in applications
that concern secondarily formed or homogeneously distributed air pollutants.
4. Urban scale: Defines concentrations within an area of city-like dimensions, on the order
of 4 to 50 kilometers. Within a city, the geographic placement of sources may result in
there being no single site that can be said to represent air quality on an urban scale.
5. Regional scale: Usually defines a rural area of reasonably homogeneous geography
without large sources, extending from tens to hundreds of kilometers.
6. National and global scales: These measurement scales represent concentrations
characterizing the nation and the globe as a whole.
Depending on the distribution and types of sources in an area and the need to determine
particular aspects of the air quality, there may be multiple types of monitors placed in an area.
For example, a large metropolitan area, due to its size, may require several "urban scale" or
"neighborhood" scale monitors to capture the range of air quality in the area. Such an area might
also have "microscale" monitors placed in order to assess the impacts from a single source or
small group of sources as well as a "regional scale" monitor to establish the background air
quality in an area in order to differentiate the impacts from the urban area. Conversely, for a
smaller urban area a single "urban scale" monitor may be considered sufficient to fully
characterize the local air quality. Thus, there is a wide variety of monitors in any area, covering a
range of air quality monitoring needs. For ozone, the appropriate spatial scales are neighborhood,
urban, and regional scale. For PM2.5, in most cases the appropriate spatial scales are
neighborhood, urban, or regional scales; however, in some cases it may be appropriate to
monitor at smaller scales, depending on the monitoring objective.
2.1.4 Air Quality System (AQS) Database
The EPA's AQS database contains ambient air pollution data collected by state, local, and tribal
air pollution control agencies, as well as EPA and other federal agencies, from the monitoring
stations described above (as well as monitoring stations for other NAAQS).6 AQS also contains
meteorological data, descriptive information about each monitoring station, and data quality
assurance/quality control information. The Office of Air Quality Planning and Standards
(OAQPS), state and local air agencies, tribes, and other AQS users rely upon the system data to
assess air quality, assist in attainment/nonattainment designations, evaluate state implementation
plans for nonattainment areas, perform modeling for permit review analysis, and execute other
air quality management functions related to the CAA.
2.2 Statistical Methods and Assessing Significance Using Confidence Intervals
This section provides a general overview of statistical methods, how air quality variability is
characterized for this analysis, and the bootstrapping approach employed to estimate air quality
variability.
2.2.1 General Overview of Statistical Methods
Statistics is the application of mathematical and scientific methods used to interpret, analyze and
organize collections of data. Most statistical techniques are based on two concepts, a
"population" and a "sample." The population represents all possible measurements or instances
of the entity being studied. The sample is a subset of the population that is able to be collected or
measured. Since the sample is only a portion of the population, any observations or conclusions
made about the population based on the sample will have uncertainty, i.e., there will be some
error in those observations or conclusions due to the fact that only a subset of the population was
sampled or measured. Consider the following example:
As discussed above, the ambient monitoring network is designed to capture a range of
ambient impacts from facilities and to characterize both background and local air quality.
Suppose we want to determine the average ground-level PM2.5 levels in a remote state
wilderness area over the course of a year. Assuming the wilderness area does not have
major PM2.5 sources and the area is remote (i.e., there are no major metropolitan areas
upwind), a single, well-placed "regional scale" monitor may be sufficient to capture the
nature of PM2.5 levels in the area (i.e., the PM2.5 levels within the wilderness area are
homogeneous). Due to the remote nature of the monitor, it is only operated on a
1-in-every-6-days schedule, such that one 24-hr average PM2.5 measurement is made every six
days. In this case, we may consider the population to be the 24-hr average PM2.5
concentrations every day (365 potential samples over the whole year) within the
wilderness area. The sample would be the 1-in-every-6-days 24-hr average PM2.5
measurements (60 samples taken over the whole year). From this sample of the
population, a mean 24-hr average PM2.5 concentration can be calculated, which can be
characterized as representing the mean 24-hr average PM2.5 concentration from the
population, with some amount of error between the sample mean and the population
mean. By using information about the size and distribution of the sample, an estimate of
the population variability (i.e., the spread of the distribution), can be determined (e.g., the
standard deviation).
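As a minimal sketch of the population/sample distinction in this example, the Python snippet below simulates a year of daily 24-hr PM2.5 values and draws the 1-in-every-6-days sample; the lognormal population and its parameters are illustrative assumptions, not EPA data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "population": a 24-hr average PM2.5 value (ug/m3) for each of
# the 365 days, simulated with a right-skewed lognormal distribution.
population = rng.lognormal(mean=np.log(7.0), sigma=0.4, size=365)

# The "sample": the roughly 60 measurements a 1-in-every-6-days monitor collects.
sample = population[::6]

print(f"population mean: {population.mean():.2f} ug/m3")
print(f"sample mean:     {sample.mean():.2f} ug/m3  (n = {sample.size})")
print(f"sample std dev:  {sample.std(ddof=1):.2f} ug/m3")
```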
Significance testing, or determining the statistical significance of a particular value as it relates
to a sample, is a major application of statistics. In formal hypothesis testing, a statement of non-
effect or no difference - termed the null hypothesis - is established prior to taking a sample in
order to test the effect of interest. A statistical test is then carried out to determine whether a
significant effect (or difference) is present at the desired level of confidence. Note that not
finding a statistically significant difference is not a claim of the null hypothesis being true or a
claimed probability of the truth of the null hypothesis.26 Non-significance simply shows the data
to be compatible with the null hypothesis under the set of assumptions associated with the
statistical test.26 A CI can be used as a mathematically equivalent procedure26 to a formal
hypothesis test for significance. CIs are constructed based on the desired confidence level and
characteristics of the sample, including the sample variance, to determine error bars for the
statistic of interest, such as the mean. Error bars constructed in this fashion are referred to as CI
because they convey the confidence in the sample estimate of the population given the size of
and the variability in the sample. This can then be used to determine if the mean is significantly
different from a particular value of interest, such as zero or some other threshold for the
pollutant, by examining whether the value of interest is within the CI or outside the bounds of the
CI.
The most well-known approach to deriving CIs uses the characteristics of sampling distributions
and the Central Limit Theorem. The sampling distribution of the mean results from sampling all
possible samples of a specified size n from the true population and considering the distribution of
the resulting means from each sample. The Central Limit Theorem is based on the fact that the
26 Gelman, A. P values and Statistical Practice, Epidemiology, 2013, Vol 24, Num 1, pg 70.
sampling distribution of the sample mean will center around the population mean. Regardless of
the distribution of the original population, the sampling distribution of the mean will be normally
distributed.27 Additionally, the sampling distribution will have a spread, with a standard
deviation that is inversely proportional to the square root of the sample size n (i.e., the larger the
sample size, the tighter the spread of the sampling distribution of the mean around the true mean
of the population). This allows for the derivation of a CI by calculating the estimated mean
plus/minus the standard error, which is a function of the sample size, the standard deviation, and
the desired level of confidence.
To relate these statistical tests to a practical application, we continue the hypothetical example
from above:
Suppose that the observed annual mean PM2.5 concentration for a given year is 7 μg/m3,
and that based on the Central Limit Theorem utilizing the properties of the sampling
distribution, the 95% CI for the annual mean is determined to be 6.4-7.6 μg/m3 (7 μg/m3
+/- 0.6 μg/m3, where 0.6 μg/m3 has been determined based on the standard error and the
desired level of confidence). Since the CI contains the value 7.5 μg/m3, we may,
therefore, conclude based on this specific sample that the mean of the population is not
significantly different from 7.5 μg/m3 at the 0.95 confidence level. Conversely, if the 95%
CI for the annual mean PM2.5 concentration is 6.7-7.3 μg/m3 (7 μg/m3 +/- 0.3 μg/m3),
then the CI does not contain 7.5 μg/m3 and it could be concluded that the mean of the
population is significantly different from 7.5 μg/m3 at the 0.95 confidence level.
The Central Limit Theorem also tells us that due to the Gaussian (Normal Distribution)
properties of a sampling distribution, 68/95/99.7 percent of the values in the theoretical sampling
distribution will be within 1/2/3 standard deviations of the true population mean respectively.
Additionally, in any symmetric distribution such as the Gaussian obtained with the theoretical
sampling distribution, the mean is equal to the median, where the median is the center value such
that 50% of the values are below the median and 50% above. Thus, an alternative approach to
deriving a CI directly utilizes these characteristics of the sampling distribution to consider the
spread around the sampling distribution mean. For example, a 95% CI would be defined as the
lowest value to the highest value of the 95% of the distribution that centers around the sampling
distribution mean. This corresponds to the 0.025 and 0.975 quantiles of the sampling
distribution. An example of this method of determining CIs is given in Figure 3, which shows a
distribution of the mean determined from repeated samples from the population. Note that in
practice the sampling distribution is approximately Normal. The average of the sample means is
6.98 μg/m3. In order to determine the 95% CI, the data are first rank-ordered from smallest to the
largest concentration value, then the bounds of the 0.025 and 0.975 quantiles are the bounds of
the CI (the 50% CI is also shown as an example).
27 These are asymptotic properties given that the sample size n is large and that the number of samples (N) drawn
from the population is large - in theory, all possible samples of size n are drawn from the population. (Moore and
McCabe, 4th Ed, 2003 - p. 262.) In practice, n >30 and N is often 1,000, 10,000, or as determined by convergence
of distributional characteristics, and the resulting sampling distribution is approximately normal.
Figure 3 - Example of CIs determined from a distribution of sample means.
The techniques utilizing the sampling distribution to make inferences about the population mean
can be applied to other statistics as well, such as sample quantiles. Additionally, a statistical
technique applied as resampling from one particular drawn sample, known as bootstrapping, can
be used to generate estimated CIs for any desired statistic. Bootstrapping is further explained in
Section 2.2.3.
The CIs for any sample comparison are generally affected by three main factors: the size of the
sample, the variability within the sample, and the confidence limits desired for the comparison
(e.g., 0.95 level of confidence was used in the example above). Increasing the sample size
(taking more measurements or samples) will increase the representativeness of the sample of the
population and decrease the variance associated with the calculated measurement, resulting in
narrower CIs. Samples from populations with greater inherent variability will have greater
uncertainty and result in larger CIs. Finally, increasing the confidence level of the inferred
conclusion will necessitate larger CIs, while lower confidence thresholds will result in narrower
CIs. There are clearly many complicated aspects of significance testing, many of which require
subjective selections by the analyst to ensure that the results are appropriate to the application
and to reduce the influence of uncontrolled variables on the results and conclusions. These
selections are usually made based on convention and standard practice, such as choosing a 95%
CI. While there are many more applications of statistical techniques and nuances of the
principles described above, these basic concepts of the population, sample, CIs (and their
relationship to probability) are the fundamental concepts used in the development of "significant
impact" thresholds presented here.
2.2.2 Characterizing Air Quality Variability
As discussed in Section 2.1, the DV from a particular monitor is the air quality statistic that is
used to describe the air quality in an area (e.g., the annual mean was the statistic from the
example above) and is compared to the NAAQS to determine attainment status for that area.
Within the conceptual framework discussed in the previous section, the ambient data from a
single monitor are a sample of a population of the air quality in an area and the uncertainty in
that sample stems from the inherent variability that occurs in air quality. The inherent variability
is driven by a collection of factors, both natural (meteorological) and anthropogenic (emissions),
which can be grouped into spatial and temporal categories.
2.2.2.1 Spatial variability
The spatial variability is the change in air quality that is present at any one moment across an
area. This variability is driven by the spatial distribution of sources (causing localized increases
in ambient concentrations due to their emissions), removal or sinks (causing localized decreases
in ambient concentrations due to physical or chemical processes), variations in chemical
production for secondarily formed PM2.5 and ozone (which do not have direct emissions
sources), and meteorology (wind patterns may transport air from areas with higher emissions to
areas that typically have lower concentrations due to fewer localized emissions). The spatial
variability is directly addressed in the network design (i.e., the spatial scale associated with each
monitor and the potential need for multiple monitors to characterize the air quality in an area).
One way to estimate the spatial variability is to compare ambient monitors that are in close
proximity to one another. Such monitors would likely show similar trends in the ambient
concentrations, with some variation due to changes in emissions and meteorology responsible for
transporting pollutants and affecting chemical conversion, creation, and removal of atmospheric
species that are specific to each individual location.
These spatial variations occur in the population of air quality levels and can be estimated from
the existing sample (i.e., data available from the ambient monitoring network). Depending on the
intended scale of the monitor, there is some room for interpretation as to the population that
sample represents (e.g., a sample from an area-wide monitor theoretically represents the
population of air quality across a wide area), and this interpretation has implications for the
determination of the uncertainty associated with the sample (e.g., a sample from an area-wide
monitor is less likely to accurately represent air quality across the whole area at any moment,
thus having greater uncertainty as to its ability to characterize the population of air quality it is
intended to represent). Given the nature of the variability in air quality, there are three potential
populations represented by the sample and the spatial variability between the sample and the
population:
1. If the population is considered to be the air quality at the location of the monitor only,
then there is no spatial variability.
2. If the population is considered to be the air quality in the immediate vicinity of the
monitor, then there will be some spatial variability, the degree of which will depend on
nearby sources and sinks and the distance of the location of interest from these sources
and sinks. For PM2.5, if there is a nearby source of primary PM2.5, changes in wind
direction and mixing conditions will change where these nearby sources have impacts,
such that there would be more spatial variability on this small scale. If there is no nearby
source of primary PM2.5, then secondary PM2.5 would dominate and there would likely be
little spatial variability on this small scale. For ozone, the same is true, in that
there will likely be little spatial variability unless there are nearby sources that act as a
sink (i.e., a major NOx source such as a highway or point source). Without a nearby sink,
the secondary nature of ozone would generally indicate that there is little spatial
variability on this small scale.
3. If the population is considered to be the air quality over a larger scale (e.g., a county or
Core Based Statistical Area or CBSA), then there is much more spatial variability. As
with case 2, the presence and location of sources and sinks will impact how much spatial
variability is present, though on such a large scale, there are likely to be many sources
and sinks across the area, resulting in more spatial variability.
As discussed in Section 2.1.3, monitoring sites are assigned a spatial scale, which is associated
with the size of the area for which a particular monitoring site should be representative of the air
quality. For secondarily formed pollutants, Appendix D to Part 58 states that the highest
concentration monitors may include urban or regional scale monitors (i.e., 50 to hundreds of km
spatial scale). Intuitively, it would be expected that the air quality changes across these distance
scales, such that the air quality across such a large area is not identical to the air quality as
determined by a single monitor. Indeed, these classifications are supportive of the idea that there
are spatial variations, such that multiple monitors are generally needed to adequately characterize
the air quality in an urban area. However, in rural areas with few emissions sources, a single
monitor may be sufficient to characterize the air quality over hundreds of square km (as was the
case in the example above).
2.2.2.2 Temporal variability
In the example introduced in Section 2.2.1, there may be uncertainty not only from the limited
sampling of the population, but also based on changes in the population occurring with time.
Temporal variability is the variability in air quality that occurs over time, which is driven by
changes in emissions and meteorology over a range of time scales. For shorter time scales,
diurnal patterns in both emissions and meteorological processes can impact most atmospheric
pollutants. Mobile source emissions, which can substantially contribute to atmospheric pollution,
have particularly strong daily (i.e., rush-hour) and weekly (no rush-hour on the weekends)
patterns. Day-to-day meteorological variability (i.e., frontal passages and synoptic weather
patterns) can also cause temporal variability on the timescale of days to weeks. At intermediate
time scales, seasonal changes in weather can have a major impact in transport patterns and
chemical reactions. There can be seasonal trends in emission patterns as well, particularly those
associated with energy production and mobile source emissions. At longer time scales, there can
be longer-term trends in meteorology (e.g., particularly warm or wet years) and emission sources
(sources being added or removed or changes in emissions due to emissions controls or economic
conditions) that result in long-term air quality variability. Temporal variability is reflected in the
form of the standard (i.e., compliance with each ozone and PM2.5 standard is based on 3 years of
data in order to reduce the impact of temporal variability on NAAQS implementation
programs). This variability can be addressed by requiring continuous monitoring in an area, even
after air quality levels in an area are below the level of the standard. The long-term temporal
variability can be characterized by examining changes in air quality over time at a particular
monitor (e.g., trends in DVs or other metrics from the monitor). The shorter-term temporal
variability can be described by examining the hourly and daily changes in air quality or by
comparing data from periods with similar meteorological conditions (e.g., afternoon, weekdays
versus weekends, or summertime concentrations).
Whatever the spatial scale of the monitor, temporal variability will always contribute to the air
quality variability, as there will always be day-to-day changes in meteorology and emissions and
variability between seasons and years, which may or may not include any trends in emissions
and meteorology. The form of the standard (e.g., annual average or a ranked daily value), the
temporal resolution of the monitoring data (e.g., hourly or 24-hr averaged samples), and the
frequency of the sampling (e.g., daily samples or samples taken every sixth day) may affect the
ability of the monitoring data to fully capture the inherent temporal variability and thus increase
the uncertainty in any statistic or DV derived from a particular sample. If a monitor has some
missing data, then it is easy to conceptualize that there is some uncertainty caused by temporal
variability in that there are days and hours that are not represented by the monitor. On the other
hand, if a monitor has a perfect sampling record, then the uncertainty due to reduced sampling
frequency is eliminated, but there remains long-term variability. Since the PM2.5 and ozone DVs
are based on 3 years of data, there is variability between the years that affects the DVs. As noted
above, the use of a 3-year DV, rather than a DV derived from 1 or 2 years of data, is intended to
increase the stability (or reduce the variability) of the DVs.
The importance of temporal variability is perhaps more apparent when the applications of the
DVs are considered. For area designation purposes, the DVs are historical (updated DVs for a
particular year are published in the following calendar year), such that the DV is an estimate of
the current state of the air quality in an area. Furthermore, in the permitting process, DVs are
paired with modeling of past years of meteorology and planned future emissions. Thus, the
changes from year-to-year and the uncertainty in estimating future air quality levels are
illustrative of important factors affecting temporal variability that impacts regulatory applications
and exists regardless of the completeness of the sampling record or the spatial scale defining the
population discussed above.
Continuing the example from Section 2.2.1:
Suppose that after 1 year of sampling, there is some commercial development adjacent to
the wilderness area, such that new buildings and larger traffic volumes are present during
the second year of the monitor's operation. One might want to assess whether or not the
new activity has had a notable impact on the average PM2.5 concentrations within the
wilderness area. A comparison between the scenarios can be considered, and the idea that
the difference between the two may be "notable" can be evaluated by comparing that
difference to the estimated CIs created by the bootstrap procedure using the concepts in
significance testing (Section 2.2.1).
2.2.2.3 Assessing air quality variability
Based on the description of the population determined above, the DV can be understood to be a
statistic determined from a sample of the population. CIs for a particular DV can then be used to
compare the DV with another DV or a constant value (e.g., the NAAQS). If the CI for the DV
contains the value of interest, then the DV and the value of interest are statistically
indistinguishable from one another, given the sample data available at a particular confidence
level. In the context of an air quality analysis, if a CI can be determined for a DV. then it can be
concluded that a value within some given amount of variation of a DV (i.e.. within a CI for that
DV) is statistically not significant with respect to that selected level of confidence. Note that in
this context non-significance simply shows the data to be compatible with an assumption of no
difference between the value and the DV.26
2.2.3 Bootstrapping Method
For annual-average standards (i.e., averages of many samples during 1 or 3 years), there are
standard parametric methods (e.g., the standard deviation) that might be used to estimate
variability associated with DVs. When the statistic of interest has a variance that is difficult to
estimate with parametric assumptions, such as a rank order statistic, some other approach must
be taken to determine CIs. For non-normal populations, there are some adjustments that can be
made to determine CIs of the mean if the data conform to some standard distribution (e.g., log-
normal). For small sample sizes, other non-parametric tests such as the Mann-Whitney28 test or
the Wilcoxon signed-rank test29 may be used. However, for many statistics (e.g., the 98th
percentile), the underlying distribution of the statistic may be complicated or unknown, and thus
determination of the CIs for these statistics can be difficult or impossible to determine with
traditional metrics.30 Of the three NAAQS considered here, the annual PM2.5 standard is the only
NAAQS that is based on a sample mean. However, the calculation of the DV statistic for the
annual PM2.5 NAAQS is more complicated than merely taking a simple arithmetic average of the
24-hr PM2.5 values across 3 years; thus, deriving the distribution of the annual PM2.5 DV statistic
28 Mann, H. B.; Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically
Larger than the Other. Annals of Mathematical Statistics 18 (1): 50-60. doi:10.1214/aoms/1177730491.
29 Wilcoxon, F. (Dec 1945). Individual comparisons by ranking methods. Biometrics Bulletin 1 (6): 80-83.
30 Woodruff, R. S. (1952); Confidence intervals for medians and other position measures. J. Amer. Stat. Assoc., 47,
635-646, doi: 10.1080/01621459.1952.10483443.
is not straightforward. The CIs for the 24-hr PM2.5 and ozone NAAQS are based on rank-order
statistics (98th percentile for PM2.5 and 4th highest daily maximum 8-hr ozone concentration, see
Section 2.1), which cannot be easily described using standard statistical techniques. Thus, for the
three DV statistics being analyzed here, an alternative technique to determine CIs is needed.
The bootstrapping method mentioned above is a well-established and accepted statistical method
that allows one to estimate the underlying distribution of many sample statistics (e.g., mean,
percentiles, and correlation coefficients) when the theoretical distribution is complicated or
unknown.7,8,9 The bootstrap method relies on the underpinnings and characteristics of sampling
distributions discussed in Section 2.2. The estimate of the distribution is accomplished by
resampling with replacement from the initial dataset many times, resulting in many resampled
datasets (bootstrapped samples). The sample statistic of interest is then computed from each
resampled dataset, resulting in an empirical estimate of the sampling distribution for the desired
statistic. This estimate of the sampling distribution can then be used to determine CIs for the
statistic of interest. Bootstrapping does not require any distributional assumptions for the
population, nor does it require that there be an established formula for estimating the uncertainty
in the statistic.
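The following is a minimal, generic Python sketch of the percentile-method bootstrap just described, using illustrative data. The helper name bootstrap_ci is hypothetical, and the simple np.quantile call stands in for the rank-based Appendix N selection rules used in the actual DV calculations.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=20_000, level=0.95, seed=1):
    """Percentile-method bootstrap CI for an arbitrary sample statistic."""
    rng = np.random.default_rng(seed)
    # Resample with replacement, keeping the original sample size each time.
    boot = np.array([
        statistic(rng.choice(data, size=data.size, replace=True))
        for _ in range(n_boot)
    ])
    q = (1.0 - level) / 2.0
    return np.quantile(boot, [q, 1.0 - q])

# Example: a CI for the 98th percentile, a rank-order statistic with no
# simple parametric standard error.
rng = np.random.default_rng(2)
sample = rng.lognormal(np.log(10.0), 0.5, size=120)   # illustrative data
lo, hi = bootstrap_ci(sample, lambda x: np.quantile(x, 0.98), level=0.50)
print(f"50% bootstrap CI for the 98th percentile: {lo:.1f}-{hi:.1f}")
```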
Meaningful information on the variability associated with the ozone and PM2.5 DVs can be
derived by using bootstrapping to assess the variability associated with the three DV statistics
(i.e., the ozone DV, the annual PM2.5 DV, and the 24-hr PM2.5 DV).9 This analysis uses ambient
PM2.5 and ozone measurement data taken from the EPA's AQS database to determine CIs for
each monitor for 3-year DV periods (i.e., the 3 years of ambient data required to compute a DV
for these NAAQS). The CIs give a measure of the temporal and spatial variability in the air
quality represented by each monitor. A nationwide analysis of the variability and changes in this
variability over time is also conducted. Finally, the results from this analysis of air quality
variability are used to calculate levels of change in pollutant concentrations that can serve as
"significant impact" thresholds in the context of source-specific "cause or contribute"
determinations.
The dataset used for this technical analysis comes from the AQS database described in Section
2.1 and is the same dataset that would be used for determining the DV at any particular monitor.
The ambient PM2.5 concentration data used for this analysis consist of 24-hr averaged samples,
while the ozone data consist of daily maximum 8-hr average concentrations (i.e., the MDA8 values). This includes
data from all of the monitoring sites in the EPA's AQS database from the years of 2000 to
2016.31
The bootstrapping estimates used in this analysis were calculated independently for each
monitoring site, and the bootstrapping resamples at each site were taken independently within
31 Raw daily and hourly measurements from FRM and FEM monitors are aggregated by AQS into a single daily
value for each sampling site and NAAQS (annual and 24-hr) according to the procedures described in Appendix N
of Part 50. The aggregation procedures in AQS include accounting for multiple monitors at sites, handling of
exceptional events (which can be different between the two PM2.5 NAAQS), and calculating a 24-hr value from 1-hr
measurements. These results reside in the "site daily values" table of AQS, which were downloaded for use in the
current analysis.
each calendar year. The re-sampling within each year is completed such that the re-sampled year
contains the same number of days as the original data. The number of measurements varies by
monitoring site and can have important implications for the inherent variability. The variation in
the sampling schedule is explored further in Section 3.2.2. The re-sampling and computation of
new DVs at each site are conducted to mimic the DV calculation procedures as closely as
possible, which differ for each NAAQS.19,21
For the annual PM2.5 NAAQS, the data from each year was further subset by quarter (i.e.,
Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec), such that the re-sampling did not allow for data
from one quarter to occur in another quarter. The resulting re-sampled dataset was
averaged by quarter; then the quarterly means were averaged to find the annual mean,
with the DV being computed as the average of the three annual means. Design values for
the annual PM2.5 NAAQS were rounded to the nearest tenth of a μg/m3 (i.e., one decimal place),
consistent with the computation of DVs for designation purposes.
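A minimal Python sketch of the quarter-stratified resampling just described for the annual PM2.5 DV; the function names are hypothetical, and details such as data-completeness checks in the official DV calculation are omitted.

```python
import numpy as np

def resample_annual_mean(daily, quarters, rng):
    """One bootstrap estimate of an annual mean: resample with replacement
    within each calendar quarter, average by quarter, then average the four
    quarterly means (data are never moved between quarters)."""
    q_means = []
    for q in (1, 2, 3, 4):
        vals = daily[quarters == q]
        q_means.append(rng.choice(vals, size=vals.size, replace=True).mean())
    return float(np.mean(q_means))

def bootstrap_annual_dv(years, n_boot=20_000, seed=3):
    """years: three (daily_values, quarter_labels) array pairs, one per year.
    Returns n_boot resampled annual PM2.5 DVs rounded to 0.1 ug/m3."""
    rng = np.random.default_rng(seed)
    dvs = np.empty(n_boot)
    for i in range(n_boot):
        annual_means = [resample_annual_mean(d, q, rng) for d, q in years]
        dvs[i] = round(np.mean(annual_means), 1)   # 3-year average, one decimal
    return dvs
```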
For the 24-hr PM2.5 NAAQS, the data from each year was subset by quarter (i.e., Jan-
Mar, Apr-Jun, Jul-Sep, Oct-Dec), such that the re-sampling did not allow for data from
one quarter to occur in another quarter. The number of days in each quarter was kept
equal to the corresponding number in the original dataset. While this isolation of quarters
is not a feature of the DV calculation procedure, it was applied as a precaution to avoid
changing the seasonal balance in the bootstrapped samples. The resulting re-sampled
dataset was then ranked, and the 98th percentile value was selected based on the number
of daily measurements in each year, as described in Table 1 of Appendix N. The DVs
were then computed as the average of the three annual 98th percentile values. Design
values for the 24-hr PM2.5 NAAQS were rounded to the nearest μg/m3, consistent with
the computation of design values for designation purposes.
For the ozone NAAQS, all available data at each site were used. The ozone monitoring
regulations require monitoring for the "ozone season," which varies by state. Many states
operate a subset of ozone monitors outside of the required monitoring season and, when
those data are available, they are used in determining DVs for regulatory purposes. Therefore,
if a monitor operated beyond the required ozone season, all valid data were included in
the DV calculation. For example, if the required monitoring season was from April-
October, but data from November were also available, then the MDA8 values from April-
November were ranked in order to find the 4th highest value. The DVs were then
computed as the average of the three annual 4th highest MDA8 values. Design values for
the ozone NAAQS were truncated to the nearest ppb, consistent with the computation of
design values for designation purposes. Though the regulations for processing ozone data
to compute a DV do not involve segregation of the data by season, a sensitivity analysis
was conducted to determine the impact of applying the same quarterly segregation used
for PM2.5. The results are summarized in Section A.4 of the Appendix, but the results
indicated relatively little sensitivity to this choice for most sites and, thus, no quarterly
segregation was applied for the final analysis.
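A corresponding sketch for the ozone DV (hypothetical function name, illustrative only): all valid MDA8 values in each year are resampled with replacement, the 4th highest value is taken from each resampled year, the three annual values are averaged, and the result is truncated to whole ppb.

```python
import math
import numpy as np

def bootstrap_ozone_dv(mda8_by_year, n_boot=20_000, seed=4):
    """mda8_by_year: three arrays of daily MDA8 values (ppb), one per year.
    Returns n_boot resampled ozone DVs, truncated to whole ppb."""
    rng = np.random.default_rng(seed)
    dvs = np.empty(n_boot, dtype=int)
    for i in range(n_boot):
        fourth_highs = []
        for year in mda8_by_year:
            resampled = rng.choice(year, size=year.size, replace=True)
            fourth_highs.append(np.sort(resampled)[-4])   # 4th highest MDA8
        dvs[i] = math.floor(np.mean(fourth_highs))        # truncate to ppb
    return dvs
```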
For both PM2.5 and ozone, each year of data from each site was re-sampled 20,000 times. During
initial development of the method, the distributions derived from the bootstrap analysis did not
appear to change after 3,000-4,000 re-samples for several single calendar years. Therefore,
20,000 re-samples were chosen to conservatively ensure that stable results were obtained for all
cases. For each 1-year re-sample for each pollutant, the relevant annual statistic was computed
(annual mean for PM2.5, 98th percentile for PM2.5, and 4th highest MDA8), giving 20,000
estimates of the annual statistic for each year. In order to replicate the way in which the standard
is calculated, the data from each year are resampled separately from the other years. In order to
calculate the bootstrap samples in a manner consistent with the DV calculations (i.e., calculating
averages and 98th percentile values in each year independently), then averaging the three annual
values, each of the 20,000 estimates for year 1 were averaged with the corresponding 20,000
estimates for year 2 and year 3, giving 20,000 estimates of the DV. From the 20,000 estimates,
the mean, median, standard deviation, maximum, minimum, and 25%, 50%, 68%, 75%, and 95% CIs
for the mean32 were computed and retained for further analysis. For a symmetric distribution such
as the Normal Distribution obtained with the sampling distribution, the mean is equal to the
median, where the median is the center value such that 50% of the values are below the median
and 50% above. Thus, a bootstrapped CI for the mean is analogous to a bootstrapped CI for the
median and the CIs can be calculated by rank-ordering the bootstrap results and selecting the
bounds that contain the corresponding percentage of data. Since data from 2000-2014 were
processed, all possible 3-year DVs from 2002-2014 were computed, for a total of 13 DV-years,
including five 3-year periods that had non-overlapping years (i.e., 2000-2002, 2003-2005, 2006-
2008, 2009-2011, and 2012-2014).33
and a measure of the air quality variability, we frequently refer to each CI as the uncertainty
associated with the actual DV.
The following gives an example of how the CIs are determined utilizing the percentile method34
for the 24-hr PM2.5 DVs from a monitor:
Consider the dataset X0, which contains 150 measurements of 24-hr averaged PM2.5
monitoring values from year 1. Datasets Y0 and Z0 contain data from the same site, but
for years 2 and 3, respectively, and contain 250 and 350 days of data, respectively.
From X0, we calculate the 98th percentile as the 3rd highest value in the dataset. From Y0,
we calculate the 98th percentile as the 5th highest value in the dataset. From Z0, we
calculate the 98th percentile as the 7th highest value in the dataset. The DV for this site is
the average of the 98th percentiles from X0, Y0, and Z0.
32 Here, and elsewhere in this document, a CI for the median is the interval spanning the data that contains half of the
CI of the data above the median and half of the CI of the data below the median of the re-sampled DV estimates. For
example, the 50% CI consists of the 25% of the data above the median and the 25% of the data below the median.
33 Later in this document, whenever a single year is used to identify a DV, it refers to the last year of the 3-year
period.
34 Efron, B.; Tibshirani, R. (1993); An Introduction to the Bootstrap. Boca Raton, FL: Chapman & Hall/CRC. ISBN
0-412-04231-2.
From X0, 20,000 new sample datasets, X1, X2, ..., X20,000, each with 150 measurements of
PM2.5, are sampled with replacement from the original dataset X0. Likewise, 20,000 new
sample datasets are sampled with replacement from Y0 and Z0.
For each Xi, the 98th percentile value is the 3rd highest value; for each Yi, the 98th
percentile is the 5th highest value; and for each Zi, the 98th percentile is the 7th highest
value. Thus, the DV for each subset, DVi, is the average of the 3rd highest value from Xi, the
5th highest value from Yi, and the 7th highest value from Zi. This calculation yields 20,000
different DVs.
To determine the CIs from these 20,000 DVs, the DVs are ranked from low to high. Then
the lower bound for the 50% CI is the 5,000th ranked DV, and the upper bound for the
50% CI is the 15,000th ranked DV. That is, the CIs are determined simply by ranking the
resulting distribution of DVs, and the (1-q)% CI for the mean is given by the bounds of the
center of the data that contains (1-q) percent of the results (i.e., the lower bound is the
(q/2)th quantile and the upper bound is the (1-q/2)th quantile).
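A minimal Python sketch of this worked example (hypothetical function name; the fixed ranks 3/5/7 follow from the 150/250/350-day sample sizes per Table 1 of Appendix N, as stated above):

```python
import numpy as np

def bootstrap_24hr_dv(x0, y0, z0, ranks=(3, 5, 7), n_boot=20_000, seed=5):
    """Resample each year with replacement, take each year's 98th percentile
    as its k-th highest value (k fixed by the original sample size), average
    the three annual values, and read CIs off the ranked bootstrap DVs."""
    rng = np.random.default_rng(seed)
    years = (x0, y0, z0)
    dvs = np.empty(n_boot)
    for i in range(n_boot):
        annual_p98 = [np.sort(rng.choice(yr, size=yr.size, replace=True))[-k]
                      for yr, k in zip(years, ranks)]
        dvs[i] = np.mean(annual_p98)
    dvs.sort()
    # 50% CI: the 5,000th and 15,000th ranked DVs out of 20,000.
    ci50 = (dvs[n_boot // 4 - 1], dvs[3 * n_boot // 4 - 1])
    return dvs, ci50
```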
Section A.1 provides several illustrative examples of the bootstrapping analysis for both the
annual and 24-hr PM2.5 NAAQS with actual data from six different sites.
3.0 Results of the Air Quality Variability Approach
This section provides results on characterizing the variability of air quality for ozone and PM2.5
based on EPA's Air Quality Variability approach.
3.1 Ozone results
The results from the bootstrap analysis for the 2014-2016 ozone DVs are shown in Figure 4,
which shows the mean, median, minimum, and maximum bootstrap DVs for each monitor, as
well as the upper and lower bounds of the 25%, 50%, 68%, 75%, and 95% CIs for the median
DV calculated from the 20,000 bootstrap samples as a function of the DV determined from the
original dataset (top panel), the relative differences between the CI DVs and the actual DVs
(middle panel), and box-and-whisker plots of the distribution of the relative difference at each CI
(bottom panel). The mean and median of the bootstrap DVs for the ozone NAAQS replicate the
actual DV from the original site data fairly well, with some very small deviations (maximum
deviation is less than 5%). Even though the ozone NAAQS is based on peak values (similar to
the 24-hr PM2.5 NAAQS), the magnitude of the relative variability in the ozone bootstrap DVs
ranges from 1-5%, with maximums around 25-30%. This is likely due to the nature of ozone
formation (i.e., ozone is almost exclusively a secondarily formed pollutant, with precursors
typically originating from multiple sources rather than a single source). There is also a
component of reaction/formation time; both factors are likely to reduce the spatial and temporal
variability of ambient ozone. There is an increase in the absolute variability with an increase
in the baseline DVs, but there is not an apparent trend in the relative variability. This indicates
that the baseline air quality does not systematically affect the relative amount of variability at a
site. This is especially important because it indicates that a central tendency value for the relative
23
-------
variability in the DV for the ozone NAAQS is stable across levels of ozone concentrations.
Therefore, a representative value can be multiplied by the level of that NAAQS to obtain a value
in concentration units (ppb for ozone) that is appropriately used to characterize variability for
sites with air quality that "just complies" with the NAAQS.
24
-------
Figure 4 - Bootstrap results for the ozone 2014-2016 DVs (25%, 50%, 68%, 75%, and 95% CIs,
along with the mean and median bootstrap DVs). The top panel shows the values for the DVs at
the various CIs, the middle panel shows the average of the relative difference between the upper
and lower bounds of the CI and the actual DV, and the bottom panel shows the distribution of
the relative differences between the CI and the actual DV.
25
-------
3.2 PM2.5 Results (Annual and 24-hr)
The results from the bootstrap analysis for the 2014-2016 DVs are shown in Figures 5 and 6. The
top two panels of Figure 5 show the upper and lower limits of the 25%, 50%, 68%, 75%, and
95% CIs for the median as well as the mean, median, minimum and maximum DVs calculated
from the 20,000 bootstrap samples as a function of the DV determined from the original dataset.
Variability is greater for the 24-hr PM2.5 NAAQS than the annual PM2.5 NAAQS. This is not
surprising since the mean is expected to be a more stable statistic than the 98th percentile. Since
the PM2.5 data distributions tend to be skewed to the right (see examples in the Appendix), the
presence of a few very high concentration values, or "outliers," in the original dataset for a year
would tend to increase the variability associated with any metric based on the highest
concentrations (e.g., if the 50th percentile value were determined, it would likely have much less
variability than the 98th percentile). The mean and median of the bootstrap DVs for the annual
NAAQS almost perfectly replicate the actual DV from the original site data. While some
deviations of the mean and median bootstrap DVs from the actual 24-hr NAAQS DV are
evident, there are only a few sites where the mean and median bootstrap DVs deviate
substantially from the actual DV.
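The claim that a peak-based statistic is more variable than the mean under right-skewed data can be checked with a small simulation. This is an illustrative sketch under assumed lognormal data, not a reproduction of the analysis in this document:

```python
import numpy as np

rng = np.random.default_rng(0)
# One synthetic "year" of right-skewed 24-hr values (lognormal, like ambient PM2.5).
data = rng.lognormal(mean=2.0, sigma=0.6, size=365)

n_boot = 20000
boot_means = np.empty(n_boot)
boot_p98s = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = resample.mean()
    boot_p98s[i] = np.percentile(resample, 98)

# Compare the relative width of the 50% CI for each statistic.
for name, boot in (("mean", boot_means), ("98th percentile", boot_p98s)):
    lo, hi = np.percentile(boot, [25, 75])
    print(f"{name}: 50% CI width is {(hi - lo) / np.median(boot):.1%} of the median")
```

In such simulations the mean typically bootstraps with a much narrower relative CI than the 98th percentile, consistent with the lower variability seen above for the annual NAAQS.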
The relative variability (i.e., the difference between the bounds of the bootstrapped CI and the
actual design value for a single monitoring site, divided by the actual design value for the site) is
also shown in Figure 5, with distributions of the relative differences for each CI across
monitoring sites shown in Figure 6. Viewing the results on a relative scale allows the display of
finer details of the deviations between the bootstrap results and the actual DVs. The relative
variability shows that for the annual NAAQS there are relatively small differences in the values
corresponding to the 25%, 50%, 68%, and 75% CIs compared to the difference between these
and the 95% CI. Similarly, for the 24-hr NAAQS, the values corresponding to the 50%, 68% and
75% CIs are fairly close to each other, with greater differences between these and the 25% CI on
the low end and the 95% CI on the high end. The relative variability shows an important feature:
in a relative sense, the air quality variability is fairly stable as the baseline air quality
worsens. That is, there is no notable increase in the relative variability of the bootstrap DV as the
actual DV increases. This is important because it indicates that the magnitude of the actual DV
does not systematically affect the relative variability in the bootstrap DV at a site and that a
central tendency value for the relative variability in the DV is stable across concentration levels.
Therefore, a representative value can be multiplied by the level of that NAAQS to obtain a value
in concentration units (µg/m3 for PM2.5) that is appropriately used to characterize variability for
sites with air quality that "just complies" with that NAAQS.
26
-------
Figure 5 - Bootstrap results for the PM2.5 2014-2016 DVs (25%, 50%, 68%, 75%, and 95% CIs,
along with the mean and median bootstrap DVs). The top two panels show the values for the
DVs at the various CIs, while the bottom two panels show the average of the percent difference
between the upper and lower bounds of the CI and the actual DV.
27
-------
Figure 6 - Box-and-whisker summaries of the relative differences between the bootstrap CI
bounds and the actual DVs at each CI across monitoring sites (top panel: annual NAAQS
bootstrap summary; bottom panel: 24-hr NAAQS bootstrap summary).
28
-------
3.2.1 Analysis of PM2.5 Spatial Variability
Section 2.1.3 discusses the design of the monitoring network and the spatial scales associated
with each monitor. While the area around a monitor may change after the scale was determined
when the monitor was sited, the monitor scale should remain somewhat reflective of air
quality within the indicated area. This basic need for multiple monitor scales and multiple
monitors in an area to assess an area's air quality is due to the fact that there is an inherent spatial
variability of air quality. For example, due to the inherent variability in the location of emission
sources and changes in meteorological patterns, two "urban scale" monitors located a few blocks
from each other would likely record different daily values, resulting in different DVs. The
analysis conducted here seeks to quantify that spatial variability by identifying pairs of monitors
that are located in proximity to one another to determine the relative difference between the two
monitors, as indicated by the DVs. The differences between the DVs are interpreted as a measure
of the spatial variability in the area and provide a benchmark to evaluate the variability
determined from the Bootstrap analysis.
The analysis was conducted using the 2012-2016 annual and 24-hr PM2.5 DVs and focused on
pairs of monitors which collected PM2.5 samples every day (1:1 monitors) in order to reduce the
impact of temporal variability (see Section 4.3.1 for an analysis of the temporal variability). A
total of 70 1:1 monitors were identified that were separated by a distance of less than 50 km,
with 13 less than 10 km apart. We did not investigate whether based on emission sources,
winds, and terrain any of these sites could reasonably be considered representative for
particular locations at which a new source could seek a permit in the future.
The results from the analysis are summarized in Table 1 (monitor pairs within 10 km) and in
Figures 7, 8 and 9 (monitor pairs within 50 km). There is a fairly strong correlation between the
DVs in the site pairs (top panels in Figure 7), with a slope of 0.8 (r2 of 0.51) between monitor
pairs less than 50 km apart for the annual NAAQS and a slope of 0.87 (r2 of 0.59) for the 24-hr
NAAQS. There are no obvious trends in the differences between the monitors, either the absolute
differences or the relative differences (defined as the absolute difference between the DVs from
the two monitors divided by the average DV). The relative differences range from 0% to 66%,
with a median relative difference of 9% for the annual DVs. For the 24-hr DVs, the relative
differences range from 0% to 67%, with a median relative difference of 6%. When the subset of
monitors within 10 km are considered, the slope between paired monitors is similar for the
annual NAAQS, though the r2 increases to 0.82, while the slope for the 24-hr NAAQS increases
to 0.97 and the r2 increases to 0.94. For this subset, the maximum relative differences drop to
23% and 16% for the annual and 24-hr DVs, respectively, and the median relative differences
drop to 5% and 4%, respectively.
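The relative difference statistic used here is simple to compute. The following is a minimal sketch, with hypothetical column names and the first few annual DV pairs from Table 1 used purely for illustration; it assumes pandas is available:

```python
import pandas as pd

# Hypothetical paired-monitor table, patterned on Table 1 (annual DVs, ug/m3).
pairs = pd.DataFrame({
    "dv_monitor_1": [8.1, 4.9, 10.3, 11.6],
    "dv_monitor_2": [8.8, 5.6, 10.9, 10.3],
})

# Relative difference as defined in the text:
# |DV1 - DV2| divided by the average of the two DVs.
mean_dv = (pairs["dv_monitor_1"] + pairs["dv_monitor_2"]) / 2.0
pairs["rel_diff"] = (pairs["dv_monitor_1"] - pairs["dv_monitor_2"]).abs() / mean_dv

print(pairs["rel_diff"].median())  # central tendency of the spatial variability
```

For the first pair, |8.1 − 8.8| / 8.45 ≈ 8%, matching the Delta column in Table 1.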
These results are interesting and somewhat contrast with the results from the bootstrap analysis.
The paired-monitor comparison suggests that there is more spatial variability associated with the
annual NAAQS than with the 24-hr NAAQS, whereas the bootstrap results show less variability
in the annual NAAQS than in the 24-hr NAAQS. Despite this
29
-------
apparent contradiction, these results make sense in the context of secondary pollutants,
particularly PM2.5. In general, the highest concentrations associated with pollutants that have a
substantial portion due to secondary formation occur in widespread "events". These events are
an important aspect of the air quality in an area and are associated with unique meteorological
conditions, which can either transport air from polluted upwind regions, increasing the
background concentrations, or trap local pollutants and facilitate in-situ production. Events are
also associated with unique emissions episodes, such as dust storms or biomass burning events
that emit large quantities of primary and precursor pollutants. Because of the nature of PM2.5
events, there would tend to be a stronger correlation of the higher concentrations across larger
spatial scales. The average air quality (annual NAAQS), on the other hand, would not be as
heavily impacted by these unique (and widespread) events and instead would be more heavily
affected by local emissions and production. As such, the prevailing meteorological conditions
and the prevalent local emission sources would have the most impact on the annual DVs. In this
case, localized differences in emissions could cause monitors to have greater differences in the
annual DVs than is seen at a number of site pairs.
The result from the spatial variability analysis of PM2.5 also suggests an important link to
temporal variability of PM2.5. These transport and emissions events occur infrequently and with
varying intensity, such that they may not occur in every year, and their frequency and duration
vary. Even when these events do occur, the intensity and the impact on regional and local air
quality would vary and be difficult to predict. Since the bootstrap results show
that the 24-hr NAAQS has the most variability, this seems to imply that temporal variability is the
most important component of the 24-hr NAAQS variability, while the spatial variability may be
the most important component of the annual NAAQS variability, based on the results from the
spatial analysis.
30
-------
Table 1 - Summary of results from PM2.5 spatial variability analysis for monitor pairs within 10
km of one another.

Annual PM2.5 NAAQS
State          City              Dist (km)  Monitor 1 ID  DV 1         Monitor 2 ID  DV 2         Delta (%)35
Minnesota      Washington        1.0        271630447     8.1 µg/m3    271630448     8.8 µg/m3    8%
Hawaii         Honolulu          1.7        150031001     4.9 µg/m3    150031004     5.6 µg/m3    14%
Pennsylvania   Philadelphia      2.6        421010047     10.3 µg/m3   421010057     10.9 µg/m3   5%
Pennsylvania   Philadelphia      3.1        421010055     11.6 µg/m3   421010047     10.3 µg/m3   12%
Louisiana      East Baton Rouge  5.4        220330009     9.0 µg/m3    221210001     9.2 µg/m3    3%
Nevada         Washoe            5.5        320310016     7.9 µg/m3    320311005     10.0 µg/m3   23%
Pennsylvania   Northampton       5.7        420950025     10.5 µg/m3   420950027     10.1 µg/m3   4%
Rhode Island   Providence        5.9        440070022     7.1 µg/m3    440071010     7.4 µg/m3    3%
Iowa           Clinton           6.4        190450019     10.6 µg/m3   190450021     9.4 µg/m3    11%
Utah           Salt Lake         7.3        490353006     9.2 µg/m3    490353010     9.7 µg/m3    5%
New Mexico     Bernalillo        7.9        350010023     6.5 µg/m3    350010024     6.3 µg/m3    3%
Indiana        Marion            8.9        180970078     11.1 µg/m3   180970081     11.8 µg/m3   6%
Indiana        Clark             9.3        180190006     11.8 µg/m3   211110067     11.3 µg/m3   4%

24-hr PM2.5 NAAQS
State          City              Dist (km)  Monitor 1 ID  DV 1         Monitor 2 ID  DV 2         Delta (%)35
Minnesota      Washington        1.0        271630447     20.6 µg/m3   271630448     21.1 µg/m3   3%
Hawaii         Honolulu          1.7        150031001     10.9 µg/m3   150031004     11.4 µg/m3   5%
Pennsylvania   Philadelphia      2.6        421010047     24.3 µg/m3   421010057     25.2 µg/m3   4%
Pennsylvania   Philadelphia      3.1        421010055     26.4 µg/m3   421010047     24.3 µg/m3   8%
Louisiana      East Baton Rouge  5.4        220330009     19.7 µg/m3   221210001     19.4 µg/m3   2%
Nevada         Washoe            5.5        320310016     26.8 µg/m3   320311005     31.5 µg/m3   16%
Pennsylvania   Northampton       5.7        420950025     27.2 µg/m3   420950027     28.3 µg/m3   4%
Rhode Island   Providence        5.9        440070022     18.3 µg/m3   440071010     18.6 µg/m3   2%
Iowa           Clinton           6.4        190450019     24.7 µg/m3   190450021     22.8 µg/m3   8%
Utah           Salt Lake         7.3        490353006     42.3 µg/m3   490353010     41.0 µg/m3   3%
New Mexico     Bernalillo        7.9        350010023     15.4 µg/m3   350010024     15.1 µg/m3   2%
Indiana        Marion            8.9        180970078     25.0 µg/m3   180970081     26.4 µg/m3   5%
Indiana        Clark             9.3        180190006     24.2 µg/m3   211110067     22.8 µg/m3   6%
35 Defined as the difference between the two monitored DVs divided by the mean DV of the two monitors.
31
-------
Figure 7 - Results from the analysis of spatial variability. Left column shows results for annual
PM2.5 NAAQS and the right column shows the results for the 24-hr PM2.5 NAAQS.
32
-------
Figure 8 - Spatial distribution of the difference between the DVs from spatial analysis of the
2012-2016 PM2.5 annual DVs. Top panel shows the absolute value of the difference between the
two monitors while the bottom panel shows the percent difference between monitors.
33
-------
Figure 9 - Spatial distribution of the difference between the DVs from spatial analysis of the
2012-2016 PM2.5 24-hr DVs. Top panel shows the absolute value of the difference between the
two monitors while the bottom panel shows the percent difference between the two monitors.
34
-------
3.2.2 Analysis of the Influence of PM2.5 Monitor Sampling Frequency
The PM monitoring network was not originally designed to operate continuously. When initially
designed and deployed, the monitoring requirements for PM indicated that many sites only needed to
sample on every third or sixth day, with a smaller number required to sample every day. This
was partly due to the technology available at the time, which required a person to collect the
filter sample and reload the filter cartridge for each sample taken. The filters were then
transported to a laboratory for weighing analysis. While much of the PM2.5 network still relies
on filter-based sampling, systems that can load multiple filters and automatically swap out filters
after each 24-hr monitoring period have reduced the labor requirements. Non-filter based
measurement techniques have also been developed that allow for continuous operation (as well
as 1-hr sampling) so that concentration values are provided for every 24-hr period. Additionally,
the requirements for sampling frequency have tightened, requiring more frequent sampling,
particularly in areas with DVs close to the NAAQS. The result of the technological and
regulatory changes is a sampling network with varied sampling frequency, with notable changes
in the sampling frequency over time (see Figure 10). The total number of sites in the network has
decreased, but the number of 1:1 sites has increased. Many 1:6 and 1:3 sites have been replaced
by 1:1 sites, a trend most obviously starting around 2008. (The site classification was based
solely on the number of daily samples during the course of the year; i.e., sites with 60 or fewer
samples were classified as 1:6, sites with more than 60 but no more than 121 samples were
classified as 1:3, and sites with 122 or more samples were classified as 1:1; a sketch of this
classification follows below.)
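A minimal sketch of this classification rule, assuming the thresholds stated in the parenthetical above are applied to annual sample counts:

```python
def classify_sampling_frequency(n_samples_in_year: int) -> str:
    # Thresholds from the text: <=60 samples -> 1:6, 61-121 -> 1:3, >=122 -> 1:1.
    if n_samples_in_year <= 60:
        return "1:6"
    elif n_samples_in_year <= 121:
        return "1:3"
    else:
        return "1:1"

assert classify_sampling_frequency(58) == "1:6"   # roughly every-sixth-day sampling
assert classify_sampling_frequency(115) == "1:3"  # roughly every-third-day sampling
assert classify_sampling_frequency(350) == "1:1"  # daily sampling
```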
Due to the nature of temporal variability, it would generally be expected that data from sites
with less frequent sampling would have a higher sample variance and therefore wider
confidence intervals. Sensitivity tests conducted with the 2010-2013 DVs indeed
showed that statistics from the subset of sites with daily monitoring (1:1) have tighter confidence
intervals than the subset of sites with 1:3 monitoring and all data (which includes 1:6 monitors)
(see Table 2). However, since the 1:1 monitors are not sampling the same air as the 1:3 monitors,
it is difficult to directly compare the results from these subsets as a definitive indicator of the
inherent increase in variability due to less frequent sampling. However, the results do support
what is generally expected from reduced sampling frequency (i.e., while 1:1 monitoring might
capture a wider range of air quality, less frequent sampling would likely result in increased
sample variance and wider confidence intervals for statistics from the air quality measurement
data).
Since the monitor sampling frequency can have a notable impact on the calculated air quality
variability, an important question arises regarding which monitors should be used to characterize
air quality variability. Using only the 1:1 monitors would likely produce smaller estimates of the
sample variance due to the increased sample size while possibly capturing a wider range of air
quality across a more widely sampled spectrum. However, the 1:3 and 1:6 monitors are part of
the monitoring network and will continue to be present for the foreseeable future. Additionally,
despite an increase in the number of 1:1 monitors, the overall air quality variability indicated by
the network has been fairly stable for the annual and 24-hr PM2.5 NAAQS (see Section 4.3.1).
This suggests that the inherent variability in the air quality is more influential than the increased
35
-------
variability induced by the presence of 1:3 and 1:6 monitors. In addition, the much greater
number of monitoring sites available when sites with all schedules are considered (see Table 2)
provides more confidence that the results are representative of the U.S. as a whole.
Table 2 - Summary of comparison of the air quality variability determined by the bootstrap
analysis for PM for three design periods for monitors with different sampling frequencies.

Annual NAAQS                               2014                    2015                    2016
Monitor class                         all    1:1    1:3       all    1:1    1:3       all    1:1    1:3
Difference, median bootstrap vs actual 0.04%  0.02%  0.04%     0.03%  0.03%  0.06%     0.04%  0.03%  0.03%
Avg. 25% CI span                      0.67%  0.57%  0.94%     0.70%  0.58%  0.91%     0.71%  0.62%  0.88%
Avg. 50% CI span                      1.63%  1.14%  1.81%     1.65%  1.24%  1.85%     1.69%  1.22%  1.85%
Avg. 68% CI span                      2.44%  1.72%  2.67%     2.46%  1.77%  2.74%     2.45%  1.79%  2.76%
Avg. 75% CI span                      2.80%  1.92%  3.11%     2.83%  2.00%  3.09%     2.82%  2.08%  3.18%
Avg. 95% CI span                      4.72%  3.33%  5.26%     4.86%  3.43%  5.38%     4.79%  3.47%  5.48%

24-hr NAAQS                                2014                    2015                    2016
Monitor class                         all    1:1    1:3       all    1:1    1:3       all    1:1    1:3
Difference, median bootstrap vs actual 1.14%  0.67%  1.54%     1.36%  0.84%  1.78%     1.23%  1.01%  1.40%
Avg. 25% CI span                      2.27%  1.89%  2.38%     2.27%  1.92%  2.50%     2.50%  2.17%  2.63%
Avg. 50% CI span                      4.29%  2.94%  4.76%     4.17%  3.45%  4.65%     4.35%  3.13%  5.13%
Avg. 68% CI span                      6.00%  4.76%  7.02%     6.25%  5.09%  7.14%     6.52%  5.00%  7.89%
Avg. 75% CI span                      6.82%  5.36%  8.33%     7.50%  5.56%  8.33%     7.69%  5.77%  8.88%
Avg. 95% CI span                      12.50% 9.40%  14.14%    12.50% 10.00% 14.81%    13.16% 9.62%  16.67%

Number of sites                       507    182    274       531    210    270       535    237    240
36
-------
Figure 10 - PM2.5 sampling frequency trends, 2000-2016 (top panel: number of sites with each
sampling frequency; middle panel: mean sampling frequency by site type; bottom panel:
distribution of the number of 24-hr samples for 1:6 sites).
37
-------
4.0 Application of Air Quality Variability to Calculate SILs for the PSD Program
For a specific change in air quality concentrations to be used to show that a proposed source does
not cause or contribute to a violation of the NAAQS, the concentration change must represent a
level of impact on ambient air quality that is "insignificant" or not meaningful. The EPA has
taken into account the necessary policy considerations in conjunction with the statistical analysis
presented here to provide a rational basis to select values derived from the statistical analysis that
can be applied to represent "insignificant impacts."
Section 3 presented the results from the bootstrap analysis, which produced variability estimates
at the 25%, 50%, 68%, 75%, and the 95% CIs for all the AQS data across the U.S. from 2000-
2016. This section presents the technical considerations related to the policy2 considerations
guiding the application of the above results to identify an appropriate SIL for each context, and
the final values the EPA has selected from the study results.36
4.1 PSD Air Quality Analyses and Statistical Significance
The following four factors are important for EPA's choice of a SIL: determining a CI to
represent the inherent variability for purposes of the NAAQS compliance demonstration, an
approach for scaling local variability to the level of the NAAQS, the geographic extent of each
summary value, and the DV year or years from which to use the variability results. The EPA has
balanced the necessary policy considerations in conjunction with technical information discussed
here and in the Policy Document2 to develop SIL values that represent, in the Agency's
judgment, an appropriate measure of "insignificant impact" that can be used by PSD permitting
authorities to determine if emissions from proposed construction will "cause or contribute" to a
violation of the corresponding NAAQS.
4.1.1 Confidence Interval
The bootstrap analysis produced estimates for the 25%, 50%, 68%, 75%, and 95% CIs in order to
characterize the range of the inherent variability and to provide options for selecting an
appropriate "insignificant impact" level that will be applied to determine each SIL. The statistical
framework that forms the basis for the bootstrap CIs can be related to more traditional
assessments of statistical significance and statistical significance testing. The traditional
application of statistical significance testing seeks to determine whether a deviation from the
base value is significant, whereas the usage here seeks to establish that a deviation is not
significant. In order
to make this determination, larger CIs are typically selected (e.g., 90-99%, which results in a
36 The methods, analysis, and application to the PSD program was subject to a peer-review. The results of that peer-
review and the subsequent changes to the analysis and the document are detailed in a companion report, U.S. EPA,
2018, Peer review report for the technical basis for the EPA's development of significant impact thresholds for PM2.5
and ozone, RTP, NC, EPA 454/S-18-001, available from the U.S. EPA RTP library.
38
-------
high level of confidence that a deviation from the base value is indeed significant). In practice,
the smallest CI that might be considered for a similar significance determination would be the
68% CI, which corresponds to one standard deviation of the mean for a normally distributed
sample. Thus, any deviation larger than the bounds of the 68% CI could traditionally be
identified as a significant deviation from the mean. In this application for the PSD program,
however, we are seeking for each NAAQS a value below which we can conclude that the change
in air quality is "not statistically significant" {i.e., that there will not be a notable difference in air
quality after the new source begins operation). Thus, a CI that could potentially be considered to
represent a significant value would not simultaneously be appropriate for identifying a value that
is statistically not significant. As such, CIs used for identifying a value that is not statistically
significant value should be below 68%. For the reasons described in the Policy Document, the
50% CI was chosen as the benchmark statistic from the bootstrap analysis to represent the
recommended SILs in PSD permitting for ozone and PM2.5 NAAQS.
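The correspondence between the 68% CI and one standard deviation noted above can be illustrated with a small simulation. This is a sketch under assumed normal data, not part of the SIL derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=2.0, size=500)

# Bootstrap distribution of the sample mean.
boot = np.array([rng.choice(sample, sample.size).mean() for _ in range(20000)])

lo, hi = np.percentile(boot, [16, 84])          # 68% percentile-method CI
se = sample.std(ddof=1) / np.sqrt(sample.size)  # one standard error of the mean
print(f"68% CI half-width: {(hi - lo) / 2:.4f};  standard error: {se:.4f}")
```

The two printed values should be nearly equal, confirming that the 68% percentile CI spans roughly plus or minus one standard deviation of the mean for normally distributed samples.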
4.1.2 Adjustment to the Level of the NAAQS
Since air quality variability may have different characteristics at different baseline air quality
levels (e.g., areas with smaller DVs may have less variability than areas with higher DVs), it is
reasonable to characterize the variation in the air quality across a range of air quality levels.
Sections 4.2 and 4.3 present the 50% CI value on both an absolute scale (µg/m3 and ppb) and a
relative scale (percentage), where the relative variability is defined as the percent change from
the base DV at each site. The figures in these sections indicate that there is less of a trend in the
relative variability compared to the absolute variability, and no trend in the relative variability
for the ozone DV at any of the CIs (i.e., the relative variability is not particularly higher or lower
at higher or lower baseline DVs; see Figures 11 and 14). Moreover, the relative variability was
fairly consistent across the range of design values, suggesting a commonality in the relative
variability across a wide range of geographic regions, chemical regimes, and baseline air quality
levels. These results suggest that there is an inherent aspect to the variability, regardless of the
baseline air quality. Thus, for reasons explained in the policy memorandum, the relative
variability values are used for the SILs development.
4.1.3 Selection of a Geographic Scale
A fundamental question raised in using air quality variability to inform the selection of a value
for a SIL is whether the variability-based SIL value should be based on an analysis of air quality
variability at the particular site of the new source or modification, or whether the SIL value
should reflect the central tendency of all monitored sites in the U.S., regardless of the new
source's or modification's planned location.
The EPA recognizes that the air quality data and the nature of the emissions and chemical
formation of ozone and PM2.5 can impact areas differently and, thus, should be considered as part
of this evaluation. The analyses presented in Sections 4.2 and 4.3 (Figures 11 and 14) examine
the relative variability represented by the 50% CI to explore any spatial trends in the data. The
analysis indicates that while there is evidence of local spatial correlation (i.e., most areas have
fairly similar levels of relative variability and sites with higher variability are isolated), there
39
-------
are no large scale (i.e., region-to-region) trends in ambient air variability. While there is a fairly
consistent range of variability across the U.S., the magnitude of the variability differs from site-
to-site within a state or region with few instances of regional patterns and no strong instances of
east/west or north/south trends.
The analysis shows that a small number of sites with particularly high variability have an effect
on the average network-wide variability. A median network-wide variability is not overly
influenced by a few outliers. Thus, for the reasons explained in the Policy Document, the median
variability from the 50% CI from the entire U.S. ambient monitoring network is used to calculate
SIL values.
4.1.4 Selection of the Three Most Recent Design Value Years
Sections 4.2.1 and 4.3.1 present trends in the median nation-wide variability at the 50% CI from
2000-2016 (equivalent to DV years of 2002-2016). For all three NAAQS considered here, there
are general downward trends in the computed variability across these years. Since the SILs
should reflect the most representative state of the atmosphere, the analysis uses for each NAAQS
the lower variability observed in the more recent periods, rather than all the data since 2000.
However, it may be advantageous to avoid relying on a single 3-year period that may have been
influenced by unusual circumstances, particularly in light of the slightly different trends in the
last several years across pollutants (i.e., most recently the 24-hr PM2.5 NAAQS median 50% CI
has increased, while the annual PM2.5 and ozone NAAQS median 50% CIs have continued to
decrease). Faced with a similar selection of DV periods for use in attainment demonstrations for
nonattainment areas,37 the EPA also recommended using the average of three DV periods along
with a modeling analysis. Thus, for the reasons explained in the Policy Document, the three
most recent DV periods (i.e., 2012-2014, 2013-2015, 2014-2016) were used for
determining SILs for PM2.5 and ozone.
4.2 Analysis for Ozone
Figure 11 shows, for each monitoring site, the half-width of the 50% CI divided by the actual
design value, from the 2014-2016 data for the ozone NAAQS.38 The scatter plot for the relative
variability values shows that the data are fairly well concentrated around 1-2%, with a small
number of sites exceeding 3% and a maximum around 4.5% (with one exception). The
variability is fairly consistent across the range of baseline air quality levels, indicating that there
is no particular trend with actual design value in the site-level variability. The median and mean
variability values are fairly similar (median 1.47%, mean 1.42%; see Figure 11).
The spatial distribution of the relative variability from the 50% CI is also shown in Figure 11,
with 2014-2016 DV period site data colored according to their relative variability and sites with
insufficient data during this period in gray. There appears to be no notable large-scale spatial
37 Draft Modeling Guidance for Demonstrating Attainment of Air Quality Goals for Ozone, PM2 5, and Regional
Haze. R. Wayland, AQAD, Dec. 3, 2014.
38 The plots for ozone show a distinct banding in the results. This is a feature of the truncation conventions that were
applied to the AQS data prior to the air quality variability analysis.
40
-------
trends in the highest relative variability. The lack of any large-scale spatial trend indicates that there
is indeed a fundamental characteristic to the relative ambient air quality variability (see Section
4.1.3).
4.2.1 Ozone Temporal Trends
The median air quality variability from the 15 DV periods for ozone is shown in Figure 12 (each
period is 3 years). This analysis shows how the combination of changes in the network design
(e.g., the change in the monitoring season) and the changes in emissions and meteorology over
this period have impacted the variability in the DVs from the network. There has been a small
decrease in the variability for ozone (0.03 percentage points per year), though most of that
decrease occurred in the form of a large drop in the variability between the 2003-2005 and 2004-
2006 DV periods. There were increases in the variability for the 2008 and 2012 DV periods,
indicating that there is some variability between years. The median air quality variability values
at the 50% CI for the three most recent DV periods (i.e., 2012-2014, 2013-2015, 2014-2016),
shown in Table 3, average to a SIL value of 1.47% for the ozone 8-hour NAAQS.
This corresponds to 1.0 ppb at the level of the 2015 ozone NAAQS (70 ppb).
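The arithmetic behind this value can be made explicit. The following is a minimal sketch using the Table 3 values; rounding the result to one decimal place is our assumption:

```python
# Median 50% CI spans for the three most recent ozone DV periods (Table 3).
ci_50_spans = [0.0147, 0.0147, 0.0147]  # 2012-2014, 2013-2015, 2014-2016
naaqs_level_ppb = 70.0                  # 2015 8-hr ozone NAAQS

sil_fraction = sum(ci_50_spans) / len(ci_50_spans)  # 0.0147 (1.47%)
sil_ppb = round(sil_fraction * naaqs_level_ppb, 1)  # 1.029 -> 1.0 ppb
print(sil_fraction, sil_ppb)
```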
Table 3 - Summary of ozone bootstrap results for three design periods, 2012-2014, 2013-2015,
and 2014-2016
DV period                                          2012-2014  2013-2015  2014-2016
Difference, mean of median bootstrap vs actual DV  0.44%      0.48%      0.43%
Avg. 25% CI span                                   0.74%      0.76%      0.75%
Avg. 50% CI span                                   1.47%      1.47%      1.47%
Avg. 68% CI span                                   2.14%      2.05%      2.11%
Avg. 75% CI span                                   2.38%      2.34%      2.31%
Avg. 95% CI span                                   4.31%      3.97%      3.97%
Number of sites                                    1148       1131       1131
41
-------
Figure 11 - Bootstrap results from the 50% CIs for the 2016 ozone DVs. The top panel shows the
relative difference between the span of the CI and the actual DV across the range of actual DVs,
the middle panel shows the absolute difference between the values across the same range, and
the bottom panel shows the spatial distribution of the relative difference between the values at
each site.
42
-------
Figure 12 - Median and mean variability in the network determined from the bootstrap analysis
for the 15 DV periods from 2002-2016 for ozone, along with the number of monitors (each DV
period represents 3 years of data and the data are plotted on the ending year, i.e., the 2016 DV
period is from 2014-2016 and plotted at 2016). A linear fit to the trend, y = -0.0003x + 0.5475
(R2 = 0.5376), is shown in the figure.
4.3 Analysis for PM2.5
Figure 13 shows, for each monitoring site, the half-width of the 50% CI divided by the DVs, for
both the annual and 24-hr PM2.5 NAAQS. This figure shows that the relative variability using
these assumptions is indeed stable across the range of baseline air quality levels, while the
absolute variability increases as the baseline air quality levels increase.39 The values for relative
variability are fairly well concentrated around 1-2% for the annual NAAQS, with a small number
of sites exceeding 3% and a maximum slightly less than 5%. For the 24-hr NAAQS, the data are
39 The rounding conventions for PM2.5 result in striations in the data, which are clearly visible in Figure 13. While
these striations appear to represent trends in the data, this is a function of the display and not actual trends in the
data. Linear regression lines have been added to each panel, which clearly show an increase in the absolute
variability with increasing DVs, while the relative variability is relatively unaffected by changes in the DVs.
43
-------
concentrated around 4-5%, with a small number of sites exceeding 10%. The outliers occur
across the range of baseline air quality levels, indicating that there is no particular trend with
actual DV in the occurrence of sites with especially high variability. When assessed as a whole,
despite their relatively infrequent occurrence, these outliers do tend to increase the average
variability. As with ozone, the median variability is less influenced by these outliers and appears
to be more representative of the central tendency of the distribution of relative variability values
than the average. Unlike the ozone results, the median is smaller than the mean (for the 24-hr
NAAQS, a mean of 5.60% versus a median of 4.35%; see Figure 13).
The spatial distribution of the relative variability from Figure 13 is shown in Figure 14, with sites
having data during the 2014-2016 DV period colored according to their relative variability (sites
with insufficient data during the 2014-2016 DV period are not shown, data from other years are
presented in the Appendix). Based solely on a visual inspection, there appear to be no notable
large-scale spatial trends in the highest relative variability in either the annual or 24-hr PM2.5
NAAQS. The sites with larger variability tend to occur in the western half of the U.S., though the
sites are isolated and generally not grouped into any specific geographic region. The exceptions
to this appear in the western U.S. and along the Ohio River Valley, where there are collections of
sites with higher variability (generally above 7.5%) in the 24-hr NAAQS (though the annual
NAAQS does not display this apparently higher variability). This result may be related to the
nature of high PM events in the western half of the U.S. (e.g., the typical PM2.5 levels may be
lower in the western states, but the events that do occur produce much higher concentrations than
the typical background, which would result in greater skew and thus greater variability in DVs
computed from these data, particularly in the 24-hr PM2.5 DVs). These sites also tend to have a
lower sampling frequency (see Figure 2), which we have shown to artificially increase the
apparent variability. There are also trends in missing data that are important to consider when
exploring regional trends in variability. In particular, for the period 2008 through 2013, the data
were invalidated for several states. Late in 2014, a problem was found with the PM2.5 data from
these states and, as a result, the data were invalidated for a number of years.40
In response to comments received during the peer-review of the initial public draft of this
document, several more detailed spatial analyses are presented for the annual and 24-hr PM2.5
data in Section 7 of the Appendix to this document. The analysis attempts to identify natural
groupings of sites based on location and the level of air quality variability using cluster analysis.
The analysis applied both an iterative (K-means) and a hierarchical clustering algorithm using
various combinations of the site-level variability, latitude, and longitude, resulting in 12 different
sets of clusters. The analysis also considered comparing sites by grouping them using the
National Oceanic and Atmospheric Administration (NOAA) "climate regions," which are
groupings of states identified by NOAA as having similar climatic conditions. While some of the
analyses did identify unique clusters, these groups were often not spatially contiguous. Many
40 The dates and specific monitors affected in each state vary. For DC, data were invalidated in Q4 of 2016. For FL,
data were invalidated from 2011-2014. For GA, data were invalidated in Q1 of 2011. For ME, data were
invalidated from 1998-2015. For ID, data were invalided from 2011-2014. For IL, data were invalidated from 2011-
2013 and Q1-Q2 of 2014. For Louisville, KY, data were invalidated from 2009-2013. For the South Coast Air
Basin, CA, data were invalidated in 2014. For MS, data were invalidated in 2014. For TN, data were invalidated
from 2011-2014. For WA, data were invalidated from 2011-2015. The invalidation may not have affected every
monitor in each state, but these dates cover the time spans for which the data invalidation occurred.
44
-------
of the analyses did not identify any unique clusters. When the results from the spatial cluster
analysis are considered as a whole, they do not indicate any consistent large-scale trends. The
lack of any consistent regional trend indicates that there is indeed a fundamental characteristic to
the relative ambient air quality variability (see Section 4.1.3).
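As a concrete illustration of the clustering approach described above, the following is a minimal sketch using scikit-learn's K-means on synthetic site coordinates and variability values; the feature choices and cluster count are assumptions, representing one of many combinations one might try, not the EPA's actual configuration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical site-level inputs: latitude, longitude, and 50% CI relative
# variability (synthetic values for illustration only).
rng = np.random.default_rng(7)
lat = rng.uniform(25, 50, size=300)
lon = rng.uniform(-125, -70, size=300)
rel_var = rng.gamma(shape=2.0, scale=0.01, size=300)

# Standardize so geography and variability contribute on comparable scales,
# then cluster; spatial coherence is judged by mapping the resulting labels.
features = StandardScaler().fit_transform(np.column_stack([lat, lon, rel_var]))
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels))  # cluster sizes
```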
45
-------
Figure 13 - Bootstrap 50% CI relative variability (half-width divided by the DV) and absolute
variability for the 2014-2016 annual and 24-hr PM2.5 DVs across the range of baseline DVs
(24-hr NAAQS relative variability: mean 5.600%, median 4.348%; annual NAAQS absolute
variability: median 0.150 µg/m3).
46
-------
Figure 14 - Spatial distribution of the relative difference between the span of the 50% CI and the
actual DV for the 2014-2016 PM2.5 DVs.
47
-------
4.3.1 PM2.5 Temporal Trends
The median air quality variability from the 15 DV periods for both the annual and 24-hr PM2.5
NAAQS is shown in Figure 15. This analysis shows how the combination of the changes in the
network design (e.g., the change in the monitoring frequency) and the changes in emissions and
meteorology have impacted the network variability. There has been a greater decrease in the
variability in the 24-hr PM2.5 NAAQS than in the variability for the annual PM2.5 NAAQS (0.03
percentage points per year versus 0.02 percentage points per year). The analysis in Section 3.2.2
showed that the 24-hr NAAQS is more affected by the monitoring frequency than the annual
NAAQS, so it is likely that the change in monitoring frequency played some role in the larger
decrease in the variability for the 24-hr PM2.5 NAAQS. The median air quality variability at the
50% CI for the three most recent DV periods (i.e., 2012-2014, 2013-2015, 2014-2016) is shown
in Table 4; when averaged, these values result in a SIL value of 1.66% for the annual PM2.5
NAAQS (12 µg/m3) and 4.27% for the 24-hr PM2.5 NAAQS (35 µg/m3). These values correspond
to 0.2 µg/m3 at the level of 12 µg/m3 for the annual NAAQS, and 1.5 µg/m3 at the level of 35
µg/m3 for the 24-hr NAAQS.
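The scaling arithmetic parallels the ozone case. A minimal sketch using the Table 4 values, with rounding to one decimal place as our assumption:

```python
# Median 50% CI spans for the three most recent PM2.5 DV periods (Table 4).
annual_spans = [0.0163, 0.0165, 0.0169]  # annual NAAQS
daily_spans = [0.0429, 0.0417, 0.0435]   # 24-hr NAAQS

annual_frac = sum(annual_spans) / 3      # ~0.0166 (1.66%)
daily_frac = sum(daily_spans) / 3        # ~0.0427 (4.27%)

print(round(annual_frac * 12.0, 1))      # 0.2 ug/m3 at the 12 ug/m3 annual NAAQS
print(round(daily_frac * 35.0, 1))       # 1.5 ug/m3 at the 35 ug/m3 24-hr NAAQS
```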
Table 4 - Summary of comparison of the air quality variability determined by the bootstrap
analysis for three design periods.
Annual NAAQS                             2014     2015     2016
Difference, median bootstrap vs actual   0.04%    0.03%    0.04%
Avg. 25% CI span                         0.67%    0.70%    0.71%
Avg. 50% CI span                         1.63%    1.65%    1.69%
Avg. 68% CI span                         2.44%    2.46%    2.45%
Avg. 75% CI span                         2.80%    2.83%    2.82%
Avg. 95% CI span                         4.72%    4.86%    4.79%

24-hr NAAQS                              2014     2015     2016
Difference, median bootstrap vs actual   1.14%    1.36%    1.23%
Avg. 25% CI span                         2.27%    2.27%    2.50%
Avg. 50% CI span                         4.29%    4.17%    4.35%
Avg. 68% CI span                         6.00%    6.25%    6.52%
Avg. 75% CI span                         6.82%    7.50%    7.69%
Avg. 95% CI span                         12.50%   12.50%   13.16%

Number of sites                          507      531      535
48
-------
Figure 15 - Median and mean variability in the network determined from the bootstrap analysis
(50% CI) for the 15 DV periods from 2002-2016 for PM2.5, along with the number of monitors
(each DV period represents 3 years of data and the data are plotted on the ending year; i.e., the
2016 DV period is from 2014-2016 and plotted at 2016). Linear fits to the trends are shown in
the figure, including y = -0.0003x + 0.598 and y = -0.0002x + 0.3854 (R2 = 0.7479).
49
-------
5. Additional Information
Data for the analyses presented in this document can be obtained by contacting:
Chris Owen, PhD
Office of Air Quality Planning and Standards, U. S. EPA
109 T.W. Alexander Dr.
RTP, NC 27711
919-541-5312
owen.chris@epa.gov
-------
United States Environmental Protection Agency
Office of Air Quality Planning and Standards
Air Quality Analysis Division
Research Triangle Park, NC
Publication No. EPA-454/R-18-001
April 2018
51
------- |