Bayesian Space-time Downscaling Fusion Model (Downscaler) Derived Estimates of Air Quality for 2021


EPA-454/R-24-002
October 2024


U.S. Environmental Protection Agency
Office of Air Quality Planning and Standards
Air Quality Assessment Division
Research Triangle Park, NC

-------
Authors:

Adam Reff (EPA/OAR)
Alison Eyth (EPA/OAR)
David Mintz (EPA/OAR)
Janice Godfrey (EPA/OAR)
Jeff Vukovich (EPA/OAR)
Julia Black (EPA/OAR)
Karl Seltzer (EPA/OAR)
Sharon Phillips (EPA/OAR)

Acknowledgements:

The following people served as reviewers of this document: Julia Black (EPA/OAR) and
David Mintz (EPA/OAR).

-------
Contents

1.0 Introduction

2.0 Air Quality Data
2.1 Introduction to Air Quality Impacts in the United States
2.2 Ambient Air Quality Monitoring in the United States
2.3 Air Quality Indicators Developed for the EPHT Network

3.0 Emissions Data
3.1 Introduction to Emissions Data Development
3.2 Emission Inventories and Approaches
3.3 Emissions Modeling Summary
3.4 Emissions References

4.0 CMAQ Air Quality Model Estimates
4.1 Introduction to the CMAQ Modeling Platform
4.2 CMAQ Model Version, Inputs and Configuration
4.3 CMAQ Model Performance Evaluation

5.0 Bayesian space-time downscaling fusion model (downscaler) - Derived Air Quality Estimates
5.1 Introduction
5.2 Downscaler Model
5.3 Downscaler Concentration Predictions
5.4 Downscaler Uncertainties
5.5 Summary and Conclusions

Appendix A - Acronyms

Appendix B - Emissions Totals by Sector

-------
1.0 Introduction

This report describes estimates of daily ozone (maximum 8-hour average) and fine particulate matter
(PM2.5) (24-hour average) concentrations throughout the contiguous United States during the 2021
calendar year generated by EPA's recently developed data fusion method termed the "downscaler
model" (DS). Air quality monitoring data from the State and Local Air Monitoring Stations (SLAMS) and
numerical output from the Community Multiscale Air Quality (CMAQ) model were both input to DS to
predict concentrations at the 2010 and 2020 U.S. census tract centroids encompassed by the CMAQ
modeling domain. Information on EPA's air quality monitors, CMAQ model, and DS is included to provide
the background and context for understanding the data output presented in this report. These estimates
are intended for use by statisticians and environmental scientists interested in the daily spatial
distribution of ozone and PM2.5.

DS operates by calibrating CMAQ data to the observational data and then using the resulting relationship
to predict "observed" concentrations at new spatial points in the domain. Although similar in principle to
a linear regression, spatial modeling aspects have been incorporated for improving the model fit, and a
Bayesian[1] approach to fitting is used to generate an uncertainty value associated with each
concentration prediction. The uncertainties that DS produces are a major distinguishing feature from
earlier fusion methods used by EPA such as the "Hierarchical Bayesian" (HB) model (McMillan
et al., 2009). The term "downscaler" refers to the fact that DS takes grid-averaged data (CMAQ) for input
and produces point-based estimates, thus "scaling down" the area of data representation. Although this
allows air pollution concentration estimates to be made at points where no observations exist, caution is
needed when interpreting any within-grid cell spatial gradients generated by DS since they may not exist
in the input datasets. The theory, development, and initial evaluation of DS can be found in the earlier
papers of Berrocal, Gelfand, and Holland (2009, 2010, and 2011).
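
As a rough illustration of the calibration idea described above, the following minimal Python sketch fits an
ordinary least-squares line relating CMAQ grid-cell values to monitor observations and then applies that
line at an unmonitored point. It is not the DS model: it omits the spatially varying coefficients and the
Bayesian fitting that give DS its uncertainty values. The function names and all numbers are hypothetical.

```python
def fit_linear_calibration(cmaq_at_monitors, observations):
    """Ordinary least-squares fit of observations = intercept + slope * CMAQ."""
    n = len(cmaq_at_monitors)
    mean_x = sum(cmaq_at_monitors) / n
    mean_y = sum(observations) / n
    sxx = sum((x - mean_x) ** 2 for x in cmaq_at_monitors)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(cmaq_at_monitors, observations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_at_point(cmaq_grid_value, intercept, slope):
    """Apply the fitted relationship to the CMAQ value of the grid cell containing a new point."""
    return intercept + slope * cmaq_grid_value


# Hypothetical CMAQ values at four monitor locations and the matching observations.
cmaq_values = [10.0, 20.0, 30.0, 40.0]
observed = [10.0, 19.0, 28.0, 37.0]

intercept, slope = fit_linear_calibration(cmaq_values, observed)
# Prediction for a census tract centroid whose grid cell has a CMAQ value of 25.0.
tract_prediction = predict_at_point(25.0, intercept, slope)
```

In this toy case the fitted line is the same everywhere in the domain, which is the main simplification
relative to a spatial model.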

EPA's Office of Air and Radiation's (OAR) Office of Air Quality Planning and Standards (OAQPS) provides
air quality monitoring data and model estimates to the Centers for Disease Control and Prevention (CDC)
for use in their Environmental Public Health Tracking (EPHT) Network. CDC's EPHT Network supports the
linkage of air quality data with human health outcome data for use by various public health agencies
throughout the U.S. The EPHT Network Program is a multidisciplinary collaboration that involves the
ongoing collection, integration, analysis, interpretation, and dissemination of data from: environmental
hazard monitoring activities; human exposure assessment information; and surveillance of noninfectious
health conditions. As part of the National EPHT Program efforts, the CDC led the initiative to build the
National EPHT Network (https://www.cdc.gov/nceh/tracking/). The National EPHT Program, with the
EPHT Network as its cornerstone, is the CDC's response to requests calling for improved understanding
of how the environment affects human health. The EPHT Network is designed to provide the means to
identify, access, and organize hazard, exposure, and health data from a variety of sources and to
examine, analyze, and interpret those data based on their spatial and temporal characteristics.

[1] Bayesian statistical modeling refers to methods that are based on Bayes' theorem and model the world in terms of
probabilities based on previously acquired knowledge.

Since 2002, EPA has collaborated with the CDC on the development of the EPHT Network. On September
30, 2003, the Secretary of Health and Human Services (HHS) and the Administrator of EPA signed a joint
Memorandum of Understanding (MOU) with the objective of advancing efforts to achieve mutual
environmental public health goals.[2] HHS, acting through the CDC and the Agency for Toxic Substances
and Disease Registry (ATSDR), and EPA agreed to expand their cooperative activities in support of the
CDC EPHT Network and EPA's Central Data Exchange Node on the Environmental Information Exchange
Network in the following areas:

•	Collecting, analyzing, and interpreting environmental and health data from both agencies (HHS and EPA).

•	Collaborating on emerging information technology practices related to building, supporting, and
operating the CDC EPHT Network and the Environmental Information Exchange Network.

•	Developing and validating additional environmental public health indicators.

•	Sharing reliable environmental and public health data between their respective networks in an efficient
and effective manner.

•	Consulting and informing each other about dissemination of results obtained through work carried out
under the MOU and the associated Interagency Agreement (IAG) between EPA and CDC.

The best available statistical fusion model, air quality data, and CMAQ numerical model output were
used to develop the estimates. Fusion results can vary with different inputs and fusion modeling
approaches. As new and improved statistical models become available, EPA will provide updates.

Although these data have been processed on a computer system at the EPA, no warranty expressed or
implied is made regarding the accuracy or utility of the data on any other system or for general or
scientific purposes, nor shall the act of distribution of the data constitute any such warranty. It is also
strongly recommended that careful attention be paid to the contents of the metadata file associated
with these data to evaluate data set limitations, restrictions, or intended use. The EPA shall not be held
liable for improper or incorrect use of the data described and/or contained herein.

[2] The original HHS and EPA MOU is available at https://www.cdc.gov/nceh/tracking/pdfs/epa_mou_2007.pdf.

The four remaining sections and appendices in the report are as follows:

•	Section 2 describes the air quality data obtained from EPA's nationwide monitoring network and
the importance of the monitoring data in determining potential health risks.

•	Section 3 details the emissions inventory data, how it is obtained, and how it is processed into a
key input into the CMAQ air quality computer model.

•	Section 4 describes the CMAQ computer model and its role in providing estimates of pollutant
concentrations across the U.S. based on 12-km grid cells over the contiguous U.S.

•	Section 5 explains the downscaler model used to statistically combine air quality monitoring data
and air quality estimates from the CMAQ model to provide daily air quality estimates for the 2010
and 2020 U.S. census tract centroid locations within the contiguous U.S.

•	Appendix A provides a description of acronyms used in this report.

•	Appendix B is a separate spreadsheet that shows emissions totals for the modeling domain and
for each emissions modeling sector (see Section 3 for more details).


-------
2.0 Air Quality Data

To compare health outcomes with air quality measures, it is important to understand the origins of
those measures and the methods for obtaining them. This section provides a brief overview of the
origins and process of air quality regulation in this country. It provides a detailed discussion of ozone (O3)
and particulate matter (PM). The EPHT program has focused on these two pollutants, since numerous
studies have found them to be most pervasive and harmful to public health and the environment, and
there are extensive monitoring and modeling data available.

2.1 Introduction to Air Quality Impacts in the United States
2.1.1 The Clean Air Act

In 1970, the Clean Air Act (CAA) was signed into law. Under this law, EPA sets limits on how much of a
pollutant can be in the air anywhere in the United States. This ensures that all Americans have the same
basic health and environmental protections. The CAA has been amended several times to keep pace
with new information. For more information on the CAA, go to
https://www.epa.gov/clean-air-act-overview.

Under the CAA, the EPA has established standards, or limits, for six air pollutants known as the criteria
air pollutants: carbon monoxide (CO), lead (Pb), nitrogen dioxide (NO2), sulfur dioxide (SO2), ozone (O3),
and particulate matter (PM). These standards, called the National Ambient Air Quality Standards
(NAAQS), are designed to protect public health and the environment. The CAA established two types of
air quality standards. Primary standards set limits to protect public health, including the health of
"sensitive" populations such as asthmatics, children, and the elderly. Secondary standards set limits to
protect public welfare, including protection against decreased visibility and damage to animals, crops,
vegetation, and buildings. The CAA requires EPA to review these standards at least every five years. For
more specific information on the NAAQS, go to https://www.epa.gov/criteria-air-pollutants/naaqs-table.
For general information on the criteria pollutants, go to https://www.epa.gov/criteria-air-pollutants.

When these standards are not met, the area is designated as a nonattainment area. States must develop
state implementation plans (SIPs) that explain the regulations and controls they will use to clean up the
nonattainment areas. States with an EPA-approved SIP can request that the area be redesignated from
nonattainment to attainment by providing three consecutive years of data showing NAAQS compliance.
The state must also provide a maintenance plan to demonstrate how it will continue to comply with the
NAAQS and demonstrate compliance over a 10-year period, and what corrective actions it will take
should a NAAQS violation occur after redesignation. EPA must review and approve the NAAQS compliance
data and the maintenance plan before redesignating the area; thus, a person may live in an area
designated as nonattainment even though no NAAQS violation has been observed for quite some time.
For more information on ozone designations, go to https://www.epa.gov/ozone-designations and for
PM designations, go to https://www.epa.gov/particle-pollution-designations.

2.1.2 Ozone

Ozone is a colorless gas composed of three oxygen atoms. Ground level ozone is formed when pollutants
released from cars, power plants, and other sources react in the presence of heat and sunlight. It is the
prime ingredient of what is commonly called "smog." When inhaled, ozone can cause acute respiratory
problems, aggravate asthma, cause inflammation of lung tissue, and even temporarily decrease the lung
capacity of healthy adults. Repeated exposure may permanently scar lung tissue. EPA's Integrated
Science Assessments and Risk and Exposure documents are available at
https://www.epa.gov/naaqs/ozone-o3-air-quality-standards. The current NAAQS for ozone (last revised
in 2015) is a daily maximum 8-hour average of 0.070 parts per million (ppm); for details, see
https://www.epa.gov/ozone-pollution/setting-and-reviewing-standards-control-ozone-pollution#standards.
The CAA requires EPA to review the NAAQS at least every five years and revise
them as appropriate in accordance with Section 108 and Section 109 of the Act. The standards for ozone
are shown in Table 2-1.

Table 2-1. Ozone National Ambient Air Quality Standards

Form of the Standard (parts per million, ppm)                             1997    2008    2015
Annual 4th highest daily max 8-hour average, averaged over three years    0.08    0.075   0.070
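
The form of the standard in Table 2-1 can be sketched in a few lines of Python. This is a simplified
illustration only: it assumes complete daily data for each of three years and leaves out the
data-completeness and rounding conventions that apply to regulatory design values. The function names
and the example values are hypothetical.

```python
def fourth_highest(daily_max_8hr_ppm):
    """Return the 4th highest daily maximum 8-hour average in one year of data."""
    if len(daily_max_8hr_ppm) < 4:
        raise ValueError("need at least four daily values")
    return sorted(daily_max_8hr_ppm, reverse=True)[3]


def ozone_design_value(three_years_of_daily_values):
    """Average the annual 4th highest values over the supplied years."""
    yearly = [fourth_highest(year) for year in three_years_of_daily_values]
    return sum(yearly) / len(yearly)


# Hypothetical, heavily shortened years of daily max 8-hour averages (ppm).
example_years = [
    [0.080, 0.075, 0.072, 0.071, 0.060],
    [0.069, 0.068, 0.067, 0.066, 0.050],
    [0.073, 0.072, 0.071, 0.070, 0.069],
]

design_value = ozone_design_value(example_years)
# Compare against the 2015 level from Table 2-1.
meets_2015_level = design_value <= 0.070
```

Because the statistic is the 4th highest value rather than the maximum, a monitor can record a few days
above the level in a year and still meet the standard.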

2.1.3 Particulate Matter

PM air pollution is a complex mixture of small and large particles of varying origin that can contain
hundreds of different chemicals, including cancer-causing agents like polycyclic aromatic hydrocarbons
(PAH), as well as heavy metals such as arsenic and cadmium. PM air pollution results from direct
emissions of particles as well as particles formed through chemical transformations of gaseous air
pollutants. The characteristics, sources, and potential health effects of particulate matter depend on its
source, the season, and atmospheric conditions.

As a practical convention, PM is divided by sizes into classes with differing health concerns and potential
sources.[3] Particles less than 10 micrometers in diameter (PM10) pose a health concern because they can
be inhaled into and accumulate in the respiratory system. Particles less than 2.5 micrometers in
diameter (PM2.5) are referred to as "fine" particles. Because of their small size, fine particles can lodge
deeply into the lungs. Sources of fine particles include all types of combustion (motor vehicles, power
plants, wood burning, etc.) and some industrial processes. Particles with diameters between 2.5 and 10
micrometers (PM10-2.5) are referred to as "coarse" or PMc. Sources of PMc include crushing or grinding
operations and dust from paved or unpaved roads. The distribution of PM10, PM2.5 and PMc varies from
the eastern U.S. to arid western areas.

[3] The measure used to classify PM into sizes is the aerodynamic diameter. The measurement instruments used for PM are
designed and operated to separate large particles from the smaller particles. For example, the PM2.5 instrument only captures
and thus measures particles with an aerodynamic diameter less than 2.5 micrometers. The EPA method to measure PMc is
designed around taking the mathematical difference between measurements for PM10 and PM2.5.

Particle pollution - especially fine particles - contains microscopic solids and liquid droplets that are so
small that they can get deep into the lungs and cause serious health problems. Numerous scientific
studies have linked particle pollution exposure to a variety of problems, including premature death in
people with heart or lung disease, nonfatal heart attacks, irregular heartbeat, aggravated asthma,
decreased lung function, and increased respiratory symptoms, such as irritation of airways, coughing or
difficulty breathing. Additional information on the health effects of particle pollution and other technical
documents related to PM standards are available at https://www.epa.gov/pm-pollution.

The current NAAQS for PM2.5 (last revised in 2024) includes both a 24-hour standard to protect against
short-term effects, and an annual standard to protect against long-term effects. The annual average
PM2.5 concentration must not exceed 9.0 micrograms per cubic meter (µg/m3) based on the annual
mean concentration averaged over three years, and the 24-hour average concentration must not exceed
35 µg/m3 based on the 98th percentile 24-hour average concentration averaged over three years. More
information is available at
https://www.epa.gov/pm-pollution/setting-and-reviewing-standards-control-particulate-matter-pm-pollution#standards.
The standards for PM2.5 are shown in Table 2-2.

Table 2-2. PM2.5 National Ambient Air Quality Standards

Form of the Standard (micrograms per cubic meter, µg/m3)      1997    2006    2012    2024
Annual mean of 24-hour averages, averaged over 3 years        15.0    15.0    12.0    9.0
98th percentile of 24-hour averages, averaged over 3 years    65      35      35      35
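
The two forms of the PM2.5 standard can be illustrated with a short Python sketch. It makes several
simplifying assumptions that are not part of the regulatory procedure: each year is treated as a complete
list of 24-hour averages, the annual mean is a plain average of those values, and the 98th percentile uses
a simple nearest-rank rule. The function names are hypothetical.

```python
def percentile_98_nearest_rank(values):
    """98th percentile by the nearest-rank rule (an assumed, simplified convention)."""
    ordered = sorted(values)
    # Integer form of ceil(0.98 * n), giving a 1-based rank.
    rank = (98 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def pm25_design_values(years_of_daily_averages):
    """Return (annual form, 24-hour form), each averaged over the supplied years."""
    annual_means = [sum(year) / len(year) for year in years_of_daily_averages]
    p98_values = [percentile_98_nearest_rank(year) for year in years_of_daily_averages]
    annual_form = sum(annual_means) / len(annual_means)
    daily_form = sum(p98_values) / len(p98_values)
    return annual_form, daily_form


def meets_levels(annual_form, daily_form, annual_level=9.0, daily_level=35.0):
    """Compare both forms with the 2024 levels quoted in the text (µg/m3)."""
    return annual_form <= annual_level and daily_form <= daily_level
```

A site meets the standard in this sketch only when both the annual and the 24-hour forms are at or below
their respective levels.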

During June to August 2024, EPA updated PM2.5 data in AQS collected since 2017 with Teledyne
Advanced Pollution Instrumentation T640/T640X Federal Equivalent Method (FEM) monitors to make
those data more comparable to data collected by Federal Reference Method (FRM) monitors. PM2.5 data
retrieved from AQS after August 2024 reflect this update, including the 2021 PM2.5 downscaler input
dataset documented in this report, which was retrieved in November 2024. See the PM2.5 Data Advisory
for more details.

2.2 Ambient Air Quality Monitoring in the United States
2.2.1 Monitoring Networks

The CAA (Section 319) requires establishment of an air quality monitoring system throughout the U.S.
The monitoring stations in this network have been called the State and Local Air Monitoring Stations
(SLAMS). The SLAMS network consists of approximately 4,000 monitoring sites set up and operated by
state and local air pollution agencies according to specifications prescribed by EPA for monitoring
methods and network design. All ambient monitoring networks selected for use in SLAMS are tested
periodically to assess the quality of the SLAMS data being produced. Measurement accuracy and
precision are estimated for both automated and manual methods. The individual results of these tests
for each method or analyzer are reported to EPA. Then, EPA calculates quarterly integrated estimates of
precision and accuracy for the SLAMS data.

The SLAMS network experienced accelerated growth throughout the 1970s. The networks were further
expanded in 1999 based on the establishment of separate NAAQS for fine particles (PM2.5) in 1997. The
NAAQS for PM2.5 were established based on their link to serious health problems ranging from increased
symptoms, hospital admissions, and emergency room visits, to premature death in people with heart or
lung disease. While most of the monitors in these networks are located in populated areas of the
country, "background" and rural monitors are an important part of these networks. For more
information on SLAMS, as well as EPA's other air monitoring networks, go to
https://www.epa.gov/amtic.

In 2023, approximately 35 percent of the U.S. population was living within 10 kilometers of ozone and
PM2.5 monitoring sites. Highly populated areas in the eastern U.S. and California are well covered by
both ozone and PM2.5 monitoring networks (Figure 2-1).

[Figure 2-1 legend: population by distance to the nearest active monitor]

Distance to nearest active monitor    Ozone (million people)    PM2.5 (million people)
< 10 km                               100.7                     115.1
10 km - 25 km                         129.7                     114
25 km - 50 km                         58.8                      59
50 km - 75 km                         21.2                      24.6
75 km - 100 km                        8.8                       10.9
100 km - 150 km                       8.4                       6.6
> 150 km                              5.4                       2.9

Figure 2-1. Distances from U.S. Census Tract centroids to the nearest monitoring site, 2023.

In summary, state and local agencies and tribes implement a quality-assured monitoring network to
measure air quality across the U.S. The EPA provides guidance to ensure a thorough understanding of
the quality of the data produced by these networks. These monitoring data have been used to
characterize the status of the nation's air quality and the trends across the U.S. (see
https://www.epa.gov/air-trends).

2.2.2 Air Quality System Database

EPA's Air Quality System (AQS) database contains ambient air monitoring data collected by EPA, state,
local, and tribal air pollution control agencies from thousands of monitoring stations. AQS also contains
meteorological data, descriptive information about each monitoring station (including its geographic
location and its operator), and data quality assurance and quality control information. State and local
agencies are required to submit their air quality monitoring data into AQS within 90 days following the
end of the quarter in which the data were collected. This ensures timely submission of these data for use
by state, local, and tribal agencies, EPA, and the public. EPA's OAQPS and other AQS users rely upon the
data in AQS to assess air quality, assist in compliance with the NAAQS, evaluate SIPs, perform modeling
for permit review analysis, and perform other air quality management functions. For more details,
including how to retrieve data, go to https://www.epa.gov/aqs.

2.2.3 Advantages and Limitations of the Air Quality Monitoring and Reporting System

Air quality data is required to assess public health outcomes that are affected by poor air quality. The
challenge is to get surrogates for air quality on time and spatial scales that are useful for EPHT activities.

The advantage of using ambient data from EPA monitoring networks for comparison with health
outcomes is that these measurements of pollution concentrations are the best characterization of the
concentration of a given pollutant at a given time and location. Furthermore, the data are supported by
a comprehensive quality assurance program, ensuring data of known quality. One disadvantage of using
the ambient data is that it is usually out of spatial and temporal alignment with health outcomes. This
spatial and temporal 'misalignment' between air quality monitoring data and health outcomes is
influenced by the following key factors: the living and/or working locations (microenvironments) where
a person spends their time not being co-located with an air quality monitor; time(s)/date(s) when a
patient experiences a health outcome/symptom (e.g., asthma attack) not coinciding with time(s)/date(s)
when an air quality monitor records ambient concentrations of a pollutant high enough to affect the
symptom (e.g., asthma attack either during or shortly after a high PM2.5 day).

To compare or correlate ambient concentrations with acute health effects, daily local air quality data is
needed.[4] Spatial gaps exist in the air quality monitoring network, especially in rural areas, since the
network is designed to focus on measurement of pollutant concentrations in high population density
areas. Temporal limits also exist. Hourly ozone measurements are aggregated to daily values (the daily
max 8-hour average is relevant to the ozone standard). Ozone is typically monitored during the ozone
season (the warmer months, approximately April through October). However, year-long data is available
in many areas and is extremely useful to evaluate whether ozone is a factor in health outcomes during
the non-ozone seasons. PM2.5 is generally measured year-round. Most Federal Reference Method (FRM)
PM2.5 monitors collect data one day in every three days, due in part to the time and costs involved in
collecting and analyzing the samples. Additionally, continuous monitors have become available which can
automatically collect, analyze, and report PM2.5 measurements on an hourly basis. These monitors are
available in most of the major metropolitan areas. Some of these continuous monitors have been
determined to be equivalent to the FRM monitors for regulatory purposes and are called Federal
Equivalent Methods (FEM).

[4] EPA uses exposure models to evaluate the health risks and environmental effects associated with exposure. These models
are limited by the availability of air quality estimates. See https://www.epa.gov/technical-air-pollution-resources.
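
The aggregation of hourly ozone measurements to a daily max 8-hour average can be sketched as follows.
This Python example is a simplification under stated assumptions: it expects exactly 24 valid hourly values
for one day and only considers 8-hour windows that lie entirely within that day. Regulatory calculations
apply additional rules about which windows count and how missing hours are handled, which are not
reproduced here. The function name and example values are hypothetical.

```python
def daily_max_8hr_average(hourly_ppm):
    """Largest mean over any 8 consecutive hours within one 24-hour day."""
    if len(hourly_ppm) != 24:
        raise ValueError("expected 24 hourly values")
    # Windows start at hours 0 through 16 so that each one ends within the same day.
    window_means = [sum(hourly_ppm[start:start + 8]) / 8 for start in range(17)]
    return max(window_means)


# Hypothetical day: low overnight values with an eight-hour afternoon plateau.
hourly = [0.02] * 10 + [0.06] * 8 + [0.02] * 6
daily_value = daily_max_8hr_average(hourly)
```

In this example the window that starts at hour 10 covers the whole plateau, so the daily value equals the
plateau concentration.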

2.2.4 Use of Air Quality Monitoring Data

Air quality monitoring data has been used to provide the information for the following situations:

(1) Assessing effectiveness of SIPs in addressing NAAQS nonattainment areas

(2) Characterizing local, state, and national air quality status and trends

(3) Associating health and environmental damage with air quality levels/concentrations

For the EPHT effort, EPA is providing air quality data to support efforts associated with (2) and (3)
above. Data supporting (3) is generated by EPA through the use of its air quality data and its downscaler
model.

Most studies that associate air quality with health outcomes use air monitoring as a surrogate for
exposure to the air pollutants being investigated. Many studies have used the monitoring networks
operated by state and federal agencies. Some studies perform special monitoring that can better
represent exposure to the air pollutants: community monitoring, near residences, in-house or workplace
monitoring, and personal monitoring. For the EPHT program, special monitoring is generally not
supported, though it could be used on a case-by-case basis.

Many approaches have been developed for estimating exposures to air pollutants using ambient
monitoring data, from proximity-based exposure estimates to statistical interpolation (Jerrett et al.,
2005). Depending upon the approach and the spatial and temporal distribution of ambient monitoring
data, exposure estimates may vary greatly in areas farther from monitors (Bravo et al., 2012). Factors
like limited temporal coverage (e.g., some PM2.5 monitors record only every third day, and some ozone
monitors operate only part of the year) and limited spatial coverage (e.g., most monitors are located in
urban areas and rural coverage is limited) hinder most interpolation techniques that use monitoring
data alone as the input. Consider Voronoi Neighbor Averaging (VNA) (referred to as Nearest Neighbor
Averaging in most literature), in which rural estimates would be biased towards the urban estimates. To
illustrate, assume two cities with monitors and no monitors in the rural areas between them, which is
very plausible. Since VNA exposure estimates are guaranteed to be within the range of the monitor
values, estimates for the rural areas would be higher in this scenario.
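
The two-city scenario can be made concrete with a small Python sketch. It uses a one-dimensional
inverse-distance weighted average of two monitors as a stand-in for VNA; the Voronoi neighbor selection
step of VNA is omitted because with only two monitors both are used. The locations, concentrations, and
function name are hypothetical.

```python
def inverse_distance_estimate(target_km, monitors, power=1.0):
    """Inverse-distance weighted average of (location_km, concentration) pairs."""
    weighted_sum = 0.0
    weight_total = 0.0
    for location_km, concentration in monitors:
        distance = abs(target_km - location_km)
        if distance == 0:
            # The target coincides with a monitor, so return its value directly.
            return concentration
        weight = 1.0 / distance ** power
        weighted_sum += weight * concentration
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical urban monitors 100 km apart, with PM2.5 values in µg/m3.
urban_monitors = [(0.0, 12.0), (100.0, 10.0)]

# Rural point halfway between the cities.
rural_estimate = inverse_distance_estimate(50.0, urban_monitors)
```

Whatever the true rural concentration is, the estimate can never fall below the lower of the two urban
monitor values, which is the source of the bias described above.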

Air quality models may overcome some of the limitations that monitoring networks possess. Models
such as CMAQ can estimate concentrations in reasonable temporal and spatial resolutions. However,
these sophisticated air quality models are prone to systematic biases since they depend upon so many
variables (e.g., meteorological models and emission models) and complex chemical and physical process
simulations.

Combining monitoring data with air quality models (via fusion or regression) may provide the best
results in terms of estimating ambient air concentrations in space and time. EPA's eVNA[5] is an example
of an earlier approach for merging air quality monitor data with CMAQ model predictions. DS attempts
to address some of the shortcomings in these earlier attempts to statistically combine monitor and
model-predicted data; see the published papers referenced in Section 1 for more information about DS. As
discussed in the next section, two methods are used in EPHT to provide estimates of ambient
concentrations of air pollutants: air quality monitoring data and the downscaler model estimate, which
is a statistical 'combination' of air quality monitor data and photochemical air quality model predictions
(e.g., CMAQ).

2.3 Air Quality Indicators Developed for the EPHT Network

Air quality indicators have been developed for use in the Environmental Public Health Tracking Network
by CDC using the ozone and PM2.5 data from EPA. The approach used divides "indicators" into two
categories. First, basic air quality measures were developed to compare air quality levels over space and
time within a public health context (e.g., using the NAAQS as a benchmark). Next, indicators were
developed that mathematically link air quality data to public health tracking data (e.g., daily PM2.5 levels
and hospitalization data for acute myocardial infarction). Table 2-3 and Table 2-4 describe the issues
impacting calculation of basic air quality indicators.

Table 2-3. Public Health Surveillance Goals and Current Status

Goal:
1) Air data sets and metadata required for air quality indicators are available to EPHT state
grantees.
Status:
Data are available through state agencies and EPA's AQS. EPA and CDC developed an interagency
agreement, where EPA provides air quality data along with statistically combined AQS and CMAQ data,
associated metadata, and technical reports that are delivered to CDC.

Goal:
Estimate the linkage or association of PM2.5 and ozone on health to:
a) Identify populations that may have higher risk of adverse health effects due to PM2.5 and ozone,
b) Generate hypotheses for further research, and
c) Provide information to support prevention and pollution control strategies.
Status:
Regular discussions have been held on health-air linked indicators and CDC/HFI/EPA convened a
workshop in January 2008. CDC has collaborated on a health impact assessment (HIA) with Emory
University, EPA, and state grantees that can be used to facilitate greater understanding of these linkages.

Goal:
2) Produce and disseminate basic indicators and other findings in electronic and print formats to
provide the public, environmental health professionals, and policymakers with current and easy-to-use
information about air pollution and the impact on public health.
Status:
Templates and "how to" guides for PM2.5 and ozone have been developed for routine indicators.
Calculation techniques and presentations for the indicators have been developed.

[5] eVNA is described in the "Regulatory Impact Analysis for the Final Clean Air Interstate Rule", EPA-452/R-05-002, March
2005, Appendix F.

Table 2-4. Basic Air Quality Indicators used in EPHT, derived from the EPA data delivered to CDC

Ozone (daily 8-hr period with maximum concentration, ppm, by FRM)

• Number of days with maximum ozone concentration over the NAAQS (or other relevant benchmarks)
(by county and MSA)

• Number of person-days with maximum 8-hr average ozone concentration over the NAAQS and other
relevant benchmarks (by county and MSA)

PM2.5 (daily 24-hr integrated samples, µg/m3, by FRM)

• Average ambient concentrations of particulate matter (< 2.5 microns in diameter) compared to the
annual PM2.5 NAAQS (by state)

• Percent of population exceeding annual PM2.5 NAAQS (by state)

• Percent of days with PM2.5 concentration over the daily NAAQS (or other relevant benchmarks)
(by county and MSA)

• Number of person-days with PM2.5 concentration over the daily NAAQS and other relevant benchmarks
(by county and MSA)
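
The day-count and person-day indicators listed in Table 2-4 can be sketched in Python as shown below.
The sketch assumes that a person-day is one resident of an area on one day when the area's value is over
the benchmark, so that person-days equal the number of such days multiplied by the area's population.
That definition, the function names, and the example values are assumptions made for illustration and are
not taken from the EPHT indicator templates.

```python
def days_over_benchmark(daily_values, benchmark):
    """Count the days whose value is strictly greater than the benchmark."""
    return sum(1 for value in daily_values if value > benchmark)


def person_days_over_benchmark(daily_values, population, benchmark):
    """Days over the benchmark multiplied by the population of the area (assumed definition)."""
    return days_over_benchmark(daily_values, benchmark) * population


def percent_days_over_benchmark(daily_values, benchmark):
    """Share of days over the benchmark, as a percentage of days with data."""
    return 100.0 * days_over_benchmark(daily_values, benchmark) / len(daily_values)


# Hypothetical county: four daily max 8-hour ozone values (ppm) and 1,000 residents.
county_ozone = [0.065, 0.071, 0.070, 0.080]
ozone_days = days_over_benchmark(county_ozone, 0.070)
ozone_person_days = person_days_over_benchmark(county_ozone, 1000, 0.070)
ozone_percent_days = percent_days_over_benchmark(county_ozone, 0.070)
```

A value exactly equal to the benchmark is not counted as over it in this sketch.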

2.3.1 Rationale for the Air Quality Indicators

The CDC EPHT Network is initially focusing on ozone and PM2.5. These air quality indicators are based
mainly around the NAAQS health findings and program-based measures (measurement, data, and
analysis methodologies). The indicators will allow comparisons across space and time for EPHT actions.
They are in the context of health-based benchmarks. By bringing population into the measures, they
roughly distinguish between potential exposures (at broad scale).

2.3.2 Air Quality Data Sources

The air quality data will be available in the EPA's AQS database based on the state/federal air program's
data collection and processing. The AQS database contains ambient air pollution data collected by EPA,
state, local, and tribal air pollution control agencies from thousands of state or local air monitoring
stations (SLAMS).

2.3.3	Use of Air Quality Indicators for Public Health Practice

The basic indicators can be used to inform policymakers and the public regarding the air quality within a
state and across states (national). For example, the number of days per year that ozone is above the
NAAQS can be used to communicate to sensitive populations (such as asthmatics) the number of days
that they may be exposed to unhealthy levels of ozone. This short-term NAAQS level is the same level
used in the AQI to inform sensitive populations when and how to reduce their exposure. These
indicators, however, are not a surrogate measure of exposure and therefore will not be linked with
health data.

-------
3.0 Emissions Data

3.1 Introduction to Emissions Data Development

The U.S. Environmental Protection Agency (EPA) developed an air quality modeling platform for air
toxics and criteria air pollutants that represents the year 2021. The platform is based on the 2020
National Emissions Inventory (2020 NEI) published in April 2023 (EPA, 2023) along with other data
specific to the year 2021. The air quality modeling platform consists of all the emissions inventories and
ancillary data files used for emissions modeling, as well as the meteorological, initial condition, and
boundary condition files needed to run the air quality model. This section focuses on the emissions
modeling aspects of the 2021 modeling platform, including the emission inventories, the ancillary data
files, and the approaches used to transform inventories for use in air quality modeling.

The modeling platform includes all criteria air pollutants and precursors (CAPs), two groups of hazardous
air pollutants (HAPs), and diesel particulate matter. The first group of HAPs are those explicitly used by
the chemical mechanism in the Community Multiscale Air Quality (CMAQ) model (Appel, 2018) for
ozone/particulate matter (PM): chlorine (Cl), hydrogen chloride (HCl), naphthalene, benzene,
acetaldehyde, formaldehyde, and methanol (the last five are abbreviated as NBAFM in subsequent
sections of the document). The second group of HAPs consists of 52 HAPs or HAP groups (such as
polycyclic aromatic hydrocarbon groups) that are included in CMAQ for the purposes of air quality
modeling for a HAP+CAP platform.

Emissions were prepared for the Community Multiscale Air Quality (CMAQ) model version 5.4 (see footnote 6), which
was used to model ozone (O3), particulate matter (PM), and HAPs. CMAQ requires hourly and gridded
emissions of the following inventory pollutants: carbon monoxide (CO), nitrogen oxides (NOx), volatile
organic compounds (VOC), sulfur dioxide (SO2), ammonia (NH3), particulate matter less than or equal to
10 microns (PM10), and individual component species for particulate matter less than or equal to 2.5
microns (PM2.5). In addition, the Carbon Bond mechanism version 6 (CB6) with chlorine chemistry within
CMAQ allows for explicit treatment of the VOC HAPs naphthalene, benzene, acetaldehyde,
formaldehyde and methanol (NBAFM), includes anthropogenic HAP emissions of HCl and Cl, and can
model additional HAPs as described in Section 3. The short abbreviation for the modeling case name was
"2021hb", where 2021 is the year modeled, 'h' represents that it was based on the 2020 NEI, and 'b'
represents that it was the second version of a 2020 NEI-based platform.

Although not used for this downscaler analysis, emissions were also prepared for an air dispersion
modeling system: American Meteorological Society/Environmental Protection Agency Regulatory Model
(AERMOD) (EPA, 2018). AERMOD was run for 2021 for all NEI HAPs (about 130 more than covered by
CMAQ) in a similar way as was done for the 2018 version of AirToxScreen (EPA, 2022a). This TSD focuses
on the CMAQ aspects of the 2021 emissions modeling platform from which ozone and PM data were
developed for this study. The effort to create the emission inputs for this study included development of
emission inventories to represent emissions during the year of 2021, along with application of emissions
modeling tools to convert the inventories into the format and resolution needed by CMAQ.

6 CMAQ version 5.4: https://zenodo.org/record/7218076. CMAQ is also available from https://www.epa.gov/cmaq and the
Community Modeling and Analysis System (CMAS) Center at: https://www.cmascenter.org.

The emissions modeling platform includes point sources, nonpoint sources, onroad mobile sources,
nonroad mobile sources, biogenic emissions and fires for the U.S., Canada, and Mexico. Some platform
categories use more disaggregated data than are made available in the NEI. For example, in the
platform, onroad mobile source emissions are represented as hourly emissions by vehicle type, fuel type,
process, and road type, while the NEI emissions are aggregated to vehicle type/fuel type totals and
annual temporal resolution. Emissions used in the CMAQ modeling from Canada are provided by
Environment and Climate Change Canada (ECCC), and those from Mexico are mostly provided by SEMARNAT;
neither is part of the NEI. Year-specific emissions were used for fires, biogenic sources, fertilizer, point sources,
and onroad and nonroad mobile sources. Where available, hourly continuous emission monitoring
system (CEMS) data were used for electric generating unit (EGU) emissions.

The primary emissions modeling tool used to create the CMAQ model-ready emissions was the Sparse
Matrix Operator Kernel Emissions (SMOKE) modeling system. SMOKE version 5.0 was used to create
CMAQ-ready emissions files for a 12-km grid covering the continental United States. Additional
information about SMOKE is available from http://www.cmascenter.org/smoke.

The gridded meteorological model used to provide input data for the emissions modeling was developed
using the Weather Research and Forecasting Model (WRF,
https://ral.ucar.edu/solutions/products/weather-research-and-forecasting-model-wrf) version 4.1.1,
Advanced Research WRF core (Skamarock,
et al., 2008). The WRF Model is a mesoscale numerical weather prediction system developed for both
operational forecasting and atmospheric research applications. The WRF model was run for 2021 over a
domain covering the continental U.S. at a 12km resolution with 35 vertical layers. The run for this
platform included high resolution sea surface temperature data from the Group for High Resolution Sea
Surface Temperature (GHRSST) (see https://www.ghrsst.org/) and is given the EPA meteorological case
abbreviation "21k." The full case abbreviation includes this suffix following the emissions portion of the
case name to fully specify the abbreviation of the case as "2021hb_cb6_21k."
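
The naming convention described above can be illustrated with a short parsing sketch. The regular expression and field names below are hypothetical and are not part of any EPA tool; reading the middle token ("cb6") as the chemical mechanism is an assumption based on the CB6 mechanism mentioned earlier in this section.

```python
import re

# Hypothetical pattern for case names such as "2021hb_cb6_21k":
# <year><NEI letter><version letter>_<mechanism>_<meteorology case>
CASE_PATTERN = re.compile(
    r"^(?P<year>\d{4})(?P<nei>[a-z])(?P<version>[a-z])"
    r"_(?P<mechanism>[a-z0-9]+)_(?P<met>[a-z0-9]+)$"
)


def parse_case(name):
    """Split a full case abbreviation into its labeled parts."""
    match = CASE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized case name: {name!r}")
    parts = match.groupdict()
    parts["year"] = int(parts["year"])
    return parts


case = parse_case("2021hb_cb6_21k")
print(case)
```

For "2021hb_cb6_21k" the sketch separates the modeled year (2021), the NEI basis letter ("h"), the platform version letter ("b"), the assumed mechanism token ("cb6"), and the meteorological case ("21k").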

CMAQ was run on a 12km modeling domain over the Continental United States. The outputs from
CMAQ provide the overall mass, chemistry, and formation for specific hazardous air pollutants (HAPs)
formed secondarily in the atmosphere (e.g., formaldehyde, acetaldehyde, and acrolein). Data files and
summaries for this platform are available from the "2021 Data Files and Summaries" link on this page of
the air emissions modeling website:
https://www.epa.gov/air-emissions-modeling/2021-emissions-modeling-platform.

This chapter contains two additional sections. Section 3.2 contains high-level information about the
inventories input to SMOKE and summaries of the emissions used for the study. Section 3.3 contains
high-level information on the emissions modeling performed to convert the inventories into the format
and resolution needed by CMAQ. Additional details on the development of the emissions inputs to
CMAQ are provided in the publication Technical Support Document (TSD): Preparation of Emissions
Inventories for the 2021 North American Emissions Modeling Platform (EPA, 2024).

-------
3.2 Emission Inventories and Approaches

This section describes the emissions inventories created for input to SMOKE, which are based on the
April 2023 version of the 2020 NEI with updates to reflect emissions in 2021. The NEI includes five main
data categories: a) nonpoint sources; b) point sources; c) nonroad mobile sources; d) onroad mobile
sources; and e) fires. For CAPs, the NEI data are largely compiled from data submitted by state, local and
tribal (S/L/T) agencies. HAP emissions data are often augmented by EPA when they are not voluntarily
submitted by S/L/T agencies. The NEI was compiled using the Emissions Inventory System (EIS). EIS
collects and stores facility inventory and emissions data for the NEI and includes hundreds of automated
QA checks to improve data quality, and it also supports release point (stack) coordinates separately from
facility coordinates. EPA collaboration with S/L/T agencies helped prevent duplication between point
and nonpoint source categories such as industrial boilers. The 2020 NEI Technical Support Document
describes in detail the development of the 2020 emission inventories and is available at
https://www.epa.gov/air-emissions-inventories/2020-national-emissions-inventory-nei-technical-support-document-tsd
(EPA, 2023).

A complete set of emissions for all source categories is developed for the NEI every three years, with
2020 being the most recent year represented with a full "triennial" NEI. S/L/T agencies are required to
submit all applicable point sources to the NEI in triennial years, including the year 2020. Because only
point source emissions were submitted by S/L/T agencies to develop the NEI for 2021, emissions for any
point sources not submitted for 2021, and not marked as shutdown, were pulled forward from the 2020
NEI. The SMARTFIRE2 system and the BlueSky Pipeline (https://github.com/pnwairfire/bluesky)
emissions modeling system were used to develop the fire emissions. SMARTFIRE2 categorizes all fires as
either prescribed burning or wildfire, and the BlueSky Pipeline system includes fuel loading,
consumption, and emission factor estimates for both types of fires. Onroad and nonroad mobile source
emissions were developed for this project using MOVES4 (https://www.epa.gov/moves).

With the exception of fire emissions, Canadian emissions were provided by Environment and Climate
Change Canada (ECCC) for the years 2020 and 2023, and most 2021 emissions were developed by
interpolating between 2020 and 2023. For Mexico, inventories from the 2019 emissions modeling
platform (EPA, 2022b) were used as the starting point with data for border states supplemented with
data for 2018 developed by SEMARNAT in collaboration with U.S. EPA.
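
The interpolation between the 2020 and 2023 inventories can be sketched as follows. Linear interpolation in time is an assumption here, since the report does not state the interpolation method, and the emission values in the example are hypothetical.

```python
def interpolate_emissions(e_start, e_end, year_start, year_end, target_year):
    """Linearly interpolate an annual emissions total to a target year.

    Assumes a straight-line change between the two inventory years.
    """
    if not year_start <= target_year <= year_end:
        raise ValueError("target year must lie between the inventory years")
    weight = (target_year - year_start) / (year_end - year_start)
    return e_start + weight * (e_end - e_start)


# Hypothetical totals (tons/yr) for one source category in one province.
e_2021 = interpolate_emissions(1200.0, 900.0, 2020, 2023, 2021)
print(round(e_2021, 3))
```

Under this assumption, 2021 sits one third of the way from the 2020 value to the 2023 value.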

The emissions modeling process was performed using SMOKE v5.0. Through this process, the emissions
inventories were apportioned into the grid cells used by CMAQ and temporally allocated into hourly
values. In addition, the pollutants in the inventories (e.g., NOx, PM, and VOC) were split into the
chemical species needed by CMAQ. For the purposes of preparing the CMAQ-ready emissions, the NEI
emissions inventories by data category were split into emissions modeling platform "sectors", and
emissions from sources other than the NEI were added, such as the Canadian, Mexican, and offshore
inventories. Emissions within the emissions modeling platform were separated into sectors for groups of
related emissions source categories that were run through the appropriate SMOKE programs, except the
final merge, independently from emissions categories in the other sectors. The final merge program,
called Mrggrid, combines low-level sector-specific gridded, speciated, and temporalized emissions to
create the final CMAQ-ready emissions inputs. For biogenic and fertilizer emissions, the CMAQ model
allows for these emissions to be included in the CMAQ-ready emissions inputs, or to be computed within
CMAQ itself (the "inline" option). This study used the option to compute biogenic emissions within the
model and the CMAQ bidirectional ammonia process to compute the fertilizer emissions.
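
Conceptually, the final merge is a cell-by-cell sum of the sector-specific gridded emissions. The sketch below illustrates that idea on small in-memory grids for a single species and hour; it is not the Mrggrid implementation, which operates on gridded emissions files, and the sector values shown are hypothetical.

```python
def merge_sectors(sector_grids):
    """Sum per-sector 2-D grids (rows x columns) cell by cell."""
    grids = list(sector_grids.values())
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    merged = [[0.0] * n_cols for _ in range(n_rows)]
    for name, grid in sector_grids.items():
        if len(grid) != n_rows or any(len(row) != n_cols for row in grid):
            raise ValueError(f"sector {name!r} does not match the grid shape")
        for r in range(n_rows):
            for c in range(n_cols):
                merged[r][c] += grid[r][c]
    return merged


# Hypothetical 2 x 2 grids for two low-level sectors.
sectors = {
    "nonpt": [[1.0, 2.0], [0.0, 0.5]],
    "rail": [[0.5, 0.0], [0.0, 1.5]],
}
merged = merge_sectors(sectors)
print(merged)
```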

Table 3-1 presents the sectors in the emissions modeling platform used to develop the year 2021
emissions for this project. The sector abbreviations follow the colon in each sector name; these abbreviations are used
in the SMOKE modeling scripts, the inventory file names, and throughout the remainder of this section.

Table 3-1. Platform Sectors Used in the Emissions Modeling Process

Platform Sector:
abbreviation

NEI Data Category

Description and resolution of the data input to SMOKE

EGU units:
ptegu

Point

2021 NEI point source EGUs, replaced with hourly
Continuous Emissions Monitoring System (CEMS) values for
NOx and SO2, and the remaining pollutants temporally
allocated according to CEMS heat input where the units are
matched to the NEI. Emissions for all sources not matched
to CEMS data come from 2021 NEI point inventory. Annual
resolution for sources not matched to CEMS data, hourly
for CEMS sources. EGUs closed in 2021 are not part of the
inventory.

Point source oil and gas:
pt_oilgas

Point

2021 NEI point sources that include oil and gas production
emissions processes for facilities with North American
Industry Classification System (NAICS) codes related to Oil
and Gas Extraction, Natural Gas Distribution, Drilling Oil and
Gas Wells, Support Activities for Oil and Gas Operations,
Pipeline Transportation of Crude Oil, and Pipeline
Transportation of Natural Gas. Includes U.S. offshore oil
production.

Aircraft and ground
support equipment:
airports

Point

2021 NEI point source emissions from airports, including
aircraft and airport ground support emissions projected to
2021 based on the 2022 Terminal Area Forecast (TAF).
Annual resolution.

Remaining non-EGU point:
ptnonipm

Point

All 2021 NEI point source records not matched to the
airports, ptegu, or pt_oilgas sectors. Includes 2020 NEI rail
yard emissions projected to 2021. Annual resolution.

Livestock:
livestock

Nonpoint

2021 nonpoint livestock emissions developed using a
similar method to 2020 NEI but with adjusted animal
counts and using 2021 meteorology. Livestock includes
ammonia and other pollutants (except PM2.5). County and
annual resolution.

Agricultural Fertilizer:
fertilizer

Nonpoint

2021 agricultural fertilizer ammonia emissions computed
inline within CMAQ.

Area fugitive dust:
afdust_adj

Nonpoint

PM10 and PM2.5 fugitive dust sources from the 2020 NEI
nonpoint inventory; including building construction, road
construction, agricultural dust, and paved and unpaved
road dust where paved and unpaved road dust were
adjusted to 2021 based on VMT differences. The emissions
modeling system applies a transportable fraction reduction
and zero-out adjustments based on the year-specific
gridded hourly meteorology (precipitation and snow/ice
cover). Emissions are county and annual resolution.

Biogenic:
beis

Nonpoint

Year 2021 emissions from biogenic sources. These were left
out of the CMAQ-ready merged emissions, in favor of inline
biogenic emissions produced during the CMAQ model run
itself. Version 4 of the Biogenic Emissions Inventory System
(BEIS) was used with Version 6 of the Biogenic Emissions
Landuse Database (BELD6). These CMAQ-generated
emissions are similar to the 2021 biogenic emissions
generated through running SMOKE, but they are not
exactly the same.

Category 1, 2 CMV:
cmv_c1c2

Nonpoint

2021 Category 1 (C1) and Category 2 (C2) commercial
marine vessel (CMV) emissions based on 2021 Automatic
Identification System (AIS) data categorized using SCCs
specific to ship type. Point and hourly resolution.

Category 3 CMV:
cmv_c3

Nonpoint

2021 Category 3 (C3) commercial marine vessel (CMV)
emissions based on 2021 AIS data categorized using SCCs
specific to ship type. Point and hourly resolution.

Locomotives:
rail

Nonpoint

Line haul rail locomotives emissions from 2020 NEI
projected to 2021 using 5 percent growth based on Annual
Energy Outlook (AEO) changes from 2020 to 2021. County
and annual resolution.

Nonpoint source oil and
gas: np_oilgas

Nonpoint

Nonpoint emissions from oil and gas-related processes for
2021 computed using activity data from 2021. County and
annual resolution.

Residential Wood Combustion:
rwc

Nonpoint

2020 NEI nonpoint sources with residential wood
combustion (RWC) processes, projected to 2021 with state-
level adjustment factors derived from the State Energy
Data System (SEDS). County and annual resolution.

Solvents: np_solvents

Nonpoint

Emissions of solvents for 2021 based on methods used for
the 2020 NEI (Seltzer, 2021). Includes household cleaners,
personal care products, adhesives, architectural and
aerosol coatings, printing inks, and pesticides. Annual and
county resolution.

Remaining nonpoint:
nonpt

Nonpoint

2020 NEI nonpoint sources not included in other platform
sectors. County and annual resolution.

Nonroad:
nonroad

Nonroad

2021 nonroad equipment emissions developed with
MOVES4, including the updates made to spatial
apportionment that were developed with the 2016v1
platform. MOVES4 was used for all states except California,
which submitted their own emissions for 2020 and 2023
from which an interpolation to 2021 was performed.
County and monthly resolution.

Onroad:
onroad

Onroad

Onroad mobile source gasoline and diesel vehicles from
parking lots and moving vehicles for 2021 developed using
VMT data from 2020 NEI projected to 2021 using factors
based on FHWA VM-2 data. Includes the following emission
processes: exhaust, extended idle, auxiliary power units,
evaporative, permeation, refueling, vehicle starts, off
network idling, long-haul truck hoteling, and brake and tire
wear. MOVES4 was run for 2021 to generate year-specific
emission factors.

Onroad California:
onroad_ca_adj

Onroad

California-provided 2020 and 2023 CAPs that were
interpolated to 2021. HAPs speciated from CAPs. Onroad
mobile source gasoline and diesel vehicles from parking lots
and moving vehicles based on Emission Factor (EMFAC),
gridded and temporalized based on outputs from MOVES4.

Point source agricultural
fires: ptagfire

Nonpoint

Agricultural fire sources for 2021 developed by EPA as point
and day-specific emissions (see footnote 7). Only EPA-developed data were
used in this study, thus 2020 NEI state submissions are not
included. Agricultural fires are in the nonpoint data
category of the NEI, but in the modeling platform, they are
treated as day-specific point sources. Updated HAP-
augmentation factors were applied.

Point source prescribed
fires: ptfire-rx

Nonpoint

Point source day-specific prescribed fires for 2021
computed using SMARTFIRE 2 and BlueSky Pipeline. The
ptfire emissions were run as two separate sectors: ptfire-rx
(prescribed, including Flint Hills / grasslands) and ptfire-
wild.

Point source wildfires:
ptfire-wild

Nonpoint

Point source day-specific wildfires for 2021 computed using
SMARTFIRE 2 and BlueSky Pipeline.

Non-U.S. fires:
ptfire_othna

N/A

Point source day-specific wildfires and agricultural fires
outside of the U.S. for 2021. Canadian fires were computed
using SMARTFIRE 2 and BlueSky Pipeline. Mexico,
Caribbean, Central American, and other international fires,
are from v2.5 of the Fire INventory (FINN) from National
Center for Atmospheric Research (Wiedinmyer, C., 2023).

7 Only EPA-developed agricultural fire data were included in this study; data submitted by states to the NEI were excluded.

Canada Area Fugitive dust sources:
canada_afdust

N/A

Area fugitive dust sources from ECCC for 2021 (interpolated
between provided 2020 and 2023 emissions) with transport
fraction and snow/ice adjustments based on 2021
meteorological data. Annual and province resolution.

Canada Point Fugitive dust sources:
canada_ptdust

N/A

Point source fugitive dust sources from ECCC for 2021
(interpolated between provided 2020 and 2023 emissions)
with transport fraction and snow/ice adjustments based on
2021 meteorological data. Monthly and province
resolution.

Canada and Mexico
stationary point sources:
canmex_point

N/A

Canada and Mexico point source emissions not included in
other sectors. Canada point sources were provided by ECCC
for 2020 and 2023, and interpolated to 2021. Mexico point
source emissions for border states represent 2018 and
were developed by SEMARNAT in collaboration with EPA,
while emissions for all other states were carried forward
from 2019ge (EPA, 2022b). Annual and monthly resolution.

Canada and Mexico
agricultural sources:
canmex_ag

N/A

Canada and Mexico agricultural emissions. Canada
emissions were provided by ECCC for 2020 and 2023, and
interpolated to 2021. Mexico agricultural emissions were
provided by SEMARNAT and include updated emissions for
border states representing 2018 developed by SEMARNAT
in collaboration with EPA, while emissions for all other
states were carried forward from 2019ge. Annual
resolution.

Canada low-level oil and
gas sources:
canada_og2D

N/A

Canada emissions from upstream oil and gas, provided by
ECCC for 2020 and 2023, and interpolated to 2021. This
sector contains the portion of oil and gas emissions which
are not subject to plume rise. The rest of the Canada oil and
gas emissions are in the canmex_point sector. Annual
resolution.

Canada and Mexico
nonpoint and nonroad
sources:
canmex_area

N/A

Canada and Mexico nonpoint source emissions not
included in other sectors. Canada: ECCC provided
surrogates and 2020 and 2023 inventories, that were
interpolated to 2021. Mexico: include updated emissions
for border states representing 2018 developed by
SEMARNAT in collaboration with EPA, while emissions for
all other states were carried forward from 2019ge. Annual
and monthly resolution.

Canada onroad sources:
canada_onroad

N/A

Canada onroad emissions. 2020 and 2023 Canada
inventories provided by ECCC and interpolated to 2021;
processed using updated surrogates. Province and monthly
resolution.

Mexico onroad sources:
mexico_onroad

N/A

Mexico onroad emissions. 2020 and 2023 emissions output
from MOVES-Mexico were interpolated to 2021. Municipio
and monthly resolution.

Ocean chlorine emissions were also merged in with the above sectors. The ocean chlorine gas emission
estimates are based on the build-up of molecular chlorine (Cl2) concentrations in oceanic air masses
(Bullock and Brehme, 2002). Ocean chlorine data at 12 km resolution were available from earlier studies
and were not modified other than the name "CHLORINE" was changed to "CL2" because that is the
name required by the CMAQ model.

The emission inventories in SMOKE input formats for the platform are available from EPA's Air Emissions
Modeling website: https://www.epa.gov/air-emissions-modeling/2021-emissions-modeling-platform.
The platform informational text file indicates the zipped files associated with each platform sector. Some
emissions data summaries are available with the data files for the 2021 platform. The types of reports
include state summaries of inventory pollutants and model species by modeling platform sector and
county annual totals by modeling platform sector.

Annual summaries of the emissions in the Contiguous U.S. and emissions within the 12-km domain but
outside of the U.S. are shown in Table 3-2 and Table 3-3, respectively. State total emissions for each
sector are provided in Appendix B, a workbook entitled
"Appendix_B_2021_emissions_totals_by_sector.xlsx".

Table 3-2. 2021 Contiguous United States Emissions by Sector (short tons/yr in 48 states + D.C.)

Sector                  CO         NH3         NOX        PM10       PM2_5         SO2         VOC
afdust_adj                                           6,027,656     821,738
airports           333,660           0      83,674       8,521       7,533       9,126      50,041
cmv_c1c2            19,892          68     134,167       3,662       3,548         615       5,116
cmv_c3              10,252          44      81,846       2,507       2,307       5,767       4,687
fertilizer                   1,275,333
livestock                    2,824,644                                                     225,971
nonpt            2,173,885     145,073     723,480     711,625     622,905     106,258   1,005,040
nonroad         11,037,304       1,998     816,810      80,205      75,312         917     945,175
np_oilgas          654,275          43     728,663      14,048      13,880     139,514   2,876,480
np_solvents              0           0           0           0           0           0   2,716,884
onroad          14,391,846     183,954   2,258,178     188,833      74,375       8,748   1,039,569
ptegu              467,560      21,482     879,533     125,564     109,306     968,652      26,731
ptagfire           773,523     172,492      33,830     114,547      74,469      13,729     125,668
ptfire-rx        7,825,125      68,537     125,890   1,267,230   1,131,103      80,356   1,586,259
ptfire-wild     17,682,184     178,672     163,750   3,826,054   2,386,263     166,480   4,865,824
ptnonipm         1,226,638      61,712     793,938     350,108     228,948     456,300     726,236
pt_oilgas          174,223       9,095     318,687      12,460      11,916      31,186     195,937
rail                96,705         296     444,124      11,360      10,982         369      18,367
rwc              2,940,341      22,616      44,790     448,615     446,995      11,894     453,043
beis             3,314,764                 989,492                                      28,539,802
CONUS + beis    63,122,176   4,966,059   8,620,851  13,192,993   6,021,580   1,999,910  45,406,832
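
As a simple consistency check on Table 3-2, the sector values in a column can be summed and compared with the "CONUS + beis" row. The sketch below does this for the NH3 column, using the values reported in the table; sectors with blank NH3 entries are omitted.

```python
# NH3 emissions by sector (short tons/yr), copied from Table 3-2.
nh3_by_sector = {
    "airports": 0,
    "cmv_c1c2": 68,
    "cmv_c3": 44,
    "fertilizer": 1_275_333,
    "livestock": 2_824_644,
    "nonpt": 145_073,
    "nonroad": 1_998,
    "np_oilgas": 43,
    "np_solvents": 0,
    "onroad": 183_954,
    "ptegu": 21_482,
    "ptagfire": 172_492,
    "ptfire-rx": 68_537,
    "ptfire-wild": 178_672,
    "ptnonipm": 61_712,
    "pt_oilgas": 9_095,
    "rail": 296,
    "rwc": 22_616,
}

REPORTED_NH3_TOTAL = 4_966_059  # "CONUS + beis" row of Table 3-2

nh3_sum = sum(nh3_by_sector.values())
print(nh3_sum, nh3_sum == REPORTED_NH3_TOTAL)
```

The same check applied to other columns may differ from the reported totals by a few tons because the table entries are rounded.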

Table 3-3. Non-U.S. Emissions by Sector within the 12US1 Modeling Domain (short tons/yr)

Sector                            CO         NH3         NOX        PM10       PM2_5         SO2         VOC
Canada ag                                500,395                   6,562       1,875                 124,257
Canada oil and gas 2D                          8                                                     306,206
Canada afdust                                                  1,028,722     194,713
Canada ptdust                                                      3,588         443
Canada area                2,040,850       5,983     317,182     184,382     134,440      14,175     711,153
Canada onroad              1,669,722       6,994     356,236      24,858      13,378         893     118,094
Canada point               1,021,439      18,569     538,357     112,670      42,409     483,703     148,235
Canada fires              18,068,782     259,108     302,681   3,543,123   3,141,541     173,644   5,070,468
Canada cmv_c1c2                3,179          10      20,497         541         525          64         720
Canada cmv_c3                  7,750          27      60,418       1,498       1,378       3,331       3,773
Mexico ag                                137,778                  53,862      11,638
Mexico area                   98,400      26,201      57,960      42,108      20,576      21,937     425,809
Mexico onroad              1,418,503       2,509     350,527      13,377       9,349       5,778     127,181
Mexico point                 158,097         979     199,367      90,822      53,973     341,038      32,822
Mexico fires                 415,564       6,820      24,903      54,701      45,743       4,240     204,334
Mexico cmv_c1c2                  157           0       1,016          27          26           4          42
Mexico cmv_c3                  9,601          87      82,079       4,907       4,514      12,970       4,596
Offshore cmv_c1c2              4,445          14      28,377         743         721          88       1,065
Offshore cmv_c3               51,349         309     414,286      17,467      16,069      43,957      25,126
Offshore pt_oilgas            28,548           5      34,658         422         416         321      31,400
Can/Mex/offshore total    24,996,385     965,797   2,788,544   5,184,380   3,693,725   1,106,143   7,335,281

3.3 Emissions Modeling Summary

The CMAQ air quality model requires hourly emissions of specific gas and particle species for the
horizontal and vertical grid cells contained within the modeled region (i.e., modeling domain). To
provide emissions in the form and format required by the model, it is necessary to "pre-process" the
"raw" emissions (i.e., emissions input to SMOKE) for the sectors described above. In brief, the process of
emissions modeling transforms the emissions inventories from their original temporal resolution,
pollutant resolution, and spatial resolution into the hourly, speciated, gridded, and vertical resolution
required by the air quality model. Emissions modeling includes temporal allocation, spatial allocation,
and pollutant speciation. Emissions modeling sometimes includes the vertical allocation (i.e., plume rise)
of point sources, but many air quality models also perform this task because it greatly reduces the size of
the input emissions files if the vertical layers of the sources are not included.
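
The spatial and temporal allocation steps described above can be sketched in a few lines. This is a simplified illustration, not SMOKE code: the function names, the surrogate fractions, and the diurnal weights are hypothetical, and speciation and plume rise are left out.

```python
def allocate_to_grid(county_total, surrogate_fractions):
    """Split a county total across grid cells using surrogate fractions."""
    if abs(sum(surrogate_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("surrogate fractions must sum to 1")
    return {cell: county_total * frac
            for cell, frac in surrogate_fractions.items()}


def allocate_to_hours(daily_total, hourly_weights):
    """Split a daily total across 24 hours using relative weights."""
    if len(hourly_weights) != 24:
        raise ValueError("expected 24 hourly weights")
    total_weight = sum(hourly_weights)
    return [daily_total * w / total_weight for w in hourly_weights]


# Hypothetical county: 120 tons spread over three (col, row) grid cells.
gridded = allocate_to_grid(120.0, {(10, 4): 0.5, (10, 5): 0.3, (11, 4): 0.2})

# Hypothetical diurnal shape: daytime hours weighted three times nighttime.
weights = [1] * 6 + [3] * 12 + [1] * 6
hourly = allocate_to_hours(gridded[(10, 4)], weights)
print(sum(gridded.values()), sum(hourly))
```

Both steps only redistribute mass: the gridded values sum to the county total, and the hourly values sum to the daily total for the cell.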

The temporal resolutions of the emissions inventories input to SMOKE vary across sectors and may be
hourly, daily, monthly, or annual total emissions. The spatial resolution may be individual point sources;
totals by county (U.S.), province (Canada), or municipio (Mexico); or gridded emissions. This section
provides some basic information about the tools and data files used for emissions modeling as part of
the modeling platform.
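
For individual point sources, spatial allocation amounts to finding the grid cell that contains the source. The sketch below assumes the source coordinates have already been projected into the grid's coordinate system (the map projection step is omitted) and uses a hypothetical grid origin and size; only the 12-km cell size comes from this report.

```python
import math


def grid_cell(x_m, y_m, x_orig_m, y_orig_m, cell_m, n_cols, n_rows):
    """Return the 1-based (col, row) containing a projected point, or None."""
    col = math.floor((x_m - x_orig_m) / cell_m)
    row = math.floor((y_m - y_orig_m) / cell_m)
    if 0 <= col < n_cols and 0 <= row < n_rows:
        return col + 1, row + 1
    return None  # the source lies outside the modeling domain


# Hypothetical grid: origin at (0, 0) m, 12-km cells, 100 columns x 80 rows.
cell = grid_cell(30_500.0, 13_000.0, 0.0, 0.0, 12_000.0, 100, 80)
print(cell)
```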

3.3.1	The SMOKE Modeling System

SMOKE version 5.0 was used to process the raw emissions inventories for each modeling sector into
emissions inputs in a format compatible with CMAQ. SMOKE executables and source code are
available from the Community Modeling and Analysis System (CMAS) Center at
http://www.cmascenter.org. Additional information about SMOKE is available from
http://www.smoke-model.org. For sectors that have plume rise, the in-line plume rise capability allows for the use of
emissions files that are much smaller than full three-dimensional gridded emissions files. For quality
assurance of the emissions modeling steps, emissions totals by species for the entire model domain are
output as reports that are then compared to reports generated by SMOKE on the input inventories to
ensure that mass is not lost or gained during the emissions modeling process.
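
The mass-conservation comparison described above can be sketched as a check of domain totals before and after processing. The relative tolerance, function name, and totals below are hypothetical; the actual SMOKE reports and their formats are not reproduced here.

```python
def mass_balance_report(inventory_totals, model_ready_totals, rel_tol=1e-3):
    """Compare domain totals by species before and after emissions modeling.

    rel_tol is an assumed acceptance threshold, not a value from the report.
    """
    report = {}
    for species in sorted(set(inventory_totals) | set(model_ready_totals)):
        before = inventory_totals.get(species, 0.0)
        after = model_ready_totals.get(species, 0.0)
        if before:
            rel_diff = abs(after - before) / before
        else:
            rel_diff = 0.0 if after == 0 else float("inf")
        report[species] = {"before": before, "after": after,
                           "ok": rel_diff <= rel_tol}
    return report


# Hypothetical domain totals (tons) for two inventory pollutants.
report = mass_balance_report({"NOX": 1000.0, "CO": 5000.0},
                             {"NOX": 1000.2, "CO": 4900.0})
for species, row in report.items():
    print(species, row)
```

In this example the NOX totals agree within the assumed tolerance, while the CO totals differ by 2 percent and would be flagged for follow-up.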

3.3.2	Key Emissions Modeling Settings

When preparing emissions for the air quality model, emissions for each sector are processed separately
through SMOKE, and then the final merge program (Mrggrid) is run to combine the model-ready, sector-
specific 2-D gridded emissions across sectors. The SMOKE settings in the run scripts and the data in the
SMOKE ancillary files control the approaches used by the individual SMOKE programs for each sector.
Table 3-4 summarizes the major processing steps of each platform sector with the columns as follows.

The "Spatial" column shows the spatial approach used: "point" indicates that SMOKE maps the source
from a point location (i.e., latitude and longitude) to a grid cell; "surrogates" indicates that some or all of
the sources use spatial surrogates to allocate county emissions to grid cells; and "area-to-point"
indicates that some of the sources use the SMOKE area-to-point feature to grid the emissions.

The "Speciation" column indicates that all sectors use the SMOKE speciation step, though biogenics
speciation is done within the Tmpbeis3 program and not as a separate SMOKE step.

The "Inventory resolution" column shows the inventory temporal resolution from which SMOKE needs
to calculate hourly emissions. Note that for some sectors (e.g., onroad, beis), there is no input inventory;
instead, activity data and emission factors are used in combination with meteorological data to compute
hourly emissions.

Finally, the "plume rise" column indicates the sectors for which the "in-line" approach is used. These
sectors are the only ones with emissions in aloft layers based on plume rise. The term "in-line" means
that the plume rise calculations are done inside of the air quality model instead of being computed by
SMOKE. In all of the "in-line" sectors, all sources are output by SMOKE into point source files which are
subject to plume rise calculations in the air quality model. In other words, no emissions are output to
layer 1 gridded emissions files from those sectors as has been done in past platforms. The air quality
model computes the plume rise using stack parameters, the Briggs algorithm, and the hourly emissions
in the SMOKE output files for each emissions sector. The height of the plume rise determines the model
layers into which the emissions are placed. The plume top and bottom are computed, along with the
plumes' distributions into the vertical layers that the plumes intersect. The pressure difference across
each layer divided by the pressure difference across the entire plume is used as a weighting factor to
assign the emissions to layers. This approach gives plume fractions by layer and source. Day-specific
point fire emissions are treated differently in CMAQ. After plume rise is applied, there are emissions in
every layer from the ground up to the top of the plume.
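
The pressure-based weighting described above can be written as w_k = Δp_k / Δp_plume, where Δp_k is the pressure difference across the part of layer k that the plume occupies and Δp_plume is the pressure difference across the whole plume. The sketch below applies this weighting to a hypothetical set of layer interfaces. It is not the CMAQ implementation: the plume top and bottom are taken as given rather than computed with the Briggs algorithm, and the special treatment of fire emissions is not shown.

```python
def plume_layer_fractions(layer_edges_pa, plume_bottom_pa, plume_top_pa):
    """Fraction of a plume assigned to each model layer.

    layer_edges_pa lists interface pressures from the surface upward, so the
    values decrease. The plume bottom has a higher pressure than the top.
    """
    depth = plume_bottom_pa - plume_top_pa
    if depth <= 0:
        raise ValueError("plume bottom pressure must exceed plume top pressure")
    fractions = []
    for lower, upper in zip(layer_edges_pa, layer_edges_pa[1:]):
        # Pressure difference across the part of this layer inside the plume.
        overlap = min(lower, plume_bottom_pa) - max(upper, plume_top_pa)
        fractions.append(max(0.0, overlap) / depth)
    return fractions


# Hypothetical interfaces (Pa) for four layers and a plume spanning two of them.
edges = [100_000, 99_000, 97_500, 95_000, 92_000]
fractions = plume_layer_fractions(edges, plume_bottom_pa=98_500,
                                  plume_top_pa=96_000)
print(fractions)
```

Multiplying a source's hourly emissions by these fractions gives the emissions placed in each layer; the fractions sum to one when the plume lies entirely within the modeled layers.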

Table 3-4. Key emissions modeling steps by sector

| Platform sector | Spatial | Speciation | Inventory resolution | Plume rise |
|---|---|---|---|---|
| afdust_adj | Surrogates | Yes | Annual | |
| airports | Point | Yes | Annual | None |
| beis | Pre-gridded land use | in BEIS4 | Computed hourly in CMAQ | |
| fertilizer | EPIC | | Computed hourly in CMAQ | |
| livestock | Surrogates | Yes | Daily | |
| cmv_c1c2 | Point | Yes | Hourly | in-line |
| cmv_c3 | Point | Yes | Hourly | in-line |
| nonpt | Surrogates & area-to-point | Yes | Annual | |
| nonroad | Surrogates | Yes | Monthly | |
| np_oilgas | Surrogates | Yes | Annual | |
| onroad | Surrogates | Yes | Monthly activity, computed hourly | |
| onroad_ca_adj | Surrogates | Yes | Monthly activity, computed hourly | |
| canada_onroad | Surrogates | Yes | Monthly | |
| mexico_onroad | Surrogates | Yes | Monthly | |
| canada_afdust | Surrogates | Yes | Annual & monthly | |
| canmex_area | Surrogates | Yes | Monthly | |
| canmex_point | Point | Yes | Monthly | in-line |
| canada_ptdust | Point | Yes | Annual | None |
| canada_og2D | Point | Yes | Monthly | None |
| canmex_ag | Surrogates | Yes | Annual | |
| ptagfire | Point | Yes | Daily | in-line |
| pt_oilgas | Point | Yes | Annual | in-line |
| ptegu | Point | Yes | Daily & hourly | in-line |
| ptfire-rx | Point | Yes | Daily | in-line |
| ptfire-wild | Point | Yes | Daily | in-line |
| ptfire_othna | Point | Yes | Daily | in-line |
| ptnonipm | Point | Yes | Annual | in-line |
| rail | Surrogates | Yes | Annual | |
| rwc | Surrogates | Yes | Annual | |
| np_solvents | Surrogates | Yes | Annual | |

Note that SMOKE has the option of grouping sources so that they are treated as a single stack when
computing plume rise. For the modeling cases discussed in this document, no grouping was performed
because grouping combined with "in-line" processing does not give results identical to "offline"
processing (i.e., when SMOKE creates 3-dimensional files). The differences arise when stacks with
different stack parameters or latitudes and longitudes are grouped, thereby changing the parameters of
one or more sources. The most straightforward way to get the same results from in-line and offline
processing is to avoid the use of stack grouping.

Biogenic emissions can be modeled two different ways in the CMAQ model. The BEIS model in SMOKE
can produce gridded biogenic emissions that are then included in the gridded CMAQ-ready emissions
inputs, or alternatively, CMAQ can be configured to create "in-line" biogenic emissions within CMAQ
itself. For this study, the in-line biogenic emissions option was used, and thus biogenic emissions from
BEIS were not included in the gridded CMAQ-ready emissions.

-------
3.3.3 Spatial Configuration

For this study, SMOKE was run for the larger 12-km CONtinental United States (CONUS) modeling
domain (12US1) shown in Figure 3-1, but the air quality model was run on the smaller 12-km domain
(12US2). The grid used a Lambert-Conformal projection, with Alpha = 33, Beta = 45 and Gamma = -97,
with a center of X = -97 and Y = 40. Later sections provide details on the spatial surrogates and
area-to-point data used to accomplish spatial allocation with SMOKE. WRF, SMOKE, and
CMAQ all presume the Earth is a sphere with a radius of 6370000 m.
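
The projection parameters above can be illustrated with a minimal sketch of the spherical Lambert conformal conic forward equations. It assumes that Alpha and Beta are the two standard parallels, Gamma is the central meridian, and (X, Y) is the longitude and latitude of the projection origin, all in degrees. The function name `lcc_forward` is hypothetical and is not part of WRF, SMOKE, or CMAQ.

```python
import math

EARTH_RADIUS_M = 6370000.0  # spherical Earth radius stated in the text


def lcc_forward(lon_deg, lat_deg, lat1=33.0, lat2=45.0,
                lon0=-97.0, lat0=40.0, radius=EARTH_RADIUS_M):
    """Project lon/lat (degrees) to Lambert conformal x/y (meters) on a sphere.

    Assumed mapping: lat1/lat2 = Alpha/Beta, lon0 = Gamma, lat0 = center Y.
    """
    phi, phi0 = math.radians(lat_deg), math.radians(lat0)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)

    def t(p):
        return math.tan(math.pi / 4.0 + p / 2.0)

    # Cone constant and scale factor for two standard parallels
    n = math.log(math.cos(phi1) / math.cos(phi2)) / math.log(t(phi2) / t(phi1))
    f = math.cos(phi1) * t(phi1) ** n / n
    rho = radius * f / t(phi) ** n      # radius of the parallel through the point
    rho0 = radius * f / t(phi0) ** n    # radius of the parallel through the origin
    theta = n * math.radians(lon_deg - lon0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)
```

With these defaults the projection origin (-97, 40) maps to x = 0, y = 0, and points east of the central meridian have positive x.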

Figure 3-1. CMAQ Modeling Domain.

3.3.4 Chemical Speciation Configuration

Chemical speciation involves the process of translating emissions from the inventory into the chemical
mechanism-specific "model species" needed by an air quality model. Using the CB6R5_AE7 chemical
mechanism as an example, these model species either represent explicit chemical compounds (e.g.,
acetone, benzene, ethanol) or groups of species (i.e., "lumped species;" e.g., PAR, OLE, KET). Table 3-5
lists the model species produced by SMOKE in the platform for the mechanism used for this study.

-------
Table 3-5. Emission model species produced for CB6R5_AE7 for CMAQ

| Inventory Pollutant | Model Species | Model species description |
|---|---|---|
| Cl2 | CL2 | Atomic gas-phase chlorine |
| HCl | HCL | Hydrogen chloride (hydrochloric acid) gas |
| CO | CO | Carbon monoxide |
| NOx | NO | Nitrogen oxide |
| NOx | NO2 | Nitrogen dioxide |
| NOx | HONO | Nitrous acid |
| SO2 | SO2 | Sulfur dioxide |
| SO2 | SULF | Sulfuric acid vapor |
| NH3 | NH3 | Ammonia |
| NH3 | NH3_FERT | Ammonia from fertilizer |
| VOC | AACD | Acetic acid |
| VOC | ACET | Acetone |
| VOC | ALD2 | Acetaldehyde |
| VOC | ALDX | Propionaldehyde and higher aldehydes |
| VOC | APIN | Alpha pinene |
| VOC | BENZ | Benzene |
| VOC | CAT1 | Methyl-catechols |
| VOC | CH4 | Methane |
| VOC | CRES | Cresols |
| VOC | CRON | Nitro-cresols |
| VOC | ETH | Ethene |
| VOC | ETHA | Ethane |
| VOC | ETHY | Ethyne |
| VOC | ETOH | Ethanol |
| VOC | FACD | Formic acid |
| VOC | FORM | Formaldehyde |
| VOC | GLY | Glyoxal |
| VOC | GLYD | Glycolaldehyde |
| VOC | IOLE | Internal olefin carbon bond (R-C=C-R) |
| VOC | ISOP | Isoprene |
| VOC | ISPD | Isoprene product |
| VOC | IVOC | Intermediate volatility organic compounds |
| VOC | KET | Ketone groups |
| VOC | MEOH | Methanol |
| VOC | MGLY | Methylglyoxal |
| VOC | NAPH | Naphthalene |
| VOC | NVOL | Non-volatile compounds |
| VOC | OLE | Terminal olefin carbon bond (R-C=C) |
| VOC | PACD | Peroxyacetic and higher peroxycarboxylic acids |
| VOC | PAR | Paraffin carbon bond |
| VOC | PRPA | Propane |
| VOC | SESQ | Sesquiterpenes (from biogenics only) |
| VOC | SOAALK | Secondary Organic Aerosol (SOA) tracer |
| VOC | TERP | Terpenes (from biogenics only) |
| VOC | TOL | Toluene and other monoalkyl aromatics |
| VOC | UNR | Unreactive |
| VOC | XYLMN | Xylene and other polyalkyl aromatics, minus naphthalene |
| Naphthalene | NAPH | Naphthalene from inventory |
| Benzene | BENZ | Benzene from the inventory |
| Acetaldehyde | ALD2 | Acetaldehyde from inventory |
| Formaldehyde | FORM | Formaldehyde from inventory |
| Methanol | MEOH | Methanol from inventory |
| PM10 | PMC | Coarse PM > 2.5 microns and < 10 microns |
| PM2.5 | PEC | Particulate elemental carbon < 2.5 microns |
| PM2.5 | PNO3 | Particulate nitrate < 2.5 microns |
| PM2.5 | POC | Particulate organic carbon (carbon only) < 2.5 microns |
| PM2.5 | PSO4 | Particulate sulfate < 2.5 microns |
| PM2.5 | PAL | Aluminum |
| PM2.5 | PCA | Calcium |
| PM2.5 | PCL | Chloride |
| PM2.5 | PFE | Iron |
| PM2.5 | PK | Potassium |
| PM2.5 | PH2O | Water |
| PM2.5 | PMG | Magnesium |
| PM2.5 | PMN | Manganese |
| PM2.5 | PMOTHR | PM2.5 not in other AE6 species |
| PM2.5 | PNA | Sodium |
| PM2.5 | PNCOM | Non-carbon organic matter |
| PM2.5 | PNH4 | Ammonium |
| PM2.5 | PSI | Silica |
| PM2.5 | PTI | Titanium |

The TOG and PM2.5 profiles used to speciate emissions are part of the SPECIATE v5.2 database
(https://www.epa.gov/air-emissions-modeling/speciate). The SPECIATE database is developed and
maintained by the EPA's Office of Research and Development (ORD), Office of Transportation and Air
Quality (OTAQ), and the Office of Air Quality Planning and Standards (OAQPS), in cooperation with
Environment Canada (EPA, 2019). These profiles are processed using the EPA's S2S-Tool
(https://github.com/USEPA/S2S-Tool) to generate the GSPRO and GSCNV files needed by SMOKE. As
with previous platforms, some Canadian point source inventories are provided from Environment
Canada as pre-speciated emissions.

Speciation profiles (GSPRO files) and cross-references (GSREF files) for this platform are available in the
SMOKE input files for the platform. VOC and PM2.5 emissions by county, sector, and profile
for all sectors other than onroad mobile can be found in the sector summaries. Total emissions for each
model species by state and sector can be found in the state-sector totals workbook.

-------
The following updates to profile assignments were made to this modeling platform and differ from prior
years:

• For PM2.5:

o All GSPRO files were generated by the S2S-Tool, dated 09-11-2023, and utilized SPECIATE
v5.3.

o Update of the CMV speciation cross-reference files to utilize the SCC updates for this
sector and use the new CROC profiles introduced in SPECIATE v5.3.

o Update onroad and nonroad mobile cross-reference files to utilize the CROC profiles
introduced in SPECIATE v5.3.

• For VOC:

o All GSPRO and GSCNV files were generated by the S2S-Tool, dated 09-11-2023, and
utilized SPECIATE v5.3.

o Profile assignments for oil and gas well completions and abandoned wells were updated (or
added, in the case of abandoned wells) from 1101 and 8949 to 95404 and 95403,
respectively. However, this update was not performed for basin-specific profiles that
were output by the O&G Tool.

o Update of the CMV speciation cross-reference files to utilize the SCC updates for this
sector and use the new GROC profiles introduced in SPECIATE v5.3.

o Update usage of 95120a to 95120c.

o Update onroad and nonroad mobile cross-reference files to utilize the GROC profiles
introduced in SPECIATE v5.3.

The base emissions inventory for this modeling platform includes total VOC and individual HAP
emissions. Often, individual HAPs are components of VOC (HAP-VOC), and these HAP-VOCs are included
("integrated") in the speciation process. This HAP integration is performed in a way to ensure double
counting of emitted mass does not occur and requires specific data processing by the S2S-Tool and user
input in SMOKE.

To incorporate HAP emissions from the base inventory into the modeling platform, one of two methods
is used. (1) In the "Integrate, HAP-use" method, the mass of integrated HAP-VOCs is summed
and subtracted from VOC, and the residual mass (NONHAPVOC) is speciated using a renormalized
speciation profile that does not include the integrated HAP-VOCs (they are subtracted from the profile
and then the profile is renormalized to 100%). (2) In the "No-Integrate, HAP-use" method, the mass of
VOC is speciated using a speciation profile that does not include the integrated HAP-VOCs (they are
subtracted from the profile and the profile is not renormalized to 100%). In this scenario, the HAP-VOC
and VOC portions of the inventory are difficult to harmonize, and it is assumed that the proportions of
HAPs from these sources are adequately captured in the speciation profile used to speciate the VOC
emissions (which is why there is no renormalization). In addition, HAPs can be introduced into a
modeling platform using speciation profiles alone. In this third scenario, HAP-VOC emissions are
"generated" through VOC speciation and are not incorporated from the base inventory. This method is
called "Criteria" speciation. The integration methods used for each platform sector are shown in Table 3-6.

-------
Table 3-6. Integration status for each platform sector

| Platform Sector | Approach for Integrating NEI emissions of Naphthalene (N), Benzene (B), Acetaldehyde (A), Formaldehyde (F) and Methanol (M) |
|---|---|
| afdust | N/A - sector contains no VOC |
| airports | No integration, use NBAFM in inventory |
| beis | N/A - sector contains no inventory pollutant "VOC"; but rather specific VOC species |
| cmv_c1c2 | No integration, no NBAFM in inventory, create NBAFM from VOC speciation |
| cmv_c3 | No integration, no NBAFM in inventory, create NBAFM from VOC speciation |
| fertilizer | N/A - sector contains no VOC |
| livestock | Full integration (NBAFM) |
| nonpt | Partial integration (NBAFM) |
| nonroad | Full integration (internal to MOVES) |
| np_oilgas | Partial integration (NBAFM) |
| onroad | Full integration (internal to MOVES) |
| canada_onroad | No integration, no NBAFM in inventory, create NBAFM from VOC speciation |
| mexico_onroad | Full integration (internal to MOVES-Mexico); however, MOVES-MEXICO speciation was older CB6, so post-SMOKE emissions were converted to CB6R3AE6 |
| canada_afdust | N/A - sector contains no VOC |
| canmex_area | No integration, no NBAFM in inventory, create NBAFM from VOC speciation |
| canmex_point | No integration, no NBAFM in inventory, create NBAFM from VOC speciation |
| canada_ptdust | N/A - sector contains no VOC |
| canada_og2D | No integration, no NBAFM in inventory, create NBAFM from VOC speciation |
| canmex_ag | No integration, no NBAFM in inventory, create NBAFM from VOC speciation |
| pt_oilgas | No integration, use NBAFM in inventory |
| ptagfire | Full integration (NBAFM) |
| ptegu | No integration, use NBAFM in inventory |
| ptfire-rx | Full integration (NBAFM) |
| ptfire-wild | Partial integration (NBAFM) |
| ptfire_othna | No integration, no NBAFM in inventory, create NBAFM from VOC speciation |
| ptnonipm | No integration, use NBAFM in inventory |
| rail | Full integration (NBAFM) |
| rwc | Full integration (NBAFM) |
| np_solvents | Partial integration (NBAFM) |

The HAPs integrated from the base inventory into the modeling platform are sector and chemical
mechanism specific. In recent years, CB6R3_AE7 has been the primary chemical mechanism used at the
EPA. Within that mechanism, naphthalene (NAPH), benzene (BENZ), acetaldehyde (ALD2), formaldehyde
(FORM), and methanol (MEOH) are explicit HAP-VOCs, and these compounds are collectively referred to
as NBAFM. Since NBAFM are explicitly modeled in CB6R3_AE7, these species have become the default
collection of integrated HAP species at the EPA. MOVES, the EPA's mobile emissions model, features
additional species that are explicitly modeled (e.g., ethanol). These species are also incorporated directly
into modeling platforms. To incorporate these species, additional files from the S2S-Tool are required.
For California, speciation of NONHAPTOG is performed on CARB's VOC submissions using the county-
specific speciation profile assignments generated by MOVES in California.
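
The difference between the "Integrate, HAP-use" and "No-Integrate, HAP-use" methods described earlier in this section can be sketched with simple mass bookkeeping. This is a simplified illustration, not the S2S-Tool or SMOKE implementation. The profile is treated as plain mass fractions, and the function names, profile values, and inventory masses are hypothetical.

```python
NBAFM = {"NAPH", "BENZ", "ALD2", "FORM", "MEOH"}  # integrated HAP-VOCs named in the text


def integrate_hap_use(voc_mass, hap_mass, profile, integrated=NBAFM):
    """Subtract inventory HAPs from VOC, then speciate the residual (NONHAPVOC)
    with the HAPs removed from the profile and the profile renormalized."""
    nonhap_voc = voc_mass - sum(hap_mass.values())
    residual = {sp: frac for sp, frac in profile.items() if sp not in integrated}
    total = sum(residual.values())
    out = {sp: nonhap_voc * frac / total for sp, frac in residual.items()}
    out.update(hap_mass)  # integrated HAPs come straight from the inventory
    return out


def no_integrate_hap_use(voc_mass, hap_mass, profile, integrated=NBAFM):
    """Speciate all VOC with the HAPs dropped from the profile, without
    renormalizing, and carry the inventory HAPs alongside."""
    out = {sp: voc_mass * frac for sp, frac in profile.items() if sp not in integrated}
    out.update(hap_mass)
    return out


# Hypothetical mass-fraction profile and inventory values, for illustration only
profile = {"PAR": 0.6, "TOL": 0.2, "BENZ": 0.1, "FORM": 0.1}
haps = {"BENZ": 5.0, "FORM": 5.0}
integrated = integrate_hap_use(100.0, haps, profile)
not_integrated = no_integrate_hap_use(100.0, haps, profile)
```

In this made-up case the integrated result conserves the 100 units of inventory VOC, because the 90 units of NONHAPVOC are spread over the renormalized profile. The non-integrated result assigns only the profile's non-HAP share of VOC to PAR and TOL, relying on the profile to represent the HAP proportion.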

-------
In the NEI, NOx emissions are inventoried on a NO2-weighted basis but must be speciated into NO, NO2,
and HONO. Table 3-7 provides the NOx speciation profiles used in EPA's modeling platforms. The only
difference between the two profiles is the allocation of some NO2 mass to HONO in the "HONO" profile.
HONO emissions from mobile sources have been identified in tunnel studies, and their inclusion in
emissions inventories is important for urban chemistry. Here, a HONO to NOx ratio of 0.008 was selected
(Sarwar, 2008). In this modeling platform, all non-mobile sources use the "NHONO" profile, all
non-onroad mobile sources (including nonroad, cmv, and rail) use the "HONO" profile, and all onroad NOx
speciation occurs within MOVES. For further details on NOx speciation within MOVES, please see the
associated technical report.

Table 3-7. NOx speciation profiles

| Profile | Pollutant | Species | Split factor |
|---|---|---|---|
| HONO | NOx | NO2 | 0.092 |
| HONO | NOx | NO | 0.9 |
| HONO | NOx | HONO | 0.008 |
| NHONO | NOx | NO2 | 0.1 |
| NHONO | NOx | NO | 0.9 |
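
As a minimal sketch, the split factors in Table 3-7 can be applied to an inventory NOx value as follows. It assumes that the 0.9 split factor in each profile belongs to NO, since NO is the remaining species into which NOx is speciated. The dictionary layout and function name are hypothetical, and the example leaves the result on the same NO2-weighted mass basis as the input.

```python
# Split factors from Table 3-7; the 0.9 rows are assumed to be NO
NOX_PROFILES = {
    "HONO": {"NO2": 0.092, "NO": 0.9, "HONO": 0.008},
    "NHONO": {"NO2": 0.1, "NO": 0.9},
}


def speciate_nox(nox_mass, profile_name):
    """Split an inventory NOx mass into model species using a named profile."""
    factors = NOX_PROFILES[profile_name]
    return {species: nox_mass * factor for species, factor in factors.items()}


nonroad_split = speciate_nox(100.0, "HONO")    # non-onroad mobile sources
stationary_split = speciate_nox(100.0, "NHONO")  # non-mobile sources
```

Both profiles sum to 1, so the split conserves the inventory NOx mass. The "HONO" profile moves 0.008 of the total from NO2 to HONO.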

3.3.5 Temporal Processing Configuration

Temporal allocation is the process of distributing aggregated emissions to a finer temporal resolution,
thereby converting annual emissions to hourly emissions as is required by CMAQ. While the total
emissions are important, the timing of the occurrence of emissions is also essential for accurately
simulating ozone, PM, and other pollutant concentrations in the atmosphere. Many emissions
inventories are annual or monthly in nature. Temporal allocation takes these aggregated emissions and
distributes the emissions to the hours of each day. This process is typically done by applying temporal
profiles to the inventories in this order: monthly, day of the week, and diurnal, with monthly and day-of-
week profiles applied only if the inventory is not already at that level of detail.
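
The order of operations described above (monthly, then day of week, then diurnal) can be sketched as follows. This is an illustrative simplification rather than the SMOKE Temporal program. The function name and the profile representation are assumptions, and SMOKE's own normalization options may differ in detail.

```python
import calendar
import datetime


def hourly_emissions(annual, monthly_frac, weekday_weight, diurnal_frac, day):
    """Return 24 hourly values for `day` derived from an annual total.

    monthly_frac:   12 fractions summing to 1 (January first)
    weekday_weight: 7 relative weights (Monday first)
    diurnal_frac:   24 fractions summing to 1
    """
    # Step 1: annual -> month
    month_total = annual * monthly_frac[day.month - 1]
    # Step 2: month -> day, normalizing by the weights of every day in the month
    n_days = calendar.monthrange(day.year, day.month)[1]
    weight_sum = sum(
        weekday_weight[datetime.date(day.year, day.month, d).weekday()]
        for d in range(1, n_days + 1)
    )
    day_total = month_total * weekday_weight[day.weekday()] / weight_sum
    # Step 3: day -> hour
    return [day_total * frac for frac in diurnal_frac]
```

Normalizing the day-of-week weights over the actual days of each month keeps the monthly total intact, so summing the hourly values over a full year returns the annual total.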

The temporal factors applied to the inventory were selected using some combination of country, state,
county, SCC, and pollutant. Table 3-8 summarizes the temporal aspects of emissions modeling by
comparing the key approaches used for temporal processing across the sectors. In the table, "Daily
temporal approach" refers to the temporal approach for getting daily emissions from the inventory
using the SMOKE Temporal program. The values given are the values of the SMOKE L_TYPE setting. The
"Merge processing approach" refers to the days used to represent other days in the month for the
merge step. If this is not "all," then the SMOKE merge step runs only for representative days, which
could include holidays as indicated by the right-most column. The values given are those used for the
SMOKE M_TYPE setting (see below for more information).

-------
Table 3-8. Temporal Settings Used for the Platform Sectors in SMOKE





| Platform sector short name | Inventory resolutions | Monthly profiles used? | Daily temporal approach | Merge processing approach | Process holidays as separate days |
|---|---|---|---|---|---|
| afdust_adj | Annual | Yes | week | all | Yes |
| airports | Annual | Yes | all | all | No |
| beis | Hourly | | n/a | all | No |
| cmv_c1c2 | Annual & hourly | | all | all | No |
| cmv_c3 | Annual & hourly | | all | all | No |
| fertilizer | Monthly | | met-based | all | Yes |
| livestock | Daily | | met-based | all | No |
| nonpt | Annual | Yes | week | week | Yes |
| nonroad | Monthly | | mwdss | mwdss | Yes |
| np_oilgas | Annual | Yes | aveday | aveday | No |
| onroad | Annual & monthly [1] | | all | all | Yes |
| onroad_ca_adj | Annual & monthly [1] | | all | all | Yes |
| canada_afdust | Annual & monthly | Yes | week | all | No |
| canmex_area | Monthly | | week | week | No |
| canada_onroad | Monthly | | week | week | No |
| mexico_onroad | Monthly | | week | week | No |
| canmex_point | Monthly | Yes | mwdss | mwdss | No |
| canada_ptdust | Annual | Yes | week | all | No |
| canmex_ag | Annual | Yes | mwdss | mwdss | No |
| canada_og2D | Monthly | | mwdss | mwdss | No |
| pt_oilgas | Annual | Yes | mwdss | mwdss | Yes |
| ptegu | Annual & hourly | Yes [2] | all | all | No |
| ptnonipm | Annual | Yes | mwdss | mwdss | Yes |
| ptagfire | Daily | | all | all | No |
| ptfire-rx | Daily | | all | all | No |
| ptfire-wild | Daily | | all | all | No |
| ptfire_othna | Daily | | all | all | No |
| rail | Annual | Yes | aveday | aveday | No |
| rwc | Annual | No [3] | met-based [3] | all | No [3] |
| np_solvents | Annual | Yes | aveday | aveday | No |

1.	Note the annual and monthly "inventory" actually refers to the activity data (VMT, VPOP, starts)
for onroad. The actual emissions are computed on an hourly basis.

2.	Only units that do not have matching hourly CEMs data use monthly temporal profiles.

3.	Except for 2 SCCs that do not use met-based temporalization.

The following values are used in the table. The value "all" means that hourly emissions were computed
for every day of the year and that emissions potentially have day-of-year variation. The value "week"
means that hourly emissions were computed for all days in one "representative" week, representing all
weeks for each month. This means emissions have day-of-week variation, but not week-to-week
variation within the month. The value "mwdss" means hourly emissions for one representative Monday,
representative weekday (Tuesday through Friday), representative Saturday, and representative Sunday
for each month. This means emissions have variation between Mondays, other weekdays, Saturdays and
Sundays within the month, but not week-to-week variation within the month. The value "aveday" means
hourly emissions computed for one representative day of each month, meaning emissions for all days
within a month are the same. Special situations with respect to temporal allocation are described in the
following subsections.
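
The four values can be read as rules for deciding which days share the same hourly emissions. The sketch below encodes that reading. The function name and the returned keys are hypothetical, and holiday handling is deliberately ignored.

```python
import datetime


def representative_key(day, approach):
    """Return a key identifying the representative day used for `day`.

    Two dates with the same key share the same hourly emissions.
    """
    month, dow = day.month, day.weekday()  # Monday == 0, Sunday == 6
    if approach == "all":
        return (day,)                      # every day of the year is distinct
    if approach == "week":
        return (month, dow)                # one representative week per month
    if approach == "mwdss":
        if dow == 0:
            day_class = "monday"
        elif dow <= 4:
            day_class = "weekday"          # Tuesday through Friday
        elif dow == 5:
            day_class = "saturday"
        else:
            day_class = "sunday"
        return (month, day_class)
    if approach == "aveday":
        return (month,)                    # one representative day per month
    raise ValueError(f"unknown approach: {approach}")
```

For example, under "mwdss" Tuesday 6 July 2021 and Friday 9 July 2021 map to the same key, while Monday 5 July 2021 maps to a different one.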

In addition to the resolution, temporal processing includes a ramp-up period for several days prior to
January 1, 2021, which is intended to mitigate the effects of initial condition concentrations. The ramp-
up period was 10 days (December 22-31, 2020). For all anthropogenic sectors, emissions from December
2021 were used to fill in emissions for the end of December 2020. For biogenic emissions, December
2020 emissions were computed using year 2020 meteorology.

The FF10 inventory format for SMOKE provides a consolidated format for monthly, daily, and hourly
emissions inventories. With the FF10 format, a single inventory file can contain emissions for all 12
months and the annual emissions in a single record. This helps simplify the management of numerous
inventories. Similarly, daily and hourly FF10 inventories contain individual records with data for all days
in a month and all hours in a day, respectively.

SMOKE prevents the application of temporal profiles on top of the "native" resolution of the inventory.
For example, a monthly inventory should not have annual-to-month temporal allocation applied to it;
rather, it should only have month-to-day and diurnal temporal allocation. This becomes particularly
important when specific sectors have a mix of annual, monthly, daily, and/or hourly inventories. The
flags that control temporal allocation for a mixed set of inventories are discussed in the SMOKE
documentation. The modeling platform sectors that make use of monthly values in the FF10 files are
nonroad, onroad (for activity data), and all Canada and Mexico inventories except for agriculture.
Commercial marine vessels in cmv_c3 and cmv_c1c2 use hourly data in the FF10 files.

3.3.6 Vertical Allocation of Emissions

Table 3-4 specifies the sectors for which plume rise is calculated. If there is no plume rise for a sector,
the emissions are placed into layer 1 of the air quality model. Vertical plume rise was performed in-line
within CMAQ for all of the SMOKE point-source sectors (i.e., ptegu, ptnonipm, pt_oilgas, ptfire-rx,
ptfire-wild, ptagfire, ptfire_othna, othpt, and cmv_c3). The in-line plume rise computed within CMAQ is
nearly identical to the plume rise that would be calculated within SMOKE using the Laypoint program. The
selection of point sources for plume rise is pre-determined in SMOKE using the Elevpoint program. The
in-line calculation is done in conjunction with the CMAQ model time steps using interpolated
meteorological data and is therefore more temporally resolved than when it is done in SMOKE. Also, CMAQ
calculates the location of the point sources slightly differently than SMOKE, which can result in
slightly different placement of point sources near grid cell boundaries.

For point sources, the stack parameters are used as inputs to the Briggs algorithm, but point fires
do not have traditional stack parameters. However, the ptfire-rx, ptfire-wild, ptagfire, and ptfire_othna
inventories do contain data on the acres burned (acres per day) and fuel consumption (tons fuel per
acre) for each day. CMAQ uses these additional parameters to estimate the plume rise of emissions into
layers above the surface model layer. Specifically, these data are used to calculate heat flux, which is then
used to estimate plume rise. In addition to the acres burned and fuel consumption, heat content of the
fuel is needed to compute heat flux. The heat content was assumed to be 8000 Btu/lb of fuel for all fires
because specific data on the fuels were unavailable in the inventory. The plume rise algorithm applied to
the fires is a modification of the Briggs algorithm with a stack height of zero.
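
The three inputs named above combine into a daily heat release by simple unit arithmetic, sketched below. This is only the arithmetic implied by the units. The text does not give CMAQ's exact heat flux formulation, so the function name, the use of short tons (2000 lb), and the daily total as the output quantity are assumptions.

```python
LB_PER_TON = 2000.0               # assumes fuel consumption is in short tons
HEAT_CONTENT_BTU_PER_LB = 8000.0  # heat content stated in the text


def daily_fire_heat_btu(acres_burned, fuel_tons_per_acre,
                        heat_content=HEAT_CONTENT_BTU_PER_LB):
    """Heat released by a fire in one day, in Btu.

    acres/day * tons/acre * lb/ton * Btu/lb = Btu/day
    """
    return acres_burned * fuel_tons_per_acre * LB_PER_TON * heat_content


example_heat = daily_fire_heat_btu(100.0, 10.0)  # hypothetical 100-acre fire
```

A hypothetical fire burning 100 acres at 10 tons of fuel per acre consumes 1000 tons of fuel; under these assumptions the daily heat release follows directly from the two constants.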

CMAQ uses the Briggs algorithm to determine the plume top and bottom, and then computes the
plumes' distributions into the vertical layers that the plumes intersect. The pressure difference across
each layer divided by the pressure difference across the entire plume is used as a weighting factor to
assign the emissions to layers. This approach gives plume fractions by layer and source. Note that the
implementation of fire plume rise in CMAQ differs from the implementation of plume rise in SMOKE. This
study uses CMAQ to compute the fire plume rise.
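
The pressure-based weighting described above can be sketched as follows. The function name and argument layout are hypothetical. The plume top and bottom pressures are taken as inputs here instead of being derived from the Briggs algorithm.

```python
def plume_layer_fractions(interface_pressures, plume_bottom_p, plume_top_p):
    """Fraction of a plume's emissions assigned to each model layer.

    interface_pressures: layer-edge pressures from the surface upward
                         (decreasing with height), one more than the layer count
    plume_bottom_p, plume_top_p: pressures at the plume bottom and top
    """
    plume_dp = plume_bottom_p - plume_top_p
    if plume_dp <= 0.0:
        raise ValueError("plume bottom pressure must exceed plume top pressure")
    fractions = []
    edges = zip(interface_pressures[:-1], interface_pressures[1:])
    for p_lower_edge, p_upper_edge in edges:
        # Pressure thickness of the part of this layer that the plume intersects
        overlap = min(p_lower_edge, plume_bottom_p) - max(p_upper_edge, plume_top_p)
        fractions.append(max(overlap, 0.0) / plume_dp)
    return fractions


# Hypothetical layer edges (hPa) and a plume spanning 975 hPa to 850 hPa
example_fractions = plume_layer_fractions([1000.0, 950.0, 900.0, 800.0], 975.0, 850.0)
```

In this example the plume covers 25 hPa of the first layer and 50 hPa of each of the next two, so the layers receive 20%, 40%, and 40% of the emissions.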

3.3.7 Emissions Modeling Spatial Allocation

The methods used to perform spatial allocation are summarized in this section. For the modeling
platform, spatial factors are typically applied by county and SCC. Spatial allocation was performed for
the 12US1 modeling grid. To accomplish this, SMOKE used national 12-km spatial surrogates and a
SMOKE area-to-point data file. For the U.S., the surrogates use circa 2020 data. For Canada, shapefiles
for generating new surrogates were provided by ECCC for use with their 2020 inventories. The U.S.,
Mexican, and Canadian 12-km surrogates cover the entire CONUS domain 12US1. While highlights of
information are provided below, the file Surrogate_specifications_2021_platform_US_Can_Mex.xlsx
documents the complete configuration for generating the surrogates and can be referenced for more details.

3.3.7.1 Surrogates for U.S. Emissions

There are more than 80 spatial surrogates available for spatially allocating U.S. county-level emissions to
the 12-km grid cells used by the air quality model. Note that an area-to-point approach overrides the use
of surrogates for a limited set of sources. Table 3-9 lists the codes and descriptions of the surrogates.
Surrogate names and codes listed in italics are not directly assigned to any sources for this platform, but
they are sometimes used to gapfill other surrogates. When the source data for a surrogate have no
values for a particular county, gap filling is used to provide values for the spatial surrogate in those
counties to ensure that no emissions are dropped when the spatial surrogates are applied to the
emission inventories.
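
A minimal sketch of surrogate-based allocation with gap filling is shown below. It is not the Surrogate Tool or SMOKE implementation. The function name, the dictionary layout, and the county and cell identifiers are hypothetical.

```python
def allocate_to_grid(county_emissions, primary, gapfill):
    """Distribute county totals to grid cells using surrogate weights.

    county_emissions: {county: emissions}
    primary, gapfill: {county: {cell: weight}}; the gap-fill surrogate is used
                      when the primary surrogate has no values for a county
    """
    gridded = {}
    for county, emis in county_emissions.items():
        weights = primary.get(county, {})
        if sum(weights.values()) <= 0.0:
            weights = gapfill[county]  # fall back so no emissions are dropped
        total = sum(weights.values())
        for cell, weight in weights.items():
            gridded[cell] = gridded.get(cell, 0.0) + emis * weight / total
    return gridded


# Hypothetical inputs: county "B" has no data in the primary surrogate
emissions = {"A": 100.0, "B": 40.0}
primary = {"A": {(1, 1): 3.0, (1, 2): 1.0}}
gapfill = {"A": {(1, 1): 1.0}, "B": {(2, 1): 1.0, (2, 2): 1.0}}
gridded = allocate_to_grid(emissions, primary, gapfill)
```

Because each county's weights are normalized before use, the gridded total equals the sum of the county totals, including county "B", which is allocated entirely through the gap-fill surrogate.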

The surrogates for the platform are based on a variety of geospatial data sources, including the
American Community Survey (ACS) for census-related data and the National Land Cover Database
(NLCD). Onroad surrogates are based on average annual daily traffic counts (AADT) from the Highway
Performance Monitoring System (HPMS).

-------
Surrogates for the U.S. were generated using the Surrogate Tools DB with the Java-based Surrogate tools
used to perform gapfilling and normalization where needed. The tool and documentation for the original
Surrogate Tool are available at
https://www.cmascenter.org/sa-tools/documentation/4.2/SurrogateToolUserGuide_4_2.pdf, and the tool and
documentation for the Surrogate Tools DB are available from https://www.cmascenter.org/surrogate_tools_db/.
The file "Surrogate_specifications_2021_platform_US_Can_Mex.xlsx" documents the configuration for
generating the surrogates.

Table 3-9. U.S. Surrogates available for the modeling platform

| Code | Surrogate Description | Code | Surrogate Description |
|---|---|---|---|
| N/A | Area-to-point approach (see 3.6.2) | 6696 | All Abandoned CBM Wells - Plugged |
| 100 | Population | 6697 | All Abandoned Oil Wells - Unplugged |
| 110 | Housing | 6698 | All Abandoned Gas Wells - Unplugged |
| 135 | Detached Housing | 670 | Spud Count - CBM Wells |
| 136 | Single and Dual Unit Housing | 671 | Spud Count - Gas Wells |
| 137 | Single + Dual Unit + Manufactured Housing | 672 | Gas production - oil wells |
| 150 | Residential Heating - Natural Gas | 674 | Unconventional Well Completion Counts |
| 170 | Residential Heating - Distillate Oil | 676 | Well count - all producing |
| 180 | Residential Heating - Coal | 677 | Well count - all exploratory |
| 190 | Residential Heating - LP Gas | 678 | Completions at Gas Wells |
| 205 | Extended Idle Locations | 679 | Completions at CBM Wells |
| 239 | Total Road AADT | 681 | Spud Count - Oil Wells |
| 240 | Total Road Miles | 683 | Produced Water at All Wells |
| 242 | All Restricted AADT | 6831 | Produced water at CBM wells |
| 244 | All Unrestricted AADT | 6832 | Produced water at gas wells |
| 258 | Intercity Bus Terminals | 6833 | Produced water at oil wells |
| 259 | Transit Bus Terminals | 685 | Completions at Oil Wells |
| 261 | NTAD Total Railroad Density | 686 | Completions - all wells |
| 271 | NTAD Class 1 2 3 Railroad Density | 687 | Feet Drilled at All Wells |
| 300 | NLCD Low Intensity Development | 689 | Gas Produced - Total |
| 304 | NLCD Open + Low | 691 | Well Counts - CBM Wells |
| 305 | NLCD Low + Med | 692 | Spud Count - All Wells |
| 306 | NLCD Med + High | 693 | Well Count - All Wells |
| 307 | NLCD All Development | 694 | Oil Production at Oil Wells |
| 308 | NLCD Low + Med + High | 695 | Well Count - Oil Wells |
| 309 | NLCD Open + Low + Med | 696 | Gas Production at Gas Wells |
| 310 | NLCD Total Agriculture | 697 | Oil production - gas wells |
| 319 | NLCD Crop Land | 698 | Well Count - Gas Wells |
| 320 | NLCD Forest Land | 699 | Gas Production at CBM Wells |
| 321 | NLCD Recreational Land | 711 | Airport Areas |
| 340 | NLCD Land | 801 | Port Areas |
| 350 | NLCD Water | 850 | Golf Courses |
| 401 | FAO 2010 Cattle | 860 | Mines |
| 402 | FAO 2010 Pig | 861 | Sand and Gravel Mines |
| 403 | FAO 2010 Chicken | 862 | Lead Mines |
| 404 | FAO 2010 Goat | 863 | Crushed Stone Mines |
| 405 | FAO 2010 Horse | 900 | OSM Fuel |
| 406 | FAO 2010 Sheep | 901 | OSM Asphalt Surfaces |
| 508 | Public Schools | 902 | OSM Unpaved Roads |
| 650 | Refineries and Tank Farms | 4011 | FAO 2010 Large Cattle Operations |
| 669 | All Abandoned Wells | 4012 | NPDES 2020 Beef Cattle |
| 6691 | All Abandoned Oil Wells | 4013 | NPDES 2020 Dairy Cattle |
| 6692 | All Abandoned Gas Wells | 4021 | NPDES 2020 Swine |
| 6693 | All Abandoned CBM Wells | 4031 | NPDES 2020 Chicken |
| 6694 | All Abandoned Oil Wells - Plugged | 4041 | NPDES 2020 Goat |
| 6695 | All Abandoned Gas Wells - Plugged | 4071 | NPDES 2020 Turkey |

For the onroad sector, the on-network (RPD) emissions were spatially allocated differently from the
off-network processes (i.e., RPV, RPP, RPHO, RPS, RPH). Surrogates for on-network processes are based
on AADT data, and off-network processes (including the off-network idling included in RPHO) are based
on land use surrogates as shown in Table 3-10. Emissions from the extended (i.e., overnight) idling of
trucks were assigned to surrogate 205, which is based on locations of overnight truck parking spaces.
The underlying data for this surrogate were updated during the development of the 2016 platforms to
include additional data sources and corrections based on comments received, and these updates were
carried into this platform.

Table 3-10. Off-Network Mobile Source Surrogates

| Source type | Source Type name | Surrogate ID | Description |
|---|---|---|---|
| 11 | Motorcycle | 307 | NLCD All Development |
| 21 | Passenger Car | 307 | NLCD All Development |
| 31 | Passenger Truck | 307 | NLCD All Development |
| 32 | Light Commercial Truck | 308 | NLCD Low + Med + High |
| 41 | Other Bus | 306 | NLCD Med + High |
| 42 | Transit Bus | 259 | Transit Bus Terminals |
| 43 | School Bus | 508 | Public Schools |
| 51 | Refuse Truck | 306 | NLCD Med + High |
| 52 | Single Unit Short-haul Truck | 306 | NLCD Med + High |
| 53 | Single Unit Long-haul Truck | 306 | NLCD Med + High |
| 54 | Motor Home | 304 | NLCD Open + Low |
| 61 | Combination Short-haul Truck | 306 | NLCD Med + High |
| 62 | Combination Long-haul Truck | 306 | NLCD Med + High |

For the oil and gas sources in the np_oilgas sector, the spatial surrogates were updated to those shown
in Table 3-11 using 2021 data consistent with what was used to develop the nonpoint oil and gas
emissions. The exploration and production of oil and gas have increased in terms of quantities and
locations over the last seven years, primarily through the use of new technologies, such as hydraulic
fracturing. Census-tract, 2-km, and 4-km sub-county Shapefiles were developed, from which the year-
specific oil and gas surrogates were generated. All spatial surrogates for np_oilgas are developed based
on known locations of oil and gas activity for year 2021.

-------
Table 3-11. Spatial Surrogates for Oil and Gas Sources

| Surrogate Code | Surrogate Description |
|---|---|
| 669 | All Abandoned Wells |
| 6691 | All Abandoned Oil Wells |
| 6692 | All Abandoned Gas Wells |
| 6693 | All Abandoned CBM Wells |
| 6694 | All Abandoned Oil Wells - Plugged |
| 6695 | All Abandoned Gas Wells - Plugged |
| 6696 | All Abandoned CBM Wells - Plugged |
| 6697 | All Abandoned Oil Wells - Unplugged |
| 6698 | All Abandoned Gas Wells - Unplugged |
| 670 | Spud Count - CBM Wells |
| 671 | Spud Count - Gas Wells |
| 672 | Gas Production at Oil Wells |
| 673 | Oil Production at CBM Wells |
| 674 | Unconventional Well Completion Counts |
| 676 | Well Count - All Producing |
| 677 | Well Count - All Exploratory |
| 678 | Completions at Gas Wells |
| 679 | Completions at CBM Wells |
| 681 | Spud Count - Oil Wells |
| 683 | Produced Water at All Wells |
| 685 | Completions at Oil Wells |
| 686 | Completions at All Wells |
| 687 | Feet Drilled at All Wells |
| 689 | Gas Produced - Total |
| 691 | Well Counts - CBM Wells |
| 692 | Spud Count - All Wells |
| 693 | Well Count - All Wells |
| 694 | Oil Production at Oil Wells |
| 695 | Well Count - Oil Wells |
| 696 | Gas Production at Gas Wells |
| 697 | Oil Production at Gas Wells |
| 698 | Well Count - Gas Wells |
| 699 | Gas Production at CBM Wells |
| 6831 | Produced water at CBM wells |
| 6832 | Produced water at gas wells |
| 6833 | Produced water at oil wells |

3.3.7.2 Allocation Method for Airport-Related Sources in the U.S.

There are numerous airport-related emission sources in the NEI, such as aircraft, airport ground support
equipment, and jet refueling. The modeling platform includes the aircraft and airport ground support
equipment emissions as point sources. For the modeling platform, the EPA used the SMOKE "area-to-point"
approach for only jet refueling in the nonpt sector. The following SCCs use this approach:
2501080050 and 2501080100 (petroleum storage at airports), and 2810040000 (aircraft/rocket engine
firing and testing). The ARTOPNT file that lists the nonpoint sources to locate using point data was
unchanged from the 2005-based platform.

3.3.7.3 Surrogates for Canada and Mexico Emission Inventories

The surrogates used to spatially allocate the Canadian emissions are based on the 2020 Canadian
inventories and associated data. The spatial surrogate data came from ECCC, along with cross
references. The shapefiles they provided were used in the Surrogate Tool (previously referenced) to
create spatial surrogates. The Canadian surrogates used for this platform are listed in Table 3-12. The
population surrogate for Mexico was updated and is based on the 2015 GPW v4 (see
https://sedac.ciesin.columbia.edu/data/collection/gpw-v4/sets/browse). The other surrogates for
Mexico are circa 1999 and 2000 and were based on data obtained from the Sistema Municipal de Bases
de Datos (SIMBAD) de INEGI and the Bases de datos del Censo Economico 1999. The surrogates for
Mexico in this platform are shown in Table 3-13.

Table 3-12. Canadian Spatial Surrogates

Code  Canadian Surrogate Description                                Code  Description
100   Population                                                    925   Manufacturing and Assembly
101   total dwelling                                                926   Distribution and Retail (no petroleum)
102   urban dwelling                                                927   Commercial Services
103   rural dwelling                                                933   Rail-Passenger
104   capped total dwelling                                         934   Rail-Freight
105   capped meat cooking dwelling                                  935   Rail-Yard
106   ALL INDUST                                                    940   PAVED ROADS NEW
113   Forestry and logging                                          945   Commercial Marine Vessels
116   Total Resources                                               946   Construction and mining
200   Urban Primary Road Miles                                      948   Forest
210   Rural Primary Road Miles                                      949   Combination of Dwelling
211   Oil and Gas Extraction                                        951   Wood Consumption Percentage
212   Mining except oil and gas                                     952   Residential Fuel Wood Combustion (PIRD)
220   Urban Secondary Road Miles                                    955   UNPAVED ROADS AND TRAILS
221   Total Mining                                                  960   TOTBEEF
222   Utilities                                                     961   80110 Broilers
230   Rural Secondary Road Miles                                    962   80111_Cattle_dairy_and_Heifer
233   Total Land Development                                        963   80112_Cattle_non-Dairy
240   capped population                                             964   80113_Laying_hens_and_Pullets
308   Food manufacturing                                            965   80114 Horses
321   Wood product manufacturing                                    966   80115_Sheep_and_Lamb
323   Printing and related support activities                       967   80116 Swine
324   Petroleum and coal products manufacturing                     968   80117_Turkeys
326   Plastics and rubber products manufacturing                    969   80118 Goat
327   Non-metallic mineral product manufacturing                    970   TOTPOUL
331   Primary Metal Manufacturing                                   971   80119 Buffalo
340   Construction - Oil and Gas                                    972   80120_Llama_and_Alpacas
350   Water                                                         973   80121 Deer
412   Petroleum product wholesaler-distributors                     974   80122 Elk
448   clothing and clothing accessories stores                      975   80123 Wild boars
562   Waste management and remediation services                     976   80124 Rabbit
601   SCL:12003 Petroleum Liquids Transportation (PIRD)             977   80125 Mink
602   SCL:12007 Oil Sands In-Situ Extraction and Processing (PIRD)  978   80126 Fox
603   SCL:12010 Light Medium Crude Oil Production (PIRD)            980   TOTSWIN
604   SCL:12011 Well Drilling (PIRD)                                981   Harvest Annual
605   SCL:12012 Well Servicing (PIRD)                               982   Harvest Perennial
606   SCL:12013 Well Testing (PIRD)                                 983   Synthfert_Annual
607   SCL:12014 Natural Gas Production (PIRD)                       984   Synthfert_Perennial
608   SCL:12015 Natural Gas Processing (PIRD)                       985   Tillage_Annual
609   SCL:12016 Heavy Crude Oil Cold Production (PIRD)              990   TOTFERT
610   SCL:12018 Disposal and Waste Treatment (PIRD)                 996   urban area
611   SCL:12019 Accidents and Equipment Failures (PIRD)             1251  OFFR TOTFERT
612   SCL:12020 Natural Gas Transmission and Storage (PIRD)         1252  OFFR MINES
651   MEIT C1C2 Anchored                                            1253  OFFR Other Construction not Urban
652   MEIT C1C2 Underway                                            1254  OFFR Commercial Services
653   MEIT C1C2 Berthed                                             1255  OFFR Oil Sands Mines
661   MEIT C3 Anchored                                              1256  OFFR Wood industries CANVEC
662   MEIT C3 Underway                                              1257  OFFR UNPAVED ROADS RURAL
663   MEIT C3 Berthed                                               1258  OFFR Utilities
901   AIRPORT                                                       1259  OFFR total dwelling
902   Military LTO                                                  1260  OFFR water
903   Commercial LTO                                                1261  OFFR ALL INDUST
904   General Aviation LTO                                          1262  OFFR Oil and Gas Extraction
905   Air Taxi LTO                                                  1263  OFFR ALLROADS
921   Commercial Fuel Combustion                                    1264  OFFR AIRPORT
923   TOTAL INSTITUTIONAL AND GOVERNMENT                            1265  OFFR_RAILWAY
924   Primary Industry

Table 3-13. Mexican Spatial Surrogates

Code  SURROGATE                            WEIGHT SHAPEFILE       WEIGHT ATTRIBUTE
      MEX Population                       mex_population_2020    gridcode_Y
      MEX Total Road Miles                 mex roads              NONE
      MEX Total Railroads Miles            mex railroads          NONE
      MEX Total Agriculture                mex_agriculture        NONE
      MEX Commercial plus Industrial Land  mex com ind land       NONE
      MEX Airports Area                    mex_airports_area      NONE
      MEX Airports Point                   mex_airports_point     NONE
      MEX Brick Kilns                      mex brick kilns        NONE
      MEX Border Crossings                 mex_border_crossings   SUM_Value

3.4 Emissions References

Appel, K.W., Napelenok, S., Hogrefe, C., Pouliot, G., Foley, K.M., Roselle, S.J., Pleim, J.E., Bash, J., Pye,
H.O.T., Heath, N., Murphy, B., Mathur, R., 2018. Overview and evaluation of the Community
Multiscale Air Quality Model (CMAQ) modeling system version 5.2. In Mensink C., Kallos G. (eds), Air
Pollution Modeling and its Application XXV. ITM 2016. Springer Proceedings in Complexity. Springer,
Cham. Available at https://doi.org/10.1007/978-3-319-57645-9_11.

Bullock Jr., R, and K. A. Brehme (2002) "Atmospheric mercury simulation using the CMAQ model:
formulation description and analysis of wet deposition results." Atmospheric Environment 36, pp
2135-2146. Available at https://doi.org/10.1016/S1352-2310(02)00220-0.

EPA, 2018. AERMOD Model Formulation and Evaluation Document. EPA-454/R-18-003. U.S.
Environmental Protection Agency, Research Triangle Park, North Carolina 27711. Available at
https://www3.epa.gov/ttn/scram/models/aermod/aermod_mfed.pdf.

EPA, 2019. Final Report, SPECIATE Version 5.0, Database Development Documentation, Research
Triangle Park, NC, EPA/600/R-19/988. Available with Addenda for versions 5.1, 5.2, and 5.3 at
https://www.epa.gov/air-emissions-modeling/speciate-51-and-50-addendum-and-final-report.

EPA, 2022a. Technical Support Document EPA's Air Toxics Screening Assessment - 2018 AirToxScreen
TSD. EPA-452/B-22-002. Available at:
https://www.epa.gov/AirToxScreen/airtoxscreen-technical-support-document.

EPA, 2022b. Technical Support Document: Preparation of Emissions Inventories for the 2019 North
American Emissions Modeling Platform. EPA-454/B-24-011. Available at:
https://www.epa.gov/air-emissions-modeling/2019-emissions-modeling-platform-technical-support-document.

-------
EPA, 2023. 2020 National Emission Inventory Technical Support Document. EPA-454/R-23-001. U.S.
Environmental Protection Agency, OAQPS, Research Triangle Park, NC 27711. Available at:
https://www.epa.gov/air-emissions-inventories/2020-national-emissions-inventory-nei-technical-support-document-tsd.

EPA, 2024. Technical Support Document (TSD): Preparation of Emissions Inventories for the 2021 North
American Emissions Modeling Platform. EPA-454/B-24-011. Available at:
https://www.epa.gov/air-emissions-modeling/2021-emissions-modeling-platform-technical-support-document.

Luecken D., Yarwood G, Hutzell WT, 2019. Multipollutant modeling of ozone, reactive nitrogen and HAPs
across the continental US with CMAQ-CB6. Atmospheric Environment, 201, 62-72.

Sarwar, G., S. Roselle, R. Mathur, W. Appel, R. Dennis, "A Comparison of CMAQ HONO predictions with
observations from the Northeast Oxidant and Particle Study", Atmospheric Environment 42
(2008) 5760-5770. Available at https://doi.org/10.1016/j.atmosenv.2007.12.065.

Seltzer, K. M., Pennington, E., Rao, V., Murphy, B. N., Strum, M., Isaacs, K. K., and Pye, H. O. T., 2021:
"Reactive organic carbon emissions from volatile chemical products", Atmos. Chem. Phys. 21, 5079-
5100, 2021. https://doi.org/10.5194/acp-21-5079-2021 and
https://acp.copernicus.org/articles/21/5079/2021/.

Skamarock, W., J. Klemp, J. Dudhia, D. Gill, D. Barker, M. Duda, X. Huang, W. Wang, J. Powers, 2008. A
Description of the Advanced Research WRF Version 3. NCAR Technical Note. National Center for
Atmospheric Research, Mesoscale and Microscale Meteorology Division, Boulder, CO. June 2008.
Available at: http://www2.mmm.ucar.edu/wrf/users/docs/arw_v3_bw.pdf.

Wiedinmyer, C., Y. Kimura, E. C. McDonald-Buller, L. K. Emmons, R. R. Buchholz, W. Tang, K. Seto, M. B.
Joseph, K. C. Barsanti, A. G. Carlton, and R. Yokelson, Volume 16, issue 13, GMD, 16, 3873-3891,
2023. https://gmd.copernicus.org/articles/16/3873/2023/.

Yarwood, G., R. Beardsley, Y. Shi, and B. Czader: Revision 5 of the Carbon Bond 6 Mechanism (CB6r5).
Presented at the Annual CMAS Conference, Chapel Hill, NC, 2020.

-------
4.0 CMAQ Air Quality Model Estimates

4.1 Introduction to the CMAQ Modeling Platform

The Clean Air Act (CAA) provides a mandate to assess and manage air pollution levels to protect human
health and the environment. EPA has established National Ambient Air Quality Standards (NAAQS),
requiring the development of effective emissions control strategies for such pollutants as ozone and
particulate matter. Air quality models are used to develop these emission control strategies to achieve
the objectives of the CAA.

Historically, air quality models have addressed individual pollutant issues separately. However, many of
the same precursor chemicals are involved in both ozone and aerosol (particulate matter) chemistry;
therefore, the chemical transformation pathways are interdependent. Thus, modeled abatement strategies of
pollutant precursors, such as VOC and NOx to reduce ozone levels, may exacerbate other air pollutants
such as particulate matter. To meet the need to address the complex relationships between pollutants,
EPA developed the Community Multiscale Air Quality (CMAQ) modeling system.8 The primary goals for
CMAQ are to:

• Improve the environmental management community's ability to evaluate the impact of air quality
management practices for multiple pollutants at multiple scales.

• Improve the scientist's ability to better probe, understand, and simulate chemical and physical interactions
in the atmosphere.

The CMAQ modeling system brings together key physical and chemical functions associated with the
dispersion and transformations of air pollution at various scales. It was designed to approach air quality
as a whole by including state-of-the-science capabilities for modeling multiple air quality issues, including
tropospheric ozone, fine particles, toxics, acid deposition, and visibility degradation. CMAQ relies on
emission estimates from various sources, including the U.S. EPA Office of Air Quality Planning and
Standards' current emission inventories, observed emissions from major utility stacks, and model
estimates of natural emissions from biogenic and agricultural sources. CMAQ also relies on
meteorological predictions that include assimilation of meteorological observations as constraints.
Emissions and meteorology data are fed into CMAQ and run through various algorithms that simulate the
physical and chemical processes in the atmosphere to provide estimated concentrations of the
pollutants. Traditionally, the model has been used to predict air quality across a regional or national
domain and then to simulate the effects of various changes in emission levels for policymaking purposes.
For health studies, the model can also be used to provide supplemental information about air quality in
areas where no monitors exist.

8 Byun, D.W., and K. L. Schere, 2006: Review of the Governing Equations, Computational Algorithms, and Other Components
of the Models-3 Community Multiscale Air Quality (CMAQ) Modeling System. Applied Mechanics Reviews, Volume 59,
Number 2 (March 2006), pp. 51-77.

-------
CMAQ was also designed to have multi-scale capabilities so that separate models were not needed for
urban and regional scale air quality modeling. The CMAQ simulation performed for this 2021 assessment
used a single domain that covers the entire continental U.S. (CONUS) and large portions of Canada and
Mexico using 12-km by 12-km horizontal grid spacing. Currently, 12-km x 12-km resolution is sufficient as
the highest resolution for most regional-scale air quality model applications and assessments.9 With the
temporal flexibility of the model, simulations can be performed to evaluate longer-term (annual to multi-
year) pollutant climatologies as well as short-term (weeks to months) transport from localized sources.
By making CMAQ a modeling system that addresses multiple pollutants and different temporal and
spatial scales, CMAQ has a "one atmosphere" perspective that combines the efforts of the scientific
community. Improvements will be made to the CMAQ modeling system as the scientific community
further develops the state-of-the-science.

For more information on CMAQ, go to https://www.epa.gov/cmaq or http://www.cmascenter.org.

4.1.1 Advantages and Limitations of the CMAQ Air Quality Model

An advantage of using CMAQ model output to characterize air quality for comparison with
health outcomes is that it provides complete spatial and temporal coverage across the U.S. CMAQ is a
three-dimensional Eulerian photochemical air quality model that simulates the numerous physical and
chemical processes involved in the formation, transport, and destruction of ozone, particulate matter,
and air toxics for given input sets of initial and boundary conditions, meteorological conditions, and
emissions. The CMAQ model includes state-of-the-science capabilities for conducting urban to regional
scale simulations of multiple air quality issues, including tropospheric ozone, fine particles, toxics, acid
deposition, and visibility degradation. However, CMAQ is resource intensive, requiring significant data
inputs and computing resources.

One source of uncertainty in the CMAQ model is structural uncertainty, which arises from how physical
and chemical processes are represented in the model. Structural choices include the chemical mechanism
used to characterize reactions in the atmosphere, the land surface model, and the planetary
boundary layer scheme. Another source is parametric uncertainty, which
includes uncertainties in the model inputs: hourly meteorological fields, hourly 3-D gridded emissions,
initial conditions, and boundary conditions. Uncertainties due to initial conditions are minimized by
using a 10-day ramp-up period from which model results are not used in the aggregation and analysis of
model outputs. Evaluations of models against observed pollutant concentrations build confidence that
the model performs with reasonable accuracy despite the uncertainties listed above. A detailed model
evaluation for ozone and PM2.5 species provided in Section 4.3 shows generally acceptable model
performance that is equivalent to or better than typical state-of-the-science regional modeling
simulations as summarized in Simon et al., 2012.10

9 U.S. EPA (2018), Modeling Guidance for Demonstrating Air Quality Goals for Ozone, PM2.5, and Regional Haze, pp 205.
https://www3.epa.gov/ttn/scram/guidance/guide/O3-PM-RH-Modeling_Guidance-2018.pdf.

10 Simon, H., Baker, K.R., and Phillips, S. (2012) Compilation and interpretation of photochemical model performance
statistics published between 2006 and 2012. Atmospheric Environment 61, 124-139.

-------
4.2 CMAQ Model Version, Inputs and Configuration

This section describes the air quality modeling platform used for the 2021 CMAQ simulation. A modeling
platform is a structured system of connected modeling-related tools and data that provide a consistent
and transparent basis for assessing the air quality response to changes in emissions and/or meteorology.
A platform typically consists of a specific air quality model, emissions estimates, a set of meteorological
inputs, and estimates of boundary conditions representing pollutant transport from source areas outside
the region modeled. We used the CMAQ modeling system as part of the 2021 Platform to provide a
national scale air quality modeling analysis. The CMAQ model simulates the multiple physical and
chemical processes involved in the formation, transport, and destruction of ozone and PM2.5.

This section provides a description of each of the main components of the 2021 CMAQ simulation along
with the results of a model performance evaluation in which the 2021 model predictions are compared
to corresponding measured ambient concentrations.

4.2.1 CMAQ Model Version

CMAQ is a non-proprietary computer model that simulates the formation and fate of photochemical
oxidants, including PM2.5 and ozone, for given input sets of meteorological conditions and emissions. As
mentioned previously, CMAQ includes numerous science modules that simulate the emission,
production, decay, deposition, and transport of organic and inorganic gas-phase and particle pollutants
in the atmosphere. This 2021 analysis employed CMAQ version 5.4.11 The 2021 CMAQ run included
CB6r5 chemistry12,13, AERO7 aerosol module14 with non-volatile Primary Organic Aerosol (POA), and
updated halogen chemistry15. The CMAQ community model versions 5.2 and 5.3 were most recently
peer-reviewed in May of 2019 for the U.S. EPA.16

11 CMAQ version 5.4: United States Environmental Protection Agency. (2022). CMAQ (Version 5.4) [Software]. Available from
https://doi.org/10.5281/zenodo.7218076; https://www.epa.gov/cmaq. CMAQ v5.4 is also available from the Community
Modeling and Analysis System (CMAS) at: http://www.cmascenter.org.

12 Luecken, D. J., Yarwood, G., and Hutzell, W. T.: Multipollutant modeling of ozone, reactive nitrogen and HAPs across the
continental US with CMAQ-CB6, Atmos Environ, 201, 62-72, 10.1016/j.atmosenv.2018.11.060, 2019.

13 Yarwood, G., Beardsley, R., Shi, Y., Czader, B.: Revision 5 of the Carbon Bond 6 Mechanism (CB6r5), CMAS 2020, October
27, 2020. https://www.cmascenter.org/conference/2020/slides/BeardsleyR_CMAS2020_CarbonBond6_Revision5_clean.pdf

14 Xu, L., Pye, H. O. T., He, J., Chen, Y. L., Murphy, B. N., and Ng, N. L: Experimental and model estimates of the contributions
from biogenic monoterpenes and sesquiterpenes to secondary organic aerosol in the southeastern United States, Atmos
Chem Phys, 18, 12613-12637, 10.5194/acp-18-12613-2018, 2018.

15 Kang, D.; Willison, J.; Sarwar, G.; Madden, M.; Hogrefe, C.; Mathur, R.; Gantt, B.; and Saiz-Lopez, A.: Improving the
Characterization of Natural Emissions in CMAQ, Environmental Manager, A&WMA, October 2021.

16 Barsanti, K.C., Pickering, K.E., Pour-Biazar, A., Saylor, R.D., Stroud, C.A., (June 19, 2019). Final Report: Sixth Peer Review of
the Community Multiscale Air Quality (CMAQ) Modeling System,
https://www.epa.gov/sites/default/files/2019-08/documents/sixth_cmaq_peer_review_comment_report_6.19.19.pdf.

This peer review was focused on CMAQ v5.2, which was released in June of 2017, as well as CMAQ v5.3, which was released in
August of 2019. It is available from the Community Modeling and Analysis System (CMAS) as well as previous peer-review
reports at: http://www.cmascenter.org.

-------
4.2.2 Model Domain and Grid Resolution

The CMAQ modeling analyses were performed for a domain covering the continental United States, as
shown in Figure 4-1. This single domain covers the entire continental U.S. (CONUS) and large portions of
Canada and Mexico using 12-km by 12-km horizontal grid spacing. The 2021 simulation used a Lambert
Conformal map projection centered at (-97, 40) with true latitudes at 33 and 45 degrees north. The 12-
km CMAQ domain consisted of 459 by 299 grid cells and 35 vertical layers. Table 4-1 provides some basic
geographic information regarding the 12-km CMAQ domain. The model extends vertically from the
surface to 50 millibars (approximately 17,600 meters) using a sigma-pressure coordinate system. Table
4-2 shows the vertical layer structure used in the 2021 simulation. Air quality conditions at the outer
boundary of the 12-km domain were taken from the GEOS-Chem global model (discussed in Section
4.2.4).

Table 4-1. Geographic Information for 2021 12-km Modeling Domain

National 12 km CMAQ Modeling Configuration
Map Projection       Lambert Conformal Projection
Grid Resolution      12 km
Coordinate Center    97 W, 40 N
True Latitudes       33 and 45 N
Dimensions           459 x 299 x 35
Vertical Extent      35 Layers: Surface to 50 mb level (see Table 4-2)
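
The domain dimensions above imply a horizontal extent and a total cell count that can be checked with simple arithmetic. The short sketch below does that; the variable names are illustrative, and it assumes that 459 is the east-west (column) count and 299 the north-south (row) count, which the text does not state explicitly.

```python
# Domain dimensions from Table 4-1 (column/row ordering is an assumption).
N_COLS, N_ROWS, N_LAYERS = 459, 299, 35
CELL_SIZE_KM = 12.0

# Horizontal extent of the domain in projected (Lambert Conformal) space.
width_km = N_COLS * CELL_SIZE_KM
height_km = N_ROWS * CELL_SIZE_KM

# Number of surface grid cells and of 3-D grid cells.
surface_cells = N_COLS * N_ROWS
total_cells = surface_cells * N_LAYERS

print(f"Extent: {width_km:.0f} km x {height_km:.0f} km")
print(f"Surface cells: {surface_cells}, 3-D cells: {total_cells}")
```

Under these assumptions the domain spans 5,508 km by 3,588 km, with 137,241 surface cells and 4,803,435 cells in three dimensions.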

Table 4-2. Vertical layer structure for 2021 CMAQ simulation (heights are layer top).

Vertical Layers   Sigma P   Pressure (mb)   Approximate Height (m)
35                0.0000    50.00           17,556
34                0.0500    97.50           14,780
33                0.1000    145.00          12,822
32                0.1500    192.50          11,282
31                0.2000    240.00          10,002
30                0.2500    287.50          8,901
29                0.3000    335.00          7,932
28                0.3500    382.50          7,064
27                0.4000    430.00          6,275
26                0.4500    477.50          5,553
25                0.5000    525.00          4,885
24                0.5500    572.50          4,264
23                0.6000    620.00          3,683
22                0.6500    667.50          3,136
21                0.7000    715.00          2,619
20                0.7400    753.00          2,226
19                0.7700    781.50          1,941
18                0.8000    810.00          1,665
17                0.8200    829.00          1,485
16                0.8400    848.00          1,308
15                0.8600    867.00          1,134
14                0.8800    886.00          964
13                0.9000    905.00          797
12                0.9100    914.50          714
11                0.9200    924.00          632
10                0.9300    933.50          551
9                 0.9400    943.00          470
8                 0.9500    952.50          390
7                 0.9600    962.00          311
6                 0.9700    971.50          232
5                 0.9800    981.00          154
4                 0.9850    985.75          115
3                 0.9900    990.50          77
2                 0.9950    995.25          38
1                 0.9975    997.63          19
0                 1.0000    1000.00         0
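
The pressure column in Table 4-2 appears to follow the usual sigma-pressure relationship between the model top (50 mb) and a reference surface pressure. The sketch below reproduces that column under the assumption, inferred from the table rather than stated in the text, that the reference surface pressure is 1000 mb; the function and constant names are illustrative.

```python
P_TOP_MB = 50.0        # model top, from the text
P_SURFACE_MB = 1000.0  # reference surface pressure (assumed from Table 4-2)


def sigma_to_pressure(sigma):
    """Pressure (mb) at a sigma level: p = p_top + sigma * (p_surface - p_top)."""
    return P_TOP_MB + sigma * (P_SURFACE_MB - P_TOP_MB)


# A few sigma values from Table 4-2 with their tabulated pressures (mb).
table_rows = [(0.0000, 50.00), (0.0500, 97.50), (0.7400, 753.00),
              (0.9850, 985.75), (0.9975, 997.63), (1.0000, 1000.00)]

for sigma, tabulated in table_rows:
    computed = sigma_to_pressure(sigma)
    print(f"sigma={sigma:.4f}  computed={computed:8.3f} mb  table={tabulated:8.2f} mb")
```

The computed values agree with the tabulated pressures to within the rounding shown in the table. The approximate heights depend on an assumed temperature profile and are not reproduced here.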

-------
Figure 4-1. Map of the 2021 CMAQ Modeling Domain. The blue box denotes the 12-km national
modeling domain.

4.2.3 Modeling Period / Ozone Episodes

The 12-km CMAQ modeling domain was modeled for the entire year of 2021. The annual simulation
included a spin-up period, comprising the 10 days before the beginning of the simulation, to mitigate the
effects of initial concentrations. All 365 model days were used to calculate the annual average levels of
PM2.5. For 8-hour ozone, we used modeling results from the period between May 1 and September 30. This
153-day period generally conforms to the ozone season across most parts of the U.S. and contains the
majority of days with observed high ozone concentrations.
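
The day counts quoted above can be verified with standard date arithmetic. The sketch below does so, and also derives the first spin-up day under the assumption (not stated explicitly in the text) that the spin-up consists of the 10 calendar days immediately preceding January 1, 2021; the variable names are illustrative.

```python
from datetime import date, timedelta

year_start = date(2021, 1, 1)
year_end = date(2021, 12, 31)

# Assumed: spin-up covers the 10 days immediately before the annual simulation.
spinup_start = year_start - timedelta(days=10)

# Inclusive day counts for the annual PM2.5 period and the ozone season.
annual_days = (year_end - year_start).days + 1
ozone_days = (date(2021, 9, 30) - date(2021, 5, 1)).days + 1

print("Spin-up starts:", spinup_start.isoformat())
print("Annual model days:", annual_days)
print("Ozone season days:", ozone_days)
```

With these assumptions the spin-up begins on 2020-12-22, and the inclusive counts are 365 days for the year and 153 days for May 1 through September 30.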

-------
4.2.4 Model Inputs: Emissions, Meteorology, and Boundary Conditions

2021 Emissions: The emissions inventories used in the 2021 air quality modeling are described in Section
3, above.

2021 Meteorological Input Data: The gridded meteorological data for the entire year of 2021 at the 12-
km continental United States scale domain were derived from the publicly available version 4.1.1 of the
Weather Research and Forecasting Model (WRF), Advanced Research WRF (ARW) core.17 The WRF
Model is a state-of-the-science mesoscale numerical weather prediction system developed for both
operational forecasting and atmospheric research applications (http://wrf-model.org). The 12US WRF
model was initialized using the 12-km North American Model (12NAM)18 analysis product provided by
the National Climatic Data Center (NCDC). Where 12NAM data were unavailable, the 40-km Eta Data
Assimilation System (EDAS) analysis (ds609.2) from the National Center for Atmospheric Research
(NCAR) was used. Analysis nudging for temperature, wind, and moisture was applied above the
boundary layer only. The model simulations were conducted continuously. The 'ipxwrf' program was
used to initialize deep soil moisture at the start of the run using a 10-day spin-up period. The 2021 WRF
meteorology simulation was based on the 2011 National Land Cover Database (NLCD).19 The WRF simulation
included the physics options of the Pleim-Xiu land surface model (LSM), Asymmetric Convective Model
version 2 planetary boundary layer (PBL) scheme, Morrison double moment microphysics, Kain-Fritsch
cumulus parameterization scheme utilizing the moisture-advection trigger,20 and the RRTMG long-wave
and shortwave radiation (LWR/SWR) scheme.21 In addition, the Group for High Resolution Sea Surface
Temperatures (GHRSST)22,23 1-km SST data were used to provide more finely resolved SST
information than the coarser data in the NAM analysis. Additionally, the hybrid-vertical
coordinate system was employed, where the model is terrain-following (Eta) near the surface and isobaric
aloft, reducing the influence of surface features on upper-level dynamics.

2021 Initial and Boundary Conditions: The 2021 annual lateral boundary and initial species
concentrations were provided by a global 3-D GEOS-Chem v14.0.1 simulation. GEOS-Chem is a 3-D model of
atmospheric chemistry driven by meteorological inputs from the Goddard Earth Observing System of the
National Aeronautics and Space Administration (NASA) Global Modeling and Assimilation Office. GEOS-Chem
was run using the standard (or default) options and full atmospheric chemistry.24 The GEOS-Chem
simulation was performed at 2 x 2.5-degree horizontal resolution with a 72-layer vertical structure (36
layers in the troposphere, hybrid terrain-following coordinate). The simulation used full chemistry including
an online stratosphere, non-local planetary boundary layer, and simple secondary organic aerosols. The
2021 simulation required extending the methane inputs to the year 2021 and updating lightning inputs and
other parameters for 2021. Emissions included the online Model of Emissions of Gases and Aerosols from
Nature version 2.1 (MEGAN2.1)25, the online DUST module, and the online sea salt module. Global Fire
Emissions Database (GFED)26 emissions were monthly means. Anthropogenic emissions included fugitive,
combustion, and industrial dust (Philip et al. 2017).27 Marine emissions were based on Community Emissions
Data System (CEDS) version 2 including shipping vessels.28 Aircraft emissions used Aircraft Emissions
Inventory Code (AEIC)29 monthly aircraft input data. The 2021 GEOS-Chem run was spun-up from the previous
2020 CEDS and AEIC was scaled by the COvid-19 adjustment Factors fOR eMissions (CONFORM) dataset.30
Meteorology for this 2021 GEOS-Chem run came from the Modern-Era Retrospective analysis for Research and
Applications, version 2 (MERRA2)31 at 2 x 2.5-degree resolution. With the exception of input updates for
2021, these were the default options and inputs distributed with v14.0.1.

17 Skamarock, W.C., Klemp, J.B., Dudhia, J., Gill, D.O., Barker, D.M., Duda, M.G., Huang, X., Wang, W., Powers, J.G., 2008. A
Description of the Advanced Research WRF Version 3.

18 North American Model Analysis-Only, http://nomads.ncdc.noaa.gov/data.php; download from
ftp://nomads.ncdc.noaa.gov/NAM/analysis_only/.

19 National Land Cover Database 2011, http://www.mrlc.gov/nlcd2011.php.

20 Ma, L-M. and Tan, Z-M, 2009. Improving the behavior of the Cumulus Parameterization for Tropical Cyclone Prediction:
Convection Trigger. Atmospheric Research 92, Issue 2, 190-211.
http://www.sciencedirect.com/science/article/pii/S0169809508002585.

21 Gilliam, R.C., Pleim, J.E., 2010. Performance Assessment of New Land Surface and Planetary Boundary Layer Physics in the
WRF-ARW. Journal of Applied Meteorology and Climatology 49, 760-774.

22 Stammer, D., F.J. Wentz, and C.L. Gentemann, 2003, Validation of Microwave Sea Surface Temperature Measurements for
Climate Purposes, J. Climate, 16, 73-87.

23 Global High-Resolution SST (GHRSST) analysis, https://www.ghrsst.org/.

24 GEOS-Chem, https://geoschem.github.io/index.html

-------

4.3 CMAQ Model Performance Evaluation

An operational model performance evaluation for ozone and PM2.5 and its related speciated components
was conducted for the 2021 simulation using state/local monitoring sites data in order to estimate the
ability of the CMAQ modeling system to replicate the 2021 base year concentrations for the 12-km
continental U.S. domain.

There are various statistical metrics available and used by the science community for model performance
evaluation. For a robust evaluation, the principal evaluation statistics used to evaluate CMAQ
performance were two bias metrics, mean bias and normalized mean bias; and two error metrics, mean
error and normalized mean error.

25 Guenther, A.B., Jiang, X., Heald, C.L., Sakulyanontvittaya, T., Duhl, T., Emmons, L.K., and Wang, X. The Model of Emissions of
Gases and Aerosols from Nature version 2.1 (MEGAN2.1): an extended and updated framework for modeling biogenic
emissions, 2012, GMD, Volume 5, Issue 6, 1471-1492.

26 https://www.globalfiredata.org/

27 Philip, S., Martin, R.V., Snider, G., Weagle, C.L., van Donkelaar, A., Brauer, M., Henze, D.K., Klimont, Z., Venkataraman, C.,
Guttikunda, S.K., and Zhang, Q., April 2017. "Anthropogenic fugitive, combustion and industrial dust is a significant,
underrepresented fine particulate matter source in global atmospheric models." Environmental Research Letters; Bristol, Vol.
12, Iss. 4. Doi:10.1088/1748-9326/aa65a4.

28 A Community Emissions Data System (CEDS) for Historical emissions, https://www.pnnl.gov/projects/ceds

29 Simone, N.W., Stettler, M.E.J., Barrett, S.R.H., 2013. Rapid estimation of global civil aviation emissions with uncertainty
quantification, Transportation Research Part D: Transport and Environment, Volume 25, 33-41, ISSN 1361-9209,
https://doi.org/10.1016/j.trd.2013.07.001.

30 Doumbia, T., Granier, C., Elguindi, N., Bouarar, I., Darras, S., Brasseur, G., Gaubert, B., Liu, Y., Shi, X., Stavrakou, T., Tilmes,
S., Lacey, F., Deroubaix, A., and Wang, T., 2021: Changes in global air pollutant emissions during the COVID-19 pandemic: a
dataset for atmospheric modeling, Earth Syst. Sci. Data, 13, 4191-4206, https://doi.org/10.5194/essd-13-4191-2021.

31 Global Modeling and Assimilation Office (GMAO). inst3_3d_asm_Cp; MERRA-2 IAU State Meteorology Instantaneous 3-
hourly (p-coord, 0.625x0.5L42), version 5.12.4, Greenbelt, MD, USA: Goddard Space Flight Center (GSFC DAAC), 2015. Doi:
10.5067/VJAFPLlCSIV.

-------
Mean bias (MB) is the sum of the differences (predicted - observed) divided by the total number
of replicates (n). Mean bias is defined as:

$$\mathrm{MB} = \frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)$$

where $P_i$ = predicted and $O_i$ = observed concentrations.

Mean error (ME) is the sum of the absolute values of the differences (predicted - observed) divided by the
total number of replicates (n). Mean error is defined as:

$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left|P_i - O_i\right|$$

Normalized mean bias (NMB) is used as a normalization to facilitate a range of concentration
magnitudes. This statistic divides the summed difference (model - observed) by the sum of observed values.
NMB is a useful model performance indicator because it avoids overinflating the observed range of
values, especially at low concentrations. Normalized mean bias is defined as:

$$\mathrm{NMB} = \frac{\sum_{i=1}^{n}\left(P_i - O_i\right)}{\sum_{i=1}^{n} O_i} \times 100$$

where $P_i$ = predicted and $O_i$ = observed concentrations.

Normalized mean error (NME) is similar to NMB, where the performance statistic is used as a
normalization of the mean error. NME divides the summed absolute value of the difference (model - observed)
by the sum of observed values. Normalized mean error is defined as:

$$\mathrm{NME} = \frac{\sum_{i=1}^{n}\left|P_i - O_i\right|}{\sum_{i=1}^{n} O_i} \times 100$$
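
The four statistics defined above can be computed from paired predicted and observed values as in the following sketch. The function name and the sample concentrations are hypothetical and are used only to illustrate the formulas; this is not the evaluation software used for the report.

```python
def performance_stats(predicted, observed):
    """Return MB, ME, NMB (%) and NME (%) for paired model/observation values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and equal length")
    n = len(observed)
    diffs = [p - o for p, o in zip(predicted, observed)]
    sum_diff = sum(diffs)
    sum_abs_diff = sum(abs(d) for d in diffs)
    sum_obs = sum(observed)
    return {
        "MB": sum_diff / n,                   # mean bias, concentration units
        "ME": sum_abs_diff / n,               # mean error, concentration units
        "NMB": 100.0 * sum_diff / sum_obs,    # normalized mean bias, percent
        "NME": 100.0 * sum_abs_diff / sum_obs,  # normalized mean error, percent
    }


# Hypothetical paired values (e.g., ppb) at three site-days.
predicted = [50.0, 40.0, 70.0]
observed = [40.0, 50.0, 60.0]
stats = performance_stats(predicted, observed)
print(stats)
```

For this small example the differences are +10, -10, and +10, so the over- and under-predictions partly cancel in MB and NMB (about 3.3 and 6.7 percent) but not in ME and NME (10 and 20 percent). This is why the bias and error metrics are reported together.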

The performance statistics were calculated using predicted and observed data that were paired in time
and space on an 8-hour basis. Statistics were generated for each of the nine National Oceanic and
Atmospheric Administration (NOAA) climate regions32 of the 12-km U.S. modeling domain (Figure 4-2).
The regions include the Northeast, Ohio Valley, Upper Midwest, Southeast, South, Southwest, Northern
Rockies, Northwest, and West33,34 as were originally identified in Karl and Koss (1984).35

32	NOAA, National Centers for Environmental Information scientists have identified nine climatically consistent regions within
the contiguous U.S., http://www.ncdc.noaa.gov/monitoring-references/maps/us-climate-regions.php.

33	The nine climate regions are defined by States where: Northeast includes CT, DE, ME, MA, MD, NH, NJ, NY, PA, RI, and VT;
Ohio Valley includes IL, IN, KY, MO, OH, TN, and WV; Upper Midwest includes IA, MI, MN, and WI; Southeast includes AL, FL,
GA, NC, SC, and VA; South includes AR, KS, LA, MS, OK, and TX; Southwest includes AZ, CO, NM, and UT; Northern Rockies
includes MT, NE, ND, SD, and WY; Northwest includes ID, OR, and WA; and West includes CA and NV.

34	Note most monitoring sites in the West region are located in California (see Figure 4-2), therefore statistics for the West will
be mostly representative of California ozone air quality.

35	Karl, T. R. and Koss, W. J., 1984: "Regional and National Monthly, Seasonal, and Annual Temperature Weighted by Area,
1895-1983." Historical Climatology Series 4-3, National Climatic Data Center, Asheville, NC, 38 pp.


-------
Figure 4-2. NOAA Nine Climate Regions (source: http://www.ncdc.noaa.gov/monitoring-references/maps/us-climate-regions.php#references).

In addition to the performance statistics, regional maps which show the MB, ME, NMB, and NME were
prepared for the ozone season, April through September, at individual monitoring sites as well as on an
annual basis for PM2.5 and its component species.

Evaluation for 8-hour Daily Maximum Ozone: The operational model performance evaluation for 8-hour
daily maximum ozone was conducted using the statistics defined above. Ozone measurements in the
continental U.S. were included in the evaluation and were taken from the 2021 state/local monitoring
site data in AQS and the Clean Air Status and Trends Network (CASTNet).

The 8-hour ozone model performance bias and error statistics for each of the nine NOAA climate regions
and each season are provided in Table 4-4. Seasons were defined as: winter (December-January-February),
spring (March-April-May), summer (June-July-August), and fall (September-October-November).
In some instances, observational data were excluded from the analysis and model
evaluation based on a completeness criterion of 75 percent. Spatial plots of the MB, ME, NMB, and NME
for individual monitors are shown in Figures 4-3 through 4-6, respectively. The statistics shown in these
four figures were calculated over the ozone season, April through September, using data pairs on days
with observed 8-hour ozone of greater than or equal to 60 ppb.
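
The season definitions and the ozone-season filter described above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `season_of` and `ozone_season_pairs`, the tuple layout (date, predicted, observed), and the sample values are all hypothetical, and the 75 percent completeness screen is not reproduced because the text does not say how it is computed.

```python
from datetime import date
from typing import List, Tuple

# Season definitions as stated in the text (month numbers).
SEASONS = {
    "winter": (12, 1, 2),
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "fall": (9, 10, 11),
}

Pair = Tuple[date, float, float]  # (day, predicted ppb, observed ppb)


def season_of(day: date) -> str:
    """Map a calendar date to the season used in Table 4-4."""
    for name, months in SEASONS.items():
        if day.month in months:
            return name
    raise ValueError("unreachable: every month belongs to a season")


def ozone_season_pairs(pairs: List[Pair], threshold_ppb: float = 60.0) -> List[Pair]:
    """Keep April-September pairs whose observed value is at or above the threshold."""
    return [
        (d, p, o)
        for d, p, o in pairs
        if 4 <= d.month <= 9 and o >= threshold_ppb
    ]


sample = [
    (date(2021, 4, 1), 58.0, 61.0),   # kept: in season, observed >= 60
    (date(2021, 3, 31), 70.0, 65.0),  # dropped: outside April-September
    (date(2021, 7, 4), 66.0, 59.9),   # dropped: observed below 60 ppb
    (date(2021, 9, 30), 62.0, 60.0),  # kept: observed equals the threshold
]
kept = ozone_season_pairs(sample)
```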

In general, the model performance statistics indicate that the 8-hour daily maximum ozone
concentrations predicted by the 2021 CMAQ simulation closely reflect the corresponding 8-hour
observed ozone concentrations in space and time in each subregion of the 12-km modeling domain. As
indicated by the statistics in Table 4-4, bias and error for 8-hour daily maximum ozone are relatively low
in each subregion, not only in the summer when concentrations are highest, but also during other times
of the year. Generally, 8-hour ozone at the AQS and CASTNet sites in the summer is over predicted in all
climate regions (NMB ranging between 1.2 and 21.2 percent) except in the Southwest, Northwest, West,
and Northern Rockies, and at CASTNet sites only in the Upper Midwest, where there is a slight under
prediction. Likewise, 8-hour ozone at the AQS and CASTNet sites in the fall is typically over predicted
across the contiguous U.S. (NMB ranging between 1.0 and 16.5 percent) except in the Southwest and West at
CASTNet sites only. In the winter, 8-hour ozone is over predicted in all climate regions at AQS and CASTNet
sites (NMB ranging between 0.2 and 20.5 percent) except in the Southwest at CASTNet sites. However, in the
spring, 8-hour ozone is under predicted at CASTNet sites in every NOAA climate region except the South,
and at AQS sites in the Northeast, Southwest, Northern Rockies, Northwest, and West (with NMB magnitudes
of approximately 10 percent or less in each subregion); the remaining region and network combinations
show a slight over prediction (NMB ranging between 0.5 and 6.2 percent).

Model bias at individual sites during the ozone season is similar to that seen on a subregional basis for
the summer. Figure 4-3 shows the mean bias for 8-hour daily maximum ozone greater than 60 ppb is
generally within ± 15 ppb across the AQS and CASTNet sites. Likewise, the information in Figure 4-5 indicates
that the normalized mean bias for days with observed 8-hour daily maximum ozone greater than 60 ppb
is within ± 20 percent at the vast majority of monitoring sites across the U.S. domain. Model error, as
seen from Figures 4-4 and 4-6, is generally 2 to 16 ppb and 30 percent or less at most of the sites across
the U.S. modeling domain. Somewhat greater error is evident at sites in several areas, most notably in
central California, the Northern Rockies, the Upper Midwest, and the Southeast.

Table 4-4. Summary of CMAQ 2021 8-Hour Daily Maximum Ozone Model Performance Statistics by
NOAA climate region, by Season and Monitoring Network.

Climate region    Network  Season  No. of Obs      MB      ME     NMB     NME
                                                (ppb)   (ppb)     (%)     (%)
Northeast         AQS      Winter      10,552     5.0     3.6    12.1    15.4
Northeast         AQS      Spring      16,053    -0.0     4.4    -0.1    10.2
Northeast         AQS      Summer      16,608     5.0     7.5    12.1    18.0
Northeast         AQS      Fall        12,728     5.5     6.5    16.5    19.4
Northeast         CASTNet  Winter       1,225    2.85     4.1     8.1    12.0
Northeast         CASTNet  Spring       1,264    -1.4     4.4    -3.1     9.7
Northeast         CASTNet  Summer       1,265     3.4     6.5     8.4    16.1
Northeast         CASTNet  Fall         1,252     4.5     5.9    13.6    17.6
Ohio Valley       AQS      Winter       5,773     6.3     7.1    20.5    23.2
Ohio Valley       AQS      Spring      20,787     1.1     4.3     2.5    10.1
Ohio Valley       AQS      Summer      20,461     4.9     7.4    11.2    16.9
Ohio Valley       AQS      Fall        15,400     5.6     6.7    15.3    18.3
Ohio Valley       CASTNet  Winter       1,586     4.9     6.3    15.0    19.1
Ohio Valley       CASTNet  Spring       1,615    -1.0     4.5    -2.2     9.8
Ohio Valley       CASTNet  Summer       1,616     3.9     6.6     9.5    16.0
Ohio Valley       CASTNet  Fall         1,606     3.9     5.6    10.8    15.5
Upper Midwest     AQS      Winter       1,794     5.5     5.9    16.6    17.8
Upper Midwest     AQS      Spring       8,332     0.7     4.6     1.7    11.0
Upper Midwest     AQS      Summer       8,789     0.5     6.4     1.2    14.2
Upper Midwest     AQS      Fall         6,051     4.0     5.7    11.5    16.3
Upper Midwest     CASTNet  Winter         443     4.6     4.9    13.9    14.7
Upper Midwest     CASTNet  Spring         456    -0.9     4.6    -2.1     9.9
Upper Midwest     CASTNet  Summer         439    -0.5     5.8    -1.3    13.7
Upper Midwest     CASTNet  Fall           444     3.5     4.2    10.9    15.2
Southeast         AQS      Winter       7,092     3.8     5.4    11.0    15.7
Southeast         AQS      Spring      15,348     0.3     4.4     0.7     9.7
Southeast         AQS      Summer      14,822     7.2     8.1    21.2    23.9
Southeast         AQS      Fall        12,018     6.3     7.1    17.3    19.5
Southeast         CASTNet  Winter         983     2.3     4.9     6.5    13.8
Southeast         CASTNet  Spring       1,023    -1.7     4.4    -3.7     9.4
Southeast         CASTNet  Summer       1,067     5.5     7.2    15.6    20.2
Southeast         CASTNet  Fall         1,077     3.9     5.8    10.6    15.5
South             AQS      Winter      10,192     4.7     6.6    14.8    20.6
South             AQS      Spring      12,797     2.6     5.5     6.2    13.2
South             AQS      Summer      12,338     7.1     9.0    18.3    23.3
South             AQS      Fall        11,840     3.5     6.3     8.9    15.7
South             CASTNet  Winter         507     4.2     6.3    12.4    18.5
South             CASTNet  Spring         531     0.2     4.2     0.5     9.6
South             CASTNet  Summer         538     4.8     7.8    12.4    20.1
South             CASTNet  Fall           531     3.1     5.9     7.9    15.0
Southwest         AQS      Winter      10,325     1.1     4.6     2.9    11.9
Southwest         AQS      Spring      11,348    -2.1     5.3    -4.2    10.3
Southwest         AQS      Summer      11,235    -5.8     7.7    -9.9    13.3
Southwest         AQS      Fall        11,018     0.4     4.8     1.0    10.6
Southwest         CASTNet  Winter         991    -0.2     3.2    -0.5     7.5
Southwest         CASTNet  Spring       1,063    -2.9     4.7    -5.5     9.0
Southwest         CASTNet  Summer       1,061    -4.6     6.6    -8.2    11.8
Southwest         CASTNet  Fall         1,070    -0.0     3.7    -0.0     8.0
Northern Rockies  AQS      Winter       4,177     4.0     5.0    11.0    14.0
Northern Rockies  AQS      Spring       4,539    -0.5     4.6    -1.2    10.5
Northern Rockies  AQS      Summer       4,481    -5.5     8.1   -10.5    15.5
Northern Rockies  AQS      Fall         4,173     1.3     4.3     3.4    11.2
Northern Rockies  CASTNet  Winter         775     2.4     4.0     6.2    10.2
Northern Rockies  CASTNet  Spring         795    -1.8     3.4    -4.0     9.1
Northern Rockies  CASTNet  Summer         803    -6.6     7.9   -12.4    15.0
Northern Rockies  CASTNet  Fall           790     0.9     4.2     2.3    10.3
Northwest         AQS      Winter         745     2.8     4.7     8.6    14.5
Northwest         AQS      Spring       1,522    -3.1     5.1    -7.3    12.2
Northwest         AQS      Summer       2,384    -3.2     6.6    -7.5    15.7
Northwest         AQS      Fall         1,290     0.9     5.4     2.4    14.9
Northwest         CASTNet  Winter         256     3.5     4.5     9.8    12.5
Northwest         CASTNet  Spring         271    -2.7     4.3    -6.1     9.7
Northwest         CASTNet  Summer         273    -7.5     8.1   -14.7    16.0
Northwest         CASTNet  Fall           268     1.9     5.5     5.1    14.9
West              AQS      Winter      14,139     2.4     5.0     7.2    14.6
West              AQS      Spring      16,287    -1.3     4.7    -2.8    10.2
West              AQS      Summer      16,179    -3.5     7.9    -6.7    15.2
West              AQS      Fall        15,267     0.5     5.5     1.1    12.8
West              CASTNet  Winter         592     0.1     3.4     0.2     8.5
West              CASTNet  Spring         619    -5.2     5.8   -10.1    11.3
West              CASTNet  Summer         623   -10.8    11.2   -17.7    18.3
West              CASTNet  Fall           591    -2.4     5.0    -5.1    10.6

-------
Figure 4-3. Mean Bias (ppb) of 8-hour daily maximum ozone greater than 60 ppb over the period April-
September 2021 at AQS and CASTNet monitoring sites in the continental U.S. modeling domain.

Figure 4-4. Mean Error (ppb) of 8-hour daily maximum ozone greater than 60 ppb over the period
April-September 2021 at AQS and CASTNet monitoring sites in the continental U.S. modeling domain.

Figure 4-5. Normalized Mean Bias (%) of 8-hour daily maximum ozone greater than 60 ppb over the
period April-September 2021 at AQS and CASTNet monitoring sites in the continental U.S. modeling domain.

Figure 4-6. Normalized Mean Error (%) of 8-hour daily maximum ozone greater than 60 ppb over the
period April-September 2021 at AQS and CASTNet monitoring sites in the continental U.S. modeling domain.

-------
Evaluation for Annual PM2.5 Components: The PM evaluation focuses on PM2.5 components including
sulfate (SO4), nitrate (NO3), total nitrate (TNO3 = NO3 + HNO3), ammonium (NH4), elemental carbon (EC),
and organic carbon (OC). The bias and error performance statistics were calculated on an annual basis
for each of the nine NOAA climate subregions defined above (provided in Table 4-5). PM2.5
measurements for 2021 were obtained from the following networks for model evaluation: Chemical
Speciation Network (CSN, 24-hour average), Interagency Monitoring of Protected Visual Environments
(IMPROVE, 24-hour average), and Clean Air Status and Trends Network (CASTNet, weekly average). For
PM2.5 species that are measured by more than one network, we calculated separate sets of statistics for
each network by subregion. In addition to the tabular summaries of bias and error statistics, annual
spatial maps which show the mean bias, mean error, normalized mean bias, and normalized mean error
by site for each PM2.5 species are provided in Figures 4-7 through 4-30.

As indicated by the statistics in Table 4-5, annual average sulfate is consistently under predicted at
CASTNet, IMPROVE, and CSN monitoring sites across the 12-km modeling domain (with MB values
ranging from -0.0 to -0.6 µg m-3) except at IMPROVE and CSN sites in the Northwest (over prediction of 0.1
and 0.2 µg m-3, respectively). Sulfate performance shows moderate error in the eastern subregions
(average of approximately 30-50 percent) while western subregions show slightly larger error (ranging
from 30 to 80 percent). Figures 4-7 through 4-10 suggest that spatial patterns vary by region. Sulfate in
most of the Northeast, Southeast, Ohio Valley, and Southwest states is under predicted, with model bias
within ± 40 percent. The model bias appears to be greater in the Northwest, with bias up to
approximately 60-80 percent at individual monitors. Model error also shows a spatial trend by region:
much of the Eastern states are at 30 to 50 percent, while the Western and Central U.S. states are at 40 to
100 percent.

Annual average nitrate is under predicted at the rural IMPROVE monitoring sites in all NOAA climate
subregions (NMB averaging -40 percent), except in the Northeast, Ohio Valley, Southeast, and
Northwest where nitrate is over predicted (between 30 and 93 percent). At CSN urban sites, annual
average nitrate is over predicted in all subregions, except in the Southwest (-40.3 percent), Northern
Rockies (-27.1 percent), and West (-50.0 percent) where nitrate is under predicted. Likewise, model
performance of total nitrate at sub-urban CASTNet monitoring sites shows an under prediction in all
subregions (NMB in the range of -4.2 to -47.8 percent), except in the Northeast (24.7 percent), Ohio
Valley (6.4 percent), Southeast (2.0 percent), and South (27.2 percent). Model error for nitrate and total
nitrate is somewhat greater for each of the nine NOAA climate subregions as compared to sulfate.

Model bias at individual sites indicates over prediction of greater than 10 percent at monitoring sites
along the upper Northeast and Northwest coastlines as well as in the South and Southeast, as indicated in
Figure 4-13. The exception to this is in the Southwest, Northern Rockies, and Western U.S. portions of the
modeling domain, where there appears to be a greater number of sites with under prediction of nitrate
of 10 to 80 percent.

As indicated in Table 4-5, the model tends to under predict annual average ammonium across CASTNet
sites (NMB ranging from -13.2 to -75.6 percent). Ammonium performance
across the urban CSN sites shows an over prediction in all NOAA climate subregions (ranging from 6.7 to
>100 percent), except for under predictions in the Southwest (-51.9 percent), Northern Rockies (-7.6
percent), and West (-53.4 percent). The spatial variation of ammonium across the majority of individual
monitoring sites in the Eastern U.S. shows bias within ± 50 percent (Figures 4-19 and 4-21). A larger bias
is seen in the Northeast and in the Northern Rockies (over prediction bias on average 80 to 100
percent). The urban monitoring sites exhibit slightly larger errors than rural sites for ammonium.

Annual average elemental carbon is under predicted in all of the nine climate regions at urban and rural
sites (biases between -11.1 and -55.9 percent) except at Northwest sites (over prediction ranging
between 0.5 and 33.0 percent). There is not a large variation in error statistics from subregion to
subregion or at urban versus rural sites.

Like elemental carbon, annual average organic carbon is under predicted in all of the nine climate
regions at urban and rural sites (biases between -4.0 and -69.2 percent) except at urban Northwest CSN
sites (over prediction of 64.2 percent). Similarly, model error does not show a large
variation from subregion to subregion or at urban versus rural sites.

Table 4-5. Summary of CMAQ 2021 Annual PM Species Model Performance Statistics
by NOAA Climate region, by Monitoring Network.

Pollutant                   Network  Subregion         No. of Obs      MB      ME     NMB     NME
                                                                  (µg/m3) (µg/m3)     (%)     (%)
Sulfate                     CSN      Northeast              3,069    -0.3     0.4   -34.6    41.5
Sulfate                     CSN      Ohio Valley            2,261    -0.3     0.4   -27.7    39.4
Sulfate                     CSN      Upper Midwest          1,062    -0.2     0.3   -22.4    36.9
Sulfate                     CSN      Southeast              1,740    -0.2     0.3   -21.9    40.8
Sulfate                     CSN      South                  1,066    -0.3     0.5   -28.5    43.6
Sulfate                     CSN      Southwest              1,116    -0.2     0.3   -48.6    53.3
Sulfate                     CSN      Northern Rockies         548    -0.1     0.2   -24.2    42.6
Sulfate                     CSN      Northwest                724     0.2     0.3    52.5    75.5
Sulfate                     CSN      West                   1,853    -0.2     0.4   -30.0    50.6
Sulfate                     IMPROVE  Northeast              1,959    -0.2     0.2   -38.9    44.9
Sulfate                     IMPROVE  Ohio Valley              922    -0.4     0.4   -40.9    44.6
Sulfate                     IMPROVE  Upper Midwest            923    -0.2     0.2   -31.8    42.4
Sulfate                     IMPROVE  Southeast              1,583    -0.3     0.4   -36.3    46.0
Sulfate                     IMPROVE  South                  1,080    -0.3     0.5   -37.9    46.4
Sulfate                     IMPROVE  Southwest              3,775    -0.2     0.2   -48.8    54.1
Sulfate                     IMPROVE  Northern Rockies       2,121    -0.1     0.1   -28.5    47.2
Sulfate                     IMPROVE  Northwest              1,905     0.1     0.2    39.0    89.4
Sulfate                     IMPROVE  West                   2,352    -0.1     0.3   -30.8    66.7
Sulfate                     CASTNet  Northeast                883    -0.4     0.4   -48.4    48.5
Sulfate                     CASTNet  Ohio Valley              894    -0.5     0.5   -47.3    47.6
Sulfate                     CASTNet  Upper Midwest            250    -0.3     0.3   -42.8    43.6
Sulfate                     CASTNet  Southeast                631    -0.5     0.5   -53.8    54.1
Sulfate                     CASTNet  South                    381    -0.6     0.6   -50.9    51.2
Sulfate                     CASTNet  Southwest                444    -0.2     0.2   -58.9    58.9
Sulfate                     CASTNet  Northern Rockies         517    -0.2     0.2   -46.7    47.9
Sulfate                     CASTNet  Northwest                 99    -0.0     0.1   -25.9    37.1
Sulfate                     CASTNet  West                     280    -0.3     0.4   -60.2    65.5
Nitrate                     CSN      Northeast              3,068     0.3     0.6    31.3    67.8
Nitrate                     CSN      Ohio Valley            2,260     0.7     0.6    14.2    49.9
Nitrate                     CSN      Upper Midwest          1,061     0.1     0.6     7.5    41.1
Nitrate                     CSN      Southeast              1,739     0.3     0.5    95.2    >100
Nitrate                     CSN      South                  1,064     0.0     0.4     2.8    69.3
Nitrate                     CSN      Southwest              1,116    -0.3     0.6   -40.3    69.0
Nitrate                     CSN      Northern Rockies         545    -0.2     0.4   -27.1    49.5
Nitrate                     CSN      Northwest                724     0.6     0.9    >100    >100
Nitrate                     CSN      West                   1,853    -1.1     1.4   -50.0    63.0
Nitrate                     IMPROVE  Northeast              1,958     0.3     0.3    93.1    >100
Nitrate                     IMPROVE  Ohio Valley              922     0.1     0.4    29.4    78.2
Nitrate                     IMPROVE  Upper Midwest            920    -0.0     0.3    -2.2    50.2
Nitrate                     IMPROVE  Southeast              1,582     0.1     0.3    62.7    >100
Nitrate                     IMPROVE  South                  1,080    -0.0     0.3    -5.3    71.7
Nitrate                     IMPROVE  Southwest              3,774    -0.1     0.1   -62.7    84.9
Nitrate                     IMPROVE  Northern Rockies       2,121    -0.0     0.1   -24.2    69.9
Nitrate                     IMPROVE  Northwest              1,890     0.0     0.2    29.7    >100
Nitrate                     IMPROVE  West                   2,350    -0.2     0.3   -46.3    70.1
Total Nitrate (NO3 + HNO3)  CASTNet  Northeast                883     0.2     0.4    24.7    40.2
Total Nitrate (NO3 + HNO3)  CASTNet  Ohio Valley              894     0.1     0.4     6.4    26.9
Total Nitrate (NO3 + HNO3)  CASTNet  Upper Midwest            250    -0.0     0.3    -4.2    24.9
Total Nitrate (NO3 + HNO3)  CASTNet  Southeast                631     0.0     0.5     2.0    53.1
Total Nitrate (NO3 + HNO3)  CASTNet  South                    381    -0.1     0.3    27.2   -12.3
Total Nitrate (NO3 + HNO3)  CASTNet  Southwest                444    -0.1     0.2   -26.1    38.4
Total Nitrate (NO3 + HNO3)  CASTNet  Northern Rockies         517    -0.1     0.2   -24.3    34.2
Total Nitrate (NO3 + HNO3)  CASTNet  Northwest                 99    -0.0     0.1    -5.3    28.6
Total Nitrate (NO3 + HNO3)  CASTNet  West                     280    -0.6     0.6   -47.8    51.8
Ammonium                    CSN      Northeast              3,068     0.0     0.2    21.5    68.5
Ammonium                    CSN      Ohio Valley            2,261     0.0     0.2    17.9    57.1
Ammonium                    CSN      Upper Midwest          1,062     0.0     0.2    14.3    48.4
Ammonium                    CSN      Southeast              1,738     0.0     0.2    32.1    90.0
Ammonium                    CSN      South                  1,065     0.0     0.2     6.7    62.7
Ammonium                    CSN      Southwest              1,114    -0.1     0.2   -51.9    75.6
Ammonium                    CSN      Northern Rockies         548    -0.0     0.1    -7.6    53.6
Ammonium                    CSN      Northwest                721     0.1     0.2    >100    >100
Ammonium                    CSN      West                   1,850    -0.3     0.5   -53.4    70.0
Ammonium                    CASTNet  Northeast                883    -0.0     0.1   -13.2    45.9
Ammonium                    CASTNet  Ohio Valley              894    -0.1     0.2   -21.3    38.8
Ammonium                    CASTNet  Upper Midwest            250    -0.0     0.1   -22.5    38.3
Ammonium                    CASTNet  Southeast                631    -0.0     0.1   -26.4    56.8
Ammonium                    CASTNet  South                    381    -0.1     0.1   -28.7    43.7
Ammonium                    CASTNet  Southwest                444    -0.1     0.1   -65.1    67.1
Ammonium                    CASTNet  Northern Rockies         517    -0.1     0.1   -54.3    60.3
Ammonium                    CASTNet  Northwest                 99    -0.0     0.1   -58.8    69.0
Ammonium                    CASTNet  West                     280    -0.1     0.2   -75.6    80.9
Elemental Carbon            CSN      Northeast              3,032    -0.1     0.3   -26.9    47.6
Elemental Carbon            CSN      Ohio Valley            2,238    -0.2     0.3   -42.6    50.0
Elemental Carbon            CSN      Upper Midwest          1,163    -0.1     0.2   -27.7    48.5
Elemental Carbon            CSN      Southeast              1,617    -0.4     0.4   -49.5    54.5
Elemental Carbon            CSN      South                  1,072    -0.1     0.1   -46.0    58.5
Elemental Carbon            CSN      Southwest              1,119    -0.2     0.3   -30.7    52.5
Elemental Carbon            CSN      Northern Rockies         528    -0.2     0.2   -49.0    59.7
Elemental Carbon            CSN      Northwest                730     0.2     0.4    33.0    71.9
Elemental Carbon            CSN      West                   1,235    -0.3     0.4   -35.2    47.6
Elemental Carbon            IMPROVE  Northeast              1,804    -0.0     0.1   -11.1    50.3
Elemental Carbon            IMPROVE  Ohio Valley              922    -0.1     0.1   -47.5    51.9
Elemental Carbon            IMPROVE  Upper Midwest          1,029    -0.1     0.1   -40.1    58.3
Elemental Carbon            IMPROVE  Southeast              1,673    -0.1     0.1   -45.9    51.2
Elemental Carbon            IMPROVE  South                  1,021    -0.2     0.3   -44.5    50.3
Elemental Carbon            IMPROVE  Southwest              3,688    -0.1     0.1   -55.9    61.9
Elemental Carbon            IMPROVE  Northern Rockies       2,172    -0.0     0.1   -13.9    64.9
Elemental Carbon            IMPROVE  Northwest              1,811     0.0     0.1     0.5    76.9
Elemental Carbon            IMPROVE  West                   2,225    -0.0     0.1   -27.6    61.4
Organic Carbon              CSN      Northeast              3,032    -0.0     1.1    -4.0    54.8
Organic Carbon              CSN      Ohio Valley            2,237    -0.6     0.9   -30.6    42.7
Organic Carbon              CSN      Upper Midwest          1,163    -0.6     1.0   -31.5    49.9
Organic Carbon              CSN      Southeast              1,615    -0.3     1.0   -13.4    41.5
Organic Carbon              CSN      South                  1,021    -0.8     1.1   -37.0    50.7
Organic Carbon              CSN      Southwest              1,119    -0.7     1.2   -36.2    58.4
Organic Carbon              CSN      Northern Rockies         528    -1.3     1.4   -65.8    70.9
Organic Carbon              CSN      Northwest                730     1.2     2.0    64.2    >100
Organic Carbon              CSN      West                   1,235    -0.9     1.4   -31.9    48.5
Organic Carbon              IMPROVE  Northeast              1,819    -0.2     0.5   -19.5    53.0
Organic Carbon              IMPROVE  Ohio Valley              923    -0.4     0.5   -34.1    43.5
Organic Carbon              IMPROVE  Upper Midwest          1,045    -0.5     0.7   -43.6    60.8
Organic Carbon              IMPROVE  Southeast              1,693    -0.3     0.6   -23.2    49.6
Organic Carbon              IMPROVE  South                  1,080    -0.5     0.6   -45.1    56.3
Organic Carbon              IMPROVE  Southwest              3,749    -0.6     0.7   -69.2    72.4
Organic Carbon              IMPROVE  Northern Rockies       2,224    -0.9     1.1   -62.4    74.9
Organic Carbon              IMPROVE  Northwest              1,877    -0.3     1.2   -24.2    87.6
Organic Carbon              IMPROVE  West                   2,286    -0.9     1.3   -47.5    70.3

-------
Figure 4-7. Mean Bias (µg m-3) of annual sulfate at monitoring sites in the continental U.S. modeling domain.

Figure 4-8. Mean Error (µg m-3) of annual sulfate at monitoring sites in the continental U.S. modeling domain.

Figure 4-9. Normalized Mean Bias (%) of annual sulfate at monitoring sites in the continental U.S. modeling domain.

Figure 4-10. Normalized Mean Error (%) of annual sulfate at monitoring sites in the continental U.S. modeling domain.

Figure 4-11. Mean Bias (µg m-3) of annual nitrate at monitoring sites in the continental U.S. modeling domain.

Figure 4-12. Mean Error (µg m-3) of annual nitrate at monitoring sites in the continental U.S. modeling domain.

Figure 4-13. Normalized Mean Bias (%) of annual nitrate at monitoring sites in the continental U.S. modeling domain.

Figure 4-14. Normalized Mean Error (%) of annual nitrate at monitoring sites in the continental U.S. modeling domain.

Figure 4-15. Mean Bias (µg m-3) of annual total nitrate at monitoring sites in the continental U.S. modeling domain.

Figure 4-16. Mean Error (µg m-3) of annual total nitrate at monitoring sites in the continental U.S. modeling domain.

Figure 4-17. Normalized Mean Bias (%) of annual total nitrate at monitoring sites in the continental U.S. modeling domain.

Figure 4-18. Normalized Mean Error (%) of annual total nitrate at monitoring sites in the continental U.S. modeling domain.

Figure 4-19. Mean Bias (µg m-3) of annual ammonium at monitoring sites in the continental U.S. modeling domain.

Figure 4-20. Mean Error (µg m-3) of annual ammonium at monitoring sites in the continental U.S. modeling domain.

Figure 4-21. Normalized Mean Bias (%) of annual ammonium at monitoring sites in the continental U.S. modeling domain.

Figure 4-22. Normalized Mean Error (%) of annual ammonium at monitoring sites in the continental U.S. modeling domain.

Figure 4-23. Mean Bias (µg m-3) of annual elemental carbon at monitoring sites in the continental U.S. modeling domain.

Figure 4-24. Mean Error (µg m-3) of annual elemental carbon at monitoring sites in the continental U.S. modeling domain.

Figure 4-25. Normalized Mean Bias (%) of annual elemental carbon at monitoring sites in the continental U.S. modeling domain.

Figure 4-26. Normalized Mean Error (%) of annual elemental carbon at monitoring sites in the continental U.S. modeling domain.

Figure 4-27. Mean Bias (µg m-3) of annual organic carbon at monitoring sites in the continental U.S. modeling domain.

Figure 4-28. Mean Error (µg m-3) of annual organic carbon at monitoring sites in the continental U.S. modeling domain.

Figure 4-29. Normalized Mean Bias (%) of annual organic carbon at monitoring sites in the continental U.S. modeling domain.

Figure 4-30. Normalized Mean Error (%) of annual organic carbon at monitoring sites in the continental U.S. modeling domain.

-------
5.0 Bayesian space-time downscaling fusion model (downscaler) - Derived Air Quality Estimates

5.1	Introduction

The need for greater spatial coverage of air pollution concentration estimates has grown in recent years as
epidemiology and exposure studies that link air pollution concentrations to health effects have become more
robust and as regulatory needs have increased. Direct measurement of concentrations is the ideal way of
generating such data, but prohibitive logistics and costs limit the possible spatial coverage and temporal
resolution of such a database. Numerical methods that extend the spatial coverage of existing air pollution
networks with a high degree of confidence are thus a topic of current investigation by researchers. The
downscaler model (DS) is the result of the latest research efforts by EPA for performing such predictions. DS
utilizes both monitoring and CMAQ data as inputs and attempts to take advantage of the measurement
data's accuracy and CMAQ's spatial coverage to produce new spatial predictions. This chapter describes
methods and results of the DS application that accompany this report, which utilized ozone and PM2.5 data
from AQS and CMAQ to produce predictions to continental U.S. 2020 census tract centroids for 2021.

5.2	Downscaler Model

DS develops a relationship between observed and modeled concentrations, and then uses that relationship
to spatially predict what measurements would be at new locations in the spatial domain based on the input
data. This process is separately applied for each time step (daily in this work) of data, and for each of the
pollutants under study (ozone and PM2.5). In its most general form, the model can be expressed in an
equation similar to that of linear regression:

$$Y(s) = \tilde{\beta}_0(s) + \beta_1 \, x(s) + \epsilon(s)$$

(Equation 1)

Where:

•	$Y(s)$ is the observed concentration at point $s$. Note that $Y(s)$ could be expressed as $Y_t(s)$, where $t$
indicates the model being fit at time $t$ (in this case, $t = 1, \ldots, 365$ would represent day of the year).

•	$x(s)$ is the point-level regressor based on the CMAQ concentration at point $s$. This value is a
weighted average of both the gridcell containing the monitor and neighboring gridcells.

•	$\tilde{\beta}_0(s)$ is the intercept, where $\tilde{\beta}_0(s) = \beta_0 + \beta_0(s)$ is composed of both a global component $\beta_0$ and a
local component $\beta_0(s)$ that is modeled as a mean-zero Gaussian Process with exponential decay.

•	$\beta_1$ is the global slope; local components of the slope are contained in the $x(s)$ term.

•	$\epsilon(s)$ is the model error.
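
As a rough illustration of how the terms of Equation 1 combine, the sketch below evaluates the mean part of the equation at one point and shows an exponential covariance function of the kind often used for a Gaussian Process with exponential decay. It is not the DS code: the function names, the parameter names (`variance`, `range_km`), and all numeric values are hypothetical, and the assumption that "exponential decay" means an exponential covariance in distance is ours.

```python
import math


def exponential_covariance(distance_km: float, variance: float, range_km: float) -> float:
    """Covariance between two points that decays exponentially with distance (assumed form)."""
    return variance * math.exp(-distance_km / range_km)


def downscaler_mean(beta0_global: float, beta0_local: float, beta1: float, x_s: float) -> float:
    """Evaluate Equation 1 without the error term: (global + local intercept) + slope * x(s)."""
    return (beta0_global + beta0_local) + beta1 * x_s


# Hypothetical values: global intercept 2.0, local adjustment -0.5, slope 0.9,
# and a CMAQ-based regressor x(s) of 50 ppb.
y_hat = downscaler_mean(beta0_global=2.0, beta0_local=-0.5, beta1=0.9, x_s=50.0)

# Covariance at zero separation equals the variance; it shrinks by a factor of e
# at a separation equal to the range parameter.
cov_near = exponential_covariance(0.0, variance=4.0, range_km=100.0)
cov_far = exponential_covariance(100.0, variance=4.0, range_km=100.0)
```

The local intercept shifts the prediction up or down at each location, and the covariance function controls how similar those shifts are at nearby locations.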


-------
DS has additional properties that differentiate it from linear regression:

1. Rather than just finding a single optimal solution to Equation 1, DS uses a Bayesian approach so that
uncertainties can be generated along with each concentration prediction. This involves drawing
random samples of model parameters from built-in "prior" distributions and assessing their fit on the
data on the order of thousands of times. After each iteration, properties of the prior distributions are
adjusted to try to improve the fit of the next iteration. The resulting collection of $\tilde{\beta}_0$ and $\beta_1$ values at
each space-time point are the "posterior" distributions, and the means and standard deviations of
these are used to predict concentrations and associated uncertainties at new spatial points.

2. The model is "hierarchical" in structure, meaning that the top-level parameters in Equation 1 (i.e.
$\tilde{\beta}_0(s)$, $\beta_1$, $x(s)$) are actually defined in terms of further parameters and sub-parameters in the DS
code. For example, the overall slope and intercept are each defined to be the sum of a global (one value for
the entire spatial domain) and local (values specific to each spatial point) component. This gives more
flexibility in fitting a model to the data to optimize the fit (i.e. minimize $\epsilon(s)$).
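
The way posterior draws turn into a prediction and an uncertainty can be sketched as follows. This is a simplified illustration under our own assumptions, not the DS implementation: it takes hypothetical intercept and slope draws for a single location, applies Equation 1 to each draw, and reports the mean and sample standard deviation of the results.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_posterior(
    intercept_draws: Sequence[float],
    slope_draws: Sequence[float],
    x_new: float,
) -> Tuple[float, float]:
    """Return (prediction, uncertainty) from paired posterior draws at one new location."""
    if len(intercept_draws) != len(slope_draws) or len(intercept_draws) < 2:
        raise ValueError("need at least two paired draws")
    # One predicted concentration per posterior draw.
    predictions = [b0 + b1 * x_new for b0, b1 in zip(intercept_draws, slope_draws)]
    return mean(predictions), stdev(predictions)


# Three hypothetical draws; a real run would use on the order of thousands.
prediction, uncertainty = summarize_posterior(
    intercept_draws=[1.0, 2.0, 3.0],
    slope_draws=[1.0, 1.0, 1.0],
    x_new=10.0,
)
```

With these toy draws the per-draw predictions are 11, 12, and 13, so the reported concentration is their mean and the reported uncertainty is their spread.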

Further information about the development and inner workings of the current version of DS can be found in
Berrocal, Gelfand and Holland (2012)36 and references therein. The DS outputs that accompany this report
are described below, along with some additional analyses that include assessing the accuracy of the DS
predictions. Results are then summarized, and caveats are provided for interpreting them in the context of
air quality management activities.

5.3 Downscaler Concentration Predictions

In this application, DS was used to predict daily concentration and associated uncertainty values at the 2020
U.S. census tract centroids across the continental U.S. using measurement and CMAQ data as inputs. For
ozone, the concentration unit is the daily maximum 8-hour average in ppb and for PM2.5 the concentration
unit is the 24-hour average in µg/m3.

5.3.1 Summary of 8-hour Ozone Results

Figure 5-1 summarizes the AQS, CMAQ, and DS ozone data over the year 2021. It shows the 4th max daily
maximum 8-hour average ozone for AQS observations, CMAQ model predictions, and DS model results. The
DS model estimated that for 2021, about 42% of the U.S. Census tracts (35384 out of 83776) experienced at
least one day with an ozone value above the NAAQS of 70 ppb.
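
As a rough illustration of the two summaries used in this section, the sketch below computes the 4th highest daily value for a tract and the share of tracts with at least one day above a threshold. The tract identifiers and daily values are invented for the example, and the function names are hypothetical; only the 70 ppb threshold and the tract counts come from the text.

```python
def fourth_highest(daily_values):
    """Return the 4th highest value in a series of daily max 8-hour ozone (ppb)."""
    if len(daily_values) < 4:
        raise ValueError("need at least four daily values")
    return sorted(daily_values, reverse=True)[3]


def fraction_with_exceedance(daily_by_tract, threshold=70.0):
    """Fraction of tracts with at least one daily value above the threshold."""
    exceeding = sum(
        1
        for values in daily_by_tract.values()
        if any(v > threshold for v in values)
    )
    return exceeding / len(daily_by_tract)


# Toy data: two hypothetical tracts with five daily values each.
daily_by_tract = {
    "tract_A": [52.0, 61.5, 71.2, 68.0, 66.3],
    "tract_B": [48.1, 55.0, 59.9, 63.4, 60.2],
}
print(fourth_highest(daily_by_tract["tract_A"]))
print(fraction_with_exceedance(daily_by_tract))

# The tract counts quoted in the text reproduce the reported share.
print(round(100 * 35384 / 83776), "percent")
```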

36 Berrocal, V., Gelfand, A., and D. Holland. Space-Time Data Fusion Under Error in Computer Model Output: An Application to
Modeling Air Quality. Biometrics. 2012. September; 68(3): 837-848. doi:10.1111/j.1541-0420.2011.01725.

-------
[Figure: map panels for AQS, CMAQ, and DS covering the contiguous U.S. for 2021. Color scale: 4th max of daily max 8-hour average ozone (ppb), in bins (-Inf,55], (55,60], (60,65], (65,70], (70,75], (75,80], (80,85], (85,90], (90,Inf].]

Figure 5-1: Annual 4th max (daily max 8-hour ozone concentrations) derived from AQS, CMAQ, and DS data.

-------
5.3.2 Summary of PM2.5 Results

Figures 5-2 and 5-3 summarize the AQS, CMAQ, and DS PM2.5 data over the year 2021. Figure 5-2 shows
annual means and Figure 5-3 shows 98th percentiles of 24-hour PM2.5 concentrations for AQS observations,
CMAQ model predictions, and DS model results. The DS model estimated that for 2021 about 43% of the U.S.
Census tracts (35753 out of 83776) experienced at least one day with a PM2.5 value above the 24-hour
NAAQS of 35 µg/m3.
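
The annual statistics mapped in Figures 5-2 and 5-3 can be sketched as follows. The report does not state which percentile convention was used, so the nearest-rank definition below is an assumption, and the function names and sample values are hypothetical.

```python
import statistics


def annual_mean(daily_values):
    """Arithmetic mean of the 24-hour average PM2.5 values for one location."""
    return statistics.mean(daily_values)


def percentile_nearest_rank(daily_values, pct=98):
    """Nearest-rank percentile (assumed convention): the smallest value that
    has at least pct percent of the data at or below it. pct is an integer
    between 1 and 100."""
    ordered = sorted(daily_values)
    n = len(ordered)
    rank = (pct * n + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[rank - 1]


# Illustrative series: 100 made-up daily values from 1 to 100 µg/m3.
daily_values = [float(v) for v in range(1, 101)]
print(annual_mean(daily_values))
print(percentile_nearest_rank(daily_values, 98))
```

Other percentile conventions (for example, linear interpolation between ranks) give slightly different values for short records, which is why the convention is exposed here rather than hidden.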


-------
[Figure: map panels for AQS, CMAQ, and DS covering the contiguous U.S. for 2021. Color scale: annual mean of 24-hour average PM2.5 (µg/m3), in bins (0,3], (3,5], (5,8], (8,10], (10,12], (12,15], (15,18], (18,Inf].]

Figure 5-2: Annual mean PM2.5 concentrations derived from AQS, CMAQ, and DS data.

-------
[Figure: map panels for AQS, CMAQ, and DS covering the contiguous U.S. for 2021. Color scale: 98th percentile of 24-hour average PM2.5 (µg/m3), in bins (0,10], (10,15], (15,20], (20,25], (25,30], (30,35], (35,40], (40,45], (45,50], (50,Inf].]

Figure 5-3: 98th percentile 24-hour average PM2.5 concentrations derived from AQS, CMAQ, and DS data.

-------
5.4 Downscaler Uncertainties

5.4.1 Standard Errors

As mentioned above, the DS model works by drawing random samples from built-in distributions during its
parameter estimation. The standard errors associated with each of these sample populations provide a measure
of uncertainty for each concentration prediction. Figures 5-4 and 5-5 show the percent errors
obtained by dividing the DS standard errors by the associated DS predictions. The black dots on the maps
show the locations of EPA sampling network monitors whose data were input to DS via the AQS datasets
(Chapter 2). The maps show that, in general, errors are smaller in regions with more densely
situated monitors (e.g., the eastern U.S.) and larger in regions with sparser monitoring networks (e.g.,
western states). These standard errors could potentially be used to estimate the probability of an
exceedance for a given point estimate of a pollutant concentration.
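
One way such an exceedance probability might be computed is sketched below, together with the percent error shown in Figures 5-4 and 5-5. The sketch assumes the uncertainty around each prediction is approximately normal with the DS standard error as its standard deviation; that distributional assumption, the function names, and the example numbers are not from the report.

```python
from statistics import NormalDist


def percent_error(prediction, standard_error):
    """Relative error in percent: standard error divided by the prediction."""
    return 100.0 * standard_error / prediction


def exceedance_probability(prediction, standard_error, threshold):
    """Probability that the true concentration exceeds the threshold,
    assuming a normal distribution centered on the prediction."""
    distribution = NormalDist(mu=prediction, sigma=standard_error)
    return 1.0 - distribution.cdf(threshold)


# Hypothetical ozone prediction of 68 ppb with a standard error of 4 ppb,
# compared against the 70 ppb NAAQS level.
print(f"{percent_error(68.0, 4.0):.1f} % error")
print(f"{exceedance_probability(68.0, 4.0, 70.0):.2f} probability of exceedance")
```

A prediction just below the standard can still carry a substantial exceedance probability when the standard error is large, which is most relevant in sparsely monitored regions.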

[Figure: map of % DS error for ozone over the contiguous U.S., with monitor locations shown as black dots. Legend bins recovered: (5,10], (10,15], (15,20].]

Figure 5-4: Annual mean relative errors (standard errors divided by predictions) from the DS 2021 runs for
ozone. The black dots show the locations of monitors that generated the AQS data used as input to
the DS model.

-------
[Figure: map of % DS error for PM2.5 over the contiguous U.S., with monitor locations shown as black dots. Legend bins recovered: (20,30], (30,40], (40,50], (50,75].]

Figure 5-5: Annual mean relative errors (standard errors divided by predictions) from the DS 2021 runs for
PM2.5. The black dots show the locations of monitors that generated the AQS data used as input to
the DS model.

5.4.2 Cross Validation

To check the quality of its spatial predictions, DS can be set to perform "cross-validation" (CV), which
involves leaving a subset of AQS data out of the model run and predicting the concentrations of those left
out points. The predicted values are then compared to the actual left-out values to generate statistics that
provide an indicator of the predictive ability. In the DS runs associated with this report, 10% of the data was
chosen randomly by the DS model to be used for the CV process. The resulting CV statistics are shown below
in Table 5-1.

Table 5-1: Cross-validation statistics associated with the 2021 DS runs.

Pollutant    Monitor Count    Mean Bias    RMSE     Mean Coverage
PM2.5        967              0.121        3.578    0.952
O3           1237             0.020        4.329    0.960

The statistics indicated by the columns of Table 5-1 are as follows:

• Mean Bias: The bias of each prediction is the DS prediction minus the AQS value. This column is the
mean of all biases across the CV cases.

• Root Mean Squared Error (RMSE): The bias is squared for each CV prediction, then the square root of
the mean of all squared biases across all CV predictions is obtained.

• Mean Coverage: A value of 1 is assigned if the measured AQS value lies in the 95% confidence interval
of the DS prediction (the DS prediction ± the DS standard error), and 0 otherwise. This column is the
mean of all those 0's and 1's.
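
The three statistics defined above can be sketched as follows. The function name and the sample numbers are hypothetical. The text describes the interval as the prediction ± the standard error, while a normal-theory 95% interval would span about 1.96 standard errors, so the multiplier is exposed as the parameter z; its default of 1.96 is an assumption, and z=1.0 reproduces the literal wording.

```python
import math


def cv_statistics(predictions, standard_errors, observations, z=1.96):
    """Return (mean bias, RMSE, mean coverage) for held-out CV points.

    Bias is prediction minus observation. Coverage is 1 when the observation
    lies within prediction ± z * standard error, and 0 otherwise.
    """
    biases = [p - o for p, o in zip(predictions, observations)]
    mean_bias = sum(biases) / len(biases)
    rmse = math.sqrt(sum(b * b for b in biases) / len(biases))
    covered = [
        1 if abs(p - o) <= z * se else 0
        for p, se, o in zip(predictions, standard_errors, observations)
    ]
    mean_coverage = sum(covered) / len(covered)
    return mean_bias, rmse, mean_coverage


# Two made-up held-out points.
bias, rmse, coverage = cv_statistics(
    predictions=[10.0, 12.0],
    standard_errors=[1.0, 1.0],
    observations=[9.0, 15.0],
)
print(bias, round(rmse, 3), coverage)
```

A mean coverage close to 0.95 indicates that the reported uncertainties are about the right size for a 95% interval; values well below that would suggest the standard errors are too small.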

5.5 Summary and Conclusions

The results presented in this report are from an application of the DS fusion model for characterizing
national air quality for ozone and PM2.5. DS provided spatial predictions of daily ozone and PM2.5 at 2020 U.S.
census tract centroids by utilizing monitoring data and CMAQ output for 2021. Large-scale spatial and
temporal patterns of concentration predictions are generally consistent with those seen in ambient
monitoring data. Both ozone and PM2.5 were predicted with lower error in the eastern versus the western
U.S., presumably due to the greater monitoring density in the east.

An additional caution that warrants mentioning is related to the capability of DS to provide predictions at
multiple spatial points within a single CMAQ grid cell. Care needs to be taken not to over-interpret any
within-grid cell gradients that might be produced by a user. Fine-scale emission sources in CMAQ are diluted
into the grid cell averages, but a given source within a grid cell might or might not affect every spatial point
contained therein equally. Therefore DS-generated fine-scale gradients are not expected to represent actual
fine-scale atmospheric concentration gradients, unless possibly where multiple monitors are present in the
grid cell.

-------
Appendix A - Acronyms

ARW	Advanced Research WRF core model
BEIS	Biogenic Emissions Inventory System
BlueSky	Emissions modeling framework
BSP	BlueSky Pipeline modeling system
CAIR	Clean Air Interstate Rule
CAMD	EPA's Clean Air Markets Division
CAP	Criteria Air Pollutant
CAR	Conditional Auto Regressive spatial covariance structure (model)
CARB	California Air Resources Board
CEM	Continuous Emissions Monitoring
CHIEF	Clearinghouse for Inventories and Emissions Factors
CMAQ	Community Multiscale Air Quality model
CMV	Commercial marine vessel
CO	Carbon monoxide
CSN	Chemical Speciation Network
DQO	Data Quality Objectives
EGU	Electric Generating Units
Emission Inventory	Listing of elements contributing to atmospheric release of pollutant substances
EPA	Environmental Protection Agency
EMFAC	Emission Factor (California's onroad mobile model)
FAA	Federal Aviation Administration
FDDA	Four-Dimensional Data Assimilation
FIPS	Federal Information Processing Standards
HAP	Hazardous Air Pollutant
HC	Hydrocarbon
HMS	Hazard Mapping System
ICS-209	Incident Status Summary form
IPM	Integrated Planning Model
ITN	Itinerant
LSM	Land Surface Model
MOBILE	OTAQ's model for estimation of onroad mobile emissions factors
MODIS	Moderate Resolution Imaging Spectroradiometer
MOVES	Motor Vehicle Emission Simulator
NEEDS	National Electric Energy Database System
NEI	National Emission Inventory
NERL	National Exposure Research Laboratory
NESHAP	National Emission Standards for Hazardous Air Pollutants
NH3	Ammonia
NMIM	National Mobile Inventory Model
NONROAD	OTAQ's model for estimation of nonroad mobile emissions
NOx	Nitrogen oxides
OAQPS	EPA's Office of Air Quality Planning and Standards
OAR	EPA's Office of Air and Radiation
ORD	EPA's Office of Research and Development
ORIS	Office of Regulatory Information Systems (code): a 4- or 5-digit number assigned by the Department of Energy's (DOE) Energy Information Administration (EIA) to facilities that generate electricity
ORL	One Record per Line
OTAQ	EPA's Office of Transportation and Air Quality
PAH	Polycyclic Aromatic Hydrocarbon
PFC	Portable Fuel Container
PM2.5	Particulate matter less than or equal to 2.5 microns
PM10	Particulate matter less than or equal to 10 microns
PMc	Particulate matter greater than 2.5 microns and less than 10 microns
Prescribed Fire	Intentionally set fire to clear vegetation
RIA	Regulatory Impact Analysis
RPO	Regional Planning Organization
RRTM	Rapid Radiative Transfer Model
SCC	Source Classification Code
SMARTFIRE	Satellite Mapping Automatic Reanalysis Tool for Fire Incident Reconciliation
SMOKE	Sparse Matrix Operator Kernel Emissions
TSD	Technical support document
VOC	Volatile organic compounds
VMT	Vehicle miles traveled
Wildfire	Uncontrolled forest fire
WRAP	Western Regional Air Partnership
WRF	Weather Research and Forecasting Model

-------
Appendix B - Emissions Totals by Sector

Please see the independent spreadsheet Appendix_B_2021_emissions_totals_by_sector.xlsx that provides
inventory and speciation emissions totals for each emissions modeling sector.


-------
United States Environmental Protection Agency
Office of Air Quality Planning and Standards
Air Quality Assessment Division
Research Triangle Park, NC
Publication No. EPA-454/R-24-002
October 2024

-------