EPA
United States
Environmental Protection
Agency
 Bayesian space-time downscaling fusion model
 (downscaler) -Derived Estimates of Air Quality
 for 2009

-------
                                                         EPA-454/R-14-001
                                                             January 2014
Bayesian space-time downscaling fusion model (downscaler) -Derived
                  Estimates of Air Quality for 2009
                   U.S. Environmental Protection Agency
                Office of Air Quality Planning and Standards
                     Air Quality Assessment Division
                       Research Triangle Park, NC

-------
                              Contributors
                         Ellen Baldridge (EPA/OAR)
                           Halil Cakir (EPA/OAR)
                           Alison Eyth (EPA/OAR)
                          Dave Holland (EPA/ORD)
                           David Mintz (EPA/OAR)
                         Sharon Phillips (EPA/OAR)
                           Adam Reff (EPA/OAR)
                           Acknowledgements
The following people served as reviewers of this document and provided valuable comments:
                          Jan Cortelyou (EPA/OAR)
                           Dennis Doll (EPA/OAR)
                            Tyler Fox (EPA/OAR)
                           Neil Frank (EPA/OAR)
                          James Hemby (EPA/OAR)
                         Marc Houyoux (EPA/OAR)
                        Dr. Bryan Hubbell (EPA/OAR)

-------
                                     Contents
1.0   Introduction
2.0   Air Quality Data
  2.1   Introduction to Air Quality Impacts in the United States
  2.2   Ambient Air Quality Monitoring in the United States
  2.3   Air Quality Indicators Developed for the EPHT Network
3.0   Emissions Data
  3.1   Introduction to Emissions Data Development
  3.2   2009 Emission Inventories and Approaches
  3.3   Emissions Modeling Summary
4.0   CMAQ Air Quality Model Estimates
  4.1   Introduction to the CMAQ Modeling Platform
  4.2   CMAQ Model Version, Inputs and Configuration
  4.3   CMAQ Model Performance Evaluation
5.0   Bayesian space-time downscaling fusion model (downscaler) -Derived Air Quality Estimates
  5.1   Introduction
  5.2   Downscaler Model
  5.3   Downscaler Output
  5.4   Overview of Downscaler Model Results for 2009
  5.5   Accuracy Assessment of Downscaler Model Results
  5.6   Use of EPA Downscaler Model Predictions
Appendix A - Acronyms

-------
                                1.0   Introduction

 This report describes estimates of daily ozone (maximum 8-hour average) and PM2.5 (24-hour
 average) concentrations throughout the contiguous United States during the 2009 calendar
 year generated by EPA's recently developed data fusion method termed the "downscaler
 model" (DS). Air quality monitoring data from the National Air Monitoring Stations/State and
 Local Air Monitoring Stations (NAMS/SLAMS) and numerical output from the Community
 Multiscale Air Quality (CMAQ) model were both input to DS to predict concentrations at the
 2010 US census tract centroids encompassed by the CMAQ modeling domain. Information on
 EPA's air quality monitors, CMAQ model, and downscaler model is included to provide the
 background and context for understanding the data output presented in this report. These
 estimates are intended for use by statisticians and environmental scientists interested in the
 daily spatial distribution  of ozone and PM2.5.

 DS essentially operates by calibrating CMAQ data to the observational data and then using the
 resulting relationship to predict "observed" concentrations at new spatial points in the domain.
 Although similar in principle to a linear regression, spatial modeling aspects have been
 incorporated to improve the model fit, and a Bayesian1 approach to fitting is used to
 generate an uncertainty value associated with each concentration prediction. The uncertainties
 that DS produces are a major feature distinguishing it from fusion methods previously
 used by EPA, such as the "Hierarchical Bayesian" (HB) model (McMillan et al., 2009). The
 term "downscaler"  refers to the fact that DS takes grid-averaged data (CMAQ) for input and
 produces point-based estimates, thus "scaling down" the area of data representation. Although
 this allows air pollution concentration estimates  to be made at points where no observations
 exist, caution is needed when interpreting any within-gridcell spatial gradients generated by
 DS since they may  not exist in the input datasets. The theory, development, and initial
 evaluation of DS can be found in the earlier papers of Berrocal, Gelfand, and Holland (2009,
 2010, and 2011).
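
 As a rough illustration of the calibration idea described above, the following Python sketch
 fits an ordinary least-squares line relating model output to monitor observations and applies it
 at a new location. It is not the downscaler itself: the spatial modeling terms and the Bayesian
 fitting that produce the uncertainty values are omitted, and the function names and all numeric
 values are hypothetical.

```python
# Illustrative only: ordinary least-squares calibration of gridded model
# output to monitor observations. The actual downscaler adds spatial
# modeling and Bayesian fitting, neither of which is shown here.

def fit_calibration(model_values, observed_values):
    """Return (intercept, slope) of the line observed = intercept + slope * model."""
    n = len(model_values)
    if n != len(observed_values) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(model_values) / n
    mean_y = sum(observed_values) / n
    sxx = sum((x - mean_x) ** 2 for x in model_values)
    if sxx == 0:
        raise ValueError("model values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(model_values, observed_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, model_value):
    """Apply the fitted relationship to model output at a new point."""
    return intercept + slope * model_value


# Hypothetical paired values: model output in the grid cells that contain
# monitors, and the concentrations observed at those monitors.
cmaq_at_monitors = [10.0, 20.0, 30.0, 40.0]
observed = [12.0, 22.0, 32.0, 42.0]

intercept, slope = fit_calibration(cmaq_at_monitors, observed)
estimate = predict(intercept, slope, 25.0)  # unmonitored location, model value 25.0
```

 In this toy case the observations sit a constant amount above the model values, so the fitted
 line simply shifts the model output at the unmonitored location by that amount.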

 The data contained in this report are an outgrowth of a collaborative research partnership
 between EPA scientists from the Office of Research  and Development's (ORD) National
 Exposure Research Laboratory (NERL) and personnel from EPA's Office of Air and
 Radiation's (OAR) Office of Air Quality Planning and Standards (OAQPS). NERL's Human
 Exposure and Atmospheric Sciences Division (HEASD), Atmospheric Modeling Division
 (AMD), and Environmental Sciences Division (ESD), in conjunction with OAQPS,  work
 together to provide air quality monitoring data and model estimates to the Centers for Disease
 Control and Prevention (CDC) for use in their Environmental Public Health Tracking (EPHT)
 Network.
1 Bayesian statistical modeling refers to methods that are based on Bayes' theorem, and model the world in terms
of probabilities based on previously acquired knowledge.

-------
 CDC's EPHT Network supports linkage of air quality data with human health outcome data
 for use by various public health agencies throughout the U.S. The EPHT Network Program is
 a multidisciplinary collaboration that involves the ongoing collection, integration, analysis,
 interpretation, and dissemination of data from: environmental hazard monitoring activities;
 human exposure assessment information; and surveillance of noninfectious health conditions.
 As part of the National EPHT Program efforts, the CDC led the initiative to build the National
 EPHT Network (http://www.cdc.gov/nceh/tracking/default.htm). The National EPHT
 Program, with the EPHT Network as its cornerstone,  is the CDC's response to requests calling
 for improved understanding of how the environment affects  human health. The EPHT
 Network is designed to provide the means to identify, access, and organize hazard, exposure,
 and health data from a variety of sources and to examine,  analyze and interpret those data
 based on their spatial and temporal characteristics.

 Since 2002, EPA has collaborated with the CDC on the development of the EPHT Network.
 On September 30, 2003, the Secretary of Health and Human Services (HHS) and the
 Administrator of EPA signed a joint Memorandum of Understanding (MOU) with the
 objective of advancing efforts to achieve mutual environmental public health goals2. HHS,
 acting through the CDC and the Agency for Toxic Substances and Disease Registry
 (ATSDR), and EPA agreed to expand their cooperative activities in support of the CDC
 EPHT Network and EPA's Central Data Exchange Node on the Environmental Information
 Exchange Network in the following areas:

    •   Collecting, analyzing and interpreting environmental and health data from both
        agencies (HHS and EPA).

    •   Collaborating on emerging information technology practices related to building,
        supporting, and operating the CDC EPHT Network and the Environmental
        Information Exchange Network.

    •   Developing and validating additional environmental  public health indicators.

    •   Sharing reliable environmental and public health data between their respective
        networks in an  efficient and effective manner.

    •   Consulting and informing each other about dissemination of results obtained through
        work carried out under the MOU and the associated Interagency Agreement (IAG)
        between EPA and CDC.
2 HHS and EPA agreed to extend the duration of the MOU, effective since 2002 and renewed in 2007, until June 29,
2017. The MOU is available at www.cdc.gov/nceh/tracking/partners/epa_mou_2007.htm.

-------
The best available statistical fusion model, air quality data, and CMAQ numerical model
output were used to develop the 2009 estimates. Fusion results can vary with different inputs
and fusion modeling approaches. As new and improved statistical models become available,
EPA will provide updates.

Although these data have been processed on a computer system at the Environmental Protection
Agency, no warranty expressed or implied is made regarding the accuracy or utility of the data on
any other system or for general or scientific purposes, nor shall the act of distribution of the data
constitute any such warranty. It is also strongly recommended that careful attention be paid to the
contents of the metadata file associated with these data to evaluate data set limitations, restrictions
or intended use.  The U.S. Environmental Protection Agency shall not be held liable for improper
or incorrect use of the data described and/or contained herein.

The four remaining sections and one appendix in the report are as follows.
    •   Section 2 describes the air quality data obtained from EPA's nationwide monitoring
       network and the importance of the monitoring data in determining potential
       health risks.

    •   Section 3 details the emissions inventory data, how it is obtained and its role as a key
       input into the CMAQ air quality computer model.

    •   Section 4 describes the CMAQ computer model and its role in providing estimates of
       pollutant concentrations across the U.S. based on 12-km grid cells over the contiguous
       U.S.

    •   Section 5 explains the downscaler model used to statistically combine air quality
       monitoring data and air quality estimates from the CMAQ model to provide daily air
       quality estimates for the 2010 US census tract centroid locations within the contiguous
       U.S.

    •   The appendix provides a description of acronyms used in this report.

-------
                              2.0   Air Quality Data

To compare health outcomes with air quality measures, it is important to understand the origins
of those measures and the methods for obtaining them. This section provides a brief overview of
the origins and process of air quality regulation in this country. It provides a detailed discussion
of ozone (O3) and particulate matter (PM). The EPHT program has focused on these two
pollutants, since numerous studies have found them to be most pervasive and harmful to public
health and the environment, and there are extensive monitoring and modeling data available.

2.1    Introduction to Air Quality Impacts in the United States

2.1.1   The Clean Air Act
In 1970, the Clean Air Act (CAA) was signed into law.  Under this law, EPA sets limits on how
much of a pollutant can be in the air anywhere in the United States. This ensures that all
Americans have the same basic health and environmental protections. The CAA has been
amended several times to keep pace with new information.  For more information on the CAA,
go to http://www.epa.gov/oar/caa/.

Under the CAA, the U.S. EPA has established standards or limits for six air pollutants, known as
the criteria air pollutants: carbon monoxide (CO), lead (Pb), nitrogen dioxide (NO2), sulfur
dioxide (SO2), ozone (O3), and particulate matter (PM).  These standards, called the National
Ambient Air Quality Standards (NAAQS), are designed to protect public health and the
environment.  The CAA  established two types of air quality standards. Primary standards set
limits to protect public health,  including the health of "sensitive" populations such as asthmatics,
children, and the elderly.  Secondary standards set limits to protect public welfare, including
protection against decreased visibility, damage to animals, crops, vegetation, and buildings.  The
law requires EPA to review these standards periodically. For more specific information on the
NAAQS, go to www.epa.gov/air/criteria.html.  For general information on the criteria pollutants,
go to http://www.epa.gov/air/urbanair/6poll.html.

When these standards are not met, the area is designated as a nonattainment area. States must
develop state implementation plans (SIPs) that explain the regulations and controls they will use to
clean up the nonattainment areas. States with an EPA-approved SIP can request that the area be
redesignated from nonattainment to attainment by providing three consecutive years of data
showing NAAQS compliance. The state must also provide a maintenance plan to demonstrate
how it will continue to comply with the NAAQS and demonstrate compliance over a 10-year
period, and what corrective actions it will take should a NAAQS violation occur after
redesignation. EPA must review and approve the NAAQS compliance data and the maintenance
plan before redesignating the area; thus, a person may live in an area designated as nonattainment
even though no NAAQS violation has been observed for quite some time. For more information
on designations, go to http://www.epa.gov/ozonedesignations/  and
http://www.epa.gov/pmdesignations.

-------
2.1.2  Ozone
Ozone is a colorless gas composed of three oxygen atoms. Ground level ozone is formed when
pollutants released from cars, power plants, and other sources react in the presence of heat and
sunlight. It is the prime ingredient of what is commonly called "smog."  When inhaled, ozone can
cause acute respiratory problems, aggravate asthma, cause inflammation of lung tissue, and even
temporarily decrease the lung capacity of healthy adults.  Repeated exposure may permanently scar
lung tissue.  Toxicological, human exposure, and epidemiological studies were integrated by EPA
in "Air Quality Criteria for Ozone and Related Photochemical Oxidants." It is available at
http://www.epa.gov/ttn/naaqs/standards/ozone/s_o3_index.html.  The current (as of October 2008)
NAAQS for ozone is a daily maximum 8-hour average of 0.075 parts per million [ppm] (for details,
see http://www.epa.gov/ozonedesignations/).  The Clean Air Act requires EPA to review the
NAAQS at least every five years and revise them as appropriate in accordance with Section 108
and Section  109 of the Act.

2.1.3  Particulate Matter
PM air pollution is a complex mixture of small and large particles of varying origin that can
contain hundreds of different chemicals, including cancer-causing agents like polycyclic aromatic
hydrocarbons (PAH), as well as heavy metals such as arsenic and cadmium.  PM air pollution
results from direct emissions of particles as well  as particles formed through chemical
transformations of gaseous air pollutants.  The characteristics, sources, and potential health effects
of particulate matter depend on its source,  the season, and atmospheric conditions.

As a practical convention, PM is divided by size into classes with differing health concerns and
potential sources3. Particles less than 10 micrometers in diameter (PM10) pose a health concern
because they can be inhaled into and accumulate in the respiratory system.  Particles less than 2.5
micrometers in diameter (PM2.5) are referred to as "fine" particles.  Because of their small size, fine
particles can lodge deeply into the lungs. Sources of fine particles include all types of combustion
(motor vehicles, power plants, wood burning, etc.) and some industrial processes. Particles with
diameters between 2.5 and 10 micrometers (PM10-2.5) are referred to as "coarse" or PMc.  Sources
of PMc include crushing or grinding operations and dust from paved or unpaved roads.  The
distribution of PM10, PM2.5 and PMc varies from the Eastern U.S. to arid western areas.

Particle pollution - especially fine particles - contains microscopic solids and liquid droplets that
are  so small that they can get deep into the lungs and cause serious health problems.  Numerous
scientific studies have linked particle pollution exposure to a variety of problems, including
premature death in people with heart or lung disease, nonfatal heart attacks, irregular heartbeat,
aggravated asthma, decreased lung function, and increased respiratory symptoms, such as irritation
of airways, coughing or difficulty breathing. Additional information on the health effects of
particle pollution and other technical documents related to PM standards are available at
http://www.epa.gov/ttn/naaqs/standards/pm/s_pm_index.html.
3 The measure used to classify PM into sizes is the aerodynamic diameter. The measurement instruments used for PM
are designed and operated to separate large particles from the smaller particles. For example, the PM2.5 instrument only
captures and thus measures particles with an aerodynamic diameter less than 2.5 micrometers.  The EPA method to
measure PMc is designed around taking the mathematical difference between measurements for PM10 and PM2.5.

-------
The current NAAQS for PM2.5 includes both a 24-hour standard to protect against short-term effects, and
an annual standard to protect against long-term effects. The annual average PM2.5 concentration must not
exceed 12.0 micrograms per cubic meter (ug/m3), and the 24-hr average concentration must not exceed 35
ug/m3. More information is available at http://www.epa.gov/air/criteria.html and
http://www.epa.gov/oar/particlepollution/. The standards for PM2.5 values are shown in Table 2-1.
                                  Table 2-1. PM2.5 Standards

                  Measurement (ug/m3)      1997      2006      2012
                  Annual Average           15.0      15.0      12.0
                  24-Hour Average          65        35        35

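
The levels in Table 2-1 can be expressed as a simple lookup, as in the Python sketch below. The
dictionary layout and function name are hypothetical. The sketch only compares a single
concentration with a standard level; it does not reproduce the multi-year statistics that
regulatory attainment determinations rely on.

```python
# PM2.5 NAAQS levels transcribed from Table 2-1 (ug/m3), keyed by the year
# of the standard. The structure and names are illustrative assumptions.
PM25_STANDARDS = {
    1997: {"annual": 15.0, "daily": 65.0},
    2006: {"annual": 15.0, "daily": 35.0},
    2012: {"annual": 12.0, "daily": 35.0},
}


def exceeds_standard(concentration, averaging_period, standard_year=2012):
    """Return True if a concentration (ug/m3) is above the chosen standard level.

    averaging_period is "annual" or "daily" (24-hour average).
    """
    level = PM25_STANDARDS[standard_year][averaging_period]
    return concentration > level


# A hypothetical 24-hour average of 40 ug/m3 exceeds the 2006 and 2012 daily
# levels (35 ug/m3) but not the 1997 daily level (65 ug/m3).
over_2006 = exceeds_standard(40.0, "daily", 2006)
over_1997 = exceeds_standard(40.0, "daily", 1997)
```
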
2.2    Ambient Air Quality Monitoring in the United States

2.2.1   Monitoring Networks
The Clean Air Act requires every state to establish a network of air monitoring stations for criteria
pollutants, following specific guidelines for their location and operation. The monitoring stations in this
network have been called the State and Local Air Monitoring Stations (SLAMS). The SLAMS network
consists of approximately 4,000 monitoring sites whose distribution is largely determined by the needs of
State and local air pollution control agencies. All ambient monitoring networks selected for use in
SLAMS are tested periodically to assess the quality of the  SLAMS data being produced. Measurement
accuracy and precision are estimated for both automated and manual methods. The individual results of
these tests for each method or analyzer are reported to EPA. Then, EPA calculates quarterly integrated
estimates of precision and accuracy for the SLAMS data.

The National Air Monitoring Station network (NAMS) is about a 1,000-site subset of the SLAMS
network, with emphasis on areas of maximum concentrations and high population density in urban and
multi-source areas. The NAMS monitoring sites are designed to obtain more timely and detailed
information about air quality in strategic locations and must meet more stringent monitor siting,
equipment type, and quality assurance criteria.  NAMS monitors also must submit detailed quarterly and
annual monitoring results to EPA.

The SLAMS and NAMS networks experienced accelerated growth throughout the 1970s. The networks
were further expanded in 1999 following the 1997 revision of the CAA to include separate standards for
fine particles (PM2.5) based on their link to serious health problems ranging from increased symptoms,
hospital admissions, and emergency room visits, to premature death in people with heart or lung disease.
While most of the monitors in these networks are located in populated areas of the country, "background"
and rural monitors are an important part of these networks. For criteria pollutants other than ozone and
PM2.5, the number of monitors has declined. For more information on SLAMS and NAMS, as well as
EPA's other air monitoring networks, go to www.epa.gov/ttn/amtic.

In 2009, approximately 43 percent of the US population was living within 10 kilometers of ozone and
PM2.5 monitoring sites. In terms of US Census Bureau tract locations, 31,341 out of 72,283 census tract
centroids were within 10 kilometers of ozone monitoring sites. The highly populated Eastern US and
California coasts are well covered by both the ozone and PM2.5 monitoring networks (Figure 2-1).
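
A coverage figure of this kind (31,341 of 72,283 centroids is about 43 percent) can be computed
from centroid and monitor coordinates. The report does not state how the distances were
calculated, so the Python sketch below assumes great-circle (haversine) distances on a spherical
Earth; the coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_monitor_km(centroid, monitors):
    """Distance from one (lat, lon) centroid to the closest (lat, lon) monitor."""
    return min(haversine_km(centroid[0], centroid[1], m[0], m[1]) for m in monitors)


def share_within(centroids, monitors, radius_km=10.0):
    """Fraction of centroids whose nearest monitor is within radius_km."""
    within = sum(1 for c in centroids if nearest_monitor_km(c, monitors) <= radius_km)
    return within / len(centroids)


# Hypothetical coordinates: one monitor and three tract centroids located
# 0, about 5.6, and about 67 kilometers north of it.
monitors = [(35.90, -78.87)]
centroids = [(35.90, -78.87), (35.95, -78.87), (36.50, -78.87)]
coverage = share_within(centroids, monitors)  # two of the three are within 10 km
```

A brute-force search such as this is adequate for a small example; for tens of thousands of
centroids a spatial index would normally be used instead.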

-------
[Figure 2-1 image not reproduced. Legend, for both the distance to the nearest ozone monitor and the
distance to the nearest PM2.5 monitor: 41 - 10,000 meters; 10,001 - 25,000 meters; 25,001 - 50,000
meters; 50,001 - 75,000 meters; 75,001 - 100,000 meters; 100,001 - 150,000 meters; 150,001 - 333,252
meters.]
Figure 2-1. Distances from US Census Tract centroids to the nearest monitoring site

-------
In summary, state and local agencies and tribes implement a quality-assured monitoring network to
measure air quality across the United States.  EPA provides guidance to ensure a thorough understanding
of the quality of the data produced by these networks. These monitoring data have been used to
characterize the status of the nation's air quality and the trends across the U.S. (see
www.epa.gov/airtrends).

2.2.2  Air Quality System Database
EPA's Air Quality System (AQS) database contains ambient air pollution data collected by EPA,  state,
local, and tribal air pollution control agencies from thousands of monitoring stations.  AQS also contains
meteorological data, descriptive information  about each monitoring station (including its geographic
location and its operator), and data quality assurance and quality control information. State and local
agencies are required to submit their air quality monitoring data into AQS within 90 days following the
end of the quarter in which the data were collected. This ensures timely submission of these data for use
by state, local, and tribal agencies, EPA, and the public. EPA's Office of Air Quality Planning and
Standards and other AQS users rely  upon  the data in AQS to assess air quality, assist in compliance with
the NAAQS, evaluate SIPs, perform modeling for permit review analysis, and perform other air quality
management functions. For more details,  including how users can retrieve data, go to
http://www.epa.gov/ttn/airs/airsaqs/index.htm.

2.2.3  Advantages and Limitations of the Air Quality Monitoring and Reporting System
Air quality data is required to assess public health outcomes that are affected by poor air quality. The
challenge is to get surrogates for air  quality on time and spatial scales that are useful for Environmental
Public Health Tracking activities.

The advantage of using ambient data from EPA monitoring networks for comparing with health outcomes
is that these measurements of pollution concentrations are the best characterization of the concentration
of a given pollutant at a given time and location.  Furthermore, the data are supported by a comprehensive
quality assurance program, ensuring data of known quality.  One disadvantage of using the ambient data
is that it is usually out of spatial and temporal alignment with health outcomes. This spatial and temporal
'misalignment' between air quality monitoring data and health outcomes is influenced by two key
factors: (1) the living and/or working locations (microenvironments) where a person spends time are
often not co-located with an air quality monitor; and (2) the time(s)/date(s) when a patient experiences
a health outcome/symptom (e.g., asthma attack) may not coincide with the time(s)/date(s) when an air
quality monitor records ambient concentrations of a pollutant high enough to affect the symptom (e.g.,
an asthma attack either during or shortly after a high PM2.5 day). To compare/correlate ambient
concentrations with acute health effects, daily local air quality data is needed4.  Spatial gaps exist in
the air quality monitoring
network, especially in rural areas, since the air quality monitoring network is designed to focus on
measurement of pollutant concentrations in high population density areas.  Temporal limits also exist.
Hourly ozone measurements are aggregated to daily values (the daily max 8-hour average is relevant to
the ozone standard).  Ozone is typically monitored during the ozone season (the warmer months,
approximately April through October).  However, year-long data is available in many areas and is
extremely useful to evaluate whether ozone is a factor in health outcomes during the non-ozone seasons.
PM2.5 is generally measured year-round. Most Federal Reference Method (FRM) PM2.5 monitors collect
data one day in every three days, due in part to the time and costs involved in collecting and analyzing the
samples. However, over the past several years, continuous monitors, which can automatically collect,
analyze, and report PM2.5 measurements on an hourly basis, have been introduced.  These monitors are
available in most of the major metropolitan areas. Some of these continuous monitors have been
determined to be equivalent to the FRM monitors for regulatory purposes and are called FEM (Federal
Equivalent Methods).
4 EPA uses exposure models to evaluate the health risks and environmental effects associated with exposure.
These models are limited by the availability of air quality estimates; see http://www.epa.gov/ttn/fera/index.html.
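
The aggregation of hourly ozone measurements to a daily maximum 8-hour average, mentioned above,
can be sketched in Python as follows. This is a simplified illustration: it assumes a complete
series of hourly values and ignores the data-completeness and day-boundary rules that apply in
regulatory calculations. The function name and concentrations are hypothetical.

```python
def daily_max_8hr_average(hourly_ppm):
    """Return the largest 8-hour running mean found in a list of hourly values.

    Assumes the list has no missing hours; every consecutive 8-hour window
    that fits inside the list is considered.
    """
    window = 8
    if len(hourly_ppm) < window:
        raise ValueError("need at least 8 hourly values")
    best = None
    for start in range(len(hourly_ppm) - window + 1):
        average = sum(hourly_ppm[start:start + window]) / window
        if best is None or average > best:
            best = average
    return best


# Hypothetical day: low overnight values, an 8-hour afternoon peak of
# 0.080 ppm, and moderate evening values.
hourly = [0.030] * 8 + [0.080] * 8 + [0.040] * 8
daily_value = daily_max_8hr_average(hourly)  # approximately 0.080 ppm
```
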

2.2.4   Use of Air Quality Monitoring Data
Air quality monitoring data has been used to provide the information for the following situations:

(1) Assessing effectiveness of SIPs in addressing NAAQS nonattainment areas
(2) Characterizing local, state, and national air quality status and trends
(3) Associating health and environmental damage with air quality levels/concentrations

For the EPHT effort, EPA is providing air quality data to support efforts associated with (2) and (3) above.
Data supporting (3) is generated by EPA through the use of its air quality data and its downscaler model.

Most studies that associate air quality with health outcomes use  air monitoring as a surrogate for exposure
to the air pollutants being investigated. Many studies have used the monitoring networks operated by
state and federal agencies.  Some studies perform special monitoring that can better represent exposure to
the air pollutants: community monitoring, near residences, in-house or work place monitoring, and
personal monitoring.  For the EPHT program, special monitoring is generally not supported, though it
could be used on a case-by-case basis.

From proximity-based exposure estimates to statistical interpolation, many approaches have been developed
for estimating exposures to air pollutants using ambient monitoring data (Jerrett et al., 2005). Depending
upon the approach and the spatial and temporal distribution of ambient monitoring data, exposure
estimates to air pollutants may vary greatly in areas farther from monitors (Bravo et al., 2012).
Factors like limited temporal coverage (e.g., many PM2.5 monitors record only every third day, and ozone
monitors operate during only part of the year) and limited spatial coverage (e.g., most monitors are
located in urban areas and rural coverage is limited) limit the accuracy of most of the
interpolation techniques that use monitoring data alone as the input. If we look at the example of
Voronoi Neighbor Averaging (VNA) (referred to as Nearest Neighbor Averaging in most literature),
rural estimates would be biased towards the urban estimates. To further explain this point, assume the
plausible scenario of two cities with monitors and no monitors in the rural areas between them.
Since exposure estimates in VNA are guaranteed to be within the range of the monitor values, estimates
for the rural areas would be drawn toward the urban values and, where rural concentrations are actually
lower than urban ones, would be too high.
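
The two-city scenario can be illustrated with a short Python sketch. It uses plain
inverse-distance weighting of two monitors along a line as a stand-in for VNA; the actual VNA
procedure selects neighboring monitors from a Voronoi diagram, and its weighting details may
differ. The function name, positions, and concentrations are hypothetical.

```python
def inverse_distance_estimate(target_km, monitors, power=1.0):
    """Inverse-distance weighted average at target_km.

    monitors is a list of (position_km, concentration) pairs on a line.
    """
    weighted = []
    for position_km, concentration in monitors:
        distance = abs(target_km - position_km)
        if distance == 0:
            return concentration  # target coincides with a monitor
        weighted.append((1.0 / distance ** power, concentration))
    total_weight = sum(weight for weight, _ in weighted)
    return sum(weight * value for weight, value in weighted) / total_weight


# Hypothetical urban monitors 100 km apart, reading 14.0 and 10.0 ug/m3.
city_monitors = [(0.0, 14.0), (100.0, 10.0)]

# Estimate for a rural point halfway between the two cities.
rural_estimate = inverse_distance_estimate(50.0, city_monitors)
```

Because the result is a weighted average with positive weights, it cannot fall below the lower
of the two urban readings, whatever the true rural concentration is.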

Air quality models may overcome some of the limitations that monitoring networks possess. Models such
as the Community Multiscale Air Quality (CMAQ) modeling system can estimate concentrations at
reasonable temporal and spatial resolutions. However, these sophisticated air quality models are prone to
systematic biases since they depend upon so many variables (e.g., meteorological models and emission
models) and complex chemical and physical process simulations.

Combining monitoring data with air quality models (via fusion or regression) may provide the best results
in terms of estimating ambient air concentrations in space and time.   EPA's eVNA5 is an example of an
earlier approach for merging air quality monitor data with CMAQ model predictions.  The downscaler
model attempts to address some of the shortcomings in these earlier attempts to statistically combine
monitor and model-predicted data; see the published papers referenced in Section 1 for more information
about the downscaler model. As discussed in the next section, there are two methods used in EPHT to provide
estimates of ambient concentrations of air pollutants: air quality monitoring data and the downscaler
model estimate, which is a statistical 'combination' of air quality monitor data and photochemical air
quality model predictions (e.g., CMAQ).

2.3    Air Quality Indicators Developed for the EPHT Network
Air quality indicators have been developed for use in the Environmental Public Health Tracking Network
by CDC using the ozone and PM2.5 data from EPA. The approach used divides "indicators" into two
categories.  First, basic air quality measures were developed to compare air quality levels over space and
time within a public health context (e.g., using the NAAQS as a benchmark).  Next, indicators were
developed that mathematically link air quality data to public health tracking data (e.g., daily PM2.5 levels
and hospitalization data for acute myocardial  infarction). Table 2-3 and Table 2-4 describe the issues
impacting calculation of basic air quality indicators.
                  Table 2-2. Public Health Surveillance Goals and Current Status

      Goal:    Air data sets and metadata required for air quality indicators are available to EPHT
               state Grantees.
      Status:  AQS data are available through state agencies and EPA's Air Quality System (AQS). EPA
               and CDC developed an interagency agreement, where EPA provides air quality data along
               with statistically combined AQS and Community Multiscale Air Quality (CMAQ) Model data,
               associated metadata, and technical reports that are delivered to CDC.

      Goal:    Estimate the linkage or association of PM2.5 and ozone on health to:
               •   Identify populations that may have higher risk of adverse health effects due to
                   PM2.5 and ozone,
               •   Generate hypotheses for further research, and
               •   Provide information to support prevention and pollution control strategies.
      Status:  Regular discussions have been held on health-air linked indicators and CDC/HFI/EPA
               convened a workshop in January 2008. CDC has collaborated on a health impact
               assessment (HIA) with Emory University, EPA, and state grantees that can be used to
               facilitate greater understanding of these linkages.

      Goal:    Produce and disseminate basic indicators and other findings in electronic and print
               formats to provide the public, environmental health professionals, and policymakers
               with current and easy-to-use information about air pollution and the impact on
               public health.
      Status:  Templates and "how to" guides for PM2.5 and ozone have been developed for routine
               indicators. Calculation techniques and presentations for the indicators have been
               developed.

5 eVNA is described in the "Regulatory Impact Analysis for the Final Clean Air Interstate Rule", EPA-452/R-05-002,
March 2005, http://www.epa.gov/cair/pdfs/finaltech08.pdf. Appendix F.

   Table 2-3. Basic Air Quality Indicators used in EPHT, derived from the EPA data delivered to
                                              CDC
    Ozone (daily 8-hr period with maximum concentration, ppm, by Federal Reference Method (FRM))
    •   Number of days with maximum ozone concentration over the NAAQS (or other relevant benchmarks) (by county
       and MSA)
    •   Number of person-days with maximum 8-hr average ozone concentration over the NAAQS & other relevant
       benchmarks (by county and MSA)
    PM2.5 (daily 24-hr integrated samples, µg/m3, by FRM)
    •   Average ambient concentrations of particulate matter (< 2.5 microns in diameter) compared to the annual
       PM2.5 NAAQS (by state).
    •   % population exceeding annual PM2.5 NAAQS (by state).
    •   % of days with PM2.5 concentration over the daily NAAQS (or other relevant benchmarks) (by county and MSA)
    •   Number of person-days with PM2.5 concentration over the daily NAAQS & other relevant benchmarks (by
       county and MSA)
2.3.1   Rationale for the Air Quality Indicators
The CDC EPHT Network is initially focusing on ozone and PM2.5. These air quality indicators are based
mainly around the NAAQS health findings and program-based measures (measurement, data and analysis
methodologies). The indicators will allow comparisons across space and time for EPHT actions. They
are in the context of health-based benchmarks. By bringing population into the measures, they roughly
distinguish between potential exposures (at broad scale).
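
As a minimal illustration of how the day-count and person-day indicators in Table 2-3 could be computed, the Python sketch below counts the days on which a daily maximum concentration exceeds a benchmark and multiplies that count by population. The function names, the five sample concentrations, the 0.075 ppm benchmark, and the county population are assumptions made for the example; they are not values taken from the EPHT indicator templates.

```python
from typing import Sequence


def exceedance_days(daily_max: Sequence[float], benchmark: float) -> int:
    """Count days whose daily maximum concentration is above the benchmark."""
    return sum(1 for value in daily_max if value > benchmark)


def person_days(daily_max: Sequence[float], benchmark: float, population: int) -> int:
    """Exceedance days multiplied by the population of the area."""
    return exceedance_days(daily_max, benchmark) * population


# Hypothetical county: five days of daily maximum 8-hr ozone (ppm).
ozone_ppm = [0.061, 0.078, 0.080, 0.070, 0.076]
BENCHMARK_PPM = 0.075        # assumed benchmark, for illustration only
COUNTY_POPULATION = 250_000  # hypothetical population

print(exceedance_days(ozone_ppm, BENCHMARK_PPM))
print(person_days(ozone_ppm, BENCHMARK_PPM, COUNTY_POPULATION))
```

Because the person-day measure weights every exceedance day by the whole population, it separates areas only at a broad scale, consistent with the caveat in the paragraph above.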

2.3.2   Air Quality Data Sources
The air quality data will be available in the US EPA Air Quality System (AQS) database based on the
state/federal air program's data collection and processing. The AQS database contains ambient air
pollution data collected by EPA, state, local, and tribal air pollution control agencies from thousands of
monitoring stations (SLAMS and NAMS).

2.3.3   Use of Air Quality Indicators for Public Health Practice
The basic indicators will be used to inform policymakers and the public regarding the degree of hazard
within a  state and across states (national). For example, the number of days per year that ozone is above
the NAAQS can be used to communicate to sensitive populations (such  as asthmatics) the number of days
that they may be exposed to unhealthy levels of ozone.  This is the same level used in the Air Quality
Alerts that inform these  sensitive populations when and how to reduce their exposure.  These indicators,
however, are not a surrogate measure of exposure and therefore will not be linked with health data.

                                  3.0   Emissions Data
3.1    Introduction to Emissions Data Development

The U.S. Environmental Protection Agency (EPA) developed an air quality modeling platform based
primarily on the 2008 National Emissions Inventory (NEI), Version 2 to process year 2009 emission data
for this project. This section provides a summary of the emissions inventory and emissions modeling
techniques applied to Criteria Air Pollutants (CAPs) and the following select Hazardous Air Pollutants
(HAPs): chlorine (Cl), hydrogen chloride (HCl), benzene, acetaldehyde, formaldehyde and methanol.
This section also describes the approach and data used to produce emissions inputs to the air quality
model. The air quality modeling, meteorological inputs and boundary conditions are described in a
separate section.

The Community Multiscale Air Quality (CMAQ) model (http://www.epa.gov/AMD/CMAQ/) is used to
model ozone (O3) and particulate matter (PM) for this project. CMAQ requires hourly and gridded
emissions of the following inventory pollutants: carbon monoxide (CO), nitrogen oxides (NOx), volatile
organic compounds (VOC), sulfur dioxide (SO2), ammonia (NH3), particulate matter less than or equal
to 10 microns (PM10), and individual component species for particulate matter less than or equal to 2.5
microns (PM2.5). In addition, the CMAQ CB05 with chlorine chemistry used here allows for explicit
treatment of the VOC HAPs benzene, acetaldehyde, formaldehyde and methanol (BAFM) and includes
anthropogenic HAP emissions of HCl and Cl.

The effort to create the 2009 emission inputs for this study included development of emission inventories
for a 2009 model evaluation case, and application of emissions modeling tools to convert the inventories
into the format and resolution needed by CMAQ. An evaluation case uses 2009-specific fire and
continuous emission monitoring (CEM) data for electric generating units (EGUs) whereas other types of
cases use averages for these sources. The primary emissions modeling tool used to create the CMAQ
model-ready emissions was the Sparse Matrix Operator Kernel Emissions (SMOKE) modeling system.
SMOKE version 3.1 was used to create emissions files for a 12-km national grid. Additional information
about SMOKE is available from http://www.smoke-model.org.

This chapter contains two additional sections. Section 3.2 describes the inventories input to SMOKE and
the ancillary files used along with the emission inventories. Section 3.3 describes the emissions modeling
performed to convert the inventories into the format and resolution needed by CMAQ.


3.2    2009 Emission Inventories and Approaches

This section describes the emissions inventories created for input to SMOKE. The 2008 NEI, which is the
primary basis for the input to  SMOKE, includes five main categories of source sectors: a) nonpoint
(formerly called "stationary area") sources; b) point sources; c) nonroad mobile sources; d) onroad
mobile sources; and e) fires. The NEI data are largely compiled from data submitted by state, local and
tribal (S/L/T) agencies for CAPs. HAP emissions data are often augmented by EPA because they are a
voluntary component.  The 2008 NEI was compiled using the Emissions Inventory System (EIS). EIS
includes hundreds of automated QA checks to help improve data quality, and also supports release point
(stack) coordinates separately from facility coordinates. Improved EPA collaboration with S/L/T
agencies prevented duplication between point and nonpoint source categories such as industrial boilers.
Documentation for the 2008 NEI is available at
http://www.epa.gov/ttn/chief/net/2008inventory.html#inventorydoc.

2009-specific data submitted by S/L/T agencies was used for some large point sources. For EGU
emissions, 2009 continuous emissions monitoring (CEM) data was used where it was available. For fires,
EPA used the SMARTFIRE2 (SF2) system to develop 2009 emissions. SF2 was the first system to
assign all fires as either prescribed burning or wildfire categories and includes improved emission factor
estimates for prescribed burning.  2009-specific data for onroad, nonroad, and large commercial marine
sources was also developed. Some data obtained from regional planning organizations (RPOs) was
substituted for NEI data where the RPO data was more recently collected. California-provided mobile
source emissions were also used. For inventories  outside of the United States, including Canada, Mexico
and offshore emissions, the latest  available base year inventories were used.

The methods used to process emissions for this project are very similar to those documented for EPA's
Version 5, 2007 Emissions Modeling Platform. A technical support document (TSD) for this platform is
available at EPA's emissions modeling clearinghouse (EMCH):
http://www.epa.gov/ttn/chief/emch/index.html#pmnaaqs. Electronic copies of inventories similar to those
used for this project are available  in the same section of the EMCH.

The emissions modeling process, performed using SMOKE v3.1, apportions the emissions inventories
into the grid cells used by CMAQ and temporalizes the emissions into hourly values. In addition, the
pollutants in the inventories (e.g., NOx and VOC) are split into the chemical species needed by CMAQ.
For the purposes of preparing the CMAQ-ready emissions, the broader NEI emissions inventories are
split into emissions modeling "platform" sectors, and biogenic emissions are added along with emissions
from sources other than the NEI, such as the Canadian, Mexican, and offshore inventories. The
significance of an emissions sector for the emissions modeling platform is that it is run through all of the
SMOKE programs, except the final merge, independently from the other sectors. The final merge program,
called Mrggrid, combines the sector-specific gridded, speciated and temporalized emissions to create the
final CMAQ-ready emissions inputs.
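
The final merge step can be pictured as a cell-by-cell sum over sectors. The Python sketch below is only a conceptual illustration under simplifying assumptions (one pollutant species, one hour, a tiny 2 x 2 grid, and made-up values); it is not the Mrggrid program, and the function name merge_sectors is hypothetical.

```python
from typing import Dict, List

Grid = List[List[float]]  # rows x columns of emissions for one species and hour


def merge_sectors(sector_grids: Dict[str, Grid]) -> Grid:
    """Sum sector-specific gridded emissions cell by cell."""
    if not sector_grids:
        raise ValueError("at least one sector grid is required")
    grids = list(sector_grids.values())
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    for name, grid in sector_grids.items():
        # Every sector must already be on the same modeling grid.
        if len(grid) != n_rows or any(len(row) != n_cols for row in grid):
            raise ValueError(f"sector {name!r} does not match the grid shape")
    return [
        [sum(grid[r][c] for grid in grids) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Made-up values for three of the platform sectors.
merged = merge_sectors({
    "ptipm": [[1.0, 0.0], [0.0, 2.0]],
    "onroad": [[0.5, 0.5], [0.5, 0.5]],
    "beis": [[0.0, 1.0], [1.0, 0.0]],
})
print(merged)
```

The shape check reflects the point made above: each sector is processed independently, so the merge is only meaningful once every sector has been gridded, speciated and temporalized onto the same grid.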

Table 3-1 presents the sectors in the emissions modeling platform used to develop 2009 emissions for this
project. The sector abbreviations are provided in italics; these abbreviations are used in the SMOKE
modeling scripts and inventory file names and throughout the remainder of this section. Annual 2009
emission  summaries for the U.S. anthropogenic sectors are shown in Table 3-2 (i.e., biogenic emissions
are excluded).  Table 3-3 provides a summary of emissions for the anthropogenic sectors containing
Canadian, Mexican and offshore sources. State total emissions for each sector are provided in Appendix
A, a workbook entitled "Appendix_A_2009_emissions_totals_by_sector.xlsx".

              Table 3-1. Platform Sectors Used in the Emissions Modeling Process

2009 Platform Sector (Abbrev)  |  2009 NEI Sector
    Description and resolution of the data input to SMOKE

IPM (ptipm)  |  Point
    2009 NEI point source EGUs that can be mapped to the Integrated Planning Model (IPM)
    model. NEI values replaced with year 2009 hourly continuous emission monitoring (CEM) NOx
    and SO2 emissions. Other pollutants are scaled from 2008 NEI using heat input.

Point non-IPM (ptnonipm)  |  Point
    A mix of 2008 NEI point source emissions with some 2009 records where data was provided by
    states and locals and 2006 WRAP oil and gas data; these are emissions not matched to the
    ptipm sector, annual resolution. Includes all aircraft emissions.

Point source fire (ptfire)  |  Fires
    Point source day-specific wildfires and prescribed fires for 2009.

Agricultural (ag)  |  Nonpoint
    2008 NEI nonpoint NH3 emissions from livestock and fertilizer application; county and
    annual resolution with some 2007 monthly resolution data provided by the Midwest.

Area fugitive dust (afdust)  |  Nonpoint
    2008 NEI nonpoint PM10 and PM2.5 from fugitive dust sources (e.g., building construction,
    road construction, paved roads, unpaved roads, agricultural dust), county and annual
    resolution. A land use-based transport fraction and 2009-based precipitation zero-out is
    applied.

Remaining nonpoint (nonpt)  |  Nonpoint
    Primarily 2008 NEI nonpoint for sources not included in other sectors mixed with 2006 WRAP
    oil and gas data, county and annual resolution.

Nonroad (nonroad)  |  Nonroad
    Year 2009 monthly nonroad emissions from the National Mobile Inventory Model (NMIM) plus
    California-provided data; county and annual resolution.

C1 and C2 marine and locomotive (c1c2rail)  |  Nonroad
    Year 2008 non-rail maintenance locomotives, and category 1 and category 2 commercial
    marine vessel (CMV) emissions sources; county and annual resolution; year 2009 for
    California.

C3 commercial marine (c3marine)  |  Nonroad
    Non-NEI, year 2009 category 3 (C3) CMV emissions projected from year 2002. Developed for
    the rule called "Control of Emissions from New Marine Compression-Ignition Engines at or
    Above 30 Liters per Cylinder", usually described as the Emissions Control Area-
    International Maritime Organization (ECA-IMO) study:
    http://www.epa.gov/otaq/oceanvessels.htm. (EPA-420-F-10-041, August 2010). Annual
    resolution and treated as point sources.

Onroad (onroad)  |  Onroad
    Year 2009 gridded hourly emissions from onroad mobile gasoline and diesel vehicles from
    parking lots and moving vehicles including exhaust, evaporative, permeation, and brake and
    tire wear. Generated using MOVES2010b emission factors, 2009 VMT and vehicle population
    data, and 2009 gridded met. data. In California, adjusted to match CA-provided emissions.

Onroad Refueling (onroad_rfl)  |  Onroad
    Year 2009 gridded hourly emissions from onroad mobile gasoline and diesel vehicles from
    parking lots and moving vehicles for refueling only. Generated using MOVES2010b emission
    factors, 2009 VMT and vehicle population data, and 2009 gridded met. data. Spatially
    allocated to gasoline station locations.

Biogenic (beis)  |  Biogenic
    Hour- and grid cell-specific emissions for 2009 generated from the BEIS 3.14 model,
    including emissions in Canada and Mexico.

Other point sources (othpt)  |  N/A
    Point sources not from the NEI, including Canada's 2006 inventory and a 2008 projection of
    Mexico's Phase III 1999 inventory; annual resolution. Also includes 2008 offshore oil
    point source emissions for the U.S. from the 2008 NEI.

Other nonpoint and nonroad (othar)  |  N/A
    Nonpoint and nonroad sources not from the NEI, including annual 2006 Canada sources at
    province resolution and a 2008 projection of annual 1999 Mexico sources at municipio
    resolution.

Other onroad sources (othon)  |  N/A
    Onroad sources not from the NEI, including annual 2006 Canada sources at province
    resolution and a 2008 projection of 1999 Mexico sources at municipio resolution.

   Table 3-2. 2009 Continental United States Emissions by Sector (tons/yr in 48 states + D.C.)

Sector                    CO         NH3         NOx        PM10       PM2.5         SO2         VOC
afdust                                                 5,823,635     816,524
c1c2rail             217,984               1,329,661      43,528      40,733      48,487      60,809
nonpt              4,336,565               1,230,624     767,225     676,243     402,633   6,456,455
nonroad           15,053,215       1,985   1,784,297     174,562     165,768      32,169   2,249,982
onroad w/rfl      27,221,698     127,354   6,165,415     302,003     222,002      34,973   2,736,569
ptfire            12,378,697     203,630     192,773   1,280,587   1,085,244     100,324   2,927,182
ptipm                676,123      24,015   2,046,085     298,162     205,675   5,965,968      32,955
ptnonipm                          70,131   1,905,593     512,816     360,946   1,329,436   1,065,623
c3marine                                     160,083      14,515      13,326     121,120       5,725
Con. US Total                  4,178,420  14,814,530   9,217,034   3,586,460   8,035,109  15,535,300

   Table 3-3. 2009 Non-US Emissions by Sector within Modeling Domain (tons/yr for Canada,
                                     Mexico, Offshore)

Country & Sector            CO         NH3         NOx        PM10       PM2.5         SO2         VOC
Canada othar         3,747,303     537,912     718,757   1,421,686     393,642      97,709   1,267,472
Canada othon         4,513,915      21,810     537,704      15,004      10,634       5,430     277,874
Canada othpt         1,148,101      21,138     861,256     117,254      68,115   1,762,345     425,792
Canada Subtotal      9,409,320     580,860   2,117,717   1,553,944     472,390   1,865,484   1,971,138
Mexico othar           477,908     132,913     198,972      88,319      56,809      56,417     510,955
Mexico othon           659,536       2,971      93,839       7,935       7,348       5,738      96,218
Mexico othpt           101,309           0     344,896     122,654      90,304     740,238      78,465
Mexico Subtotal      1,238,753     135,884     637,708     218,908     154,460     802,393     685,639
Offshore othpt          82,133           0      74,277         780         769       1,021      60,756
Canada c3marine         13,394           0     160,983      13,434      12,311      99,644       5,690
Offshore c3marine       80,212           0     961,146      80,549      74,063     599,679      34,079
2009 TOTAL          10,823,812     716,744   3,951,831   1,867,615     713,994   3,368,221   2,757,303

3.2.1   Point Sources (ptipm and ptnonipm)
Point sources are sources of emissions for which specific geographic coordinates (e.g., latitude/longitude)
are specified, as in the case of an individual facility. A facility may have multiple emission points, which
may be characterized as units such as boilers, reactors, spray booths, kilns, etc. A unit may have multiple
processes (e.g., a boiler that sometimes burns residual oil and sometimes burns natural gas). The point
sources used for this study include a limited set of emissions data for 2009 collected via the NEI process,
with 2008 NEI data for any sources that did not report in 2009. Note that only large sources are required
to report annually as opposed to triennially. This section describes NEI point sources within the
contiguous United States. The offshore oil (othpt sector), fires (ptfire) and category 3 CMV emissions
(c3marine sector) are point source formatted inventories discussed later in this section. Full
documentation for the development of the 2008 NEI (EPA, 2012) is posted at:
http://www.epa.gov/ttn/chief/net/2008inventory.html#inventorydoc.

After removing offshore oil platforms into the othpt sector, we created two platform sectors from the
remaining point sources for input into SMOKE: the EGU sector - also called the IPM sector (i.e., ptipm)
and the non-EGU sector - also called the non-IPM sector (i.e., ptnonipm).  This split facilitates the use of
different SMOKE temporal processing and future-year projection techniques for each of these sectors.
The inventory pollutants processed through SMOKE for both the ptipm and ptnonipm sectors were: CO,
NOX, VOC, SO2, NH3, PM10, and PM2.5 and the following HAPs: HCl (pollutant code = 7647010),
and Cl (code = 7782505). BAFM from these sectors was not utilized because VOC was speciated
without the use (i.e., integration) of VOC HAP pollutants from the inventory (integration is discussed in
detail in Section 3.3.4).

In the 2009 model evaluation case used in this study, for ptipm sector sources with CEM data that could
be matched to the NEI, 2009 hourly SO2 and NOX emissions were used alongside annual emissions of all
other pollutants. The hourly electric generating unit (EGU) SO2 and NOX emissions and the hourly heat
input were obtained from EPA's Acid Rain Program. The heat input was
used to allocate the annual emissions for other pollutants (e.g., VOC, PM2.5, HCl) to hourly values. For
unmatched EGUs, annual emissions were temporalized to days using multi-year averages and to
hours using state-specific averages.
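
The heat-input allocation described above amounts to distributing an annual total in proportion to each hour's share of heat input. The Python sketch below illustrates that proportional split under stated assumptions: the function name, the four-hour "year", and all numbers are hypothetical, and the actual processing is performed inside SMOKE rather than by code like this.

```python
from typing import List, Sequence


def allocate_by_heat_input(annual_emissions: float,
                           hourly_heat_input: Sequence[float]) -> List[float]:
    """Split an annual total across hours in proportion to hourly heat input."""
    total_heat = sum(hourly_heat_input)
    if total_heat <= 0:
        raise ValueError("total heat input must be positive")
    return [annual_emissions * heat / total_heat for heat in hourly_heat_input]


# Hypothetical unit: 50 tons/yr of a non-CEM pollutant and four "hours" of heat input.
heat_input = [100.0, 300.0, 0.0, 600.0]
hourly_tons = allocate_by_heat_input(50.0, heat_input)
print(hourly_tons)
```

Hours with zero heat input receive zero emissions, and the hourly values sum back to the annual total, so the annual inventory value is preserved.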

The Non-EGU Stationary Point Sources (ptnonipm) emissions were provided to SMOKE as annual
emissions. The emissions were developed as follows:

   a.  2008  CAP and HAP data were provided by States, locals and tribes under the  Consolidated
       Emissions Reporting Rule
   b.  EPA corrected known issues and filled PM data gaps.
   c.  EPA added HAP data from the Toxic Release Inventory (TRI) where it was not provided by
       states/locals.
   d.  EPA provided data for airports and rail  yards.
   e.  Off-shore platform data was added from the Minerals Management Service (MMS).

The changes made to the NEI point sources prior to modeling are as follows:

   •   The tribal data, which do not use state/county Federal Information Processing Standards (FIPS)
       codes in the NEI, but rather use the tribal code, were assigned a state/county FIPS code of
       88XXX, where XXX is the 3-digit tribal code in the NEI. This change was made because SMOKE
       requires the state/county FIPS code.
   •   Stack parameters for some point sources were defaulted when modeling in SMOKE. SMOKE uses
       an ancillary file,  called the PSTK file, which provides default stack parameters by SCC code to
       either gap fill stack parameters if they are missing in the NEI or to correct stack parameters if they
       are outside the ranges specified.
   •   Replaced stack parameters with values  from the 2008 NEI where 2008 values were determined to
       be more realistic.
   •   Replaced facility emissions with 2008 NEI values where the 2009 NEI contained questionable
       values.
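
The tribal code reassignment in the first item of the list above can be sketched as a simple string mapping. The function name and the sample code "042" are hypothetical; the sketch assumes the tribal code is stored as a 3-character digit string, which the source does not state explicitly.

```python
def tribal_fips(tribal_code: str) -> str:
    """Build the 5-digit 88XXX state/county code from a 3-digit tribal code."""
    code = tribal_code.strip()
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected a 3-digit tribal code, got {tribal_code!r}")
    return "88" + code


print(tribal_fips("042"))
```

Keeping the result as a string preserves any leading zeros in the tribal code.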


3.2.1.1 IPM Sector (ptipm)
The ptipm sector contains emissions from EGUs in the 2009 NEI point inventory that could be matched to
the units found in the NEEDS database, version 4.10 (http://www.epa.gov/airmarkets/progsregs/epa-ipm/
index.html). IPM provides future year emission inventories for the universe of EGUs contained in the
NEEDS database. As described below, matching with NEEDS was done (1) to provide consistency
between the 2009 EGU sources and future year EGU emissions for sources which are forecasted by IPM,
and (2) to avoid double counting when projecting point source emissions.

The 2009 NEI point source inventory contains  emissions estimates for both EGU and non-EGU sources.
When future years are modeled, IPM is used to predict the future year emissions for the EGU sources. The
remaining non-EGU point sources are projected by applying projection and control factors to the base
year emissions. It was therefore necessary to separate the point sources into two sectors: (1) sources that
are projected via IPM (i.e., the "ptipm" sector) and (2) sources that are not (i.e., the "ptnonipm" sector).
The two sectors are modeled separately in the base year as well as the future years.

A primary reason the ptipm sources were separated from the other point sources is the difference
in the temporal resolution of the data input to SMOKE. The ptipm sector uses the available hourly CEM
data via a method first implemented in the 2002 platform and still used for the 2009 platform. Hourly
CEM data for 2009 were obtained from the CAMD Data and Maps website3. For sources and pollutants
with CEM data, the actual year 2009 hourly CEM data were used. The SMOKE modeling system matches
the ORIS Facility and Boiler IDs in the NEI SMOKE-ready file to the same fields in the CEM data,
thereby allowing the hourly SO2 and NOX CEM emissions to be read directly from the CEM data file. The
heat input from the hourly CEM data was used to allocate the NEI annual values to hourly values for all
other pollutants from CEM sources, because CEMs are not used to measure emissions of these
pollutants.

For this project, the point source inventory was reviewed to determine whether additional matches needed
to be made.  Newly identified matches for CEM and NEEDS IDs were loaded into the Emissions
Inventory System (EIS) so they could then be written into the modeling files. Some matches were made
outside of EIS when IDs were not mapped one to one between the systems.

Emissions were scaled from 2008 levels to 2009 levels based on CEM data where
possible. For sources not matching the CEM data ("non-CEM" sources), daily emissions were computed
from the NEI annual emissions using a structured query language (SQL) program and state-average CEM
data. To allocate annual emissions to each month, state-specific, three-year averages of 2008-2010 CEM
data were created. These average annual-to-month factors were assigned to non-CEM sources by state.
To allocate the monthly emissions to each day, the 2009 CEM data were used to compute state-specific
month-to-day factors, which were then averaged across all units in each state. The resulting daily
emissions were input into SMOKE. The daily-to-hourly allocation was performed in SMOKE using
diurnal profiles. The development of these diurnal ptipm-specific profiles, considered ancillary data for
SMOKE, is described in a later section.
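
The two-stage allocation for non-CEM sources (annual to month, then month to day) can be sketched as below. This is an illustrative Python sketch, not the SQL program referred to above: the function names, the toy two-month calendar, and every factor value are assumptions, and the factors are taken to be fractions that sum to one.

```python
from typing import Dict, List, Tuple


def average_profiles(unit_profiles: List[List[float]]) -> List[float]:
    """Average unit-level month-to-day fractions element-wise across units."""
    n_units = len(unit_profiles)
    n_days = len(unit_profiles[0])
    return [sum(p[d] for p in unit_profiles) / n_units for d in range(n_days)]


def annual_to_daily(annual_tons: float,
                    month_factors: Dict[int, float],
                    day_factors: Dict[int, List[float]]) -> Dict[Tuple[int, int], float]:
    """Apply annual-to-month, then month-to-day factors to an annual total."""
    daily: Dict[Tuple[int, int], float] = {}
    for month, month_share in month_factors.items():
        monthly_tons = annual_tons * month_share
        for day, day_share in enumerate(day_factors[month], start=1):
            daily[(month, day)] = monthly_tons * day_share
    return daily


# Toy calendar: two "months" of 2 and 3 days, with made-up state-level factors.
month_factors = {1: 0.25, 2: 0.75}
day_factors = {
    1: average_profiles([[0.4, 0.6], [0.6, 0.4]]),  # two hypothetical CEM units
    2: [0.25, 0.25, 0.5],
}
daily = annual_to_daily(400.0, month_factors, day_factors)
print(daily)
```

Because each set of factors sums to one, the daily values add back up to the annual NEI total.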

3.2.1.2 Non-IPM Sector (ptnonipm)
The non-IPM (ptnonipm) sector contains  all NEI point sources not included in the IPM (ptipm) sector
except for the offshore oil and day-specific fire emissions. For the most part, the ptnonipm sector reflects
the  non-EGU component of the NEI point inventory; however, as previously discussed, it is likely that
some small low-emitting EGUs that are not reflected in the CEMs database are present in the ptnonipm
sector. The ptnonipm sector contains a small amount of fugitive dust PM emissions from vehicular traffic
on paved or unpaved roads at industrial facilities or coal handling at coal mines. In previous versions of
the  platform, we would reduce these emissions prior to input to SMOKE.  However, in this platform the
reduction is not made because of a new methodology used to reduce PM dust.

For some geographic areas, some of the sources in the ptnonipm sector belong to source categories that
are  contained in other sectors. This occurs in the inventory when states, tribes or local programs report
certain inventory emissions as point sources because they have specific geographic coordinates for these
sources. They may use point source SCCs (8-digit) or they may use nonpoint, onroad or nonroad
(10-digit) SCCs. In the 2008 NEI, examples of these types of sources include: aircraft and ground support
emissions, livestock (i.e., cattle feedlots) in California, and rail yards.

Some adjustments were made to the point inventory prior to its use in modeling.  These include:

    •  Removed sources with state county codes ending in '777'. These are used for 'portable' point
       sources like asphalt plants.
    •  Removed sources with SCCs not typically used for modeling.
    •  Adjusted latitude-longitude coordinates for sources identified to be substantially outside the
       county in which they reside.
    •  Removed all offshore oil records as reflected by FIPS=85000 because these sources are processed
      in the othpt sector.
    •  Added 2008 ethanol facilities provided by EPA's OTAQ that were not already included in the
      2008 NEI.
    •  Corrected stack parameters for some units with missing or invalid parameter assignments.
    •  Added South Dakota emissions because they did not submit to the 2008 NEI.
    •   Added MeadWestVaco facility in Covington, VA because it was missing in the 2008 NEI.
    •   Added oil and gas emissions that were not otherwise included in the NEI from the Western
       Regional Air Partnership (WRAP) RPO created year 2006 "Phase III" oil and gas inventory
       project.
    •   Removed onroad refueling emissions that some states included in the point sector because these
       are modeled nationwide using MOVES2010b.

3.2.2   Nonpoint Sources (afdust, ag, nonpt)
The nonpoint emissions sources used  in this study are primarily from the 2008 NEI. Documentation for
the 2008 NEI is available at http://www.epa.gov/ttn/chief/net/2008inventory.html#inventorydoc. Prior to
modeling, the nonpoint portion of the 2008 NEI was divided into the following sectors for which the data
is processed in consistent ways: area fugitive dust (afdust), agricultural ammonia (ag), and the other
nonpoint sources (nonpt). This section describes stationary nonpoint sources only. Class 1 & Class 2
(c1c2) and Class 3 (c3) commercial marine vessels and locomotives are also in the 2008 NEI nonpoint
data category, but these sources are included in the mobile source portion of this documentation.
Nonpoint tribal-submitted emissions were removed to prevent possible double counting with county-level
emissions. Because the tribal nonpoint emissions are small, these  omissions should not impact results at
the 12-km scale used for modeling. This omission also eliminated the need to develop costly spatial
surrogate data to allocate tribal data to grid cells during the SMOKE processing.  Some specific types of
nonpoint sources were not included in the modeling for one of the following reasons: 1) the sources
are only reported by a few states or agencies, 2) the sources are 'atypical' and small, and/or 3) there are
other data available that appear to be more accurate. Additional details on nonpoint source processing
can be found in the Version 5, 2007 Emissions Modeling Platform documentation discussed earlier.

In the rest of this section, each of the platform sectors into which the 2008 nonpoint NEI was divided is
described, along with any changes made to these data.

3.2.2.1 Area Fugitive Dust Sector (afdust)
The area-source fugitive dust (afdust)  sector contains PM emission estimates for 2008 NEI nonpoint
SCCs identified by EPA staff as fugitive dust sources. Categories included in this sector are paved roads,
unpaved roads and airstrips, construction (residential, industrial, road and total), agriculture production
and all of the mining 10-digit SCCs beginning with the digits "2325." It does not include fugitive dust
from grain elevators because these are elevated point sources.

This sector is separated from other nonpoint sectors to allow for the application of "transport fraction,"
and meteorology/precipitation ("MET") reductions. These adjustments are applied via sector-specific
scripts and make use of land use-based gridded transport fractions. The land use data used to reduce the
NEI emissions explains the amount of emissions that are subject to transport. This methodology is
discussed in (Pouliot et al., 2010),
http://www.epa.gov/ttn/chief/conference/ei19/session9/pouliot_pres.pdf, and in Fugitive Dust Modeling
for the 2008 Emissions Modeling Platform (Adelman, 2012). The precipitation adjustment is then
applied to remove all emissions for days on which measurable rain occurs or there is snow on the
ground. Both the transport fraction and MET adjustments are based on the gridded meteorological  data;
therefore, different emissions could result from different grid resolutions. Application of the transport
fraction and MET adjustments reduces the overestimation of fugitive dust impacts in the grid modeling as
compared to ambient samples.
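
The two afdust adjustments can be sketched per grid cell and day as a multiplication by the transport fraction followed by a zero-out on wet or snow-covered days. The Python sketch below is illustrative only: the function name, the three sample cells, and all values are hypothetical, and it assumes the transport fraction is the share of emissions retained (a number between 0 and 1).

```python
from typing import List


def adjust_fugitive_dust(emissions: float,
                         transport_fraction: float,
                         rain_or_snow_cover: bool) -> float:
    """Apply the transport fraction, then zero out wet or snow-covered days."""
    if not 0.0 <= transport_fraction <= 1.0:
        raise ValueError("transport fraction must be between 0 and 1")
    if rain_or_snow_cover:
        return 0.0  # precipitation adjustment removes all emissions for the day
    return emissions * transport_fraction


# Hypothetical grid cells for one day: unadjusted PM, transport fraction, wet flag.
cell_emissions = [10.0, 4.0, 8.0]
cell_fractions = [0.5, 0.25, 1.0]
cell_wet = [False, True, False]

adjusted: List[float] = [
    adjust_fugitive_dust(e, f, w)
    for e, f, w in zip(cell_emissions, cell_fractions, cell_wet)
]
print(adjusted)
```

Since both inputs come from gridded data, rerunning the same inventory on a different grid resolution would generally change the adjusted totals, as noted above.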

3.2.2.2 Agricultural Ammonia Sector (ag)
The agricultural NH3 "ag" sector consists of livestock and agricultural fertilizer application emissions
from the nonpoint sector of the 2008 NEI. The livestock and fertilizer emissions were extracted based on
SCC. The "ag" sector includes all of the NH3 emissions from fertilizer contained in the NEI. However,
the "ag" sector does not include all of the livestock ammonia emissions, as there are also some NH3
emissions from feedlot livestock in the point source inventory. To prevent double-counting, emissions
were not included in the nonpoint ag inventory for counties in which they were in the point source
inventory. A significant error in the 2008 NEI was corrected in the modeling platform ag sector. A
fertilizer application source "N-P-K (multi-grade nutrient fertilizers)" (SCC=2801700010) in Luna
county New Mexico (FIPS=35025), was 6,953 tons of NH3 in the 2008 NEI. This source was corrected
by a factor of 1,000 to be 6.953 tons in the modeling platform.

Monthly NH3 emissions provided by the Lake Michigan Air Directors Consortium were used to replace
NEI ag sector emissions in that region due to the improved temporal resolution. 2008 NEI (annual) ag
sector emissions were used in all other states.  A new temporal allocation methodology for animal  NH3
was implemented for this modeling platform that allocates monthly emissions down to the hourly level by
taking into account temperature and wind speed. This method is discussed in more detail in the emission
modeling portion of this chapter.

3.2.2.3 Other Nonpoint Sources (nonpt)
Stationary nonpoint sources that were not subdivided into the afdust or ag sectors were assigned to
the "nonpt" sector. In preparing the nonpt sector, catastrophic releases were excluded. These
emissions were dominated by tire burning, an episodic, location-specific emissions category that
accounts for significant emissions of particulate matter in some parts of the country. Such
sources are reported by a very small number of states and are inventoried as county/annual totals without
the information needed to temporally and spatially allocate the emissions to the time and location where
the event occurred. All fire emissions, including agricultural,
wildfire, and prescribed burning, were removed and substituted with SMARTFIRE emissions (see the
"ptfire" sector). Locomotives and CMV mobile sources from the 2008 NEI nonpoint inventory are
described in the mobile sources section.

The nonpt sector includes emission estimates for Portable Fuel Containers (PFCs), also known as "gas
cans." The PFC inventory consists of five distinct sources of PFC emissions, further distinguished by
residential or commercial use. The five sources are:  (1) displacement of the vapor within the can; (2)
spillage of gasoline while filling the can; (3) spillage of gasoline during transport; (4) emissions due to
evaporation (i.e., diurnal emissions); and (5) emissions due to permeation. Note that spillage and vapor
displacement associated with using PFCs to refuel nonroad equipment are included in the nonroad
inventory.

Some adjustments to the 2008 NEI nonpoint data were made using data from regional planning
organizations (RPOs) as follows:

   •   Replaced 2008 NEI oil and gas emissions (SCCs beginning with "23100") with year 2006 Phase
       III oil and gas emissions for several basins in the WRAP RPO states. These WRAP Phase  III
       emissions, which contain point and nonpoint formatted data, are discussed in greater detail at:
       http://www.wrapair2.org/PhaseIII.aspx.  These changes were made only in counties for which
       there was WRAP data.
   •   Replaced 2008 NEI nonpoint agriculture burning emissions with year 2008 SMARTFIRE day-
       specific county-based emissions aggregated to monthly totals.
   •   Replaced open burning "land clearing" (SCC=2610000500) emissions in Florida and Georgia
       with  SESARM-provided daily point data, but aggregated to county and monthly resolution.
   •   Replaced open burning data (SCCs beginning with 261000x) in MARAMA states with RPO-provided
       data.
   •   Removed industrial coal combustion emissions (SCC=2102002000) in Tennessee.
   •   Replaced, removed and modified much of the residential wood combustion (RWC) emissions in
       the MARAMA, MWRPO and SESARM states with RPO data and non-RPO corrections, and
       modified the outdoor hydronic heater (OHH) emissions in all states and indoor furnaces in
       MWRPO states.
   •   Removed EPA-estimated commercial  cooking (SCCs 2302002100 and 2302002200) duplicate
       PM emissions in California.
   •   Removed duplicate "Industrial Processes; Food and Kindred Products; Total" source
       (SCC=2302000000) in Maricopa County, Arizona (FIPS=04013).

The oil and gas changes were already discussed in the ptnonipm section.  Other significant changes are
discussed below.

Ag burning
2008 NEI agricultural burning estimates were replaced with more specific data from the Fire
Characteristic Classification System (FCCS) module fuel loadings map in the BlueSky Framework
(http://blueskyframework.org/modules/fuel-loading/fccs).  Year 2008-specific fire locations from
SMARTFIRE version 1 (Sullivan, et al., 2008) were read into the FCCS module and intersected with the
FCCS fuel-loading dataset. The module assigned an FCCS code to each fire record that reflects the
ecosystem geography and potential natural vegetation based on remote sensing data. Prescribed or
unclassified fires having an FCCS code equal to zero (0) were assumed to be agricultural fires. ArcGIS
was used to categorize the fires as occurring on rangeland, cropland or other land use via the USGS 2006
National Land Cover Database (NLCD). Activity data were analyzed to restrict to cropland fires and
assign state and crop-specific emission factors. Emissions were then appropriately weighted based on
known statistics about each state's crop mix.

These SMARTFIRE-based ag burning emissions were provided at 1-km point-source and day-specific
resolution. State-county FIPS codes were assigned using GIS. The emissions were aggregated to county
and monthly resolution and converted to SMOKE nonpoint FF10 format. This SMARTFIRE-based ag
burning dataset includes emissions for all but these 7 of the lower 48 states: CT, DC, MA, ME, NH, RI
and VT. These 7 states did not contain any cropland burning estimates for year 2008 based on this
SMARTFIRE approach.
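
The aggregation from day-specific point records to county and monthly totals can be sketched as follows. This is a minimal illustration, not the processing code used for the platform; the function name, record layout, and emission values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_to_county_month(daily_records):
    """Sum day-specific emissions into (county FIPS, month) totals."""
    totals = defaultdict(float)
    for fips, burn_date, tons in daily_records:
        totals[(fips, burn_date.month)] += tons
    return dict(totals)


# Hypothetical records: (county FIPS, burn date, emissions in tons).
records = [
    ("35025", date(2008, 3, 4), 1.5),
    ("35025", date(2008, 3, 20), 0.5),
    ("35025", date(2008, 4, 2), 2.0),
    ("04013", date(2008, 3, 9), 1.0),
]
county_month_totals = aggregate_to_county_month(records)
```

Each key in county_month_totals corresponds to one county and month, which is the resolution described for the nonpoint FF10 monthly inventory.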

Open burning RPO data
All 2008 NEI open burning emissions (CAPs  only) were replaced in the MARAMA states with the 2007
MARAMA open burning inventory. These MARAMA open burning emissions include estimates for
household waste (SCC=2610030000), land clearing (2610000500) and yard waste leaf and brush
(2610000100 and 2610000400 respectively).
The 2008 NEI land clearing emissions in Georgia and Florida were replaced with SESARM-based year-
2007 data. The SESARM land clearing emissions are based on daily point emissions from the
CONSUME v3.0 model (SESARM, 2012a). These daily point-format emissions were aggregated to
county and monthly resolution as a separate FF10 nonpoint monthly inventory.

TN coal combustion
Tennessee nonpoint industrial coal combustion (SCC=2102002000) emissions are significantly
overestimated in the 2008 NEI because of incorrect reconciliation with the point source inventory.
Nonpoint industrial coal combustion emissions were estimated by subtracting point source emissions
rather than activity. By not accounting for controlled sources, the remaining activity for nonpoint coal
combustion is significantly overestimated. EPA NEI experts determined that it would be more
appropriate to completely remove the nonpoint component of this sector than to leave the values as they
were. Actual TN nonpoint industrial coal combustion emissions are likely much closer to
zero than the value in the 2008 NEI because these emissions are accounted for in the point source
inventory.

Residential Wood Combustion
There were many modifications to the RWC emissions data. First, all RWC outdoor wood burning
devices such as "fire pits and chimeas" (SCC=2104008700) were removed because they were only
reported in a couple of states, RPO inventories did not include them for most states and emissions were
generally insignificant. A market research report (Frost and Sullivan, 2010) developed  in support of the
potential RWC New Source Performance Standard (NSPS) indicated slower sales of outdoor hydronic
heaters compared to what was assumed for growth estimates in the 2008 NEI. Therefore,  outdoor
hydronic heater appliance counts and emissions estimates (SCC=2104008610) were recomputed for all
states, resulting in a 51% reduction to outdoor hydronic heater emissions for all  states.

In addition, all RWC emissions in the SESARM states (i.e., AL, FL, GA, KY, MS, NC, SC, TN, VA, WV),
including Virginia, were replaced with the SESARM year-2007 inventory (SESARM, 2012b). Urban-area
RWC emissions were lower than the NEI estimates, partly because of assumptions about greater penetration of
natural gas fireplaces, less access to inexpensive wood supplies and a lower proportion of housing units
with wood burning appliances as primary heating units than in rural areas. Overall, the SESARM RWC
estimates are considerably lower than the 2008 NEI estimates for several states, particularly for
"uncertified" and "general" wood stoves and insert categories: FL, KY, NC, TN, VA and WV. However,
emissions in Mississippi are only slightly reduced and emissions in AL, GA and SC are very similar to
those in the 2008NEIv2.

The Midwest RPO (LADCO) states (i.e., IL, IN, MI, OH, WI, MN) year-2007 RWC inventory was
similar to the 2008 NEI for most source types.  However, the pellet stoves (SCC=2104008400), indoor
furnaces (2104008510), and outdoor hydronic heater (OHH, SCC=2104008610) estimates were updated
to reallocate the indoor furnaces and OHHs to non-MSA counties (LADCO, 2012) for  several urban
areas. Some double counting of appliances was also fixed in Wisconsin and Michigan. Overall, the
MWRPO state totals are very similar to the 2008 NEI; however, emissions are spatially redistributed
from urban to rural areas. Therefore, for the MWRPO states, the 2008 NEI emissions were used for all
RWC sources except the three aforementioned SCCs that use the 2007 MWRPO data.

Emissions from indoor wood-fired furnaces (SCC=2104008510) in several MWRPO states were
also recomputed based on newer, improved survey data from Minnesota. The 2008 NEI for these sources
started with an assumption from year 2002 Minnesota wood burning survey data of 38 indoor furnaces per
100 wood stoves for Illinois, Indiana, Michigan, Ohio, and Wisconsin. More recent year 2007 MN survey
data resulted in the much lower ratio of 7.3 indoor furnaces per 100 wood stove units. Thus, for the other
five MWRPO states previously listed, the indoor furnace emissions are normalized by setting the ratio of
indoor furnaces to wood stoves to match the 7.6% reported value in Minnesota. The resulting
adjustment factors reduce the indoor furnace emissions in these states by 67% (Wisconsin) to as much as
83% in Ohio.

The MARAMA states (i.e., CT, DE, DC, ME, MD, MA, NH, NJ, NY, PA, RI, VT) year 2007 RWC
inventory was either unchanged from the 2008 NEI, or was missing for most states. The exceptions were
New York and Pennsylvania, which include significantly revised RWC estimates compared to the 2008
NEI. For New York, the MARAMA estimates were not split out into the refined set of 10 RWC
appliance types/SCCs in the NEI.  New York only reported "general" fireplaces (SCC=2104008100) and
"EPA certified, non-catalytic" woodstoves (SCC=2104008320). However, similar to the SESARM and
MWRPO improvements, the MARAMA NY RWC estimates were spatially reallocated from urban to
more rural areas and were also lower state-wide than the NEI. For Pennsylvania, MARAMA RWC
estimates were not much different state-wide on the aggregate, but were refined by SCC and spatially
compared to the 2008 NEI. Therefore, the MARAMA 2007 RWC data is used for New York and
Pennsylvania and the 2008 NEI emissions are used for all RWC sources in the rest of the MARAMA
states.

The uniform temporalization from month to day was modified to be day-of-year specific as discussed in
more detail in the emissions modeling section. In short, the SMOKE program (GenTPRO) is used to
distribute annual RWC emissions to the coldest days of the year, using maximum temperature thresholds
by state and/or by county. On days where the low temperature does not drop below this threshold, RWC
emissions are zero. Conversely, the program allocates proportionally more emissions to the
coldest days. This meteorological-based temporal allocation can have a substantial impact on the amount
of RWC emissions in an area on any given day.
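
A minimal sketch of this kind of temperature-threshold allocation is shown below. It is not the GenTPRO algorithm: the linear weighting by degrees below the threshold, the 50 degree F threshold, and the function name are assumptions made only for illustration.

```python
def allocate_rwc_by_temperature(annual_tons, daily_min_temps_f, threshold_f=50.0):
    """Spread an annual total over days, weighting colder days more heavily.

    Assumed weighting (not from the source): the number of degrees by which
    the daily minimum temperature falls below the threshold, zero otherwise.
    """
    weights = [max(threshold_f - t, 0.0) for t in daily_min_temps_f]
    total_weight = sum(weights)
    if total_weight == 0.0:
        # No day is cold enough, so nothing is allocated in this sketch.
        return [0.0 for _ in daily_min_temps_f]
    return [annual_tons * w / total_weight for w in weights]


# Hypothetical four-day "year" of daily minimum temperatures in degrees F.
daily_tons = allocate_rwc_by_temperature(100.0, [20.0, 40.0, 60.0, 50.0])
```

In this example the two days at or above the threshold receive zero emissions, and the coldest day receives the largest share of the annual total.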

3.2.4  Day-Specific Point Source Fires (ptfire)
Wildfire and prescribed burning emissions are contained in the ptfire sector. The ptfire sector has
emissions provided at geographic coordinates (point locations) and has daily estimates of the emissions
from each fire. The ptfire sector for the 2009 Platform excludes agricultural burning and other open
burning sources, which are included in the nonpt sector. The agricultural burning and other open burning
sources are in the nonpt sector because these categories were not factored into the development of the
ptfire sector.  Additionally, their year-to-year impacts are not as variable as wildfires and non-agricultural
prescribed/managed burns.

The ptfire sector includes a satellite derived latitude/longitude of the fire's origin and other parameters
associated with the emissions such as acres-burned and fuel load, which allow estimation of plume rise.
Note that agricultural burning is not included in the ptfire sector but is included in the nonpt sector. The
point source day-specific emission estimates for 2009 fires rely on the Satellite Mapping Automated
Reanalysis Tool for Fire Incident Reconciliation Version 2 (SMARTFIRE2) system (Raffuse, et al.,
2012). Activity data was used from the Monitoring Trends in Burn Severity (MTBS) project, Incident
Command Summary Reports (ICS-209), and the National Oceanic and Atmospheric Administration's
(NOAA's) Hazard Mapping System (HMS).

The method involves the reconciliation of ICS-209 reports (Incident Status Summary Reports) with
satellite-based fire detections to determine spatial and temporal information about the fires. The ICS-209
reports for each large wildfire are created daily to enable fire incident commanders to track the status and
resources assigned to each large fire (100 acre timber fire or 300 acre rangeland fire).  The SMARTFIRE
system of reconciliation with ICS-209 reports  is described in an Air and Waste Management Association
report (Raffuse, et al., 2007). Once the fire reconciliation process is completed, the emissions are
calculated using the U.S. Forest Service's CONSUME v3.0 fuel consumption model and the FCCS fuel-
loading database in the BlueSky Framework (Ottmar, et al., 2007). The detection of fires with this
method is satellite-based. Additional sources of information used in the fire classification process
included MODIS satellite and fuel moistures derived from fire weather observational  data.

Note that the distinction between wildfire and prescribed burn is not as precise as with ground-based
methods. The fire size was based on the number of satellite pixels, and a nominal fire size of
100 acres/pixel was assumed for a significant number of fire detections when the fire detections
were not matched to ICS-209 reports, so the fire size information is also less precise than that from
ground-based methods.

The activity data and other information were used within the BlueSky Framework to model vegetation
distribution, fuel consumption, and emission rates. Latitude and longitude locations were
incorporated as a post-processing step. The method to classify fires as WF, WFU, RX (FCCS > 0), and
unclassified (FCCS > 0) involves the reconciliation of ICS-209 reports (Incident Status Summary
Reports) with satellite-based fire detections to determine spatial and temporal information about the fires.

Because the HMS satellite product from NOAA is based on daily detections, the emission inventory
represents a time-integrated emission estimate. For example, a large smoldering fire will show up on
satellite for many days and would count as acres burned on a  daily basis; whereas a ground-based method
would count the area burned only once even if it burns over many days.

The SMOKE-ready "ORL" inventory files created from the raw daily fires contain both CAPs and HAPs.
The BAFM HAP emissions from the inventory were obtained using VOC speciation profiles (i.e., a "no-
integrate noHAP" use case). Additional references
for this method are provided in (McKenzie, et al., 2007), (Ottmar, et al., 2003), (Ottmar, et al., 2006), and
(Anderson et al., 2004).

3.2.5   Biogenic Sources (beis)
For CMAQ, biogenic emissions were computed with the BEIS3.14 model within SMOKE using 2009
meteorological data. The BEIS3.14 model creates gridded, hourly, model-species emissions from vegetation
and soils. It estimates CO, VOC (most notably isoprene, terpene, and sesquiterpene), and NO emissions for the
U.S., Mexico, and Canada. The BEIS3.14 model is described further in:
http://www.cmascenter.org/conference/2008/slides/pouliot_tale_two_cmas08.ppt.

The inputs to BEIS include:
    •   Temperature data at 2 meters from the CMAQ meteorological input files,
    •   Land-use data from the Biogenic Emissions Landuse Database, version 3 (BELD3) that provides
       data on the 230 vegetation classes at 1-km resolution over most of North America.


3.2.6   Mobile Sources (onroad, onroad_rfl, nonroad, c1c2rail, c3marine)
The 2009 onroad emissions are broken out into two sectors: "onroad" and "onroad_rfl". Aircraft
emissions are in the nonEGU point inventory. The locomotive and commercial marine emissions are
divided into two sectors: "c1c2rail" and "c3marine", and the "nonroad" sector contains the remaining
nonroad emissions. Note that the 2008 NEI includes state-submitted emissions data for nonroad, but the
modeling performed for this platform does not incorporate state-submitted emissions for the onroad or
nonroad sectors, except for California. All tribal data from the mobile sectors have been dropped because
we do not have spatial surrogate data, and the emissions are small.

The onroad and onroad_rfl sectors are processed separately to allow for different spatial allocation to be
applied to onroad refueling via a gas station surrogate, versus onroad vehicles that are spatially allocated
based on roads and population. Except for California, all onroad and onroad refueling emissions are
generated using the SMOKE-MOVES emissions modeling framework that leverages MOVES2010b-generated
outputs (http://www.epa.gov/otaq/models/moves/index.htm) and hourly meteorology.
Emissions for onroad (including refueling), nonroad and c1c2rail sources in California were provided by
the California Air Resources Board (CARB).

The nonroad sector is based on NMIM except for California, which uses data provided by the California
Air Resources Board (CARB). NMIM (EPA, 2005) creates the nonroad emissions on a month-specific
basis that accounts for temperature, fuel types, and other variables that vary by month. The 2009 NMIM
nonroad emissions were generated using updated activity (fuels, vehicle population, etc.) data, but are
otherwise similar in methodology to those generated for the 2005 NEI. All nonroad emissions are
compiled at the county/SCC level. Detailed inventory documentation for the 2008 NEI nonroad sectors is
available at http://www.epa.gov/ttn/chief/net/2008inventory.html#inventorydoc. Neither NMIM nor
MOVES generates tribal data.

The locomotive and commercial marine vessel (CMV) emissions are divided into two nonroad sectors:
"c1c2rail" and "c3marine". The c1c2rail sector includes all railway and most rail yard emissions as well
as the gasoline and diesel-fueled Class 1 and Class 2 CMV emissions. The c3marine sector emissions
contain the larger residual-fueled ocean-going vessel Class 3 CMV emissions and are treated as point
emissions with an elevated release component; all other nonroad emissions are treated as county-specific
low-level emissions (i.e., are in model layer 1). The 2008 NEI c3marine emissions were replaced with a
set of approximately 4-km resolution point source format emissions. These data are used for all states,
including California, as well as offshore and international emissions within our air quality modeling
domain, and are modeled separately as point sources in the "c3marine" sector.
3.2.7  Onroad non-refueling (onroad)
For the Version 5 modeling platform, EPA estimated emissions for every county in the continental U.S.
except for California using similar methods as for the 2008 NEI Versions 2 and 3. The modeling
framework took into account the strong temperature sensitivity of the onroad emissions. Specifically,
county-specific inputs and tools were used that integrated the MOVES model with the SMOKE emission
inventory model to take advantage of the gridded hourly temperature information available from
meteorology modeling used for air quality modeling. This integrated "SMOKE-MOVES" tool was
developed by EPA in 2010 and is in use by states and regional planning organizations for regional air
quality modeling.  SMOKE-MOVES requires emission rate "lookup" tables generated by MOVES that
differentiate emissions by process (running, start, vapor venting, etc.), vehicle type, road type,
temperature, speed, hour of day, etc.

To generate the MOVES emission rates that could be applied across the U.S., EPA used an automated
process to run MOVES to produce emission factors by temperature and  speed for 146 "representative
counties," to which every other county could be mapped as detailed below. Using the MOVES emission
rates, SMOKE selected appropriate emission rates for each county, hourly temperature, SCC, and speed
bin and multiplied the emission rate by activity (i.e., VMT (vehicle miles travelled) or vehicle
population) to produce emissions. These calculations were done for every county, grid cell, and hour in
the continental United States. SMOKE-MOVES can be used with different versions of the MOVES
model. For the Version 5 modeling platform, EPA used the latest publicly released version:
MOVES2010b (http://www.epa.gov/otaq/models/moves/index.htm). The MOVES default database used
was named movesdb20120410.
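
The rate-times-activity calculation described above can be illustrated with a small lookup sketch. This is not SMOKE-MOVES code: the table layout, the nearest-bin selection (used here in place of whatever binning or interpolation the real tool applies), and all names and values are hypothetical.

```python
def nearest(value, choices):
    """Return the table bin closest to value."""
    return min(choices, key=lambda c: abs(c - value))


def hourly_emissions(ef_table, county_to_rep, county, scc, temp_f, speed_mph, vmt):
    """Look up a rate for the representative county and multiply by activity."""
    rep = county_to_rep[county]
    temps = sorted({k[2] for k in ef_table if k[:2] == (rep, scc)})
    speeds = sorted({k[3] for k in ef_table if k[:2] == (rep, scc)})
    rate = ef_table[(rep, scc, nearest(temp_f, temps), nearest(speed_mph, speeds))]
    return rate * vmt  # grams = (grams per mile) * miles


# Hypothetical rates in grams/mile keyed by (rep county, SCC, temp F, speed mph).
ef_table = {
    ("rep_A", "scc_1", 60, 30): 0.50,
    ("rep_A", "scc_1", 70, 30): 0.40,
    ("rep_A", "scc_1", 60, 50): 0.30,
    ("rep_A", "scc_1", 70, 50): 0.25,
}
county_to_rep = {"county_B": "rep_A"}
grams = hourly_emissions(ef_table, county_to_rep, "county_B", "scc_1", 68.0, 33.0, 1000.0)
```

The mapping from county_B to rep_A mirrors the representative-county approach: emission rates are generated once per representative county and reused for every county mapped to it.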

Using SMOKE-MOVES for creating emissions for modeling requires numerous steps, as described in the
sections below:

    •  Determine which counties will be used to represent other counties in the MOVES runs.
    •  Determine which months will be used to represent other months' fuel characteristics.
    •  Create MOVES inputs needed only for MOVES runs. MOVES requires county-specific
       information on vehicle populations, age distributions, and inspection-maintenance programs for
       each of the representative counties.
    •  Create inputs needed both by MOVES and by SMOKE, including a list of year-specific
       temperatures and activity data.
    •  Run MOVES to create emission factor tables using year-specific fuel information.
    •  Run SMOKE to apply the emission factors to activities to calculate emissions.
    •  Aggregate the results at the county-SCC level for summaries and quality assurance.


Some data used in the SMOKE-MOVES process are year-specific. When MOVES was run to generate the
emission factors, gasoline and diesel properties for the representative counties were based on 2009 fuel
information (i.e., RegionalFuels_2009_20120323). The temperature and humidity inputs were also based
on 2009 values. The VMT used by SMOKE-MOVES was generated by taking 2009 VMT by state and
freeway/non-freeway from FHWA VM-2 tables and allocating to county and month and roadtype using
the 2008 NEI VMT. The VMT was allocated to vehicle type using FHWA's VM-4 table and to MOVES
sourcetype using ratios from MOVES. Vehicle populations were then generated by applying
VMT/vehicle default ratios from MOVES to the VMT.  The same speed data used for the 2008 NEI were
also used for this study.
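
The allocation of a 2009 state VMT total to finer resolution using 2008 NEI VMT as the distribution can be sketched as a proportional split. The function name, keys, and values below are hypothetical and serve only to illustrate the idea.

```python
def allocate_vmt(state_vmt_2009, nei_2008_vmt):
    """Split a state total using each cell's share of the 2008 VMT."""
    base_total = sum(nei_2008_vmt.values())
    if base_total <= 0:
        raise ValueError("2008 VMT must be positive to form shares")
    return {key: state_vmt_2009 * v / base_total for key, v in nei_2008_vmt.items()}


# Hypothetical 2008 VMT keyed by (county, month) for one state and road class.
nei_2008 = {("county_1", 1): 200.0, ("county_1", 7): 300.0, ("county_2", 1): 500.0}
vmt_2009 = allocate_vmt(1100.0, nei_2008)
```

The allocated values sum to the 2009 state total while preserving the 2008 spatial and monthly pattern.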

The California emissions were post-processed to incorporate both CARB-supplied inventories and the
shape of the meteorologically-based SMOKE-MOVES results by scaling the SMOKE-MOVES
generated totals to match CARB-provided totals. Because CARB provided 2007 and 2011 emissions data,
the data for 2009 were linearly interpolated between 2007 and 2011 levels. For more details on this
process, see the Version 5 platform documentation.
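
As a rough illustration of the two operations in this paragraph, the sketch below interpolates a 2009 total from 2007 and 2011 values and then rescales a set of modeled values so that their sum matches it. All numbers and function names are hypothetical, and the actual post-processing may differ in its details.

```python
def interpolate_year(year, y0, v0, y1, v1):
    """Linearly interpolate a value for year between (y0, v0) and (y1, v1)."""
    return v0 + (v1 - v0) * (year - y0) / (y1 - y0)


def scale_to_total(values, target_total):
    """Rescale values so they sum to target_total, keeping their relative shape."""
    factor = target_total / sum(values)
    return [v * factor for v in values]


# Hypothetical CARB totals (tons) for 2007 and 2011.
carb_2009 = interpolate_year(2009, 2007, 120.0, 2011, 100.0)

# Hypothetical SMOKE-MOVES results whose temporal/spatial shape is retained.
scaled = scale_to_total([30.0, 50.0, 20.0], carb_2009)
```

Because 2009 lies midway between 2007 and 2011, the interpolated total is the average of the two endpoint values.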

3.2.8   Onroad Refueling (onroad_rfl)
Onroad refueling was modeled very similarly to the other onroad emissions. MOVES2010b was used
to produce emission factors (EFs) for refueling. These EFs are at the resolution of the onroad SCC and
were run separately from the other onroad mobile sources to allow for different spatial allocation. To
facilitate this, the EFs were separated into refueling and non-refueling tables. SMOKE-MOVES was
then run using these EF tables as inputs and the results spatially allocated based on a gas station spatial
surrogate. For California, the SMOKE-MOVES generated emissions were used for onroad refueling
without any adjustments because there were no CARB-supplied refueling emissions.

3.2.9   Nonroad Mobile Sources — NMIM-Based (nonroad)
The nonroad sector includes monthly exhaust, evaporative and refueling emissions from nonroad engines
(not including commercial marine, aircraft, and locomotives) that are derived from NMIM for all states
except California. NMIM 20090504 was run using 2009 meteorological and fuel data to create county-
SCC emissions by month for the 2009 nonroad mobile CAP and HAP sources. This version of NMIM ran
the NR08a version of NONROAD. The nonroad county database was labeled 20101201_2009. The run
incorporated Bond Rule revisions to some of the base case inputs, but the Bond Rule controls did not take
effect until future years. NMIM provides nonroad emissions for VOC by three emission modes: exhaust,
evaporative and refueling. Unlike the onroad sector, refueling emissions from nonroad sources are not
separated into a different sector.

EPA default inputs were replaced by state inputs where such data were provided via the 2008 NEI process.
The 2008 NEI documentation describes this and other details of the NMIM nonroad emissions
development. CAPs and only the necessary HAPs for the nonroad sector (i.e., BAFM, butadiene, and
naphthalene) were included. For this study, NMIM was run separately for each county. To aid with the
processing by SMOKE, the mode was appended to the pollutant name and the California NMIM data were
replaced with state-supplied data.

For California, year 2009 nonroad emissions values were interpolated between the 2007 and 2011
emissions provided by CARB. The CARB-supplied nonroad annual inventory was distributed to monthly emissions
values by using the aforementioned EPA NMIM monthly inventories to compute monthly ratios by
pollutant and SCC. Some adjustments to the CARB inventory were needed to convert the provided total
organic gas (TOG) to the VOC that was needed by SMOKE.
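
One way to picture the annual-to-monthly step is shown below: a reference monthly inventory supplies the month-to-month shape, and the annual total is split in proportion to it. The function name and values are hypothetical, and in practice the ratios are computed separately for each pollutant and SCC.

```python
def annual_to_monthly(annual_total, reference_monthly):
    """Split an annual total using the monthly shares of a reference inventory."""
    ref_total = sum(reference_monthly)
    return [annual_total * m / ref_total for m in reference_monthly]


# Hypothetical reference monthly emissions for one pollutant and SCC
# (six lower months followed by six higher months).
reference = [1.0] * 6 + [3.0] * 6
monthly_tons = annual_to_monthly(240.0, reference)
```

The twelve monthly values sum to the annual total and follow the seasonal pattern of the reference inventory.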

3.2.10 Nonroad Mobile Sources:  Commercial Marine C1, C2, and Locomotive (c1c2rail)
The c1c2rail sector contains CAP and HAP emissions from locomotive and commercial marine sources,
except for the category 3/residual-fuel (C3) commercial marine vessels (CMV) found in the c3marine
sector. The "c1c2" portion of this sector name refers to the Class I/II CMV emissions, not the railway
emissions. Railway maintenance emissions are included in the nonroad sector because these are included
in the nonroad NMIM monthly inventories. The C3 CMV emissions are in the c3marine sector. Except
for California, the emissions in the c1c2rail sector are year 2008 and are composed of the following
SCCs: 2280002100 (CMV diesel, ports), 2280002200 (CMV diesel, underway), 2285002006
(locomotives diesel line haul Class I), 2285002007 (locomotives diesel line haul Class II/III),
2285002008 (locomotives diesel line haul passenger trains), 2285002009 (locomotives diesel line haul
commuter lines), and 2285002010 (locomotives diesel, yard).

The 2008 NEI Version 2 was the starting point for this sector, but several adjustments were made. First,
the 2008 NEI point inventory contains rail yard emissions for several states and counties.  The NEI point
and nonpoint inventories were reviewed for counties with significant rail yard emissions in both
inventories.  It was assumed that the point inventory contained more accurate information when both
inventories contained rail yard emissions. Therefore, nonpoint rail yards were removed from the c1c2rail
sector for certain counties in California, Maryland, Oregon and Arizona. For more information, see the
Version 5 2007 platform documentation.

Analysis of the total rail emissions in the 2008 NEI showed what appeared to be missing rail line
emissions in Texas.  It was determined that line haul emissions from Texas were essentially zero in the
2008 NEI. Therefore, all line haul emissions from the 2008 NEI were removed and information from an
EPA default dataset of Texas line haul emissions was added.  These EPA line haul emissions are
restricted to the Class I and Class II/III operations and add approximately 52,000 tons  of NOX to Texas
that would otherwise be missing.

For several Texas counties, the C1/C2 CMV emissions in the 2008 NEI included EPA gap-filled values
where shape IDs were not populated on submittal. The intended Texas submittal was often much smaller
than the EPA-estimated default value for several counties. An example of this is Harris County
(FIPS=48201) where the Texas submittal was approximately 1,200 tons of NOX for port and underway
emissions but not all shape IDs were included. The NEI methodology used EPA emissions where Texas
did not provide estimates and the resulting double count and overestimate of this top-down method
resulted in over 49,000 tons of NOX in the 2008 NEI in Harris County, Texas. Therefore, the modeling
platform used the original Texas submittal, did not append any EPA emissions, and summed up port and
underway for the modeling files to the county level. Similar corrections to these may  have been included
in Version 3  of the 2008 NEI. Other states were impacted by a similar error in the 2008 NEI Version 2,
but for many of these states alternative data were used as discussed below.

For California, the California Air Resources Board (CARB) provided year 2007 and 2011 emissions for
all mobile sources, including C1/C2 CMV and rail. These emissions are documented  in a staff report
available at: http://www.arb.ca.gov/regact/2010/offroadlsi10/offroadisor.pdf. The modeling platform uses
2009 emissions interpolated between the 2007 and 2011 emissions. The C1/C2 CMV  emissions were
obtained from the CARB nonroad mobile dataset and include the regulations to reduce emissions from
diesel engines on commercial harbor craft operated within California waters and 24 nautical miles of the
California baseline. These emissions were developed using Version 1 of the CEPAM that supports
various California off-road regulations. The locomotive emissions were obtained from the CARB trains
dataset "ARINV_RF#2002_ANNUAL_TRAINS.txt". Documentation of the CARB offroad mobile
methodology, including c1c2rail sector data, is provided here:
http://www.arb.ca.gov/msei/categories.htm#offroad_motor_vehicles. The CARB inventory TOG
emissions were converted to VOC by dividing the inventory TOG by the available VOC-to-TOG
speciation factor.
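
The conversion amounts to a single division, sketched below. The sketch assumes the VOC-to-TOG factor is expressed as TOG mass per unit VOC mass (the value that would multiply VOC to give TOG); the factor of 1.2 and the function name are hypothetical.

```python
def tog_to_voc(tog_tons, voc_to_tog_factor):
    """Convert TOG to VOC by dividing by the VOC-to-TOG factor.

    Assumption: the factor is TOG mass per unit VOC mass.
    """
    if voc_to_tog_factor <= 0:
        raise ValueError("VOC-to-TOG factor must be positive")
    return tog_tons / voc_to_tog_factor


# Hypothetical inventory value and speciation factor.
voc_tons = tog_to_voc(120.0, 1.2)
```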

Year-2007 inventories provided by MARAMA, SESARM and the MWRPO were used for the c1c2rail
sector emissions in their respective states. Emissions data from MARAMA rather than SESARM were
used for Virginia because the SESARM data included some rather large emissions for Commuter Lines
(SCC=2285002009) that were not reflected in the 2008 NEI nor the MARAMA dataset. The MWRPO
year-2007 c1c2rail data were obtained from a subset of their version 7 emissions modeling file
"nrinv.mwrpo_alm.baseCv7.annual.orl.txt", where MWRPO NEI Inventory Format (NIF)-formatted data
were converted to SMOKE ORL format. The MARAMA dataset was obtained from a subset of their
version 3.3 January 27, 2012 vintage file "ARINV_2007_MAR_Jan2012.txt". The SESARM dataset
was obtained from a subset of the file "nrinv.alm.semap.base07.v093010.orl.txt" developed for the
Southeastern Modeling, Analysis, and Planning (SEMAP) project. All RPO datasets were edited to
remove non-c1c2rail sources.

3.2.11  Nonroad mobile sources: C3 commercial marine (c3marine)

The c3marine sector emissions data were developed based on a 4-km resolution ASCII raster format
dataset used since the Emissions Control Area-International Maritime Organization (ECA-IMO) project
began in 2005, then known as the Sulfur Emissions Control Area (SECA). These emissions consist of
large marine diesel engines (at or above 30 liters/cylinder) that, until very recently, were allowed to meet
relatively modest emission requirements, often burning residual fuel. The emissions in this sector come
primarily from foreign-flagged ocean-going vessels, referred to as Category 3 (C3) CMV ships.

The c3marine inventory includes these ships in several intra-port modes (cruising, hoteling, reduced
speed zone, maneuvering, and idling) and underway mode and includes near-port auxiliary engines. An
overview of the C3 ECA Proposal to the International Maritime Organization (EPA-420-F-10-041,
August 2010) project and future-year goals for reduction of NOX, SO2, and PM C3 emissions can be
found at: http://www.epa.gov/oms/regs/nonroad/marine/ci/420r09019.pdf. The resulting ECA-IMO
coordinated strategy, including emission standards under the Clean Air Act for new marine diesel engines
with per-cylinder displacement at or above 30 liters, and the establishment of Emission Control Areas, is
at: http://www.epa.gov/oms/oceanvessels.htm.

The ECA-IMO emissions data were converted to SMOKE point-source ORL input format as described in
http://www.epa.gov/ttn/chief/conference/eil7/session6/mason.pdf, thereby allowing for the emissions to
be allocated to modeling layers above the surface layer. As described in the paper, the ASCII raster
dataset was converted to latitude-longitude, mapped to state/county FIPS codes that extended up to 200
nautical miles (nm) from the coast, assigned stack parameters, and monthly ASCII raster dataset
emissions were used to create monthly temporal profiles. Counties were assigned as extending up to
200nm from the coast because this was the distance to the edge of the U.S. Exclusive Economic Zone
(EEZ), a distance that defines the outer limits of ECA-IMO controls for these vessels. All non-US
emissions (i.e., in waters considered outside of the 200nm EEZ, and hence out of the U.S. territory) are
assigned a dummy state/county FIPS code=98001. The SMOKE-ready data were cropped from the
original ECA-IMO data to cover only the 36-km CMAQ domain, which is the largest domain used for this
effort, and larger than the 12km domain used in this project.

The base year ECA inventory is 2002 and consists of these CAPs: PM10, PM2.5, CO, CO2, NH3, NOX,
SOX (assumed to be SO2), and Hydrocarbons (assumed to be VOC). The EPA developed regional
growth (activity-based) factors that we applied to create the 2007v5 inventory from the 2002 data. These
growth factors are provided in Table 3-4.  The East Coast and Gulf Coast regions were divided along a
line roughly through Key Largo (longitude 80° 26' West).
               Table 3-4. Growth factors to project the 2002 ECA inventory to 2009
Region               EEZ FIPS   NOx     PM10    PM2.5   VOC     CO      SO2
East Coast (EC)      85004      1.284   1.374   1.376   1.374   1.374   1.374
Gulf Coast (GC)      85003      1.137   1.217   1.214   1.216   1.217   1.217
North Pacific (NP)   85001      1.193   1.268   1.250   1.268   1.268   1.268
South Pacific (SP)   85002      1.334   1.429   1.427   1.417   1.415   1.434
Great Lakes (GL)     n/a        1.108   1.137   1.137   1.138   1.137   1.137
Outside ECA          98001      1.252   1.338   1.338   1.338   1.338   1.338

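
As an illustration of how the regional growth factors in Table 3-4 are applied, the sketch below multiplies a 2002 emissions value by the factor for its region and pollutant. Only the NOx and PM2.5 columns are transcribed here to keep the example short; the dictionary layout and function name are hypothetical and not part of the platform's tooling.

```python
# Growth factors transcribed from Table 3-4 (NOx and PM2.5 columns only).
GROWTH_2002_TO_2009 = {
    "EC": {"NOX": 1.284, "PM2_5": 1.376},       # East Coast
    "GC": {"NOX": 1.137, "PM2_5": 1.214},       # Gulf Coast
    "NP": {"NOX": 1.193, "PM2_5": 1.250},       # North Pacific
    "SP": {"NOX": 1.334, "PM2_5": 1.427},       # South Pacific
    "GL": {"NOX": 1.108, "PM2_5": 1.137},       # Great Lakes
    "OUTSIDE": {"NOX": 1.252, "PM2_5": 1.338},  # Outside ECA
}


def project_to_2009(region: str, pollutant: str, emissions_2002: float) -> float:
    """Scale a 2002 emissions value by the regional, pollutant-specific factor."""
    return emissions_2002 * GROWTH_2002_TO_2009[region][pollutant]
```

For example, 1,000 tons of 2002 East Coast NOx would be projected to 1,284 tons.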
A modification to the original ECA-IMO c3marine dataset includes updating the state of Delaware county
total emissions to reflect comments received during the Cross-State Air Pollution Rule (CSAPR)
emissions modeling platform development: http://www.epa.gov/ttn/chief/emch/index.html#fmal. The
original ECA-IMO inventory also did not delineate between ports and underway (or other C3 modes such
as hoteling, maneuvering, reduced-speed zone, and idling) emissions; however, we used a U.S. ports
spatial surrogate dataset to assign the ECA-IMO emissions to ports and underway SCCs - 2280003100
and 2280003200, respectively. This has no effect on temporal allocation or speciation because all C3
emissions, unclassified/total, port and underway, share the same temporal and speciation profiles.

Canadian near-shore emissions were assigned to province-level FIPS codes and paired with region
classifications for British Columbia (North Pacific), Ontario (Great Lakes) and Nova Scotia (East Coast).
The assignment of U.S. FIPS was also restricted to state-federal water boundaries data from the Minerals
Management Service (MMS) that extended only (approximately) 3 to 10 miles offshore. Emissions
outside the 3 to 10 mile MMS boundary but within the approximately 200 nm EEZ boundary in Figure 2-8
were projected to year 2009 using the same regional adjustment factors as the U.S. emissions; however,
the FIPS codes were assigned as "EEZ" FIPS. Note that state boundaries in the Great Lakes are an
exception, extending through the middle of each lake such that all emissions in the Great Lakes are
assigned to a U.S. county or Ontario. The classification of emissions to U.S. and Canadian FIPS codes is
primarily needed only for inventory summaries and is irrelevant for air quality modeling except
potentially for source apportionment of states' contributions to transport.

Factors (based on emissions ratios) were applied to VOC to obtain HAP emissions values. Table 3-5
below shows these factors. Because HAPs were computed directly from the
CAP inventory and the calculations are therefore consistent, the entire c3marine sector utilizes CAP-HAP
VOC integration to use the VOC HAP species directly, rather than VOC speciation profiles.
  Table 3-5. HAP emission ratios for generation of HAP emissions from criteria emissions for C3
                                   commercial marine vessels
                        Pollutant       Apply to   Pollutant Code   Factor
                        Acetaldehyde    VOC        75070            0.0002286
                        Benzene         VOC        71432            9.80E-06
                        Formaldehyde    VOC        50000            0.0015672
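
The ratio-based HAP calculation can be sketched as below, using the factors from Table 3-5. The function and dictionary names are illustrative only.

```python
# (pollutant code, HAP-to-VOC emission ratio) from Table 3-5.
C3_HAP_FACTORS = {
    "Acetaldehyde": ("75070", 0.0002286),
    "Benzene": ("71432", 9.80e-06),
    "Formaldehyde": ("50000", 0.0015672),
}


def c3_haps_from_voc(voc_tons: float) -> dict:
    """Return HAP emissions (tons) computed as VOC multiplied by each ratio."""
    return {name: voc_tons * factor for name, (_code, factor) in C3_HAP_FACTORS.items()}
```

Under these factors, 1,000 tons of C3 VOC corresponds to roughly 1.57 tons of formaldehyde.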
3.2.12 Emissions from Canada, Mexico and Offshore Drilling Platforms (othpt, othar, othon)
The emissions from Canada, Mexico, and offshore drilling platforms are included as part of three
emissions modeling sectors: othpt, othar, and othon. The "oth" refers to the fact that these emissions are
usually "other" than those in the U.S. state-county geographic FIPS code, and the fourth and fifth
characters provide the SMOKE source types: "pt" for point, "ar" for "area and nonroad mobile", and
"on" for onroad mobile. All "oth" emissions are CAP-only inventories.

For Canada, year-2006 Canadian emissions were used but several modifications were applied to the
inventories:

   1.  Wildfires or prescribed burning were not included because Canada does not include these
       inventory data in their modeling.
   2.  In-flight aircraft emissions were not included because we do not include these for the U.S. and we
       do not have a finalized approach to include in our modeling.
   3.  A 75% reduction ("transport fraction") was applied to PM for the road dust, agricultural, and
       construction emissions in the Canadian "afdust" inventory.  This approach is more  simplistic than
       the county-specific approach used for the U.S., but a comparable approach was not available for
       Canada.
   4.  Speciated VOC emissions from the ADOM chemical mechanism were not included because we
       use speciated emissions from the CB05 chemical mechanism that Canada also provided.
   5.  Residual fuel CMV (C3) SCCs (22800030X0) were removed because these  emissions are
       included in the c3marine sector, which covers not only emissions close to Canada but also
       emissions far at sea.  Canada was involved in the inventory development of the c3marine sector
       emissions.
   6.  Wind erosion  (SCC=2730100000) and cigarette smoke (SCC=2810060000) emissions were
       removed from the nonpoint (nonpt) inventory; these emissions are also absent from our U.S.
       inventory.
   7.  Quebec PM2.5 emissions (2,000 tons/yr) were removed for one SCC (2305070000) for Industrial
       Processes, Mineral Processes, Gypsum, and Plaster Products due to corrupt fields after conversion
       to SMOKE input format. This error should be corrected in a future inventory.
   8.  Excessively high CO emissions were removed from Babine Forest Products Ltd (British
       Columbia SMOKE plantid='5188') in the point inventory.
   9.  The county part of the state/county FIPS code field in the SMOKE inputs was modified in the
       point inventory from "000" to "001" to enable matching to existing temporal profiles.
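
Two of the modifications listed above (items 3 and 6) lend themselves to a short sketch. The record layout (a dictionary with "scc", "sector", "pollutant" and "tons" keys) is purely hypothetical and does not reflect the SMOKE input format; only the SCC codes and the 75% reduction come from the text.

```python
REMOVED_SCCS = {"2730100000", "2810060000"}  # wind erosion, cigarette smoke (item 6)
AFDUST_TRANSPORT_FRACTION = 0.25             # a 75% reduction leaves 25% (item 3)


def adjust_canadian_record(record: dict):
    """Return the adjusted record, or None if the record is dropped.

    `record` is a hypothetical dict: {"scc", "sector", "pollutant", "tons"}.
    """
    if record["scc"] in REMOVED_SCCS:
        return None
    adjusted = dict(record)
    if record["sector"] == "afdust" and record["pollutant"].startswith("PM"):
        adjusted["tons"] = record["tons"] * AFDUST_TRANSPORT_FRACTION
    return adjusted
```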

For Mexico, year 2008 emissions  were used that are projections of their 1999 inventory originally
developed by Eastern Research Group Inc., (ERG,  2006) as part of a partnership between Mexico's
Secretariat of the Environment and Natural Resources (Secretaría de Medio Ambiente y Recursos
Naturales-SEMARNAT) and National Institute of Ecology (Instituto Nacional de Ecología-INE), the
U.S.  EPA, the Western Governors' Association (WGA), and the North American Commission for
Environmental Cooperation (CEC).  This inventory includes emissions from all states in Mexico. A
background on the development of year-2008 Mexico emissions from the 1999 inventory is available at:
http://www.wrapair.org/forums/ef/inventories/MNEI/index.html.
The offshore emissions include point source offshore oil and gas drilling platforms.  We used emissions
from the 2008 NEI point source inventory. The offshore sources were provided by the Minerals
Management Service (MMS).

3.2.13  SMOKE-ready non-anthropogenic chlorine inventory
The ocean chlorine gas emission estimates are based on the build-up of molecular chlorine (Cl2)
concentrations in oceanic air masses (Bullock and Brehme, 2002). Data at 36 km and 12 km resolution
were available and were not modified, except that the name "CHLORINE" was changed to "CL2" because
that is the name required by the CMAQ model. The same data were used as in the CAP and HAP
2002-based Platform. See ftp://ftp.epa.gov/EmisInventorv/2002v3CAPHAP/ documentation for
additional details.
3.3    Emissions Modeling Summary

CMAQ requires emissions data to be input as hourly rates of specific gas and particle species for the
horizontal and vertical grid cells contained within the modeled region (i.e., modeling domain).  To
provide emissions in the form and format required by the model, it is necessary to "pre-process" the
"raw" emissions (i.e., emissions input to SMOKE) for the sectors described above. In brief, the process
of emissions modeling transforms the emissions inventories from their original temporal resolution,
pollutant resolution, and spatial resolution into the hourly, speciated, gridded resolution required by the
air quality model.  The pre-processing steps involving temporal allocation, spatial allocation, pollutant
speciation, and vertical allocation of point sources are referred to as emissions modeling.

The temporal resolution of the emissions inventories input to SMOKE for the modeling platform varies
across sectors, and may be hourly, monthly, or annual total emissions.  The spatial resolution, which also
can be different for different sectors, may be at the level of individual point sources, county totals,
province totals for Canada, or municipio totals for Mexico.  This section provides some basic information
about the tools and data files used for emissions modeling as part of the Version 5 platform. The
emissions inventories were discussed in detail earlier. Therefore, we have limited the descriptions of data
in this section to the ancillary data SMOKE uses to perform the emissions modeling steps.

3.3.1   The SMOKE Modeling System
For this study, emission inventories were processed into CMAQ-ready inputs using SMOKE version 3.1.
SMOKE executables and source code are available from the Community Modeling and Analysis System
(CMAS) Center at http://www.cmascenter.org. Additional information about SMOKE is available from
http://www.smoke-model.org. For sectors that have plume rise, the in-line  emissions capability of CMAQ
was used, and therefore source-based emissions files were created rather than the much larger
three-dimensional files. For quality assurance purposes, emissions totals by species for the entire model domain
are output as reports that are then compared to inventory level reports generated by SMOKE to ensure
mass is not lost or gained during this conversion process.
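
A mass-conservation comparison of this kind could look like the sketch below. It is not the reporting code used with SMOKE: the input dictionaries, the function name, and the relative tolerance are all assumptions chosen for illustration.

```python
def mass_balance_report(inventory_totals: dict, model_ready_totals: dict,
                        rel_tol: float = 1e-3) -> dict:
    """Compare domain totals by species before and after processing.

    Returns {species: (difference, passed)}; `rel_tol` is a hypothetical threshold.
    """
    report = {}
    for species in sorted(set(inventory_totals) | set(model_ready_totals)):
        before = inventory_totals.get(species, 0.0)
        after = model_ready_totals.get(species, 0.0)
        diff = after - before
        if before:
            rel = abs(diff) / abs(before)
        else:
            rel = 0.0 if after == 0 else float("inf")
        report[species] = (diff, rel <= rel_tol)
    return report
```

A species present in only one of the two reports fails the check, which flags mass that was dropped or created during processing.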

3.3.2   Key Emissions Modeling Settings
When preparing emissions for the air quality model, emissions for each sector are processed separately
through SMOKE, and then the final merge program (Mrggrid) is run to combine the model-ready, sector-
specific emissions across sectors. The SMOKE settings in the run scripts and the data in the SMOKE
ancillary files control the approaches used for the individual SMOKE programs for each sector. Table 3-
6 summarizes the major processing steps of each platform sector. The "Spatial" column shows the spatial
approach: "point" indicates that SMOKE maps the source from a point location (i.e., latitude and
longitude) to a grid cell; "surrogates" indicates that some or all of the sources use spatial surrogates to
allocate county emissions to grid cells; and "area-to-point" indicates that some of the sources use the
SMOKE area-to-point feature to grid the emissions.  The "Speciation" column indicates that all sectors
use the SMOKE speciation step, though biogenics speciation is done within BEIS3 and not as a separate
SMOKE step.  The "Inventory resolution" column shows the inventory temporal resolution from which
SMOKE needs to calculate hourly emissions. Note that for some sectors (e.g., onroad, beis), there is no
input inventory.  Instead activity data and emission factors are used in combination with meteorological
data to compute hourly  emissions.

Finally, the "plume rise" column indicates the sectors for which the  "in-line" approach is used.  These
sectors are the only ones which will have emissions in aloft layers, based on plume rise.  The term "in-
line" means that the plume rise calculations are done inside of the air quality model instead of being
computed by SMOKE.  The air quality model computes the plume rise using the stack data and the
hourly air quality model inputs found in the SMOKE output files for each model-ready emissions sector.
The height of the plume rise determines  the model layer into which the emissions are placed. The
c3marine and ptfire sectors are the only sectors with only "in-line" emissions, meaning that all of the
emissions are placed in  aloft layers and thus there  are no emissions for those sectors in the two-
dimensional, layer-1 files created by SMOKE. In addition to the other settings, no grouping of stacks was
performed using the PELVCONFIG file because grouping done for  "in-line" processing will not give
identical results as "offline" (i.e., processing whereby SMOKE creates 3-dimensional files). The only
way to get the same results between in-line and offline is to choose to have no grouping.
                        Table 3-6. Key emissions modeling steps by sector
Platform sector   Spatial                      Speciation   Inventory resolution                              Plume rise
Ptipm             Point                        Yes          daily & hourly                                    in-line
Ptnonipm          Point                        Yes          annual                                            in-line
Ptfire            Point                        Yes          daily                                             in-line
Othpt             Point                        Yes          annual                                            in-line
c3marine          Point                        Yes          annual                                            in-line
Ag                Surrogates                   Yes          annual & monthly
Afdust            Surrogates                   Yes          annual
Beis              pre-gridded landuse          in BEIS      computed hourly
c1c2rail          Surrogates                   Yes          annual
Nonpt             surrogates & area-to-point   Yes          annual & monthly for ag burning and SESARM open
Nonroad           surrogates & area-to-point   Yes          monthly
Onroad            Surrogates                   Yes          computed hourly
onroad_rfl        Surrogates                   Yes          computed hourly
Othar             Surrogates                   Yes          annual
Othon             Surrogates                   Yes          annual

3.3.3   Spatial Configuration
For this study, SMOKE and CMAQ were run for a 12-km modeling domain shown in Figure 3-1
(12US1). The grid used a Lambert-Conformal projection, with Alpha = 33, Beta = 45 and Gamma = -97,
with a center of X = -97 and Y = 40. Later sections provide details on the spatial surrogates and area-to-
point data used to accomplish spatial allocation with SMOKE.
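
To make the projection parameters concrete, the sketch below implements the standard spherical Lambert conformal conic forward equations. It assumes that Alpha and Beta are the two standard parallels, that Gamma is the central meridian, and that the center (X = -97, Y = 40) is the projection origin. The sphere radius of 6,370,000 m is also an assumption, not a value stated in this document, and the function name is illustrative.

```python
import math

EARTH_RADIUS_M = 6_370_000.0  # assumed spherical Earth radius


def lcc_forward(lat_deg, lon_deg, lat1=33.0, lat2=45.0, lon0=-97.0, lat0=40.0,
                radius=EARTH_RADIUS_M):
    """Project latitude/longitude (degrees) to Lambert conformal x, y in meters."""
    def t(phi):
        return math.tan(math.pi / 4 + phi / 2)

    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)

    # Cone constant and scaling term for two standard parallels.
    n = math.log(math.cos(phi1) / math.cos(phi2)) / math.log(t(phi2) / t(phi1))
    f = math.cos(phi1) * t(phi1) ** n / n

    rho = radius * f / t(phi) ** n    # distance from the cone apex to the point
    rho0 = radius * f / t(phi0) ** n  # distance from the cone apex to the origin
    theta = n * (lam - lam0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)
```

With these parameters the projection origin (40 N, 97 W) maps to (0, 0), and grid-cell indices would follow by subtracting the grid's lower-left corner and dividing by the 12,000 m cell size.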
Figure 3-1. CMAQ Modeling Domain

3.3.4   Chemical Speciation Configuration
The emissions modeling step for chemical speciation creates "model species" needed by the air quality
model for a specific chemical mechanism. These model species are either individual chemical compounds
or groups of species, called "model species." The chemical mechanism used for this study is the Carbon
Bond 05 (CB05) mechanism (Yarwood, 2005) with secondary organic aerosol (SOA) and HONO
enhancements as described in http://www.cmascenter.org/help/model_docs/cmaq/4.7/
RELEASE NOTES.txt. The mapping of inventory pollutants to model species is shown in Table 3-7.
From the perspective of emissions preparation, the CB05 with SOA mechanism is the same as was used
in the 2005 platform. It should be noted that the BENZENE model species is not part of CB05 in that the
concentrations of BENZENE do not provide any feedback into the chemical reactions (i.e., it is not
"inside" the chemical mechanism). Rather, benzene is used as a reactive tracer and as such is impacted
by the CB05 chemistry. BENZENE, along with several reactive CB05 species (such as TOL and XYL),
plays a role in SOA formation in CMAQ 4.7.

                   Table 3-7. Model Species Produced by SMOKE for CB05
Inventory Pollutant           Model Species   Model Species Description
CO                            CO              Carbon monoxide
NOx                           NO              Nitrogen oxide
                              NO2             Nitrogen dioxide
SO2                           SO2             Sulfur dioxide
                              SULF            Sulfuric acid vapor
NH3                           NH3             Ammonia
VOC                           ALD2            Acetaldehyde
                              ALDX            Propionaldehyde and higher aldehydes
                              ETH             Ethene
                              ETHA            Ethane
                              ETOH            Ethanol
                              FORM            Formaldehyde
                              IOLE            Internal olefin carbon bond (R-C=C-R)
                              ISOP            Isoprene
                              MEOH            Methanol
                              OLE             Terminal olefin carbon bond (R-C=C)
                              PAR             Paraffin carbon bond
                              TOL             Toluene and other monoalkyl aromatics
                              XYL             Xylene and other polyalkyl aromatics
Various additional VOC        TERP            Terpenes
species from the biogenics
model which do not map to
the above model species
PM10                          PMC             Coarse PM > 2.5 microns and < 10 microns
PM2.5                         PEC             Particulate elemental carbon < 2.5 microns
                              PNO3            Particulate nitrate < 2.5 microns
                              POC             Particulate organic carbon (carbon only) < 2.5 microns
                              PSO4            Particulate sulfate < 2.5 microns
                              PMFINE          Other particulate matter < 2.5 microns

The approach for speciating PM2.5 emissions supports both CMAQ 4.7.1 with five species (i.e., AE5)
and CMAQ 5.0 that includes speciation of PM2.5 into 17 PM model species (i.e., AE6). The TOG and
PM2.5 speciation factors that are the basis of the chemical speciation approach were developed from the
SPECIATE4.3 database (http://www.epa.gov/ttn/chief/software/speciate), which is the EPA's repository of
TOG and PM speciation profiles of air pollution sources. A few of the profiles used in the v5 platform
will be published in later versions of the SPECIATE database. The SPECIATE database development
and maintenance is a collaboration involving the EPA's ORD, OTAQ, and the Office of Air Quality
Planning and Standards (OAQPS), and Environment Canada (EPA, 2006a). The SPECIATE database
contains speciation profiles for TOG, speciated into individual chemical compounds, VOC-to-TOG
conversion factors associated with the TOG profiles, and speciation profiles for PM2.5. The database also
contains PM2.5 speciated into both individual chemical compounds (e.g., zinc, potassium, manganese,
lead) and into the "simplified" PM2.5 components used in the air quality model. These simplified
components for AE5 are:

   •  PSO4 : primary particulate sulfate
   •  PNO3: primary particulate nitrate
   •  PEC: primary particulate elemental carbon
   •  POC: primary particulate organic carbon
   •  PMFINE: other primary particulate, less than 2.5 micrometers in diameter


NOX can be speciated into NO, NO2, and/or HONO. For the non-mobile sources, a single profile,
"NHONO", is used to split NOX into NO and NO2 with 10% NO2 and 90% NO. For the mobile sources except
for onroad (including nonroad, c1c2rail, c3marine, othon sectors) and for specific SCCs in othar and
ptnonipm, the profile "HONO" splits NOX into NO, NO2, and HONO with 90% NO, 9.2% NO2 and
0.8% HONO. The onroad sector does not use the "HONO"  profile to speciate NOX. Instead,
MOVES2010b produces speciated NO, NO2, and HONO by source, including emission factors for these
species in the emission factor tables used by SMOKE-MOVES. Within MOVES, the HONO fraction is a
constant 0.008 of NOX. The NO fraction varies by heavy duty versus light duty, fuel type, and model
year. The NO2 fraction =  1 - NO - HONO. For more details on the NOX fractions within MOVES, see
http://www.epa.gov/otaq/models/moves/documents/420rl2022.pdf. The SMOKE-MOVES system is
configured to model these  species directly without further speciation.
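
The NOX splits described above can be sketched as simple fractional profiles. This is an illustration only: it applies the stated percentages directly as mass fractions and ignores any molecular-weight handling that SMOKE speciation profiles perform, and the dictionary and function names are hypothetical.

```python
# Fractions of NOX assigned to each species, as stated in the text.
NOX_PROFILES = {
    "NHONO": {"NO": 0.90, "NO2": 0.10},                # non-mobile sources
    "HONO": {"NO": 0.90, "NO2": 0.092, "HONO": 0.008}, # mobile sources except onroad
}


def speciate_nox(nox: float, profile: str) -> dict:
    """Split a NOX value into species using one of the two named profiles."""
    return {species: nox * frac for species, frac in NOX_PROFILES[profile].items()}


def moves_no2_fraction(no_fraction: float, hono_fraction: float = 0.008) -> float:
    """NO2 fraction within MOVES: 1 - NO - HONO, with HONO fixed at 0.008 of NOX."""
    return 1.0 - no_fraction - hono_fraction
```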

The approach for speciating VOC emissions from non-biogenic  sources has the following characteristics:
1) for some sources, HAP emissions are used in the speciation process to allow integration of VOC and
HAP emissions in the NEI; and, 2) for some mobile sources, "combination" profiles are specified by
county and month and  emission mode (e.g., exhaust, evaporative). SMOKE computes the resultant profile
on-the-fly given the fraction of each specific profile specified for the particular county, month  and
emission mode. The SMOKE feature called the GSPRO_COMBO file supports this approach.

The VOC speciation approach for the 2009 Platform includes HAP emissions from the NEI in the
speciation process for some sectors. That is, instead of speciating VOC to generate all of the species
needed by the model, emissions of the 4 HAPs, benzene, acetaldehyde, formaldehyde and methanol
(BAFM) from the NEI were integrated with the NEI VOC. The integration process combines the BAFM
HAPs with the VOC in a way that does not double-count emissions and uses the BAFM directly in the
speciation process. Generally, the HAP emissions from the NEI  are believed to be more representative of
emissions of these compounds than their generation via VOC speciation.
The BAFM HAPs were chosen for this special treatment because, with the exception of BENZENE, they
are the only explicit VOC HAPs in the base version of CMAQ 4.7 model. By "explicit VOC HAPs," we
mean model species that participate in the modeled chemistry using the CB05 chemical mechanism. The
use of these HAP emission estimates along with VOC is called "HAP-CAP integration". BENZENE was
chosen because it was added as a model species in the base version of CMAQ 4.7, and there was a desire
to keep its emissions consistent between multi-pollutant and base versions of CMAQ.

For specific sources, especially within the onroad and onroad_rfl sectors, we included ethanol in our
integration. To differentiate when a source was integrating BAFM versus EBAFM (ethanol in addition to
BAFM), the speciation profiles that do not include ethanol are referred to as "E-profiles"; for
example, E10 headspace gasoline evaporative speciation profile 8763 is used where ethanol is speciated
from VOC, versus 8763E where ethanol is obtained directly from the inventory. The specific profiles used in
2009 are the same as used for the 2007 platform (see 2007 speciation in Table 3-6 in the 2007v5 TSD).
The only differences between 2009 and 2007 are the GSPRO_COMBOs, which represent a different
mixture of E0 and E10 by county between the two modeling years.

The integration of HAP VOC with VOC is a feature available in SMOKE for all inventory formats other
than PTDAY (the format used for the ptfire sector). SMOKE allows the user to specify the particular
HAPs to integrate and the particular sources to integrate. The HAPs to integrate are specified in the
INVTABLE file, and the sources to integrate are based on the NHAPEXCLUDE file (which lists the
sources that are excluded from integration). For the "integrate" sources, SMOKE subtracts the
"integrate" HAPs from the VOC (at the source level) to compute emissions for the new pollutant
"NONHAPVOC." The user provides NONHAPVOC-to-NONHAPTOG factors and NONHAPTOG
speciation profiles. SMOKE computes NONHAPTOG and then applies the speciation profiles to allocate
the NONHAPTOG to the other CMAQ VOC species not including the integrated HAPs.
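
The arithmetic of the integration step can be sketched as follows. This is not SMOKE code: the function name, argument layout, and the example profile passed in by the caller are hypothetical, and only the sequence of operations (subtract HAPs, convert to NONHAPTOG, apply the profile) is taken from the text.

```python
def integrate_source(voc: float, haps: dict, nonhapvoc_to_nonhaptog: float,
                     nonhaptog_profile: dict):
    """Return (NONHAPVOC, NONHAPTOG, speciated emissions) for one "integrate" source.

    `haps` holds the integrated HAP emissions (e.g., BAFM) for the source;
    `nonhaptog_profile` maps model species to fractions of NONHAPTOG.
    """
    nonhapvoc = voc - sum(haps.values())
    if nonhapvoc < 0:
        raise ValueError("integrated HAPs exceed VOC; source should not be integrated")
    nonhaptog = nonhapvoc * nonhapvoc_to_nonhaptog
    speciated = {sp: nonhaptog * frac for sp, frac in nonhaptog_profile.items()}
    return nonhapvoc, nonhaptog, speciated
```

The integrated HAPs themselves are carried to the model separately, which is how double counting is avoided.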

CAP-HAP integration was considered for all sectors and "integration criteria" were developed for some of
those. Table 3-8 summarizes the integration approach for each platform sector. For the c1c2rail sector, the
integration criteria were (1) that the source had to have at least one of the 4 HAPs and (2) that the sum of
BAFM could not exceed the VOC emissions. For the nonpt sector, the following integration criteria were
used to determine the sources to integrate:

   1.  Any source for which the sum of B, A, F, or M is greater than the VOC was not integrated, since
       this clearly identifies sources for which there is an inconsistency between VOC and VOC HAPs.
   2.  For some source categories (those that comprised 80% of the VOC emissions), sources were
       selected for integration in the category per specific criteria. For most of these source categories,
       sources may be integrated if they had the minimum combination of B, A, F, and M. For some
       source categories, all sources were designated as "no-integrate".
   3.  For source categories that do not comprise the top  80% of VOC emissions, as long as the source
       has emissions of one of the B, F, A or M pollutants, then it can be integrated.
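
The two c1c2rail criteria stated before this list, which also correspond to criterion 1 and the minimum requirement in criterion 3 for nonpt, can be expressed as a simple check. The pollutant keys and function name are hypothetical; the category-specific rules in criterion 2 are not represented.

```python
BAFM = ("BENZENE", "ACETALDEHYDE", "FORMALDEHYDE", "METHANOL")


def meets_basic_integration_criteria(voc: float, haps: dict) -> bool:
    """True if the source has at least one BAFM HAP and their sum does not exceed VOC."""
    values = [haps.get(name, 0.0) for name in BAFM]
    return any(v > 0 for v in values) and sum(values) <= voc
```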
  Table 3-8. Integration status of benzene, acetaldehyde, formaldehyde and methanol (BAFM) for
                                       each platform sector
     Platform sector   Approach for Integrating NEI emissions of Benzene (B), Acetaldehyde (A),
                       Formaldehyde (F) and Methanol (M)
     Ptipm             No integration because emissions of BAFM are relatively small for this sector
     Ptnonipm          No integration because emissions of BAFM are relatively small for this sector and it is
                       not expected that criteria for integration would be met by a significant number of sources
     Ptfire            No integration
     Ag                N/A; sector contains no VOC
     Afdust            N/A; sector contains no VOC
     Biog              N/A; sector contains no inventory pollutant "VOC", but rather specific VOC species
     C1c2rail          Partial integration
     C3marine          Full integration
     Nonpt             Partial integration
     Nonroad           Partial integration; did not integrate California emissions, CNG or LPG sources (SCCs
                       beginning with 2268 or 2267) because NMIM computed only VOC and not any HAPs for
                       these SCCs
     Onroad            Full integration
     Othar             No integration; not the NEI
     Othon             No integration; not the NEI
     Othpt             No integration; not the NEI

The SMOKE feature to compute speciation profiles from mixtures of other profiles in user-specified
proportions was used in this project. The combinations are specified in the GSPRO_COMBO ancillary
file by pollutant (including pollutant mode, e.g., EXH__VOC), state and county (i.e., state/county FIPS
code) and time period (i.e., month). This feature was used for onroad and nonroad mobile and
gasoline-related stationary sources, which use fuels with varying ethanol content and therefore require
speciation profiles that are different combinations of gasoline and E10 profiles. Since the ethanol content
varies spatially (e.g., by state or county), temporally (e.g., by month) and by modeling year (i.e., future
years have more ethanol), the combo feature allows combinations to be specified at various levels for
different years.
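
The idea of a combination profile can be illustrated as a weighted average of component profiles. The sketch below does not read or write the GSPRO_COMBO file format; the function name is hypothetical, and the profile codes and species fractions used with it are placeholders supplied by the caller rather than values from SPECIATE.

```python
def combine_profiles(profiles: dict, weights: dict) -> dict:
    """Blend speciation profiles using fractional weights that sum to 1.

    `profiles` maps a profile code to {species: fraction};
    `weights` maps a profile code to its share for a given county and month.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("profile weights must sum to 1")
    combined = {}
    for code, weight in weights.items():
        for species, frac in profiles[code].items():
            combined[species] = combined.get(species, 0.0) + weight * frac
    return combined
```

A county using 30% E0 and 70% E10 fuel in a given month would pass weights of 0.3 and 0.7 for the two corresponding profiles.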

The INVTABLE and NHAPEXCLUDE SMOKE input files have a critical function in the VOC
speciation process for emissions modeling cases utilizing HAP-CAP integration, as is done for the 2009
Platform. Two different types of INVTABLE files were developed to use with different sectors of the
platform.  For sectors in which we chose no integration across the  entire sector a "no HAP use"
INVTABLE was developed in which the "KEEP" flag is set to "N" for BAFM pollutants. Thus, any
BAFM pollutants in the inventory input into SMOKE are dropped. This both avoids double-counting of
these species and  assumes that the VOC speciation is the best available  approach for these species for the
sectors using the approach. The second INVTABLE is used for sectors in which one or more sources are
integrated and causes SMOKE to keep the BAFM pollutants and indicates that they are to be integrated
with VOC (by setting the "VOC or TOG component" field to "V" for all four HAP pollutants). This
integrate INVTABLE is further differentiated into sectors that integrate BAFM versus those that integrate
EBAFM (e.g., the onroad and onroad_rfl sectors).
Unlike other sectors, the onroad sector has pre-speciated PM. This speciated PM comes from the
MOVES model and is processed through the SMOKE-MOVES system. Unfortunately, the
MOVES2010b speciated PM does not map 1-to-1 to either the AE5 or AE6 species. Table 3-9 shows the
relationship between MOVES2010b exhaust PM2.5 related species and CMAQ AE5 PM species.

                   Table 3-9. MOVES exhaust PM species versus AE5 species
MOVES2010b Pollutant Name             Variable name for equations   Relation to AE5 model species
Primary Exhaust PM2.5 - Total         PM25_TOTAL
Primary PM2.5 - Organic Carbon        PM25OM                        Sum of POC, PNO3 and PMFINE
Primary PM2.5 - Elemental Carbon      PM25EC                        PEC
Primary PM2.5 - Sulfate Particulate   PM25SO4                       PSO4

MOVES species are related as follows:
PM25_TOTAL = PM25EC + PM25OM + PSO4

The five CMAQ AE5 species also sum to total PM2.5:
PM2.5 = POC+PEC+PNO3+PSO4+PMFINE

The basic problem is to differentiate MOVES species "PM25OM" into the component AE5 species
(POC, PNO3 and PMFINE). The Moves2smkEF post-processor script takes the MOVES2010b species
(EF tables) and calculates the appropriate AE5 PM2.5 species and converts them into a format that is
appropriate for SMOKE (see http://www.smoke-model.Org/version3.l/html/ch05s02s04.html for details
on the Moves2smkEF script).
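
The relationships in Table 3-9 imply a set of consistency conditions that any split of PM25OM must satisfy. The sketch below checks those conditions only; it does not reproduce the Moves2smkEF calculation, whose formulas are not given here, and the function name and tolerance are assumptions.

```python
def check_ae5_mapping(moves: dict, ae5: dict, tol: float = 1e-9) -> dict:
    """Check AE5 species against MOVES species using the Table 3-9 relationships."""
    return {
        # PM25OM must equal the sum of POC, PNO3 and PMFINE.
        "organic_matter": abs(ae5["POC"] + ae5["PNO3"] + ae5["PMFINE"]
                              - moves["PM25OM"]) <= tol,
        "elemental_carbon": abs(ae5["PEC"] - moves["PM25EC"]) <= tol,
        "sulfate": abs(ae5["PSO4"] - moves["PM25SO4"]) <= tol,
        # The five AE5 species must sum to the MOVES total.
        "total": abs(sum(ae5[s] for s in ("POC", "PEC", "PNO3", "PSO4", "PMFINE"))
                     - moves["PM25_TOTAL"]) <= tol,
    }
```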

For brake wear and tire wear PM, total PM2.5 (not speciated) comes directly from MOVES2010b. These
PM modes are speciated by SMOKE.  PMFINE from onroad exhaust is further speciated by SMOKE into
the component AE6 species.

Speciation profiles for use with BEIS are not included in SPECIATE. The 2009 Platform uses BEIS3.14
and includes a species (SESQ) that was not in BEIS3.13 (the version used for the 2002 Platform). This
species was mapped to the CMAQ species SESQT. The profile code associated with BEIS3.14 profiles for
use with CB05 was "B10C5."

3.3.4   Temporal Processing Configuration
Temporal allocation or temporalization is the process of distributing aggregated emissions to a finer
temporal resolution, such as converting annual emissions to hourly emissions. While the total emissions
are important, the timing of the occurrence of emissions is also essential for accurately simulating ozone,
PM, and other pollutant concentrations in the atmosphere. Typically, emissions inventories are annual or
monthly in nature. Temporalization takes these annual emissions and distributes them to the month, the
monthly emissions to the day, and the daily emissions to the hour. This process is typically done by
applying temporal profiles—monthly, day of the week, and diurnal—to the inventories.
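
The annual-to-month, month-to-day, and day-to-hour chain can be sketched as follows. This is a minimal illustration under stated assumptions, not the SMOKE Temporal program: the function name and the profile layout (12 monthly fractions, 7 day-of-week weights starting on Monday, 24 hourly fractions) are chosen here for clarity.

```python
import calendar
from datetime import date


def hourly_emissions(annual, monthly, weekly, diurnal, day):
    """Return 24 hourly emission values for one calendar day.

    annual  : annual total emissions
    monthly : 12 fractions summing to 1 (January first)
    weekly  : 7 relative day-of-week weights (Monday first)
    diurnal : 24 fractions summing to 1
    day     : datetime.date to compute
    """
    month_total = annual * monthly[day.month - 1]
    ndays = calendar.monthrange(day.year, day.month)[1]
    # Normalize the day-of-week weights over the actual days in the month
    # so that the daily values add back up to the monthly total.
    weight_sum = sum(weekly[date(day.year, day.month, d).weekday()]
                     for d in range(1, ndays + 1))
    day_total = month_total * weekly[day.weekday()] / weight_sum
    return [day_total * fraction for fraction in diurnal]
```

Because every step only redistributes mass, summing the hourly values over all days of the year returns the annual total.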

The monthly, weekly, and diurnal temporal profiles and associated cross references used to create the
hourly emissions inputs for the air quality model were similar to those used for the 2005v4.3 platform.
Some new methodologies are introduced in this platform and updated profiles are discussed. Temporal
factors are typically applied to the inventory by some combination of country, state, county, SCC, and
pollutant.

Table 3-10 summarizes the temporal aspect of the emissions processing configuration. It compares the key
approaches used for temporal processing across the sectors. The temporal aspects of SMOKE processing
are controlled through (a) the script settings L_TYPE (temporal type) and M_TYPE (merge type) and (b)
ancillary data files. In the table, "Daily temporal approach" refers to the temporal approach for getting
daily emissions from the inventory using the Temporal program. The "Merge processing approach" refers
to the days used to represent other days in the month for the merge step. If not "all", then the SMOKE
merge step runs only for representative days, which could include holidays as indicated by the right-most
column. In addition to the resolution, temporal processing includes a ramp-up period for several days
prior to January 1, 2009, intended to mitigate the effects of initial condition concentrations. The ramp-up
period for the national 12-km grid was 10 days. For most sectors, the emissions from late December of
2008 were used to provide emissions for the end of December, 2009.

The Flat File 2010 format (FF10)  is a new inventory format for SMOKE that provides a more
consolidated format for monthly, daily, and hourly emissions inventories.  Previously, 12 separate
inventory files would be required  to process monthly inventory data.  With the FF10 format, a single
inventory file can contain emissions for all 12 months and the annual emissions in a single record. This
helps  simplify the management of numerous inventories.  Similarly, individual records contain data for
all days  in a month and all hours in a day in the daily and hourly FF10 inventories, respectively.

SMOKE 3.1 prevents the application of temporal  profiles on top of the "native" resolution of the
inventory. For example, a monthly inventory should not have annual to month temporalization applied;
rather, it should only have month to day and diurnal temporalization.  This becomes particularly
important when specific sectors have a mix of annual, monthly, daily, and/or hourly inventories (e.g. the
nonpt sector).  The flags that control temporalization for a mixed set of inventories are discussed in the
SMOKE documentation.
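
The rule can be pictured as choosing the temporal steps that remain below an inventory's native resolution. The names below are hypothetical and are not SMOKE flags or settings; the sketch only restates the logic of the paragraph above.

```python
# Ordered temporal allocation steps, coarsest first.
STEPS = ["annual_to_month", "month_to_day", "day_to_hour"]

# Index of the first step that still applies for each native resolution.
FIRST_STEP = {"annual": 0, "monthly": 1, "daily": 2, "hourly": 3}


def temporal_steps(resolution):
    """Return the allocation steps to apply to an inventory of this resolution."""
    return STEPS[FIRST_STEP[resolution]:]
```

For example, temporal_steps("monthly") leaves out the annual-to-month step, and an hourly inventory needs no temporal profiles at all.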

               Table 3-10. Temporal Settings Used for the Platform Sectors in SMOKE
                                       Monthly    Daily          Merge          Process
                  Inventory            profiles   temporal       processing     holidays as
Platform sector   resolution           used?      approach 1,2   approach 1,3   separate days?
ptipm             daily & hourly                  all            all            yes
ptnonipm          annual               yes        mwdss          all            yes
ptfire            daily                           all            all            yes
ag                annual & monthly     yes        all            all            yes
afdust            annual               yes        week           all            yes
beis              hourly                          n/a            all            yes
c3marine          annual               yes        aveday         aveday
c1c2rail          annual               yes        mwdss          mwdss
nonpt             annual & monthly     yes        all            all            yes
nonroad           monthly                         mwdss          mwdss          yes
onroad            annual & monthly a              all            all            yes
onroad_rfl        annual & monthly a              all            all            yes
othar             annual               yes        week           week
othon             annual               yes        week           week
othpt             annual               yes        mwdss          mwdss
1 Definitions for processing resolution:
all      = hourly emissions computed for every day of the year
week    = hourly emissions computed for all days in one "representative" week, representing all weeks for each month, which means
emissions have day-of-week variation, but not week-to-week variation within the month
mwdss   = hourly emissions for one representative Monday, representative weekday, representative Saturday and representative Sunday for
each month, which means emissions have variation between Mondays, other weekdays, Saturdays and Sundays within the month, but not
week-to-week variation within the month. Also Tuesdays, Wednesdays and Thursdays are treated the same.
aveday   = hourly emissions computed for one representative day of each month, which means emissions for all days of each month are the
same.
2 Daily temporal approach refers to the temporal approach for getting daily emissions from the inventory using the Temporal program. The
values given are the values of the L_TYPE setting.
3 Merge processing approach refers to the days used to  represent other days in the month for the merge step. If not "all", then the SMOKE
merge step runs only for representative days, which could include holidays as indicated by the rightmost column. The values given are the values
of the M_TYPE setting.
a For onroad and onroad_rfl, the annual and monthly refers to activity data (VMT and VPOP). Emissions are computed on an hourly basis.
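
The processing-resolution definitions above amount to a mapping from each calendar date to the representative day whose hourly emissions it reuses. The sketch below is illustrative only: the function name and key layout are assumptions, holidays are ignored, and Friday is grouped with the other weekdays because mwdss defines only four day types.

```python
from datetime import date


def representative_key(day, merge_type):
    """Return a key identifying the representative day used for `day`."""
    dow = day.weekday()  # Monday == 0 ... Sunday == 6
    if merge_type == "all":
        return (day.year, day.month, day.day)   # every day is processed
    if merge_type == "week":
        return (day.month, dow)                 # one representative week per month
    if merge_type == "mwdss":
        if dow == 0:
            label = "monday"
        elif dow <= 4:
            label = "weekday"                   # Tuesday through Friday (assumed)
        elif dow == 5:
            label = "saturday"
        else:
            label = "sunday"
        return (day.month, label)
    if merge_type == "aveday":
        return (day.month,)                     # one representative day per month
    raise ValueError("unknown merge type: " + merge_type)
```

Two dates that return the same key share the same hourly emissions, which is why the coarser settings reduce the number of days the merge step has to run.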

For the EGU emissions in the ptipm sector, hourly CEM NOx and SO2 data were used directly for
sources that match CEMs. For other pollutants, hourly CEM heat input data were used to allocate the
NEI annual values. For sources not matching CEM data ("non-CEM" sources), daily emissions were
computed from the NEI annual emissions using a structured query language (SQL) program and state-
average CEM data.  To allocate annual emissions to each month, state-specific three-year averages of
2008-2010 CEM data were created.  These average annual-to-month factors were assigned to non-CEM
sources within each state. To allocate the monthly emissions to each day, the 2009 CEM data were used to compute
state-specific month-to-day factors, averaged across all units in each state.  These daily emissions were
calculated outside of SMOKE and the resulting daily inventory is used as an input into SMOKE.
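
The non-CEM allocation described above can be sketched in two steps. This is not the SQL program used for the platform. The function names are hypothetical, and pooling the three years of monthly CEM totals before normalizing is an assumption about how the three-year average was formed.

```python
def annual_to_month_factors(monthly_by_year):
    """Build 12 annual-to-month fractions from several years of monthly totals.

    monthly_by_year: list of 12-element lists, one per year (e.g. 2008-2010),
    holding state-total CEM values for each month.
    """
    summed = [sum(year[m] for year in monthly_by_year) for m in range(12)]
    total = sum(summed)
    return [value / total for value in summed]


def daily_from_annual(annual, month_factors, day_factors):
    """Allocate an annual value to days.

    month_factors: 12 fractions summing to 1.
    day_factors:   dict mapping month (1-12) to a list of month-to-day
                   fractions that sum to 1 within the month.
    Returns a dict keyed by (month, day).
    """
    daily = {}
    for month, month_fraction in enumerate(month_factors, start=1):
        monthly = annual * month_fraction
        for day, day_fraction in enumerate(day_factors[month], start=1):
            daily[(month, day)] = monthly * day_fraction
    return daily
```

The resulting daily values sum to the NEI annual value for the source, so the allocation changes only the timing of the emissions.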

The daily-to-hourly allocation was performed in SMOKE using diurnal profiles. The state-specific and
pollutant-specific diurnal profiles for use in allocating the day-specific emissions for non-CEM sources in
the ptipm sector were updated. The 2009 CEM data were used to create state-specific, day-to-hour
factors, averaged over the whole year and all units in each state.  Diurnal factors were calculated using
CEM SO2 and NOx emissions and heat input.  SO2- and NOx-specific factors were computed from the
CEM data for these pollutants. All other pollutants used factors created from the hourly heat input data.
The resulting profiles were assigned by state and pollutant.
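
A day-to-hour profile of this kind can be derived by pooling hourly values and normalizing by hour of day. The sketch below assumes the records for one state and one quantity (SO2, NOx, or heat input) have already been gathered; the function name and the (hour, value) record layout are hypothetical.

```python
def day_to_hour_factors(hourly_records):
    """Compute 24 diurnal fractions from pooled hourly values.

    hourly_records: iterable of (hour, value) pairs with hour in 0-23,
    pooled over all units and all days of the year for one state.
    """
    totals = [0.0] * 24
    for hour, value in hourly_records:
        totals[hour] += value
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no non-zero hourly data to build a profile from")
    return [total / grand_total for total in totals]
```

Applying these fractions to a day's emissions spreads them across the 24 hours without changing the daily total.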

Two updated diurnal temporal profiles were incorporated into the 2009 modeling  platform.  For all
agricultural burning, we used a diurnal temporal profile (McCarty et al., 2009) that puts more of the
emissions during the actual work day and suppresses the emissions during the middle of the night.
Note that all states used a uniform day of week profile for all agricultural burning emissions,
except for the following states, for which state-specific day of week profiles were used: Arkansas,
Kansas, Louisiana, Minnesota, Missouri, Nebraska, Oklahoma, and Texas. For residential wood
combustion, a profile was used that placed more of the emissions in the morning and the evening when
people are typically using these sources.  This profile is based on an average of 2004 MANE-VU survey
based temporal profiles (see http://www.marama.org/publications_folder/ResWoodCombustion/
Final report.pdf). When this profile was compared to a concentration-based analysis of aethalometer
measurements in Rochester, NY (Wang et al. 2011) for various seasons and day of the week it was found
that the updated RWC profile generally tracked the concentration based temporal  patterns.

The temporal profile assignments for the Canadian 2006 inventory were provided by Environment
Canada along with the inventory. They provided profile assignments that rely on the existing set of
temporal profiles in the 2002 Platform. For point sources, they provided profile assignments by
PLANTID.

3.3.5 Meteorological-based Temporal Profiles
A significant improvement over previous platforms is the introduction of meteorologically-based
temporalization.  We recognize that there are many factors that impact the timing  of when emissions
occur. The benefits of utilizing meteorology as a method of temporalizing are: (1) a meteorological
dataset consistent with that used by the AQ model (e.g., WRF) is available; (2) the meteorological model
data is highly resolved in terms of spatial resolution; and (3) the meteorological variables vary at hourly
resolution which can translate to hour-specific temporalization.

The SMOKE program GenTPRO provides  a method for developing meteorologically-based
temporalization.  Currently, the program can utilize three types of temporal algorithms: RWC,
agricultural livestock ammonia, and a generic meteorology based algorithm. For the 2007 platform, we
used the RWC and ag NH3 GenTPRO generated profiles. GenTPRO reads in gridded meteorology data
(MCIP) and spatial surrogates and uses the  specified algorithm to produce a new temporal profile that can
be input into SMOKE. The meteorological variables and the resolution of the generated temporal profile
(hourly, daily, etc.) depend on the algorithm and the run parameters. For more details on the
development of these algorithms and running GenTPRO, see the GenTPRO documentation
http://www.smoke-model.org/version3.1/GenTPRO_TechnicalSummary_Aug2012_Final.pdf and the
SMOKE manual section http://www.smoke-model.org/version3.1/html/ch05s03s07.html.

For the RWC algorithm, GenTPRO uses the daily minimum temperature to determine the temporal
allocation of emissions to days. GenTPRO was run to create an annual-to-day temporal profile for the
RWC sources within the nonpt sector.  These generated profiles distribute annual  RWC emissions to the
coldest days of the year. On days where the minimum temperature does not drop below a user-defined
threshold, RWC emissions are zero. Conversely, the program temporally allocates the largest percentage
of emissions to the coldest days. Similar to other temporal allocation profiles, the total annual emissions
do not change, just the distribution of the emissions within the year. Initially, the RWC algorithm used
the default temperature threshold of 50 °F. For most of the country, this produced a reasonable
distribution of emissions, but for a few Southern counties all of the emissions were compressed into a few
days creating excessively high daily emissions. GenTPRO was then modified to accept an optional input
that defines a county/state specific alternative temperature threshold.  In addition, an alternative RWC
algorithm was created to avoid negative RWC emissions when the daily minimum temperature was
greater than 53.3 °F. For the v5 platform, the alternative RWC algorithm was used for the whole country,
with the default 50 °F threshold for the majority of the states, and a 60 °F threshold for the following
states: Alabama, Arizona, California, Florida, Georgia, Louisiana, Mississippi, South Carolina, and
Texas.
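
The threshold behavior can be illustrated with a simple weighting in which each day's share grows with how far the minimum temperature falls below the threshold. The linear weight used here is an assumption made for illustration and is not the GenTPRO equation. The state list and the 50 °F and 60 °F thresholds are taken from the text above, and the function names are hypothetical.

```python
# States assigned the 60 °F threshold in the text; all others use 50 °F.
HIGH_THRESHOLD_STATES = {
    "Alabama", "Arizona", "California", "Florida", "Georgia",
    "Louisiana", "Mississippi", "South Carolina", "Texas",
}


def threshold_for(state):
    """Return the RWC temperature threshold (°F) for a state."""
    return 60.0 if state in HIGH_THRESHOLD_STATES else 50.0


def rwc_daily_fractions(daily_min_temps_f, threshold_f=50.0):
    """Distribute annual RWC emissions to days from daily minimum temperatures.

    Days at or above the threshold receive zero; colder days receive a
    share proportional to (threshold - Tmin). Linear weighting is assumed.
    """
    weights = [max(threshold_f - t, 0.0) for t in daily_min_temps_f]
    total = sum(weights)
    if total == 0:
        raise ValueError("no day falls below the threshold")
    return [w / total for w in weights]
```

With minimum temperatures of 30, 40, and 55 °F and the 50 °F threshold, the third day receives nothing and the first day receives twice the share of the second. Raising the threshold to 60 °F spreads emissions over all three days, which is the effect sought for the warmer states.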

The GenTPRO algorithm for agricultural livestock NH3 is based on the Russell and Cass
(1986) equation. This algorithm uses county-average hourly temperature and wind speed to calculate the
temporal profile. GenTPRO was run to create month-to-hour temporal profiles for these sources.
Because these profiles distribute to the hour based on monthly emissions, the emissions will either come
from a monthly inventory or from an annual inventory that has been temporalized already to the month.

For the onroad and onroad_rfl sectors, meteorology is not used in the development of the temporal
profiles; rather, meteorology impacts the calculation of the hourly emissions through the program
Movesmrg. The result is that the emissions will vary at the hourly level by grid cell. More specifically,
the on-network (RPD) and the off-network (RPV) exhaust, evaporative, and evaporative permeation
modes use the gridded meteorology (MCIP) directly.  Movesmrg determines the temperature for each
hour and grid cell and uses it to select the appropriate EF for that SCC/pollutant/mode. For the off-
network rate per profile (RPP) emissions, Movesmrg uses  the Met4moves output for SMOKE (daily
minimum and maximum temperatures by county) to determine the appropriate EF for that hour and
SCC/pollutant. The result is that the emissions will vary hourly by county. The combination of these
three processes (RPD, RPV, and RPP) is the total onroad emissions, while the combination of the two
processes (RPD, RPV) for the refueling mode only is the total onroad_rfl emissions.  Both sectors will
show a strong meteorological influence on their temporal patterns.

3.3.6  Vertical Allocation of Emissions
Table 3-6 specifies the sectors for which plume rise is calculated. If there is no plume rise for a sector, the
emissions are placed into layer 1 of the air quality model. Vertical plume rise was performed in-line
within CMAQ for all of the SMOKE point-source sectors (i.e., ptipm, ptnonipm, ptfire, othpt, and
c3marine). The in-line plume rise computed within CMAQ is nearly identical to the plume rise that would
be calculated within SMOKE using the Laypoint program. See
http://www.smoke-model.org/version2.7/html/ch06s07.html for full documentation of Laypoint. The selection of point
sources for plume rise is pre-determined in SMOKE using the Elevpoint program
(http://www.smoke-model.org/version2.7/html/ch06s03.html). The calculation is done in conjunction with the CMAQ model
time steps with interpolated meteorological data and is therefore more temporally resolved than when it is
done in SMOKE. Also, the calculation of the location of the point source is slightly different than the one
used in SMOKE and this can result in slightly different placement of point sources near grid cell
boundaries.

For point sources, the stack parameters are used as inputs to the Briggs algorithm, but point fires do not
have stack parameters. However, the ptfire inventory does contain data on the acres burned (acres per day)
and fuel consumption (tons fuel per acre) for each day. CMAQ uses these additional parameters to
estimate the plume rise of emissions into layers above the surface model layer. Specifically, these data are
used to calculate heat flux, which is then used to estimate plume rise. In addition to the acres burned and
fuel consumption, heat content of the fuel is needed to compute heat flux. The heat content was assumed
to be 8000 Btu/lb of fuel for all fires because specific data on the fuels were unavailable in the inventory.
The plume rise algorithm applied to the fires is a modification of the Briggs algorithm with a stack height
of zero.
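
The fire heat input follows from simple arithmetic on the three quantities named above. The sketch below computes the daily heat release in Btu. It assumes the fuel consumption is in short tons (2000 lb per ton), and it does not attempt the conversion to the heat-flux units or the time distribution that CMAQ applies. The function name is hypothetical.

```python
LB_PER_TON = 2000.0               # assumes short tons
HEAT_CONTENT_BTU_PER_LB = 8000.0  # value assumed for all fires in the text


def daily_heat_release_btu(acres_burned, fuel_tons_per_acre,
                           heat_content=HEAT_CONTENT_BTU_PER_LB):
    """Heat released by a fire in one day, in Btu.

    acres_burned       : acres burned that day
    fuel_tons_per_acre : fuel consumption, tons of fuel per acre
    """
    fuel_lb = acres_burned * fuel_tons_per_acre * LB_PER_TON
    return fuel_lb * heat_content
```

For instance, a fire burning 100 acres in a day at 10 tons of fuel per acre consumes 2,000,000 lb of fuel and releases 1.6e10 Btu under the 8000 Btu/lb assumption.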

CMAQ uses the Briggs algorithm to determine the plume top and bottom, and then computes  the plumes'
distributions into the vertical layers that the plumes intersect. The pressure difference across each layer
divided by the pressure difference across the entire plume is used as a weighting factor to assign the
emissions to layers. This approach gives plume fractions by layer and source.
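
The pressure weighting described above can be sketched as follows, given the pressures at the plume bottom and top and at the model layer interfaces. This is an illustration of the weighting only, not CMAQ code, and the function and argument names are assumptions.

```python
def plume_layer_fractions(p_bottom, p_top, layer_edges):
    """Fraction of a plume's emissions assigned to each model layer.

    p_bottom, p_top : pressures at the plume bottom and top (p_bottom > p_top)
    layer_edges     : pressures at the layer interfaces, surface first,
                      decreasing with height; layer k spans
                      layer_edges[k] down to layer_edges[k + 1]
    """
    if p_bottom <= p_top:
        raise ValueError("plume bottom pressure must exceed plume top pressure")
    plume_depth = p_bottom - p_top
    fractions = []
    for lower, upper in zip(layer_edges[:-1], layer_edges[1:]):
        # Pressure thickness of the part of this layer inside the plume.
        overlap = min(lower, p_bottom) - max(upper, p_top)
        fractions.append(max(overlap, 0.0) / plume_depth)
    return fractions
```

When the plume lies entirely within the layer stack, the fractions sum to 1, so all of the source's emissions are placed in the layers the plume intersects.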

3.3.7 Emissions Modeling Ancillary Files
The methods used to perform spatial allocation for the 2007 platform are summarized in this section.  For
the 2007 platform, spatial factors are typically applied by country and SCC. As described earlier,  spatial
allocation was performed for a national 12-km domain. To accomplish this, SMOKE used national 12-
km  spatial surrogates and a SMOKE area-to-point data file. For the U.S., the spatial surrogates used
2010-based data (e.g., population) wherever possible. For Mexico, the same spatial surrogates were used
as in the 2005 platform. For Canada we used a set of Canadian surrogates provided by Environment
Canada, also unchanged from the 2005v4.3 platform.  The U.S., Mexican, and Canadian 12-km
surrogates cover the entire CONUS domain 12US1  shown in Figure 3-1.  The remainder of this
subsection provides further details on the origin of the data used for the spatial surrogates and the area-to-
point data.
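
Spatial allocation with a surrogate amounts to multiplying each county total by the fraction of the surrogate quantity (for example, population) that falls in each grid cell. The sketch below uses plain dictionaries rather than the SMOKE surrogate file format; the function name, county identifiers, and cell indices are illustrative.

```python
def allocate_to_grid(county_emissions, surrogate):
    """Distribute county totals to grid cells.

    county_emissions : dict mapping county ID to emissions
    surrogate        : dict mapping county ID to {cell: fraction}, where the
                       fractions for a county sum to 1
    Returns a dict mapping cell to gridded emissions.
    """
    gridded = {}
    for county, emissions in county_emissions.items():
        for cell, fraction in surrogate[county].items():
            gridded[cell] = gridded.get(cell, 0.0) + emissions * fraction
    return gridded
```

A cell that overlaps several counties accumulates a contribution from each, and the gridded total matches the county total as long as each county's fractions sum to 1.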

The SMOKE ancillary data files, particularly the cross-reference files, provide the specific inventory
resolution at which spatial, speciation, and temporal factors are applied. For the 2009 Platform, spatial
factors were generally applied by country/SCC, speciation factors by pollutant/SCC or (for combination
profiles) state/county FIPS code and month, and temporal factors by some combination of country, state,
county, SCC, and pollutant.

3.3.7.1 Surrogates for U.S. Emissions
More than sixty spatial surrogates were used to spatially allocate U.S. county-level emissions to the
CMAQ 12-km grid cells. The Surrogate Tool was used to generate all of the surrogates. The shapefiles
input to the Surrogate Tool are provided and documented at
                                                           The tool and updated documentation for
it are available at http://www.ie.unc.edu/cempd/projects/mims/spatial/ and
http://www.cmascenter.org/help/documentation.cfm?MODEL=spatialallocator&VERSION=3.6&temp_id=99999.
The detailed steps in developing the county boundaries for the surrogates are documented at
ftp://ftp.epa.gov/EmisInventory/emiss_shp2006/us/metadata_for_2002_county_boundary_shapefiles_rev.pdf.

Table 3-11 lists the codes and descriptions of the surrogates. The surrogates in bold have been updated
with 2010-based data, including 2010 census data at the block group level, 2010 American Community
Survey Data for heating fuels, 2010 TIGER/Line data for railroads and roads, and 2010 National
Transportation Atlas Data for ports and navigable waterways. For this project "Version 3" of the 2010-based
spatial surrogates was used. Not all of the available surrogates are used to spatially allocate sources
in the 2007 platform; that is, some surrogates shown in Table 3-11 were not assigned to any SCCs. An
area-to-point approach overrides the use of surrogates for some airport-related sources.

Alternative surrogates for ports (801) and shipping lanes (802) were developed from the 2008 NEI
shapefiles: Ports_032310_wrf and ShippingLanes_111309FINAL_wrf. These surrogates were used for
c1 and c2 commercial marine emissions instead of the standard 800 and 810 surrogates, respectively.
For the onroad sector, the on-network (RPD) emissions were spatially allocated to roadways, while the
off-network (RPP and RPV) emissions were allocated to parking areas. For the onroad_rfl sector, the
emissions were spatially allocated to gas station locations.

For the oil and gas sources in the nonpt sector, the WRAP Phase III sources have detailed basin-specific
spatial surrogates shown in Table 3-12.  The remaining oil and gas sources used the 2005-based surrogate
"Oil & Gas Wells, IHS Energy, Inc. and USGS" (680) developed for oil and gas SCCs. The surrogates in
Table 3-12 were applied for the counties listed in Table 3-13.

3.3.7.3 Allocation Method for Airport-Related Sources in the U.S.
There are numerous airport-related emission sources in the 2005 NEI, such as aircraft, airport ground
support equipment, and jet refueling. In the 2002 platform most of these emissions were contained in
sectors with county-level resolution, namely alm (aircraft), nonroad (airport ground support) and nonpt (jet
refueling), but in the 2005 and 2008 platforms aircraft emissions are included as point sources as part of
the ptnonipm sector.

For the 2009 platform, the SMOKE "area-to-point" approach was used for airport ground support
equipment (nonroad sector), and jet refueling (nonpt sector). The approach is described in detail in the
2002 Platform documentation: http://www.epa.gov/scram001/reports/Emissions%20TSD%20Vol1_02-28-08.pdf.

Nearly the same ARTOPNT file was used to implement the area-to-point approach as was done for the
CAP and HAP-2002-based Platform. This was slightly updated from the CAP-only 2002 Platform by
further allocating the Detroit-area airports into multiple sets of geographic coordinates to support finer
scale modeling. The updated file was retained for the 2009 Platform.

3.3.7.4 Surrogates for Canada and Mexico Emission Inventories
The Mexican emissions and single surrogate (population) were the same as  those used  in the 2002 and
2005 Platforms. For Canada, surrogates provided by Environment Canada with the 2006 emissions were
used to spatially allocate the 2006 Canadian emissions for the 2005 and 2009 Platforms.

The Canadian surrogate data described in Table 3-14 came from Environment Canada. They provided
both the surrogates and cross references; the surrogates were  outputs from the Surrogate Tool (previously
referenced). Per Environment Canada, the surrogates are based on 2001 Canadian census data. The cross-
references that Canada originally provided were updated as follows: all assignments to  surrogate '978'
(manufacturing industries) were changed to '906'  (manufacturing services), and all assignments to '985'
(construction and mining) and '984' (construction industries) were changed to '907' (construction
services) because the surrogate fractions in 984, 978 and 985 did not sum to 1. Codes for surrogates other
than population that did not begin with the digit "9" were also changed.
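
The cross-reference update described above is a straightforward code substitution, and the reason for it (surrogate fractions that do not sum to 1) is easy to test. In the sketch below the remapping pairs come from the text, while the function names, the tolerance, and the dictionary layout keyed by a source identifier are hypothetical.

```python
# Surrogate code substitutions stated in the text.
SURROGATE_REMAP = {"978": "906", "985": "907", "984": "907"}


def remap_cross_reference(xref):
    """Return a copy of a {source_id: surrogate_code} mapping with the
    problematic surrogate codes replaced."""
    return {source: SURROGATE_REMAP.get(code, code)
            for source, code in xref.items()}


def sums_to_one(fractions, tol=1e-6):
    """Check that a set of surrogate fractions sums to 1 within a tolerance."""
    return abs(sum(fractions) - 1.0) <= tol
```

A surrogate whose fractions fail the sums_to_one check would lose or create emissions during gridding, which is why assignments to such surrogates were redirected.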
                     Table 3-11. U.S. Surrogates Available for the 2009 Platform
    Code   Surrogate Description
    N/A    Area-to-point approach (see 3.3.1.2)
    100    Population
    110    Housing
    120    Urban Population
    130    Rural Population
    137    Housing Change
    140    Housing Change and Population
    150    Residential Heating - Natural Gas
    160    Residential Heating - Wood
    165    0.5 Residential Heating - Wood plus 0.5 Low Intensity Residential
    170    Residential Heating - Distillate Oil
    180    Residential Heating - Coal
    190    Residential Heating - LP Gas
    200    Urban Primary Road Miles
    210    Rural Primary Road Miles
    220    Urban Secondary Road Miles
    230    Rural Secondary Road Miles
    240    Total Road Miles
    250    Urban Primary plus Rural Primary
    255    0.75 Total Roadway Miles plus 0.25 Population
    260    Total Railroad Miles
    270    Class 1 Railroad Miles
    280    Class 2 and 3 Railroad Miles
    300    Low Intensity Residential
    310    Total Agriculture
    312    Orchards/Vineyards
    320    Forest Land
    330    Strip Mines/Quarries
    340    Land
    350    Water
    400    Rural Land Area
    500    Commercial Land
    505    Industrial Land
    510    Commercial plus Industrial
    515    Commercial plus Institutional Land
    520    Commercial plus Industrial plus Institutional
    525    Golf Courses + Institutional + Industrial + Commercial
    527    Single Family Residential
    530    Residential - High Density
    535    Residential + Commercial + Industrial + Institutional
    540    Retail Trade
    545    Personal Repair
    550    Retail Trade plus Personal Repair
    555    Professional/Technical plus General Government
    560    Hospital
    565    Medical Office/Clinic
    570    Heavy and High Tech Industrial
    575    Light and High Tech Industrial
    580    Food, Drug, Chemical Industrial
    585    Metals and Minerals Industrial
    590    Heavy Industrial
    595    Light Industrial
    596    Industrial plus Institutional plus Hospitals
    600    Gas Stations
    650    Refineries and Tank Farms
    675    Refineries and Tank Farms and Gas Stations
    680    Oil and Gas
    700    Airport Areas
    710    Airport Points
    720    Military Airports
    800    Marine Ports
    801    NEI Ports
    802    NEI Shipping Lanes
    807    Navigable Waterway Miles
    810    Navigable Waterway Activity
    850    Golf Courses
    860    Mines
    870    Wastewater Treatment Facilities
    880    Drycleaners
    890    Commercial Timber

               Table 3-12. Spatial Surrogates for WRAP Oil and Gas Data
Country   Code   Surrogate Description
USA       699    Gas production at CBM wells
USA       698    Well count - gas wells
USA       697    Oil production at gas wells
USA       696    Gas production at gas wells
USA       695    Well count - oil wells
USA       694    Oil production at Oil wells
USA       693    Well count - all wells
USA       692    Spud count
USA       691    Well count - CBM wells
USA       690    Oil production at all wells
USA       689    Gas production at all wells

                   Table 3-13. Counties included in the WRAP Dataset
FIPS    State      County         FIPS    State        County
8001    Colorado   Adams          30075   Montana      Powder River
8005    Colorado   Arapahoe       35031   New Mexico   McKinley
8007    Colorado   Archuleta      35039   New Mexico   Rio Arriba
8013    Colorado   Boulder        35043   New Mexico   Sandoval
8014    Colorado   Broomfield     35045   New Mexico   San Juan
8029    Colorado   Delta          49007   Utah         Carbon
8031    Colorado   Denver         49009   Utah         Daggett
8039    Colorado   Elbert         49013   Utah         Duchesne
8043    Colorado   Fremont        49015   Utah         Emery
8045    Colorado   Garfield       49019   Utah         Grand
8051    Colorado   Gunnison       49043   Utah         Summit
8063    Colorado   Kit Carson     49047   Utah         Uintah
8067    Colorado   La Plata       56001   Wyoming      Albany
8069    Colorado   Larimer        56005   Wyoming      Campbell
8073    Colorado   Lincoln        56007   Wyoming      Carbon
8075    Colorado   Logan          56009   Wyoming      Converse
8077    Colorado   Mesa           56011   Wyoming      Crook
8081    Colorado   Moffat         56013   Wyoming      Fremont
8087    Colorado   Morgan         56019   Wyoming      Johnson
8095    Colorado   Phillips       56023   Wyoming      Lincoln
8103    Colorado   Rio Blanco     56025   Wyoming      Natrona
8107    Colorado   Routt          56027   Wyoming      Niobrara
8115    Colorado   Sedgwick       56033   Wyoming      Sheridan
8121    Colorado   Washington     56035   Wyoming      Sublette
8123    Colorado   Weld           56037   Wyoming      Sweetwater
8125    Colorado   Yuma           56041   Wyoming      Uinta
30003   Montana    Big Horn       56045   Wyoming      Weston

Table 3-14. Canadian Spatial Surrogates for Canadian Emissions
Code
9100
9101
9102
9103
9104
9106
9111
9113
9114
9115
9116
9211
9212
9213
9219
9221
9222
9231
9232
9233
9308
9309
9313
Description
Population
Total dwelling
Urban dwelling
Rural dwelling
Total Employment
ALLJNDUST
Farms
Forestry and logging
Fishing hunting and trapping
Agriculture and forestry activities
Total Resources
Oil and Gas Extraction
Mining except oil and gas
Mining and Oil and Gas Extract
activities
Mining-unspecified
Total Mining
Utilities
Construction except land subdivision
and land development
Land subdivision and land
development
Total Land Development
Food manufacturing
Beverage and tobacco product
manufacturing
Textile mills
Code
9493
9494
9511
9512
9513
9514
9516
9521
9522
9523
9524
9526
9528
9531
9532
9533
9534
9541
9551
9561
9562
9611
9621
Description
Warehousing and storage
Total Transport and warehouse
Publishing and information services
Motion picture and sound recording
industries
Broadcasting and
tel ecommuni cati ons
Data processing services
Total Info and culture
Monetary authorities - central bank
Credit intermediation activities
Securities commodity contracts and
other financial investment activities
Insurance carriers and related
activities
Funds and other financial vehicles
Total Banks
Real estate
Rental and leasing services
Lessors of non-financial intangible
assets (except copyrighted works)
Total Real estate
Professional scientific and technical
services
Management of companies and
enterprises
Administrative and support services
Waste management and remediation
services
Education Services
Ambulatory health care services
                         51

-------
Code
9314
9315
9316
9321
9322
9323
9324
9325
9326
9327
9331
9332
9333
9334
9335
9336
9337
9338
9339
9411
9412
9413
9414
Description
Textile product mills
Clothing manufacturing
Leather and allied product
manufacturing
Wood product manufacturing
Paper manufacturing
Printing and related support activities
Petroleum and coal products
manufacturing
Chemical manufacturing
Plastics and rubber products
manufacturing
Non-metallic mineral product
manufacturing
Primary Metal Manufacturing
Fabricated metal product
manufacturing
Machinery manufacturing
Computer and Electronic
manufacturing
Electrical equipment appliance and
component manufacturing
Transportation equipment
manufacturing
Furniture and related product
manufacturing
Miscellaneous manufacturing
Total Manufacturing
Farm product wholesaler-distributors
Petroleum product wholesaler-
distributors
Food beverage and tobacco
whol esal er-di stributor s
Personal and household goods
whol esal er-di stributor s
Code
9622
9623
9624
9625
9711
9712
9713
9721
9722
9723
9811
9812
9813
9814
9815
9911
9912
9913
9914
9919
9920
9921
9922
Description
Hospitals
Nursing and residential care
facilities
Social assistance
Total Service
Performing arts spectator sports and
related industries
Heritage institutions
Amusement gambling and
recreation industries
Accommodation services
Food services and drinking places
Total Tourism
Repair and maintenance
Personal and laundry services
Religious grant-making civic and
professional and similar
organizations
Private households
Total other services
Federal government public
administration
Provincial and territorial public
administration (9121 to 9129)
Local municipal and regional public
administration (9131 to 9139)
Aboriginal public administration
International and other extra-
territorial public administration
Total Government
Commercial Fuel Combustion
TOTAL DISTRIBUTION AND
RETAIL
52

-------
Code  Description
9415  Motor vehicle and parts wholesaler-distributors
9416  Building material and supplies wholesaler-distributors
9417  Machinery equipment and supplies wholesaler-distributors
9418  Miscellaneous wholesaler-distributors
9419  Wholesale agents and brokers
9420  Total Wholesale
9441  Motor vehicle and parts dealers
9442  Furniture and home furnishings stores
9443  Electronics and appliance stores
9444  Building material and garden equipment and supplies dealers
9445  Food and beverage stores
9446  Health and personal care stores
9447  Gasoline stations
9448  Clothing and clothing accessories stores
9451  Sporting goods hobby book and music stores
9452  General Merchandise stores
9453  Miscellaneous store retailers
9454  Non-store retailers
9455  Total Retail
9481  Air transportation
9482  Rail transportation
9483  Water Transportation
9484  Truck transportation
9485  Transit and ground passenger transportation
9486  Pipeline transportation
9487  Scenic and sightseeing transportation
9488  Support activities for transportation
9491  Postal service
9492  Couriers and messengers
Code  Description
9923  TOTAL INSTITUTIONAL AND GOVERNMENT
9924  Primary Industry
9925  Manufacturing and Assembly
9926  Distribution and Retail (no petroleum)
9927  Commercial Services
9928  Commercial Meat cooking
9929  HIGHJET
9930  LOWMEDJET
9931  OTHERJET
9932  CANRAIL
9933  Forest fires
9941  PAVED ROADS
9942  UNPAVED ROADS
9943  HIGHWAY
9944  ROAD
9945  Commercial Marine Vessels
9946  Construction and mining
9947  Agriculture Construction and mining
9950  Intersection of Forest and Housing
9960  TOTBEEF
9970  TOTPOUL
9980  TOTSWIN
9990  TOTFERT
9993  Trail
9994  ALLROADS
9995  30UNPAVED_70trail
9996  Urban area
9997  CHBOISQC
9991  Traffic
53

-------
       REFERENCES

Adelman, Z. 2012. Memorandum:  Fugitive Dust Modeling for the 2008 Emissions Modeling Platform. UNC
       Institute for the Environment, Chapel Hill, NC. September, 28, 2012.
Anderson, O.K.; Sandberg, D.V; Norheim, R.A., 2004. Fire Emission Production Simulator (FEPS) User's
       Guide. Available at http://www.fs.fed.us/pnw/fera/feps/FEPS  users guide.pdf
Bullock Jr., R, and K. A. Brehme (2002) "Atmospheric mercury simulation using the CMAQ model:
       formulation description and analysis  of wet deposition results." Atmospheric Environment 36, pp 2135-
       2146.
ERG, 2006. Mexico National Emissions Inventory, 1999: Final, prepared by Eastern Research Group for
       Secretariat of the Environment and Natural Resources and the National Institute of Ecology, Mexico,
       October 11,2006.  Available at:
       http://www.epa.gOv/ttn/chief/net/mexico/l 999_mexico_nei_final_report.pdf
Environ Corp. 2008. Emission Profiles for EPA SPECIATE Database, Part 2: EPAct Fuels (Evaporative
       Emissions).  Prepared for U. S. EPA, Office of Transportation and Air Quality, September 30, 2008.
EPA, 2005. EPA 's National Inventory Model (NMIM), A Consolidated Emissions Modeling System for
       MOBILE6 andNONROAD, U.S. Environmental Protection Agency, Office of Transportation and Air
       Quality, Assessment and Standards Division. Ann Arbor, MI 48105, EPA420-R-05-024, December 2005.
       Available at http://www.epa.gov/otaq/models/nmim/420r05024.pdf.
EPA 2006a.  SPECIATE 4.0, Speciation Database Development Document, Final Report, U.S. Environmental
       Protection Agency, Office of Research and Development, National Risk Management Research Laboratory,
       Research Triangle Park,  NC 27711, EPA600-R-06-161, February 2006. Available at
       http://www.epa.gov/ttn/chief/software/speciate/speciate4/documentation/speciatedoc_1206.pdf.
EPA, 2012a. 2008 National Emissions Inventory, version 2 Technical Support Document.  Office of Air Quality
       Planning and Standards,  Air Quality Assessment Division, Research Triangle Park, NC. Available at:
       http://www.epa.gov/ttn/chief/net/2008inventorv.htmltfinventorydoc
Frost & Sullivan, 2010. "Project: Market Research and Report  on North American Residential Wood Heaters,
       Fireplaces, and Hearth Heating Products Market (P.O. # PO1-EVIP403-F&S). Final Report April 26,
       2010".  Prepared by Frost & Sullivan, Mountain View,  CA 94041.
Joint Fire Science Program, 2009. Consume  3.0~a software tool for computing fuel consumption. Fire Science
       Brief. 66,  June 2009.  Consume 3.0 is available at:
       http://www.fs.fed.us/pnw/fera/research/smoke/consume/index.shtml
Kochera, A., 1997. "Residential Use of Fireplaces," Housing Economics, March 1997, 10-11. Also see:
       http://www.epa.gov/ttnchiel/conference/eilO/area/houck.pdf.
LADCO, 2012. "Regional Air Quality Analyses for Ozone, PM2.5,  and Regional Haze: Base C Emissions
       Inventory (September 12, 2011)". Lake Michigan Air Directors Consortium, Rosemont, IL 60018.
       Available at: http://www.ladco.org/tech/emis/basecv8/Base  C Emissions  Documentation  Sept 12.pdf
McCarty, J.L., Korontzi, S., Jutice,  C.O., and T. Loboda. 2009. The spatial and temporal  distribution of crop
       residue burning in the contiguous United States. Science of the Total Environment,  407 (21): 5701-5712.
McKenzie, D.; Raymond,  C.L.;  Kellogg, L.-K.B.; Norheim, R.A; Andreu, A.G.; Bayard,  A.C.; Kopper, K.E.;
       Elman. E. 2007. Mapping fuels at multiple scales: landscape application of the Fuel Characteristic
       Classification  System. Canadian Journal of Forest Research. 37:2421-2437. Oak Ridge National
       Laboratory, 2009.  Analysis of Fuel Ethanol Transportation Activity and Potential Distribution
       Constraints. U.S. Department of Energy, March 2009.  Docket No.  EPA-HQ-OAR-2010-0133.
Ottmar, R.D.; Sandberg, D.V.; Bluhm, A. 2003. Biomass consumption and carbon pools. Poster.  In: Galley,
       K.E.M., Klinger, R.C.; Sugihara, N.G. (eds.) Proceedings of Fire Ecology, Prevention, and Management.
                                                54

-------
       Misc. Pub. 13, Tallahassee, FL: Tall Timbers Research Station.
Ottmar, R.D.; Prichard, S.J.; Vihnanek, R.E.;  Sandberg, D.V. 2006. Modification and validation of fuel
       consumption models for shrub and forested lands in the Southwest, Pacific Northwest, Rockes, Midwest,
       Southeast, and Alaska. Final report, JFSP Project 98-1-9-06.
Ottmar, R.D.; Sandberg, D.V.; Riccardi, C.L.; Prichard, SJ. 2007. An Overview of the Fuel Characteristic
       Classification System - Quantifying, Classifying, and Creating Fuelbeds for Resource Planning.
       Canadian Journal of Forest Research. 37(12): 2383-2393. FCCS is available at:
       http://www.fs.fed.us/pnw/fera/fccs/index.shtml
Pouliot, G., H. Simon, P. Bhave, D. Tong, D.  Mobley, T. Pace, and T. Pierce .  (2010) "Assessing the
       Anthropogenic Fugitive Dust Emission Inventory and Temporal Allocation Using an Updated Speciation
       of Paniculate Matter." International Emission Inventory Conference, San Antonio, TX.  Available at
       http://www.epa.gov/ttn/chief/conference/eil9/session9/pouliot.pdf
Raffuse, S., N. Larkin, P. Lahm, Y. Du, 2012. Development of Version 2 of the Wildland Fire Portion of the
       [2011] National Emissions Inventory.  International Emission Inventory Conference, Tampa, FL.
       Available at: http://www.epa.gov/ttn/chief/conference/ei20/session2/sraffuse.pdf
Raffuse, S., D. Sullivan, L. Chinkin, S. Larkin, R. Solomon, A. Soja, 2007. Integration of Satellite-Detected and
       Incident Command Reported Wildfire Information into BlueSky, June 27, 2007.  Available at:
       http://getblueskv.org/smartfire/docs.cfm
Russell, A.G. and G.R. Cass, 1986. Verification of a Mathematical Model for Aerosol Nitrate and Nitric Acid
       Formation and Its Use for Control Measure Evaluation, Atmospheric Environment, 20: 2011-2025.
SESARM, 2012a. "Development of the 2007 Base Year and Typical Year Fire Emission Inventory for the
       Southeastern States", Air Resources Managers, Inc., Fire Methodology, AMEC Environment and
       Infrastructure, Inc. AMEC Project No.: 6066090326, April, 2012
SESARM, 2012b.  "Area and Nonroad 2007 Base Year Inventories. Revised Final Report", Contract No. S-2009-06-01,
       Prepared by Transystems Corporation, January 2012. Available at:
       http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&cad=rja&ved=OCDAQFjAC&url=ftp%
       3 A%2F%2Fwsip-70-164-45 -
       196.dc.dc.cox.net%2Fpublic%2FSESARM%2FRevised%2520Final%2FSESARM%2520Base%2520Year%2520
       Revised%2520Final%2520Report Jan2012.docx&ei=xU-AUPulF4WAOAHC5YHYCg&usg=AFOiCNFhigx3Ej-
       hbfYmMUP4zGI  HBiqZA&sig2=hWWNOm3WYPSO28QSzn5BIA.
Skamarock, W., J. Klemp, J. Dudhia, D. Gill,  D. Barker, M. Duda, X. Huang, W. Wang, J.  Powers, 2008. A
       Description of the Advanced Research WRF Version 3. NCAR Technical Note.  National Center for
       Atmospheric Research, Mesoscale and Microscale Meteorology Division, Boulder,  CO. June 2008.
       Available at: http://www.mmm.ucar.edu/wrf/users/docs/arw_v3.pdf
Sullivan D.C., Raffuse S.M., Pryden D.A., Craig K.J., Reid S.B., Wheeler N.J.M., Chinkin L.R., Larkin N.K.,
       Solomon R., and Strand T. (2008) Development and applications of systems for modeling emissions and
       smoke from fires: the BlueSky smoke  modeling framework and SMARTFIRE: 17th International
       Emissions Inventory Conference, Portland, OR,  June 2-5. Available at:
       http://www.epa.gov/ttn/chief/conferences.html
Wang, Y., P. Hopke, O.  V. Rattigan, X. Xia, D. C. Chalupa, M. J. Utell. (2011) "Characterization of Residential
       Wood Combustion Particles Using the Two-Wavelength Aethalometer", Environ. Sci. Technol., 45 (17),
       pp 7387-7393
Yarwood, G., S. Rao, M. Yocke, and G. Whitten, 2005: Updates to the Carbon Bond Chemical Mechanism:
       CB05. Final Report to the US EPA, RT-0400675.  Available at
       http://www.camx.com/publ/pdfs/CB05 Final  Report 120805.pdf
                                                 55

-------
                      4.0   CMAQ Air Quality Model Estimates


4.1    Introduction to the CMAQ Modeling Platform

The Clean Air Act (CAA) provides a mandate to assess and manage air pollution levels to protect human
health and the environment. EPA has established National Ambient Air Quality Standards (NAAQS),
requiring the development of effective emissions control strategies for such pollutants as ozone and
particulate matter. Air quality models are used to develop these emission control strategies to achieve the
objectives of the CAA.

Historically, air quality models have addressed individual pollutant issues separately. However, many of
the same precursor chemicals are involved in both ozone and aerosol (particulate matter) chemistry;
therefore, the chemical transformation pathways are interdependent. Thus, modeled abatement strategies of
pollutant precursors, such as volatile organic compounds (VOC) and NOx to reduce ozone levels, may
exacerbate other air pollutants such as particulate matter.

To meet the need to address the complex relationships between pollutants, EPA developed the Community
Multiscale Air Quality (CMAQ) modeling system. The primary goals for CMAQ are to:

   •   Improve the environmental management community's ability to evaluate the impact of air quality
       management practices for multiple pollutants at multiple scales.
   •   Improve the scientist's ability to better probe, understand, and simulate chemical and physical
       interactions in the atmosphere.

The CMAQ modeling system brings together key physical and chemical functions associated with the
dispersion and transformations of air pollution at various scales. It was designed to approach air quality as
a whole by including state-of-the-science capabilities for modeling multiple air quality issues, including
tropospheric ozone, fine particles, toxics, acid deposition, and visibility degradation. CMAQ relies on
emission estimates from various sources, including the U.S. EPA Office of Air Quality Planning and
Standards' current emission inventories, observed emissions from major utility stacks, and model estimates
of natural emissions from biogenic and agricultural sources. CMAQ also relies on meteorological
predictions that include assimilation of meteorological observations as constraints. Emissions and
meteorology data are fed into CMAQ and run through various algorithms that simulate the physical and
chemical processes in the atmosphere to provide estimated concentrations of the pollutants. Traditionally,
the model has been used to predict air quality across a regional or national domain and then to simulate
the effects of various changes in emission levels for policymaking purposes. For health studies, the model
can also be used to provide supplemental information about air quality in areas where no monitors exist.

CMAQ was also designed to have multi-scale capabilities so that separate models were not needed for
urban and regional scale air quality modeling. The grid spatial resolutions in past annual CMAQ runs
have been 36 km x 36 km per grid cell for the "parent" domain, with 12 km x 12 km grid resolution
domains nested within it. The parent domain typically covered the continental United States, and the
nested 12 km x 12 km domain covered the Eastern or Western United States. The CMAQ simulation
performed for this 2009 assessment used a single domain that covers the entire continental U.S. (CONUS)
and large portions of Canada and Mexico using 12 km by 12 km horizontal grid spacing. For urban
applications, CMAQ has also been applied with a 4-km x 4-km grid resolution for urban core areas;
however, the uncertainties in emissions and meteorology information can actually increase at such a high
resolution. Currently, 12 km x 12 km is the highest resolution recommended for most applications.
With the temporal flexibility of the model, simulations can be performed to evaluate longer
term (annual to multi-year) pollutant climatologies as well as short-term (weeks to months) transport from
localized sources. By making CMAQ a modeling system that addresses multiple pollutants and different
temporal and spatial scales, CMAQ has a "one atmosphere" perspective that combines the efforts of the
scientific community. Improvements will be made to the CMAQ modeling system as the scientific
community further develops the state-of-the-science. For more information on CMAQ, go to
http://www.epa.gov/asmdnerl/CMAQ or http://www.cmascenter.org.

4.1.1   Advantages and Limitations of the CMAQ Air Quality Model
An advantage of using the CMAQ model output for comparing with health outcomes is that it has the
potential to provide complete spatial and temporal coverage. Additionally, meteorological predictions,
which are also needed when comparing health outcomes, are available for every grid cell along with the
air quality predictions.

A disadvantage of using CMAQ is that, as a deterministic model, it has none of the statistical qualities of
interpolation techniques that fit the observed data to one degree or another. Furthermore, the emissions
and meteorological data used in CMAQ each have large uncertainties, in particular for unusual emission
or meteorological events. There are also uncertainties associated with the chemical transformation and
fate process algorithms used in air quality models. Together, these emissions, meteorological, and modeling
uncertainties cause CMAQ to predict best over longer time scales (e.g., synoptic, monthly, and annual
scales) and to be most error prone at high temporal and spatial resolutions compared to direct measurements.

One practical disadvantage of using CMAQ output is that the regularly spaced grid cells do not line up
directly with counties or ZIP codes, which are the geographical units over which health outcomes are
likely to be aggregated. However, it is possible to overlay grid cells with county or ZIP code boundaries and
devise means of assigning an exposure level that nonetheless provides more complete coverage than that
available from ambient data alone. Another practical disadvantage is that CMAQ requires significant data
and computing resources to obtain results for daily environmental health surveillance.
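
The overlay of grid cells with county boundaries can be illustrated with a minimal sketch. It assumes that the overlap area between each grid cell and each county has already been computed with a GIS tool, and it uses simple area weighting, which is only one possible means of assigning an exposure level (population weighting is another). The function name, identifiers, and values are hypothetical and are not taken from this report.

```python
from collections import defaultdict


def county_exposure(cell_conc, overlaps):
    """Area-weighted mean concentration per county.

    cell_conc: dict mapping a grid cell id to its modeled concentration.
    overlaps:  iterable of (county_id, cell_id, overlap_area_km2) tuples,
               assumed to be precomputed from a GIS overlay.
    """
    weighted_sum = defaultdict(float)
    total_area = defaultdict(float)
    for county, cell, area_km2 in overlaps:
        if cell not in cell_conc:
            continue  # no model value for this cell
        weighted_sum[county] += cell_conc[cell] * area_km2
        total_area[county] += area_km2
    return {c: weighted_sum[c] / total_area[c]
            for c in weighted_sum if total_area[c] > 0}


# Hypothetical example: two 12 km cells (144 km2 each) and two counties.
example_conc = {(10, 20): 8.0, (11, 20): 12.0}
example_overlaps = [
    ("County A", (10, 20), 108.0),
    ("County A", (11, 20), 36.0),
    ("County B", (11, 20), 144.0),
]
example_result = county_exposure(example_conc, example_overlaps)
```

In this example County A receives a value between the two cell concentrations, weighted toward the cell that covers more of its area.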

This section describes the air quality modeling platform used for the 2009 CMAQ simulation. A modeling
platform is a structured system of connected modeling-related tools and data that provide a consistent and
transparent basis for assessing the air quality response to changes in emissions and/or meteorology. A
platform typically consists of a specific air quality model, emissions estimates, a set of meteorological
inputs, and estimates of "boundary conditions" representing pollutant transport from source areas outside
the region modeled. We used the CMAQ⁶ model as part of the 2009 Platform to provide a national scale
air quality modeling analysis. The CMAQ model simulates the multiple physical and chemical processes
involved in the formation, transport, and destruction of ozone and fine particulate matter.
This section provides a description of each of the main components of the 2009 CMAQ simulation along
with the results of a model performance evaluation in which the 2009 model predictions are compared to
corresponding measured concentrations.

6 Byun, D.W., and K. L. Schere, 2006: Review of the Governing Equations, Computational Algorithms, and Other
Components of the Models-3 Community Multiscale Air Quality (CMAQ) Modeling System. Applied Mechanics
Reviews, Volume 59, Number 2 (March 2006), pp. 51-77.


4.2    CMAQ Model Version, Inputs and Configuration

4.2.1  Model Version
CMAQ is a non-proprietary computer model that simulates the formation and fate of photochemical
oxidants, including PM2.5 and ozone, for given input sets of meteorological conditions and emissions. The
CMAQ model version 4.7 was most recently peer-reviewed in February of 2009 for the U.S. EPA⁷. As
mentioned previously, CMAQ includes numerous science modules that simulate the emission,
production, decay, deposition and transport of organic and inorganic gas-phase and particle-phase pollutants in the
atmosphere. This analysis employed a version of CMAQ based on the latest publicly released version of
CMAQ (i.e., version 4.7.1⁸) at the time of the 2009 air quality modeling. CMAQ version 4.7.1 reflects
updates to version 4.7 to improve the underlying science which include aqueous chemistry mass
conservation improvements and improved vertical convective mixing. The model enhancements in
version 4.7.1 also include:

1.  Aqueous chemistry
   •  Mass conservation improvements
          -   Imposed one second minimum timestep for remainder of the cloud lifetime after 100
              'iterations' in the solver
          -   Force mass balance for the last timestep in the cloud by limiting oxidized amount to mass
              available
   •  Implemented steady state assumption for OH
   •  Only allow sulfur oxidation to control the aqueous chemistry  solver timestep (previously,
       reactions of OH, GLY, MGLY, and Hg for multipollutant model also controlled the timestep)

2.  Advection
   •  Added additional divergence-based constraint on advection timestep
   •  Vertical advection in the Yamo module is now represented with the PPM scheme to limit
       numerical diffusion
7 Allen, D., Burns, D., Chock, D., Kumar, N., Lamb, B., Moran, M. (February 2009 Draft Version). Report on the
Peer Review of the Atmospheric Modeling and Analysis Division, NERL/ORD/EPA. U.S. EPA, Research Triangle
Park, NC. CMAQ version 4.7 was released in December 2008. It is available from the Community Modeling and
Analysis System (CMAS), along with previous peer-review reports, at: http://www.cmascenter.org.

8 CMAQ version 4.7.1 model code is available from the Community Modeling and Analysis System (CMAS) at:
http://www.cmascenter.org.

-------
3.  Model time step determination
   •   Fixed a potential advection time step error
          -   The sum of the advection steps for a given layer time step might not equal the output time
              step duration in some extreme cases
          -   Ensured that the advection steps sum up to the synchronization step

4.  Horizontal diffusion
   •   Fixed a potential error
          -   Concentration data may not be correctly initialized if multiple sub-cycle time steps are
              required
          -   Fix to initialize concentrations with values calculated in the previous sub-time step

5.  Emissions
   •   Bug fix in EMIS_DEFN.F to include point source layer 1 NH3 emissions
   •   Bug fix to calculate soil NO "pulse" emissions in BEIS
   •   Remove excessive logging of cases where ambient air temperature exceeds 315.0 Kelvin. When
       this occurs, the values are just slightly over 315
   •   Bug fix for parallel decomposition errors in plume rise emissions

6.  Photolysis
   •   JPROC/phot_table and phot_sat options
          -   Expanded lookup tables to facilitate applications across the globe and vertical extent to
              20 km
          -   Updated temperature adjustments for absorption cross sections and quantum yields
          -   Revised algorithm that processes TOMS datasets for OMI data format
   •   In-line option
          -   Asymmetry factor calculation updated using values from Mie theory integrated over log
              normal particle distribution; added special treatment for large particles in asymmetry factor
              algorithm to avoid numerical instabilities

4.2.2   Model Domain and Grid Resolution
The CMAQ modeling  analyses were performed for a domain covering the continental United States, as
shown in Figure 4-1. This single domain covers the entire continental U.S. (CONUS) and large portions
of Canada and Mexico using 12 km by  12 km horizontal grid spacing. The model extends vertically from
the surface to 50 millibars (approximately 19 km) using a sigma-pressure coordinate system. Air quality
conditions at the outer boundary of the  12 km domain were taken from a global model. Table 4-1 provides
some basic geographic information regarding the 12 km CMAQ domain.

-------
                Table 4-1. Geographic Information for 12 km Modeling Domain

                   National 12 km CMAQ Modeling Configuration
                   Map Projection       Lambert Conformal Projection
                   Grid Resolution      12 km
                   Coordinate Center    97 W, 40 N
                   True Latitudes       33 N and 45 N
                   Dimensions           459 x 299 x 24
                   Vertical Extent      24 layers: surface to 50 mb level

[Figure 4-1 map label: 12 km CONUS nationwide domain; x, y origin: -2556000, -1728000; col: 459, row: 299]
Figure 4-1. Map of the CMAQ Modeling Domain. The blue box denotes the 12 km national
modeling domain. (Same as Figure 3-1.)
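
As an illustration of how this grid definition can be used, the sketch below maps a longitude/latitude pair to a 12 km grid cell using the standard spherical Lambert Conformal Conic forward equations. The projection center, true latitudes, and grid dimensions come from this section; the grid origin (-2556000, -1728000 m) is read from the partly illegible Figure 4-1 label, and the 6,370 km spherical Earth radius is an assumption, so treat both as such. The function names and the 0-based column/row convention are hypothetical choices made for this example.

```python
import math

R_EARTH_M = 6370000.0                      # assumed spherical Earth radius (m)
LON0, LAT0 = -97.0, 40.0                   # projection center (degrees)
LAT1, LAT2 = 33.0, 45.0                    # true latitudes (degrees north)
X_ORIG, Y_ORIG = -2556000.0, -1728000.0    # assumed lower-left grid corner (m)
CELL_M = 12000.0                           # 12 km grid spacing
NCOLS, NROWS = 459, 299


def _t(lat_deg):
    """Helper term tan(pi/4 + lat/2) used by the conic projection."""
    return math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0)


# Cone constant and scaling factor for the two true latitudes.
N = (math.log(math.cos(math.radians(LAT1)) / math.cos(math.radians(LAT2)))
     / math.log(_t(LAT2) / _t(LAT1)))
F = math.cos(math.radians(LAT1)) * _t(LAT1) ** N / N


def lcc_forward(lon, lat):
    """Project longitude/latitude (degrees) to x, y in meters."""
    rho = R_EARTH_M * F / _t(lat) ** N
    rho0 = R_EARTH_M * F / _t(LAT0) ** N
    theta = N * math.radians(lon - LON0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


def grid_cell(lon, lat):
    """Return the 0-based (col, row) containing the point, or None if outside."""
    x, y = lcc_forward(lon, lat)
    col = math.floor((x - X_ORIG) / CELL_M)
    row = math.floor((y - Y_ORIG) / CELL_M)
    if 0 <= col < NCOLS and 0 <= row < NROWS:
        return col, row
    return None
```

For example, `grid_cell(-78.87, 35.9)` (a point near Research Triangle Park, NC) falls inside the domain, while a point in Hawaii returns `None`.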


4.2.3   Modeling Period/ Ozone Episodes
The 12 km CMAQ modeling domain was modeled for the entire year of 2009. The 2009 annual
simulation was performed in two half-year segments (i.e., January through June, and July through
December) for each emissions scenario. With this approach to segmenting an annual simulation we were
able to reduce the overall throughput time for an annual simulation. The annual simulation included a
"ramp-up" period, comprising the 10 days before the beginning of each half-year segment, to mitigate the
effects of initial concentrations. All 365 model days were used in the annual average levels of PM2.5. For
the 8-hour ozone, we used modeling results from the period between May 1 and September 30. This 153-day
period generally conforms to the ozone season across most parts of the U.S. and contains the majority
of days with high observed ozone concentrations.
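
The date arithmetic behind this setup can be checked with a short sketch. The segment boundaries, the 10-day ramp-up, and the May 1 to September 30 ozone window are taken from the text above; the variable and function names are hypothetical.

```python
from datetime import date, timedelta

RAMP_UP_DAYS = 10

# Two half-year segments of the 2009 annual simulation.
segments = [
    (date(2009, 1, 1), date(2009, 6, 30)),
    (date(2009, 7, 1), date(2009, 12, 31)),
]


def run_window(start, end, ramp_up_days=RAMP_UP_DAYS):
    """Return (first simulated day, first retained day, last day) for a segment."""
    return start - timedelta(days=ramp_up_days), start, end


windows = [run_window(start, end) for start, end in segments]

# Days retained for the annual PM2.5 average (ramp-up days are discarded).
retained_days = sum((end - start).days + 1 for start, end in segments)

# Length of the ozone evaluation period, inclusive of both end dates.
ozone_days = (date(2009, 9, 30) - date(2009, 5, 1)).days + 1
```

Under these assumptions the first segment starts its ramp-up on December 22, 2008 and the second on June 21, 2009.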

4.2.4  Model Inputs: Emissions, Meteorology and Boundary Conditions
2009 Emissions:  The emissions inventories used in the 2009 air quality modeling are described in Section
3, above.

Meteorological Input Data: The gridded meteorological data for the entire year of 2009 at the 12 km
continental United States scale domain were derived from version 3.2 of the Weather Research and
Forecasting Model (WRF), Advanced Research WRF (ARW) core.⁹ Previous CMAQ annual simulations
have typically utilized meteorology provided by the 5th Generation Mesoscale Model (MM5).¹⁰ The WRF
Model is a next-generation mesoscale numerical weather prediction system developed for both operational
forecasting and atmospheric research applications (http://wrf-model.org). The 2009 WRF simulation
included the physics options of the Pleim-Xiu land surface model (LSM), Asymmetric Convective Model
version 2 planetary boundary layer (PBL) scheme, Morrison double moment microphysics, Kain-Fritsch
cumulus parameterization scheme and the RRTMG long-wave and shortwave radiation (LWR/SWR)
scheme.¹¹

The WRF meteorological outputs were processed to create model-ready inputs for CMAQ using the
Meteorology-Chemistry Interface Processor (MCIP) package¹², version 3.6, to derive the specific inputs
to CMAQ: horizontal wind components (i.e., speed and direction), temperature, moisture, vertical
diffusion rates, and rainfall rates for each grid cell in each vertical layer. The WRF simulation used the
same map projection as CMAQ, a Lambert Conformal projection centered at (-97, 40) with true latitudes
at 33 and 45 degrees north. The 12 km WRF domain consisted of 459 by 299 grid cells. The WRF
simulation utilized 34 vertical layers with a surface layer of approximately 38 meters. Table 4-2 shows
the vertical layer structure used in WRF and the layer collapsing approach to generate the CMAQ
meteorological inputs. CMAQ resolved the vertical atmosphere with 24 layers, preserving greater
resolution in the PBL.

The 2009 WRF meteorological model performance evaluation combined qualitative and quantitative
analyses to assess the adequacy of the WRF simulated fields. The qualitative aspects involved comparisons
of the model-estimated synoptic patterns against observed patterns from historical weather chart archives.
Additionally, the evaluations compared spatial patterns of monthly average rainfall and monthly maximum
planetary boundary layer (PBL) heights. The statistical portion of the evaluation examined the model bias
and error for temperature, water vapor mixing ratio, solar radiation, and wind fields. These statistical
values were calculated on a monthly basis.

9 Skamarock, W.C., Klemp, J.B., Dudhia, J., Gill, D.O., Barker, D.M., Duda, M.G., Huang, X., Wang, W., Powers,
J.G., 2008. A Description of the Advanced Research WRF Version 3.

10 Grell, G. A., Dudhia, J., and Stauffer, D. R., 1994. A description of the Fifth-Generation PennState/NCAR
Mesoscale Model (MM5). NCAR Technical Note NCAR/TN-398+STR. Available at
http://www.mmm.ucar.edu/mm5/doc1.html.

11 Gilliam, R.C., Pleim, J.E., 2010. Performance Assessment of New Land Surface and Planetary Boundary Layer
Physics in the WRF-ARW. Journal of Applied Meteorology and Climatology 49, 760-774.

12 Otte T.L., Pleim, J.E., 2010. The Meteorology-Chemistry Interface Processor (MCIP) for the CMAQ modeling
system: updates through v3.4.1. Geoscientific Model Development 3, 243-256.

  Table 4-2. Vertical layer structure for 2009 WRF and CMAQ simulations (heights are layer top)

  Height (m)   Pressure (mb)   WRF layer   WRF depth (m)   CMAQ layer   CMAQ depth (m)
      17,145              50          34           2,655           24            4,552
      14,490              95          33           1,896
      12,593             140          32           1,499           23            2,749
      11,094             185          31           1,250
       9,844             230          30           1,078           22            2,029
       8,766             275          29             951
       7,815             320          28             853           21            1,627
       6,962             365          27             775
       6,188             410          26             711           20            1,368
       5,477             455          25             657
       4,820             500          24             612           19            1,185
       4,208             545          23             573
       3,635             590          22             539           18              539
       3,095             635          21             509           17              509
       2,586             680          20             388           16              388
       2,198             716          19             281           15              281
       1,917             743          18             273           14              273
       1,644             770          17             178           13              178
       1,466             788          16             174           12              174
       1,292             806          15             171           11              171
       1,121             824          14             168           10              168
         952             842          13             165            9              165
         787             860          12              82            8              163
         705             869          11              81
         624             878          10              80            7              160
         544             887           9              80
         465             896           8              79            6              157
         386             905           7              78
         307             914           6              78            5               78
         230             923           5              77            4               77
         153             932           4              38            3               76
         114             937           3              38
          76             941           2              38            2               38
          38             946           1              38            1               38

  A blank CMAQ entry indicates a WRF layer that is merged into the CMAQ layer listed on the row above.
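
The layer collapsing in Table 4-2 can be sketched as a simple grouping of WRF layer depths. The depths and the grouping pattern below are transcribed from the table, listed from the surface upward; the function and variable names are hypothetical, and the operational collapsing is performed on the meteorological fields themselves rather than on depths alone.

```python
# WRF layer depths in meters, surface (layer 1) to model top (layer 34).
WRF_DEPTH_M = [38, 38, 38, 38, 77, 78, 78, 79, 80, 80, 81, 82,
               165, 168, 171, 174, 178, 273, 281, 388, 509, 539,
               573, 612, 657, 711, 775, 853, 951, 1078, 1250, 1499, 1896, 2655]

# Number of WRF layers merged into each CMAQ layer, surface (1) to top (24).
WRF_PER_CMAQ = [1, 1, 2, 1, 1, 2, 2, 2] + [1] * 10 + [2] * 6


def collapse_depths(wrf_depths, group_sizes):
    """Sum consecutive WRF layer depths into coarser CMAQ layer depths."""
    if sum(group_sizes) != len(wrf_depths):
        raise ValueError("group sizes must account for every WRF layer")
    collapsed, start = [], 0
    for size in group_sizes:
        collapsed.append(sum(wrf_depths[start:start + size]))
        start += size
    return collapsed


CMAQ_DEPTH_M = collapse_depths(WRF_DEPTH_M, WRF_PER_CMAQ)
```

The collapsed depths agree with the CMAQ depth column of Table 4-2 to within about a meter, which is consistent with rounding of the tabulated WRF depths.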

-------
Initial and Boundary Conditions: The lateral boundary and initial species concentrations are provided by
a three-dimensional global atmospheric chemistry model, the GEOS-CHEM¹³ model (standard version
8-03-02 with 8-02-03 chemistry). The global GEOS-CHEM model simulates atmospheric chemical and
physical processes driven by assimilated meteorological observations from NASA's Goddard Earth
Observing System (GEOS). This model was run for 2009 with a grid resolution of 2.0 degrees x 2.5
degrees (latitude-longitude) and 46 vertical layers up to 0.01 hPa. The predictions were processed using
the GEOS-2-CMAQ tool and used to provide one-way dynamic boundary conditions at one-hour
intervals.¹⁴ Ozone from these GEOS-Chem runs was evaluated against satellite vertical profiles
and ground-based measurements, and model performance was found to be acceptable. More information
about the GEOS-CHEM model and other applications using this tool is available at:
http://www-as.harvard.edu/chemistry/trop/geos.


4.3    CMAQ Model Performance Evaluation

An operational model performance evaluation for ozone and PM2.5 and its related speciated components
was conducted for the 2009 simulation using state/local monitoring sites data in order to estimate the
ability of the CMAQ  modeling system to replicate the 2009 base year concentrations for the 12 km
continental U.S. domain.

There are various statistical metrics available and used by the science community for model performance
evaluation. For a robust evaluation, the principal evaluation statistics used to evaluate CMAQ
performance were two bias metrics, normalized mean bias and fractional bias; and two error metrics,
normalized mean error and fractional error. Normalized mean bias (NMB) normalizes the bias so that
performance can be compared across a range of concentration magnitudes. This statistic sums the
difference (model - observed) over all pairs and divides by the sum of observed values. NMB is a useful
model performance indicator because it avoids overinflating the observed range of values, especially at
low concentrations. Normalized mean bias is defined as:

$$\mathrm{NMB} = \frac{\sum_{i=1}^{n} (P_i - O_i)}{\sum_{i=1}^{n} O_i} \times 100$$

where P = predicted concentrations, O = observed concentrations, and n = the number of paired values.

Normalized mean error (NME) is similar to NMB, except that it normalizes the mean error rather than the
mean bias. NME sums the absolute value of the difference (model - observed) over all pairs and divides by
the sum of observed values. Normalized mean error is defined as:

$$\mathrm{NME} = \frac{\sum_{i=1}^{n} |P_i - O_i|}{\sum_{i=1}^{n} O_i} \times 100$$

13 Yantosca, B., 2004. GEOS-CHEM v7-01-02 User's Guide, Atmospheric Chemistry Modeling Group, Harvard
University, Cambridge, MA, October 15, 2004.

14 Akhtar, F., Henderson, B., Appel, W., Napelenok, S., Hutzell, B., Pye, H., Foley, K., 2012. Multiyear Boundary
Conditions for CMAQ 5.0 from GEOS-Chem with Secondary Organic Aerosol Extensions, 11th Annual Community
Modeling and Analysis System conference, Chapel Hill, NC, October 2012.

Fractional bias is defined as:

$$\mathrm{FB} = \frac{1}{n} \left( \sum_{i=1}^{n} \frac{P_i - O_i}{(P_i + O_i)/2} \right) \times 100$$

FB is a useful model performance indicator because it has the advantage of equally weighting positive and
negative bias estimates. The single largest disadvantage in this estimate of model performance is that the
estimated concentration (i.e., prediction, P) is found in both the numerator and denominator.

Fractional error (FE) is similar to fractional bias except the absolute value of the difference is used so that
the error is always positive. Fractional error is defined as:

$$\mathrm{FE} = \frac{1}{n} \left( \sum_{i=1}^{n} \frac{|P_i - O_i|}{(P_i + O_i)/2} \right) \times 100$$

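
The four statistics can be computed from paired predictions and observations as in the sketch below. It assumes the commonly used pairwise forms of fractional bias and fractional error (each pair's difference divided by the pair's mean, then averaged), which is our reading of the definitions in this section; the function name and the sample values are hypothetical.

```python
def performance_stats(pred, obs):
    """Return NMB, NME, FB and FE (all in percent) for paired values."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and equal in length")
    n = len(obs)
    diffs = [p - o for p, o in zip(pred, obs)]
    sum_obs = sum(obs)
    # Bias and error normalized by the sum of observations.
    nmb = 100.0 * sum(diffs) / sum_obs
    nme = 100.0 * sum(abs(d) for d in diffs) / sum_obs
    # Each pair's difference is divided by the pair's mean before averaging.
    fb = 100.0 / n * sum(d / ((p + o) / 2.0)
                         for d, p, o in zip(diffs, pred, obs))
    fe = 100.0 / n * sum(abs(d) / ((p + o) / 2.0)
                         for d, p, o in zip(diffs, pred, obs))
    return {"NMB": nmb, "NME": nme, "FB": fb, "FE": fe}


# Hypothetical pairs: one over-prediction and one under-prediction of 10 ppb.
stats = performance_stats(pred=[60.0, 40.0], obs=[50.0, 50.0])
```

In this example the over- and under-predictions cancel in NMB (0 percent) while NME is 20 percent, which shows why bias and error metrics are reported together.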
In addition to the performance statistics, regional maps which show the normalized mean bias and error
were prepared for the ozone season, May through September, at individual monitoring sites as well as on
an annual basis for PM2.5 and its component species.

Evaluation for 8-hour Daily Maximum Ozone: The operational model performance evaluation for eight-hour
daily maximum ozone was conducted using the statistics defined above. Ozone measurements for
2009 in the continental U.S. were included in the evaluation and were taken from the 2009 State/local
monitoring site data in the Air Quality System (AQS) Aerometric Information Retrieval System (AIRS).
The performance statistics were calculated using predicted and observed data that were paired in time and
space on an 8-hour basis. Statistics were generated for five large subregions of the 12-km continental
U.S. domain (see footnote 15): Midwest, Northeast, Southeast, Central, and Western U.S.

The 8-hour ozone model performance bias and error statistics for each subregion and each season are
provided in Table 4-3. Seasons were defined as: winter (December-January-February), spring
(March-April-May), summer (June-July-August), and fall (September-October-November). Spatial plots of the
normalized mean bias and error for individual monitors are shown in Figures 4-2 and 4-3. The
statistics shown in these two figures were calculated over the ozone season using data pairs on days with
observed 8-hour ozone greater than or equal to 60 ppb.
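
The grouping rules in this paragraph can be sketched in Python as follows. This is only an illustration of
the season definitions and of the ozone-season, 60 ppb screening used for the site maps; the names and the
sample pairs are hypothetical and do not come from the evaluation software.

```python
from datetime import date

# Season definitions as stated in the text.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

# Ozone season (May through September) and the observed-value screen for the maps.
OZONE_SEASON_MONTHS = (5, 6, 7, 8, 9)
OBS_THRESHOLD_PPB = 60.0


def season_of(day: date) -> str:
    """Return the season label used for the seasonal statistics."""
    return SEASON_BY_MONTH[day.month]


def in_site_map_subset(day: date, observed_ppb: float) -> bool:
    """True for ozone-season days with observed 8-hour ozone >= 60 ppb."""
    return day.month in OZONE_SEASON_MONTHS and observed_ppb >= OBS_THRESHOLD_PPB


# Hypothetical (date, observed ppb, predicted ppb) pairs at one monitor.
pairs = [
    (date(2009, 1, 15), 35.0, 38.0),
    (date(2009, 7, 4), 72.0, 80.0),
    (date(2009, 7, 5), 55.0, 61.0),
    (date(2009, 9, 30), 60.0, 58.0),
]

# Group pairs by season for the seasonal table.
by_season = {}
for day, obs, pred in pairs:
    by_season.setdefault(season_of(day), []).append((obs, pred))

# Keep only the pairs that would feed the site-level maps.
map_subset = [(d, o, p) for d, o, p in pairs if in_site_map_subset(d, o)]
```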
15 The subregions are defined by States where: Midwest is IL, IN, MI, OH, and WI; Northeast is CT, DE, MA, MD,
ME, NH, NJ, NY, PA, RI, and VT; Southeast is AL, FL, GA, KY, MS, NC, SC, TN, VA, and WV; Central is AR, IA,
KS, LA, MN, MO, NE, OK, and TX; West is AK, CA, OR, WA, AZ, NM, CO, UT, WY, SD, ND, MT, ID, and NV.
In general, the model performance statistics indicate that the 8-hour daily maximum ozone concentrations
predicted by the 2009 CMAQ simulation closely reflect the corresponding 8-hour observed ozone
concentrations in space and time in each subregion of the 12 km modeling domain. As indicated by the
statistics in Table 4-3, bias and error for 8-hour daily maximum ozone are relatively low in each
subregion, not only in the summer when concentrations are highest, but also during other times of the
year. Specifically, 8-hour ozone in the summer is slightly over predicted, with the greatest over prediction
in the Southeast (NMB of 23.1 percent). Spring shows better performance, with slight over predictions in
all subregions except the West (slight under prediction, NMB of -0.6 percent). In the winter, when
concentrations are generally low, the model slightly over predicts 8-hour ozone with the exception of the
Northeast (NMB of -11.2 percent). In the fall, when concentrations are also relatively low, ozone
is also slightly over predicted, with NMBs of less than 25 percent in each subregion.

Model bias at individual sites during the ozone season is similar to that seen on a subregional basis for the
summer. Figure 4-2 indicates that the bias for days with observed 8-hour daily maximum ozone greater
than 60 ppb is within ±20 percent at the vast majority of monitoring sites across the U.S. domain. The
exceptions are sites in and/or near Chicago, IL; Baton Rouge, LA; Tampa and Orlando, FL; northern NY
(St. Lawrence/Franklin counties); Greenville, WV; and Brunswick, GA; as well as a few areas along the
southern California coast. At these sites, observed concentrations greater than 60 ppb are generally
predicted with a bias in the range of ±20 to 40 percent. The map of bias in Figure 4-2 indicates
that the low bias at these sites is not evident at other sites in these same areas. This suggests that the under
prediction at these sites is likely due to very local features (e.g., meteorology and/or emissions) and is not
indicative of a systematic problem in the modeling platform. Model error, as seen in Figure 4-3, is 30
percent or less at most of the sites across the U.S. modeling domain. Somewhat greater error is evident at
sites in several areas, most notably along portions of the Northeast Corridor and in portions of Florida,
Louisiana, Texas, Mississippi, Alabama, South Carolina and along the California coastline.

 Table 4-3. Summary of CMAQ 2009 8-Hour Daily Maximum Ozone Model Performance Statistics
                                    by Subregion, by Season

Subregion       Season  No. of Obs   NMB (%)   NME (%)   FB (%)   FE (%)
Northeast       Winter       5,472     -11.2      19.2    -11.9     21.7
                Spring      11,995       0.8      12.0      1.7     12.7
                Summer      15,215      14.6      19.4     14.7     19.1
                Fall        11,070      18.1      24.3     18.3     24.1
Midwest         Winter       2,708       0.5      23.9     -4.3     23.0
                Spring      11,616       2.4      13.0      3.4     13.7
                Summer      15,914      13.2      18.3     13.4     18.1
                Fall         9,350      15.8      20.9     16.4     22.0
Central States  Winter      11,083       4.4      16.5      5.7     18.2
                Spring      14,851       5.0      14.7      6.7     15.6
                Summer      16,464      21.1      26.2     21.0     25.3
                Fall        14,495      11.0      20.2     12.8     21.5
Southeast       Winter       6,536       5.5      14.9     11.6      6.3
                Spring      17,194       9.8      16.4     11.1     17.0
                Summer      19,395      23.1      26.0     22.8     25.2
                Fall        15,308      24.4      28.7     23.9     27.6
West            Winter      22,813      14.4      22.7     16.4     24.5
                Spring      26,499      -0.6      12.4     -0.3     13.0
                Summer      29,460       6.8      16.5      7.4     16.6
                Fall        26,324       8.2      17.1      9.7     17.9

[Figure: map of O3_8hrmax NMB (%) for run 2009ef2_v5_09d_12US1, May-Sep, 12US1 domain; CIRCLE=AQS_Daily]
Figure 4-2. Normalized Mean Bias (%) of 8-hour daily maximum ozone greater than 60 ppb over
the period May-September 2009 at monitoring sites in the continental U.S. modeling domain

[Figure: map of O3_8hrmax NME (%) for run 2009ef2_v5_09d_12US1, 20090501 to 20090930; units = %;
coverage limit = 75%; CIRCLE=AQS_Daily]
Figure 4-3. Normalized Mean Error (%) of 8-hour daily maximum ozone greater than 60 ppb over
the period May-September 2009 at monitoring sites in the continental U.S. modeling domain

Evaluation for Annual PM2.5: The PM evaluation focuses on PM2.5 total mass and its components,
including sulfate (SO4), nitrate (NO3), total nitrate (TNO3 = NO3 + HNO3), ammonium (NH4), elemental
carbon (EC), and organic carbon (OC).

The PM2.5 bias and error performance statistics were calculated on an annual basis for each subregion
(Table 4-4). PM2.5 measurements for 2009 were obtained from the following networks for model
evaluation: Chemical Speciation Network (CSN, 24 hour average), Interagency Monitoring of PROtected
Visual Environments (IMPROVE, 24 hour average), and Clean Air Status and Trends Network
(CASTNet, weekly average). For PM2.5 species that are measured by more than one network, we
calculated separate sets of statistics for each network by subregion. For brevity, Table 4-4 provides
annual model performance statistics for PM2.5 and its component species for the five subregions in the 12
km continental U.S. domain defined above (Northeast, Midwest, Southeast, Central, and West). In
addition to the tabular summaries of bias and error statistics, annual spatial maps which show the
normalized mean bias and error by site for each PM2.5 species are provided in Figures 4-4 through 4-17.

As indicated by the statistics in Table 4-4, annual CMAQ PM2.5 for 2009 is under predicted at rural
IMPROVE monitoring sites and urban CSN monitoring sites in each subregion, except at CSN sites in the
Northeast and Midwest, which show a slight over prediction (NMB of 0 to 3 percent). Although not
shown here, the mean observed concentrations of PM2.5 are more than twice as high at the CSN sites
(~10 µg/m3) as at the IMPROVE sites (~5 µg/m3), thus illustrating the statistical differences between the
urban CSN and rural IMPROVE networks.

Annual average sulfate is consistently under predicted at CSN, IMPROVE, and CASTNet monitoring
sites across the modeling domain, with NMB values ranging from -14 percent to -41 percent. Overall,
sulfate bias performance is slightly better at rural IMPROVE sites than at urban CSN and/or suburban
CASTNet sites. Sulfate performance shows moderate error, with NME ranging from 28 to 45 percent.
Figures 4-6 and 4-7 suggest that spatial patterns vary by region. The model bias for most of the Southeast,
Central and Southwest states is within -20 to -40 percent. The model bias appears to be much smaller
(within ±20 percent) in the Northeast and Northwest states. A few sites in the West and in the Central
U.S. have biases much greater than 20 percent. Model error also shows a spatial trend by region: error is
20 to 40 percent in much of the Eastern states and 30 to 60 percent in the Western and Central U.S. states.

Annual average nitrate is over predicted at the urban and rural monitoring sites in most of the
subregions in the 12 km modeling domain (NMB in the range of 19 to 47 percent), except in the West
where nitrate is under predicted (NMB in the range of -20 to -32 percent). The bias statistics indicate that
the model performance for nitrate is generally best at the urban CSN monitoring sites. Model
performance of total nitrate at suburban CASTNet monitoring sites shows an over prediction across
all subregions. Model error for nitrate is somewhat greater for each subregion as compared to sulfate.
Model bias at individual sites indicates mainly over prediction of greater than 20 percent at most
monitoring sites in the Eastern half of the U.S. as well as in the extreme Northwest, as indicated in
Figure 4-8. The exceptions are Florida and the Southwest of the modeling domain, where
there appears to be a greater number of sites with under prediction of nitrate of 20 to 80 percent.
Model error for annual nitrate, as shown in Figure 4-9, is least at sites in portions of the Midwest and
extending eastward to the Northeast corridor. Nitrate concentrations are typically higher in these areas
than in other portions of the modeling domain.

Annual average ammonium, as indicated in Table 4-4, tends to be under predicted across the CSN and
CASTNet sites (NMB ranging from -1 to -25 percent). The exception is the Midwest at CASTNet sites,
where ammonium is slightly over predicted (NMB of approximately 3 percent). There is not a large
variation from subregion to subregion or at urban versus rural sites in the error statistics for ammonium.
The spatial variation of ammonium across the majority of individual monitoring sites shows bias within
±20 percent.

Annual average elemental carbon is over predicted in all subregions at urban and rural sites, with the
exception of the near negligible bias in the Central U.S. at IMPROVE sites. As with ammonium, the error
statistics do not vary greatly from subregion to subregion or between urban and rural sites.

Annual average organic carbon is under predicted at both urban and rural monitoring sites in all
subregions of the U.S. (NMB ranging from -4 to -45 percent). As with ammonium and elemental carbon,
model error does not vary greatly from subregion to subregion or between urban and rural sites
(NME of 48 to 67 percent).

      Table 4-4. Summary of CMAQ 2009 Annual PM Species Model Performance Statistics

Pollutant         Network  Subregion No. of Obs  NMB (%)  NME (%)   FB (%)   FE (%)
PM2.5             CSN      Northeast      2,754      3.0     37.7      0.4     36.6
                           Midwest        2,087      0.2     30.3     -4.6     33.2
                           Southeast      2,345    -23.7     40.0    -32.8     46.1
                           Central        1,891    -14.8     41.7    -20.2     46.8
                           West           2,986    -14.1     49.5    -15.2     51.0
                  IMPROVE  Northeast      2,317     -1.3     43.2     -9.0     43.3
                           Midwest          577     -8.3     33.4    -15.4     39.4
                           Southeast      1,950    -25.2     44.3    -35.9     54.1
                           Central        2,500    -17.9     41.3    -24.2     49.3
                           West          10,295    -27.9     56.3    -34.7     62.9
Sulfate           CSN      Northeast      3,131    -20.5     34.3    -15.4     35.7
                           Midwest        2,238    -24.5     33.3    -24.7     36.4
                           Southeast      2,837    -28.4     36.8    -32.7     42.9
                           Central        2,295    -30.8     42.2    -31.4     46.9
                           West           3,196    -22.9     42.8    -15.6     44.7
                  IMPROVE  Northeast      2,307    -15.3     32.0     -6.3     33.7
                           Midwest          571    -25.8     35.4    -19.1     40.5
                           Southeast      1,951    -27.1     37.0    -27.8     43.2
                           Central        2,446    -29.1     37.9    -24.5     41.6
                           West          10,030    -14.4     44.9     -0.4     49.1
                  CASTNet  Northeast        769    -26.5     28.1    -26.7     29.9
                           Midwest          614    -30.5     31.2    -35.8     36.5
                           Southeast      1,096    -35.1     36.2    -42.5     44.6
                           Central          381    -41.3     42.0    -46.4     48.3
                           West           1,043    -29.8     39.2    -24.8     44.5
Nitrate           CSN      Northeast      3,143     27.9     65.5    -11.7     71.1
                           Midwest        2,325     23.4       56     -8.7     67.8
                           Southeast      2,851     25.2     93.3       56      106
                           Central        1,641     19.3     57.3    -24.5     80.6
                           West           3,164    -32.4     61.8    -69.6     94.1
                  IMPROVE  Northeast      2,308     46.8     91.0    -13.5     91.2
                           Midwest          570     33.9     69.8    -19.7     90.7
                           Southeast      1,951     24.7    108.0    -58.6    118.0
                           Central        2,445     33.9     74.1    -16.0     93.5
                           West          10,016    -20.9     88.7    -76.2    122.0
Total Nitrate     CASTNet  Northeast        769     53.4     59.1     41.5     50.7
(NO3 + HNO3)               Midwest          614     33.9     43.6     33.8     40.7
                           Southeast      1,096     35.6     54.3     27.0     49.9
                           Central          381     16.1     36.9     13.6     36.8
                           West           1,043      4.8     40.3     15.4     43.0
Ammonium          CSN      Northeast      3,131     -8.4     37.3      3.5     39.4
                           Midwest        2,238     -3.6     31.3      3.1     34.5
                           Southeast      2,837     -8.1     38.9     -7.7     41.3
                           Central        2,295     -7.1     41.1     -6.5     46.0
                           West           3,196    -25.9     58.1     -8.7     58.2
                  CASTNet  Northeast        769     -1.6     28.5     -1.3     30.4
                           Midwest          614      3.0     25.8      3.4     25.9
                           Southeast      1,096    -15.9     30.6    -18.4     35.1
                           Central          381     -2.0     36.9     -5.8     40.4
                           West           1,043    -12.8     45.3    -12.4     50.2
Elemental Carbon  CSN      Northeast      2,978     36.1     61.5     25.1     49.2
                           Midwest        2,212     52.2     71.6     33.6     53.8
                           Southeast      2,823     18.8     54.8     13.2     49.3
                           Central        2,284     48.8     79.7     33.8     60.7
                           West           3,117     15.9     68.8     11.7     62.0
                  IMPROVE  Northeast      2,320     27.7     61.6      6.0     51.1
                           Midwest          577      4.1     38.9     -0.5     44.8
                           Southeast      1,949      3.8     46.4     -6.5     47.4
                           Central        2,501     -0.3     40.9     -1.3     43.3
                           West          10,403     15.1     77.7      3.7     60.9
Organic Carbon    CSN      Northeast      2,900     -4.9     59.3     -6.3     60.5
                           Midwest        2,155    -14.9     53.6    -15.6     58.3
                           Southeast      2,779    -39.3     55.6    -48.7     68.7
                           Central        2,239    -31.5     56.9    -35.2     66.7
                           West           3,058    -21.5     57.1    -16.4     60.2
                  IMPROVE  Northeast      2,314     -3.8     63.0    -24.8     63.8
                           Midwest          574    -31.5     48.3    -41.9     64.3
                           Southeast      1,951    -34.1     55.5    -61.1     74.6
                           Central        2,499    -45.2     55.9    -60.7     72.4
                           West          10,238    -34.8     67.3    -37.5     72.0

[Figure: map of PM_TOT NMB (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CSN; TRIANGLE=IMPROVE]
Figure 4-4. Normalized Mean Bias (%) of annual PM2.5 mass at monitoring sites in the
continental U.S. modeling domain

[Figure: map of PM_TOT NME (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CSN; TRIANGLE=IMPROVE]
Figure 4-5. Normalized Mean Error (%) of annual PM2.5 mass at monitoring sites in the
continental U.S. modeling domain

[Figure: map of SO4 NMB (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CSN; TRIANGLE=IMPROVE; SQUARE=CASTNET]
Figure 4-6. Normalized Mean Bias (%) of annual Sulfate at monitoring sites in the continental
U.S. modeling domain

[Figure: map of SO4 NME (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CSN; TRIANGLE=IMPROVE; SQUARE=CASTNET]
Figure 4-7. Normalized Mean Error (%) of annual Sulfate at monitoring sites in the
continental U.S. modeling domain

[Figure: map of NO3 NMB (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=IMPROVE; TRIANGLE=CASTNET]
Figure 4-8. Normalized Mean Bias (%) of annual Nitrate at monitoring sites in the continental U.S.
modeling domain

[Figure: map of NO3 NME (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=IMPROVE; TRIANGLE=CASTNET]
Figure 4-9. Normalized Mean Error (%) of annual Nitrate at monitoring sites in the continental
U.S. modeling domain

[Figure: map of TNO3 NMB (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain;
coverage limit = 75%; CIRCLE=CASTNET]
Figure 4-10. Normalized Mean Bias (%) of annual Total Nitrate at monitoring sites in the
continental U.S. modeling domain

[Figure: map of TNO3 NME (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CASTNET]
Figure 4-11. Normalized Mean Error (%) of annual Total Nitrate at monitoring sites in the
continental U.S. modeling domain

[Figure: map of NH4 NMB (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CSN; TRIANGLE=CASTNET]
Figure 4-12. Normalized Mean Bias (%) of annual Ammonium at monitoring sites in the
continental U.S. modeling domain

[Figure: map of NH4 NME (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CSN; TRIANGLE=CASTNET]
Figure 4-13. Normalized Mean Error (%) of annual Ammonium at monitoring sites in the
continental U.S. modeling domain

[Figure: map of EC NMB (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CSN; TRIANGLE=IMPROVE]
Figure 4-14. Normalized Mean Bias (%) of annual Elemental Carbon at monitoring sites in the
continental U.S. modeling domain

[Figure: map of EC NME (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CSN; TRIANGLE=IMPROVE]
Figure 4-15. Normalized Mean Error (%) of annual Elemental Carbon at monitoring sites in the
continental U.S. modeling domain

[Figure: map of OC NMB (%) for run 2009ef2_v5_09d_12US1, Annual, 12US1 domain; units = %;
coverage limit = 75%; CIRCLE=CSN; TRIANGLE=IMPROVE]
Figure 4-16. Normalized Mean Bias (%) of annual Organic Carbon at monitoring sites in the
continental U.S. modeling domain

[Figure: map of OC NME (%) for run 2008ab_08c_12US1, 20080101 to 20081231; units = %;
coverage limit = 75%; CIRCLE=IMPROVE; TRIANGLE=CSN]
Figure 4-17. Normalized Mean Error (%) of annual Organic Carbon at monitoring sites in the
continental U.S. modeling domain

     5.0   Bayesian space-time downscaling fusion model (downscaler) -
                            Derived Air Quality Estimates


5.1    Introduction

The need for greater spatial coverage of air pollution concentration estimates has grown in recent years as
epidemiology and exposure studies that link air pollution concentrations to health effects have become
more robust and as regulatory needs have increased. Direct measurement of concentrations is the ideal
way  of generating such data, but prohibitive logistics and costs limit the possible spatial coverage and
temporal resolution of such a database.  Numerical methods that extend the spatial coverage of existing
air pollution networks with a high degree of confidence are thus  a topic of current investigation by
researchers. The downscaler model (DS) is the result of the latest research efforts by EPA for performing
such predictions. DS utilizes both monitoring and CMAQ  data as inputs, and attempts to take advantage
of the measurement data's accuracy and CMAQ's spatial coverage to produce new spatial predictions.
This chapter describes the methods and results of the DS application that accompanies this report, which
used ozone and PM2.5 data from AQS and CMAQ to produce predictions at continental U.S. 2010
census tract centroids for the year 2009.

5.2   Downscaler Model

DS develops a relationship between observed and modeled concentrations, and then uses that relationship
to spatially predict what measurements would be at new locations in the spatial domain based on the
input data. This process is separately applied for each time step (daily in this work) of data, and for each
of the pollutants under study (ozone and PM2.5). In its most general form, the model can be expressed in
an equation similar to that of linear regression:

$$Y(s,t) = \tilde{\beta}_0(s,t) + \beta_1(t)\,\tilde{x}(s,t) + \epsilon(s,t)$$  (Equation 1)

Where:
$Y(s,t)$ is the observed concentration at point s and time t.
$\tilde{x}(s,t)$ is the CMAQ concentration at time t. This value is a weighted average of both the gridcell
containing the monitor and neighboring gridcells.
$\tilde{\beta}_0(s,t)$ is the intercept, and is composed of both a global and a local component.
$\beta_1(t)$ is the global slope; local components of the slope are contained in the $\tilde{x}(s,t)$ term.
$\epsilon(s,t)$ is the model error.
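
To make the roles of the terms in Equation 1 concrete, the Python sketch below evaluates the mean part of
the equation for one location and day. It is a simplified illustration, not the DS code: the function names
are hypothetical, the neighbor weighting uses a fixed weight chosen only for the example (the text does not
say how DS sets the weights), and the intercept, slope, and concentrations are made-up numbers.

```python
from typing import Sequence


def smoothed_cmaq(cell_value: float, neighbor_values: Sequence[float],
                  cell_weight: float = 0.5) -> float:
    """Weighted average of the gridcell containing the monitor and its neighbors.

    cell_weight is an assumed, fixed weight for the containing gridcell; the
    remaining weight is shared equally among the neighbors.
    """
    if not 0.0 <= cell_weight <= 1.0:
        raise ValueError("cell_weight must be between 0 and 1")
    if not neighbor_values:
        return cell_value
    neighbor_weight = (1.0 - cell_weight) / len(neighbor_values)
    return cell_weight * cell_value + neighbor_weight * sum(neighbor_values)


def equation_1_mean(beta0: float, beta1: float, x_tilde: float) -> float:
    """Mean of Equation 1 (the error term is left out): beta0 + beta1 * x_tilde."""
    return beta0 + beta1 * x_tilde


# Hypothetical CMAQ 8-hour ozone (ppb) in the monitor's gridcell and two neighbors.
x_tilde = smoothed_cmaq(40.0, [30.0, 50.0])

# Hypothetical intercept (ppb) and slope for this location and day.
prediction = equation_1_mean(beta0=2.0, beta1=0.9, x_tilde=x_tilde)
print(f"x_tilde = {x_tilde:.1f} ppb, predicted concentration = {prediction:.1f} ppb")
```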

DS has additional properties that differentiate it from linear regression:

1) Rather than just finding a single optimal solution to Equation 1, DS uses a Bayesian approach so that
uncertainties can be generated along with each concentration prediction. This involves drawing random
samples of model parameters from built-in "prior" distributions and assessing their fit to the data on the
order of thousands of times. After each iteration, properties of the prior distributions are adjusted to try
to improve the fit of the next iteration. The resulting collection of $\tilde{\beta}_0$ and $\beta_1$ values
at each space-time point are the "posterior" distributions, and the means and standard deviations of these
are used to predict concentrations and associated uncertainties at new spatial points.

2) The model is "hierarchical" in structure, meaning that the top level parameters in Equation 1 (i.e.,
$\tilde{\beta}_0(s,t)$, $\beta_1(t)$, $\tilde{x}(s,t)$) are actually defined in terms of further parameters
and sub-parameters in the DS code. For example, the overall slope and intercept are each defined to be the
sum of a global (one value for the entire spatial domain) and local (values specific to each spatial point)
component. This gives more flexibility in fitting a model to the data to optimize the fit (i.e., minimize
$\epsilon(s,t)$).
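
The Bayesian idea in item 1 can be illustrated with a much smaller problem. The Python sketch below fits a
single intercept and slope to a handful of made-up monitor/CMAQ pairs with a random-walk Metropolis sampler,
then reports the posterior mean and standard deviation of the fitted value at a new CMAQ concentration. It is
a teaching sketch under stated assumptions, not the DS algorithm: DS is hierarchical and spatially varying,
whereas this example has no spatial component, treats the noise standard deviation as known, and uses
hypothetical data, priors, and step sizes.

```python
import math
import random
import statistics


def log_posterior(b0, b1, xs_centered, ys, noise_sd, prior_sd):
    """Log posterior (up to a constant) for y = b0 + b1 * x + Gaussian noise."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs_centered, ys))
    log_likelihood = -0.5 * sse / noise_sd ** 2
    log_prior = -0.5 * (b0 ** 2 + b1 ** 2) / prior_sd ** 2  # independent normal priors
    return log_likelihood + log_prior


def sample_posterior(xs, ys, noise_sd=2.0, prior_sd=100.0,
                     n_iter=20000, burn_in=5000, seed=0):
    """Random-walk Metropolis draws of (intercept, slope); x is centered first."""
    rng = random.Random(seed)
    x_mean = sum(xs) / len(xs)
    xs_centered = [x - x_mean for x in xs]
    b0, b1 = 0.0, 0.0
    current = log_posterior(b0, b1, xs_centered, ys, noise_sd, prior_sd)
    draws = []
    for step in range(n_iter):
        # Propose a small random move in both parameters (assumed step sizes).
        cand0 = b0 + rng.gauss(0.0, 0.8)
        cand1 = b1 + rng.gauss(0.0, 0.05)
        proposed = log_posterior(cand0, cand1, xs_centered, ys, noise_sd, prior_sd)
        log_ratio = proposed - current
        # Accept better fits always, worse fits with probability exp(log_ratio).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            b0, b1, current = cand0, cand1, proposed
        if step >= burn_in:
            draws.append((b0, b1))
    return x_mean, draws


def predict(x_new, x_mean, draws):
    """Posterior mean and standard deviation of the fitted value at x_new."""
    fitted = [b0 + b1 * (x_new - x_mean) for b0, b1 in draws]
    return statistics.mean(fitted), statistics.stdev(fitted)


# Hypothetical CMAQ values (x) and monitor observations (y), in ppb.
cmaq = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
observed = [11.5, 19.2, 29.8, 37.1, 47.9, 55.4]

x_mean, draws = sample_posterior(cmaq, observed)
mean_35, sd_35 = predict(35.0, x_mean, draws)
print(f"prediction at 35 ppb CMAQ: {mean_35:.2f} ppb, posterior sd {sd_35:.2f} ppb")
```

The standard deviation returned by `predict` reflects uncertainty in the intercept and slope only; it plays
the same role as the uncertainty value that accompanies each DS prediction, but it is not computed the same
way.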

Further information about the development and inner workings of the current version of DS can be found
in Berrocal, Gelfand and Holland (2011) and references therein. The DS outputs that accompany this
report are described below, along with some additional analyses that include assessing the accuracy of the
DS predictions. Results are then summarized, and caveats are provided for interpreting them in the
context of air quality management activities.


5.3    Downscaler Output

In this application, DS was used to predict daily concentration and associated uncertainty values at the
2010 US census tract centroids across the continental U.S. using 2009 measurement and CMAQ data as
inputs. For ozone, the concentration unit is the daily maximum 8-hour average in ppb and for PM2.5 the
concentration unit is the 24-hour average in µg/m3. DS output is in the form of a comma-delimited table.
Example output of the 2009 ozone DS run is shown in Table 5-1. Each row is specific to a date and census
tract. The columns of the output files are:

   • Date: The date represented by the data in this row, in MM/DD/YYYY format.
   • Census Tract FIPS code (http://quickfacts.census.gov/qfd/meta/long_fips.htm).
   • Latitude: The y-coordinate value transformed to latitude (degrees).
   • Longitude: The x-coordinate value transformed to longitude (degrees).
   • Prediction: Estimated daily maximum 8-hour average ozone concentration in ppb or 24-hour
     average PM2.5 concentration in µg/m3.
   • Uncertainty: The posterior standard deviation (error) of the estimated ozone or PM2.5
     concentration.
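
As a minimal sketch of how a file with the six columns listed above might be read, the Python snippet
below parses comma-delimited rows into typed records. The header names (Date, FIPS, Latitude, Longitude,
Prediction, Uncertainty), the sample date, and the padding of the FIPS code to 11 characters are
assumptions made for illustration; the actual DS output files may label or format these fields differently.

```python
import csv
import io
from datetime import datetime

# Hypothetical header names and an illustrative date; the numeric values
# follow the layout of Table 5-1 but this is not an actual DS output file.
SAMPLE = """Date,FIPS,Latitude,Longitude,Prediction,Uncertainty
01/01/2009,1001020100,32.47718,-86.49001,26.122939,7.303598
01/01/2009,1001020200,32.47425,-86.47339,25.900202,7.373431
"""


def read_ds_rows(handle):
    """Yield one dict per (date, census tract) row of a DS-style table."""
    for row in csv.DictReader(handle):
        yield {
            "date": datetime.strptime(row["Date"], "%m/%d/%Y").date(),
            # Keep the FIPS code as text and restore a dropped leading zero
            # (assumes 11-character tract identifiers).
            "fips": row["FIPS"].zfill(11),
            "lat": float(row["Latitude"]),
            "lon": float(row["Longitude"]),
            "prediction": float(row["Prediction"]),
            "uncertainty": float(row["Uncertainty"]),
        }


rows = list(read_ds_rows(io.StringIO(SAMPLE)))
```

Reading the FIPS code as text rather than as a number avoids losing the leading zero of state codes
below 10, which matters when the code is later split into state and county parts.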

              Table 5-1. Downscaler Model Prediction: Example Data File (Ozone)

Date          2010 US Census     Latitude    Longitude    Daily Maximum 8-Hour    Standard Error of
              Tract FIPS Code                             Concentration (ppb)     Concentration
Jan-01-2002   1001020100         32.47718    -86.49001    26.122939               7.303598
Jan-01-2002   1001020200         32.47425    -86.47339    25.900202               7.373431
Jan-01-2002   1001020300         32.47544    -86.4602     26.130488               7.248804
Jan-01-2002   1001020400         32.47204    -86.4437     26.085497               7.312579
Jan-01-2002   1001020500         32.45892    -86.42271    25.942581               7.297929
Jan-01-2002   1001020600         32.44253    -86.47877    25.854266               7.347192
Jan-01-2002   1001020700         32.42723    -86.44118    25.534966               7.311914
Jan-01-2002   1001020801         32.41336    -86.5261     25.707846               7.50416
Jan-01-2002   1001020802         32.53474    -86.51259    26.253984               7.41871
Jan-01-2002   1001020900         32.64296    -86.52377    26.212948               7.552828
Jan-01-2002   1001021000         32.60895    -86.75607    25.718179               7.573458
Jan-01-2002   1001021100         32.45595    -86.73223    26.065351               7.432891

5.4  Downscaler Model Results for the 2009 Application

Monitoring data for 2009 from the AQS database described in Chapter 2 and output from the 2009 12 km
resolution CMAQ run described in Chapter 4 were input to DS to produce daily spatial predictions at the
2010 continental U.S. census tract centroids. The following summary information was extracted and
calculated for the DS ozone and PM2.5 inputs and outputs:
   •  Days with the highest concentrations and the greatest spatial extent of high pollution
   •  Locations with the most days above the NAAQS
   •  Comparison between daily AQS observations and the nearest DS census tract prediction for
      selected sites

5.4.1  Summary of 8-hour Ozone Results

As a summary of the overall year, Figure 5-1 shows the 4th max daily maximum 8-hour average ozone for
AQS observations, CMAQ model predictions and downscaler model results. Based on downscaler model
estimates for 2009, approximately 26 percent of the US Census tracts (17,280 out of 66,186) have at least
one day with an ozone value above 75 ppb.
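
The two summary statistics used here can be illustrated with a short Python sketch: the annual 4th
highest daily value per tract, and the share of tracts with at least one day above a threshold. The
function names and the toy records are hypothetical and are not taken from the DS code or data; only the
75 ppb threshold and the 17,280 and 66,186 tract counts come from the text above.

```python
from collections import defaultdict


def fourth_highest(values):
    """Return the 4th highest value, or None if fewer than 4 values exist."""
    ordered = sorted(values, reverse=True)
    return ordered[3] if len(ordered) >= 4 else None


def summarize(records, threshold=75.0):
    """records: iterable of (tract_id, daily_max_8hr_ppb) pairs.

    Returns the 4th highest value per tract and the share of tracts with
    at least one daily value above the threshold.
    """
    by_tract = defaultdict(list)
    for tract, value in records:
        by_tract[tract].append(value)
    fourth_max = {t: fourth_highest(v) for t, v in by_tract.items()}
    exceeding = sum(1 for v in by_tract.values() if max(v) > threshold)
    return fourth_max, exceeding / len(by_tract)


# Toy data for illustration only.
toy = [("A", 80), ("A", 78), ("A", 76), ("A", 74), ("A", 60),
       ("B", 70), ("B", 65), ("B", 64), ("B", 63)]
fourth_max, share = summarize(toy)

# Share of tracts reported in the text: 17,280 out of 66,186.
reported_percent = round(17280 / 66186 * 100)
```

In the toy data, tract A has one or more days above 75 ppb but a 4th highest value of 74 ppb, which
shows why the count of tracts with any exceedance day is larger than the count of tracts whose annual
4th max is above 75 ppb.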

Figure 5-1.  Observed, modeled and predicted annual 4th max (daily max 8-hour ozone
concentrations). Map legend (2009 4th max, daily max 8-hour avg ozone, ppb): (-Inf,55], (55,60],
(60,65], (65,70], (70,75], (75,80], (80,85], (85,90], (90,Inf].

Figure 5-2 shows the location of the 4,952 census tracts that were predicted by the downscaler model to
have annual 4th max daily max 8-hour average ozone concentrations above 75 ppb. Approximately
24.6 million people (24,638,377, of whom 1,871,717 are 65 years of age or older) live in these 4,952
census tracts. Most of the high ambient ozone concentrations are in California, followed by Texas.

Figure 5-2. Census tract locations (centroids) where annual 4th max daily maximum 8-hour average
ozone concentration estimates are above 75 ppb in 2009

Table 5-2 ranks the days in 2009 based on the combined spatial extent and intensity of the ozone
estimates above 75 ppb (only top 25 days are shown). There are 132 days on which at least one census
tract was predicted to have an ozone concentration above 75 ppb. This approach ranks the days of the
year based on two criteria: (1) spatial extent in terms of the number of census tracts where ozone
concentrations are above 75 ppb (spatial extent criterion), and (2) average ozone concentrations at those
locations (intensity criterion). Sunday, August 30, 2009 is the most intense day. It covers 2,696 census
tracts with average ozone concentrations of 91 ppb. Figure 5-2 shows the location of these census tracts
in yellow and red.  August 30th is ranked second in terms of spatial extent. The combined spatial extent
and average ozone concentration scores make this day the highest ranked ozone day in 2009. Figure 5-3
shows that the high ozone concentrations are concentrated in a relatively small geographical area.
However, this area is spatially coincident with a highly urbanized and populated area (Los Angeles, CA),
which explains the high number of census tracts. Note that the spatial extent criterion used above is
based on the number of census tracts, not on geographical extent.

   Table 5-2. Rank order of days in 2009 based on combined spatial extent and intensity of ozone
                             estimates (only top 25 days are shown)

Day                            Spatial Extent in   Spatial    Average       Intensity   Overall
                               Terms of Census     Extent     Ozone (ppb)   Ranking     Rank
                               Tract Count         Ranking    (truncated)
Sunday, August 30, 2009        2696                2          91            1           1
Saturday, August 29, 2009      2543                4          86            5           2
Saturday, July 18, 2009        2177                10         89            2           3
Sunday, September 27, 2009     2418                6          85            7           4
Sunday, June 28, 2009          1700                15         85            9           5
Wednesday, July 01, 2009       2622                3          83            23          6
Wednesday, August 19, 2009     1686                16         84            12          7
Monday, August 31, 2009        1497                20         85            10          8
Tuesday, August 18, 2009       2078                11         84            19          9
Thursday, September 03, 2009   1656                18         84            13          10
Friday, August 28, 2009        1421                21         84            14          11
Saturday, June 27, 2009        2483                5          82            31          12
Friday, September 18, 2009     1509                19         83            22          13
Saturday, May 30, 2009         2036                12         82            29          14
Thursday, July 02, 2009        2380                7          82            35          15
Sunday, July 19, 2009          1675                17         82            26          16
Saturday, September 26, 2009   1976                13         82            32          17
Wednesday, August 12, 2009     821                 40         85            6           18
Tuesday, August 11, 2009       1156                27         84            20          19
Sunday, May 17, 2009           1754                14         81            37          20
Thursday, August 20, 2009      810                 42         85            11          21
Tuesday, July 07, 2009         1075                30         83            24          22
Monday, August 17, 2009        3024                1          80            53          23
Saturday, August 01, 2009      744                 44         84            17          24
Saturday, July 25, 2009        714                 45         84            18          25

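
The report does not state exactly how the two criteria are combined. The ordering in Table 5-2 is
consistent with sorting days by the sum of their spatial extent rank and intensity rank, so the Python
sketch below uses that rule as an assumption. The function names are hypothetical; the ranks and tract
counts in the example are copied from Table 5-2.

```python
def competition_ranks(values):
    """Rank values in descending order (largest value gets rank 1).

    Tied values share the same rank number.
    """
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def combined_order(rows):
    """rows: (day, spatial_extent_rank, intensity_rank) tuples.

    Assumed rule: a lower sum of the two ranks means a higher overall rank.
    """
    return [day for day, _, _ in sorted(rows, key=lambda r: r[1] + r[2])]


# First five rows of Table 5-2, deliberately listed out of order.
rows = [
    ("2009-06-28", 15, 9),
    ("2009-07-18", 10, 2),
    ("2009-08-30", 2, 1),
    ("2009-09-27", 6, 7),
    ("2009-08-29", 4, 5),
]
order = combined_order(rows)

# Tract counts for August 30, August 17 and July 1 (Table 5-2).
extent_ranks = competition_ranks([2696, 3024, 2622])
```

With these inputs the rank sums are 3, 9, 12, 13 and 24, which reproduces the order of the first five
days in Table 5-2, and the three tract counts receive spatial extent ranks 2, 1 and 3 as in the table.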
The bottom part of Figure 5-3 shows the uncertainties (posterior standard errors) associated with the
predictions made for August 30, 2009.  Posterior standard errors are lower in the northeastern quadrant
of the US, aided by good monitor coverage (Figure 2-1) and low observed ozone concentrations. Elevated
errors can also be seen over areas where ozone concentrations are high even though the prediction
locations are not far from the monitoring sites, such as Los Angeles, CA. The general trend is that the
standard error of prediction increases with higher ozone concentrations and with prediction locations
farther away from monitoring locations.  This is best seen in the scatter plot of August 30 predictions
and their associated posterior standard errors, in which each prediction is color coded based on its
distance to the nearest ozone monitor (Figure 5-4).  Similar patterns are observed in the predictions
made for other days and are discussed throughout the document.

Figure 5-3. August 30, 2009, ozone concentrations for the 2010 US Census Tract locations predicted
by downscaler model (Top) and posterior standard deviation of the predictions (Bottom)

Figure 5-4. Scatter plot of predicted ozone concentrations on August 30, 2009 and associated
posterior standard deviations. Each prediction is color coded based on its distance to the nearest
ozone monitor. Horizontal axis: predicted ozone concentrations (ppb). Legend (distance to the
nearest ozone monitor, meters): 41-10,000; 10,001-25,000; 25,001-50,000; 50,001-75,000;
75,001-100,000; 100,001-150,000; 150,001-333,252.
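
The distance classes used in the scatter plot legend can be illustrated with the Python sketch below,
which computes the great-circle distance from a prediction location to the nearest monitor and assigns it
to a legend class. The report does not say how the distances were calculated, so the haversine formula,
the Earth radius value, and all function names here are assumptions; only the class boundaries come from
the legend.

```python
import math
from bisect import bisect_left

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (approximation)

# Upper edges of the first six legend classes, in meters.
BIN_EDGES_M = [10000, 25000, 50000, 75000, 100000, 150000]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_monitor_m(point, monitors):
    """Distance in meters from point (lat, lon) to the closest monitor."""
    return min(haversine_m(point[0], point[1], m[0], m[1]) for m in monitors)


def distance_class(distance_m):
    """Index 0..6 of the legend class; an upper edge belongs to the lower class."""
    return bisect_left(BIN_EDGES_M, distance_m)


# Hypothetical locations: one prediction point and two monitors.
d = nearest_monitor_m((0.0, 0.0), [(1.0, 0.0), (3.0, 3.0)])
cls = distance_class(d)
```

One degree of latitude is roughly 111 km, so the example point falls in the 100,001-150,000 meter class
(index 5).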

Monday, August 17, 2009 is the highest ozone day in terms of spatial extent, covering 3,024 census tracts
over the two largest metropolitan areas in the U.S., New York, NY and Los Angeles, CA, where
approximately 13,190,042 people live (Figure 5-5). Ozone concentrations average 79.91 ppb with a
maximum concentration of 96 ppb. New York City, NY and surrounding areas are the only East Coast
areas predicted by the downscaler model to have concentrations above 75 ppb. Regarding the uncertainty
associated with the August 17 predictions (Figure 5-6), standard errors of the predictions are elevated by
the high ozone concentrations observed over the New York area.  In contrast to August 30, errors in the
Southeast are lower than in the Northeast of the United States on August 17. The scatter plot of the
predicted ozone concentrations and posterior standard deviations (Figure 5-6) shows a pattern similar to
that observed on August 30 (Figure 5-4).

Figure 5-5. August 17, 2009 ozone concentrations for the 2010 US Census Tract locations predicted
by downscaler model (Top) and standard deviations of the predictions (Bottom).

Figure 5-6. Scatter plot of predicted ozone concentrations and associated standard deviations on
August 17, 2009. Each prediction is color coded based on its distance to the nearest ozone monitor.
Horizontal axis: predicted ozone concentrations (ppb). Legend (distance to the nearest ozone
monitor, meters): 41-10,000; 10,001-25,000; 25,001-50,000; 50,001-75,000; 75,001-100,000;
100,001-150,000; 150,001-333,252.

July 18, shown in Figure 5-7, is another highly ranked ozone day (2nd in intensity and 10th in spatial
extent), covering Dallas-Fort Worth, TX and Los Angeles, CA. On this day, 2,177 census tracts are
estimated to have an average ozone concentration of 89 ppb.

On Friday, May 15, 2009 and Tuesday, March 10, 2009, only one census tract is predicted to be
above 75 ppb, which is the smallest spatial extent among the high ozone days. In general, August is
the month with the highest ozone, followed by July and September, respectively.  In August, average
ambient ozone concentrations are estimated to be 83 ppb on days with ozone above 75 ppb.

Figure 5-7. July 18, 2009 ozone concentrations for the 2010 US Census Tract locations predicted by
downscaler model (Top) and standard deviations of the predictions (Bottom).

Figure 5-8. Scatter plot of predicted ozone concentrations and associated standard errors on July
18, 2009. Each prediction is color coded based on its distance to the nearest ozone monitor.
Horizontal axis: predicted ozone concentrations (ppb). Legend (distance to the nearest ozone
monitor, meters): 41-10,000; 10,001-25,000; 25,001-50,000; 50,001-75,000; 75,001-100,000;
100,001-150,000; 150,001-333,252.

Table 5-3 ranks each census tract based on the number of days on which the daily maximum 8-hour
concentration is above 75 ppb.  The associated Figure 5-9 displays the number of ozone days above 75 ppb
and their location. Based on the downscaler model estimates, census tracts "06071008602" and
"06071008706" have the highest number of days with ozone above 75 ppb (72 days each). For both
census tracts the average ozone concentration on those days is 85 ppb. On those 72 days the maximum
ozone concentrations were 116 and 115 ppb for the two tracts, respectively.  The top 18 tracts are in San
Bernardino County, California (county FIPS code "071", the 3rd through 5th characters of the Census
Tract ID), followed by tracts located in Riverside County, CA (county FIPS code "065").
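
The position of the county code within the Census Tract ID can be shown with a short Python sketch. The
function name is hypothetical, and the split assumes 11-character IDs made of a 2-digit state code, a
3-digit county code and a 6-digit tract code; the example IDs are taken from Table 5-3.

```python
from collections import Counter


def split_tract_id(tract_id):
    """Split an 11-character tract ID into (state, county, tract) codes."""
    if len(tract_id) != 11 or not tract_id.isdigit():
        raise ValueError(f"expected an 11-digit tract ID, got {tract_id!r}")
    # Characters 1-2: state; 3-5: county; 6-11: tract.
    return tract_id[:2], tract_id[2:5], tract_id[5:]


# Three tract IDs from Table 5-3.
tract_ids = ["06071008602", "06071008706", "06065043809"]

# Count tracts per (state, county) pair.
county_counts = Counter(split_tract_id(t)[:2] for t in tract_ids)
```

Here two of the three example tracts fall in county "071" (San Bernardino) and one in county "065"
(Riverside), both in state "06" (California).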

   Table 5-3. Census tract rankings based on ozone estimates (out of 17,280 census tracts that are
predicted to have at least one day with ozone concentration above 75 ppb, only top 30 are shown)

Census Tract ID   Number of Days   Ranking based on   Average Ozone Above   Maximum
                  Above 75 ppb     Ozone Days         75 ppb (truncated)    Ozone
06071008602       72               1                  85                    116.1
06071008706       72               1                  85                    115.2
06071007904       70               3                  86                    118.8
06071008705       70               3                  85                    115.3
06071008401       69               5                  86                    119.0
06071008402       69               5                  86                    118.6
06071008404       69               5                  85                    118.1
06071008601       69               5                  86                    118.0
06071008703       69               5                  85                    111.6
06071008704       69               5                  85                    113.7
06071008710       69               5                  85                    113.6
06071007901       68               12                 86                    119.1
06071007903       68               12                 86                    119.2
06071008403       68               12                 86                    117.9
06071008500       68               12                 85                    115.8
06071008800       68               12                 86                    113.4
06071008708       67               17                 86                    113.0
06071011101       67               17                 86                    117.3
06065043809       66               19                 85                    108.0
06065043811       66               19                 85                    110.1
06065044104       66               19                 85                    106.2
06071007604       66               19                 86                    119.2
06071008002       66               19                 86                    118.2
06071008200       66               19                 85                    117.1
06071008709       66               19                 86                    112.9
06071011002       66               19                 86                    117.2
06065043802       65               27                 85                    111.7
06065043823       65               27                 85                    112.4
06065044200       65               27                 85                    103.0

Figure 5-9. Number of ozone days above 75 ppb for US census tracts predicted by DS. Map legend
(ozone days above 75 ppb): 1-3 days; 4-15 days; 16-30 days; 31-45 days; 46-72 days.

The downscaler model estimates generally track both the AQS observations and the CMAQ predictions, but
they can differ from either. To see how the daily downscaler model estimates compare to the AQS
observations, we selected the monitors that are within 100 meters of a census tract centroid.  Census
tract and AQS site pairs are shown in Table 5-4. The associated Figure 5-10 shows the time series data
for the listed sites. As shown, the downscaler model estimates generally follow the AQS and CMAQ data.
Keep in mind that the downscaler concentrations are point estimates that try to replicate the point
measurement conditions of the AQS monitoring site, whereas CMAQ concentrations represent the average
conditions within 12 by 12 km grid cells.  To illustrate this difference, Figure 5-11 shows the
relationships among the AQS ozone monitoring site locations, CMAQ grid cells and the US census tract
centroids in the Los Angeles area. The CMAQ cells are on a continuous grid. EPA ozone monitor siting
criteria require States to place monitors in and around the urban areas with high populations.  The US
Census Bureau uses population size to define the census tract boundaries. Census tract population sizes
vary between 1,200 and 8,000, and the optimum size is 4,000 people. Therefore, in the Los Angeles area,
it is not surprising that the census tract locations are dense in the urban areas, where the monitoring
sites and high population areas are located.  It is also not surprising that the census tracts are sparse
in the rural areas, where there are few, if any, ozone monitors.

Table 5-4. List of AQS sites that are within 100 meters of a census tract centroid

AQS Site ID   Census Tract ID   County         State
040134004     04013811200       Maricopa       Arizona
120712002     12071010801       Lee            Florida
320032002     32003004000       Clark          Nevada
420950025     42095017800       Northampton    Pennsylvania
421010004     42101019000       Philadelphia   Pennsylvania
421250200     42125754400       Washington     Pennsylvania
170310064     17031836200       Cook           Illinois

Figure 5-10. Daily 8-hour maximum ozone concentrations measured by AQS monitor and
estimated by CMAQ model and Downscaler fusion for selected sites in Table 5-4 (Census centroids
that are within 100 meters of an AQS site). Panels: Northampton County, PA; Maricopa County, AZ;
Washington County, PA; Clark County, NV; Cook County, IL; Philadelphia County, PA; Lee County, FL.
Series: AQS, CMAQ, Downscaler.

Figure 5-11. Downscaler model predictions over Los Angeles, CA and surrounding areas. Map legend:
census tract centroids (August 17, 2009 DS-predicted ozone, ppb); AQS sites; CMAQ 12 km grids;
county boundaries; urban areas.


5.4.2  Summary of PM2.5 Results
As a summary of the overall year, Figure 5-12 and Figure 5-13 show the annual means and the 98th
percentile 24-hour average PM2.5 concentrations for AQS observations, CMAQ runs and downscaler
predictions. Based on downscaler model estimates for 2009, the 98th percentile of PM2.5 values is above
35 µg/m3 for 1,137 census tracts (Figure 5-14), averaging 43.8 µg/m3. A total of 18,056 census tracts
have at least one day with a PM2.5 concentration above 35 µg/m3, and 7,526 census tracts have PM2.5
annual average concentrations above 12 µg/m3 (mean 13.2 µg/m3).
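
The per-tract statistics in this summary can be illustrated with the Python sketch below, which computes
a 98th percentile, an annual mean, and the number of days above 35 µg/m3 from a list of daily values.
The report does not say which percentile definition was used, so the nearest-rank method shown here is an
assumption, and the function names and toy data are hypothetical.

```python
from statistics import mean


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    The rank is ceil(pct * n / 100), computed with integer arithmetic to
    avoid floating-point rounding issues.
    """
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[rank - 1]


def tract_summary(daily_values, daily_limit=35.0, annual_limit=12.0):
    """Summarize one tract's daily 24-hour average PM2.5 values (ug/m3)."""
    p98 = percentile_nearest_rank(daily_values, 98)
    annual_mean = mean(daily_values)
    return {
        "p98": p98,
        "annual_mean": annual_mean,
        "days_above_daily_limit": sum(1 for v in daily_values if v > daily_limit),
        "p98_above_daily_limit": p98 > daily_limit,
        "mean_above_annual_limit": annual_mean > annual_limit,
    }


# Toy series of 100 daily values (1, 2, ..., 100), for illustration only.
summary = tract_summary(list(range(1, 101)))
```

For a full year of 365 daily values, this nearest-rank rule selects rank 358 in ascending order, which is
the 8th highest value of the year.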

Figure 5-12. Observed, modeled and predicted annual mean PM2.5 concentrations. Map legend (2009
annual mean, 24-hour avg PM2.5, µg/m3): (0,3], (3,5], (5,8], (8,10], (10,12], (12,15], (15,18],
(18,Inf].

Figure 5-13. Observed, modeled and predicted 98th percentile 24-hour average PM2.5
concentrations. Map legend (2009 98th percentile, 24-hour avg PM2.5, µg/m3): (0,10], (10,15],
(15,20], (20,25], (25,30], (30,35], (35,40], (40,45], (45,50], (50,Inf].

Figure 5-14. Census tract locations (centroid) where the 98th percentile of 24-hour average and
annual average PM2.5 concentrations are above 35 µg/m3 (top) and 12 µg/m3 (bottom) respectively.

Table 5-5 ranks the days in 2009 based on the combined spatial extent and intensity of the 24-hour
average PM2.5 concentration estimates above 35 µg/m3 (only top 25 days are shown). There are 102 days
on which at least one census tract was predicted to have a 24-hour average PM2.5 concentration above 35
µg/m3. Similar to ozone in Section 5.4.1, this approach ranks the days of the year based on two criteria:
(1) spatial extent in terms of the number of census tracts where 24-hour average PM2.5 concentration
estimates are above 35 µg/m3 (spatial extent criterion), and (2) average PM2.5 concentrations at those
locations (intensity criterion). Overall, Thursday, January 01, 2009 is the highest ranked PM2.5 day
based on the two criteria mentioned above (Figure 5-15). Tuesday, October 13, 2009, is the most
intense day, with average concentrations of 57 µg/m3 covering 107 census tracts (Figure 5-17). Thursday,
January 22, 2009, is the highest PM2.5 day in terms of spatial extent, covering 4,589 census tracts
(Figure 5-19).

 Table 5-5. Rank order of days in 2009 based on combined spatial extent and intensity of 24-Hour
               average PM2.5 concentration estimates (only top 25 days are shown)

Day                            Spatial Extent in   Spatial    Average PM2.5   Intensity   Overall
                               Terms of Census     Extent     (µg/m3)         Ranking     Rank
                               Tract Count         Ranking    (truncated)
Thursday, January 01, 2009     4202                2          54              6           1
Thursday, December 10, 2009    1706                9          47              13          2
Friday, December 11, 2009      1391                12         48              11          3
Tuesday, January 20, 2009      836                 21         55              5           4
Wednesday, January 21, 2009    746                 24         55              3           5
Friday, January 02, 2009       3330                3          43              24          5
Thursday, January 22, 2009     4589                1          43              26          5
Thursday, December 03, 2009    652                 29         52              7           8
Friday, December 04, 2009      2534                5          43              31          8
Saturday, December 05, 2009    1142                14         44              23          10
Friday, January 16, 2009       1163                13         43              27          11
Monday, January 19, 2009       1987                7          42              33          11
Tuesday, January 13, 2009      725                 25         46              17          13
Wednesday, December 02, 2009   454                 37         50              8           14
Sunday, January 18, 2009       688                 28         46              18          15
Friday, December 25, 2009      1977                8          42              38          15
Friday, December 18, 2009      2572                4          41              43          17
Tuesday, December 29, 2009     505                 35         47              14          18
Saturday, January 17, 2009     806                 22         43              29          19
Sunday, August 30, 2009        1137                15         42              37          20
Thursday, January 15, 2009     439                 39         46              15          21
Sunday, December 20, 2009      1410                11         41              46          22
Saturday, January 31, 2009     452                 38         44              22          23
Tuesday, October 13, 2009      107                 60         57              1           24
Wednesday, January 14, 2009    419                 41         44              21          25

Figure 5-15. [Maps not recovered. Panel title: "January 1, 2009 - UNCERTAINTY, Posterior Standard
Deviation (Error)".]

Figure 5-16. Scatter plot of predicted PM2.5 concentrations and associated posterior standard
errors on January 1, 2009. Each prediction is color coded based on its distance to the nearest PM2.5
monitor. Horizontal axis: predicted PM2.5 concentrations. Legend (distance to the nearest PM2.5
monitor, meters): 41-10,000; 10,001-25,000; 25,001-50,000; 50,001-75,000; 75,001-100,000;
100,001-150,000; 150,001-333,252.

Figure 5-17. October 13, 2009 24-Hour PM2.5 concentrations for the 2010 US Census Tract
locations predicted by downscaler model (Top) and posterior standard error of the predictions
(Bottom). Kern County, CA is highlighted with red rectangle.

Figure 5-18. Scatter plot of predicted PM2.5 concentrations and associated posterior standard errors
on October 13, 2009. Each prediction is color coded based on its distance to the nearest PM2.5
monitor. Horizontal axis: predicted PM2.5 concentrations (µg/m3). Legend (distance to the nearest
PM2.5 monitor, meters): 41-10,000; 10,001-25,000; 25,001-50,000; 50,001-75,000; 75,001-100,000;
100,001-150,000; 150,001-333,252.

Figure 5-19. [Maps not recovered. Panel title: "January 22, 2009 - UNCERTAINTY, Posterior Standard
Deviation (Error)".]

Figure 5-20. Scatter plot of predicted PM2.5 concentrations and associated posterior standard
errors on January 22, 2009. Each prediction is color coded based on its distance to the nearest PM2.5
monitor. Horizontal axis: predicted PM2.5 concentrations. Plot annotation: Salt Lake City. Legend
(distance to the nearest PM2.5 monitor, meters): 41-10,000; 10,001-25,000; 25,001-50,000;
50,001-75,000; 75,001-100,000; 100,001-150,000; 150,001-333,252.

Table 5-6 ranks census tracts by the number of days on which the estimated 24-hour average PM2.5
concentration was above 35 µg/m3 and by the average concentration on those days.

Figure 5-21 shows the number and location of PM2.5 days above 35 µg/m3. Based on the downscaler
model estimates, census tract "06029002814" has the highest number of days (37), with an average
concentration of 50.8 µg/m3. The top 30 tracts are all in Kern County, California (county FIPS code
"029", given by the 3rd through 5th characters of the census tract ID).
    Table 5-6. Census tract rankings based on PM2.5 estimates (of the 18,056 census tracts
       predicted to have at least one high-concentration PM2.5 day, only the top 30 are shown)

Census Tract ID   Number of Days    Average PM2.5   Maximum PM2.5   Rank
                  Above 35 µg/m3    (µg/m3)         (µg/m3)
06029002814       37                50.8            75.3            1
06029002815       37                50.5            74.5            2
06029002816       37                50.4            74.5            3
06029002812       36                51.6            75.8            4
06029003112       37                50.1            73.7            5
06029002813       36                51.4            75.5            6
06029001802       36                51.4            75.6            7
06029001801       36                51.3            75.7            8
06029002804       36                51.3            75.4            9
06029001901       36                51.2            75.1            10
06029003113       37                49.8            73.4            11
06029002807       36                51.1            75.1            12
06029002700       36                51.0            74.9            13
06029002806       36                51.0            74.6            14
06029002818       36                51.0            74.5            15
06029003812       36                50.9            74.4            16
06029002808       36                50.9            74.3            17
06029002900       36                50.8            74.2            18
06029001902       36                50.8            74.6            19
06029003114       37                49.4            72.6            20
06029002817       36                50.7            74.3            21
06029003811       36                50.7            73.8            22
06029000507       36                50.6            74.3            23
06029002819       36                50.5            73.9            24
06029002811       36                50.5            73.7            25
06029000506       36                50.4            74.0            26
06029003808       36                50.4            73.5            27
06029001700       36                50.4            74.3            28
06029002821       36                50.4            73.5            29
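The tract IDs in Table 5-6 follow the standard 11-character census layout: 2 characters for the state, 3 for the county, and 6 for the tract. The short sketch below shows how the county code "029" can be read from an ID. The function name `split_tract_id` is hypothetical and is not part of the DS software.

```python
def split_tract_id(tract_id: str) -> dict:
    """Split an 11-character census tract ID into its FIPS components."""
    if len(tract_id) != 11 or not tract_id.isdigit():
        raise ValueError(f"expected 11 digits, got {tract_id!r}")
    return {
        "state": tract_id[0:2],   # characters 1-2: state FIPS code
        "county": tract_id[2:5],  # characters 3-5: county FIPS code
        "tract": tract_id[5:11],  # characters 6-11: tract code
    }


parts = split_tract_id("06029002814")
print(parts)  # {'state': '06', 'county': '029', 'tract': '002814'}
```

Keeping the ID as a string matters here: converting it to an integer would drop the leading zero of the state code "06".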

-------
[Figure: map of census tracts. Legend: PM2.5 days above 35 µg/m3, in classes of 1-7, 8-10, 11-20, 21-30, and 31-37 days.]
Figure 5-21. Number of days on which 24-hour average PM2.5 concentrations are above 35 µg/m3 for the US
census tracts, as predicted by DS

5.5    Accuracy Assessment of Downscaler Model Results

This section describes the predictive performance of DS in the 2009 application. The general approach
for this involves running DS with a subset (in this case 10%) of monitors removed from the monitoring
data set and predicting to the spatial points of the removed monitors.  This approach is sometimes called
"cross-validation" (CV). Errors and biases can then be calculated for each prediction point by comparing
the resulting prediction to the actual monitoring data. For this application, the default CV method in the
DS software was followed, which involves leaving out a random sample of 10% of the monitors in each
day of data. The sites left out on one day are not necessarily the same set of sites on another day.
Monitor- and day-specific errors and biases were then aggregated into the metrics below to provide an
overview of the model's accuracy.
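The daily hold-out step described above can be sketched as follows. This is a minimal illustration only: the report does not describe how the DS software draws its sample, so the function `daily_holdout`, its arguments, the fixed seed, and the site names are all assumptions made for the example.

```python
import random


def daily_holdout(monitors_by_day, fraction=0.10, seed=0):
    """For each day, randomly withhold about `fraction` of that day's monitors."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    held_out = {}
    for day, monitors in monitors_by_day.items():
        k = max(1, round(fraction * len(monitors)))
        # A fresh sample is drawn each day, so the withheld sites
        # are not necessarily the same from one day to the next.
        held_out[day] = set(rng.sample(sorted(monitors), k))
    return held_out


# Hypothetical monitor IDs for two days of data.
monitors_by_day = {
    "2009-01-22": [f"site{i:02d}" for i in range(20)],
    "2009-10-13": [f"site{i:02d}" for i in range(30)],
}
held_out = daily_holdout(monitors_by_day)
for day, sites in held_out.items():
    training = set(monitors_by_day[day]) - sites
    print(day, "withheld:", sorted(sites), "training sites:", len(training))
```

The model would then be fitted to the training sites for each day and used to predict at the withheld sites, whose observations serve as the reference values.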

First, day-specific Root Mean Square Error (RMSE_d), Mean Absolute Error (MAE_d), and bias_d (or Mean
Bias Error) were calculated to evaluate the predictive capability of the downscaler model. Daily RMSE
is defined as

$$\mathrm{RMSE}_d = \left[ n^{-1} \sum_{j=1}^{n} (P_j - O_j)^2 \right]^{1/2}$$

where $P_j$ and $O_j$ are the DS prediction and the observed concentration, respectively, at location $j$;
$d$ is the specific day of the year; and $n$ is the number of prediction-observation pairs being averaged.
Even though both the RMSE and the MAE measure the average magnitude of the errors, it is useful to
report both to diagnose the variation in the errors. Daily MAE is defined as

$$\mathrm{MAE}_d = n^{-1} \sum_{j=1}^{n} \left| P_j - O_j \right|$$

While the MAE gives equal weight to all errors, the RMSE emphasizes large errors and is most useful when
large errors are particularly undesirable. Because the RMSE is always greater than or equal to the MAE,
the difference between the two indicates the variance in the individual errors: the greater the
difference, the greater the variance. Daily bias is defined as

$$\mathrm{bias}_d = n^{-1} \sum_{j=1}^{n} (P_j - O_j) = \bar{P}_d - \bar{O}_d$$

where $\bar{P}_d$ and $\bar{O}_d$ are the model-predicted and observed daily mean concentrations, respectively.
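As a small worked illustration of the three daily metrics, the sketch below computes RMSE, MAE, and bias for one day from paired predictions and observations. The function name and the four sample values are made up for the example and are not taken from the 2009 results.

```python
import math


def error_metrics(predicted, observed):
    """Return (RMSE, MAE, bias) for paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - o for p, o in zip(predicted, observed)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # emphasizes large errors
    mae = sum(abs(e) for e in errors) / n             # equal weight to all errors
    bias = sum(errors) / n                            # signed errors can cancel
    return rmse, mae, bias


# Hypothetical predictions and observations at four withheld monitors.
predicted = [42.0, 38.5, 51.0, 47.0]
observed = [40.0, 41.5, 50.0, 43.0]
rmse, mae, bias = error_metrics(predicted, observed)
print(f"RMSE={rmse:.2f} MAE={mae:.2f} bias={bias:.2f}")  # RMSE=2.74 MAE=2.50 bias=1.00
```

The errors here are 2, -3, 1, and 4. The RMSE (2.74) exceeds the MAE (2.50) because the errors differ in magnitude, and the bias (1.00) is smaller than both because the negative error partly cancels the positive ones.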
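Secondly, location-specific Root Mean Square Error (RMSE_j), Mean Absolute Error (MAE_j), and bias_j
were calculated as:

$$\mathrm{RMSE}_j = \left[ n^{-1} \sum_{d=1}^{n} (P_d - O_d)^2 \right]^{1/2}$$

where $P_d$ and $O_d$ are the DS prediction and the observed concentration, respectively, on day $d$, and $j$
is the specific observation (monitor) location;

$$\mathrm{MAE}_j = n^{-1} \sum_{d=1}^{n} \left| P_d - O_d \right|$$

and

$$\mathrm{bias}_j = n^{-1} \sum_{d=1}^{n} (P_{dj} - O_{dj}) = \bar{P}_j - \bar{O}_j$$

where $P_{dj}$ and $O_{dj}$ are the day-$d$ prediction and observation at location $j$, and $\bar{P}_j$ and
$\bar{O}_j$ are the model-predicted and observed location-specific mean concentrations, respectively.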
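The day-specific and location-specific metrics differ only in how the cross-validation errors are grouped before averaging. The sketch below groups the same records first by day and then by site. The record layout, field names, and values are hypothetical.

```python
import math
from collections import defaultdict


def summarize(errors):
    """Return (RMSE, MAE, bias) for a list of prediction-minus-observation errors."""
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n
    return rmse, mae, bias


def metrics_by(records, field):
    """Group CV records by `field` ('day' or 'site') and summarize each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec["pred"] - rec["obs"])
    return {key: summarize(errs) for key, errs in groups.items()}


# Hypothetical CV records: one row per withheld monitor per day.
records = [
    {"day": "2009-07-01", "site": "A", "pred": 50.0, "obs": 48.0},
    {"day": "2009-07-01", "site": "B", "pred": 40.0, "obs": 43.0},
    {"day": "2009-07-02", "site": "A", "pred": 55.0, "obs": 54.0},
    {"day": "2009-07-02", "site": "C", "pred": 30.0, "obs": 34.0},
]
by_day = metrics_by(records, "day")    # RMSE_d, MAE_d, bias_d
by_site = metrics_by(records, "site")  # RMSE_j, MAE_j, bias_j
print(by_day["2009-07-01"])
print(by_site["A"])
```

Because a different random set of sites is withheld each day, the number of terms averaged for a given site depends on how many days that site happened to be withheld.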

Thirdly, to further analyze how the downscaler performed over different locations, the Getis-Ord Gi*
statistic (Getis and Ord, 1992)[16] was calculated for each RMSE_j and MAE_j, returning a z-score for
each monitor location. For statistically significant positive z-scores, the larger the z-score, the more
intense the clustering of high values, indicating relatively poor model performance. For statistically
significant negative z-scores, the smaller the z-score, the more intense the clustering of low values,
indicating better model performance. The Getis-Ord local statistic is calculated as[17]:
[16] Getis, A. and J.K. Ord. 1992. "The Analysis of Spatial Association by Use of Distance Statistics" in Geographical
Analysis 24(3).
[17] The ArcGIS 10.1 Resources: How Hot Spot Analysis works:
http://resources.arcgis.com/en/help/main/10.1/index.html#/How_Hot_Spot_Analysis_Getis_Ord_Gi_works/005p00000011000000/

-------
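$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left( \sum_{j=1}^{n} w_{i,j} \right)^2}{n - 1}}}$$

where $x_j$ is either MAE_j or RMSE_j for monitor $j$, $w_{i,j}$ is the spatial weight between monitors $i$ and $j$,
and $n$ is the total number of AQS monitors. $\bar{X}$ and $S^2$ are the sample mean and variance:

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}$$

and

$$S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$
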
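To make the Gi* calculation concrete, the sketch below evaluates the statistic as given in the ArcGIS documentation cited above, for a toy set of six monitors. The report does not state which spatial weighting scheme was used, so the binary neighbor weights (1 for a monitor and its immediate neighbors on a line, 0 otherwise) and the error values are assumptions made purely for illustration.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for each location.

    values  -- one error metric per monitor (e.g. MAE_j or RMSE_j)
    weights -- weights[i][j] is the spatial weight between monitors i and j
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        numerator = sum(wij * xj for wij, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Hypothetical errors: three high-error monitors followed by three low-error ones.
values = [5.0, 6.0, 5.5, 1.0, 1.2, 0.8]
n = len(values)
# Assumed weights: each monitor is a neighbor of itself and of adjacent monitors.
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]

for i, z in enumerate(getis_ord_gi_star(values, weights)):
    print(f"monitor {i}: Gi* z-score = {z:+.2f}")
```

In this toy case the first monitor, surrounded by high errors, gets a positive z-score, and the last monitor, surrounded by low errors, gets a negative one. The function does not guard against the degenerate cases where all values are identical or every weight in a row is equal, both of which make the denominator zero.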
5.5.1  Assessment of 8-hour Ozone Run
Daily RMSE, MAE, and bias values are depicted in Figure 5-22. Daily bias values range from -1.5 to
2 ppb. The daily RMSE_d and MAE_d range from 2.7 to 7.5 ppb and from 2.1 to 5.4 ppb,
respectively. On January 25th, the variance in the individual errors (indicated by the difference
between the RMSE and the MAE) was minimal. On April 5th, however, the variance was the highest. These
results are somewhat consistent with the test results reported by Berrocal et al. (2009), which
document the overall performance of the downscaler model. This provides some confidence that, in
general, the 2009 application of the downscaler model is performing reasonably well.

-------
[Figure: time series of Daily Bias, Daily RMSE, and Daily MAE]
Figure 5-22. Daily validation results

Figure 5-23 presents the location-specific RMSE_j and MAE_j values at 1207 monitoring
locations. Both the RMSE_j and the MAE_j show similar patterns over the US domain, with a slightly better
fit for the Eastern US than for the Western part of the country.

-------
[Figure: maps of location-specific errors at monitoring sites across the US]
Figure 5-23. The average magnitude of the errors in 2009 predictions based on the spatio-temporal
downscaler model: the Mean Absolute Error (Top) and the Root Mean Square Error (Bottom).
-------
The Getis-Ord Gi* statistic (pronounced G-i-star) was calculated for each RMSE_j and MAE_j,
returning a z-score for each monitor location (Figure 5-24). Clearly, the downscaler model performs
better over the Eastern US than over the Western part of the country. There was intense clustering of
high RMSE_j and MAE_j values in the West (statistically significant positive z-scores), indicating
relatively poor model performance. Over the Eastern US, on the other hand, there were statistically
significant negative z-scores, indicating clustering of low RMSE_j and MAE_j values, which is most
likely due to the higher density of the ozone monitoring network in the East.

Lastly, overall DQO metrics across locations and time were calculated. For 2009, the RMSE, MAE, and
bias values are 4.7, 3.3, and 0.01 ppb, respectively. The difference between the overall RMSE and MAE
is not large enough to indicate the presence of very large errors; however, there is some variation in
the magnitude of the errors.
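One way to put a number on that variation is offered here as an illustrative aside, not as part of the report's analysis. When the RMSE and MAE are computed from the same set of $N$ errors $e_i$, the difference of their squares equals the variance of the absolute errors. Applying this to the rounded overall ozone values above, and assuming both were computed from the same pooled errors, gives a spread of roughly 3.3 ppb in the absolute errors:

```latex
\mathrm{RMSE}^2 - \mathrm{MAE}^2
  = \frac{1}{N}\sum_{i=1}^{N} |e_i|^2 - \left( \frac{1}{N}\sum_{i=1}^{N} |e_i| \right)^2
  = \mathrm{Var}\left(|e|\right)

\sqrt{4.7^2 - 3.3^2} = \sqrt{22.09 - 10.89} = \sqrt{11.20} \approx 3.3\ \text{ppb}
```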

-------
[Figure: maps titled "Model Validation" showing the Gi* Z-Score for the RMSE and for the MAE at each monitor. Legend classes: < -2.58, -2.58 to -1.96, -1.96 to -1.65, -1.65 to 1.65, 1.65 to 1.96, 1.96 to 2.58, and > 2.58 Std. Dev.]
Figure 5-24. Gi* z-scores for each monitor location over the United States. The RMSE_j-based (Top)
and MAE_j-based (Bottom) Gi* z-scores show similar patterns, with very slight differences.

-------
5.5.2   Assessment of PM2.5

Daily RMSE, MAE, and bias values are depicted in Figure 5-25. Daily bias values range from -3.8
to 2 µg/m3. The daily RMSE and MAE range from 0.5 to 13.3 µg/m3 and from 0.4 to 5.2 µg/m3,
respectively. On September 27th, the variance in the individual errors (indicated by the difference
between the RMSE and the MAE) was minimal. On January 17th, the variance was the highest, though not
high enough to be of great concern.
[Figure: time series of PM2.5 Daily Bias, PM2.5 Daily RMSE, and PM2.5 Daily MAE]
Figure 5-25. Daily validation results for PM2.5

Figure 5-26 presents the location-specific RMSE and MAE values at 929 monitoring locations.
Both the RMSE and the MAE show similar patterns over the US domain, with a slightly better fit for the
Eastern US than for the Western part of the country.

-------
[Figure: maps of location-specific errors for PM2.5 at monitoring sites. MAE (PM2.5) legend: ten classes from 0.15 to 10.40.]
Figure 5-26. The average magnitude of the errors for PM2.5 in 2009 predictions based on the spatio-
temporal downscaler model: the Mean Absolute Error (Top) and the Root Mean Square Error
(Bottom).

As in the ozone assessment, the Getis-Ord Gi* statistic was calculated for each RMSE and MAE for
PM2.5, returning a z-score for each monitor location (Figure 5-27). Statistically significant
clustering of high values is observed on the West coast, indicating relatively poor model performance.
Significant clustering of low RMSE and MAE values is observed on the East coast, indicating better
model performance. As with ozone, the downscaler application for PM2.5 performs better
over the Eastern US than over the Western part of the country.

Lastly, overall DQO metrics across locations and time were calculated for PM2.5. For 2009, the RMSE,
MAE, and bias values are 2.8, 1.7, and 0.03 µg/m3, respectively. The difference between the overall
RMSE and MAE is not large enough to indicate the presence of very large errors; however, there is some
variation in the magnitude of the errors.

-------
[Figure: two maps titled "Model Validation (PM 2.5)", one showing the Gi* Z-Score for the MAE and one showing the Gi* Z-Score for the RMSE. Legend classes: < -2.58, -2.58 to -1.96, -1.96 to -1.65, -1.65 to 1.65, 1.65 to 1.96, 1.96 to 2.58, and > 2.58 Std. Dev.]
Figure 5-27. Gi* z-scores for each PM2.5 monitoring location over the United
States. The RMSE-based (Top) and MAE-based (Bottom) Gi* z-scores show similar patterns, with
very slight differences.

5.6    Summary and Conclusions

The results presented in this report are from an application of the DS fusion model for characterizing
national air quality for ozone and PM2.5. DS provided spatial predictions of daily ozone and PM2.5 at
2010 U.S. census tract centroids by utilizing monitoring data and CMAQ output for 2009. Large-scale
spatial and temporal patterns of concentration predictions are generally consistent with those seen in
ambient monitoring data. Both ozone and PM2.5 were predicted with greater accuracy in the eastern
than in the western U.S., presumably due to the greater monitoring density in the east. Another way of
summarizing results is shown in Figure 5-28, which plots the DS predictions and ambient measurements
paired together by the census tracts containing them. Data are plotted for all days and split out (in
each row) by the NOAA climate regions. The outliers seen to the right of the 1:1 lines in Figure 5-28
were found to arise in census tracts where the AQS value was substantially higher than the surrounding
CMAQ values. Sampling more points from the DS prediction surface and averaging across each census tract
may better characterize the census tract area averages; this can be explored in future analyses with
the DS model.

A major distinguishing feature of the DS output is the standard error accompanying each concentration
prediction. These standard errors give information that complements the cross-validation (CV)-determined
errors and biases. Whereas CV provides measures of accuracy, the DS-produced
uncertainties give a measure of prediction precision. Figures 5-4, 5-6, 5-8, 5-16, 5-18, and 5-20 illustrate
their utility: the errors clearly increase in magnitude as distance from the nearest monitor
increases. This demonstrates numerically the intuitively expected result that confidence in the
relationship DS models between observed and CMAQ data decreases as monitor network density decreases,
e.g., in the western U.S. A total uncertainty could theoretically be constructed by combining the
precision and bias, which could be a useful tool in future network assessment and other
sampling design activities.

An additional caution relates to the capability of DS to provide predictions at
multiple spatial points within a single CMAQ gridcell. Care needs to be taken not to over-interpret any
within-gridcell gradients that a user might produce. Fine-scale emission sources in CMAQ are
diluted into the gridcell averages, but a given source within a gridcell might or might not affect every
spatial point contained therein equally. Therefore, DS-generated fine-scale gradients are not expected to
represent actual fine-scale atmospheric concentration gradients, except possibly where multiple
monitors are present in the gridcell.

-------
[Figure: density plots of DS predictions versus AQS Concentration, with panel title PM25 and one row per NOAA Climate Region. Legible legend entries: Central, Northwest, EastNorthCentral, South, NorthEast, SouthEast, WestNorthCentral. Color scale: count, 1000 to 5000.]
Figure 5-28. Downscaler predictions in each census tract versus the AQS monitoring value in the
same census tract. Each row pools all annual data for the specified NOAA Climate Region.

-------
                               Appendix A - Acronyms
Acronyms
ARW                 Advanced Research WRF core model
BEIS                Biogenic Emissions Inventory System
BlueSky             Emissions modeling framework
CAIR                Clean Air Interstate Rule
CAMD                EPA's Clean Air Markets Division
CAP                 Criteria Air Pollutant
CAR                 Conditional Auto Regressive spatial covariance structure (model)
CARB                California Air Resources Board
CEM                 Continuous Emissions Monitoring
CHIEF               Clearinghouse for Inventories and Emissions Factors
CMAQ                Community Multiscale Air Quality model
CMV                 Commercial marine vessel
CO                  Carbon monoxide
CSN                 Chemical Speciation Network
DQO                 Data Quality Objectives
EGU                 Electric Generating Units
Emission Inventory  Listing of elements contributing to atmospheric release of pollutant
                    substances
EPA                 Environmental Protection Agency
EMFAC               Emission Factor (California's onroad mobile model)
FAA                 Federal Aviation Administration
FDDA                Four Dimensional Data Assimilation
FIPS                Federal Information Processing Standards
HAP                 Hazardous Air Pollutant
HMS                 Hazard Mapping System
ICS-209             Incident Status Summary form
IPM                 Integrated Planning Model
ITN                 Itinerant
LSM                 Land Surface Model
MOBILE              OTAQ's model for estimation of onroad mobile emissions factors
MODIS               Moderate Resolution Imaging Spectroradiometer
MOVES               Motor Vehicle Emission Simulator
NEEDS               National Electric Energy Database System
NEI                 National Emission Inventory
NERL                National Exposure Research Laboratory
NESHAP              National Emission Standards for Hazardous Air Pollutants
NH3                 Ammonia
NMIM                National Mobile Inventory Model
NONROAD             OTAQ's model for estimation of nonroad mobile emissions
NOx                 Nitrogen oxides
OAQPS               EPA's Office of Air Quality Planning and Standards
OAR                 EPA's Office of Air and Radiation
ORD                 EPA's Office of Research and Development
ORIS                Office of Regulatory Information Systems (code) - is a 4 or 5 digit
                    number assigned by the Department of Energy's (DOE) Energy
                    Information Agency (EIA) to facilities that generate electricity
ORL                 One Record per Line
OTAQ                EPA's Office of Transportation and Air Quality
PAH                 Polycyclic Aromatic Hydrocarbon
PFC                 Portable Fuel Container
PM2.5               Particulate matter less than or equal to 2.5 microns
PM10                Particulate matter less than or equal to 10 microns
PMc                 Particulate matter greater than 2.5 microns and less than 10 microns
Prescribed Fire     Intentionally set fire to clear vegetation
RIA                 Regulatory Impact Analysis
RPO                 Regional Planning Organization
RRTM                Rapid Radiative Transfer Model
SCC                 Source Classification Code
SMARTFIRE           Satellite Mapping Automatic Reanalysis Tool for Fire Incident
                    Reconciliation
SMOKE               Sparse Matrix Operator Kernel Emissions
TCEQ                Texas Commission on Environmental Quality
TSD                 Technical support document
VOC                 Volatile organic compounds
VMT                 Vehicle miles traveled
Wildfire            Uncontrolled forest fire
WRAP                Western Regional Air Partnership
WRF                 Weather Research and Forecasting Model

-------
United States                  Office of Air Quality Planning and Standards     Publication No. EPA-454/R-14-001
Environmental Protection            Air Quality Assessment Division                               January 2014
Agency                              Research Triangle Park, NC

-------