Air Toxics Data Analysis
Workbook
U.S. Environmental Protection Agency
Office of Air Quality Planning and Standards
Research Triangle Park, NC
June 2009
STI:908304.03-3224
-------
Table of Contents
(1 of 2)
Subject
Front Matter 1
Table of Contents 2
Disclaimer 4
Workbook Content Summary 5
Workbook Purpose 6
1. Introduction to Air Toxics 1
What are air toxics? 3
Why analyze ambient air toxics data? 5
Types of questions analysts may want to consider 6
Suggested analyses 7
Using the workbook 12
References 13
2. Definitions and Acronyms 1
References 14
3. Background 1
Air toxics overview 3
Health risks from air toxics 4
Air toxics emissions 5
Physical properties 7
Formation, destruction, transport 8
History of sampling 11
Air toxics sampling and analysis 19
Critical issues for interpretation 22
Resources 24
Appendix 25
References 26
4. Preparing Data for Analysis 1
What data are available 5
Data completeness 22
Method Detection Limits 28
Data validation 43
Summary 60
Appendix 61
Resources 65
Treating data
-------
Table of Contents
(2 of 2)
Subject
5. Characterizing Air Toxics 1
Temporal patterns 5
Spatial patterns 36
Risk screening 69
Summary 73
Resources 75
References 76
6. Quantifying and Interpreting Trends in Air Toxics 1
Quantifying trends 18
Visualizing trends 21
Summarizing trends 28
Resources 48
Summary 49
Additional reading 51
References 53
7. Advanced Analyses 1
Source apportionment 4
Trajectory analyses 18
Emission inventory evaluation 25
Evaluating models 29
Network assessment 33
Resources 44
References 45
8. Suggested Analyses 1
Motivation 2
Data completeness 13
Validation techniques 17
Summary 40
References 42
June 2009
Front Matter
-------
Disclaimer
The information and procedures set forth here are intended as a technical resource
to those conducting analysis of air toxics monitoring data. This document does not
constitute rulemaking by the Agency and cannot be relied on to create a substantive
or procedural right enforceable by any party in litigation with the United States. As
indicated by the use of non-mandatory language such as "may" and "should," it
provides recommendations and does not impose any legally binding requirements. In
the event of a conflict between the discussion in this document and any Federal
statute or regulation, this document would not be controlling. The mention of
commercial products, their source, or their use in connection with material reported
herein is not to be construed as actual or implied endorsement of such products. This
is a living document and may be revised periodically.
The Environmental Protection Agency welcomes public input on this document at any
time. Comments should be sent to Barbara Driscoll (driscoll.barbara@epa.gov).
-------
Workbook Content Summary
• Introduction
Brief overview of the workbook and its motivation.
• Definitions and acronyms
• Background
Summary of air toxics information to provide a basis for the analyst regarding
emissions, formation, transport, and sampling/analysis of air toxics.
• Preparing data for analysis
Methods and examples for validating air toxics data and preparing daily,
quarterly, and annual averages.
• Characterizing air toxics
Methods and examples of characterizing air toxics concentrations including spatial
patterns, relationships, and time of day/seasonal variations.
• Quantifying trends in air toxics
Methods and examples for preparing data for inter-annual trend analyses, identifying
and quantifying trends, and tying these trends to changes in emissions.
• Advanced data analysis techniques
Brief overview of advanced methods for data analysis including source apportionment.
• Suggested analyses
Summary of basic set of analyses that could be performed with air toxics at a local,
state, and regional level to better understand the data and inform policy makers.
-------
Workbook Purpose
• This workbook was designed to
- serve as an overview of the sizeable topic of air toxics data analysis;
- provide suggestions on the methodology to use in analyzing air toxics data,
building on the experience gained in the past several years of national level
data analysis efforts; and
- document current methodology being used in national data analysis efforts.
• The workbook contains a different topic area in each section. Distinctions
between methods used to assess the data at a national level and methods
that can be applied at a site level are provided.
• Sections contain a range of information and examples. Basic knowledge of
summary statistics and data analysis techniques is assumed. The more
advanced analyses or statistical techniques are separately discussed.
• Figures are used to show example analyses. The figures are not intended
to show the only way in which to perform an analysis but rather to provide
the analyst with a starting point. Most figure captions list the tool used to
present the data, the data used in the analysis, an observation or
interpretation point, and a reference. When a reference is not provided, the
figure was prepared by the workbook authors specifically for the workbook.
• References are provided at the end of each section.
-------
Introduction to
Air Toxics
June 2009 Section 1 - Introduction to Air Toxics
-------
Introduction to Air Toxics
What's Covered in This Section?
What are air toxics?
Why analyze ambient air toxics data?
Types of questions analysts want to answer
Suggested analyses overview
Using the workbook
-------
What Are Air Toxics?
There are 188 hazardous air pollutants (HAPs) defined in the Clean Air Act Amendments
of 1990. HAPs are also referred to as air toxics, which is a broader term and includes
additional pollutants such as hydrogen sulfide. For this document, the two terms "HAPs"
and "air toxics" will be used interchangeably. Air toxics are those pollutants known or
suspected to cause cancer or other serious health effects, such as reproductive effects
or birth defects.
Examples of toxic air pollutants include
- Benzene, which is found in gasoline.
- Perchloroethylene, which is emitted from some dry cleaning facilities.
- Methylene chloride, which is used as a solvent and paint stripper by a number of industries.
- Metals such as arsenic, mercury, chromium, and lead compounds, which are emitted, for
example, from metal processing operations.
- Semivolatile organic compounds (SVOCs) such as naphthalene, which is emitted in petroleum
refining and fossil fuel and wood combustion.
Most air toxics originate from anthropogenic sources, including mobile sources (e.g.,
cars, trucks, buses), stationary sources (e.g., factories, refineries, power plants), and
indoor sources (e.g., some building materials and cleaning solvents). Some air toxics
are also released from natural sources such as volcanic eruptions and forest fires.
EPA is working with state, local, and tribal governments to reduce air toxics releases to
the environment.
- EPA has issued rules covering over 80 categories of major industrial sources, such as chemical
plants, oil refineries, aerospace manufacturers, and steel mills, as well as categories of smaller
sources, such as dry cleaners, commercial sterilizers, secondary lead smelters, and chromium
electroplating facilities.
- EPA and state governments (e.g., California) have reduced emissions of benzene, toluene, and
other air toxics from mobile sources by requiring the use of reformulated gasoline and placing
limits on tailpipe emissions.
-------
List of 188 Hazardous Air Pollutants
1,1,2,2-Tetrachloroethane
1,1,2-Trichloroethane
1,1-Dichloroethane
1,1-Dichloroethylene
1,2,4-Trichlorobenzene
1,2-Dichloropropane
1,3-Butadiene
1,4-Dichlorobenzene
2,2,4-Trimethylpentane
Acetaldehyde
Acetonitrile
Acrolein
Acrylonitrile
Antimony (Tsp)
Antimony Pm2.5 Lc
Arsenic (Pm10) Stp
Arsenic (Tsp)
Arsenic Pm2.5 Lc
Benzene
Benzyl Chloride
Beryllium (Pm10) Stp
Beryllium (Tsp)
Bromoform
Bromomethane
Cadmium (Pm10) Stp
Cadmium (Tsp)
Cadmium Pm2.5 Lc
Carbon Disulfide
Carbon Tetrachloride
Chlorine Pm2.5 Lc
Chlorobenzene
Chloroethane
Chloroform
Chloromethane
Chloroprene
Chromium (Pm10) Stp
Chromium (Tsp)
Chromium Pm2.5 Lc
Cobalt (Pm10) Stp
Cobalt (Tsp)
Cobalt Pm2.5 Lc
Dichloromethane
Ethyl Acrylate
Ethylbenzene
Ethylene Dibromide
Ethylene Dichloride
Formaldehyde
Hexachlorobutadiene
Isopropylbenzene
Lead (Pm10) Stp
Lead (Tsp)
Lead Pm2.5 Lc
M/P-Xylene
Manganese (Pm10) Stp
Manganese (Tsp)
Manganese Pm2.5 Lc
Mercury (Tsp)
Mercury Pm2.5 Lc
Methyl Chloroform
Methyl Isobutyl Ketone
Methyl Methacrylate
Methyl Tert-Butyl Ether
Naphthalene
N-Hexane
Nickel (Pm10) Stp
Nickel (Tsp)
Nickel Pm2.5 Lc
O-Xylene
Phosphorus Pm2.5 Lc
Propionaldehyde
Selenium (Pm10) Stp
Selenium (Tsp)
Selenium Pm2.5 Lc
Styrene
Tetrachloroethylene
Toluene
Trichloroethylene
Vinyl Acetate
Vinyl Chloride
1,2-Dibromo-3-Chloropropane
1,3-Dichloropropene(Total)
1,4-Dioxane
2,4,5-Trichlorophenol
2,4,6-Trichlorophenol
2,4-Dinitrophenol
2,4-Dinitrotoluene
3-Chloropropene
4,6-Dinitro-2-Methylphenol
4-Nitrophenol
Aniline
Antimony (Pm10) Stp
Antimony Pm10 Lc
Arsenic Pm10 Lc
Beryllium Pm10 Lc
Biphenyl
Bis (2-Chloroethyl)Ether
Bis(2-Ethylhexyl)Phthalate
Cadmium Pm10 Lc
Caprolactam
Chlorine (Tsp)
Chlorine Pm10 Lc
Chromium (Coarse Particulate)
Chromium Pm10 Lc
Cobalt Pm10 Lc
Dibenzofurans
Dimethyl Phthalate
Di-N-Butyl Phthalate
Ethylene Oxide
Heptachlor
Hexachlorobenzene
Hexachlorocyclopentadiene
Hexachloroethane
Isophorone
Lead Pm10 Lc
Lindane
Manganese (Coarse Particulate)
Manganese Pm10 Lc
Mercury (Pm10) Stp
Mercury (Vapor)
Mercury Pm10 Lc
Methanol
Methoxychlor
M-Xylene
Nickel (Coarse Particulate)
Nickel Pm10 Lc
Nitrobenzene
O-Cresol
P-Cresol
Pentachlorophenol
Phenol
Phosphorus (Tsp)
Phosphorus Pm10 Lc
P-Xylene
Selenium Pm10 Lc
Xylene(S)
1,1-Dimethyl hydrazine
1,2-Diphenylhydrazine
1,2-Epoxybutane
1,2-Propylenimine
1,3-Propane sultone
2,3,7,8-Tetrachlorodibenzo-p-dioxin
2,4-D, salts and esters
2,4-Toluene diamine
2,4-Toluene diisocyanate
2-Acetylaminofluorene
2-Chloroacetophenone
2-Nitropropane
3,3-Dichlorobenzidene
3,3-Dimethoxybenzidine
3,3'-Dimethyl benzidine
4,4-Methylene bis(2-chloroaniline)
4,4-Methylenedianiline
4-Aminobiphenyl
4-Nitrobiphenyl
Acetamide
Acetophenone
Acrylamide
Acrylic acid
Asbestos
Benzidine
Benzotrichloride
beta-Propiolactone
Bis(chloromethyl)ether
Calcium cyanamide
Captan
Carbaryl
Carbonyl sulfide
Catechol
Chloramben
Chlordane
Chloroacetic acid
Chlorobenzilate
Chloromethyl methyl ether
Coke Oven Emissions
Cresols/Cresylic acid
Cyanide Compounds
DDE
Diazomethane
Dichlorvos
Diethanolamine
Diethyl sulfate
Dimethyl aminoazobenzene
Dimethyl carbamoyl chloride
Dimethyl formamide
Dimethyl sulfate
Epichlorohydrin
Ethyl carbamate (Urethane)
Ethylene glycol
Ethylene imine (Aziridine)
Ethylene thiourea
Fine mineral fibers
Glycol ethers
Hexamethylene-1,6-diisocyanate
Hexamethylphosphoramide
Hydrazine
Hydrochloric acid
Hydrogen fluoride
Hydrogen sulfide
Hydroquinone
Maleic anhydride
m-Cresol
Methyl hydrazine
Methyl iodide (Iodomethane)
Methyl isocyanate
Methylene diphenyl diisocyanate
N,N-Diethyl aniline
N-Nitrosodimethylamine
N-Nitrosomorpholine
N-Nitroso-N-methylurea
o-Anisidine
o-Toluidine
Parathion
Pentachloronitrobenzene
Phosgene
Phosphine
Phthalic anhydride
Polychlorinated biphenyls
Polycyclic Organic Matter
p-Phenylenediamine
Propoxur (Baygon)
Propylene oxide
Quinoline
Quinone
Radionuclides (including radon)
Styrene oxide
Titanium tetrachloride
Toxaphene
Triethylamine
Trifluralin
Vinyl bromide
Abundance of data: > 20 monitoring sites with sufficient data to create a valid annual average between 2003-2005, up to 434 sites
Little data: < 20 monitoring sites with sufficient data to create a valid annual average between 2003-2005, between 1-17 sites
No data: No valid annual averages between 2003-2005
From: http://www.epa.gov/ttn/atw/188polls.html
-------
Why Analyze Ambient Air Toxics Data?
National level analyses provide an overview of the air toxics program
and build on the power of a large data set to find the central
tendencies in the data. Data anomalies at an individual site have little
influence on the overall results on a national scale.
On a site-by-site basis, a much finer level of detail is needed to
understand the characteristics and trends observed. Knowledge is
needed of the nearby sources, operating schedules, facility upsets
and closures, new emission sources, types of emissions, types of
controls and scheduled implementation, data reporting and quality
issues, changes in sampling and methodology, local meteorology,
and other details to fully understand changes in ambient pollutant
concentrations.
States collecting data have unique "local" perspectives on data
quality, meteorology, and sources, and are well placed to articulate
policy-relevant data analysis questions.
Air toxics data analysis is needed at all levels to track progress in risk
reduction.
-------
Types of Questions Analysts May Want to Consider
How do I ensure that the data I plan to use for analysis are of good quality?
- How do I treat data below detection? What kinds of data metrics do I need for subsequent
analyses? (See Preparing Data for Analysis, Section 4)
How do air toxics concentrations change spatially and by time of day, day of week, and
season?
- Which air toxics have similar patterns? (See Characterizing Air Toxics, Section 5)
- Do these air toxics have common sources? (See Background, Section 3)
What are the most important air toxics in terms of potential risk?
- Are we measuring them and, if so, are we measuring them well? Where are they
important?
- Which pollutants are not monitored well enough to characterize their risk or hazard? (See
Advanced Analyses, Section 7)
How do concentration levels for a given city/area compare to other cities?
- Are concentrations comparable? What is the variability of air toxics concentrations within
cities? Do specific cities, states, or regions experience demonstrably higher or lower
concentrations? Do rural and remote sites show demonstrably lower concentrations? Are
there differences in concentrations associated with geo-political or agency differences?
(See Characterizing Air Toxics, Section 5)
Have air toxics concentrations declined over time in response to emission control programs?
(See Quantifying and Interpreting Trends in Air Toxics, Section 6)
How do the most important air toxics compare with model output (e.g., are ambient
concentrations high in locations not shown by the model)? (See Characterizing Air Toxics,
Section 5)
-------
Suggested Analyses
Overview
A list of suggested air toxics data analyses is compiled here to provide
direction on those analyses that may be performed by air toxics monitoring
agencies and to give an overview of analyses covered in the workbook.
EPA compiled this list of suggested air toxics data analyses based on analyses
that would help regional, state, and local organizations determine which factors
contribute to air toxics concentrations in their area and whether the control
strategies they have implemented have been successful at reducing these
pollutants.
This list is a suggested set of analyses that each area may wish to use to help
understand air toxics concentrations in the area. There are several key areas
of interest:
- Are data of sufficient quality for analysis?
- How would air toxics be characterized in the area?
- What are local sources of air toxics?
- Do toxics concentrations change over time?
For the most informative results, some of these analyses could be performed
annually.
-------
Suggested Analyses (1 of 4)
Are data of sufficient quality for analysis?
Question: How have data been validated?
Example analyses: Run screening checks on data from AQS; identify outliers
Question: Does suspect data quality appear in any years or species measurements?
Example analyses: Review collocated data; inspect summary statistics and concentration ranges; review time series plots of concentrations and detection limits
Question: Have data been censored?
Example analyses: Assess concentration distributions; compare concentrations to detection limits
Question: Are sufficient samples available for detailed analyses?
Example analyses: Determine number of samples/species with concentrations above detection
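The last check in this table (counting samples per species with concentrations above detection) can be sketched in a few lines. This is a minimal illustration, not the workbook's procedure: the record layout (species, concentration, MDL), the function name percent_above_mdl, and the sample values are all assumptions made for the example and do not reflect the AQS schema or real monitoring data.

```python
from collections import defaultdict


def percent_above_mdl(records):
    """Return {species: (n_samples, percent_above_mdl)}.

    Each record is a (species, concentration, mdl) tuple. This layout is
    hypothetical and chosen only for illustration.
    """
    totals = defaultdict(int)
    above = defaultdict(int)
    for species, conc, mdl in records:
        totals[species] += 1
        if conc > mdl:
            above[species] += 1
    return {s: (totals[s], 100.0 * above[s] / totals[s]) for s in totals}


# Illustrative values only (not real monitoring data).
samples = [
    ("benzene", 1.20, 0.05),
    ("benzene", 0.80, 0.05),
    ("benzene", 0.03, 0.05),
    ("benzene", 0.95, 0.05),
    ("acrolein", 0.02, 0.10),
    ("acrolein", 0.15, 0.10),
]

summary = percent_above_mdl(samples)
for species, (n, pct) in sorted(summary.items()):
    print(f"{species}: {n} samples, {pct:.0f}% above MDL")
```

A species with few samples above detection may not support the more detailed analyses, so a summary like this is a natural first screen.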
-------
Suggested Analyses (2 of 4)
What is the nature and extent of air toxics problems in your area?
Question: What are the most abundant air toxics at each site on a risk-weighted basis?
Example analyses: Determine median concentrations and concentration ranges and compare to appropriate risk levels
Question: How do these species vary by measurement season, month, and time of day? Are findings consistent with national level results?
Example analyses: Prepare box plots of concentrations by season, month, and time of day; compare to national results and expectations based on local conditions
Question: Do species show any day-of-week patterns?
Example analyses: Prepare box plots of concentrations by day of week; compare results to expected patterns of local emissions
Question: How do concentrations compare to other locations, risk levels, remote background, or reference concentrations?
Example analyses: Compare monitor-level data to national-perspective plots
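The seasonal comparison suggested above rests on grouping samples by season and summarizing each group. The sketch below computes the numbers a simple box plot would show (count, minimum, median, maximum). The month-to-season mapping, the function name box_stats_by_season, and the sample values are assumptions for illustration; an analyst may prefer a different season definition or additional percentiles.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Assumed season definition (December-February = winter, and so on).
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def box_stats_by_season(samples):
    """Group (date, concentration) pairs by season and summarize each group."""
    groups = defaultdict(list)
    for day, conc in samples:
        groups[SEASONS[day.month]].append(conc)
    return {
        season: {
            "n": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
        }
        for season, values in groups.items()
    }


# Illustrative values only (not real monitoring data).
samples = [
    (date(2005, 1, 6), 1.8),
    (date(2005, 1, 18), 2.2),
    (date(2005, 2, 11), 2.0),
    (date(2005, 7, 5), 0.9),
    (date(2005, 7, 17), 1.1),
    (date(2005, 8, 10), 0.7),
]

stats = box_stats_by_season(samples)
for season, s in stats.items():
    print(season, s)
```

The same grouping idea applies to month, day of week, or time of day by swapping the key used to build the groups.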
-------
Suggested Analyses (3 of 4)
What are local sources of air toxics?
Question: What are the potential toxics sources in the area?
Example analyses: Investigate Google map of area; overlay VOC, PM2.5, and air toxics emission inventory information
Question: Do the air toxics corroborate the source mixture?
Example analyses: Examine key species noted as tracers for the expected sources in the area using scatter plots and correlation matrices; compare concentrations of air toxics and nontoxic tracer species to further assess sources (e.g., PM2.5 components, hydrocarbons)
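A correlation matrix of the kind mentioned above can be built from paired measurements of each species. The sketch below uses the Pearson correlation coefficient computed by hand so that it needs no third-party packages. The species chosen, the function names, and the concentrations are illustrative assumptions; the values must be paired by sample (same site and date) for the result to be meaningful.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(series):
    """Return {(species_a, species_b): r} for every pair of species."""
    names = sorted(series)
    return {(a, b): pearson(series[a], series[b]) for a in names for b in names}


# Illustrative paired concentrations (not real monitoring data).
series = {
    "benzene": [1.0, 2.0, 3.0, 4.0],
    "toluene": [2.1, 3.9, 6.2, 7.8],
    "formaldehyde": [3.0, 1.0, 4.0, 2.0],
}

matrix = correlation_matrix(series)
for (a, b), r in sorted(matrix.items()):
    if a < b:
        print(f"{a} vs {b}: r = {r:.2f}")
```

A high correlation between two species is consistent with a common source but does not prove one; shared meteorology can also produce it.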
-------
Suggested Analyses (4 of 4)
Do air toxics concentrations change over time?
Question: What are the annual trends in air toxics concentrations?
Example analyses: Prepare annual box plots of key species to evaluate trends
Question: How might changes in air toxics concentrations be related to emissions controls?
Example analyses: Compare trends in co-emitted pollutants; assess timing of controls and expected reductions relevant to local monitoring of pollutants.
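As a rough first look at annual trends, the medians behind annual box plots can be computed and a straight line fitted through them. The sketch below is only that: the ordinary least-squares slope is an assumed stand-in chosen for simplicity, not necessarily the trend method used in the national analyses, and the function names and values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def annual_medians(samples):
    """Return {year: median concentration} from (year, concentration) pairs."""
    by_year = defaultdict(list)
    for year, conc in samples:
        by_year[year].append(conc)
    return {year: median(values) for year, values in sorted(by_year.items())}


def linear_slope(points):
    """Ordinary least-squares slope of y against x for a {x: y} mapping."""
    xs = list(points)
    ys = [points[x] for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Illustrative values only (not real monitoring data).
samples = [
    (2003, 2.0), (2003, 2.4), (2003, 2.2),
    (2004, 1.8), (2004, 2.0), (2004, 1.9),
    (2005, 1.5), (2005, 1.7), (2005, 1.6),
]

medians = annual_medians(samples)
slope = linear_slope(medians)
print(medians)
print(f"change per year: {slope:.2f}")
```

Three years of data are far too few to establish a trend; the example only shows the mechanics.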
-------
Using the Workbook
• This workbook documents methodology used in national-scale
analyses, extends these methodologies to possible use in local-
scale analyses, and suggests methodology for further exploration.
• Skills needed by analysts to conduct the analyses shown in this
workbook vary. Analyses require a range of tools, skills, and
knowledge. A fundamental understanding of databases,
spreadsheets, and summary statistics is desirable. Some
analyses require special training (e.g., source apportionment
tools) and/or tools (e.g., sophisticated statistical treatments). In
general, analyses described in the following sections are arranged
from "easiest" to "most difficult" to perform.
• Examples are provided from the national-scale analyses and
some analyses were custom-designed for the workbook.
• Space available in the workbook is limited; therefore, many details
are, of necessity, provided in the literature. A reference section is
provided at the end of each chapter.
-------
References
Agency for Toxic Substances and Disease Registry (ATSDR)
(2007) Frequently asked questions about contaminants found at
hazardous waste sites. Available on the Internet at
http://www.atsdr.cdc.gov/toxfaq.html.
U.S. Environmental Protection Agency, FERA (Fate, Exposure
and Risk Analysis) Risk Assessment and Modeling web site.
Available on the Internet at
http://www.epa.gov/ttn/fera/risk_atoxic.html.
U.S. Environmental Protection Agency (2007a) EPA air toxics web
site. Available on the Internet at
http://www.epa.gov/ttn/atw/allabout.html.
U.S. Environmental Protection Agency (2007b) About air toxics,
health and ecological effects. Available on the Internet at
http://www.epa.gov/air/toxicair/newtoxics.html.
-------
Definitions and Acronyms
This section lists
definitions of terms
and acronyms used
in this workbook.
-------
Definitions and Acronyms
(1 of 12)
Aerosol A particle of solid and/or liquid matter that can remain suspended in the air because of
its small size (generally under one micron).
AIRNow The U.S. EPA, NOAA, tribal, state, and local agencies developed the AIRNow web site
to provide the public with easy access to national air quality information. The web site offers
daily air quality index (AQI) forecasts as well as real-time AQI conditions for over 300 cities
across the United States, and provides links to more detailed state and local air quality web
sites.
Airshed A geographic area that, because of topography, meteorology, and/or climate, is
frequently affected by the same air mass.
Air Toxics - Any pollutant that causes or may cause cancer, respiratory, cardiovascular, or
developmental effects, reproductive dysfunctions, neurological disorders, heritable gene
mutations, or other serious or irreversible chronic or acute health effects in humans. See
hazardous air pollutant.
AMTIC - Ambient Monitoring Technology Information Center. An EPA website that contains
information and files on ambient air quality monitoring programs, details on monitoring methods,
monitoring-related documents and articles, information on air quality trends and nonattainment
areas, and federal regulations related to ambient air quality monitoring.
Anthropogenic Caused or produced by human activities.
Anthropogenic emissions Emissions from man-made sources as opposed to natural (biogenic)
sources.
AQS Air Quality System; the EPA's repository of ambient air quality data.
Back trajectory A trace backwards in time showing where an air mass has been.
June 2009 Section 2 - Definitions and Acronyms
-------
Definitions and Acronyms
(2 of 12)
Background Levels The concentration of a chemical already present in an environmental
medium due to sources other than those under study. Two types of background levels may exist
for chemical substances: (a) naturally occurring levels of substances present in the
environment, and (b) concentrations of substances present in the environment due to
human-associated activities (e.g., automobile or industrial emissions).
Benchmark Dose An exposure due to the dose of a substance associated with a specified low
incidence of risk, generally in the range of 1% to 10% of a health effect; or the dose associated
with a specified measure or change of a biological effect.
Black Carbon (BC) Black carbon measured using light absorption, typically with an
Aethalometer™. Used in the air toxics monitoring network as a potential surrogate measure
(although not unique or quantitative) of diesel particulate matter.
Cancer benchmark A potential regulatory threshold concentration of concern related to long term
exposure to a chemical associated with increased cancer risk.
Cancer Incidence The number of new cases of a disease diagnosed each year.
Cancer Risk Estimates The probability of developing cancer from exposure to a chemical agent
or a mixture of chemicals over a specified period of time. In quantitative terms, risk is
expressed in values ranging from zero (representing an estimate that harm certainly will not
occur) to one (representing an estimate that harm certainly will occur). The following are
examples of how risk is commonly expressed: 1.E-04 or 1 x 10^-4 = a risk of 1 additional cancer
in an exposed population of 10,000 people (i.e., 1/10,000); 1.E-05 or 1 x 10^-5 = 1/100,000.
Cd Cadmium.
Censored Data Data in which the measured value is replaced with a proxy. Typical proxies are MDL,
MDL/2, MDL/10, or zero.
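The proxy choices listed under Censored Data can change summary statistics noticeably, which a short comparison makes visible. The sketch below is an illustration under stated assumptions: values below the MDL (or recorded as None) are treated as the ones to replace, and the function name substitute_nondetects, the proxy labels, and the concentrations are hypothetical rather than taken from the workbook.

```python
def substitute_nondetects(values, mdl, proxy="half"):
    """Replace values below the MDL (or None) with the chosen proxy.

    proxy is one of "mdl", "half" (MDL/2), "tenth" (MDL/10), or "zero".
    Treating every value below the MDL as censored is an assumption here.
    """
    proxies = {"mdl": mdl, "half": mdl / 2, "tenth": mdl / 10, "zero": 0.0}
    fill = proxies[proxy]
    return [fill if v is None or v < mdl else v for v in values]


# Illustrative values only (not real monitoring data).
measured = [0.40, 0.02, 0.75, 0.01]
mdl = 0.05

for choice in ("mdl", "half", "tenth", "zero"):
    filled = substitute_nondetects(measured, mdl, choice)
    mean = sum(filled) / len(filled)
    print(f"{choice:>5}: mean = {mean:.4f}")
```

When a large share of samples is below detection, the spread between these means grows, which is one reason to check the detection rate before choosing a treatment.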
-------
Definitions and Acronyms
(3 of 12)
Census tract Census tracts are small, relatively permanent statistical subdivisions of a county.
Census tracts are delineated for most metropolitan areas (MAs) and other densely populated
counties by local census statistical areas committees following Census Bureau guidelines
(more than 3,000 census tracts have been established in 221 counties outside MAs). Six states
(California, Connecticut, Delaware, Hawaii, New Jersey, and Rhode Island) and the District of
Columbia are covered entirely by census tracts. Census tracts usually represent between
2,500 and 8,000 people and, when first delineated, are designed to be homogeneous with
respect to population characteristics, economic status, and living conditions. Census tracts do
not cross county boundaries. The spatial size of census tracts varies widely depending on the
density of settlement.
-------
Definitions and Acronyms
(4 of 12)
Conditional probability function (CPF) A method that analyzes local source impacts from
varying wind directions using the source contribution estimates from PMF coupled with the
corresponding wind directions.
Confidence Interval (CI) A CI for a population parameter is an interval with an associated
probability p that is generated from a random sample of an underlying population such that if the
sampling was repeated numerous times and the confidence interval recalculated from each
sample according to the same method, a proportion p of the confidence intervals would contain
the population parameter in question.
Covariance A statistical measure of correlation of the fluctuations of two different quantities.
Cr Chromium.
Data Quality The encompassing term regarding the quality of information used for analysis and/or
dissemination of data. Utility, objectivity, and integrity are essential parts of data quality.
Data Quality Objectives (DQOs) Qualitative and quantitative statements derived from the DQO
process that clarify study objectives, define the appropriate type of data, and specify tolerable
levels of potential decision errors that will be used as the basis for establishing the quality and
quantity of data needed to support the decisions.
Data Quality Objectives Process A systematic planning tool to facilitate the planning of
environmental data collection activities. Data quality objectives are the qualitative and
quantitative outputs from the DQO process.
Detection limit (DL) The lowest concentration of a chemical that can be reliably distinguished
from a zero concentration with analytical methods. See also method detection limit.
-------
Definitions and Acronyms
(5 of 12)
Dispersion model A source-oriented approach in which a pollutant emission rate and
meteorological information are input into a mathematical model that disperses (and may also
chemically transform) the emitted pollutant, generating a prediction of the resulting pollutant
concentration at a point in space and time.
DPM Diesel particulate matter.
Edge A line that defines the boundary of the relationship between two parameters on a scatter
plot.
Elemental carbon (EC) Black carbon material with little or no hydrogen; non-volatile carbon
material; often called black carbon or soot.
Emission Inventory (EI) A list of air pollutants emitted into a community's atmosphere in
amounts (commonly tons) per day or year, by type of source.
EPA U.S. Environmental Protection Agency.
EPA PMF A standalone version of PMF created by the EPA in 2005.
Environmental justice The fair treatment and meaningful involvement of all people regardless of
race, color, national origin, or income with respect to the development, implementation, and
enforcement of environmental laws, regulations, and policies.
F-test The F-test provides a statistical measure of the confidence that a relationship exists
between the two variables (i.e., the regression line does not have a slope of zero, which would
indicate the dependent variable is not related to the independent variable).
F-value Output of the F-test. Large F-values indicate a stronger correlation between the two
variables (i.e., the slope of the regression line is NOT zero).
Factor analysis A procedure for grouping data by similarity among variables (i.e., variables that
are highly correlated are grouped).
Factor strength (source strength). See Source contribution.
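The F-value defined on this page can be computed directly for a simple linear regression with one independent variable: it is the ratio of the variance explained by the regression line to the residual variance. The sketch below is a minimal illustration with hypothetical data and a hypothetical function name; it stops at the F-value and does not compute a p-value, which would require the F distribution, and it assumes the fit is not perfect (a zero residual would cause a division by zero).

```python
def regression_f_value(x, y):
    """F-value for a least-squares line fitted to y against x.

    F = (regression sum of squares / 1) / (residual sum of squares / (n - 2)).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_regression = sum((intercept + slope * a - mean_y) ** 2 for a in x)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return ss_regression / (ss_residual / (n - 2))


# Illustrative values only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_related = [2.1, 3.9, 6.2, 7.8, 10.1]   # closely follows a line
y_unrelated = [5.0, 3.0, 6.0, 2.0, 4.0]  # little dependence on x

f_related = regression_f_value(x, y_related)
f_unrelated = regression_f_value(x, y_unrelated)
print(f"related: F = {f_related:.1f}")
print(f"unrelated: F = {f_unrelated:.2f}")
```

The first data set yields a very large F-value and the second a value below one, matching the glossary's reading that large F-values indicate a slope that is not zero.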
-------
Definitions and Acronyms
(6 of 12)
Federal Reference Method (FRM) Provides for the measurement of the mass concentration of
fine particulate matter having an aerodynamic diameter less than or equal to a nominal 2.5
microns (PM2.5) in ambient air over a 24-hr period for purposes of determining whether the
primary and secondary National Ambient Air Quality Standards for fine particulate matter are
met. Designation of a particle sampler as a Federal Reference Method (FRM) is based on a
demonstration that a vendor's instrument meets the design specifications, performance
requirements, and quality control standards specified in the regulation.
Fine particles Particulate matter with diameter less than 2.5 microns; PM2.5.
HAPs (hazardous air pollutants) Hazardous air pollutants, also known as air toxics, have been
associated with a number of adverse human health effects, including cancers, asthma and
other respiratory ailments, and neurological problems such as learning disabilities and
hyperactivity.
Hazard Quotient (HQ) The ratio of a single substance exposure level over a specified time period
(e.g., chronic) to a reference value (e.g., an RfC) for that substance derived from a similar
exposure period.
HYSPLIT HYbrid Single-Particle Lagrangian Integrated Trajectory model; a system for computing
simple air parcel trajectories.
IMPROVE Interagency Monitoring of Protected Visual Environments. A collaborative monitoring
program to establish present visibility levels and trends, and to identify sources of man-made
impairment.
Interquartile range The difference between the 75th and 25th percentiles of a data set.
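Two definitions on this page, the hazard quotient and the interquartile range, reduce to short calculations. The sketch below shows both under stated assumptions: the exposure and reference values are made-up numbers in the same units, not actual RfCs, and the "inclusive" percentile convention from Python's statistics module is one of several in common use, so other tools may give slightly different quartiles.

```python
from statistics import quantiles


def hazard_quotient(exposure, reference_value):
    """Ratio of an exposure level to a reference value in the same units."""
    return exposure / reference_value


def interquartile_range(values):
    """Difference between the 75th and 25th percentiles of a data set."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Illustrative numbers only; 2.0 is not a real reference concentration.
hq = hazard_quotient(0.5, 2.0)
iqr = interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 9])

print(f"hazard quotient: {hq}")
print(f"interquartile range: {iqr}")
```

The exposure level and the reference value should be derived for similar exposure periods, as the definition notes, before the ratio is interpreted.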
-------
Definitions and Acronyms
(7 of 12)
Level 0 validation Routine checks made during the initial data processing and generation of
data, including proper data file identification, review of unusual events, review of field data
sheets and result reports, instrument performance checks, and deterministic relationships.
Level I validation Tests for internal consistency to identify values in the data that appear atypical
when compared to values of the entire data set.
Level II validation Comparison of the current data set with historical data to verify consistency
over time. This level can be considered a part of the data interpretation or analysis process.
Level III validation Tests for parallel consistency with data sets from the same population (i.e.,
region, period of time, air mass, etc.) to identify systematic bias. This level can also be
considered a part of the data interpretation or analysis process.
LC Local conditions; refers to ambient PM measurements.
MACT Maximum achievable control technology. MACTs are technology-based air emission
standards established under Title III of the 1990 Clean Air Act Amendments.
Mean The sum of all values divided by the number of samples.
Median The middle value in a sorted list of samples if there is an odd number of samples, or the
average of the two middle values if there is an even number of samples.
Method Detection Limit (MDL) The minimum concentration of a substance that can be
measured and reported with 99% confidence that the analyte concentration is greater than zero
and is determined from the analysis of a sample in a given matrix containing the analyte.
Mobile sources Motor vehicles and other moving objects that release pollution; mobile sources
include cars, trucks, buses, planes, trains, motorcycles, and gasoline-powered lawn mowers.
Mobile sources are divided into two groups: road vehicles, which include cars, trucks, and
buses, and non-road vehicles, which include trains, planes, and lawn mowers.
Mobile source air toxics (MSATs) Compounds that are emitted by mobile sources and have the
potential for serious adverse health effects.
National Ambient Air Quality Standards (NAAQS) Health-based pollutant concentration limits
established by the EPA that apply to outside air.
NATA National air toxics assessment. EPA's
national-scale assessment of 1999 air toxics emissions. The purpose of the national-scale
assessment is to identify and prioritize air toxics, emission source types and locations that are of
greatest potential concern in terms of contributing to population risk.
NATTS National air toxics trends stations.
NEI National emissions inventory.
NOAA National Oceanic and Atmospheric Administration.
NWS National Weather Service.
OAQPS Toxicity Table The EPA Office of Air and Radiation recommended default chronic toxicity
values for hazardous air pollutants. They are generally appropriate for screening-level risk
assessments, including assessments of select contaminants, exposure routes, or emission sources
of potential concern, or to help set priorities for further research. For more complex, refined risk
assessments developed to support regulatory decisions for single sources or substances, dose-
response data may be evaluated in detail for each "risk driver" to incorporate appropriate new
toxicological data.
OH Hydroxyl radical; the driving force behind the daytime reactions of hydrocarbons in the
troposphere.
O3 Ozone; a major component of smog. Ozone is not emitted directly into the air but is formed by the
reaction of VOCs and NOx in the presence of heat and sunlight.
Organic carbon (OC) Consists of hundreds of separate semi-volatile and particulate compounds.
Outliers Data that are physically, spatially, or temporally inconsistent.
P-value Provides a measure of the percentage confidence that the slope is not zero: % confidence
that the slope is not zero = 100% × (1 - P). Generally, 95% confidence is used as a cutoff value,
corresponding to a P-value of 0.05.
PAMS Photochemical Assessment Monitoring Stations.
Particulate matter (PM) A generic term referring to liquid and/or solid particles suspended in the air.
Percentile The pth percentile of a data set is the number such that p% of the data is less than that
number.
PM2.5 Particulate matter less than 2.5 microns. Tiny solid and/or liquid particles, generally soot and
aerosols. The size of the particles (2.5 microns or smaller, about 0.0001 inches or less) allows
them to easily enter the air sacs deep in the lungs where they may cause adverse health effects;
PM2.5 also causes visibility reduction.
PM10 Particulate matter less than 10 microns. Tiny solid and/or
liquid particles of soot, dust, smoke, fumes, and aerosols. The
size of the particles (10 microns or smaller, about 0.0004 inches
or less) allows them to easily enter the air sacs in the lungs where
they may be deposited, resulting in adverse health effects. PM10
also causes visibility reduction and is a criteria air pollutant.
PMF Positive matrix factorization; a receptor model. PMF can be
used to determine source profiles and source contributions
based on the ambient data.
POC Pollutant occurrence code used in the AQS.
[Figure: size comparison of PM2.5 (2.5 µm) with a human hair cross-section (70 µm).]
Point source Point sources include industrial and nonindustrial stationary equipment or
processes considered significant sources of air pollution emissions. A facility is considered to
have significant emissions if it emits about one ton or more in a calendar year. Examples of
point sources include industrial and commercial boilers, electric utility boilers, turbine engines,
industrial surface coating facilities, refinery and chemical processing operations, and petroleum
storage tanks.
Potential Source Contribution Function (PSCF) A method that combines the source
contribution estimates from PMF with the air parcel backward trajectories to identify possible
source areas and pathways that give rise to the observed high particulate mass concentrations
from the potential sources.
Precursor Compounds that change chemically or physically after being emitted into the air and
eventually produce air pollutants. For example, sulfur and nitrogen oxides are precursors for
particulate matter.
Primary particles The fraction of PM10 and PM2.5 that is directly emitted from combustion and
fugitive dust sources.
QA Quality assurance; a set of external tasks to provide certainty that the quality control system
is satisfactory. These tasks include independent performance audits, on-site system audits,
interlaboratory comparisons, and periodic evaluations of internal quality control data.
QC Quality control; a set of internal tasks performed to provide accurate and precise measured
ambient air quality data. These tasks address sample collection, handling, analysis, and
reporting (e.g., periodic calibrations, routine service checks, instrument-specific monthly quality
control maintenance checks, and duplicate analyses on split and spiked samples).
R-squared, r2 Statistical measure of how well a regression line approximates real data points;
an r2 of 1.0 (100%) indicates a perfect fit.
Receptor model A receptor-oriented approach for identifying and quantifying the sources of
ambient air contaminants at a receptor primarily on the basis of concentration measurements at
that receptor.
Reference Concentration (RfC) An estimate (with uncertainty of perhaps an order of magnitude)
of a continuous inhalation exposure to the human population (including sensitive subgroups) that
is likely to be without an appreciable risk of deleterious effects during a lifetime.
Reid Vapor Pressure (RVP) A measure of gasoline volatility.
RFG Reformulated gasoline.
Residuals Measured concentrations minus modeled concentrations.
SEARCH Southeastern Aerosol Research and Characterization Study.
Secondary formation The fraction of a pollutant that is formed in the atmosphere (e.g.,
formaldehyde is both emitted directly and formed in the atmosphere through secondary
photochemical processes).
Selected ion monitoring (SIM) A mass spectral mode in which the mass spectrometer is set to
scan over a very small mass range, typically one mass unit, providing higher sensitivity results
than a full mass scan.
Slope Statistical measure of the average ratio of the predicted to measured concentrations of a
species; a slope closer to 1.0 demonstrates a closer fit.
Source apportionment The process of apportioning ambient pollutants to an emissions source.
Also known as source attribution.
Source contribution Total mass of material from a source measured in a sample.
Source-dispersion model See Dispersion model.
Source profile Listing of individual chemical species emitted by a specific source category.
Speciation Trends Network (STN) A network of sampling locations established by the EPA in
2001 to characterize PM2.5 composition in urban areas. Roughly 300 sites nationwide are part
of this network. Now part of the Chemical Speciation Network (CSN).
Standard Deviation A measure of how much the values vary about the average. The square root of the
average squared deviation of the observations from their mean.
Standard operating procedure (SOP) A set of instructions used to ensure data quality.
Standardized residual Ratio of the residual to the uncertainty of a species in a specific sample
determined by the user.
State implementation plan (SIP) A detailed description of the programs a state will use to carry
out its responsibilities under the Clean Air Act. State implementation plans are collections of
the regulations used by a state to reduce air pollution. The Clean Air Act requires that the EPA
approve each state implementation plan.
SVOC Semi-volatile organic compound.
Toxicity The degree to which a substance or mixture of substances can harm humans or
environmental receptors.
TRI Toxic Release Inventory. Publicly available EPA database that contains information about
toxic chemical releases and other waste management activities reported annually by certain
covered industry groups as well as federal facilities.
TSP Total suspended particulate.
Uncensored data Data reported "as is" with no substitution for values below detection.
Variance The square of the standard deviation.
VOC Volatile organic compound.
WD Wind direction.
WS Wind speed.
XRF Energy dispersive X-ray fluorescence. Method used to quantify particulate metals.
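The statistical terms defined in this glossary (mean, median, percentile, interquartile range, standard deviation, variance, residual, standardized residual, and the confidence implied by a P-value) can be illustrated with a short Python sketch. The concentrations, uncertainties, and helper names below are hypothetical and chosen only for illustration. The quartiles use the standard library's "inclusive" interpolation, which is one of several common percentile conventions and is an assumption here.

```python
import statistics

# Hypothetical concentrations (e.g., ug/m3); not real monitoring data.
measured = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

mean = statistics.fmean(measured)        # sum of all values / number of samples
median = statistics.median(measured)     # middle value of the sorted list

# 25th, 50th, and 75th percentiles ("inclusive" interpolation is an assumption).
q25, q50, q75 = statistics.quantiles(measured, n=4, method="inclusive")
iqr = q75 - q25                          # interquartile range

# The glossary defines standard deviation as the square root of the average
# squared deviation from the mean, i.e., the population standard deviation.
std_dev = statistics.pstdev(measured)
variance = statistics.pvariance(measured)  # square of the standard deviation


def residuals(measured_values, modeled_values):
    """Measured concentrations minus modeled concentrations."""
    return [m - p for m, p in zip(measured_values, modeled_values)]


def standardized_residuals(resids, uncertainties):
    """Ratio of each residual to the user-supplied uncertainty for that sample."""
    return [r / u for r, u in zip(resids, uncertainties)]


def percent_confidence(p_value):
    """Percent confidence that a slope is not zero: 100% x (1 - P)."""
    return 100.0 * (1.0 - p_value)
```

With these definitions, a P-value of 0.05 corresponds to the 95% confidence cutoff mentioned in the P-value entry.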
-------
References
Bay Area Air Quality Management District (2005) Air quality glossary. Available on the Internet.
California Air Resources Board (2003) Glossary of air pollution terms. Available on the Internet.
Minnesota Pollution Control Agency (2005) General glossary. Available on the Internet.
National Park Service (2005) Glossary of terms used by the NPS Inventory and Monitoring Program.
Available on the Internet.
Sam Houston State University (2005) Atmospheric chemistry glossary. Web site prepared by Sam
Houston State University, Department of Chemistry, Huntsville, TX. Available on the Internet.
U.S. Environmental Protection Agency (2002) The plain English guide to the Clean Air Act:
Glossary. Available on the Internet.
U.S. Environmental Protection Agency (2005) AIRTrends 1997 report: list of acronyms. Available on
the Internet.
-------
Background
What are air toxics and why are they important?
June 2009 Section 3 - Background
-------
Background
What's Covered in This Section?
Air toxics overview
Health risks from air toxics; terminology
Air toxics emissions
Physical properties
Formation, destruction, and transport of air toxics
History of sampling; objectives of air toxics and other
monitoring programs
Air toxics sampling and analysis
Critical issues for data interpretation
-------
Air Toxics
Overview
• What are air toxics?
- Air toxics are gaseous, aerosol, or particle pollutants present in the air in varying concentrations with
characteristics such as toxicity or persistence that can be hazardous to human, plant, or animal life.
- The terms "air toxics" and "hazardous air pollutants" (HAPs) are used interchangeably in this document.
- Air toxics include the following general categories of compounds: volatile and semi-volatile organic
compounds (VOCs, SVOCs), polycyclic aromatic hydrocarbons (PAHs), heavy metals, and carbonyl
compounds.
• What are the health and environmental effects of toxic air pollutants?
- People exposed to toxic air pollutants at sufficient concentrations and durations may have an increased
chance of getting cancer or experiencing other serious health effects.
- Both high values and annual means of air toxics concentrations are of interest because some air toxics
have both episodic, short-term health effects and chronic, long-term health effects.
- Other health effects can include damage to the immune system, as well as neurological, reproductive
(e.g., reduced fertility), developmental, respiratory, and other health problems.
- Some toxic air pollutants, such as mercury, can deposit onto soils or surface waters where they are taken
up by plants and ingested by animals and are eventually magnified up through the food chain.
- Animals may experience health problems if exposed to sufficient quantities of air toxics over time.
• How are people exposed to air toxics?
- Breathing contaminated air.
- Eating contaminated food products, such as fish from contaminated waters; meat, milk, or eggs from
animals that feed on contaminated plants; and fruits and vegetables grown in contaminated soil on which
air toxics have been deposited.
- Drinking water contaminated by toxic air pollutants.
- Ingesting contaminated soil.
- Touching contaminated soil, dust, or water.
- Accumulating some persistent toxic air pollutants in body tissues after toxic air pollutants have entered
the body. Predators typically accumulate even greater pollutant concentrations than their contaminated
prey. As a result, people and other animals at the top of the food chain who eat contaminated fish or
meat are exposed to concentrations that are much higher than the concentrations in the water, air, or soil.
U.S. Environmental Protection Agency (2007c, g)
-------
Health Risks from Air Toxics
Simply put, health risks are a measure of the chance that you will experience
health problems.
Health risk = Hazard x exposure
Health risk is the probability that exposure to a hazardous substance will
make you sick. Animal experiments and human studies provide information
about a substance's level of hazard. Scientists use the results of such
studies to estimate the likelihood of illness at different levels of exposure.
Exposure to toxic air pollutants can increase your health risks. For example,
if you live near a factory that releases cancer-causing chemicals and inhale
contaminated air, your risk of getting cancer may increase. Breathing air
toxics could also increase your risk of noncancer
effects such as emphysema, asthma, or
reproductive disorders.
Ambient concentrations of air toxics are compared
to health related concentrations derived from
scientific assessments conducted by the EPA and
other environmental agencies. These levels of
concern provide a frame of reference to put air
toxics concentrations into perspective.
U.S. Environmental Protection Agency (2007a, b)
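The comparison described above can be sketched as a simple ratio of an ambient concentration to a health-related level of concern. This is a minimal illustration, not a risk assessment procedure: the pollutant names, concentrations, and levels of concern below are placeholders rather than values from EPA assessments, and `screen_against_level` is a hypothetical helper.

```python
def screen_against_level(concentration, level_of_concern):
    """Return the ratio of an ambient concentration to a level of concern.

    Both arguments must be in the same units (e.g., ug/m3). A ratio above 1
    means the ambient value exceeds the reference level.
    """
    if level_of_concern <= 0:
        raise ValueError("level_of_concern must be positive")
    return concentration / level_of_concern


# Placeholder annual means and levels of concern (illustrative only).
annual_means = {"pollutant_A": 1.5, "pollutant_B": 0.25}
levels_of_concern = {"pollutant_A": 0.5, "pollutant_B": 1.0}

ratios = {
    name: screen_against_level(annual_means[name], levels_of_concern[name])
    for name in annual_means
}

# Pollutants whose ambient concentration exceeds the level of concern.
above_level = sorted(name for name, ratio in ratios.items() if ratio > 1.0)
```

Expressing each pollutant as a ratio puts concentrations with very different magnitudes on a common frame of reference.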
-------
Air Toxics Emissions
What Are the Sources of Air Toxics?
Air toxics are both directly emitted by sources and formed in the
atmosphere. In emission inventory terminology, emissions are grouped as
point (major), area, and mobile sources. The following 3 definitions
describe how these terms are used in the emission inventory.
Major sources include chemical plants, steel mills, oil refineries, and
hazardous waste incinerators for which there is a specific location provided
in the inventory. Pollutants can be released when equipment leaks, when
material is transferred from one area to another, or when waste is given off
from a facility through smoke stacks.
Area sources are made up of many smaller sources releasing pollutants
to the outdoor air in a defined area. Examples include neighborhood dry
cleaners, small metal plating operations, gas stations, and woodstoves.
These sources may not be identified in the inventory by a specific location.
Mobile sources include highway vehicles, trains, marine vessels, aircraft,
and non-road equipment (such as construction equipment).
Routine releases, such as those from industry, cars, landfills, or
incinerators, may follow regular patterns and happen continuously over
time. Other releases may be routine but intermittent, such as when a
plant's production is performed in batches. Accidental releases can occur
during an explosion, equipment failure, or a transportation accident. The
timing and amount released during accidental releases are difficult to
estimate.
Natural sources - Some air toxics are also released from natural sources
such as volcanoes or fires. In the inventory, these would typically be
included in area source emissions.
-------
Air Toxics Emissions
Source Type Characteristics
Understanding the emission source type of a particular air toxic can help
the analyst begin to develop a conceptual model of concentration patterns
and gradients that might be expected.
• Major source emissions, for example, are a localized source of toxics. Steep
concentration gradients of primarily emitted toxics around point sources are
typical, especially if there are no other nearby sources of the pollutants.
• Area source emissions are typically well-distributed emissions sources because
there are multiple sources in an area. Area source emissions can lead to
relatively homogeneous concentrations of toxics on the urban scale. However, if
a monitor is placed close to any source type, gradients may be observed.
• Mobile source air toxics exhibit both point
source and area source characteristics. Very
close to a roadway or near a construction
site, mobile source air toxics may be seen in
higher concentrations. A few hundred meters
away from the roadway, for example,
concentrations typically fall to more normal
average urban-scale levels.
-------
Physical Properties
• Physical properties of air toxics span the entire range of pollutants
present in the atmosphere.
- Air toxics are present in the atmosphere as particles and gases and in semi-
volatile form.
- Air toxics can be both primary (directly emitted) and secondary (formed in the
atmosphere) in origin.
- Air toxics are mostly emitted from anthropogenic sources, but include some
biogenic sources.
- Some air toxics have very short atmospheric lifetimes while others remain in
the atmosphere for decades.
• Some air toxics such as VOCs (e.g., benzene and toluene) are
precursors to ozone and particulate matter (PM); and other toxics such as
heavy metals are components of PM.
• Preliminary investigation of the linkage between criteria pollutants and air
toxics showed a correlation of acetaldehyde and formaldehyde with
ozone, but that correlation was likely due to similar photochemical
production mechanisms rather than source similarities (i.e., not a causal
association). Most air toxics did not correlate well with ozone, PM2.5,
or other air toxics.
-------
Formation, Destruction, Transport
(1 of 2)
[Figure: Conceptual depiction of transport scales, showing the typical downwind concentration
gradient from a point source and the typical concentration gradient from an area source.]
Some air pollution problems are limited
to the local area where pollution is
emitted. Other air quality problems
spread to cover cities or regions of
the country. Emissions of some
pollutants from anywhere on earth
can contribute to a global problem.
While some pollutants can be neatly
characterized as contributors to local,
regional, or global problems, many
pollutants are important on multiple
spatial scales. Explaining the factors
that control the spatial extent of a
pollutant requires understanding the
emissions, transport, and chemistry
of a pollutant.
Concentrations of primarily emitted pollutants are almost always highest very close to
their emissions source. The figure illustrates the typical drop-off
in concentrations from an emissions source as distance increases from the source.
Pollution concentrations start very high, but are diluted by the atmosphere in the first few
hundred feet from a source as they are transported and dispersed.
[Figure: concentration versus downwind distance from source (m), with the pollutant source and
urban center marked.]
-------
Formation, Destruction, Transport
(2 of 2)
Concentrations of pollutants that are secondarily formed in the atmosphere are often
highest downwind of the source of precursor compounds. Chemical or physical rates of
formation determine how far the precursor pollutants travel before they begin forming
secondary pollutants such as formaldehyde. Factors such as wind speed and
temperature will also influence where these secondary pollutants are formed, relative to
where they were originally emitted. Generally, pollutants that are secondarily formed
do not have steep concentration gradients near the original precursor emissions
source.
The distance that a particular air pollutant emitted from a source may travel is
determined by atmospheric chemistry (pollutant lifetimes and formation and removal
processes), meteorology (air mass movement and precipitation), and topography
(mountains and valleys that affect air movement). The longer a pollutant stays in the
atmosphere, the farther it can be transported. Some air toxics are removed quickly by
chemical reactions (e.g., 1,3-butadiene) or physical processes (e.g., larger, heavier
particles deposit to the ground quickly). These short-lived pollutants can only travel
short distances from where they are emitted (10s to 100s of miles). Other pollutants
react more slowly and can travel large distances from where they are formed or emitted
(e.g., toxic metals in PM2.5). These pollutants may be more regionally homogeneous.
Finally, some unreactive pollutants can remain in the atmosphere for months, years, or
decades and spread across the Earth (e.g., carbon tetrachloride).
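A rough sense of transport scale follows from multiplying a representative wind speed by the time a pollutant remains in the atmosphere. The sketch below is a back-of-the-envelope estimate that ignores dispersion, deposition, and changes in wind direction. The wind speed and lifetimes are assumed values chosen for illustration, not measured properties of any pollutant, and `transport_distance_km` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600.0
METERS_PER_KM = 1000.0


def transport_distance_km(wind_speed_m_per_s, lifetime_hours):
    """Approximate distance (km) traveled at a constant wind speed over a lifetime."""
    if wind_speed_m_per_s < 0 or lifetime_hours < 0:
        raise ValueError("inputs must be non-negative")
    return wind_speed_m_per_s * lifetime_hours * SECONDS_PER_HOUR / METERS_PER_KM


# Assumed values for illustration only.
wind_speed = 5.0  # m/s
for label, hours in [("short-lived", 2.0), ("multi-day", 120.0)]:
    distance = transport_distance_km(wind_speed, hours)
    print(f"{label}: roughly {distance:.0f} km")
```

Under these assumptions, a pollutant lasting a few hours stays within tens of kilometers of its source, while one lasting several days can cross a region.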
-------
Residence Time
Overview
• Residence time is a pollutant-specific measure of the average
lifetime of a molecule in the atmosphere.
• It is dependent on chemical and physical removal pathways; these
include
- Chemical: reaction with hydroxyl radical (OH), photolysis
- Physical: Wet or dry deposition
• Why is it important to understand residence times?
- Residence times can provide insight into the spatial and temporal
variability of air toxics.
- Longer residence times result in less spatial variability (e.g., carbon
tetrachloride).
- Conversely, short residence times should result in steep gradients in
concentrations near sources and temporal patterns that are
dependent on emissions schedules.
• Residence times are not characterized well for all air toxics. Some
air toxics and their residence times are listed in the appendix to
this section.
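For a pollutant removed mainly by reaction with OH, the residence time is commonly approximated as the inverse of the product of the reaction rate constant and the OH concentration. When several independent removal pathways act together, their loss rates add. The sketch below shows both calculations. The rate constant, OH concentration, and deposition lifetime are order-of-magnitude assumptions for illustration only and are not taken from the appendix to this section.

```python
SECONDS_PER_DAY = 86400.0


def oh_lifetime_days(k_oh_cm3_per_molecule_s, oh_molecules_per_cm3):
    """Lifetime (days) against reaction with OH: tau = 1 / (k_OH * [OH])."""
    loss_rate_per_s = k_oh_cm3_per_molecule_s * oh_molecules_per_cm3
    return 1.0 / loss_rate_per_s / SECONDS_PER_DAY


def combined_lifetime(*lifetimes):
    """Overall lifetime when independent removal pathways act together.

    Loss rates add, so 1 / tau_total = sum(1 / tau_i).
    """
    return 1.0 / sum(1.0 / tau for tau in lifetimes)


# Illustrative assumptions (not workbook values).
k_oh = 1.0e-12     # cm3 molecule-1 s-1, assumed rate constant
oh_conc = 1.0e6    # molecules cm-3, assumed average OH concentration
tau_oh = oh_lifetime_days(k_oh, oh_conc)

# Assumed 30-day lifetime against deposition, combined with the OH pathway.
tau_total = combined_lifetime(tau_oh, 30.0)
```

The combined lifetime is always shorter than the shortest individual pathway lifetime, which is why a single fast removal process tends to control spatial variability.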
-------
History of Sampling
• Air toxics measurements have been collected across the country
since the 1960s as part of various programs and measurement
studies.
• National monitoring efforts have included programs specific to air
toxics:
- National Air Toxics Trends Stations (NATTS)
- Urban Air Toxics Monitoring Program (UATMP)
• Some ambient monitoring networks are designed for other
purposes but also provide air toxics data:
- Photochemical Assessment Monitoring Station (PAMS) program
- Chemical Speciation Network (CSN) which includes the Speciation
Trends Network (STN)
- Interagency Monitoring of Protected Visual Environments
(IMPROVE)
• State and local agencies have also operated long-running
monitoring operations and special studies to understand air toxics
in their communities.
-------
NATTS Sampling
Overview
• NATTS sampling began in 2003 with 23 sites; the first complete year of data was 2004.
• There are currently 27 national air toxics trends sites: 21 urban and 6 rural.
• Most stations are collocated with PM2.5 speciation samplers, and some also
include PAMS measurements.
• The principal objective of the NATTS network is to provide long-term monitoring
data across representative areas of the country for certain priority HAPs
(e.g., benzene, formaldehyde, 1,3-butadiene, acrolein, and hexavalent chromium) in order
to establish national trends for these and other HAPs.
• Recently, the list of pollutants monitored at NATTS sites was expanded to include
polycyclic aromatic hydrocarbons (PAHs), of which naphthalene is the most prevalent.
• All sites follow QA programs for sampling and siting.
• Periodic refinement of pollutants and/or sampling may be made (e.g., EPA plans to
re-evaluate the program every six years).
• More information can be found on the NATTS web site:
http://www.epa.gov/ttn/amtic/natts.html
[Map: National Air Toxics Trends Stations (NATTS), June 2008.]
-------
NATTS Sampling
Objectives
The primary objectives of NATTS monitoring include
• Providing air toxics data of sufficient quality to identify
trends, characterize ambient concentrations in
representative areas, and evaluate air quality models.
• Providing tools and guidance that enable consistent,
high certainty measurements.
• Using these consistent measurements to facilitate
measuring progress towards national emission and
risk reduction goals.
• Considering all NATTS sites to be NCORE level 2
sites, thereby providing rich data sets to address
multi-pollutant issues. NCORE level 2 sites are
"backbone" sites providing consistent, long-term data
for multiple pollutant types.
-------
Urban Air Toxics Monitoring Program (UATMP)
[Map: 2007 UATMP sites.]
The UATMP has provided sample
collection and analysis support since
1987 to encourage state, local, and tribal
agencies to understand and appreciate
the nature and extent of potentially toxic
air pollution in urban areas.
Participation in the UATMP is voluntary;
aside from the NATTS, target pollutants
and monitor siting are at the discretion of
each participant agency.
UATMP is used by a variety of networks
including some NATTS, some local-
scale, and some 105-funded air toxics
monitoring sites.
All UATMP samples are analyzed in a
central laboratory for concentrations of
VOCs, carbonyls, SVOCs, and metals.
The laboratory is centrally managed by EPA's Office of Air Quality Planning and Standards
(OAQPS) Air Quality Assessment Division.
UATMP assures analytical consistency among participants
- Data validation and AQS data entry are standard
- Site support available (provide monitors, instrument certification, installation, troubleshooting, etc.)
U.S. Environmental Protection Agency (2006f)
-------
PAMS Sampling
The goal of the PAMS network is to help assess ozone control programs by
- identifying key constituents and parameters
- tracking trends
- characterizing transport
- assisting in forecasting episodes
- assisting in improving emission inventories
Toxic VOCs sampled by the PAMS network include benzene,
formaldehyde, xylenes, toluene, ethylbenzene, styrene, and acetaldehyde.
PAMS sites collect subdaily measurements, which are useful
in assessing diurnal trends.
More information can be found on the PAMS web site at
http://www.epa.gov/ttn/amtic/pamsmain.html.
[Map: PAMS monitoring network, December 2007.]
[Figure: PAMS network design, showing site placement relative to the primary morning,
secondary morning, and primary afternoon wind directions, including the maximum ozone site
and the extreme downwind site.]
[Table: analysis objectives by PAMS site type.]
PAMS site types: I (Upwind), II (Max. Emissions), III (Max. Ozone), IV (Downwind)
Analysis objectives:
- Corroborate precursor EI
- Assess changes in emissions; corroborate reductions
- Assess ozone and precursor trends
- Provide input to models; evaluate models
- Evaluate population exposure
- Other analyses: biogenics, transport, source apportionment, diurnal patterns,
day-of-week, episode vs. non-episode
U.S. Environmental Protection Agency (2006c)
-------
CSN Sampling
• The Chemical Speciation Network is a companion network to the
mass-based Federal Reference Method (FRM) network
implemented in support of the PM2.5 National Ambient Air Quality
Standards (NAAQS).
• The purpose of the CSN is to provide nationally consistent
speciated PM2.5 data for the assessment of trends at
representative sites in urban areas across the country.
• As part of a routine monitoring program, the CSN quantifies mass
concentrations and PM2.5 constituents, including numerous trace elements,
ions (sulfate, nitrate, sodium, potassium, ammonium), elemental
carbon, and organic carbon.
• CSN data are available via AQS.
• Prior to 2007, the carbon (especially EC) measurements from this network differed
from those of IMPROVE. A phased-in change in methodology is underway.
[Map: CSN site locations, circa 2005.]
U.S. Environmental Protection Agency (2007f)
-------
IMPROVE Sampling
The Interagency Monitoring of Protected Visual Environments (IMPROVE)
program provides PM2.5 speciated and mass measurements in 156 Class I
areas (national parks and wilderness areas). Speciated PM2.5 metals are the
only toxics measured in this network.
[Map: IMPROVE site locations.]
Data are available in AQS.
IMPROVE data can also be accessed via the
internet from the VIEWS* web site.
- Raw data and various aggregates can be obtained in a
variety of output formats (ASCII, HTML, XLS etc.).
- All data from the inception of the IMPROVE network in
1988 are currently available.
User-input mapping and plotting tools are available to visualize trends,
spatial patterns, back trajectories and metadata (i.e., site locations).
IMPROVE also provides site photos and local topographical maps which are
very useful for data analyses.
To download data or get more information, see
http://vista.cira.colostate.edu/views/
*VIEWS: Visibility Exchange Web System
-------
Local-Scale Monitoring Projects
EPA began funding local-scale monitoring projects
in the 2004 fiscal year.
The goal of local monitoring is to provide more flexibility to
address middle- and neighborhood-scale (0.5 km to 4 km) issues
that are not handled well by national networks, given the diversity
of toxics issues across the nation.
Specific objectives include identifying and profiling air toxics
sources, developing and assessing emerging measurement
methods, characterizing the degree and extent of local air toxics
problems, and tracking progress of air toxics reduction activities.
Projects are selected through an open competition process.
Grant topics, funding levels, and number of awards are set for
each grant cycle.
• Local-scale monitoring is typically conducted for only 1-2 years.
U.S. Environmental Protection Agency (2006c).
-------
Air Toxics Sampling and Analysis
• Because air toxics are present in the atmosphere in gaseous, particulate,
and semi-volatile form, no single measurement technique is adequate.
Differences in chemical and physical properties further complicate
collection; the choice of measurement technique depends on the
objectives of data collection, including the chemical species of interest,
funds available, and desired detection limit.
• EPA offers seventeen approved sampling and analysis methods for toxic
gases; among the most commonly used methods are the following:
- Compendium method TO-11A. Used to measure formaldehyde and other carbonyl
compounds. Previous methods include TO-5 which had lower sensitivity and
reproducibility and was more labor-intensive. Method TO-11A uses coated
dinitrophenylhydrazine (DNPH) cartridges to collect the samples and analyzes them
using high performance liquid chromatography (HPLC).
- Compendium method TO-13A. Used to measure Polycyclic Aromatic Hydrocarbon
(PAH) compounds. This method allows for a variety of sampling media; an effective
choice is the combination of polyurethane foam (PUF) and XAD-2®. Samples are
analyzed by high resolution gas chromatography/mass spectrometry (GC/MS).
- Compendium method TO-15. Created to target 97 compounds on the list of 187
hazardous air pollutants. The method uses specially prepared canisters analyzed by
high resolution gas chromatography/mass spectrometry (GC/MS).
-------
Air Toxics Sampling and Analysis (2 of 2)
• EPA-approved methods for collection and
analysis of suspended particulate matter are
documented in the "Compendium of Methods for
the Determination of Inorganic Compounds in
Ambient Air."
- Chapters 1 and 2 address mass measurement only;
while important to the criteria air pollutant program,
these chapters are not of particular importance to
the air toxics ambient monitoring program:
• Chapter IO-1, Continuous Measurement of PM10
Suspended Particulate Matter (SPM) in Ambient Air
• Chapter IO-2, Integrated Sampling of Suspended
Particulate Matter (SPM) in Ambient Air
- Chapter IO-3, Chemical Species Analysis of Filter-Collected Suspended Particulate Matter
(SPM), is of considerable importance to the air toxics ambient monitoring program
• Several different methods for speciated particulate analyses are available
- Each has advantages and disadvantages depending on the target analytes and
desired minimum detection limits.
- For Hazardous Air Pollutant (HAP) metals, IO-3.5 (Inductively Coupled Plasma /
Mass Spectrometry (ICP/MS)) offers the lowest detection limits.
• Detailed information about these monitoring methods is available at:
-------
Differences Among Sampling Networks
• When using data from different sampling networks, it is
important to consider the following:
- The multiple sampling networks from which data were drawn
for these analyses vary in their objectives and in their sampling
and analytical methods. Data may not always be comparable.
- Sampling, analysis, method detection limits, objectives, site
characteristics, etc. have changed over time. Care is needed
in interpreting temporal and spatial trends.
• Analysts need to gather, and understand, all metadata
prior to conducting analyses.
-------
Critical Issues for Interpretation
Issues to consider when planning and performing data analysis
• Data quality. Information from collection and chemical analysis such as standard
operating procedures, audits, accuracy and precision, and data validation provide
insight into sample and collection biases and errors. This information is necessary for
data validation. Metadata such as precision and accuracy are required for other
analyses (e.g., receptor modeling).
• Data quantity. The number of species and amount of data above detection give
insight into what analyses can be performed and provide a starting point for planning
data analysis.
• Sampling duration. Duration provides information about analysis possibilities; for
example, 24-hr data cannot be used to investigate diurnal patterns. This information
may also be necessary for calculating completeness criteria when aggregating data.
• Sampling frequency. Frequency information provides further insight into what
analyses will be possible; for example, one year of 1-in-6 day data may not be
sufficient to investigate day-of-week tendencies. Sample frequency will also be
necessary to calculate data completeness and to aggregate data.
• Complementary data. Additional data for criteria pollutants, speciated PM, and non-
toxic hydrocarbons and meteorological data can be useful in a variety of analyses
such as data validation, understanding transport, and source identification.
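The sampling frequency point above can be checked with a little date arithmetic. The sketch below is illustrative only: the function name, the start date, and the one-year period are assumptions, and it counts scheduled samples rather than samples that were actually collected and validated.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def samples_per_weekday(start, n_days, interval):
    """Count scheduled samples by weekday for a 1-in-`interval` day schedule."""
    counts = Counter()
    for offset in range(0, n_days, interval):
        day = start + timedelta(days=offset)
        counts[WEEKDAY_NAMES[day.weekday()]] += 1
    return counts

# Hypothetical schedule: one sample every 6th day for one 365-day year.
counts = samples_per_weekday(date(2005, 1, 1), 365, 6)
for name in WEEKDAY_NAMES:
    print(f"{name:<9} {counts[name]}")
print("total scheduled samples:", sum(counts.values()))
```

With fewer than ten scheduled samples on any one weekday, and fewer still after invalid or missing samples are removed, a day-of-week comparison from a single year rests on very small groups.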
-------
Sampling Design
• To develop a sampling design or monitoring plan, the following
should be considered:
- Monitoring objectives including consideration of geophysical setting,
meteorology, types and characteristics of sources, and existing
monitoring programs.
- Data quality objectives needed to answer questions to be asked of the
data (i.e., how precisely or accurately do the questions need to be
answered?).
- Options for what, when, where, how frequently, and for how long to
monitor; these are related to the selection of appropriate monitoring
equipment and laboratory analyses.
- Data quality assurance and validation approach including collocated data
requirements, QA programs for analytical laboratories, and data
validation guidelines for ambient data.
- Options for data analysis and exploration including available tools, data
analyses, data needs, and training needs.
• Sampling design for the national air toxics monitoring program is
thoroughly discussed by Battelle (Phase I report).
-------
Resources
Monitoring Networks
NATTS: http://www.epa.gov/ttn/amtic/natts.html
UATMP: http://www.epa.gov/ttn/amtic/uatm.html
PAMS: http://www.epa.gov/ttn/amtic/pamsmain.html
CSN: http://www.epa.gov/ttn/amtic/speciepg.html
IMPROVE: A source of speciated PM2.5 data
http://vista.cira.colostate.edu/views/
Local scale monitoring programs:
http://www.epa.gov/ttn/amtic/local.html
-------
Appendix
Residence Times
Approximate atmospheric residence times for some air toxics are listed
here. These values were found online. To find the atmospheric
persistence of other air toxics, enter the pollutant's name in the chemical
profile. Once the pollutant page is available, select "links" and the
entry for "CalEPA Air Resources Board Toxic Air Contaminant
Summary". A summary of physical properties is provided, including
atmospheric persistence.

Species                                    Lifetime by reaction with OH
Carbon Tetrachloride                       decades
Chloroform                                 months
Tetrachloroethylene                        months
Methylene Chloride                         months
Benzene                                    84 hrs
1,2-Dichloropropane                        weeks*
Trichloroethylene                          84 hrs
Acrylonitrile                              2.4 days
Ethylbenzene                               2 days
Vinyl Chloride                             27 hrs
Formaldehyde                               26 hrs
Acrolein                                   17 hrs
Naphthalene                                16 hrs
Acetaldehyde                               12 hrs
1,3-Butadiene                              2.8 hrs
Arsenic and other toxic metal compounds    N/A**

* Wet deposition is also a sink.
** Lifetime is dependent on particle deposition and is typically days to
weeks. Deposition time is primarily determined by the size of the
particles.
-------
References
(1 of 2)
Hitchins J, Morawska L, Wolff L, Gilbert D. (2000) Concentration of submicrometer particles from vehicle
emissions near a major road. Atmos Environ 34:51-59.
Jaramillo VL, Kavouras I (2005). Monitoring, Source Identification, and Health Impacts of Air Toxics in
Albuquerque, NM. available on the internet at http://www.epa.gov/ttn/amtic/toxfy05.html
Kinney PL, Aggarwal M, Northridge ME, Janssen NA, Shepard P. (2000) Airborne concentrations of PM2.5 and
diesel exhaust particles on Harlem sidewalks: a community-based pilot study. Environ Health Perspect
108:213-218.
Seinfeld J.H. and Pandis S.N. (1998) Atmospheric chemistry and physics: from air pollution to global change, J.
Wiley and Sons, Inc., New York, New York.
U.S. Environmental Protection Agency. EPA's Air Toxics Risk Assessment (ATRA) Reference Library describes
the basics of exposure assessment, toxicity evaluation, and risk characterization (chronic and acute) for toxic
pollutants released to the air from stationary, mobile, and other types of sources. The library covers both
human and ecological assessment for individual sources of pollution as well as the combined impact of
multiple sources. This guidance is amenable to a variety of purposes, including assessments conducted
under the air toxics provisions of the Clean Air Act, analysis of combined multisource risks at the community
level, and as a supplement to other Agency guidance (e.g., as an aid to Superfund risk assessors evaluating
the air exposure pathway). Available on the Internet at http://www.epa.gov/ttn/fera/risk_atra_main.html
U.S. Environmental Protection Agency (2001) Pilot City Air Toxics Measurements Summary. Available on the
Internet at http://www.epa.gov/ttn/amtic/natts.html.
U.S. Environmental Protection Agency (1999) 1999 TO Compendium of Methods Second Edition. Available on the
Internet at http://www.epa.gov/ttn/amtic/airtox.html.
U.S. Environmental Protection Agency (1999) IO Compendium of Methods for the Determination of Inorganic
Compounds in Ambient Air, EPA/625/R-96/01a available online at
U.S. Environmental Protection Agency (2006a) A Preliminary Risk-Based Screening Approach for Air Toxics
Monitoring Data Sets. Available on the Internet at http://www.epa.gov/region4/air/airtoxic/Screening-041106-
KM.pdf
-------
References
(2 of 2)
U.S. Environmental Protection Agency (2006b). NATA Glossary. Available on the Internet at
http://www.epa.gov/ttn/atw/nata/gloss.html
U.S. Environmental Protection Agency (2006c). Local-Scale Monitoring Projects. Available on the Internet at
http://www.epa.gov/ttn/amtic/local.html.
U.S. Environmental Protection Agency (2006e). PAMS - General Information. Available on the Internet at
http://www.epa.gov/oar/oaqps/pams/general.html#contacts.
U.S. Environmental Protection Agency (2006f). 2005 Urban Air Toxics Monitoring Program (UATMP) Available
on the Internet at http://www.epa.gov/ttn/amtic/uatm.html
U.S. Environmental Protection Agency (2007a) Air pollution and health risk. Available on the Internet at
http://www.epa.gov/ttn/atw/3_90_022.html
U.S. Environmental Protection Agency (2007b) Risk Assessment for Toxic Air Pollutants: A Citizen's Guide.
Available on the Internet at http://www.epa.gov/ttn/atw/3_90_024.html
U.S. Environmental Protection Agency (2007c). About air toxics. Available on the Internet at
http://www.epa.gov/ttn/atw/allabout.html
U.S. Environmental Protection Agency (2007d) Evaluating Exposures to Toxic Air Pollutants: A Citizen's Guide.
Available on the Internet at http://www.epa.gov/ttn/atw/3_90_023.html
U.S. Environmental Protection Agency (2007e) About air toxics, health and ecological effects. Available on the
Internet at http://www.epa.gov/air/toxicair/newtoxics.html.
U.S. Environmental Protection Agency (2007f) PM Research. Available on the internet at
http://www.epa.gov/pmresearch/pm grant/06 monitoring programs.html.
U.S. Environmental Protection Agency (2007g) Fact Sheet. Available on the Internet at
http://www.afcee.brooks.af.mil/pro-act/fact/caa.asp#2.
Zhu Y, Hinds WC, Kim S, Shen S, Sioutas C. (2002b) Study on ultrafine particles and other vehicular pollutants
near a busy highway. Atmos Environ 36:4323-4335.
-------
Preparing Data for Analysis
How do I get my data ready for analysis?
How do I treat data below detection?
June 2009
Section 4 - Preparing Data for Analysis
-------
Overview
This section provides suggestions on acquiring and
preparing data sets for analysis, which is the basis for
subsequent sections of the workbook.
Data preparation is sometimes more difficult and time-
consuming than the data analyses.
It is vital to carefully construct a data set so that data
quality and integrity are assured.
In the process of constructing and validating data, the
analyst gains important insight into the data that may
help direct and facilitate the analyses.
-------
Data Quality Objectives
Preparation of data for subsequent analyses is tied to the data
quality objectives (DQOs) to be achieved. DQOs are
measurement performance or acceptance criteria established as
part of the study design. DQOs relate the quality of data needed
to the established limits on the chance of making a decision error
or of incorrectly answering a study question.
In setting DQOs, consider
- who will use the data;
- what the project's goals/objectives/questions or issues are;
- what decision(s) will be made from the information obtained;
- what type, quantity, and quality of data are specified;
- how "good" the data have to be to support the decision to be made.
EPA provides guidance on setting DQOs: G-4 Guidance on
Systematic Planning Using the Data Quality Objective Process.
-------
Preparing Data for Analysis
What's Covered in This Section?
Data availability
- What data are available?
- Sources for ambient air toxics data
- Accessing data systems and acquiring data
• AQS
• IMPROVE
• SEARCH
• Other archives
- Supplementing air toxics data
- Know your data
Data processing
- Investigating collocated data
- Preparing daily, seasonal, and annual averages
- Determining data completeness
- Treating data below detection
Data validation
- Procedures and tools
- Handling suspect data
-------
What Data Are Available?
Air Toxics Overview
Air toxics ambient monitoring data are
typically collected in three major
durations (1-hr, 3-hr, 24-hr).
Sampling frequencies vary from
subdaily, daily, 1-in-3-day, 1-in-6-day, to
1-in-12-day.
Some sites have operated as long-term
(multiple year) sites while others may
report data for a short study only (e.g., a
week or two).
Data can be reported in a range of
units. For analyses, consistency in
units is essential.
For data to be useful, a minimum of
monitor locations, concentration units,
method codes, and parameter names is
required. Sampling frequency
information is also desirable.
Keep in mind: Air toxics measurements
are primarily captured in urban areas as
shown in the figures. VOC*
measurements, for example, are
typically made in higher population and
higher population density areas relative
to all counties in the United States.
[Figure: Distribution of county population (log scale, 100 to 10,000,000)
for all US counties, counties with metals measurements, and counties with
VOC measurements, with the median county population marked (labeled value:
305,000). The subsets of counties with metals or VOC measurements have
median populations that are at the upper end of the distribution compared
to all US counties. Plot prepared in SYSTAT using 2000 census and locations
of air toxics monitors in 2003-2005.]
* VOC: Volatile Organic Compound
-------
What Data Are Available?
Sources for Ambient Air Toxics Data
Air toxics data are mostly obtained from federal, state, local,
and tribal monitoring agencies. The main sources are listed here:
• EPA's Air Quality System (AQS)
• IMPROVE(1) speciated PM2.5 data can be downloaded from the VIEWS(2)
web site, http://vista.cira.colostate.edu/views/
• SEARCH(3) speciated PM2.5 data can be downloaded from the
Atmospheric Research Analysis web site,
http://www.atmospheric-research.com/public/index.html
• Air Quality Archive (AQA) (1990-2005) developed during the Phase V
national air toxics analysis project; includes legacy air toxics archive
data (data posted here http://www.epa.gov/ttn/amtic/toxdat.html)
• Local, state, and tribal air quality agency databases (i.e., some data
are not yet submitted to AQS)
(1) IMPROVE = Interagency Monitoring of Protected Visual Environments
(2) VIEWS = Visibility Information Exchange Web System
(3) SEARCH = Southeastern Aerosol Research and Characterization Study
-------
AQS Data
Overview
• AQS is the EPA's principal data repository, containing the most complete
set of toxics (and other) data available.
• To obtain the massive data set required for the national analysis, AQS
was accessed via the Intranet with a user ID obtained from EPA.
- AMP501 request provides raw data in R-2 format.
• Data are available from 1995 to the present in AQS.
• Annual air toxics data are required to be submitted to AQS within 180 days of end of
Q4, i.e., 2007 data would be entered by July 2008.
• Archived AMP501 data prior to 1995 were requested directly from EPA.
- Data from AQS are provided in a pipe-delimited format that needs to be
transformed and processed.
• For the national assessment, SQL server was used to process data.
• Publicly available VOCDat can be used to process data from one site at a time
(http://vocdat.sonomatech.com/).
• Some data, such as criteria pollutant summaries, are available for
download without a user ID; most air toxics are not yet available this way.
• Find additional information about AQS at
http://www.epa.gov/ttnmain1/airs/airsaqs/
• The AQS Discoverer site may be used to retrieve data:
http://www.epa.gov/ttn/airs/airsaqs/aqsdiscover/
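As a rough illustration of the "transformed and processed" step for pipe-delimited output, the sketch below reads delimited records into dictionaries. The column order in FIELDS, the comment-line convention, and the sample records are all made up for illustration; they are not the actual AMP501 or R-2 layout, which should be taken from the AQS documentation.

```python
import csv
import io

# Hypothetical column order, for illustration only. The real layout of an
# AQS extract must be checked against the AQS documentation.
FIELDS = ["site_code", "parameter_code", "poc", "method_code",
          "date", "start_time", "value", "unit"]

def read_pipe_delimited(handle):
    """Yield one dict per record, skipping blank lines and '#' comment lines."""
    lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(lines, delimiter="|"):
        record = dict(zip(FIELDS, (field.strip() for field in row)))
        raw_value = record.get("value")
        # Keep missing concentrations as None rather than guessing a number.
        record["value"] = float(raw_value) if raw_value else None
        yield record

# Made-up records, not real AQS data.
sample = io.StringIO(
    "# site|parameter|POC|method|date|start|value|unit\n"
    "060371103|45201|1|150|20050106|00:00|0.52|ppbC\n"
    "060371103|45201|1|150|20050112|00:00||ppbC\n"
)
records = list(read_pipe_delimited(sample))
for rec in records:
    print(rec["site_code"], rec["date"], rec["value"], rec["unit"])
```

Keeping the site, parameter, POC, and method codes as strings preserves leading zeros, which matters when the codes are later used to look up metadata.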
-------
AQS Data
Codes
AQS uses a variety of codes to simplify and condense information in the
R-2 output file.
Key Codes
- AQS site code; identifies a particular monitoring site.
- AQS parameter code; identifies the pollutant measured.
- AQS parameter occurrence code (POC); distinguishes among monitors for the
same pollutant at the same site.
- AQS method code; unique for each combination of sample collection and
analysis.
Each code contains additional metadata which would be unnecessarily
repetitive if included in the R-2 file.
- For example, default method detection limits (MDLs) are not provided in the
R-2 file. This information must be looked up on the AQS website (below) using
the method query tool. Alternate MDLs, on the other hand, are included in the
R-2 file since they are unique to each record.
Descriptions of codes and additional metadata can be found at
-------
Other Data Archives
(1 of 2)
IMPROVE data - PM2.5 speciated and mass measurements in
156 Class I areas (national parks and wilderness areas). Speciated
PM2.5 metals are the only toxics measured in this network. Further
described in Section 3, "Background".
SEARCH data - PM2.5 species and mass
measurements at 8 sites in the Southeast
from 1998 to the present. Speciated PM2.5
metals are the only toxics measured in this
network. At the time of the national analysis,
these data were not available in AQS.
- SEARCH data are publicly available via the
Internet and can be downloaded on a site-by-site
basis in a Microsoft Excel output format.
- Site photographs and other useful metadata are available at
the web site.
[Figure: SEARCH Site Locations]
-------
Other Data Archives
(2 of 2)
As part of several projects, an air quality archive (AQA) was developed as
an analysis-ready database that includes data from AQS (1990-2005),
IMPROVE and SEARCH data, and data from the legacy air toxics archive.
This national level database contains nearly 1 billion raw data records, 27
million raw toxics records, and complete validated and temporally
aggregated data sets.
Key data summaries have been posted online:
- 24-hour CSV Files (very large file)
- Monthly CSV Files
- Quarterly CSV Files
- Annual Average CSV Files
- SAS Files (all data, very large file)
Note: CSV files are comma separated files suitable for importing into spreadsheets or
databases. These files are too large to fit into Microsoft Excel spreadsheets but will fit
into Microsoft Access. The SAS files are for use with the SAS Statistical Software
package.
-------
Supplementing Air Toxics Data
A Note on Data Acquisition
A complete set of data is always desirable to assist in analysis. Nontoxic species,
meteorological data, and site-specific conditions (e.g., proximity to emissions) provide
supporting information that will help in data interpretation. You may want to obtain the
following:
• Additional data
- Criteria pollutant species (AQS): multipollutant relationships, transport, diurnal/seasonal
evaluation, source identification
- Meteorological data (AQS, NWS): transport, mixing, source direction, meteorological
adjustment of trends
- All PM2.5 speciation data (OC, EC, sulfate, nitrate, etc.): source identification
- Aethalometer™ data (black carbon): diurnal characterization, source identification
- All speciated hydrocarbon data (e.g., full PAMS target list): air parcel age (transport), source
identification
- Special studies data (e.g., continuous speciated PM data, ammonia): diurnal characteristics,
source identification
• Metadata
- Monitoring objectives: time-frame of data, reasoning for site locations
- Site characteristics (e.g., photos): may explain data anomalies, source identification
- Monitoring scale (likely varies by pollutant): air parcel age (transport), source identification
• Supplemental data
- Emission inventory, especially point sources: source identification
- Population density: relative concentration level
- Vehicle traffic counts: diurnal patterns, source identification
• Links to these data can be found in the resources section of this chapter.
-------
Supplementing Air Toxics Data
Using Metadata
Although some metadata are available through
AQS, metadata are not routinely populated.
Site metadata can assist in analyses by illuminating
sources (such as local sources or roadways) or
physical attributes of the site.
The satellite image shows the monitoring site (red
circle) near an oil refinery that likely influences VOC
concentrations at the site.
A comparison of benzene annual averages at this
site (red) to the state-wide annual average (blue)
indicates benzene concentrations at this site are
substantially higher than the state-wide average.
The satellite image was obtained from Google
Earth, a publicly available program that contains
satellite coverage of the entire planet and is very
useful to investigate monitor siting.
- The program is easy to use; site locations can be entered
as latitude and longitude or as a street address or
browsed to manually. Geographic data for multiple sites
can also be imported from text files.
- Once the site is located, it can be marked and named,
high-resolution pictures can be exported, and the site
information can be saved for future reference.
- Use caution when interpreting maps—reported precisions
of monitor locations vary and not all significant sources
will be easy to identify visually.
In this case, preliminary evidence shows the
refinery may influence local benzene
concentrations; however, this evidence is not
conclusive. Other local sources, local meteorology
(e.g., wind direction on high days), and data or
monitoring issues must be further investigated.
[Figure: Benzene annual average concentrations by year, 2000-2007,
comparing the Site Annual Average with the State Average.]
-------
Supplementing Air Toxics Data
Using Metadata
This sample map shows point
source emissions of criteria
pollutants and annual
average daily traffic counts in
the Detroit area near three
monitoring sites. The
Dearborn site is closest to
major industry. Higher
concentrations of VOCs and
PM2.5 at the Dearborn site
could be explained by these
sources.
Emissions sources for more
detailed species (i.e., not all
VOCs lumped together) are
publicly available at the
county level from the latest
version of the NEI.
This figure was created with ESRI's
ArcMap program and NEI 2002 point
source emissions data.
[Figure: Map of the Detroit area (including Macomb County) showing point
source emissions of PM2.5, NH3, NOx, VOC, and SO2 in tonnes/year, with
symbols scaled by emission magnitude, and annual average daily traffic
binned from 0-10,000 up to 175,001-220,000.]
-------
Converting Units
(1 of 2)
Frequently used units for gaseous air toxics include
µg/m3, parts per billion (ppb), and parts per billion
carbon (ppbC).
The preferred units for risk assessment are µg/m3. The
data are not always delivered or reported in these units.
Useful equations for converting data units:
[conc. in µg/m3] = ([conc. in ppb] * MW * 298 * P) / (24.45 * T * 760)
[conc. in ppb] = ([conc. in µg/m3] * 24.45 * T * 760) / (MW * 298 * P)
[conc. in ppbC] = [conc. in ppb] * (# of carbons in the molecule)
where:
MW = molecular weight of compound [g/mol]
P = absolute pressure of air [mm Hg]; 1 atm = 760 mm Hg
T = temperature of air [K]; 298 K is standard
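The conversion equations above translate directly into small functions. This is a minimal sketch; the function and constant names are illustrative choices, and the defaults assume the standard conditions stated above (298 K, 760 mm Hg).

```python
STD_TEMP_K = 298.0          # standard temperature [K]
STD_PRESSURE_MMHG = 760.0   # standard pressure [mm Hg]
MOLAR_VOLUME_L = 24.45      # liters per mole of gas at 298 K and 760 mm Hg

def ppb_to_ugm3(ppb, mw, temp_k=STD_TEMP_K, pressure_mmhg=STD_PRESSURE_MMHG):
    """Convert a mixing ratio in ppb to a mass concentration in ug/m3."""
    return (ppb * mw * STD_TEMP_K * pressure_mmhg) / (
        MOLAR_VOLUME_L * temp_k * STD_PRESSURE_MMHG)

def ugm3_to_ppb(ugm3, mw, temp_k=STD_TEMP_K, pressure_mmhg=STD_PRESSURE_MMHG):
    """Convert a mass concentration in ug/m3 to a mixing ratio in ppb."""
    return (ugm3 * MOLAR_VOLUME_L * temp_k * STD_PRESSURE_MMHG) / (
        mw * STD_TEMP_K * pressure_mmhg)

def ppb_to_ppbc(ppb, n_carbons):
    """Convert ppb to ppbC by multiplying by the number of carbon atoms."""
    return ppb * n_carbons

# Benzene: MW = 78.11 g/mol, 6 carbons, standard temperature and pressure.
benzene_ugm3 = ppb_to_ugm3(1.0, 78.11)
print(round(benzene_ugm3, 3))          # about 3.195 ug/m3
print(ppb_to_ppbc(1.0, 6))             # 6.0 ppbC
```

Passing the actual sampling temperature and pressure, when they are known, avoids silently assuming standard conditions for every record.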
-------
Converting Units
Examples
Benzene (C6H6) - convert 1 ppb to µg/m3 at standard T and P
[conc. in µg/m3] = ([1 ppb] * 78.11) / (24.45) = 3.195 µg/m3
where T = 298 K (25 C) and P = 760 mm Hg
Carbon tetrachloride (CCl4) - convert 1 ppb to µg/m3 at 0 C, 1 atm
[conc. in µg/m3] = ([1 ppb] * 153.82 * 298) / (24.45 * 273) = 6.867 µg/m3
where T = 273 K (0 C) and P = 760 mm Hg
The EPA provides a thorough walk-through of the unit conversion process:
-------
Know Your Data
Overview
• Before beginning data validation, it helps to know the typical patterns in
an air toxics data set. Having this knowledge helps the analyst set
expectations for data patterns and identify data anomalies. Diurnal and
seasonal patterns help analysts understand possible impacts on data
aggregations when some data are missing.
• Typical air toxics relationships, derived from the central tendencies of a
large national data set, are provided. Patterns at individual sites
may differ from the typical examples shown; understanding why there
are differences becomes part of the data validation and data analysis
steps.
• EPA has developed tabulated dose-response assessments for use in risk
assessment of hazardous air pollutants. The information can be found in
two tables at this website: http://www.epa.gov/ttn/atw/toxsource/summary.html
One table presents values for long-term (chronic) inhalation and oral
exposures and the other presents short-term (acute) inhalation
exposures. Note that these tables are updated periodically to reflect the
most recent information; revisions can make a significant impact on risk
screening assessments.
-------
Know Your Data
Typical Air Toxics Relationships: Seasonal Trends
Pollutants that typically correlate well
- Acetaldehyde and formaldehyde, similar
sources and reactivity
- Benzene and 1,3-butadiene, especially at
locations influenced by mobile source emissions
- Toluene, benzene, and ethylbenzene
• Toluene concentrations are typically
higher than benzene concentrations
• Toluene and ethylbenzene typically
correlate well
National seasonal patterns
- Warm season peak
• Formaldehyde
• Acetaldehyde
• Chloroform
• Manganese PM2.5
- Cool season peak
• Benzene
• 1,3-butadiene
• Hexane
• Chlorine PM2.5 (especially at locations where
roads are salted in winter)
- Invariant
• Carbon tetrachloride
[Figure: Example Seasonal Patterns]
-------
Know Your Data
Typical Air Toxics Relationships: Diurnal Trends
Midday peak, photochemical production
- Acetaldehyde
- Formaldehyde
Morning peak, mobile sources
- Benzene
- 1,3-butadiene
- Xylenes
- Hexane
- Ethylbenzene
- Toluene
- 2,2,4-trimethylpentane
Nighttime peak, affected by dilution
- Methylene chloride
- Mercury vapor
Invariant
- Global background, carbon tetrachloride
[Figure: Example Diurnal Patterns. Concentration by hour of day (0-23) for
benzene, methylene chloride, carbon tetrachloride, and formaldehyde, with
the rush hour peak, photochemical (midday) peak, and nighttime peak
labeled.]
The plot shows example diurnal patterns of benzene, methylene
chloride, carbon tetrachloride, and formaldehyde at a national level.
It was created with Microsoft Excel.
-------
Collocated Data
Overview
Differences between replicate, duplicate, and collocated
measurements
- A replicate sample is a single sample that is chemically analyzed
multiple times.
- A duplicate sample is a single sample that is chemically analyzed twice.
These samples provide a measure of the precision of the chemical
analysis, but do not provide any error estimates for the sample
collection method.
In contrast, collocated samples are two samples collected at the same
location and time by equivalent samplers and chemically analyzed by
the same method.
These samples provide a measure of the precision of both sample
collection and chemical analysis.
EPA's National Air Toxics Trend Sites (NATTS) program proposed
the following collocated data standards:
- Less than 25% bias between collocated samples
- Less than 15% coefficient of variation for each pollutant
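A sketch of how collocated pairs might be screened against these two targets is shown below. The workbook does not give the formulas, so the definitions used here are assumptions: bias is taken as the mean relative percent difference between the paired values, and the coefficient of variation (CV) as the square root of the sum of squared relative percent differences divided by 2n, a form commonly used for paired precision data. The concentrations are made up.

```python
import math

def relative_percent_difference(primary, collocated):
    """Signed percent difference of a pair, relative to the pair mean."""
    pair_mean = (primary + collocated) / 2.0
    return 100.0 * (primary - collocated) / pair_mean

def mean_bias(pairs):
    """Assumed bias measure: mean relative percent difference over all pairs."""
    diffs = [relative_percent_difference(a, b) for a, b in pairs]
    return sum(diffs) / len(diffs)

def collocated_cv(pairs):
    """Assumed CV measure for paired data: sqrt(sum(d^2) / (2n)), in percent."""
    diffs = [relative_percent_difference(a, b) for a, b in pairs]
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))

# Made-up (primary, collocated) concentrations for one pollutant at one site.
pairs = [(1.0, 1.1), (2.0, 1.9), (0.5, 0.55)]
bias = mean_bias(pairs)
cv = collocated_cv(pairs)
print(f"bias = {bias:.1f}%  (target: within 25%)")
print(f"CV   = {cv:.1f}%  (target: below 15%)")
```

Pairs in which either value is at or near the detection limit can inflate both statistics, so it may be reasonable to screen those out before computing them.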
-------
Collocated Data
Handling Collocated Data
-------
Collocated Data
Aggregating Collocated Data
Following are suggested treatments for collocated data:
• Double counting collocated data should be avoided when creating aggregates such
as annual averages. At a site level,
- If scatter plots of the collocated measurements correlate well, the values can be averaged
together for a given site, method, date, and time.
- If the collocated measurements do not agree, there can be no certainty which (if any)
measurement is correct and the data should be excluded from analyses.
If disagreement is a regular occurrence, confidence in other data collected with the same instruments
at that site is reduced.
• After determining that collocated measurements agree, average the two data sets
together following these guidelines.
- If one measurement is missing, use the collocated value as the average value.
Investigate the value to make sure it is consistent with the rest of the data.
- If both values are below detection, treat them as any other data (i.e., average them
together).
- If one measurement is below detection and one is not, use the value above detection as a
conservative approach.
• In some monitoring programs, only data from the primary sample are used in data
analysis and the collocated sample is used only for quality assurance purposes.
• At a national level, it was not possible to QC all collocated data. All valid collocated
data were averaged together. If one value of a collocated pair was missing, the other
value was used in its place, and all data below detection were substituted with MDL/2.
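The site-level averaging guidelines above can be sketched as a single function. This is an illustration, not a prescribed implementation: it assumes that the pair has already been judged to agree, that "below detection" means a value less than the MDL, that both samplers share one MDL, and that a missing value is represented by None.

```python
from typing import Optional

def combine_collocated(primary: Optional[float],
                       collocated: Optional[float],
                       mdl: float) -> Optional[float]:
    """Return one value for a site, method, date, and time from a pair."""
    if primary is None and collocated is None:
        return None                      # nothing was measured
    if primary is None:
        return collocated                # one value missing: use the other
    if collocated is None:
        return primary
    primary_detected = primary >= mdl
    collocated_detected = collocated >= mdl
    if primary_detected != collocated_detected:
        # One above and one below detection: use the detected value.
        return primary if primary_detected else collocated
    # Both above or both below detection: average them.
    return (primary + collocated) / 2.0

print(combine_collocated(1.0, 2.0, mdl=0.5))    # both detected
print(combine_collocated(None, 2.0, mdl=0.5))   # primary missing
print(combine_collocated(0.1, 2.0, mdl=0.5))    # only one detected
```

Applying a rule like this before any temporal aggregation also avoids double counting the collocated pair in monthly or annual averages.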
-------
Data Completeness
Overview
• When performing an analysis, it is important to ensure that data are
comparable across sites, years, or other subsets of the data; and it is essential
to understand the time periods represented in the data (e.g., if the data set is
missing winter months and concentrations are typically high during winter, an
annual average might be biased low). Depending on the types of analyses, it
may be necessary to implement data completeness criteria.
• Completeness criteria are necessary in creating valid aggregated values (such
as annual averages) to verify that the distribution of measured values within
the aggregation window is representative of that entire period. Diurnal,
day-of-week, and seasonal patterns need to be considered.
• Data completeness is computed using the reported sampling frequency (when
available) as a measure of how many samples should be collected in a given
period versus the number of samples that were collected. When aggregating
data, 75% completeness is our suggested minimum value for data. Using
higher or lower completeness criteria may be appropriate for certain analyses
depending on your DQOs.
• If data are missing from a site because of an unforeseen event (e.g., a
hurricane), sampling contamination, or other problems, or if a site always
operates on an incomplete schedule (e.g., ozone monitoring in summer months
only), data may not be representative of the period of interest.
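The completeness calculation described above compares the number of valid samples with the number the sampling schedule should have produced. The sketch below assumes a regular 1-in-N-day schedule and uses the suggested 75% minimum; the function names and the sample counts are illustrative.

```python
def completeness(n_valid, period_days, interval_days):
    """Fraction of scheduled samples that were actually collected."""
    expected = period_days / interval_days
    return n_valid / expected

def meets_criterion(n_valid, period_days, interval_days, minimum=0.75):
    """True if completeness reaches the minimum (75% suggested above)."""
    return completeness(n_valid, period_days, interval_days) >= minimum

# Hypothetical 1-in-6 day site over a 365-day year (about 61 scheduled samples).
for n_valid in (48, 40):
    fraction = completeness(n_valid, 365, 6)
    print(n_valid, f"{fraction:.0%}", meets_criterion(n_valid, 365, 6))
```

A count-based check like this does not show whether the missing samples cluster in one season, so it is worth also checking completeness by quarter or month before relying on an annual average.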
-------
Data Completeness
Interpreting Notched Box Plots
Notched box whisker plots are useful for showing the central trends
of the data (i.e., the median) while also showing variability (i.e., the
box and whiskers).
Definitions provided are for plots prepared using SYSTAT software;
other software may have different definitions.
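To make the plot elements concrete, the sketch below computes the quantities a notched box plot typically displays. The definitions are assumptions and may not match SYSTAT exactly: quartiles come from Python's statistics.quantiles with the "inclusive" method, whisker and outlier limits are set at 1.5 and 3 times the interquartile range beyond the box, and the notch uses the common approximation median +/- 1.57 * IR / sqrt(n) for a 95% confidence interval. The data are made up.

```python
import math
import statistics

def box_plot_summary(values):
    """Summary statistics behind a notched box plot (assumed definitions)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # whiskers reach data inside these
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # beyond these: far outliers
    half_notch = 1.57 * iqr / math.sqrt(len(values))
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "notch": (median - half_notch, median + half_notch),
        "outliers": [v for v in values
                     if (v < inner[0] or v > inner[1])
                     and outer[0] <= v <= outer[1]],
        "far_outliers": [v for v in values if v < outer[0] or v > outer[1]],
    }

# Made-up concentrations with two high values.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 30])
for key, value in summary.items():
    print(key, value)
```

When the notches of two boxes do not overlap, the medians are often read as different at roughly the 95% confidence level, which is the main reason for drawing the notch.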
[Figure: Annotated notched box plots identifying the median, the notch
(95% C.I. of the median), the box (interquartile range, IR; 25th to 75th
percentile), the whiskers (data within 1.5*IR), outliers (data within
3*IR), and data beyond 3*IR.]
-------
Data Completeness
Example Effect of Aggregating Incomplete Data
This example illustrates why data completeness
criteria should be met when creating data
aggregates.
The first graph shows the seasonal pattern of 24-hr
benzene samples from an urban site. This
seasonal pattern (lower concentrations in summer)
is typical of national concentrations and is driven by
dilution from higher mixing heights in summer.
Summer concentrations may also be reduced in
areas where Reid vapor pressure caps are
implemented (gasoline volatility).
[Figure: Benzene concentration by month (1-12), with summer and winter
months distinguished by SEASON.]
The annual averages in the second figure were
constructed using only summer (red) or winter
(blue) data to illustrate aggregation results from an
incomplete data set (this is NOT how aggregations
should be constructed). Incomplete data cause the
summer "annual averages" to be biased low and
the winter "annual averages" to be biased high; the
black line shows the true average of all data. This
example is an artificial case of incomplete annual
data, but it demonstrates the importance of applying
data completeness criteria and the erroneous results
that may be reached without them.
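The bias described in this example can be reproduced with a few lines of arithmetic. The monthly values below are synthetic and purely illustrative (they are not the benzene data in the figures), and the May-October "summer" split is an assumption.

```python
from statistics import mean

# Synthetic monthly mean concentrations (ug/m3), January..December.
# Illustrative values only: higher in the cool season, lower in summer.
monthly = [2.0, 1.8, 1.5, 1.2, 1.0, 0.8, 0.8, 0.9, 1.1, 1.4, 1.7, 1.9]

summer_months = {5, 6, 7, 8, 9, 10}  # assumed warm-season split (May-Oct)
summer = [c for m, c in enumerate(monthly, start=1) if m in summer_months]
winter = [c for m, c in enumerate(monthly, start=1) if m not in summer_months]

true_annual = mean(monthly)   # average of all data
summer_only = mean(summer)    # "annual average" from summer data only
winter_only = mean(winter)    # "annual average" from winter data only

print(f"all data: {true_annual:.2f}, summer only: {summer_only:.2f}, "
      f"winter only: {winter_only:.2f}")
```

With these values the summer-only average falls below the all-data average and the winter-only average falls above it, which is the pattern the second figure illustrates.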
[Figure: "annual averages" of benzene by YEAR (1998-2006) computed from summer data only, from winter data only, and from all data.]
Figures were created in SYSTAT.
24
-------
Data Aggregation
Creating Valid 24-hr Averages
When day-of-week, seasonal, and annual patterns are examined,
subdaily data may be aggregated to valid daily averages as a
starting point for comparison.
In the calculation process, it is important to check that 24-hr
averages are representative of a significant portion of the day
because diurnal fluctuations in pollutant concentration throughout
the day may bias the average if incomplete data are used.
It is suggested that a 75% daily completeness criterion be used to
ensure that a large portion of the day is represented. The resulting
cutoffs by sample duration are shown in the table below.
Sample Duration    75% Daily Completeness Cutoff (# of samples)
1-hr               18
2-hr               9
3-hr               6
4-hr               5
6-hr               3
8-hr               3
12-hr              2
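The cutoffs in the table are consistent with rounding 75% of the possible samples per day up to the next whole sample. A minimal sketch of that rule and of a valid 24-hr average follows; the function names are hypothetical, and missing samples are assumed to be represented as None.

```python
import math
from statistics import mean


def daily_cutoff(sample_hours: int, fraction: float = 0.75) -> int:
    """Minimum number of samples needed in a day for a valid 24-hr average."""
    possible = 24 // sample_hours          # samples possible in one day
    return math.ceil(fraction * possible)  # round up to a whole sample


def valid_daily_average(values, sample_hours):
    """Average of one day's subdaily values, or None if the day is incomplete.

    `values` holds one entry per sampling period; None marks a missing sample.
    """
    present = [v for v in values if v is not None]
    if len(present) < daily_cutoff(sample_hours):
        return None
    return mean(present)


cutoffs = {h: daily_cutoff(h) for h in (1, 2, 3, 4, 6, 8, 12)}
incomplete = valid_daily_average([1.0, 2.0, None], sample_hours=8)
complete = valid_daily_average([1.0, 2.0, 3.0], sample_hours=8)
```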
25
-------
Data Aggregation
Creating Valid Monthly Averages
Monthly averages are useful in assessing seasonal variability.
It is suggested data meet the 75% completeness criteria as determined by sample
frequency, assuming an average of 30 days in a month. Note that low sample
frequency data may not adequately represent monthly values with any certainty.
Therefore, at least four samples should be required in a month.
Frequency        75% Monthly Completeness Cutoff
Daily            23
Every 3rd Day    8
Every 6th Day    4
Other            4
Unassigned frequencies mean that no frequency was reported with the data and a
frequency could not be easily determined. The completeness criterion then defaults
to the minimum to preserve data, but such records should be identified for later QC if possible.
In the national data set, 74% of air toxics data were not assigned frequencies. A few
methods were tested to fully populate the frequencies, but were not further pursued.
Also in the national level analyses, monthly averages were only used to investigate
seasonal patterns. Quarterly averages were used instead to compute annual
averages because more data were expected to meet completeness criteria.
26
-------
Data Aggregation
Creating Valid Quarterly and Annual Averages
Annual averages are calculated by first computing valid quarterly averages.
Quarterly Averages
- Quarterly averages are calculated from valid 24-hr averages.
- 75% of data at the expected daily sampling frequency is suggested for a valid
calendar quarter average, i.e.,
Frequency         75% Quarterly Completeness Cutoff
Daily             68
Every 3rd Day     24
Every 6th Day     12
Every 12th Day    6
Unassigned        6
- At least 58 days are suggested between the first and last sample in a quarter to
ensure sampling represented the entire quarter.
- Unassigned frequencies mean that no frequency was reported with the data and
a frequency could not be easily determined. The completeness criterion then
defaults to the minimum to preserve data, but such records should be identified
for later QC if possible.
Annual Averages - three out of four valid quarterly averages are required.
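The quarterly and annual rules above can be sketched as follows. The cutoffs are taken from the table as a lookup; the function names, the frequency labels used as dictionary keys, and the choice to compute the annual value as the mean of the valid quarterly averages are assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import mean

# Cutoffs from the quarterly table; the key names are hypothetical labels.
QUARTERLY_CUTOFF = {
    "daily": 68,
    "every_3rd_day": 24,
    "every_6th_day": 12,
    "every_12th_day": 6,
    "unassigned": 6,
}
MIN_SPAN_DAYS = 58  # suggested span between first and last sample


def valid_quarterly_average(samples, frequency="unassigned"):
    """samples: list of (date, valid 24-hr average) within one calendar quarter."""
    if len(samples) < QUARTERLY_CUTOFF[frequency]:
        return None
    dates = [d for d, _ in samples]
    if (max(dates) - min(dates)).days < MIN_SPAN_DAYS:
        return None
    return mean(v for _, v in samples)


def valid_annual_average(quarterly_averages):
    """Require at least three valid quarters (None marks an invalid quarter)."""
    valid = [q for q in quarterly_averages if q is not None]
    if len(valid) < 3:
        return None
    return mean(valid)  # assumption: annual value = mean of valid quarters


# Hypothetical quarter: eight 1-in-12-day samples starting 3 January 2005.
q1_samples = [(date(2005, 1, 3) + timedelta(days=12 * i), 1.0 + 0.1 * i)
              for i in range(8)]
q1 = valid_quarterly_average(q1_samples, "every_12th_day")
annual_ok = valid_annual_average([1.0, None, 2.0, 3.0])
annual_bad = valid_annual_average([1.0, None, None, 3.0])
```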
27
-------
Method Detection Limits
Overview
The EPA Code of Federal Regulations (CFR) defines the MDL as "The minimum
concentration of a substance that can be measured and reported with 99%
confidence that the analyte concentration is greater than zero and is determined from
analysis of a sample in a given matrix containing the analyte".
The purpose of an MDL is to discriminate against false positives. Values reported
below the MDL have much higher uncertainty but can provide insight into the lower
concentration distribution (i.e., are most values closer to the MDL or to zero?).
In the illustration, normally distributed
results from a measured value of zero
yield a 99% confidence value (3σ) at
3 ppb, which would be used as the MDL in
this case. There is >99% confidence that
values above 3 ppb are not false positives.
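As a check on the ">99%" statement, the one-sided confidence associated with a 3σ threshold can be computed from the normal distribution. This sketch assumes the illustration's idealized case (normally distributed results centered on zero, with σ = 1 ppb implied by 3σ = 3 ppb); it is not the CFR procedure for determining an MDL.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


sigma_ppb = 1.0              # implied by the illustration: 3 sigma = 3 ppb
mdl_ppb = 3.0 * sigma_ppb

# Probability that a true-zero sample reads below the MDL (one-sided).
one_sided_confidence = normal_cdf(mdl_ppb / sigma_ppb)
false_positive_rate = 1.0 - one_sided_confidence

print(f"confidence: {one_sided_confidence:.4%}, "
      f"false positives: {false_positive_rate:.4%}")
```

Under these assumptions the confidence is about 99.87%, so roughly 1 in 700 true-zero samples would read above 3 ppb.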
[Figure: normal distribution of measured results centered on zero, with the MDL marked; x-axis: Concentration (ppb).]
Environmental Protection Agency, 1982
28
-------
Method Detection Limits
MDLs Are Not Low Enough For Most Air Toxics Measurements
• 52% of all air toxics measurements reported in AQS from 1990-2005
are at or below the MDL.
• This percentage varies widely across pollutants; some are close to
100% below MDL.
• Data below MDL can be reported in two ways.
- Uncensored: The measured value is reported.
- Censored: The measured value is replaced with a proxy. Typical
examples are MDL, MDL/2, MDL/10, or zero.
• The NATTS program requires laboratories to report uncensored
values; this approach is neither uniformly nor historically applied
across networks and laboratories.
• We suggest that data below detection not be removed from analyses.
A measurement below detection does not necessarily indicate a
value of zero because ambient concentrations can be lower than
currently available MDLs. Data below detection are representative of
the lower ambient concentration range, and removing them from
analyses will bias results toward higher concentrations and may
cause incorrect conclusions.
-------
Identifying Censored Data
(1 of 2)
Data are typically reported as concentration values with accompanying
MDLs. In AQS, the MDL is either a default value associated with the
analytical method (MDL) or a value assigned by the reporting entity for
that specific record (alternate MDL).
NATTS program guidance suggests that laboratories report all values,
regardless of the MDL. However, many air toxics data are reported as
censored values—i.e., they have been replaced with zero, MDL/2, MDL,
or some other value.
Identifying censored values is a necessary first step in treating data
below detection. Reporting of censored data will most likely differ
between sites and may even be different by method, parameter, or time
period for a given site.
Identify and separate data at or below the detection limit along with the
associated MDL and date/time. If alternate MDLs are available, make
sure to use these alternates over the default MDLs.
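A minimal sketch of this separation step is shown below. The record layout and field names are hypothetical (they are not AQS field names); the only logic taken from the text is that an alternate MDL, when present, takes precedence over the default MDL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    when: str                              # sample date/time
    value: float                           # reported concentration
    default_mdl: float                     # method default MDL
    alternate_mdl: Optional[float] = None  # record-specific MDL, if reported


def effective_mdl(rec: Record) -> float:
    """Prefer the alternate MDL over the default MDL when it is available."""
    return rec.alternate_mdl if rec.alternate_mdl is not None else rec.default_mdl


def at_or_below_mdl(records):
    """Return (date/time, value, MDL used) for records at or below detection."""
    return [(r.when, r.value, effective_mdl(r))
            for r in records if r.value <= effective_mdl(r)]


# Hypothetical records for one site, parameter, and method.
records = [
    Record("2004-01-06", 0.80, default_mdl=0.50),
    Record("2004-01-12", 0.25, default_mdl=0.50),
    Record("2004-01-18", 0.40, default_mdl=0.50, alternate_mdl=0.30),
    Record("2004-01-24", 0.20, default_mdl=0.10, alternate_mdl=0.40),
]
below = at_or_below_mdl(records)
```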
-------
Identifying Censored Data
(2 of 2)
• Examine the data for obvious substitution. Count the number of times each
value at or below detection is reported for a given site, parameter, and method.
Are the majority of data reported as the same value (e.g., zero or MDL/2)?
- If data are largely reported as two or more values, investigate the temporal variation of
the data. Are there large step changes where reporting methods or MDLs have
changed?
- Do the duplicate values indicate a typical censoring method (e.g., MDL/2, MDL/10)?
- Alternate MDLs may be different for each sample run causing a distribution of values if
MDL/x substitutions were used. That values below MDL are not all the same does not
mean they are not censored.
• Check for MDL/x substitution.
- Make a scatter plot of the value vs. MDL to see if the data fall on a straight line.
- If the data form a straight line, the slope of the regression line will indicate the value by
which the MDL has been divided.
Is the value a reasonable number that would be used for MDL substitution (e.g., 1, 2, 5,
or 10)?
- If the data have been formatted, processed, or converted, ratios may not be exactly the same
due to rounding differences; the distribution should be close to a straight line and centered
around a single integer if MDL/x substitutions have been made.
- If a bifurcated pattern is observed, the substitution method may have changed over time. Plot a
time series of the ratios and look for step changes.
• The distribution of the ratios should be highly variable if the data are not censored.
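The first check in this list (counting how often each value at or below detection is reported) can be sketched with a simple tally. The function name and the example values are hypothetical.

```python
from collections import Counter


def repeated_value_summary(values, top=3):
    """Most frequently reported values as (value, count, fraction of records)."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, c / n) for v, c in counts.most_common(top)]


# Hypothetical below-detection values for one site, parameter, and method.
below_mdl_values = [0.05] * 8 + [0.0] * 2 + [0.031, 0.044]
summary = repeated_value_summary(below_mdl_values)

most_common_value, count, fraction = summary[0]
print(f"{most_common_value} reported {count} times ({fraction:.0%} of records)")
```

A single value dominating the tally suggests substitution; a spread of distinct values does not rule it out, because alternate MDLs can differ from sample to sample.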
-------
Identifying Censored Data
Example
The data shown in the table
are values for a given air
toxic below detection in a
selected year.
The reported data, at first
glance, appear to be "real"
concentrations (e.g., the
histogram shows a
distribution of
concentrations).
However, the ratio of MDL
to reported concentration
equals 2 (with very small
deviations likely due to unit
conversions). The
relationship is also visible in
a scatter plot as shown
here.
Therefore, in this example,
the reported concentrations
have been substituted with
MDL/2.
[Figure: scatter plot of reported concentration (µg/m3) vs. MDL (µg/m3) with fitted line y = 2.0x - 0.0.]
[Figure: histogram of reported concentrations (CONC, µg/m3).]
Reported Concentration (µg/m3)    MDL (µg/m3)
0.19161                           0.38237
0.20438                           0.40834
0.22141                           0.44283
0.38748                           0.77921
0.40451                           0.81327
0.37896                           0.75792
0.17032                           0.34404
0.18309                           0.36193
0.27251                           0.54502
0.31935                           0.64295
0.31083                           0.62166
0.29380                           0.58760
0.32361                           0.65147
0.26825                           0.53225
0.27677                           0.55354
0.31509                           0.63018
0.25548                           0.51521
0.32786                           0.65573
0.27677                           0.55354
0.25548                           0.51521
0.25548                           0.51521
0.25548                           0.51521
0.29380                           0.58760
0.31083                           0.62166
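The ratio check can be applied directly to the tabulated values. The sketch below uses the first eight rows of the table; the variable names are hypothetical, and the slope is fit through the origin with MDL as the dependent variable (an assumption made so that the slope reads directly as the divisor x in MDL/x).

```python
from statistics import median

# First eight rows of the example table (ug/m3).
reported = [0.19161, 0.20438, 0.22141, 0.38748,
            0.40451, 0.37896, 0.17032, 0.18309]
mdl = [0.38237, 0.40834, 0.44283, 0.77921,
       0.81327, 0.75792, 0.34404, 0.36193]

# Ratio of MDL to reported value for each record.
ratios = [m / c for m, c in zip(mdl, reported)]
typical_ratio = median(ratios)
spread = max(ratios) - min(ratios)

# Least-squares slope through the origin for MDL = slope * reported value.
slope = (sum(m * c for m, c in zip(mdl, reported))
         / sum(c * c for c in reported))

print(f"median ratio {typical_ratio:.3f}, spread {spread:.3f}, slope {slope:.3f}")
```

A median ratio and slope close to 2 with a small spread are what MDL/2 substitution would produce; uncensored data would be expected to give widely varying ratios.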
-------
Method Detection Limits
Treating Data Below Detection (1 of 2)
• Treatment of national-level data
At a national level, the majority of data collected from 1990 to present have been reported
below the MDL with censored values; uncensored values are not typically reported. When
analyzing national data, all measurements below detection were replaced with MDL/2 for two
reasons: (1) identification of data sets with uncensored values (i.e., NOT zero, MDL/2, or
MDL) is difficult and (2) data below detection need to be treated consistently across the entire
time period and all sites.
• Treatment of site-level data
- In a site-level analysis, in which the analyst knows how the data have been reported, more
sophisticated methods may be employed.
• If uncensored values are reported below MDL, use the data "as is" with no substitution.
• If uncensored values are not available, use MDL/2 substitution for data at or below MDL if trying to
calculate an annual mean value:
- Substitution may lead to a bias on the order of 10-40% in the annual average when < 85% of the data are below MDL.
- At >85% of data below MDL, uncertainties are large and one may only reliably state that the concentration is below MDL.
- Alternatives to MDL/2 substitution are more statistically intensive; however, in some cases
they may yield better results. Note at a high degree of censoring (>70% censored data), no
technique will produce good estimates of summary statistics. EPA recommends some
approaches other than MDL/2 substitution:
• Regression on order statistics (ROS) and probability plotting (MR) methods. ROS and MR methods are
superior when the shape of the population distribution is unknown or nonparametric.
• Maximum likelihood estimation (MLE). MLE methods have been shown to have the smallest mean-
squared error (i.e., higher accuracy) of available techniques when the data distribution is exactly normal
or lognormal.
33
-------
Method Detection Limits
Treating Data Below Detection (2 of 2)
• Treatment of site-level data
- ROS produces more accurate results when >30% of the data are below detection.
- MLE does not work well for data sets with <50 detected values.
- Kaplan-Meier is effective for data sets when less than 70% of the data are
censored and the distribution is nonparametric.
• Mixed Data Sets
- For data sets that have a mix of censored and uncensored data, compare two
substitution methods: (1) substitute MDL/2 for censored values and leave
uncensored values "as is" and (2) substitute MDL/2 for all data below detection.
- Results that are comparable using both substitution methods increase
confidence in the results, and substitution method 1 should be retained. If the
results do not agree, a more sophisticated method for estimating the data below
MDL may be employed.
• In all cases, data below detection should be flagged, and the percentage of
data below MDL calculated for all aggregated values. A more detailed
discussion of aggregated trends and data below detection (as used in the
national data analysis) can be found in Section 6.
EPA's current guidance is summarized on Slide 42.
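The mixed-data comparison can be sketched as below. The function name, the example values, and the use of a relative difference in the mean as the comparison metric are assumptions; the workbook does not define a numerical threshold for "comparable", so none is applied here.

```python
from statistics import mean


def compare_substitutions(values, mdls, censored_flags):
    """Mean under the two substitution methods and their relative difference.

    Method 1: MDL/2 for censored values; uncensored values kept "as is".
    Method 2: MDL/2 for all values at or below the MDL.
    """
    rows = list(zip(values, mdls, censored_flags))
    method1 = [mdl / 2 if cens else v for v, mdl, cens in rows]
    method2 = [mdl / 2 if (cens or v <= mdl) else v for v, mdl, cens in rows]
    mean1, mean2 = mean(method1), mean(method2)
    return mean1, mean2, abs(mean1 - mean2) / mean1


# Hypothetical record set: MDL = 0.5 throughout; the last two values were
# identified as censored, while 0.3 and 0.1 are uncensored values below MDL.
values = [1.2, 0.9, 0.3, 0.1, 0.25, 0.25]
mdls = [0.5] * 6
censored = [False, False, False, False, True, True]

mean1, mean2, rel_diff = compare_substitutions(values, mdls, censored)
print(f"method 1: {mean1:.3f}, method 2: {mean2:.3f}, difference: {rel_diff:.1%}")
```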
-------
Data Treatment Methods
The selection of a data treatment method for below MDL data depends on
the amount of data below MDL and the data quality objectives which are to
be met. Methods explored in previous air toxics work are discussed next.
- Ignore data below MDL.
• Not recommended. Reduces number of samples. Results in a bias of higher values
in summary statistics.
- Replace data below MDL with zero.
• Not recommended. May bias summary statistics low.
- Replace data below MDL with the actual MDL.
• Not recommended. May bias summary statistics high.
- Replace data below MDL with % non-detects*MDL
• Not recommended. Found to be similar to MDL/2 substitution.
- Replace data below MDL with MDL/2.
• Recommended as a simple method for calculating mean values with relatively small
bias.
- Replace data below MDL with more statistically intensive approaches (such
as Kaplan-Meier, Maximum Likelihood Estimation, and Robust Regression on
Order Statistics [KM, MLE, and ROS])
• Recommended for sophisticated analyses such as quantifying percentiles in the data
rather than simply the mean.
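The direction of the biases listed above can be illustrated with a small synthetic data set in which the "true" values below the MDL are known. The values and the MDL are invented for illustration; the size of each bias in real data depends on the distribution and on the fraction of data below MDL.

```python
from statistics import mean

MDL = 0.5
# Synthetic "true" concentrations (ug/m3); six of the ten fall below the MDL.
true_values = [0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.6, 0.8, 1.2, 2.0]

detected = [v for v in true_values if v >= MDL]
n_below = len(true_values) - len(detected)

# Mean under each simple treatment of the below-MDL values.
results = {
    "ignore": mean(detected),
    "zero": mean(detected + [0.0] * n_below),
    "mdl": mean(detected + [MDL] * n_below),
    "mdl_half": mean(detected + [MDL / 2] * n_below),
}
true_mean = mean(true_values)
closest = min(results, key=lambda k: abs(results[k] - true_mean))

for name, value in results.items():
    print(f"{name:9s} mean = {value:.2f} (true mean = {true_mean:.2f})")
```

For this constructed example, ignoring the low values and substituting the MDL both overestimate the mean, substituting zero underestimates it, and MDL/2 comes closest.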
-------
Maximum Likelihood Estimation (MLE)
• Maximum likelihood estimation (MLE) (also called Cohen's
method) is a popular statistical method used for fitting a
mathematical model to data.
• This method relies on knowing (or assuming) the underlying
statistical distribution (e.g., lognormal) from which the data are
derived.
• Uncensored data are used to calculate fitting parameters that
represent the best fit to the distribution.
• MLE is sensitive to outliers and does not perform well if the data
do not follow the assumed distribution.
• MLE requires at least 50 uncensored values to work well, so
1-in-6-day sampling will usually not be sufficient for calculating
annual statistics using this technique.
-------
MLE Calculations
Using Statistical Software
The MLE model is a parametric analysis because the
distribution is assumed -- usually assumed to be
lognormal for atmospheric data.
Each data value is assigned a range of possible
concentrations:
- Censored data: Lower value = 0, Higher value = MDL
- Uncensored data: Lower value = Higher value = Reported value
The statistical software procedure may require a
distribution for the input, or require you to log-transform
your data if a normal distribution is assumed.
Summary statistics will be produced that provide
estimates of mean, standard deviation, and some
percentiles for the data set of interest.
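To show what a statistical package does internally, the sketch below fits a lognormal distribution to left-censored data by maximizing the likelihood, with each censored value contributing the probability of falling between 0 and its MDL. It is a bare-bones illustration under stated assumptions: the data are hypothetical, the optimizer is a simple compass search written for this example rather than a library routine, and the sample is far smaller than the 50 uncensored values suggested above.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative probability (erfc keeps tail accuracy)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def neg_log_likelihood(mu, sigma, detected, censored_mdls):
    """Negative log-likelihood of a lognormal fit to left-censored data."""
    ll = 0.0
    for x in detected:                      # density at each detected value
        z = (math.log(x) - mu) / sigma
        ll += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    for m in censored_mdls:                 # P(0 < value <= MDL) if censored
        z = (math.log(m) - mu) / sigma
        ll += math.log(max(norm_cdf(z), 1e-300))
    return -ll


def fit_censored_lognormal(detected, censored_mdls, tol=1e-6):
    """Compass search for the (mu, sigma) of log-concentration."""
    logs = [math.log(x) for x in detected]
    mu = sum(logs) / len(logs)
    sigma = max(math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs)), 0.1)
    best = neg_log_likelihood(mu, sigma, detected, censored_mdls)
    step = 0.5
    while step > tol:
        moved = False
        for d_mu, d_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_mu, cand_sigma = mu + d_mu, sigma + d_sigma
            if cand_sigma <= 1e-6:
                continue
            value = neg_log_likelihood(cand_mu, cand_sigma, detected, censored_mdls)
            if value < best:
                mu, sigma, best, moved = cand_mu, cand_sigma, value, True
        if not moved:
            step /= 2.0                     # refine once no direction improves
    return mu, sigma


# Hypothetical data (ug/m3): six detected values and four non-detects, MDL = 0.5.
detected = [0.6, 0.7, 0.8, 1.2, 1.5, 2.0]
censored_mdls = [0.5, 0.5, 0.5, 0.5]

mu_hat, sigma_hat = fit_censored_lognormal(detected, censored_mdls)
mean_hat = math.exp(mu_hat + 0.5 * sigma_hat ** 2)  # mean of fitted lognormal
naive_log_mean = sum(math.log(x) for x in detected) / len(detected)
```

Because the censored values carry information that some concentrations lie below the MDL, the fitted log-mean is lower than the log-mean of the detected values alone.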
-------
Nonparametric Kaplan-Meier (KM)
• Nonparametric methods rely only on ranks of
data and make no assumptions about the
statistical distribution of the data.
• Nonparametric methods are insensitive to
outliers.
-------
KM Using Statistical Software
• Kaplan-Meier can be accessed under Survival Analysis in most
statistical packages.
- This analysis usually expects data to be right-censored (i.e., values
greater than X, rather than less than X).
- Data may need to be "flipped". Take your highest value and set it as
the upper-bound. Subtract all values from it to get your input data set.
Censored data are considered less than the MDL.
• Original data set = 10, 7, 3, ..., 0.7, ...
• Flipped data set = 0, 3, 7, ..., 9.3, ...
- Input your flipped data set along with a second column indicating the
censored data values.
• The output will include a survival plot (cumulative distribution
function) and estimated summary statistics for the flipped data set.
- Re-flip the summary statistics for mean, median, and percentiles.
- Measures of variance (standard deviation, confidence intervals) are
independent of flipping and do not need to be changed from the output
values.
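The flip, estimate, and re-flip sequence can be sketched without a statistics package. In the example, the detected values 10, 7, 3, and 0.7 come from the slide; the censored entries (<2, <1.5, <0.3, <0.3) are an assumption made for illustration. The function is hypothetical, and the mean is computed as the area under the survival curve out to the largest flipped value, which is one common convention when the smallest original values are censored.

```python
def km_mean_left_censored(values, censored):
    """Kaplan-Meier mean for left-censored data via flipping.

    values: reported value, or the MDL where censored[i] is True.
    Returns (estimated mean, survival curve of the flipped data).
    """
    flip_const = max(values)                    # highest value as upper bound
    flipped = [flip_const - v for v in values]  # "<MDL" becomes ">flipped"

    event_times = sorted({t for t, c in zip(flipped, censored) if not c})
    survival = 1.0
    curve = []                                  # (time, survival after time)
    for t in event_times:
        at_risk = sum(1 for f in flipped if f >= t)
        events = sum(1 for f, c in zip(flipped, censored) if f == t and not c)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))

    # Mean of flipped data = area under the step-function survival curve.
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (max(flipped) - prev_t)

    return flip_const - area, curve             # re-flip the mean


values = [10, 7, 3, 2, 1.5, 0.7, 0.3, 0.3]
censored = [False, False, False, True, True, False, True, True]
km_mean, km_curve = km_mean_left_censored(values, censored)
print(f"Kaplan-Meier mean = {km_mean:.3f}")
```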
-------
Robust Regression on
Order Statistics (ROS)
These techniques calculate summary statistics with a
regression equation on a probability plot.
ROS assumes a distribution only for censored data.
This technique is better for data sets with <30
observations and is therefore suited to typical air toxics
data sets.
-------
ROS using Statistical Software
Data are input as reported values and MDL-censored values.
MDL-censored values will need a column indicating they are
censored.
ROS statistics calculate the probability that observed data are
below each MDL value. If there is only one MDL value, this is just
the fraction of data below MDL.
- Original data set = 10, 7, 3, ..., 0.7, ...
• Probability > 2 = 0.375
• Probability > 1.5 = 0.375
• Probability > 0.3 = 0.583
- Using these probabilities, probability plotting positions are calculated
for all detected and censored observations using the detected data to
determine a best-fit distribution.
- Summary statistics are output from this dataset.
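The first ROS step (the probability of exceeding each MDL) can be sketched as below. Two assumptions are made and should be read as such: the recursion is the Helsel-Cohn plotting-position formula, which may differ from what a given package implements, and the censored entries (<2, <1.5, <0.3, <0.3) are hypothetical values chosen because, together with the detected values 10, 7, 3, and 0.7, they reproduce the probabilities listed above.

```python
def exceedance_probabilities(detected, censored_limits):
    """Probability of exceeding each detection limit (highest limit first).

    For limit j, A = detected values between limit j and the next-higher
    limit, and B = all observations (detected or censored) below limit j:
        p_j = p_next + A / (A + B) * (1 - p_next)
    """
    probs = {}
    p_next = 0.0                # exceedance probability of next-higher limit
    upper = float("inf")
    for limit in sorted(set(censored_limits), reverse=True):
        a = sum(1 for x in detected if limit <= x < upper)
        b = (sum(1 for x in detected if x < limit)
             + sum(1 for c in censored_limits if c <= limit))
        p = p_next
        if a + b > 0:
            p = p_next + (a / (a + b)) * (1.0 - p_next)
        probs[limit] = p
        p_next, upper = p, limit
    return probs


detected = [10, 7, 3, 0.7]
censored_limits = [2, 1.5, 0.3, 0.3]   # assumed censored entries (as MDLs)
probs = exceedance_probabilities(detected, censored_limits)

for limit, p in probs.items():
    print(f"Probability > {limit} = {p:.3f}")
```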
-------
Data Treatment Methods
Summary
EPA's current recommendations for treating data below MDL are provided in
the table below; EPA is developing more definitive guidance.
Exploratory Use
  Small # of Samples: MDL/2 (if only a few samples are < MDL)
  Large # of Samples: MDL/2 (if < 15% of samples are < MDL)
  Very Large # of Samples: Cohen (normal distribution); Kaplan Meier (other than normal)
Publication Use
  Small # of Samples: Kaplan Meier
  Large # of Samples: Kaplan Meier; Cohen (if approx. normal distribution)
  Very Large # of Samples: Cohen (normal distribution); Kaplan Meier (other than normal)
Regulatory Use
  Small # of Samples: Kaplan Meier
  Large # of Samples: Kaplan Meier
  Very Large # of Samples: Kaplan Meier
Warren and Nussbaum, 2009
42
-------
Data Validation
Introduction (1 of 2)
Data validation is defined as the process of determining the
quality and validity of observations.
The purpose of data validation is to detect and verify any
data values that may not represent the actual physical and
chemical conditions at the sampling station before the data
are used in analysis.
Validation guidelines are built on knowledge of typical air
toxics emissions sources; formation, loss, and transport
processes; chemical relationships; and site-specific
knowledge.
The primary objective is to produce a database with values
that are of a known quality, an acceptable quality, or a level
of uncertainty given the analyses intended to be conducted.
-------
Data Validation
Introduction (2 of 2)
The identification of outliers, errors, or biases is typically carried out in several
stages or validation levels (U.S. Environmental Protection Agency 1999).
- Level 0: Routine verification that field and laboratory operations were conducted in
accordance with standard operating procedures (SOPs) and that initial data processing and
reporting were performed in accordance with the SOP (typically the monitoring entity
performs this step).
- Level I: Internal consistency tests to identify values in the data that appear atypical when
compared to values in the entire data set.
- Level II: Comparisons of current data with historical data (from the same site) to verify
consistency over time.
- Level III: Parallel consistency tests with other data sets with possibly similar characteristics
(e.g., the same region, period of time, background values, air mass) to identify systematic
bias.
The data analyst performs Level I steps, and performs additional validation when
other data sets are available.
Data validation is improved by understanding air toxics emissions, formation,
transport, and removal processes. Useful supplementary information in
understanding air toxics species (including data sheets and other information about
air toxics species) is available (links and examples are provided in the appendix to
this section).
There is no substitute for the local knowledge of monitoring sites; operators or
those who have extensive knowledge of the area are a unique resource for data
analysts. However, for those not familiar with a site, spatial maps with topography,
emissions source, and roadway information are excellent tools for understanding
site characteristics.
-------
Data Validation
Initial Approach
• Look at your data—visual inspection is vital.
• Manipulate your data—sort it, graph it, map it—so that it begins to tell a
story. Often, important issues or errors in the data will become apparent
only after someone begins to use the data for some purpose.
• Several checks may be made during the beginning stages of data
validation to single out odd data:
- Range checks: check minimum and maximum concentrations for anomalous
values. National analysis may provide reasonable concentration ranges for
comparison; these levels are provided in the appendix to this section.
- Buddy site check: compare concentrations at one site to nearby sites to identify
anomalous differences.
- Sticking check: check data for consecutive equal data values, which indicate the
possibility of censored data not appropriately flagged.
- Comparison to remote background concentrations: urban air toxics
concentrations should not be lower than remote background concentrations.
• Examples of useful graphics and summaries include scatter plots, time
series plots, fingerprint plots (i.e., sample composition), box whisker plots,
and summary statistics.
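Two of the checks above, the range check and the sticking check, are simple enough to sketch directly. The function names, the range limits, and the minimum run length of three are hypothetical choices; suitable limits depend on the pollutant and site.

```python
def range_check(values, low, high):
    """Indices of values outside the range [low, high]."""
    return [i for i, v in enumerate(values) if v < low or v > high]


def sticking_check(values, min_run=3):
    """Runs of consecutive equal values as (start index, run length, value)."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the data or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((start, i - start, values[start]))
            start = i
    return runs


# Hypothetical time-ordered concentrations (ug/m3) at one site.
series = [0.8, 0.25, 0.25, 0.25, 0.25, 1.1, 45.0, 0.9]

out_of_range = range_check(series, low=0.1, high=10.0)
stuck = sticking_check(series)
```

Here the value 45.0 fails the range check and the four consecutive values of 0.25 are reported as a run, which would prompt a look for unflagged substitution (e.g., MDL/2).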
-------
Things to Consider When
Evaluating Your Data
• Levels of other pollutants
A high concentration of benzene may be valid when concentrations of all mobile
source air toxics in the sample are also elevated.
• Time of day/year
Higher concentrations of some air toxics (such as formaldehyde) are expected in the
summer than in the winter; the reverse is expected for benzene.
• Observations at other sites
High concentrations of a pollutant at several sites in an area on the same date may
indicate a real emission event.
• Audits and inter-laboratory comparisons
If data are from differing sources, how well did the concentrations compare between
labs? Did audits show some specific "problem" pollutants?
• Site characteristics
High concentrations may be expected for a pollutant emitted by a nearby source.
• Unique events (e.g., holiday fireworks)
High concentrations of trace metals associated with fireworks are seen around
the Fourth of July and New Year's Day at many sites.
-------
Data Validation
Tips and Tricks (1 of 2)
Overall
- Proceed from the big picture to the details. For example, proceed from
inspecting species groups to individual species.
- Inspect every species, even to confirm that a species normally absent
met that expectation.
- Know the site topography, prevalent meteorology, and major emissions
sources nearby.
Inspect time series for the following:
- Large "jumps" or "dips" in concentrations which may indicate a change
in analysis method or MDL.
- Periodicity of peaks. (Is there a pattern? Can the pattern be related to
emissions or meteorology?)
- Expected seasonal behavior (e.g., photochemically formed species
concentrations usually peak during summer).
- Expected relationships among species (e.g., benzene and toluene
typically correlate).
-------
Data Validation
Tips and Tricks (2 of 2)
To further investigate outliers,
- Use wind direction data (e.g., Do outliers occur from a consistent wind
direction?).
- Use subsets of data (e.g., inspect high concentration days vs. other
days for differences in meteorology or emissions).
- Investigate industrial or agricultural operating schedules, unusual
events, etc. (e.g., Were high metals data associated with a dust
event?).
- Determine local traffic patterns (e.g., When does peak traffic occur? Is
there a recreational area or event venue nearby?).
- If no explanation is forthcoming, try contacting the agency that
collected the data; they may have realized a problem too recently to
report it, or your question may alert them to a problem with data
collection, analysis, or reporting.
-------
Data Validation
Using Summary Statistics
Investigation of summary statistics is a great way to begin to understand your data.
Comparison of your data ranges to "typical" ranges provides a reality check and can
illuminate errors in your data.
The table below shows national summary statistics based on 2003 to 2005 annual averages
for selected species; a complete table can be found in the appendix to this section.
These data can be used as benchmarks for site-specific comparison; for example, if your
data are significantly higher than the national 95th percentile, there may be errors in the
data.
- Note that calculation of summary statistics smooths extreme events, so comparison of daily
data to these numbers, for example, may not be adequate; individual high concentration days
may legitimately be higher than the summary statistics.
- We suggest a comparison between similar summary statistics rather than a comparison of
summary statistics to raw data.
All concentrations are in µg/m3; "-" indicates no value listed.

Pollutant     AQS    Average %  # of        5th      25th     Median   75th     95th     1-in-a-million  Remote
              Code   Below      Monitoring  Pctl.    Pctl.             Pctl.    Pctl.    Cancer Risk     Background
                     Detection  Sites                                                    Level
Toluene       45202  1          295         6.9E-01  1.5E+00  2.4E+00  3.8E+00  7.4E+00  -               -
N-Hexane      43231  2          168         2.4E-01  5.1E-01  8.4E-01  1.5E+00  2.7E+00  -               -
Benzene       45201  2          307         4.9E-01  7.4E-01  1.0E+00  1.5E+00  3.1E+00  1.3E-01         1.4E-01
Acetaldehyde  43503  4          163         7.8E-01  1.3E+00  1.6E+00  2.3E+00  4.2E+00  4.5E-01         1.6E-01
M_P Xylene    45109  5          266         2.8E-01  6.7E-01  1.1E+00  1.7E+00  3.4E+00  -               -
49
-------
Data Validation
Buddy Check Example
Buddy site checks are important at a site
level.
The plot shows a time series of arsenic
PM2.5 measurements at neighboring sites
near a major emissions source.
Plotting the time series together
reveals 4 high concentration
measurements that are not in agreement
at both sites (red circles),
as well as 3 high concentration events
that were recorded at both sites (black
circles).
The measurement agreement (black
circles) between sites offers increased
confidence that arsenic concentrations
were truly higher on these days (i.e., these
concentration values are not measurement
or reporting errors).
Points marked with red circles, on the
other hand, should be flagged as suspect
for further investigation.
- Check that high concentration events do not
correlate with unusual events. In this case,
the analyst might check whether these events
coincide with typical firework days such as the
Fourth of July and New Year's Eve; in this
example these measurements do not.
- The next step is to check correlation of wind
direction and local emissions sources as an
explanation for these measurements.
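The buddy check logic in this example can be sketched as a comparison of paired dates at two sites. The function name, the data, and the single "high concentration" threshold are hypothetical; in practice the threshold would be chosen from the sites' own distributions.

```python
def buddy_check(site_a, site_b, high_threshold):
    """Classify high-concentration days measured at two neighboring sites.

    site_a, site_b: dicts mapping date string -> concentration.
    Returns (confirmed, suspect): dates high at both sites, and dates high
    at only one site (to be flagged for further investigation).
    """
    confirmed, suspect = [], []
    for day in sorted(set(site_a) & set(site_b)):   # dates sampled at both
        a_high = site_a[day] >= high_threshold
        b_high = site_b[day] >= high_threshold
        if a_high and b_high:
            confirmed.append(day)
        elif a_high or b_high:
            suspect.append(day)
    return confirmed, suspect


# Hypothetical 24-hr arsenic PM2.5 values (ug/m3) at two nearby sites.
site_a = {"2004-01-06": 0.004, "2004-01-12": 0.031,
          "2004-01-18": 0.028, "2004-01-24": 0.003}
site_b = {"2004-01-06": 0.005, "2004-01-12": 0.029,
          "2004-01-18": 0.002, "2004-01-24": 0.004}

confirmed, suspect = buddy_check(site_a, site_b, high_threshold=0.02)
```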
[Figure: Arsenic PM2.5 Time Series; x-axis: Time (Jan-04 through Sep-05); y-axis: concentration.]
Sample time series of 24-hr arsenic PM2.5 measurements
at two sites about five miles apart. Both sites show above
average arsenic concentrations and are located near a
major emissions source. The figure was created in
Microsoft Excel.
50
-------
Screening Data Using Remote
Background Concentrations
Knowledge of remote background concentrations of air toxics can be used as lower
limits for data screening. A cutoff value of 20% lower than the background
concentration is used as a margin of error.
Data below this value may be identified as suspect.
If data are identified as below the background concentration, the first things to
check are
- Units (e.g., Were units reported and/or converted correctly?)
- Sticking from substituted values such as MDL/2, MDL/10, or 0.
This screen was applied to the national data set. It was decided that data failing
this check would not be used in subsequent analyses.
Pollutant                  Remote Background        Cutoff Value (µg/m3)
                           Concentration (µg/m3)
Acetaldehyde               0.16                     0.13
Benzene                    0.14                     0.11
Carbon Tetrachloride       0.62                     0.50
Chloroform                 0.046                    0.037
Formaldehyde               0.18                     0.14
Methylene Chloride         0.087                    0.070
Tetrachloroethylene        0.022                    0.018
Trichlorofluoromethane     1.4                      1.1
Dichlorodifluoromethane    2.7                      2.2
Trichlorotrifluoroethane   0.61                     0.49
1,1,1-trichloroethane      0.18                     0.14
Methyl Chloride            1.2                      0.96
McCarthy et al., 2006
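The cutoff values in the table correspond to 80% of the remote background concentration (i.e., 20% lower), rounded to two significant figures. A minimal sketch of the screen follows; the function names and the sample measurements are hypothetical, and only three pollutants from the table are included.

```python
# Remote background concentrations (ug/m3) for three pollutants in the table.
BACKGROUND = {
    "Benzene": 0.14,
    "Carbon Tetrachloride": 0.62,
    "Methyl Chloride": 1.2,
}


def cutoff(background: float, margin: float = 0.20) -> float:
    """Screening cutoff: background concentration less a 20% margin of error."""
    return background * (1.0 - margin)


def flag_below_background(values, pollutant):
    """Values below the cutoff, to be identified as suspect."""
    limit = cutoff(BACKGROUND[pollutant])
    return [v for v in values if v < limit]


# Hypothetical carbon tetrachloride measurements (ug/m3) at an urban site.
measurements = [0.65, 0.61, 0.30, 0.58, 0.05]
suspect = flag_below_background(measurements, "Carbon Tetrachloride")
```

Values flagged this way would first be checked for unit errors and for substituted values such as MDL/2, MDL/10, or 0.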
51
-------
Screening Data Using Remote
Background Concentrations
Example
This plot shows a time series
plot of concentrations of long-
lived species measured at an
urban Southwestern site
compared to background
concentrations measured at
remote sites in the Northern
Hemisphere.
Significant spikes and dips in
concentrations are circled.
Most of the time, concentrations
at this monitor were equal to or
greater than background
concentrations, which might be
expected for urban locations.
Concentrations more than 20%
below the background level
were identified as suspect for
further review.
[Figure: time series of concentrations (ppb) by Date, with background lines: CH3Cl background = 0.6 ppb, CCl2F2 background = 0.55 ppb, CCl4 background = 0.09 ppb.]
Concentrations (ppb) of carbon tetrachloride (CCl4), dichlorodifluoromethane
(CCl2F2), and methyl chloride (CH3Cl) from 2003 and 2004. Northern
Hemisphere background concentrations of each species were plotted as a
line. Concentration dips well below background concentrations are circled.
52
-------
Data Validation Examples
Scatter Plots
Scatter plot matrices can be used to rapidly and
qualitatively examine possible correlations among
measured species at a site.
To interpret a scatter plot matrix, locate the row
variable (e.g., methyl ethyl ketone [MEK] in the
figure near the top left) and the column variable
(e.g., methyl tert-butyl ether [MTBE]) on the
bottom. The intersection is the scatter plot of the
row variable on the vertical axis against the
column variable on the horizontal axis. Each
column and row is scaled so that data points fill
each frame; scale information is omitted for
clarity. The diagonal plots contain histograms of
the data for each row variable.
It is clear that some species correlate well. For
example, toluene has a reasonable correlation
with ethylbenzene and m- and p-xylene. In
contrast, MEK does not correlate with any of the
other species; this may indicate that MEK is
emitted from different sources. Finally, MTBE
shows a bifurcated relationship with toluene,
ethylbenzene, and m- and p-xylene. This
interesting relationship might be investigated in
later validation steps and analysis.
[Figure: scatter plot matrix with rows and columns MEK, TOL, EBENZ, MPXYL, and MTBE.]
Scatter plot matrix of selected species from an urban site.
The species plotted (from top to bottom and left to right) are
methyl ethyl ketone (MEK), toluene (TOL), ethylbenzene
(EBENZ), m- and p-xylene (MPXYL), and methyl tert-butyl
ether (MTBE). The plot was created with SYSTAT11.
53
-------
Data Validation Examples
Time Series
The concentrations of selected VOCs
(acetylene, toluene, benzene, and
1,3-butadiene) are plotted as a function of
time. Note that (1) no valid data were
available on some dates in 2001 and in the
middle of 2002, (2) all species exhibited
seasonal variations in concentration with
higher concentrations observed in the cool
season, (3) concentrations of these species
varied by an order of magnitude, and (4) for
most days, these species concentrations
correlated well (e.g., R² = 0.91).
This example illustrates how time series plots
may be used to check for expected temporal
variability (based on emission sources,
meteorology, and species reactivity), such as
interannual or seasonal variability. The
selected VOCs are present in gasoline
exhaust and are expected to have lower
concentrations during the summer due to
higher mixing heights (i.e., dilution) and
faster removal rates by photochemical
reactions. A species that does not follow its
expected temporal variability may indicate
misidentification or some other problem.
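A simple numeric companion to the time series check is to compare cool-season and warm-season means for a species. The sketch below assumes the cool season runs October through March (an assumed definition, not one given in the workbook) and uses hypothetical benzene values.

```python
from datetime import date
from statistics import mean

COOL_MONTHS = {10, 11, 12, 1, 2, 3}  # assumed definition of the cool season

def seasonal_means(records):
    """records: iterable of (date, concentration). Returns (cool_mean, warm_mean)."""
    cool = [c for d, c in records if d.month in COOL_MONTHS]
    warm = [c for d, c in records if d.month not in COOL_MONTHS]
    return mean(cool), mean(warm)

# Hypothetical 24-hr benzene concentrations (ppb); illustration only.
benzene = [
    (date(2001, 7, 6), 0.4), (date(2001, 8, 5), 0.5),
    (date(2001, 11, 3), 1.6), (date(2001, 12, 9), 2.1),
    (date(2002, 1, 8), 1.9), (date(2002, 6, 7), 0.6),
]

cool_mean, warm_mean = seasonal_means(benzene)
# For gasoline-related VOCs the expectation is higher cool-season values.
follows_expected_pattern = cool_mean > warm_mean
print(f"cool = {cool_mean:.2f} ppb, warm = {warm_mean:.2f} ppb, "
      f"expected pattern: {follows_expected_pattern}")
```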
Twenty-four-hour average concentrations (ppb) of acetylene,
1,3-butadiene, benzene, and toluene collected at an urban site
every sixth day from July 2001 through July 2002. The figure was
created with Microsoft Excel.
-------
Data Validation Examples
Box Plot
To interpret these box plots,
see Slide 22 of this chapter.
This plot shows the
concentration of benzene at a
site from 1990-2005. It is
immediately clear by the large
concentration change from
1990-1993 that something
affected the data and should
be investigated.
- Were there significant method
or MDL changes during this
time?
- Is this change due to
emissions regulations or is
there another explanation?
Notched box whisker plot of 24-hr average concentration of
benzene by year at an urban monitoring site in the United
States. Concentrations show a substantial change from
1990 to 1993. The plot was created with SYSTAT11.
-------
Data Validation Examples
Fingerprint Plot
A fingerprint plot is a depiction of all the species
concentrations present in a sample, preferably
presented in a meaningful order (e.g., by elution
order in the analytical technique, by carbon
number, etc.).
Fingerprint plots are used to examine
irregularities in whole sample concentrations
and unusual distributions of species. The
analyst may inspect all samples, with special
focus on those that were identified as suspect
or invalid in time series or scatter plot analyses.
The fingerprint plot here shows the
concentrations from an urban site on March 10,
2004, when the concentrations of the two
trimethylbenzene isomers were very high, and
other aromatic species like toluene, xylenes,
and ethylbenzene were also elevated relative to
other samples.
A "typical" fingerprint plot from October 6, 2003,
is shown in the inset for qualitative comparison.
"Typical" means the relationships among
pollutants were similar across most samples, i.e.,
representative of an average. The March 10,
2004, sample may be valid but was identified as
suspect and requires further investigation.
[Figure: fingerprint plot of concentration (ppb) by species. Labeled species include
1,2,4-trimethylbenzene, 1,3,5-trimethylbenzene, m- and p-xylene, ethylbenzene,
toluene, benzene, MEK, acetylene, and propylene. Inset: typical fingerprint.]
Example fingerprint plot of 24-hr concentrations (ppb)
from March 10, 2004. The inset figure shows a more
typical fingerprint at the same site on October 6, 2003.
Fingerprint plots were created with VOCDat software.
-------
Data Validation Examples
Using Metadata - Urban vs. Rural Sites
Knowledge of metadata allows the analyst to
understand reasons for patterns observed in the
data.
This figure illustrates that the concentrations at
each site do not need to be the same but do
need to be consistent with our expectations of
concentrations at urban and rural sites.
Sites 1 and 2 show the highest concentrations
because these sites are relatively close to an
Interstate highway and are located in urban
areas.
In contrast, monitoring site 3 shows relatively
low m-&p-xylenes concentrations, as expected
for a site outside the urban area.
Note: Concentrations at rural sites may be
higher if a known emissions source is nearby or
if in situ production occurs. Metadata provide a
basis for thinking about the data and making
hypotheses, but expectations should never be
substituted for real data validation. Try to prove
your hypotheses wrong in order to be sure that
they are correct!
Notched box whisker plot of 24-hr m-&p-xylenes
concentrations at three monitoring stations in 2005.
Red indicates urban sites and blue represents a rural
site. Figure was created with SYSTAT.
-------
Data Validation Examples
Investigating Suspect Data
Initial Analysis: Typically, toluene
concentrations are higher than benzene
concentrations. An unexpected relationship,
such as the one in these data from an urban site,
indicates that further investigation of the data is
needed.
[Figure: scatter plot of benzene versus toluene concentrations (ppbC) at an urban site.]
Advanced Analysis: Wind direction data were
used to identify possible reasons for the high
benzene concentrations in this plot of 1-hr
benzene concentrations vs. wind direction. The
highest benzene concentrations are typically
coming from north of the site. Site and emission
inventory inspection showed a source of coke
oven emissions, which include benzene but not
toluene, to the north providing a reasonable
explanation for these data (and helping prove
their validity).
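The wind direction analysis described above amounts to grouping concentrations by the direction the wind is coming from. The sketch below is one minimal way to do that, assuming paired 1-hr wind direction (degrees, 0 = north, clockwise) and benzene values. The eight-sector binning, the helper names, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def sector_of(wind_dir_deg):
    """Map a wind direction in degrees to one of eight 45-degree sectors."""
    return SECTORS[int(((wind_dir_deg % 360) + 22.5) // 45) % 8]

def mean_by_sector(records):
    """records: iterable of (wind_dir_deg, concentration). Returns {sector: mean}."""
    groups = defaultdict(list)
    for wind_dir, conc in records:
        groups[sector_of(wind_dir)].append(conc)
    return {sector: mean(values) for sector, values in groups.items()}

# Hypothetical 1-hr wind direction (degrees) and benzene values; illustration only.
hourly = [(350, 6.0), (10, 8.0), (180, 1.0), (200, 1.4), (90, 0.9)]

sector_means = mean_by_sector(hourly)
highest_sector = max(sector_means, key=sector_means.get)
print(sector_means, "highest:", highest_sector)
```

A sector with consistently elevated concentrations points toward where to look in site maps and the emission inventory. It does not identify a source by itself.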
[Figure: 1-hr benzene concentrations versus wind direction.]
-------
Data Validation
Handling Suspect Data
During the process of data validation, the analyst may
identify data as suspect but not be able to prove that
the data are invalid.
Analysts may decide to exclude these suspect data
from central tendency computations (e.g., annual
average) or other analyses.
These data may warrant additional investigation using
case studies (i.e., inspection of individual dates).
-------
Summary
Data Preparation Check List
Acquire data
• Check for availability of supplementary data
  - Meteorological measurements
  - Additional species
  - Metadata
• Use supplementary data
  - Thoroughly review all metadata describing what/why/how
    measurements were made.
  - Find out about site characteristics, including
      - Meteorology
      - Local emissions sources
      - Geography
Know your data
• A general knowledge of air toxics behaviors is
  invaluable. Know and understand typical
  relationships and patterns that have been observed
  in air toxics data.
Process your data
• Investigate collocated data: do they agree?
• Create valid data aggregates
  - Check for data completeness
  - Prepare and inspect valid aggregates and calculate the
    percentage of data below MDL
• Identify censored data and make MDL substitutions if
  necessary
  - Use knowledge of data reporting methods to identify
    the substitution used for data below detection, if any.
  - If reporting of data below detection is unknown, separate
    data below detection and check for repetitive values or
    linear relationships with detection limits.
  - If data are uncensored, use "as is".
  - If data are censored, make MDL/2 substitutions or use a more
    sophisticated method as needed.
  - If the data contain a mixture of censored and uncensored
    data,
      - Test two substitution methods for a sample analysis:
        (1) MDL/2 substitution for all data and (2) MDL/2
        substitution for censored data, leaving uncensored data
        "as is".
      - If the direction and magnitude of trend results agree, keep
        substitution method 2.
Validate your data
• Get an overview: prepare and inspect summary
  statistics
• Apply visual and graphical methods to illuminate
  data issues and outliers
  - Buddy site check
  - Remote background comparison
  - Scatter plots
  - Time series
  - Fingerprint plots
• Flag suspect data
• Investigate suspect data using
  - Local sources/wind direction
  - Subsets of data
  - Unusual events
• Exclude invalid data
  - If you cannot prove the data are invalid, flag them as suspect.
    These data may be removed from some analyses as
    outliers even if they cannot be invalidated. Advanced
    analyses may provide more insight into the data.
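The two-method substitution test in the checklist can be sketched as follows. This is a minimal illustration, assuming each record is a (value, MDL) pair in which `None` marks a censored value. The record layout, the function name `substitute`, and the numbers are hypothetical.

```python
from statistics import mean

def substitute(records, replace_uncensored_below_mdl):
    """Apply MDL/2 substitution.

    records: list of (value_or_None, mdl); None marks a censored value.
    replace_uncensored_below_mdl: True for method 1 (MDL/2 for all data below
    MDL), False for method 2 (MDL/2 for censored data only).
    """
    out = []
    for value, mdl in records:
        if value is None:
            out.append(mdl / 2)      # censored: always substituted
        elif replace_uncensored_below_mdl and value < mdl:
            out.append(mdl / 2)      # method 1 also replaces reported values below MDL
        else:
            out.append(value)        # reported value kept "as is"
    return out

# Hypothetical mixture of censored and uncensored data; illustration only.
records = [(0.30, 0.10), (0.04, 0.10), (None, 0.10),
           (0.22, 0.10), (None, 0.10), (0.08, 0.10)]

mean_method_1 = mean(substitute(records, replace_uncensored_below_mdl=True))
mean_method_2 = mean(substitute(records, replace_uncensored_below_mdl=False))
print(f"method 1 mean = {mean_method_1:.4f}, method 2 mean = {mean_method_2:.4f}")
```

In practice the comparison would be made on the trend results (direction and magnitude) rather than on a single mean. The same two substituted series can be passed to whatever trend calculation is in use.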
-------
Appendix:
National Summary Statistics (2003-2005)
The appendix contains a table of national summary statistics
based upon annual averages from 2003 to 2005.
These data are useful for comparison of data ranges to "typical"
national ranges.
These data can be used as benchmarks for site-specific
comparison; for example, if data are significantly higher than the
national 95th percentile, there may be errors in the data.
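The benchmark comparison can be sketched as a simple lookup. The national 95th percentile values below are taken from the appendix table for three pollutants. The site annual averages, the function name, and the optional `factor` parameter are hypothetical.

```python
# National 95th percentile annual averages (ug/m3) from the appendix table.
NATIONAL_P95_UG_M3 = {"Benzene": 3.1, "Toluene": 7.4, "Formaldehyde": 6.7}

def flag_above_national_p95(site_annual_means, factor=1.0):
    """Return pollutants whose site annual mean exceeds factor x the national 95th percentile."""
    return sorted(
        pollutant
        for pollutant, conc in site_annual_means.items()
        if pollutant in NATIONAL_P95_UG_M3
        and conc > factor * NATIONAL_P95_UG_M3[pollutant]
    )

# Hypothetical site annual averages (ug/m3); illustration only.
site_means = {"Benzene": 9.8, "Toluene": 5.0, "Formaldehyde": 3.2}

flagged = flag_above_national_p95(site_means)
print("Review before use:", flagged)
```

A flag from this comparison is a prompt for further validation, not proof of an error. A site near a known source can legitimately exceed the national range.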
-------
Appendix - National Summary Statistics (2003-2005)
(1 of 3)
Percentile and median columns are concentrations in µg/m3.
Pollutant | AQS Code | % Below Detection | # of Monitoring Sites | 5th Percentile | 25th Percentile | Median | 75th Percentile | 95th Percentile
1,1,2,2-Tetrachloroethane | 43818 | 97 | 228 | 6.9E-02 | 1.6E-01 | 1.7E-01 | 3.1E-01 | 1.1E+00
1,1,2-Trichloroethane | 43820 | 98 | 211 | 5.5E-02 | 1.3E-01 | 1.4E-01 | 1.9E-01 | 9.0E-01
1,1-Dichloroethane | 43813 | 97 | 224 | 1.0E-02 | 6.1E-02 | 1.0E-01 | 1.0E-01 | 6.8E-01
1,1-Dichloroethylene | 43826 | 98 | 225 | 2.0E-02 | 9.5E-02 | 9.9E-02 | 1.1E-01 | 6.5E-01
1,2,4-Trichlorobenzene | 45810 | 90 | 164 | 1.2E-02 | 6.2E-02 | 1.5E-01 | 6.4E-01 | 1.2E+00
1,2-Dichloropropane | 43829 | 96 | 229 | 1.5E-02 | 7.7E-02 | 7.9E-02 | 1.5E-01 | 7.6E-01
1,3-Butadiene | 43218 | 26 | 278 | 3.5E-02 | 9.5E-02 | 1.6E-01 | 2.4E-01 | 8.4E-01
1,4-Dichlorobenzene | 45807 | 64 | 202 | 1.9E-02 | 1.1E-01 | 2.4E-01 | 5.2E-01 | 9.9E-01
1,4-Dioxane | 46201 | 94 | 14 | 4.5E-02 | 4.9E-02 | 6.9E-02 | 9.2E-02 | 1.2E-01
2,2,4-Trimethylpentane | 43250 | 13 | 125 | 1.1E-01 | 2.9E-01 | 4.8E-01 | 7.8E-01 | 2.4E+00
3-Chloropropene | 43335 | 100 | 13 | 1.1E-01 | 1.2E-01 | 1.6E-01 | 1.6E-01 | 1.9E-01
Acenaphthene | 17147 | 44 | 33 | 5.6E-04 | 5.7E-03 | 1.4E-02 | 3.9E-02 | 7.2E-02
Acenaphthylene | 17148 | 68 | 33 | 2.4E-04 | 6.8E-04 | 3.4E-03 | 3.9E-02 | 4.4E-02
Acetaldehyde | 43503 | 4 | 163 | 7.8E-01 | 1.3E+00 | 1.6E+00 | 2.3E+00 | 4.2E+00
Acetonitrile | 43702 | 58 | 63 | 3.6E-01 | 6.3E-01 | 1.1E+00 | 4.4E+00 | 3.2E+01
Acrolein | 43505 | 43 | 53 | 1.2E-01 | 2.1E-01 | 4.4E-01 | 1.2E+00 | 1.5E+00
Acrylonitrile | 43704 | 70 | 124 | 4.1E-02 | 8.2E-02 | 1.4E-01 | 3.1E-01 | 1.5E+00
Anthracene | 17151 | 73 | 31 | 1.9E-04 | 7.0E-04 | 6.1E-03 | 7.9E-03 | 8.9E-03
Antimony (Pm10) Stp | 82102 | 68 | 15 | 7.3E-04 | 1.2E-03 | 8.5E-03 | 8.5E-03 | 6.0E-02
Antimony (Tsp) | 12102 | 84 | 45 | 3.3E-04 | 1.0E-03 | 7.0E-03 | 1.0E-02 | 1.1E-02
Antimony Pm2.5 Lc | 88102 | 92 | 275 | 4.8E-03 | 6.7E-03 | 1.3E-02 | 1.4E-02 | 1.5E-02
Arsenic (Pm10) Stp | 82103 | 46 | 38 | 4.1E-04 | 8.6E-04 | 1.9E-03 | 1.0E-02 | 1.1E-02
Arsenic (Tsp) | 12103 | 75 | 82 | 9.9E-04 | 1.5E-03 | 5.0E-03 | 5.5E-03 | 1.0E-02
Arsenic Pm2.5 Lc | 88103 | 60 | 434 | 9.4E-05 | 2.7E-04 | 1.2E-03 | 1.7E-03 | 2.5E-03
Benzene | 45201 | 2 | 307 | 4.9E-01 | 7.4E-01 | 1.0E+00 | 1.5E+00 | 3.1E+00
Benzo(A)Pyrene (Pm10) Stp | 82242 | 67 | 18 | 3.5E-05 | 6.2E-05 | 8.5E-05 | 1.5E-04 | 4.4E-04
Benzo(B)Fluoranthene (Pm10) Stp | 82220 | 50 | 18 | 5.5E-05 | 8.1E-05 | 1.0E-04 | 1.9E-04 | 4.5E-04
Benzo(G,H,I)Perylene (Pm10) Stp | 82237 | 27 | 18 | 1.2E-04 | 1.8E-04 | 2.7E-04 | 3.4E-04 | 6.4E-04
Benzo(K)Fluoranthene (Pm10) Stp | 82223 | 74 | 18 | 2.9E-05 | 3.6E-05 | 4.7E-05 | 8.4E-05 | 2.1E-04
Benzo[A]Anthracene | 17215 | 90 | 30 | 7.8E-05 | 8.0E-05 | 1.6E-04 | 4.4E-04 | 1.8E-03
Benzo[A]Pyrene | 17242 | 94 | 30 | 1.6E-04 | 2.3E-04 | 3.2E-04 | 5.0E-04 | 3.6E-03
Benzo[B]Fluoranthene | 17220 | 90 | 30 | 7.6E-05 | 7.9E-05 | 1.9E-04 | 6.2E-04 | 3.6E-03
-------
Appendix - National Summary Statistics (2003-2005)
(2 of 3)
Percentile and median columns are concentrations in µg/m3.
Pollutant | AQS Code | % Below Detection | # of Monitoring Sites | 5th Percentile | 25th Percentile | Median | 75th Percentile | 95th Percentile
Benzyl Chloride | 45809 | 95 | 110 | 7.4E-03 | 4.0E-02 | 1.8E-01 | 3.7E-01 | 8.4E-01
Beryllium (Pm10) Stp | 82105 | 82 | 27 | 2.3E-06 | 4.1E-06 | 4.6E-05 | 3.0E-04 | 4.6E-04
Beryllium (Tsp) | 12105 | 87 | 62 | 8.8E-06 | 2.6E-05 | 3.0E-05 | 1.6E-04 | 2.7E-04
Bromoform | 43806 | 100 | 94 | 5.2E-02 | 2.7E-01 | 5.0E-01 | 5.2E-01 | 7.2E-01
Bromomethane | 43819 | 92 | 228 | 4.4E-02 | 1.0E-01 | 1.9E-01 | 2.1E-01 | 6.4E-01
Cadmium (Pm10) Stp | 82110 | 50 | 37 | 1.2E-04 | 2.4E-04 | 5.0E-04 | 9.0E-04 | 1.2E-03
Cadmium (Tsp) | 12110 | 73 | 105 | 1.4E-04 | 3.8E-04 | 8.0E-04 | 1.5E-03 | 2.7E-03
Cadmium Pm2.5 Lc | 88110 | 93 | 263 | 2.5E-03 | 2.9E-03 | 6.4E-03 | 6.6E-03 | 6.9E-03
Carbon Disulfide | 42153 | 73 | 75 | 1.1E-01 | 1.6E-01 | 2.6E-01 | 1.3E+00 | 3.2E+00
Carbon Tetrachloride | 43804 | 42 | 280 | 3.3E-01 | 4.8E-01 | 5.5E-01 | 6.3E-01 | 1.1E+00
Chlorine Pm2.5 Lc | 88115 | 67 | 427 | 3.4E-04 | 2.8E-03 | 1.2E-02 | 2.9E-02 | 1.3E-01
Chlorobenzene | 45801 | 83 | 226 | 1.2E-02 | 4.4E-02 | 5.5E-02 | 1.5E-01 | 7.6E-01
Chloroethane | 43812 | 93 | 159 | 1.3E-02 | 3.9E-02 | 1.0E-01 | 1.4E-01 | 4.4E-01
Chloroform | 43803 | 74 | 273 | 6.7E-02 | 1.2E-01 | 2.4E-01 | 2.5E-01 | 8.2E-01
Chloromethane | 43801 | 6 | 245 | 7.9E-01 | 1.0E+00 | 1.2E+00 | 1.3E+00 | 1.6E+00
Chloroprene | 43835 | 99 | 114 | 4.5E-02 | 4.5E-02 | 4.5E-02 | 8.6E-02 | 5.0E-01
Chromium (Pm10) Stp | 82112 | 36 | 33 | 4.9E-04 | 1.0E-03 | 2.1E-03 | 2.8E-03 | 6.2E-03
Chromium (Tsp) | 12112 | 67 | 106 | 1.3E-03 | 1.8E-03 | 2.4E-03 | 4.8E-03 | 1.6E-02
Chromium Pm2.5 Lc | 88112 | 65 | 428 | 3.1E-05 | 7.0E-05 | 1.1E-03 | 2.0E-03 | 3.2E-03
Chromium VI (Tsp) | 12115 | 55 | 21 | 1.3E-05 | 1.8E-05 | 2.6E-05 | 3.8E-05 | 7.5E-04
Chrysene | 17208 | 87 | 30 | 1.8E-04 | 3.1E-04 | 1.8E-03 | 3.1E-03 | 3.2E-03
Cobalt (Pm10) Stp | 82113 | 55 | 23 | 8.1E-05 | 1.6E-04 | 3.0E-04 | 2.0E-03 | 4.8E-03
Cobalt (Tsp) | 12113 | 66 | 52 | 2.0E-04 | 5.2E-04 | 9.2E-04 | 2.0E-03 | 2.3E-03
Cobalt Pm2.5 Lc | 88113 | 96 | 270 | 3.2E-04 | 5.3E-04 | 8.0E-04 | 8.2E-04 | 8.8E-04
Dibenz(A-H)Anthracene (Pm10) Stp | 82151 | 91 | 18 | 2.5E-05 | 2.5E-05 | 2.9E-05 | 3.6E-05 | 8.1E-05
Dibenzo[A,H]Anthracene | 17231 | 98 | 30 | 8.3E-05 | 1.8E-04 | 7.8E-04 | 8.6E-04 | 3.6E-03
Dichloromethane | 43802 | 53 | 277 | 1.8E-01 | 2.4E-01 | 4.0E-01 | 8.7E-01 | 6.1E+00
Ethyl Acrylate | 43438 | 100 | 46 | 9.6E-02 | 1.2E-01 | 1.9E-01 | 3.3E-01 | 5.0E-01
Ethylbenzene | 45203 | 10 | 291 | 1.2E-01 | 2.5E-01 | 4.2E-01 | 6.3E-01 | 1.0E+00
Ethylene Dibromide | 43843 | 98 | 235 | 3.8E-02 | 9.9E-02 | 1.9E-01 | 2.2E-01 | 1.3E+00
Ethylene Dichloride | 43815 | 95 | 253 | 2.2E-02 | 1.0E-01 | 1.0E-01 | 2.0E-01 | 6.8E-01
Ethylene Oxide | 43601 | 38 | 16 | 1.7E-01 | 1.8E-01 | 2.1E-01 | 2.5E-01 | 4.6E-01
Fluoranthene | 17201 | 40 | 33 | 3.1E-04 | 3.2E-04 | 1.5E-03 | 3.6E-03 | 1.8E-02
Fluorene | 17149 | 42 | 33 | 2.2E-03 | 4.6E-03 | 7.8E-03 | 8.1E-03 | 3.5E-02
Formaldehyde | 43502 | 35 | 163 | 1.2E+00 | 2.0E+00 | 2.7E+00 | 3.8E+00 | 6.7E+00
Hexachlorobutadiene | 43844 | 95 | 153 | 8.0E-02 | 1.1E-01 | 1.7E-01 | 1.1E+00 | 1.8E+00
Hydrogen Sulfide | 42402 | 91 | 39 | 1.0E-03 | 1.0E-03 | 1.1E-03 | 1.5E-03 | 4.1E-03
-------
Appendix - National Summary Statistics (2003-2005)
(3 of 3)
Percentile and median columns are concentrations in µg/m3.
Pollutant | AQS Code | % Below Detection | # of Monitoring Sites | 5th Percentile | 25th Percentile | Median | 75th Percentile | 95th Percentile
Indeno[1,2,3-Cd]Pyrene (Pm10) Stp | 82243 | 51 | 18 | 5.3E-05 | 9.0E-05 | 1.2E-04 | 1.9E-04 | 4.3E-04
Indeno[1,2,3-Cd]Pyrene | 17243 | 92 | 30 | 1.5E-04 | 2.6E-04 | 7.8E-04 | 8.8E-04 | 3.6E-03
Isopropylbenzene | 45210 | 61 | 117 | 2.6E-02 | 5.0E-02 | 6.4E-02 | 1.1E-01 | 5.0E-01
Lead (Pm10) Stp | 82128 | 37 | 37 | 2.4E-03 | 3.7E-03 | 5.6E-03 | 1.3E-02 | 4.0E-02
Lead (Tsp) | 12128 | 34 | 193 | 1.9E-03 | 5.1E-03 | 1.2E-02 | 3.8E-02 | 2.9E-01
Lead Pm2.5 Lc | 88128 | 37 | 434 | 4.8E-04 | 1.2E-03 | 3.2E-03 | 4.3E-03 | 8.8E-03
M_P Xylene | 45109 | 5 | 266 | 2.8E-01 | 6.7E-01 | 1.1E+00 | 1.7E+00 | 3.4E+00
Manganese (Pm10) Stp | 82132 | 4 | 27 | 2.7E-03 | 3.8E-03 | 5.7E-03 | 1.4E-02 | 5.5E-02
Manganese (Tsp) | 12132 | 46 | 96 | 4.9E-03 | 1.2E-02 | 2.1E-02 | 2.9E-02 | 8.4E-02
Manganese Pm2.5 Lc | 88132 | 35 | 434 | 4.6E-04 | 9.3E-04 | 1.6E-03 | 2.4E-03 | 7.0E-03
Mercury (Tsp) | 12142 | 97 | 25 | 5.0E-05 | 5.0E-05 | 5.1E-05 | 4.5E-04 | 2.1E-03
Mercury Pm2.5 Lc | 88142 | 87 | 270 | 1.0E-03 | 1.5E-03 | 2.6E-03 | 2.8E-03 | 3.1E-03
Methyl Chloroform | 43814 | 72 | 263 | 9.3E-02 | 1.4E-01 | 1.4E-01 | 1.9E-01 | 9.2E-01
Methyl Isobutyl Ketone | 43560 | 87 | 134 | 3.9E-02 | 5.0E-02 | 1.7E-01 | 2.8E-01 | 9.7E-01
Methyl Methacrylate | 43441 | 98 | 45 | 1.4E-01 | 1.9E-01 | 2.0E-01 | 2.2E-01 | 6.6E-01
Methyl Tert-Butyl Ether | 43372 | 57 | 207 | 3.6E-02 | 1.3E-01 | 5.0E-01 | 1.1E+00 | 2.8E+00
Naphthalene | 17141 | 51 | 39 | 1.3E-03 | 3.8E-02 | 4.0E-02 | 1.1E-01 | 5.0E-01
N-Hexane | 43231 | 2 | 168 | 2.4E-01 | 5.1E-01 | 8.4E-01 | 1.5E+00 | 2.7E+00
Nickel (Pm10) Stp | 82136 | 38 | 36 | 3.8E-04 | 1.7E-03 | 2.6E-03 | 4.1E-03 | 5.8E-03
Nickel (Tsp) | 12136 | 70 | 101 | 1.5E-03 | 2.4E-03 | 2.9E-03 | 3.4E-03 | 5.5E-02
Nickel Pm2.5 Lc | 88136 | 57 | 428 | 5.7E-05 | 1.6E-04 | 9.6E-04 | 1.4E-03 | 3.8E-03
O-Xylene | 45204 | 9 | 282 | 1.1E-01 | 2.4E-01 | 4.6E-01 | 7.0E-01 | 1.3E+00
Phenanthrene | 17150 | 37 | 33 | 3.0E-03 | 3.1E-03 | 7.0E-03 | 1.3E-02 | 9.7E-02
Phosphorus Pm2.5 Lc | 88152 | 94 | 427 | 4.1E-04 | 7.4E-04 | 3.6E-03 | 5.3E-03 | 7.7E-03
Propionaldehyde | 43504 | 20 | 118 | 7.5E-02 | 2.1E-01 | 2.7E-01 | 4.2E-01 | 6.5E-01
P-Xylene | 45206 | 13 | 17 | 6.8E-01 | 1.2E+00 | 2.2E+00 | 2.9E+00 | 4.0E+00
Scandium Pm2.5 Lc | 88163 | 99 | 263 | 1.5E-03 | 2.2E-03 | 3.6E-03 | 3.8E-03 | 4.7E-03
Selenium (Pm10) Stp | 82154 | 52 | 22 | 8.1E-05 | 4.0E-04 | 9.0E-04 | 8.5E-03 | 9.3E-03
Selenium (Tsp) | 12154 | 82 | 43 | 6.8E-04 | 1.2E-03 | 1.6E-03 | 6.4E-03 | 6.7E-03
Selenium Pm2.5 Lc | 88154 | 55 | 434 | 8.3E-05 | 4.1E-04 | 1.1E-03 | 1.6E-03 | 2.4E-03
Styrene | 45220 | 51 | 272 | 3.8E-02 | 7.8E-02 | 1.6E-01 | 3.7E-01 | 8.8E-01
Tetrachloroethylene | 43817 | 69 | 273 | 1.1E-01 | 1.8E-01 | 2.3E-01 | 4.1E-01 | 1.4E+00
Toluene | 45202 | 1 | 295 | 6.9E-01 | 1.5E+00 | 2.4E+00 | 3.8E+00 | 7.4E+00
Trichloroethylene | 43824 | 87 | 268 | 6.1E-02 | 1.3E-01 | 1.5E-01 | 2.3E-01 | 8.9E-01
Vinyl Acetate | 43447 | 18 | 24 | 1.8E-01 | 7.2E-01 | 9.8E-01 | 1.3E+00 | 2.2E+00
Vinyl Chloride | 43860 | 96 | 254 | 2.6E-02 | 6.0E-02 | 6.5E-02 | 1.3E-01 | 4.2E-01
-------
Resources
Data Acquisition
Primary data source, EPA's AQS: National repository
of ambient monitoring data.
http://www.epa.gov/ttnmain1/airs/airsaqs/
AQS Discover Web: data retrieval system.
http://www.epa.gov/ttn/airs/airsaqs/aqsdiscover/
Other data sources
- IMPROVE: A source of speciated PM2.5 data.
http://vista.cira.colostate.edu/views/
- SEARCH: A source of speciated PM2.5 data.
http://www.atmospheric-research.com/public/index.html
- National Weather Service: Has a variety of historical
meteorological data for selected locations.
http://www.nws.noaa.gov/
-------
Resources
Quality Assurance
Ambient Monitoring Technology Information Center: A variety of
background information on monitoring methods and QA for
multiple monitoring networks. http://www.epa.gov/ttn/amtic/
Toxics specifically: http://www.epa.gov/ttn/amtic/airtoxpg.html
EPA quality assurance: Office of Air Quality Planning and
Standards. http://www.epa.gov/oar/oaqps/qa/index.html#back
PAMS data analysis workbook (circa 2000): analysis and
validation of PAMS data.
http://www.epa.gov/oar/oaqps/pams/analysis/
EPA supersite overview: background and QA documentation.
http://www.epa.gov/ttn/amtic/supersites.html
EPA PM2.5 network quality assurance.
http://www.epa.gov/ttn/amtic/specqual.html
-------
Resources
Metadata
• Google Earth: High-resolution satellite data useful for investigating site
locations and local emissions sources. http://earth-software.com/freebie/
• Federal Highway Administration: Information on number of miles traveled
on roadways, total amount of gasoline sold, etc.; useful for correlating
long-term mobile source trends. http://www.fhwa.dot.gov/index.html
  - Vehicle miles traveled, fuel composition, fleet characteristics:
    http://www.fhwa.dot.gov/policy/ohpi/
• National Emissions Inventory 2002: Emissions inventory for the United
States; some Canada and Mexico data also available.
http://www.epa.gov/ttn/chief/net/2002inventory.html
• EPA's AirData Facility Emissions Report and regulations for Criteria Air
Pollutants and HAPs: Site-level emissions data.
http://www.epa.gov/air/data/geosel.html
• MapQuest (useful for mapping site locations). http://www.mapquest.com/
• U.S. Census Bureau: A variety of information; some of the most useful are
population and population density. http://www.census.gov/
  - Query tool: factfinder.census.gov/
-------
Resources
Advanced methods for estimating data structure below detection
• Helsel D.R. (2005) Nondetects and data analysis: statistics for censored
environmental data. John Wiley & Sons, Inc., Hoboken, NJ.
• Helsel D.R. (2005) More than obvious: better methods for interpreting
nondetect data. Environ. Sci. Technol., 419A-423A, American Chemical
Society.
• Antweiler R.C. and Taylor H.E. (2008) Evaluation of statistical treatments of
left-censored environmental data using coincident uncensored data sets:
I. Summary statistics. Environ. Sci. Technol., 42, 10, 3732-3738.
• U.S. EPA (2004) Local Limits Development Guidance Appendices.
EPA 833-R-04-002B, Office of Wastewater Management, Washington, DC.
• Kaplan-Meier Method
Kaplan, E. L. and Meier, P. (1958) Nonparametric estimation from incomplete
observations. J. Amer. Stat. Assn, 53, 282 (June), 457-481, doi:10.2307/2281868.
• Robust Regression on Order Statistics
Lee, L. and Helsel, D. (2007) Statistical analysis of water-quality data containing
multiple detection limits II: S-language software for nonparametric distribution
modeling and hypothesis testing. Comput. Geosci. 33, 5 (May), 696-704.
http://dx.doi.org/10.1016/j.cageo.2006.09.006
-------
Resources
• HAPs Information and Methods
- NATA: County-level risk assessment modeling data for all NATA years.
http://www.epa.gov/ttn/atw/natamain/
- EPA Integrated Risk Information System: Searchable database of human
health effects by pollutant. http://www.epa.gov/iris/index.html
- Agency for Toxic Substances & Disease Registry: General toxics
information and FAQs. http://www.atsdr.cdc.gov/toxfaq.html
- EPA air toxics website (ATW): General information on a variety of HAPs
topics. http://www.epa.gov/ttn/atw/
- Lake Michigan Air Directors Consortium: Summary of Phases I-III of
national analyses. http://www.ladco.org/toxics.html
- EPA's FERA (Fate, Exposure and Risk Analysis).
http://www.epa.gov/ttn/fera/
• Hydrocarbons
- EPA PAMS web site, including access to the PAMS Data Analysis
Workbook. http://www.epa.gov/oar/oaqps/pams/
- PAMS validation and analysis projects (e.g.,
http://www.nescaum.org/projects/pams/index.html)
- Ambient Monitoring Technology Information Center (AMTIC): PAMS
monitoring information. http://www.epa.gov/ttn/amtic/pamsmain.html
• Particulate Matter
- EPA's PM2.5 data analysis web site. http://www.epa.gov/oar/oaqps/pm25/
-------
Resources
Data Validation
VOCDat (PAMS, air toxics).
http://vocdat.sonomatech.com/
SDVAT (PM2.5). Developed by RTI, available through
EPA OAQPS monitoring group.
-------
Resources
Data Analysis
Basic data handling, display, and analysis:
- Spreadsheets (if data sets are small enough)
- Databases
- Geographic information systems (GIS)
Statistical analyses
- Package used throughout this workbook: SYSTAT
(http://www.aspiresoftwareintl.com/html/systat.html)
- Commonly used at EPA: SAS
(http://www.sas.com/technologies/analytics/statistics/stat/)
- Open source: R (http://www.r-project.org/)
There are other sources of statistical software packages; this list
is not intended to be an endorsement.
-------
Treating Data
-------
Maximum Likelihood
Example
Let X1, X2, ..., Xm, ..., Xn represent all the n data values ranked from
largest to smallest. The first "m" values represent the data values
above the detection limit (DL), and the remaining "n-m" data points are
those below DL.
Compute the sample mean and the sample variance from only the "m"
above detection data values. The mean will be too large because the
small undetected values have been ignored, and the variance too
small.
The mean will be lowered and the variance enlarged through the use of
factors:
$h = m/n$ and $\gamma = s_d^2 / (\bar{x}_d - DL)^2$
where
$\bar{x}_d$ is the sample mean of the detected values
$s_d$ is the sample standard deviation of the detected values
m is the number of detected values
n is the total number of values
Use the table on the next page to obtain $\lambda(h, \gamma)$.
From material supplied by
Warren and Nussbaum (2009)
-------
EPA/QA/G-9S, Table A-11
Values of λ by γ (rows) and h (columns)
γ | h=.25 | h=.30 | h=.35 | h=.40 | h=.45 | h=.50 | h=.55 | h=.60 | h=.65 | h=.70 | h=.80 | h=.90
.00 | .31862 | .4021 | .4941 | .5961 | .7096 | .8388 | .9808 | 1.145 | 1.336 | 1.561 | 2.176 | 3.283
.05 | .32793 | .4130 | .5066 | .6101 | .7252 | .8540 | .9994 | 1.166 | 1.358 | 1.585 | 2.203 | 3.314
.10 | .33662 | .4233 | .5184 | .6234 | .7400 | .8703 | 1.017 | 1.185 | 1.379 | 1.608 | 2.229 | 3.345
.15 | .34480 | .4330 | .5296 | .6361 | .7542 | .8860 | 1.035 | 1.204 | 1.400 | 1.630 | 2.255 | 3.376
.20 | .35255 | .4422 | .5403 | .6483 | .7673 | .9012 | 1.051 | 1.222 | 1.419 | 1.651 | 2.280 | 3.405
.25 | .35993 | .4510 | .5506 | .6600 | .7810 | .9158 | 1.067 | 1.240 | 1.439 | 1.672 | 2.305 | 3.435
.30 | .36700 | .4595 | .5604 | .6713 | .7937 | .9300 | 1.083 | 1.257 | 1.457 | 1.693 | 2.329 | 3.464
.35 | .37379 | .4676 | .5699 | .6821 | .8060 | .9437 | 1.098 | 1.274 | 1.475 | 1.713 | 2.353 | 3.492
.40 | .38033 | .4735 | .5791 | .6927 | .8179 | .9570 | 1.113 | 1.290 | 1.494 | 1.732 | 2.376 | 3.520
.45 | .38665 | .4831 | .5880 | .7029 | .8295 | .9700 | 1.127 | 1.306 | 1.511 | 1.751 | 2.399 | 3.547
.50 | .39276 | .4904 | .5967 | .7129 | .8408 | .9826 | 1.141 | 1.321 | 1.528 | 1.770 | 2.421 | 3.575
.55 | .39679 | .4976 | .6061 | .7225 | .8517 | .9950 | 1.155 | 1.337 | 1.545 | 1.788 | 2.443 | 3.601
.60 | .40447 | .5045 | .6133 | .7320 | .8625 | 1.007 | 1.169 | 1.351 | 1.561 | 1.806 | 2.465 | 3.628
.65 | .41008 | .5114 | .6213 | .7412 | .8729 | 1.019 | 1.182 | 1.368 | 1.577 | 1.824 | 2.486 | 3.654
.70 | .41555 | .5180 | .6291 | .7502 | .8832 | 1.030 | 1.195 | 1.380 | 1.593 | 1.841 | 2.507 | 3.679
.75 | .42090 | .5245 | .6367 | .7590 | .8932 | 1.042 | 1.207 | 1.394 | 1.608 | 1.851 | 2.528 | 3.705
.80 | .42612 | .5308 | .6441 | .7676 | .9031 | 1.053 | 1.220 | 1.408 | 1.624 | 1.875 | 2.548 | 3.730
.85 | .43122 | .5370 | .6515 | .7781 | .9127 | 1.064 | 1.232 | 1.422 | 1.639 | 1.892 | 2.568 | 3.754
.90 | .43622 | .5430 | .6586 | .7844 | .9222 | 1.074 | 1.244 | 1.435 | 1.653 | 1.908 | 2.588 | 3.779
.95 | .44112 | .5490 | .6656 | .7925 | .9314 | 1.085 | 1.255 | 1.448 | 1.668 | 1.924 | 2.607 | 3.803
1.00 | .44592 | .5548 | .6724 | .8005 | .9406 | 1.095 | 1.287 | 1.461 | 1.882 | 1.940 | 2.626 | 3.827
-------
Maximum Likelihood
Example Continued
Estimate the corrected sample mean and corrected sample variance to account for
the data below the DL:
$\bar{x} = \bar{x}_d - \lambda(\bar{x}_d - DL)$
$s^2 = s_d^2 + \lambda(\bar{x}_d - DL)^2$
• Let X1, X2, ..., Xm, ..., Xn represent all the n data values ranked from largest to
smallest: 1.752, 1.563, 1.498, 1.477, 1.418, 1.358, 1.327, 1.289, 1.148, 1.060, 1.045,
<1.000, <1.000, <1.000, <1.000, <1.000, <1.000, <1.000, <1.000, <1.000
• The first "m" values represent the data values above the DL, and the remaining "n-m"
data points are those below the detection limit: n = 20, m = 11, n-m = 9
• Compute the sample mean and the sample variance from only the "m" above
detection data values: Mean = 1.358, Variance = 0.0524
• The first factor (h): 11/20 = 0.55
• The second factor (γ): 0.0524/(1.358 - 1.000)^2 = 0.409
• The third factor (λ, from Table A-11 at h = 0.55 and γ = 0.40): 1.113
• Estimate the corrected sample mean and corrected sample variance to account for
the data below the DL: Mean = 1.358 - 1.113(1.358 - 1) = 0.960 and
variance = 0.0524 + 1.113(1.358 - 1)^2 = 0.195
From material supplied by
Warren and Nussbaum (2009)
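The arithmetic of the adjustment step can be checked with a short sketch. It takes the detected-value mean and variance, the DL, and λ as inputs exactly as quoted in the example. The table lookup is not automated here, and the function names `gamma_factor` and `cohen_adjust` are hypothetical.

```python
def gamma_factor(mean_detected, var_detected, dl):
    """Second factor: variance of detected values over the squared gap to the DL."""
    return var_detected / (mean_detected - dl) ** 2

def cohen_adjust(mean_detected, var_detected, dl, lam):
    """Corrected mean and variance, given lambda read from Table A-11."""
    gap = mean_detected - dl
    corrected_mean = mean_detected - lam * gap
    corrected_var = var_detected + lam * gap ** 2
    return corrected_mean, corrected_var

# Inputs as quoted in the worked example (not recomputed from the raw data).
mean_d, var_d, dl, lam = 1.358, 0.0524, 1.000, 1.113

gamma = gamma_factor(mean_d, var_d, dl)
mle_mean, mle_var = cohen_adjust(mean_d, var_d, dl, lam)
print(f"gamma = {gamma:.3f}, mean = {mle_mean:.3f}, variance = {mle_var:.3f}")
```

Because λ is an input, the sketch reproduces the slide's arithmetic only. Choosing λ for a different data set still requires the table, with interpolation where γ or h falls between tabulated values.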
-------
Kaplan-Meier
Example
• For this example, the maximum was 1.752, so we can choose 2 (or 3 or 4, it makes no
difference) as the flip point. 1.752 when flipped is 0.248, 1.563 becomes 0.437, etc.
• This method will find a specific probability (denoted as g_i) for each X_i (the flipped values)
using an "Incremental Survival Probability" (actually through use of a table that must be
constructed).
• The g_i and X_i are combined to estimate the mean and variance:
$\text{Mean} = \sum_i g_i X_i$ and $\text{Variance} = \sum_i g_i X_i^2 - (\text{Mean})^2$
• The Mean is then flipped back to the original scale; variance is left as is.
• The computation is summarized on the next slide.
- Col 1: The actual data values (non-detects indicated by a dashed line)
- Col 2: The "flipped data" = 2 minus the actual value
- Col 3: Rank order (the missing ranks belong to non-detects)
- Col 4: b = n-r+1 where n = total (20), r = rank
- Col 5: d = number of observations for this value (1 in this case)
- Col 6: p = (b - d)/b
- Col 7: S = The S from the previous row multiplied by the p for the current row (starts at 1.0000).
E.g., 10th data value: S = 0.5500 x 10/11 = 0.5000
- Col 8: g = The S from the previous row minus the S for the current row (starts at 1.0000).
E.g., 10th data value: g = 0.5500 - 0.5000 = 0.0500.
• The X_i are the flipped values and the g_i come from the table.
- Mean = 0.05 x 0.248 + ... + 0.16875 x 1.200 = 0.8620
- Variance = 0.05 x 0.248^2 + ... + 0.16875 x 1.200^2 - 0.8620^2 = 0.085
• The mean on the original scale is then 2 - 0.8620 = 1.138 and the variance is 0.085.
From material supplied by
Warren and Nussbaum (2009)
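The table construction described above can be sketched in a few lines. This is a minimal illustration of the flipping approach, not a reference implementation. It assumes the non-detect occupying the missing rank 12 has a detection limit of 1.000 (the slide does not state it), and it assigns the remaining probability to the largest flipped value, as the last row of the slide's table does. The function name is hypothetical.

```python
def kaplan_meier_mean_variance(data, flip):
    """Kaplan-Meier mean and variance for left-censored data.

    data: list of (value, is_nondetect); for a non-detect, value is its detection limit.
    flip: any constant larger than the maximum value.
    """
    n = len(data)
    # Flip so left-censored values become right-censored. At tied values,
    # detected observations (False) sort ahead of non-detects (True).
    flipped = sorted((flip - value, nondetect) for value, nondetect in data)
    s_prev, mean_f, second_moment = 1.0, 0.0, 0.0
    i = 0
    while i < n:
        x, nondetect = flipped[i]
        if nondetect:
            i += 1
            continue
        b = n - i                                  # observations with flipped value >= x
        d = sum(1 for item in flipped[i:] if item == (x, False))
        s = s_prev * (b - d) / b                   # S = previous S times p
        g = s_prev - s                             # probability assigned to x
        mean_f += g * x
        second_moment += g * x * x
        s_prev = s
        i += d
    if s_prev > 0:
        # Trailing non-detects: remaining probability goes to the largest flipped value.
        x_last = flipped[-1][0]
        mean_f += s_prev * x_last
        second_moment += s_prev * x_last ** 2
    return flip - mean_f, second_moment - mean_f ** 2

detected = [1.752, 1.563, 1.498, 1.477, 1.418, 1.358, 1.327, 1.289,
            1.148, 1.060, 1.045, 0.977, 0.944, 0.919, 0.897, 0.818]
# One non-detect at an assumed DL of 1.000 (rank 12) and three at <0.800.
data = [(v, False) for v in detected] + [(1.000, True)] + [(0.800, True)] * 3

km_mean, km_var = kaplan_meier_mean_variance(data, flip=2.0)
print(f"K-M mean = {km_mean:.3f}, variance = {km_var:.3f}")
```

Any flip constant above the maximum gives the same mean and variance after flipping back, which is why the choice of 2, 3, or 4 makes no difference.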
-------
Kaplan-Meier
Example
Data | Flip on 2 | rank | b = n-r+1 | d | p = (b-d)/b | S | g
1.752 | 0.248 | 1 | 20 | 1 | 19/20 | 0.9500 | 0.0500
1.563 | 0.437 | 2 | 19 | 1 | 18/19 | 0.9000 | 0.0500
1.498 | 0.502 | 3 | 18 | 1 | 17/18 | 0.8500 | 0.0500
1.477 | 0.523 | 4 | 17 | 1 | 16/17 | 0.8000 | 0.0500
1.418 | 0.582 | 5 | 16 | 1 | 15/16 | 0.7500 | 0.0500
1.358 | 0.642 | 6 | 15 | 1 | 14/15 | 0.7000 | 0.0500
1.327 | 0.673 | 7 | 14 | 1 | 13/14 | 0.6500 | 0.0500
1.289 | 0.711 | 8 | 13 | 1 | 12/13 | 0.6000 | 0.0500
1.148 | 0.852 | 9 | 12 | 1 | 11/12 | 0.5500 | 0.0500
1.060 | 0.940 | 10 | 11 | 1 | 10/11 | 0.5000 | 0.0500
1.045 | 0.955 | 11 | 10 | 1 | 9/10 | 0.4500 | 0.0500
0.977 | 1.023 | 13 | 8 | 1 | 7/8 | 0.3938 | 0.05625
0.944 | 1.056 | 14 | 7 | 1 | 6/7 | 0.3375 | 0.05625
0.919 | 1.081 | 15 | 6 | 1 | 5/6 | 0.2813 | 0.05625
0.897 | 1.103 | 16 | 5 | 1 | 4/5 | 0.2250 | 0.05625
0.818 | 1.182 | 17 | 4 | 1 | 3/4 | 0.1688 | 0.05625
<0.800 | >1.200 | 18 | 3 | 3 | 0 | 0 | 0.16875
-------
Comparison of Methods
Example
Statistic | True | Zero | DL | 1/2 DL | MLE | ROS | K-M
Mean | 1.108 | 0.747 | 1.422 | 0.972 | 0.960 | 1.197 | 1.138
Var | 0.117 | 0.505 | 0.099 | 0.302 | 0.195 | 0.048 | 0.085
In this example, the easiest methods (substitution with zero, DL, or 1/2 DL) give poor
results.
MLE and ROS (not shown in the example) provide fairly good mean and variance values
considering the high non-detect rate (45%) in this example. However, these methods
require significant work to calculate the estimates.
Kaplan-Meier provides reasonable estimates for this example, and works when there are
multiple detection limits. However, this method also requires significant work to calculate
the estimates.
From material supplied by
Warren and Nussbaum (2009)
-------
References
(1 of 2)
Antweiler R.C. and Taylor H.E. (2008) Evaluation of statistical treatments of left-censored environmental data
using coincident uncensored data sets: I. Summary statistics. Environ. Sci. Technol. 42 (10), 3732-3738
(10.1021/es071301c).
Bortnick S.M., Coutant B.W., and Biddle B.M. (2003) Estimate background concentrations for the national-scale
air toxics assessment. Final technical report prepared for the U.S. Environmental Protection Agency,
Research Triangle Park, NC, by Battelle, Columbus, OH, Contract No. 68-D-02-061, Work Assignment 1-03,
June.
Helsel D.R. (2005) More than obvious: better methods for interpreting nondetect data. Environ. Sci. Technol.,
419A-423A, American Chemical Society.
Helsel D.R. (2005) Nondetects and data analysis: statistics for censored environmental data. John Wiley &
Sons, Inc., Hoboken, NJ.
Khalil M.A. and Rasmussen R.A. (1997) The global distribution of atmospheric methyl chloride. Web site of the
Climate Monitoring and Diagnostics Laboratory. Available on the Internet at
Kuhlmann et al. (2003) A model for studies of tropospheric ozone and NMHCs: Model evaluation of ozone-
related species. J. Geophys. Res. 108(D23), doi:10.1029/2002JD003348.
Main H.H. and Roberts P.T. (2001) PM2.5 data analysis workbook. Draft workbook prepared for the U.S.
Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park,
NC, by Sonoma Technology, Inc., Petaluma, CA, STI-900242-1988-DWB, February.
McCarthy M.C., Hafner H.R., and Montzka S.A. (2006) Background concentrations of 18 air toxics for North
America. J. Air and Waste Manag. Assoc. 56, 3-11 (STI-903550-2589). Available on the Internet at
http://www.awma.org/journal/ShowAbstract.asp?Year=&PaperID=1509.
Montzka S.A. et al. (1999) Present and future trends in the atmospheric burden of ozone-depleting halogens.
Nature, 398, 690-694.
Parrish D.D., Trainer M., Young V., Goldan P.D., Kuster W.C., Jobson B.T., Fehsenfeld F.C., Lonneman W.A.,
Zika R.D., Farmer C.T., Riemer D.D., and Rodgers M.O. (1998) Internal consistency tests for evaluation of
measurements of anthropogenic hydrocarbons in the troposphere. J. Geophys. Res.-Atmos. 103(D17),
22339-22359.
-------
References
(2 of 2)
Rosenbaum A.S., Axelrad D.A., Woodruff T.J., Wei Y., Ligocki M.P., and Cohen J.P. (1999) National
estimates of outdoor air toxics concentrations. J. Air & Waste Manag. Assoc. 49, 1138-1152.
Singh H.B. et al. (2001) Evidence from the Pacific troposphere for large global sources of oxygenated organic
compounds, Nature, 410, 1078-1081.
U.S. Environmental Protection Agency (1980) Validation of air monitoring data. Report prepared by the U.S.
Environmental Protection Agency, Research Triangle Park, NC, EPA-600/4-80-030.
U.S. Environmental Protection Agency (1982) Definition and procedure for the determination of the method
detection limit - revision 1.11: Federal Register. Pp. 565-567. To be codified at 40 CFR Part 136,
Appendix B.
U.S. Environmental Protection Agency (1999) Particulate matter (PM2.5) speciation guidance document.
Available at .
U.S. Environmental Protection Agency (2004) Local Limits Development Guidance Appendices. EPA 833-R-
04-002B, Office of Wastewater Management, Washington, DC.
VIEWS website, http://vista.cira.colostate.edu/views/
Warren, J. and Nussbaum, B. (2009) "Analyzing Datasets Containing Semi-quantitative Values". Course
material. Office of Environmental Information, EPA
Watson J.G., DuBois D.W., DeMandel R., Kaduwela A., Magliano K., McDade C., Mueller P.K., Ranzieri A.,
Roth P.M., and Tanrikulu S. (1998) Aerometric monitoring program plan for the California Regional
PM2.5/PM10 Air Quality Study. Draft report prepared for the California Regional PM10/PM2.5 Air Quality
Study Technical Committee, California Air Resources Board, Sacramento, CA, by Desert Research
Institute, Reno, NV, DRI Document No. 9801.1D5, December.
Weller et al. (2000) Meridional distribution of hydroperoxides and formaldehyde in the MBL of the Atlantic (48
N-35 S) measured during the Albatross campaign. J. Geophys. Res. 105(D11), 14401-14412.
Zhou et al. (1996) Tropospheric formaldehyde concentrations at the Mauna Loa observatory during MLOPEX
2. J. Geophys. Res. 101(D9).
-------
Characterizing Air Toxics
What are the diurnal, seasonal, and spatial characteristics
of air toxics?
What do these characteristics tell us about emission
sources, transport, and chemistry?
June 2009 Section 5 - Characterizing Air Toxics
-------
Characterizing Air Toxics
What's Covered in This Section
Temporal Patterns
- Diurnal
- Day-of-week
- Seasonal
Spatial Patterns
- Spatial characterization
• National concentration plots for perspective
• Maps
- Variability within and between cities
- Hot and cold spot analysis
- Comparing urban and rural sites
Risk screening
-------
Characterizing Air Toxics
Overview
• Spatial and temporal characterizations of air toxics data are the basis
for improving our understanding of emissions and the atmospheric
processes that influence pollutant formation, distribution, and removal.
Goals of these data analyses can include
- Identifying possible important sources of air toxics.
- Determining chemical and physical processes that lead to high air toxics
concentrations.
• Characterization analyses help us develop a conceptual model of
processes affecting air toxics concentrations and also provide an
opportunity to compare data to existing conceptual models to identify
interesting or problematic data. Following are some typical questions
which may be addressed using these types of analyses:
- Where are air toxics concentrations highest or lowest?
- How do pollutant concentrations vary relative to each other - and what does this tell
us about their sources?
- What and where are the air toxics of concern?
- How do urban and rural sites compare?
- How do air toxics concentrations compare to criteria pollutants (e.g., ozone and
PM2.5)?
- What local or regional sources influence a particular measurement site?
-------
Quantifying Patterns
• When investigating temporal patterns, analysts should use statistical measures to
determine whether concentrations are statistically different.
• Testing statistical significance using the t-test
- The t-test is a very common method for assessing the difference in mean values of two groups
of data (e.g., the difference in means of two years of data).
- This test assumes that both data sets are normally distributed, a fact that is not true for many
air toxics measurements. However, this is not a problem as long as there are sufficient data in
each group (>~100). Each data set is also required to contain the same number of samples.
- If there are fewer than 100 data points per group, a more advanced, non-parametric, test must
be used. Some examples are
• Kruskal-Wallis
• Kolmogorov-Smirnov
• Anderson-Darling (sample sizes of 10 to 40 only).
• Testing statistical significance using notched box plots
- For the national analyses, SYSTAT notched box plots were used as a quick check of statistical
significance between two groups. The notches on a box plot span the 95% confidence
interval around the median (a full description of
notched box plots can be found in Preparing Data for Analysis, Section 4, of this workbook).
If the notches of two box plots do not overlap, the median concentrations are statistically
significantly different.
- Testing with notched box plots provides significance tests on the median concentration value,
not the mean.
• Most of these statistical methods can be performed with Microsoft Excel or SYSTAT, as well
as many other statistical programs (StatSoft, Inc., 2005).
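The notch comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the common approximation of a notch half-width of 1.57 x IQR / sqrt(n), which may differ from the exact rule SYSTAT applies, and the function names and sample concentrations are hypothetical.

```python
import math
import statistics

def notch_interval(values):
    """Approximate 95% confidence interval around the median.

    Assumes the common notch half-width 1.57 * IQR / sqrt(n);
    a given statistics package may use a slightly different rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med = statistics.median(values)
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return med - half_width, med + half_width

def notches_overlap(group_a, group_b):
    """True if the two notch intervals overlap (medians not clearly different)."""
    lo_a, hi_a = notch_interval(group_a)
    lo_b, hi_b = notch_interval(group_b)
    return lo_a <= hi_b and lo_b <= hi_a

# Hypothetical 24-hr concentrations (ug/m3) for two groups
weekday = [1.1, 1.3, 1.2, 1.5, 1.4, 1.6, 1.3, 1.2, 1.4, 1.5, 1.3, 1.4]
sunday = [0.6, 0.7, 0.8, 0.7, 0.9, 0.6, 0.8, 0.7, 0.75, 0.65, 0.7, 0.8]
different_medians = not notches_overlap(weekday, sunday)
```

Because the check works on medians and the interquartile range, it is less sensitive to a few extreme values than a comparison of means.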
-------
Characterizing Temporal Patterns
Motivation
To more fully understand potential contributing air
toxics sources, analysts may also wish to consider:
- Diurnal patterns. How does the daily cycle of air toxics
concentrations relate to emissions and meteorology? Are
diurnal patterns properly reflected in exposure models?
- Day-of-week patterns. Does the weekly cycle of air toxics
concentrations tell us anything about emissions sources?
- Seasonal patterns. Do air toxics concentrations show
seasonal patterns and do these patterns make sense with
respect to what we know about formation, transport, and
removal processes?
Understanding diurnal, day-of-week, and seasonal
patterns may also help analysts understand potential
biases in aggregated data, assess exposure, and
evaluate models.
-------
Diurnal Patterns
Overview
• Air toxics data are not routinely collected on a subdaily basis; most
data are reported as 24-hr averages. However, the PAMS program
provides subdaily measurements of nine air toxics: acetaldehyde,
benzene, ethylbenzene, formaldehyde, hexane, toluene, styrene,
xylenes (three isomers), and 2,2,4-trimethylpentane. The diurnal
variation of some air toxics is unknown because of data limitations.
• Subdaily data allow us to:
- Evaluate diurnal variation.
- Understand general atmospheric processes (the physics, chemistry,
and sources of air toxics).
- Assess the performance of models that are attempting to capture
diurnal cycles.
- Provide input to receptor-based models.
• Reasons to understand diurnal patterns include
- Assessing human exposure and health effects.
- Identifying local sources vs. regional transport.
- Contributing to an understanding of the physics and chemistry of air
toxics.
-------
Diurnal Patterns
Conceptual Model
Daily concentrations are driven by dispersion (e.g., mixing height), sources (e.g., traffic patterns),
sinks (e.g., oxidation by OH radical), and transport.
Sources and transport from other areas increase concentrations at a monitor site, while sinks and
dispersion reduce concentrations.
The figure shows an example contribution of individual factors that commonly influence diurnal
concentrations. The overall diurnal pattern may be driven by a combination of these factors and
may be conceptually estimated in the following manner:
Concentrations = (Sources - Sinks + Transport) / Dispersion
[Figure: example profiles by hour (0-23) of solar radiation, dispersion (inverse mixing height), and source (traffic activity).]
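As a rough numerical illustration of the conceptual relationship (sources and transport add to concentrations; sinks and dispersion reduce them), the sketch below combines made-up hourly profiles. Every profile shape and coefficient here is a hypothetical assumption chosen for illustration, not a value from the workbook, and "dispersion" is treated as a dilution factor that grows as the mixing height rises.

```python
import math

HOURS = range(24)

def solar(hour):
    """Hypothetical solar radiation proxy: 0 at night, 1 at noon."""
    if 6 <= hour <= 18:
        return max(0.0, math.sin(math.pi * (hour - 6) / 12))
    return 0.0

def traffic(hour):
    """Hypothetical traffic source with morning and evening rush hours."""
    if 6 <= hour <= 8:
        return 3.0
    if 16 <= hour <= 18:
        return 2.5
    return 1.0

def concentration(hour, transport=0.2):
    """Conceptual estimate: (sources - sinks + transport) / dispersion."""
    sources = traffic(hour)
    sinks = 0.5 * solar(hour)             # assumed: oxidation tracks sunlight
    dispersion = 1.0 + 2.0 * solar(hour)  # assumed: dilution grows with mixing height
    return (sources - sinks + transport) / dispersion

profile = [concentration(h) for h in HOURS]
peak_hour = max(HOURS, key=lambda h: profile[h])
```

With these assumed inputs the profile peaks at the start of the morning rush, when emissions are high and dilution is still low, and reaches its minimum at midday.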
-------
Diurnal Patterns
Approach (1 of 3)
For the most valid diurnal patterns, the following data requirements are suggested:
- 75% sampling completeness is recommended for each site, pollutant, and day (1) to ensure that data are
representative of a full day and (2) to provide consistency with completeness requirements used to
construct other aggregates (see Preparing Data for Analysis, Section 4).
- Other completeness criteria (daily, monthly, yearly) may be necessary to aggregate data from multiple
sites, depending on the length of time for which data are available and the objectives of the analysis.
- The percent below detection should be tabulated for each pollutant and year. Initially, all data may be
included regardless of the percent below detection.
- To investigate diurnal patterns, there must be a sufficient number of measurements of each pollutant and
sampling hour to accurately assess the value. In initial national level analyses, a minimum of
10 measurements for each air toxic and hour was set to try to include as many air toxics as possible in the
analysis; more measurements are recommended if they are available.
- Data should be inspected on both a concentration and normalized basis for each available duration.
Normalization enables a comparison of diurnal patterns among sites and pollutants even if pollutant
concentrations vary widely.
- Data are normalized using the average concentration for each individual day, site, duration, and pollutant.
To normalize data,
1. Calculate the average concentration by date, site, pollutant, and duration.
2. Divide the corresponding subdaily data by this average.
The resulting normalized values indicate how far the hourly concentration departs from the
average concentration for that day. A value of 1 indicates that the hourly concentration is the same as the daily
average concentration. Values greater than 1 are above the average (e.g., a value of 2 is twice
the average), while values less than 1 are below the average (e.g., a value of 0.5 is half
the average).
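The completeness screen and the two normalization steps can be sketched as follows. This is a minimal illustration, not the workbook's implementation: the record layout, the function name, and the default of eight 3-hr samples per day are assumptions, and the data are hypothetical.

```python
from collections import defaultdict

def normalize_subdaily(records, expected_per_day=8, completeness=0.75):
    """Normalize subdaily concentrations by their daily average.

    records: list of (site, pollutant, day, hour, concentration) tuples,
    all of one sample duration. Days with fewer than
    completeness * expected_per_day samples are dropped.
    """
    by_day = defaultdict(list)
    for site, pollutant, day, hour, conc in records:
        by_day[(site, pollutant, day)].append((hour, conc))

    normalized = {}
    for key, samples in by_day.items():
        if len(samples) < completeness * expected_per_day:
            continue  # day does not meet the completeness criterion
        daily_avg = sum(conc for _, conc in samples) / len(samples)
        if daily_avg <= 0:
            continue  # cannot normalize by a zero average
        for hour, conc in samples:
            normalized[key + (hour,)] = conc / daily_avg
    return normalized

# Hypothetical 3-hr benzene samples (ug/m3) for one site and day
concs = [2.0, 2.0, 4.0, 2.0, 1.0, 1.0, 2.0, 2.0]
records = [("site_A", "benzene", "2005-07-01", 3 * i, c) for i, c in enumerate(concs)]
result = normalize_subdaily(records)
```

In this example the daily average is 2.0, so the 6 a.m. sample (4.0) normalizes to 2 and the midday samples (1.0) normalize to 0.5.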
-------
Diurnal Patterns
Approach (2 of 3)
Subdaily measurements may be made on different sampling schedules which must be taken into
account when aggregating multi-site data.
- Daily sampling schedules may differ between sites. For example, the sampling schedule for 3-hr
measurements could begin at 12 a.m., 1 a.m., or 2 a.m., potentially creating three staggered hourly patterns
among sites.
- A visual representation of the possible 3-hr sampling schedules is shown in the figure below. The data points
represent the sample start-time. The lines between points represent the duration of sample collection (3-hr).
Subsequent sample lines are partitioned by shade for clarity.
- Diurnal analyses can be obscured by the different sample schedules when aggregating multi-site data if the
number of samples for each hour is not the same across all hours. This issue is typically not a problem
within a single agency's network, but needs to be considered when data from different jurisdictions are used
(such as at the national scale). Consider a hypothetical case in which Los Angeles sites used the
2 a.m. sample schedule and the rest of state used the 1 a.m. sample schedule.
- Consider the first three hours of the day: the sample that begins at 2 a.m. overlaps the samples from all three
sampling schedules. To aggregate data with multiple sampling schedules, we calculated a weighted average from
the raw data at the hour representing the middle of the staggered sampling schedules (i.e., the
2 a.m. sampling schedule for 3-hr duration) before completing the next steps.
- A detailed example is examined in the following slides.
[Figure: Visual Representation of 3-hr Sampling Schedules, showing sample start times and 3-hr collection periods by hour for schedules starting at 12 a.m., 1 a.m., and 2 a.m.]
Note: the figure is arbitrarily cut off at 2 p.m. (14) and does not represent the whole day.
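One way to assign staggered start times to a common hour before averaging is sketched below. It reflects one reading of the approach described above (for 3-hr samples, start hours 0, 1, and 2 are all assigned to hour 2, start hours 3, 4, and 5 to hour 5, and so on); the function name is hypothetical.

```python
def aggregated_hour(start_hour, duration=3):
    """Map a sample start hour to the last start hour in its staggered group.

    Assumes schedules are staggered by one hour, so for 3-hr samples the
    start hours 0, 1, and 2 are all assigned to hour 2; 3, 4, 5 to hour 5; etc.
    """
    return (start_hour // duration) * duration + (duration - 1)

# Start hours from the 12 a.m., 1 a.m., and 2 a.m. schedules
mapped = [aggregated_hour(h) for h in (0, 1, 2, 3, 4, 5, 21, 22, 23)]
```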
-------
Diurnal Patterns
Approach (3 of 3)
Summary statistics may be generated by pollutant and hour for the
concentration and normalized data sets.
- It is useful to inspect various parameterizations of the data (e.g., 10th,
50th, and 90th percentiles), especially when more than 50% of data is
below detection.
- Include the standard deviation or confidence interval as a measure of
uncertainty in the data.
Subdaily patterns can be visualized using line graphs of
summary statistics with confidence intervals or notched box plots.
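A short sketch of the hourly summary statistics is given below. It assumes records for a single pollutant as (hour, value) pairs, applies the minimum of 10 measurements per hour mentioned in the approach, and uses the "inclusive" percentile definition from Python's standard library; other software may interpolate percentiles differently. The function name and data are hypothetical.

```python
import statistics
from collections import defaultdict

def hourly_percentiles(records, min_count=10):
    """Return {hour: (p10, p50, p90)} for hours with enough measurements.

    records: iterable of (hour, value) pairs for one pollutant.
    """
    by_hour = defaultdict(list)
    for hour, value in records:
        by_hour[hour].append(value)

    summary = {}
    for hour, values in sorted(by_hour.items()):
        if len(values) < min_count:
            continue  # too few measurements to characterize this hour
        deciles = statistics.quantiles(values, n=10, method="inclusive")
        summary[hour] = (deciles[0], deciles[4], deciles[8])
    return summary

# Hypothetical data: 11 values at hour 8, only 2 values at hour 14
records = [(8, float(v)) for v in range(1, 12)] + [(14, 0.5), (14, 0.7)]
summary = hourly_percentiles(records)
```

The same function can be applied to either the concentration or the normalized data set.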
-------
Diurnal Patterns
Effect of Sampling Schedule (1 of 2)
Table 2. Aggregated Measurements
Aggregated Hour    Weighted Average Median Concentration (µg/m3)
2                  0.738
5                  0.739
8                  0.927
11                 0.580
14                 0.482
23                 0.839
Weighted Average (WA) Formula:
WA = [1/(N_1 + N_2 + N_3)] * [N_1*C_1 + N_2*C_2 + N_3*C_3]
N = Number of Measurements
C = Concentration
(subscripts 1-3 denote the three sampling schedules)
Example calculation, aggregated to
2 a.m. sample schedule:
[1/(66+66+64)] * [66*0.777 + 66*0.708 + 64*0.729]
= 0.738
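The example calculation can be checked with a few lines of Python. The function name is hypothetical; the counts and concentrations are the ones given in the example above.

```python
def weighted_average(pairs):
    """Weighted average of concentrations, weighted by number of measurements.

    pairs: list of (n_measurements, concentration) tuples, one per schedule.
    """
    total_n = sum(n for n, _ in pairs)
    return sum(n * c for n, c in pairs) / total_n

# (N, C) for the three sampling schedules aggregated to the 2 a.m. hour
wa = weighted_average([(66, 0.777), (66, 0.708), (64, 0.729)])
wa_rounded = round(wa, 3)
```

Weighting by the number of measurements keeps a schedule with few samples from dominating the aggregated value.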
-------
Diurnal Patterns
Effect of Sampling Schedule (2 of 2)
The figures are a graphical representation of the
calculations performed in the previous slide.
(The data are not the same as those used in the
previous slide.)
Figure (a) shows the 10th, 50th, and 90th
percentile of national 3-hr benzene data. The
noise in this pattern is due to varying amounts of
data available from three sampling schedules
which begin at 12, 1, or 2 a.m. Sampling-
schedule differences are typical when
aggregating 3-hr or 4-hr measurements and can
obscure diurnal patterns.
Figure (b) shows the same data as a weighted
average by the most representative hour.
Averaging clarifies the diurnal pattern showing a
morning peak trend as would be expected for
benzene concentrations at most sites.
This averaging method is recommended when
aggregating multi-site data if multiple sampling
schedules are used.
[Figure: Benzene 3-hr subdaily data by hour. Panel (a): raw data. Panel (b): weighted average.]
Figures show the 10th, 50th, and 90th percentile of
national 3-hr benzene data. They were created with
SYSTAT 11 and Microsoft Excel.
-------
Diurnal Patterns
Commonly Observed Patterns
The figure shows a sample of four commonly observed diurnal patterns using national 3-hr
duration data. The sources, sinks, transport, and dispersion leading to each pattern are
discussed in this section. Data were normalized as described in the approach to diurnal patterns.
[Figure: normalized concentration by hour (0-23) for four patterns: midday peak (photochemical peak), nighttime peak, morning peak, and invariant.]
-------
Diurnal Patterns
Morning Peak
Morning peak patterns are observed
from the combination of traffic emissions
and mixing height dilution.
The morning rush hour occurs while
mixing heights are relatively low,
causing a peak in concentration while
emissions outweigh dilution.
By mid-morning, mixing height dilution
has outweighed traffic emissions,
reducing concentrations below their
nighttime value and obscuring the
remaining traffic emission patterns.
Evening concentration increases are a
consequence of mixing height lowering.
The figure shows a notched box plot of m- & p-xylenes
concentrations by hour at an urban site. Box
plots are defined in Preparing Data for Analysis,
Section 4. Several years of data are included.
The plot was created with SYSTAT 11.
-------
Diurnal Patterns
[Figure: normalized concentration by hour (0-23) for o-xylene, m- & p-xylene, n-hexane, ethylbenzene, benzene, 1,3-butadiene, isopropylbenzene, toluene, and 2,2,4-trimethylpentane. Annotation: these VOCs are emitted by motor vehicles.]
This figure shows 1990-2005 national hourly data normalized by site, pollutant, and day for
all pollutants that exhibited a morning peak pattern on the national scale. Data were
normalized as described in the approach to diurnal patterns.
-------
Diurnal Patterns
Daytime Peak
The daytime pattern is driven by in
situ secondary photochemical
production mechanisms and
mirrors the pattern of solar
radiation.
- Precursors of afternoon peak
pollutants are typically emitted by
motor vehicle sources and are subject
to OH sinks. Afternoon peak pollutants
experience daily dilution patterns in
a manner similar to morning peak
pollutants.
- Secondary production of a pollutant
(such as formaldehyde) must
outweigh all these factors in order to
create the observed pattern.
[Figure: formaldehyde concentration by hour (0-24).]
The figure shows notched box plots of national
3-hr formaldehyde concentrations by the middle
sampling schedule (as discussed in Slides 8-10).
The figure was created with SYSTAT 11.
-------
Diurnal Patterns
Evening Peak
Mercury vapor is the only air
toxic to exhibit a clear evening
peak pattern in the air toxics
investigated at the national
level. However, data from only
a few sites were available so
this analysis may not be
representative of a national
pattern.
Dilution appears to be the key
factor affecting evening peak
pollutants; emissions and sinks
are likely invariant at the
subdaily level.
[Map: mercury vapor monitoring locations.]
[Figure: normalized mercury vapor concentration by hour (0-23).]
1990-2005 national hourly mercury vapor data normalized by site,
pollutant, and day. The figure was created with Microsoft Excel.
-------
Diurnal Patterns
Invariant
Invariant patterns are
observed for global
background pollutants (i.e.,
pollutant is no longer
emitted).
These pollutants show no
sources or sinks and are
evenly distributed
worldwide so that transport
and dilution have no effect
on concentration.
[Map: carbon tetrachloride monitoring locations.]
[Figure: normalized carbon tetrachloride concentration by hour.]
The figure shows 1990-2005 national 3-hr carbon tetrachloride data
normalized by site, pollutant, and day. Carbon tetrachloride is the
only pollutant to exhibit an invariant diurnal pattern on the national
scale. The figure was created with Microsoft Excel.
-------
Diurnal Patterns
Seasonal Differences
Seasonal differences may be observed in the diurnal patterns of some air toxics.
For example, the diurnal pattern of formaldehyde on a national scale is highly affected by season, as seen
in Figures a and b, because the main production of formaldehyde depends on sunlight, which is less
abundant in winter months; thus, midday production decreases significantly during these months.
The diurnal pattern of benzene shows less seasonal dependence because it is driven by diurnal meteorology
that is consistent throughout the year and benzene is less photochemically reactive (Figures c and d).
[Figures (a)-(d): national diurnal patterns of formaldehyde and benzene in winter and summer.]
Figures show summary statistics
of national diurnal patterns for
formaldehyde and benzene
partitioned into summer and
winter patterns. Figures were
created with Microsoft Excel.
Diurnal Patterns
Summary
Diurnal patterns of air toxics are influenced by sources, sinks, and
dispersion processes that vary on a subdaily basis.
Diurnal patterns are useful in classifying source type, transport, and
reactivity of air toxics. These patterns can be used to improve exposure
modeling, air quality modeling, and emissions inventories.
Most air toxics typically follow one of four diurnal patterns, although many air
toxics have not been characterized because of sampling and detection
limitations.
- Morning peak. Driven by mobile source emissions and mixing height dilution.
- Afternoon peak. Driven by secondary photochemical production.
- Nighttime peak. Driven by mixing height dilution.
- Invariant. Typical of global background pollutants that are not dependent on
sources, sinks, transport, or dilution.
If the diurnal pattern of a pollutant differs from the typical patterns shown at
a national level, the analyst should explore possible reasons for the
variation such as the presence of a nearby source.
-------
Day-of-Week Patterns
Overview and Conceptual Model
Day-of-week patterns can be useful in
identifying emissions sources.
Expectations
- Emission sources that operate every day,
24 hours per day (e.g., refineries) will not
show a day-of-week pattern.
- Emission sources with lower emissions on
weekends should lead to lower ambient
weekend concentrations of the emitted air
toxics. Traffic studies (e.g., Chinkin et al.,
2003) show that in many cities, light-duty
vehicle activity is lower on Sunday
compared to other days of the week
(Figure a).
- Emission sources with higher emissions on
weekends should lead to higher ambient
weekend concentrations of the emitted air
toxics. For example, studies in the Los
Angeles area showed that recreational
vehicle emissions may be higher on
Saturdays (Figure b).
[Figure (a): light-duty vehicle activity by hour in the Los Angeles Interior Basin for Mon-Thurs, Friday, Saturday, and Sunday (Chinkin et al., 2003).]
[Figure (b): estimated allocation of residential emissions activity by day of week in Los Angeles for BBQ, recreational boats, recreational off-road vehicles, paint/solvent, and RVs (Coe et al., 2003).]
-------
Day-of-Week Patterns
Approach
• Day-of-week patterns are typically constructed from 24-hr averages. See Preparing
Data for Analysis, Section 4, for a complete description of how to construct valid
averages.
- If subdaily data are available, it is sometimes useful to look at data subsets. For example,
when creating day-of-week trends of an air toxic that exhibits morning peak diurnal
patterns, the rush hour peak data subset (i.e., 6 to 9 a.m.) will provide more information
about the mobile source signature than the 24-hr average. Mobile source signatures
typically show day-of-week patterns, while mixing height dilution will occur on any day of
the week. 24-hr averages will be more heavily weighted by mixing height dilution and may
obscure mobile source day-of-week trends.
• A sufficient number of records for each day of the week is needed to create a
representative day-of-week pattern. The actual data requirements will vary
depending on the analysis types and variability of the data, among other factors.
- Statistically, decreasing the sample size widens the confidence interval (CI). In general,
if the 95% CIs of two data subsets (e.g., weekend vs. weekday concentrations) do not
overlap, there is good evidence that the subset population means are different; therefore, it
will be more difficult to discern statistically significant patterns with smaller sample sizes.
- Quantify patterns using the statistical treatments described earlier in this section.
• Investigate the day-of-week pattern of multiple statistics (e.g., 10th, 50th, and 90th
percentile) with the standard deviation or confidence intervals as a measure of
uncertainty.
• If data are insufficient for each day to determine a pattern, weekday vs. weekend
patterns may be investigated.
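The weekday vs. weekend comparison with confidence intervals can be sketched as follows. This is an illustration under stated assumptions: the 95% CI uses a normal approximation (mean plus or minus 1.96 standard errors), which understates the interval for small samples; the function names and the synthetic concentrations are hypothetical.

```python
import math
import statistics
from datetime import date, timedelta

def mean_ci95(values):
    """Return (lower, mean, upper) using a normal-approximation 95% CI."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m - half, m, m + half

def weekday_weekend_split(samples):
    """Group (date, concentration) pairs into weekday and weekend lists."""
    groups = {"weekday": [], "weekend": []}
    for day, conc in samples:
        key = "weekend" if day.weekday() >= 5 else "weekday"
        groups[key].append(conc)
    return groups

# Hypothetical 24-hr averages: four weeks, lower values on weekends
start = date(2009, 6, 1)
samples = []
for i in range(28):
    day = start + timedelta(days=i)
    base = 0.6 if day.weekday() >= 5 else 1.0
    samples.append((day, base + 0.05 * (i % 3)))

groups = weekday_weekend_split(samples)
wd_lo, wd_mean, wd_hi = mean_ci95(groups["weekday"])
we_lo, we_mean, we_hi = mean_ci95(groups["weekend"])
intervals_overlap = wd_lo <= we_hi and we_lo <= wd_hi
```

With only eight weekend values the weekend interval is wider than it would be with more data, which is the sample-size effect described above.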
-------
Day-of-Week Patterns (1 of 2)
Example
In Figure (a), benzene concentrations at an
urban site are statistically significantly lower
on Sunday. The concentrations on
Saturday seem slightly lower, but
differences are not statistically significant.
These results are consistent with our
conceptual model of light-duty vehicle
traffic.
For carbon tetrachloride (Figure b), we
expect concentrations to be the same every
day. The central tendencies of the
concentrations at the same site are
consistent.
[Figures: (a) benzene and (b) carbon tetrachloride concentrations by day of week.]
The figures show notched box plots of 24-hr concentrations by day
of week at selected sites. They were created with SYSTAT 11.
-------
Day-of-Week Patterns (2 of 2)
Example
Sometimes, not enough data are
available to determine patterns by
day of week—in some cases, the
data can be combined into weekday
vs. weekend groups.
In the example, benzene
concentrations at an urban site are
lower on weekends than on
weekdays (the difference in medians
is statistically significant). These
findings make sense because of the
urban location of the monitor and
lower motor vehicle emissions on the
weekend compared to weekdays.
The inspection of day-of-week
patterns of all air toxics was not
performed at a national level.
-------
Day-of-Week Patterns
Summary
Typically, mobile source air toxics show the most obvious day-of-
week pattern consistent with traffic patterns. Sunday
concentrations were particularly low for most mobile source air
toxics, a pattern consistent with reduced traffic.
In general, day-of-week patterns can be difficult to discern due to
interference from other sources, sinks, or meteorology.
A low number of samples can obscure underlying patterns.
In exploratory investigations of national-level data, few non-mobile
source air toxics showed a clear day-of-week pattern.
Note that day-of-week patterns are highly dependent on the
proximity of the monitor's site to sources, the emission sources'
schedule, and meteorology (e.g., wind direction); site-level
examinations may provide a better explanation.
-------
Seasonal Patterns
Overview
Understanding seasonal differences in air toxics
concentrations helps analysts
• Formulate or evaluate a conceptual model of emissions,
formation, removal, and transport of an air toxic.
• Better understand source types.
• Continue to validate data, i.e., do data meet expectations for
seasonal variation?
• Construct and interpret annual averages when a season's data
are missing from the average (e.g., if the data for a winter quarter
are missing, what biases in the annual average can be
expected?).
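The effect of a missing season on an annual average can be illustrated with simple arithmetic. The quarterly values below are hypothetical and describe a cool season pollutant; the function name is also hypothetical, and the annual average is taken as the plain mean of the available quarterly averages.

```python
def annual_average(quarterly_avgs):
    """Mean of the available quarterly averages (None marks a missing quarter)."""
    present = [v for v in quarterly_avgs if v is not None]
    return sum(present) / len(present)

# Hypothetical quarterly averages (ug/m3): winter, spring, summer, fall
full = [1.4, 0.9, 0.8, 1.1]
missing_winter = [None, 0.9, 0.8, 1.1]

true_avg = annual_average(full)
biased_avg = annual_average(missing_winter)
bias_pct = 100 * (biased_avg - true_avg) / true_avg
```

Here, dropping the winter quarter lowers the annual average by about 11%, so an analyst would expect a low bias whenever the peak season is missing.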
-------
Seasonal Patterns
Conceptual Model
Cool season expectations
- Mixing heights are lower in the cold months. Low mixing heights create less air
available for pollutant dispersion which causes higher ambient concentrations.
- Temperatures are lower and sunlight is reduced in cold months. This
combination can lead to a reduction in evaporative emissions (e.g., gasoline)
and reduced photochemistry. Reductions in temperature and sunlight also limit
formation of hydroxyl radicals which efficiently oxidize many air toxics.
- Typically more precipitation occurs during winter months and reduces dust
emissions.
Warm season expectations
- Mixing heights are higher in warm months, allowing more dilution and transport
of air toxics which, in turn, reduces ambient concentrations.
- Higher temperatures and increased sunlight in warm months lead to an increase
in evaporative emissions and photochemistry.
- Conditions are typically drier, producing more dust.
- Wildfire activity can also cause an increase in concentrations of pollutants
emitted in smoke.
-------
Seasonal Patterns
National Trends
Seasonal patterns observed
at a national level are
shown in the table.
These air toxics were
selected because they were
the ones with sufficient data
for analyses.
- Minimum of three valid
seasonal averages by site
and year
- At least 20 monitoring sites
meeting the above criteria
- Additionally, limited to
pollutants investigated in
diurnal variability and
annual analyses to focus
on similar pollutants.
Most of the VOCs, with the
exceptions of styrene and
isopropylbenzene, are cool
season pollutants as
expected.
We are not sure why
carbon tetrachloride shows
a warm season peak—we
expected it to be invariant.
No obvious data issues
suggested this pattern.
Pollutant Name            Pattern         Number of sites   Median CV   Median annual concentration (µg/m3)
1,3-Butadiene             Cool            195               0.38        0.16
n-Hexane                  Cool            159               0.30        0.88
2,2,4-Trimethylpentane    Cool            119               0.29        0.51
m- & p-Xylene             Cool            256               0.29        1.10
Tetrachloroethylene       Cool            137               0.29        0.26
Toluene                   Cool            137               0.29        2.38
o-Xylene                  Cool            261               0.28        0.46
Ethylbenzene              Cool            262               0.28        0.42
Benzene                   Cool            306               0.27        1.03
Lead TSP                  Cool            149               0.25        0.018
Dichloromethane           Cool            187               0.25        0.44
Styrene                   Indeterminate   207               0.33        0.16
Isopropylbenzene          Indeterminate   91                0.31        0.068
Methyl Chloroform         Invariant       89                0.12        0.15
Chloromethane             Warm            245               0.09        1.20
Carbon Tetrachloride      Warm            240               0.09        0.56
Nickel TSP                Warm            44                0.20        0.0026
Manganese TSP             Warm            71                0.20        0.015
Chromium TSP              Warm            61                0.21        0.0039
Acetaldehyde              Warm            163               0.21        1.65
Propionaldehyde           Warm            112               0.27        0.28
Chloroform                Warm            102               0.29        0.123
1,4-Dichlorobenzene       Warm            97                0.32        0.19
Formaldehyde              Warm            163               0.36        2.75
McCarthy et al., 2007
-------
Seasonal Patterns
Approach
• Investigate seasonal variability patterns using normalized monthly and/or
quarterly averages.
- See Preparing Data for Analysis, Section 4, for a complete description of how to
construct valid monthly and quarterly averages.
- Quarterly averages may be calendar quarters or seasonal quarters depending on
the aim of analyses.
• Keep track of the percentage of data below detection; pollutants and years with
>85% of data below detection result in too much bias to draw conclusions.
• Preferably, inspect monthly data for seasonal patterns if sufficient data are
available.
- Noise in monthly data may be high due to fewer measurements. For this reason,
investigating quarterly (or specific monthly groupings relevant to the site) data in
addition to monthly data can be useful.
- Area-specific seasonal aggregations can be made.
• Normalize the data using the average value for each year, site, and pollutant.
- Calculate an annual average for each year, site, and pollutant.
- Divide the corresponding monthly or quarterly average by the annual average.
• Investigate seasonal patterns of normalized data using notched box plots or
summary statistics with a measure of confidence (e.g., standard deviation or
confidence intervals).
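The seasonal normalization steps can be sketched as below. As a simplifying assumption, the annual average here is the mean of the monthly averages supplied for each site, pollutant, and year; in practice it should be a valid annual average built as described in Section 4. The function name, key layout, and monthly values are hypothetical.

```python
from collections import defaultdict

def normalize_monthly(monthly_avgs):
    """Divide each monthly average by the annual average for its site, pollutant, and year.

    monthly_avgs: {(site, pollutant, year, month): monthly average}.
    Simplifying assumption: the annual average is the mean of the
    monthly averages available for that site, pollutant, and year.
    """
    by_year = defaultdict(list)
    for (site, pollutant, year, _month), avg in monthly_avgs.items():
        by_year[(site, pollutant, year)].append(avg)
    annual = {key: sum(v) / len(v) for key, v in by_year.items()}
    return {
        key: avg / annual[key[:3]]
        for key, avg in monthly_avgs.items()
        if annual[key[:3]] > 0
    }

# Hypothetical monthly averages (ug/m3), January through December
values = [1.0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.0, 2.5, 2.0, 1.5, 1.0]
monthly = {("site_A", "formaldehyde", 2004, m + 1): v for m, v in enumerate(values)}
normalized = normalize_monthly(monthly)
```

In this example the annual average is 2.0, so July normalizes to 1.5 and January to 0.5, a threefold summer-to-winter difference that is directly comparable across sites and pollutants.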
-------
Seasonal Patterns
Using Normalized National-Scale Data
To illustrate the use of
normalized data, consider the
monthly patterns of
propionaldehyde and
formaldehyde, both of which
show concentrations that
appear higher in summer
(Figures a and b).
However, normalized
concentration patterns
(Figures c and d) show that
the monthly pattern of
formaldehyde is more
significant than that of
propionaldehyde.
On a relative basis, Figures c
and d show that concentrations
of formaldehyde are nearly
three times higher in the
summer than in winter.
[Figure: panels (a) through (d) showing monthly concentrations and normalized monthly
concentrations of propionaldehyde and formaldehyde; x-axis: MONTH.]
-------
Seasonal Patterns
Cool Season Peak
Cool season peaks are generally observed because mixing
heights are lower in winter and the enhanced removal by
photooxidation observed during summer is absent.
Heating-related emissions, such as wood burning, will typically be
higher during winter months, contributing to increased concentrations
of some air toxics.
Benzene and 1,3-butadiene, two mobile source air toxics, show
cool season peaks on the national scale.
[Figure: normalized monthly national concentration distributions for 1,3-butadiene
and benzene, 2003-2005; x-axis: MONTH. Figures were created with SYSTAT11.]
-------
Seasonal Patterns
Warm Season Peak
To display a warm peak pattern, summertime
sources (emissions or secondary production)
must significantly outweigh the higher mixing
heights that occur during warm months.
Chloroform emissions from water treatment
processes and swimming pools may be
enhanced during summer months, explaining
the observed pattern.
It has been estimated that 85-95% of
formaldehyde concentrations originate from
secondary photochemical production, which
supports the observed warm season peak
(Grosjean et al., 1983).
[Figure: normalized monthly national concentration distributions for chloroform
and formaldehyde, 2003-2005; x-axis: MONTH. Figures were created with SYSTAT11.]
-------
Seasonal Patterns
A National Perspective
The figure shows the 10th, 50th,
and 90th percentiles of national
2003-2005 normalized seasonal
concentrations for selected
pollutants by calendar quarter.
Similar plots, such as regional
summaries, can be prepared for
any combination of sites.
Parameters at the top of the figure
show warm season peaks while
those at the bottom show cool
season peaks.
Warm season peaks are likely due
to secondary photochemical
production and dust; it is unclear
why carbon tetrachloride shows a
warm season peak.
Cool season peaks are primarily
due to lower mixing heights in the
winter.
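As a rough sketch of how the quarterly 10th, 50th, and 90th percentiles in such a figure might be computed from normalized values, the Python below groups values by pollutant and quarter. The function names and input layout are hypothetical, and the linear-interpolation percentile is one common convention; the workbook does not state which convention was used.

```python
from collections import defaultdict


def percentile(values, pct):
    """Percentile by linear interpolation between ranked values (0 <= pct <= 100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def quarterly_percentiles(normalized):
    """Summarize normalized concentrations by pollutant and calendar quarter.

    normalized: iterable of (pollutant, quarter, normalized_value) tuples.
    Returns {(pollutant, quarter): (p10, p50, p90)}.
    """
    groups = defaultdict(list)
    for pollutant, quarter, value in normalized:
        groups[(pollutant, quarter)].append(value)
    return {
        key: tuple(percentile(values, p) for p in (10, 50, 90))
        for key, values in groups.items()
    }
```

A pollutant whose third-quarter median sits well above 1 while its first-quarter median sits below 1 would plot as a warm season peak.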
[Figure: 10th-90th percentile range and 50th percentile of normalized concentration
for each calendar quarter (1st through 4th), by pollutant. Pollutants listed from top
to bottom: formaldehyde, manganese TSP, acetaldehyde, carbon tetrachloride,
dichloromethane, lead TSP, benzene, m,p-xylene, tetrachloroethylene, 1,3-butadiene.
x-axis: Normalized Concentration. Figure created with Grapher.]
-------
Seasonal Patterns
Summary
Three seasonal patterns were observed at a national level
- Warm season peak. Photochemical production of secondary air toxics (e.g., formaldehyde
and acetaldehyde) can be important at some sites. Concentrations (e.g., manganese) may
also be high because of dust events and seasonally increased emissions (e.g.,
chloroform).
- Cool season peak. Concentrations can be high because of lower inversions, changes in
emissions through the use of wood-burning or fuel oil for home heating, and reduced
photochemical reactivity.
- Invariant. Invariant seasonal patterns are not commonly observed, but are typical of global
background pollutants that are not affected by emissions changes or dilution which cause
seasonal patterns of other air toxics.
For many air toxics, data quality was low or seasonal patterns were inconsistent at the
national level; site-level investigations may reveal additional seasonal patterns.
Seasonal patterns assist in air toxics data analysis by providing insight into the
chemistry, sources, and transport of air toxics. Deviation from expected seasonal
patterns at a site may indicate additional sources of interest or transport.
-------
Spatial Patterns
Overview
Air toxics data are typically collected in urban locations. Given the
large number of air toxics, their often disparate sources, and the
wide range of chemical and physical properties, understanding
spatial patterns and gradients is important.
Understanding these gradients may help us
- Improve monitoring networks. (Are we measuring in the right places to
meet network objectives? Do we have the right number of monitors?)
- Improve emission inventories. (How finely do emissions need to be
spatially allocated?)
- Improve models, including exposure models. (Are gradients in
pollutants being properly represented in the model?)
- Identify contributing sources. (Are concentrations higher when winds
are predominantly from the direction of a source?)
-------
Spatial Patterns
Conceptual Model
The concentration of a given species at any location is determined by
local production, local sinks, and transport.
• Production. Local emissions—higher emissions lead to higher
concentrations.
• Loss. Local removal (chemical or deposition)—reactive compounds and
large particles are removed faster resulting in lower concentrations.
• Transport. Movement of species in the atmosphere—pollutants from
sources are dispersed or diluted; local concentrations can either increase
or decrease.
$$\frac{d(\text{Concentration})}{dt} = \text{Production} - \text{Loss} + \text{Transport}$$
-------
Spatial Patterns
Methods
• To investigate spatial patterns, calculate one site average value for each air toxic for the
time period of interest. This method removes temporal variability and focuses on spatial
patterns.
- The method is only valid if sites are temporally comparable. If not, results may be driven
by a mixture of temporal and spatial patterns and will be difficult to interpret.
- Averages should be constructed from valid aggregates. For example, if data are available
for 2003-2005, you might first calculate the three valid annual averages then aggregate
these averages to one site average. If data are not sufficient to create valid annual
averages use valid seasonal or monthly averages. Note that site average values may be
biased by temporal patterns if data are not representative of the full year. Relative spatial
comparisons are still valid as long as data are available for all sites during the same time
period.
- If possible, multiple years of data should be used in order to mitigate meteorological
effects.
- Keep track of the percent of data below detection for each site average.
• Visualize concentration ranges by plotting summary statistics for each pollutant.
- These plots give an overview of concentration values.
- Supplementary data, such as levels of concern for increased cancer or noncancer risk
(i.e., health levels of concern), remote background concentrations, and method detection
limits (MDLs), are useful to put concentration data into perspective.
• Visualize site level concentrations using a mapping program to overlay supplementary
data, such as the percent of data below detection, to enrich conclusions.
• The visualization methods may illuminate site-level data anomalies which become
apparent upon comparison to other sites.
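The aggregation described above (valid annual averages rolled up to one site average, with the percent of data below detection carried along) might look like the following Python sketch. The record layout and function name are assumptions made for illustration; the annual averages are assumed to have already passed the validity checks in Section 4.

```python
from collections import defaultdict
from statistics import mean


def site_averages(annual_records):
    """Aggregate valid annual averages to one value per site.

    annual_records: iterable of
        (site, year, annual_average, n_samples, n_below_mdl)
    tuples for a single pollutant.
    Returns {site: (site_average, percent_below_detection)}.
    """
    by_site = defaultdict(list)
    for site, _year, average, n_samples, n_below in annual_records:
        by_site[site].append((average, n_samples, n_below))

    result = {}
    for site, rows in by_site.items():
        # Site average is the mean of the annual averages, so each year
        # carries equal weight regardless of its sample count.
        average = mean(avg for avg, _, _ in rows)
        total = sum(n for _, n, _ in rows)
        below = sum(b for _, _, b in rows)
        percent_below = 100.0 * below / total if total else None
        result[site] = (average, percent_below)
    return result
```

Averaging the annual averages, rather than pooling all samples, keeps a heavily sampled year from dominating the site average.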
-------
National Concentration Plots
Overview
• To put air toxics concentrations measured at a site or sites in perspective,
a summary of the typical national concentration ranges is useful.
• The following plots of national site average air toxics concentrations for
2003-2005 exemplify one way of visualizing summary statistics
and supplementary data.
- Are concentrations high, typical, or low?
- How does this concentration compare to remote background? To MDL? To
levels of concern?
• The following figures show the 5th, 25th, 50th (median), 75th, and 95th
percentiles of site average concentration by pollutant; supplementary data are
then overlaid as a progression. Wide ranges in concentration across sites indicate
greater spatial variability of that pollutant.
• The number of sites included is shown on the right axis for each
pollutant.
• Pollutants outlined in red have <15% of samples nationally above
their respective MDLs. The distributions of concentrations for these
pollutants are mostly based on MDL/2 and should not be considered
quantitative. Data used for these plots are included in Preparing Data for
Analysis, Section 4. All perspective plots were created in Grapher.
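A minimal Python sketch of the summary statistics behind such a plot follows. The function names are hypothetical, the percentile convention (linear interpolation) is an assumption, and the "variability factor" (95th percentile divided by 5th percentile) is only one possible way to express the width of a bar as a single number.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between ranked values (0 <= pct <= 100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def national_summary(site_average_values, percent_above_mdl):
    """Summarize site average concentrations for one pollutant.

    site_average_values: list of site average concentrations.
    percent_above_mdl: percent of samples nationally above their MDL.
    """
    stats = {p: percentile(site_average_values, p) for p in (5, 25, 50, 75, 95)}
    return {
        "percentiles": stats,
        "n_sites": len(site_average_values),
        # Assumed metric: ratio of the 95th to the 5th percentile.
        "variability_factor": stats[95] / stats[5] if stats[5] > 0 else None,
        # Fewer than 15% of samples above the MDL: treat as not quantitative.
        "not_quantitative": percent_above_mdl < 15.0,
    }
```

The `not_quantitative` flag corresponds to the pollutants outlined in red in the plots.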
-------
National Concentration Plots
[Figure: 5th-95th percentile range, 25th-75th percentile range, and median of
2003-2005 site average concentrations by pollutant; x-axis: Concentration (µg/m3),
logarithmic; right axis: number of sites. Pollutants shown (number of sites):
1,1,2,2-tetrachloroethane (228), 1,1,2-trichloroethane (211), 1,2-dichloropropane (229),
1,3-butadiene (278), 1,4-dichlorobenzene (202), acetaldehyde (163), acrylonitrile (124),
benzene (307), benzyl chloride (110), carbon tetrachloride (280), ethylene dibromide (235),
ethylene dichloride (253), ethylene oxide (16), hexachlorobutadiene (153),
tetrachloroethylene (273).]
Interpretation
Summary plots provide an
overview of the spatial variability
of, and a comparison within and
between, air toxics. Spatial
variability is represented by the
width of the bar—nationally, air
toxics concentrations typically
varied by a factor of 3 to 10.
The figure shows the high spatial
variability of 1,3-butadiene. This
variability is due to the relatively
high reactivity of the compound.
Conversely, carbon tetrachloride
shows less spatial variability due
to its low removal rate from the
atmosphere and the absence of
domestic emissions.
A table of national concentration
summary statistics can be found
in the appendix to Preparing Data
for Analysis, Section 4.
Data outlined in red has < 15% of measurements above detection
-------
National Concentration Plots
[Figure: the same plot of 2003-2005 site average concentrations for 15 pollutants
(1,1,2,2-tetrachloroethane through tetrachloroethylene), showing the 5th-95th
percentile range, 25th-75th percentile range, and median, with the median site
average MDL (X) and the minimum-maximum range of 2003-2005 site average MDL
(thin line) added; x-axis: Concentration (µg/m3), logarithmic from 0.0001 to 10000;
right axis: number of sites.]
Adding MDLs
MDL ranges (thin lines) and median
MDLs (X's) are added to the plot to
illustrate how well pollutants are
monitored.
The minimum-maximum range of
MDL concentrations and the median
MDL concentration for a 2003-2005
site average are shown.
The median concentrations of the
pollutants outlined in red are always
below the median MDL. These
pollutants are not adequately
monitored in the national ambient
monitoring networks (i.e., only a few
sites have >15% of data above
detection).
Data outlined in red has < 15% of measurements above detection
-------
National Concentration Plots
[Figure: the same plot of 2003-2005 site average concentrations for 15 pollutants
(1,1,2,2-tetrachloroethane through tetrachloroethylene), showing percentile ranges,
medians, and MDLs, with the 1×10^-6 cancer benchmark (EPA OAQPS) and the noncancer
reference concentration (EPA OAQPS) added; x-axis: Concentration, logarithmic from
0.0001 to 10000; right axis: number of sites.]
Risk Levels
Chronic exposure concentrations
associated with a 1-in-a-million cancer
risk (red crosses) and noncancer
reference concentrations (red
diamonds) are added to the plot to
show a relationship to human health.
National measured annual average air
toxics concentrations are usually above
the chronic exposure concentration
associated with a 1-in-a-million cancer
risk and below noncancer reference
concentrations.
Note that the pollutant concentration
ranges outlined in red may actually be
below levels of concern, but the data
are not resolved well enough to
characterize risk.
Data outlined in red has < 15% of measurements above detection
-------
National Concentration Plots
[Figure: the same plot of 2003-2005 site average concentrations for 15 pollutants
(1,1,2,2-tetrachloroethane through tetrachloroethylene), showing percentile ranges,
medians, MDLs, the 1×10^-6 cancer benchmark (EPA OAQPS), and the noncancer reference
concentration (EPA OAQPS), with the remote background concentration (McCarthy et al.,
2006) added; x-axis: Concentration (µg/m3), logarithmic from 0.0001 to 10000;
right axis: number of sites.]
Remote Background
Remote background concentrations
(triangles) are added to the plot to
show the lowest levels expected to
be seen in the remote atmosphere;
urban concentrations of most air
toxics should not typically fall below
this value.
As expected, most air toxics are a
factor of 5-10 above their remote
background concentrations, with the
exception of carbon tetrachloride -
the only air toxic dominated by
background concentrations.
Background estimates are provided
for about 40 air toxics (see
Preparing Data for Analysis,
Section 4).
Data outlined in red has < 15% of measurements above detection
-------
National Concentration Plots
[Figure: 5th-95th percentile range, 25th-75th percentile range, and median of
2003-2005 site average concentrations, with median and minimum-maximum site average
MDL, 1×10^-6 cancer benchmark (EPA OAQPS), noncancer reference concentration
(EPA OAQPS), and remote background concentration (McCarthy et al., 2006);
x-axis: Concentration (µg/m3), logarithmic from 0.001 to 10000; right axis: number
of sites. Pollutants shown (number of sites): 1,1-dichloroethane (224),
1,4-dioxane (14), 3-chloropropene (13), bromoform (94), dichloromethane (277),
formaldehyde (163), methyl tert-butyl ether (207), trichloroethylene (268),
vinyl chloride (254).]
Additional VOCs
These VOCs are usually below their
1-in-a-million cancer risk level and
noncancer reference
concentrations.
Note that the 1-in-a-million cancer
risk level for formaldehyde was
changed in 2004 from 0.08 to 182
µg/m3. The 1-in-a-million cancer risk
levels plotted are provided by EPA
OAQPS.
See the NATA website for more
information regarding risk
characterization,
http://www.epa.gov/ttn/atw/nata1999/nsata99.html.
For example, analysts can
investigate the potential for health
effects from air toxics by target
organ/system.
Data outlined in red has < 15% of measurements above detection
-------
National Concentration Plots
SVOCs*
[Figure: 5th-95th percentile range, 25th-75th percentile range, and median of
2003-2005 site average concentrations, with median and minimum-maximum site average
MDL, 1×10^-6 cancer benchmark (EPA OAQPS), and noncancer reference concentration
(EPA OAQPS); x-axis: Concentration, logarithmic from 1E-008 to 10; right axis:
number of sites. Pollutants shown (number of sites): benzo(a)pyrene PM10 (18),
benzo(b)fluoranthene PM10 (18), benzo(k)fluoranthene PM10 (18),
benzo[a]anthracene (30), benzo[a]pyrene (30), benzo[b]fluoranthene (30),
benzo[k]fluoranthene (30), chrysene (30), dibenz(a,h)anthracene PM10 (18),
dibenzo[a,h]anthracene (30), indeno[1,2,3-cd]pyrene PM10 (18),
indeno[1,2,3-cd]pyrene (30), naphthalene (39).]
The figure indicates that most
SVOCs are below their 1-in-a-
million cancer risk level. However,
the data quality for many SVOCs
is poor—less than 15% of
measurements are above the
detection limit.
Only naphthalene is above its 1-in-
a-million cancer risk level at most
sites.
Routine measurements of SVOCs
are relatively rare across the
United States.
* semi-volatile organic compounds
Data outlined in red has < 15% of measurements above detection
-------
National Concentration Plots
[Figure: 5th-95th percentile range, 25th-75th percentile range, and median of
2003-2005 site average concentrations, with median and minimum-maximum site average
MDL, 1×10^-6 cancer benchmark (EPA OAQPS), noncancer reference concentration
(EPA OAQPS), and remote background concentration (McCarthy et al., 2006);
x-axis: Concentration (µg/m3), logarithmic from 1E-006 to 1; right axis: number of
sites. Pollutants shown (number of sites): arsenic PM2.5 (434), arsenic PM10 (38),
arsenic TSP (82), beryllium PM10 (27), beryllium TSP (62), cadmium PM2.5 (263),
cadmium PM10 (37), cadmium TSP (105), nickel PM2.5 (428), nickel PM10 (36),
nickel TSP (101).]
Metals
All metals are well below their
noncancer reference concentrations.
With respect to the 1-in-a-million cancer
risk level, arsenic is the most
important of these metals, with more
than 75% of sites measuring
concentrations above the 1-in-a-
million cancer risk level for PM2.5.
PM2.5 metals are more commonly
measured in rural and remote
locations via the IMPROVE network;
therefore, the lower range of PM2.5
concentrations commonly overlaps
remote background concentrations.
Only four metals could clearly be
shown in one figure (monitoring data
are available for many more); ranges
for other metals can be found in the
appendix to Preparing Data for
Analysis, Section 4.
Data outlined in red has < 15% of measurements above detection
-------
National Concentration Plots
Summary
The national concentration plots provide perspective for
local, state, regional, and tribal analysts to see how their
data compare. A full list of the concentrations shown in the
plots is provided in Preparing Data for Analysis, Section 4.
Air toxics concentrations typically vary spatially by a factor
of 3 to 10, depending on the pollutant.
Almost all air toxics are below noncancer reference
concentrations (except acrolein, not shown).
At a national level, some air toxics are above their
respective chronic exposure concentration associated with
a 1-in-a-million cancer risk
(http://www.epa.gov/ttn/atw/toxsource/table1.pdf).
Most air toxics are well above their remote background
concentrations.
-------
Spatial Patterns - Maps
Overview
• National concentration plots placing air toxics in a national context provide
useful information for quantifying air toxics spatial variability. To view spatial
patterns, though, it is also useful to plot site-level data on a map.
• Example maps of site average and risk-weighted concentrations (i.e., risk
estimates based on ambient measurements) from 2003 through 2005 are
shown in the following slides. These maps help analysts characterize the
national picture of air toxics and are most useful in a qualitative sense to
compare among sites, look for spatial patterns, and note data anomalies. The
maps also illustrate a method of displaying data that can be applied to sites
within a city, state, or region.
• In the examples, concentrations are displayed as proportional symbols which
are color-coded to impart additional information.
• Maps are useful for communicating a range of information—similar depictions
can be made using risk-weighted concentrations, percent change per year, or
ratios—over a range of spatial dimensions (e.g., city, state, or region).
• The magnitude of concentrations is indicated on the maps by the diameter of the
circle (the three sizes in the map legends) while the underlying percent of data
below detection is signified by color. All maps were created with ESRI's
ArcMap software.
-------
Spatial Patterns - Maps
Benzene Concentrations 2003-2005
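[Map: benzene site average concentrations, 2003-2005. Circle diameter scales with
concentration (legend sizes: 0.1, 1, and 10 µg/m3); circle color indicates the percent
of data below detection (< 50%, 50 to 85%, or > 85%). The largest circle on the map
corresponds to 17 µg/m3.]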
The map shows that benzene is measured above detection across the
country with only a few exceptions (i.e., 0-50% of the measurements at most sites are below detection).
Concentrations are consistent for areas dominated by mobile sources (e.g., the Northeast and
California) while isolated high concentrations generally coincide with significant point source emissions
of benzene such as refineries and coking operations.
Sites that show unusually high concentrations with no clear emissions sources, or sites with
concentrations that are very different from other sites (e.g., the yellow circles in the map above), might
be further investigated to determine the cause.
-------
Spatial Patterns - Maps
1,3-Butadiene Concentrations 2003-2005
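[Map: 1,3-butadiene site average concentrations, 2003-2005. Circle diameter scales
with concentration (legend sizes: 0.1, 1, and 10); circle color indicates the percent
of data below detection (< 50%, 50 to 85%, or > 85%). The largest circle on the map
corresponds to 6.6 µg/m3.]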
The ability to obtain 1,3-butadiene concentration measurements above the MDL across the United
States varies (note all the red circles and their varying sizes).
Higher concentrations generally coincide with locations of known point source emissions.
Differences in monitoring methods and methods application have resulted in large differences in reported
MDLs across the United States.
-------
Spatial Patterns - Maps
Arsenic PM2.5 Concentrations 2003-2005
[Map: arsenic PM2.5 site average concentrations, 2003-2005. Circle diameter scales
with concentration (µg/m3); circle color indicates the percent of data below
detection (< 50%, 50 to 85%, or > 85%). The largest circle on the map corresponds
to 0.0054 µg/m3.]
Arsenic concentrations are widely measured across the United States, and the entire range of data
availability is observed from more than 50% of data above detection to less than 15% above detection.
Significant MDL differences between networks make determining spatial patterns difficult.
In general, concentrations are higher and more often above detection in the eastern half of the country.
-------
Spatial Patterns - Maps
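Manganese PM2.5 Concentrations 2003-2005
[Map: manganese PM2.5 site average concentrations, 2003-2005. Circle diameter scales
with concentration (legend sizes: 0.001, 0.01, and 0.1 µg/m3); circle color indicates
the percent of data below detection (< 50%, 50 to 85%, or > 85%). The largest circle
on the map corresponds to 0.15 µg/m3.]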
In contrast to arsenic, manganese concentrations are widely measured across the country with
most data recorded above the detection limit.
Concentrations vary spatially and several "hot spots" can be identified that may lend themselves
to additional investigation at a site level.
-------
Spatial Patterns - Maps
Benzene Risk-Weighted Concentrations 2003-2005
Note: 2003-2005 average concentrations are divided by the 1-in-a-million cancer risk
concentration. Circle diameter represents this ratio while the chronic risk assessment
is indicated by color. Sites at which >85% of data are below detection are considered
unreliable (grey).
[Map: benzene risk-weighted concentrations, 2003-2005. Legend circle sizes:
risk-weighted concentrations of 1, 10, and 100. Color classes: <1 in a million,
1 to 10 in a million, 10 to 100 in a million, >100 in a million, and unreliable.]
Benzene risk associated with measured ambient concentrations is almost always above the 1-in-a-
million cancer risk level across the United States. Many areas are also above the 10-in-a-million
cancer risk. These results are in good agreement with NATA 1999 results. The highest risk estimates
are located in areas with significant point source benzene emissions.
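The risk-weighting and color classes used in these maps can be sketched as below. This is an illustrative Python function, not the workbook's code: the function name is hypothetical, the handling of values exactly on a class boundary is an assumption, and the benchmark passed in must come from an authoritative source (no benchmark values are built in).

```python
def risk_weighted_concentration(concentration, cancer_benchmark,
                                percent_below_detection):
    """Return (ratio, category) for one site average.

    concentration: site average concentration (same units as the benchmark).
    cancer_benchmark: chronic exposure concentration associated with a
        1-in-a-million cancer risk, supplied by the caller.
    percent_below_detection: percent of the site's data below detection.
    """
    ratio = concentration / cancer_benchmark

    # Sites with more than 85% of data below detection are not classified.
    if percent_below_detection > 85.0:
        return ratio, "Unreliable"

    # Boundary handling (which class gets exactly 10 or 100) is assumed.
    if ratio < 1.0:
        category = "<1 in a million"
    elif ratio <= 10.0:
        category = "1 to 10 in a million"
    elif ratio <= 100.0:
        category = "10 to 100 in a million"
    else:
        category = ">100 in a million"
    return ratio, category
```

For example, with a purely hypothetical benchmark of 0.5 and a site average of 2.0 in the same units, the ratio is 4 and the site falls in the "1 to 10 in a million" class.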
-------
Spatial Patterns - Maps
1,3-Butadiene Risk-Weighted Concentrations 2003-2005
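[Map: 1,3-butadiene risk-weighted concentrations, 2003-2005. Legend circle sizes:
risk-weighted concentrations of 1, 10, and 100. Color classes: <1 in a million,
1 to 10 in a million, 10 to 100 in a million, >100 in a million, and unreliable.]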
Where measured reliably, 1,3-butadiene concentrations are almost always above the 1-in-a-
million cancer risk level. Some areas do not measure concentrations well enough to evaluate risk
(grey symbols). Highest concentrations are located in areas with known point source emissions
(e.g., Houston and Louisville).
-------
Variability Within and Between Cities
Overview
• A topic of interest for air toxics data analysis is assessing variability in
concentration from site to site within a city. The aim of such analysis is to
understand how representative a given site is with respect to air toxics
concentrations in a city.
- What is the variability of air toxics concentrations within cities and what are the
implications for aggregating data at the city level?
- Where do sites need to be located to accurately characterize variability within a
city?
- How many sites are needed to characterize spatial variability within a city?
- How does within-city variability differ across cities?
• There may also be interest in assessing variability in air toxics from city to
city.
- What are the concentration distributions across all monitoring sites?
- Do specific cities, states, or regions have demonstrably higher or lower
concentrations?
- Do demonstrably lower concentrations occur at rural and remote sites?
- Are concentration differences associated with monitoring agency differences?
-------
Variability Within and Between Cities
Approach
• To investigate within-city variation, a city of interest should have multiple
monitors. For example, for a national trend analysis, EPA required a city
to have at least four monitors to be included in analysis.
• Valid annual averages are calculated for each monitor in a city. To
reduce noise from year-to-year changes (e.g., the effect of meteorology),
it is best to use multiple years of data when available. The national study
used 2003-2005 data.
• Data can be visualized using notched box plots by air toxic, city, and year.
If variation between years at a given city is minor, notched box plots by air
toxic and city only can be constructed to increase the amount of data.
• Advanced Plotting Techniques
- Include a color-coded measure of the percent of data below detection to
understand the reliability of the data.
- Divide annual averages by the chronic exposure concentration associated with
a 1-in-a-million cancer risk (or other risk level) to show variation in risk
estimates within and between cities.
- Include a measure of relevant emissions by city to explain possible reasons for
high or low concentrations.
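Before plotting, it can help to reduce within-city variability to a single number per city. The Python sketch below is one hedged way to do that: it averages each monitor's valid annual averages and reports the ratio of the highest to the lowest monitor. The function name, input layout, and the max/min ratio as the metric are assumptions; the four-monitor default mirrors the requirement mentioned above for the national analysis.

```python
from collections import defaultdict
from statistics import mean


def within_city_variability(records, min_monitors=4):
    """Ratio of highest to lowest monitor average within each city.

    records: iterable of (city, monitor, annual_average) tuples for one
        pollutant; several years per monitor are allowed.
    Cities with fewer than min_monitors monitors are skipped.
    Returns {city: max_monitor_average / min_monitor_average}.
    """
    by_city = defaultdict(lambda: defaultdict(list))
    for city, monitor, value in records:
        by_city[city][monitor].append(value)

    result = {}
    for city, monitors in by_city.items():
        if len(monitors) < min_monitors:
            continue
        # Average across years first to damp year-to-year noise.
        monitor_means = [mean(values) for values in monitors.values()]
        low, high = min(monitor_means), max(monitor_means)
        if low > 0:
            result[city] = high / low
    return result
```

Comparing these ratios across cities gives a quick numeric companion to the notched box plots.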
-------
Variability Within and Between Cities
Example
In the example, risk estimates have
been used to provide a secondary
layer of information.
A single box in the figure contains
one annual average for each
monitor within the city; thus, each
box represents intra-city
concentration variation.
The variability between cities is
also represented by including
multiple cities on the same plot.
The within-city spatial variability of
1,3-butadiene is usually less than a
factor of 8 for the cities in the figure.
1,3-butadiene variability between
cities, however, can be greater than
an order of magnitude.
Emissions from major sources at a
county level are generally higher for
the cities with greater within-city
variability and higher concentrations,
but there are exceptions that could
be explored.
[Figure: 1,3-Butadiene Variability Within and Between Cities. Notched box plots by
city on a logarithmic axis, with bars for emissions on a second logarithmic axis.]
The figure shows 1,3-butadiene risk-weighted (1-in-a-million) annual average
variation for 2003-2005 for selected U.S. cities along with non-mobile
emissions. Notched boxes include annual averages for each monitor within a
city, providing within-city variation. Dots over the notched boxes show the
individual data points and whether they are above (blue) or below (red) the
average MDL. Bars show county-level non-mobile emissions of 1,3-butadiene
from EPA's AirData. The figure was created with SYSTAT11.
-------
Variability Within and Between Cities
National Perspective
At a national level, spatial variability within cities was found to be
pollutant- (or pollutant group-) specific.
Most air toxics measurements are highly variable within cities; risk
values span an order of magnitude within some cities.
The spatial variability between cities is a good metric to estimate
the variability within cities a priori. Spatial variability analysis helps
set expectations for sampling in a new city.
Cities with point source emissions (e.g., Houston) showed higher
within-city variability than those dominated by area/mobile sources
(e.g., Los Angeles).
Some of the observed variability is due to differences in
sampling/analysis method and method detection limit.
-------
Hot and Cold Spot Analysis
Overview
Hot and cold spot analysis is an investigation of sites
with the highest and lowest concentrations.
The objectives of this analysis include:
- Data validation. The highest and lowest values may be due to
some type of error, possibly reporting.
- Comparison to the spatial conceptual model. Are the highest
concentrations consistent with known sources, transport, and
dispersion?
- Risk screening. Where are air toxics concentrations highest?
-------
Hot and Cold Spot Analysis
Approach
• Create valid annual averages (see Preparing Data, Section 4) for
each site and pollutant and rank each site by its concentration
(highest to lowest). The number of high- and low-ranked
concentration sites investigated depends on the number of available
sites. At a national level, the 10 highest and 10 lowest ranking sites
were investigated to illustrate the approach.
• Map all sites, marking the highest and lowest ranked sites to
investigate spatial variation.
• Identify why high or low concentrations occur at those sites and
whether the occurrence of those concentrations meets expectations.
- Review metadata about the sites (e.g., Google Earth images, local
emissions, and meteorology). Do concentrations meet spatial
conceptual models with respect to scale, sources, transport, and
dispersion?
- Inspect time series of concentration and MDL (e.g., is the value stuck,
are data outliers driving the average, is the MDL higher than the
concentrations at an average site?).
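The ranking step of this approach can be sketched in Python as follows. The function name and input layout are hypothetical; the default of 10 sites follows the national example, and the cap at half the available sites is an added assumption so that no site appears in both lists.

```python
def hot_and_cold_spots(site_average_by_site, n=10):
    """Rank sites by concentration and return the n highest and n lowest.

    site_average_by_site: {site: average concentration} for one pollutant.
    Returns (hot_sites, cold_sites); hot_sites is ordered highest first and
    cold_sites is ordered lowest first.
    """
    ranked = sorted(site_average_by_site.items(),
                    key=lambda item: item[1], reverse=True)
    # Assumed safeguard: never let the two lists overlap.
    n = min(n, len(ranked) // 2)
    if n == 0:
        return [], []
    hot_sites = [site for site, _ in ranked[:n]]
    cold_sites = [site for site, _ in reversed(ranked[-n:])]
    return hot_sites, cold_sites
```

The returned site lists are only a starting point; each flagged site still needs the metadata and time series review described above.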
-------
Hot and Cold Spot Analysis
Example - Benzene (1 of 2)
[Map legend: 10 Highest Sites, 10 Lowest Sites, Other Sites. Map label: Alaska.]
The figure shows sites with the 10 highest and 10 lowest benzene concentrations based on 2003-2005
annual averages. Other monitoring sites are shown in yellow. The sites ranked lowest were either a
result of data reporting or siting issues or were located in rural areas, consistent with our conceptual
model of low concentrations.
-------
Hot and Cold Spot Analysis
Example - Benzene (2 of 2)
The sites measuring the
highest concentrations in
the nation were dominated
by nearby point source
emissions; the site
identified in the figure
measured the second
highest benzene
concentration in the
nation.
This site is very close to
two refineries that emit a
significant amount of
benzene each year
according to the NEI.
Google Earth image of the site with the second highest benzene
concentrations in the United States. Refineries to the right and left emitted
84,000 and 44,000 lbs of benzene in 2004 (NEI).
-------
Hot and Cold Spot Analysis
Example - Arsenic PM2.5
[Figure: U.S. map of arsenic PM2.5 monitoring sites marking the 10 highest sites, the 10 lowest sites, and other sites, with an Alaska inset.]
The figure shows sites with the 10 highest and 10 lowest arsenic PM2.5 concentrations based on 2003-2005
annual averages. Other monitoring sites are shown in yellow. Conceptually, we would expect arsenic PM2.5
concentrations to be highest in locations dominated by point source emissions, especially smelting and coal
combustion. The highest sites are consistent with this conceptual model. The lowest sites are located in
extremely remote locations such as Alaska and U.S. national parks, which is reasonable for the lowest arsenic
PM2.5 concentrations.
-------
Urban vs. Rural Analysis
Overview
• Measured concentrations can be highly dependent on individual monitor
locations, geography, emissions sources, and meteorological conditions
(e.g., prevailing winds).
• Urban areas - conceptual model
- Urban areas contain sources of air toxics that result in increased concentrations
and, in some cases, "hot spots" (areas with disproportionately higher
concentrations) in the spatial pattern.
- Urban concentrations vary greatly from day to day due to the mix of local
sources and meteorology.
• Rural areas - conceptual model
- Rural areas typically have fewer sources of air toxics. Air toxics concentrations
that are transported from urban locations are typically near background levels
when they reach rural areas (a function of source strength, distance, and the
lifetime of the pollutant).
- Concentrations do not vary consistently day to day. Daily and seasonal patterns
that are dependent on meteorological conditions may still be observed.
• Urban and rural sites that do not meet the expectations of conceptual
models may indicate monitoring location effects or data errors or problems
with the conceptual model.
-------
Urban vs. Rural Analysis
Approach
• Characterize each site as urban or rural.
- If available, start with EPA urban/rural designations as listed in AQS (note that these designations are not
always up to date)
- Verify the designations using Google Earth—they may be outdated or incorrect
- Be wary of defining a site using population density, total county population, or other metrics—local knowledge
of the site appears to be the best way to identify site characteristics.
• Identify pollutant availability and time period for each site.
- The goal is to have a spatially representative mix of urban and rural sites measuring a pollutant over the same
time period. This mix can be a challenge since toxics are more commonly measured in urban locations.
• Choose pollutant/site combinations that are spatially and temporally representative.
- Pollutant-specific monitoring time periods need to be the same for site comparison; otherwise differences in
observed concentrations could be biased by seasonal or inter-annual patterns.
• Estimate valid 24-hr averages for the sites, pollutants, and time periods of interest.
- Characterize all concentration averages that are below the associated average MDL
• Visualize the data by site by preparing plots of data distributions, including some measure of the
data below detection. Look for differences in concentrations.
• Identify statistically significant differences in urban vs. rural site concentrations.
• Summarize the results with a focus on neighboring urban vs. rural sites.
- Which urban and rural sites measured significantly higher or significantly lower concentrations, if either?
Which showed no difference?
• Investigate data that do not meet expectations (e.g., concentrations at a rural site may be significantly
higher than those at a nearby urban site).
- Are the sites representative of the area (i.e., compare to other urban or rural sites)?
- Are there monitor location abnormalities (e.g., local terrain, prevailing winds)?
- Are there measurement methods or MDL differences between the sites?
- Is there a significant rural emissions source?
- Are possible data errors or outliers driving the trend?
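One simple way to screen for a significant urban vs. rural difference is the comparison that a notched box plot makes visually: two medians are treated as different (at roughly the 95% level) when their notches do not overlap. The sketch below uses the common notch half-width of 1.57 x IQR / sqrt(n). The function names and the example concentrations are assumptions made for illustration; a formal nonparametric test may be preferable for reporting.

```python
import math
import statistics


def notch_interval(values):
    """Approximate 95% interval around the median, as drawn by a notched box plot."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return med - half_width, med + half_width


def medians_differ(site_a, site_b):
    """True when the two notch intervals do not overlap."""
    a_lo, a_hi = notch_interval(site_a)
    b_lo, b_hi = notch_interval(site_b)
    return a_lo > b_hi or b_lo > a_hi


# Illustrative 24-hr averages only (not measured data), same time period at both sites.
urban = [1.2, 1.5, 0.9, 1.8, 1.4, 1.1, 1.6, 1.3]
rural = [0.3, 0.4, 0.2, 0.5, 0.35, 0.25, 0.45, 0.3]
print("Urban notch:", notch_interval(urban))
print("Rural notch:", notch_interval(rural))
print("Medians differ:", medians_differ(urban, rural))
```

This check says nothing about data below detection, so the percentage of values below the MDL at each site still needs to be reported alongside the result.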
-------
Urban vs. Rural Analysis
Example - Investigating Urban vs. Rural Sites (1 of 2)
When beginning an urban vs. rural analysis, it is important to verify that sites are properly
designated "urban" or "rural". This example is qualitative.
The pictures below show a map of urban and rural NATTS sites across the United States along
with Google Earth pictures of two of the rural sites: Grand Junction, Colorado, and La Grande,
Oregon.
Both sites are designated as rural in AQS, but the Colorado site appears quite urban in
character, and it is likely that air toxics concentrations will not conform to the model for a rural
site.
The Oregon site, on the other hand, is rural, based on the observation that the surrounding area
is mainly farmland.
[Figure: map titled "NATTS Sites - 2006" with Google Earth images of the Grand Junction, CO, and La Grande, OR, sites.]
Urban sites: E. Providence, RI; Boston (Roxbury), MA; New York, NY; Rochester, NY; Washington, DC; Decatur, GA; Tampa, FL; Detroit, MI; Chicago, IL; Houston (Deer Park), TX; St. Louis, MO; Bountiful, UT; San Jose, CA; Phoenix, AZ; Seattle, WA.
Rural sites: Underhill, VT; Hazard, KY; Chesterfield, SC; Mayville, WI; Grand Junction, CO; La Grande, OR; Harrison County, TX.
Two rural sites in the NATTS network. Images obtained from Google Earth.
-------
Urban vs. Rural Analysis
Example - Investigating Urban vs. Rural Sites (2 of 2)
The figure shows benzene concentrations at a rural Vermont site compared to
concentrations at two urban northeastern sites.
The rural site shows statistically significantly lower concentrations.
If a site does not fit an urban or rural definition as expected, check for
- Measurement method or MDL differences
- Local emissions sources
- Time series comparing the two sites
with color-coded data below detection.
- Evaluate data subsets when both sites
have measurements above detection.
Does this tell a different story?
[Figure: box plots with overlaid dot density of benzene concentrations (log scale, 0.001 to 100) for RURAL VT, URBAN MA, and URBAN RI; blue = above MDL.]
The example figure is from an analysis of NATTS sites using 2003-
2005, 24-hr average, benzene data. The box plots encompass all
data while the overlaid dot density shows each data point and whether
it is above or below detection (blue vs. red). It was produced in
SYSTAT 11.
-------
Spatial Patterns
Summary
• Analyses described in this section provide information about a variety of
aspects of air toxics spatial variability and help analysts evaluate multiple
conceptual models.
• Spatial patterns can provide information about sources, sinks, transport,
and dispersion which are of interest for air toxics analyses.
• At a national level, the following spatial patterns were observed for air
toxics.
- Benzene, 1,3-butadiene
• Concentrations vary around the United States and are high in urban areas. The
highest concentrations of these two air toxics, however, are found in areas influenced
by point source emissions in addition to mobile sources.
• Within- and between-city variability is generally near a factor of 5.
- Carbonyl compounds
• Carbonyl compounds are measured widely and show very consistent concentrations
across the nation. This is due to the dominant secondary formation mechanism.
• Within- and between-city variability is relatively low with few exceptions.
- PM2.5 metals
• The spatial character of PM2.5 metals is difficult to determine due to differences in
measurement methods and MDLs among monitoring networks.
• Overall it seems that concentrations are slightly higher in the eastern half of the United
States.
-------
Risk Screening
Overview
A key use of air toxics data is to compare annual
average concentrations to health thresholds to put
ambient levels into context.
Risk screening can help identify air toxics of concern.
Information to consider in conducting a risk screening is
available, for example, in "A Preliminary Risk-Based
Screening Approach for Air Toxics Monitoring Data
Sets".
For information on a more thorough air toxics risk
assessment, see the Air Toxics Risk Assessment
Library.
-------
Risk Screening
Approach
For this first level of screening, site average concentration data from the
most recent year(s) (e.g., 2003-2005) were used to identify the number of
sites at which a pollutant was definitively above or below the relevant EPA
OAQPS chronic exposure concentration associated with a 1-in-a-million
cancer risk as found at: http://www.epa.gov/ttn/atw/toxsource/summary.html.
Results are ranked by screening level.
Air toxics were also noted if most site concentrations could not be
characterized as above or below the relevant risk level with certainty.
The figure shows steps through a decision tree for performing risk screening.
[Figure: risk screening decision tree.
Decision boxes: "Is 85% of data for this site-pollutant below MDL?"; "Is level of concern above MDL?"; "Is site-average concentration above level of concern?"
Outcome boxes: "Pollutant concentration is below health level of concern"; "Site-pollutant is uncertain"; "Pollutant concentration is above health level of concern".
Risk labels on the outcomes include "Upper limit of risk 1x10-6" and "Risk > 1x10-6".]
The % of data below MDL listed in the first box may need to be stricter or
less strict to meet your DQOs.
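The decision tree can be expressed as a short function. The sketch below is one plausible reading of the tree, not a confirmed transcription: the branch wiring (mostly non-detect data with a level of concern above the MDL is treated as "below"; mostly non-detect data with a level of concern at or below the MDL is treated as "uncertain"), the function name, the argument names, and the example numbers are all assumptions made here.

```python
def screen_site_pollutant(frac_below_mdl, site_average, mdl, level_of_concern,
                          max_frac_below_mdl=0.85):
    """Classify one site-pollutant as 'above', 'below', or 'uncertain'.

    All concentrations must share the same units. max_frac_below_mdl is the
    85% figure from the first box and may be tightened to meet your DQOs.
    """
    if frac_below_mdl >= max_frac_below_mdl:
        # Mostly non-detects: the average cannot be trusted, so only the
        # MDL itself can bound the concentration (assumed branch logic).
        if level_of_concern > mdl:
            return "below"
        return "uncertain"
    if site_average > level_of_concern:
        return "above"
    return "below"


# Illustrative values only (not measured data).
print(screen_site_pollutant(0.90, 0.01, 0.05, 0.13))   # mostly non-detect, MDL below level of concern
print(screen_site_pollutant(0.90, 0.01, 0.05, 0.002))  # mostly non-detect, MDL above level of concern
print(screen_site_pollutant(0.20, 1.00, 0.05, 0.13))   # well-detected, average above level of concern
```

Counting the sites in each class by pollutant then gives the inputs needed to rank pollutants by screening level.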
-------
Risk Screening
Example
Categories, in order of decreasing risk:
- Concentrations above 1-in-100,000 cancer risk level at >25% of sites
- Concentrations above 1-in-1,000,000 cancer risk level at >50% of sites
- Concentrations above 1-in-1,000,000 cancer risk level at 10-50% of sites
Pollutants, listed in order of decreasing risk:
Benzene; Acrylonitrile; Arsenic (PM2.5 and PM10); Acetaldehyde; Carbon tetrachloride; 1,3-Butadiene; Nickel; Chromium; Tetrachloroethylene; Cadmium (PM10 and TSP); Naphthalene; 1,4-Dichlorobenzene; Benzyl chloride
This table displays only pollutants whose concentrations were monitored well
enough to support a conclusion that they were above the relevant health levels of
concern for pollutants for which at least 20 monitoring sites existed in the United
States from 2003-2005.
We are confident that these pollutants are at or above the indicated cancer risk
categories (i.e., risk may be higher, but is not lower).
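The three categories in the table can be assigned from site counts. The following is a minimal sketch under stated assumptions: the function name, argument names, and returned labels are hypothetical, the 20-site minimum is taken from the text above, and the counts are assumed to include only sites whose concentrations were definitively above the risk level.

```python
def cancer_risk_category(n_sites, n_above_1e5, n_above_1e6, min_sites=20):
    """Return a category label, or None if the pollutant does not qualify.

    n_above_1e5: sites above the 1-in-100,000 cancer risk level
    n_above_1e6: sites above the 1-in-1,000,000 cancer risk level
    """
    if n_sites < min_sites:
        return None  # too few monitoring sites to support a conclusion
    frac_1e5 = n_above_1e5 / n_sites
    frac_1e6 = n_above_1e6 / n_sites
    if frac_1e5 > 0.25:
        return ">1-in-100,000 at >25% of sites"
    if frac_1e6 > 0.50:
        return ">1-in-1,000,000 at >50% of sites"
    if frac_1e6 >= 0.10:
        return ">1-in-1,000,000 at 10-50% of sites"
    return None


# Illustrative counts only (not monitoring results).
print(cancer_risk_category(40, 12, 30))
print(cancer_risk_category(40, 4, 25))
print(cancer_risk_category(40, 0, 8))
```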
-------
Risk Screening
Summary
Risk screening results at a national level are provided in the following table.
At a regional, state, or local level, results may differ. This table provides a
context for comparing local results.
Higher confidence - chronic cancer risk (ordered by importance):
Benzene; Acrylonitrile; Arsenic; Acetaldehyde; Carbon tetrachloride; 1,3-Butadiene; Nickel; Chromium; Tetrachloroethene; Naphthalene; Cadmium; 1,4-Dichlorobenzene; Benzyl chloride
Lower confidence - chronic cancer risk (ordered by importance):
Ethylene dibromide; 1,1,2,2-Tetrachloroethane; 1,2-Dibromo-3-chloropropane; Ethylene oxide; Ethylene dichloride; Hexachlorobutadiene; 1,2-Dichloropropane; 1,1,2-Trichloroethane; Vinyl chloride; Trichloroethylene; Benzo[a]pyrene; Dibenzo[a,h]anthracene; 3-Chloropropene
High confidence - chronic and acute noncancer hazard:
Acrolein
Local chronic hazard: Formaldehyde; Manganese; Acrylonitrile; 1,3-Butadiene; Nickel
-------
Summary
(1 of 2)
Check List for Ways to Characterize Air Toxics
Temporal Characterization
The general procedure for investigating temporal patterns
is the same for all aggregates.
- Prepare valid concentration and normalized temporal aggregates and
summary statistics.
• Normalization allows comparison between sites and pollutants even if
absolute concentration values vary widely.
• Keep track of the amount of data below detection.
- Plot data with notched box plots or line graphs of multiple statistics
(e.g., mean vs. 90th and 10th percentiles) with confidence intervals.
- Characterize patterns by pollutant
• Do patterns fit your conceptual model?
• Are they statistically significant?
- Investigate unexpected results
Diurnal patterns - If alternate sampling schedules are
used, calculate the weighted average by the most
representative sampling hour; otherwise, diurnal patterns
may be obscured.
Day-of-week patterns - Examine data availability by day-
of-week.
- If sufficient data exist for each day of the week, examine day-of-week
patterns.
- If insufficient data exist, weekday vs. weekend groupings can be used.
Seasonal patterns - Aggregate to the monthly level if
sufficient data exist. Use quarterly averages if data are
not sufficient or monthly patterns are too noisy.
Compare what you have learned from the different
temporal aggregates. Do conclusions make sense in the
larger temporal picture?
For example, the diurnal pattern of formaldehyde suggests that
concentrations are highly dependent on sunlight. This dependency is
confirmed by the seasonal pattern, which shows higher concentrations in
summer (i.e., more sunlight).
Spatial Characterization
General spatial patterns
- Create site level average values by pollutant for the time period of interest.
Make sure data are temporally comparable at all sites.
- Investigate spatial variability by calculating and graphing summary
statistics of the site averages. The results provide overview information
about the magnitude of spatial variation.
- Visualize spatial variability by creating maps of the site-level average
concentrations.
• Results will provide more specific information about the spatial gradients of air
toxics.
• Including supplementary data such as MDLs, remote background
concentrations, and cancer and noncancer risk levels provides a framework
for the observed concentrations.
Within- and between-city variation
- Calculate valid annual averages for each site within a city that has more
than one monitor.
- Create notched box plots of annual averages by city.
• Each box will contain one point for each monitor, so the box will indicate
within-city variability.
• Including multiple cities on one plot will provide a comparison of between-city
variability.
Hot and cold spot analysis
- Calculate valid annual averages for each site.
- Rank the averages in order of concentration.
- Using maps, compare sites with highest and lowest concentrations to all
sites.
- Investigate data and metadata for the sites with highest and lowest
concentrations. Do concentrations make sense based on the metadata
and conceptual models?
Urban vs. rural site analysis
- Verify the EPA urban/rural designation of each site using Google Earth.
- Identify pollutant data availability and time period.
- Create a data set of pollutant/site combinations that are spatially and
temporally representative.
- Plot valid 24-hr average data as notched box plots for neighboring urban
and rural sites.
- Summarize the results and investigate sites that do not meet the
conceptual model of an urban or rural site.
-------
Summary
Check List for Characterizing Air Toxics
Risk Screening
Create valid site average concentration data
for the most recent years.
Calculate the percent of sites above the
selected risk level and the percent of data
below detection.
Follow the risk screening decision tree to
identify the exposure risk for each pollutant.
More advanced risk analyses should be
performed by risk assessment professionals.
A Final Note on
Data Below Detection
Most air toxics have enough data below
detection to cause uncertainties and/or biases in
aggregated data if not handled properly.
Note, however, that it is not valid to remove
these data because they are representative of
true values on the lower end of the concentration
spectrum; removal would cause even more
significant positive biases.
It is always important to know the amount of
data below detection when looking at any data
set. The effects of data below detection should
be considered in all analyses.
In national analyses, we did not draw
conclusions when more than 85% of the
measurements of a pollutant were below
detection.
-------
Resources
Statistical
- StatSoft: Background on a variety of statistics
- NIST Engineering Statistics: Background on a variety of statistics
- SYSTAT: A graphical and statistical tool
- Minitab: A graphical and statistical tool
Emissions
- EPA AirData: Air toxics emissions reports to the county level
- National Emissions Inventory 2002: Emissions inventory for the United
States; some Canada and Mexico data also available.
- EPA Toxics Release Inventory (TRI): A variety of emissions data sets
-------
References
Bortnick S.M. and Stetzer S.L. (2002) Sources of variability in ambient air toxics monitoring data. Atmos. Environ. 36, 1783-1791 (11).
Demerjian K.L. (2000) A review of national monitoring networks in North America. Atmos. Environ. 34,1861-1884.
Fortin T.J., Howard B.J., Parrish D.D., Goldan P.O., Kuster W.C., Atlas E.L., and Harley R.A. (2005) Temporal changes in U.S. benzene
emissions inferred from atmospheric measurements. Environ. Sci. Technol. 39,1403-1408 (6).
Grosjean D., Swanson R.D., and Ellis C. (1983) Carbonyls in Los Angeles air-contribution of direct emissions and photochemistry. Sci. Total
Environ. 29,65-85(1-2).
Grosjean D. (1982) Formaldehyde and other carbonyls in Los-Angeles ambient air. Environ. Sci. Technol. 16, 254-262 (5).
Hafner H.R. and McCarthy M.C. (2004) Phase III air toxics data analysis workbook. Workbook prepared for the Lake Michigan Air Directors
Consortium, Des Plaines, IL, by Sonoma Technology, Inc., Petaluma, CA, STI-903553-2592-WB, August.
Herrington, J.S., Fan, Z., Lioy P.J., and Zhang, J. (2007) Low acetaldehyde collection efficiencies for 24-hour sampling with
2,4-dinitrophenylhydrazine (DNPH)-coated solid sorbents. Environ. Sci. Technol. 41 (2), 580 -585.
Kao A.S. (1994) Formation and removal reactions of hazardous air-pollutants. J. Air & Waste Manag. Assoc. 44, 683-696 (5).
Main H.H., Roberts P.T., and Reiss R. (1998) Analysis of photochemical assessment monitoring station (PAMS) data to evaluate a reformulated
gasoline (RFG) effect. Final report prepared for the U.S. Environmental Protection Agency, Office of Mobile Sources, Fuels and Energy
Division, Washington, DC, by Sonoma Technology, Inc., Santa Rosa, CA, STI-997350-1774-FR2, April.
Main H.H. and Bortnick S. (2002) Temporal variability in ambient air toxics: implications to monitoring network design. Presentation at the
Coordinating Research Council (CRC) Air Toxics Modeling Workshop, Houston, TX, February 26-27 (STI-2153).
McCarthy M.C., Hafner H.R., Chinkin L.R., Touma J.S., and Cox W.M. (2005) Temporal variability of selected air toxics: a national perspective.
Prepared for the United States Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park,
NC, and Sonoma Technology, Inc., Petaluma, CA. Available on the Internet at last accessed
September 2, 2005.
McCarthy M.C., Hafner H.R., Chinkin L.R., and Charrier J.G. (2007) Temporal variability of selected air toxics in the United States. Atmos.
Environ., doi:10.1016/j.atmosenv.2007.1005.1037 (STI-2894).
McCulloch A. (2003) Chloroform in the environment: occurrence, sources, sinks and effects. Chemosphere 50, 1291-1308 (10).
Seinfeld J.H. and Pandis S.N. (1998) Atmospheric chemistry and physics: from air pollution to global change, J. Wiley and Sons, Inc., New
York, New York.
Singh H.B., Salas L., Viezee W., Sitton B., and Ferek R. (1992) Measurement of volatile organic-chemicals at selected sites in California.
Atmos. Environ. Part a-General Topics 26, 2929-2946 (16).
Spicer C.W., Buxton B.E., Holdren M.W., Smith D.L., Kelly T.J., Rust S.W., Pate A.D., Sverdrup G.M., and Chuang J.C. (1996) Variability of
hazardous air pollutants in an urban area. Atmos. Environ. 30, 3443-3456 (20).
US EPA Toxics Release Inventory (TRI) Explorer. Available on the Internet.
US EPA National Air Toxics Trends Stations. Available on the Internet.
Vardoulakis S., Gonzalez-Flesca N., Fisher B.E.A., and Pericleous K. (2005) Spatial variability of air pollution in the vicinity of a permanent
monitoring station in central Paris. Atmos. Environ. 39, 2725-2736 (DOI:10.1016/j.atmosenv.2004.05.067).
StatSoft, Inc. (2005) The Statistics Homepage. Available on the Internet.
-------
Quantifying and Interpreting
Trends in Air Toxics
Are air toxics concentrations changing?
Are the ambient concentration changes in response
to changes in emissions?
June 2009
Section 6 - Quantifying Trends
-------
Trends in Air Toxics
What's Covered in This Section
• This section focuses on trends in ambient air toxics over time; diurnal
and seasonal trends are discussed in Characterizing Air Toxics,
Section 5.
• The following topics are addressed in this section:
- Quantifying Trends
• Overview of trends analysis
• Setting up the data for trend analyses
• Effect of changes in MDL on trends
• Summarizing trends
• Discerning and quantifying trends
- Quantifying Trends
- Visualizing Trends
• Aggregating trends to larger spatial areas
- Interpreting Trends
• Evaluating annual trends in the context of control programs
• Adjusting trends for meteorology (introductory)
-------
Trends Overview
Motivation
Assessing trends is useful. Monitoring data are needed to track air toxics concentrations and their changes over time.
One of the major programmatic objectives for air toxics measurements is providing data to track progress toward
emission and risk-reduction goals. The ability to detect trends in ambient concentrations that are associated with planned
air quality control efforts is needed to assess the effectiveness of emission control programs. For example, if specific
control strategies have been implemented in an area to reduce emissions of tetrachloroethylene from dry cleaners, do
the ambient data indicate that concentrations have decreased since the implementation of the control?
Visual inspection of trends is important. Air quality data typically do not fit a normal distribution. The data tend to be
skewed and exhibit a few high concentration events. Thus, trends in extreme values in a data set may differ significantly
from trends observed in a statistic that describes the bulk of the data. Different statistical metrics can be examined to
look for trends. For example, the annual maximum pollutant concentrations can be plotted to assess how annual peak
days are changing over time, or the median concentrations can be plotted to assess how the 50th percentile of the days
are changing. In addition, to assess a trend in air quality, representative data are required to estimate a trend that is
meaningful.
Understanding the data uncertainties is necessary. Uncertainties impact our ability to clearly discern air quality
trends and distinguish between "real" changes and artifacts. For example, measurement accuracy, interferences, and
the amount of data above method detection limits need to be understood to properly interpret the data.
Obtaining consensus (or weight of evidence) among results from different approaches increases our certainty
in the observed trends. Quantifying and interpreting trends can be complicated (e.g., there are many different
methods). The analyst needs to understand methods for quantifying trends and determining their statistical significance.
When several different approaches or "looks" at the data point to the same conclusion, confidence in the conclusion is
increased. The analyst also needs to be able to communicate the results in a meaningful and understandable way.
Interpretation of trends from site level to larger scales, such as city-wide or regional scale, needs to be done with care.
Some site and pollutant combinations may be dominated by local sources or comparisons between some sites may not
be reasonable because of large differences between sampling methods.
-------
Trends Overview
Analysis Questions
Are concentration levels changing at a monitoring site?
Are changes consistent across sites, areas, or regions?
Are changes consistent across pollutants or pollutant
groups?
Are changes consistent across time periods?
Are changes consistent with expectations (e.g., emissions
controls, changes in population)?
-------
Setting Up Data for Trend Analysis
Overview
Steps to prepare data for trend analysis:
- Acquire and validate data (covered in Preparing Data for
Analysis, Section 4)
- Identify and treat data below detection in preparation for annual
averages (covered in this section)
- Create valid annual averages or other metrics for trends
(subannual data averaging is covered in Preparing Data for
Analysis, Section 4)
- Create valid site-level trends (covered in this section)
-------
Setting Up Data for Trend Analysis
Identifying Censored Data
• Data are typically reported as a concentration value with an accompanying method detection limit
(MDL). In AQS, the MDL is either a default value associated with the analytical method (MDL) or a
value assigned by the reporting entity for that specific record (alternate MDL).
• NATTS program guidance suggests that laboratories report all values, regardless of the MDL.
However, many air toxics data are reported as censored values; i.e., they have been replaced with
zero, MDL/2, or MDL (or some other value).
• Identifying censored values is a helpful first step in treating data below detection. Reporting of
censored data will most likely differ among sites and may even be different by method, parameter
or time period for a given site. For this reason it is recommended that censored data analyses be
carried out for each site, parameter, and method, and temporal variability should be considered.
• Data may be identified and separated at or below the detection limit along with the associated MDL
and date/time; if alternate MDLs are available, it is recommended they be used rather than the
default MDLs.
• Data may be examined for obvious substitution. Count the number of times each value at or below
detection is reported at a given site, parameter, and method. Are the majority of data reported as
the same value (e.g., zero or MDL/2)?
- If data are largely reported as two or more values, investigate the temporal variation of the data. Are there
large step changes where reporting methods or MDLs have changed?
- Do the duplicate values indicate a typical censoring method (e.g., MDL/2, MDL/10)?
- Alternate MDLs may be different for each sample run causing a distribution of values if MDL/x substitutions
were used. Just because values below MDL are not all the same does not mean they are not censored!
• Check for MDL/X substitution.
- Make a scatter plot of the value vs. MDL to see if the data fall on a straight line.
- If the data do form a straight line, the slope of the regression line will indicate the value by which the MDL has
been divided.
- Is the value a reasonable number that would be used for MDL substitution (e.g., 1, 2, 5, or 10)?
- If the data have been formatted, processed or converted, ratios may not be exactly the same due to rounding differences; the
distribution should be close to a straight line and centered around a single integer if MDL/x substitutions have been made.
- If a bifurcated pattern is observed, the substitution method may have changed over time. Plot a time series of the ratios and look for
step changes.
- The distribution of the ratios should be highly variable if the data are not censored.
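The checks for obvious substitution can be partly automated. The sketch below counts zeros and tallies how many values at or below the MDL sit at MDL/x for a typical divisor x. It is an illustration only: the function name, the (value, MDL) record layout, the 2% tolerance for rounding differences, and the example records are assumptions made here, and the result should be confirmed with the scatter plot and time series described above.

```python
from collections import Counter


def substitution_summary(records, divisors=(1, 2, 5, 10), tol=0.02):
    """Summarize likely MDL/x substitution for one site, parameter, and method.

    records: iterable of (value, mdl) pairs; use alternate MDLs when available.
    Returns None if no values are at or below the MDL.
    """
    below = [(v, m) for v, m in records if v <= m]
    if not below:
        return None
    zeros = sum(1 for v, _ in below if v == 0)
    by_divisor = Counter()
    for v, m in below:
        if v == 0:
            continue
        ratio = m / v  # equals x if the value was reported as MDL/x
        nearest = min(divisors, key=lambda d: abs(ratio - d))
        if abs(ratio - nearest) <= tol * nearest:
            by_divisor[nearest] += 1
    return {"n_below": len(below), "zeros": zeros, "by_divisor": dict(by_divisor)}


# Illustrative records only (not measured data): (value, MDL).
records = [(0.05, 0.10), (0.04, 0.08), (0.06, 0.12),
           (0.30, 0.10), (0.0, 0.10), (0.037, 0.10)]
print(substitution_summary(records))
```

A divisor of 1 in the tally corresponds to values reported as the MDL itself. Values below the MDL that match no divisor, such as the last record above, are candidates for uncensored data.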
-------
Setting Up Data for Trend Analysis
Treating Data Below Detection (1 of 2)
• Following are suggested steps to create averages:
- If uncensored values (i.e., NOT zero, MDL/2, or MDL) are reported below
MDL, use the data "as is" with no substitution.
- If uncensored values are not available, substitute MDL/2 for data below
MDL or use more sophisticated methods as described in Section 4.
- If there is a mix of censored and uncensored data, compare two substitution
methods: (1) substitute MDL/2 for censored values and leave uncensored values
"as is", and (2) substitute MDL/2 for all data below detection.
• If results are in the same direction using both substitution methods, confidence
in the results is increased and substitution method 1 should be retained. If the
results do not agree, a more sophisticated method for estimating the data
below MDL should be employed.
- For all data sets, identify the percentage of data below MDL for each year
in the trend period. It is important to keep track of how much data are
below detection to better understand possible biases in the average.
Even if censored values are not used, keep a record of this information to
provide one measure of the uncertainty in the results.
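For a mixed data set, the comparison of the two substitution methods can be sketched as follows. This is a simplified illustration: the function names, the (value, MDL, is_censored) record layout, and the example values are assumptions made here, and "same direction" is reduced to comparing the sign of the change between two years rather than a full trend estimate.

```python
def annual_mean(records, substitute_all_below_mdl):
    """Mean of one year's records with MDL/2 substitution.

    records: list of (value, mdl, is_censored) tuples.
    Method 1 (substitute_all_below_mdl=False): MDL/2 for censored values only.
    Method 2 (substitute_all_below_mdl=True): MDL/2 for all values below MDL.
    """
    total = 0.0
    for value, mdl, is_censored in records:
        if is_censored or (substitute_all_below_mdl and value < mdl):
            total += mdl / 2
        else:
            total += value
    return total / len(records)


def same_direction(first_year, last_year):
    """True when both methods agree on whether the mean increased."""
    change_1 = annual_mean(last_year, False) - annual_mean(first_year, False)
    change_2 = annual_mean(last_year, True) - annual_mean(first_year, True)
    return (change_1 > 0) == (change_2 > 0)


# Illustrative records only (not measured data): (value, MDL, is_censored).
year_a = [(0.05, 0.10, True), (0.03, 0.10, False), (0.40, 0.10, False), (0.25, 0.10, False)]
year_b = [(0.05, 0.10, True), (0.02, 0.10, False), (0.20, 0.10, False), (0.15, 0.10, False)]
print("Method 1 means:", annual_mean(year_a, False), annual_mean(year_b, False))
print("Method 2 means:", annual_mean(year_a, True), annual_mean(year_b, True))
print("Methods agree on direction:", same_direction(year_a, year_b))
```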
-------
Setting Up Data for Trend Analysis
Treating Data Below Detection (2 of 2)
Each annual average should have an associated calculation of the percent below
detection. These data provide information about the biases of the annual average
when data are below detection.
When assessing trends over time for a pollutant,
- Assess trends at all sites regardless of the percent of data below MDL. Note, however,
that data are below detection for many site/pollutant combinations. To avoid over-
interpretation of observed trends, it is recommended the trend values and their associated
percent below detection be visually inspected. Consider trends at sites where at least half
of the years for a given trend period have at least 15% of their measurements above MDL
for that year.
For the national level analyses, a 15% "cut-off" was selected based on review of a
small data set with most data above detection. Bias in the annual average was
investigated for this data set across a range of percent of data below detection. At
15% below detection, the bias in the annual average was 10-40%. A more
stringent cut-off may be required if less bias is desirable.
- For example, if a 5% concentration change was observed but all years have greater than
85% data below detection, the analyst cannot be sure whether this change is real or an
effect of data below detection. In other words, the uncertainty masks the possible change.
In all cases, the percent below MDL should be considered as a possible source of
bias when interpreting site level trends.
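The bookkeeping described above (percent of data above the MDL for each year, and whether at least half of the years in a trend period reach the 15% level) can be sketched as below. The function names, the (year, value, MDL) record layout, and the example numbers are assumptions made for illustration; the 15% threshold is exposed as a parameter because a more stringent cut-off may be needed.

```python
def percent_above_mdl_by_year(records):
    """records: iterable of (year, value, mdl). Returns {year: percent above MDL}."""
    totals, above = {}, {}
    for year, value, mdl in records:
        totals[year] = totals.get(year, 0) + 1
        above[year] = above.get(year, 0) + (1 if value > mdl else 0)
    return {year: 100.0 * above[year] / totals[year] for year in totals}


def site_meets_detection_criterion(pct_above_by_year, min_pct_above=15.0):
    """True if at least half of the years have >= min_pct_above percent above MDL."""
    years_ok = sum(1 for pct in pct_above_by_year.values() if pct >= min_pct_above)
    return years_ok >= len(pct_above_by_year) / 2


# Illustrative records only (not measured data): (year, value, MDL).
records = [(2003, 0.20, 0.10), (2003, 0.05, 0.10),
           (2004, 0.05, 0.10), (2004, 0.03, 0.10)]
pct = percent_above_mdl_by_year(records)
print(pct)
print(site_meets_detection_criterion(pct))
```

Keeping the per-year percentages next to each annual average makes it easier to inspect trend values and detection rates together.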
-------
Setting Up Data for Trend Analysis
Creating Valid Annual Averages
Data averaging is fully covered in Preparing Data for Analysis, Section 4,
and summarized here for convenience.
• Subdaily data should first be aggregated to valid 24-hr averages. For a given day,
75% of data at the expected subdaily sampling duration is suggested for a valid
24-hr average.
• 75% of data at the expected daily sampling frequency is suggested for a valid
calendar quarter average.
Frequency         75% Quarterly Completeness Cutoff (samples per quarter)
Daily             68
Every 3rd Day     23
Every 6th Day     11
Every 12th Day    5
Unassigned        5
At least 58 days are suggested between the first and last sample in a quarter to
ensure that sampling represents the entire quarter.
Data for 3 of 4 quarters are suggested for annual averages prepared from quarterly
averages to ensure that sampling represents the entire year. Some air toxics
concentrations show significant seasonal variations.
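The quarterly and annual completeness checks summarized on this slide could be applied along the following lines. This is a sketch under stated assumptions: the function names and the frequency keys are hypothetical, the input is assumed to already be valid 24-hr averages for a single quarter, and the annual average is taken as the mean of the valid quarterly averages.

```python
from datetime import date, timedelta

# 75% quarterly completeness cutoffs from the table on this slide
QUARTERLY_CUTOFF = {"daily": 68, "every_3rd_day": 23, "every_6th_day": 11,
                    "every_12th_day": 5, "unassigned": 5}

def quarter_average(daily_values, frequency):
    """Average one quarter of valid 24-hr values, or None if incomplete.

    `daily_values` maps datetime.date -> 24-hr average for a single quarter.
    """
    if len(daily_values) < QUARTERLY_CUTOFF[frequency]:
        return None
    span_days = (max(daily_values) - min(daily_values)).days
    if span_days < 58:  # first and last sample must be at least 58 days apart
        return None
    return sum(daily_values.values()) / len(daily_values)

def annual_average(quarter_averages, min_quarters=3):
    """Mean of the valid quarterly averages; needs at least 3 of 4 quarters."""
    valid = [q for q in quarter_averages if q is not None]
    if len(valid) < min_quarters:
        return None
    return sum(valid) / len(valid)

# Hypothetical every-6th-day record for the first quarter of 2005 (15 samples)
q1 = {date(2005, 1, 1) + timedelta(days=6 * i): 1.0 + 0.1 * i for i in range(15)}
q1_avg = quarter_average(q1, "every_6th_day")
year_avg = annual_average([q1_avg, None, 2.0, 3.0])
```

Returning None for an incomplete quarter, rather than a partial average, makes it harder to carry an unrepresentative value into the annual average by accident.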
-------
Setting Up Data for Trend Analysis
Creating Valid Trends
Trends are investigated for a unique combination of parameter,
monitoring location, and method code.
• Initially, it is important to segregate method codes for a given parameter
and monitoring location to assess differences (e.g., biases, detection
limits) that might result in comparability issues. In addition, methods
may change over time, perhaps causing significant analytical biases that
may affect trends assessments. After investigating individual trends
(e.g., by method), further aggregation may be reasonable (discussed later
in this section).
• At a given monitoring location, sometimes more than one monitor
reports the same pollutant, known as a collocated measurement. When
collocated measurements are made, data from each monitor are
differentiated in AQS using POCs.
Collocated measurements should be investigated individually as outlined in
Preparing Data for Analysis, Section 4. If agreement between collocated
measurements is good, the data may be averaged for a given parameter, site,
date, and method in order to avoid double-counting. At the national level, these
data were not used.
-------
Setting Up Data for Trend Analysis
Trend Length and Completeness
Length and completeness criteria may be used to ensure that trends are
representative of the time period of interest and that data are consistent for
intercomparison among sites.
When choosing these criteria, analysts should strive to strike a balance
between maximizing available data and creating valid trends in the period of
interest.
It is easier to discern underlying trends over long time periods.
More stringent constraints result in a reduction of available data. For
example, by selecting longer trend periods, fewer sites will be available for
analysis because longer continuous operation is required. On the other hand,
shorter trend periods are subject to more variability, for example, because of
changes in meteorology which often obscure underlying trends.
[Figure: 10th-90th percentile range and median percentage change per year for n-hexane, 1,3-butadiene, and benzene over the 1990-2005, 1995-2005, and 2000-2005 trend periods, with the number of sites shown for each bar. x-axis: Percentage Change per Year (decreasing to increasing).]
In the example, three trend periods were investigated: 1990-2005, 1995-2005, and 2000-2005. Only 17 sites in the United States collected
benzene data over the 1990-2005 sampling period that met the completeness criteria. In contrast, data from 125 sites met the completeness
criteria for the shorter 2000-2005 trend period. Variability for shorter trend periods is much higher.
-------
Setting Up Data for Trend Analysis
Trend Length and Completeness
Trend Length
- One goal of the NATTS is to provide data with a minimum trend
length of six years to be able to compare two 3-yr averages.
- Of course, other trend periods are acceptable!
Trend Completeness
- Of the number of data years in a trend period, at least 75% is
suggested for a site to be included (e.g., for a six-year trend
period, at least five years of valid annual averages are
suggested).
- Trends with data gaps of more than two years should not be
used.
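The two completeness suggestions above lend themselves to a simple check. The following is a minimal sketch, assuming the 75% requirement is rounded up to a whole number of years (which reproduces the five-of-six example) and that a gap is a run of consecutive years without a valid annual average; the function name and arguments are hypothetical.

```python
import math

def trend_is_complete(valid_years, start_year, end_year,
                      min_fraction=0.75, max_gap_years=2):
    """Check a site's trend period against the suggested criteria.

    `valid_years` is the set of years with a valid annual average.
    """
    period = range(start_year, end_year + 1)
    present = [y for y in period if y in valid_years]
    required = math.ceil(min_fraction * len(period))  # e.g., 5 of 6 years
    if len(present) < required:
        return False
    # Longest run of consecutive missing years inside the trend period
    longest_gap = gap = 0
    for y in period:
        gap = 0 if y in valid_years else gap + 1
        longest_gap = max(longest_gap, gap)
    return longest_gap <= max_gap_years

ok_six_year = trend_is_complete({2000, 2001, 2002, 2004, 2005}, 2000, 2005)
too_sparse = trend_is_complete({2000, 2001, 2004, 2005}, 2000, 2005)
```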
-------
Setting Up Data for Trend Analysis
Example - Creating Valid Trends
This example illustrates why looking at trends by
method code is important.
• Figure (a) shows all annual averages for arsenic
PM2.5 at a site, color-coded by method. Solid lines
indicate annual averages and dashed lines show
average MDLs.
• Figure (b) shows the trend (blue) and average MDL
(pink) for all data at a site regardless of method (i.e.,
the same data as in Figure (a) connected into one
trend). This produces a statistically significant
increasing trend.
• Figure (c) shows the results if data are partitioned by
method. Only data with method 831 are retained
because this method is the only one to have a trend
period greater than four years. The results show a
statistically insignificant decreasing trend, opposite
the result obtained using all data.
Which trend result is "right"?
- The statistically significant trend in Figure (b) is driven
by the lower concentration values in 1996-1998. The
measured concentrations between 1996 and 2000
may be representative of ambient concentrations;
however, inconsistencies in sampling method and
MDLs cast doubt on the comparability of this data to
post-2000 data.
- In the end we cannot be sure which trend is "right";
more advanced analyses of the data should be
undertaken if time permits. At a national level, trends
could not be individually quality-controlled so they
were partitioned by method to reduce inconsistencies.
[Figure: Arsenic PM2.5 Annual Averages, 1994-2006. (a) Annual averages and average MDLs color-coded by method (Methods 800, 801, 802, and 831); (b) annual average concentration for all data connected into one trend; (c) annual average concentration for Method 831 only. Axes: concentration versus year.]
-------
Setting Up Data for Trend Analysis
Evaluating the Effect of Method Changes
• Due to the large number of data included in the national air toxics analysis, the effect
of changes in measurement methods and MDLs on trends could not be assessed on
a site-by-site basis.
• During more localized analyses, such differences may be investigated; not all method
changes need to be considered separately. Data may be retained across
comparable method changes in order to create the longest trend periods possible.
• Assessing the comparability of methods will be a case-by-case analysis; no one
procedure will provide the answer, but the following is a good start:
- Plot all available annual averages and associated average MDLs, color-coded by method for
each air toxic (as in Figure (a) on the previous slide); tabulate the percent of data below
detection by year.
- Visually assess method changes for unusual patterns in average concentration and MDL.
- If MDL changes occur, investigate the percent of data below detection to determine if MDL/2
substitutions are driving the difference. Keep in mind the percent of data below detection and
effect of MDL/2 substitutions for subsequent analyses.
- Examine trends in air toxics data that are not expected to change significantly between years
(e.g., carbon tetrachloride); significant jumps in annual average concentrations for these air
toxics may indicate a problem.
- Compare pollutants measured by the same methods that are expected to vary together (e.g.,
benzene and toluene) and look for discontinuities.
- Investigate collocated data together, if available. In some cases, a measurement method
may have changed in the primary monitor, but not in the secondary monitor. Look for
changes in the relationship in concentrations between the monitors.
-------
Effect of Changes in MDL on
Trends Assessment
Another important consideration in preparing data for trend
analysis is that detection limits can change over time for a given
monitoring site, parameter, and method. At a national scale,
some detection limits change by orders of magnitude.
These changes may influence annual averages, particularly if
MDL substitutions are used. Similar trends between MDL and
annual average concentrations may indicate that the changes in
MDL are strongly influencing the annual average trends.
It is recommended that the analyst inspect the trends in MDL in
addition to the trends in concentration, especially for air toxics with
concentrations close to the MDL (i.e., within a factor of 10).
More sophisticated statistical analysis may be needed to quantify
the underlying influence of the MDL changes on the ambient
concentrations. Such analysis has not yet been performed on the
national data set.
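As a first, informal screen for the situation described on this slide, an analyst might compare the year-to-year pattern of the average MDL with that of the annual average and list the years in which the concentration is within a factor of 10 of the MDL. The sketch below is illustrative only: the function names and data are hypothetical, a Pearson correlation is used as one possible measure of similarity, and a high correlation is a prompt for closer inspection rather than proof of MDL influence.

```python
import math

def pearson_r(x, y):
    """Pearson correlation, or None when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)

def mdl_influence_check(annual_avg, annual_mdl, near_factor=10.0):
    """Return (correlation, years with concentration < near_factor * MDL)."""
    years = sorted(set(annual_avg) & set(annual_mdl))
    conc = [annual_avg[y] for y in years]
    mdl = [annual_mdl[y] for y in years]
    near = [y for y in years if annual_avg[y] < near_factor * annual_mdl[y]]
    return pearson_r(conc, mdl), near

# Hypothetical annual averages and average MDLs that move together
avg = {2000: 0.0030, 2001: 0.0024, 2002: 0.0018, 2003: 0.0012}
mdl = {2000: 0.0010, 2001: 0.0008, 2002: 0.0006, 2003: 0.0004}
r, near_mdl_years = mdl_influence_check(avg, mdl)
```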
-------
Effect of Changes in MDL on
Trends Assessment Example (1 of 2)
[Figure: Average Mn PM2.5 concentration and average MDL by year, 1990-2003.]
In the national level investigation of manganese (Mn) trends, we noted that MDL trends were similar to
concentration trends. The clear correlation between the two trend lines makes us suspicious of the reliability of
the overall ambient trend. This example shows average Mn PM2.5 concentrations and MDLs from 1990 to 2003.
For this data set, Hyslop and White (2007) showed that reported MDLs are much lower than actual detection
limits. Current recommendations are to be cautious with data within a factor of 6 to 10 of the reported MDL. The
trend shown here may not be a real trend; these data may all be below detection.
-------
Effect of Changes in MDL on
Trends Assessment Example (2 of 2)
[Figure: Benzene 1997-2006 Trend, with regression line y = -0.02x + 48.63 (R^2 = 0.88); x-axis: year, 1996-2006.]
In contrast to the previous Mn PM2.5 trend, this benzene trend
does not show influence from a change in MDL (i.e., the trends in
concentration and MDL show different patterns).
-------
Quantifying Trends
Approach
• Initial investigation of trends
- Inspect first and last year of the trend period or two multi-year averages for
change.
- Use simple linear regression to determine the magnitude of a trend over the trend
period.
• Quantifying trends
- The percent difference between the first and last year of the trend period provides
a rough, first cut, sense of the change.
- The difference between two multi-year averages provides another measure of
change and helps smooth out possible influences of meteorology.
- The percent change per year is provided by the slope of the regression line. This
"normalized" value allows the analyst to compare changes across varying lengths
of time (i.e., sites with different trend periods).
• Testing the significance of the observed trends
- Calculate the significance of the slope using the F-test (see next slide). The
F-test provides a statistical measure of the confidence that there is a relationship
between the two variables (i.e., the regression line does not have a slope of zero
which would indicate that the dependent variable is not related to the independent
variable).
- Other methods can be employed to test for significance, including t-tests,
nonparametric tests (which test for and estimate a trend without making
distributional assumptions, such as Spearman's rho test of trend and
Kendall's tau test of trend), and analysis of variance.
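The regression-based steps above can be sketched with the Python standard library. This is one possible implementation, not the workbook's own: the function names and the annual averages are hypothetical, the percent change per year is taken as the percent change along the fitted line divided by the inclusive number of years (an assumption), and the F-test P-value is obtained by integrating the Student-t density numerically, using the fact that F = t^2 when there is a single independent variable.

```python
import math

def linear_trend(years, values):
    """Least-squares fit; returns slope, intercept, R^2, and F for H0: slope = 0."""
    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)          # undefined if all values are equal
    f_stat = r2 * (n - 2) / (1.0 - r2)   # undefined for a perfect fit
    return slope, intercept, r2, f_stat

def f_test_p_value(f_stat, n, steps=20000):
    """P-value for F with (1, n - 2) degrees of freedom via Simpson's rule."""
    df = n - 2
    t = math.sqrt(f_stat)
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def t_pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = t_pdf(0.0) + t_pdf(t)
    total += 4 * sum(t_pdf((2 * i - 1) * h) for i in range(1, steps // 2 + 1))
    total += 2 * sum(t_pdf(2 * i * h) for i in range(1, steps // 2))
    central = 2.0 * total * h / 3.0      # P(|T| <= t)
    return max(0.0, 1.0 - central)

def percent_change(slope, intercept, first_year, last_year):
    """Percent change along the fitted line, total and per year."""
    start = slope * first_year + intercept
    end = slope * last_year + intercept
    pct = 100.0 * (end - start) / start
    return pct, pct / (last_year - first_year + 1)

# Hypothetical annual averages
years = [2000, 2001, 2002, 2003, 2004, 2005]
values = [2.0, 1.9, 1.7, 1.6, 1.5, 1.3]
slope, intercept, r2, f_stat = linear_trend(years, values)
p_value = f_test_p_value(f_stat, len(years))
pct, pct_per_year = percent_change(slope, intercept, years[0], years[-1])
```

A statistics package would normally supply the P-value directly; the numerical integration is used here only to keep the sketch free of external dependencies.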
-------
Quantifying Trends
Interpreting Linear Regression Output
Example output from a linear regression of annual average benzene
concentrations (performed in Excel) is provided:
Slope                -0.3943
Intercept            789.562
R^2                  0.794456
% Change             -69.241021
% Change Per Year    -6.2946382
F-Statistic          30.92103
P-value
Confidence level     99.946575

This example output shows a decline in annual average benzene
concentrations over time; the slope differs from zero with greater
than 95% confidence.
The output is interpreted as follows:
• Slope, intercept, % change, % change per year, R^2. These indicate the slope of the line,
the y-axis intercept, the % change between the first and last year of the line, the % change
divided by the number of years, and the fraction of variation accounted for.
• F-statistic or F-ratio. F-ratio is used to test the hypothesis that the slope is 0. The F-ratio
is large when the independent variable(s) helps to explain the variation in the dependent
variable. Therefore, large F-ratios indicate a stronger correlation between the two
variables (i.e., the slope of the regression line is NOT zero).
• P-value. The P-value is the probability of obtaining an F-ratio at least as large as the
one observed when the true slope is zero (generally, 95% confidence is used as a cutoff
value, corresponding to a P-value of 0.05).
Microsoft Excel and SYSTAT11 are two of many software programs that can
calculate the F-test.
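As a consistency check on the example output, and assuming it comes from a simple linear regression with one independent variable (year), the F-statistic and R^2 are linked through the number of annual averages n:

```latex
F = \frac{R^2}{1 - R^2}\,(n - 2)
\quad\Longrightarrow\quad
n - 2 = \frac{F\,(1 - R^2)}{R^2}
      = \frac{30.92103 \times 0.205544}{0.794456} \approx 8.0
```

Under that assumption the output is consistent with 10 annual averages. Likewise, -69.241021 / 11 ≈ -6.2946 reproduces the reported % change per year if the % change is divided by 11 years, and a confidence level of 99.946575% corresponds to a P-value of about 0.0005.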
-------
Quantifying Trends
Statistical Significance Example
[Figure: Benzene Annual Average by year, 1995-2005, for Site 1 and Site 2 with linear regression lines.
Site 1: y = -0.0639x + 128.9, R^2 = 0.72, P-value = 0.002 (significant)
Site 2: y = -0.0223x + 46.09, R^2 = 0.056, P-value = 0.6 (not significant)]
This example shows benzene trends at two sites. Both sites show a linear regression with a negative
slope, but only Site 1 shows a statistically significant decrease. At Site 2, a decrease in
concentrations is apparent, but the change is not statistically significant (i.e., failed F-test).
-------
Visualizing Trends
Overview
Visual inspection of trend data is vital! A linear fit to a
trend may not be appropriate; for example, a step
change may have occurred due to a major emissions
regulation or a nonlinear or exponential fit may be more
appropriate.
Methods for visualizing the data include
- Line graphs of selected indicators
- Box plots (high and low values, median values, outliers)
- Plots of mean or median values with confidence intervals
- Combination of a map and temporal information
-------
Visualizing Trends
Line Graphs
It is sometimes useful to break a
long-term trend into shorter time
intervals because of significant
changes in emissions. Trends should
be individually and visually investigated.
For example, benzene in gasoline was
significantly reduced in several urban
areas starting in the mid-1990s when
reformulated gasoline (RFG) was introduced.
Dramatic reductions were observed in
ambient benzene concentrations
over this time period.
Both plots contain the same data.
If one trend line is used, the overall
trend decreases. If two trend
lines are segregated by the RFG
year (1995), the benzene concentrations
are relatively flat before and after RFG
implementation.
In this case, the difference between the
two time periods may be a better
quantitative reflection of how benzene
concentrations have changed.
[Figure: Benzene Annual Averages, 1992-2000, shown in two panels with the same data. One panel fits all years with a single regression line (y = -0.2248x + 450.95, R^2 = 0.5492); the other fits separate regression lines on either side of 1995 (y = 0.0851x - 166.54, R^2 = 0.369; y = 0.1253x - 248.65, R^2 = 0.5038).]
The figure shows the same benzene annual averages fitted with
regression lines in two ways. The first fits all data with one regression
line and the second takes into account a large step change that
occurred from regulations put into effect in 1995. The figure was
created in Microsoft Excel.
-------
Visualizing Trends
Using Other Statistical Metrics
We are typically interested in
air toxics annual average
trends because the annual
average is used for
comparisons to levels of
concern for chronic health
effects. Guidelines for
preparing annual averages
were provided previously.
In addition to an annual
average, other statistical
indicators can be used to
verify a trend.
- These include median,
maximum, minimum, and
selected percentiles.
- These metrics are especially
helpful in identifying effects of
censored data below detection.
[Figure: Formaldehyde Annual Averages, 1999-2005, showing the 95th percentile, annual average, annual median, and 5th percentile of annual concentration by year.]
This figure, showing formaldehyde annual data with various
statistical measures, demonstrates that the annual pattern
in concentration is relatively consistent. 2002
concentrations were low and there is no consistent trend
over this 1999-2005 time period.
-------
Visualizing Trends
Box Plots
Box plots are another useful
way to display multiple
statistical metrics and
visually assess statistical
significance.
Box plots illustrate the
trends in the high and low
values, interquartile ranges,
median, and confidence
intervals of the annual
average.
The box plots displayed
here are described in
Characterizing Air Toxics
Section 5.
[Figure: Formaldehyde Annual Averages; box plots of concentration by year, 1998-2006.]
The figure shows annual formaldehyde concentrations represented as
box plots. The variability is similar from year to year since the boxes
for each year are about the same height. Concentrations in 2002 were
statistically significantly lower than in other years because the
confidence intervals do not overlap any other year.
-------
Visualizing Trends
Using Confidence Intervals
Confidence intervals (CIs) are shown
around the annual averages for
several years of data.
Since the plotted CIs overlap in 1999
and 2001 but not in 2000 and 2001,
1999 and 2001 concentrations are
not significantly different, but 2000
and 2001 concentrations are
significantly different.
CIs are a function of sample size, with fewer samples
resulting in larger CIs. Air toxics data
sets are typically small (i.e., only a
few samples per month); thus, CIs
help analysts understand the range
in which the annual mean
concentration can statistically fall.
[Figure: Formaldehyde Annual Average, 1998-2006; annual average concentration by year with error bars representing 95% confidence.]
CI is computed as follows:
$$\bar{x} \pm z^{*} \frac{\sigma}{\sqrt{n}}$$
where $\bar{x}$ is the mean value, $\sigma$ is the
standard deviation, $n$ is the number of
samples, and $z^{*}$ is the upper (1-C)/2 critical
value (use a look-up table for the %
required) for the standard normal
distribution.
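The confidence interval calculation and the overlap comparison described on this slide might be coded as follows. This is a sketch with hypothetical function names and data; it uses the sample standard deviation and the standard normal critical value as in the formula on this slide, and for the small sample sizes typical of air toxics data an analyst may prefer a Student-t critical value, which gives wider intervals.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(samples, confidence=0.95):
    """Return (lower, upper) for the mean using a standard normal critical value."""
    n = len(samples)
    x_bar = mean(samples)
    sigma = stdev(samples)  # sample standard deviation
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_star * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical 24-hr concentrations for one year
lower, upper = confidence_interval([2.0, 4.0, 6.0, 8.0])
```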
-------
Visualizing Trends
Including Underlying Data
In this example, a trend for each parameter, site, and
method was plotted next to the underlying data. The
figures show annual averages with standard
deviations in blue and average MDLs in pink. The
underlying data include the average MDL and percent
below MDL by year, the calculated regression and
F-value statistics, and the percent change per year.
Figure (a) is an example of a benzene trend for the
1995-2005 trend period. In the plot, we can see that
data are mostly above detection and show a
statistically significant decreasing trend of about
5% per year.
Figure (b) shows arsenic PM2.5 data. Calculations
indicate a statistically significant increasing trend of
20% per year. If these statistics were used alone,
they would indicate a serious arsenic problem at this
site. When the underlying data are examined though,
it is clear that there may be other factors to consider.
The first two years of data are 100% below detection,
resulting in values that are entirely MDL/2-substituted.
The values for these years may, in fact, be
significantly lower and should not simply be
discarded; we cannot tell from the current data. This
trend should be considered suspect and validated by
comparison with neighboring sites; the summary
statistics should not be trusted as accurate values.
[Figure: (a) benzene and (b) arsenic PM2.5 trend plots shown next to spreadsheet tables of the underlying data (year, annual average, standard deviation, percent below MDL, average MDL, slope, intercept, and percent change).]
-------
Visualizing Trends
Calculating Trend Period Percent Change
There are many methods for calculating trend-period
percentage change. Four such methods are listed below
along with the associated percentage change that would
result from applying each method to the benzene data
pictured at right:
1. Using the first and last measured data point (-40.43%).
2. Using the regression equation (-57.12%).
3. Using all values before and after a step change (-55.29%).
4. Using three-year averages before and after a step change (-53.71%).
In method 1, there is no sense of the underlying pattern
for all years of interest, and the results are affected by
the differences in the meteorology of the chosen years.
Method 3 is a better measure of the percentage change
because it isolates the two data points having the most
impact on the overall trend, but requires visualizing the
data first.
Methods 2 and 4 use values that are weighted by more
years of data within the trend period, providing more
smoothing of variability from meteorological fluctuations.
There is no right method for calculating trend results, but
knowledge of possible biases of each is important when
deciding which to use.
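The four methods can be compared side by side on a single series. The sketch below uses hypothetical annual averages with a step change in 1995, not the benzene data behind the percentages quoted above, so its results differ from those values. Two assumptions are made: years on or after the step year count as "after", and the three-year averages use the three years immediately before and the three years starting at the step year.

```python
from statistics import mean

def pct_change(start, end):
    return 100.0 * (end - start) / start

def fitted_line(years, values):
    """Least-squares slope and intercept."""
    mean_x, mean_y = mean(years), mean(values)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
             / sum((x - mean_x) ** 2 for x in years))
    return slope, mean_y - slope * mean_x

def trend_period_changes(years, values, step_year):
    """Percent change over the trend period by each of the four methods."""
    before = [v for y, v in zip(years, values) if y < step_year]
    after = [v for y, v in zip(years, values) if y >= step_year]
    slope, intercept = fitted_line(years, values)
    return {
        "first_last": pct_change(values[0], values[-1]),            # method 1
        "regression": pct_change(slope * years[0] + intercept,
                                 slope * years[-1] + intercept),    # method 2
        "step_all": pct_change(mean(before), mean(after)),          # method 3
        "step_3yr": pct_change(mean(before[-3:]), mean(after[:3])), # method 4
    }

# Hypothetical annual averages with a step change in 1995
years = list(range(1992, 2001))
values = [3.0, 3.2, 3.1, 1.6, 1.5, 1.7, 1.6, 1.8, 1.7]
changes = trend_period_changes(years, values, step_year=1995)
```

Reporting all four values together makes the sensitivity of the result to the choice of method visible, which is the point of the comparison above.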
[Figure: Benzene Annual Averages with regression line y = -0.2248x + 450.95.]
-------
Summarizing Trends
Overview
Investigate trends among sites by pollutant.
- Similar trend results among the sites make a compelling
argument that change on a larger spatial scale has occurred.
Characterize the spatial distribution of trends by
showing trends at each site on a map.
- Trends may not agree nationally in direction or magnitude but
may show spatial patterns of interest.
Characterize the distribution of individual site trends by
displaying the range of percentage change per year
over various trend periods and for all sites meeting
minimum trend criteria.
-------
Summarizing Trends
Trends Among Sites
Site-level trend investigation is vital!
The figures show site-level trends for benzene from
two U.S. sites; average MDLs are plotted in pink for
reference.
The top figure shows a statistically significant
decreasing trend, while the bottom figure shows a
statistically insignificant decreasing trend.
Confidence in these results is high. The data are
mostly above detection, MDLs are consistent for the
whole trend period, and no outliers appear to
influence the trend.
If any of these problems do exist, the underlying
trend data should be evaluated more carefully to
understand the reliability of the trend.
Next steps in investigating suspect trends
- If one or more annual averages are outliers, re-validate
the underlying data. Is one high concentration
event the cause, or is there a distribution of high values?
Is there an explanation for the high annual average to
prove it valid (e.g., increased local source emissions) or
in error (e.g., unit conversion error)?
- If MDL changes occur and
• A low percentage of data is below detection, the change in MDL
should not have a noticeable effect.
• A high percentage of data is below detection, there is
decreased confidence in the trend. If MDL/2 substitutions are used,
check that the trend does not follow the same shape as the MDL
changes; if it does, the trend is likely unreliable.
- If a high percentage of data is below detection without an
MDL change, the central tendency of the data may still be
accessible, but there is lower confidence in the trend.
[Figure: Two site-level benzene trends, each showing annual averages and average MDLs by year. Top: a statistically significant decreasing benzene trend (y = -0.16x + 314.62, R^2 = 0.90). Bottom: a statistically insignificant decreasing benzene trend (y = -0.01x + 26.18, R^2 = 0.03).]
-------
Summarizing Trends
Example - Spatial Distribution
-------
Summarizing Trends
Example - Spatial Distribution (2 of 2)
[Figure: Map of site-level percentage change per year for the 2000-2006 trend period, chromium PM2.5. Symbol size indicates the magnitude of change per year; symbols distinguish increasing, decreasing, increasing insignificant, and decreasing insignificant trends.]
This example shows chromium PM2.5 concentrations across the United States in 2000 to
2006. The statistically significant trends are spatially distinct, indicating increasing
concentrations in the eastern half of the country and decreasing concentrations in the West.
-------
Summarizing Trends
Example - Percentage Change per Year
We are typically interested in how a pollutant trend at a
site compares to other sites. Summarizing the data in this
way provides a succinct national perspective.
The bar chart summarizes trends in % change per year
for selected mobile source air toxics for 2000-2005 data.
The 10th, 50th, and 90th percentile of site-specific
percentage change per year are plotted. The number of
sites included in percentile calculations is also provided.
A range of results is seen across the network (i.e., 10th to
90th percentile sites); however, most sites are
experiencing declines of a few % per year with
remarkable consistency (see median); "outlier" (e.g., 95th
percentile) sites may be candidates for additional
investigation.
1,3-butadiene and styrene show a wider range of %
changes by site. The median U.S. monitoring site,
however, shows a trend of about -5%, in agreement with
the other mobile source air toxics.
Benzene and toluene show similar ranges in % change
per year and less variability in trends across the U.S. than
1,3-butadiene and styrene.
Toluene is decreasing at 90% of sites by about 2% to
12% per year, while benzene is decreasing at most sites
and may be increasing at some sites.
The map shows the site-specific % change values for
benzene used in the bar chart, similar to the proportional
maps shown previously. The magnitude of the change
per year is characterized by the size of the arrow.
Information as to whether the trend was statistically
significant is indicated by the color of the arrow.
Comparing data summaries, such as the bar chart, to
more detailed plots, such as the map, offers an overview
of the data. The map shows the spatial distribution of
data included in the summary statistics. For example,
benzene is increasing in some areas of the United States,
but none of the trends are statistically significant. Many of
the decreasing trends, on the other hand, are statistically
significant.
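The percentile summary behind a chart of this kind could be computed from the site-level results as sketched below. The function name and the site values are hypothetical, and the inclusive interpolation method of the Python standard library is only one of several percentile conventions, so the exact 10th and 90th percentile values depend on that choice.

```python
from statistics import median, quantiles

def summarize_site_trends(pct_change_per_year):
    """10th percentile, median, and 90th percentile of site-level trends."""
    deciles = quantiles(pct_change_per_year, n=10, method="inclusive")
    return {
        "n_sites": len(pct_change_per_year),
        "p10": deciles[0],
        "median": median(pct_change_per_year),
        "p90": deciles[-1],
    }

# Hypothetical percent change per year at 11 sites
site_trends = [float(v) for v in range(-10, 1)]
summary = summarize_site_trends(site_trends)
```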
[Figure: Bar chart of the 2000-2005 10th-90th percentile range and median percentage change per year for styrene (84 sites), toluene (119 sites), benzene (125 sites), and 1,3-butadiene (77 sites); x-axis: Percentage Change per Year, from -20 (decreasing) to 20 (increasing).]
[Figure: Map of site-specific benzene percentage change per year. Arrow size indicates 0-8%, 8-25%, or 25-100% per year; arrow color indicates decreasing significant, decreasing non-significant, or increasing non-significant trends.]
-------
Aggregating Trends to
Larger Spatial Regions
• Aggregated trends for larger spatial regions, such as trends by state or
EPA Region, may be of interest to communicate results at a "big picture"
level to interested stakeholders.
• Previous examples provide approaches to handling data at an aggregate
level at spatial resolution less than the national scale, including
summarizing percent change by year, using central tendency statistics,
and plotting results on a map.
• As data sets become smaller (i.e., the analyst looks at fewer sites and
fewer years), gaps in the data record become more important. For
example, some site-level trend periods may meet the minimum criteria
but will still have gaps in the data. Problems arise when, in combining
data sets, a site, especially one measuring high or low concentrations,
has missing data during some time periods.
• To handle these data gaps, the following steps are recommended.
- For general site-level analyses, these gaps should be left as-is.
- While not done at a national level, when aggregating to larger spatial regions,
data gaps could be filled in, using the following methods, to be consistent with
current trends analyses performed for criteria pollutants:
• Missing the last year: set the missing year equal to the second-to-last year.
• Missing the first year: set the missing year equal to the second year.
• Missing any other year: interpolate between the adjacent two years.
• No more than two years in succession can be missing (this was applied in the national analyses).
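The gap-filling rules above can be expressed as a short routine. This is a sketch under stated assumptions rather than the procedure used for criteria pollutants: the function name is hypothetical, a run of missing years at the start or end is filled with the nearest valid year, and an interior gap of two years is filled by linear interpolation between the nearest valid years on either side.

```python
def fill_gaps(annual_avg, start_year, end_year, max_gap_years=2):
    """Return a gap-filled copy of {year: annual average}.

    Returns None if there are no valid years or if more than
    max_gap_years are missing in succession.
    """
    years = range(start_year, end_year + 1)
    known = [y for y in years if y in annual_avg]
    if not known:
        return None
    gap = 0
    for y in years:
        gap = 0 if y in annual_avg else gap + 1
        if gap > max_gap_years:
            return None
    filled = {}
    for y in years:
        if y in annual_avg:
            filled[y] = annual_avg[y]
            continue
        earlier = [k for k in known if k < y]
        later = [k for k in known if k > y]
        if not earlier:              # missing at the start of the period
            filled[y] = annual_avg[later[0]]
        elif not later:              # missing at the end of the period
            filled[y] = annual_avg[earlier[-1]]
        else:                        # interior gap: linear interpolation
            y0, y1 = earlier[-1], later[0]
            weight = (y - y0) / (y1 - y0)
            filled[y] = annual_avg[y0] + weight * (annual_avg[y1] - annual_avg[y0])
    return filled

# Hypothetical site record with the first, a middle, and the last year missing
filled = fill_gaps({2001: 2.0, 2002: 1.0, 2004: 3.0}, 2000, 2005)
```

Because filled values are estimates, it may be useful to keep a record of which years were filled so they can be distinguished from measured annual averages in later plots.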
-------
Aggregating Trends
Example - Using Line Graphs
Line graphs can be used to
assess trends in selected
indicators.
National benzene trends
(annual average
concentrations) from 2000-
2005 are summarized in the
graph. Sites included in the
summary are shown in the
inset map. These types of
summary displays are useful for
showing general trends across
multiple sites, such as the
national network shown here.
[Figure: National benzene annual average concentrations, 2000-2005, with lines marking the concentrations below which 90 percent and 10 percent of sites fall. 2000 to 2005: 17% decrease. Inset map shows sites by monitoring network (NATTS, UATMP, Other).]
Line graph figures were created with Grapher; maps were produced in ArcMap.
-------
Accountability
Overview (1 of 2)
• The term accountability in this section is used to refer to tying annual
trends in pollutant concentrations to control programs.
• Changes in air quality may be due to a number of factors. Trends in air
quality can provide evidence that local, regional, or federal emissions
controls have successfully reduced ambient concentrations of pollutants
harmful to human health.
• Analysis should bring as much information to bear on interpretation of
trends as possible including evaluation of other potential sources of the
compound in question as well as regulations, and meteorological
influences that may impact emissions.
• The evaluation of the impacts of regional control programs (those that
affect multiple states) and local control programs (those that affect an
urban area) on air quality is complicated and is stepwise and site- and
pollutant-specific.
• A major challenge in this type of analysis is the scale of influence of a
control and of the impact of that control on air quality. Previous
investigations of ambient air quality changes encountered the confounding
influences of multiple controls applied within similar time frames and at
different spatial scales.
-------
Accountability
Overview (2 of 2)
Use caution - Matching trends to changes in emissions is not
sufficient to prove that an emission change actually caused the
ambient change.
Emissions regulations are typically phased in over a period of
years, causing a gradual change in ambient concentrations; other
factors such as meteorology, local source profiles, and MDL
changes may also explain changes. The use of supplementary
data (e.g., investigating trends in a pollutant not expected to be
influenced by the emission change) is necessary to be sure
observed changes are truly emissions-related.
Two approaches to a trends accountability analysis can be taken
depending on the availability of information: an emission control
approach (bottom up) and an ambient data approach (top down).
-------
Accountability
Bottom-Up Approach
• Select a control measure.
• Identify the air toxics expected to be affected and the available data, other controls
that might have affected the pollutants, and other pollutants that may have been
affected.
• Consider the spatial scale, or zone of influence (ZOI), of the control measure. Was
the control applied at a single facility (monitor-specific or fence line), at an urban
scale (MSA-wide), national scale (e.g., 49-state automobile emission rules), or
global scale (e.g., Montreal Protocol)?
• Determine the timing and magnitude of the changes. Was the control phased in
over a period of time or applied only to specific emitters? Phasing in a control makes it
more difficult to discern the relationship between the ambient concentration
change and the control change.
• Consider the magnitude of the expected air quality changes relative to the
variability in the ambient data. If the inherent variability in the ambient data is very
large, a small change in emissions may not be observable.
• Select the appropriate statistical metrics or approach for the analysis. Data
treatments may help reduce the variability in the data so that trends can be
observed.
• Develop hypotheses of expected changes, identify supporting evidence of
changes, and investigate corroborative evidence of the changes. It is often helpful
to test for changes in data sets or pollutants in which changes were not expected
(i.e., check the null hypothesis).
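The comparison of an expected air quality change with the variability in the ambient data, described in the list above, can be sketched as follows. This is a minimal illustration rather than a method prescribed by the workbook: the function name, the choice of the standard deviation of pre-control annual averages as the variability measure, and the example values are all assumptions.

```python
from statistics import mean, stdev


def change_vs_variability(pre_control_annual_avgs, expected_fraction_reduction):
    """Compare an expected concentration change with year-to-year variability.

    Returns the expected absolute change, the standard deviation of the
    pre-control annual averages, and the ratio of the two. The ratio is
    undefined if all annual averages are identical (zero variability).
    """
    baseline = mean(pre_control_annual_avgs)
    expected_change = baseline * expected_fraction_reduction
    variability = stdev(pre_control_annual_avgs)
    return expected_change, variability, expected_change / variability


# Hypothetical annual averages (ug/m3) before a control took effect.
pre_control = [2.0, 2.6, 1.8, 2.4, 2.2]
change, sd, ratio = change_vs_variability(pre_control, expected_fraction_reduction=0.10)
```

A ratio well below 1, as in this invented example, suggests that the expected change may be hard to distinguish from ordinary interannual variability without data treatments that reduce that variability.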
-------
Accountability
Top-Down Approach
• Quantify the change observed in the ambient data. This approach could also be
applied to a pollutant for which a change was expected but not observed.
• Identify and assess other data sets and sites that may have also been affected by a
similar control measure or emission change to understand the spatial scale of the
ambient change. If the control was applied across a broad area, changes at
additional sites might be expected.
• Identify potential emissions changes or control measures that could have
contributed to the ambient trends. Local knowledge is often a key component of this
part of the analysis.
• Compare the control measure implementation schedule with the ambient trends. Do
the timing of the control implementation and the change in ambient concentrations
coincide?
• Investigate corroborative evidence of the change and test for changes in pollutants
for which a change was not expected. It is important not to over-interpret changes
in ambient data.
Once methods have been developed for air toxics, it may be useful to apply
meteorological adjustments to the pollutant trend. The goal is to reduce the effect of
meteorology on ambient concentrations so that the underlying trend in emissions can be
more readily observed. The impact of meteorology is critical when trying to assess the
trend in toxics that are formed secondarily in the atmosphere in addition to being emitted
directly from sources (e.g., formaldehyde). Meteorological adjustments for air toxics have
not yet been developed.
-------
Bottom-Up Example
Tetrachloroethene Controls in Los Angeles
[Figure: Annual average tetrachloroethene concentrations, 1991-2007, at Burbank; North Main Street, Los Angeles; and Long Beach, with trend lines. Annotations mark the 1-in-a-million cancer risk level and the local rule to phase out emissions completely by 2020.]
Tetrachloroethene is the chemical most widely used by the dry cleaning industry, with over 85% of facilities using it
as the primary cleaning agent. In 1993, the EPA promulgated technology-based emissions standards to control
tetrachloroethene emissions from dry cleaners.
The MACT standards implemented in 1993 resulted in drastic reductions in tetrachloroethene concentrations in the
Los Angeles area where monitoring data have been available from three sites since 1992.
Trend lines show the reductions over time in average ambient concentrations. Although concentrations in the Los
Angeles area are still above the cancer risk level of concern, exposure to this air toxic has been reduced by about
80% in the past 15 years. In addition, the local South Coast Air Quality Management District implemented a rule
to phase out tetrachloroethene emissions completely by 2020.
-------
Bottom-Up Example
Ozone Precursor Controls in Baltimore, MD
[Figure: Percent change in benzene and toluene concentrations and in vehicle miles traveled (VMT) in Baltimore, 1994-2007, on an axis running from -80 to 20. RFG implementation dates are marked (Phase I simple, Phase I complex, Phase II complex). A 27% annotation appears next to the VMT series.]
Air toxics, such as benzene and toluene, that are emitted by motor vehicles are significant contributors to ozone
formation. Reformulated gasoline (RFG) was introduced in the United States in phases to reduce motor vehicle
emissions of benzene and other ozone precursors in order to reduce ambient ozone concentrations.
Benzene and toluene concentrations decreased after the 1995 implementation of RFG despite an increase in the
number of vehicle miles traveled by cars and trucks in the Baltimore area.
The largest part of the decreases in benzene and toluene concentrations is directly attributable to the
implementation of RFG; the steadier change of a few percent per year observed in later years is likely due to
fleet turnover (i.e., newer cars with lower emissions replacing older, more polluting vehicles).
-------
National Level Top-Down Example
Method
• The hypothesis is that if pollutants are emitted by the same source, emissions should
covary over long time scales. In other words, trends should be parallel if normalized.
• At a national level, the goal was to identify covariant trends in MSATs as an indicator of
sites dominated by mobile source emissions.
• Site-specific trends for six MSATs (benzene, 1,3-butadiene, toluene, ethylbenzene,
o-xylene, m-&p-xylenes) were investigated using carbon tetrachloride as a control.
• Trends were normalized by the maximum annual average concentration within the trend
period by site and pollutant (i.e., annual average concentrations each year were divided
by the highest annual average in the time period for each pollutant and at each site).
Normalization creates a data set that is easier to compare across sites and pollutants
and shows the relative change in concentration.
• Linear regression was used to create trend lines for each pollutant.
• The sites were visually grouped into various categories by the behavior of pollutant
trends. For example, if all MSAT trends had a similar slope, we expect the change in
concentration at that site to be a consequence of mobile source reductions. If one MSAT
exhibited a very different slope than the others, we would conclude that another source of
that pollutant impacting the site was likely.
• For this analysis, only the site and parameter were required to be consistent over the
trend period (method and POC were allowed to float between years). Sites with more
than five annual averages were included.
• Sites were then investigated using Google Earth to see if our hypotheses were correct.
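The normalization and regression steps described above can be sketched in a few lines. This is an illustrative sketch only: the function names and the annual averages are hypothetical, and the workbook does not specify an implementation.

```python
def normalize_by_max(annual_avgs):
    """Divide each annual average by the highest annual average in the period."""
    peak = max(annual_avgs.values())
    return {year: value / peak for year, value in annual_avgs.items()}


def ols_slope(points):
    """Ordinary least-squares slope of value against year."""
    years = list(points)
    values = [points[y] for y in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Hypothetical annual averages at one site (invented, not monitoring data).
site = {
    "benzene": {2000: 2.0, 2001: 1.8, 2002: 1.6, 2003: 1.4, 2004: 1.2, 2005: 1.0},
    "toluene": {2000: 5.0, 2001: 4.5, 2002: 4.0, 2003: 3.5, 2004: 3.0, 2005: 2.5},
    "carbon tetrachloride": {2000: 0.60, 2001: 0.61, 2002: 0.59,
                             2003: 0.60, 2004: 0.60, 2005: 0.60},
}

# Slope of each normalized trend, in fraction of the maximum per year.
slopes = {name: ols_slope(normalize_by_max(series)) for name, series in site.items()}
```

In this invented example the two MSAT slopes are parallel after normalization while the control pollutant is nearly flat, which is the pattern the method associates with a site dominated by mobile source emissions.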
-------
National Level Top-Down Example
Output
Example output from a site illustrates the results of this analysis.
• Due to normalization, maximum values are always equal to 1.
• The slopes of the MSATs are close to parallel.
• Carbon tetrachloride's slope (dashed line) is very different from (flatter than) those of the MSATs.
[Figure: Normalized annual average concentrations (0 to 1) versus year, 1988-2008, for benzene, ethylbenzene, m-&p-xylene, o-xylene, toluene, 1,3-butadiene, and carbon tetrachloride, each with a linear trend line.]
-------
National Level Top-Down Example
Normalized Site-Specific Regression Lines
At the monitor in the top example, all MSATs show a similar declining slope. Investigation of the monitoring location indicates that this site is primarily mobile source-dominated (it is located very near a major freeway).
The second example shows similar slopes for all MSATs except 1,3-butadiene and benzene. Benzene shows a much slower decline in concentration than the other MSATs while 1,3-butadiene shows a slightly faster decline. This monitor is located near a large refinery with both benzene and 1,3-butadiene emissions, which may explain this divergent behavior.
[Figure: Normalized site-specific regression lines at two example monitors for benzene, ethylbenzene, m-&p-xylene, o-xylene, toluene, carbon tetrachloride, and 1,3-butadiene.]
-------
National Level Top-Down Example
Spatial Characterization of Trend Profile "Signatures"
• Visual inspection of the slopes of trends provides useful
information on the covariance of pollutant concentrations over time.
• The percentage change in concentrations per year can also be
plotted on maps for each pollutant shown in the scatter plots to
spatially investigate the trends profiles.
• Mobile source signatures have MSAT profiles of similar
magnitudes; other signatures have increasing or varying
magnitudes among the pollutants.
[Figure: Example trend profile signatures showing the percentage change per year for 1,3-butadiene, ethylbenzene, m-&p-xylene, o-xylene, toluene, benzene, and carbon tetrachloride. Panels: mobile source; 1,3-butadiene; benzene; noncovariant.]
-------
California: Mobile Source Signatures
[Figure: Map of California counties showing site trend profiles for 1,3-butadiene, ethylbenzene, m-&p-xylene, o-xylene, and toluene. Sites are classified as mobile source signature; mobile source signature (MDL issues for 1); mobile source with slower 1,3-butadiene decline; mobile source with slower benzene decline; noncovariant; or other.]
Most California profiles are flat (i.e., similar magnitude trend for each MSAT), indicating the relative dominance of mobile source emissions at these sites.
Also note that carbon tetrachloride is not an MSAT and should not covary with the others (which it does not).
-------
National Level Top-Down Example
Summary
• The top-down approach is a useful way to investigate site-level trends of
pollutants commonly emitted by the same source.
• Most sites in the United States conformed to our expected mobile source
trend profile.
• The technique also allows identification of sites at which trends do not
conform to expectations. For example, two mobile source-like signatures
were identified at most of the remaining sites:
- 1,3-butadiene signature sites showed shallow or increasing 1,3-butadiene
(possible measurement issues?).
- Benzene signature sites showed shallow or increasing benzene (likely
explained by nearby point-source emissions for some sites but was not clear
for others).
• Some sites showed increasing trends or noncovariant trends in multiple
MSATs. Nearby emissions sources may be influencing trends at these
sites, and they may be good candidates for case study analyses of other
emissions sources.
• The top-down approach may be applicable to other pollutants from mobile
sources (CO, NOx, black carbon) or other emissions sources of multiple
co-emitted pollutants.
-------
Meteorological Adjustment of Air Toxics
Introductory Thoughts
• Meteorology can impact air quality.
- Meteorology can vary significantly among years (e.g., El Niño), and meteorology
can have a considerable effect on air quality.
- To understand changes in air quality that are attributed to emission controls, we
need to be able to adjust the data to account for meteorological conditions that
were very different from average conditions.
- By properly accounting for the portion of the variability in the data attributable to
changes in meteorology, we can compare air quality among years with widely
different meteorological conditions.
- This assessment is important because we do not have control over
meteorological changes.
• The use of meteorological adjustment for air toxics is still being explored.
• Meteorological adjustment will likely be applied at the site level, and each
site and pollutant will need to be treated separately.
• In preliminary investigations, meteorology accounted for 15-25% of total
variability for benzene and lead (TSP) at selected sites; meteorological
adjustments smoothed trends; and meteorological adjustment of trends
appeared to be important for interpreting trends in benzene and lead
(TSP) and may be important for other air toxics as well. More investigation is
needed to finalize an approach for meteorological adjustment.
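To make the idea of "variability attributable to meteorology" concrete, the sketch below regresses concentration on a single meteorological variable, reports the fraction of variance explained, and removes the fitted meteorological effect. It is a simplified illustration of the linear regression technique listed under Additional Reading, not an approach endorsed or finalized by the workbook. The choice of wind speed as the predictor, the function names, and all values are assumptions.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def met_adjust(conc, met):
    """Return (fraction of variance explained by met, adjusted concentrations).

    Adjusted values are the regression residuals re-centered on the mean
    concentration, so the overall average is unchanged.
    """
    slope, intercept = fit_line(met, conc)
    mean_conc = sum(conc) / len(conc)
    fitted = [intercept + slope * m for m in met]
    ss_total = sum((c - mean_conc) ** 2 for c in conc)
    ss_resid = sum((c - f) ** 2 for c, f in zip(conc, fitted))
    r_squared = 1.0 - ss_resid / ss_total
    adjusted = [c - f + mean_conc for c, f in zip(conc, fitted)]
    return r_squared, adjusted


# Hypothetical annual mean wind speed (m/s) and benzene concentration (ug/m3).
wind_speed = [2.0, 3.0, 2.5, 3.5, 2.2, 3.1]
conc = [1.9, 1.4, 1.6, 1.2, 1.8, 1.3]
r_squared, adjusted = met_adjust(conc, wind_speed)
```

One caution with this simple form: if the meteorological variable itself trends over the period, the regression can absorb part of the emissions-driven trend, which is one reason each site and pollutant needs separate treatment.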
-------
Resources
Tools Available for Trend Analysis
• Examples in this section were created with
- ArcInfo and ArcView
- SYSTAT
- Grapher
- Microsoft Excel
• Air toxics guidance
- http://www.epa.gov/ttn/fera/risk_atra_main.html
• Computing 95% upper confidence limit (95%
UCL) for use in risk assessment
- ProUCL 4.0 available at
http://www.epa.gov/nerlesd1/tsc/software.htm
-------
Trends Summary
• Setting up data for trends analysis.
- Acquire and validate data. See Preparing Data for Analysis, Section 4, for a complete
discussion.
- Identify censored data. Separate data at or below detection for each parameter, site and
method.
• Count the number of occurrences by value. Do the values indicate a specific substitution method?
• Make scatter plots of data below detection vs. the detection limit for each value. The slope of the line
will indicate the denominator if MDL/x substitutions were used, even if alternate MDLs are available.
- Treat data below detection.
• If uncensored values are used, include them "as is".
• If censored values are used, substitute MDL/2 or use a more sophisticated method as appropriate.
• If a mixture of censored and uncensored data is used, compare results with all below-detection values substituted
against results with only the censored values substituted to see if they agree. If not, more advanced methods to treat data below
detection may be necessary.
- Calculate valid annual averages. See Preparing Data for Analysis, Section 4, for a complete
discussion.
- Create valid trends.
• Segregate trends by parameter, site and method.
• Consider and apply trend completeness criteria depending on data needs.
- Minimum trend length of 6 years
- 75% yearly completeness within trend period
- Data gaps longer than 2 years not allowed
• Consider yearly aggregated percent of data below detection.
- Look at all data regardless of percent below detection
- Remove trends in which more than half of the years have less than 15% of data above detection
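The trend completeness criteria listed above can be screened with a short function. The sketch below is one reading of those criteria, and the interpretation is an assumption: trend length is taken as the span from the first to the last valid year, "75% yearly completeness" as the share of years in that span with a valid annual average, and a gap as a run of consecutive missing years. The function and parameter names are hypothetical.

```python
def meets_trend_criteria(valid_years, min_length=6, min_fraction=0.75, max_gap=2):
    """Check a set of years with valid annual averages against trend criteria."""
    years = sorted(set(valid_years))
    if not years:
        return False
    span = years[-1] - years[0] + 1          # trend length in years
    if span < min_length:
        return False
    if len(years) / span < min_fraction:     # share of years with valid averages
        return False
    # Longest run of consecutive missing years between two valid years.
    longest_gap = max((b - a - 1 for a, b in zip(years, years[1:])), default=0)
    return longest_gap <= max_gap


complete = meets_trend_criteria(range(2000, 2008))
too_short = meets_trend_criteria([2000, 2001, 2002, 2003, 2004])
long_gap = meets_trend_criteria(
    [2000, 2001, 2002, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014]
)
```

The thresholds are exposed as parameters because the summary notes that the criteria should be applied "depending on data needs".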
-------
Trends Summary
• Quantifying Trends
- Magnitude of change
Use simple linear regression to calculate first and last year values to determine the percent change over the trend period.
Calculate percent change per year for intercomparison of trend periods.
- Significance of change
Quantify the statistical significance of the slope using the F-test.
Typically, a trend is considered significant at or above the 95% confidence level.
- Visualize trends; always include annual percent below detection as a measure of
uncertainty.
Line graphs
Box plots
Spatial representations
- Summarize trends
Characterize the distribution of percentage change per year for all sites and investigate mean, median and percentiles.
Characterize the spatial distribution of the percentage change per year.
Look for consensus in results among methods.
• Accountability - tie annual trends to control programs
- Acquire background information on control programs; compare this information to site-level
metadata keeping in mind local sources, site location etc.
Implementation date or time period
Pollutants affected and expected magnitude of reduction
Types of sources affected
- Acquire emissions inventory data
Toxics release inventory data (TRI) (does not include mobile source emissions!)
National emissions inventory data (NEI)
- Compare ambient data to emission inventories and control programs—correlation is not
enough to prove causation
Compare similar pollutants that should experience concentration reductions resulting from the control programs.
Compare similar pollutants that should NOT experience concentration reductions for the control program.
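The magnitude and significance calculations summarized under Quantifying Trends above can be sketched as follows. This is an illustration under stated assumptions: the function name and data are hypothetical, the years are assumed to be sorted, and the code stops at the F statistic. The p-value would come from an F distribution with 1 and n-2 degrees of freedom, obtained from a table or a statistics package.

```python
def trend_summary(years, annual_avgs):
    """Percent change, percent change per year, and regression F statistic.

    Fits a simple linear regression of annual average on year and uses the
    fitted first- and last-year values to compute the percent change.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(annual_avgs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, annual_avgs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    first_fit = intercept + slope * years[0]
    last_fit = intercept + slope * years[-1]
    pct_change = 100.0 * (last_fit - first_fit) / first_fit
    pct_per_year = pct_change / (years[-1] - years[0])

    ss_total = sum((y - mean_y) ** 2 for y in annual_avgs)
    ss_regression = slope * sxy
    ss_residual = ss_total - ss_regression
    f_stat = ss_regression / (ss_residual / (n - 2))
    return pct_change, pct_per_year, f_stat


# Hypothetical annual averages (ug/m3) for one site, pollutant, and method.
years = [2000, 2001, 2002, 2003, 2004, 2005]
annual_avgs = [2.0, 1.9, 1.7, 1.6, 1.3, 1.1]
pct_change, pct_per_year, f_stat = trend_summary(years, annual_avgs)
```

With six annual averages the F statistic has 1 and 4 degrees of freedom, for which the tabulated critical value at the 95% confidence level is about 7.71. A perfectly linear series has zero residual variance and would need separate handling.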
-------
Additional Reading
Meteorological Adjustment Techniques
Methods for adjusting pollutant concentrations to account
for meteorology
- Expected peak-day concentration (California Air Resources
Board, 1993)
- Native variability (California Air Resources Board, 1993)
- Filtering techniques (e.g., Rao and Zurbenko, 1994)
- Probability distribution technique (Cox and Chu, 1998)
- Classification and Regression Tree (CART) analysis (e.g.,
Stoeckenius, 1990)
- Linear regression (e.g., Davidson, 1993)
- Nonlinear regression (e.g., Bloomfield et al., 1996)
-------
Additional Reading
Meteorological Adjustment Techniques for
Ozone and Particulate Matter
PAMS ozone adjustment techniques:
Thompson M.L., Reynolds J., Lawrence H.C., Guttorp P.,
and Sampson P.O. (2001) A review of statistical methods
for the meteorological adjustment of tropospheric ozone.
Atmos. Environ. 35, 617-630.
Data Quality Objectives for the Trends Component of the
PM Speciation Network (includes meteorological
adjustment techniques in Appendix).
-------
References
Battelle Memorial Institute and Sonoma Technology, Inc. (2003) Phase II air toxics monitoring data: analyses and network design
recommendations. Final technical report prepared for Lake Michigan Air Directors Consortium, Des Plaines, IL by Battelle
Memorial Institute, Columbus, OH, and Sonoma Technology, Inc., Petaluma, CA, December.
Bloomfield P., Royle J.A., Steinberg L.J., and Yang Q. (1996) Accounting for meteorological effects in measuring urban ozone levels
and trends. Atmos. Environ. 30, 3067-3077.
Bortnick S., Coutant B., Holdren M., Stetzer S., Holdcraft J., House L., Pivetz T., and Main H. (2001) Air toxics monitoring data:
Analyses and network design recommendations. Revised Draft Technical Report prepared for Lake Michigan Air Directors
Consortium, Des Plaines, IL, by Sonoma Technology, Inc., Petaluma, CA and Battelle Memorial Institute, Columbus, OH, October.
Cox W.M. and Chu S.H. (1998) Cox-Chu meteorologically-adjusted ozone trends (1-hour and 8-hour): 1986-1997. Web page prepared
for Center for Air Pollution Impact and Trend Analysis (CAPITA), Washington University, St. Louis, MO, October.
Davidson A. (1993) Update on ozone trends in California's South Coast Air Basin. J. Air & Waste Manag. Assoc. 43, 226-227.
Hafner H.R. and McCarthy M.C. (2004) Phase III air toxics data analysis workbook. Workbook prepared for the Lake Michigan Air
Directors Consortium, Des Plaines, IL, by Sonoma Technology, Inc., Petaluma, CA,
STI-903553-2592-WB, August.
Hyslop, N. and White, W. (2007) Interagency Monitoring for Protected Visual Environments (IMPROVE) Detection Limits. Presented
at the Symposium on Air Quality Measurement Methods and Technology, Air and Waste Management Association, San Francisco,
CA, May 2.
Kenski D., Koerber M., Hafner H.R., McCarthy M.C., and Wheeler N. (2005) Lessons learned from air toxics data: a national
perspective. Environ. Man. J., 19-22.
McCarthy M.C., Hafner H.R., Chinkin L.R., and Charrier J.G. (2007) Temporal variability of selected air toxics in the United States.
Atmos. Environ. 41 (34), 7180-7194 (STI-2894).
Rao S.T. and Zurbenko I.G. (1994) Detecting and tracking changes in ozone air quality. J. Air & Waste Manag. Assoc. 44, 1089-1092.
Stoeckenius T. (1990) Adjustment of ozone trends for meteorological variation. Presented at the Air and Waste Management
Association's Specialty Conference, Tropospheric Ozone and the Environment, Los Angeles, CA, March 19-22.
Thompson M.L., Reynolds J., Lawrence H.C., Guttorp P., and Sampson P.O. (2001) A review of statistical methods for the
meteorological adjustment of tropospheric ozone. Atmos. Environ. 35, 617-630.
U.S. Environmental Protection Agency (2003) National air quality and emissions trends report, 2003 special studies edition. Prepared
by the Office of Air Quality and Standards, Air Quality Strategies and Standards Division, Research Triangle Park, NC, EPA
454/R-03-005. Section 5 available on the Internet at .
-------
Advanced Analyses
What else can I do with my air toxics data?
June 2009
Section 7 - Advanced Analyses
-------
Advanced Analyses
What's Covered in This Section?
• This section is an overview of selected advanced data analysis
techniques that may be useful in further understanding air
toxics data.
• Discussion of each of these topics could fill an entire workbook;
a discussion is provided of the motivation behind using these
techniques and the reader is referred to available
documentation for further information.
• Not all of these analyses have yet been thoroughly applied to
air toxics data, but approaches that have been applied to PM2.5
and PAMS VOC data, for example, should be applicable to air
toxics data sets.
• The following topics are covered
- Source apportionment
- Trajectory analysis
- Emission inventory evaluation
- Model evaluation
- Monitoring network assessment
-------
Advanced Analyses
Motivation
After basic data validation and "display and describe" analyses
have been performed, more can be done with the data if sufficient
resources (e.g., time, expertise) are available and more
sophisticated analyses are needed because basic analyses did
not sufficiently answer questions.
• Source Apportionment. The sources impacting your
monitors can be explored with source apportionment techniques and
tools.
• Trajectory Analyses. In addition to better understanding high and
low concentrations, source apportionment results can be enhanced
with trajectory analyses.
• Evaluation of Emissions Inventories and Models. A primary goal
of national monitoring networks is to compare ambient data to
emission inventories and model output. These evaluations can lead
to improvements in the inventories and model performance.
• Network Assessment. The pollution sources impacting a site,
nearby demographics, and monitoring purpose can change over time.
EPA's air toxics monitoring plan includes regular network
assessment.
-------
Source Apportionment
Why Perform?
• Also known as receptor modeling, source apportionment is defined as a specified
mathematical procedure for identifying and quantifying the sources of ambient air pollutants
at a monitoring site (the receptor) primarily on the basis of concentration measurements at
that site.
• Source apportionment relates source emissions to their quantitative impact on ambient air
pollution.
• Receptor models can be used to address the following questions:
- What emissions sources contribute to ambient air toxics concentrations?
- How much does each source type contribute?
- Which sources could be targeted with control measures to effect the highest reduction of air toxics
concentrations (or risk)?
- What are the discrepancies between emission inventories and sources identified by receptor models?
- Are known control strategies affecting the source contributions to air toxics?
• When performing source apportionment, the analyst should be aware of uncertainties and
limitations.
- Many emitters have similar species composition profiles. The practical implication of this limitation
is that one may not be able to discern the difference between benzene emitted from light-duty vehicles
(LDV) versus benzene from gasoline stations or refineries. One solution to this problem is to add
additional species to reduce collinearity. These profiles might help to qualitatively identify mobile
sources.
- Species composition profiles change between source and receptor. Most source-receptor models
cannot currently account for changes due to photochemistry. Since carbonyl compounds such as
formaldehyde and acetaldehyde have significant secondary sources, current methods cannot link these
compounds to their primary emission sources.
- Receptor models cannot predict the consequences of emissions reductions. However, source-
receptor models can check if control plans achieve their desired reductions using historical data.
-------
Source Apportionment
Single-Sample and Multivariate Models
Receptor models are classified into two types: single-sample or multivariate.
• In single-sample models, the analysis is performed independently on each available
sample.
- The simplest example of this is the "tracer element" method, in which a particular property (e.g.,
chemical species) is known to be uniquely associated with a single source. In this case, the impact of
the source on the ambient sample is estimated by dividing the measured ambient concentration of the
property by the property's known abundance in the source's emissions. This method is not often
available because of the difficulties of finding unique tracers or knowing their abundances. However,
even if a pollutant is not uniquely associated with a source of interest, knowledge of the abundance
from that source can be used to provide an upper limit for the source's impact.
- The best-known example of single-sample receptor modeling is the chemical mass balance model
(CMB). CMB eliminates the need for unique tracers of sources but still requires the abundances of the
chemical components of each source (source profiles) as input.
• Multivariate receptor models use data from multiple pollutants and extract source
apportionment results from all of the sample data simultaneously.
- The reward for the extra complexity of these models is that they attempt to estimate not only the
source contributions (i.e., mass from each source) but also the source compositions (i.e., profiles).
- There are several tools to perform multivariate receptor modeling described in the literature; EPA has
supported the development of two modeling platforms: Unmix and positive matrix factorization (PMF).
These models are based on factor analysis, or the closely related principal component analysis.
- Factor analysis is a statistical method used to describe variability among observed variables in terms
of fewer unobserved variables called factors.
- There is extensive literature available describing CMB and PMF applications to speciated PM data,
less available literature describing applications to VOC data, and very little research on air toxics
specifically.
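The tracer-element estimate described under single-sample models above reduces to a single division. The sketch below uses a hypothetical function name and invented values. When the species is not unique to the source, the result should be read as an upper limit on that source's impact, as the text notes.

```python
def tracer_source_impact(ambient_tracer_conc, abundance_in_source):
    """Estimate a source's impact from a tracer species.

    ambient_tracer_conc: measured ambient concentration of the tracer.
    abundance_in_source: fraction (0 to 1) of the source's emissions made up
        by the tracer.
    Returns the estimated total contribution of the source, in the same units
    as the ambient concentration.
    """
    if not 0.0 < abundance_in_source <= 1.0:
        raise ValueError("abundance must be a fraction between 0 and 1")
    return ambient_tracer_conc / abundance_in_source


# Hypothetical: tracer measured at 0.05 ug/m3, making up 2% of source emissions.
impact = tracer_source_impact(0.05, 0.02)
```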
-------
Source Apportionment
Positive Matrix Factorization
• PMF was originally developed by Paatero (1994, 1997) with additional
development by Hopke et al. (1991, 2003). PMF can be used to determine
source profiles based on the ambient data and associated uncertainties.
• PMF has been applied to many data sets to determine sources of PM2.5,
ozone precursors, and air toxics.
• PMF uses weighted least squares fits for data that are normally distributed and
maximum likelihood estimates for data that are log normally distributed.
Concentrations are weighted by their analytical uncertainties.
• PMF constrains factor loadings and factor scores to nonnegative values and thereby
minimizes the ambiguity caused by rotating factors.
• Model input includes ambient monitoring data and associated analytical uncertainties
(see Wade et al., 2007). A large (species and sample matrix) ambient data set is
required.
• Model output includes
- Factor loadings expressed in mass units which allows them to be used directly as source
signatures.
- Uncertainties in factor loadings and factor scores which makes the loadings and scores easier
to use in quantitative procedures such as chemical mass balance.
• A free, standalone version of PMF was created by the EPA in 2005, available on the
Internet at http://www.epa.gov/scram001/receptorindex.htm. Updates are underway.
• Data preparation and the interpretation of model diagnostics are covered in EPA's
Multivariate Receptor Modeling Workbook (Brown et al., 2007b).
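For readers who want the model stated compactly, the uncertainty-weighted, nonnegativity-constrained fit described above is commonly written as follows. The symbols are not defined in this workbook and are introduced here only for illustration: x_ij is the concentration of species j in sample i, u_ij its uncertainty, g_ik the contribution of factor k to sample i, f_kj the amount of species j in the profile of factor k, e_ij the residual, and p the number of factors chosen by the analyst.

```latex
x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij},
\qquad
Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{ij}}{u_{ij}} \right)^{2},
\qquad
g_{ik} \ge 0, \quad f_{kj} \ge 0
```

The model seeks the g and f that minimize Q. Because each residual is divided by its uncertainty, measurements with large uncertainties (for example, values near the detection limit) have less influence on the solution.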
-------
Source Apportionment
Unmix
• Unmix was developed by Ron Henry (1997) using a generalization
of the self-modeling curve resolution method developed in the
chemometric community.
• It originally used MATLAB computation routines. The EPA, along with Ron
Henry, developed EPA Unmix and documentation that uses MATLAB features
but is now a standalone model (i.e., MATLAB not needed).
• Unmix is a multivariate receptor modeling package that inputs ambient
monitoring data and seeks to find the composition and contributions of
influencing sources or source types. Unmix also produces estimates of the
uncertainties in the source compositions.
• Unmix requires many samples to extract potential sources, similar to PMF.
• It assumes that sources have unique species ratios, i.e., "edges" that can be
observed in a scatter plot between species; uses these edges to constrain the
results and identify factors; and does not need to weigh data points.
• Model input includes ambient monitoring data; uncertainty information and
source profiles are not necessary.
• Model output includes source profiles with uncertainties.
• Unmix is available at [URL illegible].
• Data preparation and the interpretation of model diagnostics are covered in EPA's
Multivariate Receptor Modeling Workbook (Brown et al., 2007b).
-------
Source Apportionment
Chemical Mass Balance
• The premise of chemical mass balance (CMB) is that source profiles from
various classes of sources are different enough that their contributions can be
identified by measuring concentrations of many species collected at the receptor
site.
• To apportion sources, CMB uses an effective variance-weighted, least squares
solution to a set of linear equations which expresses each receptor species
concentration as a linear sum of the products of the source profiles and source
contributions. This method can be applied to a single sample.
• Model input includes
- Source profile species (fractional amount of species in emissions from each source
type).
- Receptor (ambient) concentrations.
- Realistic uncertainties for source and receptor values. Input uncertainty is used to weigh
the relative importance of input data to model solutions and to estimate uncertainty of
the source contributions.
• Model output includes contributions from each source type and species to the
total ambient concentration along with uncertainty.
• CMB has been used in a number of air pollution studies that examine particulate
and VOC source apportionment, but few, if any, specific air toxics studies.
• CMB is available from EPA's receptor modeling website (http://www.epa.gov/scram001/).
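The linear system behind CMB can be illustrated with a small weighted least-squares solve. This sketch is deliberately simplified and is not EPA's CMB software. It weights only by the receptor (ambient) uncertainties, whereas the effective variance solution described above also folds in the source profile uncertainties. The function names, profiles, and concentrations are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def cmb_weighted_least_squares(profiles, ambient, ambient_unc):
    """Estimate source contributions s from ambient_i = sum_j profiles[i][j] * s_j.

    profiles[i][j]: fraction of species i in emissions from source j.
    ambient[i], ambient_unc[i]: receptor concentration and its uncertainty.
    Solves the normal equations (F^T W F) s = F^T W c with W = 1 / unc^2.
    """
    n_species = len(ambient)
    n_sources = len(profiles[0])
    w = [1.0 / u ** 2 for u in ambient_unc]
    ftwf = [[sum(w[i] * profiles[i][j] * profiles[i][k] for i in range(n_species))
             for k in range(n_sources)] for j in range(n_sources)]
    ftwc = [sum(w[i] * profiles[i][j] * ambient[i] for i in range(n_species))
            for j in range(n_sources)]
    return solve_linear(ftwf, ftwc)


# Hypothetical profiles for three species and two source types.
profiles = [[0.5, 0.1], [0.3, 0.2], [0.2, 0.7]]
ambient = [2.2, 1.6, 2.2]        # built from contributions of 4.0 and 2.0
ambient_unc = [0.1, 0.1, 0.1]
contributions = cmb_weighted_least_squares(profiles, ambient, ambient_unc)
```

If two source profiles are nearly collinear, the normal equations become ill-conditioned and the estimated contributions unstable, which is the practical reason for the collinearity warnings elsewhere in this section.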
-------
Source Apportionment
Source Profiles
• Accurate source profiles are the key to successful modeling.
• Source profiles provide information about the relative contribution of pollutants to emissions from a given source.
• Understanding source profiles is important because receptor modeling tools typically output source profile information that needs to be interpreted or requires user-input source profiles as a starting point for analysis.
• The figures to the right show example polychlorinated dibenzofuran (PBDF) source profiles for hazardous waste incinerators and copper smelting compiled by the EPA. Though the same compounds are present, the relative abundances are not the same, providing a mechanism for source identification.
• For CMB applications and for interpretation of PMF output, it is important to use source profiles that are representative of the study area during the period when ambient data were collected.
• In CMB, try available source profiles in sensitivity tests to determine the best ones for use (i.e., minimize collinearity).
• Source profiles can be obtained from
- EPA SPECIATE, recently updated (version 4.0) and available online.
- Literature review.
- Local, state, and federal agencies.
- Source profiles can also be procured via analysis of ambient data using tools such as PMF and Unmix.
[Figures: Example source profiles (relative abundance by compound) for hazardous waste incinerators and copper smelting.]
-------
Source Apportionment
Approach
• Before beginning source apportionment, it is important to "know the data" in
order to identify and assess the receptor model outputs. Understanding the data
will be achieved in the process of data validation and analysis.
- Understand airshed geography and topography using maps, photographs, site visits, etc.
- Investigate the composition and location of emission sources.
- Understand the typical meteorology of the site, including diurnal and seasonal variations.
- Investigate the spatial and temporal characteristics of the data, including meteorological
dependence.
- Investigate the relationships among species using scatter plot matrices, correlation
matrices, and other statistical tools.
• Apply cluster and factor analysis techniques using standard statistical packages
to get an overall understanding of pollutant relationships and groupings by
season, time of day, etc.
• If there are sufficient samples (e.g., more than two years of 1-in-6 day samples
for more than 20 species and more than 50% of data above detection), Unmix
and/or PMF may be applied to obtain "source" profiles with more species and
further investigate data relationships.
• If samples are few and source profiles are available, CMB may be applied to
obtain source contribution estimates.
• Compare source contribution estimates and source profiles from Unmix and
PMF to the emission inventory.
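The data-sufficiency guidance above can be written as a simple screening function. This is a hedged sketch: the function name is ours, the sample threshold is our reading of "more than two years of 1-in-6 day samples" (about 121 samples), and the fallback label for data sets that meet neither condition is an assumption, not workbook guidance.

```python
# Thresholds taken from the guidance above; MIN_SAMPLES is our interpretation
# of "more than two years of 1-in-6 day samples".
MIN_SAMPLES = 2 * 365 // 6
MIN_SPECIES = 20
MIN_FRACTION_ABOVE_DETECTION = 0.5


def suggest_receptor_model(n_samples, n_species, fraction_above_detection,
                           profiles_available):
    """Return a suggested receptor modeling approach for a data set."""
    enough_data = (n_samples > MIN_SAMPLES
                   and n_species > MIN_SPECIES
                   and fraction_above_detection > MIN_FRACTION_ABOVE_DETECTION)
    if enough_data:
        return "Unmix and/or PMF"
    if profiles_available:
        return "CMB"
    # Assumption: neither condition met, so stay with exploratory analyses.
    return "exploratory analysis only"


large_data_set = suggest_receptor_model(150, 25, 0.7, profiles_available=False)
small_data_set = suggest_receptor_model(40, 25, 0.7, profiles_available=True)
```

The thresholds are examples from the text ("e.g."), so they are best treated as starting points rather than hard limits.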
-------
Source Apportionment
Example
PMF receptor modeling was performed for speciated VOC data collected at two PAMS
sites, Hawthorne and Azusa, in the Los Angeles area during the summers of 2001-2003.
Both toxic and non-toxic VOCs were investigated in order to provide as much data as
possible for apportionment (Brown et al., 2007a).
Air toxics included in the analysis were typically grouped as MSATs, though they have
industrial sources as well.
Data were collected as part of the PAMS network providing the advantage of subdaily data
and speciated-versus-total mass measurements (total non-methane organic compounds,
TNMOC).
Uncertainty estimates were adjusted from the original analytical uncertainties to reduce
the weighting of data below detection and missing data. Uncertainties for missing data
were set to 4 times the median concentration, data below detection were given
uncertainties of 1.5*MDL, and all other data were given the analytical uncertainty plus
2/3*MDL.
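The three uncertainty rules above can be sketched as a single function. The function name, the example concentrations, the MDL, and the 10% analytical uncertainty are hypothetical; treating "below detection" as strictly less than the MDL and taking the median over all measured values are also assumptions.

```python
from statistics import median


def sample_uncertainty(conc, mdl, analytical_unc, median_conc):
    """Uncertainty for one sample following the rules in the text.

    conc is None for a missing sample.
    """
    if conc is None:
        return 4.0 * median_conc          # missing data
    if conc < mdl:                        # assumption: strict comparison
        return 1.5 * mdl                  # below detection
    return analytical_unc + (2.0 / 3.0) * mdl


# Hypothetical concentrations for one species; None marks a missing sample.
concentrations = [2.0, None, 0.05, 1.0]
mdl = 0.1
measured = [c for c in concentrations if c is not None]
median_conc = median(measured)

uncertainties = [
    sample_uncertainty(c, mdl, None if c is None else 0.10 * c, median_conc)
    for c in concentrations
]
```

Missing and below-detection samples end up with large uncertainties relative to their values, which is what reduces their weight in the PMF fit.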
[Figure: site map]
-------
Source Apportionment
Example Preliminary Analyses
Preliminary data analyses were performed including
investigation into data quality, local emissions, species
relationships, temporal patterns, etc.
Findings
- VOC concentrations were typically higher at Azusa compared to
Hawthorne, a result consistent with site locations relative to the
ocean.
- The Azusa air mass was more aged, as indicated by loss of
reactive species (except during rush hour); this is also consistent
with the sites' locations in the air basin.
- The Hawthorne site seemed to have constant, fresh emissions, with
little change in the relative abundance of VOCs throughout the day,
consistent with nearby industrial emissions.
- Both sites are significantly influenced by mobile sources.
-------
Source Apportionment
Example Hawthorne Site PMF Profiles
Six factors were identified by PMF at the
Hawthorne site following protocols
discussed in the Multivariate Workbook
(Brown and Hafner, 2005). The relative
percent of species mass attributed to
each profile is shown.
Profile names indicate analyst-identified
source types.
Some of the rationale for source
identification
- Biogenic. Isoprene is the only marker for
biogenic sources measured in this data set
and anthropogenic sources of isoprene are
insignificant; temporal patterns match
expectations.
- Liquid Gasoline. Abundance of C5 alkanes
agrees with previous work; temporal
patterns are consistent with mobile
sources.
- Evaporative Emissions. C3-C6 alkanes and
temporal patterns are similar to diurnal
temperature patterns.
- Motor Vehicle Exhaust. Typical exhaust
profile and temporal patterns are consistent
with rush-hour traffic.
- Natural Gas. Natural gas is mostly ethane
and propane. These are also long-lived
species that accumulate in the
atmosphere.
- Industrial Process Losses. Consistent with
nearby industrial emissions.
[Figure: Source Profiles From PMF at the Hawthorne site; panels include Biogenic, Liquid Gasoline, and Evaporative Emissions, each scaled from 0 to 100]
-------
Source Apportionment
Example Azusa Site PMF Profiles
Five factors were identified by PMF at the Azusa site. The
relative percent of species mass is shown.
Apportionment of these profiles to specific sources was performed
by the analyst based on knowledge of source profiles and
other investigations into the data.
Some of the rationale for source identification
- Coatings. Presence of C9-C11 alkanes is consistent with previous
results; temporal pattern showed a daytime peak consistent with
industrial operations.
- Other profiles are similar to those observed at the Hawthorne site.
[Figure: Source Profiles From PMF at the Azusa site; panels: Liquid Gas, Evaporative Emissions, Motor Vehicle Exhaust, Coatings, Biogenic, each scaled from 0 to 100]
-------
Source Apportionment
Example Percent of Total Mass
The profiles in the previous slides indicate the
relative fraction of VOCs within a profile.
The pie charts to the right show the importance
of each source profile by quantifying the amount
of TNMOC mass represented by each profile.
For example, in Hawthorne, evaporative
emissions accounted for 34% of TNMOC mass
during the summers of 2001-2003.
Mobile source emissions are dominant
contributors to TNMOC at both Hawthorne and
Azusa with 71% and 80% of total mass,
respectively (sum of liquid/unburned gasoline,
motor vehicle exhaust, and evaporative
emissions).
The remaining VOC mass is attributed to
coatings at the Azusa site and is split between
industrial processes and natural gas at the
Hawthorne site.
[Figure: pie charts of TNMOC mass by source profile; each entry gives the plotted value and the percent of total mass]
Hawthorne
- Liquid/Unburned Gas: 10.3, 13%
- Motor Vehicle Exhaust: 18.6, 24%
- Biogenic: 0.9, 1%
- Industrial Process Losses: 11.8, 15%
- Natural Gas: 10.4, 13%
- Evaporative Emissions: 26.7, 34%
Azusa
- Liquid/Unburned Gasoline: 65.0, 27%
- Biogenic: 8.0, 3%
- Coatings: 39.3, 17%
- Motor Vehicle Exhaust: 51.5, 22%
- Evaporative Emissions: 72.9, 31%
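The mobile source percentages quoted above can be checked directly from the pie chart values. The sketch below uses the values as labeled in the charts (their units are not stated in the extracted text); the dictionary and function names are ours, and the Hawthorne "Liquid/Unburned Gas" label is spelled out as "Liquid/Unburned Gasoline" so that both sites share one key.

```python
MOBILE_PROFILES = {
    "Liquid/Unburned Gasoline",
    "Motor Vehicle Exhaust",
    "Evaporative Emissions",
}

HAWTHORNE = {
    "Liquid/Unburned Gasoline": 10.3,
    "Motor Vehicle Exhaust": 18.6,
    "Biogenic": 0.9,
    "Industrial Process Losses": 11.8,
    "Natural Gas": 10.4,
    "Evaporative Emissions": 26.7,
}

AZUSA = {
    "Liquid/Unburned Gasoline": 65.0,
    "Biogenic": 8.0,
    "Coatings": 39.3,
    "Motor Vehicle Exhaust": 51.5,
    "Evaporative Emissions": 72.9,
}


def percent_of_total(masses, selected):
    """Percent of total mass contributed by the selected profiles."""
    total = sum(masses.values())
    return 100.0 * sum(v for name, v in masses.items() if name in selected) / total


hawthorne_mobile = percent_of_total(HAWTHORNE, MOBILE_PROFILES)
azusa_mobile = percent_of_total(AZUSA, MOBILE_PROFILES)
```

Rounded to whole percents, the results match the 71% and 80% given in the text.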
-------
Source Apportionment
Example Apportionment of Benzene
Apportionment of individual species between profiles can
also provide interesting analyses.
For example, benzene is a significant cancer risk driver at
most sites in the United States. Source apportionment of
benzene can help policy makers develop effective control
regulations.
The figures to the right show the percentage of benzene
(by mass) attributed to each source profile identified by
PMF at the Hawthorne and Azusa sites.
As expected, both sites show a significant percentage of
benzene mass attributed to mobile sources and gasoline
evaporation. Interestingly, almost one-fourth of the
benzene at the Hawthorne site is attributed to natural gas.
Benzene is not emitted in natural gas (but may be emitted
from combustion of natural gas); however, a significant
fraction of ambient benzene is associated with air parcels
containing ethane and propane (key components of
natural gas). Since benzene is relatively long-lived, it is
possible that benzene in this profile represents urban
background. The same observation can be made for the
benzene in the biogenic profile—biogenic benzene
emissions are very small.
[Figure: pie charts of the percentage of benzene mass attributed to each source profile]
Hawthorne
- Industrial Process Losses: 11%
- Biogenic: 6%
- Liquid Gas: 10%
- Natural Gas: 21%
- Evaporative Emissions: 7%
- Motor Vehicle: 45%
Azusa
- Coatings: 6%
- Biogenic: 14%
- Motor Vehicle Exhaust: 37%
- Liquid Gas: 25%
- Evaporative Emissions: 18%
-------
Source Apportionment
Summary
Source apportionment steps
• Review data quality and spatial/temporal characteristics.
• Prepare data for source apportionment.
- Processing the necessary data differs among the tools, but typically the
analyst needs to select pollutants with sufficient data above detection and
understand/quantify uncertainty for each concentration. Guidance is provided
in the EPA's Multivariate Receptor Modeling workbook (Brown et al., 2007b).
• Understand the airshed by assessing likely emissions sources and local
meteorology. This helps set expectations for what the source apportionment
results should show.
• With guidance from literature and workbooks, apply source apportionment
tools. This is an iterative process!
• Evaluate results for reasonableness.
• Compare results to emission inventories.
With respect to toxics data, PMF and Unmix have been applied to a
range of data sets while CMB applications have largely been focused
on PM data.
-------
Trajectory Analysis
Introduction
Trajectory analysis uses knowledge of air mass
movement to trace the most likely areas of influence
on high pollutant concentrations.
The use of trajectory analysis after source
apportionment helps analysts better understand,
interpret, and verify source apportionment results.
Analysis techniques
- Backward trajectories
- Trajectory densities
- Potential Source Contribution Function (PSCF)
- Conditional Probability Function (CPF)
-------
Trajectory Analysis
Backward Trajectories
Backward air mass trajectories
estimate where air parcels were
during previous hours.
Air mass trajectories can be
employed to investigate long-
term, synoptic-scale
meteorological conditions
associated with high
concentrations of individual
factors.
Estimates grow less certain as
time elapses.
The NOAA HYSPLIT model is
one means to run trajectories.
It is available at
http://www.arl.noaa.gov/ready/hysplit4.html
[Figure: 48-hour back trajectories at 50, 300, and 1000 m; HYSPLIT trajectory hourly endpoints for top 20% highest ...]
Trajectories are often plotted as single points for
every hour backwards from the start point as
shown here (also called a spaghetti plot).
However, each point should be viewed not as a specific
location but as a small area around that point,
taken together with the preceding and following points.
-------
Trajectory Analysis
Trajectory Densities
[Figure: 48-hour back trajectories at 50, 300, and 1000 m; HYSPLIT trajectory hourly endpoints for days with the 20% worst visibility conditions in Indianapolis in 2002]
[Figure: Spatial Probability Density (SPD) of trajectory endpoints processed within GIS]
Trajectories are often processed into density, rather than "spaghetti", plots.
Higher density corresponds to more trajectories passing through that grid
square. This plotting enables a number of useful analysis techniques, such
as Potential Source Contribution Function (PSCF) analysis.
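A density grid of this kind can be sketched by counting trajectory endpoints per grid square. The function name, the 1-degree cell size, and the endpoint coordinates are hypothetical, and the sketch counts hourly endpoints rather than distinct trajectories; in practice this step is usually done in a GIS.

```python
import math
from collections import Counter


def endpoint_density(endpoints, cell_deg=1.0):
    """Count trajectory endpoints falling in each (lat, lon) grid cell.

    Cells are identified by the integer index of their lower-left corner.
    """
    counts = Counter()
    for lat, lon in endpoints:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical hourly endpoints (latitude, longitude in degrees).
endpoints = [(39.7, -86.2), (39.2, -86.9), (40.1, -87.5), (39.9, -86.1)]
density = endpoint_density(endpoints)
```

Dividing each count by the total number of endpoints would turn the counts into a relative density that can be mapped.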
-------
Trajectory Analysis
Potential Source Contribution Function (PSCF)
PSCF uses HYSPLIT backward trajectories to
determine probable locations of emission
sources.
$$\mathrm{PSCF}_{ij} = \frac{m_{ij}}{n_{ij}}$$
$n_{ij}$ = number of times a trajectory passed through cell (i,j).
$m_{ij}$ = number of times the source contribution peaked while a
trajectory passed through cell (i,j).
The top 10%-20% of source contributions are used for $m_{ij}$.
In the example on the right, all five-day backward
trajectories, computed every two hours, were applied to the
corresponding 24-hr source contributions.
PSCF was calculated for each 1° x 1° cell, and the results are
displayed as maps on which PSCF values
ranging from 0 to 1 are shown on a color scale.
[Figure legend: PSCF color scale from 0 to 1]
PSCF function plot for sulfate affecting
Philadelphia. Higher probability is
associated with an area of high SO2
emissions. Computations and graphics
are made using ArcMap or other GIS
tool.
(Source: Begum et al., 2005)
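The PSCF ratio can be sketched as two endpoint counts per grid cell: all endpoints, and endpoints from samples whose source contribution is in the top fraction. The function name, the tie handling (samples equal to the threshold count as high), and the example trajectories and contributions are assumptions for illustration only.

```python
from collections import Counter
from math import floor


def pscf(trajectories, contributions, top_fraction=0.2, cell_deg=1.0):
    """Return {cell: m_ij / n_ij} for every cell crossed by a trajectory.

    trajectories[k] is the list of (lat, lon) endpoints for sample k;
    contributions[k] is the source contribution for the same sample.
    """
    ranked = sorted(contributions, reverse=True)
    n_high = max(1, round(top_fraction * len(ranked)))
    threshold = ranked[n_high - 1]
    n = Counter()  # all endpoints per cell
    m = Counter()  # endpoints from high-contribution samples per cell
    for points, contribution in zip(trajectories, contributions):
        for lat, lon in points:
            cell = (floor(lat / cell_deg), floor(lon / cell_deg))
            n[cell] += 1
            if contribution >= threshold:
                m[cell] += 1
    return {cell: m[cell] / n[cell] for cell in n}


# Hypothetical data: five samples with two endpoints each.
trajectories = [
    [(40.5, -75.5), (40.5, -76.5)],
    [(40.5, -76.5), (41.5, -77.5)],
    [(39.5, -78.5), (40.5, -76.5)],
    [(41.5, -77.5), (40.5, -75.5)],
    [(40.5, -76.5), (40.5, -75.5)],
]
contributions = [1.0, 2.0, 9.0, 3.0, 2.0]
pscf_values = pscf(trajectories, contributions)
```

Cells crossed by very few endpoints can show a PSCF of 1 by chance, so a minimum-count filter or down-weighting of sparse cells is a common refinement that is not shown here.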
-------
Trajectory Analysis
Conditional Probability Function (CPF)
CPF uses wind direction, rather than trajectories, to determine the likely
direction of sources. CPF compares days when concentrations were
highest to the average transport pattern (i.e., the climatology).
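$$\mathrm{CPF} = \frac{m_{\Delta\theta}}{n_{\Delta\theta}}$$
$n_{\Delta\theta}$ = number of times wind direction is
from sector $\Delta\theta$.
$m_{\Delta\theta}$ = number of times source
contributions are high while wind
direction was from sector $\Delta\theta$.
A CPF value close to 1.0 for a given
sector ($\Delta\theta$) indicates a high probability
that a source is located in that direction.
[Figure: polar CPF plot by wind direction sector]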
Example CPF plot for the highest 25%
contribution from a PMF factor pointing
to the northwest of site as a possible
source region. Computations can be
programmed into Microsoft Excel or
other statistical packages.
(Source: Kim et al., 2004)
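The CPF calculation can be sketched in a few lines. The function name, the 30-degree sector width, and the wind directions and contributions are hypothetical; the highest 25% threshold follows the example plot caption, and samples equal to the threshold are counted as high.

```python
def cpf(wind_dirs, contributions, top_fraction=0.25, sector_deg=30.0):
    """Return CPF per wind sector (None where no winds were observed).

    wind_dirs are in degrees (direction the wind blows from).
    """
    ranked = sorted(contributions, reverse=True)
    n_high = max(1, round(top_fraction * len(ranked)))
    threshold = ranked[n_high - 1]
    n_sectors = int(360 / sector_deg)
    n = [0] * n_sectors  # all samples per sector
    m = [0] * n_sectors  # high-contribution samples per sector
    for wind_dir, contribution in zip(wind_dirs, contributions):
        k = int((wind_dir % 360.0) // sector_deg)
        n[k] += 1
        if contribution >= threshold:
            m[k] += 1
    return [m[k] / n[k] if n[k] else None for k in range(n_sectors)]


# Hypothetical samples: wind direction and PMF factor contribution.
wind_dirs = [310, 320, 315, 100, 95, 200, 305, 110]
contributions = [9.0, 8.0, 2.0, 1.0, 2.0, 1.0, 3.0, 2.0]
cpf_by_sector = cpf(wind_dirs, contributions)
```

In this made-up example the sector covering 300 to 330 degrees has the highest CPF, pointing to the northwest. Calm or very light wind periods are often excluded before computing CPF because wind direction is poorly defined then; that filter is left out of the sketch.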
-------
Trajectory Analysis
Interpretation
No matter which trajectory analysis is used, interpretation of results is
similar. These methods are all complementary to source
apportionment or can be standalone to assess source regions. No one
method shown is superior.
- To investigate a number of days, ensemble methods are preferred (such
as trajectory densities). These methods help identify source areas.
- CPF also requires a number of days to be included, but helps point toward
a particular direction.
- Single trajectories are useful when investigating an individual sample.
The following questions may be investigated for verification of results:
- Are the results consistent with the conceptual model of emissions and removal of air
toxics?
- Are these the areas from which emissions influence would be expected?
- Does the transport pattern make sense with respect to the age/chemistry of
a given factor (i.e., more transport and chemistry are associated with
secondary pollutants such as formaldehyde)?
-------
Trajectory Analysis
Using CPF Results
This approach is based on the assumption that wind direction and trajectory
analysis results should be consistent with the spatial distribution of the
sources in the emission inventory.
In the example at right,
the directions of source
regions from the CPF
plots agree with the
locations of propene
sources in the area (red
circles), giving more
confidence to the source
apportionment results.
A similar approach can
be employed for toxic
species.
(Source: Berkowitz et al., 2004)
-------
Emission Inventory Evaluation
Introduction
• Why bother evaluating emissions data?
- Emission inventory development is an intricate process that involves estimating and
compiling emissions activity data from hundreds of point, area, and mobile sources in a
given region. Because of the complexities involved in developing emission inventories and
the implications of errors in the inventory on air quality model performance and control
strategy assessment, it is important to evaluate the accuracy and representativeness of any
inventory that is intended for use in modeling. Furthermore, existing emission factor and
activity data for sources of air toxics and their precursors are limited and the quality of the
data is questionable. An emission inventory evaluation should be performed before the
data are used in modeling.
• What tools are available for assessing emissions data?
- Several techniques are used to evaluate emissions data including "common sense" review
of the data; source-receptor methods such as PMF; bottom-up evaluations that begin with
emissions activity data and estimate the corresponding emissions; and top-down
evaluations that compare emission estimates to ambient air quality data. Each evaluation
method has strengths and limitations.
- Based on the results of an emissions evaluation, recommendations can be made to improve
an emission inventory, if warranted. Local agencies responsible for developing an inventory
can then make revisions to the inventory data prior to modeling.
- PM2.5 and PAMS data analysis workbooks provide some example analyses and approaches
that are applicable to air toxics data (Main and Roberts, 2000; 2001).
-------
Emission Inventory Evaluation
Using Ambient Data
Ambient air quality data can be used to evaluate
emission estimates ("top-down"); however, the
following issues should be considered:
- Proper spatial and temporal matching of emission estimates
and ambient data is needed.
- Ambient background levels of air toxics need to be
considered.
- Meteorological effects need to be considered.
- Comparisons are only valid for primarily emitted air toxics.
- To compare ambient concentrations to emissions estimates, a
reference pollutant or total value (such as total VOC) is needed to create
a ratio. Typically, NOx or CO is used.
-------
Emission Inventory Evaluation
Top-down Approach
Top-down emissions evaluation is a method of comparing
emissions estimates with ambient air quality data.
Ambient/emission inventory comparisons are useful for
examining the relative composition of emission inventories;
they are not useful for verifying absolute pollutant masses
unless they are combined with bottom-up evaluations. The
top-down method has demonstrated success at reconciling
emission estimates of VOC and NOx.
Top-down approach:
Compare ambient- and emissions-derived primary air
toxic/NOx, CO, or VOC ratios.
If early morning samples are available (such as with PAMS data), these sampling
periods are the most appropriate to use because emissions are generally high,
mixing depths are low, winds are light, and photochemical reactions are minimized.
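The ratio comparison at the heart of the top-down approach can be sketched as follows. All numbers are hypothetical, the function names are ours, and the sketch assumes ambient and inventory values have already been put on a consistent basis (for example, both mass-based) and matched in space and time.

```python
def ratios_to_reference(values, reference):
    """Ratio of each species to the reference pollutant."""
    ref = values[reference]
    return {name: v / ref for name, v in values.items() if name != reference}


def top_down_comparison(ambient, inventory, reference="CO"):
    """Return {species: inventory-derived ratio / ambient-derived ratio}."""
    ambient_ratios = ratios_to_reference(ambient, reference)
    inventory_ratios = ratios_to_reference(inventory, reference)
    return {name: inventory_ratios[name] / ambient_ratios[name]
            for name in ambient_ratios if name in inventory_ratios}


# Hypothetical early-morning ambient averages and inventory totals.
ambient = {"benzene": 1.2, "toluene": 3.0, "CO": 600.0}
inventory = {"benzene": 0.8, "toluene": 1.1, "CO": 200.0}

comparison = top_down_comparison(ambient, inventory)
```

A value near 1 suggests the inventory's relative composition agrees with the ambient data; a value of 2 means the inventory-derived ratio is twice the ambient-derived one. As noted above, this says nothing about absolute emission masses.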
-------
Emission Inventory Evaluation
Example
[Figure: bar chart on a percent scale comparing ambient-derived and EI-derived compositions. Legend: Ambient - Avg; Ambient - Median; EI - Low Level Only; EI - With Elevated Sources]
At this PAMS site, the EI-derived compositions of benzene are significantly higher than
the ambient-derived compositions. Examination of point source records near the site
indicates that the sources of these emissions are chemical manufacturing operations. It
appears that the chemical speciation profiles used to speciate the point source inventory
over-represent the relative amount of benzene (by about a factor of 2 to 5). Similarly,
xylenes are overestimated.
Toluene and 1,3-butadiene are only slightly overestimated in the EI at this site.
-------
Evaluating Models
Introduction
Air quality models have been used for decades to assess the potential
impact of emission sources on ambient concentrations of criteria and toxic
air pollutants.
In the past decade, air quality models have also been used as planning
tools for criteria pollutants, e.g., SIP development and attainment
demonstration.
However, until recently, air quality models have not been used as
planning tools for air toxics, due to the lack of measurements with which
to evaluate the models.
The need to assess the usefulness of these models in air quality planning
and to improve both modeling and evaluation methods has been identified
- How well are we modeling air toxics?
Reasonable agreement between model and monitor concentrations was
set by EPA as "within a factor of 2".
Example of model-to-monitor comparisons for NATA and methodology for
comparisons are provided at:
-------
Evaluating Models
Methodology
Modeled Data. Modeled data of interest for air toxics include publicly
available and widely used NATA data. For this example, NATA99
model results were used.
Monitored Data. In order to reduce perturbations from meteorology
and other data biases in monitored data, the site average of 1998-
2000 valid annual averages was used for comparison to model output.
The finest spatial resolution of NATA99 data is the census tract level, so
NATA99 modeled results should be related to ambient monitoring data
at this level. If multiple sites fall into one census tract, the sites should
still be individually evaluated.
Analyses. If data from many sites are available, box plots of
modeled/monitored data can be examined; fewer sites lend
themselves to a scatter plot approach of model-to-monitor data.
Model-to-monitor ratios within a factor of 2 are considered to be within
the acceptable limits of a good comparison; see
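The model-to-monitor comparison described above can be sketched as a site average followed by a factor-of-2 check. Site names and all concentrations are hypothetical, and the function names are ours; a year with no valid annual average is represented by None.

```python
def site_average(annual_averages):
    """Average of the valid annual averages (None marks an invalid year)."""
    valid = [v for v in annual_averages.values() if v is not None]
    return sum(valid) / len(valid) if valid else None


def within_factor_of_two(modeled, monitored):
    """True if the model-to-monitor ratio is between 0.5 and 2."""
    ratio = modeled / monitored
    return 0.5 <= ratio <= 2.0


# Hypothetical sites: modeled concentration and 1998-2000 annual averages.
sites = {
    "site-A": {"modeled": 1.1, "annual": {1998: 1.5, 1999: 1.3, 2000: None}},
    "site-B": {"modeled": 0.4, "annual": {1998: 1.0, 1999: 1.2, 2000: 1.1}},
}

results = {}
for name, data in sites.items():
    monitored = site_average(data["annual"])
    results[name] = within_factor_of_two(data["modeled"], monitored)
```

Here site-A agrees within a factor of 2 while site-B is under-predicted by more than a factor of 2. With many sites, the same ratios feed the box plots; with few sites, a scatter plot is easier to read.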
-------
Evaluating Models
Using Box Plots
The figure shows the ratio of NATA99
modeled data to monitored data at an
urban area's sites to indicate the
accuracy of modeled data.
Red lines indicate the cutoff for
modeled-to-monitored concentrations
within a factor of 2.
Acetaldehyde, benzene,
dichloromethane, and trichloroethene
typically agreed within a factor of 2,
consistent with national level
comparisons of model and monitor
data.
However, ethylbenzene,
formaldehyde, carbon tetrachloride,
chloroform and tetrachloroethylene
showed monitored concentrations
more than a factor of
2 higher than model estimates at
these sites.
[Figure: box plots of model-to-monitor concentration ratios by pollutant, with red lines marking the factor-of-2 bounds]
-------
Evaluating Models
Using Scatter Plots
Modeled and monitored concentrations can
also be compared using scatter plots,
plotting each data pair (ambient site-
average, model output) separately. For
NATA 1999, benzene data compared well to
the modeled data.
There are several reasons why we would
expect good agreement between model
prediction and monitor results for benzene.
- It is a widely distributed pollutant which is
emitted from point, area, and mobile
sources. Thus, if the model is biased in the way
it handles any one of these source categories,
the bias will likely be dampened by one of the
other sources.
- An estimated background concentration was
available for benzene in the modeling effort.
- There is a large number (87) of monitoring sites
for benzene for this comparison, resulting in an
adequate sample size for the statistics in the
comparison.
- Monitoring technology for benzene has a long
history, suggesting that the monitoring data
reflect actual ambient concentrations.
- Benzene emissions have been tracked for many
years, so there is some confidence in emission
estimates.
[Figure: "Model to Monitor plot for Benzene"; x-axis: Monitor Concentration; y-axis: Model Conc.; reference lines at 2:1 and 1:2; subtitle: ASPEN model concentrations vs. monitor averages]
Model-to-monitor scatter plot for benzene. Most
points fall within the factor of 2 wedge, and none
are far outside the wedge. From
http://www.epa.gov/ttn/atw/nata/draft6.html#secV
-------
Network Assessment
Introduction
Air quality agencies may choose to re-evaluate and reconfigure
monitoring networks because
- Air quality has changed;
- Populations and behaviors have changed;
- New air quality objectives have been established
(e.g., air toxics reductions, PM2.5, regional haze); and
- Understanding of air quality issues and monitoring capabilities have
improved.
Network assessments may include
- Re-evaluation of the objectives and budget for air monitoring;
- Evaluation of a network's effectiveness and efficiency relative to its
objectives and costs; and
- Development of recommendations for network reconfigurations and
improvements.
Network assessment guidance is available from EPA at
http://www.epa.gov/ttn/amtic/cpreldoc.html
-------
Network Assessment
Methodology
Some things to consider when performing a
network assessment:
• Length of monitoring. Takes into account a site's
monitoring history because long data records can be
highly useful in trends and accountability analyses.
• Suitability analyses. Combines many data sets such as
population or population change, meteorology,
topography, and emissions to assess suitability of current
or future monitoring locations.
-------
Network Assessment
Period of Operation (1 of 2)
Motivation
- Monitors that have long
historical trends are
valuable for tracking
trends.
- This technique places
the most importance on
sites with the longest
continuous trend record.
Resources needed
- Historical monitor data,
typically valid annual
averages.
[Figure: number of monitoring sites per year by pollutant. Legend: 1,3-Butadiene; Acetaldehyde; Benzene; Chromium (TSP); 1,4-Dichlorobenzene; Arsenic (TSP); Carbon Tetrachloride; Nickel (TSP)]
The figure shows the number of monitoring sites per year
for a variety of air toxics. The number of air toxics
monitoring sites has increased dramatically since 1990.
-------
Network Assessment
Period of Operation (2 of 2)
City, State          AQS Site ID    Years
Stockton, CA         06-077-1002    13
Baltimore, MD        24-510-0040    12
Los Angeles, CA      06-037-1002    11
San Francisco, CA    06-001-1001    10
Fresno, CA           06-019-0008    10
Baltimore, MD        24-005-3001    10
Los Angeles, CA      06-037-1103    9
Los Angeles, CA      06-037-4002    9
San Diego, CA        06-073-0003    9
San Francisco, CA    06-075-0005    9
San Jose, CA         06-085-0004    9
Baltimore, MD        24-510-0006    9
Sacramento, CA       06-061-0006    8
San Diego, CA        06-073-0001    8
Oxnard, CA           06-111-2002    8
Chicago, IL-IN-WI    18-089-2008    8
Baltimore, MD        24-510-0035    8
[Figure: chart for tetrachloroethylene]
The table lists the number of annual averages available for
tetrachloroethylene at toxics monitoring sites from 1990 to 2003.
For this analysis, sites with the longest record would be rated
higher than those with shorter records.
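A period-of-operation ranking like the one in the table can be sketched from the list of years with valid annual averages at each site. Site names and years are hypothetical and the function names are ours. The table counts available annual averages, while the motivation stresses the longest continuous record, so the sketch reports both and sorts on the continuous run first; that ordering is our choice.

```python
def longest_run(years):
    """Length of the longest run of consecutive years."""
    best = run = 0
    previous = None
    for year in sorted(years):
        run = run + 1 if previous is not None and year == previous + 1 else 1
        best = max(best, run)
        previous = year
    return best


def rank_sites(valid_years_by_site):
    """Return (site, number of valid years, longest continuous run), best first."""
    rows = [(site, len(set(years)), longest_run(set(years)))
            for site, years in valid_years_by_site.items()]
    return sorted(rows, key=lambda row: (-row[2], -row[1], row[0]))


# Hypothetical years with valid annual averages at three sites.
valid_years = {
    "site-A": list(range(1990, 2003)),
    "site-B": [1990, 1991, 1992] + list(range(1995, 2004)),
    "site-C": list(range(1996, 2004)),
}

ranking = rank_sites(valid_years)
```

Site-B has 12 valid years but a gap in 1993-1994, so its longest continuous record is 9 years and it ranks between the other two sites.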
-------
Network Assessment
Suitability Modeling/Spatial Analysis (1 of 2)
• Motivation
- This method may be used to identify suitable monitoring locations
based on user-selected criteria.
- Geographic map layers representing important criteria, such as
emissions source influence, proximity to populated places, urban
or rural land use, and site accessibility, can be compiled and
merged to develop a composite map representing the combination
of important criteria for a defined area.
- The results indicate the best locations to site monitors based on
the input criteria and may be used to guide new monitor siting or to
understand how changes may impact the current monitoring
network.
• Resources needed
- GIS, site locations, population and other
demographic/socioeconomic data, emission inventory data
- Meteorology and concentration data may be helpful, but are not
necessary
- Skilled GIS analyst
-------
Network Assessment
Suitability Modeling/Spatial Analysis (2 of 2)
A representation of the process of suitability modeling and spatial analysis
1. Input Data: Point, line, or polygon geographic data (examples shown:
points, lines, population, elevation)
2. Gridded Data: Create distance contours or density plots from the data
sets
3. Reclassified Data: Reclassify data to create a common scale
4. Weight and combine data sets
5. Output suitability model (ranging from high suitability to low suitability)
-------
Network Assessment
Suitability Modeling Example
The goal of this analysis of the Phoenix area was to use
GIS technology to identify locations within an area
potentially suitable for placing air toxics and/or particulate
monitors to better assess diesel particulate matter (DPM)
emissions impacts on population.
The emission inventory was assessed to determine
- predominant sources of DPM; and
- the best available geographic data to represent the spatial pattern
of the identified emission sources in the region.
The relative importance of each geographic data set was
determined based on its potential DPM contribution.
The input layers were weighted accordingly and combined
to produce a suitability map using the Spatial Analyst GIS
tool.
-------
Network Assessment
Example Suitability Modeling Data Layers
1. Traffic volume (Annual Average Daily
Traffic, AADT)
2. Heavy-duty truck volume (from AADT
data)
3. Locations of railroads and
transportation depots
4. Residential and commercial
development areas
5. Golf courses and cemetery locations
(lawn and garden equipment usage)
6. Airport locations
7. PM2.5 point source locations (weight
assigned to each source depends on
the source's relative EC contribution)
8. Total population and sensitive
population (e.g., under 5 and over
65 years of age) density
9. Annual average gridded wind fields
representing predominant wind
direction throughout the region
[Figure: map of link-based Annual Average Daily Traffic (AADT); legend includes airports, city boundaries, tribal land boundaries, and county boundaries]
-------
Network Assessment
Example Suitability Modeling Weighting
Weighting Scheme - two model scenarios were used:
1. Proximity to diesel emission sources (hot spot)
2. Proximity of population to diesel sources

Layer                                               (1) Hot Spot   (2) Total Population   Weighting Criteria
Density of total population                         -              40%                    High population density = more suitable
Heavy-duty vehicle activity                         20%            12%                    High traffic density = more suitable
Light-duty vehicle activity                         15%            9%                     High traffic density = more suitable
Transportation distribution facility                20%            12%                    Close to facility = more suitable
Lawn/garden activity areas                          12%            7.2%                   High activity density = more suitable
Commercial/residential construction activity areas  20%            12%                    High activity density = more suitable
Distance to airports                                2%             1.2%                   Close to airport = more suitable
Distance to railroads                               2%             1.2%                   Close to railroad = more suitable
PM2.5 point source activity                         9%             5.4%                   High non-EC PM2.5 emissions density = less suitable
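The weighting step can be sketched as a weighted overlay of reclassified layers. Two points here are our reading rather than something the workbook states: the Total Population weights in the table equal the Hot Spot weights scaled by 0.6 with a 40% population layer added, and the overlay normalizes by the weights of whichever layers are supplied. Layer keys, function names, and the 2 x 2 example grids are hypothetical.

```python
# Scenario 1 (hot spot) weights in percent, from the table above.
HOT_SPOT_WEIGHTS = {
    "heavy_duty": 20.0,
    "light_duty": 15.0,
    "transport_facility": 20.0,
    "lawn_garden": 12.0,
    "construction": 20.0,
    "airports": 2.0,
    "railroads": 2.0,
    "pm25_point": 9.0,
}


def add_population_layer(weights, population_weight=40.0):
    """Scale existing weights down and add a total population layer."""
    scale = (100.0 - population_weight) / 100.0
    scaled = {name: w * scale for name, w in weights.items()}
    scaled["total_population"] = population_weight
    return scaled


def weighted_overlay(layers, weights):
    """Combine reclassified grids (common scale) into one suitability grid."""
    names = list(layers)
    total_weight = sum(weights[name] for name in names)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    return [[sum(weights[name] * layers[name][r][c] for name in names) / total_weight
             for c in range(n_cols)]
            for r in range(n_rows)]


population_weights = add_population_layer(HOT_SPOT_WEIGHTS)

# Hypothetical reclassified scores (1 = least suitable, 10 = most suitable).
layers = {
    "total_population": [[10, 2], [5, 1]],
    "heavy_duty": [[8, 9], [1, 1]],
}
suitability = weighted_overlay(layers, population_weights)
```

Layers where a high value means less suitable (such as the non-EC PM2.5 point source layer) would be inverted during reclassification so that every grid uses the same direction before it is weighted.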
-------
Network Assessment
Example Results of Suitability Modeling
The map shows the
results of combining all
data layers in Scenario 1
(table on previous slide).
The map indicates that
the Glendale area is a
hot spot for both diesel
influence and population,
as well as the area
around the Phoenix
Supersite.
The area between
Guadalupe and Mesa is
also suitable for
monitoring to better
understand DPM
impacts.
[Figure: suitability map titled "Scenario 1 (population and meteorology included)" covering the Phoenix area. Legend: Suitability Model; AQ Monitor Location; Interstate/Freeway; Urban Boundary]
Total Population/Wind Influence Weighting Scheme
- Total Population Density = 40%
- Heavy Duty AADT Roads = 12%
- Transportation Facilities = 12%
- Commercial/Residential Development Areas = 12%
- Light Duty AADT Roads = 9%
- Commercial Lawn/Garden Usage Areas = 7.2%
- PM2.5 Point Sources = 5.4%
- Railroads = 1.2%
- Airports = 1.2%
-------
Network Assessment
Suitability Analysis Summary
Results of this analysis assisted decision makers in
- Assessing the utility of current monitors;
- Selecting locations for new monitors;
- Setting monitoring priorities; and
- Investigating a range of monitoring objectives and
considerations.
Suitability analysis can improve the effectiveness of
monitoring decisions
June 2009 Section 7 - Advanced Analyses 43
-------
Resources
PMF, Unmix, and CMB:
http://www.epa.gov/scram001/receptorindex.htm
EPA's Multivariate Receptor Modeling Workbook:
http://www.sonomatechdata.com/sti_workbooks/#MVRMWB
NOAA HYSPLIT model:
http://www.arl.noaa.gov/ready/hysplit4.html
EPA SPECIATE, recently updated (version 4.0):
http://www.epa.gov/ttn/chief/software/speciate/index.html
Network assessment guidance:
http://www.epa.gov/ttn/amtic/cpreldoc.html
June 2009 Section 7 - Advanced Analyses 44
-------
References
Begum B.A., Kim E., Jeong C.H., Lee D.W., and Hopke P. (2005) Evaluation of the potential source contribution
function using the 2002 Quebec forest fire episode. Atmos. Environ. 39, 3719-3724.
Berkowitz C.M., Xie Y.-L, Jolly J., and Estes M. (2004) Receptor modeling and analysis: early first results from the
2003 enhanced Houston Auto-GC network. Presented at the TERC Science Advisory Committee (SAC) Meeting,
October 13. Available on the Internet at
.
Brown S.G., Frankel A., and Hafner H.R. (2007a) Source apportionment of VOCs in the Los Angeles area using
positive matrix factorization. Atmos. Environ. 41, 227-237 (STI-2725).
Brown S.G., Wade K.S., and Hafner H.R. (2007b) Multivariate receptor modeling workbook. Prepared for the U.S.
Environmental Protection Agency, Office of Research and Development, Research Triangle Park, NC, by Sonoma
Technology, Inc., Petaluma, CA, STI-906207.01-3216, August.
Chinkin L.R., Coe D.L., Hafner H.R., and Tamura T.M. (2003) Air toxics emission inventory training workshop.
Sponsored by the U.S. Environmental Protection Agency, Region IX, Richmond, CA. Prepared by Sonoma
Technology, Inc., Petaluma, CA, STI-903320-2398, July 15-16.
Friedlander S.K. (1973) Chemical element balances and identification of air pollution sources. Environ. Sci. Technol. 7,
235-240.
Fujita E.M., Croes B.E., Bennett C.L., Lawson D.R., Lurmann F.W., and Main H.H. (1992) Comparison of emission
inventory and ambient concentration ratios of CO, NMOG, and NOx in California's South Coast Air Basin. J. Air &
Waste Manag. Assoc. 42, 264-276.
Fujita E.M., Watson J.G., Chow J.C., and Lu Z. (1994) Validation of the chemical mass balance receptor model applied
to hydrocarbon source apportionment in the Southern California Air Study. Environ. Sci. Technol. 28, 1633-1649.
Gordon G.E. (1988) Receptor models. Environ. Sci. & Technol. 22(10), 1132-1142.
Hafner H.R., Penfold B.M., and Brown S.G. (2005) Using GIS tools to select suitable DPM monitoring locations:
Phoenix, Arizona. Presented at the 2005 Air Toxics Summit, Seeking Solutions for our Rural and Urban
Communities, Portland, OR, October 18-19, by Sonoma Technology, Inc., Petaluma, CA (STI-904234-2755).
June 2009 Section 7 - Advanced Analyses 45
-------
References
Henry R.C., Lewis C.W., Hopke P.K., and Williamson H.J. (1984) Review of receptor model fundamentals. Atmos.
Environ. 18(8), 1507-1515.
Henry R.C. (1997) History and fundamentals of multivariate air quality receptor models. Chemometrics and Intelligent
Laboratory Systems 37, 525-530.
Henry R.C. (2000) Unmix Version 2 Manual. Available on the Internet at
.
Henry R.C. (2002) Receptor modeling. In Encyclopedia of Environmetrics, A.H. El-Shaarawi and W.W. Piegorsch eds.,
John Wiley & Sons, Ltd, Chichester, 1706-1721.
Hidy G.M. and Friedlander S.K. (1971) The nature of the Los Angeles aerosol. In proceedings from the Second
International Clean Air Congress, 391-404, Academic Press, New York.
Hopke P.K. (2003) A guide to positive matrix factorization. Prepared for Positive Matrix Factorization Program,
Potsdam, NY, by the Department of Chemistry, Clarkson University, Potsdam, NY.
Hopke P.K., Ramadan Z., Paatero P., Norris G., Landis M., Williams R., and Lewis C.W. (2003) Receptor Modeling of
Ambient and Personal Exposure Samples: 1998 Baltimore Particulate Matter Epidemiology-Exposure Study. Atmos.
Environ. 37, 3289-3302.
Kim E., Hopke P.K., Larson T.V., and Covert D.S. (2004a) Analysis of ambient particle size distributions using UNMIX
and positive matrix factorization. Environ. Sci. Technol. 38 (1), 202-209.
Kim E., Hopke P.K., Larson T.V., Maykut N.N., and Lewtas J. (2004b) Factor analysis of Seattle fine particles. Aerosol
Sci. Technol. 38 (7), 724-738.
Kim E., Hopke P.K., Kenski D.M., and Koerber M. (2005b) Sources of fine particles in a rural Midwestern U.S. area.
Environ. Sci. Technol. 39 (13), 4953-4960.
Kim E. and Hopke P.K. (2004) Improving source identification of fine particles in a rural northeastern U.S. area utilizing
temperature-resolved carbon fractions. J. Geophys. Res. 109 (D9), D09204, doi:10.1029/2003JD004199.
June 2009 Section 7 - Advanced Analyses 46
-------
References
Kim E., Hopke P.K., and Qin Y. (2005c) Estimation of organic carbon blank values and error structures of the speciation
trends network data for source apportionment. J. Air & Waste Manag. Assoc. 55, 1190-1199.
Lindsey C.G., Chen J., Dye T.S., Richards L.W., and Blumenthal D.L. (1999) Meteorological processes affecting the
transport of emissions from the Navajo Generating Station to Grand Canyon National Park. J. Appl. Meteorol. 38
(No. 8), 1031-1048.
Main H.H. and Roberts P.T. (2001) PM2.5 data analysis workbook. Draft workbook prepared for the U.S. Environmental
Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC, by Sonoma
Technology, Inc., Petaluma, CA, STI-900242-1988-DWB, February.
Main H.H. and Roberts P.T. (2000) PAMS data analysis workbook: illustrating the use of PAMS data to support ozone
control programs. Prepared for the U.S. Environmental Protection Agency, Research Triangle Park, NC, by Sonoma
Technology, Inc., Petaluma, CA, STI-900243-1987-FWB, September.
Larsen R.K. and Baker J.E. (2003) Source apportionment of polycyclic aromatic hydrocarbons in the urban atmosphere:
a comparison of three methods. Environ. Sci. Technol. 37 (9), 1873-1881.
Lewis C.W., Norris G.A., Conner T.L., and Henry R.C. (2003) Source apportionment of Phoenix PM2.5 aerosol with the
Unmix receptor model. J. Air & Waste Manag. Assoc. 53 (3), 325-338
Paatero P. and Tapper U. (1994) Positive matrix factorization: a non-negative factor model with optimal utilization of
error estimates of data values. Environmetrics 5, 111-126.
Paatero P. (1997) Least squares formulation of robust non-negative factor analysis. Chemometrics and Intelligent
Laboratory Systems 37, 23-35.
Paatero P. and Hopke P.K. (2003) Discarding or downweighting high-noise variables in factor analytic
models. Anal. Chim. Acta 490, 277-289.
Poirot R.L., Wishinski P.R., Hopke P.K., and Polissar A.V. (2001) Comparative application of multiple receptor methods
to identify aerosol sources in northern Vermont. Environ. Sci. Technol. 35 (23), 4622-4636.
Raffuse S.M., Sullivan D.C., McCarthy M.C., Penfold B.M., and Hafner H.R. (2006) Analytical techniques for technical
assessments of ambient air monitoring networks. Guidance document prepared for the U.S. Environmental
Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC, by Sonoma
Technology, Inc., Petaluma, CA, STI-905212.02-2805-GD, September.
June 2009 Section 7 - Advanced Analyses
-------
References
Raffuse S.M., Brown S.G., Sullivan D.C., and Chinkin L.R. (2005) Estimating regional contributions to atmospheric
haze. Presented at the 2005 ESRI International User Conference, San Diego, CA, July 26 (STI-2649).
Raffuse S.M., Sullivan D.C., and Chinkin L.R. (2005) Emission impact potential - a method for relating upwind
emissions to ambient pollutant concentrations. Presented at the U.S. Environmental Protection Agency 14th
International Emission Inventory Conference, Las Vegas, NV, April 11-14 by Sonoma Technology, Inc., Petaluma,
CA (STI-2715, STI-2722). Available on the Internet at
.
Rosenbaum A.S., Ligocki M.P., and Wei Y.H. (1999) Modeling cumulative outdoor concentrations of hazardous air
pollutants. Revised final report prepared for the U.S. Environmental Protection Agency, Research Triangle Park, NC
(SYSAPP-99-96/33R2), February.
Seigneur C., Pun B., Lohman K., and Wu S.-Y. (2002) Air toxics modeling. Report prepared for Coordinating Research
Council, Inc., Alpharetta, GA and U.S. Department of Energy's Office of Heavy Vehicle Technologies through the
National Renewable Energy Laboratory, Golden, CO by Atmospheric & Environmental Research, Inc., San Ramon,
CA, CRC Project Number A-42-1, Document Number CP079-02-3, August. Available on the Internet at
;
U.S. Environmental Protection Agency (1999) Air dispersion modeling of toxic pollutants in urban areas: Guidance,
methodology and example applications. Prepared by the Office of Air Quality Planning and Standards, Research
Triangle Park, NC, EPA-454/R-99-021, July
Watson J.G. (1979) Chemical element balance receptor model methodology for assessing the source of fine and total
particulate matter. Ph.D. Dissertation, Oregon Graduate Center, Portland, OR, University Microfilms International,
Ann Arbor, MI.
Watson J.G. (1984) Overview of receptor model principles. J. Air Poll. Cont. Assoc. 34 (6), 619-623.
Watson J.G., Fujita E.M., Chow J.C., Zielinska B., Richards L.W., Neff W., and Dietrich D. (1998) Northern front range
air quality study. Final report prepared for Colorado State University, Cooperative Institute for Research in the
Atmosphere, Fort Collins, CO, by Desert Research Institute, Reno, NV, STI-996410-1772-FR, June.
June 2009 Section 7 - Advanced Analyses 48
-------
Suggested Analyses
What types of analyses could be done with my air toxics data?
June 2009 Section 8 - Suggested Analyses
-------
Motivation
• Ambient air toxics have been monitored since 2001/2002 as part of
NATTS and even longer as part of other monitoring programs.
While national-level analyses have been conducted, it is important
that these data be investigated at a local, state, and regional level
to better understand an area's air toxics issues.
• Regular data analysis may be conducted annually to identify
potential problems with the data at the site level. Adjustments can
then be made in collection or analysis to improve data quality
before several years of potentially poor quality data have been
collected.
• A list of suggested air toxics data analyses has been provided
(Introduction). This list is a potential minimum set of analyses that
each area could perform.
• Key areas of interest
- Is the quality of data sufficient for analysis?
- How would air toxics be characterized in the area?
- What are local sources of air toxics?
- Are there changes in toxics concentrations over time?
June 2009 Section 8 - Suggested Analyses
-------
Suggested Analyses
What's Covered in This Section
A set of potential analyses using Arizona data has been used as an example.
• This section outlines a sample analysis of an urban data set from start to finish in order to
provide a thorough example. These data had been assessed previously and were readily available.
• Note that this is an example analysis and is not intended to show the only way air toxics
analyses should be performed. Deviations or additional analyses may be necessary depending
on the data or the analyst's objectives.
• The following topics will be covered, following the sequence of this workbook:
- Background
    Introduction to the data
    Understanding sources
- Data validation (Workbook Section 4)
    Determining data completeness
    Assessing data below detection
    Identifying censored data
    Using quality-controlled data
    Applying data validation techniques
- Data characterization (Workbook Section 5)
    Putting data in perspective
    Spatial patterns
    Temporal patterns
    Model-to-monitor comparisons
    Risk screening
- Trends (Workbook Section 6)
- Advanced analyses (Workbook Section 7)
    Source apportionment
June 2009
Section 8 - Suggested Analyses
-------
Introduction to the Data
Overview
• The sample data set used throughout this section is from an air toxics study
performed in Arizona as part of the Joint Air Toxics Assessment Project (JATAP).
• The purpose of the study was to determine which air toxics are of most concern to
the area and tribal communities.
• The study was conducted in two phases. (Analyses in this section focus primarily
on Phase II data.)
- Phase I: March 2003-March 2004
- Phase II: February 2005 - March 2005
• Twenty-four-hour air toxics samples were collected every sixth day. On some days
at some sites, two 12-hr samples were collected; for this analysis, these samples
were averaged to 24-hr values. Only gaseous air toxics were collected and are discussed here.
• A considerable quality assurance effort was made
- Duplicate samples (collocated)
- Replicate data (additional chemical analysis on canister)
- Interlaboratory comparisons (more than one laboratory was involved)
- Data validation
• For the trend assessment, we used historical data at two longer-term sites in the
study area to illustrate air toxics concentrations over time in the area.
June 2009 Section 8 - Suggested Analyses
-------
Introduction to the Data
Monitoring Site Locations
[Figure: map of monitoring site locations. Site labels: West Phoenix, JLG Supersite, Senior Center (Salt River), Greenwood, South Phoenix, Queen Valley, St. Johns (Gila River), West 43rd St. Legend: ADEQ sites, St. Johns site, Salt River site, Urban Areas.]
The map shows the eight monitoring sites in the study. The map was created with ArcMap. The
West Phoenix, South Phoenix, and Senior Center sites are used most frequently in the sample
analyses. The St. Johns site was operated by the Gila River Indian Community. The Senior
Center site was operated by the Salt River Pima-Maricopa Indian Community.
June 2009
Section 8 - Suggested Analyses
-------
Understanding Sources
Population Density
The map shows
population density in
the study area. The
three focus sites are
indicated.
Data from these sites
help identify the most
populated areas and
potential air toxics
source locations (e.g.,
high population
density implies higher
emissions).
2000 population
density data were
obtained from the
U.S. Census Bureau.
[Figure: map of total population density (total population per square kilometer) with the West Phoenix, South Phoenix, and Senior Center sites marked. Legend: airport, county boundary, tribal land boundary, city boundary; scale bar 0 to 20 kilometers.]
June 2009
Section 8 - Suggested Analyses
-------
Understanding Sources
Mobile Sources
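[Figure: two map panels, Annual Average Daily Traffic (AADT) and HDV Annual Average Daily Traffic, with the West Phoenix, South Phoenix, and Senior Center sites marked.]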
The map shows annual
average daily traffic
(AADT) and heavy-duty
vehicle (HDV) daily traffic
for the study area (number
of vehicles per day). The
three sites of interest for
this example are shown.
AADT is an indicator of the
relative on-road mobile
source activity, and
corresponding emissions
levels, in the study area.
Traffic data were obtained
from the Arizona
Department of
Transportation (ADOT).
June 2009
Section 8 - Suggested Analyses
-------
Understanding Sources
Point Sources
Point Source Emissions of VOCs
The map shows point
source emissions for
total VOCs in the study
area. The three sites
of interest are shown
on the map. Other
sites in the area are
also shown (Supersite
[PSAZ] and St. Johns
[SJAZ]).
Note that mobile
source emissions are
not included in this
data set (see the
average daily traffic
maps on previous
slide).
Emissions data were
obtained from the
2002 NEI.
[Figure: map of point source VOC emissions; legend units are tons/yr.]
June 2009
Section 8 - Suggested Analyses
-------
Using Quality Assurance Data
Overview
Quality assurance (QA) is performed during
sample collection and analysis to provide
additional information about data quality and
usefulness.
-Collocated samples indicate how well separately
collected samples agree
-Replicate analyses indicate how well repeated
analyses of the same sample agree
These data provide insight into biases and error
that may occur in the process of collecting and
analyzing samples.
June 2009 Section 8 - Suggested Analyses
-------
Using Quality Assurance Data
Visual Inspection of Collocated Samples (1 of 2)
Visual inspection of
collocated samples is
important to identify outliers
and understand sampler
performance.
Collocated data for
chloroform are plotted in
the figure.
The data indicate that
chloroform is consistently
measured; however
Sampler 2 reported slightly
lower values than Sampler 1
at higher concentrations.
[Figure: scatter plot of Collocated 2 (ppbv) versus Collocated 1 (ppbv) for chloroform, both axes 0 to 0.2 ppbv; regression line y = 0.8871x + 0.003, R2 = 0.9648.]
The figure shows collocated chloroform samples collected in
the study. It was created with Microsoft Excel.
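The slope, intercept, and R2 of a collocated comparison can also be computed outside a spreadsheet. The following is a minimal Python sketch of an ordinary least-squares fit; the function name and the example concentrations are hypothetical and are not study data.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope, intercept, and R^2 for paired values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical collocated pairs (ppbv): sampler 2 reads low at the high end.
sampler_1 = [0.02, 0.05, 0.08, 0.12, 0.16]
sampler_2 = [0.02, 0.05, 0.07, 0.11, 0.14]
slope, intercept, r2 = linear_fit(sampler_1, sampler_2)
print(f"y = {slope:.4f}x + {intercept:.4f}, R2 = {r2:.4f}")
```

A slope below 1 with a high R2, as in the chloroform figure, indicates consistent measurements with a proportional low bias in the second sampler.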
June 2009
Section 8 - Suggested Analyses
10
-------
Using Quality Assurance Data
Visual Inspection of Collocated Samples (2 of 2)
In this figure, collocated data
for hexachlorobutadiene are
plotted to the right; outliers are
circled in red. Outliers
identified from collocated
samples should be excluded
from further data analyses.
The data indicate that
hexachlorobutadiene is not
consistently measured;
Sampler 2 reported lower
values than Sampler 1 at high
concentrations. This is
consistent with observations of
collocated chloroform data.
[Figure: scatter plot of Collocated 2 (ppbv) versus Collocated 1 (ppbv) for TO15 hexachlorobutadiene, both axes 0 to 2.4 ppbv, with outliers circled; regression line y = 1.2427x - 0.0883, N = 24; standard error of intercept 0.22, standard error of slope 0.31.]
June 2009
Section 8 - Suggested Analyses
11
-------
Using Quality Assurance Data
Summarizing Sample Problems for Analysis
The table shows an excerpt from the list of measurements, identifying problems in
one of the study area site replicate comparisons.
In site-level analyses, we typically exclude any of these failures. We flagged as
suspect the pollutant identified as a problem in the indicated sample and did not use
this pollutant/sample combination in subsequent analyses (e.g., toluene on 7/26/03).
Flag 1 indicates that the percentage error was greater than 50%. Flag 2 indicates
that the absolute difference between the two measurements was greater than three times the MDL.
Flag 3 indicates that the replicate or collocated average was suspect.
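Flags 1 and 2 are simple arithmetic tests and can be sketched as below. This is a minimal Python illustration under one stated assumption: the "percentage error" is taken to be the absolute difference divided by the mean of the pair, which may differ from the definition used in the study. The function name and example values are hypothetical. Flag 3 is a judgment about the pair average and is not computed here.

```python
def qa_flags(primary, duplicate, mdl):
    """Return (flag1, flag2) for one pollutant/sample pair.

    flag1: percent difference greater than 50% (assumed here to be the
           absolute difference relative to the mean of the pair).
    flag2: absolute difference greater than three times the MDL.
    """
    diff = abs(primary - duplicate)
    mean = (primary + duplicate) / 2
    percent_diff = 100 * diff / mean if mean > 0 else 0.0
    return percent_diff > 50, diff > 3 * mdl


# Hypothetical replicate pair (ppbv) with an MDL of 0.1 ppbv.
print(qa_flags(1.0, 0.4, 0.1))
```

Any pollutant/sample combination that raises a flag would then be marked suspect and excluded from subsequent analyses.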
Date        Species Name             Suspect
7/26/2003   Toluene                  X
7/26/2003   1,3,5-trimethylbenzene   X
7/26/2003   1,2,4-trimethylbenzene   X
8/25/2003   MTBE                     X
8/25/2003   Methyl ethyl ketone      X
8/25/2003   n-octane                 X
8/25/2003   1,3,5-trimethylbenzene   X
8/25/2003   1,2,4-trimethylbenzene   X
9/24/2003   Methyl ethyl ketone      X
[Flag 1, Flag 2, and Flag 3 columns: 3, 2, and 5 of these rows are marked, respectively; the row positions of the marks are not recoverable from the source layout.]
June 2009
Section 8 - Suggested Analyses
12
-------
Data Completeness
Overview
For the site-level analysis, we summarized available
data and calculated data completeness based on
expected samples.
This step included calculating the number of valid
samples versus the expected number of samples
based on collection frequency.
In general, 75% data completeness is required to
calculate valid aggregated values (e.g., monthly,
quarterly, and annual averages).
See Preparing Data for Analysis, Section 4, for a
complete description of methods and rationale.
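The completeness calculation can be sketched as follows. This is a minimal Python illustration, assuming a fixed sampling interval and the 75% criterion stated above; the function names and dates are hypothetical.

```python
from datetime import date


def expected_samples(start, end, interval_days=6):
    """Number of scheduled sampling days from start to end, inclusive."""
    return (end - start).days // interval_days + 1


def completeness(valid, expected):
    """Percent of expected samples that are valid."""
    return 100 * valid / expected


def meets_threshold(valid, expected, threshold=75):
    """True when completeness is at or above the threshold (percent)."""
    return completeness(valid, expected) >= threshold


# Example: 49 valid samples out of 61 expected.
print(round(completeness(49, 61)), meets_threshold(49, 61))
```

The same functions apply to monthly, quarterly, or annual windows by changing the start and end dates.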
June 2009 Section 8 - Suggested Analyses 13
-------
Data Completeness
Site-Level Summary
Site            Sampling        Sampling Duration   Samples Expected         Samples Available        Valid Samples   Percent Valid
Greenwood       Cartridges(a)   24-hr               61                       60                       60              98
Greenwood       Canisters       24-hr               61                       61                       59              97
JLG Supersite   Cartridges(a)   24-hr               61                       61                       49              80
JLG Supersite   Canisters       24-hr               61                       61                       55              90
Queen Valley    Canisters       24-hr               31                       31                       30              97
St. Johns       Canisters       24-hr and 12-hr     30 (24-hr), 62 (12-hr)   37 (24-hr), 44 (12-hr)   79              95(b)
Senior Center   Canisters       24-hr and 12-hr     30 (24-hr), 62 (12-hr)   37 (24-hr), 46 (12-hr)   83              98(b)
South Phoenix   Cartridges(a)   24-hr               61                       60                       52              85
South Phoenix   Canisters       24-hr               61                       60                       59              97
West Phoenix    Canisters       24-hr               61                       60                       59              97
The table shows data necessary to calculate the data completeness and the percent of valid
data. The number of valid samples was computed after data validation steps but shown here for
a complete summary.
A high percentage of samples from all sites were valid.
Additional samples may be marked as suspect during the process of data analysis.
(a) Carbonyls only.
(b) This percentage is based on 24-hr average sample days.
June 2009 Section 8 - Suggested Analyses 14
-------
Assessing Data Above Detection
2005 Percent Above MDL
Species                St. Johns   Senior Center   South Phoenix   West Phoenix   Greenwood   JLG Supersite   Queen Valley
Benzene                100         99              100             100            100         100             100
Bromomethane           40          36              37              49             24          33              23
Carbon tetrachloride   89          89              89              83             100         100             100
Chloroform             43          90              77              83             98          100             53
Dichloromethane        76          94              97              98             100         100             97
Ethylbenzene           71          92              92              94             100         100             93
Hexachlorobutadiene    0           0               0               0              2           4               0
The percent of data above detection should be calculated for each pollutant, site and year; additional
calculations will be needed if monthly or seasonal aggregates are produced. The table shows an
excerpt of the entire data set - the percent of data above detection for 2005. This example spans the
range of data above detection observed in the data set.
Data were color-coded in the table to illustrate potential patterns in data
availability. More data were below detection at St. Johns and Queen Valley,
consistent with their location away from sources. Hexachlorobutadiene was
typically below MDL at all sites.
< 25% Above MDL
25% to 75% Above MDL
>= 75% Above MDL
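The percent of data above detection for each pollutant, site, and year can be tallied as sketched below. This is a minimal Python illustration; the record layout, function names, and example values are hypothetical, and the three classes follow the color key above.

```python
from collections import defaultdict


def percent_above_mdl(records):
    """records: iterable of (site, pollutant, year, concentration, mdl).

    Returns {(site, pollutant, year): percent of samples above the MDL}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [above, total]
    for site, pollutant, year, conc, mdl in records:
        tally = counts[(site, pollutant, year)]
        tally[0] += conc > mdl
        tally[1] += 1
    return {key: 100 * above / total for key, (above, total) in counts.items()}


def color_class(percent):
    """Bin a percentage into the three classes of the color key."""
    if percent < 25:
        return "< 25% above MDL"
    if percent < 75:
        return "25% to 75% above MDL"
    return ">= 75% above MDL"


# Hypothetical records: three of four samples exceed the MDL.
records = [
    ("Site A", "benzene", 2005, 0.50, 0.10),
    ("Site A", "benzene", 2005, 0.05, 0.10),
    ("Site A", "benzene", 2005, 0.30, 0.10),
    ("Site A", "benzene", 2005, 0.20, 0.10),
]
for key, pct in percent_above_mdl(records).items():
    print(key, pct, color_class(pct))
```

Adding a month or season field to the grouping key gives the additional aggregates mentioned above.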
June 2009
Section 8 - Suggested Analyses
15
-------
Identifying Censored Data
Alternate MDLs were included with
the study data. Because alternate
MDLs are often different for each
sample, it is not always clear from the
data that censoring (e.g., substitution
with MDL or MDL/2) has occurred.
We need to ensure that all samples
are treated similarly when data are
aggregated.
Scatter plots are an easy way to
identify whether data below detection
are censored.
Plot all data points that are less than
or equal to the alternate MDL.
The agreement between
concentration and MDL indicates that
the alternate MDL was substituted for
values below detection. These
samples were identified and MDL/2
substitution was subsequently applied
for data aggregation.
[Figure: scatter plot of hexachlorobutadiene concentration (ppbv) versus MDL (ppbv), both axes 0.2 to 0.6 ppbv.]
The graph shows the comparison of concentration values to
their MDL for data at or below detection. It was created with
Microsoft Excel.
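The same check can be done numerically. The sketch below, in Python, tests whether a reported value equals its MDL or MDL/2 and then applies MDL/2 substitution uniformly; the function names, tolerance, and example values are hypothetical.

```python
def is_substituted(conc, mdl, tol=1e-9):
    """True when a reported value equals its MDL or MDL/2, which suggests
    that a substitute value was reported in place of a measurement."""
    return abs(conc - mdl) <= tol or abs(conc - mdl / 2) <= tol


def apply_half_mdl(samples):
    """samples: list of (concentration, mdl) pairs. Replace every value at
    or below its MDL with MDL/2 so that all samples are treated alike."""
    return [mdl / 2 if conc <= mdl else conc for conc, mdl in samples]


# Hypothetical samples (ppbv) with sample-specific alternate MDLs.
samples = [(0.3, 0.3), (0.5, 0.3), (0.1, 0.4)]
print([is_substituted(c, m) for c, m in samples])
print(apply_half_mdl(samples))
```

Checking for substitution first matters because alternate MDLs differ by sample, so substituted values do not appear as a single repeated number.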
June 2009
Section 8 - Suggested Analyses
16
-------
Validation Techniques
Overview
• Once data are received from the laboratory, or a data repository such as AQS, it is
useful to apply screening criteria during the early stages of data validation to
identify suspect data that may not be representative of actual ambient
concentrations.
• Basic visual analyses should be performed to identify potential problems in the data
and to begin to understand data characteristics.
• Knowledge of similarity of sources, lifetime, and reactivity should be used to assist
in data validation.
• The following screening checks are typically used
- Comparison to remote background concentrations. Urban air toxics concentrations should
not be lower than remote background concentrations.
- Range checks. Check minimum and maximum concentrations for anomalous values.
- Buddy site check. Compare concentrations at one site to nearby sites to look for
anomalies.
- Sticking check. Check data for consecutive equal data values which indicate the possibility
of censored data not flagged appropriately.
- Scatter plots. Investigate the relationship between species to identify sources and suspect
data.
- Fingerprint plots. Investigate the pattern of species concentrations and relationships
among species to identify sources and suspect data.
• See the Preparing Data for Analysis, Section 4, for a complete description of
methods and rationale.
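Two of the screening checks listed above, the range check and the sticking check, can be sketched as follows. This is a minimal Python illustration; the function names, range limits, and run length are hypothetical choices, not values prescribed by this workbook.

```python
def range_check(values, low, high):
    """Indices of values outside the expected [low, high] range."""
    return [i for i, v in enumerate(values) if not low <= v <= high]


def sticking_check(values, min_run=3):
    """(start, end) index pairs for runs of at least min_run equal
    consecutive values, which may indicate unflagged censored data."""
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs


# Hypothetical concentration series (ppbv).
print(range_check([0.5, 52.0, 0.8, -0.1], 0, 20))
print(sticking_check([0.2, 0.1, 0.1, 0.1, 0.3, 0.3]))
```

Values returned by either check are candidates for review, not automatic exclusions.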
June 2009 Section 8 - Suggested Analyses 17
-------
Validation Techniques
Buddy Site Check
Buddy site checks are useful in
identifying suspect data.
In the example, time series of benzene
concentrations for three sites are
plotted.
There is clearly a suspect data point at
the West Phoenix site in March 2005,
which is not corroborated by the other
sites. This indicates that the data point
should be considered suspect because
a concentration spike of that magnitude
should register at nearby sites.
- Investigation into these data showed that
this event corresponds to a single data
point significantly higher than the others.
- Further investigation revealed that many
species showed the same behavior at the
West Phoenix site. The site may be
impacted by a local source or sources.
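A buddy site check can be automated as a first pass before plotting. The sketch below, in Python, flags dates on which one site exceeds a multiple of the median concentration at its buddy sites; the factor of 5, the function name, and the example values are hypothetical.

```python
from statistics import median


def buddy_check(site_series, buddy_series, factor=5.0):
    """Dates on which a site exceeds `factor` times the median of its
    buddy sites. Each series maps date -> concentration."""
    suspect = []
    for day, value in sorted(site_series.items()):
        buddies = [s[day] for s in buddy_series if day in s]
        if buddies and value > factor * median(buddies):
            suspect.append(day)
    return suspect


# Hypothetical benzene concentrations at three sites.
west = {"2005-03-01": 1.2, "2005-03-07": 55.0, "2005-03-13": 0.9}
south = {"2005-03-01": 1.0, "2005-03-07": 1.1, "2005-03-13": 0.8}
senior = {"2005-03-01": 0.5, "2005-03-07": 0.6, "2005-03-13": 0.4}
print(buddy_check(west, [south, senior]))
```

A flagged date still needs the follow-up described above, since a real local source can produce the same pattern as a measurement error.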
[Figure: time series of benzene concentrations (axis 0 to 60) at the West Phoenix, South Phoenix, and Senior Center sites.]
-------
Validation Techniques
Time Series
The figures show the same benzene
time series as the previous slide and
matching time series for a variety of
other compounds.
Benzene, ethylbenzene, and toluene
can all be emitted by mobile sources.
The fact that these species peak at
the same time is suspicious, because
an increase of that magnitude from
typical mobile source emissions is
unlikely. However, an unusual event
may have occurred, such as a
gasoline spill very near the West
Phoenix site that could have led to
the high concentrations.
Examining the time series of carbon
tetrachloride helps confirm or reject
this theory because there are no
likely sources that would cause a
spike of that magnitude. The time
series of carbon tetrachloride shows a
spike on the same day indicating that
the event is in fact an instrument or
analysis error. All data for that date
and site should be flagged as suspect
and not used in subsequent analyses.
[Figure: time series of benzene and other compounds at the West Phoenix, South Phoenix, and Senior Center sites.]
June 2009 Section 8 - Suggested Analyses 20
-------
Validation Techniques
Scatter Plots (1 of 2)
The scatter plots show the relationship between
toluene and benzene and toluene and m,p-xylene at
three study sites. This method is another way to
identify suspect data, which have been circled in red
in the figures.
At the West Phoenix site, the correlation between
toluene, benzene, and m,p-xylene is strong, indicating
that this site is highly mobile source-dominated.
Outlier data points may point to data issues or other
source influences. For toluene outliers, high toluene
concentrations are often associated with solvent use
or surface coatings; thus, the samples are likely valid.
The correlations at the South Phoenix site are not
quite as strong, but still indicate that the site is likely
mobile source-dominated.
The Senior Center site, on the other hand, shows
a weak correlation between the three species as
expected for a site farther from fresh emissions.
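The strength of these relationships can be quantified alongside the plots. The sketch below, in Python, computes a Pearson correlation coefficient and lists samples whose toluene-to-benzene ratio is far above the typical ratio; the function names, the factor of 3, and the example values are hypothetical, and concentrations are assumed to be positive.

```python
from math import sqrt
from statistics import median


def pearson_r(x, y):
    """Pearson correlation coefficient for paired concentrations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ratio_outliers(x, y, factor=3.0):
    """Indices where y/x exceeds `factor` times the median ratio."""
    ratios = [b / a for a, b in zip(x, y)]
    typical = median(ratios)
    return [i for i, r in enumerate(ratios) if r > factor * typical]


# Hypothetical benzene and toluene concentrations (ppbv).
benzene = [1.0, 2.0, 3.0, 1.0]
toluene = [3.0, 6.0, 9.0, 30.0]
print(round(pearson_r(benzene[:3], toluene[:3]), 3))
print(ratio_outliers(benzene, toluene))
```

Samples listed by the ratio test correspond to the circled points in a scatter plot and can then be examined for data issues or other source influences.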
[Figure: scatter plots of toluene versus benzene and toluene versus m,p-xylene at the West Phoenix, South Phoenix, and Senior Center sites; outliers are circled.]
June 2009 Section 8 - Suggested Analyses 21
-------
Validation Techniques
Scatter Plots (2 of 2)
The figures show the same data as in the previous slide for the
West Phoenix site only. The dates of the two highest outliers
have been marked.
The outlier values all correspond to the unusually high toluene
concentrations. Significantly, the three toluene outliers
correspond with the three highest m,p-xylene events.
These correlations indicate that the high concentrations may not
be due to collection or analysis errors, but may indicate solvent or
surface-coating emissions impacting the site. Further exploration
might include assessing the importance of these concentrations
on the annual average and looking for possible sources of
toluene in the emission inventory.
The table below shows an excerpt of the emission profile for surface coating from
EPA's SPECIATE. Xylenes and toluene account for almost one-third
of this source profile (15.8% + 14.7% = 30.5%), supporting the hypothesis that the high
concentration events are solvent-driven.
Profile Number: M02
Profile Name: Surface Coating Operations (Industrial)
Percent Total: 100
POLLUTANT             CAS No.   Percent
ISOMERS OF XYLENE     1330207   15.800
TOLUENE               108883    14.700
METHYL ETHYL KETONE   78933     8.100
DIETHYLENE GLYCOL     111466    6.600
N-BUTYL ALCOHOL       71363     6.400
[Figure: scatter plots of toluene versus benzene and toluene versus m,p-xylene at the West Phoenix site; the outliers on 2/21/2005 and 8/27/2005 are marked.]
June 2009 Section 8 - Suggested Analyses 22
-------
Validation Techniques
Fingerprint Plots
Fingerprint plots represent
concentrations of all species by date.
They are useful for identifying relative
pollutant concentrations on typical and
unusual days.
A typical fingerprint can be determined
quantitatively (e.g., median sample
composition) or qualitatively (e.g., visual
inspection of all fingerprints).
The figures to the right show a typical
fingerprint plot and fingerprint plots for
2/21/2005 and 8/27/2005 (the two dates
of the highest outlier events in the
previous slides).
A review of fingerprints listed in EPA's
SPECIATE shows that toluene and
xylenes are prominent components of
surface coatings.
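A quantitative typical fingerprint can be sketched as the median composition across samples. This minimal Python illustration assumes each sample is normalized to fractions of its summed concentration and that every sample reports the same species; the function names and example values are hypothetical.

```python
from statistics import median


def fingerprint(sample):
    """Fraction of the summed concentration contributed by each species."""
    total = sum(sample.values())
    return {species: conc / total for species, conc in sample.items()}


def typical_fingerprint(samples):
    """Median fraction of each species across all samples."""
    prints = [fingerprint(s) for s in samples]
    return {sp: median(p[sp] for p in prints) for sp in prints[0]}


# Hypothetical samples (ppbv); the second has unusually high toluene.
samples = [
    {"toluene": 2.0, "benzene": 1.0, "xylene": 1.0},
    {"toluene": 6.0, "benzene": 1.0, "xylene": 1.0},
    {"toluene": 2.0, "benzene": 1.0, "xylene": 1.0},
]
print(typical_fingerprint(samples))
print(fingerprint(samples[1]))
```

Comparing a single day's fingerprint with the typical one highlights the species that are unusually abundant on that day.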
[Figure: three fingerprint plots of concentration by species number for a typical sample, 2/21/2005, and 8/27/2005. Note that the scale of the typical plot is lower than that of the other two plots.]
June 2009
Section 8 - Suggested Analyses
23
-------
Validation Techniques
Summary
What have we learned from applying these validation
techniques?
- Additional invalid and suspect data points were identified.
- Data quality and limitations are better understood.
- Spatial and temporal characteristics of the data are more
thoroughly indicated.
- Hypotheses about possible source influences for further
investigation can be formed.
These are a few examples of the data validation process
that would be performed on the data set.
Remember, data validation continues as part of data
analysis.
June 2009 Section 8 - Suggested Analyses 24
-------
Basic Understanding of Data
Scatter Plot Matrices
Scatter plot matrices provide a quick and easy
way to view correlations and outliers within a
large amount of data.
Scatter plot matrices are interpreted by
matching the pollutant name on the row and
column corresponding to the scatter plot.
Histograms showing the distribution of
measured values for each pollutant are included
along the diagonal.
The graph to the right shows scatter plot
relationships for five pollutants at the South
Phoenix site. Note that previously identified
outliers have been removed.
The data show a clear correlation between
toluene, m,p-xylene, and benzene, indicating
that these pollutants are likely from mobile
sources. Chloroform also shows a slight
correlation with the mobile source pollutants
(across the second row from the bottom) but the
bifurcated relationship indicates a secondary
source. Carbon tetrachloride shows little
correlation with any species and shows a
histogram that is roughly Gaussian, as expected
for background pollutants.
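[Figure: scatter plot matrix for toluene, m,p-xylene, benzene, chloroform, and carbon tetrachloride at the South Phoenix site, with a histogram for each pollutant.]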
June 2009
Section 8 - Suggested Analyses
25
-------
Putting Data In Perspective
Overview
Putting concentrations and MDLs into perspective
provides a framework for comparing site-level
concentrations to national levels and to other sites in
the area.
This information is useful in assessing whether
concentrations are typical, low, or high and can help
explain the impact of local source emissions on
monitored concentrations.
June 2009 Section 8 - Suggested Analyses 26
-------
Putting Data In Perspective
National Concentrations
The figure shows the national 5th-95th,
25th-75th, and 50th percentile concentrations by
species (bars) compared to site-averaged
concentrations (symbols).
Though Senior Center is the most rural of the
sites included in the figure (although within a few
miles of urban emissions), concentrations there
are typically higher than the national median and
sometimes higher than the national
75th percentile concentration, showing
that the site is impacted by urban emissions.
Concentrations at the West and South Phoenix
sites are also typically well above the national
median. Concentrations of benzene and
1,3-butadiene are near or above the 95th percentile
of national concentrations.
National concentrations of carbon tetrachloride
fall within a very small range due to its
ubiquitous background concentration. The
average carbon tetrachloride concentrations at
all study sites are in good agreement with
national levels, providing confidence that data
collection in the study is representative of
national data collection methods.
[Figure: national 5th to 95th percentile range, 25th to 75th percentile range, and median concentration (bars) with MCAZ, SRAZ, and SPAZ site averages (symbols) for 1,3-butadiene, benzene, carbon tetrachloride, chloroform, dichloromethane, tetrachloroethene, and trichloroethene; logarithmic axis, Concentration (µg/m3), 0.01 to 100.]
MCAZ = West Phoenix
SPAZ = South Phoenix
SRAZ = Senior Center
June 2009
Section 8 - Suggested Analyses
27
-------
Putting Data In Perspective
Cancer Risk
The figure shows the same
data as the previous slide,
with the addition of the
chronic exposure
concentration associated with
a 1-in-a-million cancer risk to
place health risks in
perspective.
Concentrations could be
compared to other cancer risk
levels: 0.1-in-a-million, 10-in-
a-million, 100-in-a-million, etc.
Concentrations are typically
higher than the 1-in-a-million
cancer risk level shown
except for dichloromethane
and sometimes
trichloroethene.
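The comparison to other risk levels can be sketched with simple arithmetic, assuming (as is usual for screening with unit risk estimates) that cancer risk scales linearly with concentration. The benzene values are taken from the West Phoenix risk screening table later in this section; the function name is illustrative.

```python
def risk_in_a_million(concentration, one_in_a_million_conc):
    """Screening-level cancer risk expressed as 'N in a million'.

    Assumes risk is proportional to concentration, so the ratio of the
    measured concentration to the 1-in-a-million concentration gives N.
    Both arguments must be in the same units.
    """
    return concentration / one_in_a_million_conc

# Benzene at West Phoenix: site average 1.7 ppbv, 1-in-a-million level 0.040 ppbv.
benzene_risk = risk_in_a_million(1.7, 0.040)
print(f"Benzene site average corresponds to roughly {benzene_risk:.1f}-in-a-million")
```

A result between 10 and 100 would fall between the 10-in-a-million and 100-in-a-million levels mentioned above.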
[Figure: National 5th-95th, 25th-75th, and median concentrations (bars), MCAZ, SRAZ, and SPAZ site averages (symbols), and the 1-in-a-million chronic exposure concentration for 1,3-butadiene, benzene, carbon tetrachloride, chloroform, dichloromethane, tetrachloroethene, and trichloroethene. X-axis: Concentration (µg/m3), 0.01 to 100. MCAZ = West Phoenix; SPAZ = South Phoenix; SRAZ = Senior Center.]
-------
Putting Data In Perspective
MDLs
Examining the relationship between
MDLs at multiple sites is imperative to
check that MDL/2 substitutions are not
biasing the data differently at different
sites.
The graph shows the average MDL and
minimum-to-maximum MDL range for
three study sites.
This graphical method allows the analyst
to quickly confirm that MDLs are very
similar between sites.
- MDLs at the West Phoenix site (light
purple bar) are sometimes higher than
at other sites.
- The difference is not enough to cause a
major bias unless a high percentage of
data is below the MDL. For example,
hexachlorobutadiene is typically below
detection so MDL/2 substitution may cause
concentrations at the West Phoenix site to
appear higher than at the other sites.
However, for hexachlorobutadiene, such a
large portion of the data is below detection that
it cannot be reliably used for many
analyses in the first place.
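The bias described above can be illustrated with a small sketch. It assumes nondetects are recorded as missing values (None here) and replaced with MDL/2; the MDLs and sites are hypothetical, and the function name is illustrative.

```python
def site_average(values, mdls):
    """Average concentration with MDL/2 substituted for nondetects.

    `values` holds measured concentrations, with None marking a sample
    below detection; `mdls` holds the MDL for each sample.
    """
    filled = [m / 2 if v is None else v for v, m in zip(values, mdls)]
    return sum(filled) / len(filled)

# Hypothetical species that is never detected at either site (ppbv).
# Site B has MDLs twice as high as site A.
site_a = site_average([None, None, None], [0.10, 0.10, 0.10])
site_b = site_average([None, None, None], [0.20, 0.20, 0.20])

print(f"Site A average: {site_a:.3f} ppbv")
print(f"Site B average: {site_b:.3f} ppbv")
```

In this sketch the apparent difference between sites comes entirely from the MDLs, which is why the MDL ranges are compared before concentrations are.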
[Figure: MDL Assessment. Average MDL and minimum-to-maximum MDL range (ppbv) for 2005 at MCAZ, SRAZ, and SPAZ. Species shown, each labeled with its analytical method (TO-14, TO-15, or TO-15 SIM): 1,1-dichloroethene, 1,2-dichloroethane, 1,2-dichloropropane, 1,3-butadiene, benzene, bromomethane, carbon tetrachloride, chloroform, dichloromethane, hexachlorobutadiene, m,p-xylene, methyl tert-butyl ether, o-xylene, styrene, tetrachloroethene, toluene, trichloroethene, and vinyl chloride. MCAZ = West Phoenix; SPAZ = South Phoenix; SRAZ = Senior Center.]
-------
Spatial Patterns
Understanding spatial patterns is important
and can provide insight into
- Improving monitoring networks
- Verifying and improving emission inventories
- Verifying and improving models
- Identifying sources
The box plots show 2005 concentrations of
benzene, 1,3-butadiene, chloroform, and
carbon tetrachloride at three study sites.
Benzene and 1,3-butadiene concentrations
are higher and more variable at the West and
South Phoenix sites.
- The lower concentrations and especially lower
variability at the Senior Center site indicate that
the site is removed from primary sources and is
representative of the regional background.
Chloroform and carbon tetrachloride are
relatively consistent at all sites.
- This behavior is expected for carbon
tetrachloride which should be at background
levels across the United States.
- That chloroform does not follow the same pattern
as benzene and 1,3-butadiene indicates the
compounds probably have different sources.
Benzene and 1,3-butadiene are primarily emitted
by mobile sources while chloroform is emitted
primarily from industrial operations.
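The quantities a box plot displays (median, quartiles, and spread) can be computed directly when comparing sites. This is a minimal sketch with hypothetical concentrations; the quartile convention used here ("inclusive") is an assumption and may differ from the one used to draw the workbook's plots.

```python
import statistics

def box_plot_summary(values):
    """Return the summary statistics a box plot is built from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range, a measure of variability
    }

# Hypothetical benzene concentrations (ppb) at an urban and a downwind site.
urban = [1.2, 0.6, 2.0, 0.9, 1.6]
downwind = [0.30, 0.35, 0.40, 0.30, 0.45]

for name, data in (("urban", urban), ("downwind", downwind)):
    s = box_plot_summary(data)
    print(f"{name}: median = {s['median']:.2f}, IQR = {s['iqr']:.2f}")
```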
[Figure: 2005 Concentrations by Site. Box plots of benzene, 1,3-butadiene, chloroform, and carbon tetrachloride concentrations (ppb) at MCAZ, SPAZ, and SRAZ. MCAZ = West Phoenix; SPAZ = South Phoenix; SRAZ = Senior Center (Salt River).]
-------
Temporal Patterns
Overview
• Characterization of temporal patterns can provide information on
sources, physical or chemical processes affecting air toxics
concentrations, and additional data validation.
• Before beginning temporal characterization, it is recommended to
create valid aggregated data sets (examples in Characterizing Air
Toxics, Section 5) to ensure the data are representative.
• There are sufficient data records in the example data set (i.e., one
year of samples collected every sixth day) to characterize
seasonal and weekday/weekend patterns.
• There are too few records in this data set to create day-of-week
patterns (i.e., 95% confidence intervals on the means will overlap
too much across the days because of the small sample size).
• 1- to 3-hr samples were not collected so diurnal patterns cannot be
investigated.
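The sample-size argument above can be checked by counting how many samples a one-in-six-day schedule yields per day of week, per day type, and per quarter. This sketch assumes sampling began on January 1, 2005; the actual start date is not given in the workbook.

```python
from collections import Counter
from datetime import date, timedelta

# Assumption: first sample on January 1, 2005, then every sixth day.
first_sample = date(2005, 1, 1)
sample_dates = []
current = first_sample
while current.year == 2005:
    sample_dates.append(current)
    current += timedelta(days=6)

# date.weekday() returns 0 for Monday through 6 for Sunday.
per_day_of_week = Counter(d.weekday() for d in sample_dates)
per_day_type = Counter(
    "weekend" if d.weekday() >= 5 else "weekday" for d in sample_dates
)
per_quarter = Counter((d.month - 1) // 3 + 1 for d in sample_dates)

print("Total samples:", len(sample_dates))
print("Per day of week:", dict(sorted(per_day_of_week.items())))
print("Per day type:", dict(per_day_type))
print("Per quarter:", dict(sorted(per_quarter.items())))
```

With only eight or nine samples on any single day of the week (before any invalid or missing samples are removed), day-of-week means carry wide confidence intervals, whereas the weekday/weekend and seasonal groupings pool more samples.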
-------
Temporal Patterns
Seasonal
The figures show seasonal patterns for benzene at three sites.
The South and West Phoenix sites show typical benzene seasonal patterns (see Characterizing Air Toxics,
Section 5) with lower concentrations during warm months and higher concentrations during cooler months. This
pattern results from seasonal differences in mixing height and reactivity rather than from changes in sources.
At the Senior Center site, benzene shows no seasonal pattern. Although higher concentrations would be expected
in winter, concentrations at this site are generally low during all seasons. Air arriving at the Senior Center
site is well mixed, and concentrations there are similar to summer concentrations at the other sites.
These data follow expectations for urban and downwind sites. The seasonal variability for these pollutants
shows that for the urban data, computed annual averages without the winter quarter would be biased low and
vice versa for a missing summer quarter.
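The effect of a missing quarter on an annual average can be shown with a short calculation. The quarterly means below are hypothetical values chosen to mimic an urban winter-high, summer-low pattern; the function name is illustrative.

```python
def annual_average(quarterly_means, skip=None):
    """Average the quarterly means, optionally leaving one quarter out."""
    used = [v for quarter, v in quarterly_means.items() if quarter != skip]
    return sum(used) / len(used)

# Hypothetical urban benzene quarterly means (ppb).
urban_benzene = {"winter": 1.6, "spring": 0.9, "summer": 0.6, "fall": 1.3}

full_year = annual_average(urban_benzene)
no_winter = annual_average(urban_benzene, skip="winter")
no_summer = annual_average(urban_benzene, skip="summer")

print(f"All four quarters: {full_year:.2f} ppb")
print(f"Winter missing:    {no_winter:.2f} ppb (biased low)")
print(f"Summer missing:    {no_summer:.2f} ppb (biased high)")
```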
[Figure: Seasonal benzene concentrations (ppb) at the West Phoenix, South Phoenix, and Senior Center sites.]
-------
Temporal Patterns
Weekday/Weekend
The figures show weekday and weekend benzene concentrations at three study monitoring
sites.
Typically, we would expect lower MSAT concentrations on weekends, but in practice this is not
always observed.
The West Phoenix site shows higher weekend concentrations, but the difference is not
statistically significant at 95% confidence. This difference may indicate that additional weekend
events near the site are causing benzene emissions. For example, a monitor placed near a
facility with high use on weekends, such as a recreational facility, may show this pattern.
Additional investigation of the surrounding area may be warranted but was not done.
The South Phoenix site shows slightly lower weekend concentrations (the difference is not statistically
significant). This pattern is more typical of urban sites at a national level.
The Senior Center site shows invariant weekday/weekend patterns consistent with the well-
mixed and aged nature of samples arriving at the site.
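The workbook does not state which test was used to judge significance at 95% confidence. As one possible approach, the sketch below uses a two-sided permutation test on the difference in mean concentration; the test choice, function name, and concentrations are all assumptions for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose absolute difference
    in means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (count + 1) / (n_permutations + 1)

# Hypothetical benzene concentrations (ppbv).
weekday = [1.4, 1.9, 1.1, 2.3, 1.6, 1.8, 1.2, 2.0]
weekend = [1.7, 2.4, 1.5, 2.1]

p = permutation_p_value(weekday, weekend)
verdict = "significant" if p < 0.05 else "not significant"
print(f"p = {p:.3f}: difference is {verdict} at 95% confidence")
```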
[Figure: Weekday and weekend benzene concentrations (ppbv) at the West Phoenix, South Phoenix, and Senior Center sites.]
-------
Risk Screening
Overview
Risk screening may provide a summary of ambient concentrations of air toxics that
may be of concern.
To identify species which may indicate higher risk, follow the decision tree below for
each pollutant.
After risk species have been identified, you may wish to create risk-weighted annual
averages.
The screening here uses the 1-in-a-million cancer risk level - one could select a
higher or lower risk level and define the level of concern depending on the purpose of
the screening. Other health effects, such as non-cancer threshold values, could be
used as well.
Is 85% of data for this site-pollutant below MDL?
- Yes: Is health benchmark above MDL?
    - Yes: Pollutant concentration is below health benchmark. Upper limit of risk < 1 x 10^-6.
    - No: Site-pollutant is uncharacterizable. Upper limit of risk > 1 x 10^-6.
- No: Is site-average concentration above health benchmark?
    - Yes: Pollutant concentration is above health benchmark. Risk > 1 x 10^-6.
    - No: Pollutant concentration is below health benchmark. Risk < 1 x 10^-6.
(ICF Consulting, 2004)
-------
Risk Screening
West Phoenix Site
West Phoenix data necessary for risk screening
Pollutant            % Below    1-in-a-million      Average Method    West Phoenix Site
                     Detection  cancer risk (ppbv)  Detection Limit   Average Concentration
                                                    (ppbv)            (ppbv)
Benzene              0          0.040               0.50              1.7
Hexachlorobutadiene  100        0.0043              0.13              0.17
Perform risk screening by applying all the data listed in the table to the risk-screening decision
tree (see previous slide). Screening may be performed on a range of risk levels and also for
non-cancer levels of concern.
Benzene
- No data are below detection (well under the 85% threshold in the decision tree), so there is high confidence in measured concentrations.
- The site average concentration is above the chronic exposure concentration associated with a 1-in-a-
million cancer risk.
Hexachlorobutadiene
- 100% of data is below detection so we have no confidence that the measured concentrations accurately
reflect ambient concentrations. However, we know that concentrations are below the MDL (note that
MDLs varied by sample and the average is shown).
- The chronic exposure concentration associated with a 1-in-a-million cancer risk is below the MDL.
- Both the data and the 1-in-a-million cancer risk level are below the MDL, so improved data
collection methods are necessary to characterize risk more accurately. The upper limit of risk is based on
the MDL.
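The screening logic applied to these two pollutants can be written as a short function. This is a sketch of one reading of the decision tree, assuming "85% of data below MDL" means 85% or more and using the average MDL as a stand-in for sample-specific MDLs; the function name and return strings are illustrative. Inputs are the West Phoenix values from the table.

```python
def screen_pollutant(pct_below_mdl, benchmark, avg_mdl, site_avg):
    """Apply the risk-screening decision tree to one site-pollutant pair.

    All concentrations must share the same units (ppbv here).
    """
    if pct_below_mdl >= 85:
        # Mostly nondetects: only an upper limit on risk can be stated.
        if benchmark > avg_mdl:
            return "below health benchmark (upper limit of risk < 1 x 10^-6)"
        return "uncharacterizable (upper limit of risk > 1 x 10^-6)"
    # Enough detected data to compare the site average directly.
    if site_avg > benchmark:
        return "above health benchmark (risk > 1 x 10^-6)"
    return "below health benchmark (risk < 1 x 10^-6)"

# Values from the West Phoenix table: (% below detection, 1-in-a-million
# level, average MDL, site average concentration).
west_phoenix = {
    "benzene": (0, 0.040, 0.50, 1.7),
    "hexachlorobutadiene": (100, 0.0043, 0.13, 0.17),
}

results = {name: screen_pollutant(*row) for name, row in west_phoenix.items()}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```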
-------
Trends
Five-Year Trends
Inter-annual trends were investigated for all
pollutants with sufficient data.
The notched box plots show benzene
concentrations at two sites with data available
from 2001 to 2005.
Benzene concentrations have remained relatively
flat at the JLG Supersite and South Phoenix site.
However, there is a statistically significant
difference between the 2001 and 2005
concentrations at the South Phoenix site.
Trends for other air toxics showed similarly
consistent concentrations from year to year for this
time period.
Once six years of data are available, two 3-yr
averages should be compared (i.e., average of
2001, 2002, and 2003 vs. 2004, 2005, and 2006;
see Quantifying Trends, Section 6).
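The comparison of two 3-yr averages can be sketched as follows. The annual averages are hypothetical placeholders (2006 data were not available in the example data set), and the function name is illustrative.

```python
def three_year_change(annual_averages, first_years, last_years):
    """Compare two 3-yr averages and return both plus the percent change."""
    first = sum(annual_averages[y] for y in first_years) / len(first_years)
    last = sum(annual_averages[y] for y in last_years) / len(last_years)
    percent_change = 100.0 * (last - first) / first
    return first, last, percent_change

# Hypothetical annual average benzene concentrations (ppb).
benzene_by_year = {2001: 1.2, 2002: 1.1, 2003: 1.3, 2004: 1.0, 2005: 1.1, 2006: 0.9}

first, last, change = three_year_change(
    benzene_by_year, first_years=(2001, 2002, 2003), last_years=(2004, 2005, 2006)
)
print(f"2001-2003 average: {first:.2f} ppb")
print(f"2004-2006 average: {last:.2f} ppb")
print(f"Change: {change:+.1f}%")
```

Averaging over three years dampens year-to-year meteorological variability, so the percent change is less sensitive to a single unusual year than a comparison of two individual years.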
[Figure: Notched box plots of annual benzene concentrations (ppb) at the JLG Supersite and the South Phoenix site. X-axis: YEAR, 2000 to 2007.]
-------
Source Apportionment
Example
Principal component analysis (PCA) was applied to air toxics
data from two sites, South Phoenix and West 43rd St., as part of
an exploratory analysis. PCA uses correlation or covariance
between each pair of variables to estimate relationships. PCA
is relatively easy to perform with basic statistical packages;
however, the analyst must infer source types from the factors.
In South Phoenix, PCA resolved six factors, accounting for 81%
of the variance. These data are illustrated in the top pie chart
(note that the percentages are percent of variance explained in
the data, not percent of the mass).
- 37%: Mobile sources (benzene, 1,3-butadiene, xylenes, toluene,
ethylbenzene)
- 9%: Background (carbon tetrachloride, methyl ethyl ketone)
- 11%: Secondary (formaldehyde, acetaldehyde)
- 6%: Summer gasoline additives (MTBE)
- 9%: Plastics (methylene chloride)
- 9%: Refrigerants/AC (dichlorodifluoromethane, trichlorofluoromethane)
PCA resolved four factors at the West 43rd St. site,
accounting for 82% of the variance; carbonyl compound data
were not available at this site (so fewer factors were resolved).
- 33%: Mobile sources (benzene, xylenes, toluene, ethylbenzene)
- 20%: Summer sources, e.g., BBQs, air conditioning
(trichlorofluoromethane, acetylene, propylene)
- 14%: Secondary/background (MEK, MTBE, dichlorodifluoromethane)
- 15%: Plastics (trimethylbenzenes)
Next steps in this analysis may be to apply CMB or PMF to
estimate source contributions.
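To show what "percent of variance explained" means, the sketch below extracts only the first principal component of a correlation matrix using power iteration. It is an illustration under stated assumptions, not the procedure used in the study: a statistical package would normally return all components, and the three-species data set is hypothetical and deliberately idealized.

```python
import math

def correlation_matrix(columns):
    """Correlation matrix of a list of equal-length data columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        standardized.append([(v - mean) / sd for v in col])
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[j])) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]

def first_component(matrix, iterations=500):
    """Leading eigenvalue and eigenvector (loadings) by power iteration.

    Converges when the largest eigenvalue is distinct and the starting
    vector is not orthogonal to its eigenvector.
    """
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, vec

# Hypothetical data: benzene and toluene vary together; carbon tetrachloride
# varies independently of both.
species = ["benzene", "toluene", "carbon tetrachloride"]
data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.11, 0.09, 0.09, 0.11],
]

eigenvalue, loadings = first_component(correlation_matrix(data))
# For a correlation matrix, total variance equals the number of variables.
explained = eigenvalue / len(species)
print(f"First component explains {100 * explained:.0f}% of the variance")
for name, loading in zip(species, loadings):
    print(f"  {name}: loading = {loading:.2f}")
```

Species with large loadings on the same component vary together; as noted above, inferring that such a component represents a source type (for example, mobile sources) remains the analyst's judgment.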
[Figure: Pie charts of the percent of variance explained by each PCA factor at South Phoenix (top) and West 43rd St.]
-------
Model-to-Monitor Comparisons
Overview
• EPA periodically performs a National-Scale Air Toxics
Assessment (NATA) to identify and prioritize air toxics
emissions source types and locations which are of
greatest potential concern in terms of contributing to
population health risk. Modeled concentration
estimates for 177 air toxics and DPM are provided by
county. For more information on NATA see
http://www.epa.gov/ttn/atw/natamain/.
• To evaluate how the models used in NATA
performed, EPA conducted a model-to-monitor
comparison of the modeled values.
• A comparison of monitored and modeled data may
help in checking the uncertainty of modeled values.
-------
Model-to-Monitor Comparisons
Example
The figure shows the ratio of NATA99 modeled
data to annual averages computed from
monitored data at the study area sites to indicate
the accuracy of modeled data. This example is
meant to illustrate a technique - note that the
modeled and ambient data are from different
years.
When comparing modeled-to-monitored
concentrations, results within a factor of 2 are
considered reasonable agreement (U.S.
Environmental Protection Agency, 2006b).
Acetaldehyde, benzene, dichloromethane, and
trichloroethene typically agreed within a factor of
2, consistent with national-level comparisons of
modeled and monitored data.
However, ethylbenzene, formaldehyde, carbon
tetrachloride, chloroform, and
tetrachloroethylene showed monitored
concentrations more than a factor of 2 higher
than model estimates at study area sites. There
are many possible reasons for the differences.
For example, the carbon tetrachloride model
estimates have been shown to be low because
of the use of background concentrations that
were too low.
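The factor-of-2 criterion can be expressed as a simple ratio check. The concentrations below are hypothetical and are not the NATA or study values; the function name is illustrative.

```python
def model_to_monitor(modeled, monitored):
    """Return the model-to-monitor ratio and whether it is within a factor of 2."""
    ratio = modeled / monitored
    return ratio, 0.5 <= ratio <= 2.0

# Hypothetical annual averages (ug/m3): (modeled, monitored).
comparisons = {
    "species A": (1.0, 1.5),  # model somewhat low, still within a factor of 2
    "species B": (0.2, 1.0),  # monitored value five times the model estimate
}

for name, (modeled, monitored) in comparisons.items():
    ratio, agrees = model_to_monitor(modeled, monitored)
    status = "reasonable agreement" if agrees else "outside a factor of 2"
    print(f"{name}: ratio = {ratio:.2f} ({status})")
```

A ratio below 0.5 corresponds to monitored concentrations more than a factor of 2 higher than the model estimate, as described for several species above.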
The graph shows the comparison of modeled
to monitored annual averages at the study
area sites. Boxes are described in Section 4:
Preparing Data for Analysis.
-------
Summary
What We Learned from this Data Analysis (1 of 2)
Overall data completeness was sufficient for analysis.
For most species, data above detection were sufficient to perform most analyses; for some
species, a significant percentage of data were below detection.
QA analyses showed that agreement between collocated data was typical of what other
studies have found.
Data were validated using time series, buddy site checks, scatter plots, and fingerprint
plots. Invalid data points were identified and removed.
Data were determined to be of sufficient quality for most analyses.
Air toxics concentrations in the study area were compared to national concentrations and
chronic exposure concentrations associated with a 1-in-a-million cancer risk;
concentrations of most air toxics are above the national median concentration at all study
sites and are typically above the selected levels of risk. It is not clear why, and an
evaluation/development of the air toxics emission inventory is planned.
MDLs at study sites were found to be similar across sites so that data are comparable.
Spatial analyses showed concentrations were similar at the South and West Phoenix sites
while significantly lower concentrations of MSATs at the Senior Center site were consistent
with the sites' proximity to emissions.
-------
Summary
What We Learned from this Data Analysis (2 of 2)
- Temporal patterns were investigated.
• Seasonal patterns showed expected trends at the West and South Phoenix sites. Senior Center site
benzene concentrations were low and showed no seasonal trend consistent with aged air impacting the
site.
• There were no significant weekend/weekday patterns, a typical result as truck traffic or weekday carryover
often cause increased Saturday concentrations. There were not enough data points to reliably investigate
trends by day-of-week.
- Ambient annual average concentrations were compared to NATA 1999 modeled data. About half the
species monitored at study area sites were more than two times above their modeled concentration
values. Inspection of the emission inventory for the study area may be a next step.
- Risk screening was performed and the species of most concern were found to be benzene,
1,3-butadiene, acetaldehyde, carbon tetrachloride, chloroform, and tetrachloroethene.
Hexachlorobutadiene may be a contributor to risk, but is not measured well enough to quantify the risk.
- Five-year trends (2001-2005) showed no significant change at the study sites.
- PCA was performed for South Phoenix and West 43rd St. Mobile sources accounted for about one-third
of the variance at both sites. Pollution related to plastics, background species, and secondary
species contributed about another third. Both sites showed significant influence from "summer"
pollutants related to BBQs, air-conditioning/refrigerants, and summer fuel additives.
- Mobile source influences were confirmed by other analyses.
• Scatter plots showed strong correlation between mobile source air toxics.
• Spatial patterns revealed higher mobile source concentrations near busy roadways and much lower
concentrations in remote areas.
- Short-term solvent emissions events were identified during the process of data validation.
-------
References
(1 of 2)
Arizona Department of Transportation (2005) Average Annual Daily Traffic (AADT). Available on the Internet at
Brown S.G., Hafner H.R., and Shields E. (2004) Source apportionment of Detroit air toxics data with positive matrix
factorization. Paper no. 41 presented at the Air & Waste Management Association Symposium on Air Quality
Measurement Methods and Technology, Research Triangle Park, NC, April 19-22 (STI-2450).
Brown S.G. and Hafner H.R. (2003) Source apportionment of Detroit pilot city air toxics data. Presented at the
National Workshop on Air Toxics Monitoring, Chicago, IL, May 13-14 (STI-902530-2371).
Brown S.G., Frankel A., and Hafner H.R. (2005) Principal component analysis and source apportionment of PAMS
VOC data. Final report prepared for the South Coast Air Quality Management District, Diamond Bar, CA, by
Sonoma Technology, Inc., Petaluma, CA, STI-904046-2723-FR, July.
Hafner H.R. and Brown S.G. (2005) 2005 JATAP monitoring project - gaseous air toxics data validation and
analysis. Work plan prepared for the Arizona Department of Environmental Quality, Phoenix, AZ, by Sonoma
Technology, Inc., Petaluma, CA, STI-905039.01-2814-WP, October.
Hafner H.R., O'Brien T.E., Frankel A.P., McCarthy M.C., and Brown S.G. (2006) 2005 JATAP monitoring project -
gaseous air toxics data validation and analysis. Presented at the JATAP Workshop Meeting, Phoenix, AZ,
March 6, by Sonoma Technology, Inc., Petaluma, CA (905039.02-2921).
Hafner H.R. and O'Brien T.E. (2006) Analysis of air toxics collected as part of the Joint Air Toxics Assessment
Project. Final report prepared for the Arizona Department of Environmental Quality, Phoenix, AZ, by Sonoma
Technology, Inc., Petaluma, CA, STI-905039.03-3016-FR, December.
Henry R.C. (2000) Unmix Version 2 Manual. Available on the Internet at
last accessed September 9, 2005.
Hopke P.K. (2003) A guide to positive matrix factorization. Prepared for Positive Matrix Factorization Program,
Potsdam, NY, by the Department of Chemistry, Clarkson University, Potsdam, NY.
ICF Consulting (2004) Air toxics risk assessment reference library, Volume 1. Prepared for the U.S. Environmental
Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC, by ICF
Consulting, Fairfax, VA, EPA-453-K-04-001A, April. Available on the Internet at
-------
References
(2 of 2)
Lewis C.W., Morris G.A., Conner T.L., and Henry R.C. (2003) Source apportionment of Phoenix PM2.5 aerosol
with the Unmix receptor model. Journal of Air and Waste Management Association 53 (3), 325-338.
McCarthy M.C., Brown S.G., Hafner H.R., Frankel A., and Broaders K.E. (2004) Data analyses for Phoenix,
Arizona, air toxics data collected from 2001 to 2004. Final report prepared for Arizona Department of
Environmental Quality, Phoenix, AZ, by Sonoma Technology, Inc., Petaluma, CA, STI-904236-2666-FR,
December.
McCarthy M.C., Hafner H.R., and Montzka S.A. (2006) Background concentrations of 18 air toxics for North
America. J. Air & Waste Manage. Assoc. 56, 3-11 (STI-903550-2589). Available on the Internet at
Sundblom M., Armijo C., and Hafner H. (2006) Joint Air Toxics Assessment Project (JATAP) for the
Maricopa/Pinal urban area, Arizona. Presentation for the EPA National Air Monitoring Conference, Las Vegas,
NV, Novembers, by the Arizona Department of Environmental Quality, the Salt River Pima Maricopa Indian
Community, and Sonoma Technology, Inc., Petaluma, CA.
U.S. Environmental Protection Agency (1998) CMB8 application and validation protocol for PM2.5 and VOC.
Report prepared by U.S. Environmental Protection Agency, Research Triangle Park, NC, EPA 454/R-98-xxx,
October.
U.S. Environmental Protection Agency (2005) Prioritization of Data Sources for Chronic Exposure. Available on
the Internet at
U.S. Environmental Protection Agency (2006) Technology Transfer Network, 1999 National-Scale Air Toxics
Assessment, 1999 assessment results. Available on the Internet at
.
U.S. Environmental Protection Agency (2006) A Preliminary risk-based screening approach for air toxics
monitoring data sets. Available on the Internet at
U.S. Environmental Protection Agency (2006) Comparison of ASPEN Modeling System Results to Monitored
Concentrations. Available on the Internet at
-------