Air Toxics Data Analysis
          Workbook
  U.S. Environmental Protection Agency
Office of Air Quality Planning and Standards
      Research Triangle Park, NC
              June 2009

                                    STI:908304.03-3224

-------
                       Table of Contents
                     (1 of 2)

       Front Matter
        Table of Contents
        Disclaimer
        Workbook Content Summary
        Workbook Purpose

       1. Introduction to Air Toxics
        What are air toxics?
        Why analyze ambient air toxics data?
        Types of questions analysts may want to consider
        Suggested analyses
        Using the workbook
        References

       2. Definitions and Acronyms
        References

       3. Background
        Air toxics overview
        Health risks from air toxics
        Air toxics emissions
        Physical properties
        Formation, destruction, transport
        History of sampling
        Air toxics sampling and analysis
        Critical issues for interpretation
        Resources
        Appendix
        References

       4. Preparing Data for Analysis
        What data are available
        Data completeness
        Method Detection Limits
        Data validation
        Summary
        Appendix
        Resources
        Treating data
-------
                       Table of Contents
                     (2 of 2)

       5. Characterizing Air Toxics
        Temporal patterns
        Spatial patterns
        Risk screening
        Summary
        Resources
        References

       6. Quantifying and Interpreting Trends in Air Toxics
        Quantifying trends
        Visualizing trends
        Summarizing trends
        Resources
        Summary
        Additional reading
        References

       7. Advanced Analyses
        Source apportionment
        Trajectory analyses
        Emission inventory evaluation
        Evaluating models
        Network assessment
        Resources
        References

       8. Suggested Analyses
        Motivation
        Data completeness
        Validation techniques
        Summary
        References
June 2009
  Front Matter

-------
      Disclaimer
      The information and procedures set forth here are intended as a technical resource
      to those conducting analysis of air toxics monitoring data.  This document does not
      constitute rulemaking by the Agency and cannot be relied  on to create a substantive
      or procedural right enforceable by any party in litigation with the United States. As
      indicated by the use of non-mandatory language such as "may" and "should," it
      provides recommendations and does not impose any legally binding requirements. In
      the event of a conflict between the discussion in this document and any Federal
      statute or regulation, this document would not be controlling. The mention of
      commercial products, their source, or their use in connection with material reported
      herein is not to be construed as actual or implied endorsement of such products. This
      is a living document and may be revised periodically.

      The Environmental Protection Agency welcomes public input on this document at any
      time.  Comments should be sent to Barbara Driscoll (driscoll.barbara@epa.gov).

-------
            Workbook Content Summary

     •   Introduction
           Brief overview of the workbook and its motivation.
     •   Definitions and acronyms
     •   Background
           Summary of air toxics information to provide a basis for the analyst regarding
           emissions, formation, transport, and sampling/analysis of air toxics.
     •   Preparing data for analysis
           Methods and examples for validating air toxics data and preparing daily,
           quarterly, and annual averages.
     •   Characterizing air toxics
           Methods and examples of characterizing air toxics concentrations including spatial
           patterns, relationships, and time of day/seasonal variations.
     •   Quantifying trends in air toxics
           Methods and examples for preparing data for inter-annual trend analyses, identifying
           and quantifying trends, and tying these trends to changes in emissions.
     •   Advanced data analysis techniques
           Brief overview of advanced methods for data analysis including source apportionment.
     •   Suggested analyses
           Summary of a basic set of analyses that could be performed with air toxics data at a local,
           state, and regional level to better understand the data and inform policy makers.


-------
                    Workbook  Purpose

    •  This workbook was designed to
         - serve as an overview of the sizeable topic of air toxics data analysis;
         - provide suggestions on the methodology to use in analyzing air toxics data,
           building on the experience gained in the past several years of national level
           data analysis efforts; and
         - document current methodology being used in national data analysis efforts.
    •  The workbook contains a different topic area in each section. Distinctions
       between methods used to assess the data at a national level and methods
       that can be applied at a site level are provided.
    •  Sections contain a range of information and examples.  Basic knowledge of
       summary statistics and data analysis techniques is assumed. The more
       advanced analyses or statistical techniques are separately discussed.
    •  Figures are used to show example analyses.  The figures are not intended
       to show the only way in which to perform an analysis but rather to provide
       the analyst with a starting point. Most figure captions list the tool used  to
       present the data, the data used in the analysis, an observation or
       interpretation point, and a reference.  When a reference is not provided, the
       figure was prepared by the workbook authors specifically for the workbook.
    •  References are provided at the end of each section.


-------
               Introduction to
                 Air Toxics
June 2009             Section 1 - Introduction to Air Toxics

-------
           Introduction to Air Toxics
          What's Covered in This Section?
      What are air toxics?
      Why analyze ambient air toxics data?
      Types of questions analysts want to answer
      Suggested analyses overview
      Using the workbook

-------
                          What Are Air  Toxics?

         There are 188 hazardous air pollutants (HAPs) defined in the Clean Air Act Amendments
         of 1990. HAPs are also referred to as air toxics, which is a broader term and includes
         additional pollutants such as hydrogen sulfide.  For this document, the two terms "HAPs"
         and "air toxics" will be used interchangeably. Air toxics are those pollutants known or
         suspected to cause cancer or other serious health effects, such as reproductive effects
         or birth defects.
         Examples of toxic air pollutants include
           - Benzene, which is found in gasoline.
           - Perchloroethylene, which is emitted from some dry cleaning facilities.
           - Methylene chloride, which is used as a solvent and paint stripper by a number of industries.
           - Metals such as arsenic, mercury, chromium, and lead compounds, which are emitted, for
             example, from metal processing operations.
           - Semivolatile organic compounds (SVOCs) such as naphthalene, which is emitted in petroleum
             refining and fossil fuel and wood combustion.
         Most air toxics originate from anthropogenic sources, including mobile sources (e.g.,
         cars, trucks, buses), stationary sources (e.g., factories, refineries, power plants), and
         indoor sources (e.g., some building materials and cleaning solvents). Some air toxics
         are also released from natural sources such as volcanic eruptions and forest fires.
         EPA is working with state, local, and tribal governments to reduce air toxics releases to
         the environment.
           - EPA has issued rules covering over 80 categories of major industrial sources, such as chemical
             plants, oil refineries, aerospace manufacturers, and steel mills,  as well as categories of smaller
             sources, such as dry cleaners, commercial sterilizers, secondary lead smelters, and chromium
             electroplating  facilities.
           - EPA and state governments (e.g., California) have reduced emissions of benzene, toluene, and
             other air toxics from mobile sources by requiring the use of reformulated gasoline and placing
             limits on  tailpipe emissions.

-------
                       List of 188 Hazardous Air Pollutants
 1,1,2,2-Tetrachloroethane
 1,1,2-Trichloroethane
 1,1-Dichloroethane
 1,1-Dichloroethylene
 1,2,4-Trichlorobenzene
 1,2-Dichloropropane
 1,3-Butadiene
 1,4-Dichlorobenzene
 2,2,4-Trimethylpentane
 Acetaldehyde
 Acetonitrile
 Acrolein
 Acrylonitrile
 Antimony (Tsp)
 Antimony Pm2.5 Lc
 Arsenic (Pm10) Stp
 Arsenic (Tsp)
 Arsenic Pm2.5 Lc
 Benzene
 Benzyl Chloride
 Beryllium (Pm10) Stp
 Beryllium (Tsp)
 Bromoform
 Bromomethane
 Cadmium (Pm10) Stp
 Cadmium (Tsp)
 Cadmium Pm2.5 Lc
 Carbon Disulfide
 Carbon Tetrachloride
 Chlorine Pm2.5 Lc
 Chlorobenzene
 Chloroethane
 Chloroform
 Chloromethane
 Chloroprene
 Chromium (Pm10) Stp
 Chromium (Tsp)
 Chromium Pm2.5 Lc
 Cobalt (Pm10) Stp
Cobalt (Tsp)
Cobalt Pm2.5 Lc
Dichloromethane
Ethyl Acrylate
Ethylbenzene
Ethylene Dibromide
Ethylene Dichloride
Formaldehyde
Hexachlorobutadiene
Isopropylbenzene
Lead (Pm10) Stp
Lead (Tsp)
Lead Pm2.5 Lc
M/P-Xylene
Manganese (Pm10) Stp
Manganese (Tsp)
Manganese Pm2.5 Lc
Mercury (Tsp)
Mercury Pm2.5 Lc
Methyl Chloroform
Methyl Isobutyl Ketone
Methyl Methacrylate
Methyl Tert-Butyl Ether
Naphthalene
N-Hexane
Nickel (Pm10) Stp
Nickel (Tsp)
Nickel Pm2.5 Lc
O-Xylene
Phosphorus Pm2.5 Lc
Propionaldehyde
Selenium (Pm10) Stp
Selenium (Tsp)
Selenium Pm2.5 Lc
Styrene
Tetrachloroethylene
Toluene
Trichloroethylene
Vinyl Acetate
Vinyl Chloride
1,2-Dibromo-3-Chloropropane
1,3-Dichloropropene(Total)
1,4-Dioxane
2,4,5-Trichlorophenol
2,4,6-Trichlorophenol
2,4-Dinitrophenol
2,4-Dinitrotoluene
3-Chloropropene
4,6-Dinitro-2-Methylphenol
4-Nitrophenol
Aniline
Antimony (Pm10) Stp
Antimony Pm10 Lc
Arsenic Pm10 Lc
Beryllium Pm10 Lc
Biphenyl
Bis (2-Chloroethyl)Ether
Bis(2-Ethylhexyl)Phthalate
Cadmium Pm10 Lc
Caprolactam
Chlorine (Tsp)
Chlorine Pm10 Lc
Chromium (Coarse Particulate)
Chromium Pm10 Lc
Cobalt Pm10 Lc
Dibenzofurans
Dimethyl Phthalate
Di-N-Butyl Phthalate
Ethylene Oxide
Heptachlor
Hexachlorobenzene
Hexachlorocyclopentadiene
Hexachloroethane
Isophorone
Lead Pm10 Lc
Lindane
Manganese (Coarse Particulate)
Manganese Pm10 Lc
Mercury (Pm10) Stp
Mercury (Vapor)
Mercury Pm10 Lc
Methanol
Methoxychlor
M-Xylene
Nickel (Coarse Particulate)
Nickel Pm10 Lc
Nitrobenzene
O-Cresol
P-Cresol
Pentachlorophenol
Phenol
Phosphorus (Tsp)
Phosphorus Pm10 Lc
P-Xylene
Selenium Pm10 Lc
Xylene(S)
1,1-Dimethyl hydrazine
1,2-Diphenylhydrazine
1,2-Epoxybutane
1,2-Propylenimine
1,3-Propane sultone
2,3,7,8-Tetrachlorodibenzo-p-dioxin
2,4-D, salts and esters
2,4-Toluene diamine
2,4-Toluene diisocyanate
2-Acetylaminofluorene
2-Chloroacetophenone
2-Nitropropane
3,3-Dichlorobenzidene
3,3-Dimethoxybenzidine
3,3'-Dimethyl benzidine
4,4-Methylene bis(2-chloroaniline)
4,4-Methylenedianiline
4-Aminobiphenyl
4-Nitrobiphenyl
Acetamide
Acetophenone
Acrylamide
Acrylic acid
Asbestos
Benzidine
Benzotrichloride
beta-Propiolactone
Bis(chloromethyl)ether
Calcium cyanamide
Captan
Carbaryl
Carbonyl sulfide
Catechol
Chloramben
Chlordane
Chloroacetic acid
Chlorobenzilate
Chloromethyl methyl ether
Coke Oven Emissions
Cresols/Cresylic acid
Cyanide Compounds
DDE
Diazomethane
Dichlorvos
Diethanolamine
Diethyl sulfate
Dimethyl aminoazobenzene
Dimethyl carbamoyl chloride
Dimethyl formamide
Dimethyl sulfate
Epichlorohydrin
Ethyl carbamate (Urethane)
Ethylene glycol
Ethylene imine (Aziridine)
Ethylene thiourea
Fine mineral fibers
Glycol ethers
Hexamethylene-1,6-diisocyanate
Hexamethylphosphoramide
Hydrazine
Hydrochloric acid
Hydrogen fluoride
Hydrogen sulfide
Hydroquinone
Maleic anhydride
m-Cresol
Methyl hydrazine
Methyl iodide (Iodomethane)
Methyl isocyanate
Methylene diphenyl diisocyanate
N,N-Diethyl aniline
N-Nitrosodimethylamine
N-Nitrosomorpholine
N-Nitroso-N-methylurea
o-Anisidine
o-Toluidine
Parathion
Pentachloronitrobenzene
Phosgene
Phosphine
Phthalic anhydride
Polychlorinated biphenyls
Polycyclic Organic Matter
p-Phenylenediamine
Propoxur (Baygon)
Propylene oxide
Quinoline
Quinone
Radionuclides (including radon)
Styrene oxide
Titanium tetrachloride
Toxaphene
Triethylamine
Trifluralin
Vinyl bromide
 Abundance of data: > 20 monitoring sites with sufficient data to create a valid annual average between 2003-2005 (up to 434 sites)
 Little data: < 20 monitoring sites with sufficient data to create a valid annual average between 2003-2005 (between 1 and 17 sites)
 No data: No valid annual averages between 2003-2005
 From: http://www.epa.gov/ttn/atw/188polls.html

-------
       Why Analyze Ambient Air Toxics  Data?

       National level analyses provide an overview of the air toxics program
       and build on the power of a large data set to find the central
       tendencies in the data. Data anomalies at an  individual site have little
       influence on the overall results on a national scale.
       On a site-by-site basis, a much finer level of detail is needed to
       understand  the characteristics and trends observed. Knowledge is
       needed of the nearby sources, operating schedules, facility upsets
       and closures, new emission sources, types of emissions, types of
       controls and scheduled implementation, data reporting and quality
       issues, changes in sampling and methodology, local meteorology,
       and other details to fully understand changes in ambient pollutant
       concentrations.
       States collecting data bring unique "local" perspectives on data
       quality, meteorology, and sources, and are well placed to articulate
       policy-relevant data analysis questions.
       Air toxics data analysis is needed at all levels to track progress in risk
       reduction.

-------
        Types  of Questions Analysts  May Want  to  Consider


         How do I ensure that the data I plan to use for analysis are of good quality?
          -   How do I treat data below detection? What kinds of data metrics do I need for subsequent
              analyses? (See Preparing Data for Analysis, Section 4)
         How do air toxics concentrations change spatially and by time of day, day of week, and
         season?
          -   Which air toxics have similar patterns?  (See Characterizing Air Toxics, Section 5)
          -   Do these air toxics have common sources?  (See Background, Section 3)
         What are the most important air toxics in terms of potential risk?
          -   Are we measuring them and, if so,  are we measuring them well? Where are they
              important?
          -   Which pollutants are not monitored well enough to characterize their risk or hazard? (See
              Advanced Analyses, Section 7)
         How do concentration levels for a given  city/area compare to other cities?
          -   Are concentrations comparable? What is the variability of air toxics concentrations within
              cities? Do specific cities, states, or regions experience demonstrably higher or lower
              concentrations? Do rural and remote sites show demonstrably lower concentrations? Are
              there differences in concentrations associated with geo-political or agency differences?
              (See  Characterizing Air Toxics, Section 5)
         Have air toxics concentrations declined over time in response to emission control programs?
         (See Quantifying and Interpreting Trends in Air Toxics, Section 6)
         How do the most important air toxics compare with model output (e.g., are ambient
         concentrations high  in locations not shown by the model)? (See Characterizing Air Toxics,
         Section 5)

-------
                         Suggested  Analyses
                                    Overview
        A list of suggested air toxics data analyses is compiled here to provide
        direction on those analyses that may be performed by air toxics monitoring
        agencies and to give an overview of analyses covered in the workbook.
        EPA compiled this list of suggested air toxics data analyses based on analyses
        that would help regional, state, and local organizations determine which factors
        contribute to air toxics concentrations in their area and whether the control
        strategies they have implemented have been successful at reducing these
        pollutants.
        This list is a suggested set of analyses that each area may wish to use to help
        understand air toxics concentrations in the area. There are several key areas
        of interest:
         -  Are data of sufficient quality for analysis?
         -  How would air toxics be characterized in the area?
         -  What are local sources of air toxics?
         -  Do toxics concentrations change over time?
        For the most informative results, some of these analyses could be performed
        annually.

-------
                    Suggested Analyses (1 of 4)

                 Are data of sufficient quality for analysis?

     Questions                                    Example Analyses

     How have data been validated?                Run screening checks on data from AQS;
                                                  identify outliers

     Does suspect data quality appear in any      Review collocated data; inspect summary
     years or species measurements?               statistics and concentration ranges; review
                                                  time series plots of concentrations and
                                                  detection limits

     Have data been censored?                     Assess concentration distributions;
                                                  compare concentrations to detection limits

     Are sufficient samples available for         Determine number of samples/species with
     detailed analyses?                           concentrations above detection
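
A minimal sketch of the last check above (number of samples per species with concentrations above detection). The record layout and all values are hypothetical and do not represent an AQS export format; the comparison assumes that concentrations and detection limits are in the same units.

```python
from collections import defaultdict

# Hypothetical records: (species, concentration, method detection limit),
# with concentration and detection limit in the same units.
samples = [
    ("benzene", 0.80, 0.05),
    ("benzene", 0.03, 0.05),
    ("benzene", 1.20, 0.05),
    ("acrolein", 0.02, 0.10),
    ("acrolein", 0.15, 0.10),
]


def detection_summary(records):
    """Return {species: (n_samples, n_above_detection, fraction_above)}."""
    totals = defaultdict(int)
    above = defaultdict(int)
    for species, conc, mdl in records:
        totals[species] += 1
        if conc > mdl:
            above[species] += 1
    return {s: (totals[s], above[s], above[s] / totals[s]) for s in totals}


summary = detection_summary(samples)
for species, (n, n_above, frac) in summary.items():
    print(f"{species}: {n_above} of {n} samples above detection ({frac:.0%})")
```

A species with only a small fraction of samples above detection may not support the more detailed analyses listed in later sections; what fraction counts as sufficient is a judgment left to the analyst.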

-------
                    Suggested Analyses (2 of 4)

            What is the nature and extent of air toxics problems in your area?

     Questions                                    Example Analyses

     What are the most abundant air toxics at     Determine median concentrations and
     each site on a risk-weighted basis?          concentration ranges and compare to
                                                  appropriate risk levels

     How do these species vary by measurement     Prepare box plots of concentrations by
     season, month, and time of day? Are          season, month, and time of day; compare
     findings consistent with national level      to national results and expectations based
     results?                                     on local conditions

     Do species show any day-of-week patterns?    Prepare box plots of concentrations by day
                                                  of week; compare results to expected
                                                  patterns of local emissions

     How do concentrations compare to other       Compare monitor-level data to national-
     locations, risk levels, remote background,   perspective plots
     or reference concentrations?
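
The box plots suggested above rest on a five-number summary per group. The sketch below computes that summary by season using only the Python standard library. The sample values are hypothetical, and both the season definition (meteorological seasons) and the quartile method are assumptions; plotting tools may use slightly different quartile conventions.

```python
import statistics
from collections import defaultdict
from datetime import date

# Assumed convention: meteorological seasons keyed by month number.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}

# Hypothetical 24-hour samples: (sample date, concentration).
samples = [
    (date(2005, 1, 3), 1.0), (date(2005, 1, 15), 2.0), (date(2005, 2, 2), 3.0),
    (date(2005, 2, 14), 4.0), (date(2005, 12, 20), 5.0),
    (date(2005, 6, 1), 0.5), (date(2005, 6, 13), 1.0), (date(2005, 7, 7), 1.5),
    (date(2005, 7, 19), 2.0), (date(2005, 8, 12), 2.5),
]


def box_stats(values):
    """Five-number summary that a box plot would display."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


by_season = defaultdict(list)
for sample_date, conc in samples:
    by_season[SEASONS[sample_date.month]].append(conc)

stats = {season: box_stats(values) for season, values in by_season.items()}
for season, summary in stats.items():
    print(season, summary)
```

The same grouping step works for month, day of week, or time of day by changing the key used to fill `by_season`.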

-------
                    Suggested Analyses (3 of 4)

                         What are local sources of air toxics?

     Questions                                    Example Analyses

     What are the potential toxics sources in     Investigate Google map of area; overlay
     the area?                                    VOC, PM2.5, and air toxics emission
                                                  inventory information

     Do the air toxics corroborate the source     Examine key species noted as tracers for
     mixture?                                     the expected sources in the area using
                                                  scatter plots and correlation matrices

                                                  Compare concentrations of air toxics and
                                                  nontoxic tracer species to further assess
                                                  sources (e.g., PM2.5 components,
                                                  hydrocarbons)
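
As an illustration of the correlation matrices mentioned above, the sketch below computes pairwise Pearson correlation coefficients for paired measurements at one site. The species names and values are hypothetical, and Pearson correlation is only one possible choice; a rank-based coefficient may be preferable when the data are skewed or contain outliers.

```python
import math

# Hypothetical concentrations from the same sampling dates at one site.
series = {
    "benzene": [1.0, 2.0, 3.0, 4.0],
    "toluene": [2.0, 4.0, 6.0, 8.0],
    "carbon_tetrachloride": [0.62, 0.60, 0.61, 0.63],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """Nested dict of pairwise correlations between all named series."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}


matrix = correlation_matrix(series)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Species that correlate strongly are candidates for having a common source, but correlation alone does not establish one; shared meteorology can produce similar patterns.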

-------
                    Suggested Analyses (4 of 4)

                    Do air toxics concentrations change over time?

     Questions                                    Example Analyses

     What are the annual trends in air toxics     Prepare annual box plots of key species to
     concentrations?                              evaluate trends

     How might changes in air toxics              Compare trends in co-emitted pollutants
     concentrations be related to emissions
     controls?                                    Assess timing of controls and expected
                                                  reductions relevant to local monitoring of
                                                  pollutants

-------
                   Using the Workbook

     •  This workbook documents methodology used in national-scale
       analyses, extends these methodologies to possible use in local-
       scale analyses, and suggests methodology for further exploration.
     •  Skills needed by analysts  to conduct the analyses shown in this
       workbook vary. Analyses require a range of tools, skills, and
       knowledge.  A fundamental understanding of databases,
       spreadsheets,  and  summary statistics is desirable. Some
       analyses require special training (e.g., source apportionment
       tools) and/or tools (e.g., sophisticated statistical treatments).  In
       general, analyses described in the following sections are arranged
       from "easiest" to "most difficult" to perform.
     •  Examples are provided  from the national-scale analyses and
       some analyses were custom-designed for the workbook.
     •  Space available in the workbook is limited; therefore, many details
       are, of necessity, provided in the literature. A reference section is
       provided at the end of each chapter.

-------
                         References
       Agency for Toxic Substances and Disease Registry (ATSDR)
       (2007) Frequently asked questions about contaminants found at
       hazardous waste sites. Available on the Internet at
       http://www.atsdr.cdc.gov/toxfaq.html.
       U.S. Environmental Protection Agency, FERA (Fate, Exposure
       and Risk Analysis) Risk Assessment and Modeling web site.
       Available on the Internet at
       http://www.epa.gov/ttn/fera/risk_atoxic.html.
       U.S. Environmental Protection Agency (2007a) EPA air toxics web
       site. Available on the Internet at
       http://www.epa.gov/ttn/atw/allabout.html.
       U.S. Environmental Protection Agency (2007b) About air toxics,
       health and ecological effects. Available on the Internet at
       http://www.epa.gov/air/toxicair/newtoxics.html.

-------
         Definitions and Acronyms
         This section lists
         definitions of terms
         and acronyms used
         in this workbook.
-------
            Definitions  and  Acronyms
(1 of 12)
      Aerosol - A particle of solid and/or liquid matter that can remain suspended in the air because of
        its small size (generally under one micron).
      AIRNow - The U.S. EPA, NOAA, tribal, state, and local agencies developed the AIRNow web site
        to provide the public with easy access to national air quality information.  The web site offers
        daily air quality index (AQI) forecasts as well as real-time AQI conditions for over 300 cities
        across the United States, and provides links to more detailed state and local air quality web
        sites.
      Airshed - A geographic area that, because of topography, meteorology, and/or climate, is
        frequently affected by the same air mass.
      Air Toxics - Any pollutant that causes or may cause cancer, respiratory, cardiovascular, or
        developmental effects, reproductive dysfunctions, neurological disorders, heritable gene
        mutations, or other serious or irreversible chronic or acute health effects in humans. See
        hazardous air pollutant.
      AMTIC - Ambient Monitoring Technology Information Center.  An EPA website that contains
        information and files on ambient air quality monitoring programs, details on monitoring methods,
        monitoring-related documents and articles, information on air quality trends and nonattainment
        areas, and federal regulations related to ambient air quality monitoring.
      Anthropogenic - Caused or produced by human activities.
      Anthropogenic emissions - Emissions from man-made sources as opposed to natural (biogenic)
        sources.
      AQS - Air Quality System; the EPA's repository of ambient air quality data.
      Back trajectory - A trace backwards in time showing where an air mass has been.


June 2009                            Section 2 - Definitions and Acronyms

-------
            Definitions  and  Acronyms
(2 of 12)
      Background Levels - The concentration of a chemical already present in an environmental
        medium due to sources other than those under study. Two types of background levels may exist
        for chemical substances:  (a) naturally occurring levels of substances present in the
        environment, and (b) concentrations of substances present in the environment due to human
        associated activities (e.g., automobile or industrial emissions).
      Benchmark Dose - An exposure due to the dose of a substance associated with a specified low
        incidence of risk, generally in the range of 1% to 10% of a health effect; or the dose associated
        with a specified measure or change of a biological effect.
      Black Carbon (BC) - Black carbon measured using light absorption, typically with an
        Aethalometer™. Used in the air toxics monitoring network as a potential surrogate measure
        (although not unique or quantitative) of diesel particulate matter.
      Cancer benchmark - A potential regulatory threshold concentration of concern related to long term
        exposure to a chemical associated with increased cancer risk.
      Cancer Incidence - The number of new cases of a disease diagnosed each year.
      Cancer Risk Estimates - The probability of developing cancer from exposure to a chemical agent
        or a mixture of chemicals over a specified period of time.  In quantitative terms, risk is
        expressed in values ranging from zero (representing an estimate that harm certainly will not
        occur) to one (representing an estimate that harm certainly will occur).  The following are
        examples of how risk is commonly expressed: 1.E-04 or 1 x 10^-4 = a risk of 1 additional cancer
        in an exposed population of 10,000 people (i.e., 1/10,000); 1.E-05 or 1 x 10^-5 = 1/100,000.
      Cd - Cadmium.
      Censored Data - The measured value is replaced with a proxy. Typical examples are MDL,
        MDL/2, MDL/10, or zero.
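
The proxies named under Censored Data can be compared with a short sketch. It assumes, as an illustration only, that every value below the method detection limit (MDL) is replaced by the chosen proxy; the function name, the option labels, and the sample values are hypothetical.

```python
import statistics


def substitute_censored(concentrations, mdl, proxy="half_mdl"):
    """Replace values below the MDL with a proxy (MDL, MDL/2, MDL/10, or zero)."""
    proxies = {"mdl": mdl, "half_mdl": mdl / 2, "tenth_mdl": mdl / 10, "zero": 0.0}
    fill = proxies[proxy]
    return [c if c >= mdl else fill for c in concentrations]


# Hypothetical measurements and MDL in the same units.
measured = [0.02, 0.30, 0.04, 0.50]
mdl = 0.10

# The choice of proxy shifts the mean when many values are below detection.
for option in ("mdl", "half_mdl", "tenth_mdl", "zero"):
    filled = substitute_censored(measured, mdl, option)
    print(f"{option:>9}: mean = {statistics.fmean(filled):.3f}")
```

Because half of the hypothetical values are below the MDL, the mean depends noticeably on the proxy. This is why it helps to know how a data set was censored before computing summary statistics.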

-------
            Definitions  and  Acronyms
(3 of 12)
      Census tract - Census tracts are small, relatively permanent statistical subdivisions of a county.
        Census tracts are delineated for most metropolitan areas (MAs) and other densely populated
        counties by local census statistical areas committees following Census Bureau guidelines
        (more than 3,000 census tracts have been established in 221 counties outside MAs). Six states
        (California, Connecticut, Delaware, Hawaii, New Jersey, and Rhode Island) and the District of
        Columbia are covered entirely by census tracts. Census tracts usually represent between
        2,500 and 8,000 people and, when first delineated, are designed to be homogeneous with
        respect to population characteristics, economic status, and living conditions.  Census tracts do
        not cross county boundaries.  The spatial size of census tracts varies widely depending on the
        density of settlement
-------
            Definitions and  Acronyms
(4 of 12)
      Conditional probability function (CPF)  A method that analyzes local source impacts from
         varying wind directions using the source contribution estimates from PMF coupled with the
         corresponding wind directions.
      Confidence Interval (Cl) Cl for a population parameter is an interval with an associated
         probability p that is generated from a random sample of an underlying population such that if the
         sampling was repeated numerous times and the confidence interval recalculated from each
         sample according to the same method, a proportion p of the confidence intervals would contain
         the population parameter in question.
      Covariance  A statistical measure of correlation of the fluctuations of two different quantities.
      Cr  Chromium.
      Data Quality The encompassing term regarding the quality of information used for analysis and/or
         dissemination of data. Utility, objectivity, and integrity are essential parts of data quality.
      Data Quality Objectives (DQOs) Qualitative and quantitative statements derived from the DQO
         process that clarify study objectives,  define the appropriate type of data, and specify tolerable
         levels of potential decision errors that will be used as the basis for establishing the quality and
         quantity of data needed to support the decisions.
      Data Quality Objectives Process A systematic planning tool to facilitate the planning of
         environmental data collection activities.  Data quality objectives are the qualitative and
         quantitative outputs from the DQO process.
      Detection limit (DL) The lowest concentration of a chemical that can reliably be distinguished
         from a zero concentration with analytical methods. See also method detection limit.
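
As an illustration of the conditional probability function (CPF) entry above, a minimal sketch follows. It assumes the commonly used form CPF = m/n for each wind sector, where n is the number of samples with wind from that sector and m is the number of those samples whose source contribution exceeds a chosen threshold. The threshold, the sector width, and the function name are assumptions for illustration and are not specified in this workbook.

```python
from collections import defaultdict

def cpf_by_sector(wind_dirs_deg, contributions, threshold, sector_width=45):
    """Return {sector_start_deg: fraction of samples above threshold}.

    Hypothetical helper: the threshold criterion and sector width are
    illustrative choices, not values prescribed by the workbook.
    """
    total = defaultdict(int)   # n: samples with wind from each sector
    high = defaultdict(int)    # m: of those, samples above the threshold
    for wd, c in zip(wind_dirs_deg, contributions):
        sector = int(wd % 360 // sector_width) * sector_width
        total[sector] += 1
        if c > threshold:
            high[sector] += 1
    return {s: high[s] / total[s] for s in sorted(total)}

wd = [10, 20, 100, 110, 120, 200]        # wind direction, degrees
sc = [5.0, 1.0, 4.0, 6.0, 0.5, 0.2]      # source contribution per sample
print(cpf_by_sector(wd, sc, threshold=3.0))
```

Sectors with a high CPF value point toward wind directions that are more often associated with elevated source contributions.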

-------
            Definitions  and  Acronyms
(5 of 12)
      Dispersion model A source-oriented approach in which a pollutant emission rate and
         meteorological information are input into a mathematical model that disperses (and may also
         chemically transform) the emitted pollutant, generating a prediction of the resulting pollutant
         concentration at a point in space and time.
      DPM Diesel particulate matter.
      Edge  A line that defines the boundary of the relationship between two parameters on a scatter
         plot.
      Elemental carbon (EC)  Black carbon material with little or no hydrogen; non-volatile carbon
         material; often called black carbon or soot.
      Emission Inventory (EI)  A list of air pollutants emitted into a community's atmosphere in
         amounts (commonly tons) per day or year, by type of source.
      EPA U.S. Environmental Protection Agency.
      EPA PMF A standalone version of PMF created by the EPA in 2005.
      Environmental justice The fair treatment and meaningful involvement of all people regardless of
         race, color, national origin, or income with respect to the development, implementation, and
         enforcement of environmental laws, regulations, and policies.
      F-test The F-test provides a statistical measure of the confidence that a relationship exists
         between the two variables (i.e., the regression line does not have a slope of zero, which would
         indicate the dependent variable is not related to the independent variable).
      F-value Output of the F-test. Large F-values indicate a stronger correlation between the two
         variables  (i.e., the slope of the regression line is NOT zero).
      Factor analysis A procedure for grouping data by similarity among variables (i.e., variables that
         are highly correlated are grouped).
      Factor strength (source strength).  See Source contribution.
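
The F-test and F-value entries above can be illustrated for a simple linear regression. The sketch below assumes an ordinary least-squares fit with one independent variable, for which F = (SSR / 1) / (SSE / (n - 2)); the function name and the sample data are hypothetical. Converting F to a P-value requires the F distribution, which is left to a statistics package.

```python
def regression_f_value(x, y):
    """F statistic for the least-squares line y = a + b*x (one predictor).

    Note: a perfect fit gives a zero residual sum of squares and raises
    ZeroDivisionError in this sketch.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squares explained by the regression, and the residual sum of squares
    ssr = sum((intercept + slope * xi - mean_y) ** 2 for xi in x)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return (ssr / 1) / (sse / (n - 2))

x = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical independent variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]          # hypothetical dependent variable
print(round(regression_f_value(x, y), 1))
```

A large F-value, as in this nearly linear example, indicates that the slope is unlikely to be zero.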


-------
            Definitions  and  Acronyms
(6 of 12)
      Federal Reference Method (FRM)  Provides for the measurement of the mass concentration of
        fine particulate matter having an aerodynamic diameter less than or equal to a nominal 2.5
        microns (PM2.5) in ambient air over a 24-hr period for purposes of determining whether the
        primary and secondary National Ambient Air Quality Standards for fine particulate matter are
        met. Designation of a particle sampler as a Federal Reference Method (FRM) is based on a
        demonstration that a vendor's instrument meets the design specifications, performance
        requirements, and quality control standards specified in the regulation.
      Fine particles Particulate matter with diameter less than 2.5 microns; PM2.5.
      HAPs (hazardous air pollutants) Hazardous air pollutants, also known as air toxics, have been
        associated with a number of adverse human health effects, including cancers, asthma and
        other respiratory ailments, and neurological problems such as learning disabilities and
        hyperactivity.
      Hazard Quotient (HQ) The ratio of a single substance exposure level over a specified time period
        (e.g., chronic) to a reference value (e.g., an RfC) for that substance derived from a similar
        exposure period.
      HYSPLIT HYbrid Single-Particle Lagrangian Integrated Trajectory model; a system for computing
        simple air parcel trajectories.
      IMPROVE Interagency Monitoring of Protected Visual Environments. A collaborative monitoring
        program to establish present visibility levels and trends, and to identify sources of man-made
        impairment.
      Interquartile range  The difference between the 75th and 25th percentiles of a data set.
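
The Hazard Quotient (HQ) entry above reduces to a simple ratio. A minimal sketch, assuming the exposure level and the reference concentration (RfC) are expressed in the same units and refer to a similar exposure period; the numbers are placeholders, not toxicity values for any real pollutant.

```python
def hazard_quotient(exposure_conc, reference_conc):
    """Ratio of an exposure concentration to a reference value such as an RfC.

    Both arguments must share the same units (for example, ug/m3).
    """
    if reference_conc <= 0:
        raise ValueError("reference concentration must be positive")
    return exposure_conc / reference_conc

# Placeholder values for illustration only
annual_mean = 1.5   # ug/m3, hypothetical measured annual average
rfc = 6.0           # ug/m3, hypothetical reference concentration
print(hazard_quotient(annual_mean, rfc))  # 0.25
```

An HQ at or below 1 indicates that the exposure does not exceed the reference value.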

-------
            Definitions and  Acronyms
(7 of 12)
      Level 0 validation Routine checks made during the initial data processing and generation of
         data, including  proper data file identification, review of unusual events, review of field data
         sheets and result reports, instrument performance checks, and deterministic relationships.
      Level I validation  Tests for internal consistency to identify values in the data that appear atypical
         when compared to values of the entire data set.
      Level II validation Comparison of the current data set with historical data to verify consistency
         over time. This level can be considered a part of the data interpretation or analysis process.
      Level III validation  Tests for parallel  consistency with data sets from the same population (i.e.,
         region, period of time, air mass,  etc.) to identify systematic bias. This level can also be
         considered a part of the data interpretation or analysis process.
      LC Local conditions; refers to ambient PM measurements.
      MACT  Maximum  achievable control technology. MACTs are technology-based air emission
         standards established under Title III of the 1990 Clean Air Act Amendments.
      Mean The sum of all values divided by the number of samples.
      Median  The middle value in a sorted  list of samples if there is an odd number of samples, or the
         average of the two middle values if there is an even number of samples.
      Method Detection Limit (MDL) The  minimum concentration of a substance that can be
         measured and  reported with 99% confidence that the analyte concentration is greater than zero
         and is determined from the analysis of a sample in a given matrix containing the analyte.
      Mobile sources  Motor vehicles and other moving objects that release pollution; mobile sources
         include cars, trucks, buses, planes,  trains, motorcycles, and gasoline-powered lawn mowers.
         Mobile sources are divided into two groups: road vehicles, which  include cars, trucks, and
         buses, and non-road vehicles, which include trains, planes, and lawn mowers.
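
The Method Detection Limit entry above describes a 99% confidence criterion. One common way to estimate an MDL, sketched below as an assumption rather than as this workbook's procedure, multiplies the standard deviation of replicate low-level measurements by the one-sided 99% Student's t value for n - 1 degrees of freedom (3.143 for seven replicates). The replicate values and the function name are invented for illustration.

```python
import statistics

# One-sided 99% Student's t values, keyed by number of replicates n
# (degrees of freedom = n - 1)
T_99 = {7: 3.143, 8: 2.998, 9: 2.896, 10: 2.821}

def estimate_mdl(replicates):
    """Estimate an MDL as t * s from replicate measurements of a low-level sample."""
    n = len(replicates)
    if n not in T_99:
        raise ValueError("this sketch only tabulates t for 7 to 10 replicates")
    return T_99[n] * statistics.stdev(replicates)   # sample standard deviation

spikes = [0.52, 0.48, 0.55, 0.47, 0.50, 0.53, 0.45]   # hypothetical, ug/m3
print(round(estimate_mdl(spikes), 3))
```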



-------
            Definitions  and  Acronyms
(8 of 12)
      Mobile source air toxics (MSATs) Compounds that are emitted by mobile sources and have the
        potential for serious adverse health effects.
      National Ambient Air Quality Standards (NAAQS)  Health-based pollutant concentration limits
        established by the EPA that apply to outside air.
      NATA National air toxics assessment. EPA's
        national-scale assessment of 1999 air toxics emissions. The purpose of the national-scale
        assessment is to identify and prioritize air toxics, emission source types and locations that are of
        greatest potential concern in terms of contributing to population risk.
      NATTS National air toxics trends stations.
      NEI National emissions inventory.
      NOAA  National Oceanic and Atmospheric Administration.
      NWS National Weather Service.
      OAQPS Toxicity Table The EPA Office of Air and Radiation recommended default chronic toxicity
        values for hazardous air pollutants. They are generally appropriate for screening-level risk
        assessments, including assessments of select contaminants, exposure routes, or emission sources
        of potential concern, or to help set priorities for further research. For more complex, refined risk
        assessments developed to support regulatory decisions for single sources or substances, dose-
        response data may be evaluated in detail for each "risk driver" to incorporate appropriate new
        toxicological data.
      OH  Hydroxyl radical; the driving force behind the daytime reactions of hydrocarbons in the
        troposphere.
      O3  Ozone; a major component of smog. Ozone is not emitted directly into the air but is formed by the
        reaction of VOCs and NOx in the presence of heat and sunlight.
      Organic carbon (OC)  Consists of hundreds of separate semi-volatile and particulate compounds.


-------
            Definitions  and Acronyms
                                        (9 of 12)
      Outliers  Data that are physically, spatially, or temporally inconsistent.
      P-value Provides a measure of the percentage confidence that the slope is not zero:  % confidence
         that the slope is not zero = 100% x (1 - P). Generally, 95% confidence is used as a cutoff value,
         corresponding to a P-value of 0.05.
      PAMS  Photochemical Assessment Monitoring Stations.
      Particulate matter (PM)  A generic term referring to liquid and/or solid particles suspended in the air.
      Percentile The pth percentile of a data set is the number such that p% of the data is less than that
         number.
      PM2.5  Particulate matter less than 2.5 microns.  Tiny solid and/or liquid particles, generally soot and
         aerosols. The size of the particles (2.5 microns or smaller, about 0.0001 inches or less) allows
         them to easily enter the air sacs deep in the lungs where they may cause adverse health effects;
         PM2.5 also causes visibility reduction.
      PM10 Particulate matter less than 10 microns. Tiny solid and/or
         liquid particles of soot, dust,  smoke, fumes, and aerosols. The
         size of the particles (10 microns or smaller, about 0.0004 inches
         or less) allows them to easily enter the air sacs in the lungs where
         they may be deposited, resulting in adverse health effects. PM10
         also causes visibility reduction and is a criteria air pollutant.
      PMF  Positive matrix factorization; a receptor model. PMF can be
         used to determine source profiles and source contributions
         based on the ambient data.
      POC Pollutant occurrence code used in the AQS.
      [Figure: size of a PM2.5 particle (2.5 µm) compared with a human hair cross-section (70 µm).]
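
The P-value and Percentile entries above can be expressed directly in code. The sketch below applies the formula given in the P-value entry and the "p% of the data is less than that number" wording of the Percentile entry; the function names and data are hypothetical.

```python
def percent_confidence(p_value):
    """Percent confidence that the regression slope is not zero: 100% x (1 - P)."""
    return 100.0 * (1.0 - p_value)

def percent_below(data, value):
    """Percentage of observations strictly less than `value`."""
    return 100.0 * sum(1 for d in data if d < value) / len(data)

print(percent_confidence(0.05))                        # 95.0
print(percent_below([0.2, 0.4, 0.5, 0.9, 1.3], 0.9))   # 60.0
```

In the second call, 0.9 sits at the 60th percentile of the five hypothetical values because three of them are smaller.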

-------
           Definitions  and  Acronyms
(10 of 12)
      Point source Point sources include industrial and nonindustrial stationary equipment or
         processes considered significant sources of air pollution emissions.  A facility is considered to
         have significant emissions if it emits about one ton or more in a calendar year.  Examples of
         point sources include industrial and commercial boilers, electric utility boilers, turbine engines,
         industrial surface coating facilities, refinery and chemical processing operations, and petroleum
         storage tanks.
      Potential Source Contribution Function (PSCF)  A method that combines the source
         contribution estimates from PMF with the air parcel backward trajectories to identify possible
         source areas  and pathways that give rise to the observed high particulate mass concentrations
         from the potential sources.
      Precursor  Compounds that change chemically or physically after being emitted into the air and
         eventually produce air pollutants.  For example, sulfur and nitrogen oxides are precursors for
         particulate matter.
      Primary particles The fraction of PM10 and PM2.5 that is directly emitted from combustion and
         fugitive dust sources.
      QA  Quality assurance; a set of external tasks to provide certainty that the quality control system
         is satisfactory. These tasks include independent performance audits, on-site system audits,
         interlaboratory comparisons, and periodic evaluations of internal quality  control data.
      QC  Quality control; a set of internal tasks performed to provide accurate and precise measured
         ambient air quality data. These tasks address sample collection, handling, analysis, and
         reporting (e.g., periodic calibrations, routine service checks, instrument-specific monthly quality
         control maintenance  checks, and duplicate analyses on split and spiked samples).
      R-squared, r²  Statistical measure of how well a regression line approximates real data points;
         an r² of 1.0 (100%) indicates a perfect fit.
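
As a sketch of the R-squared entry above, the coefficient of determination can be computed as 1 - SSE/SST, where SSE is the sum of squared differences between the data and the regression line and SST is the sum of squared deviations of the data from their mean. The function name and sample values are hypothetical.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

observed = [2.0, 4.0, 6.0, 8.0]
print(r_squared(observed, observed))              # 1.0 (perfect fit)
print(r_squared(observed, [2.5, 3.5, 6.5, 7.5]))  # 0.95
```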



-------
           Definitions  and  Acronyms
(11 of 12)
      Receptor model A receptor-oriented approach for identifying and quantifying the sources of
        ambient air contaminants at a receptor primarily on the basis of concentration measurements at
        that receptor.
      Reference Concentration (RfC) An estimate (with uncertainty of perhaps an order of magnitude)
        of a continuous inhalation exposure to the human population (including sensitive subgroups) that
        is likely to be without an appreciable risk of deleterious effects during a lifetime.
      Reid Vapor Pressure (RVP) A measure of gasoline volatility.
      RFG Reformulated gasoline.
      Residuals Measured concentrations minus modeled concentrations.
      SEARCH Southeastern Aerosol Research and Characterization Study.
      Secondary formation  The fraction of a pollutant that is formed in the atmosphere (e.g.,
        formaldehyde is both emitted directly and formed in the atmosphere through secondary
        photochemical processes).
      Selected ion monitoring (SIM) A mass spectral mode in which the mass spectrometer is set to
        scan over a very small  mass range, typically one mass unit, providing higher sensitivity results
        than a full mass scan.
      Slope  Statistical measure of the average ratio of the predicted to measured concentrations of a
        species; a slope closer to 1.0 demonstrates a closer fit.
      Source apportionment  The process of apportioning ambient pollutants to an emissions source.
        Also known as source attribution.
      Source contribution  Total mass of material from a source measured in a sample.
      Source-dispersion model  See Dispersion model.
      Source profile  Listing of individual chemical species emitted by a specific source category.

-------
           Definitions  and  Acronyms
(12 of 12)
      Speciation Trends Network (STN)  A network of sampling locations established by the EPA in
        2001 to characterize PM2.5 composition in urban areas.  Roughly 300 sites nationwide are part
        of this network. Now part of the Chemical Speciation Network (CSN).
      Standard Deviation A measure of how much the observations vary about their average.  The square root of the
        average squared deviation of the observations from their mean.
      Standard operating procedure (SOP) A set of instructions used to ensure data quality.
      Standardized residual  Ratio of the residual to the uncertainty of a species in a specific sample
        determined by the user.
      State implementation  plan (SIP) A detailed description of the programs a state will use to carry
        out its responsibilities under the Clean Air Act. State implementation plans are collections of
        the regulations used by a state to reduce air pollution. The Clean Air Act requires that the EPA
        approve each state implementation plan.
      SVOC Semi-volatile organic compound.
      Toxicity The degree to which a substance or mixture of substances can harm humans or
        environmental receptors.
      TRI Toxic Release Inventory.  Publicly available EPA database that contains information about
        toxic chemical releases and other waste management activities reported annually by certain
        covered industry groups as well as federal facilities.
      TSP Total suspended particulate.
      Uncensored data Data reported "as is" with no substitution for values  below detection.
      Variance The square of the standard deviation.
      VOC  Volatile organic compound.
      WD Wind direction.
      WS Wind speed.
      XRF  Energy dispersive X-ray fluorescence. Method used to quantify particulate metals.
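
The Standard Deviation, Variance, and Standardized residual entries above are related by simple arithmetic. A minimal sketch using Python's statistics module follows; it uses the population forms (division by n) to match the "average squared deviation" wording, and the sample values and uncertainty are hypothetical.

```python
import statistics

def standardized_residual(measured, modeled, uncertainty):
    """Residual (measured minus modeled) divided by the species uncertainty."""
    return (measured - modeled) / uncertainty

conc = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical concentrations
sd = statistics.pstdev(conc)       # square root of the average squared deviation
var = statistics.pvariance(conc)   # square of the standard deviation
print(sd, var)                                  # 2.0 4.0
print(standardized_residual(5.0, 4.0, 0.5))     # 2.0
```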

-------
                                   References
        Bay Area Air Quality Management District (2005) Air quality glossary.
        California Air Resources Board (2003) Glossary of air pollution terms.
        Minnesota Pollution Control Agency (2005) General glossary.
        National Park Service (2005) Glossary of terms used by the NPS Inventory and Monitoring Program.
        Sam Houston State University (2005) Atmospheric chemistry glossary. Web site prepared by Sam
           Houston State University, Department of Chemistry, Huntsville, TX.
        U.S. Environmental Protection Agency (2002) The plain English guide to the Clean Air Act:
           Glossary.
        U.S. Environmental Protection Agency (2005) AIRTrends 1997 report:  list of acronyms.

-------
                 Background
    What are air toxics and why are they important?
June 2009                 Section 3 - Background

-------
                    Background
          What's Covered in This Section?
      Air toxics overview
      Health risks from air toxics; terminology
      Air toxics emissions
      Physical properties
      Formation, destruction, and transport of air toxics
      History of sampling; objectives of air toxics and other
      monitoring programs
      Air toxics sampling  and analysis
      Critical issues for data interpretation

-------
                                           Air  Toxics
                                                 Overview
        •  What are air toxics?
            - Air toxics are gaseous, aerosol, or particle pollutants present in the air in varying concentrations with
               characteristics such as toxicity or persistence that can be hazardous to human, plant, or animal life.
            - The terms "air toxics" and "hazardous air pollutants" (HAPs) are used interchangeably in this document.
            - Air toxics include the following general categories of compounds: volatile and semi-volatile organic
               compounds (VOCs, SVOCs), polycyclic aromatic hydrocarbons (PAHs), heavy metals, and carbonyl
               compounds.
        •  What are the health and environmental effects of toxic air pollutants?
            - People exposed to toxic air pollutants at sufficient concentrations and durations may have an increased
               chance of getting cancer or experiencing other serious health effects.
            - Both high values and annual means of air toxics concentrations are of interest because some air toxics
               have both episodic, short-term health effects and chronic, long-term health effects.
            - Other health effects can include damage to the  immune system, as well as neurological, reproductive
               (e.g.,  reduced fertility),  developmental, respiratory, and other health problems.
            - Some toxic air pollutants, such as mercury, can deposit onto soils or surface waters where they are taken
               up by plants and ingested by animals and are eventually magnified up through the food chain.
            - Animals may experience health problems if exposed to sufficient quantities of air toxics over time.
        •  How are  people exposed to air toxics?
            - Breathing  contaminated air.
            - Eating contaminated food products, such as fish from contaminated waters; meat, milk, or eggs from
               animals that feed on contaminated plants; and fruits and vegetables grown in contaminated soil on which
               air toxics have been deposited.
            - Drinking water contaminated by toxic air pollutants.
            - Ingesting contaminated soil.
            - Touching contaminated soil, dust, or water.
            - Accumulating some persistent toxic air pollutants in body tissues after toxic air pollutants have entered
               the body. Predators typically accumulate even greater pollutant concentrations than their contaminated
               prey. As a result, people and other animals at the top of the food chain who eat contaminated fish or
               meat are exposed to concentrations that are much higher than the concentrations in the water, air, or soil.

         U.S. Environmental Protection Agency (2007c, g)

-------
            Health  Risks from  Air  Toxics
        Simply put, health risks are a measure of the chance that you will experience
        health problems.
                          Health risk = Hazard x exposure
        Health risk is the probability that exposure to a hazardous substance will
        make you sick.  Animal experiments and human studies provide information
        about a substance's level of hazard. Scientists use the results of such
        studies to estimate the likelihood of illness at different levels of exposure.
        Exposure to toxic air pollutants can increase your health risks. For example,
        if you live near a factory that releases cancer-causing chemicals and inhale
        contaminated air, your risk of getting cancer may increase. Breathing air
        toxics could also increase your risk of noncancer effects such as emphysema,
        asthma, or reproductive disorders.
        Ambient concentrations of air toxics are compared to health-related
        concentrations derived from scientific assessments conducted by the EPA and
        other environmental agencies.  These levels of concern provide a frame of
        reference to put air toxics concentrations into perspective.
      U.S. Environmental Protection Agency (2007a, b)

-------
             Air  Toxics  Emissions

             What Are the Sources of Air Toxics?

Air toxics are both directly emitted by sources and formed in the
atmosphere.  In emission inventory terminology, emissions are grouped as
point (major), area, and mobile sources. The following three definitions
describe how these terms are used in the emission inventory.
Major sources include chemical plants, steel mills, oil refineries, and
hazardous waste incinerators for which there is a specific location provided
in the inventory. Pollutants can be released when equipment leaks, when
material is transferred from one area to another, or when waste is given off
from a facility through smoke stacks.
Area sources are made up of many smaller sources releasing pollutants
to the outdoor air in a defined area. Examples include neighborhood dry
cleaners, small metal plating operations, gas stations, and woodstoves.
These sources may not be identified in the inventory by a specific location.
Mobile sources include highway vehicles, trains, marine vessels, aircraft,
and non-road equipment (such as construction equipment).
Routine releases, such as those from industry, cars, landfills, or
incinerators, may follow regular patterns and happen continuously over
time. Other releases may be routine but intermittent, such as when a
plant's production is performed in batches. Accidental releases can occur
during an explosion, equipment failure, or a transportation accident. The
timing and amount released during accidental releases are difficult to
estimate.
Natural sources - Some air toxics are also released from natural sources
such as volcanoes or fires. In the inventory, these are typically
included in area source emissions.


-------
                   Air Toxics  Emissions

                      Source  Type Characteristics

      Understanding the emission source type of a particular air toxic can help
      the analyst begin to develop a conceptual model of concentration patterns
      and gradients that  might be expected.
       • Major source emissions, for example, are a localized source of toxics. Steep
         concentration gradients of primarily emitted toxics around point sources are
         typical, especially if there are no other nearby sources of the pollutants.
       • Area source emissions are typically well-distributed emissions sources because
         there are multiple sources in an area. Area source emissions  can lead to
         relatively homogeneous concentrations of toxics on the urban  scale.  However, if
         a monitor is placed close to any source type, gradients may be observed.
       • Mobile source air toxics exhibit both point source and area source characteristics.
         Very close to a roadway or near a construction site, mobile source air toxics may be
         seen in higher concentrations.  A few hundred meters away from the roadway, for
         example, concentrations typically fall to more normal average urban-scale levels.

-------
                     Physical  Properties

      •  Physical properties of air toxics span the entire range of pollutants
        present in the atmosphere.
        -  Air toxics are present in the atmosphere as particles and gases and in semi-
           volatile form.
        -  Air toxics can be both primary (directly emitted) and secondary (formed in the
           atmosphere) in origin.
        -  Air toxics are mostly emitted from anthropogenic sources, but include some
           biogenic sources.
        -  Some air toxics have very short atmospheric lifetimes while others remain in
           the atmosphere for decades.
      •  Some air toxics such as VOCs (e.g., benzene and toluene) are
        precursors to ozone and particulate matter (PM); and other toxics such as
        heavy metals are components of PM.
      •  Preliminary investigation of the linkage between criteria pollutants and air
        toxics showed a correlation of acetaldehyde and formaldehyde with
        ozone, but that correlation was likely because of similar photochemical
        production mechanisms rather than source similarities (i.e., not a causal
        association).  Most air toxics did not correlate well with ozone, PM2.5,
        or other air toxics.


-------
       Formation,  Destruction,  Transport
                                                                     (1 of 2)
[Figure: Conceptual depiction of transport scales, showing the typical downwind concentration
gradient from a point source and the typical concentration gradient from an area source.]
Some air pollution problems are limited
to the local area where pollution is
emitted. Other air quality problems
spread to cover cities or regions of
the country. Emissions of some
pollutants from anywhere on earth
can contribute to a global problem.
While some pollutants can be neatly
characterized as contributors to local,
regional, or global problems, many
pollutants are important on multiple
spatial scales.  Explaining the factors
that control the spatial extent of a
pollutant requires understanding the
emissions,  transport, and chemistry
of a pollutant.
Concentrations of primarily emitted pollutants are almost always highest very close to
their emissions source.  The figure illustrates the typical drop-off
in concentrations from an emissions source as distance increases from the source.
Pollution concentrations start very high, but are diluted by the atmosphere in the first few
hundred feet from a  source as they are transported and dispersed.
[Figure labels: Pollutant Source; Urban Center. Horizontal axis: Downwind Distance from Source (m).]

-------
       Formation,  Destruction,  Transport
(2 of 2)
         Concentrations of pollutants that are secondarily formed in the atmosphere are often
         highest downwind of the source of precursor compounds.  Chemical or physical rates of
         formation determine how far the precursor pollutants travel before they begin forming
         secondary pollutants such as formaldehyde. Factors such as wind speed and
         temperature will also influence where these secondary pollutants are formed, relative to
         where they were originally emitted. Generally, pollutants that are secondarily formed
         do not have steep concentration gradients near the original precursor emissions
         source.
         The distance that a particular air pollutant emitted from a source may travel is
         determined by atmospheric chemistry (pollutant lifetimes and formation and removal
         processes), meteorology (air mass movement and precipitation), and topography
         (mountains and  valleys that affect air movement). The longer a pollutant stays in the
         atmosphere, the farther it can be transported. Some air toxics are removed quickly by
         chemical reactions (e.g., 1,3-butadiene) or physical processes (e.g., larger, heavier
         particles deposit to the ground quickly).  These short-lived pollutants can only travel
         short distances from where they are emitted (10s to 100s of miles).  Other pollutants
         react more slowly and can travel large distances from where they are formed or emitted
         (e.g., toxic metals in PM2.5).  These pollutants may be more regionally homogeneous.
         Finally, some unreactive pollutants can remain in the atmosphere for months, years, or
         decades and spread across the Earth (e.g., carbon tetrachloride).

-------
                      Residence  Time
                               Overview

     •  Residence time is a pollutant-specific measure of the average
       lifetime of a molecule in the atmosphere.
     •  It is dependent on chemical and physical removal pathways; these
       include
         - Chemical: reaction with hydroxyl radical (OH), photolysis
         - Physical: Wet or dry deposition
     •  Why is it important to understand residence times?
         - Residence times can provide insight into the spatial and temporal
           variability of air toxics.
         - Longer residence times result in less spatial variability (e.g., carbon
           tetrachloride).
         - Conversely, short residence times should result in steep gradients in
           concentrations near sources and temporal patterns that are
           dependent on emissions schedules.
     •  Residence times are not characterized well for all air toxics.  Some
       air toxics and their residence  times are listed in the appendix to
       this section.

-------
                  History of Sampling

     •  Air toxics measurements have been collected across the country
       since the 1960s as part of various programs and measurement
       studies.
     •  National monitoring efforts have included programs specific to air
       toxics:
        - National Air Toxics Trends Stations (NATTS)
        - Urban Air Toxics Monitoring Program (UATMP)
     •  Some ambient monitoring networks are designed for other
       purposes but also provide air toxics data:
        - Photochemical Assessment Monitoring Station (PAMS) program
        - Chemical Speciation Network (CSN) which includes the Speciation
          Trends Network (STN)
        - Interagency Monitoring of Protected Visual Environments
          (IMPROVE)
     •  State and local agencies have also operated long-running
       monitoring operations and special studies to understand air toxics
       in their communities.

-------
                          NATTS  Sampling
                                      Overview
         NATTS sampling began in 2003 with 23 sites; the first
         complete year of data was 2004.
         There are currently 27 national air toxics trends sites:
         21 urban and 6 rural.
         Most stations are collocated with PM2.5
         speciation samplers, and some also
         include PAMS measurements.
         The principal objective of the NATTS
         network is to provide long-term monitoring
         data across representative areas of the
         country for certain priority HAPs
         (e.g., benzene, formaldehyde, 1,3-butadiene,
         acrolein, and hexavalent chromium) in order
         to establish national trends for these and other HAPs.
         Recently, the list of pollutants monitored at NATTS
         sites was expanded to include polycyclic aromatic
         hydrocarbons (PAHs), of which naphthalene is the
         most prevalent.
         All sites follow QA programs for sampling and
         siting.
         Periodic refinement of pollutants and/or sampling
         may be made (e.g., EPA plans to re-evaluate the
         program every six years).
         [Figure: National Air Toxics Trends Stations (NATTS), June 2008]
         More information can be found on the
         NATTS web site:
         http://www.epa.gov/ttn/amtic/natts.html

-------
                 NATTS Sampling
                        Objectives

    The primary objectives of NATTS monitoring include
     • Providing air toxics data of sufficient quality to identify
       trends, characterize ambient concentrations in
       representative areas, and evaluate air quality models.
     • Providing tools and guidance that enable consistent,
       high certainty measurements.
     • Using these consistent measurements to facilitate
       measuring progress towards national emission and
       risk reduction goals.
     • Considering all NATTS sites to be NCORE level 2
       sites, thereby providing rich  data sets to address
       multi-pollutant issues. NCORE level 2 sites are
       "backbone" sites providing consistent, long-term data
       for multiple pollutant types.


-------
         Urban  Air Toxics Monitoring  Program (UATMP)
                                                            2007 UATMP Sites

         The UATMP has provided sample
         collection and analysis support since
         1987 to encourage state, local, and tribal
         agencies to understand and appreciate
         the nature and extent of potentially toxic
         air pollution in urban areas.
         Participation in the UATMP is voluntary;
         aside from the NATTS, target pollutants
         and monitor siting are at the discretion of
         each participant agency.
         UATMP  is used by a variety of networks
         including some NATTS, some local-
         scale, and some 105-funded air toxics
         monitoring sites.
         All UATMP samples are analyzed in a
         central laboratory for concentrations of
         VOCs, carbonyls, SVOCs, and metals.

         The laboratory is centrally managed by EPA's Office of Air Quality Planning and Standards
         (OAQPS) Air Quality Assessment Division.
         UATMP assures analytical consistency among participants
          -  Data validation and AQS data entry are standard
          -  Site support available (provide monitors, instrument certification, installation, troubleshooting, etc.)
                                                                 U.S. Environmental Protection Agency (2006f)

-------
                            PAMS  Sampling
      The goal of the PAMS network is to help assess ozone control programs by
       - identifying key constituents and parameters
       - tracking trends
       - characterizing transport
       - assisting in forecasting episodes
       - assisting in improving emission inventories
      Toxic VOCs sampled by the PAMS network include benzene,
      formaldehyde, xylenes, toluene, ethylbenzene, styrene, and acetaldehyde.
      PAMS sites collect subdaily measurements that are useful
      in assessing diurnal trends.
      More information can be found on the PAMS web site at
      http://www.epa.gov/ttn/amtic/pamsmain.html.
      [Figure: PAMS Monitoring Network, December 2007]
      [Figure: PAMS network design, showing the maximum ozone site and extreme
      downwind site relative to the primary and secondary morning winds and the
      primary afternoon wind]
      [Table: analysis objectives by PAMS site type. Site types: I (Upwind),
      II (Max. Emissions), III (Max. Ozone), IV (Downwind). Analysis objectives:
      corroborate precursor EI; assess changes in emissions and corroborate
      reductions; assess ozone and precursor trends; provide input to models and
      evaluate models; evaluate population exposure; other analyses (biogenics,
      transport, source apportionment, diurnal patterns, day-of-week, episode vs.
      non-episode).]
                                                            U.S. Environmental Protection Agency (2006c)

-------
                     CSN  Sampling
      The Chemical Speciation Network is a companion network to the
      mass-based Federal Reference Method (FRM) network
      implemented in support of the PM2.5 National Ambient Air Quality
      Standards (NAAQS).
      The purpose of the CSN is to provide nationally consistent
      speciated PM2.5 data for the assessment of trends at
      representative sites in urban areas across the country.
      As part of a routine monitoring
      program, the CSN quantifies mass
      concentrations and PM2.5 constituents,
      including numerous trace elements,
      ions (sulfate, nitrate, sodium,
      potassium, ammonium), elemental
      carbon, and organic carbon.
      CSN data are available via AQS.
      Prior to 2007, the carbon (especially EC)
      measurements from this network differed
      from those of IMPROVE. A phased-in change in
      methodology is underway.
      [Figure: map of CSN sites, circa 2005]
                                           U.S. Environmental Protection Agency (2007f)

-------
                  IMPROVE Sampling

       The Interagency Monitoring of Protected Visual Environments (IMPROVE)
       program provides PM2.5 speciated and mass measurements in 156 Class I
       areas (national parks and wilderness areas). Speciated PM2.5 metals are the
       only toxics measured in this network.
       [Figure: IMPROVE Site Locations]
       Data are available in AQS.
       IMPROVE data can also be accessed via the
       internet from the VIEWS* web site.
        - Raw data and various aggregates can be obtained in a
           variety of output formats (ASCII, HTML, XLS etc.).
        - All data from the inception of the IMPROVE network in
          1988 are currently available.
       User-input mapping and plotting tools are available to visualize trends,
       spatial patterns, back trajectories and metadata (i.e., site locations).
       IMPROVE also provides site photos and local topographical maps which are
       very useful for data analyses.
       To download data or get more information see
       http://vista.cira.colostate.edu/views/
                                                  *VIEWS: Visibility Information Exchange Web System

-------
       Local-Scale  Monitoring Projects

       EPA began funding local-scale monitoring projects
       in the 2004 fiscal year.
       The goal of local monitoring is to provide more flexibility to
       address middle- and neighborhood-scale (0.5 km to 4 km) issues
       that are not handled well by national networks, given the diversity
       of toxics issues across the nation.
       Specific objectives include identifying and profiling air toxics
       sources, developing and assessing emerging measurement
       methods, characterizing the degree and extent of local air toxics
       problems, and tracking progress of air toxics reduction activities.
       Projects are selected through an open competition process.
       Grant topics, funding levels, and number of awards are set for
       each grant cycle; for more information, see
       U.S. Environmental Protection Agency (2006c).
       Local-scale monitoring is typically conducted for only 1-2 years.



-------
      Air Toxics Sampling  and  Analysis


      • Because air toxics are present in the atmosphere in gaseous, particulate,
        and semi-volatile form, no single measurement technique is adequate.
        Differences in chemical and physical properties further complicate
        collection; the choice of measurement technique depends on the
        objectives of data collection, including the chemical species of interest,
        funds available, and desired detection limit.
      • EPA offers seventeen approved sampling and analysis methods for toxic
        gases; among the most commonly used methods are the following:
         -  Compendium method TO-11A. Used to measure formaldehyde and other carbonyl
            compounds. Previous methods include TO-5, which had lower sensitivity and
            reproducibility and was more labor-intensive. Method TO-11A uses
            dinitrophenylhydrazine (DNPH)-coated cartridges to collect the samples and analyzes them
            using high performance liquid chromatography (HPLC).
         -  Compendium method TO-13A. Used to measure polycyclic aromatic hydrocarbon
            (PAH) compounds. This method allows for a variety of sampling media; an effective
            choice is the combination of polyurethane foam (PUF) and XAD-2®.  Samples are
            analyzed by high resolution gas chromatography/mass spectrometry (GC/MS).
         -  Compendium method TO-15. Created  to target 97 compounds on  the list of 187
            hazardous air pollutants. The method uses specially prepared canisters analyzed by
            high resolution gas chromatography/mass spectrometry (GC/MS).



-------
       Air  Toxics Sampling  and Analysis  (2 of 2)

        •  EPA-approved methods for collection and
          analysis of suspended particulate matter are
          documented in the "Compendium of Methods for
          the Determination of Inorganic Compounds in
          Ambient Air."
           - Chapters 1 and 2 address mass measurement only;
             while important to the criteria air pollutant program,
             these chapters are not of particular importance to
             the air toxics ambient monitoring  program:
              • Chapter IO-1, Continuous Measurement of PM10
                Suspended Particulate Matter (SPM) in Ambient Air
              • Chapter IO-2, Integrated Sampling of Suspended
                Particulate Matter (SPM) in Ambient Air
           - Chapter IO-3, Chemical  Species Analysis of Filter-Collected Suspended Particulate Matter
             (SPM), is of considerable importance to the air toxics ambient monitoring program
              • Several different methods for  speciated particulate analyses are available
                     -  Each has advantages and disadvantages depending on the target analytes and
                        desired minimum detection limits.
                     -  For Hazardous Air Pollutant (HAP) metals, IO-3.5 (Inductively Coupled Plasma /
                        Mass Spectrometry (ICP/MS)) offers the lowest detection limits.

        •  Detailed information about these  monitoring methods is available at:

-------
    Differences Among Sampling Networks

    •  When using data from different sampling networks, it is
       important to consider the following:
       - The multiple sampling networks from which data were drawn
         for these analyses vary in their objectives and sampling and
         analytical methods. Data may not always be comparable.
       - Sampling, analysis, method detection limits, objectives, site
         characteristics, etc. have changed over time. Care is needed
         in interpreting temporal and spatial trends.
    •  Analysts need  to gather, and understand, all metadata
       prior to conducting analyses.

-------
         Critical  Issues for Interpretation

      Issues to consider when planning and performing data analysis
       • Data quality.  Information from collection and chemical analysis such as standard
         operating procedures, audits, accuracy and precision, and data validation provide
         insight into sample and collection biases and errors. This information is necessary for
         data validation. Metadata such as precision and accuracy are required for other
         analyses (e.g., receptor modeling).
       • Data quantity. The number of species and amount of data above detection give
         insight into what analyses can be performed and provide a starting point for planning
         data analysis.
       • Sampling duration. Duration provides information about analysis possibilities; for
         example, 24-hr data cannot be used to investigate diurnal patterns.  This information
         may also be necessary for calculating completeness criteria when aggregating data.
       • Sampling frequency. Frequency information provides further insight into what
         analyses will  be possible; for example, one year of 1-in-6 day data may not be
         sufficient to investigate day-of-week tendencies. Sample frequency will also be
         necessary to  calculate data completeness and to aggregate data.
       • Complementary data. Additional data for criteria pollutants, speciated PM, and non-
         toxic hydrocarbons and meteorological data can be useful  in a variety of analyses
         such as data  validation, understanding transport, and source identification.
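
       The sampling-frequency point can be checked with a quick count. The sketch below
       generates one year of 1-in-6 day sample dates and tallies how many fall on each day
       of the week. The start date and function name are arbitrary assumptions, and the
       counts are an upper bound because missed or invalidated samples are ignored.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def samples_per_weekday(start, end, interval_days=6):
    """Count scheduled sample dates by weekday for a 1-in-N day schedule."""
    counts = Counter()
    day = start
    while day <= end:
        counts[WEEKDAYS[day.weekday()]] += 1
        day += timedelta(days=interval_days)
    return counts


if __name__ == "__main__":
    # Assumed schedule: first sample on January 1, 2007 (illustrative only).
    counts = samples_per_weekday(date(2007, 1, 1), date(2007, 12, 31))
    print("total scheduled samples:", sum(counts.values()))
    for weekday in WEEKDAYS:
        print(f"{weekday:9s} {counts[weekday]}")
```

       Under these assumptions the year contains 61 scheduled samples, so each weekday
       receives only eight or nine, which leaves little data for comparing days of the week.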


-------
                       Sampling Design
      •   To develop a sampling design or monitoring plan, the following
         should be considered:
          - Monitoring objectives including consideration of geophysical setting,
            meteorology, types and characteristics of sources, and existing
            monitoring programs.
          - Data quality objectives needed to answer questions to be asked of the
            data (i.e., how precisely or accurately do the questions need to be
            answered?).
          - Options for what, when, where, how frequently, and for how long to
            monitor; these are related to the selection of appropriate monitoring
            equipment and laboratory analyses.
          - Data quality assurance and validation approach including collocated data
            requirements, QA programs for analytical laboratories, and  data
            validation guidelines for ambient data.
          - Options for data analysis and exploration including  available tools, data
            analyses, data needs, and training  needs.
      •   Sampling design for the national air toxics monitoring program is
         thoroughly discussed by Battelle (Phase I report).


-------
                          Resources
                       Monitoring Networks
       NATTS: http://www.epa.gov/ttn/amtic/natts.html
       UATMP: http://www.epa.gov/ttn/amtic/uatm.html
       PAMS: http://www.epa.gov/ttn/amtic/pamsmain.html
       CSN: http://www.epa.gov/ttn/amtic/speciepg.html
       IMPROVE: A source of speciated PM2.5 data
       http://vista.cira.colostate.edu/views/
       Local scale monitoring programs:
       http://www.epa.gov/ttn/amtic/local.html

-------
                                Appendix
                             Residence Times
        Approximate atmospheric residence
        times for some air toxics are listed
        here.

        These values were found at
             ';/. To find the atmospheric
        persistence of other air toxics, enter
        the pollutant's name in the chemical
        profile.  Once the pollutant page is
        available, select "links" and the
        entry for "CalEPA Air Resources
        Board Toxic Air Contaminant
        Summary". A summary of physical
        properties is provided including
        atmospheric persistence.
        Species                                    Lifetime by reaction with OH
        Carbon Tetrachloride                       decades
        Chloroform                                 months
        Tetrachloroethylene                        months
        Methylene Chloride                         months
        Benzene                                    84 hrs
        1,2-Dichloropropane                        weeks*
        Trichloroethylene                          84 hrs
        Acrylonitrile                              2.4 days
        Ethylbenzene                               2 days
        Vinyl Chloride                             27 hrs
        Formaldehyde                               26 hrs
        Acrolein                                   17 hrs
        Naphthalene                                16 hrs
        Acetaldehyde                               12 hrs
        1,3-Butadiene                              2.8 hrs
        Arsenic and other toxic metal compounds    N/A**

        * Wet deposition is also a sink.
        ** Lifetime is dependent on particle deposition and is typically days to
           weeks. Deposition time is primarily determined by the size of the
           particles.

-------
                                   References
(1 of 2)
         Hitchins J, Morawska L, Wolff L, Gilbert D. (2000) Concentration of submicrometer particles from vehicle
             emissions near a major road. Atmos Environ 34:51-59.
         Jaramillo VL, Kavouras I (2005). Monitoring, Source Identification, and Health Impacts of Air Toxics in
             Albuquerque, NM. available on the internet at http://www.epa.gov/ttn/amtic/toxfy05.html
         Kinney PL, Aggarwal M, Northridge ME, Janssen NA, Shepard P. (2000) Airborne concentrations of PM2.5 and
             diesel exhaust particles on Harlem sidewalks: a community-based pilot study. Environ Health Perspect
             108:213-218.
         Seinfeld J.H. and Pandis S.N. (1998) Atmospheric chemistry and physics: from air pollution to global change, J.
             Wiley and Sons, Inc., New York, New York.
         U.S. Environmental Protection Agency. EPA's Air Toxics Risk Assessment (ATRA) Reference Library describes
             the basics of exposure assessment, toxicity evaluation,  and risk characterization (chronic and acute) for toxic
             pollutants released to the air from stationary, mobile, and other types of sources. The library covers both
             human and ecological assessment for individual  sources of pollution as well as the combined impact of
             multiple sources. This guidance is amenable to a variety of purposes,  including assessments conducted
             under the air toxics provisions of the Clean Air Act, analysis of combined multisource risks at the community
             level, and as a supplement to other Agency guidance (e.g., as an aid to Superfund risk assessors evaluating
             the air exposure pathway). http://www.epa.gov/ttn/fera/risk_atra_main.html
         U.S. Environmental Protection Agency (2001) Pilot City Air Toxics Measurements Summary. Available on the
             Internet at http://www.epa.gov/ttn/amtic/natts.html.
         U.S. Environmental Protection Agency (1999) 1999 TO Compendium of Methods Second Edition. Available on the
             Internet at http://www.epa.gov/ttn/amtic/airtox.html.
         U.S. Environmental Protection Agency (1999) IO Compendium of Methods for the Determination of Inorganic
             Compounds in Ambient Air, EPA/625/R-96/01a available online at
         U.S. Environmental Protection Agency (2006a) A Preliminary Risk-Based Screening Approach for Air Toxics
             Monitoring Data Sets. Available on the Internet at http://www.epa.gov/region4/air/airtoxic/Screening-041106-
             KM.pdf



-------
                                   References
(2 of 2)
         U.S. Environmental Protection Agency (2006b). NATA Glossary. Available on the Internet at
             http://www.epa.gov/ttn/atw/nata/gloss.html
         U.S. Environmental Protection Agency (2006c). Local-Scale Monitoring Projects. Available on the Internet at
             http://www.epa.gov/ttn/amtic/local.html.
         U.S. Environmental Protection Agency (2006e). PAMS - General Information. Available on the Internet at
             http://www.epa.gov/oar/oaqps/pams/general.html#contacts.
         U.S. Environmental Protection Agency (2006f). 2005 Urban Air Toxics Monitoring Program (UATMP) Available
             on the Internet at http://www.epa.gov/ttn/amtic/uatm.html
         U.S. Environmental Protection Agency (2007a) Air pollution and health risk. Available on the Internet at
             http://www.epa.gov/ttn/atw/3_90_022.html
         U.S. Environmental Protection Agency (2007b) Risk Assessment for Toxic Air Pollutants: A Citizen's Guide.
             Available on the Internet at http://www.epa.gov/ttn/atw/3_90_024.html
         U.S. Environmental Protection Agency (2007c). About air toxics. Available on the Internet at
             http://www.epa.gov/ttn/atw/allabout.html
         U.S. Environmental Protection Agency (2007d) Evaluating Exposures to Toxic Air Pollutants: A Citizen's Guide.
             Available on the Internet at http://www.epa.gov/ttn/atw/3_90_023.html
         U.S. Environmental Protection Agency (2007e) About air toxics, health and ecological effects. Available on the
             Internet at http://www.epa.gov/air/toxicair/newtoxics.html.
         U.S. Environmental Protection Agency (2007f) PM Research. Available on the internet at
             http://www.epa.gov/pmresearch/pm  grant/06 monitoring programs.html.
         U.S. Environmental Protection Agency (2007g) Fact Sheet.  Available on the Internet at
             http://www.afcee.brooks.af.mil/pro-act/fact/caa.asp#2.
         Zhu Y, Hinds WC, Kim S, Shen S, Sioutas C. 2002b. Study on ultrafine particles and other vehicular pollutants
             near a busy highway. Atmos Environ 36:4323-4335

-------
        Preparing Data for Analysis
          How do I get my data ready for analysis?
            How do I treat data below detection?
June 2009
                  Section 4 - Preparing Data for Analysis

-------
                 Overview

This section provides suggestions on acquiring and
preparing data sets for analysis, which is the basis for
subsequent sections of the workbook.
Data preparation is sometimes more difficult and time-
consuming than the data analyses.
It is vital to carefully construct a data set so that data
quality and integrity are assured.
In the process of constructing and validating data, the
analyst gains important insight into the data that may
help direct and facilitate the analyses.

-------
       Data  Quality Objectives

Preparation of data for subsequent analyses is tied to the data
quality objectives (DQOs) to be achieved. DQOs are
measurement performance or acceptance criteria established as
part of the study design. DQOs relate the quality of data needed
to the established limits on the chance of making a decision error
or of incorrectly answering a study question.
In setting DQOs, consider
 - who will use the data;
 - what the project's goals/objectives/questions or issues are;
 - what decision(s) will be made from the information obtained;
 - what type, quantity, and quality of data are specified;
 - how "good" the data have to be to support the decision to be made.
EPA provides guidance on setting DQOs: G-4 Guidance on
Systematic Planning Using the Data Quality Objective Process.

-------
           Preparing  Data for Analysis

              What's Covered in This Section?

       Data availability
        - What data are available?
        - Sources for ambient air toxics data
        - Accessing data systems and acquiring data
           • AQS
           • IMPROVE
           • SEARCH
           • Other archives
        - Supplementing air toxics data
        - Know your data
       Data processing
        - Investigating collocated data
        - Preparing daily, seasonal, and annual averages
        - Determining data completeness
        - Treating data below detection
       Data validation
        - Procedures and tools
        - Handling suspect data

-------
                 What  Data  Are  Available?
                                 Air  Toxics  Overview
         Air toxics ambient monitoring data are
         typically collected at three major
         sampling durations (1-hr, 3-hr, and 24-hr).
         Sampling frequencies vary and include
         subdaily, daily, 1-in-3-day, 1-in-6-day, and
         1-in-12-day schedules.
         Some sites have operated as long-term
         (multiple year) sites while others may
         report data for a short study only (e.g., a
         week or two).
         Data can be reported in a range of
         units. For analyses, consistency in
         units is essential.
         For data to be useful, a minimum of
         monitor locations, concentration units,
         method codes, and parameter names is
         required. Sampling frequency
         information is also desirable.
         Keep in mind: Air toxics measurements
         are primarily captured in urban areas as
         shown in the figures. VOC*
         measurements, for example, are
         typically made in higher population and
         higher population density areas relative
         to all counties in the United States.
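
         Because data can be reported in a range of units, a common preparation step is
         converting gas-phase mixing ratios to mass concentrations. The sketch below uses
         the ideal gas law. The reference conditions of 25 C and 1 atm, the function name,
         and the benzene example are assumptions for illustration; an analyst should use
         the conditions specified for the data set at hand.

```python
R_L_ATM_PER_MOL_K = 0.082057  # ideal gas constant, L*atm/(mol*K)


def ppbv_to_ug_m3(ppbv, mol_weight_g_mol, temp_k=298.15, pressure_atm=1.0):
    """Convert a gas mixing ratio (ppbv) to a mass concentration (ug/m3).

    Uses the ideal gas law; the defaults assume 25 C and 1 atm.
    """
    molar_volume_l = R_L_ATM_PER_MOL_K * temp_k / pressure_atm  # L/mol
    return ppbv * mol_weight_g_mol / molar_volume_l


if __name__ == "__main__":
    benzene_mw = 78.11  # g/mol
    print(f"1 ppbv benzene = {ppbv_to_ug_m3(1.0, benzene_mw):.2f} ug/m3")
```

         Applying one conversion function to every record, rather than converting by hand,
         helps keep units consistent across sites and years.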
         [Figure: distributions of county population (x-axis: population, 100 to
         10,000,000, log scale) for all US counties, counties with metals
         measurements, and counties with VOC measurements, with median county
         population marked (labeled value: 305,000). The subsets of counties with
         metals or VOC measurements have median populations that are at the upper
         end of the distribution compared to all US counties. Plot prepared in
         SYSTAT using 2000 census and locations of air toxics monitors in
         2003-2005.]
         * VOC: Volatile Organic Compound
-------
        What  Data  Are  Available?

          Sources for Ambient Air Toxics Data

Air toxics data are mostly obtained from federal, state, local,
and tribal monitoring agencies. Sources include the following:
 •  EPA's Air Quality System (AQS)
 •  IMPROVE¹ speciated PM2.5 data, which can be downloaded from the VIEWS²
    web site, http://vista.cira.colostate.edu/views/
 •  SEARCH³ speciated PM2.5 data, which can be downloaded from the
    Atmospheric Research Analysis web site,
    http://www.atmospheric-research.com/public/index.html
 •  Air Quality Archive (AQA) (1990-2005), developed during the Phase V
    national air toxics analysis project; includes legacy air toxics archive
    data (data posted at http://www.epa.gov/ttn/amtic/toxdat.html)
 •  Local, state, and tribal air quality agency databases (some data
    are not yet submitted to AQS)
 ¹ IMPROVE = Interagency Monitoring of Protected Visual Environments
 ² VIEWS = Visibility Information Exchange Web System
 ³ SEARCH = Southeastern Aerosol Research and Characterization Study

-------
                           AQS  Data
                               Overview
• AQS is the EPA's principal data repository, containing the most complete
  set of toxics (and other) data available.
• To obtain the massive data set required for the national analysis, AQS
  was accessed via the Intranet with a user ID obtained from EPA.
   - AMP501 request provides raw data in R-2 format.
      •   Data are available from 1995 to the present in AQS.
      •   Annual air toxics data are required to be submitted to AQS within 180 days of end of
         Q4, i.e., 2007 data would be entered by July 2008.
      •   Archived AMP501 data prior to 1995 were requested directly from EPA.
   - Data from AQS are provided in a pipe-delimited format that needs to be
     transformed and processed.
      •   For the national  assessment, SQL server was used to process data.
      •   Publicly available VOCDat can be used to process data from one site at a time
         (http://vocdat.sonomatech.com/).
• Some  data, such as criteria pollutant summaries, are available for
  download without a user ID; most air toxics are not yet available this way.
• Find additional information about AQS at
  http://www.epa.gov/ttnmain1/airs/airsaqs/
• The AQS Discoverer site may be used to retrieve data:
  http://www.epa.gov/ttn/airs/airsaqs/aqsdiscover/
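
The transformation of pipe-delimited output mentioned above can be sketched with Python's standard csv module. This is a minimal illustration only: the column names and the two sample records are hypothetical and do not reproduce the actual R-2 layout, which should be taken from the AQS documentation.

```python
import csv
import io

# Hypothetical column names for illustration; the real R-2 layout has more
# fields and should be taken from the AQS documentation, not from this sketch.
COLUMNS = ["site_code", "parameter_code", "poc", "method_code",
           "sample_date", "value", "alternate_mdl"]


def read_pipe_delimited(text):
    """Parse pipe-delimited text into a list of dicts keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    records = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        records.append(dict(zip(COLUMNS, (field.strip() for field in row))))
    return records


# Made-up records: one site, two monitors (POC 1 and 2) on the same date.
sample = (
    "# made-up records for illustration only\n"
    "SITE-A|45201|1|150|20050106|1.25|0.05\n"
    "SITE-A|45201|2|150|20050106|1.31|0.05\n"
)
records = read_pipe_delimited(sample)
print(len(records), records[0]["poc"], records[1]["value"])
```

Keeping every field as text at this stage avoids silently altering codes with leading zeros; numeric conversion can follow once the fields have been identified.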


-------
                        AQS  Data

                             Codes

AQS uses a variety of codes to simplify and condense information in the
R-2 output file.
Key Codes
 - AQS site code; identifies a particular monitoring site.
 - AQS parameter code; identifies the pollutant measured.
 - AQS parameter occurrence code (POC); distinguishes among monitors for the
   same pollutant at the same site.
 - AQS method code; unique for each combination of sample collection and
   analysis.
Each code carries additional metadata that would be unnecessarily
repetitive if included in the R-2 file.
 - For example, default method detection limits (MDLs) are not provided in the
   R-2 file. This information must be looked up on the AQS website using
   the method query tool.  Alternate MDLs, on the other hand, are included in the
   R-2 file since they are unique to each record.
Descriptions of codes and additional metadata can be found on the AQS website.

-------
             Other Data Archives
                                           (1 of 2)
IMPROVE data - PM2.5 speciated and mass measurements in
156 Class I areas (national parks and wilderness areas). Speciated
PM2.5 metals are the only toxics measured in this network.  Further
described in Section 3, "Background".
SEARCH data - PM2.5 species and mass
measurements at 8 sites in the Southeast
from 1998 to the present. Speciated PM2.5
metals are the only toxics measured in this
network. At the time of the national analysis,
these data were not available in AQS.
  - SEARCH data are publicly available via the
    Internet and can be downloaded on a site-by-site
    basis in a Microsoft Excel output format.
  - Site photographs and other useful metadata are available at
    the SEARCH web site.
[Figure: SEARCH Site Locations]

-------
        Other  Data  Archives
(2 of 2)
As part of several projects, an air quality archive (AQA) was developed as
an analysis-ready database that includes data from AQS (1990-2005),
IMPROVE and SEARCH data, and data from the legacy air toxics archive.
This national level database contains nearly 1 billion raw data records, 27
million raw toxics records, and complete validated and temporally
aggregated data sets.
Key data summaries have been posted online:
    - 24-hour CSV Files (very large file)
    - Monthly CSV Files
    - Quarterly CSV Files
    - Annual Average CSV Files
    - SAS Files (all data, very large file)
Note: CSV files are comma separated files suitable for importing into spreadsheets or
databases. These files are too large to fit into Microsoft Excel spreadsheets but will fit
into Microsoft Access. The SAS files are for use with the SAS Statistical Software
package.

-------
            Supplementing  Air  Toxics Data

                           A Note on Data Acquisition
A complete set of data is always desirable to assist in analysis. Nontoxic species,
meteorological data, and site-specific conditions (e.g., proximity to emissions) provide
supporting information that will help in data interpretation. You may want to obtain the
following:
 • Additional data
     - Criteria pollutant species (AQS): multipollutant relationships, transport,  diurnal/seasonal
       evaluation, source identification
     - Meteorological data (AQS, NWS): transport, mixing, source direction, meteorological
       adjustment of trends
     - All PM2.5 speciation data (OC, EC, sulfate, nitrate, etc.): source identification
     - Aethalometer™ data (black carbon): diurnal characterization, source identification
     - All speciated  hydrocarbon data (e.g., full PAMS target list):  air parcel age (transport), source
       identification
     - Special studies data (e.g., continuous speciated PM data, ammonia): diurnal characteristics,
       source identification
 • Metadata
     - Monitoring objectives: time-frame of data, reasoning for site locations
     - Site characteristics (e.g., photos): may explain data anomalies, source identification
     - Monitoring scale (likely varies by pollutant): air parcel age (transport), source identification
 • Supplemental data
     - Emission inventory, especially point sources: source identification
     - Population density: relative concentration level
     - Vehicle traffic counts: diurnal patterns, source identification
 • Links to these data can be found in the resources section of this chapter.


-------
                 Supplementing  Air  Toxics  Data
                                         Using Metadata
  Although some metadata are available through
  AQS, metadata are not routinely populated.
  Site metadata can assist in analyses by illuminating
  sources (such as local sources or roadways) or
  physical attributes of the site.
  The satellite image shows the monitoring site (red
  circle) near an oil refinery that likely influences VOC
  concentrations at the site.
  A comparison of benzene annual averages at this
  site (red) to the state-wide annual average (blue)
  indicates that benzene concentrations at this site are
  substantially higher than the state-wide average.
  The satellite image was obtained from Google
  Earth, a publicly available program that contains
  satellite coverage of the entire planet and is very
  useful to investigate monitor siting.
   - The program is easy to use; site locations can be entered
     as latitude and longitude or as a street address or
     browsed to manually. Geographic data for multiple sites
     can also be imported from text files.
   - Once the site is located, it can be marked and named,
     high-resolution pictures can be exported, and the site
     information can be saved for future reference.
   - Use caution when interpreting maps—reported precisions
     of monitor locations vary and not all significant sources
     will be easy to identify visually.
  In this case, preliminary evidence shows the
  refinery may influence local benzene
  concentrations; however, this evidence is not
  conclusive. Other local sources, local meteorology
  (e.g., wind direction on high days), and data or
  monitoring issues must be further investigated.
  [Figure: Benzene annual average concentration by year, 2000-2007,
  comparing the Site Annual Average with the State Average.]

-------
               Supplementing  Air  Toxics  Data
                                    Using Metadata
    This sample map shows point source emissions of criteria pollutants and
    annual average daily traffic counts in the Detroit area near three
    monitoring sites. The Dearborn site is closest to major industry. Higher
    concentrations of VOCs and PM2.5 at the Dearborn site could be explained
    by these sources.
    Emissions sources for more detailed species (i.e., not all VOCs lumped
    together) are publicly available at the county level from the latest
    version of the NEI.
    This figure was created with ESRI's ArcMap program and NEI 2002 point
    source emissions data.
    [Figure: Map of the Detroit area, including Macomb County, showing point
    source emissions of PM2.5, NH3, NOx, VOC, and SO2 (tonnes/year) and
    annual average daily traffic (0 to 220,000).]

-------
         Converting Units
(1 of 2)
Frequently used units for gaseous air toxics include
µg/m3, parts per billion (ppb), and parts per billion
carbon (ppbC).

The preferred units for risk assessment are µg/m3. The
data are not always delivered or reported in these units.

Useful equations for converting data units:
 [conc. in µg/m3] = ([conc. in ppb] * MW * 298 * P)/(24.45 * T * 760)
 [conc. in ppb] = ([conc. in µg/m3] * 24.45 * T * 760)/(MW * 298 * P)
 ppbC = ppb * (# of carbons in the molecule)

 where:
 MW = molecular weight of compound [g/mol]
 P = absolute pressure of air [mm Hg]; 1 atm = 760 mm Hg
 T = temperature of air [K]; 298 K is standard
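
The conversion equations above can be written as small functions. The following is a minimal sketch; the function and constant names are illustrative choices rather than part of any EPA tool. The two printed values correspond to the benzene and carbon tetrachloride examples in this section.

```python
MOLAR_VOLUME_L = 24.45      # L/mol at 298 K and 760 mm Hg, as in the equations above
STD_TEMP_K = 298.0          # standard temperature [K]
STD_PRESSURE_MMHG = 760.0   # standard pressure [mm Hg]


def ppb_to_ugm3(ppb, mw, temp_k=STD_TEMP_K, pressure_mmhg=STD_PRESSURE_MMHG):
    """Convert ppb to ug/m3 for a compound of molecular weight mw [g/mol]."""
    return (ppb * mw * STD_TEMP_K * pressure_mmhg) / (
        MOLAR_VOLUME_L * temp_k * STD_PRESSURE_MMHG)


def ugm3_to_ppb(ugm3, mw, temp_k=STD_TEMP_K, pressure_mmhg=STD_PRESSURE_MMHG):
    """Convert ug/m3 to ppb for a compound of molecular weight mw [g/mol]."""
    return (ugm3 * MOLAR_VOLUME_L * temp_k * STD_PRESSURE_MMHG) / (
        mw * STD_TEMP_K * pressure_mmhg)


def ppb_to_ppbc(ppb, n_carbons):
    """Convert ppb to ppbC given the number of carbon atoms in the molecule."""
    return ppb * n_carbons


benzene = ppb_to_ugm3(1.0, 78.11)              # 1 ppb benzene at 298 K, 760 mm Hg
ccl4 = ppb_to_ugm3(1.0, 153.82, temp_k=273.0)  # 1 ppb CCl4 at 0 C, 1 atm
print(round(benzene, 3))  # 3.195
print(round(ccl4, 3))     # 6.867
```

The two conversion functions are exact inverses, so converting a value to µg/m3 and back at the same temperature and pressure returns the original ppb value to within floating-point rounding.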

-------
             Converting  Units

Examples

 Benzene (C6H6) - convert 1 ppb to µg/m3 at standard T and P
 [conc. in µg/m3] = ([1 ppb] * 78.11)/(24.45) = 3.195 µg/m3
        where T = 298 K (25 C) and P = 760 mm Hg

 Carbon tetrachloride (CCl4) - convert 1 ppb to µg/m3 at 0 C, 1 atm
 [conc. in µg/m3] = ([1 ppb] * 153.82 * 298)/(24.45 * 273) = 6.867 µg/m3
        where P = 760 mm Hg

 The EPA provides a thorough walk-through of the unit conversion process:

-------
                  Know Your  Data

                            Overview

•   Before beginning data validation, it helps to know the typical patterns in
   an air toxics data set. Having this knowledge helps the analyst set
   expectations for data patterns and identify data anomalies.  Diurnal and
   seasonal patterns help analysts understand possible impacts on data
   aggregations when some data are missing.
•   Typical air toxics relationships, drawn from the central tendencies of a
   large national data set, are provided. Patterns at individual sites
   may differ from the typical examples shown; understanding why there
   are differences becomes part of the data validation and data analysis
   steps.
•   EPA has developed tabulated dose-response assessments for use in risk
   assessment of hazardous air pollutants. The information can be found in
   two tables on the EPA website.
   One table presents values for long-term (chronic) inhalation and oral
   exposures and the other presents short-term (acute) inhalation
   exposures. Note that these tables are updated periodically to reflect the
   most recent information; revisions can significantly affect risk
   screening assessments.


-------
                            Know Your  Data
         Typical Air Toxics Relationships: Seasonal  Trends
      Pollutants that typically correlate well
       - Acetaldehyde and formaldehyde, similar
         sources and reactivity
       - Benzene and 1,3-butadiene, especially at
         locations influenced by mobile source emissions
       - Toluene, benzene, and ethylbenzene
          • Toluene concentrations are typically
            higher than benzene concentrations
          • Toluene and ethylbenzene typically
            correlate well
      National seasonal patterns
       - Warm season peak
          • Formaldehyde
          • Acetaldehyde
          • Chloroform
          • Manganese PM2.5
       - Cool season peak
          • Benzene
          • 1,3-butadiene
          • Hexane
          • Chlorine PM2.5 (especially at locations where
            roads are salted in winter)
       - Invariant
          • Carbon tetrachloride
          Example Seasonal Patterns

-------
                          Know Your  Data
          Typical Air Toxics Relationships: Diurnal  Trends
Midday peak, photochemical production
 - Acetaldehyde
 - Formaldehyde
Morning peak, mobile sources
 - Benzene
 - 1,3-butadiene
 - Xylenes
 - Hexane
 - Ethylbenzene
 - Toluene
 - 2,2,4-trimethylpentane
Nighttime peak, affected by dilution
 - Methylene chloride
 - Mercury vapor
Invariant
 - Global background, carbon tetrachloride
[Figure: Example Diurnal Patterns. Benzene, methylene chloride, carbon
tetrachloride, and formaldehyde by hour of day (0-23), with the midday
(photochemical) peak, rush hour peak, and nighttime peak marked.]
The plot shows example diurnal patterns of benzene, methylene chloride,
carbon tetrachloride, and formaldehyde at a national level. It was created
with Microsoft Excel.

-------
                      Collocated  Data

                                Overview

       Differences between replicate, duplicate, and collocated
       measurements
        - A replicate sample is a single sample that is chemically analyzed
          multiple times.
        - A duplicate sample is a single sample that is chemically analyzed twice.
         These samples provide a measure of the precision of the chemical
         analysis, but do not provide any error estimates for the sample
         collection method.
          In contrast, collocated samples are two samples collected at the same
          location and time by equivalent samplers and chemically analyzed by
          the same method.
         These samples provide a measure of the precision of both sample
         collection and chemical analysis.
       EPA's National Air Toxics Trend Sites (NATTS) program proposed
       the following collocated data standards:
        - Less than 25% bias between collocated samples
        - Less than 15% coefficient of variation for each pollutant

-------
                          Collocated  Data
                         Handling Collocated Data
                                            
-------
                     Collocated  Data

                  Aggregating Collocated Data

Following are suggested treatments for collocated data:
•  Double counting collocated data should be avoided when creating aggregates such
  as annual averages.  At a site level,
   -  If scatter plots of the collocated measurements correlate well, the values can be averaged
      together for a given site, method, date, and time.
   -  If the collocated measurements do not agree, there can be no certainty which (if any)
      measurement is correct and the data should be excluded from analyses.
      If disagreement is a regular occurrence, confidence in other data collected with the same instruments
      at that site is reduced.
•  After determining that collocated measurements agree, average the two data sets
  together following these guidelines.
   -  If one measurement is missing, use the collocated value as the average value.
      Investigate the value to make sure it is consistent with the rest of the data.
   -  If both values are below detection, treat them as any other data (i.e., average them
      together).
   -  If one measurement is below detection and one is not, use the value above detection as a
      conservative approach.
•  In some monitoring programs, only data from the primary sample are  used in data
  analysis and the collocated sample is used only for quality assurance purposes.
•  At a national level, it was not possible to QC all collocated data.  All valid collocated
  data were averaged together. If a collocated  value was missing, the  secondary
  value was used in its place, and all data were substituted with MDL/2  if they were
  below detection.
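
The pairwise rules above (applied once the collocated measurements have been judged to agree) can be sketched as a single function. This is an illustrative sketch: the function name is hypothetical, None is used to mark a missing value, and treating "below detection" as strictly less than the MDL is an assumption.

```python
def combine_collocated(primary, collocated, mdl):
    """Return one value for a primary/collocated pair, or None if both are missing.

    Values and mdl share the same units; None marks a missing measurement.
    Assumption: "below detection" means strictly less than mdl.
    """
    if primary is None and collocated is None:
        return None
    if primary is None:
        return collocated          # one value missing: use the other
    if collocated is None:
        return primary
    primary_below = primary < mdl
    collocated_below = collocated < mdl
    if primary_below != collocated_below:
        # One below detection, one not: keep the value above detection.
        return max(primary, collocated)
    # Both above or both below detection: average them.
    return (primary + collocated) / 2.0


print(combine_collocated(1.0, 2.0, 0.5))   # both above detection -> 1.5
print(combine_collocated(0.2, 0.8, 0.5))   # one below detection  -> 0.8
print(combine_collocated(None, 0.7, 0.5))  # primary missing      -> 0.7
```

A value returned from a pair with one missing measurement should still be checked against the rest of the data, as noted above.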

-------
                    Data  Completeness

                                  Overview

     • When performing an analysis, it is important to ensure that data are
       comparable across sites, years, or other subsets of the data; and it is essential
       to understand the time periods represented in the data (e.g., if the data set is
       missing winter months and concentrations are typically high during winter, an
       annual average might be biased low). Depending on the types of analyses, it
       may be necessary to implement data completeness criteria.
     • Completeness criteria are necessary in creating valid aggregated values (such
       as annual averages) to verify that the distribution of measured values within
       the aggregation window is representative of that entire period. Diurnal,
       day-of-week, and seasonal patterns need to be considered.
     • Data completeness is computed using the reported sampling frequency (when
       available) as a measure of how many samples should be collected in a given
       period versus the number of samples that were collected.  When aggregating
       data, 75% completeness is our suggested minimum value for data. Using
       higher or lower completeness criteria may be appropriate for certain analyses
       depending on your DQOs.
     • Data may not be representative of the period of interest if they are
       missing from a site because of an unforeseen event (e.g., a hurricane),
       sampling contamination, or other problems, or if the site always
       operates on an incomplete schedule (e.g., ozone monitoring in summer
       months only).
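
The completeness calculation described above compares the number of samples collected with the number expected from the reported sampling frequency. A minimal sketch follows; the helper names are hypothetical, and the sampling schedule is assumed to start on the first day of the period.

```python
import math
from datetime import date


def expected_samples(start, end, interval_days):
    """Scheduled samples from start through end (inclusive), assuming one
    sample every interval_days days with the first sample on start."""
    n_days = (end - start).days + 1
    return math.ceil(n_days / interval_days)


def completeness(n_collected, start, end, interval_days):
    """Fraction of the expected samples that were actually collected."""
    return n_collected / expected_samples(start, end, interval_days)


def is_complete(n_collected, start, end, interval_days, minimum=0.75):
    """True if completeness meets the minimum (75% is the suggested value)."""
    return completeness(n_collected, start, end, interval_days) >= minimum


# A 1-in-6-day schedule over calendar year 2007 gives 61 expected samples.
year_start, year_end = date(2007, 1, 1), date(2007, 12, 31)
print(expected_samples(year_start, year_end, 6))
print(is_complete(46, year_start, year_end, 6))  # 46/61 is just above 75%
print(is_complete(45, year_start, year_end, 6))  # 45/61 is just below 75%
```

The minimum is a parameter because a higher or lower criterion may suit a particular analysis, depending on the DQOs.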


-------
           Data Completeness

           Interpreting Notched Box Plots

Notched box whisker plots are useful for showing the central trends
of the data (i.e., the median) while also showing variability (i.e., the
box and whiskers).
Definitions provided are for plots prepared using SYSTAT software;
other software may have different definitions.
[Figure: Annotated notched box plot. Labels, from the extremes inward:
  - Outliers: data > 3*IR, and data within 3*IR
  - Whisker: data within 1.5*IR
  - Box (interquartile range, IR): 25th percentile to 75th percentile
  - Median
  - Notch: 95% C.I. of the median]

-------
                       Data  Completeness
             Example Effect of Aggregating Incomplete Data
       This example illustrates why data completeness
       criteria should be met when creating data
       aggregates.

       The first graph shows the seasonal pattern of 24-hr
       benzene samples from an urban site. This
       seasonal pattern (lower concentrations in summer)
       is typical of national concentrations and is driven by
       dilution from higher mixing heights in summer.
       Summer concentrations may also be reduced in
       areas where Reid vapor pressure caps are
       implemented (gasoline volatility).
       [Figure: Benzene concentration by month (1-12), with summer and
       winter months marked.]
       The annual averages in the second figure were
       constructed using only summer (red) or winter
       (blue) data to illustrate aggregation results from an
       incomplete data set (this is NOT how aggregations
       should be constructed).  Incomplete data cause the
       summer "annual averages" to be biased low and
       the winter "annual averages" to be biased high; the
       black line shows the true average of all data. This
       example is an artificial case of incomplete annual
       data, but it demonstrates the importance of applying
       data completeness and the erroneous results which
       may be reached without it.
       [Figure: Benzene annual averages by year (1998-2006) computed from
       summer data only, winter data only, and all data.]
       Figures were created in SYSTAT.

-------
                   Data Aggregation

                 Creating Valid 24-hr Averages

       When day-of-week, seasonal, and annual patterns are examined,
       subdaily data may be aggregated to valid daily averages as a
       starting point for comparison.
       In the calculation process, it is important to check that 24-hr
       averages are representative of a significant portion of the day
       because diurnal fluctuations in pollutant concentration throughout
       the day may bias the average if incomplete data are used.
       It is suggested that a 75% daily completeness criterion be used to
       ensure that a large portion of the day is represented. The resulting
       cutoffs by sample duration are shown in the table below.
       Sample Duration    75% Daily Completeness Cutoff (# of samples)
       1-hr               18
       2-hr               9
       3-hr               6
       4-hr               5
       6-hr               3
       8-hr               3
       12-hr              2
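
The daily cutoffs in the table follow from taking 75% of the number of samples possible in a day and rounding up. The sketch below reproduces them and applies the cutoff when averaging; the function names are illustrative, and sample durations are assumed to divide 24 hours evenly.

```python
import math
from statistics import mean


def daily_cutoff(duration_hours, fraction=0.75):
    """Minimum number of samples needed for a valid 24-hr average."""
    samples_per_day = 24 // duration_hours  # assumes the duration divides 24
    return math.ceil(fraction * samples_per_day)


def daily_average(samples, duration_hours):
    """Mean of one day's samples, or None if too few were collected."""
    if len(samples) < daily_cutoff(duration_hours):
        return None
    return mean(samples)


cutoffs = {d: daily_cutoff(d) for d in (1, 2, 3, 4, 6, 8, 12)}
print(cutoffs)
print(daily_average([1.0, 2.0], 8))       # 2 of 3 samples: below the cutoff
print(daily_average([1.0, 2.0, 3.0], 8))  # 3 of 3 samples: valid average
```

Returning None for an incomplete day, rather than a partial average, keeps invalid days out of any later monthly or quarterly aggregation.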

-------
                       Data  Aggregation

                   Creating  Valid Monthly Averages

        Monthly averages are useful in assessing seasonal variability.
        It is suggested data meet the 75% completeness criteria as determined by sample
        frequency, assuming an average of 30 days in a month. Note that low sample
        frequency data may not adequately represent monthly values with any certainty.
        Therefore, at least four samples should be required in a month.
        Frequency        75% Monthly Completeness Cutoff
        Daily            23
        Every 3rd Day    8
        Every 6th Day    4
        Other            4
        Unassigned frequencies mean that no frequency was reported with the data and a
        frequency could not be easily determined. The completeness criterion then defaults
        to the minimum to preserve data, but such data should be identified for later QC if possible.
        In the national data set, 74% of air toxics data were not assigned frequencies. A few
        methods were tested to fully populate the frequencies, but were  not further pursued.
        Also in the national level analyses, monthly averages were only used to investigate
        seasonal patterns.  Quarterly averages were used instead to compute annual
        averages because more data were expected to meet completeness criteria.

-------
                      Data Aggregation

          Creating  Valid Quarterly and Annual Averages

       Annual averages are calculated by first computing valid quarterly averages.
       Quarterly Averages
        - Quarterly averages are calculated from valid 24-hr averages.
        - 75% of data at the expected daily sampling frequency is suggested for a valid
          calendar quarter average, i.e.,
          Frequency         75% Quarterly Completeness Cutoff
          Daily             68
          Every 3rd Day     24
          Every 6th Day     12
          Every 12th Day    6
          Unassigned        6
        - At least 58 days are suggested between the first and last sample in a quarter to
          ensure sampling represented the entire quarter.
        - Unassigned frequencies mean that no frequency was reported with the data and
          a frequency could not be easily determined. The completeness criterion then
          defaults to the minimum to preserve data, but such data should be identified
          for later QC if possible.
       Annual Averages - three out of four valid quarterly averages are required.
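
The quarterly and annual rules above can be combined in a short sketch. The cutoffs are copied from the quarterly table; the function names are hypothetical, and computing the annual average as the mean of the valid quarterly averages is an assumption about how the quarters are combined.

```python
from datetime import date, timedelta
from statistics import mean

# Cutoffs from the quarterly table; keys are sampling intervals in days,
# and None stands for an unassigned frequency.
QUARTERLY_CUTOFF = {1: 68, 3: 24, 6: 12, 12: 6, None: 6}
MIN_SPAN_DAYS = 58  # suggested minimum days between first and last sample


def quarterly_average(daily, interval_days=None):
    """daily: list of (date, valid 24-hr average) pairs from one calendar quarter.
    Returns the quarterly mean, or None if a completeness check fails."""
    if len(daily) < QUARTERLY_CUTOFF[interval_days]:
        return None
    dates = [d for d, _ in daily]
    if (max(dates) - min(dates)).days < MIN_SPAN_DAYS:
        return None
    return mean(v for _, v in daily)


def annual_average(quarterly):
    """quarterly: four quarterly averages, with None for invalid quarters.
    Assumption: the annual value is the mean of the valid quarters."""
    valid = [q for q in quarterly if q is not None]
    if len(valid) < 3:
        return None
    return mean(valid)


# Made-up 1-in-6-day samples for one quarter (12 samples spanning 66 days).
q1 = [(date(2005, 1, 1) + timedelta(days=6 * i), 2.0) for i in range(12)]
print(quarterly_average(q1, 6))
print(annual_average([1.0, 2.0, None, 3.0]))   # three valid quarters
print(annual_average([1.0, None, None, 3.0]))  # only two valid quarters
```

The cutoffs are stored as a lookup table rather than recomputed because the tabulated values are the stated criteria.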

-------
                 Method  Detection  Limits

                                    Overview

        The EPA Code of Federal Regulations (CFR) defines the MDL as "The minimum
        concentration of a substance that can be measured and reported with 99%
        confidence that the analyte concentration is greater than zero and is determined from
        analysis of a sample in a given matrix containing the analyte".
        The purpose of an MDL is to discriminate against false positives. Values reported
        below the MDL have much higher uncertainty but can provide insight into the lower
        concentration distribution (i.e., are most values closer to the MDL or to zero?).
        [Figure: MDL illustration. Distribution of results around a measured
        value of zero, with the MDL marked; x-axis: Concentration (ppb).]
        In the illustration, normally distributed results from a measured value
        of zero yield a 99% confidence value (3σ) at 3 ppb, which would be used
        as the MDL in this case. There is >99% confidence that values above
        3 ppb are not false positives.
                                                           Environmental Protection Agency, 1982
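
The ">99% confidence" statement in the illustration can be checked numerically. The sketch below assumes, as the illustration implies, that results for a true concentration of zero are normally distributed with a standard deviation of 1 ppb (so that 3σ falls at 3 ppb); the variable names are illustrative.

```python
from statistics import NormalDist

sigma_ppb = 1.0            # implied by the illustration: 3 sigma falls at 3 ppb
mdl_ppb = 3 * sigma_ppb    # MDL placed at 3 sigma
blank = NormalDist(mu=0.0, sigma=sigma_ppb)

# Probability that a sample with a true concentration of zero reads above the MDL.
false_positive = 1.0 - blank.cdf(mdl_ppb)
print(f"false-positive probability: {false_positive:.4f}")
print(f"confidence: {1.0 - false_positive:.2%}")
```

Under these assumptions the one-sided false-positive probability is a little over 0.1%, which is consistent with the stated confidence of more than 99%.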

-------
         Method  Detection  Limits
   MDLs Are Not Low Enough For Most Air Toxics Measurements

•   52% of all air toxics measurements reported in AQS from 1990-2005
   are at or below the MDL.
•   This percentage varies widely across pollutants; some are close to
   100% below MDL.
•   Data below MDL can be reported in two ways.
   -  Uncensored: The measured value is reported.
   -  Censored: The measured value is replaced with a proxy. Typical
      examples are MDL, MDL/2, MDL/10, or zero.
•   The NATTS program requires laboratories to report uncensored
   values; this approach is neither uniformly nor historically applied
   across networks and laboratories.
•   We suggest that data below detection not be removed from analyses.
   A measurement below detection does not necessarily indicate a
   value of zero because ambient concentrations can be lower than
   currently available MDLs. Data below detection are representative of
   the lower ambient concentration range, and removing them from
   analyses will bias results toward higher concentrations and may
   cause incorrect conclusions.
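
The bias caused by removing data below detection can be seen in a small numerical sketch. The six values and the MDL are made up for illustration, and the helper name is hypothetical; MDL/2 is used as the proxy only because it is one of the typical substitutions listed above.

```python
from statistics import mean


def substitute_censored(values, mdl, divisor=2.0):
    """Replace values at or below mdl with the proxy mdl / divisor."""
    return [v if v > mdl else mdl / divisor for v in values]


# Made-up uncensored measurements (ug/m3) with a hypothetical MDL of 0.5.
measured = [0.9, 0.1, 0.6, 0.2, 1.2, 0.3]
mdl = 0.5

mean_uncensored = mean(measured)                             # all values kept
mean_substituted = mean(substitute_censored(measured, mdl))  # MDL/2 proxy
mean_dropped = mean(v for v in measured if v > mdl)          # below-MDL removed

print(round(mean_uncensored, 3))
print(round(mean_substituted, 3))
print(round(mean_dropped, 3))
```

In this made-up case the MDL/2 substitution stays close to the uncensored mean, while dropping the values at or below the MDL raises the mean considerably.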


-------
   Identifying Censored  Data
(1 of 2)
Data are typically reported as concentration values with accompanying
MDLs.  In AQS, the MDL is either a default value associated with the
analytical method (MDL) or a value assigned by the reporting entity for
that specific record (alternate MDL).
NATTS program guidance suggests that laboratories report all values,
regardless of the MDL.  However, many air toxics data are reported as
censored values—i.e., they have been replaced with zero, MDL/2,  MDL,
or some other value.
Identifying censored values is a necessary first step in treating data
below detection. Reporting of censored data will most likely differ
between sites and may even be different by method, parameter, or time
period for a given site.
Identify and separate data at or below the detection limit along with the
associated MDL and date/time.  If alternate MDLs are available, make
sure to use these alternates over the default MDLs.

-------
     Identifying  Censored  Data
(2 of 2)
• Examine the data for obvious substitution.  Count the number of times each
  value at or below detection is reported for a given site, parameter, and method.
  Are the majority of data reported as the same value (e.g., zero or MDL/2)?
   -  If data are largely reported as two or more values, investigate the temporal variation of
      the data.  Are there large step changes where reporting methods or MDLs have
      changed?
   -  Do the duplicate values indicate a typical censoring method (e.g., MDL/2, MDL/10)?
   -  Alternate MDLs may be different for each sample run causing a distribution of values if
      MDL/x substitutions were used.  That values below MDL are not all the same does not
      mean they are not censored.
• Check for MDL/x substitution.
   -  Make a scatter plot of the value vs. MDL to see if the data fall on a straight line.
   -  If the data form a straight line, the slope of the regression line (or its
      reciprocal, depending on which variable is on the y-axis) will indicate the
      value by which the MDL has been divided.
       •  Is the value a reasonable number that would be used for MDL substitution
          (e.g., 1, 2, 5, or 10)?
       •  If the data have been formatted, processed, or converted, ratios may not be exactly the same
          due to rounding differences; the distribution should be close to a straight line and centered
          around a single integer if MDL/x substitutions have been made.
       •  If a bifurcated pattern is observed, the substitution method may have changed over time. Plot a
          time series of the ratios and look for step changes.
       •  The distribution of the ratios should be highly variable if the data are not censored.
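
The ratio check described above can be sketched as follows. The function names, candidate divisors, and 5% tolerance are illustrative assumptions. The first five value/MDL pairs are taken from the example table in this section; the last three are made up to represent uncensored data.

```python
from statistics import median


def mdl_ratios(values, mdls):
    """MDL / reported value for each record (records reported as zero are skipped)."""
    return [m / v for v, m in zip(values, mdls) if v > 0]


def likely_divisor(values, mdls, candidates=(1, 2, 5, 10), tolerance=0.05):
    """Return x if the MDL/value ratios all cluster around candidate x, else None."""
    ratios = mdl_ratios(values, mdls)
    if not ratios:
        return None
    center = median(ratios)
    for x in candidates:
        near_center = abs(center - x) <= tolerance * x
        if near_center and all(abs(r - x) <= tolerance * x for r in ratios):
            return x
    return None


# Pairs from the example table: reported concentration and MDL (ug/m3).
reported = [0.19161, 0.20438, 0.22141, 0.38748, 0.17032]
mdl = [0.38237, 0.40834, 0.44283, 0.77921, 0.34404]
print(likely_divisor(reported, mdl))

# Made-up uncensored values below a fixed MDL: the ratios vary widely.
print(likely_divisor([0.05, 0.30, 0.12], [0.4, 0.4, 0.4]))
```

Substitution with zero cannot be found through ratios, so counting repeated values, as described in the first check, remains necessary; a time series of the ratios is still needed to catch a substitution method that changed over time.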



-------
               Identifying  Censored  Data
                                    Example
        The data shown in the table
        are values for a given air
        toxic below detection in a
        selected year.
        The reported data, at first
        glance, appear to be "real"
        concentrations (e.g., the
        histogram shows a
        distribution of
        concentrations).
        However, the ratio of MDL
        to reported concentration
        equals 2 (with very small
        deviations likely due to unit
        conversions). The
        relationship is also visible  in
        a scatter plot as shown
        here.
        Therefore, in this example,
        the reported concentrations
        have been substituted with
        MDL/2.
[Figure: scatter plot of reported concentration against MDL (µg/m3) with a fitted regression line labeled y = 2.0x - 0.0, and a histogram of reported concentrations (CONC).]

Reported Concentration (µg/m3)    MDL (µg/m3)
0.19161                           0.38237
0.20438                           0.40834
0.22141                           0.44283
0.38748                           0.77921
0.40451                           0.81327
0.37896                           0.75792
0.17032                           0.34404
0.18309                           0.36193
0.27251                           0.54502
0.31935                           0.64295
0.31083                           0.62166
0.29380                           0.58760
0.32361                           0.65147
0.26825                           0.53225
0.27677                           0.55354
0.31509                           0.63018
0.25548                           0.51521
0.32786                           0.65573
0.27677                           0.55354
0.25548                           0.51521
0.25548                           0.51521
0.25548                           0.51521
0.29380                           0.58760
0.31083                           0.62166

-------
                    Method  Detection   Limits

                           Treating Data Below Detection  (1 of 2)


       •  Treatment of national-level data
            At a national level, the majority of data collected from 1990 to present have been reported
            below the MDL with censored values; uncensored values are not typically reported.  When
            analyzing national data, all measurements below detection were replaced with MDL/2 for two
            reasons: (1) identification of data sets with uncensored values (i.e., NOT zero, MDL/2, or
            MDL) is difficult and (2) data below detection need to be treated consistently across the entire
            time period and all sites.
       •  Treatment of site-level data
          - In a site-level analysis, in which the analyst knows how the data  have been reported, more
            sophisticated methods may be employed.
              • If uncensored values are reported below MDL, use the data "as is" with no substitution.
              • If uncensored values are not available, use MDL/2 substitution for data at or below MDL if trying to
                calculate an annual  mean value:
                 -  Substitution may lead to a bias on the order of 10-40% in the annual average when < 85% of the data are below MDL.
                 -  At >85% of data below MDL, uncertainties are large and one may only reliably state that the concentration is below MDL.
          - Alternatives to MDL/2 substitution are more statistically intensive; however, in some cases
            they may yield better  results.  Note at a high degree of censoring (>70% censored data), no
            technique will produce good estimates of summary statistics. EPA recommends some
            approaches other than MDL/2 substitution:
              • Regression on order statistics (ROS) and probability plotting (MR) methods. ROS and MR methods are
                superior when the shape of the population distribution is unknown or nonparametric.
              • Maximum likelihood estimation (MLE). MLE methods have been shown to have the smallest
                mean-squared error (i.e., highest accuracy) of available techniques when the data distribution is exactly normal
                or lognormal.


-------
          Method  Detection  Limits

                Treating Data Below Detection (2 of 2)


•  Treatment of site-level data
   - ROS produces more accurate results when >30% of the data are below detection.
   - MLE does not work well for data sets with <50 detected values.
   - Kaplan-Meier is effective for data sets when less than 70% of the data are
     censored and the distribution is nonparametric.
•  Mixed Data Sets
   - For data sets that have a mix of censored and uncensored data, compare two
     substitution methods: (1) substitute MDL/2 for censored values and leave
     uncensored values "as is" and (2) substitute MDL/2 for all data below detection.
   - Results that are comparable using both substitution methods increase
     confidence in the results, and substitution method 1 should be retained.  If the
     results do not agree, a more sophisticated method for estimating the data below
     MDL may be employed.
•  In all cases, data below detection should be flagged, and the percentage of
  data below MDL calculated for all aggregated values. A more detailed
  discussion of aggregated trends and data below detection (as used in the
  national data analysis) can be found in Section 6.

                           EPA's current guidance is summarized on Slide 42.
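
The mixed data set comparison described above can be sketched as follows. This is an illustration under stated assumptions: the function name `annual_means_mixed`, the tuple layout, and the sample values are hypothetical, and the workbook does not define a numeric threshold for "comparable," so the sketch only reports the relative difference and the percentage of data at or below MDL.

```python
def annual_means_mixed(samples):
    """samples: list of (value, mdl, is_substituted) tuples (hypothetical layout).

    Returns (mean_method1, mean_method2, percent_at_or_below_mdl).
    """
    method1, method2, below = [], [], 0
    for value, mdl, is_substituted in samples:
        at_or_below = value <= mdl or is_substituted
        below += at_or_below
        # Method 1: MDL/2 only for censored (substituted) values;
        # uncensored values below MDL are used "as is".
        method1.append(mdl / 2 if is_substituted else value)
        # Method 2: MDL/2 for every value at or below MDL.
        method2.append(mdl / 2 if at_or_below else value)
    n = len(samples)
    return sum(method1) / n, sum(method2) / n, 100.0 * below / n

# Hypothetical one-site data: (reported value, MDL, censored flag), ug/m3.
samples = [(1.2, 0.4, False), (0.9, 0.4, False), (0.25, 0.4, False),
           (0.0, 0.4, True), (0.4, 0.4, True), (0.1, 0.4, False)]
mean1, mean2, pct = annual_means_mixed(samples)
rel_diff = abs(mean1 - mean2) / mean2
print(f"method 1: {mean1:.3f}  method 2: {mean2:.3f}  "
      f"difference: {rel_diff:.1%}  at or below MDL: {pct:.0f}%")
```

Reporting the percentage at or below MDL alongside each aggregated value follows the flagging recommendation in the last bullet above.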


-------
          Data  Treatment Methods

The selection of a data treatment method for below MDL data depends on
the amount of data below MDL and the data quality objectives which are to
be met. Methods explored in previous air toxics work are discussed next.
    -  Ignore data below MDL.
        • Not recommended. Reduces the number of samples and biases summary
          statistics high.
    -  Replace data below MDL with zero.
        • Not recommended. May bias summary statistics low.
    -  Replace data below MDL with the actual MDL.
        • Not recommended. May bias summary statistics high.
    -  Replace data below MDL with % non-detects*MDL
        • Not recommended. Found to be similar to MDL/2 substitution.
    -  Replace data below MDL with MDL/2.
        • Recommended as a simple method for calculating mean values with relatively small
          bias.
    -  Replace data below MDL with more statistically intensive approaches (such
       as Kaplan-Meier, Maximum Likelihood Estimation, and Robust Regression  on
       Order Statistics [KM, MLE, and ROS])
        • Recommended for sophisticated analyses such as quantifying percentiles in the data
          rather than simply the mean.
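
The direction of bias for the simple treatments listed above can be seen with a small calculation. The data and the function name `mean_under_treatments` are hypothetical and serve only to illustrate the ordering of the resulting means.

```python
def mean_under_treatments(detected, n_below, mdl):
    """Mean of a data set with n_below nondetects under each simple treatment."""
    n = len(detected) + n_below
    total = sum(detected)
    return {
        "ignore": total / len(detected),              # nondetects dropped
        "zero": total / n,                            # nondetects replaced with 0
        "mdl_half": (total + n_below * mdl / 2) / n,  # nondetects replaced with MDL/2
        "mdl": (total + n_below * mdl) / n,           # nondetects replaced with MDL
    }

# Hypothetical data: four detected values and four nondetects, MDL = 0.5 ug/m3.
means = mean_under_treatments([0.8, 1.1, 0.6, 1.5], n_below=4, mdl=0.5)
for name, value in means.items():
    print(f"{name:9s} {value:.3f}")
```

Ignoring nondetects gives the highest mean and zero substitution the lowest, with MDL/2 in between, which is consistent with the recommendations above.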

-------
Maximum  Likelihood Estimation (MLE)

•   Maximum likelihood estimation (MLE) (also called Cohen's
   method) is a popular statistical method used for fitting a
   mathematical model to data.
•   This method relies on knowing (or assuming) the underlying
   statistical distribution (e.g., lognormal) from which the data are
   derived.
•   Uncensored data are used to calculate fitting parameters that
   represent the best fit to the distribution.
•   MLE is sensitive to outliers and does not perform well if the data
   do not follow the assumed distribution.
•   MLE requires at least 50 uncensored values to work well, so
   1-in-6-day sampling will usually not be sufficient for calculating
   annual statistics using this technique.

-------
            MLE  Calculations
               Using Statistical Software
The MLE model is a  parametric analysis because the
distribution is assumed -- usually assumed to be
lognormal for atmospheric data.
Each data value is assigned a range of possible
concentrations:
 - Censored data: Lower value = 0, Higher value = MDL
 - Uncensored data: Lower value = Higher value = Reported value
The statistical software procedure may require a
distribution for the input, or require you to log-transform
your data if a normal distribution is assumed.
Summary statistics will be produced that provide
estimates of mean, standard deviation, and some
percentiles for the data set of interest.
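
For readers without access to a statistical package, the interval idea above can be sketched directly. This is a minimal illustration, assuming a lognormal distribution: uncensored values contribute the density of ln(value), and censored values contribute the probability of falling between 0 and the MDL. The function names, the simple compass-search optimizer, and the sample data are assumptions made for the example; a statistical package would normally use a more robust optimizer, and six detected values are far fewer than the 50 recommended for MLE.

```python
import math

def neg_log_likelihood(mu, log_sigma, log_detects, log_mdls):
    """Negative log-likelihood (constants dropped) of a lognormal model."""
    sigma = math.exp(log_sigma)
    nll = 0.0
    for y in log_detects:
        # Uncensored value: normal density of ln(value).
        z = (y - mu) / sigma
        nll += log_sigma + 0.5 * z * z
    for c in log_mdls:
        # Censored value: probability of lying between 0 and the MDL.
        z = (c - mu) / sigma
        cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
        nll -= math.log(max(cdf, 1e-300))
    return nll

def fit_lognormal_mle(detects, censored_mdls, tol=1e-7):
    """Return (mu, sigma) of ln(concentration) using a simple compass search."""
    ys = [math.log(v) for v in detects]
    cs = [math.log(m) for m in censored_mdls]
    # Start from the detected-only estimates (needs two or more distinct detects).
    mu = sum(ys) / len(ys)
    var = sum((y - mu) ** 2 for y in ys) / len(ys)
    log_sigma = 0.5 * math.log(var)
    best = neg_log_likelihood(mu, log_sigma, ys, cs)
    step = 0.5
    while step > tol:
        moved = False
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = neg_log_likelihood(mu + d_mu, log_sigma + d_ls, ys, cs)
            if trial < best:
                mu, log_sigma, best, moved = mu + d_mu, log_sigma + d_ls, trial, True
        if not moved:
            step /= 2.0  # no improving direction: refine the search
    return mu, math.exp(log_sigma)

# Hypothetical data (ug/m3): six detected values, three nondetects at MDL = 0.5.
detects = [0.8, 1.1, 0.6, 1.5, 0.9, 2.0]
mu, sigma = fit_lognormal_mle(detects, [0.5, 0.5, 0.5])
est_mean = math.exp(mu + sigma ** 2 / 2)  # mean of the fitted lognormal
print(f"mu={mu:.3f} sigma={sigma:.3f} estimated mean={est_mean:.3f}")
```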

-------
Nonparametric Kaplan-Meier (KM)

•  Nonparametric methods rely only on ranks of
  data and make no assumptions about the
  statistical distribution of the data.
•  Nonparametric methods are insensitive to
  outliers.

-------
     KM  Using  Statistical  Software

•   Kaplan-Meier can be accessed under Survival Analysis in most
   statistical packages.
    - This analysis usually expects data to be right-censored (i.e., values
      greater than X, rather than less than X).
    - Data may need to be "flipped". Take your highest value and set it as
      the upper-bound. Subtract all values from it to get your input data set.
      Censored data are considered less than the MDL.
       • Original data set = 10, 7, 3, ..., 0.7, ...
       • Flipped data set = 0, 3, 7, ..., 9.3, ...
    - Input your flipped data set along with a second column indicating the
      censored data values.
•   The output will include a survival plot (cumulative distribution
   function) and estimated summary statistics for the flipped data set.
    - Re-flip the summary statistics for mean, median, and percentiles.
    - Measures of variance (standard deviation, confidence intervals) are
      independent of flipping and do not need to be changed from the output
      values.
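
The flip, estimate, and re-flip sequence above can be sketched without a statistical package. This is an illustrative product-limit calculation, not the output of any particular software: the function name `km_mean_left_censored` and the sample data are hypothetical. Censored entries are entered at their MDL, and the mean is integrated only up to the largest flipped value (a restricted mean), which is one common convention.

```python
def km_mean_left_censored(values, is_censored):
    """Kaplan-Meier mean for left-censored data via flipping.

    values: reported value, or the MDL for a censored (less-than) entry.
    is_censored: True where the entry is a less-than value.
    """
    flip = max(values)  # upper bound used to flip the data
    flipped = [(flip - v, c) for v, c in zip(values, is_censored)]
    event_times = sorted({t for t, c in flipped if not c})
    last_time = max(t for t, _ in flipped)
    surv, area, prev = 1.0, 0.0, 0.0
    for t in event_times:
        area += surv * (t - prev)  # area under the survival curve
        at_risk = sum(1 for u, _ in flipped if u >= t)
        events = sum(1 for u, c in flipped if u == t and not c)
        surv *= 1.0 - events / at_risk  # product-limit step
        prev = t
    area += surv * (last_time - prev)  # restricted to the largest flipped value
    return flip - area  # re-flip the mean

# Hypothetical data: 10, 7, 3, <2, 0.7
km_mean = km_mean_left_censored([10, 7, 3, 2, 0.7],
                                [False, False, False, True, False])
print(f"KM mean = {km_mean:.2f}")
```

With no censored entries the function reduces to the arithmetic mean, which is a convenient sanity check before applying it to censored data.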


-------
        Robust Regression on

        Order Statistics (ROS)

These techniques calculate summary statistics with a
regression equation on a probability plot.
ROS assumes a distribution only for censored data.
This technique is better for data sets with <30
observations and is therefore suited to typical air toxics
data sets.

-------
 ROS  using  Statistical  Software

Data are input as reported values and MDL-censored values.
MDL-censored values will need a column indicating they are
censored.
ROS statistics calculate the probability that observed data are
below each MDL value. If there is only one MDL value, this is just
the fraction of data below MDL.
 - Original data set = 10, 7, 3, ..., 0.7, ...
     • Probability > 2 = 0.375
     • Probability > 1.5 = 0.375
     • Probability > 0.3 = 0.583
 - Using these probabilities, probability plotting positions are calculated
   for all detected and censored observations using the detected data to
   determine a best-fit distribution.
 - Summary statistics are output from this dataset.
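
For the simplest case, a single MDL with every detected value above it, the procedure above reduces to a short calculation. The sketch below is a simplified illustration and not the full multiple-MDL algorithm implemented in statistical packages: the function name `ros_single_mdl`, the choice of Blom plotting positions, the lognormal assumption, and the data are all assumptions made for the example.

```python
import math
from statistics import NormalDist

def ros_single_mdl(detects, n_censored):
    """Simplified ROS for one MDL with all detected values above it.

    Returns (imputed_values, mean_of_combined_data).
    """
    n = len(detects) + n_censored
    detects = sorted(detects)
    # Blom plotting positions; the lowest n_censored ranks belong to nondetects.
    z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    z_cens, z_det = z[:n_censored], z[n_censored:]
    logs = [math.log(v) for v in detects]
    # Least-squares line of ln(detected value) on the normal quantile.
    z_bar = sum(z_det) / len(z_det)
    y_bar = sum(logs) / len(logs)
    slope = (sum((a - z_bar) * (b - y_bar) for a, b in zip(z_det, logs))
             / sum((a - z_bar) ** 2 for a in z_det))
    intercept = y_bar - slope * z_bar
    # Only the censored observations are estimated from the fitted line;
    # detected values are kept as reported.
    imputed = [math.exp(intercept + slope * q) for q in z_cens]
    combined = imputed + detects
    return imputed, sum(combined) / n

# Hypothetical data: four detected values and three nondetects below one MDL.
imputed, ros_mean = ros_single_mdl([0.7, 3, 7, 10], n_censored=3)
print([round(v, 3) for v in imputed], round(ros_mean, 3))
```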

-------
              Data  Treatment  Methods
                               Summary

     EPA's current recommendations for treating data below MDL are provided in
     the table below; EPA is developing more definitive guidance.
                      Small # of Samples        Large # of Samples        Very Large # of Samples
    Exploratory Use   MDL/2 (if only a few      MDL/2 (if < 15% of        Cohen (normal distribution)
                      samples are < MDL)        samples are < MDL)        Kaplan Meier (other than normal)
    Publication Use   Kaplan Meier              Kaplan Meier              Cohen (normal distribution)
                                                Cohen (if approx.         Kaplan Meier (other than normal)
                                                normal distribution)
    Regulatory Use    Kaplan Meier              Kaplan Meier              Kaplan Meier
                                                       Warren and Nussbaum, 2009
June 2009
                          Section 4 - Preparing Data for Analysis
                                                      42

-------
              Data Validation
                  Introduction (1 of 2)

Data validation is defined as the process of determining the
quality and validity of observations.
The purpose of data validation is to detect and verify any
data values that may not represent the actual physical and
chemical conditions at the sampling station before the data
are used in analysis.
Validation guidelines are built on knowledge of typical air
toxics emissions sources; formation, loss, and transport
processes; chemical relationships; and site-specific
knowledge.
The primary objective is to produce a database with values
that are of a known quality, an acceptable quality, or a level
of uncertainty given the analyses intended to be conducted.

-------
                    Data Validation
                         Introduction  (2 of 2)
The identification of outliers, errors, or biases is typically carried out in several
stages or validation levels (U.S. Environmental Protection Agency 1999).
 - Level 0: Routine verification that field and laboratory operations were conducted in
   accordance with standard operating procedures (SOPs) and that initial data processing and
   reporting were performed in accordance with the SOP (typically the monitoring entity
   performs this step).
 - Level I: Internal consistency tests to identify values in the data that appear atypical when
   compared to values in the entire data set.
 - Level II: Comparisons of current data with historical data (from the same site) to verify
   consistency over time.
 - Level III: Parallel consistency tests with other data sets with possibly similar characteristics
   (e.g., the same region, period of time, background values, air mass) to identify systematic
   bias.
The data analyst performs Level I steps, and performs additional validation when
other data sets are available.
Data validation  is improved by understanding air toxics emissions, formation,
transport, and removal processes. Useful supplementary information in
understanding air toxics species (including data sheets and other information about
air toxics species) is available (links and examples are provided in the appendix to
this section).
There is no substitute for the local knowledge of monitoring sites; operators or
those who have extensive knowledge of the area are a unique resource for data
analysts.  However, for those not familiar with a site, spatial maps with topography,
emissions source, and roadway information are excellent tools for understanding
site characteristics.

-------
                    Data Validation
                         Initial Approach
•  Look at your data—visual inspection is vital.
•  Manipulate your data—sort it, graph it, map it—so that it begins to tell a
  story. Often, important issues or errors in the data will become apparent
  only after someone begins to use the data for some purpose.
•  Several checks may be made during the beginning stages of data
  validation to single out odd data:
   - Range checks: check minimum and maximum concentrations for anomalous
     values.  National analysis may provide reasonable concentration ranges for
     comparison; these levels are provided in the appendix to this section.
   - Buddy site check: compare concentrations at one site to nearby sites to identify
     anomalous differences.
   - Sticking check: check data for consecutive equal data values, which indicate the
     possibility of censored data not appropriately flagged.
   - Comparison to remote background concentrations: urban air toxics
     concentrations should not be lower than remote background concentrations.
•  Examples of useful graphics  and summaries include scatter plots, time
  series plots, fingerprint plots  (i.e., sample composition), box whisker plots,
  and summary statistics.
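
The range and sticking checks above are simple to automate. The sketch below is illustrative only: the function names, the range limits, the minimum run length of three, and the sample values are assumptions, and flagged values still need manual review.

```python
def range_check(values, low, high):
    """Indices of values outside the expected [low, high] range."""
    return [i for i, v in enumerate(values) if v < low or v > high]

def sticking_check(values, min_run=3):
    """(start_index, run_length) for each run of consecutive equal values."""
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the data or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = i
    return runs

# Hypothetical 24-hr concentrations (ug/m3) in time order.
daily = [0.42, 0.51, 0.25, 0.25, 0.25, 0.25, 0.61, 9.80, 0.47]
out_of_range = range_check(daily, low=0.1, high=5.0)
stuck = sticking_check(daily)
print("out of range at:", out_of_range, " stuck runs:", stuck)
```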

-------
              Things to Consider When

                  Evaluating Your Data

•  Levels of other pollutants
  A high concentration of benzene may be valid when concentrations of all mobile
  source air toxics in  the sample are also elevated.
•  Time of day/year
  Higher concentrations of some air toxics are expected in the summer (such as
  formaldehyde) than in the winter and vice versa for benzene.
•  Observations at other sites
  High concentrations of a pollutant at several sites in an area on the same date may
  indicate a real emission event.
•  Audits and inter-laboratory comparisons
  If data are from differing sources, how well did the concentrations compare between
  labs? Did audits show some specific "problem" pollutants?
•  Site characteristics
  High concentrations may be expected for a pollutant emitted by a nearby source.
•  Unique events (e.g., holiday fireworks)
  High concentrations of trace metals associated with fireworks are seen around
  the Fourth of July and New Year's Day at many sites.


-------
                 Data  Validation
                   Tips and Tricks (1 of 2)
Overall
 - Proceed from the big picture to the details. For example, proceed from
   inspecting species groups to individual species.
 - Inspect every species, even to confirm that a species normally absent
   met that expectation.
 - Know the site topography, prevalent meteorology, and major emissions
   sources nearby.
Inspect time series for the following
 - Large "jumps" or "dips" in concentrations which may indicate a change
   in analysis method or MDL.
 - Periodicity of peaks.  (Is there a pattern? Can the pattern be related to
   emissions or meteorology?)
 - Expected seasonal behavior (e.g., photochemically formed species
   concentrations usually peak during summer).
 - Expected relationships among species (e.g., benzene and toluene
   typically correlate).

-------
                 Data  Validation

                   Tips and Tricks (2 of 2)

To further investigate outliers,
 - Use wind direction data (e.g., Do outliers occur from a consistent wind
   direction?).
 - Use subsets of data (e.g., inspect high concentration days vs. other
   days for differences in meteorology or emissions).
 - Investigate industrial or agricultural operating schedules, unusual
   events, etc. (e.g., Were high metals data associated with a dust
   event?).
 - Determine local traffic patterns (e.g., When does peak traffic occur?  Is
   there a recreational area or event venue nearby?).
 - If no explanation is forthcoming, try contacting  the agency that
   collected the data; they may have realized a problem too recently to
   report  it,  or your question may alert them to  a problem with data
   collection, analysis, or reporting.

-------
                            Data Validation
                          Using Summary Statistics
        Investigation of summary statistics is a great way to begin to understand your data.
        Comparison of your data ranges to "typical" ranges provides a reality check and can
        illuminate errors in your data.
        The table below shows national summary statistics based on 2003 to 2005 annual averages
        for selected species; a complete table can be found in the appendix to this section.
        These data can be used as benchmarks for site-specific comparison; for example, if your
        data are significantly higher than the national 95th percentile, there may be errors  in the
        data.
         - Note that calculation of summary statistics smoothes extreme events so comparison of daily
            data to these numbers, for example, may not be adequate; individual high concentration days
            may legitimately be higher than the summary statistics.
         - We suggest a comparison between similar summary statistics rather than a comparison of
            summary statistics to raw data.
                     Average    # of        5th Percentile  25th Percentile  Median         75th Percentile  95th Percentile  1-in-a-million  Remote Background
              AQS    % Below    Monitoring  Concentration   Concentration    Concentration  Concentration    Concentration    Cancer Risk     Concentration
Pollutant     Code   Detection  Sites       (µg/m3)         (µg/m3)          (µg/m3)        (µg/m3)          (µg/m3)          Level (µg/m3)   (µg/m3)
Toluene       45202  1          295         6.9E-01         1.5E+00          2.4E+00        3.8E+00          7.4E+00
N-Hexane      43231  2          168         2.4E-01         5.1E-01          8.4E-01        1.5E+00          2.7E+00
Benzene       45201  2          307         4.9E-01         7.4E-01          1.0E+00        1.5E+00          3.1E+00          1.3E-01         1.4E-01
Acetaldehyde  43503  4          163         7.8E-01         1.3E+00          1.6E+00        2.3E+00          4.2E+00          4.5E-01         1.6E-01
M_P Xylene    45109  5          266         2.8E-01         6.7E-01          1.1E+00        1.7E+00          3.4E+00
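
The benchmark comparison suggested on this slide can be sketched as a lookup against the national 95th percentile values. The function name `exceeds_national_p95` and the site annual means are hypothetical; the percentile values are the 2003 to 2005 national figures given in the workbook. Following the slide's advice, the comparison uses annual means rather than individual daily values.

```python
# National 95th percentile of annual averages (ug/m3), 2003 to 2005.
NATIONAL_P95 = {
    "Toluene": 7.4,
    "N-Hexane": 2.7,
    "Benzene": 3.1,
    "Acetaldehyde": 4.2,
    "M_P Xylene": 3.4,
}

def exceeds_national_p95(site_annual_means):
    """Pollutants whose site annual mean is above the national 95th percentile."""
    return sorted(
        pollutant
        for pollutant, mean in site_annual_means.items()
        if pollutant in NATIONAL_P95 and mean > NATIONAL_P95[pollutant]
    )

# Hypothetical annual means (ug/m3) for one site.
flagged = exceeds_national_p95({"Benzene": 1.2, "Toluene": 12.5})
print("Review for possible errors:", flagged)
```

A flagged pollutant is a prompt for review, not proof of an error; a nearby source can legitimately produce high annual means.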

-------
                                Data Validation
                                Buddy Check Example

Buddy site checks are important at a site
level.
The plot shows a time series of arsenic
PM2.5 measurements at neighboring sites
near a major emissions source.
Plotting the time series together
illuminates 4 high concentration
measurements which are not in agreement
at both sites (red circles),
as well as 3 high concentration events
which were recorded at both sites (black
circles).
The measurement agreement (black
circles) between sites offers increased
confidence that arsenic concentrations
were truly higher on these days (i.e., these
concentration values are not measurement
or reporting errors).
Points marked with red circles, on the
other hand, should be flagged as suspect
for further investigation.
  - Check that high concentration events do not
   correlate with unusual events.  In this case,
   the analyst might check whether these events
   coincide with typical firework days such as the
   Fourth of July and New Year's Eve; in this
   example these measurements do not.
  - The next step is to check correlation of wind
   direction and local emissions sources as an
   explanation for these measurements.
[Figure: "Arsenic PM2.5 Time Series"; horizontal axis: Time (Jan-04 through Sep-05).]
                                             Sample time series of 24-hr arsenic PM2.5 measurements
                                             at two sites about five miles apart.  Both sites show above
                                             average arsenic concentrations and are located near a
                                             major emissions source.  The figure was created in
                                             Microsoft Excel.
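
The buddy site comparison can also be run on paired data before plotting. The sketch below is a hypothetical illustration: the function name `buddy_check`, the threshold for a "high" concentration, and the sample values are assumptions. Dates that are high at both sites are treated as corroborated events; dates that are high at only one site are marked suspect for further investigation.

```python
def buddy_check(site_a, site_b, high):
    """Classify dates on which either site reports a high concentration.

    site_a, site_b: dicts mapping an ISO date string to a concentration.
    Returns (confirmed_dates, suspect_dates).
    """
    confirmed, suspect = [], []
    for date in sorted(set(site_a) & set(site_b)):
        a_high = site_a[date] >= high
        b_high = site_b[date] >= high
        if a_high and b_high:
            confirmed.append(date)  # both sites agree: likely a real event
        elif a_high or b_high:
            suspect.append(date)    # disagreement: flag for investigation
    return confirmed, suspect

# Hypothetical 24-hr arsenic PM2.5 values (ug/m3) at two neighboring sites.
site_a = {"2004-03-02": 0.004, "2004-03-08": 0.031, "2004-03-14": 0.045}
site_b = {"2004-03-02": 0.005, "2004-03-08": 0.029, "2004-03-14": 0.003}
confirmed, suspect = buddy_check(site_a, site_b, high=0.02)
print("confirmed:", confirmed, " suspect:", suspect)
```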

-------
            Screening  Data  Using Remote


              Background  Concentrations

       Knowledge of remote background concentrations of air toxics can be used as lower
       limits for data screening. A cutoff value of 20% lower than the background
       concentration is used as a margin of error.
       Data below this value may be identified as suspect.
       If data are identified as below the background concentration, the first things to
       check are
        -  Units (e.g., Were units reported and/or converted correctly?)
        -  Sticking from substituted values such as MDL/2, MDL/10, or 0.
       This screen was applied to the national data set.  It was decided that data failing
       this check would not be used in subsequent analyses.
Pollutant                   Remote Background         Cutoff Value (ug/m3)
                            Concentration (ug/m3)
Acetaldehyde                0.16                      0.13
Benzene                     0.14                      0.11
Carbon Tetrachloride        0.62                      0.50
Chloroform                  0.046                     0.037
Formaldehyde                0.18                      0.14
Methylene Chloride          0.087                     0.070
Tetrachloroethylene         0.022                     0.018
Trichlorofluoromethane      1.4                       1.1
Dichlorodifluoromethane     2.7                       2.2
Trichlorotrifluoroethane    0.61                      0.49
1,1,1-trichloroethane       0.18                      0.14
Methyl Chloride             1.2                       0.96
                                                          McCarthy et al., 2006
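
The cutoff values in this screen are 80% of the remote background concentration, so the check is easy to reproduce. The sketch below is illustrative: the function names and the sample measurements are hypothetical, and only three of the workbook's background values are included for brevity.

```python
# Remote background concentrations (ug/m3) for three of the listed pollutants.
REMOTE_BACKGROUND = {
    "Benzene": 0.14,
    "Carbon Tetrachloride": 0.62,
    "Methyl Chloride": 1.2,
}

def background_cutoff(background, margin=0.20):
    """Cutoff set 20% below the background concentration as a margin of error."""
    return background * (1.0 - margin)

def flag_below_background(pollutant, values):
    """Values that fall below the cutoff and should be treated as suspect."""
    cutoff = background_cutoff(REMOTE_BACKGROUND[pollutant])
    return [v for v in values if v < cutoff]

# Hypothetical carbon tetrachloride measurements (ug/m3).
suspect_values = flag_below_background("Carbon Tetrachloride",
                                       [0.65, 0.31, 0.58, 0.0])
print("suspect:", suspect_values)
```

For flagged values, the first things to check remain units and sticking from substituted values, as listed on this slide.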

-------
             Screening  Data Using  Remote
                Background  Concentrations
                                    Example
     This plot shows a time series
     plot of concentrations of long-
     lived species measured at an
     urban Southwestern site
     compared to background
     concentrations measured at
     remote sites in the Northern
     Hemisphere.
     Significant spikes and dips in
     concentrations are circled.
     Most of the time, concentrations
     at this monitor were equal to or
     greater than background
     concentrations, which might be
     expected for urban locations.
     Concentrations more than 20%
     below the background level
     were identified as suspect for
     further review.
[Figure: time series of concentrations (ppb) by Date, with background lines annotated: CH3Cl Background = 0.6 ppb, CCl2F2 Background = 0.55 ppb, CCl4 Background = 0.09 ppb.]
   Concentrations (ppb) of carbon tetrachloride (CCl4), dichlorodifluoromethane
   (CCl2F2), and methyl chloride (CH3Cl) from 2003 and 2004. Northern
   Hemisphere background concentrations of each species were plotted as a
   line. Concentration dips well below background concentrations are circled.

-------
                 Data Validation  Examples
                                      Scatter Plots
         Scatter plot matrices can be used to rapidly and
         qualitatively examine possible correlations among
         measured species at a site.
         To interpret a scatter plot matrix, locate the row
         variable (e.g., methyl ethyl ketone [MEK] in the
         figure near the top left) and the column variable
         (e.g., methyl tert-butyl ether [MTBE]) on the
         bottom. The intersection is the scatter plot of the
         row variable on the vertical axis against the
         column variable on the horizontal axis. Each
         column and row is scaled so that data points fill
         each frame; scale information is omitted for
         clarity.  The diagonal plots contain histograms of
         the data for each row variable.
         It is clear that some species correlate well.  For
         example, toluene has a reasonable correlation
         with ethylbenzene and m- and p-xylene. In
         contrast, MEK does not correlate with any of the
         other species; this may indicate that MEK is
         emitted from different sources. Finally, MTBE
         shows a bifurcated relationship with toluene,
         ethylbenzene, and m- and  p-xylene. This
         interesting relationship might be  investigated in
         later validation steps and analysis.
[Figure: scatter plot matrix with rows and columns labeled MEK, TOL, EBENZ, MPXYL, and MTBE.]
Scatter plot matrix of selected species from an urban site.
The species plotted (from top to bottom and left to right) are
methyl ethyl ketone (MEK), toluene (TOL), ethylbenzene
(EBENZ), m- and p-xylene (MPXYL), and methyl tert-butyl
ether (MTBE). The plot was created with SYSTAT11.

-------
                 Data  Validation  Examples
                                       Time Series
   The concentrations of selected VOCs
   (acetylene, toluene, benzene, and
   1,3-butadiene) are plotted as a function of
   time. Note that (1) no valid data were
   available on some dates in 2001 and in the
   middle of 2002, (2) all species exhibited
   seasonal variations in concentration with
   higher concentrations observed in the cool
   season, (3) concentrations of these species
   varied by an order of magnitude, and (4) for
   most days, these species concentrations
   correlated well (e.g., R2=0.91).
   This example  illustrates how time series plots
   may be used to check for expected temporal
   variability (based on emission sources,
   meteorology, and species reactivity), such as
   interannual or seasonal variability. The
   selected VOCs are present in gasoline
   exhaust and are expected to have lower
   concentrations during the summer due to
   higher mixing  heights (i.e., dilution) and
   faster removal rates by photochemical
   reactions. A species that does not follow its
   expected temporal variability may indicate
   misidentification or some other problem.
[Figure: time series plot of VOC concentrations; horizontal axis: Date.]
Twenty-four-hour average concentrations (ppb) of acetylene,
1,3-butadiene, benzene, and toluene collected at an urban site
every sixth day from July 2001 through July 2002. The figure was
created with Microsoft Excel.

-------
              Data  Validation Examples
                                 Box Plot
     To interpret these box plots,
     see Slide 22 of this chapter.
     This plot shows the
     concentration of benzene at a
     site from 1990-2005. It is
     immediately clear by the large
     concentration change from
     1990-1993 that something
     affected the data and should
     be investigated.
      - Were there significant method
        or MDL changes during this
        time?
      - Is this change due to
        emissions regulations or is
        there another explanation?
[Figure: box plot titled "Benzene"; horizontal axis: YEAR; vertical axis: concentration.]
     Notched box whisker plot of 24-hr average concentration of
     benzene by year at an urban monitoring site in the United
     States. Concentrations show a substantial change from
     1990 to 1993.  The plot was created with SYSTAT11.

-------
                  Data Validation   Examples
                                     Fingerprint Plot
    A fingerprint plot is a depiction of all the species
    concentrations present in a sample, preferably
    presented in a meaningful order (e.g., by elution
    order in the analytical technique, by carbon
    number, etc.).
    Fingerprint plots are used to examine
    irregularities in whole sample concentrations
    and unusual distributions of species. The
    analyst may inspect all samples, with special
    focus on those that were identified as suspect
    or invalid in time series or scatter plot analyses.
    The fingerprint plot here shows the
    concentrations from an urban site on March 10,
    2004, when the concentrations of the two
    trimethylbenzene isomers were very high, and
    other aromatic species like toluene, xylenes,
    and ethylbenzene were also elevated relative to
    other samples.
    A "typical" fingerprint plot from October 6, 2003,
    is shown in the inset for qualitative comparison.
    "Typical" means the relationships among
    pollutants were similar across most samples, i.e.,
    representative of an average.  The  March 10,
    2004, sample may be valid but was identified as
    suspect and requires further investigation.
     [Figure: fingerprint plot. Y-axis: Concentration (ppb). Labeled species: acetylene,
     propylene, MEK, benzene, toluene, ethylbenzene, m- and p-xylene,
     1,3,5-trimethylbenzene, 1,2,4-trimethylbenzene. Inset title: Typical fingerprint.
     Site label: MKZ- West 43rd.]
     Example fingerprint plot of 24-hr concentrations (ppb)
     from March 10, 2004. The inset figure shows a more
     typical fingerprint at the same site on October 6, 2003.
     Fingerprint plots were created with VOCDat software.

-------
                Data  Validation  Examples
                Using  Metadata - Urban vs. Rural Sites
     Knowledge of metadata allows the analyst to
     understand reasons for patterns observed in the
     data.
     This figure illustrates that the concentrations at
     each site do not need to be the same but do
     need to be consistent with our expectations of
     concentrations at urban and rural sites.
     Sites 1 and 2 show the highest concentrations
     because these sites are relatively close to an
     Interstate highway and are located in urban
     areas.
     In contrast, monitoring site 3 shows relatively
     low m-&p-xylenes concentrations, as expected
     for a site outside the urban area.
     Note: Concentrations at rural sites may be
     higher if a known emissions source is nearby or
     if in situ production  occurs. Metadata provide a
     basis for thinking about the data and making
     hypotheses, but expectations should never be
     substituted for real data validation.  Try to prove
     your hypotheses wrong in order to be sure that
     they are correct!
     [Figure: notched box whisker plot. X-axis: Site 1, Site 2, Site 3.]
    Notched box whisker plot of 24-hr m-&p-xylenes
    concentrations at three monitoring stations in 2005.
    Red indicates urban sites and blue represents a rural
    site. Figure was created with SYSTAT.

-------
                Data  Validation  Examples
                        Investigating Suspect Data
   Initial Analysis: Typically, toluene
   concentrations are higher than benzene
   concentrations. Observation of an unexpected
   relationship, like these data at an urban site,
   indicates that further investigation of the data is
   needed.
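
   The initial screening described above can be sketched in a few lines. This is
   only an illustration under stated assumptions: the record layout, the function
   name, and the sample values are hypothetical, and both species are assumed to
   be reported in the same units.

```python
def flag_benzene_exceeds_toluene(samples):
    """Return IDs of samples where benzene is higher than toluene.

    Toluene is typically the higher of the two, so these samples show an
    unexpected relationship and are candidates for further investigation.
    """
    return [s["id"] for s in samples if s["benzene"] > s["toluene"]]


# Hypothetical samples (same units for both species).
samples = [
    {"id": "A", "benzene": 1.2, "toluene": 3.4},
    {"id": "B", "benzene": 0.9, "toluene": 2.1},
    {"id": "C", "benzene": 6.5, "toluene": 2.8},
]

print(flag_benzene_exceeds_toluene(samples))
```

   A flagged sample is suspect, not invalid; the flag only marks where to look
   more closely.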
   [Figure: scatter plot of benzene vs. toluene concentrations. Y-axis: Benzene.]
   Advanced Analysis: Wind direction data were
   used to identify possible reasons for the high
   benzene concentrations in this plot of 1-hr
   benzene concentrations vs. wind direction. The
   highest benzene concentrations are typically
   coming from north of the site. Site and emission
   inventory inspection showed a source of coke
   oven emissions, which include benzene but not
   toluene, to the north providing a reasonable
   explanation for these data (and helping prove
   their validity).
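
   One way to look for a directional source is to average concentrations by
   wind sector. The sketch below assumes paired 1-hr records of wind direction
   (degrees, 0 = north, clockwise) and benzene concentration; the eight-sector
   binning, function names, and values are illustrative choices, not the
   workbook's method.

```python
from collections import defaultdict

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector_of(wind_dir_deg):
    """Map a wind direction in degrees to one of eight 45-degree sectors."""
    return SECTORS[int(((wind_dir_deg % 360) + 22.5) // 45) % 8]


def mean_by_sector(records):
    """Average concentration per wind sector for (direction, concentration) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for wind_dir_deg, conc in records:
        sector = sector_of(wind_dir_deg)
        sums[sector] += conc
        counts[sector] += 1
    return {sector: sums[sector] / counts[sector] for sector in sums}


# Hypothetical 1-hr records: (wind direction in degrees, benzene concentration).
records = [(350, 6.0), (10, 8.0), (90, 1.0), (180, 1.5), (270, 0.5)]
result = mean_by_sector(records)
print(max(result, key=result.get), result)
```

   A sector that stands out should then be checked against site maps and the
   emission inventory, as in the example above.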
   [Figure: 1-hr benzene concentrations vs. wind direction. Y-axis: Benzene. X-axis: Wind Direction.]
   [X-axis label of the benzene vs. toluene scatter plot: Toluene (ppbC).]

-------
             Data Validation
             Handling Suspect Data
During the process of data validation, the analyst may
identify data as suspect but not be able to prove that
the data are invalid.
Analysts may decide to exclude these suspect data
from central tendency computations (e.g., annual
average) or other analyses.
These data may warrant additional investigation using
case studies (i.e., inspection of individual dates).
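
To see how much suspect data influence a central tendency value, the average
can be computed with and without them. A minimal sketch, assuming each record
carries a suspect flag; the layout and values are hypothetical.

```python
def annual_mean(records, exclude_suspect=False):
    """Mean of (concentration, is_suspect) records, optionally skipping suspect ones."""
    values = [conc for conc, is_suspect in records
              if not (exclude_suspect and is_suspect)]
    return sum(values) / len(values)


# Hypothetical 24-hr concentrations; the last one was flagged as suspect.
records = [(1.0, False), (1.2, False), (0.8, False), (9.0, True)]

print(annual_mean(records), annual_mean(records, exclude_suspect=True))
```

A large difference between the two values shows that the conclusion depends on
the flagged samples, which is a reason to investigate them individually.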

-------
                                              Summary
                                 Data  Preparation  Check List
       Acquire data
        •  Check for availability of supplementary data
             -  Meteorological measurements
             -  Additional species
             -  Metadata
        •  Use supplementary data
             -  Thoroughly review all metadata describing what/why/how
                measurements were made.
             -  Find out about site characteristics including
                  -  Meteorology
                  -  Local emissions sources
                  -  Geography
       Know your data
        •  A general knowledge of air toxics behaviors is
            invaluable.  Know and understand typical
            relationships and patterns that have been observed
            in air toxics data.
       Process your data
        •  Investigate collocated data: do they agree?
        •  Create valid data aggregates
             -  Check for data completeness
             -  Prepare and inspect valid aggregates and calculate the
                percentage of data below MDL
        •  Identify censored data and make MDL substitutions if
            necessary
             -  Use knowledge of data reporting methods to identify
                substitution used for data below detection, if any.
             -  If reporting of data below detection is unknown, separate
                data below detection and check for repetitive values or
                linear relationships with detection limits
             -  If data are uncensored, use "as is"
             -  If data are censored, make MDL/2 substitutions or use a more
                sophisticated method as needed
             -  If the data contain a mixture of censored and uncensored
                data,
                  -  Test two substitution methods for a sample analysis:
                     (1) MDL/2 substitution for all data and (2) MDL/2
                     substitution for censored data, leaving uncensored data
                     "as is".
                  -  If direction and magnitude of trends results agree, keep
                     substitution method 2.
       Validate your data
        •  Get an overview: prepare and inspect summary
            statistics
        •  Apply visual and graphical methods to illuminate
            data issues and outliers
             -  Buddy site check
             -  Remote background comparison
             -  Scatter plots
             -  Time series
             -  Fingerprint plots
        •  Flag suspect data
        •  Investigate suspect data using
             -  Local sources/wind direction
             -  Subsets of data
             -  Unusual events
        •  Exclude invalid data
             -  If you cannot prove the data are invalid, flag as suspect.
                These data may be removed from some analyses as an
                outlier even if they cannot be invalidated. Advanced
                analyses may provide more insight into the data.
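
The test of two substitution methods for mixed censored and uncensored data
can be sketched as follows. This is an illustration, not the workbook's
procedure: the function names and sample values are hypothetical, censored
values are represented as None, and a simple mean stands in for whatever
trend statistic is actually being compared.

```python
def substitute_all_below_mdl(values, mdl):
    """Method 1: replace every value below the MDL (and every censored value) with MDL/2."""
    return [mdl / 2 if v is None or v < mdl else v for v in values]


def substitute_censored_only(values, mdl):
    """Method 2: replace only censored values (None) with MDL/2; keep reported values as is."""
    return [mdl / 2 if v is None else v for v in values]


def mean(values):
    return sum(values) / len(values)


# Hypothetical concentrations (ug/m3); None marks a censored value.
samples = [0.42, 0.08, None, 0.31, 0.05, None, 0.27]
mdl = 0.10

m1 = mean(substitute_all_below_mdl(samples, mdl))
m2 = mean(substitute_censored_only(samples, mdl))
print(f"method 1 mean = {m1:.4f}, method 2 mean = {m2:.4f}")
```

If the two results lead to the same conclusion, method 2 retains more of the
reported information, which is why the check list keeps it.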

-------
                 Appendix:

 National  Summary Statistics (2003-2005)

The appendix contains a table of national summary statistics
based upon annual averages from 2003 to 2005.
These data are useful for comparison of data ranges to "typical"
national ranges.
These data can be used as benchmarks for site-specific
comparison; for example, if data are significantly higher than the
national 95th percentile, there may be errors in the data.
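
A benchmark comparison of this kind might look like the sketch below. The
three 95th percentile values are copied from the appendix table; the site
values and the function name are hypothetical, and a plain "greater than"
test stands in for whatever margin the analyst considers significant.

```python
# National 95th percentile concentrations (ug/m3) from the appendix table.
NATIONAL_P95 = {"Benzene": 3.1, "Toluene": 7.4, "Formaldehyde": 6.7}


def flag_above_benchmark(site_annual_means, benchmarks=NATIONAL_P95):
    """Return pollutants whose site annual mean exceeds the national 95th percentile."""
    return sorted(
        pollutant
        for pollutant, conc in site_annual_means.items()
        if pollutant in benchmarks and conc > benchmarks[pollutant]
    )


# Hypothetical site annual averages (ug/m3).
site = {"Benzene": 4.0, "Toluene": 2.5, "Formaldehyde": 3.0}
print(flag_above_benchmark(site))
```

A flagged pollutant is a prompt to re-check the data, not proof of an error;
a nearby source can legitimately produce values above the national range.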

-------
    Appendix - National Summary Statistics (2003-2005)
                              (1 of 3)
Pollutant                          AQS     % Below    # of    5th Pct   25th Pct  Median    75th Pct  95th Pct
                                   Code    Detection  Sites   (µg/m3)   (µg/m3)   (µg/m3)   (µg/m3)   (µg/m3)
1,1,2,2-Tetrachloroethane          43818   97         228     6.9E-02   1.6E-01   1.7E-01   3.1E-01   1.1E+00
1,1,2-Trichloroethane              43820   98         211     5.5E-02   1.3E-01   1.4E-01   1.9E-01   9.0E-01
1,1-Dichloroethane                 43813   97         224     1.0E-02   6.1E-02   1.0E-01   1.0E-01   6.8E-01
1,1-Dichloroethylene               43826   98         225     2.0E-02   9.5E-02   9.9E-02   1.1E-01   6.5E-01
1,2,4-Trichlorobenzene             45810   90         164     1.2E-02   6.2E-02   1.5E-01   6.4E-01   1.2E+00
1,2-Dichloropropane                43829   96         229     1.5E-02   7.7E-02   7.9E-02   1.5E-01   7.6E-01
1,3-Butadiene                      43218   26         278     3.5E-02   9.5E-02   1.6E-01   2.4E-01   8.4E-01
1,4-Dichlorobenzene                45807   64         202     1.9E-02   1.1E-01   2.4E-01   5.2E-01   9.9E-01
1,4-Dioxane                        46201   94         14      4.5E-02   4.9E-02   6.9E-02   9.2E-02   1.2E-01
2,2,4-Trimethylpentane             43250   13         125     1.1E-01   2.9E-01   4.8E-01   7.8E-01   2.4E+00
3-Chloropropene                    43335   100        13      1.1E-01   1.2E-01   1.6E-01   1.6E-01   1.9E-01
Acenaphthene                       17147   44         33      5.6E-04   5.7E-03   1.4E-02   3.9E-02   7.2E-02
Acenaphthylene                     17148   68         33      2.4E-04   6.8E-04   3.4E-03   3.9E-02   4.4E-02
Acetaldehyde                       43503   4          163     7.8E-01   1.3E+00   1.6E+00   2.3E+00   4.2E+00
Acetonitrile                       43702   58         63      3.6E-01   6.3E-01   1.1E+00   4.4E+00   3.2E+01
Acrolein                           43505   43         53      1.2E-01   2.1E-01   4.4E-01   1.2E+00   1.5E+00
Acrylonitrile                      43704   70         124     4.1E-02   8.2E-02   1.4E-01   3.1E-01   1.5E+00
Anthracene                         17151   73         31      1.9E-04   7.0E-04   6.1E-03   7.9E-03   8.9E-03
Antimony (Pm10) Stp                82102   68         15      7.3E-04   1.2E-03   8.5E-03   8.5E-03   6.0E-02
Antimony (Tsp)                     12102   84         45      3.3E-04   1.0E-03   7.0E-03   1.0E-02   1.1E-02
Antimony Pm2.5 Lc                  88102   92         275     4.8E-03   6.7E-03   1.3E-02   1.4E-02   1.5E-02
Arsenic (Pm10) Stp                 82103   46         38      4.1E-04   8.6E-04   1.9E-03   1.0E-02   1.1E-02
Arsenic (Tsp)                      12103   75         82      9.9E-04   1.5E-03   5.0E-03   5.5E-03   1.0E-02
Arsenic Pm2.5 Lc                   88103   60         434     9.4E-05   2.7E-04   1.2E-03   1.7E-03   2.5E-03
Benzene                            45201   2          307     4.9E-01   7.4E-01   1.0E+00   1.5E+00   3.1E+00
Benzo(A)Pyrene (Pm10) Stp          82242   67         18      3.5E-05   6.2E-05   8.5E-05   1.5E-04   4.4E-04
Benzo(B)Fluoranthene (Pm10) Stp    82220   50         18      5.5E-05   8.1E-05   1.0E-04   1.9E-04   4.5E-04
Benzo(G,H,I)Perylene (Pm10) Stp    82237   27         18      1.2E-04   1.8E-04   2.7E-04   3.4E-04   6.4E-04
Benzo(K)Fluoranthene (Pm10) Stp    82223   74         18      2.9E-05   3.6E-05   4.7E-05   8.4E-05   2.1E-04
Benzo[A]Anthracene                 17215   90         30      7.8E-05   8.0E-05   1.6E-04   4.4E-04   1.8E-03
Benzo[A]Pyrene                     17242   94         30      1.6E-04   2.3E-04   3.2E-04   5.0E-04   3.6E-03
Benzo[B]Fluoranthene               17220   90         30      7.6E-05   7.9E-05   1.9E-04   6.2E-04   3.6E-03


-------
    Appendix - National Summary Statistics (2003-2005)
                              (2 of 3)
Pollutant                          AQS     % Below    # of    5th Pct   25th Pct  Median    75th Pct  95th Pct
                                   Code    Detection  Sites   (µg/m3)   (µg/m3)   (µg/m3)   (µg/m3)   (µg/m3)
Benzyl Chloride                    45809   95         110     7.4E-03   4.0E-02   1.8E-01   3.7E-01   8.4E-01
Beryllium (Pm10) Stp               82105   82         27      2.3E-06   4.1E-06   4.6E-05   3.0E-04   4.6E-04
Beryllium (Tsp)                    12105   87         62      8.8E-06   2.6E-05   3.0E-05   1.6E-04   2.7E-04
Bromoform                          43806   100        94      5.2E-02   2.7E-01   5.0E-01   5.2E-01   7.2E-01
Bromomethane                       43819   92         228     4.4E-02   1.0E-01   1.9E-01   2.1E-01   6.4E-01
Cadmium (Pm10) Stp                 82110   50         37      1.2E-04   2.4E-04   5.0E-04   9.0E-04   1.2E-03
Cadmium (Tsp)                      12110   73         105     1.4E-04   3.8E-04   8.0E-04   1.5E-03   2.7E-03
Cadmium Pm2.5 Lc                   88110   93         263     2.5E-03   2.9E-03   6.4E-03   6.6E-03   6.9E-03
Carbon Disulfide                   42153   73         75      1.1E-01   1.6E-01   2.6E-01   1.3E+00   3.2E+00
Carbon Tetrachloride               43804   42         280     3.3E-01   4.8E-01   5.5E-01   6.3E-01   1.1E+00
Chlorine Pm2.5 Lc                  88115   67         427     3.4E-04   2.8E-03   1.2E-02   2.9E-02   1.3E-01
Chlorobenzene                      45801   83         226     1.2E-02   4.4E-02   5.5E-02   1.5E-01   7.6E-01
Chloroethane                       43812   93         159     1.3E-02   3.9E-02   1.0E-01   1.4E-01   4.4E-01
Chloroform                         43803   74         273     6.7E-02   1.2E-01   2.4E-01   2.5E-01   8.2E-01
Chloromethane                      43801   6          245     7.9E-01   1.0E+00   1.2E+00   1.3E+00   1.6E+00
Chloroprene                        43835   99         114     4.5E-02   4.5E-02   4.5E-02   8.6E-02   5.0E-01
Chromium (Pm10) Stp                82112   36         33      4.9E-04   1.0E-03   2.1E-03   2.8E-03   6.2E-03
Chromium (Tsp)                     12112   67         106     1.3E-03   1.8E-03   2.4E-03   4.8E-03   1.6E-02
Chromium Pm2.5 Lc                  88112   65         428     3.1E-05   7.0E-05   1.1E-03   2.0E-03   3.2E-03
Chromium VI (Tsp)                  12115   55         21      1.3E-05   1.8E-05   2.6E-05   3.8E-05   7.5E-04
Chrysene                           17208   87         30      1.8E-04   3.1E-04   1.8E-03   3.1E-03   3.2E-03
Cobalt (Pm10) Stp                  82113   55         23      8.1E-05   1.6E-04   3.0E-04   2.0E-03   4.8E-03
Cobalt (Tsp)                       12113   66         52      2.0E-04   5.2E-04   9.2E-04   2.0E-03   2.3E-03
Cobalt Pm2.5 Lc                    88113   96         270     3.2E-04   5.3E-04   8.0E-04   8.2E-04   8.8E-04
Dibenz(A-H)Anthracene (Pm10) Stp   82151   91         18      2.5E-05   2.5E-05   2.9E-05   3.6E-05   8.1E-05
Dibenzo[A,H]Anthracene             17231   98         30      8.3E-05   1.8E-04   7.8E-04   8.6E-04   3.6E-03
Dichloromethane                    43802   53         277     1.8E-01   2.4E-01   4.0E-01   8.7E-01   6.1E+00
Ethyl Acrylate                     43438   100        46      9.6E-02   1.2E-01   1.9E-01   3.3E-01   5.0E-01
Ethylbenzene                       45203   10         291     1.2E-01   2.5E-01   4.2E-01   6.3E-01   1.0E+00
Ethylene Dibromide                 43843   98         235     3.8E-02   9.9E-02   1.9E-01   2.2E-01   1.3E+00
Ethylene Dichloride                43815   95         253     2.2E-02   1.0E-01   1.0E-01   2.0E-01   6.8E-01
Ethylene Oxide                     43601   38         16      1.7E-01   1.8E-01   2.1E-01   2.5E-01   4.6E-01
Fluoranthene                       17201   40         33      3.1E-04   3.2E-04   1.5E-03   3.6E-03   1.8E-02
Fluorene                           17149   42         33      2.2E-03   4.6E-03   7.8E-03   8.1E-03   3.5E-02
Formaldehyde                       43502   35         163     1.2E+00   2.0E+00   2.7E+00   3.8E+00   6.7E+00
Hexachlorobutadiene                43844   95         153     8.0E-02   1.1E-01   1.7E-01   1.1E+00   1.8E+00
Hydrogen Sulfide                   42402   91         39      1.0E-03   1.0E-03   1.1E-03   1.5E-03   4.1E-03


-------
    Appendix - National Summary Statistics (2003-2005)
                              (3 of 3)
Pollutant                          AQS     % Below    # of    5th Pct   25th Pct  Median    75th Pct  95th Pct
                                   Code    Detection  Sites   (µg/m3)   (µg/m3)   (µg/m3)   (µg/m3)   (µg/m3)
Indeno[1,2,3-Cd]Pyrene (Pm10) Stp  82243   51         18      5.3E-05   9.0E-05   1.2E-04   1.9E-04   4.3E-04
Indeno[1,2,3-Cd]Pyrene             17243   92         30      1.5E-04   2.6E-04   7.8E-04   8.8E-04   3.6E-03
Isopropylbenzene                   45210   61         117     2.6E-02   5.0E-02   6.4E-02   1.1E-01   5.0E-01
Lead (Pm10) Stp                    82128   37         37      2.4E-03   3.7E-03   5.6E-03   1.3E-02   4.0E-02
Lead (Tsp)                         12128   34         193     1.9E-03   5.1E-03   1.2E-02   3.8E-02   2.9E-01
Lead Pm2.5 Lc                      88128   37         434     4.8E-04   1.2E-03   3.2E-03   4.3E-03   8.8E-03
M_P Xylene                         45109   5          266     2.8E-01   6.7E-01   1.1E+00   1.7E+00   3.4E+00
Manganese (Pm10) Stp               82132   4          27      2.7E-03   3.8E-03   5.7E-03   1.4E-02   5.5E-02
Manganese (Tsp)                    12132   46         96      4.9E-03   1.2E-02   2.1E-02   2.9E-02   8.4E-02
Manganese Pm2.5 Lc                 88132   35         434     4.6E-04   9.3E-04   1.6E-03   2.4E-03   7.0E-03
Mercury (Tsp)                      12142   97         25      5.0E-05   5.0E-05   5.1E-05   4.5E-04   2.1E-03
Mercury Pm2.5 Lc                   88142   87         270     1.0E-03   1.5E-03   2.6E-03   2.8E-03   3.1E-03
Methyl Chloroform                  43814   72         263     9.3E-02   1.4E-01   1.4E-01   1.9E-01   9.2E-01
Methyl Isobutyl Ketone             43560   87         134     3.9E-02   5.0E-02   1.7E-01   2.8E-01   9.7E-01
Methyl Methacrylate                43441   98         45      1.4E-01   1.9E-01   2.0E-01   2.2E-01   6.6E-01
Methyl Tert-Butyl Ether            43372   57         207     3.6E-02   1.3E-01   5.0E-01   1.1E+00   2.8E+00
Naphthalene                        17141   51         39      1.3E-03   3.8E-02   4.0E-02   1.1E-01   5.0E-01
N-Hexane                           43231   2          168     2.4E-01   5.1E-01   8.4E-01   1.5E+00   2.7E+00
Nickel (Pm10) Stp                  82136   38         36      3.8E-04   1.7E-03   2.6E-03   4.1E-03   5.8E-03
Nickel (Tsp)                       12136   70         101     1.5E-03   2.4E-03   2.9E-03   3.4E-03   5.5E-02
Nickel Pm2.5 Lc                    88136   57         428     5.7E-05   1.6E-04   9.6E-04   1.4E-03   3.8E-03
O-Xylene                           45204   9          282     1.1E-01   2.4E-01   4.6E-01   7.0E-01   1.3E+00
Phenanthrene                       17150   37         33      3.0E-03   3.1E-03   7.0E-03   1.3E-02   9.7E-02
Phosphorus Pm2.5 Lc                88152   94         427     4.1E-04   7.4E-04   3.6E-03   5.3E-03   7.7E-03
Propionaldehyde                    43504   20         118     7.5E-02   2.1E-01   2.7E-01   4.2E-01   6.5E-01
P-Xylene                           45206   13         17      6.8E-01   1.2E+00   2.2E+00   2.9E+00   4.0E+00
Scandium Pm2.5 Lc                  88163   99         263     1.5E-03   2.2E-03   3.6E-03   3.8E-03   4.7E-03
Selenium (Pm10) Stp                82154   52         22      8.1E-05   4.0E-04   9.0E-04   8.5E-03   9.3E-03
Selenium (Tsp)                     12154   82         43      6.8E-04   1.2E-03   1.6E-03   6.4E-03   6.7E-03
Selenium Pm2.5 Lc                  88154   55         434     8.3E-05   4.1E-04   1.1E-03   1.6E-03   2.4E-03
Styrene                            45220   51         272     3.8E-02   7.8E-02   1.6E-01   3.7E-01   8.8E-01
Tetrachloroethylene                43817   69         273     1.1E-01   1.8E-01   2.3E-01   4.1E-01   1.4E+00
Toluene                            45202   1          295     6.9E-01   1.5E+00   2.4E+00   3.8E+00   7.4E+00
Trichloroethylene                  43824   87         268     6.1E-02   1.3E-01   1.5E-01   2.3E-01   8.9E-01
Vinyl Acetate                      43447   18         24      1.8E-01   7.2E-01   9.8E-01   1.3E+00   2.2E+00
Vinyl Chloride                     43860   96         254     2.6E-02   6.0E-02   6.5E-02   1.3E-01   4.2E-01


-------
                Resources
                Data Acquisition

Primary data source—EPA's AQS: National repository
of ambient monitoring data.
http://www.epa.gov/ttnmain1/airs/airsaqs/
AQS Discover Web: data retrieval system.
http://www.epa.gov/ttn/airs/airsaqs/aqsdiscover/
Other data sources
 - IMPROVE: A source of speciated PM2.5 data.
   http://vista.cira.colostate.edu/views/
 - SEARCH:  A source of speciated PM2.5 data.
   http://www.atmospheric-research.com/public/index.html
 - National Weather Service: Has a variety of historical
   meteorological data for selected locations.
   http://www.nws.noaa.gov/

-------
                  Resources
                 Quality Assurance

Ambient Monitoring Technology Information Center: A variety of
background information on monitoring methods and QA for
multiple monitoring networks, http://www.epa.gov/ttn/amtic/
 Toxics specifically: http://www.epa.gov/ttn/amtic/airtoxpq.html
EPA quality assurance: Office of Air Quality Planning and
Standards, http://www.epa.gov/oar/oaqps/qa/index.html#back
PAMS data analysis workbook (circa 2000): analysis and
validation of PAMS data.
http://www.epa.gov/oar/oaqps/pams/analysis/
EPA supersite overview: background and QA documentation.
http://www.epa.gov/ttn/amtic/supersites.html
EPA PM2.5 network quality assurance.
http://www.epa.gov/ttn/amtic/specqual.html

-------
                        Resources

                            Metadata

•  Google Earth: High resolution satellite data useful for investigating site
  locations and local emissions sources, http://earth-software.com/freebie/
•  Federal Highway Administration: Information on number of miles traveled
  on roadways, total amount of gasoline sold etc.; useful for correlating long
  term mobile source trends http://www.fhwa.dot.gov/index.html
    Vehicle miles traveled, fuel composition, fleet characteristics
    http://www.fhwa.dot.gov/policy/ohpi/
•  National Emissions Inventory 2002: Emissions inventory for the United
  States; some Canada and Mexico data also available.
  http://www.epa.gov/ttn/chief/net/2002inventory.html
•  EPA's AirData Facility Emissions Report and regulations for Criteria Air
  Pollutants and HAPS: Site level emissions data.
  http://www.epa.gov/air/data/geosel.html
•  MapQuest (useful for mapping site locations), http://www.mapquest.com/
•  U.S. Census Bureau: A variety of information; some of the most useful are
  population and population density, http://www.census.gov/
    Query tool: factfinder.census.gov/

-------
                         Resources
Advanced methods for estimating data structure below detection

• Helsel D.R. (2005) Nondetects and data analysis: statistics for censored
 environmental data. John Wiley & Sons, Inc., Hoboken, NJ.
• Helsel D.R. (2005) More than obvious: better methods for interpreting
 nondetect data. Environ. Sci. Technol., 419A-423A, American Chemical
 Society.
• Antweiler R.C. and Taylor H.E. (2008) Evaluation of statistical treatments of
 left-censored environmental data using coincident uncensored data sets:
 I. Summary statistics. Environ. Sci. Technol., 42, 10, 3732-3738.
• U.S. EPA (2004) Local Limits Development Guidance Appendices.
 EPA 833-R-04-002B, Office of Wastewater Management: Washington, DC.
• Kaplan-Meier Method
  Kaplan, E. L. and Meier, P. (1958) Nonparametric estimation from incomplete
  observations. J. Amer. Stat. Assn, 53, 282 (June), 457-481, doi:10.2307/2281868.
• Robust Regression on Order Statistics
  Lee, L. and Helsel, D. (2007) Statistical analysis of water-quality data containing
  multiple detection limits II: S-language software for nonparametric distribution
  modeling and hypothesis testing. Comput. Geosci. 33, 5 (May), 696-704.
  http://dx.doi.org/10.1016/j.cageo.2006.09.006

-------
                       Resources

                  Information and Methods

•  HAPs
    - NATA. County level risk assessment modeling data for all NATA years
      http://www.epa.gov/ttn/atw/natamain/
    - EPA integrated risk information system: Searchable database of human
      health effects by pollutant, http://www.epa.gov/iris/index.html
    - Agency for Toxic Substances & Disease Registry. General toxics
      information and FAQs, http://www.atsdr.cdc.gov/toxfaq.html
    - EPA air toxics website (ATW). General information on a variety of HAPs
      topics, http://www.epa.gov/ttn/atw/
    - Lake Michigan Air Directors Consortium. Summary of Phases I-III of
      national analyses, http://www.ladco.org/toxics.html
    - EPA's FERA (Fate, Exposure and Risk Analysis)
      http://www.epa.gov/ttn/fera/
•  Hydrocarbons
    - EPA PAMS web site including access to the PAMS Data Analysis
      Workbook, http://www.epa.gov/oar/oaqps/pams/
    - PAMS validation and analysis projects (e.g.,
      http://www.nescaum.org/projects/pams/index.html)
    - Ambient monitoring technology information center (AMTIC) - PAMS
      monitoring information, http://www.epa.gov/ttn/amtic/pamsmain.html
•  Particulate Matter
    - EPA's PM2.5 data analysis web site, http://www.epa.gov/oar/oaqps/pm25/

-------
              Resources
               Data Validation
VOCDat (PAMS, air toxics),
http://vocdat.sonomatech.com/
SDVAT (PM2.5). Developed by RTI, available through
EPA OAQPS monitoring group.

-------
                  Resources
                    Data Analysis
Basic data handling, display, and analysis:
 - Spreadsheets (if data sets are small enough)
 - Databases
 - Geographic information systems (GIS)
Statistical analyses
 - Package used throughout this workbook: SYSTAT
   (http://www.aspiresoftwareintl.com/html/systat.html)
 - Commonly used at EPA: SAS
   (http://www.sas.com/technologies/analytics/statistics/stat/)
 - Open source: R (http://www.r-project.org/)
    There are other sources of statistical software packages - this list
    is not intended to be an endorsement.


-------
                 Treating  Data 
-------
            Maximum  Likelihood
                         Example

Let X1, X2, ..., Xm, ..., Xn represent all the n data values ranked from
largest to smallest.  The first "m" values represent the data values
above the detection limit (DL), and the remaining "n-m" data points are
those below DL.
Compute the sample mean and the sample variance from only the "m"
above detection data values. The mean will be too large because the
small undetected values have been ignored, and the variance too
small.
The mean will be lowered and the variance enlarged through the use of
factors:

    $h = \frac{n - m}{n}$        $\gamma = \frac{s_d^2}{(\bar{x}_d - DL)^2}$

    $\bar{x}_d$ is the sample mean
    $s_d$ is the sample standard deviation
    m is the number of detected values
    n is the total number of values

Use the table on the next page to obtain $\lambda(\gamma, h)$.
From material supplied by
Warren and Nussbaum (2009)

-------
                      EPA/QA/G-9S, Table A-11

Rows: γ.  Columns: h.

γ      .25      .30      .35      .40      .45      .50      .55      .60      .65      .70      .80      .90
.00    .31862   .4021    .4941    .5961    .7096    .8388    .9808    1.145    1.336    1.561    2.176    3.283
.05    .32793   .4130    .5066    .6101    .7252    .8540    .9994    1.166    1.358    1.585    2.203    3.314
.10    .33662   .4233    .5184    .6234    .7400    .8703    1.017    1.185    1.379    1.608    2.229    3.345
.15    .34480   .4330    .5296    .6361    .7542    .8860    1.035    1.204    1.400    1.630    2.255    3.376
.20    .35255   .4422    .5403    .6483    .7673    .9012    1.051    1.222    1.419    1.651    2.280    3.405
.25    .35993   .4510    .5506    .6600    .7810    .9158    1.067    1.240    1.439    1.672    2.305    3.435
.30    .36700   .4595    .5604    .6713    .7937    .9300    1.083    1.257    1.457    1.693    2.329    3.464
.35    .37379   .4676    .5699    .6821    .8060    .9437    1.098    1.274    1.475    1.713    2.353    3.492
.40    .38033   .4735    .5791    .6927    .8179    .9570    1.113    1.290    1.494    1.732    2.376    3.520
.45    .38665   .4831    .5880    .7029    .8295    .9700    1.127    1.306    1.511    1.751    2.399    3.547
.50    .39276   .4904    .5967    .7129    .8408    .9826    1.141    1.321    1.528    1.770    2.421    3.575
.55    .39679   .4976    .6061    .7225    .8517    .9950    1.155    1.337    1.545    1.788    2.443    3.601
.60    .40447   .5045    .6133    .7320    .8625    1.007    1.169    1.351    1.561    1.806    2.465    3.628
.65    .41008   .5114    .6213    .7412    .8729    1.019    1.182    1.368    1.577    1.824    2.486    3.654
.70    .41555   .5180    .6291    .7502    .8832    1.030    1.195    1.380    1.593    1.841    2.507    3.679
.75    .42090   .5245    .6367    .7590    .8932    1.042    1.207    1.394    1.608    1.851    2.528    3.705
.80    .42612   .5308    .6441    .7676    .9031    1.053    1.220    1.408    1.624    1.875    2.548    3.730
.85    .43122   .5370    .6515    .7781    .9127    1.064    1.232    1.422    1.639    1.892    2.568    3.754
.90    .43622   .5430    .6586    .7844    .9222    1.074    1.244    1.435    1.653    1.908    2.588    3.779
.95    .44112   .5490    .6656    .7925    .9314    1.085    1.255    1.448    1.668    1.924    2.607    3.803
1.00   .44592   .5548    .6724    .8005    .9406    1.095    1.287    1.461    1.882    1.940    2.626    3.827


-------
               Maximum  Likelihood
                        Example Continued

   Estimate the corrected sample mean and corrected sample variance to account for
   the data below the DL:

       $\bar{x} = \bar{x}_d - \lambda(\bar{x}_d - DL)$        $s^2 = s_d^2 + \lambda(\bar{x}_d - DL)^2$

•   Let X1, X2, ..., Xm, ..., Xn represent all the n data values ranked from largest to
   smallest:  1.752, 1.563, 1.498, 1.477, 1.418, 1.358, 1.327, 1.289, 1.148, 1.060, 1.045,
   <1.000, <1.000, <1.000, <1.000, <1.000, <1.000, <1.000, <1.000, <1.000
•   The first "m" values represent the data values above the DL, and the remaining "n-m"
   data points are those below the detection limit:  n = 20, m = 11, n-m = 9
•   Compute  the sample mean and the sample variance from only the "m" above
   detection  data values: Mean = 1.358,  Variance = 0.0524
•   The first factor (h):  11/20 = 0.55
•   The second factor (γ): 0.0524/(1.358 - 1.000)^2 = 0.409
•   The third factor (λ, from h and γ, Table A-11): 1.113
•   Estimate the corrected sample mean and corrected sample variance to account for
   the data below the DL: Mean = 1.358 - 1.113(1.358 - 1) = 0.960 and
   variance = 0.0524 + 1.113(1.358 - 1)^2 = 0.195
                                                        From material supplied by
                                                        Warren and Nussbaum (2009)
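
The arithmetic of the last three steps can be checked with a short sketch. It
takes the summary values stated in the example (mean and variance of the
detected values, the DL, and the table factor 1.113) as inputs rather than
recomputing them from the data; the function name and argument names are
illustrative assumptions.

```python
def cohen_adjust(mean_d, var_d, dl, lam):
    """Apply the table factor to the detected-only mean and variance.

    mean_d, var_d: mean and variance of the values above the detection limit
    dl:            detection limit
    lam:           factor read from Table A-11 (supplied by the caller)
    Returns (gamma, corrected mean, corrected variance).
    """
    gamma = var_d / (mean_d - dl) ** 2
    mean = mean_d - lam * (mean_d - dl)
    var = var_d + lam * (mean_d - dl) ** 2
    return gamma, mean, var


# Values as stated in the example; lam = 1.113 is the table value used there.
gamma, mean, var = cohen_adjust(mean_d=1.358, var_d=0.0524, dl=1.000, lam=1.113)
print(f"gamma = {gamma:.3f}, mean = {mean:.3f}, variance = {var:.3f}")
```

Passing the table factor in as an argument keeps the lookup, which is done by
hand from Table A-11, separate from the arithmetic.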

-------
                          Kaplan-Meier
                                  Example
•  For this example, the maximum was 1.752, so we can choose 2 (or 3 or 4, it makes no
  difference) as the flip point.  1.752 when flipped is 0.248, 1.563 becomes 0.437, etc.
•  This method will find a specific probability (denoted as g_i) for each X_i (the flipped values)
  using an "Incremental Survival Probability" (actually through use of a table that must be
  constructed).
•  The g_i and X_i are combined to estimate the mean and variance:
       $\text{Mean} = \sum_i g_i X_i$          $\text{Variance} = \sum_i g_i X_i^2 - (\text{Mean})^2$
•  The Mean is then flipped back to the original scale; variance is left as is.
•  The computation is summarized on the next slide.
    - Col 1: The actual data values (non-detects indicated by a dashed line)
    - Col 2: The "flipped data" = 2 minus the actual value
    - Col 3: Rank order (the missing ranks belong to non-detects)
    - Col 4: b = n-r+1 where n = total (20), r = rank
    - Col 5: d = number of observations for this value (1 in this case)
    - Col 6: p = (b - d)/b
    - Col 7: S = The S from the previous row multiplied by the p for the current row (starts at 1.0000).
       E.g., 10th data value: S = 0.5500 x 10/11 = 0.5000
    - Col 8: g = The S from the previous row minus the S for the current row (starts at 1.0000).
       E.g., 10th data value: g = 0.5500 - 0.5000 = 0.0500.
•  The X_i are the flipped values and the g_i come from the table.
    - Mean = 0.05 x 0.248 + ... + 0.16875 x 1.200 = 0.8620
    - Variance = 0.05 x 0.248^2 + ... + 0.16875 x 1.200^2 - 0.8620^2 = 0.085
•  The true Mean is then 2 - 0.8620 = 1.138 and the variance 0.085
                                                        From material supplied by
                                                        Warren and Nussbaum (2009)

-------
                 Kaplan-Meier
                      Example
Data     Flip on 2   rank   b = n-r+1   d   p=(b-d)/b   S        g
1.752    0.248       1      20          1   19/20       0.9500   0.0500
1.563    0.437       2      19          1   18/19       0.9000   0.0500
1.498    0.502       3      18          1   17/18       0.8500   0.0500
1.477    0.523       4      17          1   16/17       0.8000   0.0500
1.418    0.582       5      16          1   15/16       0.7500   0.0500
1.358    0.642       6      15          1   14/15       0.7000   0.0500
1.327    0.673       7      14          1   13/14       0.6500   0.0500
1.289    0.711       8      13          1   12/13       0.6000   0.0500
1.148    0.852       9      12          1   11/12       0.5500   0.0500
1.060    0.940       10     11          1   10/11       0.5000   0.0500
1.045    0.955       11     10          1   9/10        0.4500   0.0500
0.977    1.023       13     8           1   7/8         0.3938   0.05625
0.944    1.056       14     7           1   6/7         0.3375   0.05625
0.919    1.081       15     6           1   5/6         0.2813   0.05625
0.897    1.103       16     5           1   4/5         0.2250   0.05625
0.818    1.182       17     4           1   3/4         0.1688   0.05625
<0.800   >1.200      18     3           3   0           0        0.16875

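
The table construction can be sketched in code. The sketch below follows the
columns of the table (b, d, p, S, g) and makes two assumptions that are not
stated in the workbook: the non-detect at the missing rank 12 is taken to be
"<1.000" (any limit between the neighboring values 0.977 and 1.045 gives the
same result), and observations at the largest flipped value are counted as
events, which is how the "<0.800" row is treated in the table. The function
name and data layout are illustrative.

```python
def km_flipped_mean_var(detects, nondetect_limits, flip):
    """Kaplan-Meier mean and variance for left-censored data via flipping."""
    # (flipped value, is_event): detected values are events; non-detects
    # become right-censored observations after flipping.
    obs = [(flip - x, True) for x in detects]
    obs += [(flip - dl, False) for dl in nondetect_limits]

    # Assumption: observations at the largest flipped value count as events,
    # matching the last row of the table.
    largest = max(v for v, _ in obs)
    obs = [(v, is_event or v == largest) for v, is_event in obs]

    surv = 1.0            # S from the previous row (starts at 1)
    mean = 0.0            # running sum of g * X
    second_moment = 0.0   # running sum of g * X^2
    for v in sorted({v for v, is_event in obs if is_event}):
        b = sum(1 for u, _ in obs if u >= v)          # number still at risk
        d = sum(1 for u, e in obs if e and u == v)    # events at this value
        new_surv = surv * (b - d) / b                 # S = S_prev * p
        g = surv - new_surv                           # g = S_prev - S
        mean += g * v
        second_moment += g * v * v
        surv = new_surv

    variance = second_moment - mean ** 2
    return flip - mean, variance                      # flip the mean back


detects = [1.752, 1.563, 1.498, 1.477, 1.418, 1.358, 1.327, 1.289,
           1.148, 1.060, 1.045, 0.977, 0.944, 0.919, 0.897, 0.818]
nondetect_limits = [1.000, 0.800, 0.800, 0.800]  # first limit is an assumption

km_mean, km_var = km_flipped_mean_var(detects, nondetect_limits, flip=2.0)
print(f"mean = {km_mean:.3f}, variance = {km_var:.3f}")
```

Because each non-detect enters with its own limit, the same routine handles
data sets with more than one detection limit.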

-------
                Comparison  of Methods
                                 Example

         True    Zero    DL      1/2 DL   MLE     ROS     K-M
Mean     1.108   0.747   1.422   0.972    0.960   1.197   1.138
Var      0.117   0.505   0.099   0.302    0.195   0.048   0.085

     In this example, the easiest methods (substitution with zero, DL, or 1/2 DL) give poor
     results.
     MLE and ROS (not shown in the example) provide fairly good mean and variance values
     considering the high  non-detect rate (45%) in this example. However, these methods
     require significant work to calculate the estimates.
     Kaplan-Meier provides reasonable estimates for this example, and works when there are
     multiple detection limits. However, this method also requires significant work to calculate
     the estimates.
                                                          From material supplied by
                                                          Warren and Nussbaum (2009)


-------
                                References
(1 of 2)
     Antweiler R.C. and Taylor H.E. (2008) Evaluation of statistical treatments of left-censored environmental data
         using coincident uncensored data sets: I. summary statistics. Environ. Sci. Technol. 42 (10), 3732-3738
         (10.1021/es071301c).
     Bortnick S.M., Coutant B.W., and Biddle B.M. (2003) Estimate background concentrations for the national-scale
         air toxics assessment. Final technical report prepared for the U.S. Environmental Protection Agency,
         Research Triangle Park, NC, by Battelle, Columbus, OH, Contract No. 68-D-02-061, Work Assignment 1-
         03, June.
     Helsel D.R. (2005) More than obvious: better methods for interpreting nondetect data. Environ. Sci. Technol.,
         419A-423A, American Chemical Society.
     Helsel D.R. (2005) Nondetects and data analysis: statistics for censored environmental data. John Wiley &
         Sons, Inc., Hoboken, NJ.
     Khalil M.A. and Rasmussen R.A. (1997) The global distribution of atmospheric methyl chloride. Web site of the
         Climate Monitoring and Diagnostics Laboratory. Available on the Internet at
         
     Kuhlmann et al. (2003) A model for studies of tropospheric ozone and NMHCs: Model evaluation of ozone-
         related species. J. Geophys. Res. 108(D23), doi:10.1029/2002JD003348.
     Main H.H. and Roberts P.T. (2001) PM2.5 data analysis workbook. Draft workbook prepared for the U.S.
         Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park,
         NC, by Sonoma Technology, Inc., Petaluma, CA, STI-900242-1988-DWB, February.
     McCarthy M.C., Hafner H.R., and Montzka S.A. (2006) Background concentrations of 18 air toxics for North
         America. J. Air and Waste Manag. Assoc. 56, 3-11 (STI-903550-2589). Available on the Internet at
         http://www.awma.org/journal/ShowAbstract.asp?Year=&PaperlD=1509.
     Montzka, S.A. et al. (1999) Present and future trends in  the atmospheric burden of ozone-depleting halogens.
         Nature, 398, 690-694.
     Parrish D.D., Trainer M., Young V., Goldan P.O., Kuster W.C., Jobson B.T., Fehsenfeld F.C., Lonneman W.A.,
         Zika R.D., Farmer C.T., Riemer D.D., and Rodgers M.O. (1998) Internal consistency tests for evaluation of
         measurements of anthropogenic hydrocarbons in the troposphere. J. Geophys. Res.-Atmos. 103(D17),
         22339-22359.

-------
                           References
(2 of 2)
Rosenbaum A.S., Axelrad D.A., Woodruff T.J., Wei Y., Ligocki M.P., and Cohen J.P. (1999) National
    estimates of outdoor air toxics concentrations. J. Air & Waste Manag. Assoc. 49, 1138-1152.
Singh H.B. et al. (2001) Evidence from the Pacific troposphere for large global sources of oxygenated organic
    compounds, Nature, 410, 1078-1081.
U.S. Environmental Protection Agency (1980) Validation of air monitoring data. Report prepared by the U.S.
    Environmental Protection Agency, Research Triangle Park, NC, EPA-600/4-80-030.
U.S. Environmental Protection Agency (1982) Definition and procedure for the determination of the method
    detection limit - revision 1.11: Federal Register. Pp. 565-567. To be codified at 40 CFR Part 136,
    Appendix B.
U.S. Environmental Protection Agency (1999) Particulate matter (PM2.5) speciation guidance document.
    Available at .
U.S. Environmental Protection Agency (2004) Local Limits Development Guidance Appendices. EPA 833-R-04-002B,
    Office of Wastewater Management, Washington, DC.
VIEWS website, http://vista.cira.colostate.edu/views/
Warren, J. and Nussbaum, B. (2009) "Analyzing Datasets Containing Semi-quantitative Values". Course
    material. Office of Environmental Information, EPA
Watson J.G., DuBois D.W., DeMandel R., Kaduwela A., Magliano K., McDade C., Mueller P.K., Ranzieri A.,
    Roth P.M., and Tanrikulu S. (1998) Aerometric monitoring program plan for the California Regional
    PM2.5/PM10 Air Quality Study. Draft report prepared for the California Regional PM10/PM2.5 Air Quality
    Study Technical Committee,  California Air Resources Board, Sacramento, CA, by Desert Research
    Institute, Reno, NV, DRI Document No. 9801.1D5, December.
Weller et al. (2000) Meridional distribution of hydroperoxides and formaldehyde in the MBL of the Atlantic (48
    N-35 S) measured during the Albatross campaign. J. Geophys. Res. 105(D11), 14401-14412.
Zhou et al. (1996) Tropospheric formaldehyde concentrations at the Mauna Loa observatory during  MLOPEX
    2. J. Geophys. Res. 101(D9).



-------
            Characterizing  Air Toxics
         What are the diurnal, seasonal, and spatial characteristics
                          of air toxics?
           What do these characteristics tell us about emission
                  sources, transport, and chemistry?
June 2009                   Section 5 - Characterizing Air Toxics

-------
            Characterizing Air Toxics
             What's Covered in This Section

      Temporal Patterns
       - Diurnal
       - Day-of-week
       - Seasonal
      Spatial Patterns
       - Spatial characterization
         •  National concentration plots for perspective
         •  Maps
       - Variability within and between cities
       - Hot and cold spot analysis
       - Comparing urban and rural sites
      Risk screening

-------
                 Characterizing Air Toxics

                                 Overview

     • Spatial and temporal characterizations of air toxics data are the basis
       for improving our understanding of emissions and the atmospheric
       processes that influence pollutant formation, distribution, and removal.
       Goals of these data analyses can include
        -  Identifying possible important sources of air toxics.
        -  Determining chemical and physical processes that lead to high air toxics
           concentrations.
     • Characterization analyses help us develop a conceptual model of
       processes affecting air toxics concentrations and also provide an
       opportunity to compare data to existing conceptual models to identify
       interesting or problematic data. Following are some typical questions
       which may be addressed using these types of analyses:
        -  Where are air toxics concentrations highest or lowest?
        -  How do pollutant concentrations vary relative to each other - and  what does this tell
           us about their sources?
        -  What and where are the air toxics of concern?
        -  How do urban and rural sites compare?
        -  How do air toxics concentrations compare to criteria pollutants  (e.g., ozone and
           PM2.5)?
        -  What local or regional sources influence a particular measurement site?


-------
                       Quantifying   Patterns

       •  When investigating temporal patterns, analysts should use statistical measures to
         understand if concentrations are statistically different.
       •  Testing statistical significance using the t-test
          -  The t-test is a very common method for assessing the difference in mean values of two groups
             of data (e.g., the difference in means of two years of data).
          -  This test assumes that both data sets are normally distributed, an assumption that does not hold
             for many air toxics measurements. However, this is not a problem as long as there are sufficient
             data in each group (>~100). Each data set is also required to contain the same number of samples.
          -  If there are fewer than 100 data points per group, a more advanced, non-parametric, test must
             be used. Some examples are
              •  Kruskal-Wallis
              •  Kolmogorov-Smirnov
              •  Anderson-Darling (sample sizes of 10 to 40 only).
       •  Testing statistical significance using notched box plots
          -  For the national analyses, SYSTAT notched box plots were used as a quick check of statistical
             significance between two groups. The notches on a box plot represent the 95% confidence
             interval around the median (a full description of notched box plots can be found in
             Preparing Data For Analysis, Section 4, of this workbook). If the notches of two box plots
             do not overlap, the median concentrations are statistically significantly different.
          -  Testing with notched box plots provides significance tests on the median  concentration value,
             not the mean.
       •  Most of these statistical methods can be performed with Microsoft Excel or SYSTAT, as well
         as many other statistical programs.
                                                                          StatSoft, Inc. (2005)
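
The two checks described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the national analyses: the function names are hypothetical, the p-value uses a normal approximation that is reasonable only for large groups, and the notch is computed with the commonly used approximation median +/- 1.57 x IQR / sqrt(n), which may differ from the SYSTAT calculation. The input data are randomly generated placeholders, not measurements.

```python
import math
import random
from statistics import NormalDist, mean, median, quantiles, variance


def welch_t_large_sample(a, b):
    """Return (t, p) for the difference in means of two groups.

    The two-sided p-value uses a normal approximation, so it is only
    appropriate when both groups are large (roughly >100 values each).
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


def median_notch(values):
    """Approximate 95% interval around the median (assumed notch formula)."""
    q1, _, q3 = quantiles(values, n=4)
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    m = median(values)
    return m - half_width, m + half_width


def notches_overlap(a, b):
    """True if the median notches of the two groups overlap."""
    lo_a, hi_a = median_notch(a)
    lo_b, hi_b = median_notch(b)
    return lo_a <= hi_b and lo_b <= hi_a


# Hypothetical, randomly generated "concentrations" for two years of data.
rng = random.Random(0)
year_1 = [rng.lognormvariate(0.0, 0.5) for _ in range(150)]
year_2 = [rng.lognormvariate(0.3, 0.5) for _ in range(150)]

t_stat, p_value = welch_t_large_sample(year_1, year_2)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("notches overlap:", notches_overlap(year_1, year_2))
```

The t-statistic addresses the difference in means, while the notch comparison addresses the difference in medians, so the two checks can disagree for skewed data.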


-------
        Characterizing  Temporal Patterns

                          Motivation

      To more fully understand potential contributing air
      toxics sources, analysts  may also wish to consider:
       - Diurnal patterns.  How does the daily cycle of air toxics
        concentrations relate to emissions and meteorology? Are
        diurnal patterns properly reflected in exposure models?
       - Day-of-week patterns. Does the weekly cycle of air toxics
        concentrations tell us anything about emissions sources?
       - Seasonal patterns. Do air toxics concentrations show
        seasonal patterns and do these patterns make sense with
        respect to what we know about formation, transport, and
        removal processes?
      Understanding diurnal, day-of-week, and  seasonal
      patterns may also help analysts understand potential
      biases in aggregated data, assess exposure, and
      evaluate models.

-------
                      Diurnal  Patterns

                              Overview

     • Air toxics data are not routinely collected on a subdaily basis; most
      data are reported as 24-hr averages. However, the PAMS program
      provides subdaily measurements of nine air toxics: acetaldehyde,
      benzene, ethylbenzene, formaldehyde, hexane, toluene, styrene,
      xylenes (three isomers), and 2,2,4-trimethylpentane. The diurnal
      variation of some air toxics is unknown because of data limitations.
     • Subdaily data allow us to:
       -  Evaluate diurnal variation.
       -  Understand general atmospheric processes (the physics, chemistry,
          and sources of air toxics).
       -  Assess the performance of models that are attempting to capture
          diurnal cycles.
       -  Provide input to receptor-based models.
     • Reasons to understand diurnal patterns include
       -  Assessing human exposure and health effects.
       -  Identifying local sources vs. regional transport.
       -  Contributing to an understanding of the physics and chemistry of air
          toxics.

-------
                          Diurnal  Patterns
                            Conceptual Model
      Daily concentrations are driven by dispersion (e.g., mixing height), sources (e.g., traffic patterns),
      sinks (e.g., oxidation by OH radical), and transport.
      Sources and transport from other areas increase concentrations at a monitor site, while sinks and
      dispersion reduce concentrations.
      The figure shows an example contribution of individual factors that commonly influence diurnal
      concentrations. The overall diurnal  pattern may be driven by a combination of these factors and
      may be conceptually estimated in the following manner:

                 Concentrations = (Sources - Sinks + Transport)/Dispersion

      [Figure: example hourly profiles (hours 0-23) of individual factors, with curves labeled Solar Radiation,
      Dispersion = Inverse Mixing Height, and Source = Traffic Activity.]

-------
                                   Diurnal   Patterns
                                          Approach (1 of 3)
           For the most valid diurnal patterns, the following data requirements are suggested:
            -  75% sampling completeness is recommended for each site, pollutant, and day (1) to ensure that data are
               representative of a full day and (2) to provide consistency with completeness requirements used to
               construct other aggregates (see Preparing Data for Analysis, Section 4).
            -  Other completeness criteria (daily, monthly, yearly) may be necessary to aggregate data from multiple
               sites, depending on the length of time for which data are available and the objectives of the analysis.
            -  The percent below detection should be tabulated for each pollutant and year. Initially, all data may be
               included regardless of the percent below detection.
            -  To investigate diurnal patterns, there must be a sufficient number of measurements of each pollutant and
               sampling hour to accurately assess the value. In initial national level analyses, a minimum of
               10 measurements for each air toxic and hour was set to try to include as many air toxics as possible in the
               analysis; more measurements are recommended if they are available.
            -  Data should be inspected on both a concentration and normalized basis for each available duration.
               Normalization enables a comparison of diurnal patterns among sites and pollutants even if pollutant
               concentrations vary widely.
            -  Data are normalized using the average concentration for each individual day, site, duration, and pollutant.
               To normalize data,
                   Calculate the average concentration by date, site, pollutant, and duration.
                   Divide the corresponding subdaily data by this average.
                   The resulting normalized values provide an indication of the magnitude of difference of the hourly concentration from the
                   average concentration for that day. A value of 1 indicates that the hourly concentration value is the same as the daily
                   average concentration. Values greater than one are greater than the average value (e.g., a value of 2 is 2 times greater
                   than the average value) while values less than one are lower than the average value (e.g., a value of 0.5 is half as large
                   as the average value).
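
The normalization steps above can be sketched as follows. This is a minimal illustration, assuming each record is a tuple of (site, pollutant, duration, date, hour, concentration); the record layout, the function name, and the sample values are hypothetical and are not taken from the workbook's data sets.

```python
from collections import defaultdict
from statistics import mean


def normalize_by_daily_mean(records):
    """Divide each subdaily value by the average for its date, site, pollutant, and duration.

    records: list of (site, pollutant, duration, date, hour, concentration) tuples.
    Returns a list of tuples in the same layout, with the concentration
    replaced by the normalized value.
    """
    groups = defaultdict(list)
    for site, pollutant, duration, date, hour, conc in records:
        groups[(site, pollutant, duration, date)].append(conc)
    daily_mean = {key: mean(values) for key, values in groups.items()}

    normalized = []
    for site, pollutant, duration, date, hour, conc in records:
        avg = daily_mean[(site, pollutant, duration, date)]
        if avg > 0:  # days with a zero average cannot be normalized
            normalized.append((site, pollutant, duration, date, hour, conc / avg))
    return normalized


# Hypothetical 3-hr samples for one site and day (values are illustrative only).
sample_records = [
    ("SiteA", "benzene", "3-hr", "2005-07-01", 2, 0.6),
    ("SiteA", "benzene", "3-hr", "2005-07-01", 5, 1.2),
    ("SiteA", "benzene", "3-hr", "2005-07-01", 8, 1.8),
    ("SiteA", "benzene", "3-hr", "2005-07-01", 11, 0.4),
]
for row in normalize_by_daily_mean(sample_records):
    print(row)
```

Completeness screening (for example, the 75% daily completeness recommendation) is not shown here and would need to be applied before the daily average is calculated.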

-------
                                Diurnal   Patterns
                                       Approach  (2 of 3)
          Subdaily measurements may be made on different sampling schedules which must be taken into
          account when aggregating multi-site data.
           -  Daily sampling schedules may differ between sites. For example, the sampling schedule for 3-hr
              measurements could begin at 12 a.m.,  1 a.m., or 2 a.m., potentially creating three staggered hourly patterns
              among sites.
           -  A visual representation of the possible 3-hr sampling schedules is shown in the figure below. The data points
              represent the sample start-time. The lines between points represent the duration of sample collection (3-hr).
              Subsequent sample lines are partitioned by shade for clarity.
           -  Diurnal analyses can be obscured by the different sample schedules when aggregating multi-site data if the
              number of samples for each hour is not the same across all hours. This issue is typically not a problem
              within a single agency's network, but needs to be considered when data from different jurisdictions are used
              (such as at the national scale).  Consider a hypothetical case in which Los Angeles sites used the
              2 a.m. sample schedule and  the rest of state used the 1 a.m. sample schedule.
           -  If one considers the first three hours of the day—the sample that  begins at 2 a.m. includes all three sampling
              schedules (i.e., all three samples overlap). For aggregating data  with multiple sampling schedules, we
              calculated a weighted average of the hour representing the middle of staggered sampling schedules (i.e.,
              2 a.m. sampling schedule for 3-hr duration) from  the raw data before completing the next steps.
           -  A detailed example will be examined in following  slides.
                                    Visual Representation of 3-hr Sampling Schedules
  [Figure: sample start times (points) and 3-hr sample durations (lines) by hour for three schedules:
  schedule starts at 2 a.m., schedule starts at 1 a.m., and schedule starts at 12 a.m.]
  Note: the figure is arbitrarily cut off at 2 p.m. (14) and does not represent the whole day.

-------
                      Diurnal  Patterns
                          Approach (3 of 3)
       Summary statistics may be generated by pollutant and hour for the
       concentration and normalized data sets.
       - It is useful to inspect various parameterizations of the data (e.g., 10th,
         50th, and 90th percentiles), especially when more than 50% of data is
         below detection.
       - Include the standard deviation or confidence interval as a measure of
         uncertainty in the data.
       Subdaily patterns can be visualized using line graphs of
       summary statistics with confidence intervals or notched box plots.
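
A minimal sketch of the hourly summary statistics described above, using only the Python standard library. The function name and input layout are hypothetical; the default minimum of 10 values per hour follows the threshold mentioned earlier in the approach, and the confidence interval is a normal-approximation interval for the mean, which is an assumption rather than a method prescribed by the workbook.

```python
import math
from collections import defaultdict
from statistics import mean, quantiles, stdev


def hourly_summary(samples, min_count=10):
    """Summarize (hour, value) pairs by hour.

    Hours with fewer than min_count values are skipped. Returns a dict keyed
    by hour with the count, 10th/50th/90th percentiles, mean, standard
    deviation, and an approximate 95% confidence interval of the mean.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)

    summary = {}
    for hour, values in sorted(by_hour.items()):
        n = len(values)
        if n < min_count:
            continue
        deciles = quantiles(values, n=10)  # nine cut points: 10th, 20th, ..., 90th
        avg = mean(values)
        sd = stdev(values)
        half_width = 1.96 * sd / math.sqrt(n)  # normal approximation
        summary[hour] = {
            "n": n,
            "p10": deciles[0],
            "p50": deciles[4],
            "p90": deciles[8],
            "mean": avg,
            "sd": sd,
            "ci95": (avg - half_width, avg + half_width),
        }
    return summary


# Hypothetical input: eleven values at hour 2 and a single value at hour 5.
example = [(2, float(v)) for v in range(1, 12)] + [(5, 1.0)]
for hour, stats in hourly_summary(example).items():
    print(hour, stats)
```

The same function can be applied to either the concentration or the normalized data set, since it only depends on the (hour, value) pairs passed in.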

-------
                             Diurnal  Patterns
                  Effect of Sampling Schedule (1 of 2)
                                                                         Table 2. Aggregated Measurements

      Aggregated Hour    Weighted Average Median Concentration (ug/m3)
             2                            0.738
             5                            0.739
             8                            0.927
            11                            0.580
            14                            0.482
            23                            0.839
                                    Weighted Average (WA) Formula:
                                      WA = \frac{\sum_i N_i C_i}{\sum_i N_i}
                                      N = Number of Measurements
                                      C = Concentration
                                    Example calculation, aggregated to the 2 a.m. sample schedule:
                                      WA = \frac{1}{66+66+64} \left[ 66(0.777) + 66(0.708) + 64(0.729) \right] = 0.738
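
The example calculation can be reproduced with a short sketch. The function names are hypothetical, and the mapping of start hours to an aggregated hour reflects one reading of the approach for 3-hr data (start hours 0, 1, and 2 are aggregated to hour 2, and so on); the measurement counts and concentrations are the three pairs from the example calculation on this slide.

```python
def aggregated_hour_3hr(start_hour):
    """Map a 3-hr sample start hour to the aggregated hour (0, 1, 2 -> 2; 3, 4, 5 -> 5; ...)."""
    return (start_hour // 3) * 3 + 2


def weighted_average(pairs):
    """Weighted average of concentrations, weighted by number of measurements.

    pairs: list of (n_measurements, concentration).
    """
    total_n = sum(n for n, _ in pairs)
    return sum(n * c for n, c in pairs) / total_n


# (N, C) pairs for the three sampling schedules aggregated to the 2 a.m. schedule.
pairs_2am = [(66, 0.777), (66, 0.708), (64, 0.729)]
wa = weighted_average(pairs_2am)
print(round(wa, 3))  # 0.738

# Start hours 0-5 and their aggregated hours.
print([(h, aggregated_hour_3hr(h)) for h in range(6)])
```

Weighting by the number of measurements keeps a schedule with few samples from dominating the aggregated value for that hour.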

-------
                         Diurnal  Patterns
                Effect of Sampling Schedule (2 of 2)
     The figures are a graphical representation of the
     calculations performed in the previous slide.
     (The data are not the same as those used in the
     previous slide.)
     Figure (a) shows the 10th, 50th, and 90th
     percentile of national 3-hr benzene data. The
     noise in this pattern is due to varying amounts of
     data available from three sampling schedules
     which begin at 12, 1, or 2  a.m. Sampling-
     schedule differences are typical when
     aggregating 3-hr or 4-hr measurements and can
     obscure diurnal patterns.
     Figure (b) shows the same data as a weighted
     average by the most representative hour.
     Averaging clarifies the diurnal pattern showing a
     morning peak trend  as would be expected for
     benzene concentrations at most sites.
     This averaging  method  is  recommended when
     aggregating multi-site data if multiple sampling
     schedules are used.
                        Benzene 3-hr Subdaily Data
                    [Figure: two panels, Raw Data (a) and Weighted Average (b), showing
                    concentration versus hour of day.]
                    Figures show the 10th, 50th and 90th percentile of
                    national 3-hr benzene data. They were created with
                    SYSTAT11 and Microsoft Excel.

-------
                         Diurnal  Patterns

                   Commonly Observed Patterns

        The figure shows a sample of four commonly observed diurnal patterns using national 3-hr
        duration data. The sources, sinks, transport, and dispersion leading to each pattern are
        discussed in this section. Data were normalized as described in the approach to diurnal patterns.
        [Figure: normalized concentrations versus hour of day (0-23), with curves labeled Midday Peak
        (photochemical peak), Nighttime Peak, Morning Peak, and Invariant.]

-------
                      Diurnal  Patterns
                            Morning Peak
       Morning peak patterns are observed
       from the combination of traffic emissions
       and mixing height dilution.
       The morning rush hour occurs while
       mixing heights are relatively low,
       causing a peak in concentration while
       emissions outweigh dilution.
       By mid-morning, mixing height dilution
       has outweighed traffic emissions,
       reducing  concentrations below their
       nighttime value and obscuring the
       remaining traffic emission patterns.
       Evening concentration increases are a
       consequence of mixing height lowering.
                    [Figure: notched box plot of concentration by hour (0-24).]
                    Figure shows notched box plot of m- & p-xylenes
                    concentrations by hour at an urban site. Box
                    plots are defined in Preparing Data for Analysis,
                    Section 4. Several years of data are included.
                    The plot was created with SYSTAT11.

-------
                        Diurnal  Patterns
[Figure: normalized concentration versus hour of day (0-23) for o-xylene, m- & p-xylene, n-hexane,
ethylbenzene, benzene, 1,3-butadiene, isopropylbenzene, toluene, and 2,2,4-trimethylpentane.
Annotation: these VOCs are emitted by motor vehicles.]
This figure shows 1990-2005 national hourly data normalized by site, pollutant, and day for
all pollutants that exhibited a morning peak pattern on the national scale. Data were
normalized as described in the approach to diurnal patterns.


-------
                         Diurnal  Patterns
                              Daytime Peak
        The daytime pattern is driven by in
        situ secondary photochemical
        production mechanisms and
        mirrors the pattern of solar
        radiation.
         - Precursors of afternoon peak
           pollutants are typically emitted by
           motor vehicle sources and OH
           sinks. Afternoon peak pollutants
           experience daily dilution patterns in
           a manner similar to morning peak
           pollutants.
         - Secondary production of a pollutant
           (such as formaldehyde) must
           outweigh all these factors in order to
           create the observed pattern.
                   [Figure: notched box plots of formaldehyde concentration versus hour (0-24).]
                   The figure shows notched box plots of national
                   3-hr formaldehyde concentrations by the middle
                   sampling schedule (as discussed in Slides 8-10).
                   The figure was created with SYSTAT11.

-------
                          Diurnal  Patterns
                                Evening Peak
      Mercury vapor is the only air
      toxic to exhibit a clear evening
      peak pattern in the air toxics
      investigated at the national
      level.  However, data from only
      a few sites were available so
      this analysis may  not be
      representative of a national
      pattern.
      Dilution appears to be the key
      factor affecting evening peak
      pollutants; emissions and sinks
      are likely invariant at the
      subdaily level.
      [Map: Mercury Vapor Monitoring Locations]
      [Figure: normalized concentration of mercury vapor versus hour of day (0-23).]
                                        1990-2005 national hourly mercury vapor data normalized by site,
                                        pollutant, and day. The figure was created with Microsoft Excel.

-------
                           Diurnal  Patterns
                                      Invariant
Invariant patterns are observed for global background pollutants (i.e.,
pollutant is no longer emitted).
These pollutants show no sources or sinks and are evenly distributed
worldwide so that transport and dilution have no effect on concentration.
[Map: Carbon Tetrachloride Monitoring Locations]
[Figure: normalized concentration of carbon tetrachloride versus hour of day.]
                                   The figure shows 1990-2005 national 3-hr carbon tetrachloride data
                                   normalized by site, pollutant, and day. Carbon tetrachloride is the
                                   only pollutant to exhibit an invariant diurnal pattern on the national
                                   scale. The figure was created with Microsoft Excel.

-------
                           Diurnal  Patterns
                           Seasonal Differences
Seasonal differences may be observed in the diurnal patterns of some air toxics.
For example, the diurnal pattern of formaldehyde on a national scale is highly affected by season, as seen
in Figures a and b, because the main production of formaldehyde depends on sunlight, which is less
abundant in winter months; thus, midday production decreases significantly during these months.
The diurnal pattern of benzene shows less seasonal dependence because it is driven by diurnal
meteorology that is consistent throughout the year and benzene is less photochemically reactive
(Figures c and d).
[Figures: four panels of diurnal patterns; recovered panel titles include Formaldehyde winter and
Formaldehyde summer.]
     Figures show summary statistics of national diurnal patterns for formaldehyde and benzene
     partitioned into summer and winter patterns. Figures were created with Microsoft Excel.

-------
                        Diurnal  Patterns
                                 Summary
        Diurnal patterns of air toxics are influenced by sources, sinks, and
        dispersion processes that vary on a subdaily basis.
        Diurnal patterns are useful in classifying source type, transport, and
        reactivity of air toxics. These patterns can be used to improve exposure
        modeling, air quality modeling, and emissions  inventories.
        Most air toxics data typically follow four diurnal patterns although many air
        toxics have not been characterized because of sampling and detection
        limitations.
        -  Morning peak. Driven by mobile source emissions and mixing height dilution
        -  Afternoon peak. Driven by secondary photochemical production
        -  Nighttime peak. Driven by mixing height dilution
        -  Invariant: Typical of global background pollutants that are not dependent on
           sources, sinks, transport, or dilution.
        If the diurnal pattern of a pollutant differs from  the typical patterns shown at
        a national level, the analyst should explore possible reasons for the
        variation such as the presence of a nearby source.

-------
                    Day-of-Week  Patterns
                 Overview  and Conceptual  Model
        Day-of-week patterns can be useful in
        identifying emissions sources.
        Expectations
         - Emission sources that operate every day,
           24 hours per day (e.g., refineries) will not
           show a day-of-week pattern.
         - Emission sources with lower emissions on
           weekends should lead to lower ambient
           weekend concentrations of the emitted air
           toxics. Traffic studies (e.g., Chinkin et al.,
           2003) show that in many cities, light-duty
           vehicle activity is lower on Sunday
           compared to other days of the week
           (Figure a).
         - Emission sources with higher emissions on
           weekends should lead to higher ambient
           weekend concentrations of the emitted air
           toxics. For example, studies in the Los
           Angeles area showed that recreational
           vehicle emissions may be higher on
           Saturdays (Figure b).
         [Figure a: light-duty vehicle activity by hour of day (0:00 to 20:00 shown) for the Los Angeles
         Interior Basin (Chinkin et al., 2003).]
         [Figure b: estimated allocation of residential emissions activity by day of week in Los Angeles
         (Coe et al., 2003). Categories: BBQ, Rec. Boats, Rec. Off-Rd RVs, Paint/Solvent.]
         Legend: Mon-Thurs, Friday, Saturday, Sunday.

-------
                    Day-of-Week  Patterns


                                     Approach


      •  Day-of-week patterns are typically constructed from 24-hr averages. See Preparing
         Data for Analysis, Section 4, for a complete description of how to construct valid
         averages.
          - If subdaily data are available, it is sometimes useful to look at data subsets. For example,
            when creating day-of-week trends of an air toxic that exhibits morning peak diurnal
            patterns, the rush hour peak data subset (i.e., 6 to 9 a.m.) will provide more information
            about the mobile source signature than the 24-hr average. Mobile source signatures
            typically show day-of-week patterns, while mixing height dilution will occur on any day of
            the week.  24-hr averages will be more heavily weighted by mixing height dilution and may
            obscure mobile source day-of-week trends.
      •  A sufficient number of records for each  day of the week is needed to create a
         representative day-of-week pattern. The actual data requirements will vary
         depending on the analysis types and  variability of the data, among other factors.
          - Statistically, decreasing the sample size widens the confidence interval (CI). In general,
            if the 95% CIs of two data subsets (e.g., weekend vs. weekday concentrations) do not
            overlap, there is good evidence that the subset population means are different. Because
            smaller samples produce wider CIs, statistically significant patterns are more difficult
            to discern with smaller sample sizes.
          - Quantify patterns using the statistical treatments described earlier in this section.
      •  Investigate the day-of-week pattern of multiple statistics (e.g., 10th, 50th, and 90th
         percentile) with the standard deviation or confidence intervals as a measure of
         uncertainty.
      •  If data are  insufficient for each day to determine a pattern, weekday vs. weekend
         patterns may be investigated.
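
      The weekday vs. weekend comparison described above can be sketched in a few
      lines of Python. This is only an illustration, not the workbook's procedure:
      the function names are hypothetical, the input is assumed to be a list of
      (date, 24-hr concentration) pairs that already passed the Section 4 validity
      checks, and the 95% CI uses a normal approximation (z = 1.96), which is
      optimistic for small samples where a t-value would be more appropriate.

```python
from datetime import date
from math import sqrt
from statistics import mean, stdev


def weekday_weekend_split(records):
    """Split (date, concentration) pairs into weekday and weekend lists."""
    weekday, weekend = [], []
    for day, conc in records:
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        (weekend if day.weekday() >= 5 else weekday).append(conc)
    return weekday, weekend


def mean_ci95(values):
    """Return (mean, lower, upper) with a normal-approximation 95% CI.

    Assumption: z = 1.96 is used for simplicity; fewer samples widen the CI.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a CI")
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(n)
    return m, m - half_width, m + half_width


def intervals_overlap(a, b):
    """True if the (mean, lower, upper) intervals a and b overlap."""
    return a[1] <= b[2] and b[1] <= a[2]
```

      A typical use would be to call weekday_weekend_split on one site's valid
      24-hr values, compute mean_ci95 for each group, and treat non-overlapping
      intervals as evidence of a weekday vs. weekend difference.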


-------
             Day-of-Week  Patterns (1 of 2)
                              Example
       In Figure (a), benzene concentrations at an
       urban site are statistically significantly lower
       on Sunday.  The concentrations on
       Saturday seem slightly lower, but
       differences are not statistically significant.
       These results are consistent with our
       conceptual model of light-duty vehicle
       traffic.
       For carbon tetrachloride (Figure b), we
       expect concentrations to be the same every
       day.  The central tendencies of the
       concentrations at the same site are
       consistent.
           The figures show notched box plots of 24-hr concentrations by day
           of week at selected sites. They were created with SYSTAT11.
           [Figures: notched box plots of 24-hr concentrations by day of week
            for (a) benzene and (b) carbon tetrachloride.]

-------
             Day-of-Week  Patterns
                                Example
       Sometimes, not enough data are
       available to determine patterns by
       day of week—in some cases, the
       data can be combined into weekday
       vs. weekend groups.
       In the example, benzene
       concentrations at an urban site are
       lower on weekends than on
       weekdays (the difference in medians
       is statistically significant). These
       findings make sense because of the
       urban location of the monitor and
       lower motor vehicle emissions on the
       weekend compared to weekdays.
       The inspection of day-of-week
       patterns of all air toxics was not
       performed at a national level.
-------
               Day-of-Week Patterns

                            Summary

      Typically, mobile source air toxics show the most obvious day-of-
      week pattern consistent with traffic patterns. Sunday
      concentrations were particularly low for most mobile source air
      toxics, a pattern consistent with reduced traffic.
      In general, day-of-week patterns can be difficult to discern due to
      interference from other sources, sinks, or meteorology.
      A low number of samples can obscure underlying patterns.
      In exploratory investigations of national-level data, few non-mobile
      source air toxics showed a clear day-of-week pattern.
      Note that day-of-week patterns  are highly dependent on the
      proximity of the monitor's site to sources, the emission sources'
      schedule, and meteorology (e.g., wind direction); site-level
      examinations may provide a better explanation.

-------
                  Seasonal Patterns

                           Overview

     Understanding seasonal differences in air toxics
     concentrations helps analysts
      • Formulate or evaluate a conceptual model of emissions,
       formation, removal, and transport of an air toxic.
      • Better understand source types.
      • Continue to validate data, i.e., do data meet expectations for
       seasonal variation?
      • Construct and interpret annual averages when a season's data
       are missing from the average (e.g., if the data for a winter quarter
       are missing, what biases in the annual average can be
       expected?).

-------
                        Seasonal Patterns
                            Conceptual Model
        Cool season expectations
         - Mixing heights are lower in the cold months.  Low mixing heights leave less air
           available for pollutant dispersion, which causes higher ambient concentrations.
         - Temperatures are lower and sunlight is reduced in cold months. This
           combination can lead to a reduction in evaporative emissions (e.g., gasoline)
           and reduced photochemistry. Reductions in temperature and sunlight also limit
           formation of hydroxyl radicals which efficiently oxidize many air toxics.
         - Typically more precipitation occurs during winter months and reduces dust
           emissions.
        Warm season expectations
         - Mixing heights are higher in warm months, allowing more dilution and transport
           of air toxics which, in turn, reduces ambient concentrations.
         - Higher temperatures and increased sunlight in warm months lead to an increase
           in evaporative emissions and photochemistry.
         - Conditions are typically drier, producing more dust.
         - Wildfire activity can also cause an increase in concentrations of pollutants
           emitted in smoke.

-------
                           Seasonal  Patterns
                                 National  Trends
         Seasonal patterns observed
         at a national level are
         shown in the table.
         These air toxics were
         selected because they were
         the ones with sufficient data
         for analyses.
          - Minimum of three valid
            seasonal averages by site
            and year
          - At least 20 monitoring sites
            meeting the above criteria
          - Additionally, limited to
            pollutants investigated in
            diurnal variability and
            annual analyses to focus
            on similar pollutants.
         Most of the VOCs, with the
         exceptions of styrene and
         isopropylbenzene, are cool
         season pollutants as
         expected.
         We are not sure why
         carbon tetrachloride shows
         a warm season peak—we
         expected it to be invariant.
         No obvious data issues
         suggested this pattern.
Pollutant Name            Pattern         Number     Median   Median annual
                                          of sites   CV       concentration (µg/m3)
1,3-Butadiene             Cool            195        0.38     0.16
n-Hexane                  Cool            159        0.30     0.88
2,2,4-Trimethylpentane    Cool            119        0.29     0.51
m- & p-Xylene             Cool            256        0.29     1.10
Tetrachloroethylene       Cool            137        0.29     0.26
Toluene                   Cool            137        0.29     2.38
o-Xylene                  Cool            261        0.28     0.46
Ethylbenzene              Cool            262        0.28     0.42
Benzene                   Cool            306        0.27     1.03
Lead TSP                  Cool            149        0.25     0.018
Dichloromethane           Cool            187        0.25     0.44
Styrene                   Indeterminate   207        0.33     0.16
Isopropylbenzene          Indeterminate   91         0.31     0.068
Methyl Chloroform         Invariant       89         0.12     0.15
Chloromethane             Warm            245        0.09     1.20
Carbon Tetrachloride      Warm            240        0.09     0.56
Nickel TSP                Warm            44         0.20     0.0026
Manganese TSP             Warm            71         0.20     0.015
Chromium TSP              Warm            61         0.21     0.0039
Acetaldehyde              Warm            163        0.21     1.65
Propionaldehyde           Warm            112        0.27     0.28
Chloroform                Warm            102        0.29     0.123
1,4-Dichlorobenzene       Warm            97         0.32     0.19
Formaldehyde              Warm            163        0.36     2.75
                                                                     McCarthy et al., 2007

-------
                       Seasonal  Patterns

                                   Approach

      •  Investigate seasonal variability patterns using normalized monthly and/or
        quarterly averages.
         -  See Preparing Data for Analysis, Section 4, for a complete description of how to
            construct valid monthly and quarterly averages.
         -  Quarterly averages may be calendar quarters or seasonal quarters depending on
            the aim of analyses.
      •  Keep track of the percentage of data below detection; pollutants and years with
        >85% of data below detection result in too much  bias to draw conclusions.
      •  Preferably, inspect monthly data for seasonal patterns if sufficient data are
        available.
         -  Noise in monthly data may be high due to fewer measurements.  For this reason,
            investigating quarterly (or specific monthly groupings relevant to the site) data in
            addition to monthly data can be useful.
         -  Area-specific seasonal aggregations can  be made.
      •  Normalize the data using the average value for each year, site, and pollutant.
         -  Calculate an annual average for each year, site, and pollutant.
         -  Divide the corresponding monthly or quarterly average by the annual average.
      •  Investigate seasonal patterns of normalized data using notched box plots or
        summary statistics with a measure of confidence (e.g., standard deviation or
        confidence intervals).
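
      The normalization step above can be sketched as follows. This is a minimal
      illustration under stated assumptions, not the workbook's implementation: the
      function name and the dictionary layout are hypothetical, the monthly averages
      are assumed to be valid per Section 4, and the annual average is taken here as
      the simple mean of the available monthly averages rather than from the
      Section 4 completeness rules.

```python
from collections import defaultdict
from statistics import mean


def normalize_monthly(monthly):
    """Divide each monthly average by its site/pollutant/year annual average.

    monthly: dict mapping (site, pollutant, year, month) -> monthly average.
    Returns a dict with the same keys mapping to normalized (unitless) values.
    """
    # Group monthly averages by site, pollutant, and year.
    groups = defaultdict(list)
    for (site, pollutant, year, _month), value in monthly.items():
        groups[(site, pollutant, year)].append(value)

    # Assumption: annual average = mean of the available monthly averages.
    annual = {key: mean(values) for key, values in groups.items()}

    normalized = {}
    for (site, pollutant, year, month), value in monthly.items():
        annual_avg = annual[(site, pollutant, year)]
        if annual_avg > 0:  # skip groups that cannot be normalized
            normalized[(site, pollutant, year, month)] = value / annual_avg
    return normalized
```

      A normalized value of 1.0 means the month matches the annual average, so
      values from sites with very different absolute concentrations can be pooled
      into one box plot per month.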


-------
                    Seasonal Patterns
           Using Normalized National-Scale Data
       To illustrate the use of
       normalized data, consider the
       monthly patterns of
       propionaldehyde and
       formaldehyde, both of which
       show concentrations that
       appear higher in summer
       (Figures a and b).
       However, normalized
       concentration patterns
       (Figures c and d) show that
       the monthly pattern of
       formaldehyde is more
       pronounced than that of
       propionaldehyde.
       On a relative basis, Figures c
       and d show that concentrations
       of formaldehyde are nearly
       three times higher in the
       summer than in winter.
       [Figures (a)-(d): monthly distributions of propionaldehyde and
        formaldehyde concentrations, (a) and (b) as measured and (c) and (d)
        normalized; x-axis: month.]

-------
                   Seasonal  Patterns
                       Cool Season Peak
      Cool seasonal patterns are generally observed because mixing
      heights are lower in winter and the enhanced removal by
      photooxidation observed during summer is absent.
      Heating-related emissions, such as wood burning, will typically be
      higher during winter months, contributing to increased concentrations
      of some air toxics.
       Benzene and 1,3-butadiene, two mobile source air toxics, show cool
       season peaks on the national scale.
       [Figures: normalized concentration by month for benzene and
        1,3-butadiene.]
       Figures show normalized monthly national concentration
       distributions for 2003-2005. Figures were created with SYSTAT11.

-------
                   Seasonal  Patterns
                      Warm Season Peak
To display a warm peak pattern, summertime
sources (emissions or secondary production)
must significantly outweigh the higher mixing
heights that occur during warm months.
Chloroform emissions from water treatment
processes and swimming pools may be
enhanced during summer months, explaining
the observed pattern.
It has been estimated that 85-95% of
formaldehyde concentrations originate from
secondary photochemical production, which
supports the observed warm season peak
(Grosjean et al., 1983).
[Figures: normalized concentration by month for chloroform and formaldehyde.]
Figures show normalized monthly national concentration
distributions for 2003-2005. Figures were created with SYSTAT11.
-------
                        Seasonal  Patterns
                         A  National Perspective
        The figure shows the 10th, 50th,
        and 90th percentiles of national
        2003-2005 normalized seasonal
        concentrations for selected
        pollutants by calendar quarter.
        Similar plots, such as regional
        summaries, can be prepared for
        any combination of sites.
        Parameters at the top of the figure
        show warm season peaks while
        those at the bottom show cool
        season peaks.
        Warm season peaks are likely due
        to secondary photochemical
        production and dust; it is unclear
        why carbon tetrachloride shows a
        warm season peak.
        Cool season peaks are primarily
        due to lower mixing heights in the
        winter.
        [Figure: 10th-90th percentile ranges and 50th percentiles of normalized
         concentration by calendar quarter (1st through 4th) for, from top to
         bottom, formaldehyde, manganese TSP, acetaldehyde, carbon tetrachloride,
         dichloromethane, lead TSP, benzene, m- & p-xylene, tetrachloroethylene,
         and 1,3-butadiene. Figure created with Grapher.]

-------
                        Seasonal  Patterns

                                     Summary


        Three seasonal patterns were observed at a national level
         -  Warm season peak. Photochemical production of secondary air toxics (e.g., formaldehyde
            and acetaldehyde) can be important at some sites. Concentrations (e.g., manganese) may
            also be high because of dust events and seasonally increased emissions (e.g.,
            chloroform).
         -  Cool season peak. Concentrations can be high because of lower inversions, changes in
            emissions through the use of wood-burning or fuel oil for home heating, and reduced
            photochemical reactivity.
         -  Invariant. Invariant seasonal patterns are not commonly observed, but are typical of global
            background pollutants that are not affected by emissions changes or dilution which cause
            seasonal patterns of other air toxics.
        For many air toxics, data quality was low or seasonal patterns were inconsistent at
        the national level; site-level investigations may reveal additional seasonal patterns.
        Seasonal patterns assist in air toxics data analysis by providing insight into the
        chemistry, sources, and transport of air toxics. Deviation from expected seasonal
        patterns at a site may indicate additional sources of interest or transport.

-------
                      Spatial  Patterns
                              Overview
       Air toxics data are typically collected in urban locations.  Given the
       large number of air toxics, their often disparate sources, and the
       wide range of chemical and physical properties, understanding
       spatial patterns and gradients is important.
       Understanding these gradients may help us
        - Improve monitoring networks. (Are we measuring in the right places to
          meet network objectives? Do we have the right number of monitors?)
        - Improve emission inventories. (How finely do emissions need to be
          spatially allocated?)
        - Improve models, including exposure models. (Are gradients in
          pollutants being properly represented in the model?)
        - Identify contributing sources.  (Are concentrations higher when winds
          are predominantly from the direction of a source?)

-------
                     Spatial  Patterns
                       Conceptual Model
     The concentration of a given species at any location is determined by
     local production, local sinks, and transport.
      •  Production.  Local emissions—higher emissions lead to higher
        concentrations.
      •  Loss. Local removal (chemical or deposition)—reactive compounds and
        large particles are removed faster resulting in lower concentrations.
      •  Transport.  Movement of species in the atmosphere—pollutants from
        sources are dispersed or diluted; local concentrations can either increase
        or decrease.


       \[ \frac{d(\mathrm{Concentration})}{dt} = \mathrm{Production} - \mathrm{Loss} + \mathrm{Transport} \]

-------
                               Spatial  Patterns
                                         Methods
       •  To investigate spatial patterns, calculate one site average value for each air toxic for the
         time period of interest.  This method removes temporal variability and focuses on spatial
         patterns.
          -   The method is only valid if sites are temporally comparable. If not, results may be driven
              by a mixture of temporal and spatial patterns and will be difficult to interpret.
          -   Averages should be constructed from valid aggregates. For example, if data are available
              for 2003-2005, you might first calculate the three valid annual averages then aggregate
              these averages to one site average. If data are not sufficient to create valid annual
              averages use valid seasonal or monthly averages.  Note that site average values may be
              biased by temporal patterns if data are not representative of the full year.  Relative spatial
              comparisons are still valid as long as data are available for all sites during the same time
              period.
          -   If possible, multiple years of data should be used in order to mitigate meteorological
              effects.
          -   Keep track of the percent of data below detection for each site average.
       •  Visualize concentration ranges by plotting summary statistics for each pollutant.
          -   These plots give an overview of concentration values.
          -   Supplementary data, such as levels of concern for increased cancer or noncancer risk
              (i.e., health levels of concern), remote  background concentrations, and method detection
              limits (MDLs), are useful to put concentration data into  perspective.
       •  Visualize site level concentrations using  a mapping program to overlay supplementary
         data, such as the  percent of data below detection, to enrich conclusions.
       •  The visualization methods may illuminate site-level data anomalies which become
         apparent upon comparison to other sites.
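
       The site-average construction described above can be sketched as follows.
       This is an illustration only: the function names and data layout are
       hypothetical, the annual averages are assumed to be valid per Section 4, and
       temporal comparability is enforced here in the simplest possible way, by
       keeping only sites that have a valid annual average for every required year.

```python
from statistics import mean


def comparable_site_averages(annual_by_site, required_years):
    """Aggregate valid annual averages into one site average per site.

    annual_by_site: dict mapping site -> {year: valid annual average}.
    Sites missing any required year are dropped so that the remaining
    sites are temporally comparable (an assumption of this sketch).
    """
    result = {}
    for site, by_year in annual_by_site.items():
        if all(year in by_year for year in required_years):
            result[site] = mean(by_year[year] for year in required_years)
    return result


def percent_below_detection(samples):
    """Percent of (concentration, mdl) samples with concentration below MDL."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples")
    below = sum(1 for conc, mdl in samples if conc < mdl)
    return 100.0 * below / len(samples)
```

       Carrying the percent below detection alongside each site average makes it
       easy to overlay on a map or to flag averages that are driven mostly by
       substituted values.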



-------
              National  Concentration  Plots

                                 Overview

     • To put air toxics concentrations measured at a site or sites in perspective,
       a summary of the typical national concentration ranges is useful.
     • The following national site average concentrations for 2003-2005 air
       toxics concentrations exemplify one way of visualizing summary statistics
       and supplementary data.
        - Are concentrations high, typical, or low?
        - How does this concentration compare to  remote background? To MDL? To
          levels of concern?
     • The following figures show the 5th, 25th, 50th (median), 75th, and 95th
       percentiles of site average concentrations by pollutant; supplementary
       data are then overlaid as a progression. Wide ranges in concentration
       across sites indicate greater spatial variability of that pollutant.
     • The number of sites included is shown on the right axis for each
       pollutant.
     • Pollutants outlined in red represent <15% of samples nationally above
       their respective MDLs. The distributions of concentrations for these
       pollutants are mostly based on MDL/2 and should not be considered
       quantitative. Data used for these plots are included in Preparing Data for
       Analysis, Section 4.
     • All perspective plots were created in Grapher.
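
     The percentile summary behind these plots can be sketched as follows. This is
     a hedged illustration, not the method used to produce the workbook figures:
     the function names are hypothetical, and the percentile interpolation rule
     (Python's "inclusive" method) is an assumption, since the workbook does not
     state which rule was used.

```python
from statistics import quantiles


def percentile_summary(site_averages):
    """Return the 5th, 25th, 50th, 75th, and 95th percentiles of site averages."""
    if len(site_averages) < 2:
        raise ValueError("need at least two site averages")
    # n=20 yields 19 cut points in 5% steps: index 0 is the 5th percentile.
    cuts = quantiles(site_averages, n=20, method="inclusive")
    return {5: cuts[0], 25: cuts[4], 50: cuts[9], 75: cuts[14], 95: cuts[18]}


def spatial_range_factor(summary):
    """Ratio of the 95th to the 5th percentile, a rough spatial variability factor."""
    return summary[95] / summary[5]
```

     A single site's average can then be compared with the returned percentiles to
     judge whether it is low, typical, or high relative to the national range.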


-------
                 National  Concentration  Plots
                              Interpretation
[Figure: 5th-95th percentile range, 25th-75th percentile range, and median of
2003-2005 site average concentrations (µg/m3, log scale) by pollutant, with
the number of sites on the right axis: 1,1,2,2-tetrachloroethane (228),
1,1,2-trichloroethane (211), 1,2-dichloropropane (229), 1,3-butadiene (278),
1,4-dichlorobenzene (202), acetaldehyde (163), acrylonitrile (124),
benzene (307), benzyl chloride (110), carbon tetrachloride (280),
ethylene dibromide (235), ethylene dichloride (253), ethylene oxide (16),
hexachlorobutadiene (153), tetrachloroethylene (273).]
                                Summary plots provide an
                                overview of the spatial variability
                                of, and a comparison within and
                                between, air toxics. Spatial
                                variability is represented by the
                                width of the bar—nationally, air
                                toxics concentrations typically
                                varied by a factor of 3 to 10.
                                The figure shows the high spatial
                                variability of 1,3-butadiene.  This
                                variability is due to the relatively
                                high reactivity of the compound.
                                Conversely, carbon tetrachloride
                                shows less spatial variability due
                                to its low removal rate from the
                                atmosphere and the absence of
                                domestic emissions.
                                A table of national concentration
                                summary statistics can be found
                                in the appendix to Preparing Data
                                for Analysis, Section 4.
          Data outlined in red has < 15% of measurements above detection

-------
                       National  Concentration  Plots
                                Adding MDLs
[Figure: the preceding plot of 2003-2005 site average concentration ranges
(5th-95th and 25th-75th percentiles and median; µg/m3, log scale) for the same
air toxics, with the median site average MDL and the minimum-maximum range of
2003-2005 site average MDLs overlaid.]
MDL ranges (thin lines) and median
MDLs (X's) are added to the plot to
illustrate how well pollutants are
monitored.
The minimum-maximum range of
MDL concentrations and the median
MDL concentration  for a 2003-2005
site average are shown.
The median concentrations of the
pollutants outlined in red are always
below their median MDLs.  These
pollutants are not adequately
monitored in the national ambient
monitoring networks (i.e., only a few
sites have >15% of data above
detection).

-------
                        National  Concentration  Plots
                                 Risk Levels
[Figure: the preceding plot of 2003-2005 site average concentration ranges and
MDLs for the same air toxics, with the 1x10^-6 cancer benchmark (EPA OAQPS)
and the noncancer reference concentration (EPA OAQPS) overlaid.]
Chronic exposure concentrations
associated with a 1-in-a-million cancer
risk (red crosses) and noncancer
reference concentrations (red
diamonds) are added to the plot to
show a relationship to human health.
National measured annual average air
toxics concentrations are usually above
the chronic exposure concentration
associated with a 1-in-a-million cancer
risk and below noncancer reference
concentrations.
Note that the pollutant concentration
ranges outlined in red may actually be
below levels of concern, but the data
are not resolved well enough to
characterize risk.

-------
                        National Concentration Plots
[Figure: national concentration plot for 1,1,2,2-tetrachloroethane, 1,1,2-trichloroethane,
1,2-dichloropropane, 1,3-butadiene, 1,4-dichlorobenzene, acetaldehyde, acrylonitrile, benzene,
benzyl chloride, carbon tetrachloride, ethylene dibromide, ethylene dichloride, ethylene oxide,
hexachlorobutadiene, and tetrachloroethylene. X-axis: Concentration (µg/m3), log scale from
0.0001 to 10000. Legend: 5th-95th percentile range, 25th-75th percentile range, and median of
2003-2005 site average concentrations; median and minimum-maximum range of 2003-2005 site
average MDL; 1x10-6 cancer benchmark (EPA OAQPS); noncancer reference concentration
(EPA OAQPS); remote background concentration (McCarthy et al., 2006).]
 Remote  Background

Remote background concentrations
(triangles) are added to the plot to
show the lowest levels expected to
be seen in the remote atmosphere;
urban concentrations of most air
toxics should not typically fall below
this value.
As expected, most air toxics  are a
factor of 5-10 above their remote
background  concentrations, with the
exception of carbon tetrachloride -
the only air toxic dominated by
background  concentrations.
Background  estimates are provided
for about 40  air toxics (see
Preparing Data for Analysis,
Section 4).
            Data outlined in red has < 15% of measurements above detection

-------
                        National Concentration Plots
[Figure: national concentration plot for 1,1-dichloroethane, 1,4-dioxane, 3-chloropropene,
bromoform, dichloromethane, formaldehyde, methyl tert-butyl ether, trichloroethylene, and
vinyl chloride. X-axis: Concentration (µg/m3), log scale from 0.001 to 10000. Legend:
5th-95th percentile range, 25th-75th percentile range, and median of 2003-2005 site average
concentrations; median and minimum-maximum range of 2003-2005 site average MDL; 1x10-6
cancer benchmark (EPA OAQPS); noncancer reference concentration (EPA OAQPS); remote
background concentration (McCarthy et al., 2006).]
Additional VOCs

These VOCs are usually below their
1-in-a-million cancer risk level and
noncancer reference
concentrations.
Note that the 1-in-a-million cancer
risk level for formaldehyde was
changed in 2004 from 0.08 to 182
µg/m3. The 1-in-a-million cancer risk
levels plotted are provided by EPA
OAQPS.
See the NATA website for more
information  regarding risk
characterization,
http://www.epa.gov/ttn/atw/nata1999/nsata99.html.
For example,  analysts can
investigate the potential for health
effects from air toxics by target
organ/system.
            Data outlined in red has < 15% of measurements above detection

-------
                        National Concentration Plots
                                                                        SVOCs*
[Figure: national concentration plot for benzo[a]pyrene, benzo[b]fluoranthene,
benzo[k]fluoranthene, dibenzo[a,h]anthracene, and indeno[1,2,3-cd]pyrene (each shown both with
and without a PM10 designation), plus benzo[a]anthracene, chrysene, and naphthalene.
X-axis: Concentration, log scale from 1E-008 to 10. Legend: 5th-95th percentile range,
25th-75th percentile range, and median of 2003-2005 site average concentrations; median and
minimum-maximum range of 2003-2005 site average MDL; 1x10-6 cancer benchmark (EPA OAQPS);
noncancer reference concentration (EPA OAQPS).]
                                                                         The figure indicates that most
                                                                         SVOCs are below their 1-in-a-
                                                                         million cancer risk level. However,
                                                                         the data quality for many SVOCs
                                                                         is poor—less than 15% of
                                                                         measurements are above the
                                                                         detection limit.
                                                                         Only naphthalene is above its 1-in-
                                                                         a-million cancer risk level at most
                                                                         sites.
                                                                         Routine  measurements of SVOCs
                                                                         are relatively rare across the
                                                                         United States.
                                                                     * semi-volatile organic compounds
            Data outlined in red has < 15% of measurements above detection

-------
                       National Concentration Plots
[Figure: national concentration plot for arsenic (PM2.5, PM10, TSP), beryllium (PM10, TSP),
cadmium (PM2.5, PM10, TSP), and nickel (PM2.5, PM10, TSP). X-axis: Concentration (µg/m3),
log scale from 1E-006 to 1. Legend: 5th-95th percentile range, 25th-75th percentile range,
and median of 2003-2005 site average concentrations; median and minimum-maximum range of
2003-2005 site average MDL; 1x10-6 cancer benchmark (EPA OAQPS); noncancer reference
concentration (EPA OAQPS); remote background concentration (McCarthy et al., 2006).]
                              Metals
All metals are well below their
noncancer reference concentrations.
With respect to the 1-in-a-million cancer
risk level, arsenic is the most
important of these metals, with more
than 75% of sites measuring
concentrations above the 1-in-a-million
cancer risk level for PM2.5.
PM2.5 metals are more commonly
measured in rural and remote
locations via the IMPROVE network;
therefore, the lower range of PM2.5
concentrations commonly overlaps
remote background concentrations.
Only four metals could clearly be
shown in one figure (monitoring data
are available for many more); ranges
for other metals can be found in the
appendix to Preparing Data for
Analysis, Section 4.
           Data outlined in red has < 15% of measurements above detection

-------
            National  Concentration Plots

                           Summary

      The national concentration plots provide perspective for
      local, state, regional, and tribal analysts to see how their
      data compare. A full list of the concentrations shown in the
      plots is provided in Preparing Data for Analysis, Section 4.
      Air toxics concentrations typically vary spatially by a factor
      of 3 to 10, depending on the pollutant.
      Almost all air toxics are below noncancer reference
      concentrations (except acrolein, not shown).
      At a national level, some air toxics are above their
      respective chronic exposure concentration associated with
      a 1-in-a-million cancer risk
      (http://www.epa.gov/ttn/atw/toxsource/table1.pdf).
      Most air toxics are well above their remote background
      concentrations.
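
      As an illustration of how the plotted ranges could be derived, the sketch below
      computes the 5th, 25th, 50th, 75th, and 95th percentiles of a set of site average
      concentrations for one pollutant, plus the ratio of the 95th to the 5th percentile as
      one rough measure of spatial spread. The function name, the example values, and the
      choice of the "inclusive" interpolation method are assumptions made for this sketch;
      the workbook does not specify how the percentiles were calculated.

```python
from statistics import median, quantiles


def summarize_site_averages(site_averages):
    """Return percentile statistics for one pollutant's site average concentrations."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1 through 99.
    cuts = quantiles(site_averages, n=100, method="inclusive")
    return {
        "p05": cuts[4],
        "p25": cuts[24],
        "median": median(site_averages),
        "p75": cuts[74],
        "p95": cuts[94],
    }


# Hypothetical 2003-2005 site average concentrations (ug/m3), not real data.
example = [0.4, 0.6, 0.8, 1.0, 1.1, 1.3, 1.6, 2.0, 2.9, 4.5]
summary = summarize_site_averages(example)

# Ratio of the 95th to the 5th percentile as a simple spatial-spread indicator.
spread = summary["p95"] / summary["p05"]
print(summary, round(spread, 1))
```

      An analyst could place a local site average against these percentiles to see where it
      falls in the national distribution.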

-------
               Spatial  Patterns - Maps

                                Overview

    •  National concentration plots placing air toxics in a national context provide
      useful information for quantifying air toxics spatial variability. To view spatial
      patterns, though, it is also useful to plot site-level data on a map.
    •  Example maps of site average and risk-weighted concentrations (i.e., risk
      estimates based on ambient measurements) from 2003 through 2005 are
      shown in the following slides. These maps help analysts characterize the
      national picture of air toxics and are most useful in a qualitative sense to
      compare among sites, look for spatial patterns, and note data anomalies. The
      maps also illustrate a method of displaying data that can be applied to sites
      within a city, state, or region.
    •  In the examples, concentrations are displayed as proportional symbols which
      are color-coded to impart additional information.
    •  Maps are useful for communicating a range of information—similar depictions
      can be made using risk-weighted concentrations, percent change per year, or
      ratios—over a range of spatial dimensions (e.g., city, state, or region).
    •  The magnitude of the concentration is indicated on the maps by the diameter of the
      circle (the three sizes in the map legends), while the underlying percent of data
      below detection is signified by color. All maps were created with ESRI's
      ArcMap software.
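
      The symbol scheme described above can be sketched in a few lines. The color classes
      follow the three legend categories used on the maps (< 50%, 50 to 85%, and > 85% below
      detection). The size rule, which assigns each site to the largest legend break that
      does not exceed its concentration, is a simplifying assumption for this sketch; the
      original maps were drawn in ArcMap with proportional symbols, and the function names
      here are hypothetical.

```python
import bisect

# Legend breaks in ug/m3, taken from the benzene map legend (0.1, 1, 10).
LEGEND_BREAKS = (0.1, 1.0, 10.0)


def detection_class(pct_below_detection):
    """Map the percent of data below detection to one of the three legend classes."""
    if pct_below_detection < 50:
        return "< 50% below detection"
    if pct_below_detection <= 85:
        return "50 to 85% below detection"
    return "> 85% below detection"


def size_class(concentration, breaks=LEGEND_BREAKS):
    """Return 0, 1, or 2: index of the largest legend break not above the value.

    Values below the smallest break fall into class 0 (assumed convention).
    """
    idx = bisect.bisect_right(breaks, concentration) - 1
    return max(idx, 0)


# Hypothetical site: 2.3 ug/m3 average with 12% of samples below detection.
print(size_class(2.3), detection_class(12))
```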


-------
                 Spatial  Patterns -  Maps
              Benzene Concentrations  2003-2005
       The largest circle on the map corresponds to 17 µg/m3.
       [Map legend: circle size shows concentration (0.1, 1, and 10 µg/m3); circle color shows
       the percent of data below detection (< 50%, 50 to 85%, or > 85%).]
     The map shows that benzene is measured above detection across the country with only a few
     exceptions (i.e., at most sites, fewer than 50% of the measurements are below detection).
     Concentrations are consistent for areas dominated by mobile sources (e.g., the Northeast and
     California) while isolated high concentrations generally coincide with significant point source emissions
     of benzene such as refineries and coking operations.
     Sites that show unusually high concentrations with no clear emissions sources, or sites with
     concentrations that are very different from other sites (e.g., the yellow circles in the map above), might
     be further investigated to determine the cause.

-------
                 Spatial  Patterns -  Maps
                1,3-Butadiene Concentrations 2003-2005
       The largest circle on the map corresponds to 6.6 µg/m3.
       [Map legend: circle size shows concentration (0.1, 1, and 10 µg/m3); circle color shows
       the percent of data below detection (< 50%, 50 to 85%, or > 85%).]
    The ability to obtain 1,3-butadiene concentration measurements above the MDL across the United
    States varies (note all the red circles and their varying sizes).
    Higher concentrations generally coincide with locations of known point source emissions.
    Differences in monitoring methods and methods application have resulted in large differences in reported
    MDLs across the United States.

-------
                 Spatial  Patterns  -  Maps
                Arsenic PM2.5 Concentrations 2003-2005
   The largest circle on the map corresponds to 0.0054 µg/m3.
   [Map legend: circle size shows concentration (µg/m3); circle color shows the percent of
   data below detection (< 50%, 50 to 85%, or > 85%).]
      Arsenic concentrations are widely measured across the United States, and the entire range of data
      availability is observed from more than 50% of data above detection to less than 15% above detection.
      Significant MDL differences between networks make determining spatial patterns difficult.
      In general, concentrations are higher and more often above detection in the eastern half of the country.

-------
                Spatial  Patterns - Maps
             Manganese PM2.5 Concentrations 2003-2005
   The largest circle on the map corresponds to 0.15 µg/m3.
   [Map legend: circle size shows concentration (0.001, 0.01, and 0.1 µg/m3); circle color
   shows the percent of data below detection (< 50%, 50 to 85%, or > 85%).]
       In contrast to arsenic, manganese concentrations are widely measured across the country with
       most data recorded above the detection limit.
       Concentrations vary spatially and several "hot spots" can be identified that may lend themselves
       to additional investigation at a site level.

-------
                 Spatial  Patterns  - Maps
          Benzene Risk-Weighted Concentrations 2003-2005
    Note: 2003-2005 average concentrations are divided by the 1-in-a-million cancer risk
    concentration. Circle diameter represents this ratio, while the chronic risk assessment is
    indicated by color. Sites at which >85% of data are below detection are considered
    unreliable (grey).
    [Map legend: circle size shows risk-weighted concentration (1, 10, and 100); circle color
    shows the risk category (<1, 1 to 10, 10 to 100, or >100 in a million, or unreliable).]
    Benzene risk associated with measured ambient concentrations is almost always above the 1-in-a-
    million cancer risk level across the United States. Many areas are also above the 10-in-a-million
    cancer risk.  These results are in good agreement with NATA 1999 results. The highest risk estimates
    are located in areas with significant point source benzene emissions.
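
    The risk-weighting described in the note is a simple ratio, sketched below together with
    the five legend categories. The benchmark value used in the example is a hypothetical
    placeholder, not an EPA value, and the handling of ratios that fall exactly on a category
    boundary is an assumption made for this sketch.

```python
def risk_weighted(avg_conc, benchmark):
    """Site average concentration divided by the 1-in-a-million cancer risk concentration."""
    return avg_conc / benchmark


def risk_category(avg_conc, benchmark, pct_below_detection):
    """Assign a site to one of the map legend categories."""
    # Sites with more than 85% of data below detection are treated as unreliable.
    if pct_below_detection > 85:
        return "Unreliable"
    ratio = risk_weighted(avg_conc, benchmark)
    if ratio < 1:
        return "<1 in a million"
    if ratio < 10:
        return "1 to 10 in a million"
    if ratio <= 100:
        return "10 to 100 in a million"
    return ">100 in a million"


# Hypothetical benchmark (ug/m3) for illustration only; use the published value in practice.
ASSUMED_BENCHMARK = 0.5
print(risk_category(2.0, ASSUMED_BENCHMARK, pct_below_detection=10))
```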

-------
                Spatial  Patterns - Maps
      1,3-Butadiene Risk-Weighted Concentrations 2003-2005
      [Map legend: circle size shows risk-weighted concentration (1, 10, and 100); circle color
      shows the risk category (<1, 1 to 10, 10 to 100, or >100 in a million, or unreliable).]
      Where measured reliably, 1,3-butadiene concentrations are almost always above the 1-in-a-
      million cancer risk level. Some areas do not measure concentrations well enough to evaluate risk
      (grey symbols). Highest concentrations are located in areas with known point source emissions
      (e.g., Houston and Louisville).

-------
       Variability Within and  Between  Cities

                                  Overview

      •  A topic of interest for air toxics data analysis is assessing variability in
        concentration from site to site within a city. The aim of such analysis is to
        understand how representative a given site is with respect to air toxics
        concentrations in a city.
         - What is the variability of air toxics concentrations within cities and what are the
           implications for aggregating data at the city level?
         - Where do sites need to be located to accurately characterize variability within a
           city?
         - How many sites are needed to characterize spatial variability within a city?
         - How does within-city variability differ across cities?
      •  There may also be interest in  assessing variability in air toxics from city to
        city.
         - What are the concentration distributions across all monitoring sites?
         - Do specific cities, states, or regions have demonstrably higher or lower
           concentrations?
         - Do demonstrably lower concentrations occur at rural and remote sites?
         - Are concentration differences associated with monitoring agency differences?


-------
       Variability Within  and  Between  Cities

                                Approach

     • To investigate within-city variation, a city of interest should have multiple
       monitors.  For example, for a national trend analysis, EPA required a city
       to have at least four monitors to be included in analysis.
     • Valid annual averages are calculated for each monitor in a city.  To
       reduce noise from year-to-year changes (e.g., the effect of meteorology),
       it is best to use multiple years of data when available. The national study
       used 2003-2005 data.
     • Data can be visualized using notched box plots by air toxic, city, and year.
       If variation  between years at a given city is minor, notched box plots by air
       toxic and city only can be constructed to increase the amount of data.
     • Advanced Plotting Techniques
        • Include a color-coded measure of the percent of data below detection to
          understand the reliability of the data.
        • Divide annual averages by the chronic exposure concentration associated with
          a 1-in-a-million cancer risk (or other risk level) to show variation in risk
          estimates within and between cities.
        • Include a measure of relevant emissions by city to explain possible reasons for
          high or low concentrations.
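
     Before plotting, the same comparison can be made numerically. The sketch below groups
     annual averages by city, keeps only cities with at least four monitors (the threshold
     the text cites for the national trend analysis), and reports each city's median and its
     max/min ratio as a measure of within-city variability. The record layout, city names,
     and values are hypothetical, and using the max/min ratio as the variability measure is
     an assumption for this sketch.

```python
from collections import defaultdict
from statistics import median

MIN_MONITORS = 4  # minimum monitors per city, as cited for the national trend analysis


def city_variability(records, min_monitors=MIN_MONITORS):
    """records: iterable of (city, monitor_id, annual_average) tuples.

    Returns {city: (median, max/min ratio)} for cities with enough monitors.
    """
    values = defaultdict(list)
    monitors = defaultdict(set)
    for city, monitor, annual_avg in records:
        values[city].append(annual_avg)
        monitors[city].add(monitor)
    result = {}
    for city, vals in values.items():
        if len(monitors[city]) < min_monitors or min(vals) <= 0:
            continue  # too few monitors, or the ratio is undefined
        result[city] = (median(vals), max(vals) / min(vals))
    return result


# Hypothetical annual averages (ug/m3), not real data.
records = [
    ("City A", "A1", 0.5), ("City A", "A2", 1.0), ("City A", "A3", 2.0), ("City A", "A4", 4.0),
    ("City B", "B1", 0.1), ("City B", "B2", 0.1), ("City B", "B3", 0.2), ("City B", "B4", 0.2),
    ("City C", "C1", 3.0), ("City C", "C2", 9.0),  # only two monitors, so excluded
]
summary = city_variability(records)

# Between-city variability: ratio of the highest to the lowest city median.
medians = [m for m, _ in summary.values()]
between_city_ratio = max(medians) / min(medians)
print(summary, between_city_ratio)
```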



-------
        Variability  Within  and  Between  Cities
                                        Example
   In the example, risk estimates have
   been used to provide a secondary
   layer of information.
   A single box in the figure contains
   one annual average for each
   monitor within the city; thus, each
   box represents intra-city
   concentration variation.
   The variability between cities is
   also represented by including
   multiple cities on the same plot.
   The within-city spatial variability of
   1,3-butadiene is usually less than a
   factor of 8 for the cities in the figure.
   1,3-butadiene variability between
   cities, however, can be greater than
   an order of magnitude.
   Emissions from major sources at a
   county level are generally higher for
   the cities with greater within-city
   variability and higher concentrations,
   but there are exceptions that could
   be explored.
[Figure: 1,3-Butadiene Variability within and Between Cities. Notched box plots by city on a
log-scale axis (0.01 to 100), with bars on a second log-scale axis (0.01 to 1,000,000).]
   The figure shows 1,3-butadiene risk-weighted (1-in-a-million) annual average
   variation for 2003-2005 for selected U.S. cities along with non-mobile
   emissions. Notched boxes include annual averages for each monitor within a
   city, providing within-city variation. Dots over the notched boxes show the
   individual data points and whether they are above (blue) or below (red) the
   average MDL. Bars show county-level non-mobile emissions of 1,3-butadiene
   from EPA's AirData. The figure was created with SYSTAT11.

-------
      Variability Within and Between  Cities

                     National Perspective

      At a national level, spatial variability within cities was found to be
      pollutant- (or pollutant group-) specific.
      Most air toxics measurements are highly variable within cities; risk
      values span an order of magnitude within some cities.
      The spatial variability between cities is a good metric to estimate
      the variability within cities a priori. Spatial variability analysis helps
      set expectations for sampling in a new city.
      Cities with point source emissions (e.g.,  Houston) showed higher
      within-city variability than those dominated by area/mobile sources
      (e.g.,  Los Angeles).
      Some of the observed variability is due to differences in
      sampling/analysis method and method detection limit.

-------
          Hot and Cold  Spot Analysis

                          Overview

      Hot and cold spot analysis is an investigation of sites
      with the highest and lowest concentrations.

      The objectives of this analysis include:
      - Data validation. The highest and lowest values may be due to
        some type of error, possibly reporting.
      - Comparison to the spatial conceptual model. Are the highest
        concentrations consistent with known sources, transport, and
        dispersion?
      - Risk screening. Where are the air toxics concentrations highest?

-------
           Hot and Cold  Spot Analysis

                             Approach

     • Create valid annual averages (see Preparing Data, Section 4) for
      each site and pollutant and rank each site by its concentration
      (highest to lowest). The number of high- and low-ranked
      concentration sites investigated depends on the number of available
      sites. At a national level, the 10 highest and 10 lowest ranking sites
      were investigated to illustrate the approach.
     • Map all sites, marking the highest and lowest ranked sites to
      investigate spatial variation.
     • Identify why high or low concentrations occur at those sites and
      whether the occurrence of those concentrations meets expectations.
       - Review metadata about the sites (e.g., Google Earth images, local
         emissions, and meteorology).  Do concentrations meet spatial
         conceptual models with respect to scale, sources, transport, and
         dispersion?
       - Inspect time series of concentration and MDL (e.g., is the value stuck,
         are data outliers driving the average, is the MDL higher than the
         concentrations at an average site?).
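
     The ranking step can be sketched as follows. The function takes a mapping of site
     identifiers to valid annual averages and returns the n highest and n lowest sites. The
     function name and the site values are hypothetical, and capping n at half the number of
     sites (so that no site is both "hot" and "cold") is an assumption for this sketch.

```python
def rank_hot_cold(site_averages, n=10):
    """Return (hot, cold): the n highest and n lowest sites as (site, average) pairs.

    site_averages: dict mapping site ID to a valid annual average concentration.
    """
    # Rank sites from highest to lowest concentration.
    ranked = sorted(site_averages.items(), key=lambda item: item[1], reverse=True)
    # Avoid overlap between the two lists when few sites are available.
    n = min(n, len(ranked) // 2)
    hot = ranked[:n]
    cold = ranked[len(ranked) - n:]
    return hot, cold


# Hypothetical site averages (ug/m3), not real data.
sites = {"A": 5.0, "B": 0.2, "C": 1.0, "D": 3.0, "E": 0.5}
hot, cold = rank_hot_cold(sites, n=2)
print("highest:", hot)
print("lowest:", cold)
```

     The sites returned in each list are the candidates for the metadata review and time
     series inspection described above.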


-------
            Hot  and  Cold  Spot  Analysis
                    Example - Benzene (1 of 2)
    [Map legend: 10 highest sites; 10 lowest sites; other sites. Inset label: Alaska.]
    The figure shows sites with the 10 highest and 10 lowest benzene concentrations based on 2003-2005
    annual averages. Other monitoring sites are shown in yellow. The sites ranked lowest were either a
    result of data reporting or siting issues or were located in rural areas, consistent with our conceptual
    model of low concentrations.

-------
           Hot and  Cold Spot Analysis
                   Example - Benzene (2 of 2)
    The sites measuring the
    highest concentrations in
    the nation were dominated
    by nearby point source
    emissions; the site
    identified in the figure
    measured the second
    highest benzene
    concentration in the
    nation.
    This site is very close to
    two refineries that emit a
    significant amount of
    benzene each year
    according to the NEI.
                              Google Earth image of the site with the second highest benzene
                              concentrations in the United States. Refineries to the right and left emitted
                              84,000 and 44,000 lbs of benzene in 2004 (NEI).

-------
             Hot and  Cold  Spot Analysis
                     Example - Arsenic PM2.5
  [Map legend: 10 highest sites; 10 lowest sites; other sites. Inset label: Alaska.]
  The figure shows sites with the 10 highest and 10 lowest arsenic PM2.5 concentrations based on 2003-2005
  annual averages. Other monitoring sites are shown in yellow. Conceptually, we would expect arsenic PM2.5
  concentrations to be highest in locations dominated by point source emissions, especially smelting and coal
  combustion. The highest sites are consistent with this conceptual model. The lowest sites are located in
  extremely remote locations such as Alaska and U.S. national parks, which is reasonable for the lowest arsenic
  PM2.5 concentrations.

-------
               Urban  vs.  Rural  Analysis

                                 Overview

     •  Measured concentrations can be highly dependent on individual monitor
        locations, geography, emissions sources, and meteorological conditions
        (e.g., prevailing winds).
     •  Urban areas - conceptual model
        -  Urban areas contain sources of air toxics that result in increased concentrations
           and, in some cases, "hot spots" (areas with disproportionately higher
           concentrations) in the spatial pattern.
        -  Urban concentrations vary greatly from day to day due to the mix of local
           sources and meteorology.
     •  Rural areas - conceptual model
        -  Rural areas typically have fewer sources of air toxics. Air toxics concentrations
           that are transported from urban locations are typically near background levels
           when they reach rural areas (a function of source strength, distance, and the
           lifetime of the pollutant).
        -  Concentrations do not vary consistently day to day. Daily and seasonal patterns
           that are dependent on meteorological conditions may still be observed.
     •  Urban and rural sites that do not meet the expectations of conceptual
        models may indicate monitoring location effects or data errors or problems
        with the conceptual model.


-------
                    Urban  vs.   Rural  Analysis


                                           Approach
       • Characterize each site as urban or rural.
          -  If available, start with EPA urban/rural designations as listed in AQS (note that these designations are not
             always up to date)
          -  Verify the designations using Google Earth—they may be outdated or incorrect
          -  Be wary of defining a site using population density, total county population, or other metrics—local knowledge
             of the site appears to be the best way to identify site characteristics.
       • Identify pollutant availability and time period for each site.
          -  The goal is to have a spatially representative mix of urban and rural sites measuring a pollutant over the same
             time period. This mix can be a challenge since toxics are more commonly measured in urban locations.
       • Choose pollutant/site combinations that are spatially and temporally representative.
          -  Pollutant-specific monitoring time periods need to be the same for site comparison; otherwise differences in
             observed concentrations could be biased by seasonal or inter-annual patterns.
       • Estimate valid 24-hr averages for the sites, pollutants, and time periods of interest.
          -  Characterize all concentration averages that are below the associated average MDL
       • Visualize the data by site by preparing plots of data distributions, including some measure of the
         data below detection.  Look for differences in concentrations.
       • Identify statistically significant differences in urban vs. rural site concentrations.
       • Summarize the results with a focus on neighboring urban vs. rural  sites.
          -  Which urban and rural sites measured significantly higher or significantly  lower concentrations, if either?
             Which showed no difference?
       • Investigate data that do not meet expectations (e.g., concentrations at a rural site may be significantly
         higher than those at a nearby urban site).
          -  Are the sites representative of the area (i.e., compare to other urban or rural sites)?
          -  Are there monitor location abnormalities (e.g., local terrain, prevailing winds)?
          -  Are there measurement methods or MDL differences between the sites?
          -  Is there a significant rural emissions source?
          -  Are possible data errors or outliers driving the trend?
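
A minimal sketch of the comparison step, assuming notched box plot statistics are used as an approximate screen for differences in medians. The notch half-width of 1.57 x IQR / sqrt(n) is the common box plot convention, not a method prescribed by the workbook; the function names, concentrations, and MDL are hypothetical. Non-overlapping notches suggest, but do not prove, a statistically significant difference.

```python
import math
import statistics


def notch_summary(values, mdls):
    """Median, approximate 95% notch interval, and percent of data below MDL."""
    n = len(values)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(n)  # common notch convention
    below = sum(1 for v, m in zip(values, mdls) if v < m)
    return {
        "n": n,
        "median": median,
        "notch": (median - half_width, median + half_width),
        "pct_below_mdl": 100.0 * below / n,
    }


def notches_overlap(a, b):
    """True if the two notch intervals overlap (no clear difference in medians)."""
    return a["notch"][0] <= b["notch"][1] and b["notch"][0] <= a["notch"][1]


# Hypothetical valid 24-hr averages measured over the same time period
urban = [1.2, 0.9, 1.5, 2.1, 1.1, 1.8, 1.4, 0.8, 1.6, 1.3]
rural = [0.3, 0.2, 0.4, 0.25, 0.35, 0.15, 0.3, 0.45, 0.2, 0.3]
mdl = 0.22  # hypothetical MDL, assumed constant across samples

u = notch_summary(urban, [mdl] * len(urban))
r = notch_summary(rural, [mdl] * len(rural))
print("urban:", u)
print("rural:", r)
print("notches overlap:", notches_overlap(u, r))
```

Reporting the percent of data below MDL alongside each median keeps the amount of data below detection visible when the sites are compared.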


-------
                 Urban vs.   Rural  Analysis

          Example - Investigating Urban vs. Rural Sites (1 of 2)

        When beginning an urban vs. rural analysis, it is important to verify that sites are properly
        designated "urban" or "rural". This example is qualitative.
        The pictures below show a map of urban and rural NATTS sites across the United States along
        with Google Earth pictures of two of the rural sites: Grand Junction, Colorado, and La Grande,
        Oregon.
        Both sites are designated as rural in AQS, but the Colorado site appears quite urban in
        character, and it is likely that air toxics concentrations will not conform to the model for a rural
        site.
        The Oregon site, on the other hand, is rural, based on the observation that the surrounding area
        is mainly farmland.
        [Figure: Map of NATTS sites (2006) with urban and rural sites marked, shown with Google Earth
        images of the Grand Junction, CO, and La Grande, OR, sites.]

                                   Two rural sites in the NATTS network. Images obtained from Google Earth.

-------
                 Urban  vs.  Rural Analysis
          Example - Investigating Urban vs. Rural Sites (2 of 2)

       The figure shows benzene concentrations at a rural Vermont site compared to
       concentrations at two urban northeastern sites.
       The rural site shows statistically significantly lower concentrations.
       If a site does not fit an urban or rural definition as expected, check the following:
- Measurement method or MDL differences between the sites.
- Local emissions sources.
- A time series comparing the two sites, with data below detection color-coded.
- Data subsets in which both sites have measurements above detection.
  Do they tell a different story?
                                        [Figure: Benzene box plots on a logarithmic concentration axis
                                        (0.001 to 100) for the RURAL VT, URBAN MA, and URBAN RI sites.
                                        Blue = above MDL.]
                                        The example figure is from an analysis of NATTS sites using
                                        2003-2005, 24-hr average benzene data. The box plots encompass all
                                        data, while the overlaid dot density shows each data point and whether
                                        it is above or below detection (blue vs. red). It was produced in
                                        SYSTAT 11.

-------
                           Spatial  Patterns
                                    Summary
      •  Analyses described in this section provide information about a variety of
        aspects of air toxics spatial variability and help analysts evaluate multiple
        conceptual models.
      •  Spatial patterns can provide information about sources, sinks, transport,
        and dispersion which are of interest for air toxics analyses.
      •  At a national level, the following spatial patterns were observed for air
        toxics.
         -  Benzene, 1,3-butadiene
             •  Concentrations vary around the United States and are high in urban areas. The
               highest concentrations of these two air toxics, however, are found in areas influenced
               by point source emissions in addition to mobile sources.
             •  Within- and between-city variability is generally near a factor of 5.
         -  Carbonyl compounds
             •  Carbonyl compounds are measured widely and show very consistent concentrations
               across the  nation.  This is due to the dominant secondary formation mechanism.
             •  Within- and between-city variability is relatively low with few exceptions.
         -  PM2.5 metals
             •  The spatial character of PM2.5 metals is difficult to determine due to differences in
               measurement methods and MDLs among monitoring networks.
             •  Overall it seems that concentrations are slightly higher in the eastern half of the United
               States.


-------
                   Risk Screening
                         Overview

     A key use of air toxics data is to compare annual
     average concentrations to health thresholds to put
     ambient levels into context.
     Risk screening can help identify air toxics of concern.
     Information to consider in conducting a risk screening is
     available, for example, in "A Preliminary Risk-Based
     Screening Approach for Air Toxics Monitoring Data
     Sets."
     For information on a more thorough air toxics risk
     assessment, see the Air Toxics Risk Assessment
     Library.

-------
                           Risk Screening
                                   Approach
For this first level of screening, site average concentration data from the
most recent year(s) (e.g., 2003-2005) were used to identify the number of
sites at which a pollutant was definitively above or below the relevant EPA
OAQPS chronic exposure concentration associated with a 1-in-a-million
cancer risk as found at: http://www.epa.gov/ttn/atw/toxsource/summary.html.
Results are ranked by screening level.
Air toxics were also noted if most site concentrations could not be
characterized as above or below the relevant risk level with certainty.
The figure shows steps through a decision tree for performing risk
screening.

[Figure: risk screening decision tree]
Is 85% of data for this site-pollutant below MDL?
   Yes: Is level of concern above MDL?
      Yes: Pollutant concentration is below health level of concern
           (upper limit of risk 1x10-6)
      No:  Site-pollutant is uncertain
   No:  Is site-average concentration above level of concern?
      Yes: Pollutant concentration is above health level of concern
           (risk >1x10-6)
      No:  Pollutant concentration is below health level of concern
           (risk <1x10-6)

The % of data below MDL listed in the first box may need to be stricter or
less strict to meet your DQOs.
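
The decision tree can be sketched as a small function. This is one reading of the figure, not a definitive implementation: the assignment of the Yes/No branches, the function name, the argument names, and the example numbers are all assumptions made for illustration. The 85% cutoff is exposed as a parameter because it may need to be adjusted to meet your DQOs.

```python
def screen_site_pollutant(site_avg, mdl, level_of_concern,
                          pct_below_mdl, max_pct_below=85.0):
    """Classify one site-pollutant combination against a health level of concern.

    Assumed branch logic (one reading of the decision tree figure):
      - Mostly non-detects and level of concern above MDL: below level of concern.
      - Mostly non-detects and level of concern at or below MDL: uncertain.
      - Otherwise compare the site-average concentration to the level of concern.
    """
    if pct_below_mdl >= max_pct_below:
        if level_of_concern > mdl:
            return "below level of concern"
        return "uncertain"
    if site_avg > level_of_concern:
        return "above level of concern"
    return "below level of concern"


# Hypothetical site averages, MDLs, and 1-in-a-million levels (same units)
examples = [
    ("mostly detects, high average", 1.30, 0.05, 0.13, 2.0),
    ("mostly detects, low average", 0.05, 0.02, 0.13, 10.0),
    ("mostly non-detects, MDL below level", 0.03, 0.05, 0.13, 92.0),
    ("mostly non-detects, MDL above level", 0.10, 0.20, 0.002, 95.0),
]
results = {
    label: screen_site_pollutant(avg, mdl, level, pct)
    for label, avg, mdl, level, pct in examples
}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```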
 
-------
                             Risk  Screening
                                       Example
        Risk categories, in order of decreasing risk:
          1. Concentrations above 1-in-100,000 cancer risk level at >25% of sites
          2. Concentrations above 1-in-1,000,000 cancer risk level at >50% of sites
          3. Concentrations above 1-in-1,000,000 cancer risk level at 10-50% of sites

        Pollutants, in order of decreasing risk:
             Benzene
             Acrylonitrile
             Arsenic (PM2.5 and PM10)
             Acetaldehyde
             Carbon tetrachloride
             1,3-Butadiene
             Nickel
             Chromium
             Tetrachloroethylene
             Cadmium (PM10 and TSP)
             Naphthalene
             1,4-Dichlorobenzene
             Benzyl chloride

        This table displays only pollutants that were monitored at 20 or more sites in the
        United States from 2003-2005 and whose data were sufficient to support a conclusion
        that concentrations were above the relevant health levels of concern.
        We are confident that these cancer-risk pollutants meet or exceed the listed categories
        of cancer risk (i.e., risk may be higher, but is not lower).

-------
                                  Risk   Screening
                                             Summary
         Risk screening results at a national level are provided in the following table.
         At a regional, state, or local level, results may differ.  This table provides a
         context for comparing local results.
          Higher confidence - chronic cancer risk (ordered by importance)
                Benzene
                Acrylonitrile
                Arsenic
                Acetaldehyde
                Carbon tetrachloride
                1,3-Butadiene
                Nickel
                Chromium
                Tetrachloroethene
                Naphthalene
                Cadmium
                1,4-Dichlorobenzene
                Benzyl chloride

          Lower confidence - chronic cancer risk (ordered by importance)
                Ethylene dibromide
                1,1,2,2-Tetrachloroethane
                1,2-Dibromo-3-chloropropane
                Ethylene oxide
                Ethylene dichloride
                Hexachlorobutadiene
                1,2-Dichloropropane
                1,1,2-Trichloroethane
                Vinyl chloride
                Trichloroethylene
                Benzo[a]pyrene
                Dibenzo[a,h]anthracene
                3-Chloropropene

          High confidence - chronic and acute noncancer hazard
                Acrolein

          Local chronic hazard
                Formaldehyde
                Manganese
                Acrylonitrile
                1,3-Butadiene
                Nickel

-------
                                            Summary
                               (1  of 2)
             Check List for  Ways to  Characterize  Air Toxics
                 Temporal Characterization

      The general procedure for investigating temporal patterns
      is the same for all aggregates.
       -  Prepare valid concentration and normalized temporal aggregates and
          summary statistics.
            • Normalization allows comparison between sites and pollutants even if
             absolute concentration values vary widely.
            • Keep track of the amount of data below detection.
       -  Plot data with notched box plots or line graphs of multiple statistics
          (e.g., mean vs. 90th and 10th percentiles) with confidence intervals.
       -  Characterize patterns by pollutant
            • Do patterns fit your conceptual model?
            • Are they statistically significant?
       -  Investigate unexpected results
      Diurnal patterns - If alternate sampling schedules are
      used, calculate the weighted average by the most
      representative sampling hour; otherwise, diurnal patterns
      may be obscured.
      Day-of-week patterns - Examine data  availability by day-
      of-week.
       -  If sufficient data exist for each day of the week, examine day-of-week
          patterns.
       -  If insufficient data exist, weekday vs. weekend groupings can be used.
      Seasonal patterns - Aggregate to the monthly level if
      sufficient  data exist.  Use quarterly averages if data are
      not sufficient or monthly patterns are too noisy.
      Compare what you have learned from  the different
      temporal aggregates. Do conclusions  make sense  in the
      larger temporal picture?
      For example, the diurnal pattern of formaldehyde suggests that
      concentrations are highly dependent on sunlight. This dependency is
      confirmed by the seasonal pattern, which shows higher concentrations in
      summer (i.e., more sunlight).
                                    Spatial Characterization

                      General spatial patterns
                        -  Create site level average values by pollutant for the time period of interest.
                           Make sure data are temporally comparable at all sites.
                        -  Investigate spatial variability by calculating and graphing summary
                           statistics of the site averages. The results provide overview information
                           about the magnitude of spatial variation.
                        -  Visualize spatial variability by creating maps of the site-level average
                           concentrations.
                            • Results will provide more specific information about the spatial gradients of air
                              toxics.
                            • Including supplementary data such as MDLs, remote background
                              concentrations, and cancer and noncancer risk levels provides a framework
                              for the observed concentrations.

                      Within- and between-city variation
                        -  Calculate valid annual averages for each site within a city that has more
                           than one monitor.
                        -  Create notched box plots of annual averages by city.
                            • Each box will contain one point for each monitor, so the box will indicate
                              within-city variability.
                            • Including multiple cities on one plot will provide a comparison of
                              between-city variability.

                      Hot and cold spot analysis
                        -  Calculate valid annual averages for each site.
                        -  Rank the averages in order of concentration.
                        -  Using maps, compare sites with highest and lowest concentrations to all
                           sites.
                        -  Investigate data and metadata for the sites with highest and lowest
                           concentrations. Do concentrations make sense based on the metadata
                           and conceptual models?

                      Urban vs. rural site analysis
                        -  Verify the EPA urban/rural designation of each site using Google Earth.
                        -  Identify pollutant data availability and time period.
                        -  Create a data set of pollutant/site combinations that are spatially and
                           temporally representative.
                        -  Plot valid 24-hr average data as notched box plots for neighboring urban
                           and rural sites.
                        -  Summarize the results and investigate sites that do not meet the
                           conceptual model of an urban or rural site.
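
The within- and between-city and hot and cold spot steps in this check list can be sketched numerically before any plotting. This is a minimal illustration under stated assumptions: variability is expressed as a max/min factor, between-city variability is taken from city medians, and the cities, sites, and annual averages are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical valid annual averages: (city, site, concentration)
annual_avgs = [
    ("City A", "A1", 1.2), ("City A", "A2", 0.6), ("City A", "A3", 2.4),
    ("City B", "B1", 0.5), ("City B", "B2", 0.4),
    ("City C", "C1", 3.0),
]

by_city = defaultdict(list)
for city, site, conc in annual_avgs:
    by_city[city].append(conc)

# Within-city variability: max/min factor, only for cities with more than one monitor
within_city = {c: max(v) / min(v) for c, v in by_city.items() if len(v) > 1}

# Between-city variability: max/min factor of the city medians (an assumed metric)
city_medians = {c: median(v) for c, v in by_city.items()}
between_city = max(city_medians.values()) / min(city_medians.values())

# Hot and cold spot candidates: rank site averages from highest to lowest
ranked = sorted(annual_avgs, key=lambda rec: rec[2], reverse=True)
hot_spot, cold_spot = ranked[0], ranked[-1]

print("within-city factors:", within_city)
print("between-city factor:", round(between_city, 2))
print("highest site:", hot_spot, "lowest site:", cold_spot)
```

The ranked extremes are only candidates; the site metadata and conceptual models still need to be checked for each one.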

-------
                              Summary
             Check List for Characterizing Air Toxics
                 Risk Screening
        Create valid site average concentration data
        for the most recent years.
        Calculate the percent of sites above the
        selected risk level and the percent of data
        below detection.
        Follow the risk screening decision tree to
        identify the exposure risk for each pollutant.
        More advanced risk analyses should be
        performed by risk assessment professionals.
                          A Final Note on
                        Data Below Detection
                Most air toxics have enough data below
                detection to cause uncertainties and/or biases in
                aggregated data if not handled properly.
                Note, however, that it is not valid to remove
                these data because they are representative of
                true values on the lower end of the concentration
                spectrum; removal would cause even more
                significant positive biases.
                It is always important to know the amount of
                data below detection when looking at any data
                set. The effects of data below detection should
                be considered in all analyses.
                In national analyses, we did not draw
                conclusions when more than 85% of the
                measurements of a pollutant were below
                detection.
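
A small numerical illustration of the positive bias that results from removing data below detection. The measurements and MDL are hypothetical, and the values below the MDL are assumed to be reported uncensored.

```python
from statistics import mean

mdl = 0.5  # hypothetical MDL
values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9, 1.5]  # hypothetical uncensored measurements

mean_all = mean(values)
mean_detects_only = mean([v for v in values if v >= mdl])
pct_below = 100.0 * sum(1 for v in values if v < mdl) / len(values)

print(f"mean of all data:        {mean_all:.3f}")
print(f"mean with data removed:  {mean_detects_only:.3f}")
print(f"percent below detection: {pct_below:.1f}%")
```

Dropping the four values below the MDL raises the average because only the upper end of the distribution remains.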

-------
                            Resources
       Statistical
        - StatSoft: Background on a variety of statistics

        - NIST Engineering Statistics: Background on a variety of statistics

        - SYSTAT: A graphical and statistical tool

        - Minitab: A graphical and statistical tool

       Emissions
        - EPA AirData: Air toxics emissions reports to the county level

        - National Emissions Inventory 2002: Emissions inventory for the United
          States; some Canada and Mexico data also available.

        - EPA Toxics Release Inventory (TRI): A variety of emissions data sets

-------
                                                 References
       Bortnick S.M. and Stetzer S.L. (2002) Sources of variability in ambient air toxics monitoring data. Atmos. Environ.  36, 1783-1791 (11).
       Demerjian K.L. (2000) A review of national monitoring networks in North America. Atmos. Environ. 34,1861-1884.
       Fortin T.J., Howard B.J., Parrish D.D., Goldan P.O., Kuster W.C., Atlas E.L., and Harley R.A. (2005) Temporal changes in U.S. benzene
           emissions inferred from atmospheric measurements. Environ. Sci. Technol. 39,1403-1408 (6).
       Grosjean D., Swanson R.D., and Ellis C. (1983) Carbonyls in Los Angeles air-contribution of direct emissions and  photochemistry. Sci.  Total
           Environ. 29,65-85(1-2).
       Grosjean D. (1982) Formaldehyde and other carbonyls in Los-Angeles ambient air. Environ. Sci. Technol. 16, 254-262 (5).
       Hafner H.R. and McCarthy M.C. (2004) Phase III air toxics data analysis workbook. Workbook prepared for the Lake Michigan Air Directors
           Consortium, Des Plaines, IL, by Sonoma Technology, Inc., Petaluma, CA, STI-903553-2592-WB, August.
       Herrington, J.S., Fan, Z., Lioy P.J., and Zhang, J. (2007) Low acetaldehyde collection efficiencies for 24-hour sampling with
           2,4-dinitrophenylhydrazine (DNPH)-coated solid sorbents. Environ. Sci. Technol.  41  (2), 580 -585.
       Kao A.S. (1994)  Formation and removal reactions of hazardous air-pollutants. J. Air & Waste Manag. Assoc.  44, 683-696 (5).
       Main H.H., Roberts P.T., and Reiss R. (1998) Analysis of photochemical assessment monitoring station (PAMS) data to evaluate a reformulated
           gasoline (RFG) effect. Final report prepared for the U.S. Environmental Protection Agency, Office of Mobile Sources, Fuels and Energy
           Division, Washington, DC, by Sonoma Technology, Inc., Santa Rosa, CA, STI-997350-1774-FR2, April.
       Main H.H. and Bortnick S. (2002) Temporal variability in ambient air toxics: implications to monitoring network design. Presentation at the
           Coordinating Research Council (CRC) Air Toxics Modeling Workshop, Houston, TX, February 26-27 (STI-2153).
       McCarthy M.C., Hafner H.R., Chinkin L.R., Touma J.S., and Cox W.M. (2005) Temporal variability of selected air toxics: a national perspective.
           Prepared for the United States Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park,
           NC, and Sonoma Technology, Inc., Petaluma, CA. Last accessed September 2, 2005.
       McCarthy M.C., Hafner H.R., Chinkin L.R., and Charrier J.G. (2007) Temporal variability of selected air toxics in the United States. Atmos.
           Environ., doi:10.1016/j.atmosenv.2007.1005.1037 (STI-2894).
       McCulloch A. (2003) Chloroform in the environment: occurrence, sources, sinks and effects. Chemosphere 50,  1291-1308 (10).
       Seinfeld  J.H. and Pandis S.N. (1998) Atmospheric chemistry and physics: from air pollution to global change, J. Wiley and Sons, Inc., New
           York, New York.
       Singh H.B.,  Salas L., Viezee W., Sitton B., and Ferek R. (1992) Measurement of volatile organic-chemicals at selected sites in California.
           Atmos. Environ. Part a-General Topics 26, 2929-2946 (16).
       Spicer C.W., Buxton B.E., Holdren M.W., Smith D.L., Kelly T.J., Rust S.W., Pate A.D., Sverdrup G.M., and Chuang J.C. (1996) Variability of
           hazardous air pollutants in an urban area. Atmos. Environ.  30, 3443-3456 (20).
       US EPA Toxics Release Inventory (TRI) Explorer.
       US EPA National Air Toxics Trends Stations.
       Vardoulakis S., Gonzalez-Flesca N., Fisher B.E.A., and Pericleous K. (2005) Spatial variability of air pollution in the vicinity of a permanent
           monitoring station in central Paris. Atmos. Environ.  39, 2725-2736 (DOI:10.1016/j.atmosenv.2004.05.067).
       StatSoft, Inc. (2005) The Statistics Homepage.



-------
        Quantifying and Interpreting
             Trends in Air Toxics
              Are air toxics concentrations changing?
          Are the ambient concentration changes in response
                  to changes in emissions?
June 2009
Section 6 - Quantifying Trends

-------
                   Trends  in Air  Toxics
               What's Covered in This Section

     •  This section focuses on trends in ambient air toxics over time; diurnal
       and seasonal trends are discussed in Characterizing Air Toxics,
       Section 5.
     •  The following topics are addressed in this section:
        - Quantifying Trends
           • Overview of trends analysis
           • Setting up the data for trend analyses
           • Effect of changes in MDL on trends
           • Summarizing trends
           • Discerning and quantifying trends
              - Quantifying Trends
              - Visualizing Trends
           • Aggregating trends to larger spatial areas
        - Interpreting Trends
           • Evaluating annual trends in the context of control programs
           • Adjusting trends for meteorology (introductory)


-------
                                   Trends  Overview
                                              Motivation
     Assessing trends is useful. Monitoring data are needed to track air toxics concentrations and their changes over time.
     One of the major programmatic objectives for air toxics measurements is providing data to track progress toward
     emission and risk-reduction goals. The ability to detect trends in ambient concentrations that are associated with planned
     air quality control efforts is needed to assess the effectiveness of emission control programs.  For example, if specific
     control strategies have been implemented in an area to reduce emissions of tetrachloroethylene from dry cleaners, do
     the ambient data indicate that concentrations have decreased since the implementation of the control?

     Visual inspection of trends is important.  Air quality data typically do not fit a normal distribution. The data tend to be
     skewed and exhibit a few high concentration events. Thus, trends in extreme values in a data set may differ significantly
     from trends observed in a statistic that describes the bulk of the data. Different statistical metrics can be examined to
     look for trends. For example, the annual maximum pollutant concentrations can be plotted to assess how annual peak
     days are changing over time, or the median concentrations can be plotted to assess how the 50th percentile of the days
     are changing.  In addition, representative data are required to estimate a meaningful trend in air quality.

     Understanding the data uncertainties is necessary.  Uncertainties impact our ability to clearly discern air quality
     trends and distinguish between "real" changes and artifacts.  For example, measurement accuracy, interferences, and
     the amount of data above method detection limits need to be understood to properly interpret the data.

     Obtaining consensus (or weight of evidence) among results from different approaches increases our certainty
     in the observed trends. Quantifying and interpreting trends can be complicated (e.g., there are many different
     methods).  The analyst needs to understand methods for quantifying trends and determining their statistical significance.
     When several  different approaches or "looks" at the data point to the same conclusion, confidence in the conclusion is
     increased. The analyst also needs to be able to communicate the results in a meaningful and understandable way.
     Interpretation of trends from  site level to larger scales, such as city-wide or regional scale, needs to be done with care.
     Some site and pollutant combinations may be dominated by local sources or comparisons between some sites may not
     be reasonable because of large differences  between sampling methods.


-------
                 Trends Overview
                  Analysis Questions
     Are concentration levels changing at a monitoring site?
     Are changes consistent across sites, areas, or regions?
     Are changes consistent across pollutants or pollutant
     groups?
     Are changes consistent across time periods?
     Are changes consistent with expectations (e.g., emissions
     controls, changes in population)?

-------
       Setting  Up Data for Trend Analysis
                           Overview

     Steps to prepare data for trend analysis:
      - Acquire and validate data (covered in Preparing Data for
       Analysis, Section 4)
      - Identify and treat data below detection in preparation for annual
       averages (covered in this section)
      - Create valid annual averages or other metrics for trends
       (subannual data averaging is covered in Preparing Data for
       Analysis, Section 4)
      - Create valid site-level trends (covered in this section)

-------
           Setting  Up  Data  for Trend Analysis


                          Identifying  Censored Data

     •  Data are typically reported as a concentration value with an accompanying method detection limit
       (MDL).  In AQS, the MDL is either a default value associated with the analytical method (MDL) or a
       value assigned by the reporting entity for that specific record (alternate MDL).
     •  NATTS program guidance suggests that laboratories report all values, regardless of the MDL.
       However, many air toxics data are reported as censored values; i.e., they have been replaced with
       zero, MDL/2, or MDL (or some other value).
     •  Identifying censored values is a helpful first step in treating data below detection. Reporting of
       censored data will most likely differ among sites and may even be different by method, parameter
       or time period for a given site.  For this reason it is recommended that censored data analyses be
       carried out for each site, parameter, and method, and temporal variability should be considered.
     •  Data may be identified and separated at or below the detection limit along with the associated MDL
       and date/time; if alternate MDLs are available, it is recommended they be used rather than the
       default MDLs.
     •  Data may be examined for obvious substitution. Count the number of times each value at or below
       detection is reported at a given site, parameter, and method. Are the majority of data reported as
       the same value (e.g., zero or MDL/2)?
         - If data are largely reported as two or more values, investigate the temporal variation of the data.  Are there
           large step changes where reporting methods or MDLs have changed?
         - Do the duplicate values indicate a typical censoring method (e.g., MDL/2, MDL/10)?
         - Alternate MDLs may be different for each sample run causing a distribution of values if MDL/x substitutions
           were used.  Just  because values below MDL are not all the same does not mean they are not censored!
     •  Check for MDL/X substitution.
         - Make a scatter plot of the value vs. MDL to see if the data fall on a straight line.
         - If the data do form a straight line, the slope of the regression line will indicate the value by which the MDL has
           been divided.
                Is the value a reasonable number that would be used for MDL substitution (e.g., 1, 2, 5, or 10)?
                 -  If the data have been formatted, processed or converted, ratios may not be exactly the same due to rounding differences; the
                   distribution should be close to a straight line and centered around a single integer if MDL/x substitutions have been made.
                 -  If a bifurcated pattern is observed, the substitution method may have changed over time. Plot a time series of the ratios and look for
                   step changes.
               The distribution of the ratios should be highly variable if the data are not censored.
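The ratio check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the workbook's procedure: the function names, the rounding tolerance `tol`, and the 90% clustering threshold `min_fraction` are all assumptions chosen for the example.

```python
from collections import Counter

def mdl_ratio_summary(values, mdls, tol=0.05):
    """Return (ratios, counts) for records at or below the MDL.

    ratios holds MDL/value for each usable record; counts maps the nearest
    integer divisor to the number of ratios within `tol` of that integer.
    """
    ratios = []
    for value, mdl in zip(values, mdls):
        # Zeros (a substitution in their own right) and values above the MDL
        # give no information about an MDL/x divisor, so they are skipped.
        if value is None or mdl is None or value <= 0 or value > mdl:
            continue
        ratios.append(mdl / value)
    counts = Counter()
    for ratio in ratios:
        nearest = round(ratio)
        if abs(ratio - nearest) <= tol:
            counts[nearest] += 1
    return ratios, counts

def likely_divisor(values, mdls, min_fraction=0.9, tol=0.05):
    """Return x if most ratios cluster at one integer x (MDL/x), else None."""
    ratios, counts = mdl_ratio_summary(values, mdls, tol)
    if not ratios:
        return None
    divisor, n = counts.most_common(1)[0] if counts else (None, 0)
    return divisor if n / len(ratios) >= min_fraction else None
```

A returned value of 2, for example, suggests MDL/2 substitution; `None` means the ratios are spread out, as expected for uncensored data. A time series of the ratios is still needed to catch a substitution method that changed partway through the record.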

June 2009                                 Section 6 - Quantifying Trends

-------
        Setting  Up  Data for  Trend  Analysis

             Treating Data Below Detection (1 of 2)

     •  Following are suggested steps to create averages:
        - If uncensored values (i.e., NOT zero, MDL/2, or MDL) are reported below
          MDL, use the data "as is" with no substitution.
        - If uncensored values are not available, substitute MDL/2 for data below
          MDL or use more sophisticated methods as described in Section 4.
        - If there is a mix of censored and uncensored data, two substitution
          methods can be compared: (1) substitute MDL/2 for censored values and
          leave uncensored values "as is," and (2) substitute MDL/2 for all data
          below detection.
           • If results are in the same direction using both substitution methods, confidence
             in the results is increased and substitution method 1 should be retained. If the
             results do not agree, a more sophisticated method for estimating the data
             below MDL should be employed.
        - For all data sets, identify the percentage of data below MDL for each year
          in the trend period. It is important to keep track of how much data are
          below detection to better understand possible biases in the average.
          Even if censored values are not used, keep a record of this information to
          provide one measure of the uncertainty in the results.
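The comparison of the two substitution methods, together with the percent below MDL for each year, might be sketched as follows. The record layout (year, value, MDL, censored flag) and the function names are assumptions for illustration; "same direction" is judged here simply from the sign of the change between the first and last annual averages.

```python
from collections import defaultdict

def annual_summary(records, substitute_all_below_mdl):
    """records: iterable of (year, value, mdl, is_censored) tuples.

    Method 1 (substitute_all_below_mdl=False): MDL/2 replaces censored values
    only; uncensored values below the MDL are used as reported.
    Method 2 (substitute_all_below_mdl=True): MDL/2 replaces every value
    below the MDL.
    Returns {year: (annual average, percent of samples below MDL)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    below = defaultdict(int)
    for year, value, mdl, is_censored in records:
        is_below = is_censored or value < mdl
        if is_censored or (substitute_all_below_mdl and is_below):
            value = mdl / 2.0
        sums[year] += value
        counts[year] += 1
        below[year] += 1 if is_below else 0
    return {y: (sums[y] / counts[y], 100.0 * below[y] / counts[y])
            for y in sorted(counts)}

def same_direction(records):
    """True if first-to-last-year change has the same sign under both methods."""
    m1 = annual_summary(records, substitute_all_below_mdl=False)
    m2 = annual_summary(records, substitute_all_below_mdl=True)
    years = sorted(m1)
    d1 = m1[years[-1]][0] - m1[years[0]][0]
    d2 = m2[years[-1]][0] - m2[years[0]][0]
    return (d1 > 0) == (d2 > 0) and (d1 < 0) == (d2 < 0)
```

If `same_direction` returns False, the two methods disagree and a more sophisticated treatment of the data below MDL is called for.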


-------
         Setting  Up  Data for  Trend Analysis

               Treating Data Below Detection (2 of 2)


        Each annual average should have an associated calculation of the percent below
        detection. These data provide information about the biases of the annual average
        when data are below detection.
        When assessing trends over time for a pollutant,
         - Assess trends at all sites regardless of the percent of data below MDL. Note, however,
           that data are below detection for many site/pollutant combinations. To avoid over-
           interpretation of observed trends, it is recommended the trend values and their associated
           percent below detection be visually inspected. Consider trends at sites where at least half
           of the years for a given trend period have at least 15% of their measurements above MDL
           for that year.
        For the national level analyses, a 15% "cut-off" was selected based on review of a
        small data set with most data above detection.  Bias in the annual average was
        investigated for this  data set across a range of percent of data below detection. At
        15% below detection, the bias  in the annual average was 10-40%. A more
        stringent cut-off may be required if less bias is desirable.
         - For example, if a 5%  concentration change was observed but all years have greater than
           85% data below detection, the analyst cannot be sure whether this change is real or an
           effect of data below detection.  In other words, the uncertainty masks the possible change.
        In all cases, the percent below MDL should be considered as  a possible source of
        bias when interpreting site level trends.
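The screening suggestion above (consider trends where at least half of the years have at least 15% of measurements above MDL) can be expressed as a small check. The function name and input layout are assumptions for illustration; the 15% default follows the text, and a more stringent value can be passed in.

```python
def passes_detection_screen(pct_below_by_year, min_pct_above=15.0):
    """pct_below_by_year: {year: percent of samples below MDL in that year}.

    Returns True when at least half of the years in the trend period have
    at least `min_pct_above` percent of their measurements above the MDL.
    """
    if not pct_below_by_year:
        return False
    years_ok = sum(1 for pct_below in pct_below_by_year.values()
                   if 100.0 - pct_below >= min_pct_above)
    return 2 * years_ok >= len(pct_below_by_year)
```

Passing the screen does not remove the bias; the percent below MDL should still be reported alongside each annual average.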

-------
        Setting  Up  Data  for Trend Analysis

               Creating Valid Annual Averages

     Data averaging is fully covered in Preparing Data for Analysis, Section 4,
     and summarized here for convenience.
      • Subdaily data should first be aggregated to valid 24-hr averages. For a given day,
        75% of data at the expected subdaily sampling duration is suggested for a valid
        24-hr average.
      • 75% of data at the expected daily sampling frequency is suggested for a valid
        calendar quarter average.
         Frequency          75% Quarterly Completeness Cutoff (number of samples)
         Daily              68
         Every 3rd Day      23
         Every 6th Day      11
         Every 12th Day     5
         Unassigned         5

      • At least 58 days are suggested between the first and last sample in a quarter to
        ensure that sampling represents the entire quarter.
      • Data for 3 of 4 quarters are suggested for annual averages prepared from quarterly
        averages to ensure that sampling represents the entire year. Some air toxics
        concentrations show significant seasonal variations.
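The quarterly and annual completeness suggestions can be sketched as below. The cutoff counts are taken from the table on this slide; the dictionary keys, function names, and input layout are assumptions for illustration.

```python
from datetime import date, timedelta

# 75% quarterly completeness cutoffs (number of valid 24-hr averages).
QUARTERLY_CUTOFF = {
    "daily": 68,
    "every_3rd_day": 23,
    "every_6th_day": 11,
    "every_12th_day": 5,
    "unassigned": 5,
}

def quarter_is_valid(sample_dates, frequency):
    """sample_dates: dates of valid 24-hr averages in one calendar quarter."""
    if len(sample_dates) < QUARTERLY_CUTOFF[frequency]:
        return False
    # At least 58 days between the first and last sample in the quarter.
    return (max(sample_dates) - min(sample_dates)).days >= 58

def annual_average(quarterly_averages):
    """quarterly_averages: four entries, with None for an invalid quarter.

    Returns the mean of the valid quarters, or None if fewer than 3 are valid.
    """
    valid = [q for q in quarterly_averages if q is not None]
    if len(valid) < 3:
        return None
    return sum(valid) / len(valid)

# Example: 15 samples taken every 6th day starting January 1 span 84 days.
example_quarter = [date(2005, 1, 1) + timedelta(days=6 * k) for k in range(15)]
```

For the example quarter, `quarter_is_valid(example_quarter, "every_6th_day")` is True because both the sample count and the 58-day span are met.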

-------
        Setting Up Data for Trend Analysis

                      Creating  Valid  Trends

     Trends are investigated for a unique combination of parameter,
     monitoring location, and method code.
      • Initially, it is important to segregate method codes for a given parameter
        and monitoring location to assess differences (e.g., biases, detection
        limits) that might result in comparability issues.  In addition, methods
        may change over time, perhaps causing significant analytical biases that
        may affect trends assessments. After investigating individual trends,
        e.g. by method, further aggregation may be reasonable (discussed later
        in this section).
      • At a given monitoring location, sometimes more than one monitor
        reports the same  pollutant, known as a collocated measurement.  When
        collocated measurements are made, data from each monitor are
        differentiated in AQS using POCs.
         Collocated measurements should be investigated individually as outlined in
         Preparing Data for Analysis, Section 4. If agreement between collocated
         measurements is good, the data may be averaged for a given parameter, site,
         date, and method in order to avoid double-counting. At the national level, these
         data were not used.



-------
          Setting  Up  Data for Trend  Analysis

                 Trend Length and  Completeness
        Length and completeness criteria may be used to ensure that trends are
        representative of the time period of interest and that data are consistent for
        intercomparison among sites.
        When choosing these criteria,  analysts should strive to strike a balance
        between maximizing available data and creating valid trends in the period of
        interest.
        It is easier to discern underlying trends over long time periods.
        More stringent constraints result in a reduction of available data. For
        example, by selecting longer trend periods, fewer sites will be available for
        analysis because longer continuous operation is required. On the other hand,
        shorter trend periods are subject to more variability, for example, because of
        changes in meteorology which often obscure underlying trends.
[Figure: Percentage change per year for n-hexane, 1,3-butadiene, and benzene over three trend periods (1990-2005, 1995-2005, and 2000-2005). Legend: 10th-90th percentile range for each trend period and median % change per year. x-axis: Percentage Change per Year, from decreasing (negative) to increasing (positive).]
In the example, three trend periods were investigated: 1990-2005, 1995-2005, and 2000-2005. Only 17 sites in the United States collected
benzene data over the 1990-2005 sampling period that met the completeness criteria. In contrast, data from 125 sites met the completeness
criteria for the shorter 2000-2005 trend period. Variability for shorter trend periods is much higher.

-------
       Setting  Up  Data for Trend Analysis

            Trend Length and Completeness

      Trend Length
      - One goal of the NATTS is to provide data with a minimum trend
        length of six years to be able to compare two 3-yr averages.
      - Of course, other trend periods are acceptable!
      Trend Completeness
      - Of the number of data years in a trend period, at least 75% is
        suggested for a site to be included (e.g., for a six-year trend
        period, at least five years of valid annual averages are
        suggested).
      - Trends with data gaps of more than two years should not be
        used.
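The trend completeness suggestions can be checked with a short function such as the one below. The function name and arguments are assumptions for illustration. Missing years at the start or end of the period are counted as gaps here, which is one possible reading of the guidance.

```python
import math

def trend_is_valid(valid_years, start_year, end_year,
                   min_fraction=0.75, max_gap=2):
    """valid_years: years with a valid annual average at the site.

    Requires at least `min_fraction` of the years in the trend period
    (rounded up, so 5 of 6 years) and no run of missing years longer than
    `max_gap`.
    """
    n_years = end_year - start_year + 1
    present = sorted(y for y in set(valid_years)
                     if start_year <= y <= end_year)
    if len(present) < math.ceil(min_fraction * n_years):
        return False
    longest_gap = 0
    previous = start_year - 1
    for year in present + [end_year + 1]:
        longest_gap = max(longest_gap, year - previous - 1)
        previous = year
    return longest_gap <= max_gap
```

For a 2000-2005 trend period, five valid years with one missing year passes, while four valid years does not.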

-------
           Setting  Up  Data  for  Trend  Analysis
                     Example -  Creating  Valid  Trends
  This example illustrates why looking at trends by
  method code is important.
   •  Figure (a) shows all annual averages for arsenic PM2.5 at a site,
      color-coded by method. Solid lines indicate annual averages and dashed
      lines show average MDLs.
   •  Figure (b) shows the trend (blue) and average MDL (pink) for all data at
      a site regardless of method (i.e., the same data as in Figure (a)
      connected into one trend). This produces a statistically significant
      increasing trend.
   •  Figure (c) shows the results if data are partitioned by method. Only data
      with method 831 are retained because this method is the only one to have
      a trend period greater than four years. The results show a statistically
      insignificant decreasing trend, opposite the result obtained using all
      data.
   •  Which trend result is "right"?
       - The statistically significant trend in Figure (b) is driven by the
         lower concentration values in 1996-1998. The measured concentrations
         between 1996 and 2000 may be representative of ambient concentrations;
         however, inconsistencies in sampling method and MDLs cast doubt on the
         comparability of these data to post-2000 data.
       - In the end we cannot be sure which trend is "right"; more advanced
         analyses of the data should be undertaken if time permits. At a
         national level, trends could not be individually quality-controlled,
         so they were partitioned by method to reduce inconsistencies.
[Figure: Arsenic PM2.5 Annual Averages, 1994-2006. Panel (a): annual averages and average MDLs by method (Methods 800, 801, 802, and 831). Panel (b): annual average concentration for all data. Panel (c): annual average concentration for Method 831 only. x-axis: Year; y-axis: concentration (ticks from 0.0004 to 0.0016).]

-------
         Setting  Up  Data  for Trend Analysis

          Evaluating the  Effect of Method Changes


      •  Due to the large number of data included in the national air toxics analysis, the effect
        of changes in measurement methods and MDLs on trends could not be assessed on
        a site-by-site basis.
      •  During more localized analyses, such differences may be investigated; not all method
        changes need to be considered separately. Data may be retained across
        comparable method changes  in order to create the longest trend periods possible.
      •  Assessing the comparability of methods will be a case-by-case analysis; no one
        procedure will provide the answer, but the following is a good start:
         -  Plot all available annual averages and associated average MDLs, color-coded by method for
            each air toxic (as in Figure (a) on the previous slide);  tabulate the percent of data below
            detection by year.
         -  Visually assess method changes for unusual patterns in average concentration and MDL.
         -  If MDL changes occur,  investigate the percent of data below detection to determine if MDL/2
            substitutions are driving the difference. Keep in mind the percent of data below detection and
            effect of MDL/2 substitutions for subsequent analyses.
         -  Examine trends in air toxics data that are not expected to change significantly between years
            (e.g., carbon tetrachloride); significant jumps in annual average concentrations for these air
            toxics may indicate a problem.
         -  Compare pollutants measured by the same methods  that are expected to vary together (e.g.,
            benzene and toluene) and look for discontinuities.
         -  Investigate collocated data together, if available. In some cases, a measurement method
            may have changed in the primary monitor, but not in the secondary monitor. Look for
            changes in the relationship in concentrations between the monitors.


-------
            Effect of Changes  in MDL on

                  Trends Assessment

      Another important consideration in preparing data for trend
      analysis is that detection limits can change over time for a given
      monitoring site, parameter, and method.  At a national scale,
      some detection limits change by orders of magnitude.
      These changes may influence annual averages, particularly if
      MDL substitutions are used. Similar trends between MDL and
      annual average concentrations may indicate that the changes in
      MDL are strongly influencing the annual average trends.
      It is recommended that the analyst inspect the trends in MDL in
      addition to the trends in concentration, especially for air toxics with
      concentrations close to the MDL (i.e., within a factor of 10).
      More sophisticated statistical analysis may be needed to quantify
      the underlying influence of the MDL changes on the ambient
      concentrations.  Such analysis has not yet been performed on the
      national data set.
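A first-pass screen for MDL influence might compare the annual average MDLs with the annual average concentrations, as sketched below. This is only an illustration of the inspection suggested above, not the more sophisticated analysis the text refers to. The function names and the correlation threshold of 0.8 are assumptions; the factor of 10 follows the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is flat."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mdl_influence_flags(annual_conc, annual_mdl,
                        near_factor=10.0, r_threshold=0.8):
    """annual_conc, annual_mdl: annual averages in the same year order.

    Flags a trend as suspect when the MDL and concentration series move
    together AND every annual average lies within `near_factor` of its MDL.
    """
    r = pearson_r(annual_mdl, annual_conc)
    near = [c < near_factor * m for c, m in zip(annual_conc, annual_mdl)]
    return {
        "r": r,
        "fraction_near_mdl": sum(near) / len(near),
        "suspect": r >= r_threshold and all(near),
    }
```

A "suspect" flag is a prompt for closer visual inspection of both trends, not a conclusion that the concentration trend is an artifact.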

-------
               Effect of Changes in  MDL  on
            Trends Assessment Example (1 of 2)
[Figure: Average concentration and average MDL by year, 1990-2003. x-axis: Year; y-axis: concentration (ticks include 0.003, 0.0035, and 0.004).]
      In the national level investigation of manganese (Mn) trends, we noted that MDL trends were similar to
      concentration trends. The clear correlation between the two trend lines makes us suspicious of the reliability of
      the overall ambient trend. This example shows average Mn PM2.5 concentrations and MDLs from 1990 to 2003.
      For this data set, Hyslop and White (2007) showed that reported MDLs are much lower than actual detection
      limits. Current recommendations are to be cautious with data within a factor of 6 to 10 of the reported MDL. The
      trend shown here may not be a real trend; these data may all be below detection.

-------
          Effect of Changes in  MDL on

        Trends Assessment Example
[Figure: Benzene 1997-2006 Trend. Regression line: y = -0.02x + 48.63, R^2 = 0.88. x-axis: year (1996-2006); y-axis: concentration.]
       In contrast to the previous Mn PM2.5 trend, this benzene trend
       does not show influence from a change in MDL (i.e., the trends in
       concentration and MDL show different patterns).

-------
                        Quantifying  Trends
                                   Approach
      •  Initial investigation of trends
         - Inspect first and last year of the trend period or two multi-year averages for
           change.
         - Use simple linear regression to determine the magnitude of a trend over the trend
           period.
      •  Quantifying trends
         - The percent difference between the first and last year of the trend period provides
           a rough, first cut, sense of the change.
         - The difference between two multi-year averages provides another measure of
           change and helps smooth out possible influences of meteorology.
         - The percent change per year is provided by the slope of the regression line. This
           "normalized" value allows the analyst to compare changes across varying lengths
           of time (i.e., sites with different trend periods).
      •  Testing the significance of the observed trends
         - Calculate the significance of the slope using the F-test (see next slide). The
           F-test provides a statistical measure of the confidence that there is a relationship
           between  the two variables (i.e., the regression line does not have a slope of zero
           which would indicate that the dependent variable is not related to the independent
           variable).
         - Other methods can be employed to test for significance, including t-tests,
           nonparametric tests (which test for and estimate a trend without making
           distributional assumptions, such as Spearman's rho test of trend and
           Kendall's tau test of trend), and analysis of variance.
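The two rough measures of change listed above (first year vs. last year, and the difference between two multi-year averages) can be computed as in this sketch. The function names and the default of three-year averages are assumptions for illustration.

```python
def percent_difference(first, last):
    """Percent change from `first` to `last`, relative to `first`."""
    return 100.0 * (last - first) / first

def first_last_change(annual_avgs):
    """annual_avgs: {year: annual average}. Percent change, first to last year."""
    years = sorted(annual_avgs)
    return percent_difference(annual_avgs[years[0]], annual_avgs[years[-1]])

def multi_year_change(annual_avgs, n_years=3):
    """Percent change between the first and last `n_years`-year averages."""
    years = sorted(annual_avgs)
    early = sum(annual_avgs[y] for y in years[:n_years]) / n_years
    late = sum(annual_avgs[y] for y in years[-n_years:]) / n_years
    return percent_difference(early, late)
```

The multi-year version is less sensitive to one unusual year at either end of the trend period, which is why it helps smooth out meteorological influences.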

-------
                        Quantifying Trends

            Interpreting Linear Regression Output

            Example output from a linear regression of annual average benzene
            concentrations (performed in Excel) is provided:
            Slope               -0.3943
            Intercept           789.562
            R^2                 0.794456
            F-Statistic         30.92103
            P-value             (not shown)
            % Change            -69.241021
            % Change Per Year   -6.2946382
            Confidence level    99.946575

            This example output shows a decline in annual average benzene
            concentrations over time with 95% confidence and slope not equal to zero.
            The output is interpreted as follows:
             • Slope, intercept, % change, % change per year, R^2. These indicate the slope of the line,
              the y-axis intercept, the % change between the first and last year of the line, the % change
              divided by the number of years, and the fraction of variation accounted for.
             • F-statistic or F-ratio. The F-ratio is used to test the hypothesis that the slope is 0. The F-ratio
              is large when the independent variable(s) helps to explain the variation in the dependent
              variable. Therefore, large F-ratios indicate a stronger relationship between the two
              variables (i.e., the slope of the regression line is NOT zero).
             • P-value. The P-value is the probability of obtaining an F-ratio at least this large when the
              true slope is zero (generally, 95% confidence is used as a cutoff value, corresponding to a
              P-value of 0.05).
            Microsoft Excel and SYSTAT11 are two of many software programs that can
            calculate the F-test.
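For readers without Excel or SYSTAT, the quantities in the example output can be reproduced with a short Python sketch. The function names and the returned dictionary keys are assumptions for illustration. Two choices are made to mirror the example: the % change is taken between the fitted values at the first and last year, and the % change per year divides by the inclusive number of years (-69.24 / 11 = -6.29 in the example). For a single independent variable the F-test is equivalent to a two-sided t-test with F = t^2, so the P-value is computed from the t distribution using its closed-form series for integer degrees of freedom.

```python
import math

def two_sided_t_pvalue(t, df):
    """Two-sided P-value for Student's t with integer df >= 1."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(2, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        inside = s * total
    else:
        term = total = c if df > 1 else 0.0
        for k in range(3, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return max(0.0, 1.0 - inside)

def linear_trend(years, values):
    """Least-squares trend; needs at least 3 years with varying values."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_reg = slope * sxy                  # variation explained by the line
    ss_res = max(syy - ss_reg, 0.0)       # residual variation
    df = n - 2
    if ss_res == 0:
        f_stat, p_value = math.inf, 0.0   # perfect fit
    else:
        f_stat = ss_reg / (ss_res / df)
        p_value = two_sided_t_pvalue(math.sqrt(f_stat), df)
    first, last = min(years), max(years)
    pct_change = 100.0 * slope * (last - first) / (intercept + slope * first)
    return {
        "slope": slope, "intercept": intercept, "r2": ss_reg / syy,
        "f_stat": f_stat, "p_value": p_value,
        "confidence_level": 100.0 * (1.0 - p_value),
        "pct_change": pct_change,
        "pct_change_per_year": pct_change / (last - first + 1),
    }
```

As a consistency check against the example output, an F-statistic of 30.92103 with 8 residual degrees of freedom (the value implied by the reported R^2 and F-statistic) gives a confidence level of about 99.95%, in line with the 99.946575 shown.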

-------
                       Quantifying  Trends
                Statistical Significance Example
[Figure: Benzene Annual Average, 1995-2005, for Site 1 and Site 2 with linear fits. x-axis: Year; y-axis: concentration.
 Site 1: Y = -0.0639X + 128.9, R^2 = 0.72, P-value = 0.002 (significant).
 Site 2: Y = -0.0223X + 46.09, R^2 = 0.056, P-value = 0.6 (not significant).]
       This example shows benzene trends at two sites. Both sites show a linear regression with a negative
       slope, but only Site 1 shows a statistically significant decrease. At Site 2, a decrease in
       concentrations is apparent, but the change is not statistically significant (i.e., failed F-test).

-------
                Visualizing Trends
                         Overview

     Visual inspection of trend data is vital!  A linear fit to a
     trend may not be appropriate; for example, a step
     change may have occurred due to a major emissions
     regulation or a nonlinear or exponential fit may be more
     appropriate.
     Methods for visualizing the data include
      - Line graphs of selected  indicators
      - Box plots (high and low  values, median values, outliers)
      - Plots of mean or median values with confidence intervals
      - Combination of a map and temporal information

-------
                          Visualizing  Trends
                                     Line Graphs
      It is sometimes useful to break a
      long-term trend into shorter time
      intervals because of significant
      changes in emissions. Trends should
      be individually and visually investigated.
      For example, benzene in gasoline was
      significantly reduced in several urban
      areas starting in the mid-1990s when
      reformulated gas (RFG)  was introduced.
      Dramatic reductions were observed in
      ambient benzene concentrations
      over this time period.
      Both plots contain the same data.
      If one trend line is used,  the overall
      trend decreases. If two trend
      lines are segregated by the RFG
      year (1995), the benzene concentrations
      are relatively flat before and after RFG
      implementation.
      In this case, the difference between the
      two time periods may be a better
      quantitative reflection of how benzene
      concentrations have changed.

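[Figure: Benzene Annual Averages, 1992-2000, shown in two panels containing the same data. One panel fits a single regression line to all years; the other fits separate lines before and after 1995. Fitted equations shown: y = -0.2248x + 450.9 (last digit illegible), R^2 = 0.5492; y = 0.0851x - 166.54, R^2 = 0.369; y = 0.1253x - 248.65, R^2 = 0.5038. x-axis: Year; y-axis: concentration.]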
      The figure shows the same benzene annual averages fitted with
      regression lines in two ways. The first fits all data with one regression
      line and the second takes into account a large step change that
      occurred from regulations put into effect in 1995. The figure was
      created in Microsoft Excel.
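The same comparison can be made outside Excel. The sketch below fits one line to all years and separate lines on either side of a break year, and also reports the difference between the two period means. The function names are assumptions for illustration, and assigning the break year itself to the later period is an arbitrary choice that should be matched to when the regulation actually took effect.

```python
def ols(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def split_trend(years, values, break_year):
    """Compare a single fit with separate fits before and from `break_year`.

    Each period needs at least two distinct years.
    """
    before = [(y, v) for y, v in zip(years, values) if y < break_year]
    after = [(y, v) for y, v in zip(years, values) if y >= break_year]
    mean_before = sum(v for _, v in before) / len(before)
    mean_after = sum(v for _, v in after) / len(after)
    return {
        "overall": ols(years, values),
        "before": ols([y for y, _ in before], [v for _, v in before]),
        "after": ols([y for y, _ in after], [v for _, v in after]),
        "difference_in_means": mean_after - mean_before,
    }
```

When the separate slopes are near zero but the overall slope is clearly negative, the difference in means is the more informative summary of the change, as the slide text notes.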

-------
                       Visualizing  Trends
                   Using  Other Statistical Metrics
      We are typically interested in
      air toxics annual average
      trends because the annual
      average is used for
      comparisons to levels of
      concern for chronic health
      effects. Guidelines for
      preparing annual averages
      were provided previously.
      In addition to an annual
      average, other statistical
      indicators can be used to
      verify a trend.
       - These include median,
         maximum, minimum, and
         selected percentiles.
       - These metrics are especially
         helpful in identifying effects of
         censored data below detection.
[Figure: Formaldehyde Annual Averages, 1999-2005. Series: 95th percentile of annual concentration, annual average concentration, annual median concentration, and 5th percentile of annual concentration. x-axis: Year; y-axis: concentration.]
     This figure, showing formaldehyde annual data with various
     statistical measures, demonstrates that the annual pattern
     in concentration is relatively consistent. 2002
     concentrations were low and there is no consistent trend
     over this 1999-2005 time period.
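The additional metrics plotted in the figure can be computed per year with the Python standard library, as sketched below. The function name and input layout are assumptions for illustration, and percentile values depend on the interpolation method, so results may differ slightly from other software.

```python
import statistics
from collections import defaultdict

def annual_metrics(records):
    """records: iterable of (year, concentration) pairs.

    Returns {year: dict with mean, median, p05, and p95}. Each year needs at
    least two samples for the percentile calculation.
    """
    by_year = defaultdict(list)
    for year, conc in records:
        by_year[year].append(conc)
    out = {}
    for year in sorted(by_year):
        data = by_year[year]
        # n=20 gives cut points at 5%, 10%, ..., 95%.
        cuts = statistics.quantiles(data, n=20)
        out[year] = {
            "mean": statistics.mean(data),
            "median": statistics.median(data),
            "p05": cuts[0],
            "p95": cuts[-1],
        }
    return out
```

Plotting these metrics together by year shows whether the mean, the median, and the tails of the distribution all tell the same story.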

-------
                      Visualizing Trends
                                  Box  Plots
        Box plots are another useful
        way to display multiple
        statistical metrics and
        visually assess statistical
        significance.
        Box plots illustrate the
        trends in the high and low
        values, interquartile ranges,
        median, and confidence
        intervals of the annual
        average.
        The box plots displayed
        here are described in
        Characterizing Air Toxics,
        Section 5.
[Figure: Formaldehyde Annual Averages shown as box plots by year. x-axis: Year (1998-2006); y-axis: concentration (0 to 35).]
    The figure shows annual formaldehyde concentrations represented as
    box plots. The variability is similar from year to year since the boxes
    for each year are about the same height.  Concentrations in 2002 were
    statistically significantly lower than in other years because the
    confidence intervals do not overlap any other year.

-------
                        Visualizing  Trends
                       Using  Confidence Intervals
Confidence intervals (CIs) are shown
around the annual averages for
several years of data.
Since the plotted CIs overlap in 1999
and 2001 but not in 2000 and 2001,
1999 and 2001  concentrations are
not significantly different, but 2000
and 2001 concentrations are
significantly different.
CIs are a function of sample size, with
fewer samples resulting in larger CIs.
Air toxics data sets are typically small
(i.e., only a few samples per month);
thus, CIs help analysts understand the
range in which the annual mean
concentration can statistically fall.
[Figure: Formaldehyde Annual Average. Annual average concentration by year, 1998-2006; error bars represent 95% confidence intervals.]

CI is computed as follows:

$$\bar{x} \pm z^{*} \frac{\sigma}{\sqrt{n}}$$

where $\bar{x}$ is the mean value, $\sigma$ is the
standard deviation, n is the number of
samples, and z* is the upper (1-C)/2 critical
value (use a look-up table for the %
required) for the standard normal
distribution.
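The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the workbook: the function names are hypothetical, the sample standard deviation is used as a stand-in for the standard deviation in the formula, and the critical value comes from the standard normal distribution rather than a look-up table. For very small sample counts a t-based critical value may be more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Return (lower, upper) bounds of the CI around the mean of samples.

    Assumption: the sample standard deviation approximates sigma.
    """
    n = len(samples)
    x_bar = mean(samples)
    sigma = stdev(samples)
    # z* is the upper (1 - C)/2 critical value of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_star * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
```

Two annual averages whose intervals do not overlap (intervals_overlap returns False) would be read as significantly different, following the visual rule described above.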

-------
                          Visualizing  Trends
                         Including Underlying Data
       In this example, a trend for each parameter, site, and
       method was plotted next to the underlying data. The
       figures show annual averages with standard
       deviations in blue and average MDLs in pink. The
       underlying data include the average MDL, the percent
       below MDL by year, the calculated regression and
       F-value statistics, and the percent change per year.
       Figure (a) is an example of a benzene trend for the
       1995-2005 trend period. In the plot, we can see that
       data are mostly above detection and show a
       statistically significant decreasing trend of about
       5% per year.
       Figure (b) shows arsenic PM2.5 data. Calculations
       indicate a statistically significant increasing trend of
       20% per year. If these statistics were used alone,
       they would indicate a serious arsenic problem at this
       site. When the underlying data are examined though,
       it is clear that there may be other factors to consider.
       The first two years of data are 100% below detection,
       resulting in values that are entirely MDL/2-substituted.
       The values for these years may, in fact, be
       significantly lower and should not simply be
       discarded; we cannot tell from the current data. This
       trend should be considered suspect and validated by
       comparison with neighboring sites; the summary
       statistics should not be trusted as accurate values.
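To see why fully censored years deserve caution, the sketch below computes an annual average with MDL/2 substitution together with the percent of samples below the MDL. It is only an illustration: the function name and the (concentration, MDL) input layout are assumptions, as is the choice to treat any value below its MDL as a nondetect.

```python
from statistics import mean


def annual_summary(samples):
    """Summarize one year of data.

    samples: list of (concentration, mdl) tuples (hypothetical layout).
    Returns (annual average after MDL/2 substitution, percent below MDL).
    """
    # Replace each value below its MDL with MDL/2.
    substituted = [c if c >= mdl else mdl / 2 for c, mdl in samples]
    n_below = sum(1 for c, mdl in samples if c < mdl)
    return mean(substituted), 100.0 * n_below / len(samples)
```

When a year is 100% below detection, the returned average is determined entirely by the MDL, so a change in the MDL between years can look like a change in concentration.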
[Figure: (a) benzene and (b) arsenic PM2.5 trend plots shown next to spreadsheet views of the underlying data. Spreadsheet columns include year, average value, standard deviation, percent below MDL, average MDL, slope, intercept, and percent change.]


-------
                          Visualizing  Trends
           Calculating  Trend Period Percent Change
There are many methods for calculating trend-period
percentage change.  Four such methods are listed below
along with the associated percentage change that would
result from applying each method to the benzene data
pictured at right:
 1.  Using the first and last measured data point (-40.43%).
 2.  Using the regression equation (-57.12%).
 3.  Using all values before and after a step change (-55.29%).
 4.  Using three-year averages before and after a step change (-53.71%).
In method 1, there is no sense of the underlying pattern
for all years of interest, and the results are affected by
the differences in the meteorology of the chosen years.
Method 3 is a better measure of the percentage change
because it isolates the two data points having the most
impact on the overall trend, but requires visualizing the
data first.
Methods 2 and 4 use values that are weighted by more
years of data within the trend period, providing more
smoothing of variability from meteorological fluctuations.
There is no right method for calculating trend results, but
knowledge of possible biases of each is important when
deciding which to use.
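The four methods can be sketched as follows. This is an illustrative reading rather than the workbook's own procedure: the function names are hypothetical, the step-change year is assumed to be supplied by the analyst after viewing the data (step_index is the position of the first year after the step), and method 4 is interpreted here as the three years immediately before and after the step.

```python
from statistics import mean


def percent_change(start, end):
    """Percentage change from start to end."""
    return 100.0 * (end - start) / start


def first_last_change(values):
    """Method 1: first and last measured annual averages."""
    return percent_change(values[0], values[-1])


def regression_change(years, values):
    """Method 2: change between least-squares fitted first and last years."""
    mean_x, mean_y = mean(years), mean(values)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    fitted_start = mean_y + slope * (years[0] - mean_x)
    fitted_end = mean_y + slope * (years[-1] - mean_x)
    return percent_change(fitted_start, fitted_end)


def step_change_all(values, step_index):
    """Method 3: mean of all years before vs. all years after the step."""
    return percent_change(mean(values[:step_index]), mean(values[step_index:]))


def step_change_three_year(values, step_index):
    """Method 4 (assumed reading): three years before vs. three years after."""
    before = values[max(0, step_index - 3):step_index]
    after = values[step_index:step_index + 3]
    return percent_change(mean(before), mean(after))
```

Running all four on the same series makes the spread between methods visible, which is a quick way to judge how sensitive a reported percentage change is to the method chosen.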
[Figure: Benzene Annual Averages, with regression line y = -0.2248x + 450.95.]
-------
               Summarizing Trends
                         Overview

      Investigate trends among sites by pollutant.
      - Similar trend results among sites make a compelling
        argument that change on a larger spatial scale has occurred.
      Characterize the spatial distribution of trends by
      showing trends at each site on a map.
      - Trends may not agree nationally in direction or magnitude but
        may show spatial patterns of interest.
      Characterize the distribution of individual site trends by
      displaying the range of percentage change per year
      over various trend periods and for all sites meeting
      minimum trend criteria.

-------
                         Summarizing  Trends
                                Trends Among Sites
      Site-level trend investigation is vital!
      The figures show site-level trends for benzene from
      two U.S. sites; average MDLs are plotted in pink for
      reference.
      The top figure shows a statistically significant
      decreasing trend, while the bottom figure shows a
      statistically insignificant decreasing trend.
      Confidence in these results is high. The data are
      mostly above detection, MDLs are consistent for the
      whole trend period, and no outliers appear to
      influence the trend.
      If any of these problems do exist, the underlying
      trend data should be evaluated more carefully to
      understand the reliability of the trend.
      Next steps in investigating suspect trends
       - If one or more annual averages are outliers, revalidate
        the underlying data.  Is one high concentration
        event the cause, or is there a distribution of high values?
        Is there an explanation for the high annual average to
        prove it valid (e.g., increased local source emissions) or
        in error (e.g., unit conversion error)?
       - If MDL changes occur and
          • A low percentage of data is below detection, the change in MDL
           should not have a noticeable effect.
          • A high percentage of data is below detection, there is
           decreased confidence in the trend. If MDL/2 substitution is used,
           check that the trend does not follow the same shape as the MDL
           changes; if it does, the trend is likely unreliable.
       - If a high percentage of data is below detection without an
        MDL change, the central tendency of the data may still be
        accessible, but there is lower confidence in the trend.
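One way to screen for a trend that follows the MDL is to correlate the annual averages with the annual average MDLs for sites where most data are below detection. The sketch below is an assumption-laden illustration, not a workbook method: the function names and both thresholds (50% below detection, correlation of 0.8) are hypothetical and would need to be chosen by the analyst.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def trend_tracks_mdl(annual_avgs, avg_mdls, pct_below,
                     pct_threshold=50.0, r_threshold=0.8):
    """Flag a trend whose shape may be driven by MDL changes.

    Thresholds are illustrative assumptions, not recommended values.
    """
    mostly_below = sum(pct_below) / len(pct_below) >= pct_threshold
    return mostly_below and pearson_r(annual_avgs, avg_mdls) >= r_threshold
```

A flagged site is only a candidate for closer inspection; the plot of annual averages against MDLs should still be reviewed by eye.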
[Figure: A statistically significant decreasing benzene trend. Annual average and average MDL by year; regression line y = -0.16x + 314.62, R^2 = 0.90.]
[Figure: A statistically insignificant decreasing benzene trend. Annual average and average MDL by year; regression line y = -0.01x + 26.18, R^2 = 0.03.]


-------
                     Summarizing  Trends
               Example - Spatial Distribution 
-------
                   Summarizing Trends

              Example -  Spatial Distribution (2 of 2)
[Figure: Map of site-level percentage change per year for the 2000-2006 trend period, chromium PM2.5. Legend: change per year (symbol size scale); increasing; decreasing; increasing, insignificant; decreasing, insignificant.]
        This example shows chromium PM2.5 concentrations across the United States from 2000 to
        2006. The statistically significant trends are spatially distinct, indicating increasing
        concentrations in the eastern half of the country and decreasing concentrations in the West.

-------
                             Summarizing   Trends
              Example - Percentage  Change  per  Year
    We are typically interested in how a pollutant trend at a
    site compares to other sites. Summarizing the data in this
    way provides a succinct national perspective.
    The bar chart summarizes trends in % change per year
    for selected mobile source air toxics for 2000-2005 data.
    The 10th, 50th, and 90th percentiles of site-specific
    percentage change per year are plotted. The number of
    sites included in percentile calculations is also provided.
    A range of results is seen across the network (i.e., 10th to
    90th percentile sites); however, most sites are
    experiencing declines of a few % per year with
    remarkable consistency (see median); "outlier" (e.g., 95th
    percentile) sites may be candidates for additional
    investigation.
    1,3-butadiene and styrene show a wider range of %
    changes by site. The median U.S. monitoring site,
    however, shows a trend of about -5%, in agreement with
    the other mobile source air toxics.
    Benzene and toluene show similar ranges in % change
    per year and less variability in trends across the U.S. than
    1,3-butadiene and styrene.
    Toluene is decreasing at 90% of sites by about 2% to
    12% per year, while benzene is decreasing at most sites
    and may be increasing at some sites.
    The map shows the site-specific % change values for
    benzene used in the bar chart, similar to the proportional
    maps shown previously. The magnitude of the change
    per year is characterized by the size of the arrow.
    Information as to whether the trend was statistically
    significant  is indicated by the color of the arrow.
    Comparing data summaries, such as the bar chart, to
    more detailed plots, such  as the map, offers an overview
    of the data. The map shows the spatial distribution of
    data included in the summary statistics.  For example,
    benzene is increasing in some areas of the United States,
    but none of the trends are statistically significant. Many of
    the decreasing trends, on the other hand, are statistically
    significant.
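The percentile summary behind the bar chart can be sketched as below. The function name and input (a list of site-specific percentage changes per year for one pollutant) are assumptions for illustration, and the interpolation method is one of several reasonable choices.

```python
from statistics import quantiles


def percentile_summary(site_changes):
    """Return the 10th, 50th, and 90th percentiles of site-level % change.

    site_changes: percentage change per year at each site (needs >= 2 sites).
    """
    # n=10 yields the nine decile cut points; "inclusive" treats the data
    # as spanning the full range of observed site values.
    deciles = quantiles(site_changes, n=10, method="inclusive")
    return {"p10": deciles[0], "p50": deciles[4], "p90": deciles[8],
            "n_sites": len(site_changes)}
```

Reporting n_sites alongside the percentiles mirrors the chart, where the number of sites behind each bar is shown.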
[Figure: Bar chart of the 2000-2005 10th-90th percentile range and median percentage change per year (axis from -20 to 20, decreasing to increasing) for styrene, toluene, benzene, and 1,3-butadiene. Number of sites shown beside the bars: 84, 119, 125, 77.]
[Figure: Map of site-specific percentage change per year for benzene. Legend: decreasing trend, significant; decreasing trend, non-significant; increasing trend, non-significant; each in size classes of 0-8%, 8-25%, and 25-100% per year.]

-------
                    Aggregating Trends to

                    Larger Spatial  Regions

      • Aggregated trends for larger spatial regions, such as trends by state or
       EPA Region, may be of interest to communicate results at a "big picture"
       level to interested stakeholders.
      • Previous examples provide approaches to handling data at an aggregate
       level below the national scale, including
       summarizing percent change by year, using central tendency statistics,
       and plotting results on a map.
      • As data sets become smaller—i.e., the analyst looks at fewer sites and
       fewer years—gaps in the data record become more important.  For
       example, some site-level trend periods may meet the minimum criteria
       but will still have gaps in the  data. Problems arise when, in combining
       data sets, a site, especially one measuring high or low concentrations,
       has missing data during some time periods.
      • To handle these data gaps, the following steps are recommended.
        - For general site-level analyses, these gaps should be left as-is.
        - While not done at a national level, when aggregating to larger spatial regions,
           data gaps could be filled in, using the following methods, to be consistent with
           current trends analyses performed for criteria pollutants:
           •  Missing the last year: set the missing year equal to the second-to-last year.
           •  Missing the first year: set the missing year equal to the second year.
           •  Missing any other year:  interpolate between the adjacent two years.
           •  No more than two years in succession can be missing (this was applied in the national analyses).
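The gap-filling rules above might be coded as in the following sketch. The function name and the use of None for a missing annual average are assumptions. Two further assumptions cover cases the rules do not spell out: a missing year at either end takes the nearest available value, and an interior gap of two years is filled by linear interpolation between the nearest available years.

```python
from typing import List, Optional


def fill_gaps(values: List[Optional[float]], max_run: int = 2) -> List[float]:
    """Fill missing annual averages (None) in a chronological series."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no annual averages available")

    # Reject series with more than max_run consecutive missing years.
    run = 0
    for v in values:
        run = run + 1 if v is None else 0
        if run > max_run:
            raise ValueError("too many consecutive missing years")

    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:          # gap at the start: copy next available year
            filled[i] = values[nxt]
        elif nxt is None:         # gap at the end: copy last available year
            filled[i] = values[prev]
        else:                     # interior gap: linear interpolation
            frac = (i - prev) / (nxt - prev)
            filled[i] = values[prev] + frac * (values[nxt] - values[prev])
    return filled
```

Site-level analyses would skip this step entirely, since the guidance above is to leave gaps as-is at that level.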

-------
                  Aggregating Trends
                Example - Using Line  Graphs
       Line graphs can be used to
       assess trends in selected
       indicators.
       National benzene trends
       (annual average
       concentrations) from 2000-
       2005 are summarized in the
       graph. Sites included in the
       summary are shown in the
       inset map.  These types of
       summary displays are useful for
       showing general trends across
       multiple sites, such as the
       national summary shown here.
[Figure: Line graph of national benzene annual average concentrations, 2000-2005, with lines marking the concentrations below which 90 percent and 10 percent of sites fall. 2000 to 2005: 17% decrease. Inset map shows sites by monitoring network (NATTS, UATMP, Other), including Puerto Rico.]
Line graph figures were created with Grapher; maps were produced in ArcMap.


-------
                         Accountability
                            Overview (1 of 2)
     • The term accountability in this section is used to refer to tying annual
       trends in pollutant concentrations to control programs.
     • Changes in air quality may be due to a number of factors.  Trends in air
       quality can provide evidence that local, regional, or federal emissions
       controls have successfully reduced ambient concentrations of pollutants
       harmful to human health.
     • Analysis should bring as much information as possible to bear on the interpretation of
       trends, including evaluation of other potential sources of the compound in question,
       regulations, and meteorological influences that may impact emissions.
     • The evaluation of the impacts of regional control programs (those that
       affect multiple states) and local control programs (those that affect an
       urban area) on air quality is complicated, stepwise, and site- and
       pollutant-specific.
     • A major challenge in this type of analysis is the scale of influence of a
       control and  of the impact of that control on air quality. Previous
       investigations of ambient air quality changes encountered the confounding
       influences of multiple controls applied within similar time frames and at
       different spatial scales.

-------
                      Accountability
                         Overview (2 of 2)
      Use caution - Matching trends to changes in emissions is not
      sufficient to prove that an emission change actually caused the
      ambient change.
      Emissions regulations are typically phased in over a period of
      years, causing a gradual change in ambient concentrations; other
      factors such as meteorology, local source profiles, and MDL
      changes may also explain changes. The use of supplementary
      data (e.g., investigating trends in a pollutant not expected to be
      influenced by the emission change) is necessary to be sure
      observed changes are truly emissions-related.
      Two approaches to a trends accountability analysis can be  taken
      depending on the availability of information: an emission control
      approach (bottom up) and an ambient data approach (top down).

-------
                            Accountability
                          Bottom-Up Approach
      •  Select a control measure.
      •  Identify the air toxics expected to be affected and the available data, other controls
        that might have affected the pollutants, and other pollutants that may have been
        affected.
      •  Consider the spatial scale, or zone of influence (ZOI), of the control measure. Was
        the control applied at a single facility (monitor-specific or fence line), at an urban
        scale (MSA-wide), national scale (e.g., 49-state automobile emission rules), or
        global scale (e.g., Montreal Protocol)?
      •  Determine the timing and magnitude of the changes. Was the control phased in
        over a period of time or applied to specific emitters?  Phasing in a control makes it
        more difficult to discern the relationship between the ambient concentration
        change and the control change.
      •  Consider the magnitude of the expected air quality changes relative to the
        variability in the ambient data.  If the inherent variability in the ambient data is very
        large, a small change in emissions may not be observable.
      •  Select the appropriate statistical metrics or approach for the analysis. Data
        treatments may help reduce the variability in the data so that trends can be
        observed.
      •  Develop hypotheses of expected changes, identify supporting evidence of
        changes, and investigate corroborative evidence of the changes. It is often helpful
        to test for changes in data sets or pollutants in which changes were not expected
        (i.e., check the null  hypothesis).


-------
                           Accountability
                          Top-Down Approach
      •  Quantify the change observed in the ambient data. This approach can also be
        applied to a pollutant for which a change was expected but not observed.
      •  Identify and assess other data sets and sites that may have also been affected by a
        similar control measure or emission change to understand the spatial scale of the
        ambient change. If the control was applied across a broad area, changes at
        additional sites might be expected.
      •  Identify potential emissions changes or control measures that could have
        contributed to the ambient trends. Local knowledge is often a key component of this
        part of the analysis.
      •  Compare  the control measure implementation schedule with the ambient trends. Do
        the timing of the control implementation and the change in ambient concentrations
        coincide?
      •  Investigate corroborative evidence of the change and test for changes in pollutants
        for which a change was not expected.  It is important not to over-interpret changes
        in ambient data.
    Once methods have been developed for air toxics, it may be useful to apply
    meteorological adjustments to the pollutant trend. The goal is to reduce the effect of
    meteorology on ambient concentrations so that the underlying trend in emissions can be
    more readily observed. The impact of meteorology is critical when trying to assess the
    trend in toxics that are formed secondarily in the atmosphere in addition to being emitted
    directly from sources (e.g., formaldehyde). Meteorological adjustments for air toxics have
    not yet been developed.


-------
                       Bottom-Up  Example
         Tetrachloroethene Controls in  Los Angeles
[Figure: Tetrachloroethene concentration by year, 1991-2007, at Burbank, North Main Street (Los Angeles), and Long Beach, with the 1-in-a-million cancer risk level marked and an annotation for the local rule to phase out emissions completely by 2020.]

        Tetrachloroethene is the chemical most widely used by the dry cleaning industry, with over 85% of facilities using it
        as the primary cleaning agent. In 1993, the EPA promulgated technology-based emissions standards to control
        tetrachloroethene emissions from dry cleaners.
        The MACT standards implemented in 1993 resulted in drastic reductions in tetrachloroethene concentrations in the
        Los Angeles area where monitoring data have been available from three sites since 1992.
        Trend lines show the reductions over time in average ambient concentrations.  Although concentrations in the Los
        Angeles area are still above the cancer risk level of concern, exposure to this air toxic has been reduced by about
        80% in the past 15 years.  In addition, the local South Coast Air Quality Management District implemented a rule
        to phase out tetrachloroethene emissions completely by 2020.

-------
                       Bottom-Up  Example
         Ozone Precursor Controls in Baltimore,  MD
[Figure: Change by year, 1994-2007 (axis from -80 to 20), for VMT (labeled 27%), benzene, and toluene, with reformulated gasoline (RFG) implementation dates marked: Phase I simple, Phase I complex, and Phase II complex.]

        Air toxics that are emitted by motor vehicles, such as benzene and toluene, are significant contributors to ozone
        formation.  Reformulated gasoline (RFG) was introduced in the United States in phases to reduce motor vehicle
        emissions of benzene and other ozone precursors in order to reduce ambient ozone concentrations.
        Benzene and toluene concentrations decreased after the 1995 implementation of RFG despite an increase in the
        number of vehicle miles traveled by cars and trucks in the Baltimore area.
        The largest part of the decreases in benzene and toluene concentrations is directly attributable to the
        implementation of RFG; the steadier change of a few percent per year observed in later years is likely due to
        fleet turnover (i.e., newer cars with lower emissions replacing older, more polluting vehicles).

-------
          National  Level  Top-Down  Example

                                      Method

      •  The hypothesis is that if pollutants are emitted by the same source, emissions should
        covary over long  time scales. In other words, trends should be parallel if normalized.
      •  At a national level, the goal was to identify covariant trends in MSATs as an indicator of
        sites dominated by mobile source emissions.
      •  Site-specific trends for six MSATs (benzene, 1,3-butadiene, toluene, ethylbenzene,
        o-xylene, and m-&p-xylenes) were investigated using carbon tetrachloride as a control.
      •  Trends were normalized by the maximum annual average concentration within the trend
        period by site and pollutant (i.e., annual average concentrations each year were divided
        by the highest annual average in the time period for each pollutant and at each site).
        Normalization creates a data set that is easier to compare across sites and pollutants
        and shows the relative change in concentration.
      •  Linear regression was used to create trend lines for each pollutant.
      •  The sites were visually grouped into various categories by  the behavior of pollutant
        trends. For example, if all MSAT trends had a similar slope, we expect the change in
        concentration at that site to be a consequence of mobile source reductions. If one MSAT
        exhibited a very different slope than the others, we would conclude that another source of
        that pollutant impacting the site was likely.
      •  For this analysis, only the site and parameter were required to be consistent over the
        trend period (method and POC were allowed to float between years). Sites with more
        than five annual averages were included.
      •  Sites were then investigated using Google Earth to see if our hypotheses were correct.
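The normalization and regression steps in the method above can be sketched as follows. The function names and the dictionary layout (pollutant name mapped to a list of annual averages at one site) are assumptions for illustration; the visual grouping of sites and the Google Earth check are not represented.

```python
def normalize_by_max(annual_avgs):
    """Divide each annual average by the highest one in the trend period."""
    peak = max(annual_avgs)
    return [v / peak for v in annual_avgs]


def regression_slope(years, values):
    """Least-squares slope of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


def normalized_slopes(years, site_data):
    """Slope of the normalized trend for each pollutant at one site.

    site_data: dict of pollutant name -> annual averages (hypothetical layout).
    """
    return {pollutant: regression_slope(years, normalize_by_max(avgs))
            for pollutant, avgs in site_data.items()}
```

Comparing the returned slopes side by side gives a numerical counterpart to the visual check: MSAT slopes that are close to one another, with a clearly different slope for carbon tetrachloride, would be consistent with a mobile source signature.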


-------
         National  Level  Top-Down  Example
                                   Output
    Example output from one
    site illustrates the results
    of this analysis:
     •  Due to normalization,
       maximum values are
       always = 1.
     •  The slopes of the
       MSATs are close to
       parallel.
     •  Carbon tetrachloride's
       slope (dashed line) is
       very different (flatter)
       than the MSATs.
[Figure: Normalized annual average concentration (0 to 1) by year, 1988-2008, with data points and linear trend lines for benzene, ethylbenzene, m-&p-xylene, o-xylene, toluene, 1,3-butadiene, and carbon tetrachloride.]

-------
          National  Level  Top-Down  Example
                          Normalized Site-Specific Regression Lines
 At the monitor in the top
 example, all MSATs show
 a similar declining slope.
 Investigation of the
 monitoring location
 indicates that this site is
 primarily mobile source-
 dominated (it is located
 very near a major
 freeway).

 The second example
 shows similar slopes for
 all MSATs except
 1,3-butadiene and
 benzene. Benzene shows
 a much slower decline in
 concentration than the
 other MSATs while
 1,3-butadiene shows a
 slightly faster decline.
 This monitor is  located
 near a large refinery with
 both benzene and 1,3-
 butadiene emissions
 which may explain this
 divergent behavior.
[Figure: Normalized site-specific regression lines at two monitors for benzene, ethylbenzene, m-&p-xylene, o-xylene, toluene, carbon tetrachloride, and 1,3-butadiene.]


-------
        National Level Top-Down Example

   Spatial Characterization of Trend Profile "Signatures"

    • Visual inspection of the slopes of trends provides useful
      information on the covariance of pollutant concentrations over time.
    • The percentage change in concentrations per year can also be
      plotted on maps for each pollutant shown in the scatter plots to
      spatially investigate the trend profiles.
    • Mobile source signatures have MSAT profiles of similar
      magnitudes; other signatures have increasing or varying
      magnitudes among the pollutants.
[Figure: Example trend profile signatures (mobile source, 1,3-butadiene, benzene, and noncovariant), each showing the percentage change per year for 1,3-butadiene, ethylbenzene, m-&p-xylene, o-xylene, toluene, benzene, and carbon tetrachloride.]


-------
         California: Mobile Source Signatures

      Most California profiles are flat (i.e., similar magnitude trend for each
      MSAT), indicating the relative dominance of mobile source emissions on
      these sites.

      Also note that carbon tetrachloride is not an MSAT and should not covary
      with the others (which it does not).

      [Figure: map of California counties with site-level trend profiles for
      1,3-Butadiene, Ethylbenzene, M&P-Xylene, O-Xylene, and Toluene. Map legend:
      Mobile Source Signature; Mobile Source Signature (MDL issues for 1);
      Mobile Source with Slower 1,3-Butadiene Decline; Mobile Source with Slower
      Benzene Decline; Non-covariant; Other]

-------
        National Level Top-Down  Example

                               Summary

     •  The top-down approach is a useful way to investigate site-level trends of
       pollutants commonly emitted by the same source.
     •  Most sites in the United States conformed to our expected mobile source
       trend profile.
     •  The technique also allows identification of sites at which trends do not
       conform to expectations.  For example, two mobile source-like signatures
       were identified at most of the remaining sites:
        - 1,3-butadiene signature sites showed shallow or increasing 1,3-butadiene
          trends (possible measurement issues?).
        - Benzene signature sites showed shallow or increasing benzene trends (likely
          explained by nearby point-source emissions at some sites, but the cause was
          not clear at others).
     •  Some sites showed increasing trends or noncovariant trends in multiple
       MSATs. Nearby emissions sources may be influencing trends at these
       sites, and they may be good candidates for case study analyses of other
       emissions sources.
     •  The top-down approach may be applicable to other pollutants from mobile
       sources (CO, NOx, black carbon) or other emissions sources of multiple
       co-emitted pollutants.


-------
        Meteorological  Adjustment of Air Toxics

                        Introductory Thoughts

     • Meteorology can impact air quality.
        - Meteorology can vary significantly among years (e.g., El Niño) and can have a
          considerable effect on air quality.
        - To understand changes in air quality that are attributed to emission controls, we
          need to be able to adjust the data to account for meteorological conditions that
          were very different from average conditions.
        - By properly accounting for the portion of the variability in the data attributable to
          changes in meteorology, we can compare air quality among years with widely
          different meteorological conditions.
        - This assessment is important because we do not have control over
          meteorological changes.
     • Meteorological adjustment of air toxics data is still being explored.
     • Meteorological adjustment will likely be applied at the site level, and each
       site and pollutant will need to be treated separately.
     • In preliminary investigations, meteorology accounted for 15-25% of total
       variability for benzene and lead (TSP) at selected sites; meteorological
       adjustments smoothed trends; and meteorological adjustment
       appeared to be important for interpretation of trends in benzene and lead
       (TSP) and may be important for other air toxics as well. More investigation is
       needed to finalize an approach for meteorological adjustment.
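
     As a rough illustration of how the share of variability attributable to
     meteorology might be estimated, the sketch below regresses concentration on a
     single meteorological variable for one site and pollutant. It is not the
     method used in the preliminary investigations, which the workbook does not
     specify; the choice of one predictor (wind speed), the function name, and all
     data values are assumptions made up for the example.

```python
from statistics import mean

def met_adjust(conc, met):
    """Ordinary least squares of concentration on one meteorological variable.

    Returns (r_squared, adjusted): r_squared is the fraction of concentration
    variability explained by the meteorological variable, and adjusted is the
    concentration series with the meteorology-driven departure from average
    conditions removed. Assumes met is not constant.
    """
    c_bar, m_bar = mean(conc), mean(met)
    sxy = sum((m - m_bar) * (c - c_bar) for m, c in zip(met, conc))
    sxx = sum((m - m_bar) ** 2 for m in met)
    syy = sum((c - c_bar) ** 2 for c in conc)
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    # Shift each value to what it would have been under average meteorology.
    adjusted = [c - slope * (m - m_bar) for m, c in zip(met, conc)]
    return r_squared, adjusted

# Hypothetical annual means: benzene (ug/m3) and wind speed (m/s).
benzene = [2.0, 1.9, 1.5, 1.8, 1.4, 1.3]
wind = [2.0, 2.1, 3.0, 2.0, 2.9, 3.0]
r2, benzene_adjusted = met_adjust(benzene, wind)
print(f"Fraction of variability explained: {r2:.2f}")
```

     The adjusted series keeps the same overall mean as the original, so trends
     computed from it remain in concentration units.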


-------
                    Resources
          Tools Available for Trend Analysis
   • Examples in this section were created with
      - ArcInfo and ArcView
      - SYSTAT
      - Grapher
      - Microsoft Excel
   • Air toxics guidance
      - http://www.epa.gov/ttn/fera/risk_atra_main.html
   • Computing 95% upper confidence limit (95%
     UCL) for use in risk assessment
      - ProUCL 4.0 available at
        http://www.epa.gov/nerlesd1/tsc/software.htm


-------
                         Trends   Summary
       • Setting up data for trends analysis.
           - Acquire and validate data. See Preparing Data for Analysis, Section 4, for a complete
             discussion.
           - Identify censored data. Separate data at or below detection for each parameter, site and
             method.
              • Count the number of occurrences by value. Do the values indicate a specific substitution method?
              • Make scatter plots of data below detection vs. the detection limit for each value.  The slope of the line
                will  indicate the denominator if MDL/x substitutions were used, even if alternate MDLs are available.
           - Treat data below detection.
              • If uncensored values are used, include them "as is".
              • If censored values are used, substitute MDL/2 or use a more sophisticated method as appropriate.
              • If a mixture of censored and uncensored data is used, compare results from substituting all
                below-detection values with results from substituting only the censored values. If the results
                do not agree, more advanced methods to treat data below detection may be necessary.
           - Calculate valid annual averages. See Preparing Data for Analysis, Section 4, for a complete
             discussion.
           - Create valid trends.
              • Segregate trends by parameter, site and method.
              • Consider and apply trend completeness criteria depending on data needs.
                  -  Minimum trend length of 6 years
                  -  75% yearly completeness within trend period
                  -  Data gaps longer than 2 years not allowed
              • Consider yearly aggregated percent of data below detection.
                  -  Look at all data regardless of percent below detection
                  -  Remove trends where more than half of the years have less than 15% of data above detection
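
       Two of the setup steps above lend themselves to a short sketch: estimating the
       denominator of an MDL/x substitution from the slope of reported values vs. the
       MDL, and screening a site's years against the trend completeness criteria. The
       function names and data are hypothetical, and reading "75% yearly
       completeness" as "at least 75% of the years in the trend period have a valid
       annual average" is an assumption.

```python
def substitution_denominator(reported, mdl):
    """Estimate x in an MDL/x substitution from below-detection records.

    Fits a line through the origin to reported value vs. MDL; the slope is
    1/x, so x is returned as 1/slope.
    """
    slope = sum(r * m for r, m in zip(reported, mdl)) / sum(m * m for m in mdl)
    return 1.0 / slope

def trend_is_valid(valid_years, min_length=6, min_fraction=0.75, max_gap=2):
    """Apply the trend completeness criteria to years with valid annual averages."""
    years = sorted(set(valid_years))
    if not years:
        return False
    span = years[-1] - years[0] + 1              # length of the trend period
    if span < min_length:
        return False
    if len(years) / span < min_fraction:         # share of years that are valid
        return False
    # Number of missing years between each pair of consecutive valid years.
    gaps = [later - earlier - 1 for earlier, later in zip(years, years[1:])]
    return all(gap <= max_gap for gap in gaps)

# Hypothetical below-detection records that were reported as MDL/2.
x = substitution_denominator([0.05, 0.10, 0.025], [0.10, 0.20, 0.05])
print(f"Estimated denominator: {x:.1f}")

# Hypothetical site with a two-year gap in an otherwise complete ten-year period.
print(trend_is_valid([2000, 2001, 2004, 2005, 2006, 2007, 2008, 2009]))
```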



-------
                          Trends  Summary

        •  Quantifying Trends
            -  Magnitude of change
                   Use simple linear regression to calculate first and last year values to determine the percent change over the trend period.
                   Calculate percent change per year for intercomparison of trend periods.
            -  Significance of change
                   Quantify the statistical significance of the slope using the F-test.
                   Typically, a trend is considered significant at or above the 95% confidence level.
            -  Visualize trends; always include annual percent below detection as a measure of
               uncertainty.
                   Line graphs
                   Box plots
                   Spatial representations
            -  Summarize trends
                   Characterize the distribution of percentage change per year for all sites and investigate mean, median and percentiles.
                   Characterize the spatial distribution of the percentage change per year.
                   Look for consensus in results among methods.

        •  Accountability - tie annual trends to control programs
            -  Acquire background information on  control programs; compare this information to site-level
               metadata keeping in mind local sources, site location etc.
                   Implementation date or time period
                   Pollutants affected and expected magnitude of reduction
                   Types of sources affected
            -  Acquire emissions inventory data
                   Toxics release inventory data (TRI) (does not include mobile source emissions!)
                   National emissions inventory data (NEI)
            -  Compare ambient data to emission  inventories and control programs; correlation is not
               enough to prove causation
                   Compare similar pollutants that should experience concentration reductions resulting from the control programs.
                   Compare similar pollutants that should NOT experience concentration reductions from the control programs.
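
        The magnitude and significance steps under "Quantifying Trends" above can be
        sketched as follows. This is a minimal illustration, not the workbook's
        implementation: the function name and data are hypothetical, percent change
        per year is assumed to be the total percent change divided by the number of
        years elapsed, and significance is judged by comparing the regression F
        statistic with tabulated 95% critical values for 6 to 12 annual averages.

```python
# Critical F values at the 95% confidence level with (1, n - 2) degrees of
# freedom, from standard F tables; keys are n - 2.
F_CRIT_95 = {4: 7.71, 5: 6.61, 6: 5.99, 7: 5.59, 8: 5.32, 9: 5.12, 10: 4.96}

def quantify_trend(years, annual_means):
    """Simple linear regression of annual means on year."""
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(annual_means) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, annual_means))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Magnitude: fitted first- and last-year values, not the raw observations.
    first_year, last_year = min(years), max(years)
    first_fit = intercept + slope * first_year
    last_fit = intercept + slope * last_year
    pct_change = 100.0 * (last_fit - first_fit) / first_fit
    pct_per_year = pct_change / (last_year - first_year)

    # Significance: F = regression mean square / residual mean square.
    ss_reg = slope * sxy
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(years, annual_means))
    f_stat = float("inf") if ss_res == 0 else ss_reg / (ss_res / (n - 2))
    f_crit = F_CRIT_95.get(n - 2)
    significant = None if f_crit is None else f_stat >= f_crit
    return {"pct_change": pct_change, "pct_per_year": pct_per_year,
            "f_stat": f_stat, "significant": significant}

# Hypothetical annual averages (ug/m3) at one site.
result = quantify_trend([2000, 2001, 2002, 2003, 2004, 2005],
                        [2.1, 1.9, 1.8, 1.5, 1.4, 1.2])
print(result)
```

        The `significant` entry is `None` when the number of years falls outside
        the small lookup table, in which case a statistics package should be used.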


-------
              Additional  Reading
         Meteorological Adjustment Techniques

     Methods for adjusting pollutant concentrations to account
     for meteorology
       - Expected peak-day concentration (California Air Resources
         Board, 1993)
       - Native variability (California Air Resources Board, 1993)
       - Filtering techniques (e.g., Rao and Zurbenko, 1994)
       - Probability distribution technique (Cox and Chu, 1998)
       - Classification and Regression Tree (CART) analysis (e.g.,
         Stoeckenius, 1990)
       - Linear regression (e.g., Davidson,  1993)
       - Nonlinear regression (e.g., Bloomfield et al., 1996)

-------
            Additional Reading
      Meteorological Adjustment Techniques for
            Ozone and Particulate Matter
    PAMS ozone adjustment techniques,
    Thompson M.L., Reynolds J., Lawrence H.C., Guttorp P.,
    and Sampson P.O. (2001) A review of statistical methods
    for the meteorological adjustment of tropospheric ozone.
    Atmos. Environ. 35, 617-630. Available on the Internet at

    Data Quality Objectives for the Trends Component of the
    PM Speciation Network (includes meteorological
    adjustment techniques in Appendix),

-------
                                            References
     Battelle Memorial Institute and Sonoma Technology, Inc. (2003) Phase II air toxics monitoring data: analyses and network design
        recommendations.  Final technical report prepared for Lake Michigan Air Directors Consortium, Des Plaines, IL by Battelle
        Memorial Institute, Columbus, OH, and Sonoma Technology, Inc., Petaluma, CA, December.
     Bloomfield P., Royle J.A., Steinberg L.J., and Yang Q. (1996) Accounting for meteorological effects in measuring urban ozone levels
        and trends. Atmos. Environ. 30, 3067-3077.
     Bortnick S., Coutant B., Holdren M., Stetzer S., Holdcraft J., House L., Pivetz T., and Main H. (2001) Air toxics monitoring data:
        Analyses and network design recommendations. Revised Draft Technical Report prepared for Lake Michigan Air Directors
        Consortium, Des Plaines, IL, by Sonoma Technology, Inc., Petaluma, CA and Battelle Memorial Institute, Columbus, OH, October.
     Cox W.M. and Chu S.H. (1998) Cox-Chu meteorologically-adjusted ozone trends (1-hour and 8-hour):  1986-1997. Web page prepared
        for Center for Air Pollution Impact and Trend Analysis (CAPITA), Washington University, St. Louis,  MO. Available on the Internet at
        . October.
     Davidson A. (1993) Update on ozone trends in California's South Coast Air Basin. J. Air & Waste Manag. Assoc. 43, 226-227.
     Hafner H.R. and McCarthy M.C. (2004)  Phase III air toxics data analysis workbook.  Workbook prepared for the Lake Michigan Air
        Directors Consortium, Des Plaines, IL, by Sonoma Technology, Inc., Petaluma,  CA,
        STI-903553-2592-WB, August.
     Hyslop, N. and White, W. (2007) Interagency Monitoring for Protected Visual Environments (IMPROVE) Detection Limits. Presented
        at the Symposium on Air Quality Measurement Methods and Technology, Air and Waste Management Association, San Francisco,
        CA, May 2.
     Kenski D., Koerber M., Hafner H.R., McCarthy M.C.,  and Wheeler N. (2005) Lessons learned from air toxics data:  a national
        perspective. Environ. Man. J.,  19-22.
     McCarthy M.C., Hafner H.R., Chinkin L.R., and Charrier J.G. (2007) Temporal variability of selected air toxics in the United States.
        Atmos. Environ. 41 (34), 7180-7194 (STI-2894). Available on the Internet at
        .
     Rao S.T. and Zurbenko I.G. (1994) Detecting and tracking  changes in ozone air quality. J. Air & Waste Manag. Assoc. 44, 1089-1092.
     Stoeckenius T. (1990) Adjustment of ozone trends for meteorological variation. Presented at the Air and Waste Management
        Association's Specialty Conference, Tropospheric Ozone and the Environment,  Los Angeles, CA, March 19-22.
     Thompson M.L., Reynolds J., Lawrence H.C., Guttorp P., and Sampson P.O. (2001) A review of statistical methods for the
        meteorological adjustment of tropospheric ozone. Atmos. Environ.  35, 617-630.
     U.S. Environmental Protection Agency (2003) National  air quality and emissions trends report, 2003 special studies edition. Prepared
        by the Office of Air Quality and Standards, Air Quality Strategies and Standards Division,  Research Triangle Park, NC, EPA
        454/R-03-005. Section 5 available on the Internet at .



-------
            Advanced Analyses
         What else can I do with my air toxics data?
June 2009
Section 7 - Advanced Analyses

-------
                 Advanced Analyses

               What's Covered in This Section?

      •  This section is an overview of selected advanced data analysis
        techniques that may be useful in further understanding air
        toxics data.
      •  Discussion of each of these topics could fill an entire workbook;
        a discussion is provided of the motivation behind using these
        techniques and the  reader is  referred to available
        documentation for further information.
      •  Not all of these analyses have yet been thoroughly applied to
        air toxics data, but approaches that have been applied to PM2.5
        and PAMS VOC data, for example, should be applicable to air
        toxics data sets.
      •  The following topics are covered
        - Source apportionment
        - Trajectory analysis
        - Emission inventory evaluation
        - Model evaluation
        - Monitoring network assessment


-------
                  Advanced Analyses

                              Motivation

      After basic data validation and "display and describe" analyses
      have been performed, more can be done with the data if sufficient
      resources (e.g., time, expertise) are available and the basic
      analyses did not sufficiently answer the questions of interest.
       •  Source Apportionment. The sources impacting your monitors can be
         explored with source apportionment techniques and tools.
       •  Trajectory Analyses. Trajectory analyses can help explain high and
         low concentrations and can enhance source apportionment results.
       •  Evaluation of Emissions Inventories and Models.  A primary goal
         of national monitoring networks is to compare  ambient data to
         emission inventories and model output. These evaluations can lead
         to improvements in the inventories and model  performance.
       •  Network Assessment. The pollution sources impacting a site,
         nearby demographics, and monitoring purpose can change over time.
         EPA's air toxics monitoring plan includes regular network
         assessment.


-------
                      Source  Apportionment


                                      Why Perform?


         •  Also known as receptor modeling, source apportionment is defined as a specified
            mathematical procedure for identifying and quantifying the sources of ambient air pollutants
            at a monitoring site (the receptor) primarily on the basis of concentration measurements at
            that site.
         •  Source apportionment relates source emissions to their quantitative impact on ambient air
            pollution.
         •  Receptor models can be used to address the following questions:
             - What emissions sources contribute to ambient air toxics concentrations?
             - How much does each source type contribute?
             - Which sources could be targeted with control measures to effect the highest reduction of air toxics
               concentrations (or risk)?
             - What are the discrepancies between emission inventories and sources identified by receptor models?
             - Are known control strategies affecting the source contributions to air toxics?
         •  When performing source apportionment, the analyst should be aware of uncertainties and
            limitations.
             - Many emitters have similar species composition profiles. The practical implication of this limitation
               is that one may not be able to discern the difference between benzene emitted from light-duty vehicles
               (LDV) versus benzene from gasoline stations or refineries.  One solution to this problem is to include
               additional species to reduce collinearity. These profiles might help to qualitatively identify mobile
               sources.
             - Species composition profiles change between source and receptor.  Most source-receptor models
               cannot currently account for changes due to photochemistry. Since carbonyl compounds such as
               formaldehyde and acetaldehyde have significant secondary sources, current methods cannot link these
               compounds to their primary emission sources.
             - Receptor models cannot predict the consequences of emissions reductions. However, source-
               receptor models can check if control  plans achieve their desired reductions using historical data.



-------
                      Source  Apportionment


                Single-Sample and Multivariate Models


         Receptor models are classified into two types: single-sample or multivariate.
          •   In single-sample models, the analysis is performed independently on each available
             pollutant.
              - The simplest example of this is the "tracer element" method, in which a particular property (e.g.,
                chemical species) is known to  be uniquely associated with a single source. In this case, the impact of
                the source on the ambient sample is estimated by dividing the measured ambient concentration of the
                property by the property's known abundance in the source's emissions. This method is not often
                available because of the difficulties of finding unique tracers or knowing their abundances. However,
                even if a pollutant is not uniquely associated with a source of interest, knowledge of the abundance
                from that source can be used to provide an  upper limit for the source's impact.
              - The best-known example of single-sample receptor modeling is the chemical mass balance model
                (CMB). CMB eliminates the need for unique tracers of sources but still requires the abundances of the
                chemical components of each source (source profiles) as input.
          •   Multivariate receptor models use data from multiple pollutants and extract source
             apportionment results from all  of the sample data simultaneously.
              - The reward for the extra complexity of these models is that they attempt to estimate not only the
                source contributions (i.e., mass from each source) but also the source compositions (i.e., profiles).
              - There are several tools to perform multivariate receptor modeling described in the literature; EPA has
                supported the development of  two modeling platforms: Unmix and positive matrix factorization (PMF).
                These models are based on factor analysis, or the closely related principal component analysis.
              - Factor analysis is a statistical method used  to describe variability among  observed variables in terms
                of fewer unobserved variables  called factors.
              - There is extensive literature available describing CMB and PMF applications  to speciated PM data,
                less available literature describing applications to VOC data, and very little research on air toxics
                specifically.
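
          The tracer element calculation described above is a single division, shown
          here as a minimal sketch. The function name and the numbers are hypothetical;
          abundance is assumed to be expressed as a mass fraction of the source's
          emissions.

```python
def tracer_source_impact(ambient_conc, abundance):
    """Estimate a source's impact on an ambient sample from one tracer.

    ambient_conc: measured ambient concentration of the tracer (e.g., ug/m3)
    abundance:    tracer's mass fraction in the source's emissions (0 to 1)

    If the tracer is unique to the source, the result estimates the source's
    impact; if it is not unique, the result is only an upper limit.
    """
    if not 0.0 < abundance <= 1.0:
        raise ValueError("abundance must be a fraction greater than 0 and at most 1")
    return ambient_conc / abundance

# Hypothetical tracer measured at 0.02 ug/m3 that makes up 5% of the
# source's emissions by mass.
impact = tracer_source_impact(0.02, 0.05)
print(f"Estimated source impact: {impact:.2f} ug/m3")
```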


-------
                   Source  Apportionment

                      Positive Matrix Factorization

       •  PMF was originally developed by Paatero (1994, 1997) with additional
          development by Hopke et al. (1991, 2003). PMF can be used to determine
          source profiles based on the ambient data and associated uncertainties.
       •  PMF has been applied to many data sets to determine sources of PM2.5,
          ozone precursors, and air toxics.
       •  PMF uses weighted least squares fits for data that are normally distributed and
          maximum likelihood estimates for data that are log normally distributed.
          Concentrations are weighted by their analytical uncertainties.
       •  PMF constrains factor loadings and factor scores to nonnegative values and thereby
          minimizes the ambiguity caused by rotating factors.
       •  Model input includes ambient monitoring data and associated analytical uncertainties
          (see Wade et al., 2007). A large (species and sample matrix) ambient data set is
          required.
       •  Model output includes
           - Factor loadings expressed in mass units which allows them to be used directly as source
             signatures.
           - Uncertainties in factor loadings and factor scores which makes the loadings and scores easier
             to use in quantitative procedures such as chemical mass balance.
       •  A free, standalone version of PMF was created by the EPA in 2005, available on the
          Internet at http://www.epa.gov/scram001/receptorindex.htm. Updates are underway.
       •  Data preparation and the interpretation of model diagnostics are covered in EPA's
          Multivariate Receptor Modeling Workbook (Brown et al., 2007b).
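
       For normally distributed data, the uncertainty-weighted least squares fit
       described above amounts to minimizing a sum of squared, uncertainty-scaled
       residuals, usually written Q. The sketch below only evaluates Q for a given
       candidate solution; it does not perform the constrained minimization that PMF
       carries out. The matrix names (x, g, f, u) and the data are illustrative
       assumptions, not taken from the EPA PMF software.

```python
def pmf_q(x, g, f, u):
    """Weighted sum of squared residuals for the factor model x ~ g f.

    x[i][j]: concentration of species j in sample i
    g[i][k]: contribution (factor score) of factor k to sample i
    f[k][j]: amount of species j in factor k (factor loading / profile)
    u[i][j]: uncertainty of x[i][j]
    PMF seeks nonnegative g and f that make this value as small as possible.
    """
    n_factors = len(f)
    q = 0.0
    for i, sample in enumerate(x):
        for j, x_ij in enumerate(sample):
            fitted = sum(g[i][k] * f[k][j] for k in range(n_factors))
            q += ((x_ij - fitted) / u[i][j]) ** 2
    return q

# Hypothetical data: 2 samples, 3 species, 2 factors.
x = [[1.2, 0.9, 0.4], [0.6, 1.5, 0.9]]
u = [[0.1, 0.1, 0.05], [0.1, 0.1, 0.05]]
g = [[1.0, 0.5], [0.2, 1.5]]
f = [[1.0, 0.5, 0.1], [0.3, 0.9, 0.6]]
print(f"Q = {pmf_q(x, g, f, u):.2f}")
```

       Because each residual is divided by its uncertainty, poorly measured values
       (large u) have less influence on the fit than well-measured ones.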


-------
                  Source  Apportionment

                                    Unmix

       • Unmix was developed by Ron Henry (1997) using a generalization
         of the self-modeling curve resolution method developed in the
         chemometric community.
       • It originally used MATLAB computation routines.  The EPA, along with Ron
         Henry, developed EPA Unmix and documentation that uses MATLAB features
         but is now a standalone model (i.e., MATLAB not needed).
       • Unmix is a multivariate receptor modeling package that inputs ambient
         monitoring data and seeks to find the composition and contributions of
         influencing sources or source types. Unmix also produces estimates of the
         uncertainties in the source compositions.
       • Unmix requires many samples to extract potential sources, similar to PMF.
       • It assumes that sources have unique species ratios, i.e., "edges" that can be
         observed in a scatter plot between species; uses these edges to constrain the
         results and identify factors; and does not require weighting of data points.
       • Model input includes ambient monitoring data; uncertainty information and
         source profiles are not necessary.
       • Model output includes source profiles with uncertainties.
       • Unmix is available from EPA (the URL is illegible in the source).
       • Data preparation and the interpretation of model diagnostics are covered in EPA's
         Multivariate Receptor Modeling Workbook (Brown et al., 2007b).


-------
                  Source Apportionment

                        Chemical Mass Balance

       • The premise of chemical mass balance (CMB) is that source profiles from
         various classes of sources are different enough that their contributions can be
         identified by measuring concentrations of many species collected at the receptor
         site.
       • To apportion sources, CMB uses an effective variance-weighted, least squares
         solution to a set of linear equations which expresses each receptor species
         concentration as a linear sum of the products of the source profiles and source
         contributions. This method can be applied to a single sample.
       • Model input includes
          - Source profile species (fractional amount of species in emissions from each source
             type).
          - Receptor (ambient) concentrations.
          - Realistic uncertainties for source and receptor values. Input uncertainty is used to weight
             the relative importance of input data to model solutions and to estimate uncertainty of
             the source contributions.
       • Model output includes contributions from each  source type and species to the
         total ambient concentration along with uncertainty.
       • CMB has been used in a number of air pollution studies that examine particulate
         and VOC source apportionment, but few, if any, specific air toxics studies.
       • CMB is available from EPA at http://www.epa.gov/scram001/receptorindex.htm
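
       The effective variance-weighted least squares solution can be sketched for a
       single sample as below. This is an illustration of the general technique, not
       the EPA CMB software: it assumes the commonly used effective variance
       (receptor variance plus the profile variances scaled by the squared source
       contributions), a fixed number of iterations, and hypothetical profiles and
       concentrations.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def cmb(conc, sig_conc, profiles, sig_profiles, iterations=10):
    """Estimate source contributions for one ambient sample.

    conc[i], sig_conc[i]:                 receptor concentration of species i and its uncertainty
    profiles[i][j], sig_profiles[i][j]:   fraction of species i in source j and its uncertainty
    """
    n_species, n_sources = len(conc), len(profiles[0])
    s = [0.0] * n_sources
    for _ in range(iterations):
        # Effective variance combines receptor and source-profile uncertainty.
        v = [sig_conc[i] ** 2
             + sum((s[j] * sig_profiles[i][j]) ** 2 for j in range(n_sources))
             for i in range(n_species)]
        # Weighted normal equations: (F' W F) s = F' W c, with W = diag(1/v).
        a = [[sum(profiles[i][j] * profiles[i][k] / v[i] for i in range(n_species))
              for k in range(n_sources)] for j in range(n_sources)]
        b = [sum(profiles[i][j] * conc[i] / v[i] for i in range(n_species))
             for j in range(n_sources)]
        s = solve(a, b)
    return s

# Hypothetical case: 3 species, 2 source types.
profiles = [[0.5, 0.1], [0.3, 0.2], [0.2, 0.7]]
sig_profiles = [[0.01, 0.01]] * 3
conc = [5.4, 3.8, 4.8]
sig_conc = [0.1, 0.1, 0.1]
contributions = cmb(conc, sig_conc, profiles, sig_profiles)
print([round(c, 3) for c in contributions])
```

       The system needs at least as many species as source types, and near-collinear
       profiles make the normal equations ill-conditioned, which is the collinearity
       problem noted earlier.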


-------
            Source Apportionment
                   Source Profiles
        Accurate source profiles are the key to successful modeling.

        • Source profiles provide information about the relative contribution of
          pollutants to emissions from a given source.
        • Understanding source profiles is important because receptor modeling tools
          typically output source profile information that needs to be interpreted or
          require user-input source profiles as a starting point for analysis.
        • The figures show example polychlorinated dibenzofuran (PCDF) source profiles
          for hazardous waste incinerators and copper smelting compiled by the EPA.
          Though the same compounds are present, the relative abundances are not the
          same, providing a mechanism for source identification.
        • For CMB applications and for interpretation of PMF output, it is important to
          use source profiles that are representative of the study area during the
          period when ambient data were collected.
        • In CMB, try available source profiles in sensitivity tests to determine the
          best ones for use (i.e., minimize collinearity).
        • Source profiles can be obtained from
           - EPA SPECIATE, recently updated (version 4.0) and available at
           - Literature review ... are available.
           - Local, state, and federal agencies.
           - Source profiles can also be procured via analysis of ambient data using
             tools such as PMF and Unmix.

        [Figures: example PCDF source profiles (relative abundance by compound) for
        hazardous waste incinerators and copper smelting]
-------
                   Source Apportionment

                                   Approach

        • Before beginning source apportionment, it is important to "know the data" in
         order to identify and assess the receptor model outputs.  Understanding the data
         will be achieved in the process of data validation and analysis.
          - Understand airshed geography and topography using maps, photographs, site visits, etc.
          - Investigate the composition and location of emission sources.
          - Understand the typical meteorology of the site, including diurnal and seasonal variations.
          - Investigate the spatial and temporal characteristics of the data, including meteorological
            dependence.
          - Investigate the relationships among species using scatter plot matrices, correlation
            matrices, and other statistical tools.
        • Apply cluster and factor analysis techniques using standard statistical packages
         to get an overall understanding of pollutant relationships and groupings by
         season, time of day, etc.
        • If there are sufficient samples (e.g.,  more than two years of 1-in-6 day samples
         for more than 20 species and more than 50% of data above detection), Unmix
         and/or PMF may be applied to obtain "source" profiles with more species and
         further investigate data relationships.
        • If samples are few and source profiles are available, CMB may be applied to
         obtain source contribution estimates.
        • Compare source contribution estimates and source profiles from Unmix and
         PMF to the emission inventory.



-------
                   Source  Apportionment

                                     Example

         PMF receptor modeling was performed for speciated VOC data collected at two PAMS
         sites, Hawthorne and Azusa, in the Los Angeles area during the summers of 2001-2003.
         Both toxic and non-toxic VOCs were investigated in order to provide as much data as
         possible for apportionment (Brown et al., 2007a).
         Air toxics included in the analysis were typically grouped as MSATs, though they have
         industrial sources as well.
         Data were collected as part of the PAMS network providing the advantage of subdaily data
         and speciated-versus-total mass measurements (total non-methane organic compounds,
         TNMOC).
         Uncertainty estimates were enhanced from the original analytical uncertainties by reducing
         the weighting of data below detection and missing data.  Missing data were assigned
         uncertainties of 4 times the median concentration, data below detection were given
         uncertainties of 1.5*MDL, and all other data were given the analytical uncertainty plus
         2/3*MDL.
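
         The uncertainty rules in this example can be written as a small function. This
         is a sketch of the three rules as stated, with hypothetical names and values;
         representing a missing measurement as `None`, and treating a value exactly at
         the MDL as detected, are assumptions.

```python
def pmf_uncertainty(conc, mdl, analytical_unc, species_median):
    """Return the uncertainty assigned to one measurement for PMF input.

    conc:            measured concentration, or None if the value is missing
    mdl:             method detection limit for the species
    analytical_unc:  reported analytical uncertainty of the measurement
    species_median:  median concentration of the species at the site
    """
    if conc is None:                      # missing data: heavily down-weighted
        return 4.0 * species_median
    if conc < mdl:                        # below detection
        return 1.5 * mdl
    return analytical_unc + (2.0 / 3.0) * mdl

# Hypothetical species with MDL = 0.3 and a site median of 2.0 (same units).
for value in (None, 0.1, 1.0):
    print(value, pmf_uncertainty(value, mdl=0.3, analytical_unc=0.1, species_median=2.0))
```

         Larger uncertainties give a data point less weight in the fit, so missing and
         below-detection values influence the solution less than detected values.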
         [Figure: Site Map]

-------
               Source Apportionment

                Example Preliminary Analyses

        Preliminary data analyses were performed including
        investigation into data quality, local emissions, species
        relationships, temporal patterns, etc.
        Findings
        - VOC concentrations were typically higher at Azusa compared to
          Hawthorne, a result consistent with site locations relative to the
          ocean.
        - The Azusa air mass was more aged, as indicated by loss of
          reactive species (except during rush hour); this is also consistent
          with the sites' locations in the air basin.
        - The Hawthorne site seemed to have constant, fresh emissions, with
          little change in the relative abundance of VOCs throughout the day,
          consistent with nearby industrial emissions.
        - Both sites are significantly influenced by mobile sources.

-------
                    Source  Apportionment
               Example Hawthorne Site PMF Profiles
Six factors were identified by PMF at the
Hawthorne site following protocols
discussed in the Multivariate Workbook
(Brown and Hafner, 2005). The relative
percent of species mass attributed to
each profile is shown.

Profile names indicate analyst-identified
source types.
Some of the rationale for source
identification
 - Biogenic. Isoprene is the only marker for
   biogenic sources measured in this data set
   and anthropogenic sources of isoprene are
   insignificant; temporal patterns match
   expectations.
 - Liquid Gasoline. Abundance of C5 alkanes
   agrees with previous work; temporal
   patterns are consistent with mobile
   sources.
 - Evaporative Emissions. C3-C6 alkanes and
   temporal patterns are similar to diurnal
   temperature patterns.
 - Motor Vehicle Exhaust. Typical exhaust
   profile and temporal patterns are consistent
   with rush-hour traffic.
 - Natural Gas. Natural gas is mostly ethane
   and propane. These are also long-lived
   species that accumulate in the
   atmosphere.
 - Industrial Process Losses. Consistent with
   nearby industrial emissions.
[Figure: Source Profiles From PMF. Panel titles recovered: Biogenic, Liquid
Gasoline, Evaporative Emissions; each vertical axis runs from 0 to 100.]

-------
                 Source Apportionment
                Example Azusa  Site PMF Profiles
Five factors were identified by PMF at the Azusa site. The relative
percent of species mass is shown.

Apportionment of these profiles to specific sources was performed
by the analyst based on knowledge of source profiles and other
investigations into the data.

Some of the rationale for source identification
 - Coatings.  Presence of C9-C11 alkanes is consistent with previous
   results; temporal pattern showed a daytime peak consistent with
   industrial operations.
 - Other profiles are similar to those observed at the Hawthorne site.

[Figure: Source Profiles From PMF. Panels: Liquid Gas, Evaporative Emissions,
Motor Vehicle Exhaust, Coatings, Biogenic; each vertical axis runs from 0 to 100.]

-------
                  Source  Apportionment
                  Example Percent of Total Mass
       The profiles in the previous slides indicate the
       relative fraction of VOCs within a profile.
       The pie charts to the right show the importance
       of each source profile by quantifying the amount
       of TNMOC mass represented by each profile.
       For example, in Hawthorne, evaporative
       emissions accounted for 34% of TNMOC mass
       during the summers of 2001-2003.
       Mobile source emissions are dominant
       contributors to TNMOC at both Hawthorne and
       Azusa with 71% and 80% of total mass,
       respectively (sum of liquid/unburned gasoline,
       motor vehicle exhaust, and evaporative
       emissions).
       The remaining VOC mass  is attributed to
       coatings at the Azusa site and is split between
       industrial processes and natural gas at the
       Hawthorne site.
       Hawthorne
       Source profile                  Mass    Percent of total
       Liquid/Unburned Gas             10.3    13%
       Motor Vehicle Exhaust           18.6    24%
       Biogenic                         0.9     1%
       Industrial Process Losses       11.8    15%
       Natural Gas                     10.4    13%
       Evaporative Emissions           26.7    34%

       Azusa
       Source profile                  Mass    Percent of total
       Liquid/Unburned Gasoline        65.0    27%
       Biogenic                         8.0     3%
       Coatings                        39.3    17%
       Motor Vehicle Exhaust           51.5    22%
       Evaporative Emissions           72.9    31%
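
       The percentages and the mobile source totals quoted on this slide can be
       reproduced from the masses reported for each profile. The sketch below is
       a simple arithmetic check; the dictionary layout and function names are
       illustrative assumptions, and the mass units are whatever the original
       charts used.

```python
# Masses per source profile as reported for each site.
hawthorne = {
    "Liquid/Unburned Gas": 10.3,
    "Motor Vehicle Exhaust": 18.6,
    "Biogenic": 0.9,
    "Industrial Process Losses": 11.8,
    "Natural Gas": 10.4,
    "Evaporative Emissions": 26.7,
}
azusa = {
    "Liquid/Unburned Gasoline": 65.0,
    "Biogenic": 8.0,
    "Coatings": 39.3,
    "Motor Vehicle Exhaust": 51.5,
    "Evaporative Emissions": 72.9,
}

# Profiles the slide sums as "mobile source" emissions.
MOBILE_PREFIXES = ("Liquid/Unburned", "Motor Vehicle Exhaust", "Evaporative Emissions")


def percent_of_total(masses):
    """Convert profile masses to percent of the summed mass."""
    total = sum(masses.values())
    return {name: 100.0 * mass / total for name, mass in masses.items()}


def mobile_share(masses):
    """Sum the percent contributions of the mobile source profiles."""
    percents = percent_of_total(masses)
    return sum(p for name, p in percents.items() if name.startswith(MOBILE_PREFIXES))


hawthorne_mobile = round(mobile_share(hawthorne))  # 71
azusa_mobile = round(mobile_share(azusa))          # 80
```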

-------
                    Source  Apportionment
                Example Apportionment of Benzene
        Apportionment of individual species between profiles can
        also provide interesting analyses.
        For example, benzene is a significant cancer risk driver at
        most sites in the United States. Source apportionment of
        benzene can help policy makers develop effective control
        regulations.
        The figures (values tabulated below) show the percentage of
        benzene (by mass) attributed to each source profile identified by
        PMF at the Hawthorne and Azusa sites.
        As expected, both sites show a significant percentage of
        benzene mass attributed to mobile sources and gasoline
        evaporation.  Interestingly, about one-fifth (21%) of the
        benzene at the Hawthorne site is attributed to natural gas.
        Benzene is not emitted in natural gas (but may be emitted
        from combustion of natural gas); however, a significant
        fraction of ambient benzene is associated with air parcels
        containing ethane and propane (key components of
        natural gas).  Since benzene is relatively long-lived, it is
        possible that benzene in this profile represents urban
        background.  The same observation can be made for the
        benzene in the biogenic profile; biogenic benzene
        emissions are very small.

        Percent of benzene mass by source profile
        Source profile                  Hawthorne    Azusa
        Motor Vehicle Exhaust           45%          37%
        Liquid Gas                      10%          25%
        Evaporative Emissions            7%          18%
        Biogenic                         6%          14%
        Natural Gas                     21%          -
        Industrial Process Losses       11%          -
        Coatings                        -             6%

-------
                 Source Apportionment

                                Summary

      Source apportionment steps
       • Review data quality and spatial/temporal characteristics.
       • Prepare data for source apportionment.
          - Processing the necessary data differs among the tools, but typically the
            analyst needs to select pollutants with sufficient data above detection and
            understand/quantify uncertainty for each concentration. Guidance is provided
            in the EPA's Multivariate Receptor Modeling workbook (Brown et al., 2007b).
       • Understand the air shed by assessing likely emissions sources and local
         meteorology. This helps set expectations for what the source apportionment
         results should show.
       • With guidance from literature and workbooks, apply source apportionment
         tools. This is an iterative process!
       • Evaluate results for reasonableness.
       • Compare results to emission inventories.
      With respect to toxics data, PMF and Unmix have been applied to a
      range of data sets while CMB applications have largely been focused
      on PM data.

-------
                Trajectory Analysis
                        Introduction

       Trajectory analysis uses knowledge of air mass
       movement to trace the most likely areas of influence
       on high pollutant concentrations.
       The use of trajectory analysis after source
       apportionment helps analysts better understand,
       interpret, and verify source apportionment results.
       Analysis techniques
        - Backward trajectories
        - Trajectory densities
        - Potential Source Contribution Function (PSCF)
        - Conditional Probability Function (CPF)

-------
                    Trajectory Analysis
                       Backward Trajectories
       Backward air mass trajectories
       estimate where air parcels were
       during previous hours.
       Air mass trajectories can be
       employed to investigate long-
       term,  synoptic-scale
       meteorological conditions
       associated with high
       concentrations of individual
       factors.
       Estimates grow less certain as
       time elapses.
       The NOAA HYSPLIT model is
       one means to run trajectories.
       It is available at
       http://www.arl.noaa.gov/ready/hysplit4.html
          [Figure: 48 Hour Back Trajectories - 50, 300, 1000 m. HYSPLIT trajectory
          hourly endpoints for top 20% highest]
          Trajectories are often plotted as single points for
          every hour backwards from the start point as
          shown here (also called a spaghetti plot).
          However, each endpoint should be viewed not as an exact
          location but as a small area around that point, considered
          together with the preceding and following points.

-------
                       Trajectory  Analysis
                             Trajectory Densities
    [Figure, left: 48 Hour Back Trajectories - 50, 300, 1000 m. HYSPLIT trajectory
    hourly endpoints for days with the 20% worst visibility conditions in
    Indianapolis in 2002.]
    [Figure, right: Spatial Probability Density (SPD) of trajectory endpoints
    processed within GIS.]
    Trajectories are often processed into density, rather than "spaghetti", plots.
    Higher density corresponds to more trajectories passing through that grid
    square. This plotting enables a number of useful analysis techniques, such
    as Potential Source Contribution Function (PSCF) analysis.

-------
                      Trajectory  Analysis
        Potential Source Contribution Function (PSCF)
       PSCF uses HYSPLIT backward trajectories to
       determine probable locations of emission
       sources.

       $PSCF_{ij} = m_{ij} / n_{ij}$

       $n_{ij}$ = number of times a trajectory passed through cell (i,j).
       $m_{ij}$ = number of times the source contribution peaked while a
       trajectory passed through cell (i,j).

       The top 10%-20% of source contributions are used for $m_{ij}$.
       In the example on the right, all five-day backward
       trajectories (one for every two hours) were applied to the
       corresponding 24-hr source contributions.

       PSCF is calculated for each 1° x 1° cell, and the results are
       displayed as maps on which PSCF values ranging from 0 to 1
       are shown on a color scale.
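
       The PSCF ratio can be sketched as a simple gridded count. The code below
       is a minimal illustration under stated assumptions, not the procedure
       used in the cited study: trajectory endpoints are assumed to be
       (latitude, longitude) pairs, "passing through a cell" is approximated by
       counting endpoints that fall in the cell, and the function name, argument
       names, and default 20% cutoff are hypothetical choices.

```python
import math
from collections import Counter


def pscf(trajectories, contributions, top_fraction=0.2, cell_deg=1.0):
    """Compute m_ij / n_ij for every grid cell touched by a trajectory endpoint.

    trajectories: one list of (lat, lon) endpoints per sample.
    contributions: source contribution for each sample, in the same order.
    """
    ranked = sorted(contributions, reverse=True)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    threshold = ranked[n_top - 1]          # lowest value still counted as "high"

    n = Counter()                          # all endpoints per cell
    m = Counter()                          # endpoints from high-contribution samples
    for endpoints, contribution in zip(trajectories, contributions):
        for lat, lon in endpoints:
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            n[cell] += 1
            if contribution >= threshold:
                m[cell] += 1
    return {cell: m[cell] / n[cell] for cell in n}


# Hypothetical example: five samples, the first with the highest contribution.
example_trajectories = [
    [(40.5, -75.5), (41.5, -80.5)],
    [(40.5, -75.5)],
    [(38.5, -77.5)],
    [(40.2, -75.1)],
    [(41.5, -80.5)],
]
example_contributions = [10.0, 1.0, 2.0, 3.0, 4.0]
example_pscf = pscf(example_trajectories, example_contributions)
```

       Cells crossed by very few trajectories can produce misleadingly high
       ratios, so results for sparsely sampled cells deserve extra caution.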
                   [Figure: PSCF map with values shown on a color scale.]
                   PSCF plot for sulfate affecting
                   Philadelphia. Higher probability is
                   associated with an area of high SO2
                   emissions. Computations and graphics
                   are made using ArcMap or another GIS
                   tool.
                           (Source: Begum et al., 2005)

-------
                    Trajectory Analysis
             Conditional Probability Function (CPF)

       CPF uses wind direction, rather than trajectories, to determine the likely
       direction of sources. CPF compares days when concentrations were
       highest to the average transport pattern (i.e., the climatology).
       $CPF = m_{\Delta\theta} / n_{\Delta\theta}$

       $n_{\Delta\theta}$ = number of times the wind direction is
       from sector $\Delta\theta$.

       $m_{\Delta\theta}$ = number of times source
       contributions are high while the wind
       direction is from sector $\Delta\theta$.

       A CPF value close to 1.0 for a given
       sector ($\Delta\theta$) indicates a high probability
       that a source is located in that direction.
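
       The CPF ratio can be computed with a short script. The sketch below is
       an illustration only: the 30-degree sector width, the 25% cutoff for
       "high" contributions, and all names are assumptions chosen for the
       example, and sectors with no wind observations are returned as None.

```python
def cpf(wind_dirs, contributions, top_fraction=0.25, sector_width=30.0):
    """Return the ratio m / n for each wind sector, starting at 0 degrees.

    wind_dirs: wind direction in degrees for each sample.
    contributions: source contribution for each sample, in the same order.
    """
    ranked = sorted(contributions, reverse=True)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    threshold = ranked[n_top - 1]          # lowest value still counted as "high"

    n_sectors = int(round(360.0 / sector_width))
    n = [0] * n_sectors                    # all samples per sector
    m = [0] * n_sectors                    # high-contribution samples per sector
    for direction, contribution in zip(wind_dirs, contributions):
        k = int((direction % 360.0) // sector_width)
        n[k] += 1
        if contribution >= threshold:
            m[k] += 1
    return [m[k] / n[k] if n[k] else None for k in range(n_sectors)]


# Hypothetical example: the two highest contributions arrive from the northwest.
example_dirs = [310, 320, 315, 100, 95, 200, 305, 110]
example_contribs = [9.0, 8.0, 1.0, 2.0, 1.0, 3.0, 2.0, 1.0]
example_cpf = cpf(example_dirs, example_contribs)
```

       In this example the 300-330 degree sector has the highest CPF value,
       which would point toward a possible source region northwest of the site.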
                  [Figure: Polar CPF plot with wind direction labeled in degrees.]
                  Example CPF plot for the highest 25%
                  contribution from a PMF factor, pointing
                  to the northwest of the site as a possible
                  source region.  Computations can be
                  programmed into Microsoft Excel or
                  other statistical packages.
                  (Source: Kim et al., 2004)

-------
                     Trajectory Analysis

                               Interpretation

         No matter which trajectory analysis is used, interpretation of results is
         similar. These methods are all complementary to source
         apportionment or can be standalone to assess source regions.  No one
         method shown is superior.
          -  To investigate a number of days, ensemble methods are preferred (such
             as trajectory densities). These methods help identify source areas.
          -  CPF also requires a number of days to be included, but helps point toward
             a particular direction.
          -  Single trajectories are useful when investigating an individual sample.
         The following questions may be investigated for verification of results:
          -  Do results meet the conceptual model of emissions and removal of air
             toxics?
          -  Are these the areas from which emissions influence would be expected?
          -  Does the transport pattern make sense with respect to the age/chemistry of
             a given factor (i.e., more transport and chemistry are associated with
             secondary pollutants such as formaldehyde)?

-------
                     Trajectory  Analysis
                           Using CPF Results

       This approach is based on the assumption that wind direction and trajectory
       analysis results should be consistent with the spatial distribution of the
       sources in the emission inventory.
     In the example at right,
     the directions of source
     regions from the CPF
     plots agree with the
     locations of propene
     sources in the area (red
     circles), giving more
     confidence to the source
     apportionment results.
     A similar approach can
     be employed for toxic
     species.
     (Source: Berkowitz et al., 2004)

-------
           Emission  Inventory Evaluation

                                    Introduction

      •  Why bother evaluating emissions data?
         -  Emission inventory development is an intricate process that involves estimating and
            compiling emissions activity data from hundreds of point, area, and mobile sources in a
            given region.  Because of the complexities involved in developing emission inventories and
            the implications of errors in the inventory on air quality model performance and control
            strategy assessment, it is important to evaluate the accuracy and representativeness of any
            inventory that is intended for use in modeling. Furthermore, existing emission factor and
            activity data for sources of air toxics and their precursors are limited and the quality of the
            data is questionable.  An emission inventory evaluation should be performed before the
            data are used in modeling.

      •  What tools  are available for assessing emissions data?
         -  Several techniques are used to evaluate emissions data including "common sense" review
            of the data; source-receptor methods such as PMF; bottom-up evaluations that begin with
            emissions activity data and estimate the corresponding emissions; and top-down
            evaluations that compare emission estimates  to ambient air quality data. Each evaluation
            method has strengths and limitations.
         -  Based on the results of an emissions evaluation, recommendations can be made to improve
            an emission inventory, if warranted. Local agencies responsible for developing an inventory
            can then make revisions to the inventory data prior to modeling.
         -  PM2.5 and PAMS data analysis workbooks provide some example analyses and approaches
            that are applicable to air toxics data (Main and Roberts, 2000; 2001).



-------
        Emission  Inventory Evaluation
                    Using Ambient Data
     Ambient air quality data can be used to evaluate
     emission estimates ("top-down"); however, the
     following issues should be considered:
      - Proper spatial and temporal matching of emission estimates
        and ambient data is needed.
      - Ambient background levels of air toxics need to be
        considered.
      - Meteorological effects need to be considered.
      - Comparisons are only valid for primarily emitted air toxics.
      - To compare ambient concentrations to emissions estimates, a
        reference pollutant or total value (such as total VOC) is needed
        as the denominator of a ratio. Typically, NOx or CO is used.

-------
        Emission Inventory Evaluation
                     Top-down Approach

      Top-down emissions evaluation is a method of comparing
      emissions estimates with ambient air quality data.
      Ambient/emission inventory comparisons are useful for
      examining the relative composition of emission inventories;
      they are not useful for verifying absolute pollutant masses
      unless they are combined with bottom-up evaluations. The
      top-down method has demonstrated success at reconciling
      emission estimates of VOC and NOx.
      Top-down approach:
         Compare ambient-derived and emissions-derived ratios of a primary air
         toxic to NOx, CO, or VOC.
      If early morning samples are available (such as with PAMS data), these sampling
       periods are the most appropriate to use because emissions are generally high,
     mixing depths are low, winds are light, and photochemical reactions are minimized.
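
      The ratio comparison can be sketched as follows. This is a hedged
      illustration, not a prescribed procedure: the early-morning window of
      hours 5 through 8, the use of the mean of per-sample ratios, the function
      names, and the numbers are all assumptions, and the ambient and inventory
      quantities are assumed to have been converted to comparable units.

```python
def mean_ratio(samples, hours=range(5, 9)):
    """Mean ambient toxic-to-reference ratio for samples in the chosen hours.

    samples: iterable of (hour, toxic_conc, reference_conc) tuples.
    """
    ratios = [toxic / ref for hour, toxic, ref in samples
              if hour in hours and ref > 0]
    if not ratios:
        raise ValueError("no samples in the selected hours")
    return sum(ratios) / len(ratios)


def compare_to_inventory(ambient_ratio, ei_toxic, ei_ref):
    """Return the inventory ratio and its ratio to the ambient-derived value."""
    ei_ratio = ei_toxic / ei_ref
    return ei_ratio, ei_ratio / ambient_ratio


# Hypothetical data: two early-morning samples and one afternoon sample.
samples = [(6, 2.0, 100.0), (7, 3.0, 100.0), (14, 1.0, 10.0)]
ambient = mean_ratio(samples)
ei_ratio, ei_over_ambient = compare_to_inventory(ambient, ei_toxic=5.0, ei_ref=100.0)
```

      A value of ei_over_ambient well above 1 would suggest the inventory
      over-represents the air toxic relative to the reference pollutant; a value
      well below 1 would suggest the opposite.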

-------
          Emission  Inventory  Evaluation
                                  Example
        [Figure: Bar chart on a percent scale (10% to 25% tick labels shown) with
        four series: Ambient - Avg, Ambient - Median, EI - Low Level Only, and
        EI - With Elevated Sources.]
       At this PAMS site, the EI-derived compositions of benzene are significantly higher than
       the ambient-derived compositions. Examination of point source records near the site
       indicates that the sources of these emissions are chemical manufacturing operations. It
       appears that the chemical speciation profiles used to speciate the point source inventory
       over-represent the relative amount of benzene (by about a factor of 2 to 5). Similarly,
       xylenes are overestimated.
       Toluene and 1,3-butadiene are only slightly overestimated in the EI at this site.

-------
                    Evaluating  Models
                             Introduction
        Air quality models have been used for decades to assess the potential
        impact of emission sources on ambient concentrations of criteria and toxic
        air pollutants.
        In the past decade, air quality models have also been used as planning
        tools for criteria pollutants, e.g., SIP development and attainment
        demonstration.
        However, until recently, air quality models have not been used as
        planning tools for air toxics, due to the lack of measurements with which
        to evaluate the models.
        The need to assess the usefulness of these models in air quality planning
        and to improve both modeling and evaluation methods has been identified:
        how well are we modeling air toxics?
        Reasonable agreement between model and monitor concentrations was
        set by EPA as "within a factor of 2".
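
        The factor-of-2 criterion can be checked pair by pair. The sketch below
        is illustrative: the function names and the sample values are
        hypothetical, and treating the 0.5 and 2.0 boundaries as inclusive is an
        assumption.

```python
def model_to_monitor_ratio(modeled, monitored):
    """Ratio of a modeled concentration to the matching monitored value."""
    if monitored <= 0:
        raise ValueError("monitored concentration must be positive")
    return modeled / monitored


def within_factor_of_two(modeled, monitored):
    """True when the model-to-monitor ratio lies between 0.5 and 2.0."""
    return 0.5 <= model_to_monitor_ratio(modeled, monitored) <= 2.0


def fraction_within(pairs):
    """Fraction of (modeled, monitored) pairs that agree within a factor of 2."""
    flags = [within_factor_of_two(modeled, monitored) for modeled, monitored in pairs]
    return sum(flags) / len(flags)


# Hypothetical (modeled, monitored) site averages in the same units.
pairs = [(1.2, 1.0), (0.4, 1.0), (2.5, 1.0), (0.9, 1.5)]
share = fraction_within(pairs)
```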
        Example of model-to-monitor comparisons for NATA and methodology for
        comparisons are provided at:

-------
                    Evaluating  Models

                             Methodology

        Modeled Data. Modeled data of interest for air toxics include publicly
        available and widely used NATA data. For this example, NATA99
        model results were used.
        Monitored Data.  In order to reduce perturbations from meteorology
        and other data biases in monitored data, the site average of 1998-
        2000 valid annual averages was used for comparison to model output.
        The finest spatial resolution of NATA99 data is census tract level, so
        NATA99 modeled results should be related to ambient monitoring data
        at this level. If multiple sites fall into one census tract, the sites should
        still be individually evaluated.
        Analyses. If data from many sites are available, box plots of
        modeled/monitored data can be examined; fewer sites lend
        themselves to a scatter plot approach of model-to-monitor data.
        Model-to-monitor ratios within a factor of 2 are considered to be within
        the acceptable limits of a good comparison; see

-------
                    Evaluating  Models
                           Using Box Plots
      The figure shows the ratio of NATA99
      modeled data to monitored data at an
      urban area's sites to indicate the
      accuracy of modeled data.
      Red lines indicate the cutoff for
      modeled-to-monitored concentrations
      within a factor of 2.
      Acetaldehyde, benzene,
      dichloromethane, and trichloroethene
      typically agreed within a factor of 2,
      consistent with national level
      comparisons of model and monitor
      data.
      However, ethylbenzene,
      formaldehyde, carbon tetrachloride,
      chloroform, and tetrachloroethylene
      showed monitored concentrations
      more than a factor of
      2 higher than model estimates at
      these sites.
      [Figure: Box plots of modeled-to-monitored concentration ratios by pollutant.]

-------
                           Evaluating  Models
                                 Using Scatter Plots
         Modeled and monitored concentrations can
         also be compared using scatter plots,
         plotting each data pair (ambient site-
         average, model output) separately. For
         NATA 1999, monitored benzene data compared well to
         the modeled data.
         There are several reasons why we would
         expect good agreement between model
         prediction and monitor results for benzene.
          - It is a widely distributed pollutant which is
            emitted from point, area, and mobile
            sources. Thus, if the model is biased in the way
            it handles any one of these source categories,
            the bias will likely be dampened by one of the
            other sources.
          - An estimated background concentration was
            available for benzene in the modeling effort.
          - There is a large number (87) of monitoring sites
            for benzene for this comparison, resulting in an
            adequate sample size for the statistics in the
            comparison.
          - Monitoring technology for benzene has a long
            history, suggesting that the monitoring data
            reflect actual ambient concentrations.
          - Benzene emissions have been tracked for many
            years, so there is some confidence in emission
            estimates.
                [Figure: Model to Monitor plot for Benzene. Vertical axis: Model Conc.;
                horizontal axis: Monitor Concentration (0 to 8); 2:1 and 1:2 lines
                bound the factor of 2 wedge.]
                Model-to-monitor scatter plot for benzene. Most
                points fall within the factor of 2 wedge, and none
                are far outside the wedge. From
                http://www.epa.gov/ttn/atw/nata/draft6.html#secV

-------
                 Network Assessment
                            Introduction

        Air quality agencies may choose to re-evaluate and reconfigure
        monitoring networks because
         - Air quality has changed;
         - Populations and behaviors have changed;
         - New air quality objectives have been established
          (e.g., air toxics reductions, PM2.5, regional haze); and
         - Understanding of air quality issues and monitoring capabilities have
          improved.
        Network assessments may include
         - Re-evaluation of the objectives and budget for air monitoring;
         - Evaluation of a network's effectiveness and efficiency relative to its
          objectives and costs; and
         - Development of recommendations for network reconfigurations and
          improvements.
        Network assessment guidance is available from EPA at

-------
              Network Assessment
                       Methodology

     Some things to consider when performing a
     network assessment:
      •  Length of monitoring. Takes into account a site's
        monitoring history because long data records can be
        highly useful in trends and accountability analyses.
      •  Suitability analyses. Combines many data sets such as
        population or population change, meteorology,
        topography, and emissions to assess suitability of current
        or future monitoring locations.

-------
                 Network Assessment
                    Period of Operation (1 of 2)
        Motivation
         - Monitors that have long
           historical records are
           valuable for tracking
           trends.
         - This technique places
           the most importance on
           sites with the longest
           continuous trend record.
        Resources needed
         - Historical monitor data,
           typically valid annual
           averages.
           [Figure: Number of monitoring sites per year by pollutant. Legend:
           1,3-Butadiene, Acetaldehyde, Benzene, Chromium (TSP),
           1,4-Dichlorobenzene, Arsenic (TSP), Carbon Tetrachloride, Nickel (TSP).]
      The figure shows the number of monitoring sites per year
      for a variety of air toxics. The number of air toxics
      monitoring sites has increased dramatically since 1990.

-------
                 Network Assessment
                    Period of Operation (2 of 2)
City, State           AQS Site ID    Years
Stockton, CA          06-077-1002    13
Baltimore, MD         24-510-0040    12
Los Angeles, CA       06-037-1002    11
San Francisco, CA     06-001-1001    10
Fresno, CA            06-019-0008    10
Baltimore, MD         24-005-3001    10
Los Angeles, CA       06-037-1103     9
Los Angeles, CA       06-037-4002     9
San Diego, CA         06-073-0003     9
San Francisco, CA     06-075-0005     9
San Jose, CA          06-085-0004     9
Baltimore, MD         24-510-0006     9
Sacramento, CA        06-061-0006     8
San Diego, CA         06-073-0001     8
Oxnard, CA            06-111-2002     8
Chicago, IL-IN-WI     18-089-2008     8
Baltimore, MD         24-510-0035     8
                           The table lists the number of annual averages available for
                           tetrachloroethylene at toxics monitoring sites from 1990 to 2003.
                           For this analysis, sites with the longest record would be rated
                           higher than those with shorter records.
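
                           Rating sites by record length amounts to a simple sort. The
                           sketch below uses a few rows from the table above; the tuple
                           layout and function name are illustrative assumptions.

```python
# (city, AQS site ID, number of valid annual averages), deliberately unordered.
sites = [
    ("Oxnard, CA", "06-111-2002", 8),
    ("Los Angeles, CA", "06-037-1002", 11),
    ("Stockton, CA", "06-077-1002", 13),
    ("San Francisco, CA", "06-001-1001", 10),
    ("Baltimore, MD", "24-510-0040", 12),
]


def rank_by_record_length(records):
    """Order sites from the longest record to the shortest."""
    return sorted(records, key=lambda record: record[2], reverse=True)


ranked = rank_by_record_length(sites)
```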

-------
                 Network Assessment

          Suitability Modeling/Spatial Analysis (1 of 2)

      • Motivation
         - This method may be used to identify suitable monitoring locations
           based on user-selected criteria.
         - Geographic map layers representing important criteria, such as
           emissions source influence, proximity to populated places, urban
           or rural land use, and site accessibility, can be compiled and
           merged to develop a composite map representing the combination
           of important criteria for a defined area.
         - The results indicate the best locations to site monitors based on
           the input criteria and may be used to guide new monitor siting or to
           understand how changes may impact the current monitoring
           network.

      • Resources needed
         - GIS, site locations, population and other
           demographic/socioeconomic data, emission inventory data
         - Meteorology and concentration data may be helpful, but are not
           necessary
         - Skilled GIS analyst

-------
                 Network Assessment
          Suitability Modeling/Spatial Analysis (2 of 2)
    A representation of the process of suitability modeling and spatial analysis:
       1. Input data: point, line, or polygon geographic data (the diagram shows
          point, line, population, and elevation layers as examples).
       2. Gridded data: create distance contours or density plots from the data sets.
       3. Reclassified data: reclassify data to create a common scale.
       4. Weight and combine data sets.
       5. Output suitability model (shaded from high suitability to low suitability).
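
The reclassify, weight, and combine steps of this process can be sketched on small grids. This is a minimal sketch under stated assumptions, not the GIS workflow itself: the grids, the 1 to 10 equal-interval scale, the weights, and the function names `reclassify` and `weighted_overlay` are all hypothetical and chosen only for illustration.

```python
def reclassify(grid, n_classes=10, invert=False):
    """Map raw cell values onto integer classes 1..n_classes using equal intervals.

    With invert=True, small raw values receive high classes (for example,
    a short distance to a source counts as more suitable).
    """
    flat = [value for row in grid for value in row]
    low, high = min(flat), max(flat)
    span = high - low
    result = []
    for row in grid:
        new_row = []
        for value in row:
            if span == 0:
                cls = 1  # a constant layer carries no spatial information
            else:
                cls = min(int((value - low) / span * n_classes) + 1, n_classes)
            new_row.append(n_classes + 1 - cls if invert else cls)
        result.append(new_row)
    return result


def weighted_overlay(layers, weights):
    """Combine reclassified layers cell by cell; weights are fractions summing to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    first = next(iter(layers.values()))
    n_rows, n_cols = len(first), len(first[0])
    return [
        [sum(weights[name] * layers[name][r][c] for name in layers)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical 2 x 2 input grids (made-up values).
population_density = [[0, 50], [100, 200]]   # persons per sq km
distance_to_road = [[100, 2000], [500, 0]]   # meters

layers = {
    "population": reclassify(population_density),
    "road_distance": reclassify(distance_to_road, invert=True),
}
weights = {"population": 0.4, "road_distance": 0.6}
suitability = weighted_overlay(layers, weights)
print(suitability)
```

Higher output values mark cells that score well on the weighted criteria; a GIS package performs the same cell-by-cell arithmetic on full raster layers.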

-------
                Network Assessment
                Suitability Modeling Example

       The goal of this analysis of the Phoenix area was to use
       GIS technology to identify locations within an area
       potentially suitable for placing air toxics and/or particulate
       monitors to better assess diesel particulate matter (DPM)
       emissions impacts on population.
       The emission inventory was assessed to determine
        - predominant sources of DPM; and
        - the best available geographic data to represent the spatial pattern
          of the identified emission sources in the region.
       The relative importance of each geographic data set was
       determined based on its potential DPM contribution.
       The input layers were weighted accordingly and combined
       to produce a suitability map using the Spatial Analyst GIS
       tool.

-------
                    Network  Assessment
             Example Suitability Modeling Data Layers
   1. Traffic volume (Annual Average Daily
     Traffic, AADT)
   2. Heavy-duty truck volume (from AADT
     data)
   3. Locations of railroads and
     transportation depots
   4. Residential and commercial
     development areas
   5. Golf courses and cemetery locations
     (lawn and garden equipment usage)
   6. Airport locations
   7. PM2.5 point source locations (weight
     assigned to each source depends on
     the source's relative EC contribution)
   8. Total population and sensitive
     population (e.g., under 5 and over
     65 years of age) density
   9. Annual average gridded wind fields
     representing predominant wind
     direction throughout the region
   [Map: Link-based Annual Average Daily Traffic (AADT). Legend: Airport,
    City Boundary, Tribal Land Boundary, County Boundary, AADT]

-------
              Network Assessment

         Example Suitability Modeling Weighting

        Weighting scheme: two model scenarios were used
           1.  Proximity to diesel emission sources (hot spot)
           2.  Proximity of population to diesel sources
Layer                                   (1) Hot Spot   (2) Total Population   Weighting Criteria
Density of total population             -              40%                    High population density = more suitable
Heavy-duty vehicle activity             20%            12%                    High traffic density = more suitable
Light-duty vehicle activity             15%            9%                     High traffic density = more suitable
Transportation distribution facility    20%            12%                    Close to facility = more suitable
Lawn/garden activity areas              12%            7.2%                   High activity density = more suitable
Commercial/residential construction     20%            12%                    High activity density = more suitable
  activity areas
Distance to airports                    2%             1.2%                   Close to airport = more suitable
Distance to railroads                   2%             1.2%                   Close to railroad = more suitable
PM2.5 point source activity             9%             5.4%                   High non-EC PM2.5 emissions density = less suitable
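
A quick arithmetic check of the two weighting schemes is sketched below. The Scenario 2 weights in the table appear to equal the Scenario 1 weights scaled by 0.6, with the remaining 40% assigned to population density; this is an observation from the numbers, not a procedure stated in the source, and the function name `add_population_layer` is hypothetical.

```python
# Scenario 1 (hot spot) weights in percent, copied from the table above.
hot_spot = {
    "Heavy-duty vehicle activity": 20,
    "Light-duty vehicle activity": 15,
    "Transportation distribution facility": 20,
    "Lawn/garden activity areas": 12,
    "Commercial/residential construction activity areas": 20,
    "Distance to airports": 2,
    "Distance to railroads": 2,
    "PM2.5 point source activity": 9,
}


def add_population_layer(weights, population_share,
                         name="Density of total population"):
    """Rescale existing weights so a new layer can take `population_share` percent."""
    remaining = 100 - population_share
    rescaled = {layer: round(w * remaining / 100, 1) for layer, w in weights.items()}
    rescaled[name] = float(population_share)
    return rescaled


total_population = add_population_layer(hot_spot, 40)

print(sum(hot_spot.values()))                  # total of Scenario 1 weights
print(round(sum(total_population.values()), 1))  # total of derived Scenario 2 weights
for layer, weight in total_population.items():
    print(f"{layer:<52} {weight:>5.1f}%")
```

Checking that each scheme totals 100% before combining layers is a cheap guard against transcription errors in the weights.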

-------
                    Network  Assessment
             Example Results of Suitability Modeling
   The map shows the
   results of combining all
   data layers in Scenario 1
   (table on previous slide).
   The map indicates that
   the Glendale area is a
   hot spot for both diesel
   influence and population,
   as well as the area
   around the Phoenix
   Supersite.
   The area between
   Guadalupe and Mesa is
   also suitable for
   monitoring to better
   understand DPM
   impacts.
   [Map: Scenario 1 (population and meteorology included). Suitability model output
    for the Phoenix area. Labeled places include Phoenix, Paradise Valley, Apache
    Junction, and Queen Creek; the JLG Supersite is marked.
    Legend: Suitability Model; AQ Monitor Location; Interstate/Freeway; Urban Boundary]

   Total Population/Wind Influence Weighting Scheme
     Total Population Density = 40%
     Heavy Duty AADT Roads = 12%
     Transportation Facilities = 12%
     Commercial/Residential Development Areas = 12%
     Light Duty AADT Roads = 9%
     Commercial Lawn/Garden Usage Areas = 7.2%
     PM2.5 Point Sources = 5.4%
     Railroads = 1.2%
     Airports = 1.2%

-------
              Network Assessment
               Suitability Analysis Summary

       Results of this analysis assisted decision makers in
       - Assessing the utility of current monitors;
       - Selecting locations for new monitors;
       - Setting monitoring priorities; and
       - Investigating a range of monitoring objectives and
         considerations.
       Suitability analysis can improve the effectiveness of
       monitoring decisions.

-------
                     Resources
       PMF, Unmix, and CMB:
       http://www.epa.gov/scram001/receptorindex.htm
       EPA's Multivariate Receptor Modeling Workbook:
       http://www.sonomatechdata.com/sti_workbooks/#MVRMWB
       NOAA HYSPLIT model:
       http://www.arl.noaa.gov/ready/hysplit4.html
       EPA SPECIATE, recently updated (version 4.0):
       http://www.epa.gov/ttn/chief/software/speciate/index.html
       Network assessment guidance:
       http://www.epa.gov/ttn/amtic/cpreldoc.html

-------
                                        References
      Begum B.A., Kim E., Jeong C.H., Lee D.W., and Hopke P. (2005) Evaluation of the potential source contribution
        function using the 2002 Quebec forest fire episode. Atmos. Environ. 39, 3719-3724.
      Berkowitz C.M., Xie Y.-L, Jolly J., and Estes M. (2004) Receptor modeling and analysis: early first results from the
        2003 enhanced Houston Auto-GC network.  Presented at the TERC Science Advisory Committee (SAC) Meeting,
        October 13. Available on the Internet at
        .
      Brown S.G., Frankel A., and Hafner H.R.  (2007a) Source apportionment of VOCs in the Los Angeles area using
        positive matrix factorization. Atmos. Environ. 41, 227-237 (STI-2725).
      Brown S.G., Wade K.S., and Hafner H.R. (2007b) Multivariate receptor modeling workbook. Prepared for the U.S.
        Environmental Protection Agency, Office of  Research and Development, Research Triangle Park, NC, by Sonoma
        Technology, Inc., Petaluma, CA, STI-906207.01-3216, August.
      Chinkin L.R., Coe D.L., Hafner H.R., and  Tamura T.M. (2003) Air toxics emission inventory training workshop.
        Sponsored by the U.S. Environmental Protection Agency, Region IX, Richmond, CA. Prepared by Sonoma
        Technology, Inc., Petaluma, CA, STI-903320-2398, July 15-16.
      Friedlander S.K. (1973) Chemical element balances and identification of air pollution sources. Environ. Sci. Technol. 7,
        235-240.
      Fujita E.M., Croes B.E., Bennett C.L., Lawson D.R., Lurmann F.W., and Main H.H. (1992) Comparison of emission
        inventory and ambient concentration ratios of CO, NMOG, and NOx in California's South Coast Air Basin. J. Air &
        Waste Manag. Assoc. 42, 264-276.
      Fujita E.M., Watson J.G., Chow J.C., and Lu Z. (1994) Validation of the chemical mass balance receptor model applied
        to hydrocarbon source apportionment in the Southern California Air Study. Environ. Sci. Technol. 28,  1633-1649.
      Gordon G.E. (1988) Receptor models. Environ. Sci. & Technol. 22(10), 1132-1142.
      Hafner H.R., Penfold B.M., and Brown S.G. (2005) Using GIS tools to select suitable DPM monitoring locations:
        Phoenix, Arizona. Presented at the 2005 Air Toxics Summit, Seeking Solutions for our Rural and Urban
        Communities, Portland, OR, October 18-19, by Sonoma Technology, Inc., Petaluma, CA (STI-904234-2755).
      Henry R.C., Lewis C.W., Hopke P.K., and Williamson H.J. (1984) Review of receptor model fundamentals. Atmos.
         Environ. 18(8), 1507-1515.
      Henry R.C. (1997) History and fundamentals of multivariate air quality receptor models. Chemometrics and Intelligent
         Laboratory Systems 37, 525-530.
      Henry R.C. (2000) Unmix Version 2 Manual. Available on the Internet at
         .
      Henry R.C. (2002) Receptor modeling. In Encyclopedia of Environmetrics, A.H. El-Shaarawi and W.W. Piegorsch eds.,
         John Wiley & Sons, Ltd, Chichester, 1706-1721.
      Hidy G.M. and Friedlander S.K. (1971) The nature of the Los Angeles aerosol. In proceedings from the Second
         International Clean Air Congress, 391-404, Academic Press, New York.
      Hopke P.K. (2003) A guide to  positive matrix factorization. Prepared for Positive Matrix Factorization Program,
         Potsdam, NY, by the Department of Chemistry, Clarkson University, Potsdam, NY.
      Hopke P.K., Ramadan Z., Paatero P., Norris G., Landis M., Williams R., and Lewis C.W. (2003) Receptor Modeling of
         Ambient and Personal Exposure Samples: 1998 Baltimore Particulate Matter Epidemiology-Exposure Study. Atmos.
         Environ. 37, 3289-3302.
      Kim E.,  Hopke P.K., Larson T.V., and Covert D.S. (2004a) Analysis of ambient particle size distributions using UNMIX
         and  positive matrix factorization. Environ. Sci. Technol. 38 (1), 202-209.
      Kim E.,  Hopke P.K., Larson T.V., Maykut N.N., and Lewtas J. (2004b) Factor analysis of Seattle fine particles. Aerosol
         Sci.  Technol. 38 (7), 724-738.
      Kim E., Hopke P.K., Kenski D.M., and Koerber M. (2005b) Sources of fine particles in a rural Midwestern U.S. area.
         Environ. Sci. Technol. 39 (13), 4953-4960.
      Kim E. and Hopke P.K. (2004) Improving source identification of fine particles in a rural northeastern U.S. area utilizing
         temperature-resolved carbon fractions. J. Geophys. Res. 109 (D9), D09204, doi:10.1029/2003JD004199.
      Kim E., Hopke P.K., and Qin Y. (2005c) Estimation of organic carbon blank values and error structures of the speciation
         trends network data for source apportionment. J. Air & Waste Manag. Assoc. 55, 1190-1199.
      Lindsey C.G., Chen J., Dye T.S., Richards L.W., and Blumenthal D.L. (1999) Meteorological processes affecting the
         transport of emissions from the Navajo Generating Station to Grand Canyon National Park. J. Appl. Meteorol. 38
         (No. 8),  1031-1048.
      Main H.H. and Roberts P.T. (2001) PM2.5 data analysis workbook. Draft workbook prepared for the U.S. Environmental
         Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC, by Sonoma
         Technology, Inc., Petaluma, CA, STI-900242-1988-DWB, February.
      Main H.H. and Roberts P.T. (2000) PAMS data analysis workbook: illustrating the use of PAMS data to support ozone
         control programs. Prepared for the U.S. Environmental Protection Agency, Research Triangle Park, NC, by Sonoma
         Technology, Inc., Petaluma, CA, STI-900243-1987-FWB, September.
      Larsen R.K. and Baker J.E. (2003) Source apportionment of polycyclic aromatic hydrocarbons in the urban atmosphere:
         a comparison of three methods. Environ. Sci. Technol. 37 (9), 1873-1881.
      Lewis C.W., Norris G.A., Conner T.L., and Henry R.C. (2003) Source apportionment of Phoenix PM2.5 aerosol with the
         Unmix receptor model. J. Air & Waste Manag. Assoc. 53 (3), 325-338.
      Paatero P.  and Tapper U. (1994) Positive matrix factorization: a non-negative factor model with optimal utilization of
         error estimates of data values. Environmetrics 5, 111-126.
      Paatero P.  (1997) Least squares formulation of robust non-negative factor analysis. Chemometrics and Intelligent
         Laboratory Systems 37,  23-35.
      Paatero P., Hopke P.K., and Philip K. (2003) Discarding or downweighting high-noise variables in factor analytic
         models. Anal. Chim. Acta 490, 277-289.
      Poirot R.L., Wishinski P.R.,  Hopke P.K., and Polissar A.V. (2001) Comparative application of multiple receptor methods
         to identify aerosol sources in northern Vermont. Environ. Sci. Technol. 35 (23), 4622-4636.
      Raffuse S.M., Sullivan D.C., McCarthy M.C., Penfold B.M., and Hafner H.R. (2006) Analytical techniques for technical
         assessments of ambient air monitoring networks. Guidance  document prepared for the U.S. Environmental
         Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC, by Sonoma
         Technology, Inc., Petaluma, CA, STI-905212.02-2805-GD, September.


      Raffuse S.M., Brown S.G., Sullivan D.C., and Chinkin L.R. (2005) Estimating regional contributions to atmospheric
         haze. Presented at the 2005 ESRI International User Conference, San Diego, CA, July 26 (STI-2649).
      Raffuse S.M., Sullivan D.C., and Chinkin L.R. (2005) Emission impact potential - a method for relating upwind
         emissions to ambient pollutant concentrations. Presented at the U.S. Environmental Protection Agency 14th
         International Emission Inventory Conference, Las Vegas, NV, April 11-14 by Sonoma Technology, Inc., Petaluma,
         CA (STI-2715, STI-2722). Available on the Internet at
         .
      Rosenbaum A.S., Ligocki M.P., and Wei Y.H. (1999) Modeling cumulative outdoor concentrations of hazardous air
         pollutants. Revised final report prepared for the U.S.  Environmental Protection Agency, Research Triangle Park, NC
         (SYSAPP-99-96/33R2), February.
      Seigneur C., Pun B., Lohman K., and Wu S.-Y. (2002) Air toxics modeling. Report prepared for Coordinating Research
         Council, Inc., Alpharetta, GA and U.S. Department of Energy's Office of Heavy Vehicle Technologies through the
         National Renewable  Energy Laboratory, Golden, CO by Atmospheric & Environmental Research, Inc., San Ramon,
         CA, CRC Project Number A-42-1, Document Number CP079-02-3, August. Available on the Internet at
         ;
      U.S. Environmental Protection Agency (1999) Air dispersion modeling of toxic pollutants in urban areas: Guidance,
         methodology and example applications. Prepared by the Office of Air Quality Planning and Standards, Research
         Triangle Park, NC, EPA-454/R-99-021, July.
      Watson J.G. (1979) Chemical element balance receptor model methodology for assessing the source of fine and total
         particulate matter. Ph.D. Dissertation, Oregon Graduate Center, Portland, OR, University Microfilms International,
         Ann Arbor, MI.
      Watson J.G. (1984) Overview of receptor model principles. J. Air Poll. Cont. Assoc. 34 (6), 619-623.
      Watson J.G., Fujita E.M., Chow J.C., Zielinska B., Richards L.W., Neff W., and Dietrich D. (1998) Northern front range
         air quality study. Final report prepared for Colorado State University, Cooperative Institute for Research in the
         Atmosphere, Fort Collins, CO, by Desert Research Institute, Reno, NV, STI-996410-1772-FR, June.

-------
             Suggested Analyses
        What types of analyses could be done with my air toxics data?
June 2009                  Section 8 - Suggested Analyses

-------
                            Motivation
     •  Ambient air toxics have been monitored since 2001/2002 as part of
       NATTS and even longer as part of other monitoring programs.
       While national-level analyses have been conducted, it is important
       that these data be investigated at a local, state, and regional level
       to better understand an area's air toxics issues.
     •  Regular data analysis may be conducted annually to identify
       potential problems with the data at the site level.  Adjustments can
       then be made in  collection or analysis to improve data quality
       before several years of potentially poor quality data have been
       collected.
     •  A list of suggested air toxics data analyses has been provided
       (Introduction). This list is a potential minimum set of analyses that
       each area could  perform.
     •  Key areas of interest
        -  Is the quality of data sufficient for analysis?
        -  How would air toxics be characterized in the area?
        -  What are local sources of air toxics?
        -  Are there changes in toxics concentrations over time?

-------
                          Suggested Analyses
                        What's Covered in  This Section
      A set of potential analyses using Arizona data has been used as an example.
      •  This section outlines a sample analysis of an urban data set from start to finish in order to
         provide a thorough example. These data were previously assessed and readily available.
      •  Note that this is an example analysis and is not intended to show the only way air toxics
         analyses should be performed.  Deviations or additional analyses may be necessary depending
         on the data or the analyst's objectives.
      •  The following topics will be covered, following the sequence of this workbook:
          -  Background
                 Introduction to the data
                 Understanding sources
          -  Data validation (Workbook Section 4)
                 Determining data completeness
                 Assessing data below detection
                 Identifying censored data
                 Using quality-controlled data
                 Applying data validation techniques
          -  Data characterization (Workbook Section 5)
                 Putting data in perspective
                 Spatial patterns
                 Temporal patterns
                 Model-to-monitor comparisons
                 Risk screening
          -  Trends (Workbook Section 6)
          -  Advanced analyses (Workbook Section 7)
                 Source apportionment

-------
                  Introduction  to the Data

                                    Overview

      •  The sample data set used throughout this section is from an air toxics study
        performed in Arizona as part of the Joint Air Toxics Assessment Project (JATAP).
      •  The purpose of the study was to determine which air toxics are of most concern to
        the area and tribal communities.
      •  The study was conducted in two phases.  (Analyses in this section focus primarily
        on Phase II data.)
         - Phase I: March 2003-March 2004
         - Phase II: February 2005 - March 2005
      •  Twenty-four-hour air toxics samples were collected every sixth day. On some days
        at some sites, two 12-hr samples were collected; for this analysis, these samples
        were averaged to 24-hr values. Only gaseous air toxics were collected and are discussed here.
      •  A considerable quality assurance effort was made
         - Duplicate samples (collocated)
         - Replicate data (additional chemical analysis on canister)
         - Interlaboratory comparisons (more than one laboratory was involved)
         - Data validation
      •  For the trend assessment, we used  historical data at two longer-term sites in the
        study area to illustrate air toxics concentrations over time in the area.



-------
                   Introduction to the Data
                         Monitoring Site Locations
      [Map: monitoring site locations. Site labels: West Phoenix, JLG Supersite,
       Senior Center (Salt River), Greenwood, South Phoenix, Queen Valley,
       St. Johns (Gila River), West 43rd St.
       Legend: ADEQ sites; St Johns site; Salt River site; Urban Areas]
      The map shows the eight monitoring sites in the study. The map was created with ArcMap. The
      West Phoenix, South Phoenix, and Senior Center sites are used most frequently in the sample
      analyses. The St. Johns site was operated by the Gila River Indian Community. The Senior
      Center site was operated by the Salt River Pima-Maricopa Indian Community.

-------
                Understanding  Sources
                            Population Density
      The map shows population density in the study area. The three focus sites are
      indicated.
      Data from these sites help identify the most populated areas and potential air
      toxics source locations (e.g., high population density suggests higher emissions).
      2000 population density data were obtained from the U.S. Census Bureau.
      [Map: Total Population Density (total population per sq km). Place labels:
       Fountain Hills, Paradise Valley, Apache Junction, Queen Creek. Legend: Airport,
       County Boundary, Tribal Land Boundary, City Boundary. Sites marked: West Phoenix,
       South Phoenix, Senior Center. Scale bar: 0 to 20 kilometers]

-------
               Understanding  Sources
                           Mobile Sources
      The map shows annual average daily traffic (AADT) and heavy-duty vehicle (HDV)
      daily traffic for the study area (number of vehicles per day). The three sites
      of interest for this example are shown.
      AADT is an indicator of the relative on-road mobile source activity, and
      corresponding emissions levels, in the study area.
      Traffic data were obtained from the Arizona Department of Transportation (ADOT).
      [Maps: Annual Average Daily Traffic; HDV Annual Average Daily Traffic.
       Sites marked: West Phoenix, South Phoenix, Senior Center]

-------
                Understanding  Sources
                             Point Sources
      The map shows point source emissions for total VOCs in the study area. The three
      sites of interest are shown on the map. Other sites in the area are also shown
      (Supersite [PSAZ] and St. Johns [SJAZ]).
      Note that mobile source emissions are not included in this data set (see the
      average daily traffic maps on previous slide).
      Emissions data were obtained from the 2002 NEI.
      [Map: Point Source Emissions of VOCs. Legend: VOC (tons/yr)]

-------
      Using  Quality Assurance Data
                      Overview

     Quality assurance (QA) is performed during
     sample collection and analysis to provide
     additional information about data quality and
     usefulness.
     - Collocated samples indicate agreement in
       sample collection
     - Replicate samples indicate agreement in
       sample analysis
     These data provide insight into biases and error
     that may occur in the process of collecting and
     analyzing samples.

-------
         Using  Quality Assurance  Data
         Visual Inspection of Collocated Samples (1 of 2)
      Visual inspection of
      collocated samples is
      important to identify outliers
      and understand sampler
      performance.
      Collocated data for
      chloroform are plotted in
      the figure.
      The data indicate that
      chloroform is consistently
      measured; however
      Sampler 2 reported slightly
      lower values than Sampler 1
      at higher concentrations.
      [Figure: Chloroform. Collocated 2 (ppbv) versus Collocated 1 (ppbv), both axes
       from 0 to 0.2. Regression line: y = 0.8871x + 0.003, R2 = 0.9648]
                                   The figure shows collocated chloroform samples collected in
                                   the study. It was created with Microsoft Excel.
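
The slope, intercept, and R2 reported on a collocated scatter plot come from an ordinary least-squares fit, which can be sketched as follows. The paired concentrations below are made up for illustration and are not the study data; the function name `linear_fit` is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, with R squared."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical collocated pairs in ppbv (sampler 1, sampler 2).
sampler_1 = [0.02, 0.05, 0.08, 0.12, 0.16]
sampler_2 = [0.02, 0.05, 0.07, 0.11, 0.14]

slope, intercept, r_squared = linear_fit(sampler_1, sampler_2)
print(f"y = {slope:.4f}x + {intercept:.4f}, R2 = {r_squared:.4f}")
```

A slope below 1 with a high R2, as in the chloroform figure, indicates that the second sampler tracks the first closely but reads proportionally lower as concentrations rise.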

-------
         Using  Quality Assurance  Data
         Visual Inspection of Collocated Samples (2 of 2)
      In this figure, collocated data
      for hexachlorobutadiene are
      plotted to the right; outliers are
      circled in red.  Outliers
      identified from collocated
      samples should be excluded
      from further data analyses.
      The data indicate that
      hexachlorobutadiene is not
      consistently measured;
      Sampler 2 reported lower
      values than Sampler 1 at high
      concentrations. This is
      consistent with observations of
      collocated chloroform data.
      [Figure: TO15 Hexachlorobutadiene. Collocated 2 (ppbv) versus Collocated 1 (ppbv),
       both axes from 0 to 2.4. Regression line: y = 1.2427x - 0.0883, N = 24.
       Standard error: intercept 0.22, slope 0.31]

-------
         Using  Quality Assurance  Data

           Summarizing Sample Problems for Analysis

       The table shows an excerpt from the list of measurements, identifying problems in
       one of the study area site replicate comparisons.
       In site-level analyses, we typically exclude any of these failures. We flagged as
       suspect the pollutant identified as a problem in the indicated sample and did not use
       this pollutant/sample combination in subsequent analyses (e.g., toluene on 7/26/03).
       Flag 1 indicates that the percentage error was greater than 50%. Flag 2 indicates
       that the absolute difference between the two measurements of a species was greater
       than three times the MDL. Flag 3 indicates that the replicate or collocated average
       was suspect.
Date        Species Name              Flag 1   Flag 2   Flag 3   Suspect
7/26/2003   Toluene                   X        X                 X
7/26/2003   1,3,5-trimethylbenzene    X                          X
7/26/2003   1,2,4-trimethylbenzene    X                          X
8/25/2003   MTBE                                        X        X
8/25/2003   Methyl ethyl ketone                         X        X
8/25/2003   n-octane                                    X        X
8/25/2003   1,3,5-trimethylbenzene                      X        X
8/25/2003   1,2,4-trimethylbenzene                      X        X
9/24/2003   Methyl ethyl ketone                X                 X
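
The first two flags can be computed directly from a replicate or collocated pair, as in the sketch below. Several details are assumptions, since the source does not define them: percentage error is taken here as the absolute difference divided by the pair mean, Flag 3 is treated as an analyst judgment that is not computed, and the function name `qa_flags` and the example values are hypothetical.

```python
def qa_flags(value_1, value_2, mdl):
    """Return Flag 1 and Flag 2 for one pollutant measured in a sample pair.

    Flag 1: percentage error greater than 50% (assumed definition:
            absolute difference divided by the pair mean, times 100).
    Flag 2: absolute difference greater than three times the MDL.
    """
    abs_diff = abs(value_1 - value_2)
    pair_mean = (value_1 + value_2) / 2
    percent_error = 100 * abs_diff / pair_mean if pair_mean > 0 else 0.0
    flag_1 = percent_error > 50
    flag_2 = abs_diff > 3 * mdl
    return {
        "percent_error": percent_error,
        "flag_1": flag_1,
        "flag_2": flag_2,
        "suspect": flag_1 or flag_2,
    }


# Hypothetical replicate pairs in ppbv with a hypothetical MDL of 0.1 ppbv.
print(qa_flags(1.0, 2.0, mdl=0.1))   # large disagreement: both flags set
print(qa_flags(1.0, 1.1, mdl=0.1))   # good agreement: no flags set
```

As in the table, a pollutant/sample combination would be marked suspect when any flag is set, and the Flag 3 judgment would be added by the analyst.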

-------
              Data Completeness
                        Overview


     For the site-level analysis, we summarized available
     data and calculated data completeness based on
     expected samples.
     This step included calculating the number of valid
     samples versus the expected number of samples
     based on collection frequency.
     In general, 75% data completeness is required to
     calculate valid aggregated values (e.g., monthly,
     quarterly, and annual averages).
     See Preparing Data for Analysis, Section 4, for a
     complete description of methods and rationale.
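
The completeness calculation can be sketched in a few lines. This is a minimal illustration, assuming a fixed sampling interval and the 75% criterion stated above; the dates, the sample counts, and the function names `expected_samples` and `completeness` are hypothetical.

```python
from datetime import date


def expected_samples(start, end, interval_days=6):
    """Count scheduled sampling days from start to end (inclusive) at a fixed interval."""
    return (end - start).days // interval_days + 1


def completeness(valid, expected, threshold=75.0):
    """Return (percent of expected samples that are valid, meets-threshold flag)."""
    percent = 100.0 * valid / expected
    return percent, percent >= threshold


# Illustrative one-year period sampled every sixth day.
n_expected = expected_samples(date(2005, 1, 1), date(2005, 12, 31))
percent, usable = completeness(valid=49, expected=n_expected)
print(f"expected={n_expected}, completeness={percent:.0f}%, usable={usable}")
```

The same check would be repeated for each aggregation period (for example, each quarter) before computing that period's average.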

-------
                    Data Completeness
                          Site-Level Summary
Site            Sampling        Sampling Duration   Samples Expected         Samples Available        Valid Samples   Percent Valid
Greenwood       Cartridges(a)   24-hr               61                       60                       60              98
                Canisters       24-hr               61                       61                       59              97
JLG Supersite   Cartridges(a)   24-hr               61                       61                       49              80
                Canisters       24-hr               61                       61                       55              90
Queen Valley    Canisters       24-hr               31                       31                       30              97
St. Johns       Canisters       24-hr and 12-hr     30 (24-hr), 62 (12-hr)   37 (24-hr), 44 (12-hr)   79              95(b)
Senior Center   Canisters       24-hr and 12-hr     30 (24-hr), 62 (12-hr)   37 (24-hr), 46 (12-hr)   83              98(b)
South Phoenix   Cartridges(a)   24-hr               61                       60                       52              85
                Canisters       24-hr               61                       60                       59              97
West Phoenix    Canisters       24-hr               61                       60                       59              97
       The table shows data necessary to calculate the data completeness and the percent of valid
       data. The number of valid samples was computed after data validation steps but shown here for
       a complete summary.
       A high percentage of samples from all sites were valid.
       Additional samples may be marked as suspect during the process of data analysis.
June 2009
                     a Carbonyls only.
Section 8 - Suggested Analyses  b This percentage is based on 24-hr average sample days. 14

-------
       Assessing  Data  Above  Detection
                                        2005 Percent Above MDL
Species                St. Johns  Senior Center  South Phoenix  West Phoenix  Greenwood  JLG Supersite  Queen Valley
Benzene                100        99             100            100           100        100            100
Bromomethane           40         36             37             49            24         33             23
Carbon tetrachloride   89         89             89             83            100        100            100
Chloroform             43         90             77             83            98         100            53
Dichloromethane        76         94             97             98            100        100            97
Ethylbenzene           71         92             92             94            100        100            93
Hexachlorobutadiene    0          0              0              0             2          4              0
       The percent of data above detection should be calculated for each pollutant, site and year; additional
       calculations will be needed if monthly or seasonal aggregates are produced. The table shows an
       excerpt of the entire data set - the percent of data above detection for 2005. This example spans the
       range of data above detection observed in the data set.
       Data were color-coded in the table to illustrate potential patterns in data
       availability. More data were below detection at St. Johns and Queen Valley,
       consistent with their location away from sources. Hexachlorobutadiene was
       typically below MDL at all sites.
                                  < 25% Above MDL
                                25% to 75% Above MDL
                                 >= 75% Above MDL
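       The percent-above-detection calculation can be sketched as shown here. The record
       layout and function names are hypothetical, and treating "above detection" as a
       concentration strictly greater than the MDL is an assumption. The three
       categories follow the key above.

```python
from collections import defaultdict


def percent_above_mdl(records):
    """Percent of samples above the MDL for each (site, pollutant, year).

    records: iterable of dicts with keys site, pollutant, year, conc, mdl
    (a hypothetical layout chosen for this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [n_above, n_total]
    for r in records:
        key = (r["site"], r["pollutant"], r["year"])
        counts[key][1] += 1
        # Assumption: "above detection" means strictly greater than the MDL.
        if r["conc"] > r["mdl"]:
            counts[key][0] += 1
    return {k: 100.0 * above / total for k, (above, total) in counts.items()}


def detection_category(pct):
    """Assign the category used in the table key."""
    if pct < 25:
        return "< 25% above MDL"
    if pct < 75:
        return "25% to 75% above MDL"
    return ">= 75% above MDL"
```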

-------
                Identifying  Censored   Data
     Alternate MDLs were included with
     the study data. Because alternate
     MDLs are often different for each
     sample, it is not always clear from the
     data that censoring (e.g., substitution
     with MDL or MDL/2) has occurred.
     We need to ensure that all samples
     are treated similarly when data are
     aggregated.
     Scatter plots are an easy way to
     identify whether data below detection
     are censored.
     Plot all data points that are less than
     or equal to the alternate MDL.
     The agreement between
     concentration and MDL indicates that
     the alternate MDL was substituted for
     values below detection. These
     samples were identified and MDL/2
     substitution was subsequently applied
     for data aggregation.
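     The same check can be done numerically. The sketch below is illustrative only:
     it finds samples whose reported concentration equals the sample's own alternate
     MDL and replaces those values with MDL/2. The function names, the data layout
     (a list of concentration/MDL pairs), and the tolerance are assumptions.

```python
def find_mdl_substituted(samples, tol=1e-9):
    """Return indices of samples whose concentration equals their MDL.

    samples: list of (concentration, mdl) pairs, both in the same units.
    Agreement between concentration and MDL suggests MDL substitution.
    """
    return [i for i, (conc, mdl) in enumerate(samples) if abs(conc - mdl) <= tol]


def substitute_half_mdl(samples, indices):
    """Return a new list with MDL/2 in place of the flagged concentrations."""
    flagged = set(indices)
    return [
        (mdl / 2.0, mdl) if i in flagged else (conc, mdl)
        for i, (conc, mdl) in enumerate(samples)
    ]
```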
     [Figure: Hexachlorobutadiene concentration versus MDL (ppbv).]
The graph shows the comparison of concentration values to
their MDL for data at or below detection.  It was created with
Microsoft Excel.

-------
                    Validation  Techniques

                                      Overview

      •  Once data are received from the laboratory, or a data repository such as AQS, it is
        useful to apply screening criteria during the early stages of data validation to
        identify suspect data that may not be representative of actual ambient
        concentrations.
      •  Basic visual analyses should be performed to identify potential  problems in the data
        and to begin to understand data characteristics.
      •  Knowledge of similarity of sources, lifetime, and reactivity should be used to assist
        in data validation.
      •  The following screening checks are typically used:
          - Comparison to remote background concentrations. Urban air toxics concentrations should
           not be lower than remote background concentrations.
          - Range checks. Check minimum and maximum concentrations for anomalous values.
          - Buddy site check. Compare concentrations at one site to nearby sites to look for
           anomalies.
          - Sticking check. Check data for consecutive equal data values which indicate the possibility
           of censored  data not flagged appropriately.
          - Scatter plots. Investigate the relationship between species to identify sources and suspect
           data.
          - Fingerprint plots.  Investigate the pattern of species concentrations and relationships
           among species to identify sources and suspect data.
      •  See Preparing Data for Analysis, Section 4, for a complete description of
        methods and rationale.
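      Two of the screening checks listed above, the range check and the sticking check,
      lend themselves to a short sketch. The function names, the limits, and the
      minimum run length are hypothetical choices for illustration; suitable values
      depend on the pollutant and the data set.

```python
def range_check(values, lower, upper):
    """Return indices of values outside the expected [lower, upper] range."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def sticking_check(values, min_run=3):
    """Return (start, end) index pairs, inclusive, for runs of at least
    min_run consecutive equal values."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the data or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs
```

      Values returned by either check are candidates for review, not automatically
      invalid data.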


-------
                 Validation  Techniques
                           Buddy Site  Check
       Buddy site checks are useful in
       identifying suspect data.
       In the example, time series of benzene
       concentrations for three sites are
       plotted.
       There is clearly a suspect data point at
       the West Phoenix site in March 2005,
       which is not corroborated by the other
       sites.  This indicates that the data point
       should be considered suspect because
       a concentration spike of that magnitude
       should register at nearby sites.
        - Investigation into these data showed that
          this event corresponds to a single data
          point significantly higher than the others.
        - Further investigation revealed that many
          species showed the same behavior at the
          West Phoenix site. The site may be
          impacted by a local source or sources.
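        A buddy site check can also be automated as a first screen. In the sketch below,
        a concentration is flagged when it exceeds a multiple of the median
        concentration measured at nearby sites on the same date. The function name,
        the data layout, and the ratio limit of 5 are assumptions made for
        illustration.

```python
from statistics import median


def buddy_site_check(target, buddies, ratio_limit=5.0):
    """Return dates on which the target site is far above its buddy sites.

    target:  {date: concentration} for the site being checked.
    buddies: list of {date: concentration} dicts for nearby sites.
    A date is flagged when the target exceeds ratio_limit times the
    median of the buddy-site concentrations on that date.
    """
    suspect = []
    for date, conc in sorted(target.items()):
        nearby = [b[date] for b in buddies if date in b]
        if not nearby:
            continue  # nothing to compare against on this date
        reference = median(nearby)
        if reference > 0 and conc > ratio_limit * reference:
            suspect.append(date)
    return suspect
```

        Flagged dates still require investigation, because a real local source can
        produce the same pattern as a measurement error.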
        [Figure: Benzene. Time series of concentrations at the West Phoenix, South
        Phoenix, and Senior Center sites.]
-------
                    Validation  Techniques
                                     Time Series
The figures show the same benzene
time series as the previous slide and
matching time series for a variety of
other compounds.
Benzene, ethylbenzene, and toluene
can all be emitted by mobile sources.
The fact that these species peak at
the same time is suspicious, because
an increase of that magnitude from
typical mobile source emissions is
unlikely. However, an unusual event
may have occurred, such as a
gasoline spill very near the West
Phoenix site that could have led to
the high concentrations.
Examining the time series of carbon
tetrachloride helps confirm or reject
this theory because there are no
likely sources that would cause a
spike of that magnitude. The time
series of carbon tetrachloride shows a
spike on the same day indicating that
the event is in fact an instrument or
analysis error. All data for that date
and site should  be flagged as suspect
and not used in subsequent analyses.
[Figures: Time series of benzene and other compounds at the West Phoenix,
South Phoenix, and Senior Center sites.]
-------
                    Validation  Techniques
                                 Scatter Plots (1 of 2)
The scatter plots show the relationship between
toluene and benzene and toluene and m,p-xylene at
three study sites. This method is another way to
identify suspect data, which have been circled in red
in the figures.
At the West Phoenix site, the correlation between
toluene, benzene, and m,p-xylene is strong, indicating
that this site is highly mobile source-dominated.
Outlier data points may point to data issues or other
source influences.  For toluene outliers, high  toluene
concentrations are often  associated with solvent use
or surface coatings; thus, the samples are likely valid.
The correlations at the South Phoenix site are not
quite as strong, but still indicate that the site is likely
mobile source-dominated.
The Senior Center site, on the other hand, shows
a weak correlation among the three species, as
expected for a site farther from fresh emissions.
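The strength of these relationships can be quantified with a correlation
coefficient, and candidate outliers can be screened by their species ratio. The
sketch below is illustrative; the function names and the ratio factor of 3 are
assumptions, and points it returns are only candidates for review.

```python
from math import sqrt
from statistics import median


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ratio_outliers(x, y, factor=3.0):
    """Return indices where y/x exceeds factor times the median y/x ratio.

    For example, x could be benzene and y toluene, so that unusually
    toluene-rich samples are returned.
    """
    ratios = [b / a for a, b in zip(x, y)]
    typical = median(ratios)
    return [i for i, r in enumerate(ratios) if r > factor * typical]
```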
[Figures: Scatter plots of toluene versus benzene and toluene versus m,p-xylene
at three study sites. O = outlier.]

-------
                    Validation  Techniques
                                 Scatter Plots (2 of 2)
The figures show the same data as in the previous slide for the
West Phoenix site only. The dates of the two highest outliers
have been marked.
The outlier values all correspond to the unusually high toluene
concentrations.  Significantly, the three toluene outliers
correspond with the three highest m,p-xylene events.
These correlations indicate that the high concentrations may not
be due to collection  or analysis errors, but may indicate solvent or
surface-coating emissions impacting the site.  Further exploration
might include assessing the importance of these concentrations
on the annual average and looking for possible sources of
toluene in the emission inventory.
The table below shows an emission profile for surface coating from
EPA's SPECIATE. Xylenes and toluene account for almost one-third
(30.5%) of this source profile, supporting the hypothesis that the
high concentration events are solvent-driven.
Profile Number: M02
Profile Name: Surface Coating Operations (Industrial)
Percent Total: 100

POLLUTANT               CAS No.    Percent
ISOMERS OF XYLENE       1330207    15.800
TOLUENE                 108883     14.700
METHYL ETHYL KETONE     78933       8.100
DIETHYLENE GLYCOL       111466      6.600
N-BUTYL ALCOHOL         71363       6.400

[Figures: West Phoenix. Scatter plots of toluene versus benzene and toluene
versus m,p-xylene, with the outliers on 2/21/2005 and 8/27/2005 labeled.
O = outlier.]

-------
                  Validation  Techniques
                             Fingerprint Plots
       Fingerprint plots represent
       concentrations of all species by date.
       They are useful for identifying relative
       pollutant concentrations on typical and
       unusual days.
       A typical fingerprint can be determined
       quantitatively (e.g., median sample
       composition) or qualitatively (e.g., visual
       inspection of all fingerprints).
       The figures to the right show a typical
       fingerprint plot and fingerprint plots for
       2/21/2005 and 8/27/2005 (the two dates
       of the highest outlier events in the
       previous slides).
       A review of fingerprints listed in EPA's
       SPECIATE shows that toluene and
       xylenes are prominent components of
       surface coatings.
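       A quantitative typical fingerprint can be sketched as the median concentration
       of each species across all samples, against which an individual sample is then
       compared. This is a simplified illustration: the function names, the data
       layout, and the factor of 5 are assumptions, and a composition-based
       fingerprint (species fractions rather than concentrations) would be an equally
       reasonable choice.

```python
from statistics import median


def typical_fingerprint(samples):
    """Median concentration of each species across samples.

    samples: list of {species: concentration} dicts, one per sample date.
    """
    species = sorted({sp for sample in samples for sp in sample})
    return {
        sp: median(sample[sp] for sample in samples if sp in sample)
        for sp in species
    }


def unusual_species(sample, typical, factor=5.0):
    """Return species whose concentration exceeds factor times the typical value."""
    return sorted(
        sp
        for sp, conc in sample.items()
        if sp in typical and typical[sp] > 0 and conc > factor * typical[sp]
    )
```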
       [Figures: Fingerprint plots of concentration by species for a typical sample,
       2/21/2005, and 8/27/2005. One plot is annotated "Note scale is lower than the
       other two plots."]

-------
              Validation Techniques
                          Summary

      What have we learned from applying these validation
      techniques?
      - Additional invalid and suspect data points were identified.
      - Data quality and limitations are better understood.
      - Spatial and temporal characteristics of the data are more
        thoroughly characterized.
      - Hypotheses about possible source influences for further
        investigation can be formed.
      These are a few examples of the data validation process
      that would be performed on the data set.
      Remember, data validation continues as part of data
      analysis.

-------
             Basic  Understanding  of  Data
                              Scatter Plot Matrices
Scatter plot matrices provide a quick and easy
way to view correlations and outliers within a
large amount of data.
Scatter plot matrices are interpreted by
matching the pollutant name on the row and
column corresponding to the scatter plot.
Histograms showing the distribution of
measured values for each pollutant are included
along the top diagonal.
The graph to the right shows scatter plot
relationships for five pollutants at the South
Phoenix site. Note that previously identified
outliers have been removed.
The data show a clear correlation between
toluene, m,p-xylene, and benzene, indicating
that these pollutants are likely from mobile
sources.  Chloroform also shows a slight
correlation with the mobile source pollutants
(across the second row from the bottom) but the
bifurcated relationship indicates a secondary
source. Carbon tetrachloride shows little
correlation with any species and shows a
histogram that is roughly Gaussian, as expected
for background pollutants.







[Figure: Scatter plot matrix for five pollutants at the South Phoenix site.]

-------
         Putting Data In Perspective
                       Overview
     Putting concentrations and MDLs into perspective
     provides a framework for comparing site-level
     concentrations to national  levels and to other sites in
     the area.
     This information is useful in assessing whether
     concentrations are typical, low, or high and can help
     explain the impact of local  source emissions on
     monitored concentrations.

-------
                Putting   Data  In  Perspective
                              National  Concentrations
The figure shows the national 5th-95th,
25th-75th, and 50th percentile concentrations by
species (bars) compared to site-averaged
concentrations (symbols).
Though Senior Center is the most rural of the
sites included in the figure (although within a
few miles of urban emissions), concentrations
are typically higher than the national median and
sometimes higher than the national 75th
percentile concentration, showing that the site is
impacted by urban emissions.
Concentrations at the West and South Phoenix
sites are also typically well above the national
median. Concentrations of benzene and
1,3-butadiene are near or above the 95th
percentile of national concentrations.
National concentrations of carbon tetrachloride
fall within a very small range due to its
ubiquitous background concentration. The
average carbon tetrachloride concentrations at
all study sites are in good agreement with
national levels, providing confidence that data
collection in the study is representative of
national data collection methods.

[Figure: National 5th:95th and 25th:75th percentile ranges and national medians,
with MCAZ, SRAZ, and SPAZ averages, for benzene, carbon tetrachloride,
1,3-butadiene, chloroform, dichloromethane, tetrachloroethene, and
trichloroethene. X-axis: Concentration (µg/m3), 0.01 to 100.]

 MCAZ = West Phoenix
 SPAZ = South Phoenix
 SRAZ = Senior Center

-------
             Putting  Data  In  Perspective
                                 Cancer Risk
      The figure shows the same
      data as the previous slide,
      with the addition of the
      chronic exposure
      concentration associated with
      a 1-in-a-million cancer risk to
      place health risks in
      perspective.
      Concentrations could  be
      compared to other cancer risk
      levels: 0.1-in-a-million, 10-in-
      a-million, 100-in-a-million, etc.
      Concentrations are typically
      higher than the 1-in-a-million
      cancer risk level shown
      except for dichloromethane
      and sometimes
      trichloroethene.
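      Comparing a concentration with a risk-based benchmark reduces to a ratio. The
      sketch below assumes that cancer risk scales linearly with chronic exposure
      concentration, which is how the other risk levels mentioned above (0.1-, 10-,
      and 100-in-a-million) would be derived from the 1-in-a-million concentration.
      The function names are hypothetical, and any numbers passed to them should come
      from an authoritative source of health benchmarks.

```python
def risk_in_a_million(concentration, one_in_a_million_conc):
    """Estimated cancer risk, in cases per million, for a concentration.

    Assumption: risk is proportional to concentration, so the ratio of the
    concentration to the 1-in-a-million benchmark gives the risk level.
    Both arguments must be in the same units (e.g., ug/m3).
    """
    return concentration / one_in_a_million_conc


def benchmark_for_risk(one_in_a_million_conc, risk_level_in_a_million):
    """Concentration associated with another risk level (e.g., 10-in-a-million)."""
    return one_in_a_million_conc * risk_level_in_a_million


def exceeds_benchmark(concentration, one_in_a_million_conc):
    """True if the concentration is above the 1-in-a-million benchmark."""
    return risk_in_a_million(concentration, one_in_a_million_conc) > 1.0
```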
      [Figure: National 5th:95th and 25th:75th percentile ranges and national medians,
      MCAZ, SRAZ, and SPAZ averages, and the 1-in-a-million chronic exposure
      concentration for 1,3-butadiene, benzene, carbon tetrachloride, chloroform,
      dichloromethane, tetrachloroethene, and trichloroethene. X-axis: Concentration
      (µg/m3), 0.01 to 100.]

MCAZ = West Phoenix
SPAZ = South Phoenix
SRAZ = Senior Center

-------
               Putting  Data  In  Perspective
                                            MDLs
Examining the relationship between
MDLs at multiple sites is imperative to
check that MDL/2 substitutions are not
biasing the data differently at different
sites.
The graph shows the average MDL and
minimum-to-maximum MDL range for
three study sites.
This graphical method allows the analyst
to quickly confirm that MDLs are very
similar between sites.
 -  MDLs at the West Phoenix site (light
    purple bar) are sometimes higher than
    at other sites.
 -  The difference is not enough to cause a
    major bias unless a high percentage of
    data is below the MDL. For example,
    hexachlorobutadiene is typically below
    detection so MDL/2 substitution may cause
    concentrations at the West Phoenix site to
    appear higher than at the other sites.
    However, for hexachlorobutadiene, such a
    large portion of data is below detection that
    it cannot be reliably used for many
    analyses in the first place.
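The same comparison can be made numerically by summarizing the MDLs at each site.
The sketch below is illustrative; the function names and data layout are
hypothetical. The ratio of the highest to the lowest site-average MDL gives a
rough indication of how differently MDL/2 substitution would affect the sites for
a pollutant that is mostly below detection.

```python
def mdl_summary(mdls_by_site):
    """Average, minimum, and maximum MDL for each site.

    mdls_by_site: {site: list of per-sample MDLs for one pollutant}.
    """
    return {
        site: {"avg": sum(v) / len(v), "min": min(v), "max": max(v)}
        for site, v in mdls_by_site.items()
    }


def avg_mdl_ratio(summary):
    """Ratio of the highest to the lowest site-average MDL."""
    averages = [s["avg"] for s in summary.values()]
    return max(averages) / min(averages)
```

A ratio near 1 indicates that MDL/2 substitution affects all sites similarly.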

[Figure: MDL Assessment. Average MDL and minimum-to-maximum MDL range (ppbv) in
2005 at MCAZ, SRAZ, and SPAZ for 1,1-dichloroethene, 1,2-dichloroethane,
1,2-dichloropropane, 1,3-butadiene, benzene, bromomethane, carbon tetrachloride,
chloroform, dichloromethane, hexachlorobutadiene, m,p-xylene, methyl tert butyl
ether, o-xylene, styrene, tetrachloroethene, toluene, trichloroethene, and vinyl
chloride. Each species is labeled with its analytical method (TO-14, TO-15, or
TO-15 SIM).]

               MCAZ = West Phoenix
               SPAZ = South Phoenix
               SRAZ = Senior Center

-------
                                Spatial   Patterns
          Understanding spatial patterns is important
          and can provide insight into
           -  Improving monitoring networks
           -  Verifying and improving emission inventories
           -  Verifying and improving models
           -  Identifying sources
          The box plots show 2005 concentrations of
          benzene, 1,3-butadiene, chloroform, and
          carbon tetrachloride at three study sites.
          Benzene and 1,3-butadiene concentrations
          are higher and more variable at the West and
          South Phoenix sites.
           -  The lower concentrations and especially lower
              variability at the Senior Center site indicate that
              the site is removed from primary sources and is
              representative of the regional background.
          Chloroform and carbon tetrachloride are
          relatively consistent at all sites.
           -  This behavior is expected for carbon
              tetrachloride which should be at background
              levels across the United States.
           -  That chloroform does not follow the same pattern
              as benzene and 1,3-butadiene indicates the
              compounds probably have different sources.
              Benzene and  1,3-butadiene are primarily emitted
              by mobile sources while chloroform is emitted
              primarily from industrial operations.
          [Figure: 2005 Concentrations by Site. Box plots of benzene, 1,3-butadiene,
          chloroform, and carbon tetrachloride concentrations at MCAZ, SPAZ, and SRAZ.]

              MCAZ = West Phoenix
              SPAZ = South Phoenix
              SRAZ = Senior Center

-------
                  Temporal Patterns
                             Overview

     • Characterization of temporal patterns can provide information on
      sources, physical or chemical processes affecting air toxics
      concentrations, and additional data validation.
     • Before beginning temporal characterization, it is recommended to
      create valid aggregated data sets (examples in Characterizing Air
      Toxics, Section 5) to ensure the data are representative.
     • There are sufficient data records in the example data set (i.e., one
      year of samples collected every sixth day) to characterize
      seasonal and weekday/weekend patterns.
     • There are too few records in this data set to create day-of-week
      patterns (i.e., 95% confidence intervals on the means will overlap
      too much across the days because of the small sample size).
     • 1- to 3-hr samples were not collected so diurnal patterns cannot be
      investigated.

-------
                           Temporal  Patterns
                                           Seasonal
         The figures show seasonal patterns for benzene at three sites.

         The South and West Phoenix sites show typical benzene seasonal patterns (see Characterizing Air Toxics,
         Section 5) with lower concentrations during warm months and higher concentrations during cooler months. This
         pattern results from seasonal differences in mixing height and reactivity rather than from changes in sources.

         At the Senior Center site, benzene shows an invariant seasonal pattern. While we expect higher concentrations
         in winter, note that the concentrations are generally lower during all seasons at this site. All samples are
         well mixed upon arriving at the Senior Center and are similar to summer concentrations at the other sites.

         These data follow expectations for urban and downwind sites. The seasonal variability for these pollutants
         shows that, for the urban data, annual averages computed without the winter quarter would be biased low, and
         annual averages computed without the summer quarter would be biased high.
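         The effect of a missing quarter on the annual average can be illustrated with a
         short sketch. The function names and data layout are hypothetical, calendar
         quarters are assumed, and the annual average is taken as the mean of the
         available quarterly means.

```python
def quarterly_means(samples):
    """Mean concentration in each calendar quarter.

    samples: list of (month, concentration) pairs, month in 1..12.
    Returns {quarter: mean} for quarters that have data.
    """
    by_quarter = {}
    for month, conc in samples:
        quarter = (month - 1) // 3 + 1
        by_quarter.setdefault(quarter, []).append(conc)
    return {q: sum(v) / len(v) for q, v in sorted(by_quarter.items())}


def annual_mean(quarter_means, skip=()):
    """Mean of the quarterly means, optionally leaving out some quarters."""
    used = [m for q, m in quarter_means.items() if q not in skip]
    return sum(used) / len(used)
```

         With higher concentrations in the cool quarters, leaving out a cool quarter
         lowers the computed annual average and leaving out a warm quarter raises it.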
         [Figures: Seasonal benzene concentrations at the West Phoenix, South Phoenix,
         and Senior Center sites.]

-------
                        Temporal  Patterns

                               Weekday/Weekend
        The figures show weekday and weekend benzene concentrations at three study monitoring
        sites.
        Typically, we would expect lower mobile source air toxics (MSAT) concentrations on weekends,
        but in practice this is not always observed.
        The West Phoenix site shows higher weekend concentrations, but the difference is not
        statistically significant at 95% confidence. This difference may indicate that additional weekend
        events near the site are causing benzene emissions.  For example, monitors placed near a
        facility with high use on weekends, such as a recreational facility, may show this pattern.
        Additional investigation of the surrounding area may be warranted but was not done.
        The South Phoenix site shows slightly lower weekend concentrations (but not statistically
        significant).  This pattern is more typical of urban sites at a national level.
        The Senior Center site shows invariant weekday/weekend patterns consistent with the well-
        mixed and aged nature of samples arriving at the site.
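
        One common way to judge whether two medians differ at roughly 95% confidence is to compare the
        notches of notched box plots. The sketch below assumes the usual notch half-width of
        1.57 x IQR / sqrt(n); the slide does not state which test was used for the weekday/weekend
        comparison, and the sample values and function names are hypothetical.

```python
import math
import statistics


def notch_interval(values):
    """Approximate 95% confidence interval on the median (box plot notch)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def medians_differ(group_a, group_b):
    """True when the two notch intervals do not overlap."""
    low_a, high_a = notch_interval(group_a)
    low_b, high_b = notch_interval(group_b)
    return high_a < low_b or high_b < low_a


# Hypothetical benzene concentrations (ppbv); not study data.
weekday = [1.5, 1.7, 1.6, 1.8, 1.4, 1.9, 1.6, 1.7]
weekend = [1.6, 1.8, 1.7, 1.9, 1.5, 2.0]

differ = medians_differ(weekday, weekend)
print("weekday notch:", notch_interval(weekday))
print("weekend notch:", notch_interval(weekend))
print("medians differ at ~95% confidence:", differ)
```

        In this illustrative case the weekend median is higher, but the notches overlap, so the difference
        would not be called statistically significant.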
        [Figure: weekday and weekend benzene concentrations at the West Phoenix, South Phoenix, and
        Senior Center sites; x-axis categories: Weekday, Weekend]

-------
                             Risk  Screening
                                       Overview
        Risk screening may provide a summary of ambient concentrations of air toxics that
        may be of concern.
        To identify species that may indicate higher risk, follow the decision tree below for
        each pollutant.
        After risk species have been identified, you may wish to create risk-weighted annual
        averages.
        The screening here uses the 1-in-a-million cancer risk level; one could select a
        higher or lower risk level and define the level of concern depending on the purpose of
        the screening.  Other health effects, such as non-cancer threshold values, could be
        used as well.
        Is 85% of data for this site-pollutant below MDL?
         -  Yes: Is health benchmark above MDL?
             •   Yes: Pollutant concentration is below health benchmark. Upper limit of risk <1x10^-6.
             •   No: Site-pollutant is uncharacterizable. Upper limit of risk >1x10^-6.
         -  No: Is site-average concentration above health benchmark?
             •   Yes: Pollutant concentration is above health benchmark. Risk >1x10^-6.
             •   No: Pollutant concentration is below health benchmark. Risk <1x10^-6.
                                                                       (ICF Consulting, 2004)
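
        The decision tree can be sketched as a small function. This is an illustrative reading of the
        tree, not code from ICF Consulting (2004): the function name, argument names, and return strings
        are assumptions, and "85% of data below MDL" is interpreted here as "at least 85%".

```python
def screen_site_pollutant(pct_below_mdl, benchmark, mdl, site_avg):
    """Classify one site-pollutant pair following the decision tree.

    pct_below_mdl: percent of samples below the method detection limit (MDL)
    benchmark:     health benchmark concentration (e.g., 1-in-a-million cancer risk)
    mdl:           method detection limit, same units as benchmark
    site_avg:      site-average concentration, same units as benchmark
    """
    # Assumption: "85% of data below MDL" is read as "at least 85%".
    if pct_below_mdl >= 85:
        if benchmark > mdl:
            return "below benchmark; upper limit of risk < 1e-6"
        return "uncharacterizable; upper limit of risk > 1e-6"
    if site_avg > benchmark:
        return "above benchmark; risk > 1e-6"
    return "below benchmark; risk < 1e-6"


# Hypothetical inputs that exercise each branch (not study data).
examples = {
    "mostly nondetect, benchmark above MDL": (90, 0.5, 0.1, 0.05),
    "mostly nondetect, benchmark below MDL": (100, 0.004, 0.1, 0.1),
    "detected, average above benchmark": (10, 0.04, 0.5, 1.7),
    "detected, average below benchmark": (10, 2.0, 0.5, 1.0),
}
for label, args in examples.items():
    print(f"{label}: {screen_site_pollutant(*args)}")
```

        The same function can be rerun with a different benchmark to screen at another risk level or
        against a non-cancer threshold.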

-------
                               Risk  Screening
                                   West Phoenix  Site
                        West Phoenix data necessary for risk screening
         Pollutant             % Below     1-in-a-million   Average Method    West Phoenix Site
                               Detection   cancer risk      Detection Limit   Average Concentration
                                           (ppbv)           (ppbv)            (ppbv)
         Benzene               0           0.040            0.50              1.7
         Hexachlorobutadiene   100         0.0043           0.13              0.17
         Perform risk screening by applying all the data listed in the table to the risk-screening decision
         tree (see previous slide).  Screening may be performed on a range of risk levels and also for
         non-cancer levels of concern.
         Benzene
          -  More than 85% of data is above detection so there is high confidence in measured concentrations.
          -  The site average concentration is above the chronic exposure concentration associated with a 1-in-a-
             million cancer risk.
         Hexachlorobutadiene
          -  100% of data is below detection so we have no confidence that the measured concentrations accurately
             reflect ambient concentrations. However, we  know that concentrations are below the MDL (note that
             MDLs varied by sample and the average is shown).
          -  The chronic exposure concentration  associated with a 1-in-a-million cancer risk is below the MDL.
          -  Because both the data and the 1-in-a-million cancer risk level are below the MDL, improved data
             collection methods are necessary to characterize risk more accurately. The upper limit of risk is
             based on the MDL.
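
         To put rough numbers on these two cases, the sketch below divides a concentration by the
         1-in-a-million cancer risk concentration. It assumes risk is proportional to concentration, which
         is an assumption of this example rather than something stated on the slide; the input values are
         taken from the West Phoenix table, and the function name is hypothetical.

```python
# Values from the West Phoenix risk-screening table (all in ppbv).
table = {
    "Benzene": {"benchmark": 0.040, "mdl": 0.50, "site_avg": 1.7},
    "Hexachlorobutadiene": {"benchmark": 0.0043, "mdl": 0.13, "site_avg": 0.17},
}


def risk_per_million(concentration, benchmark):
    """Risk in units of 'per million', ASSUMING risk scales linearly with concentration."""
    return concentration / benchmark


# Benzene is well detected, so the site average is used directly.
benzene_risk = risk_per_million(table["Benzene"]["site_avg"], table["Benzene"]["benchmark"])

# Hexachlorobutadiene is 100% below detection, so only an upper limit
# based on the average MDL can be stated.
hcbd = table["Hexachlorobutadiene"]
hcbd_upper_limit = risk_per_million(hcbd["mdl"], hcbd["benchmark"])

print(f"Benzene: about {benzene_risk:.1f} in a million")
print(f"Hexachlorobutadiene: upper limit about {hcbd_upper_limit:.1f} in a million")
```

         Both ratios exceed 1, in line with the screening results above: benzene is above the benchmark,
         and the hexachlorobutadiene upper limit of risk is above 1-in-a-million.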

-------
                                 Trends
                             Five-Year Trends
       Inter-annual trends were investigated for all
       pollutants with sufficient data.
       The notched box plots show benzene
       concentrations at two sites with data available
       from 2001 to 2005.
       Benzene concentrations have remained relatively
       flat at the JLG Supersite and South Phoenix site.
       However,  there is a statistically significant
       difference between the 2001 and 2005
       concentrations at the South Phoenix site.
       Trends for other air toxics showed similarly
       consistent concentrations from year to year for this
       time period.
       Once six years of data are available, two 3-year
       averages should be compared (e.g., the average of
       2001, 2002, and 2003 vs. the average of 2004, 2005,
       and 2006; see Quantifying Trends, Section 6).
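
       The 3-year comparison can be sketched as follows. The annual averages are hypothetical,
       and expressing the result as a percent change is an assumption of this example; Section 6
       describes the recommended approach.

```python
# Hypothetical annual average benzene concentrations (ppbv); not study data.
annual_avg = {2001: 1.9, 2002: 1.8, 2003: 1.7, 2004: 1.7, 2005: 1.6, 2006: 1.5}


def period_mean(values_by_year, years):
    """Mean of the annual averages for the listed years."""
    return sum(values_by_year[year] for year in years) / len(years)


early = period_mean(annual_avg, (2001, 2002, 2003))
late = period_mean(annual_avg, (2004, 2005, 2006))
percent_change = 100.0 * (late - early) / early

print(f"2001-2003 average: {early:.2f} ppbv")
print(f"2004-2006 average: {late:.2f} ppbv")
print(f"percent change:    {percent_change:.1f}%")
```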
       [Figure: notched box plots of benzene concentrations by year at the JLG Supersite and
       South Phoenix sites; x-axis: YEAR]

-------
                       Source  Apportionment
                                              Example
         Principal component analysis (PCA) was applied to air toxics
         data from two sites, South Phoenix and West 43rd St., as part of
         an exploratory analysis. PCA uses correlation or covariance
         between each pair of variables to estimate relationships. PCA
         is relatively easy to perform with basic statistical packages;
         however, the analyst must infer source types from the factors.
         In South Phoenix, PCA resolved six factors, accounting for 81%
         of the variance. These data are illustrated in the top pie chart
         (note that the percentages are percent of variance explained in
         the data, not percent of the mass).
          -  37%: Mobile sources (benzene, 1,3-butadiene, xylenes, toluene,
             ethyl benzene)
          -  9%: Background (carbon tetrachloride, methyl ethyl ketone)
          -  11%: Secondary (formaldehyde, acetaldehyde)
          -  6%: Summer gasoline additives (MTBE)
          -  9%: Plastics (methylene chloride)
          -  9%: Refrigerants/AC (dichlorodifluoromethane, trichlorofluoromethane)
         PCA resolved four factors at the West 43rd St. site,
         accounting for 82% of the variance; carbonyl compound data
         were not available at this site (so fewer factors were resolved).
          -  33%: Mobile sources (benzene, xylenes, toluene, ethylbenzene)
          -  20%: Summer sources, e.g., BBQs, air conditioning
             (trichlorofluoromethane, acetylene, propylene)
          -  14%: Secondary/background (MEK, MTBE, dichlorodifluoromethane)
          -  15%: Plastics (trimethylbenzenes)
         Next steps in this analysis may be to apply CMB or PMF to
         estimate source contributions.
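
         A minimal sketch of PCA on a correlation matrix is shown below. It uses NumPy and synthetic
         data built from two made-up "source" signals; the species labels in the comments, the data, and
         the function name are all assumptions for illustration, not the study's data or software.

```python
import numpy as np


def pca_explained_variance(data):
    """PCA on the correlation matrix.

    data: 2-D array with one row per sample and one column per species.
    Returns the fraction of variance explained by each component
    (largest first) and the matching loadings (eigenvectors as columns).
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]             # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    return eigenvalues / eigenvalues.sum(), eigenvectors


# Synthetic data: three species follow a "mobile" signal and two follow
# a "secondary" signal, each with a little independent noise.
rng = np.random.default_rng(0)
n_samples = 200
mobile = rng.lognormal(size=n_samples)
secondary = rng.lognormal(size=n_samples)
data = np.column_stack([
    mobile + 0.1 * rng.normal(size=n_samples),     # stand-in for benzene
    mobile + 0.1 * rng.normal(size=n_samples),     # stand-in for toluene
    mobile + 0.1 * rng.normal(size=n_samples),     # stand-in for xylenes
    secondary + 0.1 * rng.normal(size=n_samples),  # stand-in for formaldehyde
    secondary + 0.1 * rng.normal(size=n_samples),  # stand-in for acetaldehyde
])

fractions, loadings = pca_explained_variance(data)
for i, fraction in enumerate(fractions, start=1):
    print(f"component {i}: {100 * fraction:.0f}% of variance")
```

         As on the slide, the percentages are percent of variance explained, not percent of mass, and the
         analyst still has to infer a source type for each component from which species load on it.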
         [Figure: pie charts of percent of variance explained by each PCA factor at South Phoenix (top)
         and West 43rd St. (bottom)]

-------
      Model-to-Monitor Comparisons
                        Overview

    • EPA periodically performs a national-scale air toxics
     assessment (NATA) to identify and prioritize the air toxics
     emissions source types and locations that are of
     greatest potential concern in terms of contributing to
     population health risk.  Modeled concentration
     estimates for 177 air toxics and DPM are provided by
     county.  For more information on NATA see
     http://www.epa.gov/ttn/atw/natamain/.
    • To evaluate how the models used in NATA
     performed, EPA conducted a model-to-monitor
     comparison.
    • Comparing monitored and modeled data may
     help in checking the uncertainty of modeled values.


-------
          Model-to-Monitor  Comparisons
                                      Example
        The figure shows the ratio of NATA99 modeled
        data to annual averages computed from
        monitored data at the study area sites to indicate
        the accuracy of modeled data. This example is
        meant to illustrate a technique - note that the
        modeled and ambient data are from different
        years.
        When comparing modeled-to-monitored
        concentrations, results within a factor of 2 are
        considered reasonable agreement (U.S.
        Environmental Protection Agency, 2006b).
        Acetaldehyde, benzene, dichloromethane, and
        trichloroethene typically agreed within a factor of
        2, consistent with national-level comparisons of
        modeled and monitored data.
        However, ethylbenzene, formaldehyde, carbon
        tetrachloride, chloroform, and
        tetrachloroethylene showed monitored
        concentrations more than a factor of 2 higher
        than model estimates at study area sites. There
        are many possible reasons for the differences.
        For example, the carbon tetrachloride model
        estimates have been shown to be low because
        of the use of background concentrations that
        were too low.
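
        The factor-of-2 check can be sketched as below. The pollutant labels and concentrations are
        hypothetical placeholders, and the function names are assumptions made for this example.

```python
def model_to_monitor_ratio(modeled, monitored):
    """Ratio of the modeled to the monitored annual average."""
    return modeled / monitored


def classify(ratio, factor=2.0):
    """Label a ratio as within, below, or above the agreement factor."""
    if ratio < 1.0 / factor:
        return "model low"
    if ratio > factor:
        return "model high"
    return "within factor"


# Hypothetical (modeled, monitored) annual averages in the same units.
pairs = {
    "pollutant A": (1.2, 1.5),
    "pollutant B": (0.10, 0.60),
    "pollutant C": (0.90, 0.40),
}

results = {
    name: classify(model_to_monitor_ratio(modeled, monitored))
    for name, (modeled, monitored) in pairs.items()
}
for name, label in results.items():
    print(f"{name}: {label}")
```

        A "model low" result corresponds to the case described above, where monitored concentrations
        are more than a factor of 2 higher than the model estimates.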
        [Figure: ratio of modeled to monitored annual averages at the study area sites, plotted on a
        logarithmic ratio axis. Boxes are described in Section 4: Preparing Data for Analysis.]

-------
                                    Summary
           What We Learned from this Data  Analysis (1 of 2)
            Overall data completeness was sufficient for analysis.
            For most species, data above detection were sufficient to perform most analyses, while a
            significant percentage of some species' data were below detection.
            QA analyses showed that agreement between collocated data was typical of what other
            studies have found.
            Data were validated using time series, buddy site checks, scatter plots, and fingerprint
            plots. Invalid data points were identified and removed.
            Data were determined to be of sufficient quality for most analyses.


            Air toxics concentrations in the study area were compared to national concentrations and
            chronic exposure concentrations associated with a 1-in-a-million cancer risk;
            concentrations of most air toxics are above the national median concentration at all study
            sites and are typically above the selected levels of risk. It is not clear why, and an
            evaluation/development of the air toxics emission inventory is planned.
            MDLs at study sites were found to be similar across sites so that data are comparable.

            Spatial analyses showed concentrations were similar at the South and West Phoenix sites
            while significantly lower concentrations of MSATs at the Senior Center site were consistent
            with the sites' proximity to emissions.

-------
                                        Summary
            What We  Learned from  this  Data  Analysis (2 of 2)
           -  Temporal patterns were investigated.
               •   Seasonal patterns showed expected trends at the West and South Phoenix sites. Senior Center site
                  benzene concentrations were low and showed no seasonal trend, consistent with aged air impacting the
                  site.
               •   There were no significant weekend/weekday patterns, a typical result because truck traffic or weekday
                  carryover often causes increased Saturday concentrations.  There were not enough data points to reliably
                  investigate trends by day-of-week.
           -  Ambient annual average concentrations were compared to NATA 1999 modeled data.  About half the
              species monitored at study area sites were more than a factor of 2 above their modeled concentration
              values. Inspection of the emission inventory for the study area may be a next step.
           -  Risk screening was performed and the species of most concern were found to be benzene,
              1,3-butadiene, acetaldehyde, carbon tetrachloride, chloroform, and tetrachloroethene.
              Hexachlorobutadiene may be a contributor to risk, but is not measured well enough to quantify the risk.



           -  Five-year trends (2001-2005) showed no significant change at the study sites.
           -  PCA was performed for South Phoenix and West 43rd St.  Mobile sources accounted for about one-
              third of the variance at both sites.  Pollution related to plastics, background species, and secondary
              species contributed about another third.  Both sites showed significant influence from "summer"
              pollutants related to BBQs, air-conditioning/refrigerants, and summer fuel additives.
           -  Mobile source influences were confirmed by other analyses.
               •   Scatter plots showed strong correlation between mobile source air toxics.
               •   Spatial patterns revealed higher mobile source concentrations near busy roadways and much lower
                  concentrations in remote areas.
           -  Short-term solvent emissions events were identified during the process of data validation.


-------
References
                                                                    (1 of 2)

        Arizona Department of Transportation (2005) Average Annual Daily Traffic (AADT). Available on the Internet at

        Brown S.G., Hafner H.R., and Shields E. (2004) Source apportionment of Detroit air toxics data with positive matrix
           factorization. Paper no. 41 presented at the Air & Waste Management Association Symposium on Air Quality
           Measurement Methods and Technology, Research Triangle Park, NC, April 19-22 (STI-2450).
        Brown S.G. and Hafner H.R.  (2003) Source apportionment of Detroit pilot city air toxics data. Presented at the
           National Workshop on Air Toxics Monitoring, Chicago, IL, May 13-14 (STI-902530-2371).
        Brown S.G., Frankel A., and  Hafner H.R. (2005) Principal component analysis and source apportionment of PAMS
           VOC data. Final report prepared for the South Coast Air Quality Management District, Diamond Bar, CA, by
           Sonoma Technology, Inc., Petaluma, CA, STI-904046-2723-FR, July.
        Hafner H.R. and Brown S.G.  (2005) 2005 JATAP monitoring project - gaseous air toxics data validation and
           analysis. Work plan prepared for the Arizona Department of Environmental Quality, Phoenix, AZ, by Sonoma
           Technology, Inc., Petaluma, CA, STI-905039.01-2814-WP, October.
        Hafner H.R., O'Brien T.E., Frankel A.P., McCarthy M.C.,  and Brown S.G. (2006) 2005 JATAP monitoring project
           gaseous air toxics data validation and analysis. Presented at the JATAP Workshop Meeting, Phoenix, AZ,
           March 6, by Sonoma Technology, Inc., Petaluma, CA (905039.02-2921).
        Hafner H.R. and O'Brien T.E. (2006) Analysis of air toxics collected as part of the Joint Air Toxics Assessment
           Project. Final report prepared for the Arizona Department of Environmental Quality, Phoenix, AZ, by Sonoma
           Technology, Inc., Petaluma, CA, STI-905039.03-3016-FR, December.
        Henry R.C. (2000) Unmix Version 2 Manual. Available on the Internet at
            last accessed September 9, 2005.
        Hopke P.K. (2003) A guide to positive matrix factorization. Prepared for Positive Matrix Factorization Program,
           Potsdam, NY, by the Department of Chemistry, Clarkson University, Potsdam, NY.
        ICF Consulting (2004) Air toxics risk assessment reference library, Volume 1. Prepared for the U.S. Environmental
           Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC, by ICF
           Consulting, Fairfax, VA, EPA-453-K-04-001A, April. Available on the Internet at


-------
                                  References
(2  of 2)
        Lewis C.W., Morris G.A., Conner T.L., and Henry R.C. (2003) Source apportionment of Phoenix PM2.5 aerosol
           with the Unmix receptor model. Journal of Air and Waste Management Association 53 (3), 325-338.
        McCarthy M.C., Brown S.G., Hafner H.R., Frankel A., and Broaders K.E. (2004) Data analyses for Phoenix,
           Arizona, air toxics data collected from 2001 to 2004. Final report prepared for Arizona Department of
           Environmental Quality, Phoenix, AZ, by Sonoma Technology, Inc., Petaluma, CA, STI-904236-2666-FR,
           December.
        McCarthy M.C., Hafner H.R., and Montzka S.A. (2006) Background concentrations of 18 air toxics for North
           America. J.  Air & Waste Manage. Assoc. 56, 3-11 (STI-903550-2589). Available on the Internet at

        Sundblom M., Armijo C., and Hafner H. (2006) Joint Air Toxics Assessment Project (JATAP) for the
           Maricopa/Pinal urban area, Arizona. Presentation for the EPA National Air Monitoring Conference, Las Vegas,
           NV, November, by the Arizona Department of Environmental Quality, the Salt River Pima Maricopa Indian
           Community, and Sonoma Technology, Inc., Petaluma, CA.
        U.S. Environmental Protection Agency (1998) CMB8 application and validation protocol for PM2.5 and VOC.
           Report prepared by U.S. Environmental Protection Agency, Research Triangle Park, NC, EPA 454/R-98-xxx,
           October.
        U.S. Environmental Protection Agency (2005) Prioritization of Data Sources for Chronic Exposure. Available on
           the Internet at
        U.S. Environmental Protection Agency (2006) Technology Transfer Network, 1999 National-Scale Air Toxics
           Assessment, 1999 assessment results. Available on the Internet at
           .
        U.S. Environmental Protection Agency (2006)  A Preliminary risk-based screening approach for air toxics
           monitoring data sets. Available on the Internet at
        U.S. Environmental Protection Agency (2006) Comparison of ASPEN Modeling System Results to Monitored
           Concentrations.  Available on the Internet at

-------