United States
Environmental Protection
Agency
    EPA Positive Matrix
  Factorization (PMF) 5.0
     Fundamentals and
        User Guide
    RESEARCH AND DEVELOPMENT

                                   EPA/600/R-14/108
                                       April 2014
                                      www.epa.gov
   EPA Positive Matrix
Factorization (PMF)  5.0
    Fundamentals and
          User Guide
            Gary Morris, Rachelle Duvall
          U.S. Environmental Protection Agency
         National Exposure Research Laboratory
           Research Triangle Park, NC 27711

              Steve Brown, Song Bai
             Sonoma Technology, Inc.
              Petaluma, CA 94954
          U.S. Environmental Protection Agency
          Office of Research and Development
              Washington, DC 20460
 Notice: Although this work was reviewed by EPA and approved for
 publication, it may not necessarily reflect official Agency policy. Mention of
 trade names and commercial products does not constitute endorsement or
 recommendation for use.

U.S. Environmental Protection Agency	EPA PMF 5.0 User Guide


                                  Disclaimer

EPA, through its Office of Research and Development, funded and managed the research and
development described here under contract 68-W-04-005 to Lockheed Martin and EP-D-09-097
to Sonoma Technology, Inc. The User Guide has been subjected to Agency review and is
cleared for official distribution by the EPA. Mention of trade names or commercial products
does not constitute endorsement or recommendation for use.

This User Guide is for the EPA PMF 5.0 program; the disclaimer for the software is shown
below.

The United States Environmental Protection Agency through its Office of Research and
Development funded and collaborated in the research described here under Contract Number
EP-D-09-097 to Sonoma Technology, Inc.

Portions of the code are Copyright ©2005-2014 ExoAnalytics Inc. and Copyright ©2007-2014
Bytescout.
                               Acknowledgments

The Multilinear Engine is the underlying program used to solve the PMF problem in EPA PMF;
version me2gfP4_1345c4 was developed by Pentti Paatero at the University of Helsinki and
Shelly Eberly at Geometric Tools (http://www.geometrictools.com/). Shelly Eberly, Pentti
Paatero, Ram Vedantham, Jeff Prouty, Jay Turner, and Teri Conner have contributed to the
development of this and prior versions of EPA PMF. EPA would like to thank the EPA PMF peer
reviewers for their comments on the software and user guide, and for providing an improved
list of PMF references.




                               Table of Contents


1.  INTRODUCTION
   1.1  Model Overview
   1.2  Multilinear Engine
   1.3  Comparison to EPA PMF 3.0 and Other Methods

2.  USES OF PMF

3.  INSTALLING EPA PMF 5.0

4.  GLOBAL FEATURES

5.  GETTING STARTED
   5.1  Input Files
   5.2  Output Files
   5.3  Configuration Files
   5.4  Suggested Order of Operations
   5.5  Analyze Input Data
       5.5.1  Concentration/Uncertainty
       5.5.2  Concentration Scatter Plots
       5.5.3  Concentration Time Series
       5.5.4  Data Exceptions
   5.6  Base Model Runs
       5.6.1  Initiating a Base Run
       5.6.2  Base Model Run Summary
       5.6.3  Base Model Results
       5.6.4  Factor Names on Base Model Runs Screen
   5.7  Base Model Displacement Error Estimation
   5.8  Base Model BS Error Estimation
       5.8.1  Summary of BS Runs
       5.8.2  Base Bootstrap Box Plots
   5.9  Base Model BS-DISP Error Estimation
   5.10 Interpreting Error Estimate Results

6.  ROTATIONAL TOOLS
   6.1  Fpeak Model Run Specification
       6.1.1  Fpeak Results
       6.1.2  Evaluating Fpeak Results
   6.2  Constrained Model Operation
       6.2.1  Constrained Model Run Specification
       6.2.2  Constrained Profiles/Contribution Results
       6.2.3  Evaluating Constraints Results

7.  TROUBLESHOOTING

8.  TRAINING EXERCISES
   8.1  Milwaukee Water Data
       8.1.1  Data Set Development
       8.1.2  Analyze Input Data
       8.1.3  Base Model Runs
       8.1.4  Error Estimation
   8.2  St. Louis Supersite PM2.5 Data Set
       8.2.1  Data Set Development
       8.2.2  Analyze Input Data
       8.2.3  Base Model Runs
       8.2.4  Error Estimation
       8.2.5  Constrained Model Runs
   8.3  Baton Rouge PAMS VOC Data Set
       8.3.1  Data Set Development
       8.3.2  Analyze Input Data
       8.3.3  Base Model Runs
       8.3.4  Base Model Run Results
       8.3.5  Fpeak
       8.3.6  Constrained Model Runs

9.  PMF & APPLICATION REFERENCES


                                   List of Figures

Figure 1.  Conjugate Gradient Method - underpinnings of PMF solution search
Figure 2.  Example of resizable sections and status bar
Figure 3.  Example of the Input Files screen
Figure 4.  Example of formatting of the Input Concentration file
Figure 5.  Example of an equation-based uncertainty file
Figure 6.  Flow chart of operations within EPA PMF - Base Model
Figure 7.  Flow chart of operations within EPA PMF - Fpeak
Figure 8.  Flow chart of operations within EPA PMF - Constraints
Figure 9.  Example of the Concentration/Uncertainty screen
Figure 10. Example of a concentration scatter plot
Figure 11. Example of the Concentration Time Series screen with excluded and selected samples
Figure 12. Example of the Base Model Runs screen showing Random Start (1) and Fixed Start (2)
Figure 13. Example of the Base Model Runs screen after base runs have been completed
Figure 14. Example of the Residual Analysis screen
Figure 15. Example of the Obs/Pred Scatter Plot screen
Figure 16. Example of the Obs/Pred Time Series screen
Figure 17. Example of the Profiles/Contributions screen
Figure 18. Example of the Profiles/Contributions screen with "Concentration Units" selected
Figure 19. Example of the Profiles/Contributions screen with "Q/Qexp" selected
Figure 20. Example of the Factor Fingerprints screen
Figure 21. Example of the G-Space Plot screen with a red line indicating an edge
Figure 22. Example of the Factor Contributions screen
Figure 23. Example of the Base Model Runs screen with default base model run factor names
Figure 24. Comparison of upper error estimates for zinc source
Figure 25. Example of the Base Model Displacement Summary screen
Figure 26. Example of the Base Model Runs screen highlighting the Base Model Bootstrap Method box
Figure 27. Example of the Base Bootstrap Summary screen
Figure 28. Example of the Base Bootstrap Box Plots screen
Figure 29. Diagram of box plot
Figure 30. Example of the Base Model BS-DISP Summary screen
Figure 31. Error estimation summary plot
Figure 32. Example of the Fpeak Model Run Summary in the Fpeak Model Runs screen
Figure 33. Example of the Fpeak Profiles/Contributions screen
Figure 34. Example of the Fpeak Factor Fingerprints screen
Figure 35. Example of the Fpeak G-Space Plot screen
Figure 36. Example of the Fpeak Factor Contributions screen
Figure 37. G-Space plot and delta between the base run contribution and Fpeak run contribution
           for each contribution point
Figure 38. Expression Builder - Ratio
Figure 39. Expression Builder - Mass Balance
Figure 40. Expression Builder - Custom
Figure 41. Example of expressions on the Constrained Model Runs screen
Figure 42. Selecting constrained species and observations
Figure 43. Example of selecting points to pull to the y-axis in the G-space plot
Figure 44. Example of the Constrained Model Run summary table
Figure 45. Example of the Constrained Profiles/Contributions screen
Figure 46. Example of the Constrained Factor Fingerprints screen
Figure 47. Example of the Constrained G-Space Plot screen
Figure 48. Example of the Constrained Factor Contributions screen
Figure 49. Example of the Constrained Diagnostics screen
Figure 50. PMF results evaluation process
Figure 51. Deep tunnel system
Figure 52. Scatter plot of BOD5 and TSS
Figure 53. Example of observed/predicted results for cadmium
Figure 54. Stacked Graph plot
Figure 55. Profiles/Contributions Plot for multiple site data
Figure 56. Observed/Predicted Time Series Plot for multiple site data
Figure 57. Comparison of error estimation results
Figure 58. Error estimation summary plot of range of concentration by species in each factor
Figure 59. Satellite image of St. Louis Supersite and major emissions sources
Figure 60. Concentration Time Series screen and zoomed-in diagram for the St. Louis data set
Figure 61. Concentration scatter plots for steel elements
Figure 62. Example of output graphs for cadmium (poorly modeled) and lead (well-modeled)
Figure 63. Example of inconsistencies in input data. The multiple points shown in blue in the lower
           left graphic are fixed values
Figure 64. Example of G-space plots for independent (left) and weakly dependent factors (right)
Figure 65. St. Louis stacked base factor profiles
Figure 66. Distribution of mass for St. Louis PM2.5
Figure 67. Summary of base run and error estimates
Figure 68. Comparison of base model and constrained model run profiles for the steel factor
Figure 69. Summary of constrained run and error estimates
Figure 70. Relationships between ambient concentrations of various species
Figure 71. Histogram of scaled residuals for benzene (1) and ethylene (2)
Figure 72. Observed/predicted plots for benzene
Figure 73. Observed/predicted plots for ethylene
Figure 74. VOC factor profiles
Figure 75. Measured VOC profile information. Source: Fujita (2001)
Figure 76. Factor fingerprint plot for VOCs
Figure 77. G-Space plot of motor vehicle and diesel exhaust
Figure 78. Apportionment of TNMOC to factors resolved in the initial 4-factor base run
Figure 79. Observed vs. Predicted Time Series for refinery species
Figure 80. Percent of species associated with a source (1) and Toggle Species Constraint (2)

                                List of Tables

Table 1. Summary of key references
Table 2. Baltimore example - summary of PMF input information
Table 3. Common problems in EPA PMF 5.0
Table 4. Milwaukee Example - Summary of PMF Input Information
Table 5. St. Louis Example - Summary of PMF input information
Table 6. Error Estimation Summary results
Table 7. Baton Rouge Example - Summary of PMF input information
Table 8. VOC species categories
Table 9. Base run bootstrap mapping

                          Acronyms
Acronym     Definition
AMS         Aerosol mass spectrometer
BOD5        Biological oxygen demand
BS          Bootstrap
BS-DISP     Bootstrap-Displacement
CI          Confidence interval
CMB         Chemical mass balance
DDP         Discrete difference percentiles
DISP        Displacement
EC          Elemental carbon
EDXRF       Energy dispersive X-ray fluorescence
GUI         Graphical user interface
MDL         Method detection limit
ME          Multilinear Engine
ME-2        Multilinear Engine version 2
Obs/Pred    Observed/Predicted
OC          Organic carbon
PAMS        Photochemical assessment monitoring stations
PCA         Principal component analysis
PM          Particulate matter
PMF         Positive Matrix Factorization
S/N         Signal-to-noise ratio
TNMOC       Total non-methane organic carbon
TSS         Total suspended solids
VOC         Volatile organic compound


1.     Introduction

1.1    Model Overview

Receptor models are mathematical approaches for quantifying the contribution of sources to
samples based on the composition or fingerprints of the sources. The composition or speciation
is determined using analytical methods appropriate for the media, and key species or
combinations of species are needed to separate impacts. A speciated data set can be viewed
as a data matrix X of n by m dimensions, in which n samples (indexed by i) and m chemical
species (indexed by j) were measured, with uncertainties u. The goal of receptor models is to
solve the chemical mass balance (CMB) between measured species concentrations and source
profiles, as shown in Equation 1-1, where p is the number of factors, f is the species profile
of each source, and g is the amount of mass contributed by each factor to each individual
sample:


$$x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij} \qquad (1\text{-}1)$$

where $e_{ij}$ is the residual for each sample/species. The CMB equation can be solved using
multiple models, including EPA CMB, EPA Unmix, and EPA Positive Matrix Factorization (PMF).

PMF is a multivariate factor analysis tool that decomposes a matrix of speciated sample data
into two matrices: factor contributions (G) and factor profiles (F). The user must interpret
these factor profiles, using measured source profile information and emissions or discharge
inventories, to identify the source types that may be contributing to the sample. The method
is reviewed briefly here and described in greater detail elsewhere (Paatero and Tapper, 1994;
Paatero, 1997).
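To make the matrix form of Equation 1-1 concrete, the short NumPy sketch below builds a synthetic data set from made-up non-negative contributions G (samples by factors) and profiles F (factors by species) plus small residuals E. The sizes, random values, and noise level are illustrative assumptions, not EPA PMF inputs or outputs.

```python
import numpy as np

# Hypothetical sizes: n samples, m species, p factors
rng = np.random.default_rng(0)
n, m, p = 6, 4, 2

G = rng.uniform(0.0, 5.0, size=(n, p))   # factor contributions (n x p), non-negative
F = rng.uniform(0.0, 1.0, size=(p, m))   # factor profiles (p x m), non-negative
E = rng.normal(0.0, 0.05, size=(n, m))   # residuals e_ij (synthetic noise)

X = G @ F + E                            # Equation 1-1 for every i, j at once

# Element-wise check of x_ij = sum_k g_ik f_kj + e_ij for one entry
i, j = 2, 3
x_ij = sum(G[i, k] * F[k, j] for k in range(p)) + E[i, j]
print(X.shape, np.isclose(X[i, j], x_ij))
```

PMF works in the opposite direction: given only X (and its uncertainties), it estimates G and F.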

Results are obtained using the constraint that no sample can have significantly negative source
contributions. PMF uses both the sample concentrations and the user-provided uncertainties
associated with the sample data to weight individual points. This feature allows analysts to
account for the confidence in each measurement. For example, data below the detection limit can
be retained for use in the model, with the associated uncertainty adjusted so these data points
have less influence on the solution than measurements above the detection limit.
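As a sketch of how a larger uncertainty reduces a point's influence, the code below assumes the equation-based convention often used with EPA PMF input files: an uncertainty of 5/6 of the method detection limit (MDL) for values at or below the MDL, and the root-sum-square of an error fraction of the concentration and half the MDL above it. The MDL, error fraction, and concentrations are hypothetical; analysts should choose the uncertainty scheme appropriate to their own data.

```python
import numpy as np

def equation_based_uncertainty(conc, mdl, error_fraction):
    """Assumed convention: 5/6 x MDL at or below the MDL; above it,
    sqrt((error_fraction * conc)**2 + (0.5 * MDL)**2)."""
    conc = np.asarray(conc, dtype=float)
    above = np.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)
    return np.where(conc <= mdl, 5.0 / 6.0 * mdl, above)

# Hypothetical species with MDL = 0.10 and a 10 % error fraction
conc = np.array([0.05, 0.10, 1.00, 5.00])
unc = equation_based_uncertainty(conc, mdl=0.10, error_fraction=0.10)

# Cost in Q (Equation 1-2) of a model that misses each point by 50 % of its value
miss = 0.5 * conc
q_cost = (miss / unc) ** 2
print(np.round(unc, 4))
print(np.round(q_cost, 2))   # the at-or-below-MDL points cost far less than the others
```

Because each residual is divided by its uncertainty before squaring, a poorly fit below-detection value barely changes Q, so it cannot pull the solution far.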

Factor contributions and profiles are derived by the PMF model minimizing the objective
function Q (Equation 1-2):
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ \frac{x_{ij} - \sum_{k=1}^{p} g_{ik} f_{kj}}{u_{ij}} \right]^{2} \qquad (1\text{-}2)$$

Q is a critical parameter for PMF, and two versions of Q are displayed for the model runs:

   •   Q(true) is the goodness-of-fit parameter calculated including all points.
   •   Q(robust) is the goodness-of-fit parameter calculated excluding points not fit by the
       model, defined as samples for which the uncertainty-scaled residual is greater than 4.

The difference between Q(true) and Q(robust) is a measure of the impact of data points with
high scaled residuals. These data points may be associated with peak impacts from sources
that are not consistently present during the sampling period. Conversely, if the uncertainties
are too high, Q(true) and Q(robust) will have similar values, because the residuals are scaled
by the uncertainty and few points exceed the threshold.
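The two versions of Q can be illustrated with a minimal NumPy sketch that follows the definitions above: Q(true) sums the squared uncertainty-scaled residuals of Equation 1-2 over all points, and Q(robust) drops points whose scaled residual exceeds 4. The function name `q_values` and the small data set, with one deliberately poorly fit "peak" value, are hypothetical.

```python
import numpy as np

def q_values(X, U, G, F, robust_threshold=4.0):
    """Return (Q_true, Q_robust) for data X, uncertainties U and a model G @ F.

    Q_robust follows the definition in the text: points whose
    uncertainty-scaled residual exceeds the threshold are excluded.
    """
    scaled = (X - G @ F) / U                    # uncertainty-scaled residuals
    q_true = float(np.sum(scaled ** 2))         # Equation 1-2 over all points
    keep = np.abs(scaled) <= robust_threshold   # points the model fits
    q_robust = float(np.sum(scaled[keep] ** 2))
    return q_true, q_robust

# Hypothetical 3 x 2 example with one poorly fit point (a source "peak")
G = np.array([[1.0], [2.0], [3.0]])
F = np.array([[1.0, 0.5]])
X = G @ F + np.array([[0.1, 0.0], [0.0, -0.1], [0.0, 2.0]])
U = np.full_like(X, 0.1)

q_true, q_robust = q_values(X, U, G, F)
print(q_true, q_robust)
```

Here the single peak (scaled residual of about 20) dominates Q(true), roughly 402, while Q(robust) stays near 2, which is the kind of gap described above.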

EPA PMF requires multiple runs of the underlying Multilinear Engine (ME) to help identify the
optimal factor contributions and profiles. This is due to the nature of the ME algorithm, which
starts the search for the factor profiles with a randomly generated factor profile. This factor
profile is systematically modified using the gradient approach to chart the optimal path to the
best-fit solution. In spatial terms, the model constructs a multidimensional space using the
observations and then traverses the space using the gradient approach to reach its final
destination: the best solution along this path. The best solution is typically identified by the
lowest Q(robust) value along the path (i.e., the minimum Q) and may be imagined as the bottom
of a trough in the multidimensional space. Because the starting point is random (it is determined
by the seed value, which in turn dictates the path), there is no guarantee that the gradient
approach will always lead to the deepest point in the multidimensional space (the global
minimum); it may instead find a local minimum. To maximize the chance of reaching the global
minimum, the model should be run 20 times when developing a solution and 100 times for a final
solution, each time with a different starting point.
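The multiple-start strategy can be sketched as follows. ME-2's own search is not reproduced here; as a plainly named stand-in, the code uses a weighted non-negative matrix factorization with multiplicative updates, which keeps G and F non-negative and lowers the weighted sum of squares in Equation 1-2. It scores runs by plain Q rather than Q(robust), and the data, uncertainties, iteration count, and function name `weighted_nmf` are hypothetical.

```python
import numpy as np

def weighted_nmf(X, U, p, seed, n_iter=500, eps=1e-12):
    """Weighted non-negative factorization X ~ G @ F by multiplicative updates.

    A stand-in for ME-2, not its algorithm: it keeps G and F non-negative
    and decreases Q = sum(((X - G @ F) / U) ** 2) from a seeded random start.
    """
    rng = np.random.default_rng(seed)          # the seed sets the starting point
    n, m = X.shape
    W = 1.0 / U ** 2                           # weights from the uncertainties
    G = rng.uniform(0.1, 1.0, size=(n, p))
    F = rng.uniform(0.1, 1.0, size=(p, m))
    for _ in range(n_iter):
        G *= ((W * X) @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ (W * X)) / (G.T @ (W * (G @ F)) + eps)
    q = float(np.sum(((X - G @ F) / U) ** 2))
    return G, F, q

# Hypothetical non-negative data built from p = 2 underlying factors
rng = np.random.default_rng(42)
X = rng.uniform(0, 3, (30, 2)) @ rng.uniform(0, 1, (2, 6)) + rng.uniform(0, 0.05, (30, 6))
U = 0.05 + 0.1 * X

runs = [weighted_nmf(X, U, p=2, seed=s) for s in range(20)]   # 20 random starts
qs = np.array([q for _, _, q in runs])
best = int(np.argmin(qs))                                      # lowest-Q run
print(f"best seed {best}: Q = {qs[best]:.2f}, range {qs.min():.2f}-{qs.max():.2f}")
```

A narrow Q range across seeds suggests a stable path to the minimum; as the next paragraph notes, it does not by itself show that the factor profiles agree between runs.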

Because Q(robust) is not influenced by points that are not fit by PMF, it is used as a critical
parameter for choosing the optimal run from the multiple runs. In addition, the variability of
Q(robust) indicates whether the initial base run results vary significantly because of the
random seed used to start the gradient algorithm in different locations. If the data provide a
stable path to the minimum, the Q(robust) values will vary little between runs. In other cases,
the combination of the starting point and the space defined by the data will affect the path to
the minimum, resulting in varying Q(robust) values; the lowest Q(robust) value is used by
default because it represents the optimal solution. Note, however, that a small variation in
Q values does not necessarily mean that the source compositions vary little between runs.

Variability due to chemical transformations or process changes can cause significant differences
in factor profiles among PMF runs. Two diagnostics are provided to evaluate the differences
between runs: intra-run residual analysis and a factor summary of the species distribution
compared with that of the lowest Q(robust) run. The user must evaluate all of the error
estimates in PMF to understand the stability of the model results; the algorithms and ME output
are described in Paatero et al. (2014). Variability in the PMF solution can be estimated using
three methods:


    1.  Bootstrap (BS) analysis is used to identify whether a small set of observations can
       disproportionately influence the solution. BS error intervals include effects from
       random errors and partially include effects of rotational ambiguity. Rotational ambiguity
       arises because there are infinitely many solutions that are similar in many ways to the
       solution generated by PMF. That is, for any pair of factor matrices, infinitely many
       variations of the pair can be generated by a simple rotation. With non-negative source
       contributions as the only constraint, this space of rotations cannot be fully restricted.
       BS errors are generally robust and are not influenced by the user-specified sample
       uncertainties.
    2.  Displacement (DISP) is an analysis method that helps the user understand the selected
       solution in finer detail, including its sensitivity to small changes. DISP error intervals
       include effects of rotational ambiguity but do not include effects of random errors in the
       data. Data uncertainty can directly affect DISP error estimates; hence, intervals for
       downweighted species are likely to be large.
    3.  BS-DISP (a hybrid approach) error intervals include effects of random errors and
       rotational ambiguity. BS-DISP results are more robust than DISP results because the DISP
       phase of BS-DISP does not displace as strongly as DISP by itself.
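The rotational ambiguity described in item 1 can be checked numerically. In the PMF literature a "rotation" means any invertible transformation T, because G F = (G T)(T⁻¹ F) reproduces exactly the same fitted values. The NumPy sketch below uses hypothetical matrices and a T chosen so that both transformed matrices remain non-negative, which shows why non-negativity alone cannot single out one solution.

```python
import numpy as np

# Hypothetical non-negative contributions G (3 samples x 2 factors) and profiles F
G = np.array([[1.0, 0.0],
              [0.5, 1.0],
              [2.0, 0.3]])
F = np.array([[1.0, 0.5, 0.2],
              [0.2, 1.0, 0.4]])

# An invertible transformation T that mixes a little of factor 1 into factor 2
T = np.array([[1.0, 0.2],
              [0.0, 1.0]])
G_rot = G @ T                      # transformed contributions
F_rot = np.linalg.inv(T) @ F       # transformed profiles

print(np.allclose(G @ F, G_rot @ F_rot))          # identical fit to the data
print((G_rot >= 0).all() and (F_rot >= 0).all())  # still non-negative
print(np.allclose(F, F_rot))                      # yet the profiles differ
```

Constraints, Fpeak, and the DISP-based error estimates are tools for exploring or limiting this family of equally good fits.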

Brown et al. (2014) apply these methods to three air pollution data sets and interpret the EPA
PMF error estimates based on those applications. Paatero et al. (2014) and Brown et al. (2014)
are key references for EPA PMF; both provide details on the error estimates and their
interpretation, which are covered only briefly in this guide.
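The resampling step behind BS error estimates can be sketched as a generic moving-block bootstrap of samples (rows). This is an assumption for illustration rather than a description of EPA PMF's internal implementation; the function name `block_bootstrap_indices`, the block length of 4, and the data are hypothetical. In a full analysis, PMF would be rerun on each resampled data set and the resulting factors mapped to the base-run factors.

```python
import numpy as np

def block_bootstrap_indices(n_samples, block_size, rng):
    """Draw sample (row) indices by resampling contiguous blocks with replacement.

    Blocks preserve short-range time correlation; the block size is a
    user choice, and the value used below is hypothetical.
    """
    n_blocks = int(np.ceil(n_samples / block_size))
    starts = rng.integers(0, n_samples - block_size + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_size) for s in starts])
    return idx[:n_samples]             # trim to the original number of samples

rng = np.random.default_rng(7)
X = np.arange(20 * 3, dtype=float).reshape(20, 3)   # hypothetical 20 samples x 3 species
idx = block_bootstrap_indices(len(X), block_size=4, rng=rng)
X_bs = X[idx]                          # one bootstrap data set; PMF would be rerun on it
print(idx)
```

Because some samples appear several times and others not at all, a factor that depends on a handful of observations will shift noticeably between bootstrap runs.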

1.2    Multilinear Engine

Two common programs solve the PMF problem as described above. Originally, the program
PMF2 (Paatero, 1997) was used. In PMF2, non-negativity constraints could be imposed on
factor elements and measurements could be weighted individually based on uncertainties when
determining the least squares fit. With these features, PMF2 was a significant improvement
over previous principal component analysis (PCA) techniques for receptor modeling of
environmental data. PMF2 was limited, however, in that it was designed to solve a very specific
PMF problem. In the late 1990s, the ME, a more flexible program, was developed (Paatero,
1999). This program, currently in its second version and referred to as ME-2, includes many of
the same features as PMF2 (for instance, the user is able to weight individual measurements
and provide non-negativity constraints); however, unlike PMF2, ME-2 is structured so that it can
be used to solve a variety of multilinear problems including bilinear, trilinear, and mixed models.

ME-2 solves the PMF problem in two separate steps. First, the user produces a table that
defines the PMF model of interest. Then an automated secondary program reads the tabulated
model parameters and computes the solution. When solving the PMF problem using EPA PMF, the
first step is achieved via an input file that is produced by the EPA PMF user interface. Once
the model has been specified, EPA PMF feeds the data and user specifications into the
secondary ME-2 program. ME-2 solves the PMF equation iteratively, minimizing the sum-of-squares
objective function, Q, over a series of steps as shown in Figure 1. A stable solution has been
reached when additional iterations to minimize Q provide diminishing returns. The search for the
solution goes from a coarser to a finer scale over three levels of iterations. The first level of
iterations identifies the overall region of the solution in space. In this level, the change in
Q (dQ) is required to be less than 0.1 over 20 consecutive steps, within fewer than 800 total
steps. The second level identifies the neighborhood of the final solution. Here, dQ is required
to be less than 0.005 over 50 consecutive steps within fewer than 2,000 total steps. The third
level converges to the best possible Q-values (Paatero, 2000a); here dQ should be less than
0.0003 over 100 consecutive steps within fewer than 5,000 total steps.

ME-2 typically requires a few hundred iterations for small data sets (fewer than 300 observations)
and up to 2,000 for larger data sets (Paatero, 2000a). If the requirements of any of the three
levels are not met, the solution is non-convergent (Paatero, 2000a).
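One way to read the three-level criteria is as a sliding-window test on the sequence of Q values. The sketch below is a simplified illustration under that assumption, not ME-2's code: it applies each level independently to one synthetic Q trace, whereas ME-2 proceeds through the levels in sequence, and its exact definition of dQ may differ. The names `LEVELS` and `level_converged` and the decaying Q trace are hypothetical.

```python
from typing import Optional, Sequence

# Thresholds quoted above: (max dQ, consecutive steps, step limit) per level
LEVELS = [(0.1, 20, 800), (0.005, 50, 2000), (0.0003, 100, 5000)]

def level_converged(q_history: Sequence[float], max_dq: float,
                    window: int, max_steps: int) -> Optional[int]:
    """Return the first step at which Q changed by less than max_dq over the
    preceding `window` steps, or None if that never happens before max_steps."""
    for t in range(window, min(len(q_history), max_steps)):
        if abs(q_history[t - window] - q_history[t]) < max_dq:
            return t
    return None

# Hypothetical Q trace that decays toward 100
q_trace = [100.0 + 50.0 * 0.95 ** t for t in range(6000)]
for max_dq, window, max_steps in LEVELS:
    step = level_converged(q_trace, max_dq, window, max_steps)
    print(max_dq, "non-convergent" if step is None else f"met at step {step}")
```

A `None` result at any level corresponds to the non-convergent case described above.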
         Figure 1. Conjugate Gradient Method - underpinnings of PMF solution search.
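Figure 1 depicts a conjugate-gradient search descending across the contours of Q. As a hedged illustration only, the sketch below applies the textbook linear conjugate-gradient method to a hypothetical two-variable quadratic bowl; ME-2 applies conjugate-gradient ideas to the much larger, non-quadratic PMF problem, so this is not ME-2's algorithm. The matrix `A`, vector `b`, and starting point are made up.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive-definite A.

    Returns the minimizer and the list of iterates, i.e., the path that a
    figure like Figure 1 draws across the contours of f.
    """
    x = np.asarray(x0, dtype=float)
    r = b - A @ x              # residual = negative gradient
    d = r.copy()               # first search direction: steepest descent
    path = [x.copy()]
    for _ in range(len(b)):    # at most n steps in exact arithmetic
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (d @ A @ d)     # exact line search along d
        x = x + alpha * d
        r_new = r - alpha * (A @ d)
        beta = (r_new @ r_new) / (r @ r)  # Fletcher-Reeves coefficient
        d = r_new + beta * d              # next direction, A-conjugate to the last
        r = r_new
        path.append(x.copy())
    return x, path

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # hypothetical elongated bowl
b = np.array([1.0, -1.0])
x_min, path = conjugate_gradient(A, b, x0=[-2.0, 2.0])
print(x_min, np.linalg.solve(A, b), len(path) - 1)
```

Unlike plain steepest descent, which zig-zags down a narrow trough, each new direction here is conjugate to the previous one, so a two-variable quadratic is minimized in two steps.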
Output from ME-2 is read by EPA PMF and then formatted for the user to interpret. In addition,
EPA PMF has three error estimation methods that are implemented through ME-2 and EPA PMF.

The differences between ME-2 and PMF2 model results have been examined in several studies
through the application of each model to the same data set and comparison of the results.
Overall, the studies showed similar results for the major components, but a greater uncertainty
in the PMF2 solution (Ramadan et al., 2003) and better source separation using ME-2 (Kim et
al., 2007). In two recent publications, the application of factor profile constraints by ME-2
resolved a larger number of sources (Amato et al., 2009; Amato and Hopke, 2012).


Version 5.0 of EPA PMF uses the most recent version of ME-2 and a PMF script file, which
were developed by Pentti Paatero at the University of Helsinki and Shelly Eberly at Geometric
Tools (March 3, 2014; me2gfP4_1345c4.exe and PMF_bs_6f8xx_sealed_GUI.ini).

1.3   Comparison to EPA PMF 3.0 and Other Methods

EPA PMF 5.0 adds two key components to EPA PMF 3.0: two additional error estimation
methods and source contribution and profile constraints. Many other changes make the
software easier to use, including the ability to read in multiple site data. The new error
estimation methods can take from an hour to half a day to run, depending on the number of
factors and BS runs, because of the large number of computations required for the robust error
estimates. The PMF Model Development Quality Assurance Project Plan provides details on the
QA steps used to develop EPA PMF 5.0 and a number of interim versions between versions 3.0 and
5.0. Version 4.2 was externally peer reviewed; the very useful comments from that review were
used to develop version 5.0 and improve the user guide.

Other comparable source apportionment models include Unmix and CMB. Although both
models share PMF's aims, they use different mechanisms. Unmix identifies the "edges" in the
data where the contribution from at least one factor is present only in negligible amounts. The
edges are then used to determine the profile compositions and the number of sources in the
data. Unmix does not allow individual weighting of data points, as PMF does. Although the
major factors resolved by PMF and Unmix are generally the same, Unmix does not always resolve
as many factors as PMF (Pekney et al., 2006c; Poirot et al., 2001).

With CMB, the user must provide source profiles that the model uses to apportion mass. PMF
and CMB have been compared in several studies. Rizzo and Scheff (2007a) compared the
magnitude of source contributions resolved by each model and examined correlations between
PMF- and CMB-resolved contributions. They found the major factors correlated well and were
similar in magnitude; additionally, the PMF-resolved source profiles were generally similar to
measured source profiles. In supplementary work, Rizzo and Scheff (2007b) used information
from CMB PM source profiles to influence PMF results and used CMB results to help control
rotations in PMF. Jaeckels et al. (2007) used organic molecular markers with elemental carbon
(EC) and organic carbon (OC) in both CMB and PMF. Good correlations were found for most
factors, with some biases present in a few of the factors. They also found an additional PMF
factor that did not correspond to any CMB factors.
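The contrast with CMB can be made concrete: when the source profiles are known in advance, apportioning one sample reduces to a small non-negative, uncertainty-weighted least-squares problem. The sketch below uses SciPy's `scipy.optimize.nnls` with made-up profiles, contributions, and uncertainties; it is a simplification and not EPA CMB's effective-variance solution.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical known source profiles (2 sources x 4 species), as CMB requires
F = np.array([[0.50, 0.30, 0.15, 0.05],
              [0.05, 0.10, 0.35, 0.50]])
true_g = np.array([4.0, 2.0])                  # hypothetical contributions
u = np.array([0.10, 0.10, 0.20, 0.20])         # hypothetical measurement uncertainties
x = true_g @ F + np.array([0.02, -0.01, 0.03, -0.02])   # one measured sample

# Weighted non-negative least squares: minimize sum(((x - g @ F) / u) ** 2), g >= 0
g_hat, resid_norm = nnls((F / u).T, x / u)
print(np.round(g_hat, 3))
```

PMF, by contrast, must estimate F together with G from many samples, which is why its factor profiles need interpretation and its solutions carry rotational ambiguity.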

The models discussed above are complementary and, whenever possible, should be used
along with PMF to make source apportionment results more robust. In addition, statistical
receptor modeling methods have been developed by William F. Christensen at Brigham Young
University and other researchers.

2.     Uses of PMF

PMF has been applied to a wide range of data, including 24-hr speciated PM2.5, size-resolved
aerosol, deposition, air toxics, high time resolution measurements such as those from aerosol
mass spectrometers (AMS), and volatile organic compound (VOC) data. The References
section (Section 9) provides numerous references to applications of PMF. Additional
discussion of uses of PMF is available in the Multivariate Receptor Modeling Workbook (Brown
et al., 2007). Users are encouraged to read the papers that are relevant to their data as well as
source profile measurement papers. The approaches used for PMF analyses have changed
over the years as options such as constraints have been made available. Key references are
summarized in Table 1.
                            Table 1.  Summary of key references.

Brinkman, G.; Vance, G.; Hannigan, M.P.; Milford, J.B. (2006). Use of synthetic data to evaluate
positive matrix factorization as a source apportionment tool for PM2.5 exposure data. Environ.
Sci. Technol., 40(6): 1892-1901.
   Key points:
   •   Uses the coefficient of determination (R²) and normalized gross error (NGE) for the
       source contribution comparisons and the root mean squared error (RMSE) for source
       profile comparisons.
   •   R² measures the fraction of the variance in the actual source contributions that is
       explained by the estimated contributions.
   •   The NGE and RMSE are measures of the accuracy of the source contribution or profile
       estimate.
   •   The RMSE was chosen for the profile comparisons to place the greatest weight on
       compounds present in the largest fractions, which are most important for source
       apportionment purposes, where total mass apportionment is the goal.

Chen, L.-W.A.; Lowenthal, D.H.; Watson, J.G.; Koracin, D.; Kumar, N.; Knipping, E.M.; Wheeler,
N.; Craig, K.; Reid, S. (2010). Toward effective source apportionment using positive matrix
factorization: Experiments with simulated PM2.5 data. J. Air Waste Manage. Assoc., 60(1):
43-54.
   Key points:
   •   Uses a metric to measure the difference between known source profiles and PMF-provided
       contributions. Uses a minimization technique to find the correct set of parameter values
       that most closely matches the true source profiles with the predicted source profiles.
   •   Offers little on using the source profile uncertainties from the model output.

        Reference
 Christensen, W.F.; Schauer,
 J.J. (2008). Impact of species
 uncertainty perturbation on
 the solution stability of
 positive matrix factorization
 of atmospheric particulate
 matter data. Environ. Sci.
 Technol., 42(16): 6015-
 6021.
                        Key Points
A perturbed uncertainty matrix is created by multiplying each original
uncertainty value by a random multiplier generated from a log-normal
distribution with a mean of 1 and a standard deviation (and CV)
equal to 0.25, 0.50, or 0.75. The average values for the measure of
relative error for the three scenarios are 8%, 14%, and 17%,
respectively.
Relative errors associated with day-today estimates of source
contributions can be more than double the size of the relative errors
associated with estimates of average source contributions, with
errors for four of 10 source contributions exceeding 30% for the
largest-perturbation scenario.
The stability of source profile estimates in the simulation varies
greatly between sources, with a mean correlation between perturbed
gasoline exhaust profiles and the true profile equal to only 59% for
the largest-perturbation scenario.
 Hemann, J.G.; Brinkman,
 G.L.; Dutton, S.J.; Hannigan,
 M.P.; Milford, J.B.; Miller,
 S.L. (2009). Assessing
 positive matrix factorization
 model fit:  a new method to
 estimate uncertainty and bias
 in factor contributions  at the
 measurement time scale.
 Atmos. Chem. Phys., 9(2):
 497-513.
A novel method was developed to estimate model fit uncertainty and
bias at the daily time scale, as related to factor contributions. A
circular block BS is used to create replicate data sets, with the same
receptor model then fit to the data.
Neural networks are trained to classify factors based upon chemical
profiles, as opposed to correlating contribution time series, and this
classification is used to align factor orderings across the model
results associated with the replicate data sets.
The results indicate that variability in factor contribution estimates
does not necessarily encompass model error: contribution estimates
can have small associated variability across results yet also be very
biased.
 Henry, R.C.; Christensen, E.R. (2010). Selecting an appropriate multivariate source apportionment model result. Environ. Sci. Technol., 44(7): 2474-2481.
Source apportionment results favor Unmix when edges in the data
are well-defined and PMF when several zeros are present in the
loading and score matrices. Because both models have
potential weaknesses, both should be applied in all cases.
Recommends that the EPA-approved versions of PMF and Unmix
both be applied to environmental data sets.  If the two produce very
similar results, one has added confidence because
two independent methods of analysis support each other.  If the PMF
and Unmix results differ, examine the estimated source
compositions: if these have many zeros, the PMF result should be
preferred, but only if the Unmix diagnostic edge plots show that one
or more of the edges are not clearly defined by the data.

 Kim, E.; Hopke, P.K. (2007a). Comparison between sample-species specific uncertainties and estimated uncertainties for the source apportionment of the speciation trends network data. Atmos. Environ., 41(3): 567-575.
   The objective of this study is to compare the use of the estimated
   fractional uncertainties (EFU) for the source apportionment of PM2.5
   (particulate matter less than 2.5 µm in aerodynamic diameter)
   measured at the Speciation Trends Network (STN) monitoring sites
   with the results obtained using SSU (standard STN uncertainties).
   Source apportionment of STN PM2.5 data was therefore performed,
   and source contributions were estimated by applying
   PMF at two selected STN sites, Elizabeth, NJ and Baltimore, MD,
   with both SSU and EFU for the elements measured by X-ray
   fluorescence. The PMF-resolved factor profiles and contributions
   using EFU were similar to those using SSU at both monitoring sites.
   The comparisons of normalized concentrations indicated that the
   STN SSU were not well estimated. This study supports the use of
   EFU for the STN samples to provide a useful error structure for
   source apportionment studies of STN data.
   Implies a flaw in the uncertainties associated with STN data. Promotes
   EFU over SSU.
 Latella, A.; Stani, G.; Cobelli, L.; Duane, M.; Junninen, H.; Astorga, C.; Larsen, B.R. (2005). Semicontinuous GC analysis and receptor modelling for source apportionment of ozone precursor hydrocarbons in Bresso, Milan, 2003. J. Chromatogr. A, 1071(1-2): 29-39.
•  A new approach is presented, by which the input uncertainty is
   allowed to float as a function of the photochemical reactivity of the
   atmosphere and the stability of each individual compound.
 Lowenthal, D.H.; Rahn, K.A. (1988). Tests of regional elemental tracers of pollution aerosols. 2. Sensitivity of signatures and apportionments to variations in operating parameters. Atmos. Environ., 22: 420-426.
•  Straightforward use of PMF and Unmix along with HYSPLIT to
   confirm results using synthetic data.

 Miller, S.L.; Anderson, M.J.; Daly, E.P.; Milford, J.B. (2002). Source apportionment of exposures to volatile organic compounds. I. Evaluation of receptor models using simulated exposure data. Atmos. Environ., 36(22): 3629-3641.
   Four receptor-oriented source apportionment models were evaluated
   by applying them to simulated personal exposure data for select
   VOCs that were generated by Monte Carlo sampling from known
   source contributions and profiles.  The exposure sources modeled
   are environmental tobacco smoke, paint emissions, cleaning and/or
   pesticide products, gasoline vapors, automobile exhaust, and
   wastewater treatment plant emissions. The receptor models
   analyzed are CMB, PCA/absolute principal component scores, PMF,
   and graphical ratio analysis for composition estimates/source
   apportionment by factors with explicit restriction, incorporated in the
   UNMIX model.
   All models identified only the major contributors to total exposure
   concentrations. PMF extracted factor profiles that most closely
   represented the major sources used to generate the simulated data.
   None of the models were able to distinguish between sources with
   similar chemical profiles. Sources that contributed less than 5% to the
   average total VOC exposure were not identified.
 Reff, A.; Eberly, S.I.; Bhave, P.V. (2007). Receptor modeling of ambient particulate matter data using positive matrix factorization: Review of existing methods. J. Air Waste Manage. Assoc., 57(2): 146-154.
   Guidance for the application and use of PMF.
 Shi, G.L.; Li, X.; Feng, Y.C.; Wang, Y.Q.; Wu, J.H.; Li, J.; Zhu, T. (2009). Combined source apportionment, using positive matrix factorization-chemical mass balance and principal component analysis/multiple linear regression-chemical mass balance models. Atmos. Environ., 43(18): 2929-2937.
•  A straightforward application of PMF and PCA/MLR-CMB that deals
   with collinear sources and other real data issues.
 Yuan, B.; Shao, M.; de Gouw, J.; Parrish, D.D.; Lu, S.; Wang, M.; Zeng, L.; Zhang, Q.; Song, Y.; Zhang, J.; Hu, M. (2012). Volatile organic compounds (VOCs) in urban air: How chemistry affects the interpretation of positive matrix factorization (PMF) analysis. J. Geophys. Res., 117.
•  Impact of VOC atmospheric reactivity on PMF results.  VOCs were
   measured online at an urban site in Beijing
   in August-September 2010.

 Zhang, Y.X.; Sheesley, R.J.; Bae, M.S.; Schauer, J.J. (2009). Sensitivity of a molecular marker based positive matrix factorization model to the number of receptor observations. Atmos. Environ., 43(32): 4951-4958.
•  Impact of the number of observations on molecular marker-based
   positive matrix factorization (MM-PMF) source apportionment
   models; daily PM2.5 samples were collected in East St. Louis, IL,
   from April 2002 through May 2003.
PMF requires a data set consisting of a suite of parameters measured across multiple samples.
For example, PMF is often applied to speciated PM2.5 data sets with 10 to 20 species and more than
100 samples. An uncertainty data set that assigns an uncertainty value to each species and
sample is also needed. The uncertainty data set is calculated using propagated uncertainties
or other available information, such as collocated sampling precision.

3.    Installing EPA PMF 5.0

EPA PMF 5.0 can be obtained from EPA by e-mailing NERL_RM_Support@epa.gov. To install
the program, run EPA PMF 5.0 Setup.exe and follow the installation directions on the screen.
The installation program creates an EPA PMF subfolder in the Program Files folder for the
software and an EPA PMF subfolder in the Documents folder for data files.  Installation
problems and software error messages should be reported to Gary Morris at
RM_Support@epa.gov.

EPA PMF 5.0 can be run on a personal computer using the Windows XP or Windows 7
operating system or higher.  Users will need permission to write to the computer's C:\
drive in order to install and run EPA PMF; this may not be the default setting for some users.
After installation, EPA PMF can be started by double-clicking the EPA PMF 5.0 icon on the desktop.

4.     Global Features

The user can access the following features throughout EPA PMF 5.0:

•  Sorting data.  Columns in tables can be sorted by left-clicking the mouse button on a
   column heading. Clicking once will sort the items in ascending order and clicking twice will
   sort the items in descending order. If a column has been sorted, an arrow will appear in the
   header indicating the direction in which it is sorted.
•  Saving graphics.  All graphical output can be saved in a variety of formats by right-clicking
   on an image.  Available formats are .gif, .bmp, .png, and .tiff. In the same menu, the user
   can choose to copy or print a graphic.  A stacked graph option is also available to combine
   profiles or time series on one page. When "copy" is selected, the graphic is copied to the
   clipboard. When "print" is selected, the graphic will automatically be sent to the local
   machine's default printer.  When saving a graphic, a dialog box appears so that the user can
   change the file path and file name of the output file.
•  Undocking graphs. Any graph can be opened in a new window by right-clicking on the
   graph and selecting Floating Window.  The user can open as many windows as required.
   However, the graphs in the floating windows do not update when model parameters and
   output are changed.
•  Resizing sections within tabs.  Many tabs have multiple sections separated by a gray line
   (Figure 2; red arrows point to the gray bars that enable the user to adjust height and width).
   These sections can be resized by clicking on the gray line and dragging it to the desired
   location.
•  Indicating selected data points. When the user moves the cursor over a point on a scatter
   plot or time series graph, the point is outlined with a dashed-line square, indicating the point
   to which the information in the status bar refers.
•  Using arrow keys on lists and tables. After selecting (by clicking on or tabbing to) a list or
   table, the keyboard arrow keys can be used to change the selected row.
•  Accessing help files. The lower-left corner of most screens has a "Help" shortcut that
   provides users access to a help file associated with the main functions in the current screen.
•  Using the status bar. Most screens have a status bar across the bottom of the window that
   provides additional information to the user. This information changes based on the tab
   selected. Individual tab details are discussed in subsequent sections of this guide. An
   example of the status bar on the Concentration Scatter Plot screen is shown at the bottom
   of Figure 2.
[Screenshot: Concentration Scatter Plot screen with Y-axis and X-axis species selection lists; the status bar reads "07/02/02 00:00  PM2.5 = 49.50000  Sulfate = 23.50000  y = 1.76839X + 7.02668"]
                            Figure 2.  Example of resizable sections and status bar.

5.     Getting Started

Each time the EPA PMF 5.0 program is started, a splash screen with information about the
development of the software and various copyrights is displayed. The user must click the OK
button or press the spacebar or Enter key to continue.

The first EPA PMF window is Data Files under the Model Data tab, as shown in Figure 3. On
this screen, the user can provide file location information and make required choices that will be
used in running the model. This screen has three sections:  Input Files (Figure 3, 1), Output
Files (Figure 3, 2), and Configuration File (Figure 3, 3), each of which is described in detail
below.  EPA PMF 5.0 can read data from multiple sites; time series plots of species
concentrations or source contributions are displayed in the same order as the user-provided
data, and PMF displays a vertical line separating the sites.

The status bar at the bottom of the Data Files screen indicates which section of the program has
been completed.  Prior to any user input on the Data Files screen, the status bar displays "NO
Concentration Data, NO Uncertainty Data, NO Base Results, NO Bootstrap Results, NO BS-DISP
Results, and NO DISP Results" in red. When a task is completed, "NO" is replaced with
"Have" and the text color changes to green.  In the Figure 3 example, concentration and
uncertainty files have been provided to the program, so the first two items on the status bar are
green.  Base runs, BS runs, BS-DISP runs, and DISP runs have not been completed, so the last
four items are red.  The Baltimore PM files (Dataset_Baltimore_con.txt and
Dataset_Baltimore_unc.txt) are part of the installation package and can be found in the
"C:\Documents\EPA PMF\Data" folder if the user installed the model using the default
installation settings.

5.1    Input Files

Two input files are required by PMF: (1) sample species concentration values and (2) sample
species uncertainty values or parameters for calculating uncertainty.  EPA PMF accepts
tab-delimited (.txt), comma-separated value (.csv), and Excel Workbook (.xls or .xlsx) files.  Each
file can be loaded either by typing the path into the "data file" input boxes or by browsing to the
appropriate file. If the file includes more than one worksheet or named range, the user will be
asked to select the one to use. The concentration file has the species as columns
and dates or sample numbers as rows, with headers for each (Figure 4).  All standard date and
time conventions are accepted, and they are listed in the Date Format pull-down list.  Four
input options are accepted: (1) with Sample ID only, (2) with Date/Time only, (3) with
both Sample ID and Date/Time, and (4) with no IDs or Date/Time.  Units can be included as a
second heading row in the concentration file but are not required; units are not included in
the uncertainty file. If units are supplied by the user, they will be used by the graphical user
interface (GUI) for axis labels only and will not be used by the model.  Blank cells are not
accepted (the user will be prompted to examine the data and try again), and species names
cannot contain commas. If values less than -999 are found in the data set, the program will give a
warning message but will continue.  If these values are not real or are missing value indicators,
the user should modify the data file outside the program and reload the data sets. Also, each
species name must be unique. The user must specify the Date/Time and ID/Site
columns if they are included in the input data sets.  The basic PMF functions are demonstrated
using single-site data, and a multiple-site example is shown in Section 8.1.  Multiple-site data
should be sorted by Site and Date/Time before loading into PMF.  Lines delineating Sample
IDs will not be displayed if a missing value is at the transition between Sample IDs and the option
"exclude missing samples" is selected; missing transition samples should be removed or the
option "replace missing samples with the species median" selected.
[Figure 3 screenshot: Model Data - Data Files screen showing the Input Files section (concentration and uncertainty data files, Date Format, Date/Time column, Missing Value Indicator, and the Exclude Entire Sample / Replace Missing Values with Species Median options), the Output Folder with Output File Prefix "Balt", the Output File Type options (.txt, .csv, .xls, .xlsx), and the Output Only Selected Run check box]
sample numbers are the same between the concentration and uncertainty files and the program
will not allow the data to be evaluated if there is a mismatch.  If the headers are different due to
naming conventions but actually have the same order, the user can proceed to the next step. If
not, the user should correct the problem outside the GUI and reload the files. Negative values
and zero are not permitted as uncertainties; EPA PMF will provide an error message and the
user will have to remove these values outside EPA PMF and reload the uncertainty file.
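The input-file rules described above (unique, comma-free species names; no blank cells; a warning for values below -999; matching samples between files; strictly positive uncertainties) can be checked before loading data into EPA PMF. The following is a minimal Python sketch under stated assumptions, not EPA PMF's own code. The function name `check_pmf_inputs`, the in-memory layout (a list of species names plus rows of sample ID and values, with a blank cell represented as None), and the messages are all hypothetical. Species header text is compared only by count, because the guide allows headers that differ in naming but keep the same order. The concentrations in the usage lines come from Figure 4; the uncertainties are illustrative values only.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (sample ID or date string, values in species order);
# a blank cell is represented as None.
Row = Tuple[str, List[Optional[float]]]


def check_pmf_inputs(species: List[str], conc_rows: List[Row],
                     unc_species: List[str], unc_rows: List[Row]):
    """Return (errors, warns) for the input-file rules described in the guide."""
    errors: List[str] = []
    warns: List[str] = []

    # Species names must be unique and must not contain commas.
    if len(set(species)) != len(species):
        errors.append("species names must be unique")
    errors.extend(f"species name contains a comma: {s!r}" for s in species if "," in s)

    # Headers may differ in naming, but the species count and samples must match.
    if len(unc_species) != len(species):
        errors.append("concentration and uncertainty files have different species counts")
    if [r[0] for r in conc_rows] != [r[0] for r in unc_rows]:
        errors.append("sample IDs/dates do not match between files")

    for sample_id, values in conc_rows:
        for name, v in zip(species, values):
            if v is None:
                errors.append(f"blank concentration: {sample_id}/{name}")
            elif v < -999:
                # EPA PMF warns but continues in this case.
                warns.append(f"value below -999: {sample_id}/{name}")

    for sample_id, values in unc_rows:
        for name, v in zip(unc_species, values):
            if v is None:
                errors.append(f"blank uncertainty: {sample_id}/{name}")
            elif v <= 0:
                # Zero and negative uncertainties are not permitted.
                errors.append(f"non-positive uncertainty: {sample_id}/{name}")

    return errors, warns


species = ["Aluminum", "Iron"]
conc = [("2/9/2000", [0.0201, 0.1497]), ("2/15/2000", [0.0057, 0.0673])]
unc = [("2/9/2000", [0.0030, 0.0100]), ("2/15/2000", [0.0020, 0.0])]
errors, warns = check_pmf_inputs(species, conc, species, unc)
print(errors)  # ['non-positive uncertainty: 2/15/2000/Iron']
print(warns)   # []
```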
DATE      Aluminum  Ammonium  Bromine   Calcium   Chlorine  Copper    EC        Iron      Lead      Manganese Nickel    Nitrate   OC
          ug/m3     ug/m3     ug/m3     ug/m3     ug/m3     ug/m3     ug/m3     ug/m3     ug/m3     ug/m3     ug/m3     ug/m3     ug/m3
2/9/2000  0.0201    3.6020    0.0107    0.0676    0.0647    0.0059    3.1230    0.1497    0.0157    0.0043    0.0577    5.3700    7.3930
2/15/2000 0.0057    1.3740    0.0006    0.0325    0.0016    0.0019    1.0710    0.0673    0.0055    0.0004    0.0285    0.8785    3.3310
2/27/2000 0.0029    2.1860    0.0028    0.0422    0.0288    0.0028    0.6732    0.0727    0.0073    0.0002    0.0215    3.8820    5.2030
3/4/2000  0.0011    0.4501    0.0014    0.0329    0.0024    0.0010    0.5503    0.0483    0.0061    0.0004    0.0188    0.4562    3.6160
3/10/2000 0.0075    0.3099    0.0006    0.0247    0.0039    0.0003    0.2869    0.0565    0.0032    0.0016    0.0083    0.6763    2.8140
3/22/2000 0.0006    1.1570    0.0033    0.0265    0.0015    0.0029    0.9487    0.0821    0.0044    0.0012    0.0107    1.0670    2.4150
4/6/2000  0.0256    1.3520    0.0025    0.0863    0.0026    0.0041    2.1990    0.1492    0.0089    0.0034    0.0254    1.4660    4.7350
4/9/2000  0.0165    0.2800    0.0011    0.0263    0.0016    0.0003    0.8535    0.0396    0.0017    0.0019    0.0257    0.2515    1.6760
4/12/2000 0.0108    1.1290    0.0026    0.0304    0.0080    0.0046    0.9983    0.0959    0.0042    0.0001    0.0344    1.1900    2.6360
4/15/2000 0.0065    1.5640    0.0037    0.1075    0.0296    0.0059    3.1430    0.1976    0.0110    0.0026    0.0437    4.3040    6.9460
4/18/2000 0.0072    0.1983    0.0028    0.0351    0.0073    0.0017    0.6603    0.0539    0.0004    0.0027    0.0082    0.6816    1.9990
4/21/2000 0.0092    0.1432    0.0022    0.0250    0.0042    0.0023    0.7096    0.0765    0.0003    0.0009    0.0126    0.6017    1.7230
4/24/2000 0.0289    0.4066    0.0000    0.0337    0.0007    0.0006    1.1100    0.0830    0.0067    0.0005    0.0256    0.2174    2.4420
4/27/2000 0.0033    1.5030    0.0031    0.0329    0.0010    0.0024    1.4970    0.0840    0.0082    0.0013    0.0247    3.3670    3.5360
4/30/2000 0.0120    0.5734    0.0021    0.0442    0.0097    0.0022    0.6726    0.0741    0.0025    0.0041    0.0153    0.5117    3.3610
5/3/2000  0.0098    1.3200    0.0014    0.0365    0.0039    0.0015    1.1210    0.0735    0.0077    0.0000    0.0056    1.3380    4.2670
5/12/2000 0.0209    0.1049    0.0013    0.0394    0.0003    0.0033    1.2070    0.1108    0.0046    0.0000    0.0114    0.6438    3.8460
5/15/2000 0.0096    1.1600    0.0010    0.0337    0.0023    0.0002    0.8730    0.0902    0.0064    0.0004    0.0167    0.3547    3.1960
5/18/2000 0.0348    2.9630    0.0037    0.1088    0.0083    0.0066    1.9910    0.1519    0.0054    0.0031    0.0166    3.3450    6.1610
5/21/2000 0.0008    1.9910    0.0014    0.0409    0.0011    0.0025    0.4828    0.0449    0.0038    0.0018    0.0099    2.0890    2.5760
               Figure 4. Example of formatting of the Input Concentration file.


The equation-based uncertainty file provides species-specific parameters that EPA PMF 5.0
uses to calculate uncertainties for each sample.  The first row of this delimited file should
contain the species names (Figure 5). The next row should contain the species-specific method
detection limit (MDL), followed by a row with the species-specific percent uncertainty.  Zeroes and
negatives are not permitted for either the detection limit or the percent uncertainty. If the
concentration is less than or equal to the MDL provided, the uncertainty (Unc) is calculated
as a fixed fraction of the MDL (Equation 5-1; Polissar et al., 1998).

Aluminum  Ammonium  Arsenic   Barium    Bromine   Calcium   Chlorine
0.00419   0.0125    0.00098   0.0068    0.0016    0.0038    0.002635
10        10        10        10        10        10        10
                  Figure 5. Example of an equation-based uncertainty file.
$$\mathrm{Unc} = \frac{5}{6} \times \mathrm{MDL} \tag{5-1}$$

If the concentration is greater than the MDL provided, the calculation is based on a user-provided
fraction of the concentration and MDL (Equation 5-2).

$$\mathrm{Unc} = \sqrt{(\text{Error Fraction} \times \text{concentration})^2 + (0.5 \times \mathrm{MDL})^2} \tag{5-2}$$

A sample equation-based uncertainty file (Dataset-Baltimore_unc_eqn) has been provided in
the C:\Documents\EPA PMF\Data folder. The equation-based uncertainty is useful if only the
MDL and error percent are available; however, this approach will not capture errors associated
with the specific samples.  Because of this simplification, the uncertainties calculated by the
equation-based method do not match those in Dataset_Baltimore_unc.txt.
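Equations 5-1 and 5-2 can be combined into one small function. This is a Python sketch, not EPA PMF's implementation. The function name `equation_based_uncertainty` is hypothetical, and the sketch assumes that the species-specific value in the uncertainty file (10 in Figure 5) is a percentage, which is divided by 100 to obtain the Error Fraction. The example applies the Aluminum parameters from Figure 5 to two Aluminum concentrations from Figure 4.

```python
import math


def equation_based_uncertainty(conc: float, mdl: float, percent_unc: float) -> float:
    """Uncertainty for one sample/species following Equations 5-1 and 5-2.

    percent_unc is assumed to be a percentage (e.g. 10 in Figure 5) and is
    converted to an error fraction here.
    """
    if mdl <= 0 or percent_unc <= 0:
        # Zero and negative MDLs or percent uncertainties are not permitted.
        raise ValueError("MDL and percent uncertainty must be positive")
    if conc <= mdl:
        # Equation 5-1: at or below the MDL, use a fixed fraction (5/6) of the MDL.
        return 5.0 / 6.0 * mdl
    # Equation 5-2: above the MDL, combine the error fraction and half the MDL.
    error_fraction = percent_unc / 100.0
    return math.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)


# Aluminum MDL (0.00419) and percent uncertainty (10) from Figure 5,
# applied to two Aluminum concentrations from Figure 4.
for c in (0.0029, 0.0201):
    print(c, round(equation_based_uncertainty(c, 0.00419, 10), 6))
```

The two equations do not meet at the MDL. Just below it, Unc is 5/6 of the MDL. Just above it, with a 10% error fraction, Unc is about 0.51 times the MDL. The calculated uncertainty therefore drops as a concentration crosses the MDL.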

Users can specify a Missing Value Indicator (which can be any numeric value) in the Input Files
box on the Data Files screen.  The user should not choose a numeric indicator that could
potentially be a real concentration. For example, if the user specifies "-999" as the missing
value indicator and chooses to replace missing values with the species median, the program will find all
instances of "-999" in the data file and replace them with the species-specific median.  The
program will also replace all associated uncertainty values with a high uncertainty of four times
the species-specific median.  If all samples of a species are missing, that species is
automatically categorized as "bad" and excluded from further analysis.  The missing value
indicator is used in the output files.
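The median-replacement option can be sketched as follows. This is an illustrative Python sketch, not EPA PMF's code. The function name `replace_missing_with_median` and the dictionary layout (species name mapped to values in sample order) are hypothetical. The guide does not say how the median is computed, so the sketch assumes it uses only the species' non-missing values. A species with no valid samples is returned in a "bad" list and dropped, as described above.

```python
from statistics import median
from typing import Dict, List, Tuple

Table = Dict[str, List[float]]  # hypothetical: species name -> values in sample order


def replace_missing_with_median(conc: Table, unc: Table,
                                indicator: float = -999.0) -> Tuple[Table, Table, List[str]]:
    """Replace missing concentrations with the species median and set their
    uncertainties to four times that median; species with no valid samples
    are reported as "bad" and removed."""
    new_conc: Table = {}
    new_unc: Table = {}
    bad: List[str] = []
    for species, values in conc.items():
        valid = [v for v in values if v != indicator]
        if not valid:
            bad.append(species)  # all samples missing -> categorized as "bad"
            continue
        med = median(valid)  # assumption: median of non-missing values only
        new_conc[species] = [med if v == indicator else v for v in values]
        new_unc[species] = [4 * med if v == indicator else u
                            for v, u in zip(values, unc[species])]
    return new_conc, new_unc, bad


# Illustrative input: one Lead value missing; every Nickel value missing.
conc = {"Lead": [0.0157, -999.0, 0.0073], "Nickel": [-999.0, -999.0, -999.0]}
unc = {"Lead": [0.0020, 0.0010, 0.0010], "Nickel": [0.0100, 0.0100, 0.0100]}
new_conc, new_unc, bad = replace_missing_with_median(conc, unc)
print(new_conc, new_unc, bad)
```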

If a message is displayed that the dates/times do not match in the concentration and uncertainty
files, the user needs to check the file dates/times and reload the data before being able to
evaluate the data in PMF.  If the dates/times in both files are the same, try saving both the
concentration and uncertainty files in a different format, such as .csv or .txt.

5.2    Output Files

The user can specify the output directory ("Output Folder"), choose the EPA PMF output file
types ("Output File Type" radio buttons), and define a prefix for output files ("Output File Prefix").
The prefix is added to the beginning of each file name; for the example in Figure 3, the profiles
will be saved as Balt_profile.xls.  For the examples in the User Guide, the prefix is shown as an
asterisk (*).  The "Output File Type" options are tab-delimited text (.txt), comma-separated value
(.csv), and Excel Workbook (.xls).  The "Output File Prefix" can contain any letters and/or numbers;
other characters, such as "-" and "_", are not allowed.  If this prefix is not changed when a new run
is initiated, a warning will be displayed.  If Excel Workbook output is selected, two output files are
automatically created by EPA PMF during base runs and saved in the output folder selected by the
user (e.g., My Documents\EPA PMF\Output): *_base.xls and *_diagnostics.xls.
Each file has tabs with the PMF results.

   •   *_base.xls  - Profiles, Contributions, Residual, Run Comparison
   •   *_diagnostics.xls - Summary, Input, Base Runs
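The prefix rule and the base-run Excel file names can be checked before a run starts. This hypothetical Python helper is not part of EPA PMF. It assumes that "letters" means ASCII letters, and it covers only the Excel case, because delimited output splits the Base Runs information into separate files.

```python
import re
from typing import List

# Letters and digits only; "-" and "_" are not allowed (ASCII letters assumed).
_PREFIX_PATTERN = re.compile(r"[A-Za-z0-9]+")


def excel_base_run_outputs(prefix: str, extension: str = "xls") -> List[str]:
    """Return the two Excel files written after base runs for a given prefix."""
    if not _PREFIX_PATTERN.fullmatch(prefix):
        raise ValueError(f"invalid Output File Prefix: {prefix!r}")
    return [f"{prefix}_base.{extension}", f"{prefix}_diagnostics.{extension}"]


print(excel_base_run_outputs("Balt"))  # ['Balt_base.xls', 'Balt_diagnostics.xls']
```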

If a delimited output is selected, the information in the Base Runs tab is provided as separate
files and the diagnostics tab information is combined into one file. The following list provides the
details on the data that are saved in the Excel output files.

Additional files are created and saved after conducting bootstrapping (*_profile_boot), DISP
(*_DISPres1, *_DISPres2, *_DISPres3, *_DISPres4), BS-DISP (*_BSDISP1, *_BSDISP2,
*_BSDISP3, *_BSDISP4), Fpeak (*_fpeak), and/or constrained model runs (*_Constrained).
There is one DISP and one BS-DISP output file for each dQmax; the runs using the lowest
dQmax are used in the summary graphics and in the summary output file.  The file
*_ErrorEstimationSummary provides a summary of the base run and of the error estimations that
have been done using BS, DISP, and BS-DISP.  The file *_profile_boot contains the number of
BS runs mapped to each base run, each BS profile that was mapped to the base profile, and all
bootstrapping statistics generated by the GUI.  The file *_fpeak contains the profiles and
contributions of each Fpeak run.  When multiple base model runs are completed, by default
only the run with the lowest Q(robust) value is saved to the output, but the user may opt to
include all runs in the output by unselecting "Output Only Selected Run."
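The default output rule for multiple base runs can be sketched as follows. This Python sketch is an illustration only. The function name `runs_for_output` and the dictionary keys "run" and "q_robust" are hypothetical, and the Q(robust) values in the example are made up rather than taken from real PMF output.

```python
from typing import Dict, List


def runs_for_output(runs: List[Dict], output_only_selected: bool = True) -> List[Dict]:
    """Return the base runs to write: by default only the run with the lowest
    Q(robust); every run when "Output Only Selected Run" is cleared."""
    if not runs:
        return []
    if not output_only_selected:
        return list(runs)
    return [min(runs, key=lambda r: r["q_robust"])]


# Illustrative (made-up) Q(robust) values for three base runs.
base_runs = [
    {"run": 1, "q_robust": 5120.4},
    {"run": 2, "q_robust": 5098.7},
    {"run": 3, "q_robust": 5133.9},
]
print(runs_for_output(base_runs)[0]["run"])  # 2
```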

5.3   Configuration Files

EPA PMF provides the option of saving run preferences and input parameters in a configuration
file. The user must provide a name for a configuration file on the Input File Screen to create a
configuration file. Information saved in the configuration file includes specifications from the Data
Files screen (e.g., input files, output file location, and output file type), species categorizations
from the Concentration/Uncertainty screen, and all run specifications from the Base Model Runs
screen, Fpeak Rotation screen, and Constrained Model Runs screen.  Model output is not
saved as part of the configuration file; however, the model random starting point or seed
number is saved if the Random Start button is unchecked. To choose a configuration file, the
user can click on "Browse" to browse to the correct path or type in a path and name. The user
can also press the "Load Last" button or simply press "Enter" on the keyboard to load the most
recently used configuration file.  The "Save" and "Save As" buttons can be used to save the
current settings to an existing or new configuration file.

Configuration files can be used on multiple computers or shared with collaborators, so that
results can be replicated without re-entering a long list of preferences.  Use the "Browse" button
to locate and load the configuration file.  The locations of both the concentration and uncertainty
files must then be identified.  PMF does not store past run data; however, the results can easily be
recalculated by PMF as long as the same number of factors and runs and a fixed seed are used
(i.e., Random Start is not selected).

5.4   Suggested Order of Operations

The GUI is designed to give the user as much flexibility as possible when running the PMF
model. However, certain steps must be completed to utilize the full potential of the provided
tools. The order of operations is mainly based on how the tabs and functions are arranged
(from left to right) in the program (Figure 6, Figure 7, and Figure 8); the sections in this user
guide also follow this order. To begin using the program, the user must provide input files via
the Model Data - Data Files screen before other operations are available.  The first time PMF is
run on a data set, the user should analyze the input data via the
Concentration/Uncertainty, Concentration Scatter Plot, Concentration Time Series, and Data
Exceptions screens. This step is usually followed by Base Model Runs and Base Model Results
under the Base Model tab; these steps should be repeated as needed until the user reaches a
reasonable solution. The solution is then evaluated using the Error Estimation options, starting with
DISP and progressing to BS and BS-DISP; the output from these error estimation methods
provides key information on the stability of the solution.  All three error
estimation methods are required to understand the uncertainty associated with the solution.
Advanced users may wish to initiate Fpeak runs or constrained model runs based on a selected
base run; both options are available under the Rotational Tools tab.
[Flow chart: Input/Output Specification, Base Model Execution, Displacement Execution (DISP), Bootstrap Execution (BS), and BS-DISP Execution, each with its results plots, output files, and summaries]
              Figure 6. Flow chart of operations within EPA PMF - Base Model.
5.5    Analyze Input Data

Several tools are available to help the user analyze the concentration and uncertainty data
before running the model. These tools help the user decide whether certain species should be
excluded or downweighted (e.g., due to increased uncertainty or a low signal-to-noise ratio) and
whether certain samples should be excluded (e.g., due to an outlier event). All changes and deletions
should be reported with the final solution. The four screens for analyzing input data are
described below.
[Flow chart: Fpeak Execution (Fpeak dQ, Profiles/Contributions, Factor Fingerprints, G-Space Plots, Factor Contributions, Diagnostics) and Bootstrap Execution (BS results plots, Output Files, BS Summary, Error Estimate Summary File & Plots)]
                 Figure 7. Flow chart of operations within EPA PMF - Fpeak.


5.5.1   Concentration/Uncertainty

Input data statistics and concentration/uncertainty scatter plots are presented in the
Concentration/Uncertainty screen, as shown in Figure 9. The following statistics are calculated
for each species and displayed in a table on the left of the screen (Figure 9, 1):

•  Minimum (Min) - minimum concentration value
•  25th percentile (25th)
•  Median - 50th percentile (50th)
•  75th percentile (75th)
•  Maximum (Max) - maximum value reported
•  Signal-to-noise ratio (S/N) - indicates whether the variability in the measurements is real or
   within the noise of the data

               Figure 8. Flow chart of operations within EPA PMF - Constraints.

Percentiles are calculated using a weighted average approach (Equation 5-2):

$$
\begin{aligned}
L(n,p) &= \frac{(n+1)\,p}{100}, \qquad L(n,p) = I + F \\
W_1 &= 1 - F, \qquad W_2 = F \\
P &= W_1 X_I + W_2 X_{I+1}
\end{aligned}
\tag{5-2}
$$

where n represents the number of non-missing values of the selected variable; p is the
percentile of interest; I is the integer part of L(n,p); F represents the fractional part of L(n,p);
W1 and W2 are weights; P is the pth percentile; and X1, X2, ..., Xn represent the ordered values
of the variable of interest.
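The weighted-average percentile of Equation 5-2 can be sketched in Python as shown below. This is an illustrative sketch and not the EPA PMF source code. The function name weighted_percentile is hypothetical, and missing values are assumed to be represented as None or NaN. The code also clamps L(n,p) to the range 1 to n so that X_I and X_{I+1} always exist at the 0th and 100th percentiles; this is an assumption, because the equation does not state how the end points are handled.

```python
import math


def weighted_percentile(values, p):
    """p-th percentile (0-100) by the weighted-average approach of Equation 5-2.

    Missing values (None or NaN) are dropped before ordering. Clamping L(n,p)
    to [1, n] at the extremes is an assumption of this sketch.
    """
    xs = sorted(v for v in values if v is not None and not math.isnan(v))
    n = len(xs)
    if n == 0:
        raise ValueError("no non-missing values")
    L = (n + 1) * p / 100.0
    L = min(max(L, 1.0), float(n))   # assumption: keep X_I and X_{I+1} defined
    I = int(L)                       # integer part of L(n,p)
    F = L - I                        # fractional part of L(n,p)
    w1, w2 = 1.0 - F, F              # weights W1 and W2
    x_i = xs[I - 1]                  # X_I (the equation counts from 1)
    x_next = xs[min(I, n - 1)]       # X_{I+1}, capped at X_n
    return w1 * x_i + w2 * x_next


print(weighted_percentile([4.0, 1.0, None, 3.0, 2.0], 50))  # 2.5
```

After the missing value is dropped, the example has n = 4 and p = 50, so L = 2.5, I = 2 and F = 0.5. The median is therefore 0.5 × 2 + 0.5 × 3 = 2.5.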
The S/N calculation in EPA PMF has been revised in the new version. Previously, the S/N of a
given species was essentially the sum of its concentration values divided by the sum of its
uncertainty values. While reasonable, this approach could cause problems in certain situations.
Artificially high S/N values would be obtained for species with a handful of high-concentration
events, resulting in an S/N that may actually be higher than that of another species with a more
consistent signal. More seriously, artificially low S/N values could appear for species with a few
missing values. Missing values are usually downweighted by assigning them very large uncertainty
values, typically (much) larger than the largest concentration value of the species in question.
If this was done to the data before ingestion into EPA PMF, the inflated uncertainty values
inflate the N in the S/N calculation, producing an S/N small enough to cause a perfectly strong
variable to be classified as "weak." This latter problem has been repeatedly observed in
practical work. In addition, slightly negative concentration values, which are not uncommon in
environmental data, could artificially decrease S and hence the S/N of a species.

                 Figure 9. Example of the Concentration/Uncertainty screen.

In the revised calculation, only concentration values that exceed the uncertainty contribute to
the signal portion of the S/N calculation, because the concentration value is essentially equal to
the sum of signal and noise, and therefore signal is the difference between concentration and
uncertainty.

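Two calculations are performed to determine S/N. Concentrations below the uncertainty are
determined to have no signal. For concentrations above the uncertainty, the difference between
the concentration (x_ij) and the uncertainty (s_ij), relative to the uncertainty, is used as the
signal (Equation 5-3):

$$
d_{ij} =
\begin{cases}
0 & \text{if } x_{ij} \le s_{ij} \\
\dfrac{x_{ij} - s_{ij}}{s_{ij}} & \text{if } x_{ij} > s_{ij}
\end{cases}
\tag{5-3}
$$

where i is the sample and j is the species. S/N is then calculated using Equation 5-4:

$$
\left(\frac{S}{N}\right)_j = \frac{1}{n} \sum_{i=1}^{n} d_{ij}
\tag{5-4}
$$

where n is the number of samples.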
The result with this new S/N calculation is that species with concentrations always below their
uncertainty have an S/N of 0. Species with concentrations that are twice the uncertainty value
have an S/N of 1. An S/N greater than 1 may often indicate a species with "good" signal, though this
depends on how uncertainties were determined. Negative concentration values do not
contribute to the S/N, and species with a handful of high concentration events will not have
artificially high S/N. While there are many methods to determine S/N, the one selected in the
new version of EPA PMF may be more useful in environmental data analysis compared to the
prior version, though with the caveat that the S/N is merely one of many analyses for screening
data.
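The sketch below implements Equations 5-3 and 5-4 for a single species in Python and sets them beside the previous approach, which the text describes as "essentially" the sum of concentrations divided by the sum of uncertainties. Both function names are hypothetical, and the uncertainties are assumed to be positive. The data are invented for illustration: four samples at twice their uncertainty, plus one missing value that is assumed to have been filled and downweighted with a very large uncertainty before ingest.

```python
def signal_to_noise(conc, unc):
    """S/N of one species per Equations 5-3 and 5-4 (uncertainties assumed > 0)."""
    if not conc or len(conc) != len(unc):
        raise ValueError("conc and unc must be non-empty and the same length")
    total = 0.0
    for x, s in zip(conc, unc):
        # Equation 5-3: no signal unless the concentration exceeds its uncertainty;
        # this also means negative concentrations contribute nothing.
        total += (x - s) / s if x > s else 0.0
    # Equation 5-4: average d_ij over all n samples.
    return total / len(conc)


def signal_to_noise_previous(conc, unc):
    """Approximation of the earlier approach: sum(conc) / sum(unc)."""
    return sum(conc) / sum(unc)


# Hypothetical data: four samples at twice their uncertainty, plus one missing
# value filled and downweighted with a very large uncertainty.
conc = [2.0, 2.0, 2.0, 2.0, 1.0]
unc = [1.0, 1.0, 1.0, 1.0, 100.0]
print(signal_to_noise(conc, unc))                     # 0.8
print(round(signal_to_noise_previous(conc, unc), 3))  # 0.087
```

In this example the single downweighted missing value drives the earlier ratio far below 1. Under the revised calculation, the same sample contributes only a zero term to the average.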

Based on these statistics and knowledge of analytical and sampling issues, the user can
categorize a species as "Strong," "Weak," or "Bad" by selecting the species in the Input Data
Statistics table (Figure 9, 1) and pressing the appropriate button under the table (Figure 9, 2).
In addition, Alt+W, Alt+B, and Alt+G can be used to change a species category to Weak, Bad,
or Good, respectively. The default value for all species is "Strong." A categorization of "Weak"
triples the provided uncertainty, and a categorization  of "Bad" excludes the species from the rest
of the analysis.  If a species is marked "Weak," the row is highlighted orange; if a species is
marked "Bad," the row is highlighted pink. When choosing the category for each species, the
user should consider the presence of sources that could be contributing to species  based on
measured profiles, tracer species for point sources that may have infrequent impacts, the
number of samples that are missing or below the limit of detection, known problems with the
collection or analysis of the species, and species reactivity.
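The effect of the category buttons on the uncertainty matrix can be sketched as follows: "Strong" leaves the uncertainties unchanged, "Weak" triples them, and "Bad" removes the species. The function name apply_categories and the dictionary-of-lists data layout are hypothetical illustrations. They are not how EPA PMF stores its data internally.

```python
def apply_categories(uncertainties, categories):
    """Adjust per-species uncertainties for their Strong/Weak/Bad categories.

    uncertainties: dict species -> list of uncertainty values.
    categories: dict species -> "Strong" | "Weak" | "Bad"; species not listed
    default to "Strong", as in the GUI.
    """
    multipliers = {"Strong": 1.0, "Weak": 3.0}  # Weak triples the uncertainty
    adjusted = {}
    for species, values in uncertainties.items():
        category = categories.get(species, "Strong")
        if category == "Bad":
            continue  # Bad species are excluded from the rest of the analysis
        if category not in multipliers:
            raise ValueError(f"unknown category {category!r} for {species}")
        adjusted[species] = [multipliers[category] * u for u in values]
    return adjusted


# Hypothetical uncertainties for three species.
unc = {"Lead": [0.5, 1.0], "Sulfate": [0.25], "OM": [2.0]}
print(apply_categories(unc, {"Lead": "Weak", "OM": "Bad"}))
```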

A discussion of these considerations is provided in Reff et al. (2007).  Detailed knowledge of
the sources, sampling, and analytical uncertainties is the best way to decide on the species
category.  If detailed information about the data set is unavailable, the S/N ratios may be used
to categorize one or more species.  To conservatively use the S/N ratios to categorize species,
categorize the species as "Bad" if the S/N ratio is less than 0.5 and "Weak" if the S/N ratio is
greater than 0.5 but less than 1.  For the sample Baltimore data set provided with the installation
package (Dataset-Baltimore_con.txt and Dataset-Baltimore_unc.txt), these guidelines would
result in aluminum, arsenic, barium, chlorine, chromium, manganese, and selenium categorized
as "Bad" and lead, nickel, titanium, and vanadium as "Weak." Any changes made to the
user-provided uncertainty by making a species category "Weak" or by adding extra modeling
uncertainty should be documented by the user and reported with the final solution.
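The conservative S/N-only screening described above can be applied to the S/N values listed for the Baltimore example in Table 2. The text's rule leaves an S/N of exactly 0.5 unassigned, so this sketch assumes that such values are "Weak." The function name is hypothetical. This screen only reproduces the guideline. It is not the categorization actually used for the Baltimore example: Table 2 lists the chosen categories, and several of them differ because they also reflect other knowledge of the data.

```python
# S/N values as listed in Table 2 for the Baltimore example (rounded to 0.1).
BALTIMORE_SN = {
    "PM2.5": 9.0, "Aluminum": 0.1, "Ammonium Ion": 8.9, "Arsenic": 0.1,
    "Barium": 0.0, "Bromine": 2.0, "Calcium": 2.1, "Chlorine": 0.1,
    "Chromium": 0.0, "Copper": 1.0, "Elemental Carbon": 4.4, "Iron": 5.6,
    "Lead": 0.5, "Manganese": 0.3, "Nickel": 0.5, "Organic Carbon": 7.8,
    "OM": 7.8, "Potassium Ion": 2.1, "Selenium": 0.2, "Silicon": 2.0,
    "Sodium Ion": 1.0, "Sulfate": 9.2, "Titanium": 0.7, "Total Nitrate": 7.9,
    "Vanadium": 0.6, "Zinc": 5.1,
}


def suggest_category(sn):
    """S/N-only screening: Bad below 0.5, Weak below 1, otherwise Strong.

    Treating an S/N of exactly 0.5 as Weak is an assumption of this sketch.
    """
    if sn < 0.5:
        return "Bad"
    if sn < 1.0:
        return "Weak"
    return "Strong"


suggested = {sp: suggest_category(sn) for sp, sn in BALTIMORE_SN.items()}
bad = sorted(sp for sp, cat in suggested.items() if cat == "Bad")
weak = sorted(sp for sp, cat in suggested.items() if cat == "Weak")
print("Bad:", bad)
print("Weak:", weak)
```

The resulting "Bad" and "Weak" lists match the species named in the paragraph above.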

For users familiar with EPA PMF, Table 2 summarizes the PMF input information for the
Baltimore example, which is used in Sections 5 and 6 to demonstrate PMF. This summary is
provided for users who would like to run the software while learning about the new features
and structure of EPA PMF 5.0.

A concentration/uncertainty scatter plot is displayed on the right of the screen (Figure 9, 3); it
shows the relationship between the concentration and the user-provided or PMF-calculated
uncertainties. The species to be plotted is selected in the Input Data Statistics table, either by
clicking on the species row or by scrolling up and down through the species; only one species
can be displayed at a time. The statistics for each species are shown in the table: S/N;
Minimum (Min); 25th, 50th, and 75th percentiles; Maximum (Max); % Modeled Samples (the
number of samples with matched, non-missing values of the selected species divided by the total
number of input samples); and % Raw Samples (the number of non-missing input samples
divided by the total number of input samples). For example, if four sites with an equivalent
number of data points and no missing data were ingested, and only one of the four sites was
included for modeling, "% Modeled Samples" = 25%, while "% Raw Samples" = 100%, since no
data were removed upon ingest. If the ingested data contained missing values and "exclude
entire sample" was selected for missing data, both % Modeled and % Raw would be lower. These
last two values are important because PMF requires that all Strong and Weak species be
non-missing for a sample to be included in the PMF run. The % Modeled Samples and % Raw
Samples can therefore be used to identify the species that may be limiting the total number of
samples used in a run.
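The four-site example above can be reproduced with a short sketch. The function name sample_percentages and its inputs are hypothetical: a list of one species' values with None marking missing data, and a parallel list of booleans marking the samples kept for modeling.

```python
def sample_percentages(values, modeled):
    """Return (% Modeled Samples, % Raw Samples) for one species.

    values: the species' concentration for every input sample (None = missing).
    modeled: parallel booleans, True if that sample is kept for modeling.
    """
    if not values or len(values) != len(modeled):
        raise ValueError("values and modeled must be non-empty and the same length")
    total = len(values)
    raw = sum(v is not None for v in values)
    kept = sum(v is not None and keep for v, keep in zip(values, modeled))
    return 100.0 * kept / total, 100.0 * raw / total


# Four sites with 10 samples each and no missing data; only the first site is modeled.
values = [1.0] * 40
modeled = [True] * 10 + [False] * 30
print(sample_percentages(values, modeled))  # (25.0, 100.0)
```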
              Table 2.  Baltimore example - summary of PMF input information.

**** Data Files ****
Concentration file:   Dataset-Baltimore_con.txt
Uncertainty file:     Dataset-Baltimore_unc.txt
Excluded Samples:     07/04/02, 07/07/02, 07/08/02, 12/31/02, 07/05/03,
                      01/01/05, 07/03/05, 07/01/06, 07/04/06

**** Base Run Summary ****
Number of base runs:          20
Base random seed:             89
Number of factors:            7
Extra modeling uncertainty:   0

**** Input Data Statistics ****
Species             Category   S/N
PM2.5               Weak       9.0
Aluminum            Weak       0.1
Ammonium Ion        Strong     8.9
Arsenic             Weak       0.1
Barium              Weak       0.0
Bromine             Strong     2.0
Calcium             Strong     2.1
Chlorine            Weak       0.1
Chromium            Weak       0.0
Copper              Weak       1.0
Elemental Carbon    Strong     4.4
Iron                Strong     5.6
Lead                Weak       0.5
Manganese           Weak       0.3
Nickel              Weak       0.5
Organic Carbon      Strong     7.8
OM                  Bad        7.8
Potassium Ion       Strong     2.1
Selenium            Weak       0.2
Silicon             Strong     2.0
Sodium Ion          Weak       1.0
Sulfate             Strong     9.2
Titanium            Weak       0.7
Total Nitrate       Strong     7.9
Vanadium            Weak       0.6
Zinc                Strong     5.1


The x-axis is the concentration, the y-axis is the uncertainty, and the graph title is the name of
the species plotted.  If users change a species categorization to "Weak," the
concentration/uncertainty scatter plot for that species will be updated to three times the original
uncertainty and the data points will be changed to orange squares.  If users change a species
categorization to "Bad," the graph for that species will not be displayed. A typical
concentration/uncertainty relationship has a hockey-stick shape: the MDL dominates the
uncertainty at low concentrations, and the relationship becomes linear at higher concentrations,
where a percentage of the concentration dominates the uncertainty. Points with uncertainties that
do not follow the general trend of the data should be further evaluated by reading available
sampling and analytical reports.
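The hockey-stick shape can be illustrated with an equation-based uncertainty of a kind often used to build PMF input files. At or below the MDL, the uncertainty is a fixed 5/6 of the MDL. Above the MDL, it is the root-sum-square of an error fraction times the concentration and half the MDL. This particular scheme, the function name equation_uncertainty, and the MDL and error-fraction values are assumptions for illustration. They are not the uncertainties in the Baltimore files.

```python
import math


def equation_uncertainty(conc, mdl, error_fraction):
    """Hypothetical equation-based uncertainty for one concentration value."""
    if conc <= mdl:
        return 5.0 / 6.0 * mdl  # MDL-dominated: flat at low concentrations
    # Above the MDL, error_fraction * conc dominates as conc grows (the linear limb).
    return math.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)


mdl, ef = 0.01, 0.10  # hypothetical MDL and error fraction
for conc in (0.002, 0.008, 0.05, 0.5, 5.0):
    unc = equation_uncertainty(conc, mdl, ef)
    print(f"conc={conc:<6} unc={unc:.4f} unc/conc={unc / conc:.3f}")
```

At low concentrations the uncertainty stays constant. At high concentrations the ratio unc/conc approaches the error fraction, which produces the straight limb of the curve.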

The user can also add "Extra Modeling Uncertainty (0-100%)," which is applied to all species, by
entering a value in the box in the lower right corner of the screen (Figure 9, 4). This value
accounts for errors that are not measurement or analytical errors; those are already included in
the user-provided uncertainty files. Issues that could cause modeling errors include variation in
source profiles and chemical transformations in the atmosphere. The model uses the "Extra
Modeling Uncertainty" value to calculate "sigma," which corresponds to the total uncertainty
(modeling uncertainty plus species/sample-specific uncertainty). If the user specifies extra
modeling uncertainty, all concentration/uncertainty graphs will be updated to reflect the increase
in uncertainty. As shown in Equation 1-2, the uncertainty values are a critical input to the PMF
model.

On this screen, the user can also specify a "Total Variable" (Figure 9, 2) that will be used by the
program in the post-processing of results. For example, if the data used are PM2.5 components,
the total variable would be PM2.5 mass. The user specifies the total variable by selecting the
species and pressing the "Total Variable" button beneath the Input Data Statistics table.
Because a total variable should not have a large influence on the solution, it should be given a
high uncertainty. Therefore, when a species is selected as a total variable, its categorization is
automatically set to "Weak." If the user has already adjusted the uncertainty of the total variable
outside of PMF and wishes to categorize it as "Strong," the default categorization can be
overridden by selecting "Strong" for the variable after selecting "Total Variable." A species
designated "Bad" cannot be selected as a total variable, and a total variable cannot be made
"Bad."

The status bar in the Concentration/Uncertainty screen displays the number of species of each
category as well as the percentage of samples excluded by the user. Hot keys can be used to
assign "Strong" (Alt-S), "Weak" (Alt-W),  "Bad" (Alt-B), and "Total Variable" (Alt-T).  The user can
also sort the input data by clicking on the column headers. Clicking on the "Species" and "Cat"
columns will sort the input data in alphabetical  or reverse alphabetical order. Clicking on the
remaining columns will sort the data  in ascending or descending order. To return to the original
species sort order (which corresponds to the order listed in the input concentration data file on
the Data  Files screen) the user can select "Unsort" (Figure 9, 2) or use a hot key (Alt-U).

5.5.2   Concentration Scatter Plots

Scatter plots between species are a useful pre-PMF analysis tool; a correlation between species
may indicate a common source type or source location. The user should examine scatter plots to
look for expected relationships, as well as to look for other relationships that might indicate
sources or source categories.

The Concentration Scatter Plot screen shows scatter plots between two user-specified species
(Figure 10). The user selects the species for each axis in the appropriate "Y Axis" or "X Axis"
list; only one species can be selected for each axis. A one-to-one line (in blue) and a linear
regression line (in dashed red) are shown on the plot. Axis labels are the species names and
units (if provided), and the plot title is "Y Axis Species/X Axis Species." Examples of linear
relationships between species that indicate source impacts include iron and zinc for steel
production, and sulfate and ammonium ion for ammonium sulfate from coal-fired power plants.

As the user mouses over the points, the status bar at the bottom of the window shows the date,
y-value, x-value, and the regression equation.
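The regression line and equation on this screen can be approximated by fitting the Y Axis species against the X Axis species. This sketch assumes ordinary least squares, since the fitting method is not stated here, and uses hypothetical paired values. It also returns r², the coefficient of determination, which is the same statistic reported for observed/predicted comparisons on the Obs/Pred Scatter Plot screen.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, r2


# Hypothetical paired sulfate (Y Axis) and ammonium ion (X Axis) concentrations.
ammonium = [0.8, 1.5, 2.1, 3.0, 4.2]
sulfate = [2.4, 4.1, 6.0, 8.5, 12.0]
slope, intercept, r2 = linear_fit(ammonium, sulfate)
print(f"y = {slope:.5f}x + {intercept:.5f}  (r2 = {r2:.3f})")
```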
                      Figure 10.  Example of a concentration scatter plot.
5.5.3  Concentration Time Series

Time series of species concentrations (Figure 11) are useful to determine whether expected
temporal patterns are present in the data and whether there are any unusual events.  By
overlaying multiple species, the user can see if any unusual events are present across a group
of species that may indicate a shared source. The user should also examine time series for
extreme events that should be excluded from modeling (for example, elevated potassium
concentrations on the Fourth of July from fireworks). The firework impacts can show up both
before and after the Fourth of July as well as on New Year's Eve (elevated concentrations on
the January 1 sample).

The user can select up to 10 species in the Concentration Time Series list by checking the box
next to each species name (Figure 11, 1). The selected species will be displayed in varying
colors on the plot. To clear all species from the plot, the user should select "Clear Selections"
below the list. Vertical orange lines denote January 1 of each year (if appropriate) for reference.
Vertical lines separating points by SampleID can be toggled on the Data Files screen. A legend
with species names and units (if available) is provided at the top of the graph and updates
automatically with each selection. If data are not in order by date (e.g., if there are multiple
SampleIDs for a given date), the x-axis will display "Sample Number," because the plot is then
simply a line plot rather than a time series of sequential samples.

The status bar on this screen shows the selected sample date/time, the SampleID if provided,
the number of samples included out of the total number of samples, and the percentage of
samples excluded by the user. The arrow buttons below the plot, or the right and left arrow keys
on the keyboard, can be used to scroll through samples. If a group of samples is selected, the
arrows will move the first selected sample forward/backward by one sample. Samples can be
removed from analysis by selecting individual data points with a single mouse click or by
dragging the mouse over a range of dates. Pressing the "Exclude Samples" button below the plot
will remove the samples and gray them out for all species (Figure 11, 2). Excluded samples can
be included again by selecting the data point/range on any species time series graph and
pressing "Restore Samples." If a sample is removed from analysis, it will not be included in the
statistics or plots generated by EPA PMF or in any model output, but it is not removed from the
original user input files. Hot keys can be used to exclude (Alt-E) or restore (Alt-R) selected
samples. In the Baltimore example, nine samples impacted by fireworks were excluded:
07/04/02, 07/07/02, 07/08/02, 12/31/02, 07/05/03, 01/01/05, 07/03/05, 07/01/06, and 07/04/06.
Impacts such as fireworks represent a challenge for PMF and other multivariate models because
they are infrequent, short-duration events with high concentrations.
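The key behavior of "Exclude Samples" is that it drops the same samples from every species. The sketch below shows this using the Baltimore firework dates listed above. The function name exclude_samples, the dict-of-lists layout, and the four-sample data set are hypothetical illustrations.

```python
from datetime import date, datetime

# Firework-impacted samples excluded in the Baltimore example (MM/DD/YY).
FIREWORK_DATES = ["07/04/02", "07/07/02", "07/08/02", "12/31/02", "07/05/03",
                  "01/01/05", "07/03/05", "07/01/06", "07/04/06"]


def exclude_samples(dates, data, excluded):
    """Drop the same samples from every species.

    dates: sample dates; data: dict species -> values parallel to dates;
    excluded: set of dates to drop.
    Returns (kept_dates, filtered_data, percent_excluded).
    """
    keep = [d not in excluded for d in dates]
    kept_dates = [d for d, k in zip(dates, keep) if k]
    filtered = {sp: [v for v, k in zip(vals, keep) if k] for sp, vals in data.items()}
    percent_excluded = 100.0 * (len(dates) - len(kept_dates)) / len(dates)
    return kept_dates, filtered, percent_excluded


excluded = {datetime.strptime(s, "%m/%d/%y").date() for s in FIREWORK_DATES}

# Hypothetical four-sample data set spanning the Fourth of July 2002.
dates = [date(2002, 7, 3), date(2002, 7, 4), date(2002, 7, 5), date(2002, 7, 6)]
data = {"Potassium Ion": [0.06, 0.95, 0.08, 0.05], "Sulfate": [6.1, 7.4, 5.9, 4.8]}
kept_dates, filtered, pct = exclude_samples(dates, data, excluded)
print(kept_dates, filtered["Potassium Ion"], f"{pct:.0f}% excluded")
```

As in the GUI, the original lists are left untouched and only the filtered copies omit the excluded samples.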

5.5.4  Data Exceptions

Changes made by the GUI  to the input data are detailed in the Data Exceptions screen. These
changes include designating a species "Weak" or "Bad," excluding a sample via the
Concentration Time Series screen, or excluding a sample using "Missing Value Indicator" in the
Data Files screen "Input Files" box. Click the right mouse button to save the data exceptions
information.

5.6   Base Model Runs

Base Model Run produces the primary PMF output of profiles and contributions.  The base
model run uses a new random seed or starting point for iterations if the "Random Start" option is
selected. A user can test whether the solution found is a local or global minimum by using
many random seeds and examining whether the Q(robust) values are stable. A constant seed
can be set by unselecting the "Random Start" box. A constant seed with the same number of
factors and runs will generate the same PMF result; the seed is also saved in the configuration
file. The configuration file can be reloaded for additional evaluation of PMF solutions and can
also be sent to collaborators for evaluation of a PMF solution.

Figure 11.  Example of the Concentration Time Series screen with excluded and selected samples.


5.6.1  Initiating a Base Run

Base model runs are initiated on the Base Model Runs screen under the Base Model tab
(Figure 12). The following parameters need to be specified:

•  "Number of Runs" - the number of base runs to be performed; this number must be an
   integer between 1 and 999.  The recommended number of runs is 20, which will allow for an
   evaluation of the variation in Q.
•  "Number of Factors" - the number of factors the model should fit; this number must be an
   integer between 1 and 999.  The number of factors to be chosen will depend on the user's
   understanding of the sources impacting samples, number of samples, sampling time
   resolution, and species characteristics.
•  "Seed" - the starting point for each iteration in ME-2; the default is Random Start, which
   tells the GUI to randomly choose a starting point for each run. The random seed number is
   displayed in the "Seed Number" box (Figure 12, 1). To reproduce results, unselect the
   "Random Start" option so that the seed number used is saved as part of the .cfg file; an
   identical solution can then be recreated later using the same .cfg (Figure 12, 2).

After the aforementioned parameters are specified, the user should press the "Run" button in
Base Model Runs to initiate  the base runs.  Once runs are initiated, the "Run Progress" box in
the lower right corner of the  screen activates.  Base model runs can be terminated at any time
by pressing the "Stop" button in the "Run Progress" box. The progress bar in this box also fills
whenever runs are performed.  No information about the runs will be saved or displayed if the
runs are stopped.

The status bar on  the Base Model Runs screen displays the same information as on the Data
Files screen.
Figure 12. Example of the Base Model Runs screen showing Random Start (1) and Fixed Start (2).
5.6.2   Base Model Run Summary

When the base runs are completed, a summary of each run appears on the right portion of the
Base Model Runs screen in the Base Model Run Summary table (Figure 13, red box). The
Q-values are goodness-of-fit parameters calculated using Equation 1-2 and are an assessment
of how well the model fits the input data. The run with the lowest Q(robust) is highlighted, and
only converged solutions should be investigated. Non-convergence implies that the model did
not find a minimum. Several things could cause non-convergence, including uncertainties that
are too low or specified incorrectly, or inappropriate input parameters.
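Selecting the run to investigate can be sketched as follows: discard non-converged runs, then take the lowest Q(robust). The sketch also reports the spread of Q(robust) across converged runs, a rough indicator of whether many random seeds lead to a stable minimum. The BaseRun record, the function names, and the run values are hypothetical.

```python
from typing import NamedTuple


class BaseRun(NamedTuple):
    number: int
    q_robust: float
    q_true: float
    converged: bool


def lowest_q_converged(runs):
    """Return the converged run with the lowest Q(robust); non-converged runs are ignored."""
    converged = [r for r in runs if r.converged]
    if not converged:
        raise ValueError("no converged runs; revisit uncertainties and input parameters")
    return min(converged, key=lambda r: r.q_robust)


def q_robust_spread(runs):
    """Range of Q(robust) over converged runs (a rough stability check)."""
    qs = [r.q_robust for r in runs if r.converged]
    if not qs:
        raise ValueError("no converged runs")
    return max(qs) - min(qs)


# Hypothetical run summary; run 3 has the lowest Q(robust) but did not converge.
runs = [
    BaseRun(1, 6221.2, 6731.9, True),
    BaseRun(2, 6221.1, 6732.0, True),
    BaseRun(3, 6100.0, 6500.0, False),
]
best = lowest_q_converged(runs)
print(best.number, f"spread = {q_robust_spread(runs):.1f}")
```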

The Q(robust) and Q(true) values provide a comparison of the fit of the runs; more detail is
provided by comparing the residuals. The intra-run residual calculation compares the residuals
between base runs by adding the squared difference between the uncertainty-scaled residuals
for each pair of base runs (Equation 5-5):

$$
d_{j,kl} = \sum_{i} \left( r_{ij,k} - r_{ij,l} \right)^2
\tag{5-5}
$$

where r is the scaled residual, i is the sample, j is the variable, and k and l are two different runs.
These results are shown in a matrix and can be used to identify runs with significantly different
fits. Also, the paired species values for each run can be compared by adding the d-values
(Equation 5-6).

$$
D_{kl} = \sum_{j} d_{j,kl}
\tag{5-6}
$$

    Figure 13.  Example of the Base Model Runs screen after base runs have been completed.
The D-values are reported in a matrix of base run pairs. The user should examine this matrix
for large values, which indicate that two runs resulted in truly different solutions rather than
merely being rotations of each other. If different solutions are seen, the user can then examine
the d-values, which indicate the individual species that are fitted differently across the runs.
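Equations 5-5 and 5-6 can be sketched for a small set of runs. The residuals are assumed to be stored as nested lists indexed [run][sample][species], and runs are numbered from 0 in the code rather than from 1 as in the GUI. The function name and the toy residuals are hypothetical.

```python
from itertools import combinations


def intra_run_d_values(scaled):
    """scaled[k][i][j]: uncertainty-scaled residual of run k, sample i, species j.

    Returns (d, D): d[(k, l)] is the list of d_{j,kl} (Equation 5-5), and
    D[(k, l)] is their sum over species (Equation 5-6), for every run pair k < l.
    """
    n_samples = len(scaled[0])
    n_species = len(scaled[0][0])
    d, D = {}, {}
    for k, l in combinations(range(len(scaled)), 2):
        rk, rl = scaled[k], scaled[l]
        d_kl = [
            sum((rk[i][j] - rl[i][j]) ** 2 for i in range(n_samples))
            for j in range(n_species)
        ]
        d[(k, l)] = d_kl
        D[(k, l)] = sum(d_kl)
    return d, D


# Hypothetical residuals: 3 runs x 2 samples x 2 species; runs 0 and 2 fit identically.
scaled = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.0, 1.0], [-1.0, 2.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
d, D = intra_run_d_values(scaled)
print(D)
```

In this example, runs 0 and 2 give D = 0. Run 1 differs from both with D = 8, and its d-values show that both species contribute equally to the difference.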

The distributions of species concentrations and percent of species sum across runs are also
evaluated for each factor, using the following statistics: Lowest Q, Minimum (Min), 25th
percentile, 50th percentile, 75th percentile, Maximum (Max), Mean, Standard Deviation (SD),
Relative Standard Deviation (RSD = SD*100/mean), and RSD % Lowest Q. Large variations in
species distributions may indicate that the factor profile is changing due to process changes,
reactivity, or measurement issues.

These intra-run variability results are recorded in the *_diag file and can be viewed through the
GUI by selecting the Diagnostics tab and scrolling to "Scaled residual analysis." In addition, a
factor summary of the species distribution compared to the lowest Q(robust) run is recorded in
the *_run_comparison file and can be viewed through the GUI by selecting the Diagnostics tab
and the lower window "Run Comparison Statistics."

5.6.3   Base Model Results

Details of the base model run results are provided in the screens under the Base Model Results
tab. The results for the run with the lowest Q(robust) value are automatically displayed. The
user can change the run  number either by highlighting it in the Base Model Run Summary table
on the Base Model Runs screen, or by selecting the run number at the bottom of the Base
Model Results screen.

Residual Analysis

The Residual Analysis screen (Figure 14) displays the uncertainty-scaled residuals in several
formats for the selected run. At the left of the screen (Figure 14, 1), the user can select a
species, which will be displayed in the histogram in the center of the screen (Figure 14, 2). The
histogram shows the percentage of all scaled residuals in each bin (each bin is 0.5 wide).
These plots are useful to determine how well the model fits each species.  If a species has
many large scaled residuals or displays a non-normal curve,  it may be an indication of a poor fit.

The species in Figure 14 (sulfate) is well modeled: all residuals are between -3 and +3, and they
are normally distributed. Gray reference lines are drawn at -3 and +3. Selecting the
"Autoscale Histogram" box sets the y-axis maximum to 10% above the maximum bin
count for each species. If the box is unchecked, the y-axis maximum is fixed at 100%. Species
with residuals beyond -3 or +3 should be evaluated in the Obs/Pred Scatter Plot and Obs/Pred Time
Series screens. Large positive scaled residuals may indicate that PMF is not fitting the species
or that the species is present in an infrequent source.

The screen also displays the samples whose absolute scaled residuals are greater than a
user-specified value (Figure 14, 3). The default value is 3.0. The residuals can be displayed as
"Dates by Species" or "Species by Dates" by choosing the appropriate option above the table.
When a species is selected in the list on the left (Figure 14, 1), the table on the right (Figure
14, 3) automatically scrolls to that species.
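The residual table can be mimicked with a simple filter. In this sketch, scaled residuals are assumed to be held in a dictionary keyed by (species, date). The dates use ISO-formatted strings so that they sort chronologically. This layout and the function name large_residuals are hypothetical; they do not describe the *_resid file format.

```python
def large_residuals(resid, threshold=3.0, by="species"):
    """Rows (species, date, value) whose absolute scaled residual exceeds threshold.

    resid maps (species, iso_date) -> scaled residual; this layout is hypothetical.
    by="species" groups like "Dates by Species"; by="date" like "Species by Dates".
    """
    rows = [(sp, d, r) for (sp, d), r in resid.items() if abs(r) > threshold]
    if by == "species":
        rows.sort(key=lambda t: (t[0], t[1]))
    elif by == "date":
        rows.sort(key=lambda t: (t[1], t[0]))
    else:
        raise ValueError("by must be 'species' or 'date'")
    return rows


resid = {
    ("Bromine", "2001-02-09"): 4.2,
    ("Ammonium Ion", "2001-02-21"): -3.5,
    ("Sulfate", "2001-03-01"): 1.2,
}
for row in large_residuals(resid):
    print(row)
```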

Observed/Predicted Scatter Plot

A comparison between observed (input data) values and predicted (modeled) values is useful to
determine if the model fits the individual species well.  Species that do not have a strong
correlation between observed and predicted values should be evaluated by the user to
determine whether they should be down-weighted or excluded from the model.
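One way to quantify how strongly observed and predicted values agree is an ordinary least-squares fit of predicted (y-axis) on observed (x-axis) concentrations. The sketch below uses standard textbook definitions of r², the slope and intercept standard errors, and the standard error of the estimate (SE). EPA PMF's exact formulas are not given in this guide, so treat these as assumptions. The function name is hypothetical.

```python
import math


def obs_pred_regression(obs, pred):
    """Ordinary least-squares fit of predicted (y) on observed (x)."""
    n = len(obs)
    mx = sum(obs) / n
    my = sum(pred) / n
    sxx = sum((x - mx) ** 2 for x in obs)
    syy = sum((y - my) ** 2 for y in pred)
    sxy = sum((x - mx) * (y - my) for x, y in zip(obs, pred))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(obs, pred))
    se = math.sqrt(sse / (n - 2))  # standard error of the estimate
    return {
        "r2": sxy ** 2 / (sxx * syy),  # coefficient of determination
        "slope": slope,
        "slope_se": se / math.sqrt(sxx),
        "intercept": intercept,
        "intercept_se": se * math.sqrt(1.0 / n + mx ** 2 / sxx),
        "se": se,
    }


fit = obs_pred_regression([1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.1, 3.8, 5.2])
print(f"y = {fit['slope']:.3f}x + {fit['intercept']:.3f}, r2 = {fit['r2']:.3f}")
```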

A table in the Obs/Pred Scatter Plot screen shows Base Run Statistics for each species (Figure
15, 1). These numbers are calculated from the observed and predicted concentrations to
indicate how well each species is fit by the model. The statistics shown are the coefficient of
determination (r²), Intercept, Intercept SE (standard error), Slope, Slope SE, SE, and Normal
Resid (normal residual). The table also indicates whether the residuals are normally distributed,
as determined by a Kolmogorov-Smirnov test. If the test indicates that the residuals are not
normally distributed, the user should visually inspect the histogram for outlying residuals. If not
all statistics are visible, the user can use the scroll bars at the bottom and side of the table to
display additional statistics. These statistics are also provided in the *_diag output file. The
Obs/Pred Scatter Plot (Figure 15, 2) shows the observed (x-axis) and predicted (y-axis)
concentrations for the selected species. A blue one-to-one line is provided on this plot for
reference (a perfect fit would lie exactly on this line), and the regression line is shown as a
dotted red line. As the user mouses over data points, the status bar on this screen (Figure 15, 3)
displays the date, x-value, y-value, and the regression equation between predicted and observed data.
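A one-sample Kolmogorov-Smirnov check on a species' scaled residuals can be sketched as follows. The sketch assumes the residuals are compared with a standard normal distribution. It uses the common large-sample 5% critical value of 1.36/sqrt(n). Whether EPA PMF compares against N(0, 1) or a fitted normal, and which significance level it uses, are both assumptions here. The function names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(scaled):
    """One-sample Kolmogorov-Smirnov statistic D against N(0, 1)."""
    xs = sorted(scaled)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x)
        # Largest gap between the empirical CDF (just after and just before x) and F(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def looks_normal(scaled):
    """Large-sample test at the 5% level: accept normality if D <= 1.36/sqrt(n)."""
    return ks_statistic(scaled) <= 1.36 / math.sqrt(len(scaled))


print(ks_statistic([0.0]))        # a single residual at zero
print(looks_normal([10.0] * 50))  # residuals far from N(0, 1)
```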

                      Figure 14. Example of the Residual Analysis screen.


Observed/Predicted Time Series

The data displayed on the Obs/Pred Scatter Plot screen are the same data displayed as a time
series on the Obs/Pred Time Series screen (Figure 16). When the user selects a species, the
observed (user-input) data for that species are displayed in blue and the predicted
(modeled) data are displayed in red. The user can view this screen to determine when the
model is fitting the observed data well. If the peak values of a species are not reproduced by
the model, it may be advisable to exclude the species or change its category to weak.
The status bar on this screen displays the date and the observed and predicted concentrations
for the sample closest to the black vertical dotted reference line.
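To screen for unreproduced peaks numerically, one hypothetical approach is to compare predicted and observed values on the samples with the highest observed concentrations. The function peak_capture below is an illustrative sketch, not a PMF feature. How many peaks to inspect, and what ratio counts as poorly reproduced, are judgment calls left to the user.

```python
def peak_capture(obs, pred, top_n=5):
    """Predicted/observed ratio for the top_n highest observed samples.

    Returns (sample_index, ratio) pairs; ratios well below 1 suggest that the
    model underpredicts that peak.
    """
    order = sorted(range(len(obs)), key=lambda i: obs[i], reverse=True)
    peaks = [i for i in order if obs[i] > 0][:top_n]  # skip non-positive values
    return [(i, pred[i] / obs[i]) for i in peaks]


obs = [1.0, 10.0, 2.0, 8.0]
pred = [1.1, 5.0, 2.0, 8.0]
print(peak_capture(obs, pred, top_n=2))
```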

                               Figure 15. Example of the Obs/Pred Scatter Plot screen.
                               Figure 16. Example of the Obs/Pred Time Series screen.

Profiles/Contributions

The factors resolved by PMF are displayed on the Profiles/Contributions screen. Two
graphs are shown for each factor: one displays the factor profile and the other displays the
contribution of the factor in each sample (Figure 17). The profile graph, displayed on top (Figure
17, 1), shows the concentration of each species apportioned to the factor as a pale blue bar and
the percent of each species apportioned to the factor as a red box. The concentration bar
corresponds to the left y-axis, which uses a logarithmic scale. The percent of species corresponds
to the right y-axis. The bottom graph shows the contribution of each factor to the total mass by
sample (Figure 17, 2). This graph is normalized so that the average of all contributions for each
factor is 1. The status bar on this screen (Figure 17, red box) displays the date and
contribution of each data point as it is moused over on the Factor Contributions plot.
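The normalization can be sketched as follows. G holds one row of factor contributions per sample, and F holds one row of species values per factor. Each factor's contributions are divided by their mean, and that factor's profile is multiplied by the same mean. This leaves the modeled concentrations (G times F) unchanged. That EPA PMF rescales in exactly this way internally is an assumption. The sketch does not guard against a factor whose mean contribution is zero.

```python
def normalize_contributions(G, F):
    """Rescale so each factor's contributions average 1, preserving G x F.

    G: list of samples, each a list of p factor contributions.
    F: list of p factor profiles, each a list of species values.
    """
    p = len(F)
    n = len(G)
    means = [sum(row[k] for row in G) / n for k in range(p)]
    G_norm = [[row[k] / means[k] for k in range(p)] for row in G]
    # Multiply each profile by the mean its contributions were divided by
    F_scaled = [[v * means[k] for v in F[k]] for k in range(p)]
    return G_norm, F_scaled


G = [[1.0, 2.0], [3.0, 6.0]]  # 2 samples x 2 factors (hypothetical)
F = [[1.0, 0.0], [0.0, 1.0]]  # 2 factors x 2 species (hypothetical)
G_norm, F_scaled = normalize_contributions(G, F)
print(G_norm, F_scaled)
```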

Pull-down menus at the bottom of the Profiles/Contributions screen allow the user to easily
compare runs and factors. Beginning in the bottom left corner, each run can be chosen by
toggling to and clicking on the appropriate run number. The user can quickly compare runs to
assess the stability of the solution or to determine which, if any, individual species or factors
vary between runs. Users can switch between the factors resolved by PMF by using the
pull-down menu second from the left; in Figure 17, Factor 1 is selected. The user can create a
stacked plot of the profiles or time series by first selecting either the factor profile plot or the
factor concentration plot, right-clicking to view the menu, and selecting "Stacked
Graphs."
                   Figure 17. Example of the Profiles/Contributions screen.

If a total variable is selected, the user can select "Concentration Units" in the bottom left corner
of the Profiles/Contributions screen to display the contributions in the same units as the total
mass (Figure 18). When this option is selected, the GUI multiplies the contributions by the mass of
the total variable in that factor. As the user mouses over the Factor Contributions plot, the status
bar displays the date, the factor contribution, the selected total variable, and the species factor
(Figure 18, red box). If no mass from the total variable is apportioned to the factor, the
graph is not shown and the GUI instead displays "Total Variable mass is 0 for this run/factor."
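The "Concentration Units" conversion described above amounts to a per-sample multiplication. In this minimal sketch, g_norm is one factor's normalized contribution series and f_total is a hypothetical name for the total variable's value in that factor's profile. Returning None stands in for the GUI's "mass is 0" message.

```python
def contributions_in_mass_units(g_norm, f_total):
    """Convert one factor's normalized contributions to total-variable units.

    g_norm: normalized contributions (average 1), one per sample.
    f_total: hypothetical name for the total variable's value in this factor.
    """
    if f_total == 0:
        # Mirrors the GUI message "Total Variable mass is 0 for this run/factor."
        return None
    return [g * f_total for g in g_norm]


print(contributions_in_mass_units([0.5, 1.0, 1.5], 4.0))
```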
  Figure 18. Example of the Profiles/Contributions screen with "Concentration Units" selected.


The user can give a factor a name in the Profiles/Contributions screen by right-clicking to
view the menu, selecting "factor name," typing in a unique name, and then pressing
"Apply Factor Name." The new factor name(s) will appear on the Factor Fingerprints, G-Space
Plot, Factor Contributions, and Diagnostics screens. In this example, Factor 1 has high
concentrations of sulfate and ammonium ions and represents secondary sulfate formed from
coal combustion emissions in power plants. Identifying factors from PMF requires review of
measured species relationships. Some sources may be easily identified; an industrial source, for
example, may be dominated by peaks in zinc concentrations. Other sources may be more difficult to identify.

The species Q/Qexpected (Q/Qexp) can be displayed by selecting the "Q/Qexp" toggle on the
Profiles/Contributions tab (Figure 19). Qexpected is equal to (number of non-weak data values in
X) - (number of elements in G and F, taken together). For example, for five factors, 642
samples, and 19 strong species, this equals (642*19) - ((5*642)+(5*19)), or 8893. For each
species, Q/Qexp is the sum of the squares of the scaled residuals for that species, divided by
Qexpected/(number of strong species). For each sample, Q/Qexp is the sum of the squares of
the scaled residuals over all species, divided by the number of species. Examining the Q/Qexp
graphs is an efficient way to understand the residuals of the PMF solution and, in particular,
which samples and/or species were not well modeled (i.e., have values greater than 2). A
comparison of the species results shows that EC and OC have elevated Q/Qexp values, which
might indicate that the motor vehicle contribution could be better explained by adding another
source (Figure 19, 1). Also, the time series of Q/Qexp values shows two days on which the
species concentrations were not fit as well as on other days (Figure 19, 2). These days might
have had unique source impacts and should be investigated further.
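The Qexpected arithmetic and the species and sample Q/Qexp ratios described above can be written out directly. This sketch assumes there are no missing or weak values among the strong species, so the non-weak data count is simply the number of samples times the number of strong species. The function names are illustrative.

```python
def q_expected(n_samples, n_strong_species, n_factors):
    """Qexp = non-weak data values minus the number of elements in G and F."""
    n_values = n_samples * n_strong_species
    n_elements = n_factors * n_samples + n_factors * n_strong_species
    return n_values - n_elements


def species_q_ratio(species_resid, qexp, n_strong_species):
    """Species Q/Qexp: sum of squared scaled residuals / (Qexp / strong species)."""
    q = sum(r * r for r in species_resid)
    return q / (qexp / n_strong_species)


def sample_q_ratio(sample_resid):
    """Sample Q/Qexp: sum of squared scaled residuals / number of species."""
    return sum(r * r for r in sample_resid) / len(sample_resid)


print(q_expected(642, 19, 5))  # the example above: 8893
```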
        Figure 19. Example of the Profiles/Contributions screen with "Q/Qexp" selected.

Factor Fingerprints

The percentage of each species apportioned to each factor is displayed as a
stacked bar chart in the Factor Fingerprints screen (Figure 20). This plot can be used to verify
factor names and to determine how each species is distributed among the factors. The plot only
displays the currently selected run. To change runs, the user can select a different run number
at the bottom left-hand corner of the Residual Analysis, Obs/Pred Scatter Plot, Obs/Pred Time
Series, or Profiles/Contributions screens.
                     Figure 20. Example of the Factor Fingerprints screen.

G-Space Plot
The G-Space Plot screen (Figure 21) shows scatter plots of one factor versus another factor,
which can be used to assess rotational ambiguity as well as the relationship between source
contributions. A more stable solution will have many samples with zero contributions along both
axes; these points give the PMF solution greater stability because they reduce rotational
ambiguity. A solution or combination of sources may also have no points on or near the axes,
which results in greater rotational ambiguity. The user selects one factor for the y-axis and one
factor for the x-axis from lists on the left of the screen, and a scatter plot of these factors is shown
on the right of the screen. The plot in Figure 21 is an example of a non-optimal rotation of a
factor: it has an upper edge that is not aligned with the axis in the G-Space plot (red line added
for reference). In EPA PMF, the user can explore different rotations via the Fpeak option (Paatero
et al., 2005), which is explained in detail in Section 6.1. The G-Space plots are also useful for
understanding the relationship between factor contributions; the pattern in Figure
21 shows no relationship between regional secondary sulfate and local steel production.
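Two of the G-space features discussed above can be summarized numerically: points lying on or near the axes, and the relationship between a pair of factors. In this hypothetical sketch, gx and gy are the normalized contributions of the x-axis and y-axis factors. The near_zero cutoff is arbitrary and chosen only for illustration. Pearson r stands in for a visual judgment of the relationship. The sketch does not detect edges like the one marked in Figure 21, and it fails if either factor's contributions are constant.

```python
import math


def gspace_summary(gx, gy, near_zero=0.1):
    """Count near-axis points and compute Pearson r for two factors.

    near_zero is a hypothetical cutoff for "on or near an axis".
    Returns (points near the x-axis, points near the y-axis, r).
    """
    near_y_axis = sum(1 for x in gx if abs(x) <= near_zero)  # x is ~0
    near_x_axis = sum(1 for y in gy if abs(y) <= near_zero)  # y is ~0
    n = len(gx)
    mx, my = sum(gx) / n, sum(gy) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(gx, gy))
    sxx = sum((a - mx) ** 2 for a in gx)
    syy = sum((b - my) ** 2 for b in gy)
    r = sxy / math.sqrt(sxx * syy)
    return near_x_axis, near_y_axis, r


print(gspace_summary([0.0, 1.0, 2.0, 1.0], [1.0, 0.0, 1.0, 2.0]))
```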
        Figure 21. Example of the G-Space Plot screen with a red line indicating an edge.


Factor Contributions

The Factor Contributions screen (Figure 22) shows two graphs. The top graph is a pie chart
that displays the distribution of each species among the factors resolved by PMF (Figure
22, 1). The species of interest is selected in the table on the left of the screen; the
category of that species is also displayed for reference. If a total variable was chosen by
the user on the Concentration/Uncertainty screen, that variable is boldfaced in the table.
The pie chart for the selected species is on the right side of the screen. If the user has specified
a total variable, the distribution of this variable across the factors will be of particular importance.
The user may also want to examine the distribution of key source tracer species across factors.
The bottom graph shows the contribution of all the factors to the total mass by sample (Figure
22, 2). The dotted orange lines denote January 1 of each year. The graph is normalized so that
the average of all the contributions for each factor is 1, which allows the temporal patterns of
source contributions to be compared.
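The pie-chart split for one species can be sketched as each factor's profile value for that species divided by the sum over all factors. This assumes the contributions are normalized to an average of 1, so that the profile values are directly comparable in mass units. The function name and the example values are hypothetical.

```python
def species_distribution(profile_values, factor_names):
    """Split one species among factors: (name, value, percent of total)."""
    total = sum(profile_values)
    return [(name, v, 100.0 * v / total)
            for name, v in zip(factor_names, profile_values)]


for name, value, pct in species_distribution([3.0, 1.0], ["Factor A", "Factor B"]):
    print(f"{name} = {value:.5f} ({pct:.1f}%)")
```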

Diagnostics

The Diagnostics screen displays two outputs, which are also saved in the output directory:
the *_diag and *_run_comparison files.
                    Figure 22. Example of the Factor Contributions screen.


Output Files

After the base runs are completed, the GUI creates output files that contain all of the data used
for the on-screen display of the results. The number of output files created depends on the type
of output file selected. Tab-delimited (*.txt) and comma-delimited (*.csv) output creates five files
(*_diag, *_contrib, *_profile, *_resid, and *_run_comparison), and Excel Workbook (*.xls) output
creates two files (*_diag and *_base). The output files are saved to the directory specified in the
"Output Folder" box in the Data Files screen, using the prefix specified in the "Output File Prefix"
box.
•   *_diag contains a record of the user inputs and model diagnostic information (identical to the
    Diagnostics screen).
•   *_contrib contains the contributions for each base run used to generate the contribution
    graphs on the Profiles/Contributions tab. Contributions are sorted by run number.
    Normalized contributions are shown first, followed by contributions in mass units if a total
    variable is specified.
•   *_profile contains the profiles for each base run used to generate the profile graphs on the
    Profiles/Contributions tab. Profiles are sorted by run number. Profiles in mass units are
    written first, followed by profiles in percent of species and concentration fraction of species
    total if a total mass variable is specified.
•   *_resid contains the residuals (regular and scaled by the uncertainty) for each base run,
    used to generate the graphs and tables on the Residual Analysis screen.
•  *_run_comparison contains a summary of the species distribution for each factor over all
   PMF runs, compared to the lowest Q(robust) run.
•  *_base contains the contents of *_contrib, *_profile, *_resid, and *_run_comparison on
   separate worksheets in the same Excel Workbook. This output file only appears if the user
   selects "Excel Workbook" as the output file type.

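For scripting around the GUI, the expected output file set can be listed from the prefix and file type. This sketch assumes that file names are formed as prefix + suffix + extension (for example, example_diag.csv). The guide does not spell out this naming rule, and the function name is hypothetical.

```python
def expected_output_files(prefix, file_type):
    """List base-run output file names for a given output type (assumed naming)."""
    if file_type in ("txt", "csv"):
        suffixes = ["_diag", "_contrib", "_profile", "_resid", "_run_comparison"]
    elif file_type == "xls":
        suffixes = ["_diag", "_base"]
    else:
        raise ValueError("file_type must be 'txt', 'csv', or 'xls'")
    return [f"{prefix}{s}.{file_type}" for s in suffixes]


print(expected_output_files("example", "csv"))
```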
5.6.4   Factor Names on Base Model Runs Screen

The Factor Name can be entered or changed on the Profiles/Contributions screen or the Base
Model Runs screen. After the base runs are completed, the "Factor Names" box located in the lower
left portion of the Base Model Runs screen will be populated (Figure 23, red box). Each row in
the table is labeled by run number, in ascending order, and each column is labeled by
factor number, in ascending order. The table is then populated with the factor name associated
with each column header.

The factor names are used to indicate specific solutions in the tools for assessing model results.
Users can input their own factor names, which will replace the defaults in the Factor Names
table and be saved in the configuration file. The user can also set a single factor name for all
the base runs by entering the name in one cell and then pressing the "Apply to All Runs" button;
update factor names in the profile and contribution files by pressing the "Update Diag Files"
button; or reload the default factor names into the Factor Names table by pressing "Reset to
Defaults."

Note that if the user loads an existing configuration file with user-defined factor
names and initiates base model runs with random seeds, the factor order in the run solutions
may change. In this case, the GUI will generate a pop-up warning to remind the user to verify
that the previous factor names are still appropriate.

Short descriptions of the error estimation methods available in PMF are shown in Figure 24,
along with the example base factor concentration (blue) and upper error limits for the three
methods. For the zinc source, the upper error estimate is lowest for BS and increases for DISP
and BS-DISP. Random errors are estimated with the BS method, described in Section 5.8. In
addition, the Methods for Estimating Uncertainty in Factor Analytic Solutions paper
(Paatero et al., 2014) provides a detailed description of the PMF error estimation methods.
  Figure 23. Example of the Base Model Runs screen with default base model run factor names.

   Displacement (DISP) intervals include effects of rotational ambiguity. They do not include
   effects of random errors in the data. For modeling errors, if the user misspecifies the data
   uncertainty, DISP intervals are directly impacted.

   Bootstrap (BS) intervals include effects from random errors and partially include effects of
   rotational ambiguity. For modeling errors, if the user misspecifies the data uncertainty, BS
   results are still generally robust.

   BS-DISP intervals include effects of random errors and rotational ambiguity. For modeling
   errors, if the user misspecifies data uncertainty, BS-DISP results are more robust than for
   DISP since the DISP phase of BS-DISP does not displace as strongly as DISP by itself.

   (Panels: Zinc DISP, Zinc BS, Zinc BS-DISP; x-axis: Concentration, ng/m3)
                   Figure 24. Comparison of upper error estimates for zinc source.

5.7    Base Model Displacement Error Estimation

DISP explicitly explores the rotational ambiguity in a PMF solution by assessing the largest
range of source profile values without an appreciable increase in the Q-value. DISP Error
Estimation can be run without running BS or can be run after BS and BS-DISP (discussed in
Sections 5.8 and 5.9, respectively). For the solution chosen by the user, each value in the
factor profile is first displaced up and down, and then all other values are re-fitted until the
model converges to a new Q minimum. It is important to note that the newly computed
minimum Q-value (modified) may be different from the Q-value associated with the unadjusted
solution (base). The adjustment in factor profile values (up and down) is always the maximum
allowable under the constraint that the resulting increase in Q, dQ = Q(modified) - Q(base),
is no greater than dQmax (dQ <= dQmax). The model generates results for the
following dQmax values: 4, 8, 15, and 25. For each dQmax value, DISP is executed and
intervals (minimum and maximum source profile values) are summarized for each element in
each factor profile. For example, if 20 species are in a data set and a 7-factor model has been
fitted, then the DISP method will estimate 20 x 7 = 140 intervals for each dQmax value.
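The dQmax acceptance rule can be illustrated on precomputed trial displacements. In this hypothetical sketch, each trial is a (displaced value, re-fitted Q) pair for one profile element. A trial counts toward the interval only if Q rises by no more than dQmax. Actual DISP searches for the maximum allowable displacement rather than scanning a fixed list of trials, so this shows only the interval bookkeeping, not the search.

```python
def disp_intervals(base_value, base_q, trials, dq_levels=(4, 8, 15, 25)):
    """Min/max of accepted values for one profile element at each dQmax.

    trials: hypothetical (displaced_value, refit_q) pairs.
    A trial is accepted only if dQ = refit_q - base_q <= dQmax.
    """
    intervals = {}
    for dq_max in dq_levels:
        accepted = [v for v, q in trials if q - base_q <= dq_max]
        accepted.append(base_value)  # the base value always lies in the interval
        intervals[dq_max] = (min(accepted), max(accepted))
    return intervals


trials = [(0.6, 103.0), (1.5, 107.0), (0.3, 114.0), (2.0, 130.0)]
print(disp_intervals(1.0, 100.0, trials))
```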

Simulations indicate that dQmax values of 4 and 8 provide the smallest error ranges with the
fewest base factor values outside the range. EPA PMF provides results for all dQmax values, but
plots are only shown for dQmax = 4 because this value should provide robust intervals for nearly
all data sets. DISP intervals may be calculated for both the base model solutions and base model
solutions with added constraints. Press the "Run" button in the Base Model Displacement
Method box to start DISP.

The DISP output is shown in Figure 25, along with guidance on interpreting the output. When
the DISP method is completed, two output files (*_DISPest.dat and *_DISP.txt) are saved in the
directory specified in the Output Folder box in the Data Files screen. The .dat file is in a concise
format intended for use by software rather than for viewing; it contains no labels, only numbers.
The .txt file is a very large text file with details about the models fitted and the resulting DISP
intervals.

Four files are output from DISP, one for each dQmax used. The user-provided output file
prefix is placed at the start of each file name and is denoted in this user guide by an asterisk (*)
(dQmax = 4, 8, 15, 25: *_DISPres1, *_DISPres2, *_DISPres3, *_DISPres4). Each file contains
a line with two numbers, followed by four lines of data. In the first line, the first value is an
error code: 0 means no error; 6 or 9 indicates that the run was aborted. If this first value is
non-zero, the DISP analysis results are considered invalid. The second value is the largest
observed drop of Q during DISP.

Below the first line is a four-line table that contains swap counts for each factor (columns) at
each dQmax level (rows): the first row is for dQmax = 4, the second for dQmax = 8, the third for
dQmax = 15, and the fourth for dQmax = 25. The swap counts are a key indicator of the stability
of a PMF solution; any swaps at dQmax = 4 (the first row of the table) indicate that the solution
should not be interpreted.
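For users who post-process the *_DISPres files, the header described above might be read as follows. This is a sketch under stated assumptions: it presumes plain whitespace-separated text (the error code and largest drop of Q on the first line, then four rows of swap counts with one column per factor), and the names `parse_disp_header` and `DispHeader` are hypothetical rather than part of EPA PMF.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DispHeader:
    error_code: int
    largest_q_drop: float
    swaps: List[List[int]]  # swaps[row][factor]; rows are dQmax = 4, 8, 15, 25

    @property
    def valid(self) -> bool:
        # A non-zero error code (e.g., 6 or 9) means the DISP run was aborted.
        return self.error_code == 0

    @property
    def swaps_at_smallest_dqmax(self) -> bool:
        # Any swap in the dQmax = 4 row means the solution should not be interpreted.
        return any(count > 0 for count in self.swaps[0])


def parse_disp_header(lines: List[str]) -> DispHeader:
    """Parse the first five non-empty lines of a *_DISPres file (assumed layout)."""
    rows = [line.split() for line in lines if line.strip()][:5]
    error_code = int(float(rows[0][0]))
    largest_q_drop = float(rows[0][1])
    swaps = [[int(float(v)) for v in row] for row in rows[1:5]]
    return DispHeader(error_code, largest_q_drop, swaps)
```

A script would stop as soon as `valid` is False, and treat `swaps_at_smallest_dqmax` as a reason not to interpret the solution.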
[Screenshot: DISP Summary panel on the Base Model DISP Results tab, with on-screen guidance for interpreting the DISP diagnostics]
Figure 25. Example of the Base Model Displacement Summary screen.


If factor swaps occur at the smallest dQmax, there is significant rotational ambiguity and the
solution is not sufficiently robust to be used. If the decrease in Q is greater than 1%, DISP results
should probably not be published unless the DISP analysis is redone after the true global
minimum of Q has been found. To improve the solution, the number of factors could be reduced,
marginal species could be excluded, or unusual events in the time series could be excluded.

Below these diagnostics, the *_DISPresX data files contain four blocks of data, in which each
column is a factor and each row is a species: (1) the profile matrix upper bound, in concentration
units; (2) the profile matrix lower bound, in concentration units; (3) the profile matrix upper
bound, in % species units; and (4) the profile matrix lower bound, in % species units. The DISPres
files are output directly from ME and are intended for users who want to process the output
themselves. The DISP results for a dQmax of 4 are summarized in an easy-to-use file:
*_BaseErrorEstimationSummary.
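The four blocks could be split into named bound matrices with a sketch like the one below. It assumes that the diagnostic header has already been skipped, that each block has exactly one whitespace-separated row per species and one column per factor, and that the caller knows the number of species; these are assumptions about the file layout, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

BLOCK_NAMES = ("upper_conc", "lower_conc", "upper_pct", "lower_pct")


def split_disp_blocks(data_lines: List[str], n_species: int) -> Dict[str, List[List[float]]]:
    """Split the data lines after the DISP diagnostics into four [species][factor] matrices."""
    rows = [[float(v) for v in line.split()] for line in data_lines if line.strip()]
    if len(rows) < 4 * n_species:
        raise ValueError(f"expected 4 blocks of {n_species} species rows, got {len(rows)} rows")
    return {name: rows[i * n_species:(i + 1) * n_species]
            for i, name in enumerate(BLOCK_NAMES)}


def interval(blocks: Dict[str, List[List[float]]], species: int, factor: int) -> Tuple[float, float]:
    """DISP interval (lower, upper) in concentration units for one profile element."""
    return blocks["lower_conc"][species][factor], blocks["upper_conc"][species][factor]
```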

5.8    Base Model BS Error Estimation

BS is used to detect and estimate disproportionate effects of a small set of observations on the
solution and also, to a lesser extent, effects of rotational ambiguity. BS data sets are constructed
by randomly sampling blocks of observations from the original data set. The block length
depends on the data set and is chosen so that each BS data set preserves any serial correlation
present in the base data set. Blocks of observations are randomly selected until the BS data set
is the same size as the original input data.
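Resampling by blocks can be sketched as a generic moving-block bootstrap: blocks of consecutive samples are drawn from random starting positions until the resampled set reaches the original size, and the last block is trimmed if it overshoots. This illustrates the technique only; it is not EPA PMF's exact sampling scheme, and `block_bootstrap` is a hypothetical name.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def block_bootstrap(samples: Sequence[T], block_size: int, rng: random.Random) -> List[T]:
    """Build one bootstrap data set from randomly chosen blocks of consecutive samples.

    Keeping samples together in blocks preserves short-range serial correlation.
    """
    n = len(samples)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and the number of samples")
    resampled: List[T] = []
    while len(resampled) < n:
        start = rng.randrange(n - block_size + 1)  # block must fit inside the record
        resampled.extend(samples[start:start + block_size])
    return resampled[:n]  # trim the last block so the size matches the input
```

Each element of `samples` would stand for one sample (a row of concentrations with its uncertainties), so resampled rows keep their within-block order.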


A number of BS data sets (e.g., 100) are then processed with PMF, and for each BS run, the BS
factors are compared with the base run factors as follows: each BS factor is mapped to the base
factor with which its contribution has the highest correlation, provided that this correlation
exceeds a user-specified threshold. If no base factor correlates above the threshold for a given
BS factor, that factor is considered "unmapped." This process is repeated for as many BS runs
as the user specifies. In some instances, multiple BS factors from the same run may be mapped
to the same base factor.
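The mapping rule can be sketched as follows: each BS factor's contribution series is correlated (Pearson r) with every base factor's contribution series, and the BS factor is assigned to the best-correlated base factor only if r exceeds the threshold. The sketch assumes the two sets of contributions are aligned on the same samples; the helper names are hypothetical, and 0.6 is used as the default because it is EPA PMF's default Minimum Correlation R-Value.

```python
import math
from typing import List, Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a constant series; treat as no match
    return sxy / math.sqrt(sxx * syy)


def map_bs_factors(bs_contribs: List[Sequence[float]],
                   base_contribs: List[Sequence[float]],
                   min_r: float = 0.6) -> List[Optional[int]]:
    """For each BS factor, the index of the base factor it maps to, or None if unmapped."""
    mapping: List[Optional[int]] = []
    for bs in bs_contribs:
        rs = [pearson_r(bs, base) for base in base_contribs]
        best = max(range(len(rs)), key=rs.__getitem__)
        mapping.append(best if rs[best] > min_r else None)
    return mapping
```

Because each BS factor is assigned independently, two BS factors from the same run can map to the same base factor, as noted above.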

EPA PMF then summarizes all the bootstrapping runs. The user should examine the BS results
to determine whether the base run value (blue square) is within the interquartile range (box) of
the BS profile values. Species whose base run value is outside the interquartile range should be
interpreted with caution, because a small set of observations may have affected the base run
results, or the species concentration in the factor may be insignificant. Ideally, the mapping of
BS factors to base factors is one-to-one; that is, each BS run factor should match exactly one,
and only one, base factor. However, it is likely that the presence (or absence) of a few critical
observations can dramatically affect a BS factor profile. In such instances, the affected BS factor
may closely match a particular base factor most of the time and some other base factor the rest
of the time. In addition, specifying too many factors in the base model may create a phantom
factor. For any factor mapped in approximately 80% or fewer of the BS runs, the major
contributing species in the profile should be investigated, and the base model results should be
further evaluated with the BS-DISP and DISP error estimation methods.
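The interquartile-range check can be expressed in a few lines. This sketch uses Python's `statistics.quantiles` with its default (exclusive) method; EPA PMF uses weighted percentiles, so the quartiles it reports may differ slightly, and `base_within_iqr` is a hypothetical name.

```python
import statistics
from typing import Sequence


def base_within_iqr(base_value: float, bs_values: Sequence[float]) -> bool:
    """True if the base-run value lies within the 25th-75th percentile range of the BS values."""
    q1, _median, q3 = statistics.quantiles(bs_values, n=4)  # default method="exclusive"
    return q1 <= base_value <= q3
```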

Initiating BS Runs

Bootstrapping captures the uncertainty associated with random errors in the data. It is initiated
under the Base Model tab, in the Base Model Runs screen (Figure 26, red box). As with the base
runs, the user must make several choices before initiating the BS runs:

•  Base Run - the base run used to map each BS run. The base run with the lowest
   Q(robust) is provided automatically; the user can enter another run number.
•  Block Size - the number of samples selected in each step of resampling. For
   example, a block size of three means that each BS block will comprise three consecutive
   samples from the input data set (e.g., samples 8-10 could be one block). The default block
   size is calculated according to Politis and White (2003) but can be overridden by the user. If
   the default has been overridden, the user can press the "Suggest" button to restore the
   default value.
•  Number of Bootstraps - the number of BS runs to be performed. It is recommended that
   100 BS runs be performed to ensure the robustness of the statistics; for preliminary
   analysis, 50 BS runs may be performed to quickly gauge the stability of a solution. A
   minimum of 20 BS runs is required.
•  Minimum Correlation R-Value - the minimum Pearson correlation coefficient used to
   assign a BS run factor to a base run factor. The default value is 0.6. If a large number of
   factors are unmapped, the user may want to investigate the impact of lowering the
   R-value; any such change should be reported with the final solution.
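The settings above can be collected in a small configuration object. This is a hypothetical sketch (EPA PMF is driven through its GUI, not through this code) that encodes only rules stated in this list: the default base run is the one with the lowest Q(robust), at least 20 BS runs are required, 100 are recommended, and the default minimum correlation is 0.6.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BootstrapSettings:
    base_run: int
    block_size: int
    n_bootstraps: int = 100  # recommended; about 50 for a quick preliminary look
    min_r: float = 0.6       # default minimum Pearson r for factor mapping

    def __post_init__(self) -> None:
        if self.n_bootstraps < 20:
            raise ValueError("a minimum of 20 BS runs is required")
        if self.block_size < 1:
            raise ValueError("a block must contain at least one sample")


def default_base_run(q_robust_by_run: Dict[int, float]) -> int:
    """Run number with the lowest Q(robust), offered as the default base run."""
    return min(q_robust_by_run, key=q_robust_by_run.__getitem__)
```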
After all input parameters have been entered, the BS runs can be initiated by pressing the "Run"
button inside the Base Model Bootstrap Method box. As with the base runs, the user can
interrupt the runs by pressing the "Stop" button in the lower right corner of the Base Model Runs
screen.  No outputs will be saved or overwritten if the run is interrupted.
[Screenshot: Base Model Runs screen showing the run settings, the Error Estimation section with the Base Model Displacement Method and Base Model Bootstrap Method boxes (Selected Base Run, Block Size with "Suggest" button, Number of Bootstraps, Min Correlation R-Value), and the Base Model Run Summary table (Run Number, Q (Robust), Q (True), Converged)]
       Figure 26. Example of the Base Model Runs screen highlighting the Base Model
       Bootstrap Method box.
5.8.1   Summary of BS Runs

A summary of the base model BS runs is presented in the Base Bootstrap Summary screen under
the Base Model Bootstrap Results tab (Figure 27), which appears only after BS has been run. The
first eight lines of this screen list the input parameters for bootstrapping, as specified by the user
in the Base Model Runs screen. The summary screen also includes several tables that summarize
the BS run results. The first table is a matrix showing how many BS factors were matched to each
base factor. The next table shows the minimum, maximum, median, and 25th and 75th percentiles
of the Q(robust) values. The rest of the summary describes the variability in each factor profile,
given as the mean, standard deviation, 5th percentile, 25th percentile, median, 75th percentile,
and 95th percentile, using weighted average percentiles (see equation 5-2). The base run value
of each profile is included as the first column for reference, as is a column indicating whether
the base run profile is within the interquartile range of the BS run profiles.
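The first two summary tables can be reproduced from per-run results, as in the sketch below. It assumes each BS run's mapping is available as a list giving, for each BS factor, the matched base-factor index or None; the extra "unmapped" column, the function names, and the use of `statistics.quantiles` are illustrative choices that may not match EPA PMF's output exactly.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def mapping_table(run_mappings: Sequence[Sequence[Optional[int]]], n_factors: int) -> List[List[int]]:
    """counts[i][j]: number of runs in which BS factor i mapped to base factor j.

    The extra last column counts runs in which BS factor i was unmapped.
    """
    counts = [[0] * (n_factors + 1) for _ in range(n_factors)]
    for mapping in run_mappings:
        for bs_factor, base_factor in enumerate(mapping):
            col = n_factors if base_factor is None else base_factor
            counts[bs_factor][col] += 1
    return counts


def q_summary(q_robust: Sequence[float]) -> Dict[str, float]:
    """Minimum, 25th percentile, median, 75th percentile, and maximum of BS Q(robust) values."""
    q1, med, q3 = statistics.quantiles(q_robust, n=4)
    return {"min": min(q_robust), "p25": q1, "median": med, "p75": q3, "max": max(q_robust)}
```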

EPA PMF also calculates the Discrete Difference Percentiles (DDP) associated with the BS runs
and reports these values in the Base Bootstrap Summary screen. This method estimates the 90th
and 95th percentile confidence intervals (CIs) around the base run profile, reported as
percentages. The DDP is calculated by taking the 90th and 95th percentiles of the absolute
differences between the base run and the BS runs for each species in each profile and
expressing them as a percentage of the base run value. If the DDP percentage is greater than
999, a "+" is displayed on screen; the original value is saved in the output files (*_diag and
*_boot). If the base run value for a species is zero, the DDP cannot be calculated; in these
cases, an asterisk (*) is displayed. The DDP values can be used for reporting the BS error
estimates.
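The DDP calculation for one species in one factor might look like this. The percentile rule is an assumption (`statistics.quantiles` with `method="inclusive"`, which interpolates linearly), and EPA PMF's percentile definition may differ. The display rules ("+" above 999% and "*" when the base value is zero) follow the description above; the function names are hypothetical.

```python
import statistics
from typing import Optional, Sequence, Tuple


def ddp(base_value: float, bs_values: Sequence[float]) -> Optional[Tuple[float, float]]:
    """(DDP90, DDP95) as percentages of the base value, or None when the base value is zero."""
    if base_value == 0:
        return None  # undefined; EPA PMF displays "*" in this case
    abs_diffs = [abs(v - base_value) for v in bs_values]
    cuts = statistics.quantiles(abs_diffs, n=100, method="inclusive")  # 99 cut points
    p90, p95 = cuts[89], cuts[94]
    return 100.0 * p90 / base_value, 100.0 * p95 / base_value


def ddp_display(value: float) -> str:
    """On-screen text: "+" when the DDP exceeds 999 percent."""
    return "+" if value > 999 else f"{value:.1f}"
```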

In this example, the base and BS factors are matched except for three factors, in three runs,
that were mapped to factor 7. The crustal (factor 4) and motor vehicle (factor 7) factors both
contain crustal elements, and the steel source was also mapped to three other sources, which
could be because BS did not create a data set containing all of the samples with high steel
production impacts. The total number of mapped factors may also not add up to the number of
BS runs if a BS run did not converge. Mapping of more than 80% of the factors indicates that the
BS uncertainties can be interpreted and that the number of factors may be appropriate.
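The 80% rule of thumb can be applied to the mapping counts as in this hypothetical helper; the threshold is the approximate guideline from the text, not a hard cutoff.

```python
from typing import List, Sequence


def flag_poorly_mapped(mapped_counts: Sequence[int], n_bs_runs: int,
                       threshold_pct: float = 80.0) -> List[int]:
    """Indices of base factors mapped in no more than threshold_pct percent of the BS runs.

    mapped_counts[j] is the number of BS runs in which a BS factor was mapped to base factor j.
    """
    return [j for j, count in enumerate(mapped_counts)
            if 100.0 * count / n_bs_runs <= threshold_pct]
```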
[Screenshot: Base Bootstrap Summary screen (Bootstrap Box Plots and Bootstrap Summary tabs) showing the bootstrap input parameters, the table of Boot Factors mapped to Base Factors 1-7, the Q(robust) percentile summary, and the variability statistics for the Secondary Sulfate factor]
                  Figure 27.  Example of the Base Bootstrap Summary screen.


5.8.2  Base Bootstrap Box Plots

The variability in BS runs is shown graphically in the Base Bootstrap Box Plots screen (Figure
28). Two graphs are presented: the variability in the percentage of each species (Figure 28, 1)
and the variability in the concentration of each species (Figure 28, 2), which corresponds to the
Variability in Factor Profiles table in the Base Bootstrap Summary screen. In both box plots, the
box (Figure 29) shows the interquartile range (25th-75th percentile) of the BS runs, the
horizontal green line represents the median BS run, and red crosses represent values outside the
interquartile range. The base run is shown as a blue box for reference. At the bottom of this
screen, the base run numbers are grayed out and not selectable; however, the base run used for
bootstrapping is highlighted in orange. The user can select the factor to view by clicking on the
factor number across the bottom of the screen. The Variability in Concentration of Species is
shown in the bottom plot. Species whose base run profile value (blue box) lies outside the
interquartile range (tan box) should be interpreted only after evaluating the two additional error
estimation results in PMF. These species have influential BS observations that biased either the
base or BS runs; DISP and BS-DISP will provide more reliable error estimates.
[Screenshot: Base Bootstrap Box Plots screen with two panels: (1) Variability in Percentage of Species - Run 12 - Factor 1; (2) Variability in Concentration of Species - Run 12 - Factor 1]
                  Figure 28. Example of the Base Bootstrap Box Plots screen.
[Diagram labels: base run value; 25th-75th percentile of bootstrap; median of bootstrap; values below 25th and above 75th percentiles]
                                Figure 29.  Diagram of box plot.


5.9    Base Model BS-DISP Error Estimation

BS-DISP estimates the errors associated with both random errors and rotational ambiguity, and it
is run from the Error Estimation section of the Base Model Runs screen. BS-DISP may take many
hours to run because of the number of combinations evaluated, so it is recommended that the
user first evaluate the BS-DISP results with fewer than 100 BS runs (50 is recommended); for
final BS-DISP results, use 100 BS runs.

BS-DISP is a combination of BS and the DISP method. The BS Error Estimation must be run
before BS-DISP because each BS resample undergoes a DISP analysis so that error limits are
found for all F (profile) factor elements. This process may be viewed as follows: each DISP
defines the span of rotationally accessible space, and each BS resample moves this space around
randomly in different directions. Taken together, all the replications of the rotationally
accessible space, in random locations, represent both the random uncertainty and the rotational
uncertainty.

The limits obtained by displacing a factor element include both rotational ambiguity and
variability due to input data uncertainty. To speed up BS-DISP computation, it is suggested that
only a small subset of all F factor elements be adjusted. Downweighted variables create a
special problem in DISP computations: if such variables are adjusted, the error intervals can be
very large (based on evaluations with simulated data). Errors for downweighted species are best
estimated from the results obtained by adjusting non-downweighted species.

BS-DISP reports the change in Q associated with the displacement. Occasionally, displacements
cause a significant decrease of Q, typically by tens or hundreds of units. If such a decrease
occurs in DISP or BS-DISP, the base case solution was in fact not the global minimum, although
it was assumed to be. The threshold for a significant change in Q is still being evaluated, but the
initial guidance is that a change in Q greater than 1% is significant. If the change in Q is greater
than 0.5%, it is recommended to increase the number of Base Model runs to 40 to find the global
minimum.
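The interim Q-change guidance can be written as a simple screening function. The 0.5% and 1% thresholds come from the text, which notes they are still being evaluated; the function name and the returned labels are hypothetical. The drop is compared in absolute value because the largest decrease of Q may be reported as a negative number.

```python
def classify_q_drop(q_base: float, largest_drop: float) -> str:
    """Screen the largest decrease of Q from DISP or BS-DISP against the base-run Q(robust).

    Returns "significant" (> 1%: the base run was probably not the global minimum),
    "rerun" (> 0.5%: increase the number of base model runs, e.g., to 40),
    or "acceptable".
    """
    pct = 100.0 * abs(largest_drop) / q_base
    if pct > 1.0:
        return "significant"
    if pct > 0.5:
        return "rerun"
    return "acceptable"
```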

A key output from DISP and BS-DISP analyses is the extent of factor swapping, which usually
results from a "not-well-defined" solution (i.e., a solution in which factor identities are fluid). A
sample BS-DISP output is shown in Figure 30, along with guidance on interpreting the output.
Starting from the most plausible solution, it is possible to transform the solution gradually,
without a significant increase of Q, so that factor identities change. In the extreme case, factors
may change so much that they exchange identities; this is called a factor swap. Physically, a
solution with swapped factors represents the same physical model as the original solution.
However, the presence of factor swaps means that all the intermediate solutions also exist and
must be considered as alternative solutions.

For a higher dQmax, a larger uncertainty interval, or confidence interval (CI), is usually
obtained. The larger the interval, the higher the chance that it contains the true unknown value.
The CI is displayed along with the profile values in the BS-DISP Box Plots tab. The appropriate
dQmax values are still being evaluated; a dQmax of 4 for DISP and of 0.5 for BS-DISP provide
lower bounds for the true uncertainty estimates if the input data uncertainties are reasonable.
Smaller dQmax values are used in BS-DISP than in DISP because the combination of
bootstrapping and DISP should capture nearly all the uncertainty within the solution. All dQmax
values should be evaluated to determine whether the solution is well defined.
[Screenshot: BS-DISP Summary panel on the Base Model BS-DISP Results tab, showing the five diagnostic values, the four-row swap table, and on-screen guidance for interpreting them]
                Figure 30. Example of the Base Model BS-DISP Summary screen.


Sample results from the BS-DISP Summary tab are shown in Figure 30 after using key species
from each of the sources (sulfate, potassium ion, total nitrate, silicon, zinc, iron, and EC).

The BS-DISP results in Figure 30 show that the solution does not have significant rotational
ambiguity and that the base model and error estimates can be interpreted. An absence of swaps
at all dQmax levels provides confidence that the solution is well constrained and that the
BS-DISP results can be reported.

If factor swaps are produced at dQmax = 0.5, then the number of factors in the solution, as well
as the BS and DISP results, should be evaluated before the BS-DISP results are reported. Because
BS-DISP is a combination of BS and DISP, it is suggested that the results of each component be
evaluated to understand what might be causing the swaps. Steps to reduce the number of swaps
include reducing the number of factors and adding constraints.

Four files are output from BS-DISP, one for each dQmax used (dQmax = 0.5, 1, 2, 4;
*_BSDISPres1, *_BSDISPres2, *_BSDISPres3, *_BSDISPres4); the user-provided output file prefix
is placed at the start of each file name and is denoted in this user guide by an asterisk (*). These
files contain the same summary diagnostics that are provided in the BS-DISP Summary tab. The
five values in the first line of diagnostics displayed within the EPA PMF program are:


   1.  k, the number of cases in the file. This includes both the full-data case and the accepted
       (not rejected) resamples; if all bootstrap cases were accepted, this value would equal
       one plus the number of bootstraps (the extra run is an initialization run). If no
       cases were excluded, k should equal the number of bootstraps times the number
       of factors times the number of species selected for BS-DISP.
   2.  Largest decrease of Q. A large value is not necessarily alarming, but it indicates that
       there was at least one resample in which a deeper minimum appeared. A decrease in Q
       of approximately 1% or more of Q(robust) is considered large; more testing is required
       to provide better guidance on this value.
   3.  Number of cases with a drop of Q.
   4.  Number of cases with a swap in the best fit.
   5.  Number of cases with a swap in DISP.
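A post-processing script might read the five diagnostic values and the swap table as follows. The layout (whitespace-separated values on the first non-empty line, then four rows of swap counts for dQmax = 0.5, 1, 2, 4) is inferred from the description of the BS-DISP files rather than from a file specification, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BsDispDiagnostics:
    n_cases: int
    largest_q_decrease: float
    n_q_drop: int
    n_swap_best_fit: int
    n_swap_disp: int
    swaps: List[List[int]]  # rows: dQmax = 0.5, 1, 2, 4; columns: factors

    def swapping_factors(self) -> List[int]:
        """1-based numbers of factors with swaps at the lowest dQmax (0.5)."""
        return [j + 1 for j, count in enumerate(self.swaps[0]) if count > 0]


def parse_bsdisp_header(lines: List[str]) -> BsDispDiagnostics:
    """Parse the diagnostic line and four-row swap table of a *_BSDISPres file (assumed layout)."""
    rows = [line.split() for line in lines if line.strip()][:5]
    first = rows[0]
    return BsDispDiagnostics(
        n_cases=int(first[0]),
        largest_q_decrease=float(first[1]),
        n_q_drop=int(first[2]),
        n_swap_best_fit=int(first[3]),
        n_swap_disp=int(first[4]),
        swaps=[[int(v) for v in row] for row in rows[1:5]],
    )
```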

Below the first line of diagnostics in the BS-DISP summary is a four-line table that contains
swap counts for each factor (columns) at each dQmax level (rows), in ascending order
(dQmax = 0.5, 1, 2, 4). In the best case, all of the swap counts are zero; however, the probability
of creating a BS data set that results in a swap depends on the data characteristics (e.g., peaks),
the number of BS runs, and the number of factors. The profiles and DISP results should be
evaluated to determine whether there is a reason for the swaps. A result with swaps between
two factors is more reliable than one with swaps across many factors. In this example, the
swaps occur between the crustal (factor 4) and steel production (factor 6) factors, which have
many elements in common. The swap count is one for each of these two factors, which indicates
some ambiguity between them.

The output files from BS-DISP contain many blocks of data following the diagnostics shown in
Figure 30. The first two blocks of data are the initial run data, with each row representing a
species and each column a factor. The last line of each block is always a series of "1"s as a
placeholder. There are four blocks of data for each BS resample: (1) the profile matrix for BS
resample #1 after displacing down, in concentration units; (2) the profile matrix for BS resample
#1 after displacing up, in concentration units; (3) the profile matrix for BS resample #1 after
displacing down, in % species; and (4) the profile matrix for BS resample #1 after displacing up,
in % species. These four blocks are then repeated for each BS resample. The BSDISPres files are
output directly from ME and are intended for users who want to process the output themselves.
The BS-DISP results for a dQmax of 0.5 are summarized in an easy-to-use file:
*_BaseErrorEstimationSummary.

5.10   Interpreting Error Estimate Results

A comprehensive set of error estimates is available, and the results are added to the summary
files (*_BaseErrorEstimationSummary, *_FpeakErrorEstimationSummary,
*_ConstrainedErrorEstimationSummary) for easy use after each error estimation method is run.
The summary files contain the species and diagnostics as well as the error estimates by factor
for concentrations, percent of species sum, and percent of total variable, if one is selected.

The error estimation information is summarized in the *_BaseErrorEstimationSummary file and
in the error estimation summary plot after each error estimation method is run. The
*_BaseErrorEstimationSummary file provides a useful summary of the factor error estimates:
Base Value, BS 5th, BS Median, BS 95th, BS-DISP 5th, BS-DISP Average, BS-DISP 95th, DISP Min,
DISP Average, and DISP Max. Figure 31 shows the error estimation summary plot for the three
error estimation methods.
[Screenshot: Error Estimation Summary tab for Run 12]
                          Figure 31.  Error estimation summary plot.


6.     Rotational Tools

In general, the non-negativity constraint alone is not sufficient to produce a unique solution. An
infinite number of plausible solutions may be generated and cannot be simply disqualified using
mathematical algorithms.  Rotating a given solution and evaluating how the rotated results fill
the solution space is one approach to reduce the number of solutions. Additional information,
such as known source contributions and/or source compositions, can also be used to reduce
the number of solutions and to determine whether one solution is more physically realistic than
other solutions.

Mathematically, a pair of factor matrices  (G and F) that can be transformed to another pair of
matrices (G* and F*) with the same Q-value is said to be "rotated." The transformation takes
place as shown in Equation 6-1:

$$G^{*} = G\,T \quad\text{and}\quad F^{*} = T^{-1}F \tag{6-1}$$

The T-matrix is a p x p non-singular matrix, where p is the number of factors. In PMF, this is
not strictly a rotation but rather a linear transformation of the G and F matrices. Because of the
non-negativity constraints in PMF, a pure rotation (i.e., a specific T-matrix) is possible only if
none of the elements of the new matrices are less than zero. If no rotation is possible, the
solution is unique. Therefore, approximate rotations that allow some increase in the Q-value
while preventing any elements of the solution from becoming negative are useful in PMF.
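Equation 6-1 can be checked numerically: for any non-singular T, G*F* = G T T^-1 F = GF, so the fitted values, and therefore Q, are unchanged; what T can change is whether any element becomes negative. The sketch below uses made-up 3 x 2 and 2 x 3 matrices and a 2 x 2 T with an explicit inverse to stay dependency-free; none of the values come from a real data set.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(t: Matrix) -> Matrix:
    (a, b), (c, d) = t
    det = a * d - b * c
    if det == 0:
        raise ValueError("T must be non-singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def is_nonnegative(m: Matrix) -> bool:
    return all(v >= 0 for row in m for v in row)


# Made-up contributions G (3 samples x 2 factors) and profiles F (2 factors x 3 species).
G = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]
F = [[4.0, 1.0, 0.0], [0.0, 1.0, 3.0]]
T = [[1.0, 0.2], [0.0, 1.0]]  # shifts part of factor 1's contributions into factor 2

G_star = matmul(G, T)                 # G* = G T
F_star = matmul(inverse_2x2(T), F)    # F* = T^-1 F
GF, GF_star = matmul(G, F), matmul(G_star, F_star)
same_fit = all(abs(x - y) < 1e-12 for r1, r2 in zip(GF, GF_star) for x, y in zip(r1, r2))
print(same_fit, is_nonnegative(G_star), is_nonnegative(F_star))  # prints: True True False
```

With this T the fit is preserved, but F* acquires a negative element, so the non-negativity constraint rules out this particular transformation.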

For some solutions, the non-negativity constraint is enough to ensure that there is little rotational
ambiguity in a solution.  If there are a sufficient number of zero values in the profiles  (F-matrix)
and contributions (G-matrix) of a solution, the solution  will not rotate away from the "real"
solution. However, in many cases, the non-negativity constraint is not sufficient to prevent
rotation away from the "real" solution.  To help determine whether an optimal solution has been
found, the user should inspect the G-space plots for selected pairs of factors in the original
solution. The current guidance is to select a regional source type such as coal-fired  power
plants (sulfate) and plot it against local industrial sources such as steel production (Fe).

6.1    Fpeak Model Run Specification

After the base run BS error estimates have been evaluated, rotations should be explored. Fpeak
runs are initiated by selecting "Rotational Tools," "Fpeak Rotation & Notes," and "Fpeak Model
Runs." The base run with the lowest Q(robust) is automatically selected by the program as the
run for Fpeak runs; this can be overridden by the user in the "Selected Base Run" box. The
user can perform up to five Fpeak runs by checking the appropriate number of boxes and
entering the desired strength of each Fpeak run. Although there are no limits on the values that
can be entered as Fpeak strengths (under "Selected Fpeak Runs"), values between -5 and 5
should generally be explored first. Positive Fpeak values sharpen the F-matrix and smear the
G-matrix; negative Fpeak values smear the F-matrix and sharpen the G-matrix. More details on
positive and negative Fpeak values can be found in Paatero (2000). The Fpeak strengths in
ME-2 are not the same as those in PMF2; values of around five times the PMF2 values are
needed to produce comparable results in ME-2. Additionally, an Fpeak value of 0 is not
allowed; EPA PMF will give the user an error message if 0 is entered in any Fpeak strength box.
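The input rules for Fpeak strengths can be summarized in a small validation helper. This is a hypothetical sketch, not part of EPA PMF: it encodes only what this section states (at most five Fpeak runs, 0 not allowed, values between -5 and 5 explored first) and the rough factor of five between PMF2 and ME-2 strengths.

```python
from typing import List, Sequence


def check_fpeak_strengths(strengths: Sequence[float]) -> List[str]:
    """Return warnings for a proposed list of Fpeak strengths; raise on invalid input."""
    if not 1 <= len(strengths) <= 5:
        raise ValueError("between one and five Fpeak runs can be specified")
    if any(s == 0 for s in strengths):
        raise ValueError("an Fpeak strength of 0 is not allowed")
    return [f"{s}: outside the usual first-pass range of -5 to 5"
            for s in strengths if abs(s) > 5]


def me2_equivalent(pmf2_strength: float) -> float:
    """Approximate ME-2 strength comparable to a PMF2 Fpeak value (about five times larger)."""
    return 5.0 * pmf2_strength
```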

Fpeak runs begin when the user presses the "Run" button on the Fpeak Model Runs screen.
Base run and BS run results will not be lost when Fpeak is run. After the Fpeak runs are
completed, a summary of the Fpeak results, with the same information contained in the Base
Model Run Summary table, is shown in the Fpeak Model Run Summary table (Figure 32, red
box). Additional results are displayed in the Fpeak Profiles/Contributions, Fpeak Factor
Fingerprints, Fpeak G-Space Plot, Fpeak Factor Contributions, and Fpeak Diagnostics screens;
these results should be used as a reference when evaluating the Fpeak runs. Fpeak is useful for
examining the span of possible rotations, with an end result of more values at or near 0 in either
the contributions or the profiles, depending on whether a positive or negative Fpeak is used.
Thus, DISP and BS-DISP with Fpeak forcing will yield shorter error estimation intervals,
potentially leading to incorrect interpretation of a solution.
[Screenshot: Fpeak Model Runs screen with the Selected Base Run and Selected Fpeak Runs boxes, the Fpeak Model Bootstrap Method settings (Number of Bootstraps, Minimum Correlation R-Value, Block Size), and the Fpeak Model Run Summary table (Q (Robust), %dQ (Robust), Q (True), Converged; red box).]
     Figure 32. Example of the Fpeak Model Run Summary on the Fpeak Model Runs screen.


6.1.1   Fpeak Results

The Fpeak Profiles/Contributions screen presents profile (Figure 33, 1) and contribution (Figure
33, 2) plots for Fpeak runs (by Fpeak strength value and factor) and for the selected base run.
In the profile graph, the concentration of species (left y-axis) is a green bar and the percent of
species (right y-axis) is an orange box.  For comparison, the original base run results are also
displayed on the profile graph.  The mass of the species (left y-axis) is a light gray bar and the
percent of species (right y-axis) is a dark gray box.  The contribution graph presents the time
series of factor contributions.  Factor contributions for the base model results are also displayed
(gray line). The Fpeak values are in the same order as entered on the Fpeak Model Runs
screen; the factors are in the same order as those in Base Model Results.  In these graphs,
users should look for deviations (i.e., increases or decreases in a particular species in a factor)
among Fpeak values and with the corresponding base run results.  Users can select an Fpeak
value and factor number by clicking on the desired number at the bottom of the screen. The
status bar (Figure 33, red box) in the Fpeak Profiles/Contributions screen displays the date and
contribution of the data point closest to the mouse position on the contribution graph. As the
mouse moves over the Factor Contributions plot, the status bar displays the date, concentration,
selected total variable, and factor. If no mass from the total variable is apportioned
to the factor, the graph is not shown and the GUI instead displays "Total Variable mass is 0 for
this run/factor."
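Looking for deviations between an Fpeak run and the base run amounts to comparing the two profiles species by species. A minimal sketch, assuming the profile of one factor is available for each run as a mapping from species name to concentration; the function name, the example values, and the 25% flagging threshold are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def profile_deviations(
    base: Dict[str, float],
    fpeak: Dict[str, float],
    threshold_pct: float = 25.0,
) -> List[Tuple[str, float, float]]:
    """Compare one factor's profile in an Fpeak run against the base run.

    Returns (species, change, percent change) for species whose value changes
    by more than threshold_pct percent, largest relative change first.
    """
    flagged = []
    for species, base_value in base.items():
        new_value = fpeak.get(species, 0.0)
        change = new_value - base_value
        if base_value > 0:
            pct = 100.0 * change / base_value
        else:
            # Species with a zero base value: any increase counts as a deviation.
            pct = float("inf") if new_value > 0 else 0.0
        if abs(pct) > threshold_pct:
            flagged.append((species, change, pct))
    return sorted(flagged, key=lambda item: abs(item[2]), reverse=True)


# Hypothetical profile values (concentration units) for one factor.
base_profile = {"Elemental Carbon": 0.40, "Zinc": 0.010, "Sulfate": 0.20}
fpeak_profile = {"Elemental Carbon": 0.42, "Zinc": 0.004, "Sulfate": 0.30}
for species, change, pct in profile_deviations(base_profile, fpeak_profile):
    print(f"{species}: {change:+.3f} ({pct:+.0f}%)")
```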
[Screenshot: Fpeak Profiles/Contributions screen showing the Fpeak Factor Profile (Fpeak = -0.5, Motor Vehicle) with the base run values for comparison (Figure 33, 1), the Fpeak Factor Contributions time series with the base run contributions (Figure 33, 2), and the status bar reporting the base and Fpeak contributions at the mouse position (red box).]
                Figure 33.  Example of the Fpeak Profiles/Contributions screen.


Fpeak Factor Fingerprints

The Fpeak Factor Fingerprints screen shows the concentration (in percent) of each species
contributing to each factor as a stacked bar chart (Figure 34). This plot can be used to verify
unique factor names and determine the distribution of the factors for individual species. Users
should look for deviations (i.e., increases or decreases in a particular species in a factor) among
Fpeak values and the corresponding base run results. The user can select an Fpeak value by
clicking on the desired number at the bottom of the screen.
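The stacked bars can be reproduced from the factor profiles. This sketch assumes that each bar shows, for one species, the share of that species' total profile concentration apportioned to each factor, so the shares sum to 100%; the data layout, function name, and values are hypothetical.

```python
from typing import Dict

Profiles = Dict[str, Dict[str, float]]  # factor -> species -> profile concentration


def fingerprint_percentages(profiles: Profiles) -> Dict[str, Dict[str, float]]:
    """Return species -> factor -> percent of that species' total profile mass.

    Each species' percentages sum to 100 (one stacked bar per species).
    """
    species_names = {s for profile in profiles.values() for s in profile}
    result: Dict[str, Dict[str, float]] = {}
    for species in sorted(species_names):
        total = sum(profile.get(species, 0.0) for profile in profiles.values())
        result[species] = {
            factor: 100.0 * profile.get(species, 0.0) / total if total > 0 else 0.0
            for factor, profile in profiles.items()
        }
    return result


# Hypothetical two-factor profiles.
example = {
    "Secondary Sulfate": {"Sulfate": 3.0, "Ammonium Ion": 1.0},
    "Motor Vehicle": {"Sulfate": 1.0, "Ammonium Ion": 1.0, "Elemental Carbon": 0.5},
}
for species, shares in fingerprint_percentages(example).items():
    print(species, {factor: round(pct, 1) for factor, pct in shares.items()})
```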
[Screenshot: Fpeak Factor Fingerprints stacked bar chart for Fpeak = -0.5, with the Fpeak value selector at the bottom of the screen.]
                 Figure 34. Example of the Fpeak Factor Fingerprints screen.


Fpeak G-Space Plot

As in the Base Model Results screen, the Fpeak G-Space Plot screen shows a scatter plot of
factors. The user assigns a factor to the x- and y-axes by selecting the desired factor from the
lists on the left of the screen (Figure 35, 1).  The Fpeak value to display, the base run G-space
plot ("Show Base"), and the delta in G-space plots between the base run and an Fpeak run
("Show Delta") are selected at the bottom of the screen (Figure 35, 2). When an Fpeak value is
selected in either the Fpeak Profiles/Contributions screen or the Fpeak G-Space Plot screen, it
is automatically selected in the other screen. The user can also select a point in any Fpeak
G-space plot by clicking on that point. The selected point will turn orange, and its date and x-y
values will be stored in the *_Fpeak_diag file. This feature helps the user identify and track
rotations. For example, if a G-space plot appears rotated, the user can mark the edge points.
Using information such as meteorological conditions or emissions information, the user can
determine whether these edge points are expected to have low contributions from the source.
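Marking edge points is done by clicking in the GUI, but the idea can be illustrated numerically. The sketch below is a hypothetical heuristic, not an EPA PMF feature: it normalizes two factors' contributions to an average of 1 (as in the G-space plots), takes the smallest y/x ratio as the slope of the lower edge, and lists the sample dates lying near that edge. The tolerance and minimum-x cutoff are arbitrary assumptions; candidate points would still need to be checked against meteorological or emissions information before any constraint is applied.

```python
from typing import List, Tuple


def normalize_to_mean_one(values: List[float]) -> List[float]:
    """Scale a contribution series so that its average is 1."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


def lower_edge_points(
    dates: List[str],
    x_contrib: List[float],
    y_contrib: List[float],
    tolerance: float = 0.1,
    min_x: float = 0.5,
) -> Tuple[float, List[str]]:
    """Hypothetical heuristic for marking candidate edge points in a G-space plot.

    Among points with a normalized x contribution of at least min_x, the
    smallest y/x ratio approximates the slope of the lower edge. Points whose
    ratio is within `tolerance` (relative) of that slope are returned. A slope
    clearly above 0 suggests the edge does not lie on the x-axis.
    """
    x_norm = normalize_to_mean_one(x_contrib)
    y_norm = normalize_to_mean_one(y_contrib)
    ratios = [(y / x, d) for d, x, y in zip(dates, x_norm, y_norm) if x >= min_x]
    if not ratios:
        return 0.0, []
    slope = min(r for r, _ in ratios)
    edge = [d for r, d in ratios if r <= slope * (1.0 + tolerance) + 1e-12]
    return slope, edge


# Hypothetical contributions for two factors on five sample dates.
dates = ["07/01/03", "07/04/03", "07/07/03", "07/10/03", "07/13/03"]
x = [1.0, 2.0, 1.0, 0.5, 0.5]
y = [0.4, 0.8, 1.5, 1.0, 0.7]
slope, edge = lower_edge_points(dates, x, y)
print(f"edge slope = {slope:.3f}; candidate edge points: {edge}")
```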
[Screenshot: Fpeak G-Space Plot screen with the Y Axis and X Axis factor lists (Figure 35, 1) and the Show Base, Show Delta, and Fpeak value selectors (Figure 35, 2); the example plot shows Motor Vehicle Contributions (avg=1) against Secondary Sulfate Contributions (avg=1) for Fpeak = -0.5.]
                   Figure 35. Example of the Fpeak G-Space Plot screen.


Fpeak Factor Contributions

The Fpeak Factor Contributions screen (Figure 36) shows two graphs. The top graph is a pie
chart that displays the distribution of each species among the factors resolved by PMF (Figure
36, 1). The species of interest is selected from the table on the left of the screen; the
categorization of that species is also displayed for reference. If a total variable was chosen by
the user under the Concentration/Uncertainty screen, that variable is boldfaced in the table.
The pie chart for the selected species appears on the right side of the screen. If the user has
specified a total variable, the distribution of this variable across the factors will be of particular
importance.  The user may also want to examine the distribution of certain key species, such as
toxic species, across factors.  The bottom graph shows the contribution of all the factors to the
total mass by sample (Figure 36, 2). The dotted orange reference lines denote January 1 of
each year. The graph is normalized so that the average of all the contributions for each factor
is 1.
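Normalizing each factor's contributions to an average of 1 is a rescaling that does not change the fit. A minimal sketch, assuming a factor is represented by its contribution series (one column of G) and its profile (the matching row of F), shows why: dividing one by the mean and multiplying the other by the same mean leaves every modeled concentration unchanged. The function name and values are illustrative.

```python
from typing import List, Tuple


def normalize_factor(
    contributions: List[float], profile: List[float]
) -> Tuple[List[float], List[float]]:
    """Rescale one factor so that its contributions average 1.

    The contribution series (one column of G) is divided by its mean and the
    profile (the matching row of F) is multiplied by the same mean, so every
    product g_ik * f_kj, and therefore the modeled data, is unchanged.
    """
    mean = sum(contributions) / len(contributions)
    return [g / mean for g in contributions], [f * mean for f in profile]


# Hypothetical factor: contributions on three samples, profile for two species.
g = [2.0, 4.0, 6.0]
f = [0.25, 0.05]
g_norm, f_scaled = normalize_factor(g, f)
print(g_norm, f_scaled)
```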

Fpeak Diagnostics

The Fpeak Diagnostics screen summarizes the Fpeak input parameters and output for
reference (e.g., Fpeak run summary, factor profiles and contributions, and samples that are
marked on the Fpeak G-space plot). All of the information on this screen is saved in *_Fpeak.
[Screenshot: Fpeak Factor Contributions screen showing the species table with categories and a pie chart of the distribution of PM2.5 across the factors for Fpeak = -0.5 (Figure 36, 1), and the Factor Contributions (avg=1) time series for Fpeak = -0.5 (Figure 36, 2).]
                 Figure 36.  Example of the Fpeak Factor Contributions screen.

6.1.2  Evaluating Fpeak Results

Fpeak runs should be viewed by the user as a means of exploring the full space of the chosen
PMF solution. Several aspects of the solution should be evaluated to understand how Fpeak
changes the PMF solution. Users should first examine the Q-values of the Fpeak runs
(available in the Fpeak Model Run Summary on the Fpeak Rotation & Notes -> Fpeak Model
Runs screen) to evaluate their increase from the base run Q-value. In a pure rotation, the
Q-value would not change because the rotation is simply a linear transformation of the original
solution that leaves the product of the contribution and profile matrices unchanged. However,
because of the non-negativity constraints of PMF, pure rotations are not usually possible, and
the rotations induced by Fpeak are approximate rotations, which change the Q-value. In general,
an increase in the Q-value due to the Fpeak rotation (dQ) of less than 5% of the base run
Q(robust) value is acceptable. The corresponding G-space plots of the Fpeak solution factors
should be examined to see whether points move toward the axes or toward lower/zero
contributions (Figure 37). Additionally, profiles and contributions should be examined to
determine the impact of the rotation.
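The rotation argument can be made concrete with a small numerical example. This sketch assumes the usual PMF objective function Q (the sum of squared, uncertainty-scaled residuals, here without the robust downweighting) and a hypothetical two-factor solution. Applying a transformation T to G and its inverse to F leaves Q unchanged, but some contributions become negative, which is why non-negativity turns Fpeak rotations into approximate rotations that raise Q. The last lines apply the 5% dQ guideline to hypothetical Q(robust) values; all data and names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def q_value(x: Matrix, u: Matrix, g: Matrix, f: Matrix) -> float:
    """Q = sum over samples i and species j of ((x_ij - (GF)_ij) / u_ij) ** 2."""
    model = matmul(g, f)
    return sum(
        ((x[i][j] - model[i][j]) / u[i][j]) ** 2
        for i in range(len(x))
        for j in range(len(x[0]))
    )


def percent_dq(q_new: float, q_base: float) -> float:
    """Percent increase of Q relative to the base run Q."""
    return 100.0 * (q_new - q_base) / q_base


# Hypothetical data: 3 samples x 2 species, 2 factors.
X = [[1.1, 1.0], [0.7, 1.3], [2.0, 1.1]]
U = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]
G = [[1.0, 0.2], [0.5, 1.0], [2.0, 0.1]]
F = [[1.0, 0.5], [0.2, 1.0]]

# A 2 x 2 transformation T and its inverse: (G T)(T^-1 F) = G F.
t = 0.3
T = [[1.0, -t], [0.0, 1.0]]
T_inv = [[1.0, t], [0.0, 1.0]]
G_rot = matmul(G, T)
F_rot = matmul(T_inv, F)

print("Q base:", q_value(X, U, G, F))
print("Q rotated:", q_value(X, U, G_rot, F_rot))  # same up to rounding
print("negative contributions after rotation:",
      [v for row in G_rot for v in row if v < 0])

# Hypothetical Q(robust) values for a base run and an Fpeak run.
q_base_robust, q_fpeak_robust = 5953.3, 6100.0
pct = percent_dq(q_fpeak_robust, q_base_robust)
print(f"%dQ = {pct:.2f}%; within 5% guideline: {pct < 5.0}")
```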
[Screenshot: Fpeak G-Space Plot with the Y Axis and X Axis factor lists, Show Base and Show Delta selected, and Secondary Sulfate Contributions (avg=1) on one axis.]
       Figure 37. G-Space plot and delta between the base run contribution and Fpeak
       run contribution for each contribution point.

6.2    Constrained Model Operation

Source composition and contribution knowledge can be used to constrain a model run. For
example, if a source is known to be inactive for a certain period, there should be no
contributions from the factor that represents that source during the inactive time period. The
contributions can be set to zero or pulled toward zero, and PMF reports the resulting penalty in
Q, that is, the increase in Q incurred by moving the contributions from the optimal solution to one
based on external knowledge. In another example, if a source profile from a nearby facility has
been quantified, the user could constrain the profile of the factor that represents that facility
type to match the measured profile. The amount of Q allowed for a constraint depends on the
data set; however, the current recommended maximum is 5% of Q(robust). When the user enters
a % dQ, PMF automatically calculates the corresponding amount of Q. Applications of using
constraints are discussed in greater detail elsewhere (Morris et al., 2009; Paatero et al., 2002;
Paatero and Hopke, 2008; Rizzo and Scheff, 2007).
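The conversion between % dQ and dQ is a simple proportion. The sketch below assumes that a % dQ entry is interpreted relative to the base run Q(robust), as the 5% guideline suggests; the function names and the Q(robust) value are hypothetical, and PMF performs this calculation itself when a % dQ is entered.

```python
from typing import List

RECOMMENDED_MAX_PCT = 5.0  # current recommended maximum, as % of Q(robust)


def dq_from_percent(percent_dq: float, q_robust: float) -> float:
    """Absolute dQ for a % dQ entry (assumed relative to the base run Q(robust))."""
    return q_robust * percent_dq / 100.0


def limits_above_max(percents: List[float], max_pct: float = RECOMMENDED_MAX_PCT) -> List[int]:
    """Indices of constraints whose % dQ exceeds the recommended maximum."""
    return [i for i, p in enumerate(percents) if p > max_pct]


q_robust = 6000.0                 # hypothetical base run Q(robust)
percents = [0.5, 0.5, 2.0, 7.5]   # hypothetical % dQ entries, one per constraint
print([dq_from_percent(p, q_robust) for p in percents])
print("above recommended maximum:", limits_above_max(percents))
```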

6.2.1   Constrained Model Run Specification

The Constrained Model Runs screen is used to specify constraints based on a priori information
in two ways: (a) creating constraints using the Expression Builder and
(b) specifying constraint points from the base model results and the Constraints table. Starting
with a selected base run, two types of constraints can be applied: (1) "hard pulling," which is
imposed without regard to the change in the Q-value (e.g., a specific factor element in either the
profile or the contribution matrix is set to zero, given a lower and upper limit, or fixed to its
original value), or (2) "soft pulling," which allows only a limited change in the Q-value (e.g., an
element or expression of elements is pulled up maximally, pulled down maximally, or pulled to a
target value).

The Expression Builder has three radio buttons that users can select to define constraints as
a constant ratio (Figure 38), a mass balance (Figure 39), or a customized expression (Figure 40).

•  Ratio (Figure 38) - Select a factor and two different species from the lists, and input the
   ratio in the "Value" text box.
•  Mass Balance (Figure 39) - Select and add one or more factor-species pairs to the text
   boxes on both sides of the equal sign under "Mass Balance" to set the balance equation. If
   needed, a number can be entered in the "Coefficient" text box; it will be used as the
   coefficient for the selected species. Click the "Clear" buttons to remove the current
   specifications of the balance equation.
•  Custom (Figure 40) - Specify a constraint by creating a customized equation. The
   customized equation can be based on either profiles (with species as elements) or
   contributions (with samples as elements). The custom equation must follow the same
   structure as the equations generated by the Expression Builder.
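Ratio and mass-balance constraints are both linear relationships among profile elements, which can be illustrated with a hypothetical representation. This sketch does not reproduce the standardized text format used by the Expression Builder; it assumes each term is a (coefficient, factor, species) triple, rewrites a ratio f_a / f_b = r as f_a - r * f_b = 0 (which avoids dividing by a possibly zero element), and reports how far a given profile is from satisfying each equation. Conceptually, a soft pull moves such a residual toward zero within the allowed dQ.

```python
from typing import Dict, List, Tuple

Profiles = Dict[str, Dict[str, float]]  # factor -> species -> profile value
Term = Tuple[float, str, str]           # (coefficient, factor, species)


def side_value(terms: List[Term], profiles: Profiles) -> float:
    """Weighted sum of the profile elements on one side of an equation."""
    return sum(coef * profiles[factor][species] for coef, factor, species in terms)


def balance_residual(left: List[Term], right: List[Term], profiles: Profiles) -> float:
    """Left side minus right side; 0 means the equation holds exactly."""
    return side_value(left, profiles) - side_value(right, profiles)


def ratio_as_balance(
    factor: str, numerator: str, denominator: str, value: float
) -> Tuple[List[Term], List[Term]]:
    """Rewrite f[numerator] / f[denominator] = value as f[numerator] = value * f[denominator]."""
    return [(1.0, factor, numerator)], [(value, factor, denominator)]


# Hypothetical profile values for one factor.
profiles = {"Crustal": {"Silicon": 0.9, "Aluminum": 0.3, "Calcium": 0.2}}

left, right = ratio_as_balance("Crustal", "Silicon", "Aluminum", 3.0)
print("ratio residual:", balance_residual(left, right, profiles))

# Hypothetical mass balance: Calcium = 0.5 * Silicon within the Crustal factor.
print("mass balance residual:",
      balance_residual([(1.0, "Crustal", "Calcium")], [(0.5, "Crustal", "Silicon")], profiles))
```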

For each of the three Expression Builder functions, after the user defines a constraint and
presses the "Add to Expressions" button, the corresponding equation in a standardized format
will appear in the Expressions table (Figure 41, red box). Because the constraints defined using
the Expression Builder are "soft pulling," a limit of change in the Q-value must be specified. A
default value (% dQ = 0.5) is provided in the Expressions table, which users can update
if needed. Users can also delete the selected constraints or all constraints by
pressing the "Remove Selected Expressions" or "Remove All Expressions" buttons at the
bottom of the Expressions table.

Both source profiles and contributions can be constrained; the user can identify the elements to
be constrained in three graphs:

•  On the Base Model -> Base Model Results -> Profiles/Contributions screen, left-click on the
   top graph to highlight the bar for the species to be constrained, then right-click the bar and
   select "Toggle Constraints" (Figure 42, 1).
•  On the Base Model -> Base Model Results -> Profiles/Contributions screen, left-click on
   the bottom graph to select one data point or drag a square to select multiple data points,
   then right-click the data point(s) and select "Toggle Constraints" (Figure 42, 2).
•  On the Base Model -> Base Model Results -> Base G-Space Plot screen, left-click to
   select one data point or drag a square to select multiple data points, then right-click the data
   point(s) and select "Pull to X-Axis" or "Pull to Y-Axis" (Figure 43). The user can also select
   multiple data points by holding down the CTRL key.
[Screenshot: Expression Builder with the Ratio option selected, showing the Factor, Species (numerator), and Species (denominator) lists, the Value box, the Add to Expressions button, and the Expressions table (Expression, dQ, % dQ) with the Remove Selected Expressions and Remove All Expressions buttons.]
                                 Figure 38.  Expression Builder - Ratio.
[Screenshot: Expression Builder with the Mass Balance option selected, showing the Factor and Species lists, the Coefficient box, the Add to Left Side and Add to Right Side buttons, the Clear buttons, the Add to Expressions button, and the Expressions table.]
                            Figure 39. Expression Builder - Mass Balance.
[Screenshot: Expression Builder with the Custom option selected, showing the profile/contribution choice, the Factor and Species lists, the Add Factor/Element and Add to Expressions buttons, and the Expressions table (Expression, dQ, % dQ).]
                                  Figure 40.  Expression Builder - Custom.
[Screenshot: Constraints screen with the Expression Builder (Ratio option), the Expressions table (Expression, dQ, % dQ; red box), and the Constraints table (Factor, Element, Type, Value, dQ, % dQ), each with Remove Selected and Remove All buttons.]
              Figure 41.  Example of expressions on the Constrained Model Runs screen.
[Screenshot: Base Model Results -> Profiles/Contributions screen for Run 12, showing the Factor Profile for Secondary Sulfate (Figure 42, 1) and the Factor Contributions time series for Secondary Sulfate (Figure 42, 2).]
                   Figure 42.  Selecting constrained species and observations.


As discussed in Section 6.1.2, G-space plots of PMF solutions are evaluated to find edges that
indicate rotational ambiguity and to determine whether there are rotations in the solution. If users
identify an edge in a G-space plot, constraints can be specified to pull the data points along the
edge toward the axis (i.e., toward zero). The user should examine the points along the edge; if
there is any a priori information that indicates that a value should be zero (e.g., the source
that the factor represents was inactive during a given time), the point should be pulled using the
associated constraints. The strength of each pull is controlled by specifying a limit on the
change in the Q-value. If the user wishes to perform a weak pull, a small limit on the change in
the Q-value should be allowed. Conversely, if the user wishes to perform a strong pull, a large
limit on the change in the Q-value should be allowed. The strength of the pull should be based
on a priori information about the pollutant sources that indicates that the contribution for the
given sample should be zero. The user can select as many points, in as many factors, as
needed.

[Screenshot: Base G-Space Plot for Run 12 with the Y Axis and X Axis factor lists; points along an edge are selected and the right-click option to constrain them and move them to the y-axis is shown; axis label: Secondary Sulfate Contribution (avg=1).]
         Figure 43. Example of selecting points to pull to the y-axis in the G-space plot.


After the constraint points are defined in any of the three graphs described above, the
Constraints table will appear on the Rotational Tools -> Constraints screen, showing one
constraint in each row (Figure 44, yellow box). Users then need to select one of the six
constraint types in the pull-down list (column "Type"):

•  Pull Down Maximally - A factor element is pulled down maximally given a limit of change in
   the Q-value; users can update the default dQ-value.
•  Pull Up Maximally - A factor element is pulled up maximally given a limit of change in the
   Q-value; users can update the default dQ-value.
•  Pull to Value - A factor element is pulled to a target value given a limit of change in the
   Q-value (default % dQ = 0.5); users need to input the target value in the "Value" column.
•  Set to Zero - A factor element is forced to equal zero, with no limit of change in the
   Q-value.
•  Set to Original Value - A factor element is fixed to its original value, with no limit of change
   in the Q-value.
•  Define Limits - A factor element is given a lower and upper limit; users need to input the
   "low/high" limits in the "Value" column.
[Screenshot: Constraints screen showing the Expression Builder and Expressions table; the Constraints table (Figure 44, yellow box) listing four Industrial factor contribution points, each set to Pull Down Maximally; the Constrained Model Run box with Selected Base Run 1 and the summary columns dQ (Robust), Q (Robust), %dQ (Robust), Q(Aux), Q (True), and Converged (Figure 44, red box); the Constrained Model Bootstrap Method settings (Number of Bootstraps: 20, Minimum Correlation R-Value: 0.6, Block Size); the Constrained Model BS-DISP and Displacement Method boxes; and the Run Progress box.]
                Figure 44. Example of the Constrained Model Run summary table.
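The six constraint types described above differ in whether they are soft pulls (limited by dQ) or hard pulls, and in what the Value column must contain. The sketch below records one row of the Constraints table as a hypothetical data structure and checks those rules; the class and field names are assumptions, not the EPA PMF file format, and treating a % dQ on a hard pull as an error is a design choice of this sketch only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class ConstraintType(Enum):
    PULL_DOWN_MAXIMALLY = "Pull Down Maximally"
    PULL_UP_MAXIMALLY = "Pull Up Maximally"
    PULL_TO_VALUE = "Pull to Value"
    SET_TO_ZERO = "Set to Zero"
    SET_TO_ORIGINAL_VALUE = "Set to Original Value"
    DEFINE_LIMITS = "Define Limits"


# Soft pulls are limited by a change in Q; the other types are hard pulls.
SOFT_TYPES = {
    ConstraintType.PULL_DOWN_MAXIMALLY,
    ConstraintType.PULL_UP_MAXIMALLY,
    ConstraintType.PULL_TO_VALUE,
}


@dataclass
class ConstraintRow:
    """Hypothetical model of one Constraints table row."""
    factor: str
    element: str                                   # species (profile) or sample date (contribution)
    ctype: ConstraintType
    value: Optional[float] = None                  # target for Pull to Value
    limits: Optional[Tuple[float, float]] = None   # (low, high) for Define Limits
    percent_dq: Optional[float] = None             # only used by soft pulls in this sketch

    def validate(self) -> None:
        soft = self.ctype in SOFT_TYPES
        if soft and (self.percent_dq is None or self.percent_dq <= 0):
            raise ValueError("soft pulls need a positive % dQ limit")
        if not soft and self.percent_dq is not None:
            raise ValueError("hard pulls ignore the change in Q; no % dQ expected here")
        if self.ctype is ConstraintType.PULL_TO_VALUE and self.value is None:
            raise ValueError("Pull to Value needs a target in the Value column")
        if self.ctype is ConstraintType.DEFINE_LIMITS:
            if self.limits is None or self.limits[0] > self.limits[1]:
                raise ValueError("Define Limits needs low <= high")


rows = [
    ConstraintRow("Industrial", "08/08/01 00:00", ConstraintType.PULL_DOWN_MAXIMALLY, percent_dq=0.5),
    ConstraintRow("Crustal", "Zinc", ConstraintType.SET_TO_ZERO),
    ConstraintRow("Industrial", "Zinc", ConstraintType.DEFINE_LIMITS, limits=(0.01, 0.05)),
]
for row in rows:
    row.validate()
    kind = "soft" if row.ctype in SOFT_TYPES else "hard"
    print(row.factor, row.element, row.ctype.value, kind)
```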


Note that the constraints defined through the Expression Builder or "Constrain
Points" are specific to a selected base run. If users input another run number as the "Selected
Base Run" under Constrained Model Run, all constraints associated with the previous base run
will be removed from the Expressions and Constraints tables.

After specifying all constrained model parameters, the user should press the "Run"
button in the Constrained Model Run box to initiate the constrained model run. Once the
run is initiated, the "Run Progress" box in the lower right corner of the screen activates, and the
constrained model run can be terminated at any time by pressing the "Stop" button. No
information about the constrained model runs will be saved or displayed if the runs are stopped.
When the constrained model run is completed, the summary table shows dQ, Q(robust),
% dQ(robust), Q(Aux), and Q(true), as well as whether the run converged (Figure 44, red box).
Five new tabs with constrained model run results will appear: Constrained
Profiles/Contributions, Constrained Factor Fingerprints, Constrained G-Space Plot, Constrained
Factor Contributions, and Constrained Diagnostics.

The % dQ (robust) value needs to be evaluated based on the amount of dQ that was used in
the constraint(s). The % dQ (robust) shows the increase in Q due to the constraint(s). An
increase in dQ of up to 1% for all of the constraints combined may be acceptable; however, the
interpretation of the factor profiles, contribution time series, and error estimation results is also
critical. The Profiles/Contributions tab provides both the base and constrained factor profiles
as well as the base and constrained factor time series. Evaluate all of the plots for all factors
to understand the impact of the constraints and determine whether the constraints have provided
a more interpretable solution.

Typically, species contributions to factors fall into two categories: (1) stiff, in that they do not
change significantly or, if they are constrained, produce unreasonable profiles; and (2) weak,
in that they move easily and are typically not well modeled by PMF. Understanding which key
tracer species for each source are stiff and which are weak allows the solution to be optimized
using measured profiles or other information. Weak species should be interpreted as easily
moved between sources, while stiff species are strongly associated with the factor and should
be used in the interpretation of its source.

6.2.2  Constrained Profiles/Contributions Results

The Constrained Profiles/Contributions screen (Figure 45) shows factor profile and contribution
graphs in the same format as those on the Fpeak Profiles/Contributions screen. The mass and
percentage of species and the time series of factor contributions are presented for both the
constrained model run and the selected base run. The user should look for deviations in the
results between the two model runs to examine the impact of the constraints.
[Screenshot: Constrained Profiles/Contributions screen showing the Constrained Factor Profile for Secondary Sulfate with the base run values, and the Constrained Factor Contributions time series for Secondary Sulfate.]
              Figure 45. Example of the Constrained Profiles/Contributions screen.
Constrained Factor Fingerprints

The Constrained Factor Fingerprints screen shows the concentration (in percent) of each
species contributing to each factor as a stacked bar chart (Figure 46). This plot can be used to
verify unique factor names and determine the distribution of the factors for individual species.
Users should look for deviations (i.e., increases or decreases in a particular species in a factor)
between the run with the specified constraint(s) and the corresponding base run results.
[Screenshot: Constrained Factor Fingerprints stacked bar chart with a Factor Legend listing Secondary Sulfate, Potassium & Biomass, Secondary Nitrate, Crustal, Industrial, Steel Production, and Motor Vehicle.]
              Figure 46.  Example of the Constrained Factor Fingerprints screen.


Constrained G-Space Plot

The Constrained G-Space Plot (Figure 47) presents the scatter plot of factor contributions for
the constrained model run. Similar to the Fpeak G-Space Plot screen, the user can select
"Show Base" to display the base run G-space plot and select "Show Delta" to display the
difference in G-space plots between the constrained model run and the base run.

Constrained Factor Contributions

The Constrained Factor Contributions screen (Figure 48) shows two graphs. The top graph is a
pie chart that displays the distribution of each species among the factors resolved by PMF
(Figure 48, 1). The species of interest is selected from the table on the left of the screen; the
categorization of that species is also displayed for reference. If a total variable was chosen by
the user on the Concentration/Uncertainty screen, that variable is boldfaced in the table.
The pie chart for the selected species appears on the right side of the screen. If the user has
specified a total variable, the distribution of this variable across the factors will be of particular
importance. The bottom graph shows the contribution of all the factors to the total mass by
sample (Figure 48, 2). The dotted orange reference lines denote January 1 of each year. The
graph is normalized so that the average of all the contributions for each factor is 1.
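Because PMF models the data matrix as X ≈ GF, the contributions of each factor can be rescaled so that they average 1, provided that factor's profile is multiplied by the same mean; every product g_ik·f_kj is then unchanged. The sketch below shows that normalization and a per-factor breakdown of one species of the kind a pie chart displays. It assumes the pie chart shows each factor's average contribution to the species and its percentage of the total; the function names are hypothetical and this is not the EPA PMF implementation.

```python
def normalize_contributions(G, F):
    """Rescale G (samples x factors) so each factor's contributions average 1,
    and multiply each row of F (factors x species) by the same factor mean,
    so every product g_ik * f_kj (and therefore G F) is unchanged."""
    n_samples = len(G)
    n_factors = len(F)
    means = [sum(row[k] for row in G) / n_samples for k in range(n_factors)]
    G_norm = [[row[k] / means[k] for k in range(n_factors)] for row in G]
    F_scaled = [[value * means[k] for value in F[k]] for k in range(n_factors)]
    return G_norm, F_scaled


def species_distribution(G, F, species_index):
    """Average contribution of each factor to one species and its share
    (percent) of the modeled total for that species."""
    n_samples = len(G)
    avg = [
        sum(row[k] for row in G) / n_samples * F[k][species_index]
        for k in range(len(F))
    ]
    total = sum(avg)
    return avg, [100.0 * a / total for a in avg]


# Hypothetical example: 3 samples, 2 factors, 2 species.
G = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0]]
F = [[0.5, 0.1], [1.0, 0.3]]
G_norm, F_scaled = normalize_contributions(G, F)
avg, pct = species_distribution(G_norm, F_scaled, species_index=0)
```

Normalizing G rather than F keeps the profiles in concentration units, which is why the contribution plots can share a common "avg = 1" axis.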
  [Screenshot: Constrained G-Space Plot screen with factor selection lists for the Y and X axes
  (Secondary Sulfate, Secondary Nitrate, Crustal, Industrial, Steel Production, Motor Vehicle), a
  scatter plot with x-axis "Secondary Sulfate Contributions (avg=1)", and "Show Base" and
  "Show Delta" options]
                 Figure 47. Example of the Constrained G-Space Plot screen.


Constrained Diagnostics

The Constrained Diagnostics screen (Figure 49) includes a summary of the constrained model
parameters and output for reference (e.g., constraint types, constrained model run summary
table, factor profiles, and factor contributions). All of the information on this screen is saved in
*_Constrained files.

Constrained BS-DISP and DISP Runs

The BS-DISP and DISP error estimation for the constrained model can be performed in the
same manner as the error estimations for the base run. DISP run output files will be saved in
the directory specified in the Output Folder box on the Data Files screen. The DISP and BS-DISP
files are saved as *_ConstrainedDISPres# and *_ConstrainedBSDISPres#, respectively.
  [Screenshot: Constrained Factor Contributions screen with species table (species and category),
  pie chart of the factor contributions to the selected species, and time series of factor
  contributions (avg = 1)]
              Figure 48. Example of the Constrained Factor Contributions screen.


Constrained BS Runs and Results

A constrained model run can be bootstrapped in the same manner as base model runs. After a
constrained model run is completed, the user can initiate a BS run for the constrained model in
Constrained Model Bootstrapping. The constrained bootstrapping results are displayed in
Constrained Bootstrap Box Plots and Constrained Bootstrap Summary in the same format as
the Base Run bootstrapping output screens for easy comparison. The BS files are saved as
*_Gcon_profile_boot.
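Bootstrapping reruns PMF on data sets built by resampling the original samples. The sketch below draws one resampled data set by concatenating randomly placed blocks of consecutive samples (a moving-block bootstrap). The block approach, the block size, and the function names are assumptions for illustration, and the PMF fit to each resampled data set is not shown.

```python
import random


def block_bootstrap_indices(n_samples, block_size, rng):
    """Return n_samples row indices built from randomly placed blocks of
    consecutive samples (moving-block bootstrap)."""
    if not 1 <= block_size <= n_samples:
        raise ValueError("block_size must be between 1 and n_samples")
    indices = []
    while len(indices) < n_samples:
        start = rng.randrange(n_samples - block_size + 1)
        indices.extend(range(start, start + block_size))
    return indices[:n_samples]  # trim the last block if it overshoots


def resample(rows, block_size, seed=None):
    """Build one bootstrap data set from the rows of a concentration or
    uncertainty matrix. Using the same seed for both matrices selects the
    same rows, so concentrations and uncertainties stay paired."""
    rng = random.Random(seed)
    idx = block_bootstrap_indices(len(rows), block_size, rng)
    return [rows[i] for i in idx]
```

Resampling blocks rather than single samples is meant to keep some of the serial correlation of time-ordered data within each bootstrap data set.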

6.2.3  Evaluating Constraints Results

Constraints can be used to reduce rotational ambiguity, to refine a solution, and to understand
both stiff and weak factor species. All factors and source contribution time series must be
evaluated to understand the impact of the constraint(s). In addition, the error estimation results
need to be evaluated to determine whether the constraint has significantly changed the species'
contributions to the factor. Guidance on constraints will continue to be developed as PMF is
applied to more data sets; the Training Exercises in Section 8 provide more examples of how to
interpret the results.
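One way to quantify the impact of a constraint is to compare the constrained run with the base run: the percent change in Q, and whether each constrained profile value falls outside the interval estimated for the base run by the error estimation methods. The sketch below assumes these numbers have already been read from the output files; the function names and the interval format are hypothetical, and no acceptance threshold is implied.

```python
def percent_dq(q_base, q_constrained):
    """Percent change in Q caused by the constraints (dQ / Q_base * 100)."""
    return 100.0 * (q_constrained - q_base) / q_base


def outside_intervals(constrained_profile, intervals):
    """Return species whose constrained value lies outside the base run's
    (low, high) error-estimation interval.

    constrained_profile: {species: value} for one factor
    intervals: {species: (low, high)} for the same factor, e.g. from BS-DISP
    """
    flagged = []
    for species, value in constrained_profile.items():
        low, high = intervals[species]
        if value < low or value > high:
            flagged.append(species)
    return flagged
```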
  [Screenshot: Constrained Diagnostics screen showing the time of run, configuration file, base
  model run number and random seed, the constraint expressions (Pull Down Maximally constraints
  on the Secondary Sulfate factor for individual samples), the constrained run summary table
  (dQ(Robust), Q(Robust), Q(Aux), Q(True), convergence, steps), and factor profiles]
                   Figure 49.  Example of the Constrained Diagnostics screen.
7.     Troubleshooting

Common problems in EPA PMF 5.0, including the error messages generated by the GUI and
the action the user should take to correct the problem, are detailed in Table 3. If a problem
cannot be resolved using the following information, send an email to
NERL_RM_Support@epa.gov.
                         Table 3. Common problems in EPA PMF 5.0.

Problem: Cannot run base runs
  Error message: "Access to the path 'C:\Program Files\EPA PMF 5.0\PMFData.txt' is denied.
  Please close all output files."
  Action: Turn off User Account Control in Microsoft Windows Vista.

Problem: Column headers of concentration and uncertainty files do not match
  Error message: "Species names in uncertainty file do not match those in concentration file.
  Do you wish to continue?"
  Action: If the names are correct, continue. If the columns are in a different order, correct
  and retry.

Problem: Number of columns in concentration file is not the same as in uncertainty file
  Error message: "Number of species in uncertainty file does not match the number of species in
  concentration file."
  Action: Select "OK" and examine input files. The same number of columns, in the same order,
  should be included in the concentration and uncertainty files. If named ranges are used, check
  that the ranges are defined correctly.

Problem: Number of rows in concentration file is not the same as in uncertainty file
  Error message: "Dates/times in uncertainty file do not match those in concentration file."
  Action: Select "OK" and examine input files. The same number of rows, sorted by date/time,
  should be included in the concentration and uncertainty files. If named ranges are used, check
  that the ranges are defined correctly.

Problem: Blank cells are included in concentration file
  Error message: "Empty cells are not permitted in the concentration input file. Please check
  your data file."
  Action: Select "OK" and remove blank cells from the input file before trying again.

Problem: Blank cells, zero values, or negative values are included in uncertainty file
  Error message: "Null, zero, and negative uncertainty values are not permitted. Please check
  your data file."
  Action: Select "OK" and remove inappropriate cells from the input file before trying again.

Problem: Cannot save output files because one is open
  Error message: "The process cannot access the file 'file path and name' because it is being
  used by another process. Please close all output files."
  Action: Close the file and select "Retry", or select "Cancel" to change the file path and
  name.
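Most of the input-file problems in Table 3 can be caught before the files are loaded. The sketch below reads a concentration and an uncertainty CSV file with Python's standard csv module and applies the checks listed in the table. It assumes the first row holds species names and the first column holds the date/time; that layout and the function names are assumptions for illustration.

```python
import csv


def read_csv(path):
    """Read a CSV file into a list of rows (lists of strings)."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def check_tables(conc, unc):
    """Return a list of problems found in a concentration/uncertainty pair.

    conc, unc: lists of rows; row 0 = header, column 0 = date/time.
    """
    problems = []
    conc_species, unc_species = conc[0][1:], unc[0][1:]
    if len(conc_species) != len(unc_species):
        problems.append("number of species differs between files")
    elif conc_species != unc_species:
        problems.append("species names or column order differ between files")

    conc_rows, unc_rows = conc[1:], unc[1:]
    if len(conc_rows) != len(unc_rows):
        problems.append("number of samples differs between files")
    elif [r[0] for r in conc_rows] != [r[0] for r in unc_rows]:
        problems.append("dates/times differ between files")

    # Row numbers below are 1-based file rows (the header is row 1).
    for i, row in enumerate(conc_rows, start=2):
        if len(row) != len(conc[0]) or any(c.strip() == "" for c in row[1:]):
            problems.append(f"blank concentration cell in row {i}")
    for i, row in enumerate(unc_rows, start=2):
        for cell in row[1:]:
            try:
                ok = float(cell) > 0
            except ValueError:  # blank or non-numeric cell
                ok = False
            if not ok:
                problems.append(
                    f"blank, zero, negative, or non-numeric uncertainty in row {i}")
                break
    return problems


def check_input_files(conc_path, unc_path):
    return check_tables(read_csv(conc_path), read_csv(unc_path))
```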
8.     Training Exercises

The following sections offer examples of PMF analyses of three types of data: (1) water
samples collected at multiple locations during rainfall events; (2) hourly aerosol metals data
from St. Louis, Missouri; and (3) speciated VOC data from a Photochemical Assessment
Monitoring Stations (PAMS) site in Baton Rouge, Louisiana. The data sets are installed in the
EPA PMF/Data folder and are provided as examples for analyses. Users can follow the steps
outlined in each example to better understand the PMF process and the interaction of the
components described in this User Guide.

The examples all follow the flow shown in Figure 50, recommended for all PMF analyses. For
some users, the Base Model may be sufficient. However, Fpeak can be used to optimize the
solution, and Constraints can be used to incorporate information on the source such as
composition or emissions. Evaluating the error estimates is a critical component of a PMF
analysis.
  [Diagram labels: Increasing Complexity; Measured Source Profile, Emissions, and Source
  Location Information; Bootstrap; Displacement; Bootstrap Displacement]
                        Figure 50. PMF results evaluation process.
8.1    Milwaukee Water Data
This exercise focuses on the data set provided in Mil_water_samples.xls.  This exercise is
intended to demonstrate the thought process as well as steps involved in evaluating a small
data set with event sampling from multiple sites; it is not intended to be a complete source
apportionment analysis.  The PMF input parameters are summarized in Table 4 and all sites
were used in the analysis.
              Table 4. Milwaukee Example - Summary of PMF Input Information.
 ***Data Files***
 Concentration file: Mil_water_samples.xlsx ("Conc" worksheet)
 Uncertainty file:   Mil_water_samples.xlsx ("Unc" worksheet)

 Excluded Samples
 none

 **** Input Data Statistics ****
 Species   Category   S/N
 BOD5      Strong
 TSS       Strong
 NH3       Strong
 TP        Strong
 Cd        Bad
 Cr        Strong
 Cu        Strong
 Pb        Strong
 Ni        Strong
 Zn        Strong

 **** Base Run Summary ****
 Number of base runs:              20
 Base random seed:                 12
 Number of factors:                3
 Extra modeling uncertainty (%):   0

8.1.1   Data Set Development
Soonthornnonda and Christensen (2008) conducted a source apportionment of pollutants
contributing to combined sewer overflows (wastewater + stormwater) from the 19.5-mile
(31.4 km) inline storage system in Milwaukee. A diagram of the deep tunnel system is shown in
Figure 51 and more information can be found at http://v3.mmsd.com/DeepTunnel.aspx.
Samples were collected from multiple sites on the same day, and the Mil_water_samples.xls file
has three tabs: conc (concentration), unc (uncertainty), and site information. The paper
reference is also included on the site tab.
Both CMB and a version of PMF developed by Bzdusek et al. (2006) were used for the data
analysis, and the data used for the PMF modeling were posted as supplemental information on
the Environmental Science and Technology website¹. In addition, the authors assumed a 20%
relative error for the elements of the data matrix. All of the species were initially used in the
base model run, with 3 factors and 20 runs. A random seed was initially used to evaluate the
variability between runs, and the following results are based on a seed number of 12.
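The 20% relative error assumed by the authors translates directly into an uncertainty matrix. A minimal sketch follows; the function name is hypothetical, and the optional floor is an assumption added here because EPA PMF does not accept zero uncertainties (Table 3), which a 20% rule would produce for zero concentrations.

```python
def relative_uncertainty(conc, fraction=0.20, floor=None):
    """Uncertainty = fraction * concentration for every cell.

    conc: list of rows (samples) of concentrations
    floor: optional minimum uncertainty (an assumption, not part of the
           published method) to avoid zero or negative values
    """
    unc = []
    for row in conc:
        new_row = []
        for x in row:
            u = fraction * x
            if floor is not None:
                u = max(u, floor)
            new_row.append(u)
        unc.append(new_row)
    return unc
```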
¹ http://www.researchgate.net/journal/0013-936X Environmental Science and Technology
  [Diagram labels: Extreme Wet Weather Flow; Sewer Area]
                             Figure 51. Deep tunnel system.
8.1.2   Analyze Input Data
The species relationships were evaluated using the concentration scatter plots. The biological
oxygen demand (BOD5) was not related to the total suspended solids (TSS) (Figure 52),
indicating that they had separate sources. Also, the cadmium concentrations were only at two
levels (Figure 53), potentially indicating an issue with using this species.

8.1.3  Base Model Runs

The obs/pred scatter plot was used to evaluate the base model results because the data were
collected from multiple sites on the same date. All of the species have a linear relationship
except for cadmium, as shown in Figure 53.  Based on these results, cadmium was set to "bad"
and the base model was re-run.
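The obs/pred scatter plot summarizes how well each species is reproduced. The sketch below fits an ordinary least-squares line of predicted on observed concentrations and reports the slope, intercept, and r²; the axis assignment, the function names, and the r² cut-off of 0.5 used to flag a species are assumptions for illustration, not EPA PMF defaults.

```python
def ols_fit(x, y):
    """Return slope, intercept, and r-squared of an OLS fit y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("observed values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


def poorly_fit_species(observed, predicted, r2_min=0.5):
    """Species whose obs/pred r-squared falls below r2_min (hypothetical cut-off).

    observed, predicted: {species: [value per sample]}
    """
    flagged = []
    for species, obs in observed.items():
        _, _, r2 = ols_fit(obs, predicted[species])
        if r2 < r2_min:
            flagged.append(species)
    return flagged
```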

The stacked graph plot shown in Figure 54, which shows results similar to Bzdusek et al.
(2006a), is created by selecting the top figure in the Profiles/Contributions screen, right-clicking,
and selecting Stack Graphs.  Select the new window and right-click for file saving options or use
"Copy to Clipboard" to paste the figure into a document.

This data set poses some challenges for plotting because the samples were collected from
multiple sites on the same day when it was raining; rather than following a fixed schedule, the
sampling was event-based. The time-series plots have horizontal lines between the sites (Figure 55).
Information on the site name and sampling time is displayed on the bottom bar after a point is
selected on the figure. The user needs to evaluate whether combining the data in a PMF
analysis is justified. The key receptor modeling assumption is that the composition of the sources
impacting the sites does not change between sites.
  [Screenshot: Concentration Scatter Plot screen, BOD5 (y-axis) versus TSS (x-axis), with
  one-to-one and regression lines; regression y = 0.06807x + 13.74003]
                                  Figure 52. Scatter plot of BOD5 and TSS.
  [Screenshot: Observed/Predicted Scatter Plot for cadmium with regression line
  y = 0.20838x + 0.00099]
                     Figure 53.  Example of observed/predicted results for cadmium.
  [Figure residue: Base Factor Profiles; legend: % of Species, Conc. of Species]
multiple sites in PMF (Figure 55, Figure 56), and the user is encouraged to run each site
separately (using the check box on the Data Files screen) as well as the combined analysis.
  [Screenshot: Profiles/Contributions screen, Run 12, Sanitary Sewage factor profile and
  normalized contributions; selected sample time 05/14/04 00:00]
                  Figure 55.  Profiles/Contributions Plot for multiple site data.


The relative magnitude of the source impacts varies across the sampling sites; however, the
impacts are variable, and multiple sites have both high and low source contributions. Combining
the sites seems justified based on the variability between sites. The observed vs. predicted
concentration time series also has lines between the sites (Figure 56). The time series shows
that the differences between observed and predicted concentrations are large for a few sampling
sites and small for others. The data from the sites with large differences should be evaluated in
more detail to determine whether the samples should be combined in the PMF analysis.

The Q/Qexp plots should also be evaluated because they provide a complementary time-series
plot to the obs/pred species plots. Time series plots in the Rotational Tools also display the
lines between the sites.
  [Screenshot: Observed/Predicted Time Series for BOD5, Run 5, with species list (TSS, NH3, TP,
  Cr, Cu, Pb, Ni, Zn)]
              Figure 56. Observed/Predicted Time Series Plot for multiple site data.


8.1.4   Error Estimation

The BS, DISP, and BS-DISP results show some instability in the solution, which is due to the
small size of the data set and the limited number of factors. The error estimation results are
shown in Figure 57.

    •   DISP results (Figure 57, 1) show that the solution is stable because no swaps are
       present.
    •   BS results (Figure 57, 2) for the metals source show that the source was mapped to the
       sanitary sewage and stormwater sources 6 and 8 times, respectively. This may be because
       PMF does not fit this highly variable source well, or because the BS data sets did not
       capture the variability in the metals.
    •   BS-DISP results (Figure 57, 3) indicate that the solution may not be reliable because of
       swaps across two factors. The number of swaps is low, and the results may reflect the
       relatively small data set with variability introduced by many sampling sites.
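Mapping counts like those in the BS results come from comparing each bootstrap factor with the base run factors. The sketch below assumes each bootstrap factor is assigned to the base factor whose contribution time series it correlates with most strongly (Pearson r), and is left unmapped when that correlation is below a minimum R-value such as the 0.6 shown in Figure 57. The function names are hypothetical, and the contributions are assumed to be aligned to the same samples already.

```python
from collections import Counter
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def map_bootstrap_factors(base_series, boot_series, r_min=0.6):
    """Map each bootstrap factor to its best-correlated base factor.

    base_series, boot_series: {factor name: contributions for the same samples}
    Returns {boot factor: base factor, or None if the best r < r_min}.
    """
    mapping = {}
    for boot_name, boot in boot_series.items():
        best_name, best_r = None, r_min
        for base_name, base in base_series.items():
            r = pearson_r(boot, base)
            if r >= best_r:
                best_name, best_r = base_name, r
        mapping[boot_name] = best_name
    return mapping


def tally(mappings):
    """Count how often each (boot factor, base factor) pair occurs
    across many bootstrap runs."""
    counts = Counter()
    for mapping in mappings:
        for boot_name, base_name in mapping.items():
            counts[(boot_name, base_name)] += 1
    return counts
```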
  [Screenshots: (1) Base Model DISP Results, DISP Summary; (2) Base Model Bootstrap Results,
  Bootstrap Summary (base model run number 5, 100 bootstrap runs, minimum correlation R-value
  0.6, 3 factors, extra modeling uncertainty 0.0%); (3) Base Model BS-DISP Results, BS-DISP
  Summary]
                       Figure 57. Comparison of error estimation results.


It is recommended that all of the results be reported and explained and that the
*_ErrorEstimationSummary file be provided as supplemental information for publications.
The error estimation summary plot summarizes the ranges of the error estimates. For this
analysis, the BS-DISP errors, which capture both random errors and rotational ambiguity, have
the largest range (Figure 58).
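To check which method gives the widest intervals, as in Figure 58, the interval widths can be compared cell by cell. The sketch below assumes the intervals from each method have been collected into dictionaries keyed by (factor, species); that layout and the function name are hypothetical. Counting the cells in which each method is widest avoids letting high-concentration species dominate a simple sum of widths.

```python
from collections import Counter


def widest_by_cell(intervals_by_method):
    """Count, over all (factor, species) cells, how often each method
    produces the widest (high - low) interval.

    intervals_by_method: {method: {(factor, species): (low, high)}}
    """
    cells = set()
    for intervals in intervals_by_method.values():
        cells.update(intervals)
    counts = Counter()
    for cell in cells:
        widths = {
            method: intervals[cell][1] - intervals[cell][0]
            for method, intervals in intervals_by_method.items()
            if cell in intervals
        }
        counts[max(widths, key=widths.get)] += 1
    return counts
```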

8.2    St. Louis Supersite PM2.5 Data Set

This exercise focuses on the data set provided in Dataset-StLouis-con.csv and
Dataset-StLouis-unc.csv. The exercise is intended to demonstrate the evaluation of base model
results and the addition of constraints using EPA PMF. A number of papers have been published
on St. Louis particulate matter (PM) apportionment, and Amato and Hopke (2012) have recently
published an analysis of St. Louis data. The example given here is not a complete analysis; it
illustrates how to analyze the data with PMF and the importance of evaluating the model
results. The PMF input parameters are summarized in Table 5.

8.2.1   Data Set Development

The St. Louis PM2.5 data set includes 13 species and 420 hourly samples, taken during June
2001, November 2001, and March 2002 at the East St. Louis Supersite (Figure 59). The data
were formatted in .csv files, with each row representing one sample and each column one
species. Uncertainty estimates by species and sample were provided by the analytical lab.
Samples below the detection limit were given an uncertainty of 5/6 the detection limit, missing
samples were given an uncertainty of 4 times the median concentration, and samples above the
detection limit were given an uncertainty of 1/3 the detection limit plus a sample-specific
laboratory uncertainty. In particular, this data set was chosen to illustrate adding constraints to
the PMF model based on known source profiles.
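The uncertainty rules described above can be applied value by value. This is a sketch only: the function name is hypothetical, missing values are represented by None, the detection limit and laboratory uncertainty are assumed to be supplied per sample, and treating a value exactly equal to the detection limit as "above" is an assumption.

```python
from statistics import median


def st_louis_uncertainty(conc, dl, lab_unc):
    """Uncertainty for one species following the rules described above.

    conc:    concentrations, None where the sample is missing
    dl:      detection limit for each sample
    lab_unc: sample-specific laboratory uncertainty for each sample
    """
    species_median = median(x for x in conc if x is not None)
    unc = []
    for x, d, lab in zip(conc, dl, lab_unc):
        if x is None:            # missing: 4 x median concentration
            unc.append(4.0 * species_median)
        elif x < d:              # below detection limit: 5/6 of the DL
            unc.append(5.0 / 6.0 * d)
        else:                    # at/above DL: DL/3 + lab uncertainty
            unc.append(d / 3.0 + lab)
    return unc
```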
  [Plot: Error Estimation Concentration Summary; legend: BS, BS-DISP, DISP, Base Run;
  log-scale concentration axes]
  Figure 58. Error estimation summary plot of range of concentration by species in each factor.
             Table 5.  St. Louis Example - Summary of PMF input information.
***Data Files***
Concentration file: Dataset-StLouis-con.csv
Uncertainty file:   Dataset-StLouis-unc.csv

Excluded Samples
none

**** Input Data Statistics ****
Species   Category   S/N
Cd        Bad        0.80
Cu        Strong     5.35
Fe        Strong     2.30
Mn        Strong     8.80
Ni        Weak       0.52
Pb        Strong     8.43
Se        Weak       0.55
Zn        Strong     5.05
SO4       Strong     6.73
NO3       Bad        5.31
OC        Strong     3.59
EC        Weak       0.67
Mass      Weak       0.92

**** Base Run Summary ****
Number of base runs:              20
Base random seed:                 30
Number of factors:                7
Extra modeling uncertainty (%):   0

        Figure 59. Satellite image of St. Louis Supersite and major emissions sources.
8.2.2  Analyze Input Data

Characterizing Species (Concentration/Uncertainty and Concentration Time Series)

The species categories were set based on the guidance in Section 5.5.1. The user should first
examine the input data to determine whether the species concentrations from expected sources
are temporally related. For example, do iron and zinc concentrations vary together, indicating
the presence of steel production or other sources? The time series of iron and zinc are shown
in Figure 60. To zoom in on the time series, hold down the "Alt" key and the left mouse button
while drawing a box around the period of interest. To return to the original figure, hold "Alt"
and click the left mouse button.
  [Screenshot: Concentration Time Series screen with species list (Cd, Cu, Fe, Mn, Ni, Pb, Se,
  Zn, SO4, NO3, OC, EC, Mass), Fe and Zn time series with a zoomed-in view, and "Exclude
  Samples" / "Restore Samples" buttons]
  Figure 60.  Concentration Time Series screen and zoomed-in diagram for the St. Louis data set.


The plot in Figure 60 shows a complex picture because high zinc concentrations do not
coincide with high iron concentrations. This discrepancy may indicate a local source of zinc that
does not emit iron. In this St. Louis example, a zinc smelter was located near the monitoring
site.
Relationships Between Species (Concentration Scatter Plot)

Scatter plots between species should be examined for relationships that indicate that a common
source emitted both species (e.g., OC and EC are both emitted by mobile sources). In the St.
Louis data set, lead and zinc are not related, which indicates two potential sources (Figure 61).
  [Screenshot: Concentration Scatter Plot screen with Y-axis and X-axis species lists; selected
  sample 03/22/02 17:00, Zn = 0.06230; regression y = 0.18954x + 0.00638]
                   Figure 61. Concentration scatter plots for steel elements.


Excluding Samples (Concentration Time Series)

The user should examine the concentration time-series plots to verify that the species selected
for PMF have expected seasonal patterns (e.g., high sulfate during the summer), as well as to
identify unusual events (e.g., fireworks on the Fourth of July, which contribute to high levels of
potassium, strontium, and other trace metals). Often, these events are easily identified. The
samples taken during these events should be excluded because the overall profiles may not
capture the unique composition of the event source, or the profiles of non-event sources may
be distorted. Exclude a sample by highlighting it and clicking "Exclude Samples" at the bottom
right of the screen. All data exclusions must be well-justified and documented.
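Visual inspection is usually enough, but a simple screen can help list candidate event samples for review. The sketch below flags samples whose concentration lies far above the species median, measured in scaled median absolute deviations (a robust z-score). This technique, the threshold of 5, and the function name are assumptions and not EPA PMF features; a flagged sample should be excluded only when a documented event justifies it.

```python
from statistics import median


def candidate_events(dates, values, threshold=5.0):
    """Return (date, value) pairs whose robust z-score exceeds threshold.

    Robust z = (x - median) / (1.4826 * MAD), where MAD is the median
    absolute deviation; 1.4826 scales MAD to a standard deviation for
    normally distributed data. Only high values are flagged.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    scale = 1.4826 * mad
    return [(d, v) for d, v in zip(dates, values) if (v - med) / scale > threshold]
```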
8.2.3   Base Model Runs

Initial Model Parameters (Base Model Runs)

The model was run 20 times with 8 factors and a seed of 30. A constant seed was used so that
the results can be replicated for training purposes. The runs converged, and the Q values were
very stable. Q(robust) was about 10% lower than Q(true), indicating some, but not heavy,
impact of outliers on the Q value.
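Q is the sum of squared uncertainty-scaled residuals, and the comparison of Q(robust) with Q(true) rests on how outliers enter that sum. The sketch below computes Q(true) and a robust Q in which scaled residuals larger than α = 4 contribute linearly instead of quadratically. This Huber-style down-weighting is an assumption used for illustration; EPA PMF's exact robust formulation may differ, and the function name is hypothetical.

```python
def q_values(X, U, G, F, alpha=4.0):
    """Return (Q_true, Q_robust) for X ~ G F with uncertainties U.

    Scaled residual r = (x_ij - sum_k g_ik f_kj) / u_ij.
    Q_true sums r**2. Q_robust (assumed Huber-style form) uses r**2 when
    |r| <= alpha and 2*alpha*|r| - alpha**2 otherwise.
    """
    q_true = q_robust = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            fitted = sum(G[i][k] * F[k][j] for k in range(len(F)))
            r = (x - fitted) / U[i][j]
            q_true += r * r
            if abs(r) <= alpha:
                q_robust += r * r
            else:
                q_robust += 2 * alpha * abs(r) - alpha * alpha
    return q_true, q_robust
```

With no scaled residual above α the two values are equal; the larger the gap, the stronger the influence of outliers.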

Based on the observed-versus-predicted scatter plots and time series, some species, such as
lead, were modeled well, and others, such as cadmium, were not (Figure 62). A poor fit can
result from incorrect uncertainties, improper categorization (e.g., as a "Strong" species), too
few factors being modeled, too little impact from the source, or PMF incorrectly modeling the
species variability. Poor fitting of trace species has been noted for high-time-resolution
sampling (one-hour frequency or less). A cadmium source, such as an incinerator, is most likely
present near the monitoring site; however, the data do not contain enough information for PMF
to resolve it. The poorly modeled species (cadmium) should therefore be categorized as "Bad."
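The observed/predicted statistics that PMF reports can also be recomputed outside the program as a cross-check. The sketch below computes the least-squares slope, intercept, and r² of predicted versus observed concentrations for one species. It assumes both series were exported for the same samples and that the observed values are not all equal. The function name and example values are hypothetical.

```python
def obs_pred_fit(observed, predicted):
    """Least-squares slope, intercept, and r^2 of predicted vs. observed values.

    Assumes the observed values vary (otherwise the slope is undefined).
    """
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r^2 is undefined for a constant prediction; report 0 in that case.
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


# A well-modeled species tracks the observations; a poorly modeled one does not.
print(obs_pred_fit([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.0, 4.1]))
print(obs_pred_fit([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]))
```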
[Figure: observed vs. predicted plots for cadmium and lead; x-axis: Observed Concentrations]
Figure 62. Example of output graphs for cadmium (poorly modeled) and lead (well-modeled).


In addition, NO3 (shown in Figure 63) has many fixed values during the first intensive, in the
summer of 2001, that may have been set at the MDL. This issue is not present in the next two
intensives (Figure 63). NO3 should therefore be set as "Bad" if the entire data set is used, and
as "Strong" if only the last two intensives are used.
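Fixed values like these NO3 values can be screened by checking how often a single value repeats within each sampling period. The sketch below is a hypothetical Python helper, not a PMF feature. The 20% threshold, the rounding to four decimals, and the period data are assumptions. A flagged period still needs to be checked against the MDL and the laboratory records.

```python
from collections import Counter


def most_repeated(values, ndigits=4):
    """Return (value, fraction of samples) for the most frequently repeated value."""
    rounded = [round(v, ndigits) for v in values]
    value, count = Counter(rounded).most_common(1)[0]
    return value, count / len(rounded)


def periods_with_fixed_values(periods, max_fraction=0.2):
    """Names of periods where one repeated value exceeds max_fraction of samples."""
    flagged = []
    for name, values in periods.items():
        _, fraction = most_repeated(values)
        if fraction > max_fraction:
            flagged.append(name)
    return flagged


# Hypothetical NO3 concentrations by intensive.
no3 = {
    "intensive 1": [0.3, 0.3, 0.3, 1.2, 0.3, 0.3],
    "intensive 2": [0.8, 1.1, 0.4, 2.0, 0.9, 1.5],
    "intensive 3": [1.3, 0.6, 2.2, 0.7, 1.8, 0.5],
}
print(periods_with_fixed_values(no3))
```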

[Figure: NO3 observed/predicted plots; legend: Observed Concentration, Predicted Concentration]
Figure 63. Example of inconsistencies in input data. The multiple points shown in blue in the
lower left graphic are fixed values.
Rotations (G-Space Plots)

G-space plots of the solution should be examined to determine whether the contributions fill the
solution space and whether there are edges, that is, points with low or zero contributions from
one of the factors. The choice of factors for these plots is important; factors should be plotted
against factors that represent regional sources, such as coal-fired power plants. Figure 64
shows two examples, one with points near both axes and the other with points near only one
axis. Fpeak should be evaluated to determine whether a more optimal solution can be found. If a
point is selected in one plot, the same point is highlighted in the other plots.
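To supplement visual inspection of G-space plots, one can count how many samples lie close to each axis. The sketch below scales two factor contribution series to an average of 1, as on the plot axes, and reports the fraction of samples with near-zero contributions. A factor pair with no points near either axis may benefit from Fpeak exploration. The tolerance of 0.05 and the function name are hypothetical choices.

```python
def near_axis_fractions(g_a, g_b, tol=0.05):
    """Fractions of samples near each axis of a G-space plot of factors A and B.

    Contributions are scaled to an average of 1. A point lies near the A axis
    when its B contribution is below tol, and near the B axis when its A
    contribution is below tol.
    """
    mean_a = sum(g_a) / len(g_a)
    mean_b = sum(g_b) / len(g_b)
    a = [g / mean_a for g in g_a]
    b = [g / mean_b for g in g_b]
    near_a_axis = sum(1 for v in b if v < tol) / len(b)
    near_b_axis = sum(1 for v in a if v < tol) / len(a)
    return near_a_axis, near_b_axis


# Hypothetical contributions: each factor is near zero in one sample.
print(near_axis_fractions([0.0, 1.0, 2.0, 1.0], [2.0, 0.0, 1.0, 1.0]))
```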

Factor Identification (Profiles/Contributions, Aggregate Contributions)

Factors may be identified using dominant species and temporal patterns. Nitrate was removed
from the analysis, and the number of factors was reduced to seven (because nitrate had formed
one factor). The seven factors identified in the St. Louis data set represent a realistic solution
based on known sources in the area: crustal (Mn), copper smelter (Cu), coal combustion (SO4,
Se), zinc smelter (Zn), iron and coal (Fe and EC), lead smelter (Pb), and motor vehicle (OC, EC).
The iron and coal factor appears to mix species from more than one source, and it is evaluated
using constraints later in this example. The factor profiles are shown in Figure 65.
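Dominant species are easiest to see when each species is expressed as a percentage of its total across all factors, which is the "% of species" view in the profile plots. The sketch below performs that conversion for profiles stored in plain dictionaries and reports which factor holds the largest share of each species. The data layout, function names, and numbers are hypothetical.

```python
def percent_of_species(profiles):
    """Map species -> {factor: percent of that species' total across factors}.

    profiles maps factor name -> {species: amount of the species in the factor}.
    """
    species = sorted({s for prof in profiles.values() for s in prof})
    result = {}
    for s in species:
        total = sum(prof.get(s, 0.0) for prof in profiles.values())
        result[s] = {
            f: (100.0 * prof.get(s, 0.0) / total if total > 0 else 0.0)
            for f, prof in profiles.items()
        }
    return result


def dominant_factor(percents):
    """Factor holding the largest share of each species."""
    return {s: max(shares, key=shares.get) for s, shares in percents.items()}


profiles = {  # hypothetical profile values
    "Copper Smelter": {"Cu": 0.90, "Zn": 0.05, "Pb": 0.02},
    "Zinc Smelter": {"Cu": 0.10, "Zn": 0.80, "Pb": 0.03},
    "Lead Smelter": {"Cu": 0.00, "Zn": 0.15, "Pb": 0.60},
}
print(dominant_factor(percent_of_species(profiles)))
```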

[Figure: G-Space Plot - Run 1 (two panels); x-axis: Coal Combustion Contributions (avg=1)]
Figure 64. Example of G-space plots for independent (left) and weakly dependent factors (right).


Mass Distribution (Factor Contributions)

Figure 66 shows the factor contributions to the total mass variable (PM2.5) as a pie chart.
Evaluate the distribution of contributions to determine whether it is within the expected range
for the samples. The major sources in this example are motor vehicles and coal combustion,
with minor contributions from the crustal, zinc smelter, lead smelter, and copper smelter
sources.
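The pie-chart percentages are each factor's average contribution divided by the sum of all factor contributions. The sketch below reproduces the Figure 66 percentages from its legend values. One assumption applies: the Iron & Coal contribution is taken as 1.86020, because that is the value consistent with its 9.4% share and with the other six percentages. The function name is hypothetical.

```python
def contribution_shares(contributions):
    """Percent of the total variable attributed to each factor."""
    total = sum(contributions.values())
    return {name: 100.0 * value / total for name, value in contributions.items()}


# Average factor contributions to PM2.5 from the Figure 66 legend
# (Iron & Coal assumed to be 1.86020; see lead-in).
pm25 = {
    "Crustal": 0.47107,
    "Copper Smelter": 0.20125,
    "Coal Combustion": 7.93860,
    "Zinc Smelter": 0.86724,
    "Iron & Coal": 1.86020,
    "Lead Smelter": 0.30517,
    "Motor Vehicle": 8.14000,
}
for name, pct in contribution_shares(pm25).items():
    print(f"{name}: {pct:.1f}%")
```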

8.2.4 Error Estimation

A summary of the error estimation results from the *_ErrorEstimationSummary file is shown in
Table 6 along with comments. The results are stable, and no swaps were present. The
*_ErrorEstimationSummary file should be reported with any publication or report.

This example demonstrates the iterative approach for evaluating a PMF solution: evaluate the
input data, calculate and evaluate the base results, and evaluate the error estimates. The Error
Estimation Concentration Summary plot is shown in Figure 67.

8.2.5 Constrained Model Runs

Define Constrain Expressions (Expression Builder)

For the St. Louis data set, source profiles of local steel facilities were used to determine
appropriate ratios of iron and manganese in the steel factor. Samples were analyzed as
described in Pancras et al. (2005). This method provides total inorganic concentrations, which
are comparable to the total inorganic concentrations from energy dispersive X-ray fluorescence
(EDXRF). The profile of the Granite City Steelworks basic oxygen furnace was used as a
representative profile because this facility is believed to impact the site; the ratio of EDXRF iron to
manganese in the source profile was 60. The average ratio of iron to manganese in the St.
Louis ambient air data was 10.8. In the base run, however, the iron-to-manganese ratio in the
steel factor profile was 51, somewhat lower than the source-profile ratio. The ratio constraint
was defined using the Expression Builder and is interpreted as an autopull equation that pulls
iron minus 60 times manganese in the steel factor toward zero within a given dQ limit
([Steel|Fe] - 60 * [Steel|Mn] = 0). In addition, EC was selected in the iron and coal factor, and
the right mouse button was used to toggle EC as a constraint; this might allow EC to be better
separated from the steel source. The % dQ was set at 5% for each constraint, and the
converged results used 2.1% dQ.
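The ratio constraint and its cost can be checked with simple arithmetic. The sketch below evaluates the quantity that the expression [Steel|Fe] - 60 * [Steel|Mn] pulls toward zero, along with the percent change in Q relative to the base run. It assumes %dQ is expressed relative to the base-run Q. The profile values, Q values, and function names are hypothetical and were chosen to mirror a base ratio of 51 and a 2.1% dQ.

```python
def ratio_expression(profile, numerator, denominator, target):
    """Value of numerator - target * denominator for one factor profile.

    A value of zero means the profile matches the target ratio exactly.
    """
    return profile[numerator] - target * profile[denominator]


def percent_dq(q_base, q_constrained):
    """Percent increase in Q caused by the constraints (assumed definition)."""
    return 100.0 * (q_constrained - q_base) / q_base


steel_base = {"Fe": 0.51, "Mn": 0.01}  # hypothetical base-run steel profile
print(steel_base["Fe"] / steel_base["Mn"])              # current Fe/Mn ratio
print(ratio_expression(steel_base, "Fe", "Mn", 60.0))   # negative: ratio below 60
print(percent_dq(1000.0, 1021.0))                       # hypothetical Q values
```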
[Figure: Base Factor Profiles, stacked by factor; legend: % of Species, Conc. of Species]

                      Figure 65. St. Louis stacked base factor profiles.

[Figure: pie chart of factor contributions to mass; factor contribution > 0.05%]
  Crustal = 0.47107 (2.4%)
  Copper Smelter = 0.20125 (1.0%)
  Coal Combustion = 7.93860 (40.1%)
  Zinc Smelter = 0.86724 (4.4%)
  Iron & Coal = 1.86020 (9.4%)
  Lead Smelter = 0.30517 (1.5%)
  Motor Vehicle = 8.14000 (41.1%)
Figure 66. Distribution of mass for St. Louis PM2.5.
Constrained Model Run Results (Constrained Profiles/Contributions and Diagnostics)

In the resulting constrained run, the iron-to-manganese ratio in the steel factor moved to 60, and
the EC in that factor was also substantially reduced, to around 40% (Figure 68). Remember that
EC removed from one factor is shifted to another factor. The largest change in profile was found
for the motor vehicle factor. These results suggest that the constraints provide an improved
result compared to the base run.

These changes did not have a large impact on the overall factor contributions to the mass (the
iron and coal factor decreased by 2.3%, and the motor vehicle factor increased by 1.1%);
however, they demonstrate the benefit of bringing in external information. After adding
constraints, run all three error estimation methods and compare the results to the base model
results. The error estimate summary (Figure 69) does not show a significant change. In other
data sets, adding constraints may reduce the size of the error estimates by reducing rotational
ambiguity.

8.3    Baton Rouge PAMS VOC Data Set

The following sections detail a PMF analysis of a Photochemical Assessment Monitoring Stations
(PAMS) VOC data set from Baton Rouge, Louisiana. To follow the analyses described below,
run EPA PMF 5.0 with the data sets provided in Dataset-BatonRouge-con.csv and
Dataset-BatonRouge-unc.csv. This exercise is intended to demonstrate the thought process and
steps involved in reaching a solution using EPA PMF 5.0; it is not intended to be a complete
source apportionment analysis. The PMF input parameters are summarized in Table 7.

[Figure: Base Run Error Estimation Concentration Summary; legend: BS, BS-DISP, DISP, Base Run]
Figure 67. Summary of base run and error estimates.
[Figure: base run and Constrained Factor Profile - Iron & Coal; legend: % of Species]
Figure 68. Comparison of base model and constrained model run profiles for the steel factor.

                         Table 6. Error Estimation Summary results.
BS-DISP Diagnostics
# of Cases:                  101
Largest Decrease in Q:       0.382999986
% dQ:                        0.370098358
# of Decreases in Q:         0
# of Swaps in Best Fit:      0
# of Swaps in DISP:          0
Swaps by Factor:             0  0  0  0  0  0  0

DISP Diagnostics
Error Code:                  0
Largest Decrease in Q:       0.035999998
% dQ:                        0.034787313
Swaps by Factor:             0  0  0  0  0  0  0

BS Mapping (columns: Boot Factor 1-7)
                  1    2    3    4    5    6    7
Base Factor 1   100    0    0    0    0    0    0
Base Factor 2     0  100    0    0    0    0    0
Base Factor 3     0    0  100    0    0    0    0
Base Factor 4     0    0    0  100    0    0    0
Base Factor 5     0    0    0    0  100    0    0
Base Factor 6     0    0    0    0    0  100    0
Base Factor 7     0    0    0    0    0    0  100
Unmapped          0    0    0    0    0    0    0

[Figure 69: Constrained EE Concentration Summary; legend: BS, BS-DISP, DISP, Base Run]
             Table 7. Baton Rouge Example - Summary of PMF input information.
***Data Files***
Concentration file:   Dataset-BatonRouge-con.csv
Uncertainty file:     Dataset-BatonRouge-unc.csv
Excluded Samples:     none

**** Input Data Statistics ****
Species                   Category   S/N
124-Trimethylbenzene      Bad        5.46
224-Trimethylpentane      Strong     5.67
234-Trimethylpentane      Bad        5.55
23-Dimethylbutane         Bad        5.51
23-Dimethylpentane        Bad        5.48
2-Methylheptane           Weak       5.08
3-Methylhexane            Bad        5.65
3-Methylpentane           Bad        5.62
Acetylene                 Strong     5.67
Benzene                   Strong     5.67
Cis-2-Butene              Bad        3.28
Cis-2-Pentene             Bad        5.10
Ethane                    Bad        5.67
Ethylbenzene              Strong     5.67
Ethylene                  Weak       5.67
Isobutane                 Weak       5.67
Isopentane                Weak       5.67
Isoprene                  Bad        5.56
Isopropylbenzene          Bad        2.32
M_P Xylene                Bad        5.67
M-Diethylbenzene          Bad        2.66
M-Ethyltoluene            Bad        5.53
N-Butane                  Strong     5.67
N-Decane                  Weak       5.20
N-Heptane                 Strong     5.67
N-Hexane                  Weak       5.62
N-Nonane                  Weak       5.43
N-Octane                  Weak       5.58
N-Pentane                 Weak       5.67
N-Propylbenzene           Bad        3.76
N-Undecane                Bad        5.03
O-Ethyltoluene            Weak       5.00
O-Xylene                  Strong     5.67
Propane                   Strong     5.67
Propylene                 Weak       5.67
Styrene                   Bad        4.95
Toluene                   Strong     5.67
Trans-2-Butene            Bad        3.16
Trans-2-Pentene           Bad        5.43
Unidentified              Bad        1.00
TNMOC                     Weak       0.75

**** Base Run Summary ****
Number of base runs:               20
Base random seed:                  25
Number of factors:                 4
Extra modeling uncertainty (%):    0

8.3.2  Analyze Input Data

Characterizing Species (Concentration/Uncertainty and Concentration Time Series)

S/N ratios are not as useful in this analysis because all species were assigned uncertainties in
the same way; therefore, species categorizations will be evaluated based on residuals and
observed/predicted statistics after the initial base runs. Species with greater relative
uncertainties were categorized as "Bad" and excluded from the analysis. For the initial run, all
included species were categorized as "Strong," and all 21 included species, including total
non-methane organic compounds (TNMOC), were used.
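A short calculation shows why S/N carries little information here. Assume the S/N definition used by EPA PMF 5.0: each concentration x contributes (x - s)/s when it exceeds its uncertainty s and 0 otherwise, and the species S/N is the average over samples. Under that definition, an uncertainty fixed at a constant fraction f of the concentration yields the same S/N, (1 - f)/f, whatever the concentrations are. With f = 0.15 this is 5.67, which is the value many species show in Table 7. The 15% fraction is an inference from that pattern, not a documented input of this data set.

```python
def signal_to_noise(concentrations, uncertainties):
    """Species S/N: mean of (x - s)/s over samples, using 0 where x <= s."""
    d = [(x - s) / s if x > s else 0.0 for x, s in zip(concentrations, uncertainties)]
    return sum(d) / len(d)


conc = [0.4, 1.7, 3.2, 12.5, 0.9]   # hypothetical concentrations
unc = [0.15 * x for x in conc]      # uncertainty fixed at 15% of concentration
print(round(signal_to_noise(conc, unc), 2))  # (1 - 0.15) / 0.15 = 5.67
```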

Relationships Between Species (Concentration Scatter Plot)

Scatter plots between species are examined to evaluate relationships that may indicate a
common source. In the Baton Rouge data set, the plots show the expected relationships between
gasoline mobile source species, such as toluene and o-xylene (Figure 70, 1), and between
heavy-duty vehicle mobile source species, such as n-decane and n-undecane (Figure 70, 2).

Ethane and propane (Figure 70, 3) show some evidence of two source influences that have
different ethane and propane ratios, potentially indicating a mix of fresh sources from
petrochemical processing/natural gas use and aged carryover from other areas.  Benzene and
styrene (Figure 70, 4), often mobile source-dominated species, were not well-correlated with
other mobile source species; this lack of correlation is likely due to emissions of these species
from the several large petrochemical sources in the area.
[Figure: species scatter plots, panels 1-4; panel title recovered: N-Decane/N-Undecane]
Figure 70. Relationships between ambient concentrations of various species.
Excluding Samples and Species (Concentration Time Series)

Time series of each pollutant were examined for extreme events and/or noticeable step
changes in concentrations that should be removed from the analysis. Step changes (e.g.,
differences due to changes in laboratory analytical technique) may be mistakenly identified as
separate sources of the species. If samples are removed due to unusual events in various
species, further data analysis outside EPA PMF could be used to confirm whether the data are
real and informative.
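Step changes are usually visible in the time series, but a simple numeric check can help locate them. The sketch below uses a least-squares single change point, a generic technique used here for illustration only and not an EPA PMF feature. It finds the split that minimizes the within-segment sum of squares and reports the mean on each side. The function name, the minimum segment length, and the data are hypothetical. A detected shift should be confirmed against laboratory or method records.

```python
def single_step_change(values, min_segment=3):
    """Return (split_index, mean_before, mean_after) for the best single split.

    The split minimizes the summed squared deviations of each segment from its
    own mean; split_index is the first sample of the second segment.
    """
    if len(values) < 2 * min_segment:
        raise ValueError("series too short for the requested segment length")

    def sse(segment):
        m = sum(segment) / len(segment)
        return sum((v - m) ** 2 for v in segment)

    best_k = min(
        range(min_segment, len(values) - min_segment + 1),
        key=lambda k: sse(values[:k]) + sse(values[k:]),
    )
    before, after = values[:best_k], values[best_k:]
    return best_k, sum(before) / len(before), sum(after) / len(after)


# Hypothetical series with a shift after the fourth sample.
print(single_step_change([1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0]))
```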

8.3.3  Base Model Runs

Initial Model Parameters (Model Execution)

Initially, 20 base runs with 4 factors and a seed of 25 were explored. In this iteration, the Q
values varied by several hundred units, indicating that the solution may not be stable. The
species and their categories are shown in Table 8. Several species were changed to "Weak" after
the residuals and plots were evaluated as described below; for these species, the Category
column of Table 8 shows "Strong/Weak."
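Run-to-run variation in Q can be summarized with a few numbers. The sketch below reports the range of Q across base runs, that range as a percentage of the lowest Q, and the number of runs within 1% of the lowest Q. The Q values are hypothetical and are not the Baton Rouge results. Deciding what counts as an acceptable spread remains a judgment call.

```python
def q_spread(q_values, near_pct=1.0):
    """Summarize the spread of converged Q values across base runs."""
    q_min, q_max = min(q_values), max(q_values)
    limit = q_min * (1.0 + near_pct / 100.0)
    return {
        "min": q_min,
        "max": q_max,
        "range": q_max - q_min,
        "range_pct_of_min": 100.0 * (q_max - q_min) / q_min,
        "runs_near_min": sum(1 for q in q_values if q <= limit),
    }


# Hypothetical Q(robust) values from five base runs.
print(q_spread([5210.4, 5215.9, 5480.2, 5212.7, 5701.3]))
```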
                           Table 8. VOC species categories.
Species                      Category
1,2,4-Trimethylbenzene       Bad
2,2,4-Trimethylpentane       Strong
2,3,4-Trimethylpentane       Bad
2,3-Dimethylbutane           Bad
2,3-Dimethylpentane          Bad
2-Methylheptane              Strong/Weak
3-Methylhexane               Bad
3-Methylpentane              Bad
Acetylene                    Strong
Benzene                      Strong
Cis-2-Butene                 Bad
Cis-2-Pentene                Bad
Ethane                       Bad
Ethylbenzene                 Strong
Ethylene                     Strong/Weak
Isobutane                    Strong/Weak
Isopentane                   Strong/Weak
Isoprene                     Bad
Isopropylbenzene             Bad
M_P Xylene                   Bad
M-Diethylbenzene             Bad
M-Ethyltoluene               Bad
N-Butane                     Strong
N-Decane                     Strong/Weak
N-Heptane                    Strong
N-Hexane                     Strong/Weak
N-Nonane                     Strong/Weak
N-Octane                     Strong/Weak
N-Pentane                    Strong/Weak
N-Propylbenzene              Bad
N-Undecane                   Bad
O-Ethyltoluene               Strong/Weak
O-Xylene                     Strong
Propane                      Strong
Propylene                    Strong/Weak
Styrene                      Bad
Toluene                      Strong
Trans-2-Butene               Bad
Trans-2-Pentene              Bad
Unidentified                 Bad
TNMOC                        Weak

8.3.4   Base Model Run Results

Model Reconstruction (Obs/Pred Scatter Plots, Obs/Pred Time Series)

The species residuals were analyzed. Figure 71 shows the histograms of scaled residuals (after
selecting autoscale) for benzene, which had a good fit, and for ethylene, which had a poor fit.
The observed vs. predicted scatter plots and time series for benzene and ethylene are shown in
Figure 72 and Figure 73, respectively. Because PAMS data are collected only during the summer,
the time-series plots have no data from fall through spring. The scatter plots and time series also
show the differences between the observed and predicted concentrations. The poorly fit species
have scaled residuals greater than 3.0, and their peak observations are not fit in the scatter or
time-series plots. The following species had a number of scaled residuals above 4, with peak
concentrations that PMF did not fit: 2-methylheptane, ethylene, isobutane, isopentane,
n-decane, n-hexane, n-nonane, n-octane, n-pentane, o-ethyltoluene, and propylene. The
category for these species was set to "Weak."
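The counts behind this judgment can be reproduced from exported observed and modeled concentrations. The sketch below computes scaled residuals, (observed - modeled)/uncertainty, for one species and counts how many exceed 3 and 4 in absolute value, the two levels discussed above. The function name and data are hypothetical. How many exceedances justify a "Weak" category is left to the analyst.

```python
def scaled_residual_counts(observed, modeled, uncertainty, limits=(3.0, 4.0)):
    """Count samples whose |scaled residual| exceeds each limit."""
    scaled = [(x - c) / u for x, c, u in zip(observed, modeled, uncertainty)]
    return {lim: sum(1 for r in scaled if abs(r) > lim) for lim in limits}


# Hypothetical ethylene data: two peaks are underpredicted.
obs = [10.0, 12.0, 30.0, 11.0, 50.0]
mod = [10.0, 11.0, 14.0, 11.0, 20.0]
unc = [1.0, 1.0, 4.0, 1.0, 5.0]
print(scaled_residual_counts(obs, mod, unc))  # {3.0: 2, 4.0: 1}
```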

Factor Identification (Profiles/Contributions, Aggregate Contributions)

The base model was then re-run, and the profiles and contributions were examined to identify the
factors. Measured profiles were used to support the identification of the factors, and the factor
names were added to Figure 74 by right-clicking in Profiles/Contributions and selecting the
"Factor Name" option.

[Figure: histograms of scaled residuals; x-axis: Scaled Residuals]
Figure 71. Histogram of scaled residuals for benzene (1) and ethylene (2).

[Figure: Benzene observed vs. predicted scatter plot (x-axis: Observed Concentrations) and
time series (Benzene - Run 4; legend: Observed Concentration, Predicted Concentration)]
Figure 72. Observed/predicted plots for benzene.

[Figure: Ethylene observed vs. predicted scatter plot (x-axis: Observed Concentrations) and
time series (Ethylene - Run 4; legend: Observed Concentration, Predicted Concentration)]
Figure 73. Observed/predicted plots for ethylene.




[Figure: VOC factor profiles for the four base-run factors]

                             Figure 74. VOC factor profiles.


The PMF results were compared to measured profiles using the first and second columns of
Table 1 in Fujita (2001), shown in Figure 75. The n-decane levels in the diesel exhaust profile
(Tu_MchHD) are high compared to those in the vehicle emissions profile (Exh_J), and the factor
fingerprint plot (Figure 76) shows that n-decane is predominantly associated with the diesel
factor. The acetylene contributions to the sources are discussed later in this example. Acetylene is
predominantly associated with vehicle emissions and makes a small contribution to gasoline
vapor. It is also present in the industrial and diesel exhaust profiles.
[Figure 75: image of Table 1 from Fujita (2001), "Source profiles applied in the Paso del Norte
Ozone Study (volume percent of sum of 55 PAMS target species)." Profiles shown include vehicle
emissions (Exh_J), diesel exhaust (Tu_MchHD), adjusted propane bus, liquid gasoline, gasoline
vapor, liquefied petroleum gas, commercial natural gas (CNG), industrial, and surface coating.
Table note: profiles consisting of 100 percent isoprene (biogenic) and 100 percent unidentified
hydrocarbons (UNID) were also applied.]
[Figure: Factor Fingerprints - Run 4; legend: Refinery, Evaporative Gasoline, Gasoline Exhaust,
Diesel Exhaust]
Figure 76. Factor fingerprint plot for VOCs.
Rotations (G-Space Plots)

The G-space plot of the motor vehicle and diesel exhaust source contributions shows a weak
linear relationship (Figure 77). This may indicate that the diesel source is mixed with the motor
vehicle source, or that another source of diesel combustion is present. In the other G-space
plot pairings, the points were distributed across the solution space between the axes. Fpeak
should be investigated to determine whether a rotation moves points toward the axes.

Species Distribution (Factor Pie Chart)

The total variable (TNMOC) was apportioned mainly to motor vehicle (gasoline) exhaust and
gasoline vapor. The industrial (refinery) factor was also a major contributor, as shown in
Figure 78.

8.3.5   Fpeak

Examination of the Fpeak G-space plots of motor vehicle exhaust vs. gasoline vapor showed
that some improvement might be gained using an Fpeak of -1.0. Because this example focuses
on source profile constraints, the Fpeak result is not discussed further. The base, Fpeak, and
constrained model results should be compared to determine whether the rotational tools and
constraints lead to a different interpretation of the factors and contributions.

[Figure: G-Space Plot - Run 4; x-axis: Diesel Exhaust Contributions (avg=1)]
Figure 77. G-space plot of motor vehicle and diesel exhaust.
[Figure: TNMOC - Run 4 pie chart; factor contribution > 0.05%]
  Refinery = 49.86000 (27.0%)
  Evaporative Gasoline = 60.80900 (32.7%)
  Gasoline Exhaust = 65.15000 (35.3%)
  Diesel Exhaust = 9.26400 (5.0%)
Figure 78. Apportionment of TNMOC to factors resolved in the initial 4-factor base run.

Error Estimate Summary

As shown in Table 9, not every base factor was mapped to its corresponding boot factor in every
run. Each base factor was correctly mapped in at least 80% of the bootstrap runs, which is
relatively stable. The mismatches are attributed to the combination of high variability in the data
and PMF not fitting all of the spikes in the data (Figure 79). All of the "Strong" species were
selected for the BS-DISP error estimation. The number of DISP swaps is zero, and the BS-DISP
swaps are distributed across three factors. The number of swaps in BS-DISP is relatively high, so
the BS results and model fit statistics need to be evaluated before results are reported.
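The mapping percentages are the diagonal BS mapping counts in Table 9 divided by the number of bootstrap runs. The sketch below computes them and flags factors that fall below a threshold. It assumes 100 bootstrap runs and uses 80% as a rule-of-thumb threshold. That threshold is an assumption, not a fixed PMF criterion. The function name is hypothetical.

```python
def mapping_percentages(counts, n_runs, threshold=80.0):
    """Percent of runs mapping base factor i to boot factor i, plus low factors.

    counts[i][j] is the number of bootstrap runs that matched base factor i
    with boot factor j (rows: base factors, columns: boot factors).
    """
    percents = [100.0 * counts[i][i] / n_runs for i in range(len(counts))]
    low = [i + 1 for i, p in enumerate(percents) if p < threshold]
    return percents, low


# BS mapping counts from Table 9 (base factors 1-4 by boot factors 1-4).
table9 = [
    [80, 0, 0, 0],
    [8, 92, 0, 0],
    [8, 6, 100, 13],
    [4, 2, 0, 87],
]
print(mapping_percentages(table9, n_runs=100))  # ([80.0, 92.0, 100.0, 87.0], [])
```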
                          Table 9. Base run bootstrap mapping.
BS-DISP Diagnostics
# of Cases:                  87
Largest Decrease in Q:       -6.846000195
% dQ:                        -0.138746462
# of Decreases in Q:         0
# of Swaps in Best Fit:      1
# of Swaps in DISP:          13
Swaps by Factor:             1  3  4  0

DISP Diagnostics
Error Code:                  0
Largest Decrease in Q:       0
% dQ:                        0
Swaps by Factor:             0  0  0  0

BS Mapping (columns: Boot Factor 1-4)
                  1    2    3    4
Base Factor 1    80    0    0    0
Base Factor 2     8   92    0    0
Base Factor 3     8    6  100   13
Base Factor 4     4    2    0   87
Unmapped          0    0    0    0


[Figure: Isobutane - Run 4 time series; legend: Observed Concentration, Predicted Concentration]
Figure 79. Observed vs. Predicted Time Series for refinery species.


8.3.6   Constrained Model Runs

Because acetylene is a key tracer for motor vehicle exhaust, constraints were used to determine
whether acetylene is strongly associated with the industrial (refinery) source. In the base run,
84 and 14 percent of the acetylene were associated with the gasoline exhaust and refinery
factors, respectively. Acetylene was selected in the refinery factor using toggle constraints and
was constrained using "Pull Down Maximally" with a 1% dQ; acetylene was also constrained in
the gasoline exhaust factor using "Pull Up Maximally" with a 1% dQ.

The base run and constrained run results are shown in Figure 80. The constraints used 0.84%
dQ; acetylene was pulled to zero in the refinery factor (Figure 80, 1) and increased to almost
100% in the gasoline exhaust factor (Figure 80, 2). The small dQ needed to move acetylene
indicates that it is not a firm feature of the refinery factor and that acetylene can be used as a
tracer for gasoline motor vehicle exhaust.

  Figure 80. Percent of species associated with a source (1) and Toggle Species Constraint (2).
  Panels: Constrained Factor Profile - Refinery; Constrained Factor Profile - Gasoline Exhaust.


9.     PMF & Application References

Adhikary, B.; Kulkarni, S.; Dallura, A.; Tang, Y.; Chai, T.; Leung, L.R.; Qian, Y.; Chung, C.E.;
   Ramanathan, V.; Carmichael, G.R. (2008). A regional scale chemical transport modeling of Asian
   aerosols with data assimilation of AOD observations using optimal interpolation technique. Atmos.
   Environ., 42(37): 8600-8615.
Aiken, A.C.; DeCarlo, P.F.; Kroll, J.H.; Worsnop, D.R.; Huffman, J.A.; Docherty, K.S.; Ulbrich, I.M.; Mohr,
   C.; Kimmel, J.R.; Sueper, D.; Sun, Y.; Zhang, Q.; Trimborn, A.; Northway, M.; Ziemann, P.J.;
   Canagaratna, M.R.; Onasch, T.B.; Alfarra, M.R.; Prevot, A.S.H.; Dommen, J.; Duplissy, J.; Metzger,
   A.; Baltensperger, U.; Jimenez, J.L. (2008). O/C and OM/OC ratios of primary, secondary, and
   ambient organic aerosols with high-resolution time-of-flight aerosol mass spectrometry. Environ. Sci.
   Technol., 42(12): 4478-4485.
Aiken, A.C.; Salcedo, D.; Cubison, M.J.; Huffman, J.A.; DeCarlo, P.F.; Ulbrich, I.M.; Docherty, K.S.;
   Sueper, D.; Kimmel, J.R.; Worsnop, D.R.; Trimborn, A.; Northway, M.; Stone, E.A.; Schauer, J.J.;
   Volkamer, R.M.; Fortner, E.; de Foy, B.; Wang, J.; Laskin, A.; Shutthanandan, V.; Zheng, J.; Zhang,
   R.; Gaffney, J.; Marley, N.A.; Paredes-Miranda, G.; Arnott, W.P.; Molina, L.T.; Sosa, G.; Jimenez, J.L.
   (2009). Mexico City aerosol analysis during MILAGRO using high resolution aerosol mass
   spectrometry at the urban supersite (T0) - Part 1: Fine particle composition and organic source
   apportionment. Atmos. Chem. Phys., 9(17): 6633-6653.
Aiken, A.C.; de Foy, B.; Wiedinmyer, C.; DeCarlo, P.F.; Ulbrich, I.M.; Wehrli, M.N.; Szidat, S.; Prevot,
   A.S.H.; Noda, J.; Wacker, L.; Volkamer, R.; Fortner, E.; Wang, J.; Laskin, A.; Shutthanandan, V.;
   Zheng, J.; Zhang, R.; Paredes-Miranda, G.; Arnott, W.P.; Molina, L.T.; Sosa, G.; Querol, X.; Jimenez,
   J.L. (2010). Mexico City aerosol analysis during MILAGRO using high resolution aerosol mass
   spectrometry at the urban supersite (T0) - Part 2: Analysis of the biomass burning contribution and
   the non-fossil carbon fraction. Atmos. Chem. Phys., 10(12): 5315-5341.
Allan, J.D.; Williams, P.I.; Morgan, W.T.; Martin, C.L.; Flynn, M.J.; Lee, J.; Nemitz, E.; Phillips, G.J.;
   Gallagher, M.W.; Coe, H. (2010). Contributions from transport, solid fuel burning and cooking to
   primary organic aerosols in two UK cities. Atmos. Chem. Phys., 10(2): 647-668.
Amato, F.; Pandolfi, M.; Escrig, A.; Querol, X.; Alastuey, A.; Pey, J.; Perez, N.; Hopke, P.K. (2009).
   Quantifying road dust resuspension in urban environment by Multilinear Engine: A comparison with
   PMF2. Atmos. Environ., 43(17): 2770-2780.
Amato, F.; Hopke, P.K. (2012). Source apportionment of the ambient PM2.5 across St. Louis using
   constrained positive matrix factorization. Atmos. Environ., 46: 329-337.
Anderson, M.J.; Miller, S.L.; Milford, J.B. (2001). Source apportionment of exposure to toxic volatile
   organic compounds using positive matrix factorization. J. Expo. Anal. Environ. Epidemiol., 11(4): 295-
   307.
Anderson, M.J.; Daly, E.P.; Miller, S.L.; Milford, J.B. (2002). Source apportionment of exposures to
   volatile organic compounds II. Application of receptor models to TEAM study data. Atmos. Environ.,
   36(22): 3643-3658.
Anttila, P.; Paatero, P.; Tapper, U.; Jarvinen, O. (1994). Application of positive matrix factorization to
   source apportionment: Results of a study of bulk deposition chemistry in Finland. Atmos. Environ., 29:
   1705-1718.
Banta, J.R.; McConnell, J.R.; Edwards, R.; Engelbrecht, J.P. (2008). Delineation of carbonate dust,
   aluminous dust, and sea salt deposition in a Greenland glaciochemical array using positive matrix
   factorization. Geochemistry Geophysics Geosystems, 9
Bari, M.A.; Baumbach, G.; Kuch, B.; Scheffknecht, G. (2009). Wood smoke as a source of particle-phase
   organic compounds in residential areas. Atmos. Environ., 43(31): 4722-4732.
Baumann, K.; Jayanty, R.K.M.; Flanagan, J.B. (2008). Fine particulate matter source apportionment for
   the Chemical Speciation Trends Network site at Birmingham, Alabama, using Positive Matrix
   Factorization. J. Air Waste Manage. Assoc., 58: 27-44.
Begum, B.A.; Kim, E.; Biswas, S.K.; Hopke, P.K. (2004). Investigation of sources of atmospheric aerosol
   at urban and semi-urban areas in Bangladesh. Atmos. Environ., 38(19): 3025-3038.
Begum, B.A.; Biswas, S.K.; Kim, E.; Hopke, P.K.; Khaliquzzaman, M. (2005). Investigation of sources of
   atmospheric aerosol at a hot spot area in Dhaka, Bangladesh. J. Air Waste Manage. Assoc., 55(2):
   227-240.
Begum, B.A.; Hopke, P.K.; Zhao, W.X. (2005). Source identification of fine particles in Washington, DC,
   by expanded factor analysis modeling. Environ. Sci. Technol., 39(4): 1129-1137.
Begum, B.A.; Biswas, S.K.; Hopke, P.K.; Cohen, D.D. (2006). Multi-element analysis and characterization
   of atmospheric particulate pollution in Dhaka. AAQR, 6(4): 334-359. aaqr.org.
Begum, B.A.; Biswas, S.K.; Nasiruddin, M.; Hossain, A.M.S.; Hopke, P.K. (2009). Source identification of
   Chittagong aerosol by receptor modeling. Environmental Engineering Science, 26(3): 679-689.
Begum, B.A.; Biswas, S.K.; Markwitz, A.; Hopke, P.K. (2010). Identification of Sources of Fine and
   Coarse Particulate Matter in Dhaka, Bangladesh. AAQR, 10(4): 345-U1514.
Bhanuprasad, S.G.; Venkataraman, C.; Bhushan, M. (2008). Positive matrix factorization and trajectory
   modelling for source identification: A new look at Indian Ocean Experiment ship observations. Atmos.
   Environ., 42(20): 4836-4852.
Bon, D.M.; Ulbrich, I.M.; de Gouw, J.A.; Warneke, C.; Kuster, W.C.; Alexander, M.L.; Baker, A.;
   Beyersdorf, A.J.; Blake, D.; Fall, R.; Jimenez, J.L.; Herndon, S.C.; Huey, L.G.; Knighton, W.B.; Ortega,
   J.; Springston, S.; Vargas, O. (2011). Measurements of volatile organic compounds at a suburban
   ground site (T1) in Mexico City during the MILAGRO 2006 campaign: measurement comparison,
   emission ratios, and source attribution. Atmos. Chem. Phys., 11(6): 2399-2421.
Brinkman, G.; Vance, G.; Hannigan, M.P.; Milford, J.B. (2006). Use of synthetic data to evaluate positive
   matrix factorization as a source apportionment tool for PM2.5 exposure data. Environ. Sci. Technol.,
   40(6): 1892-1901.
Brown, S.G.; Frankel, A.; Raffuse, S.M.; Roberts, P.T.; Hafner, H.R.; Anderson, D.J. (2007). Source
   apportionment of fine particulate matter in Phoenix, AZ, using positive matrix factorization. J. Air
   Waste Manage. Assoc., 57(6): 741-752.
Brown, S.G.; Frankel, A.; Hafner, H.R. (2007). Source apportionment of VOCs in the Los Angeles area
   using positive matrix factorization. Atmos. Environ., 41(2): 227-237.
Brown, S.G.; Wade, K.S.; Hafner, H.R. (2007). Multivariate receptor modeling workbook. Prepared for
   the U.S. Environmental Protection Agency, Office of Research and Development, Research
   Triangle Park, NC, by Sonoma Technology, Inc., Petaluma, CA, STI-906207.01-3216, August.
Brown, S.G.; Eberly, S.; Paatero, P.; Norris, G.A. (2014). Methods for Estimating Uncertainty in PMF
   Solutions: Examples with Ambient Data, submitted.
Bullock, K.R.; Duvall, R.M.; Norris, G.A.; McDow, S.R.; Hays, M.D. (2008). Evaluation of the CMB and
   PMF models using organic molecular markers in fine particulate matter collected during the Pittsburgh
   Air Quality Study. Atmos. Environ., 42(29): 6897-6904.
Buset, K.C.; Evans, G.J.; Leaitch, W.R.; Brook, J.R.; Toom-Sauntry, D. (2006). Use of advanced receptor
   modelling for analysis of an intensive 5-week aerosol sampling campaign. Atmos. Environ., 40(Suppl.
   2): S482-S499.
Buzcu-Guven, B.; Brown, S.G.; Frankel, A.; Hafner, H.R.; Roberts, P.T. (2007). Analysis and
   apportionment of organic carbon and fine particulate matter sources at multiple sites in the Midwestern
   United States. J. Air Waste Manage. Assoc., 57(5): 606-619.
Buzcu-Guven, B.; Fraser, M.P. (2008). Comparison of VOC emissions inventory data with source
   apportionment results for Houston, TX. Atmos. Environ., 42(20): 5032-5043.
Buzcu, B.; Fraser, M.P. (2006). Source identification and apportionment of volatile organic compounds in
   Houston, TX. Atmos. Environ., 40(13): 2385-2400.
Bzdusek, P.A.; Lu, J.; Christensen, E.R. (2006). PCB congeners and dechlorination in sediments of
   Sheboygan River, Wisconsin, determined by matrix factorization. Environ. Sci. Technol., 40(1): 120-
   129. Available at http://dx.doi.org/10.1021/es050083p.
Chan, Y.C.; Cohen, D.D.; Hawas, O.; Stelcer, E.; Simpson, R.; Denison, L.; Wong, N.; Hodge, M.;
   Comino, E.; Carswell, S. (2008). Apportionment of sources of fine and coarse particles in four major
   Australian cities by positive matrix factorisation. Atmos. Environ., 42(2): 374-389.
Chan, Y.C.; Hawas, O.; Hawker, D.; Vowles, P.; Cohen, D.D.; Stelcer, E.; Simpson, R.; Golding, G.;
   Christensen, E. (2011). Using multiple type composition data and wind data in PMF analysis to
   apportion and locate sources of air pollutants. Atmos. Environ., 45(2): 439-449.
Chand, D.; Hegg, D.A.; Wood, R.; Shaw, G.E.; Wallace, D.; Covert, D.S. (2010). Source attribution of
   climatically important aerosol properties measured at Paposo (Chile) during VOCALS. Atmos. Chem.
   Phys., 10(22): 10789-10801.
Chen, L.-W.A.; Watson, J.G.; Chow, J.C.; Magliano, K.L. (2007). Quantifying PM2.5 source contributions
   for the San Joaquin Valley with multivariate receptor models. Environ. Sci. Technol., 41(8): 2818-
   2826.
Chen, L.-W.A.; Lowenthal, D.H.; Watson, J.G.; Koracin, D.; Kumar, N.; Knipping, E.M.; Wheeler, N.;
   Craig, K.; Reid, S. (2010). Toward effective source apportionment using positive matrix factorization:
   Experiments with simulated PM2.5 data. J. Air Waste Manage. Assoc., 60(1): 43-54.
   http://pubs.awma.org/gsearch/journal/2010/1/10.3155-1047-3289.60.1.43.pdf.
Chen, L.-W.A.; Watson, J.G.; Chow, J.C.; DuBois, D.W.; Herschberger, L. (2011). PM2.5 source
   apportionment: Reconciling receptor models for U.S. non-urban and urban long-term networks. J. Air
   Waste Manage. Assoc., 61(11): 1204-1217.
Cheng, I.; Lu, J.; Song, X.J. (2009). Studies of potential sources that contributed to atmospheric mercury
   in Toronto, Canada. Atmos. Environ., 43(39): 6145-6158.
Cherian, R.; Venkataraman, C.; Kumar, A.; Sarin, M.M.; Sudheer, A.K.; Ramachandran, S. (2010).
   Source identification of aerosols influencing atmospheric extinction: Integrating PMF and PSCF with
   emission inventories and satellite observations. Journal of Geophysical Research-Atmospheres, 115
Chiou, P.; Tang, W.; Lin, C.J.; Chu, H.W.; Tadmor, R.; Ho, T.C. (2008). Atmospheric aerosols over two
   sites in a southeastern region of Texas. Canadian Journal of Chemical Engineering, 86(3): 421-435.
Chiou, P.; Tang, W.; Lin, C.J.; Chu, H.W.; Ho, T.C. (2009). Atmospheric aerosol over a southeastern
   region of Texas: Chemical composition and possible sources. Environ. Mon. Assess., 14(3): 333-350.
Chiou, P.; Tang, W.; Lin, C.J.; Chu, H.W.; Ho, T.C. (2009). Comparison of atmospheric aerosols between
   two sites over Golden Triangle of Texas. International Journal of Environmental Research, 3(2): 253-
   270.
Choi, E.; Heo, J.B.; Hopke, P.K.; Jin, B.B.; Yi, S.M. (2011). Identification, apportionment, and
   photochemical reactivity of non-methane hydrocarbon sources in Busan, Korea. Water Air and Soil
   Pollution, 215(1-4): 67-82.
Choi, H.W.; Hwang, I.J.; Kim, S.D.; Kim, D.S. (2004). Determination of source contribution based on
   aerosol number and mass concentration in the Seoul subway stations. J. Korean Society for Atmos.
   Environ., 20(1): 17-31.
Christensen, W.F.; Schauer, J.J. (2008). Impact of species uncertainty perturbation on the solution
   stability of positive matrix factorization of atmospheric particulate matter data. Environ. Sci. Technol.,
   42(16): 6015-6021.
Chueinta, W.; Hopke, P.K.; Paatero, P. (2000). Investigation of sources of atmospheric aerosol at urban
   and suburban residential areas in Thailand by positive matrix factorization. Atmos. Environ., 34(20):
   3319-3329.
Chueinta, W.; Hopke, P.K.; Paatero, P. (2004). Multilinear model for spatial pattern analysis of the
   measurement of haze and visual effects project. Environ. Sci. Technol., 38(2): 544-554.
Cohen, D.D.; Crawford, J.; Stelcer, E.; Bac, V.T. (2010). Characterisation and source apportionment of
   fine particulate sources at Hanoi from 2001 to 2008. Atmos. Environ., 44(3): 320-328.
Coutant, B.W.; Kelly, T.; Ma, J.; Scott, B.; Wood, B.; Main, H.H. (2002). Source Apportionment Analysis of
   Air Quality Data: Phase 1 - Final Report, prepared by Mid-Atlantic Regional Air Management Assoc.,
   Baltimore, MD, http://www.marama.org/visibility/SA_report/
Cuccia, E.; Bernardoni, V.; Massabo, D.; Prati, P.; Valli, G.; Vecchi, R. (2010). An alternative way to
   determine the size distribution of airborne particulate matter. Atmos. Environ., 44(27): 3304-3313.
DeCarlo, P.F.; Ulbrich, I.M.; Crounse, J.; de Foy, B.; Dunlea, E.J.; Aiken, A.C.; Knapp, D.; Weinheimer,
   A.J.; Campos, T.; Wennberg, P.O.; Jimenez, J.L. (2010). Investigation of the sources and processing
   of organic aerosol over the Central Mexican Plateau from aircraft measurements during MILAGRO.
   Atmos. Chem. Phys., 10(12): 5257-5280.
Dogan, G.; Gullu, G.; Tuncel, G. (2008). Sources and source regions effecting the aerosol composition of
   the Eastern Mediterranean. Microchemical Journal, 88(2): 142-149.
Dreyfus, M.A.; Adou, K.; Zucker, S.M.; Johnston, M.V. (2009). Organic aerosol source apportionment
   from highly time-resolved molecular composition measurements. Atmos. Environ., 43(18): 2901-2910.
Du, S.; Belton, T.J.; Rodenburg, L.A. (2008). Source apportionment of polychlorinated biphenyls in the
   tidal Delaware River. Environ. Sci. Technol., 42(11): 4044-4051.
Du, S.; Wall, S.J.; Cacia, D.; Rodenburg, L.A. (2009). Passive air sampling for polychlorinated biphenyls
   in the Philadelphia metropolitan area. Environ. Sci. Technol., 43(5): 1287-1292.
Du, S.Y.; Rodenburg, L.A. (2007). Source identification of atmospheric PCBs in Philadelphia/Camden
   using positive matrix factorization followed by the potential source contribution function. Atmos.
   Environ., 41: 8596-8608.
Dutton, S.J.; Vedal, S.; Piedrahita, R.; Milford, J.B.; Miller, S.L.; Hannigan, M.P. (2010). Source
   apportionment using positive matrix factorization on daily measurements of inorganic and organic
   speciated PM2.5. Atmos. Environ., 44(23): 2731-2741.
Eatough, D.J.; Anderson, R.R.; Martello, D.V.; Modey, W.K.; Mangelson, N.E. (2006). Apportionment of
   ambient primary and secondary PM2.5 during a 2001 summer intensive study at the NETL Pittsburgh
   site using PMF2 and EPA UNMIX. Aerosol Sci. Technol., 40(10): 925-940.
Eatough, D.J.; Mangelson, N.F.; Anderson, R.R.; Martello, D.V.; Pekney, N.J.; Davidson, C.I.; Modey,
   W.K. (2007). Apportionment of ambient primary and secondary fine particulate matter during a 2001
   summer intensive study at the CMU supersite and NETL Pittsburgh site. J. Air Waste Manage. Assoc.,
   57(10): 1251-1267.
Eatough, D.J.; Grover, B.D.; Woolwine, W.R.; Eatough, N.L.; Long, R.; Farber, R. (2008). Source
   apportionment of 1 h semi-continuous data during the 2005 Study of Organic Aerosols in Riverside
   (SOAR) using positive matrix factorization. Atmos. Environ., 42(11): 2706-2719.
Eatough, D.J.; Farber, R. (2009). Apportioning visibility degradation to sources of PM2.5 using positive
   matrix factorization. J. Air Waste Manage. Assoc., 59(9): 1092-1110.
Eberly, S.I. (2005). EPA PMF 1.1 User's Guide, prepared by U.S. Environmental Protection Agency,
   Research Triangle Park, NC.
Engel-Cox, J.A.; Weber, S.A. (2007). Compilation and assessment of recent positive matrix factorization
   and UNMIX receptor model studies on fine particulate matter source apportionment for the eastern
   United States. J. Air Waste Manage. Assoc., 57(11): 1307-1316.
Escrig, A.; Monfort, E.; Celades, I.; Querol, X.; Amato, F.; Minguillon, M.C.; Hopke, P.K. (2009).
   Application of optimally scaled target factor analysis for assessing source contribution of ambient
   PM10. J. Air Waste Manage. Assoc., 59(11): 1296-1307.
Favez, O.; El Haddad, I.; Piot, C.; Boreave, A.; Abidi, E.; Marchand, N.; Jaffrezo, J.L.; Besombes, J.L.;
   Personnaz, M.B.; Sciare, J.; Wortham, H.; George, C.; D'Anna, B. (2010). Intercomparison of source
   apportionment models for the estimation of wood burning aerosols during wintertime in an Alpine city
   (Grenoble, France). Atmos. Chem. Phys., 10(12): 5295-5314.
Friend, A.J.; Ayoko, G.A. (2009). Multi-criteria ranking and source apportionment of fine particulate matter
   in Brisbane, Australia. Environmental Chemistry, 6(5): 398-406.
Friend, A.J.; Ayoko, G.A.; Elbagir, S.G. (2011). Source apportionment of fine particles at a suburban site
   in Queensland, Australia. Environmental Chemistry, 8(2): 163-173.
Fry, J.L.; Kiendler-Scharr, A.; Rollins, A.W.; Brauers, T.; Brown, S.S.; Dorn, H.P.; Dube, W.P.; Fuchs, H.;
   Mensah, A.; Rohrer, F.; Tillmann, R.; Wahner, A.; Wooldridge, P.J.; Cohen, R.C. (2011). SOA from
   limonene: Role of NO3 in its generation and degradation. Atmos. Chem. Phys., 11(8): 3879-3894.
Fujita, E.M. (2001). Hydrocarbon source apportionment for the 1996 Paso del Norte Ozone Study. Sci.
   Total Environ., 276: 171-184.
Furusjo, E.; Sternbeck, J.; Cousins, A.P. (2007). PM10 source characterization at urban and highway
   roadside locations. Sci. Total Environ., 387: 206-219.
Gaimoz, C.; Sauvage, S.; Gros, V.; Herrmann, F.; Williams, J.; Locoge, N.; Perrussel, O.; Bonsang, B.;
   d'Argouges, O.; Sarda-Esteve, R.; Sciare, J. (2011). Volatile organic compounds sources in Paris in
   spring 2007. Part II: source apportionment using positive matrix factorisation. Environmental
   Chemistry, 8(1): 91-103.
Gao, N.; Gildemeister, A.E.; Krumhansl, K.; Lafferty, K.; Hopke, P.K.; Kim, E.; Poirot, R.L. (2006).
   Sources of fine particulate species in ambient air over Lake Champlain Basin, VT. J. Air Waste
   Manage. Assoc., 56(11): 1607-1620.
Gietl, J.K.; Klemm, O. (2009). Source identification of size-segregated aerosol in Munster, Germany, by
   factor analysis. Aerosol Sci. Technol., 43(8): 828-837.
Gilardoni, S.; Vignati, E.; Marmer, E.; Cavalli, F.; Belis, C.; Gianelle, V.; Loureiro, A.; Artaxo, P. (2011).
   Sources of carbonaceous aerosol in the Amazon basin. Atmos. Chem. Phys., 11(6): 2747-2764.
Gildemeister, A.E.; Hopke, P.K.; Kim, E. (2007). Sources of fine urban particulate matter in Detroit, MI.
   Chemosphere, 69: 1064-1074.
Gong, F.; Wang, B.T.; Fung, Y.S.; Chau, F.T. (2005). Chemometric characterization of the quality of the
   atmospheric environment in Hong Kong. Atmos. Environ., 39(34): 6388-6397.
Grahame, T.; Hidy, G.M. (2007). Pinnacles and pitfalls for source apportionment of potential health
   effects from airborne particle exposure. Inhal. Toxicol., 19(9): 727-744.
Gratz, I.E.; Keeler, G.J. (2011). Sources of mercury in precipitation to Underhill, VT. Atmos. Environ.,
   45(31): 5440-5449.
Green, M.C.; Xu, J. (2007). Causes of haze in the Columbia River Gorge. J. Air Waste Manage. Assoc.,
   57(8): 947-958.
Grover, B.D.; Eatough, D.J. (2008). Source apportionment of one-hour semi-continuous data using
   positive matrix factorization with total mass (nonvolatile plus semi-volatile) measured by the R&P
   FDMS monitor. Aerosol Sci. Technol., 42(1): 28-39.
Gu, J.W.; Pitz, M.; Schnelle-Kreis, J.; Diemer, J.; Reller, A.; Zimmermann, R.; Soentgen, J.; Stoelzel, M.;
   Wichmann, H.E.; Peters, A.; Cyrys, J. (2011). Source apportionment of ambient particles: Comparison
   of positive matrix factorization analysis applied to particle size distribution and chemical composition
   data. Atmos. Environ., 45(10): 1849-1857.
Hagler, G.S.W.; Bergin, M.H.; Salmon, L.G.; Yu, J.Z.; Wan, E.C.H.; Zheng, M.; Zeng, L.M.; Kiang, C.S.;
   Zhang, Y.H.; Schauer, J.J. (2007). Local and regional anthropogenic influence on PM2.5 elements in
   Hong Kong. Atmos. Environ., 41(28): 5994-6004.
Hammond, D.M.; Dvonch, J.T.; Keeler, G.J.; Parker, E.A.; Kamal, A.S.; Barres, J.A.; Yip, F.Y.; Brakefield-
   Caldwell, W. (2008). Sources of ambient fine particulate matter at two community sites in Detroit,
   Michigan. Atmos. Environ., 42(4): 720-732.
Han, J.S.; Moon, K.J.; Kim, Y.J. (2006). Identification of potential sources and source regions of fine
   ambient particles measured at Gosan background site in Korea using advanced hybrid receptor model
   combined with positive matrix factorization. Journal of Geophysical Research-Atmospheres,
   111(D22).
Han, J.S.; Moon, K.J.; Lee, S.J.; Kim, Y.J.; Ryu, S.Y.; Cliff, S.S.; Yi, S.M. (2006). Size-resolved source
   apportionment of ambient particles by positive matrix factorization at Gosan background site in East
   Asia. Atmos. Chem. Phys., 6: 211-223.
Harrison, R.M.; Beddows, D.C.S.; Dall'Osto, M. (2011). PMF analysis of wide-range particle size spectra
   collected on a major highway. Environ. Sci. Technol., 45(13): 5522-5528.
Hawkins, L.N.; Russell, L.M.; Covert, D.S.; Quinn, P.K.; Bates, T.S. (2010). Carboxylic acids, sulfates,
   and organosulfates in processed continental organic aerosol over the southeast Pacific Ocean during
   VOCALS-REx 2008. Journal of Geophysical Research-Atmospheres, 115
Healy, R.M.; Hellebust, S.; Kourtchev, I.; Allanic, A.; O'Connor, I.P.; Bell, J.M.; Healy, D.A.; Sodeau, J.R.;
   Wenger, J.C. (2010). Source apportionment of PM2.5 in Cork Harbour, Ireland using a combination of
   single particle mass spectrometry and quantitative semi-continuous measurements. Atmos. Chem.
   Phys., 10(19): 9593-9613.
Hedberg, E.; Gidhagen, L.; Johansson, C. (2005). Source contributions to PM10 and arsenic
   concentrations in Central Chile using positive matrix factorization. Atmos. Environ., 39(3): 549-561.
Hegg, D.A.; Warren, S.G.; Grenfell, T.C.; Doherty, S.J.; Larson, T.V.; Clarke, A.D. (2009). Source
   attribution of black carbon in Arctic snow. Environ. Sci. Technol., 43(11): 4016-4021.
Hegg, D.A.; Warren, S.G.; Grenfell, T.C.; Doherty, S.J.; Clarke, A.D. (2010). Sources of light-absorbing
   aerosol in Arctic snow and their seasonal variation. Atmos. Chem. Phys., 10(22): 10923-10938.
Hellebust, S.; Allanic, A.; O'Connor, I.P.; Wenger, J.C.; Sodeau, J.R. (2010). The use of real-time
   monitoring data to evaluate major sources of airborne particulate matter. Atmos. Environ., 44(8):
   1116-1125.
Hemann, J.G.; Brinkman, G.L.; Dutton, S.J.; Hannigan, M.P.; Milford, J.B.; Miller, S.L. (2009). Assessing
   positive matrix factorization model fit: a new method to estimate uncertainty and bias in factor
   contributions at the measurement time scale. Atmos. Chem. Phys., 9(2): 497-513.
Henry, R.C. (2002). Multivariate receptor models - Current practice and future trends. Chemom. Intell.
   Lab. Sys., 60(1-2): 43-48. doi:10.1016/S0169-7439(01)00184-8.
Henry, R.C.; Christensen, E.R. (2010). Selecting an appropriate multivariate source apportionment model
   result. Environ. Sci. Technol., 44(7): 2474-2481.
Heo, J.B.; Hopke, P.K.; Yi, S.M. (2009). Source apportionment of PM2.5 in Seoul, Korea. Atmos. Chem.
   Phys., 9(14): 4957-4971.
Hersey, S.P.; Craven, J.S.; Schilling, K.A.; Metcalf, A.R.; Sorooshian, A.; Chan, M.N.; Flagan, R.C.;
   Seinfeld, J.H. (2011). The Pasadena Aerosol Characterization Observatory (PACO): chemical and
   physical analysis of the western Los Angeles basin aerosol. Atmos. Chem. Phys., 11(15): 7417-7443.
Hien, P.D.; Bac, V.T.; Thinh, N.T.H. (2004). PMF receptor modelling of fine and coarse PM10 in air
   masses governing monsoon conditions in Hanoi, northern Vietnam. Atmos. Environ., 38(2): 189-201.
Hien, P.D.; Bac, V.T.; Thinh, N.T.H. (2005). Investigation of sulfate and nitrate formation on mineral dust
   particles by receptor modeling. Atmos. Environ., 39(38): 7231-7239.
Hodzic, A.; Jimenez, J.L.; Madronich, S.; Canagaratna, M.R.; DeCarlo, P.F.; Kleinman, L.; Fast, J. (2010).
   Modeling organic aerosols in a megacity: potential contribution of semi-volatile and intermediate
   volatility primary organic compounds to secondary organic aerosol formation. Atmos. Chem. Phys.,
   10(12): 5491-5514.
Hopke, P.K.; Xie, Y.L.; Paatero, P. (1999). Mixed multiway analysis of airborne particle composition data.
   J. Chemometrics, 13: 343-352.
Hopke, P.K. (2000). A Guide to Positive Matrix Factorization, prepared by Clarkson University, Clarkson
   University-Department of Chemistry.
Hopke, P.K.; Ramadan, Z.; Paatero, P.; Norris, G.A.; Landis, M.S.; Williams, R.W.; Lewis, C.W. (2003).
   Receptor modeling of ambient and personal exposure samples: 1998 Baltimore Particulate Matter
   Epidemiology-Exposure Study. Atmos. Environ., 37(23): 3289-3302.
   doi: 10.1016/S1352-2310(03)00331-5.
Hopke, P.K.; Ito, K.; Mar, T.; Christensen, W.F.; Eatough, D.J.; Henry, R.C.; Kim, E.; Laden, F.; Lall, R.;
   Larson, T.V.; Liu, H.; Neas, L.; Pinto, J.; Stolzel, M.; Suh, H.; Paatero, P.; Thurston, G.D. (2006). PM
   source apportionment and health effects: 1. Intercomparison of source apportionment results. J.
   Expo. Anal. Environ. Epidemiol., 16: 275-286. doi: 10.1038/sj.jea.7500458.
Hopke, P.K. (2010). Discussion of "Sensitivity of a molecular marker based positive matrix factorization
   model to the number of receptor observations" by YuanXun Zhang, Rebecca J. Sheesley, Min-Suk
   Bae and James J. Schauer. Atmos. Environ., 44(8): 1138.
Hu, D.; Bian, Q.J.; Lau, A.K.H.; Yu, J.Z. (2010). Source apportioning of primary and secondary organic
   carbon in summer PM2.5 in Hong Kong using positive matrix factorization of secondary and primary
   organic tracer data. Journal of Geophysical Research-Atmospheres, 115
Hu, S.H.; McDonald, R.; Martuzevicius, D.; Biswas, P.; Grinshpun, S.A.; Kelley, A.; Reponen, T.; Lockey,
   J.; LeMasters, G. (2006). UNMIX modeling of ambient PM2.5 near an interstate highway in Cincinnati,
   OH, USA. Atmos. Environ., 40(Suppl. 2): S378-S395.
Huang, S.L.; Arimoto, R.; Rahn, K.A. (2001). Sources and source variations for aerosol at Mace Head,
   Ireland. Atmos. Environ., 35(8): 1421-1437.
Huang, X.F.; Yu, J.Z.; He, L.Y.; Yuan, Z.B. (2006). Water-soluble organic carbon and oxalate in aerosols
   at a coastal urban site in China: Size distribution characteristics, sources, and formation mechanisms.
   Journal of Geophysical Research-Atmospheres, 111(D22).
Huang, X.F.; Yu, J.Z.; Yuan, Z.B.; Lau, A.K.H.; Louie, P.K.K. (2009). Source analysis of high particulate
   matter days in Hong Kong. Atmos. Environ., 43(6): 1196-1203.
Huang, X.F.; Zhao, Q.B.; He, L.Y.; Hu, M.; Bian, Q.J.; Xue, L.A.; Zhang, Y.H. (2010). Identification of
   secondary organic aerosols based on aerosol mass spectrometry. Science China-Chemistry, 53(12):
   2593-2599.
Huang, X.F.; He, L.Y.; Hu, M.; Canagaratna, M.R.; Kroll, J.H.; Ng, N.L.; Zhang, Y.H.; Lin, Y.; Xue, L.; Sun,
   T.L.; Liu, X.G.; Shao, M.; Jayne, J.T.; Worsnop, D.R. (2011). Characterization of submicron aerosols
   at a rural site in Pearl River Delta of China using an Aerodyne High-Resolution Aerosol Mass
   Spectrometer. Atmos. Chem. Phys., 11(5): 1865-1877.
Hubble, M. (2000). Phoenix Source Apportionment Studies: Positive Matrix Factorization (PMF) and
   Unmix Applications for PM2.5 Source Apportionment, prepared by Arizona Department of
   Environmental Quality, Arizona Department of Environmental Quality-Phoenix, AZ.
Huffman, J.A.; Docherty, K.S.; Aiken, A.C.; Cubison, M.J.; Ulbrich, I.M.; DeCarlo, P.F.; Sueper, D.; Jayne,
   J.T.; Worsnop, D.R.; Ziemann, P.J.; Jimenez, J.L. (2009). Chemically-resolved aerosol volatility
   measurements from two megacity field studies. Atmos. Chem. Phys., 9(18): 7161-7182.
Hwang, I.; Hopke, P.K. (2006). Comparison of source apportionments of fine particulate matter at two
   San Jose Speciation Trends Network sites. J. Air Waste Manage. Assoc., 56(9): 1287-1300.
Hwang, I.; Hopke, P.K. (2007). Estimation of source apportionment and potential source locations of
   PM2.5 at a west coastal IMPROVE site. Atmos. Environ., 41(3): 506-518.
Hwang, I.; Hopke, P.K.; Pinto, J.P. (2008). Source apportionment and spatial distributions of coarse
   particles during the Regional Air Pollution Study. Environ. Sci. Technol., 42(10): 3524-3530.
Hwang, I.J.; Bong, C.K.; Lee, T.J.; Kim, D.S. (2002). Source identification and quantification of coarse
   and fine particles by TTFA and PMF. J. Korean Society for Atmos. Environ., 18(E4): 203-213.
Hwang, I.J.; Kim, D.S. (2003). Estimation of quantitative source contribution of ambient PM10 using the
   PMF model. J. Korean Society for Atmos. Environ., 19(6): 719-731.
Iijima, A.; Tago, H.; Kumagai, K.; Kato, M.; Kozawa, K.; Sato, K.; Furuta, N. (2008). Regional and
   seasonal characteristics of emission sources of fine airborne particulate matter collected in the center
   and suburbs of Tokyo, Japan as determined by multielement analysis and source receptor models. J.
   Environ. Monit., 10(9): 1025-1032.
Ito, K.; Xue, N.; Thurston, G. (2004). Spatial variation of PM2.5 chemical species and source-apportioned
   mass concentrations in New York City. Atmos. Environ., 38(31): 5269-5282.
Jacobson, M.Z.; Kaufman, Y.J. (2006). Wind reduction by aerosol particles. Geophys. Res. Lett., 33(24).
Jaeckels, J.M.; Bae, M.S.; Schauer, J.J. (2007). Positive matrix factorization (PMF) analysis of molecular
   marker measurements to quantify the sources of organic aerosols. Environ. Sci. Technol., 41(16):
   5763-5769.
Jagoda, C.A.; Chambers, S.; David, D.C.; Dyer, L.; Wang, T.; Zahorowski, W. (2007). Receptor modelling
   using positive matrix factorisation, back trajectories and radon-222. Atmos. Environ., 41(32): 6823-
   6837.
Jeong, C.H.; Evans, G.J.; Dann, T.; Graham, M.; Herod, D.; Dabek-Zlotorzynska, E.; Mathieu, D.; Ding, L.;
   Wang, D. (2008). Influence of biomass burning on wintertime fine particulate matter: Source
   contribution at a valley site in rural British Columbia. Atmos. Environ., 42(16): 3684-3699.
Jia, Y.L.; Clements, A.L.; Fraser, M.P. (2010). Saccharide composition in atmospheric particulate matter
   in the southwest US and estimates of source contributions. J. Aerosol Sci., 41(1): 62-73.
Jia, Y.L.; Fraser, M. (2011). Characterization of saccharides in size-fractionated ambient particulate
   matter and aerosol sources: The contribution of Primary Biological Aerosol Particles (PBAPs) and soil
   to ambient particulate matter. Environ. Sci. Technol., 45(3): 930-936.
Jimenez, J.; Wu, C.F.; Claiborn, C.; Gould, T.; Simpson, C.D.; Larson, T.; Liu, L.J.S. (2006). Agricultural
   burning smoke in eastern Washington - part 1: Atmospheric characterization. Atmos. Environ., 40(4):
   639-650.
Johnson, K.S.; de Foy, B.; Zuberi, B.; Molina, L.T.; Molina, M.J.; Xie, Y.; Laskin, A.; Shutthanandan, V.
   (2006). Aerosol composition and source apportionment in the Mexico City Metropolitan Area with
   PIXE/PESA/STIM and multivariate analysis. Atmos. Chem. Phys., 6(12): 4591-4600.
Jorquera, H.; Rappengluck, B. (2004). Receptor modeling of ambient VOC at Santiago, Chile. Atmos.
   Environ., 38(25): 4243-4263.
Junninen, H.; Monster, J.; Rey, M.; Cancelinha, J.; Douglas, K.; Duane, M.; Forcina, V.; Muller, A.; Lagler,
   F.; Marelli, L.; Borowiak, A.; Niedzialek, J.; Paradiz, B.; Mira-Salama, D.; Jimenez, J.; Hansen, U.;
   Astorga, C.; Stanczyk, K.; Viana, M.; Querol, X.; Duvall, R.M.; Norris, G.A.; Tsakovski, S.; Wahlin, P.;
   Horak, J.; Larsen, B.R. (2009). Quantifying the impact of residential heating on the urban air quality in
   a typical European coal combustion region. Environ. Sci. Technol., 43(20): 7964-7970.
Juntto, S.; Paatero, P. (1994). Analysis of daily precipitation data by positive matrix factorization.
   Environmetrics, 5: 127-144.
Juvela, M.; Lehtinen, K.; Paatero, P. (1996). The use of positive matrix factorization in the analysis of
   molecular line spectra. Mon. Not. R. Astron. Soc., 280(2)
Karanasiou, A.; Moreno, T.; Amato, F.; Lumbreras, J.; Narros, A.; Borge, R.; Tobias, A.; Boldo, E.;
   Linares, C.; Pey, J.; Reche, C.; Alastuey, A.; Querol, X. (2011). Road dust contribution to PM levels -
   Evaluation of the effectiveness of street washing activities by means of Positive Matrix Factorization.
   Atmos. Environ., 45(13): 2193-2201.
Karanasiou, A.A.; Siskos, P.A.; Eleftheriadis, K. (2009). Assessment of source apportionment by Positive
   Matrix Factorization analysis on fine and coarse urban aerosol size fractions. Atmos. Environ., 43(21):
   3385-3395.
Karnae, S.; Kuruvilla, J. (2011). Source apportionment of fine particulate matter measured in an
   industrialized coastal urban area of South Texas. Atmos. Environ., 45(23): 3769-3776.
Kasumba, J.; Hopke, P.K.; Chalupa, D.C.; Utell, M.J. (2009). Comparison of sources of submicron particle
   number concentrations measured at two sites in Rochester, NY. Sci. Total Environ., 407(18): 5071-
   5084.
Ke, L.; Liu, W.; Wang, Y.; Russell, A.G.; Edgerton, E.S.; Zheng, M. (2008). Comparison of PM2.5 source
   apportionment using positive matrix factorization and molecular marker-based chemical mass balance.
   Sci. Total Environ., 394(2-3): 290-302.
Keeler, G.J.; Landis, M.S.; Norris, G.A.; Christiansen, E.M.; Dvonch, J.T. (2006). Sources of mercury wet
   deposition in Eastern Ohio, USA. Environ. Sci. Technol., 40(19): 5874-5881.
Kertesz, Z.; Szoboszlai, Z.; Angyal, A.; Dobos, E.; Borbely-Kiss, I. (2010). Identification and
   characterization of fine and coarse particulate matter sources in a middle-European urban
   environment. Nuclear Instruments & Methods in Physics Research Section B-Beam Interactions with
   Materials and Atoms, 268(11-12): 1924-1928.
Kim, E.; Hopke, P.K.; Paatero, P.; Edgerton, E.S. (2003). Incorporation of parametric factors into
   multilinear receptor model studies of Atlanta aerosol. Atmos. Environ., 37(36): 5009-5021.
Kim, E.; Hopke, P.K.; Edgerton, E.S. (2003). Source identification of Atlanta aerosol by positive matrix
   factorization. J. Air Waste Manage. Assoc., 53(6): 731-739.
Kim, E.; Larson, T.V.; Hopke, P.K.; Slaughter, C.; Sheppard, I.E.; Claiborn, C. (2003). Source
   identification of PM2.5 in an arid northwest U.S. city by positive matrix factorization. Atmos. Res., 66:
   291-305.
Kim, E.; Hopke, P.K.; Larson, T.V.; Covert, D.S. (2004). Analysis of ambient particle size distributions
   using UNMIX and positive matrix factorization. Environ. Sci. Technol., 38(1): 202-209.
Kim, E.; Hopke, P.K. (2004). Comparison between conditional probability function and nonparametric
   regression for fine particle source directions. Atmos. Environ., 38(28): 4667-4673.
Kim, E.; Hopke, P.K.; Larson, T.V.; Maykut, N.N.; Lewtas, J. (2004). Factor analysis of Seattle fine
   particles. Aerosol Sci. Technol., 38(7): 724-738.
Kim, E.; Hopke, P.K.; Edgerton, E.S. (2004). Improving source identification of Atlanta aerosol using
   temperature resolved carbon fractions in positive matrix factorization. Atmos. Environ., 38(20): 3349-
   3362.
Kim, E.; Hopke, P.K. (2004). Improving source identification of fine particles in a rural northeastern US
   area utilizing temperature-resolved carbon fractions. Journal of Geophysical Research-Atmospheres,
   109(D09204): 1-13. doi:10.1029/2003JD004199.
Kim, E.; Hopke, P.K. (2004). Source apportionment of fine particles at Washington, DC, utilizing
   temperature-resolved carbon fractions. J. Air Waste Manage. Assoc., 54(7): 773-785.
Kim, E.; Brown, S.G.; Hafner, H.R.; Hopke, P.K. (2005). Characterization of non-methane volatile organic
   compounds sources in Houston during 2001 using positive matrix factorization. Atmos. Environ.,
   39(32): 5934-5946.
Kim, E.; Hopke, P.K. (2005). Identification of fine particle sources in mid-Atlantic US area. Water Air and
   Soil Pollution, 168(1-4): 391-421.
Kim, E.; Hopke, P.K. (2005). Improving source apportionment of fine particles in the eastern United
   States utilizing temperature-resolved carbon fractions. J. Air Waste Manage. Assoc., 55(10): 1456-
   1463.
Kim, E.; Hopke, P.K.; Kenski, D.M.; Koerber, M. (2005). Sources of fine particles in a rural Midwestern US
   area. Environ. Sci. Technol., 39(13): 4953-4960.
Kim, E.; Hopke, P.K.; Pinto, J.P.; Wilson, W.E. (2005). Spatial variability of fine particle mass,
   components, and source contributions during the Regional Air Pollution Study in St. Louis. Environ.
   Sci. Technol., 39(11): 4172-4179.
Kim, E.; Hopke, P.K. (2006). Characterization of fine particle sources in the Great Smoky Mountains area.
   Sci. Total Environ., 368(2-3): 781-794.
Kim, E.; Hopke, P.K. (2007). Comparison between sample-species specific uncertainties and estimated
   uncertainties for the source apportionment of the speciation trends network data. Atmos. Environ.,
   41(3): 567-575.
Kim, E.; Hopke, P.K. (2007). Source identifications of airborne fine particles using positive matrix
   factorization and U.S. Environmental Protection Agency positive matrix factorization. J. Air Waste
   Manage. Assoc., 57(7): 811-819.
Kim, E.; Hopke, P.K. (2008). Source characterization of ambient fine particles at multiple sites in the
   Seattle area. Atmos. Environ., 42(24): 6047-6056.
Kim, E.; Turkiewicz, K.; Zulawnick, S.A.; Magliano, K.L. (2010). Sources of fine particles in the South
   Coast area, California. Atmos. Environ., 44(26): 3095-3100.
Kim, M.; Deshpande, S.R.; Crist, K.C. (2007). Source apportionment of fine particulate matter (PM2.5) at a
   rural Ohio River Valley site. Atmos. Environ., 41: 9231-9243.
Lambe, A.T.; Logue, J.M.; Kreisberg, N.M.; Hering, S.V.; Worton, D.R.; Goldstein, A.H.; Donahue, N.M.;
   Robinson, A.L. (2009). Apportioning black carbon to sources using highly time-resolved ambient
   measurements of organic molecular markers in Pittsburgh. Atmos. Environ., 43(25): 3941-3950.
Lan, Z.J.; Chen, D.L.; Li, X.A.; Huang, X.F.; He, L.Y.; Deng, Y.G.; Feng, N.; Hu, M. (2011). Modal
   characteristics of carbonaceous aerosol size distribution in an urban atmosphere of South China.
   Atmos. Res., 100(1): 51-60.
Lanz, V.A.; Alfarra, M.R.; Baltensperger, U.; Buchmann, B.; Hueglin, C.; Prevot, A.S.H. (2007). Source
   apportionment of submicron organic aerosols at an urban site by factor analytical modelling of aerosol
   mass spectra. Atmos. Chem. Phys., 7(6): 1503-1522.
Lanz, V.A.; Hueglin, C.; Buchmann, B.; Hill, M.; Locher, R.; Staehelin, J.; Reimann, S. (2008). Receptor
   modeling of C2-C7 hydrocarbon sources at an urban background site in Zurich, Switzerland:
   changes between 1993-1994 and 2005-2006. Atmos. Chem. Phys., 8(9): 2313-2332.
Lanz, V.A.; Henne, S.; Staehelin, J.; Hueglin, C.; Vollmer, M.K.; Steinbacher, M.; Buchmann, B.;
   Reimann, S. (2009). Statistical analysis of anthropogenic non-methane VOC variability at a European
   background location (Jungfraujoch, Switzerland). Atmos. Chem. Phys., 9(10): 3445-3459.
Lanz, V.A.; Prevot, A.S.H.; Alfarra, M.R.; Weimer, S.; Mohr, C.; DeCarlo, P.F.; Gianini, M.F.D.; Hueglin,
   C.; Schneider, J.; Favez, O.; D'Anna, B.; George, C.; Baltensperger, U. (2010). Characterization of
   aerosol chemical composition with aerosol mass spectrometry in Central Europe: An overview.
   Atmos. Chem. Phys., 10(21): 10453-10471.
Lapina, K.; Paterson, K.G. (2004). Assessing source characteristics of PM2.5 in the eastern United States
   using positive matrix factorization. J. Air Waste Manage. Assoc., 54(9): 1170-1174.
Larsen, R.K., III; Baker, J.E. (2003). Source apportionment of polycyclic aromatic hydrocarbons in the
   urban atmosphere: A comparison of three methods. Environ. Sci. Technol., 37: 1873-1881.
Larson, T.; Gould, T.; Simpson, C.; Liu, L.J.S.; Claiborn, C.; Lewtas, J. (2004). Source apportionment of
   indoor, outdoor, and personal PM2.5 in Seattle, WA, using positive matrix factorization. J. Air Waste
   Manage. Assoc., 54(9): 1175-1187.
Larson, T.V.; Covert, D.S.; Kim, E.; Elleman, R.; Schreuder, A.B.; Lumley, T. (2006). Combining size
   distribution and chemical species measurements into a multivariate receptor model of PM2.5. Journal
   of Geophysical Research-Atmospheres, 111(D10): D10S09. doi:10.1029/2005JD006285.
Latella, A.; Stani, G.; Cobelli, L.; Duane, M.; Junninen, H.; Astorga, C.; Larsen, B.R. (2005).
   Semicontinuous GC analysis and receptor modelling for source apportionment of ozone precursor
   hydrocarbons in Bresso, Milan, 2003. J. Chromatogr. A, 1071(1-2): 29-39.
Laupsa, H.; Denby, B.; Larssen, S.; Schaug, J. (2009). Source apportionment of particulate matter (PM2.5)
   in an urban area using dispersion, receptor and inverse modelling. Atmos. Environ., 43(31): 4733-
   4744.
Lee, E.; Chan, C.K.; Paatero, P. (1999). Application of positive matrix factorization in source
   apportionment of particulate pollutants in Hong Kong. Atmos. Environ., 33(19): 3201-3212.
Lee, J.H.; Yoshida, Y.; Turpin, B.J.; Hopke, P.K.; Poirot, R.L.; Lioy, P.J.; Oxley, J.C. (2002). Identification
   of sources contributing to mid-Atlantic regional aerosol. J. Air Waste Manage. Assoc., 52(10): 1186-
   1205.
Lee, J.H.; Gigliotti, C.L.; Offenberg, J.H.; Eisenreich, S.J.; Turpin, B.J. (2004). Sources of polycyclic
   aromatic hydrocarbons to the Hudson River Airshed. Atmos. Environ., 38(35): 5971-5981.
Lee, J.H.; Hopke, P.K. (2006). Apportioning sources of PM2.5 in St. Louis, MO using speciation trends
   network data. Atmos. Environ., 40(Suppl. 2): S360-S377.
Lee, J.H.; Hopke, P.K.; Turner, J.R. (2006). Source identification of airborne PM2.5 at the St. Louis-
   Midwest Supersite. Journal of Geophysical Research-Atmospheres, 111(D10S10): 1-12.
   doi:10.1029/2005JD006329.
Lee, P.K.H.; Brook, J.R.; Dabek-Zlotorzynska, E.; Mabury, S.A. (2003). Identification of the major sources
   contributing to PM2.5 observed in Toronto. Environ. Sci. Technol., 37(21): 4831-4840.
Lee, S.; Liu, W.; Wang, Y.H.; Russell, A.G.; Edgerton, E.S. (2008). Source apportionment of PM2.5:
   Comparing PMF and CMB results for four ambient monitoring sites in the southeastern United States.
   Atmos. Environ., 42(18): 4126-4137.
Lei, C.; Landsberger, S.; Basunia, S.; Tao, Y. (2004). Study of PM2.5 in Beijing suburban site by neutron
   activation analysis and source apportionment. Journal of Radioanalytical and Nuclear Chemistry,
   261(1): 87-94.
Lestari, P.; Mauliadi, Y.D. (2009). Source apportionment of particulate matter at urban mixed site in
   Indonesia using PMF. Atmos. Environ., 43(10): 1760-1770.
Leuchner, M.; Rappengluck, B. (2010). VOC source-receptor relationships in Houston during TexAQS-II.
   Atmos. Environ., 44(33): 4056-4067.
Li, Z.; Hopke, P.K.; Husain, L.; Qureshi, S.; Dutkiewicz, V.A.; Schwab, J.J.; Drewnick, F.; Demerjian, K.L.
   (2004). Sources of fine particle composition in New York City. Atmos. Environ., 38(38): 6521-6529.
Liang, J.Y.; Kaduwela, A.; Jackson, B.; Gurer, K.; Allen, P. (2006). Off-line diagnostic analyses of a three-
   dimensional PM model using two matrix factorization methods. Atmos. Environ., 40(30): 5759-5767.
Liang, J.Y.; Fairley, D. (2006). Validation of an efficient non-negative matrix factorization method and its
   preliminary application in Central California. Atmos. Environ., 40(11): 1991-2001.
Liggio, J.; Li, S.M.; Vlasenko, A.; Sjostedt, S.; Chang, R.; Shantz, N.; Abbatt, J.; Slowik, J.G.; Bottenheim,
   J.W.; Brickell, P.C.; Stroud, C.; Leaitch, W.R. (2010). Primary and secondary organic aerosols in
   urban air masses intercepted at a rural site. Journal of Geophysical Research-Atmospheres, 115
Lingwall, J.W.; Christensen, W.F. (2007). Pollution source apportionment using a priori information and
   positive matrix factorization. Chemom. Intell. Lab. Sys., 87(2): 281-294.
Liu, S.; Takahama, S.; Russell, L.M.; Gilardoni, S.; Baumgardner, D. (2009). Oxygenated organic
   functional groups and their sources in single and submicron organic particles in MILAGRO 2006
   campaign. Atmos. Chem. Phys., 9(18): 6849-6863.
Liu, W.; Hopke, P.K.; Han, Y.J.; Yi, S.M.; Holsen, T.M.; Cybart, S.; Kozlowski, K.; Milligan, M. (2003).
   Application of receptor modeling to atmospheric constituents at Potsdam and Stockton, NY. Atmos.
   Environ., 37(36): 4997-5007.
Liu, W.; Hopke, P.K.; VanCuren, R.A. (2003). Origins of fine aerosol mass in the western United States
   using positive matrix factorization. Journal of Geophysical Research-Atmospheres,
   108(D23). doi:10.1029/2006JD007978.
Liu, W.; Wang, Y.H.; Russell, A.; Edgerton, E.S. (2005). Atmospheric aerosol over two urban-rural pairs in
   the southeastern United States: Chemical composition and possible sources. Atmos. Environ., 39(25):
   4453-4470.
Liu, W.; Wang, Y.H.; Russell, A.; Edgerton, E.S. (2006). Enhanced source identification of southeast
   aerosols using temperature-resolved carbon fractions and gas phase components. Atmos. Environ.,
   40(Suppl. 2): S445-S466.
Logue, J.M.; Small, M.J.; Robinson, A.L. (2009). Identifying priority pollutant sources: Apportioning air
   toxics risks using positive matrix factorization. Environ. Sci. Technol., 43(24): 9439-9444.
Lonati, G.; Ozgen, S.; Giugliano, M. (2007). Primary and secondary carbonaceous species in PM2.5
   samples in Milan (Italy). Atmos. Environ., 41(22): 4599-4610.
Lopez, M.L.; Ceppi, S.; Palancar, G.G.; Olcese, L.E.; Tirao, G.; Toselli, B.M. (2011). Elemental
   concentration and source identification of PM10 and PM2.5 by SR-XRF in Cordoba City, Argentina.
   Atmos. Environ., 45(31): 5450-5457.
Lowenthal, D.H.; Watson, J.G.; Koracin, D.; Chen, L.-W.A.; DuBois, D.; Vellore, R.; Kumar, N.; Knipping,
   E.M.; Wheeler, N.; Craig, K.; Reid, S. (2010). Evaluation of regional scale receptor modeling. J. Air
   Waste Manage. Assoc., 60(1): 26-42. http://pubs.awma.org/gsearch/journal/2010/1/10.3155-1047-
   3289.60.1.26.pdf.
Lowenthal, D.H.; Rahn, K.A. (1988). Tests of regional elemental tracers of pollution aerosols. 2.
   Sensitivity of signatures and apportionments to variations in operating parameters. Atmos. Environ.,
   22: 420-426.
Lu, J.H.; Wu, L.S. (2004). Technical details and programming guide for a general two-way positive matrix
   factorization algorithm. Journal of Chemometrics, 18(12): 519-525.
Markus, A.; Matsaev, V. (1994). The failure of factorization of positive matrix functions on noncircular
   contours. Linear Algebra Appl., 208/209: 231.
Marmur, A.; Mulholland, J.A.; Russell, A.G. (2007). Optimized variable source-profile approach for source
   apportionment. Atmos. Environ., 41(3): 493-505.
Marmur, A.; Liu, W.; Wang, Y.; Russell, A.G.; Edgerton, E.S. (2009). Evaluation of model simulated
   atmospheric constituents with observations in the factor projected space: CMAQ simulations of
   SEARCH measurements. Atmos. Environ., 43(11): 1839-1849.
Martello, D.V.; Pekney, N.J.; Anderson, R.R.; Davidson, C.I.; Hopke, P.K.; Kim, E.; Christensen, W.F.;
   Mangelson, N.F.; Eatough, D.J. (2008). Apportionment of ambient primary and secondary fine
   particulate matter at the Pittsburgh National Energy Laboratory particulate matter characterization site
   using positive matrix factorization and a potential source contributions function analysis. J. Air Waste
   Manage. Assoc., 58(3): 357-368.
Mazzei, F.; Lucarelli, F.; Nava, S.; Prati, P.; Valli, G.; Vecchi, R. (2007). A new methodological approach:
   The combined use of two-stage streaker samplers and optical particle counters for the characterization
   of airborne particulate matter. Atmos. Environ., 41(26): 5525-5535.
Mazzei, F.; D'Alessandro, A.; Lucarelli, F.; Nava, S.; Prati, P.; Valli, G.; Vecchi, R. (2008).
   Characterization of particulate matter sources in an urban environment. Sci. Total Environ., 401(1-3):
   81-89.
Mazzei, F.; Prati, P. (2009). Coarse particulate matter apportionment around a steel smelter plant. J. Air
   Waste Manage. Assoc., 59(5): 514-519.
McGuire, M.L.; Jeong, C.H.; Slowik, J.G.; Chang, R.Y.W.; Corbin, J.C.; Lu, G.; Mihele, C.; Rehbein,
   P.J.G.; Sills, D.M.L.; Abbatt, J.P.D.; Brook, J.R.; Evans, G.J. (2011). Elucidating determinants of
   aerosol composition through particle-type-based receptor modeling. Atmos. Chem. Phys., 11(15):
   8133-8155.
McMeeking, G.R.; Morgan, W.T.; Flynn, M.; Highwood, E.J.; Turnbull, K.; Haywood, J.; Coe, H. (2011).
   Black carbon aerosol mixing state, organic aerosols and aerosol optical properties over the United
   Kingdom. Atmos. Chem. Phys., 11(17): 9037-9052.
Mehta, B.; Venkataraman, C.; Bhushan, M.; Tripathi, S.N. (2009). Identification of sources affecting fog
   formation using receptor modeling approaches and inventory estimates of sectoral emissions. Atmos.
   Environ., 43(6): 1288-1295.
Miller, S.L.; Anderson, M.J.; Daly, E.P.; Milford, J.B. (2002). Source apportionment of exposures to
   volatile organic compounds. I. Evaluation of receptor models using simulated exposure data. Atmos.
   Environ., 36(22): 3629-3641.
Mohr, C.; Richter, R.; DeCarlo, P.F.; Prevot, A.S.H.; Baltensperger, U. (2011). Spatial variation of
   chemical composition and sources of submicron aerosol in Zurich during wintertime using mobile
   aerosol mass spectrometer data. Atmos. Chem. Phys., 11(15): 7465-7482.
Mooibroek, D.; Schaap, M.; Weijers, E.P.; Hoogerbrugge, R. (2011). Source apportionment and spatial
   variability of PM2.5 using measurements at five sites in the Netherlands. Atmos. Environ., 45(25):
   4180-4191.
Moon, K.J.; Han, J.S.; Ghim, Y.S.; Kim, Y.J. (2008). Source apportionment of fine carbonaceous particles
   by positive matrix factorization at Gosan background site in East Asia. Environ. Int., 34(5): 654-664.
Moreno, T.; Perez, N.; Querol, X.; Amato, F.; Alastuey, A.; Bhatia, R.; Spiro, B.; Hanvey, M.; Gibbons, W.
   (2010). Physicochemical variations in atmospheric aerosols recorded at sea onboard the Atlantic-
   Mediterranean 2008 Scholar Ship cruise (Part II): Natural versus anthropogenic influences revealed
   by PM10 trace element geochemistry. Atmos. Environ., 44(21-22): 2563-2576.
Morino, Y.; Ohara, T.; Yokouchi, Y.; Ooki, A. (2011). Comprehensive source apportionment of volatile
   organic compounds using observational data, two receptor models, and an emission inventory in
   Tokyo metropolitan area. Journal of Geophysical Research-Atmospheres, 116
Morishita, M.; Keeler, G.J.; Wagner, J.G.; Harkema, J.R. (2006). Source identification of ambient PM2.5
   during summer inhalation exposure studies in Detroit, MI. Atmos. Environ., 40(21): 3823-3834.
Morishita, M.; Keeler, G.J.; Kamal, A.S.; Wagner, J.G.; Harkema, J.R.; Rohr, A.C. (2011). Identification of
   ambient PM2.5 sources and analysis of pollution episodes in Detroit, Michigan using highly time-
   resolved measurements. Atmos. Environ., 45(8): 1627-1637.
Ng, N.L.; Herndon, S.C.; Trimborn, A.; Canagaratna, M.R.; Croteau, P.L.; Onasch, T.B.; Sueper, D.;
   Worsnop, D.R.; Zhang, Q.; Sun, Y.L.; Jayne, J.T. (2011). An Aerosol Chemical Speciation Monitor
   (ACSM) for routine monitoring of the composition and mass concentrations of ambient aerosol.
   Aerosol Sci. Technol., 45(7): 770-784.
Ng, N.L.; Canagaratna, M.R.; Jimenez, J.L.; Zhang, Q.; Ulbrich, I.M.; Worsnop, D.R. (2011). Real-time
   methods for estimating organic component mass concentrations from Aerosol Mass Spectrometer
   data. Environ. Sci. Technol., 45(3): 910-916.
Nicolas, J.; Chiari, M.; Crespo, J.; Orellana, I.G.; Lucarelli, F.; Nava, S.; Pastor, C.; Yubero, E. (2008).
   Quantification of Saharan and local dust impact in an arid Mediterranean area by the positive matrix
   factorization (PMF) technique. Atmos. Environ., 42(39): 8872-8882.
Nicolas, J.; Chiari, M.; Crespo, J.; Galindo, N.; Lucarelli, F.; Nava, S.; Yubero, E. (2011). Assessment of
   potential source regions of PM2.5 components at a southwestern Mediterranean site. Tellus Series B-
   Chemical and Physical Meteorology, 63(1): 96-106.
Norman, A.L.; Barrie, L.A.; Toom-Sauntry, D.; Sirois, A.; Krouse, H.R.; Li, S.M.; Sharma, S. (1999).
   Sources of aerosol sulphate at Alert: Apportionment using stable isotopes. J. Geophys. Res.,
   104(D9): 11619-11631.
Norris, G.; Vedantham, R.; Wade, K.; Zahn, P.; Brown, S.; Paatero, P.; Eberly, S.; Foley, C. (2009).
   Guidance document for PMF applications with the Multilinear Engine. EPA 600/R-09/032, prepared
   for the U.S. Environmental Protection Agency, Research Triangle Park, NC, April.
Ogulei, D.; Hopke, P.K.; Wallace, L.A. (2006). Analysis of indoor particle size distributions in an occupied
   townhouse using positive matrix factorization. Indoor Air, 16(3): 204-215.
Ogulei, D.; Hopke, P.K.; Zhou, L.M.; Pancras, J.P.; Nair, N.; Ondov, J.M. (2006). Source apportionment of
   Baltimore aerosol from combined size distribution and chemical composition data. Atmos. Environ.,
   40(Suppl. 2): S396-S410.
Ogulei, D.; Hopke, P.K.; Ferro, A.R.; Jaques, P.A. (2007). Factor analysis of submicron particle size
   distributions near a major United States-Canada trade bridge. J. Air Waste Manage. Assoc., 57(2):
   190-203.
Oh, M.S.; Lee, T.J.; Kim, D.S. (2011). Quantitative source apportionment of size-segregated particulate
   matter at urbanized local site in Korea. Aerosol Air Qual. Res., 11(3): 247-264.
Owega, S.; Khan, B.U.Z.; D'Souza, R.; Evans, G.J.; Fila, M.; Jervis, R.E. (2004). Receptor modeling of
   Toronto PM2.5 characterized by aerosol laser ablation mass spectrometry. Environ. Sci. Technol.,
   38(21): 5712-5720.
Paatero, P.; Hopke, P.K.; Song, X.H.; Ramadan, Z. (2002). Understanding and controlling rotations in
   factor analytic models. Chemom. Intell. Lab. Sys., 60(1-2): 253-264. doi:10.1016/S0169-
   7439(01)00200-3.
Paatero, P.; Tapper, U. (1994). Positive matrix factorization: A non-negative factor model with optimal
   utilization of error estimates of data values. Environmetrics, 5: 111-126.
Paatero, P. (1997). Least squares formulation of robust non-negative factor analysis. Chemom. Intell.
   Lab. Sys., 37: 23-35.
Paatero, P. (1998). User's guide for positive matrix factorization programs PMF2 and PMF3 Part 1:
   Tutorial, prepared by University of Helsinki, Helsinki, Finland.
Paatero, P. (1999). The multilinear engine-A table-driven, least squares program for solving multilinear
   problems, including the n-way parallel factor analysis model. Journal of Computational and Graphical
   Statistics, 8: 854-888.
Paatero, P. (2000). User's guide for positive matrix factorization programs PMF2 and PMF3 Part 2:
   Reference, prepared by University of Helsinki, Helsinki, Finland.
Paatero, P.; Hopke, P.K.; Song, X.H.; Ramadan, Z. (2002). Understanding and controlling rotations in
   factor analytical models. Chemom. Intell. Lab. Sys., 60: 253-264.
Paatero, P.; Hopke, P.K.; Hoppenstock, J.; Eberly, S.I. (2003). Advanced factor analysis of spatial
   distributions of PM2.5 in the eastern United States. Environ. Sci. Technol., 37(11): 2460-2476.
Paatero, P.; Hopke, P.K.; Begum, B.A.; Biswas, S.K. (2005). A graphical diagnostic method for assessing
   the rotation in factor analytical models of atmospheric pollution. Atmos. Environ., 39(1): 193-201.
Paatero, P.; Hopke, P.K. (2008). Rotational tools for factor analytic models implemented by using the
   multilinear engine. J. Chemometrics, 23(2): 91-100.
Paatero, P.; Eberly, S.; Brown, S.G.; Norris, G.A. (2014). Methods for estimating uncertainty in factor
   analytic solutions. Atmos. Meas. Tech., 7: 781-797. doi:10.5194/amt-7-781-2014.
Pancras, J.P.; Ondov, J.M.; Poor, N.; Landis, M.S.; Stevens, R.K. (2006). Identification of sources and
   estimation of emission profiles from highly time-resolved pollutant measurements in Tampa, FL.
   Atmos. Environ., 40(Suppl. 2): S467-S481.
Pancras, J.P.; Ondov, J.M.; Zeisler, R. (2005). Multi-element electrothermal AAS determination of 11 marker
   elements in fine ambient aerosol slurry samples collected with SEAS-II. Analytica Chimica Acta, 538:
   303-312.
Pandolfi, M.; Viana, M.; Minguillon, M.C.; Querol, X.; Alastuey, A.; Amato, F.; Celades, I.; Escrig, A.;
   Monfort, E. (2008). Receptor models application to multi-year ambient PM10 measurements in an
   industrialized ceramic area: Comparison of source apportionment results. Atmos. Environ., 42(40):
   9007-9017.
Paterson, K.G.; Sagady, J.L.; Hooper, D.L. (1999). Analysis of air quality data using positive matrix
   factorization. Environ. Sci. Technol., 33(4): 635-641.
Pekney, N.J.; Davidson, C.I.; Zhou, L.M.; Hopke, P.K. (2006a). Application of PSCF and CPF to PMF-
   modeled sources of PM2.5 in Pittsburgh. Aerosol Sci. Technol., 40(10): 952-961.
Pekney, N.J.; Davidson, C.I.; Bein, K.J.; Wexler, A.S.; Johnston, M.V. (2006b). Identification of sources of
   atmospheric PM at the Pittsburgh Supersite, Part I: Single particle analysis and filter-based positive
   matrix factorization. Atmos. Environ., 40(Suppl. 2): S411-S423.
Pekney, N.J.; Davidson, C.I.; Robinson, A.; Zhou, L.M.; Hopke, P.K.; Eatough, D.J.; Rogge, W.F. (2006c).
   Major source categories for PM2.5 in Pittsburgh using PMF and UNMIX. Aerosol Sci. Technol., 40(10):
   910-924.
Pitz, M.; Gu, J.; Soentgen, J.; Peters, A.; Cyrys, J. (2011). Particle size distribution factor as an indicator
   for the impact of the Eyjafjallajokull ash plume at ground level in Augsburg, Germany. Atmos. Chem.
   Phys., 11(17): 9367-9374.
Poirot, R.L.; Wishinski, P.R.; Hopke, P.K.; Polissar, A.V. (2001). Comparative application of multiple
   receptor methods to identify aerosol sources in northern Vermont. Environ. Sci. Technol., 35(23):
   4622-4636.
Poirot, R.L.; Wishinski, P.R.; Hopke, P.K.; Polissar, A.V. (2002). Comparative application of multiple
   receptor methods to identify aerosol sources in northern Vermont (vol 35, pg 4622, 2001). Environ.
   Sci. Technol., 36(4): 820.
Polissar, A.V.; Hopke, P.K.; Paatero, P.; Malm, W.C.; Sisler, J.F. (1998). Atmospheric aerosol over
   Alaska 2. Elemental composition and sources. J. Geophys. Res., 103(D15): 19045-19057.
Polissar, A.V.; Hopke, P.K.; Paatero, P.; Kaufmann, Y.J.; Hall, O.K.; Bodhaine, B.A.; Dutton, E.G.; Harris,
   J.M. (1999). The aerosol at Barrow, Alaska: Long-term trends and source locations. Atmos. Environ.,
   33(16): 2441-2458.
Polissar, A.V.; Hopke, P.K.; Poirot, R.L. (2001). Atmospheric aerosol over Vermont: Chemical
   composition and sources. Environ. Sci. Technol., 35(23): 4604-4621.
Polissar, A.V.; Hopke, P.K.; Harris, J.M. (2001). Source regions for atmospheric aerosol measured at
   Barrow, Alaska. Environ. Sci. Technol., 35(21): 4214-4226.
Politis, D.N.; White, H. (2003). Automatic block-length selection for the dependent bootstrap. Prepared
   by the University of California at San Diego, La Jolla, CA, February.
Prendes, P.; Andrade, J.M.; Lopez-Mahia, P. (1999). Source apportionment of inorganic ions in airborne
   urban particles from Coruna City using positive matrix factorization. Talanta, 49(1): 165.
Qi, L.; Nakao, S.; Malloy, Q.; Warren, B.; Cocker, D.R. (2010). Can secondary organic aerosol formed in
   an atmospheric simulation chamber continuously age? Atmos. Environ., 44(25): 2990-2996.
Qin, Y.; Oduyemi, K.; Chan, L.Y. (2002). Comparative testing of PMF and CFA models. Chemom. Intell.
   Lab. Sys., 61(1-2): 75-87. doi:10.1016/S0169-7439(01)00175-7.
Qin, Y.; Oduyemi, K. (2003). Atmospheric aerosol source identification and estimates of source
   contributions to air pollution in Dundee, UK. Atmos. Environ., 37(13): 1799-1809.
Qin, Y.J.; Kim, E.; Hopke, P.K. (2006). The concentrations and sources of PM2.5 in metropolitan New York
   City. Atmos. Environ., 40(Suppl. 2): S312-S332.
Raatikainen, T.; Vaattovaara, P.; Tiitta, P.; Miettinen, P.; Rautiainen, J.; Ehn, M.; Kulmala, M.; Laaksonen,
   A.; Worsnop, D.R. (2010). Physicochemical properties and origin of organic groups detected in boreal
   forest using an aerosol mass spectrometer. Atmos. Chem. Phys., 10(4): 2063-2077.
Raja, S.; Biswas, K.F.; Husain, L.; Hopke, P.K. (2010). Source apportionment of the atmospheric aerosol
   in Lahore, Pakistan. Water Air and Soil Pollution, 208(1-4): 43-57.
Ramadan, Z.; Song, X.H.; Hopke, P.K. (2000). Identification of sources of Phoenix aerosol by positive
   matrix factorization. J. Air Waste Manage. Assoc., 50(8): 1308-1320.
Ramadan, Z.; Eickhout, B.; Song, X.H.; Buydens, L.M.C.; Hopke, P.K. (2003). Comparison of positive
   matrix factorization and multilinear engine for the source apportionment of particulate pollutants.
   Chemom. Intell. Lab. Sys., 66(1): 15-28. doi:10.1016/S0169-7439(02)00160-0.
Raman, R.S.; Hopke, P.K. (2007). Source apportionment of fine particles utilizing partially speciated
   carbonaceous aerosol data at two rural locations in New York State. Atmos. Environ., 41: 7923-7939.
Raman, R.S.; Ramachandran, S. (2010). Annual and seasonal variability of ambient aerosols over an
   urban region in western India. Atmos. Environ., 44(9): 1200-1208.
Raman, R.S.; Ramachandran, S.; Kedia, S. (2011). A methodology to estimate source-specific aerosol
   radiative forcing. J. Aerosol Sci., 42(5): 305-320.
Raman, R.S.; Ramachandran, S. (2011). Source apportionment of the ionic components in precipitation
   over an urban region in Western India. Environmental Science and Pollution Research, 18(2): 212-
   225.
Reff, A.; Eberly, S.I.; Bhave, P.V. (2007). Receptor modeling of ambient particulate matter data using
   positive matrix factorization: Review of existing methods. J. Air Waste Manage. Assoc., 57(2): 146-
   154.
Richard, A.; Gianini, M.F.D.; Mohr, C.; Furger, M.; Bukowiecki, N.; Minguillon, M.C.; Lienemann, P.;
   Flechsig, U.; Appel, K.; DeCarlo, P.F.; Heringa, M.F.; Chirico, R.; Baltensperger, U.; Prevot, A.S.H.
   (2011). Source apportionment of size and time resolved trace elements and organic aerosols from an
   urban courtyard site in Switzerland. Atmos. Chem. Phys., 11(17): 8945-8963.
Rizzo, M.J.; Scheff, P.A. (2004). Assessing ozone networks using positive matrix factorization.
   Environmental Progress, 23(2): 110-119.
Rizzo, M.J.; Scheff, P.A. (2007). Fine particulate source apportionment using data from the USEPA
   speciation trends network in Chicago, Illinois: Comparison of two source apportionment models.
   Atmos. Environ., 41(29): 6276-6288.
Rizzo, M.J.; Scheff, P.A. (2007). Utilizing the Chemical Mass Balance and Positive Matrix Factorization
   models to determine influential species and examine possible rotations in receptor modeling results.
   Atmos. Environ., 41(33): 6986-6998.
Robinson, N.H.; Hamilton, J.F.; Allan, J.D.; Langford, B.; Oram, D.E.; Chen, Q.; Docherty, K.; Farmer,
   O.K.; Jimenez, J.L.; Ward, M.W.; Hewitt, C.N.; Barley, M.H.; Jenkin, M.E.; Rickard, A.R.; Martin, S.T.;
   McFiggans, G.; Coe, H. (2011). Evidence for a significant proportion of Secondary Organic Aerosol
   from isoprene above a maritime tropical forest. Atmos. Chem. Phys., 11(3): 1039-1050.
Rodriguez, S.; Alastuey, A.; Alonso-Perez, S.; Querol, X.; Cuevas, E.; Abreu-Afonso, J.; Viana, M.; Perez,
   N.; Pandolfi, M.; de la Rosa, J. (2011). Transport of desert dust mixed with North African industrial
   pollutants in the subtropical Saharan Air Layer. Atmos. Chem. Phys., 11(13): 6663-6685.
Santoso, M.; Hopke, P.K.; Hidayat, A.; Diah, D.L. (2008). Source identification of the atmospheric aerosol
   at urban and suburban sites in Indonesia by positive matrix factorization. Sci. Total Environ., 397(1-3):
   229-237.
Sarnat, J.A.; Marmur, A.; Klein, M.; Kim, E.; Russell, A.G.; Sarnat, S.E.; Mulholland, J.A.; Hopke, P.K.;
   Tolbert, P.E. (2008). Fine particle sources and cardiorespiratory morbidity: An application of chemical
   mass balance and factor analytical source-apportionment methods. Environ. Health Perspect., 116(4):
   459-466.
Sauvage, S.; Plaisance, H.; Locoge, N.; Wroblewski, A.; Coddeville, P.; Galloo, J.C. (2009). Long term
   measurement and source apportionment of non-methane hydrocarbons in three French rural areas.
   Atmos. Environ., 43(15): 2430-2441.
Schnelle-Kreis, J.; Sklorz, M.; Orasche, J.; Stolzel, M.; Peters, A.; Zimmermann, R. (2007). Semi volatile
   organic compounds in ambient PM2.5. Seasonal trends and daily resolved source contributions.
   Environ. Sci. Technol., 41(11): 3821-3828.
Shi, G.L.; Li, X.; Feng, Y.C.; Wang, Y.Q.; Wu, J.H.; Li, J.; Zhu, T. (2009). Combined source
   apportionment, using positive matrix factorization-chemical mass balance and principal component
   analysis/multiple linear regression-chemical mass balance models. Atmos. Environ., 43(18): 2929-
   2937.
Shim, C.; Wang, Y.; Yoshida, Y. (2008). Evaluation of model-simulated source contributions to
   tropospheric ozone with aircraft observations in the factor-projected space. Atmos. Chem. Phys., 8(6):
   1751-1761.
Shrivastava, M.K.; Subramanian, R.; Rogge, W.F.; Robinson, A.L. (2007). Sources of organic aerosol:
   Positive matrix factorization of molecular marker data and comparison of results from different source
   apportionment models. Atmos. Environ., 41(40): 9353-9369.
Slowik, J.G.; Vlasenko, A.; McGuire, M.; Evans, G.J.; Abbatt, J.P.D. (2010). Simultaneous factor analysis
   of organic particle and gas mass spectra: AMS and PTR-MS measurements at an urban site. Atmos.
   Chem. Phys., 10(4): 1969-1988.
Slowik, J.G.; Brook, J.; Chang, R.Y.W.; Evans, G.J.; Hayden, K.; Jeong, C.H.; Li, S.M.; Liggio, J.; Liu,
   P.S.K.; McGuire, M.; Mihele, C.; Sjostedt, S.; Vlasenko, A.; Abbatt, J.P.D. (2011). Photochemical
   processing of organic aerosol at nearby continental sites: contrast between urban plumes and
   regional aerosol. Atmos. Chem. Phys., 11(6): 2991-3006.
Sofowote, U.M.; McCarry, B.E.; Marvin, C.H. (2008). Source apportionment of PAH in Hamilton Harbour
   suspended sediments: Comparison of two factor analysis methods. Environ. Sci. Technol., 42(16):
   6007-6014.
Sofowote, U.M.; Hung, H.; Rastogi, A.K.; Westgate, J.N.; Deluca, P.F.; Su, Y.S.; McCarry, B.E. (2011).
   Assessing the long-range transport of PAH to a sub-Arctic site using positive matrix factorization and
   potential source contribution function. Atmos. Environ., 45(4): 967-976.
Song, X.H.; Polissar, A.V.; Hopke, P.K. (2001). Sources of fine particle composition in the northeastern
   US. Atmos. Environ., 35(31): 5277-5286.
Song, Y.; Zhang, Y.H.; Xie, S.D.; Zeng, L.M.; Zheng, M.; Salmon, L.G.; Shao, M.; Slanina, S. (2006).
   Source apportionment of PM2.5 in Beijing by positive matrix factorization. Atmos. Environ., 40(8):
   1526-1537.
Song, Y.; Zhang, Y.H.; Xie, S.D.; Zeng, L.M.; Zheng, M.; Salmon, L.G.; Shao, M.; Slanina, S. (2006).
   Source apportionment of PM2.5 in Beijing by positive matrix factorization (vol 40, pg 1526, 2006).
   Atmos. Environ., 40(39): 7661-7662.
Song, Y.; Xie, S.D.; Zhang, Y.H.; Zeng, L.M.; Salmon, L.G.; Zheng, M. (2006). Source apportionment of
   PM2.5 in Beijing using principal component analysis/absolute principal component scores and UNMIX.
   Sci. Total Environ., 372(1): 278-286.
Song, Y.; Shao, M.; Liu, Y.; Lu, S.H.; Kuster, W.; Goldan, P.; Xie, S.D. (2007). Source apportionment of
   ambient volatile organic compounds in Beijing. Environ. Sci. Technol., 41(12): 4348-4353.
Song, Y.; Tang, X.Y.; Xie, S.D.; Zhang, Y.H.; Wei, Y.J.; Zhang, M.S.; Zeng, L.M.; Lu, S.H. (2007). Source
   apportionment of PM2.5 in Beijing in 2004. J. Hazard. Mater., 146(1-2): 124-130.
Song, Y.; Dai, W.; Shao, M.; Liu, Y.; Lu, S.H.; Kuster, W.; Goldan, P. (2008). Comparison of receptor
   models for source apportionment of volatile organic compounds in Beijing, China. Environ. Poll.,
   156(1): 174-183.
Song, Y.; Dai, W.; Wang, X.S.; Cui, M.M.; Su, H.; Xie, S.D.; Zhang, Y.H. (2008). Identifying dominant
   sources of respirable suspended particulates in Guangzhou, China. Environmental Engineering
   Science, 25(7): 959-968.
Soonthornnonda, P.; Christensen, E.R. (2008). Source apportionment of pollutants and flows of combined
   sewer wastewater. Water Research, 42(8-9): 1989-1998.
Sun, Y.L.; Zhang, Q.; Zheng, M.; Ding, X.; Edgerton, E.S.; Wang, X.M. (2011). Characterization and
   source apportionment of water-soluble organic matter in atmospheric fine particles (PM2.5) with
   high-resolution aerosol mass spectrometry and GC-MS. Environ. Sci. Technol., 45(11): 4854-4861.
Sundqvist, K.L.; Tysklind, M.; Geladi, P.; Hopke, P.K.; Wiberg, K. (2010). PCDD/F source apportionment
   in the Baltic Sea using positive matrix factorization. Environ. Sci. Technol., 44(5): 1690-1697.
Tandon, A.; Yadav, S.; Attri, A.K. (2010). Coupling between meteorological factors and ambient aerosol
   load. Atmos. Environ., 44(9): 1237-1243.
Tauler, R.; Viana, M.; Querol, X.; Alastuey, A.; Flight, R.M.; Wentzell, P.D.; Hopke, P.K. (2009).
   Comparison of the results obtained by four receptor modelling methods in aerosol source
   apportionment studies. Atmos. Environ., 43(26): 3989-3997.
Thimmaiah, D.; Hovorka, J.; Hopke, P.K. (2009). Source apportionment of winter submicron Prague
   aerosols from combined particle number size distribution and gaseous composition data. Aerosol Air
   Qual. Res., 9(2): 209-236.
Thornhill, D.A.; Williams, A.E.; Onasch, T.B.; Wood, E.; Herndon, S.C.; Kolb, C.E.; Knighton, W.B.;
   Zavala, M.; Molina, L.T.; Marr, L.C. (2010). Application of positive matrix factorization to on-road
   measurements for source apportionment of diesel- and gasoline-powered vehicle emissions in Mexico
   City. Atmos. Chem. Phys., 10(8): 3629-3644.
Thurston, G.D.; Ito, K.; Mar, T.; Christensen, W.F.; Eatough, D.J.; Henry, R.C.; Kim, E.; Laden, F.; Lall,
   R.; Larson, T.V.; Liu, H.; Neas, L.; Pinto, J.; Stolzel, M.; Suh, H.; Hopke, P.K. (2005). Workgroup
   report: Workshop on source apportionment of particulate matter health effects - Intercomparison of
   results and implications. Environ. Health Perspect., 113(12): 1768-1774.
Tian, F.L.; Chen, J.W.; Qiao, X.L.; Cai, X.Y.; Yang, P.; Wang, Z.; Wang, D.G. (2008). Source identification
   of PCDD/Fs and PCBs in pine (Cedrus deodara) needles: A case study in Dalian, China. Atmos.
   Environ., 42: 4769-4777.
Tsai, J.; Owega, S.; Evans, G.; Jervis, R.; Fila, M.; Tan, P.; Malpica, O. (2004). Chemical composition and
   source apportionment of Toronto summertime urban fine aerosol (PM2.5). Journal of Radioanalytical
   and Nuclear Chemistry, 259(1): 193-197.
Tsimpidi, A.P.; Karydis, V.A.; Zavala, M.; Lei, W.; Molina, L.; Ulbrich, I.M.; Jimenez, J.L.; Pandis, S.N.
   (2010). Evaluation of the volatility basis-set approach for the simulation of organic aerosol formation in
   the Mexico City metropolitan area. Atmos. Chem. Phys., 10(2): 525-546.
Tsimpidi, A.P.; Karydis, V.A.; Zavala, M.; Lei, W.; Bei, N.; Molina, L.; Pandis, S.N. (2011). Sources and
   production of organic aerosol in Mexico City: insights from the combination of a chemical transport
   model (PMCAMx-2008) and measurements during MILAGRO. Atmos. Chem. Phys., 11(11): 5153-
   5168.
U.S. EPA (2010). EPA Positive Matrix Factorization (PMF) 3.0 model, prepared by U.S. Environmental
   Protection Agency, Research Triangle Park, NC, http://www.epa.gov/heasd/products/pmf/pmf.html
Uchimiya, M.; Aral, M.; Masunaga, S. (2007). Fingerprinting localized dioxin contamination: Ichihara
   anchorage case. Environ. Sci. Technol., 41(11): 3864-3870.
Ulbrich, I.M.; Canagaratna, M.R.; Zhang, Q.; Worsnop, D.R.; Jimenez, J.L. (2009). Interpretation of
   organic components from Positive Matrix Factorization of aerosol mass spectrometric data. Atmos.
   Chem. Phys., 9(9): 2891-2918.
Vaccaro, S.; Sobiecka, E.; Contini, S.; Locoro, G.; Free, G.; Gawlik, B.M. (2007). The application of
   positive matrix factorization in the analysis, characterisation and detection of contaminated soils.
   Chemosphere, 69: 1055-1063.
Vecchi, R.; Chiari, M.; D'Alessandro, A.; Fermo, P.; Lucarelli, F.; Mazzei, F.; Nava, S.; Piazzalunga, A.;
   Prati, P.; Silvani, F.; Valli, G. (2008). A mass closure and PMF source apportionment study on the sub-
   micron sized aerosol fraction at urban sites in Italy. Atmos. Environ., 42(9): 2240-2253.
Vecchi, R.; Bernardoni, V.; Cricchio, D.; D'Alessandro, A.; Fermo, P.; Lucarelli, F.; Nava, S.; Piazzalunga,
   A.; Valli, G. (2008). The impact of fireworks on airborne particles. Atmos. Environ., 42(6): 1121-1132.
Vedal, S.; Hannigan, M.P.; Dutton, S.J.; Miller, S.L.; Milford, J.B.; Rabinovitch, N.; Kim, S.Y.; Sheppard, L.
   (2009). The Denver Aerosol Sources and Health (DASH) study: Overview and early findings. Atmos.
   Environ., 43(9): 1666-1673.
Vestenius, M.; Leppanen, S.; Anttila, P.; Kyllonen, K.; Hatakka, J.; Hellen, H.; Hyvarinen, A.P.; Hakola, H.
   (2011). Background concentrations and source apportionment of polycyclic aromatic hydrocarbons in
   south-eastern Finland. Atmos. Environ., 45(20): 3391-3399.
Viana, M.; Pandolfi, M.; Minguillon, M.C.; Querol, X.; Alastuey, A.; Monfort, E.; Celades, I. (2008). Inter-
   comparison of receptor models for PM source apportionment: Case study in an industrial area.
   Atmos. Environ., 42(16): 3820-3832.
Viana, M.; Amato, F.; Alastuey, A.; Querol, X.; Moreno, T.; Dos Santos, S.G.; Herce, M.D.; Fernandez-
   Patier, R. (2009). Chemical tracers of particulate emissions from commercial shipping. Environ. Sci.
   Technol., 43(19): 7472-7477.
Viana, M.; Salvador, P.; Artinano, B.; Querol, X.; Alastuey, A.; Pey, J.; Latz, A.J.; Cabanas, M.; Moreno,
   T.; Dos Santos, S.G.; Herce, M.D.; Hernandez, P.O.; Garcia, D.R.; Fernandez-Patier, R. (2010).
   Assessing the performance of methods to detect and quantify African dust in airborne particulates.
   Environ. Sci. Technol., 44(23): 8814-8820.
Vlasenko, A.; Slowik, J.G.; Bottenheim, J.W.; Brickell, P.C.; Chang, R.Y.W.; Macdonald, A.M.; Shantz,
   N.C.; Sjostedt, S.J.; Wiebe, H.A.; Leaitch, W.R.; Abbatt, J.P.D. (2009). Measurements of VOCs by
   proton transfer reaction mass spectrometry at a rural Ontario site: Sources and correlation to aerosol
   composition. Journal of Geophysical Research-Atmospheres, 114.
Wang, D.G.; Tian, F.L.; Yang, M.; Liu, C.L.; Li, Y.F. (2009). Application of positive matrix factorization to
   identify potential sources of PAHs in soil of Dalian, China. Environ. Poll., 157(5): 1559-1564.
Wang, H.B.; Shooter, D. (2005). Source apportionment of fine and coarse atmospheric particles in
   Auckland, New Zealand. Sci. Total Environ., 340(1-3): 189-198.
Wang, Y.; Zhuang, G.S.; Tang, A.H.; Zhang, W.J.; Sun, Y.L.; Wang, Z.F.; An, Z.S. (2007). The evolution
   of chemical components of aerosols at five monitoring sites of China during dust storms. Atmos.
   Environ., 41(5): 1091-1106.
Wang, Y.G.; Hopke, P.K.; Chalupa, D.C.; Utell, M.J. (2011). Effect of the shutdown of a coal-fired power
   plant on urban ultrafine particles and other pollutants. Aerosol Sci. Technol., 45(10): 1245-1249.
Watson, J.G.; Chow, J.C. (2004). Receptor models for air quality management. EM, 10(Oct.): 27-36.
Watson, J.G.; Chen, L.-W.A.; Chow, J.C.; Lowenthal, D.H.; Doraiswamy, P. (2008). Source
   apportionment: Findings from the U.S. Supersite Program. J. Air Waste Manage. Assoc., 58(2): 265-
   288. http://pubs.awma.org/gsearch/journal/2008/2/10.3155-1047-3289.58.2.265.pdf.
Willis, R.D. (2000). Workshop on UNMIX and PMF as applied to PM2.5. Report Number EPA/600/A-
   00/048; prepared by U.S. Environmental Protection Agency, Research Triangle Park, NC, for U.S. EPA.
Wingfors, H.; Hagglund, L.; Magnusson, R. (2011). Characterization of the size-distribution of aerosols
   and particle-bound content of oxygenated PAHs, PAHs, and n-alkanes in urban environments in
   Afghanistan. Atmos. Environ., 45(26): 4360-4369.
Wu, C.F.; Larson, T.V.; Wu, S.Y.; Williamson, J.; Westberg, H.H.; Liu, L.J.S. (2007). Source
   apportionment of PM2.5 and selected hazardous air pollutants in Seattle. Sci. Total Environ., 386: 42-
   52.
Xiao, R.; Takegawa, N.; Zheng, M.; Kondo, Y.; Miyazaki, Y.; Miyakawa, T.; Hu, M.; Shao, M.; Zeng, L.;
   Gong, Y.; Lu, K.; Deng, Z.; Zhao, Y.; Zhang, Y.H. (2011). Characterization and source apportionment
   of submicron aerosol with aerosol mass spectrometer during the PRIDE-PRD 2006 campaign. Atmos.
   Chem. Phys., 11(14): 6911-6929.
Xie, Y.L.; Hopke, P.K.; Paatero, P. (1998). Positive matrix factorization applied to a curved resolution
   problem. J. Chemometrics, 12(6): 357-364.
Xie, Y.L.; Hopke, P.K.; Paatero, P.; Barrie, L.A.; Li, S.M. (1999). Identification of source nature and
   seasonal variations of Arctic aerosol by positive matrix factorization. J. Atmos. Sci., 56(2): 249-260.
Xie, Y.L.; Hopke, P.K.; Paatero, P.; Barrie, L.A.; Li, S. (1999). Identification of source nature and seasonal
   variations of Arctic aerosol by the multilinear engine. Atmos. Environ., 33(16): 2549-2562.
Xie, Y.L.; Berkowitz, C.M. (2006). The use of positive matrix factorization with conditional probability
   functions in air quality studies: An application to hydrocarbon emissions in Houston, Texas. Atmos.
   Environ., 40(17): 3070-3091.
Yakovleva, E.; Hopke, P.K.; Wallace, L. (1999). Receptor modeling assessment of particle total exposure
   assessment methodology data. Environ. Sci. Technol., 33(20): 3645-3652.
Yatkin, S.; Bayram, A. (2008). Source apportionment of PM10 and PM2.5 using positive matrix factorization
   and chemical mass balance in Izmir, Turkey. Sci. Total Environ., 390(1): 109-123.
Yli-Tuomi, T.; Paatero, P.; Raunemaa, T. (1996). The soil factor in Rautavaara aerosol in positive matrix
   factorization solutions with 2 to 8 factors. J. Aerosol Sci., 27(supplement 1): S671-S672.
   doi:10.1016/0021-8502(96)00408-9.
Yli-Tuomi, T.; Hopke, P.K.; Paatero, P.; Basunia, M.S.; Landsberger, S.; Viisanen, Y.; Paatero, J. (2003).
   Atmospheric aerosol over Finnish Arctic: Source analysis by the multilinear engine and the potential
   source contribution function. Atmos. Environ., 37(31): 4381-4392. doi:10.1016/S1352-2310(03)00569-7.
Yu, J.Z.; Yang, H.; Zhang, H.Y.; Lau, A.K.H. (2004). Size distributions of water-soluble organic carbon in
   ambient aerosols and its size-resolved thermal characteristics. Atmos. Environ., 38(7): 1061-1071.
Yuan, H.; Zhuang, G.S.; Li, J.; Wang, Z.F.; Li, J. (2008). Mixing of mineral with pollution aerosols in dust
   season in Beijing: Revealed by source apportionment study. Atmos. Environ., 42(9): 2141-2157.
Yuan, Z.B.; Yu, J.Z.; Lau, A.K.H.; Louie, P.K.K.; Fung, J.C.H. (2006). Application of positive matrix
   factorization in estimating aerosol secondary organic carbon in Hong Kong and its relationship with
   secondary sulfate. Atmos. Chem. Phys., 6(1): 25-34.
Yuan, Z.B.; Lau, A.K.H.; Zhang, H.Y.; Yu, J.Z.; Louie, P.K.K.; Fung, J.C.H. (2006). Identification and
   spatiotemporal variations of dominant PM10 sources over Hong Kong. Atmos. Environ., 40(10): 1803-
   1815.
Yuan, Z.B.; Lau, A.K.H.; Shao, M.; Louie, P.K.K.; Liu, S.C.; Zhu, T. (2009). Source analysis of volatile
   organic compounds by positive matrix factorization in urban and rural environments in Beijing. Journal
   of Geophysical Research-Atmospheres, 114.
Yuan, B.; Shao, M.; de Gouw, J.; Parrish, D.D.; Lu, S.; Wang, M.; Zeng, L.; Zhang, Q.; Song, Y.; Zhang, J.;
   Hu, M. (2012). Volatile organic compounds (VOCs) in urban air: How chemistry affects the
   interpretation of positive matrix factorization (PMF) analysis. J. Geophys. Res., 117.
Yue, W.; Stolzel, M.; Cyrys, J.; Pitz, M.; Heinrich, J.; Kreyling, W.G.; Wichmann, H.E.; Peters, A.; Wang,
   S.; Hopke, P.K. (2008). Source apportionment of ambient fine particle size distribution using positive
   matrix factorization in Erfurt, Germany. Sci. Total Environ., 398(1-3): 133-144.
Zhang, Q.; Alfarra, M.R.; Worsnop, D.R.; Allan, J.D.; Coe, H.; Canagaratna, M.R.; Jimenez, J.L. (2005).
   Deconvolution and quantification of hydrocarbon-like and oxygenated organic aerosols based on
   aerosol mass spectrometry. Environ. Sci. Technol., 39(13): 4938-4952.
Zhang, W.; Guo, J.H.; Sun, Y.L.; Yuan, H.; Zhuang, G.S.; Zhuang, Y.H.; Hao, Z.P. (2007). Source
   apportionment for urban PM10 and PM2.5 in the Beijing area. Chinese Science Bulletin, 52(5): 608-
   615.
Zhang, Y.; Sheesley, R.J.; Schauer, J.J.; Lewandowski, M.; Jaoui, M.; Offenberg, J.H.; Kleindienst, T.E.;
   Edney, E.O. (2009). Source apportionment of primary and secondary organic aerosols using positive
   matrix factorization (PMF) of molecular markers. Atmos. Environ., 43(34): 5567-5574.
Zhang, Y.X.; Schauer, J.J.; Shafer, M.M.; Hannigan, M.P.; Dutton, S.J. (2008). Source apportionment of
   in vitro reactive oxygen species bioassay activity from atmospheric particulate matter. Environ. Sci.
   Technol., 42(19): 7502-7509.
Zhang, Y.X.; Sheesley, R.J.; Bae, M.S.; Schauer, J.J. (2009). Sensitivity of a molecular marker based
   positive matrix factorization model to the number of receptor observations. Atmos. Environ., 43(32):
   4951-4958.
Zhao, W.; Hopke, P.K.; Karl, T. (2004). Source identification of volatile organic compounds in Houston,
   Texas. Environ. Sci. Technol., 38(5): 1338-1347.
Zhao, W.X.; Hopke, P.K. (2004). Source apportionment for ambient particles in the San Gorgonio
   wilderness. Atmos. Environ., 38(35): 5901-5910.
Zhao, W.X.; Hopke, P.K. (2006). Source identification for fine aerosols in Mammoth Cave National Park.
   Atmos. Res., 80(4): 309-322.
Zhao, W.X.; Hopke, P.K. (2006). Source investigation for ambient PM2.5 in Indianapolis, IN. Aerosol Sci.
   Technol., 40(10): 898-909.
Zhou, L.; Hopke, P.K.; Zhao, W.X. (2009). Source apportionment of airborne particulate matter for the
   Speciation Trends Network site in Cleveland, OH. J. Air Waste Manage. Assoc., 59(3): 321-331.
Zhou, L.M.; Kim, E.; Hopke, P.K.; Stanier, C.O.; Pandis, S.N. (2004). Advanced factor analysis on
   Pittsburgh particle size-distribution data. Aerosol Sci. Technol., 38(Suppl. 1): 118-132.
Zhou, L.M.; Hopke, P.K.; Liu, W. (2004). Comparison of two trajectory based models for locating particle
   sources for two rural New York sites. Atmos. Environ., 38(13): 1955-1963.
Zhou, L.M.; Hopke, P.K.; Stanier, C.O.; Pandis, S.N.; Ondov, J.M.; Pancras, J.P. (2005). Investigation of
   the relationship between chemical composition and size distribution of airborne particles by partial
   least squares and positive matrix factorization. Journal of Geophysical Research-Atmospheres,
   110(D7).
Zhou, L.M.; Kim, E.; Hopke, P.K.; Stanier, C.; Pandis, S.N. (2005). Mining airborne particulate size
   distribution data by positive matrix factorization. Journal of Geophysical Research-Atmospheres,
   110(D7): D07S19. doi:10.1029/2004JD004707.
Zota, A.R.; Willis, R.; Jim, R.; Norris, G.A.; Shine, J.P.; Duvall, R.M.; Schaider, L.A.; Spengler, J.D.
   (2009). Impact of mine waste on airborne respirable particulates in northeastern Oklahoma, United
   States. J. Air Waste Manage. Assoc., 59(11): 1347-1357.